SUMMARY : Session O9-SE TTS & Units for TTS

 

Title A joint prosody evaluation of French text-to-speech synthesis systems
Authors M. Garcia, C. d’Alessandro, G. Bailly, P. Mareüil, M. Morel
Abstract This paper reports on prosodic evaluation in the framework of the EVALDA/EvaSy project for text-to-speech (TTS) evaluation for the French language. Prosody is evaluated using a prosodic transplantation paradigm: intonation contours generated by the synthesis systems are transplanted onto common segmental content. Both diphone-based synthesis and natural speech are used. Five TTS systems are tested along with natural voice. The test is a paired preference test (with 19 subjects), using 7 sentences. The results indicate that natural speech consistently ranks first (with an average preference rate of 80%), followed by a selection-based system (72%) and a diphone-based system (58%). However, rather large variations in judgements are observed among subjects and sentences, and in some cases synthetic speech is preferred to natural speech. These results show the remarkable improvement achieved by the best selection-based synthesis systems in terms of prosody. In this way, a new paradigm for evaluation of the prosodic component of TTS systems has been successfully demonstrated.
Keywords text-to-speech synthesis, evaluation, prosody, prosodic transplantation
Full paper A joint prosody evaluation of French text-to-speech synthesis systems
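
Illustration The abstract reports average preference rates from a paired preference test but does not state how they were computed. The following is a minimal sketch under one plausible assumption: a system's preference rate is the share of the pairs it appeared in where it was the preferred member. The function name, the system labels and the toy judgements are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def preference_rates(judgements):
    """Return the preference rate of each system.

    judgements: iterable of (preferred, other) pairs, one per trial.
    Assumption: rate = trials won / trials the system took part in.
    """
    wins = defaultdict(int)
    trials = defaultdict(int)
    for preferred, other in judgements:
        wins[preferred] += 1
        trials[preferred] += 1
        trials[other] += 1
    return {system: wins[system] / trials[system] for system in trials}


# Hypothetical toy judgements (not data from the paper).
toy_judgements = [
    ("natural", "system_A"),
    ("natural", "system_B"),
    ("system_A", "system_B"),
    ("system_A", "natural"),   # a synthetic stimulus preferred to natural speech
    ("natural", "system_B"),
    ("system_B", "system_A"),
]

rates = preference_rates(toy_judgements)
for system, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{system}: {rate:.0%}")
```

Grouping the judgements by subject or by sentence before calling the function would give the per-subject and per-sentence breakdowns needed to inspect the variation the abstract mentions.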