Designing speech database with prosodic variety for expressive TTS system
Hiromichi KAWANAMI (Graduate School of Information Science, Nara Institute of Science and Technology)
Tsuyoshi MASUDA (Graduate School of Information Science, Nara Institute of Science and Technology)
Tomoki TODA (Graduate School of Information Science, Nara Institute of Science and Technology)
Kiyohiro SHIKANO (Graduate School of Information Science, Nara Institute of Science and Technology)
SO9: Emotional & Specific Databases
To build a speech synthesis system that can generate high-quality speech over a wide prosodic range and realize fine prosody control, we propose a new method for constructing speech databases. As the synthesis method, we adopt a hybrid system consisting of two parts: speech unit selection and prosody modification by STRAIGHT (a vocoder-type, high-quality analysis-synthesis method). Our design principle is to reduce the amount of prosody modification, which causes quality deterioration. Hence, to make it possible to generate arbitrary prosody within the permissible range of prosody modification, we designed 9 sub-databases that consist of the same phonetically balanced text set read with different prosody. In this paper, we report the design method and the general features of the resulting databases. Listening tests focused on durational features were also conducted. The results show the effectiveness of the method and the need to change the unit selection cost according to speech rate.
Prosody, Speech database, Speech synthesis, Analysis-Synthesis, Waveform concatenation
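The design idea in the abstract, choosing source speech whose prosody is already close to the target so that little modification is needed, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 3 x 3 grid of F0 level and speech rate, the ratio values, the permissible range `MAX_LOG_RATIO`, and the function names are all hypothetical.

```python
import math
from itertools import product

# Hypothetical prosody grid (assumption, not from the paper): each sub-database
# is read at one (F0 level, speech rate) combination, expressed as a ratio
# relative to a neutral reading.
F0_LEVELS = {"low": 0.8, "mid": 1.0, "high": 1.25}
RATE_LEVELS = {"slow": 0.8, "normal": 1.0, "fast": 1.25}

# Hypothetical permissible modification per dimension, as a log-ratio.
MAX_LOG_RATIO = math.log(1.15)

# 3 x 3 = 9 sub-databases keyed by (F0 name, rate name).
SUB_DATABASES = {
    (f0_name, rate_name): (f0, rate)
    for (f0_name, f0), (rate_name, rate) in product(
        F0_LEVELS.items(), RATE_LEVELS.items()
    )
}


def log_ratios(target, source):
    """Absolute log-ratio between target and source for each prosodic dimension."""
    return tuple(abs(math.log(t / s)) for t, s in zip(target, source))


def pick_sub_database(target_f0, target_rate):
    """Return (name, per-dimension modification, within permissible range?)."""
    target = (target_f0, target_rate)
    # Choose the sub-database with the smallest total modification.
    name, source = min(
        SUB_DATABASES.items(),
        key=lambda item: sum(log_ratios(target, item[1])),
    )
    amounts = log_ratios(target, source)
    return name, amounts, max(amounts) <= MAX_LOG_RATIO


if __name__ == "__main__":
    print(pick_sub_database(1.2, 0.85))
```

With these assumed values, neighbouring grid points are close enough that any target between the extreme levels falls within the permissible range of some sub-database, while a target far outside the grid is flagged as requiring excessive modification.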