Duration Modeling for Turkish Text-to-Speech Synthesis System
Ö. Öztürk (1), Ö. Salor (2), T. Çiloğlu (2), M. Demirekler (2)
(1) Dept. of Electrical and Electronics Eng., Dokuz Eylul Univ., Izmir, Turkey; (2) Dept. of Electrical and Electronics Eng., Middle East Tech. Univ., Ankara, Turkey
The naturalness of synthetic speech depends on appropriate modeling of its prosodic aspects. Three prosodic components are typically modeled: segmental duration, pitch contour, and intensity. In this study, we present our work on modeling segmental duration in Turkish using machine-learning algorithms. The models predict phone durations from attributes such as phone identity, neighboring phone identities, lexical stress, position of the syllable in the word, part-of-speech information, word length in syllables, and position of the word in the utterance. The resulting models predict segment durations better than mean duration approximations.
TTS, duration modeling, machine-learning, Turkish
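
As an illustration of the comparison described in the abstract, the following minimal Python sketch contrasts a per-phone mean duration approximation with a predictor that also conditions on two of the listed attributes (lexical stress and syllable position). It is not the models used in this study: the context-conditioned lookup table merely stands in for a learned model, and the class `PhoneSample`, the function names, and all duration values are hypothetical and made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class PhoneSample:
    """One phone token with a subset of the attributes named in the abstract."""
    phone: str          # phone identity
    prev_phone: str     # left neighboring phone ("#" marks a boundary)
    next_phone: str     # right neighboring phone
    stressed: bool      # lexical stress on the containing syllable
    syll_pos: str       # position of syllable in word: "initial", "medial", "final"
    duration_ms: float  # observed duration (toy values, not measured data)


def group_means(samples, key_fn):
    """Return the mean duration for each key produced by key_fn."""
    sums, counts = defaultdict(float), defaultdict(int)
    for s in samples:
        key = key_fn(s)
        sums[key] += s.duration_ms
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}


def context_key(s):
    """Context used by the illustrative predictor: phone, stress, syllable position."""
    return (s.phone, s.stressed, s.syll_pos)


def predict_context(s, context_means, phone_means, global_mean):
    """Use the context mean if seen in training, else back off to coarser means."""
    if context_key(s) in context_means:
        return context_means[context_key(s)]
    return phone_means.get(s.phone, global_mean)


def rmse(predictions, targets):
    """Root-mean-square error between predicted and observed durations."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets))


# Toy training and test sets (assumed values, for illustration only).
train = [
    PhoneSample("a", "k", "r", True, "final", 110.0),
    PhoneSample("a", "k", "r", True, "final", 130.0),
    PhoneSample("a", "b", "l", False, "initial", 70.0),
    PhoneSample("a", "b", "l", False, "initial", 90.0),
    PhoneSample("k", "#", "a", False, "initial", 60.0),
    PhoneSample("k", "#", "a", False, "initial", 80.0),
]
test = [
    PhoneSample("a", "t", "n", True, "final", 125.0),
    PhoneSample("a", "d", "m", False, "initial", 75.0),
]

phone_means = group_means(train, lambda s: s.phone)   # mean duration approximation
context_means = group_means(train, context_key)       # attribute-conditioned means
global_mean = sum(s.duration_ms for s in train) / len(train)

targets = [s.duration_ms for s in test]
baseline_rmse = rmse([phone_means.get(s.phone, global_mean) for s in test], targets)
context_rmse = rmse(
    [predict_context(s, context_means, phone_means, global_mean) for s in test], targets
)
print(baseline_rmse, context_rmse)
```

On this toy data the attribute-conditioned predictor has the lower error, which mirrors the kind of comparison reported in the abstract but says nothing about the actual magnitude of the improvement on real Turkish speech.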