Designing Prosodic Databases for Automatic Modeling of Slovenian Language in a Multilingual TTS System


Achim F. Müller (Siemens Corporate Technology, Dept. CTIC 5, 81730 Munich, Germany)

Janez Sterga (University of Maribor, Fact. for EE and Comp. Science Maribor, Slovenia)

Bogomir Horvat (University of Maribor, Fact. for EE and Comp. Science Maribor, Slovenia)


SP1: Speech Resources


This paper presents the design of a prosodic database and the data-driven prediction of phrase breaks for modeling the Slovenian language in a multilingual text-to-speech (TTS) system. Automatic learning techniques offer a way to adapt prosodic models to a new language, voice, or application, because they allow prosodic regularities to be extracted automatically from a prosodic database of natural speech. Such techniques depend on the construction of a large corpus labeled with symbolic prosody labels. The labeling can be done either automatically or by hand. Automatic labeling can be less accurate than hand labeling, whereas hand labeling is very time-consuming. Therefore, an interactive tool for semi-automatic labeling is presented that takes the segmented spoken counterpart of the text as input. The tool combines the advantages of hand labeling and automatic labeling: it achieves high labeling consistency while reducing the time that hand labeling would require. The labeled Slovenian corpus has been used to train our phrase break prediction module, and experiments on the data-driven prediction of major and minor phrase break labels have been performed. The achieved prediction accuracy represents the state of the art in phrase break prediction for the Slovenian language.


