Automatic Detection of Acoustic Centres of Reliability for Tagging Paralinguistic Information in Expressive Speech


Parham Mokhtari (JST-CREST ESP Project, ATR Human Information Science Laboratories, Kyoto, Japan)

Nick Campbell (JST-CREST ESP Project, ATR Human Information Science Laboratories, Kyoto, Japan)


SO9: Emotional & Specific Databases


Preparing a unit-database for concatenative speech synthesis demands sufficiently robust, unsupervised algorithms for processing typically huge corpora. The demands are even more stringent for a corpus large enough to capture a wide variety of speaking-styles and emotions, even from a single speaker. This paper describes a method that combines robust acoustic-prosodic and cepstral analyses to locate centres of acoustic-phonetic reliability in the speech stream, at which physiologically meaningful parameters related to voice quality can be estimated more reliably. These parameters, which describe the state of glottal phonation and of supralaryngeal articulation, can then provide a paralinguistic annotation of the unit-database, thereby enabling speech synthesis with a greater variety of expressions and speaking-styles.
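To make the idea of a "centre of reliability" more concrete, the following is a minimal sketch of one possible way to combine prosodic and cepstral cues, not the authors' algorithm. It assumes that per-frame cepstral vectors, frame energies in dB, and voicing decisions have already been computed, and it treats a frame as a candidate centre when it is voiced, loud enough, and sits at a local minimum of spectral change (mean cepstral distance to its neighbours). The function names, the local-minimum criterion, and the -30 dB energy threshold are all hypothetical illustrations.

```python
import math
from typing import List, Sequence


def cepstral_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two cepstral vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def spectral_change(ceps: Sequence[Sequence[float]]) -> List[float]:
    """Per-frame spectral change: mean cepstral distance to the previous and next frames."""
    n = len(ceps)
    change = []
    for i in range(n):
        dists = []
        if i > 0:
            dists.append(cepstral_distance(ceps[i], ceps[i - 1]))
        if i < n - 1:
            dists.append(cepstral_distance(ceps[i], ceps[i + 1]))
        # A single isolated frame has no neighbours, so its change is defined as 0.
        change.append(sum(dists) / len(dists) if dists else 0.0)
    return change


def centres_of_reliability(
    ceps: Sequence[Sequence[float]],
    energy_db: Sequence[float],
    voiced: Sequence[bool],
    min_energy_db: float = -30.0,  # hypothetical threshold, for illustration only
) -> List[int]:
    """Indices of voiced, sufficiently loud frames at local minima of spectral change.

    Hypothetical criterion: spectrally stable frames inside voiced, high-energy
    regions are taken as places where voice-quality parameters are easier to estimate.
    """
    change = spectral_change(ceps)
    n = len(ceps)
    centres = []
    for i in range(n):
        # Prosodic gate: skip unvoiced or quiet frames.
        if not voiced[i] or energy_db[i] < min_energy_db:
            continue
        left = change[i - 1] if i > 0 else math.inf
        right = change[i + 1] if i < n - 1 else math.inf
        # Strict on the left, non-strict on the right, so a flat plateau
        # yields only its first frame rather than every frame.
        if change[i] < left and change[i] <= right:
            centres.append(i)
    return centres
```

For example, with one-dimensional cepstra `[[0.0], [2.0], [2.1], [2.2], [5.0]]`, all frames voiced and at 0 dB, `centres_of_reliability` returns `[2]`: the middle frame of the stable stretch. A real system would use multi-dimensional cepstra and would likely smooth the change contour before picking minima; those choices are left open here.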


Centres-of-reliability, Speech synthesis, Speaking-style, Annotation, Paralinguistic

