Title Cross-lingual Interpolation of Speech Recognition Models
Authors Micca Giorgio (CSELT, Via G. Reiss Romoli 274, 10148 Torino, Italia)
Frasca Alessandra (Università di Roma “La Sapienza” Italia)
Di Benedetto Maria Gabriella (Università di Roma “La Sapienza” Italia)
Keywords Cross-Lingual, Multi-Lingual, Speech Recognition
Session Session SP4 - Tools for Evaluation and Processing of Spoken Language Resources
Abstract A method is proposed for the cross-lingual porting of recognition models, enabling rapid prototyping of speech recognisers in new target languages, specifically when collecting large speech corpora for training would be economically questionable. The paper describes how to build a multilingual model that covers the phonetic structure of all its constituent languages and that can be used to interpolate the recognition units of a different language. The CTSU (Classes of Transitory-Stationary Units) approach is used to derive a well-balanced set of recognition models as a reasonable trade-off between precision and trainability. The phonemes of the untrained language are mapped onto the multilingual inventory of recognition units, and the corresponding CTSUs are then obtained. The procedure was tested on a preliminary set of 10 Rumanian speakers, starting from an Italian-English-Spanish CTSU model. The optimal mapping of the Rumanian vowel phone set onto the multilingual phone set was obtained by inspecting the F1 and F2 formants of vowel sounds from two male and female Rumanian speakers and comparing them with the F1 and F2 values of the other three languages. Recognition results, in terms of word accuracy measured on a preliminary test set of 10 speakers, are reported.
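The vowel-mapping step can be illustrated with a minimal sketch. The abstract states only that the mapping was obtained by inspecting and comparing F1 and F2 values; the nearest-neighbour rule in the F1-F2 plane used below is an assumption made for illustration, not the authors' stated procedure. All formant values, vowel labels, and the names `MULTILINGUAL_VOWELS`, `TARGET_VOWELS`, `nearest_unit` and `map_vowels` are hypothetical placeholders, not data or identifiers from the paper.

```python
import math

# Hypothetical (F1, F2) centroids in Hz for a multilingual vowel inventory.
# Placeholder values for illustration only; NOT measurements from the paper.
MULTILINGUAL_VOWELS = {
    "i": (300.0, 2300.0),
    "e": (450.0, 2000.0),
    "a": (750.0, 1300.0),
    "o": (500.0, 900.0),
    "u": (320.0, 800.0),
}

# Hypothetical (F1, F2) values for vowels of the untrained target language.
TARGET_VOWELS = {
    "a_target": (720.0, 1350.0),
    "central_target": (480.0, 1500.0),
}


def nearest_unit(f1, f2, inventory):
    """Return the inventory symbol whose (F1, F2) point is closest
    to (f1, f2) in plain Euclidean distance (assumed criterion)."""
    return min(
        inventory,
        key=lambda sym: math.hypot(inventory[sym][0] - f1, inventory[sym][1] - f2),
    )


def map_vowels(target_vowels, inventory):
    """Map each target-language vowel label to its nearest inventory symbol."""
    return {
        label: nearest_unit(f1, f2, inventory)
        for label, (f1, f2) in target_vowels.items()
    }


mapping = map_vowels(TARGET_VOWELS, MULTILINGUAL_VOWELS)
for label, unit in mapping.items():
    print(f"{label} -> {unit}")
```

Raw distances in Hz weight F2 differences more heavily than F1 differences because F2 spans a wider range; a perceptual scale or per-speaker normalisation might be preferable in practice, but neither is described in the abstract.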