Cypriot Speech Database: Data Collection and Greek to Cypriot Dialect Adaptation


Nikos Fakotakis

Wire Communications Laboratory, Electrical and Computer Engineering Dept., University of Patras, 265 00 Rion, Patras, Greece




This paper describes the Cypriot Greek speech database collected within the European project OrienTel (IST-2000-28373) and the acoustic model adaptation techniques applied to perform dialect adaptation from Greek to Cypriot Greek. Standard Greek and Cypriot Greek share the same phoneme set, but some phonemes are pronounced differently, so Cypriot Greek may be considered a variant of Standard Greek. Utterances from 500 speakers are used: 450 for training (that is, for performing adaptation) and 50 for testing. Two tools are available for training, adaptation, and evaluation of the acoustic models: the Wire Communications Laboratory (WCL) recognition tool and the Hidden Markov Model Toolkit (HTK). For both recognition engines, Greek acoustic models trained on the SpeechDat-II Greek telephone database were already available. Two well-known techniques are applied to adapt the Greek acoustic models to the new data: Maximum Likelihood Linear Regression (MLLR) and Maximum A Posteriori (MAP) adaptation. For comparison with the adapted models, pure Cypriot Greek models are also trained using only the Cypriot Greek database. Preliminary results show a small improvement in the performance of the adapted models over both the pure Greek and the pure Cypriot Greek models.
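As an illustration of the two adaptation techniques named above, the following minimal Python sketch shows the textbook update of a single Gaussian mean under MAP and under an MLLR mean transform. It is not the WCL or HTK implementation: the function names, the prior weight `tau`, and the toy data are assumptions made for this example, and in practice the frame posteriors and the MLLR matrix are estimated from the adaptation data rather than supplied by hand.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    posteriors: Sequence[float],
    tau: float = 10.0,
) -> List[float]:
    """MAP update of one Gaussian mean.

    mu_hat = (tau * mu_prior + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)

    tau is an assumed prior weight: larger values keep the adapted mean
    closer to the original (Greek) model when adaptation data are scarce.
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)  # total soft count for this Gaussian
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


def mllr_transform_mean(
    mean: Sequence[float],
    matrix: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """MLLR mean transform: mu_hat = A * mu + b.

    The same (A, b) is shared by every Gaussian in a regression class,
    which is why MLLR can adapt Gaussians that were never observed.
    """
    return [
        sum(matrix[i][j] * mean[j] for j in range(len(mean))) + bias[i]
        for i in range(len(bias))
    ]


if __name__ == "__main__":
    # Toy example: two adaptation frames, each fully assigned to the Gaussian.
    adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], [1.0, 1.0], tau=2.0)
    print(adapted)
    shifted = mllr_transform_mean([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5])
    print(shifted)
```

The sketch reflects the usual trade-off between the two methods: MAP updates only the Gaussians that receive adaptation data and moves toward the data-driven estimate as their occupancy grows, whereas MLLR moves all Gaussians in a class with one shared transform.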


speech databases, dialect adaptation

Language(s): Cypriot Greek, Greek
