Creating Slovenian Language Resources for Development of Speech-to-Speech Translation Components
Darinka Verdonik, Matej Rojc, Zdravko Kačič
University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova ul. 17, Maribor, Slovenia
This article describes in detail the procedures used to build the Slovenian lexica within the LC-STAR project and reports the size of those lexica. The University of Maribor joined the LC-STAR project in order to provide appropriate language resources for developing speech-to-speech translation technology for the Slovenian language. The lexica consist of three parts: 65,000 common words, 45,000 proper names, and 6,000 special application domain words. All lexica will be morpho-syntactically tagged and phonetically transcribed. The quality of the produced language resources is ensured by independent validation.
speech-to-speech translation, Slovenian, LC-STAR, POS, lexica, word list, proper names, common words