LREC 2018 Proceedings

Summary of the paper

Title	Epitran: Precision G2P for Many Languages
Authors	David R. Mortensen, Siddharth Dalmia and Patrick Littell
Abstract	Epitran is a massively multilingual, multiple back-end system for G2P (grapheme-to-phoneme) transduction which is distributed with support for 61 languages. It takes word tokens in the orthography of a language and outputs a phonemic representation in either IPA or X-SAMPA. The main system is written in Python and is publicly available as open source software. Its efficacy has been demonstrated in multiple research projects relating to language transfer, polyglot models, and speech. In a particular ASR task, Epitran was shown to improve the word error rate over Babel baselines for acoustic modeling.
Topics	Phonetic Databases, Phonology, Speech Synthesis, Speech Recognition/Understanding
Bibtex	@InProceedings{MORTENSEN18.890,
  author = {David R. Mortensen and Siddharth Dalmia and Patrick Littell},
  title = "{Epitran: Precision G2P for Many Languages}",
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {May 7-12, 2018},
  address = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {979-10-95546-00-9},
  language = {english}
}
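
The abstract describes G2P transduction as taking a word token in a language's orthography and returning a phonemic representation in IPA. The sketch below illustrates that general idea only, using a greedy longest-match lookup over a small mapping table. It is not Epitran's implementation or API: the function name `toy_g2p`, the table `TOY_MAP`, and the loosely Spanish-like rules in it are all hypothetical and heavily simplified.

```python
# Toy grapheme-to-phoneme (G2P) sketch: greedy longest-match table lookup.
# TOY_MAP is a hypothetical, simplified mapping made up for illustration;
# it is NOT taken from Epitran's language data.
TOY_MAP = {
    "ch": "tʃ",  # digraph, must win over the single letter "c"
    "c": "k",
    "h": "",     # treated as silent in this toy table
    "j": "x",
    "ñ": "ɲ",
    "n": "n",
    "a": "a",
    "e": "e",
    "i": "i",
    "o": "o",
    "u": "u",
}


def toy_g2p(word, mapping=TOY_MAP):
    """Map an orthographic word to an IPA-like string using `mapping`."""
    word = word.lower()
    max_len = max(len(key) for key in mapping)
    output = []
    i = 0
    while i < len(word):
        # Try the longest candidate grapheme first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in mapping:
                output.append(mapping[chunk])
                i += size
                break
        else:
            # No rule matched: pass the character through unchanged.
            output.append(word[i])
            i += 1
    return "".join(output)


if __name__ == "__main__":
    for token in ["chico", "hoja", "niño"]:
        print(token, "->", toy_g2p(token))
```

Trying the longest grapheme first is what lets a digraph such as "ch" take priority over the single letter "c". A system covering many languages would need per-language tables and context-sensitive rules, which this sketch leaves out.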