Summary of the paper

Title: Epitran: Precision G2P for Many Languages
Authors: David R. Mortensen, Siddharth Dalmia and Patrick Littell
Abstract: Epitran is a massively multilingual, multiple-back-end system for G2P (grapheme-to-phoneme) transduction, distributed with support for 61 languages. It takes word tokens in the orthography of a language and outputs a phonemic representation in either IPA or X-SAMPA. The main system is written in Python and is publicly available as open-source software. Its efficacy has been demonstrated in multiple research projects on language transfer, polyglot models, and speech. In one ASR task, Epitran was shown to reduce word error rate relative to Babel baselines for acoustic modeling.
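To make "orthographic word in, phonemic string out" concrete, here is a toy, self-contained sketch of table-driven G2P using greedy longest-match over a grapheme-to-IPA mapping, with an optional IPA-to-X-SAMPA step. It does not use or reproduce Epitran's own package or API. The mapping tables and the names `G2P_MAP`, `IPA_TO_XSAMPA`, `transduce`, `to_ipa` and `to_xsampa` are hypothetical and cover only a handful of symbols for illustration.

```python
# Hypothetical toy grapheme -> IPA mapping (illustration only, not Epitran data).
G2P_MAP = {
    "sh": "ʃ", "ch": "tʃ",
    "a": "a", "e": "e", "i": "i", "o": "o", "u": "u",
    "k": "k", "m": "m", "n": "n", "p": "p", "s": "s", "t": "t",
}

# Hypothetical IPA -> X-SAMPA table; symbols not listed are identical in both.
IPA_TO_XSAMPA = {"ʃ": "S", "tʃ": "tS"}


def transduce(word, mapping=G2P_MAP):
    """Greedy longest-match transduction of one word token into a list of phonemes."""
    max_len = max(len(k) for k in mapping)
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        # Try the longest grapheme sequence first, so "sh" wins over "s".
        for n in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in mapping:
                phones.append(mapping[chunk])
                i += n
                break
        else:
            raise ValueError(f"no mapping for {word[i]!r} in {word!r}")
    return phones


def to_ipa(word):
    """Return the phonemic form as an IPA string."""
    return "".join(transduce(word))


def to_xsampa(word):
    """Return the phonemic form as an X-SAMPA string."""
    return "".join(IPA_TO_XSAMPA.get(p, p) for p in transduce(word))


if __name__ == "__main__":
    print(to_ipa("sushi"), to_xsampa("sushi"))  # suʃi suSi
```

Longest-match ordering matters whenever a multigraph (such as "sh") shares a prefix with a single grapheme; real systems typically add context-sensitive rules on top of such a table.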
Topics: Phonetic Databases, Phonology, Speech Synthesis, Speech Recognition/Understanding
