Lexicon Optimization: Maximizing Lexical Coverage in Speech Recognition through Automated Compounding


Vincent Vandeghinste (Centre for Computational Linguistics, K.U. Leuven)


SO6: Phonetic Lexicons


In this report we show that lexical coverage for Dutch speech recognition can be maximized by combining a limited lexicon of word parts with real-time lexicon expansion. More specifically, we describe how the lexicon is designed and how the real-time expansion module was built and tested. Tests were performed using a lexicon of 36,000 entries. The test results show that out-of-vocabulary rates are low, owing to automated rule-based compounding of the lexical building blocks. Statistical information was included to improve the accuracy of the rule-based compounding system, and this approach proved successful.
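
As a rough illustration of the idea only, the following Python sketch checks whether a word missing from the lexicon can be covered by compounding known word parts, and uses frequency counts to choose among competing splits. It is not the system described in this report: the toy lexicon, the made-up counts, the linking elements "s" and "en", the scoring rule (fewest parts first, then highest summed count) and the names `WORD_PARTS`, `LINKERS`, `best_split` and `coverage` are all assumptions made for the example.

```python
from functools import lru_cache

# Hypothetical toy word-part lexicon with made-up frequency counts.
# The lexicon used in the report has about 36,000 entries.
WORD_PARTS = {
    "spraak": 120,
    "herkenning": 45,
    "systeem": 200,
    "staat": 80,
    "hoofd": 60,
}

# Assumed linking elements allowed between two parts ("" means none).
LINKERS = ("", "s", "en")


def best_split(word):
    """Return the preferred split of `word` into known parts, or None.

    Assumed preference: fewest parts first, then highest summed count.
    """

    @lru_cache(maxsize=None)
    def solve(start):
        # Best (n_parts, -count_sum, parts) for word[start:], or None.
        best = None
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part not in WORD_PARTS:
                continue
            count = WORD_PARTS[part]
            if end == len(word):
                candidates = [(1, -count, (part,))]
            else:
                candidates = []
                for linker in LINKERS:
                    nxt = end + len(linker)
                    # The linker must match and leave room for another part.
                    if not word.startswith(linker, end) or nxt >= len(word):
                        continue
                    rest = solve(nxt)
                    if rest is not None:
                        candidates.append(
                            (1 + rest[0], -count + rest[1], (part,) + rest[2])
                        )
            for cand in candidates:
                if best is None or cand < best:
                    best = cand
        return best

    result = solve(0)
    return None if result is None else list(result[2])


def coverage(words):
    """Fraction of `words` that can be built from the word-part lexicon."""
    if not words:
        return 0.0
    return sum(best_split(w) is not None for w in words) / len(words)
```

With this toy lexicon, `best_split("staatshoofd")` returns `["staat", "hoofd"]` by consuming the linking "s", while `best_split("fiets")` returns `None` because no part matches.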


Compounding, Lexical coverage, Speech recognition lexicon
