Summary of the paper

Title: SCALE: A Scalable Language Engineering Toolkit
Authors: Joris Pelemans, Lyan Verwimp, Kris Demuynck, Hugo Van hamme and Patrick Wambacq
Abstract: In this paper we present SCALE, a new Python toolkit that contains two extensions to n-gram language models. The first extension is a novel technique to model compound words called Semantic Head Mapping (SHM). The second extension, Bag-of-Words Language Modeling (BagLM), bundles popular models such as Latent Semantic Analysis and Continuous Skip-grams. Both extensions scale to large data and allow the integration into first-pass ASR decoding. The toolkit is open source, includes working examples and can be found on
Topics: Language Modelling, Speech Recognition/Understanding, Tools, Systems, Applications
Full paper: SCALE: A Scalable Language Engineering Toolkit
Bibtex: @InProceedings{PELEMANS16.484,
  author = {Joris Pelemans and Lyan Verwimp and Kris Demuynck and Hugo Van hamme and Patrick Wambacq},
  title = {SCALE: A Scalable Language Engineering Toolkit},
  booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
  year = {2016},
  month = {may},
  date = {23-28},
  location = {Portorož, Slovenia},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Sara Goggi and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Helene Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {978-2-9517408-9-1},
  language = {english}
}