Summary of the paper

Title TED-LIUM: an Automatic Speech Recognition dedicated corpus
Authors Anthony Rousseau, Paul Deléglise and Yannick Estève
Abstract This paper presents the corpus developed by the LIUM for Automatic Speech Recognition (ASR), based on the TED Talks. This corpus was built during the IWSLT 2011 Evaluation Campaign, and is composed of 118 hours of speech with its accompanying automatically aligned transcripts. We describe the content of the corpus, how the data was collected and processed, how it will be publicly available and how we built an ASR system using this data leading to a WER score of 17.4 %. The official results we obtained at the IWSLT 2011 evaluation campaign are also discussed.
Topics Corpus (creation, annotation, etc.), Speech Recognition/Understanding, Speech resource/database
Bibtex @InProceedings{ROUSSEAU12.698,
  author = {Anthony Rousseau and Paul Deléglise and Yannick Estève},
  title = {TED-LIUM: an Automatic Speech Recognition dedicated corpus},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
}