Summary of the paper

Title A Corpus of Read and Spontaneous Upper Saxon German Speech for ASR Evaluation
Authors Robert Herms, Laura Seelig, Stefanie Münch and Maximilian Eibl
Abstract In this paper we present a corpus named SXUCorpus which contains read and spontaneous speech of the Upper Saxon German dialect. The data has been collected from eight archives of local television stations located in the Free State of Saxony. The recordings cover broadcast topics of news, economy, weather, sport, and documentation from the years 1992 to 1996 and have been manually transcribed and labeled. We report the methodology for collecting and processing analog audiovisual material and for constructing the corpus, and we describe the properties of the data. In its current version, the corpus is available to the scientific community and is designed for automatic speech recognition (ASR) evaluation with a development set and a test set. We performed ASR experiments on the dataset with the open-source framework sphinx-4, including a configuration for Standard German. Additionally, we show the influence of acoustic model and language model adaptation using the development set.
Topics Corpus (Creation, Annotation, etc.), Speech Recognition/Understanding, Acquisition
Full paper A Corpus of Read and Spontaneous Upper Saxon German Speech for ASR Evaluation
Bibtex @InProceedings{HERMS16.548,
  author = {Robert Herms and Laura Seelig and Stefanie Münch and Maximilian Eibl},
  title = {A Corpus of Read and Spontaneous Upper Saxon German Speech for ASR Evaluation},
  booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
  year = {2016},
  month = {may},
  date = {23-28},
  location = {Portorož, Slovenia},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Sara Goggi and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Helene Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {978-2-9517408-9-1},
  language = {english}
 }