Summary of the paper

Title The ATCOSIM Corpus of Non-Prompted Clean Air Traffic Control Speech
Authors Konrad Hofbauer, Stefan Petrik and Horst Hering
Abstract Air traffic control (ATC) is based on voice communication between pilots and controllers and uses a highly task- and domain-specific language. For this reason, spoken language technologies for ATC require domain-specific corpora, of which only a few exist to this day. The ATCOSIM Air Traffic Control Simulation Speech corpus is a speech database of non-prompted and clean ATC operator speech. It consists of ten hours of speech data, which were recorded in typical ATC control room conditions during ATC real-time simulations. The database includes orthographic transcriptions and additional information on speakers and recording sessions. The ATCOSIM corpus is publicly available and provided online free of charge. In this paper, we first give an overview of ATC-related corpora and their shortcomings. We then show the difficulties in obtaining operational ATC speech recordings and propose the use of existing ATC real-time simulations. We describe the recording, transcription, production and validation process of the ATCOSIM corpus, and outline an application example for automatic speech recognition in the ATC domain.
Language Single language
Topics Speech resource/database, Corpus (creation, annotation, etc.), Controlled languages
Full paper The ATCOSIM Corpus of Non-Prompted Clean Air Traffic Control Speech
Slides -
Bibtex @InProceedings{HOFBAUER08.545,
  author = {Konrad Hofbauer and Stefan Petrik and Horst Hering},
  title = {The ATCOSIM Corpus of Non-Prompted Clean Air Traffic Control Speech},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {},
  language = {english}
}
