Title The Fisher Corpus: A Resource for the Next Generations of Speech-to-Text
Author(s) Christopher Cieri, David Miller, Kevin Walker

University of Pennsylvania, Linguistic Data Consortium, Philadelphia, PA, USA

Session O4-S
Abstract This paper describes, within the context of the DARPA EARS program, the design and implementation of the Fisher protocol for collecting conversational telephone speech, which has yielded more than 16,000 English conversations. It also discusses the Quick Transcription specification, which allowed 2,000 hours of Fisher audio to be transcribed in less than one year. Fisher data is already in use within the DARPA EARS program and will be published via the Linguistic Data Consortium for general use beginning in 2004.
Keyword(s) Language resources, speech, speech recognition, speech to text, conversational telephone speech, data collection, transcription, quick transcription
Language(s) Arabic, English, Mandarin
