The NIST Meeting Room Pilot Corpus
John S. Garofolo (1), Christophe D. Laprun (1, 2), Martial Michel (1, 2), Vincent M. Stanford (1), Elham Tabassi (1)
(1) National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg MD 20899, USA; (2) Systems Plus Inc., 1370 Piccard Drive, Suite 270, Rockville, MD 20850, USA
One of the next big challenges in Automatic Speech Recognition (ASR) is the transcription of speech in meetings. This task is particularly problematic for current recognition technologies because, in most realistic meeting scenarios, the vocabularies are unconstrained, the speech is spontaneous and often overlapping, and the microphones are inconspicuously placed. To support the development of meeting recognition technologies by both the speech recognition and video extraction research communities, NIST is providing a development and evaluation infrastructure that includes a multi-media corpus of audio and video from meetings collected at NIST using a variety of microphones and video cameras, along with new evaluation protocols, metrics, software, and rich transcription conventions. NIST is also sponsoring evaluations and workshops, facilitating multi-site data pooling, and helping bring the community together to focus on the technical challenges. To date, NIST has collected a pilot corpus of 15 hours of meetings in its specially instrumented Meeting Data Collection Laboratory. The corpus includes digital recordings from close-talking mics, lapel mics, and distantly placed mics, 5 digitally recorded camera views, and full speaker/word-level transcripts. This data is being used in the development and evaluation of speech technologies and by the video extraction community under the auspices of the ARDA Video Analysis and Content Exploitation (VACE) program.
meeting, corpus, evaluation, speech recognition, video extraction, data collection