Title The Spoken Dutch Corpus. Overview and First Evaluation
Authors Oostdijk Nelleke (Dept. of Language and Speech, University of Nijmegen, P.O. Box 9103, 6500 HD Nijmegen, The Netherlands)
Keywords Annotation, Corpus Design, Dutch (spoken), Evaluation, Spoken Language Corpora
Session SP3 - Spoken Language Resources' Projects
Abstract This paper presents the Spoken Dutch Corpus project, a joint Flemish-Dutch undertaking aimed at compiling and annotating a 10-million-word corpus of spoken Dutch. Upon completion, the corpus will constitute a valuable resource for research in computational linguistics and in language and speech technology. The paper first gives an overall description of the project: its aims, structure and organization. It then discusses the considerations, both methodological and practical, that have played a role in the design of the corpus as well as in its compilation and annotation. The paper concludes with an account of the data available in the first release of the first part of the corpus, which came out on March 1st, 2000.