|LREC 2000: 2nd International Conference on Language Resources & Evaluation
|The Spoken Dutch Corpus. Overview and First Evaluation
|Oostdijk, Nelleke (Dept. of Language and Speech, University of Nijmegen, P.O. Box 9103, 6500 HD Nijmegen, The Netherlands, firstname.lastname@example.org)
|Annotation, Corpus Design, Dutch (spoken), Evaluation, Spoken Language Corpora
|Session SP3 - Spoken Language Resources' Projects
|This paper presents the Spoken Dutch Corpus project, a joint Flemish-Dutch undertaking aimed at compiling and annotating a 10-million-word corpus of spoken Dutch. Upon completion, the corpus will constitute a valuable resource for research in computational linguistics and in language and speech technology. The paper first gives an overall description of the project: its aims, structure, and organization. It then discusses the methodological and practical considerations that have played a role in the design of the corpus as well as in its compilation and annotation. The paper concludes with an account of the data available in the first release of the first part of the corpus, which came out on March 1st, 2000.