Linguistic annotation of the Spoken Dutch Corpus: If we had to do it all over again ...


Ineke Schuurman (1), Wim Goedertier (2), Heleen Hoekstra (3), Nelleke Oostdijk (4), Richard Piepenbrock (4), Machteld Schouppe (1)

(1) Center for Computational Linguistics, University of Leuven, Maria-Theresiastraat 21, 3000 Leuven, Belgium (ineke.schuurman,machteld.schouppe@ccl.kuleuven.ac.be); (2) Electronics and Information Systems (ELIS), University of Ghent, Sint-Pietersnieuwstraat 41, 9000 Gent, Belgium (odul@elis.ugent.be); (3) Utrecht Institute of Linguistics OTS, University of Utrecht, Trans 10, 3512 JK Utrecht, The Netherlands (heleen.hoekstra@let.uu.nl); (4) Department of Language and Speech, University of Nijmegen, P.O. Box 9103, 6500 HD Nijmegen, The Netherlands (n.oostdijk,r.piepenbrock@let.kun.nl)




After the successful completion of the Spoken Dutch Corpus (1998 -- 2003), the time is ripe to sit back and reflect on our achievements and the procedures underlying them, in order to learn from our experiences. In this paper we pay particular attention to issues affecting the levels of linguistic annotation, but some more general issues (bug reporting, consistency) deserve to be treated as well. We try to come up with solutions, but in some cases we instead invite further discussion from other researchers.


creation of LR, (bi)national action, spoken language, dialogues, linguistic annotation


