Using the Spoken Dutch Corpus for type-logical grammar induction
Michael Moortgat (Utrecht Institute of Linguistics – OTS, Trans 10, 3512 JK Utrecht, The Netherlands)
Richard Moot (Utrecht Institute of Linguistics – OTS, Trans 10, 3512 JK Utrecht, The Netherlands)
WP1: Corpora & Corpus Tools
The dependency-based annotation format of the Spoken Dutch Corpus (CGN) project (van der Wouden et al., 2002) was designed to enable a transparent mapping to the derivational structures of current ‘lexicalized’ grammar formalisms. Through such translations, the CGN treebank can be used to train and evaluate computational grammars within these frameworks. In this paper we use the computational facilities of the Grail system (Moot, 2002) to extract type-logical grammars from the CGN annotation graphs. Grail is a general grammar development environment for type-logical categorial grammars (TLG); its parsing engine combines proof net technology with structural rewriting.
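To make the idea of extracting a lexicalized grammar from dependency annotation concrete, the following is a minimal sketch in Python. It is not the Grail system's procedure, nor the extraction algorithm of this paper. It assumes a simplified tree (not the full CGN annotation graph) in which every phrase has exactly one daughter labelled `hd`, and it adopts one simple strategy: the head becomes a functor over its sisters, and every non-head daughter receives its own category label as an atomic type. The `Node` class, the functions `functor_type` and `extract`, the lowercase category names, and the example sentence are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A node in a simplified dependency tree (illustrative, not the CGN format)."""
    rel: str                      # dependency label on the edge from the parent
    cat: str                      # category or part-of-speech label
    word: Optional[str] = None    # filled in for leaves only
    children: List["Node"] = field(default_factory=list)


def functor_type(result: str, lefts: List[str], rights: List[str]) -> str:
    """Build a Lambek-style type for a head.

    A\\B looks for an A to its left, B/A looks for an A to its right.
    Assumption of this sketch: the head combines with its right sisters
    first (nearest first), then with its left sisters (nearest first).
    """
    t = result
    for arg in lefts:             # farthest left sister is wrapped innermost
        t = f"({arg}\\{t})"
    for arg in reversed(rights):  # nearest right sister ends up outermost
        t = f"({t}/{arg})"
    return t


def extract(node: Node, assigned: str, lexicon: List[Tuple[str, str]]) -> None:
    """Collect (word, type) pairs, given the type assigned to `node`."""
    if node.word is not None:
        lexicon.append((node.word, assigned))
        return
    heads = [i for i, c in enumerate(node.children) if c.rel == "hd"]
    if len(heads) != 1:
        # Headless or multi-headed structures are outside this sketch.
        raise ValueError(f"expected exactly one head under '{node.cat}'")
    h = heads[0]
    lefts = [c.cat for c in node.children[:h]]
    rights = [c.cat for c in node.children[h + 1:]]
    for i, child in enumerate(node.children):
        if i == h:
            extract(child, functor_type(assigned, lefts, rights), lexicon)
        else:
            extract(child, child.cat, lexicon)


# Hypothetical example: "Jan leest een boek" ("Jan reads a book").
tree = Node("top", "s", children=[
    Node("su", "np", word="Jan"),
    Node("hd", "v", word="leest"),
    Node("obj1", "np", children=[
        Node("det", "det", word="een"),
        Node("hd", "n", word="boek"),
    ]),
])

lexicon: List[Tuple[str, str]] = []
extract(tree, "s", lexicon)
for word, typ in lexicon:
    print(f"{word} :: {typ}")
```

Under these assumptions the transitive verb `leest` receives the type `((np\s)/np)`, while `boek` receives `(det\np)`. The second assignment is a consequence of the uniform head-as-functor strategy: a realistic extraction would need to treat determiners, modifiers, and headless or non-tree-shaped structures separately, which this sketch does not attempt.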