LREC 2000 2nd International Conference on Language Resources & Evaluation

Title Building a Treebank for French
Authors Abeillé Anne (TALaNa, Université Paris 7, 75251 Paris cedex 05, FRANCE)
Clément Lionel (TALaNa, Université Paris 7, 75251 Paris cedex 05, FRANCE)
Kinyon Alexandra (University of Pennsylvania, Philadelphia, USA)
Keywords Corpus Annotation, Corpus Linguistics, Parsing, Shallow Parsing, Tagging, Treebank
Session Session WO2 - Treebanks
Abstract Very few gold-standard annotated corpora are currently available for French. We present an ongoing project to build a reference treebank for French, starting from a tagged newspaper corpus of 1 million words (Abeillé et al., 1998; Abeillé and Clément, 1999). As in the Penn Treebank (Marcus et al., 1993), annotation proceeds in two phases: automatic parsing, followed by systematic manual validation and correction. As in the Prague treebank (Hajicova et al., 1998), we rely on several types of morphosyntactic and syntactic annotation, for which we define extensive guidelines. As in the Negra project (Brants et al., 1999), we annotate both constituents and functional relations. Our goal is to provide a theory-neutral, surface-oriented, error-free treebank for French.