Title Enriching a French Treebank
Author(s) Anne Abeillé, Nicolas Barrier

Laboratoire de linguistique formelle

Abstract This paper presents the current status of the French treebank developed at Paris 7 (Abeillé et al., 2003a). The corpus comprises 1 million words from the newspaper Le Monde, fully annotated and disambiguated for parts of speech, inflectional morphology, compounds and lemmas, and syntactic constituents. It is representative of contemporary normalized written French and covers a variety of authors and subjects (economy, literature, politics, etc.), with extracts drawn from issues published between 1989 and 1993. It has been used by computational linguists to train and evaluate taggers, parsers and lemmatizers, as well as by psycholinguists to extract lexical and syntactic preferences (Pynte et al., 2001). It is now being enriched with functional information and used for parsing evaluation.
Keyword(s) French, treebank, functional tagging, evaluation
Language(s) French
