LREC 2000 2nd International Conference on Language Resources & Evaluation

Title GéDériF: Automatic Generation and Analysis of Morphologically Constructed Lexical Resources
Authors Namer Fiammetta (LANDISCO, Université de Nancy 2 - 23 Bd Albert 1er - BP 3397 - 54015 Nancy cedex. France, namer@clsh.univ-nancy2.fr)
Dal Georgette (UMR 8528 «SILEX», CNRS & Université de Lille 3 - BP 149 - 59653 Villeneuve d’Ascq cedex. France, dal@univ-lille3.fr)
Keywords Automatic Generation of Constructed Words, Derivational Morphology, Lexical Resources, Rule-Based Stemming
Session Session WO18 - Morphology in Lexical and Textual Resources
Full Paper 279.ps, 279.pdf
Abstract One of the most frequent problems in text retrieval is the large number of words encountered that are not listed in general-language dictionaries. However, these words are very often morphologically complex, and as such have a meaning that is predictable from their structure. Furthermore, such words typically belong to specialized language uses (e.g. scientific, philosophical or media technolects). Consequently, tools for listing and analysing such words can help enrich a terminological database. The purpose of this paper is to present a system that automatically generates morphologically complex French lexical items that are not listed in dictionaries, and that also provides a structural and semantic analysis of these items. The output of this system is a morphological database (currently in progress) that forms a powerful lexical resource, which will be very useful in Natural Language Processing (NLP) and Information Retrieval (IR) applications. Indeed, the system automatically generates a potentially infinite set of complex (derived) lexical units (henceforth CLUs), each associated with a rich array of morpho-semantic features, and is thus capable of dealing with morphologically complex structures that are unlisted in dictionaries.
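The abstract describes two directions of processing: generating CLUs from a base word together with a structural and semantic annotation, and analysing an unlisted complex word back into its base (the rule-based stemming named in the keywords). The Python sketch that follows illustrates both directions under stated assumptions: the `Rule` and `CLU` types, the two suffixation rules (-able and -ation applied to French -er verbs), and the English glosses are all hypothetical, chosen for illustration, and are not GéDériF's actual rule formalism. Allomorphy (e.g. manger → mangeable) is deliberately not handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Rule:
    """Hypothetical suffixation rule: a base of one category yields a derived word."""
    name: str
    base_pos: str      # part of speech required of the base (e.g. "V")
    base_ending: str   # ending removed from the base (e.g. infinitive "er")
    suffix: str        # suffix added to the stem
    derived_pos: str   # part of speech of the derived word
    gloss: str         # semantic template; {base} is filled with the base lexeme


@dataclass(frozen=True)
class CLU:
    """A complex lexical unit with its structure and predicted meaning."""
    form: str
    pos: str
    structure: str
    meaning: str


# Illustrative rule set (an assumption, not taken from the paper).
RULES = [
    Rule("able", "V", "er", "able", "A", "that can undergo the action of '{base}'"),
    Rule("ation", "V", "er", "ation", "N", "the action of '{base}'"),
]


def derive(base: str, pos: str, rules: List[Rule]) -> List[CLU]:
    """Generate every CLU that the rules allow for a given base."""
    results = []
    for rule in rules:
        if pos != rule.base_pos or not base.endswith(rule.base_ending):
            continue
        # Strip the base ending (slice by length so an empty ending is safe).
        stem = base[: len(base) - len(rule.base_ending)]
        results.append(CLU(
            form=stem + rule.suffix,
            pos=rule.derived_pos,
            structure=f"[[{base}]{rule.base_pos} -{rule.suffix}]{rule.derived_pos}",
            meaning=rule.gloss.format(base=base),
        ))
    return results


def analyse(form: str, rules: List[Rule],
            known_bases: Optional[Set[str]] = None) -> List[CLU]:
    """Rule-based stemming: propose analyses of a (possibly unlisted) word.

    If known_bases is given, candidate bases absent from it are rejected.
    """
    analyses = []
    for rule in rules:
        if not form.endswith(rule.suffix) or len(form) <= len(rule.suffix):
            continue
        base = form[: -len(rule.suffix)] + rule.base_ending
        if known_bases is not None and base not in known_bases:
            continue
        # Re-derive from the hypothesised base to build the full annotation.
        analyses.extend(c for c in derive(base, rule.base_pos, [rule]) if c.form == form)
    return analyses


if __name__ == "__main__":
    for clu in derive("formaliser", "V", RULES):
        print(clu.form, clu.structure, clu.meaning)
    print(analyse("formalisation", RULES, {"formaliser"}))
```

Without a lexicon of known bases, `analyse("table", RULES)` returns a spurious analysis built on the nonexistent verb *ter*; passing `known_bases` filters out such candidates, which is one plausible way to constrain rule-based stemming.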