Building domain-specific lexical hierarchies from corpora
Olivier Ferret (CEA - LIST BP 6 18, route du Panorama, 92265 Fontenay-aux-Roses Cedex)
Christian Fluhr (CEA - LIST BP 6 18, route du Panorama, 92265 Fontenay-aux-Roses Cedex)
Françoise Rousseau-Hans (CEA - DTI Saclay, 91191 Gif-sur-Yvette Cedex)
Jean-Luc Simoni (CEA - LIST BP 6 18, route du Panorama, 92265 Fontenay-aux-Roses Cedex)
In this article, we present a new algorithm for building domain-specific lexical hierarchies from texts. The basic elements of such a hierarchy are the normalized terms, both single-word and multi-word, extracted from a large corpus by a terminology extractor. The algorithm relies on collocations to represent the meaning of these terms, to find hierarchical relations between them, and finally to organize them into a hierarchy. It also takes the polysemy of terms into account while building the hierarchy. We present the results of applying the algorithm to part of the corpus designed for the ARC A3 of the Francil network, and we discuss its possible applications.
Acquisition of lexical resources, Acquisition of semantic resources, Semantic lexicons, Thesaurus building, Acquisition of semantic relations