Webaffix: Discovering Morphological Links on the WWW
Nabil Hathout (ERSS / CNRS & Université de Toulouse Le Mirail, 5, allées A. Machado, F-31058 Toulouse CEDEX 1, France)
Ludovic Tanguy (ERSS / CNRS & Université de Toulouse Le Mirail, 5, allées A. Machado, F-31058 Toulouse CEDEX 1, France)
WP5: Components & Systems
This paper presents a new language-independent method for finding morphological links between newly coined words (i.e., words absent from reference word lists). Using the WWW as a corpus, the Webaffix tool detects occurrences of new derived lexemes formed with a given suffix, proposes a base lexeme for each according to a standard scheme (such as noun-verb), and then tests the compatibility of the resulting word pairs, using the Web again, this time as a source of cooccurrences. The validated word pairs are used to build generic morphological databases that are useful for a number of NLP tasks. We present and discuss an example application of Webaffix to finding new noun/verb pairs in French.
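The three steps named in the abstract (detecting unlisted suffixed forms, proposing a base, testing cooccurrence) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `-isation`/`-iser` noun-verb scheme, the `min_hits` threshold, and the toy pages standing in for Web queries are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Set, Tuple


def find_candidates(web_forms: Iterable[str], lexicon: Set[str], suffix: str) -> List[str]:
    """Keep forms that end in `suffix` and are absent from the reference lexicon."""
    return sorted({w for w in web_forms if w.endswith(suffix) and w not in lexicon})


def propose_base(derived: str, suffix: str, base_ending: str) -> str:
    """Build a hypothetical base by swapping the suffix for a base ending."""
    return derived[: -len(suffix)] + base_ending


def link_pairs(
    web_forms: Iterable[str],
    lexicon: Set[str],
    suffix: str,
    base_ending: str,
    cooccur: Callable[[str, str], int],
    min_hits: int = 1,  # assumed threshold, not taken from the paper
) -> List[Tuple[str, str]]:
    """Return (derived, base) pairs whose members cooccur at least `min_hits` times."""
    pairs = []
    for derived in find_candidates(web_forms, lexicon, suffix):
        base = propose_base(derived, suffix, base_ending)
        if cooccur(derived, base) >= min_hits:
            pairs.append((derived, base))
    return pairs


# Toy stand-in for the Web: a real system would query a search engine instead.
toy_pages = [
    "la googlisation du web : faut-il googliser ?",
    "une organisation face à une étrange bidulisation",
]


def toy_cooccur(a: str, b: str) -> int:
    """Count toy pages that contain both words as whitespace-separated tokens."""
    return sum(1 for page in toy_pages if a in page.split() and b in page.split())


forms = {w for page in toy_pages for w in page.split()}
reference_lexicon = {"organisation", "organiser"}  # hypothetical reference word list
pairs = link_pairs(forms, reference_lexicon, "isation", "iser", toy_cooccur)
print(pairs)
```

In this toy run, "organisation" is discarded because it is already in the reference list, and "bidulisation" is discarded because its proposed base never cooccurs with it. Passing the cooccurrence test as a function keeps the sketch independent of any particular search engine.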