Title The Effect of Bias on an Automatically-built Word Sense Corpus
Author(s) David Martinez, Eneko Agirre

IXA Group, University of the Basque Country

Session O40-W
Abstract The goal of this paper is to explore the large-scale automatic acquisition of sense-tagged examples for Word Sense Disambiguation (WSD). We applied the "monosemous relatives" method to the Web in order to build such a resource for all nouns in WordNet. An analysis of several parameters revealed that the distribution of word senses (bias) in the training and test corpora is a determining factor. Provided there is a method to approximate the bias for each word sense, the results we obtained for English are comparable to those obtained with hand-tagged data (Semcor), which is a promising prospect for less-studied languages.
Keyword(s) Word Sense Disambiguation, Automatic Corpus Acquisition, Bootstrapping
Language(s) English
Full Paper 648.pdf