A Freely Available Automatically Generated Thesaurus of Related Words
University of Mainz
We present a freely available English thesaurus of related words that was compiled automatically by analyzing the distributional similarities of words in the British National Corpus. Its quality was evaluated against human judgments obtained from non-native and native speakers of English, who were asked to rank word similarities. By this measure, the results generated by our system are better than the judgments of the non-native speakers and come close to the performance of the native speakers. An advantage of our approach is that it does not require syntactic parsing and can therefore be adapted to other languages more easily. As an example, a similar thesaurus for German has already been completed.
Keywords: thesaurus generation, semantic similarity, distributional similarity, singular value decomposition, word co-occurrence
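The core idea of deriving related words from distributional similarity without any parsing can be illustrated with a toy example. The sketch below is not the system described here. It is a simplified, hypothetical illustration built on several assumptions: words are represented by raw co-occurrence counts within a window of two words on either side, windows stay inside a sentence, similarity is plain cosine, and the association weighting and singular value decomposition named in the keywords are omitted. The function names `cooccurrence_vectors`, `cosine`, and `related_words` and the four-sentence corpus are invented for illustration.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window`
    positions of it in the same sentence. Only word order is used;
    no syntactic parsing is involved (assumed raw counts, no weighting)."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    # Counter returns 0 for missing keys without inserting them.
    dot = sum(count * v[w] for w, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_words(vectors, target, top_n=3):
    """Rank all other words by cosine similarity to `target`."""
    target_vec = vectors.get(target, Counter())
    scores = [(other, cosine(target_vec, vec))
              for other, vec in vectors.items() if other != target]
    # Highest similarity first; ties broken alphabetically.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical toy corpus, already tokenized into sentences.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat eats fish".split(),
    "the dog eats meat".split(),
]
vecs = cooccurrence_vectors(corpus)
print(related_words(vecs, "cat"))  # "dog" is ranked first for "cat"
```

In this toy corpus "cat" and "dog" never occur near each other, yet they come out as most similar because they share left and right neighbors ("the", "drinks", "eats"). This is the distributional intuition behind the thesaurus. On a real corpus such as the BNC, raw counts would typically be replaced by an association measure and the matrix reduced in dimensionality before comparing vectors.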