The Cross-Breeding of Dictionaries


Adam Meyers, Ruth Reeves, Catherine Macleod, Rachel Szekely, Veronika Zielinska, Brian Young

New York University




The number of hand-coded electronic resources available to the Natural Language Processing community keeps growing, especially for English: annotated corpora, treebanks, lexicons, wordnets, etc. Unfortunately, initial funding for such projects is much easier to obtain than the additional funding needed to enlarge or improve these resources. Thus, even after a resource has proven useful, it is difficult to bring it to its full potential. We discuss techniques for combining existing dictionary resources and for producing new ones semi-automatically. The resources we created using these techniques have become an integral part of our work on NomBank, a project whose goal is to annotate the arguments of nouns in the Penn Treebank II corpus (PTB).


Lexicon, Dictionary, Corpora, Lexicography, Resources

Language(s): English
Full Paper