Title Building Part-of-speech Corpora through Histogram Hopping
Author(s) Marc Vilain

The MITRE Corporation

Session P14-W
Abstract This paper is concerned with lowering the cost of producing training resources for part-of-speech taggers. We focus primarily on the resource needs of unsupervised taggers, as these can be trained with simpler resources than their supervised counterparts. We introduce histogram hopping, a new approach for developing the central training resources of unsupervised taggers, and describe a simple annotation prototype that implements the approach. We then discuss the applicability of histogram hopping to the development of resources for supervised taggers. Finally, we report on a pilot study for French that validates this work.
Keyword(s) Part-of-speech, unsupervised learning, lexicon
Language(s) English, French
Full Paper 763.pdf