|LREC 2000 2nd
International Conference on Language Resources & Evaluation
|Something Borrowed, Something Blue: Rule-based Combination of POS Taggers
|Borin Lars (Department of Linguistics, Uppsala University, Box 527, SE–751 20 Uppsala, SWEDEN, Lars.Borin@ling.uu.se)
|Knowledge-Rich NLP, Machine Learning, Multilingual Corpora, Parallel Corpora, POS Tagging
|Session WO1 - Corpus Tagging
|Linguistically annotated text resources are still scarce for many languages and for many text types, mainly because their creation represents a major investment of work and time. For this reason, it is worthwhile to investigate novel ways of reusing existing resources. In this paper, we investigate how off-the-shelf part-of-speech (POS) taggers can be combined to better cope with text material of a type on which they were not trained, and for which there are no readily available training corpora. Using freely available taggers for German (although the method we describe is not language-dependent), we indicate how such taggers can be combined by means of linguistically motivated rules so that the tagging accuracy of the combination exceeds that of the best of the individual taggers.
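
The general idea of rule-based tagger combination can be illustrated with a minimal sketch. It is not the paper's actual rule set or implementation: the function names (`combine`, `prefer_common_noun`), the choice of tagger A as the fallback, and the single example rule are all assumptions made for illustration, and the tag labels are placeholders. The sketch keeps a tag when both taggers agree, consults hand-written rules when they disagree, and otherwise falls back to one designated tagger.

```python
from typing import Callable, List, Optional, Sequence

# A rule inspects the word and both proposed tags and either returns the
# tag it chooses or None if it does not apply. (Hypothetical interface.)
Rule = Callable[[str, str, str], Optional[str]]


def prefer_common_noun(word: str, tag_a: str, tag_b: str) -> Optional[str]:
    """Invented example rule: if tagger A says proper noun ("NE") and
    tagger B says common noun ("NN"), trust tagger B."""
    if tag_a == "NE" and tag_b == "NN":
        return tag_b
    return None


def combine(
    words: Sequence[str],
    tags_a: Sequence[str],
    tags_b: Sequence[str],
    rules: Sequence[Rule],
) -> List[str]:
    """Combine two taggers' outputs token by token.

    Agreement is kept as-is; on disagreement the first rule that fires
    decides; if no rule fires, tagger A's tag is used (an assumption).
    """
    if not (len(words) == len(tags_a) == len(tags_b)):
        raise ValueError("words and both tag sequences must be the same length")

    combined: List[str] = []
    for word, tag_a, tag_b in zip(words, tags_a, tags_b):
        if tag_a == tag_b:
            combined.append(tag_a)
            continue
        chosen: Optional[str] = None
        for rule in rules:
            chosen = rule(word, tag_a, tag_b)
            if chosen is not None:
                break
        combined.append(chosen if chosen is not None else tag_a)
    return combined


if __name__ == "__main__":
    words = ["Die", "Katze", "schläft"]
    tags_a = ["ART", "NE", "VVFIN"]   # made-up output of tagger A
    tags_b = ["ART", "NN", "ADJD"]    # made-up output of tagger B
    print(combine(words, tags_a, tags_b, [prefer_common_noun]))
```

In this toy input the second token is resolved by the rule and the third falls back to tagger A, since no rule covers that disagreement. Whether such a combination beats the best single tagger depends entirely on how well the rules capture each tagger's systematic errors.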