LREC 2000 2nd International Conference on Language Resources & Evaluation


Title Something Borrowed, Something Blue: Rule-based Combination of POS Taggers
Authors Borin Lars (Department of Linguistics, Uppsala University, Box 527, SE–751 20 Uppsala, SWEDEN,
Keywords Knowledge-Rich NLP, Machine Learning, Multilingual Corpora, Parallel Corpora, POS Tagging
Session Session WO1 - Corpus Tagging
Full Paper, 158.pdf
Abstract Linguistically annotated text resources are still scarce for many languages and for many text types, mainly because their creation represents a major investment of work and time. For this reason, it is worthwhile to investigate novel ways of reusing existing resources. In this paper, we investigate how off-the-shelf part-of-speech (POS) taggers can be combined to better cope with text material of a type on which they were not trained, and for which there are no readily available training corpora. Using freely available taggers for German (although the method we describe is not language-dependent), we indicate how such taggers can be combined by using linguistically motivated rules so that the tagging accuracy of the combination exceeds that of the best of the individual taggers.
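
The abstract does not spell out the rules themselves, so the following is only a minimal sketch of what a rule-based combination of tagger outputs could look like, not the procedure used in the paper. It assumes one tagger serves as the default and that hand-written rules override it for specific tag disagreements. The tagger names (`tagger_a`, `tagger_b`), the helper functions, and the single example rule are all hypothetical; the tags `NN`, `NE`, and `VVFIN` are borrowed from the STTS tagset for German purely for illustration.

```python
from typing import Callable, Dict, List, Optional

# Tags proposed for one token, keyed by tagger name (names are placeholders).
TagVotes = Dict[str, str]
# A rule inspects the votes and returns a tag, or None if it does not apply.
Rule = Callable[[TagVotes], Optional[str]]


def make_override(base: str, base_tag: str, trusted: str, trusted_tag: str) -> Rule:
    """Build a rule: when `base` proposes `base_tag` and `trusted` proposes
    `trusted_tag` for the same token, choose `trusted_tag`."""
    def rule(votes: TagVotes) -> Optional[str]:
        if votes.get(base) == base_tag and votes.get(trusted) == trusted_tag:
            return trusted_tag
        return None
    return rule


def combine(votes: TagVotes, rules: List[Rule], default_tagger: str) -> str:
    """Return the tag of the first rule that applies, else the default tagger's tag."""
    for rule in rules:
        tag = rule(votes)
        if tag is not None:
            return tag
    return votes[default_tagger]


# Hypothetical rule (an assumption, not taken from the paper): prefer
# tagger_b's proper-noun reading (NE) over tagger_a's common-noun reading (NN).
RULES: List[Rule] = [make_override("tagger_a", "NN", "tagger_b", "NE")]


def combine_sentence(per_token_votes: List[TagVotes],
                     rules: List[Rule] = RULES,
                     default_tagger: str = "tagger_a") -> List[str]:
    """Apply the rule-based combination token by token."""
    return [combine(votes, rules, default_tagger) for votes in per_token_votes]
```

In this sketch the rules are tried in order and the first match wins, so any remaining disagreement falls back to the default tagger. Whether the combination actually beats the best individual tagger depends entirely on how well the rules capture each tagger's systematic errors on the target text type.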