Corpus-based Learning of Lexical Resources for German Named Entity Recognition
Computational Linguistics, University of Duisburg-Essen, Duisburg, Germany (email@example.com)
This paper explores the use of unlabeled data in a knowledge-poor approach to German named entity recognition (NER). German is especially challenging for NER because all nouns, not only names, are capitalized, so capitalization alone is a weak cue for identifying names. Large and reliable lexical resources are therefore necessary to develop and adapt NER systems. Motivated by a model of word form observance that distinguishes three levels of granularity, we propose a method for the automatic creation of domain-sensitive lexical resources for NER. The approach uses linear SVMs and relies solely on an annotated corpus of reasonable size and a large amount of unlabeled data.
Named Entity Recognition, linear SVM, learning from unlabeled data