
Title	Corpus-based Learning of Lexical Resources for German Named Entity Recognition
Author(s)	Marc Rössler, Computational Linguistics, University Duisburg-Essen, Duisburg, Germany (marc.roessler@uni-duisburg.de)
Session	O15-W
Abstract	This paper explores the use of unlabeled data in a knowledge-poor approach to German NER. German is especially interesting for NER because all nouns are capitalized, not only names, so capitalization alone is not a reliable cue for detecting names. Therefore, large and reliable lexical resources are needed to develop and adapt NER systems. Motivated by a model of word form observance that distinguishes three levels of granularity, the paper proposes a method for automatically creating domain-sensitive lexical resources for NER. The approach uses linear SVMs and relies solely on an annotated corpus of reasonable size and a large amount of unlabeled data.
Keyword(s)	Named Entity Recognition, linear SVM, learning from unlabeled data
Language(s)	German
Full Paper	373.pdf