Creation of reusable components and language resources for Named Entity Recognition in Russian


Borislav Popov (1), Angel Kirilov (1), Diana Maynard (2), Dimitar Manov (1)

(1) Ontotext Lab (Sirma AI); (2) University of Sheffield




This paper describes the development of the RussIE system, in which we experimented with the creation of reusable processing components and language resources for a Russian Information Extraction (IE) system. The work was done as part of a multilingual project to adapt existing Human Language Technology (HLT) tools and resources to new domains and languages. The system was developed within the GATE architecture for language processing, and aims to explore the boundaries of language resource reuse and adaptability across languages and language types, rather than to create a full-scale IE system at the very peak of performance. Nevertheless, the system achieves a creditable 71% F-Measure on news texts, and there is much scope for future improvement of this score.


Russian, named entity recognition, lexical resources, inflectional morphology

Language(s) English, Russian
Full Paper