Summary of the paper

Title Designing and Evaluating a Russian Tagset
Authors Serge Sharoff, Mikhail Kopotev, Tomaz Erjavec, Anna Feldman and Dagmar Divjak
Abstract This paper reports the principles behind designing a tagset to cover Russian morphosyntactic phenomena, modifications of the core tagset, and its evaluation. The tagset is based on the MULTEXT-East framework, while the decisions in designing it were aimed at achieving a balance between parameters important for linguists and the possibility to detect and disambiguate them automatically. The final tagset contains about 500 tags and achieves about 95% accuracy on the disambiguated portion of the Russian National Corpus. We have also produced a test set that can be shared with other researchers.
Language Single language
Topics Morphology, Tagging, Multilinguality
Full paper Designing and Evaluating a Russian Tagset
Bibtex @InProceedings{SHAROFF08.78,
  author = {Serge Sharoff and Mikhail Kopotev and Tomaz Erjavec and Anna Feldman and Dagmar Divjak},
  title = {Designing and Evaluating a Russian Tagset},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {http://www.lrec-conf.org/proceedings/lrec2008/},
  language = {english}
  }
