| Title | 
  Designing and Evaluating a Russian Tagset | 
  
  
  | Authors | 
  Serge Sharoff, Mikhail Kopotev, Tomaz Erjavec, Anna Feldman and Dagmar Divjak | 
  
  
  | Abstract | 
  This paper reports on the principles behind the design of a tagset covering Russian morphosyntactic phenomena, on modifications of the core tagset, and on its evaluation. The tagset is based on the MULTEXT-East framework, and the design decisions aimed to balance the parameters that matter to linguists against the feasibility of detecting and disambiguating them automatically. The final tagset contains about 500 tags and achieves about 95% accuracy on the disambiguated portion of the Russian National Corpus. We have also produced a test set that can be shared with other researchers. | 
  
  
  | Language | 
  Single language | 
  
  
  | Topics | 
  Morphology, Tagging, Multilinguality   | 
  
  
  Full paper  | 
  Designing and Evaluating a Russian Tagset | 
  
  
  Slides  | 
  - | 
  
  
  | Bibtex | 
  @InProceedings{SHAROFF08.78, 
   author =  {Serge Sharoff and Mikhail Kopotev and Tomaz Erjavec and Anna Feldman and Dagmar Divjak}, 
   title =  {Designing and Evaluating a Russian Tagset}, 
   booktitle =  {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)}, 
   year =  {2008}, 
   month =  {may}, 
   date =  {28-30}, 
   address =  {Marrakech, Morocco}, 
   editor =  {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias}, 
   publisher =  {European Language Resources Association (ELRA)}, 
   isbn =  {2-9517408-4-0}, 
   note =  {http://www.lrec-conf.org/proceedings/lrec2008/}, 
   language =  {english} 
   }   |