Summary of the paper

Title TweetNorm_es: an Annotated Corpus for Spanish Microtext Normalization
Authors Iñaki Alegria, Nora Aranberri, Pere Comas, Victor Fresno, Pablo Gamallo, Lluís Padró, Iñaki San Vicente, Jordi Turmo and Arkaitz Zubiaga
Abstract In this paper we introduce TweetNorm_es, an annotated corpus of tweets in Spanish, which we make publicly available under the terms of the CC-BY license. This corpus is intended for the development and testing of microtext normalization systems. It was created for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint annotation effort by different research groups. In this paper we describe the methodology defined to build the corpus as well as the guidelines followed in the annotation process. We also present a brief overview of the Tweet-Norm shared task, the first evaluation environment in which the corpus was used.
Topics Collaborative Resource Construction, Corpus (Creation, Annotation, etc.)
Bibtex @InProceedings{ALEGRIA14.442,
  author = {Iñaki Alegria and Nora Aranberri and Pere Comas and Victor Fresno and Pablo Gamallo and Lluís Padró and Iñaki San Vicente and Jordi Turmo and Arkaitz Zubiaga},
  title = {TweetNorm_es: an Annotated Corpus for Spanish Microtext Normalization},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
 }