Summary of the paper

Title LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization
Authors Annemarie Friedrich, Marina Valeeva and Alexis Palmer
Abstract We present LQVSumm, a corpus of about 2000 automatically created extractive multi-document summaries from the TAC 2011 shared task on Guided Summarization, which we annotated with several types of linguistic quality violations. Examples of such violations include pronouns that lack antecedents and ungrammatical clauses. We give details on the annotation scheme and show that inter-annotator agreement is good given the open-ended nature of the task. The annotated summaries have previously been scored for Readability on a numeric scale by human annotators in the context of the TAC challenge; we show that the number of linguistic quality violations in a summary correlates with these intuitively assigned numeric scores. At the system level, the average number of violations marked in a system's summaries achieves higher correlation with the Readability scores than current supervised state-of-the-art methods for assigning a single readability score to a summary. It is our hope that our corpus facilitates the development of methods that not only judge the linguistic quality of automatically generated summaries as a whole, but also allow for detecting, labeling, and fixing particular violations in a text.
Topics Summarisation, Discourse Annotation, Representation and Processing
Full paper LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization
Bibtex @InProceedings{FRIEDRICH14.578,
  author = {Annemarie Friedrich and Marina Valeeva and Alexis Palmer},
  title = {LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
}