Re-using high-quality resources for continued evaluation of automated summarization systems


Laura Alonso i Alemany (1), Maria Fuentes (2), Marc Massot (3), Horacio Rodríguez (2)

(1) GRIAL, Departament de Lingüística General, Universitat de Barcelona; (2) TALP Research Centre, Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya; (3) Departament d'Informàtica i Matemàtica Aplicada, Universitat de Girona




In this paper we present a method for re-using the human judgements on summary quality provided by the DUC contest. The score awarded to an automatic summary is calculated as a function of the scores manually assigned to the most similar summaries of the same document. This approach enhances the standard n-gram based evaluation of automatic summarization systems by establishing similarities between {\it extractive} (vs. {\it abstractive}) summaries and by taking advantage of the large number of evaluated summaries available from the DUC contest. The utility of the method is exemplified by the improvements achieved on a headline production system.
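
The scoring idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the combining function, so everything below is an assumption made for illustration: similarity is taken to be a Dice coefficient over word n-grams, and the estimated score is a similarity-weighted average over the k most similar human-judged summaries of the same document. The names `overlap_similarity`, `estimate_score`, `k` and `n` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def ngrams(text, n=1):
    """Return the multiset of word n-grams in a lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_similarity(a, b, n=1):
    """Dice coefficient over n-gram multisets (an assumed similarity measure)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2.0 * shared / total


def estimate_score(candidate, judged, k=2, n=1):
    """Estimate a score for `candidate` from human-judged summaries.

    `judged` is a list of (summary_text, human_score) pairs for the same
    source document. The result is the similarity-weighted mean of the
    human scores of the k most similar judged summaries, or None when the
    candidate shares no n-gram with any of them.
    """
    scored = [(overlap_similarity(candidate, text, n), score)
              for text, score in judged]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    weight = sum(sim for sim, _ in nearest)
    if weight == 0:
        return None
    return sum(sim * score for sim, score in nearest) / weight
```

A call such as `estimate_score(new_summary, judged_summaries_for_doc)` would then stand in for a fresh human judgement. For extractive summaries the same scheme could compare sets of selected sentences instead of n-grams, which is again an assumption rather than a detail stated in the abstract.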


Keywords: automatic summarization, automatic evaluation, evaluation of subjective tasks

Language(s) English
