Summary of the paper

Title A good space: Lexical predictors in word space evaluation
Authors Christian Smith, Henrik Danielsson and Arne Jönsson
Abstract Vector space models benefit from using an outside corpus to train the model. It is, however, unclear what constitutes a good training corpus. We have investigated the effect on summary quality when using various language resources to train a vector-space-based extraction summarizer. This is done by evaluating the performance of the summarizer utilizing vector spaces built from corpora from different genres, partitioned from the Swedish SUC-corpus. The corpora are also characterized using a variety of lexical measures commonly used in readability studies. The performance of the summarizer is measured by comparing automatically produced summaries to human-created gold standard summaries using the ROUGE F-score. Our results show that the genre of the training corpus does not have a significant effect on summary quality. However, evaluating the variance in the F-score between the genres, with lexical measures as independent variables in a linear regression model, shows that vector spaces created from texts with high syntactic complexity, high word variation, short sentences and few long words produce better summaries.
Topics Summarisation, Information Extraction, Information Retrieval, Evaluation methodologies
Full paper A good space: Lexical predictors in word space evaluation
Bibtex @InProceedings{SMITH12.335,
  author = {Christian Smith and Henrik Danielsson and Arne Jönsson},
  title = {A good space: Lexical predictors in word space evaluation},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
}