| Title | 
  Can we Evaluate the Quality of Generated Text? | 
  
  
  | Authors | 
  David Hardcastle and Donia Scott | 
  
  
  | Abstract | 
Evaluating the output of NLG systems is notoriously difficult, and assessing text quality even more so. A range of automated and subject-based approaches to the evaluation of text quality has been taken, including comparison with a putative gold-standard text, analysis of specific linguistic features of the output, expert review and task-based evaluation. In this paper we present the results of a variety of such approaches in the context of a case study application. We discuss the problems encountered in implementing each approach, relate them to the literature, and propose that a test based on the Turing test for machine intelligence offers a way forward in the evaluation of the subjective notion of text quality. | 
  
  
  | Language | 
  Single language | 
  
  
  | Topics | 
  Evaluation methodologies, Generation, Other   | 
  
  
| Full paper | 
  Can we Evaluate the Quality of Generated Text? | 
  
  
| Slides | 
  - | 
  
  
  | Bibtex | 
  @InProceedings{HARDCASTLE08.797, 
   author =  {David Hardcastle and Donia Scott}, 
   title =  {Can we Evaluate the Quality of Generated Text?}, 
   booktitle =  {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)}, 
   year =  {2008}, 
   month =  {may}, 
   date =  {28-30}, 
   address =  {Marrakech, Morocco}, 
   editor =  {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias}, 
   publisher =  {European Language Resources Association (ELRA)}, 
   isbn =  {2-9517408-4-0}, 
   note =  {http://www.lrec-conf.org/proceedings/lrec2008/}, 
   language =  {english} 
   }   |