LREC 2000 2nd International Conference on Language Resources & Evaluation
The TREC-8 Question Answering Track
Ellen M. Voorhees (National Institute of Standards and Technology, Gaithersburg, MD 20899)
Dawn M. Tice (National Institute of Standards and Technology, Gaithersburg, MD 20899)
Keywords: Human Assessors, Question Answering, Validation
Session EO5 - Information Retrieval and Question Answering Evaluation
The TREC-8 Question Answering track was the first large-scale evaluation of domain-independent question answering systems. This paper summarizes the results of the track, giving both an overview of the approaches taken to the problem and an analysis of the evaluation methodology. Results for the more stringent condition, in which system responses were limited to 50 bytes, showed that explicit linguistic processing was more effective than the bag-of-words approaches that work well for document retrieval. The use of multiple human assessors to judge the correctness of the systems' responses demonstrated that assessors have legitimate differences of opinion about correctness, even for fact-based, short-answer questions. Evaluations of question answering technology will need to accommodate these differences, since the eventual end users of the technology will differ in the same way.
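To make the two evaluation ideas in the abstract concrete (a 50-byte cap on responses, and correctness judgments that can vary between assessors), here is a minimal, hypothetical sketch. It assumes that each question receives a short ranked list of responses and that a question is scored by the reciprocal of the rank of the first response judged correct (0 if none is), averaged over questions. The function names, the UTF-8 byte counting, and all judgment data are illustrative assumptions, not the track's actual software or results.

```python
from typing import Dict, List

# The stricter response-length condition described in the abstract.
MAX_BYTES = 50

# Hypothetical type: question id -> per-rank correctness judgments
# (index 0 is the top-ranked response).
Judgments = Dict[str, List[bool]]


def within_limit(response: str, max_bytes: int = MAX_BYTES) -> bool:
    """True if the response fits in max_bytes (UTF-8 byte count is an assumption)."""
    return len(response.encode("utf-8")) <= max_bytes


def reciprocal_rank(judgments: List[bool]) -> float:
    """1/rank of the first response judged correct, or 0.0 if none is."""
    for rank, correct in enumerate(judgments, start=1):
        if correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(judged: Judgments) -> float:
    """Average reciprocal rank over all judged questions."""
    if not judged:
        return 0.0
    return sum(reciprocal_rank(j) for j in judged.values()) / len(judged)


def agreement(a: Judgments, b: Judgments) -> float:
    """Fraction of (question, rank) judgments on which two assessors agree."""
    pairs = [(x, y) for q in a if q in b for x, y in zip(a[q], b[q])]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


# Made-up judgments of the SAME ranked responses by two assessors.
assessor_a: Judgments = {"q1": [False, True, False], "q2": [True], "q3": [False, False]}
assessor_b: Judgments = {"q1": [True, False, False], "q2": [True], "q3": [False, True]}

if __name__ == "__main__":
    print(within_limit("x" * 51))                      # False: over the 50-byte cap
    print(round(mean_reciprocal_rank(assessor_a), 3))  # 0.5
    print(round(mean_reciprocal_rank(assessor_b), 3))  # 0.833
    print(agreement(assessor_a, assessor_b))           # 0.5
```

With these invented judgments, the same system output scores 0.5 under one assessor and about 0.833 under the other, even though the two assessors agree on half of the individual judgments. This is the kind of assessor effect that, per the abstract, an evaluation of question answering has to accommodate.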