Title Designing a Realistic Evaluation of an End-to-end Interactive Question Answering System
Author(s) Nina Wacholder (1), Sharon Small (2), Bing Bai (1), Diane Kelly (3), Robert Rittman (1), Sean Ryan (2), Robert Salkin (1), Peng Song (1), Ying Sun (1), Liu Ting (2), Paul Kantor (1), Tomek Strzalkowski (2)

(1) Rutgers University, (2) SUNY Albany, (3) University of North Carolina

Session O27-ESW
Abstract We report on the development of materials for an evaluation exercise designed to assess the overall design and usability of HITIQA, an interactive question-answering system for preparing broad-ranging reports on complex issues. The evaluation had two basic objectives: (1) to perform a realistic assessment of the usefulness and usability of HITIQA as an end-to-end system, from the information seeker’s initial questions to the completion of a draft report; and (2) to develop metrics for comparing the answers obtained by different analysts and for evaluating the quality of the support that HITIQA provides. We used qualitative and quantitative tools to obtain data about analysts’ comfort with the HITIQA system, especially with its novel features, such as the ability to answer complex questions and the interactive dialogue. Because measuring the quality of HITIQA output with the standard metrics of precision and recall is impractical, we developed a new task, cross-evaluation, to measure the quality of the answers obtained using HITIQA indirectly; in this black-box assessment, analysts rate the quality of their own and their colleagues’ reports.
Keyword(s) Evaluation Methodologies, Protocols and Measures, Blackbox, Usability, User Experience Evaluation, Question-Answering, Qualitative, Quantitative
Language(s) English
Full Paper 675.pdf