LREC 2000 2nd International Conference on Language Resources & Evaluation

Title Sublanguage Dependent Evaluation: Toward Predicting NLP performances
Authors Illouz Gabriel (LIMSI, CNRS / Université Paris Sud, Orsay, France,
Keywords Evaluation (predictive), Performance Variations, POS Tagging, Sublanguages, Textual Typology
Session Session EO3 - Evaluation and Semantics
Full Paper, 252.pdf
Abstract In Natural Language Processing (NLP) evaluation campaigns such as MUC (Hirshman, 98), TREC (Harman, 98), GRACE (Adda et al, 97) and SENSEVAL (Kilgarriff, 98), the performance results provided are often averages computed over the complete test set. Such averages give no indication of a system's robustness: knowing which system performs better on average does not tell us which is best for a given subset of a language. In the present article, we first present existing approaches that take language heterogeneity into account and offer methods to identify sublanguages. We then propose a new metric to assess robustness, and we study the effect of different sublanguages identified in the Penn Tree Bank Corpus on the performance variations observed for POS tagging. The work presented here is a first step in the development of predictive evaluation methods, intended to provide new tools that help determine in advance the range of performance that can be expected from a system on a given dataset.