LREC 2002 Workshop

Machine Translation Evaluation:
Human Evaluators Meet Automated Metrics

Motivation and Aims

The Evaluation Working group of the ISLE project has organised a series of workshops on MT evaluation. Each of these workshops has contained a practical component, where participants have been asked to carry out exercises involving MT evaluation. These workshops proved to be very illuminating, and have stimulated on-going work in the area, much of it reported in the latest workshop in the series, held at the MT Summit meeting in September of 2001.

Results from previous workshops can be consulted at www.issco.unige.ch/projects/isle/ewg.htm and the proceedings from the MT Summit in Santiago de Compostela can be requested from the organisers.

The workshop at LREC will continue the series, and will consist primarily of hands-on exercises defined to investigate empirically a small number of metrics proposed for evaluation of MT systems and the potential relationships between them.

In an effort to develop a more systematic MT evaluation methodology, recent work in the EAGLES and ISLE projects, funded by the EU and NSF, has created a framework of characteristics in terms of which MT evaluations and systems, past and future, can be described and classified. The resulting taxonomy can be consulted at http://issco-www.unige.ch/projects/isle/taxonomy2/.

Previous workshops have led to critical analysis of measures drawn from the literature, and to the creation of new measures. Of the latter, several are aimed at eventual automation of the evaluation task and/or at finding relatively simple and inexpensive measures which correlate well with more complex measures that are hard to automate or expensive to implement.

Given this background, the time has come to concentrate on systematizing the actual evaluation measures themselves. For any particular measure, one would like to know how accurate it is, how expensive and/or difficult to apply, how independent of other measures, etc. Very little of this type of information is available to date.

This workshop will focus on these issues. The organizers will provide the participants in advance with the materials required to:

Perform a small evaluation, using one or two measures;
Perform a cross-measure analysis of the resulting scores;
Create a general characterization of the measure's performance.

The participants will then apply these measures to the data made available, and bring their results to the workshop in order to integrate them with other participants' results.

The overall intention of the workshop is to discover, empirically, what kinds of characteristics are easily determinable, and how accurate they actually are. Only through a process of assessing the evaluations can we eventually arrive at a small but accurate set of measures that adequately cover the set of phenomena MT system evaluators, system developers, and potential MT users care about.

It is our hope that participants will feel inspired to continue this process, so that the combined results can be assembled later, integrated into the framework, and become a valuable resource to anyone interested in MT evaluation.

Organizing Committee

Marianne Dabbadie	EVALING, Paris (France)
Tony Hartley	Centre for Translation Studies, University of Leeds (UK)
Eduard Hovy	USC Information Sciences Institute, Marina del Rey (USA)
Margaret King	ISSCO/TIM/ETI, University of Geneva (Switzerland)
Bente Maegaard	Center for Sprogteknologi, Copenhagen (Denmark)
Sandra Manzi	ISSCO/TIM/ETI, University of Geneva (Switzerland)
Keith J. Miller	The MITRE Corporation (USA)
Widad Mustafa El Hadi	Université Lille III - Charles de Gaulle (France)
Andrei Popescu-Belis	ISSCO/TIM/ETI, University of Geneva (Switzerland)
Florence Reeder	The MITRE Corporation (USA)
Michelle Vanni	U.S. Department of Defense (USA)

Participation

Participants wishing to receive preparatory data should send the the following information to contact person below:

Name, address, email contact;
Experience in MT evaluation;
Languages known and level of comprehension (elementary, fair, good, near-native, native).

Contact

Andrei Popescu-Belis
ISSCO/TIM/ETI, University of Geneva
40, bd du Pont d'Arve
CH-1211 Geneva 4 - SWITZERLAND
Email (preferred): andrei.popescu-belis@issco.unige.ch
Fax: (41 22) 705 86 86

Important Dates

Registration with the workshop organizers	20th February 2002
Distribution of pre-workshop material	March 2002
Workshop	27 May 2002

Preliminary Schedule

Morning session

Introduction and welcome
Background and workshop themes
integration of evaluation exercises (start)

09:00 to 13:00

Afternoon session

integration of evaluation exercise (continue)
reports
cross-evaluation analysis
final wrap-up

14:30 to 18:30

Workshop Registration Fees

The registration fees for the workshop are:

If you are not attending LREC: 140 EURO
If you are attending LREC: 90 Euro

Machine Translation Evaluation: Human Evaluators Meet Automated Metrics