Multimodal Resources and Multimodal Systems Evaluation

Motivation and Aims

Individual organizations and countries have been investing in the creation of resources and methods for the evaluation of resources, technologies, products and applications. This is evident in the US DARPA HLT programme, the EU HLT programme under FP5-IST, the German MTI programme, the Francophone AUF programme and others. The European 6th Framework Programme (FP6), planned to start in 2003, includes multilingual and multisensorial communication as major R&D issues. Substantial mutual benefits can be expected from addressing these issues through international cooperation. Nowhere is this more important than in the relatively new areas of multimedia (i.e., text, audio, video), multimodal (visual, auditory, tactile), and multicodal (language, graphics, gesture) communication.

Multimodal resources are concerned with the capture and annotation of multiple modalities such as speech, hand gesture, gaze, facial expression, body posture, graphics, etc. Until recently, only a handful of researchers had been engaged in the development of multimodal resources and their application in systems. Even so, most have focused on a limited set of modalities, used custom annotation schemes, and worked within a particular application domain.

The primary purpose of this one-day workshop (feeding into a subsequent half-day Multimodal Roadmap workshop) will be to report on and discuss multimodal resources; annotation standards, tools and methods; and evaluation metrics and methods, as well as to strategize jointly about the way forward.

Workshop Agenda

The workshop will be a mix of short presentations and facilitated sessions with the intent of jointly identifying grand challenge problems, building a shared understanding of and plan for multimodal resources and applications, and identifying methods for facilitating the creation of multimodal resources. The workshop will consist of a morning session (08:00 to 13:30) and an afternoon session (14:30 to 20:00), with a focus on multimodal resources, annotation and evaluation. A common repository of illustrative multimodal video samples will be built prior to the workshop. Workshop participants will be encouraged to annotate some of them using their own coding scheme or tool and report results at the workshop. Elements of the workshop will include:

Topics to be addressed in the workshop include, but are not limited to:


This workshop will consist primarily of working sessions. However, presentations and participation in the workshop will be based on an assessment of a 4-page extended position statement that addresses one or more of the fundamental multimodal roadmap and/or multimodal resource issues posed by the workshop. Submissions must be in English, no more than 4 pages long, and in single-column format. The first page should include the title; the names and affiliations of the authors; the full address of the first author (or a contact person), including phone, fax, email and URL; and 5 keywords.

Submissions should be sent electronically in Word (preferably), PDF or ASCII text format to arrive no later than 15 February 2002 to Jean-Claude Martin and Paula MacDonald.
Demonstrations of multimodal language resources (LR) and related tools will also be considered. Please send a 2-page demonstration outline.
Authors willing to participate in the collective annotation exercise in the morning are encouraged to consult the workshop web site, where they can submit one or more short (approx. 1 minute) videos with an accompanying annotation.

As soon as possible, authors are encouraged to send a brief email indicating their intention to participate, including their contact information and the topic they intend to address in their submission. Proceedings of the workshop will be printed.

Important Dates

Call for papers/invitation 17th December 2001
Submission deadline 15th February 2002
Notification, stylesheets available 1st March 2002
Camera ready paper due 15th March 2002
Workshop program due 3rd April 2002
Proceedings due 20th April 2002
Workshop 1st June 2002


Mark T. Maybury
Information Technology Division
The MITRE Corporation, 3K-205
202 Burlington Road
Bedford, MA 01730
Phone: +1(781) 271-7230
Fax.: +1(781) 271-2780

Jean-Claude Martin
B.P. 133
F-91403 Orsay (France)
Phone: +33 6 84 21 62 05
Fax. : +33 1 69 85 80 88

Programme Committee

Mark Maybury The MITRE Corporation (Co-Chair) (USA)
Jean-Claude Martin LIMSI-CNRS/LINC-University Paris 8 (Co-Chair) (France)
Lisa Harper The Mitre Corporation (USA)
Catherine Pelachaud University of Rome "La Sapienza" (Italy)
Michael Kipp DFKI (Germany)
Wolfgang Wahlster DFKI (Germany)
Oliviero Stock IRST (Italy)
Harry Bunt Tilburg University (The Netherlands)
Antonio Zampolli Consiglio Nazionale delle Ricerche (Italy)
Steven Krauwer ELSNET (The Netherlands)
Niels Ole Bernsen Natural Interactive Systems Laboratory, University of Southern Denmark (Odense, Denmark)
Laila Dybkjaer Natural Interactive Systems Laboratory, University of Southern Denmark (Odense, Denmark)

Workshop Registration Fees

The registration fees for the workshop are:
These fees include a coffee break and the Proceedings of the workshop.

Related Activities

Recently, several projects, initiatives and organisations have addressed multimodal resources with a federative approach. Other recent initiatives in the United States include:

A Call for Proposals dedicated to Multimodality was launched within the IST Programme in autumn 2001. We hope that participants in these initiatives will also be interested in taking part in the LREC 2002 conference.

Starting in 2003, the European 6th Framework Programme (FP6) will include multilingual and multisensorial communication as a primary R&D focus. Technology evaluation is explicitly identified in the presentation of the Integrated Project instrument.

Until now, the collection and annotation of multimodal corpora has been done on an individual basis; individual researchers and teams typically develop custom coding schemes and tools within narrow task domains. As a result, there is a distinct lack of shared knowledge and understanding of how to compare the various coding schemes and tools, which makes it difficult to build on the results and experiences of others. Given that the annotation of corpora (particularly multimodal corpora) is very costly, we anticipate a growing need for the development of tools and methodologies that enable the collaborative building and sharing of multimodal resources.