Title Collecting and Sharing Bilingual Spontaneous Speech Corpora: the ChinFaDial Experiment
Author(s) Georges Fafiotte (1), Christian Boitet (1), Mark Seligman (1), Chengqing Zong (2)

(1) GETA, CLIPS, IMAG-campus (UJF - Grenoble 1), 385 rue de la Bibliothèque, BP 53, F-38041 Grenoble cedex 9, France, georges.fafiotte@imag.fr, christian.boitet@imag.fr, mark.seligman@spokentranslation.com; (2) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, P.O. Box 2728, Beijing 100080, China, cqzong@nlpr.ia.ac.cn

Session P9-SE
Abstract We describe the three main platforms in the ERIM family of Web-based environments for human interpreting: ERIM-Interp and ERIM-Collect in more detail, then ERIM-Aid. Each platform supports one aspect of the collection or study of spontaneous bilingual dialogues translated by an interpreter. ERIM-Interp is the core environment, providing mediated communication between speakers and human interpreters over the network. Using ERIM-Collect, French-Chinese interpreting data have been collected within the three-year "ChinFaDial" project supported by LIAMA, a French-Chinese laboratory in Beijing. These "raw" speech data will be made available in the spring of 2004 on an open-access basis, using the DistribDial server, on a CLIPS-GETA website. Our goal is to extend such corpora through a collaborative scheme that allows other research groups to contribute to the site whatever annotations they may have created, and to share them under the same conditions (GPL). An ERIM-Aid variant is intended to provide focused machine aids to human interpreters working over the Web, or possibly to distant monolingual speakers conversing in different languages.
Keyword(s) Data collection, spontaneous speech, dialogue, speech corpora, interpreter, interpreting, free distribution, freeware
Language(s) Any language, for the generic software resource; French-Chinese, for the ChinFaDial corpora
