Title	Multimodal Multilingual Resources in the Subtitling Process
Author(s)	Stelios Piperidis (1,2), Iason Demiros (1,2), Prokopis Prokopidis (1,2), Peter Vanroose (3), Anja Hoethker (4), Walter Daelemans (4), Elsa Sklavounou (5), Manos Konstantinou (6), Yannis Karavidas (7). (1) Institute for Language and Speech Processing, Artemidos 6 & Epidavrou, 151 25 Athens, Greece. (2) National Technical University of Athens. (3) Katholieke Universiteit Leuven, div. ESAT/PSI, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium. (4) CNTS Language Technology Group, Universiteit Antwerpen, Universiteitsplein 1, B-2610 Antwerpen, Belgium. (5) Systran SA, 1 rue du Cimetiere-BP 7, 95230 Soisy Sous Montmorency, France. (6) Lumiere Cosmos Communications SA, Lazarou Sohou 5, 11525 Athens, Greece. (7) British Broadcasting Corporation (World Service), Bush House, Strand, London WC2 4PH. Email: {spip, iason, prokopis}@ilsp.gr, Peter.Vanroose@esat.kuleuven.ac.be, hoethker@uia.ua.ac.be, sklavounou@systran.fr, mkonstantinou@lumiere.gr, yannis.karavidas@bbc.co.uk
Session	O10-MSE
Abstract	In view of the expansion of digital television and the increasing demand to manipulate audiovisual content, tools that produce subtitles in a multilingual setting are becoming indispensable for the subtitling industry. Operating in this setting, the MUSA project aims to develop a system that combines speech recognition, advanced text analysis, and machine translation to help generate multilingual subtitles: a system that converts audio streams into text transcriptions, condenses the content to meet the spatio-temporal constraints of the subtitling process, and produces draft translations in two language pairs. Three European languages are supported: English as both source and target language for subtitle generation, and French and Greek as target languages for subtitle translation. In order to train and evaluate system components, an array of application-specific resources is necessary. Primary audiovisual data consist of BBC TV documentaries and "newsy" current affairs programmes. For each programme, the following data are captured: the actual video; its transcript or script; English, Greek and French subtitles; and topically relevant newspaper or web-sourced extracts.
Keyword(s)	Multilingual subtitling, multimodal resources, sentence compression, machine translation resources
Language(s)	EN, EL, FR
Full Paper	680.pdf