The COST278 Pan-European Broadcast News Database


An Vandecatseye (1), Jean-Pierre Martens (1), Joao Neto (2), Hugo Meinedo (2), Carmen Garcia-Mateo (3), Javier Dieguez (3), France Mihelic (4), Janez Zibert (4), Jan Nouza (5), Petr David (5), Matus Pleva (6), Anton Cizmar (6), Harris Papageorgiou (7), Christina Alexandris (7)

(1) Ghent University, Sint-Pietersnieuwstraat 41, B-9000 Ghent, Belgium, {avdecats, martens}@elis.ugent.be; (2) INESC ID, Rua Alves Redol 9, 1000-029 Lisbon, Portugal; (3) University of Vigo, 36200 Pontevedra, Vigo, Spain; (4) University of Ljubljana, Trzaska 25, SI-1000 Ljubljana, Slovenia; (5) Technical University of Liberec, Halkova 5, 461 17 Liberec, Czech Republic; (6) Technical University of Kosice, Letna 9, 04120 Kosice, Slovakia; (7) ILSP, Artemidos 6 & Epidavrou, GR-151 25 Maroussi, Greece




This paper describes a pan-European multilingual audio and video database of broadcast news shows. The database was constructed by seven institutions collaborating in the European COST278 action on Spoken Language Interaction in Telecommunications. At present, it comprises broadcast news shows in seven languages: Dutch, Portuguese, Galician, Czech, Slovenian, Slovakian and Greek. The policy is to attract new partners who contribute new data, constructed and transcribed according to the rules and procedures outlined in this paper. The database comes with evaluation software that should facilitate the comparison of experiments.


multilingual LR, multimodal LR, broadcast news

Language(s): Dutch, Portuguese, Galician, Czech, Slovenian, Slovakian, Greek