The BABEL Project


The BABEL project was a joint European project under the COPERNICUS scheme, completed at the end of December 1998.

According to the Bible, all men once spoke a common language. But they built a tall structure, the Tower of Babel, and tried to reach the heavens. To punish them for their pride and folly, God made them unable to understand each other, each group speaking a different language.

Details are available of the BABEL-organised workshop "Speech Database Development for Central and Eastern European Languages", a satellite workshop of the First International Conference on Language Resources and Evaluation (Granada, May 28-30 1998).


BABEL brought together partners from a number of Eastern and Western European research centres. It has produced a multi-language database covering five of the most widely differing Eastern European languages: Bulgarian, Estonian, Hungarian, Polish and Romanian. The database was designed and collected using the standards and protocols laid down in the European Union ESPRIT SAM project, and it follows the format of EUROM 1, a database of eleven Central and Western European languages: Danish, Dutch, English, French, German, Italian, Norwegian, Swedish, Greek, Portuguese and Spanish.

The resulting spoken-language database is comparable with EUROM 1. For each language, as in EUROM 1, there are many-talker, few-talker and very-few-talker corpora, with material that includes at least prepared lists of numbers (covering the phonotactic possibilities of each language), passages and sentences. Talkers are drawn equally from both sexes. Little material of this sort has been gathered for languages spoken in countries eligible for Copernicus funding, although a Czech database exists and members of the present consortium have recorded and labelled a small database of Bulgarian speech using SAM protocols.

Data were collected on a SAM speech workstation, a PC equipped with audio hardware and software to the standard SAM specification. All data are at least end-point labelled; some sections are labelled in more detail, including phonemic/phonetic transcriptions in the SAMPA system.

The major objective of the project has been to provide a common European resource for Spoken Language Engineering and research. Data will be made available on CD-ROM through ELRA early in 2000.

The project officially started on 1st March 1995, and the inaugural meeting took place in the Speech Research Lab at the University of Reading on Thursday 9th and Friday 10th March 1995.

Contact Info

The project's official email address is Babel@Reading.AC.UK which is a mailing list relaying information to all partners.

List of Partners.

Work Programme.

Useful Information

A proposed project logo is available to download as an Encapsulated PostScript (EPS) document. Some other logos have also been designed.

I have also collected a few pieces of freely available software that may be useful, including programs to play audio files, calculate spectrograms (supposedly in real time) and convert between audio file formats. Some of them use the Sound Blaster card. I have not tested any of these. Suggestions for other software to include are welcome.

Here's an interesting web site going by the same name as our project.

Project Co-ordinator: Professor Peter Roach

Dr. Simon Arnfield