SALA II across the finish line: a large collection of mobile telephone speech databases from North and Latin America completed


Henk van den Heuvel (1), Phil Hall (2), Harald Höge (3), Asunción Moreno (4), Antonio Rincon (5), Franco Senia (6)

(1) SPEX, Nijmegen, Netherlands; (2) Appen Pty Ltd, Chatswood, Australia; (3) Siemens AG, Munich, Germany; (4) UPC, Barcelona, Spain; (5) S.L. “Atlas”, Barcelona, Spain; (6) Loquendo Vocal Technology and Services, Turin, Italy




The SALA II project comprises mobile telephone recordings according to the SpeechDat (II) paradigm for several languages in North and Latin America. Each database contains the recordings of 1000 speakers, with the exception of US Spanish (2000 speakers) and US English (4000 speakers). In each database, the recordings are divided equally over four environments: a quiet environment (home/office), the street, a public place, and a moving vehicle. This paper presents an evaluation of the project. It details experiences with the implementation of design specifications, speaker recruitment, data recording (on site), data processing, orthographic transcription and lexicon generation. Furthermore, the validation procedure and its results are documented. Finally, the availability and distribution of the databases are addressed.


speech databases, data collection, mobile teleservices, SpeechDat, SALA


 Spanish, Portuguese, English, French
