This paper describes the current status of speech database technology (SDT) dedicated to the development of commercially used recognisers. The goal of SDT is to provide speech databases that are optimally suited to train speaker-independent recognisers adapted to all relevant application areas. Each application area is characterized by the specific acoustic environments in which the user acts, and the databases have to reflect these environments in order to achieve optimal recognition rates. The paper reviews the status of SDT in view of existing and upcoming application areas and the further development of HMM technology. Based on this review, and taking into account the further development of speech recognition technology, the current and upcoming applications, and the dimension of multilinguality within a mobile society, a generic set of application-specific speech databases is defined.
This paper addresses the issues influencing the development of speech technology in Oceania by considering three main factors: the status and projects of large-scale telephone-based speech data corpora in the region, the relevant linguistic structure of the region, and the human geography of the region, with particular emphasis on its telecommunication and information infrastructure. It assesses the views of a varied group of experts on the region and links their views on language use, culture, telecommunications cost, and the directions in which these factors are changing with the factors important for the development of telephone-based speech technology. It concludes that, while there are certain local inhibitors to such development, there are also opportunities that can be exploited when the relevant factors are fully explored and evaluated.
The objective of the SALA (SpeechDat across Latin America) project is to record large SpeechDat-like databases to train telephone-speech recognisers in Latin American countries. In this project, Latin America was divided into eight wide recording areas, each made up of one or more countries. Speakers are recruited with the goal of obtaining a homogeneous dialectal distribution within each area and a good balance of age and sex. In each call, a total of 45 read and spontaneous items are recorded. A label file containing a manual orthographic transcription of the speech actually uttered by the speaker accompanies each audio file. Each database comes with a lexicon containing the phonemic transcription in SAMPA of each word in the orthographic transcriptions. More than 6000 speakers have already been recorded, coming from different areas of Brazil, Mexico, Colombia and Venezuela. The databases from these countries will be finished by June 2000.
This paper describes the creation of five new telephony speech databases for Central and Eastern European languages within the SpeechDat(E) project. The five languages concerned are Czech, Polish, Slovak, Hungarian, and Russian. The databases follow SpeechDat-II specifications with some language-specific adaptations. The present paper describes the differences between SpeechDat(E) and earlier SpeechDat projects with regard to database items such as the generation of phonetically rich sentences, speaker recruitment, etc. The collection of the databases is in its finishing phase. The databases will be validated by SPEX and will be distributed by ELRA.
This paper presents an overview of the SpeechDat-Car project. The SpeechDat-Car project is a 4th framework EC project in the Language Engineering programme. It aims at collecting a set of ten speech databases to support training and testing of robust multilingual speech recognition for in-car applications. The consortium participants are car manufacturers, telephone communications providers, and universities. More precisely, this paper describes the background of the project, its organisation, and the characteristics of the databases. It further addresses the recording platforms, the validation scenario, and provides the current status of the recordings for the ten languages.
WWWTranscribe is a transcription system based on the WWW. It is platform independent and allows network access to speech databases. Its modular structure makes it flexible, and it connects easily to existing signal processing applications or database management systems. WWWTranscribe consists of static HTML documents containing forms. Attached to these forms are CGI applications that perform data processing and dynamically create subsequent HTML documents. Within the HTML forms, embedded programs allow formal consistency checks of the annotation, and they provide support for frequently needed editing tasks such as digit-to-string conversion.
The tutorial gives an overview of WWWTranscribe, describes its implementation, and shows how to adapt WWWTranscribe to different languages and annotation tasks.
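The kind of formal consistency check and digit-to-string conversion described above can be sketched as follows. This is an illustrative example only, not WWWTranscribe's actual code; the function names and the restriction to English digit words are assumptions made for the sketch.

```python
# Illustrative sketch (hypothetical helpers, not WWWTranscribe's API):
# a digit-to-string editing aid and a formal consistency check of the
# kind embedded in a transcription form.

DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def digits_to_words(transcript: str) -> str:
    """Expand digit tokens into spoken word forms, digit by digit."""
    out = []
    for token in transcript.split():
        if token.isdigit():
            out.extend(DIGIT_WORDS[d] for d in token)
        else:
            out.append(token)
    return " ".join(out)

def check_no_digits(transcript: str) -> bool:
    """Consistency check: a finished annotation contains no raw digits."""
    return not any(ch.isdigit() for ch in transcript)

normalised = digits_to_words("call 42 now")
```

In a web-based setting such checks would run client-side in the form before the annotation is submitted, so that only formally valid transcriptions reach the database.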
By validation of Spoken Language Resources (SLR) we mean the evaluation of an SLR against a fixed set of requirements, usually its specifications. In this paper, special emphasis is given to the validation criteria and procedures used in the SpeechDat projects. These criteria and procedures are summarised and subjected to a critical evaluation.
The COST 249 SpeechDat reference recogniser is a fully automatic, language-independent training procedure for building a phonetic recogniser. It relies on the HTK toolkit and a SpeechDat(II)-compatible database. The recogniser is designed to serve as a reference system in multilingual recognition research. This paper documents version 0.95 of the reference recogniser and presents results on small- and medium-vocabulary recognition for five languages. The paper is a slightly expanded version of a paper presented at LREC-2000 (Johansen et al., 2000).
We present two telephone speech databases for Austrian German. The databases contain one thousand calls each, from the fixed and from the mobile telephone network. Speakers were chosen to ensure a representative distribution over accent regions, sex, and age groups. The databases are compliant with the guidelines of the SpeechDat project. We discuss the characteristics of Austrian German, describe the contents of the databases, the speaker recruitment methods, the transcription of the calls, the phonetic lexicon, and quality assurance measures. Finally, we outline our plans to use the databases for research on the pronunciation of Austrian German and on automatic dialect identification.
Under the SpeechDat specifications, the Spanish member of the SpeechDat consortium has recorded a Catalan database that includes one thousand speakers. This communication describes some experimental work that has been carried out using both the Spanish and the Catalan speech material. A speech recognition system has been trained for the Spanish language using a selection of the phonetically balanced utterances from the 4500 SpeechDat training sessions. Utterances with mispronounced or incomplete words and with intermittent noise were discarded. A set of 26 allophones was selected to account for the Spanish sounds, and clustered demiphones have been used as context-dependent sub-lexical units. Following the same methodology, a recognition system was trained from the Catalan SpeechDat database, with the Catalan sounds described by 32 allophones. Additionally, a bilingual recognition system was built for both the Spanish and Catalan languages. By means of clustering techniques, a suitable set of allophones to cover both languages simultaneously was determined; 33 allophones were selected. The training material consisted of the whole Catalan training material and the Spanish material coming from the Eastern region of Spain (the region where Catalan is spoken). The performance of the Spanish, Catalan and bilingual systems was assessed under the same framework. The Spanish system exhibits a significantly better performance than the other systems due to its better training. The bilingual system provides a performance equivalent to that afforded by the language-specific systems trained with the Eastern Spanish material or the Catalan SpeechDat corpus.
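The starting point for building such a bilingual allophone set can be illustrated with a toy sketch. The SAMPA symbol sets below are hypothetical fragments chosen for illustration, not the paper's actual 26- and 32-allophone inventories, and the paper derived its shared set with clustering techniques rather than plain set operations; the sketch only shows the pooling step that clustering then refines.

```python
# Toy illustration (hypothetical SAMPA fragments, not the paper's
# inventories): pooling two monolingual allophone sets into a single
# bilingual inventory. Symbols common to both languages are shared,
# which is why the bilingual set can stay close in size to the larger
# monolingual one.

spanish = {"a", "e", "i", "o", "u", "p", "t", "k", "b", "d", "g",
           "s", "x", "tS", "jj", "rr"}
catalan = {"a", "e", "E", "i", "o", "O", "u", "@", "p", "t", "k",
           "b", "d", "g", "s", "z", "S", "Z", "L", "tS"}

shared = spanish & catalan      # units reusable across both languages
bilingual = spanish | catalan   # pooled inventory, before clustering
```

In the actual systems, clustering further merges acoustically similar units from the pooled inventory, which is how the bilingual set was reduced to 33 allophones.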
Workshop Programme