Spoken and Written Language Resources for Vietnamese
Viet-Bac Le (1), Do-Dat Tran (1, 2), Eric Castelli (2), Laurent Besacier (1), Jean-François Serignat (1)
(1) CLIPS-IMAG Laboratory, UMR CNRS 5524, BP 53, 38041 Grenoble Cedex 9, FRANCE; (2) International Research Center MICA, 1 Dai Co Viet, Hanoi, VIETNAM
This paper presents an overview of our work on spoken and written language resources for Vietnamese, carried out at the CLIPS-IMAG Laboratory and the International Research Center MICA. We propose a new methodology for the fast acquisition of text corpora for minority languages and apply it to Vietnamese. We also present the first results of building a large Vietnamese speech database (VNSpeechCorpus) and a phonetic dictionary, which is used in the automatic alignment process.
Vietnamese language, Minority language, Speech corpus, Text corpus, Pronunciation dictionary