A Spoken Afrikaans Language Resource Designed for Research on Pronunciation Variations


Daan Wissing (1), Jean-Pierre Martens (2), Ulrike Janke (1), Wim Goedertier (2)

(1) Human Language Technology Laboratory (HLT-L), North-West University, Potchefstroom Campus, Private Bag X6001, Potchefstroom, South Africa. ntldpw@puk.ac.za; (2) ELIS, Ghent University, Sint-Pietersnieuwstraat 41, B-9000 Ghent, Belgium. martens@elis.ugent.be




This contribution discusses the design, collection, annotation and planned distribution of a new spoken language resource of Afrikaans (SALAR). The corpus contains speech of mother-tongue speakers of Afrikaans and is intended to become a primary national language resource for phonetic research and research on pronunciation variations. To that end, it is designed to expose pronunciation variations due to regional accent, speech rate (normal and fast speech) and speech mode (read and spontaneous speech). The corpus is collected by the Potchefstroom Campus of the North-West University, in close collaboration throughout all phases of the corpus creation process with ELIS-UG (Belgium), one of the institutions that has been engaged in the creation of the Spoken Dutch Corpus (CGN).


Afrikaans, pronunciation variation, speech mode, multi-style LR

Language(s) Afrikaans