SUMMARY : Session O11-W Corpus & Lexicon


Title A translated corpus of 30,000 French SMS
Authors C. Fairon, S. Paumier
Abstract The development of communication technologies has contributed to the appearance of new forms in the written language that scientists have to study according to their peculiarities (typing or viewing constraints, synchronicity, etc.). In the particular case of SMS (Short Message Service), studies are complicated by a lack of data, mainly due to technical constraints and privacy considerations. In this paper, we present a corpus of 30,000 French SMS collected through a project in Belgium named “Faites don de vos SMS à la science” (Give your SMS to Science). This corpus is unique in its quality, its size and the fact that the SMS have been manually translated into “standard” French. We will first describe the collection process and discuss the writers' profiles. Then we will explain in detail how the translation was carried out.
