Measurements of Spoken Language Variability in a Multilingual Corpus. Predictable Aspects


Massimo Moneglia

LABLITA, Dipartimento di Italianistica, University of Florence




The paper provides cross-linguistic measurements of everyday language use based on C-ORAL-ROM, a multilingual corpus of spontaneous speech. The mean and the coefficient of variation of a set of standard parameters are reported across the main sociological and structural contexts of spoken language use. The parameters are: Mean Length of Utterance (MLU); Mean Length of the dialogic turn (MLTw); Speed; Mean Length of the tone unit (MLTone); and Fragmentation. These parameters show strongly predictable behaviour at the cross-linguistic level. MLU correlates positively with MLTw and shows highly predictable values in informal dialogic structures. Both MLU and MLTw correlate inversely with Speed. MLTone and Speed are predictable from language-specific features, but while MLTone has low intra-linguistic variation, Speed shows a cross-linguistic tendency toward lower values in formal language use. Fragmentation is a permanent feature of spoken language, but it varies mainly with the speaker.
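As a rough illustration of how parameters of this kind can be computed, the Python sketch below derives MLU, MLTw and Speed from per-text counts, then computes a coefficient of variation (standard deviation over mean) and a Pearson correlation between two parameters. The `Text` record, its fields, the toy counts, the units (words per utterance, words per turn, words per second) and the use of the sample standard deviation are all hypothetical assumptions for illustration; they are neither C-ORAL-ROM data nor the paper's exact procedure.

```python
import math
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Text:
    """Hypothetical per-text counts (illustrative only, not corpus figures)."""
    words: int
    utterances: int
    turns: int
    seconds: float

    @property
    def mlu(self) -> float:
        # Mean length of utterance, in words per utterance (assumed unit)
        return self.words / self.utterances

    @property
    def mltw(self) -> float:
        # Mean length of the dialogic turn, in words per turn (assumed unit)
        return self.words / self.turns

    @property
    def speed(self) -> float:
        # Speech rate, in words per second (assumed unit)
        return self.words / self.seconds


def coefficient_of_variation(values: list[float]) -> float:
    """Sample standard deviation divided by the mean (assumed variant)."""
    return stdev(values) / mean(values)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data: three hypothetical texts, not measurements from any corpus
texts = [
    Text(words=1000, utterances=200, turns=100, seconds=400.0),
    Text(words=1500, utterances=250, turns=100, seconds=600.0),
    Text(words=800, utterances=200, turns=160, seconds=250.0),
]

if __name__ == "__main__":
    mlu = [t.mlu for t in texts]
    mltw = [t.mltw for t in texts]
    speed = [t.speed for t in texts]
    print(f"MLU mean={mean(mlu):.2f} CV={coefficient_of_variation(mlu):.2f}")
    print(f"r(MLU, MLTw)={pearson(mlu, mltw):.2f}")
    print(f"r(MLU, Speed)={pearson(mlu, speed):.2f}")
```

With these toy numbers MLU and MLTw rise together while Speed moves the other way, which only shows how the direction of such correlations is measured; the values carry no empirical weight.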


speech corpora, metrics, multilinguality, Romance languages

Language(s): Italian, French, Portuguese, Spanish
Full Paper