Evaluation of Consensus on the Annotation of Prosodic Breaks in the Romance Corpus of Spontaneous Speech “C-ORAL-ROM”.


Morena Danieli (1), Juan María Garrido (2), Massimo Moneglia (3), Andrea Panizza (1), Silvia Quazza (1), Marc Swerts (4)

(1) LOQUENDO, Torino; (2) Telefónica Investigación y Desarrollo; (3) LABLITA, Dipartimento di Italianistica, Università di Firenze; (4) Tilburg University, Faculty of Arts




C-ORAL-ROM (Integrated Reference Corpora for Spoken Romance Languages) is a multilingual corpus of spontaneous speech developed within the IST Program. The corpora are tagged for terminal and non-terminal prosodic breaks. Terminal breaks are considered the perceptually most relevant cues for determining utterance boundaries in spontaneous speech resources. This paper presents an evaluation of inter-annotator agreement carried out by an institution external to the consortium, and shows the level of reliability of both the delivered tagging and the adopted annotation scheme. At the cross-linguistic level, the data show a very high K coefficient (between 0.77 and 0.92, depending on the language resource). A strong level of agreement was also recorded specifically for terminal breaks. The data thus show that annotating utterances in terms of their prosodic breaks captures relevant perceptual facts, and suggest that the proposed coding scheme can be applied in a highly replicable way.
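A minimal sketch of how a K coefficient of this kind can be computed is shown below. The abstract does not state which kappa variant was used, so this assumes a kappa that pools label proportions across all annotators (Fleiss' kappa, which for two annotators reduces to Scott's pi). It also assumes that every candidate word boundary receives one of three hypothetical labels ("none", "non-terminal", "terminal"). The data are invented for illustration, and the function names are hypothetical. Agreement on terminal breaks alone is measured by collapsing the labels to "terminal" versus "everything else".

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def fleiss_kappa(annotations: Sequence[Sequence[Hashable]]) -> float:
    """Kappa with chance agreement from label proportions pooled over annotators.

    annotations[i][j] is the label annotator j gave to item i (here, one
    candidate word boundary). Every item must carry the same number (>= 2)
    of labels. With two annotators this is equivalent to Scott's pi.
    """
    n_items = len(annotations)
    if n_items == 0:
        raise ValueError("no items to compare")
    n_raters = len(annotations[0])
    if n_raters < 2 or any(len(row) != n_raters for row in annotations):
        raise ValueError("every item needs the same number (>= 2) of labels")

    totals: Counter = Counter()
    observed = 0.0
    for row in annotations:
        counts = Counter(row)
        totals.update(counts)
        # Share of annotator pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed += agreeing_pairs / (n_raters * (n_raters - 1))
    observed /= n_items

    # Chance agreement: squared pooled proportion of each label.
    total_labels = n_items * n_raters
    expected = sum((c / total_labels) ** 2 for c in totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one label is used")
    return (observed - expected) / (1.0 - expected)


def category_kappa(annotations: Sequence[Sequence[Hashable]], label: Hashable) -> float:
    """Agreement on a single label (e.g. terminal breaks) versus all other labels."""
    return fleiss_kappa([[tag == label for tag in row] for row in annotations])


# Invented toy data: two annotators labelling six candidate boundaries.
boundaries = [
    ("none", "none"),
    ("terminal", "terminal"),
    ("non-terminal", "terminal"),
    ("none", "none"),
    ("non-terminal", "non-terminal"),
    ("terminal", "terminal"),
]
print(round(fleiss_kappa(boundaries), 3))                # 0.745
print(round(category_kappa(boundaries, "terminal"), 3))  # 0.657
```

Because the per-label score is the same kappa computed over a two-way collapse, it can be reported alongside the overall K without a separate formula. The toy figures above say nothing about the C-ORAL-ROM results.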


Keywords: evaluation, speech corpora, Romance languages, prosody.

Language(s): Italian, French, Portuguese, Spanish
Full Paper