ADAM: The SI-TAL Corpus of Annotated Dialogues


Roldano Cattoni (itc-IRST, Trento, Italy)

Morena Danieli (Loquendo S.p.A., Torino, Italy)

Vanessa Sandrini (itc-IRST, Trento, Italy)

Claudia Soria (ILC-CNR, Pisa, Italy)


SO3: Dialogue-Conversation Evaluation


In this paper we describe the methodological assumptions, general architectural framework, and annotation and encoding practices underlying the ADAM Corpus, which was developed as part of the Italian national project SI-TAL. Each of the 450 dialogues is represented by an orthographic transcription and is annotated at five levels of linguistic information: prosody, POS tagging, syntax, semantics, and pragmatics.
A coherent, unitary approach to the design and application of annotation schemes was pursued across all annotation levels. In developing the schemes, particular attention was paid to robustness, wide coverage, and compliance with existing standards.
The evaluation of the annotation revealed a high degree of both inter-annotator agreement and annotation accuracy, with very promising results regarding the usability of the proposed annotation schemes and the accuracy of the annotation applied to the corpus. The ADAM Corpus also represents an interesting experiment at the architectural design level: the way the annotation is organized, structured, and represented in a given physical format aims at maximizing the reusability of the annotated material, so that the corpus can circulate widely across different annotation practices and research purposes.


Spoken dialogue corpus, Multilevel annotation, Standoff markup, Annotation standards, Validation

