SUMMARY : Session O18-M Multimodal Corpora Annotation and Tools


Title H. C. Andersen Conversation Corpus
Authors N. Bernsen, L. Dybkjær, S. Kiilerich
Abstract This paper describes the design, collection and current status of the Hans Christian Andersen (HCA) conversation corpus. The corpus consists of five separate corpora and represents the transcription and annotation of some 57 hours of user-system interaction in spoken English and deictic gesture, recorded mainly with children in 2002-2005. The corpora were collected as part of the development and evaluation of two consecutive research prototypes. We describe the set-up used to collect each corpus, our use of each corpus in system development, and the annotation of each corpus, and we briefly present the uses we have made of the corpora so far. The HCA corpus was made publicly available in March 2006.
Keywords Multimodal corpora, speech and pointing gesture, annotation
