|Title||Image-Language Multimodal Corpora: Needs, Lacunae and an AI Synergy for Annotation|
Katerina Pastra, Yorick Wilks
Department of Computer Science, University of Sheffield, U.K.
|Abstract||The growing demand for intelligent multimedia systems has led to the development of various multimodal resources and corresponding annotation schemes and processing tools. In this paper, we argue that there is a striking lack of multimodal corpora capturing the association and interaction of visual and linguistic data. We relate this research lacuna to vision-language integration prototypes developed within Artificial Intelligence (AI) and show how the needs of the latter dictate the development of such resources for a wide variety of applications. We identify the annotation requirements that these needs, and the nature of the modalities involved, impose on image-language corpora, and we suggest a semi-automatic way of meeting them.|
|Keyword(s)||multimodal corpora, annotation|