Evaluating Multimodal NLG using Production Experiments


Ielka van der Sluis (1), Emiel Krahmer (2)

(1) Computational Linguistics & AI; (2) Communication & Cognition, Faculty of Arts, Tilburg University




In this paper we report on an evaluation study for the generation of multimodal referring expressions. To test our algorithm, which allows for various gradations of precision in pointing, subjects performed an object identification task in a strictly controlled experimental setting. Twenty subjects participated. They were instructed to always use a pointing gesture (they were led to believe they were testing a new kind of 'digital pointing device'). The subjects performed their tasks at one of two distances: close by (10 subjects) or at a distance of 2.5 meters (10 subjects). The assumption is that these conditions yield precise and imprecise pointing gestures, respectively. In addition, we varied the 'type' of target objects (geometrical figures versus pictures of persons). This study resulted in a corpus of 600 multimodal referring expressions. A statistical analysis (ANOVA) revealed a main effect of distance (subjects adapt their language to the kind of pointing gesture) and a main effect of target type (persons are more difficult to describe than geometrical figures). The advantages and disadvantages of this evaluation method are discussed.


evaluation, multimodality, spoken language processing, production experiment, natural language generation

Language(s): Dutch
Full Paper