Summary of the paper

Title Towards an Automatic Assessment of Crowdsourced Data for NLU
Authors Patricia Braunger, Wolfgang Maier, Jan Wessling and Maria Schmidt
Abstract Recent development of spoken dialog systems has moved away from command-style input and aims at allowing a natural input style. Obtaining suitable data for training and testing such systems is a significant challenge. We investigate methods for assessing data elicited via crowdsourcing with respect to its naturalness and usefulness. Since the criteria for assessing usefulness depend on the application purpose of the crowdsourced data, we investigate various facets such as noisy data, naturalness and building natural language understanding (NLU) models. Our results show that valid data can be automatically identified with the help of a word-based language model. A comparison of crowdsourced data and system usage data at the lexical, syntactic and pragmatic levels reveals detailed differences between the two data sets. However, we show that using crowdsourced data for training NLU services achieves results similar to those obtained with system usage data.
Topics Evaluation Methodologies, Corpus (Creation, Annotation, Etc.), Speech Recognition/Understanding
Full paper Towards an Automatic Assessment of Crowdsourced Data for NLU
