Summary of the paper

Title: Towards an Automatic Assessment of Crowdsourced Data for NLU
Authors: Patricia Braunger, Wolfgang Maier, Jan Wessling and Maria Schmidt
Abstract: Recent development of spoken dialog systems has moved away from command-style input and aims at allowing a natural input style. Obtaining suitable data for training and testing such systems is a significant challenge. We investigate which methods can be used to assess data elicited via crowdsourcing with respect to its naturalness and usefulness. Since the criteria for assessing usefulness depend on the application purpose of the crowdsourced data, we investigate various facets such as noisy data, naturalness and building natural language understanding (NLU) models. Our results show that valid data can be automatically identified with the help of a word-based language model. A comparison of crowdsourced data and system usage data at the lexical, syntactic and pragmatic levels reveals detailed information on the differences between the two data sets. However, we show that using crowdsourced data for training NLU services achieves results similar to those obtained with system usage data.
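The abstract's claim that valid data can be identified with a word-based language model could be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: a bigram model with add-one smoothing is trained on reference (e.g. system usage) utterances, and crowdsourced utterances whose length-normalized log-probability falls below a threshold are flagged as likely invalid. The utterances and the threshold value here are made up for demonstration.

```python
# Hypothetical sketch of word-based language-model filtering of
# crowdsourced utterances (illustrative, not the paper's implementation).
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Train an add-one-smoothed bigram LM; return a scoring function."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = len(unigrams)

    def logprob(sentence):
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        lp = 0.0
        for a, b in zip(tokens, tokens[1:]):
            # add-one smoothing over the training vocabulary
            lp += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        return lp / (len(tokens) - 1)  # normalize by number of bigrams

    return logprob

# Made-up reference utterances standing in for system usage data.
reference = [
    "turn on the radio",
    "navigate to the next gas station",
    "call my mother",
    "turn the radio off",
]
score = train_bigram_lm(reference)

# Crowdsourced candidates: one plausible command, one gibberish string.
crowd = ["please turn on the radio", "asdf qwerty zzz"]
threshold = -2.5  # illustrative; in practice tuned on held-out data
flags = {u: score(u) >= threshold for u in crowd}
```

With such a filter, utterances far from the reference distribution score lower and can be discarded; the threshold trades off recall of valid utterances against noise admitted.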
Topics: Evaluation Methodologies, Corpus (Creation, Annotation, Etc.), Speech Recognition/Understanding
Full paper: Towards an Automatic Assessment of Crowdsourced Data for NLU
Bibtex:
@InProceedings{BRAUNGER18.539,
  author = {Patricia Braunger and Wolfgang Maier and Jan Wessling and Maria Schmidt},
  title = "{Towards an Automatic Assessment of Crowdsourced Data for NLU}",
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {May 7-12, 2018},
  address = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {979-10-95546-00-9},
  language = {english}
}