LREC 2018 Proceedings

Summary of the paper

Title	Improving Machine Translation of Educational Content via Crowdsourcing
Authors	Maximiliana Behnke, Antonio Valerio Miceli Barone, Rico Sennrich, Vilelmini Sosoni, Thanasis Naskos, Eirini Takoulidou, Maria Stasimioti, Menno Van Zaanen, Sheila Castilho, Federico Gaspari, Panayota Georgakopoulou, Valia Kordoni, Markus Egg and Katia Lida Kermanidis
Abstract	The limited availability of in-domain training data is a major issue in training application-specific neural machine translation models. Professional outsourcing of bilingual data collection is costly and often not feasible. In this paper we analyze the effect of using crowdsourcing as a scalable way to obtain translations of the target in-domain data, bearing in mind that such translations may be of lower quality. We apply crowdsourcing with carefully designed quality controls to create parallel corpora for the educational domain by collecting translations of texts from MOOCs from English into eleven languages, which we then use to fine-tune neural machine translation models previously trained on general-domain data. Our results indicate that crowdsourced data collected with proper quality controls consistently yields performance gains over general-domain baseline systems and over systems fine-tuned with pre-existing in-domain corpora.
Topics	Crowdsourcing, Corpus (Creation, Annotation, Etc.), Machine Translation, Speech-to-Speech Translation
Full paper	Improving Machine Translation of Educational Content via Crowdsourcing
Bibtex
@InProceedings{BEHNKE18.855,
  author    = {Maximiliana Behnke and Antonio Valerio Miceli Barone and Rico Sennrich and Vilelmini Sosoni and Thanasis Naskos and Eirini Takoulidou and Maria Stasimioti and Menno Van Zaanen and Sheila Castilho and Federico Gaspari and Panayota Georgakopoulou and Valia Kordoni and Markus Egg and Katia Lida Kermanidis},
  title     = "{Improving Machine Translation of Educational Content via Crowdsourcing}",
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year      = {2018},
  month     = {May 7-12, 2018},
  address   = {Miyazaki, Japan},
  editor    = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {979-10-95546-00-9},
  language  = {english}
}