Summary of the paper

Title A Multilingual Wikified Data Set of Educational Material
Authors Iris Hendrickx, Eirini Takoulidou, Thanasis Naskos, Katia Lida Kermanidis, Vilelmini Sosoni, Hugo De Vos, Maria Stasimioti, Menno Van Zaanen, Panayota Georgakopoulou, Valia Kordoni, Maja Popovic, Markus Egg and Antal Van den Bosch
Abstract We present a wikified data set of parallel texts in eleven language pairs from the educational domain. English sentences are aligned with sentences in eleven other languages (BG, CS, DE, EL, HR, IT, NL, PL, PT, RU, ZH), and names and noun phrases (entities) are manually annotated and linked to their respective Wikipedia pages. For every linked entity in English, the corresponding term or phrase in the target language is also marked and linked to its Wikipedia page in that language. The annotation was performed via crowdsourcing. In this paper we present the task, the annotation process, the difficulties encountered with crowdsourcing for complex annotation, and the data set in more detail. We demonstrate the usage of the data set for Wikification evaluation. This data set is a rich resource of annotated English text linked to translations in eleven languages, including several, such as Bulgarian and Greek, for which few language technology (LT) resources are available.
Topics Topic Detection & Tracking, Corpus (Creation, Annotation, Etc.), Other
Full paper A Multilingual Wikified Data Set of Educational Material
Bibtex @InProceedings{HENDRICKX18.515,
  author = {Iris Hendrickx and Eirini Takoulidou and Thanasis Naskos and Katia Lida Kermanidis and Vilelmini Sosoni and Hugo De Vos and Maria Stasimioti and Menno Van Zaanen and Panayota Georgakopoulou and Valia Kordoni and Maja Popovic and Markus Egg and Antal Van den Bosch},
  title = "{A Multilingual Wikified Data Set of Educational Material}",
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {May 7-12, 2018},
  address = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {979-10-95546-00-9},
  language = {english}
}