Summary of the paper

Title Saturnalia: A Latin-Catalan Parallel Corpus for Statistical MT
Authors Jesús González-Rubio, Jorge Civera, Alfons Juan and Francisco Casacuberta
Abstract Currently, a great effort is being made to digitalise large historical document collections for preservation purposes. The documents in these collections are usually written in ancient languages, such as Latin or Greek, which limits the general public's access to their content because of the language barrier. Therefore, digital libraries aim not only to store raw images of digitalised documents, but also to annotate them with their corresponding text transcriptions and translations into modern languages. Unfortunately, few electronic resources are available for ancient languages that natural language processing techniques can exploit. This paper describes the compilation of a novel Latin-Catalan parallel corpus as a new task for statistical machine translation (SMT). Preliminary experimental results obtained with a state-of-the-art phrase-based SMT system are also reported. The results reveal the complexity of the task and its challenging but interesting nature for future development.
Topics Corpus (creation, annotation, etc.), Machine Translation, SpeechToSpeech Translation, Statistical and machine learning methods
Bibtex @InProceedings{GONZLEZRUBIO10.541,
  author = {Jesús González-Rubio and Jorge Civera and Alfons Juan and Francisco Casacuberta},
  title = {Saturnalia: A Latin-Catalan Parallel Corpus for Statistical MT},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}