Summary of the paper

Title Simultaneous Sentence Boundary Detection and Alignment with Pivot-based Machine Translation Generated Lexicons
Authors Antoine Bourlon, Chenhui Chu, Toshiaki Nakazawa and Sadao Kurohashi
Abstract Sentence alignment is a task that consists in aligning the parallel sentences in a translated article pair. This paper describes a method to perform sentence boundary detection and alignment simultaneously, which significantly improves the alignment accuracy on languages like Chinese with uncertain sentence boundaries. It relies on the definition of hard (certain) and soft (uncertain) punctuation delimiters, the latter being possibly ignored to optimize the alignment result. The alignment method is used in combination with lexicons automatically generated from the input article pairs using pivot-based MT, achieving better coverage of the input words with fewer entries than pre-existing dictionaries. Pivot-based MT makes it possible to build dictionaries for language pairs that have scarce parallel data. The alignment method is implemented in a tool that will be freely available in the near future.
Topics Corpus (Creation, Annotation, etc.), Machine Translation, SpeechToSpeech Translation, Lexicon, Lexical Database
Full paper Simultaneous Sentence Boundary Detection and Alignment with Pivot-based Machine Translation Generated Lexicons
Bibtex @InProceedings{BOURLON16.427,
  author = {Antoine Bourlon and Chenhui Chu and Toshiaki Nakazawa and Sadao Kurohashi},
  title = {Simultaneous Sentence Boundary Detection and Alignment with Pivot-based Machine Translation Generated Lexicons},
  booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
  year = {2016},
  month = {may},
  date = {23-28},
  location = {Portoro┼ż, Slovenia},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Sara Goggi and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Helene Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {978-2-9517408-9-1},
  language = {english}
 }
Powered by ELDA © 2016 ELDA/ELRA