Summary of the paper

Title: Large aligned treebanks for syntax-based machine translation
Authors: Gideon Kotzé, Vincent Vandeghinste, Scott Martens and Jörg Tiedemann
Abstract: We present a collection of parallel treebanks that have been automatically aligned on both the terminal and the nonterminal constituent level for use in syntax-based machine translation. We describe how they were constructed and applied to a syntax- and example-based machine translation system called Parse and Corpus-Based Machine Translation (PaCo-MT). For the language pair Dutch to English, we present evaluation scores of both the nonterminal constituent alignments and the MT system itself, and in the latter case, compare them with those of Moses, a current state-of-the-art statistical MT system, when trained on the same data.
Topics: Machine Translation, SpeechToSpeech Translation, Corpus (creation, annotation, etc.), Grammar and Syntax
Full paper: Large aligned treebanks for syntax-based machine translation
