Towards the Use of Word Stems and Suffixes for Statistical Machine Translation


Maja Popović, Hermann Ney

Lehrstuhl für Informatik VI - Computer Science Department, RWTH Aachen University, Ahornstrasse 55, 52056 Aachen, Germany, {popovic, ney}@cs.rwth-aachen.de




In this paper we present methods for improving the quality of translation from an inflected language into English by making use of part-of-speech tags and word stems and suffixes in the source language. Results for translation from Spanish and Catalan into English are presented on the LC-STAR trilingual corpus, which consists of spontaneously spoken dialogues in the domain of travelling and appointment scheduling. Results for translation from Serbian into English are presented on the Assimil language course, a bilingual corpus from an unrestricted domain. We achieve up to 5% relative reduction of error rates for Spanish and Catalan and about 8% for Serbian.
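The improvements above are stated as relative, not absolute, reductions of error rates. As a minimal sketch of how such a figure is computed, the Python snippet below uses a hypothetical helper, relative_reduction, and made-up error rates. Neither the function name nor the numbers come from the paper; they only illustrate the arithmetic.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the relative reduction of an error rate, in percent.

    Both arguments are error rates on the same scale (e.g. WER in percent).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates, NOT results from the paper.
baseline_wer = 40.0   # baseline system
improved_wer = 38.0   # system using additional morphological information

print(f"{relative_reduction(baseline_wer, improved_wer):.1f}% relative reduction")
```

With these illustrative values, an absolute drop of 2 points on a baseline of 40 corresponds to a 5% relative reduction.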


statistical machine translation, stem, suffix, POS tags

Language(s): Spanish, Catalan, Serbian