Scaling Up an MT Prototype for Industrial Use: Databases and Data Flow


Anna Sågvall Hein (Uppsala University and Scania CV AB)

Eva Forsbom (Uppsala University and Scania CV AB)

Jörg Tiedemann (Uppsala University and Scania CV AB)

Per Weijnitz (Uppsala University and Scania CV AB)

Ingrid Almqvist (Uppsala University and Scania CV AB)

Leif-Jöran Olsson (Uppsala University and Scania CV AB)

Sten Thaning (Uppsala University and Scania CV AB)


WP5: Components & Systems


In a cooperative project between Uppsala University, the bus and truck manufacturer Scania CV AB, and the translation company Explicon AB, issues of scaling up the transfer-based machine translation prototype MULTRA for industrial use are being investigated. The project is limited to one domain, automotive service literature, and one translation direction, Swedish to English, but issues concerning changes of domain, translation direction and language pair are also considered. The project work has had three focal points: the design and implementation of the new MATS system, including the redesign, porting and integration of MULTRA; the redesign and implementation of the dictionaries of the language modules as a lexical database; and the scaling up of the dictionaries and the grammars. The system is currently trained on a corpus of aligned bitexts from the automotive service domain. The coverage of the lexical data is almost complete, and the data have been validated by professional translators, but the grammars are still limited. Despite the incomplete state of the grammars, the system already translates more than a third of the segments in the corpus. Preliminary evaluations of system performance and coverage have been made, and further development of evaluation methods and metrics is in progress.


Machine translation, Language resources, Translation system, Evaluation, Scaling up

Full Paper