Title MetaMorpho TM: A Rule-Based Translation Corpus
Author(s) Tamás Gröbler, Gábor Hodász, Balázs Kis

MorphoLogic, Orbánhegyi út 5., H-1126 Budapest, Hungary, {grobler;hodasz;kis}@morphologic.hu

This paper discusses bilingual resource processing within a rule-based translation memory (TM) system currently under development. Translation memories can be viewed as translation tools that incorporate parallel corpora, mostly aligned at the sentence level. These corpora usually carry no linguistic annotation, because commercial TM systems perform queries at the character level using fuzzy matching. The proposed translation memory system instead uses linguistic analysis (morphology and parsing) to determine the similarity between two source-language segments and, if the entire source segment is not found, attempts to assemble a sensible translation from the translations of source-language chunks. This is achieved by integrating a rule-based machine translation (RBMT) engine. The drawback of this approach is language dependence; however, grammar acquisition methods are being developed to speed up grammar preparation for further language pairs. The paper addresses the problem of adding sufficient linguistic annotation to segment pairs, called translation units (TUs), so that new segment pairs integrate with the RBMT scheme. The annotation must be fully automatic, because adding a new translation unit to a translation memory has to be transparent and must not require user interaction. The paper describes a method robust enough to obtain as much linguistic annotation as possible while keeping the error rate low.
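For orientation, the character-level fuzzy lookup that the abstract attributes to commercial TM systems can be sketched roughly as follows. This is only an illustration of that baseline, not the MetaMorpho TM method: the function name `fuzzy_lookup`, the `0.7` threshold, the toy memory, and the use of Python's standard `difflib.SequenceMatcher` as the similarity measure are all assumptions made for this sketch.

```python
from difflib import SequenceMatcher

# Toy translation memory: source segment -> target segment.
# The entries are invented placeholders, not data from the paper.
TM = {
    "Open the file menu.": "<target segment 1>",
    "Close the window.": "<target segment 2>",
}


def fuzzy_lookup(query, memory, threshold=0.7):
    """Return (score, source, target) for the best character-level match.

    Returns None when no stored source segment reaches the threshold.
    The threshold value is an arbitrary assumption for this sketch.
    """
    best = None
    for source, target in memory.items():
        # ratio() is a purely character-based similarity in [0, 1];
        # no morphological or syntactic information is used.
        score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None
```

A lookup of this kind either returns a whole stored segment or nothing. The system described in the abstract differs in exactly this respect: when the full segment is not found, it tries to assemble a translation from the translations of smaller chunks.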

Keyword(s) MetaMorpho TM, Translation Corpus
