A Part-of-Speech-Based Search Algorithm for Translation Memories
Reinhard Rapp (University of Mainz, FASK, 76711 Germersheim, Germany)
WP1: Corpora & Corpus Tools
The retrieval of related sentences in state-of-the-art translation memory systems is based on orthographic similarities. This often leads to poor search results, since orthographically similar sentences are not necessarily semantically related. In this paper we propose a search algorithm that aims to reduce this problem by taking part-of-speech information into account. The algorithm requires that the parallel sentences stored in the translation memory be processed using standard tools for word alignment and part-of-speech tagging. The work described is part of an ongoing project in example-based machine translation.
Example-based machine translation, Translation memory, Part-of-speech tagging, Retrieval algorithm, Sentence similarity
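To make the contrast drawn in the abstract concrete, the following minimal sketch compares a purely orthographic similarity score with one computed over part-of-speech tag sequences, and combines the two for retrieval. It is an illustration under stated assumptions, not the algorithm proposed in this paper: the function names, the `pos_weight` parameter, the linear combination, the toy memory, and the hand-assigned tags are all hypothetical, and Python's standard `difflib.SequenceMatcher` stands in for whatever similarity measure a real system would use.

```python
from difflib import SequenceMatcher

# A memory entry is a (source sentence, POS tag sequence) pair.
# Tags are hand-assigned here; a real system would obtain them from a tagger.
Entry = tuple[str, list[str]]


def orthographic_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pos_similarity(tags_a: list[str], tags_b: list[str]) -> float:
    """Similarity in [0, 1] between two part-of-speech tag sequences."""
    return SequenceMatcher(None, tags_a, tags_b).ratio()


def combined_score(query: Entry, entry: Entry, pos_weight: float = 0.5) -> float:
    """Weighted mix of POS and orthographic similarity (assumed combination)."""
    q_text, q_tags = query
    e_text, e_tags = entry
    return (pos_weight * pos_similarity(q_tags, e_tags)
            + (1.0 - pos_weight) * orthographic_similarity(q_text, e_text))


def retrieve(query: Entry, memory: list[Entry], pos_weight: float = 0.5) -> Entry:
    """Return the memory entry with the highest combined score."""
    return max(memory, key=lambda e: combined_score(query, e, pos_weight))


# Hypothetical toy memory: both sentences contain "bank" but differ in structure.
MEMORY: list[Entry] = [
    ("The bank raised its rates.", ["DET", "NOUN", "VERB", "PRON", "NOUN"]),
    ("He sat on the river bank.", ["PRON", "VERB", "ADP", "DET", "NOUN", "NOUN"]),
]

if __name__ == "__main__":
    query: Entry = ("The firm raised its prices.",
                    ["DET", "NOUN", "VERB", "PRON", "NOUN"])
    for entry in MEMORY:
        print(entry[0],
              round(orthographic_similarity(query[0], entry[0]), 2),
              round(pos_similarity(query[1], entry[1]), 2))
    print("best match:", retrieve(query, MEMORY)[0])
```

Setting `pos_weight` to 0 reduces the sketch to purely orthographic retrieval, which makes it easy to compare the two behaviours on the same memory.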