Title	Lexical token alignment: experiments, results and applications
Authors	Dan Tufiş (RACAI, 13 Septembrie, 13, Bucharest 1, Romania); Ana-Maria Barbu (RACAI, 13 Septembrie, 13, Bucharest 1, Romania)
Session	WP1: Corpora & Corpus Tools
Abstract	Lexical alignment is one of the most challenging tasks in processing and exploiting parallel texts. Numerous applications may benefit from an accurate multilingual lexical alignment of bi- and multi-language corpora. In this paper we describe a hypothesis-testing approach to the problem of automatic extraction of translation equivalents from sentence-aligned and tagged parallel corpora. The algorithm was used for automatic extraction of 6 bilingual lexicons, with English as the source language and Bulgarian, Czech, Estonian, Hungarian, Romanian and Slovene as the target languages, as well as a 7-language lexicon with English as the hub and the other 6 CEE languages. For the experiments described here we used the 7-language aligned corpus based on Orwell’s novel "1984".
Keywords	Lexical token alignment
Full Paper	32.pdf