Construction of a Bilingual Arabic-Spanish Lexicon of Verbs Based on a Parallel Corpus


Doaa Samy (1); Antonio Moreno-Sandoval (1); José M. Guirao (2)

(1) Laboratorio de Lingüística Informática, Universidad Autónoma Madrid, Cantoblanco 28049 Madrid, Spain; (2) Dept. of Software Engineering, University of Granada




Parallel corpora are an important resource for developing linguistic tools. The main goal of this paper is the development of a bilingual lexicon of verbs. Building this lexicon relies on two main resources: (i) a parallel corpus, exploited through alignment; and (ii) the linguistic tools already developed for Spanish, which serve as a starting point for developing tools for Arabic. As a result, aligned equivalent verbs are detected automatically from a Spanish-Arabic parallel corpus. Reaching this goal required several preparatory stages: assessment of the parallel corpus, monolingual tokenization of each corpus, a preliminary sentence alignment, and finally application of the model for automatic extraction of equivalent verbs. Our method is hybrid, since it combines statistical and linguistic approaches.
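To make the final extraction stage concrete, the following is a minimal sketch of how equivalent verbs might be scored once sentences are aligned and verbs have been identified on each side. It is not the model used in the paper: the Dice coefficient, the function name `dice_verb_pairs`, the threshold, and the toy data (with roughly transliterated Arabic verbs) are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def dice_verb_pairs(aligned, threshold=0.5):
    """Score candidate verb pairs by co-occurrence in aligned sentences.

    `aligned` is a list of (spanish_verbs, arabic_verbs) tuples, one per
    aligned sentence pair. Verb identification is assumed to have been done
    beforehand by monolingual tools (the linguistic part of a hybrid setup).
    """
    es_count, ar_count, pair_count = Counter(), Counter(), Counter()
    for es_verbs, ar_verbs in aligned:
        es_set, ar_set = set(es_verbs), set(ar_verbs)
        es_count.update(es_set)                      # sentences containing each Spanish verb
        ar_count.update(ar_set)                      # sentences containing each Arabic verb
        pair_count.update(product(es_set, ar_set))   # sentences containing both

    scores = {}
    for (es, ar), joint in pair_count.items():
        # Dice coefficient: 2 * joint / (count_es + count_ar), in [0, 1]
        score = 2 * joint / (es_count[es] + ar_count[ar])
        if score >= threshold:
            scores[(es, ar)] = score
    return scores


# Toy aligned data (hypothetical, not taken from the corpus).
aligned = [
    (["decidir"], ["qarrara"]),
    (["decidir", "aprobar"], ["qarrara", "wafaqa"]),
    (["aprobar"], ["wafaqa"]),
]
lexicon = dice_verb_pairs(aligned, threshold=0.8)
for (es, ar), score in sorted(lexicon.items()):
    print(f"{es} <-> {ar}: {score:.2f}")
```

In this sketch the statistical score only ranks candidates that the linguistic preprocessing has already restricted to verbs; any other association measure could be substituted for the Dice coefficient.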


Parallel Corpora, Arabic Processing, Spanish Processing, Alignment


Spanish, Arabic

Full Paper