A Multilingual Database of Idioms
Aline Villavicencio (1), Timothy Baldwin (2), Benjamin Waldron (1)
(1) University of Cambridge Computer Laboratory (Villavicencio and Waldron); (2) CSLI, Stanford University (Baldwin)
This paper presents a possible architecture for a multilingual database of idioms. We discuss the challenges that idioms pose for building such a database and propose an encoding that maximises the amount of information that can be stored for different languages. Such a resource provides important information for linguistic, computational linguistic and psycholinguistic research, and allows phenomena to be compared across languages. This comparison can provide the basis for a better understanding of regularities in idioms across languages.
Idiom, lexical database, multiword expression