Towards Best Practice for Multiword Expressions in Computational Lexicons


Nicoletta Calzolari (Istituto di Linguistica Computazionale, CNR, Pisa, Italy)

Charles J. Fillmore (ICSI, University of California, Berkeley)

Ralph Grishman (New York University, USA)

Nancy Ide (Department of Computer Science, Vassar College, USA)

Alessandro Lenci (Università di Pisa, Italy)

Catherine MacLeod (New York University, USA)

Antonio Zampolli (Istituto di Linguistica Computazionale, CNR, Pisa, Italy)


WO19: Multi Word Expressions & Metaphors


The importance and role of multi-word expressions (MWEs) in the description and processing of natural language have long been recognized. However, multi-word information has often been relegated to the marginal role of idiosyncratic lexical information. The need for MWE lexicons grows even more acute for multi-lingual applications, for which (sometimes complex) correspondences must be identified, classified, and recorded. Within the XMELLT and ISLE projects we have started to investigate the potential for developing multi-lingual, multi-word expression lexicons that incorporate both syntactic and semantic information. We aim to specify means of acquiring and representing multi-word lexical entries for multiple languages, and to establish uniform (or inter-translatable) standards for describing such entries. We explored the theoretical approaches used in large lexicon-building projects, in particular FrameNet and SIMPLE. Both constitute interesting frameworks for the explicit syntactic and semantic representation of MWEs, mainly because of their ability to capture semantic multidimensionality, through frame elements in FrameNet and qualia relations in SIMPLE. We also developed an abstract data model for lexical information, together with an XML representation of it. Our goal is to define a set of minimal lexicon “objects” that can serve as a model not only for MWEs but also for lexical data in general.


Keywords: Multiword expressions, Semantic lexicons, Multilingual lexicons
