Title Towards a Strategy for a Representation of Collocations - Extending the Danish PAROLE-lexicon
Authors Braasch Anna (Center for Sprogteknologi Njalsgade 80, DK-2300, Denmark, e-mail:
Olsen Sussi (Center for Sprogteknologi Njalsgade 80, DK-2300, Denmark, e-mail:
Keywords Collocation, NLP-Lexicon, PAROLE, Word Combinations
Session Session WP4 - Lexicon: Semantic and Multilingual Issues
Abstract We describe our attempts to formulate a pragmatic definition and a partial typology of the lexical category of 'collocation', taking both lexicographical and computational aspects into consideration. This provides a suitable basis for encoding collocations in an NLP-lexicon. Further, this paper explains the principles of an operational encoding strategy which is applied to a core section of the typology, namely to subtypes of verbal collocation. This strategy is adapted to a pre-defined lexicon model which has been developed in the PAROLE-project. The work is carried out within the framework of the STO-project, the aim of which is to extend the Danish PAROLE-lexicon. The encoding of collocations, in addition to single-word lemmas, greatly increases the lexical and linguistic coverage and thereby also the usability of the lexicon as a whole. Decisions concerning the selection of the most frequent types of collocation to be encoded are based on empirical data, i.e. corpus-based recognition. We present linguistic descriptions with a focus on some characteristic syntactic features of collocations that are observed in a newspaper corpus. We then give a few prototypical examples provided with formalised descriptions in order to illustrate the restriction features. Finally, we discuss the perspectives of the work done so far.