Lexical and Textual Resources for Sense Recognition and Description


Jerker Jaerborg (Spraakdata University of Goeteborg, Sweden)

Dimitrios Kokkinakis (Spraakdata University of Goeteborg, Sweden)

Maria Toporowska Gronostaj (Spraakdata University of Goeteborg, Sweden)


It is common knowledge that the creation of language resources for Language Engineering (LE) applications is a time-consuming, and hence expensive, enterprise. From this knowledge stems the demand for the re-usability of resources, which remains essential. In this paper we will, however, concentrate on another, complementary aspect, namely that of combining and extending existing resources by a variety of means and with a minimum of manual interaction. The resources to be discussed below consist of (i) a large lexical database, (ii) a formalized computational lexicon, and (iii) a sense-tagged corpus for Swedish. Some results concerning the semi-automatic annotation of the corpus, and examples of the phenomena analysed, such as compounding, will also be given. The annotation has been performed within the framework of the SemTag project, and part of this material has been successfully used in the SENSEVAL-2 exercise. In addition to these three resources, there is the background material of the Swedish Language Bank (some hundred million words), which forms the basis for the creation of (i) and partly (ii). Having been developed at our department, the lexical resources can easily be accessed and, more importantly, can be systematically improved where necessary. It should be noted that this type of work requires close cooperation between specialists in lexicography and language technology.


Lexical semantics, Semantic tagging, Dictionary senses, Semantic resources

