Current Developments of STO - The Danish Lexicon Project for NLP and HLT Applications
Anna Braasch (Center for Sprogteknologi, Copenhagen)
The Centre for Language Technology (Center for Sprogteknologi, CST) is in charge of a national project developing a large-scale Danish lexicon for HLT and NLP applications. The short name of the project is STO, which stands for SprogTeknologisk Ordbase (Lexical Database for Language Technology). The project is inspired by the principles and methods applied in the multilingual LE-PAROLE project (1996-98), the aim of which was to develop harmonised written language resources for 12 EU languages. The Danish PAROLE lexicon was produced by CST, and the STO project benefits greatly from the experience gained in that work. This paper deals with a few central tasks of the ongoing project. It discusses how a smaller lexical resource produced in a multilingual environment is being developed into a large-scale, monolingual resource. Two different methods of increasing the vocabulary are presented in detail, as are the extension of the linguistic coverage and the refinement of the linguistic description through more detailed language-specific information. Finally, some exploitation perspectives and the development of an internet-based user interface are presented. The STO project is funded by the Danish Ministry for Science, Technology and Development for a period of three years (2001-2004).
Comprehensive computational lexicon of Danish, NLP/HLT applications, Information content and structure, Current developments of lexical and linguistic coverage, Treatment of language-specific phenomena, Exploitation of the lexical resource