Evaluation Corpora for Sense Disambiguation in the Medical Domain
Diana Raileanu (DFKI GmbH, Stuhlsatzenhausweg 3, 66123 Saarbrücken, Germany)
Paul Buitelaar (DFKI GmbH, Stuhlsatzenhausweg 3, 66123 Saarbrücken, Germany)
Spela Vintar (DFKI GmbH, Stuhlsatzenhausweg 3, 66123 Saarbrücken, Germany)
Jörg Bay (Zinfo, University of Frankfurt, 60590 Frankfurt am Main, Germany)
An important aspect of word sense disambiguation is the evaluation of different methods and parameters. Unfortunately, test sets for evaluation are scarce, particularly for languages other than English and even more so for specific domains such as medicine. Because our work focuses on English as well as German text in the medical domain, we had to develop our own evaluation corpora to test our disambiguation methods. In this paper we describe the development of these corpora, which use GermaNet and UMLS as (lexical) semantic resources, and we describe KiC, the annotation tool that we developed to support the annotation task.