Title: An Annotated German-Language Medical Text Corpus as Language Resource
Author(s): Joachim Wermter, Udo Hahn

Text Knowledge Engineering Lab, Freiburg University, Werthmannplatz 1, D-79098 Freiburg, Germany

Abstract: We describe the structure of a German-language corpus that contains a variety of medical text genres. Clinical documents (discharge summaries and pathology, histology, and surgery reports) are distinguished from non-clinical ones (textbook articles and consumer health-care documents from a Web portal). We first introduce a medical extension of the general-language Stuttgart-Tübingen Tagset (STTS) that accounts for features unique to the medical sublanguage found in these documents. We then discuss some quantitative properties of the annotations (e.g., distribution patterns of part-of-speech tags).
Keyword(s): Text corpus, medical application, annotation, tagging, sublanguage
Language(s): German
