| Title | 
  Automatic Annotation and Manual Evaluation of the Diachronic German Corpus TüBa-D/DC | 
  
  
  | Authors | 
  Erhard Hinrichs and Thomas Zastrow | 
  
  
  | Abstract | 
  This paper presents the Tübingen Baumbank des Deutschen Diachron (TüBa-D/DC), a linguistically annotated corpus of selected diachronic materials from the German Gutenberg Project. It was automatically annotated by a suite of NLP tools integrated into WebLicht, the linguistic chaining tool used in CLARIN-D. The annotation quality has been evaluated manually for a subcorpus ranging from Middle High German to Modern High German. The integration of the TüBa-D/DC into the CLARIN-D infrastructure includes metadata provision and harvesting as well as sustainable data storage in the Tübingen CLARIN-D center. The paper further provides an overview of the options for accessing the TüBa-D/DC data. Methods for full-text search of the metadata and object data and for annotation-based search of the object data are described in detail. The WebLicht service-oriented architecture is used as an integrated environment for annotation-based search of the TüBa-D/DC. WebLicht thus serves not only as the annotation platform for the TüBa-D/DC, but also as a generic user interface for accessing and visualizing it. | 
  
  
  | Topics | 
  Corpus (creation, annotation, etc.), Grammar and Syntax, Part of speech tagging   | 
  
  
  | Full paper | 
  Automatic Annotation and Manual Evaluation of the Diachronic German Corpus TüBa-D/DC | 
  
  
  | Bibtex | 
  @InProceedings{HINRICHS12.166,
    author    = {Erhard Hinrichs and Thomas Zastrow},
    title     = {Automatic Annotation and Manual Evaluation of the Diachronic German Corpus TüBa-D/DC},
    booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
    year      = {2012},
    month     = {may},
    date      = {23-25},
    address   = {Istanbul, Turkey},
    editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
    publisher = {European Language Resources Association (ELRA)},
    isbn      = {978-2-9517408-7-7},
    language  = {english}
  }   |