Summary of the paper

Title Fivehundredmillionandone Tokens. Loading the AAC Container with Text Resources for Text Studies.
Authors Hanno Biber and Evelyn Breiteneder
Abstract The "AAC - Austrian Academy Corpus" is a diachronic German-language digital text corpus of more than 500 million tokens. The corpus comprises several thousand texts representing a wide range of text types. The primary research aim is to develop text language resources for the study of texts. For corpus linguistics and corpus-based language research, large text corpora need to be structured systematically. For this purpose the AAC makes use of the notion of a container. In the context of corpus research, a container is understood as a flexible system for the pragmatic representation, manipulation, modification and structured storage of annotated text items. The issue of representing a large corpus in formats that offer only limited space is paradigmatic for the general task of representing a language by a small collection of texts or a small sample of the language. Methods based on structural normalization and standardization have to be developed in order to provide useful instruments for text studies.
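The abstract defines the container only in general terms, as a system for the representation, manipulation, modification and structured storage of annotated text items. The Python sketch below illustrates one way such a container might be organized. It is not the AAC's implementation: the class names `TextItem` and `Container`, the annotation keys such as `text_type`, and the naive whitespace tokenization used for token counting are all hypothetical assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TextItem:
    """One annotated text in the corpus (hypothetical structure)."""
    item_id: str
    text: str
    annotations: dict[str, str] = field(default_factory=dict)

    def tokens(self) -> list[str]:
        # Naive whitespace tokenization (an assumption); a real corpus
        # would rely on a dedicated tokenizer for German.
        return self.text.split()


class Container:
    """Hypothetical container: stores, modifies and queries annotated text items."""

    def __init__(self) -> None:
        # Structured storage: items indexed by their identifier.
        self._items: dict[str, TextItem] = {}

    def add(self, item: TextItem) -> None:
        # Adds a new item or replaces an existing one with the same id.
        self._items[item.item_id] = item

    def annotate(self, item_id: str, key: str, value: str) -> None:
        # Modification: attach or overwrite an annotation on a stored item.
        self._items[item_id].annotations[key] = value

    def select(self, key: str, value: str) -> list[TextItem]:
        # Retrieval: all items whose annotation `key` equals `value`.
        return [i for i in self._items.values() if i.annotations.get(key) == value]

    def token_count(self) -> int:
        # Total number of whitespace-separated tokens across all items.
        return sum(len(i.tokens()) for i in self._items.values())

    def text_types(self) -> Counter:
        # Distribution of items over the (hypothetical) "text_type" annotation.
        return Counter(i.annotations.get("text_type", "unknown") for i in self._items.values())


if __name__ == "__main__":
    corpus = Container()
    corpus.add(TextItem("t1", "Ein kurzer Brief aus Wien", {"text_type": "letter"}))
    corpus.add(TextItem("t2", "Zwei Zeilen Text", {"text_type": "journal"}))
    corpus.annotate("t2", "year", "1910")
    print(corpus.token_count())  # 8
    print([i.item_id for i in corpus.select("text_type", "letter")])
```

Keeping annotations as a flat key-value mapping per item is a deliberate simplification; a corpus of the size described would more likely store items in an XML-based or database-backed format and stream them rather than hold them in memory.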
Topics LR national/international projects, organizational/policy issues, Usability, user satisfaction, Corpus (creation, annotation, etc.)
Bibtex @InProceedings{BIBER12.857,
  author = {Hanno Biber and Evelyn Breiteneder},
  title = {Fivehundredmillionandone Tokens. Loading the AAC Container with Text Resources for Text Studies.},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
 }
Powered by ELDA © 2012 ELDA/ELRA