Summary of the paper

Title An Annotation Scheme and Gold Standard for Dutch-English Word Alignment
Authors Lieve Macken
Abstract The importance of sentence-aligned parallel corpora has been widely acknowledged. Reference corpora in which sub-sentential translational correspondences are indicated manually are more labour-intensive to create, and hence less wide-spread. Such manually created reference alignments -- also called Gold Standards -- have been used in research projects to develop or test automatic word alignment systems. In most translations, translational correspondences are rather complex; for example word-by-word correspondences can be found only for a limited number of words. A reference corpus in which those complex translational correspondences are aligned manually is therefore also a useful resource for the development of translation tools and for translation studies. In this paper, we describe how we created a Gold Standard for the Dutch-English language pair. We present the annotation scheme, annotation guidelines, annotation tool and inter-annotator results. To cover a wide range of syntactic and stylistic phenomena that emerge from different writing and translation styles, our Gold Standard data set contains texts from different text types. The Gold Standard will be publicly available as part of the Dutch Parallel Corpus.
Topics Corpus (creation, annotation, etc.), Multilinguality, Machine Translation, SpeechToSpeech Translation
Full paper An Annotation Scheme and Gold Standard for Dutch-English Word Alignment
Slides -
Bibtex @InProceedings{MACKEN10.100,
  author = {Lieve Macken},
  title = {An Annotation Scheme and Gold Standard for Dutch-English Word Alignment},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
Powered by ELDA © 2010 ELDA/ELRA