Multi-Tier Annotations in the Verbmobil Corpus
Karl Weilhammer (Institut für Phonetik und Sprachliche Kommunikation, Ludwig-Maximilians-Universität München)
Uwe Reichel (Institut für Phonetik und Sprachliche Kommunikation, Ludwig-Maximilians-Universität München)
Florian Schiel (Institut für Phonetik und Sprachliche Kommunikation, Ludwig-Maximilians-Universität München)
SP2: Speech Varieties And Multilingual ASR
In very large and diverse scientific projects, where groups as different as linguists and engineers work with different intentions on the same signal data or its orthographic transcript and annotate valuable new information, it is not easy to build a homogeneous corpus. We describe how this can be achieved, given that some of these annotations have not been updated properly or are based on erroneous or deliberately changed versions of the base transcription. We used an algorithm similar to dynamic programming to detect differences between the transcription on which an annotation depends and the reference transcription for the whole corpus. These differences are automatically mapped onto a set of repair operations for the transcriptions, such as splitting compound words and merging neighbouring words. The correction of the annotation is then carried out on the basis of these operations. Whether a correction can be carried out automatically or has to be made manually depends on the type of annotation as well as on the position and nature of the difference. Finally, we present an investigation in which we exploit the multi-tier annotations of the Verbmobil corpus to find out how breathing correlates with prosodic-syntactic boundaries and dialog acts.
Spontaneous speech, Aligning annotations, Breathing, Dialog corpus
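
The step of detecting differences between two transcription versions and mapping them onto repair operations can be illustrated with a minimal sketch. It is not the algorithm used for the Verbmobil corpus: Python's difflib.SequenceMatcher stands in for the alignment step, and the function name repair_operations, the operation labels and the example utterance are all assumptions made for illustration. A differing span is labelled "split" or "merge" when the concatenated words on both sides are identical apart from case.

```python
from difflib import SequenceMatcher


def repair_operations(annotated, reference):
    """Align two word lists and label each difference.

    Hypothetical helper: returns (operation, annotated_words, reference_words)
    tuples, where operation is one of "split", "merge", "substitute",
    "delete" or "insert".
    """
    ops = []
    # autojunk=False keeps frequent words from being ignored on long inputs.
    matcher = SequenceMatcher(a=annotated, b=reference, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old, new = annotated[i1:i2], reference[j1:j2]
        if tag == "replace":
            # Same characters on both sides, different word boundaries.
            same_text = "".join(old).lower() == "".join(new).lower()
            if same_text and len(old) < len(new):
                kind = "split"
            elif same_text and len(old) > len(new):
                kind = "merge"
            else:
                kind = "substitute"
        else:
            kind = tag  # "delete" or "insert"
        ops.append((kind, old, new))
    return ops


# Invented example: the annotation was made on an outdated transcription.
annotated = ["wir", "treffen", "uns", "am", "Dienstagnachmittag",
             "in", "Mün", "chen"]
reference = ["wir", "treffen", "uns", "am", "Dienstag", "Nachmittag",
             "in", "München"]

for op in repair_operations(annotated, reference):
    print(op)
```

In this sketch, "split" and "merge" are the kind of difference that could plausibly be propagated to a dependent annotation automatically, while "substitute", "delete" and "insert" would more likely be flagged for manual inspection, depending on the annotation type.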