Using a Parallel Transcript/Subtitle Corpus for Sentence Compression
Vincent Vandeghinste (1), Erik Tjong Kim Sang (2)
(1) Centre for Computational Linguistics, KULeuven, Belgium; (2) CNTS, University of Antwerp, Belgium
This paper describes the construction and use of a parallel corpus that pairs transcripts of television programs with the subtitles of those same programs. The subtitles are intended for hearing-impaired viewers and are in the same language as the programs (Dutch). Our goal is to convert transcripts into subtitles. We will use the corpus to learn how to perform sentence compression, in much the same way as Jing (2001).
subtitling, sentence compression, sentence alignment, hearing-impaired, sentence reduction