Translation Unit concerning Timing of Simultaneous Translation


Hideki KASHIOKA (ATR Spoken Language Translation Research Laboratories)


SO2: Speech To Speech Translation


This paper proposes and discusses a translation unit for simultaneous translation by a machine translation system. The target of simultaneous speech translation here is monologues, such as lectures or broadcast news. To date, much of the research on speech translation has dealt with dialogues, especially travel conversations, and most speech translation systems have treated the sentence as the translation unit. In the ATR travel conversation database, sentences average fewer than 10 words; most sentences are therefore simple, and almost all utterances consist of one or two sentences. Sentences in monologues, however, are longer than those in travel dialogues: they average over 30 words (as in ``ASU-wo-YOMU,'' a TV news commentary program), and most are complex or compound. It is therefore difficult to treat the sentence as the translation unit for monologues, and an appropriate translation unit needs to be found. We hypothesized that an adequate translation unit for speech translation systems is related to the translation unit used by human simultaneous translators. We therefore collected simultaneous translation data produced by human translators for lectures and investigated the characteristics of monologues and simultaneous translatio


Machine translation, Simultaneous translation, Translation unit, Parallel corpus, Monologue
