A Comparative Study on Human Communication Behaviors and Linguistic Characteristics for Speech-to-Speech Translation
Toshiyuki Takezawa, Genichiro Kikui
ATR Spoken Language Translation Research Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
A large bilingual corpus of English and Japanese is being built at ATR Spoken Language Translation Research Laboratories in order to improve speech translation technology to the level where people can use a portable translation system in situations such as traveling abroad, dining, shopping, and staying at hotels. As part of these corpus construction activities, we have been collecting spoken dialogue data by using an experimental translation system between English and Japanese. In a previous study, we found that because humans communicate as part of their daily social life, they prefer using complex sentences and saying more than one sentence per utterance. However, corpus-based machine translation systems for conversational expressions tend to be limited to short, simple sentences. To find a way to bridge the gap between human communication behaviors and system performance, we examined the relationship between the instructions given to subjects and the linguistic expressions they used. The experimental results suggest that a state-of-the-art translation system may be useful for subjects who can keep their utterances short by following instructions.
Bilingual corpus, spoken language, speech translation, spoken dialogue, utterance length.