SUMMARY : Session O23-SG Speech Corpora & Annotation


Title Dependency-structure Annotation to Corpus of Spontaneous Japanese
Authors K. Uchimoto, R. Hamabe, T. Maruyama, K. Takanashi, T. Kawahara, H. Isahara
Abstract In Japanese, syntactic structure of a sentence is generally represented by the relationship between phrasal units, or bunsetsus in Japanese, based on a dependency grammar. In the same way, the syntactic structure of a sentence in a large, spontaneous, Japanese-speech corpus, the Corpus of Spontaneous Japanese (CSJ), is represented by dependency relationships between bunsetsus. This paper describes the criteria and definitions of dependency relationships between bunsetsus in the CSJ. The dependency structure of the CSJ is investigated, and the difference in the dependency structures of written text and spontaneous speech is discussed in terms of the dependency accuracies obtained by using a corpus-based model. It is shown that the accuracy of automatic dependency-structure analysis can be improved if characteristic phenomena of spontaneous speech such as self-corrections, basic utterance units in spontaneous speech, and bunsetsus that have no modifiee are detected and used for dependency-structure analysis.
Keywords dependency structure, annotation, spontaneous speech corpus, annotation tool, dependency-structure analysis
