HMMs for Automatic Phonetic Segmentation
Doroteo Torre Toledano (M.I.T. Laboratory for Computer Science, 200 Technology Sq., 02139 Cambridge, MA, USA)
Luis A. Hernández Gómez (Universidad Politécnica de Madrid, Ciudad Universitaria s/n, 28040 Madrid, SPAIN)
SP3 Annotation Tools: From Speech Segments To Dialogues
This paper presents an analysis of the most frequently used approach to automatic phonetic segmentation: computing forced alignments using HMMs and features similar to those used in speech recognition. We start by analyzing the segmentation accuracy of context-dependent and context-independent HMMs and proposing an explanation for the results. We focus on the loss of correspondence between phones and context-dependent HMMs. This effect has previously been proposed to explain why context-dependent HMMs segment less accurately than context-independent ones, a surprising result given their clear superiority in speech recognition. We argue that this effect should lead to systematic segmentation errors. We therefore propose a new method, called Statistical Correction of Context Dependent Boundary Marks (SCCDBM), which partially corrects these systematic errors, making segmentation results for context-dependent HMMs followed by SCCDBM clearly superior to those obtained with context-independent HMMs. This observation empirically demonstrates the existence of systematic segmentation errors and adds empirical evidence to the explanation for the worse segmentation accuracy of context-dependent HMMs. Finally, we analyze how speaker adaptation improves segmentation accuracy while hardly modifying the systematic errors produced by context-dependent HMMs.
Phonetic segmentation, Speech recognition, Speech synthesis
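The abstract does not specify how SCCDBM computes its corrections, so the following is only an illustrative sketch of the general idea of correcting systematic boundary errors, not the authors' method. It assumes that the systematic error can be approximated by the mean deviation between automatic and manual boundary marks for each pair of adjacent phones, estimated on a manually segmented training set and then subtracted from new automatic boundaries. The function names, the data layout, and the choice of the mean as the bias estimate are all hypothetical.

```python
from collections import defaultdict


def estimate_biases(training):
    """Estimate the mean boundary error per phone transition.

    `training` is an iterable of (left_phone, right_phone, auto_ms, manual_ms)
    tuples, where auto_ms is the boundary from forced alignment and manual_ms
    is the reference boundary. This layout is an assumption for illustration.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for left, right, auto_ms, manual_ms in training:
        key = (left, right)
        sums[key] += auto_ms - manual_ms  # signed error of the automatic mark
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def correct_boundaries(boundaries, biases):
    """Subtract the estimated bias from each automatic boundary.

    `boundaries` is a list of (left_phone, right_phone, auto_ms) tuples.
    Transitions not seen during estimation are left uncorrected.
    """
    return [
        (left, right, auto_ms - biases.get((left, right), 0.0))
        for left, right, auto_ms in boundaries
    ]


if __name__ == "__main__":
    # Toy data in milliseconds, invented purely for illustration.
    training = [
        ("a", "t", 110, 100),
        ("a", "t", 230, 210),
        ("s", "a", 300, 305),
    ]
    biases = estimate_biases(training)
    print(correct_boundaries([("a", "t", 500), ("k", "o", 700)], biases))
```

If the errors were purely random, the estimated biases would be close to zero and this correction would change little. A consistent improvement after such a correction is the kind of evidence the abstract cites for the errors being systematic.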