HMMs for Automatic Phonetic Segmentation


Doroteo Torre Toledano (M.I.T. Laboratory for Computer Science, 200 Technology Sq., 02139 Cambridge, MA, USA)

Luis A. Hernández Gómez (Universidad Politécnica de Madrid, Ciudad Universitaria s/n, 28040 Madrid, SPAIN)


SP3 Annotation Tools: From Speech Segments To Dialogues


This paper presents an analysis of the most frequently used approach to automatic phonetic segmentation: computing forced alignments using HMMs and features similar to those used in speech recognition. We start by analyzing the segmentation accuracy of context-dependent and context-independent HMMs and proposing an explanation for the results. We focus on the loss of correspondence between phones and context-dependent HMMs. This effect has previously been proposed to explain the surprisingly worse segmentation accuracy of context-dependent HMMs, given their clear superiority in speech recognition. We argue that this effect should lead to systematic segmentation errors. We therefore propose a new method, called Statistical Correction of Context Dependent Boundary Marks (SCCDBM), which partially corrects these systematic errors, making segmentation results for context-dependent HMMs followed by SCCDBM clearly superior to those obtained with context-independent HMMs. This observation empirically confirms the existence of systematic segmentation errors and adds evidence for the proposed explanation of the worse segmentation accuracy of context-dependent HMMs. Finally, we analyze how speaker adaptation improves segmentation accuracy while hardly modifying the systematic errors produced by context-dependent HMMs.
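The idea of statistically correcting systematic boundary errors can be illustrated with a minimal sketch. The abstract does not specify how SCCDBM is computed, so everything below is an assumption made for illustration: the correction is taken to be the mean signed error per phone-pair boundary, estimated on manually segmented data, and the function names `estimate_boundary_bias` and `correct_boundaries` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def estimate_boundary_bias(training_boundaries):
    """Estimate the mean signed error for each boundary type.

    training_boundaries: iterable of
        (left_phone, right_phone, automatic_time, manual_time), times in seconds.
    Assumption: a boundary type is identified by its (left, right) phone pair.
    """
    errors = defaultdict(list)
    for left, right, auto_t, manual_t in training_boundaries:
        errors[(left, right)].append(auto_t - manual_t)
    return {pair: mean(values) for pair, values in errors.items()}


def correct_boundaries(boundaries, bias):
    """Subtract the estimated bias from each automatic boundary mark.

    boundaries: list of (left_phone, right_phone, automatic_time).
    Boundary types not seen during estimation are left unchanged.
    """
    return [
        (left, right, t - bias.get((left, right), 0.0))
        for left, right, t in boundaries
    ]


# Toy data (invented for illustration, not taken from the paper).
train = [
    ("a", "t", 0.110, 0.100),
    ("a", "t", 0.130, 0.120),
    ("s", "a", 0.200, 0.205),
]
bias = estimate_boundary_bias(train)
corrected = correct_boundaries([("a", "t", 0.510), ("k", "o", 0.300)], bias)
```

In this toy setting the forced alignment places every "a"-"t" boundary about 10 ms late, so the correction shifts new "a"-"t" boundaries earlier by that amount, while the unseen "k"-"o" boundary is returned unchanged. A correction of this kind can only help if the errors are systematic, which is the premise the abstract argues for.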


Phonetic segmentation, Speech recognition, Speech synthesis
