Phonological Treebanks - Issues in Generation and Application
Moritz Neugebauer, Stephen Wilson
Department of Computer Science, University College Dublin
The continuing popularity of XML as a data exchange format and the concurrent rise of treebanks as natural language resources within various domains of language processing have led us to extend their domain of application to phonological data. Typically, treebanks are a language resource that provides annotations of natural languages at various levels of structure and in this paper we present a tree-based format to capture phonological information at the syllable level, at the segment level and even including more fine-grained featural information. Two integrated modules in relation to phonological treebanks are described: the first uses a multilingual feature set to augment segment-annotated corpora in terms of a tree-based structure represented in XML. The second module allows these feature trees to be traversed and the data contained in it to be optimised in a purely data-driven manner.
XML, Annotation, Treebanks, Phonology