Querying dependency treebanks with XML
Gosse Bouma (Rijksuniversiteit Groningen)
Geert Kloosterman (Rijksuniversiteit Groningen)
WP4: Corpus Annotation
The need for manual editing during the construction of a treebank may impose constraints on the representation of dependency trees that are not optimal for linguistic exploration. Using XML technology, it is possible to maintain the treebank both in a form suitable for editing and in a form suitable for linguistic exploration. By choosing a compact representation, we can use XPath directly as a query language. We argue that, given an explicit encoding of string positions, this direct encoding of dependency trees as XML trees can represent discontinuous constituents in a way that supports queries involving both dependency relations and linear order.
XML, XPath, Dependency trees, Treebank
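
As an illustration of the idea summarized in the abstract, the following minimal sketch encodes one dependency tree directly as nested XML elements with explicit string positions and queries it with XPath. The element name `node`, the attributes `rel`, `cat`, `pos`, `word`, `begin` and `end`, the category labels, and the example sentence are all assumptions made for this sketch, not necessarily the encoding used in the treebank. Python's standard `xml.etree.ElementTree` supports only a subset of XPath, so the linear-order checks are done in Python here; a full XPath 1.0 processor could state such comparisons inside the query itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of "What has John read": each dependency node is an
# XML element, and begin/end give string positions (word offsets).
# The 'vc' node covers positions 0-1 and 3-4 only, so it is discontinuous.
SENTENCE = """
<node cat="sv1" begin="0" end="4">
  <node rel="hd" pos="verb" word="has" begin="1" end="2"/>
  <node rel="su" pos="noun" word="John" begin="2" end="3"/>
  <node rel="vc" cat="ppart" begin="0" end="4">
    <node rel="obj1" pos="pron" word="What" begin="0" end="1"/>
    <node rel="hd" pos="verb" word="read" begin="3" end="4"/>
  </node>
</node>
"""

root = ET.fromstring(SENTENCE)


def leaves(node):
    """Return all word-bearing nodes dominated by (or equal to) node."""
    return [n for n in node.iter("node") if n.get("word") is not None]


def words_in_order(node):
    """Recover the surface string from the explicit begin positions."""
    ordered = sorted(leaves(node), key=lambda n: int(n.get("begin")))
    return [n.get("word") for n in ordered]


def is_discontinuous(node):
    """A node is discontinuous if its words do not fill its whole span."""
    covered = sum(int(n.get("end")) - int(n.get("begin")) for n in leaves(node))
    start = min(int(n.get("begin")) for n in leaves(node))
    stop = max(int(n.get("end")) for n in leaves(node))
    return covered < stop - start


# Dependency query: direct objects inside a verbal complement.
objects = root.findall(".//node[@rel='vc']/node[@rel='obj1']")
object_words = [n.get("word") for n in objects]

# Combined dependency and linear-order query: verbal complements whose
# object precedes their head verb in the string.
fronted = [
    vc
    for vc in root.findall(".//node[@rel='vc']")
    for obj in vc.findall("node[@rel='obj1']")
    for hd in vc.findall("node[@rel='hd']")
    if int(obj.get("begin")) < int(hd.get("begin"))
]

print(object_words)
print(words_in_order(root))
print(is_discontinuous(fronted[0]))
```

Because word order is stored in the `begin` and `end` attributes rather than in the order of the XML elements, the element nesting can follow the dependency structure alone, and a discontinuous constituent needs no special markup.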