Querying dependency treebanks with XML


Gosse Bouma (Rijksuniversiteit Groningen)

Geert Kloosterman (Rijksuniversiteit Groningen)


WP4: Corpus Annotation


The need for manual editing during the construction of a treebank may impose constraints on the representation of dependency trees that are not optimal for linguistic exploration. Using XML technology, it is possible to maintain the treebank both in a form suitable for editing and in a form suitable for linguistic exploration. By choosing a compact representation, we can use XPath directly as the query language. We argue that, given an explicit encoding of string positions, this direct encoding of dependency trees as XML trees can represent discontinuous constituents in a way that supports queries involving both dependency relations and linear order.


XML, XPath, Dependency trees, Treebank
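
To make the idea in the abstract concrete, the following is a minimal sketch of a dependency tree encoded directly as an XML tree with explicit string positions. The element name `node`, the attributes `rel`, `cat`, `pos`, `word`, `begin` and `end`, the label values, and the example sentence are all assumptions chosen for illustration; the text above does not specify them. Python's standard `xml.etree.ElementTree` supports only a subset of XPath (no numeric comparisons), so the dependency part of each query is written as an XPath expression and the linear-order part is checked in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of "wat heeft Jan gelezen" ("what has Jan read").
# begin/end are string positions; the participle phrase (rel="vc") spans
# positions 0-4 but only contains the words at 0-1 and 3-4, so it is a
# discontinuous constituent.
SENTENCE = """
<node rel="top" cat="whq" begin="0" end="4">
  <node rel="hd" pos="verb" word="heeft" begin="1" end="2"/>
  <node rel="su" pos="noun" word="Jan" begin="2" end="3"/>
  <node rel="vc" cat="ppart" begin="0" end="4">
    <node rel="obj1" pos="pron" word="wat" begin="0" end="1"/>
    <node rel="hd" pos="verb" word="gelezen" begin="3" end="4"/>
  </node>
</node>
"""

ROOT = ET.fromstring(SENTENCE.strip())


def objects_before_head(root):
    """Return (object, head) word pairs where the object precedes its head."""
    hits = []
    for phrase in root.iter("node"):
        # Dependency part: XPath over the direct daughters of the phrase.
        obj = phrase.find("node[@rel='obj1']")
        head = phrase.find("node[@rel='hd']")
        if obj is None or head is None:
            continue
        # Linear-order part: compare the encoded string positions.
        if int(obj.get("begin")) < int(head.get("begin")):
            hits.append((obj.get("word"), head.get("word")))
    return hits


def is_discontinuous(node):
    """True if the words under node do not fill its whole begin-end span."""
    covered = sum(
        int(leaf.get("end")) - int(leaf.get("begin"))
        for leaf in node.iter("node")
        if leaf.get("word") is not None
    )
    return covered < int(node.get("end")) - int(node.get("begin"))


VC = ROOT.find("node[@rel='vc']")
print(objects_before_head(ROOT))
print(is_discontinuous(VC), is_discontinuous(ROOT))
```

With an XPath engine that supports numeric comparisons on attributes, the order check could presumably be folded into the XPath expression itself, which is closer to the direct use of XPath as a query language described above.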

