Title	Linguistic Corpus Search
Author(s)	Christian Biemann (1), Uwe Quasthoff (1), Christian Wolff (2); (1) Leipzig University, Computer Science Institute, Natural Language Processing Dept., Augustusplatz 10/11, 04109 Leipzig, Germany; (2) Regensburg University, Institute for Media, Information and Cultural Studies, Media Computing Dept., Universitätsstr. 31, 93040 Regensburg, Germany
Session	P1-W
Abstract	Searching corpora with linguistic questions requires both additional information encoded in the corpus and the efficiency of “traditional” search engines. We describe a search-engine-like approach to querying plain as well as part-of-speech-tagged monolingual corpora. This approach uses a “minimalist” query language that nevertheless allows powerful searches by optionally ignoring positional as well as inflectional features in the corpus sentences. Many queries can be formulated without detailed training via a simple web-based front-end. Relevant applications of this search tool in knowledge extraction are discussed as well.
Keyword(s)	Search, Indexing, linguistic constructions, large corpora
Language(s)	German, English, language-independent
Full Paper	546.pdf