Linguistic Corpus Search


Christian Biemann (1), Uwe Quasthoff (1), Christian Wolff (2)

(1) Leipzig University, Computer Science Institute, Natural Language Processing Dept., Augustusplatz 10/11, 04109 Leipzig, Germany. (2) Regensburg University, Institute for Media, Information and Cultural Studies, Media Computing Dept., Universitätsstr. 31, 93040 Regensburg, Germany




Searching corpora with linguistic questions requires both additional information encoded in the corpus and efficiency comparable to that of “traditional” search engines. We describe a search-engine-like approach to querying plain as well as part-of-speech-tagged monolingual corpora. The approach uses a ‘minimalist’ query language that nevertheless allows powerful searches by optionally ignoring positional as well as inflectional features in the corpus sentences. Many queries can be formulated without detailed training via a simple web-based front-end. Relevant applications of this search tool in knowledge extraction are also discussed.
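To illustrate what "optionally ignoring positional as well as inflectional features" can mean in practice, here is a minimal toy sketch. It is not the authors' query language or index. The class `ToyCorpusIndex`, the whitespace-separated query syntax and the tiny hand-written lemma table `LEMMAS` are all hypothetical stand-ins; a real system would use a morphological analyser and a much more compact index. Candidate sentences are found by intersecting inverted-index postings, as a conventional search engine would, and only those candidates are then checked for word order.

```python
from collections import defaultdict

# Hypothetical toy lemma table standing in for a real morphological analyser.
LEMMAS = {"goes": "go", "went": "go", "going": "go", "dogs": "dog"}


def lemma(token):
    """Return the base form of a token, or the lowercased token itself."""
    t = token.lower()
    return LEMMAS.get(t, t)


class ToyCorpusIndex:
    """Hypothetical inverted index over tokenised sentences."""

    def __init__(self, sentences):
        self.sentences = [s.split() for s in sentences]
        self.form_index = defaultdict(set)   # lowercased word form -> sentence ids
        self.lemma_index = defaultdict(set)  # lemma -> sentence ids
        for sid, tokens in enumerate(self.sentences):
            for t in tokens:
                self.form_index[t.lower()].add(sid)
                self.lemma_index[lemma(t)].add(sid)

    def search(self, query, *, ignore_order=False, ignore_inflection=False):
        """Return ids of sentences matching a whitespace-separated query."""
        norm = lemma if ignore_inflection else str.lower
        index = self.lemma_index if ignore_inflection else self.form_index
        terms = [norm(t) for t in query.split()]
        if not terms:
            return []
        # Candidates contain every query term somewhere (postings intersection).
        candidates = set.intersection(*(index.get(t, set()) for t in terms))
        hits = []
        n = len(terms)
        for sid in sorted(candidates):
            if ignore_order:
                hits.append(sid)
                continue
            # Positional check: terms must appear as a contiguous phrase in order.
            tokens = [norm(t) for t in self.sentences[sid]]
            if any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1)):
                hits.append(sid)
        return hits


CORPUS = ["The dog goes home", "Home went the dog", "Dogs are going home"]
idx = ToyCorpusIndex(CORPUS)
print(idx.search("dog goes"))                           # exact forms, in order
print(idx.search("dog goes", ignore_inflection=True))   # any inflection, in order
print(idx.search("dog goes", ignore_order=True, ignore_inflection=True))
```

Keeping the two switches independent mirrors the idea that position and inflection can each be relaxed separately; with both relaxed, the toy query `dog goes` matches all three sample sentences, while the strict query matches only the first.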


Keywords: search, indexing, linguistic constructions, large corpora

Language(s): German, English, language-independent
Full Paper