LREC 2000 2nd International Conference on Language Resources & Evaluation


Title Learning Verb Subcategorization from Corpora: Counting Frame Subsets
Authors Zeman Daniel (Ústav formální a aplikované lingvistiky, Univerzita Karlova, Praha)
Sarkar Anoop (Department of Computer and Information Science, University of Pennsylvania, Philadelphia)
Keywords Corpus, Frames, Subcategorization, Syntax, Valency, Verb
Session Session WO6 - Acquisition of Lexical Information
Full Paper, 145.pdf
Abstract We present novel machine learning techniques for identifying subcategorization information for verbs in Czech. We compare three different statistical techniques applied to this problem. We show how the learning algorithm can be used to discover previously unknown subcategorization frames from the Czech Prague Dependency Treebank. The algorithm can then be used to label dependents of a verb in the Czech treebank as either arguments or adjuncts. Using our techniques, we achieve 88% accuracy on unseen parsed text.