Acquiring Compact Lexicalized Grammars from a Cleaner Treebank
Julia Hockenmaier (Division of Informatics, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, United Kingdom)
Mark Steedman (Division of Informatics, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, United Kingdom)
We present an algorithm which translates the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations. To do this, we have had to make several systematic changes to the Treebank, which have the effect of cleaning up a number of errors and inconsistencies. This process has yielded a cleaner treebank that can potentially be used in any framework. We also show how unary type-changing rules for certain types of modifiers can be introduced into a CCG grammar to ensure a compact lexicon without increasing the generative power of the system. We demonstrate how the combination of preprocessing and type-changing rules minimizes the lexical coverage problem.