LREC 2000 2nd International Conference on Language Resources & Evaluation
 


Title Derivation in the Czech National Corpus
Authors Klímová Jana (Institute of Czech Language, Academy of Sciences of the Czech Republic, Letenská 4, 118 51 Praha, Czech Republic, jana.klimova@ff.cuni.cz)
Kocek Jan (Institute of the Czech National Corpus, Charles University, Faculty of Philosophy, Nám.J.Palacha 2, 116 38 Praha 1, Czech Republic, Jan.kocek@ff.cuni.cz)
Keywords Czech National Corpus, Derivation, Paradigmatic and Semantic Properties of Suffixes, Productivity of Suffixes
Session Session WO18 - Morphology in Lexical and Textual Resources
Full Paper 153.ps, 153.pdf
Abstract The aim of this paper is to describe one of the main means of Czech word formation: derivation. New Czech words are created by composition or by derivation (using prefixes or suffixes). Suffixes, which are added after the stem, are used much more frequently than prefixes, which stand before the stem. The most frequent suffixes will be classified according to their paradigmatic and semantic properties and according to the changes they cause in the stem. The research is carried out on the Czech National Corpus (CNC); the frequencies of the investigated suffixes illustrate their productivity in present-day Czech. This research is of particular value for a highly inflected language such as Czech. Possible applications include various NLP systems, e.g. spelling checkers and machine translation systems. The results of this work serve the computational processing of Czech word formation and, in the future, the creation of a Czech derivational dictionary.