Lexical Analysis of Agglutinative Languages Using a Dictionary of Lemmas and Lexical Transducers


Sun-Mee Bae, Key-Sun Choi

Division of Computer Science, Dept. of EECS, Korea Advanced Institute of Science and Technology




This paper presents a simple method for the lexical analysis of agglutinative languages such as Korean, which have rich morphology. In particular, for nouns and adverbs, whose morphological modifications are regular and/or highly productive, there is no need to artificially construct huge dictionaries of all inflected forms of lemmas. To construct a dictionary of lemmas and the lexical transducers, we first automatically build a dictionary of all inflected forms from the KAIST POS-Tagged Corpus. Second, we separate the lemma part from the part consisting of sequences of inflectional suffixes. Third, we describe lexical transducers (i.e., morphological rules) that recognize all inflected forms of noun and adverb lemmas according to the combinatorial restrictions between lemmas and their inflectional suffixes. Finally, we evaluate the advantages of this method.
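
The steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sample entries, the tag names, and the names `TAGGED_FORMS`, `build_dictionaries`, `REQUIRES_FINAL_CONSONANT`, and `analyze` are all assumptions made for illustration, and plain dictionary lookups stand in for real finite-state lexical transducers. The only combinatorial restriction modeled is the well-known alternation of Korean particles such as 은/는 depending on whether the lemma ends in a final consonant; the restrictions used in the paper may differ.

```python
from collections import defaultdict

# Hypothetical segmented entries, as they might be extracted from a
# POS-tagged corpus: (lemma, POS class, sequence of inflectional suffixes).
TAGGED_FORMS = [
    ("학교", "noun", ("에서",)),
    ("학교", "noun", ("에서", "는")),
    ("사람", "noun", ("은",)),
    ("사람", "noun", ("들", "이")),
    ("빨리", "adverb", ("도",)),
]

# Illustrative restriction table: True means the suffix needs a lemma ending
# in a final consonant, False means it needs a lemma ending in a vowel.
REQUIRES_FINAL_CONSONANT = {
    "은": True, "이": True, "을": True,
    "는": False, "가": False, "를": False,
}


def build_dictionaries(tagged_forms):
    """Split inflected forms into a lemma dictionary and, per POS class,
    a table mapping each suffix-sequence surface string to its segments."""
    lemmas = {}
    suffix_seqs = defaultdict(dict)
    for lemma, pos, suffixes in tagged_forms:
        lemmas[lemma] = pos
        suffix_seqs[pos]["".join(suffixes)] = suffixes
    return lemmas, dict(suffix_seqs)


def has_final_consonant(syllable):
    """Return True if a precomposed Hangul syllable has a final consonant."""
    code = ord(syllable) - 0xAC00
    # Precomposed syllables span U+AC00..U+D7A3, with 28 final slots each;
    # slot 0 means "no final consonant".
    return 0 <= code <= 11171 and code % 28 != 0


def analyze(word, lemmas, suffix_seqs):
    """Return every (lemma, POS, suffixes) reading licensed by the tables."""
    analyses = []
    for i in range(1, len(word) + 1):
        lemma, rest = word[:i], word[i:]
        pos = lemmas.get(lemma)
        if pos is None:
            continue
        if rest == "":
            analyses.append((lemma, pos, ()))
            continue
        suffixes = suffix_seqs.get(pos, {}).get(rest)
        if suffixes is None:
            continue
        # Check the restriction between the lemma and the adjacent suffix.
        needed = REQUIRES_FINAL_CONSONANT.get(suffixes[0])
        if needed is not None and needed != has_final_consonant(lemma[-1]):
            continue
        analyses.append((lemma, pos, suffixes))
    return analyses


LEMMAS, SUFFIX_SEQS = build_dictionaries(TAGGED_FORMS)
```

With these tables, `analyze("사람에서는", LEMMAS, SUFFIX_SEQS)` returns a reading even though that inflected form never occurs in the sample, because the lemma and the suffix sequence are stored separately. `analyze("학교은", LEMMAS, SUFFIX_SEQS)` returns an empty list, since the restriction table rejects 은 after a vowel-final lemma.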


lexical analysis, dictionary of lemmas, dictionary of inflected forms, lexical transducer, morphological parser


Korean (agglutinative language)

Full Paper