A Two-level Morphological Analyser and Generator for Irish using Finite-State Transducers
Elaine Uí Dhonnchadha (Institiúid Teangeolaíochta Éireann, 31 Plás Mhic Liam, Baile Átha Cliath 2, Éire, and Dublin City University, Glasnevin, Dublin 11, Ireland)
WP5: Components & Systems
Computational morphology is an important part of natural language processing. Finite-state techniques have been applied successfully to the computational phonology and morphology of many of the world’s major languages. Celtic languages such as Modern Irish present challenging morphological features that, to date, have not been addressed using finite-state technology. This paper presents a finite-state two-level morphology of Irish developed using the Xerox Finite-State Tools. The system encodes the inflectional morphology of all inflected parts of speech in Modern Irish. The morphotactics of stems and affixes are encoded in the lexicon, and word mutations are implemented as a series of replace rules written as regular expressions. Both the lexicons and the rules are compiled into finite-state transducers and combined to produce a single lexical transducer for the language. A major advantage of finite-state two-level implementations of morphology is their inherent bidirectionality: the same system is used for both analysis and generation of word forms. This resource can serve as a component in many NLP applications, such as spelling checkers and correctors, stemmers, and text-to-speech synthesisers. It can also be used for tokenising, lemmatising and part-of-speech tagging a corpus of text. The system, which is designed for broad coverage of the language, is evaluated against the most frequently used words in a corpus of contemporary Irish texts. Finally, possible extensions to the system are suggested, such as derivational morphology and the inclusion of dialectal or historical word forms.
Morphological analysis, Generator for Irish
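
The architecture summarised in the abstract (a lexicon, mutation rules, and a single bidirectional mapping built from both) can be illustrated with a small sketch. This is not the Xerox Finite-State Tools implementation and does not use real transducers: it is a toy in plain Python, assuming a three-noun lexicon, hypothetical tag names (`+Noun`, `+Len`, `+Ecl`), and simplified spelling rules for two Irish initial mutations, lenition and eclipsis. It enumerates the lexical/surface pairs once and then reads the same relation in either direction.

```python
import re

# Toy lexicon: lemma -> part-of-speech tag. Tag names are hypothetical.
LEXICON = {"bád": "+Noun", "cat": "+Noun", "teach": "+Noun"}

# Simplified spelling of two initial mutations (exceptions are ignored).
LENITABLE = "bcdfgmpt"
ECLIPSIS = {"b": "mb", "c": "gc", "d": "nd", "f": "bhf",
            "g": "ng", "p": "bp", "t": "dt"}


def lenite(stem):
    """Lenition: insert 'h' after a lenitable initial consonant."""
    return re.sub(rf"^([{LENITABLE}])", r"\1h", stem)


def eclipse(stem):
    """Eclipsis: prefix the eclipsing consonant(s) to the initial consonant."""
    return ECLIPSIS.get(stem[0], stem[0]) + stem[1:]


# Mutation tag -> rewrite rule applied to the surface side.
MUTATIONS = {"": lambda s: s, "+Len": lenite, "+Ecl": eclipse}


def build_relation():
    """Combine lexicon and rules into one set of (lexical, surface) pairs."""
    pairs = set()
    for lemma, pos in LEXICON.items():
        for tag, rule in MUTATIONS.items():
            pairs.add((lemma + pos + tag, rule(lemma)))
    return pairs


RELATION = build_relation()


def generate(lexical):
    """Lexical form -> surface forms (read the relation left to right)."""
    return sorted(s for l, s in RELATION if l == lexical)


def analyse(surface):
    """Surface form -> lexical forms (read the same relation right to left)."""
    return sorted(l for l, s in RELATION if s == surface)


if __name__ == "__main__":
    print(generate("bád+Noun+Len"))
    print(analyse("gcat"))
```

The point of the sketch is the single `RELATION`: `generate` and `analyse` share it, which mirrors the bidirectionality claimed for the lexical transducer. A real finite-state system represents this relation compactly as an automaton rather than enumerating pairs, so it scales to a full lexicon.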