Summary of the paper

Title Towards Language Technology for Mi'kmaq
Authors Anant Maheshwari, Leo Bouscarrat and Paul Cook
Abstract Mi'kmaq is a polysynthetic Indigenous language spoken primarily in Eastern Canada, on which no prior computational work has focused. In this paper we first construct and analyze a web corpus of Mi'kmaq. We then evaluate several approaches to language modelling for Mi'kmaq, including character-level models that are particularly well suited to morphologically rich languages. Preservation of Indigenous languages is particularly important in the current Canadian context; we argue that natural language processing could aid such efforts.
Topics Language Modelling, Endangered Languages, Corpus (Creation, Annotation, Etc.)
