Title Researching Less-Resourced Languages – the DigiSami Corpus
Authors Kristiina Jokinen
Abstract Increased use of digital devices and data repositories has enabled a digital revolution in data collection and language research, and has also led to important activities supporting speech and language technology research for less-resourced languages. This paper describes the DigiSami project and its research results, focussing on spoken corpus collection and speech technology for the Fenno-Ugric language North Sami. The paper also discusses multifaceted questions on ethics and privacy related to data collection for less-resourced languages and indigenous communities.
Topics Corpus (Creation, Annotation, Etc.), Other, LR National/International Projects, Infrastructural/Policy Issues
Full paper Researching Less-Resourced Languages – the DigiSami Corpus
