Summary of the paper

Title Researching Less-Resourced Languages – the DigiSami Corpus
Authors Kristiina Jokinen
Abstract Increased use of digital devices and data repositories has enabled a digital revolution in data collection and language research, and has also led to important activities supporting speech and language technology research for less-resourced languages. This paper describes the DigiSami project and its research results, focussing on spoken corpus collection and speech technology for the Fenno-Ugric language North Sami. The paper also discusses multifaceted questions on ethics and privacy related to data collection for less-resourced languages and indigenous communities.
Topics Corpus (Creation, Annotation, etc.), Other, LR National/International Projects, Infrastructural/Policy Issues
Full paper Researching Less-Resourced Languages – the DigiSami Corpus
Bibtex @InProceedings{JOKINEN18.954,
  author = {Kristiina Jokinen},
  title = "{Researching Less-Resourced Languages – the DigiSami Corpus}",
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {May 7-12, 2018},
  address = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {979-10-95546-00-9},
  language = {english}
}