Securing Interpretability: The Case of Ega Language Documentation


Dafydd Gibbon (1), Catherine Bow (2), Steven Bird (2,3), Baden Hughes (2)

(1) Universität Bielefeld, Germany; (2) Department of Computer Science and Software Engineering, University of Melbourne, Australia; (3) Linguistic Data Consortium, University of Pennsylvania, USA




The prime consideration in designing sustainable language resources is to ensure that they remain interpretable for coming generations of users. In this paper we adopt a new perspective on resource creation: securing the interpretability of data. We illustrate it with a case study of Ega, an endangered African language for which a small amount of legacy data is available. The basic steps in securing interpretability are to transfer files to durable media and, where possible, to convert all legacy data into XML files with Unicode character encodings. In the absence of agreed 'best practice' standards, we propose a methodology of 'better practice' to assist in the transition towards this goal. We discuss a number of issues involved in securing the interpretability of the lexicon, character encodings, interlinear glossed text, annotated recordings and nomenclature in linguistic descriptions, and describe our solutions.


repository, reusability, sustainable resources, language documentation, metadata, ontology, legacy data

Language(s): Ega
