ELRA Validation Methodology and Standard Promotion for Linguistic Resources
Hanne Fersøe (1), Monica Monachini (2)
(1) Center for Sprogteknologi (CST) – Københavns Universitet, Njalsgade 80, Copenhagen, Denmark, firstname.lastname@example.org; (2) Istituto di Linguistica Computazionale (ILC) – Consiglio Nazionale delle Ricerche, Via Moruzzi 1, Pisa, Italy, email@example.com
This paper describes the results of work carried out for ELRA during 2003-2004. It presents the methodology for the validation of written language resources (WLRs), specifically lexica, which was developed for ELRA and tested on a few resources in the ELRA catalogue. It discusses key issues in lexicon creation and validation, such as the adoption of standards for the coding of linguistic content and the importance of documentation. It reports on the experience gained from applying the methodology to lexical resources in the ELRA catalogue, arguing that the checks must be reasonable, informative, at a suitable level of detail, and generic. It proposes a set of basic elements to be included in future discussions on establishing standards for lexical resources. In conclusion, it sketches the work to be undertaken in 2004 to promote validation and the adoption of standards.
validation, lexica, written language resources, standards, ELRA