LREC 2000 2nd International Conference on Language Resources & Evaluation

Title The Concede Model for Lexical Databases
Authors Erjavec Tomaž (Dept. for Intelligent Systems, Jožef Stefan Institute, Ljubljana, Slovenia)
Evans Roger (Information Technology Research Institute, University of Brighton, Lewes Rd, Brighton, UK)
Ide Nancy (Department of Computer Science, Vassar College, Poughkeepsie, NY 12604-0520 USA)
Kilgarriff Adam (ITRI, University of Brighton, Brighton, England)
Keywords Dictionary, Lexical Database, TEI, Up-Translation, XML
Session Session WP1 - Lexicon
Full Paper, 335.pdf
Abstract The value of language resources is greatly enhanced if they share a common markup with an explicit minimal semantics. Achieving this goal for lexical databases is difficult, as large-scale resources can realistically only be obtained by up-translation from pre-existing dictionaries, each with its own proprietary structure. This paper describes the approach we have taken in the Concede project, which aims to develop compatible lexical databases for six Central and Eastern European languages. Starting with sample entries from original presentation-oriented electronic representations of dictionaries, we transformed the data into an intermediate TEI-compatible representation to provide a common baseline for evaluating and comparing the dictionaries. We then developed a more restrictive encoding, formalised as an XML DTD with a clearly defined semantic interpretation. We present this DTD and discuss a sample conversion from TEI, together with an application which hyperlinks an HTML representation of the dictionary to on-line concordancing over a corpus.