The Language Belongs to the People!


Cornelis H.A. Koster (1), Stefan Gradmann (2)

(1) Computing Science Institute, University of Nijmegen, The Netherlands, kees@cs.kun.nl; (2) Regionales Rechenzentrum der Universitaet, Hamburg, Germany, stephan.gradmann@rrz.uni-hamburg.de




For every natural language, computer-readable basic resources such as grammars and linguistic lexica are increasingly needed for educational, scientific and economic reasons. Especially in countries with a strong and modern economy, enormous efforts are invested in developing such resources, without common purpose or synergy. Property rights are jealously guarded by industrial and academic developers alike. Enormous amounts of subsidy are wasted on projects that cannot make use of the results of others, and whose results are either hidden or simply evaporate. Through small case studies we investigate the predicaments faced by academic linguistic researchers, research consortia, traditional publishers and application developers, and show that the prevailing restrictive attitude is counterproductive. It is impossible ever to earn back the (public) investment in linguistic resources; even the costs of distribution and payment collection exceed the budgets of academics, and industries feel compelled to invest in developing their own resources rather than build on resources made by others. The present situation of linguistic resources resembles that of software systems and applications in the eighties: on the one hand, many people try in vain to make some money from their intellectual work, hampering free exchange and collaboration; on the other hand, monopolies threaten to corner the market. The resemblance is close enough to suggest the same solution: basic linguistic resources that have been developed with public support should be made freely available in the public domain.
We therefore propose to treat such basic resources as Free Software, with the following advantages:
- distribution of the most recent version via FTP is very cheap;
- developers are invited to improve and extend the work of their predecessors, rather than repeat it, and to share the results;
- the quality of the resources is continuously improved, and the maintenance problem is solved;
- academic researchers are liberated from the problems of using proprietary resources;
- commercial use is possible without fuss and without (unpredictable) expenses.
As a good example, we discuss the AGFL project. With the aid of the Dutch Research Organisation NWO and (later) the NLnet foundation, the AGFL parser generator system for natural languages was brought under the GNU public licence in 2001. Since then it has been freely distributed, together with the EP4IR grammar and lexicon of English, which was developed in the course of two European projects, and a grammar and lexicon of Dutch. Similar grammars for Russian, German, Spanish and Modern Standard Arabic are under development by other groups.


Basic Language Resource, open source, free software.

Language(s) N/A
Full Paper