LREC 2000 2nd International Conference on Language Resources & Evaluation

Title An Open Architecture for the Construction and Administration of Corpora
Authors Orăsan Constantin (School of Humanities, Languages and Social Sciences, Stafford Street, University of Wolverhampton, Wolverhampton, WV1 1SB, United Kingdom),
Krishnamurthy Ramesh (Computational Linguistics Group, School of Humanities, Languages and Social Sciences, University of Wolverhampton, Stafford Street, Wolverhampton, WV1 1SB, United Kingdom)
Keywords Client-Server, Copyright, Corpora, Corpus Administration, Corpus Building, Modular Programming
Session Session WO12 - Language Resources: Infrastructural Issues
Full Paper 176.pdf
Abstract The use of language corpora for a variety of purposes has increased significantly in recent years. General corpora are now available for many languages, but research often requires more specialized corpora. The rapid development of the World Wide Web has greatly improved access to data in electronic form, but research has tended to focus on corpus annotation, rather than on corpus building tools. Therefore many researchers are building their own corpora, solving problems independently, and producing project-specific systems which cannot easily be re-used. This paper proposes an open client-server architecture which can service the basic operations needed in the construction and administration of corpora, but allows customisation by users in order to carry out project-specific tasks. The paper is based partly on recent practical experience of building a corpus of 10 million words of Written Business English from webpages, in a project which was co-funded by ELRA and the University of Wolverhampton.