LREC 2000 2nd International Conference on Language Resources & Evaluation

Title A Self-Expanding Corpus Based on Newspapers on the Web
Authors Hofland Knut (HIT Centre, University of Bergen, Allegt. 27, N-5007 Bergen, Norway)
Keywords Batch Download, Corpus, Newspapers, Web, Web-Based Concordance
Session Session WO15 - Language Resources Projects
Full Paper, 362.pdf
Abstract A Unix-based system is presented which automatically collects newspaper articles from the web, converts the texts, and includes them in a newspaper corpus. The corpus can be searched from a web browser. It currently contains 70 million words and grows by 4 million words each month.
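
The convert, include, and search steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's Unix-based implementation, which is not described on this page. The names `TextExtractor`, `html_to_text`, `tokenize`, and `concordance` are hypothetical, the sample page is invented for illustration, and the download step is left out so that the example runs without network access.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth of currently open script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Convert an HTML page to plain text with normalized whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def concordance(corpus, keyword, width=3):
    """Return (left, keyword, right) context tuples for each hit."""
    hits = []
    key = keyword.lower()
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            if tok == key:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, tok, right))
    return hits


# Hypothetical sample article; a real system would fetch pages in batch.
page = ("<html><body><h1>Bergen news</h1>"
        "<p>The corpus grows every month.</p>"
        "<script>var x=1;</script></body></html>")

corpus = []  # one token list per article
corpus.append(tokenize(html_to_text(page)))

for left, word, right in concordance(corpus, "corpus", width=2):
    print(f"{left} [{word}] {right}")
```

Holding the corpus as one token list per article keeps the sketch short. At the corpus size reported in the abstract, an indexed store would be the more plausible choice, but that is an assumption and not something this page states.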