Living off the land: The Web as a source of practice texts for learners of less prevalent languages


Kristina Nilsson (Computational Linguistics, Department of Linguistics, Stockholm University, SE-106 91 Stockholm, Sweden)

Lars Borin (Department of Linguistics, Uppsala University, Box 527, SE-751 20 Uppsala, Sweden)


WP1: Corpora & Corpus Tools


This study addresses how to automatically locate texts published on the World Wide Web in order to produce adequate and up-to-date learning materials for second language learners of Nordic languages. The Web is an excellent source of authentic text material, but the sheer amount of information available makes search services necessary. We are therefore developing Squirrel, a prototype Web meta-search service that collects text material in the Nordic languages according to language, topic and difficulty level. Our primary target group consists of exchange students at Nordic institutions of higher education and their language teachers, although in the longer term we would also like to serve minority language communities. We describe the basic implementation of Squirrel and present preliminary results from trying it out. Finally, we discuss the (lack of) Web resources in less prevalent languages, and how applications like Squirrel could fit into a second or foreign language learning situation.


Language resources, World Wide Web, Minority languages, Less prevalent languages, Computer-assisted language learning, Authentic learning materials, Second language learning, Readability, Written language identification, Query term extraction

Full Paper