Title: Evaluation Set for Slovak News Information Retrieval
Authors: Daniel Hládek, Ján Staš and Jozef Juhár
Abstract: This work proposes an information retrieval evaluation set for the Slovak language. It consists of 80 queries written in natural language, together with the documents relevant to them. The document collection contains 3980 newspaper articles sorted into six categories. Each document in the result set is manually annotated for relevance to its corresponding query. The evaluation set is largely compatible with the Cranfield test collection, as it follows the same methodology for queries and relevance annotation. In addition, it provides annotations of document title, author, publication date and category, which can be used to evaluate automatic document clustering and categorization.
Topics: Information Extraction, Information Retrieval, Evaluation Methodologies, Document Classification, Text Categorisation
