Title Exploiting Anchor Text as a Lexical Resource
Author(s) Peter Anick


Session P6-T
Abstract Anchor texts, the strings associated with hyperlinks on a web page, are currently used to express millions of referrals to sites and topics on the World Wide Web. We consider how these strings might be exploited as a lexical resource, particularly when viewed from the perspective of their target documents rather than their sources. We find that, for many target pages, the incoming anchors form a miniature corpus of reference expressions whose properties, both in relation to other target sites and to each other, can be used to mine lexical information.
Keyword(s) Anchor text, data mining, entity extraction, proper names, hyperlinks, world wide web
Language(s) English
Full Paper 756.pdf
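The target-centric view described in the abstract amounts to inverting the link graph. Instead of listing the anchors a page sends out, you collect every anchor that points at a given page. The Python sketch below is one minimal way to do this. It assumes the links are already available as hypothetical (source URL, anchor text, target URL) triples, with made-up example data. It also applies a simple lowercase and whitespace normalization of our own choosing, which is not necessarily the paper's method.

```python
from collections import Counter, defaultdict

# Hypothetical link records: (source_url, anchor_text, target_url).
# These URLs and anchors are illustrative only, not data from the paper.
links = [
    ("http://a.example/", "IBM", "http://www.ibm.com/"),
    ("http://b.example/", "International Business Machines", "http://www.ibm.com/"),
    ("http://c.example/", "ibm", "http://www.ibm.com/"),
    ("http://d.example/", "click here", "http://www.ibm.com/"),
    ("http://a.example/", "Python", "http://www.python.org/"),
]


def normalize(anchor: str) -> str:
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(anchor.lower().split())


def build_anchor_corpora(records):
    """Invert the links: map each target URL to a Counter of its incoming anchors."""
    corpora = defaultdict(Counter)
    for _source, anchor, target in records:
        text = normalize(anchor)
        if text:  # skip anchors that are empty after normalization
            corpora[target][text] += 1
    return corpora


corpora = build_anchor_corpora(links)
for target, anchors in corpora.items():
    # Each target's Counter is its "miniature corpus" of incoming anchor strings.
    print(target, anchors.most_common())
```

Generic anchors such as "click here" pass through this step unchanged. The sketch leaves out any filtering of them, as well as any further lexical mining over the per-target counts.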