Summary of the paper

Title Similar Term Discovery using Web Search
Authors Peter Anick, Vijay Murthi and Shaji Sebastian
Abstract We present an approach to the discovery of semantically similar terms that utilizes a web search engine as both a source for generating related terms and a tool for estimating the semantic similarity of terms. The system works by associating with each document in the search engine’s index a weighted term vector comprising those phrases that best describe the document’s subject matter. Related terms for a given seed phrase are generated by running the seed as a search query and mining the result vector produced by averaging the weights of terms associated with the top documents of the query result set. The degree of similarity between the seed term and each related term is then computed as the cosine of the angle between their respective result vectors. We test the effectiveness of this approach for building a term recommender system designed to help online advertisers discover additional phrases to describe their product offering. A comparison of its output with that of several alternative methods finds it to be competitive with the best known alternative.
Language Multiple languages
Topics Information Extraction, Information Retrieval, Tools, systems, applications, Text mining
Full paper Similar Term Discovery using Web Search
Slides -
Bibtex @InProceedings{ANICK08.306,
  author = {Peter Anick and Vijay Murthi and Shaji Sebastian},
  title = {Similar Term Discovery using Web Search},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {},
  language = {english}
}
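
Illustration The pipeline described in the abstract (per-document weighted term vectors, a result vector averaged over the top documents for a query, and cosine similarity between result vectors) can be sketched roughly as follows. This is only a minimal toy sketch, not the authors' implementation: the in-memory index `DOC_VECTORS`, its weights, the `search` stand-in for a real search engine, and all function names are assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy index: each document id maps to a weighted term vector
# of phrases describing the document. All values are made up.
DOC_VECTORS = {
    "d1": {"digital camera": 0.9, "camera lens": 0.6, "photography": 0.4},
    "d2": {"digital camera": 0.7, "dslr": 0.8, "photography": 0.5},
    "d3": {"dslr": 0.9, "camera lens": 0.7, "tripod": 0.3},
    "d4": {"garden hose": 0.9, "sprinkler": 0.6},
}


def search(query, top_k=10):
    """Stand-in for a search engine (assumption): return the ids of the
    documents whose term vector contains the query, highest weight first."""
    hits = [(vec[query], doc_id)
            for doc_id, vec in DOC_VECTORS.items() if query in vec]
    hits.sort(reverse=True)
    return [doc_id for _, doc_id in hits[:top_k]]


def result_vector(query, top_k=10):
    """Average the term weights of the top documents returned for the query."""
    docs = search(query, top_k)
    if not docs:
        return {}
    totals = defaultdict(float)
    for doc_id in docs:
        for term, weight in DOC_VECTORS[doc_id].items():
            totals[term] += weight
    return {term: total / len(docs) for term, total in totals.items()}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_terms(seed, top_k=10):
    """Mine candidate terms from the seed's result vector, then score each
    candidate by the cosine between its result vector and the seed's."""
    seed_vec = result_vector(seed, top_k)
    scored = [(cosine(seed_vec, result_vector(term, top_k)), term)
              for term in seed_vec if term != seed]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, term in similar_terms("digital camera"):
        print(f"{term}\t{score:.3f}")
```

In this sketch the candidate terms come only from the seed's own result vector, so a phrase such as "garden hose", which never co-occurs with the seed in the top documents, is never proposed.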
