Summary of the paper

Title Finding Domain Terms using Wikipedia
Authors Jorge Vivaldi and Horacio Rodríguez
Abstract In this paper we present a new approach for obtaining the terminology of a given domain using the category and page structures of the Wikipedia in a language independent way. Our approach consists basically, for each domain, on navigating the Category graph of the Wikipedia starting from the root nodes associated to the domain. A heavy filtering mechanism is carried out for preventing as much as possible the inclusion of spurious categories. For each selected category all the pages belonging to it are then recovered and filtered. This procedure is iterate several times until achieving convergence. Both category names and page names are considered candidates to belong to the terminology of the domain. This approach has been applied to three broad coverage domains: astronomy, chemistry and medicine, and two languages, English and Spanish, showing a promising performance.
Topics Information Extraction, Information Retrieval, Knowledge Discovery/Representation, Other
Full paper Finding Domain Terms using Wikipedia
Slides Finding Domain Terms using Wikipedia
Bibtex @InProceedings{VIVALDI10.748,
  author = {Jorge Vivaldi and Horacio Rodríguez},
  title = {Finding Domain Terms using Wikipedia},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
Powered by ELDA © 2010 ELDA/ELRA