Summary of the paper

Title Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them
Authors Bruno Laranjeira, Viviane Moreira, Aline Villavicencio, Carlos Ramisch and Maria José Finatto
Abstract Comparable corpora have been used as an alternative to parallel corpora as resources for computational tasks that involve domain-specific natural language processing. One way to gather documents related to a specific topic of interest is to traverse a portion of the web graph in a targeted way, using focused crawling algorithms. In this paper, we compare several focused crawling algorithms by using them to collect comparable corpora on a specific domain. Then, we compare the evaluation of the focused crawling algorithms to the performance of linguistic processes executed after training with the corresponding generated corpora. We also propose a novel approach for focused crawling, exploiting the expressive power of multiword expressions.
Topics Evaluation Methodologies, Machine Translation, Speech-to-Speech Translation
Bibtex @InProceedings{LARANJEIRA14.1095,
  author = {Bruno Laranjeira and Viviane Moreira and Aline Villavicencio and Carlos Ramisch and Maria José Finatto},
  title = {Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
 }