Summary of the paper

Title: Using a Goodness Measurement for Domain Adaptation: A Case Study on Chinese Word Segmentation
Authors: Yan Song and Fei Xia
Abstract: Domain adaptation is an important topic in natural language processing. There has been extensive research on the topic, and various methods have been explored, including training data selection, model combination, and semi-supervised learning. In this study, we propose to use a goodness measure, namely description length gain (DLG), for domain adaptation in Chinese word segmentation. We demonstrate that DLG can help domain adaptation in two ways: as additional features for supervised segmenters to improve system performance, and as a similarity measure for selecting training data that better matches a test set. We evaluated our systems on the Chinese Penn Treebank version 7.0, which contains 1.2 million words from five different genres, and on the Chinese Word Segmentation Bakeoff-3 data.
Topics: Statistical and machine learning methods, Other, Part of speech tagging
Full paper: Using a Goodness Measurement for Domain Adaptation: A Case Study on Chinese Word Segmentation
BibTeX:
@InProceedings{SONG12.973,
  author = {Yan Song and Fei Xia},
  title = {Using a Goodness Measurement for Domain Adaptation: A Case Study on Chinese Word Segmentation},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
 }
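The abstract names description length gain (DLG) as its goodness measure but does not define it on this page. The sketch below assumes the standard formulation commonly attributed to Kit and Wilks (1999); whether the paper uses exactly this variant is an assumption. With a corpus X treated as a character sequence, DL(X) = -Σ c(x) log2(c(x)/|X|), and DLG(s) = DL(X) - DL(X[r→s] ⊕ s), where every occurrence of the candidate string s is replaced by a fresh symbol r and s itself is appended after a delimiter. The function names and the placeholder tokens `<r>` and `<d>` are hypothetical. How the paper turns DLG values into segmenter features, or into a similarity score for training-data selection, is not shown here.

```python
import math
from collections import Counter

# Hypothetical placeholder tokens; any symbols absent from the corpus would do.
NEW_SYMBOL = "<r>"
DELIMITER = "<d>"


def description_length(tokens):
    """Empirical description length DL(X) = -sum_x c(x) * log2(c(x) / |X|), in bits."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c * math.log2(c / n) for c in counts.values())


def replace_with_symbol(text, s):
    """Tokenize text, replacing each non-overlapping occurrence of s (scanned
    left to right) with NEW_SYMBOL; every other character stays a single token."""
    tokens, i = [], 0
    while i < len(text):
        if text.startswith(s, i):
            tokens.append(NEW_SYMBOL)
            i += len(s)
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def dlg(text, s):
    """DLG(s) = DL(X) - DL(X[r->s] + delimiter + s), X taken as a character sequence.
    Assumes the Kit and Wilks style definition, not necessarily the paper's exact variant."""
    if not s:
        raise ValueError("candidate string must be non-empty")
    before = description_length(list(text))
    after = description_length(replace_with_symbol(text, s) + [DELIMITER] + list(s))
    return before - after


def avg_dlg(text, s):
    """Average DLG: DLG(s) divided by the number of non-overlapping occurrences of s.
    Assumes s occurs at least once in text."""
    return dlg(text, s) / text.count(s)


if __name__ == "__main__":
    corpus = "abcabcabcabc"
    for candidate in ("abc", "ab", "a"):
        print(candidate, round(dlg(corpus, candidate), 3), round(avg_dlg(corpus, candidate), 3))
```

On this toy corpus, "abc" receives a positive DLG (treating it as a unit shortens the description), while "ab" and "a" receive negative values. A positive DLG is the intuition behind using the measure as evidence that a character string behaves like a word.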
Powered by ELDA © 2012 ELDA/ELRA