Automatic paraphrasing based on parallel corpus for normalization


Mitsuo Shimohata (ATR Spoken Language Translation Research Laboratories)

Eiichiro Sumita (ATR Spoken Language Translation Research Laboratories)


WP1: Corpora & Corpus Tools


Natural language offers many ways to express the same meaning. This diversity causes difficulties in many fields of natural language processing. It can be reduced by normalizing synonymous expressions, that is, by replacing various synonymous expressions with a standard one. In this paper, we propose a method for automatically extracting paraphrases from a parallel corpus and using them for normalization. First, synonymous sentences are grouped by the equivalence of their translations. Then, synonymous expressions are extracted from the differences between synonymous sentences. The extracted expressions contain not only the interchangeable words but also their surrounding words, so that contextual conditions are taken into account. Our method has two advantages: 1) only a parallel corpus is required, and 2) various types of paraphrases can be acquired.
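The two extraction steps described above (grouping by translation, then extracting differences) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. Sentences are grouped only when their translations are string-identical. Words are aligned with a plain longest-common-subsequence DP as a stand-in for the paper's DP-matching. Each differing span is padded with one matched word on each side as its context. The names `group_by_translation`, `align`, `extract_paraphrases`, and the `context` parameter are hypothetical.

```python
from collections import defaultdict


def group_by_translation(pairs):
    """Group source sentences that share an identical translation.

    Assumption: `pairs` is a list of (source, translation) strings, and
    exact string equality of translations stands for "equivalence".
    """
    groups = defaultdict(set)
    for source, translation in pairs:
        groups[translation].add(source)
    # Only groups with two or more distinct sentences can yield paraphrases.
    return [sorted(g) for g in groups.values() if len(g) > 1]


def align(a, b):
    """Return (i, j) index pairs of matched words between word lists a and b.

    Hypothetical stand-in for DP-matching: an LCS table where lcs[i][j]
    is the LCS length of the suffixes a[i:] and b[j:].
    """
    n, m = len(a), len(b)
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Trace one optimal path through the table from the start.
    matches, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            matches.append((i, j))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1
        else:
            j += 1
    return matches


def extract_paraphrases(s1, s2, context=1):
    """Return (expr1, expr2) pairs, one per differing span.

    Each span is padded with up to `context` words on either side, an
    assumed approximation of the paper's contextual condition.
    """
    a, b = s1.split(), s2.split()
    # Sentinel anchors at both ends so every gap is bounded by two matches.
    anchors = [(-1, -1)] + align(a, b) + [(len(a), len(b))]
    results = []
    for (i0, j0), (i1, j1) in zip(anchors, anchors[1:]):
        if i1 - i0 == 1 and j1 - j0 == 1:
            continue  # consecutive matches: no difference between them
        lo_a, hi_a = max(i0 + 1 - context, 0), min(i1 + context, len(a))
        lo_b, hi_b = max(j0 + 1 - context, 0), min(j1 + context, len(b))
        results.append((" ".join(a[lo_a:hi_a]), " ".join(b[lo_b:hi_b])))
    return results


if __name__ == "__main__":
    # Toy corpus; "T" and "U" are placeholder translations.
    pairs = [
        ("please tell me the way to the station", "T"),
        ("please show me the way to the station", "T"),
        ("i want a room", "U"),
    ]
    for group in group_by_translation(pairs):
        print(extract_paraphrases(group[0], group[1]))
```

Run on the toy corpus, the two sentences sharing translation "T" yield the single pair `("please show me", "please tell me")`. The verb is extracted together with its neighbouring words rather than as a bare word pair. Choosing one side of each pair as the standard form would then give a normalization rule.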


Paraphrase, Parallel corpus, DP-Matching

