Using the Web as a Linguistic Resource for Learning Reformulations Automatically


Florence Duclaye (France Télécom R&D - ENST)

François Yvon (ENST)

Olivier Collin (France Télécom R&D)


WP1: Corpora & Corpus Tools


The use of paraphrases as a potential way to improve question answering, machine translation or automatic text summarization systems has long attracted the interest of researchers in natural language processing. However, manually entering reformulations into a system is a tedious and time-consuming process, if not an endless one. In this paper, we introduce a learning machinery aimed at acquiring reformulations automatically. Our system uses the Web as a linguistic resource and takes advantage of the results of an existing question answering system. Starting with one single prototypical argument tuple of a given semantic relation, our system first searches for potential alternative formulations of the relation, then finds new potential argument tuples, and iterates this process to progressively validate the candidate formulations. This learning process combines an acquisition stage, whose goal is to retrieve new evidences from Web pages, and a validation stage, whose role is to filter out noise and discard invalid paraphrases. After justifying the use of the Web as a linguistic resource, we describe our system, and report on primary results on a series of test semantic relations.


Reformulation, Paraphrase, Machine learning, Bootstrapping

Full Paper