LREC 2016 Proceedings

Summary of the paper

Title	UPPC - Urdu Paraphrase Plagiarism Corpus
Authors	Muhammad Sharjeel, Paul Rayson and Rao Muhammad Adeel Nawab
Abstract	Paraphrase plagiarism is a significant and widespread problem, and research shows that it is hard to detect. Several methods and automatic systems have been proposed to deal with it. However, evaluation and comparison of such solutions are not possible because of the unavailability of benchmark corpora with manual examples of paraphrase plagiarism. To deal with this issue, we present the novel development of a paraphrase plagiarism corpus containing simulated (manually created) examples in the Urdu language - a language widely spoken around the world. This resource is the first of its kind developed for the Urdu language, and we believe that it will be a valuable contribution to the evaluation of paraphrase plagiarism detection systems.
Topics	Corpus (Creation, Annotation, etc.), Textual Entailment and Paraphrasing, Evaluation Methodologies
Full paper	UPPC - Urdu Paraphrase Plagiarism Corpus
Bibtex	@InProceedings{SHARJEEL16.364,
  author = {Muhammad Sharjeel and Paul Rayson and Rao Muhammad Adeel Nawab},
  title = {UPPC - Urdu Paraphrase Plagiarism Corpus},
  booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
  year = {2016},
  month = {may},
  date = {23-28},
  location = {Portorož, Slovenia},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Sara Goggi and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Helene Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {978-2-9517408-9-1},
  language = {english}
}