LREC 2012 Proceedings

Summary of the paper

Title	Turkish Paraphrase Corpus
Authors	Seniz Demir, Ilknur Durgar El-Kahlout, Erdem Unal and Hamza Kaya
Abstract	Paraphrases are alternative syntactic forms in the same language that express the same semantic content. Speakers of all languages are inherently familiar with paraphrases at different levels of granularity (lexical, phrasal, and sentential). For quite some time, paraphrasing has been attracting growing attention from the research community, and its potential use in several natural language processing applications (such as text summarization and machine translation) is being investigated. In this paper, we present what is, to the best of our knowledge, the first Turkish paraphrase corpus. The corpus is gleaned from four different sources and currently contains 1270 paraphrase pairs. All paraphrase pairs are carefully annotated by native Turkish speakers with the identified semantic correspondences between paraphrases. Work on expanding the corpus is still under way.
Topics	Corpus (creation, annotation, etc.), Textual Entailment and Paraphrasing, Other
Full paper	Turkish Paraphrase Corpus
Bibtex	@InProceedings{DEMIR12.968, author = {Seniz Demir and Ilknur Durgar El-Kahlout and Erdem Unal and Hamza Kaya}, title = {Turkish Paraphrase Corpus}, booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)}, year = {2012}, month = {may}, date = {23-25}, address = {Istanbul, Turkey}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis}, publisher = {European Language Resources Association (ELRA)}, isbn = {978-2-9517408-7-7}, language = {english} }