LREC 2018 Proceedings

Summary of the paper

Title	TSix: A Human-involved-creation Dataset for Tweet Summarization
Authors	Minh-Tien Nguyen, Dac Viet Lai, Huy-Tien Nguyen and Minh-Le Nguyen
Abstract	We present a new dataset for tweet summarization. The dataset includes six events collected from Twitter from October 10 to November 9, 2016. Our dataset features two prominent properties. First, human-annotated gold-standard references allow extractive summarization methods to be evaluated correctly. Second, tweets are assigned to sub-topics divided by consecutive days, which facilitates incremental tweet stream summarization methods. To reveal the potential usefulness of our dataset, we compare several well-known summarization methods. Experimental results indicate that among extractive approaches, hybrid term frequency -- document term frequency obtains competitive results in terms of ROUGE scores. The analysis also shows that polarity is an implicit factor of tweets in our dataset, suggesting that it can be exploited as a component besides tweet content quality in the summarization process.
Topics	Summarisation, Information Extraction, Information Retrieval, Corpus (Creation, Annotation, Etc.)
Full paper	TSix: A Human-involved-creation Dataset for Tweet Summarization
Bibtex	@InProceedings{NGUYEN18.516, author = {Minh-Tien Nguyen and Dac Viet Lai and Huy-Tien Nguyen and Minh-Le Nguyen}, title = "{TSix: A Human-involved-creation Dataset for Tweet Summarization}", booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)}, year = {2018}, month = {May 7-12, 2018}, address = {Miyazaki, Japan}, editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga}, publisher = {European Language Resources Association (ELRA)}, isbn = {979-10-95546-00-9}, language = {english} }