LREC 2014 Proceedings

Summary of the paper

Title	Crowdsourcing and Annotating NER for Twitter #drift
Authors	Hege Fromreide, Dirk Hovy and Anders Søgaard
Abstract	We present two new NER datasets for Twitter; a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets, (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible to "catch up" with language drift.
Topics	Social Media Processing, Crowdsourcing
Full paper	Crowdsourcing and Annotating NER for Twitter #drift
Bibtex	@InProceedings{FROMREIDE14.421,
  author = {Hege Fromreide and Dirk Hovy and Anders Søgaard},
  title = {Crowdsourcing and Annotating NER for Twitter #drift},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
}