Proceedings of the 12th Language Resources and Evaluation Conference


Opening Addresses
Address by the LREC Chair
Nicoletta Calzolari
Address by the ELRA President
António Branco
Address by the ELRA Secretary General
Khalid Choukri
Address by the LREC 2020 Local Committee Chairs
Frédéric Béchet and Philippe Blache

pdf bib Papers pages
pdf bib Neural Mention Detection
Juntao Yu, Bernd Bohnet and Massimo Poesio
pp. 1‑10
pdf bib A Cluster Ranking Model for Full Anaphora Resolution
Juntao Yu, Alexandra Uma and Massimo Poesio
pp. 11‑20
pdf bib Mandarinograd: A Chinese Collection of Winograd Schemas
Timothée Bernard and Ting Han
pp. 21‑26
pdf bib On the Influence of Coreference Resolution on Word Embeddings in Lexical-semantic Evaluation Tasks
Alexander Henlein and Alexander Mehler
pp. 27‑33
pdf bib NoEl: An Annotated Corpus for Noun Ellipsis in English
Payal Khullar, Kushal Majmundar and Manish Shrivastava
pp. 34‑43
pdf bib An Annotated Dataset of Coreference in English Literature
David Bamman, Olivia Lewke and Anya Mansoor
pp. 44‑54
pdf bib GerDraCor-Coref: A Coreference Corpus for Dramatic Texts in German
Janis Pagel and Nils Reiter
pp. 55‑64
pdf bib A Study on Entity Resolution for Email Conversations
Parag Pravin Dakle, Takshak Desai and Dan Moldovan
pp. 65‑73
pdf bib Model-based Annotation of Coreference
Rahul Aralikatte and Anders Søgaard
pp. 74‑79
pdf bib French Coreference for Spoken and Written Language
Rodrigo Wilkens, Bruno Oberle, Frédéric Landragin and Amalia Todirascu
pp. 80‑89
pdf bib Cross-lingual Zero Pronoun Resolution
Abdulrahman Aloraini and Massimo Poesio
pp. 90‑98
pdf bib Exploiting Cross-Lingual Hints to Discover Event Pronouns
Sharid Loáiciga, Christian Hardmeier and Asad Sayeed
pp. 99‑103
pdf bib MuDoCo: Corpus for Multidomain Coreference Resolution and Referring Expression Generation
Scott Martin, Shivani Poddar and Kartikeya Upasani
pp. 104‑111
pdf bib Affection Driven Neural Networks for Sentiment Analysis
Rong Xiang, Yunfei Long, Mingyu Wan, Jinghang Gu, Qin Lu and Chu-Ren Huang
pp. 112‑119
pdf bib The Alice Datasets: fMRI & EEG Observations of Natural Language Comprehension
Shohini Bhattasali, Jonathan Brennan, Wen-Ming Luh, Berta Franzluebbers and John Hale
pp. 120‑125
pdf bib Modelling Narrative Elements in a Short Story: A Study on Annotation Schemes and Guidelines
Elena Mikhalkova, Timofei Protasov, Polina Sokolova, Anastasiia Bashmakova and Anastasiia Drozdova
pp. 126‑132
pdf bib Cortical Speech Databases For Deciphering the Articulatory Code
Harald Höge
pp. 133‑137
pdf bib ZuCo 2.0: A Dataset of Physiological Recordings During Natural Reading and Annotation
Nora Hollenstein, Marius Troendle, Ce Zhang and Nicolas Langer
pp. 138‑146
pdf bib Linguistic, Kinematic and Gaze Information in Task Descriptions: The LKG-Corpus
Tim Reinboth, Stephanie Gross, Laura Bishop and Brigitte Krenn
pp. 147‑155
pdf bib The ACQDIV Corpus Database and Aggregation Pipeline
Anna Jancso, Steven Moran and Sabine Stoll
pp. 156‑165
pdf bib Providing Semantic Knowledge to a Set of Pictograms for People with Disabilities: a Set of Links between WordNet and Arasaac: Arasaac-WN
Didier Schwab, Pauline Trial, Céline Vaschalde, Loïc Vial, Emmanuelle Esperanca-Rodier and Benjamin Lecouteux
pp. 166‑171
pdf bib Orthographic Codes and the Neighborhood Effect: Lessons from Information Theory
Stéphan Tulkens, Dominiek Sandra and Walter Daelemans
pp. 172‑181
pdf bib Understanding the Dynamics of Second Language Writing through Keystroke Logging and Complexity Contours
Elma Kerz, Fabio Pruneri, Daniel Wiechmann, Yu Qiao and Marcus Ströbel
pp. 182‑188
pdf bib Design of BCCWJ-EEG: Balanced Corpus with Human Electroencephalography
Yohei Oseki and Masayuki Asahara
pp. 189‑194
pdf bib Using the RUPEX Multichannel Corpus in a Pilot fMRI Study on Speech Disfluencies
Katerina Smirnova, Nikolay Korotaev, Yana Panikratova, Irina Lebedeva, Ekaterina Pechenkova and Olga Fedorova
pp. 195‑203
pdf bib Construction of an Evaluation Corpus for Grammatical Error Correction for Learners of Japanese as a Second Language
Aomi Koyama, Tomoshige Kiyuna, Kenji Kobayashi, Mio Arai and Mamoru Komachi
pp. 204‑211
pdf bib Effective Crowdsourcing of Multiple Tasks for Comprehensive Knowledge Extraction
Sangha Nam, Minho Lee, Donghwan Kim, Kijong Han, Kuntae Kim, Sooji Yoon, Eun-kyung Kim and Key-Sun Choi
pp. 212‑219
pdf bib Developing a Corpus of Indirect Speech Act Schemas
Antonio Roque, Alexander Tsuetaki, Vasanth Sarathy and Matthias Scheutz
pp. 220‑228
pdf bib Quality Estimation for Partially Subjective Classification Tasks via Crowdsourcing
Yoshinao Sato and Kouki Miyazawa
pp. 229‑235
pdf bib Crowdsourcing in the Development of a Multilingual FrameNet: A Case Study of Korean FrameNet
Younggyun Hahm, Youngbin Noh, Ji Yoon Han, Tae Hwan Oh, Hyonsu Choe, Hansaem Kim and Key-Sun Choi
pp. 236‑244
pdf bib Towards a Reliable and Robust Methodology for Crowd-Based Subjective Quality Assessment of Query-Based Extractive Text Summarization
Neslihan Iskender, Tim Polzehl and Sebastian Möller
pp. 245‑253
pdf bib A Seed Corpus of Hindu Temples in India
Priya Radhakrishnan
pp. 254‑258
pdf bib Do You Believe It Happened? Assessing Chinese Readers’ Veridicality Judgments
Yu-Yun Chang and Shu-Kai Hsieh
pp. 259‑267
pdf bib Creating Expert Knowledge by Relying on Language Learners: a Generic Approach for Mass-Producing Language Resources by Combining Implicit Crowdsourcing and Language Learning
Lionel Nicolas, Verena Lyding, Claudia Borg, Corina Forascu, Karën Fort, Katerina Zdravkova, Iztok Kosem, Jaka Čibej, Špela Arhar Holdt, Alice Millour, Alexander König, Christos Rodosthenous, Federico Sangati, Umair ul Hassan, Anisia Katinskaia, Anabela Barreiro, Lavinia Aparaschivei and Yaakov HaCohen-Kerner
pp. 268‑278
pdf bib MAGPIE: A Large Corpus of Potentially Idiomatic Expressions
Hessel Haagsma, Johan Bos and Malvina Nissim
pp. 279‑287
pdf bib CRWIZ: A Framework for Crowdsourcing Real-Time Wizard-of-Oz Dialogues
Francisco Javier Chiyah Garcia, José Lopes, Xingkun Liu and Helen Hastie
pp. 288‑297
pdf bib Effort Estimation in Named Entity Tagging Tasks
Inês Gomes, Rui Correia, Jorge Ribeiro and João Freitas
pp. 298‑306
pdf bib Using Crowdsourced Exercises for Vocabulary Training to Expand ConceptNet
Christos Rodosthenous, Verena Lyding, Federico Sangati, Alexander König, Umair ul Hassan, Lionel Nicolas, Jolita Horbacauskiene, Anisia Katinskaia and Lavinia Aparaschivei
pp. 307‑316
pdf bib Predicting Multidimensional Subjective Ratings of Children’s Readings from the Speech Signals for the Automatic Assessment of Fluency
Gérard Bailly, Erika Godde, Anne-Laure Piat-Marchand and Marie-Line Bosse
pp. 317‑322
pdf bib Constructing Multimodal Language Learner Texts Using LARA: Experiences with Nine Languages
Elham Akhlaghi, Branislav Bédi, Fatih Bektaş, Harald Berthelsen, Matthias Butterweck, Cathy Chua, Catia Cucchiarini, Gülşen Eryiğit, Johanna Gerlach, Hanieh Habibi, Neasa Ní Chiaráin, Manny Rayner, Steinþór Steingrímsson and Helmer Strik
pp. 323‑331
pdf bib A Dataset for Investigating the Impact of Feedback on Student Revision Outcome
Ildiko Pilan, John Lee, Chak Yan Yeung and Jonathan Webster
pp. 332‑339
pdf bib Creating Corpora for Research in Feedback Comment Generation
Ryo Nagata, Kentaro Inui and Shin’ichiro Ishikawa
pp. 340‑345
pdf bib Using Multilingual Resources to Evaluate CEFRLex for Learner Applications
Johannes Graën, David Alfter and Gerold Schneider
pp. 346‑355
pdf bib Immersive Language Exploration with Object Recognition and Augmented Reality
Benny Platte, Anett Platte, Christian Roschke, Rico Thomanek, Thony Rolletschke, Frank Zimmer and Marc Ritter
pp. 356‑362
pdf bib A Process-oriented Dataset of Revisions during Writing
Rianne Conijn, Emily Dux Speltz, Menno van Zaanen, Luuk Van Waes and Evgeny Chukharev-Hudilainen
pp. 363‑368
pdf bib Automated Writing Support Using Deep Linguistic Parsers
Luís Morgado da Costa, Roger V P Winder, Shu Yun Li, Benedict Christopher Lin Tzer Liang, Joseph Mackinnon and Francis Bond
pp. 369‑377
pdf bib TLT-school: a Corpus of Non Native Children Speech
Roberto Gretter, Marco Matassoni, Stefano Bannò and Daniele Falavigna
pp. 378‑385
pdf bib Toward a Paradigm Shift in Collection of Learner Corpora
Anisia Katinskaia, Sardana Ivanova and Roman Yangarber
pp. 386‑391
pdf bib Quality Focused Approach to a Learner Corpus Development
Roberts Darģis, Ilze Auziņa, Kristīne Levāne-Petrova and Inga Kaija
pp. 392‑396
pdf bib An Exploratory Study into Automated Précis Grading
Orphee De Clercq and Senne Van Hoecke
pp. 397‑404
pdf bib Adjusting Image Attributes of Localized Regions with Low-level Dialogue
Tzu-Hsiang Lin, Alexander Rudnicky, Trung Bui, Doo Soon Kim and Jean Oh
pp. 405‑412
pdf bib Alignment Annotation for Clinic Visit Dialogue to Clinical Note Sentence Language Generation
Wen-wai Yim, Meliha Yetisgen, Jenny Huang and Micah Grossman
pp. 413‑421
pdf bib MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines
Mihail Eric, Rahul Goel, Shachi Paul, Abhishek Sethi, Sanchit Agarwal, Shuyang Gao, Adarsh Kumar, Anuj Goyal, Peter Ku and Dilek Hakkani-Tur
pp. 422‑428
pdf bib A Comparison of Explicit and Implicit Proactive Dialogue Strategies for Conversational Recommendation
Matthias Kraus, Fabian Fischbach, Pascal Jansen and Wolfgang Minker
pp. 429‑435
pdf bib Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque
Arantxa Otegi, Aitor Agirre, Jon Ander Campos, Aitor Soroa and Eneko Agirre
pp. 436‑442
pdf bib Construction and Analysis of a Multimodal Chat-talk Corpus for Dialog Systems Considering Interpersonal Closeness
Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose and Akinori Ito
pp. 443‑448
pdf bib BLISS: An Agent for Collecting Spoken Dialogue Data about Health and Well-being
Jelte van Waterschoot, Iris Hendrickx, Arif Khan, Esther Klabbers, Marcel de Korte, Helmer Strik, Catia Cucchiarini and Mariët Theune
pp. 449‑458
pdf bib The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service
Meng Chen, Ruixue Liu, Lei Shen, Shaozu Yuan, Jingyan Zhou, Youzheng Wu, Xiaodong He and Bowen Zhou
pp. 459‑466
pdf bib "Cheese!": a Corpus of Face-to-face French Interactions. A Case Study for Analyzing Smiling and Conversational Humor
Béatrice Priego-Valverde, Brigitte Bigi and Mary Amoyal
pp. 467‑475
pdf bib The Margarita Dialogue Corpus: A Data Set for Time-Offset Interactions and Unstructured Dialogue Systems
Alberto Chierici, Nizar Habash and Margarita Bicec
pp. 476‑484
pdf bib How Users React to Proactive Voice Assistant Behavior While Driving
Maria Schmidt, Wolfgang Minker and Steffen Werner
pp. 485‑490
pdf bib Emotional Speech Corpus for Persuasive Dialogue System
Sara Asai, Koichiro Yoshino, Seitaro Shinagawa, Sakriani Sakti and Satoshi Nakamura
pp. 491‑497
pdf bib Multimodal Analysis of Cohesion in Multi-party Interactions
Reshmashree Bangalore Kantharaju, Caroline Langlet, Mukesh Barange, Chloé Clavel and Catherine Pelachaud
pp. 498‑507
pdf bib Treating Dialogue Quality Evaluation as an Anomaly Detection Problem
Rostislav Nedelchev, Ricardo Usbeck and Jens Lehmann
pp. 508‑512
pdf bib Evaluation of Argument Search Approaches in the Context of Argumentative Dialogue Systems
Niklas Rach, Yuki Matsuda, Johannes Daxenberger, Stefan Ultes, Keiichi Yasumoto and Wolfgang Minker
pp. 513‑522
pdf bib PATE: A Corpus of Temporal Expressions for the In-car Voice Assistant Domain
Alessandra Zarcone, Touhidul Alam and Zahra Kolagar
pp. 523‑530
pdf bib Mapping the Dialog Act Annotations of the LEGO Corpus into ISO 24617-2 Communicative Functions
Eugénio Ribeiro, Ricardo Ribeiro and David Martins de Matos
pp. 531‑539
pdf bib Estimating User Communication Styles for Spoken Dialogue Systems
Juliana Miehle, Isabel Feustel, Julia Hornauer, Wolfgang Minker and Stefan Ultes
pp. 540‑548
pdf bib The ISO Standard for Dialogue Act Annotation, Second Edition
Harry Bunt, Volha Petukhova, Emer Gilmartin, Catherine Pelachaud, Alex Fang, Simon Keizer and Laurent Prévot
pp. 549‑558
pdf bib The AICO Multimodal Corpus – Data Collection and Preliminary Analyses
Kristiina Jokinen
pp. 559‑564
pdf bib A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models
Fabian Galetzka, Chukwuemeka Uchenna Eneh and David Schlangen
pp. 565‑573
pdf bib A French Medical Conversations Corpus Annotated for a Virtual Patient Dialogue System
Fréjus A. A. Laleye, Gaël de Chalendar, Antonia Blanié, Antoine Brouquet and Dan Behnamou
pp. 574‑580
pdf bib Getting To Know You: User Attribute Extraction from Dialogues
Chien-Sheng Wu, Andrea Madotto, Zhaojiang Lin, Peng Xu and Pascale Fung
pp. 581‑589
pdf bib Augmenting Small Data to Classify Contextualized Dialogue Acts for Exploratory Visualization
Abhinav Kumar, Barbara Di Eugenio, Jillian Aurisano and Andrew Johnson
pp. 590‑599
pdf bib RDG-Map: A Multimodal Corpus of Pedagogical Human-Agent Spoken Interactions.
Maike Paetzel, Deepthi Karkada and Ramesh Manuvinakurike
pp. 600‑609
pdf bib MPDD: A Multi-Party Dialogue Dataset for Analysis of Emotions and Interpersonal Relationships
Yi-Ting Chen, Hen-Hsen Huang and Hsin-Hsi Chen
pp. 610‑614
pdf bib “Alexa in the wild” – Collecting Unconstrained Conversations with a Modern Voice Assistant in a Public Environment
Ingo Siegert
pp. 615‑619
pdf bib EDA: Enriching Emotional Dialogue Acts using an Ensemble of Neural Annotators
Chandrakant Bothe, Cornelius Weber, Sven Magg and Stefan Wermter
pp. 620‑627
pdf bib PACO: a Corpus to Analyze the Impact of Common Ground in Spontaneous Face-to-Face Interaction
Mary Amoyal, Béatrice Priego-Valverde and Stephane Rauzy
pp. 628‑633
pdf bib Dialogue Act Annotation in a Multimodal Corpus of First Encounter Dialogues
Costanza Navarretta and Patrizia Paggio
pp. 634‑643
pdf bib A Conversation-Analytic Annotation of Turn-Taking Behavior in Japanese Multi-Party Conversation and its Preliminary Analysis
Mika Enomoto, Yasuharu Den and Yuichi Ishimoto
pp. 644‑652
pdf bib Understanding User Utterances in a Dialog System for Caregiving
Yoshihiko Asao, Julien Kloetzer, Junta Mizuno, Dai Saiki, Kazuma Kadowaki and Kentaro Torisawa
pp. 653‑661
pdf bib Designing Multilingual Interactive Agents using Small Dialogue Corpora
Donghui Lin, Masayuki Otani, Ryosuke Okuno and Toru Ishida
pp. 662‑667
pdf bib Multimodal Corpus of Bidirectional Conversation of Human-human and Human-robot Interaction during fMRI Scanning
Birgit Rauchbauer, Youssef Hmamouche, Brigitte Bigi, Laurent Prévot, Magalie Ochs and Thierry Chaminade
pp. 668‑675
pdf bib The Brain-IHM Dataset: a New Resource for Studying the Brain Basis of Human-Human and Human-Machine Conversations
Magalie Ochs, Roxane Bertrand, Aurélie Goujon, Deirdre Bolger, Anne-Sophie Dubarry and Philippe Blache
pp. 676‑683
pdf bib Dialogue-AMR: Abstract Meaning Representation for Dialogue
Claire Bonial, Lucia Donatelli, Mitchell Abrams, Stephanie M. Lukin, Stephen Tratz, Matthew Marge, Ron Artstein, David Traum and Clare Voss
pp. 684‑695
pdf bib Relation between Degree of Empathy for Narrative Speech and Type of Responsive Utterance in Attentive Listening
Koichiro Ito, Masaki Murata, Tomohiro Ohno and Shigeki Matsubara
pp. 696‑701
pdf bib Intent Recognition in Doctor-Patient Interviews
Robin Rojowiec, Benjamin Roth and Maximilian Fink
pp. 702‑709
pdf bib BrainPredict: a Tool for Predicting and Visualising Local Brain Activity
Youssef Hmamouche, Laurent Prévot, Magalie Ochs and Thierry Chaminade
pp. 710‑716
pdf bib MTSI-BERT: A Session-aware Knowledge-based Conversational Agent
Matteo Antonio Senese, Giuseppe Rizzo, Mauro Dragoni and Maurizio Morisio
pp. 717‑725
pdf bib Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers
Kallirroi Georgila, Carla Gordon, Volodymyr Yanov and David Traum
pp. 726‑734
pdf bib Which Model Should We Use for a Real-World Conversational Dialogue System? a Cross-Language Relevance Model or a Deep Neural Net?
Seyed Hossein Alavi, Anton Leuski and David Traum
pp. 735‑742
pdf bib Chinese Whispers: A Multimodal Dataset for Embodied Language Grounding
Dimosthenis Kontogiorgos, Elena Sibirtseva and Joakim Gustafson
pp. 743‑749
pdf bib AMUSED: A Multi-Stream Vector Representation Method for Use in Natural Dialogue
Gaurav Kumar, Rishabh Joshi, Jaspreet Singh and Promod Yenigalla
pp. 750‑758
pdf bib An Annotation Approach for Social and Referential Gaze in Dialogue
Vidya Somashekarappa, Christine Howes and Asad Sayeed
pp. 759‑765
pdf bib A Penn-style Treebank of Middle Low German
Hannah Booth, Anne Breitbarth, Aaron Ecay and Melissa Farasyn
pp. 766‑775
pdf bib Books of Hours. The First Liturgical Data Set for Text Segmentation.
Amir Hazem, Beatrice Daille, Christopher Kermorvant, Dominique Stutzmann, Marie-Laurence Bonhomme, Martin Maarand and Mélodie Boillet
pp. 776‑784
pdf bib Corpus of Chinese Dynastic Histories: Gender Analysis over Two Millennia
Sergey Zinin and Yang Xu
pp. 785‑793
pdf bib The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study
Stefan Fischer, Jörg Knappen, Katrin Menzel and Elke Teich
pp. 794‑802
Annelen Brunner, Stefan Engelberg, Fotis Jannidis, Ngoc Duyen Tanja Tu and Lukas Weimer
pp. 803‑812
pdf bib WeDH - a Friendly Tool for Building Literary Corpora Enriched with Encyclopedic Metadata
Mattia Egloff and Davide Picca
pp. 813‑816
pdf bib Automatic Section Recognition in Obituaries
Valentino Sabbatino, Laura Ana Maria Bostan and Roman Klinger
pp. 817‑825
pdf bib SLäNDa: An Annotated Corpus of Narrative and Dialogue in Swedish Literary Fiction
Sara Stymne and Carin Östman
pp. 826‑834
pdf bib RiQuA: A Corpus of Rich Quotation Annotation for English Literary Text
Sean Papay and Sebastian Padó
pp. 835‑841
pdf bib A Corpus Linguistic Perspective on Contemporary German Pop Lyrics with the Multi-Layer Annotated "Songkorpus"
Roman Schneider
pp. 842‑848
pdf bib The BDCamões Collection of Portuguese Literary Documents: a Research Resource for Digital Humanities and Language Technology
Sara Grilo, Márcia Bolrinha, João Silva, Rui Vaz and António Branco
pp. 849‑854
pdf bib Dataset for Temporal Analysis of English-French Cognates
Esteban Frossard, Mickael Coustaty, Antoine Doucet, Adam Jatowt and Simon Hengchen
pp. 855‑859
pdf bib Material Philology Meets Digital Onomastic Lexicography: The NordiCon Database of Medieval Nordic Personal Names in Continental Sources
Michelle Waldispühl, Dana Dannells and Lars Borin
pp. 860‑867
pdf bib NLP Scholar: A Dataset for Examining the State of NLP Research
Saif M. Mohammad
pp. 868‑877
pdf bib The DReaM Corpus: A Multilingual Annotated Corpus of Grammars for the World’s Languages
Shafqat Mumtaz Virk, Harald Hammarström, Markus Forsberg and Søren Wichmann
pp. 878‑884
pdf bib LiViTo: Linguistic and Visual Features Tool for Assisted Analysis of Historic Manuscripts
Klaus Müller, Aleksej Tikhonov and Roland Meyer
pp. 885‑890
pdf bib TextAnnotator: A UIMA Based Tool for the Simultaneous and Collaborative Annotation of Texts
Giuseppe Abrami, Manuel Stoeckel and Alexander Mehler
pp. 891‑900
pdf bib Deduplication of Scholarly Documents using Locality Sensitive Hashing and Word Embeddings
Bikash Gyawali, Lucas Anastasiou and Petr Knoth
pp. 901‑910
pdf bib “Voices of the Great War”: A Richly Annotated Corpus of Italian Texts on the First World War
Federico Boschetti, Irene De Felice, Stefano Dei Rossi, Felice Dell’Orletta, Michele Di Giorgio, Martina Miliani, Lucia C. Passaro, Angelica Puddu, Giulia Venturi, Nicola Labanca, Alessandro Lenci and Simonetta Montemagni
pp. 911‑918
pdf bib DEbateNet-mig15: Tracing the 2015 Immigration Debate in Germany Over Time
Gabriella Lapesa, Andre Blessing, Nico Blokker, Erenay Dayanik, Sebastian Haunss, Jonas Kuhn and Sebastian Padó
pp. 919‑927
pdf bib A Corpus of Spanish Political Speeches from 1937 to 2019
Elena Álvarez-Mellado
pp. 928‑932
pdf bib A New Latin Treebank for Universal Dependencies: Charters between Ancient Latin and Romance Languages
Flavio Massimiliano Cecchini, Timo Korkiakangas and Marco Passarotti
pp. 933‑942
pdf bib Identification of Indigenous Knowledge Concepts through Semantic Networks, Spelling Tools and Word Embeddings
Renato Rocha Souza, Amelie Dorn, Barbara Piringer and Eveline Wandl-Vogt
pp. 943‑947
pdf bib A Multi-Orthography Parallel Corpus of Yiddish Nouns
Jonne Saleva
pp. 948‑952
pdf bib An Annotated Corpus of Adjective-Adverb Interfaces in Romance Languages
Katharina Gerhalter, Gerlinde Schneider, Christopher Pollin and Martin Hummel
pp. 953‑957
pdf bib Language Resources for Historical Newspapers: the Impresso Collection
Maud Ehrmann, Matteo Romanello, Simon Clematide, Phillip Benjamin Ströbel and Raphaël Barman
pp. 958‑968
pdf bib Allgemeine Musikalische Zeitung as a Searchable Online Corpus
Bernd Kampe, Tinghui Duan and Udo Hahn
pp. 969‑976
pdf bib Stylometry in a Bilingual Setup
Silvie Cinkova and Jan Rybicki
pp. 977‑984
pdf bib Dialect Clustering with Character-Based Metrics: in Search of the Boundary of Language and Dialect
Yo Sato and Kevin Heffernan
pp. 985‑990
pdf bib DiscSense: Automated Semantic Analysis of Discourse Markers
Damien Sileo, Tim Van de Cruys, Camille Pradel and Philippe Muller
pp. 991‑999
pdf bib ThemePro: A Toolkit for the Analysis of Thematic Progression
Monica Dominguez, Juan Soler and Leo Wanner
pp. 1000‑1007
pdf bib Machine-Aided Annotation for Fine-Grained Proposition Types in Argumentation
Yohan Jo, Elijah Mayfield, Chris Reed and Eduard Hovy
pp. 1008‑1018
pdf bib Chinese Discourse Parsing: Model and Evaluation
Chuan-An Lin, Shyh-Shiun Hung, Hen-Hsen Huang and Hsin-Hsi Chen
pp. 1019‑1024
pdf bib Shallow Discourse Annotation for Chinese TED Talks
Wanqiu Long, Xinyi Cai, James Reid, Bonnie Webber and Deyi Xiong
pp. 1025‑1032
pdf bib The Discussion Tracker Corpus of Collaborative Argumentation
Christopher Olshefski, Luca Lugini, Ravneet Singh, Diane Litman and Amanda Godley
pp. 1033‑1043
pdf bib Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection
Henny Sluyter-Gäthje, Peter Bourgonje and Manfred Stede
pp. 1044‑1050
pdf bib A Corpus of Encyclopedia Articles with Logical Forms
Nathan Rasmussen and William Schuler
pp. 1051‑1060
pdf bib The Potsdam Commentary Corpus 2.2: Extending Annotations for Shallow Discourse Parsing
Peter Bourgonje and Manfred Stede
pp. 1061‑1066
pdf bib On the Creation of a Corpus for Coherence Evaluation of Discursive Units
Elham Mohammadi, Timothe Beiko and Leila Kosseim
pp. 1067‑1072
pdf bib Joint Learning of Syntactic Features Helps Discourse Segmentation
Takshak Desai, Parag Pravin Dakle and Dan Moldovan
pp. 1073‑1080
pdf bib Creating a Corpus of Gestures and Predicting the Audience Response based on Gestures in Speeches of Donald Trump
Verena Ruf and Costanza Navarretta
pp. 1081‑1088
pdf bib GeCzLex: Lexicon of Czech and German Anaphoric Connectives
Lucie Poláková, Kateřina Rysová, Magdaléna Rysová and Jiří Mírovský
pp. 1089‑1096
pdf bib DiMLex-Bangla: A Lexicon of Bangla Discourse Connectives
Debopam Das, Manfred Stede, Soumya Sankar Ghosh and Lahari Chatterjee
pp. 1097‑1102
pdf bib Semi-Supervised Tri-Training for Explicit Discourse Argument Expansion
Rene Knaebel and Manfred Stede
pp. 1103‑1109
pdf bib WikiPossessions: Possession Timeline Generation as an Evaluation Benchmark for Machine Reading Comprehension of Long Texts
Dhivya Chinnappa, Alexis Palmer and Eduardo Blanco
pp. 1110‑1117
pdf bib TED-Q: TED Talks and the Questions they Evoke
Matthijs Westera, Laia Mayol and Hannah Rohde
pp. 1118‑1127
pdf bib CzeDLex 0.6 and its Representation in the PML-TQ
Jiří Mírovský, Lucie Poláková and Pavlína Synková
pp. 1128‑1134
pdf bib Corpus for Modeling User Interactions in Online Persuasive Discussions
Ryo Egawa, Gaku Morio and Katsuhide Fujita
pp. 1135‑1141
pdf bib Simplifying Coreference Chains for Dyslexic Children
Rodrigo Wilkens and Amalia Todirascu
pp. 1142‑1151
pdf bib Adapting BERT to Implicit Discourse Relation Classification with a Focus on Discourse Connectives
Yudai Kishimoto, Yugo Murawaki and Sadao Kurohashi
pp. 1152‑1158
pdf bib What Speakers really Mean when they Ask Questions: Classification of Intentions with a Supervised Approach
Angèle Barbedette and Iris Eshkol-Taravella
pp. 1159‑1166
pdf bib Modeling Dialogue in Conversational Cognitive Health Screening Interviews
Shahla Farzana, Mina Valizadeh and Natalie Parde
pp. 1167‑1177
pdf bib Stigma Annotation Scheme and Stigmatized Language Detection in Health-Care Discussions on Social Media
Nadiya Straton, Hyeju Jang and Raymond Ng
pp. 1178‑1190
pdf bib An Annotated Dataset of Discourse Modes in Hindi Stories
Swapnil Dhanwal, Hritwik Dutta, Hitesh Nankani, Nilay Shrivastava, Yaman Kumar, Junyi Jessy Li, Debanjan Mahata, Rakesh Gosangi, Haimin Zhang, Rajiv Ratn Shah and Amanda Stent
pp. 1191‑1196
pdf bib Multi-class Multilingual Classification of Wikipedia Articles Using Extended Named Entity Tag Set
Hassan S. Shavarani and Satoshi Sekine
pp. 1197‑1201
pdf bib An Algerian Corpus and an Annotation Platform for Opinion and Emotion Analysis
Leila Moudjari, Karima Akli-Astouati and Farah Benamara
pp. 1202‑1210
pdf bib Transfer Learning from Transformers to Fake News Challenge Stance Detection (FNC-1) Task
Valeriya Slovikovskaya and Giuseppe Attardi
pp. 1211‑1218
pdf bib Scientific Statement Classification over
Deyan Ginev and Bruce R Miller
pp. 1219‑1226
pdf bib Cross-domain Author Gender Classification in Brazilian Portuguese
Rafael Dias and Ivandré Paraboni
pp. 1227‑1234
pdf bib LEDGAR: A Large-Scale Multi-label Corpus for Text Classification of Legal Provisions in Contracts
Don Tuggener, Pius von Däniken, Thomas Peetz and Mark Cieliebak
pp. 1235‑1241
pdf bib Online Near-Duplicate Detection of News Articles
Simon Rodier and Dave Carter
pp. 1242‑1249
pdf bib Automated Essay Scoring System for Nonnative Japanese Learners
Reo Hirao, Mio Arai, Hiroki Shimanaka, Satoru Katsumata and Mamoru Komachi
pp. 1250‑1257
pdf bib A Real-World Data Resource of Complex Sensitive Sentences Based on Documents from the Monsanto Trial
Jan Neerbek, Morten Eskildsen, Peter Dolog and Ira Assent
pp. 1258‑1267
pdf bib Discovering Biased News Articles Leveraging Multiple Human Annotations
Konstantina Lazaridou, Alexander Löser, Maria Mestre and Felix Naumann
pp. 1268‑1277
pdf bib Corpora and Baselines for Humour Recognition in Portuguese
Hugo Gonçalo Oliveira, André Clemêncio and Ana Alves
pp. 1278‑1285
pdf bib FactCorp: A Corpus of Dutch Fact-checks and its Multiple Usages
Marten van der Meulen and W. Gudrun Reijnierse
pp. 1286‑1292
pdf bib Automatic Orality Identification in Historical Texts
Katrin Ortmann and Stefanie Dipper
pp. 1293‑1302
pdf bib Using Deep Neural Networks with Intra- and Inter-Sentence Context to Classify Suicidal Behaviour
Xingyi Song, Johnny Downs, Sumithra Velupillai, Rachel Holden, Maxim Kikoler, Kalina Bontcheva, Rina Dutta and Angus Roberts
pp. 1303‑1310
pdf bib A First Dataset for Film Age Appropriateness Investigation
Emad Mohamed and Le An Ha
pp. 1311‑1317
pdf bib Habibi - a multi Dialect multi National Arabic Song Lyrics Corpus
Mahmoud El-Haj
pp. 1318‑1326
pdf bib Age Suitability Rating: Predicting the MPAA Rating Based on Movie Dialogues
Mahsa Shafaei, Niloofar Safi Samghabadi, Sudipta Kar and Thamar Solorio
pp. 1327‑1335
pdf bib Email Classification Incorporating Social Networks and Thread Structure
Sakhar Alkhereyf and Owen Rambow
pp. 1336‑1345
pdf bib Development and Validation of a Corpus for Machine Humor Comprehension
Yuen-Hsien Tseng, Wun-Syuan Wu, Chia-Yueh Chang, Hsueh-Chih Chen and Wei-Lun Hsu
pp. 1346‑1352
pdf bib Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers
Núria Gala, Anaïs Tack, Ludivine Javourey-Drevet, Thomas François and Johannes C. Ziegler
pp. 1353‑1361
pdf bib A Corpus for Detecting High-Context Medical Conditions in Intensive Care Patient Notes Focusing on Frequently Readmitted Patients
Edward T. Moseley, Joy T. Wu, Jonathan Welt, John Foote, Patrick D. Tyler, David W. Grant, Eric T. Carlson, Sebastian Gehrmann, Franck Dernoncourt and Leo Anthony Celi
pp. 1362‑1367
pdf bib Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus
Elena Zotova, Rodrigo Agerri, Manuel Nuñez and German Rigau
pp. 1368‑1375
pdf bib An Evaluation of Progressive Neural Networks for Transfer Learning in Natural Language Processing
Abdul Moeed, Gerhard Hagerer, Sumit Dugar, Sarthak Gupta, Mainak Ghosh, Hannah Danner, Oliver Mitevski, Andreas Nawroth and Georg Groh
pp. 1376‑1381
pdf bib WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection
Noé Cécillon, Vincent Labatut, Richard Dufour and Georges Linarès
pp. 1382‑1390
pdf bib FloDusTA: Saudi Tweets Dataset for Flood, Dust Storm, and Traffic Accident Events
Btool Hamoui, Mourad Mars and Khaled Almotairi
pp. 1391‑1396
pdf bib An Annotated Corpus for Sexism Detection in French Tweets
Patricia Chiril, Véronique Moriceau, Farah Benamara, Alda Mari, Gloria Origgi and Marlène Coulomb-Gully
pp. 1397‑1403
pdf bib Measuring the Impact of Readability Features in Fake News Detection
Roney Santos, Gabriela Pedro, Sidney Leal, Oto Vale, Thiago Pardo, Kalina Bontcheva and Carolina Scarton
pp. 1404‑1413
pdf bib When Shallow is Good Enough: Automatic Assessment of Conceptual Text Complexity using Shallow Semantic Features
Sanja Stajner and Ioana Hulpuș
pp. 1414‑1422
pdf bib DecOp: A Multilingual and Multi-domain Corpus For Detecting Deception In Typed Text
Pasquale Capuozzo, Ivano Lauriola, Carlo Strapparava, Fabio Aiolli and Giuseppe Sartori
pp. 1423‑1430
pdf bib Age Recommendation for Texts
Alexis Blandin, Gwénolé Lecorvé, Delphine Battistelli and Aline Étienne
pp. 1431‑1439
pdf bib Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition
Xiaolei Huang, Linzi Xing, Franck Dernoncourt and Michael J. Paul
pp. 1440‑1448
pdf bib VICTOR: a Dataset for Brazilian Legal Documents Classification
Pedro Henrique Luz de Araujo, Teófilo Emídio de Campos, Fabricio Ataides Braz and Nilton Correia da Silva
pp. 1449‑1458
pdf bib Dynamic Classification in Web Archiving Collections
Krutarth Patel, Cornelia Caragea and Mark Phillips
pp. 1459‑1468
pdf bib Aspect Flow Representation and Audio Inspired Analysis for Texts
Larissa Vasconcelos, Claudio Campelo and Caio Jeronimo
pp. 1469‑1477
pdf bib Annotating and Analyzing Biased Sentences in News Articles using Crowdsourcing
Sora Lim, Adam Jatowt, Michael Färber and Masatoshi Yoshikawa
pp. 1478‑1484
pdf bib Evaluation of Deep Gaussian Processes for Text Classification
P. Jayashree and P. K. Srijith
pp. 1485‑1491
pdf bib EmoEvent: A Multilingual Emotion Corpus based on different Events
Flor Miriam Plaza del Arco, Carlo Strapparava, L. Alfonso Urena Lopez and Maite Martin
pp. 1492‑1498
pdf bib MuSE: a Multimodal Dataset of Stressed Emotion
Mimansa Jaiswal, Cristian-Paul Bara, Yuanhang Luo, Mihai Burzo, Rada Mihalcea and Emily Mower Provost
pp. 1499‑1510
pdf bib Affect in Tweets: A Transfer Learning Approach
Linrui Zhang, Hsin-Lun Huang, Yang Yu and Dan Moldovan
pp. 1511‑1516
pdf bib Annotation of Emotion Carriers in Personal Narratives
Aniruddha Tammewar, Alessandra Cervone, Eva-Maria Messner and Giuseppe Riccardi
pp. 1517‑1525
pdf bib Towards Interactive Annotation for Hesitation in Conversational Speech
Jane Wottawa, Marie Tahon, Apolline Marin and Nicolas Audibert
pp. 1526‑1532
pdf bib Abusive language in Spanish children and young teenagers’ conversations: data preparation and short text classification with contextual word embeddings
Marta R. Costa-jussà, Esther González, Asuncion Moreno and Eudald Cumalat
pp. 1533‑1537
pdf bib IIIT-H TEMD Semi-Natural Emotional Speech Database from Professional Actors and Non-Actors
Banothu Rambabu, Kishore Kumar Botsa, Gangamohan Paidi and Suryakanth V Gangashetty
pp. 1538‑1545
pdf bib The POTUS Corpus, a Database of Weekly Addresses for the Study of Stance in Politics and Virtual Agents
Thomas Janssoone, Kévin Bailly, Gaël Richard and Chloé Clavel
pp. 1546‑1553
pdf bib GoodNewsEveryone: A Corpus of News Headlines Annotated with Emotions, Semantic Roles, and Reader Perception
Laura Ana Maria Bostan, Evgeny Kim and Roman Klinger
pp. 1554‑1566
pdf bib SOLO: A Corpus of Tweets for Examining the State of Being Alone
Svetlana Kiritchenko, Will Hipson, Robert Coplan and Saif M. Mohammad
pp. 1567‑1577
pdf bib PoKi: A Large Dataset of Poems by Children
Will Hipson and Saif M. Mohammad
pp. 1578‑1589
pdf bib AlloSat: A New Call Center French Corpus for Satisfaction and Frustration Analysis
Manon Macary, Marie Tahon, Yannick Estève and Anthony Rousseau
pp. 1590‑1597
pdf bib Learning the Human Judgment for the Automatic Evaluation of Chatbot
Shih-Hung Wu and Sheng-Lun Chien
pp. 1598‑1602
pdf bib Korean-Specific Emotion Annotation Procedure Using N-Gram-Based Distant Supervision and Korean-Specific-Feature-Based Distant Supervision
Young-Jun Lee, Chae-Gyun Lim and Ho-Jin Choi
pp. 1603‑1610
pdf bib Semi-Automatic Construction and Refinement of an Annotated Corpus for a Deep Learning Framework for Emotion Classification
Jiajun Xu, Kyosuke Masuda, Hiromitsu Nishizaki, Fumiyo Fukumoto and Yoshimi Suzuki
pp. 1611‑1617
pdf bib CEASE, a Corpus of Emotion Annotated Suicide notes in English
Soumitra Ghosh, Asif Ekbal and Pushpak Bhattacharyya
pp. 1618‑1626
pdf bib Training a Broad-Coverage German Sentiment Classification Model for Dialog Systems
Oliver Guhr, Anne-Kathrin Schumann, Frank Bahrmann and Hans Joachim Böhme
pp. 1627‑1632
pdf bib An Event-comment Social Media Corpus for Implicit Emotion Analysis
Sophia Yat Mei Lee and Helena Yan Ping Lau
pp. 1633‑1642
pdf bib An Emotional Mess! Deciding on a Framework for Building a Dutch Emotion-Annotated Corpus
Luna De Bruyne, Orphee De Clercq and Veronique Hoste
pp. 1643‑1651
pdf bib PO-EMO: Conceptualization, Annotation, and Modeling of Aesthetic Emotions in German and English Poetry
Thomas Haider, Steffen Eger, Evgeny Kim, Roman Klinger and Winfried Menninghaus
pp. 1652‑1663
pdf bib Learning Word Ratings for Empathy and Distress from Document-Level User Responses
João Sedoc, Sven Buechel, Yehonathan Nachmany, Anneke Buffone and Lyle Ungar
pp. 1664‑1673
pdf bib Evaluation of Sentence Representations in Polish
Slawomir Dadas, Michał Perełkiewicz and Rafał Poświata
pp. 1674‑1680
pdf bib Identification of Primary and Collateral Tracks in Stuttered Speech
Rachid Riad, Anne-Catherine Bachoud-Lévi, Frank Rudzicz and Emmanuel Dupoux
pp. 1681‑1688
pdf bib How to Compare Automatically Two Phonological Strings: Application to Intelligibility Measurement in the Case of Atypical Speech
Alain Ghio, Muriel Lalain, Laurence Giusti, Corinne Fredouille and Virginie Woisard
pp. 1689‑1694
pdf bib Evaluating Text Coherence at Sentence and Paragraph Levels
Sennan Liu, Shuang Zeng and Sujian Li
pp. 1695‑1703
pdf bib HardEval: Focusing on Challenging Tokens to Assess Robustness of NER
Gabriel Bernier-Colborne and Phillippe Langlais
pp. 1704‑1711
pdf bib An Evaluation Dataset for Identifying Communicative Functions of Sentences in English Scholarly Papers
Kenichi Iwatsuki, Florian Boudin and Akiko Aizawa
pp. 1712‑1720
pdf bib An Automatic Tool For Language Evaluation
Fabio Fassetti and Ilaria Fassetti
pp. 1721‑1726
pdf bib Which Evaluations Uncover Sense Representations that Actually Make Sense?
Jordan Boyd-Graber, Fenfei Guo, Leah Findlater and Mohit Iyyer
pp. 1727‑1738
pdf bib Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections
Yi-An Lai, Xuan Zhu, Yi Zhang and Mona Diab
pp. 1739‑1746
pdf bib Towards Few-Shot Event Mention Retrieval: An Evaluation Framework and A Siamese Network Approach
Bonan Min, Yee Seng Chan and Lingjun Zhao
pp. 1747‑1752
pdf bib Linguistic Appropriateness and Pedagogic Usefulness of Reading Comprehension Questions
Andrea Horbach, Itziar Aldabe, Marie Bexte, Oier Lopez de Lacalle and Montse Maritxalar
pp. 1753‑1762
pdf bib Dataset Reproducibility and IR Methods in Timeline Summarization
Leo Born, Maximilian Bacher and Katja Markert
pp. 1763‑1771
pdf bib Database Search vs. Information Retrieval: A Novel Method for Studying Natural Language Querying of Semi-Structured Data
Stefanie Nadig, Martin Braschler and Kurt Stockinger
pp. 1772‑1779
pdf bib Why Attention is Not Explanation: Surgical Intervention and Causal Reasoning about Neural Models
Christopher Grimsley, Elijah Mayfield and Julia R.S. Bursten
pp. 1780‑1790
pdf bib Have a Cake and Eat it Too: Assessing Discriminating Performance of an Intelligibility Index Obtained from a Reduced Sample Size
Anna Marczyk, Alain Ghio, Muriel Lalain, Marie Rebourg, Corinne Fredouille and Virginie Woisard
pp. 1791‑1795
pdf bib Evaluation Metrics for Headline Generation Using Deep Pre-Trained Embeddings
Abdul Moeed, Yang An, Gerhard Hagerer and Georg Groh
pp. 1796‑1802
pdf bib LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation
Gustavo Aguilar, Sudipta Kar and Thamar Solorio
pp. 1803‑1813
pdf bib Paraphrase Generation and Evaluation on Colloquial-Style Sentences
Eetu Sjöblom, Mathias Creutz and Yves Scherrer
pp. 1814‑1822
pdf bib Analyzing Word Embedding Through Structural Equation Modeling
Namgi Han, Katsuhiko Hayashi and Yusuke Miyao
pp. 1823‑1832
pdf bib Evaluation of Lifelong Learning Systems
Yevhenii Prokopalo, Sylvain Meignier, Olivier Galibert, Loic Barrault and Anthony Larcher
pp. 1833‑1841
pdf bib Interannotator Agreement for Lexico-Semantic Annotation of a Corpus
Elżbieta Hajnicz
pp. 1842‑1848
pdf bib An In-Depth Comparison of 14 Spelling Correction Tools on a Common Benchmark
Markus Näther
pp. 1849‑1857
pdf bib Sentence Level Human Translation Quality Estimation with Attention-based Neural Networks
Yu Yuan and Serge Sharoff
pp. 1858‑1865
pdf bib Evaluating Language Tools for Fifteen EU-official Under-resourced Languages
Diego Alves, Gaurish Thakkar and Marko Tadić
pp. 1866‑1873
pdf bib Word Embedding Evaluation for Sinhala
Dimuthu Lakmal, Surangika Ranathunga, Saman Peramuna and Indu Herath
pp. 1874‑1881
pdf bib Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
Carlos Aspillaga, Andrés Carvallo and Vladimir Araujo
pp. 1882‑1894
pdf bib Brand-Product Relation Extraction Using Heterogeneous Vector Space Representations
Arkadiusz Janz, Łukasz Kopociński, Maciej Piasecki and Agnieszka Pluwak
pp. 1895‑1901
pdf bib A Tale of Three Parsers: Towards Diagnostic Evaluation for Meaning Representation Parsing
Maja Buljan, Joakim Nivre, Stephan Oepen and Lilja Øvrelid
pp. 1902‑1909
pdf bib Headword-Oriented Entity Linking: A Special Entity Linking Task with Dataset and Baseline
Mu Yang, Chi-Yen Chen, Yi-Hui Lee, Qian-hui Zeng, Wei-Yun Ma, Chen-Yang Shih and Wei-Jhih Chen
pp. 1910‑1917
pdf bib TableBank: Table Benchmark for Image-based Table Detection and Recognition
Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou and Zhoujun Li
pp. 1918‑1925
pdf bib WIKIR: A Python Toolkit for Building a Large-scale Wikipedia-based English Information Retrieval Dataset
Jibril Frej, Didier Schwab and Jean-Pierre Chevallet
pp. 1926‑1933
pdf bib Constructing a Public Meeting Corpus
Koji Tanaka, Chenhui Chu, Haolin Ren, Benjamin Renoust, Yuta Nakashima, Noriko Takemura, Hajime Nagahara and Takao Fujikawa
pp. 1934‑1940
pdf bib Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature
Fusataka Kuniyoshi, Kohei Makino, Jun Ozawa and Makoto Miwa
pp. 1941‑1950
pdf bib WEXEA: Wikipedia EXhaustive Entity Annotation
Michael Strobl, Amine Trabelsi and Osmar Zaiane
pp. 1951‑1958
pdf bib Handling Entity Normalization with no Annotated Corpus: Weakly Supervised Methods Based on Distributional Representation and Ontological Information
Arnaud Ferré, Robert Bossy, Mouhamadou Ba, Louise Deléger, Thomas Lavergne, Pierre Zweigenbaum and Claire Nédellec
pp. 1959‑1966
pdf bib HBCP Corpus: A New Resource for the Analysis of Behavioural Change Intervention Reports
Francesca Bonin, Martin Gleize, Ailbhe Finnerty, Candice Moore, Charles Jochim, Emma Norris, Yufang Hou, Alison J. Wright, Debasis Ganguly, Emily Hayes, Silje Zink, Alessandra Pascale, Pol Mac Aonghusa and Susan Michie
pp. 1967‑1975
pdf bib Cross-lingual Structure Transfer for Zero-resource Event Extraction
Di Lu, Ananya Subburathinam, Heng Ji, Jonathan May, Shih-Fu Chang, Avi Sil and Clare Voss
pp. 1976‑1981
pdf bib Cross-Domain Evaluation of Edge Detection for Biomedical Event Extraction
Alan Ramponi, Barbara Plank and Rosario Lombardo
pp. 1982‑1989
pdf bib Semantic Annotation for Improved Safety in Construction Work
Paul Thompson, Tim Yates, Emrah Inan and Sophia Ananiadou
pp. 1990‑1999
pdf bib Social Web Observatory: A Platform and Method for Gathering Knowledge on Entities from Different Textual Sources
Leonidas Tsekouras, Georgios Petasis, George Giannakopoulos and Aris Kosmopoulos
pp. 2000‑2008
pdf bib Development of a Corpus Annotated with Medications and their Attributes in Psychiatric Health Records
Jaya Chaturvedi, Natalia Viani, Jyoti Sanyal, Chloe Tytherleigh, Idil Hasan, Kate Baird, Sumithra Velupillai, Robert Stewart and Angus Roberts
pp. 2009‑2016
pdf bib Do not let the history haunt you: Mitigating Compounding Errors in Conversational Question Answering
Angrosh Mandya, James O’Neill, Danushka Bollegala and Frans Coenen
pp. 2017‑2025
pdf bib CLEEK: A Chinese Long-text Corpus for Entity Linking
Weixin Zeng, Xiang Zhao, Jiuyang Tang, Zhen Tan and Xuqian Huang
pp. 2026‑2035
pdf bib The Medical Scribe: Corpus Development and Model Performance Analyses
Izhak Shafran, Nan Du, Linh Tran, Amanda Perry, Lauren Keyes, Mark Knichel, Ashley Domin, Lei Huang, Yu-hui Chen, Gang Li, Mingqiu Wang, Laurent El Shafey, Hagen Soltau and Justin Stuart Paul
pp. 2036‑2044
pdf bib A Contract Corpus for Recognizing Rights and Obligations
Ruka Funaki, Yusuke Nagata, Kohei Suenaga and Shinsuke Mori
pp. 2045‑2053
pdf bib Recognition of Implicit Geographic Movement in Text
Scott Pezanowski and Prasenjit Mitra
pp. 2054‑2063
pdf bib Extraction of the Argument Structure of Tokyo Metropolitan Assembly Minutes: Segmentation of Question-and-Answer Sets
Keiichi Takamaru, Yasutomo Kimura, Hideyuki Shibuki, Hokuto Ototake, Yuzu Uchida, Kotaro Sakamoto, Madoka Ishioroshi, Teruko Mitamura and Noriko Kando
pp. 2064‑2068
pdf bib A Term Extraction Approach to Survey Analysis in Health Care
Cécile Robin, Mona Isazad Mashinchi, Fatemeh Ahmadi Zeleti, Adegboyega Ojo and Paul Buitelaar
pp. 2069‑2077
pdf bib A Scientific Information Extraction Dataset for Nature Inspired Engineering
Ruben Kruiper, Julian F.V. Vincent, Jessica Chen-Burger, Marc P.Y. Desmulliez and Ioannis Konstas
pp. 2078‑2085
pdf bib Automated Discovery of Mathematical Definitions in Text
Natalia Vanetik, Marina Litvak, Sergey Shevchuk and Lior Reznik
pp. 2086‑2094
pdf bib WN-Salience: A Corpus of News Articles with Entity Salience Annotations
Chuan Wu, Evangelos Kanoulas, Maarten de Rijke and Wei Lu
pp. 2095‑2102
pdf bib Event Extraction from Unstructured Amharic Text
Ephrem Tadesse, Rosa Tsegaye and Kuulaa Qaqqabaa
pp. 2103‑2109
pdf bib Comparing Machine Learning and Deep Learning Approaches on NLP Tasks for the Italian Language
Bernardo Magnini, Alberto Lavelli and Simone Magnolini
pp. 2110‑2119
pdf bib MyFixit: An Annotated Dataset, Annotation Tool, and Baseline Methods for Information Extraction from Repair Manuals
Nima Nabizadeh, Dorothea Kolossa and Martin Heckmann
pp. 2120‑2128
pdf bib Towards Entity Spaces
Marieke van Erp and Paul Groth
pp. 2129‑2137
pdf bib Love Me, Love Me, Say (and Write!) that You Love Me: Enriching the WASABI Song Corpus with Lyrics Annotations
Michael Fell, Elena Cabrio, Elmahdi Korfed, Michel Buffa and Fabien Gandon
pp. 2138‑2147
pdf bib Evaluating Information Loss in Temporal Dependency Trees
Mustafa Ocal and Mark Finlayson
pp. 2148‑2156
pdf bib Populating Legal Ontologies using Semantic Role Labeling
Llio Humphreys, Guido Boella, Luigi Di Caro, Livio Robaldo, Leon van der Torre, Sepideh Ghanavati and Robert Muthuri
pp. 2157‑2166
pdf bib PST 2.0 – Corpus of Polish Spatial Texts
Michał Marcińczuk, Marcin Oleksy and Jan Wieczorek
pp. 2167‑2174
pdf bib Natural Language Premise Selection: Finding Supporting Statements for Mathematical Text
Deborah Ferreira and André Freitas
pp. 2175‑2182
pdf bib Odinson: A Fast Rule-based Information Extraction Framework
Marco A. Valenzuela-Escárcega, Gus Hahn-Powell and Dane Bell
pp. 2183‑2191
pdf bib The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources
Jennifer D’Souza, Anett Hoppe, Arthur Brack, Mohmad Yaser Jaradeh, Sören Auer and Ralph Ewerth
pp. 2192‑2203
pdf bib MathAlign: Linking Formula Identifiers to their Contextual Natural Language Descriptions
Maria Alexeeva, Rebecca Sharp, Marco A. Valenzuela-Escárcega, Jennifer Kadowaki, Adarsh Pyarelal and Clayton Morrison
pp. 2204‑2212
pdf bib Domain Adapted Distant Supervision for Pedagogically Motivated Relation Extraction
Oscar Sainz, Oier Lopez de Lacalle, Itziar Aldabe and Montse Maritxalar
pp. 2213‑2222
pdf bib Temporal Histories of Epidemic Events (THEE): A Case Study in Temporal Annotation for Public Health
Jingcheng Niu, Victoria Ng, Gerald Penn and Erin E. Rees
pp. 2223‑2230
pdf bib Exploiting Citation Knowledge in Personalised Recommendation of Recent Scientific Publications
Anita Khadka, Iván Cantador and Miriam Fernandez
pp. 2231‑2240
pdf bib A Platform for Event Extraction in Hindi
Sovan Kumar Sahoo, Saumajit Saha, Asif Ekbal and Pushpak Bhattacharyya
pp. 2241‑2250
pdf bib Rad-SpatialNet: A Frame-based Resource for Fine-Grained Spatial Relations in Radiology Reports
Surabhi Datta, Morgan Ulinski, Jordan Godfrey-Stovall, Shekhar Khanpara, Roy F. Riascos-Castaneda and Kirk Roberts
pp. 2251‑2260
pdf bib NLP Analytics in Finance with DoRe: A French 250M Tokens Corpus of Corporate Annual Reports
Corentin Masson and Patrick Paroubek
pp. 2261‑2267
pdf bib The Language of Brain Signals: Natural Language Processing of Electroencephalography Reports
Ramon Maldonado and Sanda Harabagiu
pp. 2268‑2275
pdf bib Humans Keep It One Hundred: an Overview of AI Journey
Tatiana Shavrina, Anton Emelyanov, Alena Fenogenova, Vadim Fomin, Vladislav Mikhailov, Andrey Evlampiev, Valentin Malykh, Vladimir Larin, Alex Natekin, Aleksandr Vatulin, Peter Romov, Daniil Anastasiev, Nikolai Zinov and Andrey Chertok
pp. 2276‑2284
pdf bib Towards Data-driven Ontologies: a Filtering Approach using Keywords and Natural Language Constructs
Maaike de Boer and Jack P. C. Verhoosel
pp. 2285‑2292
pdf bib A French Corpus and Annotation Schema for Named Entity Recognition and Relation Extraction of Financial News
Ali Jabbari, Olivier Sauvage, Hamada Zeine and Hamza Chergui
pp. 2293‑2299
pdf bib Inferences for Lexical Semantic Resource Building with Less Supervision
Nadia Bebeshina and Mathieu Lafourcade
pp. 2300‑2305
pdf bib Acquiring Social Knowledge about Personality and Driving-related Behavior
Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada and Sadao Kurohashi
pp. 2306‑2315
pdf bib Implicit Knowledge in Argumentative Texts: An Annotated Corpus
Maria Becker, Katharina Korfhage and Anette Frank
pp. 2316‑2324
pdf bib Multiple Knowledge GraphDB (MKGDB)
Stefano Faralli, Paola Velardi and Farid Yusifli
pp. 2325‑2331
pdf bib Orchestrating NLP Services for the Legal Domain
Julian Moreno-Schneider, Georg Rehm, Elena Montiel-Ponsoda, Víctor Rodriguez-Doncel, Artem Revenko, Sotirios Karampatakis, Maria Khvalchik, Christian Sageder, Jorge Gracia and Filippo Maganza
pp. 2332‑2340
pdf bib Evaluation Dataset and Methodology for Extracting Application-Specific Taxonomies from the Wikipedia Knowledge Graph
Georgeta Bordea, Stefano Faralli, Fleur Mougin, Paul Buitelaar and Gayo Diallo
pp. 2341‑2347
pdf bib Subjective Evaluation of Comprehensibility in Movie Interactions
Estelle Randria, Lionel Fontan, Maxime Le Coz, Isabelle Ferrané and Julien Pinquier
pp. 2348‑2357
pdf bib Representing Multiword Term Variation in a Terminological Knowledge Base: a Corpus-Based Study
Pilar León-Araúz, Arianne Reimerink and Melania Cabezas-García
pp. 2358‑2367
pdf bib Understanding Spatial Relations through Multiple Modalities
Soham Dan, Hangfeng He and Dan Roth
pp. 2368‑2372
pdf bib A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages
Dwaipayan Roy, Sumit Bhatia and Prateek Jain
pp. 2373‑2380
pdf bib Pártélet: A Hungarian Corpus of Propaganda Texts from the Hungarian Socialist Era
Zoltán Kmetty, Veronika Vincze, Dorottya Demszky, Orsolya Ring, Balázs Nagy and Martina Katalin Szabó
pp. 2381‑2388
pdf bib KORE 50^DYWC: An Evaluation Data Set for Entity Linking Based on DBpedia, YAGO, Wikidata, and Crunchbase
Kristian Noullet, Rico Mix and Michael Färber
pp. 2389‑2395
pdf bib Eye4Ref: A Multimodal Eye Movement Dataset of Referentially Complex Situations
Özge Alacam, Eugen Ruppert, Amr Rekaby Salama, Tobias Staron and Wolfgang Menzel
pp. 2396‑2404
pdf bib SiBert: Enhanced Chinese Pre-trained Language Model with Sentence Insertion
Jiahao Chen, Chenjie Cao and Xiuyan Jiang
pp. 2405‑2412
pdf bib Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset
Brian Roark, Lawrence Wolf-Sonkin, Christo Kirov, Sabrina J. Mielke, Cibu Johny, Isin Demirsahin and Keith Hall
pp. 2413‑2423
pdf bib GM-RKB WikiText Error Correction Task and Baselines
Gabor Melli, Abdelrhman Eldallal, Bassim Lazem and Olga Moreira
pp. 2424‑2430
pdf bib Embedding Space Correlation as a Measure of Domain Similarity
Anne Beyer, Göran Kauermann and Hinrich Schütze
pp. 2431‑2439
pdf bib Wiki-40B: Multilingual Language Model Dataset
Mandy Guo, Zihang Dai, Denny Vrandečić and Rami Al-Rfou
pp. 2440‑2452
pdf bib Know thy Corpus! Robust Methods for Digital Curation of Web corpora
Serge Sharoff
pp. 2453‑2460
pdf bib Evaluating Approaches to Personalizing Language Models
Milton King and Paul Cook
pp. 2461‑2469
pdf bib Class-based LSTM Russian Language Model with Linguistic Information
Irina Kipyatkova and Alexey Karpov
pp. 2470‑2474
pdf bib Adaptation of Deep Bidirectional Transformers for Afrikaans Language
Sello Ralethe
pp. 2475‑2478
pdf bib FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoit Crabbé, Laurent Besacier and Didier Schwab
pp. 2479‑2490
pdf bib Accelerated High-Quality Mutual-Information Based Word Clustering
Manuel R. Ciosici, Ira Assent and Leon Derczynski
pp. 2491‑2496
pdf bib Rhythmic Proximity Between Natives And Learners Of French - Evaluation of a metric based on the CEFC corpus
Sylvain Coulange and Solange Rossato
pp. 2497‑2502
pdf bib From Linguistic Resources to Ontology-Aware Terminologies: Minding the Representation Gap
Giulia Speranza, Maria Pia di Buono, Johanna Monti and Federico Sangati
pp. 2503‑2510
pdf bib Modeling Factual Claims with Semantic Frames
Fatma Arslan, Josue Caraballo, Damian Jimenez and Chengkai Li
pp. 2511‑2520
pdf bib Automatic Transcription Challenges for Inuktitut, a Low-Resource Polysynthetic Language
Vishwa Gupta and Gilles Boulianne
pp. 2521‑2527
pdf bib Geographically-Balanced Gigaword Corpora for 50 Language Varieties
Jonathan Dunn and Ben Adams
pp. 2528‑2536
pdf bib Data Augmentation using Machine Translation for Fake News Detection in the Urdu Language
Maaz Amjad, Grigori Sidorov and Alisa Zhila
pp. 2537‑2542
pdf bib Evaluation of Greek Word Embeddings
Stamatis Outsios, Christos Karatsalos, Konstantinos Skianis and Michalis Vazirgiannis
pp. 2543‑2551
pdf bib A Dataset of Mycenaean Linear B Sequences
Katerina Papavassiliou, Gareth Owens and Dimitrios Kosmopoulos
pp. 2552‑2561
pdf bib The Nunavut Hansard Inuktitut–English Parallel Corpus 3.0 with Preliminary Machine Translation Results
Eric Joanis, Rebecca Knowles, Roland Kuhn, Samuel Larkin, Patrick Littell, Chi-kiu Lo, Darlene Stewart and Jeffrey Micher
pp. 2562‑2572
pdf bib Exploring Bilingual Word Embeddings for Hiligaynon, a Low-Resource Language
Leah Michel, Viktor Hangya and Alexander Fraser
pp. 2573‑2580
pdf bib A Finite-State Morphological Analyser for Evenki
Anna Zueva, Anastasia Kuznetsova and Francis Tyers
pp. 2581‑2589
pdf bib Morphology-rich Alphasyllabary Embeddings
Amanuel Mersha and Stephen Wu
pp. 2590‑2595
pdf bib Localization of Fake News Detection via Multitask Transfer Learning
Jan Christian Blaise Cruz, Julianne Agatha Tan and Charibeth Cheng
pp. 2596‑2604
pdf bib Evaluating Sentence Segmentation in Different Datasets of Neuropsychological Language Tests in Brazilian Portuguese
Edresson Casanova, Marcos Treviso, Lilian Hübner and Sandra Aluísio
pp. 2605‑2614
pdf bib Jejueo Datasets for Machine Translation and Speech Synthesis
Kyubyong Park, Yo Joong Choe and Jiyeon Ham
pp. 2615‑2621
pdf bib Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language
Kohei Matsuura, Sei Ueno, Masato Mimura, Shinsuke Sakai and Tatsuya Kawahara
pp. 2622‑2628
pdf bib Development of a Guarani - Spanish Parallel Corpus
Luis Chiruzzo, Pedro Amarilla, Adolfo Ríos and Gustavo Giménez Lugo
pp. 2629‑2633
pdf bib AR-ASAG An ARabic Dataset for Automatic Short Answer Grading Evaluation
Leila Ouahrani and Djamal Bennouar
pp. 2634‑2643
pdf bib Processing Language Resources of Under-Resourced and Endangered Languages for the Generation of Augmentative Alternative Communication Boards
Anne Ferger
pp. 2644‑2648
pdf bib The Nisvai Corpus of Oral Narrative Practices from Malekula (Vanuatu) and its Associated Language Resources
Jocelyn Aznar and Núria Gala
pp. 2649‑2656
pdf bib Building a Time-Aligned Cross-Linguistic Reference Corpus from Language Documentation Data (DoReCo)
Ludger Paschen, François Delafontaine, Christoph Draxler, Susanne Fuchs, Matthew Stave and Frank Seifart
pp. 2657‑2666
pdf bib Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages
Kevin Duh, Paul McNamee, Matt Post and Brian Thompson
pp. 2667‑2675
pdf bib Improved Finite-State Morphological Analysis for St. Lawrence Island Yupik Using Paradigm Function Morphology
Emily Chen, Hyunji Hayley Park and Lane Schwartz
pp. 2676‑2684
pdf bib Towards a Spell Checker for Zamboanga Chavacano Orthography
Marcelo Yuji Himoro and Antonio Pareja-Lora
pp. 2685‑2697
pdf bib Identifying Sentiments in Algerian Code-switched User-generated Comments
Wafia Adouane, Samia Touileb and Jean-Philippe Bernardy
pp. 2698‑2705
pdf bib Automatic Creation of Text Corpora for Low-Resource Languages from the Internet: The Case of Swiss German
Lucy Linder, Michael Jungo, Jean Hennebert, Claudiu Cristian Musat and Andreas Fischer
pp. 2706‑2711
pdf bib Evaluating Sub-word Embeddings in Cross-lingual Models
Ali Hakimi Parizi and Paul Cook
pp. 2712‑2719
pdf bib A Swiss German Dictionary: Variation in Speech and Writing
Larissa Schmidt, Lucy Linder, Sandra Djambazovska, Alexandros Lazaridis, Tanja Samardžić and Claudiu Musat
pp. 2720‑2725
pdf bib Towards a Corsican Basic Language Resource Kit
Laurent Kevers and Stella Retali-Medori
pp. 2726‑2735
pdf bib Evaluating the Impact of Sub-word Information and Cross-lingual Word Embeddings on Mi’kmaq Language Modelling
Jeremie Boudreau, Akankshya Patra, Ashima Suvarna and Paul Cook
pp. 2736‑2745
pdf bib Exploring a Choctaw Language Corpus with Word Vectors and Minimum Distance Length
Jacqueline Brixey, David Sides, Timothy Vizthum, David Traum and Khalil Iskarous
pp. 2746‑2753
pdf bib Massive vs. Curated Embeddings for Low-Resourced Languages: the Case of Yorùbá and Twi
Jesujoba Alabi, Kwabena Amponsah-Kaakyire, David Adelani and Cristina España-Bonet
pp. 2754‑2762
pdf bib TRopBank: Turkish PropBank V2.0
Neslihan Kara, Deniz Baran Aslan, Büşra Marşan, Özge Bakay, Koray Ak and Olcay Taner Yıldız
pp. 2763‑2772
pdf bib Collection and Annotation of the Romanian Legal Corpus
Dan Tufiș, Maria Mitrofan, Vasile Păiș, Radu Ion and Andrei Coman
pp. 2773‑2777
pdf bib An Empirical Evaluation of Annotation Practices in Corpora from Language Documentation
Kilu von Prince and Sebastian Nordhoff
pp. 2778‑2787
pdf bib Annotated Corpus for Sentiment Analysis in Odia Language
Gaurav Mohanty, Pruthwik Mishra and Radhika Mamidi
pp. 2788‑2795
pdf bib Building a Task-oriented Dialog System for Languages with no Training Data: the Case for Basque
Maddalen López de Lacalle, Xabier Saralegi and Iñaki San Vicente
pp. 2796‑2802
pdf bib SENCORPUS: A French-Wolof Parallel Corpus
Elhadji Mamadou Nguer, Alla Lo, Cheikh M. Bamba Dione, Sileye O. Ba and Moussa Lo
pp. 2803‑2811
pdf bib A Major Wordnet for a Minority Language: Scottish Gaelic
Gábor Bella, Fiona McNeill, Rody Gorman, Caoimhin O Donnaile, Kirsty MacDonald, Yamini Chandrashekar, Abed Alhakim Freihat and Fausto Giunchiglia
pp. 2812‑2818
pdf bib Crowdsourcing Speech Data for Low-Resource Languages from Low-Income Workers
Basil Abraham, Danish Goel, Divya Siddarth, Kalika Bali, Manu Chopra, Monojit Choudhury, Pratik Joshi, Preethi Jyoti, Sunayana Sitaram and Vivek Seshadri
pp. 2819‑2826
pdf bib A Resource for Studying Chatino Verbal Morphology
Hilaria Cruz, Antonios Anastasopoulos and Gregory Stump
pp. 2827‑2831
pdf bib Learnings from Technological Interventions in a Low Resource Language: A Case-Study on Gondi
Devansh Mehta, Sebastin Santy, Ramaravind Kommiya Mothilal, Brij Mohan Lal Srivastava, Alok Sharma, Anurag Shukla, Vishnu Prasad, Venkanna U, Amit Sharma and Kalika Bali
pp. 2832‑2838
pdf bib Irony Detection in Persian Language: A Transfer Learning Approach Using Emoji Prediction
Preni Golazizian, Behnam Sabeti, Seyed Arad Ashrafi Asli, Zahra Majdabadi, Omid Momenzadeh and Reza Fahmi
pp. 2839‑2845
pdf bib Towards Computational Resource Grammars for Runyankore and Rukiga
David Bamutura, Peter Ljunglöf and Peter Nebende
pp. 2846‑2854
pdf bib Optimizing Annotation Effort Using Active Learning Strategies: A Sentiment Analysis Case Study in Persian
Seyed Arad Ashrafi Asli, Behnam Sabeti, Zahra Majdabadi, Preni Golazizian, Reza Fahmi and Omid Momenzadeh
pp. 2855‑2861
pdf bib BanFakeNews: A Dataset for Detecting Fake News in Bangla
Md Zobaer Hossain, Md Ashraful Rahman, Md Saiful Islam and Sudipta Kar
pp. 2862‑2871
pdf bib A Resource for Computational Experiments on Mapudungun
Mingjun Duan, Carlos Fasola, Sai Krishna Rallabandi, Rodolfo Vega, Antonios Anastasopoulos, Lori Levin and Alan W Black
pp. 2872‑2877
pdf bib Automated Parsing of Interlinear Glossed Text from Page Images of Grammatical Descriptions
Erich Round, Mark Ellison, Jayden Macklin-Cordes and Sacha Beniamine
pp. 2878‑2883
pdf bib The Johns Hopkins University Bible Corpus: 1600+ Tongues for Typological Exploration
Arya D. McCarthy, Rachel Wicks, Dylan Lewis, Aaron Mueller, Winston Wu, Oliver Adams, Garrett Nicolai, Matt Post and David Yarowsky
pp. 2884‑2892
pdf bib Towards Building an Automatic Transcription System for Language Documentation: Experiences from Muyu
Alexander Zahrer, Andrej Zgank and Barbara Schuppler
pp. 2893‑2900
pdf bib Towards Flexible Cross-Resource Exploitation of Heterogeneous Language Documentation Data
Daniel Jettka and Timm Lehmberg
pp. 2901‑2905
pdf bib CantoMap: a Hong Kong Cantonese MapTask Corpus
Grégoire Winterstein, Carmen Tang and Regine Lai
pp. 2906‑2913
pdf bib No Data to Crawl? Monolingual Corpus Creation from PDF Files of Truly low-Resource Languages in Peru
Gina Bustamante, Arturo Oncevay and Roberto Zariquiey
pp. 2914‑2923
pdf bib Creating a Parallel Icelandic Dependency Treebank from Raw Text to Universal Dependencies
Hildur Jónsdóttir and Anton Karl Ingason
pp. 2924‑2931
pdf bib Building a Universal Dependencies Treebank for Occitan
Aleksandra Miletic, Myriam Bras, Marianne Vergez-Couret, Louise Esher, Clamença Poujade and Jean Sibille
pp. 2932‑2939
pdf bib Building the Old Javanese Wordnet
David Moeljadi and Zakariya Pamuji Aminullah
pp. 2940‑2946
pdf bib CPLM, a Parallel Corpus for Mexican Languages: Development and Interface
Gerardo Sierra Martínez, Cynthia Montaño, Gemma Bel-Enguix, Diego Córdova and Margarita Mota Montoya
pp. 2947‑2952
pdf bib SiNER: A Large Dataset for Sindhi Named Entity Recognition
Wazir Ali, Junyu Lu and Zenglin Xu
pp. 2953‑2961
pdf bib Construct a Sense-Frame Aligned Predicate Lexicon for Chinese AMR Corpus
Li Song, Yuling Dai, Yihuan Liu, Bin Li and Weiguang Qu
pp. 2962‑2969
pdf bib MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
Lifeng Han, Gareth Jones and Alan Smeaton
pp. 2970‑2979
pdf bib A Myanmar (Burmese)-English Named Entity Transliteration Dictionary
Aye Myat Mon, Chenchen Ding, Hour Kaing, Khin Mar Soe, Masao Utiyama and Eiichiro Sumita
pp. 2980‑2983
pdf bib CA-EHN: Commonsense Analogy from E-HowNet
Peng-Hsuan Li, Tsan-Yu Yang and Wei-Yun Ma
pp. 2984‑2990
pdf bib Building Semantic Grams of Human Knowledge
Valentina Leone, Giovanni Siragusa, Luigi Di Caro and Roberto Navigli
pp. 2991‑3000
pdf bib Automatically Building a Multilingual Lexicon of False Friends With No Supervision
Ana Sabina Uban and Liviu P. Dinu
pp. 3001‑3007
pdf bib A Parallel WordNet for English, Swedish and Bulgarian
Krasimir Angelov
pp. 3008‑3015
pdf bib ENGLAWI: From Human- to Machine-Readable Wiktionary
Franck Sajous, Basilio Calderone and Nabil Hathout
pp. 3016‑3026
pdf bib Opening the Romance Verbal Inflection Dataset 2.0: A CLDF lexicon
Sacha Beniamine, Martin Maiden and Erich Round
pp. 3027‑3035
pdf bib word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
Yo Joong Choe, Kyubyong Park and Dongwoo Kim
pp. 3036‑3045
pdf bib Introducing Lexical Masks: a New Representation of Lexical Entries for Better Evaluation and Exchange of Lexicons
Bruno Cartoni, Daniel Calvelo Aros, Denny Vrandecic and Saran Lertpradit
pp. 3046‑3052
pdf bib A Large-Scale Leveled Readability Lexicon for Standard Arabic
Muhamed Al Khalil, Nizar Habash and Zhengyang Jiang
pp. 3053‑3062
pdf bib Preserving Semantic Information from Old Dictionaries: Linking Senses of the ’Altfranzösisches Wörterbuch’ to WordNet
Achim Stein
pp. 3063‑3068
pdf bib Cifu: a Frequency Lexicon of Hong Kong Cantonese
Regine Lai and Grégoire Winterstein
pp. 3069‑3077
pdf bib Odi et Amo. Creating, Evaluating and Extending Sentiment Lexicons for Latin.
Rachele Sprugnoli, Marco Passarotti, Daniela Corbetta and Andrea Peverelli
pp. 3078‑3086
pdf bib WordWars: A Dataset to Examine the Natural Selection of Words
Saif M. Mohammad
pp. 3087‑3095
pdf bib Challenge Dataset of Cognates and False Friend Pairs from Indian Languages
Diptesh Kanojia, Malhar Kulkarni, Pushpak Bhattacharyya and Gholamreza Haffari
pp. 3096‑3102
pdf bib Development of a Japanese Personality Dictionary based on Psychological Methods
Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada and Sadao Kurohashi
pp. 3103‑3108
pdf bib A Lexicon-Based Approach for Detecting Hedges in Informal Text
Jumayel Islam, Lu Xiao and Robert E. Mercer
pp. 3109‑3113
pdf bib Word Complexity Estimation for Japanese Lexical Simplification
Daiki Nishihara and Tomoyuki Kajiwara
pp. 3114‑3120
pdf bib Inducing Universal Semantic Tag Vectors
Da Huo and Gerard de Melo
pp. 3121‑3127
pdf bib LexiDB: Patterns & Methods for Corpus Linguistic Database Management
Matthew Coole, Paul Rayson and John Mariani
pp. 3128‑3135
pdf bib Towards a Semi-Automatic Detection of Reflexive and Reciprocal Constructions and Their Representation in a Valency Lexicon
Václava Kettnerová, Marketa Lopatkova, Anna Vernerová and Petra Barancikova
pp. 3136‑3144
pdf bib Languages Resources for Poorly Endowed Languages: The Case Study of Classical Armenian
Chahan Vidal-Gorène and Aliénor Decours-Perez
pp. 3145‑3152
pdf bib Constructing Web-Accessible Semantic Role Labels and Frames for Japanese as Additions to the NPCMJ Parsed Corpus
Koichi Takeuchi, Alastair Butler, Iku Nagasaki, Takuya Okamura and Prashant Pardeshi
pp. 3153‑3161
pdf bib Large-scale Cross-lingual Language Resources for Referencing and Framing
Piek Vossen, Filip Ilievski, Marten Postma, Antske Fokkens, Gosse Minnema and Levi Remijnse
pp. 3162‑3171
pdf bib Modelling Etymology in LMF/TEI: The Grande Dicionário Houaiss da Língua Portuguesa Dictionary as a Use Case
Fahad Khan, Laurent Romary, Ana Salgado, Jack Bowers, Mohamed Khemakhem and Toma Tasovac
pp. 3172‑3180
pdf bib Linking the TUFS Basic Vocabulary to the Open Multilingual Wordnet
Francis Bond, Hiroki Nomoto, Luís Morgado da Costa and Arthur Bond
pp. 3181‑3188
pdf bib Some Issues with Building a Multilingual Wordnet
Francis Bond, Luis Morgado da Costa, Michael Wayne Goodman, John Philip McCrae and Ahti Lohk
pp. 3189‑3197
pdf bib Collocations in Russian Lexicography and Russian Collocations Database
Maria Khokhlova
pp. 3198‑3206
pdf bib Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB-2.0
Clémentine Fourrier and Benoît Sagot
pp. 3207‑3216
pdf bib OFrLex: A Computational Morphological and Syntactic Lexicon for Old French
Gaël Guibon and Benoît Sagot
pp. 3217‑3225
pdf bib Automatic Reconstruction of Missing Romanian Cognates and Unattested Latin Words
Alina Maria Ciobanu, Liviu P. Dinu and Laurentiu Zoicas
pp. 3226‑3231
pdf bib A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi, John Philip McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, Thomas Troelsgård, Sussi Olsen, Simon Krek, Veronika Lipp, Tamás Váradi, László Simon, András Gyorffy, Carole Tiberius, Tanneke Schoonheim, Yifat Ben Moshe, Maya Rudich, Raya Abu Ahmad, Dorielle Lonke, Kira Kovalenko, Margit Langemets, Jelena Kallas, Oksana Dereza, Theodorus Fransen, David Cillessen, David Lindemann, Mikel Alonso, Ana Salgado, José Luis Sancho, Rafael-J. Ureña-Ruiz, Jordi Porta Zamorano, Kiril Simov, Petya Osenova, Zara Kancheva, Ivaylo Radev, Ranka Stanković, Andrej Perdih and Dejan Gabrovsek
pp. 3232‑3242
pdf bib A Broad-Coverage Deep Semantic Lexicon for Verbs
James Allen, Hannah An, Ritwik Bose, Will de Beaumont and Choh Man Teng
pp. 3243‑3251
pdf bib Computational Etymology and Word Emergence
Winston Wu and David Yarowsky
pp. 3252‑3259
pdf bib A Dataset of Translational Equivalents Built on the Basis of plWordNet-Princeton WordNet Synset Mapping
Ewa Rudnicka and Tomasz Naskręt
pp. 3260‑3264
pdf bib TRANSLIT: A Large-scale Name Transliteration Resource
Fernando Benites, Gilbert François Duivesteijn, Pius von Däniken and Mark Cieliebak
pp. 3265‑3271
pdf bib Computing with Subjectivity Lexicons
Caio L. M. Jeronimo, Claudio E. C. Campelo, Leandro Balby Marinho, Allan Sales, Adriano Veloso and Roberta Viola
pp. 3272‑3280
pdf bib The ACoLi Dictionary Graph
Christian Chiarcos, Christian Fäth and Maxim Ionov
pp. 3281‑3290
pdf bib Resources in Underrepresented Languages: Building a Representative Romanian Corpus
Ludmila Midrigan-Ciochina, Victoria Boyd, Lucila Sanchez-Ortega, Diana Malancea-Malac, Doina Midrigan and David P. Corina
pp. 3291‑3296
pdf bib World Class Language Technology - Developing a Language Technology Strategy for Danish
Sabine Kirchmeier, Bolette Pedersen, Sanni Nimb, Philip Diderichsen and Peter Juel Henrichsen
pp. 3297‑3301
pdf bib A Corpus for Automatic Readability Assessment and Text Simplification of German
Alessia Battisti, Dominik Pfütze, Andreas Säuberli, Marek Kostrzewa and Sarah Ebling
pp. 3302‑3311
pdf bib The CLARIN Knowledge Centre for Atypical Communication Expertise
Henk van den Heuvel, Nelleke Oostdijk, Caroline Rowland and Paul Trilsbeek
pp. 3312‑3316
pdf bib Corpora of Disordered Speech in the Light of the GDPR: Two Use Cases from the DELAD Initiative
Henk van den Heuvel, Aleksei Kelli, Katarzyna Klessa and Satu Salaasti
pp. 3317‑3321
pdf bib The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe
Georg Rehm, Katrin Marheinecke, Stefanie Hegele, Stelios Piperidis, Kalina Bontcheva, Jan Hajic, Khalid Choukri, Andrejs Vasiļjevs, Gerhard Backfried, Christoph Prinz, Jose Manuel Gomez-Perez, Luc Meertens, Paul Lukowicz, Josef van Genabith, Andrea Lösch, Philipp Slusallek, Morten Irgens, Patrick Gatellier, Joachim Köhler, Laure Le Bars, Dimitra Anastasiou, Albina Auksoriūtė, Núria Bel, António Branco, Gerhard Budin, Walter Daelemans, Koenraad De Smedt, Radovan Garabík, Maria Gavriilidou, Dagmar Gromann, Svetla Koeva, Simon Krek, Cvetana Krstev, Krister Lindén, Bernardo Magnini, Jan Odijk, Maciej Ogrodniczuk, Eiríkur Rögnvaldsson, Mike Rosner, Bolette Pedersen, Inguna Skadina, Marko Tadić, Dan Tufiș, Tamás Váradi, Kadri Vider, Andy Way and François Yvon
pp. 3322‑3332
pdf bib A Framework for Shared Agreement of Language Tags beyond ISO 639
Frances Gillis-Webber and Sabine Tittel
pp. 3333‑3339
pdf bib Gigafida 2.0: The Reference Corpus of Written Standard Slovene
Simon Krek, Špela Arhar Holdt, Tomaž Erjavec, Jaka Čibej, Andraz Repar, Polona Gantar, Nikola Ljubešić, Iztok Kosem and Kaja Dobrovoljc
pp. 3340‑3345
pdf bib Corpus Query Lingua Franca part II: Ontology
Stefan Evert, Oleg Harlamov, Philipp Heinrich and Piotr Banski
pp. 3346‑3352
pdf bib A CLARIN Transcription Portal for Interview Data
Christoph Draxler, Henk van den Heuvel, Arjan van Hessen, Silvia Calamai and Louise Corti
pp. 3353‑3359
pdf bib Ellogon Casual Annotation Infrastructure
Georgios Petasis and Leonidas Tsekouras
pp. 3360‑3365
pdf bib European Language Grid: An Overview
Georg Rehm, Maria Berger, Ela Elsholz, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Stelios Piperidis, Miltos Deligiannis, Dimitris Galanis, Katerina Gkirtzou, Penny Labropoulou, Kalina Bontcheva, David Jones, Ian Roberts, Jan Hajic, Jana Hamrlová, Lukáš Kačena, Khalid Choukri, Victoria Arranz, Andrejs Vasiļjevs, Orians Anvari, Andis Lagzdiņš, Jūlija Meļņika, Gerhard Backfried, Erinç Dikici, Miroslav Janosik, Katja Prinz, Christoph Prinz, Severin Stampler, Dorothea Thomas-Aniola, Jose Manuel Gomez-Perez, Andres Garcia Silva, Christian Berrío, Ulrich Germann, Steve Renals and Ondrej Klejch
pp. 3366‑3380
pdf bib The Competitiveness Analysis of the European Language Technology Market
Andrejs Vasiļjevs, Inguna Skadina, Indra Samite, Kaspars Kauliņš, Ēriks Ajausks, Jūlija Meļņika and Aivars Bērziņš
pp. 3381‑3389
pdf bib Constructing a Bilingual Hadith Corpus Using a Segmentation Tool
Shatha Altammami, Eric Atwell and Ammar Alsalka
pp. 3390‑3398
pdf bib Facilitating Corpus Usage: Making Icelandic Corpora More Accessible for Researchers and Language Users
Steinþór Steingrímsson, Starkaður Barkarson and Gunnar Thor Örnólfsson
pp. 3399‑3405
pdf bib Interoperability in an Infrastructure Enabling Multidisciplinary Research: The case of CLARIN
Franciska de Jong, Bente Maegaard, Darja Fišer, Dieter van Uytvanck and Andreas Witt
pp. 3406‑3413
pdf bib Language Technology Programme for Icelandic 2019-2023
Anna Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson and Steinþór Steingrímsson
pp. 3414‑3422
pdf bib Privacy by Design and Language Resources
Pawel Kamocki and Andreas Witt
pp. 3423‑3427
pdf bib Making Metadata Fit for Next Generation Language Technology Platforms: The Metadata Schema of the European Language Grid
Penny Labropoulou, Katerina Gkirtzou, Maria Gavriilidou, Miltos Deligiannis, Dimitris Galanis, Stelios Piperidis, Georg Rehm, Maria Berger, Valérie Mapelli, Michael Rigault, Victoria Arranz, Khalid Choukri, Gerhard Backfried, Jose Manuel Gomez-Perez and Andres Garcia-Silva
pp. 3428‑3437
pdf bib Related Works in the Linguistic Data Consortium Catalog
Daniel Jaquette, Christopher Cieri and Denise DiPersio
pp. 3438‑3442
pdf bib Language Data Sharing in European Public Services – Overcoming Obstacles and Creating Sustainable Data Sharing Infrastructures
Lilli Smal, Andrea Lösch, Josef van Genabith, Maria Giagkou, Thierry Declerck and Stephan Busemann
pp. 3443‑3448
pdf bib A Progress Report on Activities at the Linguistic Data Consortium Benefitting the LREC Community
Christopher Cieri, James Fiumara, Stephanie Strassel, Jonathan Wright, Denise DiPersio and Mark Liberman
pp. 3449‑3456
pdf bib Digital Language Infrastructures – Documenting Language Actors
Verena Lyding, Alexander König and Monica Pretti
pp. 3457‑3462
pdf bib Samrómur: Crowd-sourcing Data Collection for Icelandic Speech Recognition
David Erik Mollberg, Ólafur Helgi Jónsson, Sunneva Þorsteinsdóttir, Steinþór Steingrímsson, Eydís Huld Magnúsdóttir and Jon Gudnason
pp. 3463‑3467
pdf bib Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced Languages
Astik Biswas, Emre Yilmaz, Febe De Wet, Ewald Van der westhuizen and Thomas Niesler
pp. 3468‑3474
pdf bib CLFD: A Novel Vectorization Technique and Its Application in Fake News Detection
Michail Mersinias, Stergos Afantenos and Georgios Chalkiadakis
pp. 3475‑3483
pdf bib SimplifyUR: Unsupervised Lexical Text Simplification for Urdu
Namoos Hayat Qasmi, Haris Bin Zia, Awais Athar and Agha Ali Raza
pp. 3484‑3489
pdf bib Jamo Pair Encoding: Subcharacter Representation-based Extreme Korean Vocabulary Compression for Efficient Subword Tokenization
Sangwhan Moon and Naoaki Okazaki
pp. 3490‑3497
pdf bib Offensive Language and Hate Speech Detection for Danish
Gudbjartur Ingi Sigurbergsson and Leon Derczynski
pp. 3498‑3508
pdf bib Semi-supervised Deep Embedded Clustering with Anomaly Detection for Semantic Frame Induction
Zheng Xin Yong and Tiago Timponi Torrent
pp. 3509‑3519
pdf bib Search Query Language Identification Using Weak Labeling
Ritiz Tambi, Ajinkya Kale and Tracy Holloway King
pp. 3520‑3527
pdf bib Automated Phonological Transcription of Akkadian Cuneiform Text
Aleksi Sahala, Miikka Silfverberg, Antti Arppe and Krister Lindén
pp. 3528‑3534
pdf bib COSTRA 1.0: A Dataset of Complex Sentence Transformations
Petra Barancikova and Ondřej Bojar
pp. 3535‑3541
pdf bib Automatic In-the-wild Dataset Annotation with Deep Generalized Multiple Instance Learning
Joana Correia, Isabel Trancoso and Bhiksha Raj
pp. 3542‑3550
pdf bib How Much Data Do You Need? About the Creation of a Ground Truth for Black Letter and the Effectiveness of Neural OCR
Phillip Benjamin Ströbel, Simon Clematide and Martin Volk
pp. 3551‑3559
pdf bib Dirichlet-Smoothed Word Embeddings for Low-Resource Settings
Jakob Jungmaier, Nora Kassner and Benjamin Roth
pp. 3560‑3565
pdf bib On The Performance of Time-Pooling Strategies for End-to-End Spoken Language Identification
Joao Monteiro, Md Jahangir Alam and Tiago Falk
pp. 3566‑3572
pdf bib Neural Disambiguation of Lemma and Part of Speech in Morphologically Rich Languages
José María Hoya Quecedo, Maximilian Koppatz and Roman Yangarber
pp. 3573‑3582
pdf bib Non-Linearity in Mapping Based Cross-Lingual Word Embeddings
Jiawei Zhao and Andrew Gilman
pp. 3583‑3589
pdf bib LibriVoxDeEn: A Corpus for German-to-English Speech Translation and German Speech Recognition
Benjamin Beilharz, Xin Sun, Sariya Karimova and Stefan Riezler
pp. 3590‑3594
pdf bib SEDAR: a Large Scale French-English Financial Domain Parallel Corpus
Abbas Ghaddar and Philippe Langlais
pp. 3595‑3602
pdf bib JParaCrawl: A Large Scale Web-Based English-Japanese Parallel Corpus
Makoto Morishita, Jun Suzuki and Masaaki Nagata
pp. 3603‑3609
pdf bib Neural Machine Translation for Low-Resourced Indian Languages
Himanshu Choudhary, Shivansh Rao and Rajesh Rohilla
pp. 3610‑3615
pdf bib Content-Equivalent Translated Parallel News Corpus and Extension of Domain Adaptation for NMT
Hideya Mino, Hideki Tanaka, Hitoshi Ito, Isao Goto, Ichiro Yamada and Takenobu Tokunaga
pp. 3616‑3622
pdf bib NMT and PBSMT Error Analyses in English to Brazilian Portuguese Automatic Translations
Helena Caseli and Marcio Inácio
pp. 3623‑3629
pdf bib Evaluation Dataset for Zero Pronoun in Japanese to English Translation
Sho Shimazu, Sho Takase, Toshiaki Nakazawa and Naoaki Okazaki
pp. 3630‑3634
pdf bib Better Together: Modern Methods Plus Traditional Thinking in NP Alignment
Ádám Kovács, Judit Ács, Andras Kornai and Gábor Recski
pp. 3635‑3639
pdf bib Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation
Haiyue Song, Raj Dabre, Atsushi Fujita and Sadao Kurohashi
pp. 3640‑3649
pdf bib Being Generous with Sub-Words towards Small NMT Children
Arne Defauw, Tom Vanallemeersch, Koen Van Winckel, Sara Szoc and Joachim Van den Bogaert
pp. 3650‑3656
pdf bib Document Sub-structure in Neural Machine Translation
Radina Dobreva, Jie Zhou and Rachel Bawden
pp. 3657‑3667
pdf bib An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation Systems
Alessandro Raganato, Yves Scherrer and Jörg Tiedemann
pp. 3668‑3675
pdf bib MEDLINE as a Parallel Corpus: a Survey to Gain Insight on French-, Spanish- and Portuguese-speaking Authors’ Abstract Writing Practice
Aurélie Névéol, Antonio Jimeno Yepes and Mariana Neves
pp. 3676‑3682
pdf bib JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation
Zhuoyuan Mao, Fabien Cromieres, Raj Dabre, Haiyue Song and Sadao Kurohashi
pp. 3683‑3691
pdf bib A Post-Editing Dataset in the Legal Domain: Do we Underestimate Neural Machine Translation Quality?
Julia Ive, Lucia Specia, Sara Szoc, Tom Vanallemeersch, Joachim Van den Bogaert, Eduardo Farah, Christine Maroti, Artur Ventura and Maxim Khalilov
pp. 3692‑3697
pdf bib Linguistically Informed Hindi-English Neural Machine Translation
Vikrant Goyal, Pruthwik Mishra and Dipti Misra Sharma
pp. 3698‑3703
pdf bib A Test Set for Discourse Translation from Japanese to English
Masaaki Nagata and Makoto Morishita
pp. 3704‑3709
pdf bib An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages
Aaron Mueller, Garrett Nicolai, Arya D. McCarthy, Dylan Lewis, Winston Wu and David Yarowsky
pp. 3710‑3718
pdf bib TDDC: Timely Disclosure Documents Corpus
Nobushige Doi, Yusuke Oda and Toshiaki Nakazawa
pp. 3719‑3726
pdf bib MuST-Cinema: a Speech-to-Subtitles corpus
Alina Karakanta, Matteo Negri and Marco Turchi
pp. 3727‑3734
pdf bib On Context Span Needed for Machine Translation Evaluation
Sheila Castilho, Maja Popović and Andy Way
pp. 3735‑3742
pdf bib A Multilingual Parallel Corpora Collection Effort for Indian Languages
Shashank Siripragada, Jerin Philip, Vinay P. Namboodiri and C V Jawahar
pp. 3743‑3751
pdf bib To Case or not to case: Evaluating Casing Methods for Neural Machine Translation
Thierry Etchegoyhen and Harritxu Gete
pp. 3752‑3760
pdf bib The MARCELL Legislative Corpus
Tamás Váradi, Svetla Koeva, Martin Yamalov, Marko Tadić, Bálint Sass, Bartłomiej Nitoń, Maciej Ogrodniczuk, Piotr Pęzik, Verginica Barbu Mititelu, Radu Ion, Elena Irimia, Maria Mitrofan, Vasile Păiș, Dan Tufiș, Radovan Garabík, Simon Krek, Andraz Repar, Matjaž Rihtar and Janez Brank
pp. 3761‑3768
pdf bib ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts
Felipe Soares, Mark Stevenson, Diego Bartolome and Anna Zaretskaya
pp. 3769‑3774
pdf bib Corpora for Document-Level Neural Machine Translation
Siyou Liu and Xiaojun Zhang
pp. 3775‑3781
pdf bib OpusTools and Parallel Corpus Diagnostics
Mikko Aulamo, Umut Sulubacak, Sami Virpioja and Jörg Tiedemann
pp. 3782‑3789
pdf bib Literary Machine Translation under the Magnifying Glass: Assessing the Quality of an NMT-Translated Detective Novel on Document Level
Margot Fonteyne, Arda Tezcan and Lieve Macken
pp. 3790‑3798
pdf bib Handle with Care: A Case Study in Comparable Corpora Exploitation for Neural Machine Translation
Thierry Etchegoyhen and Harritxu Gete
pp. 3799‑3807
pdf bib The FISKMÖ Project: Resources and Tools for Finnish-Swedish Machine Translation and Cross-Linguistic Research
Jörg Tiedemann, Tommi Nieminen, Mikko Aulamo, Jenna Kanerva, Akseli Leino, Filip Ginter and Niko Papula
pp. 3808‑3815
pdf bib Multiword Expression aware Neural Machine Translation
Andrea Zaninello and Alexandra Birch
pp. 3816‑3825
pdf bib An Enhanced Mapping Scheme of the Universal Part-Of-Speech for Korean
Myung Hee Kim and Nathalie Colineau
pp. 3826‑3833
pdf bib Finite State Machine Pattern-Root Arabic Morphological Generator, Analyzer and Diacritizer
Maha Alkhairy, Afshan Jafri and David Smith
pp. 3834‑3841
pdf bib An Unsupervised Method for Weighting Finite-state Morphological Analyzers
Amr Keleg, Francis Tyers, Nick Howell and Tommi Pirinen
pp. 3842‑3850
pdf bib Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity Prediction
Danushka Bollegala, Ryuichi Kiryo, Kosuke Tsujino and Haruki Yukawa
pp. 3851‑3860
pdf bib A Supervised Part-Of-Speech Tagger for the Greek Language of the Social Web
Maria Nefeli Nikiforos and Katia Lida Kermanidis
pp. 3861‑3867
pdf bib Bag & Tag’em - A New Dutch Stemmer
Anne Jonker, Corné de Ruijt and Jornt de Gruijl
pp. 3868‑3876
pdf bib Glawinette: a Linguistically Motivated Derivational Description of French Acquired from GLAWI
Nabil Hathout, Franck Sajous, Basilio Calderone and Fiammetta Namer
pp. 3877‑3885
pdf bib BabyFST - Towards a Finite-State Based Computational Model of Ancient Babylonian
Aleksi Sahala, Miikka Silfverberg, Antti Arppe and Krister Lindén
pp. 3886‑3894
pdf bib Morphological Analysis and Disambiguation for Gulf Arabic: The Interplay between Resources and Methods
Salam Khalifa, Nasser Zalmout and Nizar Habash
pp. 3895‑3904
pdf bib Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus
Eleni Metheniti and Guenter Neumann
pp. 3905‑3912
pdf bib Introducing a Large-Scale Dataset for Vietnamese POS Tagging on Conversational Texts
Oanh Tran, Tu Pham, Vu Dang and Bang Nguyen
pp. 3913‑3921
pdf bib UniMorph 3.0: Universal Morphology
Arya D. McCarthy, Christo Kirov, Matteo Grella, Amrit Nidhi, Patrick Xia, Kyle Gorman, Ekaterina Vylomova, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, Timofey Arkhangelskiy, Nataly Krizhanovsky, Andrew Krizhanovsky, Elena Klyachko, Alexey Sorokin, John Mansfield, Valts Ernštreits, Yuval Pinter, Cassandra L. Jacobs, Ryan Cotterell, Mans Hulden and David Yarowsky
pp. 3922‑3931
pdf bib Building the Spanish-Croatian Parallel Corpus
Bojana Mikelenić and Marko Tadić
pp. 3932‑3936
pdf bib DerivBase.Ru: a Derivational Morphology Resource for Russian
Daniil Vodolazsky
pp. 3937‑3943
pdf bib Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning
Stig-Arne Grönroos, Sami Virpioja and Mikko Kurimo
pp. 3944‑3953
pdf bib Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
Ranka Stankovic, Branislava Šandrih, Cvetana Krstev, Miloš Utvić and Mihailo Skoric
pp. 3954‑3962
pdf bib Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages
Garrett Nicolai, Dylan Lewis, Arya D. McCarthy, Aaron Mueller, Winston Wu and David Yarowsky
pp. 3963‑3972
pdf bib Cairo Student Code-Switch (CSCS) Corpus: An Annotated Egyptian Arabic-English Corpus
Mohamed Balabel, Injy Hamed, Slim Abdennadher, Ngoc Thang Vu and Özlem Çetinoğlu
pp. 3973‑3977
pdf bib Getting More Data for Low-resource Morphological Inflection: Language Models and Data Augmentation
Alexey Sorokin
pp. 3978‑3983
pdf bib Visual Modeling of Turkish Morphology
Berke Özenç and Ercan Solak
pp. 3984‑3990
pdf bib Kvistur 2.0: a BiLSTM Compound Splitter for Icelandic
Jón Daðason, David Mollberg, Hrafn Loftsson and Kristín Bjarnadóttir
pp. 3991‑3995
pdf bib Morphological Segmentation for Low Resource Languages
Justin Mott, Ann Bies, Stephanie Strassel, Jordan Kodner, Caitlin Richter, Hongzhi Xu and Mitchell Marcus
pp. 3996‑4002
pdf bib CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin and Edouard Grave
pp. 4003‑4012
pdf bib On the Robustness of Unsupervised and Semi-supervised Cross-lingual Word Embedding Learning
Yerai Doval, Jose Camacho-Collados, Luis Espinosa Anke and Steven Schockaert
pp. 4013‑4023
pdf bib Building an English-Chinese Parallel Corpus Annotated with Sub-sentential Translation Techniques
Yuming Zhai, Lufei Liu, Xinyi Zhong, Gabriel Illouz and Anne Vilnat
pp. 4024‑4033
pdf bib Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajic, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers and Daniel Zeman
pp. 4034‑4043
pdf bib EMPAC: an English–Spanish Corpus of Institutional Subtitles
Iris Serrat Roozen and José Manuel Martínez Martínez
pp. 4044‑4053
pdf bib Cross-Lingual Word Embeddings for Turkic Languages
Elmurod Kuriyozov, Yerai Doval and Carlos Gómez-Rodríguez
pp. 4054‑4062
pdf bib How Universal are Universal Dependencies? Exploiting Syntax for Multilingual Clause-level Sentiment Detection
Hiroshi Kanayama and Ran Iwamoto
pp. 4063‑4073
pdf bib Multilingual Culture-Independent Word Analogy Datasets
Matej Ulčar, Kristiina Vaik, Jessica Lindström, Milda Dailidėnaitė and Marko Robnik-Šikonja
pp. 4074‑4080
pdf bib GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies
Marta R. Costa-jussà, Pau Li Lin and Cristina España-Bonet
pp. 4081‑4088
pdf bib SpiCE: A New Open-Access Corpus of Conversational Bilingual Speech in Cantonese and English
Khia A. Johnson, Molly Babel, Ivan Fong and Nancy Yiu
pp. 4089‑4095
pdf bib Identifying Cognates in English-Dutch and French-Dutch by means of Orthographic Information and Cross-lingual Word Embeddings
Els Lefever, Sofie Labat and Pranaydeep Singh
pp. 4096‑4101
pdf bib Lexicogrammatic translationese across two targets and competence levels
Maria Kunilovskaya and Ekaterina Lapshinova-Koltunski
pp. 4102‑4112
pdf bib UniSent: Universal Adaptable Sentiment Lexica for 1000+ Languages
Ehsaneddin Asgari, Fabienne Braune, Benjamin Roth, Christoph Ringlstetter and Mohammad Mofrad
pp. 4113‑4120
pdf bib CanVEC - the Canberra Vietnamese-English Code-switching Natural Speech Corpus
Li Nguyen and Christopher Bryant
pp. 4121‑4129
pdf bib A Spelling Correction Corpus for Multiple Arabic Dialects
Fadhl Eryani, Nizar Habash, Houda Bouamor and Salam Khalifa
pp. 4130‑4138
pdf bib A Dataset for Multi-lingual Epidemiological Event Extraction
Stephen Mutuvi, Antoine Doucet, Gael Lejeune and Moses Odeo
pp. 4139‑4144
pdf bib Swiss-AL: A Multilingual Swiss Web Corpus for Applied Linguistics
Julia Krasselt, Philipp Dressen, Matthias Fluor, Cerstin Mahlow, Klaus Rothenhäusler and Maren Runte
pp. 4145‑4151
pdf bib Analysis of GlobalPhone and Ethiopian Languages Speech Corpora for Multilingual ASR
Martha Yifiru Tachbelie, Solomon Teferra Abate and Tanja Schultz
pp. 4152‑4156
pdf bib Multilingualization of Medical Terminology: Semantic and Structural Embedding Approaches
Long-Huei Chen and Kyo Kageura
pp. 4157‑4166
pdf bib Large Vocabulary Read Speech Corpora for Four Ethiopian Languages: Amharic, Tigrigna, Oromo and Wolaytta
Solomon Teferra Abate, Martha Yifiru Tachbelie, Michael Melese, Hafte Abera, Tewodros Abebe, Wondwossen Mulugeta, Yaregal Assabie, Million Meshesha, Solomon Afnafu and Binyam Ephrem Seyoum
pp. 4167‑4171
pdf bib Incorporating Politeness across Languages in Customer Care Responses: Towards building a Multi-lingual Empathetic Dialogue Agent
Mauajama Firdaus, Asif Ekbal and Pushpak Bhattacharyya
pp. 4172‑4182
pdf bib WikiBank: Using Wikidata to Improve Multilingual Frame-Semantic Parsing
Cezar Sas, Meriem Beloucif and Anders Søgaard
pp. 4183‑4189
pdf bib Multilingual Corpus Creation for Multilingual Semantic Similarity Task
Mahtab Ahmed, Chahna Dixit, Robert E. Mercer, Atif Khan, Muhammad Rifayat Samee and Felipe Urra
pp. 4190‑4196
pdf bib CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
Changhan Wang, Juan Pino, Anne Wu and Jiatao Gu
pp. 4197‑4203
pdf bib A Visually-Grounded Parallel Corpus with Phrase-to-Region Linking
Hideki Nakayama, Akihiro Tamura and Takashi Ninomiya
pp. 4204‑4210
pdf bib Multilingual Dictionary Based Construction of Core Vocabulary
Winston Wu, Garrett Nicolai and David Yarowsky
pp. 4211‑4217
pdf bib Common Voice: A Massively-Multilingual Speech Corpus
Rosana Ardila, Megan Branson, Kelly Davis, Michael Kohler, Josh Meyer, Michael Henretty, Reuben Morais, Lindsay Saunders, Francis Tyers and Gregor Weber
pp. 4218‑4222
pdf bib Massively Multilingual Pronunciation Modeling with WikiPron
Jackson L. Lee, Lucas F.E. Ashby, M. Elizabeth Garza, Yeonju Lee-Sikka, Sean Miller, Alan Wong, Arya D. McCarthy and Kyle Gorman
pp. 4223‑4228
pdf bib HELFI: a Hebrew-Greek-Finnish Parallel Bible Corpus with Cross-Lingual Morpheme Alignment
Anssi Yli-Jyrä, Josi Purhonen, Matti Liljeqvist, Arto Antturi, Pekka Nieminen, Kari M. Räntilä and Valtter Luoto
pp. 4229‑4236
pdf bib ArzEn: A Speech Corpus for Code-switched Egyptian Arabic-English
Injy Hamed, Ngoc Thang Vu and Slim Abdennadher
pp. 4237‑4246
pdf bib Cross-lingual Named Entity List Search via Transliteration
Aleksandr Khakhmovich, Svetlana Pavlova, Kira Kirillova, Nikolay Arefyev and Ekaterina Savilova
pp. 4247‑4255
pdf bib Serial Speakers: a Dataset of TV Series
Xavier Bost, Vincent Labatut and Georges Linares
pp. 4256‑4264
pdf bib Image Position Prediction in Multimodal Documents
Masayasu Muraoka, Ryosuke Kohita and Etsuko Ishii
pp. 4265‑4274
pdf bib Visual Grounding Annotation of Recipe Flow Graph
Taichi Nishimura, Suzushi Tomori, Hayato Hashimoto, Atsushi Hashimoto, Yoko Yamakata, Jun Harashima, Yoshitaka Ushiku and Shinsuke Mori
pp. 4275‑4284
pdf bib Building a Multimodal Entity Linking Dataset From Tweets
Omar Adjali, Romaric Besançon, Olivier Ferret, Hervé Le Borgne and Brigitte Grau
pp. 4285‑4292
pdf bib A Multimodal Educational Corpus of Oral Courses: Annotation, Analysis and Case Study
Salima Mdhaffar, Yannick Estève, Antoine Laurent, Nicolas Hernandez, Richard Dufour, Delphine Charlet, Geraldine Damnati, Solen Quiniou and Nathalie Camelin
pp. 4293‑4301
pdf bib Annotating Event Appearance for Japanese Chess Commentary Corpus
Hirotaka Kameko and Shinsuke Mori
pp. 4302‑4308
pdf bib Offensive Video Detection: Dataset and Baseline Results
Cleber Alcântara, Viviane Moreira and Diego Feijo
pp. 4309‑4319
pdf bib Adding Gesture, Posture and Facial Displays to the PoliModal Corpus of Political Interviews
Daniela Trotta, Alessio Palmero Aprosio, Sara Tonelli and Annibale Elia
pp. 4320‑4326
pdf bib E:Calm Resource: a Resource for Studying Texts Produced by French Pupils and Students
Lydia-Mai Ho-Dac, Serge Fleury and Claude Ponton
pp. 4327‑4332
pdf bib Introducing MULAI: A Multimodal Database of Laughter during Dyadic Interactions
Michel-Pierre Jansen, Khiet P. Truong, Dirk K.J. Heylen and Deniece S. Nazareth
pp. 4333‑4342
pdf bib The Connection between the Text and Images of News Articles: New Insights for Multimedia Analysis
Nelleke Oostdijk, Hans van Halteren, Erkan Bașar and Martha Larson
pp. 4343‑4351
pdf bib LifeQA: A Real-life Dataset for Video Question Answering
Santiago Castro, Mahmoud Azab, Jonathan Stroud, Cristina Noujaim, Ruoyao Wang, Jia Deng and Rada Mihalcea
pp. 4352‑4358
pdf bib A Domain-Specific Dataset of Difficulty Ratings for German Noun Compounds in the Domains DIY, Cooking and Automotive
Julia Bettinger, Anna Hätty, Michael Dorna and Sabine Schulte im Walde
pp. 4359‑4367
pdf bib All That Glitters is Not Gold: A Gold Standard of Adjective-Noun Collocations for German
Yana Strakatova, Neele Falk, Isabel Fuhrmann, Erhard Hinrichs and Daniela Rossmann
pp. 4368‑4378
pdf bib Variants of Vector Space Reductions for Predicting the Compositionality of English Noun Compounds
Pegah Alipoor and Sabine Schulte im Walde
pp. 4379‑4387
pdf bib Varying Vector Representations and Integrating Meaning Shifts into a PageRank Model for Automatic Term Extraction
Anurag Nigam, Anna Hätty and Sabine Schulte im Walde
pp. 4388‑4394
pdf bib Rigor Mortis: Annotating MWEs with a Gamified Platform
Karën Fort, Bruno Guillaume, Yann-Alan Pilatte, Mathieu Constant and Nicolas Lefèbvre
pp. 4395‑4401
pdf bib A Multi-word Expression Dataset for Swedish
Murathan Kurfalı, Robert Östling, Johan Sjons and Mats Wirén
pp. 4402‑4409
pdf bib A Joint Approach to Compound Splitting and Idiomatic Compound Detection
Irina Krotova, Sergey Aksenov and Ekaterina Artemova
pp. 4410‑4417
pdf bib Dedicated Language Resources for Interdisciplinary Research on Multiword Expressions: Best Thing since Sliced Bread
Ferdy Hubers, Catia Cucchiarini and Helmer Strik
pp. 4418‑4425
pdf bib Detecting Multiword Expression Type Helps Lexical Complexity Assessment
Ekaterina Kochmar, Sian Gooding and Matthew Shardlow
pp. 4426‑4435
pdf bib Introducing RONEC - the Romanian Named Entity Corpus
Stefan Daniel Dumitrescu and Andrei-Marius Avram
pp. 4436‑4443
pdf bib A Semi-supervised Approach for De-identification of Swedish Clinical Text
Hanna Berg and Hercules Dalianis
pp. 4444‑4450
pdf bib A Chinese Corpus for Fine-grained Entity Typing
Chin Lee, Hongliang Dai, Yangqiu Song and Xin Li
pp. 4451‑4457
pdf bib Czech Historical Named Entity Corpus v 1.0
Helena Hubková, Pavel Kral and Eva Pettersson
pp. 4458‑4465
pdf bib CodE Alltag 2.0 — A Pseudonymized German-Language Email Corpus
Elisabeth Eder, Ulrike Krieg-Holz and Udo Hahn
pp. 4466‑4477
pdf bib A Dataset of German Legal Documents for Named Entity Recognition
Elena Leitner, Georg Rehm and Julian Moreno-Schneider
pp. 4478‑4485
pdf bib Sensitive Data Detection and Classification in Spanish Clinical Text: Experiments with BERT
Aitor García Pablos, Naiara Perez and Montse Cuadros
pp. 4486‑4494
pdf bib Named Entities in Medical Case Reports: Corpus and Experiments
Sarah Schulz, Jurica Ševa, Samuel Rodriguez, Malte Ostendorff and Georg Rehm
pp. 4495‑4500
pdf bib Hedwig: A Named Entity Linker
Marcus Klang and Pierre Nugues
pp. 4501‑4508
pdf bib An Experiment in Annotating Animal Species Names from ISTEX Resources
Sabine Barreaux and Dominique Besagni
pp. 4509‑4513
pdf bib Where are we in Named Entity Recognition from Speech?
Antoine Caubrière, Sophie Rosset, Yannick Estève, Antoine Laurent and Emmanuel Morin
pp. 4514‑4520
pdf bib Tagging Location Phrases in Text
Paul McNamee, James Mayfield, Cash Costello, Caitlyn Bishop and Shelby Anderson
pp. 4521‑4528
pdf bib ScienceExamCER: A High-Density Fine-Grained Science-Domain Corpus for Common Entity Recognition
Hannah Smith, Zeyu Zhang, John Culnan and Peter Jansen
pp. 4529‑4546
pdf bib NorNE: Annotating Named Entities for Norwegian
Fredrik Jørgensen, Tobias Aasmoe, Anne-Stine Ruud Husevåg, Lilja Øvrelid and Erik Velldal
pp. 4547‑4556
pdf bib Tag Me If You Can! Semantic Annotation of Biodiversity Metadata with the QEMP Corpus and the BiodivTagger
Felicitas Löffler, Nora Abdelmageed, Samira Babalou, Pawandeep Kaur and Birgitta König-Ries
pp. 4557‑4564
pdf bib Towards a Versatile Medical-Annotation Guideline Feasible Without Heavy Medical Knowledge: Starting From Critical Lung Diseases
Shuntaro Yada, Ayami Joh, Ribeka Tanaka, Fei Cheng, Eiji Aramaki and Sadao Kurohashi
pp. 4565‑4572
pdf bib Creating a Dataset for Named Entity Recognition in the Archaeology Domain
Alex Brandsen, Suzan Verberne, Milco Wansleeben and Karsten Lambers
pp. 4573‑4577
pdf bib Development of a Medical Incident Report Corpus with Intention and Factuality Annotation
Hongkuan Zhang, Ryohei Sasano, Koichi Takeda and Zoie Shui-Yee Wong
pp. 4578‑4584
pdf bib ProGene - A Large-scale, High-Quality Protein-Gene Annotated Benchmark Corpus
Erik Faessler, Luise Modersohn, Christina Lohr and Udo Hahn
pp. 4585‑4596
pdf bib DaNE: A Named Entity Resource for Danish
Rasmus Hvingelby, Amalie Brogaard Pauli, Maria Barrett, Christina Rosted, Lasse Malm Lidegaard and Anders Søgaard
pp. 4597‑4604
pdf bib Fine-grained Named Entity Annotations for German Biographic Interviews
Josef Ruppenhofer, Ines Rehbein and Carolina Flinz
pp. 4605‑4614
pdf bib A Broad-coverage Corpus for Finnish Named Entity Recognition
Jouni Luoma, Miika Oinonen, Maria Pyykönen, Veronika Laippala and Sampo Pyysalo
pp. 4615‑4624
pdf bib Embeddings for Named Entity Recognition in Geoscience Portuguese Literature
Bernardo Consoli, Joaquim Santos, Diogo Gomes, Fabio Cordeiro, Renata Vieira and Viviane Moreira
pp. 4625‑4630
pdf bib Establishing a New State-of-the-Art for French Named Entity Recognition
Pedro Javier Ortiz Suárez, Yoann Dupont, Benjamin Muller, Laurent Romary and Benoît Sagot
pp. 4631‑4638
pdf bib Building OCR/NER Test Collections
Dawn Lawrie, James Mayfield and David Etter
pp. 4639‑4646
pdf bib Reconstructing NER Corpora: a Case Study on Bulgarian
Iva Marinova, Laska Laskova, Petya Osenova, Kiril Simov and Alexander Popov
pp. 4647‑4652
pdf bib MucLex: A German Lexicon for Surface Realisation
Kira Klimt, Daniel Braun, Daniela Schneider and Florian Matthes
pp. 4653‑4657
pdf bib Generating Major Types of Chinese Classical Poetry in a Uniformed Framework
Jinyi Hu and Maosong Sun
pp. 4658‑4663
pdf bib Video Caption Dataset for Describing Human Actions in Japanese
Yutaro Shigeto, Yuya Yoshikawa, Jiaqing Lin and Akikazu Takeuchi
pp. 4664‑4670
pdf bib Decode with Template: Content Preserving Sentiment Transfer
Zhiyuan Wen, Jiannong Cao, Ruosong Yang and Senzhang Wang
pp. 4671‑4679
pdf bib Best Student Forcing: A Simple Training Mechanism in Adversarial Language Generation
Jonathan Sauder, Ting Hu, Xiaoyin Che, Goncalo Mordido, Haojin Yang and Christoph Meinel
pp. 4680‑4688
pdf bib Controllable Sentence Simplification
Louis Martin, Éric de la Clergerie, Benoît Sagot and Antoine Bordes
pp. 4689‑4698
pdf bib Exploring Transformer Text Generation for Medical Dataset Augmentation
Ali Amin-Nejad, Julia Ive and Sumithra Velupillai
pp. 4699‑4708
pdf bib Multi-lingual Mathematical Word Problem Generation using Long Short Term Memory Networks with Enhanced Input Features
Vijini Liyanage and Surangika Ranathunga
pp. 4709‑4716
pdf bib Time-Aware Word Embeddings for Three Lebanese News Archives
Jad Doughman, Fatima Abu Salem and Shady Elbassuoni
pp. 4717‑4725
pdf bib GGP: Glossary Guided Post-processing for Word Embedding Learning
Ruosong Yang, Jiannong Cao and Zhiyuan Wen
pp. 4726‑4730
pdf bib High Quality ELMo Embeddings for Seven Less-Resourced Languages
Matej Ulčar and Marko Robnik-Šikonja
pp. 4731‑4738
pdf bib Is Language Modeling Enough? Evaluating Effective Embedding Combinations
Rudolf Schneider, Tom Oberhauser, Paul Grundmann, Felix Alexander Gers, Alexander Loeser and Steffen Staab
pp. 4739‑4748
pdf bib Language Modeling with a General Second-Order RNN
Diego Maupomé and Marie-Jean Meurs
pp. 4749‑4753
pdf bib Towards a Gold Standard for Evaluating Danish Word Embeddings
Nina Schneidermann, Rasmus Hvingelby and Bolette Pedersen
pp. 4754‑4763
pdf bib Urban Dictionary Embeddings for Slang NLP Applications
Steven Wilson, Walid Magdy, Barbara McGillivray, Kiran Garimella and Gareth Tyson
pp. 4764‑4773
pdf bib Representation Learning for Unseen Words by Bridging Subwords to Semantic Networks
Yeachan Kim, Kang-Min Kim and SangKeun Lee
pp. 4774‑4780
pdf bib Give your Text Representation Models some Love: the Case for Basque
Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa and Eneko Agirre
pp. 4781‑4788
pdf bib On the Correlation of Word Embedding Evaluation Metrics
François Torregrossa, Vincent Claveau, Nihel Kooli, Guillaume Gravier and Robin Allesiardo
pp. 4789‑4797
pdf bib CBOW-tag: a Modified CBOW Algorithm for Generating Embedding Models from Annotated Corpora
Attila Novák, László Laki and Borbála Novák
pp. 4798‑4801
pdf bib Much Ado About Nothing – Identification of Zero Copulas in Hungarian Using an NMT Model
Andrea Dömötör, Zijian Győző Yang and Attila Novák
pp. 4802‑4810
pdf bib Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift
Matej Martinc, Petra Kralj Novak and Senja Pollak
pp. 4811‑4819
pdf bib Improving NMT Quality Using Terminology Injection
Duane K. Dougal and Deryle Lonsdale
pp. 4820‑4827
pdf bib Word Embedding Evaluation in Downstream Tasks and Semantic Analogies
Joaquim Santos, Bernardo Consoli and Renata Vieira
pp. 4828‑4834
pdf bib Detection of Reading Absorption in User-Generated Book Reviews: Resources Creation and Evaluation
Piroska Lendvai, Sándor Darányi, Christian Geng, Moniek Kuijpers, Oier Lopez de Lacalle, Jean-Christophe Mensonides, Simone Rebora and Uwe Reichel
pp. 4835‑4841
pdf bib Developing an Arabic Infectious Disease Ontology to Include Non-Standard Terminology
Lama Alsudias and Paul Rayson
pp. 4842‑4850
pdf bib Aligning Wikipedia with WordNet: a Review and Evaluation of Different Techniques
Antoni Oliver
pp. 4851‑4858
pdf bib The MWN.PT WordNet for Portuguese: Projection, Validation, Cross-lingual Alignment and Distribution
António Branco, Sara Grilo, Márcia Bolrinha, Chakaveh Saedi, Ruben Branco, João Silva, Andreia Querido, Rita de Carvalho, Rosa Gaudio, Mariana Avelãs and Clara Pinto
pp. 4859‑4866
pdf bib Ontology-Style Relation Annotation: A Case Study
Savong Bou, Naoki Suzuki, Makoto Miwa and Yutaka Sasaki
pp. 4867‑4876
pdf bib The Ontology of Bulgarian Dialects – Architecture and Information Retrieval
Rositsa Dekova
pp. 4877‑4882
pdf bib Spatial AMR: Expanded Spatial Annotation in the Context of a Grounded Minecraft Corpus
Julia Bonn, Martha Palmer, Zheng Cai and Kristin Wright-Bettner
pp. 4883‑4892
pdf bib English WordNet Random Walk Pseudo-Corpora
Filip Klubička, Alfredo Maldonado, Abhijit Mahalunkar and John Kelleher
pp. 4893‑4902
pdf bib On the Formal Standardization of Terminology Resources: The Case Study of TriMED
Federica Vezzani and Giorgio Maria Di Nunzio
pp. 4903‑4910
pdf bib Metaphorical Expressions in Automatic Arabic Sentiment Analysis
Israa Alsiyat and Scott Piao
pp. 4911‑4916
pdf bib HotelRec: a Novel Very Large-Scale Hotel Recommendation Dataset
Diego Antognini and Boi Faltings
pp. 4917‑4923
pdf bib Doctor Who? Framing Through Names and Titles in German
Esther van den Berg, Katharina Korfhage, Josef Ruppenhofer, Michael Wiegand and Katja Markert
pp. 4924‑4932
pdf bib Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification
Alexander Rietzler, Sebastian Stabinger, Paul Opitz and Stefan Engl
pp. 4933‑4941
pdf bib An Empirical Examination of Online Restaurant Reviews
Hyun Jung Kang and Iris Eshkol-Taravella
pp. 4942‑4947
pdf bib Manovaad: A Novel Approach to Event Oriented Corpus Creation Capturing Subjectivity and Focus
Lalitha Kameswari and Radhika Mamidi
pp. 4948‑4954
pdf bib Toward Qualitative Evaluation of Embeddings for Arabic Sentiment Analysis
Amira Barhoumi, Nathalie Camelin, Chafik Aloulou, Yannick Estève and Lamia Hadrich Belguith
pp. 4955‑4963
pdf bib Annotating Perspectives on Vaccination
Roser Morante, Chantal van Son, Isa Maks and Piek Vossen
pp. 4964‑4973
pdf bib Aspect On: an Interactive Solution for Post-Editing the Aspect Extraction based on Online Learning
Mara Chinea-Rios, Marc Franco-Salvador and Yassine Benajiba
pp. 4974‑4981
pdf bib Recommendation Chart of Domains for Cross-Domain Sentiment Analysis: Findings of A 20 Domain Study
Akash Sheoran, Diptesh Kanojia, Aditya Joshi and Pushpak Bhattacharyya
pp. 4982‑4990
pdf bib Inference Annotation of a Chinese Corpus for Opinion Mining
Liyun Yan, Danni E, Mei Gan, Cyril Grouin and Mathieu Valette
pp. 4991‑4999
pdf bib Cooking Up a Neural-based Model for Recipe Classification
Elham Mohammadi, Nada Naji, Louis Marceau, Marc Queudot, Eric Charton, Leila Kosseim and Marie-Jean Meurs
pp. 5000‑5009
pdf bib Enhancing a Lexicon of Polarity Shifters through the Supervised Classification of Shifting Directions
Marc Schulder, Michael Wiegand and Josef Ruppenhofer
pp. 5010‑5016
pdf bib Dataset Creation and Evaluation of Aspect Based Sentiment Analysis in Telugu, a Low Resource Language
Yashwanth Reddy Regatte, Rama Rohit Reddy Gangula and Radhika Mamidi
pp. 5017‑5024
pdf bib A Fine-grained Sentiment Dataset for Norwegian
Lilja Øvrelid, Petter Mæhlum, Jeremy Barnes and Erik Velldal
pp. 5025‑5033
pdf bib The Design and Construction of a Chinese Sarcasm Dataset
Xiaochang Gong, Qin Zhao, Jun Zhang, Ruibin Mao and Ruifeng Xu
pp. 5034‑5039
pdf bib Target-based Sentiment Annotation in Chinese Financial News
Chaofa Yuan, Yuhan Liu, Rongdi Yin, Jun Zhang, Qinling Zhu, Ruibin Mao and Ruifeng Xu
pp. 5040‑5045
pdf bib Multi-domain Tweet Corpora for Sentiment Analysis: Resource Creation and Evaluation
Mamta, Asif Ekbal, Pushpak Bhattacharyya, Shikha Srivastava, Alka Kumar and Tista Saha
pp. 5046‑5054
pdf bib Reproduction and Revival of the Argument Reasoning Comprehension Task
João António Rodrigues, Ruben Branco, João Silva and António Branco
pp. 5055‑5064
pdf bib Design and Evaluation of SentiEcon: a fine-grained Economic/Financial Sentiment Lexicon from a Corpus of Business News
Antonio Moreno-Ortiz, Javier Fernandez-Cruz and Chantal Pérez-Hernández
pp. 5065‑5072
pdf bib ParlVote: A Corpus for Sentiment Analysis of Political Debates
Gavin Abercrombie and Riza Batista-Navarro
pp. 5073‑5078
pdf bib Offensive Language Detection Using Brown Clustering
Zuoyu Tian and Sandra Kübler
pp. 5079‑5087
pdf bib Annotating for Hate Speech: The MaNeCo Corpus and Some Input from Critical Discourse Analysis
Stavros Assimakopoulos, Rebecca Vella Muskat, Lonneke van der Plas and Albert Gatt
pp. 5088‑5097
pdf bib Marking Irony Activators in a Universal Dependencies Treebank: The Case of an Italian Twitter Corpus
Alessandra Teresa Cignarella, Manuela Sanguinetti, Cristina Bosco and Paolo Rosso
pp. 5098‑5105
pdf bib HAHA 2019 Dataset: A Corpus for Humor Analysis in Spanish
Luis Chiruzzo, Santiago Castro and Aiala Rosá
pp. 5106‑5112
pdf bib Offensive Language Identification in Greek
Zesis Pitenis, Marcos Zampieri and Tharindu Ranasinghe
pp. 5113‑5119
pdf bib Syntax and Semantics in a Treebank for Esperanto
Eckhard Bick
pp. 5120‑5127
pdf bib Implementation and Evaluation of an LFG-based Parser for Wolof
Cheikh M. Bamba Dione
pp. 5128‑5136
pdf bib The Treebank of Vedic Sanskrit
Oliver Hellwig, Salvatore Scarlata, Elia Ackermann and Paul Widmer
pp. 5137‑5146
pdf bib Inherent Dependency Displacement Bias of Transition-Based Algorithms
Mark Anderson and Carlos Gómez-Rodríguez
pp. 5147‑5155
pdf bib A Gold Standard Dependency Treebank for Turkish
Tolga Kayadelen, Adnan Ozturel and Bernd Bohnet
pp. 5156‑5163
pdf bib Chunk Different Kind of Spoken Discourse: Challenges for Machine Learning
Iris Eshkol-Taravella, Mariame Maarouf, Flora Badin, Marie Skrovec and Isabelle Tellier
pp. 5164‑5168
pdf bib GRAIN-S: Manually Annotated Syntax for German Interviews
Agnieszka Falenska, Zoltán Czesznak, Kerstin Jung, Moritz Völkel, Wolfgang Seeker and Jonas Kuhn
pp. 5169‑5177
pdf bib Yorùbá Dependency Treebank (YTB)
Olájídé Ishola and Daniel Zeman
pp. 5178‑5186
pdf bib English Recipe Flow Graph Corpus
Yoko Yamakata, Shinsuke Mori and John Carroll
pp. 5187‑5194
pdf bib Development of a General-Purpose Categorial Grammar Treebank
Yusuke Kubota, Koji Mineshima, Noritsugu Hayashi and Shinya Okano
pp. 5195‑5201
pdf bib Dependency Parsing for Urdu: Resources, Conversions and Learning
Toqeer Ehsan and Miriam Butt
pp. 5202‑5207
pdf bib Prague Dependency Treebank - Consolidated 1.0
Jan Hajic, Eduard Bejček, Jaroslava Hlavacova, Marie Mikulová, Milan Straka, Jan Štěpánek and Barbora Štěpánková
pp. 5208‑5218
pdf bib Training a Swedish Constituency Parser on Six Incompatible Treebanks
Richard Johansson and Yvonne Adesam
pp. 5219‑5224
pdf bib Parsing as Tagging
Robert Vacareanu, George Caique Gouveia Barbosa, Marco A. Valenzuela-Escárcega and Mihai Surdeanu
pp. 5225‑5231
pdf bib The EDGeS Diachronic Bible Corpus
Gerlof Bouma, Evie Coussé, Trude Dijkstra and Nicoline van der Sijs
pp. 5232‑5239
pdf bib Treebanking User-Generated Content: A Proposal for a Unified Representation in Universal Dependencies
Manuela Sanguinetti, Cristina Bosco, Lauren Cassidy, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah and Amir Zeldes
pp. 5240‑5250
pdf bib A Diachronic Treebank of Russian Spanning More Than a Thousand Years
Aleksandrs Berdicevskis and Hanne Eckhoff
pp. 5251‑5256
pdf bib ÆTHEL: Automatically Extracted Typelogical Derivations for Dutch
Konstantinos Kogkalidis, Michael Moortgat and Richard Moot
pp. 5257‑5266
pdf bib GUMBY – A Free, Balanced, and Rich English Web Corpus
Luke Gessler, Siyao Peng, Yang Liu, Yilun Zhu, Shabnam Behzad and Amir Zeldes
pp. 5267‑5275
pdf bib Typical Sentences as a Resource for Valence
Uwe Quasthoff, Lars Hellan, Erik Körner, Thomas Eckart, Dirk Goldhahn and Dorothee Beermann
pp. 5276‑5281
pdf bib Recognizing Sentence-level Logical Document Structures with the Help of Context-free Grammars
Jonathan Hildebrand, Wahed Hemati and Alexander Mehler
pp. 5282‑5290
pdf bib When Collaborative Treebank Curation Meets Graph Grammars
Gaël Guibon, Marine Courtin, Kim Gerdes and Bruno Guillaume
pp. 5291‑5300
pdf bib ODIL_Syntax: a Free Spontaneous Spoken French Treebank Annotated with Constituent Trees
Ilaine Wang, Aurore Pelletier, Jean-Yves Antoine and Anaïs Halftermeyer
pp. 5301‑5307
pdf bib Towards the Conversion of National Corpus of Polish to Universal Dependencies
Alina Wróblewska
pp. 5308‑5315
pdf bib SegBo: A Database of Borrowed Sounds in the World’s Languages
Eitan Grossman, Elad Eisen, Dmitry Nikolaev and Steven Moran
pp. 5316‑5322
pdf bib Developing Resources for Automated Speech Processing of Quebec French
Mélanie Lancien, Marie-Hélène Côté and Brigitte Bigi
pp. 5323‑5328
pdf bib AlloVera: A Multilingual Allophone Database
David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W Black, Florian Metze and Graham Neubig
pp. 5329‑5336
pdf bib Arabic Speech Rhythm Corpus: Read and Spontaneous Speaking Styles
Omnia Ibrahim, Homa Asadi, Eman Kassem and Volker Dellwo
pp. 5337‑5342
pdf bib Comparing Methods for Measuring Dialect Similarity in Norwegian
Janne Johannessen, Andre Kåsen, Kristin Hagen, Anders Nøklestad and Joel Priestley
pp. 5343‑5350
pdf bib AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition
Afroz Ahamad, Ankit Anand and Pranesh Bhargava
pp. 5351‑5358
pdf bib A Framework for Evaluation of Machine Reading Comprehension Gold Standards
Viktor Schlegel, Marco Valentino, Andre Freitas, Goran Nenadic and Riza Batista-Navarro
pp. 5359‑5369
pdf bib Multi-class Hierarchical Question Classification for Multiple Choice Science Exams
Dongfang Xu, Peter Jansen, Jaycie Martin, Zhengnan Xie, Vikas Yadav, Harish Tayyar Madabushi, Oyvind Tafjord and Peter Clark
pp. 5370‑5382
pdf bib Assessing Users’ Reputation from Syntactic and Semantic Information in Community Question Answering
Yonas Woldemariam
pp. 5383‑5391
pdf bib Unsupervised Domain Adaptation of Language Models for Reading Comprehension
Kosuke Nishida, Kyosuke Nishida, Itsumi Saito, Hisako Asano and Junji Tomita
pp. 5392‑5399
pdf bib Propagate-Selector: Detecting Supporting Sentences for Question Answering via Graph Neural Networks
Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui and Kyomin Jung
pp. 5400‑5407
pdf bib An Empirical Comparison of Question Classification Methods for Question Answering Systems
Eduardo Cortes, Vinicius Woloszyn, Arne Binder, Tilo Himmelsbach, Dante Barone and Sebastian Möller
pp. 5408‑5416
pdf bib Cross-sentence Pre-trained Model for Interactive QA matching
Jinmeng Wu and Yanbin Hao
pp. 5417‑5424
pdf bib SQuAD2-CR: Semi-supervised Annotation for Cause and Rationales for Unanswerability in SQuAD 2.0
Gyeongbok Lee, Seung-won Hwang and Hyunsouk Cho
pp. 5425‑5432
pdf bib Generating Responses that Reflect Meta Information in User-Generated Question Answer Pairs
Takashi Kodama, Ryuichiro Higashinaka, Koh Mitsuda, Ryo Masumura, Yushi Aono, Ryuta Nakamura, Noritake Adachi and Hidetoshi Kawabata
pp. 5433‑5441
pdf bib AIA-BDE: A Corpus of FAQs in Portuguese and their Variations
Hugo Gonçalo Oliveira, João Ferreira, José Santos, Pedro Fialho, Ricardo Rodrigues, Luisa Coheur and Ana Alves
pp. 5442‑5449
pdf bib TutorialVQA: Question Answering Dataset for Tutorial Videos
Anthony Colas, Seokhwan Kim, Franck Dernoncourt, Siddhesh Gupte, Zhe Wang and Doo Soon Kim
pp. 5450‑5455
pdf bib WorldTree V2: A Corpus of Science-Domain Structured Explanations and Inference Patterns supporting Multi-Hop Inference
Zhengnan Xie, Sebastian Thiem, Jaycie Martin, Elizabeth Wainwright, Steven Marmorstein and Peter Jansen
pp. 5456‑5473
pdf bib Chat or Learn: a Data-Driven Robust Question-Answering System
Gabriel Luthier and Andrei Popescu-Belis
pp. 5474‑5480
pdf bib Project PIAF: Building a Native French Question-Answering Dataset
Rachel Keraron, Guillaume Lancrenon, Mathilde Bras, Frédéric Allary, Gilles Moyse, Thomas Scialom, Edmundo-Pavel Soriano-Morales and Jacopo Staiano
pp. 5481‑5490
pdf bib Cross-lingual and Cross-domain Evaluation of Machine Reading Comprehension with Squad and CALOR-Quest Corpora
Delphine Charlet, Geraldine Damnati, Frederic Bechet, Gabriel Marzinotto and Johannes Heinecke
pp. 5491‑5497
pdf bib ScholarlyRead: A New Dataset for Scientific Article Reading Comprehension
Tanik Saikh, Asif Ekbal and Pushpak Bhattacharyya
pp. 5498‑5504
pdf bib Contextualized Embeddings based Transformer Encoder for Sentence Similarity Modeling in Answer Selection Task
Md Tahmid Rahman Laskar, Jimmy Xiangji Huang and Enamul Hoque
pp. 5505‑5514
pdf bib Automatic Spanish Translation of SQuAD Dataset for Multi-lingual Question Answering
Casimiro Pio Carrino, Marta R. Costa-jussà and José A. R. Fonollosa
pp. 5515‑5523
pdf bib A Corpus for Visual Question Answering Annotated with Frame Semantic Information
Mehrdad Alizadeh and Barbara Di Eugenio
pp. 5524‑5531
pdf bib Evaluation of Dataset Selection for Pre-Training and Fine-Tuning Transformer Language Models for Clinical Question Answering
Sarvesh Soni and Kirk Roberts
pp. 5532‑5538
pdf bib A Shared Task of a New, Collaborative Type to Foster Reproducibility: A First Exercise in the Area of Language Science and Technology with REPROLANG2020
António Branco, Nicoletta Calzolari, Piek Vossen, Gertjan Van Noord, Dieter van Uytvanck, João Silva, Luís Gomes, André Moreira and Willem Elbers
pp. 5539‑5545
pdf bib A Robust Self-Learning Method for Fully Unsupervised Cross-Lingual Mappings of Word Embeddings: Making the Method Robustly Reproducible as Well
Nicolas Garneau, Mathieu Godbout, David Beauchemin, Audrey Durand and Luc Lamontagne
pp. 5546‑5554
pdf bib A Closer Look on Unsupervised Cross-lingual Word Embeddings Mapping
Kamil Pluciński, Mateusz Lango and Michał Zimniewicz
pp. 5555‑5562
pdf bib Reproducing a Morphosyntactic Tagger with a Meta-BiLSTM Model over Context Sensitive Token Encodings
Yung Han Khoe
pp. 5563‑5568
pdf bib Reproducing Neural Ensemble Classifier for Semantic Relation Extraction in Scientific Papers
Kyeongmin Rim, Jingxuan Tu, Kelley Lynch and James Pustejovsky
pp. 5569‑5578
pdf bib Text Classification Using Language Modeling: Reproducing ULMFiT
Mohamed Abdellatif and Ahmed Elgammal
pp. 5579‑5587
pdf bib CombiNMT: An Exploration into Neural Text Simplification Models
Michael Cooper and Matthew Shardlow
pp. 5588‑5594
pdf bib Reproducing Monolingual, Multilingual and Cross-Lingual CEFR Predictions
Yves Bestgen
pp. 5595‑5602
pdf bib Reproduction and Replication: A Case Study with Automatic Essay Scoring
Eva Huber and Çağrı Çöltekin
pp. 5603‑5613
pdf bib REPROLANG 2020: Automatic Proficiency Scoring of Czech, English, German, Italian, and Spanish Learner Essays
Andrew Caines and Paula Buttery
pp. 5614‑5623
pdf bib Language Proficiency Scoring
Cristina Arhiliuc, Jelena Mitrović and Michael Granitzer
pp. 5624‑5630
pdf bib The Learnability of the Annotated Input in NMT Replicating (Vanmassenhove and Way, 2018) with OpenNMT
Nicolas Ballier, Nabil Amari, Laure Merat and Jean-Baptiste Yunès
pp. 5631‑5640
pdf bib KGvec2go – Knowledge Graph Embeddings as a Service
Jan Portisch, Michael Hladik and Heiko Paulheim
pp. 5641‑5647
pdf bib Ontology Matching Using Convolutional Neural Networks
Alexandre Bento, Amal Zouaq and Michel Gagnon
pp. 5648‑5653
pdf bib Defying Wikidata: Validation of Terminological Relations in the Web of Data
Patricia Martín-Chozas, Sina Ahmadi and Elena Montiel-Ponsoda
pp. 5654‑5659
pdf bib Recent Developments for the Linguistic Linked Open Data Infrastructure
Thierry Declerck, John Philip McCrae, Matthias Hartung, Jorge Gracia, Christian Chiarcos, Elena Montiel-Ponsoda, Philipp Cimiano, Artem Revenko, Roser Saurí, Deirdre Lee, Stefania Racioppa, Jamal Abdul Nasir, Matthias Orlikowski, Marta Lanau-Coronas, Christian Fäth, Mariano Rico, Mohammad Fazleh Elahi, Maria Khvalchik, Meritxell Gonzalez and Katharine Cooney
pp. 5660‑5667
pdf bib Annotation Interoperability for the Post-ISOCat Era
Christian Chiarcos, Christian Fäth and Frank Abromeit
pp. 5668‑5677
pdf bib A Large Harvested Corpus of Location Metonymy
Kevin Alex Mathews and Michael Strube
pp. 5678‑5687
pdf bib The DAPRECO Knowledge Base: Representing the GDPR in LegalRuleML
Livio Robaldo, Cesare Bartolini and Gabriele Lenzini
pp. 5688‑5697
pdf bib The Universal Decompositional Semantics Dataset and Decomp Toolkit
Aaron Steven White, Elias Stengel-Eskin, Siddharth Vashishtha, Venkata Subrahmanyan Govindarajan, Dee Ann Reisinger, Tim Vieira, Keisuke Sakaguchi, Sheng Zhang, Francis Ferraro, Rachel Rudinger, Kyle Rawlins and Benjamin Van Durme
pp. 5698‑5707
pdf bib Are Word Embeddings Really a Bad Fit for the Estimation of Thematic Fit?
Emmanuele Chersoni, Ludovica Pannitto, Enrico Santus, Alessandro Lenci and Chu-Ren Huang
pp. 5708‑5713
pdf bib Ciron: a New Benchmark Dataset for Chinese Irony Detection
Rong Xiang, Xuefeng Gao, Yunfei Long, Anran Li, Emmanuele Chersoni, Qin Lu and Chu-Ren Huang
pp. 5714‑5720
pdf bib wikiHowToImprove: A Resource and Analyses on Edits in Instructional Texts
Talita Anthonio, Irshad Bhat and Michael Roth
pp. 5721‑5729
pdf bib Must Children be Vaccinated or not? Annotating Modal Verbs in the Vaccination Debate
Liza King and Roser Morante
pp. 5730‑5738
pdf bib NegBERT: A Transfer Learning Approach for Negation Detection and Scope Resolution
Aditya Khandelwal and Suraj Sawant
pp. 5739‑5748
pdf bib Spatial Multi-Arrangement for Clustering and Multi-way Similarity Dataset Construction
Olga Majewska, Diana McCarthy, Jasper van den Bosch, Nikolaus Kriegeskorte, Ivan Vulić and Anna Korhonen
pp. 5749‑5758
pdf bib A Short Survey on Sense-Annotated Corpora
Tommaso Pasini and Jose Camacho-Collados
pp. 5759‑5765
pdf bib Using Distributional Thesaurus Embedding for Co-hyponymy Detection
Abhik Jana, Nikhil Reddy Varimalla and Pawan Goyal
pp. 5766‑5771
pdf bib NUBes: A Corpus of Negation and Uncertainty in Spanish Clinical Texts
Salvador Lima Lopez, Naiara Perez, Montse Cuadros and German Rigau
pp. 5772‑5781
pdf bib Decomposing and Comparing Meaning Relations: Paraphrasing, Textual Entailment, Contradiction, and Specificity
Venelin Kovatchev, Darina Gold, M. Antonia Marti, Maria Salamo and Torsten Zesch
pp. 5782‑5791
pdf bib Object Naming in Language and Vision: A Survey and a New Dataset
Carina Silberer, Sina Zarrieß and Gemma Boleda
pp. 5792‑5801
pdf bib MSD-1030: A Well-built Multi-Sense Evaluation Dataset for Sense Representation Models
Ting-Yu Yen, Yang-Yin Lee, Yow-Ting Shiue, Hen-Hsen Huang and Hsin-Hsi Chen
pp. 5802‑5809
pdf bib Figure Me Out: A Gold Standard Dataset for Metaphor Interpretation
Omnia Zayed, John Philip McCrae and Paul Buitelaar
pp. 5810‑5819
pdf bib Extrinsic Evaluation of French Dependency Parsers on a Specialized Corpus: Comparison of Distributional Thesauri
Ludovic Tanguy, Pauline Brunet and Olivier Ferret
pp. 5820‑5828
pdf bib Dataset and Enhanced Model for Eligibility Criteria-to-SQL Semantic Parsing
Xiaojing Yu, Tianlong Chen, Zhengjie Yu, Huiyu Li, Yang Yang, Xiaoqian Jiang and Anxiao Jiang
pp. 5829‑5837
pdf bib Recognizing Semantic Relations by Combining Transformers and Fully Connected Models
Dmitri Roussinov, Serge Sharoff and Nadezhda Puchnina
pp. 5838‑5845
pdf bib Word Attribute Prediction Enhanced by Lexical Entailment Tasks
Mika Hasegawa, Tetsunori Kobayashi and Yoshihiko Hayashi
pp. 5846‑5854
pdf bib From Spatial Relations to Spatial Configurations
Soham Dan, Parisa Kordjamshidi, Julia Bonn, Archna Bhatia, Zheng Cai, Martha Palmer and Dan Roth
pp. 5855‑5864
pdf bib Representing Verbs with Visual Argument Vectors
Irene Sucameli and Alessandro Lenci
pp. 5865‑5870
pdf bib Are White Ravens Ever White? - Non-Literal Adjective-Noun Phrases in Polish
Agnieszka Mykowiecka and Malgorzata Marciniak
pp. 5871‑5877
pdf bib CoSimLex: A Resource for Evaluating Graded Word Similarity in Context
Carlos Santos Armendariz, Matthew Purver, Matej Ulčar, Senja Pollak, Nikola Ljubešić and Mark Granroth-Wilding
pp. 5878‑5886
pdf bib A French Version of the FraCaS Test Suite
Maxime Amblard, Clément Beysson, Philippe de Groote, Bruno Guillaume and Sylvain Pogodalla
pp. 5887‑5895
pdf bib Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System
Seid Muhie Yimam, Gopalakrishnan Venkatesh, John Lee and Chris Biemann
pp. 5896‑5904
pdf bib Sense-Annotated Corpora for Word Sense Disambiguation in Multiple Languages and Domains
Bianca Scarlini, Tommaso Pasini and Roberto Navigli
pp. 5905‑5911
pdf bib FrSemCor: Annotating a French Corpus with Supersenses
Lucie Barque, Pauline Haas, Richard Huyghe, Delphine Tribout, Marie Candito, Benoit Crabbé and Vincent Segonne
pp. 5912‑5918
pdf bib A Formal Analysis of Multimodal Referring Strategies Under Common Ground
Nikhil Krishnaswamy and James Pustejovsky
pp. 5919‑5927
pdf bib Improving Neural Metaphor Detection with Visual Datasets
Gitit Kehat and James Pustejovsky
pp. 5928‑5933
pdf bib Building a Hebrew Semantic Role Labeling Lexical Resource from Parallel Movie Subtitles
Ben Eyal and Michael Elhadad
pp. 5934‑5942
pdf bib Word Sense Disambiguation for 158 Languages using Word Embeddings Only
Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto and Alexander Panchenko
pp. 5943‑5952
pdf bib Extraction of Hyponymic Relations in French with Knowledge-Pattern-Based Word Sketches
Antonio San Martín, Catherine Trekker and Pilar León-Araúz
pp. 5953‑5961
pdf bib SeCoDa: Sense Complexity Dataset
David Strohmaier, Sian Gooding, Shiva Taslimipoor and Ekaterina Kochmar
pp. 5962‑5967
pdf bib A New Resource for German Causal Language
Ines Rehbein and Josef Ruppenhofer
pp. 5968‑5977
pdf bib One Classifier for All Ambiguous Words: Overcoming Data Sparsity by Utilizing Sense Correlations Across Words
Prafulla Kumar Choubey and Ruihong Huang
pp. 5978‑5985
pdf bib A Corpus of Adpositional Supersenses for Mandarin Chinese
Siyao Peng, Yang Liu, Yilun Zhu, Austin Blodgett, Yushi Zhao and Nathan Schneider
pp. 5986‑5994
pdf bib The Russian PropBank
Sarah Moeller, Irina Wagner, Martha Palmer, Kathryn Conger and Skatje Myers
pp. 5995‑6002
pdf bib What Comes First: Combining Motion Capture and Eye Tracking Data to Study the Order of Articulators in Constructed Action in Sign Language Narratives
Tommi Jantunen, Anna Puupponen and Birgitta Burger
pp. 6003‑6007
pdf bib LSF-ANIMAL: A Motion Capture Corpus in French Sign Language Designed for the Animation of Signing Avatars
Lucie Naert, Caroline Larboulette and Sylvie Gibet
pp. 6008‑6017
pdf bib Sign Language Recognition with Transformer Networks
Mathieu De Coster, Mieke Van Herreweghe and Joni Dambre
pp. 6018‑6024
pdf bib Annotating a Fable in Italian Sign Language (LIS)
Serena Trolvi and Rodolfo Delmonte
pp. 6025‑6034
pdf bib HamNoSyS2SiGML: Translating HamNoSys Into SiGML
Carolina Neves, Luísa Coheur and Hugo Nicolau
pp. 6035‑6039
pdf bib Dicta-Sign-LSF-v2: Remake of a Continuous French Sign Language Dialogue Corpus and a First Baseline for Automatic Sign Language Processing
Valentin Belissen, Annelies Braffort and Michèle Gouiffès
pp. 6040‑6048
pdf bib An HMM Approach with Inherent Model Selection for Sign Language and Gesture Recognition
Sandrine Tornay, Oya Aran and Mathew Magimai Doss
pp. 6049‑6056
pdf bib VROAV: Using Iconicity to Visually Represent Abstract Verbs
Simone Scicluna and Carlo Strapparava
pp. 6057‑6062
pdf bib MEDIAPI-SKEL - A 2D-Skeleton Video Database of French Sign Language With Aligned French Subtitles
Hannah Bull, Annelies Braffort and Michèle Gouiffès
pp. 6063‑6068
pdf bib Alignment Data base for a Sign Language Concordancer
Marion Kaczmarek and Michael Filhol
pp. 6069‑6072
pdf bib Evaluation of Manual and Non-manual Components for Sign Language Recognition
Medet Mukushev, Arman Sabyrov, Alfarabi Imashev, Kenessary Koishybay, Vadim Kimmelman and Anara Sandygulova
pp. 6073‑6078
pdf bib TheRuSLan: Database of Russian Sign Language
Ildar Kagirov, Denis Ivanko, Dmitry Ryumin, Alexander Axyonov and Alexey Karpov
pp. 6079‑6085
pdf bib A Survey on Natural Language Processing for Fake News Detection
Ray Oshikawa, Jing Qian and William Yang Wang
pp. 6086‑6093
pdf bib RP-DNN: A Tweet Level Propagation Context Based Deep Neural Networks for Early Rumor Detection in Social Media
Jie Gao, Sooji Han, Xingyi Song and Fabio Ciravegna
pp. 6094‑6105
pdf bib Issues and Perspectives from 10,000 Annotated Financial Social Media Data
Chung-Chi Chen, Hen-Hsen Huang and Hsin-Hsi Chen
pp. 6106‑6110
pdf bib Searching Brazilian Twitter for Signs of Mental Health Issues
Wesley Santos, Amanda Funabashi and Ivandré Paraboni
pp. 6111‑6117
pdf bib RedDust: a Large Reusable Dataset of Reddit User Traits
Anna Tigunova, Paramita Mirza, Andrew Yates and Gerhard Weikum
pp. 6118‑6126
pdf bib An Annotated Social Media Corpus for German
Eckhard Bick
pp. 6127‑6135
pdf bib The rJokes Dataset: a Large Scale Humor Collection
Orion Weller and Kevin Seppi
pp. 6136‑6141
pdf bib EmpiriST Corpus 2.0: Adding Manual Normalization, Lemmatization and Semantic Tagging to a German Web and CMC Corpus
Thomas Proisl, Natalie Dykes, Philipp Heinrich, Besim Kabashi, Andreas Blombach and Stefan Evert
pp. 6142‑6148
pdf bib Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection
Kai Nakamura, Sharon Levy and William Yang Wang
pp. 6149‑6157
pdf bib Optimising Twitter-based Political Election Prediction with Relevance and Sentiment Filters
Eric Sanders and Antal van den Bosch
pp. 6158‑6165
pdf bib A Real-Time System for Credibility on Twitter
Adrian Iftene, Daniela Gifu, Andrei-Remus Miron and Mihai-Stefan Dudu
pp. 6166‑6173
pdf bib A Corpus of Turkish Offensive Language on Social Media
Çağrı Çöltekin
pp. 6174‑6184
pdf bib From Witch’s Shot to Music Making Bones - Resources for Medical Laymen to Technical Language and Vice Versa
Laura Seiffe, Oliver Marten, Michael Mikhailov, Sven Schmeier, Sebastian Möller and Roland Roller
pp. 6185‑6192
pdf bib I Feel Offended, Don’t Be Abusive! Implicit/Explicit Messages in Offensive and Abusive Language
Tommaso Caselli, Valerio Basile, Jelena Mitrović, Inga Kartoziya and Michael Granitzer
pp. 6193‑6202
pdf bib A Multi-Platform Arabic News Comment Dataset for Offensive Language Detection
Shammur Absar Chowdhury, Hamdy Mubarak, Ahmed Abdelali, Soon-gyo Jung, Bernard J Jansen and Joni Salminen
pp. 6203‑6212
pdf bib Twitter Trend Extraction: A Graph-based Approach for Tweet and Hashtag Ranking, Utilizing No-Hashtag Tweets
Zahra Majdabadi, Behnam Sabeti, Preni Golazizian, Seyed Arad Ashrafi Asli, Omid Momenzadeh and Reza Fahmi
pp. 6213‑6219
pdf bib A French Corpus for Event Detection on Twitter
Béatrice Mazoyer, Julia Cagé, Nicolas Hervé and Céline Hudelot
pp. 6220‑6227
pdf bib Minority Positive Sampling for Switching Points - an Anecdote for the Code-Mixing Language Modeling
Arindam Chatterjere, Vineeth Guptha, Parul Chopra and Amitava Das
pp. 6228‑6236
pdf bib Do You Really Want to Hurt Me? Predicting Abusive Swearing in Social Media
Endang Wahyu Pamungkas, Valerio Basile and Viviana Patti
pp. 6237‑6246
pdf bib Detecting Troll Tweets in a Bilingual Corpus
Lin Miao, Mark Last and Marina Litvak
pp. 6247‑6254
pdf bib Collecting Tweets to Investigate Regional Variation in Canadian English
Filip Miletic, Anne Przewozny-Desriaux and Ludovic Tanguy
pp. 6255‑6264
pdf bib DAICT: A Dialectal Arabic Irony Corpus Extracted from Twitter
Ines Abbes, Wajdi Zaghouani, Omaima El-Hardlo and Faten Ashour
pp. 6265‑6271
pdf bib Norm It! Lexical Normalization for Italian and Its Downstream Effects for Dependency Parsing
Rob van der Goot, Alan Ramponi, Tommaso Caselli, Michele Cafagna and Lorenzo De Mattei
pp. 6272‑6278
pdf bib TArC: Incrementally and Semi-Automatically Collecting a Tunisian Arabish Corpus
Elisa Gugliotta and Marco Dinarelli
pp. 6279‑6286
pdf bib Small Town or Metropolis? Analyzing the Relationship between Population Size and Language
Amy Rechkemmer, Steven Wilson and Rada Mihalcea
pp. 6287‑6291
pdf bib Inferring Social Media Users’ Mental Health Status from Multimodal Information
Zhentao Xu, Verónica Pérez-Rosas and Rada Mihalcea
pp. 6292‑6299
pdf bib Synthetic Data for English Lexical Normalization: How Close Can We Get to Manually Annotated Data?
Kelly Dekker and Rob van der Goot
pp. 6300‑6309
pdf bib A Corpus of German Reddit Exchanges (GeRedE)
Andreas Blombach, Natalie Dykes, Philipp Heinrich, Besim Kabashi and Thomas Proisl
pp. 6310‑6316
pdf bib French Tweet Corpus for Automatic Stance Detection
Marc Evrard, Rémi Uro, Nicolas Hervé and Béatrice Mazoyer
pp. 6317‑6322
pdf bib LSCP: Enhanced Large Scale Colloquial Persian Language Understanding
Hadi Abdi Khojasteh, Ebrahim Ansari and Mahdi Bohlouli
pp. 6323‑6327
pdf bib Burmese Speech Corpus, Finite-State Text Normalization and Pronunciation Grammars with an Application to Text-to-Speech
Yin May Oo, Theeraphol Wattanavekin, Chenfang Li, Pasindu De Silva, Supheakmungkol Sarin, Knot Pipatsrisawat, Martin Jansche, Oddur Kjartansson and Alexander Gutkin
pp. 6328‑6339
pdf bib Evaluating and Improving Child-Directed Automatic Speech Recognition
Eric Booth, Jake Carns, Casey Kennington and Nader Rafla
pp. 6340‑6345
pdf bib Parallel Corpus for Japanese Spoken-to-Written Style Conversion
Mana Ihori, Akihiko Takashima and Ryo Masumura
pp. 6346‑6353
pdf bib Multi-Staged Cross-Lingual Acoustic Model Adaption for Robust Speech Recognition in Real-World Applications - A Case Study on German Oral History Interviews
Michael Gref, Oliver Walter, Christoph Schmidt, Sven Behnke and Joachim Köhler
pp. 6354‑6362
pdf bib Large Corpus of Czech Parliament Plenary Hearings
Jonas Kratochvil, Peter Polak and Ondrej Bojar
pp. 6363‑6367
pdf bib Augmented Prompt Selection for Evaluation of Spontaneous Speech Synthesis
Eva Szekely, Jens Edlund and Joakim Gustafson
pp. 6368‑6374
pdf bib ATC-ANNO: Semantic Annotation for Air Traffic Control with Assistive Auto-Annotation
Marc Schulder, Johannah O’Mahony, Yury Bakanouski and Dietrich Klakow
pp. 6375‑6380
pdf bib MASRI-HEADSET: A Maltese Corpus for Speech Recognition
Carlos Daniel Hernandez Mena, Albert Gatt, Andrea DeMarco, Claudia Borg, Lonneke van der Plas, Amanda Muscat and Ian Padovani
pp. 6381‑6388
pdf bib Automatic Period Segmentation of Oral French
Natalia Kalashnikova, Loïc Grobol, Iris Eshkol-Taravella and François Delafontaine
pp. 6389‑6394
pdf bib Corpus Generation for Voice Command in Smart Home and the Effect of Speech Synthesis on End-to-End SLU
Thierry Desot, François Portet and Michel Vacher
pp. 6395‑6404
pdf bib Text and Speech-based Tunisian Arabic Sub-Dialects Identification
Najla Ben Abdallah, Saméh Kchaou and Fethi Bougares
pp. 6405‑6411
pdf bib Urdu Pitch Accents and Intonation Patterns in Spontaneous Conversational Speech
Luca Rognoni, Judith Bishop, Miriam Corris, Jessica Fernando and Rosanna Smith
pp. 6412‑6416
pdf bib IndicSpeech: Text-to-Speech Corpus for Indian Languages
Nimisha Srivastava, Rudrabha Mukhopadhyay, Prajwal K R and C V Jawahar
pp. 6417‑6422
pdf bib Using Automatic Speech Recognition in Spoken Corpus Curation
Jan Gorisch, Michael Gref and Thomas Schmidt
pp. 6423‑6428
pdf bib Integrating Disfluency-based and Prosodic Features with Acoustics in Automatic Fluency Evaluation of Spontaneous Speech
Huaijin Deng, Youchao Lin, Takehito Utsuro, Akio Kobayashi, Hiromitsu Nishizaki and Junichi Hoshino
pp. 6429‑6437
pdf bib DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus
Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura and Hiroshi Saruwatari
pp. 6438‑6443
pdf bib Automatic Speech Recognition for Uyghur through Multilingual Acoustic Modeling
Ayimunishagu Abulimiti and Tanja Schultz
pp. 6444‑6449
pdf bib The SAFE-T Corpus: A New Resource for Simulated Public Safety Communications
Dana Delgado, Kevin Walker, Stephanie Strassel, Karen Jones, Christopher Caruso and David Graff
pp. 6450‑6457
pdf bib Lexical Tone Recognition in Mizo using Acoustic-Prosodic Features
Parismita Gogoi, Abhishek Dey, Wendy Lalhminghlui, Priyankoo Sarmah and S R Mahadeva Prasanna
pp. 6458‑6461
pdf bib Artie Bias Corpus: An Open Dataset for Detecting Demographic Bias in Speech Applications
Josh Meyer, Lindy Rauchenstein, Joshua D. Eisenberg and Nicholas Howell
pp. 6462‑6468
pdf bib Evaluation of Off-the-shelf Speech Recognizers Across Diverse Dialogue Domains
Kallirroi Georgila, Anton Leuski, Volodymyr Yanov and David Traum
pp. 6469‑6476
pdf bib CEASR: A Corpus for Evaluating Automatic Speech Recognition
Malgorzata Anna Ulasik, Manuela Hürlimann, Fabian Germann, Esin Gedik, Fernando Benites and Mark Cieliebak
pp. 6477‑6485
pdf bib MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible
Marcely Zanon Boito, William Havard, Mahault Garnerin, Éric Le Ferrand and Laurent Besacier
pp. 6486‑6493
pdf bib Open-source Multi-speaker Speech Corpora for Building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu Speech Synthesis Systems
Fei He, Shan-Hui Cathy Chu, Oddur Kjartansson, Clara Rivera, Anna Katanova, Alexander Gutkin, Isin Demirsahin, Cibu Johny, Martin Jansche, Supheakmungkol Sarin and Knot Pipatsrisawat
pp. 6494‑6503
pdf bib Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech
Adriana Guevara-Rukoz, Isin Demirsahin, Fei He, Shan-Hui Cathy Chu, Supheakmungkol Sarin, Knot Pipatsrisawat, Alexander Gutkin, Alena Butryna and Oddur Kjartansson
pp. 6504‑6513
pdf bib A Manually Annotated Resource for the Investigation of Nasal Grunts
Aurélie Chlébowski and Nicolas Ballier
pp. 6514‑6522
pdf bib The Objective and Subjective Sleepiness Voice Corpora
Vincent P. Martin, Jean-Luc Rouas, Jean-Arthur Micoulaud Franchi and Pierre Philip
pp. 6523‑6531
pdf bib Open-source Multi-speaker Corpora of the English Accents in the British Isles
Isin Demirsahin, Oddur Kjartansson, Alexander Gutkin and Clara Rivera
pp. 6532‑6541
pdf bib TV-AfD: An Imperative-Annotated Corpus from The Big Bang Theory and Wikipedia’s Articles for Deletion Discussions
Yimin Xiao, Zong-Ying Slaton and Lu Xiao
pp. 6542‑6548
pdf bib A Large Scale Speech Sentiment Corpus
Eric Chen, Zhiyun Lu, Hao Xu, Liangliang Cao, Yu Zhang and James Fan
pp. 6549‑6555
pdf bib SibLing Corpus of Russian Dialogue Speech Designed for Research on Speech Entrainment
Tatiana Kachkovskaia, Tatiana Chukaeva, Vera Evdokimova, Pavel Kholiavin, Natalia Kriakina, Daniil Kocharov, Anna Mamushina, Alla Menshikova and Svetlana Zimina
pp. 6556‑6561
pdf bib PhonBank and Data Sharing: Recent Developments in European Portuguese
Ana Margarida Ramalho, Maria João Freitas and Yvan Rose
pp. 6562‑6570
pdf bib SMASH Corpus: A Spontaneous Speech Corpus Recording Third-person Audio Commentaries on Gameplay
Yuki Saito, Shinnosuke Takamichi and Hiroshi Saruwatari
pp. 6571‑6577
pdf bib Improving Speech Recognition for the Elderly: A New Corpus of Elderly Japanese Speech and Investigation of Acoustic Modeling for Speech Recognition
Meiko Fukuda, Hiromitsu Nishizaki, Yurie Iribe, Ryota Nishimura and Norihide Kitaoka
pp. 6578‑6585
pdf bib Preparation of Bangla Speech Corpus from Publicly Available Audio & Text
Shafayat Ahmed, Nafis Sadeq, Sudipta Saha Shubha, Md. Nahidul Islam, Muhammad Abdullah Adnan and Mohammad Zuberul Islam
pp. 6586‑6592
pdf bib On Construction of the ASR-oriented Indian English Pronunciation Dictionary
Xian Huang, Xin Jin, Qike Li and Keliang Zhang
pp. 6593‑6598
pdf bib Gender Representation in Open Source Speech Resources
Mahault Garnerin, Solange Rossato and Laurent Besacier
pp. 6599‑6605
pdf bib RSC: A Romanian Read Speech Corpus for Automatic Speech Recognition
Alexandru-Lucian Georgescu, Horia Cucu, Andi Buzo and Corneliu Burileanu
pp. 6606‑6612
pdf bib FAB: The French Absolute Beginner Corpus for Pronunciation Training
Sean Robertson, Cosmin Munteanu and Gerald Penn
pp. 6613‑6620
pdf bib Call My Net 2: A New Resource for Speaker Recognition
Karen Jones, Stephanie Strassel, Kevin Walker and Jonathan Wright
pp. 6621‑6626
pdf bib DaCToR: A Data Collection Tool for the RELATER Project
Juan Hussain, Oussama Zenkri, Sebastian Stüker and Alex Waibel
pp. 6627‑6632
pdf bib Development and Evaluation of Speech Synthesis Corpora for Latvian
Roberts Darģis, Peteris Paikens, Normunds Gruzitis, Ilze Auzina and Agate Akmane
pp. 6633‑6637
pdf bib Abstractive Document Summarization without Parallel Data
Nikola I. Nikolov and Richard Hahnloser
pp. 6638‑6644
pdf bib GameWikiSum: a Novel Large Multi-Document Summarization Dataset
Diego Antognini and Boi Faltings
pp. 6645‑6650
pdf bib Summarization Corpora of Wikipedia Articles
Dominik Frefel
pp. 6651‑6655
pdf bib Language Agnostic Automatic Summarization Evaluation
Christopher Tauchmann and Margot Mieskes
pp. 6656‑6662
pdf bib Two Huge Title and Keyword Generation Corpora of Research Articles
Erion Çano and Ondřej Bojar
pp. 6663‑6671
pdf bib A Multi-level Annotated Corpus of Scientific Papers for Scientific Document Summarization and Cross-document Relation Discovery
Ahmed AbuRa’ed, Horacio Saggion and Luis Chiruzzo
pp. 6672‑6679
pdf bib Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
Dmitrii Aksenov, Julian Moreno-Schneider, Peter Bourgonje, Robert Schwarzenberg, Leonhard Hennig and Georg Rehm
pp. 6680‑6689
pdf bib A Data Set for the Analysis of Text Quality Dimensions in Summarization Evaluation
Margot Mieskes, Eneldo Loza Mencía and Tim Kronsbein
pp. 6690‑6699
pdf bib Summarization Beyond News: The Automatically Acquired Fandom Corpora
Benjamin Hättasch, Nadja Geisler, Christian M. Meyer and Carsten Binnig
pp. 6700‑6708
pdf bib Invisible to People but not to Machines: Evaluation of Style-aware Headline Generation in Absence of Reliable Human Judgment
Lorenzo De Mattei, Michele Cafagna, Felice Dell’Orletta and Malvina Nissim
pp. 6709‑6717
pdf bib Align then Summarize: Automatic Alignment Methods for Summarization Corpus Creation
Paul Tardy, David Janiszek, Yannick Estève and Vincent Nguyen
pp. 6718‑6724
pdf bib A Summarization Dataset of Slovak News Articles
Marek Suppa and Jergus Adamec
pp. 6725‑6730
pdf bib DaNewsroom: A Large-scale Danish Summarisation Dataset
Daniel Varab and Natalie Schluter
pp. 6731‑6739
pdf bib Diverging Divergences: Examining Variants of Jensen Shannon Divergence for Corpus Comparison Tasks
Jinghui Lu, Maeve Henchion and Brian Mac Namee
pp. 6740‑6744
pdf bib TopicNet: Making Additive Regularisation for Topic Modelling Accessible
Victor Bulatov, Vasiliy Alekseev, Konstantin Vorontsov, Darya Polyudova, Eugenia Veselova, Alexey Goncharov and Evgeny Egorov
pp. 6745‑6752
pdf bib SC-CoMIcs: A Superconductivity Corpus for Materials Informatics
Kyosuke Yamaguchi, Ryoji Asahi and Yutaka Sasaki
pp. 6753‑6760
pdf bib GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors
Masato Hagiwara and Masato Mita
pp. 6761‑6768
pdf bib Annotation of Adverse Drug Reactions in Patients’ Weblogs
Yuki Arase, Tomoyuki Kajiwara and Chenhui Chu
pp. 6769‑6776
pdf bib Beyond Citations: Corpus-based Methods for Detecting the Impact of Research Outcomes on Society
Rezvaneh Rezapour, Jutta Bopp, Norman Fiedler, Diana Steffen, Andreas Witt and Jana Diesner
pp. 6777‑6785
pdf bib Toxic, Hateful, Offensive or Abusive? What Are We Really Classifying? An Empirical Analysis of Hate Speech Datasets
Paula Fortuna, Juan Soler and Leo Wanner
pp. 6786‑6794
pdf bib Unsupervised Argumentation Mining in Student Essays
Isaac Persing and Vincent Ng
pp. 6795‑6803
pdf bib Aspect-Based Sentiment Analysis as Fine-Grained Opinion Mining
Gerardo Ocampo Diaz, Xuanming Zhang and Vincent Ng
pp. 6804‑6811
pdf bib Predicting Item Survival for Multiple Choice Questions in a High-Stakes Medical Exam
Victoria Yaneva, Le An Ha, Peter Baldwin and Janet Mee
pp. 6812‑6818
pdf bib Discourse Component to Sentence (DC2S): An Efficient Human-Aided Construction of Paraphrase and Sentence Similarity Dataset
Won Ik Cho, Jong In Kim, Young Ki Moon and Nam Soo Kim
pp. 6819‑6826
pdf bib Japanese Realistic Textual Entailment Corpus
Yuta Hayashibe
pp. 6827‑6834
pdf bib Improving the Precision of Natural Textual Entailment Problem Datasets
Jean-Philippe Bernardy and Stergios Chatzikyriakidis
pp. 6835‑6840
pdf bib Comparative Study of Sentence Embeddings for Contextual Paraphrasing
Louisa Pragst, Wolfgang Minker and Stefan Ultes
pp. 6841‑6851
pdf bib HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference
Tianyu Liu, Zheng Xin, Baobao Chang and Zhifang Sui
pp. 6852‑6860
pdf bib SAPPHIRE: Simple Aligner for Phrasal Paraphrase with Hierarchical Representation
Masato Yoshinaka, Tomoyuki Kajiwara and Yuki Arase
pp. 6861‑6867
pdf bib TaPaCo: A Corpus of Sentential Paraphrases for 73 Languages
Yves Scherrer
pp. 6868‑6873
pdf bib Automated Fact-Checking of Claims from Wikipedia
Aalok Sathe, Salar Ather, Tuan Manh Le, Nathan Perry and Joonsuk Park
pp. 6874‑6882
pdf bib Towards the Necessity for Debiasing Natural Language Inference Datasets
Mithun Paul Panenghat, Sandeep Suntwal, Faiz Rafique, Rebecca Sharp and Mihai Surdeanu
pp. 6883‑6888
pdf bib A French Corpus for Semantic Similarity
Rémi Cardon and Natalia Grabar
pp. 6889‑6894
pdf bib Developing Dataset of Japanese Slot Filling Quizzes Designed for Evaluation of Machine Reading Comprehension
Takuto Watarai and Masatoshi Tsuchiya
pp. 6895‑6901
pdf bib Detecting Negation Cues and Scopes in Spanish
Salud María Jiménez-Zafra, Roser Morante, Eduardo Blanco, María Teresa Martín Valdivia and L. Alfonso Ureña López
pp. 6902‑6911
pdf bib TIARA: A Tool for Annotating Discourse Relations and Sentence Reordering
Jan Wira Gotama Putra, Simone Teufel, Kana Matsumura and Takenobu Tokunaga
pp. 6912‑6920
pdf bib Infrastructure for Semantic Annotation in the Genomics Domain
Mahmoud El-Haj, Nathan Rutherford, Matthew Coole, Ignatius Ezeani, Sheryl Prentice, Nancy Ide, Jo Knight, Scott Piao, John Mariani, Paul Rayson and Keith Suderman
pp. 6921‑6929
pdf bib Correcting the Autocorrect: Context-Aware Typographical Error Correction via Training Data Augmentation
Kshitij Shah and Gerard de Melo
pp. 6930‑6936
pdf bib KidSpell: A Child-Oriented, Rule-Based, Phonetic Spellchecker
Brody Downs, Oghenemaro Anuyah, Aprajita Shukla, Jerry Alan Fails, Sole Pera, Katherine Wright and Casey Kennington
pp. 6937‑6946
pdf bib ThaiLMCut: Unsupervised Pretraining for Thai Word Segmentation
Suteera Seeha, Ivan Bilan, Liliana Mamani Sanchez, Johannes Huber, Michael Matuschek and Hinrich Schütze
pp. 6947‑6957
pdf bib CCOHA: Clean Corpus of Historical American English
Reem Alatrash, Dominik Schlechtweg, Jonas Kuhn and Sabine Schulte im Walde
pp. 6958‑6966
pdf bib Outbound Translation User Interface Ptakopět: A Pilot Study
Vilém Zouhar and Ondřej Bojar
pp. 6967‑6975
pdf bib Seshat: a Tool for Managing and Verifying Annotation Campaigns of Audio Data
Hadrien Titeux, Rachid Riad, Xuan-Nga Cao, Nicolas Hamilakis, Kris Madden, Alejandrina Cristia, Anne-Catherine Bachoud-Lévi and Emmanuel Dupoux
pp. 6976‑6982
pdf bib Dragonfly: Advances in Non-Speaker Annotation for Low Resource Languages
Cash Costello, Shelby Anderson, Caitlyn Bishop, James Mayfield and Paul McNamee
pp. 6983‑6987
pdf bib Natural Language Processing Pipeline to Annotate Bulgarian Legislative Documents
Svetla Koeva, Nikola Obreshkov and Martin Yalamov
pp. 6988‑6994
pdf bib CLDFBench: Give Your Cross-Linguistic Data a Lift
Robert Forkel and Johann-Mattis List
pp. 6995‑7002
pdf bib KonText: Advanced and Flexible Corpus Query Interface
Tomáš Machálek
pp. 7003‑7008
pdf bib Word at a Glance: Modular Word Profile Aggregator
Tomáš Machálek
pp. 7009‑7014
pdf bib RKorAPClient: An R Package for Accessing the German Reference Corpus DeReKo via KorAP
Marc Kupietz, Nils Diewald and Eliza Margaretha
pp. 7015‑7021
pdf bib CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing
Ossama Obeid, Nasser Zalmout, Salam Khalifa, Dima Taji, Mai Oudah, Bashar Alhafni, Go Inoue, Fadhl Eryani, Alexander Erdmann and Nizar Habash
pp. 7022‑7032
pdf bib ReSiPC: a Tool for Complex Searches in Parallel Corpora
Antoni Oliver and Bojana Mikelenić
pp. 7033‑7037
pdf bib HitzalMed: Anonymisation of Clinical Text in Spanish
Salvador Lima Lopez, Naiara Perez, Laura García-Sardiña and Montse Cuadros
pp. 7038‑7043
pdf bib The xtsv Framework and the Twelve Virtues of Pipelines
Balázs Indig, Bálint Sass and Iván Mittelholcz
pp. 7044‑7052
pdf bib A Web-based Collaborative Annotation and Consolidation Tool
Tobias Daudert
pp. 7053‑7059
pdf bib Data Query Language and Corpus Tools for Slot-Filling and Intent Classification Data
Stefan Larson, Eric Guldan and Kevin Leach
pp. 7060‑7068
pdf bib SHR++: An Interface for Morpho-syntactic Annotation of Sanskrit Corpora
Amrith Krishna, Shiv Vidhyut, Dilpreet Chawla, Sruti Sambhavi and Pawan Goyal
pp. 7069‑7076
pdf bib KOTONOHA: A Corpus Concordance System for Skewer-Searching NINJAL Corpora
Teruaki Oka, Yuichi Ishimoto, Yutaka Yagi, Takenori Nakamura, Masayuki Asahara, Kikuo Maekawa, Toshinobu Ogiso, Hanae Koiso, Kumiko Sakoda and Nobuko Kibe
pp. 7077‑7083
pdf bib Gamification Platform for Collecting Task-oriented Dialogue Data
Haruna Ogawa, Hitoshi Nishikawa, Takenobu Tokunaga and Hikaru Yokono
pp. 7084‑7093
pdf bib Improving the Production Efficiency and Well-formedness of Automatically-Generated Multiple-Choice Cloze Vocabulary Questions
Ralph Rose
pp. 7094‑7101
pdf bib Improving Sentence Boundary Detection for Spoken Language Transcripts
Ines Rehbein, Josef Ruppenhofer and Thomas Schmidt
pp. 7102‑7111
pdf bib MorphAGram, Evaluation and Framework for Unsupervised Morphological Segmentation
Ramy Eskander, Francesca Callejas, Elizabeth Nichols, Judith Klavans and Smaranda Muresan
pp. 7112‑7122
pdf bib CTAP for Italian: Integrating Components for the Analysis of Italian into a Multilingual Linguistic Complexity Analysis Tool
Nadezda Okinina, Jennifer-Carmen Frey and Zarah Weiss
pp. 7123‑7131
pdf bib Do you Feel Certain about your Annotation? A Web-based Semantic Frame Annotation Tool Considering Annotators’ Concerns and Behaviors
Regina Stodden, Behrang QasemiZadeh and Laura Kallmeyer
pp. 7132‑7139
pdf bib Seq2SeqPy: A Lightweight and Customizable Toolkit for Neural Sequence-to-Sequence Modeling
Raheel Qader, François Portet and Cyril Labbe
pp. 7140‑7144
pdf bib Profiling-UD: a Tool for Linguistic Profiling of Texts
Dominique Brunato, Andrea Cimino, Felice Dell’Orletta, Giulia Venturi and Simonetta Montemagni
pp. 7145‑7151
pdf bib EstNLTK 1.6: Remastered Estonian NLP Pipeline
Sven Laur, Siim Orasmaa, Dage Särg and Paul Tammo
pp. 7152‑7160
pdf bib A Tree Extension for CoNLL-RDF
Christian Chiarcos and Luis Glaser
pp. 7161‑7169
pdf bib Lemmatising Verbs in Middle English Corpora: The Benefit of Enriching the Penn-Helsinki Parsed Corpus of Middle English 2 (PPCME2), the Parsed Corpus of Middle English Poetry (PCMEP), and A Parsed Linguistic Atlas of Early Middle English (PLAEME)
Carola Trips and Michael Percillier
pp. 7170‑7178
pdf bib CoCo: A Tool for Automatically Assessing Conceptual Complexity of Texts
Sanja Stajner, Sergiu Nisioi and Ioana Hulpuș
pp. 7179‑7186
pdf bib PyVallex: A Processing System for Valency Lexicon Data
Jonathan Verner and Anna Vernerová
pp. 7187‑7193
pdf bib Editing OntoLex-Lemon in VocBench 3
Manuel Fiorelli, Armando Stellato, Tiziano Lorenzetti, Andrea Turbati, Peter Schmitz, Enrico Francesconi, Najeh Hajlaoui and Brahim Batouche
pp. 7194‑7203
pdf bib MALT-IT2: A New Resource to Measure Text Difficulty in Light of CEFR Levels for Italian L2 Learning
Luciana Forti, Giuliana Grego Bolli, Filippo Santarelli, Valentino Santucci and Stefania Spina
pp. 7204‑7211
pdf bib Fintan - Flexible, Integrated Transformation and Annotation eNgineering
Christian Fäth, Christian Chiarcos, Björn Ebbrecht and Maxim Ionov
pp. 7212‑7221
pdf bib Contemplata, a Free Platform for Constituency Treebank Annotation
Jakub Waszczuk, Ilaine Wang, Jean-Yves Antoine and Anaïs Halftermeyer
pp. 7222‑7229
pdf bib Interchange Formats for Visualization: LIF and MMIF
Kyeongmin Rim, Kelley Lynch, Marc Verhagen, Nancy Ide and James Pustejovsky
pp. 7230‑7237
pdf bib Developing NLP Tools with a New Corpus of Learner Spanish
Sam Davidson, Aaron Yamada, Paloma Fernandez Mira, Agustina Carando, Claudia H. Sanchez Gutierrez and Kenji Sagae
pp. 7238‑7243
pdf bib DeepNLPF: A Framework for Integrating Third Party NLP Tools
Francisco Rodrigues, Rinaldo Lima, William Domingues, Robson Fidalgo, Adrian Chifu, Bernard Espinasse and Sébastien Fournier
pp. 7244‑7251

Last modified on May 19, 2020, 7:54 p.m.