Proceedings of The 13th Language Resources and Evaluation Conference

pdf	bib	slides	video	Papers	pages
pdf	bib	slides	video	Domain Adaptation in Neural Machine Translation using a Qualia-Enriched FrameNet Alexandre Diniz da Costa, Mateus Coutinho Marim, Ely Matos and Tiago Timponi Torrent	pp. 1‑12
pdf	bib	slides	video	HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professional Post-Editing Towards More Effective MT Evaluation Serge Gladkoff and Lifeng Han	pp. 13‑21
pdf	bib	slides	video	Priming Ancient Korean Neural Machine Translation Chanjun Park, Seolhwa Lee, Jaehyung Seo, Hyeonseok Moon, Sugyeong Eo and Heuiseok Lim	pp. 22‑28
pdf	bib	slides	video	GECO-MT: The Ghent Eye-tracking Corpus of Machine Translation Toon Colman, Margot Fonteyne, Joke Daems, Nicolas Dirix and Lieve Macken	pp. 29‑38
pdf	bib	slides	video	Introducing Frege to Fillmore: A FrameNet Dataset that Captures both Sense and Reference Levi Remijnse, Piek Vossen, Antske Fokkens and Sam Titarsolej	pp. 39‑50
pdf	bib	slides	video	Compiling a Suitable Level of Sense Granularity in a Lexicon for AI Purposes: The Open Source COR Lexicon Bolette Pedersen, Nathalie Carmen Hau Sørensen, Sanni Nimb, Ida Flørke, Sussi Olsen and Thomas Troelsgård	pp. 51‑60
pdf	bib	slides	video	Sense and Sentiment Francis Bond and Merrick Choo	pp. 61‑69
pdf	bib	slides	video	Enriching Linguistic Representation in the Cantonese Wordnet and Building the New Cantonese Wordnet Corpus Ut Seong Sio and Luís Morgado da Costa	pp. 70‑78
pdf	bib	slides	video	ZAEBUC: An Annotated Arabic-English Bilingual Writer Corpus Nizar Habash and David Palfreyman	pp. 79‑88
pdf	bib	slides	video	Turkish Universal Conceptual Cognitive Annotation Necva Bölücü and Burcu Can	pp. 89‑99
pdf	bib		video	Introducing the CURLICAT Corpora: Seven-language Domain Specific Annotated Corpora from Curated Sources Tamás Váradi, Bence Nyéki, Svetla Koeva, Marko Tadić, Vanja Štefanec, Maciej Ogrodniczuk, Bartłomiej Nitoń, Piotr Pęzik, Verginica Barbu Mititelu, Elena Irimia, Maria Mitrofan, Dan Tufiș, Radovan Garabík, Simon Krek and Andraž Repar	pp. 100‑108
pdf	bib	slides	video	RU-ADEPT: Russian Anonymized Dataset with Eight Personality Traits C. Anton Rytting, Valerie Novak, James R. Hull, Victor M. Frank, Paul Rodrigues, Jarrett G. W. Lee and Laurel Miller-Sims	pp. 109‑118
pdf	bib		video	CoQAR: Question Rewriting on CoQA Quentin Brabant, Gwénolé Lecorvé and Lina M. Rojas Barahona	pp. 119‑126
pdf	bib		video	User Interest Modelling in Argumentative Dialogue Systems Annalena Aicher, Nadine Gerstenlauer, Wolfgang Minker and Stefan Ultes	pp. 127‑136
pdf	bib		video	Every time I fire a conversational designer, the performance of the dialogue system goes down Giancarlo Xompero, Michele Mastromattei, Samir Salman, Cristina Giannone, Andrea Favalli, Raniero Romagnoli and Fabio Massimo Zanzotto	pp. 137‑145
pdf	bib	slides	video	An Empirical Study on the Overlapping Problem of Open-Domain Dialogue Datasets Yuqiao Wen, Guoqing Luo and Lili Mou	pp. 146‑153
pdf	bib		video	Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project Federica Gamba, Francesca Frontini, Daan Broeder and Monica Monachini	pp. 154‑163
pdf	bib		video	How to be FAIR when you CARE: The DGS Corpus as a Case Study of Open Science Resources for Minority Languages Marc Schulder and Thomas Hanke	pp. 164‑173
pdf	bib	poster	video	Italian NLP for Everyone: Resources and Models from EVALITA to the European Language Grid Valerio Basile, Cristina Bosco, Michael Fell, Viviana Patti and Rossella Varvara	pp. 174‑180
pdf	bib	poster	video	Cross-Lingual Link Discovery for Under-Resourced Languages Michael Rosner, Sina Ahmadi, Elena-Simona Apostol, Julia Bosque-Gil, Christian Chiarcos, Milan Dojchinovski, Katerina Gkirtzou, Jorge Gracia, Dagmar Gromann, Chaya Liebeskind, Giedrė Valūnaitė Oleškevičienė, Gilles Sérasset and Ciprian-Octavian Truică	pp. 181‑192
pdf	bib		video	Angry or Sad ? Emotion Annotation for Extremist Content Characterisation Valentina Dragos, Delphine Battistelli, Aline Etienne and Yolène Constable	pp. 193‑201
pdf	bib		video	Identification of Multiword Expressions in Tweets for Hate Speech Detection Nicolas Zampieri, Carlos Ramisch, Irina Illina and Dominique Fohr	pp. 202‑210
pdf	bib	poster	video	Causal Investigation of Public Opinion during the COVID-19 Pandemic via Social Media Text Michael Jantscher and Roman Kern	pp. 211‑226
pdf	bib	poster	video	Misspelling Semantics in Thai Pakawat Nakwijit and Matthew Purver	pp. 227‑236
pdf	bib		video	Automatic Detection of Stigmatizing Uses of Psychiatric Terms on Twitter Véronique Moriceau, Farah Benamara and Abdelmoumene Boumadane	pp. 237‑243
pdf	bib	poster	video	CoVERT: A Corpus of Fact-checked Biomedical COVID-19 Tweets Isabelle Mohr, Amelie Wührl and Roman Klinger	pp. 244‑257
pdf	bib	poster	video	XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond Francesco Barbieri, Luis Espinosa Anke and Jose Camacho-Collados	pp. 258‑266
pdf	bib	poster	video	‘Am I the Bad One’? Predicting the Moral Judgement of the Crowd Using Pre–trained Language Models Areej Alhassan, Jinkai Zhang and Viktor Schlegel	pp. 267‑276
pdf	bib	poster	video	Generating Questions from Wikidata Triples Kelvin Han, Thiago Castro Ferreira and Claire Gardent	pp. 277‑290
pdf	bib	poster	video	Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition Matteo Muffo, Aldo Cocco and Enrico Bertino	pp. 291‑297
pdf	bib		video	Evaluating the Effects of Embedding with Speaker Identity Information in Dialogue Summarization Yuji Naraki, Tetsuya Sakai and Yoshihiko Hayashi	pp. 298‑304
pdf	bib	poster	video	Perceived Text Quality and Readability in Extractive and Abstractive Summaries Julius Monsen and Evelina Rennes	pp. 305‑312
pdf	bib	poster	video	Learning to Prioritize: Precision-Driven Sentence Filtering for Long Text Summarization Alex Mei, Anisha Kabir, Rukmini Bapat, John Judge, Tony Sun and William Yang Wang	pp. 313‑318
pdf	bib	poster	video	Automating Horizon Scanning in Future Studies Tatsuya Ishigaki, Suzuko Nishino, Sohei Washino, Hiroki Igarashi, Yukari Nagai, Yuichi Washida and Akihiko Murai	pp. 319‑327
pdf	bib		video	ViHealthBERT: Pre-trained Language Models for Vietnamese in Health Text Mining Nguyen Minh, Vu Hoang Tran, Vu Hoang, Huy Duc Ta, Trung Huu Bui and Steven Quoc Hung Truong	pp. 328‑337
pdf	bib	poster	video	Privacy-Preserving Graph Convolutional Networks for Text Classification Timour Igamberdiev and Ivan Habernal	pp. 338‑350
pdf	bib	poster	video	ArMATH: a Dataset for Solving Arabic Math Word Problems Reem Alghamdi, Zhenwen Liang and Xiangliang Zhang	pp. 351‑362
pdf	bib	poster	video	KIMERA: Injecting Domain Knowledge into Vacant Transformer Heads Benjamin Winter, Alexei Figueroa Rosero, Alexander Löser, Felix Alexander Gers and Amy Siu	pp. 363‑373
pdf	bib	poster	video	Distilling the Knowledge of Romanian BERTs Using Multiple Teachers Andrei-Marius Avram, Darius Catrina, Dumitru-Clementin Cercel, Mihai Dascalu, Traian Rebedea, Vasile Pais and Dan Tufis	pp. 374‑384
pdf	bib	poster	video	Personalized Filled-pause Generation with Group-wise Prediction Models Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi and Hiroshi Saruwatari	pp. 385‑392
pdf	bib	poster	video	Transformer versus LSTM Language Models trained on Uncertain ASR Hypotheses in Limited Data Scenarios Imran Sheikh, Emmanuel Vincent and Irina Illina	pp. 393‑399
pdf	bib	poster	video	Out of Thin Air: Is Zero-Shot Cross-Lingual Keyword Detection Better Than Unsupervised? Boshko Koloski, Senja Pollak, Blaž Škrlj and Matej Martinc	pp. 400‑409
pdf	bib	poster	video	Evaluating Pretraining Strategies for Clinical BERT Models Anastasios Lamproudis, Aron Henriksson and Hercules Dalianis	pp. 410‑416
pdf	bib	poster	video	KazNERD: Kazakh Named Entity Recognition Dataset Rustem Yeshpanov, Yerbolat Khassanov and Huseyin Atakan Varol	pp. 417‑426
pdf	bib	poster	video	Mitigating Dataset Artifacts in Natural Language Inference Through Automatic Contextual Data Augmentation and Learning Optimization Michail Mersinias and Panagiotis Valvis	pp. 427‑435
pdf	bib	poster	video	Kompetencer: Fine-grained Skill Classification in Danish Job Postings via Distant Supervision and Transfer Learning Mike Zhang, Kristian Nørgaard Jensen and Barbara Plank	pp. 436‑447
pdf	bib	poster	video	Semantic Role Labelling for Dutch Law Texts Roos Bakker, Romy A.N. van Drie, Maaike de Boer, Robert van Doesburg and Tom van Engers	pp. 448‑457
pdf	bib	poster	video	English Language Spelling Correction as an Information Retrieval Task Using Wikipedia Search Statistics Kyle Goslin and Markus Hofmann	pp. 458‑464
pdf	bib	poster	video	CrudeOilNews: An Annotated Crude Oil News Corpus for Event Extraction Meisin Lee, Lay-Ki Soon, Eu Gene Siew and Ly Fie Sugianto	pp. 465‑479
pdf	bib	poster	video	Claim Extraction and Law Matching for COVID-19-related Legislation Niklas Dehio, Malte Ostendorff and Georg Rehm	pp. 480‑490
pdf	bib	poster	video	Constructing A Dataset of Support and Attack Relations in Legal Arguments in Court Judgements using Linguistic Rules Basit Ali, Sachin Pawar, Girish Palshikar and Rituraj Singh	pp. 491‑500
pdf	bib	poster	video	KIND: an Italian Multi-Domain Dataset for Named Entity Recognition Teresa Paccosi and Alessio Palmero Aprosio	pp. 501‑507
pdf	bib	poster	video	Russian Jeopardy! Data Set for Question-Answering Systems Elena Mikhalkova and Alexander A. Khlyupin	pp. 508‑514
pdf	bib	poster	video	Know Better – A Clickbait Resolving Challenge Benjamin Hättasch and Carsten Binnig	pp. 515‑523
pdf	bib		video	Valet: Rule-Based Information Extraction for Rapid Deployment Dayne Freitag, John Cadigan, Robert Sasseen and Paul Kalmar	pp. 524‑533
pdf	bib		video	Negation Detection in Dutch Spoken Human-Computer Conversations Tom Sweers, Iris Hendrickx and Helmer Strik	pp. 534‑542
pdf	bib	slides	video	Reflections on 30 Years of Language Resource Development and Sharing Christopher Cieri, Mark Liberman, Sunghye Cho, Stephanie Strassel, James Fiumara and Jonathan Wright	pp. 543‑550
pdf	bib	slides	video	Language Resources to Support Language Diversity – the ELRA Achievements Valérie Mapelli, Victoria Arranz, Khalid Choukri and Hélène Mazo	pp. 551‑558
pdf	bib	slides	video	Ethical Issues in Language Resources and Language Technology – Tentative Categorisation Pawel Kamocki and Andreas Witt	pp. 559‑563
pdf	bib	slides	video	Do we Name the Languages we Study? The #BenderRule in LREC and ACL articles Fanny Ducel, Karën Fort, Gaël Lejeune and Yves Lepage	pp. 564‑573
pdf	bib		video	Aspect-Based Emotion Analysis and Multimodal Coreference: A Case Study of Customer Comments on Adidas Instagram Posts Luna De Bruyne, Akbar Karimi, Orphee De Clercq, Andrea Prati and Veronique Hoste	pp. 574‑580
pdf	bib	slides	video	Multi-source Multi-domain Sentiment Analysis with BERT-based Models Gabriel Roccabruna, Steve Azzolin and Giuseppe Riccardi	pp. 581‑589
pdf	bib	slides	video	NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis Shamsuddeen Hassan Muhammad, David Adelani, Anuoluwapo Aremu and Idris Abdulmumin	pp. 590‑602
pdf	bib	slides	video	A (Psycho-)Linguistically Motivated Scheme for Annotating and Exploring Emotions in a Genre-Diverse Corpus Aline Etienne, Delphine Battistelli and Gwénolé Lecorvé	pp. 603‑612
pdf	bib	slides	video	Integrating a Phrase Structure Corpus Grammar and a Lexical-Semantic Network: the HOLINET Knowledge Graph Jean-Philippe Prost	pp. 613‑622
pdf	bib	slides	video	On the Impact of Temporal Representations on Metaphor Detection Giorgio Ottolina, Matteo Luigi Palmonari, Manuel Vimercati and Mehwish Alam	pp. 623‑632
pdf	bib	slides	video	Analysis and Prediction of NLP Models via Task Embeddings Damien Sileo and Marie-Francine Moens	pp. 633‑647
pdf	bib	slides	video	Cross-lingual and Cross-domain Transfer Learning for Automatic Term Extraction from Low Resource Data Amir Hazem, Merieme Bouhandi, Florian Boudin and Beatrice Daille	pp. 648‑662
pdf	bib	slides	video	Few-Shot Learning for Argument Aspects of the Nuclear Energy Debate Lena Jurkschat, Gregor Wiedemann, Maximilian Heinrich, Mattes Ruckdeschel and Sunna Torge	pp. 663‑672
pdf	bib	slides	video	MuLVE, A Multi-Language Vocabulary Evaluation Data Set Anik Jacobsen, Salar Mohtaj and Sebastian Möller	pp. 673‑679
pdf	bib	slides	video	PLOD: An Abbreviation Detection Dataset for Scientific Documents Leonardo Zilio, Hadeel Saadany, Prashant Sharma, Diptesh Kanojia and Constantin Orăsan	pp. 680‑688
pdf	bib	poster	video	Potential Idiomatic Expression (PIE)-English: Corpus for Classes of Idioms Tosin Adewumi, Roshanak Vadoodi, Aparajita Tripathy, Konstantina Nikolaido, Foteini Liwicki and Marcus Liwicki	pp. 689‑696
pdf	bib	poster	video	LeSpell - A Multi-Lingual Benchmark Corpus of Spelling Errors to Develop Spellchecking Methods for Learner Language Marie Bexte, Ronja Laarmann-Quante, Andrea Horbach and Torsten Zesch	pp. 697‑706
pdf	bib	poster	video	Subjective Text Complexity Assessment for German Laura Seiffe, Fares Kallel, Sebastian Möller, Babak Naderi and Roland Roller	pp. 707‑714
pdf	bib	poster	video	Querying Interaction Structure: Approaches to Overlap in Spoken Language Corpora Elena Frick, Thomas Schmidt and Henrike Helmer	pp. 715‑722
pdf	bib	poster	video	DiaBiz – an Annotated Corpus of Polish Call Center Dialogs Piotr Pęzik, Gosia Krawentek, Sylwia Karasińska, Paweł Wilk, Paulina Rybińska, Anna Cichosz, Angelika Peljak-Łapińska, Mikołaj Deckert and Michał Adamczyk	pp. 723‑726
pdf	bib	poster	video	LaVA – Latvian Language Learner corpus Roberts Darģis, Ilze Auziņa, Inga Kaija, Kristīne Levāne-Petrova and Kristīne Pokratniece	pp. 727‑731
pdf	bib	poster	video	The EuroPat Corpus: A Parallel Corpus of European Patent Data Kenneth Heafield, Elaine Farrow, Jelmer van der Linde, Gema Ramírez-Sánchez and Dion Wiggins	pp. 732‑740
pdf	bib	poster	video	"Beste Grüße, Maria Meyer" — Pseudonymization of Privacy-Sensitive Information in Emails Elisabeth Eder, Michael Wiegand, Ulrike Krieg-Holz and Udo Hahn	pp. 741‑752
pdf	bib		video	Criteria for the Annotation of Implicit Stereotypes Wolfgang Schmeisser-Nieto, Montserrat Nofre and Mariona Taulé	pp. 753‑762
pdf	bib	poster	video	Common Phone: A Multilingual Dataset for Robust Acoustic Modelling Philipp Klumpp, Tomas Arias, Paula Andrea Pérez-Toro, Elmar Noeth and Juan Orozco-Arroyave	pp. 763‑768
pdf	bib	poster	video	Curras + Baladi: Towards a Levantine Corpus Karim Al-Haff, Mustafa Jarrar, Tymaa Hammouda and Fadi Zaraket	pp. 769‑778
pdf	bib	poster	video	Annotation Study of Japanese Judgments on Tort for Legal Judgment Prediction with Rationales Hiroaki Yamada, Takenobu Tokunaga, Ryutaro Ohara, Keisuke Takeshita and Mihoko Sumida	pp. 779‑790
pdf	bib	poster	video	Placing M-Phasis on the Plurality of Hate: A Feature-Based Corpus of Hate Online Dana Ruiter, Liane Reiners, Ashwin Geet D’Sa, Thomas Kleinbauer, Dominique Fohr, Irina Illina, Dietrich Klakow, Christian Schemer and Angeliki Monnier	pp. 791‑804
pdf	bib	poster	video	ParCorFull2.0: a Parallel Corpus Annotated with Full Coreference Ekaterina Lapshinova-Koltunski, Pedro Augusto Ferreira, Elina Lartaud and Christian Hardmeier	pp. 805‑813
pdf	bib	poster	video	A Multi-Party Dialogue Ressource in French Maria Boritchev and Maxime Amblard	pp. 814‑823
pdf	bib		video	Bicleaner AI: Bicleaner Goes Neural Jaume Zaragoza-Bernabeu, Gema Ramírez-Sánchez, Marta Bañón and Sergio Ortiz Rojas	pp. 824‑831
pdf	bib	poster	video	Semi-automatically Annotated Learner Corpus for Russian Anisia Katinskaia, Maria Lebedeva, Jue Hou and Roman Yangarber	pp. 832‑839
pdf	bib		video	UniMorph 4.0: Universal Morphology Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Abbott Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud’hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty and Ekaterina Vylomova	pp. 840‑855
pdf	bib	poster	video	Textinator: an Internationalized Tool for Annotation and Human Evaluation in Natural Language Processing and Generation Dmytro Kalpakchi and Johan Boye	pp. 856‑866
pdf	bib	poster	video	CyberAgressionAdo-v1: a Dataset of Annotated Online Aggressions in French Collected through a Role-playing Game Anaïs Ollagnier, Elena Cabrio, Serena Villata and Catherine Blaya	pp. 867‑875
pdf	bib	poster	video	Finnish Hate-Speech Detection on Social Media Using CNN and FinBERT Md Saroar Jahan, Mourad Oussalah and Nabil Arhab	pp. 876‑882
pdf	bib	poster	video	Empirical Analysis of Noising Scheme based Synthetic Data Generation for Automatic Post-editing Hyeonseok Moon, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Jungseob Lee, Sugyeong Eo and Heuiseok Lim	pp. 883‑891
pdf	bib	poster	video	Domain Mismatch Doesn’t Always Prevent Cross-lingual Transfer Learning Daniel Edmiston, Phillip Keung and Noah A. Smith	pp. 892‑899
pdf	bib	poster	video	Cross-Lingual Knowledge Transfer for Clinical Phenotyping Jens-Michalis Papaioannou, Paul Grundmann, Betty van Aken, Athanasios Samaras, Ilias Kyparissidis, George Giannakoulas, Felix Gers and Alexander Loeser	pp. 900‑909
pdf	bib		video	The Multilingual Microblog Translation Corpus: Improving and Evaluating Translation of User-Generated Text Paul McNamee and Kevin Duh	pp. 910‑918
pdf	bib		video	Multilingual and Multimodal Learning for Brazilian Portuguese Júlia Sato, Helena Caseli and Lucia Specia	pp. 919‑927
pdf	bib	poster	video	LibriS2S: A German-English Speech-to-Speech Translation Corpus Pedro Jeuris and Jan Niehues	pp. 928‑935
pdf	bib	poster	video	A Linguistically Motivated Test Suite to Semi-Automatically Evaluate German–English Machine Translation Output Vivien Macketanz, Eleftherios Avramidis, Aljoscha Burchardt, He Wang, Renlong Ai, Shushen Manakhimova, Ursula Strohriegel, Sebastian Möller and Hans Uszkoreit	pp. 936‑947
pdf	bib	poster	video	Cross-lingual Transfer of Monolingual Models Evangelia Gogoulou, Ariel Ekgren, Tim Isbister and Magnus Sahlgren	pp. 948‑955
pdf	bib	poster	video	Dataset of Student Solutions to Algorithm and Data Structure Programming Assignments Fynn Petersen-Frey, Marcus Soll, Louis Kobras, Melf Johannsen, Peter Kling and Chris Biemann	pp. 956‑962
pdf	bib	poster	video	Language Patterns and Behaviour of the Peer Supporters in Multilingual Healthcare Conversational Forums Ishani Mondal, Kalika Bali, Mohit Jain, Monojit Choudhury, Jacki O’Neill, Millicent Ochieng, Kagnoya Awori and Keshet Ronen	pp. 963‑975
pdf	bib	poster	video	Frame Shift Prediction Zheng Xin Yong, Patrick D. Watson, Tiago Timponi Torrent, Oliver Czulo and Collin Baker	pp. 976‑986
pdf	bib	poster	video	CLeLfPC: a Large Open Multi-Speaker Corpus of French Cued Speech Brigitte Bigi, Maryvonne Zimmermann and Carine André	pp. 987‑994
pdf	bib	poster	video	Samrómur Children: An Icelandic Speech Corpus Carlos Daniel Hernandez Mena, David Erik Mollberg, Michal Borský and Jón Guðnason	pp. 995‑1002
pdf	bib	poster	video	The Norwegian Parliamentary Speech Corpus Per Erik Solberg and Pablo Ortiz	pp. 1003‑1008
pdf	bib		video	A Speech Recognizer for Frisian/Dutch Council Meetings Martijn Bentum, Louis ten Bosch, Henk van den Heuvel, Simone Wills, Domenique van der Niet, Jelske Dijkstra and Hans Van de Velde	pp. 1009‑1015
pdf	bib		video	Elderly Conversational Speech Corpus with Cognitive Impairment Test and Pilot Dementia Detection Experiment Using Acoustic Characteristics of Speech in Japanese Dialects Meiko Fukuda, Ryota Nishimura, Maina Umezawa, Kazumasa Yamamoto, Yurie Iribe and Norihide Kitaoka	pp. 1016‑1022
pdf	bib	poster	video	A Spoken Drug Prescription Dataset in French for Spoken Language Understanding Ali Can Kocabiyikoglu, François Portet, Prudence Gibert, Hervé Blanchon, Jean-Marc Babouchkine and Gaëtan Gavazzi	pp. 1023‑1031
pdf	bib	poster	video	Towards an Open-Source Dutch Speech Recognition System for the Healthcare Domain Cristian Tejedor-García, Berrie van der Molen, Henk van den Heuvel, Arjan van Hessen and Toine Pieters	pp. 1032‑1039
pdf	bib	poster	video	A Dataset for Speech Emotion Recognition in Greek Theatrical Plays Maria Moutti, Sofia Eleftheriou, Panagiotis Koromilas and Theodoros Giannakopoulos	pp. 1040‑1046
pdf	bib	poster	video	Audiobook Dialogues as Training Data for Conversational Style Synthetic Voices Liisi Piits, Hille Pajupuu, Heete Sahkai, Rene Altrov, Liis Ermus, Kairi Tamuri, Indrek Hein, Meelis Mihkla, Indrek Kiissel, Egert Männisalu, Kristjan Suluste and Jaan Pajupuu	pp. 1047‑1053
pdf	bib		video	Using a Knowledge Base to Automatically Annotate Speech Corpora and to Identify Sociolinguistic Variation Yaru Wu, Fabian Suchanek, Ioana Vasilescu, Lori Lamel and Martine Adda-Decker	pp. 1054‑1060
pdf	bib		video	Phone Inventories and Recognition for Every Language Xinjian Li, Florian Metze, David R. Mortensen, Alan W Black and Shinji Watanabe	pp. 1061‑1067
pdf	bib	slides	video	Constructing Parallel Corpora from COVID-19 News using MediSys Metadata Dimitrios Roussis, Vassilis Papavassiliou, Sokratis Sofianopoulos, Prokopis Prokopidis and Stelios Piperidis	pp. 1068‑1072
pdf	bib	slides	video	A Distant Supervision Corpus for Extracting Biomedical Relationships Between Chemicals, Diseases and Genes Dongxu Zhang, Sunil Mohan, Michaela Torkar and Andrew McCallum	pp. 1073‑1082
pdf	bib	slides	video	DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries Jayetri Bardhan, Anthony Colas, Kirk Roberts and Daisy Zhe Wang	pp. 1083‑1097
pdf	bib	slides	video	Efficiently and Thoroughly Anonymizing a Transformer Language Model for Dutch Electronic Health Records: a Two-Step Method Stella Verkijk and Piek Vossen	pp. 1098‑1103
pdf	bib	slides	video	BERTrade: Using Contextual Embeddings to Parse Old French Loïc Grobol, Mathilde Regnault, Pedro Ortiz Suarez, Benoît Sagot, Laurent Romary and Benoit Crabbé	pp. 1104‑1113
pdf	bib	slides	video	Out-of-Domain Evaluation of Finnish Dependency Parsing Jenna Kanerva and Filip Ginter	pp. 1114‑1124
pdf	bib	slides	video	TArC: Tunisian Arabish Corpus, First complete release Elisa Gugliotta and Marco Dinarelli	pp. 1125‑1136
pdf	bib	slides	video	Towards Universal Segmentations: UniSegments 1.0 Zdeněk Žabokrtský, Niyati Bafna, Jan Bodnár, Lukáš Kyjánek, Emil Svoboda, Magda Ševčíková and Jonáš Vidra	pp. 1137‑1149
pdf	bib	slides	video	TeDDi Sample: Text Data Diversity Sample for Language Comparison and Multilingual NLP Steven Moran, Christian Bentz, Ximena Gutierrez-Vasques, Olga Sozinova and Tanja Samardzic	pp. 1150‑1158
pdf	bib		video	Leveraging a Bilingual Dictionary to Learn Wolastoqey Word Representations Diego Bear and Paul Cook	pp. 1159‑1166
pdf	bib	slides	video	Unmasking the Myth of Effortless Big Data - Making an Open Source Multi-lingual Infrastructure and Building Language Resources from Scratch Linda Wiechetek, Katri Hiovain-Asikainen, Inga Lill Sigga Mikkelsen, Sjur Moshagen, Flammie Pirinen, Trond Trosterud and Børre Gaup	pp. 1167‑1177
pdf	bib		video	Building and curating conversational corpora for diversity-aware language science and technology Andreas Liesenfeld and Mark Dingemanse	pp. 1178‑1192
pdf	bib	slides	video	EPIC UdS - Creation and Applications of a Simultaneous Interpreting Corpus Heike Przybyl, Ekaterina Lapshinova-Koltunski, Katrin Menzel, Stefan Fischer and Elke Teich	pp. 1193‑1200
pdf	bib	slides	video	Development of a Benchmark Corpus to Support Entity Recognition in Job Descriptions Thomas Green, Diana Maynard and Chenghua Lin	pp. 1201‑1208
pdf	bib	slides	video	CAMIO: A Corpus for OCR in Multiple Languages Michael Arrigo, Stephanie Strassel, Nolan King, Thao Tran and Lisa Mason	pp. 1209‑1216
pdf	bib		video	FABRA: French Aggregator-Based Readability Assessment toolkit Rodrigo Wilkens, David Alfter, Xiaoou Wang, Alice Pintard, Anaïs Tack, Kevin P. Yancey and Thomas François	pp. 1217‑1233
pdf	bib		video	Towards Building a Spoken Dialogue System for Argument Exploration Annalena Aicher, Nadine Gerstenlauer, Isabel Feustel, Wolfgang Minker and Stefan Ultes	pp. 1234‑1241
pdf	bib	poster	video	FreeTalky: Don’t Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue Chanjun Park, Yoonna Jang, Seolhwa Lee, Sungjin Park and Heuiseok Lim	pp. 1242‑1248
pdf	bib		video	Self-Contained Utterance Description Corpus for Japanese Dialog Yuta Hayashibe	pp. 1249‑1255
pdf	bib	poster	video	DialCrowd 2.0: A Quality-Focused Dialog System Crowdsourcing Toolkit Jessica Huynh, Ting-Rui Chiang, Jeffrey Bigham and Maxine Eskenazi	pp. 1256‑1263
pdf	bib	poster	video	A Brief Survey of Textual Dialogue Corpora Hugo Gonçalo Oliveira, Patrícia Ferreira, Daniel Martins, Catarina Silva and Ana Alves	pp. 1264‑1274
pdf	bib		video	A Unified Approach to Entity-Centric Context Tracking in Social Conversations Ulrich Rückert, Srinivas Sunkara, Abhinav Rastogi, Sushant Prakash and Pranav Khaitan	pp. 1275‑1285
pdf	bib	poster	video	A Unifying View On Task-oriented Dialogue Annotation Vojtěch Hudeček, Leon-Paul Schaub, Daniel Stancl, Patrick Paroubek and Ondřej Dušek	pp. 1286‑1296
pdf	bib	poster	video	A Multi-source Graph Representation of the Movie Domain for Recommendation Dialogues Analysis Antonio Origlia, Martina Di Bratto, Maria Di Maro and Sabrina Mennella	pp. 1297‑1306
pdf	bib	poster	video	SHARE: A Lexicon of Harmful Expressions by Spanish Speakers Flor Miriam Plaza-del-Arco, Ana Belén Parras Portillo, Pilar López Úbeda, Beatriz Gil and María-Teresa Martín-Valdivia	pp. 1307‑1316
pdf	bib		video	Wiktextract: Wiktionary as Machine-Readable Structured Data Tatu Ylonen	pp. 1317‑1325
pdf	bib	poster	video	NyLLex: A Novel Resource of Swedish Words Annotated with Reading Proficiency Level Daniel Holmer and Evelina Rennes	pp. 1326‑1331
pdf	bib	poster	video	Making a Semantic Event-type Ontology Multilingual Zdenka Uresova, Karolina Zaczynska, Peter Bourgonje, Eva Fučíková, Georg Rehm and Jan Hajic	pp. 1332‑1343
pdf	bib	poster	video	NomVallex: A Valency Lexicon of Czech Nouns and Adjectives Veronika Kolářová and Anna Vernerová	pp. 1344‑1352
pdf	bib	poster	video	TZOS: an Online Terminology Database Aimed at Working on Basque Academic Terminology Collaboratively Izaskun Aldezabal, Jose Mari Arriola and Arantxa Otegi	pp. 1353‑1359
pdf	bib	poster	video	Animacy Denoting German Nouns: Annotation and Classification Manfred Klenner and Anne Göhring	pp. 1360‑1364
pdf	bib	poster	video	x-enVENT: A Corpus of Event Descriptions with Experiencer-specific Emotion and Appraisal Annotations Enrica Troiano, Laura Ana Maria Oberlaender, Maximilian Wegge and Roman Klinger	pp. 1365‑1375
pdf	bib		video	Polar Quantification of Actor Noun Phrases for German Anne Göhring and Manfred Klenner	pp. 1376‑1380
pdf	bib	poster	video	Czech Dataset for Cross-lingual Subjectivity Classification Pavel Přibáň and Josef Steinberger	pp. 1381‑1391
pdf	bib	poster	video	RED v2: Enhancing RED Dataset for Multi-Label Emotion Detection Alexandra Ciobotaru, Mihai Vlad Constantinescu, Liviu P. Dinu and Stefan Dumitrescu	pp. 1392‑1399
pdf	bib	poster	video	Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans Katrin Ortmann	pp. 1400‑1407
pdf	bib		video	Probing Pre-trained Auto-regressive Language Models for Named Entity Typing and Recognition Elena V. Epure and Romain Hennequin	pp. 1408‑1417
pdf	bib	poster	video	Frustratingly Easy Performance Improvements for Low-resource Setups: A Tale on BERT and Segment Embeddings Rob van der Goot, Max Müller-Eberstein and Barbara Plank	pp. 1418‑1427
pdf	bib	poster	video	The Subject Annotations of the Danish Parliament Corpus (2009-2017) - Evaluated with Automatic Multi-label Classification Costanza Navarretta and Dorte Haltrup Hansen	pp. 1428‑1436
pdf	bib	poster	video	A Systematic Study Reveals Unexpected Interactions in Pre-Trained Neural Machine Translation Ashleigh Richardson and Janet Wiles	pp. 1437‑1443
pdf	bib		video	Holistic Evaluation of Automatic TimeML Annotators Mustafa Ocal, Adrian Perez, Antonela Radas and Mark Finlayson	pp. 1444‑1453
pdf	bib	poster	video	Measuring Uncertainty in Translation Quality Evaluation (TQE) Serge Gladkoff, Irina Sorokina, Lifeng Han and Alexandra Alekseeva	pp. 1454‑1461
pdf	bib	poster	video	Challenging the Transformer-based models with a Classical Arabic dataset: Quran and Hadith Shatha Altammami and Eric Atwell	pp. 1462‑1471
pdf	bib	poster	video	Question Modifiers in Visual Question Answering William Britton, Somdeb Sarkhel and Deepak Venugopal	pp. 1472‑1479
pdf	bib	poster	video	Multimodal Pipeline for Collection of Misinformation Data from Telegram Jose Sosa and Serge Sharoff	pp. 1480‑1489
pdf	bib	poster	video	Identifying Tension in Holocaust Survivors’ Interview: Code-switching/Code-mixing as Cues Xinyuan Xia, Lu Xiao, Kun Yang and Yueyue Wang	pp. 1490‑1495
pdf	bib	poster	video	Fine-tuning vs From Scratch: Do Vision & Language Models Have Similar Capabilities on Out-of-Distribution Visual Question Answering? Kristian Nørgaard Jensen and Barbara Plank	pp. 1496‑1508
pdf	bib		video	Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset Svetla Koeva, Ivelina Stoyanova and Jordan Kralev	pp. 1509‑1518
pdf	bib		video	Sign Language Production With Avatar Layering: A Critical Use Case over Rare Words Jung-Ho Kim, Eui Jun Hwang, Sukmin Cho, Du Hui Lee and Jong Park	pp. 1519‑1528
pdf	bib		video	The VoxWorld Platform for Multimodal Embodied Agents Nikhil Krishnaswamy, William Pickard, Brittany Cates, Nathaniel Blanchard and James Pustejovsky	pp. 1529‑1541
pdf	bib	poster	video	MemoSen: A Multimodal Dataset for Sentiment Analysis of Memes Eftekhar Hossain, Omar Sharif and Mohammed Moshiul Hoque	pp. 1542‑1554
pdf	bib		video	RUSAVIC Corpus: Russian Audio-Visual Speech in Cars Denis Ivanko, Alexandr Axyonov, Dmitry Ryumin, Alexey Kashevnik and Alexey Karpov	pp. 1555‑1559
pdf	bib	poster	video	A First Corpus of AZee Discourse Expressions Camille Challant and Michael Filhol	pp. 1560‑1565
pdf	bib	poster	video	BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment Luis Lebron, Yvette Graham, Kevin McGuinness, Konstantinos Kouramas and Noel E. O’Connor	pp. 1566‑1575
pdf	bib		video	Abstract Meaning Representation for Gesture Richard Brutti, Lucia Donatelli, Kenneth Lai and James Pustejovsky	pp. 1576‑1583
pdf	bib	slides	video	The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild Taja Kuzman, Peter Rupnik and Nikola Ljubešić	pp. 1584‑1594
pdf	bib	slides	video	The Spoken Language Understanding MEDIA Benchmark Dataset in the Era of Deep Learning: data updates, training and evaluation tools Gaëlle Laperrière, Valentin Pelloin, Antoine Caubrière, Salima Mdhaffar, Nathalie Camelin, Sahar Ghannay, Bassam Jabaian and Yannick Estève	pp. 1595‑1602
pdf	bib	slides	video	BasqueGLUE: A Natural Language Understanding Benchmark for Basque Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri and Aitor Soroa	pp. 1603‑1612
pdf	bib	slides	video	Resources and Experiments on Sentiment Classification for Georgian Nicolas Stefanovitch, Jakub Piskorski and Sopho Kharazi	pp. 1613‑1621
pdf	bib		video	CoFiF Plus: A French Financial Narrative Summarisation Corpus Nadhem Zmandar, Tobias Daudert, Sina Ahmadi, Mahmoud El-Haj and Paul Rayson	pp. 1622‑1639
pdf	bib	slides	video	Generating Extended and Multilingual Summaries with Pre-trained Transformers Rémi Calizzano, Malte Ostendorff, Qian Ruan and Georg Rehm	pp. 1640‑1650
pdf	bib	slides	video	MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases Louis Martin, Angela Fan, Éric de la Clergerie, Antoine Bordes and Benoît Sagot	pp. 1651‑1664
pdf	bib	slides	video	Towards Understanding Gender-Seniority Compound Bias in Natural Language Generation Samhita Honnavalli, Aesha Parekh, Lily Ou, Sophie Groenwold, Sharon Levy, Vicente Ordonez and William Yang Wang	pp. 1665‑1670
pdf	bib		video	Combining ELECTRA and Adaptive Graph Encoding for Frame Identification Fabio Tamburini	pp. 1671‑1679
pdf	bib	slides	video	Polysemy in Spoken Conversations and Written Texts Aina Garí Soler, Matthieu Labeau and Chloé Clavel	pp. 1680‑1690
pdf	bib	slides	video	Cross-Level Semantic Similarity for Serbian Newswire Texts Vuk Batanović and Maja Miličević Petrović	pp. 1691‑1699
pdf	bib		video	Universal Proposition Bank 2.0 Ishan Jindal, Alexandre Rademaker, Michał Ulewicz, Ha Linh, Huyen Nguyen, Khoi-Nguyen Tran, Huaiyu Zhu and Yunyao Li	pp. 1700‑1711
pdf	bib		video	The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts Nora Hollenstein, Maria Barrett and Marina Björnsdóttir	pp. 1712‑1720
pdf	bib	slides	video	The Brooklyn Multi-Interaction Corpus for Analyzing Variation in Entrainment Behavior Andreas Weise, Matthew McNeill and Rivka Levitan	pp. 1721‑1731
pdf	bib	slides	video	Pro-TEXT: an Annotated Corpus of Keystroke Logs Aleksandra Miletic, Christophe Benzitoun, Georgeta Cislaru and Santiago Herrera-Yanez	pp. 1732‑1739
pdf	bib		video	Work Hard, Play Hard: Collecting Acceptability Annotations through a 3D Game Federico Bonetti, Elisa Leonardelli, Daniela Trotta, Raffaele Guarasci and Sara Tonelli	pp. 1740‑1750
pdf	bib	poster	video	DiHuTra: a Parallel Corpus to Analyse Differences between Human Translations Ekaterina Lapshinova-Koltunski, Maja Popović and Maarit Koponen	pp. 1751‑1760
pdf	bib	poster	video	Data Expansion Using WordNet-based Semantic Expansion and Word Disambiguation for Cyberbullying Detection Md Saroar Jahan, Djamila Romaissa Beddiar, Mourad Oussalah and Muhidin Mohamed	pp. 1761‑1770
pdf	bib	poster	video	ALIGNMEET: A Comprehensive Tool for Meeting Annotation, Alignment, and Evaluation Peter Polák, Muskaan Singh, Anna Nedoluzhko and Ondřej Bojar	pp. 1771‑1779
pdf	bib	poster	video	KSoF: The Kassel State of Fluency Dataset – A Therapy Centered Dataset of Stuttering Sebastian Bayerl, Alexander Wolff von Gudenberg, Florian Hönig, Elmar Noeth and Korbinian Riedhammer	pp. 1780‑1787
pdf	bib		video	EZCAT: an Easy Conversation Annotation Tool Gaël Guibon, Luce Lefeuvre, Matthieu Labeau and Chloé Clavel	pp. 1788‑1797
pdf	bib	poster	video	Spoken Language Treebanks in Universal Dependencies: an Overview Kaja Dobrovoljc	pp. 1798‑1806
pdf	bib	poster	video	LeConTra: A Learner Corpus of English-to-Dutch News Translation Bram Vanroy and Lieve Macken	pp. 1807‑1816
pdf	bib	poster	video	Annotating Attribution in Czech News Server Articles Barbora Hladka, Jiří Mírovský, Matyáš Kopp and Václav Moravec	pp. 1817‑1823
pdf	bib	poster	video	Xposition: An Online Multilingual Database of Adpositional Semantics Luke Gessler, Nathan Schneider, Joseph C. Ledford and Austin Blodgett	pp. 1824‑1830
pdf	bib	poster	video	A Study in Contradiction: Data and Annotation for AIDA Focusing on Informational Conflict in Russia-Ukraine Relations Jennifer Tracey, Ann Bies, Jeremy Getman, Kira Griffitt and Stephanie Strassel	pp. 1831‑1838
pdf	bib		video	Annotating Verbal Multiword Expressions in Arabic: Assessing the Validity of a Multilingual Annotation Procedure Najet Hadj Mohamed, Cherifa Ben Khelil, Agata Savary, Iskandar Keskes, Jean-Yves Antoine and Lamia Hadrich-Belguith	pp. 1839‑1848
pdf	bib		video	Annotation of Communicative Functions of Short Feedback Tokens in Switchboard Carol Figueroa, Adaeze Adigwe, Magalie Ochs and Gabriel Skantze	pp. 1849‑1859
pdf	bib		video	A Dataset of Offensive Language in Kosovo Social Media Adem Ajvazi and Christian Hardmeier	pp. 1860‑1869
pdf	bib	poster	video	The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses Bashar Alhafni, Nizar Habash and Houda Bouamor	pp. 1870‑1884
pdf	bib		video	The Engage Corpus: A Social Media Dataset for Text-Based Recommender Systems Daniel Cheng, Kyle Yan, Phillip Keung and Noah A. Smith	pp. 1885‑1889
pdf	bib	poster	video	Annotating Arguments in a Corpus of Opinion Articles Gil Rocha, Luís Trigo, Henrique Lopes Cardoso, Rui Sousa-Silva, Paula Carvalho, Bruno Martins and Miguel Won	pp. 1890‑1899
pdf	bib	poster	video	German Parliamentary Corpus (GerParCor) Giuseppe Abrami, Mevlüt Bagci, Leon Hammerla and Alexander Mehler	pp. 1900‑1906
pdf	bib	poster	video	NerKor+Cars-OntoNotes++ Attila Novák and Borbála Novák	pp. 1907‑1916
pdf	bib	poster	video	A Comparative Cross Language View On Acted Databases Portraying Basic Emotions Utilising Machine Learning Felix Burkhardt, Anabell Hacker, Uwe Reichel, Hagen Wierstorf, Florian Eyben and Björn Schuller	pp. 1917‑1924
pdf	bib		video	Nkululeko: A Tool For Rapid Speaker Characteristics Detection Felix Burkhardt, Johannes Wagner, Hagen Wierstorf, Florian Eyben and Björn Schuller	pp. 1925‑1932
pdf	bib	poster	video	Speech Aerodynamics Database, Tools and Visualisation Shi Yu, Clara Ponchard, Roland Trouville, Sergio Hassid and Didier Demolin	pp. 1933‑1938
pdf	bib			PATATRA and PATAFreq: two French databases for the documentation of within-speaker variability in speech Cécile Fougeron, Nicolas Audibert, Cedric Gendrot, Estelle Chardenon and Louise Wohmann	pp. 1939‑1944
pdf	bib	poster	video	The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition Jonathan Mukiibi, Andrew Katumba, Joyce Nakatumba-Nabende, Ali Hussein and Joshua Meyer	pp. 1945‑1954
pdf	bib		video	Far-Field Speaker Recognition Benchmark Derived From The DiPCo Corpus Mickael Rouvier and Mohammad Mohammadamini	pp. 1955‑1959
pdf	bib		video	Evaluating Sampling-based Filler Insertion with Spontaneous TTS Siyang Wang, Joakim Gustafson and Éva Székely	pp. 1960‑1969
pdf	bib		video	BEA-Base: A Benchmark for ASR of Spontaneous Hungarian Peter Mihajlik, Andras Balog, Tekla Etelka Graczi, Anna Kohari, Balázs Tarján and Katalin Mady	pp. 1970‑1977
pdf	bib	poster	video	SNuC: The Sheffield Numbers Spoken Language Corpus Emma Barker, Jon Barker, Robert Gaizauskas, Ning Ma and Monica Lestari Paramita	pp. 1978‑1984
pdf	bib	poster	video	The ManDi Corpus: A Spoken Corpus of Mandarin Regional Dialects Liang Zhao and Eleanor Chodroff	pp. 1985‑1990
pdf	bib	poster	video	The Speed-Vel Project: a Corpus of Acoustic and Aerodynamic Data to Measure Droplets Emission During Speech Interaction Francesca Carbone, Gilles Bouchet, Alain Ghio, Thierry Legou, Carine André, Muriel Lalain, Sabrina Kadri, Caterina Petrone, Federica Procino and Antoine Giovanni	pp. 1991‑1999
pdf	bib		video	Towards Speech-only Opinion-level Sentiment Analysis Annalena Aicher, Alisa Gazizullina, Aleksei Gusev, Yuri Matveev and Wolfgang Minker	pp. 2000‑2006
pdf	bib	poster	video	At the Intersection of NLP and Sustainable Development: Exploring the Impact of Demographic-Aware Text Representations in Modeling Value on a Corpus of Interviews Goya van Boven, Stephanie Hirmer and Costanza Conforti	pp. 2007‑2021
pdf	bib	poster	video	A Study on the Ambiguity in Human Annotation of German Oral History Interviews for Perceived Emotion Recognition and Sentiment Analysis Michael Gref, Nike Matthiesen, Sreenivasa Hikkal Venugopala, Shalaka Satheesh, Aswinkumar Vijayananth, Duc Bach Ha, Sven Behnke and Joachim Köhler	pp. 2022‑2031
pdf	bib	poster	video	Detecting Optimism in Tweets using Knowledge Distillation and Linguistic Analysis of Optimism Ștefan Cobeli, Ioan-Bogdan Iordache, Shweta Yadav, Cornelia Caragea, Liviu P. Dinu and Dragoș Iliescu	pp. 2032‑2041
pdf	bib	poster	video	Dataset and Baseline for Automatic Student Feedback Analysis Missaka Herath, Kushan Chamindu, Hashan Maduwantha and Surangika Ranathunga	pp. 2042‑2049
pdf	bib	poster	video	EENLP: Cross-lingual Eastern European NLP Index Alexey Tikhonov, Alex Malkhasov, Andrey Manoshin, George-Andrei Dima, Réka Cserháti, Md.Sadek Hossain Asif and Matt Sárdi	pp. 2050‑2057
pdf	bib	poster	video	Slovene SuperGLUE Benchmark: Translation and Evaluation Aleš Žagar and Marko Robnik-Šikonja	pp. 2058‑2065
pdf	bib	poster	video	Speech Resources in the Tamasheq Language Marcely Zanon Boito, Fethi Bougares, Florentin Barbier, Souhir Gahbiche, Loïc Barrault, Mickael Rouvier and Yannick Estève	pp. 2066‑2071
pdf	bib		video	Aesop’s fable "The North Wind and the Sun" Used as a Rosetta Stone to Extract and Map Spoken Words in Under-resourced Languages Elena Knyazeva, Philippe Boula de Mareüil and Frédéric Vernier	pp. 2072‑2079
pdf	bib	poster	video	Multilingual Open Text Release 1: Public Domain News in 44 Languages Chester Palen-Michel, June Kim and Constantine Lignos	pp. 2080‑2089
pdf	bib	poster	video	TweetTaglish: A Dataset for Investigating Tagalog-English Code-Switching Megan Herrera, Ankit Aich and Natalie Parde	pp. 2090‑2097
pdf	bib	poster	video	Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking Luis Chiruzzo, Santiago Góngora, Aldo Alvarez, Gustavo Giménez-Lugo, Marvin Agüero-Torales and Yliana Rodríguez	pp. 2098‑2107
pdf	bib	slides	video	Assessing Multilinguality of Publicly Accessible Websites Rinalds Vīksna, Inguna Skadiņa, Raivis Skadiņš, Andrejs Vasiļjevs and Roberts Rozis	pp. 2108‑2116
pdf	bib		video	A Methodology for Building a Diachronic Dataset of Semantic Shifts and its Application to QC-FR-Diac-V1.0, a Free Reference for French David Kletz, Philippe Langlais, François Lareau and Patrick Drouin	pp. 2117‑2125
pdf	bib	slides	video	CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models Jörg Frohberg and Frank Binder	pp. 2126‑2140
pdf	bib	slides	video	Evaluating Gender Bias in Speech Translation Marta R. Costa-jussà, Christine Basta and Gerard I. Gállego	pp. 2141‑2147
pdf	bib	slides	video	Design Choices in Crowdsourcing Discourse Relation Annotations: The Effect of Worker Selection and Training Merel Scholman, Valentina Pyatkin, Frances Yung, Ido Dagan, Reut Tsarfaty and Vera Demberg	pp. 2148‑2156
pdf	bib	slides	video	TBD3: A Thresholding-Based Dynamic Depression Detection from Social Media for Low-Resource Users Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian and Ophir Frieder	pp. 2157‑2165
pdf	bib	slides	video	SpecNFS: A Challenge Dataset Towards Extracting Formal Models from Natural Language Specifications Sayontan Ghosh, Amanpreet Singh, Alex Merenstein, Wei Su, Scott A. Smolka, Erez Zadok and Niranjan Balasubramanian	pp. 2166‑2176
pdf	bib		video	Argument Similarity Assessment in German for Intelligent Tutoring: Crowdsourced Dataset and First Experiments Xiaoyu Bai and Manfred Stede	pp. 2177‑2187
pdf	bib	slides	video	Leveraging Pre-trained Language Models for Gender Debiasing Nishtha Jain, Declan Groves, Lucia Specia and Maja Popović	pp. 2188‑2195
pdf	bib	slides	video	Unsupervised Embeddings with Graph Auto-Encoders for Multi-domain and Multilingual Hate Speech Detection Gretel Liz De la Peña Sarracén and Paolo Rosso	pp. 2196‑2204
pdf	bib	slides	video	FQuAD2.0: French Question Answering and Learning When You Don’t Know Quentin Heinrich, Gautier Viaud and Wacim Belblidia	pp. 2205‑2214
pdf	bib		video	Large-Scale Hate Speech Detection with Cross-Domain Transfer Cagri Toraman, Furkan Şahinuç and Eyup Yilmaz	pp. 2215‑2225
pdf	bib	slides	video	GLoHBCD: A Naturalistic German Dataset for Language of Health Behaviour Change on Online Support Forums Selina Meyer and David Elsweiler	pp. 2226‑2235
pdf	bib	slides	video	Creating a Data Set of Abstractive Summaries of Turn-labeled Spoken Human-Computer Conversations Iris Hendrickx	pp. 2236‑2244
pdf	bib	slides	video	OpenEL: An Annotated Corpus for Entity Linking and Discourse in Open Domain Dialogue Wen Cui, Leanne Rolston, Marilyn Walker and Beth Ann Hockey	pp. 2245‑2256
pdf	bib		video	Collecting Visually-Grounded Dialogue with A Game Of Sorts Bram Willemsen, Dmytro Kalpakchi and Gabriel Skantze	pp. 2257‑2268
pdf	bib	poster	video	CoRoSeOf - An Annotated Corpus of Romanian Sexist and Offensive Tweets Diana Constantina Hoefels, Çağrı Çöltekin and Irina Diana Mădroane	pp. 2269‑2281
pdf	bib	poster	video	ArMIS - The Arabic Misogyny and Sexism Corpus with Annotator Subjective Disagreements Dina Almanea and Massimo Poesio	pp. 2282‑2291
pdf	bib	poster	video	Annotating Interruption in Dyadic Human Interaction Liu Yang, Catherine Achard and Catherine Pelachaud	pp. 2292‑2297
pdf	bib	poster	video	The Causal News Corpus: Annotating Causal Relations in Event Sentences from News Fiona Anting Tan, Ali Hürriyetoğlu, Tommaso Caselli, Nelleke Oostdijk, Tadashi Nomoto, Hansi Hettiarachchi, Iqra Ameer, Onur Uca, Farhana Ferdousi Liza and Tiancheng Hu	pp. 2298‑2310
pdf	bib	poster	video	Samrómur: Crowd-sourcing large amounts of data Staffan Hedström, David Erik Mollberg, Ragnheiður Þórhallsdóttir and Jón Guðnason	pp. 2311‑2316
pdf	bib	poster	video	An Annotated Corpus of Textual Explanations for Clinical Decision Support Roland Roller, Aljoscha Burchardt, Nils Feldhus, Laura Seiffe, Klemens Budde, Simon Ronicke and Bilgin Osmanodja	pp. 2317‑2326
pdf	bib	poster	video	LARD: Large-scale Artificial Disfluency Generation Tatiana Passali, Thanassis Mavropoulos, Grigorios Tsoumakas, Georgios Meditskos and Stefanos Vrochidis	pp. 2327‑2336
pdf	bib	poster	video	The CRECIL Corpus: a New Dataset for Extraction of Relations between Characters in Chinese Multi-party Dialogues Yuru Jiang, Yang Xu, Yuhang Zhan, Weikai He, Yilin Wang, Zixuan Xi, Meiyun Wang, Xinyu Li, Yu Li and Yanchao Yu	pp. 2337‑2344
pdf	bib	poster	video	The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic Dana Abdulrahim, Go Inoue, Latifa Shamsan, Salam Khalifa and Nizar Habash	pp. 2345‑2352
pdf	bib		video	A Universal Dependencies Treebank of Ancient Hebrew Daniel Swanson and Francis Tyers	pp. 2353‑2361
pdf	bib	poster	video	Hate Speech Dynamics Against African descent, Roma and LGBTQI Communities in Portugal Paula Carvalho, Bernardo Cunha, Raquel Santos, Fernando Batista and Ricardo Ribeiro	pp. 2362‑2370
pdf	bib		video	Evolving Large Text Corpora: Four Versions of the Icelandic Gigaword Corpus Starkaður Barkarson, Steinþór Steingrímsson and Hildur Hafsteinsdóttir	pp. 2371‑2381
pdf	bib	poster	video	A Pragmatics-Centered Evaluation Framework for Natural Language Understanding Damien Sileo, Philippe Muller, Tim Van de Cruys and Camille Pradel	pp. 2382‑2394
pdf	bib	poster	video	Conversational Analysis of Daily Dialog Data using Polite Emotional Dialogue Acts Chandrakant Bothe and Stefan Wermter	pp. 2395‑2400
pdf	bib	poster	video	Inducing Discourse Marker Inventories from Lexical Knowledge Graphs Christian Chiarcos	pp. 2401‑2412
pdf	bib		video	Story Trees: Representing Documents using Topological Persistence Pantea Haghighatkhah, Antske Fokkens, Pia Sommerauer, Bettina Speckmann and Kevin Verbeek	pp. 2413‑2429
pdf	bib	poster	video	Extracting and Analysing Metaphors in Migration Media Discourse: towards a Metaphor Annotation Scheme Ana Zwitter Vitez, Mojca Brglez, Marko Robnik Šikonja, Tadej Škvorc, Andreja Vezovnik and Senja Pollak	pp. 2430‑2439
pdf	bib	poster	video	DDisCo: A Discourse Coherence Dataset for Danish Linea Flansmose Mikkelsen, Oliver Kinch, Anders Jess Pedersen and Ophélie Lacroix	pp. 2440‑2445
pdf	bib	poster	video	LPAttack: A Feasible Annotation Scheme for Capturing Logic Pattern of Attacks in Arguments Farjana Sultana Mim, Naoya Inoue, Shoichi Naito, Keshav Singh and Kentaro Inui	pp. 2446‑2459
pdf	bib	poster	video	BeSt: The Belief and Sentiment Corpus Jennifer Tracey, Owen Rambow, Claire Cardie, Adam Dalton, Hoa Trang Dang, Mona Diab, Bonnie Dorr, Louise Guthrie, Magdalena Markowska, Smaranda Muresan, Vinodkumar Prabhakaran, Samira Shaikh and Tomek Strzalkowski	pp. 2460‑2467
pdf	bib		video	MOTIF: Contextualized Images for Complex Words to Improve Human Reading Xintong Wang, Florian Schneider, Özge Alacam, Prateek Chaudhury and Chris Biemann	pp. 2468‑2477
pdf	bib	poster	video	Challenges with Sign Language Datasets for Sign Language Recognition and Translation Mirella De Sisto, Vincent Vandeghinste, Santiago Egea Gómez, Mathieu De Coster, Dimitar Shterionov and Horacio Saggion	pp. 2478‑2487
pdf	bib	poster	video	A Low-Cost Motion Capture Corpus in French Sign Language for Interpreting Iconicity and Spatial Referencing Mechanisms Clémence Mertz, Vincent Barreaud, Thibaut Le Naour, Damien Lolive and Sylvie Gibet	pp. 2488‑2497
pdf	bib		video	The CLAMS Platform at Work: Processing Audiovisual Data from the American Archive of Public Broadcasting Marc Verhagen, Kelley Lynch, Kyeongmin Rim and James Pustejovsky	pp. 2498‑2506
pdf	bib	poster	video	BU-NEmo: an Affective Dataset of Gun Violence News Carley Reardon, Sejin Paik, Ge Gao, Meet Parekh, Yanling Zhao, Lei Guo, Margrit Betke and Derry Tanti Wijaya	pp. 2507‑2516
pdf	bib	poster	video	RoomReader: A Multimodal Corpus of Online Multiparty Conversational Interactions Justine Reverdy, Sam O’Connor Russell, Louise Duquenne, Diego Garaialde, Benjamin R. Cowan and Naomi Harte	pp. 2517‑2527
pdf	bib	poster	video	Quevedo: Annotation and Processing of Graphical Languages Antonio F. G. Sevilla, Alberto Díaz Esteban and José María Lahoz-Bengoechea	pp. 2528‑2535
pdf	bib	poster	video	Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel’s Weekly Video Podcasts Debjoy Saha, Shravan Nayak and Timo Baumann	pp. 2536‑2540
pdf	bib		video	Crowdsourcing Kazakh-Russian Sign Language: FluentSigners-50 Medet Mukushev, Aigerim Kydyrbekova, Alfarabi Imashev, Vadim Kimmelman and Anara Sandygulova	pp. 2541‑2547
pdf	bib		video	Connecting a French Dictionary from the Beginning of the 20th Century to Wikidata Pierre Nugues	pp. 2548‑2555
pdf	bib	poster	video	Metaphor annotation for German Markus Egg and Valia Kordoni	pp. 2556‑2562
pdf	bib	poster	video	NorDiaChange: Diachronic Semantic Change Dataset for Norwegian Andrey Kutuzov, Samia Touileb, Petter Mæhlum, Tita Enstad and Alexandra Wittemann	pp. 2563‑2572
pdf	bib	poster	video	Exploring Transformers for Ranking Portuguese Semantic Relations Hugo Gonçalo Oliveira	pp. 2573‑2582
pdf	bib	poster	video	Building Static Embeddings from Contextual Ones: Is It Useful for Building Distributional Thesauri? Olivier Ferret	pp. 2583‑2590
pdf	bib	poster	video	Sentence Selection Strategies for Distilling Word Embeddings from BERT Yixiao Wang, Zied Bouraoui, Luis Espinosa Anke and Steven Schockaert	pp. 2591‑2600
pdf	bib		video	DiaWUG: A Dataset for Diatopic Lexical Semantic Variation in Spanish Gioia Baldissin, Dominik Schlechtweg and Sabine Schulte im Walde	pp. 2601‑2609
pdf	bib	poster	video	My Case, For an Adposition: Lexical Polysemy of Adpositions and Case Markers in Finnish and Latin Daniel Chen and Mans Hulden	pp. 2610‑2616
pdf	bib	poster	video	WiC-TSV-de: German Word-in-Context Target-Sense-Verification Dataset and Cross-Lingual Transfer Analysis Anna Breit, Artem Revenko and Narayani Blaschke	pp. 2617‑2625
pdf	bib	poster	video	Re-train or Train from Scratch? Comparing Pre-training Strategies of BERT in the Medical Domain Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne and Pierre Zweigenbaum	pp. 2626‑2633
pdf	bib	poster	video	Universal Semantic Annotator: the First Unified API for WSD, SRL and Semantic Parsing Riccardo Orlando, Simone Conia, Stefano Faralli and Roberto Navigli	pp. 2634‑2641
pdf	bib	slides	video	D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research Jan Philip Wahle, Terry Ruas, Saif Mohammad and Bela Gipp	pp. 2642‑2651
pdf	bib	slides	video	SciPar: A Collection of Parallel Corpora from Scientific Abstracts Dimitrios Roussis, Vassilis Papavassiliou, Prokopis Prokopidis, Stelios Piperidis and Vassilis Katsouros	pp. 2652‑2657
pdf	bib	slides	video	CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic Terms Martha Gavidia, Patrick Lee, Anna Feldman and Jing Peng	pp. 2658‑2671
pdf	bib	slides	video	Camel Treebank: An Open Multi-genre Arabic Dependency Treebank Nizar Habash, Muhammed AbuOdeh, Dima Taji, Reem Faraj, Jamila El Gizuli and Omar Kallas	pp. 2672‑2681
pdf	bib	slides	video	MentSum: A Resource for Exploring Summarization of Mental Health Online Posts Sajad Sotudeh, Nazli Goharian and Zachary Young	pp. 2682‑2692
pdf	bib	slides	video	Klexikon: A German Dataset for Joint Summarization and Simplification Dennis Aumiller and Michael Gertz	pp. 2693‑2701
pdf	bib	slides	video	Applying Automatic Text Summarization for Fake News Detection Philipp Hartl and Udo Kruschwitz	pp. 2702‑2713
pdf	bib	slides	video	Increasing CMDI’s Semantic Interoperability with schema.org Nino Meisinger, Thorsten Trippel and Claus Zinn	pp. 2714‑2720
pdf	bib	slides	video	RefCo and its Checker: Improving Language Documentation Corpora’s Reusability Through a Semi-Automatic Review Process Herbert Lange and Jocelyn Aznar	pp. 2721‑2729
pdf	bib		video	Identification and Analysis of Personification in Hungarian: The PerSECorp project Gábor Simon	pp. 2730‑2738
pdf	bib			ISO-based Annotated Multilingual Parallel Corpus for Discourse Markers Purificação Silvano, Mariana Damova, Giedrė Valūnaitė Oleškevičienė, Chaya Liebeskind, Christian Chiarcos, Dimitar Trajanov, Ciprian-Octavian Truică, Elena-Simona Apostol and Anna Baczkowska	pp. 2739‑2749
pdf	bib	slides	video	LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild David Gimeno-Gómez and Carlos-D. Martínez-Hinarejos	pp. 2750‑2758
pdf	bib	slides	video	Modality Alignment between Deep Representations for Effective Video-and-Language Learning Hyeongu Yun, Yongil Kim and Kyomin Jung	pp. 2759‑2770
pdf	bib	slides	video	Mutual Gaze and Linguistic Repetition in a Multimodal Corpus Anais Murat, Maria Koutsombogera and Carl Vogel	pp. 2771‑2780
pdf	bib	slides	video	Multidimensional Coding of Multimodal Languaging in Multi-Party Settings Christophe Parisse, Marion Blondel, Stéphanie Caët, Claire Danet, Coralie Vincent and Aliyah Morgenstern	pp. 2781‑2787
pdf	bib	poster	video	Constructing a Lexical Resource of Russian Derivational Morphology Lukáš Kyjánek, Olga Lyashevskaya, Anna Nedoluzhko, Daniil Vodolazsky and Zdeněk Žabokrtský	pp. 2788‑2797
pdf	bib	poster	video	Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship Temuulen Khishigsuren, Gábor Bella, Khuyagbaatar Batsuren, Abed Alhakim Freihat, Nandu Chandran Nair, Amarsanaa Ganbold, Hadi Khalilia, Yamini Chandrashekar and Fausto Giunchiglia	pp. 2798‑2807
pdf	bib	poster	video	Towards Latvian WordNet Peteris Paikens, Mikus Grasmanis, Agute Klints, Ilze Lokmane, Lauma Pretkalniņa, Laura Rituma, Madara Stāde and Laine Strankale	pp. 2808‑2815
pdf	bib		video	Building Sentiment Lexicons for Mainland Scandinavian Languages Using Machine Translation and Sentence Embeddings Peng Liu, Cristina Marco and Jon Atle Gulla	pp. 2816‑2825
pdf	bib	poster	video	A Thesaurus-based Sentiment Lexicon for Danish: The Danish Sentiment Lexicon Sanni Nimb, Sussi Olsen, Bolette Pedersen and Thomas Troelsgård	pp. 2826‑2832
pdf	bib		video	IndoUKC: A Concept-Centered Indian Multilingual Lexical Resource Nandu Chandran Nair, Rajendran S. Velayuthan, Yamini Chandrashekar, Gábor Bella and Fausto Giunchiglia	pp. 2833‑2840
pdf	bib		video	Korean Language Modeling via Syntactic Guide Hyeondey Kim, Seonhoon Kim, Inho Kang, Nojun Kwak and Pascale Fung	pp. 2841‑2849
pdf	bib	poster	video	A Whole-Person Function Dictionary for the Mobility, Self-Care and Domestic Life Domains: a Seedset Expansion Approach Ayah Zirikly, Bart Desmet, Julia Porcino, Jonathan Camacho Maldonado, Pei-Shu Ho, Rafael Jimenez Silva and Maryanne Sacco	pp. 2850‑2855
pdf	bib	poster	video	Placing multi-modal, and multi-lingual Data in the Humanities Domain on the Map: the Mythotopia Geo-tagged Corpus Voula Giouli, Anna Vacalopoulou, Nikolaos Sidiropoulos, Christina Flouda, Athanasios Doupas, Giorgos Giannopoulos, Nikos Bikakis, Vassilis Kaffes and Gregory Stainhaouer	pp. 2856‑2864
pdf	bib		video	An Architecture of resolving a multiple link path in a standoff-style data format to enhance the mobility of language resources Kazushi Ohya	pp. 2865‑2873
pdf	bib	poster	video	A Corpus of German Citizen Contributions in Mobility Planning: Supporting Evaluation Through Multidimensional Classification Julia Romberg, Laura Mark and Tobias Escher	pp. 2874‑2883
pdf	bib		video	Overlooked Data in Typological Databases: What Grambank Teaches Us About Gaps in Grammars Jakob Lesage, Hannah J. Haynie, Hedvig Skirgård, Tobias Weber and Alena Witzlack-Makarevich	pp. 2884‑2890
pdf	bib	poster	video	Hong Kong: Longitudinal and Synchronic Characterisations of Protest News between 1998 and 2020 Arya D. McCarthy and Giovanna Maria Dora Dore	pp. 2891‑2900
pdf	bib	poster	video	Nunc profana tractemus. Detecting Code-Switching in a Large Corpus of 16th Century Letters Martin Volk, Lukas Fischer, Patricia Scheurer, Bernard Silvan Schroffenegger, Raphael Schwitter, Phillip Ströbel and Benjamin Suter	pp. 2901‑2908
pdf	bib	poster	video	Quality and Efficiency of Manual Annotation: Pre-annotation Bias Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková and Jan Hajic	pp. 2909‑2918
pdf	bib		video	A Comprehensive Evaluation and Correction of the TimeBank Corpus Mustafa Ocal, Antonela Radas, Jared Hummer, Karine Megerdoomian and Mark Finlayson	pp. 2919‑2927
pdf	bib	poster	video	Evaluating Multilingual Sentence Representation Models in a Real Case Scenario Rocco Tripodi, Rexhina Blloshmi and Simon Levis Sullam	pp. 2928‑2939
pdf	bib	poster	video	Validity, Agreement, Consensuality and Annotated Data Quality Anaëlle Baledent, Yann Mathet, Antoine Widlöcher, Christophe Couronne and Jean-Luc Manguin	pp. 2940‑2948
pdf	bib	poster	video	Impact Analysis of the Use of Speech and Language Models Pretrained by Self-Supervision for Spoken Language Understanding Salima Mdhaffar, Valentin Pelloin, Antoine Caubrière, Gaëlle Laperrière, Sahar Ghannay, Bassam Jabaian, Nathalie Camelin and Yannick Estève	pp. 2949‑2956
pdf	bib	poster	video	JGLUE: Japanese General Language Understanding Evaluation Kentaro Kurihara, Daisuke Kawahara and Tomohide Shibata	pp. 2957‑2966
pdf	bib			Using the LARA Little Prince to compare human and TTS audio quality Elham Akhlaghi, Ingibjörg Iða Auðunardóttir, Anna Bączkowska, Branislav Bédi, Hakeem Beedar, Harald Berthelsen, Cathy Chua, Catia Cucchiarini, Hanieh Habibi, Ivana Horváthová, Junta Ikeda, Christèle Maizonniaux, Neasa Ní Chiaráin, Chadi Raheb, Manny Rayner, John Sloan, Nikos Tsourakis and Chunlin Yao	pp. 2967‑2975
pdf	bib	poster	video	Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations Chris Emmery, Ákos Kádár, Grzegorz Chrupała and Walter Daelemans	pp. 2976‑2988
pdf	bib	poster	video	Constructing Distributions of Variation in Referring Expression Type from Corpora for Model Evaluation T. Mark Ellison and Fahime Same	pp. 2989‑2997
pdf	bib		video	Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis Aleksandr Perevalov, Xi Yan, Liubov Kovriguina, Longquan Jiang, Andreas Both and Ricardo Usbeck	pp. 2998‑3007
pdf	bib	poster	video	Multi-Task Learning for Cross-Lingual Abstractive Summarization Sho Takase and Naoaki Okazaki	pp. 3008‑3016
pdf	bib	poster	video	How Much Context Span is Enough? Examining Context-Related Issues for Document-level MT Sheila Castilho	pp. 3017‑3025
pdf	bib	poster	video	TANDO: A Corpus for Document-level Machine Translation Harritxu Gete, Thierry Etchegoyhen, David Ponce, Gorka Labaka, Nora Aranberri, Ander Corral, Xabier Saralegi, Igor Ellakuria and Maite Martin	pp. 3026‑3037
pdf	bib	poster	video	Unsupervised Machine Translation in Real-World Scenarios Ona de Gibert Bonet, Iakes Goenaga, Jordi Armengol-Estapé, Olatz Perez-de-Viñaspre, Carla Parra Escartín, Marina Sanchez, Mārcis Pinnis, Gorka Labaka and Maite Melero	pp. 3038‑3047
pdf	bib		video	COVID-19 Mythbusters in World Languages Mana Ashida, Jin-Dong Kim and Lee Seunghun	pp. 3048‑3055
pdf	bib	poster	video	On the Multilingual Capabilities of Very Large-Scale English Language Models Jordi Armengol-Estapé, Ona de Gibert Bonet and Maite Melero	pp. 3056‑3068
pdf	bib	poster	video	Evaluating Subtitle Segmentation for End-to-end Generation Systems Alina Karakanta, François Buet, Mauro Cettolo and François Yvon	pp. 3069‑3078
pdf	bib	poster	video	Using Semantic Role Labeling to Improve Neural Machine Translation Reinhard Rapp	pp. 3079‑3083
pdf	bib	poster	video	A Deep Transfer Learning Method for Cross-Lingual Natural Language Inference Dibyanayan Bandyopadhyay, Arkadipta De, Baban Gain, Tanik Saikh and Asif Ekbal	pp. 3084‑3092
pdf	bib	poster	video	Simple TICO-19: A Dataset for Joint Translation and Simplification of COVID-19 Texts Matthew Shardlow and Fernando Alva-Manchego	pp. 3093‑3102
pdf	bib		video	Building Comparable Corpora for Assessing Multi-Word Term Alignment Omar Adjali, Emmanuel Morin and Pierre Zweigenbaum	pp. 3103‑3112
pdf	bib		video	Mean Machine Translations: On Gender Bias in Icelandic Machine Translations Agnes Sólmundsdóttir, Dagbjört Guðmundsdóttir, Lilja Björk Stefánsdóttir and Anton Ingason	pp. 3113‑3121
pdf	bib	poster	video	An Analysis of Dialogue Act Sequence Similarity Across Multiple Domains Ayesha Enayet and Gita Sukthankar	pp. 3122‑3130
pdf	bib	poster	video	Constructing a Culinary Interview Dialogue Corpus with Video Conferencing Tool Taro Okahisa, Ribeka Tanaka, Takashi Kodama, Yin Jou Huang and Sadao Kurohashi	pp. 3131‑3139
pdf	bib	poster	video	UgChDial: A Uyghur Chat-based Dialogue Corpus for Response Space Classification Zulipiye Yusupujiang and Jonathan Ginzburg	pp. 3140‑3149
pdf	bib	poster	video	A Speculative and Tentative Common Ground Handling for Efficient Composition of Uncertain Dialogue Saki Sudo, Kyoshiro Asano, Koh Mitsuda, Ryuichiro Higashinaka and Yugo Takeuchi	pp. 3150‑3157
pdf	bib	poster	video	BaSCo: An Annotated Basque-Spanish Code-Switching Corpus for Natural Language Understanding Maia Aguirre, Laura García-Sardiña, Manex Serras, Ariane Méndez and Jacobo López	pp. 3158‑3163
pdf	bib		video	ProDial – An Annotated Proactive Dialogue Act Corpus for Conversational Assistants using Crowdsourcing Matthias Kraus, Nicolas Wagner and Wolfgang Minker	pp. 3164‑3173
pdf	bib	poster	video	ELITR Minuting Corpus: A Novel Dataset for Automatic Minuting from Multi-Party Meetings in English and Czech Anna Nedoluzhko, Muskaan Singh, Marie Hledíková, Tirthankar Ghosal and Ondřej Bojar	pp. 3174‑3182
pdf	bib		video	Extracting Age-Related Stereotypes from Social Media Texts Kathleen C. Fraser, Svetlana Kiritchenko and Isar Nejadgholi	pp. 3183‑3194
pdf	bib	slides	video	Borrowing or Codeswitching? Annotating for Finer-Grained Distinctions in Language Mixing Elena Alvarez-Mellado and Constantine Lignos	pp. 3195‑3201
pdf	bib	slides	video	Multi-Aspect Transfer Learning for Detecting Low Resource Mental Disorders on Social Media Ana Sabina Uban, Berta Chulvi and Paolo Rosso	pp. 3202‑3219
pdf	bib	slides	video	ArCovidVac: Analyzing Arabic Tweets About COVID-19 Vaccination Hamdy Mubarak, Sabit Hassan, Shammur Absar Chowdhury and Firoj Alam	pp. 3220‑3230
pdf	bib	slides	video	FACTOID: A New Dataset for Identifying Misinformation Spreaders and Political Bias Flora Sakketou, Joan Plepi, Riccardo Cervero, Henri Jacques Geiss, Paolo Rosso and Lucie Flek	pp. 3231‑3241
pdf	bib	slides	video	Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in German Speech Recognition Julia Pritzen, Michael Gref, Dietlind Zühlke and Christoph Andreas Schmidt	pp. 3242‑3249
pdf	bib	slides	video	SDS-200: A Swiss German Speech to Standard German Text Corpus Michel Plüss, Manuela Hürlimann, Marc Cuny, Alla Stöckli, Nikolaos Kapotis, Julia Hartmann, Malgorzata Anna Ulasik, Christian Scheller, Yanick Schraner, Amit Jain, Jan Deriu, Mark Cieliebak and Manfred Vogel	pp. 3250‑3256
pdf	bib		video	Extracting Linguistic Knowledge from Speech: A Study of Stop Realization in 5 Romance Languages Yaru Wu, Mathilde Hutin, Ioana Vasilescu, Lori Lamel and Martine Adda-Decker	pp. 3257‑3263
pdf	bib	slides	video	Overlaps and Gender Analysis in the Context of Broadcast Media Martin Lebourdais, Marie Tahon, Antoine Laurent, Sylvain Meignier and Anthony Larcher	pp. 3264‑3270
pdf	bib	slides	video	A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification. Rémi Uro, David Doukhan, Albert Rilliard, Laetitia Larcher, Anissa-Claire Adgharouamane, Marie Tahon and Antoine Laurent	pp. 3271‑3280
pdf	bib	slides	video	DiscoGeM: A Crowdsourced Corpus of Genre-Mixed Implicit Discourse Relations Merel Scholman, Tianai Dong, Frances Yung and Vera Demberg	pp. 3281‑3290
pdf	bib	slides	video	QT30: A Corpus of Argument and Conflict in Broadcast Debate Annette Hautli-Janisz, Zlata Kikteva, Wassiliki Siskou, Kamila Gorska, Ray Becker and Chris Reed	pp. 3291‑3300
pdf	bib	slides	video	Scaling up Discourse Quality Annotation for Political Science Neele Falk and Gabriella Lapesa	pp. 3301‑3318
pdf	bib	slides	video	Clarifying Implicit and Underspecified Phrases in Instructional Text Talita Anthonio, Anna Sauer and Michael Roth	pp. 3319‑3330
pdf	bib	slides	video	Multilingual Pragmaticon: Database of Discourse Formulae Anton Buzanov, Polina Bychkova, Arina Molchanova, Anna Postnikova and Daria Ryzhova	pp. 3331‑3336
pdf	bib	slides	video	Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Dusko Vitas, Mihailo Skoric and Milica Ikonić Nešić	pp. 3337‑3345
pdf	bib	slides	video	Exploring Text Recombination for Automatic Narrative Level Detection Nils Reiter, Judith Sieker, Svenja Guhr, Evelyn Gius and Sina Zarrieß	pp. 3346‑3353
pdf	bib	slides	video	Automatic Normalisation of Early Modern French Rachel Bawden, Jonathan Poinhos, Eleni Kogkitsidou, Philippe Gambette, Benoît Sagot and Simon Gabay	pp. 3354‑3366
pdf	bib	slides	video	From FreEM to D’AlemBERT: a Large Corpus and a Language Model for Early Modern French Simon Gabay, Pedro Ortiz Suarez, Alexandre Bartz, Alix Chagué, Rachel Bawden, Philippe Gambette and Benoît Sagot	pp. 3367‑3374
pdf	bib	slides	video	Detecting Multiple Transitions in Literary Texts Nuette Heyns and Menno van Zaanen	pp. 3375‑3381
pdf	bib	poster	video	BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions Nayla Escribano, Jon Ander Gonzalez, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre and Rodrigo Agerri	pp. 3382‑3390
pdf	bib	poster	video	GerEO: A Large-Scale Resource on the Syntactic Distribution of German Experiencer-Object Verbs Johanna M. Poppek, Simon Masloch and Tibor Kiss	pp. 3391‑3397
pdf	bib	poster	video	ACT2: A multi-disciplinary semi-structured dataset for importance and purpose classification of citations Suchetha Nambanoor Kunnath, Valentin Stauber, Ronin Wu, David Pride, Viktor Botev and Petr Knoth	pp. 3398‑3406
pdf	bib	poster	video	Quantification Annotation in ISO 24617-12, Second Draft Harry Bunt, Maxime Amblard, Johan Bos, Karën Fort, Bruno Guillaume, Philippe de Groote, Chuyuan Li, Pierre Ludmann, Michel Musiol, Siyana Pavlova, Guy Perrier and Sylvain Pogodalla	pp. 3407‑3416
pdf	bib		video	The LTRC Hindi-Telugu Parallel Corpus Vandan Mujadia and Dipti Sharma	pp. 3417‑3424
pdf	bib	poster	video	MHE: Code-Mixed Corpora for Similar Language Identification Priya Rani, John P. McCrae and Theodorus Fransen	pp. 3425‑3433
pdf	bib	poster	video	Bazinga! A Dataset for Multi-Party Dialogues Structuring Paul Lerner, Juliette Bergoënd, Camille Guinaudeau, Hervé Bredin, Benjamin Maurice, Sharleyne Lefevre, Martin Bouteiller, Aman Berhe, Léo Galmant, Ruiqing Yin and Claude Barras	pp. 3434‑3441
pdf	bib		video	The Ellogon Web Annotation Tool: Annotating Moral Values and Arguments Alexandros Fotios Ntogramatzis, Anna Gradou, Georgios Petasis and Marko Kokol	pp. 3442‑3450
pdf	bib	poster	video	WeCanTalk: A New Multi-language, Multi-modal Resource for Speaker Recognition Karen Jones, Kevin Walker, Christopher Caruso, Jonathan Wright and Stephanie Strassel	pp. 3451‑3456
pdf	bib	poster	video	Using Wiktionary to Create Specialized Lexical Resources and Datasets Lenka Bajčetić and Thierry Declerck	pp. 3457‑3460
pdf	bib	poster	video	STAPI: An Automatic Scraper for Extracting Iterative Title-Text Structure from Web Documents Nan Zhang, Shomir Wilson and Prasenjit Mitra	pp. 3461‑3470
pdf	bib	poster	video	ELTE Poetry Corpus: A Machine Annotated Database of Canonical Hungarian Poetry Péter Horváth, Péter Kundráth, Balázs Indig, Zsófia Fellegi, Eszter Szlávich, Tímea Borbála Bajzát, Zsófia Sárközi-Lindner, Bence Vida, Aslihan Karabulut, Mária Timári and Gábor Palkó	pp. 3471‑3478
pdf	bib	poster	video	HAWP: a Dataset for Hindi Arithmetic Word Problem Solving Harshita Sharma, Pruthwik Mishra and Dipti Sharma	pp. 3479‑3490
pdf	bib	poster	video	The Bulgarian Event Corpus: Overview and Initial NER Experiments Petya Osenova, Kiril Simov, Iva Marinova and Melania Berbatova	pp. 3491‑3499
pdf	bib		video	A Corpus for Commonsense Inference in Story Cloze Test Bingsheng Yao, Ethan Joseph, Julian Lioanag and Mei Si	pp. 3500‑3508
pdf	bib	poster	video	Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson and Magnus Sahlgren	pp. 3509‑3518
pdf	bib	poster	video	Constrained Language Models for Interactive Poem Generation Andrei Popescu-Belis, Àlex Atrio, Valentin Minder, Aris Xanthos, Gabriel Luthier, Simon Mattei and Antonio Rodriguez	pp. 3519‑3529
pdf	bib		video	ELF22: A Context-based Counter Trolling Dataset to Combat Internet Trolls Huije Lee, Young Ju Na, Hoyun Song, Jisu Shin and Jong Park	pp. 3530‑3541
pdf	bib	poster	video	Generating Textual Explanations for Machine Learning Models Performance: A Table-to-Text Task Isaac Ampomah, James Burton, Amir Enshaei and Noura Al Moubayed	pp. 3542‑3551
pdf	bib	poster	video	Barch: an English Dataset of Bar Chart Summaries Iza Škrjanec, Muhammad Salman Edhi and Vera Demberg	pp. 3552‑3560
pdf	bib	poster	video	Effectiveness of Data Augmentation and Pretraining for Improving Neural Headline Generation in Low-Resource Settings Matej Martinc, Syrielle Montariol, Lidia Pivovarova and Elaine Zosa	pp. 3561‑3570
pdf	bib		video	Effectiveness of French Language Models on Abstractive Dialogue Summarization Task Yongxin Zhou, François Portet and Fabien Ringeval	pp. 3571‑3581
pdf	bib	poster	video	ALEXSIS: A Dataset for Lexical Simplification in Spanish Daniel Ferrés and Horacio Saggion	pp. 3582‑3594
pdf	bib	poster	video	The IARPA BETTER Program Abstract Task: Four New Semantically Annotated Corpora from IARPA’s BETTER Program Timothy Mckinnon and Carl Rubino	pp. 3595‑3600
pdf	bib	poster	video	A Named Entity Recognition Corpus for Vietnamese Biomedical Texts to Support Tuberculosis Treatment Uyen Phan, Phuong N.V Nguyen and Nhung Nguyen	pp. 3601‑3609
pdf	bib	poster	video	RaFoLa: A Rationale-Annotated Corpus for Detecting Indicators of Forced Labour Erick Mendez Guzman, Viktor Schlegel and Riza Batista-Navarro	pp. 3610‑3625
pdf	bib	poster	video	Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT Mustafa Jarrar, Mohammed Khalilia and Sana Ghanem	pp. 3626‑3636
pdf	bib		video	Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient’s Perspective Lisa Raithel, Philippe Thomas, Roland Roller, Oliver Sapina, Sebastian Möller and Pierre Zweigenbaum	pp. 3637‑3649
pdf	bib	poster	video	GGPONC 2.0 - The German Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline NER Taggers Florian Borchert, Christina Lohr, Luise Modersohn, Jonas Witt, Thomas Langer, Markus Follmann, Matthias Gietzelt, Bert Arnrich, Udo Hahn and Matthieu-P. Schapranow	pp. 3650‑3660
pdf	bib	poster	video	ClinIDMap: Towards a Clinical IDs Mapping for Data Interoperability Elena Zotova, Montse Cuadros and German Rigau	pp. 3661‑3669
pdf	bib		video	Identifying Draft Bills Impacting Existing Legislation: a Case Study on Romanian Corina Ceausu and Sergiu Nisioi	pp. 3670‑3674
pdf	bib	poster	video	MuLD: The Multitask Long Document Benchmark George Hudson and Noura Al Moubayed	pp. 3675‑3685
pdf	bib	poster	video	A Cross-document Coreference Dataset for Longitudinal Tracking across Radiology Reports Surabhi Datta, Hio Cheng Lam, Atieh Pajouhi, Sunitha Mogalla and Kirk Roberts	pp. 3686‑3695
pdf	bib	poster	video	How’s Business Going Worldwide? A Multilingual Annotated Corpus for Business Relation Extraction Hadjer Khaldi, Farah Benamara, Camille Pradel, Grégoire Sigel and Nathalie Aussenac-Gilles	pp. 3696‑3705
pdf	bib		video	Do Transformer Networks Improve the Discovery of Rules from Text? Mahdi Rahimi and Mihai Surdeanu	pp. 3706‑3714
pdf	bib	poster	video	Offensive language detection in Hebrew: can other languages help? Marina Litvak, Natalia Vanetik, Chaya Liebeskind, Omar Hmdia and Rizek Abu Madeghem	pp. 3715‑3723
pdf	bib	poster	video	JaMIE: A Pipeline Japanese Medical Information Extraction System with Novel Relation Annotation Fei Cheng, Shuntaro Yada, Ribeka Tanaka, Eiji Aramaki and Sadao Kurohashi	pp. 3724‑3731
pdf	bib	poster	video	Enhanced Entity Annotations for Multilingual Corpora Michael Strobl, Amine Trabelsi and Osmar Zaïane	pp. 3732‑3740
pdf	bib	poster	video	Enriching Epidemiological Thematic Features For Disease Surveillance Corpora Classification Edmond Menya, Mathieu Roche, Roberto Interdonato and Dickson Owuor	pp. 3741‑3750
pdf	bib	poster	video	Spanish Datasets for Sensitive Entity Detection in the Legal Domain Ona de Gibert Bonet, Aitor García Pablos, Montse Cuadros and Maite Melero	pp. 3751‑3760
pdf	bib		video	ConvTextTM: An Explainable Convolutional Tsetlin Machine Framework for Text Classification Bimal Bhattarai, Ole-Christoffer Granmo and Lei Jiao	pp. 3761‑3770
pdf	bib	poster	video	Elvis vs. M. Jackson: Who has More Albums? Classification and Identification of Elements in Comparative Questions Meriem Beloucif, Seid Muhie Yimam, Steffen Stahlhacke and Chris Biemann	pp. 3771‑3779
pdf	bib		video	Decorate the Examples: A Simple Method of Prompt Design for Biomedical Relation Extraction Hui-Syuan Yeh, Thomas Lavergne and Pierre Zweigenbaum	pp. 3780‑3787
pdf	bib	poster	video	Comparing Annotated Datasets for Named Entity Recognition in English Literature Rositsa Ivanova, Marieke van Erp and Sabrina Kirrane	pp. 3788‑3797
pdf	bib		video	Investigating User Radicalization: A Novel Dataset for Identifying Fine-Grained Temporal Shifts in Opinion Flora Sakketou, Allison Lahnala, Liane Vogel and Lucie Flek	pp. 3798‑3808
pdf	bib	slides	video	APPReddit: a Corpus of Reddit Posts Annotated for Appraisal Marco Antonio Stranisci, Simona Frenda, Eleonora Ceccaldi, Valerio Basile, Rossana Damiano and Viviana Patti	pp. 3809‑3818
pdf	bib	slides	video	Evaluating Methods for Extraction of Aspect Terms in Opinion Texts in Portuguese - the Challenges of Implicit Aspects Mateus Machado and Thiago Alexandre Salgueiro Pardo	pp. 3819‑3828
pdf	bib	slides	video	SenticNet 7: A Commonsense-based Neurosymbolic AI Framework for Explainable Sentiment Analysis Erik Cambria, Qian Liu, Sergio Decherchi, Frank Xing and Kenneth Kwok	pp. 3829‑3839
pdf	bib	slides	video	Building an Endangered Language Resource in the Classroom: Universal Dependencies for Kakataibo Roberto Zariquiey, Claudia Alvarado, Ximena Echevarría, Luisa Gomez, Rosa Gonzales, Mariana Illescas, Sabina Oporto, Frederic Blum, Arturo Oncevay and Javier Vera	pp. 3840‑3851
pdf	bib		video	The Norwegian Colossal Corpus: A Text Corpus for Training Large Norwegian Language Models Per Kummervold, Freddy Wetjen and Javier de la Rosa	pp. 3852‑3860
pdf	bib	slides	video	Embeddings models for Buddhist Sanskrit Ligeia Lugli, Matej Martinc, Andraž Pelicon and Senja Pollak	pp. 3861‑3871
pdf	bib	slides	video	Development of Automatic Speech Recognition for the Documentation of Cook Islands Māori Rolando Coto-Solano, Sally Akevai Nicholas, Samiha Datta, Victoria Quint, Piripi Wills, Emma Ngakuravaru Powell, Liam Koka’ua, Syed Tanveer and Isaac Feldman	pp. 3872‑3882
pdf	bib		video	A Generalized Approach to Protest Event Detection in German Local News Gregor Wiedemann, Jan Matti Dollbaum, Sebastian Haunss, Priska Daphi and Larissa Daria Meier	pp. 3883‑3891
pdf	bib	slides	video	Evaluation of Transfer Learning and Domain Adaptation for Analyzing German-Speaking Job Advertisements Ann-Sophie Gnehm, Eva Bühlmann and Simon Clematide	pp. 3892‑3901
pdf	bib	slides	video	Pre-Training Language Models for Identifying Patronizing and Condescending Language: An Analysis Carla Perez Almendros, Luis Espinosa Anke and Steven Schockaert	pp. 3902‑3911
pdf	bib		video	HeLI-OTS, Off-the-shelf Language Identifier for Text Tommi Jauhiainen, Heidi Jauhiainen and Krister Lindén	pp. 3912‑3922
pdf	bib	slides	video	Towards a Broad Coverage Named Entity Resource: A Data-Efficient Approach for Many Diverse Languages Silvia Severini, Ayyoob ImaniGooghari, Philipp Dufter and Hinrich Schütze	pp. 3923‑3933
pdf	bib	slides	video	Towards the Construction of a WordNet for Old English Fahad Khan, Francisco J. Minaya Gómez, Rafael Cruz González, Harry Diakoff, Javier E. Diaz Vera, John P. McCrae, Ciara O’Loughlin, William Michael Short and Sander Stolk	pp. 3934‑3941
pdf	bib	slides	video	A Framenet and Frame Annotator for German Social Media Eckhard Bick	pp. 3942‑3949
pdf	bib	slides	video	The Robotic Surgery Procedural Framebank Marco Bombieri, Marco Rospocher, Simone Paolo Ponzetto and Paolo Fiorini	pp. 3950‑3959
pdf	bib	poster	video	Representing the Toddler Lexicon: Do the Corpus and Semantics Matter? Jennifer Weber and Eliana Colunga	pp. 3960‑3968
pdf	bib	poster	video	Organizing and Improving a Database of French Word Formation Using Formal Concept Analysis Nyoman Juniarta, Olivier Bonami, Nabil Hathout, Fiammetta Namer and Yannick Toussaint	pp. 3969‑3976
pdf	bib	poster	video	Towards a new Ontology for Sign Languages Thierry Declerck	pp. 3977‑3983
pdf	bib	poster	video	Towards the Detection of a Semantic Gap in the Chain of Commonsense Knowledge Triples Yoshihiko Hayashi	pp. 3984‑3993
pdf	bib	poster	video	COPA-SSE: Semi-structured Explanations for Commonsense Reasoning Ana Brassard, Benjamin Heinzerling, Pride Kavumba and Kentaro Inui	pp. 3994‑4000
pdf	bib	poster	video	GRhOOT: Ontology of Rhetorical Figures in German Ramona Kühn, Jelena Mitrović and Michael Granitzer	pp. 4001‑4010
pdf	bib	poster	video	Querying a Dozen Corpora and a Thousand Years with Fintan Christian Chiarcos, Christian Fäth and Maxim Ionov	pp. 4011‑4021
pdf	bib	poster	video	The Index Thomisticus Treebank as Linked Data in the LiLa Knowledge Base Francesco Mambrini, Marco Passarotti, Giovanni Moretti and Matteo Pellegrini	pp. 4022‑4029
pdf	bib	poster	video	Building a Multilingual Taxonomy of Olfactory Terms with Timestamps Stefano Menini, Teresa Paccosi, Serra Sinem Tekiroğlu and Sara Tonelli	pp. 4030‑4039
pdf	bib	poster	video	Attention Understands Semantic Relations Anastasia Chizhikova, Sanzhar Murzakhmetov, Oleg Serikov, Tatiana Shavrina and Mikhail Burtsev	pp. 4040‑4050
pdf	bib	poster	video	Analysis of Dialogue in Human-Human Collaboration in Minecraft Takuma Ichikawa and Ryuichiro Higashinaka	pp. 4051‑4059
pdf	bib	poster	video	Data Collection for Empirically Determining the Necessary Information for Smooth Handover in Dialogue Sanae Yamashita and Ryuichiro Higashinaka	pp. 4060‑4068
pdf	bib		video	The slurk Interaction Server Framework: Better Data for Better Dialog Models Jana Götze, Maike Paetzel-Prüsmann, Wencke Liermann, Tim Diekmann and David Schlangen	pp. 4069‑4078
pdf	bib	poster	video	Corpus Design for Studying Linguistic Nudges in Human-Computer Spoken Interactions Natalia Kalashnikova, Serge Pajak, Fabrice Le Guel, Ioana Vasilescu, Gemma Serrano and Laurence Devillers	pp. 4079‑4087
pdf	bib		video	Dialogue Corpus Construction Considering Modality and Social Relationships in Building Common Ground Yuki Furuya, Koki Saito, Kosuke Ogura, Koh Mitsuda, Ryuichiro Higashinaka and Kazunori Takashio	pp. 4088‑4095
pdf	bib	poster	video	EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion Recognition in Task-Oriented Dialogue Systems Shutong Feng, Nurul Lubis, Christian Geishauser, Hsien-chin Lin, Michael Heck, Carel van Niekerk and Milica Gasic	pp. 4096‑4113
pdf	bib	poster	video	Data Augmentation with Paraphrase Generation and Entity Extraction for Multimodal Dialogue System Eda Okur, Saurav Sahay and Lama Nachman	pp. 4114‑4125
pdf	bib		video	Towards Modelling Self-imposed Filter Bubbles in Argumentative Dialogue Systems Annalena Aicher, Wolfgang Minker and Stefan Ultes	pp. 4126‑4134
pdf	bib	poster	video	Telling a Lie: Analyzing the Language of Information and Misinformation during Global Health Events Ankit Aich and Natalie Parde	pp. 4135‑4141
pdf	bib	poster	video	Misogyny and Aggressiveness Tend to Come Together and Together We Address Them Arianna Muti, Francesco Fernicola and Alberto Barrón-Cedeño	pp. 4142‑4148
pdf	bib	poster	video	The ComMA Dataset V0.2: Annotating Aggression and Bias in Multilingual Social Media Discourse Ritesh Kumar, Shyam Ratan, Siddharth Singh, Enakshi Nandi, Laishram Niranjana Devi, Akash Bhagat, Yogesh Dawer, Bornini Lahiri, Akanksha Bansal and Atul Kr. Ojha	pp. 4149‑4161
pdf	bib	poster	video	TUSC: Emotion Word Usage in Tweets from US and Canada Krishnapriya Vishnubhotla and Saif M. Mohammad	pp. 4162‑4176
pdf	bib		video	A Turkish Hate Speech Dataset and Detection System Fatih Beyhan, Buse Çarık, İnanç Arın, Ayşecan Terzioğlu, Berrin Yanikoglu and Reyyan Yeniterzi	pp. 4177‑4185
pdf	bib		video	Life is not Always Depressing: Exploring the Happy Moments of People Diagnosed with Depression Ana-Maria Bucur, Adrian Cosma and Liviu P. Dinu	pp. 4186‑4192
pdf	bib	poster	video	Evaluating Tokenizers Impact on OOVs Representation with Transformers Models Alexandra Benamar, Cyril Grouin, Meryl Bothua and Anne Vilnat	pp. 4193‑4204
pdf	bib		video	Assessing the Quality of an Italian Crowdsourced Idiom Corpus: the Dodiom Experiment Giuseppina Morza, Raffaele Manna and Johanna Monti	pp. 4205‑4211
pdf	bib	poster	video	Medical Crossing: a Cross-lingual Evaluation of Clinical Entity Linking Anton Alekseev, Zulfat Miftahutdinov, Elena Tutubalina, Artem Shelmanov, Vladimir Ivanov, Vladimir Kokh, Alexander Nesterov, Manvel Avetisian, Andrei Chertok and Sergey Nikolenko	pp. 4212‑4220
pdf	bib		video	MTLens: Machine Translation Output Debugging Shreyas Sharma, Kareem Darwish, Lucas Pavanelli, Thiago Castro Ferreira, Mohamed Al-Badrashiny, Kamer Ali Yuksel and Hassan Sawaf	pp. 4221‑4226
pdf	bib		video	IceBATS: An Icelandic Adaptation of the Bigger Analogy Test Set Steinunn Rut Friðriksdóttir, Hjalti Daníelsson, Steinþór Steingrímsson and Einar Sigurdsson	pp. 4227‑4234
pdf	bib		video	Transfer Learning Methods for Domain Adaptation in Technical Logbook Datasets Farhad Akhbardeh, Marcos Zampieri, Cecilia Ovesdotter Alm and Travis Desell	pp. 4235‑4244
pdf	bib	poster	video	Downstream Task Performance of BERT Models Pre-Trained Using Automatically De-Identified Clinical Data Thomas Vakili, Anastasios Lamproudis, Aron Henriksson and Hercules Dalianis	pp. 4245‑4252
pdf	bib	poster	video	Dilated Convolutional Neural Networks for Lightweight Diacritics Restoration Bálint Csanády and András Lukács	pp. 4253‑4259
pdf	bib	poster	video	Generating Artificial Texts as Substitution or Complement of Training Data Vincent Claveau, Antoine Chaffin and Ewa Kijak	pp. 4260‑4269
pdf	bib		video	From Pattern to Interpretation. Using Colibri Core to Detect Translation Patterns in the Peshitta. Mathias Coeckelbergs	pp. 4270‑4274
pdf	bib		video	PAGnol: An Extra-Large French Generative Model Julien Launay, E.L. Tommasone, Baptiste Pannier, François Boniface, Amélie Chatelain, Alessandro Cappelli, Iacopo Poli and Djamé Seddah	pp. 4275‑4284
pdf	bib	poster	video	CEPOC: The Cambridge Exams Publishing Open Cloze dataset Mariano Felice, Shiva Taslimipoor, Øistein E. Andersen and Paula Buttery	pp. 4285‑4290
pdf	bib	poster	video	ALBETO and DistilBETO: Lightweight Spanish Language Models José Cañete, Sebastian Donoso, Felipe Bravo-Marquez, Andrés Carvallo and Vladimir Araujo	pp. 4291‑4298
pdf	bib	poster	video	On the Robustness of Cognate Generation Models Winston Wu and David Yarowsky	pp. 4299‑4305
pdf	bib	slides	video	CLISTER: A Corpus for Semantic Textual Similarity in French Clinical Narratives Nicolas Hiebel, Olivier Ferret, Karën Fort and Aurélie Névéol	pp. 4306‑4315
pdf	bib	slides	video	The Chinese Causative-Passive Homonymy Disambiguation: an adversarial Dataset for NLI and a Probing Task Shanshan Xu and Katja Markert	pp. 4316‑4323
pdf	bib	slides	video	Modeling Noise in Paraphrase Detection Teemu Vahtola, Eetu Sjöblom, Jörg Tiedemann and Mathias Creutz	pp. 4324‑4332
pdf	bib	slides	video	Give me your Intentions, I’ll Predict our Actions: A Two-level Classification of Speech Acts for Crisis Management in Social Media Enzo Laurenti, Nils Bourgon, Farah Benamara, Alda Mari, Véronique Moriceau and Camille Courgeon	pp. 4333‑4343
pdf	bib	slides	video	Towards a Cleaner Document-Oriented Multilingual Crawled Corpus Julien Abadji, Pedro Ortiz Suarez, Laurent Romary and Benoît Sagot	pp. 4344‑4355
pdf	bib	slides	video	A Warm Start and a Clean Crawled Corpus - A Recipe for Good Language Models Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Jónsson, Vilhjalmur Thorsteinsson and Hafsteinn Einarsson	pp. 4356‑4366
pdf	bib	slides	video	Adapting Language Models When Training on Privacy-Transformed Data Tugtekin Turan, Dietrich Klakow, Emmanuel Vincent and Denis Jouvet	pp. 4367‑4373
pdf	bib	slides	video	Evaluation of Transfer Learning for Polish with a Text-to-Text Model Aleksandra Chrabrowa, Łukasz Dragan, Karol Grzegorczyk, Dariusz Kajtoch, Mikołaj Koszowski, Robert Mroczkowski and Piotr Rybak	pp. 4374‑4394
pdf	bib		video	Evaluation of HTR models without Ground Truth Material Phillip Benjamin Ströbel, Martin Volk, Simon Clematide, Raphael Schwitter, Tobias Hodel and David Schoch	pp. 4395‑4404
pdf	bib	slides	video	A Semi-Automated Live Interlingual Communication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking Tomasz Korybski, Elena Davitti, Constantin Orasan and Sabine Braun	pp. 4405‑4413
pdf	bib	slides	video	Are Embedding Spaces Interpretable? Results of an Intrusion Detection Evaluation on a Large French Corpus Thibault Prouteau, Nicolas Dugué, Nathalie Camelin and Sylvain Meignier	pp. 4414‑4419
pdf	bib	slides	video	Corpus for Automatic Structuring of Legal Documents Prathamesh Kalamkar, Aman Tiwari, Astha Agarwal, Saurabh Karn, Smita Gupta, Vivek Raghavan and Ashutosh Modi	pp. 4420‑4429
pdf	bib	slides	video	The Search for Agreement on Logical Fallacy Annotation of an Infodemic Claire Bonial, Austin Blodgett, Taylor Hudson, Stephanie M. Lukin, Jeffrey Micher, Douglas Summers-Stay, Peter Sutor and Clare Voss	pp. 4430‑4438
pdf	bib			Recovering Patient Journeys: A Corpus of Biomedical Entities and Relations on Twitter (BEAR) Amelie Wührl and Roman Klinger	pp. 4439‑4450
pdf	bib	poster	video	Improving Event Duration Question Answering by Leveraging Existing Temporal Information Extraction Data Felix Virgo, Fei Cheng and Sadao Kurohashi	pp. 4451‑4457
pdf	bib	poster	video	Entity Linking over Nested Named Entities for Russian Natalia Loukachevitch, Pavel Braslavski, Vladimir Ivanov, Tatiana Batura, Suresh Manandhar, Artem Shelmanov and Elena Tutubalina	pp. 4458‑4466
pdf	bib	poster	video	HiNER: A large Hindi Named Entity Recognition Dataset Rudra Murthy, Pallab Bhattacharjee, Rahul Sharnagat, Jyotsana Khatri, Diptesh Kanojia and Pushpak Bhattacharyya	pp. 4467‑4476
pdf	bib	poster	video	Bootstrapping Text Anonymization Models with Distant Supervision Anthi Papadopoulou, Pierre Lison, Lilja Øvrelid and Ildikó Pilán	pp. 4477‑4487
pdf	bib	poster	video	Natural Questions in Icelandic Vésteinn Snæbjarnarson and Hafsteinn Einarsson	pp. 4488‑4496
pdf	bib		video	QA4IE: A Quality Assurance Tool for Information Extraction Rafael Jimenez Silva, Kaushik Gedela, Alex Marr, Bart Desmet, Carolyn Rose and Chunxiao Zhou	pp. 4497‑4503
pdf	bib	poster	video	A New Dataset for Topic-Based Paragraph Classification in Genocide-Related Court Transcripts Miriam Schirmer, Udo Kruschwitz and Gregor Donabauer	pp. 4504‑4512
pdf	bib		video	DeepREF: A Framework for Optimized Deep Learning-based Relation Classification Igor Nascimento, Rinaldo Lima, Adrian-Gabriel Chifu, Bernard Espinasse and Sébastien Fournier	pp. 4513‑4522
pdf	bib		video	Exploring Data Augmentation Strategies for Hate Speech Detection in Roman Urdu Ubaid Azam, Hammad Rizwan and Asim Karim	pp. 4523‑4531
pdf	bib		video	Incorporating LIWC in Neural Networks to Improve Human Trait and Behavior Analysis in Low Resource Scenarios Isil Yakut Kilic and Shimei Pan	pp. 4532‑4539
pdf	bib	poster	video	Using Sentence-level Classification Helps Entity Extraction from Material Science Literature Ankan Mullick, Shubhraneel Pal, Tapas Nayak, Seung-Cheol Lee, Satadeep Bhattacharjee and Pawan Goyal	pp. 4540‑4545
pdf	bib	poster	video	A Twitter Corpus for Named Entity Recognition in Turkish Buse Çarık and Reyyan Yeniterzi	pp. 4546‑4551
pdf	bib		video	A STEP towards Interpretable Multi-Hop Reasoning: Bridge Phrase Identification and Query Expansion Fan Luo and Mihai Surdeanu	pp. 4552‑4560
pdf	bib		video	Question Generation and Answering for exploring Digital Humanities collections Frederic Bechet, Elie Antoine, Jérémy Auguste and Géraldine Damnati	pp. 4561‑4568
pdf	bib		video	Evaluating Retrieval for Multi-domain Scientific Publications Nancy Ide, Keith Suderman, Jingxuan Tu, Marc Verhagen, Shanan Peters, Ian Ross, John Lawson, Andrew Borg and James Pustejovsky	pp. 4569‑4576
pdf	bib	poster	video	Modeling Dutch Medical Texts for Detecting Functional Categories and Levels of COVID-19 Patients Jenia Kim, Stella Verkijk, Edwin Geleijn, Marieke van der Leeden, Carel Meskers, Caroline Meskers, Sabina van der Veen, Piek Vossen and Guy Widdershoven	pp. 4577‑4585
pdf	bib	poster	video	Hierarchical Aggregation of Dialectal Data for Arabic Dialect Identification Nurpeiis Baimukan, Houda Bouamor and Nizar Habash	pp. 4586‑4596
pdf	bib	poster	video	Investigating Active Learning Sampling Strategies for Extreme Multi Label Text Classification Lukas Wertz, Katsiaryna Mirylenka, Jonas Kuhn and Jasmina Bogojeska	pp. 4597‑4605
pdf	bib	poster	video	German Light Verb Constructions in Business Process Models Kristin Kutzner and Ralf Laue	pp. 4606‑4610
pdf	bib		video	PhysNLU: A Language Resource for Evaluating Natural Language Understanding and Explanation Coherence in Physics Jordan Meadows, Zili Zhou and André Freitas	pp. 4611‑4619
pdf	bib	poster	video	HECTOR: A Hybrid TExt SimplifiCation TOol for Raw Texts in French Amalia Todirascu, Rodrigo Wilkens, Eva Rolin, Thomas François, Delphine Bernhard and Núria Gala	pp. 4620‑4630
pdf	bib	poster	video	AiRO - an Interactive Learning Tool for Children at Risk of Dyslexia Peter Juel Henrichsen and Stine Fuglsang Engmose	pp. 4631‑4636
pdf	bib	poster	video	Creating a Basic Language Resource Kit for Faroese Annika Simonsen, Sandra Saxov Lamhauge, Iben Nyholm Debess and Peter Juel Henrichsen	pp. 4637‑4643
pdf	bib	poster	video	Developing a Spell and Grammar Checker for Icelandic using an Error Corpus Hulda Óladóttir, Þórunn Arnardóttir, Anton Ingason and Vilhjálmur Þorsteinsson	pp. 4644‑4653
pdf	bib	poster	video	The TalkMoves Dataset: K-12 Mathematics Lesson Transcripts Annotated for Teacher and Student Discursive Moves Abhijit Suresh, Jennifer Jacobs, Charis Harty, Margaret Perkoff, James H. Martin and Tamara Sumner	pp. 4654‑4662
pdf	bib	poster	video	Automating Idea Unit Segmentation and Alignment for Assessing Reading Comprehension via Summary Protocol Analysis Marcello Gecchele, Hiroaki Yamada, Takenobu Tokunaga, Yasuyo Sawaki and Mika Ishizuka	pp. 4663‑4673
pdf	bib	poster	video	IRAC: A Domain-Specific Annotated Corpus of Implicit Reasoning in Arguments Keshav Singh, Naoya Inoue, Farjana Sultana Mim, Shoichi Naito and Kentaro Inui	pp. 4674‑4683
pdf	bib	poster	video	Conversational Speech Recognition Needs Data? Experiments with Austrian German Julian Linke, Philip N. Garner, Gernot Kubin and Barbara Schuppler	pp. 4684‑4691
pdf	bib	poster	video	A Benchmark Corpus for the Detection of Automatically Generated Text in Academic Publications Vijini Liyanage, Davide Buscaldi and Adeline Nazarenko	pp. 4692‑4700
pdf	bib		video	Building a Dataset for Automatically Learning to Detect Questions Requiring Clarification Ivano Lauriola, Kevin Small and Alessandro Moschitti	pp. 4701‑4707
pdf	bib	poster	video	The ALPIN Sentiment Dictionary: Austrian Language Polarity in Newspapers Thomas Kolb, Katharina Sekanina, Bettina Manuela Johanna Kern, Julia Neidhardt, Tanja Wissik and Andreas Baumann	pp. 4708‑4716
pdf	bib		video	Text Classification and Prediction in the Legal Domain Minh-Quoc Nghiem, Paul Baylis, André Freitas and Sophia Ananiadou	pp. 4717‑4722
pdf	bib	poster	video	I still have Time(s): Extending HeidelTime for German Texts Andy Luecking, Manuel Stoeckel, Giuseppe Abrami and Alexander Mehler	pp. 4723‑4728
pdf	bib	poster	video	Morphological Complexity of Children Narratives in Eight Languages Gordana Hržica, Chaya Liebeskind, Kristina Š. Despot, Olga Dontcheva-Navratilova, Laura Kamandulytė-Merfeldienė, Sara Košutar, Matea Kramarić and Giedrė Valūnaitė Oleškevičienė	pp. 4729‑4738
pdf	bib		video	EXPRES Corpus for A Field-specific Automated Exploratory Study of L2 English Expert Scientific Writing Ana-Maria Bucur, Madalina Chitez, Valentina Muresan, Andreea Dinca and Roxana Rogobete	pp. 4739‑4746
pdf	bib	poster	video	An Evaluation Framework for Legal Document Summarization Ankan Mullick, Abhilash Nandy, Manav Kapadnis, Sohan Patnaik, Raghav R and Roshni Kar	pp. 4747‑4753
pdf	bib		video	Complex Labelling and Similarity Prediction in Legal Texts: Automatic Analysis of France’s Court of Cassation Rulings Thibault Charmet, Inès Cherichi, Matthieu Allain, Urszula Czerwinska, Amaury Fouret, Benoît Sagot and Rachel Bawden	pp. 4754‑4766
pdf	bib		video	Cyrillic-MNIST: a Cyrillic Version of the MNIST Dataset Bolat Tleubayev, Zhanel Zhexenova, Kenessary Koishybay and Anara Sandygulova	pp. 4767‑4773
pdf	bib		video	gaBERT — an Irish Language Model James Barry, Joachim Wagner, Lauren Cassidy, Alan Cowap, Teresa Lynn, Abigail Walsh, Mícheál J. Ó Meachair and Jennifer Foster	pp. 4774‑4788
pdf	bib		video	PoS Tagging, Lemmatization and Dependency Parsing of West Frisian Wilbert Heeringa, Gosse Bouma, Martha Hofman, Jelle Brouwer, Eduard Drenth, Jan Wijffels and Hans Van de Velde	pp. 4789‑4798
pdf	bib	poster	video	A Dataset of Offensive German Language Tweets Annotated for Speech Acts Melina Plakidis and Georg Rehm	pp. 4799‑4807
pdf	bib	poster	video	Tracing Syntactic Change in the Scientific Genre: Two Universal Dependency-parsed Diachronic Corpora of Scientific English and German Marie-Pauline Krielke, Luigi Talamo, Mahmoud Fawzi and Jörg Knappen	pp. 4808‑4816
pdf	bib	poster	video	The Tembusu Treebank: An English Learner Treebank Luís Morgado da Costa, Francis Bond and Roger V. P. Winder	pp. 4817‑4826
pdf	bib	poster	video	The Norwegian Dialect Corpus Treebank Andre Kåsen, Kristin Hagen, Anders Nøklestad, Joel Priestly, Per Erik Solberg and Dag Trygve Truslew Haug	pp. 4827‑4832
pdf	bib	poster	video	RRGparbank: A Parallel Role and Reference Grammar Treebank Tatiana Bladier, Kilian Evang, Valeria Generalova, Zahra Ghane, Laura Kallmeyer, Robin Möllemann, Natalia Moors, Rainer Osswald and Simon Petitjean	pp. 4833‑4841
pdf	bib	poster	video	Unifying Morphology Resources with OntoLex-Morph. A Case Study in German Christian Chiarcos, Christian Fäth and Maxim Ionov	pp. 4842‑4850
pdf	bib	slides	video	Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers Takuto Asakura, Yusuke Miyao and Akiko Aizawa	pp. 4851‑4858
pdf	bib	slides	video	CorefUD 1.0: Coreference Meets Universal Dependencies Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Amir Zeldes and Daniel Zeman	pp. 4859‑4872
pdf	bib	slides	video	The Universal Anaphora Scorer Juntao Yu, Sopan Khosla, Nafise Sadat Moosavi, Silviu Paun, Sameer Pradhan and Massimo Poesio	pp. 4873‑4883
pdf	bib	slides	video	Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes Anastasia Zhukova, Felix Hamborg and Bela Gipp	pp. 4884‑4893
pdf	bib	slides	video	Explainable Tsetlin Machine Framework for Fake News Detection with Credibility Score Assessment Bimal Bhattarai, Ole-Christoffer Granmo and Lei Jiao	pp. 4894‑4903
pdf	bib	slides	video	Enhancing Deep Learning with Embedded Features for Arabic Named Entity Recognition Ali L. Hatab, Caroline Sabty and Slim Abdennadher	pp. 4904‑4912
pdf	bib	slides	video	SCAI-QReCC Shared Task on Conversational Question Answering Svitlana Vakulenko, Johannes Kiesel and Maik Fröbe	pp. 4913‑4922
pdf	bib	slides	video	Semantic Relations between Text Segments for Semantic Storytelling: Annotation Tool - Dataset - Evaluation Michael Raring, Malte Ostendorff and Georg Rehm	pp. 4923‑4932
pdf	bib	slides	video	Evaluating Pre-training Objectives for Low-Resource Translation into Morphologically Rich Languages Prajit Dhar, Arianna Bisazza and Gertjan van Noord	pp. 4933‑4943
pdf	bib	slides	video	Aligning Images and Text with Semantic Role Labels for Fine-Grained Cross-Modal Understanding Abhidip Bhattacharyya, Cecilia Mauceri, Martha Palmer and Christoffer Heckman	pp. 4944‑4954
pdf	bib	slides	video	Rosetta-LSF: an Aligned Corpus of French Sign Language and French for Text-to-Sign Translation Elise Bertin-Lemée, Annelies Braffort, Camille Challant, Claire Danet, Boris Dauriac, Michael Filhol, Emmanuella Martinod and Jérémie Segouat	pp. 4955‑4962
pdf	bib	slides	video	MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset Marina Fomicheva, Shuo Sun, Erick Fonseca, Chrysoula Zerva, Frédéric Blain, Vishrav Chaudhary, Francisco Guzmán, Nina Lopatina, Lucia Specia and André F. T. Martins	pp. 4963‑4974
pdf	bib		video	OpenKorPOS: Democratizing Korean Tokenization with Voting-Based Open Corpus Annotation Sangwhan Moon, Won Ik Cho, Hye Joo Han, Naoaki Okazaki and Nam Soo Kim	pp. 4975‑4983
pdf	bib	poster	video	Enriching Grammatical Error Correction Resources for Modern Greek Katerina Korre and John Pavlopoulos	pp. 4984‑4991
pdf	bib	poster	video	A Hmong Corpus with Elaborate Expression Annotations David R. Mortensen, Xinyu Zhang, Chenxuan Cui and Katherine Zhang	pp. 4992‑5000
pdf	bib		video	ELAL: An Emotion Lexicon for the Analysis of Alsatian Theatre Plays Delphine Bernhard and Pablo Ruiz Fabo	pp. 5001‑5010
pdf	bib	poster	video	Universal Dependencies for Western Sierra Puebla Nahuatl Robert Pugh, Marivel Huerta Mendez, Mitsuya Sasaki and Francis Tyers	pp. 5011‑5020
pdf	bib	poster	video	The Construction and Evaluation of the LEAFTOP Dataset of Automatically Extracted Nouns in 1480 Languages Gregory Baker and Diego Molla	pp. 5021‑5028
pdf	bib	poster	video	Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition Rodolfo Zevallos, Luis Camacho and Nelsi Melgarejo	pp. 5029‑5034
pdf	bib		video	Writing System and Speaker Metadata for 2,800+ Language Varieties Daan van Esch, Tamar Lucassen, Sebastian Ruder, Isaac Caswell and Clara Rivera	pp. 5035‑5046
pdf	bib		video	The PALMA Corpora of African Varieties of Portuguese Tjerk Hagemeijer, Amália Mendes, Rita Gonçalves, Catarina Cornejo, Raquel Madureira and Michel Généreux	pp. 5047‑5053
pdf	bib		video	A Learning-Based Dependency to Constituency Conversion Algorithm for the Turkish Language Büşra Marşan, Oğuz K. Yıldız, Aslı Kuzgun, Neslihan Cesur, Arife B. Yenice, Ezgi Sanıyar, Oğuzhan Kuyrukçu, Bilge N. Arıcan and Olcay Taner Yıldız	pp. 5054‑5062
pdf	bib		video	Standard German Subtitling of Swiss German TV content: the PASSAGE Project Jonathan David Mutal, Pierrette Bouillon, Johanna Gerlach and Veronika Haberkorn	pp. 5063‑5070
pdf	bib	poster	video	A Survey of Multilingual Models for Automatic Speech Recognition Hemant Yadav and Sunayana Sitaram	pp. 5071‑5079
pdf	bib	poster	video	LuxemBERT: Simple and Practical Data Augmentation in Language Model Pre-Training for Luxembourgish Cedric Lothritz, Bertrand Lebichot, Kevin Allix, Lisa Veiber, TEGAWENDE BISSYANDE, Jacques Klein, Andrey Boytsov, Clément Lefebvre and Anne Goujon	pp. 5080‑5089
pdf	bib	poster	video	PerPaDa: A Persian Paraphrase Dataset based on Implicit Crowdsourcing Data Collection Salar Mohtaj, Fatemeh Tavakkoli and Habibollah Asghari	pp. 5090‑5096
pdf	bib	poster	video	Introducing the Welsh Text Summarisation Dataset and Baseline Systems Ignatius Ezeani, Mahmoud El-Haj, Jonathan Morris and Dawn Knight	pp. 5097‑5106
pdf	bib	poster	video	A Systematic Approach to Derive a Refined Speech Corpus for Sinhala Disura Warusawithana, Nilmani Kulaweera, Lakshan Weerasinghe and Buddhika Karunarathne	pp. 5107‑5113
pdf	bib	poster	video	IgboBERT Models: Building and Training Transformer Models for the Igbo Language Chiamaka Chukwuneke, Ignatius Ezeani, Paul Rayson and Mahmoud El-Haj	pp. 5114‑5122
pdf	bib	poster	video	Latvian National Corpora Collection – Korpuss.lv Baiba Saulite, Roberts Darģis, Normunds Gruzitis, Ilze Auzina, Kristīne Levāne-Petrova, Lauma Pretkalniņa, Laura Rituma, Peteris Paikens, Arturs Znotins, Laine Strankale, Kristīne Pokratniece, Ilmārs Poikāns, Guntis Barzdins, Inguna Skadiņa, Anda Baklāne, Valdis Saulespurēns and Jānis Ziediņš	pp. 5123‑5129
pdf	bib	poster	video	Investigating the Relationship Between Romanian Financial News and Closing Prices from the Bucharest Stock Exchange Ioan-Bogdan Iordache, Ana Sabina Uban, Catalin Stoean and Liviu P. Dinu	pp. 5130‑5136
pdf	bib	poster	video	A Free/Open-Source Morphological Analyser and Generator for Sakha Sardana Ivanova, Jonathan Washington and Francis Tyers	pp. 5137‑5142
pdf	bib	poster	video	An Expanded Finite-State Transducer for Tsuut’ina Verbs Joshua Holden, Christopher Cox and Antti Arppe	pp. 5143‑5152
pdf	bib		video	BD-SHS: A Benchmark Dataset for Learning to Detect Online Bangla Hate Speech in Different Social Contexts Nauros Romim, Mosahed Ahmed, Md Saiful Islam, Arnab Sen Sharma, Hriteshwar Talukder and Mohammad Ruhul Amin	pp. 5153‑5162
pdf	bib		video	Introducing RezoJDM16k: a French KnowledgeGraph DataSet for Link Prediction Mehdi Mirzapour, Waleed Ragheb, Mohammad Javad Saeedizade, Kevin Cousot, Helene Jacquenet, Lawrence Carbon and Mathieu Lafourcade	pp. 5163‑5169
pdf	bib	poster	video	The Badalona Corpus - An Audio, Video and Neuro-Physiological Conversational Dataset Philippe Blache, Salomé Antoine, Dorina De Jong, Lena-Marie Huttner, Emilia Kerr, Thierry Legou, Eliot Maës and Clément François	pp. 5170‑5177
pdf	bib	poster	video	Reading Time and Vocabulary Rating in the Japanese Language: Large-Scale Japanese Reading Time Data Collection Using Crowdsourcing Masayuki Asahara	pp. 5178‑5187
pdf	bib	poster	video	Thematic Fit Bits: Annotation Quality and Quantity Interplay for Event Participant Representation Yuval Marton and Asad Sayeed	pp. 5188‑5197
pdf	bib		video	ChiSense-12: An English Sense-Annotated Child-Directed Speech Corpus Francesco Cabiddu, Lewis Bott, Gary Jones and Chiara Gambi	pp. 5198‑5205
pdf	bib		video	Making People Laugh like a Pro: Analysing Humor Through Stand-Up Comedy Beatrice Turano and Carlo Strapparava	pp. 5206‑5211
pdf	bib	poster	video	Testing Focus and Non-at-issue Frameworks with a Question-under-Discussion-Annotated Corpus Christoph Hesse, Maurice Langner, Ralf Klabunde and Anton Benz	pp. 5212‑5219
pdf	bib	poster	video	Development of a Multilingual CCG Treebank via Universal Dependencies Conversion Tu-Anh Tran and Yusuke Miyao	pp. 5220‑5233
pdf	bib		video	The Automatic Extraction of Linguistic Biomarkers as a Viable Solution for the Early Diagnosis of Mental Disorders Gloria Gagliardi and Fabio Tamburini	pp. 5234‑5242
pdf	bib	poster	video	Singlish Where Got Rules One? Constructing a Computational Grammar for Singlish Siew Yeng Chow and Francis Bond	pp. 5243‑5250
pdf	bib			COSMOS: Experimental and Comparative Studies of Concept Representations in Schoolchildren Jeanne Villaneau and Farida SAID	pp. 5251‑5260
pdf	bib		video	Features of Perceived Metaphoricity on the Discourse Level: Abstractness and Emotionality Prisca Piccirilli and Sabine Schulte im Walde	pp. 5261‑5273
pdf	bib	poster	video	Hollywood Identity Bias Dataset: A Context Oriented Bias Analysis of Movie Dialogues Sandhya Singh, Prapti Roy, Nihar Sahoo, Niteesh Mallela, Himanshu Gupta, Pushpak Bhattacharyya, Milind Savagaonkar, Nidhi Sultan, Roshni Ramnani, Anutosh Maitra and Shubhashis Sengupta	pp. 5274‑5285
pdf	bib		video	VoxCommunis: A Corpus for Cross-linguistic Phonetic Analysis Emily Ahn and Eleanor Chodroff	pp. 5286‑5294
pdf	bib		video	Tracking Textual Similarities in Neo-Latin Drama Networks Andrea Peverelli, Marieke van Erp and Jan Bloemendal	pp. 5295‑5303
pdf	bib	poster	video	Named Entity Recognition in Estonian 19th Century Parish Court Records Siim Orasmaa, Kadri Muischnek, Kristjan Poska and Anna Edela	pp. 5304‑5313
pdf	bib		video	Investigating Independence vs. Control: Agenda-Setting in Russian News Coverage on Social Media Annerose Eichel, Gabriella Lapesa and Sabine Schulte im Walde	pp. 5314‑5323
pdf	bib	poster	video	SLäNDa version 2.0: Improved and Extended Annotation of Narrative and Dialogue in Swedish Literature Sara Stymne and Carin Östman	pp. 5324‑5333
pdf	bib	poster	video	AGILe: The First Lemmatizer for Ancient Greek Inscriptions Evelien de Graaf, Silvia Stopponi, Jasper K. Bos, Saskia Peels-Matthey and Malvina Nissim	pp. 5334‑5344
pdf	bib	poster	video	»textklang« – Towards a Multi-Modal Exploration Platform for German Poetry Nadja Schauffler, Toni Bernhart, Andre Blessing, Gunilla Eschenbach, Markus Gärtner, Kerstin Jung, Anna Kinder, Julia Koch, Sandra Richter, Gabriel Viehhauser, Ngoc Thang Vu, Lorenz Wesemann and Jonas Kuhn	pp. 5345‑5355
pdf	bib		video	Predicting the Proficiency Level of Nonnative Hebrew Authors Isabelle Nguyen and Shuly Wintner	pp. 5356‑5365
pdf	bib		video	Trends, Limitations and Open Challenges in Automatic Readability Assessment Research Sowmya Vajjala	pp. 5366‑5377
pdf	bib		video	HateCheckHIn: Evaluating Hindi Hate Speech Detection Models Mithun Das, Punyajoy Saha, Binny Mathew and Animesh Mukherjee	pp. 5378‑5387
pdf	bib	poster	video	Surfer100: Generating Surveys From Web Resources, Wikipedia-style Irene Li, Alex Fabbri, Rina Kawamura, Yixin Liu, Xiangru Tang, Jaesung tae, Chang Shen, Sally Ma, Tomoe Mizutani and Dragomir Radev	pp. 5388‑5392
pdf	bib		video	MS-LaTTE: A Dataset of Where and When To-do Tasks are Completed Sujay Kumar Jauhar, Nirupama Chandrasekaran, Michael Gamon and Ryen White	pp. 5393‑5403
pdf	bib		video	KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics Saida Mussakhojayeva, Yerbolat Khassanov and Huseyin Atakan Varol	pp. 5404‑5411
pdf	bib	poster	video	A Graph-Based Method for Unsupervised Knowledge Discovery from Financial Texts Joel Oksanen, Abhilash Majumder, Kumar Saunack, Francesca Toni and Arun Dhondiyal	pp. 5412‑5417
pdf	bib	poster	video	Leveraging Mental Health Forums for User-level Depression Detection on Social Media Sravani Boinepelli, Tathagata Raha, Harika Abburi, Pulkit Parikh, Niyati Chhaya and Vasudeva Varma	pp. 5418‑5427
pdf	bib	poster	video	Classifying Implant-Bearing Patients via their Medical Histories: a Pre-Study on Swedish EMRs with Semi-Supervised GanBERT Benjamin Danielsson, Marina Santini, Peter Lundberg, Yosef Al-Abasse, Arne Jonsson, Emma Eneling and Magnus Stridsman	pp. 5428‑5435
pdf	bib	poster	video	Standardisation of Dialect Comments in Social Networks in View of Sentiment Analysis: Case of Tunisian Dialect Saméh Kchaou, rahma boujelbane, Emna Fsih and Lamia Hadrich-Belguith	pp. 5436‑5443
pdf	bib		video	EnsyNet: A Dataset for Encouragement and Sympathy Detection Tiberiu Sosea and Cornelia Caragea	pp. 5444‑5449
pdf	bib	poster	video	Preliminary Results on the Evaluation of Computational Tools for the Analysis of Quechua and Aymara Marcelo Yuji Himoro and Antonio Pareja-Lora	pp. 5450‑5459
pdf	bib	slides	video	A Tale of Two Regulatory Regimes: Creation and Analysis of a Bilingual Privacy Policy Corpus Siddhant Arora, Henry Hosseini, Christine Utz, Vinayshekhar Bannihatti Kumar, Tristan Dhellemmes, Abhilasha Ravichander, Peter Story, Jasmine Mangat, Rex Chen, Martin Degeling, Thomas Norton, Thomas Hupperich, Shomir Wilson and Norman Sadeh	pp. 5460‑5472
pdf	bib	poster	video	MeSHup: Corpus for Full Text Biomedical Document Indexing Xindi Wang, Robert E. Mercer and Frank Rudzicz	pp. 5473‑5483
pdf	bib		video	Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding Yanjun Gao, Dmitriy Dligach, Timothy Miller, Samuel Tesch, Ryan Laffin, Matthew M. Churpek and Majid Afshar	pp. 5484‑5493
pdf	bib	poster	video	KC4MT: A High-Quality Corpus for Multilingual Machine Translation Vinh Van Nguyen, Ha Nguyen, Huong Thanh Le, Thai Phuong Nguyen, Tan Van Bui, Luan Nghia Pham, Anh Tuan Phan, Cong Hoang-Minh Nguyen, Viet Hong Tran and Anh Huu Tran	pp. 5494‑5502
pdf	bib		video	Developing A Multilabel Corpus for the Quality Assessment of Online Political Talk Kokil Jaidka	pp. 5503‑5510
pdf	bib	poster	video	BILinMID: A Spanish-English Corpus of the US Midwest Irati Hurtado	pp. 5511‑5516
pdf	bib	poster	video	One Document, Many Revisions: A Dataset for Classification and Description of Edit Intents Dheeraj Rajagopal, Xuchao Zhang, Michael Gamon, Sujay Kumar Jauhar, Diyi Yang and Eduard Hovy	pp. 5517‑5524
pdf	bib	poster	video	CTAP for Chinese: A Linguistic Complexity Feature Automatic Calculation Platform Yue Cui, Junhui Zhu, Liner Yang, Xuezhi Fang, Xiaobin Chen, Yujie Wang and Erhong Yang	pp. 5525‑5538
pdf	bib	poster	video	A Corpus for Suggestion Mining of German Peer Feedback Roman Rietsche, Eva Ritz, Julius Janda and Dominik Pfütze	pp. 5539‑5547
pdf	bib	poster	video	CLGC: A Corpus for Chinese Literary Grace Evaluation Yi Li, Dong Yu and pengyuan liu	pp. 5548‑5556
pdf	bib		video	Anonymising the SAGT Speech Corpus and Treebank Özlem Çetinoğlu and Antje Schweitzer	pp. 5557‑5564
pdf	bib	poster	video	Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction Daisuke Suzuki, Yujin Takahashi, Ikumi Yamashita, Taichi Aida, Tosho Hirasawa, Michitaka Nakatsuji, Masato Mita and Mamoru Komachi	pp. 5565‑5572
pdf	bib	poster	video	Enhanced Distant Supervision with State-Change Information for Relation Extraction Jui Shah, Dongxu Zhang, Sam Brody and Andrew McCallum	pp. 5573‑5579
pdf	bib	poster	video	The Hebrew Essay Corpus Chen Gafni, Anat Prior and Shuly Wintner	pp. 5580‑5586
pdf	bib	poster	video	Design and Evaluation of the Corpus of Everyday Japanese Conversation Hanae Koiso, Haruka Amatani, Yasuharu Den, Yuriko Iseki, Yuichi Ishimoto, Wakako Kashino, Yoshiko Kawabata, Ken’ya Nishikawa, Yayoi Tanaka, Yasuyuki Usuda and Yuka Watanabe	pp. 5587‑5594
pdf	bib		video	Developing Language Resources and NLP Tools for the North Korean Language Arda Akdemir, Yeojoo Jeon and Tetsuo Shibuya	pp. 5595‑5600
pdf	bib	poster	video	Developing a Dataset of Overridden Information in Wikipedia Masatoshi Tsuchiya and Yasutaka Yokoi	pp. 5601‑5608
pdf	bib	poster	video	BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language Bernardo Consoli, Henrique D. P. dos Santos, Ana Helena D. P. S. Ulbrich, Renata Vieira and Rafael H. Bordini	pp. 5609‑5616
pdf	bib	poster	video	Universal Grammatical Dependencies for Portuguese with CINTIL Data, LX Processing and CLARIN support António Branco, João Ricardo Silva, Luís Gomes and João António Rodrigues	pp. 5617‑5626
pdf	bib	poster	video	CWID-hi: A Dataset for Complex Word Identification in Hindi Text Gayatri Venugopal, Dhanya Pramod and Ravi Shekhar	pp. 5627‑5636
pdf	bib	poster	video	Automatic Classification of Russian Learner Errors Alla Rozovskaya	pp. 5637‑5647
pdf	bib	poster	video	Annotation of metaphorical expressions in the Basic Corpus of Polish Metaphors Elżbieta Hajnicz	pp. 5648‑5653
pdf	bib		video	ChiMST: A Chinese Medical Corpus for Word Segmentation and Medical Term Recognition Yuanhe Tian, Han Qin, Fei Xia and Yan Song	pp. 5654‑5664
pdf	bib	poster	video	Building a Synthetic Biomedical Research Article Citation Linkage Corpus Sudipta Singha Roy and Robert E. Mercer	pp. 5665‑5672
pdf	bib	poster	video	Dataset Construction for Scientific-Document Writing Support by Extracting Related Work Section and Citations from PDF Papers Keita Kobayashi, Kohei Koyama, Hiromi Narimatsu and Yasuhiro Minami	pp. 5673‑5682
pdf	bib	poster	video	RuPAWS: A Russian Adversarial Dataset for Paraphrase Identification Nikita Martynov, Irina Krotova, Varvara Logacheva, Alexander Panchenko, Olga Kozlova and Nikita Semenov	pp. 5683‑5691
pdf	bib		video	Atril: an XML Visualization System for Corpus Texts Andressa Rodrigues Gomide, Conceição Carapinha and Cornelia Plag	pp. 5692‑5695
pdf	bib	poster	video	MASALA: Modelling and Analysing the Semantics of Adpositions in Linguistic Annotation of Hindi Aryaman Arora, Nitin Venkateswaran and Nathan Schneider	pp. 5696‑5704
pdf	bib			Universal Dependencies for Punjabi Aryaman Arora	pp. 5705‑5711
pdf	bib		video	TeSum: Human-Generated Abstractive Summarization Corpus for Telugu Ashok Urlana, Nirmal Surange, Pavan Baswani, Priyanka Ravva and Manish Shrivastava	pp. 5712‑5722
pdf	bib		video	A Corpus of Simulated Counselling Sessions with Dialog Act Annotation John Lee, Haley Fong, Lai Shuen Judy Wong, Chun Chung Mak, Chi Hin Yip and Ching Wah Larry Ng	pp. 5723‑5730
pdf	bib	poster	video	Interactive Evaluation of Dialog Track at DSTC9 Shikib Mehri, Yulan Feng, Carla Gordon, Seyed Hossein Alavi, David Traum and Maxine Eskenazi	pp. 5731‑5738
pdf	bib		video	HADREB: Human Appraisals and (English) Descriptions of Robot Emotional Behaviors Josue Torres-Fonsesca and Casey Kennington	pp. 5739‑5748
pdf	bib		video	Dialogue Collection for Recording the Process of Building Common Ground in a Collaborative Task Koh Mitsuda, Ryuichiro Higashinaka, Yuhei Oga and Sen Yoshida	pp. 5749‑5758
pdf	bib	poster	video	Collection and Analysis of Travel Agency Task Dialogues with Age-Diverse Speakers Michimasa Inaba, Yuya Chiba, Ryuichiro Higashinaka, Kazunori Komatani, Yusuke Miyao and Takayuki Nagai	pp. 5759‑5767
pdf	bib	poster	video	Strategy-level Entrainment of Dialogue System Users in a Creative Visual Reference Resolution Task Deepthi Karkada, Ramesh Manuvinakurike, Maike Paetzel-Prüsmann and Kallirroi Georgila	pp. 5768‑5777
pdf	bib	poster	video	MMChat: Multi-Modal Chat Dataset on Social Media Yinhe Zheng, Guanyi Chen, Xin Liu and Jian Sun	pp. 5778‑5786
pdf	bib	poster	video	E-ConvRec: A Large-Scale Conversational Recommendation Dataset for E-Commerce Customer Service meihuizi jia, Ruixue Liu, Peiying Wang, Yang Song, Zexi Xi, Haobin Li, Xin Shen, Meng Chen, Jinhui Pang and Xiaodong He	pp. 5787‑5796
pdf	bib	poster	video	SHONGLAP: A Large Bengali Open-Domain Dialogue Corpus Syed Mostofa Monsur, Sakib Chowdhury, Md Shahrar Fatemi and Shafayat Ahmed	pp. 5797‑5804
pdf	bib	poster	video	A Comparison of Praising Skills in Face-to-Face and Remote Dialogues Toshiki Onishi, Asahi Ogushi, Yohei Tahara, Ryo Ishii, Atsushi Fukayama, Takao Nakamura and Akihiro Miyata	pp. 5805‑5812
pdf	bib		video	Comparing Approaches to Language Understanding for Human-Robot Dialogue: An Error Taxonomy and Analysis Ada Tur and David Traum	pp. 5813‑5820
pdf	bib		video	SPORTSINTERVIEW: A Large-Scale Sports Interview Benchmark for Entity-centric Dialogues Hanfei Sun, Ziyuan Cao and Diyi Yang	pp. 5821‑5828
pdf	bib	poster	video	EmoInHindi: A Multi-label Emotion and Intensity Annotated Dataset in Hindi for Emotion Recognition in Dialogues Gopendra Vikram Singh, Priyanshu Priya, Mauajama Firdaus, Asif Ekbal and Pushpak Bhattacharyya	pp. 5829‑5837
pdf	bib		video	The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts Krishnapriya Vishnubhotla, Adam Hammond and Graeme Hirst	pp. 5838‑5848
pdf	bib		video	Who’s in, who’s out? Predicting the Inclusiveness or Exclusiveness of Personal Pronouns in Parliamentary Debates Ines Rehbein and Josef Ruppenhofer	pp. 5849‑5858
pdf	bib	poster	video	A Language Modelling Approach to Quality Assessment of OCR’ed Historical Text Callum Booth, Robert Shoemaker and Robert Gaizauskas	pp. 5859‑5864
pdf	bib	poster	video	Identifying Copied Fragments in a 18th Century Dutch Chronicle Roser Morante, Eleanor L. T. Smith, Lianne Wilhelmus, Alie Lassche and Erika Kuijpers	pp. 5865‑5878
pdf	bib	poster	video	A Study of Distant Viewing of ukiyo-e prints Konstantina Liagkou, John Pavlopoulos and Ewa Machotka	pp. 5879‑5888
pdf	bib		video	CCTAA: A Reproducible Corpus for Chinese Authorship Attribution Research Haining Wang and Allen Riddell	pp. 5889‑5893
pdf	bib	poster	video	An automatic model and Gold Standard for translation alignment of Ancient Greek Tariq Yousef, Chiara Palladino, Farnoosh Shamsian, Anise d’Orange Ferreira and Michel Ferreira dos Reis	pp. 5894‑5905
pdf	bib		video	Rhetorical Structure Approach for Online Deception Detection: A Survey Francielle Vargas, Jonas D’Alessandro, Zohar Rabinovich, Fabrício Benevenuto and Thiago Pardo	pp. 5906‑5915
pdf	bib	poster	video	TYPIC: A Corpus of Template-Based Diagnostic Comments on Argumentation Shoichi Naito, Shintaro Sawada, Chihiro Nakagawa, Naoya Inoue, Kenshi Yamaguchi, Iori Shimizu, Farjana Sultana Mim, Keshav Singh and Kentaro Inui	pp. 5916‑5928
pdf	bib		video	Towards Speaker Verification for Crowdsourced Speech Collections John Mendonca, Rui Correia, Mariana Lourenço, João Freitas and Isabel Trancoso	pp. 5929‑5937
pdf	bib	poster	video	Align-smatch: A Novel Evaluation Method for Chinese Abstract Meaning Representation Parsing based on Alignment of Concept and Relation Liming Xiao, Bin Li, Zhixing Xu, Kairui Huo, Minxuan Feng, Junsheng Zhou and Weiguang Qu	pp. 5938‑5945
pdf	bib	poster	video	Dynamic Human Evaluation for Relative Model Comparisons Thórhildur Thorleiksdóttir, Cedric Renggli, Nora Hollenstein and Ce Zhang	pp. 5946‑5955
pdf	bib		video	Please, Don’t Forget the Difference and the Confidence Interval when Seeking for the State-of-the-Art Status Yves Bestgen	pp. 5956‑5962
pdf	bib		video	PCR4ALL: A Comprehensive Evaluation Benchmark for Pronoun Coreference Resolution in English Xinran Zhao, Hongming Zhang and Yangqiu Song	pp. 5963‑5973
pdf	bib	poster	video	Estimating Confidence of Predictions of Individual Classifiers and Their Ensembles for the Genre Classification Task Mikhail Lepekhin and Serge Sharoff	pp. 5974‑5982
pdf	bib	poster	video	What do we really know about State of the Art NER? Sowmya Vajjala and Ramya Balasubramaniam	pp. 5983‑5993
pdf	bib	poster	video	ProQE: Proficiency-wise Quality Estimation dataset for Grammatical Error Correction Yujin Takahashi, Masahiro Kaneko, Masato Mita and Mamoru Komachi	pp. 5994‑6000
pdf	bib	poster	video	Evaluation of Off-the-shelf Speech Recognizers on Different Accents in a Dialogue Domain Divya Tadimeti, Kallirroi Georgila and David Traum	pp. 6001‑6008
pdf	bib		video	Sentence Pair Embeddings Based Evaluation Metric for Abstractive and Extractive Summarization Ramya Akula and Ivan Garibay	pp. 6009‑6017
pdf	bib		video	On “Human Parity” and “Super Human Performance” in Machine Translation Evaluation Thierry Poibeau	pp. 6018‑6023
pdf	bib	poster	video	Evaluation Benchmarks for Spanish Sentence Representations Vladimir Araujo, Andrés Carvallo, Souvik Kundu, José Cañete, Marcelo Mendoza, Robert E. Mercer, Felipe Bravo-Marquez, Marie-Francine Moens and Alvaro Soto	pp. 6024‑6034
pdf	bib		video	UMUTextStats: A linguistic feature extraction tool for Spanish José Antonio García-Díaz, Pedro José Vivancos-Vicente, Ángela Almela and Rafael Valencia-García	pp. 6035‑6044
pdf	bib		video	Problem-solving Recognition in Scientific Text Kevin Heffernan and Simone Teufel	pp. 6045‑6058
pdf	bib	poster	video	HRCA+: Advanced Multiple-choice Machine Reading Comprehension Method YUXIANG ZHANG and Hayato Yamana	pp. 6059‑6068
pdf	bib		video	HyperBox: A Supervised Approach for Hypernym Discovery using Box Embeddings Maulik Parmar and Apurva Narayan	pp. 6069‑6076
pdf	bib	poster	video	Extracting Space Situational Awareness Events from News Text Zhengnan Xie, Alice Saebom Kwak, Enfa George, Laura W. Dozal, Hoang Van, Moriba Jah, Roberto Furfaro and Peter Jansen	pp. 6077‑6082
pdf	bib		video	PerCQA: Persian Community Question Answering Dataset Naghme Jamali, Yadollah Yaghoobzadeh and Heshaam Faili	pp. 6083‑6092
pdf	bib		video	GrASP: A Library for Extracting and Exploring Human-Interpretable Textual Patterns Piyawat Lertvittayakumjorn, Leshem Choshen, Eyal Shnarch and Francesca Toni	pp. 6093‑6103
pdf	bib		video	Recurrent Neural Networks with Mixed Hierarchical Structures and EM Algorithm for Natural Language Processing zhaoxin luo and Michael Zhu	pp. 6104‑6113
pdf	bib	poster	video	Korean-Specific Dataset for Table Question Answering Changwook Jun, Jooyoung Choi, Myoseop Sim, Hyun Kim, Hansol Jang and Kyungkoo Min	pp. 6114‑6120
pdf	bib	poster	video	GerCCT: An Annotated Corpus for Mining Arguments in German Tweets on Climate Change Robin Schaefer and Manfred Stede	pp. 6121‑6130
pdf	bib	poster	video	Budget Argument Mining Dataset Using Japanese Minutes from the National Diet and Local Assemblies Yasutomo Kimura, Hokuto Ototake and Minoru Sasaki	pp. 6131‑6138
pdf	bib	poster	video	Context-based Virtual Adversarial Training for Text Classification with Noisy Labels Do-Myoung Lee, Yeachan Kim and Chang gyun Seo	pp. 6139‑6146
pdf	bib	poster	video	FinMath: Injecting a Tree-structured Solver for Question Answering over Financial Reports Chenying Li, Wenbo Ye and Yilun Zhao	pp. 6147‑6152
pdf	bib	poster	video	HeadlineCause: A Dataset of News Headlines for Detecting Causalities Ilya Gusev and Alexey Tikhonov	pp. 6153‑6161
pdf	bib	poster	video	Incorporating Zoning Information into Argument Mining from Biomedical Literature Boyang Liu, Viktor Schlegel, Riza Batista-Navarro and Sophia Ananiadou	pp. 6162‑6169
pdf	bib		video	MAKED: Multi-lingual Automatic Keyword Extraction Dataset Yash Verma, Anubhav Jangra, Sriparna Saha, Adam Jatowt and Dwaipayan Roy	pp. 6170‑6179
pdf	bib	poster	video	From Examples to Rules: Neural Guided Rule Synthesis for Information Extraction Robert Vacareanu, Marco A. Valenzuela-Escárcega, George Caique Gouveia Barbosa, Rebecca Sharp, Gustave Hahn-Powell and Mihai Surdeanu	pp. 6180‑6189
pdf	bib		video	Enhancing Relation Extraction via Adversarial Multi-task Learning Han Qin, Yuanhe Tian and Yan Song	pp. 6190‑6199
pdf	bib	poster	video	Query Obfuscation by Semantic Decomposition Danushka Bollegala, Tomoya Machide and Ken-ichi Kawarabayashi	pp. 6200‑6211
pdf	bib		video	TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks Ruofan Hu, Dongyu Zhang, Dandan Tao, Thomas Hartvigsen, Hao Feng and Elke Rundensteiner	pp. 6212‑6222
pdf	bib		video	Named Entity Recognition to Detect Criminal Texts on the Web Paweł Skórzewski, Mikołaj Pieniowski and Grazyna Demenko	pp. 6223‑6231
pdf	bib		video	Task-Driven and Experience-Based Question Answering Corpus for In-Home Robot Application in the House3D Virtual Environment zhuoqun Xu, Liubo Ouyang and Yang Liu	pp. 6232‑6239
pdf	bib	poster	video	ELRC Action: Covering Confidentiality, Correctness and Cross-linguality Tom Vanallemeersch, Arne Defauw, Sara Szoc, Alina Kramchaninova, Joachim Van den Bogaert and Andrea Lösch	pp. 6240‑6249
pdf	bib	poster	video	RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports Sarvesh Soni, Meghana Gudala, Atieh Pajouhi and Kirk Roberts	pp. 6250‑6259
pdf	bib	poster	video	Knowledge Graph - Deep Learning: A Case Study in Question Answering in Aviation Safety Domain Ankush Agarwal, Raj Gite, Shreya Laddha, Pushpak Bhattacharyya, Satyanarayan Kar, Asif Ekbal, Prabhjit Thind, Rajesh Zele and Ravi Shankar	pp. 6260‑6270
pdf	bib	poster	video	A Bayesian Topic Model for Human-Evaluated Interpretability Justin Wood, Corey Arnold and Wei Wang	pp. 6271‑6279
pdf	bib		video	A Large Interlinked Knowledge Graph of the Italian Cultural Heritage Stefano Faralli, Andrea Lenzi and Paola Velardi	pp. 6280‑6289
pdf	bib	poster	video	Training on Lexical Resources Kenneth Church, Xingyu Cai and Yuchen Bian	pp. 6290‑6299
pdf	bib	poster	video	Challenging the Assumption of Structure-based embeddings in Few- and Zero-shot Knowledge Graph Completion Filip Cornell, Chenda zhang, Jussi Karlgren and Sarunas Girdzijauskas	pp. 6300‑6309
pdf	bib		video	Open Terminology Management and Sharing Toolkit for Federation of Terminology Databases Andis Lagzdiņš, Uldis Siliņš, Toms Bergmanis, Mārcis Pinnis, Artūrs Vasiļevskis and Andrejs Vasiļjevs	pp. 6310‑6316
pdf	bib	poster	video	RELATE: Generating a linguistically inspired Knowledge Graph for fine-grained emotion classification Annika Marie Schoene, Nina Dethlefs and Sophia Ananiadou	pp. 6317‑6327
pdf	bib		video	Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR Nina Markl and Stephen Joseph McNulty	pp. 6328‑6339
pdf	bib	poster	video	Masader: Metadata Sourcing for Arabic Text and Speech Data Resources Zaid Alyafeai, Maraim Masoud, Mustafa Ghaleb and Maged S. Al-shaibani	pp. 6340‑6351
pdf	bib	poster	video	Linghub2: Language Resource Discovery Tool for Language Technologies Cécile Robin, Gautham Vadakkekara Suresh, Víctor Rodriguez-Doncel, John P. McCrae and Paul Buitelaar	pp. 6352‑6360
pdf	bib		video	CxLM: A Construction and Context-aware Language Model Yu-Hsiang Tseng, Cing-Fang Shih, Pin-Er Chen, Hsin-Yu Chou, Mao-Chang Ku and Shu-Kai HSIEH	pp. 6361‑6369
pdf	bib	poster	video	The Lexometer: A Shiny Application for Exploratory Analysis and Visualization of Corpus Data Oufan Hai, Matthew Sundberg, Katherine Trice, Rebecca Friedman and Scott Grimm	pp. 6370‑6376
pdf	bib	poster	video	TallVocabL2Fi: A Tall Dataset of 15 Finnish L2 Learners’ Vocabulary Frankie Robertson, Li-Hsin Chang and Sini Söyrinki	pp. 6377‑6386
pdf	bib	poster	video	CAMS: An Annotated Corpus for Causal Analysis of Mental Health Issues in Social Media Posts Muskan Garg, Chandni Saxena, Sriparna Saha, Veena Krishnan, Ruchi Joshi and Vijay Mago	pp. 6387‑6396
pdf	bib		video	How Does the Experimental Setting Affect the Conclusions of Neural Encoding Models? Xiaohan Zhang, Shaonan Wang and Chengqing Zong	pp. 6397‑6404
pdf	bib	poster	video	SPADE: A Big Five-Mturk Dataset of Argumentative Speech Enriched with Socio-Demographics for Personality Detection Elma Kerz, Yu Qiao, Sourabh Zanwar and Daniel Wiechmann	pp. 6405‑6419
pdf	bib		video	Progress in Multilingual Speech Recognition for Low Resource Languages Kurmanji Kurdish, Cree and Inuktut vishwa gupta and Gilles Boulianne	pp. 6420‑6428
pdf	bib		video	Efficient Entity Candidate Generation for Low-Resource Languages Alberto Garcia-Duran, Akhil Arora and Robert West	pp. 6429‑6438
pdf	bib		video	What a Creole Wants, What a Creole Needs Heather Lent, Kelechi Ogueji, Miryam de Lhoneux, Orevaoghene Ahia and Anders Søgaard	pp. 6439‑6449
pdf	bib	poster	video	Extensions to Brahmic script processing within the Nisaba library: new scripts, languages and utilities Alexander Gutkin, Cibu Johny, Raiomond Doctor, Lawrence Wolf-Sonkin and Brian Roark	pp. 6450‑6460
pdf	bib		video	Predicting Embedding Reliability in Low-Resource Settings Using Corpus Similarity Measures Jonathan Dunn, Haipeng Li and Damian Sastre	pp. 6461‑6470
pdf	bib	poster	video	Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation Idris Abdulmumin, Satya Ranjan Dash, Musa Abdullahi Dawud, Shantipriya Parida, Shamsuddeen Muhammad, Ibrahim Sa’id Ahmad, Subhadarshi Panda, Ondřej Bojar, Bashir Shehu Galadanci and Bello Shehu Bello	pp. 6471‑6479
pdf	bib	poster	video	A Survey of Machine Translation Tasks on Nigerian Languages Ebelechukwu Nwafor and Anietie Andy	pp. 6480‑6486
pdf	bib	poster	video	Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung YIU, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram Shi and Pascale Fung	pp. 6487‑6494
pdf	bib	poster	video	Survey on Thai NLP Language Resources and Tools Ratchakrit Arreerard, Stephen Mander and Scott Piao	pp. 6495‑6505
pdf	bib	poster	video	LaoPLM: Pre-trained Language Models for Lao Nankai Lin, Yingwen Fu, Chuwei Chen, Ziyu Yang and Shengyi JIANG	pp. 6506‑6512
pdf	bib	poster	video	The Maaloula Aramaic Speech Corpus (MASC): From Printed Material to a Lemmatized and Time-Aligned Corpus Ghattas Eid, Esther Seyffarth and Ingo Plag	pp. 6513‑6520
pdf	bib	poster	video	VIMQA: A Vietnamese Dataset for Advanced Reasoning and Explainable Multi-hop Question Answering Khang Le, Hien Nguyen, Tung Le Thanh and Minh Nguyen	pp. 6521‑6529
pdf	bib		video	Language Identification for Austronesian Languages Jonathan Dunn and Wikke Nijhof	pp. 6530‑6539
pdf	bib	poster	video	A Mapudüngun FST Morphological Analyser and its Web Interface Andrés Chandía	pp. 6540‑6547
pdf	bib	poster	video	Improving Large-scale Language Models and Resources for Filipino Jan Christian Blaise Cruz and Charibeth Cheng	pp. 6548‑6555
pdf	bib	poster	video	Thirumurai: A Large Dataset of Tamil Shaivite Poems and Classification of Tamil Pann Shankar Mahadevan, Rahul Ponnusamy, Prasanna Kumar Kumaresan, Prabakaran Chandran, Ruba Priyadharshini, Sangeetha S and Bharathi Raja Chakravarthi	pp. 6556‑6562
pdf	bib	poster	video	Generating Monolingual Dataset for Low Resource Language Bodo from old books using Google Keep Sanjib Narzary, Maharaj Brahma, Mwnthai Narzary, Gwmsrang Muchahary, Pranav Kumar Singh, Apurbalal Senapati, Sukumar Nandi and Bidisha Som	pp. 6563‑6570
pdf	bib		video	AsNER - Annotated Dataset and Baseline for Assamese Named Entity recognition Dhrubajyoti Pathak, Sukumar Nandi and Priyankoo Sarmah	pp. 6571‑6577
pdf	bib		video	GeezSwitch: Language Identification in Typologically Related Low-resourced East African Languages Fitsum Gaim, Wonsuk Yang and Jong C. Park	pp. 6578‑6584
pdf	bib		video	Handwritten Paleographic Greek Text Recognition: A Century-Based Approach Paraskevi Platanou, John Pavlopoulos and Georgios Papaioannou	pp. 6585‑6589
pdf	bib		video	Quality Control for Crowdsourced Bilingual Dictionary in Low-Resource Languages Hiroki Chida, Yohei Murakami and Mondheera Pituxcoosuvarn	pp. 6590‑6596
pdf	bib	poster	video	An Inflectional Database for Gitksan Bruce Oliver, Clarissa Forbes, Changbing Yang, Farhan Samir, Edith Coates, Garrett Nicolai and Miikka Silfverberg	pp. 6597‑6606
pdf	bib		video	PyCantonese: Cantonese Linguistics and NLP in Python Jackson Lee, Litong Chen, Charles Lam, Chaak Ming Lau and Tsz-Him Tsui	pp. 6607‑6611
pdf	bib		video	Afaan Oromo Hate Speech Detection and Classification on Social Media Teshome Mulugeta Ababu and Michael Melese Woldeyohannis	pp. 6612‑6619
pdf	bib	poster	video	Cross-lingual Linking of Automatically Constructed Frames and FrameNet Ryohei Sasano	pp. 6620‑6625
pdf	bib		video	Aligning the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs Ana-Maria Barbu, Verginica Barbu Mititelu and Cătălin Mititelu	pp. 6626‑6634
pdf	bib		video	PortiLexicon-UD: a Portuguese Lexical Resource according to Universal Dependencies Model Lucelene Lopes, Magali Duran, Paulo Fernandes and Thiago Pardo	pp. 6635‑6643
pdf	bib	poster	video	Extended Parallel Corpus for Amharic-English Machine Translation Andargachew Mekonnen Gezmu, Andreas Nürnberger and Tesfaye Bayu Bati	pp. 6644‑6653
pdf	bib	poster	video	Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French Cheikh M. Bamba Dione, Alla LO, Elhadji Mamadou Nguer and Sileye Ba	pp. 6654‑6661
pdf	bib	poster	video	Criteria for Useful Automatic Romanization in South Asian Languages Isin Demirsahin, Cibu Johny, Alexander Gutkin and Brian Roark	pp. 6662‑6673
pdf	bib	poster	video	BERTology for Machine Translation: What BERT Knows about Linguistic Difficulties for Translation Yuqian Dai, Marc de Kamps and Serge Sharoff	pp. 6674‑6690
pdf	bib		video	CVSS Corpus and Massively Multilingual Speech-to-Speech Translation Ye Jia, Michelle Tadmor Ramanovich, Quan Wang and Heiga Zen	pp. 6691‑6703
pdf	bib	poster	video	JParaCrawl v3.0: A Large-scale English-Japanese Parallel Corpus Makoto Morishita, Katsuki Chousa, Jun Suzuki and Masaaki Nagata	pp. 6704‑6710
pdf	bib	poster	video	Learning How to Translate North Korean through South Korean Hwichan Kim, Sangwhan Moon, Naoaki Okazaki and Mamoru Komachi	pp. 6711‑6718
pdf	bib	poster	video	FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation Wenhao Zhu, Shujian Huang, Tong Pu, Pingxuan Huang, Xu Zhang, Jian Yu, Wei Chen, Yanfeng Wang and Jiajun CHEN	pp. 6719‑6727
pdf	bib	poster	video	SansTib, a Sanskrit - Tibetan Parallel Corpus and Bilingual Sentence Embedding Model Sebastian Nehrdich	pp. 6728‑6734
pdf	bib	poster	video	VISA: An Ambiguous Subtitles Dataset for Visual Scene-aware Machine Translation Yihang Li, Shuichiro Shimizu, Weiqi Gu, Chenhui Chu and Sadao Kurohashi	pp. 6735‑6743
pdf	bib	poster	video	A Benchmark Dataset for Multi-Level Complexity-Controllable Machine Translation Kazuki Tani, Ryoya Yuasa, Kazuki Takikawa, Akihiro Tamura, Tomoyuki Kajiwara, Takashi Ninomiya and Tsuneo Kato	pp. 6744‑6752
pdf	bib	poster	video	gaHealth: An English–Irish Bilingual Corpus of Health Data Séamus Lankford, Haithem Afli, Órla Ní Loinsigh and Andy Way	pp. 6753‑6758
pdf	bib	poster	video	Translation Memories as Baselines for Low-Resource Machine Translation Rebecca Knowles and Patrick Littell	pp. 6759‑6767
pdf	bib	poster	video	N24News: A New Dataset for Multimodal News Classification Zhen Wang, Xu Shan, Xiangxie Zhang and Jie Yang	pp. 6768‑6775
pdf	bib		video	MultiSubs: A Large-scale Multimodal and Multilingual Dataset Josiah Wang, Josiel Figueiredo and Lucia Specia	pp. 6776‑6785
pdf	bib	poster	video	CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung YIU, Rita Frieske, Holy Lovenia, Genta Winata, Qifeng Chen, Xiaojuan Ma, Bertram Shi and Pascale Fung	pp. 6786‑6793
pdf	bib		video	Multimodal Negotiation Corpus with Various Subjective Assessments for Social-Psychological Outcome Prediction from Non-Verbal Cues Nobukatsu Hojo, Satoshi Kobashikawa, Saki Mizuno and Ryo Masumura	pp. 6794‑6801
pdf	bib		video	MMDAG: Multimodal Directed Acyclic Graph Network for Emotion Recognition in Conversation Shuo Xu, Yuxiang Jia, Changyong Niu and Hongying Zan	pp. 6802‑6807
pdf	bib		video	Automatic Gloss-level Data Augmentation for Sign Language Translation Jin Yea Jang, Han-Mu Park, Saim Shin, Suna Shin, Byungcheon Yoon and Gahgene Gweon	pp. 6808‑6813
pdf	bib	poster	video	Image Description Dataset for Language Learners Kento Tanaka, Taichi Nishimura, Hiroaki Nanjo, Keisuke Shirai, Hirotaka Kameko and Masatake Dantsuji	pp. 6814‑6821
pdf	bib		video	The Multimodal Annotation Software Tool (MAST) Bruno Cardoso and Neil Cohn	pp. 6822‑6828
pdf	bib		video	A Multimodal German Dataset for Automatic Lip Reading Systems and Transfer Learning Gerald Schwiebert, Cornelius Weber, Leyuan Qu, Henrique Siqueira and Stefan Wermter	pp. 6829‑6836
pdf	bib	poster	video	Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers Muskan Garg, Seema Wazarkar, Muskaan Singh and Ondřej Bojar	pp. 6837‑6847
pdf	bib	poster	video	Cross-lingual and Multilingual CLIP Fredrik Carlsson, Philipp Eisen, Faton Rekathati and Magnus Sahlgren	pp. 6848‑6854
pdf	bib	poster	video	BAN-Cap: A Multi-Purpose English-Bangla Image Descriptions Dataset Mohammad Faiyaz Khan, S.M. Sadiq-Ur-Rahman Shifath and Md Saiful Islam	pp. 6855‑6865
pdf	bib		video	SSR7000: A Synchronized Corpus of Ultrasound Tongue Imaging for End-to-End Silent Speech Recognition Naoki Kimura, Zixiong Su, Takaaki Saeki and Jun Rekimoto	pp. 6866‑6873
pdf	bib		video	A Simple Yet Effective Corpus Construction Method for Chinese Sentence Compression Yang Zhao, Hiroshi Kanayama, Issei Yoshida, Masayasu Muraoka and Akiko Aizawa	pp. 6874‑6883
pdf	bib		video	JADE: Corpus for Japanese Definition Modelling Han Huang, Tomoyuki Kajiwara and Yuki Arase	pp. 6884‑6888
pdf	bib		video	Unraveling the Mystery of Artifacts in Machine Generated Text Jiashu Pu, Ziyi Huang, Yadong Xi, Guandan Chen, Weijie Chen and Rongsheng Zhang	pp. 6889‑6898
pdf	bib		video	Logic-Guided Message Generation from Raw Real-Time Sensor Data Ernie Chang, Alisa Kovtunova, Stefan Borgwardt, Vera Demberg, Kathryn Chapman and Hui-Syuan Yeh	pp. 6899‑6908
pdf	bib	poster	video	The Bull and the Bear: Summarizing Stock Market Discussions Ayush Kumar, Dhyey Jani, Jay Shah, Devanshu Thakar, Varun Jain and Mayank Singh	pp. 6909‑6913
pdf	bib	poster	video	Combination of Contextualized and Non-Contextualized Layers for Lexical Substitution in French Kévin Espasa, Emmanuel Morin and Olivier Hamon	pp. 6914‑6921
pdf	bib	poster	video	SuMe: A Dataset Towards Summarizing Biomedical Mechanisms Mohaddeseh Bastan, Nishant Shankar, Mihai Surdeanu and Niranjan Balasubramanian	pp. 6922‑6931
pdf	bib		video	CATAMARAN: A Cross-lingual Long Text Abstractive Summarization Dataset Zheng Chen and Hongyu Lin	pp. 6932‑6937
pdf	bib		video	Emotion analysis and detection during COVID-19 Tiberiu Sosea, Chau Pham, Alexander Tekle, Cornelia Caragea and Junyi Jessy Li	pp. 6938‑6947
pdf	bib	poster	video	Cross-lingual Emotion Detection Sabit Hassan, Shaden Shaar and Kareem Darwish	pp. 6948‑6958
pdf	bib		video	DirectQuote: A Dataset for Direct Quotation Extraction and Attribution in News Articles Yuanchi Zhang and Yang Liu	pp. 6959‑6966
pdf	bib	poster	video	VaccineLies: A Natural Language Resource for Learning to Recognize Misinformation about the COVID-19 and HPV Vaccines Maxwell Weinzierl and Sanda Harabagiu	pp. 6967‑6975
pdf	bib		video	Tackling Irony Detection using Ensemble Classifiers Christoph Turban and Udo Kruschwitz	pp. 6976‑6984
pdf	bib	poster	video	Automatic Construction of an Annotated Corpus with Implicit Aspects Aye Aye Mar and Kiyoaki Shirai	pp. 6985‑6991
pdf	bib	poster	video	A Multimodal Corpus for Emotion Recognition in Sarcasm Anupama Ray, Shubham Mishra, Apoorva Nunna and Pushpak Bhattacharyya	pp. 6992‑7003
pdf	bib		video	Annotation of Valence Unfolding in Spoken Personal Narratives Aniruddha Tammewar, Franziska Braun, Gabriel Roccabruna, Sebastian Bayerl, Korbinian Riedhammer and Giuseppe Riccardi	pp. 7004‑7013
pdf	bib		video	A Large-Scale Japanese Dataset for Aspect-based Sentiment Analysis Yuki Nakayama, Koji Murakami, Gautam Kumar, Sudha Bhingardive and Ikuko Hardaway	pp. 7014‑7021
pdf	bib		video	A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain Haruya Suzuki, Yuto Miyauchi, Kazuki Akiyama, Tomoyuki Kajiwara, Takashi Ninomiya, Noriko Takemura, Yuta Nakashima and Hajime Nagahara	pp. 7022‑7028
pdf	bib		video	Complementary Learning of Aspect Terms for Aspect-based Sentiment Analysis Han Qin, Yuanhe Tian, Fei Xia and Yan Song	pp. 7029‑7039
pdf	bib		video	Deep One-Class Hate Speech Detection Model Saugata Bose and Guoxin Su	pp. 7040‑7048
pdf	bib	poster	video	Opinions in Interactions: New Annotations of the SEMAINE Database Valentin Barriere, Slim Essid and Chloé Clavel	pp. 7049‑7055
pdf	bib			Pars-ABSA: a Manually Annotated Aspect-based Sentiment Analysis Benchmark on Farsi Product Reviews Taha Shangipour Ataei, Kamyar Darvishi, Soroush Javdan, Behrouz Minaei-Bidgoli and Sauleh Eetemadi	pp. 7056‑7060
pdf	bib	poster	video	HindiMD: A Multi-domain Corpora for Low-resource Sentiment Analysis Mamta ., Asif Ekbal, Pushpak Bhattacharyya, Tista Saha, Alka Kumar and Shikha Srivastava	pp. 7061‑7070
pdf	bib	poster	video	Sentiment Analysis of Homeric Text: The 1st Book of Iliad John Pavlopoulos, Alexandros Xenos and Davide Picca	pp. 7071‑7077
pdf	bib	poster	video	The Persian Dependency Treebank Made Universal Pegah Safari, Mohammad Sadegh Rasooli, Amirsaeid Moloodi and Alireza Nourian	pp. 7078‑7087
pdf	bib		video	GujMORPH - A Dataset for Creating Gujarati Morphological Analyzer Jatayu Baxi and Brijesh Bhatt	pp. 7088‑7095
pdf	bib	poster	video	Informal Persian Universal Dependency Treebank Roya Kabiri, Simin Karimi and Mihai Surdeanu	pp. 7096‑7105
pdf	bib		video	Automatic Correction of Syntactic Dependency Annotation Differences Andrew Zupon, Andrew Carnie, Michael Hammond and Mihai Surdeanu	pp. 7106‑7112
pdf	bib	poster	video	Building Large-Scale Japanese Pronunciation-Annotated Corpora for Reading Heteronymous Logograms Fumikazu Sato, Naoki Yoshinaga and Masaru Kitsuregawa	pp. 7113‑7121
pdf	bib		video	StyleKQC: A Style-Variant Paraphrase Corpus for Korean Questions and Commands Won Ik Cho, Sangwhan Moon, Jongin Kim, Seokmin Kim and Nam Soo Kim	pp. 7122‑7128
pdf	bib		video	Syntax-driven Approach for Semantic Role Labeling Yuanhe Tian, Han Qin, Fei Xia and Yan Song	pp. 7129‑7139
pdf	bib		video	HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish Marcin Woliński, Bartłomiej Nitoń, Witold Kieraś and Jakub Szymanik	pp. 7140‑7146
pdf	bib		video	Lexical Resource Mapping via Translations Hongchang Bao, Bradley Hauer and Grzegorz Kondrak	pp. 7147‑7154
pdf	bib		video	Unsupervised Attention-based Sentence-Level Meta-Embeddings from Contextualised Language Models Keigo Takahashi and Danushka Bollegala	pp. 7155‑7163
pdf	bib		video	Identification of Fine-Grained Location Mentions in Crisis Tweets Sarthak Khanal, Maria Traskowsky and Doina Caragea	pp. 7164‑7173
pdf	bib		video	HateBR: A Large Expert Annotated Corpus of Brazilian Instagram Comments for Offensive Language and Hate Speech Detection Francielle Vargas, Isabelle Carvalho, Fabiana Rodrigues de Góes, Thiago Pardo and Fabrício Benevenuto	pp. 7174‑7183
pdf	bib		video	MentalBERT: Publicly Available Pretrained Language Models for Mental Healthcare Shaoxiong Ji, Tianlin Zhang, Luna Ansari, Jie Fu, Prayag Tiwari and Erik Cambria	pp. 7184‑7190
pdf	bib	poster	video	Leveraging Hashtag Networks for Multimodal Popularity Prediction of Instagram Posts Yu Yun Liao	pp. 7191‑7198
pdf	bib	poster	video	Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis Hang Jiang, Yining Hua, Doug Beeferman and Deb Roy	pp. 7199‑7208
pdf	bib		video	Did that happen? Predicting Social Media Posts that are Indicative of what happened in a scene: A case study of a TV show Anietie Andy, Reno Kriz, Sharath Chandra Guntuku, Derry Tanti Wijaya and Chris Callison-Burch	pp. 7209‑7214
pdf	bib		video	HashSet - A Dataset For Hashtag Segmentation Prashant Kodali, Akshala Bhatnagar, Naman Ahuja, Manish Shrivastava and Ponnurangam Kumaraguru	pp. 7215‑7219
pdf	bib	poster	video	Using Convolution Neural Network with BERT for Stance Detection in Vietnamese Oanh Tran, Anh Cong Phung and Bach Xuan Ngo	pp. 7220‑7225
pdf	bib	poster	video	Annotation-Scheme Reconstruction for "Fake News" and Japanese Fake News Dataset Taichi Murayama, Shohei Hisada, Makoto Uehara, Shoko Wakamiya and Eiji ARAMAKI	pp. 7226‑7234
pdf	bib	poster	video	RoBERTuito: a pre-trained language model for social media text in Spanish Juan Manuel Pérez, Damián Ariel Furman, Laura Alonso Alemany and Franco M. Luque	pp. 7235‑7243
pdf	bib		video	Construction of Responsive Utterance Corpus for Attentive Listening Response Production Koichiro Ito, Masaki Murata, Tomohiro Ohno and Shigeki Matsubara	pp. 7244‑7252
pdf	bib		video	Speak: A Toolkit Using Amazon Mechanical Turk to Collect and Validate Speech Audio Recordings Christopher Song, David Harwath, Tuka Alhanai and James Glass	pp. 7253‑7258
pdf	bib	poster	video	ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation Holy Lovenia, Samuel Cahyawijaya, Genta Winata, Peng Xu, Yan Xu, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram Shi and Pascale Fung	pp. 7259‑7268
pdf	bib	poster	video	A Romanization System and WebMAUS Aligner for Arabic Varieties Jalal Al-Tamimi, Florian Schiel, Ghada Khattab, Navdeep Sokhey, Djegdjiga Amazouz, Abdulrahman Dallak and Hajar Moussa	pp. 7269‑7276
pdf	bib	poster	video	BembaSpeech: A Speech Recognition Corpus for the Bemba Language Claytone Sikasote and Antonios Anastasopoulos	pp. 7277‑7283
pdf	bib	poster	video	BehanceCC: A ChitChat Detection Dataset For Livestreaming Video Transcripts Viet Lai, Amir Pouran Ben Veyseh, Franck Dernoncourt and Thien Huu Nguyen	pp. 7284‑7290
pdf	bib		video	Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection Sheng Li, Jiyi Li, Qianying Liu and Zhuo Gong	pp. 7291‑7297
pdf	bib		video	A new European Portuguese corpus for the study of Psychosis through speech analysis Maria Forjó, Daniel Neto, Alberto Abad, H. Sofia Pinto and Joaquim Gago	pp. 7298‑7304
pdf	bib	poster	video	Investigating Inter- and Intra-speaker Voice Conversion using Audiobooks Aghilas SINI, Damien Lolive, Nelly Barbot and Pierre Alain	pp. 7305‑7313
pdf	bib	poster	video	Multilingual Transfer Learning for Children Automatic Speech Recognition Thomas Rolland, Alberto Abad, Catia Cucchiarini and Helmer Strik	pp. 7314‑7320
pdf	bib	poster	video	BehanceQA: A New Dataset for Identifying Question-Answer Pairs in Video Transcripts Amir Pouran Ben Veyseh, Viet Lai, Franck Dernoncourt and Thien Huu Nguyen	pp. 7321‑7327
pdf	bib		video	Bidirectional Skeleton-Based Isolated Sign Recognition using Graph Convolutional Networks Konstantinos M. Dafnis, Evgenia Chroni, Carol Neidle and Dimitri Metaxas	pp. 7328‑7338
pdf	bib	poster	video	Deep learning-based end-to-end spoken language identification system for domain-mismatched scenario Woohyun Kang, Md Jahangir Alam and Abderrahim Fathan	pp. 7339‑7343
pdf	bib	poster	video	Handwritten Character Generation using Y-Autoencoder for Character Recognition Model Training Tomoki Kitagawa, Chee Siang Leow and Hiromitsu Nishizaki	pp. 7344‑7351
pdf	bib	poster	video	Attention is All you Need for Robust Temporal Reasoning Lis Kanashiro Pereira	pp. 7352‑7359
pdf	bib	poster	video	PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter Kornraphop Kawintiranon and Lisa Singh	pp. 7360‑7367
pdf	bib	poster	video	Modeling the Impact of Syntactic Distance and Surprisal on Cross-Slavic Text Comprehension Irina Stenger, Philip Georgis, Tania Avgustinova, Bernd Möbius and Dietrich Klakow	pp. 7368‑7376
pdf	bib	poster	video	BERTifying Sinhala - A Comprehensive Analysis of Pre-trained Language Models for Sinhala Text Classification Vinura Dhananjaya, Piyumal Demotte, Surangika Ranathunga and Sanath Jayasena	pp. 7377‑7385
pdf	bib	poster	video	Pre-training and Evaluating Transformer-based Language Models for Icelandic Jón Friðrik Daðason and Hrafn Loftsson	pp. 7386‑7391

Last modified on June 13, 2022, 10:59 a.m.