LREC 2022 Proceedings

Proceedings of the 13th Language Resources and Evaluation Conference

 

pdf bib Papers pages
pdf bib Domain Adaptation in Neural Machine Translation using a Qualia-Enriched FrameNet
Alexandre Diniz da Costa, Mateus Coutinho Marim, Ely Matos and Tiago Timponi Torrent
pp. 1‑12
pdf bib HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professional Post-Editing Towards More Effective MT Evaluation
Serge Gladkoff and Lifeng Han
pp. 13‑21
pdf bib Priming Ancient Korean Neural Machine Translation
Chanjun Park, Seolhwa Lee, Jaehyung Seo, Hyeonseok Moon, Sugyeong Eo and Heuiseok Lim
pp. 22‑28
pdf bib GECO-MT: The Ghent Eye-tracking Corpus of Machine Translation
Toon Colman, Margot Fonteyne, Joke Daems, Nicolas Dirix and Lieve Macken
pp. 29‑38
pdf bib Introducing Frege to Fillmore: A FrameNet Dataset that Captures both Sense and Reference
Levi Remijnse, Piek Vossen, Antske Fokkens and Sam Titarsolej
pp. 39‑50
pdf bib Compiling a Suitable Level of Sense Granularity in a Lexicon for AI Purposes: The Open Source COR Lexicon
Bolette Pedersen, Nathalie Carmen Hau Sørensen, Sanni Nimb, Ida Flørke, Sussi Olsen and Thomas Troelsgård
pp. 51‑60
pdf bib Sense and Sentiment
Francis Bond and Merrick Choo
pp. 61‑69
pdf bib Enriching Linguistic Representation in the Cantonese Wordnet and Building the New Cantonese Wordnet Corpus
Ut Seong Sio and Luís Morgado da Costa
pp. 70‑78
pdf bib ZAEBUC: An Annotated Arabic-English Bilingual Writer Corpus
Nizar Habash and David Palfreyman
pp. 79‑88
pdf bib Turkish Universal Conceptual Cognitive Annotation
Necva Bölücü and Burcu Can
pp. 89‑99
pdf bib Introducing the CURLICAT Corpora: Seven-language Domain Specific Annotated Corpora from Curated Sources
Tamás Váradi, Bence Nyéki, Svetla Koeva, Marko Tadić, Vanja Štefanec, Maciej Ogrodniczuk, Bartłomiej Nitoń, Piotr Pęzik, Verginica Barbu Mititelu, Elena Irimia, Maria Mitrofan, Dan Tufiș, Radovan Garabík, Simon Krek and Andraž Repar
pp. 100‑108
pdf bib RU-ADEPT: Russian Anonymized Dataset with Eight Personality Traits
C. Anton Rytting, Valerie Novak, James R. Hull, Victor M. Frank, Paul Rodrigues, Jarrett G. W. Lee and Laurel Miller-Sims
pp. 109‑118
pdf bib CoQAR: Question Rewriting on CoQA
Quentin Brabant, Gwénolé Lecorvé and Lina M. Rojas Barahona
pp. 119‑126
pdf bib User Interest Modelling in Argumentative Dialogue Systems
Annalena Aicher, Nadine Gerstenlauer, Wolfgang Minker and Stefan Ultes
pp. 127‑136
pdf bib Every time I fire a conversational designer, the performance of the dialogue system goes down
Giancarlo Xompero, Michele Mastromattei, Samir Salman, Cristina Giannone, Andrea Favalli, Raniero Romagnoli and Fabio Massimo Zanzotto
pp. 137‑145
pdf bib An Empirical Study on the Overlapping Problem of Open-Domain Dialogue Datasets
Yuqiao Wen, Guoqing Luo and Lili Mou
pp. 146‑153
pdf bib Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project
Federica Gamba, Francesca Frontini, Daan Broeder and Monica Monachini
pp. 154‑163
pdf bib How to be FAIR when you CARE: The DGS Corpus as a Case Study of Open Science Resources for Minority Languages
Marc Schulder and Thomas Hanke
pp. 164‑173
pdf bib Italian NLP for Everyone: Resources and Models from EVALITA to the European Language Grid
Valerio Basile, Cristina Bosco, Michael Fell, Viviana Patti and Rossella Varvara
pp. 174‑180
pdf bib Cross-Lingual Link Discovery for Under-Resourced Languages
Michael Rosner, Sina Ahmadi, Elena-Simona Apostol, Julia Bosque-Gil, Christian Chiarcos, Milan Dojchinovski, Katerina Gkirtzou, Jorge Gracia, Dagmar Gromann, Chaya Liebeskind, Giedrė Valūnaitė Oleškevičienė, Gilles Sérasset and Ciprian-Octavian Truică
pp. 181‑192
pdf bib Angry or Sad? Emotion Annotation for Extremist Content Characterisation
Valentina Dragos, Delphine Battistelli, Aline Etienne and Yolène Constable
pp. 193‑201
pdf bib Identification of Multiword Expressions in Tweets for Hate Speech Detection
Nicolas Zampieri, Carlos Ramisch, Irina Illina and Dominique Fohr
pp. 202‑210
pdf bib Causal Investigation of Public Opinion during the COVID-19 Pandemic via Social Media Text
Michael Jantscher and Roman Kern
pp. 211‑226
pdf bib Misspelling Semantics in Thai
Pakawat Nakwijit and Matthew Purver
pp. 227‑236
pdf bib Automatic Detection of Stigmatizing Uses of Psychiatric Terms on Twitter
Véronique Moriceau, Farah Benamara and Abdelmoumene Boumadane
pp. 237‑243
pdf bib CoVERT: A Corpus of Fact-checked Biomedical COVID-19 Tweets
Isabelle Mohr, Amelie Wührl and Roman Klinger
pp. 244‑257
pdf bib XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond
Francesco Barbieri, Luis Espinosa Anke and Jose Camacho-Collados
pp. 258‑266
pdf bib ‘Am I the Bad One’? Predicting the Moral Judgement of the Crowd Using Pre-trained Language Models
Areej Alhassan, Jinkai Zhang and Viktor Schlegel
pp. 267‑276
pdf bib Generating Questions from Wikidata Triples
Kelvin Han, Thiago Castro Ferreira and Claire Gardent
pp. 277‑290
pdf bib Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition
Matteo Muffo, Aldo Cocco and Enrico Bertino
pp. 291‑297
pdf bib Evaluating the Effects of Embedding with Speaker Identity Information in Dialogue Summarization
Yuji Naraki, Tetsuya Sakai and Yoshihiko Hayashi
pp. 298‑304
pdf bib Perceived Text Quality and Readability in Extractive and Abstractive Summaries
Julius Monsen and Evelina Rennes
pp. 305‑312
pdf bib Learning to Prioritize: Precision-Driven Sentence Filtering for Long Text Summarization
Alex Mei, Anisha Kabir, Rukmini Bapat, John Judge, Tony Sun and William Yang Wang
pp. 313‑318
pdf bib Automating Horizon Scanning in Future Studies
Tatsuya Ishigaki, Suzuko Nishino, Sohei Washino, Hiroki Igarashi, Yukari Nagai, Yuichi Washida and Akihiko Murai
pp. 319‑327
pdf bib ViHealthBERT: Pre-trained Language Models for Vietnamese in Health Text Mining
Nguyen Minh, Vu Hoang Tran, Vu Hoang, Huy Duc Ta, Trung Huu Bui and Steven Quoc Hung Truong
pp. 328‑337
pdf bib Privacy-Preserving Graph Convolutional Networks for Text Classification
Timour Igamberdiev and Ivan Habernal
pp. 338‑350
pdf bib ArMATH: a Dataset for Solving Arabic Math Word Problems
Reem Alghamdi, Zhenwen Liang and Xiangliang Zhang
pp. 351‑362
pdf bib KIMERA: Injecting Domain Knowledge into Vacant Transformer Heads
Benjamin Winter, Alexei Figueroa Rosero, Alexander Löser, Felix Alexander Gers and Amy Siu
pp. 363‑373
pdf bib Distilling the Knowledge of Romanian BERTs Using Multiple Teachers
Andrei-Marius Avram, Darius Catrina, Dumitru-Clementin Cercel, Mihai Dascalu, Traian Rebedea, Vasile Pais and Dan Tufiș
pp. 374‑384
pdf bib Personalized Filled-pause Generation with Group-wise Prediction Models
Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi and Hiroshi Saruwatari
pp. 385‑392
pdf bib Transformer versus LSTM Language Models trained on Uncertain ASR Hypotheses in Limited Data Scenarios
Imran Sheikh, Emmanuel Vincent and Irina Illina
pp. 393‑399
pdf bib Out of Thin Air: Is Zero-Shot Cross-Lingual Keyword Detection Better Than Unsupervised?
Boshko Koloski, Senja Pollak, Blaž Škrlj and Matej Martinc
pp. 400‑409
pdf bib Evaluating Pretraining Strategies for Clinical BERT Models
Anastasios Lamproudis, Aron Henriksson and Hercules Dalianis
pp. 410‑416
pdf bib KazNERD: Kazakh Named Entity Recognition Dataset
Rustem Yeshpanov, Yerbolat Khassanov and Huseyin Atakan Varol
pp. 417‑426
pdf bib Mitigating Dataset Artifacts in Natural Language Inference Through Automatic Contextual Data Augmentation and Learning Optimization
Michail Mersinias and Panagiotis Valvis
pp. 427‑435
pdf bib Kompetencer: Fine-grained Skill Classification in Danish Job Postings via Distant Supervision and Transfer Learning
Mike Zhang, Kristian Nørgaard Jensen and Barbara Plank
pp. 436‑447
pdf bib Semantic Role Labelling for Dutch Law Texts
Roos Bakker, Romy A.N. van Drie, Maaike de Boer, Robert van Doesburg and Tom van Engers
pp. 448‑457
pdf bib English Language Spelling Correction as an Information Retrieval Task Using Wikipedia Search Statistics
Kyle Goslin and Markus Hofmann
pp. 458‑464
pdf bib CrudeOilNews: An Annotated Crude Oil News Corpus for Event Extraction
Meisin Lee, Lay-Ki Soon, Eu Gene Siew and Ly Fie Sugianto
pp. 465‑479
pdf bib Claim Extraction and Law Matching for COVID-19-related Legislation
Niklas Dehio, Malte Ostendorff and Georg Rehm
pp. 480‑490
pdf bib Constructing A Dataset of Support and Attack Relations in Legal Arguments in Court Judgements using Linguistic Rules
Basit Ali, Sachin Pawar, Girish Palshikar and Rituraj Singh
pp. 491‑500
pdf bib KIND: an Italian Multi-Domain Dataset for Named Entity Recognition
Teresa Paccosi and Alessio Palmero Aprosio
pp. 501‑507
pdf bib Russian Jeopardy! Data Set for Question-Answering Systems
Elena Mikhalkova and Alexander A. Khlyupin
pp. 508‑514
pdf bib Know Better – A Clickbait Resolving Challenge
Benjamin Hättasch and Carsten Binnig
pp. 515‑523
pdf bib Valet: Rule-Based Information Extraction for Rapid Deployment
Dayne Freitag, John Cadigan, Robert Sasseen and Paul Kalmar
pp. 524‑533
pdf bib Negation Detection in Dutch Spoken Human-Computer Conversations
Tom Sweers, Iris Hendrickx and Helmer Strik
pp. 534‑542
pdf bib Reflections on 30 Years of Language Resource Development and Sharing
Christopher Cieri, Mark Liberman, Sunghye Cho, Stephanie Strassel, James Fiumara and Jonathan Wright
pp. 543‑550
pdf bib Language Resources to Support Language Diversity – the ELRA Achievements
Valérie Mapelli, Victoria Arranz, Khalid Choukri and Hélène Mazo
pp. 551‑558
pdf bib Ethical Issues in Language Resources and Language Technology – Tentative Categorisation
Pawel Kamocki and Andreas Witt
pp. 559‑563
pdf bib Do we Name the Languages we Study? The #BenderRule in LREC and ACL articles
Fanny Ducel, Karën Fort, Gaël Lejeune and Yves Lepage
pp. 564‑573
pdf bib Aspect-Based Emotion Analysis and Multimodal Coreference: A Case Study of Customer Comments on Adidas Instagram Posts
Luna De Bruyne, Akbar Karimi, Orphee De Clercq, Andrea Prati and Veronique Hoste
pp. 574‑580
pdf bib Multi-source Multi-domain Sentiment Analysis with BERT-based Models
Gabriel Roccabruna, Steve Azzolin and Giuseppe Riccardi
pp. 581‑589
pdf bib NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis
Shamsuddeen Hassan Muhammad, David Adelani, Anuoluwapo Aremu and Idris Abdulmumin
pp. 590‑602
pdf bib A (Psycho-)Linguistically Motivated Scheme for Annotating and Exploring Emotions in a Genre-Diverse Corpus
Aline Etienne, Delphine Battistelli and Gwénolé Lecorvé
pp. 603‑612
pdf bib Integrating a Phrase Structure Corpus Grammar and a Lexical-Semantic Network: the HOLINET Knowledge Graph
Jean-Philippe Prost
pp. 613‑622
pdf bib On the Impact of Temporal Representations on Metaphor Detection
Giorgio Ottolina, Matteo Luigi Palmonari, Manuel Vimercati and Mehwish Alam
pp. 623‑632
pdf bib Analysis and Prediction of NLP Models via Task Embeddings
Damien Sileo and Marie-Francine Moens
pp. 633‑647
pdf bib Cross-lingual and Cross-domain Transfer Learning for Automatic Term Extraction from Low Resource Data
Amir Hazem, Merieme Bouhandi, Florian Boudin and Beatrice Daille
pp. 648‑662
pdf bib Few-Shot Learning for Argument Aspects of the Nuclear Energy Debate
Lena Jurkschat, Gregor Wiedemann, Maximilian Heinrich, Mattes Ruckdeschel and Sunna Torge
pp. 663‑672
pdf bib MuLVE, A Multi-Language Vocabulary Evaluation Data Set
Anik Jacobsen, Salar Mohtaj and Sebastian Möller
pp. 673‑679
pdf bib PLOD: An Abbreviation Detection Dataset for Scientific Documents
Leonardo Zilio, Hadeel Saadany, Prashant Sharma, Diptesh Kanojia and Constantin Orăsan
pp. 680‑688
pdf bib Potential Idiomatic Expression (PIE)-English: Corpus for Classes of Idioms
Tosin Adewumi, Roshanak Vadoodi, Aparajita Tripathy, Konstantina Nikolaidou, Foteini Liwicki and Marcus Liwicki
pp. 689‑696
pdf bib LeSpell - A Multi-Lingual Benchmark Corpus of Spelling Errors to Develop Spellchecking Methods for Learner Language
Marie Bexte, Ronja Laarmann-Quante, Andrea Horbach and Torsten Zesch
pp. 697‑706
pdf bib Subjective Text Complexity Assessment for German
Laura Seiffe, Fares Kallel, Sebastian Möller, Babak Naderi and Roland Roller
pp. 707‑714
pdf bib Querying Interaction Structure: Approaches to Overlap in Spoken Language Corpora
Elena Frick, Thomas Schmidt and Henrike Helmer
pp. 715‑722
pdf bib DiaBiz – an Annotated Corpus of Polish Call Center Dialogs
Piotr Pęzik, Gosia Krawentek, Sylwia Karasińska, Paweł Wilk, Paulina Rybińska, Anna Cichosz, Angelika Peljak-Łapińska, Mikołaj Deckert and Michał Adamczyk
pp. 723‑726
pdf bib LaVA – Latvian Language Learner corpus
Roberts Darģis, Ilze Auziņa, Inga Kaija, Kristīne Levāne-Petrova and Kristīne Pokratniece
pp. 727‑731
pdf bib The EuroPat Corpus: A Parallel Corpus of European Patent Data
Kenneth Heafield, Elaine Farrow, Jelmer van der Linde, Gema Ramírez-Sánchez and Dion Wiggins
pp. 732‑740
pdf bib "Beste Grüße, Maria Meyer" — Pseudonymization of Privacy-Sensitive Information in Emails
Elisabeth Eder, Michael Wiegand, Ulrike Krieg-Holz and Udo Hahn
pp. 741‑752
pdf bib Criteria for the Annotation of Implicit Stereotypes
Wolfgang Schmeisser-Nieto, Montserrat Nofre and Mariona Taulé
pp. 753‑762
pdf bib Common Phone: A Multilingual Dataset for Robust Acoustic Modelling
Philipp Klumpp, Tomas Arias, Paula Andrea Pérez-Toro, Elmar Noeth and Juan Orozco-Arroyave
pp. 763‑768
pdf bib Curras + Baladi: Towards a Levantine Corpus
Karim Al-Haff, Mustafa Jarrar, Tymaa Hammouda and Fadi Zaraket
pp. 769‑778
pdf bib Annotation Study of Japanese Judgments on Tort for Legal Judgment Prediction with Rationales
Hiroaki Yamada, Takenobu Tokunaga, Ryutaro Ohara, Keisuke Takeshita and Mihoko Sumida
pp. 779‑790
pdf bib Placing M-Phasis on the Plurality of Hate: A Feature-Based Corpus of Hate Online
Dana Ruiter, Liane Reiners, Ashwin Geet D’Sa, Thomas Kleinbauer, Dominique Fohr, Irina Illina, Dietrich Klakow, Christian Schemer and Angeliki Monnier
pp. 791‑804
pdf bib ParCorFull2.0: a Parallel Corpus Annotated with Full Coreference
Ekaterina Lapshinova-Koltunski, Pedro Augusto Ferreira, Elina Lartaud and Christian Hardmeier
pp. 805‑813
pdf bib A Multi-Party Dialogue Resource in French
Maria Boritchev and Maxime Amblard
pp. 814‑823
pdf bib Bicleaner AI: Bicleaner Goes Neural
Jaume Zaragoza-Bernabeu, Gema Ramírez-Sánchez, Marta Bañón and Sergio Ortiz Rojas
pp. 824‑831
pdf bib Semi-automatically Annotated Learner Corpus for Russian
Anisia Katinskaia, Maria Lebedeva, Jue Hou and Roman Yangarber
pp. 832‑839
pdf bib UniMorph 4.0: Universal Morphology
Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Abbott Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud’hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty and Ekaterina Vylomova
pp. 840‑855
pdf bib Textinator: an Internationalized Tool for Annotation and Human Evaluation in Natural Language Processing and Generation
Dmytro Kalpakchi and Johan Boye
pp. 856‑866
pdf bib CyberAgressionAdo-v1: a Dataset of Annotated Online Aggressions in French Collected through a Role-playing Game
Anaïs Ollagnier, Elena Cabrio, Serena Villata and Catherine Blaya
pp. 867‑875
pdf bib Finnish Hate-Speech Detection on Social Media Using CNN and FinBERT
Md Saroar Jahan, Mourad Oussalah and Nabil Arhab
pp. 876‑882
pdf bib Empirical Analysis of Noising Scheme based Synthetic Data Generation for Automatic Post-editing
Hyeonseok Moon, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Jungseob Lee, Sugyeong Eo and Heuiseok Lim
pp. 883‑891
pdf bib Domain Mismatch Doesn’t Always Prevent Cross-lingual Transfer Learning
Daniel Edmiston, Phillip Keung and Noah A. Smith
pp. 892‑899
pdf bib Cross-Lingual Knowledge Transfer for Clinical Phenotyping
Jens-Michalis Papaioannou, Paul Grundmann, Betty van Aken, Athanasios Samaras, Ilias Kyparissidis, George Giannakoulas, Felix Gers and Alexander Loeser
pp. 900‑909
pdf bib The Multilingual Microblog Translation Corpus: Improving and Evaluating Translation of User-Generated Text
Paul McNamee and Kevin Duh
pp. 910‑918
pdf bib Multilingual and Multimodal Learning for Brazilian Portuguese
Júlia Sato, Helena Caseli and Lucia Specia
pp. 919‑927
pdf bib LibriS2S: A German-English Speech-to-Speech Translation Corpus
Pedro Jeuris and Jan Niehues
pp. 928‑935
pdf bib A Linguistically Motivated Test Suite to Semi-Automatically Evaluate German–English Machine Translation Output
Vivien Macketanz, Eleftherios Avramidis, Aljoscha Burchardt, He Wang, Renlong Ai, Shushen Manakhimova, Ursula Strohriegel, Sebastian Möller and Hans Uszkoreit
pp. 936‑947
pdf bib Cross-lingual Transfer of Monolingual Models
Evangelia Gogoulou, Ariel Ekgren, Tim Isbister and Magnus Sahlgren
pp. 948‑955
pdf bib Dataset of Student Solutions to Algorithm and Data Structure Programming Assignments
Fynn Petersen-Frey, Marcus Soll, Louis Kobras, Melf Johannsen, Peter Kling and Chris Biemann
pp. 956‑962
pdf bib Language Patterns and Behaviour of the Peer Supporters in Multilingual Healthcare Conversational Forums
Ishani Mondal, Kalika Bali, Mohit Jain, Monojit Choudhury, Jacki O’Neill, Millicent Ochieng, Kagnoya Awori and Keshet Ronen
pp. 963‑975
pdf bib Frame Shift Prediction
Zheng Xin Yong, Patrick D. Watson, Tiago Timponi Torrent, Oliver Czulo and Collin Baker
pp. 976‑986
pdf bib CLeLfPC: a Large Open Multi-Speaker Corpus of French Cued Speech
Brigitte Bigi, Maryvonne Zimmermann and Carine André
pp. 987‑994
pdf bib Samrómur Children: An Icelandic Speech Corpus
Carlos Daniel Hernandez Mena, David Erik Mollberg, Michal Borský and Jón Guðnason
pp. 995‑1002
pdf bib The Norwegian Parliamentary Speech Corpus
Per Erik Solberg and Pablo Ortiz
pp. 1003‑1008
pdf bib A Speech Recognizer for Frisian/Dutch Council Meetings
Martijn Bentum, Louis ten Bosch, Henk van den Heuvel, Simone Wills, Domenique van der Niet, Jelske Dijkstra and Hans Van de Velde
pp. 1009‑1015
pdf bib Elderly Conversational Speech Corpus with Cognitive Impairment Test and Pilot Dementia Detection Experiment Using Acoustic Characteristics of Speech in Japanese Dialects
Meiko Fukuda, Ryota Nishimura, Maina Umezawa, Kazumasa Yamamoto, Yurie Iribe and Norihide Kitaoka
pp. 1016‑1022
pdf bib A Spoken Drug Prescription Dataset in French for Spoken Language Understanding
Ali Can Kocabiyikoglu, François Portet, Prudence Gibert, Hervé Blanchon, Jean-Marc Babouchkine and Gaëtan Gavazzi
pp. 1023‑1031
pdf bib Towards an Open-Source Dutch Speech Recognition System for the Healthcare Domain
Cristian Tejedor-García, Berrie van der Molen, Henk van den Heuvel, Arjan van Hessen and Toine Pieters
pp. 1032‑1039
pdf bib A Dataset for Speech Emotion Recognition in Greek Theatrical Plays
Maria Moutti, Sofia Eleftheriou, Panagiotis Koromilas and Theodoros Giannakopoulos
pp. 1040‑1046
pdf bib Audiobook Dialogues as Training Data for Conversational Style Synthetic Voices
Liisi Piits, Hille Pajupuu, Heete Sahkai, Rene Altrov, Liis Ermus, Kairi Tamuri, Indrek Hein, Meelis Mihkla, Indrek Kiissel, Egert Männisalu, Kristjan Suluste and Jaan Pajupuu
pp. 1047‑1053
pdf bib Using a Knowledge Base to Automatically Annotate Speech Corpora and to Identify Sociolinguistic Variation
Yaru Wu, Fabian Suchanek, Ioana Vasilescu, Lori Lamel and Martine Adda-Decker
pp. 1054‑1060
pdf bib Phone Inventories and Recognition for Every Language
Xinjian Li, Florian Metze, David R. Mortensen, Alan W Black and Shinji Watanabe
pp. 1061‑1067
pdf bib Constructing Parallel Corpora from COVID-19 News using MediSys Metadata
Dimitrios Roussis, Vassilis Papavassiliou, Sokratis Sofianopoulos, Prokopis Prokopidis and Stelios Piperidis
pp. 1068‑1072
pdf bib A Distant Supervision Corpus for Extracting Biomedical Relationships Between Chemicals, Diseases and Genes
Dongxu Zhang, Sunil Mohan, Michaela Torkar and Andrew McCallum
pp. 1073‑1082
pdf bib DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries
Jayetri Bardhan, Anthony Colas, Kirk Roberts and Daisy Zhe Wang
pp. 1083‑1097
pdf bib Efficiently and Thoroughly Anonymizing a Transformer Language Model for Dutch Electronic Health Records: a Two-Step Method
Stella Verkijk and Piek Vossen
pp. 1098‑1103
pdf bib BERTrade: Using Contextual Embeddings to Parse Old French
Loïc Grobol, Mathilde Regnault, Pedro Ortiz Suarez, Benoît Sagot, Laurent Romary and Benoit Crabbé
pp. 1104‑1113
pdf bib Out-of-Domain Evaluation of Finnish Dependency Parsing
Jenna Kanerva and Filip Ginter
pp. 1114‑1124
pdf bib TArC: Tunisian Arabish Corpus, First Complete Release
Elisa Gugliotta and Marco Dinarelli
pp. 1125‑1136
pdf bib Towards Universal Segmentations: UniSegments 1.0
Zdeněk Žabokrtský, Niyati Bafna, Jan Bodnár, Lukáš Kyjánek, Emil Svoboda, Magda Ševčíková and Jonáš Vidra
pp. 1137‑1149
pdf bib TeDDi Sample: Text Data Diversity Sample for Language Comparison and Multilingual NLP
Steven Moran, Christian Bentz, Ximena Gutierrez-Vasques, Olga Sozinova and Tanja Samardzic
pp. 1150‑1158
pdf bib Leveraging a Bilingual Dictionary to Learn Wolastoqey Word Representations
Diego Bear and Paul Cook
pp. 1159‑1166
pdf bib Unmasking the Myth of Effortless Big Data - Making an Open Source Multi-lingual Infrastructure and Building Language Resources from Scratch
Linda Wiechetek, Katri Hiovain-Asikainen, Inga Lill Sigga Mikkelsen, Sjur Moshagen, Flammie Pirinen, Trond Trosterud and Børre Gaup
pp. 1167‑1177
pdf bib Building and Curating Conversational Corpora for Diversity-Aware Language Science and Technology
Andreas Liesenfeld and Mark Dingemanse
pp. 1178‑1192
pdf bib EPIC UdS - Creation and Applications of a Simultaneous Interpreting Corpus
Heike Przybyl, Ekaterina Lapshinova-Koltunski, Katrin Menzel, Stefan Fischer and Elke Teich
pp. 1193‑1200
pdf bib Development of a Benchmark Corpus to Support Entity Recognition in Job Descriptions
Thomas Green, Diana Maynard and Chenghua Lin
pp. 1201‑1208
pdf bib CAMIO: A Corpus for OCR in Multiple Languages
Michael Arrigo, Stephanie Strassel, Nolan King, Thao Tran and Lisa Mason
pp. 1209‑1216
pdf bib FABRA: French Aggregator-Based Readability Assessment toolkit
Rodrigo Wilkens, David Alfter, Xiaoou Wang, Alice Pintard, Anaïs Tack, Kevin P. Yancey and Thomas François
pp. 1217‑1233
pdf bib Towards Building a Spoken Dialogue System for Argument Exploration
Annalena Aicher, Nadine Gerstenlauer, Isabel Feustel, Wolfgang Minker and Stefan Ultes
pp. 1234‑1241
pdf bib FreeTalky: Don’t Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue
Chanjun Park, Yoonna Jang, Seolhwa Lee, Sungjin Park and Heuiseok Lim
pp. 1242‑1248
pdf bib Self-Contained Utterance Description Corpus for Japanese Dialog
Yuta Hayashibe
pp. 1249‑1255
pdf bib DialCrowd 2.0: A Quality-Focused Dialog System Crowdsourcing Toolkit
Jessica Huynh, Ting-Rui Chiang, Jeffrey Bigham and Maxine Eskenazi
pp. 1256‑1263
pdf bib A Brief Survey of Textual Dialogue Corpora
Hugo Gonçalo Oliveira, Patrícia Ferreira, Daniel Martins, Catarina Silva and Ana Alves
pp. 1264‑1274
pdf bib A Unified Approach to Entity-Centric Context Tracking in Social Conversations
Ulrich Rückert, Srinivas Sunkara, Abhinav Rastogi, Sushant Prakash and Pranav Khaitan
pp. 1275‑1285
pdf bib A Unifying View On Task-oriented Dialogue Annotation
Vojtěch Hudeček, Leon-Paul Schaub, Daniel Stancl, Patrick Paroubek and Ondřej Dušek
pp. 1286‑1296
pdf bib A Multi-source Graph Representation of the Movie Domain for Recommendation Dialogues Analysis
Antonio Origlia, Martina Di Bratto, Maria Di Maro and Sabrina Mennella
pp. 1297‑1306
pdf bib SHARE: A Lexicon of Harmful Expressions by Spanish Speakers
Flor Miriam Plaza-del-Arco, Ana Belén Parras Portillo, Pilar López Úbeda, Beatriz Gil and María-Teresa Martín-Valdivia
pp. 1307‑1316
pdf bib Wiktextract: Wiktionary as Machine-Readable Structured Data
Tatu Ylonen
pp. 1317‑1325
pdf bib NyLLex: A Novel Resource of Swedish Words Annotated with Reading Proficiency Level
Daniel Holmer and Evelina Rennes
pp. 1326‑1331
pdf bib Making a Semantic Event-type Ontology Multilingual
Zdenka Uresova, Karolina Zaczynska, Peter Bourgonje, Eva Fučíková, Georg Rehm and Jan Hajic
pp. 1332‑1343
pdf bib NomVallex: A Valency Lexicon of Czech Nouns and Adjectives
Veronika Kolářová and Anna Vernerová
pp. 1344‑1352
pdf bib TZOS: an Online Terminology Database Aimed at Working on Basque Academic Terminology Collaboratively
Izaskun Aldezabal, Jose Mari Arriola and Arantxa Otegi
pp. 1353‑1359
pdf bib Animacy Denoting German Nouns: Annotation and Classification
Manfred Klenner and Anne Göhring
pp. 1360‑1364
pdf bib x-enVENT: A Corpus of Event Descriptions with Experiencer-specific Emotion and Appraisal Annotations
Enrica Troiano, Laura Ana Maria Oberlaender, Maximilian Wegge and Roman Klinger
pp. 1365‑1375
pdf bib Polar Quantification of Actor Noun Phrases for German
Anne Göhring and Manfred Klenner
pp. 1376‑1380
pdf bib Czech Dataset for Cross-lingual Subjectivity Classification
Pavel Přibáň and Josef Steinberger
pp. 1381‑1391
pdf bib RED v2: Enhancing RED Dataset for Multi-Label Emotion Detection
Alexandra Ciobotaru, Mihai Vlad Constantinescu, Liviu P. Dinu and Stefan Dumitrescu
pp. 1392‑1399
pdf bib Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans
Katrin Ortmann
pp. 1400‑1407
pdf bib Probing Pre-trained Auto-regressive Language Models for Named Entity Typing and Recognition
Elena V. Epure and Romain Hennequin
pp. 1408‑1417
pdf bib Frustratingly Easy Performance Improvements for Low-resource Setups: A Tale on BERT and Segment Embeddings
Rob van der Goot, Max Müller-Eberstein and Barbara Plank
pp. 1418‑1427
pdf bib The Subject Annotations of the Danish Parliament Corpus (2009-2017) - Evaluated with Automatic Multi-label Classification
Costanza Navarretta and Dorte Haltrup Hansen
pp. 1428‑1436
pdf bib A Systematic Study Reveals Unexpected Interactions in Pre-Trained Neural Machine Translation
Ashleigh Richardson and Janet Wiles
pp. 1437‑1443
pdf bib Holistic Evaluation of Automatic TimeML Annotators
Mustafa Ocal, Adrian Perez, Antonela Radas and Mark Finlayson
pp. 1444‑1453
pdf bib Measuring Uncertainty in Translation Quality Evaluation (TQE)
Serge Gladkoff, Irina Sorokina, Lifeng Han and Alexandra Alekseeva
pp. 1454‑1461
pdf bib Challenging the Transformer-based models with a Classical Arabic dataset: Quran and Hadith
Shatha Altammami and Eric Atwell
pp. 1462‑1471
pdf bib Question Modifiers in Visual Question Answering
William Britton, Somdeb Sarkhel and Deepak Venugopal
pp. 1472‑1479
pdf bib Multimodal Pipeline for Collection of Misinformation Data from Telegram
Jose Sosa and Serge Sharoff
pp. 1480‑1489
pdf bib Identifying Tension in Holocaust Survivors’ Interview: Code-switching/Code-mixing as Cues
Xinyuan Xia, Lu Xiao, Kun Yang and Yueyue Wang
pp. 1490‑1495
pdf bib Fine-tuning vs From Scratch: Do Vision & Language Models Have Similar Capabilities on Out-of-Distribution Visual Question Answering?
Kristian Nørgaard Jensen and Barbara Plank
pp. 1496‑1508
pdf bib Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset
Svetla Koeva, Ivelina Stoyanova and Jordan Kralev
pp. 1509‑1518
pdf bib Sign Language Production With Avatar Layering: A Critical Use Case over Rare Words
Jung-Ho Kim, Eui Jun Hwang, Sukmin Cho, Du Hui Lee and Jong Park
pp. 1519‑1528
pdf bib The VoxWorld Platform for Multimodal Embodied Agents
Nikhil Krishnaswamy, William Pickard, Brittany Cates, Nathaniel Blanchard and James Pustejovsky
pp. 1529‑1541
pdf bib MemoSen: A Multimodal Dataset for Sentiment Analysis of Memes
Eftekhar Hossain, Omar Sharif and Mohammed Moshiul Hoque
pp. 1542‑1554
pdf bib RUSAVIC Corpus: Russian Audio-Visual Speech in Cars
Denis Ivanko, Alexandr Axyonov, Dmitry Ryumin, Alexey Kashevnik and Alexey Karpov
pp. 1555‑1559
pdf bib A First Corpus of AZee Discourse Expressions
Camille Challant and Michael Filhol
pp. 1560‑1565
pdf bib BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment
Luis Lebron, Yvette Graham, Kevin McGuinness, Konstantinos Kouramas and Noel E. O’Connor
pp. 1566‑1575
pdf bib Abstract Meaning Representation for Gesture
Richard Brutti, Lucia Donatelli, Kenneth Lai and James Pustejovsky
pp. 1576‑1583
pdf bib The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild
Taja Kuzman, Peter Rupnik and Nikola Ljubešić
pp. 1584‑1594
pdf bib The Spoken Language Understanding MEDIA Benchmark Dataset in the Era of Deep Learning: Data Updates, Training and Evaluation Tools
Gaëlle Laperrière, Valentin Pelloin, Antoine Caubrière, Salima Mdhaffar, Nathalie Camelin, Sahar Ghannay, Bassam Jabaian and Yannick Estève
pp. 1595‑1602
pdf bib BasqueGLUE: A Natural Language Understanding Benchmark for Basque
Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri and Aitor Soroa
pp. 1603‑1612
pdf bib Resources and Experiments on Sentiment Classification for Georgian
Nicolas Stefanovitch, Jakub Piskorski and Sopho Kharazi
pp. 1613‑1621
pdf bib CoFiF Plus: A French Financial Narrative Summarisation Corpus
Nadhem Zmandar, Tobias Daudert, Sina Ahmadi, Mahmoud El-Haj and Paul Rayson
pp. 1622‑1639
pdf bib Generating Extended and Multilingual Summaries with Pre-trained Transformers
Rémi Calizzano, Malte Ostendorff, Qian Ruan and Georg Rehm
pp. 1640‑1650
pdf bib MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases
Louis Martin, Angela Fan, Éric de la Clergerie, Antoine Bordes and Benoît Sagot
pp. 1651‑1664
pdf bib Towards Understanding Gender-Seniority Compound Bias in Natural Language Generation
Samhita Honnavalli, Aesha Parekh, Lily Ou, Sophie Groenwold, Sharon Levy, Vicente Ordonez and William Yang Wang
pp. 1665‑1670
pdf bib Combining ELECTRA and Adaptive Graph Encoding for Frame Identification
Fabio Tamburini
pp. 1671‑1679
pdf bib Polysemy in Spoken Conversations and Written Texts
Aina Garí Soler, Matthieu Labeau and Chloé Clavel
pp. 1680‑1690
pdf bib Cross-Level Semantic Similarity for Serbian Newswire Texts
Vuk Batanović and Maja Miličević Petrović
pp. 1691‑1699
pdf bib Universal Proposition Bank 2.0
Ishan Jindal, Alexandre Rademaker, Michał Ulewicz, Ha Linh, Huyen Nguyen, Khoi-Nguyen Tran, Huaiyu Zhu and Yunyao Li
pp. 1700‑1711
pdf bib The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts
Nora Hollenstein, Maria Barrett and Marina Björnsdóttir
pp. 1712‑1720
pdf bib The Brooklyn Multi-Interaction Corpus for Analyzing Variation in Entrainment Behavior
Andreas Weise, Matthew McNeill and Rivka Levitan
pp. 1721‑1731
pdf bib Pro-TEXT: an Annotated Corpus of Keystroke Logs
Aleksandra Miletic, Christophe Benzitoun, Georgeta Cislaru and Santiago Herrera-Yanez
pp. 1732‑1739
pdf bib Work Hard, Play Hard: Collecting Acceptability Annotations through a 3D Game
Federico Bonetti, Elisa Leonardelli, Daniela Trotta, Raffaele Guarasci and Sara Tonelli
pp. 1740‑1750
pdf bib DiHuTra: a Parallel Corpus to Analyse Differences between Human Translations
Ekaterina Lapshinova-Koltunski, Maja Popović and Maarit Koponen
pp. 1751‑1760
pdf bib Data Expansion Using WordNet-based Semantic Expansion and Word Disambiguation for Cyberbullying Detection
Md Saroar Jahan, Djamila Romaissa Beddiar, Mourad Oussalah and Muhidin Mohamed
pp. 1761‑1770
pdf bib ALIGNMEET: A Comprehensive Tool for Meeting Annotation, Alignment, and Evaluation
Peter Polák, Muskaan Singh, Anna Nedoluzhko and Ondřej Bojar
pp. 1771‑1779
pdf bib KSoF: The Kassel State of Fluency Dataset – A Therapy Centered Dataset of Stuttering
Sebastian Bayerl, Alexander Wolff von Gudenberg, Florian Hönig, Elmar Noeth and Korbinian Riedhammer
pp. 1780‑1787
pdf bib EZCAT: an Easy Conversation Annotation Tool
Gaël Guibon, Luce Lefeuvre, Matthieu Labeau and Chloé Clavel
pp. 1788‑1797
pdf bib Spoken Language Treebanks in Universal Dependencies: an Overview
Kaja Dobrovoljc
pp. 1798‑1806
pdf bib LeConTra: A Learner Corpus of English-to-Dutch News Translation
Bram Vanroy and Lieve Macken
pp. 1807‑1816
pdf bib Annotating Attribution in Czech News Server Articles
Barbora Hladka, Jiří Mírovský, Matyáš Kopp and Václav Moravec
pp. 1817‑1823
pdf bib Xposition: An Online Multilingual Database of Adpositional Semantics
Luke Gessler, Nathan Schneider, Joseph C. Ledford and Austin Blodgett
pp. 1824‑1830
pdf bib A Study in Contradiction: Data and Annotation for AIDA Focusing on Informational Conflict in Russia-Ukraine Relations
Jennifer Tracey, Ann Bies, Jeremy Getman, Kira Griffitt and Stephanie Strassel
pp. 1831‑1838
pdf bib Annotating Verbal Multiword Expressions in Arabic: Assessing the Validity of a Multilingual Annotation Procedure
Najet Hadj Mohamed, Cherifa Ben Khelil, Agata Savary, Iskandar Keskes, Jean-Yves Antoine and Lamia Hadrich-Belguith
pp. 1839‑1848
pdf bib Annotation of Communicative Functions of Short Feedback Tokens in Switchboard
Carol Figueroa, Adaeze Adigwe, Magalie Ochs and Gabriel Skantze
pp. 1849‑1859
pdf bib A Dataset of Offensive Language in Kosovo Social Media
Adem Ajvazi and Christian Hardmeier
pp. 1860‑1869
pdf bib The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses
Bashar Alhafni, Nizar Habash and Houda Bouamor
pp. 1870‑1884
pdf bib The Engage Corpus: A Social Media Dataset for Text-Based Recommender Systems
Daniel Cheng, Kyle Yan, Phillip Keung and Noah A. Smith
pp. 1885‑1889
pdf bib Annotating Arguments in a Corpus of Opinion Articles
Gil Rocha, Luís Trigo, Henrique Lopes Cardoso, Rui Sousa-Silva, Paula Carvalho, Bruno Martins and Miguel Won
pp. 1890‑1899
pdf bib German Parliamentary Corpus (GerParCor)
Giuseppe Abrami, Mevlüt Bagci, Leon Hammerla and Alexander Mehler
pp. 1900‑1906
pdf bib NerKor+Cars-OntoNotes++
Attila Novák and Borbála Novák
pp. 1907‑1916
pdf bib A Comparative Cross Language View On Acted Databases Portraying Basic Emotions Utilising Machine Learning
Felix Burkhardt, Anabell Hacker, Uwe Reichel, Hagen Wierstorf, Florian Eyben and Björn Schuller
pp. 1917‑1924
pdf bib Nkululeko: A Tool For Rapid Speaker Characteristics Detection
Felix Burkhardt, Johannes Wagner, Hagen Wierstorf, Florian Eyben and Björn Schuller
pp. 1925‑1932
pdf bib Speech Aerodynamics Database, Tools and Visualisation
Shi Yu, Clara Ponchard, Roland Trouville, Sergio Hassid and Didier Demolin
pp. 1933‑1938
pdf bib PATATRA and PATAFreq: two French databases for the documentation of within-speaker variability in speech
Cécile Fougeron, Nicolas Audibert, Cedric Gendrot, Estelle Chardenon and Louise Wohmann
pp. 1939‑1944
pdf bib The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition
Jonathan Mukiibi, Andrew Katumba, Joyce Nakatumba-Nabende, Ali Hussein and Joshua Meyer
pp. 1945‑1954
pdf bib Far-Field Speaker Recognition Benchmark Derived From The DiPCo Corpus
Mickael Rouvier and Mohammad Mohammadamini
pp. 1955‑1959
pdf bib Evaluating Sampling-based Filler Insertion with Spontaneous TTS
Siyang Wang, Joakim Gustafson and Éva Székely
pp. 1960‑1969
pdf bib BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
Peter Mihajlik, Andras Balog, Tekla Etelka Graczi, Anna Kohari, Balázs Tarján and Katalin Mady
pp. 1970‑1977
pdf bib SNuC: The Sheffield Numbers Spoken Language Corpus
Emma Barker, Jon Barker, Robert Gaizauskas, Ning Ma and Monica Lestari Paramita
pp. 1978‑1984
pdf bib The ManDi Corpus: A Spoken Corpus of Mandarin Regional Dialects
Liang Zhao and Eleanor Chodroff
pp. 1985‑1990
pdf bib The Speed-Vel Project: a Corpus of Acoustic and Aerodynamic Data to Measure Droplets Emission During Speech Interaction
Francesca Carbone, Gilles Bouchet, Alain Ghio, Thierry Legou, Carine André, Muriel Lalain, Sabrina Kadri, Caterina Petrone, Federica Procino and Antoine Giovanni
pp. 1991‑1999
pdf bib Towards Speech-only Opinion-level Sentiment Analysis
Annalena Aicher, Alisa Gazizullina, Aleksei Gusev, Yuri Matveev and Wolfgang Minker
pp. 2000‑2006
pdf bib At the Intersection of NLP and Sustainable Development: Exploring the Impact of Demographic-Aware Text Representations in Modeling Value on a Corpus of Interviews
Goya van Boven, Stephanie Hirmer and Costanza Conforti
pp. 2007‑2021
pdf bib A Study on the Ambiguity in Human Annotation of German Oral History Interviews for Perceived Emotion Recognition and Sentiment Analysis
Michael Gref, Nike Matthiesen, Sreenivasa Hikkal Venugopala, Shalaka Satheesh, Aswinkumar Vijayananth, Duc Bach Ha, Sven Behnke and Joachim Köhler
pp. 2022‑2031
pdf bib Detecting Optimism in Tweets using Knowledge Distillation and Linguistic Analysis of Optimism
Ștefan Cobeli, Ioan-Bogdan Iordache, Shweta Yadav, Cornelia Caragea, Liviu P. Dinu and Dragoș Iliescu
pp. 2032‑2041
pdf bib Dataset and Baseline for Automatic Student Feedback Analysis
Missaka Herath, Kushan Chamindu, Hashan Maduwantha and Surangika Ranathunga
pp. 2042‑2049
pdf bib EENLP: Cross-lingual Eastern European NLP Index
Alexey Tikhonov, Alex Malkhasov, Andrey Manoshin, George-Andrei Dima, Réka Cserháti, Md. Sadek Hossain Asif and Matt Sárdi
pp. 2050‑2057
pdf bib Slovene SuperGLUE Benchmark: Translation and Evaluation
Aleš Žagar and Marko Robnik-Šikonja
pp. 2058‑2065
pdf bib Speech Resources in the Tamasheq Language
Marcely Zanon Boito, Fethi Bougares, Florentin Barbier, Souhir Gahbiche, Loïc Barrault, Mickael Rouvier and Yannick Estève
pp. 2066‑2071
pdf bib Aesop’s fable "The North Wind and the Sun" Used as a Rosetta Stone to Extract and Map Spoken Words in Under-resourced Languages
Elena Knyazeva, Philippe Boula de Mareüil and Frédéric Vernier
pp. 2072‑2079
pdf bib Multilingual Open Text Release 1: Public Domain News in 44 Languages
Chester Palen-Michel, June Kim and Constantine Lignos
pp. 2080‑2089
pdf bib TweetTaglish: A Dataset for Investigating Tagalog-English Code-Switching
Megan Herrera, Ankit Aich and Natalie Parde
pp. 2090‑2097
pdf bib Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking
Luis Chiruzzo, Santiago Góngora, Aldo Alvarez, Gustavo Giménez-Lugo, Marvin Agüero-Torales and Yliana Rodríguez
pp. 2098‑2107
pdf bib Assessing Multilinguality of Publicly Accessible Websites
Rinalds Vīksna, Inguna Skadiņa, Raivis Skadiņš, Andrejs Vasiļjevs and Roberts Rozis
pp. 2108‑2116
pdf bib A Methodology for Building a Diachronic Dataset of Semantic Shifts and its Application to QC-FR-Diac-V1.0, a Free Reference for French
David Kletz, Philippe Langlais, François Lareau and Patrick Drouin
pp. 2117‑2125
pdf bib CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models
Jörg Frohberg and Frank Binder
pp. 2126‑2140
pdf bib Evaluating Gender Bias in Speech Translation
Marta R. Costa-jussà, Christine Basta and Gerard I. Gállego
pp. 2141‑2147
pdf bib Design Choices in Crowdsourcing Discourse Relation Annotations: The Effect of Worker Selection and Training
Merel Scholman, Valentina Pyatkin, Frances Yung, Ido Dagan, Reut Tsarfaty and Vera Demberg
pp. 2148‑2156
pdf bib TBD3: A Thresholding-Based Dynamic Depression Detection from Social Media for Low-Resource Users
Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian and Ophir Frieder
pp. 2157‑2165
pdf bib SpecNFS: A Challenge Dataset Towards Extracting Formal Models from Natural Language Specifications
Sayontan Ghosh, Amanpreet Singh, Alex Merenstein, Wei Su, Scott A. Smolka, Erez Zadok and Niranjan Balasubramanian
pp. 2166‑2176
pdf bib Argument Similarity Assessment in German for Intelligent Tutoring: Crowdsourced Dataset and First Experiments
Xiaoyu Bai and Manfred Stede
pp. 2177‑2187
pdf bib Leveraging Pre-trained Language Models for Gender Debiasing
Nishtha Jain, Declan Groves, Lucia Specia and Maja Popović
pp. 2188‑2195
pdf bib Unsupervised Embeddings with Graph Auto-Encoders for Multi-domain and Multilingual Hate Speech Detection
Gretel Liz De la Peña Sarracén and Paolo Rosso
pp. 2196‑2204
pdf bib FQuAD2.0: French Question Answering and Learning When You Don’t Know
Quentin Heinrich, Gautier Viaud and Wacim Belblidia
pp. 2205‑2214
pdf bib Large-Scale Hate Speech Detection with Cross-Domain Transfer
Cagri Toraman, Furkan Şahinuç and Eyup Yilmaz
pp. 2215‑2225
pdf bib GLoHBCD: A Naturalistic German Dataset for Language of Health Behaviour Change on Online Support Forums
Selina Meyer and David Elsweiler
pp. 2226‑2235
pdf bib Creating a Data Set of Abstractive Summaries of Turn-labeled Spoken Human-Computer Conversations
Iris Hendrickx
pp. 2236‑2244
pdf bib OpenEL: An Annotated Corpus for Entity Linking and Discourse in Open Domain Dialogue
Wen Cui, Leanne Rolston, Marilyn Walker and Beth Ann Hockey
pp. 2245‑2256
pdf bib Collecting Visually-Grounded Dialogue with A Game Of Sorts
Bram Willemsen, Dmytro Kalpakchi and Gabriel Skantze
pp. 2257‑2268
pdf bib CoRoSeOf - An Annotated Corpus of Romanian Sexist and Offensive Tweets
Diana Constantina Hoefels, Çağrı Çöltekin and Irina Diana Mădroane
pp. 2269‑2281
pdf bib ArMIS - The Arabic Misogyny and Sexism Corpus with Annotator Subjective Disagreements
Dina Almanea and Massimo Poesio
pp. 2282‑2291
pdf bib Annotating Interruption in Dyadic Human Interaction
Liu Yang, Catherine Achard and Catherine Pelachaud
pp. 2292‑2297
pdf bib The Causal News Corpus: Annotating Causal Relations in Event Sentences from News
Fiona Anting Tan, Ali Hürriyetoğlu, Tommaso Caselli, Nelleke Oostdijk, Tadashi Nomoto, Hansi Hettiarachchi, Iqra Ameer, Onur Uca, Farhana Ferdousi Liza and Tiancheng Hu
pp. 2298‑2310
pdf bib Samrómur: Crowd-sourcing large amounts of data
Staffan Hedström, David Erik Mollberg, Ragnheiður Þórhallsdóttir and Jón Guðnason
pp. 2311‑2316
pdf bib An Annotated Corpus of Textual Explanations for Clinical Decision Support
Roland Roller, Aljoscha Burchardt, Nils Feldhus, Laura Seiffe, Klemens Budde, Simon Ronicke and Bilgin Osmanodja
pp. 2317‑2326
pdf bib LARD: Large-scale Artificial Disfluency Generation
Tatiana Passali, Thanassis Mavropoulos, Grigorios Tsoumakas, Georgios Meditskos and Stefanos Vrochidis
pp. 2327‑2336
pdf bib The CRECIL Corpus: a New Dataset for Extraction of Relations between Characters in Chinese Multi-party Dialogues
Yuru Jiang, Yang Xu, Yuhang Zhan, Weikai He, Yilin Wang, Zixuan Xi, Meiyun Wang, Xinyu Li, Yu Li and Yanchao Yu
pp. 2337‑2344
pdf bib The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic
Dana Abdulrahim, Go Inoue, Latifa Shamsan, Salam Khalifa and Nizar Habash
pp. 2345‑2352
pdf bib A Universal Dependencies Treebank of Ancient Hebrew
Daniel Swanson and Francis Tyers
pp. 2353‑2361
pdf bib Hate Speech Dynamics Against African descent, Roma and LGBTQI Communities in Portugal
Paula Carvalho, Bernardo Cunha, Raquel Santos, Fernando Batista and Ricardo Ribeiro
pp. 2362‑2370
pdf bib Evolving Large Text Corpora: Four Versions of the Icelandic Gigaword Corpus
Starkaður Barkarson, Steinþór Steingrímsson and Hildur Hafsteinsdóttir
pp. 2371‑2381
pdf bib A Pragmatics-Centered Evaluation Framework for Natural Language Understanding
Damien Sileo, Philippe Muller, Tim Van de Cruys and Camille Pradel
pp. 2382‑2394
pdf bib Conversational Analysis of Daily Dialog Data using Polite Emotional Dialogue Acts
Chandrakant Bothe and Stefan Wermter
pp. 2395‑2400
pdf bib Inducing Discourse Marker Inventories from Lexical Knowledge Graphs
Christian Chiarcos
pp. 2401‑2412
pdf bib Story Trees: Representing Documents using Topological Persistence
Pantea Haghighatkhah, Antske Fokkens, Pia Sommerauer, Bettina Speckmann and Kevin Verbeek
pp. 2413‑2429
pdf bib Extracting and Analysing Metaphors in Migration Media Discourse: towards a Metaphor Annotation Scheme
Ana Zwitter Vitez, Mojca Brglez, Marko Robnik Šikonja, Tadej Škvorc, Andreja Vezovnik and Senja Pollak
pp. 2430‑2439
pdf bib DDisCo: A Discourse Coherence Dataset for Danish
Linea Flansmose Mikkelsen, Oliver Kinch, Anders Jess Pedersen and Ophélie Lacroix
pp. 2440‑2445
pdf bib LPAttack: A Feasible Annotation Scheme for Capturing Logic Pattern of Attacks in Arguments
Farjana Sultana Mim, Naoya Inoue, Shoichi Naito, Keshav Singh and Kentaro Inui
pp. 2446‑2459
pdf bib BeSt: The Belief and Sentiment Corpus
Jennifer Tracey, Owen Rambow, Claire Cardie, Adam Dalton, Hoa Trang Dang, Mona Diab, Bonnie Dorr, Louise Guthrie, Magdalena Markowska, Smaranda Muresan, Vinodkumar Prabhakaran, Samira Shaikh and Tomek Strzalkowski
pp. 2460‑2467
pdf bib MOTIF: Contextualized Images for Complex Words to Improve Human Reading
Xintong Wang, Florian Schneider, Özge Alacam, Prateek Chaudhury and Chris Biemann
pp. 2468‑2477
pdf bib Challenges with Sign Language Datasets for Sign Language Recognition and Translation
Mirella De Sisto, Vincent Vandeghinste, Santiago Egea Gómez, Mathieu De Coster, Dimitar Shterionov and Horacio Saggion
pp. 2478‑2487
pdf bib A Low-Cost Motion Capture Corpus in French Sign Language for Interpreting Iconicity and Spatial Referencing Mechanisms
Clémence Mertz, Vincent Barreaud, Thibaut Le Naour, Damien Lolive and Sylvie Gibet
pp. 2488‑2497
pdf bib The CLAMS Platform at Work: Processing Audiovisual Data from the American Archive of Public Broadcasting
Marc Verhagen, Kelley Lynch, Kyeongmin Rim and James Pustejovsky
pp. 2498‑2506
pdf bib BU-NEmo: an Affective Dataset of Gun Violence News
Carley Reardon, Sejin Paik, Ge Gao, Meet Parekh, Yanling Zhao, Lei Guo, Margrit Betke and Derry Tanti Wijaya
pp. 2507‑2516
pdf bib RoomReader: A Multimodal Corpus of Online Multiparty Conversational Interactions
Justine Reverdy, Sam O’Connor Russell, Louise Duquenne, Diego Garaialde, Benjamin R. Cowan and Naomi Harte
pp. 2517‑2527
pdf bib Quevedo: Annotation and Processing of Graphical Languages
Antonio F. G. Sevilla, Alberto Díaz Esteban and José María Lahoz-Bengoechea
pp. 2528‑2535
pdf bib Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel’s Weekly Video Podcasts
Debjoy Saha, Shravan Nayak and Timo Baumann
pp. 2536‑2540
pdf bib Crowdsourcing Kazakh-Russian Sign Language: FluentSigners-50
Medet Mukushev, Aigerim Kydyrbekova, Alfarabi Imashev, Vadim Kimmelman and Anara Sandygulova
pp. 2541‑2547
pdf bib Connecting a French Dictionary from the Beginning of the 20th Century to Wikidata
Pierre Nugues
pp. 2548‑2555
pdf bib Metaphor annotation for German
Markus Egg and Valia Kordoni
pp. 2556‑2562
pdf bib NorDiaChange: Diachronic Semantic Change Dataset for Norwegian
Andrey Kutuzov, Samia Touileb, Petter Mæhlum, Tita Enstad and Alexandra Wittemann
pp. 2563‑2572
pdf bib Exploring Transformers for Ranking Portuguese Semantic Relations
Hugo Gonçalo Oliveira
pp. 2573‑2582
pdf bib Building Static Embeddings from Contextual Ones: Is It Useful for Building Distributional Thesauri?
Olivier Ferret
pp. 2583‑2590
pdf bib Sentence Selection Strategies for Distilling Word Embeddings from BERT
Yixiao Wang, Zied Bouraoui, Luis Espinosa Anke and Steven Schockaert
pp. 2591‑2600
pdf bib DiaWUG: A Dataset for Diatopic Lexical Semantic Variation in Spanish
Gioia Baldissin, Dominik Schlechtweg and Sabine Schulte im Walde
pp. 2601‑2609
pdf bib My Case, For an Adposition: Lexical Polysemy of Adpositions and Case Markers in Finnish and Latin
Daniel Chen and Mans Hulden
pp. 2610‑2616
pdf bib WiC-TSV-de: German Word-in-Context Target-Sense-Verification Dataset and Cross-Lingual Transfer Analysis
Anna Breit, Artem Revenko and Narayani Blaschke
pp. 2617‑2625
pdf bib Re-train or Train from Scratch? Comparing Pre-training Strategies of BERT in the Medical Domain
Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne and Pierre Zweigenbaum
pp. 2626‑2633
pdf bib Universal Semantic Annotator: the First Unified API for WSD, SRL and Semantic Parsing
Riccardo Orlando, Simone Conia, Stefano Faralli and Roberto Navigli
pp. 2634‑2641
pdf bib D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research
Jan Philip Wahle, Terry Ruas, Saif Mohammad and Bela Gipp
pp. 2642‑2651
pdf bib SciPar: A Collection of Parallel Corpora from Scientific Abstracts
Dimitrios Roussis, Vassilis Papavassiliou, Prokopis Prokopidis, Stelios Piperidis and Vassilis Katsouros
pp. 2652‑2657
pdf bib CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic Terms
Martha Gavidia, Patrick Lee, Anna Feldman and Jing Peng
pp. 2658‑2671
pdf bib Camel Treebank: An Open Multi-genre Arabic Dependency Treebank
Nizar Habash, Muhammed AbuOdeh, Dima Taji, Reem Faraj, Jamila El Gizuli and Omar Kallas
pp. 2672‑2681
pdf bib MentSum: A Resource for Exploring Summarization of Mental Health Online Posts
Sajad Sotudeh, Nazli Goharian and Zachary Young
pp. 2682‑2692
pdf bib Klexikon: A German Dataset for Joint Summarization and Simplification
Dennis Aumiller and Michael Gertz
pp. 2693‑2701
pdf bib Applying Automatic Text Summarization for Fake News Detection
Philipp Hartl and Udo Kruschwitz
pp. 2702‑2713
pdf bib Increasing CMDI’s Semantic Interoperability with schema.org
Nino Meisinger, Thorsten Trippel and Claus Zinn
pp. 2714‑2720
pdf bib RefCo and its Checker: Improving Language Documentation Corpora’s Reusability Through a Semi-Automatic Review Process
Herbert Lange and Jocelyn Aznar
pp. 2721‑2729
pdf bib Identification and Analysis of Personification in Hungarian: The PerSECorp project
Gábor Simon
pp. 2730‑2738
pdf bib ISO-based Annotated Multilingual Parallel Corpus for Discourse Markers
Purificação Silvano, Mariana Damova, Giedrė Valūnaitė Oleškevičienė, Chaya Liebeskind, Christian Chiarcos, Dimitar Trajanov, Ciprian-Octavian Truică, Elena-Simona Apostol and Anna Baczkowska
pp. 2739‑2749
pdf bib LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild
David Gimeno-Gómez and Carlos-D. Martínez-Hinarejos
pp. 2750‑2758
pdf bib Modality Alignment between Deep Representations for Effective Video-and-Language Learning
Hyeongu Yun, Yongil Kim and Kyomin Jung
pp. 2759‑2770
pdf bib Mutual Gaze and Linguistic Repetition in a Multimodal Corpus
Anais Murat, Maria Koutsombogera and Carl Vogel
pp. 2771‑2780
pdf bib Multidimensional Coding of Multimodal Languaging in Multi-Party Settings
Christophe Parisse, Marion Blondel, Stéphanie Caët, Claire Danet, Coralie Vincent and Aliyah Morgenstern
pp. 2781‑2787
pdf bib Constructing a Lexical Resource of Russian Derivational Morphology
Lukáš Kyjánek, Olga Lyashevskaya, Anna Nedoluzhko, Daniil Vodolazsky and Zdeněk Žabokrtský
pp. 2788‑2797
pdf bib Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship
Temuulen Khishigsuren, Gábor Bella, Khuyagbaatar Batsuren, Abed Alhakim Freihat, Nandu Chandran Nair, Amarsanaa Ganbold, Hadi Khalilia, Yamini Chandrashekar and Fausto Giunchiglia
pp. 2798‑2807
pdf bib Towards Latvian WordNet
Peteris Paikens, Mikus Grasmanis, Agute Klints, Ilze Lokmane, Lauma Pretkalniņa, Laura Rituma, Madara Stāde and Laine Strankale
pp. 2808‑2815
pdf bib Building Sentiment Lexicons for Mainland Scandinavian Languages Using Machine Translation and Sentence Embeddings
Peng Liu, Cristina Marco and Jon Atle Gulla
pp. 2816‑2825
pdf bib A Thesaurus-based Sentiment Lexicon for Danish: The Danish Sentiment Lexicon
Sanni Nimb, Sussi Olsen, Bolette Pedersen and Thomas Troelsgård
pp. 2826‑2832
pdf bib IndoUKC: A Concept-Centered Indian Multilingual Lexical Resource
Nandu Chandran Nair, Rajendran S. Velayuthan, Yamini Chandrashekar, Gábor Bella and Fausto Giunchiglia
pp. 2833‑2840
pdf bib Korean Language Modeling via Syntactic Guide
Hyeondey Kim, Seonhoon Kim, Inho Kang, Nojun Kwak and Pascale Fung
pp. 2841‑2849
pdf bib A Whole-Person Function Dictionary for the Mobility, Self-Care and Domestic Life Domains: a Seedset Expansion Approach
Ayah Zirikly, Bart Desmet, Julia Porcino, Jonathan Camacho Maldonado, Pei-Shu Ho, Rafael Jimenez Silva and Maryanne Sacco
pp. 2850‑2855
pdf bib Placing multi-modal, and multi-lingual Data in the Humanities Domain on the Map: the Mythotopia Geo-tagged Corpus
Voula Giouli, Anna Vacalopoulou, Nikolaos Sidiropoulos, Christina Flouda, Athanasios Doupas, Giorgos Giannopoulos, Nikos Bikakis, Vassilis Kaffes and Gregory Stainhaouer
pp. 2856‑2864
pdf bib An Architecture of resolving a multiple link path in a standoff-style data format to enhance the mobility of language resources
Kazushi Ohya
pp. 2865‑2873
pdf bib A Corpus of German Citizen Contributions in Mobility Planning: Supporting Evaluation Through Multidimensional Classification
Julia Romberg, Laura Mark and Tobias Escher
pp. 2874‑2883
pdf bib Overlooked Data in Typological Databases: What Grambank Teaches Us About Gaps in Grammars
Jakob Lesage, Hannah J. Haynie, Hedvig Skirgård, Tobias Weber and Alena Witzlack-Makarevich
pp. 2884‑2890
pdf bib Hong Kong: Longitudinal and Synchronic Characterisations of Protest News between 1998 and 2020
Arya D. McCarthy and Giovanna Maria Dora Dore
pp. 2891‑2900
pdf bib Nunc profana tractemus. Detecting Code-Switching in a Large Corpus of 16th Century Letters
Martin Volk, Lukas Fischer, Patricia Scheurer, Bernard Silvan Schroffenegger, Raphael Schwitter, Phillip Ströbel and Benjamin Suter
pp. 2901‑2908
pdf bib Quality and Efficiency of Manual Annotation: Pre-annotation Bias
Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková and Jan Hajic
pp. 2909‑2918
pdf bib A Comprehensive Evaluation and Correction of the TimeBank Corpus
Mustafa Ocal, Antonela Radas, Jared Hummer, Karine Megerdoomian and Mark Finlayson
pp. 2919‑2927
pdf bib Evaluating Multilingual Sentence Representation Models in a Real Case Scenario
Rocco Tripodi, Rexhina Blloshmi and Simon Levis Sullam
pp. 2928‑2939
pdf bib Validity, Agreement, Consensuality and Annotated Data Quality
Anaëlle Baledent, Yann Mathet, Antoine Widlöcher, Christophe Couronne and Jean-Luc Manguin
pp. 2940‑2948
pdf bib Impact Analysis of the Use of Speech and Language Models Pretrained by Self-Supervision for Spoken Language Understanding
Salima Mdhaffar, Valentin Pelloin, Antoine Caubrière, Gaëlle Laperrière, Sahar Ghannay, Bassam Jabaian, Nathalie Camelin and Yannick Estève
pp. 2949‑2956
pdf bib JGLUE: Japanese General Language Understanding Evaluation
Kentaro Kurihara, Daisuke Kawahara and Tomohide Shibata
pp. 2957‑2966
pdf bib Using the LARA Little Prince to compare human and TTS audio quality
Elham Akhlaghi, Ingibjörg Iða Auðunardóttir, Anna Bączkowska, Branislav Bédi, Hakeem Beedar, Harald Berthelsen, Cathy Chua, Catia Cucchiarin, Hanieh Habibi, Ivana Horváthová, Junta Ikeda, Christèle Maizonniaux, Neasa Ní Chiaráin, Chadi Raheb, Manny Rayner, John Sloan, Nikos Tsourakis and Chunlin Yao
pp. 2967‑2975
pdf bib Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations
Chris Emmery, Ákos Kádár, Grzegorz Chrupała and Walter Daelemans
pp. 2976‑2988
pdf bib Constructing Distributions of Variation in Referring Expression Type from Corpora for Model Evaluation
T. Mark Ellison and Fahime Same
pp. 2989‑2997
pdf bib Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis
Aleksandr Perevalov, Xi Yan, Liubov Kovriguina, Longquan Jiang, Andreas Both and Ricardo Usbeck
pp. 2998‑3007
pdf bib Multi-Task Learning for Cross-Lingual Abstractive Summarization
Sho Takase and Naoaki Okazaki
pp. 3008‑3016
pdf bib How Much Context Span is Enough? Examining Context-Related Issues for Document-level MT
Sheila Castilho
pp. 3017‑3025
pdf bib TANDO: A Corpus for Document-level Machine Translation
Harritxu Gete, Thierry Etchegoyhen, David Ponce, Gorka Labaka, Nora Aranberri, Ander Corral, Xabier Saralegi, Igor Ellakuria and Maite Martin
pp. 3026‑3037
pdf bib Unsupervised Machine Translation in Real-World Scenarios
Ona de Gibert Bonet, Iakes Goenaga, Jordi Armengol-Estapé, Olatz Perez-de-Viñaspre, Carla Parra Escartín, Marina Sanchez, Mārcis Pinnis, Gorka Labaka and Maite Melero
pp. 3038‑3047
pdf bib COVID-19 Mythbusters in World Languages
Mana Ashida, Jin-Dong Kim and Lee Seunghun
pp. 3048‑3055
pdf bib On the Multilingual Capabilities of Very Large-Scale English Language Models
Jordi Armengol-Estapé, Ona de Gibert Bonet and Maite Melero
pp. 3056‑3068
pdf bib Evaluating Subtitle Segmentation for End-to-end Generation Systems
Alina Karakanta, François Buet, Mauro Cettolo and François Yvon
pp. 3069‑3078
pdf bib Using Semantic Role Labeling to Improve Neural Machine Translation
Reinhard Rapp
pp. 3079‑3083
pdf bib A Deep Transfer Learning Method for Cross-Lingual Natural Language Inference
Dibyanayan Bandyopadhyay, Arkadipta De, Baban Gain, Tanik Saikh and Asif Ekbal
pp. 3084‑3092
pdf bib Simple TICO-19: A Dataset for Joint Translation and Simplification of COVID-19 Texts
Matthew Shardlow and Fernando Alva-Manchego
pp. 3093‑3102
pdf bib Building Comparable Corpora for Assessing Multi-Word Term Alignment
Omar Adjali, Emmanuel Morin and Pierre Zweigenbaum
pp. 3103‑3112
pdf bib Mean Machine Translations: On Gender Bias in Icelandic Machine Translations
Agnes Sólmundsdóttir, Dagbjört Guðmundsdóttir, Lilja Björk Stefánsdóttir and Anton Ingason
pp. 3113‑3121
pdf bib An Analysis of Dialogue Act Sequence Similarity Across Multiple Domains
Ayesha Enayet and Gita Sukthankar
pp. 3122‑3130
pdf bib Constructing a Culinary Interview Dialogue Corpus with Video Conferencing Tool
Taro Okahisa, Ribeka Tanaka, Takashi Kodama, Yin Jou Huang and Sadao Kurohashi
pp. 3131‑3139
pdf bib UgChDial: A Uyghur Chat-based Dialogue Corpus for Response Space Classification
Zulipiye Yusupujiang and Jonathan Ginzburg
pp. 3140‑3149
pdf bib A Speculative and Tentative Common Ground Handling for Efficient Composition of Uncertain Dialogue
Saki Sudo, Kyoshiro Asano, Koh Mitsuda, Ryuichiro Higashinaka and Yugo Takeuchi
pp. 3150‑3157
pdf bib BaSCo: An Annotated Basque-Spanish Code-Switching Corpus for Natural Language Understanding
Maia Aguirre, Laura García-Sardiña, Manex Serras, Ariane Méndez and Jacobo López
pp. 3158‑3163
pdf bib ProDial – An Annotated Proactive Dialogue Act Corpus for Conversational Assistants using Crowdsourcing
Matthias Kraus, Nicolas Wagner and Wolfgang Minker
pp. 3164‑3173
pdf bib ELITR Minuting Corpus: A Novel Dataset for Automatic Minuting from Multi-Party Meetings in English and Czech
Anna Nedoluzhko, Muskaan Singh, Marie Hledíková, Tirthankar Ghosal and Ondřej Bojar
pp. 3174‑3182
pdf bib Extracting Age-Related Stereotypes from Social Media Texts
Kathleen C. Fraser, Svetlana Kiritchenko and Isar Nejadgholi
pp. 3183‑3194
pdf bib Borrowing or Codeswitching? Annotating for Finer-Grained Distinctions in Language Mixing
Elena Alvarez-Mellado and Constantine Lignos
pp. 3195‑3201
pdf bib Multi-Aspect Transfer Learning for Detecting Low Resource Mental Disorders on Social Media
Ana Sabina Uban, Berta Chulvi and Paolo Rosso
pp. 3202‑3219
pdf bib ArCovidVac: Analyzing Arabic Tweets About COVID-19 Vaccination
Hamdy Mubarak, Sabit Hassan, Shammur Absar Chowdhury and Firoj Alam
pp. 3220‑3230
pdf bib FACTOID: A New Dataset for Identifying Misinformation Spreaders and Political Bias
Flora Sakketou, Joan Plepi, Riccardo Cervero, Henri Jacques Geiss, Paolo Rosso and Lucie Flek
pp. 3231‑3241
pdf bib Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in German Speech Recognition
Julia Pritzen, Michael Gref, Dietlind Zühlke and Christoph Andreas Schmidt
pp. 3242‑3249
pdf bib SDS-200: A Swiss German Speech to Standard German Text Corpus
Michel Plüss, Manuela Hürlimann, Marc Cuny, Alla Stöckli, Nikolaos Kapotis, Julia Hartmann, Malgorzata Anna Ulasik, Christian Scheller, Yanick Schraner, Amit Jain, Jan Deriu, Mark Cieliebak and Manfred Vogel
pp. 3250‑3256
pdf bib Extracting Linguistic Knowledge from Speech: A Study of Stop Realization in 5 Romance Languages
Yaru Wu, Mathilde Hutin, Ioana Vasilescu, Lori Lamel and Martine Adda-Decker
pp. 3257‑3263
pdf bib Overlaps and Gender Analysis in the Context of Broadcast Media
Martin Lebourdais, Marie Tahon, Antoine Laurent, Sylvain Meignier and Anthony Larcher
pp. 3264‑3270
pdf bib A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification
Rémi Uro, David Doukhan, Albert Rilliard, Laetitia Larcher, Anissa-Claire Adgharouamane, Marie Tahon and Antoine Laurent
pp. 3271‑3280
pdf bib DiscoGeM: A Crowdsourced Corpus of Genre-Mixed Implicit Discourse Relations
Merel Scholman, Tianai Dong, Frances Yung and Vera Demberg
pp. 3281‑3290
pdf bib QT30: A Corpus of Argument and Conflict in Broadcast Debate
Annette Hautli-Janisz, Zlata Kikteva, Wassiliki Siskou, Kamila Gorska, Ray Becker and Chris Reed
pp. 3291‑3300
pdf bib Scaling up Discourse Quality Annotation for Political Science
Neele Falk and Gabriella Lapesa
pp. 3301‑3318
pdf bib Clarifying Implicit and Underspecified Phrases in Instructional Text
Talita Anthonio, Anna Sauer and Michael Roth
pp. 3319‑3330
pdf bib Multilingual Pragmaticon: Database of Discourse Formulae
Anton Buzanov, Polina Bychkova, Arina Molchanova, Anna Postnikova and Daria Ryzhova
pp. 3331‑3336
pdf bib Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection
Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Dusko Vitas, Mihailo Skoric and Milica Ikonić Nešić
pp. 3337‑3345
pdf bib Exploring Text Recombination for Automatic Narrative Level Detection
Nils Reiter, Judith Sieker, Svenja Guhr, Evelyn Gius and Sina Zarrieß
pp. 3346‑3353
pdf bib Automatic Normalisation of Early Modern French
Rachel Bawden, Jonathan Poinhos, Eleni Kogkitsidou, Philippe Gambette, Benoît Sagot and Simon Gabay
pp. 3354‑3366
pdf bib From FreEM to D’AlemBERT: a Large Corpus and a Language Model for Early Modern French
Simon Gabay, Pedro Ortiz Suarez, Alexandre Bartz, Alix Chagué, Rachel Bawden, Philippe Gambette and Benoît Sagot
pp. 3367‑3374
pdf bib Detecting Multiple Transitions in Literary Texts
Nuette Heyns and Menno van Zaanen
pp. 3375‑3381
pdf bib BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions
Nayla Escribano, Jon Ander Gonzalez, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre and Rodrigo Agerri
pp. 3382‑3390
pdf bib GerEO: A Large-Scale Resource on the Syntactic Distribution of German Experiencer-Object Verbs
Johanna M. Poppek, Simon Masloch and Tibor Kiss
pp. 3391‑3397
pdf bib ACT2: A multi-disciplinary semi-structured dataset for importance and purpose classification of citations
Suchetha Nambanoor Kunnath, Valentin Stauber, Ronin Wu, David Pride, Viktor Botev and Petr Knoth
pp. 3398‑3406
pdf bib Quantification Annotation in ISO 24617-12, Second Draft
Harry Bunt, Maxime Amblard, Johan Bos, Karën Fort, Bruno Guillaume, Philippe de Groote, Chuyuan Li, Pierre Ludmann, Michel Musiol, Siyana Pavlova, Guy Perrier and Sylvain Pogodalla
pp. 3407‑3416
pdf bib The LTRC Hindi-Telugu Parallel Corpus
Vandan Mujadia and Dipti Sharma
pp. 3417‑3424
pdf bib MHE: Code-Mixed Corpora for Similar Language Identification
Priya Rani, John P. McCrae and Theodorus Fransen
pp. 3425‑3433
pdf bib Bazinga! A Dataset for Multi-Party Dialogues Structuring
Paul Lerner, Juliette Bergoënd, Camille Guinaudeau, Hervé Bredin, Benjamin Maurice, Sharleyne Lefevre, Martin Bouteiller, Aman Berhe, Léo Galmant, Ruiqing Yin and Claude Barras
pp. 3434‑3441
pdf bib The Ellogon Web Annotation Tool: Annotating Moral Values and Arguments
Alexandros Fotios Ntogramatzis, Anna Gradou, Georgios Petasis and Marko Kokol
pp. 3442‑3450
pdf bib WeCanTalk: A New Multi-language, Multi-modal Resource for Speaker Recognition
Karen Jones, Kevin Walker, Christopher Caruso, Jonathan Wright and Stephanie Strassel
pp. 3451‑3456
pdf bib Using Wiktionary to Create Specialized Lexical Resources and Datasets
Lenka Bajčetić and Thierry Declerck
pp. 3457‑3460
pdf bib STAPI: An Automatic Scraper for Extracting Iterative Title-Text Structure from Web Documents
Nan Zhang, Shomir Wilson and Prasenjit Mitra
pp. 3461‑3470
pdf bib ELTE Poetry Corpus: A Machine Annotated Database of Canonical Hungarian Poetry
Péter Horváth, Péter Kundráth, Balázs Indig, Zsófia Fellegi, Eszter Szlávich, Tímea Borbála Bajzát, Zsófia Sárközi-Lindner, Bence Vida, Aslihan Karabulut, Mária Timári and Gábor Palkó
pp. 3471‑3478
pdf bib HAWP: a Dataset for Hindi Arithmetic Word Problem Solving
Harshita Sharma, Pruthwik Mishra and Dipti Sharma
pp. 3479‑3490
pdf bib The Bulgarian Event Corpus: Overview and Initial NER Experiments
Petya Osenova, Kiril Simov, Iva Marinova and Melania Berbatova
pp. 3491‑3499
pdf bib A Corpus for Commonsense Inference in Story Cloze Test
Bingsheng Yao, Ethan Joseph, Julian Lioanag and Mei Si
pp. 3500‑3508
pdf bib Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish
Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson and Magnus Sahlgren
pp. 3509‑3518
pdf bib Constrained Language Models for Interactive Poem Generation
Andrei Popescu-Belis, Àlex Atrio, Valentin Minder, Aris Xanthos, Gabriel Luthier, Simon Mattei and Antonio Rodriguez
pp. 3519‑3529
pdf bib ELF22: A Context-based Counter Trolling Dataset to Combat Internet Trolls
Huije Lee, Young Ju Na, Hoyun Song, Jisu Shin and Jong Park
pp. 3530‑3541
pdf bib Generating Textual Explanations for Machine Learning Models Performance: A Table-to-Text Task
Isaac Ampomah, James Burton, Amir Enshaei and Noura Al Moubayed
pp. 3542‑3551
pdf bib Barch: an English Dataset of Bar Chart Summaries
Iza Škrjanec, Muhammad Salman Edhi and Vera Demberg
pp. 3552‑3560
pdf bib Effectiveness of Data Augmentation and Pretraining for Improving Neural Headline Generation in Low-Resource Settings
Matej Martinc, Syrielle Montariol, Lidia Pivovarova and Elaine Zosa
pp. 3561‑3570
pdf bib Effectiveness of French Language Models on Abstractive Dialogue Summarization Task
Yongxin Zhou, François Portet and Fabien Ringeval
pp. 3571‑3581
pdf bib ALEXSIS: A Dataset for Lexical Simplification in Spanish
Daniel Ferrés and Horacio Saggion
pp. 3582‑3594
pdf bib The IARPA BETTER Program Abstract Task: Four New Semantically Annotated Corpora from IARPA’s BETTER Program
Timothy Mckinnon and Carl Rubino
pp. 3595‑3600
pdf bib A Named Entity Recognition Corpus for Vietnamese Biomedical Texts to Support Tuberculosis Treatment
Uyen Phan, Phuong N.V Nguyen and Nhung Nguyen
pp. 3601‑3609
pdf bib RaFoLa: A Rationale-Annotated Corpus for Detecting Indicators of Forced Labour
Erick Mendez Guzman, Viktor Schlegel and Riza Batista-Navarro
pp. 3610‑3625
pdf bib Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT
Mustafa Jarrar, Mohammed Khalilia and Sana Ghanem
pp. 3626‑3636
pdf bib Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient’s Perspective
Lisa Raithel, Philippe Thomas, Roland Roller, Oliver Sapina, Sebastian Möller and Pierre Zweigenbaum
pp. 3637‑3649
pdf bib GGPONC 2.0 - The German Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline NER Taggers
Florian Borchert, Christina Lohr, Luise Modersohn, Jonas Witt, Thomas Langer, Markus Follmann, Matthias Gietzelt, Bert Arnrich, Udo Hahn and Matthieu-P. Schapranow
pp. 3650‑3660
pdf bib ClinIDMap: Towards a Clinical IDs Mapping for Data Interoperability
Elena Zotova, Montse Cuadros and German Rigau
pp. 3661‑3669
pdf bib Identifying Draft Bills Impacting Existing Legislation: a Case Study on Romanian
Corina Ceausu and Sergiu Nisioi
pp. 3670‑3674
pdf bib MuLD: The Multitask Long Document Benchmark
George Hudson and Noura Al Moubayed
pp. 3675‑3685
pdf bib A Cross-document Coreference Dataset for Longitudinal Tracking across Radiology Reports
Surabhi Datta, Hio Cheng Lam, Atieh Pajouhi, Sunitha Mogalla and Kirk Roberts
pp. 3686‑3695
pdf bib How’s Business Going Worldwide? A Multilingual Annotated Corpus for Business Relation Extraction
Hadjer Khaldi, Farah Benamara, Camille Pradel, Grégoire Sigel and Nathalie Aussenac-Gilles
pp. 3696‑3705
pdf bib Do Transformer Networks Improve the Discovery of Rules from Text?
Mahdi Rahimi and Mihai Surdeanu
pp. 3706‑3714
pdf bib Offensive language detection in Hebrew: can other languages help?
Marina Litvak, Natalia Vanetik, Chaya Liebeskind, Omar Hmdia and Rizek Abu Madeghem
pp. 3715‑3723
pdf bib JaMIE: A Pipeline Japanese Medical Information Extraction System with Novel Relation Annotation
Fei Cheng, Shuntaro Yada, Ribeka Tanaka, Eiji Aramaki and Sadao Kurohashi
pp. 3724‑3731
pdf bib Enhanced Entity Annotations for Multilingual Corpora
Michael Strobl, Amine Trabelsi and Osmar Zaïane
pp. 3732‑3740
pdf bib Enriching Epidemiological Thematic Features For Disease Surveillance Corpora Classification
Edmond Menya, Mathieu Roche, Roberto Interdonato and Dickson Owuor
pp. 3741‑3750
pdf bib Spanish Datasets for Sensitive Entity Detection in the Legal Domain
Ona de Gibert Bonet, Aitor García Pablos, Montse Cuadros and Maite Melero
pp. 3751‑3760
pdf bib ConvTextTM: An Explainable Convolutional Tsetlin Machine Framework for Text Classification
Bimal Bhattarai, Ole-Christoffer Granmo and Lei Jiao
pp. 3761‑3770
pdf bib Elvis vs. M. Jackson: Who has More Albums? Classification and Identification of Elements in Comparative Questions
Meriem Beloucif, Seid Muhie Yimam, Steffen Stahlhacke and Chris Biemann
pp. 3771‑3779
pdf bib Decorate the Examples: A Simple Method of Prompt Design for Biomedical Relation Extraction
Hui-Syuan Yeh, Thomas Lavergne and Pierre Zweigenbaum
pp. 3780‑3787
pdf bib Comparing Annotated Datasets for Named Entity Recognition in English Literature
Rositsa Ivanova, Marieke van Erp and Sabrina Kirrane
pp. 3788‑3797
pdf bib Investigating User Radicalization: A Novel Dataset for Identifying Fine-Grained Temporal Shifts in Opinion
Flora Sakketou, Allison Lahnala, Liane Vogel and Lucie Flek
pp. 3798‑3808
pdf bib APPReddit: a Corpus of Reddit Posts Annotated for Appraisal
Marco Antonio Stranisci, Simona Frenda, Eleonora Ceccaldi, Valerio Basile, Rossana Damiano and Viviana Patti
pp. 3809‑3818
pdf bib Evaluating Methods for Extraction of Aspect Terms in Opinion Texts in Portuguese - the Challenges of Implicit Aspects
Mateus Machado and Thiago Alexandre Salgueiro Pardo
pp. 3819‑3828
pdf bib SenticNet 7: A Commonsense-based Neurosymbolic AI Framework for Explainable Sentiment Analysis
Erik Cambria, Qian Liu, Sergio Decherchi, Frank Xing and Kenneth Kwok
pp. 3829‑3839
pdf bib Building an Endangered Language Resource in the Classroom: Universal Dependencies for Kakataibo
Roberto Zariquiey, Claudia Alvarado, Ximena Echevarría, Luisa Gomez, Rosa Gonzales, Mariana Illescas, Sabina Oporto, Frederic Blum, Arturo Oncevay and Javier Vera
pp. 3840‑3851
pdf bib The Norwegian Colossal Corpus: A Text Corpus for Training Large Norwegian Language Models
Per Kummervold, Freddy Wetjen and Javier de la Rosa
pp. 3852‑3860
pdf bib Embeddings models for Buddhist Sanskrit
Ligeia Lugli, Matej Martinc, Andraž Pelicon and Senja Pollak
pp. 3861‑3871
pdf bib Development of Automatic Speech Recognition for the Documentation of Cook Islands Māori
Rolando Coto-Solano, Sally Akevai Nicholas, Samiha Datta, Victoria Quint, Piripi Wills, Emma Ngakuravaru Powell, Liam Koka’ua, Syed Tanveer and Isaac Feldman
pp. 3872‑3882
pdf bib A Generalized Approach to Protest Event Detection in German Local News
Gregor Wiedemann, Jan Matti Dollbaum, Sebastian Haunss, Priska Daphi and Larissa Daria Meier
pp. 3883‑3891
pdf bib Evaluation of Transfer Learning and Domain Adaptation for Analyzing German-Speaking Job Advertisements
Ann-Sophie Gnehm, Eva Bühlmann and Simon Clematide
pp. 3892‑3901
pdf bib Pre-Training Language Models for Identifying Patronizing and Condescending Language: An Analysis
Carla Perez Almendros, Luis Espinosa Anke and Steven Schockaert
pp. 3902‑3911
pdf bib HeLI-OTS, Off-the-shelf Language Identifier for Text
Tommi Jauhiainen, Heidi Jauhiainen and Krister Lindén
pp. 3912‑3922
pdf bib Towards a Broad Coverage Named Entity Resource: A Data-Efficient Approach for Many Diverse Languages
Silvia Severini, Ayyoob ImaniGooghari, Philipp Dufter and Hinrich Schütze
pp. 3923‑3933
pdf bib Towards the Construction of a WordNet for Old English
Fahad Khan, Francisco J. Minaya Gómez, Rafael Cruz González, Harry Diakoff, Javier E. Diaz Vera, John P. McCrae, Ciara O’Loughlin, William Michael Short and Sander Stolk
pp. 3934‑3941
pdf bib A Framenet and Frame Annotator for German Social Media
Eckhard Bick
pp. 3942‑3949
pdf bib The Robotic Surgery Procedural Framebank
Marco Bombieri, Marco Rospocher, Simone Paolo Ponzetto and Paolo Fiorini
pp. 3950‑3959
pdf bib Representing the Toddler Lexicon: Do the Corpus and Semantics Matter?
Jennifer Weber and Eliana Colunga
pp. 3960‑3968
pdf bib Organizing and Improving a Database of French Word Formation Using Formal Concept Analysis
Nyoman Juniarta, Olivier Bonami, Nabil Hathout, Fiammetta Namer and Yannick Toussaint
pp. 3969‑3976
pdf bib Towards a new Ontology for Sign Languages
Thierry Declerck
pp. 3977‑3983
pdf bib Towards the Detection of a Semantic Gap in the Chain of Commonsense Knowledge Triples
Yoshihiko Hayashi
pp. 3984‑3993
pdf bib COPA-SSE: Semi-structured Explanations for Commonsense Reasoning
Ana Brassard, Benjamin Heinzerling, Pride Kavumba and Kentaro Inui
pp. 3994‑4000
pdf bib GRhOOT: Ontology of Rhetorical Figures in German
Ramona Kühn, Jelena Mitrović and Michael Granitzer
pp. 4001‑4010
pdf bib Querying a Dozen Corpora and a Thousand Years with Fintan
Christian Chiarcos, Christian Fäth and Maxim Ionov
pp. 4011‑4021
pdf bib The Index Thomisticus Treebank as Linked Data in the LiLa Knowledge Base
Francesco Mambrini, Marco Passarotti, Giovanni Moretti and Matteo Pellegrini
pp. 4022‑4029
pdf bib Building a Multilingual Taxonomy of Olfactory Terms with Timestamps
Stefano Menini, Teresa Paccosi, Serra Sinem Tekiroğlu and Sara Tonelli
pp. 4030‑4039
pdf bib Attention Understands Semantic Relations
Anastasia Chizhikova, Sanzhar Murzakhmetov, Oleg Serikov, Tatiana Shavrina and Mikhail Burtsev
pp. 4040‑4050
pdf bib Analysis of Dialogue in Human-Human Collaboration in Minecraft
Takuma Ichikawa and Ryuichiro Higashinaka
pp. 4051‑4059
pdf bib Data Collection for Empirically Determining the Necessary Information for Smooth Handover in Dialogue
Sanae Yamashita and Ryuichiro Higashinaka
pp. 4060‑4068
pdf bib The slurk Interaction Server Framework: Better Data for Better Dialog Models
Jana Götze, Maike Paetzel-Prüsmann, Wencke Liermann, Tim Diekmann and David Schlangen
pp. 4069‑4078
pdf bib Corpus Design for Studying Linguistic Nudges in Human-Computer Spoken Interactions
Natalia Kalashnikova, Serge Pajak, Fabrice Le Guel, Ioana Vasilescu, Gemma Serrano and Laurence Devillers
pp. 4079‑4087
pdf bib Dialogue Corpus Construction Considering Modality and Social Relationships in Building Common Ground
Yuki Furuya, Koki Saito, Kosuke Ogura, Koh Mitsuda, Ryuichiro Higashinaka and Kazunori Takashio
pp. 4088‑4095
pdf bib EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion Recognition in Task-Oriented Dialogue Systems
Shutong Feng, Nurul Lubis, Christian Geishauser, Hsien-chin Lin, Michael Heck, Carel van Niekerk and Milica Gasic
pp. 4096‑4113
pdf bib Data Augmentation with Paraphrase Generation and Entity Extraction for Multimodal Dialogue System
Eda Okur, Saurav Sahay and Lama Nachman
pp. 4114‑4125
pdf bib Towards Modelling Self-imposed Filter Bubbles in Argumentative Dialogue Systems
Annalena Aicher, Wolfgang Minker and Stefan Ultes
pp. 4126‑4134
pdf bib Telling a Lie: Analyzing the Language of Information and Misinformation during Global Health Events
Ankit Aich and Natalie Parde
pp. 4135‑4141
pdf bib Misogyny and Aggressiveness Tend to Come Together and Together We Address Them
Arianna Muti, Francesco Fernicola and Alberto Barrón-Cedeño
pp. 4142‑4148
pdf bib The ComMA Dataset V0.2: Annotating Aggression and Bias in Multilingual Social Media Discourse
Ritesh Kumar, Shyam Ratan, Siddharth Singh, Enakshi Nandi, Laishram Niranjana Devi, Akash Bhagat, Yogesh Dawer, Bornini Lahiri, Akanksha Bansal and Atul Kr. Ojha
pp. 4149‑4161
pdf bib TUSC: Emotion Word Usage in Tweets from US and Canada
Krishnapriya Vishnubhotla and Saif M. Mohammad
pp. 4162‑4176
pdf bib A Turkish Hate Speech Dataset and Detection System
Fatih Beyhan, Buse Çarık, İnanç Arın, Ayşecan Terzioğlu, Berrin Yanikoglu and Reyyan Yeniterzi
pp. 4177‑4185
pdf bib Life is not Always Depressing: Exploring the Happy Moments of People Diagnosed with Depression
Ana-Maria Bucur, Adrian Cosma and Liviu P. Dinu
pp. 4186‑4192
pdf bib Evaluating Tokenizers Impact on OOVs Representation with Transformers Models
Alexandra Benamar, Cyril Grouin, Meryl Bothua and Anne Vilnat
pp. 4193‑4204
pdf bib Assessing the Quality of an Italian Crowdsourced Idiom Corpus: the Dodiom Experiment
Giuseppina Morza, Raffaele Manna and Johanna Monti
pp. 4205‑4211
pdf bib Medical Crossing: a Cross-lingual Evaluation of Clinical Entity Linking
Anton Alekseev, Zulfat Miftahutdinov, Elena Tutubalina, Artem Shelmanov, Vladimir Ivanov, Vladimir Kokh, Alexander Nesterov, Manvel Avetisian, Andrei Chertok and Sergey Nikolenko
pp. 4212‑4220
pdf bib MTLens: Machine Translation Output Debugging
Shreyas Sharma, Kareem Darwish, Lucas Pavanelli, Thiago Castro Ferreira, Mohamed Al-Badrashiny, Kamer Ali Yuksel and Hassan Sawaf
pp. 4221‑4226
pdf bib IceBATS: An Icelandic Adaptation of the Bigger Analogy Test Set
Steinunn Rut Friðriksdóttir, Hjalti Daníelsson, Steinþór Steingrímsson and Einar Sigurdsson
pp. 4227‑4234
pdf bib Transfer Learning Methods for Domain Adaptation in Technical Logbook Datasets
Farhad Akhbardeh, Marcos Zampieri, Cecilia Ovesdotter Alm and Travis Desell
pp. 4235‑4244
pdf bib Downstream Task Performance of BERT Models Pre-Trained Using Automatically De-Identified Clinical Data
Thomas Vakili, Anastasios Lamproudis, Aron Henriksson and Hercules Dalianis
pp. 4245‑4252
pdf bib Dilated Convolutional Neural Networks for Lightweight Diacritics Restoration
Bálint Csanády and András Lukács
pp. 4253‑4259
pdf bib Generating Artificial Texts as Substitution or Complement of Training Data
Vincent Claveau, Antoine Chaffin and Ewa Kijak
pp. 4260‑4269
pdf bib From Pattern to Interpretation. Using Colibri Core to Detect Translation Patterns in the Peshitta.
Mathias Coeckelbergs
pp. 4270‑4274
pdf bib PAGnol: An Extra-Large French Generative Model
Julien Launay, E.L. Tommasone, Baptiste Pannier, François Boniface, Amélie Chatelain, Alessandro Cappelli, Iacopo Poli and Djamé Seddah
pp. 4275‑4284
pdf bib CEPOC: The Cambridge Exams Publishing Open Cloze dataset
Mariano Felice, Shiva Taslimipoor, Øistein E. Andersen and Paula Buttery
pp. 4285‑4290
pdf bib ALBETO and DistilBETO: Lightweight Spanish Language Models
José Cañete, Sebastian Donoso, Felipe Bravo-Marquez, Andrés Carvallo and Vladimir Araujo
pp. 4291‑4298
pdf bib On the Robustness of Cognate Generation Models
Winston Wu and David Yarowsky
pp. 4299‑4305
pdf bib CLISTER: A Corpus for Semantic Textual Similarity in French Clinical Narratives
Nicolas Hiebel, Olivier Ferret, Karën Fort and Aurélie Névéol
pp. 4306‑4315
pdf bib The Chinese Causative-Passive Homonymy Disambiguation: an adversarial Dataset for NLI and a Probing Task
Shanshan Xu and Katja Markert
pp. 4316‑4323
pdf bib Modeling Noise in Paraphrase Detection
Teemu Vahtola, Eetu Sjöblom, Jörg Tiedemann and Mathias Creutz
pp. 4324‑4332
pdf bib Give me your Intentions, I’ll Predict our Actions: A Two-level Classification of Speech Acts for Crisis Management in Social Media
Enzo Laurenti, Nils Bourgon, Farah Benamara, Alda Mari, Véronique Moriceau and Camille Courgeon
pp. 4333‑4343
pdf bib Towards a Cleaner Document-Oriented Multilingual Crawled Corpus
Julien Abadji, Pedro Ortiz Suarez, Laurent Romary and Benoît Sagot
pp. 4344‑4355
pdf bib A Warm Start and a Clean Crawled Corpus - A Recipe for Good Language Models
Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Jónsson, Vilhjalmur Thorsteinsson and Hafsteinn Einarsson
pp. 4356‑4366
pdf bib Adapting Language Models When Training on Privacy-Transformed Data
Tugtekin Turan, Dietrich Klakow, Emmanuel Vincent and Denis Jouvet
pp. 4367‑4373
pdf bib Evaluation of Transfer Learning for Polish with a Text-to-Text Model
Aleksandra Chrabrowa, Łukasz Dragan, Karol Grzegorczyk, Dariusz Kajtoch, Mikołaj Koszowski, Robert Mroczkowski and Piotr Rybak
pp. 4374‑4394
pdf bib Evaluation of HTR models without Ground Truth Material
Phillip Benjamin Ströbel, Martin Volk, Simon Clematide, Raphael Schwitter, Tobias Hodel and David Schoch
pp. 4395‑4404
pdf bib A Semi-Automated Live Interlingual Communication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking
Tomasz Korybski, Elena Davitti, Constantin Orasan and Sabine Braun
pp. 4405‑4413
pdf bib Are Embedding Spaces Interpretable? Results of an Intrusion Detection Evaluation on a Large French Corpus
Thibault Prouteau, Nicolas Dugué, Nathalie Camelin and Sylvain Meignier
pp. 4414‑4419
pdf bib Corpus for Automatic Structuring of Legal Documents
Prathamesh Kalamkar, Aman Tiwari, Astha Agarwal, Saurabh Karn, Smita Gupta, Vivek Raghavan and Ashutosh Modi
pp. 4420‑4429
pdf bib The Search for Agreement on Logical Fallacy Annotation of an Infodemic
Claire Bonial, Austin Blodgett, Taylor Hudson, Stephanie M. Lukin, Jeffrey Micher, Douglas Summers-Stay, Peter Sutor and Clare Voss
pp. 4430‑4438
pdf bib Recovering Patient Journeys: A Corpus of Biomedical Entities and Relations on Twitter (BEAR)
Amelie Wührl and Roman Klinger
pp. 4439‑4450
pdf bib Improving Event Duration Question Answering by Leveraging Existing Temporal Information Extraction Data
Felix Virgo, Fei Cheng and Sadao Kurohashi
pp. 4451‑4457
pdf bib Entity Linking over Nested Named Entities for Russian
Natalia Loukachevitch, Pavel Braslavski, Vladimir Ivanov, Tatiana Batura, Suresh Manandhar, Artem Shelmanov and Elena Tutubalina
pp. 4458‑4466
pdf bib HiNER: A large Hindi Named Entity Recognition Dataset
Rudra Murthy, Pallab Bhattacharjee, Rahul Sharnagat, Jyotsana Khatri, Diptesh Kanojia and Pushpak Bhattacharyya
pp. 4467‑4476
pdf bib Bootstrapping Text Anonymization Models with Distant Supervision
Anthi Papadopoulou, Pierre Lison, Lilja Øvrelid and Ildikó Pilán
pp. 4477‑4487
pdf bib Natural Questions in Icelandic
Vésteinn Snæbjarnarson and Hafsteinn Einarsson
pp. 4488‑4496
pdf bib QA4IE: A Quality Assurance Tool for Information Extraction
Rafael Jimenez Silva, Kaushik Gedela, Alex Marr, Bart Desmet, Carolyn Rose and Chunxiao Zhou
pp. 4497‑4503
pdf bib A New Dataset for Topic-Based Paragraph Classification in Genocide-Related Court Transcripts
Miriam Schirmer, Udo Kruschwitz and Gregor Donabauer
pp. 4504‑4512
pdf bib DeepREF: A Framework for Optimized Deep Learning-based Relation Classification
Igor Nascimento, Rinaldo Lima, Adrian-Gabriel Chifu, Bernard Espinasse and Sébastien Fournier
pp. 4513‑4522
pdf bib Exploring Data Augmentation Strategies for Hate Speech Detection in Roman Urdu
Ubaid Azam, Hammad Rizwan and Asim Karim
pp. 4523‑4531
pdf bib Incorporating LIWC in Neural Networks to Improve Human Trait and Behavior Analysis in Low Resource Scenarios
Isil Yakut Kilic and Shimei Pan
pp. 4532‑4539
pdf bib Using Sentence-level Classification Helps Entity Extraction from Material Science Literature
Ankan Mullick, Shubhraneel Pal, Tapas Nayak, Seung-Cheol Lee, Satadeep Bhattacharjee and Pawan Goyal
pp. 4540‑4545
pdf bib A Twitter Corpus for Named Entity Recognition in Turkish
Buse Çarık and Reyyan Yeniterzi
pp. 4546‑4551
pdf bib A STEP towards Interpretable Multi-Hop Reasoning: Bridge Phrase Identification and Query Expansion
Fan Luo and Mihai Surdeanu
pp. 4552‑4560
pdf bib Question Generation and Answering for exploring Digital Humanities collections
Frederic Bechet, Elie Antoine, Jérémy Auguste and Géraldine Damnati
pp. 4561‑4568
pdf bib Evaluating Retrieval for Multi-domain Scientific Publications
Nancy Ide, Keith Suderman, Jingxuan Tu, Marc Verhagen, Shanan Peters, Ian Ross, John Lawson, Andrew Borg and James Pustejovsky
pp. 4569‑4576
pdf bib Modeling Dutch Medical Texts for Detecting Functional Categories and Levels of COVID-19 Patients
Jenia Kim, Stella Verkijk, Edwin Geleijn, Marieke van der Leeden, Carel Meskers, Caroline Meskers, Sabina van der Veen, Piek Vossen and Guy Widdershoven
pp. 4577‑4585
pdf bib Hierarchical Aggregation of Dialectal Data for Arabic Dialect Identification
Nurpeiis Baimukan, Houda Bouamor and Nizar Habash
pp. 4586‑4596
pdf bib Investigating Active Learning Sampling Strategies for Extreme Multi Label Text Classification
Lukas Wertz, Katsiaryna Mirylenka, Jonas Kuhn and Jasmina Bogojeska
pp. 4597‑4605
pdf bib German Light Verb Constructions in Business Process Models
Kristin Kutzner and Ralf Laue
pp. 4606‑4610
pdf bib PhysNLU: A Language Resource for Evaluating Natural Language Understanding and Explanation Coherence in Physics
Jordan Meadows, Zili Zhou and André Freitas
pp. 4611‑4619
pdf bib HECTOR: A Hybrid TExt SimplifiCation TOol for Raw Texts in French
Amalia Todirascu, Rodrigo Wilkens, Eva Rolin, Thomas François, Delphine Bernhard and Núria Gala
pp. 4620‑4630
pdf bib AiRO - an Interactive Learning Tool for Children at Risk of Dyslexia
Peter Juel Henrichsen and Stine Fuglsang Engmose
pp. 4631‑4636
pdf bib Creating a Basic Language Resource Kit for Faroese
Annika Simonsen, Sandra Saxov Lamhauge, Iben Nyholm Debess and Peter Juel Henrichsen
pp. 4637‑4643
pdf bib Developing a Spell and Grammar Checker for Icelandic using an Error Corpus
Hulda Óladóttir, Þórunn Arnardóttir, Anton Ingason and Vilhjálmur Þorsteinsson
pp. 4644‑4653
pdf bib The TalkMoves Dataset: K-12 Mathematics Lesson Transcripts Annotated for Teacher and Student Discursive Moves
Abhijit Suresh, Jennifer Jacobs, Charis Harty, Margaret Perkoff, James H. Martin and Tamara Sumner
pp. 4654‑4662
pdf bib Automating Idea Unit Segmentation and Alignment for Assessing Reading Comprehension via Summary Protocol Analysis
Marcello Gecchele, Hiroaki Yamada, Takenobu Tokunaga, Yasuyo Sawaki and Mika Ishizuka
pp. 4663‑4673
pdf bib IRAC: A Domain-Specific Annotated Corpus of Implicit Reasoning in Arguments
Keshav Singh, Naoya Inoue, Farjana Sultana Mim, Shoichi Naito and Kentaro Inui
pp. 4674‑4683
pdf bib Conversational Speech Recognition Needs Data? Experiments with Austrian German
Julian Linke, Philip N. Garner, Gernot Kubin and Barbara Schuppler
pp. 4684‑4691
pdf bib A Benchmark Corpus for the Detection of Automatically Generated Text in Academic Publications
Vijini Liyanage, Davide Buscaldi and Adeline Nazarenko
pp. 4692‑4700
pdf bib Building a Dataset for Automatically Learning to Detect Questions Requiring Clarification
Ivano Lauriola, Kevin Small and Alessandro Moschitti
pp. 4701‑4707
pdf bib The ALPIN Sentiment Dictionary: Austrian Language Polarity in Newspapers
Thomas Kolb, Sekanina Katharina, Bettina Manuela Johanna Kern, Julia Neidhardt, Tanja Wissik and Andreas Baumann
pp. 4708‑4716
pdf bib Text Classification and Prediction in the Legal Domain
Minh-Quoc Nghiem, Paul Baylis, André Freitas and Sophia Ananiadou
pp. 4717‑4722
pdf bib I still have Time(s): Extending HeidelTime for German Texts
Andy Luecking, Manuel Stoeckel, Giuseppe Abrami and Alexander Mehler
pp. 4723‑4728
pdf bib Morphological Complexity of Children Narratives in Eight Languages
Gordana Hržica, Chaya Liebeskind, Kristina Š. Despot, Olga Dontcheva-Navratilova, Laura Kamandulytė-Merfeldienė, Sara Košutar, Matea Kramarić and Giedrė Valūnaitė Oleškevičienė
pp. 4729‑4738
pdf bib EXPRES Corpus for A Field-specific Automated Exploratory Study of L2 English Expert Scientific Writing
Ana-Maria Bucur, Madalina Chitez, Valentina Muresan, Andreea Dinca and Roxana Rogobete
pp. 4739‑4746
pdf bib An Evaluation Framework for Legal Document Summarization
Ankan Mullick, Abhilash Nandy, Manav Kapadnis, Sohan Patnaik, Raghav R and Roshni Kar
pp. 4747‑4753
pdf bib Complex Labelling and Similarity Prediction in Legal Texts: Automatic Analysis of France’s Court of Cassation Rulings
Thibault Charmet, Inès Cherichi, Matthieu Allain, Urszula Czerwinska, Amaury Fouret, Benoît Sagot and Rachel Bawden
pp. 4754‑4766
pdf bib Cyrillic-MNIST: a Cyrillic Version of the MNIST Dataset
Bolat Tleubayev, Zhanel Zhexenova, Kenessary Koishybay and Anara Sandygulova
pp. 4767‑4773
pdf bib gaBERT — an Irish Language Model
James Barry, Joachim Wagner, Lauren Cassidy, Alan Cowap, Teresa Lynn, Abigail Walsh, Mícheál J. Ó Meachair and Jennifer Foster
pp. 4774‑4788
pdf bib PoS Tagging, Lemmatization and Dependency Parsing of West Frisian
Wilbert Heeringa, Gosse Bouma, Martha Hofman, Jelle Brouwer, Eduard Drenth, Jan Wijffels and Hans Van de Velde
pp. 4789‑4798
pdf bib A Dataset of Offensive German Language Tweets Annotated for Speech Acts
Melina Plakidis and Georg Rehm
pp. 4799‑4807
pdf bib Tracing Syntactic Change in the Scientific Genre: Two Universal Dependency-parsed Diachronic Corpora of Scientific English and German
Marie-Pauline Krielke, Luigi Talamo, Mahmoud Fawzi and Jörg Knappen
pp. 4808‑4816
pdf bib The Tembusu Treebank: An English Learner Treebank
Luís Morgado da Costa, Francis Bond and Roger V. P. Winder
pp. 4817‑4826
pdf bib The Norwegian Dialect Corpus Treebank
Andre Kåsen, Kristin Hagen, Anders Nøklestad, Joel Priestly, Per Erik Solberg and Dag Trygve Truslew Haug
pp. 4827‑4832
pdf bib RRGparbank: A Parallel Role and Reference Grammar Treebank
Tatiana Bladier, Kilian Evang, Valeria Generalova, Zahra Ghane, Laura Kallmeyer, Robin Möllemann, Natalia Moors, Rainer Osswald and Simon Petitjean
pp. 4833‑4841
pdf bib Unifying Morphology Resources with OntoLex-Morph. A Case Study in German
Christian Chiarcos, Christian Fäth and Maxim Ionov
pp. 4842‑4850
pdf bib Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers
Takuto Asakura, Yusuke Miyao and Akiko Aizawa
pp. 4851‑4858
pdf bib CorefUD 1.0: Coreference Meets Universal Dependencies
Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Amir Zeldes and Daniel Zeman
pp. 4859‑4872
pdf bib The Universal Anaphora Scorer
Juntao Yu, Sopan Khosla, Nafise Sadat Moosavi, Silviu Paun, Sameer Pradhan and Massimo Poesio
pp. 4873‑4883
pdf bib Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes
Anastasia Zhukova, Felix Hamborg and Bela Gipp
pp. 4884‑4893
pdf bib Explainable Tsetlin Machine Framework for Fake News Detection with Credibility Score Assessment
Bimal Bhattarai, Ole-Christoffer Granmo and Lei Jiao
pp. 4894‑4903
pdf bib Enhancing Deep Learning with Embedded Features for Arabic Named Entity Recognition
Ali L. Hatab, Caroline Sabty and Slim Abdennadher
pp. 4904‑4912
pdf bib SCAI-QReCC Shared Task on Conversational Question Answering
Svitlana Vakulenko, Johannes Kiesel and Maik Fröbe
pp. 4913‑4922
pdf bib Semantic Relations between Text Segments for Semantic Storytelling: Annotation Tool - Dataset - Evaluation
Michael Raring, Malte Ostendorff and Georg Rehm
pp. 4923‑4932
pdf bib Evaluating Pre-training Objectives for Low-Resource Translation into Morphologically Rich Languages
Prajit Dhar, Arianna Bisazza and Gertjan van Noord
pp. 4933‑4943
pdf bib Aligning Images and Text with Semantic Role Labels for Fine-Grained Cross-Modal Understanding
Abhidip Bhattacharyya, Cecilia Mauceri, Martha Palmer and Christoffer Heckman
pp. 4944‑4954
pdf bib Rosetta-LSF: an Aligned Corpus of French Sign Language and French for Text-to-Sign Translation
Elise Bertin-Lemée, Annelies Braffort, Camille Challant, Claire Danet, Boris Dauriac, Michael Filhol, Emmanuella Martinod and Jérémie Segouat
pp. 4955‑4962
pdf bib MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset
Marina Fomicheva, Shuo Sun, Erick Fonseca, Chrysoula Zerva, Frédéric Blain, Vishrav Chaudhary, Francisco Guzmán, Nina Lopatina, Lucia Specia and André F. T. Martins
pp. 4963‑4974
pdf bib OpenKorPOS: Democratizing Korean Tokenization with Voting-Based Open Corpus Annotation
Sangwhan Moon, Won Ik Cho, Hye Joo Han, Naoaki Okazaki and Nam Soo Kim
pp. 4975‑4983
pdf bib Enriching Grammatical Error Correction Resources for Modern Greek
Katerina Korre and John Pavlopoulos
pp. 4984‑4991
pdf bib A Hmong Corpus with Elaborate Expression Annotations
David R. Mortensen, Xinyu Zhang, Chenxuan Cui and Katherine Zhang
pp. 4992‑5000
pdf bib ELAL: An Emotion Lexicon for the Analysis of Alsatian Theatre Plays
Delphine Bernhard and Pablo Ruiz Fabo
pp. 5001‑5010
pdf bib Universal Dependencies for Western Sierra Puebla Nahuatl
Robert Pugh, Marivel Huerta Mendez, Mitsuya Sasaki and Francis Tyers
pp. 5011‑5020
pdf bib The Construction and Evaluation of the LEAFTOP Dataset of Automatically Extracted Nouns in 1480 Languages
Gregory Baker and Diego Molla
pp. 5021‑5028
pdf bib Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition
Rodolfo Zevallos, Luis Camacho and Nelsi Melgarejo
pp. 5029‑5034
pdf bib Writing System and Speaker Metadata for 2,800+ Language Varieties
Daan van Esch, Tamar Lucassen, Sebastian Ruder, Isaac Caswell and Clara Rivera
pp. 5035‑5046
pdf bib The PALMA Corpora of African Varieties of Portuguese
Tjerk Hagemeijer, Amália Mendes, Rita Gonçalves, Catarina Cornejo, Raquel Madureira and Michel Généreux
pp. 5047‑5053
pdf bib A Learning-Based Dependency to Constituency Conversion Algorithm for the Turkish Language
Büşra Marşan, Oğuz K. Yıldız, Aslı Kuzgun, Neslihan Cesur, Arife B. Yenice, Ezgi Sanıyar, Oğuzhan Kuyrukçu, Bilge N. Arıcan and Olcay Taner Yıldız
pp. 5054‑5062
pdf bib Standard German Subtitling of Swiss German TV content: the PASSAGE Project
Jonathan David Mutal, Pierrette Bouillon, Johanna Gerlach and Veronika Haberkorn
pp. 5063‑5070
pdf bib A Survey of Multilingual Models for Automatic Speech Recognition
Hemant Yadav and Sunayana Sitaram
pp. 5071‑5079
pdf bib LuxemBERT: Simple and Practical Data Augmentation in Language Model Pre-Training for Luxembourgish
Cedric Lothritz, Bertrand Lebichot, Kevin Allix, Lisa Veiber, Tegawende Bissyande, Jacques Klein, Andrey Boytsov, Clément Lefebvre and Anne Goujon
pp. 5080‑5089
pdf bib PerPaDa: A Persian Paraphrase Dataset based on Implicit Crowdsourcing Data Collection
Salar Mohtaj, Fatemeh Tavakkoli and Habibollah Asghari
pp. 5090‑5096
pdf bib Introducing the Welsh Text Summarisation Dataset and Baseline Systems
Ignatius Ezeani, Mahmoud El-Haj, Jonathan Morris and Dawn Knight
pp. 5097‑5106
pdf bib A Systematic Approach to Derive a Refined Speech Corpus for Sinhala
Disura Warusawithana, Nilmani Kulaweera, Lakshan Weerasinghe and Buddhika Karunarathne
pp. 5107‑5113
pdf bib IgboBERT Models: Building and Training Transformer Models for the Igbo Language
Chiamaka Chukwuneke, Ignatius Ezeani, Paul Rayson and Mahmoud El-Haj
pp. 5114‑5122
pdf bib Latvian National Corpora Collection – Korpuss.lv
Baiba Saulite, Roberts Darģis, Normunds Gruzitis, Ilze Auzina, Kristīne Levāne-Petrova, Lauma Pretkalniņa, Laura Rituma, Peteris Paikens, Arturs Znotins, Laine Strankale, Kristīne Pokratniece, Ilmārs Poikāns, Guntis Barzdins, Inguna Skadiņa, Anda Baklāne, Valdis Saulespurēns and Jānis Ziediņš
pp. 5123‑5129
pdf bib Investigating the Relationship Between Romanian Financial News and Closing Prices from the Bucharest Stock Exchange
Ioan-Bogdan Iordache, Ana Sabina Uban, Catalin Stoean and Liviu P. Dinu
pp. 5130‑5136
pdf bib A Free/Open-Source Morphological Analyser and Generator for Sakha
Sardana Ivanova, Jonathan Washington and Francis Tyers
pp. 5137‑5142
pdf bib An Expanded Finite-State Transducer for Tsuut’ina Verbs
Joshua Holden, Christopher Cox and Antti Arppe
pp. 5143‑5152
pdf bib BD-SHS: A Benchmark Dataset for Learning to Detect Online Bangla Hate Speech in Different Social Contexts
Nauros Romim, Mosahed Ahmed, Md Saiful Islam, Arnab Sen Sharma, Hriteshwar Talukder and Mohammad Ruhul Amin
pp. 5153‑5162
pdf bib Introducing RezoJDM16k: a French KnowledgeGraph DataSet for Link Prediction
Mehdi Mirzapour, Waleed Ragheb, Mohammad Javad Saeedizade, Kevin Cousot, Helene Jacquenet, Lawrence Carbon and Mathieu Lafourcade
pp. 5163‑5169
pdf bib The Badalona Corpus - An Audio, Video and Neuro-Physiological Conversational Dataset
Philippe Blache, Salomé Antoine, Dorina De Jong, Lena-Marie Huttner, Emilia Kerr, Thierry Legou, Eliot Maës and Clément François
pp. 5170‑5177
pdf bib Reading Time and Vocabulary Rating in the Japanese Language: Large-Scale Japanese Reading Time Data Collection Using Crowdsourcing
Masayuki Asahara
pp. 5178‑5187
pdf bib Thematic Fit Bits: Annotation Quality and Quantity Interplay for Event Participant Representation
Yuval Marton and Asad Sayeed
pp. 5188‑5197
pdf bib ChiSense-12: An English Sense-Annotated Child-Directed Speech Corpus
Francesco Cabiddu, Lewis Bott, Gary Jones and Chiara Gambi
pp. 5198‑5205
pdf bib Making People Laugh like a Pro: Analysing Humor Through Stand-Up Comedy
Beatrice Turano and Carlo Strapparava
pp. 5206‑5211
pdf bib Testing Focus and Non-at-issue Frameworks with a Question-under-Discussion-Annotated Corpus
Christoph Hesse, Maurice Langner, Ralf Klabunde and Anton Benz
pp. 5212‑5219
pdf bib Development of a Multilingual CCG Treebank via Universal Dependencies Conversion
Tu-Anh Tran and Yusuke Miyao
pp. 5220‑5233
pdf bib The Automatic Extraction of Linguistic Biomarkers as a Viable Solution for the Early Diagnosis of Mental Disorders
Gloria Gagliardi and Fabio Tamburini
pp. 5234‑5242
pdf bib Singlish Where Got Rules One? Constructing a Computational Grammar for Singlish
Siew Yeng Chow and Francis Bond
pp. 5243‑5250
pdf bib COSMOS: Experimental and Comparative Studies of Concept Representations in Schoolchildren
Jeanne Villaneau and Farida Said
pp. 5251‑5260
pdf bib Features of Perceived Metaphoricity on the Discourse Level: Abstractness and Emotionality
Prisca Piccirilli and Sabine Schulte im Walde
pp. 5261‑5273
pdf bib Hollywood Identity Bias Dataset: A Context Oriented Bias Analysis of Movie Dialogues
Sandhya Singh, Prapti Roy, Nihar Sahoo, Niteesh Mallela, Himanshu Gupta, Pushpak Bhattacharyya, Milind Savagaonkar, Nidhi Sultan, Roshni Ramnani, Anutosh Maitra and Shubhashis Sengupta
pp. 5274‑5285
pdf bib VoxCommunis: A Corpus for Cross-linguistic Phonetic Analysis
Emily Ahn and Eleanor Chodroff
pp. 5286‑5294
pdf bib Tracking Textual Similarities in Neo-Latin Drama Networks
Andrea Peverelli, Marieke van Erp and Jan Bloemendal
pp. 5295‑5303
pdf bib Named Entity Recognition in Estonian 19th Century Parish Court Records
Siim Orasmaa, Kadri Muischnek, Kristjan Poska and Anna Edela
pp. 5304‑5313
pdf bib Investigating Independence vs. Control: Agenda-Setting in Russian News Coverage on Social Media
Annerose Eichel, Gabriella Lapesa and Sabine Schulte im Walde
pp. 5314‑5323
pdf bib SLäNDa version 2.0: Improved and Extended Annotation of Narrative and Dialogue in Swedish Literature
Sara Stymne and Carin Östman
pp. 5324‑5333
pdf bib AGILe: The First Lemmatizer for Ancient Greek Inscriptions
Evelien de Graaf, Silvia Stopponi, Jasper K. Bos, Saskia Peels-Matthey and Malvina Nissim
pp. 5334‑5344
pdf bib »textklang« – Towards a Multi-Modal Exploration Platform for German Poetry
Nadja Schauffler, Toni Bernhart, Andre Blessing, Gunilla Eschenbach, Markus Gärtner, Kerstin Jung, Anna Kinder, Julia Koch, Sandra Richter, Gabriel Viehhauser, Ngoc Thang Vu, Lorenz Wesemann and Jonas Kuhn
pp. 5345‑5355
pdf bib Predicting the Proficiency Level of Nonnative Hebrew Authors
Isabelle Nguyen and Shuly Wintner
pp. 5356‑5365
pdf bib Trends, Limitations and Open Challenges in Automatic Readability Assessment Research
Sowmya Vajjala
pp. 5366‑5377
pdf bib HateCheckHIn: Evaluating Hindi Hate Speech Detection Models
Mithun Das, Punyajoy Saha, Binny Mathew and Animesh Mukherjee
pp. 5378‑5387
pdf bib Surfer100: Generating Surveys From Web Resources, Wikipedia-style
Irene Li, Alex Fabbri, Rina Kawamura, Yixin Liu, Xiangru Tang, Jaesung Tae, Chang Shen, Sally Ma, Tomoe Mizutani and Dragomir Radev
pp. 5388‑5392
pdf bib MS-LaTTE: A Dataset of Where and When To-do Tasks are Completed
Sujay Kumar Jauhar, Nirupama Chandrasekaran, Michael Gamon and Ryen White
pp. 5393‑5403
pdf bib KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics
Saida Mussakhojayeva, Yerbolat Khassanov and Huseyin Atakan Varol
pp. 5404‑5411
pdf bib A Graph-Based Method for Unsupervised Knowledge Discovery from Financial Texts
Joel Oksanen, Abhilash Majumder, Kumar Saunack, Francesca Toni and Arun Dhondiyal
pp. 5412‑5417
pdf bib Leveraging Mental Health Forums for User-level Depression Detection on Social Media
Sravani Boinepelli, Tathagata Raha, Harika Abburi, Pulkit Parikh, Niyati Chhaya and Vasudeva Varma
pp. 5418‑5427
pdf bib Classifying Implant-Bearing Patients via their Medical Histories: a Pre-Study on Swedish EMRs with Semi-Supervised GanBERT
Benjamin Danielsson, Marina Santini, Peter Lundberg, Yosef Al-Abasse, Arne Jonsson, Emma Eneling and Magnus Stridsman
pp. 5428‑5435
pdf bib Standardisation of Dialect Comments in Social Networks in View of Sentiment Analysis: Case of Tunisian Dialect
Saméh Kchaou, Rahma Boujelbane, Emna Fsih and Lamia Hadrich-Belguith
pp. 5436‑5443
pdf bib EnsyNet: A Dataset for Encouragement and Sympathy Detection
Tiberiu Sosea and Cornelia Caragea
pp. 5444‑5449
pdf bib Preliminary Results on the Evaluation of Computational Tools for the Analysis of Quechua and Aymara
Marcelo Yuji Himoro and Antonio Pareja-Lora
pp. 5450‑5459
pdf bib A Tale of Two Regulatory Regimes: Creation and Analysis of a Bilingual Privacy Policy Corpus
Siddhant Arora, Henry Hosseini, Christine Utz, Vinayshekhar Bannihatti Kumar, Tristan Dhellemmes, Abhilasha Ravichander, Peter Story, Jasmine Mangat, Rex Chen, Martin Degeling, Thomas Norton, Thomas Hupperich, Shomir Wilson and Norman Sadeh
pp. 5460‑5472
pdf bib MeSHup: Corpus for Full Text Biomedical Document Indexing
Xindi Wang, Robert E. Mercer and Frank Rudzicz
pp. 5473‑5483
pdf bib Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding
Yanjun Gao, Dmitriy Dligach, Timothy Miller, Samuel Tesch, Ryan Laffin, Matthew M. Churpek and Majid Afshar
pp. 5484‑5493
pdf bib KC4MT: A High-Quality Corpus for Multilingual Machine Translation
Vinh Van Nguyen, Ha Nguyen, Huong Thanh Le, Thai Phuong Nguyen, Tan Van Bui, Luan Nghia Pham, Anh Tuan Phan, Cong Hoang-Minh Nguyen, Viet Hong Tran and Anh Huu Tran
pp. 5494‑5502
pdf bib Developing A Multilabel Corpus for the Quality Assessment of Online Political Talk
Kokil Jaidka
pp. 5503‑5510
pdf bib BILinMID: A Spanish-English Corpus of the US Midwest
Irati Hurtado
pp. 5511‑5516
pdf bib One Document, Many Revisions: A Dataset for Classification and Description of Edit Intents
Dheeraj Rajagopal, Xuchao Zhang, Michael Gamon, Sujay Kumar Jauhar, Diyi Yang and Eduard Hovy
pp. 5517‑5524
pdf bib CTAP for Chinese: A Linguistic Complexity Feature Automatic Calculation Platform
Yue Cui, Junhui Zhu, Liner Yang, Xuezhi Fang, Xiaobin Chen, Yujie Wang and Erhong Yang
pp. 5525‑5538
pdf bib A Corpus for Suggestion Mining of German Peer Feedback
Roman Rietsche, Eva Ritz, Julius Janda and Dominik Pfütze
pp. 5539‑5547
pdf bib CLGC: A Corpus for Chinese Literary Grace Evaluation
Yi Li, Dong Yu and Pengyuan Liu
pp. 5548‑5556
pdf bib Anonymising the SAGT Speech Corpus and Treebank
Özlem Çetinoğlu and Antje Schweitzer
pp. 5557‑5564
pdf bib Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction
Daisuke Suzuki, Yujin Takahashi, Ikumi Yamashita, Taichi Aida, Tosho Hirasawa, Michitaka Nakatsuji, Masato Mita and Mamoru Komachi
pp. 5565‑5572
pdf bib Enhanced Distant Supervision with State-Change Information for Relation Extraction
Jui Shah, Dongxu Zhang, Sam Brody and Andrew McCallum
pp. 5573‑5579
pdf bib The Hebrew Essay Corpus
Chen Gafni, Anat Prior and Shuly Wintner
pp. 5580‑5586
pdf bib Design and Evaluation of the Corpus of Everyday Japanese Conversation
Hanae Koiso, Haruka Amatani, Yasuharu Den, Yuriko Iseki, Yuichi Ishimoto, Wakako Kashino, Yoshiko Kawabata, Ken’ya Nishikawa, Yayoi Tanaka, Yasuyuki Usuda and Yuka Watanabe
pp. 5587‑5594
pdf bib Developing Language Resources and NLP Tools for the North Korean Language
Arda Akdemir, Yeojoo Jeon and Tetsuo Shibuya
pp. 5595‑5600
pdf bib Developing a Dataset of Overridden Information in Wikipedia
Masatoshi Tsuchiya and Yasutaka Yokoi
pp. 5601‑5608
pdf bib BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language
Bernardo Consoli, Henrique D. P. dos Santos, Ana Helena D. P. S. Ulbrich, Renata Vieira and Rafael H. Bordini
pp. 5609‑5616
pdf bib Universal Grammatical Dependencies for Portuguese with CINTIL Data, LX Processing and CLARIN support
António Branco, João Ricardo Silva, Luís Gomes and João António Rodrigues
pp. 5617‑5626
pdf bib CWID-hi: A Dataset for Complex Word Identification in Hindi Text
Gayatri Venugopal, Dhanya Pramod and Ravi Shekhar
pp. 5627‑5636
pdf bib Automatic Classification of Russian Learner Errors
Alla Rozovskaya
pp. 5637‑5647
pdf bib Annotation of metaphorical expressions in the Basic Corpus of Polish Metaphors
Elżbieta Hajnicz
pp. 5648‑5653
pdf bib ChiMST: A Chinese Medical Corpus for Word Segmentation and Medical Term Recognition
Yuanhe Tian, Han Qin, Fei Xia and Yan Song
pp. 5654‑5664
pdf bib Building a Synthetic Biomedical Research Article Citation Linkage Corpus
Sudipta Singha Roy and Robert E. Mercer
pp. 5665‑5672
pdf bib Dataset Construction for Scientific-Document Writing Support by Extracting Related Work Section and Citations from PDF Papers
Keita Kobayashi, Kohei Koyama, Hiromi Narimatsu and Yasuhiro Minami
pp. 5673‑5682
pdf bib RuPAWS: A Russian Adversarial Dataset for Paraphrase Identification
Nikita Martynov, Irina Krotova, Varvara Logacheva, Alexander Panchenko, Olga Kozlova and Nikita Semenov
pp. 5683‑5691
pdf bib Atril: an XML Visualization System for Corpus Texts
Andressa Rodrigues Gomide, Conceição Carapinha and Cornelia Plag
pp. 5692‑5695
pdf bib MASALA: Modelling and Analysing the Semantics of Adpositions in Linguistic Annotation of Hindi
Aryaman Arora, Nitin Venkateswaran and Nathan Schneider
pp. 5696‑5704
pdf bib Universal Dependencies for Punjabi
Aryaman Arora
pp. 5705‑5711
pdf bib TeSum: Human-Generated Abstractive Summarization Corpus for Telugu
Ashok Urlana, Nirmal Surange, Pavan Baswani, Priyanka Ravva and Manish Shrivastava
pp. 5712‑5722
pdf bib A Corpus of Simulated Counselling Sessions with Dialog Act Annotation
John Lee, Haley Fong, Lai Shuen Judy Wong, Chun Chung Mak, Chi Hin Yip and Ching Wah Larry Ng
pp. 5723‑5730
pdf bib Interactive Evaluation of Dialog Track at DSTC9
Shikib Mehri, Yulan Feng, Carla Gordon, Seyed Hossein Alavi, David Traum and Maxine Eskenazi
pp. 5731‑5738
pdf bib HADREB: Human Appraisals and (English) Descriptions of Robot Emotional Behaviors
Josue Torres-Fonsesca and Casey Kennington
pp. 5739‑5748
pdf bib Dialogue Collection for Recording the Process of Building Common Ground in a Collaborative Task
Koh Mitsuda, Ryuichiro Higashinaka, Yuhei Oga and Sen Yoshida
pp. 5749‑5758
pdf bib Collection and Analysis of Travel Agency Task Dialogues with Age-Diverse Speakers
Michimasa Inaba, Yuya Chiba, Ryuichiro Higashinaka, Kazunori Komatani, Yusuke Miyao and Takayuki Nagai
pp. 5759‑5767
pdf bib Strategy-level Entrainment of Dialogue System Users in a Creative Visual Reference Resolution Task
Deepthi Karkada, Ramesh Manuvinakurike, Maike Paetzel-Prüsmann and Kallirroi Georgila
pp. 5768‑5777
pdf bib MMChat: Multi-Modal Chat Dataset on Social Media
Yinhe Zheng, Guanyi Chen, Xin Liu and Jian Sun
pp. 5778‑5786
pdf bib E-ConvRec: A Large-Scale Conversational Recommendation Dataset for E-Commerce Customer Service
Meihuizi Jia, Ruixue Liu, Peiying Wang, Yang Song, Zexi Xi, Haobin Li, Xin Shen, Meng Chen, Jinhui Pang and Xiaodong He
pp. 5787‑5796
pdf bib SHONGLAP: A Large Bengali Open-Domain Dialogue Corpus
Syed Mostofa Monsur, Sakib Chowdhury, Md Shahrar Fatemi and Shafayat Ahmed
pp. 5797‑5804
pdf bib A Comparison of Praising Skills in Face-to-Face and Remote Dialogues
Toshiki Onishi, Asahi Ogushi, Yohei Tahara, Ryo Ishii, Atsushi Fukayama, Takao Nakamura and Akihiro Miyata
pp. 5805‑5812
pdf bib Comparing Approaches to Language Understanding for Human-Robot Dialogue: An Error Taxonomy and Analysis
Ada Tur and David Traum
pp. 5813‑5820
pdf bib SPORTSINTERVIEW: A Large-Scale Sports Interview Benchmark for Entity-centric Dialogues
Hanfei Sun, Ziyuan Cao and Diyi Yang
pp. 5821‑5828
pdf bib EmoInHindi: A Multi-label Emotion and Intensity Annotated Dataset in Hindi for Emotion Recognition in Dialogues
Gopendra Vikram Singh, Priyanshu Priya, Mauajama Firdaus, Asif Ekbal and Pushpak Bhattacharyya
pp. 5829‑5837
pdf bib The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts
Krishnapriya Vishnubhotla, Adam Hammond and Graeme Hirst
pp. 5838‑5848
pdf bib Who’s in, who’s out? Predicting the Inclusiveness or Exclusiveness of Personal Pronouns in Parliamentary Debates
Ines Rehbein and Josef Ruppenhofer
pp. 5849‑5858
pdf bib A Language Modelling Approach to Quality Assessment of OCR’ed Historical Text
Callum Booth, Robert Shoemaker and Robert Gaizauskas
pp. 5859‑5864
pdf bib Identifying Copied Fragments in a 18th Century Dutch Chronicle
Roser Morante, Eleanor L. T. Smith, Lianne Wilhelmus, Alie Lassche and Erika Kuijpers
pp. 5865‑5878
pdf bib A Study of Distant Viewing of ukiyo-e prints
Konstantina Liagkou, John Pavlopoulos and Ewa Machotka
pp. 5879‑5888
pdf bib CCTAA: A Reproducible Corpus for Chinese Authorship Attribution Research
Haining Wang and Allen Riddell
pp. 5889‑5893
pdf bib An automatic model and Gold Standard for translation alignment of Ancient Greek
Tariq Yousef, Chiara Palladino, Farnoosh Shamsian, Anise d’Orange Ferreira and Michel Ferreira dos Reis
pp. 5894‑5905
pdf bib Rhetorical Structure Approach for Online Deception Detection: A Survey
Francielle Vargas, Jonas D’Alessandro, Zohar Rabinovich, Fabrício Benevenuto and Thiago Pardo
pp. 5906‑5915
pdf bib TYPIC: A Corpus of Template-Based Diagnostic Comments on Argumentation
Shoichi Naito, Shintaro Sawada, Chihiro Nakagawa, Naoya Inoue, Kenshi Yamaguchi, Iori Shimizu, Farjana Sultana Mim, Keshav Singh and Kentaro Inui
pp. 5916‑5928
pdf bib Towards Speaker Verification for Crowdsourced Speech Collections
John Mendonca, Rui Correia, Mariana Lourenço, João Freitas and Isabel Trancoso
pp. 5929‑5937
pdf bib Align-smatch: A Novel Evaluation Method for Chinese Abstract Meaning Representation Parsing based on Alignment of Concept and Relation
Liming Xiao, Bin Li, Zhixing Xu, Kairui Huo, Minxuan Feng, Junsheng Zhou and Weiguang Qu
pp. 5938‑5945
pdf bib Dynamic Human Evaluation for Relative Model Comparisons
Thórhildur Thorleiksdóttir, Cedric Renggli, Nora Hollenstein and Ce Zhang
pp. 5946‑5955
pdf bib Please, Don’t Forget the Difference and the Confidence Interval when Seeking for the State-of-the-Art Status
Yves Bestgen
pp. 5956‑5962
pdf bib PCR4ALL: A Comprehensive Evaluation Benchmark for Pronoun Coreference Resolution in English
Xinran Zhao, Hongming Zhang and Yangqiu Song
pp. 5963‑5973
pdf bib Estimating Confidence of Predictions of Individual Classifiers and Their Ensembles for the Genre Classification Task
Mikhail Lepekhin and Serge Sharoff
pp. 5974‑5982
pdf bib What do we really know about State of the Art NER?
Sowmya Vajjala and Ramya Balasubramaniam
pp. 5983‑5993
pdf bib ProQE: Proficiency-wise Quality Estimation dataset for Grammatical Error Correction
Yujin Takahashi, Masahiro Kaneko, Masato Mita and Mamoru Komachi
pp. 5994‑6000
pdf bib Evaluation of Off-the-shelf Speech Recognizers on Different Accents in a Dialogue Domain
Divya Tadimeti, Kallirroi Georgila and David Traum
pp. 6001‑6008
pdf bib Sentence Pair Embeddings Based Evaluation Metric for Abstractive and Extractive Summarization
Ramya Akula and Ivan Garibay
pp. 6009‑6017
pdf bib On “Human Parity” and “Super Human Performance” in Machine Translation Evaluation
Thierry Poibeau
pp. 6018‑6023
pdf bib Evaluation Benchmarks for Spanish Sentence Representations
Vladimir Araujo, Andrés Carvallo, Souvik Kundu, José Cañete, Marcelo Mendoza, Robert E. Mercer, Felipe Bravo-Marquez, Marie-Francine Moens and Alvaro Soto
pp. 6024‑6034
pdf bib UMUTextStats: A linguistic feature extraction tool for Spanish
José Antonio García-Díaz, Pedro José Vivancos-Vicente, Ángela Almela and Rafael Valencia-García
pp. 6035‑6044
pdf bib Problem-solving Recognition in Scientific Text
Kevin Heffernan and Simone Teufel
pp. 6045‑6058
pdf bib HRCA+: Advanced Multiple-choice Machine Reading Comprehension Method
Yuxiang Zhang and Hayato Yamana
pp. 6059‑6068
pdf bib HyperBox: A Supervised Approach for Hypernym Discovery using Box Embeddings
Maulik Parmar and Apurva Narayan
pp. 6069‑6076
pdf bib Extracting Space Situational Awareness Events from News Text
Zhengnan Xie, Alice Saebom Kwak, Enfa George, Laura W. Dozal, Hoang Van, Moriba Jah, Roberto Furfaro and Peter Jansen
pp. 6077‑6082
pdf bib PerCQA: Persian Community Question Answering Dataset
Naghme Jamali, Yadollah Yaghoobzadeh and Heshaam Faili
pp. 6083‑6092
pdf bib GrASP: A Library for Extracting and Exploring Human-Interpretable Textual Patterns
Piyawat Lertvittayakumjorn, Leshem Choshen, Eyal Shnarch and Francesca Toni
pp. 6093‑6103
pdf bib Recurrent Neural Networks with Mixed Hierarchical Structures and EM Algorithm for Natural Language Processing
Zhaoxin Luo and Michael Zhu
pp. 6104‑6113
pdf bib Korean-Specific Dataset for Table Question Answering
Changwook Jun, Jooyoung Choi, Myoseop Sim, Hyun Kim, Hansol Jang and Kyungkoo Min
pp. 6114‑6120
pdf bib GerCCT: An Annotated Corpus for Mining Arguments in German Tweets on Climate Change
Robin Schaefer and Manfred Stede
pp. 6121‑6130
pdf bib Budget Argument Mining Dataset Using Japanese Minutes from the National Diet and Local Assemblies
Yasutomo Kimura, Hokuto Ototake and Minoru Sasaki
pp. 6131‑6138
pdf bib Context-based Virtual Adversarial Training for Text Classification with Noisy Labels
Do-Myoung Lee, Yeachan Kim and Chang gyun Seo
pp. 6139‑6146
pdf bib FinMath: Injecting a Tree-structured Solver for Question Answering over Financial Reports
Chenying Li, Wenbo Ye and Yilun Zhao
pp. 6147‑6152
pdf bib HeadlineCause: A Dataset of News Headlines for Detecting Causalities
Ilya Gusev and Alexey Tikhonov
pp. 6153‑6161
pdf bib Incorporating Zoning Information into Argument Mining from Biomedical Literature
Boyang Liu, Viktor Schlegel, Riza Batista-Navarro and Sophia Ananiadou
pp. 6162‑6169
pdf bib MAKED: Multi-lingual Automatic Keyword Extraction Dataset
Yash Verma, Anubhav Jangra, Sriparna Saha, Adam Jatowt and Dwaipayan Roy
pp. 6170‑6179
pdf bib From Examples to Rules: Neural Guided Rule Synthesis for Information Extraction
Robert Vacareanu, Marco A. Valenzuela-Escárcega, George Caique Gouveia Barbosa, Rebecca Sharp, Gustave Hahn-Powell and Mihai Surdeanu
pp. 6180‑6189
pdf bib Enhancing Relation Extraction via Adversarial Multi-task Learning
Han Qin, Yuanhe Tian and Yan Song
pp. 6190‑6199
pdf bib Query Obfuscation by Semantic Decomposition
Danushka Bollegala, Tomoya Machide and Ken-ichi Kawarabayashi
pp. 6200‑6211
pdf bib TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks
Ruofan Hu, Dongyu Zhang, Dandan Tao, Thomas Hartvigsen, Hao Feng and Elke Rundensteiner
pp. 6212‑6222
pdf bib Named Entity Recognition to Detect Criminal Texts on the Web
Paweł Skórzewski, Mikołaj Pieniowski and Grazyna Demenko
pp. 6223‑6231
pdf bib Task-Driven and Experience-Based Question Answering Corpus for In-Home Robot Application in the House3D Virtual Environment
Zhuoqun Xu, Liubo Ouyang and Yang Liu
pp. 6232‑6239
pdf bib ELRC Action: Covering Confidentiality, Correctness and Cross-linguality
Tom Vanallemeersch, Arne Defauw, Sara Szoc, Alina Kramchaninova, Joachim Van den Bogaert and Andrea Lösch
pp. 6240‑6249
pdf bib RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports
Sarvesh Soni, Meghana Gudala, Atieh Pajouhi and Kirk Roberts
pp. 6250‑6259
pdf bib Knowledge Graph - Deep Learning: A Case Study in Question Answering in Aviation Safety Domain
Ankush Agarwal, Raj Gite, Shreya Laddha, Pushpak Bhattacharyya, Satyanarayan Kar, Asif Ekbal, Prabhjit Thind, Rajesh Zele and Ravi Shankar
pp. 6260‑6270
pdf bib A Bayesian Topic Model for Human-Evaluated Interpretability
Justin Wood, Corey Arnold and Wei Wang
pp. 6271‑6279
pdf bib A Large Interlinked Knowledge Graph of the Italian Cultural Heritage
Stefano Faralli, Andrea Lenzi and Paola Velardi
pp. 6280‑6289
pdf bib Training on Lexical Resources
Kenneth Church, Xingyu Cai and Yuchen Bian
pp. 6290‑6299
pdf bib Challenging the Assumption of Structure-based embeddings in Few- and Zero-shot Knowledge Graph Completion
Filip Cornell, Chenda Zhang, Jussi Karlgren and Sarunas Girdzijauskas
pp. 6300‑6309
pdf bib Open Terminology Management and Sharing Toolkit for Federation of Terminology Databases
Andis Lagzdiņš, Uldis Siliņš, Toms Bergmanis, Mārcis Pinnis, Artūrs Vasiļevskis and Andrejs Vasiļjevs
pp. 6310‑6316
pdf bib RELATE: Generating a linguistically inspired Knowledge Graph for fine-grained emotion classification
Annika Marie Schoene, Nina Dethlefs and Sophia Ananiadou
pp. 6317‑6327
pdf bib Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR
Nina Markl and Stephen Joseph McNulty
pp. 6328‑6339
pdf bib Masader: Metadata Sourcing for Arabic Text and Speech Data Resources
Zaid Alyafeai, Maraim Masoud, Mustafa Ghaleb and Maged S. Al-shaibani
pp. 6340‑6351
pdf bib Linghub2: Language Resource Discovery Tool for Language Technologies
Cécile Robin, Gautham Vadakkekara Suresh, Víctor Rodriguez-Doncel, John P. McCrae and Paul Buitelaar
pp. 6352‑6360
pdf bib CxLM: A Construction and Context-aware Language Model
Yu-Hsiang Tseng, Cing-Fang Shih, Pin-Er Chen, Hsin-Yu Chou, Mao-Chang Ku and Shu-Kai Hsieh
pp. 6361‑6369
pdf bib The Lexometer: A Shiny Application for Exploratory Analysis and Visualization of Corpus Data
Oufan Hai, Matthew Sundberg, Katherine Trice, Rebecca Friedman and Scott Grimm
pp. 6370‑6376
pdf bib TallVocabL2Fi: A Tall Dataset of 15 Finnish L2 Learners’ Vocabulary
Frankie Robertson, Li-Hsin Chang and Sini Söyrinki
pp. 6377‑6386
pdf bib CAMS: An Annotated Corpus for Causal Analysis of Mental Health Issues in Social Media Posts
Muskan Garg, Chandni Saxena, Sriparna Saha, Veena Krishnan, Ruchi Joshi and Vijay Mago
pp. 6387‑6396
pdf bib How Does the Experimental Setting Affect the Conclusions of Neural Encoding Models?
Xiaohan Zhang, Shaonan Wang and Chengqing Zong
pp. 6397‑6404
pdf bib SPADE: A Big Five-Mturk Dataset of Argumentative Speech Enriched with Socio-Demographics for Personality Detection
Elma Kerz, Yu Qiao, Sourabh Zanwar and Daniel Wiechmann
pp. 6405‑6419
pdf bib Progress in Multilingual Speech Recognition for Low Resource Languages Kurmanji Kurdish, Cree and Inuktut
Vishwa Gupta and Gilles Boulianne
pp. 6420‑6428
pdf bib Efficient Entity Candidate Generation for Low-Resource Languages
Alberto Garcia-Duran, Akhil Arora and Robert West
pp. 6429‑6438
pdf bib What a Creole Wants, What a Creole Needs
Heather Lent, Kelechi Ogueji, Miryam de Lhoneux, Orevaoghene Ahia and Anders Søgaard
pp. 6439‑6449
pdf bib Extensions to Brahmic script processing within the Nisaba library: new scripts, languages and utilities
Alexander Gutkin, Cibu Johny, Raiomond Doctor, Lawrence Wolf-Sonkin and Brian Roark
pp. 6450‑6460
pdf bib Predicting Embedding Reliability in Low-Resource Settings Using Corpus Similarity Measures
Jonathan Dunn, Haipeng Li and Damian Sastre
pp. 6461‑6470
pdf bib Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation
Idris Abdulmumin, Satya Ranjan Dash, Musa Abdullahi Dawud, Shantipriya Parida, Shamsuddeen Muhammad, Ibrahim Sa’id Ahmad, Subhadarshi Panda, Ondřej Bojar, Bashir Shehu Galadanci and Bello Shehu Bello
pp. 6471‑6479
pdf bib A Survey of Machine Translation Tasks on Nigerian Languages
Ebelechukwu Nwafor and Anietie Andy
pp. 6480‑6486
pdf bib Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset
Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram Shi and Pascale Fung
pp. 6487‑6494
pdf bib Survey on Thai NLP Language Resources and Tools
Ratchakrit Arreerard, Stephen Mander and Scott Piao
pp. 6495‑6505
pdf bib LaoPLM: Pre-trained Language Models for Lao
Nankai Lin, Yingwen Fu, Chuwei Chen, Ziyu Yang and Shengyi Jiang
pp. 6506‑6512
pdf bib The Maaloula Aramaic Speech Corpus (MASC): From Printed Material to a Lemmatized and Time-Aligned Corpus
Ghattas Eid, Esther Seyffarth and Ingo Plag
pp. 6513‑6520
pdf bib VIMQA: A Vietnamese Dataset for Advanced Reasoning and Explainable Multi-hop Question Answering
Khang Le, Hien Nguyen, Tung Le Thanh and Minh Nguyen
pp. 6521‑6529
pdf bib Language Identification for Austronesian Languages
Jonathan Dunn and Wikke Nijhof
pp. 6530‑6539
pdf bib A Mapudüngun FST Morphological Analyser and its Web Interface
Andrés Chandía
pp. 6540‑6547
pdf bib Improving Large-scale Language Models and Resources for Filipino
Jan Christian Blaise Cruz and Charibeth Cheng
pp. 6548‑6555
pdf bib Thirumurai: A Large Dataset of Tamil Shaivite Poems and Classification of Tamil Pann
Shankar Mahadevan, Rahul Ponnusamy, Prasanna Kumar Kumaresan, Prabakaran Chandran, Ruba Priyadharshini, Sangeetha S and Bharathi Raja Chakravarthi
pp. 6556‑6562
pdf bib Generating Monolingual Dataset for Low Resource Language Bodo from old books using Google Keep
Sanjib Narzary, Maharaj Brahma, Mwnthai Narzary, Gwmsrang Muchahary, Pranav Kumar Singh, Apurbalal Senapati, Sukumar Nandi and Bidisha Som
pp. 6563‑6570
pdf bib AsNER - Annotated Dataset and Baseline for Assamese Named Entity recognition
Dhrubajyoti Pathak, Sukumar Nandi and Priyankoo Sarmah
pp. 6571‑6577
pdf bib GeezSwitch: Language Identification in Typologically Related Low-resourced East African Languages
Fitsum Gaim, Wonsuk Yang and Jong C. Park
pp. 6578‑6584
pdf bib Handwritten Paleographic Greek Text Recognition: A Century-Based Approach
Paraskevi Platanou, John Pavlopoulos and Georgios Papaioannou
pp. 6585‑6589
pdf bib Quality Control for Crowdsourced Bilingual Dictionary in Low-Resource Languages
Hiroki Chida, Yohei Murakami and Mondheera Pituxcoosuvarn
pp. 6590‑6596
pdf bib An Inflectional Database for Gitksan
Bruce Oliver, Clarissa Forbes, Changbing Yang, Farhan Samir, Edith Coates, Garrett Nicolai and Miikka Silfverberg
pp. 6597‑6606
pdf bib PyCantonese: Cantonese Linguistics and NLP in Python
Jackson Lee, Litong Chen, Charles Lam, Chaak Ming Lau and Tsz-Him Tsui
pp. 6607‑6611
pdf bib Afaan Oromo Hate Speech Detection and Classification on Social Media
Teshome Mulugeta Ababu and Michael Melese Woldeyohannis
pp. 6612‑6619
pdf bib Cross-lingual Linking of Automatically Constructed Frames and FrameNet
Ryohei Sasano
pp. 6620‑6625
pdf bib Aligning the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs
Ana-Maria Barbu, Verginica Barbu Mititelu and Cătălin Mititelu
pp. 6626‑6634
pdf bib PortiLexicon-UD: a Portuguese Lexical Resource according to Universal Dependencies Model
Lucelene Lopes, Magali Duran, Paulo Fernandes and Thiago Pardo
pp. 6635‑6643
pdf bib Extended Parallel Corpus for Amharic-English Machine Translation
Andargachew Mekonnen Gezmu, Andreas Nürnberger and Tesfaye Bayu Bati
pp. 6644‑6653
pdf bib Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French
Cheikh M. Bamba Dione, Alla Lo, Elhadji Mamadou Nguer and Sileye Ba
pp. 6654‑6661
pdf bib Criteria for Useful Automatic Romanization in South Asian Languages
Isin Demirsahin, Cibu Johny, Alexander Gutkin and Brian Roark
pp. 6662‑6673
pdf bib BERTology for Machine Translation: What BERT Knows about Linguistic Difficulties for Translation
Yuqian Dai, Marc de Kamps and Serge Sharoff
pp. 6674‑6690
pdf bib CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
Ye Jia, Michelle Tadmor Ramanovich, Quan Wang and Heiga Zen
pp. 6691‑6703
pdf bib JParaCrawl v3.0: A Large-scale English-Japanese Parallel Corpus
Makoto Morishita, Katsuki Chousa, Jun Suzuki and Masaaki Nagata
pp. 6704‑6710
pdf bib Learning How to Translate North Korean through South Korean
Hwichan Kim, Sangwhan Moon, Naoaki Okazaki and Mamoru Komachi
pp. 6711‑6718
pdf bib FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation
Wenhao Zhu, Shujian Huang, Tong Pu, Pingxuan Huang, Xu Zhang, Jian Yu, Wei Chen, Yanfeng Wang and Jiajun Chen
pp. 6719‑6727
pdf bib SansTib, a Sanskrit - Tibetan Parallel Corpus and Bilingual Sentence Embedding Model
Sebastian Nehrdich
pp. 6728‑6734
pdf bib VISA: An Ambiguous Subtitles Dataset for Visual Scene-aware Machine Translation
Yihang Li, Shuichiro Shimizu, Weiqi Gu, Chenhui Chu and Sadao Kurohashi
pp. 6735‑6743
pdf bib A Benchmark Dataset for Multi-Level Complexity-Controllable Machine Translation
Kazuki Tani, Ryoya Yuasa, Kazuki Takikawa, Akihiro Tamura, Tomoyuki Kajiwara, Takashi Ninomiya and Tsuneo Kato
pp. 6744‑6752
pdf bib gaHealth: An English–Irish Bilingual Corpus of Health Data
Séamus Lankford, Haithem Afli, Órla Ní Loinsigh and Andy Way
pp. 6753‑6758
pdf bib Translation Memories as Baselines for Low-Resource Machine Translation
Rebecca Knowles and Patrick Littell
pp. 6759‑6767
pdf bib N24News: A New Dataset for Multimodal News Classification
Zhen Wang, Xu Shan, Xiangxie Zhang and Jie Yang
pp. 6768‑6775
pdf bib MultiSubs: A Large-scale Multimodal and Multilingual Dataset
Josiah Wang, Josiel Figueiredo and Lucia Specia
pp. 6776‑6785
pdf bib CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition
Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Yiu, Rita Frieske, Holy Lovenia, Genta Winata, Qifeng Chen, Xiaojuan Ma, Bertram Shi and Pascale Fung
pp. 6786‑6793
pdf bib Multimodal Negotiation Corpus with Various Subjective Assessments for Social-Psychological Outcome Prediction from Non-Verbal Cues
Nobukatsu Hojo, Satoshi Kobashikawa, Saki Mizuno and Ryo Masumura
pp. 6794‑6801
pdf bib MMDAG: Multimodal Directed Acyclic Graph Network for Emotion Recognition in Conversation
Shuo Xu, Yuxiang Jia, Changyong Niu and Hongying Zan
pp. 6802‑6807
pdf bib Automatic Gloss-level Data Augmentation for Sign Language Translation
Jin Yea Jang, Han-Mu Park, Saim Shin, Suna Shin, Byungcheon Yoon and Gahgene Gweon
pp. 6808‑6813
pdf bib Image Description Dataset for Language Learners
Kento Tanaka, Taichi Nishimura, Hiroaki Nanjo, Keisuke Shirai, Hirotaka Kameko and Masatake Dantsuji
pp. 6814‑6821
pdf bib The Multimodal Annotation Software Tool (MAST)
Bruno Cardoso and Neil Cohn
pp. 6822‑6828
pdf bib A Multimodal German Dataset for Automatic Lip Reading Systems and Transfer Learning
Gerald Schwiebert, Cornelius Weber, Leyuan Qu, Henrique Siqueira and Stefan Wermter
pp. 6829‑6836
pdf bib Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers
Muskan Garg, Seema Wazarkar, Muskaan Singh and Ondřej Bojar
pp. 6837‑6847
pdf bib Cross-lingual and Multilingual CLIP
Fredrik Carlsson, Philipp Eisen, Faton Rekathati and Magnus Sahlgren
pp. 6848‑6854
pdf bib BAN-Cap: A Multi-Purpose English-Bangla Image Descriptions Dataset
Mohammad Faiyaz Khan, S.M. Sadiq-Ur-Rahman Shifath and Md Saiful Islam
pp. 6855‑6865
pdf bib SSR7000: A Synchronized Corpus of Ultrasound Tongue Imaging for End-to-End Silent Speech Recognition
Naoki Kimura, Zixiong Su, Takaaki Saeki and Jun Rekimoto
pp. 6866‑6873
pdf bib A Simple Yet Effective Corpus Construction Method for Chinese Sentence Compression
Yang Zhao, Hiroshi Kanayama, Issei Yoshida, Masayasu Muraoka and Akiko Aizawa
pp. 6874‑6883
pdf bib JADE: Corpus for Japanese Definition Modelling
Han Huang, Tomoyuki Kajiwara and Yuki Arase
pp. 6884‑6888
pdf bib Unraveling the Mystery of Artifacts in Machine Generated Text
Jiashu Pu, Ziyi Huang, Yadong Xi, Guandan Chen, Weijie Chen and Rongsheng Zhang
pp. 6889‑6898
pdf bib Logic-Guided Message Generation from Raw Real-Time Sensor Data
Ernie Chang, Alisa Kovtunova, Stefan Borgwardt, Vera Demberg, Kathryn Chapman and Hui-Syuan Yeh
pp. 6899‑6908
pdf bib The Bull and the Bear: Summarizing Stock Market Discussions
Ayush Kumar, Dhyey Jani, Jay Shah, Devanshu Thakar, Varun Jain and Mayank Singh
pp. 6909‑6913
pdf bib Combination of Contextualized and Non-Contextualized Layers for Lexical Substitution in French
Kévin Espasa, Emmanuel Morin and Olivier Hamon
pp. 6914‑6921
pdf bib SuMe: A Dataset Towards Summarizing Biomedical Mechanisms
Mohaddeseh Bastan, Nishant Shankar, Mihai Surdeanu and Niranjan Balasubramanian
pp. 6922‑6931
pdf bib CATAMARAN: A Cross-lingual Long Text Abstractive Summarization Dataset
Zheng Chen and Hongyu Lin
pp. 6932‑6937
pdf bib Emotion analysis and detection during COVID-19
Tiberiu Sosea, Chau Pham, Alexander Tekle, Cornelia Caragea and Junyi Jessy Li
pp. 6938‑6947
pdf bib Cross-lingual Emotion Detection
Sabit Hassan, Shaden Shaar and Kareem Darwish
pp. 6948‑6958
pdf bib DirectQuote: A Dataset for Direct Quotation Extraction and Attribution in News Articles
Yuanchi Zhang and Yang Liu
pp. 6959‑6966
pdf bib VaccineLies: A Natural Language Resource for Learning to Recognize Misinformation about the COVID-19 and HPV Vaccines
Maxwell Weinzierl and Sanda Harabagiu
pp. 6967‑6975
pdf bib Tackling Irony Detection using Ensemble Classifiers
Christoph Turban and Udo Kruschwitz
pp. 6976‑6984
pdf bib Automatic Construction of an Annotated Corpus with Implicit Aspects
Aye Aye Mar and Kiyoaki Shirai
pp. 6985‑6991
pdf bib A Multimodal Corpus for Emotion Recognition in Sarcasm
Anupama Ray, Shubham Mishra, Apoorva Nunna and Pushpak Bhattacharyya
pp. 6992‑7003
pdf bib Annotation of Valence Unfolding in Spoken Personal Narratives
Aniruddha Tammewar, Franziska Braun, Gabriel Roccabruna, Sebastian Bayerl, Korbinian Riedhammer and Giuseppe Riccardi
pp. 7004‑7013
pdf bib A Large-Scale Japanese Dataset for Aspect-based Sentiment Analysis
Yuki Nakayama, Koji Murakami, Gautam Kumar, Sudha Bhingardive and Ikuko Hardaway
pp. 7014‑7021
pdf bib A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain
Haruya Suzuki, Yuto Miyauchi, Kazuki Akiyama, Tomoyuki Kajiwara, Takashi Ninomiya, Noriko Takemura, Yuta Nakashima and Hajime Nagahara
pp. 7022‑7028
pdf bib Complementary Learning of Aspect Terms for Aspect-based Sentiment Analysis
Han Qin, Yuanhe Tian, Fei Xia and Yan Song
pp. 7029‑7039
pdf bib Deep One-Class Hate Speech Detection Model
Saugata Bose and Guoxin Su
pp. 7040‑7048
pdf bib Opinions in Interactions: New Annotations of the SEMAINE Database
Valentin Barriere, Slim Essid and Chloé Clavel
pp. 7049‑7055
pdf bib Pars-ABSA: a Manually Annotated Aspect-based Sentiment Analysis Benchmark on Farsi Product Reviews
Taha Shangipour Ataei, Kamyar Darvishi, Soroush Javdan, Behrouz Minaei-Bidgoli and Sauleh Eetemadi
pp. 7056‑7060
pdf bib HindiMD: A Multi-domain Corpora for Low-resource Sentiment Analysis
Mamta ., Asif Ekbal, Pushpak Bhattacharyya, Tista Saha, Alka Kumar and Shikha Srivastava
pp. 7061‑7070
pdf bib Sentiment Analysis of Homeric Text: The 1st Book of Iliad
John Pavlopoulos, Alexandros Xenos and Davide Picca
pp. 7071‑7077
pdf bib The Persian Dependency Treebank Made Universal
Pegah Safari, Mohammad Sadegh Rasooli, Amirsaeid Moloodi and Alireza Nourian
pp. 7078‑7087
pdf bib GujMORPH - A Dataset for Creating Gujarati Morphological Analyzer
Jatayu Baxi and Brijesh Bhatt
pp. 7088‑7095
pdf bib Informal Persian Universal Dependency Treebank
Roya Kabiri, Simin Karimi and Mihai Surdeanu
pp. 7096‑7105
pdf bib Automatic Correction of Syntactic Dependency Annotation Differences
Andrew Zupon, Andrew Carnie, Michael Hammond and Mihai Surdeanu
pp. 7106‑7112
pdf bib Building Large-Scale Japanese Pronunciation-Annotated Corpora for Reading Heteronymous Logograms
Fumikazu Sato, Naoki Yoshinaga and Masaru Kitsuregawa
pp. 7113‑7121
pdf bib StyleKQC: A Style-Variant Paraphrase Corpus for Korean Questions and Commands
Won Ik Cho, Sangwhan Moon, Jongin Kim, Seokmin Kim and Nam Soo Kim
pp. 7122‑7128
pdf bib Syntax-driven Approach for Semantic Role Labeling
Yuanhe Tian, Han Qin, Fei Xia and Yan Song
pp. 7129‑7139
pdf bib HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish
Marcin Woliński, Bartłomiej Nitoń, Witold Kieraś and Jakub Szymanik
pp. 7140‑7146
pdf bib Lexical Resource Mapping via Translations
Hongchang Bao, Bradley Hauer and Grzegorz Kondrak
pp. 7147‑7154
pdf bib Unsupervised Attention-based Sentence-Level Meta-Embeddings from Contextualised Language Models
Keigo Takahashi and Danushka Bollegala
pp. 7155‑7163
pdf bib Identification of Fine-Grained Location Mentions in Crisis Tweets
Sarthak Khanal, Maria Traskowsky and Doina Caragea
pp. 7164‑7173
pdf bib HateBR: A Large Expert Annotated Corpus of Brazilian Instagram Comments for Offensive Language and Hate Speech Detection
Francielle Vargas, Isabelle Carvalho, Fabiana Rodrigues de Góes, Thiago Pardo and Fabrício Benevenuto
pp. 7174‑7183
pdf bib MentalBERT: Publicly Available Pretrained Language Models for Mental Healthcare
Shaoxiong Ji, Tianlin Zhang, Luna Ansari, Jie Fu, Prayag Tiwari and Erik Cambria
pp. 7184‑7190
pdf bib Leveraging Hashtag Networks for Multimodal Popularity Prediction of Instagram Posts
Yu Yun Liao
pp. 7191‑7198
pdf bib Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis
Hang Jiang, Yining Hua, Doug Beeferman and Deb Roy
pp. 7199‑7208
pdf bib Did that happen? Predicting Social Media Posts that are Indicative of what happened in a scene: A case study of a TV show
Anietie Andy, Reno Kriz, Sharath Chandra Guntuku, Derry Tanti Wijaya and Chris Callison-Burch
pp. 7209‑7214
pdf bib HashSet - A Dataset For Hashtag Segmentation
Prashant Kodali, Akshala Bhatnagar, Naman Ahuja, Manish Shrivastava and Ponnurangam Kumaraguru
pp. 7215‑7219
pdf bib Using Convolution Neural Network with BERT for Stance Detection in Vietnamese
Oanh Tran, Anh Cong Phung and Bach Xuan Ngo
pp. 7220‑7225
pdf bib Annotation-Scheme Reconstruction for "Fake News" and Japanese Fake News Dataset
Taichi Murayama, Shohei Hisada, Makoto Uehara, Shoko Wakamiya and Eiji Aramaki
pp. 7226‑7234
pdf bib RoBERTuito: a pre-trained language model for social media text in Spanish
Juan Manuel Pérez, Damián Ariel Furman, Laura Alonso Alemany and Franco M. Luque
pp. 7235‑7243
pdf bib Construction of Responsive Utterance Corpus for Attentive Listening Response Production
Koichiro Ito, Masaki Murata, Tomohiro Ohno and Shigeki Matsubara
pp. 7244‑7252
pdf bib Speak: A Toolkit Using Amazon Mechanical Turk to Collect and Validate Speech Audio Recordings
Christopher Song, David Harwath, Tuka Alhanai and James Glass
pp. 7253‑7258
pdf bib ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation
Holy Lovenia, Samuel Cahyawijaya, Genta Winata, Peng Xu, Yan Xu, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram Shi and Pascale Fung
pp. 7259‑7268
pdf bib A Romanization System and WebMAUS Aligner for Arabic Varieties
Jalal Al-Tamimi, Florian Schiel, Ghada Khattab, Navdeep Sokhey, Djegdjiga Amazouz, Abdulrahman Dallak and Hajar Moussa
pp. 7269‑7276
pdf bib BembaSpeech: A Speech Recognition Corpus for the Bemba Language
Claytone Sikasote and Antonios Anastasopoulos
pp. 7277‑7283
pdf bib BehanceCC: A ChitChat Detection Dataset For Livestreaming Video Transcripts
Viet Lai, Amir Pouran Ben Veyseh, Franck Dernoncourt and Thien Huu Nguyen
pp. 7284‑7290
pdf bib Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection
Sheng Li, Jiyi Li, Qianying Liu and Zhuo Gong
pp. 7291‑7297
pdf bib A new European Portuguese corpus for the study of Psychosis through speech analysis
Maria Forjó, Daniel Neto, Alberto Abad, H. Sofia Pinto and Joaquim Gago
pp. 7298‑7304
pdf bib Investigating Inter- and Intra-speaker Voice Conversion using Audiobooks
Aghilas Sini, Damien Lolive, Nelly Barbot and Pierre Alain
pp. 7305‑7313
pdf bib Multilingual Transfer Learning for Children Automatic Speech Recognition
Thomas Rolland, Alberto Abad, Catia Cucchiarini and Helmer Strik
pp. 7314‑7320
pdf bib BehanceQA: A New Dataset for Identifying Question-Answer Pairs in Video Transcripts
Amir Pouran Ben Veyseh, Viet Lai, Franck Dernoncourt and Thien Huu Nguyen
pp. 7321‑7327
pdf bib Bidirectional Skeleton-Based Isolated Sign Recognition using Graph Convolutional Networks
Konstantinos M. Dafnis, Evgenia Chroni, Carol Neidle and Dimitri Metaxas
pp. 7328‑7338
pdf bib Deep learning-based end-to-end spoken language identification system for domain-mismatched scenario
Woohyun Kang, Md Jahangir Alam and Abderrahim Fathan
pp. 7339‑7343
pdf bib Handwritten Character Generation using Y-Autoencoder for Character Recognition Model Training
Tomoki Kitagawa, Chee Siang Leow and Hiromitsu Nishizaki
pp. 7344‑7351
pdf bib Attention is All you Need for Robust Temporal Reasoning
Lis Kanashiro Pereira
pp. 7352‑7359
pdf bib PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter
Kornraphop Kawintiranon and Lisa Singh
pp. 7360‑7367
pdf bib Modeling the Impact of Syntactic Distance and Surprisal on Cross-Slavic Text Comprehension
Irina Stenger, Philip Georgis, Tania Avgustinova, Bernd Möbius and Dietrich Klakow
pp. 7368‑7376
pdf bib BERTifying Sinhala - A Comprehensive Analysis of Pre-trained Language Models for Sinhala Text Classification
Vinura Dhananjaya, Piyumal Demotte, Surangika Ranathunga and Sanath Jayasena
pp. 7377‑7385
pdf bib Pre-training and Evaluating Transformer-based Language Models for Icelandic
Jón Friðrik Daðason and Hrafn Loftsson
pp. 7386‑7391

Last modified on June 13, 2022, 10:59 a.m.