TOPIC: Browse articles of the conference sorted by topic

A - C - D - E - G - H - I - K - L - M - N - O - P - Q - S - T - U - V - W

A
Acquisition
A Neural Network Based Model for Loanword Identification in Uyghur
Evaluating Phonemic Transcription of Low-Resource Tonal Languages for Language Documentation
Contextualized Usage-Based Material Selection
CBFC: a parallel L2 speech corpus for Korean and French learners
A Large Resource of Patterns for Verbal Paraphrases
A 2nd Longitudinal Corpus for Children's Writing with Enhanced Output for Specific Spelling Patterns
Automatic Annotation of Semantic Term Types in the Complete ACL Anthology Reference Corpus
Infant Word Comprehension-to-Production Index Applied to Investigation of Noun Learning Predominance Using Cross-lingual CDI database
Using English Baits to Catch Serbian Multi-Word Terminology
BioRo: The Biomedical Corpus for the Romanian Language
Analyzing Vocabulary Commonality Index Using Large-scaled Database of Child Language Development
KIT-Multi: A Translation-Oriented Multilingual Embedding Corpus
Multi-lingual Argumentative Corpora in English, Turkish, Greek, Albanian, Croatian, Serbian, Macedonian, Bulgarian, Romanian and Arabic
The Distribution and Prosodic Realization of Verb Forms in German Infant-Directed Speech
A Large Multilingual and Multi-domain Dataset for Recommender Systems
Cross-linguistically Small World Networks are Ubiquitous in Child-directed Speech
L1-L2 Parallel Treebank of Learner Chinese: Overused and Underused Syntactic Structures
The Use of Text Alignment in Semi-Automatic Error Analysis: Use Case in the Development of the Corpus of the Latvian Language Learners
Error annotation in a Learner Corpus of Portuguese
The AnnCor CHILDES Treebank
A Speaking Atlas of the Regional Languages of France
BabyCloud, a Technological Platform for Parents and Researchers

Anaphora, Coreference
Discourse Coherence Through the Lens of an Annotated Text Corpus: A Case Study
Classifying Sluice Occurrences in Dialogue
A Gold Anaphora Annotation Layer on an Eye Movement Corpus
Annotating Zero Anaphora for Question Answering
Towards a Diagnosis of Textual Difficulties for Children with Dyslexia
Cross-Document, Cross-Language Event Coreference Annotation Using Event Hoppers
BASHI: A Corpus of Wall Street Journal Articles Annotated with Bridging Links
Deep Neural Networks for Coreference Resolution for Polish
SzegedKoref: A Hungarian Coreference Corpus
A Corpus to Learn Refer-to-as Relations for Nominals
Sanaphor++: Combining Deep Neural Networks with Semantics for Coreference Resolution
ANCOR-AS: Enriching the ANCOR Corpus with Syntactic Annotations
ParCorFull: a Parallel Corpus Annotated with Full Coreference
Coreference Resolution in FreeLing 4.0
SACR: A Drag-and-Drop Based Tool for Coreference Annotation

C
Cognitive Methods
A Gold Anaphora Annotation Layer on an Eye Movement Corpus
The Natural Stories Corpus
Contextualized Usage-Based Material Selection
Unfolding the External Behavior and Inner Affective State of Teammates through Ensemble Learning: Experimental Evidence from a Dyadic Team Corpus
Definite Description Lexical Choice: taking Speaker's Personality into account
Referring Expression Generation in time-constrained communication
Multi Modal Distance - An Approach to Stemma Generation With Weighting
Knowing the Author by the Company His Words Keep
Fluid Annotation: A Granularity-aware Annotation Tool for Chinese Word Fluidity
Rollenwechsel-English: a large-scale semantic role corpus
Building an English Vocabulary Knowledge Dataset of Japanese English-as-a-Second-Language Learners Using Crowdsourcing
Automatic Labeling of Problem-Solving Dialogues for Computational Microgenetic Learning Analytics
CoLoSS: Cognitive Load Corpus with Speech and Performance Data from a Symbol-Digit Dual-Task
The Linguistic Category Model in Polish (LCM-PL)
Interpersonal Relationship Labels for the CALLHOME Corpus
WordKit: a Python Package for Orthographic and Phonological Featurization

Computer-Assisted Language Learning (CALL)
Contextualized Usage-Based Material Selection
Towards a Diagnosis of Textual Difficulties for Children with Dyslexia
SW4ALL: a CEFR Classified and Aligned Corpus for Language Learning
CEFR-based Lexical Simplification Dataset
Semi-Supervised Clustering for Short Answer Scoring
Building a TOCFL Learner Corpus for Chinese Grammatical Error Diagnosis
MIAPARLE: Online training for the discrimination of stress contrasts
A Leveled Reading Corpus of Modern Standard Arabic
L1-L2 Parallel Treebank of Learner Chinese: Overused and Underused Syntactic Structures
Generation of a Spanish Artificial Collocation Error Corpus
Building an English Vocabulary Knowledge Dataset of Japanese English-as-a-Second-Language Learners Using Crowdsourcing
Error annotation in a Learner Corpus of Portuguese
An SLA Corpus Annotated with Pedagogically Relevant Grammatical Structures
EFLLex: A Graded Lexical Resource for Learners of English as a Foreign Language
ESCRITO - An NLP-Enhanced Educational Scoring Toolkit
Revita: a Language-learning Platform at the Intersection of ITS and CALL
Development of a Mobile Observation Support System for Students: FishWatchr Mini

Controlled Languages
A Real-life, French-accented Corpus of Air Traffic Control Communications
Simplified Corpus with Core Vocabulary

Corpus (Creation, Annotation, Etc.)
Discourse Coherence Through the Lens of an Annotated Text Corpus: A Case Study
Building Parallel Monolingual Gan Chinese Dialects Corpus
A FrameNet for Cancer Information in Clinical Narratives: Schema and Annotation
MOCCA: Measure of Confidence for Corpus Analysis - Automatic Reliability Check of Transcript and Automatic Segmentation
A Recorded Debating Dataset
Linking, Searching, and Visualizing Entities in Wikipedia
Multilingual Parallel Corpus for Global Communication Plan
Open Subtitles Paraphrase Corpus for Six Languages
Incorporating Global Contexts into Sentence Embedding for Relational Extraction at the Paragraph Level with Distant Supervision
Classifying Sluice Occurrences in Dialogue
A Corpus for Modeling Word Importance in Spoken Dialogue Transcripts
Building a Corpus from Handwritten Picture Postcards: Transcription, Annotation and Part-of-Speech Tagging
Annotating High-Level Structures of Short Stories and Personal Anecdotes
Sentence Level Temporality Detection using an Implicit Time-sensed Resource
Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl
MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge
A New Corpus to Support Text Mining for the Curation of Metabolites in the ChEBI Database
A Corpus of Metaphor Novelty Scores for Syntactically-Related Word Pairs
Collection of Multimodal Dialog Data and Analysis of the Result of Annotation of Users' Interest Level
Multi-layer Annotation of the Rigveda
Universal Dependencies Version 2 for Japanese
ESCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing
OpenSubtitles2018: Statistical Rescoring of Sentence Alignments in Large, Noisy Parallel Corpora
Building an Ellipsis-aware Chinese Dependency Treebank for Web Text
MPST: A Corpus of Movie Plot Synopses with Tags
EuroGames16: Evaluating Change Detection in Online Conversation
The Natural Stories Corpus
The MADAR Arabic Dialect Corpus and Lexicon
Semi-automatic Korean FrameNet Annotation over KAIST Treebank
A Large Parallel Corpus of Full-Text Scientific Articles
Developing the Bangla RST Discourse Treebank
Computer-assisted Speaker Diarization: How to Evaluate Human Corrections
Towards an ISO Standard for the Annotation of Quantification
Distribution of Emotional Reactions to News Articles in Twitter
Lightweight Grammatical Annotation in the TEI: New Perspectives
Fine-grained Semantic Textual Similarity for Serbian
Constructing a Lexicon of Relational Nouns
TAP-DLND 1.0: A Corpus for Document Level Novelty Detection
Evaluating Phonemic Transcription of Low-Resource Tonal Languages for Language Documentation
NegPar: A parallel corpus annotated for negation
SPADE: Evaluation Dataset for Monolingual Phrase Alignment
Recognizing Behavioral Factors while Driving: A Real-World Multimodal Corpus to Monitor the Driver’s Affective State
Annotating Zero Anaphora for Question Answering
Diacritics Restoration Using Neural Networks
Comprehensive Annotation of Various Types of Temporal Information on the Time Axis
EmotionLines: An Emotion Corpus of Multi-Party Conversations
Comparison of Pun Detection Methods Using Japanese Pun Corpus
A Gold Standard for Multilingual Automatic Term Extraction from Comparable Corpora: Term Structure and Translation Equivalents
Toward An Epic Epigraph Graph
Framing Named Entity Linking Error Types
Chinese-Portuguese Machine Translation: A Study on Building Parallel Corpora from Comparable Texts
Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation
T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples
PoSTWITA-UD: an Italian Twitter Treebank in Universal Dependencies
ETPC - A Paraphrase Identification Corpus Annotated with Extended Paraphrase Typology and Negation
Dialogue Structure Annotation for Multi-Floor Interaction
Extracting an English-Persian Parallel Corpus from Comparable Corpora
A Corpus of eRulemaking User Comments for Measuring Evaluability of Arguments
A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
A Multi-layer Annotated Corpus of Argumentative Text: From Argument Schemes to Discourse Relations
A New Version of the Składnica Treebank of Polish Harmonised with the Walenty Valency Dictionary
A Lexical Tool for Academic Writing in Spanish based on Expert and Novice Corpora
Creating a Translation Matrix of the Bible’s Names Across 591 Languages
A Dataset for Inter-Sentence Relation Extraction using Distant Supervision
The French-Algerian Code-Switching Triggered audio corpus (FACST)
Chahta Anumpa: A multimodal corpus of the Choctaw Language
SumeCzech: Large Czech News-Based Summarization Dataset
The IIT Bombay English-Hindi Parallel Corpus
Parallel Corpora for the Biomedical Domain
Improving Machine Translation of Educational Content via Crowdsourcing
Abstract Meaning Representation of Constructions: The More We Include, the Better the Representation
Aggression-annotated Corpus of Hindi-English Code-mixed Data
A Diachronic Corpus for Literary Style Analysis
CBFC: a parallel L2 speech corpus for Korean and French learners
Systems’ Agreements and Disagreements in Temporal Processing: An Extensive Error Analysis of the TempEval-3 Task
Towards a Diagnosis of Textual Difficulties for Children with Dyslexia
Automatic Wordnet Mapping: from CoreNet to Princeton WordNet
Introducing NIEUW: Novel Incentives and Workflows for Eliciting Linguistic Data
BULBasaa: A Bilingual Basaa-French Speech Corpus for the Evaluation of Language Documentation Tools
Handling Rare Word Problem using Synthetic Training Data for Sinhala and Tamil Neural Machine Translation
Annotating Temporally-Anchored Spatial Knowledge by Leveraging Syntactic Dependencies
Researching Less-Resourced Languages – the DigiSami Corpus
Semantic Supersenses for English Possessives
A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction
Annotating Educational Questions for Student Response Analysis
Annotating If the Authors of a Tweet are Located at the Locations They Tweet About
SW4ALL: a CEFR Classified and Aligned Corpus for Language Learning
Simple Semantic Annotation and Situation Frames: Two Approaches to Basic Text Understanding in LORELEI
Parse Me if You Can: Artificial Treebanks for Parsing Experiments on Elliptical Constructions
Laying the Groundwork for Knowledge Base Population: Nine Years of Linguistic Resources for TAC KBP
An Attribution Relations Corpus for Political News
No more beating about the bush: A Step towards Idiom Handling for Indian Language NLP
Text Simplification from Professionally Produced Corpora
The New Propbank: Aligning Propbank with AMR through POS Unification
CONDUCT: An Expressive Conducting Gesture Dataset for Sound Control
Intertextual Correspondence for Integrating Corpora
Medical Entity Corpus with PICO elements and Sentiment Analysis
A vision-grounded dataset for predicting typical locations for verbs
Evaluation of Croatian Word Embeddings
BlogSet-BR: A Brazilian Portuguese Blog Corpus
Transforming Wikipedia into a Large-Scale Fine-Grained Entity Type Corpus
A Multilingual Approach to Question Classification
The Nautilus Speaker Characterization Corpus: Speech Recordings and Labels of Speaker Characteristics and Voice Descriptions
SMILE Swiss German Sign Language Dataset
JESC: Japanese-English Subtitle Corpus
Building a Corpus for Personality-dependent Natural Language Understanding and Generation
Dataset for the First Evaluation on Chinese Machine Reading Comprehension
A Multi-Domain Framework for Textual Similarity. A Case Study on Question-to-Question and Question-Answering Similarity Tasks
Referring Expression Generation in time-constrained communication
Design and Development of Speech Corpora for Air Traffic Control Training
BiLSTM-CRF for Persian Named-Entity Recognition ArmanPersoNERCorpus: the First Entity-Annotated Persian Dataset
An Application for Building a Polish Telephone Speech Corpus
Annotating Modality Expressions and Event Factuality for a Japanese Chess Commentary Corpus
Using Discourse Information for Education with a Spanish-Chinese Parallel Corpus
A 2nd Longitudinal Corpus for Children's Writing with Enhanced Output for Specific Spelling Patterns
CPJD Corpus: Crowdsourced Parallel Speech Corpus of Japanese Dialects
BKTreebank: Building a Vietnamese Dependency Treebank
Data Anonymization for Requirements Quality Analysis: a Reproducible Automatic Error Detection Task
WorldTree: A Corpus of Explanation Graphs for Elementary Science Questions supporting Multi-hop Inference
Annotating Attribution Relations in Arabic
BASHI: A Corpus of Wall Street Journal Articles Annotated with Bridging Links
A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events
A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products
Albanian Part-of-Speech Tagging: Gold Standard and Evaluation
A First South African Corpus of Multilingual Code-switched Soap Opera Speech
Collecting Code-Switched Data from Social Media
All-words Word Sense Disambiguation Using Concept Embeddings
English-Basque Statistical and Neural Machine Translation
A Real-life, French-accented Corpus of Air Traffic Control Communications
Correction of OCR Word Segmentation Errors in Articles from the ACL Collection through Neural Machine Translation Methods
Sentiment-Stance-Specificity (SSS) Dataset: Identifying Support-based Entailment among Opinions
A «Portrait» Approach to Multichannel Discourse
Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages
Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank
Creating dialect sub-corpora by clustering: a case in Japanese for an adaptive method
Resource Creation Towards Automated Sentiment Analysis in Telugu (a low resource language) and Integrating Multiple Domain Sources to Enhance Sentiment Prediction
Building a Macro Chinese Discourse Treebank
The Morpho-syntactic Annotation of Animacy for a Dependency Parser
Automatic Annotation of Semantic Term Types in the Complete ACL Anthology Reference Corpus
A Large Self-Annotated Corpus for Sarcasm
JAIST Annotated Corpus of Free Conversation
The Metalogue Debate Trainee Corpus: Data Collection and Annotations
MYCanCor: A Video Corpus of spoken Malaysian Cantonese
AET: Web-based Adjective Exploration Tool for German
HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments
Mapping Texts to Scripts: An Entailment Study
Linguistic and Sociolinguistic Annotation of 17th Century Dutch Letters
Unsupervised Korean Word Sense Disambiguation using CoreNet
Arabic Dialect Identification in the Context of Bivalency and Code-Switching
CEFR-based Lexical Simplification Dataset
QUD-Based Annotation of Discourse Structure and Information Structure: Tool and Evaluation
UFSAC: Unification of Sense Annotated Corpora and Tools
Classifying the Informative Behaviour of Emoji in Microblogs
MIsA: Multilingual "IsA" Extraction from Corpora
Creating Lithuanian and Latvian Speech Corpora from Inaccurately Annotated Web Data
KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue
Annotating Chinese Light Verb Constructions according to PARSEME guidelines
Korean L2 Vocabulary Prediction: Can a Large Annotated Corpus be Used to Train Better Models for Predicting Unknown Words?
A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task
Annotating Spin in Biomedical Scientific Publications: the case of Random Controlled Trials (RCTs)
Simplified Corpus with Core Vocabulary
A High-Quality Gold Standard for Citation-based Tasks
Corpora with Part-of-Speech Annotations for Three Regional Languages of France: Alsatian, Occitan and Picard
A Pragmatic Approach for Classical Chinese Word Segmentation
A Corpus of Natural Multimodal Spatial Scene Descriptions
ZAP: An Open-Source Multilingual Annotation Projection Framework
Distributional Term Set Expansion
On the Vector Representation of Utterances in Dialogue Context
A Taxonomy for In-depth Evaluation of Normalization for User Generated Content
A Swedish Cookie-Theft Corpus
Live Blog Corpus for Summarization
FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German
ES-Port: a Spontaneous Spoken Human-Human Technical Support Corpus for Dialogue Research in Spanish
SzegedKoref: A Hungarian Coreference Corpus
Crowdsourced Corpus of Sentence Simplification with Core Vocabulary
A Corpus to Learn Refer-to-as Relations for Nominals
The Effects of Unimodal Representation Choices on Multimodal Learning
Dialog Intent Structure: A Hierarchical Schema of Linked Dialog Acts
Analysis of Implicit Conditions in Database Search Dialogues
JDCFC: A Japanese Dialogue Corpus with Feature Changes
Gaining and Losing Influence in Online Conversation
Humor Detection in English-Hindi Code-Mixed Social Media Content: Corpus and Baseline System
Towards AMR-BR: A SemBank for Brazilian Portuguese Language
Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications
Towards Language Technology for Mi'kmaq
ASAP++: Enriching the ASAP Automated Essay Grading Dataset with Essay Attribute Scores
Building A Handwritten Cuneiform Character Imageset
Transfer of Frames from English FrameNet to Construct Chinese FrameNet: A Bilingual Corpus-Based Approach
Building Universal Dependency Treebanks in Korean
Building a Sentiment Corpus of Tweets in Brazilian Portuguese
A Bird’s-eye View of Language Processing Projects at the Romanian Academy
Construction of Large-scale English Verbal Multiword Expression Annotated Corpus
Unified Guidelines and Resources for Arabic Dialect Orthography
A Parallel Corpus of Arabic-Japanese News Articles
EMTC: Multilabel Corpus in Movie Domain for Emotion Analysis in Conversational Text
The ADELE Corpus of Dyadic Social Text Conversations: Dialog Act Annotation with ISO 24617-2
The Spot the Difference corpus: a multi-modal corpus of spontaneous task oriented spoken interactions
Towards Neural Speaker Modeling in Multi-Party Conversation: The Task, Dataset, and Models
The Automatic Annotation of the Semiotic Type of Hand Gestures in Obama's Humorous Speeches
Preparing Data from Psychotherapy for Natural Language Processing
The Reference Corpus of the Contemporary Romanian Language (CoRoLa)
BioRo: The Biomedical Corpus for the Romanian Language
Discriminating between Similar Languages on Imbalanced Conversational Texts
Examining the Tip of the Iceberg: A Data Set for Idiom Translation
KRAUTS: A German Temporally Annotated News Corpus
Moving TIGER beyond Sentence-Level
Elicitation protocol and material for a corpus of long prepared monologues in Sign Language
MirasVoice: A bilingual (English-Persian) speech corpus
Semantic Relatedness of Wikipedia Concepts -- Benchmark Data and a Working Solution
Complex and Precise Movie and Book Annotations in French Language for Aspect Based Sentiment Analysis
A Multilingual Test Collection for the Semantic Search of Entity Categories
From analysis to modeling of engagement as sequences of multimodal behaviors
Dialogue Scenario Collection of Persuasive Dialogue with Emotional Expressions via Crowdsourcing
Japanese Dialogue Corpus of Information Navigation and Attentive Listening Annotated with Extended ISO-24617-2 Dialogue Act Tags
A Japanese Corpus for Analyzing Customer Loyalty Information
Deep JSLC: A Multimodal Corpus Collection for Data-driven Generation of Japanese Sign Language Expressions
FooTweets: A Bilingual Parallel Corpus of World Cup Tweets
Edit me: A Corpus and a Framework for Understanding Natural Language Image Editing
The Niki and Julie Corpus: Collaborative Multimodal Dialogues between Humans, Robots, and Virtual Agents
Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM
Constructing a Chinese Medical Conversation Corpus Annotated with Conversational Structures and Actions
Predicting Nods by using Dialogue Acts in Dialogue
J-MeDic: A Japanese Disease Name Dictionary based on Real Clinical Usage
Carcinologic Speech Severity Index Project: A Database of Speech Disorder Productions to Assess Quality of Life Related to Speech After Cancer
The WAW Corpus: The First Corpus of Interpreted Speeches and their Translations for English and Arabic
A Multilingual Wikified Data Set of Educational Material
TSix: A Human-involved-creation Dataset for Tweet Summarization
An Assessment of Explicit Inter- and Intra-sentential Discourse Connectives in Turkish Discourse Bank
Arap-Tweet: A Large Multi-Dialect Twitter Corpus for Gender, Age and Language Variety Identification
Semantic Frame Parsing for Information Extraction: the CALOR corpus
Enriching a Lexicon of Discourse Connectives with Corpus-based Data
A Morphologically Annotated Corpus of Emirati Arabic
Building a List of Synonymous Words and Phrases of Japanese Compound Verbs
Building a TOCFL Learner Corpus for Chinese Grammatical Error Diagnosis
Experiments with Convolutional Neural Networks for Multi-Label Authorship Attribution
Towards an Automatic Assessment of Crowdsourced Data for NLU
SimPA: A Sentence-Level Simplification Corpus for the Public Administration Domain
Spanish HPSG Treebank based on the AnCora Corpus
The SSIX Corpora: Three Gold Standard Corpora for Sentiment Analysis in English, Spanish and German Financial Microblogs
Universal Dependencies for Amharic
Preliminary Analysis of Embodied Interactions between Science Communicators and Visitors Based on a Multimodal Corpus of Japanese Conversations in a Science Museum
The First 100 Days: A Corpus Of Political Agendas on Twitter
Using a Corpus of English and Chinese Political Speeches for Metaphor Analysis
Medical Sentiment Analysis using Social Media: Towards building a Patient Assisted System
Automatic Identification of Maghreb Dialects Using a Dictionary-Based Approach
Improving domain-specific SMT for low-resourced languages using data from different domains
Modeling Collaborative Multimodal Behavior in Group Dialogues: The MULTISIMO Corpus
The brWaC Corpus: A New Open Resource for Brazilian Portuguese
Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian
Discovering Parallel Language Resources for Training MT Engines
A fine-grained error analysis of NMT, SMT and RBMT output for English-to-Dutch
A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction
A Leveled Reading Corpus of Modern Standard Arabic
Annotation and Quantitative Analysis of Speaker Information in Novel Conversation Sentences in Japanese
Multimodal Lexical Translation
Improving Crowdsourcing-Based Annotation of Japanese Discourse Relations
The LIA Treebank of Spoken Norwegian Dialects
Polish Corpus of Annotated Descriptions of Images
Chats and Chunks: Annotation and Analysis of Multiparty Long Casual Conversations
A Semi-autonomous System for Creating a Human-Machine Interaction Corpus in Virtual Reality: Application to the ACORFORMed System for Training Doctors to Break Bad News
Czech Text Document Corpus v 2.0
Manually Annotated Corpus of Polish Texts Published between 1830 and 1918
Translation Crowdsourcing: Creating a Multilingual Corpus of Online Educational Content
M-CNER: A Corpus for Chinese Named Entity Recognition in Multi-Domains
Statistical Analysis of Missing Translation in Simultaneous Interpretation Using A Large-scale Bilingual Speech Corpus
Action Verb Corpus
An Initial Test Collection for Ranked Retrieval of SMS Conversations
Sharing Copies of Synthetic Clinical Corpora without Physical Distribution — A Case Study to Get Around IPRs and Privacy Constraints Featuring the German JSYNCC Corpus
Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger
FrNewsLink: a corpus linking TV Broadcast News Segments and Press Articles
An Italian Twitter Corpus of Hate Speech against Immigrants
Semi-supervised Training Data Generation for Multilingual Question Answering
FARMI: A FrAmework for Recording Multi-Modal Interactions
EMO&LY (EMOtion and AnomaLY): A new corpus for anomaly detection in an audiovisual stream with emotional context
Corpora of Typical Sentences
Annotating Opinions and Opinion Targets in Student Course Feedback
FastSense: An Efficient Word Sense Disambiguation Classifier
The German Reference Corpus DeReKo: New Developments – New Opportunities
Annotating Abstract Meaning Representations for Spanish
Risamálheild: A Very Large Icelandic Text Corpus
ASR for Documenting Acutely Under-Resourced Indigenous Languages
Construction of English-French Multimodal Affective Conversational Corpus from TV Dramas
SandhiKosh: A Benchmark Corpus for Evaluating Sanskrit Sandhi Tools
PhotoshopQuiA: A Corpus of Non-Factoid Questions and Answers for Why-Question Answering
Exploring Conversational Language Generation for Rich Content about Hotels
Automating Document Discovery in the Systematic Review Process: How to Use Chaff to Extract Wheat
Multi-lingual Argumentative Corpora in English, Turkish, Greek, Albanian, Croatian, Serbian, Macedonian, Bulgarian, Romanian and Arabic
Towards a Conversation-Analytic Taxonomy of Speech Overlap
BioRead: A New Dataset for Biomedical Reading Comprehension
Czech Legal Text Treebank 2.0
Development of an Annotated Multimodal Dataset for the Investigation of Classification and Summarisation of Presentations using High-Level Paralinguistic Features
Shami: A Corpus of Levantine Arabic Dialects
The ICoN Corpus of Academic Written Italian (L1 and L2)
Annotated Corpus of Scientific Conference's Homepages for Information Extraction
CLARIN’s Key Resource Families
Annotation and Analysis of Extractive Summaries for the Kyutech Corpus
NoReC: The Norwegian Review Corpus
Evaluation of Machine Translation Performance Across Multiple Genres and Languages
Towards a Linked Open Data Edition of Sumerian Corpora
HiNTS: A Tagset for Middle Low German
Identification of Personal Information Shared in Chat-Oriented Dialogue
SentiArabic: A Sentiment Analyzer for Standard Arabic
Towards the Inference of Semantic Relations in Complex Nominals: a Pilot Study
Cross-linguistically Small World Networks are Ubiquitous in Child-directed Speech
A Repository of Corpora for Summarization
ANCOR-AS: Enriching the ANCOR Corpus with Syntactic Annotations
L1-L2 Parallel Treebank of Learner Chinese: Overused and Underused Syntactic Structures
Rollenwechsel-English: a large-scale semantic role corpus
The MonPaGe_HA Database for the Documentation of Spoken French Throughout Adulthood
You Tweet What You Speak: A City-Level Dataset of Arabic Dialects
The Use of Text Alignment in Semi-Automatic Error Analysis: Use Case in the Development of the Corpus of the Latvian Language Learners
Towards a Standardized Dataset for Noun Compound Interpretation
ParCorFull: a Parallel Corpus Annotated with Full Coreference
A Vietnamese Dialog Act Corpus Based on ISO 24617-2 standard
A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora
Corpus Building and Evaluation of Aspect-based Opinion Summaries from Tweets in Spanish
Generation of a Spanish Artificial Collocation Error Corpus
Low-resource Post Processing of Noisy OCR Output for Historical Corpus Digitisation
Test Sets for Chinese Nonlocal Dependency Parsing
Structured Interpretation of Temporal Relations
Annotating Reflections for Health Behavior Change Therapy
Adding Syntactic Annotations to Flickr30k Entities Corpus for Multimodal Ambiguous Prepositional-Phrase Attachment Resolution
Error annotation in a Learner Corpus of Portuguese
Visualizing the "Dictionary of Regionalisms of France" (DRF)
CoLoSS: Cognitive Load Corpus with Speech and Performance Data from a Symbol-Digit Dual-Task
SB-CH: A Swiss German Corpus with Sentiment Annotations
DART: A Large Dataset of Dialectal Arabic Tweets
VAST: A Corpus of Video Annotation for Speech Technologies
Auto-hMDS: Automatic Construction of a Large Heterogeneous Multilingual Multi-Document Summarization Corpus
The GermaParl Corpus of Parliamentary Protocols
Analyzing the Quality of Counseling Conversations: the Tell-Tale Signs of High-quality Counseling
Identifying Speakers and Addressees in Dialogues Extracted from Literary Fiction
Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus
Increasing Argument Annotation Reproducibility by Using Inter-annotator Agreement to Improve Guidelines
An SLA Corpus Annotated with Pedagogically Relevant Grammatical Structures
Multilingual Word Segmentation: Training Many Language-Specific Tokenizers Smoothly Thanks to the Universal Dependencies Corpus
Interpersonal Relationship Labels for the CALLHOME Corpus
Text Mining for History: first steps on building a large dataset
Designing a Russian Idiom-Annotated Corpus
Manual vs Automatic Bitext Extraction
Cheating a Parser to Death: Data-driven Cross-Treebank Annotation Transfer
Universal Dependencies and Quantitative Typological Trends. A Case Study on Word Order
Building Evaluation Datasets for Cultural Microblog Retrieval
Transc&Anno: A Graphical Tool for the Transcription and On-the-Fly Annotation of Handwritten Documents
Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German
A corpus of German political speeches from the 21st century
Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing
Palmyra: A Platform Independent Dependency Annotation Tool for Morphologically Rich Languages
Metadata Collection Records for Language Resources
A Web-based System for Crowd-in-the-Loop Dependency Treebanking
A Corpus of Drug Usage Guidelines Annotated with Type of Advice
ChAnot: An Intelligent Annotation Tool for Indigenous and Highly Agglutinative Languages in Peru
Construction of the Corpus of Everyday Japanese Conversation: An Interim Report
Compilation of Corpora for the Study of the Information Structure–Prosody Interface
The Abkhaz National Corpus
CATS: A Tool for Customized Alignment of Text Simplification Corpora
Parallel Corpora in Mboshi (Bantu C25, Congo-Brazzaville)
The LREC Workshops Map
Errator: a Tool to Help Detect Annotation Errors in the Universal Dependencies Project
Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data
A database of German definitory contexts from selected web sources
PDFAnno: a Web-based Linguistic Annotation Tool for PDF Documents
TriMED: A Multilingual Terminological Database
Building a Constraint Grammar Parser for Plains Cree Verbs and Arguments
Up-cycling Data for Natural Language Generation
NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System
Crowdsourced Multimodal Corpora Collection Tool
Coreference Resolution in FreeLing 4.0
Development of a Mobile Observation Support System for Students: FishWatchr Mini
Enhancing the AI2 Diagrams Dataset Using Rhetorical Structure Theory
SACR: A Drag-and-Drop Based Tool for Coreference Annotation
Towards Continuous Dialogue Corpus Creation: writing to corpus and generating from it
Towards Processing of the Oral History Interviews and Related Printed Documents
Manzanilla: An Image Annotation Tool for TKB Building
Beyond Generic Summarization: A Multi-faceted Hierarchical Summarization Corpus of Large Heterogeneous Data
BabyCloud, a Technological Platform for Parents and Researchers
German Radio Interviews: The GRAIN Release of the SFB732 Silver Standard Collection
Building Literary Corpora for Computational Literary Analysis - A Prototype to Bridge the Gap between CL and DH
MirasText: An Automatically Generated Text Corpus for Persian
WASA: A Web Application for Sequence Annotation
A Lightweight Modeling Middleware for Corpus Processing
Creating Large-Scale Argumentation Structures for Dialogue Systems
ILCM - A Virtual Research Infrastructure for Large-Scale Qualitative Data
SlugNERDS: A Named Entity Recognition Tool for Open Domain Dialogue Systems
Web-based Annotation Tool for Inflectional Language Resources
Increasing the Accessibility of Time-Aligned Speech Corpora with Spokes Mix
Graph Based Semi-Supervised Learning Approach for Tamil POS tagging
Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU
Application and Analysis of a Multi-layered Scheme for Irony on the Italian Twitter Corpus TWITTIRÒ
Sentence and Clause Level Emotion Annotation, Detection, and Classification in a Multi-Genre Corpus
Build Fast and Accurate Lemmatization for Arabic
Reference production in human-computer interaction: Issues for Corpus-based Referring Expression Generation
Persian Discourse Treebank and coreference corpus

Crowdsourcing
JFCKB: Japanese Feature Change Knowledge Base
Quantifying Qualitative Data for Understanding Controversial Issues
Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French
Improving Machine Translation of Educational Content via Crowdsourcing
Strategies and Challenges for Crowdsourcing Regional Dialect Perception Data for Swiss German and Swiss French
Korean L2 Vocabulary Prediction: Can a Large Annotated Corpus be Used to Train Better Models for Predicting Unknown Words?
Crowdsourcing-based Annotation of the Accounting Registers of the Italian Comedy
Crowdsourced Corpus of Sentence Simplification with Core Vocabulary
Using Crowd Agreement for Wordnet Localization
Improving Crowdsourcing-Based Annotation of Japanese Discourse Relations
Translation Crowdsourcing: Creating a Multilingual Corpus of Online Educational Content
Building an English Vocabulary Knowledge Dataset of Japanese English-as-a-Second-Language Learners Using Crowdsourcing
A Web-based System for Crowd-in-the-Loop Dependency Treebanking
Crowdsourced Multimodal Corpora Collection Tool
Beyond Generic Summarization: A Multi-faceted Hierarchical Summarization Corpus of Large Heterogeneous Data

 

D
Dialogue
Dialogue Structure Annotation for Multi-Floor Interaction
Effects of Gender Stereotypes on Trust and Likability in Spoken Human-Robot Interaction
A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction
What Causes the Differences in Communication Styles? A Multicultural Study on Directness and Elaborateness
Expert Evaluation of a Spoken Dialogue System in a Clinical Operating Room
JAIST Annotated Corpus of Free Conversation
KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue
ES-Port: a Spontaneous Spoken Human-Human Technical Support Corpus for Dialogue Research in Spanish
Dialog Intent Structure: A Hierarchical Schema of Linked Dialog Acts
JDCFC: A Japanese Dialogue Corpus with Feature Changes
The Spot the Difference corpus: a multi-modal corpus of spontaneous task oriented spoken interactions
An Information-Providing Closed-Domain Human-Agent Interaction Corpus
Dialogue Scenario Collection of Persuasive Dialogue with Emotional Expressions via Crowdsourcing
Japanese Dialogue Corpus of Information Navigation and Attentive Listening Annotated with Extended ISO-24617-2 Dialogue Act Tags
The Niki and Julie Corpus: Collaborative Multimodal Dialogues between Humans, Robots, and Virtual Agents
Constructing a Chinese Medical Conversation Corpus Annotated with Conversational Structures and Actions
Predicting Nods by using Dialogue Acts in Dialogue
Modeling Collaborative Multimodal Behavior in Group Dialogues: The MULTISIMO Corpus
Chats and Chunks: Annotation and Analysis of Multiparty Long Casual Conversations
Identification of Personal Information Shared in Chat-Oriented Dialogue
Annotating Reflections for Health Behavior Change Therapy
Automatic Labeling of Problem-Solving Dialogues for Computational Microgenetic Learning Analytics
Creating Large-Scale Argumentation Structures for Dialogue Systems

Digital Libraries
Toward An Epic Epigraph Graph
Analyzing Citation-Distance Networks for Evaluating Publication Impact
A High-Quality Gold Standard for Citation-based Tasks
Crowdsourcing-based Annotation of the Accounting Registers of the Italian Comedy
Measuring Innovation in Speech and Language Processing Publications
PDFdigest: an Adaptable Layout-Aware PDF-to-XML Textual Content Extractor for Scientific Articles

Discourse Annotation, Representation And Processing
Discourse Coherence Through the Lens of an Annotated Text Corpus: A Case Study
Classifying Sluice Occurrences in Dialogue
Automatic Prediction of Discourse Connectives
Developing the Bangla RST Discourse Treebank
An Integrated Representation of Linguistic and Social Functions of Code-Switching
Adapting Serious Game for Fallacious Argumentation to German: Pitfalls, Insights, and Best Practices
Evaluating Scoped Meaning Representations
A Corpus of eRulemaking User Comments for Measuring Evaluability of Arguments
A Multi-layer Annotated Corpus of Argumentative Text: From Argument Schemes to Discourse Relations
Intertextual Correspondence for Integrating Corpora
Annotating Attribution Relations in Arabic
BASHI: A Corpus of Wall Street Journal Articles Annotated with Bridging Links
A «Portrait» Approach to Multichannel Discourse
Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank
Building a Macro Chinese Discourse Treebank
The Metalogue Debate Trainee Corpus: Data Collection and Annotations
QUD-Based Annotation of Discourse Structure and Information Structure: Tool and Evaluation
On the Vector Representation of Utterances in Dialogue Context
Dialog Intent Structure: A Hierarchical Schema of Linked Dialog Acts
Gaining and Losing Influence in Online Conversation
The ADELE Corpus of Dyadic Social Text Conversations: Dialog Act Annotation with ISO 24617-2
The Spot the Difference corpus: a multi-modal corpus of spontaneous task oriented spoken interactions
Attention for Implicit Discourse Relation Recognition
Constructing a Chinese Medical Conversation Corpus Annotated with Conversational Structures and Actions
An Assessment of Explicit Inter- and Intra-sentential Discourse Connectives in Turkish Discourse Bank
A Context-based Approach for Dialogue Act Recognition using Simple Recurrent Neural Networks
Preliminary Analysis of Embodied Interactions between Science Communicators and Visitors Based on a Multimodal Corpus of Japanese Conversations in a Science Museum
Annotation and Quantitative Analysis of Speaker Information in Novel Conversation Sentences in Japanese
Improving Crowdsourcing-Based Annotation of Japanese Discourse Relations
Chats and Chunks: Annotation and Analysis of Multiparty Long Casual Conversations
Annotating Abstract Meaning Representations for Spanish
PhotoshopQuiA: A Corpus of Non-Factoid Questions and Answers for Why-Question Answering
Towards a Conversation-Analytic Taxonomy of Speech Overlap
A Lexicon of Discourse Markers for Portuguese – LDM-PT
Annotation and Analysis of Extractive Summaries for the Kyutech Corpus
Structured Interpretation of Temporal Relations
Automatic Labeling of Problem-Solving Dialogues for Computational Microgenetic Learning Analytics
Low Resource Methods for Medieval Document Sections Analysis
Identifying Speakers and Addressees in Dialogues Extracted from Literary Fiction
Increasing Argument Annotation Reproducibility by Using Inter-annotator Agreement to Improve Guidelines
Enhancing the AI2 Diagrams Dataset Using Rhetorical Structure Theory
SACR: A Drag-and-Drop Based Tool for Coreference Annotation
TreeAnnotator: Versatile Visual Annotation of Hierarchical Text Relations
Creating Large-Scale Argumentation Structures for Dialogue Systems
PyrEval: An Automated Method for Summary Content Analysis
Persian Discourse Treebank and coreference corpus

Document Classification, Text Categorisation
Content-Based Conflict of Interest Detection on Wikipedia
TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection
Discovering the Language of Wine Reviews: A Text Mining Account
A Corpus for Multilingual Document Classification in Eight Languages
Analyzing Citation-Distance Networks for Evaluating Publication Impact
Annotating Educational Questions for Student Response Analysis
Disambiguation of Verbal Shifters
Multilingual Multi-class Sentiment Classification Using Convolutional Neural Networks
A Large Self-Annotated Corpus for Sarcasm
JAIST Annotated Corpus of Free Conversation
HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments
MultiBooked: A Corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification
Arabic Dialect Identification in the Context of Bivalency and Code-Switching
Improving Hate Speech Detection with Deep Learning Ensembles
Distributional Term Set Expansion
Can Domain Adaptation be Handled as Analogies?
The Effects of Unimodal Representation Choices on Multimodal Learning
Author Profiling from Facebook Corpora
Preparing Data from Psychotherapy for Natural Language Processing
Semi-Supervised Clustering for Short Answer Scoring
Moving TIGER beyond Sentence-Level
Semantic Relatedness of Wikipedia Concepts -- Benchmark Data and a Working Solution
Attention for Implicit Discourse Relation Recognition
Experiments with Convolutional Neural Networks for Multi-Label Authorship Attribution
Medical Sentiment Analysis using Social Media: Towards building a Patient Assisted System
Page Stream Segmentation with Convolutional Neural Nets Combining Textual and Visual Features
Czech Text Document Corpus v 2.0
An Annotation Language for Semantic Search of Legal Sources
'Aye' or 'No'? Speech-level Sentiment Analysis of Hansard UK Parliamentary Debate Transcripts
Automating Document Discovery in the Systematic Review Process: How to Use Chaff to Extract Wheat
Two Multilingual Corpora Extracted from the Tenders Electronic Daily for Machine Learning and Machine Translation Applications.
NoReC: The Norwegian Review Corpus
Using Adversarial Examples in Natural Language Processing
Improving Unsupervised Keyphrase Extraction using Background Knowledge
Modeling Trolling in Social Media Conversations
A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora
Annotating Reflections for Health Behavior Change Therapy
SB-CH: A Swiss German Corpus with Sentiment Annotations
Analyzing the Quality of Counseling Conversations: the Tell-Tale Signs of High-quality Counseling
Interpersonal Relationship Labels for the CALLHOME Corpus
Designing a Russian Idiom-Annotated Corpus
Classification of Closely Related Sub-dialects of Arabic Using Support-Vector Machines
DeepTC – An Extension of DKPro Text Classification for Fostering Reproducibility of Deep Learning Experiments
Arabic Data Science Toolkit: An API for Arabic Language Feature Extraction

 

E
Emotion Recognition/Generation
JFCKB: Japanese Feature Change Knowledge Base
Content-Based Conflict of Interest Detection on Wikipedia
Word Affect Intensities
Representation Mapping: A Novel Approach to Generate High-Quality Multi-Lingual Emotion Lexicons
Distribution of Emotional Reactions to News Articles in Twitter
Recognizing Behavioral Factors while Driving: A Real-World Multimodal Corpus to Monitor the Driver’s Affective State
EmotionLines: An Emotion Corpus of Multi-Party Conversations
Unfolding the External Behavior and Inner Affective State of Teammates through Ensemble Learning: Experimental Evidence from a Dyadic Team Corpus
Aggression-annotated Corpus of Hindi-English Code-mixed Data
Understanding Emotions: A Dataset of Tweets to Study Interactions between Affect Categories
Definite Description Lexical Choice: taking Speaker's Personality into account
A Comparison Of Emotion Annotation Schemes And A New Annotated Data Set
JDCFC: A Japanese Dialogue Corpus with Feature Changes
EMTC: Multilabel Corpus in Movie Domain for Emotion Analysis in Conversational Text
Dialogue Scenario Collection of Persuasive Dialogue with Emotional Expressions via Crowdsourcing
Sarcasm Target Identification: Dataset and An Introductory Approach
A Semi-autonomous System for Creating a Human-Machine Interaction Corpus in Virtual Reality: Application to the ACORFORMed System for Training Doctors to Break Bad News
SentiArabic: A Sentiment Analyzer for Standard Arabic
Contextual Dependencies in Time-Continuous Multidimensional Affect Recognition
WikiArt Emotions: An Annotated Dataset of Emotions Evoked by Art
SB-CH: A Swiss German Corpus with Sentiment Annotations
Arabic Data Science Toolkit: An API for Arabic Language Feature Extraction

Endangered Languages
Designing a Collaborative Process to Create Bilingual Dictionaries of Indonesian Ethnic Languages
A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
Chahta Anumpa: A multimodal corpus of the Choctaw Language
MYCanCor: A Video Corpus of spoken Malaysian Cantonese
Corpora with Part-of-Speech Annotations for Three Regional Languages of France: Alsatian, Occitan and Picard
Evaluation of Dictionary Creating Methods for Finno-Ugric Minority Languages
Towards Language Technology for Mi'kmaq
Pronunciation Dictionaries for the Alsatian Dialects to Analyze Spelling and Phonetic Variation
ASR for Documenting Acutely Under-Resourced Indigenous Languages
Modeling Northern Haida Verb Morphology
Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing
The Abkhaz National Corpus
A Speaking Atlas of the Regional Languages of France

Evaluation Methodologies
Evaluating the WordsEye Text-to-Scene System: Imaginative and Realistic Sentences
When ACE met KBP: End-to-End Evaluation of Knowledge Base Population with Component-level Annotation
A Corpus for Modeling Word Importance in Spoken Dialogue Transcripts
Evaluation of Domain-specific Word Embeddings using Knowledge Resources
Evaluating Machine Translation Performance on Chinese Idioms with a Blacklist Method
Computer-assisted Speaker Diarization: How to Evaluate Human Corrections
SPADE: Evaluation Dataset for Monolingual Phrase Alignment
Evaluating Scoped Meaning Representations
A Gold Standard for Multilingual Automatic Term Extraction from Comparable Corpora: Term Structure and Translation Equivalents
Framing Named Entity Linking Error Types
A Corpus for Multilingual Document Classification in Eight Languages
Upping the Ante: Towards a Better Benchmark for Chinese-to-English Machine Translation
SentEval: An Evaluation Toolkit for Universal Sentence Representations
Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment
The French-Algerian Code-Switching Triggered audio corpus (FACST)
Effects of Gender Stereotypes on Trust and Likability in Spoken Human-Robot Interaction
Delta vs. N-Gram Tracing: Evaluating the Robustness of Authorship Attribution Methods
Unfolding the External Behavior and Inner Affective State of Teammates through Ensemble Learning: Experimental Evidence from a Dyadic Team Corpus
Three Dimensions of Reproducibility in Natural Language Processing
Parse Me if You Can: Artificial Treebanks for Parsing Experiments on Elliptical Constructions
Evaluation of Croatian Word Embeddings
Dataset for the First Evaluation on Chinese Machine Reading Comprehension
Using Discourse Information for Education with a Spanish-Chinese Parallel Corpus
A Comparison Of Emotion Annotation Schemes And A New Annotated Data Set
Extending the gold standard for a lexical substitution task: is it worth it?
Construction of a Japanese Word Similarity Dataset
Lexical and Semantic Features for Cross-lingual Text Reuse Classification: an Experiment in English and Latin Paraphrases
Word Embedding Evaluation Datasets and Wikipedia Title Embedding for Chinese
Candidate Ranking for Maintenance of an Online Dictionary
Investigating the Influence of Bilingual MWU on Trainee Translation Quality
Cross-lingual Terminology Extraction for Translation Quality Estimation
A Taxonomy for In-depth Evaluation of Normalization for User Generated Content
Evaluation of Dictionary Creating Methods for Finno-Ugric Minority Languages
An Information-Providing Closed-Domain Human-Agent Interaction Corpus
An Evaluation Framework for Multimodal Interaction
Dysarthric speech evaluation: automatic and perceptual approaches
A Context-based Approach for Dialogue Act Recognition using Simple Recurrent Neural Networks
Towards an Automatic Assessment of Crowdsourced Data for NLU
Visual Choice of Plausible Alternatives: An Evaluation of Image-based Commonsense Causal Reasoning
Revisiting the Task of Scoring Open IE Relations
A fine-grained error analysis of NMT, SMT and RBMT output for English-to-Dutch
Is it worth it? Budget-related evaluation metrics for model selection
Czech Text Document Corpus v 2.0
Automated Evaluation of Out-of-Context Errors
An Initial Test Collection for Ranked Retrieval of SMS Conversations
SemR-11: A Multi-Lingual Gold-Standard for Semantic Similarity and Relatedness for Eleven Languages
A Survey on Automatically-Constructed WordNets and their Evaluation: Lexical and Word Embedding-based Approaches
Cheating a Parser to Death: Data-driven Cross-Treebank Annotation Transfer
Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data
Generating a Gold Standard for a Swedish Sentiment Lexicon
MGAD: Multilingual Generation of Analogy Datasets
TQ-AutoTest – An Automated Test Suite for (Machine) Translation Quality
What's Wrong, Python? -- A Visual Differ and Graph Library for NLP in Python
PyrEval: An Automated Method for Summary Content Analysis

 

G
Grammar And Syntax
Multi-layer Annotation of the Rigveda
The Natural Stories Corpus
Building Universal Dependency Treebanks in Korean
Spanish HPSG Treebank based on the AnCora Corpus
ForFun 1.0: Prague Database of Forms and Functions -- An Invaluable Resource for Linguistic Research
Corpora of Typical Sentences
The AnnCor CHILDES Treebank
A Parser for LTAG and Frame Semantics
Building a Constraint Grammar Parser for Plains Cree Verbs and Arguments
Analyzing Middle High German Syntax with RDF and SPARQL

 

H
Handwritten, Typewritten Document Recognition
Crowdsourcing-based Annotation of the Accounting Registers of the Italian Comedy
Low-resource Post Processing of Noisy OCR Output for Historical Corpus Digitisation
Towards Processing of the Oral History Interviews and Related Printed Documents

 

I
Industrial Systems
Voice Builder: A Tool for Building Text-To-Speech Voices
Chemical Compounds Knowledge Visualization with Natural Language Processing and Linked Data
Academic-Industrial Perspective on the Development and Deployment of a Moderation System for a Newspaper Website
Sudachi: a Japanese Tokenizer for Business
Text Normalization Infrastructure that Scales to Hundreds of Language Varieties
Tilde MT Platform for Developing Client Specific MT Solutions
Improving homograph disambiguation with supervised machine learning

Information Extraction, Information Retrieval
A FrameNet for Cancer Information in Clinical Narratives: Schema and Annotation
Linking, Searching, and Visualizing Entities in Wikipedia
Learning to Map Natural Language Statements into Knowledge Base Representations for Knowledge Base Construction
Incorporating Global Contexts into Sentence Embedding for Relational Extraction at the Paragraph Level with Distant Supervision
When ACE met KBP: End-to-End Evaluation of Knowledge Base Population with Component-level Annotation
Building a Knowledge Graph from Natural Language Definitions for Interpretable Text Entailment Recognition
MPST: A Corpus of Movie Plot Synopses with Tags
Revisiting Distant Supervision for Relation Extraction
Combining rule-based and embedding-based approaches to normalize textual entities with an ontology
Constructing a Lexicon of Relational Nouns
Overcoming the Long Tail Problem: A Case Study on CO2-Footprint Estimation of Recipes using Information Retrieval
Comprehensive Annotation of Various Types of Temporal Information on the Time Axis
T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples
Extracting an English-Persian Parallel Corpus from Comparable Corpora
A Dataset for Inter-Sentence Relation Extraction using Distant Supervision
Annotating Temporally-Anchored Spatial Knowledge by Leveraging Syntactic Dependencies
Building Named Entity Recognition Taggers via Parallel Corpora
Grounding Gradable Adjectives through Crowdsourcing
Annotating If the Authors of a Tweet are Located at the Locations They Tweet About
Laying the Groundwork for Knowledge Base Population: Nine Years of Linguistic Resources for TAC KBP
An Attribution Relations Corpus for Political News
CONDUCT: An Expressive Conducting Gesture Dataset for Sound Control
BlogSet-BR: A Brazilian Portuguese Blog Corpus
WorldTree: A Corpus of Explanation Graphs for Elementary Science Questions supporting Multi-hop Inference
A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events
A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products
Bootstrapping Polar-Opposite Emotion Dimensions from Online Reviews
Chinese Relation Classification using Long Short Term Memory Networks
The UIR Uncertainty Corpus for Chinese: Annotating Chinese Microblog Corpus for Uncertainty Identification from Social Media
EventWiki: A Knowledge Base of Major Events
MIsA: Multilingual "IsA" Extraction from Corpora
Enriching Frame Representations with Distributionally Induced Senses
Annotating Spin in Biomedical Scientific Publications : the case of Random Controlled Trials (RCTs)
A High-Quality Gold Standard for Citation-based Tasks
Visualization of the occurrence trend of infectious diseases using Twitter
Portuguese Named Entity Recognition using Conditional Random Fields and Local Grammars
Analysis of Implicit Conditions in Database Search Dialogues
Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications
Using English Baits to Catch Serbian Multi-Word Terminology
KRAUTS: A German Temporally Annotated News Corpus
Deep JSLC: A Multimodal Corpus Collection for Data-driven Generation of Japanese Sign Language Expressions
Augmenting Image Question Answering Dataset by Exploiting Image Captions
TSix: A Human-involved-creation Dataset for Tweet Summarization
Measuring Innovation in Speech and Language Processing Publications
Revisiting the Task of Scoring Open IE Relations
A supervised approach to taxonomy extraction using word embeddings
Korean TimeBank Including Relative Temporal Information
M-CNER: A Corpus for Chinese Named Entity Recognition in Multi-Domains
Automated Evaluation of Out-of-Context Errors
An Initial Test Collection for Ranked Retrieval of SMS Conversations
FrNewsLink : a corpus linking TV Broadcast News Segments and Press Articles
An Annotation Language for Semantic Search of Legal Sources
FastSense: An Efficient Word Sense Disambiguation Classifier
Sanaphor++: Combining Deep Neural Networks with Semantics for Coreference Resolution
Czech Legal Text Treebank 2.0
Annotated Corpus of Scientific Conference's Homepages for Information Extraction
Transfer Learning for Named-Entity Recognition with Neural Networks
WikiDragon: A Java Framework For Diachronic Content And Network Analysis Of MediaWikis
Structured Interpretation of Temporal Relations
Studying Muslim Stereotyping through Microportrait Extraction
Low Resource Methods for Medieval Document Sections Analysis
Auto-hMDS: Automatic Construction of a Large Heterogeneous Multilingual Multi-Document Summarization Corpus
Automatic Identification of Research Fields in Scientific Papers
Tel(s)-Telle(s)-Signs: Highly Accurate Automatic Crosslingual Hypernym Discovery
Text Mining for History: first steps on building a large dataset
Building Evaluation Datasets for Cultural Microblog Retrieval
Teanga: A Linked Data based platform for Natural Language Processing
A Corpus of Drug Usage Guidelines Annotated with Type of Advice
Biomedical term normalization of EHRs with UMLS
PDFAnno: a Web-based Linguistic Annotation Tool for PDF Documents
PyRATA, Python Rule-based feAture sTructure Analysis
Developing New Linguistic Resources and Tools for the Galician Language
Text Annotation Graphs: Annotating Complex Natural Language Phenomena
Build Fast and Accurate Lemmatization for Arabic
Retrieving Information from the French Lexical Network in RDF/OWL Format
Improving Hypernymy Extraction with Distributional Semantic Classes

 

K
Knowledge Discovery/Representation
Simple Large-scale Relation Extraction from Unstructured Text
Network Features Based Co-hyponymy Detection
Linking, Searching, and Visualizing Entities in Wikipedia
Learning to Map Natural Language Statements into Knowledge Base Representations for Knowledge Base Construction
Building a Knowledge Graph from Natural Language Definitions for Interpretable Text Entailment Recognition
MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge
A Neural Network Based Model for Loanword Identification in Uyghur
Joint Learning of Sense and Word Embeddings
Combining rule-based and embedding-based approaches to normalize textual entities with an ontology
Overcoming the Long Tail Problem: A Case Study on CO2-Footprint Estimation of Recipes using Information Retrieval
Comprehensive Annotation of Various Types of Temporal Information on the Time Axis
T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples
Abstract Meaning Representation of Constructions: The More We Include, the Better the Representation
Comparing Pretrained Multilingual Word Embeddings on an Ontology Alignment Task
Cross-Document, Cross-Language Event Coreference Annotation Using Event Hoppers
Integrating Generative Lexicon Event Structures into VerbNet
A vision-grounded dataset for predicting typical locations for verbs
A Large Resource of Patterns for Verbal Paraphrases
Cross-Lingual Generation and Evaluation of a Wide-Coverage Lexical Semantic Resource
Constructing High Quality Sense-specific Corpus and Word Embedding via Unsupervised Elimination of Pseudo Multi-sense
ScholarGraph: a Chinese Knowledge Graph of Chinese Scholars
Automatic Annotation of Semantic Term Types in the Complete ACL Anthology Reference Corpus
Mapping Texts to Scripts: An Entailment Study
MIsA: Multilingual "IsA" Extraction from Corpora
Enriching Frame Representations with Distributionally Induced Senses
Cross-checking WordNet and SUMO Using Meronymy
Towards AMR-BR: A SemBank for Brazilian Portuguese Language
From analysis to modeling of engagement as sequences of multimodal behaviors
A supervised approach to taxonomy extraction using word embeddings
The Circumstantial Event Ontology (CEO) and ECB+/CEO: an Ontology and Corpus for Implicit Causal Relations between Events
A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction
Korean TimeBank Including Relative Temporal Information
An Annotation Language for Semantic Search of Legal Sources
Scalable Visualisation of Sentiment and Stance
Development of an Annotated Multimodal Dataset for the Investigation of Classification and Summarisation of Presentations using High-Level Paralinguistic Features
Towards faithfully visualizing global linguistic diversity
Browsing the Terminological Structure of a Specialized Domain: A Method Based on Lexical Functions and their Classification
A Large Multilingual and Multi-domain Dataset for Recommender Systems
Cross-linguistically Small World Networks are Ubiquitous in Child-directed Speech
Tel(s)-Telle(s)-Signs: Highly Accurate Automatic Crosslingual Hypernym Discovery
World Knowledge for Abstract Meaning Representation Parsing
The LODeXporter: Flexible Generation of Linked Open Data Triples from NLP Frameworks for Automatic Knowledge Base Construction
One event, many representations. Mapping action concepts through visual features.
Revita: a Language-learning Platform at the Intersection of ITS and CALL
Enhancing the AI2 Diagrams Dataset Using Rhetorical Structure Theory
Text Annotation Graphs: Annotating Complex Natural Language Phenomena
Manzanilla: An Image Annotation Tool for TKB Building
Retrofitting Word Representations for Unsupervised Sense Aware Word Similarities
Improving Hypernymy Extraction with Distributional Semantic Classes

 

L
Language Identification
Building Parallel Monolingual Gan Chinese Dialects Corpus
Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French
Collecting Code-Switched Data from Social Media
Creating dialect sub-corpora by clustering: a case in Japanese for an adaptive method
Discriminating between Similar Languages on Imbalanced Conversational Texts
Automatic Identification of Maghreb Dialects Using a Dictionary-Based Approach
Shami: A Corpus of Levantine Arabic Dialects
The ICoN Corpus of Academic Written Italian (L1 and L2)
You Tweet What You Speak: A City-Level Dataset of Arabic Dialects
DART: A Large Dataset of Dialectal Arabic Tweets
VAST: A Corpus of Video Annotation for Speech Technologies
Classification of Closely Related Sub-dialects of Arabic Using Support-Vector Machines

Language Modelling
A Gold Anaphora Annotation Layer on an Eye Movement Corpus
Lightweight Grammatical Annotation in the TEI: New Perspectives
Comparison of Pun Detection Methods Using Japanese Pun Corpus
Learning Word Vectors for 157 Languages
Lexical Profiling of Environmental Corpora
A Computational Architecture for the Morphology of Upper Tanana
Effects of Gender Stereotypes on Trust and Likability in Spoken Human-Robot Interaction
Grounding Gradable Adjectives through Crowdsourcing
TF-LM: TensorFlow-based Language Modeling Toolkit
Data Anonymization for Requirements Quality Analysis: a Reproducible Automatic Error Detection Task
Grapheme-level Awareness in Word Embeddings for Morphologically Rich Languages
Portable Spelling Corrector for a Less-Resourced Language: Amharic
Creating dialect sub-corpora by clustering: a case in Japanese for an adaptive method
Dynamic Oracle for Neural Machine Translation in Decoding Phase
An Integrated Formal Representation for Terminological and Lexical Data included in Classification Schemes
Distributional Term Set Expansion
A Corpus to Learn Refer-to-as Relations for Nominals
Towards Language Technology for Mi'kmaq
Towards Neural Speaker Modeling in Multi-Party Conversation: The Task, Dataset, and Models
Finely Tuned, 2 Billion Token Based Word Embeddings for Portuguese
Analyzing Vocabulary Commonality Index Using Large-scaled Database of Child Language Development
Neural Models of Selectional Preferences for Implicit Semantic Role Labeling
SimLex-999 for Polish
Automated Evaluation of Out-of-Context Errors
Action Verb Corpus
Modeling French Sign Language: a proposal for a semantically compositional system
Evaluating Inflectional Complexity Crosslinguistically: a Processing Perspective
Modeling Northern Haida Verb Morphology
Test Sets for Chinese Nonlocal Dependency Parsing
Compilation of Corpora for the Study of the Information Structure–Prosody Interface
Building a Constraint Grammar Parser for Plains Cree Verbs and Arguments
MirasText: An Automatically Generated Text Corpus for Persian
Retrieving Information from the French Lexical Network in RDF/OWL Format
Less-Resourced Languages FonBund: A Library for Combining Cross-lingual Phonological Segment Data
Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech
Lexicon, Lexical Database Creating a Verb Synonym Lexicon Based on a Parallel Corpus
Network Features Based Co-hyponymy Detection
Introducing a Lexicon of Verbal Polarity Shifters for English
Semi-Automatic Construction of Word-Formation Networks (for Polish and Spanish)
JFCKB: Japanese Feature Change Knowledge Base
The MADAR Arabic Dialect Corpus and Lexicon
Semi-automatic Korean FrameNet Annotation over KAIST Treebank
Designing a Collaborative Process to Create Bilingual Dictionaries of Indonesian Ethnic Languages
Representation Mapping: A Novel Approach to Generate High-Quality Multi-Lingual Emotion Lexicons
Automatic Thesaurus Construction for Modern Hebrew
Constructing a Lexicon of Relational Nouns
Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French
A Large Automatically-Acquired All-Words List of Multiword Expressions Scored for Compositionality
A multilingual collection of CoNLL-U-compatible morphological lexicons
Lexical Profiling of Environmental Corpora
A New Version of the Składnica Treebank of Polish Harmonised with the Walenty Valency Dictionary
UniMorph 2.0: Universal Morphology
Automatic Wordnet Mapping: from CoreNet to Princeton WordNet
Creating Large-Scale Multilingual Cognate Tables
A Hybrid Approach for Automatic Extraction of Bilingual Multiword Expressions from Parallel Corpora
Integrating Generative Lexicon Event Structures into VerbNet
FontLex: A Typographical Lexicon based on Affective Associations
The New Propbank: Aligning Propbank with AMR through POS Unification
LIdioms: A Multilingual Linked Idioms Data Set
Finite-state morphological analysis for Gagauz
GeCoTagger: Annotation of German Verb Complements with Conditional Random Fields
Extending the gold standard for a lexical substitution task: is it worth it?
IPSL: A Database of Iconicity Patterns in Sign Languages. Creation and Use
Undersampling Improves Hypernymy Prototypicality Learning
Morphology Injection for English-Malayalam Statistical Machine Translation
Word Embedding Evaluation Datasets and Wikipedia Title Embedding for Chinese
An Automatic Learning of an Algerian Dialect Lexicon by using Multilingual Word Embeddings
Unsupervised Korean Word Sense Disambiguation using CoreNet
Language adaptation experiments via cross-lingual embeddings for related languages
Very Large-Scale Lexical Resources to Enhance Chinese and Japanese Machine Translation
Annotating Chinese Light Verb Constructions according to PARSEME guidelines
Simplified Corpus with Core Vocabulary
A Pragmatic Approach for Classical Chinese Word Segmentation
Cross-checking WordNet and SUMO Using Meronymy
Evaluation of Dictionary Creating Methods for Finno-Ugric Minority Languages
Crowdsourced Corpus of Sentence Simplification with Core Vocabulary
Tools for The Production of Analogical Grids and a Resource of N-gram Analogical Grids in 11 Languages
Combining Concepts and Their Translations from Structured Dictionaries of Uralic Minority Languages
Transfer of Frames from English FrameNet to Construct Chinese FrameNet: A Bilingual Corpus-Based Approach
Building Universal Dependency Treebanks in Korean
Unified Guidelines and Resources for Arabic Dialect Orthography
Pronunciation Dictionaries for the Alsatian Dialects to Analyze Spelling and Phonetic Variation
Konbitzul: an MWE-specific database for Spanish-Basque
Complex and Precise Movie and Book Annotations in French Language for Aspect Based Sentiment Analysis
J-MeDic: A Japanese Disease Name Dictionary based on Real Clinical Usage
Arap-Tweet: A Large Multi-Dialect Twitter Corpus for Gender, Age and Language Variety Identification
Enriching a Lexicon of Discourse Connectives with Corpus-based Data
Building a List of Synonymous Words and Phrases of Japanese Compound Verbs
Automatic Enrichment of Terminological Resources: the IATE RDF Example
Automatic Identification of Maghreb Dialects Using a Dictionary-Based Approach
Using Crowd Agreement for Wordnet Localization
A Danish FrameNet Lexicon and an Annotated Corpus Used for Training and Evaluating a Semantic Frame Classifier
Finely Tuned, 2 Billion Token Based Word Embeddings for Portuguese
SLIDE - a Sentiment Lexicon of Common Idioms
CoNLL-UL: Universal Morphological Lattices for Universal Dependency Parsing
PronouncUR: An Urdu Pronunciation Lexicon Generator
SimLex-999 for Polish
Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger
SandhiKosh: A Benchmark Corpus for Evaluating Sanskrit Sandhi Tools
Preparation and Usage of Xhosa Lexicographical Data for a Multilingual, Federated Environment
A Lexicon of Discourse Markers for Portuguese – LDM-PT
Browsing the Terminological Structure of a Specialized Domain: A Method Based on Lexical Functions and their Classification
One Language to rule them all: modelling Morphological Patterns in a Large Scale Italian Lexicon with SWRL
SenSALDO: Creating a Sentiment Lexicon for Swedish
SentiArabic: A Sentiment Analyzer for Standard Arabic
Leveraging Lexical Resources and Constraint Grammar for Rule-Based Part-of-Speech Tagging in Welsh
Indian Language Wordnets and their Linkages with Princeton WordNet
Classifier-based Polarity Propagation in a WordNet
Visualizing the "Dictionary of Regionalisms of France" (DRF)
The Linguistic Category Model in Polish (LCM-PL)
Massively Translingual Compound Analysis and Translation Discovery
A Survey on Automatically-Constructed WordNets and their Evaluation: Lexical and Word Embedding-based Approaches
Universal Dependencies for Ainu
WordNet-Shp: Towards the Building of a Lexical Database for a Peruvian Minority Language
Building a Morphological Treebank for German from a Linguistic Database
A Fast and Flexible Webinterface for Dialect Research in the Low Countries
Tools for Building an Interlinked Synonym Lexicon Network
EFLLex: A Graded Lexical Resource for Learners of English as a Foreign Language
Signbank: Software to Support Web Based Dictionaries of Sign Language
Biomedical term normalization of EHRs with UMLS
A database of German definitory contexts from selected web sources
Sign Languages and the Online World Online Dictionaries & Lexicostatistics
Evaluating EcoLexiCAT: a Terminology-Enhanced CAT Tool
Extended HowNet 2.0 – An Entity-Relation Common-Sense Representation Model
Metaphor Suggestions based on a Semantic Metaphor Repository
Improving Hypernymy Extraction with Distributional Semantic Classes
Linked Data Learning to Map Natural Language Statements into Knowledge Base Representations for Knowledge Base Construction
LIdioms: A Multilingual Linked Idioms Data Set
EventWiki: A Knowledge Base of Major Events
Interoperability of Language-related Information: Mapping the BLL Thesaurus to Lexvo and Glottolog
A Framework for the Needs of Different Types of Users in Multilingual Semantic Enrichment
Universal Morphologies for the Caucasus region
J-MeDic: A Japanese Disease Name Dictionary based on Real Clinical Usage
PMKI: an European Commission action for the interoperability, maintainability and sustainability of Language Resources
Automatic Enrichment of Terminological Resources: the IATE RDF Example
Sanaphor++: Combining Deep Neural Networks with Semantics for Coreference Resolution
Preparation and Usage of Xhosa Lexicographical Data for a Multilingual, Federated Environment
One Language to rule them all: modelling Morphological Patterns in a Large Scale Italian Lexicon with SWRL
Towards a Linked Open Data Edition of Sumerian Corpora
Indian Language Wordnets and their Linkages with Princeton WordNet
Teanga: A Linked Data based platform for Natural Language Processing
The LODeXporter: Flexible Generation of Linked Open Data Triples from NLP Frameworks for Automatic Knowledge Base Construction
The ACoLi CoNLL Libraries: Beyond Tab-Separated Values
Building Literary Corpora for Computational Literary Analysis - A Prototype to Bridge the Gap between CL and DH
Automatic and Manual Web Annotations in an Infrastructure to handle Fake News and other Online Media Phenomena
LiDo RDF: From a Relational Database to a Linked Data Graph of Linguistic Terms and Bibliographic Data
LR Infrastructures and Architectures Building Parallel Monolingual Gan Chinese Dialects Corpus
Data Management Plan (DMP) for Language Data under the New General Data Protection Regulation (GDPR)
Handling Big Data and Sensitive Data Using EUDAT's Generic Execution Framework and the WebLicht Workflow Engine.
Lessons Learned: On the Challenges of Migrating a Research Data Repository from a Research Institution to a University Library.
New directions in ELRA activities
CLARIN: Towards FAIR and Responsible Data Science Using Language Resources
From ‘Solved Problems’ to New Challenges: A Report on LDC Activities
A Framework for Multi-Language Service Design with the Language Grid
Language Technology for Multilingual Europe: An Analysis of a Large-Scale Survey regarding Challenges, Demands, Gaps and Needs
Introducing NIEUW: Novel Incentives and Workflows for Eliciting Linguistic Data
A Framework for the Needs of Different Types of Users in Multilingual Semantic Enrichment
LREMap, a Song of Resources and Evaluation
Combining Concepts and Their Translations from Structured Dictionaries of Uralic Minority Languages
Preparing Data from Psychotherapy for Natural Language Processing
Universal Morphologies for the Caucasus region
Managing Public Sector Data for Multilingual Applications Development
FARMI: A FrAmework for Recording Multi-Modal Interactions
Fluid Annotation: A Granularity-aware Annotation Tool for Chinese Word Fluidity
CLARIN’s Key Resource Families
Introducing the CLARIN Knowledge Centre for Linguistic Diversity and Language Documentation
Collecting Language Resources from Public Administrations in the Nordic and Baltic Countries
Metadata Collection Records for Language Resources
Mining Biomedical Publications With The LAPPS Grid
Developing New Linguistic Resources and Tools for the Galician Language
Resource Interoperability for Sustainable Benchmarking: The Case of Events
The ACoLi CoNLL Libraries: Beyond Tab-Separated Values
Towards Continuous Dialogue Corpus Creation: writing to corpus and generating from it
Extending Search System based on Interactive Visualization for Speech Corpora
MirasText: An Automatically Generated Text Corpus for Persian
Bridging the LAPPS Grid and CLARIN
E-magyar -- A Digital Language Processing System
ILCM - A Virtual Research Infrastructure for Large-Scale Qualitative Data
What's Wrong, Python? -- A Visual Differ and Graph Library for NLP in Python
Indra: A Word Embedding and Semantic Relatedness Server
A UIMA Database Interface for Managing NLP-related Text Annotations
LR National/International Projects, Infrastructural/Policy Issues Data Management Plan (DMP) for Language Data under the New General Data Protection Regulation (GDPR)
Lessons Learned: On the Challenges of Migrating a Research Data Repository from a Research Institution to a University Library.
New directions in ELRA activities
CLARIN: Towards FAIR and Responsible Data Science Using Language Resources
From ‘Solved Problems’ to New Challenges: A Report on LDC Activities
Language Technology for Multilingual Europe: An Analysis of a Large-Scale Survey regarding Challenges, Demands, Gaps and Needs
Introducing NIEUW: Novel Incentives and Workflows for Eliciting Linguistic Data
Researching Less-Resourced Languages – the DigiSami Corpus
A Bird’s-eye View of Language Processing Projects at the Romanian Academy
The Reference Corpus of the Contemporary Romanian Language (CoRoLa)
PMKI: an European Commission action for the interoperability, maintainability and sustainability of Language Resources
Managing Public Sector Data for Multilingual Applications Development
Translation Crowdsourcing: Creating a Multilingual Corpus of Online Educational Content
The DLDP Survey on Digital Use and Usability of EU Regional and Minority Languages
Introducing the CLARIN Knowledge Centre for Linguistic Diversity and Language Documentation
A Legal Perspective on Training Models for Natural Language Processing
The GermaParl Corpus of Parliamentary Protocols
Collecting Language Resources from Public Administrations in the Nordic and Baltic Countries
European Language Resource Coordination: Collecting Language Resources for Public Sector Multilingual Information Management
Bridging the LAPPS Grid and CLARIN
E-magyar -- A Digital Language Processing System

 

M
Machine Translation, Speech-to-Speech Translation Multilingual Parallel Corpus for Global Communication Plan
ESCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing
Evaluating Machine Translation Performance on Chinese Idioms with a Blacklist Method
Evaluating Domain Adaptation for Machine Translation Across Scenarios
Chinese-Portuguese Machine Translation: A Study on Building Parallel Corpora from Comparable Texts
Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation
A Large Automatically-Acquired All-Words List of Multiword Expressions Scored for Compositionality
Upping the Ante: Towards a Better Benchmark for Chinese-to-English Machine Translation
Improving Machine Translation of Educational Content via Crowdsourcing
A Hybrid Approach for Automatic Extraction of Bilingual Multiword Expressions from Parallel Corpora
Training and Adapting Multilingual NMT for Less-resourced and Morphologically Rich Languages
English-Basque Statistical and Neural Machine Translation
Morphology Injection for English-Malayalam Statistical Machine Translation
Dynamic Oracle for Neural Machine Translation in Decoding Phase
Examining the Tip of the Iceberg: A Data Set for Idiom Translation
Improving domain-specific SMT for low-resourced languages using data from different domains
A fine-grained error analysis of NMT, SMT and RBMT output for English-to-Dutch
Multimodal Lexical Translation
A Comparative Study of Extremely Low-Resource Transliteration of the World’s Languages
Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German
TQ-AutoTest – An Automated Test Suite for (Machine) Translation Quality
One Sentence One Model for Neural Machine Translation
Metadata LREMap, a Song of Resources and Evaluation
Morphology Semi-Automatic Construction of Word-Formation Networks (for Polish and Spanish)
UniMorph 2.0: Universal Morphology
A Computational Architecture for the Morphology of Upper Tanana
Expanding Abbreviations in a Strongly Inflected Language: Are Morphosyntactic Tags Sufficient?
A Fast and Accurate Vietnamese Word Segmenter
Finite-state morphological analysis for Gagauz
Construction of a Japanese Word Similarity Dataset
Morphology Injection for English-Malayalam Statistical Machine Translation
Grapheme-level Awareness in Word Embeddings for Morphologically Rich Languages
The Morpho-syntactic Annotation of Animacy for a Dependency Parser
Tools for The Production of Analogical Grids and a Resource of N-gram Analogical Grids in 11 Languages
Universal Morphologies for the Caucasus region
A Morphologically Annotated Corpus of Emirati Arabic
ForFun 1.0: Prague Database of Forms and Functions -- An Invaluable Resource for Linguistic Research
CoNLL-UL: Universal Morphological Lattices for Universal Dependency Parsing
Manually Annotated Corpus of Polish Texts Published between 1830 and 1918
Evaluating Inflectional Complexity Crosslinguistically: a Processing Perspective
SandhiKosh: A Benchmark Corpus for Evaluating Sanskrit Sandhi Tools
Parser combinators for Tigrinya and Oromo morphology
One Language to rule them all: modelling Morphological Patterns in a Large Scale Italian Lexicon with SWRL
Modeling Northern Haida Verb Morphology
Building a Morphological Treebank for German from a Linguistic Database
MADARi: A Web Interface for Joint Arabic Morphological Annotation and Spelling Correction
A Morphological Analyzer for St. Lawrence Island / Central Siberian Yupik
The Abkhaz National Corpus
Web-based Annotation Tool for Inflectional Language Resources
Parsivar: A Language Processing Toolkit for Persian
BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages
Build Fast and Accurate Lemmatization for Arabic
Multilinguality The Boarnsterhim Corpus: A Bilingual Frisian-Dutch Panel and Trend Study
Open Subtitles Paraphrase Corpus for Six Languages
Incorporating Contextual Information for Language-Independent, Dynamic Disambiguation Tasks
OpenSubtitles2018: Statistical Rescoring of Sentence Alignments in Large, Noisy Parallel Corpora
A Large Parallel Corpus of Full-Text Scientific Articles
An Integrated Representation of Linguistic and Social Functions of Code-Switching
We Are Depleting Our Research Subject as We Are Investigating It: In Language Technology, more Replication and Diversity Are Needed
Representation Mapping: A Novel Approach to Generate High-Quality Multi-Lingual Emotion Lexicons
NegPar: A parallel corpus annotated for negation
Learning Word Vectors for 157 Languages
A Corpus for Multilingual Document Classification in Eight Languages
A multilingual collection of CoNLL-U-compatible morphological lexicons
Huge Automatically Extracted Training-Sets for Multilingual Word Sense Disambiguation
Creating a Translation Matrix of the Bible’s Names Across 591 Languages
UniMorph 2.0: Universal Morphology
The French-Algerian Code-Switching Triggered audio corpus (FACST)
Language Technology for Multilingual Europe: An Analysis of a Large-Scale Survey regarding Challenges, Demands, Gaps and Needs
Chahta Anumpa: A multimodal corpus of the Choctaw Language
Comparing Pretrained Multilingual Word Embeddings on an Ontology Alignment Task
Creating Large-Scale Multilingual Cognate Tables
Phonetically Balanced Code-Mixed Speech Corpus for Hindi-English Automatic Speech Recognition
Building Named Entity Recognition Taggers via Parallel Corpora
No more beating about the bush: A Step towards Idiom Handling for Indian Language NLP
Error Analysis of Uyghur Name Tagging: Language-specific Techniques and Remaining Challenges
A Multilingual Approach to Question Classification
Baselines and Test Data for Cross-Lingual Inference
Training and Adapting Multilingual NMT for Less-resourced and Morphologically Rich Languages
A First South African Corpus of Multilingual Code-switched Soap Opera Speech
IPSL: A Database of Iconicity Patterns in Sign Languages. Creation and Use
Acquiring Verb Classes Through Bottom-Up Semantic Verb Clustering
Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank
Chinese Relation Classification using Long Short Term Memory Networks
An Automatic Learning of an Algerian Dialect Lexicon by using Multilingual Word Embeddings
Language adaptation experiments via cross-lingual embeddings for related languages
A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task
Discovering Canonical Indian English Accents: A Crowdsourcing-based Approach
An Integrated Formal Representation for Terminological and Lexical Data included in Classification Schemes
ZAP: An Open-Source Multilingual Annotation Projection Framework
Humor Detection in English-Hindi Code-Mixed Social Media Content: Corpus and Baseline System
Infant Word Comprehension-to-Production Index Applied to Investigation of Noun Learning Predominance Using Cross-lingual CDI database
Konbitzul: an MWE-specific database for Spanish-Basque
MirasVoice: A bilingual (English-Persian) speech corpus
GenDR: A Generic Deep Realizer with Complex Lexicalization
Automatic Enrichment of Terminological Resources: the IATE RDF Example
Using Crowd Agreement for Wordnet Localization
Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian
Statistical Analysis of Missing Translation in Simultaneous Interpretation Using A Large-scale Bilingual Speech Corpus
KIT-Multi: A Translation-Oriented Multilingual Embedding Corpus
Corpora of Typical Sentences
A Comparative Study of Extremely Low-Resource Transliteration of the World’s Languages
Two Multilingual Corpora Extracted from the Tenders Electronic Daily for Machine Learning and Machine Translation Applications.
A Large Multilingual and Multi-domain Dataset for Recommender Systems
SemR-11: A Multi-Lingual Gold-Standard for Semantic Similarity and Relatedness for Eleven Languages
Indian Language Wordnets and their Linkages with Princeton WordNet
ParCorFull: a Parallel Corpus Annotated with Full Coreference
A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora
Massively Translingual Compound Analysis and Translation Discovery
Multilingual Word Segmentation: Training Many Language-Specific Tokenizers Smoothly Thanks to the Universal Dependencies Corpus
Universal Dependencies and Quantitative Typological Trends. A Case Study on Word Order
European Language Resource Coordination: Collecting Language Resources for Public Sector Multilingual Information Management
A Workbench for Rapid Generation of Cross-Lingual Summaries
CATS: A Tool for Customized Alignment of Text Simplification Corpora
TriMED: A Multilingual Terminological Database
Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform
Browsing and Supporting Pluricentric Global Wordnet, or just your Wordnet of Interest
BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages
Multimedia Document Processing Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions
Sound Signal Processing with Seq2Tree Network
Social Image Tags as a Source of Word Embeddings: A Task-oriented Evaluation
The Effects of Unimodal Representation Choices on Multimodal Learning
Polish Corpus of Annotated Descriptions of Images
Towards a music-language mapping
Towards Processing of the Oral History Interviews and Related Printed Documents
Multiword Expressions & Collocations Word Embedding Approach for Synonym Extraction of Multi-Word Terms
Evaluating Machine Translation Performance on Chinese Idioms with a Blacklist Method
A Large Automatically-Acquired All-Words List of Multiword Expressions Scored for Compositionality
A Lexical Tool for Academic Writing in Spanish based on Expert and Novice Corpora
A Hybrid Approach for Automatic Extraction of Bilingual Multiword Expressions from Parallel Corpora
No more beating about the bush: A Step towards Idiom Handling for Indian Language NLP
LIdioms: A Multilingual Linked Idioms Data Set
Annotating Modality Expressions and Event Factuality for a Japanese Chess Commentary Corpus
Annotating Chinese Light Verb Constructions according to PARSEME guidelines
Using English Baits to Catch Serbian Multi-Word Terminology
Construction of Large-scale English Verbal Multiword Expression Annotated Corpus
Examining the Tip of the Iceberg: A Data Set for Idiom Translation
Konbitzul: an MWE-specific database for Spanish-Basque
GenDR: A Generic Deep Realizer with Complex Lexicalization
A Multilingual Test Collection for the Semantic Search of Entity Categories
Building a List of Synonymous Words and Phrases of Japanese Compound Verbs
SLIDE - a Sentiment Lexicon of Common Idioms
Is it worth it? Budget-related evaluation metrics for model selection
The ICoN Corpus of Academic Written Italian (L1 and L2)
Towards the Inference of Semantic Relations in Complex Nominals: a Pilot Study
Towards a Standardized Dataset for Noun Compound Interpretation
Generation of a Spanish Artificial Collocation Error Corpus
Improving a Neural-based Tagger for Multiword Expressions Identification
Designing a Russian Idiom-Annotated Corpus

 

N
Named Entity Recognition A New Corpus to Support Text Mining for the Curation of Metabolites in the ChEBI Database
A Deep Neural Network based Approach for Entity Extraction in Code-Mixed Indian Social Media Text
Framing Named Entity Linking Error Types
Building Named Entity Recognition Taggers via Parallel Corpora
Transforming Wikipedia into a Large-Scale Fine-Grained Entity Type Corpus
Error Analysis of Uyghur Name Tagging: Language-specific Techniques and Remaining Challenges
BiLSTM-CRF for Persian Named-Entity Recognition ArmanPersoNERCorpus: the First Entity-Annotated Persian Dataset
A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events
A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products
Very Large-Scale Lexical Resources to Enhance Chinese and Japanese Machine Translation
Portuguese Named Entity Recognition using Conditional Random Fields and Local Grammars
M-CNER: A Corpus for Chinese Named Entity Recognition in Multi-Domains
Czech Legal Text Treebank 2.0
Transfer Learning for Named-Entity Recognition with Neural Networks
Automatic Identification of Research Fields in Scientific Papers
Text Mining for History: first steps on building a large dataset
CogCompNLP: Your Swiss Army Knife for NLP
SlugNERDS: A Named Entity Recognition Tool for Open Domain Dialogue Systems
Natural Language Generation Automatic Prediction of Discourse Connectives
Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions
Building a Corpus for Personality-dependent Natural Language Understanding and Generation
Definite Description Lexical Choice: taking Speaker's Personality into account
Referring Expression Generation in time-constrained communication
Incorporating Semantic Attention in Video Description Generation
Exploiting Pre-Ordering for Neural Machine Translation
GenDR: A Generic Deep Realizer with Complex Lexicalization
Augmenting Image Question Answering Dataset by Exploiting Image Captions
A Detailed Evaluation of Neural Sequence-to-Sequence Models for In-domain and Cross-domain Text Simplification
Exploring Conversational Language Generation for Rich Content about Hotels
RDF2PT: Generating Brazilian Portuguese Texts from RDF Data
Translating Web Search Queries into Natural Language Questions
Towards a music-language mapping
Up-cycling Data for Natural Language Generation
Reference production in human-computer interaction: Issues for Corpus-based Referring Expression Generation

 

O
Ontologies Comparing Pretrained Multilingual Word Embeddings on an Ontology Alignment Task
Undersampling Improves Hypernymy Prototypicality Learning
Interoperability of Language-related Information: Mapping the BLL Thesaurus to Lexvo and Glottolog
Cross-checking WordNet and SUMO Using Meronymy
The Circumstantial Event Ontology (CEO) and ECB+/CEO: an Ontology and Corpus for Implicit Causal Relations between Events
Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger
Towards a Conversation-Analytic Taxonomy of Speech Overlap
Identification of Personal Information Shared in Chat-Oriented Dialogue
A Survey on Automatically-Constructed WordNets and their Evaluation: Lexical and Word Embedding-based Approaches
Tel(s)-Telle(s)-Signs: Highly Accurate Automatic Crosslingual Hypernym Discovery
One event, many representations. Mapping action concepts through visual features.
Up-cycling Data for Natural Language Generation
Browsing and Supporting Pluricentric Global Wordnet, or just your Wordnet of Interest
Extended HowNet 2.0 – An Entity-Relation Common-Sense Representation Model
Retrieving Information from the French Lexical Network in RDF/OWL Format
Opinion Mining / Sentiment Analysis Introducing a Lexicon of Verbal Polarity Shifters for English
Quantifying Qualitative Data for Understanding Controversial Issues
Word Affect Intensities
Distribution of Emotional Reactions to News Articles in Twitter
Understanding Emotions: A Dataset of Tweets to Study Interactions between Affect Categories
Medical Entity Corpus with PICO elements and Sentiment Analysis
Disambiguation of Verbal Shifters
Annotating Attribution Relations in Arabic
Bootstrapping Polar-Opposite Emotion Dimensions from Online Reviews
Resource Creation Towards Automated Sentiment Analysis in Telugu (a low resource language) and Integrating Multiple Domain Sources to Enhance Sentiment Prediction
Multilingual Multi-class Sentiment Classification Using Convolutional Neural Networks
A Large Self-Annotated Corpus for Sarcasm
MultiBooked: A Corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification
Can Domain Adaptation be Handled as Analogies?
Building a Sentiment Corpus of Tweets in Brazilian Portuguese
EMTC: Multilabel Corpus in Movie Domain for Emotion Analysis in Conversational Text
Complex and Precise Movie and Book Annotations in French Language for Aspect Based Sentiment Analysis
Lingmotif-lex: a Wide-coverage, State-of-the-art Lexicon for Sentiment Analysis
A Japanese Corpus for Analyzing Customer Loyalty Information
FooTweets: A Bilingual Parallel Corpus of World Cup Tweets
The SSIX Corpora: Three Gold Standard Corpora for Sentiment Analysis in English, Spanish and German Financial Microblogs
Sarcasm Target Identification: Dataset and An Introductory Approach
SLIDE - a Sentiment Lexicon of Common Idioms
A Multi- versus a Single-classifier Approach for the Identification of Modality in the Portuguese Language
An Italian Twitter Corpus of Hate Speech against Immigrants
Annotating Opinions and Opinion Targets in Student Course Feedback
'Aye' or 'No'? Speech-level Sentiment Analysis of Hansard UK Parliamentary Debate Transcripts
Scalable Visualisation of Sentiment and Stance
NoReC: The Norwegian Review Corpus
SenSALDO: Creating a Sentiment Lexicon for Swedish
RtGender: A Corpus for Studying Differential Responses to Gender
A Vietnamese Dialog Act Corpus Based on ISO 24617-2 standard
Corpus Building and Evaluation of Aspect-based Opinion Summaries from Tweets in Spanish
WikiArt Emotions: An Annotated Dataset of Emotions Evoked by Art
Classifier-based Polarity Propagation in a WordNet
Utilizing Large Twitter Corpora to Create Sentiment Lexica
Generating a Gold Standard for a Swedish Sentiment Lexicon
Application and Analysis of a Multi-layered Scheme for Irony on the Italian Twitter Corpus TWITTIRÒ
Optical Character Recognition Building A Handwritten Cuneiform Character Imageset
Page Stream Segmentation with Convolutional Neural Nets Combining Textual and Visual Features
Low-resource Post Processing of Noisy OCR Output for Historical Corpus Digitisation
PDFdigest: an Adaptable Layout-Aware PDF-to-XML Textual Content Extractor for Scientific Articles
Other Creating a Verb Synonym Lexicon Based on a Parallel Corpus
Multilingual Parallel Corpus for Global Communication Plan
DeModify: A Dataset for Analyzing Contextual Constraints on Modifier Deletion
When ACE met KBP: End-to-End Evaluation of Knowledge Base Population with Component-level Annotation
Annotating High-Level Structures of Short Stories and Personal Anecdotes
Sentence Level Temporality Detection using an Implicit Time-sensed Resource
Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl
Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions
Semi-Automatic Construction of Word-Formation Networks (for Polish and Spanish)
A Corpus of Metaphor Novelty Scores for Syntactically-Related Word Pairs
Evaluation of Domain-specific Word Embeddings using Knowledge Resources
Collection of Multimodal Dialog Data and Analysis of the Result of Annotation of Users' Interest Level
Multi-layer Annotation of the Rigveda
Universal Dependencies Version 2 for Japanese
ESCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing
Incorporating Contextual Information for Language-Independent, Dynamic Disambiguation Tasks
A Neural Network Based Model for Loanword Identification in Uyghur
OpenSubtitles2018: Statistical Rescoring of Sentence Alignments in Large, Noisy Parallel Corpora
Building an Ellipsis-aware Chinese Dependency Treebank for Web Text
Data Management Plan (DMP) for Language Data under the New General Data Protection Regulation (GDPR)
Quantifying Qualitative Data for Understanding Controversial Issues
Word Affect Intensities
Semi-automatic Korean FrameNet Annotation over KAIST Treebank
Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text
A Large Parallel Corpus of Full-Text Scientific Articles
Developing the Bangla RST Discourse Treebank
Computer-assisted Speaker Diarization: How to Evaluate Human Corrections
We Are Depleting Our Research Subject as We Are Investigating It: In Language Technology, more Replication and Diversity Are Needed
Revisiting Distant Supervision for Relation Extraction
Automatic Thesaurus Construction for Modern Hebrew
A Deep Neural Network based Approach for Entity Extraction in Code-Mixed Indian Social Media Text
Adapting Serious Game for Fallacious Argumentation to German: Pitfalls, Insights, and Best Practices
Discovering the Language of Wine Reviews: A Text Mining Account
Lessons Learned: On the Challenges of Migrating a Research Data Repository from a Research Institution to a University Library.
New directions in ELRA activities
Recognizing Behavioral Factors while Driving: A Real-World Multimodal Corpus to Monitor the Driver’s Affective State
BDPROTO: A Database of Phonological Inventories from Ancient and Reconstructed Languages
Multi-Dialect Arabic POS Tagging: A CRF Approach
Evaluating Domain Adaptation for Machine Translation Across Scenarios
CLARIN: Towards FAIR and Responsible Data Science Using Language Resources
EmotionLines: An Emotion Corpus of Multi-Party Conversations
A Gold Standard for Multilingual Automatic Term Extraction from Comparable Corpora: Term Structure and Translation Equivalents
Toward An Epic Epigraph Graph
Chinese-Portuguese Machine Translation: A Study on Building Parallel Corpora from Comparable Texts
Learning Word Vectors for 157 Languages
Building a Word Segmenter for Sanskrit Overnight
Dialogue Structure Annotation for Multi-Floor Interaction
Extracting an English-Persian Parallel Corpus from Comparable Corpora
Upping the Ante: Towards a Better Benchmark for Chinese-to-English Machine Translation
A Corpus of eRulemaking User Comments for Measuring Evaluability of Arguments
Lexical Profiling of Environmental Corpora
A New Version of the Składnica Treebank of Polish Harmonised with the Walenty Valency Dictionary
Creating a Translation Matrix of the Bible’s Names Across 591 Languages
From ‘Solved Problems’ to New Challenges: A Report on LDC Activities
Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment
A Dataset for Inter-Sentence Relation Extraction using Distant Supervision
A Computational Architecture for the Morphology of Upper Tanana
The IIT Bombay English-Hindi Parallel Corpus
Parallel Corpora for the Biomedical Domain
A Diachronic Corpus for Literary Style Analysis
CBFC: a parallel L2 speech corpus for Korean and French learners
Systems’ Agreements and Disagreements in Temporal Processing: An Extensive Error Analysis of the TempEval-3 Task
Strategies and Challenges for Crowdsourcing Regional Dialect Perception Data for Swiss German and Swiss French
Creating Large-Scale Multilingual Cognate Tables
BULBasaa: A Bilingual Basaa-French Speech Corpus for the Evaluation of Language Documentation Tools
Handling Rare Word Problem using Synthetic Training Data for Sinhala and Tamil Neural Machine Translation
Three Dimensions of Reproducibility in Natural Language Processing
Researching Less-Resourced Languages – the DigiSami Corpus
Understanding Emotions: A Dataset of Tweets to Study Interactions between Affect Categories
Expanding Abbreviations in a Strongly Inflected Language: Are Morphosyntactic Tags Sufficient?
Grounding Gradable Adjectives through Crowdsourcing
Simple Semantic Annotation and Situation Frames: Two Approaches to Basic Text Understanding in LORELEI
Parse Me if You Can: Artificial Treebanks for Parsing Experiments on Elliptical Constructions
Laying the Groundwork for Knowledge Base Population: Nine Years of Linguistic Resources for TAC KBP
An Attribution Relations Corpus for Political News
FontLex: A Typographical Lexicon based on Affective Associations
Text Simplification from Professionally Produced Corpora
Linguistically-driven Framework for Computationally Efficient and Scalable Sign Recognition
CONDUCT: An Expressive Conducting Gesture Dataset for Sound Control
A vision-grounded dataset for predicting typical locations for verbs
Improving Dialogue Act Classification for Spontaneous Arabic Speech and Instant Messages at Utterance Level
A Large Resource of Patterns for Verbal Paraphrases
Error Analysis of Uyghur Name Tagging: Language-specific Techniques and Remaining Challenges
The Nautilus Speaker Characterization Corpus: Speech Recordings and Labels of Speaker Characteristics and Voice Descriptions
JESC: Japanese-English Subtitle Corpus
An Application for Building a Polish Telephone Speech Corpus
Baselines and Test Data for Cross-Lingual Inference
A Fast and Accurate Vietnamese Word Segmenter
A 2nd Longitudinal Corpus for Children's Writing with Enhanced Output for Specific Spelling Patterns
CPJD Corpus: Crowdsourced Parallel Speech Corpus of Japanese Dialects
Finite-state morphological analysis for Gagauz
Data Anonymization for Requirements Quality Analysis: a Reproducible Automatic Error Detection Task
GeCoTagger: Annotation of German Verb Complements with Conditional Random Fields
Albanian Part-of-Speech Tagging: Gold Standard and Evaluation
Collecting Code-Switched Data from Social Media
Incorporating Semantic Attention in Video Description Generation
Correction of OCR Word Segmentation Errors in Articles from the ACL Collection through Neural Machine Translation Methods
Sentiment-Stance-Specificity (SSS) Dataset: Identifying Support-based Entailment among Opinions.
Exploiting Pre-Ordering for Neural Machine Translation
Portable Spelling Corrector for a Less-Resourced Language: Amharic
Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages
Lexical and Semantic Features for Cross-lingual Text Reuse Classification: an Experiment in English and Latin Paraphrases
Building a Macro Chinese Discourse Treebank
Urdu Word Embeddings
Dynamic Oracle for Neural Machine Translation in Decoding Phase
Sound Signal Processing with Seq2Tree Network
What Causes the Differences in Communication Styles? A Multicultural Study on Directness and Elaborateness
Expert Evaluation of a Spoken Dialogue System in a Clinical Operating Room
Deep Neural Networks for Coreference Resolution for Polish
The Metalogue Debate Trainee Corpus: Data Collection and Annotations
AET: Web-based Adjective Exploration Tool for German
Open ASR for Icelandic: Resources and a Baseline System
The UIR Uncertainty Corpus for Chinese: Annotating Chinese Microblog Corpus for Uncertainty Identification from Social Media
Linguistic and Sociolinguistic Annotation of 17th Century Dutch Letters
Semantic Equivalence Detection: Are Interrogatives Harder than Declaratives?
Arabic Dialect Identification in the Context of Bivalency and Code-Switching
Investigating the Influence of Bilingual MWU on Trainee Translation Quality
Cross-lingual Terminology Extraction for Translation Quality Estimation
Very Large-Scale Lexical Resources to Enhance Chinese and Japanese Machine Translation
Interoperability of Language-related Information: Mapping the BLL Thesaurus to Lexvo and Glottolog
KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue
Discovering Canonical Indian English Accents: A Crowdsourcing-based Approach
Multi Modal Distance - An Approach to Stemma Generation With Weighting
Improving Hate Speech Detection with Deep Learning Ensembles
A Corpus of Natural Multimodal Spatial Scene Descriptions
LREMap, a Song of Resources and Evaluation
ZAP: An Open-Source Multilingual Annotation Projection Framework
On the Vector Representation of Utterances in Dialogue Context
A Swedish Cookie-Theft Corpus
From Manuscripts to Archetypes through Iterative Clustering
FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German
ES-Port: a Spontaneous Spoken Human-Human Technical Support Corpus for Dialogue Research in Spanish
ASAP++: Enriching the ASAP Automated Essay Grading Dataset with Essay Attribute Scores
Building A Handwritten Cuneiform Character Imageset
Infant Word Comprehension-to-Production Index Applied to Investigation of Noun Learning Predominance Using Cross-lingual CDI database
Unified Guidelines and Resources for Arabic Dialect Orthography
A Parallel Corpus of Arabic-Japanese News Articles
The ADELE Corpus of Dyadic Social Text Conversations: Dialog Act Annotation with ISO 24617-2
The Reference Corpus of the Contemporary Romanian Language (CoRoLa)
Discriminating between Similar Languages on Imbalanced Conversational Texts
KRAUTS: A German Temporally Annotated News Corpus
Moving TIGER beyond Sentence-Level
Elicitation protocol and material for a corpus of long prepared monologues in Sign Language
Semantic Relatedness of Wikipedia Concepts – Benchmark Data and a Working Solution
Lingmotif-lex: a Wide-coverage, State-of-the-art Lexicon for Sentiment Analysis
Towards a Welsh Semantic Annotation System
FooTweets: A Bilingual Parallel Corpus of World Cup Tweets
Edit me: A Corpus and a Framework for Understanding Natural Language Image Editing
The Niki and Julie Corpus: Collaborative Multimodal Dialogues between Humans, Robots, and Virtual Agents
Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM
Dysarthric speech evaluation: automatic and perceptual approaches
Predicting Nods by using Dialogue Acts in Dialogue
Carcinologic Speech Severity Index Project: A Database of Speech Disorder Productions to Assess Quality of Life Related to Speech After Cancer
The WAW Corpus: The First Corpus of Interpreted Speeches and their Translations for English and Arabic
A Multilingual Wikified Data Set of Educational Material
An Assessment of Explicit Inter- and Intra-sentential Discourse Connectives in Turkish Discourse Bank
Building a TOCFL Learner Corpus for Chinese Grammatical Error Diagnosis
SimPA: A Sentence-Level Simplification Corpus for the Public Administration Domain
MIAPARLE: Online training for the discrimination of stress contrasts
Visual Choice of Plausible Alternatives: An Evaluation of Image-based Commonsense Causal Reasoning
Universal Dependencies for Amharic
The First 100 Days: A Corpus Of Political Agendas on Twitter
Using a Corpus of English and Chinese Political Speeches for Metaphor Analysis
Sarcasm Target Identification: Dataset and An Introductory Approach
Modeling Collaborative Multimodal Behavior in Group Dialogues: The MULTISIMO Corpus
The brWaC Corpus: A New Open Resource for Brazilian Portuguese
Discovering Parallel Language Resources for Training MT Engines
A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction
A Leveled Reading Corpus of Modern Standard Arabic
CoNLL-UL: Universal Morphological Lattices for Universal Dependency Parsing
Annotation and Quantitative Analysis of Speaker Information in Novel Conversation Sentences in Japanese
The LIA Treebank of Spoken Norwegian Dialects
Polish Corpus of Annotated Descriptions of Images
Managing Public Sector Data for Multilingual Applications Development
Korean TimeBank Including Relative Temporal Information
Manually Annotated Corpus of Polish Texts Published between 1830 and 1918
The DLDP Survey on Digital Use and Usability of EU Regional and Minority Languages
SimLex-999 for Polish
Action Verb Corpus
Sharing Copies of Synthetic Clinical Corpora without Physical Distribution — A Case Study to Get Around IPRs and Privacy Constraints Featuring the German JSYNCC Corpus
EMO&LY (EMOtion and AnomaLY): A new corpus for anomaly detection in an audiovisual stream with emotional context.
Fluid Annotation: A Granularity-aware Annotation Tool for Chinese Word Fluidity
The German Reference Corpus DeReKo: New Developments – New Opportunities
Risamálheild: A Very Large Icelandic Text Corpus
Construction of English-French Multimodal Affective Conversational Corpus from TV Dramas
Exploring Conversational Language Generation for Rich Content about Hotels
A Comparative Study of Extremely Low-Resource Transliteration of the World’s Languages
Multi-lingual Argumentative Corpora in English, Turkish, Greek, Albanian, Croatian, Serbian, Macedonian, Bulgarian, Romanian and Arabic
Parser combinators for Tigrinya and Oromo morphology
Preparation and Usage of Xhosa Lexicographical Data for a Multilingual, Federated Environment
Translating Web Search Queries into Natural Language Questions
Literality and cognitive effort: Japanese and Spanish
Shami: A Corpus of Levantine Arabic Dialects
CLARIN’s Key Resource Families
Towards a music-language mapping
Two Multilingual Corpora Extracted from the Tenders Electronic Daily for Machine Learning and Machine Translation Applications.
Evaluation of Machine Translation Performance Across Multiple Genres and Languages
HiNTS: A Tagset for Middle Low German
Improving Unsupervised Keyphrase Extraction using Background Knowledge
Transfer Learning for Named-Entity Recognition with Neural Networks
Evaluation of Feature-Space Speaker Adaptation for End-to-End Acoustic Models
Leveraging Lexical Resources and Constraint Grammar for Rule-Based Part-of-Speech Tagging in Welsh
A Repository of Corpora for Summarization
WikiDragon: A Java Framework For Diachronic Content And Network Analysis Of MediaWikis
Modeling Trolling in Social Media Conversations
Contextual Dependencies in Time-Continuous Multidimensional Affect Recognition
You Tweet What You Speak: A City-Level Dataset of Arabic Dialects
A Vietnamese Dialog Act Corpus Based on ISO 24617-2 standard
WikiArt Emotions: An Annotated Dataset of Emotions Evoked by Art
Test Sets for Chinese Nonlocal Dependency Parsing
Studying Muslim Stereotyping through Microportrait Extraction
Adding Syntactic Annotations to Flickr30k Entities Corpus for Multimodal Ambiguous Prepositional-Phrase Attachment Resolution
Introducing the CLARIN Knowledge Centre for Linguistic Diversity and Language Documentation
DART: A Large Dataset of Dialectal Arabic Tweets
The Linguistic Category Model in Polish (LCM-PL)
Massively Translingual Compound Analysis and Translation Discovery
Universal Dependencies for Ainu
Identifying Speakers and Addressees in Dialogues Extracted from Literary Fiction
Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus
Speech Rate Calculations with Short Utterances: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task
Automatic Identification of Research Fields in Scientific Papers
Building a Morphological Treebank for German from a Linguistic Database
Collecting Language Resources from Public Administrations in the Nordic and Baltic Countries
Manual vs Automatic Bitext Extraction
Cheating a Parser to Death: Data-driven Cross-Treebank Annotation Transfer
Teanga: A Linked Data based platform for Natural Language Processing
Tools for Building an Interlinked Synonym Lexicon Network
Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German
A corpus of German political speeches from the 21st century
Metadata Collection Records for Language Resources
MADARi: A Web Interface for Joint Arabic Morphological Annotation and Spelling Correction
A Morphological Analyzer for St. Lawrence Island / Central Siberian Yupik
ChAnot: An Intelligent Annotation Tool for Indigenous and Highly Agglutinative Languages in Peru
ESCRITO - An NLP-Enhanced Educational Scoring Toolkit
Parallel Corpora in Mboshi (Bantu C25, Congo-Brazzaville)
The LREC Workshops Map
Errator: a Tool to Help Detect Annotation Errors in the Universal Dependencies Project
Preserving Workflow Reproducibility: The RePlay-DH Client as a Tool for Process Documentation
Matics Software Suite: New Tools for Evaluation and Data Exploration
PDF-to-Text Reanalysis for Linguistic Data Mining
Bringing Order to Chaos: A Non-Sequential Approach for Browsing Large Sets of Found Audio Data
MGAD: Multilingual Generation of Analogy Datasets
QUEST: A Natural Language Interface to Relational Databases
Towards Continuous Dialogue Corpus Creation: writing to corpus and generating from it
One Sentence One Model for Neural Machine Translation
Text Annotation Graphs: Annotating Complex Natural Language Phenomena
Extending Search System based on Interactive Visualization for Speech Corpora
German Radio Interviews: The GRAIN Release of the SFB732 Silver Standard Collection
Building Literary Corpora for Computational Literary Analysis - A Prototype to Bridge the Gap between CL and DH
Sign Languages and the Online World Online Dictionaries & Lexicostatistics
WASA: A Web Application for Sequence Annotation
Bridging the LAPPS Grid and CLARIN
A Lightweight Modeling Middleware for Corpus Processing
SlugNERDS: A Named Entity Recognition Tool for Open Domain Dialogue Systems
MMQA: A Multi-domain Multi-lingual Question-Answering Framework for English and Hindi
Parsivar: A Language Processing Toolkit for Persian
Graph Based Semi-Supervised Learning Approach for Tamil POS tagging
Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU
Sentence and Clause Level Emotion Annotation, Detection, and Classification in a Multi-Genre Corpus
Persian Discourse Treebank and coreference corpus

 

P
Parsing Incorporating Contextual Information for Language-Independent, Dynamic Disambiguation Tasks
Evaluating Domain Adaptation for Machine Translation Across Scenarios
Ensemble Romanian Dependency Parsing with Neural Networks
PoSTWITA-UD: an Italian Twitter Treebank in Universal Dependencies
BKTreebank: Building a Vietnamese Dependency Treebank
Attention for Implicit Discourse Relation Recognition
Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian
ANCOR-AS: Enriching the ANCOR Corpus with Syntactic Annotations
World Knowledge for Abstract Meaning Representation Parsing
The AnnCor CHILDES Treebank
Analyzing Middle High German Syntax with RDF and SPARQL
Part-Of-Speech Tagging Building a Corpus from Handwritten Picture Postcards: Transcription, Annotation and Part-of-Speech Tagging
Universal Dependencies Version 2 for Japanese
Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text
Multi-Dialect Arabic POS Tagging: A CRF Approach
A multilingual collection of CoNLL-U-compatible morphological lexicons
Expanding Abbreviations in a Strongly Inflected Language: Are Morphosyntactic Tags Sufficient?
BKTreebank: Building a Vietnamese Dependency Treebank
Albanian Part-of-Speech Tagging: Gold Standard and Evaluation
The Morpho-syntactic Annotation of Animacy for a Dependency Parser
Linguistic and Sociolinguistic Annotation of 17th Century Dutch Letters
Corpora with Part-of-Speech Annotations for Three Regional Languages of France: Alsatian, Occitan and Picard
BioRo: The Biomedical Corpus for the Romanian Language
Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM
A Morphologically Annotated Corpus of Emirati Arabic
Leveraging Lexical Resources and Constraint Grammar for Rule-Based Part-of-Speech Tagging in Welsh
A Neural Network Model for Part-Of-Speech Tagging of Social Media Texts
Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing
EFLLex: A Graded Lexical Resource for Learners of English as a Foreign Language
SoMeWeTa: A Part-of-Speech Tagger for German Social Media and Web Texts
Web-based Annotation Tool for Inflectional Language Resources
Graph Based Semi-Supervised Learning Approach for Tamil POS tagging
Person Identification Delta vs. N-Gram Tracing: Evaluating the Robustness of Authorship Attribution Methods
Building a Corpus for Personality-dependent Natural Language Understanding and Generation
Reusable workflows for gender prediction
Towards Neural Speaker Modeling in Multi-Party Conversation: The Task, Dataset, and Models
Experiments with Convolutional Neural Networks for Multi-Label Authorship Attribution
The MonPaGe_HA Database for the Documentation of Spoken French Throughout Adulthood
Arabic Data Science Toolkit: An API for Arabic Language Feature Extraction
Phonetic Databases, Phonology The Boarnsterhim Corpus: A Bilingual Frisian-Dutch Panel and Trend Study
BDPROTO: A Database of Phonological Inventories from Ancient and Reconstructed Languages
Comparison of Pun Detection Methods Using Japanese Pun Corpus
Evaluation of Automatic Formant Trackers
Data-Driven Pronunciation Modeling of Swiss German Dialectal Speech for Automatic Speech Recognition
A Multimodal Corpus of Expert Gaze and Behavior during Phonetic Segmentation Tasks
Pronunciation Variants and ASR of Colloquial Speech: A Case Study on Czech
The Distribution and Prosodic Realization of Verb Forms in German Infant-Directed Speech
Epitran: Precision G2P for Many Languages
Construction of the Corpus of Everyday Japanese Conversation: An Interim Report
A Speaking Atlas of the Regional Languages of France
WordKit: a Python Package for Orthographic and Phonological Featurization
Sign Languages and the Online World Online Dictionaries & Lexicostatistics
Profiling What Causes the Differences in Communication Styles? A Multicultural Study on Directness and Elaborateness
Reusable workflows for gender prediction
Arap-Tweet: A Large Multi-Dialect Twitter Corpus for Gender, Age and Language Variety Identification
An SLA Corpus Annotated with Pedagogically Relevant Grammatical Structures
Prosody A «Portrait» Approach to Multichannel Discourse
SynPaFlex-Corpus: An Expressive French Audiobooks Corpus dedicated to expressive speech synthesis.
The Distribution and Prosodic Realization of Verb Forms in German Infant-Directed Speech
Increasing the Accessibility of Time-Aligned Speech Corpora with Spokes Mix

 

Q
Question Answering Simple Large-scale Relation Extraction from Unstructured Text
MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge
Annotating Zero Anaphora for Question Answering
A Multilingual Approach to Question Classification
Dataset for the First Evaluation on Chinese Machine Reading Comprehension
A Multi-Domain Framework for Textual Similarity. A Case Study on Question-to-Question and Question-Answering Similarity Tasks
Using Discourse Information for Education with a Spanish-Chinese Parallel Corpus
WorldTree: A Corpus of Explanation Graphs for Elementary Science Questions supporting Multi-hop Inference
Analysis of Implicit Conditions in Database Search Dialogues
An Information-Providing Closed-Domain Human-Agent Interaction Corpus
Augmenting Image Question Answering Dataset by Exploiting Image Captions
Visual Choice of Plausible Alternatives: An Evaluation of Image-based Commonsense Causal Reasoning
Semi-supervised Training Data Generation for Multilingual Question Answering
PhotoshopQuiA: A Corpus of Non-Factoid Questions and Answers for Why-Question Answering
BioRead: A New Dataset for Biomedical Reading Comprehension
Using Adversarial Examples in Natural Language Processing
MMQA: A Multi-domain Multi-lingual Question-Answering Framework for English and Hindi

 

S
Semantic Web Incorporating Global Contexts into Sentence Embedding for Relational Extraction at the Paragraph Level with Distant Supervision
An Integrated Formal Representation for Terminological and Lexical Data included in Classification Schemes
PMKI: an European Commission action for the interoperability, maintainability and sustainability of Language Resources
RDF2PT: Generating Brazilian Portuguese Texts from RDF Data
Towards a Linked Open Data Edition of Sumerian Corpora
The LODeXporter: Flexible Generation of Linked Open Data Triples from NLP Frameworks for Automatic Knowledge Base Construction
Analyzing Middle High German Syntax with RDF and SPARQL
Automatic and Manual Web Annotations in an Infrastructure to handle Fake News and other Online Media Phenomena
LiDo RDF: From a Relational Database to a Linked Data Graph of Linguistic Terms and Bibliographic Data
Semantics Creating a Verb Synonym Lexicon Based on a Parallel Corpus
Word Embedding Approach for Synonym Extraction of Multi-Word Terms
A FrameNet for Cancer Information in Clinical Narratives: Schema and Annotation
Network Features Based Co-hyponymy Detection
Introducing a Lexicon of Verbal Polarity Shifters for English
DeModify: A Dataset for Analyzing Contextual Constraints on Modifier Deletion
Annotating High-Level Structures of Short Stories and Personal Anecdotes
Sentence Level Temporality Detection using an Implicit Time-sensed Resource
Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl
A Corpus of Metaphor Novelty Scores for Syntactically-Related Word Pairs
Evaluation of Domain-specific Word Embeddings using Knowledge Resources
An Integrated Representation of Linguistic and Social Functions of Code-Switching
Joint Learning of Sense and Word Embeddings
Towards an ISO Standard for the Annotation of Quantification
Fine-grained Semantic Textual Similarity for Serbian
Automatic Thesaurus Construction for Modern Hebrew
NegPar: A parallel corpus annotated for negation
Evaluating Scoped Meaning Representations
ETPC - A Paraphrase Identification Corpus Annotated with Extended Paraphrase Typology and Negation
Advances in Pre-Training Distributed Word Representations
Huge Automatically Extracted Training-Sets for Multilingual Word Sense Disambiguation
SentEval: An Evaluation Toolkit for Universal Sentence Representations
C-HTS: A Concept-based Hierarchical Text Segmentation approach
Abstract Meaning Representation of Constructions: The More We Include, the Better the Representation
Systems’ Agreements and Disagreements in Temporal Processing: An Extensive Error Analysis of the TempEval-3 Task
Annotating Temporally-Anchored Spatial Knowledge by Leveraging Syntactic Dependencies
Semantic Supersenses for English Possessives
Cross-Document, Cross-Language Event Coreference Annotation Using Event Hoppers
Simple Semantic Annotation and Situation Frames: Two Approaches to Basic Text Understanding in LORELEI
Integrating Generative Lexicon Event Structures into VerbNet
The New Propbank: Aligning Propbank with AMR through POS Unification
Evaluation of Croatian Word Embeddings
Cross-Lingual Generation and Evaluation of a Wide-Coverage Lexical Semantic Resource
Annotating Modality Expressions and Event Factuality for a Japanese Chess Commentary Corpus
Extending the gold standard for a lexical substitution task: is it worth it?
Construction of a Japanese Word Similarity Dataset
Enhancing Modern Supervised Word Sense Disambiguation Models by Semantic Lexical Resources
Acquiring Verb Classes Through Bottom-Up Semantic Verb Clustering
Constructing High Quality Sense-specific Corpus and Word Embedding via Unsupervised Elimination of Pseudo Multi-sense
Undersampling Improves Hypernymy Prototypicality Learning
Urdu Word Embeddings
Chinese Relation Classification using Long Short Term Memory Networks
Word Embedding Evaluation Datasets and Wikipedia Title Embedding for Chinese
Semantic Equivalence Detection: Are Interrogatives Harder than Declaratives?
Social Image Tags as a Source of Word Embeddings: A Task-oriented Evaluation
SzegedKoref: A Hungarian Coreference Corpus
Knowing the Author by the Company His Words Keep
Towards AMR-BR: A SemBank for Brazilian Portuguese Language
Transfer of Frames from English FrameNet to Construct Chinese FrameNet: A Bilingual Corpus-Based Approach
The Automatic Annotation of the Semiotic Type of Hand Gestures in Obama's Humorous Speeches
Towards a Welsh Semantic Annotation System
An Evaluation Framework for Multimodal Interaction
Semantic Frame Parsing for Information Extraction: the CALOR corpus
Spanish HPSG Treebank based on the AnCora Corpus
Using a Corpus of English and Chinese Political Speeches for Metaphor Analysis
A Danish FrameNet Lexicon and an Annotated Corpus Used for Training and Evaluating a Semantic Frame Classifier
Finely Tuned, 2 Billion Token Based Word Embeddings for Portuguese
A Multi- versus a Single-classifier Approach for the Identification of Modality in the Portuguese Language
Neural Models of Selectional Preferences for Implicit Semantic Role Labeling
KIT-Multi: A Translation-Oriented Multilingual Embedding Corpus
Modeling French Sign Language: a proposal for a semantically compositional system
Annotating Abstract Meaning Representations for Spanish
A Lexicon of Discourse Markers for Portuguese – LDM-PT
Simulating ASR errors for training SLU systems
Browsing the Terminological Structure of a Specialized Domain: A Method Based on Lexical Functions and their Classification
SemR-11: A Multi-Lingual Gold-Standard for Semantic Similarity and Relatedness for Eleven Languages
Towards the Inference of Semantic Relations in Complex Nominals: a Pilot Study
Rollenwechsel-English: a large-scale semantic role corpus
Towards a Standardized Dataset for Noun Compound Interpretation
World Knowledge for Abstract Meaning Representation Parsing
An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages
A Parser for LTAG and Frame Semantics
Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data
A database of German definitory contexts from selected web sources
One event, many representations. Mapping action concepts through visual features.
NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System
Retrofitting Word Representations for Unsupervised Sense Aware Word Similarities
Indra: A Word Embedding and Semantic Relatedness Server
Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU
Metaphor Suggestions based on a Semantic Metaphor Repository
BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages
Reference production in human-computer interaction: Issues for Corpus-based Referring Expression Generation
Sign Language Recognition/Generation Linguistically-driven Framework for Computationally Efficient and Scalable Sign Recognition
SMILE Swiss German Sign Language Dataset
A Real-life, French-accented Corpus of Air Traffic Control Communications
Elicitation protocol and material for a corpus of long prepared monologues in Sign Language
Deep JSLC: A Multimodal Corpus Collection for Data-driven Generation of Japanese Sign Language Expressions
Modeling French Sign Language: a proposal for a semantically compositional system
Social Media Processing Building an Ellipsis-aware Chinese Dependency Treebank for Web Text
EuroGames16: Evaluating Change Detection in Online Conversation
A Deep Neural Network based Approach for Entity Extraction in Code-Mixed Indian Social Media Text
Multi-Dialect Arabic POS Tagging: A CRF Approach
PoSTWITA-UD: an Italian Twitter Treebank in Universal Dependencies
Aggression-annotated Corpus of Hindi-English Code-mixed Data
Semantic Supersenses for English Possessives
Annotating If the Authors of a Tweet are Located at the Locations They Tweet About
A Comparison Of Emotion Annotation Schemes And A New Annotated Data Set
An Automatic Learning of an Algerian Dialect Lexicon by using Multilingual Word Embeddings
Social Image Tags as a Source of Word Embeddings: A Task-oriented Evaluation
Classifying the Informative Behaviour of Emoji in Microblogs
Improving Hate Speech Detection with Deep Learning Ensembles
Visualization of the occurrence trend of infectious diseases using Twitter
A Taxonomy for In-depth Evaluation of Normalization for User Generated Content
Can Domain Adaptation be Handled as Analogies?
Author Profiling from Facebook Corpora
Gaining and Losing Influence in Online Conversation
Humor Detection in English-Hindi Code-Mixed Social Media Content : Corpus and Baseline System
Building a Sentiment Corpus of Tweets in Brazilian Portuguese
Lingmotif-lex: a Wide-coverage, State-of-the-art Lexicon for Sentiment Analysis
The SSIX Corpora: Three Gold Standard Corpora for Sentiment Analysis in English, Spanish and German Financial Microblogs
The First 100 Days: A Corpus Of Political Agendas on Twitter
Medical Sentiment Analysis using Social Media: Towards building a Patient Assisted System
An Italian Twitter Corpus of Hate Speech against Immigrants
RtGender: A Corpus for Studying Differential Responses to Gender
A Neural Network Model for Part-Of-Speech Tagging of Social Media Texts
Utilizing Large Twitter Corpora to Create Sentiment Lexica
Building Evaluation Datasets for Cultural Microblog Retrieval
Application and Analysis of a Multi-layered Scheme for Irony on the Italian Twitter Corpus TWITTIRÒ
Speech Recognition/Understanding
A Recorded Debating Dataset
Evaluating Phonemic Transcription of Low-Resource Tonal Languages for Language Documentation
Phonetically Balanced Code-Mixed Speech Corpus for Hindi-English Automatic Speech Recognition
BULBasaa: A Bilingual Basaa-French Speech Corpus for the Evaluation of Language Documentation Tools
Improving Dialogue Act Classification for Spontaneous Arabic Speech and Instant Messages at Utterance Level
Design and Development of Speech Corpora for Air Traffic Control Training
A Web Service for Pre-segmenting Very Long Transcribed Speech Recordings
Improved Transcription and Indexing of Oral History Interviews for Digital Humanities Research
Open ASR for Icelandic: Resources and a Baseline System
Creating Lithuanian and Latvian Speech Corpora from Inaccurately Annotated Web Data
Towards an Automatic Assessment of Crowdsourced Data for NLU
Data-Driven Pronunciation Modeling of Swiss German Dialectal Speech for Automatic Speech Recognition
Simulating ASR errors for training SLU systems
Pronunciation Variants and ASR of Colloquial Speech: A Case Study on Czech
Evaluation of Feature-Space Speaker Adaptation for End-to-End Acoustic Models
Epitran: Precision G2P for Many Languages
Matics Software Suite: New Tools for Evaluation and Data Exploration
Speech Resource/Database
The Boarnsterhim Corpus: A Bilingual Frisian-Dutch Panel and Trend Study
Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation
A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
Phonetically Balanced Code-Mixed Speech Corpus for Hindi-English Automatic Speech Recognition
A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction
The Nautilus Speaker Characterization Corpus: Speech Recordings and Labels of Speaker Characteristics and Voice Descriptions
Evaluation of Automatic Formant Trackers
An Application for Building a Polish Telephone Speech Corpus
A First South African Corpus of Multilingual Code-switched Soap Opera Speech
A Web Service for Pre-segmenting Very Long Transcribed Speech Recordings
MYCanCor: A Video Corpus of spoken Malaysian Cantonese
Open ASR for Icelandic: Resources and a Baseline System
Creating Lithuanian and Latvian Speech Corpora from Inaccurately Annotated Web Data
Discovering Canonical Indian English Accents: A Crowdsourcing-based Approach
A Corpus of Natural Multimodal Spatial Scene Descriptions
A Bird’s-eye View of Language Processing Projects at the Romanian Academy
Pronunciation Dictionaries for the Alsatian Dialects to Analyze Spelling and Phonetic Variation
MirasVoice: A bilingual (English-Persian) speech corpus
Japanese Dialogue Corpus of Information Navigation and Attentive Listening Annotated with Extended ISO-24617-2 Dialogue Act Tags
Carcinologic Speech Severity Index Project: A Database of Speech Disorder Productions to Assess Quality of Life Related to Speech After Cancer
Data-Driven Pronunciation Modeling of Swiss German Dialectal Speech for Automatic Speech Recognition
Preliminary Analysis of Embodied Interactions between Science Communicators and Visitors Based on a Multimodal Corpus of Japanese Conversations in a Science Museum
Analyzing Vocabulary Commonality Index Using Large-scaled Database of Child Language Development
PronouncUR: An Urdu Pronunciation Lexicon Generator
A Multimodal Corpus of Expert Gaze and Behavior during Phonetic Segmentation Tasks
Statistical Analysis of Missing Translation in Simultaneous Interpretation Using A Large-scale Bilingual Speech Corpus
EMO&LY (EMOtion and AnomaLY) : A new corpus for anomaly detection in an audiovisual stream with emotional context.
SynPaFlex-Corpus: An Expressive French Audiobooks Corpus dedicated to expressive speech synthesis.
ASR for Documenting Acutely Under-Resourced Indigenous Languages
Pronunciation Variants and ASR of Colloquial Speech: A Case Study on Czech
The MonPaGe_HA Database for the Documentation of Spoken French Throughout Adulthood
CoLoSS: Cognitive Load Corpus with Speech and Performance Data from a Symbol-Digit Dual-Task
VAST: A Corpus of Video Annotation for Speech Technologies
Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus
Construction of the Corpus of Everyday Japanese Conversation: An Interim Report
Parallel Corpora in Mboshi (Bantu C25, Congo-Brazzaville)
Bringing Order to Chaos: A Non-Sequential Approach for Browsing Large Sets of Found Audio Data
Extending Search System based on Interactive Visualization for Speech Corpora
BabyCloud, a Technological Platform for Parents and Researchers
Increasing the Accessibility of Time-Aligned Speech Corpora with Spokes Mix
Speech Synthesis
Design and Development of Speech Corpora for Air Traffic Control Training
SynPaFlex-Corpus: An Expressive French Audiobooks Corpus dedicated to expressive speech synthesis.
Epitran: Precision G2P for Many Languages
Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform
Standards For LRs
Towards an ISO Standard for the Annotation of Quantification
Lightweight Grammatical Annotation in the TEI: New Perspectives
Resource Interoperability for Sustainable Benchmarking: The Case of Events
Statistical And Machine Learning Methods
Simple Large-scale Relation Extraction from Unstructured Text
A Corpus for Modeling Word Importance in Spoken Dialogue Transcripts
EuroGames16: Evaluating Change Detection in Online Conversation
Combining rule-based and embedding-based approaches to normalize textual entities with an ontology
Ensemble Romanian Dependency Parsing with Neural Networks
Diacritics Restoration Using Neural Networks
Building a Word Segmenter for Sanskrit Overnight
Advances in Pre-Training Distributed Word Representations
Neural Caption Generation for News Images
SumeCzech: Large Czech News-Based Summarization Dataset
Annotating Educational Questions for Student Response Analysis
SW4ALL: a CEFR Classified and Aligned Corpus for Language Learning
Linguistically-driven Framework for Computationally Efficient and Scalable Sign Recognition
Cross-Lingual Generation and Evaluation of a Wide-Coverage Lexical Semantic Resource
SMILE Swiss German Sign Language Dataset
TF-LM: TensorFlow-based Language Modeling Toolkit
All-words Word Sense Disambiguation Using Concept Embeddings
English-Basque Statistical and Neural Machine Translation
Enhancing Modern Supervised Word Sense Disambiguation Models by Semantic Lexical Resources
Exploiting Pre-Ordering for Neural Machine Translation
Portable Spelling Corrector for a Less-Resourced Language: Amharic
Improved Transcription and Indexing of Oral History Interviews for Digital Humanities Research
Urdu Word Embeddings
Sound Signal Processing with Seq2Tree Network
Language adaptation experiments via cross-lingual embeddings for related languages
Investigating the Influence of Bilingual MWU on Trainee Translation Quality
Classifying the Informative Behaviour of Emoji in Microblogs
Korean L2 Vocabulary Prediction: Can a Large Annotated Corpus be Used to Train Better Models for Predicting Unknown Words?
Portuguese Named Entity Recognition using Conditional Random Fields and Local Grammars
The Automatic Annotation of the Semiotic Type of Hand Gestures in Obama's Humorous Speeches
Semi-Supervised Clustering for Short Answer Scoring
From analysis to modeling of engagement as sequences of multimodal behaviors
A Context-based Approach for Dialogue Act Recognition using Simple Recurrent Neural Networks
Semantic Frame Parsing for Information Extraction : the CALOR corpus
Improving domain-specific SMT for low-resourced languages using data from different domains
A Danish FrameNet Lexicon and an Annotated Corpus Used for Training and Evaluating a Semantic Frame Classifier
Is it worth it? Budget-related evaluation metrics for model selection
A Multi- versus a Single-classifier Approach for the Identification of Modality in the Portuguese Language
A Detailed Evaluation of Neural Sequence-to-Sequence Models for In-domain and Cross-domain Text Simplification
Page Stream Segmentation with Convolutional Neural Nets Combining Textual and Visual Features
Semi-supervised Training Data Generation for Multilingual Question Answering
Evaluating Inflectional Complexity Crosslinguistically: a Processing Perspective
BioRead: A New Dataset for Biomedical Reading Comprehension
Simulating ASR errors for training SLU systems
Using Adversarial Examples in Natural Language Processing
A Neural Network Model for Part-Of-Speech Tagging of Social Media Texts
Adding Syntactic Annotations to Flickr30k Entities Corpus for Multimodal Ambiguous Prepositional-Phrase Attachment Resolution
Classifier-based Polarity Propagation in a WordNet
A Legal Perspective on Training Models for Natural Language Processing
Utilizing Large Twitter Corpora to Create Sentiment Lexica
Improving a Neural-based Tagger for Multiword Expressions Identification
Multilingual Word Segmentation: Training Many Language-Specific Tokenizers Smoothly Thanks to the Universal Dependencies Corpus
Manual vs Automatic Bitext Extraction
Bringing Order to Chaos: A Non-Sequential Approach for Browsing Large Sets of Found Audio Data
NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System
MGAD: Multilingual Generation of Analogy Datasets
DeepTC – An Extension of DKPro Text Classification for Fostering Reproducibility of Deep Learning Experiments
SoMeWeTa: A Part-of-Speech Tagger for German Social Media and Web Texts
Summarisation
Neural Caption Generation for News Images
SumeCzech: Large Czech News-Based Summarization Dataset
A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task
Live Blog Corpus for Summarization
TSix: A Human-involved-creation Dataset for Tweet Summarization
Scalable Visualisation of Sentiment and Stance
RDF2PT: Generating Brazilian Portuguese Texts from RDF Data
Annotation and Analysis of Extractive Summaries for the Kyutech Corpus
A Repository of Corpora for Summarization
Corpus Building and Evaluation of Aspect-based Opinion Summaries from Tweets in Spanish
Auto-hMDS: Automatic Construction of a Large Heterogeneous Multilingual Multi-Document Summarization Corpus
A Workbench for Rapid Generation of Cross-Lingual Summaries
Beyond Generic Summarization: A Multi-faceted Hierarchical Summarization Corpus of Large Heterogeneous Data
PyrEval: An Automated Method for Summary Content Analysis

 

T
Text Mining
Word Embedding Approach for Synonym Extraction of Multi-Word Terms
A Recorded Debating Dataset
A New Corpus to Support Text Mining for the Curation of Metabolites in the ChEBI Database
Content-Based Conflict of Interest Detection on Wikipedia
MPST: A Corpus of Movie Plot Synopses with Tags
TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection
Discovering the Language of Wine Reviews: A Text Mining Account
Advances in Pre-Training Distributed Word Representations
Neural Caption Generation for News Images
Analyzing Citation-Distance Networks for Evaluating Publication Impact
C-HTS: A Concept-based Hierarchical Text Segmentation approach
A Diachronic Corpus for Literary Style Analysis
Three Dimensions of Reproducibility in Natural Language Processing
Medical Entity Corpus with PICO elements and Sentiment Analysis
BlogSet-BR: A Brazilian Portuguese Blog Corpus
Bootstrapping Polar-Opposite Emotion Dimensions from Online Reviews
Sentiment-Stance-Specificity (SSS) Dataset: Identifying Support-based Entailment among Opinions.
Lexical and Semantic Features for Cross-lingual Text Reuse Classification: an Experiment in English and Latin Paraphrases
Resource Creation Towards Automated Sentiment Analysis in Telugu (a low resource language) and Integrating Multiple Domain Sources to Enhance Sentiment Prediction
Multilingual Multi-class Sentiment Classification Using Convolutional Neural Networks
HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments
EventWiki: A Knowledge Base of Major Events
MultiBooked: A Corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification
Cross-lingual Terminology Extraction for Translation Quality Estimation
Annotating Spin in Biomedical Scientific Publications : the case of Random Controlled Trials (RCTs)
A Swedish Cookie-Theft Corpus
Reusable workflows for gender prediction
From Manuscripts to Archetypes through Iterative Clustering
Knowing the Author by the Company His Words Keep
Author Profiling from Facebook Corpora
A Japanese Corpus for Analyzing Customer Loyalty Information
Revisiting the Task of Scoring Open IE Relations
A supervised approach to taxonomy extraction using word embeddings
The Circumstantial Event Ontology (CEO) and ECB+/CEO: an Ontology and Corpus for Implicit Causal Relations between Events
Annotating Opinions and Opinion Targets in Student Course Feedback
'Aye' or 'No'? Speech-level Sentiment Analysis of Hansard UK Parliamentary Debate Transcripts
Risamálheild: A Very Large Icelandic Text Corpus
Automating Document Discovery in the Systematic Review Process: How to Use Chaff to Extract Wheat
Annotated Corpus of Scientific Conference's Homepages for Information Extraction
RtGender: A Corpus for Studying Differential Responses to Gender
Studying Muslim Stereotyping through Microportrait Extraction
A Legal Perspective on Training Models for Natural Language Processing
Analyzing the Quality of Counseling Conversations: the Tell-Tale Signs of High-quality Counseling
Mining Biomedical Publications With The LAPPS Grid
PyRATA, Python Rule-based feAture sTructure Analysis
CogCompNLP: Your Swiss Army Knife for NLP
ILCM - A Virtual Research Infrastructure for Large-Scale Qualitative Data
Textual Entailment And Paraphrasing
DeModify: A Dataset for Analyzing Contextual Constraints on Modifier Deletion
Open Subtitles Paraphrase Corpus for Six Languages
Building a Knowledge Graph from Natural Language Definitions for Interpretable Text Entailment Recognition
Automatic Prediction of Discourse Connectives
Fine-grained Semantic Textual Similarity for Serbian
SPADE: Evaluation Dataset for Monolingual Phrase Alignment
ETPC - A Paraphrase Identification Corpus Annotated with Extended Paraphrase Typology and Negation
Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment
A Multi-Domain Framework for Textual Similarity. A Case Study on Question-to-Question and Question-Answering Similarity Tasks
Baselines and Test Data for Cross-Lingual Inference
Mapping Texts to Scripts: An Entailment Study
Semantic Equivalence Detection: Are Interrogatives Harder than Declaratives?
CEFR-based Lexical Simplification Dataset
Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications
A Multilingual Test Collection for the Semantic Search of Entity Categories
Tools And Platforms For Data Collection
Community-Driven Crowdsourcing: Data Collection with Local Developers
Tools, Systems, Applications
Evaluating the WordsEye Text-to-Scene System: Imaginative and Realistic Sentences
Building a Corpus from Handwritten Picture Postcards: Transcription, Annotation and Part-of-Speech Tagging
Designing a Collaborative Process to Create Bilingual Dictionaries of Indonesian Ethnic Languages
Overcoming the Long Tail Problem: A Case Study on CO2-Footprint Estimation of Recipes using Information Retrieval
Handling Big Data and Sensitive Data Using EUDAT's Generic Execution Framework and the WebLicht Workflow Engine.
Ensemble Romanian Dependency Parsing with Neural Networks
Diacritics Restoration Using Neural Networks
Building a Word Segmenter for Sanskrit Overnight
A Multi-layer Annotated Corpus of Argumentative Text: From Argument Schemes to Discourse Relations
SentEval: An Evaluation Toolkit for Universal Sentence Representations
A Lexical Tool for Academic Writing in Spanish based on Expert and Novice Corpora
A Framework for Multi-Language Service Design with the Language Grid
FontLex: A Typographical Lexicon based on Affective Associations
Intertextual Correspondence for Integrating Corpora
Evaluation of Automatic Formant Trackers
A Fast and Accurate Vietnamese Word Segmenter
TF-LM: TensorFlow-based Language Modeling Toolkit
GeCoTagger: Annotation of German Verb Complements with Conditional Random Fields
Training and Adapting Multilingual NMT for Less-resourced and Morphologically Rich Languages
Correction of OCR Word Segmentation Errors in Articles from the ACL Collection through Neural Machine Translation Methods
Improved Transcription and Indexing of Oral History Interviews for Digital Humanities Research
ScholarGraph: a Chinese Knowledge Graph of Chinese Scholars
Expert Evaluation of a Spoken Dialogue System in a Clinical Operating Room
Deep Neural Networks for Coreference Resolution for Polish
Candidate Ranking for Maintenance of an Online Dictionary
QUD-Based Annotation of Discourse Structure and Information Structure: Tool and Evaluation
UFSAC: Unification of Sense Annotated Corpora and Tools
A Framework for the Needs of Different Types of Users in Multilingual Semantic Enrichment
Multi Modal Distance - An Approach to Stemma Generation With Weighting
A Pragmatic Approach for Classical Chinese Word Segmentation
Visualization of the occurrence trend of infectious diseases using Twitter
From Manuscripts to Archetypes through Iterative Clustering
Live Blog Corpus for Summarization
Tools for The Production of Analogical Grids and a Resource of N-gram Analogical Grids in 11 Languages
Combining Concepts and Their Translations from Structured Dictionaries of Uralic Minority Languages
Towards a Welsh Semantic Annotation System
An Evaluation Framework for Multimodal Interaction
Dysarthric speech evaluation: automatic and perceptual approaches
The brWaC Corpus: A New Open Resource for Brazilian Portuguese
Discovering Parallel Language Resources for Training MT Engines
ForFun 1.0: Prague Database of Forms and Functions -- An Invaluable Resource for Linguistic Research
A Detailed Evaluation of Neural Sequence-to-Sequence Models for In-domain and Cross-domain Text Simplification
Neural Models of Selectional Preferences for Implicit Semantic Role Labeling
PronouncUR: An Urdu Pronunciation Lexicon Generator
A Semi-autonomous System for Creating a Human-Machine Interaction Corpus in Virtual Reality: Application to the ACORFORMed System for Training Doctors to Break Bad News
FARMI: A FrAmework for Recording Multi-Modal Interactions
The German Reference Corpus DeReKo: New Developments – New Opportunities
Development of an Annotated Multimodal Dataset for the Investigation of Classification and Summarisation of Presentations using High-Level Paralinguistic Features
Towards faithfully visualizing global linguistic diversity
Improving Unsupervised Keyphrase Extraction using Background Knowledge
WikiDragon: A Java Framework For Diachronic Content And Network Analysis Of MediaWikis
Modeling Trolling in Social Media Conversations
Contextual Dependencies in Time-Continuous Multidimensional Affect Recognition
The Use of Text Alignment in Semi-Automatic Error Analysis: Use Case in the Development of the Corpus of the Latvian Language Learners
The GermaParl Corpus of Parliamentary Protocols
WordNet-Shp: Towards the Building of a Lexical Database for a Peruvian Minority Language
Speech Rate Calculations with Short Utterances: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task
Improving a Neural-based Tagger for Multiword Expressions Identification
Transc&Anno: A Graphical Tool for the Transcription and On-the-Fly Annotation of Handwritten Documents
An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages
A Fast and Flexible Webinterface for Dialect Research in the Low Countries
Tools for Building an Interlinked Synonym Lexicon Network
Palmyra: A Platform Independent Dependency Annotation Tool for Morphologically Rich Languages
A Web-based System for Crowd-in-the-Loop Dependency Treebanking
MADARi: A Web Interface for Joint Arabic Morphological Annotation and Spelling Correction
A Morphological Analyzer for St. Lawrence Island / Central Siberian Yupik
A Corpus of Drug Usage Guidelines Annotated with Type of Advice
ChAnot: An Intelligent Annotation Tool for Indigenous and Highly Agglutinative Languages in Peru
Signbank: Software to Support Web Based Dictionaries of Sign Language
Biomedical term normalization of EHRs with UMLS
Compilation of Corpora for the Study of the Information Structure–Prosody Interface
A Parser for LTAG and Frame Semantics
A Workbench for Rapid Generation of Cross-Lingual Summaries
ESCRITO - An NLP-Enhanced Educational Scoring Toolkit
CATS: A Tool for Customized Alignment of Text Simplification Corpora
Errator: a Tool to Help Detect Annotation Errors in the Universal Dependencies Project
Mining Biomedical Publications With The LAPPS Grid
PDFAnno: a Web-based Linguistic Annotation Tool for PDF Documents
Preserving Workflow Reproducibility: The RePlay-DH Client as a Tool for Process Documentation
TriMED: A Multilingual Terminological Database
PyRATA, Python Rule-based feAture sTructure Analysis
Matics Software Suite: New Tools for Evaluation and Data Exploration
Revita: a Language-learning Platform at the Intersection of ITS and CALL
Developing New Linguistic Resources and Tools for the Galician Language
The ACoLi CoNLL Libraries: Beyond Tab-Separated Values
PDF-to-Text Reanalysis for Linguistic Data Mining
PDFdigest: an Adaptable Layout-Aware PDF-to-XML Textual Content Extractor for Scientific Articles
Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform
QUEST: A Natural Language Interface to Relational Databases
Crowdsourced Multimodal Corpora Collection Tool
Coreference Resolution in FreeLing 4.0
DeepTC – An Extension of DKPro Text Classification for Fostering Reproducibility of Deep Learning Experiments
SoMeWeTa: A Part-of-Speech Tagger for German Social Media and Web Texts
TQ-AutoTest – An Automated Test Suite for (Machine) Translation Quality
CogCompNLP: Your Swiss Army Knife for NLP
Development of a Mobile Observation Support System for Students: FishWatchr Mini
Manzanilla: An Image Annotation Tool for TKB Building
WordKit: a Python Package for Orthographic and Phonological Featurization
Browsing and Supporting Pluricentric Global Wordnet, or just your Wordnet of Interest
Automatic and Manual Web Annotations in an Infrastructure to handle Fake News and other Online Media Phenomena
WASA: A Web Application for Sequence Annotation
Evaluating EcoLexiCAT: a Terminology-Enhanced CAT Tool
Extended HowNet 2.0 – An Entity-Relation Common-Sense Representation Model
TreeAnnotator: Versatile Visual Annotation of Hierarchical Text Relations
A Lightweight Modeling Middleware for Corpus Processing
E-magyar -- A Digital Language Processing System
What's Wrong, Python? -- A Visual Differ and Graph Library for NLP in Python
Parsivar: A Language Processing Toolkit for Persian
Indra: A Word Embedding and Semantic Relatedness Server
A UIMA Database Interface for Managing NLP-related Text Annotations
Metaphor Suggestions based on a Semantic Metaphor Repository
Sentence and Clause Level Emotion Annotation, Detection, and Classification in a Multi-Genre Corpus
Topic Detection & Tracking
C-HTS: A Concept-based Hierarchical Text Segmentation approach
A Multilingual Wikified Data Set of Educational Material
Measuring Innovation in Speech and Language Processing Publications.
FrNewsLink : a corpus linking TV Broadcast News Segments and Press Articles
Low Resource Methods for Medieval Document Sections Analysis
A corpus of German political speeches from the 21st century
MMQA: A Multi-domain Multi-lingual Question-Answering Framework for English and Hindi
Typological Databases
BDPROTO: A Database of Phonological Inventories from Ancient and Reconstructed Languages
Grapheme-level Awareness in Word Embeddings for Morphologically Rich Languages
Towards faithfully visualizing global linguistic diversity
Universal Dependencies and Quantitative Typological Trends. A Case Study on Word Order
QUEST: A Natural Language Interface to Relational Databases

 

U
Usability, User Satisfaction
Evaluating the WordsEye Text-to-Scene System: Imaginative and Realistic Sentences
Adapting Serious Game for Fallacious Argumentation to German: Pitfalls, Insights, and Best Practices
A Multimodal Corpus of Expert Gaze and Behavior during Phonetic Segmentation Tasks
Speech Rate Calculations with Short Utterances: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task
Transc&Anno: A Graphical Tool for the Transcription and On-the-Fly Annotation of Handwritten Documents
Evaluating EcoLexiCAT: a Terminology-Enhanced CAT Tool

 

V
Validation Of LRs
MOCCA: Measure of Confidence for Corpus Analysis - Automatic Reliability Check of Transcript and Automatic Segmentation
Delta vs. N-Gram Tracing: Evaluating the Robustness of Authorship Attribution Methods
Acquiring Verb Classes Through Bottom-Up Semantic Verb Clustering
Enriching a Lexicon of Discourse Connectives with Corpus-based Data
SenSALDO: Creating a Sentiment Lexicon for Swedish
Increasing Argument Annotation Reproducibility by Using Inter-annotator Agreement to Improve Guidelines
Classification of Closely Related Sub-dialects of Arabic Using Support-Vector Machines
European Language Resource Coordination: Collecting Language Resources for Public Sector Multilingual Information Management
Generating a Gold Standard for a Swedish Sentiment Lexicon
Resource Interoperability for Sustainable Benchmarking: The Case of Events

 

W
Web Services
MOCCA: Measure of Confidence for Corpus Analysis - Automatic Reliability Check of Transcript and Automatic Segmentation
Handling Big Data and Sensitive Data Using EUDAT's Generic Execution Framework and the WebLicht Workflow Engine.
A Framework for Multi-Language Service Design with the Language Grid
IPSL: A Database of Iconicity Patterns in Sign Languages. Creation and Use
A Web Service for Pre-segmenting Very Long Transcribed Speech Recordings
ScholarGraph: a Chinese Knowledge Graph of Chinese Scholars
AET: Web-based Adjective Exploration Tool for German
Candidate Ranking for Maintenance of an Online Dictionary
Visualizing the "Dictionary of Regionalisms of France" (DRF)
A Fast and Flexible Webinterface for Dialect Research in the Low Countries
Palmyra: A Platform Independent Dependency Annotation Tool for Morphologically Rich Languages
Signbank: Software to Support Web Based Dictionaries of Sign Language
TreeAnnotator: Versatile Visual Annotation of Hierarchical Text Relations
LiDo RDF: From a Relational Database to a Linked Data Graph of Linguistic Terms and Bibliographic Data
A UIMA Database Interface for Managing NLP-related Text Annotations
Word Sense Disambiguation
Joint Learning of Sense and Word Embeddings
Huge Automatically Extracted Training-Sets for Multilingual Word Sense Disambiguation
Automatic Wordnet Mapping: from CoreNet to Princeton WordNet
Disambiguation of Verbal Shifters
All-words Word Sense Disambiguation Using Concept Embeddings
Enhancing Modern Supervised Word Sense Disambiguation Models by Semantic Lexical Resources
Constructing High Quality Sense-specific Corpus and Word Embedding via Unsupervised Elimination of Pseudo Multi-sense
Unsupervised Korean Word Sense Disambiguation using CoreNet
UFSAC: Unification of Sense Annotated Corpora and Tools
Enriching Frame Representations with Distributionally Induced Senses
Construction of Large-scale English Verbal Multiword Expression Annotated Corpus
Multimodal Lexical Translation
FastSense: An Efficient Word Sense Disambiguation Classifier
WordNet-Shp: Towards the Building of a Lexical Database for a Peruvian Minority Language
An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages
Retrofitting Word Representations for Unsupervised Sense Aware Word Similarities
Powered by ELDA © 2018 ELDA/ELRA