|  |   TOPICS: Browse articles of the conference sorted by topic 
  
  | C |  
  | Cognitive Methods | VoxML: A Visualization Modeling Language
 Metonymy Analysis Using Associative Relations between Words
 A Corpus of Text Data and Gaze Fixations from Autistic and Non-Autistic Adults
 Cognitively Motivated Distributional Representations of Meaning
 English-to-Japanese Translation vs. Dictation vs. Post-editing: Comparing Translation Modes in a Multilingual Setting
 Multimodal Resources for Human-Robot Communication Modelling
 Finding Recurrent Features of Image Schema Gestures: the FIGURE corpus
 Coordinating Communication in the Wild: The Artwalk Dialogue Corpus of Pedestrian Navigation and Mobile Referential Communication
 Database of Mandarin Neighborhood Statistics
 Cohere: A Toolkit for Local Coherence
 
 |  
  | Collaborative Resource Construction | A Corpus of Wikipedia Discussions: Over the Years, with Topic, Power and Gender Labels
 Phonetic Inventory for an Arabic Speech Corpus
 A Multi-Layered Annotated Corpus of Scientific Papers
 Corpus Resources for Dispute Mediation Discourse
 New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian
 A Tagged Corpus for Automatic Labeling of Disabilities in Medical Scientific Papers
 Introducing the Asian Language Treebank (ALT)
 Benchmarking multimedia technologies with the CAMOMILE platform: the case of Multimodal Person Discovery at MediaEval 2015
 Nederlab: Towards a Single Portal and Research Environment for Diachronic Dutch Text Corpora
 Building Language Resources for Exploring Autism Spectrum Disorders
 Staggered NLP-assisted refinement for Clinical Annotations of Chronic Disease Events
 Resources for building applications with Dependency Minimal Recursion Semantics
 Port4NooJ v3.0: Integrated Linguistic Resources for Portuguese NLP
 From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse
 UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central ― State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines
 Building Evaluation Datasets for Consumer-Oriented Information Retrieval
 CLARIN-EL Web-based Annotation Tool
 EDISON: Feature Extraction for NLP, Simplified
 
 |  
  | Computer-Assisted Language Learning (CALL) | The Validation of MRCPD Cross-language Expansions on Imageability Ratings
 Unsupervised Ranked Cross-Lingual Lexical Substitution for Low-Resource Languages
 Improving POS Tagging of German Learner Language in a Reading Comprehension Scenario
 SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies
 SVALex: a CEFR-graded Lexical Resource for Swedish Foreign and Second Language Learners
 Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language
 Leveraging Native Data to Correct Preposition Errors in Learners' Dutch
 Chatbot Technology with Synthetic Voices in the Acquisition of an Endangered Language: Motivation, Development and Evaluation of a Platform for Irish
 A Shared Task for Spoken CALL?
 DALILA: The Dialectal Arabic Linguistic Learning Assistant
 Error Typology and Remediation Strategies for Requirements Written in English by Non-Native Speakers
 Joining-in-type Humanoid Robot Assisted Language Learning System
 
 |  
  | Controlled Languages | LELIO: An Auto-Adaptative System to Acquire Domain Lexical Knowledge in Technical Texts
 ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool
 Error Typology and Remediation Strategies for Requirements Written in English by Non-Native Speakers
 
 |  
  | Corpus (Creation, Annotation, etc.) | Endangered Language Documentation: Bootstrapping a Chatino Speech Corpus, Forced Aligner, ASR
 The PsyMine Corpus - A Corpus annotated with Psychiatric Disorders and their Etiological Factors
 Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces
 QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
 An Interaction-Centric Dataset for Learning Automation Rules in Smart Homes
 C-WEP―Rich Annotated Collection of Writing Errors by Professionals
 The REAL Corpus: A Crowd-Sourced Corpus of Human Generated and Evaluated Spatial References to Real-World Urban Scenes
 Ecological Gestures for HRI: the GEE Corpus
 How to Address Smart Homes with a Social Robot? A Multi-modal Corpus of User Interactions with an Intelligent Environment
 Who was Pietro Badoglio? Towards a QA system for Italian History
 Croatian Error-Annotated Corpus of Non-Professional Written Language
 New release of Mixer-6: Improved validity for phonetic study of speaker variation and identification
 An Annotated Corpus of Direct Speech
 Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola
 Axolotl: a Web Accessible Parallel Corpus for Spanish-Nahuatl
 A Corpus of Wikipedia Discussions: Over the Years, with Topic, Power and Gender Labels
 NLP Infrastructure for the Lithuanian Language
 Sense-annotating a Lexical Substitution Data Set with Ubyline
 Focus Annotation of Task-based Data: A Comparison of Expert and Crowd-Sourced Annotation in a Reading Comprehension Corpus
 The OpenCourseWare Metadiscourse (OCWMD) Corpus
 An Open Corpus for Named Entity Recognition in Historic Newspapers
 Domain Adaptation for Named Entity Recognition Using CRFs
 Building a Dataset for Possessions Identification in Text
 Age and Gender Prediction on Health Forum Data
 Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project
 Manual and Automatic Paraphrases for MT Evaluation
 CodE Alltag: A German-Language E-Mail Corpus
 ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions
 Internet Argument Corpus 2.0: An SQL schema for Dialogic Social Media and the Corpora to go with it
 Combining Semantic Annotation of Word Sense & Semantic Roles: A Novel Annotation Scheme for VerbNet Roles on German Language Data
 A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus
 Annotating Characters in Literary Corpora: A Scheme, the CHARLES Tool, and an Annotated Novel
 MWEs in Treebanks: From Survey to Guidelines
 LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages
 Improving corpus search via parsing
 Ubuntu-fr: A Large and Open Corpus for Multi-modal Analysis of Online Written Conversations
 A Turkish-German Code-Switching Corpus
 Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research
 A Web Tool for Building Parallel Corpora of Spoken and Sign Languages
 Introducing the LCC Metaphor Datasets
 Passing a USA National Bar Exam: a First Corpus for Experimentation
 Creating a Large Multi-Layered Representational Repository of Linguistic Code Switched Arabic Data
 Factuality Annotation and Learning in Spanish Texts
 Using Word Embeddings to Translate Named Entities
 Privacy Issues in Online Machine Translation Services - European Perspective
 The Alaskan Athabascan Grammar Database
 Corpora for Learning the Mutual Relationship between Semantic Relatedness and Textual Entailment
 DUEL: A Multi-lingual Multimodal Dialogue Corpus for Disfluency, Exclamations and Laughter
 The OnForumS corpus from the Shared Task on Online Forum Summarisation at MultiLing 2015
 Capturing Chat: Annotation and Tools for Multiparty Casual Conversation.
 DT-Neg: Tutorial Dialogues Annotated for Negation Scope and Focus in Context
 Enriching TimeBank: Towards a more precise annotation of temporal relations in a text
 Phrase Level Segmentation and Labelling of Machine Translation Errors
 Building the Macedonian-Croatian Parallel Corpus
 The ACQDIV Database: Min(d)ing the Ambient Language
 Towards Automatic Transcription of ILSE ― an Interdisciplinary Longitudinal Study of Adult Development and Aging
 A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C)
 SatiricLR: a Language Resource of Satirical News Articles
 The Uppsala Corpus of Student Writings: Corpus Creation, Annotation, and Analysis
 The Query of Everything: Developing Open-Domain, Natural-Language Queries for BOLT Information Retrieval
 Spanish Word Vectors from Wikipedia
 Two Years of Aranea: Increasing Counts and Tuning the Pipeline
 Universal Dependencies for Japanese
 Annotating and Detecting Medical Events in Clinical Notes
 Collecting Language Resources for the Latvian e-Government Machine Translation Platform
 Multiword Expressions Dataset for Indian Languages
 Quantitative Analysis of Gazes and Grounding Acts in L1 and L2 Conversations
 The Validation of MRCPD Cross-language Expansions on Imageability Ratings
 SemRelData ― Multilingual Contextual Annotation of Semantic Relations between Nominals: Dataset and Guidelines
 A Dependency Treebank of the Chinese Buddhist Canon
 Hidden Resources ― Strategies to Acquire and Exploit Potential Spoken Language Resources in National Archives
 Learning from Within? Comparing PoS Tagging Approaches for Historical Text
 Introducing the Weighted Trustability Evaluator for Crowdsourcing Exemplified by Speaker Likability Classification
 Question-Answering with Logic Specific to Video Games
 SubCo: A Learner Translation Corpus of Human and Machine Subtitles
 Multi-language Speech Collection for NIST LRE
 Selection Criteria for Low Resource Language Programs
 Assessing the Prosody of Non-Native Speakers of English: Measures and Feature Sets
 Japanese Word―Color Associations with and without Contexts
 Phonetic Inventory for an Arabic Speech Corpus
 A Language Resource of German Errors Written by Children with Dyslexia
 MarsaGram: an excursion in the forests of parsing trees
 The IPR-cleared Corpus of Contemporary Written and Spoken Romanian Language
 Compilation of an Arabic Children's Corpus
 CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech
 Corpus for Children's Writing with Enhanced Output for Specific Spelling Patterns (2nd and 3rd Grade)
 Annotating Logical Forms for EHR Questions
 Modelling Multi-issue Bargaining Dialogues: Data Collection, Annotation Design and Corpus
 Evaluating a Topic Modelling Approach to Measuring Corpus Similarity
 Benchmarking Lexical Simplification Systems
 AIMU: Actionable Items for Meeting Understanding
 Phoneme Alignment Using the Information on Phonological Processes in Continuous Speech
 Arabic to English Person Name Transliteration using Twitter
 Improving POS Tagging of German Learner Language in a Reading Comprehension Scenario
 A Multi-Layered Annotated Corpus of Scientific Papers
 Korean TimeML and Korean TimeBank
 TEG-REP: A corpus of Textual Entailment Graphs based on Relation Extraction Patterns
 SYN2015: Representative Corpus of Contemporary Written Czech
 Challenges of Evaluating Sentiment Analysis Tools on Social Media
 EmoTweet-28: A Fine-Grained Emotion Corpus for Sentiment Analysis
 A Corpus of Images and Text in Online News
 WikiCoref: An English Coreference-annotated Corpus of Wikipedia Articles
 POS-tagging of Historical Dutch
 Accuracy of Automatic Cross-Corpus Emotion Labeling for Conversational Speech Corpus Commonization
 The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database
 A Taxonomy of Specific Problem Classes in Text-to-Speech Synthesis: Comparing Commercial and Open Source Performance
 A Bilingual Discourse Corpus and Its Applications
 Quality Assessment of the Reuters Vol. 2 Multilingual Corpus
 Language Resource Addition Strategies for Raw Text Parsing
 Information structure in the Potsdam Commentary Corpus: Topics
 Compasses, Magnets, Water Microscopes: Annotation of Terminology in a Diachronic Corpus of Scientific Texts
 The SpeDial datasets: datasets for Spoken Dialogue Systems analytics
 A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds
 The ILMT-s2s Corpus ― A Multimodal Interlingual Map Task Corpus
 The Negochat Corpus of Human-agent Negotiation Dialogues
 KorAP Architecture ― Diving in the Deep Sea of Corpus Data
 Name Translation based on Fine-grained Named Entity Recognition in a Single Language
 Wikification for Scriptio Continua
 Two Decades of Terminology: European Framework Programmes Titles
 The IFCASL Corpus of French and German Non-native and Native Read Speech
 Legal Text Interpretation: Identifying Hohfeldian Relations from Text
 Learning Tone and Attribution for Financial Text Mining
 Mirroring Facial Expressions and Emotions in Dyadic Conversations
 SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies
 Uzbek-English and Turkish-English Morpheme Alignment Corpora
 Text Segmentation of Digitized Clinical Texts
 Large Multi-lingual, Multi-level and Multi-genre Annotation Corpus
 Creating Annotated Dialogue Resources: Cross-domain Dialogue Act Classification
 Giving Lexical Resources a Second Life: Démonette, a Multi-sourced Morpho-semantic Network for French
 Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction
 Lexical Resources to Enrich English Malayalam Machine Translation
 Building a Corpus of Errors and Quality in Machine Translation: Experiments on Error Impact
 Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset
 TTS for Low Resource Languages: A Bangla Synthesizer
 A Semantically Compositional Annotation Scheme for Time Normalization
 PROMETHEUS: A Corpus of Proverbs Annotated with Metaphors
 Corpus Annotation within the French FrameNet: a Domain-by-domain Methodology
 Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference.
 Correcting Errors in a Treebank Based on Tree Mining
 Comparison of Emotional Understanding in Modality-Controlled Environments using Multimodal Online Emotional Communication Corpus
 A Multilingual, Multi-style and Multi-granularity Dataset for Cross-language Textual Similarity Detection
 Corpus Resources for Dispute Mediation Discourse
 The SemDaX Corpus ― Sense Annotations with Scalable Sense Inventories
 A Corpus of Argument Networks: Using Graph Properties to Analyse Divisive Issues
 WIKIPARQ: A Tabulated Wikipedia Resource Using the Parquet Format
 Novel elicitation and annotation schemes for sentential and sub-sentential alignments of bitexts
 Covering various Needs in Temporal Annotation: a Proposal of Extension of ISO TimeML that Preserves Upward Compatibility
 A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability
 4Couv: A New Treebank for French
 Domain-Specific Corpus Expansion with Focused Webcrawling
 Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest
 A Large-scale Recipe and Meal Data Collection as Infrastructure for Food Research
 CORILSE: a Spanish Sign Language Repository for Linguistic Analysis
 A Comparative Analysis of Crowdsourced Natural Language Corpora for Spoken Dialog Systems
 Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
 An Arabic-Moroccan Darija Code-Switched Corpus
 The OFAI Multi-Modal Task Description Corpus
 A Tagged Corpus for Automatic Labeling of Disabilities in Medical Scientific Papers
 A Corpus of Text Data and Gaze Fixations from Autistic and Non-Autistic Adults
 Universal Dependencies v1: A Multilingual Treebank Collection
 FABIOLE, a Speech Database for Forensic Speaker Comparison
 A Japanese Chess Commentary Corpus
 InScript: Narrative texts annotated with script information
 Finding Definitions in Large Corpora with Sketch Engine
 Towards a Multi-dimensional Taxonomy of Stories in Dialogue
 PersonaBank: A Corpus of Personal Narratives and Their Story Intention Graphs
 Corpus-Based Diacritic Restoration for South Slavic Languages
 AfriBooms: An Online Treebank for Afrikaans
 Parallel Sentence Extraction from Comparable Corpora with Neural Network Features
 UPPC - Urdu Paraphrase Plagiarism Corpus
 A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization
 Differentia compositionem facit. A Slower-Paced and Reliable Parser for Latin
 How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News
 Evaluating the Readability of Text Simplification Output for Readers with Cognitive Disabilities
 AMISCO: The Austrian German Multi-Sensor Corpus
 Emotion Analysis on Twitter: The Hidden Challenge
 A Database of Laryngeal High-Speed Videos with Simultaneous High-Quality Audio Recordings of Pathological and Non-Pathological Voices
 Identifying Content Types of Messages Related to Open Source Software Projects
 WTF-LOD - A New Resource for Large-Scale NER Evaluation
 C4Corpus: Multilingual Web-size Corpus with Free License
 Training & Quality Assessment of an Optical Character Recognition Model for Northern Haida
 Improving Information Extraction from Wikipedia Texts using Basic English
 Exploiting a Large Strongly Comparable Corpus
 Purely Corpus-based Automatic Conversation Authoring
 FOLK-Gold ― A Gold Standard for Part-of-Speech-Tagging of Spoken German
 Automatic identification of Mild Cognitive Impairment through the analysis of Italian spontaneous speech productions
 CINTIL DependencyBank PREMIUM - A Corpus of Grammatical Dependencies for Portuguese
 A General Framework for the Annotation of Causality Based on FrameNet
 PE2rr Corpus: Manual Error Annotation of Automatically Pre-annotated MT Post-edits
 Estonian Dependency Treebank: from Constraint Grammar tagset to Universal Dependencies
 D(H)ante: A New Set of Tools for XIII Century Italian
 LexFr: Adapting the LexIt Framework to Build a Corpus-based French Subcategorization Lexicon
 QUEMDISSE? Reported speech in Portuguese
 Annotating Temporally-Anchored Spatial Knowledge on Top of OntoNotes Semantic Roles
 A Classification-based Approach to Economic Event Detection in Dutch News Text
 A Corpus of Gesture-Annotated Dialogues for Monologue-to-Dialogue Generation from Personal Narratives
 Construction of an English Dependency Corpus incorporating Compound Function Words
 Simultaneous Sentence Boundary Detection and Alignment with Pivot-based Machine Translation Generated Lexicons
 Design and Development of the MERLIN Learner Corpus Platform
 EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis
 The Universal Dependencies Treebank of Spoken Slovenian
 Introducing the Asian Language Treebank (ALT)
 The COPLE2 corpus: a learner corpus for Portuguese
 TGermaCorp -- A (Digital) Humanities Resource for (Computational) Linguistics
 1 Million Captioned Dutch Newspaper Images
 ANTUSD: A Large Chinese Sentiment Dictionary
 Multimodal Resources for Human-Robot Communication Modelling
 Metrical Annotation of a Large Corpus of Spanish Sonnets: Representation, Scansion and Evaluation
 The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents
 Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
 Corpus for Customer Purchase Behavior Prediction in Social Media
 metaTED: a Corpus of Metadiscourse for Spoken Language
 Universal Dependencies for Norwegian
 TweetMT: A Parallel Microblog Corpus
 Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition
 GRaSP: A Multilayered Annotation Scheme for Perspectives
 Nederlab: Towards a Single Portal and Research Environment for Diachronic Dutch Text Corpora
 NLP and Public Engagement: The Case of the Italian School Reform
 Enhancing The RATP-DECODA Corpus With Linguistic Annotations For Performing A Large Range Of NLP Tasks
 Parallel Discourse Annotations on a Corpus of Short Texts
 BulPhonC: Bulgarian Speech Corpus for the Development of ASR Technology
 Designing a Speech Corpus for the Development and Evaluation of Dictation Systems in Latvian
 Poly-GrETEL: Cross-Lingual Example-based Querying of Syntactic Constructions
 Web Chat Conversations from Contact Centers: a Descriptive Study
 MEANTIME, the NewsReader Multilingual Event and Time Corpus
 LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl
 Crowdsourcing a Large Dataset of Domain-Specific Context-Sensitive Semantic Verb Relations
 The LetsRead Corpus of Portuguese Children Reading Aloud for Performance Evaluation
 Crowdsourced Corpus with Entity Salience Annotations
 ELMD: An Automatically Generated Entity Linking Gold Standard Dataset in the Music Domain
 Features for Generic Corpus Querying
 Graded and Word-Sense-Disambiguation Decisions in Corpus Pattern Analysis: a Pilot Study
 Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis
 Cysill Ar-lein: A Corpus of Written Contemporary Welsh Compiled from an On-line Spelling and Grammar Checker
 Identification of Drug-Related Medical Conditions in Social Media
 Emotion Corpus Construction Based on Selection from Hashtags
 Mining the Spoken Wikipedia for Speech Data and Beyond
 On the Use of a Serious Game for Recording a Speech Corpus of People with Intellectual Disabilities
 A Corpus of Clinical Practice Guidelines Annotated with the Importance of Recommendations
 Construction and Analysis of a Large Vietnamese Text Corpus
 The dialogue breakdown detection challenge: Task description, datasets, and evaluation metrics
 The Methodius Corpus of Rhetorical Discourse Structures and Generated Texts
 SpaceRef: A corpus of street-level geographic descriptions
 That'll Do Fine!: A Coarse Lexical Resource for English-Hindi MT, Using Polylingual Topic Models
 Constructing a Norwegian Academic Wordlist
 Tweeting and Being Ironic in the Debate about a Political Reform: the French Annotated Corpus TWitter-MariagePourTous
 CItA: an L1 Italian Learners Corpus to Study the Development of Writing Competence
 CEPLEXicon ― A Lexicon of Child European Portuguese
 Finding Recurrent Features of Image Schema Gestures: the FIGURE corpus
 Evaluating Lexical Simplification and Vocabulary Knowledge for Learners of French: Possibilities of Using the FLELex Resource
 A Corpus of Read and Spontaneous Upper Saxon German Speech for ASR Evaluation
 Parallel Speech Corpora of Japanese Dialects
 Automatic Recognition of Linguistic Replacements in Text Series Generated from Keystroke Logs
 Towards a Corpus of Violence Acts in Arabic Social Media
 Affective Lexicon Creation for the Greek Language
 The TYPALOC Corpus: A Collection of Various Dysarthric Speech Recordings in Read and Spontaneous Styles
 Multilevel Annotation of Agreement and Disagreement in Italian News Blogs
 PentoRef: A Corpus of Spoken References in Task-oriented Dialogues
 Building Language Resources for Exploring Autism Spectrum Disorders
 Comprehensive and Consistent PropBank Light Verb Annotation
 Summ-it++: an Enriched Version of the Summ-it Corpus
 Automatic Corpus Extension for Data-driven Natural Language Generation
 European Union Language Resources in Sketch Engine
 Extracting Structured Scholarly Information from the Machine Translation Literature
 Edit Categories and Editor Role Identification in Wikipedia
 Inconsistency Detection in Semantic Annotation
 Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers
 Staggered NLP-assisted refinement for Clinical Annotations of Chronic Disease Events
 SCARE ― The Sentiment Corpus of App Reviews with Fine-grained Annotations in German
 Developing a Dataset for Evaluating Approaches for Document Expansion with Images
 Coordinating Communication in the Wild: The Artwalk Dialogue Corpus of Pedestrian Navigation and Mobile Referential Communication
 A Multimodal Corpus for the Assessment of Public Speaking Ability and Anxiety
 Fast and Robust POS tagger for Arabic Tweets Using Agreement-based Bootstrapping
 WAGS: A Beautiful English-Italian Benchmark Supporting Word Alignment Evaluation on Rare Words
 Datasets for Aspect-Based Sentiment Analysis in French
 Integration of Lexical and Semantic Knowledge for Sentiment Analysis in SMS
 DART: a Dataset of Arguments and their Relations on Twitter
 Hypergraph Modelization of a Syntactically Annotated English Wikipedia Dump
 MADAD: A Readability Annotation Tool for Arabic Text
 Finding Alternative Translations in a Large Corpus of Movie Subtitle
 ASPEC: Asian Scientific Paper Excerpt Corpus
 Discontinuous Verb Phrases in Parsing and Machine Translation of English and German
 A Large-Scale Multilingual Disambiguation of Glosses
 Domain Adaptation in MT Using Titles in Wikipedia as a Parallel Corpus: Resources and Evaluation
 Crowdsourcing Salient Information from News and Tweets
 Guidelines and Framework for a Large Scale Arabic Diacritized Corpus
 A Dutch Dysarthric Speech Database for Individualized Speech Therapy Research
 TwiSty: A Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling
 TEITOK: Text-Faithful Annotated Corpora
 Extracting Interlinear Glossed Text from LaTeX Documents
 A Shared Task for Spoken CALL?
 From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse
 Laughter in French Spontaneous Conversational Dialogs
 A Corpus of Word-Aligned Asked and Anticipated Questions in a Virtual Patient Dialogue System
 The ACL RD-TEC 2.0: A Language Resource for Evaluating Term Extraction and Entity Recognition Methods
 Persian Proposition Bank
 Dialogue System Characterisation by Back-channelling Patterns Extracted from Dialogue Corpus
 Creation of comparable corpora for English-{Urdu, Arabic, Persian}
 Detecting Annotation Scheme Variation in Out-of-Domain Treebanks
 SciCorp: A Corpus of English Scientific Articles Annotated for Information Status Analysis
 Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation
 Universal Dependencies for Persian
 Aspect based Sentiment Analysis in Hindi: Resource Creation and Evaluation
 BosphorusSign: A Turkish Sign Language Recognition Corpus in Health and Finance Domains
 A Longitudinal Bilingual Frisian-Dutch Radio Broadcast Database Designed for Code-Switching Research
 Gulf Arabic Linguistic Resource Building for Sentiment Analysis
 If You Even Don't Have a Bit of Bible: Learning Delexicalized POS Taggers
 The CIRDO Corpus: Comprehensive Audio/Video Database of Domestic Falls of Elderly People
 Annotating Named Entities in Consumer Health Questions
 VPS-GradeUp: Graded Decisions on Usage Patterns
 Interoperability of Annotation Schemes: Using the Pepper Framework to Display AWA Documents in the ANNIS Interface
 PARC 3.0: A Corpus of Attribution Relations
 Hard Time Parsing Questions: Building a QuestionBank for French
 SuperCAT: The (New and Improved) Corpus Analysis Toolkit
 Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic
 AppDialogue: Multi-App Dialogues for Intelligent Assistants
 A Multimodal Motion-Captured Corpus of Matched and Mismatched Extravert-Introvert Conversational Pairs
 Urdu Summary Corpus
 Towards Automatic Identification of Effective Clues for Team Word-Guessing Games
 A CUP of CoFee: A large Collection of feedback Utterances Provided with communicative function annotations
 OSMAN ― A Novel Arabic Readability Metric
 Parallel Global Voices: a Collection of Multilingual Corpora with Citizen Media Stories
 Typed Entity and Relation Annotation on Computer Science Papers
 Speech Corpus Spoken by Young-old, Old-old and Oldest-old Japanese
 Summarizing Behaviours: An Experiment on the Annotation of Call-Centre Conversations
 Automatic Construction of Discourse Corpora for Dialogue Translation
 TermITH-Eval: a French Standard-Based Resource for Keyphrase Extraction Evaluation
 The Royal Society Corpus: From Uncharted Data to Corpus
 The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine
 ArchiMob - A Corpus of Spoken Swiss German
 Building Evaluation Datasets for Consumer-Oriented Information Retrieval
 Annotating Topic Development in Information Seeking Queries
 Detection of Reformulations in Spoken French
 Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene
 A Proposition Bank of Urdu
 A Hungarian Sentiment Corpus Manually Annotated at Aspect Level
 Creating a Lexicon of Bavarian Dialect by Means of Facebook Language Data and Crowdsourcing
 A Large Scale Corpus of Gulf Arabic
 CHATR the Corpus; a 20-year-old archive of Concatenative Speech Synthesis
 A Regional News Corpora for Contextualized Entity Discovery and Linking
 Survey of Conversational Behavior: Towards the Design of a Balanced Corpus of Everyday Japanese Conversation
 A Dataset for Open Event Extraction in English
 Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages
 Coreference Annotation Scheme and Relation Types for Hindi
 A Study of Reuse and Plagiarism in LREC papers
 A Reading Comprehension Corpus for Machine Translation Evaluation
 Transfer of Corpus-Specific Dialogue Act Annotation to ISO Standard: Is it worth it?
 Producing Monolingual and Parallel Web Corpora at the Same Time - SpiderLing and Bitextor's Love Affair
 A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction
 Semantic Annotation of the ACL Anthology Corpus for the Automatic Analysis of Scientific Literature
 Designing A Long Lasting Linguistic Project: The Case Study of ASIt
 Controlled Propagation of Concept Annotations in Textual Corpora
 Exploiting Arabic Diacritization for High Quality Automatic Annotation
 An Extension of the Slovak Broadcast News Corpus based on Semi-Automatic Annotation
 Coreference in Prague Czech-English Dependency Treebank
 Joining-in-type Humanoid Robot Assisted Language Learning System
 Searching in the Penn Discourse Treebank Using the PML-Tree Query
 Rapid Development of Morphological Analyzers for Typologically Diverse Languages
 DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus
 A Multi-domain Corpus of Swedish Word Sense Annotation
 A Corpus of Native, Non-native and Translated Texts
 He Said She Said ― a Male/Female Corpus of Polish
 Global Open Resources and Information for Language and Linguistic Analysis (GORILLA)
 Crowdsourcing an OCR Gold Standard for a German and French Heritage Corpus
 corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora
 On Developing Resources for Patient-level Information Retrieval
 Graphical Annotation for Syntax-Semantics Mapping
 Monolingual Social Media Datasets for Detecting Contradiction and Entailment
 Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job
 Multi-label Annotation in Scientific Articles - The Multi-label Cancer Risk Assessment Corpus
 Improving the Annotation of Sentence Specificity
 Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments
 Czech Legal Text Treebank 1.0
 Building A Case-based Semantic English-Chinese Parallel Treebank
 NorGramBank: A Deep Treebank for Norwegian
 VerbLexPor: a lexical resource with semantic roles for Portuguese
 OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles
 Challenges and Solutions for Consistent Annotation of Vietnamese Treebank
 Crowdsourcing a Multi-lingual Speech Corpus: Recording, Transcription and Annotation of the CrowdIS Corpora
 First Steps Towards Coverage-Based Sentence Alignment
 Latin Vallex. A Treebank-based Semantic Valency Lexicon for Latin
 CommonCOW: Massively Huge Web Corpora from CommonCrawl Data and a Method to Distribute them Freely under Restrictive EU Copyright Laws
 Sentiframes: A Resource for Verb-centered German Sentiment Inference
 Temporal Information Annotation: Crowd vs. Experts
 PotTS: The Potsdam Twitter Sentiment Corpus
 Parallel Chinese-English Entities, Relations and Events Corpora
 Automatic Classification of Tweets for Analyzing Communication Behavior of Museums
 Adapting the TANL tool suite to Universal Dependencies
 
 |  
  | Crowdsourcing | A Gold Standard for Scalar Adjectives
 Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces
 Remote Elicitation of Inflectional Paradigms to Seed Morphological Analysis in Low-Resource Languages
 The REAL Corpus: A Crowd-Sourced Corpus of Human Generated and Evaluated Spatial References to Real-World Urban Scenes
 Focus Annotation of Task-based Data: A Comparison of Expert and Crowd-Sourced Annotation in a Reading Comprehension Corpus
 Arabic Corpora for Credibility Analysis
 A Web Tool for Building Parallel Corpora of Spoken and Sign Languages
 The OnForumS corpus from the Shared Task on Online Forum Summarisation at MultiLing 2015
 A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C)
 Introducing the Weighted Trustability Evaluator for Crowdsourcing Exemplified by Speaker Likability Classification
 Japanese Word―Color Associations with and without Contexts
 Wikipedia Titles As Noun Tag Predictors
 The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database
 Crowdsourcing Ontology Lexicons
 The Negochat Corpus of Human-agent Negotiation Dialogues
 Analysis of English Spelling Errors in a Word-Typing Game
 Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference.
 Towards Using Social Media to Identify Individuals at Risk for Preventable Chronic Illness
 A Comparative Analysis of Crowdsourced Natural Language Corpora for Spoken Dialog Systems
 InScript: Narrative texts annotated with script information
 Enhancing Access to Online Education: Quality Machine Translation of MOOC Content
 Annotating Temporally-Anchored Spatial Knowledge on Top of OntoNotes Semantic Roles
 Palabras: Crowdsourcing Transcriptions of L2 Speech
 Crowdsourcing a Large Dataset of Domain-Specific Context-Sensitive Semantic Verb Relations
 Crowdsourced Corpus with Entity Salience Annotations
 Cysill Ar-lein: A Corpus of Written Contemporary Welsh Compiled from an On-line Spelling and Grammar Checker
 EasyTree: A Graphical Tool for Dependency Tree Annotation
 Towards a Corpus of Violence Acts in Arabic Social Media
 Crowdsourcing Salient Information from News and Tweets
 Acquiring Opposition Relations among Italian Verb Senses using Crowdsourcing
 Semantic Relation Extraction with Semantic Patterns Experiment on Radiology Reports
 Creating a Lexicon of Bavarian Dialect by Means of Facebook Language Data and Crowdsourcing
 Crowdsourcing an OCR Gold Standard for a German and French Heritage Corpus
 Crowdsourcing a Multi-lingual Speech Corpus: Recording, Transcription and Annotation of the CrowdIS Corpora
 Temporal Information Annotation: Crowd vs. Experts
 
 |    
  
  | D |  
  | Dialogue | An Annotated Corpus of Direct Speech
 Internet Argument Corpus 2.0: An SQL schema for Dialogic Social Media and the Corpora to go with it
 Ubuntu-fr: A Large and Open Corpus for Multi-modal Analysis of Online Written Conversations
 DUEL: A Multi-lingual Multimodal Dialogue Corpus for Disfluency, Exclamations and Laughter
 Capturing Chat: Annotation and Tools for Multiparty Casual Conversation.
 DT-Neg: Tutorial Dialogues Annotated for Negation Scope and Focus in Context
 A Dependency Treebank of the Chinese Buddhist Canon
 Modelling Multi-issue Bargaining Dialogues: Data Collection, Annotation Design and Corpus
 AIMU: Actionable Items for Meeting Understanding
 The SpeDial datasets: datasets for Spoken Dialogue Systems analytics
 The Negochat Corpus of Human-agent Negotiation Dialogues
 Mirroring Facial Expressions and Emotions in Dyadic Conversations
 Creating Annotated Dialogue Resources: Cross-domain Dialogue Act Classification
 A Comparative Study of Text Preprocessing Approaches for Topic Detection of User Utterances
 Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
 Towards a Multi-dimensional Taxonomy of Stories in Dialogue
 A Document Repository for Social Media and Speech Conversations
 Purely Corpus-based Automatic Conversation Authoring
 A Corpus of Gesture-Annotated Dialogues for Monologue-to-Dialogue Generation from Personal Narratives
 The dialogue breakdown detection challenge: Task description, datasets, and evaluation metrics
 PentoRef: A Corpus of Spoken References in Task-oriented Dialogues
 The DialogBank
 Coordinating Communication in the Wild: The Artwalk Dialogue Corpus of Pedestrian Navigation and Mobile Referential Communication
 Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech
 Managing Linguistic and Terminological Variation in a Medical Dialogue System
 Laughter in French Spontaneous Conversational Dialogs
 A Corpus of Word-Aligned Asked and Anticipated Questions in a Virtual Patient Dialogue System
 Dialogue System Characterisation by Back-channelling Patterns Extracted from Dialogue Corpus
 AppDialogue: Multi-App Dialogues for Intelligent Assistants
 A Verbal and Gestural Corpus of Story Retellings to an Expressive Embodied Virtual Character
 A Multimodal Motion-Captured Corpus of Matched and Mismatched Extravert-Introvert Conversational Pairs
 Towards Automatic Identification of Effective Clues for Team Word-Guessing Games
 A CUP of CoFee: A large Collection of feedback Utterances Provided with communicative function annotations
 Summarizing Behaviours: An Experiment on the Annotation of Call-Centre Conversations
 ArchiMob - A Corpus of Spoken Swiss German
 Survey of Conversational Behavior: Towards the Design of a Balanced Corpus of Everyday Japanese Conversation
 A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction
 Deep Learning of Audio and Language Features for Humor Prediction
 
 |  
  | Digital Libraries | A Computational Perspective on the Romanian Dialects
 Evaluating the Noisy Channel Model for the Normalization of Historical Texts: Basque, Spanish and Slovene
 Measuring Lexical Quality of a Historical Finnish Newspaper Collection ― Analysis of Garbled OCR Data with Basic Language Technology Tools and Means
 South African National Centre for Digital Language Resources
 Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers
 OCR Post-Correction Evaluation of Early Dutch Books Online - Revisited
 Data Management Plans and Data Centers
 Lin|gu|is|tik: Building the Linguist's Pathway to Bibliographies, Libraries, Language Resources and Linked Open Data
 Designing A Long Lasting Linguistic Project: The Case Study of ASIt
 
 |  
  | Discourse Annotation, Representation and Processing | Falling silent, lost for words ... Tracing personal involvement in interviews with Dutch war veterans
 Focus Annotation of Task-based Data: A Comparison of Expert and Crowd-Sourced Annotation in a Reading Comprehension Corpus
 The OpenCourseWare Metadiscourse (OCWMD) Corpus
 ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions
 Ubuntu-fr: A Large and Open Corpus for Multi-modal Analysis of Online Written Conversations
 DUEL: A Multi-lingual Multimodal Dialogue Corpus for Disfluency, Exclamations and Laughter
 Quantitative Analysis of Gazes and Grounding Acts in L1 and L2 Conversations
 A Multi-Layered Annotated Corpus of Scientific Papers
 A Bilingual Discourse Corpus and Its Applications
 Information structure in the Potsdam Commentary Corpus: Topics
 The SpeDial datasets: datasets for Spoken Dialogue Systems analytics
 Learning Tone and Attribution for Financial Text Mining
 Adding Semantic Relations to a Large-Coverage Connective Lexicon of German
 Corpus Resources for Dispute Mediation Discourse
 A Corpus of Argument Networks: Using Graph Properties to Analyse Divisive Issues
 PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation
 Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
 A Tagged Corpus for Automatic Labeling of Disabilities in Medical Scientific Papers
 PersonaBank: A Corpus of Personal Narratives and Their Story Intention Graphs
 Fine-Grained Chinese Discourse Relation Labelling
 A Corpus of Gesture-Annotated Dialogues for Monologue-to-Dialogue Generation from Personal Narratives
 Argument Mining: the Bottleneck of Knowledge and Language Resources
 Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
 metaTED: a Corpus of Metadiscourse for Spoken Language
 Enhancing The RATP-DECODA Corpus With Linguistic Annotations For Performing A Large Range Of NLP Tasks
 Parallel Discourse Annotations on a Corpus of Short Texts
 A Corpus of Clinical Practice Guidelines Annotated with the Importance of Recommendations
 The Methodius Corpus of Rhetorical Discourse Structures and Generated Texts
 The DialogBank
 From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse
 Applying Core Scientific Concepts to Context-Based Citation Recommendation
 SciCorp: A Corpus of English Scientific Articles Annotated for Information Status Analysis
 PARC 3.0: A Corpus of Attribution Relations
 Using lexical and Dependency Features to Disambiguate Discourse Connectives in Hindi
 A CUP of CoFee: A large Collection of feedback Utterances Provided with communicative function annotations
 Summarizing Behaviours: An Experiment on the Annotation of Call-Centre Conversations
 Automatic Construction of Discourse Corpora for Dialogue Translation
 Annotating Topic Development in Information Seeking Queries
 Coreference Annotation Scheme and Relation Types for Hindi
 Transfer of Corpus-Specific Dialogue Act Annotation to ISO Standard: Is it worth it?
 Searching in the Penn Discourse Treebank Using the PML-Tree Query
 Cohere: A Toolkit for Local Coherence
 Multi-label Annotation in Scientific Articles - The Multi-label Cancer Risk Assessment Corpus
 Improving the Annotation of Sentence Specificity
 
 |  
  | Document Classification, Text categorisation | Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource
 An Empirical Exploration of Moral Foundations Theory in Partisan News Sources
 DRANZIERA: An Evaluation Protocol For Multi-Domain Opinion Mining
 Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish
 Age and Gender Prediction on Health Forum Data
 Comparing Speech and Text Classification on ICNALE
 A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C)
 SatiricLR: a Language Resource of Satirical News Articles
 Compilation of an Arabic Children's Corpus
 Quality Assessment of the Reuters Vol. 2 Multilingual Corpus
 Learning Tone and Attribution for Financial Text Mining
 Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset
 A Comparative Study of Text Preprocessing Approaches for Topic Detection of User Utterances
 A Comparison of Domain-based Word Polarity Estimation using different Word Embeddings
 Towards a Multi-dimensional Taxonomy of Stories in Dialogue
 A Semi-Supervised Approach for Gender Identification
 Identifying Content Types of Messages Related to Open Source Software Projects
 Ensemble Classification of Grants using LDA-based Features
 Character-Level Neural Translation for Multilingual Media Monitoring in the SUMMA Project
 Emotion Corpus Construction Based on Selection from Hashtags
 A Corpus of Clinical Practice Guidelines Annotated with the Importance of Recommendations
 Towards a Corpus of Violence Acts in Arabic Social Media
 Edit Categories and Editor Role Identification in Wikipedia
 Exploring the Realization of Irony in Twitter Data
 Evaluation Set for Slovak News Information Retrieval
 Discriminating Similar Languages: Evaluations and Explorations
 Modeling Language Change in Historical Corpora: The Case of Portuguese
 Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages
 Specialising Paragraph Vectors for Text Polarity Detection
 A Corpus of Native, Non-native and Translated Texts
 He Said She Said ― a Male/Female Corpus of Polish
 Cohere: A Toolkit for Local Coherence
 Multi-label Annotation in Scientific Articles - The Multi-label Cancer Risk Assessment Corpus
 MoBiL: A Hybrid Feature Set for Automatic Human Translation Quality Assessment
 Detecting Expressions of Blame or Praise in Text
 Automatic Classification of Tweets for Analyzing Communication Behavior of Museums
 
 |    
  
  | E |  
  | Emotion Recognition/Generation | Falling silent, lost for words ... Tracing personal involvement in interviews with Dutch war veterans
 EmoTweet-28: A Fine-Grained Emotion Corpus for Sentiment Analysis
 Accuracy of Automatic Cross-Corpus Emotion Labeling for Conversational Speech Corpus Commonization
 Mirroring Facial Expressions and Emotions in Dyadic Conversations
 Detecting Implicit Expressions of Affect from Text using Semantic Knowledge on Common Concept Properties
 Comparison of Emotional Understanding in Modality-Controlled Environments using Multimodal Online Emotional Communication Corpus
 Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest
 A Comparison of Domain-based Word Polarity Estimation using different Word Embeddings
 Emotion Analysis on Twitter: The Hidden Challenge
 AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis
 Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition
 Could Speaker, Gender or Age Awareness be beneficial in Speech-based Emotion Recognition?
 Tweeting and Being Ironic in the Debate about a Political Reform: the French Annotated Corpus TWitter-MariagePourTous
 Affective Lexicon Creation for the Greek Language
 Datasets for Aspect-Based Sentiment Analysis in French
 Evaluating Context Selection Strategies to Build Emotive Vector Space Models
 Sentiment Analysis in Social Networks through Topic modeling
 A Multimodal Motion-Captured Corpus of Matched and Mismatched Extravert-Introvert Conversational Pairs
 Deep Learning of Audio and Language Features for Humor Prediction
 PotTS: The Potsdam Twitter Sentiment Corpus
 
 |  
  | Endangered Languages | Endangered Language Documentation: Bootstrapping a Chatino Speech Corpus, Forced Aligner, ASR
 A Finite-state Morphological Analyser for Tuvan
 Remote Elicitation of Inflectional Paradigms to Seed Morphological Analysis in Low-Resource Languages
 Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project
 The Alaskan Athabascan Grammar Database
 Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy
 Constraint-Based Bilingual Lexicon Induction for Closely Related Languages
 Selection Criteria for Low Resource Language Programs
 Data Formats and Management Strategies from the Perspective of Language Resource Producers ― Personal Diachronic and Social Synchronic Data Sharing ―
 A Morphological Lexicon of Esperanto with Morpheme Frequencies
 Training & Quality Assessment of an Optical Character Recognition Model for Northern Haida
 Fostering digital representation of EU regional and minority languages: the Digital Language Diversity Project
 Cysill Ar-lein: A Corpus of Written Contemporary Welsh Compiled from an On-line Spelling and Grammar Checker
 Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik
 Chatbot Technology with Synthetic Voices in the Acquisition of an Endangered Language: Motivation, Development and Evaluation of a Platform for Irish
 If You Even Don't Have a Bit of Bible: Learning Delexicalized POS Taggers
 Legacy language atlas data mining: mapping Kru languages
 A Rule-based Shallow-transfer Machine Translation System for Scots and English
 
 |  
  | Evaluation Methodologies | Orthographic and Morphological Correspondences between Related Slavic Languages as a Base for Modeling of Mutual Intelligibility
 Ecological Gestures for HRI: the GEE Corpus
 Complementarity, F-score, and NLP Evaluation
 DRANZIERA: An Evaluation Protocol For Multi-Domain Opinion Mining
 Manual and Automatic Paraphrases for MT Evaluation
 LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages
 Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation
 Revisiting Summarization Evaluation for Scientific Articles
 What's the Issue Here?: Task-based Evaluation of Reader Comment Summarization Systems
 RankDCG: Rank-Ordering Evaluation Measure
 Spanish Word Vectors from Wikipedia
 The Language Application Grid and Galaxy
 Multi-language Speech Collection for NIST LRE
 An Empirical Study of Arabic Formulaic Sequence Extraction Methods
 Homing in on Twitter Users: Evaluating an Enhanced Geoparser for User Profile Locations
 Evaluating a Topic Modelling Approach to Measuring Corpus Similarity
 Measuring Lexical Quality of a Historical Finnish Newspaper Collection ― Analysis of Garbled OCR Data with Basic Language Technology Tools and Means
 Use of Domain-Specific Language Resources in Machine Translation
 Exploitation of Co-reference in Distributional Semantics
 A Taxonomy of Specific Problem Classes in Text-to-Speech Synthesis: Comparing Commercial and Open Source Performance
 Compasses, Magnets, Water Microscopes: Annotation of Terminology in a Diachronic Corpus of Scientific Texts
 A Novel Evaluation Method for Morphological Segmentation
 Building a Corpus of Errors and Quality in Machine Translation: Experiments on Error Impact
 Novel elicitation and annotation schemes for sentential and sub-sentential alignments of bitexts
 PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation
 Linguistically Inspired Language Model Augmentation for MT
 UPPC - Urdu Paraphrase Plagiarism Corpus
 Evaluating the Readability of Text Simplification Output for Readers with Cognitive Disabilities
 Word Embedding Evaluation and Combination
 PE2rr Corpus: Manual Error Annotation of Automatically Pre-annotated MT Post-edits
 D(H)ante: A New Set of Tools for XIII Century Italian
 Benchmarking multimedia technologies with the CAMOMILE platform: the case of Multimodal Person Discovery at MediaEval 2015
 Polarity Lexicon Building: to what Extent Is the Manual Effort Worth?
 Using Contextual Information for Machine Translation Evaluation
 Evaluating the Impact of Light Post-Editing on Usability
 Standard Test Collection for English-Persian Cross-Lingual Word Sense Disambiguation
 Evaluating Machine Translation in a Usage Scenario
 Cross-validating Image Description Datasets and Evaluation Metrics
 OCR Post-Correction Evaluation of Early Dutch Books Online - Revisited
 WAGS: A Beautiful English-Italian Benchmark Supporting Word Alignment Evaluation on Rare Words
 Guidelines and Framework for a Large Scale Arabic Diacritized Corpus
 Comparing the Level of Code-Switching in Corpora
 Evaluation Set for Slovak News Information Retrieval
 The ACL RD-TEC 2.0: A Language Resource for Evaluating Term Extraction and Entity Recognition Methods
 Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation
 Tools and Guidelines for Principled Machine Translation Development
 Generating Task-Pertinent sorted Error Lists for Speech Recognition
 Towards Automatic Identification of Effective Clues for Team Word-Guessing Games
 OSMAN ― A Novel Arabic Readability Metric
 EVALution-MAN: A Chinese Dataset for the Training and Evaluation of DSMs
 Analysing Constraint Grammars with a SAT-solver
 The Trials and Tribulations of Predicting Post-Editing Productivity
 Analyzing Pre-processing Settings for Urdu Single-document Extractive Summarization
 A Regional News Corpora for Contextualized Entity Discovery and Linking
 Evaluating Interactive System Adaptation
 Applying the Cognitive Machine Translation Evaluation Approach to Arabic
 A Reading Comprehension Corpus for Machine Translation Evaluation
 B2SG: a TOEFL-like Task for Portuguese
 Translation Errors and Incomprehensibility: a Case Study using Machine-Translated Second Language Proficiency Tests
 Distributional Thesauri for Information Retrieval and vice versa
 MoBiL: A Hybrid Feature Set for Automatic Human Translation Quality Assessment
 
 |          
  
  | L |  
  | Language Identification | Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource
 Multi-language Speech Collection for NIST LRE
 An Arabic-Moroccan Darija Code-Switched Corpus
 Integration of Lexical and Semantic Knowledge for Sentiment Analysis in SMS
 Assessing the Potential of Metaphoricity of verbs using corpus data
 Discriminating Similar Languages: Evaluations and Explorations
 
 |  
  | Language Modelling | MARMOT: A Toolkit for Translation Quality Estimation at the Word Level
 Deriving Morphological Analyzers from Example Inflections
 Discriminative Analysis of Linguistic Features for Typological Study
 Morphological Analysis of Sahidic Coptic for Automatic Glossing
 Factuality Annotation and Learning in Spanish Texts
 Creating Linked Data Morphological Language Resources with MMoOn - The Hebrew Morpheme Inventory
 Using SMT for OCR Error Correction of Historical Texts
 Domain-Specific Corpus Expansion with Focused Webcrawling
 Linguistically Inspired Language Model Augmentation for MT
 Leveraging Native Data to Correct Preposition Errors in Learners' Dutch
 GRaSP: A Multilayered Annotation Scheme for Perspectives
 SCALE: A Scalable Language Engineering Toolkit
 Towards a Linguistic Ontology with an Emphasis on Reasoning and Knowledge Reuse
 Extracting Weighted Language Lexicons from Wikipedia
 Filtering Wiktionary Triangles by Linear Mapping between Distributed Word Models
 Discriminating Similar Languages: Evaluations and Explorations
 
 |  
  | Lexicon, Lexical Database | Semantic Links for Portuguese
 A Gold Standard for Scalar Adjectives
 A Finite-state Morphological Analyser for Tuvan
 The Gavagai Living Lexicon
 VerbCROcean: A Repository of Fine-Grained Semantic Verb Relations for Croatian
 Rule-based Automatic Multi-word Term Extraction and Lemmatization
 A New Integrated Open-source Morphological Analyzer for Hungarian
 Transfer-Based Learning-to-Rank Assessment of Medical Term Technicality
 Enriching a Portuguese WordNet using Synonyms from a Monolingual Dictionary
 Very-large Scale Parsing and Normalization of Wiktionary Morphological Paradigms
 Tēzaurs.lv: the Largest Open Lexical Database for Latvian
 NileULex: A Phrase and Word Level Sentiment Lexicon for Egyptian and Modern Standard Arabic
 VoxML: A Visualization Modeling Language
 Example-based Acquisition of Fine-grained Collocation Resources
 A Finite-State Morphological Analyser for Sindhi
 A Computational Perspective on the Romanian Dialects
 The on-line version of Grammatical Dictionary of Polish
 A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code
 Synset Ranking of Hindi WordNet
 Evaluating Lexical Similarity to build Sentiment Similarity
 Constraint-Based Bilingual Lexicon Induction for Closely Related Languages
 An Empirical Study of Arabic Formulaic Sequence Extraction Methods
 Japanese Word―Color Associations with and without Contexts
 A Language Resource of German Errors Written by Children with Dyslexia
 Discovering Fuzzy Synsets from the Redundancy in Different Lexical-Semantic Resources
 Aspectual Flexibility Increases with Agentivity and Concreteness: A Computational Classification Experiment on Polysemous Verbs
 LVF-lemon ― Towards a Linked Data Representation of "Les Verbes français"
 A Framework for Cross-lingual/Node-wise Alignment of Lexical-Semantic Resources
 Crowdsourcing Ontology Lexicons
 Curation of Dutch Regional Dictionaries
 A sense-based lexicon of count and mass expressions: The Bochum English Countability Lexicon
 A lexicon of perception for the identification of synaesthetic metaphors in corpora
 Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases
 Wikification for Scriptio Continua
 Two Decades of Terminology: European Framework Programmes Titles
 Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages
 A Morphological Lexicon of Esperanto with Morpheme Frequencies
 How does Dictionary Size Influence Performance of Vietnamese Word Segmentation?
 Adding Semantic Relations to a Large-Coverage Connective Lexicon of German
 SVALex: a CEFR-graded Lexical Resource for Swedish Foreign and Second Language Learners
 Giving Lexical Resources a Second Life: Démonette, a Multi-sourced Morpho-semantic Network for French
 Lexical Resources to Enrich English Malayalam Machine Translation
 Creating a General Russian Sentiment Lexicon
 TTS for Low Resource Languages: A Bangla Synthesizer
 GhoSt-NN: A Representative Gold Standard of German Noun-Noun Compounds
 A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability
 Building Concept Graphs from Monolingual Dictionary Entries
 Detecting Optional Arguments of Verbs
 New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian
 Classifying Out-of-vocabulary Terms in a Domain-Specific Social Media Corpus
 DeQue: A Lexicon of Complex Prepositions and Conjunctions in French
 A Japanese Chess Commentary Corpus
 Paraphrasing Out-of-Vocabulary Words with Word Embeddings and Semantic Lexicons for Low Resource Statistical Machine Translation
 Encoding Adjective Scales for Fine-grained Resources
 How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News
 Automatic Enrichment of WordNet with Common-Sense Knowledge
 Semantic Layer of the Valence Dictionary of Polish Walenty
 Ambiguity Diagnosis for Terms in Digital Humanities
 A General Framework for the Annotation of Causality Based on FrameNet
 LexFr: Adapting the LexIt Framework to Build a Corpus-based French Subcategorization Lexicon
 QUEMDISSE? Reported speech in Portuguese
 Extending Monolingual Semantic Textual Similarity Task to Multiple Cross-lingual Settings
 Simultaneous Sentence Boundary Detection and Alignment with Pivot-based Machine Translation Generated Lexicons
 The Hebrew FrameNet Project
 Addressing the MFS Bias in WSD systems
 A Lexical Resource of Hebrew Verb-Noun Multi-Word Expressions
 Italian VerbNet: A Construction-based Approach to Italian Verb Classification
 TGermaCorp -- A (Digital) Humanities Resource for (Computational) Linguistics
 LELIO: An Auto-Adaptative System to Acquire Domain Lexical Knowledge in Technical Texts
 Polarity Lexicon Building: to what Extent Is the Manual Effort Worth?
 Challenges of Adjective Mapping between plWordNet and Princeton WordNet
 Graded and Word-Sense-Disambiguation Decisions in Corpus Pattern Analysis: a Pilot Study
 Accessing and Elaborating Walenty - a Valence Dictionary of Polish - via Internet Browser
 CEPLEXicon ― A Lexicon of Child European Portuguese
 Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF
 Evaluating Lexical Simplification and Vocabulary Knowledge for Learners of French: Possibilities of Using the FLELex Resource
 Automatically Generated Affective Norms of Abstractness, Arousal, Imageability and Valence for 350 000 German Lemmas
 Affective Lexicon Creation for the Greek Language
 A Large Rated Lexicon with French Medical Words
 Mapping Ontologies Using Ontologies: Cross-lingual Semantic Role Information Transfer
 Multi-prototype Chinese Character Embedding
 Leveraging RDF Graphs for Crossing Multiple Bilingual Dictionaries
 Extracting Weighted Language Lexicons from Wikipedia
 Best of Both Worlds: Making Word Sense Embeddings Interpretable
 Evaluating Context Selection Strategies to Build Emotive Vector Space Models
 Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects
 Port4NooJ v3.0: Integrated Linguistic Resources for Portuguese NLP
 Managing Linguistic and Terminological Variation in a Medical Dialogue System
 Assessing the Potential of Metaphoricity of verbs using corpus data
 Filtering Wiktionary Triangles by Linear Mapping between Distributed Word Models
 A comparison of Named-Entity Disambiguation and Word Sense Disambiguation
 BosphorusSign: A Turkish Sign Language Recognition Corpus in Health and Finance Domains
 Gulf Arabic Linguistic Resource Building for Sentiment Analysis
 A Lexical Resource for the Identification of Weak Words in German Specification Documents
 PARSEME Survey on MWE Resources
 Generating a Large-Scale Entity Linking Dictionary from Wikipedia Link Structure and Article Text
 Refurbishing a Morphological Database for German
 ANEW+: Automatic Expansion and Validation of Affective Norms of Words Lexicons in Multiple Languages
 Recent Advances in Development of a Lexicon-Grammar of Polish: PolNet 3.0
 Creating a Lexicon of Bavarian Dialect by Means of Facebook Language Data and Crowdsourcing
 A Rule-based Shallow-transfer Machine Translation System for Scots and English
 Effect Functors for Opinion Inference
 PreMOn: a Lemon Extension for Exposing Predicate Models as Linked Data
 Multiword Expressions in Child Language
 A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora
 Database of Mandarin Neighborhood Statistics
 Wow! What a Useful Extension! Introducing Non-Referential Concepts to Wordnet
 Graph-Based Induction of Word Senses in Croatian
 SlangNet: A WordNet like resource for English Slang
 B2SG: a TOEFL-like Task for Portuguese
 A Multi-domain Corpus of Swedish Word Sense Annotation
 Wiktionnaire's Wikicode GLAWIfied: a Workable French Machine-Readable Dictionary
 Distributional Thesauri for Information Retrieval and vice versa
 ALT Explored: Integrating an Online Dialectometric Tool and an Online Dialect Atlas
 VerbLexPor: a lexical resource with semantic roles for Portuguese
 A Multilingual Predicate Matrix
 Latin Vallex. A Treebank-based Semantic Valency Lexicon for Latin
 Sentiframes: A Resource for Verb-centered German Sentiment Inference
 Named Entity Resources - Overview and Outlook
 Merging Data Resources for Inflectional and Derivational Morphology in Czech
 
 |  
  | Linked Data | Semantic Links for Portuguese
 Publishing the Trove Newspaper Corpus
 Cross-lingual RDF Thesauri Interlinking
 Concepticon: A Resource for the Linking of Concept Lists
 LVF-lemon ― Towards a Linked Data Representation of "Les Verbes français"
 A Corpus of Images and Text in Online News
 WikiCoref: An English Coreference-annotated Corpus of Wikipedia Articles
 WTF-LOD - A New Resource for Large-Scale NER Evaluation
 Riddle Generation using Word Associations
 Challenges of Adjective Mapping between plWordNet and Princeton WordNet
 Relation- and Phrase-level Linking of FrameNet with Sar-graphs
 Crosswalking from CMDI to Dublin Core and MARC 21
 Mapping Ontologies Using Ontologies: Cross-lingual Semantic Role Information Transfer
 Leveraging RDF Graphs for Crossing Multiple Bilingual Dictionaries
 Generating a Large-Scale Entity Linking Dictionary from Wikipedia Link Structure and Article Text
 Lin|gu|is|tik: Building the Linguist's Pathway to Bibliographies, Libraries, Language Resources and Linked Open Data
 The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud
 PreMOn: a Lemon Extension for Exposing Predicate Models as Linked Data
 Open Data Vocabularies for Assigning Usage Rights to Data Resources from Translation Projects
 
 |  
  | LR Infrastructures and Architectures | Two Architectures for Parallel Processing of Huge Amounts of Text
 Trends in HLT Research: A Survey of LDC's Data Scholarship Program
 How to Address Smart Homes with a Social Robot? A Multi-modal Corpus of User Interactions with an Intelligent Environment
 Internet Argument Corpus 2.0: An SQL schema for Dialogic Social Media and the Corpora to go with it
 Publishing the Trove Newspaper Corpus
 Corpus Query Lingua Franca (CQLF)
 Providing a Catalogue of Language Resources for Commercial Users
 Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research
 Creating a Large Multi-Layered Representational Repository of Linguistic Code Switched Arabic Data
 Collecting Language Resources for the Latvian e-Government Machine Translation Platform
 The Language Application Grid and Galaxy
 Learning from Within? Comparing PoS Tagging Approaches for Historical Text
 ELRA Activities and Services
 New Developments in the LRE Map
 Data Formats and Management Strategies from the Perspective of Language Resource Producers ― Personal Diachronic and Social Synchronic Data Sharing ―
 Korean TimeML and Korean TimeBank
 The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and Distributing Language Resources
 Analysis of English Spelling Errors in a Word-Typing Game
 A Large-scale Recipe and Meal Data Collection as Infrastructure for Food Research
 EstNLTK - NLP Toolkit for Estonian
 South African National Centre for Digital Language Resources
 A Document Repository for Social Media and Speech Conversations
 C4Corpus: Multilingual Web-size Corpus with Free License
 Using a Language Technology Infrastructure for German in order to Anonymize German Sign Language Corpus Data
 Design and Development of the MERLIN Learner Corpus Platform
 The Hebrew FrameNet Project
 FLAT: Constructing a CLARIN Compatible Home for Language Resources
 The BAS Speech Data Repository
 CLARIAH in the Netherlands
 Crosswalking from CMDI to Dublin Core and MARC 21
 LREC as a Graph: People and Resources in a Network
 Hypergraph Modelization of a Syntactically Annotated English Wikipedia Dump
 MADAD: A Readability Annotation Tool for Arabic Text
 Data Management Plans and Data Centers
 Fostering the Next Generation of European Language Technology: Recent Developments ― Emerging Initiatives ― Challenges and Opportunities
 UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central ― State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines
 Facilitating Metadata Interoperability in CLARIN-DK
 The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud
 Towards a Language Service Infrastructure for Mobile Environments
 Global Open Resources and Information for Language and Linguistic Analysis (GORILLA)
 GATE-Time: Extraction of Temporal Expressions and Events
 corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora
 Open Data Vocabularies for Assigning Usage Rights to Data Resources from Translation Projects
 NorGramBank: A Deep Treebank for Norwegian
 CLARIN-EL Web-based Annotation Tool
 
 |  
  | LR National/International Projects, Infrastructural/Policy issues | NLP Infrastructure for the Lithuanian Language
 CodE Alltag: A German-Language E-Mail Corpus
 LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages
 Providing a Catalogue of Language Resources for Commercial Users
 Hidden Resources ― Strategies to Acquire and Exploit Potential Spoken Language Resources in National Archives
 ELRA Activities and Services
 Language Resource Citation: the ISLRN Dissemination and Further Developments
 The ELRA License Wizard
 Review on the Existing Language Resources for Languages of France
 Selection Criteria for Low Resource Language Programs
 New Developments in the LRE Map
 Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities
 The IPR-cleared Corpus of Contemporary Written and Spoken Romanian Language
 SYN2015: Representative Corpus of Contemporary Written Czech
 Character-Level Neural Translation for Multilingual Media Monitoring in the SUMMA Project
 South African Language Resources: Phrase Chunking
 A Lexical Resource of Hebrew Verb-Noun Multi-Word Expressions
 Nederlab: Towards a Single Portal and Research Environment for Diachronic Dutch Text Corpora
 Fostering digital representation of EU regional and minority languages: the Digital Language Diversity Project
 CLARIAH in the Netherlands
 LREC as a Graph: People and Resources in a Network
 Port4NooJ v3.0: Integrated Linguistic Resources for Portuguese NLP
 Persian Proposition Bank
 Data Management Plans and Data Centers
 Fostering the Next Generation of European Language Technology: Recent Developments ― Emerging Initiatives ― Challenges and Opportunities
 Evaluating Interactive System Adaptation
 The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud
 The Public License Selector: Making Open Licensing Easier
 Graphical Annotation for Syntax-Semantics Mapping
 Government Domain Named Entity Recognition for South African Languages
 
 |    
  
  | M |  
  | Machine Translation, SpeechToSpeech Translation | Word Sense-Aware Machine Translation: Including Senses as Contextual Features for Improved Translation Models
 Manual and Automatic Paraphrases for MT Evaluation
 Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation
 Privacy Issues in Online Machine Translation Services - European Perspective
 Phrase Level Segmentation and Labelling of Machine Translation Errors
 The United Nations Parallel Corpus v1.0
 Building the Macedonian-Croatian Parallel Corpus
 Collecting Language Resources for the Latvian e-Government Machine Translation Platform
 SubCo: A Learner Translation Corpus of Human and Machine Subtitles
 Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities
 Syntax-based Multi-system Machine Translation
 Use of Domain-Specific Language Resources in Machine Translation
 A Bilingual Discourse Corpus and Its Applications
 Using a Cross-Language Information Retrieval System based on OHSUMED to Evaluate the Moses and KantanMT Statistical Machine Translation Systems
 CATaLog Online: Porting a Post-editing Tool to the Web
 The ILMT-s2s Corpus ― A Multimodal Interlingual Map Task Corpus
 Name Translation based on Fine-grained Named Entity Recognition in a Single Language
 Uzbek-English and Turkish-English Morpheme Alignment Corpora
 Large Multi-lingual, Multi-level and Multi-genre Annotation Corpus
 Using SMT for OCR Error Correction of Historical Texts
 Lexical Resources to Enrich English Malayalam Machine Translation
 Building a Corpus of Errors and Quality in Machine Translation: Experiments on Error Impact
 Novel elicitation and annotation schemes for sentential and sub-sentential alignments of bitexts
 PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation
 Linguistically Inspired Language Model Augmentation for MT
 Paraphrasing Out-of-Vocabulary Words with Word Embeddings and Semantic Lexicons for Low Resource Statistical Machine Translation
 Parallel Sentence Extraction from Comparable Corpora with Neural Network Features
 Enhancing Access to Online Education: Quality Machine Translation of MOOC Content
 Exploiting a Large Strongly Comparable Corpus
 Character-Level Neural Translation for Multilingual Media Monitoring in the SUMMA Project
 PE2rr Corpus: Manual Error Annotation of Automatically Pre-annotated MT Post-edits
 Simultaneous Sentence Boundary Detection and Alignment with Pivot-based Machine Translation Generated Lexicons
 English-to-Japanese Translation vs. Dictation vs. Post-editing: Comparing Translation Modes in a Multilingual Setting
 Introducing the Asian Language Treebank (ALT)
 TweetMT: A Parallel Microblog Corpus
 Evaluating Translation Quality and CLIR Performance of Query Sessions
 Using Contextual Information for Machine Translation Evaluation
 That'll Do Fine!: A Coarse Lexical Resource for English-Hindi MT, Using Polylingual Topic Models
 Evaluating the Impact of Light Post-Editing on Usability
 Bootstrapping a Hybrid MT System to a New Language Pair
 Evaluating Machine Translation in a Usage Scenario
 Using BabelNet to Improve OOV Coverage in SMT
 WAGS: A Beautiful English-Italian Benchmark Supporting Word Alignment Evaluation on Rare Words
 Finding Alternative Translations in a Large Corpus of Movie Subtitle
 ASPEC: Asian Scientific Paper Excerpt Corpus
 Discontinuous Verb Phrases in Parsing and Machine Translation of English and German
 Domain Adaptation in MT Using Titles in Wikipedia as a Parallel Corpus: Resources and Evaluation
 Evaluation of the KIT Lecture Translation System
 Filtering Wiktionary Triangles by Linear Mapping between Distributed Word Models
 Tools and Guidelines for Principled Machine Translation Development
 ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool
 The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine
 The Trials and Tribulations of Predicting Post-Editing Productivity
 A Rule-based Shallow-transfer Machine Translation System for Scots and English
 Applying the Cognitive Machine Translation Evaluation Approach to Arabic
 A Reading Comprehension Corpus for Machine Translation Evaluation
 Producing Monolingual and Parallel Web Corpora at the Same Time - SpiderLing and Bitextor's Love Affair
 IRIS: English-Irish Machine Translation System
 Translation Errors and Incomprehensibility: a Case Study using Machine-Translated Second Language Proficiency Tests
 Building A Case-based Semantic English-Chinese Parallel Treebank
 OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles
 Towards producing bilingual lexica from monolingual corpora
 First Steps Towards Coverage-Based Sentence Alignment
 
 |  
  | Metadata | The United Nations Parallel Corpus v1.0
 Review on the Existing Language Resources for Languages of France
 New Developments in the LRE Map
 A Language Resource of German Errors Written by Children with Dyslexia
 The IPR-cleared Corpus of Contemporary Written and Spoken Romanian Language
 Compilation of an Arabic Childrens Corpus
 The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and Distributing Language Resources
 FLAT: Constructing a CLARIN Compatible Home for Language Resources
 CLARIAH in the Netherlands
 Crosswalking from CMDI to Dublin Core and MARC 21
 Automatically Generated Affective Norms of Abstractness, Arousal, Imageability and Valence for 350 000 German Lemmas
 LREC as a Graph: People and Resources in a Network
 A Lexical Resource for the Identification of Weak Words in German Specification Documents
 PARSEME Survey on MWE Resources
 Facilitating Metadata Interoperability in CLARIN-DK
 The Royal Society Corpus: From Uncharted Data to Corpus
 Open Data Vocabularies for Assigning Usage Rights to Data Resources from Translation Projects
 
 |  
  | Morphology | A Finite-state Morphological Analyser for Tuvan
 Orthographic and Morphological Correspondences between Related Slavic Languages as a Base for Modeling of Mutual Intelligibility
 Remote Elicitation of Inflectional Paradigms to Seed Morphological Analysis in Low-Resource Languages
 A New Integrated Open-source Morphological Analyzer for Hungarian
 A Proposal for a Part-of-Speech Tagset for the Albanian Language
 Very-large Scale Parsing and Normalization of Wiktionary Morphological Paradigms
 Tēzaurs.lv: the Largest Open Lexical Database for Latvian
 A Finite-State Morphological Analyser for Sindhi
 Deriving Morphological Analyzers from Example Inflections
 Morphological Analysis of Sahidic Coptic for Automatic Glossing
 The on-line version of Grammatical Dictionary of Polish
 Creating Linked Data Morphological Language Resources with MMoOn - The Hebrew Morpheme Inventory
 Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy
 Evaluating the Noisy Channel Model for the Normalization of Historical Texts: Basque, Spanish and Slovene
 Farasa: A New Fast and Accurate Arabic Word Segmenter
 A Novel Evaluation Method for Morphological Segmentation
 A Morphological Lexicon of Esperanto with Morpheme Frequencies
 How does Dictionary Size Influence Performance of Vietnamese Word Segmentation?
 Giving Lexical Resources a Second Life: Démonette, a Multi-sourced Morpho-semantic Network for French
 Universal Dependencies v1: A Multilingual Treebank Collection
 Syntactic Analysis of Phrasal Compounds in Corpora: a Challenge for NLP Tools
 Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF
 Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis
 Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art
 Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic
 DALILA: The Dialectal Arabic Linguistic Learning Assistant
 Refurbishing a Morphological Database for German
 A Large Scale Corpus of Gulf Arabic
 A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora
 Exploiting Arabic Diacritization for High Quality Automatic Annotation
 Rapid Development of Morphological Analyzers for Typologically Diverse Languages
 A Neural Lemmatizer for Bengali
 Merging Data Resources for Inflectional and Derivational Morphology in Czech
 
 |  
  | Multilinguality | Orthographic and Morphological Correspondences between Related Slavic Languages as a Base for Modeling of Mutual Intelligibility
 Transfer-Based Learning-to-Rank Assessment of Medical Term Technicality
 Axolotl: a Web Accessible Parallel Corpus for Spanish-Nahuatl
 Very-large Scale Parsing and Normalization of Wiktionary Morphological Paradigms
 A Computational Perspective on the Romanian Dialects
 A Turkish-German Code-Switching Corpus
 Introducing the LCC Metaphor Datasets
 Comparing Speech and Text Classification on ICNALE
 Modelling a Parallel Corpus of French and French Belgian Sign Language
 The United Nations Parallel Corpus v1.0
 Building the Macedonian-Croatian Parallel Corpus
 Two Years of Aranea: Increasing Counts and Tuning the Pipeline
 Universal Dependencies for Japanese
 Cross-lingual RDF Thesauri Interlinking
 Quantitative Analysis of Gazes and Grounding Acts in L1 and L2 Conversations
 SemRelData ― Multilingual Contextual Annotation of Semantic Relations between Nominals: Dataset and Guidelines
 Speech Synthesis of Code-Mixed Text
 Crowdsourcing Ontology Lexicons
 CATaLog Online: Porting a Post-editing Tool to the Web
 Sentiment Lexicons for Arabic Social Media
 The IFCASL Corpus of French and German Non-native and Native Read Speech
 Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages
 Uzbek-English and Turkish-English Morpheme Alignment Corpora
 Large Multi-lingual, Multi-level and Multi-genre Annotation Corpus
 PROMETHEUS: A Corpus of Proverbs Annotated with Metaphors
 A Multilingual, Multi-style and Multi-granularity Dataset for Cross-language Textual Similarity Detection
 WIKIPARQ: A Tabulated Wikipedia Resource Using the Parquet Format
 South African National Centre for Digital Language Resources
 C4Corpus: Multilingual Web-size Corpus with Free License
 Cognitively Motivated Distributional Representations of Meaning
 Extending Monolingual Semantic Textual Similarity Task to Multiple Cross-lingual Settings
 Cross-lingual Linking of Multi-word Entities and their corresponding Acronyms
 EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis
 English-to-Japanese Translation vs. Dictation vs. Post-editing: Comparing Translation Modes in a Multilingual Setting
 The COPLE2 corpus: a learner corpus for Portuguese
 Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof
 Challenges of Adjective Mapping between plWordNet and Princeton WordNet
 Poly-GrETEL: Cross-Lingual Example-based Querying of Syntactic Constructions
 MEANTIME, the NewsReader Multilingual Event and Time Corpus
 Evaluating Translation Quality and CLIR Performance of Query Sessions
 Standard Test Collection for English-Persian Cross-Lingual Word Sense Disambiguation
 European Union Language Resources in Sketch Engine
 FREME: Multilingual Semantic Enrichment with Linked Data and Language Technologies
 Evaluating Machine Translation in a Usage Scenario
 Finding Alternative Translations in a Large Corpus of Movie Subtitle
 ASPEC: Asian Scientific Paper Excerpt Corpus
 Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis
 Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models
 A Large-Scale Multilingual Disambiguation of Glosses
 MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP
 Comparing the Level of Code-Switching in Corpora
 Creation of comparable corpora for English-{Urdu, Arabic, Persian}
 Fostering the Next Generation of European Language Technology: Recent Developments ― Emerging Initiatives ― Challenges and Opportunities
 Parallel Global Voices: a Collection of Multilingual Corpora with Citizen Media Stories
 The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine
 Combining Ontologies and Neural Networks for Analyzing Historical Language Varieties. A Case Study in Middle Low German
 Applying the Cognitive Machine Translation Evaluation Approach to Arabic
 Producing Monolingual and Parallel Web Corpora at the Same Time - SpiderLing and Bitextor's Love Affair
 UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing
 Coreference in Prague Czech-English Dependency Treebank
 IRIS: English-Irish Machine Translation System
 Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments
 OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles
 A Multilingual Predicate Matrix
 Towards producing bilingual lexica from monolingual corpora
 
 |  
  | Multimedia Document Processing | SubCo: A Learner Translation Corpus of Human and Machine Subtitles
 A Corpus of Images and Text in Online News
 Speech Trax: A Bottom to the Top Approach for Speaker Tracking and Indexing in an Archiving Context
 A Japanese Chess Commentary Corpus
 Impact of Automatic Segmentation on the Quality, Productivity and Self-reported Post-editing Effort of Intralingual Subtitles
 1 Million Captioned Dutch Newspaper Images
 The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents
 Developing a Dataset for Evaluating Approaches for Document Expansion with Images
 ArchiMob - A Corpus of Spoken Swiss German
 
 |  
  | MultiWord Expressions & Collocations | Rule-based Automatic Multi-word Term Extraction and Lemmatization
 Example-based Acquisition of Fine-grained Collocation Resources
 MWEs in Treebanks: From Survey to Guidelines
 Multiword Expressions Dataset for Indian Languages
 An Empirical Study of Arabic Formulaic Sequence Extraction Methods
 A lexicon of perception for the identification of synaesthetic metaphors in corpora
 Compasses, Magnets, Water Microscopes: Annotation of Terminology in a Diachronic Corpus of Scientific Texts
 Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases
 mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing
 TermoPL - a Flexible Tool for Terminology Extraction
 GhoSt-NN: A Representative Gold Standard of German Noun-Noun Compounds
 DeQue: A Lexicon of Complex Prepositions and Conjunctions in French
 Construction of an English Dependency Corpus incorporating Compound Function Words
 Cross-lingual Linking of Multi-word Entities and their corresponding Acronyms
 Distribution of Valency Complements in Czech Complex Predicates: Between Verb and Noun
 A Lexical Resource of Hebrew Verb-Noun Multi-Word Expressions
 Forecasting Emerging Trends from Scientific Literature
 Comprehensive and Consistent PropBank Light Verb Annotation
 Inconsistency Detection in Semantic Annotation
 Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects
 PARSEME Survey on MWE Resources
 Recent Advances in Development of a Lexicon-Grammar of Polish: PolNet 3.0
 Multiword Expressions in Child Language
 
 |      
  
  | O |  
  | Ontologies | Ecological Gestures for HRI: the GEE Corpus
 Semi-automatic Parsing for Web Knowledge Extraction through Semantic Annotation
 Metonymy Analysis Using Associative Relations between Words
 Creating Linked Data Morphological Language Resources with MMoOn - The Hebrew Morpheme Inventory
 A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code
 Annotating Logical Forms for EHR Questions
 Domain Ontology Learning Enhanced by Optimized Relation Instance in DBpedia
 A Framework for Cross-lingual/Node-wise Alignment of Lexical-Semantic Resources
 Issues and Challenges in Annotating Urdu Action Verbs on the IMAGACT4ALL Platform
 Towards a Linguistic Ontology with an Emphasis on Reasoning and Knowledge Reuse
 Constructing a Norwegian Academic Wordlist
 Mapping Ontologies Using Ontologies: Cross-lingual Semantic Role Information Transfer
 Extracting Structured Scholarly Information from the Machine Translation Literature
 Managing Linguistic and Terminological Variation in a Medical Dialogue System
 The Event and Implied Situation Ontology (ESO): Application and Evaluation
 Semantic Relation Extraction with Semantic Patterns Experiment on Radiology Reports
 Combining Ontologies and Neural Networks for Analyzing Historical Language Varieties. A Case Study in Middle Low German
 PreMOn: a Lemon Extension for Exposing Predicate Models as Linked Data
 Wow! What a Useful Extension! Introducing Non-Referential Concepts to Wordnet
 Automatic Biomedical Term Polysemy Detection
 
 |  
  | Opinion Mining / Sentiment Analysis | Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola
 NileULex: A Phrase and Word Level Sentiment Lexicon for Egyptian and Modern Standard Arabic
 DRANZIERA: An Evaluation Protocol For Multi-Domain Opinion Mining
 OPFI: A Tool for Opinion Finding in Polish
 SatiricLR: a Language Resource of Satirical News Articles
 Evaluating Lexical Similarity to build Sentiment Similarity
 Using Data Mining Techniques for Sentiment Shifter Identification
 Challenges of Evaluating Sentiment Analysis Tools on Social Media
 EmoTweet-28: A Fine-Grained Emotion Corpus for Sentiment Analysis
 A Dataset for Detecting Stance in Tweets
 Sentiment Lexicons for Arabic Social Media
 Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases
 Detecting Implicit Expressions of Affect from Text using Semantic Knowledge on Common Concept Properties
 Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset
 Creating a General Russian Sentiment Lexicon
 A Comparison of Domain-based Word Polarity Estimation using different Word Embeddings
 Encoding Adjective Scales for Fine-grained Resources
 Emotion Analysis on Twitter: The Hidden Challenge
 EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis
 A Language Independent Method for Generating Large Scale Polarity Lexicons
 ANTUSD: A Large Chinese Sentiment Dictionary
 Polarity Lexicon Building: to what Extent Is the Manual Effort Worth?
 GRaSP: A Multilayered Annotation Scheme for Perspectives
 Emotion Corpus Construction Based on Selection from Hashtags
 SCARE ― The Sentiment Corpus of App Reviews with Fine-grained Annotations in German
 Exploring the Realization of Irony in Twitter Data
 Integration of Lexical and Semantic Knowledge for Sentiment Analysis in SMS
 Rude waiter but mouthwatering pastries! An exploratory study into Dutch Aspect-Based Sentiment Analysis
 Sentiment Analysis in Social Networks through Topic modeling
 Aspect based Sentiment Analysis in Hindi: Resource Creation and Evaluation
 Gulf Arabic Linguistic Resource Building for Sentiment Analysis
 PARC 3.0: A Corpus of Attribution Relations
 ANEW+: Automatic Expansion and Validation of Affective Norms of Words Lexicons in Multiple Languages
 A Hungarian Sentiment Corpus Manually Annotated at Aspect Level
 Effect Functors for Opinion Inference
 Specialising Paragraph Vectors for Text Polarity Detection
 Sentiframes: A Resource for Verb-centered German Sentiment Inference
 
 |  
  | Optical Character Recognition | An Open Corpus for Named Entity Recognition in Historic Newspapers
 Measuring Lexical Quality of a Historical Finnish Newspaper Collection ― Analysis of Garbled OCR Data with Basic Language Technology Tools and Means
 Using SMT for OCR Error Correction of Historical Texts
 Training & Quality Assessment of an Optical Character Recognition Model for Northern Haida
 OCR Post-Correction Evaluation of Early Dutch Books Online - Revisited
 Crowdsourcing an OCR Gold Standard for a German and French Heritage Corpus
 
 |  
  | Other | Two Architectures for Parallel Processing of Huge Amounts of Text
 Trends in HLT Research: A Survey of LDC's Data Scholarship Program
 Who was Pietro Badoglio? Towards a QA system for Italian History
 Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish
 Metonymy Analysis Using Associative Relations between Words
 A Finite-State Morphological Analyser for Sindhi
 Discriminative Analysis of Linguistic Features for Typological Study
 Privacy Issues in Online Machine Translation Services - European Perspective
 The ACQDIV Database: Min(d)ing the Ambient Language
 Building Tempo-HindiWordNet: A resource for effective temporal information access in Hindi
 Review on the Existing Language Resources for Languages of France
 Corpus for Childrens Writing with Enhanced Output for Specific Spelling Patterns (2nd and 3rd Grade)
 Unsupervised Ranked Cross-Lingual Lexical Substitution for Low-Resource Languages
 Wikipedia Titles As Noun Tag Predictors
 SYN2015: Representative Corpus of Contemporary Written Czech
 Automatic Anomaly Detection for Dysarthria across Two Speech Styles: Read vs Spontaneous Speech
 User, who art thou? User Profiling for Oral Corpus Platforms
 Curation of Dutch Regional Dictionaries
 Semi-automatically Alignment of Predicates between Speech and OntoNotes data
 Wikification for Scriptio Continua
 Adding Semantic Relations to a Large-Coverage Connective Lexicon of German
 Crossmodal Network-Based Distributional Semantic Models
 Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language
 EstNLTK - NLP Toolkit for Estonian
 The OFAI Multi-Modal Task Description Corpus
 A Corpus of Text Data and Gaze Fixations from Autistic and Non-Autistic Adults
 Fine-Grained Chinese Discourse Relation Labelling
 Automatic identification of Mild Cognitive Impairment through the analysis of Italian spontaneous speech productions
 Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition
 Parallel Discourse Annotations on a Corpus of Short Texts
 Fostering digital representation of EU regional and minority languages: the Digital Language Diversity Project
 Features for Generic Corpus Querying
 The TYPALOC Corpus: A Collection of Various Dysarthric Speech Recordings in Read and Spontaneous Styles
 A Large Rated Lexicon with French Medical Words
 IMS HotCoref DE: A Data-driven Co-reference Resolver for German
 Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects
 Laughter in French Spontaneous Conversational Dialogs
 Acquiring Opposition Relations among Italian Verb Senses using Crowdsourcing
 A comparison of Named-Entity Disambiguation and Word Sense Disambiguation
 Universal Dependencies for Persian
 Modeling Language Change in Historical Corpora: The Case of Portuguese
 The CIRDO Corpus: Comprehensive Audio/Video Database of Domestic Falls of Elderly People
 Interoperability of Annotation Schemes: Using the Pepper Framework to Display AWA Documents in the ANNIS Interface
 SuperCAT: The (New and Improved) Corpus Analysis Toolkit
 SPLIT: Smart Preprocessing (Quasi) Language Independent Tool
 A Verbal and Gestural Corpus of Story Retellings to an Expressive Embodied Virtual Character
 Word Segmentation for Akkadian Cuneiform
 Survey of Conversational Behavior: Towards the Design of a Balanced Corpus of Everyday Japanese Conversation
 Yes, We Care! Results of the Ethics and Natural Language Processing Surveys
 NNBlocks: A Deep Learning Framework for Computational Linguistics Neural Network Models
 The Public License Selector: Making Open Licensing Easier
 Named Entity Recognition on Twitter for Turkish using Semi-supervised Learning with Word Embeddings
 Deep Learning of Audio and Language Features for Humor Prediction
 Improving the Annotation of Sentence Specificity
 ALT Explored: Integrating an Online Dialectometric Tool and an Online Dialect Atlas
 Detecting Expressions of Blame or Praise in Text
 CommonCOW: Massively Huge Web Corpora from CommonCrawl Data and a Method to Distribute them Freely under Restrictive EU Copyright Laws
 Temporal Information Annotation: Crowd vs. Experts
 EDISON: Feature Extraction for NLP, Simplified
 Entity Linking with a Paraphrase Flavor
 Accurate Deep Syntactic Parsing of Graphs: The Case of French
 Enriching a Portuguese WordNet using Synonyms from a Monolingual Dictionary
 An Empirical Exploration of Moral Foundations Theory in Partisan News Sources
 Embedding Open-domain Common-sense Knowledge from Text
 OPFI: A Tool for Opinion Finding in Polish
 Cro36WSD: A Lexical Sample for Croatian Word Sense Disambiguation
 The Uppsala Corpus of Student Writings: Corpus Creation, Annotation, and Analysis
 Evaluating Lexical Similarity to build Sentiment Similarity
 Annotating and Detecting Medical Events in Clinical Notes
 Multiword Expressions Dataset for Indian Languages
 Constraint-Based Bilingual Lexicon Induction for Closely Related Languages
 The ELRA License Wizard
 CASSAurus: A Resource of Simpler Spanish Synonyms
 CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech
 Evaluating the Noisy Channel Model for the Normalization of Historical Texts: Basque, Spanish and Slovene
 Farasa: A New Fast and Accurate Arabic Word Segmenter
 Using a Cross-Language Information Retrieval System based on OHSUMED to Evaluate the Moses and KantanMT Statistical Machine Translation Systems
 LibN3L:A Lightweight Package for Neural NLP
 Extractive Summarization under Strict Length Constraints
 DeQue: A Lexicon of Complex Prepositions and Conjunctions in French
 A Singing Voice Database in Basque for Statistical Singing Synthesis of Bertsolaritza
 ANTUSD: A Large Chinese Sentiment Dictionary
 Universal Dependencies for Norwegian
 Can Tweets Predict TV Ratings?
 Web Chat Conversations from Contact Centers: a Descriptive Study
 MEANTIME, the NewsReader Multilingual Event and Time Corpus
 Could Speaker, Gender or Age Awareness be beneficial in Speech-based Emotion Recognition?
 CItA: an L1 Italian Learners Corpus to Study the Development of Writing Competence
 Automatic Recognition of Linguistic Replacements in Text Series Generated from Keystroke Logs
 SCARE ― The Sentiment Corpus of App Reviews with Fine-grained Annotations in German
 Leveraging RDF Graphs for Crossing Multiple Bilingual Dictionaries
 Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models
 Domain Adaptation in MT Using Titles in Wikipedia as a Parallel Corpus: Resources and Evaluation
 A Dutch Dysarthric Speech Database for Individualized Speech Therapy Research
 Neural Scoring Function for MST Parser
 TEITOK: Text-Faithful Annotated Corpora
 TLT-CRF: A Lexicon-supported Morphological Tagger for Latin Based on Conditional Random Fields
 A Longitudinal Bilingual Frisian-Dutch Radio Broadcast Database Designed for Code-Switching Research
 Generating Task-Pertinent sorted Error Lists for Speech Recognition
 Using lexical and Dependency Features to Disambiguate Discourse Connectives in Hindi
 SPLIT: Smart Preprocessing (Quasi) Language Independent Tool
 Parallel Global Voices: a Collection of Multilingual Corpora with Citizen Media Stories
 TermITH-Eval: a French Standard-Based Resource for Keyphrase Extraction Evaluation
 French Learners Audio Corpus of German Speech (FLACGS)
 Yes, We Care! Results of the Ethics and Natural Language Processing Surveys
 Transfer of Corpus-Specific Dialogue Act Annotation to ISO Standard: Is it worth it?
 Wiktionnaire's Wikicode GLAWIfied: a Workable French Machine-Readable Dictionary
 A Neural Lemmatizer for Bengali
 CommonCOW: Massively Huge Web Corpora from CommonCrawl Data and a Method to Distribute them Freely under Restrictive EU Copyright Laws
 
 |    
  
  | P |  
  | Parsing | Accurate Deep Syntactic Parsing of Graphs: The Case of French
 Punctuation Prediction for Unsegmented Transcript Based on Word Vector
 Semi-automatic Parsing for Web Knowledge Extraction through Semantic Annotation
 Explicit Fine grained Syntactic and Semantic Annotation of the Idafa Construction in Arabic
 Phrase Level Segmentation and Labelling of Machine Translation Errors
 Universal Dependencies for Japanese
 A Dependency Treebank of the Chinese Buddhist Canon
 Evaluating a Deterministic Shift-Reduce Neural Parser for Constituent Parsing
 Language Resource Addition Strategies for Raw Text Parsing
 E-TIPSY: Search Query Corpus Annotated with Entities, Term Importance, POS Tags, and Syntactic Parses
 4Couv: A New Treebank for French
 AfriBooms: An Online Treebank for Afrikaans
 Differentia compositionem facit. A Slower-Paced and Reliable Parser for Latin
 CINTIL DependencyBank PREMIUM - A Corpus of Grammatical Dependencies for Portuguese
 Estonian Dependency Treebank: from Constraint Grammar tagset to Universal Dependencies
 Construction of an English Dependency Corpus incorporating Compound Function Words
 South African Language Resources: Phrase Chunking
 Syntactic Analysis of Phrasal Compounds in Corpora: a Challenge for NLP Tools
 EasyTree: A Graphical Tool for Dependency Tree Annotation
 Neural Scoring Function for MST Parser
 Extracting Interlinear Glossed Text from LaTeX Documents
 Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian
 Hard Time Parsing Questions: Building a QuestionBank for French
 Using lexical and Dependency Features to Disambiguate Discourse Connectives in Hindi
 Enhanced English Universal Dependencies: An Improved Representation for Natural Language Understanding Tasks
 Towards Building Semantic Role Labeler for Indian Languages
 Old French Dependency Parsing: Results of Two Parsers Analysed from a Linguistic Point of View
 The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions
 UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing
 Towards Comparability of Linguistic Graph Banks for Semantic Parsing
 Czech Legal Text Treebank 1.0
 NorGramBank: A Deep Treebank for Norwegian
 Government Domain Named Entity Recognition for South African Languages
 
 |  
  | Part-of-Speech Tagging | A Proposal for a Part-of-Speech Tagset for the Albanian Language
 Morphological Analysis of Sahidic Coptic for Automatic Glossing
 Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy
 Two Years of Aranea: Increasing Counts and Tuning the Pipeline
 Learning from Within? Comparing PoS Tagging Approaches for Historical Text
 Improving POS Tagging of German Learner Language in a Reading Comprehension Scenario
 Wikipedia Titles As Noun Tag Predictors
 POS-tagging of Historical Dutch
 Language Resource Addition Strategies for Raw Text Parsing
 New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian
 FOLK-Gold ― A Gold Standard for Part-of-Speech-Tagging of Spoken German
 TGermaCorp -- A (Digital) Humanities Resource for (Computational) Linguistics
 Features for Generic Corpus Querying
 Constructing a Norwegian Academic Wordlist
 Fast and Robust POS tagger for Arabic Tweets Using Agreement-based Bootstrapping
 Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art
 TLT-CRF: A Lexicon-supported Morphological Tagger for Latin Based on Conditional Random Fields
 Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian
 If You Even Don't Have a Bit of Bible: Learning Delexicalized POS Taggers
 Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic
 The hunvec framework for NN-CRF-based sequential tagging
 Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene
 Combining Ontologies and Neural Networks for Analyzing Historical Language Varieties. A Case Study in Middle Low German
 A Large Scale Corpus of Gulf Arabic
 The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions
 UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing
 Exploiting Arabic Diacritization for High Quality Automatic Annotation
 Rapid Development of Morphological Analyzers for Typologically Diverse Languages
 FlexTag: A Highly Flexible PoS Tagging Framework
 
 |  
  | Person Identification | Comparing Speech and Text Classification on ICNALE
 Arabic to English Person Name Transliteration using Twitter
 Speech Trax: A Bottom to the Top Approach for Speaker Tracking and Indexing in an Archiving Context
 FABIOLE, a Speech Database for Forensic Speaker Comparison
 Benchmarking multimedia technologies with the CAMOMILE platform: the case of Multimodal Person Discovery at MediaEval 2015
 Dialogue System Characterisation by Back-channelling Patterns Extracted from Dialogue Corpus
 He Said She Said ― a Male/Female Corpus of Polish
 Predicting Author Age from Weibo Microblog Posts
 
 |  
  | Phonetic Databases, Phonology | New release of Mixer-6: Improved validity for phonetic study of speaker variation and identification
 Phonetic Inventory for an Arabic Speech Corpus
 Defining and Counting Phonological Classes in Cross-linguistic Segment Databases
 Phoneme Alignment Using the Information on Phonological Processes in Continuous Speech
 The IFCASL Corpus of French and German Non-native and Native Read Speech
 The BAS Speech Data Repository
 Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik
 Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech
 Polish Rhythmic Database ― New Resources for Speech Timing and Rhythm Analysis
 
 |  
  | Profiling | Building a Dataset for Possessions Identification in Text
 Age and Gender Prediction on Health Forum Data
 SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies
 A Semi-Supervised Approach for Gender Identification
 TwiSty: A Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling
 Predicting Author Age from Weibo Microblog Posts
 
 |  
  | Prosody | Assessing the Prosody of Non-Native Speakers of English: Measures and Feature Sets
 AMISCO: The Austrian German Multi-Sensor Corpus
 Introducing the SEA_AP: an Enhanced Tool for Automatic Prosodic Analysis
 Metrical Annotation of a Large Corpus of Spanish Sonnets: Representation, Scansion and Evaluation
 Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis
 On the Use of a Serious Game for Recording a Speech Corpus of People with Intellectual Disabilities
 Polish Rhythmic Database ― New Resources for Speech Timing and Rhythm Analysis
 
 |      
  
  | S |  
  | Semantics | A Gold Standard for Scalar Adjectives
 The Gavagai Living Lexicon
 VerbCROcean: A Repository of Fine-Grained Semantic Verb Relations for Croatian
 VoxML: A Visualization Modeling Language
 Example-based Acquisition of Fine-grained Collocation Resources
 Embedding Open-domain Common-sense Knowledge from Text
 Combining Semantic Annotation of Word Sense & Semantic Roles: A Novel Annotation Scheme for VerbNet Roles on German Language Data
 SemAligner: A Method and Tool for Aligning Chunks with Semantic Relation Types and Semantic Similarity Scores
 Introducing the LCC Metaphor Datasets
 DT-Neg: Tutorial Dialogues Annotated for Negation Scope and Focus in Context
 Medical Concept Embeddings via Labeled Background Corpora
 Enriching TimeBank: Towards a more precise annotation of temporal relations in a text
 Cro36WSD: A Lexical Sample for Croatian Word Sense Disambiguation
 A Taxonomy of Spanish Nouns, a Statistical Algorithm to Generate it and its Implementation in Open Source Code
 Spanish Word Vectors from Wikipedia
 Synset Ranking of Hindi WordNet
 Neural Embedding Language Models in Semantic Clustering of Web Search Results
 SemRelData ― Multilingual Contextual Annotation of Semantic Relations between Nominals: Dataset and Guidelines
 Using Data Mining Techniques for Sentiment Shifter Identification
 Question-Answering with Logic Specific to Video Games
 Concepticon: A Resource for the Linking of Concept Lists
 Aspectual Flexibility Increases with Agentivity and Concreteness: A Computational Classification Experiment on Polysemous Verbs
 Annotating Logical Forms for EHR Questions
 Exploitation of Co-reference in Distributional Semantics
 A Framework for Cross-lingual/Node-wise Alignment of Lexical-Semantic Resources
 The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database
 A sense-based lexicon of count and mass expressions: The Bochum English Countability Lexicon
 A lexicon of perception for the identification of synaesthetic metaphors in corpora
 A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds
 A Dataset for Detecting Stance in Tweets
 Semi-automatically Alignment of Predicates between Speech and OntoNotes data
 Legal Text Interpretation: Identifying Hohfeldian Relations from Text
 Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages
 mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing
 Crossmodal Network-Based Distributional Semantic Models
 A Semantically Compositional Annotation Scheme for Time Normalization
 PROMETHEUS: A Corpus of Proverbs Annotated with Metaphors
 Corpus Annotation within the French FrameNet: a Domain-by-domain Methodology
 GhoSt-NN: A Representative Gold Standard of German Noun-Noun Compounds
 The SemDaX Corpus ― Sense Annotations with Scalable Sense Inventories
 Covering various Needs in Temporal Annotation: a Proposal of Extension of ISO TimeML that Preserves Upward Compatibility
 Building Concept Graphs from Monolingual Dictionary Entries
 CORILSE: a Spanish Sign Language Repository for Linguistic Analysis
 PersonaBank: A Corpus of Personal Narratives and Their Story Intention Graphs
 Paraphrasing Out-of-Vocabulary Words with Word Embeddings and Semantic Lexicons for Low Resource Statistical Machine Translation
 Semantic Layer of the Valence Dictionary of Polish Walenty
 Riddle Generation using Word Associations
 A General Framework for the Annotation of Causality Based on FrameNet
 Cognitively Motivated Distributional Representations of Meaning
 Annotating Temporally-Anchored Spatial Knowledge on Top of OntoNotes Semantic Roles
 Extending Monolingual Semantic Textual Similarity Task to Multiple Cross-lingual Settings
 The Hebrew FrameNet Project
 Addressing the MFS Bias in WSD systems
 Argument Mining: the Bottleneck of Knowledge and Language Resources
 Italian VerbNet: A Construction-based Approach to Italian Verb Classification
 Nine Features in a Random Forest to Learn Taxonomical Semantic Relations
 metaTED: a Corpus of Metadiscourse for Spoken Language
 ELMD: An Automatically Generated Entity Linking Gold Standard Dataset in the Music Domain
 Issues and Challenges in Annotating Urdu Action Verbs on the IMAGACT4ALL Platform
 SpaceRef: A corpus of street-level geographic descriptions
 Visualisation and Exploration of High-Dimensional Distributional Features in Lexical Semantic Classification
 Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF
 Automatically Generated Affective Norms of Abstractness, Arousal, Imageability and Valence for 350 000 German Lemmas
 A Large Rated Lexicon with French Medical Words
 Comprehensive and Consistent PropBank Light Verb Annotation
 Inconsistency Detection in Semantic Annotation
 Datasets for Aspect-Based Sentiment Analysis in French
 DART: a Dataset of Arguments and their Relations on Twitter
 Multi-prototype Chinese Character Embedding
 Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis
 Best of Both Worlds: Making Word Sense Embeddings Interpretable
 Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models
 Rude waiter but mouthwatering pastries! An exploratory study into Dutch Aspect-Based Sentiment Analysis
 Can Topic Modelling benefit from Word Sense Information?
 Resources for building applications with Dependency Minimal Recursion Semantics
 Typology of Adjectives Benchmark for Compositional Distributional Models
 Assessing the Potential of Metaphoricity of verbs using corpus data
 Persian Proposition Bank
 Enhanced English Universal Dependencies: An Improved Representation for Natural Language Understanding Tasks
 Semantic Relation Extraction with Semantic Patterns Experiment on Radiology Reports
 Typed Entity and Relation Annotation on Computer Science Papers
 EVALution-MAN: A Chinese Dataset for the Training and Evaluation of DSMs
 Towards Building Semantic Role Labeler for Indian Languages
 Effect Functors for Opinion Inference
 A Dataset for Open Event Extraction in English
 A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora
 Semantic Annotation of the ACL Anthology Corpus for the Automatic Analysis of Scientific Literature
 Wow! What a Useful Extension! Introducing Non-Referential Concepts to Wordnet
 Graph-Based Induction of Word Senses in Croatian
 Towards Comparability of Linguistic Graph Banks for Semantic Parsing
 A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge
 GATE-Time: Extraction of Temporal Expressions and Events
 Building A Case-based Semantic English-Chinese Parallel Treebank
 VerbLexPor: a lexical resource with semantic roles for Portuguese
 A Multilingual Predicate Matrix
 Latin Vallex. A Treebank-based Semantic Valency Lexicon for Latin
 Merging Data Resources for Inflectional and Derivational Morphology in Czech
 
 |  
  | Semantic Web | Semi-automatic Parsing for Web Knowledge Extraction through Semantic Annotation
 Concepticon: A Resource for the Linking of Concept Lists
 Towards a Linguistic Ontology with an Emphasis on Reasoning and Knowledge Reuse
 Context-enhanced Adaptive Entity Linking
 DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus
 Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job
 
 |  
  | Sign Language Recognition/Generation | A Web Tool for Building Parallel Corpora of Spoken and Sign Languages
 Modelling a Parallel Corpus of French and French Belgian Sign Language
 CORILSE: a Spanish Sign Language Repository for Linguistic Analysis
 Using a Language Technology Infrastructure for German in order to Anonymize German Sign Language Corpus Data
 Finding Recurrent Features of Image Schema Gestures: the FIGURE corpus
 BosphorusSign: A Turkish Sign Language Recognition Corpus in Health and Finance Domains
 Detection of Major ASL Sign Types in Continuous Signing For ASL Recognition
 
 |  
  | Social Media Processing | Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource
 Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola
 A Corpus of Wikipedia Discussions: Over the Years, with Topic, Power and Gender Labels
 NileULex: A Phrase and Word Level Sentiment Lexicon for Egyptian and Modern Standard Arabic
 Building a Dataset for Possessions Identification in Text
 CodE Alltag: A German-Language E-Mail Corpus
 A Turkish-German Code-Switching Corpus
 What's the Issue Here?: Task-based Evaluation of Reader Comment Summarization Systems
 Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities
 Homing in on Twitter Users: Evaluating an Enhanced Geoparser for User Profile Locations
 Speech Synthesis of Code-Mixed Text
 Challenges of Evaluating Sentiment Analysis Tools on Social Media
 A Dataset for Detecting Stance in Tweets
 Sentiment Lexicons for Arabic Social Media
 An Arabic-Moroccan Darija Code-Switched Corpus
 Classifying Out-of-vocabulary Terms in a Domain-Specific Social Media Corpus
 A Document Repository for Social Media and Speech Conversations
 A Language Independent Method for Generating Large Scale Polarity Lexicons
 Corpus for Customer Purchase Behavior Prediction in Social Media
 TweetMT: A Parallel Microblog Corpus
 Can Tweets Predict TV Ratings?
 Web Chat Conversations from Contact Centers: a Descriptive Study
 Multilevel Annotation of Agreement and Disagreement in Italian News Blogs
 Exploring the Realization of Irony in Twitter Data
 Fast and Robust POS tagger for Arabic Tweets Using Agreement-based Bootstrapping
 DART: a Dataset of Arguments and their Relations on Twitter
 Rude waiter but mouthwatering pastries! An exploratory study into Dutch Aspect-Based Sentiment Analysis
 TwiSty: A Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling
 Sentiment Analysis in Social Networks through Topic modeling
 Analyzing Time Series Changes of Correlation between Market Share and Concerns on Companies measured through Search Engine Suggests
 Segmenting Hashtags using Automatically Created Training Data
 What does this Emoji Mean? A Vector Space Skip-Gram Model for Twitter Emojis
 A Hungarian Sentiment Corpus Manually Annotated at Aspect Level
 Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages
 The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions
 Named Entity Recognition on Twitter for Turkish using Semi-supervised Learning with Word Embeddings
 Exploring Language Variation Across Europe - A Web-based Tool for Computational Sociolinguistics
 Monolingual Social Media Datasets for Detecting Contradiction and Entailment
 Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments
 Predicting Author Age from Weibo Microblog Posts
 Effects of Sampling on Twitter Trend Detection
 PotTS: The Potsdam Twitter Sentiment Corpus
 FlexTag: A Highly Flexible PoS Tagging Framework
 Automatic Classification of Tweets for Analyzing Communication Behavior of Museums
 
 |  
  | Speech Recognition/Understanding | Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces
 Punctuation Prediction for Unsegmented Transcript Based on Word Vector
 The DIRHA Portuguese Corpus: A Comparison of Home Automation Command Detection and Recognition in Simulated and Real Data.
 Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech
 Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation
 Introducing the Weighted Trustability Evaluator for Crowdsourcing Exemplified by Speaker Likability Classification
 Assessing the Prosody of Non-Native Speakers of English: Measures and Feature Sets
 AIMU: Actionable Items for Meeting Understanding
 A Comparative Analysis of Crowdsourced Natural Language Corpora for Spoken Dialog Systems
 How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News
 Introducing the SEA_AP: an Enhanced Tool for Automatic Prosodic Analysis
 Syllable based DNN-HMM Cantonese Speech to Text System
 Palabras: Crowdsourcing Transcriptions of L2 Speech
 Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof
 BulPhonC: Bulgarian Speech Corpus for the Development of ASR Technology
 Designing a Speech Corpus for the Development and Evaluation of Dictation Systems in Latvian
 SCALE: A Scalable Language Engineering Toolkit
 The LetsRead Corpus of Portuguese Children Reading Aloud for Performance Evaluation
 Mining the Spoken Wikipedia for Speech Data and Beyond
 A Corpus of Read and Spontaneous Upper Saxon German Speech for ASR Evaluation
 Parallel Speech Corpora of Japanese Dialects
 Generating Task-Pertinent sorted Error Lists for Speech Recognition
 The SI TEDx-UM speech database: a new Slovenian Spoken Language Resource
 AppDialogue: Multi-App Dialogues for Intelligent Assistants
 Speech Corpus Spoken by Young-old, Old-old and Oldest-old Japanese
 Joining-in-type Humanoid Robot Assisted Language Learning System
 
 |  
  | Speech Resource/Database | Endangered Language Documentation: Bootstrapping a Chatino Speech Corpus, Forced Aligner, ASR
 Falling silent, lost for words ... Tracing personal involvement in interviews with Dutch war veterans
 New release of Mixer-6: Improved validity for phonetic study of speaker variation and identification
 The DIRHA Portuguese Corpus: A Comparison of Home Automation Command Detection and Recognition in Simulated and Real Data.
 Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech
 Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project
 A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus
 Capturing Chat: Annotation and Tools for Multiparty Casual Conversation.
 Towards Automatic Transcription of ILSE ― an Interdisciplinary Longitudinal Study of Adult Development and Aging
 Hidden Resources ― Strategies to Acquire and Exploit Potential Spoken Language Resources in National Archives
 CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech
 Operational Assessment of Keyword Search on Oral History
 Accuracy of Automatic Cross-Corpus Emotion Labeling for Conversational Speech Corpus Commonization
 User, who art thou? User Profiling for Oral Corpus Platforms
 Semi-automatically Alignment of Predicates between Speech and OntoNotes data
 Comparison of Emotional Understanding in Modality-Controlled Environments using Multimodal Online Emotional Communication Corpus
 FABIOLE, a Speech Database for Forensic Speaker Comparison
 A Singing Voice Database in Basque for Statistical Singing Synthesis of Bertsolaritza
 AMISCO: The Austrian German Multi-Sensor Corpus
 A Database of Laryngeal High-Speed Videos with Simultaneous High-Quality Audio Recordings of Pathological and Non-Pathological Voices
 FOLK-Gold ― A Gold Standard for Part-of-Speech-Tagging of Spoken German
 AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis
 Introducing the SEA_AP: an Enhanced Tool for Automatic Prosodic Analysis
 Syllable based DNN-HMM Cantonese Speech to Text System
 Palabras: Crowdsourcing Transcriptions of L2 Speech
 Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof
 BulPhonC: Bulgarian Speech Corpus for the Development of ASR Technology
 The LetsRead Corpus of Portuguese Children Reading Aloud for Performance Evaluation
 The BAS Speech Data Repository
 Mining the Spoken Wikipedia for Speech Data and Beyond
 Parallel Speech Corpora of Japanese Dialects
 The TYPALOC Corpus: A Collection of Various Dysarthric Speech Recordings in Read and Spontaneous Styles
 A Dutch Dysarthric Speech Database for Individualized Speech Therapy Research
 A Shared Task for Spoken CALL?
 A Longitudinal Bilingual Frisian-Dutch Radio Broadcast Database Designed for Code-Switching Research
 The SI TEDx-UM speech database: a new Slovenian Spoken Language Resource
 A Verbal and Gestural Corpus of Story Retellings to an Expressive Embodied Virtual Character
 Speech Corpus Spoken by Young-old, Old-old and Oldest-old Japanese
 SPA: Web-based Platform for easy Access to Speech Processing Modules
 Polish Rhythmic Database ― New Resources for Speech Timing and Rhythm Analysis
 CHATR the Corpus; a 20-year-old archive of Concatenative Speech Synthesis
 Database of Mandarin Neighborhood Statistics
 An Extension of the Slovak Broadcast News Corpus based on Semi-Automatic Annotation
 Global Open Resources and Information for Language and Linguistic Analysis (GORILLA)
 Crowdsourcing a Multi-lingual Speech Corpus: Recording, Transcription and Annotation of the CrowdIS Corpora
 
 |  
  | Speech Synthesis | Speech Synthesis of Code-Mixed Text
 A Taxonomy of Specific Problem Classes in Text-to-Speech Synthesis: Comparing Commercial and Open Source Performance
 TTS for Low Resource Languages: A Bangla Synthesizer
 AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis
 Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis
 Chatbot Technology with Synthetic Voices in the Acquisition of an Endangered Language: Motivation, Development and Evaluation of a Platform for Irish
 CHATR the Corpus; a 20-year-old archive of Concatenative Speech Synthesis
 
 |  
  | Standards for LRs | An Annotated Corpus of Direct Speech
 A Proposal for a Part-of-Speech Tagset for the Albanian Language
 MWEs in Treebanks: From Survey to Guidelines
 Corpus Query Lingua Franca (CQLF)
 Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research
 Creating a Large Multi-Layered Representational Repository of Linguistic Code Switched Arabic Data
 RankDCG: Rank-Ordering Evaluation Measure
 Language Resource Citation: the ISLRN Dissemination and Further Developments
 Modelling Multi-issue Bargaining Dialogues: Data Collection, Annotation Design and Corpus
 Quality Assessment of the Reuters Vol. 2 Multilingual Corpus
 The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and Distributing Language Resources
 Covering various Needs in Temporal Annotation: a Proposal of Extension of ISO TimeML that Preserves Upward Compatibility
 A Large-scale Recipe and Meal Data Collection as Infrastructure for Food Research
 The Universal Dependencies Treebank of Spoken Slovenian
 Metrical Annotation of a Large Corpus of Spanish Sonnets: Representation, Scansion and Evaluation
 Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
 The DialogBank
 Facilitating Metadata Interoperability in CLARIN-DK
 Towards Comparability of Linguistic Graph Banks for Semantic Parsing
 Graphical Annotation for Syntax-Semantics Mapping
 
 |  
  | Statistical and Machine Learning Methods | Punctuation Prediction for Unsegmented Transcript Based on Word Vector
 Transfer-Based Learning-to-Rank Assessment of Medical Term Technicality
 MARMOT: A Toolkit for Translation Quality Estimation at the Word Level
 Word Sense-Aware Machine Translation: Including Senses as Contextual Features for Improved Translation Models
 A Machine Learning based Music Retrieval and Recommendation System
 Medical Concept Embeddings via Labeled Background Corpora
 Aspectual Flexibility Increases with Agentivity and Concreteness: A Computational Classification Experiment on Polysemous Verbs
 Evaluating a Deterministic Shift-Reduce Neural Parser for Constituent Parsing
 POS-tagging of Historical Dutch
 An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text
 A Novel Evaluation Method for Morphological Segmentation
 Text Segmentation of Digitized Clinical Texts
 How does Dictionary Size Influence Performance of Vietnamese Word Segmentation?
 Creating Annotated Dialogue Resources: Cross-domain Dialogue Act Classification
 Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction
 Towards Using Social Media to Identify Individuals at Risk for Preventable Chronic Illness
 A Comparative Study of Text Preprocessing Approaches for Topic Detection of User Utterances
 Detecting Optional Arguments of Verbs
 Corpus-Based Diacritic Restoration for South Slavic Languages
 Differentia compositionem facit. A Slower-Paced and Reliable Parser for Latin
 A Semi-Supervised Approach for Gender Identification
 Word Embedding Evaluation and Combination
 Automatic identification of Mild Cognitive Impairment through the analysis of Italian spontaneous speech productions
 South African Language Resources: Phrase Chunking
 Impact of Automatic Segmentation on the Quality, Productivity and Self-reported Post-editing Effort of Intralingual Subtitles
 Syllable based DNN-HMM Cantonese Speech to Text System
 What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets
 Bootstrapping a Hybrid MT System to a New Language Pair
 Building Language Resources for Exploring Autism Spectrum Disorders
 A Multimodal Corpus for the Assessment of Public Speaking Ability and Anxiety
 A Sequence Model Approach to Relation Extraction in Portuguese
 MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP
 Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian
 Segmenting Hashtags using Automatically Created Training Data
 Detection of Major ASL Sign Types in Continuous Signing For ASL Recognition
 Word Segmentation for Akkadian Cuneiform
 A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction
 Specialising Paragraph Vectors for Text Polarity Detection
 NNBlocks: A Deep Learning Framework for Computational Linguistics Neural Network Models
 MoBiL: A Hybrid Feature Set for Automatic Human Translation Quality Assessment
 Learning Thesaurus Relations from Distributional Features
 
 |  
  | Summarisation | Revisiting Summarization Evaluation for Scientific Articles
 What's the Issue Here?: Task-based Evaluation of Reader Comment Summarization Systems
 The OnForumS corpus from the Shared Task on Online Forum Summarisation at MultiLing 2015
 Extractive Summarization under Strict Length Constraints
 A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization
 Enhancing The RATP-DECODA Corpus With Linguistic Annotations For Performing A Large Range Of NLP Tasks
 Sentence Similarity based on Dependency Tree Kernels for Multi-document Summarization
 Urdu Summary Corpus
 Analyzing Pre-processing Settings for Urdu Single-document Extractive Summarization
 
 |    
  
  | T |  
  | Text Mining | Event Coreference Resolution with Multi-Pass Sieves
 The PsyMine Corpus - A Corpus annotated with Psychiatric Disorders and their Etiological Factors
 An Empirical Exploration of Moral Foundations Theory in Partisan News Sources
 Arabic Corpora for Credibility Analysis
 Medical Concept Embeddings via Labeled Background Corpora
 Using Data Mining Techniques for Sentiment Shifter Identification
 Homing in on Twitter Users: Evaluating an Enhanced Geoparser for User Profile Locations
 Domain Ontology Learning Enhanced by Optimized Relation Instance in DBpedia
 An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text
 A Large DataBase of Hypernymy Relations Extracted from the Web.
 JATE 2.0: Java Automatic Term Extraction with Apache Solr
 Text Segmentation of Digitized Clinical Texts
 Creating a General Russian Sentiment Lexicon
 A Multilingual, Multi-style and Multi-granularity Dataset for Cross-language Textual Similarity Detection
 WIKIPARQ: A Tabulated Wikipedia Resource Using the Parquet Format
 Monitoring Disease Outbreak Events on the Web Using Text-mining Approach and Domain Expert Knowledge
 Odin's Runes: A Rule Language for Information Extraction
 A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization
 Identifying Content Types of Messages Related to Open Source Software Projects
 Ensemble Classification of Grants using LDA-based Features
 Ambiguity Diagnosis for Terms in Digital Humanities
 A Classification-based Approach to Economic Event Detection in Dutch News Text
 Corpus for Customer Purchase Behavior Prediction in Social Media
 NLP and Public Engagement: The Case of the Italian School Reform
 LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl
 Tweeting and Being Ironic in the Debate about a Political Reform: the French Annotated Corpus TWitter-MariagePourTous
 Edit Categories and Editor Role Identification in Wikipedia
 Bilbo-Val: Automatic Identification of Bibliographical Zone in Papers
 Sentence Similarity based on Dependency Tree Kernels for Multi-document Summarization
 Crowdsourcing Salient Information from News and Tweets
 More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing
 Analyzing Time Series Changes of Correlation between Market Share and Concerns on Companies measured through Search Engine Suggests
 The Event and Implied Situation Ontology (ESO): Application and Evaluation
 Typed Entity and Relation Annotation on Computer Science Papers
 Detection of Reformulations in Spoken French
 A Study of Reuse and Plagiarism in LREC papers
 Controlled Propagation of Concept Annotations in Textual Corpora
 Predictive Modeling: Guessing the NLP Terms of Tomorrow
 A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge
 Detecting Expressions of Blame or Praise in Text
 Effects of Sampling on Twitter Trend Detection
 Studying the Temporal Dynamics of Word Co-occurrences: An Application to Event Detection
 Automatic Biomedical Term Polysemy Detection
 Markov Logic Networks for Text Mining: A Qualitative and Empirical Comparison with Integer Linear Programming
 
 |  
  | Textual Entailment and Paraphrasing | SemAligner: A Method and Tool for Aligning Chunks with Semantic Relation Types and Semantic Similarity Scores
 Passing a USA National Bar Exam: a First Corpus for Experimentation
 Corpora for Learning the Mutual Relationship between Semantic Relatedness and Textual Entailment
 TEG-REP: A corpus of Textual Entailment Graphs based on Relation Extraction Patterns
 UPPC - Urdu Paraphrase Plagiarism Corpus
 Crowdsourcing a Large Dataset of Domain-Specific Context-Sensitive Semantic Verb Relations
 Relation- and Phrase-level Linking of FrameNet with Sar-graphs
 A Corpus of Word-Aligned Asked and Anticipated Questions in a Virtual Patient Dialogue System
 Detection of Reformulations in Spoken French
 A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge
 Monolingual Social Media Datasets for Detecting Contradiction and Entailment
 
 |  
  | Tools, Systems, Applications | Event Coreference Resolution with Multi-Pass Sieves
 An Interaction-Centric Dataset for Learning Automation Rules in Smart Homes
 Two Architectures for Parallel Processing of Huge Amounts of Text
 Sieve-based Coreference Resolution in the Biomedical Domain
 How to Address Smart Homes with a Social Robot? A Multi-modal Corpus of User Interactions with an Intelligent Environment
 Croatian Error-Annotated Corpus of Non-Professional Written Language
 MARMOT: A Toolkit for Translation Quality Estimation at the Word Level
 NLP Infrastructure for the Lithuanian Language
 Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech
 Sense-annotating a Lexical Substitution Data Set with Ubyline
 Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish
 Annotating Characters in Literary Corpora: A Scheme, the CHARLES Tool, and an Annotated Novel
 A Machine Learning based Music Retrieval and Recommendation System
 Publishing the Trove Newspaper Corpus
 Deriving Morphological Analyzers from Example Inflections
 SemAligner: A Method and Tool for Aligning Chunks with Semantic Relation Types and Semantic Similarity Scores
 The on-line version of Grammatical Dictionary of Polish
 Enriching TimeBank: Towards a more precise annotation of temporal relations in a text
 The Uppsala Corpus of Student Writings: Corpus Creation, Annotation, and Analysis
 RankDCG: Rank-Ordering Evaluation Measure
 CASSAurus: A Resource of Simpler Spanish Synonyms
 MarsaGram: an excursion in the forests of parsing trees
 Operational Assessment of Keyword Search on Oral History
 Defining and Counting Phonological Classes in Cross-linguistic Segment Databases
 Benchmarking Lexical Simplification Systems
 Syntax-based Multi-system Machine Translation
 Phoneme Alignment Using the Information on Phonological Processes in Continuous Speech
 Farasa: A New Fast and Accurate Arabic Word Segmenter
 Use of Domain-Specific Language Resources in Machine Translation
 A Large DataBase of Hypernymy Relations Extracted from the Web.
 Automatic Anomaly Detection for Dysarthria across Two Speech Styles: Read vs Spontaneous Speech
 JATE 2.0: Java Automatic Term Extraction with Apache Solr
 CATaLog Online: Porting a Post-editing Tool to the Web
 The ILMT-s2s Corpus - A Multimodal Interlingual Map Task Corpus
 KorAP Architecture - Diving in the Deep Sea of Corpus Data
 mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing
 SVALex: a CEFR-graded Lexical Resource for Swedish Foreign and Second Language Learners
 Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction
 Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language
 TermoPL - a Flexible Tool for Terminology Extraction
 Correcting Errors in a Treebank Based on Tree Mining
 Towards Using Social Media to Identify Individuals at Risk for Preventable Chronic Illness
 LibN3L: A Lightweight Package for Neural NLP
 Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest
 EstNLTK - NLP Toolkit for Estonian
 SemLinker, a Modular and Open Source Framework for Named Entity Discovery and Linking
 Finding Definitions in Large Corpora with Sketch Engine
 Fine-Grained Chinese Discourse Relation Labelling
 Corpus-Based Diacritic Restoration for South Slavic Languages
 Ensemble Classification of Grants using LDA-based Features
 Riddle Generation using Word Associations
 Purely Corpus-based Automatic Conversation Authoring
 Impact of Automatic Segmentation on the Quality, Productivity and Self-reported Post-editing Effort of Intralingual Subtitles
 Distribution of Valency Complements in Czech Complex Predicates: Between Verb and Noun
 1 Million Captioned Dutch Newspaper Images
 Multimodal Resources for Human-Robot Communication Modelling
 The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents
 NLP and Public Engagement: The Case of the Italian School Reform
 FLAT: Constructing a CLARIN Compatible Home for Language Resources
 SCALE: A Scalable Language Engineering Toolkit
 LanguageCrawl: A Generic Tool for Building Language Models Upon Common-Crawl
 Construction and Analysis of a Large Vietnamese Text Corpus
 Accessing and Elaborating Walenty - a Valence Dictionary of Polish - via Internet Browser
 Visualisation and Exploration of High-Dimensional Distributional Features in Lexical Semantic Classification
 Evaluating Lexical Simplification and Vocabulary Knowledge for Learners of French: Possibilities of Using the FLELex Resource
 EasyTree: A Graphical Tool for Dependency Tree Annotation
 Automatic Recognition of Linguistic Replacements in Text Series Generated from Keystroke Logs
 Bootstrapping a Hybrid MT System to a New Language Pair
 Multilevel Annotation of Agreement and Disagreement in Italian News Blogs
 Adapting an Entity Centric Model for Portuguese Coreference Resolution
 FREME: Multilingual Semantic Enrichment with Linked Data and Language Technologies
 Staggered NLP-assisted refinement for Clinical Annotations of Chronic Disease Events
 Cross-validating Image Description Datasets and Evaluation Metrics
 Using BabelNet to Improve OOV Coverage in SMT
 A Multimodal Corpus for the Assessment of Public Speaking Ability and Anxiety
 MADAD: A Readability Annotation Tool for Arabic Text
 IMS HotCoref DE: A Data-driven Co-reference Resolver for German
 Resources for building applications with Dependency Minimal Recursion Semantics
 More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing
 Guidelines and Framework for a Large Scale Arabic Diacritized Corpus
 TEITOK: Text-Faithful Annotated Corpora
 Extracting Interlinear Glossed Text from LaTeX Documents
 MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP
 BAS Speech Science Web Services - an Update of Current Developments
 Evaluation of the KIT Lecture Translation System
 CirdoX: an on/off-line multisource speech and sound analysis software
 Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation
 Tools and Guidelines for Principled Machine Translation Development
 Interoperability of Annotation Schemes: Using the Pepper Framework to Display AWA Documents in the ANNIS Interface
 SuperCAT: The (New and Improved) Corpus Analysis Toolkit
 SPLIT: Smart Preprocessing (Quasi) Language Independent Tool
 Urdu Summary Corpus
 Refurbishing a Morphological Database for German
 OSMAN - A Novel Arabic Readability Metric
 UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central - State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines
 The hunvec framework for NN-CRF-based sequential tagging
 SPA: Web-based Platform for easy Access to Speech Processing Modules
 Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene
 Towards Multiple Antecedent Coreference Resolution in Specialized Discourse
 Word Segmentation for Akkadian Cuneiform
 Towards a Language Service Infrastructure for Mobile Environments
 NNBlocks: A Deep Learning Framework for Computational Linguistics Neural Network Models
 Controlled Propagation of Concept Annotations in Textual Corpora
 The Public License Selector: Making Open Licensing Easier
 Searching in the Penn Discourse Treebank Using the PML-Tree Query
 IRIS: English-Irish Machine Translation System
 Exploring Language Variation Across Europe - A Web-based Tool for Computational Sociolinguistics
 corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora
 On Developing Resources for Patient-level Information Retrieval
 ALT Explored: Integrating an Online Dialectometric Tool and an Online Dialect Atlas
 Czech Legal Text Treebank 1.0
 FlexTag: A Highly Flexible PoS Tagging Framework
 CLARIN-EL Web-based Annotation Tool
 Adapting the TANL tool suite to Universal Dependencies
 Markov Logic Networks for Text Mining: A Qualitative and Empirical Comparison with Integer Linear Programming
 EDISON: Feature Extraction for NLP, Simplified
 
 |  
  | Topic Detection & Tracking | Enhancing Access to Online Education: Quality Machine Translation of MOOC Content
 That'll Do Fine!: A Coarse Lexical Resource for English-Hindi MT, Using Polylingual Topic Models
 Forecasting Emerging Trends from Scientific Literature
 Can Topic Modelling benefit from Word Sense Information?
 Analyzing Time Series Changes of Correlation between Market Share and Concerns on Companies measured through Search Engine Suggests
 Automatic Construction of Discourse Corpora for Dialogue Translation
 Predictive Modeling: Guessing the NLP Terms of Tomorrow
 Studying the Temporal Dynamics of Word Co-occurrences: An Application to Event Detection
 
 |  
  | Typological Databases | Discriminative Analysis of Linguistic Features for Typological Study
 The Alaskan Athabascan Grammar Database
 Defining and Counting Phonological Classes in Cross-linguistic Segment Databases
 Typology of Adjectives Benchmark for Compositional Distributional Models
 Legacy language atlas data mining: mapping Kru languages
 
 |        
 |  |