Language	Title
60 languages	The OPUS corpus - parallel and free
A
Abbey	WALA: a multilingual resource repository for West African Languages
Afrikaans	A Chatbot as a Novel Corpus Visualization Tool
	A Spoken Afrikaans Language Resource Designed for Research on Pronunciation Variations
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	The African Speech Technology Project: An Assessment
Albanian	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
Albanian	MED-TYP: A Typological Database for Mediterranean Languages
All	A Registry of Standard Data Categories for Linguistic Annotation
All	Mapping Dependency Structures to Phrase Structures and the Automatic Acquisition of Mapping Rules
All text encodable languages	Migrating Language Resources from SGML to XML: the Text Encoding Initiative Recommendations
All Unicode supported languages	Callisto: A Configurable Annotation Workbench
American English	The American English SALA-II Data Collection
American English	The American National Corpus First Release
Any	WinPitch Corpus, a Text to Speech Alignment Tool for Multimodal Corpora
	Collecting and Sharing Bilingual Spontaneous Speech Corpora: the ChinFaDial Experiment
	eGram - a Grammar Development Environment and its Usage for Language Generation
	ENABLER Thematic Network of National Projects: Technical, Strategic and Political Issues of LRs
Anyi	CoGesT: A Formal Transcription System for Conversational Gesture
Anyi	WALA: a multilingual resource repository for West African Languages
Arabic	A Chatbot as a Novel Corpus Visualization Tool
	A Framework for Evaluating the Suitability of Non-English Corpora for Language Engineering
	A Multi-Modal Documentation System for Warao
	A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards
	An Emerging Transcontinental Collaborative Research and Education Agenda in Human Language Technologies
	Annotation Tools for Large-Scale Corpus Development: Using AGTK at the Linguistic Data Consortium
	Automatic Language-Independent Induction of Gazetteer Lists
	Collection and Evaluation of Broadcast News Data for Arabic
	Construction of a Bilingual Arabic-Spanish Lexicon of Verbs Based on a Parallel Corpus
	Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004
	Generating an Arabic full-form lexicon for bidirectional morphology lookup
	Language Model Adaptation for Statistical Machine Translation based on Information Retrieval
	Linguistic Resources for Effective, Affordable, Reusable Speech-to-Text
	NEMLAR - An Arabic Language Resources Project
	OrienTel - Telephony Databases Across Northern Africa and the Middle East
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	The Automatic Content Extraction (ACE) Program - Tasks, Data, and Evaluation
	The Fisher Corpus: A Resource for the Next Generations of Speech-to-Text
	The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
	Towards basic categories for describing properties of texts in a corpus
Arabic dialects	MED-TYP: A Typological Database for Mediterranean Languages
B
Balkan languages	The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO
Basque	A XML-Based Term Extraction Tool for Basque
	Abar-Hitz: An Annotation Tool for the Basque Dependency Treebank
	Cross-Language Acquisition of Semantic Models for Verbal Predicates
	Development of Resources for a Bilingual Automatic Index System of Broadcast News in Basque and Spanish
	Evaluation of a Spoken Phonetic Database in Basque Language
	Exploring Portability of Syntactic Information from English to Basque
	Towards the MEANING Top Ontology: Sources of Ontological Meaning
	Translation memories enrichment by statistical bilingual segmentation
Basque (standard)	Designing and Recording an Audiovisual Database of Emotional Speech in Basque
Baule	WALA: a multilingual resource repository for West African Languages
Bengali	A Framework for Evaluating the Suitability of Non-English Corpora for Language Engineering
Berber	An Emerging Transcontinental Collaborative Research and Education Agenda in Human Language Technologies
Berber	MED-TYP: A Typological Database for Mediterranean Languages
Bulgarian	A Hybrid Strategy for Regular Grammar Parsing
	A Language Resources Infrastructure for Bulgarian
	A Methodology and Associated Tools for Building Interlingual Wordnets
	Cluster Analysis and Classification of Named Entities
	Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing
	Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian
	MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
	Multilingual Pattern Libraries for Question Answering: a Case Study for Definition Questions
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	The CLaRK System: XML-based Corpora Development System for Rapid Prototyping
	Unexpected Productions May Well be Errors
	Verb Valency Descriptors for a Syntactic Treebank
C
Cantonese	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
Catalan	ALLES: Integrating NLP in ICALL Applications
	Bilingual Connections for Trilingual Corpora: An XML Approach
	Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
	FreeLing: An Open-Source Suite of Language Analyzers
	MED-TYP: A Typological Database for Mediterranean Languages
	Mercedes, A Term-In-Context Highlighter
	NLP-enhanced error Checking for Catalan unrestricted text
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	The GENOMA-KB Platform: Queries Over Integrated Linguistic Resources
	The GENOMA-KB project: towards the integration of concepts, terms, textual corpora and entities
	Towards the MEANING Top Ontology: Sources of Ontological Meaning
	Towards the Use of Word Stems and Suffixes for Statistical Machine Translation
Chinese	A Model of Semantic Representations Analysis For Chinese Sentences
	A Multi-Modal Documentation System for Warao
	An Information Repository Model for Advanced Question Answering Systems
	Annotation Tools for Large-Scale Corpus Development: Using AGTK at the Linguistic Data Consortium
	Augmenting Manual Dictionaries for Statistical Machine Translation Systems
	Automatic Language-Independent Induction of Gazetteer Lists
	Collecting and Sharing Bilingual Spontaneous Speech Corpora: the ChinFaDial Experiment
	Collocation Extraction Using Web Statistics
	Distributional Consistency: As a General Method for Defining a Core Lexicon
	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
	Korean-Chinese-Japanese Multilingual Wordnet with Shared Semantic Hierarchy
	Language Model Adaptation for Statistical Machine Translation based on Information Retrieval
	Linguistic Resources for Effective, Affordable, Reusable Speech-to-Text
	MEAD - A Platform for Multidocument Multilingual Text Summarization
	Pattern Discovery in Named Organization Corpus
	Sinica BOW (Bilingual Ontological Wordnet): Integration of Bilingual WordNet and SUMO
	Speech & Expression - The Value of a Longitudinal Corpus
	Test Collections for Patent-to-Patent Retrieval and Patent Map Generation in NTCIR-4 Workshop
	The Automatic Content Extraction (ACE) Program - Tasks, Data, and Evaluation
Chol	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
Classical Arabic	Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
Contemporary Italian	Representing Italian Complex Nominals: a Pilot Study
Croatian	Enlarging the Croatian Morphological Lexicon by Automatic Lexical Acquisition from Raw Corpora
	Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian
	MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
Cypriot Greek	Cypriot Speech Database: Data Collection and Greek to Cypriot Dialect Adaptation
Czech	A Methodology and Associated Tools for Building Interlingual Wordnets
	Annotators' Agreement: The Case of Topic-Focus Articulation
	Derivational Relations in Flectional Languages - Czech Case
	Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing
	Issues in Annotation of the Czech Spontaneous Speech Corpus in the MALACH Project
	MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
	Orthographic and Phonetic Annotation of Very Large Czech Corpora with Quality Assessment
	Prague Czech-English Dependency Treebank, Syntactically Annotated Resources for Machine Translation
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	The Core of the Czech Derivational Dictionary
	The COST278 pan-European Broadcast News Database
	The Design of Czech Language Formal Listening Tests for the Evaluation of TTS Systems
	Tiered Tagging Revisited
	Top Ontology as a Tool for Semantic Role Tagging
	Word Association Norms as a Unique Supplement of Traditional Language Resources
D
DAML+OIL	Ontology Evaluation Functionalities of RDF(S), DAML+OIL, and OWL Parsers and Ontology Platforms
Danish	A Corpus-based Syntactic Lexicon for Adverbs
	A Danish Lexicon Resource - Ready for Applications
	A Flexible Language Acquisition Tool Kit for Natural Language Processing
	A Named Entity Recognizer for Danish
	Evaluation of a Multimodal Dialogue System for Small-screen Devices
	Human Language Technology Elements in a Knowledge Organisation System - The VID project
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	The Bilingual Web Dictionary on Demand
Dutch	Automatic Phonemic Labeling and Segmentation of Spoken Dutch
	Automatic Sentence Simplification for Subtitling in Dutch and English
	Discarding noise in an automatically acquired lexicon of support verb constructions
	Evaluating Multimodal NLG using Production Experiments
	Evaluation and Adaptation of the Celex Dutch Morphological Database
	Improving Automatic Phonetic Transcription of Spontaneous Speech through Variant-Based Pronunciation Variation Modelling
	Intelligent Building of Language Resources for HLT Applications
	Linguistic annotation of the Spoken Dutch Corpus: If we had to do it all over again ...
	On the Usefulness of Large Spoken Language Corpora for Linguistic Research
	Putting the Dutch PAROLE Corpus to Work
	Reusable Lexical Representations for Idioms
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	Term Translations in Parallel Corpora: Discovery and Consistency Check
	The Centre for Dutch Language and Speech Technology (TST Centre)
	The COST278 pan-European Broadcast News Database
	The Influence of the Labeller’s Regional Background on Phonetic Transcriptions: Implications for the Evaluation of Spoken Language Resources
	The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO
	The Integrated Language Database of 8th - 21st-Century Dutch
	The new Dutch-Flemish HLT Programme: a concerted effort to stimulate the HLT sector
	Use and Evaluation of Prosodic Annotations in Dutch
	Using a Parallel Transcript/Subtitle Corpus for Sentence Compression
	Using large multi-purpose corpora for specific research questions: discourse phenomena related to wh-questions in the Spoken Dutch Corpus
Dutch (historical)	The Integrated Language Database of 8th - 21st-Century Dutch
E
Ega	Securing Interpretability: The Case of Ega Language Documentation
Ega	WALA: a multilingual resource repository for West African Languages
EL	Multimodal Multilingual Resources in the Subtitling Process
EN	Multimodal Multilingual Resources in the Subtitling Process
English	A Chatbot as a Novel Corpus Visualization Tool
	A Comparative Study on Human Communication Behaviors and Linguistic Characteristics for Speech-to-Speech Translation
	A comparison of summarisation methods based on term specificity estimation
	A Comparison of Two Variant Corpora: The Same Content with Different Sources
	A Critical Survey of the Methodology for IE Evaluation
	A Domain-Independent Approach to IE Rule Development
	A Fine-Grained Evaluation Method for Speech-to-Speech Machine Translation Using Concept Annotations
	A Flexible Language Acquisition Tool Kit for Natural Language Processing
	A Framework for Evaluating the Suitability of Non-English Corpora for Language Engineering
	A Framework for Temporal Resolution
	A Freely Available Automatically Generated Thesaurus of Related Words
	A General-Purpose off-the-shelf Anaphora Resolution Module: Implementation and Preliminary Evaluation
	A Grammar and Style Checker Based on Internet Searches
	A Labelled Corpus for Prepositional Phrase Attachment
	A Large-Scale Resource for Storing and Recognizing Technical Terminology
	A Lexicon Module for a Grammar Development Environment
	A Methodology and Associated Tools for Building Interlingual Wordnets
	A Multilingual Database of Idioms
	A Multi-Modal Documentation System for Warao
	A natural language approach to information management: tracking scientific advances through the structure of words
	A New ITU-T Recommendation on the Evaluation of Telephone-Based Spoken Dialogue Systems
	A pattern extraction workbench combining multiple linguistic levels
	A powerful and versatile XML format for representing role-semantic annotation
	A practical competition of different filters used in automatic term extraction
	A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards
	A Public Reference Implementation of the RAP Anaphora Resolution Algorithm
	A Similarity Measure for Unsupervised Semantic Disambiguation
	A Suite of Tools for Marking Up Textual Data for Temporal Text Mining Scenarios
	A word alignment system based on a translation equivalence extractor
	A2Q: an agent-based architecture for multilingual Q&A
	Abstracting a Dialogue Act Tagset for Meeting Processing
	Acquiring Bayesian Networks from Text
	Acquiring Reusable Multilingual Phonotactic Resources
	Adding Syntactic Annotations to Transcripts of Parent-Child Dialogs
	Agreement in Human Factoid Annotation for Summarization Evaluation
	ALLES: Integrating NLP in ICALL Applications
	An Analysis of the Relative Difficulty of Reuters-21578 Subsets
	An Annotation Scheme for Information Status in Dialogue
	An argumentative annotation schema for meeting discussions
	An Automatic Method for Constructing Domain-Specific Ontology Resources
	An Emerging Transcontinental Collaborative Research and Education Agenda in Human Language Technologies
	An Information Repository Model for Advanced Question Answering Systems
	Annotating a corpus for building a domain-specific knowledge base
	Annotating Noun Argument Structure for NomBank
	Annotation of anaphoric expressions in an aligned bilingual corpus
	Annotation Of Coreference Relations Among Linguistic Expressions And Images In Biological Articles
	Annotation Tools for Large-Scale Corpus Development: Using AGTK at the Linguistic Data Consortium
	Application of the BLEU Method for Evaluating Free-text Answers in an E-learning Environment
	Augmenting Manual Dictionaries for Statistical Machine Translation Systems
	Automatic Acquisition of Paradigmatic Relations using Iterated Co-occurrences
	Automatic Acquisition of Sense Examples using ExRetriever
	Automatic Bilingual Lexicon Acquisition Using Random Indexing of Aligned Bilingual Data
	Automatic Building Gazetteers of Co-referring Named Entities
	Automatic Classification of Geographical Named Entities
	Automatic Generation of Glosses in the OntoLearn System
	Automatic Keyword Extraction from Spoken Text. A Comparison of two Lexical Resources: the EDR and WordNet
	Automatic Language-Independent Induction of Gazetteer Lists
	Automatic Sentence Simplification for Subtitling in Dutch and English
	Automatic transformation of phrase treebanks to dependency trees
	Automatic Translation Memory Fuzzy Match Post-Editing: A Step beyond Traditional TM/MT Integration
	Bayesian Semantics Incorporation to Web Content for Natural Language Information Retrieval
	Beyond TREC's Filtering Track
	BootCaT: Bootstrapping Corpora and Terms from the Web
	Building a Maritime Domain Lexicon: a Few Considerations on the Database Structure and the Semantic Coding
	Building and Using a Corpus of Shallow Dialog Annotated Meetings
	Building Part-of-speech Corpora through Histogram Hopping
	Calibrating Resource-light Automatic MT Evaluation: A Cheap Approach to Ranking MT Systems by the Usability of their Output
	Can Anaphoric Definite Descriptions be Replaced by Pronouns?
	Categorizing Web Pages as a Preprocessing Step for Information Extraction
	CHeM: A System for the Automatic Analysis of e-mails in the Restoration and Conservation Domain
	Cluster Analysis and Classification of Named Entities
	Clustering Concept Hierarchies from Text
	CoGesT: A Formal Transcription System for Conversational Gesture
	Collection of SLR in the Asian-Pacific area
	Collocation Extraction Using Web Statistics
	Combining Heterogeneous Lexical Resources
	Comparative Evaluation Of A Stochastic Parser On Semantic And Syntactic-Semantic Labels
	Computing Reliability for Coreference Annotation
	Concept Creation in Lexical Ontologies
	Connector Usage in the English Essay Writing of Japanese EFL Learners
	Consistent Storage of Metadata in Inference Lexica: The MetaLex Approach
	Constructing Word-Sense Association Networks from Bilingual Dictionary and Comparable Corpora
	Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004
	Converting Treebank Annotations to Language Neutral Syntax
	Creation of a Doctor-Patient Dialogue Corpus Using Standardized Patients
	Creation of reusable components and language resources for Named Entity Recognition in Russian
	Cost-effective cross-lingual document classification
	Cross-Language Acquisition of Semantic Models for Verbal Predicates
	CST Bank: A Corpus for the Study of Cross-document Structural Relationships
	Data Driven Ontology Evaluation
	Definition, dictionaries and tagger for Extended Named Entity Hierarchy
	Designing a Realistic Evaluation of an End-to-end Interactive Question Answering System
	Detecting Errors in English Article Usage with a Maximum Entropy Classifier Trained on a Large, Diverse Corpus
	Detection of Domain Specific Terminology Using Corpora Comparison
	Development of Bilingual Domain-Specific Ontology for Automatic Conceptual Indexing
	Development of Ontologies with Minimal Set of Conceptual Relations
	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
	Enriching a Thai Lexical Database with Selectional Preferences
	Enriching WordNet Via Generative Metonymy and Creative Polysemy
	EuroWordNet as a Resource for Cross-language Information Retrieval
	Evaluating Conversation with Hans Christian Andersen
	Evaluating Factors Impacting the Accuracy of Forced Alignments in a Multimodal Corpus
	Evaluating Lexical Resources for A Semantic Tagger
	Evaluating Name-Matching for Coreference Resolution
	Evaluating Variants of the Lesk Approach for Disambiguating Words
	Evaluation and Adaptation of a Specialised Language Checking Tool for Non-specialised Machine Translation and Non-expert MT Users for Multi-lingual Telecooperation
	Evaluation of Cross-Language Information Retrieval Using the Domain-Specific GIRT Data as Parallel German-English Corpus
	Evaluation of Different Similarity Measures for the Extraction of Multiword Units in a Reinforcement Learning Environment
	Evaluation of Multi-party Virtual Reality Dialogue Interaction
	Evaluation of Transcription and Annotation Tools for a Multi-modal, Multi-party Dialogue Corpus
	Evaluation Resources for Concept-based Cross-Lingual Information Retrieval in the Medical Domain
	Exploiting Anchor Text as a Lexical Resource
	Exploiting Language Resources for Semantic Web Annotations
	Exploiting Semantic Web Technologies for Intelligent Access to Historical Documents
	Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing
	Exploring Portability of Syntactic Information from English to Basque
	Extending a verb-lexicon using a semantically annotated corpus
	Extending WordNets to Implicit Information
	FreeLing: An Open-Source Suite of Language Analyzers
	French-English multi-word term alignment based on lexical context analysis
	Frequent Term Distribution Measures for Dataset Profiling
	How Does Automatic Machine Translation Evaluation Correlate With Human Scoring as the Number of Reference Translations Increases?
	How to Disassemble Alphabetical Processions - Morphological Treatment of Unknown Words
	Human dialogue modelling using annotated corpora
	Identifying Definitions in Text Collections for Question Answering
	Improving Collocation Extraction for High Frequency Words
	Incremental Knowledge Acquisition from WordNet and EuroWordNet
	Incremental Methods to Select Test Sentences for Evaluating Translation Ability
	Information Retrieval System Using Latent Contextual Relevance
	INSPIRE: Evaluation of a Smart-Home System for Infotainment Management and Device Control
	Integrated Language Technologies for Multilingual Information Services in the MEMPHIS Project
	Issues in Corpus Development for Multi-party Multi-modal Task-oriented Dialogue
	Language Model Adaptation for Statistical Machine Translation based on Information Retrieval
	Large Scale Experiments for Semantic Labeling of Noun Phrases in Raw Text
	Linguistic Corpus Search
	Linguistic Resources for Effective, Affordable, Reusable Speech-to-Text
	MEAD - A Platform for Multidocument Multilingual Text Summarization
	Meaningful Clusters
	Mercedes, A Term-In-Context Highlighter
	Mining the Web for Discourse Markers
	Modelling Legitimate Translation Variation for Automatic Evaluation of MT Quality
	MT Goes Farming: Comparing Two Machine Translation Approaches on a New Domain
	MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
	Multi-Document Summarization using Multiple-Sequence Alignment
	Multilingual Corpus-based Approach to the Resolution of English -ing
	Multi-lingual Evaluation of a Natural Language Generation System
	Multilingual Pattern Libraries for Question Answering: a Case Study for Definition Questions
	Multimodal Meaning Representation for Generic Dialogue Systems Architectures
	NameNet: A Self-Improving Resource for Name Classification
	N-Gram Language Modeling for Robust Multi-Lingual Document Classification
	NLP-enhanced Content Filtering within the POESIA Project
	OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations
	Open Resources for Language Technology
	Open-source Tools for Creation, Maintenance, and Storage of Lexical Resources for Language Generation from Ontologies
	OrienTel - Telephony Databases Across Northern Africa and the Middle East
	Parsing Ungrammatical Input: An Evaluation Procedure
	Part-of-Speech Annotation of Biology Research Abstracts
	Polysemy and Category Structure in WordNet: An Evidential Approach
	Prague Czech-English Dependency Treebank, Syntactically Annotated Resources for Machine Translation
	Pronominal Anaphora Resolution for Unrestricted Text
	Proper Names and Polysemy: from a Lexicographic Experience
	Publicly Available Topic Signatures for all WordNet Nominal Senses
	Querying both time-aligned and hierarchical corpora with NXT Search
	Raising the Bar: Stacked Conservative Error Correction Beyond Boosting
	Resources and Techniques for Multilingual Information Extraction
	Resources for Place Name Analysis
	Reusable Lexical Representations for Idioms
	Re-using high-quality resources for continued evaluation of automated summarization systems
	RevisionBank: A Resource for Revision-based Multi-document Summarization and Evaluation
	Road-testing the English Resource Grammar over the British National Corpus
	SALA II across the finish line: a large collection of mobile telephone speech databases from North and Latin America completed
	Selecting the Correct English Synset for a Spanish Sense
	Semi-Automatic Construction of a Question Treebank
	Semi-automatic Syntactic and Semantic Corpus Annotation with a Deep Parser
	Sinica BOW (Bilingual Ontological Wordnet): Integration of Bilingual WordNet and SUMO
	Some Meaning Procedures of Ontological Semantics
	Spanish WordNet 1.6: Porting the Spanish WordNet Across Princeton Versions
	Speech & Expression - The Value of a Longitudinal Corpus
	Steps towards Semantically Annotated Language Resources
	Summarization of Multimodal Information
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	Term Translations in Parallel Corpora: Discovery and Consistency Check
	Test Collections for Patent-to-Patent Retrieval and Patent Map Generation in NTCIR-4 Workshop
	Text Corpora, Local Grammars and Prediction
	Textual Distraction as a Basis for Evaluating Automatic Summarisers
	The AAC [Austrian Academy Corpus] An Enterprise to Develop Large Electronic Text Corpora
	The Automatic Content Extraction (ACE) Program - Tasks, Data, and Evaluation
	The Bilingual Web Dictionary on Demand
	The Corpógrafo – a Web-based environment for corpora research
	The Cross-Breeding of Dictionaries
	The DeepThought Core Architecture Framework
	The Effect of Bias on an Automatically-built Word Sense Corpus
	The Fisher Corpus: A Resource for the Next Generations of Speech-to-Text
	The GENOMA-KB Platform: Queries Over Integrated Linguistic Resources
	The GENOMA-KB project: towards the integration of concepts, terms, textual corpora and entities
	The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO
	The Italian NESPOLE! Corpus: A Multilingual Database with Interlingua Annotation in Tourism and Medical Domains
	The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
	The MULI Project: Annotation and Analysis of Information Structure in German and English
	The NIST Meeting Room Pilot Corpus
	The OLISSIPO and LECTIO Projects
	The overview of the SST speech corpus of Japanese learner English and evaluation through the experiment on automatic detection of learners' errors
	The Penn Discourse Treebank
	The Rationale for Building Resources Expressly for NLP
	The Role of MultiWord Terminology in Knowledge Management
	The Translation Correction Tool: English-Spanish User Studies
	Tiered Tagging Revisited
	Tone-of-Voice and Controlled Language Techniques
	Top Ontology as a Tool for Semantic Role Tagging
	Towards basic categories for describing properties of texts in a corpus
	Towards the MEANING Top Ontology: Sources of Ontological Meaning
	Training a Sentence-Level Machine Translation Confidence Measure
	Unsupervised Text Mining for Ontology Extraction: An Evaluation of Statistical Measures
	Using Paradigm Tables to Generate New Utterances Similar to those Existing in Linguistic Resources
	Using the NITE XML Toolkit on the Switchboard Corpus to Study Syntactic Choice: A Case Study
	Using the Penn Treebank to Evaluate Non-Treebank Parsers
	Using the Web as a Corpus for the Syntactic-Based Collocation Identification
	Using Weighted Abduction to Align Term Variant Translations in Bilingual Texts
	Using WordNet to Measure Semantic Orientations of Adjectives
	Utilization of Multiple Language Resources for Robust Grammar-Based Tense and Aspect Classification
	Utilizing the One-Sense-per-Discourse Constraint for Fully Unsupervised Word Sense Induction and Disambiguation
	Why do you ignore me? - Proof that not all direct speech is bad
	Word Association Norms as a Unique Supplement of Traditional Language Resources
	Word Sense Disambiguation as a Wordnets' Validation Method in Balkanet
	Word Sense Disambiguation Using Random Indexing
	'You stupid tin box' - children interacting with the AIBO robot: A cross-linguistic emotional speech corpus
	Semi-automatic Acquisition of Command Grammar
English (in scientific texts)	An Annotation Scheme for a Rhetorical Analysis of Biology Articles
English (U.S., Belize)	Developing Language Resources for a Transnational Digital Government System
Estonian	MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	Tiered Tagging Revisited
F
Farsi	Creation of a Doctor-Patient Dialogue Corpus Using Standardized Patients
Farsi	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
Finnish	Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
	Infrastructure for Collaborative Annotation of Speech
FR	Multimodal Multilingual Resources in the Subtitling Process
French	A Chatbot as a Novel Corpus Visualization Tool
	A complete understanding speech system based on semantic concepts
	An Evaluation Protocol For Text Mining Tools : ALCESTE SAS TEXT MINER SPAD-CRM AND TEMIS Text Mining Solutions Testing
	Annotation of anaphoric expressions in an aligned bilingual corpus
	Automatic audio and manual transcripts alignment, time-code transfer and selection of exact transcripts
	Automatisation Of The Activity Of Term Collection In Different Languages
	Building Part-of-speech Corpora through Histogram Hopping
	Calibrating Resource-light Automatic MT Evaluation: A Cheap Approach to Ranking MT Systems by the Usability of their Output
	Collecting and Sharing Bilingual Spontaneous Speech Corpora: the ChinFaDial Experiment
	Development of New Telephone Speech Databases for French: The NEOLOGOS Project
	Enriching a French Treebank
	Evaluating an Authentic Audio-Visual Expressive Speech Corpus
	Evaluation Of A Speech Cuer: From Motion Capture To A Concatenative Text-To-Cued Speech System
	Evaluation of Consensus on the Annotation of Prosodic Breaks in the Romance Corpus of Spontaneous Speech “C-ORAL-ROM”
	Experiments on Building Language Resources for Multi-Modal Dialogue Systems
	French-English multi-word term alignment based on lexical context analysis
	Generating Coreferential Descriptions from a Structured Model of the Context
	Intelligent Building of Language Resources for HLT Applications
	Language Modeling using Dynamic Bayesian Networks
	Measurements of Spoken Language Variability in a Multilingual Corpus. Predictable Aspects
	MED-TYP: A Typological Database for Mediterranean Languages
	Metaphors in Wordnets: from Theory to Practice
	Methodology For Building Thematic Indexes In Medicine For French
	Modelling Legitimate Translation Variation for Automatic Evaluation of MT Quality
	Morphology Based Automatic Acquisition of Large-coverage Lexica
	Multilingual Corpus-based Approach to the Resolution of English -ing
	NLP-enhanced Content Filtering within the POESIA Project
	OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations
	OrienTel - Telephony Databases Across Northern Africa and the Middle East
	Resources and Techniques for Multilingual Information Extraction
	SALA II across the finish line: a large collection of mobile telephone speech databases from North and Latin America completed
	Semi-automatic Acquisition of Command Grammar
	Semi-Automatic Derivation of a French Lexicon from CLIPS
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	Term Translations in Parallel Corpora: Discovery and Consistency Check
	The Bilingual Web Dictionary on Demand
	The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages
	The ESTER Evaluation Campaign for the Rich Transcription of French Broadcast News
	The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO
	The Italian NESPOLE! Corpus: A Multilingual Database with Interlingua Annotation in Tourism and Medical Domains
	Using the Web as a Corpus for the Syntactic-Based Collocation Identification
	Using Weighted Abduction to Align Term Variant Translations in Bilingual Texts
French Sign Language	Toward an Annotation Software for Video of Sign Language, Including Image Processing Tools and Signing Space Modelling
Friulan	MED-TYP: A Typological Database for Mediterranean Languages
G
Gaelic	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
Galician	A Galician Textual Corpus for Morphosyntactic Tagging with Application to Text-to-Speech Synthesis
	Parallel corpora for the Galician language: building and processing of the CLUVI (Linguistic Corpus of the University of Vigo)
	The COST278 pan-European Broadcast News Database
	Transcrigal: A Bilingual System for Automatic Indexing of Broadcast News
General	Distributional Consistency: As a General Method for Defining a Core Lexicon
German	The BITS Speech Synthesis Corpus for German
	A High Quality Partial Parser for Annotating German Text Corpora
	A powerful and versatile XML format for representing role-semantic annotation
	A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards
	ALLES: Integrating NLP in ICALL Applications
	An Annotated Corpus of Tutorial Dialogs on Mathematical Theorem Proving
	An Annotated German-Language Medical Text Corpus as Language Resource
	Annotating a corpus for building a domain-specific knowledge base
	Automated Morphological Segmentation and Evaluation
	Automatic Acquisition of Paradigmatic Relations using Iterated Co-occurrences
	Automatic Bilingual Lexicon Acquisition Using Random Indexing of Aligned Bilingual Data
	Automatic Methods to Supplement Broad-Coverage Subcategorization Lexicons
	Automatic transformation of phrase treebanks to dependency trees
	Automatic Translation Memory Fuzzy Match Post-Editing: A Step beyond Traditional TM/MT Integration
	Automatisation Of The Activity Of Term Collection In Different Languages
	Bootstrapping a database of German multi-word expressions
	CoGesT: A Formal Transcription System for Conversational Gesture
	Consistent Storage of Metadata in Inference Lexica: The MetaLex Approach
	Corpus based Enrichment of GermaNet Verb Frames
	Corpus-based Learning of Lexical Resources for German Named Entity Recognition
	Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
	Development and Integration of the LDA-Toolkit into the COST249 SpeechDat (II) SIG Reference Recognizer
	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
	Evaluation and Adaptation of a Specialised Language Checking Tool for Non-specialised Machine Translation and Non-expert MT Users for Multi-lingual Telecooperation
	Evaluation of Cross-Language Information Retrieval Using the Domain-Specific GIRT Data as Parallel German-English Corpus
	Evaluation of Microphone Array Front-Ends for ASR - an Extension of the AURORA Framework
	Exploiting Coreference Annotations for Text-to-Hypertext Conversion
	How to Disassemble Alphabetical Processions - Morphological Treatment of Unknown Words
	Identifying Morphosyntactic Preferences in Collocations
	Integrated Language Technologies for Multilingual Information Services in the MEMPHIS Project
	Intelligent Building of Language Resources for HLT Applications
	Linguistic Corpus Search
	MAUS Goes Iterative
	Metaphors in Wordnets: from Theory to Practice
	Multilingual Corpus-based Approach to the Resolution of English -ing
	N-Gram Language Modeling for Robust Multi-Lingual Document Classification
	OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations
	OrienTel - Telephony Databases Across Northern Africa and the Middle East
	Querying both time-aligned and hierarchical corpora with NXT Search
	Resources and Techniques for Multilingual Information Extraction
	Rethinking readability of digital editions - the case of the AAC’s "Digital Brenner"
	SMOR: A German Computational Morphology Covering Derivation, Composition, and Inflection
	Speech recognition simulation and its application for Wizard of Oz experiments
	Steps towards Semantically Annotated Language Resources
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	The AAC [Austrian Academy Corpus] An Enterprise to Develop Large Electronic Text Corpora
	The COST 278 MASPER initiative - crosslingual speech recognition with large telephone databases
	The DeepThought Core Architecture Framework
	The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO
	The Italian NESPOLE! Corpus: A Multilingual Database with Interlingua Annotation in Tourism and Medical Domains
	The MULI Project: Annotation and Analysis of Information Structure in German and English
	The Statistical Analysis of Morphosyntactic Distributions
	The TüBa-D/Z Treebank: Annotating German with a Context-Free Backbone
	Tools for Upgrading Printed Dictionaries by Means of Corpus-based Lexical Acquisition
	Towards a Dynamic Lexicon: Predicting the Syntactic Argument Structure of Complex Verbs
	Unexpected Productions May Well be Errors
	'You stupid tin box' - children interacting with the AIBO robot: A cross-linguistic emotional speech corpus
	Pumping Documents Through a Domain and Genre Classification Pipeline
German (Deutsch)	Evaluation Resources for Concept-based Cross-Lingual Information Retrieval in the Medical Domain
German (Deutsch)	Towards Ontology Engineering Based on Linguistic Analysis
Greek	A Bayesian Model for Shallow Syntactic Parsing of Natural Language Texts
	A Methodology and Associated Tools for Building Interlingual Wordnets
	Bayesian Semantics Incorporation to Web Content for Natural Language Information Retrieval
	Bypassing Greeklish!
	Corpus Design, Recording and Phonetic Analysis of Greek Emotional Database
	Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
	Cypriot Speech Database: Data Collection and Greek to Cypriot Dialect Adaptation
	Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing
	Handling Subtle Sense Distinctions through Wordnet Semantic Types
	Learning to predict Pitch Accents using Bayesian Belief Networks for Greek Language
	Multi-lingual Evaluation of a Natural Language Generation System
	OrienTel - Telephony Databases Across Northern Africa and the Middle East
	Reusing Language Resources for Speech Applications involving Emotion
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	The COST278 pan-European Broadcast News Database
H
Hebrew	Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
Hebrew	OrienTel - Telephony Databases Across Northern Africa and the Middle East
Hindi	Automatic Generation of Compound Word Lexicon for Hindi Speech Synthesis
	Automatic Language-Independent Induction of Gazetteer Lists
	Collection of SLR in the Asian-Pacific area
	Information Extraction from Hindi Texts
Hungarian	Combining symbolic and statistical methods in morphological analysis and unknown word guessing
	Creating open language resources for Hungarian
	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	The COST 278 MASPER initiative - crosslingual speech recognition with large telephone databases
	Tiered Tagging Revisited
I
Ibibio	WALA: a multilingual resource repository for West African Languages
Iko	WALA: a multilingual resource repository for West African Languages
Independent	Towards A Language Infrastructure for the Semantic Web
Indic Scripts	An XML Representation for Annotated Handwriting Datasets for Online Handwriting Recognition
Indic Scripts	Experiences in Collection of Handwriting Data for Online Handwriting Recognition in Indic Scripts
Irish	Acquiring Reusable Multilingual Phonotactic Resources
Italian	A2Q: an agent-based architecure for multilingual Q&A
	Automatisation Of The Activity Of Term Collection In Different Languages
	BootCaT: Bootstrapping Corpora and Terms from the Web
	Building a Large Grammar for Italian
	Building a Maritime Domain Lexicon: a Few Considerations on the Database Structure and the Semantic Coding
	Building Distributed Language Resources by Grid Computing
	CHeM: A System for the Automatic Analysis of e-mails in the Restoration and Conservation Domain
	Computational Lexicography and Carlo Emilio Gadda, Principe dell'Analisi e Duca della Buona Cognizione
	Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
	Cross-Language Acquisition of Semantic Models for Verbal Predicates
	Discovery of (New) Knowledge and the Analysis of Text Corpora
	Evaluation of Consensus on the Annotation of Prosodic Breaks in the Romance Corpus of Spontaneous Speech “C-ORAL-ROM”
	How to Disassemble Alphabetical Processions - Morphological Treatment of Unknown Words
	Hybrid Constraints for Robust Parsing: First Experiments and Evaluation
	Integrated Language Technologies for Multilingual Information Services in the MEMPHIS Project
	Introducing the La Repubblica Corpus: A large Annotated TEI(XML)-Compliant Corpus of Newspaper Italian
	Measurements of Spoken Language Variability in a Multilingual Corpus. Predictable Aspects
	MED-TYP: A Typological Database for Mediterranean Languages
	Metaphors in Wordnets: from Theory to Practice
	Multilingual Pattern Libraries for Question Answering: a Case Study for Definition Questions
	NLP-enhanced Content Filtering within the POESIA Project
	OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations
	Proper Names and Polysemy: from a Lexicographic Experience
	Semantic Mark-up of Italian Legal Texts Through NLP-based Techniques
	Semi-Automatic Derivation of a French Lexicon from CLIPS
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	Term Translations in Parallel Corpora: Discovery and Consistency Check
	The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages
	The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO
	The Italian NESPOLE! Corpus: A Multilingual Database with Interlingua Annotation in Tourism and Medical Domains
	Towards the MEANING Top Ontology: Sources of Ontological Meaning
	Unifying Lexicons in View of a Phonological and Morphological Lexical DB
	Using cooccurrence statistics and the web to discover synonyms in a technical language
	Using PiTagger for Lemmatization and PoS Tagging of a Spontaneous Speech Corpus: C-ORAL-ROM Italian
	Using Semantic Language Resources to Support Textual Inference for Question Answering
J
Japanese	A Comparative Study on Human Communication Behaviors and Linguistic Characteristics for Speech-to-Speech Translation
	A Comparison of Two Variant Corpora: The Same Content with Different Sources
	A Lexicon Module for a Grammar Development Environment
	An Information Repository Model for Advanced Question Answering Systems
	Automatic Extraction of Hyponyms from Japanese Newspapers Using Lexico-syntactic Patterns
	Building a Paraphrase Corpus for Speech Translation
	Classification of Japanese Spatial Nouns
	Collecting Spontaneously Spoken Queries for Information Retrieval
	Comparison of some automatic and manual methods for summary evaluation based on the Text Summarization Challenge 2
	Concept-based queries: Combining and Reusing Linguistic Corpus Formats and Query Languages
	Consistent Storage of Metadata in Inference Lexica: The MetaLex Approach
	Constructing Word-Sense Association Networks from Bilingual Dictionary and Comparable Corpora
	Co-reference in Japanese Task-oriented Dialogues: A Contribution to the Development of Language-specific and Language-general Annotation Schemes and Resources
	Definition, dictionaries and tagger for Extended Named Entity Hierarchy
	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
	Evaluating the FOKS Error Model
	Extraction of Hyperonymy of Adjectives from Large Corpora by Using the Neural Network Model
	How Does Automatic Machine Translation Evaluation Correlate With Human Scoring as the Number of Reference Translations Increases?
	Incremental Methods to Select Test Sentences for Evaluating Translation Ability
	Korean-Chinese-Japanese Multilingual Wordnet with Shared Semantic Hierarchy
	Making an XML-based Japanese-Slovene Learners' Dictionary
	Multilingual Corpus-based Approach to the Resolution of English -ing
	Perceptual Evaluation of Quality Deterioration Owing to Prosody Modification
	Phrase-Based Dependency Evaluation of a Japanese Parser
	Related Word-pairs Extraction without Dictionaries
	Semi-supervised learning by Fuzzy clustering and Ensemble learning
	Speech & Expression - The Value of a Longitudinal Corpus
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	Terminal Device Oriented Comparable Corpora and its Alignment -- Towards Extracting Paraphrasing Patterns --
	Test Collections for Patent-to-Patent Retrieval and Patent Map Generation in NTCIR-4 Workshop
	Toward Text Understanding: Integrating Relevance-tagged Corpus and Automatically Constructed Case Frames
	Collection of SLR in the Asian-Pacific area
K
Korean	A Comparison of Two Variant Corpora: The Same Content with Different Sources
	A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards
	Collection of SLR in the Asian-Pacific area
	Creation and Assessment of Korean Speech and Noise DB in Car Environment
	Korean-Chinese-Japanese Multilingual Wordnet with Shared Semantic Hierarchy
	Lexical Analysis of Agglutinative Languages Using a Dictionary of Lemmas and Lexical Transducers
	Sejong Korean Corpora in the Making
	Test Collections for Patent-to-Patent Retrieval and Patent Map Generation in NTCIR-4 Workshop
L
Language independent	A Global Data Category Registry for Interoperable Language Resources
	Data Driven Ontology Evaluation
	Evaluation of Microphone Array Front-Ends for ASR - an Extension of the AURORA Framework
	Infrastructure for Collaborative Annotation of Speech
	Online Evaluation of Coreference Resolution
	Principles of a system for terminological concept modelling
	A Graphical Tool for Handling Rule Grammars in Java Speech Grammar Format
	A Search Tool for Corpora with Positional Tagsets and Ambiguities
	Highlighting latent structure in documents
	Linguistic Corpus Search
	Pumping Documents Through a Domain and Genre Classification Pipeline
	Standardization in Multimodal Content Representation: Some Methodological Issues
	SVMTool: A general POS tagger generator based on Support Vector Machines
	Towards an International Standard on Feature Structure Representation
Language-independent (multilingual interface)	An Environment for Dialogue Corpora Collection (ENDIACC)
Latvian	MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
Lithuanian	MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
M
Maltese	MED-TYP: A Typological Database for Mediterranean Languages
Mambila	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
Mandarin	A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards
	Collection of SLR in the Asian-Pacific area
	Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004
	Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	The Fisher Corpus: A Resource for the Next Generations of Speech-to-Text
	The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
Many	Current Projects in Languages of Military Interest at the Defense Language Institute
Maori	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
Mapudungun	Data Collection and Analysis of Mapudungun Morphology for Spelling Correction
Mexican Spanish	VOXMEX Speech Database: Design of a Phonetically Balanced Corpus
Modern Greek	MED-TYP: A Typological Database for Mediterranean Languages
Modern Greek in a multilingual context	Creating multi-purpose linguistic resources for Modern Greek: a deep Modern Greek Grammar
Modern Hebrew	MED-TYP: A Typological Database for Mediterranean Languages
Modern Standard Arabic	MED-TYP: A Typological Database for Mediterranean Languages
Moroccan Arabic	An Emerging Transcontinental Collaborative Research and Education Agenda in Human Language Technologies
Multilingual	OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations
	Rethinking Reusable Resources
	Semi-automatic UNL Dictionary Generation using WordNet.PT
Multilingual approach	Automatic Translation Memory Fuzzy Match Post-Editing: A Step beyond Traditional TM/MT Integration
Multilingual approach	Intelligent Building of Language Resources for HLT Applications
Multiple	NIST Language Technology Evaluation Cookbook
Multiple	SLR Validation: Current Trends and Developments
N
Nahuatl	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
Norwegian	A Lexicon Module for a Grammar Development Environment
Norwegian	Memory-based Classification of Proper Names in Norwegian
O
Old-Church Slavonic	Towards Intelligent Written Cultural Heritage Processing - Lexical Processing
OWL	Ontology Evaluation Functionalities of RDF(S), DAML+OIL, and OWL Parsers and Ontology Platforms
P
Persian	Creation of a Doctor-Patient Dialogue Corpus Using Standardized Patients
Polish	A Search Tool for Corpora with Positional Tagsets and Ambiguities
	Extraction of Polish Named-Entities
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
Portuguese	A Multilingual Database of Idioms
	An Efficient Word Confidence Measure Using Likelihood Ratio Scores
	Design and Implementation of a Semantic Search Engine for Portuguese
	Evaluating Solutions for the Rapid Development of State-of-the-Art POS taggers for Portuguese
	Evaluation of Consensus on the Annotation of Prosodic Breaks in the Romance Corpus of Spontaneous Speech “C-ORAL-ROM”
	Extending WordNets to Implicit Information
	INQUER: A WordNet-based Question-Answering Application
	Measurements of Spoken Language Variability in a Multilingual Corpus. Predictable Aspects
	Multifunctional Computational Lexicon of Contemporary Portuguese: An Available Resource for Multitype Applications
	On the problems of creating a golden standard of inflected forms in Portuguese
	Portuguese Large-scale Language Resources for NLP Applications
	Providing on-line access to Portuguese language resources: corpora and lexicons
	SALA II across the finish line: a large collection of mobile telephone speech databases from North and Latin America completed
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages
	The Corpógrafo – a Web-based environment for corpora research
	The COST278 pan-European Broadcast News Database
	The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO
	The Lácio-Web: Corpora and Tools to advance Brazilian Portuguese Language Investigations and Computational Linguistic Tools
	The Verb in the Terminological Collocations. Contribution to the Development of a Morphological Analyser. MorphoComp
	What is my Style? Using Stylistic Features of Portuguese Web Texts to classify Web pages according to Users' Needs
Portuguese (European)	An Acoustic Corpus Contemplating Regional Variation for Studies of European Portuguese Nasals
Potentially all	Using the Penn Treebank to Evaluate Non-Treebank Parsers
Provençal	MED-TYP: A Typological Database for Mediterranean Languages
Q
Q’anjob’al (Mayan Guatemala)	Applying Computational Linguistic Techniques in a Documentary Project for Q’anjob’al (Mayan Guatemala)
Quechua	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
R
RDF(S)	Ontology Evaluation Functionalities of RDF(S), DAML+OIL, and OWL Parsers and Ontology Platforms
Resian	MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
Romanian	A Methodology and Associated Tools for Building Interlingual Wordnets
	A word alignment system based on a translation equivalence extractor
	Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing
	MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	Tiered Tagging Revisited
	Word Sense Disambiguation as a Wordnets' Validation Method in Balkanet
Russian	A Flexible Language Acquisition Tool Kit for Natural Language Processing
	A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards
	Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004
	Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
	Creation of reusable components and language resources for Named Entity Recognition in Russian
	Development of Bilingual Domain-Specific Ontology for Automatic Conceptual Indexing
	Development of Ontologies with Minimal Set of Conceptual Relations
	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
	Integration of Russian Language Resources
	MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
	Russian Information Retrieval Evaluation Seminar
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	The AAC [Austrian Academy Corpus] An Enterprise to Develop Large Electronic Text Corpora
	The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
	Towards basic categories for describing properties of texts in a corpus
	Word Association Norms as a Unique Supplement of Traditional Language Resources
S
Sardinian	MED-TYP: A Typological Database for Mediterranean Languages
Scottish	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
Serbian	A Methodology and Associated Tools for Building Interlingual Wordnets
	Combining Heterogeneous Lexical Resources
	Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing
	MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	Towards the Use of Word Stems and Suffixes for Statistical Machine Translation
Serbo-Croatian	MED-TYP: A Typological Database for Mediterranean Languages
Slovak	The COST 278 MASPER initiative - crosslingual speech recognition with large telephone databases
Slovakian	The COST278 pan-European Broadcast News Database
Slovene	Making an XML-based Japanese-Slovene Learners' Dictionary
	MED-TYP: A Typological Database for Mediterranean Languages
	MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
	Tiered Tagging Revisited
Slovenian	A data-driven adaptation of prosody in a multilingual TTS
	Acquisition and Annotation of Slovenian Broadcast News Database
	Creating Slovenian Language Resources for Development of Speech-to-Speech Translation Components
	Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
	Development and Integration of the LDA-Toolkit into the COST249 SpeechDat (II) SIG Reference Recognizer
	Development of Slovenian Broadcast News Speech Database
	The COST 278 MASPER initiative - crosslingual speech recognition with large telephone databases
	The COST278 pan-European Broadcast News Database
Sotho	The African Speech Technology Project: An Assessment
South African English	The African Speech Technology Project: An Assessment
Spanish	A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards
	ALLES: Integrating NLP in ICALL Applications
	Application of the BLEU Method for Evaluating Free-text Answers in an E-learning Environment
	Automatically selecting domain markers for terminology extraction
	AV@CAR: A Spanish Multichannel Multimodal Corpus for In-Vehicle Automatic Audio-Visual Speech Recognition
	Bilingual Connections for Trilingual Corpora: An XML Approach
	Construction of a Bilingual Arabic-Spanish Lexicon of Verbs Based on a Parallel Corpus
	Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004
	Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
	Cross-effective cross-lingual document classification
	Cross-Language Acquisition of Semantic Models for Verbal Predicates
	Development and Integration of the LDA-Toolkit into the COST249 SpeechDat (II) SIG Reference Recognizer
	Development of Resources for a Bilingual Automatic Index System of Broadcast News in Basque and Spanish
	Enriching EWN with Syntagmatic Information by means of WSD
	Enriching the Spanish EuroWordNet by Collocations
	EuroWordNet as a Resource for Cross-language Information Retrieval
	Evaluation of Consensus on the Annotation of Prosodic Breaks in the Romance Corpus of Spontaneous Speech “C-ORAL-ROM”
	FreeLing: An Open-Source Suite of Language Analyzers
	Intelligent Building of Language Resources for HLT Applications
	Lexical Entry Templates for Robust Deep Parsing
	Measurements of Spoken Language Variability in a Multilingual Corpus. Predictable Aspects
	MED-TYP: A Typological Database for Mediterranean Languages
	Mercedes, A Term-In-Context Highlighter
	Methodology for Rapid Prototyping and Testing of ASR Based User Interfaces
	MiniCors and Cast3LB: Two Semantically Tagged Spanish Corpora
	Multilingual Corpus-based Approach to the Resolution of English -ing
	Multiple Sequence Alignment for characterizing the linear structure of revision
	NLP-enhanced Content Filtering within the POESIA Project
	NLP-enhanced error Checking for Catalan unrestricted text
	OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations
	SALA II across the finish line: a large collection of mobile telephone speech databases from North and Latin America completed
	Selecting the Correct English Synset for a Spanish Sense
	Semantic categorization of Spanish se-constructions
	Spanish WordNet 1.6: Porting the Spanish WordNet Across Princeton Versions
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
	The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages
	The COST 278 MASPER initiative - crosslingual speech recognition with large telephone databases
	The GENOMA-KB Platform: Queries Over Integrated Linguistic Resources
	The GENOMA-KB project: towards the integration of concepts, terms, textual corpora and entities
	The Integral Dictionary: An Ontological Resource for the Semantic Web Integration of EuroWordNet, Balkanet, TID and SUMO
	The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
	The SPARTACUS-Database: a Spanish Sentence Database for Offline Handwriting Recognition
	The Translation Correction Tool: English-Spanish User Studies
	Towards the MEANING Top Ontology: Sources of Ontological Meaning
	Towards the Use of Word Stems and Suffixes for Statistical Machine Translation
	Training a Sentence-Level Machine Translation Confidence Measure
	Transcrigal: A Bilingual System for Automatic Indexing of Broadcast News
	Translation memories enrichment by statistical bilingual segmentation
Spanish (Latin American)	Developing Language Resources for a Transnational Digital Government System
Swahili	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
Swedish	A pattern extraction workbench combining multiple linguistic levels
	Finding the Correct Interpretation of Swedish Compounds a Statistical Approach
	MT Goes Farming: Comparing Two Machine Translation Approaches on a New Domain
	Open Resources for Language Technology
	Probabilistic Detection of Context-Sensitive Spelling Errors
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
T
Tamil	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
Thai	Collection of SLR in the Asian-Pacific area
	Enriching a Thai Lexical Database with Selectional Preferences
	Open Collaborative Development of the Thai Language Resources for Natural Language Processing
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
Tibetan	A Syntactically Annotated Corpus of Tibetan
Turkish	A Methodology and Associated Tools for Building Interlingual Wordnets
	Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
	Development of a Corpus Workbench for the METU Turkish Corpus
	Duration Modeling for Turkish Text-to-Speech Synthesis System
	Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing
	MED-TYP: A Typological Database for Mediterranean Languages
	OrienTel - Telephony Databases Across Northern Africa and the Middle East
	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
Tzeltal	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
Tzotzil	Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
U
Universal	A Large Metadata Domain of Language Resources
	Architecture for Distributed Language Resource Management and Archiving
	Cross-Disciplinary Integration of Metadata Descriptions
	Design of an Interactive Web-based User Interface for Speech Database Query Formation
US-English	Bilingual Connections for Trilingual Corpora: An XML Approach
US-English	Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
V
Various Native American	An Emerging Transcontinental Collaborative Research and Education Agenda in Human Language Technologies
Vietnamese	Developping tools and building linguistic resources for Vietnamese morpho-syntactic processing
	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
	Spoken and Written Language Resources for Vietnamese
W
Warao	A Multi-Modal Documentation System for Warao
Warlpiri	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
X
Xhosa	The African Speech Technology Project: An Assessment
Z
Zeltal	Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report
Zulu	Software Tools for Morphological Tagging of Zulu Corpora and Lexicon Development
Zulu	The African Speech Technology Project: An Assessment
