LREC 2000 2nd International Conference on Language Resources & Evaluation  

Conference Sessions

Introductory Messages

Message of the Chairman of the Local Organising Committee:
Professor George Carayannis

Introduction of the Conference Chairman:
Professor Antonio Zampolli

Message from ELRA's CEO:
Khalid Choukri




VOLUME I

Panels

Researches for the Millennium
Catherine Macleod

Human Language Technology Resources for Central European Languages: European Integration Issues
Zygmunt Vetulani

Multilingual Content Encoding and Translation
Antonio Sanfilippo




Session WO1 - Corpus Tagging

Developing Guidelines and Ensuring Consistency for Chinese Text Annotation
Fei Xia, Martha Palmer, Nianwen Xue, Mary Ellen Okurowski, John Kovarik, Fu-Dong Chiou, Shizhe Huang, Tony Kroch, Mitch Marcus

Using Machine Learning Methods to Improve Quality of Tagged Corpora and Learning Models
Yuji Matsumoto, Tatsuo Yamashita

Bootstrapping a Tagged Corpus through Combination of Existing Heterogeneous Taggers
Jakub Zavrel, Walter Daelemans

Something Borrowed, Something Blue: Rule-based Combination of POS Taggers
Lars Borin


Session EO1 - Evaluation of Machine Translation

Determining the Tolerance of Text-handling Tasks for MT Output
John White, Jennifer Doyon, Susan Talbott

Evaluating Translation Quality as Input to Product Development
Niamh Bohan, Elisabeth Breidt, Martin Volk

An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research
Sonja Nießen, Franz Josef Och, Gregor Leusch, Hermann Ney


Session SO1 - Data Centers / Major Projects

Issues in Corpus Creation and Distribution: The Evolution of the Linguistic Data Consortium
Christopher Cieri, Mark Liberman

The Establishment of Motorola's Human Language Data Resource Center: Addressing the Criticality of Language Resources in the Industrial Setting
Jim Talley

A Platform for Dutch in Human Language Technologies
Elisabeth D'Halleweyn, Erwin Dewallef, Jeannine Beeken

Recent Developments within the European Language Resources Association (ELRA)
Khalid Choukri, Audrey Mance, Valerie Mapelli

COCOSDA - a Progress Report
Nick Campbell

Survey of Language Engineering Needs: a Language Resources Perspective
Jeffrey Allen, Khalid Choukri


Session WO2 - Treebanks

Building a Treebank for French
Anne Abeille, Lionel Clement, Alexandra Kinyon

Semantico-syntactic Tagging of Very Large Corpora: the Case of Restoration of Nodes on the Underlying Level
Eva Hajicova, Petr Sgall

Building a Treebank for Italian: a Data-driven Annotation Schema
Cristina Bosco, Vincenzo Lombardo, Daniela Vassallo, Leonardo Lesmo

A Treebank of Spanish and its Application to Parsing
Antonio Moreno, Ralph Grishman, Susana Lopez, Fernando Sanchez, Satoshi Sekine

Shallow Parsing and Functional Structure in Italian Corpora
Rodolfo Delmonte

An XML-based Representation Format for Syntactically Annotated Corpora
Andreas Mengel, Wolfgang Lezius


Session WO3 - Corpus Categorisation

Modern Greek Corpus Taxonomy
George Mikros, George Carayannis

Automatic Style Categorisation of Corpora in the Greek Language
George Tambouratzis, Stella Markantonatou, Nikolaos Hairetakis, George Carayannis

TyPTex: Inductive Typological Text Classification by Multivariate Statistical Analysis for NLP Systems Tuning/Evaluation
Helka Folch, Serge Heiden, Benoit Habert, Serge Fleury, Gabriel Illouz, Pierre Lafon, Julien Nioche, Sophie Prevost


Session WO4 - Reusability Issues

Language Resources as by-Product of Evaluation: The MULTITAG Example
Patrick Paroubek

Enabling Resource Sharing in Language Generation: an Abstract Reference Architecture
Lynne Cahill, Christy Doran, Roger Evans, Rodger Kibble, Chris Mellish, D. Paiva, Mike Reape, Donia Scott, Neil Tipper

Experiences of Language Engineering Algorithm Reuse
Bjorn Gamback, Fredrik Olsson


Session SO2 - Dialogue Evaluation Methods

Dialogue and Prompting Strategies Evaluation in the DEMON System
Carine-Alexia Lavelle, Martine De Calmes, Guy Perennou

Predictive Performance of Dialog Systems
H. Bonneau-Maynard, L. Devillers, S. Rosset

A Methodology for Evaluating Spoken Language Dialogue Systems and Their Components
Niels Ole Bernsen, Laila Dybkjær

Developing and Testing General Models of Spoken Dialogue System Performance
Marilyn Walker, Candace Kamm, Julie Boland


Session WO5 - Corpus Tools

A Framework for Cross-Document Annotation
David Day, Alan Goldschen, John Henderson

Providing Internet Access to Portuguese Corpora: the AC/DC Project
Diana Santos, Eckhard Bick

Annotating a Corpus to Develop and Evaluate Discourse Entity Realization Algorithms: Issues and Preliminary Results
Massimo Poesio

Using Few Clues Can Compensate the Small Amount of Resources Available for Word Sense Disambiguation
Claude de Loupy, Marc El-Beze


Session WO6 - Acquisition of Lexical Information

Learning Verb Subcategorization from Corpora: Counting Frame Subsets
Daniel Zeman, Anoop Sarkar

Tuning Lexicons to New Operational Scenarios
Roberto Basili, Maria Teresa Pazienza, Michele Vindigni, Fabio Massimo Zanzotto

A Flexible Infrastructure for Large Monolingual Corpora
Uwe Quasthoff, Christian Wolff

Automatic Generation of Dictionary Definitions from a Computational Lexicon
Penny Labropoulou, Elena Mantzari, Harris Papageorgiou, Maria Gavrilidou


Session SP1 - Phonetic Issues and Speech Synthesis

MHATLex: Lexical Resources for Modelling the French Pronunciation
Guy Perennou, Martine De Calmes

PLEDIT - A New Efficient Tool for Management of Multilingual Pronunciation Lexica and Batchlists
Damjan Vlaj, Janez Kaiser, Ralph Wilhelm, Ute Ziegenhain

Object-oriented Access to the Estonian Phonetic Database
Einar Meister, Arvo Eek, Toomas Altosaar, Martti Vainio

A French Phonetic Lexicon with Variants for Speech and Language Processing
Philippe Boula de Mareuil, Christophe d'Alessandro, Francois Yvon, Veronique Auberge, Jacqueline Vaissiere, Angelique Amelot

A Computational Platform for Development of Morphologic and Phonetic Lexica
Matej Rojc, Zdravko Kacic

An Optimised FS Pronunciation Resource Generator for Highly Inflecting Languages
Dafydd Gibbon, Ana Paula Quirino Simoes, Martin Matthiesen

Design Methodology for Bilingual Pronunciation Dictionary
Jong-mi Kim

Labeling of Prosodic Events in Slovenian Speech Database GOPOLIS
France Mihelic, Jerneja Gros, Elmar Noth, Volker Warnke

Regional Pronunciation Variants for Automatic Segmentation
Nicole Beringer, Marcia Neff

Le Programme Compalex (COMPAraison LEXicale)
Josue Ndamba, Jean Silence Bayamboussa

Perceptual Evaluation of Text-to-Speech Implementation of Enclitic Stress in Greek
Stavroula-Evita Fotinea, Athanassios Protopapas, Dimitris Dimitriadis, George Carayannis

Etude et Evaluation de la Di-Syllabe comme Unite Acoustique pour le Systeme de Synthese Arabe PARADIS
N. Chenfour, A. Benabbou, A. Mouradi

Design of Optimal Slovenian Speech Corpus for Use in the Concatenative Speech Synthesis System
Matej Rojc, Zdravko Kacic


Session WP1 - Lexicon

The Bank of Swedish
Martin Gellerstam, Yvonne Cederholm, Torgny Rasmark

The Multi-layer Language Knowledge Base of Chinese NLP
Hu Junfeng, Yu Shiwen

Producing LRs in Parallel with Lexicographic Description: the DCC project
Joan Soler i Bou

Some Language Resources and Tools for Computational Processing of Portuguese at INESC
Luzia Wittmann, Ricardo Daniel Ribeiro, Tania Pego, Fernando Batista

Screffva: A Lexicographer's Workbench
Jon Mills

The Concede Model for Lexical Databases
Tomaz Erjavec, Roger Evans, Nancy Ide, Adam Kilgarriff

Automatically Expansion of Thesaurus Entries with a Different Thesaurus
Hideki Kashioka, Satosi Shirai

Electronic Language Resources for Polish: POLEX, CEGLEX and GRAMLEX
Zygmunt Vetulani

Turkish Electronic Living Lexicon (TELL): A Lexical Database
Sharon Inkelas, Aylin Kuntay, C. Orhan Orgun, Ronald Sprouse

Tools for the Generation of Morphological Entries in Dictionaries
Ulle Viks

Design and Construction of Knowledge base for Verb using MRD and Tagged Corpus
Young-Soog Chae, Key-Sun Choi


Session SP2 - Spoken Language Resources Issues from Construction to Validation

Recruitment Techniques for Minority Language Speech Databases: Some Observations
Rhys James Jones, John S. Mason, Louise Helliker, Mark Pawlewski

Enhancing Speech Corpus Resources with Multiple Lexical Tag Layers
Andreas Witt, Harald Lungen, Dafydd Gibbon

What are Transcription Errors and Why are They Made?
Daniela Oppermann, Susanne Burger, Karl Weilhammer

Quality Control in Large Annotation Projects Involving Multiple Judges: The Case of the TDT Corpora
Stephanie Strassel, David Graff, Nii Martey, Christopher Cieri

A New Methodology for Speech Corpora Definition from Internet Documents
D. Vaufreydaz, C. Bergamini, J.F. Serignat, L. Besacier, M. Akbar

Many Uses, Many Annotations for Large Speech Corpora: Switchboard and TDT as Case Studies
David Graff, Steven Bird

SLR Validation: Present State of Affairs and Prospects
Henk van den Heuvel, Lou Boves, Khalid Choukri, Simo Goddijn, Eric Sanders

On the Usage of Kappa to Evaluate Agreement on Coding Tasks
Barbara Di Eugenio


Session WP2 - Corpus Annotation

A Word-level Morphosyntactic Analyzer for Basque
I. Aduriz, E. Agirre, I. Aldezabal, X. Arregi, J. M. Arriola, X. Artola, K. Gojenola, A. Maritxalar, K. Sarasola, M. Urkia

Interactive Corpus Annotation
Thorsten Brants, Oliver Plaehn

Semi-automatic Construction of a Tree-annotated Corpus Using an Iterative Learning Statistical Language Model
Kiyoaki Shirai, Hozumi Tanaka, Takenobu Tokunaga

A Robust Parser for Unrestricted Greek Text
Sotiris Boutsis, Prokopis Prokopidis, Voula Giouli, Stelios Piperidis

Automatic Assignment of Grammatical Relations
Leonardo Lesmo, Vincenzo Lombardo

Resources for Lexicalized Tree Adjoining Grammars and XML Encoding: TagML
Patrice Bonhomme, Patrice Lopez

CLinkA - A Coreferential Links Annotator
Constantin Orasan

Coreference in Annotating a Large Corpus
Eva Hajicova, Jarmila Panenova, Petr Sgall

FAST - Towards a Semi-automatic Annotation of Corpora
Catalina Barbu

Layout Annotation in a Corpus of Patient Information Leaflets
Nadjet Bouayad-Agha


Session WP3 - Multilingual Corpora

Designing a Tool for Exploiting Bilingual Comparable Corpora
Peter Bennison, Lynne Bowker

A Word Sense Disambiguation Method Using Bilingual Corpus
Zheng Jie, Mao Yuhang

Building the Croatian-English Parallel Corpus
Marko Tadic

A Parallel Corpus of Italian/German Legal Texts
Johann Gamper

Lexical and Translation Equivalence in Parallel Corpora
Tamas Varadi

Some Technical Aspects about Aligning Near Languages
Lluis de Yzaguirre, Marta Ribas, Jordi Vivaldi, M. Teresa Cabre

Cairo: An Alignment Visualization Tool
Noah A. Smith, Michael E. Jahr


VOLUME II

Keynote Speeches

Next Generation Natural Language Applications
Salim Roukos

Terminology Standards - Help for the Terminology Community
Alan K. Melby, Klaus-Dirk Schmitz

Panels

International Co-operation in the field of Language Resources and Evaluation
Professor Antonio Zampolli, Lynette Hirschman




Session SO3 - Speech Synthesis

GREEK ToBI: A System for the Annotation of Greek Speech Corpora
Amalia Arvaniti, Mary Baltazani

EULER: an Open, Generic, Multilingual and Multi-platform Text-to-Speech System
Thierry Dutoit, Michel Bagein, Fabrice Malfrere, Vincent Pagel, Alain Ruelle, Nawfal Tounsi, Dominique Wynsberghe

POSCAT: A Morpheme-based Speech Corpus Annotation Tool
Byeongchang Kim, Jin-seok Lee, Jeongwon Cha, Geunbae Lee


Session WO7 - Syntactic Parsing

A Strategy for the Syntactic Parsing of Corpora: from Constraint Grammar Output to Unification-based Processing
Toni Badia, Angels Egea

Learning Preference of Dependency between Japanese Subordinate Clauses and its Evaluation in Parsing
Takehito Utsuro

An Open Source Grammar Development Environment and Broad-coverage English Grammar Using HPSG
Ann Copestake, Dan Flickinger


Session WO8 - Acquisition of Semantic Information

Controlled Bootstrapping of Lexico-semantic Classes as a Bridge between Paradigmatic and Syntagmatic Knowledge: Methodology and Evaluation
Paolo Allegrini, Simonetta Montemagni, Vito Pirrelli

Automatic Extraction of Semantic Similarity of Words from Raw Technical Texts
Aristomenis Thanopoulos, Nikos Fakotakis, George Kokkinakis

Abstraction of the EDR Concept Classification and its Effectiveness in Word Sense Disambiguation
Kimura Kazuhiro, Hirakawa Hideki


Session EO2 - Evaluation of Tools

Where Opposites Meet. A Syntactic Meta-scheme for Corpus Annotation and Parsing Evaluation
Alessandro Lenci, Simonetta Montemagni, Vito Pirrelli, Claudia Soria

A Comparison of Summarization Methods Based on Task-based Evaluation
Mochizuki Hajime, Okumura Manabu

Evaluation of TRANSTYPE, a Computer-aided Translation Typing System: A Comparison of a Theoretical- and a User-oriented Evaluation Procedures
Philippe Langlais, Sebastien Sauve, George Foster, Elliott Macklovitch, Guy Lapalme


Session SO4 - Speech Synthesis Evaluation

The Cost258 Signal Generation Test Array
Gerard Bailly, Eduardo R. Banga, Alex Monaghan, Erhard Rank

Guidelines for Japanese Speech Synthesizer Evaluation
Shuichi Itahashi

Perception and Analysis of a Reiterant Speech Paradigm: a Functional Diagnostic of Synthetic Prosody
Albert Rilliard, Veronique Auberge


Session WO9 - Applications in the Written Area

Looking for Errors: A Declarative Formalism for Resource-adaptive Language Checking
Andrew Bredenkamp, Berthold Crysmann, Mirela Petrea

An Architecture for Document Routing in Spanish: Two Language Components, Pre-processor and Parser
Guillermo Rojo, Maria Concepcion Alvarez, Pilar Alvarino, Adelaida Gil, Maria Paula Santalla, Susana Sotelo

Extraction of Unknown Words Using the Probability of Accepting the Kanji Character Sequence as One Word
Hiroyuki Shinnou, Masanori Ikeya


Session WO10 - Semantic Annotation of Corpora

An Experiment of Lexical-Semantic Tagging of an Italian Corpus
Ornella Corazzari, Nicoletta Calzolari, Antonio Zampolli

Semantic Tagging for the Penn Treebank
Martha Palmer, Hoa Trang Dang, Joseph Rosenzweig

A Step toward Semantic Indexing of an Encyclopedic Corpus
Philippe Alcouffe, Nicolas Gacon, Claude Roux, Frederique Segond


Session SO5 - Evaluation of Dialogue

Obtaining Predictive Results with an Objective Evaluation of Spoken Dialogue Systems: Experiments with the DCR Assessment Paradigm
Jean-Yves Antoine, Jacques Siroux, Jean Caelen, Jeanne Villaneau, Jerome Goulian, Mohamed Ahafhaf

Lessons Learned from a Task-based Evaluation of Speech-to-Speech Machine Translation
Lori Levin, Boris Bartlog, Ariadna Font Llitjos, Donna Gates, Alon Lavie, Dorcas Wallace, Taro Watanabe, Monika Woszczyna

Galaxy-II as an Architecture for Spoken Dialogue Evaluation
Joseph Polifroni, Stephanie Seneff

Issues in the Evaluation of Spoken Dialogue Systems - Experience from the ACCeSS Project
Thomas Brey, Gerhard Hanrieder, Paul Heisterkamp, Ludwig Hitzenberger, Peter Regel-Brietzmann

Evaluation for Darpa Communicator Spoken Dialogue Systems
Marilyn Walker, Lynette Hirschman, John Aberdeen

Evaluation of a Dialogue System Based on a Generic Model that Combines Robust Speech Understanding and Mixed-initiative Control
R. Lopez-Cozar, A.J. Rubio, J.E. Diaz Verdejo, A. De la Torre


Session WO11 - Mono-Multilingual Lexicon Acquisition and Building

Automatic Extraction of English-Chinese Term Lexicons from Noisy Bilingual Corpora
Sun Le, Jin Youbing, Du Lin, Sun Yufang

Chinese-English Semantic Resource Construction
Bonnie J. Dorr, Gina-Anne Levow, Dekang Lin, Scott Thomas

Towards A Universal Tool For NLP Resource Acquisition
Svetlana Sheremetyeva, Sergei Nirenburg

Acquisition of Linguistic Patterns for Knowledge-based Information Extraction
Sanda M. Harabagiu, Steven J. Maiorano

Using Lexical Semantic Knowledge from Machine Readable Dictionaries for Domain Independent Language Modelling
George Demetriou, Eric Atwell, Clive Souter

ItalWordNet: a Large Semantic Database for Italian
Adriana Roventini, Antonietta Alonge, Nicoletta Calzolari, Bernardo Magnini, Francesca Bertagna


Session WO12 - Language Resources: Infrastructural Issues

An Open Architecture for the Construction and Administration of Corpora
Constantin Orasan, Ramesh Krishnamurthy

Corpus Resources and Minority Language Engineering
Tony McEnery, Paul Baker, Lou Burnard

Towards a Query Language for Annotation Graphs
Steven Bird, Peter Buneman, Wang-Chiew Tan

Software Infrastructure for Language Resources: a Taxonomy of Previous Work and a Requirements Analysis
Hamish Cunningham, Kalina Bontcheva, Valentin Tablan, Yorick Wilks

XCES: An XML-based Encoding Standard for Linguistic Corpora
Nancy Ide, Patrice Bonhomme, Laurent Romary

The American National Corpus: A Standardized Resource for American English
Catherine Macleod, Nancy Ide, Ralph Grishman


Session TO1 - Terminology

Accessibility of Multilingual Terminological Resources - Current Problems and Prospects for the Future
Gerhard Budin, Alan K. Melby

Terminology in Korea: KORTERM
Key-Sun Choi, Young-Soog Chae

ARC A3: A Method for Evaluating Term Extracting Tools and/or Semantic Relations between Terms from Corpora
Christophe Jouis, ARC A3

Use of Greek and Latin Forms for Term Detection
Rosa Estopa, Jordi Vivaldi, M. Teresa Cabre

Automatically Augmenting Terminological Lexicons from Untagged Text
George Demetriou, Robert Gaizauskas

Creating and Using Domain-specific Ontologies for Terminological Applications
Diana Maynard, Sophia Ananiadou


Session SP3 - Spoken Language Resources' Projects

SALA: SpeechDat across Latin America. Results of the First Phase
Asuncion Moreno, Robrecht Comeyne, Keith Haslam, Henk van den Heuvel, Harald Hoge, Sabine Horbach, Giorgio Micca

SPEECON - Speech Data for Consumer Devices
Rainer Siemund, Harald Hoge, Siegfried Kunzmann, Krzysztof Marasek

The Spoken Dutch Corpus. Overview and First Evaluation
Nelleke Oostdijk

SPEECHDAT-CAR. A Large Speech Database for Automotive Environments
Asuncion Moreno, Borge Lindberg, Christoph Draxler, Gael Richard, Khalid Choukri, Stephan Euler, Jeffrey Allen

Creation of Spoken Hebrew Databases
Tami Rannon, Ofra Golani, Anat Goren, Sherrie Shammass, Ami Moyal

Spoken Portuguese: Geographic and Social Varieties
Jose Bettencourt Goncalves, Rita Veloso

Orthographic Transcription of the Spoken Dutch Corpus
Wim Goedertier, Simo Goddijn, Jean-Pierre Martens

Development of Acoustic and Linguistic Resources for Research and Evaluation in Interactive Vocal Information Servers
Giulia Bernardis, Herve Bourlard, Martin Rajman, Jean-Cedric Chappelier

Development and Evaluation of an Italian Broadcast News Corpus
Marcello Federico, Dimitri Giordani, Paolo Coletti

Large, Multilingual, Broadcast News Corpora for Cooperative Research in Topic Detection and Tracking: The TDT-2 and TDT-3 Corpus Efforts
Christopher Cieri, David Graff, Mark Liberman, Nii Martey, Stephanie Strassel

Live Lexicons and Dynamic Corpora Adapted to the Network Resources for Chinese Spoken Language Processing Applications in an Internet Era
Lin-Shan Lee, Lee-Feng Chien

Shallow Discourse Genre Annotation in CallHome Spanish
Klaus Ries, Lori Levin, Liza Valle, Alon Lavie, Alex Waibel

Issues in Design and Collection of Large Telephone Speech Corpus for Slovenian Language
Zdravko Kacic, Bogomir Horvat, Aleksandra Zogling

Spontaneous Speech Corpus of Japanese
Kikuo Maekawa, Hanae Koiso, Sadaoki Furui, Hitoshi Isahara

Corpora of Slovene Spoken Language for Multi-lingual Applications
Jerneja Gros, France Mihelic, Simon Dobrisek, Tomaz Erjavec, Mario Zganec

The ISLE Corpus of Non-Native Spoken English
Wolfgang Menzel, Eric Atwell, Patrizia Bonaventura, Daniel Herron, Peter Howarth, Rachel Morton, Clive Souter

Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-Free Speech Recognition
Satoshi Nakamura, Kazuo Hiyane, Futoshi Asano, Takanobu Nishiura, Takeshi Yamada

The Influence of Scenario Constraints on the Spontaneity of Speech. A Comparison of Dialogue Corpora
Karl Weilhammer, Daniela Oppermann, Susanne Burger

Developing a Multilingual Telephone Based Information System in African Languages
J.C. Roux, E.C. Botha, J.A. du Preez


Session WP4 - Lexicon: Semantic and Multilingual Issues

Extraction of Concepts and Multilingual Information Schemes from French and English Economics Documents
Peggy Cadel, Helene Ledouble

Application of WordNet ILR in Czech Word-formation
Jana Klimova, Karel Pala

Coping with Lexical Gaps when Building Aligned Multilingual Wordnets
Luisa Bentivogli, Emanuele Pianta, Fabio Pianesi

Extension and Use of GermaNet, a Lexical-Semantic Database
Claudia Kunze

CDB - A Database of Lexical Collocations
Brigitte Krenn

Towards a Strategy for a Representation of Collocations - Extending the Danish PAROLE-lexicon
Anna Braasch, Sussi Olsen

Improving Lexical Databases with Collocational Information: Data from Portuguese
Paula Guerreiro

A Bilingual Electronic Dictionary for Frame Semantics
Thierry Fontenelle

A Text->Meaning->Text Dictionary and Process
Dominique Dutoit

Production of NLP-oriented Bilingual Language Resources from Human-oriented dictionaries
Vera Fluhr-Semenova, Christian Fluhr, Stephanie Brisson


Session TP1 - Terminology

Terms Specification and Extraction within a Linguistic-based Intranet Service
Sandro Pedrazzini, Elisabeth Maier, Dierk Konig

With WORLDTREK Family, Create, Update and Browse your Terminological World
Yasmina Abbas, Marie-Luce Picard

Extraction of Semantic Clusters for Terminological Information Retrieval from MRDs
Gerardo Sierra, John McNaught

Reusing the Mikrokosmos Ontology for Concept-based Multilingual Terminology Databases
Antonio Moreno, Chantal Perez

Term-based Identification of Sentences for Text Summarisation
Byron Georgantopoulos, Stelios Piperidis

Terminology Encoding in View of Multifunctional NLP Resources
Marianna Katsoyannou, Eleni Efthimiou

ARISTA Generative Lexicon for Compound Greek Medical Terms
John Kontos, Ioanna Malagardi, Spyros Fountoukis


Session WP5 - Corpus Tagging

Hua Yu: A Word-segmented and Part-Of-Speech Tagged Chinese Corpus
Sun Maosong, Sun Honglin, Huang Changning, Zhang Pu, Xing Hongbing, Zhou Qiang

Morphological Tagging to Resolve Morphological Ambiguities
Gaelle Birocheau

Morphemic Analysis and Morphological Tagging of Latvian Corpus
Kristine Levane, Andrejs Spektors

Morphosyntactic Tagging of Slovene: Evaluating Taggers and Tagsets
Saso Dzeroski, Tomaz Erjavec, Jakub Zavrel

Using a Large Set of EAGLES-compliant Morpho-syntactic Descriptors as a Tagset for Probabilistic Tagging
Dan Tufis

The Context (not only) for Humans
Barbora Hladka

PoS Disambiguation and Partial Parsing Bidirectional Interaction
Montserrat Marimon Felipe, Jordi Porta Zamorano

Rule-based Tagging: Morphological Tagset versus Tagset of Analytical Functions
Kiril Ribarov


Session WP6 - Tools in the Written Area

The New Edition of the Natural Language Software Registry (an Initiative of ACL hosted at DFKI)
Thierry Declerck, Alexander Werner Jachmann, Hans Uszkoreit

Open Ended Computerized Overview of Controlled Languages
Elisa Gavieiro-Villatte, Laurent Spaggiari

Automatic Transliteration and Back-transliteration by Decision Tree Learning
Byung-Ju Kang, Key-Sun Choi

The Universal XML Organizer: UXO
Jan-Torsten Milde, Markus Reinsch

LT TTT - A Flexible Tokenisation Tool
Claire Grover, Colin Matheson, Andrei Mikheev, Marc Moens

Will Very Large Corpora Play For Semantic Disambiguation The Role That Massive Computing Power Is Playing For Other AI-Hard Problems?
Alessandro Cucchiarelli, Enrico Faggioli, Paola Velardi

Interarbora and Thistle - Delivering Linguistic Structure by the Internet
Jo Calder

A Proposal for the Integration of NLP Tools using SGML-Tagged Documents
X. Artola, A. Diaz de Ilarraza, N. Ezeiza, K. Gojenola, A. Maritxalar, A. Soroa


VOLUME III

Keynote Speeches

Meeting Recognition and Tracking
Alex Waibel

The Evolution of an NLP System
Stephen D. Richardson

Panel

Speech Database Processing Tools - the state of the art in automatic labeling of speech
Nick Campbell




Session WP6 - Tools in the Written Area

Reusability as Easy Adaptability: A Substantial Advance in NL Technology
Irina Prodanof, Amedeo Cappelli, Lorenzo Moretti


Session WO13 - Multilingual Resources and Applications

Grammarless Bracketing in an Aligned Bilingual Corpus
Jorge Kinoshita

Constructing a Tagged E-J Parallel Corpus for Assisting Japanese Software Engineers in Writing English Abstracts
Masumi Narita

Multilingual Linguistic Resources: From Monolingual Lexicons to Bilingual Interrelated Lexicons
Marta Villegas, Nuria Bel, Alessandro Lenci, Nicoletta Calzolari, Nilda Ruimy, Antonio Zampolli, Teresa Sadurni, Joan Soler

TransSearch: A Free Translation Memory on the World Wide Web
Elliott Macklovitch, Michel Simard, Philippe Langlais


Session WO14 - Named Entity Recognition

Annotating Resources for Information Extraction
Sean Boisen, Michael R. Crystal, Richard Schwartz, Rebecca Stone, Ralph Weischedel

Integrating Seed Names and ngrams for a Named Entity List and Classifier
Sabine Buchholz, Antal van den Bosch

Named Entity Recognition in Greek Texts
Iason Demiros, Sotiris Boutsis, Voula Giouli, Maria Liakata, Harris Papageorgiou, Stelios Piperidis

Minimally Supervised Japanese Named Entity Recognition: Resources and Evaluation
Takehito Utsuro, Manabu Sassano


Session EO3 - Evaluation and Semantics

English Senseval: Report and Results
Adam Kilgarriff, Joseph Rosenzweig

Evaluation of a Generic Lexical Semantic Resource in Information Extraction
Joyce Yue Chai

Sublanguage Dependent Evaluation: Toward Predicting NLP performances
Gabriel Illouz

Evaluation of Word Alignment Systems
Lars Ahrenberg, Magnus Merkel, Anna Sagvall Hein, Jorg Tiedemann


Session WO15 - Language Resources Projects

Language Resources Development at the Spanish Royal Academy
Angel Martin Municio, Guillermo Rojo, Fernando Sanchez Leon, Octavio Pinillos

A Self-Expanding Corpus Based on Newspapers on the Web
Knut Hofland

For a Repository of NLP Tools
Stephane Chaudiron, Khalid Choukri, Audrey Mance, Valerie Mapelli


Session WO16 - Corpus Annotation and Information Extraction

Coreference Annotation: Whither?
Rodger Kibble, Kees van Deemter

Annotating Events and Temporal Information in Newswire Texts
Andrea Setzer, Robert Gaizauskas

A Semi-automatic System for Conceptual Annotation, its Application to Resource Construction and Evaluation
W.J. Black, J. McNaught, G.P. Zarri, A. Persidis, A. Brasher, L. Gilardoni, E. Bertino, G. Semeraro, P. Leo


Session EO4 - Grammars and Systems Evaluation

Using a Formal Approach to Evaluate Grammars
Bilel Gargouri, Mohamed Jmaiel, Abdelmajid Ben Hamadou

Towards More Comprehensive Evaluation in Anaphora Resolution
Ruslan Mitkov

Coreference Resolution Evaluation Based on Descriptive Specificity
Francois Trouilleux, Eric Gaussier, Gabriel G. Bes, Annie Zaenen


Session SO6 - Recognition

Methods and Metrics for the Evaluation of Dictation Systems: a Case Study
Maria Canelli, Daniele Grasso, Margaret King

Design Issues in Text-Independent Speaker Recognition Evaluation
Alvin Martin, Mark Przybocki

Perceptual Evaluation of a New Subband Low Bit Rate Speech Compression System based on Waveform Vector Quantization and SVD Postfiltering
Stavroula-Evita Fotinea, Ioannis Dologlou, Stylianos Bakamidis, Gregory Stainhaouer, George Carayannis

IPA Japanese Dictation Free Software Project
Katsunobu Itou, Kiyohiro Shikano, Tatsuya Kawahara, Kazuya Takeda, Atsushi Yamada, Akinori Itou, Takehito Utsuro, Tetsunori Kobayashi, Nobuaki Minematsu, Mikio Yamamoto, Shigeki Sagayama, Akinobu Lee

The COST 249 SpeechDat Multilingual Reference Recogniser
Finn Tore Johansen, Narada Warakagoda, Borge Lindberg, Gunnar Lehtinen, Zdravko Kacic, Andrej Zgank, Kjell Elenius, Giampiero Salvi

Automotive Speech-Recognition - Success Conditions Beyond Recognition Rates
Klaus Bengler

Evaluating Multi-party Multi-modal Systems
Laurie E. Damianos, Jill Drury, Tari Fanderclai, Lynette Hirschman, Jeff Kurtz, Beatrice Oshika


Session WO17 - Semantic Lexicons

What's in a Thesaurus?
Adam Kilgarriff, Colin Yallop

SIMPLE: A General Framework for the Development of Multilingual Lexicons
Nuria Bel, Federica Busa, Nicoletta Calzolari, Elisabetta Gola, Alessandro Lenci, Monica Monachini, Antoine Ogonowski, Ivonne Peters, Wim Peters, Nilda Ruimy, Marta Villegas, Antonio Zampolli

The Treatment of Adjectives in SIMPLE: Theoretical Observations
Ivonne Peters, Wim Peters

Lexicalised Systematic Polysemy in WordNet
Wim Peters, Ivonne Peters

Annotating, Disambiguating & Automatically Extending the Coverage of the Swedish SIMPLE Lexicon
Dimitrios Kokkinakis, Maria Toporowska Gronostaj, Karin Warmenius

Semantic Encoding of Danish Verbs in SIMPLE - Adapting a Verb Framed Model to a Satellite-framed Language
Bolette Sandford Pedersen, Sanni Nimb

Integrating Subject Field Codes into WordNet
Bernardo Magnini, Gabriela Cavaglia


Session WO18 - Morphology in Lexical and Textual Resources

Principled Hidden Tagset Design for Tiered Tagging of Hungarian
Dan Tufis, Peter Dienes, Csaba Oravecz, Tamas Varadi

Part of Speech Tagging and Lemmatisation for the Spoken Dutch Corpus
Frank Van Eynde, Jakub Zavrel, Walter Daelemans

Inter-annotator Agreement for a German Newspaper Corpus
Thorsten Brants

An Approach to Lexical Development for Inflectional Languages
Davide Turcato, Janine Toole, Stavroula Tsiplakou, Trude Heift, Paul McFetridge

GeDeriF: Automatic Generation and Analysis of Morphologically Constructed Lexical Resources
Fiammetta Namer, Georgette Dal

A Unified POS Tagging Architecture and its Application to Greek
Harris Papageorgiou, Prokopis Prokopidis, Voula Giouli, Stelios Piperidis

Derivation in the Czech National Corpus
Jana Klimova, Jan Kocek


Session EO5 - Information Retrieval and Question Answering Evaluation

The Evaluation of Systems for Cross-language Information Retrieval
Martin Braschler, Donna Harman, Michael Hess, Michael Kluck, Carol Peters, Peter Schauble

IREX: IR & IE Evaluation Project in Japanese
Satoshi Sekine, Hitoshi Isahara

Textual Information Retrieval Systems Test: The Point of View of an Organizer and Corpuses Provider
Patrick Kremer, Laurent Schmitt

Multilingual Topic Detection and Tracking: Successful Research Enabled by Corpora and Evaluation
Charles L. Wayne

How to Evaluate Your Question Answering System Every Day ... and Still Get Real Work Done
Eric J. Breck, John D. Burger, Lisa Ferro, Lynette Hirschman, David House, Marc Light, Inderjeet Mani

The TREC-8 Question Answering Track
Ellen M. Voorhees, Dawn M. Tice

Cardinal, Nominal or Ordinal Similarity Measures in Comparative Evaluation of Information Retrieval Process
Christine Michel


Session SP4 - Tools for Evaluation and Processing of Spoken Language Resources

Transcribing with Annotation Graphs
Edouard Geoffrois, Claude Barras, Steven Bird, Zhibiao Wu

SpeechDat-Car Fixed Platform
Jose A.R. Fonollosa, Asuncion Moreno

Automatic Speech Segmentation in High Noise Condition
Rosen Ivanov

SegWin: a Tool for Segmenting, Annotating, and Controlling the Creation of a Database of Spoken Italian Varieties
Mario Refice, Michelina Savino, Marco Altieri, Roberto Altieri

A Graphical Parametric Language-Independent Tool for the Annotation of Speech Corpora
Kallirroi Georgila, Nikos Fakotakis, George Kokkinakis

NaniTrans: a Speech Labelling Tool
David Portabella, Albert Febrer, Asuncion Moreno

Annotation of a Multichannel Noisy Speech Corpus
L. Cristoforetti, M. Matassoni, M. Omologo, P. Svaizer, E. Zovato

Dialogue Annotation for Language Systems Evaluation
Marcela Charfuelan, Jose Relano Gil, M. Carmen Rodriguez Gancedo, Daniel Tapias Merino, Luis Hernandez Gomez

Annotating Communication Problems Using the MATE Workbench
Laila Dybkjær, Morten Baun Moller, Niels Ole Bernsen, Michael Grosse, Martin Olsen, Amanda Schiffrin

The MATE Workbench Annotation Tool, a Technical Description
Amy Isard, David McKelvie, Andreas Mengel, Morten Baun Moller

On the Use of Prosody for On-line Evaluation of Spoken Dialogue Systems
Marc Swerts, Emiel Krahmer

MDWOZ: A Wizard of Oz Environment for Dialog Systems Development
Cosmin Munteanu, Marian Boldea

End-to-End Evaluation of Machine Interpretation Systems: A Graphical Evaluation Tool
Susanne J. Jekat, Lorenzo Tessiore

Cross-lingual Interpolation of Speech Recognition Models
Giorgio Micca, Alessandra Frasca, Maria Gabriella Di Benedetto


Session WP7 - Corpus Projects

Rarity of Words in a Language and in a Corpus
Jaroslava Hlavacova

The PAROLE Program
Georges Vignaux

Portuguese Corpora at CLUL
Maria Fernanda Bacelar do Nascimento, Luisa Pereira, Joao Saramago

Russian Monitor Corpora: Composition, Linguistic Encoding and Internet Publication
Serge A. Yablonsky

A Web-based Text Corpora Development System
Dan Bohus, Marian Boldea

Issues from Corpus Analysis that have influenced the On-going Development of Various Haitian Creole Text- and Speech-based NLP Systems and Applications
Marilyn Mason


Session EP1 - Evaluation and Written Area

Enhancing the TDT Tracking Evaluation
Amit Bagga

Target Suites for Evaluating the Coverage of Text Generators
John A. Bateman, Anthony F. Hartley

A Novelty-based Evaluation Method for Information Retrieval
Atsushi Fujii, Tetsuya Ishikawa

How To Evaluate and Compare Tagsets? A Proposal
Herve Dejean

Evaluating Summaries for Multiple Documents in an Interactive Environment
Gees C. Stein, Tomek Strzalkowski, G. Bowden Wise, Amit Bagga

Establishing the Upper Bound and Inter-judge Agreement of a Verb Classification Task
Paola Merlo, Suzanne Stevenson

A Parallel English-Japanese Query Collection for the Evaluation of On-Line Help Systems
Richard F. E. Sutcliffe, Sadao Kurohashi

An HPSG-Annotated Test Suite for Polish
Malgorzata Marciniak, Agnieszka Mykowiecka, Anna Kupsc, Adam Przepiorkowski

Evaluation of Computational Linguistic Techniques for Identifying Significant Topics for Browsing Applications
Judith L. Klavans, Nina Wacholder, David K. Evans


Session SP5 - Multimodal - Multimedia Resources and Tools

The EUDICO Project, Multi Media Annotation over the Internet
Albert Russel, Hennie Brugman, Daan Broeder, Peter Wittenburg

Towards a Standard for Meta-descriptions of Language Resources
D. Broeder, H. Brugman, A. Russel, R. Skiba, P. Wittenburg

ATLAS: A Flexible and Extensible Architecture for Linguistic Annotation
Steven Bird, David Day, John Garofolo, John Henderson, Christophe Laprun, Mark Liberman

Models of Russian Text/Speech Interactive Databases for Supporting of Scientific, Practical and Cultural Researches
Pavel Skrelin, Tatiana Sherstinova

A Multi-view Hyperlexicon Resource for Speech and Language System Development
Dafydd Gibbon, Thorsten Trippel

Addizionario: an Interactive Hypermedia Tool for Language Learning
Giovanna Turrini, Laura Cignoni, Alessandro Paccosi


Session WP8 - Corpus Tools

A Web-based Advanced and User Friendly System: The Oslo Corpus of Tagged Norwegian Texts
Janne Bondi Johannessen, Anders Noklestad, Kristin Hagen

Introduction of KIBS (Korean Information Base System) Project
Young-Soog Chae, Key-Sun Choi

Design and Implementation of the Online ILSP Greek Corpus
Nick Hatzigeorgiu, Maria Gavrilidou, Stelios Piperidis, George Carayannis, Anastasia Papakostopoulou, Athanassia Spiliotopoulou, Anna Vacalopoulou, Penny Labropoulou, Elena Mantzari, Harris Papageorgiou, Iason Demiros

The (Un)Deterministic Nature of Morphological Context
Kiril Ribarov

A Software Toolkit for Sharing and Accessing Corpora Over the Internet
Saturnino Luz

GRUHD: A Greek database of Unconstrained Handwriting
E. Kavallieratou, N. Liolios, E. Koutsogeorgos, N. Fakotakis, G. Kokkinakis


Session WP9 - Applications using Written Language Resources

Resources for Multilingual Text Generation in Three Slavic Languages
John Bateman, Elke Teich, Geert-Jan Kruijff, Ivanna Kruijff-Korbayova, Serge Sharoff, Hana Skoumalova

Evaluating Wordnets in Cross-language Information Retrieval: the ITEM Search Engine
Felisa Verdejo, Julio Gonzalo, Anselmo Penas, Fernando Lopez, David Fernandez

NL-Translex: Machine Translation for Dutch
Catia Cucchiarini, Johan Van Hoorde, Elizabeth D'Halleweyn

Typographical and Orthographical Spelling Error Correction
Kyongho Min, William H. Wilson, Yoo-Jin Moon

LEXIPLOIGISSI: An Educational Platform for the Teaching of Terminology in Greece
Constandina Economou, Spyros Raptis, Gregory Stainhaouer

Collocations as Word Co-occurrence Restriction Data - An Application to Japanese Word Processor -
Kosho Shudo, Masahito Takahashi, Yasuo Koyama, Kenji Yoshimura
