LREC 2000 2nd International Conference on Language Resources & Evaluation  

Conference Papers and Abstracts





Paper
Paper Title
Abstract
62
Electronic Language Resources for Polish: POLEX, CEGLEX and GRAMLEX
We present theoretical results and resources obtained within three projects: the national project POLEX, the Copernicus 1 project CEGLEX (1032) and the Copernicus project GRAMLEX (632). The morphological resources obtained within these projects help to fill the gap on the map of available electronic language resources for Polish. After a short presentation of some common methodological bases defined within the POLEX project, we proceed to present the methodology and data obtained in the CEGLEX and GRAMLEX projects. The intention of the Polish language part of CEGLEX was to test formats proposed by the GENELEX project against Polish data. The aim of the GRAMLEX project was to create corpus-based morphological resources for Polish. GRAMLEX refers directly to the morphological part of the CEGLEX project. Large samples of the data presented here are accessible at http://main.amu.edu.pl/~zlisi/projects.htm.
244
Enabling Resource Sharing in Language Generation: an Abstract Reference Architecture
The RAGS project aims to develop a reference architecture for natural language generation, to facilitate modular development of NLG systems as well as evaluation of components, systems and algorithms. This paper gives an overview of the proposed framework, describing an abstract data model with five levels of representation: Conceptual, Semantic, Rhetorical, Document and Syntactic. We report on a re-implementation of an existing system using the RAGS data model.
67
End-to-End Evaluation of Machine Interpretation Systems: A Graphical Evaluation Tool
VERBMOBIL, a long-term project of the German Federal Ministry of Education, Science, Research and Technology, aims at developing a mobile translation system for spontaneous speech. The source-language input consists of human speech (English, German or Japanese); the translation (bidirectional English-German and Japanese-German) and the target-language output are effected by the VERBMOBIL system. Owing to the innovative character of the project, new methods for end-to-end evaluation had to be developed by a subproject established especially for this purpose. In this paper we present criteria for the evaluation of speech-to-speech translation systems and a tool for judging translation quality, called the Graphical Evaluation Tool (GET).
8
English Senseval: Report and Results
There are now many computer programs for automatically determining which sense a word is being used in. One would like to be able to say which were better, which worse, and also which words, or varieties of language, presented particular problems to which programs. In 1998 a first evaluation exercise, SENSEVAL, took place. The English component of the exercise is described, and results presented.
183
Enhancing Speech Corpus Resources with Multiple Lexical Tag Layers
We describe a general two-stage procedure for re-using a custom corpus for spoken language system development, involving a transformation from character-based markup to XML, and DSSSL stylesheet-driven XML markup enhancement with multiple lexical tag trees. The procedure was used to generate a fully tagged corpus; alternatively, with greater economy of computing resources, it can be employed as a parametrised ‘tagging on demand’ filter. The implementation will shortly be released as a public resource, together with the corpus (German spoken dialogue, about 500k word form tokens) and lexicon (about 75k word form types).
5
Enhancing the TDT Tracking Evaluation
Topic Detection and Tracking (TDT) is a DARPA-sponsored initiative concerned with finding groups of stories on the same topic (TDT, 1998). The goal is to build systems that can segment, detect, and track incoming news stories (possibly from multiple continuous feeds) with respect to pre-defined topics. While the detection task detects the first story on a particular topic, the tracking task determines, for each story, which topic it is relevant to. This paper discusses the algorithm currently used for evaluating systems on the tracking task, presents some of its limitations, and proposes a new algorithm that enhances the current evaluation.
233
Establishing the Upper Bound and Inter-judge Agreement of a Verb Classification Task
Detailed knowledge about verbs is critical in many NLP and IR tasks, yet manual determination of such knowledge for large numbers of verbs is difficult, time-consuming and resource intensive. Recent responses to this problem have attempted to classify verbs automatically, as a first step toward automatically building lexical resources. In order to estimate the upper bound of a verb classification task, which appears to be difficult and subject to variability among experts, we investigated the performance of human experts in controlled classification experiments. We report here the results of two experiments, using a forced-choice task and a non-forced-choice task, which measure human expert accuracy (compared to a gold standard) in classifying verbs into three pre-defined classes, as well as inter-expert agreement. To preview, we find that the highest expert accuracy is 86.5% agreement with the gold standard, and that inter-expert agreement is not very high (K between .53 and .66). The two experiments show comparable results.
32
Etude et Evaluation de la Di-Syllabe comme Unite Acoustique pour le Systeme de Synthese Arabe PARADIS [Study and Evaluation of the Di-Syllable as an Acoustic Unit for the Arabic Speech Synthesis System PARADIS]
The study presented in this article is part of the development of a text-to-speech synthesis system for the Arabic language. Our system PARADIS is based on the concatenation of di-syllables, with TD-PSOLA as the synthesis technique. In this article we present the rationale for choosing the di-syllable as the concatenation unit for the synthesizer and its contribution to synthesis quality. Indeed, the di-syllable greatly improves synthesis quality and reduces problems of temporal discontinuity at concatenation points. However, we are confronted with several problems caused by the considerable size of the set of di-syllables and by their adaptation to prosodic models, which are usually associated with the syllable as the rhythmic unit. We therefore describe the principle on which we based the reduction of the number of di-syllables. We then present the procedure we developed for the automatic generation and labelling of the di-syllable dictionary. In particular, we chose logatomes (nonsense carrier words) with forms especially suited to automating the generation of the logatome corpus and the automatic segmentation step. Finally, we present a technique for organizing the acoustic dictionary that is well adapted to the form of the Arabic di-syllable.
41
EULER: an Open, Generic, Multilingual and Multi-platform Text-to-Speech System
The aim of the collaborative project presented in this paper is to obtain a set of highly modular text-to-speech synthesizers for as many voices, languages and dialects as possible, free for use in non-commercial and non-military applications. This project is an extension of the MBROLA project: MBROLA is a speech synthesizer, freely distributed for non-commercial purposes, which uses diphone databases provided by users (19 languages in the year 2000). EULER extends this idea to whole TTS systems by providing a backbone structure (MLC) and several generic algorithms for POS tagging, grapheme-to-phoneme conversion, and prosody generation. To demonstrate the potential of the architecture and draw developers' interest, we provide a full EULER-based TTS in French and in Arabic. EULER currently runs on Windows and Linux, and it is an open project: many of its components (and certainly its kernel) are provided as GNU C++ sources. It also incorporates, as much as possible, components and data derived from other TTS-related projects.
368
Evaluating Multi-party Multi-modal Systems
The MITRE Corporation's Evaluation Working Group has developed a methodology for evaluating multi-modal groupware systems and capturing data on human-human interactions. The methodology consists of a framework for describing collaborative systems, a scenario-based evaluation approach, and evaluation metrics for the various components of collaborative systems. We designed and ran two sets of experiments to validate the methodology by evaluating collaborative systems. In one experiment, we compared two configurations of a multi-modal collaborative application using a map navigation scenario requiring information sharing and decision making. In the second experiment, we applied the evaluation methodology to a loosely integrated set of collaborative tools, again using a scenario-based approach. In both experiments, multi-modal, multi-user data were collected, visualized, annotated, and analyzed.
163
Evaluating Summaries for Multiple Documents in an Interactive Environment
While most people have a clear idea of what a single-document summary should look like, this is not immediately obvious for a multi-document summary. There are many new questions to answer concerning the number of documents to be summarized, the type of documents, the kind of summary that should be generated, the way the summary is presented to the user, etc. The many possible approaches to multi-document summarization make evaluation especially difficult. In this paper we describe an approach to multi-document summarization and report work on an evaluation method for this particular system.
136
Evaluating Translation Quality as Input to Product Development
In this paper we present a corpus-based method to evaluate the translation quality of machine translation (MT) systems. We start with a shallow analysis of a large corpus and gradually focus the attention on the translation problems. The method constitutes an efficient way to identify the most important grammatical and lexical weaknesses of an MT system and to guide development towards improved translation quality. The evaluation described in the paper was carried out as a cooperation between an MT technology developer, Sail Labs, and the Computational Linguistics group at the University of Zurich.
250
Evaluating Wordnets in Cross-language Information Retrieval: the ITEM Search Engine
This paper presents the ITEM multilingual search engine. This search engine performs full lexical processing (morphological analysis, tagging and Word Sense Disambiguation) on documents and queries in order to provide language-neutral indexes for querying and retrieval. The indexing terms are the EuroWordNet/ITEM InterLingual Index records that link wordnets in 10 languages of the European Community (the search engine currently supports Spanish, English and Catalan). The goal of this application is to provide a way of comparing in context the behavior of different Natural Language Processing strategies for Cross-Language Information Retrieval (CLIR) and, in particular, different Word Sense Disambiguation strategies for query translation and conceptual indexing.
191
Evaluation for Darpa Communicator Spoken Dialogue Systems
The overall objective of the DARPA COMMUNICATOR project is to support rapid, cost-effective development of multi-modal speech-enabled dialogue systems with advanced conversational capabilities, such as plan optimization, explanation and negotiation. In order to make this a reality, we need to find methods for evaluating the contribution of various techniques to the users' willingness and ability to use the system. This paper reports on the approach to spoken dialogue system evaluation that we are applying in the COMMUNICATOR program. We describe our overall approach, the experimental design, the logfile standard, and the metrics applied in the experimental evaluation planned for June of 2000.
101
Evaluation of a Dialogue System Based on a Generic Model that Combines Robust Speech Understanding and Mixed-initiative Control
This paper presents a generic model to combine robust speech understanding and mixed-initiative dialogue control in spoken dialogue systems. It relies on the use of semantic frames to conceptually store user interactions, a frame-unification procedure to deal with partial information, and a stack structure to handle initiative control. This model has been successfully applied in a dialogue system being developed at our lab, named SAPLEN, which aims to deal with the telephone-based product orders and queries of fast food restaurants' clients. In this paper we present the dialogue system and describe the new model, together with the results of a preliminary evaluation of the system concerning recognition time, word accuracy, implicit recovery and speech understanding. Finally, we present the conclusions and indicate possibilities for future work.
259
Evaluation of a Generic Lexical Semantic Resource in Information Extraction
We have created an information extraction system that allows users to train the system on a domain of interest. The system helps to maximize the effect of user training by applying WordNet to rule generation and validation. The results show that, with careful control, WordNet is helpful in generating useful rules to cover more instances and hence improve the overall performance. This is particularly true when the training set is small, where F-measure is increased from 65% to 72%. However, the impact of WordNet diminishes as the size of training data increases. This paper describes our experience in applying WordNet to this system and gives an evaluation of such an effort.
355
Evaluation of Computational Linguistic Techniques for Identifying Significant Topics for Browsing Applications
Evaluation of natural language processing tools and systems must focus on two complementary aspects: first, evaluation of the accuracy of the output, and second, evaluation of the functionality of the output as embedded in an application. This paper presents evaluations of two aspects of LinkIT, a tool for noun phrase identification, linking, sorting and filtering. LinkIT (Evans, 1998) uses a head sorting method (Wacholder, 1998) to organize and rank simplex noun phrases (SNPs). LinkIT is intended to identify significant topics in domain-independent documents. The first evaluation, reported in D. K. Evans et al. (2000), compares the output of the noun phrase finder in LinkIT to two other systems. Issues of establishing a gold standard and criteria for matching are discussed. The second evaluation directly concerns the construction of the browsing application. We present results from Wacholder et al. (2000) on a qualitative evaluation which compares three shallow processing methods for extracting index terms, i.e., terms that can be used to model the content of documents. We analyze both quality and coverage. We discuss how experimental results such as these guide the building of effective browsing applications.
34
Evaluation of TRANSTYPE, a Computer-aided Translation Typing System: A Comparison of a Theoretical- and a User-oriented Evaluation Procedures
We describe and compare two protocols, one theoretical and the other in situ, for evaluating the TRANSTYPE system, a target-text mediated interactive machine translation prototype which predicts in real time the words of the ongoing translation.
137
Evaluation of Word Alignment Systems
Recent years have seen a few serious attempts to develop methods and measures for the evaluation of word alignment systems, notably the Blinker project (Melamed, 1998) and the ARCADE project (Veronis and Langlais, forthcoming). In this paper we discuss different approaches to the problem and report on results from a project where two word alignment systems have been evaluated. These results include methods and tools for the generation of reference data and a set of measures for system performance. We note that the selection and sampling of reference data can have a great impact on scoring results.
151
Experiences of Language Engineering Algorithm Reuse
Traditionally, the level of reusability of language processing resources within the research community has been very low. Most of the recycling of linguistic resources has been concerned with reuse of data, e.g., corpora, lexica, and grammars, while algorithmic resources far too seldom have been shared between different projects and institutions. As a consequence, researchers who are willing to reuse somebody else's processing components have been forced to invest major efforts into issues of integration, inter-process communication, and interface design. In this paper, we discuss the experiences drawn from the SVENSK project regarding the issues of reusability of language engineering software, as well as some of the challenges for the research community which are prompted by them. Their main characteristics can be laid out along three dimensions: technical/software challenges, linguistic challenges, and 'political' challenges. In the end, the unavoidable conclusion is that it definitely is time to bring more aspects of engineering into the Computational Linguistics community!
369
Extension and Use of GermaNet, a Lexical-Semantic Database
This paper describes GermaNet, a lexical-semantic network and on-line thesaurus for the German language, and outlines its future extension and use. GermaNet is structured along the same lines as the Princeton WordNet (Miller et al., 1990; Fellbaum, 1998), encoding the major semantic relations like synonymy, hyponymy, meronymy, etc. that hold among lexical items. Constructing semantic networks like GermaNet has become very popular in recent approaches to computational lexicography, since wordnets constitute important language resources for word sense disambiguation, which is a prerequisite for various applications in the field of natural language processing, like information retrieval, machine translation and the development of different language-learning tools.
202
Extraction of Concepts and Multilingual Information Schemes from French and English Economics Documents
This paper focuses on the linguistic analysis of economic information in French and English documents. Our objective is to establish domain-specific information schemes based on structural and conceptual information. At the structural level, we define linguistic triggers that take into account each language's specificity. At the conceptual level, analysis of concepts and relations between concepts results in a classification, prior to the representation of schemes. The final outcome of this study is a mapping between linguistic and conceptual structures in the field of economics.
35
Extraction of Semantic Clusters for Terminological Information Retrieval from MRDs
This paper describes a semantic clustering method for data extracted from machine readable dictionaries (MRDs) in order to build a terminological information retrieval system that finds terms from descriptions of concepts. We first examine approaches based on ontologies and statistics, before introducing our analogy-based approach that lets us extract semantic clusters by aligning definitions from two dictionaries. Evaluation of the final set of clusters for a small set of definitions demonstrates the utility of our approach.
79
Extraction of Unknown Words Using the Probability of Accepting the Kanji Character Sequence as One Word
In this paper, we propose a method to extract unknown words, composed of two or three kanji characters, from Japanese text. Generally, an unknown word composed of kanji characters is segmented into other words by morphological analysis, and the appearance probability of each segmented word is small. From these features, we can define a measure of accepting a two- or three-kanji character sequence as one unknown word. In addition, we can find some segmentation patterns characteristic of unknown words. By applying our measure to kanji character sequences which match these patterns, we can extract unknown words. In the experiment, the F-measure for extraction of unknown words composed of two and three kanji characters was about 0.7 and 0.4 respectively. Our method does not need the frequency of a word in the training corpus to judge whether it is an unknown word or not. Therefore, our method has the advantage that low-frequency unknown words can be extracted.