LREC 2000 2nd International Conference on Language Resources & Evaluation

Papers and abstracts by paper title: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Papers and abstracts by ID number: 1-50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350, 351-377.

List of all papers and abstracts

Paper Paper Title Abstract
201 A Framework for Cross-Document Annotation We introduce a cross-document annotation toolset that serves as a corpus-wide knowledge base for linguistic annotations. This imple-mented system is designed to address the unique cognitive demands placed on human annotators who must relate information that is expressed across document boundaries.
202 Extraction of Concepts and Multilingual Information Schemes from French and English Economics Documents This paper focuses on the linguistic analysis of economic information in French and English documents. Our objective is to establish domain-specific information schemes based on structural and conceptual information. At the structural level, we define linguistic triggers that take into account each language's specificity. At the conceptual level, analysis of concepts and relations between concepts result in a classification, prior to the representation of schemes. The final outcome of this study is a mapping between linguistic and conceptual structures in the field of economics.
203 How to Evaluate Your Question Answering System Every Day ... and Still Get Real Work Done In this paper, we report on Qaviar, an experimental automated evaluation system for question answering applications. The goal of our research was to find an automatically calculated measure that correlates well with human judges' assessment of answer correctness in the context of question answering tasks. Qaviar judges the response by computing recall against the stemmed content words in the human-generated answer key. It counts the answer correct if it exceeds a given recall threshold. We determined that the answer correctness predicted by Qaviar agreed with the human 93% to 95% of the time. 41 question-answering systems were ranked by both Qaviar and human assessors, and these rankings correlated with a Kendall’s Tau measure of 0.920, compared to a correlation of 0.956 between human assessors on the same data.
205 What are Transcription Errors and Why are They made? In recent work we compared transcriptions of German spontaneous dialogues of the VERBMOBIL corpus to ascertain differences between transcribers and quality. A better understanding of where and what kind of inconsistencies occur will help us to improve the working environment for transcribers, to reduce the effort on correction passes, and will finally result in better transcription quality. The results show that transcribers have different levels of perception of spontaneous speech phenomena, mainly prosodic phenomena such as pauses in speech and lengthening. During the correction pass 80% of these labels had to be inserted. Additionally, the annotation of non-grammatical phrases and pronunciation comments seems to need a better explanation in the convention manual. Here the correcting transcribers had to change 20% of the annotations.
206 On the Usage of Kappa to Evaluate Agreement on Coding Tasks In recent years, the Kappa coefficient of agreement has become the de facto standard to evaluate intercoder agreement in the discourse and dialogue processing community. Together with the adoption of this standard, researchers have adopted one specific scale to evaluate Kappa values, the one proposed in (Krippendorff, 1980). In this paper, I highlight some issues that should be taken into account when evaluating Kappa values. Finally, I speculate on whether Kappa could be used as a measure to evaluate a system’s performance.
208 Automatic Extraction of English-Chinese Term Lexicons from Noisy Bilingual Corpora This paper describes our system, which is designed to extract English-Chinese term lexicons from noisy complex bilingual corpora and use them as translation lexicon to check sentence alignment results. The noisy bilingual corpora are aligned firstly by our improved length based statistical approach, which could detect sentence omission and insertion partly. A term extraction system is used to obtain term translation lexicons form roughly aligned corpora. Then the statistical approach is used to align the corpora again. Finally, we filter the noisy bilingual texts and obtain nearly perfect alignment corpora.
209 Issues in Corpus Creation and Distribution: The Evolution of the Linguistic Data Consortium The Linguistic Data Consortium (LDC) is a non-profit consortium of universities, companies and government research laboratories that supports education, research and technology development in language related disciplines by collecting or creating, distributing and archiving language resources including data and accompanying tools, standards and formats. LDC was founded in 1992 with a grant from the Defense Advanced Research Projects Agency (DARPA) to the University of Pennsylvania as host organization. LDC publication and distribution activities self-support from membership fees and data sales while new data creation is supported primarily by grants from DARPA and the National Science Foundation. Recent developments in the creation and use of language resources demand new roles for international data centers. Since our report at the last Language Resource and Evaluation Conference in Granada in 1998, LDC has observed growth in the demand for language resources along multiple dimensions: larger corpora with more sophisticated annotation in a wider variety of languages are used in an increasing number of language related disciplines. There is also increased demand for reuse of existing corpora. Most significantly, small research groups are taking advantage of advances in microprocessor technology, data storage and internetworking to create their own corpora. This has lead to the birth of new annotation practices whose very variety creates barriers to data sharing. This paper will describe recent LDC efforts to address emerging issues in the creation and distribution of language resources.
210 Large, Multilingual, Broadcast News Corpora for Cooperative Research in Topic Detection and Tracking: The TDT-2 and TDT-3 Corpus Efforts This paper describes the creation and content two corpora, TDT-2 and TDT-3, created for the DARPA sponsored Topic Detection and Tracking project. The research goal in the TDT program is to create the core technology of a news understanding system that can process multilingual news content categorizing individual stories according to the topic(s) they describe. The research tasks include segmentation of the news streams into individual stories, detection of new topics, identification of the first story to discuss any topic, tracking of all stories on selected topics and detection of links among stories discussing the same topics. The corpora contain English and Chinese broadcast television and radio, newswires, and text from web sites devoted to news. For each source there are texts or text intermediaries; for the broadcast stories the audio is also available. Each broadcast is also segment to show start and end times of all news stories. LDC staff have defined news topics in the corpora and annotated each story to indicate its relevance to each topic. The end products are massive, richly annotated corpora available to support research and development in information retrieval, topic detection and tracking, information extraction message understanding directly or after additional annotation. This paper will describe the corpora created for TDT including sources, collection processes, formats, topic selection and definition, annotation, distribution and project management for large corpora.
211 Using Machine Learning Methods to Improve Quality of Tagged Corpora and Learning Models Corpus-based learning methods for natural language processing now provide a consistent way to achieve systems with good performance. A number of statistical learning models have been proposed and are used in most of the tasks which used to be handled by rule-based systems. When the learning systems come to such a level as competitive as manually constructed systems, both large scale training corpora and good learning models are of great importance. In this paper, we first discuss that the main hindrances to the improvement of corpus-based learning systems are the inconsistencies or the errors existing in the training corpus and the defectiveness in the learning model. We then show that some machine learning methods are useful for effective identification of the erroneous source in the training corpus. Finally, we discuss how the various types of errors should be coped with so as to improve the learning environments.
212 Quality Control in Large Annotation Projects Involving Multiple Judges: The Case of the TDT Corpora The Linguistic Data Consortium at the University of Pennsylvania has recently been engaged in the creation of large-scale annotated corpora of broadcast news materials in support of the ongoing Topic Detection and Tracking (TDT) research project. The TDT corpora were designed to support three basic research tasks: segmentation, topic detection, and topic tracking in newswire, television and radio sources from English and Mandarin Chinese. The most recent TDT corpus, TDT3, added two tasks, story link and first story detection. Annotation of the TDT corpora involved a large staff of annotators who produced millions of human judgements. As with any large corpus creation effort, quality assurance and inter-annotator consistency were a major concern. This paper reports the quality control measures adopted by the LDC during the creation of the TDT corpora, presents techniques that were utilized to evaluate and improve the consistency of human annotators for all annotation tasks, and discusses aspects of project administration that were designed to enhance annotation consistency.
213 Learning Preference of Dependency between Japanese Subordinate Clauses and its Evaluation in Parsing (Utsuro et al., 2000) proposed statistical method for learning dependency preference of Japanese subordinate clauses, in which scopeembedding preference of subordinate clauses is exploited as a useful information source for disambiguating dependencies between subordinate clauses. Following (Utsuro et al., 2000), this paper presents detailed results of evaluating the proposed method by comparing it with several closely related existing techniques and shows that the proposed method outperforms those existing techniques.
214 Live Lexicons and Dynamic Corpora Adapted to the Network Resources for Chinese Spoken Language Processing Applications in an Internet Era In the future network era, huge volume of information on all subject domains will be readily available via the network. Also, all the network information are dynamic, ever-changing and exploding. Furthermore, many of the spoken language processing applications will have to do with the content of the network information, which is dynamic. This means dynamic lexicons, language models and so on will be required. In order to cope with such a new network environment, automatic approaches for the collection, classification, indexing, organization and utilization of the linguistic data obtainable from the networks for language processing applications will be very important. On the one hand, high performance spoken language technology can hopefully be developed based on such dynamic linguistic data on the network. On the other hand, it is also necessary that such spoken language technology can be intelligently adapted to the content of the dynamic and the ever-changing network information. Some basic concept for live lexicons and dynamic corpora adapted to the network resources has been developed for Chinese spoken language processing applications and briefly summarized here in this paper. Although the major considerations here are for Chinese language, the concept may equally apply to other languages as well.
215 Lessons Learned from a Task-based Evaluation of Speech-to-Speech Machine Translation For several years we have been conducting Accuracy Based Evaluations (ABE) of the JANUS speech-to-speech MT system (Gates et al., 1997) which measure quality and fidelity of translation. Recently we have begun to design a Task Based Evaluation for JANUS (Thomas, 1999) which measures goal completion. This paper describes what we have learned by comparing the two types of evaluation. Both evaluations (ABE and TBE) were conducted on a common set of user studies in the semantic domain of travel planning.
216 Part of Speech Tagging and Lemmatisation for the Spoken Dutch Corpus This paper describes the lemmatisation and tagging guidelines developed for the “Spoken Dutch Corpus”, and lays out the philosophy behind the high granularity tagset that was designed for the project. To bootstrap the annotation of large quantities of material (10 million words) with this new tagset we tested several existing taggers and tagger generators on initial samples of the corpus. The results show that the most effective method, when trained on the small samples, is a high quality implementation of a Hidden Markov Model tagger generator.
217 The Influence of Scenario Constraints on the Spontaneity of Speech. A Comparison of Dialogue Corpora In this article we compare two large scale dialogue corpora recorded in different settings. The main differences are unrestricted turn-taking vs. push-to-talk button and complex vs. simple negotiation task. In our investigation we found that vocabulary, durations of turns, words and sounds as well as prosodical features are influenced by differences in the setting.
218 Automatic Assignment of Grammatical Relations This paper presents a method for the assignment of grammatical relation labels in a sentence structure. The method has been implemented in the software tool AGRA (Automatic Grammatical Relation Assigner), which is part of a project for the development of a treebank of Italian sentences, and a knowledge base of Italian subcategorization frames. The annotation schema implements a notion of underspecification, that arranges grammatical relations from generic to specific one onto a hierarchy; the software tool works with hand-coded rules, which apply heuristic knowledge (on syntactic and semantic cues) to distinguish between complements and modifiers.
219 Integrating Subject Field Codes into WordNet In this paper, we present a lexical resource where WordNet synsets are annotated with Subject Field Codes. We discuss both the methodological issues we dealt with and the annotation techniques used. A quantitative analysis of the resource coverage, as well as a qualitative evaluation of the proposed annotations, are reported.
220 Building a Treebank for Italian: a Data-driven Annotation Schema Many natural language researchers are currently turning their attention to treebank development and trying to achieve accuracy and corpus data coverage in their representation formats. This paper presents a data-driven annotation schema developed for an Italian treebank ensuring data coverage and consistency between annotation of linguistic phenomena. The schema is a dependency-based format centered upon the notion of predicate-argument structure augmented with traces to represent discontinuous constituents. The treebank development involves an annotation process performed by a human annotator helped by an interactive parsing tool that builds incrementally syntactic representation of the sentence. To increase the syntactic knowledge of this parser, a specific data-driven strategy has been applied. We describe the cyclical development of the annotation schema highlighting the richness and flexibility of the format, and we present some representational issues.
221 Typographical and Orthographical Spelling Error Correction This paper focuses on selection techniques for best correction of misspelt words at the lexical level. Spelling errors are introduced by either cognitive or typographical mistakes. A robust spelling correction algorithm is needed to cover both cognitive and typographical errors. For the most effective spelling correction system, various strategies are considered in this paper: ranking heuristics, correction algorithms, and correction priority strategies for the best selection. The strategies also take account of error types, syntactic information, word frequency statistics, and character distance. The findings show that it is very hard to generalise the spelling correction strategy for various types of data sets such as typographical, orthographical, and scanning errors.
223 Application of WordNet ILR in Czech Word-formation The aim of this paper is to describe some typical word formation procedures in Czech and to show how the internal language relations (ILR) as they are introduced in Czech WordNet can be related to the chosen derivational processes. In our exploration we have paid attention to the roles of agent, location, instrument and subevent which yield the most regular and rich ways of suffix derivation in Czech. We also deal with the issues of the translation equivalents and corresponding lexical gaps that had to be solved in the framework of EuroWordNet 2 (confronting Czech with English) since they are basically brought about by verb prefixation (single, double, verb aspect pairs) or noun suffixation (diminutives, move in gender). Finally, we try to demonstrate that the mentioned derivational processes can be employed to extend Czech lexical resources in a semiautomatic way.
224 POSCAT: A Morpheme-based Speech Corpus Annotation Tool As more and more speech systems require linguistic knowledge to accommodate various levels of applications, corpora that are tagged with linguistic annotations as well as signal-level annotations are highly recommended for the development of today’s speech systems. Among the linguistic annotations, POS (part-of-speech) tag annotations are indispensable in speech corpora for most modern spoken language applications of morphologically complex agglutinative languages such as Korean. Considering the above demands, we have developed a single unified speech corpus annotation tool that enables corpus builders to link linguistic annotations to signal-level annotations using a morphological analyzer and a POS tagger as basic morpheme-based linguistic engines. Our tool integrates a syntactic analyzer, phrase break detector, grapheme-to-phoneme converter and automatic phonetic aligner together. Each engine automatically annotates its own linguistic and signal knowledge, and interacts with the corpus developers to revise and correct the annotations on demand. All the linguistic/phonetic engines were developed and merged with an interactive visualization tool in a client-server network communication model. The corpora that can be constructed using our annotation tool are multi-purpose and applicable to both speech recognition and text-to-speech (TTS) systems. Finally, since the linguistic and signal processing engines and user interactive visualization tool are implemented within a client-server model, the system loads can be reasonably distributed over several machines.
226 A Flexible Infrastructure for Large Monolingual Corpora In this paper we describe a flexible and portable infrastructure for setting up large monolingual language corpora. The approach is based on collecting a large amount of monolingual text from various sources. The input data is processed on the basis of a sentence-based text segmentation algorithm. We describe the entry structure of the corpus database as well as various query types and tools for information extraction. Among them, the extraction and usage of sentence-based word collocations is discussed in detail. Finally we give an overview of different application for this language resource. A WWW interface allows for public access to most of the data and information extraction tools (
227 Automatic Transliteration and Back-transliteration by Decision Tree Learning Automatic transliteration and back-transliteration across languages with drastically different alphabets and phonemes inventories such as English/Korean, English/Japanese, English/Arabic, English/Chinese, etc, have practical importance in machine translation, cross-lingual information retrieval, and automatic bilingual dictionary compilation, etc. In this paper, a bi-directional and to some extent language independent methodology for English/Korean transliteration and back-transliteration is described. Our method is composed of character alignment and decision tree learning. We induce transliteration rules for each English alphabet and back-transliteration rules for each Korean alphabet. For the training of decision trees we need a large labeled examples of transliteration and back-transliteration. However this kind of resources are generally not available. Our character alignment algorithm is capable of highly accurately aligning English word and Korean transliteration in a desired way.
228 Shallow Discourse Genre Annotation in CallHome Spanish The classification of speech genre is not yet an established task in language technologies. However we believe that it is a task that will become fairly important as large amounts of audio (and video) data become widely available. The technological cability to easily transmit and store all human interactions in audio and video could have a radical impact on our social structure. The major open question is how this information can be used in practical and beneficial ways. As a first approach to this question we are looking at issues involving information access to databases of human-human interactions. Classification by genre is a first step in the process of retrieving a document out of a large collection. In this paper we introduce a local notion of speech activities that are exist side-by-side in conversations that belong to speech-genre: While the genre of CallHome Spanish is personal telephone calls between family members the actual instances of these calls contain activities such as storytelling, advising, interrogation and so forth. We are presenting experimental work on the detection of those activities using a variety of features. We have also observed that a limited number of distinguised activities can be defined that describes most of the activities in this database in a precise way.
230 Building a Treebank for French Very few gold standard annotated corpora are currently available for French. We present an ongoing project to build a reference treebank for French starting with a tagged newspaper corpus of 1 Million words (Abeillé et al., 1998), (Abeillé and Clément, 1999). Similarly to the Penn TreeBank (Marcus et al., 1993), we distinguish an automatic parsing phase followed by a second phase of systematic manual validation and correction. Similarly to the Prague treebank (Hajicova et al., 1998), we rely on several types of morphosyntactic and syntactic annotations for which we define extensive guidelines. Our goal is to provide a theory neutral, surface oriented, error free treebank for French. Similarly to the Negra project (Brants et al., 1999), we annotate both constituents and functional relations.
233 Establishing the Upper Bound and Inter-judge Agreement of a Verb Classification Task Detailed knowledge about verbs is critical in many NLP and IR tasks, yet manual determination of such knowledge for large numbers of verbs is difficult, time-consuming and resource intensive. Recent responsesto this problem have attempted to classify verbs automatically, as a first step to automatically build lexical resources. In order to estimate the upper bound of a verb classification task, which appears to be difficult and subject to variability among experts, we investigated the performance of human experts in controlled classification experiments. We report here the results of two experiments—using a forced-choice task and a non-forced choice task—which measure human expert accuracy (compared to a gold standard) in classifying verbs into three pre-defined classes, as well as inter-expert agreement. To preview, we find that the highest expert accuracy is 86.5% agreement with the gold standard, and that inter-expert agreement is not very high (K between .53 and .66). The two experiments show comparable results.
234 Layout Annotation in a Corpus of Patient Information Leaflets We discuss the problems and issues that arised during the development of a procedure for annotating layout in a corpus of Patient Information Leaflets. We show how the genre of the corpus as well as the aim of the annotation influenced the annotation scheme. We also describe the automatic annotation procedure.
235 A New Methodology for Speech Corpora Definition from Internet Documents In this paper, a new methodology for speech corpora definition from internet documents is described, in order to record a large speech database, dedicated to the training and testing of acoustic models for speech recognition. In the first section, the Web robot which is in charge of collecting Web pages from Internet is presented, then the web text to French sentences filtering mechanism is explained. Some information about the corpus organization (90% for training and 10% for test) is given. In the third section, the phoneme distribution of the corpus is presented and comparison is made with others French language studies. Finally tools and planning for recording the speech database with more than one hundred speakers are described.
236 Coping with Lexical Gaps when Building Aligned Multilingual Wordnets In this paper we present a methodology for automatically classifying the translation equivalents of a machine readable bilingual dictionary in three main groups: lexical units, lexical gaps (that is cases when a lexical concept of a language does not have a correspondent in the other language) and translation equivalents that need to be manually classified as lexical units or lexical gaps. This preventive classification reduces the manual work necessary to cope with lexical gaps in the construction of aligned multilingual wordnets.
237 Design and Construction of Knowledge base for Verb using MRD and Tagged Corpus This paper represents the procedure of building syntactic knowledge base. This study is to construct basic sentence pattern automatically by using the POS-tagged corpus in balanced KAIST corpus, and electronic dictionary for Korean, and to construct syntactic knowledge base with specific information added to the lexicographer's analysis. The summary of work process will be as follows: 1) Extraction of characteristic verb targeting the high frequency verb from KAIST corpus 2) Constructing sentence pattern from each verb case frame structure extracted from MRD 3) Making out the noun categories of sentence pattern through KCP examples 4) Semantic classification of selected verb suitable for classified sentence pattern 5) Description of hyper concept to individual noun categories 6) Putting the translated words in Japanese to each noun and verb
239 Introduction of KIBS (Korean Information Base System) Project This project has been carried out on the basis of resources and tools for Korean NLP. The main research is the construction of raw corpus of 64 million tokens and Part-of-Speech tagged corpus of about 11 million tokens. And we develop some analytic tools to construct and some supporting tools to navigate them. This paper represents the present state of the work carried out by the KIBS project. We introduce a KAIST tag set of POS and syntax for standard corpus and annotation principles. And we explain several error types represented in tagged corpus.
241 Resources for Multilingual Text Generation in Three Slavic Languages The paper discusses the methods followed to re-use a large-scale, broad-coverage English grammar for constructing similar scale grammars for Bulgarian, Czech and Russian for the fast prototyping of a multilingual generation system. We present (1) the theoretical and methodological basis for resource sharing across languages, (2) the use of a corpus-based contrastive register analysis, in particular, contrastive analysis of mood and agency. Because the study concerns reuse of the grammar of a language that is typologically quite different from the languages treated, the issues addressed in this paper appear relevant to a wider range of researchers in need of large-scale grammars for less-researched languages.
243 A Multi-view Hyperlexicon Resource for Speech and Language System Development New generations of integrated multimodal speech and language systems with dictation, readback or talking face facilities require multiple sources of lexical information for development and evaluation. Recent developments in hyperlexicon development offer new perspectives for the development of such resources which are at the same time practically useful, computationally feasible, and theoretically well- founded. We describe the specification, three-level lexical document design principles, and implementation of a MARTIF document structure and several presentation structures for a terminological lexicon, including both on demand access and full hypertext lexicon compilation. The underlying resource is a relational lexical database with SQL querying and access via a CGI internet interface. This resource is mapped on to the hypergraph structure which defines the macrostructure of the hyperlexicon.
244 Enabling Resource Sharing in Language Generation: an Abstract Reference Architecture The RAGS project aims to develop a reference architecture for natural language generation,to facilitate modular development of NLG systams as well as evaluation of components, systems and algorithms. This paper gives an overview of the proposed framework, describing an abstract data model with five levels of representation: Conceptual, Semantic, Rhetorical, Document and Syntactic. We report on a re-implementation of an existing system using the RAGS data model.
246 Issues in Design and Collection of Large Telephone Speech Corpus for Slovenian Language In this paper, different issues in design, collection and evaluation of the large vocabulary telephone speech corpus of Slovenian language are discussed. The database is composed of three text corpora containing 1530 different sentences. It contains read speech of 82 speakers where each speaker read in average more than 200 sentences and 21 speakers read also the text passage of 90 sentences. The initial manual segmentation and labeling of speech material was performed. Based on this the automatic segmentation was carried out. The database should facilitate the development of speech recognition systems to be used in dictation tasks over the telephone. Until now the database was used mostly for isolated digit recognition tasks and word spotting.
247 ARC A3: A Method for Evaluating Term Extracting Tools and/or Semantic Relations between Terms from Corpora This paper describes an ongoing project evaluating Natural Language Processing (NLP) systems. The aim of this project is to test software capabilities in automatic or semi-automatic extraction of terminology from French corpora in order to build tools used in NLP applications. We are putting forward a strategy based on qualitative evaluation. The idea is to submit the results to specialists (i.e. field specialists, terminologists and/or knowledge engineers). The research we are conducting is sponsored by the ''Association des Universites Francophones'' (AUF) an international Organisation whose mission is to promote the dissemination of French as a scientific medium. Software submitted to this evaluation are conceived by French, Canadian and US research institutions (National Scientific Research Centre and Universities) and/or companies : CNRS (France), XEROX, and LOGOS Corporation among others.
248 A Parallel English-Japanese Query Collection for the Evaluation of On-Line Help Systems An experiment concerning the creation of parallel evaluation data for information retrieval is presented. A set of English queries was gathered for the domain of wordprocessing using Lotus Ami Pro. A set of Japanese queries was then created from these. The answers to the queries were elicited from eight respondents comprising four native speakers of each language. We first describe how the queries were created and the answers elicited. We then present analyses of the responses in each language. The results show a lower level of agreeement between respondents than was expected. We discuss a refinement of the elicitation process which is designed to address this problem as well as measuring the integrity of individual respondents.
249 Principled Hidden Tagset Design for Tiered Tagging of Hungarian For highly inflectional languages, the number of morpho-syntactic descriptions (MSD), required to descriptionally cover the content of a word-form lexicon, tends to rise quite rapidly, approaching a thousand or even more set of distinct codes. For the purpose of automatic disambiguation of arbitrary written texts, using such large tagsets would raise very many problems, starting from implementation issues of a tagger to work with such a large tagsets to the more theory-based difficulty of sparseness of training data. Tiered tagging is one way to alleviate this problem by reformulating it in the following way: starting from a large set of MSDs, design a reduced tagset, Ctag-set, manageable for the current tagging technology. We describe the details of the reduced tagset design for Hungarian, where the MSD-set cardinality is several thousand. This means that designing a manageable C-tagset calls for severe reduction in the number of the MSD features, a process that requires careful evaluation of the features.
250 Evaluating Wordnets in Cross-language Information Retrieval: the ITEM Search Engine This paper presents the ITEM multilingual search engine. This search engine performs full lexical processing (morphological analysis, tagging and Word Sense Disambiguation) on documents and queries in order to provide language-neutral indexes for querying and retrieval. The indexing terms are the EuroWordNet/ITEM InterLingual Index records that link wordnets in 10 languages of the European Community (the search engine currently supports Spanish, English and Catalan). The goal of this application is to provide a way of comparing in context the behavior of different Natural Language Processing strategies for Cross-Language Information Retrieval (CLIR) and, in particular, different Word Sense Disambiguation strategies for query translation and conceptual indexing.