TIDES Language Resources: A Resource Map for Translingual Information Access
Christopher Cieri (University of Pennsylvania, Linguistic Data Consortium, 3615 Market Street, Philadelphia, PA 19104-2608 U.S.A.)
Mark Liberman (University of Pennsylvania, Linguistic Data Consortium, 3615 Market Street, Philadelphia, PA 19104-2608 U.S.A.)
WO13: Issues On LRs Infrastructures
Continuing improvements in algorithms for processing human language, coupled with improvements in digital storage and processing, inspire growing confidence in multilingual information access systems. Systems already exist that transcribe broadcast news, segment broadcasts into individual stories, and sort those stories by topic. These technologies, useful in isolation, are now being combined to produce intelligent multilingual systems. DARPA TIDES (Translingual Information Detection, Extraction and Summarization) combines technologies in detection, extraction, summarization and translation to create systems that can search a wide range of streaming multilingual text and speech sources in real time and give English-speaking users effective access to them. The broad scope of tasks and languages in programs like TIDES demands close coordination of research and shared resources. These resources include large collections of raw text and speech; translations and summaries; annotations of topics, named entities and relations, syntactic structures and propositional content; lexicons; annotation specifications and protocols; and distribution formats and standards. The TIDES program has launched ambitious attacks on difficult problems, with linguistic resources matched to the needs of each part of the overall research enterprise. This paper describes the coordinated language resources being created under the TIDES aegis.