Title Integrated Linguistic Resources for Language Exploitation Technologies
Authors S. Strassel, C. Cieri, A. Cole, D. DiPersio, M. Liberman, X. Ma, M. Maamouri, K. Maeda
Abstract The Linguistic Data Consortium (LDC) has recently embarked on an effort to create integrated linguistic resources and related infrastructure for language exploitation technologies within the DARPA GALE (Global Autonomous Language Exploitation) Program. GALE targets an end-to-end system consisting of three major engines: Transcription, Translation and Distillation. The system takes multilingual speech or text from a variety of genres as input and produces English text as output, presenting information of interest to the end user in an integrated and consolidated fashion. GALE's goals require a quantum leap in the performance of human language technology, while also demanding solutions that are more intelligent, more robust, more adaptable, more efficient and more integrated. LDC has responded to this challenge with a comprehensive approach to linguistic resource development, designed both to support GALE's research and evaluation needs and to provide lasting resources for the larger Human Language Technology community.
Keywords data centers, linguistic resources, GALE, evaluation programs, transcription, translation, distillation
