LREC 2000 2nd International Conference on Language Resources & Evaluation  

Conference Papers


Title LT TTT - A Flexible Tokenisation Tool
Authors Grover Claire (Language Technology Group, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, Scotland, email: grover@cogsci.ed.ac.uk)
Matheson Colin (Language Technology Group, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, Scotland, email: colin@cogsci.ed.ac.uk)
Mikheev Andrei (Language Technology Group, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, Scotland, email: mikheev@cogsci.ed.ac.uk)
Moens Marc (Language Technology Group, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, Scotland, email: marcg@cogsci.ed.ac.uk)
Keywords Corpus Preparation, Information Extraction, Named Entity Recognition, Tokenisation, XML Mark-Up
Session Session WP6 - Tools in the Written Area
Abstract We describe LT TTT, a recently developed software system which provides tools to perform text tokenisation and mark-up. The system includes ready-made components to segment text into paragraphs, sentences, words and other kinds of token but, crucially, it also allows users to tailor rule-sets to produce mark-up appropriate for particular applications. We present three case studies of our use of LT TTT: named-entity recognition (MUC-7), citation recognition and mark-up and the preparation

 
