LREC 2000 2nd International Conference on Language Resources & Evaluation  

Title Semi-automatic Construction of a Tree-annotated Corpus Using an Iterative Learning Statistical Language Model
Authors Shirai Kiyoaki (Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, kshirai@cl.cs.titech.ac.jp)
Tanaka Hozumi (Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, tanaka@cl.cs.titech.ac.jp)
Tokunaga Takenobu (Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, take@cl.cs.titech.ac.jp)
Keywords Human Intervention, Iterative Learning, Statistical Language Model, Tree-Annotated Corpus
Session Session WP2 - Corpus Annotation
Abstract In this paper, we propose a method for constructing a tree-annotated corpus when a statistical parsing system exists but no tree-annotated corpus is available as training data. The basic idea is to sequentially annotate plain-text inputs with syntactic trees using a parser equipped with a statistical language model, and to iteratively retrain that model on the annotated trees obtained so far. The method has two main characteristics: (1) in the first step of the iterative learning process, we manually construct a tree-annotated corpus on which the statistical language model is initially trained, and (2) at each step of the parse tree annotation process, we choose the most probable parse tree using both syntactic statistics obtained from the iterative learning process and lexical statistics pre-derived from existing language resources.
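The iterative annotate-and-retrain loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the existing statistical parser is replaced by a hypothetical list of candidate trees per sentence, the syntactic model is a simple rule-frequency model with add-one style smoothing, the lexical statistics are a fixed table of hypothetical attachment probabilities, and the human check is a `correct` callback (here it simply accepts the model's choice). The names `SyntacticModel`, `score`, and `build_corpus` and the tree encoding are illustrative only.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical tree encoding (an assumption): the CFG rules a parse uses,
# plus the lexical events (e.g. PP-attachment decisions) it commits to.
Rule = Tuple[str, Tuple[str, ...]]
Tree = Tuple[Tuple[Rule, ...], Tuple[str, ...]]


class SyntacticModel:
    """Rule probabilities re-estimated from annotated trees (add-one style smoothing)."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.rule_counts: Counter = Counter()
        self.lhs_counts: Counter = Counter()
        self.rhs_types: Dict[str, Set[Tuple[str, ...]]] = defaultdict(set)

    def train(self, trees: Sequence[Tree]) -> None:
        # Retrain from scratch over the whole corpus built so far.
        self.reset()
        for rules, _ in trees:
            for lhs, rhs in rules:
                self.rule_counts[(lhs, rhs)] += 1
                self.lhs_counts[lhs] += 1
                self.rhs_types[lhs].add(rhs)

    def log_prob(self, rule: Rule) -> float:
        lhs = rule[0]
        # The extra +1 in the denominator reserves mass for unseen expansions.
        num = self.rule_counts[rule] + 1
        den = self.lhs_counts[lhs] + len(self.rhs_types[lhs]) + 1
        return math.log(num / den)


def score(tree: Tree, syn: SyntacticModel, lex: Dict[str, float],
          unk_lex: float = 1e-3) -> float:
    """Log score = learned syntactic statistics + fixed lexical statistics."""
    rules, lex_events = tree
    return (sum(syn.log_prob(r) for r in rules)
            + sum(math.log(lex.get(e, unk_lex)) for e in lex_events))


def build_corpus(seed: List[Tree], batches: Sequence[Sequence[List[Tree]]],
                 lex: Dict[str, float],
                 correct: Callable[[List[Tree], Tree], Tree]) -> List[Tree]:
    corpus = list(seed)                   # step 1: manually annotated seed trees
    syn = SyntacticModel()
    syn.train(corpus)                     # initialise the syntactic model
    for batch in batches:
        for candidates in batch:          # candidate parses of one plain-text sentence
            best = max(candidates, key=lambda t: score(t, syn, lex))
            corpus.append(correct(candidates, best))  # human intervention
        syn.train(corpus)                 # iterative retraining
    return corpus


# Toy usage: one seed tree and one ambiguous sentence ("saw the man with a telescope").
seed = [((("S", ("NP", "VP")), ("VP", ("V", "NP"))), ())]
high = ((("S", ("NP", "VP")), ("VP", ("V", "NP", "PP")), ("PP", ("P", "NP"))),
        ("saw<-with",))
low = ((("S", ("NP", "VP")), ("VP", ("V", "NP")), ("NP", ("NP", "PP")),
        ("PP", ("P", "NP"))), ("man<-with",))
lex = {"saw<-with": 0.7, "man<-with": 0.3}  # hypothetical pre-derived statistics
corpus = build_corpus(seed, [[[high, low]]], lex, correct=lambda cands, best: best)
```

In this toy run, the syntactic statistics learned from the single seed tree favour the low attachment (4/9 versus 2/9), but the lexical statistics tip the decision to the high attachment, which illustrates why the two sources of evidence are combined when choosing a parse.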
