LREC 2000 2nd International Conference on Language Resources & Evaluation  

Title Building a Treebank for Italian: a Data-driven Annotation Schema
Authors Bosco Cristina (Dipartimento di Informatica, Universita di Torino, c.so Svizzera 185, 10149, Torino (Italy), bosco@di.unito.it)
Lombardo Vincenzo (DISTA – Universita del Piemonte Orientale “A. Avogadro”, c.so Borsalino 54, 15100 Alessandria, Italy, Centro di Scienza Cognitiva – Universita di Torino, via Lagrange 3, 10123 Torino, Italy, vincenzo@di.unito.it)
Vassallo Daniela (Dipartimento di Informatica, Universita di Torino, c.so Svizzera 185, 10149, Torino (Italy), vassallo@di.unito.it)
Lesmo Leonardo (Dipartimento di Informatica, Universita di Torino, c.so Svizzera 185, 10149, Torino (Italy), lesmo@di.unito.it)
Keywords Annotation Schema, Corpus, Dependency Format, Italian, Treebank
Session Session WO2 - Treebanks
Abstract Many natural language researchers are currently turning their attention to treebank development, aiming for both accuracy and coverage of corpus data in their representation formats. This paper presents a data-driven annotation schema developed for an Italian treebank, designed to ensure data coverage and consistency in the annotation of linguistic phenomena. The schema is a dependency-based format centered on the notion of predicate-argument structure, augmented with traces to represent discontinuous constituents. The treebank is developed through an annotation process performed by a human annotator, assisted by an interactive parsing tool that incrementally builds a syntactic representation of the sentence. To increase the syntactic knowledge of this parser, a specific data-driven strategy has been applied. We describe the cyclical development of the annotation schema, highlighting the richness and flexibility of the format, and we present some representational issues.

 
