PBIE: A Data Preparation Toolkit toward Developing a Parsing-Based Information Extraction System


Junko Hosaka, Igor V. Kurochkin, Akihiko Konagaya

RIKEN Genomic Sciences Center




We have developed a toolkit in which an annotation tool, a syntactic tree editor, and an extraction rule editor interact dynamically. Its output can be stored in a database for further use. In the field of biomedicine, there is a critical need for automatic text processing. However, current language processing approaches suffer from insufficient basic data incorporating both human domain expertise and domain-specific language processing capabilities. With the annotation tool presented here, a set of ggold standardsh can be collected, representing what should be extracted. At the same time, any change in annotation can be viewed on an associated syntactic tree. These facilities provide a clear picture of the relationship between the extraction target and the syntactic tree. Underlying sentences can be analyzed with a parser which can be plugged in, or a set of parsed sentences can be used to generate the tree. Extraction rules written with the integrated editor can be applied at once, and their validity can immediately be verified both on the syntactic tree and on the sentence string by coloring the corresponding segments. Thus our toolkit enables the user to efficiently construct parse-based extraction rules. PBIE2 works under Windows 2000/XP and requires Microsoft Internet Explorer 6.0 or higher. The data can be stored in Microsoft Access.


Tool, Information Extraction, Biomedicine



Full Paper