An Annotation Scheme for a Rhetorical Analysis of Biology Articles


Yoko Mizuta, Nigel Collier

National Institute of Informatics




In information extraction from scientific texts, it is crucial to identify the unique contribution of the research. The task is complicated by the large number of statements in each article that pertain to results, including references to previous work and technical details. Simple keyword searches are helpful for a content-based analysis but fail to distinguish new results from other result-related statements. We approach the problem from a rhetorical perspective and give a 'zone analysis' (ZA) of texts following Teufel, Carletta & Moens (1999). We divide a text into 'zones', with shallow nesting, according to the rhetorical status of each sequence of statements, and annotate the text accordingly. Our current focus is the molecular biology domain. In this paper, we propose an annotation scheme for ZA based on an empirical analysis of major online journals (EMBO, NAR, PNAS, and JCB) and illustrate how it works. Our scheme differentiates the text in terms of aspects of the author's own work (e.g. experimental procedure, findings, implications) and identifies a set of statements relating data to findings, thereby helping to identify the author's new results and findings.
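The contrast between keyword search and zone-based retrieval can be illustrated with a minimal sketch. It assumes a hypothetical representation in which each sentence carries a top-level zone label ("OWN" for the author's own work, "OTHER" for previous work) and, inside "OWN", an optional nested sub-zone named after the aspects listed above. These labels, the `Sentence` class, the helper functions, and the example sentences about "protein X" are all invented for illustration and are not the scheme's actual tag set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sentence:
    text: str
    zone: str                      # hypothetical top-level zone, e.g. "OWN" or "OTHER"
    subzone: Optional[str] = None  # hypothetical nested zone inside "OWN"


def keyword_hits(sentences: List[Sentence], keyword: str) -> List[str]:
    """Naive keyword search: matches every mention, old or new."""
    return [s.text for s in sentences if keyword.lower() in s.text.lower()]


def new_findings(sentences: List[Sentence]) -> List[str]:
    """Zone-based lookup: only findings within the author's own work."""
    return [s.text for s in sentences
            if s.zone == "OWN" and s.subzone == "findings"]


# Invented toy document; all four sentences mention the same keyword.
doc = [
    Sentence("Previous studies showed that protein X binds DNA.", "OTHER"),
    Sentence("Cells were transfected with a protein X mutant.", "OWN", "procedure"),
    Sentence("We found that the protein X mutant fails to bind DNA.", "OWN", "findings"),
    Sentence("This suggests that protein X binding is required for repair.", "OWN", "implications"),
]

print(len(keyword_hits(doc, "protein X")))  # 4: the keyword alone cannot separate them
print(new_findings(doc))                    # only the sentence in the "findings" sub-zone
```

The keyword query returns all four sentences, whereas filtering on the nested zone isolates the single statement of a new result; the shallow two-level nesting keeps the lookup a simple pair of label comparisons.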


Keywords: information extraction, rhetorical analysis, zone, annotation scheme, biology texts

Language(s): English (in scientific texts)
Full Paper