Exploiting Semantic Web Technologies for Intelligent Access to Historical Documents


Nancy Ide (1), David Woolner (2)

(1) Department of Computer Science, Vassar College, Poughkeepsie, NY 12604-0520 USA; (2) Department of History, Marist College, Poughkeepsie, NY 12601 USA




The FDR/Pearl Harbor Project involves the enhancement of materials drawn from the Franklin D. Roosevelt Library and Digital Archives, which include a range of image, sound, video, and textual data. The project is undertaking the encoding, annotation, and multi-modal linkage of a portion of the collection, as well as the enhancement of a web-based interface that enables the use of state-of-the-art methods for search and retrieval. We are currently developing a pilot project covering government correspondence and documents produced in the six months leading up to and including December 7, 1941, the date of the Japanese attack on Pearl Harbor, a period of obvious historical, political, and general interest. The major activities in the project are: development of a model for historical documents and associated data, and its instantiation using W3C standards, including XML, the Resource Description Framework (RDF and RDF Schema), and the Web Ontology Language (OWL); development of automated means, or enhancement of existing software, to identify and mark relevant elements within these data; and exploration of the potential to automatically extract ontological information so as to enable sophisticated search and retrieval via inferencing.
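As a rough illustration of what "search and retrieval via inferencing" over an RDF-based document model can mean, the following is a minimal sketch, not the project's implementation. It assumes a hypothetical class hierarchy (`ex:Memorandum`, `ex:Telegram`, `ex:GovernmentCorrespondence`, `ex:HistoricalDocument`) and invented document identifiers and dates. It stores triples as plain Python tuples rather than using an RDF library, and it applies only two RDFS entailment rules (subclass transitivity and type propagation), not full OWL reasoning.

```python
# Hypothetical sketch: documents described as RDF-style (subject, predicate, object)
# triples, with RDFS subclass inference used to answer type queries.
# All class names, document IDs, and dates below are invented for illustration.

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    # Hypothetical schema fragment (class hierarchy)
    ("ex:Memorandum", SUBCLASS_OF, "ex:GovernmentCorrespondence"),
    ("ex:Telegram", SUBCLASS_OF, "ex:GovernmentCorrespondence"),
    ("ex:GovernmentCorrespondence", SUBCLASS_OF, "ex:HistoricalDocument"),
    # Hypothetical document instances with placeholder metadata
    ("ex:doc1", RDF_TYPE, "ex:Memorandum"),
    ("ex:doc1", "ex:date", "1941-11-26"),
    ("ex:doc2", RDF_TYPE, "ex:Telegram"),
    ("ex:doc2", "ex:date", "1941-12-07"),
}


def rdfs_closure(graph):
    """Apply two RDFS entailment rules until no new triples appear:
    rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
    rdfs9:  x type A, A subClassOf B        =>  x type B
    """
    inferred = set(graph)
    while True:
        new = set()
        sub = [(s, o) for (s, p, o) in inferred if p == SUBCLASS_OF]
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS_OF, d))
        for (x, p, a) in inferred:
            if p == RDF_TYPE:
                for c, d in sub:
                    if a == c:
                        new.add((x, RDF_TYPE, d))
        if new <= inferred:  # fixed point reached
            return inferred
        inferred |= new


def instances_of(graph, cls):
    """Return all subjects typed (directly or by inference) as cls."""
    return sorted(s for (s, p, o) in graph if p == RDF_TYPE and o == cls)


def dated_between(graph, docs, start, end):
    """Keep docs whose ex:date lies in [start, end]; ISO dates compare as strings."""
    dates = {s: o for (s, p, o) in graph if p == "ex:date"}
    return [d for d in docs if d in dates and start <= dates[d] <= end]


graph = rdfs_closure(triples)
print(instances_of(graph, "ex:HistoricalDocument"))
correspondence = instances_of(graph, "ex:GovernmentCorrespondence")
print(dated_between(graph, correspondence, "1941-12-01", "1941-12-07"))
```

Neither document is asserted to be an `ex:HistoricalDocument`, yet both are returned for that query because the type follows from the class hierarchy. A production system would more likely rely on an RDF store and an OWL reasoner than hand-written rules.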


corpus creation, corpus annotation, semantic web

Language(s) English
Full Paper