Progress on Multi-lingual Named Entity Annotation Guidelines using RDF(S)
Nigel Collier (National Institute of Informatics 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan)
Koichi Takeuchi (National Institute of Informatics 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan)
Chikashi Nobata (Communications Research Laboratory 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan)
Junichi Fukumoto (Ritsumeikan University Noji-higashi, Kusatsu-shi, Shiga 525-8577, Japan)
Norihiro Ogata (Osaka University 1-8 Machikaneyama, Toyonaka, Osaka, Japan)
WO23: Corpus Analysis, Annotation, Representation
This paper discusses and concisely summarizes the guidelines of the PIA (Portable Information Access) project for annotators and tool developers. The guidelines cover the annotation of what we call named entity ‘plus’ (NE+) expressions, such as individual names or technical terms, that we want to distinguish from the rest of a text. In particular we consider how to annotate locally ambiguous syntactic and semantic structures. We provide notation that conforms to RDF(S) so that the content of annotated documents can be accessed on the Semantic Web, i.e. the next generation of the World Wide Web. In this new framework named entities become instances of concepts in an explicit ontology, and the base text provides links to the annotation and ontology data files.
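As an informal illustration of the framework outlined above, the following minimal sketch shows one way in which a named entity might be recorded as an instance of an ontology concept, with a link back to its span in the base text. It is not the PIA notation itself: the namespaces under example.org, the class names PERSON and LOCATION, the property `source`, the character-offset fragment and the sample sentence are all assumptions made for the example. Only the RDF and RDFS namespace URIs are standard. The statements are written out as simple N-Triples lines.

```python
# Standard RDF and RDFS namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical namespaces and document URI, invented for this sketch.
ONT = "http://example.org/ontology#"
ANN = "http://example.org/annotation#"
DOC = "http://example.org/doc1.txt"


def uri(u):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{u}>"


def lit(s):
    """Render a plain string literal, escaping backslashes and quotes."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def annotate(text, start, end, concept, ident):
    """Return triples stating that text[start:end] is an instance of concept.

    The 'source' property and the '#char=start,end' fragment are assumed
    conventions for pointing from the annotation back to the base text.
    """
    inst = uri(ANN + ident)
    return [
        (inst, uri(RDF + "type"), uri(ONT + concept)),
        (inst, uri(RDFS + "label"), lit(text[start:end])),
        (inst, uri(ANN + "source"), uri(f"{DOC}#char={start},{end}")),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Ontology file: the concepts are declared as RDFS classes.
schema = [
    (uri(ONT + "PERSON"), uri(RDF + "type"), uri(RDFS + "Class")),
    (uri(ONT + "LOCATION"), uri(RDF + "type"), uri(RDFS + "Class")),
]

# Annotation file: each named entity is an instance of one concept.
text = "Marie Curie worked in Paris."
triples = (
    schema
    + annotate(text, 0, 11, "PERSON", "ne1")
    + annotate(text, 22, 27, "LOCATION", "ne2")
)

print(to_ntriples(triples))
```

Keeping the class declarations apart from the instance statements mirrors the separation between ontology and annotation data files described in the abstract, while the base text itself is left unmodified and is referred to only by offsets.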