Pattern Discovery in Named Organization Corpus
Hsin-Hsi Chen, Yi-Lin Chu
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
This paper presents a method for mining formulation rules from a named organization corpus. The TEIRESIAS algorithm, which is widely used in the bioinformatics domain, is adopted. Experimental results on the MET2 test bed show that treating the morpheme of a keyword as a cluster performs best, treating all keywords as a single cluster comes next, and treating each keyword as its own cluster performs worst. The morpheme-based approach performs slightly better than the hand-crafted approach. The methodology can be easily extended to other types of named entities.
Named Entity Extraction, Named Organization Corpus, Pattern Discovery, TEIRESIAS algorithm