Pattern Discovery in Named Organization Corpus


Hsin-Hsi Chen, Yi-Lin Chu

Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan




This paper presents a method for mining formulation rules from a named organization corpus. The TEIRESIAS algorithm, which is widely used in the bioinformatics domain, is adopted for pattern discovery. Experimental results on the MET2 test bed show that treating the morpheme of a keyword as a cluster performs best, treating all keywords as a single cluster is second, and treating each keyword as its own cluster performs worst. The morpheme-based approach performs slightly better than the hand-crafted approach. The methodology can be easily extended to other types of named entities.


Named Entity Extraction, Named Organization Corpus, Pattern Discovery, TEIRESIAS algorithm

Language(s): Chinese
Full Paper