Automatic Extraction of Hyponyms from Japanese Newspapers Using Lexico-syntactic Patterns


Maya ANDO (1), Satoshi SEKINE (2), Shun ISHIZAKI (1)

(1) Keio University, (2) New York University




We describe a method for automatically extracting hyponyms from Japanese newspapers. First, we discover patterns that can extract hyponyms of a noun, such as "A nado-no B" (B such as A). We then apply these patterns to the newspaper corpus to extract instances. The procedure works best for extracting hyponyms of concrete things in the middle of word hierarchies. Precision ranges from 49 to 87 percent, depending on the pattern. We compare the extracted hyponyms with those associated by humans. We find that popular words in the associative concept dictionary are likely to be found in the corpus, and that many additional hyponyms can also be extracted from 32 years of newspaper articles.
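To make the pattern-application step concrete, here is a minimal sketch of how a pattern like "A nado-no B" could be matched and scored. It is not the authors' implementation. It assumes each sentence has already been segmented into word tokens (segmentation is not shown), that "nado-no" appears as the two tokens など and の, and that precision is the fraction of extracted pairs a human judges correct. The function names `extract_nado_no` and `precision` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed token forms of the "A nado-no B" (B such as A) pattern after
# word segmentation: [A, "など", "の", B].
NADO, NO = "など", "の"


def extract_nado_no(sentences: Iterable[List[str]]) -> Counter:
    """Count (hypernym B, hyponym A) pairs matched by "A nado-no B".

    Hypothetical helper: assumes pre-segmented sentences and takes only the
    single token right before など as A. A real system would also check
    parts of speech and handle compound nouns.
    """
    pairs: Counter = Counter()
    for tokens in sentences:
        # Slide a 4-token window looking for A, など, の, B.
        for i in range(len(tokens) - 3):
            a, t1, t2, b = tokens[i:i + 4]
            if t1 == NADO and t2 == NO:
                pairs[(b, a)] += 1
    return pairs


def precision(judged: List[Tuple[str, str, bool]]) -> float:
    """Fraction of (hypernym, hyponym, is_correct) judgments marked correct."""
    if not judged:
        return 0.0
    return sum(ok for _, _, ok in judged) / len(judged)


if __name__ == "__main__":
    corpus = [
        ["リンゴ", "など", "の", "果物", "を", "買う"],
        ["ミカン", "など", "の", "果物", "が", "安い"],
    ]
    print(extract_nado_no(corpus))
```

Because only the word immediately before など is taken as A, a coordinated phrase such as "A や C などの B" would yield only C in this sketch. Pooling counts over the whole corpus with a `Counter` also makes it easy to rank candidate hyponyms of a given hypernym by frequency.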


Corpus-based method, Associative concept dictionary, Hyponym, Automatic discovery, Lexico-syntactic pattern

Language(s): Japanese
Full Paper