Clustering Concept Hierarchies from Text


Philipp Cimiano, Andreas Hotho, Steffen Staab

Institute AIFB, University of Karlsruhe




We present a novel approach to learning taxonomies or concept hierarchies from text. The approach is based on Formal Concept Analysis, a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. Our approach is based on the distributional hypothesis, i.e. that nouns or terms are similar to the extent to which they share contexts. Further, we assume that verbs pose more or less strong selectional restrictions on their arguments. The concept hierarchy is built via Formal Concept Analysis using syntactic dependencies as attributes. The approach is evaluated by comparing the produced concept hierarchies against two handcrafted taxonomies from two different domains: tourism and finance. We compare the results of our approach against a hierarchical bottom-up clustering algorithm as well as against Bi-Section-Kmeans as an instance of a top-down clustering algorithm.


term clustering, lexical acquisition, ontology learning

Language(s) English
Full Paper