Learning Concept Hierarchies from Text Corpora
using Formal Concept Analysis
Philipp Cimiano, Andreas Hotho and Steffen Staab
Institute AIFB, University of Karlsruhe
Knowledge and Data Engineering Group, University of Kassel
Institute for Computer Science, University of Koblenz-Landau
Abstract:
We present a novel approach to the automatic acquisition of taxonomies
or concept hierarchies from a text corpus. The approach is based on
Formal Concept Analysis (FCA), a method mainly used for the analysis of data,
i.e. for investigating and processing explicitly given information.
We follow Harris' distributional hypothesis and model the context
of a certain term as a vector representing syntactic dependencies
which are automatically acquired from the text corpus with a linguistic parser.
On the basis of this context information, FCA produces a lattice
that we convert into a special kind of partial order constituting
a concept hierarchy.
The approach is evaluated by comparing the resulting concept hierarchies
with hand-crafted taxonomies for two domains: tourism and finance.
We also directly compare our approach with hierarchical agglomerative
clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering
algorithm. Furthermore, we investigate the impact of using different
measures weighting the contribution of each attribute as well as of applying
a particular smoothing technique to cope with data sparseness.
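
As a rough illustration of the step from context information to a lattice, the following minimal sketch computes the formal concepts of a small term/attribute context. The toy context, the attribute names and the helper `formal_concepts` are assumptions made for illustration only; the sketch does not reproduce the parsing, weighting, smoothing or lattice-to-hierarchy conversion described above.

```python
# Hypothetical toy context: each term (object) is mapped to the set of
# attributes it occurs with. In the approach above these would come from
# parsed syntactic dependencies; here they are invented for illustration.
context = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
}


def formal_concepts(context):
    """Return all formal concepts (extent, intent) of a formal context.

    Every intent is either the full attribute set or an intersection of
    object attribute sets, so we close the set of intents under
    intersection and then look up the matching extent for each intent.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        # Intersect the new object's attributes with every known intent.
        intents |= {frozenset(attrs) & intent for intent in intents}

    concepts = []
    for intent in intents:
        # The extent is the set of all objects that carry every attribute
        # of the intent.
        extent = frozenset(g for g, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Most general concepts (largest extents) first.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


if __name__ == "__main__":
    for extent, intent in formal_concepts(context):
        print(sorted(extent), sorted(intent))
```

Ordering the concepts by inclusion of their extents gives the lattice; a concept whose extent is a subset of another concept's extent is a subconcept of it.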
Philipp Cimiano
2005-08-04