Abstract
We discuss the problem of generating dependency-trees to model a given
set of data. The traditional solution, proposed by Chow and Liu, solves
the problem by finding a minimum-spanning tree in the attribute space,
and runs in time quadratic in the number of attributes and linear in the
number of data-points. We present an improved algorithm that scales
better with the size of the data. We begin by proposing a new
minimum-spanning tree algorithm. Surprisingly, this algorithm is in fact
slower than currently-known algorithms. We then show how to embed it in a
framework that considers probabilistic confidence intervals for the edge
weights. This leads to a fast dependency-tree implementation. We present
experimental results for datasets with hundreds of attributes and millions
of points.
Joint work with Andrew Moore. |
Charles Rosenberg Last modified: Thu Mar 14 10:34:21 EST 2002