Abstract
Detecting and counting overdensities in data is a common problem in
the physical and geographic sciences. One of the most successful of
recent algorithms for the counting version of the problem was
introduced by Cuevas, Febrero and Fraiman [Cuevas et. al. 2000], which will be
referred to as the CFF algorithm. This algorithm first determines the subset
of data points that are in high density regions using a non-parametric density
estimator. A clustering step follows where such high density points are
agglomerated. While this algorithm was originally intended to estimate the
number of clusters, it can also be used to perform non-parametric clustering
against a noisy background. However, the algorithm proposed by CFF is too
computationally expensive to work on large datasets with greater than two
dimensions. We propose an alternative implementation of the CFF algorithm
producing exactly the same results but addressing the computational problems
in both the density estimation and in the agglomeration step. We will then
illustrate the effectiveness of our approach on large multi-dimensional
astrophysics datasets.
This is joint work with Andrew Moore. Note that this talk is in partial fulfillment of the speaking requirement. |
Charles Rosenberg Last modified: Wed Apr 3 17:01:42 EST 2002