Abstract
This talk is about non-approximate acceleration of high-dimensional
k-nearest-neighbor classifiers. We attempt to exploit the fact that
even if we want exact answers to nonparametric queries, we usually do
not need to explicitly find the datapoints close to the query, but
merely need to ask questions about the properties of that set of
datapoints. For example, to classify a query with k = 9 we only need
to know which class holds a majority among the nine nearest neighbors,
not which points they are. This offers a small amount of computational
leeway, and we investigate how much that leeway can be exploited.
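To make the pruning concrete, below is a minimal Python sketch of a
conventional ball-tree k-NN search, the kind of baseline the talk
compares against; it is not the talk's new algorithms. The names
(BallNode, knn_class_counts, leaf_size) are invented for illustration.
Note that the caller only ever receives per-class counts among the k
nearest neighbors, never the neighbors themselves; that counts-only
interface is exactly the leeway described above.

import heapq
import numpy as np

class BallNode:
    """A ball-tree node: a center, a radius covering all of its points,
    and either two child balls or a leaf's points and labels."""
    def __init__(self, points, labels, leaf_size=16):
        self.center = points.mean(axis=0)
        self.radius = float(np.max(np.linalg.norm(points - self.center, axis=1)))
        if len(points) <= leaf_size:
            self.points, self.labels = points, labels
            self.left = self.right = None
        else:
            # Split along the coordinate with the greatest spread
            # (a common construction heuristic, not the only one).
            d = int(np.argmax(points.max(axis=0) - points.min(axis=0)))
            order = np.argsort(points[:, d])
            mid = len(points) // 2
            self.points = self.labels = None
            self.left = BallNode(points[order[:mid]], labels[order[:mid]], leaf_size)
            self.right = BallNode(points[order[mid:]], labels[order[mid:]], leaf_size)

def knn_class_counts(root, q, k):
    """Per-class counts among q's k nearest neighbors. A ball is pruned
    whenever its closest possible point is no closer than the current
    k-th-best candidate, so most of the tree is never touched."""
    heap = []  # max-heap via negated distances: (-dist, label)
    def visit(node):
        lower = max(0.0, float(np.linalg.norm(q - node.center)) - node.radius)
        if len(heap) == k and lower >= -heap[0][0]:
            return  # this ball cannot contain any of the k nearest
        if node.left is None:  # leaf: scan its few points directly
            for p, y in zip(node.points, node.labels):
                d = float(np.linalg.norm(q - p))
                if len(heap) < k:
                    heapq.heappush(heap, (-d, int(y)))
                elif d < -heap[0][0]:
                    heapq.heapreplace(heap, (-d, int(y)))
        else:  # visit the closer child first to tighten the bound sooner
            children = sorted((node.left, node.right),
                              key=lambda c: np.linalg.norm(q - c.center))
            for child in children:
                visit(child)
    visit(root)
    counts = {}
    for _, y in heap:
        counts[y] = counts.get(y, 0) + 1
    return counts

# Example on synthetic data: only the class counts come back, never
# the neighbors themselves.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
y = (X[:, 0] > 0).astype(int)
print(knn_class_counts(BallNode(X, y), rng.normal(size=50), k=9))

This conventional search must still locate the exact neighbor set
before it can vote; the speedups reported below come from exploiting
the counts-only interface more aggressively than that.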
This is applicable to many algorithms in nonparametric statistics, memory-based learning, and kernel-based learning, but for clarity this talk concentrates on pure k-NN classification. We introduce new ball-tree algorithms that, on real-world datasets, give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include datasets with up to 10^6 dimensions and 10^5 records, and show non-trivial speedups while giving exact answers.