Rahul Sukthankar
Intel Research Pittsburgh & Robotics Institute, Carnegie Mellon
Abstract
Cross-validation is an established technique for estimating the
accuracy of a classifier and is normally performed either using
a number of random test/train partitions of the data, or using
k-fold cross-validation. We present a technique for calculating
the complete cross-validation for nearest-neighbor classifiers:
i.e., averaging over all possible test/train partitions of the data
of a given size.
The technique extends to several common variants, including
k-nearest-neighbor classification, stratified data partitioning,
and arbitrary loss functions. We demonstrate, with complexity
analysis and experimental timing results, that the technique
can be performed in time comparable to k-fold cross-validation,
though in effect it averages an exponential number of trials.
We show that complete cross-validation carries the same bias as
subsampling and k-fold cross-validation, while providing some
reduction in variance. The algorithm thus offers significant
benefits in both time and accuracy.
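To make the averaging concrete, below is a minimal sketch (not the authors' implementation) of exact complete cross-validation for a 1-nearest-neighbor classifier under 0/1 loss. The key observation is that, for each held-out point, the probability that its rank-r nearest candidate is the closest point in a uniformly random training set of size m is given by a simple combinatorial ratio, so the exponential average over subsets collapses to a polynomial-time sum. All names and the function interface are illustrative.

    import numpy as np
    from math import comb

    def complete_cv_error_1nn(X, y, m):
        """Exact expected 0/1 error of a 1-NN classifier, averaged over
        all C(n-1, m) training subsets drawn from the remaining points
        for each held-out point. Ties in distance are ignored for
        simplicity. Illustrative sketch, not the paper's code."""
        n = len(X)
        expected_errors = 0.0
        for i in range(n):
            # Candidates for the nearest training neighbor, ordered by
            # distance from the held-out point X[i].
            dists = np.linalg.norm(X - X[i], axis=1)
            order = [j for j in np.argsort(dists) if j != i]
            denom = comb(n - 1, m)
            for rank, j in enumerate(order):
                # Candidate at this rank is the nearest *training*
                # neighbor iff it is sampled and no closer candidate is:
                #   P = C(n - 2 - rank, m - 1) / C(n - 1, m)
                farther = n - 2 - rank  # candidates farther than rank
                if farther < m - 1:
                    break  # probability is zero from here on
                p = comb(farther, m - 1) / denom
                if y[j] != y[i]:
                    expected_errors += p  # 0/1 loss when labels differ
        return expected_errors / n

On small inputs this agrees with brute-force enumeration over all training subsets, and it runs in roughly O(n^2 log n) time per dataset, consistent with the claim above that complete cross-validation costs about as much as k-fold cross-validation while in effect averaging an exponential number of trials.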
Most of the talk is based on: M. Mullin and R. Sukthankar. Complete Cross-Validation for Nearest Neighbor Classifiers. In Proceedings of ICML, 2000.