
Sub-Sampling Techniques

Several authors have discussed sub-sampling algorithms that differ from windowing. For decision tree algorithms, it has been proposed to use dynamic sub-sampling at each node in order to determine the optimal test. This idea was originally proposed, but not evaluated, by [7]. It was further explored in Catlett's work on peepholing [11], a sophisticated procedure that uses sub-sampling to eliminate unpromising attributes and thresholds from consideration.
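
As a rough illustration of this idea, the following Python sketch (hypothetical names and interface; it reproduces neither [7]'s proposal nor Catlett's peepholing) selects the test at a node from a small random subsample, doubling the sample only while the choice between the two best candidates remains ambiguous. Examples are assumed to be (feature-dictionary, label) pairs, and at least two candidate attributes are assumed:

    import random
    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def info_gain(sample, attr):
        """Information gain of splitting (features, label) pairs on attr."""
        gain = entropy([y for _, y in sample])
        for value in {x[attr] for x, _ in sample}:
            subset = [y for x, y in sample if x[attr] == value]
            gain -= len(subset) / len(sample) * entropy(subset)
        return gain

    def choose_test(examples, attrs, start=100, margin=0.01):
        """Select the test attribute for a node from a growing random
        subsample; the sample is doubled until the two best candidates
        are clearly separated or all examples have been used."""
        size = start
        while True:
            sample = random.sample(examples, min(size, len(examples)))
            gains = sorted(((info_gain(sample, a), a) for a in attrs),
                           reverse=True)
            if size >= len(examples) or gains[0][0] - gains[1][0] > margin:
                return gains[0][1]
            size *= 2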

[35] discuss a different approach that successively expands the learning window. Their algorithm, ELC, adds randomly selected examples to the window until an extrapolation of the learning curve (accuracy over window size) no longer promises significant gains from further enlarging the window. With respect to run-time, the authors note that this technique can in general only gain efficiency for incremental learning algorithms.
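
A rough sketch of the underlying loop (hypothetical interface; the actual ELC system presumably fits a proper learning curve rather than the crude two-point extrapolation used here):

    import random

    def expand_until_flat(examples, learner, evaluate, step=500, min_gain=0.001):
        """Enlarge a randomly drawn window until the extrapolated accuracy
        gain from a further enlargement falls below min_gain."""
        pool = list(examples)
        random.shuffle(pool)
        window, curve = [], []          # curve: (window size, accuracy) points
        model = None
        while pool:
            window += [pool.pop() for _ in range(min(step, len(pool)))]
            model = learner(window)
            curve.append((len(window), evaluate(model)))
            if len(curve) >= 2:
                (n0, a0), (n1, a1) = curve[-2], curve[-1]
                # Crude linear extrapolation from the last two curve points.
                if (a1 - a0) / (n1 - n0) * step < min_gain:
                    break
        return model

Note that this sketch retrains from scratch on the whole window in every iteration, which illustrates why the authors' efficiency argument only holds for learners that can absorb the newly added examples incrementally.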

Partitioning, proposed by [21,20], splits the example space into segments of equal size and combines the rules learned on each partition. This technique produced promising results in noisy domains, but substantially decreased learning accuracy in non-noisy domains. Moreover, the technique appears to be tailored to a specific learning algorithm and is not applicable to a wider class of rule learning algorithms [20].
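
The splitting and merging step might be sketched as follows (hypothetical interface; the actual rule combination procedure of [21,20] is more involved than the simple de-duplication shown here):

    def partitioned_rules(examples, learn_rules, k=10):
        """Learn a rule set on each of k equal-sized segments of the data
        and merge the results by removing duplicate rules. For brevity,
        the remainder examples beyond k equal segments are ignored."""
        size = len(examples) // k
        parts = [examples[i * size:(i + 1) * size] for i in range(k)]
        combined = []
        for part in parts:
            for rule in learn_rules(part):
                if rule not in combined:
                    combined.append(rule)
        return combined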

Sub-sampling techniques have recently been widely used for increasing the accuracy of a classifier. The basic idea is to train classifiers on multiple subsamples of the data and to combine their predictions, usually by voting. Such approaches include bagging [5], where all examples are sub-sampled with equal probability, and boosting [22,24], also known as arcing [4], where examples that have been misclassified in previous iterations are more likely to be selected in subsequent iterations. Interestingly, [5] has noted that these techniques rely on unstable base learners, whereas we conjecture that the better performance of windowing with rule learning algorithms is due to their greater stability (Section 3.3). While the main focus of these works is to improve the accuracy of a given learning algorithm, it would be interesting to evaluate bagging and boosting techniques with respect to their run-time and memory requirements as well. For example, [6] discusses arcing algorithms for datasets where limited memory requires the use of sub-sampling.
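
For illustration, a minimal bagging sketch with a hypothetical learner interface: a fixed number of classifiers is trained on bootstrap samples drawn uniformly with replacement, and their predictions are combined by majority vote:

    import random
    from collections import Counter

    def bagging(examples, learner, rounds=10):
        """Train one classifier per uniform bootstrap sample and
        predict by majority vote over all classifiers."""
        models = [learner([random.choice(examples) for _ in examples])
                  for _ in range(rounds)]
        def predict(x):
            return Counter(m(x) for m in models).most_common(1)[0][0]
        return predict

Boosting differs mainly in the sampling step: instead of the uniform random.choice, examples that were misclassified in earlier rounds would be drawn with increased probability.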

In Knowledge Discovery in Databases, sub-sampling techniques have been investigated for the discovery of association rules. [56] describes a straightforward approach in which association rules are discovered from a subsample of the data and their validity is then checked on the complete dataset. Kivinen and Mannila give theoretical bounds on the sample size required to establish (with a given maximum error probability) the truth of association rules [36] or functional dependencies [37] that have been discovered from the sample.
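
A sketch of this sample-and-verify idea (the rule representation is an assumption, not taken from [56]): candidate rules are mined on a subsample, and only those whose support and confidence also hold on the complete dataset are retained:

    import random

    def sample_then_verify(transactions, mine_rules, sample_frac=0.1,
                           min_support=0.05, min_conf=0.8):
        """Mine candidate association rules on a random subsample, then keep
        only those whose support and confidence hold on the full dataset.
        Rules are assumed to be (antecedent, consequent) pairs of item sets;
        each transaction is a set of items."""
        n = max(1, int(len(transactions) * sample_frac))
        candidates = mine_rules(random.sample(transactions, n))
        verified = []
        for antecedent, consequent in candidates:
            body = sum(antecedent <= t for t in transactions)
            both = sum(antecedent | consequent <= t for t in transactions)
            if (both / len(transactions) >= min_support
                    and body and both / body >= min_conf):
                verified.append((antecedent, consequent))
        return verified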

