A Brief History of Windowing

Windowing dates back to early versions of the ID3 decision tree learning algorithm, where it was devised as an automated teaching procedure that allowed a preliminary version of ID3 to discover complete and consistent descriptions of various problems in a KRKN chess endgame [50]. Figure 1 shows the basic windowing algorithm as described in Quinlan's subsequent seminal paper on ID3 [51]. It starts by picking a random sample of user-settable size InitSize from the total set of Examples. These examples are used for inducing a theory with a given learning algorithm. This theory is then tested on the remaining examples, and all misclassified examples are removed from the test set and added to the window for the next iteration. To keep the size of the training set small, another parameter, MaxIncSize, controls the maximum number of examples that can be added to the training set in one iteration. When this limit is reached, no further examples are tested and the next theory is learned from the enlarged training set. To make sure that all examples are tested within the first few iterations, the examples that have already been tested (OldTest) are appended to the examples that have not yet been used, so that testing starts with new examples in the next iteration.


Figure 1: The basic windowing algorithm
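
The procedure can be restated in Python as a minimal sketch; learn and classify stand in for the user-supplied learning algorithm and its classifier, and examples are assumed to be (features, label) pairs, so these names are illustrative rather than part of Quinlan's pseudocode:

    import random

    def windowing(examples, learn, classify, init_size, max_inc_size):
        # Pick a random initial window of init_size (InitSize) examples.
        examples = list(examples)
        random.shuffle(examples)
        window, test = examples[:init_size], examples[init_size:]

        while True:
            # Induce a theory from the current window.
            theory = learn(window)
            misclassified, old_test = [], []
            cutoff = len(test)
            for i, (features, label) in enumerate(test):
                if classify(theory, features) != label:
                    misclassified.append((features, label))
                    # max_inc_size (MaxIncSize) reached: stop testing.
                    if len(misclassified) == max_inc_size:
                        cutoff = i + 1
                        break
                else:
                    old_test.append((features, label))
            # A theory that misclassifies no test example is complete
            # and consistent with all examples outside the window too.
            if not misclassified:
                return theory
            window += misclassified
            # Untested examples come first, already-tested ones (OldTest)
            # are appended, so testing starts with new examples next time.
            test = test[cutoff:] + old_test

Note how MaxIncSize bounds the growth of the window per iteration, while the append order guarantees that every example is tested within the first few iterations.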

[51] argued that windowing is necessary for ID3 to tackle very large classification problems. Nevertheless, windowing has not played a major role in machine learning research. One reason for this is certainly the rapid development of computer hardware, which made the motivation for windowing seem less compelling. However, a good deal of the lack of interest can also be attributed to an empirical study [59], in which the authors studied windowing with ID3 in various domains and concluded that it cannot be recommended as a procedure for improving efficiency. The best results were achieved in noise-free domains, such as the Mushroom domain, where windowing was able to perform at the same level as simple ID3, while its performance in noisy domains was considerably worse.

Despite the discouraging experimental evidence of [59], [53] implemented a new version of windowing in the C4.5 learning algorithm. It differs from the simple windowing version originally proposed for ID3 [51] in several ways:

- The initial window is no longer drawn entirely at random; instead, C4.5 tries to make the class distribution in the initial window as uniform as possible (see the sketch after this list).
- In each iteration, at least half of the misclassified examples are included in the next window, so that the window grows faster and the procedure converges in fewer iterations.
- Windowing may stop before a theory that is consistent with all examples has been found, namely when the learned trees appear to gain no further accuracy.
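
To make the first of these differences concrete, here is a minimal sketch of a class-balanced initial window; the round-robin strategy and the function name balanced_initial_window are assumptions for illustration, not C4.5's actual implementation:

    import random
    from collections import defaultdict

    def balanced_initial_window(examples, init_size):
        # Group the (features, label) pairs by class and shuffle each group.
        by_class = defaultdict(list)
        for example in examples:
            by_class[example[1]].append(example)
        pools = list(by_class.values())
        for pool in pools:
            random.shuffle(pool)
        # Draw round-robin from the classes, so the class distribution in
        # the window is as uniform as the data permits.
        window, i = [], 0
        while len(window) < init_size and any(pools):
            pool = pools[i % len(pools)]
            if pool:
                window.append(pool.pop())
            i += 1
        return window
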
Nevertheless, windowing is arguably one of C4.5's least frequently used options.

Recent work in the areas of Knowledge Discovery in Databases [36,56] and Intelligent Information Retrieval [40,60] has re-emphasized the importance of sub-sampling procedures for reducing both learning time and memory requirements. Consequently, interest in windowing techniques has revived as well. We discuss some of the more recent approaches in Section 6.
