A Brief History of Windowing
Windowing dates back to early versions of the ID3 decision tree
learning algorithm, where it was devised as an automated teaching
procedure that allowed a preliminary version of ID3 to discover
complete and consistent descriptions of various problems in a KRKN
chess endgame [50]. Figure 1 shows
the basic windowing algorithm as described in Quinlan's subsequent
seminal paper on ID3 [51]. It starts by picking a random sample
of user-settable size InitSize from the total set of Examples. These examples are used to induce a theory with a given
learning algorithm. This theory is then tested on the remaining
examples and all misclassified examples are removed from the test set
and added to the window of the next iteration. In order to keep the
size of the training set small, another parameter, MaxIncSize,
controls the maximum number of examples that can be added to the
training set in one iteration. If this number is reached, no further
examples are tested and the next theory is learned from the new
training set. To make sure that all examples are tested in the first
few iterations, the examples that have already been tested (OldTest) are APPENDed to the examples that have not yet been
used, so that testing will start with new examples in the next
iteration.
Figure 1: The basic windowing algorithm
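Since the pseudocode of Figure 1 is not reproduced here, the following
Python sketch illustrates the loop just described. The parameters
init_size and max_inc_size correspond to InitSize and MaxIncSize; the
learner object with fit/predict methods and all other names are
illustrative assumptions, not the notation of the original algorithm.

    import random

    def windowing(examples, labels, learner, init_size, max_inc_size):
        """Learn on a small window, test the induced theory on the
        remaining examples, and add misclassified ones to the window
        until no errors remain."""
        indices = list(range(len(examples)))
        random.shuffle(indices)
        window, rest = indices[:init_size], indices[init_size:]

        while True:
            # induce a theory from the current window
            learner.fit([examples[i] for i in window],
                        [labels[i] for i in window])

            misclassified, old_test, untested = [], [], []
            for pos, i in enumerate(rest):
                if len(misclassified) >= max_inc_size:
                    untested = rest[pos:]   # MaxIncSize reached: stop testing
                    break
                if learner.predict([examples[i]])[0] != labels[i]:
                    misclassified.append(i)
                else:
                    old_test.append(i)

            if not misclassified:           # theory consistent with all examples
                return learner

            window += misclassified         # errors join the next window
            rest = untested + old_test      # tested examples (OldTest) go last

In a noise-free domain this loop stops as soon as the induced theory
classifies all remaining examples correctly; with noisy data the window
may keep growing, which is at the heart of the problems discussed next.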
[51] argued that windowing is necessary for ID3 to tackle very
large classification problems. Nevertheless, windowing has not played
a major role in machine learning research. One reason for this
certainly is the rapid development of computer hardware, which made
the motivation for windowing seem less compelling. However, a good
deal of this lack of interest can be attributed to an empirical study
[59], in which the authors studied windowing with ID3 in
various domains and concluded that it cannot be recommended as a
procedure for improving efficiency. The best results were achieved in
noise-free domains, such as the Mushroom domain, where
windowing was able to perform at the same level as simple ID3, while
its performance in noisy domains was considerably worse.
Despite the discouraging experimental evidence of [59],
[53] incorporated a new version of windowing into the C4.5
learning algorithm. It differs from the simple windowing version
originally proposed for ID3 [51] in several ways:
- While selecting examples, it takes care to make the class
distribution as uniform as possible. This can lead to accuracy gains
in domains with skewed distributions [10] (a sketch of such balanced
sampling is given below).
- It includes at least half of the misclassified examples in the
next window, which supposedly guarantees faster convergence (fewer
iterations) in noisy domains.
- It can stop before all examples are correctly classified, if it
appears that no further gains in accuracy are possible.
- C4.5's -t parameter, which invokes windowing, allows it to
perform multiple runs of windowing and to select the best tree.
Nevertheless, windowing is arguably one of C4.5's least frequently
used options.
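To make the first of these modifications concrete, the following
sketch shows one way to draw an initial window with an approximately
uniform class distribution; the function name and interface are
assumptions made for illustration and do not reflect C4.5's actual
implementation.

    import random
    from collections import defaultdict

    def balanced_initial_window(labels, init_size):
        """Pick indices for an initial window whose class distribution
        is as uniform as possible, taking examples from each class in
        turn until the window is full."""
        by_class = defaultdict(list)
        for i, y in enumerate(labels):
            by_class[y].append(i)
        for idxs in by_class.values():
            random.shuffle(idxs)            # random order within each class

        window = []
        while len(window) < init_size and any(by_class.values()):
            for y in list(by_class):
                if by_class[y] and len(window) < init_size:
                    window.append(by_class[y].pop())
        return window

Such a balanced window could simply replace the purely random initial
sample in the loop sketched above.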
Recent work in the areas of Knowledge Discovery in Databases
[36,56] and Intelligent Information
Retrieval [40,60] has
re-emphasized the importance of sub-sampling procedures for reducing
both learning time and memory requirements. Thus, interest in
windowing techniques has revived as well. We discuss some of the more
recent approaches in section 6.