A Note on Parameter Settings

All experiments reported above were performed with InitSize = 100 and MaxIncSize = 50. The effect of varying the InitSize parameter has been investigated by [59]. Their results indicate that the algorithm is quite insensitive to this parameter, which we have empirically confirmed. Nevertheless, it might be worthwhile to use a theoretical analysis of sub-sampling, similar to the one performed by [36], to determine promising values of this parameter for a particular domain.
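For concreteness, the following sketch shows where the two parameters enter a generic windowing loop. This is a minimal illustration in Python, not the implementation used in our experiments; the learner interface (learn, classify) and the example representation (an ex.label attribute) are hypothetical.

import random

def window_learning(examples, learn, classify,
                    init_size=100, max_inc_size=50):
    # init_size:    size of the initial random window
    # max_inc_size: maximum number of misclassified examples
    #               added to the window per iteration
    window = random.sample(examples, init_size)
    while True:
        theory = learn(window)
        # examples outside the window that the current theory misclassifies
        errors = [ex for ex in examples
                  if ex not in window and classify(theory, ex) != ex.label]
        if not errors:
            return theory   # theory is consistent with all examples
        window.extend(errors[:max_inc_size])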

  
Figure 8: Number of examples processed by the windowing algorithms for varying values of their MaxIncSize parameter (shown on a logarithmic scale).
[figure: krk-incvar.ps]

However, we attribute more significance to the choice of the MaxIncSize parameter, which specifies the maximum number of examples that can be added to the window in a single iteration. Figure 8 shows the results of experiments with the 10,000-example set of the KRK domain, in which we varied MaxIncSize from 10 to 5000 (plotted on a logarithmic scale on the x-axis). In terms of the number of processed examples, the algorithms perform best if this parameter is kept comparably low; in the range of 10 to 50 examples, performance is relatively insensitive to the exact setting. If more examples are added to the window, performance degrades. For example, at MaxIncSize = 50, WIN-DOS-3.1 performs about 4 iterations of the basic learning algorithm, processing a total of about 700 examples, with the final window containing about 250 examples. At MaxIncSize = 1000, on the other hand, the basic learning module not only has to process about twice as many examples, but windowing also takes more iterations to converge. Similar behavior can be observed for WIN-DOS-95.

Thus it seems to be important to evaluate the learned theories frequently in order to focus the learner on the parts of the search space that have not yet been correctly learned. This finding contradicts the heuristic currently employed in C4.5, namely to add at least half of the misclassified examples to the window. However, that heuristic was introduced to make windowing more effective in noisy domains [53], a goal that, in our opinion, cannot be achieved merely by using a noise-tolerant learner inside the windowing loop, for reasons discussed in the next section.
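To make the contrast between the two window-growth policies concrete, they can be written as follows. This is a hypothetical sketch, not code from either system; errors denotes the list of examples misclassified by the current theory.

def increment_fixed_cap(errors, max_inc_size=50):
    # policy supported by our results: add at most
    # MaxIncSize misclassified examples per iteration
    return errors[:max_inc_size]

def increment_c45_style(errors):
    # C4.5's heuristic: add at least half of the
    # misclassified examples (rounded up)
    return errors[:(len(errors) + 1) // 2]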

