... iteration.1
Quinlan does not explicitly specify how this case should be handled, but we consider this a sensible way of handling it.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... examples.2
Such linear growth was already observed by [50] in some early experiments with windowing.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... DOS3
Dull Old Separate-and-conquer.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... illegal.4
Note, however, that with the specified features not all distinct positions are mapped onto distinct feature vectors, as has been pointed out by [49].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... data.5
This is even worse in the usual representation of this problem [46], where the background knowledge provides a less_than predicate instead of the between relation. In this representation, which is used in all our experiments, rules 6 and 7 have to be reformulated using the less_than predicate, which is only possible if each of them is replaced by two new rules.
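To see why two rules are needed, read between(X,Y,Z) as Y lying strictly between X and Z in either direction (our reading; the formulation below is illustrative and not copied from [46]). The relation then unfolds into a disjunction of two conjunctions of less_than literals,
\[
  \mathit{between}(X,Y,Z) \;\equiv\;
  \bigl(\mathit{less\_than}(X,Y) \wedge \mathit{less\_than}(Y,Z)\bigr)
  \;\vee\;
  \bigl(\mathit{less\_than}(Z,Y) \wedge \mathit{less\_than}(Y,X)\bigr),
\]
so a single rule containing a between literal corresponds to two rules, one for each disjunct.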
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... outcomes,6
Actually, C4.5 maximizes information gain, which is computed by subtracting the average entropy from an information value that is constant for all considered tests.
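For reference, a sketch of this well-known computation (stated in its textbook form, not copied from this paper): if a test $T$ splits the example set $S$ into subsets $S_1, \ldots, S_k$, then
\[
  \mathit{gain}(S,T) \;=\; \mathit{info}(S) \;-\; \sum_{i=1}^{k} \frac{|S_i|}{|S|}\,\mathit{info}(S_i),
  \qquad
  \mathit{info}(S) \;=\; -\sum_{c} p_c \log_2 p_c,
\]
where $p_c$ is the relative frequency of class $c$ in $S$. The term $\mathit{info}(S)$ is the information value that is constant for all tests considered at a node, so maximizing the gain amounts to minimizing the weighted average entropy.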
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... results.7
This equal-frequency sub-sampling method was first described by [7] for dynamically drawing a subsample at each node of a decision tree, from which the best split for that node is determined. In C4.5 it is used to seed the initial window.
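For illustration, the following sketch shows one way such an equal-frequency subsample could be drawn (our own Python code; the function name, its arguments, and the handling of rare classes are assumptions, not Quinlan's implementation):
\begin{verbatim}
import random
from collections import defaultdict

def equal_frequency_sample(examples, size, label_of, rng=None):
    # Draw at most `size` examples such that all classes are
    # represented as evenly as possible.
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for ex in examples:
        by_class[label_of(ex)].append(ex)
    per_class = max(1, size // len(by_class))
    sample = []
    for exs in by_class.values():
        # a rare class simply contributes all of its examples
        sample.extend(rng.sample(exs, min(per_class, len(exs))))
    rng.shuffle(sample)
    return sample
\end{verbatim}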
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... algorithm.8
Measured in CPU seconds on a 110 MHz microSPARC running compiled Allegro Common Lisp code under SUN Unix 4.1.3.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...).9
We did not compute this measure for the experiments in noisy domains (section 5), because the way we perform the completeness check (section 5.2.2) invalidates it.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... measure.10
The measure could not be computed for the Shuttle domain, because this domain contains numerical attributes, which cannot be handled in a straightforward fashion.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... domain11
Note that this example set is only a subset of the Tic-Tac-Toe data set that has been studied by [59]. We did not have access to the full data set.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...[23].12
The sorting causes the slightly super-linear slope in the run-time curves of DOS.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... algorithms,13
Think, e.g., of Ken Thompson's 3 CD-ROMs of chess endgames, to which every competitive chess program has an interface. A compression of these endgames into a simple set of rules would certainly be appreciated by the computer chess industry [27].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... accuracy:14
Note that (4) differs from the version of [29] by using $SE(Acc_{Tot}(r))$ instead of $SE(Acc_{Win}(r))$ on the right-hand side. We have found this version to work somewhat more reliably.
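As a reminder (and an assumption on our part, since the precise definition belongs to the main text), the standard error of an estimated rule accuracy is commonly obtained from the binomial estimate
\[
  SE(Acc(r)) \;=\; \sqrt{\frac{Acc(r)\,\bigl(1 - Acc(r)\bigr)}{n(r)}},
\]
where $n(r)$ is the number of examples from which $Acc(r)$ was estimated: all examples for $Acc_{Tot}(r)$, but only the window for $Acc_{Win}(r)$.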
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... WIN-RIP-NT.15
NT, of course, stands for Noise-Tolerant. Special thanks to the reviewer who suggested this notation.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...$\alpha$.16
In a new domain, we would advise trying $\alpha = 1.0$ and $\alpha = 0.5$, in that order.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... space.17
In representation languages that extend flat feature vectors, such as first-order logic, the complexity of a learning problem also depends crucially on the average cost of matching an instance with a rule. [54] demonstrate a technique for reducing these potentially exponential costs via sub-sampling.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.