- ...
iteration.1
- Quinlan does not explicitly specify how this case
should be handled, but we think it makes sense that way.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... examples.2
- Such a
linear growth was already observed by
[50] in some early experiments with windowing.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... DOS3
- Dull Old
Separate-and-conquer.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... illegal.4
- Note, however, that not
all different positions form a different feature vector with the
specified features as has been pointed out by [49].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... data.5
- This is even worse in the usual
representation of this problem [46], where instead of the
between relation, a less_than predicate is given in the
background knowledge. In this representation, which will be used in
all our experiments, rules 6 and 7 have to be reformulated using the
less_than predicate, which is only possible if each of them is
replaced with two new rules.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... outcomes,6
- Actually, C4.5 maximizes
information gain, which is computed by subtracting the average entropy
from an information value that is constant for all considered tests.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... results.7
- This equal-frequency sub-sampling
method has first been described by [7] for dynamically drawing
a subsample at each node in a decision tree from which the best split
for this node is determined. In C4.5 it is used to seed the initial
window.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... algorithm.8
- Measured in CPU seconds of a
microSPARC 110MHz running compiled Allegro Common Lisp code under SUN
Unix 4.1.3.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...).9
- We did not compute this measurement
for the experiments in noisy domains
(section 5) because the way we compute
the completeness check (section 5.2.2) invalidates
these measurements.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... measure.10
- The measure could not
be computed for the Shuttle domain, because this domain
contains numerical attributes, which cannot be handled in a
straight-forward fashion.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... domain11
- Note that this example set is
only a subset of the Tic-Tac-Toe data set that has been studied by
[59]. We did not have access to the full data set.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...[23].12
- The sorting causes the
slightly super-linear slope in the run-time curves of DOS.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... algorithms,13
- Think, e.g., of
Ken Thompson's 3 CD-Roms of chess endgames, to which every competitive
chess program has an interface. A compression of these endgames into a
simple set of rules would certainly be appreciated by the computer
chess industry [27].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... accuracy:14
- Note that (4) differs from the
version of [29] by using
SE(AccTot(r)) instead of
SE(AccWin(r)) on the right hand side. We have found
this version to work somewhat more reliably.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... WIN-RIP-NT.15
- NT, of course, stands for
Noise-Tolerant. Special thanks to the reviewer who suggested this
notation.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ....16
- In a new domain,
we would advice to try
and
in that
order.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ... space.17
- In representation languages that extend
flat feature vectors, such as first-order logic, the complexity of a
learning problem also depends crucially on the average cost of
matching an instance with a rule. [54]
demonstrate a technique for reducing these potentially exponential
costs via sub-sampling.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.