Another interesting question is how effective the different
methods are for neural networks and decision trees.
Figures 7, 8, and 9
compare the error rates and reductions in error for
Ada-Boosting, Arcing, and Bagging, respectively.
Note that we graph error rate rather than percent reduction in error rate
because the two approaches start from different baselines (a single decision
tree for Ada-Boosting on decision trees versus a single neural network for
Ada-Boosting on neural networks), and those baselines may partially explain
differences in percent reduction.
For example, on the promoters-936 problem the much larger percent reduction
in error for Ada-Boosting on decision trees may simply reflect the fact that
a single decision tree is not as effective on this problem, leaving
Ada-Boosting more room for improvement.
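To make this concrete, the following sketch (using hypothetical error rates,
not values from our experiments) shows how a weaker baseline can produce a
much larger percent reduction even when the resulting ensemble error remains
higher in absolute terms:

  def percent_reduction(single_error, ensemble_error):
      # Percent reduction in error of an ensemble relative to a
      # single classifier: (single - ensemble) / single * 100.
      return (single_error - ensemble_error) / single_error * 100.0

  # Hypothetical error rates (illustrative only).
  tree_single, tree_ensemble = 20.0, 10.0  # weak baseline, large headroom
  net_single, net_ensemble = 6.0, 5.0      # strong baseline, little headroom

  print(percent_reduction(tree_single, tree_ensemble))  # 50.0
  print(percent_reduction(net_single, net_ensemble))    # 16.7 (approximately)

Here the decision-tree ensemble shows a 50% reduction versus roughly a 17%
reduction for the network ensemble, yet its absolute error (10.0%) remains
twice that of the network ensemble (5.0%); plotting error rates directly
avoids this distortion.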
Figure 7:
Error rates for Ada-Boosting ensembles.
The white portion shows the reduction in error of Ada-Boosting compared
to a single classifier while increases in error are shown in black.
The data sets are sorted
by the ratio of reduction in ensemble error to overall error
for neural networks.
Figure 8:
Error rates for Arcing ensembles.
The white portion shows the reduction in error of Arcing compared
to a single classifier while increases in error are shown in black.
The data sets are sorted
by the ratio of reduction in ensemble error to overall error
for neural networks.
Figure 9:
Error rates for Bagging ensembles.
The white portion shows the reduction in error of Bagging compared
to a single classifier while increases in error are shown in black.
The data sets are sorted
by the ratio of reduction in ensemble error to overall error
for neural networks.
The results show that, in many cases, if a single decision tree had lower (or higher)
error than a single neural network on a data set, then the decision-tree ensemble
methods also had lower (or higher) error than their neural-network counterparts.
The exceptions to this rule generally happened on the same
data set for all three ensemble methods (e.g., hepatitis, soybean,
satellite, credit-a, and heart-cleveland).
These results suggest that (a) the performance of the
ensemble methods depends on both the data set and the classifier method,
and (b) ensembles can, at least in some cases, overcome
the inductive bias of their component learning algorithm.