Another interesting question is how effective the different
methods are for neural networks and decision trees.
Figures 7, 8, and 9
compare the error rates and reductions in error for
Ada-Boosting, Arcing, and Bagging, respectively.
Note that we graph error rate rather than percent reduction in error rate
because the two approaches start from different baselines (a single decision
tree for Ada-Boosting on decision trees versus a single neural network for
Ada-Boosting on neural networks), and those baselines may partially explain
differences in percent reduction.
For example, on the promoters-936 problem the much larger percent reduction
in error for Ada-Boosting on decision trees may simply reflect the fact that
a single decision tree is not as effective on this problem, leaving
Ada-Boosting more room for improvement.
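To make this concrete, the following sketch (using hypothetical error rates,
not values from our experiments) shows how a weaker baseline can produce a
much larger percent reduction even when the resulting ensemble error remains
higher in absolute terms:

  def percent_reduction(single_error, ensemble_error):
      # Percent reduction in error of an ensemble relative to a
      # single classifier: (single - ensemble) / single * 100.
      return (single_error - ensemble_error) / single_error * 100.0

  # Hypothetical error rates (illustrative only).
  tree_single, tree_ensemble = 20.0, 10.0  # weak baseline, large headroom
  net_single, net_ensemble = 6.0, 5.0      # strong baseline, little headroom

  print(percent_reduction(tree_single, tree_ensemble))  # 50.0
  print(percent_reduction(net_single, net_ensemble))    # 16.7 (approximately)

Here the decision-tree ensemble shows a 50% reduction versus roughly a 17%
reduction for the network ensemble, yet its absolute error (10.0%) remains
twice that of the network ensemble (5.0%); plotting error rates directly
avoids this distortion.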
Figure 7:
Error rates for Ada-Boosting ensembles.
The white portion shows the reduction in error of Ada-Boosting compared
to a single classifier while increases in error are shown in black.
The data sets are sorted
by the ratio of reduction in ensemble error to overall error
for neural networks.
Figure 8:
Error rates for Arcing ensembles.
The white portion shows the reduction in error of Arcing compared
to a single classifier while increases in error are shown in black.
The data sets are sorted
by the ratio of reduction in ensemble error to overall error
for neural networks.
Figure 9:
Error rates for Bagging ensembles.
The white portion shows the reduction in error of Bagging compared
to a single classifier while increases in error are shown in black.
The data sets are sorted
by the ratio of reduction in ensemble error to overall error
for neural networks.
The results show that, in many cases, if a single decision tree had lower (or higher)
error than a single neural network on a data set, then the decision-tree ensemble
methods also had lower (or higher) error than their neural-network counterparts.
The exceptions to this rule generally happened on the same
data set for all three ensemble methods (e.g., hepatitis, soybean,
satellite, credit-a, and heart-cleveland).
These results suggest that (a) the performance of the
ensemble methods depends on both the data set and the classifier method,
and (b) ensembles can, at least in some cases, overcome
the inductive bias of their component learning algorithm.