Figure 1 illustrates the basic framework for a classifier
ensemble. In this example, neural networks are the basic classification method, though
conceptually any classification method (e.g., decision trees) can be substituted in place of
the networks.
Each network in Figure 1's
ensemble (network 1 through network N in this case) is
trained using the training instances for that network.
Then, for each example, the predicted output of each of these networks (o_i in Figure 1) is combined to produce the output of the ensemble (ô in Figure 1).
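As a rough illustration of the combination step (a sketch only, not code from the paper; the function and variable names are hypothetical), the fragment below averages the component outputs o_i into the ensemble output ô, which for class-probability outputs amounts to simple voting once the averaged scores are thresholded:

import numpy as np

def combine_outputs(outputs, weights=None):
    # Average the component predictions o_1 ... o_N into the ensemble output o_hat.
    # `outputs` has shape (N, n_classes): each network's class scores for one example.
    outputs = np.asarray(outputs, dtype=float)
    if weights is None:
        weights = np.full(len(outputs), 1.0 / len(outputs))
    o_hat = np.average(outputs, axis=0, weights=weights)
    return o_hat, int(np.argmax(o_hat))  # combined scores and predicted class

# Three networks' class-probability outputs for one example:
o = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
scores, label = combine_outputs(o)  # the ensemble predicts class 1 even though network 1 disagrees

Weighted averaging, majority voting over discrete class labels, and other combination rules fit the same framework; only the combining step changes.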
Combining the output of several classifiers is useful only if there is disagreement among them. Obviously, combining several identical classifiers produces no gain. Hansen and Salamon [1990] proved that if the average error rate for an example is less than 50% and the component classifiers in the ensemble are independent in the production of their errors, the expected error for that example can be reduced to zero as the number of classifiers combined goes to infinity; however, such assumptions rarely hold in practice. Krogh and Vedelsby [1995] later proved that the ensemble error can be divided into a term measuring the average generalization error of each individual classifier and a term measuring the disagreement among the classifiers. What they formally showed was that an ideal ensemble consists of highly correct classifiers that disagree as much as possible. Opitz and Shavlik [1996a,1996b] empirically verified that such ensembles generalize well.
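Stated concretely (the notation below is ours; Krogh and Vedelsby give the result for weighted ensembles under squared error), for an ensemble output $\hat{o} = \sum_i w_i o_i$ with nonnegative weights summing to one, the decomposition is

E = \bar{E} - \bar{A}, \qquad \bar{E} = \sum_i w_i (f - o_i)^2, \qquad \bar{A} = \sum_i w_i (o_i - \hat{o})^2,

where f is the target value, E = (f - ô)^2 is the ensemble error, Ē is the weighted average error of the individual classifiers, and Ā is the weighted average disagreement (ambiguity) between each classifier and the ensemble. Because Ā is never negative, the ensemble error never exceeds the average individual error, and it shrinks as the classifiers become both more accurate (smaller Ē) and more diverse (larger Ā).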
As a result, methods for creating ensembles center around producing classifiers that disagree on their predictions. Generally, these methods focus on altering the training process in the hope that the resulting classifiers will produce different predictions. For example, neural network techniques that have been employed include methods for training with different topologies, different initial weights, different parameters, and training only on a portion of the training set [Alpaydin 1993, Drucker et al. 1994, Hansen & Salamon 1990, Maclin & Shavlik 1995]. In this paper we concentrate on two popular methods (Bagging and Boosting) that try to generate disagreement among the classifiers by altering the training set each classifier sees.
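For concreteness, the sketch below shows the resampling idea behind Bagging (the train_classifier interface and names are hypothetical, not code from this paper): each component classifier is trained on a bootstrap replicate of the training set, a sample of the same size drawn with replacement, so the classifiers see different data and are more likely to disagree. Boosting instead alters the training set adaptively, concentrating later classifiers on the examples that earlier ones misclassified.

import random

def bootstrap_sample(training_set):
    # Draw a replicate of the same size, sampling with replacement, so each
    # classifier sees a different (overlapping) subset of the training data.
    return [random.choice(training_set) for _ in range(len(training_set))]

def bagged_ensemble(train_classifier, training_set, n_classifiers):
    # Train each component classifier on its own bootstrap replicate;
    # train_classifier stands for any learning procedure (e.g., one that
    # trains a neural network) that returns a trained classifier.
    return [train_classifier(bootstrap_sample(training_set))
            for _ in range(n_classifiers)]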