next up previous
Next: Other Theoretical Objections to Up: Previous Theoretical and Experimental Previous: Previous Theoretical and Experimental

The Law of Conservation of Generalization Performance

The conservation law of generalization performance [Schaffer, 1994] proves that no learning bias can outperform any other bias over the space of all possible learning tasksgif. It follows that if Occam's razor is a valuable learning bias, it can only be so for some subset of all possible learning tasks. It might be argued that the set of `real-world' learning tasks is such a subset.

This paper is predicated on accepting the proposition that the set of `real-world' learning tasks is distinguished from the set of all possible learning tasks in respects that render the conservation law inapplicable. Rao, Gordon, and Spears [1995] argue that this is the case because learning tasks in our universe are not uniformly distributed across the space of all possible learning tasks.

But why should this be so? One argument in support of this proposition is as follows. `Real-world' learning tasks are defined by people for use with machine learning systems. To this end, the task constructors will have sought to ensure that the independent variables (class attributes) are related to the dependent variables (other attributes) in ways that can be captured within the space of classifiers that are made available for the learning system. Actual machine learning tasks are not drawn randomly from the space of all possible learning tasks. The human involvement in the formulation of the problems ensures this.

As a simple thought experiment in support of this proposition, consider a learning task for which the class attribute is generated by a random number generator and in no way relates to the other attributes. The majority of machine learning researchers would not be in the slightest disconcerted if their systems failed to perform well when trained on such data. As a further example, consider a learning task for which the class attribute is a simple count of the number of missing attribute values for an object. Assume that such a learning task was submitted to a system, such as C4.5 [Quinlan, 1993], that develops classifiers that have no mechanism for testing during classification whether an attribute value is missing. Again, the majority of machine learning researchers would be unconcerned that their systems failed to perform well in such circumstances. Machine learning is simply unsuited to such tasks. A knowledgeable user would not apply machine learning to such data, at least not in the expectation of obtaining a useful classifier therefrom.

This paper explores the applicability of the Occam thesis to `real-world' learning tasks.


next up previous
Next: Other Theoretical Objections to Up: Previous Theoretical and Experimental Previous: Previous Theoretical and Experimental

Geoff Webb
Mon Sep 9 12:13:30 EST 1996