Minimum encoding length approaches perform induction by seeking a theory that enables the most compact encoding of both the theory and available data. Two key approaches have been developed, Minimum Message Length (MML) [Wallace and Boulton, 1968] and Minimum Description Length (MDL) [Rissanen, 1983]]. Both approaches admit to probabilistic interpretations. Given prior probabilities for both theories and data, minimization of the MML encoding closely approximates maximization of posterior probability [Wallace and Freeman, 1987]. An MDL code length defines an upper bound on ``unconditional likelihood'' [Rissanen, 1987].
The two approaches differ in that MDL employs a universal prior, which Rissanen [1983] explicitly justifies in terms of Occam's razor, while MML allows the specification of distinct appropriate priors for each induction task. However, in practice, a default prior is usually employed for MML, one that appears to also derive its justification from Occam's razor.
Neither MDL nor MML with its default prior would add complexity to a decision tree if doing so were justified solely on the basis of evidence from neighboring regions of the instance space. The evidence from the study presented herein appears to support the potential desirability of doing so. This casts some doubt upon the utility of the universal prior employed by MDL and the default prior usually employed with MML, at least with respect to their use for maximizing predictive accuracy.
It should be noted, however, that the probabilistic interpretation of these minimum encoding length techniques indicates that encoding length minimization represents maximization of posterior probability or of unconditional likelihood. Maximization of these factors is not necessarily directly linked with maximizing predictive accuracy.