Earlier we divided the statistical modeling problem into two steps: first, finding appropriate facts about the data; second, incorporating these facts into the model. Up to this point we have proceeded by assuming that the first task was somehow performed for us. Even in the simple example provided above, we did not explicitly state how we selected those particular constraints. That is, why is the fact that dans or à was chosen by the expert translator 50% of the time any more important than countless other facts contained in the data? In fact, the principle of maximum entropy does not directly concern itself with the issue of feature selection: it merely provides a recipe for combining constraints into a model. But the feature selection problem is critical, since the universe of possible constraints typically numbers in the thousands or even millions. In this section we introduce a method for automatically selecting the features to be included in a maximum entropy model, and then offer a series of refinements to ease the computational burden. What we will describe is a form of inductive learning: from a distribution $\tilde{p}$, derive a set of rules (features) which characterize $\tilde{p}$.
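To make this inductive-learning view concrete before the formal development, the sketch below implements a toy version of greedy feature selection on a corpus modeled after the translation example above. At each step, every candidate feature's gain in log-likelihood is estimated by tuning only that feature's weight while holding the others fixed (in the spirit of the approximate-gain refinement developed later), and the best candidate joins the model. This is a minimal sketch, not the paper's exact algorithm: the corpus counts, candidate pool, learning rate, and iteration counts are all illustrative assumptions.

```python
import math

# Invented toy corpus echoing the running example: the English word "in"
# paired with an expert translator's French rendering.
DATA = ([("in", "dans")] * 30 + [("in", "en")] * 30 + [("in", "à")] * 20
        + [("in", "au cours de")] * 10 + [("in", "pendant")] * 10)
LABELS = sorted({y for _, y in DATA})

def make_feature(label):
    """Candidate indicator feature: f(x, y) = 1 iff y == label."""
    return lambda x, y: 1.0 if y == label else 0.0

# In realistic settings this pool would contain thousands or millions of
# candidates; here it is just one indicator per French translation.
CANDIDATES = {"y=" + lab: make_feature(lab) for lab in LABELS}

def cond_prob(x, features, weights):
    """Maximum entropy model: p(y|x) proportional to exp(sum_i w_i f_i(x, y))."""
    scores = {y: math.exp(sum(w * f(x, y) for f, w in zip(features, weights)))
              for y in LABELS}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

def log_likelihood(features, weights):
    return sum(math.log(cond_prob(x, features, weights)[y]) for x, y in DATA)

def train(features, weights, steps=300, lr=0.5, last_only=False):
    """Gradient ascent on the conditional log-likelihood. With last_only=True,
    only the newest feature's weight moves (a one-parameter gain estimate)."""
    w = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in DATA:
            p = cond_prob(x, features, w)
            for i, f in enumerate(features):
                # empirical count of f minus its expectation under the model
                grad[i] += f(x, y) - sum(p[yy] * f(x, yy) for yy in LABELS)
        start = len(w) - 1 if last_only else 0
        for i in range(start, len(w)):
            w[i] += lr * grad[i] / len(DATA)
    return w

# Greedy selection: estimate each remaining candidate's gain, add the best
# one, then retrain the enlarged model before the next round.
active_names, active_fs, weights = [], [], []
ll = log_likelihood(active_fs, weights)
for _ in range(3):  # select three features, purely for illustration
    best_name, best_gain, best_w = None, 0.0, None
    for name, f in CANDIDATES.items():
        if name in active_names:
            continue
        trial_w = train(active_fs + [f], weights + [0.0], steps=100, last_only=True)
        gain = log_likelihood(active_fs + [f], trial_w) - ll
        if gain > best_gain:
            best_name, best_gain, best_w = name, gain, trial_w
    if best_name is None:
        break
    active_names.append(best_name)
    active_fs.append(CANDIDATES[best_name])
    weights = train(active_fs, best_w)  # full retrain with the new feature
    ll = log_likelihood(active_fs, weights)
    print(f"added {best_name}: approx gain {best_gain:.3f}, log-likelihood {ll:.3f}")
```

Even in this toy setting the computational burden is visible: each round re-estimates a weight for every remaining candidate, which motivates the approximations to the gain computation introduced later in the section.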