Statistical modeling addresses the problem of modeling the behavior of a random process. In constructing this model, we typically have at our disposal a sample of output from the process. From the sample, which constitutes an incomplete state of knowledge about the process, the modeling problem is to parlay this knowledge into a succinct, accurate representation of the process. We can then use this representation to make predictions of the future behavior of the process.
Exponential models have proven themselves handy in this arena, and for that reason have earned a place in the toolbox of every statistical physicist since the turn of the century. Within the broad class of exponential models exists a family of distributions, maximum entropy models, with some interesting mathematical and philosophical properties. Though the concept of maximum entropy can be traced back along multiple threads to Biblical times, only recently have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition.
The following pages discuss a method for statistical modeling based on maximum entropy, with a particular focus on questions of interest in natural language processing (NLP). Extensive results and benchmarks are provided, as well as a number of practical algorithms for modeling conditional data using maxent. A connection between conditional maxent models and Markov random fields, a popular modeling technique in computer vision, is drawn in the final section.
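To make the object of study concrete before diving in: a conditional maxent model takes the standard exponential (log-linear) form. The notation below is one common convention and is offered as a preview, not necessarily the exact notation adopted later:

```latex
p_\lambda(y \mid x) \;=\; \frac{1}{Z_\lambda(x)} \exp\!\Big(\sum_i \lambda_i f_i(x, y)\Big),
\qquad
Z_\lambda(x) \;=\; \sum_{y'} \exp\!\Big(\sum_i \lambda_i f_i(x, y')\Big)
```

Here the f_i(x, y) are (typically binary) feature functions relating a context x to an output y, the lambda_i are real-valued weights estimated from the training sample, and Z_lambda(x) is the normalizing constant that makes the distribution sum to one over outputs.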
Starting from a set of data, the algorithms discussed in the following pages can automatically extract a set of rules inherent in the data, and then combine these rules into a model of the data which is both accurate and compact. For instance, starting from a corpus of English text with no linguistic knowledge whatsoever, the algorithms can automatically induce a set of rules for determining the appropriate meaning of a word in context. Since this inductive learning procedure is computationally taxing, we also provide a set of heuristics to ease the computational burden.
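As a toy illustration of what such an induced model looks like at prediction time, the sketch below evaluates a conditional maxent distribution over word senses. The cue words, sense labels, and weights are entirely hypothetical, invented for this example rather than taken from any trained model:

```python
import math

# Hypothetical (word, sense) indicator features and their weights.
# In a real system these would be induced from a corpus, not hand-picked.
WEIGHTS = {
    ("money", "FINANCIAL"): 1.5,   # fires if "money" occurs in context and y = FINANCIAL
    ("river", "GEOGRAPHIC"): 2.0,  # fires if "river" occurs in context and y = GEOGRAPHIC
}
SENSES = ["FINANCIAL", "GEOGRAPHIC"]

def p_sense(context_words, senses=SENSES, weights=WEIGHTS):
    """Conditional maxent: p(y|x) = exp(sum_i lambda_i f_i(x, y)) / Z(x)."""
    scores = {}
    for y in senses:
        # Sum the weights of all binary features that fire for (x, y).
        s = sum(lam for (w, sense), lam in weights.items()
                if sense == y and w in context_words)
        scores[y] = math.exp(s)
    z = sum(scores.values())  # normalizing constant Z(x)
    return {y: v / z for y, v in scores.items()}

# Disambiguating "bank" in a context containing the cue word "money":
dist = p_sense(["deposit", "money", "today"])
```

With the weights above, the "money" cue pushes probability mass toward the FINANCIAL sense, while a context containing "river" would instead favor GEOGRAPHIC. Training, i.e. choosing the weights, is the subject of the algorithms discussed later.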
Though the theme of this discussion is NLP, that is, quantifying the relations among words in everyday human speech and writing, nothing in this document is particular to NLP. Maxent models have been applied with success in astrophysics and medicine, among other fields.