The maximum entropy principle presents us with a problem in constrained optimization: find the which maximizes . In simple cases, we can find the solution to this problem analytically. This was true for the example presented above when we imposed the first two constraints on p. Unfortunately, the solution of the general maximum entropy cannot be written explicitly, and we need a more indirect approach. (The reader is invited to try to calculate the solution for the same example when the third constraint is imposed.)
To address the general problem, we apply the method of Lagrange multipliers from the theory of constrained optimization. The relevant steps are outlined here; the reader is referred to the further readings for a more thorough discussion of constrained optimization as applied to maximum entropy.
The constrained optimization problem at hand is to
We refer to this as the primal problem; it is a succinct way of saying that we seek to maximize subject to the following constraints:
To solve this optimization problem, introduce the Lagrangian
The real-valued parameters and correspond to the constraints imposed on the solution.
Though we shall not prove it here, the following strategy yields the optimal value of (which we are calling ): first hold and constant and maximize (8) with respect to . This yields an expression for in terms of the (still unsolved-for) parameters and . Now substitute this expression back into (8), this time solving for the optimal values of and ( and , respectively).
Proceeding in this manner, we hold fixed and compute the unconstrained maximum of the over all :
Equating this expression to zero and solving for , we find that at its optimum, has the parametric form
We have thus found the parametric form of , and so we now take up the task of solving for the optimal values . Recognizing that the second factor in this equation is the factor corresponding to the second of the constraints listed above, we can rewrite (10) as
where , the normalizing factor, is given by
We have found but not yet . Towards this end we introduce some further notation. Define the dual function as
and the dual optimization problem as
Since and are fixed, the righthand side of (14) has only the free variables .
It is far from obvious that the of (11) with given by (14) is in fact the solution to the constrained optimization problem we set out to find. But in fact this is due to a fundamental principle in the theory of Lagrange multipliers, called generically the Kuhn-Tucker theorem, which asserts that (under suitable assumptions, which are satisfied here) the primal and dual problems are closely related. Although a detailed account of this relationship is beyond the scope of this paper, it is easy to state the final result:
The maximum entropy model subject to the constraints has the parametric form of (11), where can be determined by maximizing the dual function .