The maximum entropy principle presents us with a problem in constrained
optimization: find the which maximizes
. In simple cases,
we can find the solution to this problem analytically. This was true for the
example presented above when we imposed the first two
constraints on p. Unfortunately, the solution of the general maximum entropy
cannot be written explicitly, and we need a more indirect approach. (The
reader is invited to try to calculate the solution for the same example when
the third constraint is imposed.)
To address the general problem, we apply the method of Lagrange multipliers from the theory of constrained optimization. The relevant steps are outlined here; the reader is referred to the further readings for a more thorough discussion of constrained optimization as applied to maximum entropy.
The constrained optimization problem at hand is to
We refer to this as the primal problem; it is a succinct way of saying
that we seek to maximize subject to the following constraints:
To solve this optimization problem, introduce the Lagrangian
The real-valued parameters and
correspond to the
constraints imposed on the solution.
Though we shall not prove it here, the following strategy yields the optimal
value of (which we are calling
): first hold
and
constant and maximize (8) with respect to
.
This yields an expression for
in terms of the (still unsolved-for)
parameters
and
. Now substitute this expression back into
(8), this time solving for the optimal values of
and
(
and
, respectively).
Proceeding in this manner, we hold fixed and compute the
unconstrained maximum of the
over all
:
Equating this expression to zero and solving for , we find that
at its optimum,
has the parametric form
We have thus found the parametric form of , and so we now take up the
task of solving for the optimal values
. Recognizing
that the second factor in this equation is the factor corresponding to the
second of the constraints listed above, we can rewrite (10)
as
where , the normalizing factor, is given by
We have found but not yet
. Towards this end we
introduce some further notation. Define the dual function
as
and the dual optimization problem as
Since and
are fixed, the righthand side of
(14) has only the free variables
.
It is far from obvious that the of (11)
with
given by (14) is in fact the solution
to the constrained optimization problem we set out to find. But in fact this is
due to a fundamental principle in the theory of Lagrange multipliers, called
generically the Kuhn-Tucker theorem, which asserts that (under suitable
assumptions, which are satisfied here) the primal and dual problems are closely
related. Although a detailed account of this relationship is beyond the scope
of this paper, it is easy to state the final result:
The maximum entropy model subject to the constraints
has the parametric form
of (11), where
can be determined by maximizing the dual function
.