Next: Appendix B. Optimization of Up: Variational Probabilistic Inference and Previous: Discussion

Appendix A. Duality

The upper and lower bounds for individual conditional probability distributions that form the basis of our variational method are based on the ``dual'' or ``conjugate'' representations of convex functions. We present a brief description of convex duality in this appendix, and refer the reader to Rockafellar (1970) for a more extensive treatment.

Let f(x) be a real-valued, convex function defined on a convex set X (for example, ). For simplicity of exposition, we assume that f is a well-behaved (differentiable) function. Consider the graph of f, i.e., the points (x,f(x)) in an n+1 dimensional space. The fact that the function f is convex translates into convexity of the set called the epigraph of f and denoted by epi(f) (Figure 13).

Figure 13: Half-spaces containing the convex set epi(f). The conjugate function defines the critical half-spaces whose intersection is epi(f), or, equivalently, it defines the tangent planes of f(x).

It is an elementary property of convex sets that they can be represented as the intersection of all the half-spaces that contain them (see Figure 13). Through parameterizing these half-spaces we obtain the dual representations of convex functions. To this end, we define a half-space by the condition:

where and parameterize all (non-vertical) half-spaces. We are interested in characterizing the half-spaces that contain the epigraph of f. We require therefore that the points in the epigraph must satisfy the half-space condition: for , we must have . This holds whenever as the points in the epigraph have the property that . Since the condition must be satisfied by all , it follows that

as well. Equivalently,

where the right-hand side of this equation defines a function of , which is known as the ``dual'' or ``conjugate'' function . This function, which is also a convex function, defines the critical half-spaces which are needed for the representation of epi(f) as an intersection of half-spaces (Figure 13).

To clarify the duality between f(x) and , let us drop the maximum and rewrite the inequality as:

In this equation, the roles of the two functions are interchangeable and we may suspect that also f(x) can obtained from the dual function by an optimization procedure. This is in fact the case and we have:

This equality states that the dual of the dual gives back the original function. It provides the computational tool for calculating dual functions.

For concave (convex down) functions the results are analogous; we replace with , and lower bounds with upper bounds.

Next: Appendix B. Optimization of Up: Variational Probabilistic Inference and Previous: Discussion

Michael Jordan
Sun May 9 16:22:01 PDT 1999