Exact inference algorithms perform many millions of arithmetic operations when applied to complex graphical models such as the QMR-DT. While this proliferation of terms expresses the symbolic structure of the model, it does not necessarily express the numeric structure of the model. In particular, many of the sums in the QMR-DT inference problem are sums over large numbers of random variables. Laws of large numbers suggest that these sums may yield predictable numerical results over the ensemble of their summands, and this fact might enable us to avoid performing the sums explicitly.
To exploit the possibility of numerical regularity in dense graphical models, we develop a variational approach to approximate probabilistic inference. Variational methods are a general class of approximation techniques with wide application throughout applied mathematics. Variational methods are particularly useful when applied to highly-coupled systems. By introducing additional parameters, known as ``variational parameters''--which essentially serve as low-dimensional surrogates for the high-dimensional couplings of the system--these methods achieve a decoupling of the system. The mathematical machinery of the variational approach provides algorithms for finding values of the variational parameters such that the decoupled system is a good approximation to the original coupled system.
In the case of probabilistic graphical models, variational methods allow us to simplify a complicated joint distribution such as the one in Eq. (7). This is achieved via parameterized transformations of the individual node probabilities. As we will see later, these node transformations can be interpreted graphically as delinking the nodes from the graph.
How do we find appropriate transformations? The variational methods that we consider here come from convex analysis (see Appendix 6). Let us begin by considering methods for obtaining upper bounds on probabilities. A well-known fact from convex analysis is that any concave function can be represented as the solution to a minimization problem:
\[
f(x) \;=\; \min_{\lambda}\bigl\{\lambda^{T}x - f^{*}(\lambda)\bigr\}, \tag{8}
\]
where $f^{*}(\lambda)$ is the conjugate function of $f(x)$. The function $f^{*}(\lambda)$ is itself obtained as the solution to a minimization problem:
\[
f^{*}(\lambda) \;=\; \min_{x}\bigl\{\lambda^{T}x - f(x)\bigr\}. \tag{9}
\]
The formal identity of this pair of minimization problems expresses the ``duality'' of $f$ and its conjugate $f^{*}$.
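For concreteness (this worked example is ours, though the logarithm is the standard illustration in this literature), take the scalar function $f(x) = \ln x$. Setting the derivative of $\lambda x - \ln x$ with respect to $x$ to zero gives $x = 1/\lambda$, so
\[
f^{*}(\lambda) \;=\; \min_{x}\{\lambda x - \ln x\} \;=\; 1 + \ln\lambda .
\]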
The representation of $f$ in Eq. (8) is known as a variational transformation. The parameter $\lambda$ is known as a variational parameter. If we relax the minimization and fix the variational parameter to an arbitrary value, we obtain an upper bound:
\[
f(x) \;\le\; \lambda^{T}x - f^{*}(\lambda). \tag{10}
\]
The bound is better for some values of the variational parameter than for others, and for a particular value of $\lambda$ the bound is exact.
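Continuing the logarithm example, Eq. (10) yields the family of linear upper bounds
\[
\ln x \;\le\; \lambda x - \ln\lambda - 1, \qquad \lambda > 0,
\]
with equality at $\lambda = 1/x$: each choice of $\lambda$ traces out a tangent line that lies above the concave logarithm.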
We also want to obtain lower bounds on conditional probabilities. A straightforward way to obtain lower bounds is to again appeal to conjugate duality and to express functions in terms of a maximization principle. This representation, however, applies to convex functions--in the current paper we require lower bounds for concave functions. Our concave functions, however, have a special form that allows us to exploit conjugate duality in a different way. In particular, we require bounds for functions of the form $f\bigl(\sum_{i} x_{i} + a\bigr)$, where $f$ is a concave function, where the $x_{i} \ge 0$, for $i = 1, \ldots, n$, are non-negative variables, and where $a$ is a constant. The variables in this expression are effectively coupled--the impact of changing one variable is contingent on the settings of the remaining variables. We can use Jensen's inequality (which states that $f(E[z]) \ge E[f(z)]$ for concave $f$), however, to obtain a lower bound in which the variables are decoupled. In particular:
\begin{align}
f\Bigl(\sum_{i} x_{i} + a\Bigr) &= f\Bigl(\sum_{i} q_{i}\,\frac{x_{i}}{q_{i}} + a\Bigr) \tag{11}\\
&\ge \sum_{i} q_{i}\, f\Bigl(\frac{x_{i}}{q_{i}} + a\Bigr), \tag{12}
\end{align}
where the $q_{i}$ (with $q_{i} \ge 0$ and $\sum_{i} q_{i} = 1$) can be viewed as defining a probability distribution over the variables $x_{1}, \ldots, x_{n}$. The variational parameter in this case is the probability distribution $q$. The optimal setting of this parameter is given by $q_{i} = x_{i} / \sum_{j} x_{j}$. This is easily verified by substitution into Eq. (12): with this choice, $x_{i}/q_{i} = \sum_{j} x_{j}$ for every $i$, so the right-hand side collapses to $f\bigl(\sum_{j} x_{j} + a\bigr)$, which demonstrates that the lower bound is tight.
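As a sanity check, the following short sketch (our illustration with $f = \ln$, not part of the original derivation) numerically verifies both bounds: the conjugate upper bound of Eq. (10) and the Jensen lower bound of Eq. (12), including tightness at $q_{i} = x_{i}/\sum_{j} x_{j}$.

```python
# Numerical check of the variational bounds for the concave function
# f(x) = log(x). Illustrative only; values and names are our own choices.
import numpy as np

def f(z):
    return np.log(z)

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=5)   # non-negative variables x_i
a = 0.3                              # constant offset

total = x.sum() + a
exact = f(total)

# Upper bound, Eq. (10) with f*(lam) = 1 + log(lam):
# log(s) <= lam * s - log(lam) - 1, exact at lam = 1/s.
for lam in [0.1, 1.0 / total, 2.0]:
    upper = lam * total - np.log(lam) - 1.0
    assert upper >= exact - 1e-12

# Lower bound, Eq. (12): f(sum_i x_i + a) >= sum_i q_i f(x_i/q_i + a)
# for any distribution q; tight at q_i = x_i / sum_j x_j.
q_uniform = np.full_like(x, 1.0 / len(x))
q_opt = x / x.sum()
for q in [q_uniform, q_opt]:
    lower = np.sum(q * f(x / q + a))
    assert lower <= exact + 1e-12

# The optimal q makes the lower bound an equality.
assert np.isclose(np.sum(q_opt * f(x / q_opt + a)), exact)
print("bounds verified")
```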