Let us now return to the problem of computing the posterior probabilities in the QMR model. Recall that it is the conditional probabilities corresponding to the positive findings that need to be simplified. To this end, we write
where . Consider the exponent . For noisy-OR, as well as for many other conditional models involving compact representations (e.g., logistic regression), the exponent f(x) is a concave function of x. Based on the discussion in the previous section, we know that there must exist a variational upper bound for this function that is linear in x:
Using Eq. (9) to evaluate the conjugate function for noisy-OR, we obtain:
The desired bound is obtained by substituting into Eq. (13) (and recalling the definition ):
Note that the ``variational evidence'' is the exponential of a term that is linear in the disease vector d. Just as with the negative findings, this implies that the variational evidence can be incorporated into the posterior in time linear in the number of diseases associated with the finding.
There is also a graphical way to understand the effect of the transformation. We rewrite the variational evidence as follows:
Note that the first term is a constant, and note moreover that the product is factorized across the diseases. Each of the latter factors can be multiplied with the pre-existing prior on the corresponding disease (possibly itself modulated by factors from the negative evidence). The constant term can be viewed as associated with a delinked finding node . Indeed, the effect of the variational transformation is to delink the finding node from the graph, altering the priors of the disease nodes that are connected to that finding node. This graphical perspective will be important for the presentation of our variational algorithm--we will be able to view variational transformations as simplifying the graph until a point at which exact methods can be run.
We now turn to the lower bounds on the conditional probabilities . The exponent in the exponential representation is of the form to which we applied Jensen's inequality in the previous section. Indeed, since f is concave we need only identify the non-negative variables , which in this case are , and the constant a, which is now . Applying the bound in Eq. (12) we have:
where we have allowed a different variational distribution for each finding. Note that once again the bound is linear in the exponent. As in the case of the upper bound, this implies that the variational evidence can be incorporated into the posterior distribution in time linear in the number of diseases. Moreover, we can once again view the variational transformation in terms of delinking the finding node from the graph.