A typical focus of systems based on Bayesian networks is the
posterior probability of various outcomes of individual variables
given evidence,
Pr(a|e). This can be generalized to the
computation of the posterior probability of a particular
instantiation of a set of variables given evidence, i.e.,
Pr(A = a | E = e). There are two methods that are
capable of performing this computation. The first method is very
efficient at the expense of precision. The second method is less
efficient, but generally offers better convergence rates. Both
methods are based on Equation 7.
The first method reuses the samples generated to estimate
Pr(e) in estimating Pr(a, e).
Estimation of
Pr(a, e) amounts to summing the scores of those samples that satisfy the condition A = a. The main
advantage of this method is its efficiency -- we can use the same
set of samples to estimate the posterior probability of any state
of a subset of the network given evidence. Its main disadvantage
is that the variance of the estimated
Pr(a, e) can be large, especially when the numerical value of Pr(a|e) is extreme. This method is the most widely used
approach in the existing stochastic sampling algorithms.
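As a concrete illustration of the first method, the following sketch estimates Pr(a|e) from a single set of scored samples. The function name and the sample representation (assignments as dictionaries paired with importance scores) are illustrative choices, not taken from the algorithm in Figure 2:

```python
# Sketch of the first method: one set of scored samples is reused for
# every query. Each sample is a (assignment, score) pair, where the
# assignment maps variables to states and the score is the sample's
# importance weight. Representation and names are hypothetical.

def estimate_posterior(samples, query):
    """Estimate Pr(A = a | E = e) from scored samples.

    samples : list of (assignment, score) pairs.
    query   : dict {variable: state} giving the instantiation A = a.
    """
    total = sum(score for _, score in samples)  # proportional to Pr(e)
    matching = sum(score for s, score in samples
                   if all(s[v] == x for v, x in query.items()))  # ~ Pr(a, e)
    return matching / total if total > 0 else 0.0
```

The same list of samples can answer a query over any subset of the network's variables, which is the efficiency advantage described above.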
The second method, used much more rarely (e.g.,
[Cano et al., 1996; Pradhan and Dagum, 1996; Dagum and Luby, 1997]), calls for estimating Pr(e) and Pr(a, e) separately. After
estimating
Pr(e), an additional call to the algorithm is made for each instantiation a of the set of variables of interest A. Pr(a, e) is estimated by sampling the network with the set of observations e extended by A = a. The main advantage of this method
is that it is much better at reducing variance than the first
method. Its main disadvantage is the computational cost associated
with sampling for possibly many combinations of states of nodes of
interest.
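The structure of the second method can be sketched as follows. Here `run_sampler` is a hypothetical stand-in for a call to the basic sampling algorithm that returns an estimate of the probability of a set of observations:

```python
# Sketch of the second method: one sampling run estimates Pr(e), and a
# separate run, with the evidence extended by the query instantiation
# A = a, estimates Pr(a, e). `run_sampler` is a hypothetical callable,
# not an interface defined in the paper.

def posterior_by_separate_runs(run_sampler, evidence, query):
    pr_e = run_sampler(evidence)          # estimate Pr(e)
    extended = {**evidence, **query}      # observations e extended by A = a
    pr_ae = run_sampler(extended)         # estimate Pr(a, e)
    return pr_ae / pr_e if pr_e > 0 else 0.0
```

Each additional query instantiation costs one more sampling run, which is the computational disadvantage noted above.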
Cano et al. [1996] suggested a modified version of the
second method. Suppose that we are interested in the posterior
distribution
Pr(a_i | e) for all possible values a_i of A, i = 1, 2, ..., k. We can estimate Pr(a_i, e) for each i = 1, ..., k separately, and use the value Pr(a_i, e) / Σ_{j=1}^{k} Pr(a_j, e) as an estimate for Pr(a_i | e). The assumption behind this approach is that the estimate of Pr(e) = Σ_{j=1}^{k} Pr(a_j, e) will be very accurate because of the large sample from which it is drawn. However, even if we can guarantee small variance in every Pr(a_i, e), we cannot guarantee that their sum will also have a small variance. So, in the AIS-BN algorithm we use only the pure form of
each of the methods. The algorithm listed in Figure 2
is based on the first method when the optional computations in
Steps 12 and 13 are performed. An algorithm corresponding to the
second method skips the optional steps and calls the basic AIS-BN
algorithm twice to estimate
Pr(e) and Pr(a, e) separately.
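The normalization step in Cano et al.'s variant of the second method amounts to the following sketch, where the per-state joint estimates are assumed to come from separate sampling runs:

```python
# Sketch of Cano et al.'s [1996] normalization: estimate Pr(a_i, e) for
# every state a_i of A separately, then divide each by the sum of all k
# estimates rather than by an independently estimated Pr(e). The input
# values are illustrative.

def normalized_posteriors(joint_estimates):
    """joint_estimates: list of estimates of Pr(a_i, e), i = 1..k."""
    total = sum(joint_estimates)   # plays the role of an estimate of Pr(e)
    return [p / total for p in joint_estimates]
```

The resulting estimates always sum to one, but, as noted above, a small variance in each Pr(a_i, e) does not guarantee a small variance in their sum.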
The first method is very attractive because of its simplicity and
possible computational efficiency. However, as we have shown in
Section 2.2, the performance of a sampling
algorithm that uses just one set of samples (as in the first
method above) to estimate
Pr(a | e) will deteriorate if the difference between the optimal importance functions for Pr(a, e) and Pr(e) is
large. If the main focus of the computation is high accuracy of
the posterior probability distribution of a small number of nodes,
we strongly recommend using the algorithm based on the second
method. Also, this algorithm can be easily used to estimate
confidence intervals of the solution.
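One standard way to obtain such a confidence interval is a normal approximation over the individual sample scores produced by a run; the construction below is a common textbook choice and is not necessarily the one used with AIS-BN:

```python
import math

# Minimal sketch of a normal-approximation confidence interval for a
# sampling estimate: mean +/- z * (sample std / sqrt(n)) over the
# individual sample scores. Requires at least two scores. The choice
# z = 1.96 corresponds to a 95% interval.

def confidence_interval(scores, z=1.96):
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    half = z * math.sqrt(var / n)
    return mean - half, mean + half
```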