The main network used in our tests is a subset of the CPCS (Computer-based Patient Case Simulation) model [Pradhan et al. 1994], a large multiply-connected multi-layer network consisting of 422 multi-valued nodes and covering a subset of the domain of internal medicine. Among the 422 nodes, 14 describe diseases, 33 describe history and risk factors, and the remaining 375 describe various findings related to the diseases. The CPCS network is among the largest real networks available to the research community at present. It contains many extreme probabilities, typically on the order of $10^{-4}$. Our analysis is based on a subset of 179 nodes of the CPCS network, created by Max Henrion and Malcolm Pradhan. We used this smaller version in order to be able to compute the exact solution for the purpose of measuring the approximation error of the sampling algorithms.
The AIS-BN algorithm has some learning overhead. The following comparison of execution time versus number of samples may give the reader an idea of this overhead. When updating the CPCS network with 20 evidence nodes on our system, the AIS-BN algorithm spends a total of 8.4 seconds on learning. Subsequently, it generates 3,640 samples per second, while the SIS algorithm generates 2,631 samples per second and the LW algorithm generates 4,167 samples per second. In order to remain conservative towards the AIS-BN algorithm, in all our experiments we fixed the execution time of the algorithms (our limit was 60 seconds) rather than the number of samples. In the CPCS network with 20 evidence nodes, AIS-BN generates about 188,000 samples in 60 seconds, SIS generates about 158,000 samples, and LW generates about 250,000 samples.
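The quoted sample counts follow directly from these rates and the learning overhead. The following back-of-the-envelope check in Python (the rates are the measured values quoted above; the script itself is illustrative and not part of any of the tested implementations) reproduces them:

\begin{verbatim}
# Back-of-the-envelope check of the sample counts quoted above.
TIME_LIMIT = 60.0                  # fixed execution time (seconds)
rates = {"AIS-BN": 3640, "SIS": 2631, "LW": 4167}   # samples per second
overhead = {"AIS-BN": 8.4, "SIS": 0.0, "LW": 0.0}   # learning time (s)

for alg in rates:
    samples = (TIME_LIMIT - overhead[alg]) * rates[alg]
    print(f"{alg}: about {samples:,.0f} samples in {TIME_LIMIT:.0f} s")

# AIS-BN: about 187,824; SIS: about 157,860; LW: about 250,020 --
# matching the approximate counts of 188,000, 158,000, and 250,000.
\end{verbatim}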
We generated a total of 75 test cases consisting of five sequences
of 15 test cases each. We ran each test case 10 times, each time
with a different setting of the random number seed. Each sequence
had a progressively higher number of evidence nodes: 15, 20, 25,
30, and 35 evidence nodes respectively. The evidence nodes were
chosen randomly (equiprobable sampling without replacement) from
those nodes that described various plausible medical findings.
Almost all of these nodes were leaf nodes in the network. We believe that these constituted very realistic test cases for the algorithms. The distribution of the prior probability of evidence, $\Pr(\mathbf{E} = \mathbf{e})$, across all test runs of our experiments is shown in Figure 4. The least likely evidence was $5.54 \times 10^{-42}$, the most likely evidence was $1.37 \times 10^{-9}$, and the median was $7 \times 10^{-24}$.
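For illustration, the selection of evidence nodes can be sketched in Python as follows; the node list is a hypothetical placeholder for the finding nodes of the 179-node CPCS subset:

\begin{verbatim}
import random

# Placeholder pool standing in for the CPCS nodes that describe
# plausible medical findings (almost all of them leaf nodes).
finding_nodes = [f"finding_{i}" for i in range(140)]

def draw_evidence(pool, n_evidence, seed):
    """Equiprobable sampling without replacement of evidence nodes."""
    return random.Random(seed).sample(pool, n_evidence)

# Five sequences of 15 test cases with progressively more evidence.
test_cases = {n: [draw_evidence(finding_nodes, n, seed)
                  for seed in range(15)]
              for n in (15, 20, 25, 30, 35)}
\end{verbatim}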
Figures 5 and 6 show a typical
plot of convergence of the tested sampling algorithms in our
experiments. The case illustrated involves updating the CPCS
network with 20 evidence nodes. We plot the MSE after the
initial 15 seconds during which the algorithms start converging.
In particular, the learning step of the AIS-BN algorithm is
usually completed within the first 9 seconds. We ran the three
algorithms in this case for 150 seconds rather than the 60 seconds
in the actual experiment in order to be able to observe a wider
range of convergence. The plot of the MSE for the AIS-BN
algorithm almost touches the X axis in Figure 5.
Figure 6 shows the same plot at a finer scale in order to show more detail of the AIS-BN convergence curve. It is clear that the AIS-BN algorithm dramatically improves the convergence rate. We can also see that the results of AIS-BN converge to the exact results very quickly as the sampling time increases. In the case captured in Figures 5 and 6, a tenfold increase in the sampling time (after subtracting the learning overhead of the AIS-BN algorithm, this corresponds to a 21.5-fold increase in the number of samples) results in a 4.55-fold decrease in the MSE (to $\mathrm{MSE} \approx 0.00048$). The observed convergence of both the SIS and LW algorithms was poor: a tenfold increase in sampling time had practically no effect on their accuracy. This is a typical case observed in our experiments.
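The relation between the tenfold increase in sampling time and the 21.5-fold increase in the number of samples follows from the learning overhead reported earlier; the short Python check below makes the arithmetic explicit:

\begin{verbatim}
# Fold increases for the case above (AIS-BN, 20 evidence nodes).
overhead = 8.4                 # AIS-BN learning time (seconds)
t_short, t_long = 15.0, 150.0  # tenfold increase in sampling time

# Samples are generated only after learning, so the number of samples
# grows faster than the wall-clock time:
fold = (t_long - overhead) / (t_short - overhead)
print(f"{fold:.1f}-fold increase in the number of samples")  # ~21.5

mse_150s = 0.00048             # MSE after 150 seconds (from the text)
print(f"MSE after 15 s: about {4.55 * mse_150s:.5f}")        # ~0.00218
\end{verbatim}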
Figure 7 illustrates the ICPT learning process
of the AIS-BN algorithm for the sample case shown in Figure 6. The displayed conditional probabilities belong to
the node gasAcute which is a parent of two evidence nodes,
difInfGasMuc and abdPaiExaMea. The node gasAcute
has four states (``absent,'' ``mild,'' ``moderate,'' and ``severe'') and two parents. We randomly chose a combination of its parents' states as our displayed configuration. The original CPT for this configuration without evidence, the exact ICPT with evidence, and the learned ICPT with evidence are summarized numerically in Table 1. Figure 7 illustrates that the learned importance conditional probabilities converge stably to the exact results after three updating steps. The learned probabilities in Step 10 are close to the exact results. In this example, the difference between $\Pr(x_i \mid \mathrm{Pa}(X_i), \mathbf{e})$ and $\Pr(x_i \mid \mathrm{Pa}(X_i))$ is very large; sampling from $\Pr(x_i \mid \mathrm{Pa}(X_i))$ instead of $\Pr(x_i \mid \mathrm{Pa}(X_i), \mathbf{e})$ would introduce large variance into our results.
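To make the learning process shown in Figure 7 concrete, the sketch below implements a single ICPT updating step of the general form used by AIS-BN: move the current table toward an estimate computed from the latest batch of samples, using a decreasing learning rate, and renormalize. The learning-rate schedule, its constants, and the sample estimate shown here are illustrative assumptions, not values taken from this experiment:

\begin{verbatim}
import numpy as np

def learning_rate(k, k_max, a=0.4, b=0.14):
    """Decreasing learning-rate schedule; the form and the constants
    a and b are illustrative assumptions."""
    return a * (b / a) ** (k / k_max)

def update_icpt_row(row, sample_estimate, k, k_max):
    """One ICPT updating step for a single parent configuration:
    move the current importance conditional probabilities toward the
    estimate obtained from the latest samples, then renormalize."""
    eta = learning_rate(k, k_max)
    new = row + eta * (sample_estimate - row)
    return new / new.sum()

# Example: node gasAcute has four states ("absent", "mild",
# "moderate", "severe"); one parent configuration is shown.
row = np.array([0.25, 0.25, 0.25, 0.25])     # current ICPT row
est = np.array([0.70, 0.20, 0.07, 0.03])     # hypothetical estimate
row = update_icpt_row(row, est, k=3, k_max=10)
\end{verbatim}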
Figure 8 shows the MSE for all 75 test cases in our experiments, with summary statistics in Table 2. A paired one-tailed t-test resulted in statistically highly significant differences between the AIS-BN and SIS algorithms ($p < 3.1 \times 10^{-20}$) and between the SIS and LW algorithms ($p < 1.7 \times 10^{-8}$). As far as the magnitude of the difference is concerned, AIS-BN was significantly better than SIS; SIS was better than LW, but the difference was small. The mean MSEs of the SIS and LW algorithms were both greater than 0.1, which suggests that neither of these algorithms is suitable for large Bayesian networks.
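The significance test above can be reproduced with standard tools; the sketch below uses SciPy's paired t-test with a one-tailed alternative, with randomly generated placeholder arrays standing in for the 75 per-case MSE values of each algorithm:

\begin{verbatim}
import numpy as np
from scipy import stats

# Placeholders for the 75 per-case MSE values of each algorithm.
rng = np.random.default_rng(0)
mse_ais_bn = rng.uniform(0.0005, 0.004, size=75)
mse_sis = rng.uniform(0.05, 0.3, size=75)

# Paired one-tailed t-test: is the mean MSE of AIS-BN smaller?
t, p = stats.ttest_rel(mse_ais_bn, mse_sis, alternative="less")
print(f"t = {t:.2f}, one-tailed p = {p:.2e}")
\end{verbatim}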
The graph in Figure 9 shows the MSE ratio between the AIS-BN and SIS algorithms. The ratio was greater than 100 (an improvement of two orders of magnitude) in 60% of the cases; in other words, we obtained an improvement of two orders of magnitude in MSE in more than half of the cases. In 80% of the cases, the ratio was greater than 50. The smallest ratio in our experiments was 2.67, which occurred when the posterior probabilities were dominated by the prior probabilities. In that case, even though the LW and SIS algorithms converged very fast, their MSE was still far larger than that of AIS-BN.
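The ratio statistics quoted above can be computed from the per-case MSE values as follows; the inputs are again placeholders for the experimental data:

\begin{verbatim}
import numpy as np

def ratio_summary(mse_baseline, mse_ais_bn):
    """Summary of per-case MSE ratios (baseline / AIS-BN), as
    plotted in Figure 9."""
    ratio = np.asarray(mse_baseline) / np.asarray(mse_ais_bn)
    return {"% of cases with ratio > 100": 100.0 * np.mean(ratio > 100),
            "% of cases with ratio > 50": 100.0 * np.mean(ratio > 50),
            "smallest ratio": float(ratio.min())}
\end{verbatim}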
Our next experiment aimed at showing how closely the AIS-BN algorithm can approach the best possible sampling results. If we
know the optimal importance sampling function, the convergence of
the AIS-BN algorithm should be the same as that of forward
sampling without evidence. In other words, the results of the
probabilistic logic sampling algorithm without evidence approach
the limit of how well stochastic sampling can perform. We ran the
logic sampling algorithm on the CPCS network without evidence
mimicking the test runs of the AIS-BN algorithm, i.e., 5 blocks of
15 runs, each repeated 10 times with a different random number
seed. The number of samples generated was equal to the average
number of samples generated by the AIS-BN algorithm for each
series of 15 test runs. We obtained an average $\mathrm{MSE} = 0.00057$, with $\sigma = 0.000025$, $\mathrm{MSE}_{\min} = 0.00052$, and $\mathrm{MSE}_{\max} = 0.00065$. The best possible results should be around this range. From Table 2, we can see that the minimum MSE for the AIS-BN algorithm was 0.00049, within the range of the optimal result. The mean MSE in AIS-BN was 0.00082, not far from the optimal results. The standard deviation, $\sigma$, is significantly larger in the AIS-BN algorithm, but this is understandable given that the process of learning the optimal importance function is heuristic in nature. It is not difficult to understand why there exists a difference between the AIS-BN results and the optimal results. First, the AIS-BN algorithm in our tests updated the sampling distribution only 10 times, which may be too few for it to converge to the optimal importance distribution. Second, even if the algorithm has converged to the optimal importance distribution, the sampling process will still make the parameters oscillate around this distribution, so there will always be small differences between the two distributions.
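For reference, probabilistic logic sampling without evidence is plain forward sampling: visit the nodes in topological order and draw each node's state from the CPT row selected by the already-sampled states of its parents. A minimal sketch, assuming a simple dictionary-based network representation (the data structures here are hypothetical):

\begin{verbatim}
import numpy as np

def logic_sample(nodes, parents, cpts, rng):
    """Draw one sample by forward (probabilistic logic) sampling.
    `nodes` must be topologically ordered; cpts[x] maps a tuple of
    parent states to a probability vector over the states of x."""
    state = {}
    for x in nodes:
        row = cpts[x][tuple(state[p] for p in parents[x])]
        state[x] = rng.choice(len(row), p=row)
    return state

# Tiny two-node example, A -> B, with binary nodes.
nodes = ["A", "B"]
parents = {"A": (), "B": ("A",)}
cpts = {"A": {(): np.array([0.3, 0.7])},
        "B": {(0,): np.array([0.9, 0.1]), (1,): np.array([0.2, 0.8])}}
sample = logic_sample(nodes, parents, cpts, np.random.default_rng(0))
\end{verbatim}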
Figure 10 shows the convergence rate for all tested cases for a four-fold increase in sampling time (between 15 and 60 seconds).