Single graph heuristics are usually uninformed because the
projected belief state
often corresponds to multiple
possible states. The lack of accuracy is because single graphs are
not able to capture propagation of multiple world support
information. Consider the CBTC problem where the projected belief
state is
and we are using a single graph
. If DunkP1
were the only action we would say that
arm and
clog
can be reached at a cost of two, but in fact the cost is infinite
(since there is no DunkP2 to support
arm from all possible
worlds), and there is no strong plan.
To account for lack of support in all possible worlds and sharpen
the heuristic estimate, a set of multiple planning graphs
is considered. Each
is a single graph, as
previously discussed. These multiple graphs are similar to the
graphs used by CGP [30], but lack the more
general cross-world mutexes. Mutexes are only computed within
each graph, i.e. only same-world mutexes are computed. We
construct the initial layer
of each graph
with a different state
. With
multiple graphs, the heuristic value of a belief state is computed
in terms of all the graphs. Unlike single graphs, we can compute
different world aggregation measures with the multiple planning
graphs.
While we get a more informed heuristic by considering more of the
states in
, in certain cases it can be costly to
compute the full set of planning graphs and extract relaxed plans.
We will describe computing the full set of planning graphs, but will
later evaluate (in Section 6.4) the effect of computing a smaller
proportion of these. The single graph
is the extreme case of
computing fewer graphs.
To illustrate the use of multiple planning graphs, consider our
example CBTC. We build two graphs (Figure 6) for the
projected
. They have the respective initial literal layers:
arm, clog, inP1,
inP2
and
arm, clog,
inP2, inP2
.
In the graph for the first possible world,
arm comes
in only through DunkP1 at level 2. In the graph for the second
world,
arm comes in only through DunkP2 at level 2. Thus, the
multiple graphs show which actions in the different worlds
contribute to support the same literal.
A single planning graph is sufficient if we do not aggregate state measures, so in the following we consider how to compute the achievement cost of a belief state with multiple graphs by aggregating state distances.
Positive Interaction Aggregation: Similar to GPT
[6], we can use the worst-case world to
represent the cost of the belief state
by using the
heuristic. The difference with GPT is that we
compute a heuristic on planning graphs, where they compute plans in
state space. With this heuristic we account for the number of
actions used in a given world, but assume positive interaction
across all possible worlds.
The
heuristic is computed by finding a relaxed plan
on each planning graph
, exactly as
done on the single graph with
. The difference is that
unlike the single graph relaxed plan
, but like
, the
initial levels of the planning graphs are states, so each relaxed
plan will reflect all the support needed in the world corresponding
to
. Formally:
![]() |
Notice that we are not computing all state distances between states
in
and
. Each planning graph
corresponds to a
state in
, and from each
we extract a single relaxed
plan. We do not need to enumerate all states in
and find a
relaxed plan for each. We instead support a set of literals from
one constituent of
. This constituent is estimated to be the
minimum distance state in
because it is the first constituent
reached in
.
For CBTC, computing
(Figure 6)
finds:
inP1
, Flush
,
inP1
,
Flush
},
inP1
clog
,
clog
, DunkP1
,
clog
,
DunkP1
},
arm
clog
and
inP2
, Flush
,
inP2
,
Flush
},
inP2
clog
,
clog
, DunkP2
,
clog
,
DunkP2
},
arm
clog
.
Each relaxed plan contains two actions and taking the
maximum of the two relaxed plan values gives
. This aggregation ignores the fact that we must use different
Dunk actions each possible world.
Independence Aggregation: We can use the
heuristic to assume independence among the worlds in our belief
state. We extract relaxed plans exactly as described in the previous
heuristic and simply use a summation rather than maximization of the
relaxed plan costs. Formally:
![]() |
For CBTC, if computing
, we find the same
relaxed plans as in the
heuristic, but sum
their values to get 2 + 2 = 4 as our heuristic. This aggregation
ignores the fact that we can use the same Flush action for both
possible worlds.
State Overlap Aggregation: We notice that in the two previous
heuristics we are either taking a maximization and not accounting
for some actions, or taking a summation and possibly accounting for
extra actions. We present the
heuristic to balance
the measure between positive interaction and independence of worlds.
Examining the relaxed plans computed by the two previous heuristics
for the CBTC example, we see that the relaxed plans extracted from
each graph have some overlap. Notice, that both
and
contain a Flush action
irrespective of which package the bomb is in - showing some
positive interaction. Also,
contains DunkP1,
and
contains DunkP2 - showing some
independence. If we take the layer-wise union of the two relaxed
plans, we would get a unioned relaxed plan:
inP1
, Flush
,
inP1
,
inP2
,
Flush
},
inP1, inP2
clog
,
clog
, DunkP1, DunkP2
,
clog
,
DunkP1
,
DunkP2
},
arm
clog
.
This relaxed plans accounts for the actions that are the same between possible worlds and the actions that differ. Notice that Flush appears only once in layer zero and the Dunk actions both appear in layer one.
In order to get the union of relaxed plans, we extract relaxed plans
from each
, as in the two previous heuristics.
Then if we are computing heuristics for regression search, we start
at the last level (and repeat for each level) by taking the union of
the sets of actions for each relaxed plan at each level into another
relaxed plan. The relaxed plans are end-aligned, hence the
unioning of levels proceeds from the last layer of each relaxed plan
to create the last layer of the
relaxed plan, then the
second to last layer for each relaxed plan is unioned and so on. In
progression search, the relaxed plans are start-aligned to
reflect that they all start at the same time, whereas in regression
we assume they all end at the same time. The summation of the
number of actions of each action level in the unioned relaxed plan
is used as the heuristic value. Formally:
![]() |
For CBTC, we just found
, so counting the number of actions
gives us a heuristic value of
.