The simplest approach for using planning graphs for belief space planning heuristics is to use a ``classical'' planning graph. To form the initial literal layer from the projected belief state, we could either sample a single state (denoted ) or use an aggregate state (denoted ). For example, in CBTC (see Figure 5) assuming regression search with , the initial level of the planning graph for might be:
= {arm, clog, inP1, inP2}
and for it is defined by the aggregate state :
= {arm, clog, inP1, inP2, inP1, inP2}.
Since these two versions of the single planning graph have identical semantics, aside from the initial literal layer, we proceed by describing the graph and point out differences with where they arise.
Graph construction is identical to classical planning graphs (including mutex propagation) and stops when two subsequent literal layers are identical (level off). We use the planning graph formalism used in IPP [20] to allow for explicit representation of conditional effects, meaning there is a literal layer , an action layer , and an effect layer in each level . Persistence for a literal , denoted by , is represented as an action where . A literal is in if an effect from the previous effect layer contains the literal in its consequent. An action is in the action layer if every one of its execution precondition literals is in . An effect is in the effect layer if its associated action is in the action layer and every one of its antecedent literals is in . Using conditional effects in the planning graph avoids factoring an action with conditional effects into a possibly exponential number of non-conditional actions, but adds an extra planning graph layer per level. Once our graph is built, we can extract heuristics.
No Aggregation: Relaxed plans within a single planning graph
are able to measure, under the most optimistic assumptions, the
distance between two belief states. The relaxed plan represents a
distance between a subset of the initial layer literals and the
literals in a constituent of our belief state. In the
, the
literals from the initial layer that are used for support may not
hold in a single state of the projected belief state, unlike the
. The classical relaxed plan heuristic
finds a
set of (possibly interfering) actions to support the goal
constituent. The relaxed plan
is a subgraph of the planning
graph, of the form {
,
,
, ...,
,
,
}. Each of the layers contains a
subset of the vertices in the corresponding layer of the planning
graph.
More formally, we find the relaxed plan to support the constituent that is reached earliest in the graph (as found by the heuristic in Appendix A). Briefly, returns the first level where a constituent of has all its literals in and none are marked pair-wise mutex. Notice that this is how we incorporate negative interactions into our heuristics. We start extraction at the level , by defining as the literals in the constituent used in the level heuristic. For each literal , we select a supporting effect (ignoring mutexes) from to form the subset . We prefer persistence of literals to effects in supporting literals. Once a supporting set of effects is found, we create as all actions with an effect in . Then the needed preconditions for the actions and antecedents for chosen effects in and are added to the list of literals to support from . The algorithm repeats until we find the needed actions from . A relaxed plan's value is the summation of the number of actions in each action layer. A literal persistence, denoted by a subscript ``p'', is treated as an action in the planning graph, but in a relaxed plan we do not include it in the final computation of . The single graph relaxed plan heuristic is computed as
For the CBTC problem we find a relaxed plan from the , as shown in Figure 5 as the bold edges and nodes. Since arm and clog are non mutex at level two, we can use persistence to support clog and DunkP1 to support arm in . In we can use persistence for inP1, and Flush for clog. Thus, = 2 because the relaxed plan is:
inP1 , Flush ,
inP1 , Flush },
inP1 clog ,
clog , DunkP1 ,
clog , DunkP1 },
arm clog .
The relaxed plan does not use both DunkP2 and DunkP1 to support arm. As a result arm is not supported in all worlds (i.e. it is not supported when the state where inP2 holds is our initial state). Our initial literal layer threw away knowledge of inP1 and inP2 holding in different worlds, and the relaxed plan extraction ignored the fact that arm needs to be supported in all worlds. Even with an graph, we see similar behavior because we are reasoning with only a single world. A single, unmodified classical planning graph cannot capture support from all possible worlds - hence there is no explicit aggregation over distance measures for states. As a result, we do not mention aggregating states to measure positive interaction, independence, or overlap.