Single Graph Heuristics ( )

Next: Multiple Graph Heuristics ( Up: Heuristics Previous: Non Planning Graph-based Heuristics

Single Graph Heuristics ( )

**Figure 5:** Single planning graph for CBTC, with relaxed plan components in bold. Mutexes omitted.
$\scalebox{.5}{\includegraphics{btcSGrp1.eps}}$

The simplest approach for using planning graphs for belief space planning heuristics is to use a ``classical'' planning graph. To form the initial literal layer from the projected belief state, we could either sample a single state (denoted ) or use an aggregate state (denoted ). For example, in CBTC (see Figure 5) assuming regression search with , the initial level ${\cal L}_0$ of the planning graph for might be:

${\cal L}_0$ = {arm, clog, inP1, $\neg$ inP2}

and for it is defined by the aggregate state $\tilde{S}(BS_P)$ :

${\cal L}_0$ = {arm, clog, inP1, inP2, $\neg$ inP1, $\neg$ inP2}.

Since these two versions of the single planning graph have identical semantics, aside from the initial literal layer, we proceed by describing the graph and point out differences with where they arise.

Graph construction is identical to classical planning graphs (including mutex propagation) and stops when two subsequent literal layers are identical (level off). We use the planning graph formalism used in IPP [20] to allow for explicit representation of conditional effects, meaning there is a literal layer ${\cal L}_k$ , an action layer ${\cal A}_k$ , and an effect layer ${\cal E}_k$ in each level . Persistence for a literal , denoted by , is represented as an action where $\rho^e(l_p) = \varepsilon^{0}(l_p) = l$ . A literal is in ${\cal L}_k$ if an effect from the previous effect layer ${\cal E}_{k-1}$ contains the literal in its consequent. An action is in the action layer ${\cal A}_k$ if every one of its execution precondition literals is in ${\cal L}_k$ . An effect is in the effect layer ${\cal E}_k$ if its associated action is in the action layer ${\cal A}_k$ and every one of its antecedent literals is in ${\cal L}_k$ . Using conditional effects in the planning graph avoids factoring an action with conditional effects into a possibly exponential number of non-conditional actions, but adds an extra planning graph layer per level. Once our graph is built, we can extract heuristics.

No Aggregation: Relaxed plans within a single planning graph are able to measure, under the most optimistic assumptions, the distance between two belief states. The relaxed plan represents a distance between a subset of the initial layer literals and the literals in a constituent of our belief state. In the , the literals from the initial layer that are used for support may not hold in a single state of the projected belief state, unlike the . The classical relaxed plan heuristic $h^{SG}_{RP}$ finds a set of (possibly interfering) actions to support the goal constituent. The relaxed plan is a subgraph of the planning graph, of the form { ${\cal A}^{RP}_{0}$ , ${\cal E}^{RP}_{0}$ , ${\cal L}^{RP}_{1}$ , ..., ${\cal A}^{RP}_{b-1}$ , ${\cal E}^{RP}_{b-1}$ , ${\cal L}^{RP}_{b}$ }. Each of the layers contains a subset of the vertices in the corresponding layer of the planning graph.

More formally, we find the relaxed plan to support the constituent $\hat{S} \in \hat{\xi}(BS_i)$ that is reached earliest in the graph (as found by the $h^{SG}_{level}(BS_i)$ heuristic in Appendix A). Briefly, $h^{SG}_{level}(BS_i)$ returns the first level where a constituent of has all its literals in ${\cal L}_b$ and none are marked pair-wise mutex. Notice that this is how we incorporate negative interactions into our heuristics. We start extraction at the level , by defining ${\cal L}^{RP}_{b}$ as the literals in the constituent used in the level heuristic. For each literal $l \in {\cal L}^{RP}_{b}$ , we select a supporting effect (ignoring mutexes) from ${\cal E}_{b-1}$ to form the subset ${\cal E}^{RP}_{b-1}$ . We prefer persistence of literals to effects in supporting literals. Once a supporting set of effects is found, we create ${\cal A}^{RP}_{b-1}$ as all actions with an effect in ${\cal E}^{RP}_{b-1}$ . Then the needed preconditions for the actions and antecedents for chosen effects in ${\cal A}^{RP}_{b-1}$ and ${\cal E}^{RP}_{b-1}$ are added to the list of literals to support from ${\cal L}^{RP}_{b-2}$ . The algorithm repeats until we find the needed actions from ${\cal A}_{0}$ . A relaxed plan's value is the summation of the number of actions in each action layer. A literal persistence, denoted by a subscript ``p'', is treated as an action in the planning graph, but in a relaxed plan we do not include it in the final computation of $\mid{{\cal A}^{RP}_{j}}\mid$ . The single graph relaxed plan heuristic is computed as

$\displaystyle h^{SG}_{RP}(BS_i) = \sum\limits_{j = 0}^{b-1} \mid{{\cal A}^{RP}_{j}}\mid$

For the CBTC problem we find a relaxed plan from the , as shown in Figure 5 as the bold edges and nodes. Since $\neg$ arm and $\neg$ clog are non mutex at level two, we can use persistence to support $\neg$ clog and DunkP1 to support $\neg$ arm in ${\cal L}^{RP}_{2}$ . In ${\cal L}^{RP}_{1}$ we can use persistence for inP1, and Flush for $\neg$ clog. Thus, $h^{SG}_{RP}(BS_G)$ = 2 because the relaxed plan is:

${\cal A}^{RP}_{0} = \{$ inP1 , Flush $\}$ ,

${\cal E}^{RP}_{0} = \{\varphi^0($ inP1

, $\varphi^0($ Flush

${\cal L}^{RP}_{1} = \{$ inP1 $, \neg$ clog $\}$ ,

${\cal A}^{RP}_{1} = \{\neg$ clog

, DunkP1 $\}$ ,

${\cal E}^{RP}_{1} = \{\varphi^0(\neg$ clog

, $\varphi^1($ DunkP1

${\cal L}^{RP}_{2} = \{\neg$ arm $, \neg$ clog $\}$ .

The relaxed plan does not use both DunkP2 and DunkP1 to support $\neg$ arm. As a result $\neg$ arm is not supported in all worlds (i.e. it is not supported when the state where inP2 holds is our initial state). Our initial literal layer threw away knowledge of inP1 and inP2 holding in different worlds, and the relaxed plan extraction ignored the fact that $\neg$ arm needs to be supported in all worlds. Even with an graph, we see similar behavior because we are reasoning with only a single world. A single, unmodified classical planning graph cannot capture support from all possible worlds - hence there is no explicit aggregation over distance measures for states. As a result, we do not mention aggregating states to measure positive interaction, independence, or overlap.

Next: Multiple Graph Heuristics ( Up: Heuristics Previous: Non Planning Graph-based Heuristics

2006-05-26