Ever since CGP [30] and SGP [31] a series of planners have been developed for tackling conformant and conditional planning problems - including GPT [6], C-Plan [10], PKSPlan [26], Frag-Plan [21], MBP [3], KACMBP [1], CFF [17], and YKA [28]. Several of these planners are extensions of heuristic state space planners that search in the space of ``belief states'' (where a belief state is a set of possible states). Without full-observability, agents need belief states to capture state uncertainty arising from starting in an uncertain state or by executing actions with uncertain effects in a known state. We focus on the first type of uncertainty, where an agent starts in an uncertain state but has deterministic actions. We seek strong plans, where the agent will reach the goal with certainty despite its partially known state. Many of the aforementioned planners find strong plans, and heuristic search planners are currently among the best. Yet a foundation for what constitutes a good distance-based heuristic for belief space has not been adequately investigated.
Belief Space Heuristics: Intuitively, it can be argued that the
heuristic merit of a belief state depends on at least two
factors-the size of the belief state (i.e., the uncertainty in the
current state), and the distance of the individual states in the
belief state from a destination belief state. The question of
course is how to compute these measures and which are most
effective. Many approaches estimate belief state distances in terms
of individual state to state distances between states in two belief
states, but either lack effective state to state distances or ways
to aggregate the state distances. For instance the MBP planner
[3] counts the number of states in the current
belief state. This amounts to assuming each state distance has unit
cost, and planning for each state can be done independently. The GPT
planner [6] measures the state to state
distances exactly and takes the maximum distance, assuming the
states of the belief state positively interact.
Heuristic Computation Substrates: We characterize several
approaches to estimating belief state distance by describing them in
terms of underlying state to state distances. The basis of our
investigation is in adapting classical planning reachability
heuristics to measure state distances and developing state distance
aggregation techniques to measure interaction between plans for
states in a belief state. We take three fundamental approaches to
measure the distance between two belief states. The first approach
does not involve aggregating state distance measures, rather we use
a classical planning graph to compute a representative state
distance. The second retains distinctions between individual states
in the belief state by using multiple planning graphs, akin to CGP
[30], to compute many state distance measures
which are then aggregated. The third employs a new planning graph
generalization, called the Labelled Uncertainty Graph (
), that
blends the first two to measure a single distance between two belief
states. With each of these techniques we will discuss the types of
heuristics that we can compute with special emphasis on relaxed
plans. We present several relaxed plan heuristics that differ in
terms of how they employ state distance aggregation to make stronger
assumptions about how states in a belief state can co-achieve the
goal through action sequences that are independent, positively
interact, or negatively interact.
Our motivation for the first of the three planning graph techniques
for measuring belief state distances is to try a minimal extension
to classical planning heuristics to see if they will work for us.
Noticing that our use of classical planning heuristics ignores
distinctions between states in a belief state and may provide
uninformed heuristics, we move to the second approach where we
possibly build exponentially many planning graphs to get a better
heuristic. With the multiple planning graphs we extract a heuristic
from each graph and aggregate them to get the belief state distance
measure. If we assume the states of a belief state are independent,
we can aggregate the measures with a summation. Or, if we assume
they positively interact we can use a maximization. However, as we
will show, relaxed plans give us a unique opportunity to measure
both positive interaction and independence among the states by
essentially taking the union of several relaxed plans. Moreover,
mutexes play a role in measuring negative interactions between
states. Despite the utility of having robust ways to aggregate state
distances, we are still faced with the exponential blow up in the
number of planning graphs needed. Thus, our third approach seeks to
retain the ability to measure the interaction of state distances but
avoid computing multiple graphs and extracting heuristics from each.
The idea is to condense and symbolically represent multiple planning
graphs in a single planning graph, called a Labelled Uncertainty
Graph (
). Loosely speaking, this single graph unions the causal
support information present in the multiple graphs and pushes the
disjunction, describing sets of possible worlds (i.e., initial
literal layers), into ``labels". The planning graph vertices are the
same as those present in multiple graphs, but redundant
representation is avoided. For instance an action that was present
in all of the multiple planning graphs would be present only once in
the
and labelled to indicate that it is applicable in a
planning graph projection from each possible world. We will describe
how to extract heuristics from the
that make implicit
assumptions about state interaction without explicitly aggregating
several state distances.
Ideally, each of the planning graph techniques considers every state
in a belief state to compute heuristics, but as belief states grow
in size this could become uninformed or costly. For example, the
single classical planning graph ignores distinctions between
possible states where the heuristic based on multiple graphs leads
to the construction of a planning graph for each state. One way to
keep costs down is to base the heuristics on only a subset of the
states in our belief state. We evaluate the effect of such a
sampling on the cost of our heuristics. With a single graph we
sample a single state and with multiple graphs and the
we
sample some percent of the states. We evaluate state sampling to
show when it is appropriate, and find that it is dependent on how we
compute heuristics with the states.
Standardized Evaluation of Heuristics: An issue in evaluating
the effectiveness of heuristic techniques is the many architectural
differences between planners that use the heuristics. It is quite
hard to pinpoint the global effect of the assumptions underlying
their heuristics on performance. For example, GPT is outperformed by
MBP-but it is questionable as to whether the credit for this
efficiency is attributable to the differences in heuristics, or
differences in search engines (MBP uses a BDD-based search). Our
interest in this paper is to systematically evaluate a spectrum of
approaches for computing heuristics for belief space planning. Thus
we have implemented heuristics similar to GPT and MBP and use them
to compare against our new heuristics developed around the notion of
overlap (multiple world positive interaction and independence). We
implemented the heuristics within two planners, the Conformant-
planner (
) and the Partially-Observable
Non-Deterministic planner (
).
does handle search with
non-deterministic actions, but for the bulk of the paper we discuss
deterministic actions. This more general action formulation, as
pointed out by [30], can be translated into
initial state uncertainty. Alternatively, in Section 8.2 we discuss
a more direct approach to reason with non-deterministic actions in
the heuristics.
External Evaluation: Although our main interest in this paper
is to evaluate the relative advantages of a spectrum of belief space
planning heuristics in a normalized setting, we also compare the
performance of the best heuristics from this work to current state
of the art conformant and conditional planners. Our empirical
studies show that planning graph based heuristics provide effective
guidance compared to cardinality heuristics as well as the
reachability heuristic used by GPT and CFF, and our planners are
competitive with BDD-based planners such as MBP and YKA, and
GraphPlan-based ones such as CGP and SGP. We also notice that our
planners gain scalability with our heuristics and retain reasonable
quality solutions, unlike several of the planners we compare
against.
The rest of this paper is organized as follows. We first present the
and
planners by describing their state and action
representations as well as their search algorithms. To understand
search guidance in the planners, we then discuss appropriate
properties of heuristic measures for belief space planning. We
follow with a description of the three planning graph substrates
used to compute heuristics. We carry out an empirical evaluation in
the next three sections, by describing our test setup, presenting a
standardized internal comparison, and finally comparing with several
other state of the art planners. We end with related research,
discussion, prospects for future work, and various concluding
remarks.