While the scope of our presentation and evaluation is restricted to planning with initial state uncertainty and deterministic actions, some of the planning graph techniques can be extended to include non-deterministic actions of the type described by [27]. Non-deterministic actions have effects that are described in terms of a set of outcomes. For simplicity, we consider Rintanen's conditionality normal form, where actions have a set of conditional effects (as before) and each consequent is a set of mutually exclusive conjunctions (outcomes), exactly one of which results randomly when the effect applies. We outline the generalization of our single, multiple, and labelled planning graphs to reason with non-deterministic actions.
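To make this action model concrete, the following is a minimal sketch in Python of the conditionality normal form described above; the class and field names are our own illustration, not notation from [27].
\begin{verbatim}
# Minimal sketch (illustrative names, not notation from [27]) of actions in
# conditionality normal form: each conditional effect pairs an antecedent with
# a consequent that is a set of mutually exclusive outcomes, exactly one of
# which results when the effect applies.
from dataclasses import dataclass

@dataclass(frozen=True)
class ConditionalEffect:
    antecedent: frozenset   # conjunction of literals that triggers the effect
    outcomes: tuple         # frozensets of literals; exactly one occurs randomly

@dataclass(frozen=True)
class Action:
    name: str
    precondition: frozenset # conjunction of literals
    effects: tuple          # conditional effects (a deterministic effect has one outcome)

# Example: a "paint" action whose single effect has two possible outcomes.
paint = Action(
    name="paint",
    precondition=frozenset({"have-brush"}),
    effects=(
        ConditionalEffect(
            antecedent=frozenset(),                            # unconditional
            outcomes=(frozenset({"painted"}),
                      frozenset({"painted", "-have-brush"})),  # "-" marks a negated literal
        ),
    ),
)
\end{verbatim}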
Single Planning Graphs: Single planning graphs, which are built from an approximate belief state or a sampled state, do not lend themselves to a straightforward extension. A single graph ignores uncertainty in a belief state by unioning its literals or sampling a state to form the initial planning graph layer. Continuing with the single graph assumptions about uncertainty, it makes sense to treat non-deterministic actions as deterministic. Similar to how we approximate a belief state as a set of literals to form the initial literal layer, or sample a state, we can assume that a non-deterministic effect either adds all literals appearing in any of its outcomes or samples a single outcome, as if the action were deterministic (i.e. gives a single set of literals). Single graph relaxed plan heuristics thus remain unchanged.
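As a concrete illustration, the following sketch (our own; outcomes are assumed to be represented as sets of literal strings) shows both ways of treating a non-deterministic effect as if it were deterministic.
\begin{verbatim}
# Sketch (assumed representation: an effect's outcomes as sets of literals)
# of the two single-graph relaxations of a non-deterministic effect.
import random

def union_outcomes(outcomes):
    """Optimistic relaxation: pretend the effect adds every literal
    appearing in any outcome (mirrors unioning a belief state)."""
    return frozenset(lit for outcome in outcomes for lit in outcome)

def sample_outcome(outcomes, rng=random):
    """Sampling relaxation: pick one outcome and treat it as the
    deterministic consequent (mirrors sampling a single state)."""
    return frozenset(rng.choice(list(outcomes)))

outcomes = [{"painted"}, {"painted", "-have-brush"}]
print(union_outcomes(outcomes))   # all literals from all outcomes
print(sample_outcome(outcomes))   # literals of one randomly chosen outcome
\end{verbatim}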
Multiple Planning Graphs: Multiple planning graphs are very much like Conformant GraphPlan [30]. Just as we split the non-determinism in the current belief state into multiple initial literal layers, we can split the outcomes of non-deterministic effects into multiple literal layers. The idea is to root a set of new planning graphs at each level, where each has an initial literal layer containing literals supported by an interpretation of the previous effect layer. By interpretations of the effect layer we mean every possible set of joint effect outcomes; a set of effect outcomes is possible if no two outcomes belong to the same effect. Relaxed plan extraction still involves finding a relaxed plan in each planning graph. However, since each planning graph is split many times (in a tree-like structure), a relaxed plan is extracted from each ``path of the tree''.
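The enumeration of effect-layer interpretations amounts to a cross product over the effects' outcome sets; the sketch below (our own, with assumed representations) yields the literal set that would seed each new planning graph.
\begin{verbatim}
# Sketch (assumed representation) of enumerating the interpretations of an
# effect layer: every joint choice of exactly one outcome per effect. Each
# interpretation seeds the initial literal layer of a new planning graph.
from itertools import product

def effect_layer_interpretations(effects):
    """effects: a list where each element is that effect's list of mutually
    exclusive outcomes (sets of literals)."""
    for joint in product(*effects):        # one outcome per effect
        literals = set()
        for outcome in joint:
            literals |= set(outcome)
        yield frozenset(literals)

# Two non-deterministic effects with 2 and 3 outcomes yield 6 interpretations.
effects = [[{"p"}, {"-p"}],
           [{"q"}, {"r"}, {"q", "r"}]]
for interpretation in effect_layer_interpretations(effects):
    print(sorted(interpretation))
\end{verbatim}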
We note that this technique is not likely to scale because of the exponential growth in redundant planning graph structure over time. Further, in our experiments CGP has enough trouble with initial state uncertainty alone. We expect that we should be able to do much better with the labelled uncertainty graph, described next.
Labelled Uncertainty Graph: With multiple planning graphs we are forced to capture non-determinism by splitting the planning graphs not only at the initial literal layer, but also at each literal layer that follows at least one non-deterministic effect. We saw with the labelled uncertainty graph that labels can capture the non-determinism that drove us to split the initial literal layer in multiple graphs. As such, these labels took on a syntactic form that describes subsets of the states in our source belief state. In order to generalize labels to capture non-determinism resulting from uncertain effects, we need to extend their syntactic form. Our objective is to have a label represent which sources of uncertainty (arising from the source belief state or from effects) causally support the labelled item. We also introduce a new graph layer, an outcome layer, to represent outcomes and how they connect effects and literals.
It might seem natural to describe the labels for outcomes in terms of their affected literals, but this can lead to trouble. The problem is that the literals in effect outcomes are describing states at a different time than the literals in the projected belief state. Further, an outcome that appears in two levels of the graph is describing a random event at different times. Using state literals to describe all labels will lead to confusion as to which random events (state uncertainty and effect outcomes at distinct steps) causally support a labelled item. A pathological example is when we have an effect whose set of outcomes matches one-to-one with the states in the source belief state. In such a case, by using labels defined in terms of state literals we cannot distinguish which random event (the state uncertainty or the effect uncertainty) is described by the label.
We have two choices for describing effect outcomes in labels. In both choices we introduce a new set of label variables to describe how a literal layer is split; these new variables will be used to describe effect outcomes in labels and will not be confused with variables describing initial state uncertainty. In the first case, the variables have a one-to-one matching with our original set of literals, but can be thought of as time-stamped literals. The number of variables we add to the label function is on the order of twice the number of fluents per level (i.e. the number of fluent literals, assuming boolean fluents). The second option is to describe outcomes in labels with a new set of fluents, where each interpretation over the fluents is matched to a particular outcome. In this case, the number of variables we add per level is on the order of the logarithm of the number of outcomes in the outcome layer; it would actually be lower if many of the outcomes were from deterministic effects, because there is no need to describe them in labels. The former approach is likely to introduce fewer variables when there are a lot of non-deterministic effects and they affect quite a few of the same literals. The latter will introduce fewer variables when there are relatively few non-deterministic effects whose outcomes are fairly independent.
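A rough comparison of the two encodings can be sketched as follows (our own illustration; the counting functions and example numbers are assumptions rather than figures from our evaluation).
\begin{verbatim}
# Sketch (assumed counting, illustrative only) of the label variables each
# option adds per graph level.
from math import ceil, log2

def vars_time_stamped_literals(num_fluents):
    """Option 1: one label variable per fluent literal at this level,
    i.e. 2 * |fluents| for boolean fluents."""
    return 2 * num_fluents

def vars_log_encoded_outcomes(num_nondet_outcomes):
    """Option 2: fresh boolean variables whose interpretations index the
    non-deterministic outcomes in the outcome layer; outcomes of
    deterministic effects need no label variables."""
    return ceil(log2(num_nondet_outcomes)) if num_nondet_outcomes > 1 else 0

print(vars_time_stamped_literals(50))   # 100 variables per level
print(vars_log_encoded_outcomes(6))     # 3 variables per level
\end{verbatim}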
With the generalized labelling, we can still say that an item is reachable from the source belief state when its label is entailed by the source belief state. This is because, even though we are adding variables to labels, we are implicitly adding the corresponding fluents to the source belief state. For example, say we add a fluent $x$ to describe the two outcomes of an effect: one outcome is labelled $x$, the other $\neg x$. We can express the source belief state $BS$ that is projected by the labelled uncertainty graph with the new fluent as $BS \wedge (x \vee \neg x)$. An item labelled $x$ will not be entailed by the projected belief state (i.e. it is unreachable) because only one outcome causally supports it. If both outcomes support the item, then its label is $x \vee \neg x$ and it will be reachable.
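The entailment test behind this notion of reachability can be illustrated with a brute-force enumeration of models over the label variables (our own sketch; a practical implementation would use a compact formula representation such as BDDs).
\begin{verbatim}
# Sketch (assumed representation: formulas as Python predicates over truth
# assignments) of checking whether the projected belief state entails a label.
from itertools import product

def entails(premise, conclusion, variables):
    """premise |= conclusion iff every model of premise satisfies conclusion."""
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if premise(model) and not conclusion(model):
            return False
    return True

variables = ["p", "x"]                     # p: initial-state fluent, x: outcome variable
belief = lambda m: m["p"]                  # projected belief state: p and (x or not x)
one_outcome = lambda m: m["p"] and m["x"]  # item supported only by the outcome labelled x
both_outcomes = lambda m: m["p"]           # item supported by both outcomes

print(entails(belief, one_outcome, variables))   # False: unreachable
print(entails(belief, both_outcomes, variables)) # True: reachable
\end{verbatim}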
Given our notion of reachability, we can determine the level from which to extract a relaxed plan. The relaxed plan procedure does not change much in terms of its semantics, other than having the extra graph layer for outcomes. We still have to ensure that the literals in a relaxed plan are causally supported in all worlds they are labelled with, whether those worlds arise from initial state uncertainty or from supporting non-deterministic effects.
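Viewing labels extensionally as sets of worlds, this support condition during relaxed plan extraction amounts to a coverage check, as in the following sketch (our own; the world-set representation is an assumption for illustration).
\begin{verbatim}
# Sketch (worlds represented extensionally as sets) of the support check used
# during relaxed plan extraction: the chosen supporters of a literal must
# together cover every world the literal is labelled with.
def supporters_cover(literal_worlds, supporter_worlds):
    covered = set().union(*supporter_worlds) if supporter_worlds else set()
    return literal_worlds <= covered

# A literal needed in worlds {w1, w2, w3}, covered (or not) by its supporters.
print(supporters_cover({"w1", "w2", "w3"}, [{"w1", "w2"}, {"w3"}]))  # True
print(supporters_cover({"w1", "w2", "w3"}, [{"w1"}]))                # False
\end{verbatim}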