The syntax of PLTL, the language chosen to represent rewarding
behaviours, is that of propositional logic, augmented with the
operators $\ominus$ (previously) and $\mathsf{S}$ (since),
see e.g., [20]. Whereas a classical propositional logic formula
denotes a set of states (a subset of $S$), a PLTL formula denotes a
set of finite sequences of states (a subset of $S^*$). A formula
without temporal modality expresses a property that must be true of
the current state, i.e., the last state of the finite sequence.
$\ominus f$ specifies that $f$ holds in the previous state (the state
one before the last). $f_1 \,\mathsf{S}\, f_2$ requires $f_2$ to have been
true at some point in the sequence, and, unless that point is the present,
$f_1$ to have held ever since. More formally, the modelling relation
$\models$, stating whether a formula $f$
holds of a finite sequence $\Gamma(i) = \langle \Gamma_0, \ldots, \Gamma_i \rangle \in S^*$,
is defined recursively as follows:
- $\Gamma(i) \models p$ iff $p \in \Gamma_i$, for $p \in \mathcal{P}$, the set
of atomic propositions
- $\Gamma(i) \models \neg f$ iff $\Gamma(i) \not\models f$
- $\Gamma(i) \models f_1 \wedge f_2$ iff $\Gamma(i) \models f_1$ and $\Gamma(i) \models f_2$
- $\Gamma(i) \models \ominus f$ iff $i > 0$ and $\Gamma(i-1) \models f$
- $\Gamma(i) \models f_1 \,\mathsf{S}\, f_2$ iff there exists $j \le i$ such that
$\Gamma(j) \models f_2$ and $\Gamma(k) \models f_1$ for all $k$ with $j < k \le i$
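To make the recursive definition concrete, the following minimal Python sketch (our own illustration, not code from the paper) evaluates the satisfaction relation over a finite sequence, with each state represented as the set of atomic propositions it makes true and formulae as nested tuples; the operator names `prev` and `since` are simply our labels for $\ominus$ and $\mathsf{S}$.

```python
# Illustrative evaluator for the PLTL satisfaction relation defined above.
# States are sets of atomic propositions; formulae are nested tuples.

def holds(seq, i, f):
    """Does the prefix <seq[0], ..., seq[i]> satisfy formula f?"""
    if f is True:                         # the constant 'true'
        return True
    if isinstance(f, str):                # atomic proposition p: p is in Gamma_i
        return f in seq[i]
    op = f[0]
    if op == 'not':
        return not holds(seq, i, f[1])
    if op == 'and':
        return holds(seq, i, f[1]) and holds(seq, i, f[2])
    if op == 'prev':                      # 'previously': i > 0 and the prefix up to i-1 satisfies f
        return i > 0 and holds(seq, i - 1, f[1])
    if op == 'since':                     # f1 S f2: some j <= i satisfies f2, and f1 holds ever since
        return any(holds(seq, j, f[2]) and
                   all(holds(seq, k, f[1]) for k in range(j + 1, i + 1))
                   for j in range(i + 1))
    raise ValueError("unknown operator: %r" % (op,))

# Example: 'g has held since the command c was issued' (g S c); note that the
# state in which c is issued itself satisfies the formula even if g is false there.
g_since_c = ('since', 'g', 'c')
seq = [set(), {'c'}, {'g'}, {'g'}]
print([i for i in range(len(seq)) if holds(seq, i, g_since_c)])   # -> [1, 2, 3]
```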
From $\mathsf{S}$, one can define the useful operators
$\Diamond^{-} f \equiv \top \,\mathsf{S}\, f$, meaning that $f$ has been true at some point,
and $\Box^{-} f \equiv \neg \Diamond^{-} \neg f$, meaning that $f$ has
always been true. E.g., $f \wedge \neg\ominus\Diamond^{-} f$ denotes the
set of finite sequences ending in a state where $f$ is true for the
first time in the sequence. Other useful abbreviations are
$\ominus^k f$ ($k$ times ago), for $k$ iterations of the $\ominus$
modality, $\Diamond^{-}_{k} f$ for $\bigvee_{i=0}^{k} \ominus^i f$ ($f$ was true at some
of the $k$ last steps), and $\Box^{-}_{k} f$ for $\bigwedge_{i=0}^{k} \ominus^i f$ ($f$
was true at all the $k$ last steps).
Non-Markovian reward functions are described with a set of pairs
$(f_i : r_i)$ where $f_i$ is a PLTL reward formula and $r_i$ is a real, with
the semantics that the reward assigned to a sequence in $S^*$ is the
sum of the $r_i$'s for which that sequence is a model of $f_i$. Below,
we let $F$ denote the set of reward formulae $f_i$ in the description
of the reward function. Bacchus et al. [2] give a list of behaviours which it might be useful to reward, together
with their expression in PLTL. For instance, where $g$ is an
atemporal formula, $(g : r)$ rewards with $r$ units the achievement of $g$
whenever it happens. This is a Markovian reward. In contrast,
$(\Diamond^{-} g : r)$ rewards every state following (and including)
the achievement of $g$, while $(g \wedge \neg\ominus\Diamond^{-} g : r)$
only rewards the first occurrence of $g$.
$(g \wedge \ominus\Box^{-}_{k-1}\neg g : r)$ rewards the occurrence of $g$
at most once every $k$ steps ($g$ is rewarded only when it was false at
each of the $k$ preceding states).
$(\ominus^{k-1}\neg\ominus\top : r)$ rewards the $k$th
state of the sequence, independently of its properties.
$(g_3 \wedge \ominus(g_2 \wedge \ominus g_1) : r)$ rewards the occurrence of $g_1$ immediately
followed by $g_2$ and then $g_3$. In reactive planning, so-called
response formulae, which state that the achievement of $g$ is
triggered by a condition (or command) $c$, are particularly useful.
These can be written as $(g \wedge \Diamond^{-} c : r)$ if every state
in which $g$ is true following the first issue of the command is to be
rewarded. Alternatively, they can be written as
$(g \wedge \ominus(\neg g \,\mathsf{S}\, c) : r)$ if only the first occurrence of $g$ is to be
rewarded after each command. It is common to only reward the
achievement of $g$ within $k$ steps of the trigger; we write for example
$(g \wedge \Diamond^{-}_{k} c : r)$ to reward all such states in which $g$
holds.
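As a final illustration (ours, under the same assumed encoding, reusing the holds evaluator and the abbreviations sketched above), the reward earned by a sequence is simply the sum of the $r_i$ whose formula it models:

```python
# Sketch of non-Markovian reward assignment: a reward specification is a list of
# (formula, r) pairs, and a sequence earns the sum of the r's for the formulae it models.

def reward(seq, spec):
    """Reward earned by the state reached at the end of the prefix seq."""
    return sum(r for f, r in spec if holds(seq, len(seq) - 1, f))

# Two behaviours from the text: reward the first occurrence of g with 5 units,
# and reward g achieved within 2 steps of the command c with 1 unit.
spec = [(('and', 'g', ('not', ('prev', sometime('g')))), 5.0),
        (('and', 'g', sometime_last(2, 'c')), 1.0)]

trajectory = [set(), {'c'}, set(), {'g'}, {'g'}]
print([reward(trajectory[:i + 1], spec) for i in range(len(trajectory))])
# -> [0, 0, 0, 6.0, 0]  (the first g is within 2 steps of c; the second g satisfies neither formula)
```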
From a theoretical point of view, it is known
[38] that the behaviours representable in PLTL
are exactly those corresponding to star-free regular languages. Non
star-free behaviours, such as rewarding an even number of states
all containing $p$, are therefore not representable. Nor, of course,
are non-regular behaviours, e.g. rewarding the taking of equal
numbers of steps to the left and right. We shall not speculate here
on how severe a restriction this is for the purposes of planning.