Reward Functions
With the language defined so far, we are able to compactly represent behaviours. The extension to a non-Markovian reward function is straightforward. We represent such a function by a set of formulae associated with real-valued rewards. We call such a set $F$ a reward function specification. Where formula $\phi$ is associated with reward $r$ in $F$, we write '$\phi\colon r \in F$'. The rewards are assumed to be independent and additive, so that the reward function $R_F$ represented by $F$ is given by:

$R_F(\Gamma(i)) \;=\; \sum \{\, r \mid \phi\colon r \in F \text{ and } \Gamma(i) \in B_\phi \,\}$

where $B_\phi$ denotes the behaviour represented by $\phi$.
E.g., if $F$ pairs a formula describing the behaviour "$p$ holds for the first time" with reward 5.2, and a formula describing "$q$ has held at some stage up to now" with reward $r$, we get a reward of 5.2 the first time that $p$ holds, a reward of $r$ from the first time that $q$ holds onwards, a reward of $5.2 + r$ when both conditions are met, and 0 otherwise.
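To make the additive reading concrete, the following is a minimal Python sketch, not the paper's machinery: the two temporal formulae are stood in for by hypothetical predicates over prefixes of the state sequence, and the second reward value (7.3) is an arbitrary placeholder.

    # Illustrative sketch only: behaviours are modelled as predicates over a
    # prefix of the state sequence (a list of dicts from propositions to truth values).

    def first_time_p(prefix):
        # Behaviour: the current state is the first one in which 'p' holds.
        return prefix[-1].get("p", False) and not any(s.get("p", False) for s in prefix[:-1])

    def q_has_held(prefix):
        # Behaviour: 'q' has held at some stage up to now, so the reward
        # accrues from the first time 'q' holds onwards.
        return any(s.get("q", False) for s in prefix)

    # Reward function specification F: behaviour/reward pairs.
    # 5.2 is the value from the text; 7.3 is an arbitrary stand-in.
    F = [(first_time_p, 5.2), (q_has_held, 7.3)]

    def reward(spec, prefix):
        # R_F(Gamma(i)): rewards are independent and additive.
        return sum(r for phi, r in spec if phi(prefix))

    # Usage: p first holds at stage 1, q holds from stage 2 onwards.
    states = [{"p": False, "q": False}, {"p": True, "q": False}, {"p": False, "q": True}]
    print([reward(F, states[:i + 1]) for i in range(len(states))])  # [0, 5.2, 7.3]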
Again, we can progress a reward function specification to compute the reward at all stages $i$ of $\Gamma$. As before, progression defines a sequence $\langle F_0, F_1, \ldots \rangle$ of reward function specifications, with $F_0 = F$ and $F_{i+1} = \mathrm{Prog}(\Gamma_i, F_i)$, where $\mathrm{Prog}$, overloaded to specifications, is the function that applies Prog to all formulae in a reward function specification:

$\mathrm{Prog}(s, F) \;=\; \{\, \mathrm{Prog}(s, \phi)\colon r \mid \phi\colon r \in F \,\}$
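A sketch of that lifting in Python, assuming some formula-level progression function prog(state, formula) is available (hypothetical here, since its definition depends on the temporal language):

    def prog_spec(prog, state, spec):
        # Apply the formula-level progression to every formula in the
        # specification, keeping its associated reward unchanged.
        return [(prog(state, phi), r) for phi, r in spec]

    # Toy usage with a dummy progression that leaves formulae untouched:
    F0 = [("phi1", 5.2), ("phi2", 7.3)]  # 7.3 is an arbitrary stand-in
    F1 = prog_spec(lambda s, phi: phi, {"p": True}, F0)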
Then, the total reward received at stage $i$ is simply the sum of the real-valued rewards granted by the progression function to the behaviours represented by the formulae in $F_i$, i.e., the sum of all $r$ such that $\phi\colon r \in F_i$ and progressing $\phi$ through $\Gamma_i$ grants its reward.
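In the same sketch, the stage reward would be computed as below; the predicate grants(state, formula) is a hypothetical stand-in for whatever test the formula-level progression uses to decide that a reward is due in the current state:

    def stage_reward(grants, state, spec_i):
        # Sum the rewards of those formulae of F_i whose progression grants
        # a reward in the current state.
        return sum(r for phi, r in spec_i if grants(state, phi))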
Proceeding in this way, we get the expected analogue of Theorem 1, which states that progression correctly computes non-Markovian reward functions:
Theorem 2
Let $F_0$ be a reward-normal reward function specification, and let $\langle F_1, F_2, \ldots \rangle$ be the result of progressing it through the successive states of a sequence $\Gamma$ using the function $\mathrm{Prog}$. Then, provided no formula in any $F_i$ progresses to $\bot$, the reward computed by progression at each stage $i$ is exactly $R_{F_0}(\Gamma(i))$.
Proof:
Immediate from Theorem 1.