Reward Functions
With the language defined so far, we are able to compactly represent behaviours. The extension to a non-Markovian reward function is straightforward. We represent such a function by a set of formulae associated with real-valued rewards. We call such a set $F$ a reward function specification. Where formula $\phi$ is associated with reward $r$ in $F$, we write '$\phi\colon r \in F$'. The rewards are assumed to be independent and additive, so that the reward function $R_F$ represented by $F$ is given by:

$R_F(\Gamma(i)) \;=\; \sum \{\, r \mid \phi\colon r \in F \text{ and } \Gamma(i) \in B_\phi \,\}$

where $B_\phi$ denotes the behaviour represented by $\phi$.
E.g., if $F$ pairs a formula describing the behaviour "$p$ holds for the first time" with reward 5.2, and a formula describing "$q$ has held at some stage up to now" with reward $r$, we get a reward of 5.2 the first time that $p$ holds, a reward of $r$ from the first time that $q$ holds onwards, a reward of $5.2 + r$ when both conditions are met, and 0 otherwise.
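To make the additive reading concrete, the following is a minimal Python sketch, not the paper's machinery: the two temporal formulae are stood in for by hypothetical predicates over prefixes of the state sequence, and the second reward value (7.3) is an arbitrary placeholder.

    # Illustrative sketch only: behaviours are modelled as predicates over a
    # prefix of the state sequence (a list of dicts from propositions to truth values).

    def first_time_p(prefix):
        # Behaviour: the current state is the first one in which 'p' holds.
        return prefix[-1].get("p", False) and not any(s.get("p", False) for s in prefix[:-1])

    def q_has_held(prefix):
        # Behaviour: 'q' has held at some stage up to now, so the reward
        # accrues from the first time 'q' holds onwards.
        return any(s.get("q", False) for s in prefix)

    # Reward function specification F: behaviour/reward pairs.
    # 5.2 is the value from the text; 7.3 is an arbitrary stand-in.
    F = [(first_time_p, 5.2), (q_has_held, 7.3)]

    def reward(spec, prefix):
        # R_F(Gamma(i)): rewards are independent and additive.
        return sum(r for phi, r in spec if phi(prefix))

    # Usage: p first holds at stage 1, q holds from stage 2 onwards.
    states = [{"p": False, "q": False}, {"p": True, "q": False}, {"p": False, "q": True}]
    print([reward(F, states[:i + 1]) for i in range(len(states))])  # [0, 5.2, 7.3]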
Again, we can progress a reward function specification to compute the reward at all stages $i$ of $\Gamma$. As before, progression defines a sequence $\langle F_0, F_1, \ldots \rangle$ of reward function specifications, with $F_0 = F$ and $F_{i+1} = \mathrm{Prog}(\Gamma_i, F_i)$, where $\mathrm{Prog}$, overloaded to specifications, is the function that applies Prog to all formulae in a reward function specification:

$\mathrm{Prog}(s, F) \;=\; \{\, \mathrm{Prog}(s, \phi)\colon r \mid \phi\colon r \in F \,\}$
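A sketch of that lifting in Python, assuming some formula-level progression function prog(state, formula) is available (hypothetical here, since its definition depends on the temporal language):

    def prog_spec(prog, state, spec):
        # Apply the formula-level progression to every formula in the
        # specification, keeping its associated reward unchanged.
        return [(prog(state, phi), r) for phi, r in spec]

    # Toy usage with a dummy progression that leaves formulae untouched:
    F0 = [("phi1", 5.2), ("phi2", 7.3)]  # 7.3 is an arbitrary stand-in
    F1 = prog_spec(lambda s, phi: phi, {"p": True}, F0)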
Then, the total reward received at stage $i$ is simply the sum of the real-valued rewards granted by the progression function to the behaviours represented by the formulae in $F_i$, i.e., the sum of all $r$ such that $\phi\colon r \in F_i$ and progressing $\phi$ through $\Gamma_i$ grants its reward.
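In the same sketch, the stage reward would be computed as below; the predicate grants(state, formula) is a hypothetical stand-in for whatever test the formula-level progression uses to decide that a reward is due in the current state:

    def stage_reward(grants, state, spec_i):
        # Sum the rewards of those formulae of F_i whose progression grants
        # a reward in the current state.
        return sum(r for phi, r in spec_i if grants(state, phi))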
Proceeding in this way, we get the expected analogue of Theorem 1, which states that progression correctly computes non-Markovian reward functions:
Theorem 2
Let $F_0$ be a reward-normal reward function specification, and let $\langle F_1, F_2, \ldots \rangle$ be the result of progressing it through the successive states of a sequence $\Gamma$ using the function $\mathrm{Prog}$. Then, provided no formula in any $F_i$ progresses to $\bot$, the reward computed by progression at each stage $i$ is exactly $R_{F_0}(\Gamma(i))$.
Proof:
Immediate from Theorem 1.