Reward Functions

With the language defined so far, we are able to compactly represent behaviours. The extension to a non-Markovian reward function is straightforward. We represent such a function by a set6 $\phi \subseteq \mbox{\mbox{\$}FLTL}\ \times \mbox{I$\!$R}$ of formulae associated with real valued rewards. We call $\phi$ a reward function specification. Where formula $f$ is associated with reward $r$ in $\phi$, we write ` $(f:r) \in \phi$'. The rewards are assumed to be independent and additive, so that the reward function $R_{\phi}$ represented by $\phi$ is given by:

Definition 4   $ {\displaystyle R_{\phi}(\Gamma(i)) = \sum_{(f:r) \in \phi} \{r\mid
\Gamma(i) \in B_{f}\}}$

E.g, if $\phi$ is $\{\neg p \makebox[1em]{$\mathbin{\mbox{\sf U}}$}(p \wedge \mbox{\$}) : 5.2,
\mbox{$\Box$}(q\rightarrow \mbox{$\Box$}\mbox{\$}) : 7.3\}$, we get a reward of 5.2 the first time that $p$ holds, a reward of $7.3$ from the first time that $q$ holds onwards, a reward of $12.5$ when both conditions are met, and 0 otherwise. Again, we can progress a reward function specification $\phi$ to compute the reward at all stages i of $\Gamma$. As before, progression defines a sequence $\langle \phi_{0}, \phi_{1},
\ldots\rangle$ of reward function specifications, with $\phi_{i+1} =
\mbox{R}\mbox{Prog}(\Gamma_i,\phi_{i})$, where $\mbox{R}\mbox{Prog}$ is the function that applies Prog to all formulae in a reward function specification:

\begin{displaymath}\mbox{R}\mbox{Prog}(s,\phi) = \{(\mbox{Prog}(s,f):r) \mid (f:r) \in \phi\}\end{displaymath}

Then, the total reward received at stage $i$ is simply the sum of the real-valued rewards granted by the progression function to the behaviours represented by the formulae in $\phi_i$:

\begin{displaymath}
\sum_{(f:r) \in \phi_{i}} \{r \mid \mbox{Rew}(\Gamma_{i},f)\}
\end{displaymath}

By proceeding that way, we get the expected analog of Theorem 1, which states progression correctly computes non-Markovian reward functions:

Theorem 2   Let $\phi$ be a reward-normal7reward function specification, and let $\langle \phi_{0},\phi_{1}
\ldots\rangle$ be the result of progressing it through the successive states of a sequence $\Gamma$ using the function $\mbox{{\em R}\mbox{\em Prog}}$. Then, provided $(\mbox{$\bot$}\!:\!r)\!\not\in\!\phi_i$ for any $i$, then ${\displaystyle \sum_{(f:r) \in \phi_{i}} \{r \mid
\mbox{\em Rew}(\Gamma_{i},f)\} = R_{\phi}(\Gamma(i))}$.

Proof: Immediate from Theorem 1. $\Box$
Sylvie Thiebaux 2006-01-20