All three translations into an MDP (PLTLSIM, PLTLMIN, and
PLTLSTR) rely on the equivalence
$f_1 \mathbin{\mathsf{S}} f_2 \equiv f_2 \vee (f_1 \wedge \ominus(f_1 \mathbin{\mathsf{S}} f_2))$, with which we can
decompose temporal modalities into a requirement about the last state
of a sequence, and a requirement about the prefix of the sequence.
More precisely, given a state $s$ and a formula $f$, one can compute in
$O(\|f\|)$ a new formula $\mathrm{Reg}(f,s)$ called the regression of $f$
through $s$. Regression has the property that, for $i \geq 1$, $f$
is true of a finite sequence $\Gamma(i)$ ending with $s$ iff
$\mathrm{Reg}(f,s)$ is true of the prefix $\Gamma(i-1)$. That is,
$\mathrm{Reg}(f,s)$ represents what must have been true previously for $f$
to be true now. $\mathrm{Reg}$ is defined as follows:
\[
\begin{array}{lcl}
\mathrm{Reg}(p, s) & = & \top \mbox{ if } p \in s, \ \bot \mbox{ otherwise} \quad (p \mbox{ an atomic proposition}) \\
\mathrm{Reg}(\neg f, s) & = & \neg \mathrm{Reg}(f, s) \\
\mathrm{Reg}(f_1 \wedge f_2, s) & = & \mathrm{Reg}(f_1, s) \wedge \mathrm{Reg}(f_2, s) \\
\mathrm{Reg}(\ominus f, s) & = & f \\
\mathrm{Reg}(f_1 \mathbin{\mathsf{S}} f_2, s) & = & \mathrm{Reg}(f_2, s) \vee (\mathrm{Reg}(f_1, s) \wedge f_1 \mathbin{\mathsf{S}} f_2)
\end{array}
\]
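To make the definition concrete, the following is a minimal Python sketch of regression; it is our own illustration, not code from the paper. Formulae are represented as nested tuples ('atom', p), ('not', f), ('and'/'or', f1, f2), ('prev', f) for $\ominus$, and ('since', f1, f2) for $\mathbin{\mathsf{S}}$, and a state is given as the set of atoms true in it. The names reg and simplify, and the helper simplify itself (which merely propagates $\top$ and $\bot$), are illustrative choices.

TRUE, FALSE = ('true',), ('false',)

def reg(f, state):
    """Regression of PLTL formula f through state (the last state of the sequence)."""
    op = f[0]
    if op in ('true', 'false'):
        return f
    if op == 'atom':                  # Reg(p, s) = true if p holds in s, false otherwise
        return TRUE if f[1] in state else FALSE
    if op == 'not':                   # Reg(~f, s) = ~Reg(f, s)
        return ('not', reg(f[1], state))
    if op in ('and', 'or'):           # Reg distributes over the Boolean connectives
        return (op, reg(f[1], state), reg(f[2], state))
    if op == 'prev':                  # Reg(prev f, s) = f
        return f[1]
    if op == 'since':                 # Reg(f1 S f2, s) = Reg(f2, s) or (Reg(f1, s) and (f1 S f2))
        return ('or', reg(f[2], state), ('and', reg(f[1], state), f))
    raise ValueError('unknown operator: %r' % (op,))

def simplify(f):
    """Propagate the truth constants so that regressed formulae are easy to read."""
    op = f[0]
    if op == 'not':
        g = simplify(f[1])
        if g == TRUE:
            return FALSE
        if g == FALSE:
            return TRUE
        return ('not', g)
    if op in ('and', 'or'):
        a, b = simplify(f[1]), simplify(f[2])
        absorbing, neutral = (FALSE, TRUE) if op == 'and' else (TRUE, FALSE)
        if absorbing in (a, b):
            return absorbing
        if a == neutral:
            return b
        if b == neutral:
            return a
        return (op, a, b)
    return f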
For instance, take a state $s$ in which $p$ holds and $q$ does not,
and take $f = \ominus\neg q \wedge (p \mathbin{\mathsf{S}} q)$, meaning that $q$
must have been false 1 step ago, but that it must have held at some
point in the past and that $p$ must have held since $q$ last did.
Then $\mathrm{Reg}(f, s) = \neg q \wedge (p \mathbin{\mathsf{S}} q)$; that is, for $f$
to hold now, at the previous stage $q$ had to be false and the
$p \mathbin{\mathsf{S}} q$ requirement still had to hold. When $p$ and $q$ are
both false in $s$, then $\mathrm{Reg}(f, s) = \bot$, indicating that $f$ cannot be
satisfied, regardless of what came earlier in the sequence.
For notational convenience, where $F$ is a set of
formulae we write $\mathrm{Reg}(F, s)$ for $\{\mathrm{Reg}(f, s) \mid f \in F\}$.
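Under the illustrative representation sketched above, this example can be replayed directly (expected outputs are shown as comments):

# Replaying the example: p holds in s and q does not, f = prev(~q) and (p S q)
p, q = ('atom', 'p'), ('atom', 'q')
f = ('and', ('prev', ('not', q)), ('since', p, q))

print(simplify(reg(f, {'p'})))   # s with p true and q false
# -> ('and', ('not', ('atom', 'q')), ('since', ('atom', 'p'), ('atom', 'q')))
#    i.e. ~q and (p S q): one step earlier q had to be false and p S q still had to hold

print(simplify(reg(f, set())))   # s with p and q both false
# -> ('false',): f can no longer be satisfied, whatever came earlier in the sequence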
Now the translations exploit the PLTL representation of rewards as
follows. Each expanded state (e-state) in the generated MDP can be seen as labelled
with a set $\Psi$ of subformulae of the reward
formulae in $\Phi$ (and their negations). The
subformulae in $\Psi$ must be (1) true of the paths leading to the
e-state, and (2) sufficient to determine the current truth of all
reward formulae in $\Phi$, as this is needed to compute the current
reward. Ideally the $\Psi$s should also be (3) small enough to enable
just that, i.e. they should not contain subformulae which draw history
distinctions that are irrelevant to determining the reward at one
point or another. Note however that in the worst case, the number of
distinctions needed, even in the minimal equivalent MDP, may be
exponential in the length of the reward formulae. This happens for instance with the formula
$\ominus^n p$, which requires $n$ additional bits of information
memorising the truth of $p$ over the last $n$ steps.
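The following small Python sketch (again our own illustration, not code from the paper, and the helper name truth_of_prev_n_p is hypothetical) makes this concrete: to evaluate $\ominus^n p$ at every stage, the process has to remember the truth of $p$ over the last $n$ steps, here an $n$-bit shift register, and every one of the $2^n$ register contents can occur.

from collections import deque

def truth_of_prev_n_p(n, p_values):
    """Truth value of prev^n(p) at each stage of a sequence in which p takes the
    values in p_values; the only memory needed is the truth of p over the last
    n steps, kept here in an n-bit shift register."""
    memory = deque([False] * n, maxlen=n)   # p treated as false before the sequence starts
    truths = []
    for p_now in p_values:
        truths.append(memory[0])            # memory[0] is the value of p exactly n steps ago
        memory.append(p_now)                # shift in the current value of p
    return truths

print(truth_of_prev_n_p(3, [True, False, True, True, False]))
# -> [False, False, False, True, False]
# All 2**n contents of the register can occur, so an equivalent MDP needs 2**n
# history distinctions for this single reward formula.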