Proofs of Theorems

Property 1 Where $b \Leftrightarrow (\Gamma(i) \in B)$ , $(\Gamma,i)\mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ iff $(\Gamma,i+1) \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \mbox{Prog}(b, \Gamma_i,f)$ .

Proof: Induction on the structure of . There are several base cases, all fairly trivial. If $f=\mbox{$\top$}$ or $f=\mbox{$\bot$}$ there is nothing to prove, as these progress to themselves and hold everywhere and nowhere respectively. If then if holds in $\Gamma_i$ then it progresses to $\mbox{$\top$}$ which holds in $\Gamma_{i+1}$ while if does not hold in $\Gamma_i$ then it progresses to $\mbox{$\bot$}$ which does not hold in $\Gamma_{i+1}$ . The case $f = \neg p$ is similar. In the last base case, $f = \mbox{\$}$ . Then the following are equivalent:
$(\Gamma,i)\mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$
$\Gamma(i)\in B$

$\mbox{Prog}(b,\Gamma_i,f) = \mbox{$\top$}$
$(\Gamma,i+1) \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \mbox{Prog}(b, \Gamma_i,f)$

Induction case 1: $f = g \wedge h$ . The following are equivalent:
$(\Gamma,i)\mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$
$(\Gamma,i) \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$ and $(\Gamma,i) \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$
$(\Gamma,i+1) \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \mbox{Prog}(b, \Gamma_i,g)$ and $(\Gamma,i+1) \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \mbox{Prog}(b, \Gamma_i,h)$ (by induction hypothesis)
$(\Gamma,i+1) \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \mbox{Prog}(b, \Gamma_i,g) \wedge \mbox{Prog}(b, \Gamma_i,h)$
$(\Gamma,i+1) \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \mbox{Prog}(b, \Gamma_i,f)$

Induction case 2: $f = g \vee h$ . Analogous to case 1.

Induction case 3: $f = \raisebox{0.6mm}{$\scriptstyle \bigcirc$}g$ . Trivial by inspection of the definitions.

Induction case 4: $f = g \makebox[1em]{$\mathbin{\mbox{\sf U}}$}h$ . Then is logically equivalent to $h \vee (g \wedge \raisebox{0.6mm}{$\scriptstyle \bigcirc$}(g \makebox[1em]{$\mathbin{\mbox{\sf U}}$}h)$ which by cases 1, 2 and 3 holds at stage of $\Gamma$ for behaviour iff $\mbox{Prog}(b, \Gamma_i, f)$ holds at stage i+1. $\Box$

Theorem 1 Let be reward-normal, and let $\langle f_0, f_1, \ldots \rangle$ be the result of progressing it through the successive states of a sequence $\Gamma$ . Then, provided no is $\mbox{$\bot$}$ , for all $\mbox{\em Rew}(\Gamma_i,f_i)$ iff $\Gamma(i)\in B_{f}$ .

Proof: First, by the definition of reward-normality, if is reward-normal then $\Gamma \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ iff for all , if $\Gamma(i) \in B_f$ then $\Gamma(i)\in B$ . Next, if $\Gamma \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ then progressing through $\Gamma$ according to (that is, letting each be true iff $\Gamma(i)\in B$ ) cannot lead to a contradiction because by Property 1, progression is truth-preserving. It remains, then, to show that if $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ then progressing through $\Gamma$ according to must lead eventually to $\mbox{$\bot$}$ . The proof of this is by induction on the structure of and as usual the base case in which is a literal (an atom, a negated atom or $\mbox{$\top$}$ , $\mbox{$\bot$}$ or $\mbox{\$}$ ) is trivial. Case $f = g \wedge h$ . Suppose $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ . Then either $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$ or $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$ , so by the induction hypothesis either or progresses eventually to $\mbox{$\bot$}$ , and hence so does their conjunction. Case $f = g \vee h$ . Suppose $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ . Then both $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$ and $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$ , so by the induction hypothesis each of and progresses eventually to $\mbox{$\bot$}$ . Suppose without loss of generality that does not progress to $\mbox{$\bot$}$ before does. Then at some point has progressed to some formula and has progressed to $g' \vee \mbox{$\bot$}$ which simplifies to . Since also progresses to $\mbox{$\bot$}$ eventually, so does . Case $f = \raisebox{0.6mm}{$\scriptstyle \bigcirc$}g$ . Suppose $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ . Let $\Gamma = \Gamma_0;\Delta$ and let $B' = \{\gamma\vert \Gamma_0;\gamma \in B\}$ . Then $\Delta \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B'$}}\:$} g$ , so by the induction hypothesis progressed through $\Delta$ according to eventually reaches $\mbox{$\bot$}$ . But The progression of through $\Gamma$ according to is exactly the same after the first step, so that too leads to $\mbox{$\bot$}$ . Case $f = g \makebox[1em]{$\mathbin{\mbox{\sf U}}$}h$ . Suppose $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ . Then there is some such that $(\Gamma,j) \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$ and for all $i \leq j$ , $(\Gamma,i) \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$ . We proceed by induction on . In the base case , and both $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$ and $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$ whence by the main induction hypothesis both and will eventually progress to $\mbox{$\bot$}$ . Thus $h \vee (g \wedge f')$ progresses eventually to $\mbox{$\bot$}$ for any , and in particular for $f' = \raisebox{0.6mm}{$\scriptstyle \bigcirc$}f$ , establishing the base case. For the induction case, suppose $\Gamma \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$ (and of course $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$ ). Since is equivalent to $h \vee (g \wedge \raisebox{0.6mm}{$\scriptstyle \bigcirc$}f)$ and $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ , $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$ and $\Gamma \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$ , clearly $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \raisebox{0.6mm}{$\scriptstyle \bigcirc$}f$ . Where $\Delta$ and are as in the previous case, therefore, $\Delta \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B'$}}\:$} f$ and the failure occurs at stage of $\Delta$ . Therefore the hypothesis of the induction on applies, and progressed through $\Delta$ according to goes eventually to $\mbox{$\bot$}$ , and so progressed through $\Gamma$ according to goes similarly to $\mbox{$\bot$}$ . $\Box$

Theorem 3 Let be the set of e-states in an equivalent MDP for $D=\langle S,s_0,A,\Pr ,R\rangle$ . is minimal iff every e-state in is reachable and contains no two distinct e-states and with $\tau(s'_1) = \tau(s'_2)$ and $\mu(s'_1) = \mu(s'_2)$ .

Proof: Proof is by construction of the canonical equivalent MDP . Let the set of finite prefixes of state sequences in $\widetilde{D}(s_0)$ be partitioned into equivalence classes, where ${\Gamma}1(i) \equiv {\Gamma}2(j)$ iff ${\Gamma}1_i = {\Gamma}2_j$ and for all $\Delta \in S^*$ such that ${\Gamma}1(i);\Delta \in \widetilde{D}(s_0)$ , $R({\Gamma}1(i);\Delta) = R({\Gamma}2(j);\Delta)$ . Let $[\Gamma(i)]$ denote the equivalence class of $\Gamma(i)$ . Let ${\cal E}$ be the set of these equivalence classes. Let ${\cal A}$ be the function that takes each $[\Gamma(i)]$ in ${\cal E}$ to $A(\Gamma_i)$ . For each $\Gamma(i)$ and $\Delta(j)$ and for each $a \in {\cal A}([\Gamma(i)])$ , let ${\cal T}( [\Gamma(i)],a,[\Delta(j)] )$ be $\Pr(\Gamma_i,a,s)$ if $[\Delta(j)] = [\Gamma(i);\langle s \rangle]$ . Otherwise let ${\cal T}( [\Gamma(i)],a,[\Delta(j)] ) = 0$ . Let ${\cal R}([\Gamma(i)])$ be $R(\Gamma(i))$ . Then note the following four facts:

Each of the functions , and is well-defined.
$D^c = \langle {\cal E}, [\langle s_0 \rangle], {\cal A}, {\cal T}, {\cal R} \rangle$ is an equivalent MDP for with $\tau([\Gamma(i)])=\Gamma_i$ .
For any equivalent MDP of there is a mapping from a subset of the states of onto $\cal E$ .
satisfies the condition that every e-state in is reachable and contains no two distinct e-states and with $\tau(s'_1) = \tau(s'_2)$ and $\mu(s'_1) = \mu(s'_2)$ iff is isomorphic to .

What fact 1 above amounts to is that if $\Gamma{1}(i) \equiv \Gamma{2}(j)$ then it does not matter which of the two sequences is used to define ${\cal A}$ , ${\cal T}$ and ${\cal R}$ of their equivalence class. In the cases of ${\cal A}$ and ${\cal T}$ this is simply that $\Gamma{1}_i = \Gamma{2}_j$ . In the case of ${\cal R}$ , it is the special case $\Delta = \langle \Gamma{1}_i \rangle$ of the equality of rewards over extensions. Fact 2 is a matter of checking that the four conditions of Definition 1 hold. Of these, conditions 1 ( $\tau([s_0]) = s_{0}$ ) and 2 ( ${\cal A}([\Gamma(i)])= A(\Gamma_i)$ ) hold trivially by the construction. Condition 4 says that for any feasible state sequence $\Gamma \in \widetilde{D}(s_0)$ , we have ${\cal R}([\Gamma(i)]) = R(\Gamma(i))$ for all

. This also is given in the construction. Condition 3 states:

For all $s_1,s_2\in S$ , if there is $a\in A(s_1)$ such that $\Pr(s_1,a,s_2) > 0$ , then for all $\Gamma(i) \in \widetilde{D}(s_0)$ such that $\Gamma_i = s_1$ , there exists a unique $[\Delta(j)] \in {\cal E}$ , $\Delta_j = s_2$ , such that for all $a \in {\cal A}([\Gamma(i)])$ , ${\cal T}([\Gamma(i)],a,[\Delta[j]]) = \Pr(s_1,a,s_2)$ .

Suppose $\Pr(s_1,\alpha,s_2) > 0$ , $\Gamma(i) \in \widetilde{D}(s_0)$ and $\Gamma_i = s_1$ . Then the required $\Delta(j)$ is $\Gamma(i);\langle s_2 \rangle$ , and of course ${\cal A}([\Gamma(i)])= A(\Gamma_i)$ , so the required condition reads:

$[\Gamma(i);\langle s_2 \rangle]$ is the unique element of ${\cal E}$ with $\tau(X) = s_2$ such that for all $a \in A(\Gamma_i)$ , ${\cal T}( [\Gamma(i)],a,X ) = \Pr(s_1,a,s_2)$ .

To establish existence, we need that if $a \in A(\Gamma_i)$ then ${\cal T} ( [ \Gamma(i) ],a, [\Gamma(i);\langle s_2 \rangle] ) = \Pr(\Gamma_i,a,s_2)$ , which is immediate from the definition of $\cal T$ above. To establish uniqueness, suppose that $\tau(X) = s_2$ and ${\cal T}( [\Gamma(i)],a,X ) = \Pr(s_1,a,s_2)$ for all actions $a \in A(\Gamma_i)$ . Since $\Pr(s_1,\alpha,s_2) > 0$ , the transition probability from $[\Gamma(i)]$ to

is nonzero for some action, so by the definition of $\cal T$ ,

can only be $[\Gamma(i);\langle s_2 \rangle]$ . Fact 3 is readily observed. Let

be any equivalent MDP for

. For any states

and

, and any state

such that $\tau(X) = s_1$ there is at most one state

with $\tau(Y) = s_2$ such that some action $a\in A(s_1)$ gives a nonzero probability of transition from

. This follows from the uniqueness part of condition 3 of Definition 1 together with the fact that the transition function is a probability distribution (sums to 1). Therefore for any given finite state sequence $\Gamma(i)$ there is at most one state of

reached from the start state of

by following $\Gamma(i)$ . Therefore

induces an equivalence relation $\approx_M$ on

: $\Gamma(i) \approx_M \Delta(j)$ iff they lead to the same state of

(the sequences which are not feasible in

may all be regarded as equivalent under $\approx_M$ ). Each reachable state of

has associated with it a nonempty equivalence class of finite sequences of states of

. Working through the definitions, we may observe that $\approx_M$ is a sub-relation of $\equiv$ (if $\Gamma(i) \approx_M \Delta(j)$ then $\Gamma(i) \equiv \Delta(j)$ ). Hence the function that takes the equivalence class under $\approx_M$ of each feasible sequence $\Gamma(i)$ to $[\Gamma(i)]$ induces a mapping

(an epimorphism in fact) from the reachable subset of states of

onto $\cal E$ . To establish Fact 4, it must be shown that in the case of

the mapping can be reversed, or that each equivalence class $[\Gamma(i)]$ in

corresponds to exactly one element of

. Suppose not (for contradiction). Then there exist sequences $\Gamma{1}(i)$ and $\Gamma{2}(j)$ in $\widetilde{D}(s_0)$ such that $\Gamma{1}(i) \equiv \Gamma{2}(j)$ but on following the two sequences from

we arrive at two different elements

and

with $\tau(s'_1) = \Gamma{1}_i = \Gamma{2}_j = \tau(s'_2)$ but with $\mu(s'_1) \neq \mu(s'_2)$ . Therefore there exists a sequence $\Delta(k) \in \widetilde{D}(s)$ such that $R(\Gamma{1}(i-1);\Delta(k)) \neq R(\Gamma{2}(j-1);\Delta(k))$ . But this contradicts the condition for $\Gamma{1}(i) \equiv \Gamma{2}(j)$ . $\Box$
Theorem 3 follows immediately from facts 1-4.

Theorem 4 Let be the translation of as in Definition 5. is a blind minimal equivalent MDP for .

Proof: Reachability of all the e-states is obvious, as they are constructed only when reached. Each e-state is a pair $\langle s, \phi\rangle$ where is a state of and $\phi$ is a reward function specification. In fact, $s = \tau(\langle s, \phi \rangle)$ and $\phi$ determines a distribution of rewards over all continuations of the sequences that reach $\langle s, \phi\rangle$ . That is, for all $\Delta$ in such that $\Delta_0 = s$ , the reward for $\Delta$ is $\sum_{(f:r) \in \phi} \{r\mid \Delta \in B_{f}\}$ . If is not blind minimal, then there exist distinct e-states $\langle s, \phi\rangle$ and $\langle s, \phi' \rangle$ for which this sum is the same for all $\Delta$ . But this makes $\phi$ and $\phi'$ semantically equivalent, contradicting the supposition that they are distinct. $\Box$

Sylvie Thiebaux 2006-01-20