Influence of Syntax

Unsurprisingly, we find that the syntax used to express rewards, which affects the length of the formula, has a major influence on the run time. A typical example of this effect is captured in Figure 12. This graph demonstrates how re-expressing $\mbox{\emph{prvOut}}\equiv\circleddash ^n (\wedge_{i=1}^{n} p_i)$ as $\mbox{\emph{prvIn}}\equiv\wedge_{i=1}^{n} \circleddash ^n p_i$ , thereby creating

times more temporal subformulae, alters the running time of all PLTL methods. FLTL is affected too as $FLTL progression requires two iterations through the reward formula. The graph represents the averages of the running times over all the methods, for the COMPLETE domain.

**Figure 13:** Effect of Multiple Rewards on MDP size
$\includegraphics[width=0.65\textwidth]{figures/multiplesize}$

**Figure 14:** Effect of Multiple Rewards on Run Time
$\includegraphics[width=0.65\textwidth]{figures/multipletime}$

Our most serious concern in relation to the PLTL approaches is their handling of reward specifications containing multiple reward elements. Most notably we found that PLTLMIN does not necessarily produce the minimal equivalent MDP in this situation. To demonstrate, we consider the set of reward formulae $\{f_1,f_2, \ldots, f_n\}$ , each associated with the same real value

. Given this, PLTL approaches will distinguish unnecessarily between past behaviours which lead to identical future rewards. This may occur when the reward at an e-state is determined by the truth value of $f_1 \vee f_2$ . This formula does not necessarily require e-states that distinguish between the cases in which $\{f_1 \equiv \top, f_2 \equiv \bot\}$ and $\{f_1 \equiv \bot, f_2 \equiv \top\}$ hold; however, given the above specification, PLTLMIN makes this distinction. For example, taking $f_i = \circleddash p_i$ , Figure 13 shows that FLTL leads to an MDP whose size is at most 3 times that of the NMRDP. In contrast, the relative size of the MDP produced by PLTLMIN is linear in

, the number of rewards and propositions. These results are obtained with all hand-coded domains except SPUDD-EXPON. Figure 14 shows the run-times as a function of

for COMPLETE. FLTL dominates and is only overtaken by PLTLSTR(A) for large values of

, when the MDP becomes too large for explicit exploration to be practical. To obtain the minimal equivalent MDP using PLTLMIN, a bloated reward specification of the form $\{(\circleddash \vee_{i=1}^{n} (p_i \wedge_{j=1,j\neq i}^{n} \neg p_j) : r), \ldots, (\circleddash \wedge_{i=1}^{n} p_i : n*r)\}$ is necessary, which, by virtue of its exponential length, is not an adequate solution.

Sylvie Thiebaux 2006-01-20