- ... that:1
- Technically, the definition allows the set of actions of the NMRDP and that of the equivalent MDP to be different, but any action on which they differ must be inapplicable in the reachable states of the NMRDP and in all e-states of the equivalent MDP. For all practical purposes, the two action sets can be seen as identical.
- ... in2
- The size of a reward formula is measured as its length, and the size of a set of reward formulae is measured as the sum of the lengths of the formulae in the set.
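As a hedged illustration only (the tuple-based formula encoding below is my assumption, not the paper's representation), this size measure can be computed recursively:

    # Hypothetical encoding: a formula is an atomic proposition (a string,
    # possibly '$') or a tuple (connective, subformula, ...).
    def formula_size(f):
        # Length of a formula: one unit per atom or connective.
        if isinstance(f, str):
            return 1
        connective, *subformulas = f
        return 1 + sum(formula_size(s) for s in subformulas)

    def specification_size(formulas):
        # Size of a set of reward formulae: sum of the individual lengths.
        return sum(formula_size(f) for f in formulas)

    # Example: formula_size(('and', 'p', ('until', 'q', '$'))) == 5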
- ...3
- If there is no behaviour satisfying the formula, which is the case for any $-free formula that is not a logical theorem, then the associated behaviour is the intersection of an empty family of sets, i.e. the universal set, following normal set-theoretic conventions. This limiting case does no harm, since $-free formulae do not describe the attribution of rewards.
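For concreteness, the set-theoretic convention appealed to here is the standard one for empty intersections: within a universe U, the intersection over an empty family of subsets is U itself, so the limiting case places no constraint at all. (This restatement is mine; the specific sets involved are elided in the footnote above.)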
- ...4
- It is an open question whether the set of representable behaviours is the same for $FLTL as for PLTL, that is, the star-free regular languages. Even if the behaviours were the same, there is little hope that a practical translation from one to the other exists.
- ...5
- These difficulties are inherent in the use of linear-time formalisms in contexts where the principle of directionality must be enforced. They are shared, for instance, by formalisms developed for reasoning about actions, such as the Event Calculus and LTL action theories; see e.g. [13].
- ... set6
- Strictly speaking, a multiset, but for convenience we
represent it as a set, with the rewards for multiple occurrences of
the same formula in the multiset summed.
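A minimal sketch of this convention, under the assumption (mine, not the paper's) that the multiset is given as a list of (formula, reward) pairs:

    from collections import defaultdict

    def collapse_multiset(pairs):
        # Merge multiple occurrences of the same formula by summing their
        # rewards, yielding a set-like mapping from formula to total reward.
        total = defaultdict(float)
        for formula, reward in pairs:
            total[formula] += reward
        return dict(total)

    # Example: collapse_multiset([('p', 1.0), ('p', 2.0), ('q', 5.0)])
    #          -> {'p': 3.0, 'q': 5.0}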
- ... reward-normal7
- We extend the definition of reward-normality to reward specification functions in the obvious way, by requiring that all reward formulae involved be reward-normal.
- ...8
- Care is needed over the notion of `semantic equivalence'. Because rewards are additive, determining equivalence may involve arithmetic as well as theorem proving. For example, one reward function specification can be equivalent to another even though there is no one-one correspondence between the formulae in the two sets.
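A hypothetical illustration (my own example, not the one elided above): the specification rewarding p ∧ q with 1 and p with 1 is equivalent to the specification rewarding p ∧ q with 2 and p ∧ ¬q with 1, since both award 2 when p ∧ q holds, 1 when p ∧ ¬q holds, and 0 otherwise; checking this requires summing rewards across formulae rather than matching formulae one by one.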
- ... methods.9
- It would be interesting, on the
other hand, to use PLTLSTR in conjunction with symbolic versions
of such methods, e.g. Symbolic LAO* [21] or Symbolic
RTDP [22].
- ... previously:10
- Here is an
executive summary of the answers for the executive reader. 1. no,
2. yes, 3. yes, 4. PLTLSTR and FLTL, 5. PLTLSIM, 6. yes,
7. yes and no, respectively, 8. yes, 9. no and yes, respectively.
- ... properties11
- in $FLTL
- ... side.12
- The figures are not necessarily valid for NMRDPs that are not completely connected. Unfortunately, even for completely connected domains, there does not appear to be a much cheaper way of determining the MDP size than generating it and counting its states.
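A minimal sketch of the brute-force count alluded to here, assuming a hypothetical successors(e_state) function that enumerates the e-states reachable in one step:

    from collections import deque

    def count_expanded_mdp_states(initial_e_state, successors):
        # Generate the expanded MDP breadth-first from the initial e-state
        # and count the distinct e-states encountered.
        seen = {initial_e_state}
        frontier = deque([initial_e_state])
        while frontier:
            e_state = frontier.popleft()
            for nxt in successors(e_state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return len(seen)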
- ... unreachable:13
- Here we sometimes speak of conditions and goals
being `reachable' or `achievable' rather than `feasible', although
they may be temporally extended. This is to keep in line with
conventional vocabulary as in the phrase `reachability analysis'.
- ... served.14
- We have experimented with stochastic variants of Miconic where passengers have some small probability of disembarking at the wrong floor. However, we find it more useful to present results for the deterministic version, since it is closer to the deterministic Miconic planning benchmark and since, as we have shown before, rewards have a far more crucial impact than dynamics on the relative performance of the methods.
- ... supervisor15
- To understand the $FLTL formula, observe that we get a reward iff the first condition holds until the second becomes true, and recall that such a formula rewards the holding of one condition until the occurrence of the other.
- ... time,16
- On each given problem, planners had 15 minutes to run whatever computation they saw as appropriate (including parsing, pre-processing, and policy generation if any), and to execute 30 trial runs of the generated policy from an initial state to a goal state.
- ... strategy.17
- More sophisticated near-optimal strategies for the deterministic blocks world exist (see [42]), but they are much more complex to encode and might have caused time performance problems.
- ... probabilities.18
- We would
not argue, on the other hand, that CTL is necessary for
representing non-Markovian rewards.
- ... goal19
- Where f is an atemporal formula, GOALP(f) is true iff f holds in all goal states.
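A one-line sketch of this predicate, assuming a hypothetical holds(formula, state) test for atemporal formulae:

    def goalp(formula, goal_states, holds):
        # GOALP(f) is true iff the atemporal formula f holds in every goal state.
        return all(holds(formula, s) for s in goal_states)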
- ... conducted20
- None of those are included in this paper,
however.