When solving NMRDPs in this setting, the central issue is to define a
non-Markovian reward specification language and a translation into an
MDP adapted to the class of MDP solution methods and
representations we would like to use for the type of problems at hand.
More precisely, there is a tradeoff between the effort spent on the
translation, e.g., in producing a small equivalent MDP without
many irrelevant history distinctions, and the effort required to solve
it. Appropriate resolution of this tradeoff depends on the type of
representations and solution methods envisioned for the MDP. For
instance, structured representations and solution methods that
have some ability to ignore irrelevant information may cope with a
crude translation, while state-based (flat) representations and
methods require a more sophisticated translation producing an MDP
as small as feasible.
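To fix intuitions about what such a translation produces, the following is a schematic sketch only, not the exact construction of either proposal discussed below: the translated MDP operates over extended states $e = \langle s, q \rangle$ pairing an NMRDP state $s$ with a label $q$ summarising the reward-relevant part of the history, so that both dynamics and reward become Markovian over the extended state space:
\[
\Pr(\langle s',q'\rangle \mid \langle s,q\rangle, a) \;=\;
\begin{cases}
\Pr(s' \mid s, a) & \text{if } q' = \delta(q,s'),\\
0 & \text{otherwise,}
\end{cases}
\qquad
\overline{R}(\langle s,q\rangle) \;=\; R(s_0 \cdots s_{n-1} s)
\]
for any feasible history $s_0 \cdots s_{n-1} s$ whose reward-relevant features are summarised by $q$, where $\delta$ is a (hypothetical) deterministic label-update function. The number of labels required governs the size of the translated MDP; keeping that number small, i.e., avoiding irrelevant history distinctions, is where the translation effort goes.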
Both previous proposals within this line of research rely on
past linear temporal logic (PLTL) formulae to specify the behaviours
to be rewarded [2,3]. A nice feature
of PLTL is that it yields a straightforward semantics of non-Markovian
rewards, and lends itself to a range of translations from the crudest
to the finest. The two proposals adopt very different translations
adapted to two very different types of solution methods and
representations. The first [2] targets classical
state-based solution methods such as policy iteration [32],
which generate complete policies at the cost of enumerating all
states of the MDP. Consequently, it adopts an expensive
translation that attempts to produce a minimal MDP. By
contrast, the second translation [3] is very
efficient but crude, and targets structured solution methods and
representations (see e.g., [29,12,21]), which do not
require explicit state enumeration.
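As an illustration of the kind of PLTL reward specification mentioned above (our own example, not one taken from [2,3]), rewarding the first achievement of a goal $g$ can be written
\[
g \wedge \neg \ominus \blacklozenge g ,
\]
read "$g$ holds now and did not hold at any strictly earlier time", where $\ominus$ denotes the previously (one step in the past) operator and $\blacklozenge$ the sometime-in-the-past operator. Attaching a reward to this formula makes the reward depend on the history only through a single bit, namely whether $g$ has already been achieved; this is exactly the kind of minimal history distinction an ideal translation would introduce, and a crude one would bury among many irrelevant ones.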