When solving NMRDPs in this setting, the central issue is to define a
non-Markovian reward specification language and a translation into an
MDP adapted to the class of MDP solution methods and
representations we would like to use for the type of problems at hand.
More precisely, there is a tradeoff between the effort spent on the
translation, e.g., in producing a small equivalent MDP without
many irrelevant history distinctions, and the effort required to solve
it. Appropriate resolution of this tradeoff depends on the type of
representations and solution methods envisioned for the MDP. For
instance, structured representations and solution methods that
have some ability to ignore irrelevant information may cope with a
crude translation, while state-based (flat) representations and
methods require a more sophisticated translation producing an MDP
as small as feasible.
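To fix intuitions about what such a translation produces, the following is a schematic sketch only, not the exact construction of either proposal discussed below: the translated MDP operates over extended states $e = \langle s, q \rangle$ pairing an NMRDP state $s$ with a label $q$ summarising the reward-relevant part of the history, so that both dynamics and reward become Markovian over the extended state space:
\[
\Pr(\langle s',q'\rangle \mid \langle s,q\rangle, a) \;=\;
\begin{cases}
\Pr(s' \mid s, a) & \text{if } q' = \delta(q,s'),\\
0 & \text{otherwise,}
\end{cases}
\qquad
\overline{R}(\langle s,q\rangle) \;=\; R(s_0 \cdots s_{n-1} s)
\]
for any feasible history $s_0 \cdots s_{n-1} s$ whose reward-relevant features are summarised by $q$, where $\delta$ is a (hypothetical) deterministic label-update function. The number of labels required governs the size of the translated MDP; keeping that number small, i.e., avoiding irrelevant history distinctions, is where the translation effort goes.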
Both previous proposals within this line of research rely on
past linear temporal logic (PLTL) formulae to specify the behaviours
to be rewarded [2,3]. A nice feature
of PLTL is that it yields a straightforward semantics of non-Markovian
rewards, and lends itself to a range of translations from the crudest
to the finest. The two proposals adopt very different translations
adapted to two very different types of solution methods and
representations. The first [2] targets classical
state-based solution methods such as policy iteration [32],
which generate complete policies at the cost of enumerating all
states of the MDP. Consequently, it adopts an expensive
translation that attempts to produce a minimal MDP. By
contrast, the second translation [3] is very
efficient but crude, and targets structured solution methods and
representations (see e.g., [29,12,21]), which do not
require explicit state enumeration.
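As an illustration of the kind of PLTL reward specification mentioned above (our own example, not one taken from [2,3]), rewarding the first achievement of a goal $g$ can be written
\[
g \wedge \neg \ominus \blacklozenge g ,
\]
read "$g$ holds now and did not hold at any strictly earlier time", where $\ominus$ denotes the previously (one step in the past) operator and $\blacklozenge$ the sometime-in-the-past operator. Attaching a reward to this formula makes the reward depend on the history only through a single bit, namely whether $g$ has already been achieved; this is exactly the kind of minimal history distinction an ideal translation would introduce, and a crude one would bury among many irrelevant ones.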