We have shown how to construct an MDP from the PPDDL encoding of a planning problem. The plan objective is to maximize the expected reward for the MDP. This objective can be interpreted in different ways, for example as expected discounted reward or expected total reward. The appropriate interpretation depends on the problem. For process-oriented planning problems (for example, the “Coffee Delivery” problem), discounted reward is typically desirable, while total reward is often the interpretation chosen for goal-oriented problems (for example, the “Bomb and Toilet” problem). PPDDL does not include any facility for enforcing a given interpretation or for specifying a discount factor.
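To make the two interpretations concrete, the following minimal sketch computes both quantities for a single sampled reward sequence. It is an illustration only and not part of PPDDL: the function names, the reward sequences, and the discount factor 0.9 are all hypothetical, and an expected reward would additionally average such returns over the trajectories induced by a policy.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum over t of gamma**t * rewards[t] for one finite trajectory."""
    value = 0.0
    # Accumulate from the last step backwards: v = r_t + gamma * v.
    for reward in reversed(rewards):
        value = reward + gamma * value
    return value


def total_return(rewards: Sequence[float]) -> float:
    """Undiscounted total reward, i.e. the special case gamma = 1."""
    return discounted_return(rewards, 1.0)


# Hypothetical process-oriented trajectory: reward 1 at every step.
ongoing = [1.0] * 1000
# Hypothetical goal-oriented trajectory: step costs, then a goal reward.
goal_directed = [-1.0, -1.0, -1.0, 100.0]

print(total_return(ongoing))            # grows with the trajectory length
print(discounted_return(ongoing, 0.9))  # approaches 1 / (1 - 0.9)
print(total_return(goal_directed))      # finite once the trajectory ends
```

With a constant per-step reward, the undiscounted sum grows with the length of the trajectory, whereas the discounted sum stays below 1 / (1 - gamma). For a trajectory that terminates, the total reward is finite without any discounting.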
For the competition, we used expected total reward as the optimality criterion. Without discounting, some care is required in the design of planning problems to ensure that the expected total reward is bounded for the optimal policy. The following restrictions were made for problems used in the planning competition:
Håkan L. S. Younes