Temporal POCL Planning

In classical planning, actions have no duration: the effects of an action are instantaneous. Many realistic planning domains, however, require actions that can overlap in time and have different duration. The version of the planning domain definition language (PDDL), PDDL2.1, that was used for IPC3 introduces the notion of durative actions. A durative action represents an interval of time, and conditions and effects can be associated with either endpoint of this interval. Durative actions can also have invariant conditions that must hold for the entire duration of the action.

We use the constraint-based interval approach to temporal POCL planning described by Smith et al. (2000), which in essence is the same approach as used by earlier temporal POCL planners such as DEVISER (Vere, 1983), ZENO (Penberthy & Weld, 1994), and IXTET (Ghallab & Laruelle, 1994). Like IXTET, we use a simple temporal network (STN) to record temporal constraints. The STN representation allows for rapid response to temporal queries. ZENO, on the other hand, uses an integrated approach for handling both temporal and metric constraints, and does not make use of techniques optimized for temporal reasoning. The following is a description of how VHPOP handles the type of temporal planning domains expressible in PDDL2.1.

When planning with durative actions, we substitute the partial order $\mathcal {O}$ in the representation of a plan with an STN $\mathcal {T}$ . Each action a_i of a plan, except the dummy actions a₀ and a, is represented by two nodes t_2i-1 (start time) and t_2i (end time) in the STN $\mathcal {T}$ , and $\mathcal {T}$ can be compactly represented by the d-graph (Dechter et al., 1991). The d-graph is a complete directed graph, where each edge t_i $\rightarrow$ t_j is labeled by the shortest temporal distance, d_ij, between the two time nodes t_i and t_j (i.e. t_j - t_i $\leq$ d_ij). An additional time point, t₀, is used as a reference point to represent time zero. By default, d_ij = $\infty$ for all i $\neq$ j ( d_ii = 0), signifying that there is no upper bound on the difference t_j - t_i.

Constraints are added to $\mathcal {T}$ at the addition of a new action, the linking of an open condition, and the addition of an ordering constraint between endpoints of two actions.

The duration, $\delta_{i}^{}$ , of a durative action a_i is specified as a conjunction of simple duration constraints $\delta_{i}^{}$ $\bowtie$ c, where c is a real-valued constant and $\bowtie$ is in the set { = , $\leq$ , $\geq$ }.⁴ Each simple duration constraint gives rise to temporal constraints between the time nodes t_2i-1 and t_2i of $\mathcal {T}$ when adding a_i to a plan $\langle$ $\mathcal {A}$ , $\mathcal {L}$ , $\mathcal {T}$ , $\mathcal {B}$ $\rangle$ . The temporal constraints, in terms of the minimum distance d_ij between two time points, are as follows:

Duration Constraint	Temporal Constraints
$\delta_{{i}}^{}$ = c	d_{2i-1, 2i} = c and d_{2i, 2i-1} = - c
$\delta_{{i}}^{}$ $\leq$ c	d_{2i-1, 2i} $\leq$ c
$\delta_{{i}}^{}$ $\geq$ c	d_{2i, 2i-1} $\leq$ - c

The semantics of PDDL2.1 with durative actions dictates that every action be scheduled strictly after time zero. Let $\epsilon$ denote the smallest fraction of time required to separate two time points. To ensure that an added action a_i is scheduled after time zero, we add the temporal constraint d_{2i-1, 0} $\leq$ - $\epsilon$ in addition to any temporal constraints due to duration constraints. Figure 2(a) shows the matrix representation of the d-graph after adding an action, a₁, with duration constraint $\delta_{1}^{}$ $\leq$ 7 $\land$ $\delta_{1}^{}$ $\geq$ 3 to a null plan. The rows and columns of the matrix correspond to time point 0, the start of action a₁, and the end of action a₁ in that order. After adding action a₂ with duration constraint $\delta_{2}^{}$ = 4, we have the d-graph represented by the matrix in Figure 2(b). The two additional rows and columns correspond to the start and end of action a₂ in that order.

**Figure 2:** Matrix representation of d-graph, with $\epsilon$ = 1, for STN after (a) adding action a₁ with duration constraint $\delta_{1}^{}$ $\leq$ 7 $\land$ $\delta_{1}^{}$ $\geq$ 3, (b) adding action a₂ with duration constraint $\delta_{2}^{}$ = 4, and (c) ordering the end of a₂ before the end of a₁. Explicitly added temporal constraints are in boldface.
$\begin{figure}\centering\subfigure[]{$\left(\begin{array}{ccc} 0 & \infty & \in... ...nfty & 0 & 4 \\ -5 & \infty & \infty & -4 & 0 \end{array}\right)$} \end{figure}$

A temporal annotation $\tau$ $\in$ {s, i, e} is added to the representation of open conditions. The open condition $\;\stackrel{{q@\mathrm{s}}}{{\longrightarrow}}\;$ a_i represents a condition that must hold at the start of the durative action a_i, $\;\stackrel{{q@\mathrm{e}}}{{\longrightarrow}}\;$ a_i represents a condition that must hold at the end of a_i, while $\;\stackrel{{q@\mathrm{i}}}{{\longrightarrow}}\;$ a_i is an invariant condition for a_i. An equivalent annotation is added to the representation of causal links. The linking of an open condition $\;\stackrel{{q@\tau}}{{\longrightarrow}}\;$ a_i to an effect associated with a time point t_j gives rise to the temporal constraint d_kj $\leq$ - $\epsilon$ (k = 2i if $\tau$ = e, else k = 2i - 1). Figure 2(c) shows the representation of the STN for a plan with actions a₁ and a₂, as before, and with an effect associated with the end of a₂ linked to a condition associated with the end of a₁.

Unsafe causal links are resolved in basically the same way as before, but instead of adding ordering constraints between actions we add temporal constraints between time points ensuring that one time point precedes another time point. We can ensure that time point t_i precedes time point t_j by adding the temporal constraint d_ji $\leq$ - $\epsilon$ .

Every time we add a temporal constraint to a plan, we update all shortest paths d_ij that could have been affected by the added constraint. This propagation of constraints can be carried out in O(| $\mathcal {A}$ |²) time.

Once a plan without flaws is found, we need to schedule the actions in the plan, i.e. assign a start time and duration for each action. A schedule of the actions is a solution to the STN $\mathcal {T}$ , and a solution assigning the earliest possible start time to each action is readily available in the d-graph representation. The start time of action a_i is set to - d_{2i-1, 0} (Corollary 3.2, Dechter et al., 1991) and the duration to d_{2i-1, 0} - d_{2i, 0}. Assuming Figure 2(c) represents the STN for a complete plan, then we would schedule a₁ at time 1 with duration 5 and a₂ at time 1 with duration 4. We can easily verify that this schedule is indeed consistent with the duration constraints given for the actions, and that a₂ ends before a₁ as required.

Each non-durative action can be treated as a durative action of fixed duration 0, with preconditions associated with the start time, effects associated with the end time, and without any invariant conditions. This allows for a frictionless treatment of domains with both durative and non-durative actions.

Let us for a moment consider the memory requirements for temporal POCL planning compared to classical POCL planning. When planning with non-durative actions, we store $\mathcal {O}$ as a bit-matrix representing the transitive closure of the ordering constraints in $\mathcal {O}$ . For a partial plan with n actions, this requires n² bits. With n durative actions, on the other hand, we need roughly 4n² floating-point numbers to represent the d-graph of $\mathcal {T}$ . Each floating-point number requires at least 32 bits on a modern machine, so in total we need more than 100 times as many bits to represent temporal constraints as regular ordering constraints for each plan. We note, however, that each refinement changes only a few entries in the d-graph, and by choosing a clever representation of matrices we can share storage between plans. The upper left 3×3 sub-matrix in Figure 2(b) is for example identical to the matrix in Figure 2(a). The way we store matrices in VHPOP allows us to exploit this commonality and thereby reduce the total memory requirements.

The addition of durative actions does not change the basic POCL algorithm. The recording of temporal constraints and temporal annotations can be handled in a manner transparent to the rest of the planner. The search heuristics described in Section 3, although not tuned specifically for temporal planning, can be used with durative actions. We only need to slightly modify the definition of literal and action cost in the additive heuristic because of the temporal annotations associated with preconditions and effects of durative actions. Let $\mathcal {GA}$ _s(q) denote the set of ground actions achieving q at the start, and $\mathcal {GA}$ _e(q) the set of ground actions achieving q at the end. We define the cost of the literal q as

h_add(q) = $\displaystyle \left\{\vphantom{\begin{array}{ll} 0 & \mbox{if $q$\ holds initia... ...cal{GA}_t(q) \neq \emptyset$}\\ \infty & \mbox{otherwise} \end{array}}\right.$ $\displaystyle \begin{array}{ll} 0 & \mbox{if $q$\ holds initially}\\ \min_{a ... ...f $\mathcal{GA}_t(q) \neq \emptyset$}\\ \infty & \mbox{otherwise} \end{array}$ ,