Accounting for Positive Interaction

The additive heuristic does not take reuse of actions (other than the dummy action a₀) into account, so it often overestimates the actual number of actions needed to complete a plan. The need to take positive interaction into account in order to obtain a more accurate heuristic estimate has been recognized in both state space planning (Nguyen & Kambhampati, 2000; Hoffmann & Nebel, 2001; Refanidis & Vlahavas, 2001) and plan space planning (Nguyen & Kambhampati, 2001). For IPC3 we used a slight modification of the additive heuristic to address the issue of action reuse:

h_add^r( $\displaystyle \pi$ ) = $\displaystyle \sum_{{\stackrel{q}{\longrightarrow}\!a_i\in\mathcal{OC}(\pi)}}^{}$ $\displaystyle \left\{\vphantom{\begin{array}{ll} 0 & \mbox{if $\exists a_j\in\m... ...in \mathcal{O}$}\\ h_{\mathrm{add}}(q) & \mbox{otherwise} \end{array}}\right.$ $\displaystyle \begin{array}{ll} 0 & \mbox{if $\exists a_j\in\mathcal{A}$\ s.t.\... ..._j \not\in \mathcal{O}$}\\ h_{\mathrm{add}}(q) & \mbox{otherwise} \end{array}$

To illustrate the difference between h_add( $\pi$ ) and h_add^r( $\pi$ ) consider a planning domain with two action schemas A₁ and A₂, where A₁ has no preconditions and A₂ has a single precondition q. Assume that q can only be achieved through an action instance of A₁. The heuristic cost for the literal q is therefore 1 according to the additive heuristic. Consider now a plan $\pi$ with two unordered actions a₁ and a₂ (a_i being an instance of action schema A_i) and a single open condition $\;\stackrel{{q}}{{\longrightarrow}}\;$ a₂. We have h_add( $\pi$ ) = h_add(q) = 1 corresponding to the addition of a new instance of action schema A₁ to achieve q, but h_add^r( $\pi$ ) = 0 because there is an action (viz. a₁) that is not ordered after a₂ and has an effect that unifies with q. Table 1 shows that taking reuse into account can have a significant impact on planning time in practice. The modified additive heuristic h_add^r clearly dominates h_add in the DriverLog and ZenoTravel domains despite incurring a higher overhead per generated plan. The results in the Satellite domain are more mixed, with h_add having a slight edge overall. We show planning times for the four flaw selection strategies that were used by VHPOP at IPC3. These and other novel flaw selection strategies are discussed in detail in Section 3.2.2.

Table: Planning times in seconds using different flaw selection strategies for a selection of problems in the DriverLog, ZenoTravel, and Satellite domains, showing the impact of taking reuse into account in the plan ranking heuristic. A dash (-) means that the planner ran out of memory (512Mb).

Problem	MW-Loc		MW-Loc-Conf		LCFR-Loc		LCFR-Loc-Conf
	h_add	h_add^r	h_add	h_add^r	h_add	h_add^r	h_add	h_add^r
DriverLog6	8.65	0.16	4.41	0.13	87.58	2.01	-	1.16
DriverLog7	3.66	0.34	0.63	0.17	21.15	1.28	1.57	0.22
DriverLog8	-	-	110.26	1.48	-	177.27	-	2.05
DriverLog9	-	0.33	-	0.28	-	-	-	-
DriverLog10	4.13	2.11	0.71	0.76	3.79	0.64	1.30	0.83
ZenoTravel6	-	0.93	17.41	2.90	25.09	0.95	11.24	2.82
ZenoTravel7	-	-	-	37.81	-	-	-	33.10
ZenoTravel8	-	15.48	-	37.99	-	-	-	6.45
ZenoTravel9	-	86.21	-	11.53	-	33.37	26.33	9.49
ZenoTravel10	-	26.59	-	21.22	-	21.20	-	18.22
Satellite6	0.36	0.22	0.37	0.24	0.32	0.21	0.40	0.24
Satellite7	0.49	0.37	0.54	0.84	0.55	0.51	0.62	-
Satellite8	1.09	-	1.29	0.84	0.85	0.83	1.25	0.68
Satellite9	2.41	-	2.11	-	1.84	-	2.50	-
Satellite10	1.53	1.12	1.95	1.11	1.50	1.36	2.08	1.37

Hoffmann and Nebel (2001) describe the FF heuristic that takes positive interaction between actions into account by extracting a plan from the relaxed planning graph¹, and argue that the accounting of action reuse is one of the contributing factors to FF's performance advantage over HSP. The FF heuristic can take reuse of potential actions into account, and not just existing actions as is the case with our modified additive heuristic. This should result in a better estimate of actual plan cost, but requires that a plan is extracted from the relaxed planning graph for every search node, which could be costly. It would be interesting to see how the FF heuristic performs if used in a plan space planner.

The heuristic cost function used in REPOP (Nguyen & Kambhampati, 2001), a heuristic partial order planner working solely with ground actions, is defined using a serial planning graph.² The heuristic is similar in spirit to the FF heuristic, and can like the FF heuristic take reuse of potential actions into account. The REPOP heuristic also takes into account reuse of existing actions, but seemingly without considering ordering constraints, which is something we do in our modified additive heuristic. Furthermore, our h_add^r heuristic always takes reuse of any existing actions that achieves a literal q into account, while the REPOP heuristic only considers an existing action if it happens to be selected from the serial planning graph as the the action that achieves q. The results in Table 2 indicate that the REPOP heuristic may be less effective than the additive heuristic (with and without reuse) in certain domains.