VHPOP at IPC3

VHPOP allows for several flaw selection strategies to be used simultaneously in a round-robin scheme. This lets us exploit the strengths of different flaw selection strategies concurrently, which was essential for the success of VHPOP at IPC3 since we have yet to find a single superior flaw selection strategy that dominates all other flaw selection strategies in terms of the number of solved problems within a given time frame. The technique we use in VHPOP for supporting multiple flaw selection strategies is in essence the same as the technique proposed by Howe et al. (1999) for exploiting performance benefits of several planners at once in a meta-planner. Although the meta-planner is slower than the fastest planner on any single problem, it can solve more problems than any single planner.

We used four different flaw selection strategies at IPC3 (Table 6), preferring local flaw selection strategies as they tend to incur a lower overhead than global strategies such as LCFR and MW and often appear more effective than global strategies because of a maintained focused on subgoal achievement. The four strategies were selected after some initial experimentation with problems from a few of the competition domains.

The use of multiple flaw selection strategies can be thought of as running multiple concurrent instances of the planner, as a separate search queue is maintained for each flaw selection strategy that is used. Similar to HSP2.0 at the planning competition in 2000 (Bonet & Geffner, 2001a), we use a fixed control strategy to schedule these multiple instances of our planner. The first time a flaw selection strategy is used, it is allowed to generate up to 1000 search nodes. The second time the same flaw selection strategy is used, it can generate another 1000 search nodes, making it a total of 2000 search nodes. At each subsequent round i, each flaw selection strategy is permitted to generate up to 1000^.2^i-2 additional nodes. The maximum number of nodes generated using a specific flaw selection strategy is 1000^.2^i-1 after i rounds. An optional upper limit on the number of generated search nodes can be set for each flaw selection strategy. This is useful for flaw selection strategies that typically solve problems quickly, when they solve them at all within reasonable time. Table 7 shows the search limits used by VHPOP at IPC3. These limits were determined after some initial trials on the competition problems. Note that there was no set search limit for the last flaw selection strategy. Whenever the other three strategies all reached their search limits without having found a solution, LCFR-Loc-Conf was used until physical resource limits were reached.

Table 7: Execution order of flaw selection strategies used at IPC3, and also search limits used with each strategy on domains with and without durative actions.

Name	Order	STRIPS Limit	Durative Limit
MW-Loc	1	10000	12000
MW-Loc-Conf	2	100000	100000
LCFR-Loc	3	200000	240000
LCFR-Loc-Conf	4	$\infty$	$\infty$

Table 8 shows the number of plans generated in the STRIPS Satellite domain before a solution is found for the four flaw selection strategies used at IPC3, and also the number of generated plans when combining the four strategies using the schedule in Table 7. To better understand how the round-robin scheduling works, we take a closer look at the numbers for problem 15. Table 9 shows how the total number of generated plans is divided between rounds and flaw selection strategies. Note that although MW-Loc is actually the best strategy for this problem, it is stopped already in round 5. The total number of generated plans does not exactly match the actual number of generated plans reported in Table 8. This is because we only consider suspending the use of a flaw selection strategy after all refinements of the last selected plan have been added, so the limit in a round can be exceeded slightly in practice. The numbers in Table 9 represent an idealized situation where flaw selection strategies are switched when the number of generated plans exactly matches the limit for the current round.

Table: Number of generated plans in the STRIPS Satellite domain for the four different flaw selection strategies used by VHPOP at IPC3. The rightmost column is the number of plans generated by VHPOP before finding a solution when using the schedule in Table 7. A dagger ( $\dagger$ ) means that the planner ran out of memory (800Mb) after generating at least the indicated number of plans.

Problem	MW-Loc	MW-Loc-Conf	LCFR-Loc	LCFR-Loc-Conf	All
1	118	118	118	118	118
2	229	229	249	249	229
3	172	172	172	172	172
4	738	843	822	1797	738
5	448	723000 $\dagger$	1018	706000 $\dagger$	448
6	636000 $\dagger$	629000 $\dagger$	720	834	2727
7	571	745	620	688000 $\dagger$	571
8	482000 $\dagger$	874	1017	783	1874
10	1245	1178	1323	1275	4283
11	1172	1172	1172	1172	4172
12	3517	3733	525000 $\dagger$	525000 $\dagger$	9542
13	6241	382000 $\dagger$	559000 $\dagger$	544000 $\dagger$	18265
14	2352	2352	2157	2157	8365
15	74738	444000 $\dagger$	107375	465000 $\dagger$	281387
16	533000 $\dagger$	529000 $\dagger$	3442	3571	13471
17	2975	2975	3438	3438	8981
18	1584	1584	1724	1724	4588

Table 9: A closer look at the round-robin scheduling for problem 15 in the STRIPS Satellite domain. Italic entries indicate that the search limit for a flaw selection strategy was reached in the round.

Round	MW-Loc	MW-Loc-Conf	LCFR-Loc	LCFR-Loc-Conf	Total
1	1000	1000	1000	1000	4000
2	1000	1000	1000	1000	4000
3	2000	2000	2000	2000	8000
4	4000	4000	4000	4000	16000
5	2000	8000	8000	8000	26000
6	-	16000	16000	16000	48000
7	-	32000	32000	32000	96000
8	-	36000	43375		79375
Total	10000	100000	107375	64000	281375

VHPOP solved 122 problems out of 224 attempted at IPC3. The quality of the plans, in terms of number of steps, generated by VHPOP was generally very high. For plain STRIPS domains, VHPOP's plans were typically within 10 percent of the best plans found by any planner in the competition, with 28 of VHPOP's 68 STRIPS plans being at least as short as the best plans found and, being a POCL planner, VHPOP automatically exploits parallelism in planning domains, generating plans for STRIPS domains with low total plan execution time (Table 10). Table 11 shows that VHPOP also performed well in terms of number of solved problems in four of the six STRIPS domains, being competitive with top performers such as MIPS and LPG (particularly in the Rovers domain).

In domains with durative actions⁵, total execution time was given as an explicit plan metric, and the objective was to minimize this metric. The specification of an explicit plan metric is a feature of PDDL2.1 not present in earlier versions of PDDL. As VHPOP currently ignores this objective function and always tries to find plans with few steps, it should come as no surprise that the quality of VHPOP's plans for domains with durative actions was significantly worse than the quality of the best plans found (Table 12).⁶ VHPOP still produced plans with few steps, however, with over 60 percent of VHPOP's plans for domains with durative actions having the fewest steps. The plan selection heuristic that VHPOP uses is tuned for finding plans with few steps, and it would need to be modified in order to find plans with shorter total execution time. Table 13 shows that LPG solved by far the most problems in domains with durative actions, but that VHPOP was competitive with MIPS and clearly outperformed TP4 and TPSYS.

Table 10: Relative plan quality for the STRIPS domains where VHPOP solved more than half of the problems. There are two plan quality metrics. Number of steps is simply the total number of steps in a plan, while execution time is the total time required to execute a plan (counting parallel actions as one time step). The table shows the average ratio of VHPOP's plan quality and the quality of the best plan generated by any planner, and the number of problems in each domain where VHPOP found the best plan is also shown.

Domain	# Solved	# Steps	# Best	Execution Time	# Best
DriverLog	14	1.09	5	1.15	4
ZenoTravel	13	1.04	7	1.20	5
Satellite	17	1.07	7	1.25	5
Rovers	20	1.08	7	1.08	13

Table 11: Number of problems solved by top performing fully automated planners in STRIPS domains.

Planner	Depots	DriverLog	ZenoTravel	Satellite	Rovers	FreeCell	Total
FF	22	15	20	20	20	20	117
LPG	21	18	20	20	12	18	109
MIPS	10	15	16	14	12	19	86
SIMPLANNER	22	11	20	17	9	12	91
STELLA	4	10	18	14	4	0	50
VHPOP	3	14	13	17	20	1	68

Table 12: Same information as in Table 10, but for domains with durative actions.

Domain	# Solved	# Steps	# Best	Execution Time
DriverLog	14	1.04	8	1.50
ZenoTravel	13	1.04	10	1.54
Satellite	17	1.04	9	2.51
Rovers	7	1.04	5	1.39

Table 13: Number of problems solved by fully automated planners in domains with durative actions.

Planner	Depots	DriverLog	ZenoTravel	Satellite	Rovers	Total
LPG	20	20	20	20	12	92
MIPS	11	15	14	9	9	58
TP4	1	2	5	3	4	15
TPSYS	0	2	2	2	4	10
VHPOP	3	14	13	17	7	54

While VHPOP was a top performer at IPC3 in terms of plan quality, it was far from the top in terms of planning time. VHPOP was typically orders of magnitude slower than the fastest planner. The high planning times for VHPOP can in part be attributed to implementation details. Improvements to the code (e.g. using pointer comparison instead of string comparison whenever possible) since the planning competition has resulted in 10 to 20 percent lower planning times when using ground actions and when using lifted actions the planner is more than twice as fast as before. The reachability analysis is still a bottleneck, however, and further improvements could definitely be made there. It is important to remember, though, that we basically run four planners at once by using four flaw selection strategies concurrently. Table 14 shows the average relative performance of VHPOP at IPC3 compared to the performance of VHPOP using only the best flaw selection strategy for each problem. VHPOP with the best flaw selection strategy is on average two to three times faster than VHPOP with four concurrent strategies. Using several flaw selection strategies simultaneously helps us solve more problems, but the price is reduced speed. By more intelligently scheduling the different flaw selection strategies depending on domain and problem features, and not just using a fixed schedule for all problems, we could potentially increase planner efficiency significantly.

Table 14: Each number in the table represents the average ratio of the planning time for VHPOP using all four flaw selection strategies concurrently and the planning time for VHPOP with only the fastest flaw selection strategy.

Domain	STRIPS	Durative
DriverLog	2.52	2.66
ZenoTravel	2.76	2.86
Satellite	1.78	2.01
Rovers	2.32	3.37