VHPOP allows for several flaw selection strategies to be used simultaneously in a round-robin scheme. This lets us exploit the strengths of different flaw selection strategies concurrently, which was essential for the success of VHPOP at IPC3 since we have yet to find a single superior flaw selection strategy that dominates all other flaw selection strategies in terms of the number of solved problems within a given time frame. The technique we use in VHPOP for supporting multiple flaw selection strategies is in essence the same as the technique proposed by Howe et al. (1999) for exploiting performance benefits of several planners at once in a meta-planner. Although the meta-planner is slower than the fastest planner on any single problem, it can solve more problems than any single planner.
We used four different flaw selection strategies at IPC3 (Table 6), preferring local flaw selection strategies as they tend to incur a lower overhead than global strategies such as LCFR and MW and often appear more effective than global strategies because of a maintained focused on subgoal achievement. The four strategies were selected after some initial experimentation with problems from a few of the competition domains.
The use of multiple flaw selection strategies can be thought of as running multiple concurrent instances of the planner, as a separate search queue is maintained for each flaw selection strategy that is used. Similar to HSP2.0 at the planning competition in 2000 (Bonet & Geffner, 2001a), we use a fixed control strategy to schedule these multiple instances of our planner. The first time a flaw selection strategy is used, it is allowed to generate up to 1000 search nodes. The second time the same flaw selection strategy is used, it can generate another 1000 search nodes, making it a total of 2000 search nodes. At each subsequent round i, each flaw selection strategy is permitted to generate up to 1000 . 2i-2 additional nodes. The maximum number of nodes generated using a specific flaw selection strategy is 1000 . 2i-1 after i rounds. An optional upper limit on the number of generated search nodes can be set for each flaw selection strategy. This is useful for flaw selection strategies that typically solve problems quickly, when they solve them at all within reasonable time. Table 7 shows the search limits used by VHPOP at IPC3. These limits were determined after some initial trials on the competition problems. Note that there was no set search limit for the last flaw selection strategy. Whenever the other three strategies all reached their search limits without having found a solution, LCFR-Loc-Conf was used until physical resource limits were reached.
|
Table 8 shows the number of plans generated in the STRIPS Satellite domain before a solution is found for the four flaw selection strategies used at IPC3, and also the number of generated plans when combining the four strategies using the schedule in Table 7. To better understand how the round-robin scheduling works, we take a closer look at the numbers for problem 15. Table 9 shows how the total number of generated plans is divided between rounds and flaw selection strategies. Note that although MW-Loc is actually the best strategy for this problem, it is stopped already in round 5. The total number of generated plans does not exactly match the actual number of generated plans reported in Table 8. This is because we only consider suspending the use of a flaw selection strategy after all refinements of the last selected plan have been added, so the limit in a round can be exceeded slightly in practice. The numbers in Table 9 represent an idealized situation where flaw selection strategies are switched when the number of generated plans exactly matches the limit for the current round.
|
|
VHPOP solved 122 problems out of 224 attempted at IPC3. The quality of the plans, in terms of number of steps, generated by VHPOP was generally very high. For plain STRIPS domains, VHPOP's plans were typically within 10 percent of the best plans found by any planner in the competition, with 28 of VHPOP's 68 STRIPS plans being at least as short as the best plans found and, being a POCL planner, VHPOP automatically exploits parallelism in planning domains, generating plans for STRIPS domains with low total plan execution time (Table 10). Table 11 shows that VHPOP also performed well in terms of number of solved problems in four of the six STRIPS domains, being competitive with top performers such as MIPS and LPG (particularly in the Rovers domain).
In domains with durative actions5, total execution time was given as an explicit plan metric, and the objective was to minimize this metric. The specification of an explicit plan metric is a feature of PDDL2.1 not present in earlier versions of PDDL. As VHPOP currently ignores this objective function and always tries to find plans with few steps, it should come as no surprise that the quality of VHPOP's plans for domains with durative actions was significantly worse than the quality of the best plans found (Table 12).6 VHPOP still produced plans with few steps, however, with over 60 percent of VHPOP's plans for domains with durative actions having the fewest steps. The plan selection heuristic that VHPOP uses is tuned for finding plans with few steps, and it would need to be modified in order to find plans with shorter total execution time. Table 13 shows that LPG solved by far the most problems in domains with durative actions, but that VHPOP was competitive with MIPS and clearly outperformed TP4 and TPSYS.
|
|
|
|
While VHPOP was a top performer at IPC3 in terms of plan quality, it was far from the top in terms of planning time. VHPOP was typically orders of magnitude slower than the fastest planner. The high planning times for VHPOP can in part be attributed to implementation details. Improvements to the code (e.g. using pointer comparison instead of string comparison whenever possible) since the planning competition has resulted in 10 to 20 percent lower planning times when using ground actions and when using lifted actions the planner is more than twice as fast as before. The reachability analysis is still a bottleneck, however, and further improvements could definitely be made there. It is important to remember, though, that we basically run four planners at once by using four flaw selection strategies concurrently. Table 14 shows the average relative performance of VHPOP at IPC3 compared to the performance of VHPOP using only the best flaw selection strategy for each problem. VHPOP with the best flaw selection strategy is on average two to three times faster than VHPOP with four concurrent strategies. Using several flaw selection strategies simultaneously helps us solve more problems, but the price is reduced speed. By more intelligently scheduling the different flaw selection strategies depending on domain and problem features, and not just using a fixed schedule for all problems, we could potentially increase planner efficiency significantly.
|
Håkan L. S. Younes