In this study, we compared the performance of multiple versions of four
planners (labeled in this section W, X, Y and Z, with larger
version numbers indicating later versions). We considered two
criteria for improvement: outcome of planning and computation time for
solved problems. The outcome of planning is one of: solved, failed or
timed-out. On each criterion, we statistically analyzed the data for
superior performance of one of the versions. The outcome results for
all the planners are summarized in Table 7. As
the table shows, a new version rarely solves more problems than its
predecessor. Only Z increased the number of our test
problems solved in subsequent versions.
Table 7: Version performance: counts of outcome and change in number
solved.

Planner  Version  Solved  Failed  Timeout  Change in Solved
W        1           286     664      533
W        2           255    1082      147               -31
X        1           502     973        3
X        2           441     940      103               -61
Y        1           387     750      339
Y        2           382     771      329                -5
Z        1           240    1043      201
Z        2           276     959      248               +36
Z        3           268     963      252                -8
Z        4           421     878      184              +153
To check whether the differences in outcome are significant, we ran 2x3
χ² tests with planner version as the independent variable and outcome as
the dependent variable. Table 8 summarizes the results of the
analysis. For Z, we compared each version to its successor only. The
differences are significant except for Y and the transition from Z 2 to Z 3 (the
latter was expected because these two versions were extremely similar).
Table 8: χ² results comparing versions of the same planner.

Planner  Old Version  New Version      χ²      P
W        1            2            320.96  .0001
X        1            2             98.84  .0001
Y        1            2               .46    .79
Z        1            2             10.96   .004
Z        2            3              .158   .924
Z        3            4             48.50  .0001
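
The test statistics in Table 8 can be recomputed from the outcome
counts in Table 7. The following is a minimal sketch, assuming the 2x3
tests are Pearson chi-square tests of independence with two degrees of
freedom; under that assumption the W counts give a statistic that
rounds to the reported 320.96. The function name chi_square_2x3 is
ours and is used only for illustration.

```python
import math


def chi_square_2x3(old_counts, new_counts):
    """Pearson chi-square test of independence for a 2x3 table.

    Each argument is (solved, failed, timed_out) for one planner version.
    Returns (statistic, p_value).
    """
    table = [list(old_counts), list(new_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if outcome were independent of version.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    # A 2x3 table has (2-1)*(3-1) = 2 degrees of freedom, and the
    # chi-square survival function with 2 d.f. is exactly exp(-x/2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Counts for W versions 1 and 2, taken from Table 7.
stat, p = chi_square_2x3((286, 664, 533), (255, 1082, 147))
print(f"W 1 -> 2: chi-square = {stat:.2f}, P = {p:.3g}")
```

The closed-form P value holds only for two degrees of freedom; a table
with a different shape would need a general chi-square distribution
function.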
The second performance metric we evaluated was the speed
of solution. For this analysis, we limited the comparison to just those
problems that were solved by both versions of the planner. We then
classified each problem by whether the later version solved the problem
faster, slower, or in the same time as the preceding version. From the
results in Table 9, we see that all of the
planners improved in average speed of solution in subsequent versions, with the
exception of Z (transition from version 1 to version 2). However,
Z did increase the number of problems solved between those versions.
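
The classification step can be sketched as follows. This is an
illustration only: the function name classify_speed, the dictionary
representation of the timing data, and the sample timings are all
assumptions made for the example and are not taken from the study.

```python
def classify_speed(old_times, new_times):
    """Count faster/slower/same over problems solved by both versions.

    old_times and new_times are hypothetical dicts mapping a problem
    name to its solution time; unsolved problems are simply absent.
    """
    counts = {"faster": 0, "slower": 0, "same": 0}
    # Only problems solved by both versions are compared.
    for problem in old_times.keys() & new_times.keys():
        old, new = old_times[problem], new_times[problem]
        if new < old:
            counts["faster"] += 1
        elif new > old:
            counts["slower"] += 1
        else:
            counts["same"] += 1
    return counts


# Made-up timings (seconds), purely for illustration.
old = {"p1": 2.0, "p2": 0.5, "p3": 1.0, "p4": 3.0}
new = {"p1": 1.5, "p2": 0.9, "p3": 1.0, "p5": 0.2}
print(classify_speed(old, new))
```

Here p4 and p5 are excluded because each was solved by only one of the
two versions, which mirrors the restriction to problems solved by both.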
Table 9: Improvements in execution speed across versions. The
Faster column counts the number of cases in which the new version
solved the problem faster; the Slower column counts those cases in which
the new version took longer to solve a given problem.