
Analytic Framework

As discussed in Section 6, we use the planners themselves as judges of how difficult individual problems were. Most of the competing planners proceeded by first grounding the problem instance and then searching the problem space using some variation on the theme of a relaxed distance estimate, so there seems little reason to believe that their judgements would strongly diverge. If a particular instance, or family of instances, proved difficult for one planner, it might be expected that the same collection would be challenging for all the competitors. To avoid being distracted by the impact of hand-coded control rules, we separate the judgements of the fully-automated planners from those of the hand-coded planners. For each domain/level combination the hypothesis is that the planners tend to agree about the relative difficulties of the problems presented within that domain and level.

To explore the extent to which agreement exists we perform rank correlation tests for agreement in multiple judgements [Kanji 1999] (we refer to this test as an MRC). In our experiment the judges are the planners and the subjects are the problem instances. We perform a distinct MRC for each domain/level combination, showing in each case how the planners ranked the instances in that domain and level. We therefore perform 25 MRCs for the fully-automated planners (there were 25 distinct domain/level pairs in which the fully-automated planners competed), 23 for the hand-coded planners on the small problems (the hand-coded planners did not compete in the Freecell STRIPS or Settlers NUMERIC domains) and 22 for the hand-coded planners on the large problems (amongst which there were no Satellite HARDNUMERIC instances). The results of these tests are shown in Figure 26. In each test the $n$ planners rank the $k$ problem instances in order of time taken to solve. Unsolved problems create no difficulties, as they are pushed to the top end of the ranking. The MRC determines whether the independent rankings made by the $n$ planners agree. The test statistic follows the F-distribution with $(k-1, k(n-1))$ degrees of freedom, and agreement is judged significant when the statistic exceeds the critical value for those degrees of freedom.
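As an illustration only, the following Python sketch shows one way a test of this shape could be computed. It rests on assumptions not stated in the text: that the statistic is the analysis-of-variance ratio on ranks (between-instance mean square over residual mean square), that unsolved problems share the average of the top ranks, and that an unsolved problem is recorded as `None`. The function names `rank_times` and `mrc_f_statistic` are hypothetical.

```python
def rank_times(times):
    """Rank one planner's solve times (1 = fastest).

    None marks an unsolved problem; such problems are treated as
    infinitely slow, so they land at the top end of the ranking.
    Tied entries share the average of the ranks they occupy
    (an assumption about tie handling).
    """
    k = len(times)
    keyed = [float("inf") if t is None else t for t in times]
    order = sorted(range(k), key=lambda i: keyed[i])
    ranks = [0.0] * k
    i = 0
    while i < k:
        # Extend j over the run of entries tied with position i.
        j = i
        while j + 1 < k and keyed[order[j + 1]] == keyed[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for p in range(i, j + 1):
            ranks[order[p]] = average_rank
        i = j + 1
    return ranks


def mrc_f_statistic(times_by_planner):
    """Return (F, (df1, df2)) for n planners judging k problem instances.

    times_by_planner[p][j] is the time planner p took on instance j.
    """
    n = len(times_by_planner)
    k = len(times_by_planner[0])
    ranks = [rank_times(row) for row in times_by_planner]
    mean_rank = (k + 1) / 2  # mean of the ranks 1..k, also with averaged ties

    # Between-instance sum of squares, from the rank total of each instance.
    totals = [sum(r[j] for r in ranks) for j in range(k)]
    ss_between = sum((t - n * mean_rank) ** 2 for t in totals) / n

    # Total sum of squares, computed directly so that ties are handled.
    ss_total = sum((r[j] - mean_rank) ** 2 for r in ranks for j in range(k))
    ss_within = ss_total - ss_between

    df1, df2 = k - 1, k * (n - 1)
    if ss_within <= 1e-12:
        # Perfect agreement: the residual vanishes and the ratio is unbounded.
        return float("inf"), (df1, df2)
    return (ss_between / df1) / (ss_within / df2), (df1, df2)
```

For example, three planners with times `[1.0, 2.0, 3.0, None]`, `[0.5, 1.5, 4.0, 9.0]` and `[2.0, 1.0, 5.0, None]` on four instances give F = 82/3 (about 27.3) on (3, 8) degrees of freedom. The sketch stops at the statistic; the comparison with the critical value would be made against an F table or a statistics library.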


Derek Long 2003-11-06