The format of the 1998 competition required entrants to execute 140 problems in the first round. Of these problems, 52 could not be solved by any planner. For round two, the planners executed 15 new problems in three domains, one of which had not been included in the first round.
The 2000 competition attracted 15 competitors in three tracks: STRIPS, ADL and a hand-tailored track. It required performance on problems in five domains: logistics, Blocksworld, parts machining, Freecell (a card game), and Miconic-10 elevator control. These domains were selected by the organizing committee, chaired by Fahiem Bacchus, and represented a somewhat broader range than in 1998. We chose problems from the Untyped STRIPS track for our set.
From a scientific standpoint, one of the most interesting conclusions of both competitions was the observed trade-offs in performance. Planners appeared to excel on different problems, either solving more from a set or finding a solution faster. In 1998, IPP solved more problems and found shorter plans in round two; STAN solved its problems the fastest; HSP solved the most problems in round one; and blackbox solved its problems the fastest in round one. In 2000, awards were given to two groups of distinguished planners across the different categories of planners (STRIPS, ADL and hand-tailored), because, according to the judges, ``it was impossible to say that any one planner was the best''[Bacchus2000]; TalPlanner and FF were in the highest distinguished planner group. The graphs of performance do show differences in computation time relative to other planners and to problem scale-up. However, each planner failed to solve some problems, which makes these trends harder to interpret (the computation time graphs have gaps).
The purpose of these competitions was to showcase planner technology, at which they succeeded admirably. The planners solved much harder problems than could have been solved in years past. Because planners are handling increasingly difficult problems, the competition test sets may come to be of historical interest for tracking the field's progress.