Our results suggest that new versions run faster but often do not
solve more problems. Thus, the newest version may not represent the
"best" performance (depending on one's definition of best) for that
class of planner. Some competitions in other fields, e.g., in the
automatic theorem proving community, require the previous year's best
performer to compete as well; this has the advantage of establishing a
baseline of performance and of showing how the focus may shift over
time.
Recommendation 10: If the primary evaluation metric is speed,
then a newer version may be the most appropriate competitor. If the
metric is the number of problems solved, or if one wishes to establish
what progress has been made, then it may be worth running against an
older version as well. If Recommendation 9 has been followed, then
evaluators should select a version based on this guidance.
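
As an illustration only, the selection rule in Recommendation 10 could be sketched as follows. The names VersionResult and select_versions, the metric labels, and the choice of "the older version that solved the most problems" as the baseline are all assumptions made for this sketch and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionResult:
    """Benchmark summary for one planner version (hypothetical record type)."""
    version: str
    solved: int          # number of problems solved
    mean_seconds: float  # mean runtime on solved problems


def select_versions(results, metric):
    """Pick which versions of a planner to include in a comparison.

    `results` is assumed to be ordered from oldest to newest version.
    - metric == "speed": run only the newest version.
    - metric == "solved" or "progress": run the newest version and, as a
      baseline, the older version that solved the most problems
      (this baseline choice is an assumption of the sketch).
    """
    if not results:
        raise ValueError("no versions to choose from")
    newest = results[-1]
    if metric == "speed":
        return [newest.version]
    if metric in ("solved", "progress"):
        older = results[:-1]
        if not older:
            return [newest.version]
        best_older = max(older, key=lambda r: r.solved)
        return [best_older.version, newest.version]
    raise ValueError(f"unknown metric: {metric!r}")
```

With made-up figures for two versions, where the older one solves more problems and the newer one runs faster, a speed comparison would use only the newer version, while a problems-solved comparison would run both.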
©2002 AI Access Foundation and Morgan Kaufmann
Publishers. All rights reserved.