Efficiency is a function of memory and effort. Memory size is limited by the hardware. Effort is measured as CPU time, preferably but not always on the same platform in the same language. The problems with CPU time are well known: programmer skill varies; research code is designed more for fast prototyping than for fast execution; and numbers in the literature cannot be compared with newer numbers because of improvements in processor speed. However, if CPU times are regenerated in the experimenter's environment, then one assumes that
performance degrades similarly with reductions in the capabilities of the runtime environment (e.g., CPU speed, memory size) (metric assumption 1). In other words, an experimenter or user of the system does not expect that code has been optimized for a particular compiler/operating system/hardware configuration; rather, it should perform similarly when moved to another compatible environment.
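If assumption 1 holds, a reported CPU time can in principle be rescaled by a machine-speed factor before comparison. The following is a minimal illustrative sketch of that rescaling, not part of the original study; the function name and the idea of a single benchmark-derived speed score per machine are our assumptions.

```python
def normalize_cpu_time(reported_seconds: float,
                       reference_speed: float,
                       local_speed: float) -> float:
    """Rescale a CPU time reported on a reference machine to the local machine.

    reference_speed and local_speed are hypothetical benchmark scores for the
    two machines (higher means faster). Assumption 1 implies that runtime
    scales proportionally with such a score.
    """
    return reported_seconds * reference_speed / local_speed


# A plan found in 120 s on a machine half as fast as ours should take about 60 s here.
print(normalize_cpu_time(120.0, reference_speed=1.0, local_speed=2.0))
```

In practice a single scalar speed score is a strong simplification (memory bandwidth, cache size, and compiler differences all matter), which is precisely why the assumption is worth testing.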
The most commonly reported comparison metric is computation time. The second most common is the number of steps, or actions (for planners that allow parallel execution), in a plan. Although planning seeks solutions that achieve goals, the goals are defined in terms of states of the world, which does not lend itself well to general measures of quality. In fact, quality is likely to be problem dependent (e.g., resource cost, execution time, robustness), which is why number of plan steps has been favored. Comparisons assume that
the number of steps in a resulting plan varies between planner solutions and approximates quality (metric assumption 2). Any comparison, and competitions especially, has the unenviable task of determining how to trade off or combine the three metrics (number solved, time, and number of steps). Thus, if number of steps does not matter, the comparison could be simplified.
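One simple way a comparison could combine the three metrics is a weighted score. The sketch below is purely illustrative and not taken from the paper: the weights and the planner results are hypothetical, and many other combination schemes are possible.

```python
def score(num_solved: int, total_time: float, total_steps: int,
          w_solved: float = 1.0, w_time: float = 0.01,
          w_steps: float = 0.001) -> float:
    """Higher is better: reward problems solved, penalize time and plan length.

    If number of steps is judged irrelevant (assumption 2 fails to matter),
    setting w_steps = 0 collapses the comparison to coverage and time.
    """
    return w_solved * num_solved - w_time * total_time - w_steps * total_steps


# Hypothetical results for two planners on the same benchmark set.
planner_a = score(num_solved=95, total_time=1200.0, total_steps=4000)
planner_b = score(num_solved=90, total_time=600.0, total_steps=3500)
print(planner_a, planner_b)
```

Note how sensitive the ranking is to the weights: under these hypothetical numbers, the planner that solves fewer problems but runs faster can come out ahead, which is exactly the trade-off difficulty the text describes.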
We converted each assumption into a testable question. We then either summarized the literature on the question or ran an experiment to test it.