
Learning to Solve Problems in the 15-puzzle Domain


Table 1: The upper part of the table shows the learning resources consumed while learning macros in the 15-puzzle domain, averaged over 100 learning sessions. The lower part lists the statistics of the generated macro sets, averaged over the 100 resulting databases.

                      Operator applications   CPU seconds     Problems
Learning resources    Mean        Std.        Mean    Std.    Mean    Std.
                      498,172     74,047      735     218     60      4.2

                      Total number    Mean length     Max length
Generated macros      Mean    Std.    Mean    Std.    Mean    Std.
                      14.16   0.74    8.38    0.20    18      0.0

The basic experiment tests the ability of MICRO-HILLARY to learn the 15-puzzle domain. Table 1 shows the learning resources and the macro statistics averaged over the 100 learning sessions. We also tested the utility of the learned knowledge by comparing the performance of the problem solver that uses the learned macros to the following problem solvers:

- MICRO-HILLARY without the learned knowledge and with learning turned off.
- Best-first search using the MD heuristic.
- Best-first search using the RR heuristic.
- Weighted-A* with the MD heuristic. We selected the node evaluation function f(n) = (1-w)g(n) + wh(n) with w = 0.75, which Korf [14] reported as the best weight for the 15-puzzle domain (Korf uses the equivalent notation f(n) = g(n) + Wh(n) with W = 3).
- IDA* with the MD heuristic. Since our random problems were generated by the same method as Korf's, we used the results reported in [14]. The number of operator applications for Korf's data was estimated from the ratio between generated nodes and operator applications in our run of best-first search.
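For concreteness, the MD (Manhattan distance) heuristic and the weighted evaluation function above can be sketched as follows. The tuple state encoding (16 tile values listed row by row, 0 for the blank) and the function names are our illustrative assumptions, not the paper's implementation.

```python
# Sketch of the Manhattan-distance (MD) heuristic for the 15-puzzle and the
# weighted-A* evaluation function. The state encoding is an assumption made
# for illustration: a tuple of 16 tile values, row by row, 0 for the blank.

def manhattan_distance(state, goal):
    """Sum over all tiles of |row difference| + |column difference| to goal."""
    pos = {tile: divmod(i, 4) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:               # the blank does not contribute
            continue
        r, c = divmod(i, 4)
        gr, gc = pos[tile]
        total += abs(r - gr) + abs(c - gc)
    return total

def weighted_f(g, h, w=0.75):
    """f(n) = (1-w)g(n) + wh(n). With w = 0.75 this orders nodes exactly as
    Korf's f(n) = g(n) + Wh(n) with W = w/(1-w) = 3, up to a factor of 0.25."""
    return (1 - w) * g + w * h
```

Note the equivalence of the two notations: (1-w)g + wh = (1-w)(g + (w/(1-w))h), so w = 0.75 corresponds to W = 3.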
Table 2 shows the results of this experiment. The results are averaged over the 100 problems in the test set. The results for MICRO-HILLARY after learning are also averaged over the 100 learning sessions.


Table 2: The utility of the learned macros: the resources required for solving the test set, compared with various non-learning problem solvers. The results are averaged over the 100 problems in the test set; the results for MICRO-HILLARY after learning are also averaged over the 100 learning sessions.

                                 Operator              Generated          Expanded            CPU               Solution
                                 applications          nodes              nodes               seconds           length
Search method                    Mean        Std.      Mean      Std.     Mean      Std.      Mean     Std.     Mean    Std.
MICRO-HILLARY (before learning)  221,552     51,188    181,422   45,044   55,372.9  12,794.0  618.92   137.75   141.0   26.40
MICRO-HILLARY (after learning)   688         21        196       4        47.4      1.3       0.30     0.01     149.5   0.19
Best-first (MD)                  13,320      8,686     6,868     4,500    3,330.5   2,171.6   53.95    39.81    145.8   35.48
Best-first (RR)                  >192,811    26,365    >98,101   13,294   >48,203   6,591.4   >5,920   11,989   --      --
WA* (W=3)                        40,385      44,396    20,181    21,917   10,096    11,099.0  439.28   1,206.00 74.2    12.64
IDA* (Korf [14])                 704,067,291 --        363,028,090 --     --        --        --       --       53.0    --

After learning, MICRO-HILLARY solves problems much faster than the other algorithms. It uses 19 times fewer operator applications than best-first search with the MD heuristic, with solutions of comparable length. Comparing CPU times, MICRO-HILLARY is more than 100 times faster than best-first search, because best-first search incurs a much higher overhead per generated node than MICRO-HILLARY does. MICRO-HILLARY also has the advantage of a very small standard deviation. After learning, MICRO-HILLARY is 322 times faster than before learning, and 59 times faster than WA* with W=3, though its solutions are twice as long. It is more than 1,000,000 times faster than IDA*, with solutions 2.8 times longer than the optimal solutions found by IDA*.
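The quoted speedup factors can be recomputed directly from the means in Table 2; a quick arithmetic check:

```python
# Recomputing the speedup factors in the text from the mean values of Table 2.
ops = {"before": 221_552, "after": 688, "bfs_md": 13_320,
       "wa_star": 40_385, "ida_star": 704_067_291}
cpu = {"after": 0.30, "bfs_md": 53.95}
length = {"after": 149.5, "ida_star": 53.0}

print(round(ops["bfs_md"] / ops["after"]))         # 19x fewer operator applications
print(round(cpu["bfs_md"] / cpu["after"]))         # 180: CPU ratio vs. best-first (>100x)
print(round(ops["before"] / ops["after"]))         # 322x before vs. after learning
print(round(ops["wa_star"] / ops["after"]))        # 59x vs. WA* (W=3)
print(ops["ida_star"] / ops["after"] > 1_000_000)  # True: >1,000,000x vs. IDA*
print(round(length["after"] / length["ida_star"], 1))  # 2.8x longer than optimal
```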

During the testing phase, MICRO-HILLARY never encountered a local minimum. Thus, by Definition 1, all the macro sets learned by MICRO-HILLARY were apparently complete. The same held for the rest of the experiments described in this paper.

We also tried using iterative deepening (ID) instead of ILB for escaping local minima. We used an improved version of ID that tests each new state against the states along the path back to the root, to avoid loops. Table 3 shows the resources consumed during learning, and Table 4 shows the statistics of the acquired macro sets. As expected, learning with ILB is faster than with ID (by a factor of 9 in operator applications), but it produces slightly longer macros (ID finds the shortest escape routes).
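The loop-avoiding ID escape described above can be sketched as follows. Here `successors` and the goal test are hypothetical stand-ins for the domain's operators and the escape condition, and the depth cap is an arbitrary illustrative bound; the paper's actual implementation may differ.

```python
# Minimal sketch of iterative deepening (ID) with loop checking: depth-first
# search under an increasing depth bound, where every new state is tested
# against the states on the current path to the root, so the search never
# revisits a state on its own path. Because the bound grows one step at a
# time, the first path found is a shortest escape route.

def iterative_deepening(root, successors, is_goal, max_depth=50):
    for bound in range(max_depth + 1):
        path = _bounded_dfs(root, successors, is_goal, bound, [root])
        if path is not None:
            return path            # shortest path from root to a goal state
    return None

def _bounded_dfs(state, successors, is_goal, bound, path):
    if is_goal(state):
        return list(path)
    if bound == 0:
        return None
    for nxt in successors(state):
        if nxt in path:            # loop check against the path to the root
            continue
        path.append(nxt)
        found = _bounded_dfs(nxt, successors, is_goal, bound - 1, path)
        path.pop()
        if found is not None:
            return found
    return None
```

The path-membership test is what distinguishes this "improved" ID from plain iterative deepening: it prunes cycles without the memory cost of a full closed list.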


Table 3: The learning resources consumed while learning in the 15-puzzle domain using the two escape methods, averaged over 100 learning sessions.

                Operator applications    CPU seconds      Problems
Escape method   Mean        Std.         Mean     Std.    Mean    Std.
ILB             498,172     74,047       735      218     60      4.2
ID              4,553,319   767,093      2,542    457     60      5.4


Table 4: The statistics of the macro sets acquired with the two escape methods, averaged over 100 databases.

                Total number    Mean length     Max length
Escape method   Mean    Std.    Mean    Std.    Mean    Std.
ILB             14.16   0.74    8.38    0.20    18.0    0.0
ID              14.39   0.88    8.12    0.22    17.0    0.0


Shaul Markovitch
1998-07-21