
Learning to Solve Problems in the 15-puzzle Domain


Table 1: The upper part of the table shows the learning resources consumed while learning macros in the 15-puzzle domain, averaged over 100 learning sessions. The lower part lists the statistics of the generated macro sets, averaged over the 100 resulting databases.

                      Operator applications   CPU seconds     Problems
Learning resources    Mean        Std.        Mean    Std.    Mean    Std.
                      498,172     74,047      735     218     60      4.2

                      Total number    Mean length     Max length
Generated macros      Mean    Std.    Mean    Std.    Mean    Std.
                      14.16   0.74    8.38    0.20    18      0.0

The basic experiment tests the ability of MICRO-HILLARY to learn the 15-puzzle domain. Table 1 shows the learning resources and the macro statistics averaged over the 100 learning sessions. We also tested the utility of the learned knowledge by comparing the performance of the problem solver that uses the learned macros to the following problem solvers:

- MICRO-HILLARY without the learned knowledge and with learning turned off.
- Best-first search using the MD heuristic.
- Best-first search using the RR heuristic.
- Weighted-A* with the MD heuristic. We selected the node evaluation function f(n) = (1-w)g(n) + wh(n) with w = 0.75, which Korf [14] reported as the best weight for the 15-puzzle domain (Korf uses the equivalent notation f(n) = g(n) + Wh(n) with W = 3).
- IDA* with the MD heuristic. Since our random problems were generated by the same method as Korf's, we used the results reported in [14]. The number of operator applications for Korf's data was estimated from the ratio between generated nodes and operator applications in our run of best-first search.
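For concreteness, the MD (Manhattan distance) heuristic and the weighted evaluation function above can be sketched as follows. The tuple state encoding (16 tile values listed row by row, 0 for the blank) and the function names are our illustrative assumptions, not the paper's implementation.

```python
# Sketch of the Manhattan-distance (MD) heuristic for the 15-puzzle and the
# weighted-A* evaluation function. The state encoding is an assumption made
# for illustration: a tuple of 16 tile values, row by row, 0 for the blank.

def manhattan_distance(state, goal):
    """Sum over all tiles of |row difference| + |column difference| to goal."""
    pos = {tile: divmod(i, 4) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:               # the blank does not contribute
            continue
        r, c = divmod(i, 4)
        gr, gc = pos[tile]
        total += abs(r - gr) + abs(c - gc)
    return total

def weighted_f(g, h, w=0.75):
    """f(n) = (1-w)g(n) + wh(n). With w = 0.75 this orders nodes exactly as
    Korf's f(n) = g(n) + Wh(n) with W = w/(1-w) = 3, up to a factor of 0.25."""
    return (1 - w) * g + w * h
```

Note the equivalence of the two notations: (1-w)g + wh = (1-w)(g + (w/(1-w))h), so w = 0.75 corresponds to W = 3.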
Table 2 shows the results of this experiment. The results are averaged over the 100 problems in the test set. The results for MICRO-HILLARY after learning are also averaged over the 100 learning sessions.


Table 2: The utility of the learned macros: the resources required for solving the test set, compared with various non-learning problem solvers. The results are averaged over the 100 problems in the test set; the results for MICRO-HILLARY after learning are also averaged over the 100 learning sessions.

                                 Operator              Generated          Expanded            CPU               Solution
                                 applications          nodes              nodes               seconds           length
Search method                    Mean        Std.      Mean      Std.     Mean      Std.      Mean     Std.     Mean    Std.
MICRO-HILLARY (before learning)  221,552     51,188    181,422   45,044   55,372.9  12,794.0  618.92   137.75   141.0   26.40
MICRO-HILLARY (after learning)   688         21        196       4        47.4      1.3       0.30     0.01     149.5   0.19
Best-first (MD)                  13,320      8,686     6,868     4,500    3,330.5   2,171.6   53.95    39.81    145.8   35.48
Best-first (RR)                  >192,811    26,365    >98,101   13,294   >48,203   6,591.4   >5,920   11,989   --      --
WA* (W=3)                        40,385      44,396    20,181    21,917   10,096    11,099.0  439.28   1,206.00 74.2    12.64
IDA* (Korf [14])                 704,067,291 --        363,028,090 --     --        --        --       --       53.0    --

After learning, MICRO-HILLARY solves problems much faster than the other algorithms. It uses 19 times fewer operator applications than best-first search with the MD heuristic, with solutions of comparable length. Comparing CPU times, MICRO-HILLARY is more than 100 times faster than best-first search, because best-first search incurs a much higher overhead per generated node than MICRO-HILLARY does. MICRO-HILLARY also has the advantage of a very small standard deviation. After learning, MICRO-HILLARY is 322 times faster than before learning, and 59 times faster than WA* with W=3, though its solutions are twice as long. It is more than 1,000,000 times faster than IDA*, with solutions 2.8 times longer than the optimal solutions found by IDA*.
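The quoted speedup factors can be recomputed directly from the means in Table 2; a quick arithmetic check:

```python
# Recomputing the speedup factors in the text from the mean values of Table 2.
ops = {"before": 221_552, "after": 688, "bfs_md": 13_320,
       "wa_star": 40_385, "ida_star": 704_067_291}
cpu = {"after": 0.30, "bfs_md": 53.95}
length = {"after": 149.5, "ida_star": 53.0}

print(round(ops["bfs_md"] / ops["after"]))         # 19x fewer operator applications
print(round(cpu["bfs_md"] / cpu["after"]))         # 180: CPU ratio vs. best-first (>100x)
print(round(ops["before"] / ops["after"]))         # 322x before vs. after learning
print(round(ops["wa_star"] / ops["after"]))        # 59x vs. WA* (W=3)
print(ops["ida_star"] / ops["after"] > 1_000_000)  # True: >1,000,000x vs. IDA*
print(round(length["after"] / length["ida_star"], 1))  # 2.8x longer than optimal
```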

During the testing phase, MICRO-HILLARY never encountered a local minimum. Thus, by Definition 1, all the macro sets learned by MICRO-HILLARY were apparently complete. The same held for the rest of the experiments described in this paper.

We also tried using iterative deepening (ID) instead of ILB for escaping local minima. We used an improved version of ID that tests each new state against the states along the path back to the root, to avoid loops. Table 3 shows the resources consumed during learning, and Table 4 shows the statistics of the acquired macro sets. As expected, learning with ILB is faster than with ID (by a factor of 9 in operator applications), but it produces slightly longer macros (ID finds the shortest escape routes).
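The loop-avoiding ID escape described above can be sketched as follows. Here `successors` and the goal test are hypothetical stand-ins for the domain's operators and the escape condition, and the depth cap is an arbitrary illustrative bound; the paper's actual implementation may differ.

```python
# Minimal sketch of iterative deepening (ID) with loop checking: depth-first
# search under an increasing depth bound, where every new state is tested
# against the states on the current path to the root, so the search never
# revisits a state on its own path. Because the bound grows one step at a
# time, the first path found is a shortest escape route.

def iterative_deepening(root, successors, is_goal, max_depth=50):
    for bound in range(max_depth + 1):
        path = _bounded_dfs(root, successors, is_goal, bound, [root])
        if path is not None:
            return path            # shortest path from root to a goal state
    return None

def _bounded_dfs(state, successors, is_goal, bound, path):
    if is_goal(state):
        return list(path)
    if bound == 0:
        return None
    for nxt in successors(state):
        if nxt in path:            # loop check against the path to the root
            continue
        path.append(nxt)
        found = _bounded_dfs(nxt, successors, is_goal, bound - 1, path)
        path.pop()
        if found is not None:
            return found
    return None
```

The path-membership test is what distinguishes this "improved" ID from plain iterative deepening: it prunes cycles without the memory cost of a full closed list.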


Table 3: The learning resources consumed while learning in the 15-puzzle domain using the two escape methods, averaged over 100 learning sessions.

                Operator applications    CPU seconds      Problems
Escape method   Mean        Std.         Mean     Std.    Mean    Std.
ILB             498,172     74,047       735      218     60      4.2
ID              4,553,319   767,093      2,542    457     60      5.4


Table 4: The statistics of the macro sets acquired with the two escape methods, averaged over 100 databases.

                Total number    Mean length     Max length
Escape method   Mean    Std.    Mean    Std.    Mean    Std.
ILB             14.16   0.74    8.38    0.20    18.0    0.0
ID              14.39   0.88    8.12    0.22    17.0    0.0


Shaul Markovitch
1998-07-21