Learning Macros in Other Domains

Table 12: The resources consumed while learning in various domains. Each number represents the mean over 100 learning sessions.

	Operator applications		CPU seconds		Problems
Domain	Mean	Std.	Mean	Std.	Mean	Std.
24-puzzle	859,497	186,823	4842.0	2270.0	62.8	7.2
10-cannibals	216,572	88,467	14.2	6.0	63.9	12.0
10-stones	144,303	1,653	8.4	1.9	52.0	0.3
5-Hanoi	377,671	80,968	35.2	3.5	76.2	9.5
Grid	1,956,972	1,191,383	58.8	35.0	185.0	60.2

Table 13: Statistics of the macro sets generated in various domains. Each number represents the mean over 100 sets.

	Total number		Mean Length		Max Length
Domain	Mean	Std.	Mean	Std.	Mean	Std.
24-puzzle	15.32	0.79	8.68	0.24	18.00	0.0
10-cannibals	2.16	0.39	2.93	0.15	3.98	0.2
10-stones	1.20	0.40	2.00	0.00	2.00	0.0
5-Hanoi	11.47	0.71	7.24	0.25	16.00	0.0
Grid	17.50	3.14	8.62	0.82	16.36	1.5

Table 14: The performance of MICRO-HILLARY (in operator applications) before and after learning in various domains.

	Before learning		After learning
Domain	Mean	Std.	Mean	Std.
24-puzzle	711,545	134,807.0	1540.0	57.5
10-cannibals	57	40.5	31.6	0.4
10-stones	206	84.8	126.6	8.1
5-Hanoi	10,993	14,623.3	156.0	9.4
Grid	1,529	1,313	369.0	35.0

We applied MICRO-HILLARY to the other domains specified in Section 4.1. Tables 12, 13 and 14 show the mean results over 100 learning sessions. MICRO-HILLARY reached quiescence in all the domains. The 10-stones and 10-cannibals domains are very simple: one or two macros were sufficient to reach quiescence. Note that we used the same quiescence parameter, 50 problems, for all the domains. After solving each problem, MICRO-HILLARY increases by 100 the length of the random sequence used to generate the next training problem. Therefore, MICRO-HILLARY spends roughly 125,000 operator applications just to make sure that there is nothing new to learn. In the simple domains, this accounts for most of the resources used by MICRO-HILLARY.
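The 125,000 figure can be checked with a short calculation. The sketch below is one reading of the arithmetic, not the authors' code: it assumes that the cost of the final quiescence run is dominated by the random sequences used to generate the 50 training problems, and that those sequences grow by 100 steps per problem. The function name and the `first_length` parameter are hypothetical; the starting length is not stated in the text, so two plausible values are shown.

```python
def quiescence_overhead(quiescence_problems=50, length_step=100, first_length=100):
    """Sum the random-sequence lengths over the final run of training problems.

    Assumption: sequence i in the run has length first_length + i * length_step.
    """
    lengths = [first_length + i * length_step for i in range(quiescence_problems)]
    return sum(lengths)


# Two assumed starting lengths bracket the figure quoted in the text.
upper = quiescence_overhead(first_length=100)  # lengths 100, 200, ..., 5000
lower = quiescence_overhead(first_length=0)    # lengths 0, 100, ..., 4900
print(lower, upper, (lower + upper) // 2)
```

Under these assumptions the two totals are 122,500 and 127,500, so 125,000 is a reasonable round figure. In practice the run starts after the last macro is learned, so the sequences are somewhat longer than in this idealized count.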

It is interesting to look at the macros learned in the grid domain. Most of the macros have the structure $S S \ldots S \, W \, N \ldots N N$, where S stands for south, W for west, and N for north, and the numbers of S and N moves are equal. Such macros are used to make detours around walls that block the search.
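The effect of such a macro can be illustrated with a small sketch. The coordinate convention (north increases y, west decreases x) and the helper names `detour_macro` and `net_displacement` are assumptions made for illustration; they are not taken from the MICRO-HILLARY implementation.

```python
# Assumed unit moves on the grid: (dx, dy) for each operator.
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


def detour_macro(k):
    """Build a macro of k south moves, one west move, and k north moves."""
    return "S" * k + "W" + "N" * k


def net_displacement(macro):
    """Return the total (dx, dy) produced by applying the macro's operators in order."""
    x = y = 0
    for op in macro:
        dx, dy = MOVES[op]
        x += dx
        y += dy
    return (x, y)


macro = detour_macro(3)
print(macro, net_displacement(macro))
```

Because the south and north moves cancel, the net effect of the macro is a single step west, achieved by going around an obstacle that blocks the direct move. A macro with k south moves has length 2k + 1.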

MICRO-HILLARY improved problem-solving performance in every domain. The most notable improvement is in the 24-puzzle domain, where the mean number of operator applications drops from 711,545 before learning to 1,540 after learning, a reduction by a factor of about 462.
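The improvement factor for each domain can be recomputed from the mean values in Table 14. The following sketch simply divides the mean number of operator applications before learning by the mean after learning; the dictionary and function names are illustrative.

```python
# Mean operator applications (before learning, after learning), copied from Table 14.
TABLE_14 = {
    "24-puzzle": (711_545, 1540.0),
    "10-cannibals": (57, 31.6),
    "10-stones": (206, 126.6),
    "5-Hanoi": (10_993, 156.0),
    "Grid": (1_529, 369.0),
}


def improvement_factors(table):
    """Return before/after ratios of mean operator applications per domain."""
    return {domain: before / after for domain, (before, after) in table.items()}


for domain, factor in improvement_factors(TABLE_14).items():
    print(f"{domain}: {factor:.1f}x")
```

The ratio for the 24-puzzle rounds to 462, matching the figure in the text, and it is the largest of the five. These are ratios of means, not means of per-problem ratios, so they should be read as a rough summary only.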