The Effect of the Heuristic Function on Learning

The MICRO-HILLARY algorithm is based upon the availability of a ``generally good'' heuristic function to start with. The heuristic function should be able to lead the search program towards the goal, except in a small number of local minima where an exhaustive search and learning take place. The next experiment tests the effect of the heuristic function on the performance of MICRO-HILLARY. We have conducted 100 learning sessions and testing sessions with each of the heuristic functions described in Section 4.1.2. Tables 8, 9 and 10 show the results obtained.

Table 8: The learning resources consumed while learning in the 15-puzzle domain using various heuristic functions. Each number represents the mean over 100 learning sessions.

	Operator applications		CPU seconds		Problems
Heuristic	Mean	Std.	Mean	Std.	Mean	Std.
RR	498,172	74,047	735	218	60	4.2
Row-by-row-2	4,391,468	1,805,776	2191	1,181	247	62.9
Reduction	552,463	128,769	878	334	63	9.1
Spiral	891,728	144,633	1,621	47	74	15.3
MD	228,487,090	44,106,048	98,659	2,341	1,917	210.3

Table 10: The utility of the learned macros with various heuristic functions. The resources required for solving the testing set compared with various non-learning problem solvers. The results are first averaged over the 100 problems in the test set and then averaged again over the 100 macro sets.

	Operator		Expanded		CPU		Solution
	applications		nodes		seconds		length
Heuristic	Mean	Std.	Mean	Std.	Mean	Std.	Mean	Std.
RR	688	21	47.4	1.3	0.30	0.01	149.5	0.19
Row-by-row-2	2,827	21	57.24	144.8	1.33	3.60	148.2	0.19
Reduction	738	19	42.67	0.4	0.29	0.01	125.5	1.60
Spiral	944	34	44.76	0.5	0.37	0.02	130.1	5.27
MD	11,520	1009	175.00	83.6	2.40	0.01	142.0	2.04

To understand the reasons for these differences we can look at the statistics of the acquired macros (Table 9). The total number of macros acquired with MD is 50 times larger than the number of macros acquired with RR. A larger number of macros implies a larger branching factor, which explains the lower efficiency of the problem solver using the MD macros. The high learning resources can be explained by three factors:

The above explanation is only partially satisfactory, since it does not tell us the source of the problem. To understand why the MD heuristic has a large radius, look at the following puzzle:

Table 11: Examples for macros acquired with the RR and the MD heuristics. Each example shows the state where the macro was acquired, the macro itself, and the state after the application of the acquired macro.

MD			RR
Start	Macro	End	Start	Macro	End
$\renewcommand{\arraystretch}{1.0}\setlength{\arraycolsep}{0.5mm}\begin{array}{... ... & 12 \\ \hline 5 & 11 & 3 & 10 \\ \hline 9 & & 7 & 14 \\ \hline \end{array}$	r r u u u	$\renewcommand{\arraystretch}{1.0}\setlength{\arraycolsep}{0.5mm}\begin{array}{... ... 8 \\ \hline 5 & 11 & 3 & 12 \\ \hline 9 & 7 & 14 & 10 \\ \hline \end{array}$	$\renewcommand{\arraystretch}{1.0}\setlength{\arraycolsep}{0.5mm}\begin{array}{... ... 8 \\ \hline 9 & 13 & 10 & \\ \hline 11 & 14 & 15 & 12 \\ \hline \end{array}$	d l l u r	$\renewcommand{\arraystretch}{1.0}\setlength{\arraycolsep}{0.5mm}\begin{array}{... ... 8 \\ \hline 9 & 10 & & 12 \\ \hline 11 & 13 & 14 & 15 \\ \hline \end{array}$
$\renewcommand{\arraystretch}{1.0}\setlength{\arraycolsep}{0.5mm}\begin{array}{... ... \\ \hline 1 & 10 & 11 & 12 \\ \hline 13 & 6 & 15 & 14 \\ \hline \end{array}$	r d l d r u u l d r d	$\renewcommand{\arraystretch}{1.0}\setlength{\arraycolsep}{0.5mm}\begin{array}{... ... 8 \\ \hline 10 & & 11 & 12 \\ \hline 13 & 6 & 15 & 14 \\ \hline \end{array}$	$\renewcommand{\arraystretch}{1.0}\setlength{\arraycolsep}{0.5mm}\begin{array}{... ... 4 \\ \hline 14 & 5 & 13 & 6 \\ \hline 9 & 11 & 7 & 10 \\ \hline \end{array}$	l u r r d l u l d	$\renewcommand{\arraystretch}{1.0}\setlength{\arraycolsep}{0.5mm}\begin{array}{... ...15 \\ \hline 14 & 5 & 13 & 6 \\ \hline 9 & 11 & 7 & 10 \\ \hline \end{array}$
$\renewcommand{\arraystretch}{1.0}\setlength{\arraycolsep}{0.5mm}\begin{array}{... ... 7 \\ \hline 5 & 10 & 11 & 12 \\ \hline 13 & 14 & 15 & \\ \hline \end{array}$	u l u r d l l l u r r r d d l l u r u l l r d d r r l r u d l l l u r u l d d	$\renewcommand{\arraystretch}{1.0}\setlength{\arraycolsep}{0.5mm}\begin{array}{... ... 8 \\ \hline 9 & 10 & 11 & 12 \\ \hline & 13 & 14 & 15 \\ \hline \end{array}$	$\renewcommand{\arraystretch}{1.0}\setlength{\arraycolsep}{0.5mm}\begin{array}{... ... 8 \\ \hline 9 & 10 & 11 & 12 \\ \hline 14 & 15 & 13 & \\ \hline \end{array}$	u l d l l u r d r u l l d r r u r d	$\renewcommand{\arraystretch}{1.0}\setlength{\arraycolsep}{0.5mm}\begin{array}{... ... 8 \\ \hline 9 & 10 & 11 & 12 \\ \hline 15 & 13 & 14 & \\ \hline \end{array}$

It is more difficult to explain the larger number of macros. A larger number of macros means that during training MICRO-HILLARY encounters many states where none of the previously learned macros is useful, and new macros are acquired. This implies that the likelihood of a macro acquired with MD to be useful in a state is much lower than that of a macro acquired with RR. One reason for this difference is the longer macros acquired with MD. As we stated before, longer macros tend to be less general. There are, however, other reasons which are best understood by examples. Table 11 shows three macros acquired with each of the two heuristics, along with the states before and after the macro acquisition.

The MD macro at the first line, rruuu, is applicable only in states where the empty tile is in the last row at one of the two leftmost locations. Five tiles are moved and the Manhattan distance of each is either increased or decreased by one. The macro is useful only in states where the distance of at least three of the tiles is decreased. Compare this to the equally long RR macro dllur. It is applicable in states where the empty tile is in one of 6 locations (last two columns and last three rows), and is therefore much more likely to be applicable than the MD macro. It is useful in states where the blank is to the right of the next tile to be moved and the direction of the movement is to the left. Since the RR heuristic imposes a particular order on the placement of tiles, any attempt to solve a problem will get to a state where, for example, the first 9 tiles are in place and tile 10 should be placed next. At a certain point tile 10 will be at distance 1 from its target location, either below the target location or to the right of it. In all the states where the tile is to the right of its target location, and the blank tile is to its right, the macro dllur will be useful. Thus, this macro has high likelihood of being useful.

The MD macro in the second line is, again, useful for a very particular situation. The RR macro, however, is useful for at least half of the problems. While solving a problem with RR there must be a situation where the first 3 tiles are in their target location and the fourth tile is at distance one from its target. If the blank is to its left, then the macro is useful. The MD macro in the last row of the table is the most extreme case of low usefulness. It is useful only if the tiles are placed in a particular order, with the tiles in the first, third and fourth row placed before the tiles in the second row. Contrast this macro with the longest RR macro shown to its right. This macro is useful for most problems since there will always be a situation where the first three rows are ordered and tile 13 should be moved towards its goal.

To conclude, the difference between RR and MD that makes the RR macros more useful is the restriction that RR puts on the order of solving subproblems. This strict order makes the problem solver go through similar states regardless of the initial state. The freedom given by MD allows the problem solver pick any of the N²! possible orders for tile placement. Furthermore, it allows macros that move tiles away from their target location, provided that other tiles are moved towards their target. Such macros are useful only in very particular states.

The particular order used by RR reduces the maximal radius by making it possible to place tiles without the need to (even temporarily) displace too many already-placed tiles. The other two heuristics, Reduction and Spiral, also have this property. Consider, however, a heuristic similar to RR where the tiles are placed in order from tile 2 to tile 15 after which tile 1 is placed. Such a placement order increases the maximal radius significantly. In Section 6 we discuss the differences between the two heuristics when used for the NxN puzzle domain.