Next: Appendix A. Applying PARAMETRIC Up: A Selective Macro-learning Algorithm Previous: Experimenting with PARAMETRIC MICRO-HILLARY

Discussion

Despite its simplicity, the MICRO-HILLARY algorithm presented in this paper was able to learn macros that allow an efficient solution of any solvable NxN puzzle problem. It also performed well in a number of other domains. In this section, we compare MICRO-HILLARY to other macro-learning algorithms and discuss its strengths and weaknesses.

Most of the existing macro-learning programs are based on the notion of subgoaling: the learner tries to acquire macros that achieve some subgoal without undoing previously satisfied subgoals [13,19,33,34,41,40]. MICRO-HILLARY, like MACLEARN [9], does not assume subgoaling, but assumes the existence of a heuristic function. EASe [34] combines subgoaling with a heuristic function to guide the search for the current subgoal. The subgoal-oriented macro-learners use various methods to guard the previously achieved subgoals. MICRO-HILLARY is much simpler, requiring only that the macro acquired lead from a local minimum to a state with a better heuristic value. It is possible to implicitly enforce subgoaling by using an appropriate evaluation function. For example, the RR heuristic used for the NxN puzzle domain encodes a subgoal-oriented strategy for solving puzzle problems. However, the algorithm works even in domains where such natural subgoaling is not evident. For example, in the grid domain, the local minima are located behind obstacles--the search procedure cannot move forward and all other directions increase the Manhattan distance from the goal location. MICRO-HILLARY acquires macros that allow a detour around the obstacles. Such local minima cannot be called subgoals since we would prefer to avoid them. However, once the search program is trapped in those minima, it can use the macros to escape from them.

While MICRO-HILLARY and MACLEARN are similar in their use of the heuristic function as a guide to the macro learner, they are different in several other aspects. MACLEARN uses best-first search as its training and testing problem solver. MICRO-HILLARY uses hill-climbing (with an escape procedure). This difference is significant since the increased branching factor resulting from the macros is extremely harmful in best-first search but adds only to the constant in hill-climbing search (provided that the hill-climbing search does not find solutions that are significantly longer than those found by best-first search). Related to this difference is the different approaches that MACLEARN and MICRO-HILLARY take toward generalization of macros. MACLEARN, using best-first search, cannot afford trying every macro in every state. Therefore, it uses domain-specific pattern language to specify for each macro the states at which it should be applied. MICRO-HILLARY, using hill-climbing, can afford more liberal and domain-independent generalization using a macro wherever it is legal. Another difference between the two algorithms is the attention filters that they use. MACLEARN uses the minimum-to-minimum method while MICRO-HILLARY uses the minimum-to-better method. The minimum-to-minimum method has the weakness of acquiring longer macros, as local minima become scarce and far apart during the learning process. MACLEARN needed a special filter for avoiding the acquisition of long macros.

MICRO-HILLARY is a completely autonomous algorithm--it generates its own examples and uses the concept of quiescence to decide when to increase the domain parameter and when to stop learning. MACLEARN used a more supervised approach, processing hand-crafted training examples. To learn to solve sliding-tile puzzles, MACLEARN was given first a single problem of a 6-tile puzzle, then a single instance of an 8-puzzle and a 15-puzzle (after which it was able to solve a single instance of a 24-puzzle). MICRO-HILLARY was demonstrated solving large sets of random problems of large size. One advantage that MACLEARN has over MICRO-HILLARY is its ability to handle domains with parameterized operators. MICRO-HILLARY tries all the applicable macros in every state. Had parameterized macros been used, MICRO-HILLARY would have had to try all the ways of binding the parameters, resulting in a potentially large branching factor. The peg-solitaire domain cannot be handled efficiently by MICRO-HILLARY since it does not use a small set of fixed operators. MACLEARN solves it by using parameterized operators.

Unlike some speedup learners that provide us with either statistical or theoretical guarantees [2,7,8,39,40], MICRO-HILLARY has a heuristic nature and does not provide us with any guarantee. Indeed, while it performs very well in some domains, it fails in other domains such as the N-Hanoi. To handle such domains, we would have to endow MICRO-HILLARY with the capability of learning parameterized recursive macros [37]. A related weakness of MICRO-HILLARY is the sensitivity of the algorithm to the heuristic function available. We have shown an example where MICRO-HILLARY has extreme difficulties with a seemingly good heuristic (the sum of Manhattan distances for the 15-puzzle) while learning easily with another one.

There are two features of the heuristic function that are necessary for a successful performance of MICRO-HILLARY:

There is a ``small'' set of operator sequences that completely smooth the heuristic function to eliminate any local minima. This can be achieved either by a heuristic function that has a small number of local minima or, when the set of local minima is large, by macros that generalize well over this set.
The maximal radius of the local minima should be small enough to allow the escape procedure to find an escape route. In addition, to use the scalable version of MICRO-HILLARY, the maximal radius should not increase with the size.

The experiments described in this paper and the theorem in the appendix show that the RR heuristic of the NxN puzzle domain has a maximal radius of 18, regardless of the puzzle size. This feature does not hold for the MD heuristic. The example puzzle in Section 4.4 illustrates the problem. When scaling up the puzzle, the maximal radius scales up as well.

Another weakness of MICRO-HILLARY is its sensitivity to the quiescence parameter. This parameter tells MICRO-HILLARY when to quit learning. For PARAMETRIC MICRO-HILLARY it is used to determine when to increase the domain parameter for training. Setting the quiescence parameter to a low value can make MICRO-HILLARY quit prematurely before learning all the necessary macros. Setting it to a high value can lead to a great waste of learning resources. Indeed, for the simple domains such as the N-Cannibals and N-Stones, MICRO-HILLARY spent most of its learning time making sure that there is nothing new to learn. Note that the quiescence mechanism is needed only when performing off-line learning. MICRO-HILLARY could also be used in an on-line mode, where macros are learned while solving user problems. In such a case there is no need to use the quiescence parameter. Whenever MICRO-HILLARY encounters a local minimum, it will use the escape mechanism to find a path and record it as a macro.

While the above weaknesses are significant, we should also consider the strengths of the algorithm. MICRO-HILLARY is extremely simple, yet it is able to learn to efficiently solve problems in a number of known domains. The ability of such a simple learning algorithm to find an efficient procedure for solving general NxN puzzle problems shows the potential in using selective learning for speeding up problem solvers. Our research goal of building the simplest domain-independent learning algorithm to solve the NxN puzzle was achieved. A Common-Lisp implementation of MICRO-HILLARY and the NxN domain is included in the online appendix. This short code (less than 200 lines long) underscores the simplicity of the algorithm.

Even with the simple architecture of MICRO-HILLARY, there is much future work to be done. One of the interesting directions is the application of macro-learning to more complex domains such as scheduling. Macro-learning, like most learning techniques, works best when there are regularities in the domain that it can exploit. It is not yet clear whether there are such regularities in the scheduling domain, for example. It is possible that macros can encode groups of variable assignments that should be tried together.

Next: Appendix A. Applying PARAMETRIC Up: A Selective Macro-learning Algorithm Previous: Experimenting with PARAMETRIC MICRO-HILLARY

Shaul Markovitch
1998-07-21