
Introduction

The goal of a learning system is to modify a performance element so as to improve its performance with respect to some given criterion. The learning system does so by using its experience to generate knowledge for use by the performance element. Most of the machine learning community is concerned with improving the classification accuracy of classifiers based on classified examples. A smaller part of the community is concerned with improving the speed of search programs based on problem-solving experience. This type of learning is commonly termed speedup learning [40].

One of the most common methods of speedup learning is the acquisition of macro-operators [4,10,13,19,21,23,34]. Given the traditional definition of a search space with a set of states and a set of basic operators that connect them, a macro-operator is defined as a finite sequence of basic operators. Macro-operators are typically acquired during problem solving and are used in the same manner as basic operators. The process of acquiring macros is relatively simple. A system needs only to solve training problems and pass the search tree to the acquisition procedure, which, in turn, can add any sub-sequence of operators from the tree to its macro knowledge-base. The acquired macros, however, carry costs as well as benefits. One of the most significant costs associated with using macros is the increased branching factor of the search space. When the costs outweigh the benefits, we face a phenomenon called the utility problem [7,22,24,26].
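To make the definition concrete, here is a minimal sketch in Python of a macro-operator represented as a composition of basic operators; the State and make_macro names are illustrative assumptions, not part of any of the cited systems. An operator is modeled as a function that maps a state to a successor state, or to None where it does not apply, and a macro obeys exactly the same interface:

    from typing import Callable, Optional, Sequence

    State = tuple                                  # e.g., a tile configuration
    Operator = Callable[[State], Optional[State]]  # None = not applicable

    def make_macro(ops: Sequence[Operator]) -> Operator:
        """Compose a finite sequence of basic operators into a single
        macro-operator that the solver can apply like a basic operator."""
        def macro(state: State) -> Optional[State]:
            for op in ops:
                state = op(state)
                if state is None:      # the sequence is not applicable here
                    return None
            return state
        return macro

Because every acquired macro joins the operator set, each one enlarges the branching factor, which is exactly the cost behind the utility problem just mentioned.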

Due to the very large number of macros available for acquisition, a learning program must be selective in order to obtain a macro set with high utility. The goal of this research is to demonstrate that a simple macro-learning technique, combined with the right selection mechanisms, can lead to a speedup learning algorithm that is powerful and general, yet simple.

We start by defining a framework for selective macro-learning and describing the general architecture of a macro learner. The information filtering framework [22] is used to describe the various logical components of a macro learner. In particular, the framework emphasizes the important role of the selection mechanisms used during the learning process. We continue by describing the MICRO-HILLARY algorithm. To make the presentation and the experiments clearer, we present the algorithm in two stages: we first describe a simplified version that learns macros for a single domain at a time, and then show a generalized version that can handle a family of domains.

MICRO-HILLARY, like all other macro-learners, is useful for satisficing search. Macro-learning has negative utility when used for optimizing search [20]. Even when the learned macros are optimal, they are not useful for optimizing search procedures: knowing the shortest way from point A to point B and the shortest way from B to C tells us nothing about the optimality of their concatenation.

The input to the MICRO-HILLARY algorithm is a domain, specified by a set of basic operators, a heuristic function (not necessarily admissible) for evaluating the merit of states, and a function for generating random goal states. MICRO-HILLARY performs learning by experimentation [25]: it generates solvable training problems of increasing difficulty. The training problems are solved by a search program that performs hill-climbing combined with a procedure for escaping local minima, and the escape routes are then recorded as macros. The problem solver uses the same search procedure, applying macros as if they were basic operators. MICRO-HILLARY performs the maximal possible generalization by trying all the macros in all states. Such a method increases the branching factor by the number of macros and would be too expensive in search procedures such as best-first search; using it in hill-climbing reduces this overhead considerably (unless the solutions found are significantly longer).
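The following sketch, reusing make_macro from above, illustrates this kind of search procedure; the helper names (solve_and_learn, is_goal, h, and so on) are hypothetical placeholders rather than the actual MICRO-HILLARY code. The solver climbs greedily over basic operators and macros alike, and on a local minimum it searches breadth-first for a strictly better state, recording the escape route as a new macro:

    from collections import deque

    def solve_and_learn(start, is_goal, operators, macros, h):
        """Hill-climbing with macro acquisition: move to the best
        improving successor; at a local minimum, breadth-first search
        with basic operators for a better state and record the route."""
        state = start
        while not is_goal(state):
            # Macros are tried in every state, exactly like basic operators.
            succs = [op(state) for op in operators + macros]
            succs = [s for s in succs if s is not None]
            best = min(succs, key=h, default=None)
            if best is not None and h(best) < h(state):
                state = best                     # ordinary hill-climbing step
                continue
            # Local minimum: search for an escape and record it as a macro.
            frontier, seen = deque([(state, [])]), {state}
            while frontier:
                s, route = frontier.popleft()
                if h(s) < h(state):
                    macros.append(make_macro(route))
                    state = s
                    break
                for op in operators:
                    t = op(s)
                    if t is not None and t not in seen:
                        seen.add(t)
                        frontier.append((t, route + [op]))
            else:
                return None                      # no escape route exists
        return state

Note how the greedy step evaluates each macro only once per state; in a best-first search, the enlarged operator set would instead multiply the open list at every expansion.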

The architecture of MICRO-HILLARY contains building blocks inspired by the earlier work of many other researchers. Generating training problems of increasing difficulty was done manually by several researchers [9,23,25]. EASe [34] used hill-climbing to avoid an exponential increase in search time as a result of adding macros. MORRIS [23] and MACLEARN [9] used macro-selection methods that inspired the one used by MICRO-HILLARY. The MICRO-HILLARY architecture is a well-balanced and well-tuned combination of these building blocks that emerged after extensive experimental study.

The main domain upon which MICRO-HILLARY was tested is the general NxN sliding-tile puzzle [31], which includes as special cases the 8-puzzle, the 15-puzzle, and the 24-puzzle. The 8-puzzle and 15-puzzle problems have long been popular among mathematicians [12,18,42] and AI researchers. The sliding-tile puzzles are among the most commonly used domains for testing heuristic search algorithms and macro-learning [1,9,13,19,34,35]. The reason for their popularity is the simplicity of their specification combined with the very large size of their associated search space [29, p. 4]. An NxN puzzle problem has an associated search graph of size (N^2)!/2. The optimizing NxN puzzle problem is NP-hard [31], although some research efforts have been made in finding optimal solutions to the 15-puzzle [15] and the 24-puzzle [17]. It is not difficult to devise a domain-specific algorithm for solving the satisficing NxN puzzle problem [31]. However, attempts to find even non-optimal solutions for the puzzle using weak methods have succeeded only for small puzzles. A notable exception is Korf [16], who outlined a search algorithm for solving NxN puzzles. MACLEARN [10], for example, one of the most successful macro-learners, was demonstrated solving only a single 5x5 puzzle problem.
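To give a feel for how fast (N^2)!/2 grows, the following short computation (plain Python, standard library only) evaluates it for the three special cases named above:

    from math import factorial

    # Reachable states of the NxN puzzle: N^2 positions, (N^2)!
    # arrangements, of which exactly half are reachable.
    for n in (3, 4, 5):                  # 8-, 15-, and 24-puzzle
        print(f"{n}x{n}: {factorial(n * n) // 2:.3e} states")

Already at 5x5 the graph has on the order of 10^25 states, which is why weak methods stall beyond small puzzles.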

It is therefore a worthwhile challenge to devise a simple, domain-independent learning algorithm that can acquire the ability to solve any solvable NxN puzzle problem efficiently. The paper includes a comprehensive set of experiments that test the properties of the algorithm, mostly in the NxN puzzle domain. We show that the learned macros can solve very large puzzles, and we supply a formal proof that the generated macros are sufficient for efficiently solving puzzles of any size.

While the NxN puzzle is the main domain tested, the MICRO-HILLARY algorithm is completely domain-independent. We made an effort to build a minimal domain-independent architecture that is sufficient to solve the NxN puzzle, as well as problems in other domains, without introducing domain-dependent enhancements. We include experiments in other domains to demonstrate that the algorithm is indeed domain-independent.

Section 2 defines selective macro-learning and identifies the choices to be made when designing a macro-learning system. Section 3 describes the general MICRO-HILLARY algorithm for learning macro-operators. Section 4 describes experiments with MICRO-HILLARY in various domains, mainly the 15-puzzle domain. Section 5 describes an extension of MICRO-HILLARY that is able to learn a family of domains that are differentiated by a single numeric parameter, and describes the application of the parameterized algorithm to the family of NxN puzzle domains. Finally, Section 6 discusses and sums up MICRO-HILLARY's strengths and weaknesses, in comparison to other macro-learning algorithms.

