Let S be a finite set of states. Let O be a finite set of operators where each operator is a function . If , we say that o(s) is undefined. A problem is a pair of states , where is called the initial state and is called the goal state.2 A solution is a sequence of operators , such that . A problem is solvable if there exists a solution for p (specified by solvable(si,sg)).
A macro-operator is a sequence of operators . A macro-operator m is applied to a state by applying its basic operators in a sequence. If while applying the sequence of operators we encounter an undefined application, the application of the macro is undefined.
A problem solver is an algorithm that takes a problem p and a set of operators O and applies operators to states, searching for a solution to p. Given a solvable problem p, a problem solver and a set of operators O, we define , the cost of solving p using and O, as the number of operator applications performed by until a solution is found. Note that SearchCost is different from the solution cost--the cost of the solution path according to some cost function. In this work we are only interested in satisficing search--minimizing the cost of the search regardless of the cost of the resulting solution.
The utility of a set of macros M, with respect to a problem
p,
a problem solver
and a set of operators O, is defined as
Most of the macro-learning systems perform learning by experimentation [25]. The program solves training problems and acquires sequences of operators applied during the search. Given a set of operators O, a distribution of problems D, and a problem solver , the goal of a macro-learning system is to acquire a set of macro-operators M, such that is positive, and as high as possible. It is quite possible, however, that the utility of M will be negative, as the added macros also increase the branching factor.
Markovitch and Scott [22] introduced the information filtering framework, which offers a systematic way of dealing with the utility problem by being more selective. The framework identifies five logical types of selection processes in learning systems: selective experience, selective attention, selective acquisition, selective retention and selective utilization. The framework views learning programs as information processing systems where information flows from the experience space through some attention mechanism, through an acquisition procedure to the knowledge base and finally to the problem solver. The five filters are defined with respect to their logical location within the information flow.
The general architecture offered by the information-filtering framework, instantiated to off-line macro-learning, is shown in Figure 1. Markovitch and Scott argue that the art of building successful learning systems often lies in selecting the right combination of filters. A careful examination of existing macro-learning systems reveals that those containing sophisticated filtering mechanisms performed the most successful learning.