next up previous
Next: MICRO-HILLARY Up: A Selective Macro-learning Algorithm Previous: Introduction


Background: Selective Macro-Learning

Let S be a finite set of states. Let O be a finite set of operators where each operator $o \in O$ is a function $o : S \rightarrow S \cup
\{ \emptyset \}$. If $o(s)=\emptyset$, we say that o(s) is undefined. A problem is a pair of states $\left \langle s_i , s_g \right
\rangle$, where $s_i \in S$ is called the initial state and $s_g \in S$ is called the goal state. A solution is a sequence of operators $\left \langle o_1 , \ldots , o_k \right \rangle$, such that $o_k\left(\ldots o_1\left(s_i\right)\ldots\right)=s_g$. A problem $p=\left \langle s_i , s_g \right \rangle$ is solvable if there exists a solution for p (written $solvable(s_i,s_g)$).
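As a concrete illustration of these definitions, operators can be modeled as partial functions that return a sentinel value where they are undefined. The tiny integer domain and the inc/dec operators below are hypothetical, chosen only to make the definitions executable:

```python
# Hypothetical domain: states are the integers 0..3; the operators are
# undefined at the boundaries (None plays the role of the empty set).

def inc(s):
    return s + 1 if s < 3 else None

def dec(s):
    return s - 1 if s > 0 else None

def is_solution(ops, s_i, s_g):
    """Check that o_k(...o_1(s_i)...) = s_g, treating None as undefined."""
    s = s_i
    for o in ops:
        if s is None:
            return False
        s = o(s)
    return s == s_g

# The problem <0, 2> is solved by the sequence <inc, inc>:
print(is_solution([inc, inc], 0, 2))  # True
```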

A macro-operator is a sequence of operators $m=\left \langle o_1 , \ldots , o_k \right \rangle$. A macro-operator m is applied to a state $s \in S$ by applying its basic operators in a sequence. If while applying the sequence of operators we encounter an undefined application, the application of the macro is undefined.


\begin{displaymath}
m(s)=
\left\{
\begin{array}{ll}
o_k\left(\ldots o_1\left(s\right)\ldots\right) & \mbox{if } o_j\left(\ldots o_1\left(s\right)\ldots\right) \neq \emptyset \mbox{ for all } j \in \left[ 1,k \right]\\
\emptyset & \mbox{otherwise.}
\end{array}
\right.
\end{displaymath}
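This case analysis translates directly into code. A minimal sketch, again using the hypothetical inc/dec operators with None standing for an undefined value:

```python
def apply_macro(macro, s):
    """Apply m = <o_1, ..., o_k> to s; the result is None (undefined)
    if any intermediate application is undefined."""
    for o in macro:
        if s is None:
            return None
        s = o(s)
    return s

# Hypothetical domain: states 0..3, operators undefined at the boundaries.
inc = lambda s: s + 1 if s < 3 else None
dec = lambda s: s - 1 if s > 0 else None

print(apply_macro([inc, inc], 1))  # 3
print(apply_macro([dec, dec], 1))  # None: dec(0) is undefined
```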

A problem solver $\varphi$ is an algorithm that takes a problem p and a set of operators O and applies operators to states, searching for a solution to p. Given a solvable problem p, a problem solver $\varphi$ and a set of operators O, we define $SearchCost(\varphi,p,O)$, the cost of solving p using $\varphi$ and O, as the number of operator applications performed by $\varphi$ until a solution is found. Note that SearchCost is different from the solution cost--the cost of the solution path according to some cost function. In this work we are only interested in satisficing search--minimizing the cost of the search regardless of the cost of the resulting solution.
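For instance, a breadth-first problem solver can report its SearchCost by counting every operator application it performs. The sketch below is illustrative, not the paper's solver; note how the solution length and the search cost differ:

```python
from collections import deque

def search_cost(operators, s_i, s_g):
    """Breadth-first solver; returns (solution, applications), where
    applications counts every operator application (the SearchCost),
    which is distinct from the cost of the solution path itself."""
    applications = 0
    frontier = deque([(s_i, [])])
    visited = {s_i}
    while frontier:
        s, path = frontier.popleft()
        if s == s_g:
            return path, applications
        for o in operators:
            applications += 1
            t = o(s)
            if t is not None and t not in visited:
                visited.add(t)
                frontier.append((t, path + [o]))
    return None, applications

# Hypothetical domain: states 0..3 with boundary-undefined operators.
inc = lambda s: s + 1 if s < 3 else None
dec = lambda s: s - 1 if s > 0 else None

path, cost = search_cost([inc, dec], 0, 2)
print(len(path), cost)  # solution length 2, but 4 operator applications
```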

The utility of a set of macros M, with respect to a problem p, a problem solver $\varphi$ and a set of operators O, is defined as

\begin{displaymath}
U_{p,\varphi,O}(M)=
SearchCost(\varphi,p,O)
- SearchCost(\varphi,p,O \cup M).
\end{displaymath} (1)

Thus, the utility of a set of macro-operators with respect to a given problem and a given problem solver is the search time saved by using these operators. When the problems are drawn from some fixed distribution D, we take the expected value over a problem randomly drawn from D:

\begin{displaymath}
U_{\varphi,O}(M)=
E_{D(p)}[SearchCost(\varphi,p,O)
- SearchCost(\varphi,p,O \cup M)].
\end{displaymath} (2)
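Equation 2 can be estimated by averaging the saving over a sample of problems drawn from D. In the hypothetical counting domain below, the macro advances three steps at once; depending on the goal distribution, the estimate can even come out negative, reflecting the branching-factor cost of the extra operator:

```python
import random
from collections import deque

def bfs_cost(operators, s_i, s_g):
    """SearchCost of a breadth-first solver: operator applications until solved."""
    cost, frontier, visited = 0, deque([s_i]), {s_i}
    while frontier:
        s = frontier.popleft()
        if s == s_g:
            return cost
        for o in operators:
            cost += 1
            t = o(s)
            if t is not None and t not in visited:
                visited.add(t)
                frontier.append(t)
    return cost

# Hypothetical domain: states 0..9, one primitive operator "inc";
# the macro applies inc three times as a single step.
inc = lambda s: s + 1 if s < 9 else None
macro = lambda s: s + 3 if s <= 6 else None

random.seed(0)
problems = [(0, random.randint(1, 9)) for _ in range(20)]
base = sum(bfs_cost([inc], si, sg) for si, sg in problems)
with_macro = sum(bfs_cost([inc, macro], si, sg) for si, sg in problems)
print((base - with_macro) / len(problems))  # Monte-Carlo estimate of Equation 2
```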

In general, the utility of knowledge depends on the criteria used for evaluating the performance of the problem solver [22]. Equation 2 assumes a speedup learner, i.e., a learner whose goal is to increase the speed of solving problems.

Most macro-learning systems learn by experimentation [25]: the program solves training problems and acquires sequences of operators applied during the search. Given a set of operators O, a distribution of problems D, and a problem solver $\varphi$, the goal of a macro-learning system is to acquire a set of macro-operators M such that $U_{\varphi,O}(M)$ is positive, and as high as possible. The utility of M may well be negative, however, since the added macros also increase the branching factor.

Markovitch and Scott [22] introduced the information filtering framework, which offers a systematic way of dealing with the utility problem by being more selective. The framework identifies five logical types of selection processes in learning systems: selective experience, selective attention, selective acquisition, selective retention and selective utilization. The framework views learning programs as information processing systems where information flows from the experience space through some attention mechanism, through an acquisition procedure to the knowledge base and finally to the problem solver. The five filters are defined with respect to their logical location within the information flow.
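One way to picture the information flow is as a pipeline with one predicate per logical location. The sketch below is schematic and all names are illustrative; the utilization filter, which acts at problem-solving time rather than during learning, is only noted in a comment:

```python
def learn(generate_problem, solve, to_sequences, to_macro, macro_set,
          experience_filter, attention_filter, acquisition_filter,
          retention_filter, n_training):
    """Schematic off-line macro-learning loop with four of the five filters.
    (Selective utilization would filter macro_set again at solving time.)"""
    for _ in range(n_training):
        p = generate_problem()
        if not experience_filter(p):              # selective experience
            continue
        trace = solve(p)
        for seq in to_sequences(trace):
            if not attention_filter(seq):         # selective attention
                continue
            m = to_macro(seq)
            if acquisition_filter(m, macro_set):  # selective acquisition
                macro_set.add(m)
    return {m for m in macro_set if retention_filter(m)}  # selective retention

# Demo with trivial components (all hypothetical): every problem is "1",
# its "search trace" is [1], and every filter accepts.
macros = learn(lambda: 1, lambda p: [p], lambda tr: [tuple(tr)],
               lambda seq: seq, set(),
               lambda p: True, lambda seq: True,
               lambda m, M: True, lambda m: True, n_training=3)
print(macros)  # {(1,)}
```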

Figure 1: The architecture of an off-line macro-learning system. Training problems are generated and filtered by the experience filter. The attention procedure converts the search traces into operator sequences that are filtered by the attention filter. The sequences are converted to macro-operators. The acquisition filter decides whether to add new macros to the macro set. The retention filter deletes macros with negative utility. The utilization filter allows only a subset of the macro set to be used by the problem solver.

The general architecture offered by the information-filtering framework, instantiated to off-line macro-learning, is shown in Figure 1. Markovitch and Scott argue that the art of building successful learning systems often lies in selecting the right combination of filters. A careful examination of existing macro-learning systems reveals that those containing sophisticated filtering mechanisms performed the most successful learning.


Shaul Markovitch
1998-07-21