Selective Attention: Acquiring Macros Leading from Local Minima to Better States
Given a search trace, every subpath of it can be recorded as a macro.
Obviously, there are too many subpaths, so we need to employ a filter
in order to acquire a set of macros with high utility.
What properties of macros are likely to indicate high utility? There
are three requirements that should be fulfilled:
1. The macros should be useful, i.e., they should save the system
significant search resources.
2. There should be as few macros as possible. Redundant macros
increase the branching factor of the search graph without
associated benefits.
3. The macros should be as short as possible, for two reasons:
   - Long macros tend to be less general. The main reason is that
   applying a long sequence of basic operators to a state is likely to
   encounter an illegal state.
   - For the learning method of MICRO-HILLARY, longer macros require
   significantly more learning resources (for the escape procedure).
The filtering method we use for MICRO-HILLARY only acquires macros leading
from a local minimum to a better state. Whenever not in a local
minimum, the search procedure can use the heuristic function, so
macros that do not start in a local minimum are not necessary.
The macros acquired by this filtering method
are useful since they enable the search procedure to
continue from a local minimum. These macros are also the shortest that
satisfy that condition.
We call this method minimum-to-better filtering.
Let h be a well-behaved heuristic function (not necessarily admissible), let
e = (s_0, s_g) be a training problem, and let p = <o_1, ..., o_n> be a
solution to e. Let s_i denote the state resulting from the application of
the sequence <o_1, ..., o_i> to the initial state. A subpath
<o_{i+1}, ..., o_j> is selected by the minimum-to-better attention filter
if and only if the following condition holds:

LocalMin(s_i)  ∧  h(s_j) < h(s_i)  ∧  ∀k (i < k < j → h(s_k) ≥ h(s_i))    (3)

where LocalMin(s) holds if and only if no single operator in O leads from s
to a state with a heuristic value lower than h(s).
The first conjunct says that the first state of a macro must be a
local minimum. It serves to satisfy requirement 2 above--that there
should be as few macros as possible. The second conjunct says that the
last state must be a better state. It serves to satisfy requirement 1
above--that macros should be useful. The third conjunct says that
the macro is the shortest satisfying the former two conjuncts. It
serves to satisfy the third requirement above--that macros should be as
short as possible.
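The selection rule of Equation 3 can be sketched in code. The following Python sketch is illustrative, not MICRO-HILLARY's implementation; the state representation, the heuristic h, and the neighbors function are assumptions of the example.

```python
def is_local_minimum(s, h, neighbors):
    # A state is a local minimum if no single operator leads to a
    # state with a strictly lower (better) heuristic value.
    return all(h(t) >= h(s) for t in neighbors(s))

def minimum_to_better_macros(path, ops, h, neighbors):
    """Minimum-to-better filter over one solution trace.

    path[i] is the state reached by applying ops[:i] to the initial
    state.  For each local minimum s_i on the path (the goal state
    excluded), the shortest subpath o_{i+1}..o_j leading to the
    first strictly better state s_j is recorded as a macro.
    """
    macros = []
    for i, s in enumerate(path[:-1]):          # skip the goal state
        if not is_local_minimum(s, h, neighbors):
            continue
        for j in range(i + 1, len(path)):
            if h(path[j]) < h(s):              # first strictly better state
                macros.append(tuple(ops[i:j]))
                break
    return macros
```

On a toy one-dimensional domain with a single local minimum, the filter records exactly one macro: the shortest operator sequence that climbs out of that minimum.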
When a sufficient number of such macros has been acquired, the search
program
may be able to advance toward the goal state without ever
being trapped in local minima.
Definition 1
Let S be a set of states and O a set of operators. Let s_g be a goal
state. Let h be a well-behaved heuristic function. Let M be a set of
macro-operators. We say that M is complete with respect to h and s_g
if and only if every state s ∈ S from which s_g is reachable satisfies:
s = s_g, or there exists o ∈ O ∪ M applicable to s such that
h(o(s)) < h(s).
A complete macro set ``smooths'' the heuristic
function by eliminating local minima. The length of the individual
macros in a complete set depends on the distance between local minima
and their associated better state.
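Completeness in the sense of Definition 1 can be tested directly: no non-goal state may remain a local minimum once the macros are added. A small illustrative Python sketch, with the state space, successors function, and heuristic all invented for the example:

```python
def is_complete(states, successors, h, goal):
    """A macro set M is complete w.r.t. h and the goal state if every
    non-goal state has some successor -- reached by a basic operator
    or by a macro in M -- with a strictly lower heuristic value, i.e.
    no local minima remain in the augmented search graph.
    (Assumes every state in `states` can reach the goal.)"""
    return all(s == goal or any(h(t) < h(s) for t in successors(s))
               for s in states)
```

In the example below, the basic operators leave one state trapped in a local minimum, and adding a single macro that jumps over the plateau makes the set complete.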
Definition 2
Let S be a set of states and O be a set of operators. Let s_g be a goal
state. Let h be a well-behaved heuristic function. Let d(s_1, s_2) denote
the graph distance between s_1 and s_2 in the graph defined by O. The
radius of a state s with respect to h, s_g and O is defined as:

radius(s) = 0 if s = s_g; otherwise radius(s) = min { d(s, s') | s' ∈ S, h(s') < h(s) }.

The radius of the goal state is 0. The radius of a state that is not a
local minimum is 1.
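Read this way, the radius is simply the breadth-first distance from a state to the nearest strictly better state. A Python sketch, assuming a finite graph with an explicitly enumerable neighbors function (all names are illustrative):

```python
from collections import deque

def radius(s, h, neighbors, goal):
    """Graph distance from s to the closest state with a strictly
    lower heuristic value; 0 for the goal state (Definition 2)."""
    if s == goal:
        return 0
    seen = {s}
    frontier = deque([(s, 0)])
    while frontier:
        state, dist = frontier.popleft()
        for t in neighbors(state):
            if t in seen:
                continue
            if h(t) < h(s):
                return dist + 1   # first strictly better state found
            seen.add(t)
            frontier.append((t, dist + 1))
    return float('inf')           # no better state is reachable
```

A state that is not a local minimum has a better neighbor one step away, so its radius is 1, matching the remark above.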
Figure 3:
An illustration of the differences between the minimum-to-minimum
and minimum-to-better filtering methods.
The X axis stands for the sequence of states along the solution path.
The Y axis stands for the heuristic values of states.
(a) A situation where
the minimum-to-better strategy acquires short macros
that are just sufficient to allow progress in the
heuristic search while
the minimum-to-minimum strategy acquires unnecessarily long
macros that are likely to be less applicable.
(b) A situation where
the minimum-to-minimum strategy acquires a macro
that leads from a local minimum to a state with worse
(higher) value.
The minimum-to-better strategy always acquires macros
that lead to a state with a better (lower) value.
The minimum-to-better filter looks similar to the
minimum-to-minimum filtering method used by Iba
[10]. However, there are
several problems with the minimum-to-minimum method that the
minimum-to-better filter avoids.
When the distance between minima is large, the minimum-to-minimum
filter will acquire very long macros. This violates the third
requirement above.
Figure 3(a) illustrates this problem.
The X axis stands for the sequence of states along the solution path.
The Y axis stands for the heuristic values of states.
Long macros such as those acquired by the minimum-to-minimum filter
tend to be much less general (they are more likely to fail).
This problem is intensified with incremental
learning. In the first stages of learning, the minima
are close together and short macros are learned.
These macros, however, make minima more sparse in later stages
of learning, leading to the acquisition of very long macros.
This problem becomes extreme when there is only one minimum in the
solution path. In such a case, the minimum-to-minimum filter will either
learn nothing, or will use Iba's approach and consider the start and
end states of the solution path as minima, leading to the acquisition
of two very long macros.
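For contrast, a minimum-to-minimum filter in the same illustrative setting records the subpath between each pair of consecutive local minima on the path, however far apart they are. This is a simplified reading of Iba's filter; the toy domain and all names are invented for the example.

```python
def minimum_to_minimum_macros(path, ops, h, neighbors):
    # Record the operator subsequence between each pair of
    # consecutive local minima on the solution path.
    minima = [i for i, s in enumerate(path)
              if all(h(t) >= h(s) for t in neighbors(s))]
    return [tuple(ops[i:j]) for i, j in zip(minima, minima[1:])]
```

On a path whose only minima are an early one and the goal itself, this filter records a single macro spanning nearly the whole path, whereas the minimum-to-better filter would have stopped at the first strictly better state with a shorter macro.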
Indeed, when we tried to replace the
minimum-to-better filter with the minimum-to-minimum
filter, MICRO-HILLARY went into a very long session of acquiring
very long macros. Iba had to introduce an acquisition filter that
blocks long macros in order to mitigate this phenomenon.
Another problem with the minimum-to-minimum method is
that it may learn routes from a minimum to a worse (higher)
minimum. This violates the first requirement above.
Figure 3(b) illustrates this problem.
Of course, it may
then learn another macro from the worse minimum, but the total number of
macros still increases needlessly.
The minimum-to-better strategy always acquires macros
that lead to a state with a better (lower) value.
In Section 6 we discuss further differences
between MACLEARN and MICRO-HILLARY.
The T-Macros acquired by MORRIS [23] also
resemble the macros acquired by MICRO-HILLARY. A T-Macro is ``locally
anomalous. Its initial segment appears to make no progress, but the
sequence as a whole is rated as advantageous''
[23]. It is therefore possible to call the
selection strategy employed by MORRIS any-to-better.
This method, which does not insist on starting a macro from a local
minimum, will acquire more macros than the
minimum-to-better strategy. These macros are not necessary and
violate the second requirement above.
The larger number of (possibly unnecessary)
macros will increase the branching factor and
may deteriorate the performance of the program that uses them.
Figure 4:
An illustration of the difference between the any-to-better
and minimum-to-better selection methods.
The X axis stands for the sequence of states along the solution path.
The Y axis stands for the heuristic value of states.
The any-to-better strategy acquires more (possibly unnecessary) macros.
Figure 4 illustrates the
difference between the two strategies. While the minimum-to-better
strategy will acquire one macro that smooths the path, the any-to-better strategy acquires many macros that will unnecessarily
increase the branching factor.
Shaul Markovitch
1998-07-21