Combinatorial optimization domains, in addition to being more economically important than game domains, have several nice properties that make them amenable to evaluation function learning. First, the simple search techniques in wide use (hillclimbing, simulated annealing) only ever use a single ply of lookahead, making them especially dependent on the evaluation function. Second, the widespread use of manual objective-function tuning clearly indicates the need for and potential benefits of efficient automatic procedures. Finally, like games but unlike most robotic control problems, the complete action model for optimization domains is known. This means we can apply value function approximation algorithms without the added complication of having to learn a model and deal with its inevitable inaccuracies.
Formally, the problem of combinatorial optimization is simply stated: given a finite state space X and an objective function , find an optimal state . Typically, X is huge, and finding an optimal is intractable. However, there are many heuristic algorithms that attempt to exploit f's structure to locate good optima, including hillclimbing, simulated annealing, tabu search, and genetic algorithms. Loosely speaking, these work by imposing a neighborhood relation on the states of X and then searching the graph that results, guided by f.
The effectiveness of these search algorithms clearly depends on the shape of f with respect to the graph structure. If f is riddled with local minima or has large regions of constant value (plateaus), then search-based algorithms are likely to fail. For this reason, the most natural f for a problem--the one which evaluates the actual quantity a human would like to minimize--is often not a good evaluation function to use for guiding search. Looked at another way, f is most useful in search if it indicates not only which states are good, but also which states predictably lead to good states.
In current practice, more complex evaluation functions than f, based on features of the state that help predict the search quality, are manually built and tuned. This part of my thesis considers the possibility of automating this task using value function approximation. I consider two methods: Section 3.2 extends the approach of [Zhang and Dietterich1995], and Section 3.3 introduces a simpler approach based on predicting search outcomes.