This part of the research begins with a simple observation: the performance of a graph-search-based optimization algorithm depends on the state from which search starts. We can express this dependence as a mapping from starting states $x$ to expected search results:
$$ V^A(x) \;=\; \sum_{z \in X} p_A(z \mid x)\, f(z), $$
where $A$ refers to the search algorithm being used, and $p_A(z \mid x)$ is the probability that a search starting from $x$ will terminate
in state $z$.
$V^A(x)$ evaluates $x$'s promise as a starting state
for A. We seek to learn this function using a statistical
regression model such as linear regression or multi-layer perceptrons,
where states x are represented as real-valued feature vectors.
These input features may encode any relevant properties of the domain,
including the original objective function f(x) itself.
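As a minimal sketch of this representation, the following shows one way a state might be featurized and scored by a linear model. The toy objective, the specific features, and the bit-vector state encoding are all hypothetical, chosen only to illustrate that features may include $f(x)$ itself alongside other domain properties:

```python
def f(x):
    """Hypothetical objective to minimize: the number of 1-bits in x."""
    return sum(x)

def features(x):
    """Real-valued feature vector for state x: a bias term, the
    objective f(x) itself, and one illustrative domain property
    (here, the number of adjacent equal bits)."""
    adjacent_equal = sum(1 for a, b in zip(x, x[1:]) if a == b)
    return [1.0, float(f(x)), float(adjacent_equal)]

def predict(w, x):
    """Linear-regression approximation of V^A(x) as w . features(x)."""
    return sum(wi * fi for wi, fi in zip(w, features(x)))
```

A multi-layer perceptron would simply replace `predict` with a learned nonlinear map over the same feature vectors.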
Training data for supervised learning of $V^A$ may be readily
obtained by performing Monte Carlo simulations of A from random
starting points. In addition, if the algorithm A behaves as a
Markov chain, then all the intermediate states of each simulated
trajectory may also be considered alternate ``starting points'' for
that search, and thus used as training data as well. Stochastic
hillclimbing trivially meets the Markov requirement: each state $x$
leads stochastically to a neighboring state $y$ for which $f(y) <
f(x)$, regardless of the past history of the search; local minima are
terminal states of the chain. Simulated annealing, too, is Markovian
in the expanded state space
[van Laarhoven1987].
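The data-collection procedure above can be sketched for the stochastic-hillclimbing case. The domain below (bit-vector states, a bit-counting objective, single-bit-flip neighbors) is a hypothetical stand-in; the point is that every intermediate state of a trajectory is labeled with that trajectory's final outcome:

```python
import random

def stochastic_hillclimb(x, f, neighbors, rng):
    """Run stochastic hillclimbing from x and return the full trajectory.
    Each step moves to a uniformly random strictly-improving neighbor;
    the walk stops at a local minimum (a terminal state of the chain)."""
    traj = [x]
    while True:
        better = [y for y in neighbors(x) if f(y) < f(x)]
        if not better:
            return traj
        x = rng.choice(better)
        traj.append(x)

def collect_training_data(f, neighbors, random_state, n_runs, rng):
    """Monte Carlo simulation of the search from random starting points.
    Because hillclimbing is Markovian, every intermediate state is also
    treated as a 'starting point' and paired with the final outcome f(z)."""
    data = []
    for _ in range(n_runs):
        traj = stochastic_hillclimb(random_state(rng), f, neighbors, rng)
        outcome = f(traj[-1])
        data.extend((x, outcome) for x in traj)
    return data

# Hypothetical toy domain: 5-bit states, f counts 1-bits, neighbors flip one bit.
def toy_neighbors(x):
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]
```

The `(state, outcome)` pairs produced here are exactly the supervised training set described in the text.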
In the Markov-chain cases, $V^A$ is simply the value function of the
chain: it predicts the eventual expected outcome from every state.
Since value functions satisfy the Bellman equations
(Equation 3), algorithms more sophisticated than
Monte-Carlo simulation with supervised learning are applicable: in
particular, TD($\lambda$) and ROUT both apply.
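A minimal sketch of how a TD($\lambda$) pass over one completed trajectory might look, assuming a linear value function $V(x) = w \cdot \phi(x)$ with the terminal state's target set to its objective value $f(z)$ (the step size, trace decay, and `features` map are all illustrative assumptions, not the paper's settings):

```python
def td_lambda_update(w, traj, f, features, alpha=0.1, lam=0.8):
    """One TD(lambda) sweep over a finished search trajectory, updating
    the weights w of a linear value function V(x) = w . features(x).
    Intermediate transitions carry zero reward; the terminal state's
    target is f(z), the objective value where the search stopped."""
    e = [0.0] * len(w)                     # eligibility trace
    for t, x in enumerate(traj):
        phi = features(x)
        v = sum(wi * pi for wi, pi in zip(w, phi))
        if t + 1 < len(traj):
            phi_next = features(traj[t + 1])
            v_next = sum(wi * pi for wi, pi in zip(w, phi_next))
            delta = v_next - v             # TD error on a zero-reward step
        else:
            delta = f(x) - v               # terminal target: f(z)
        e = [lam * ei + pi for ei, pi in zip(e, phi)]
        for i in range(len(w)):
            w[i] += alpha * delta * e[i]
    return w
```

Unlike plain Monte-Carlo regression against final outcomes, this update bootstraps from the predictions at successive states, which is what the Bellman-equation structure makes possible.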
The state space $X$ is huge, so we cannot expect our simulations to
explore any significant fraction of it. Instead, we must depend on
good extrapolation from the function approximator if we are to learn
accurately. Specifically, we hope that the function
approximator will predict good results for unexplored states which
share many features with training states that performed well. If
$V^A$ is fairly smooth, this hope is reasonable.