This part of the research begins with a simple observation: the performance of a graph-search-based optimization algorithm depends on the state from which search starts. We can express this dependence as a mapping from starting states x to expected search result:

    V^A(x)  =  E[ f(z) | A starts at x ]  =  Σ_{z ∈ X} p_A(x → z) f(z)

where A refers to the search algorithm being used, and p_A(x → z) is the probability that a search starting from x will terminate in state z. V^A(x) evaluates x's promise as a starting state for A. We seek to learn this function using a statistical regression model such as linear regression or a multi-layer perceptron, where states x are represented as real-valued feature vectors. These input features may encode any relevant properties of the domain, including the original objective function f(x) itself.
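To make the regression step concrete, here is a minimal sketch of fitting V^A by ordinary least squares on (feature vector, observed outcome) pairs. The bit-vector domain, the `features()` map, and the synthetic outcomes standing in for real search results are all illustrative assumptions, not the paper's setup.

```python
# Sketch: learn Vhat(x) ~ expected search outcome by least squares on
# (features(x), observed outcome) pairs.  The domain and feature map
# below are toy assumptions; real outcomes would come from running A.
import numpy as np

def features(x):
    # Example feature map: the raw state plus the objective value f(x).
    # Here x is a binary vector and f(x) counts set bits (toy objective).
    f_x = float(np.sum(x))
    return np.append(x.astype(float), f_x)

def fit_value_function(starts, outcomes):
    """Least-squares fit of Vhat: features(x) -> expected final f."""
    Phi = np.array([features(x) for x in starts])      # design matrix
    w, *_ = np.linalg.lstsq(Phi, np.array(outcomes), rcond=None)
    return w

def predict(w, x):
    return float(features(x) @ w)

rng = np.random.default_rng(0)
starts = [rng.integers(0, 2, size=8) for _ in range(200)]
# Fake "search outcomes": f(x) plus noise, standing in for the final
# objective value that algorithm A would reach from each start.
outcomes = [float(np.sum(x)) + rng.normal(0, 0.1) for x in starts]
w = fit_value_function(starts, outcomes)
print(predict(w, np.zeros(8, dtype=int)))  # → 0.0 (all features vanish here)
```

Any regressor could replace the least-squares fit; the only requirement is that it maps feature vectors to scalar outcome predictions.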
Training data for supervised learning of V^A may be readily obtained by performing Monte Carlo simulations of A from random starting points. In addition, if the algorithm A behaves as a Markov chain, then all the intermediate states of each simulated trajectory may also be considered alternate ``starting points'' for that search, and thus used as training data as well. Stochastic hillclimbing trivially meets the Markov requirement: each state x leads stochastically to a neighboring state y for which f(y) < f(x), regardless of the past history of the search; local minima are terminal states of the chain. Simulated annealing, too, is Markovian in an expanded state space that includes the current temperature [van Laarhoven1987].
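The data-harvesting idea above can be sketched as follows. Because hillclimbing is Markovian, every intermediate state on a trajectory can be labeled with that trajectory's final objective value. The objective `f`, the bit-flip neighborhood, and the run counts are toy assumptions for illustration.

```python
# Sketch: harvest training pairs (state, final objective value) from
# stochastic hillclimbing runs.  Every state visited along a trajectory
# becomes a training example, not just the random starting point.
import random

def f(x):
    return sum(x)  # toy objective to minimize: number of set bits

def neighbors(x):
    # Single-bit-flip neighborhood over binary tuples.
    return [x[:i] + (1 - x[i],) + x[i+1:] for i in range(len(x))]

def hillclimb(x, rng):
    """Run stochastic hillclimbing to a local minimum; return the path."""
    path = [x]
    while True:
        better = [y for y in neighbors(x) if f(y) < f(x)]
        if not better:            # local minimum: terminal state of the chain
            return path
        x = rng.choice(better)    # stochastic move to a strictly better neighbor
        path.append(x)

def harvest(n_runs, n_bits, rng):
    """Monte Carlo simulations; every visited state becomes a sample."""
    data = []
    for _ in range(n_runs):
        x0 = tuple(rng.randint(0, 1) for _ in range(n_bits))
        path = hillclimb(x0, rng)
        outcome = f(path[-1])     # objective value at the terminal state
        data.extend((x, outcome) for x in path)
    return data

rng = random.Random(0)
data = harvest(n_runs=5, n_bits=6, rng=rng)
```

On this toy objective every run bottoms out at the all-zeros state, so every label is 0; on a rugged objective the labels would vary with the basin each start falls into.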
In the Markov-chain cases, V^A is simply the value function of the chain: it predicts the eventual expected outcome from every state. Since value functions satisfy the Bellman equations (Equation 3), algorithms more sophisticated than Monte-Carlo simulation with supervised learning are applicable: in particular, TD(λ) and ROUT both apply.
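As one concrete possibility (a sketch, not the paper's implementation), a TD(λ) update with linear function approximation can learn this value function directly from trajectories: every move of the chain has zero reward, and the terminal state supplies a "reward" equal to its objective value, so the learned V predicts the expected final f. The feature vectors, step size, and trace-decay parameter below are assumed toy values.

```python
# Sketch: TD(lambda) with linear function approximation for the value
# function of the search chain (gamma = 1, zero reward per move, and a
# terminal target equal to f at the local minimum reached).
import numpy as np

def td_lambda_episode(w, feats, terminal_f, alpha=0.05, lam=0.8):
    """One TD(lambda) pass over a single trajectory of A.

    feats: feature vectors of the states visited, in order.
    terminal_f: objective value at the trajectory's terminal state.
    """
    e = np.zeros_like(w)                     # eligibility trace
    for t in range(len(feats)):
        phi = feats[t]
        v = w @ phi
        if t + 1 < len(feats):
            target = w @ feats[t + 1]        # bootstrap from the next state
        else:
            target = terminal_f              # terminal value is known exactly
        e = lam * e + phi                    # accumulate trace (gamma = 1)
        w = w + alpha * (target - v) * e     # TD update
    return w

# Toy demo: three states sharing one constant feature; repeated episodes
# drive the prediction toward the terminal value 2.0.
w = np.zeros(1)
traj = [np.array([1.0])] * 3
for _ in range(300):
    w = td_lambda_episode(w, traj, terminal_f=2.0)
```

ROUT would instead select states whose Bellman residual is large and train on their backed-up values; the trajectory data fed to either method is the same.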
The state space X is huge, so we cannot expect our simulations to explore any significant fraction of it. Instead, we must depend on good extrapolation from the function approximator if we are to learn accurately. Specifically, we hope that the function approximator will predict good results for unexplored states which share many features with training states that performed well. If V^A is fairly smooth, this hope is reasonable.