Learning to Predict

This part of the research begins with a simple observation: the performance of a graph-search-based optimization algorithm depends on the state from which the search starts. We can express this dependence as a mapping from starting states x to the expected search outcome:

\begin{displaymath}
V^A(x) \;=\; \sum_{z \in X} p_A(x,z)\, f(z)
\end{displaymath}

where A refers to the search algorithm being used, and $p_A(x,z)$ is the probability that a search starting from x will terminate in state z. $V^A(x)$ evaluates x's promise as a starting state for A. We seek to learn this function using a statistical regression model such as linear regression or a multi-layer perceptron, with states x represented as real-valued feature vectors. These input features may encode any relevant properties of the domain, including the original objective function value f(x) itself.
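As a concrete illustration, $V^A(x)$ can be estimated by Monte Carlo simulation. The sketch below assumes a hypothetical toy domain (bit vectors, with f counting the 1-bits) and stochastic hillclimbing as the algorithm A; both are assumptions for illustration only:

```python
import random

def f(x):
    # Hypothetical toy objective: the number of 1-bits (minimized at all-zeros).
    return sum(x)

def flip(x, i):
    y = list(x)
    y[i] = 1 - y[i]
    return y

def hillclimb(x):
    """The search algorithm A: stochastic hillclimbing. Repeatedly move to
    a uniformly random strictly-improving neighbor; a local minimum is a
    terminal state of the chain (here, the all-zeros state)."""
    x = list(x)
    while True:
        better = [i for i in range(len(x)) if f(flip(x, i)) < f(x)]
        if not better:
            return x
        x = flip(x, random.choice(better))

def V_estimate(x, runs=30):
    """Monte Carlo estimate of V^A(x) = sum_z p_A(x,z) f(z): the average of
    f over terminal states of repeated searches started from x."""
    return sum(f(hillclimb(x)) for _ in range(runs)) / runs
```

For this toy f every search terminates at the all-zeros state, so the estimate is 0 for any start; in a realistic domain with many local minima the estimates would differ across starting states, which is exactly the signal $V^A$ captures.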

Training data for supervised learning of $V^A$ may be readily obtained by performing Monte Carlo simulations of A from random starting points. In addition, if the algorithm A behaves as a Markov chain, then all the intermediate states of each simulated trajectory may also be considered alternate ``starting points'' for that search, and thus used as training data as well. Stochastic hillclimbing trivially meets the Markov requirement: each state x leads stochastically to a neighboring state y for which f(y) < f(x), regardless of the past history of the search; local minima are terminal states of the chain. Simulated annealing, too, is Markovian in the expanded state space of (state, temperature) pairs [van Laarhoven1987].
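The reuse of intermediate states can be sketched under the same hypothetical bit-vector setup as above: since the chain is Markovian, every state visited along a trajectory is paired with that trajectory's final outcome to form a training example. The choice of feature vector here (just [f(x)]) is an assumption for illustration; any domain features would do.

```python
import random

def f(x):
    # Hypothetical toy objective: the number of 1-bits.
    return sum(x)

def flip(x, i):
    y = list(x)
    y[i] = 1 - y[i]
    return y

def trajectory(x):
    """Run stochastic hillclimbing from x, recording every visited state."""
    x = list(x)
    states = [list(x)]
    while True:
        better = [i for i in range(len(x)) if f(flip(x, i)) < f(x)]
        if not better:
            return states
        x = flip(x, random.choice(better))
        states.append(list(x))

def training_pairs(start):
    """One trajectory yields many supervised examples: each intermediate
    state is an alternate 'starting point' whose target is the objective
    value at the trajectory's terminal state."""
    traj = trajectory(start)
    outcome = f(traj[-1])
    return [([f(s)], outcome) for s in traj]
```

A single simulation from a 4-bit start with three 1-bits thus produces four (feature, target) pairs rather than one, substantially enlarging the training set at no extra search cost.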

In the Markov-chain cases, $V^A$ is simply the value function of the chain: it predicts the eventual expected outcome from every state. Since value functions satisfy the Bellman equations (Equation 3), algorithms more sophisticated than Monte Carlo simulation with supervised learning are applicable: in particular, TD($\lambda$) and ROUT both apply.
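A minimal sketch of the TD($\lambda$) alternative, assuming linear function approximation over feature vectors, no discounting, zero intermediate reward, and the objective value at the local optimum as the terminal target (all assumptions for illustration, not the thesis's implementation):

```python
def td_lambda_update(w, phi_traj, final_f, alpha=0.1, lam=0.8):
    """One TD(lambda) pass over a single search trajectory, given as a list
    of feature vectors phi_traj.  The value estimate is the linear form
    V(x) = w . phi(x); the only 'reward' is f at the terminal state."""
    e = [0.0] * len(w)  # eligibility trace, one entry per weight
    for t in range(len(phi_traj)):
        phi = phi_traj[t]
        v = sum(wi * pi for wi, pi in zip(w, phi))
        if t + 1 < len(phi_traj):
            phi_next = phi_traj[t + 1]
            v_next = sum(wi * pi for wi, pi in zip(w, phi_next))
            delta = v_next - v        # TD error; no intermediate reward
        else:
            delta = final_f - v       # terminal target: f(local optimum)
        e = [lam * ei + pi for ei, pi in zip(e, phi)]
        w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
    return w
```

Repeated passes over trajectories drive the prediction at each state toward the expected terminal objective value, without waiting for a full Monte Carlo average.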

The state space X is huge, so we cannot expect our simulations to explore any significant fraction of it. Instead, we must rely on good extrapolation by the function approximator if we are to learn $V^A$ accurately. Specifically, we hope that the function approximator will predict good results for unexplored states that share many features with training states that performed well. If $V^A$ is fairly smooth over the feature space, this hope is reasonable.
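For instance, a minimal least-squares fit (a single feature, purely illustrative) shows how a learned trend generalizes to feature values never seen in training:

```python
def fit_linear(data):
    """Ordinary least-squares fit of V(phi) = a*phi + b to a list of
    (feature, outcome) pairs -- a minimal stand-in for the regression
    models discussed above."""
    n = len(data)
    sx = sum(p for p, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(p * p for p, _ in data)
    sxy = sum(p * y for p, y in data)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def predict(model, phi):
    """Extrapolate the fitted trend to an unexplored feature value."""
    a, b = model
    return a * phi + b
```

Whether such extrapolation is trustworthy depends entirely on the smoothness of $V^A$ over the chosen features, which is exactly the hope stated above.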


Justin A. Boyan
Sat Jun 22 20:49:48 EDT 1996