I believe that the computational requirements of the meta-optimization approach can be significantly reduced by using stochastic optimization techniques being developed in our group [Moore and Schneider1996]. These techniques are designed to optimize functions whose samples are both expensive to gather and potentially very noisy. The meta-objective function M certainly fits this characterization, since sampling M means performing a complete run of simulated annealing (or whatever search algorithm is being optimized) and reporting the final result.
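To make the cost structure concrete, the following is a minimal sketch (with hypothetical names such as run_simulated_annealing and sample_M, and a placeholder return value standing in for the real annealing result) of what one sample of M entails: each evaluation of M at a parameter setting is a complete annealing run, and averaging several runs reduces the noise only at proportionally greater cost.

```python
import random

def run_simulated_annealing(theta, problem, seed=None):
    """Stand-in for a full annealing run configured by the parameter vector theta.
    In practice this is the expensive, stochastic search whose final cost we record."""
    rng = random.Random(seed)
    # ... anneal on `problem` using the schedule / move weights encoded in theta ...
    return rng.gauss(sum(theta), 1.0)   # placeholder for the final solution cost

def sample_M(theta, problem, runs=1):
    """Return a noisy estimate of M(theta); runs > 1 trades extra time for less noise."""
    return sum(run_simulated_annealing(theta, problem) for _ in range(runs)) / runs
```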
Memory-based stochastic optimization uses the results of all searches conducted thus far to build a global model of M. This model is trained with Bayesian locally-weighted polynomial regression, a flexible function approximator that also behaves well while the data set is still small. Whereas Powell's method requires noise-free samples (and thus forced Ochotta to perform 200 annealing runs for each parameter setting), Bayesian regression can extract meaningful gradient information even before there is high confidence in the model's accuracy.
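A minimal sketch of such a model is given below. It uses locally-weighted linear regression with a small ridge prior so that predictions remain sensible while the data set is small; the confidence estimate is a rough heuristic, not the full Bayesian treatment of [Moore and Schneider1996], and the function name lwr_predict is chosen here purely for illustration.

```python
import numpy as np

def lwr_predict(X, y, x_query, bandwidth=1.0, prior=1e-2):
    """Predict M(x_query) and a rough standard error from past samples (X, y)."""
    X = np.asarray(X, float); y = np.asarray(y, float)
    # Gaussian kernel weights: nearby past runs influence the local fit the most.
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2.0 * bandwidth ** 2))
    A = np.hstack([np.ones((len(X), 1)), X])              # degree-1 polynomial basis
    q = np.concatenate(([1.0], x_query))
    AtW = A.T * w                                         # A^T W for the weighted fit
    cov = np.linalg.inv(AtW @ A + prior * np.eye(A.shape[1]))
    beta = cov @ (AtW @ y)                                # weighted ridge solution
    mean = float(q @ beta)
    resid = y - A @ beta
    noise = np.sum(w * resid ** 2) / max(np.sum(w), 1e-9) # weighted residual variance
    std = float(np.sqrt(noise * (1.0 + q @ cov @ q)))     # heuristic predictive spread
    return mean, std
```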
The learned model can be used in several ways to recommend the next parameter setting to sample. The specific algorithms I will explore are PMAX and IEMAX [Moore and Schneider1996]. PMAX's approach is simple: it always recommends sampling the current optimum of the model. IEMAX, patterned after Kaelbling's IE algorithm [Kaelbling1990], recommends sampling the parameter setting which optimizes the ``optimistic model,'' i.e. the 95th-percentile values of the model's predicted confidence intervals. In an empirical test on synthetic unimodal five-dimensional optimization problems, PMAX converged to the optimum more quickly than IEMAX, but IEMAX's more persistent exploration will be helpful if M has several local optima.
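The two selection rules can be sketched as follows, reusing the hypothetical lwr_predict model above over a finite set of candidate settings and assuming M is a cost to be minimized, so the "optimistic model" corresponds to the lower confidence bound (the analogue of the 95th-percentile bound used when maximizing).

```python
import numpy as np

def next_setting(X, y, candidates, rule="PMAX", z=1.645):
    """Recommend the next parameter setting to sample under PMAX or IEMAX."""
    best, best_score = None, np.inf
    for theta in candidates:
        mean, std = lwr_predict(np.asarray(X), np.asarray(y), np.asarray(theta, float))
        if rule == "PMAX":
            score = mean                # sample the current optimum of the model
        else:                           # IEMAX: optimize the optimistic model
            score = mean - z * std      # ~95% optimistic value for a cost
        if score < best_score:
            best, best_score = theta, score
    return best
```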
The approaches based on locally-weighted regression are best suited for optimizing evaluation functions with a relatively small number of parameters. For learning more complicated evaluation functions, one possibility is to use a cascaded neural network architecture [Fahlman and Lebiere1990], which can build a function of p parameters without ever having to optimize more than a small subset of them at a time. Another possibility is to replace the locally-weighted regression model of M with a global polynomial regression better suited to high-dimensional inputs; a sketch of this second alternative follows. Time permitting, I will investigate both of these approaches.
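This is only an illustrative sketch of the second alternative, not part of the proposed method: a single global quadratic model of M fit by ridge-regularized least squares, which requires just one fit over all samples rather than a local fit per query (the quadratic feature count still grows with the square of the dimension, so a lower-degree fit may be preferable for very many parameters).

```python
import numpy as np

def quad_features(X):
    """Map each parameter vector to [1, x_i, x_i * x_j] quadratic features."""
    X = np.asarray(X, float)
    d = X.shape[1]
    feats = [np.ones((len(X), 1)), X]
    feats += [(X[:, i] * X[:, j])[:, None] for i in range(d) for j in range(i, d)]
    return np.hstack(feats)

def fit_global_model(X, y, prior=1e-3):
    """Fit one global quadratic model of M over all samples gathered so far."""
    A = quad_features(X)
    return np.linalg.solve(A.T @ A + prior * np.eye(A.shape[1]), A.T @ np.asarray(y, float))

def predict_global(beta, thetas):
    """Predict M at each candidate parameter setting in thetas."""
    return quad_features(thetas) @ beta
```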