
Direct Meta-Optimization of Evaluation Functions

 

Standing in contrast to evaluation function learning methods based on approximating the theoretically-optimal value function $V^*$ are what I call direct meta-optimization methods. Such methods assume a fixed parametric form for the evaluation function and optimize it directly with respect to the ultimate objective, sampled by Monte Carlo simulation. In symbols, given an evaluation function $\tilde{V}(x;\mathbf{w})$ parametrized by weights $\mathbf{w}$, we seek to learn $\mathbf{w}$ by directly optimizing the meta-objective function

\[ J(\mathbf{w}) \;=\; E\bigl[\,\mbox{final cost of a complete simulation run guided by } \tilde{V}(\cdot\,;\mathbf{w})\,\bigr]. \]

The functions $\tilde{V}$ learned by such methods are not constrained by the Bellman equations: the evaluations they produce for any given state have no semantic interpretation akin to the definition of $V^*$ in Equation 2. The lack of such constraints means that less information for training the function can be gleaned from a simulation run. The temporal-difference goal of explicitly caching values from lookahead search into the static evaluation function is discarded; only the final costs of completed simulation runs are available. For these reasons, the reinforcement-learning community has largely ignored the direct meta-optimization approach.
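To make the procedure concrete, the following is a minimal sketch of direct meta-optimization, written under two assumptions not specified in this section: a hypothetical routine simulate_run(w) that performs one complete simulation run guided by $\tilde{V}(\cdot\,;\mathbf{w})$ and returns its final cost, and a simple stochastic hill-climbing search standing in for whatever derivative-free optimizer is actually used.

    import random

    def estimate_meta_objective(simulate_run, w, n_runs=20):
        # Monte Carlo estimate of J(w): average the final costs of n_runs
        # complete simulation runs guided by the evaluation function with
        # weights w.  No intermediate (Bellman-style) signal is extracted.
        return sum(simulate_run(w) for _ in range(n_runs)) / n_runs

    def direct_meta_optimize(simulate_run, w0, iterations=200, step=0.1, n_runs=20):
        # Optimize the weights directly against the sampled meta-objective
        # using stochastic hill climbing; any derivative-free optimizer
        # (simulated annealing, population search, ...) could be substituted.
        best_w = list(w0)
        best_j = estimate_meta_objective(simulate_run, best_w, n_runs)
        for _ in range(iterations):
            candidate = [wi + random.gauss(0.0, step) for wi in best_w]
            j = estimate_meta_objective(simulate_run, candidate, n_runs)
            if j < best_j:  # lower expected final cost is better
                best_w, best_j = candidate, j
        return best_w, best_j

Because only the final cost of each run is observed, the optimizer must treat $J(\mathbf{w})$ as a black box; the sampling noise in the Monte Carlo estimate is the price paid for discarding the intermediate training signal that temporal-difference methods exploit.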

Nevertheless, I want to give this approach a fair comparison against VFA methods, for several reasons:




