This thesis will contribute to the state of the art in machine
learning, reinforcement learning, heuristic search, and combinatorial
optimization. Its specific contributions will include:
- A reference source for researchers interested in the automatic
learning and tuning of evaluation functions.
- A new algorithm for learning approximate value functions in
large, acyclic Markov decision problems (a toy sketch of this
setting follows the list).
- Formulations for posing general combinatorial optimization
problems as value function approximation (VFA) tasks, extending
[Zhang1996].
- A new algorithm for using prediction learning to bootstrap
search algorithms.
- Development of new techniques, based on memory-based stochastic
optimization, for direct meta-optimization of evaluation functions
(a second sketch follows the list).
- An empirical comparison, on several large-scale applications, of
the VFA and direct-optimization approaches.
- A software product for public distribution that implements the
major algorithms for evaluation function learning.
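To make the value-function-learning contributions concrete, the following is a minimal, purely illustrative Python sketch, not code from the thesis: it invents a toy acyclic "staircase" MDP with known dynamics, approximates its value function with hand-picked polynomial features, and fits the weights by sampled one-step Bellman backups. The problem, the features, and the step size are all assumptions made for this example.

```python
import random

# A toy acyclic "staircase" MDP (invented for illustration): from state i
# the two available actions jump to i+1 or i+2, each with a fixed random
# cost; any state >= N is terminal with value 0.
N = 50
random.seed(0)
cost = {(i, j): random.uniform(1.0, 3.0)
        for i in range(N) for j in (i + 1, i + 2)}

def successors(i):
    return [j for j in (i + 1, i + 2) if (i, j) in cost]

def features(i):
    x = i / N                  # normalize the state index to [0, 1]
    return [1.0, x, x * x]     # hand-picked polynomial features (an assumption)

def v_hat(w, i):
    if i >= N:                 # terminal states have no remaining cost
        return 0.0
    return sum(wk * fk for wk, fk in zip(w, features(i)))

# Fit the weights by stochastic gradient descent on sampled one-step
# Bellman backups: target(i) = min_j [cost(i, j) + V_hat(j)].  The model
# is assumed fully known, so each backup uses exact successor costs.
w = [0.0, 0.0, 0.0]
alpha = 0.05
for sweep in range(2000):
    i = random.randrange(N)    # sample a nonterminal state
    target = min(cost[(i, j)] + v_hat(w, j) for j in successors(i))
    err = target - v_hat(w, i)
    w = [wk + alpha * err * fk for wk, fk in zip(w, features(i))]

# The acyclic structure permits an exact backward sweep for comparison.
V = [0.0] * (N + 2)
for i in range(N - 1, -1, -1):
    V[i] = min(cost[(i, j)] + V[j] for j in successors(i))
for i in (0, N // 2, N - 1):
    print(f"state {i:3d}: exact {V[i]:7.2f}  approx {v_hat(w, i):7.2f}")
```

The exact backward sweep at the end is possible only because the MDP is acyclic, which is precisely what makes such problems a convenient testbed for VFA methods.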
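The direct meta-optimization contribution can be illustrated in the same hedged spirit. The second sketch below treats the performance of a search procedure as an expensive, noisy black-box function of the evaluation-function weights theta, remembers every evaluation it has made, and uses a simple distance-weighted nearest-neighbor model over that memory to decide where to evaluate next. The stand-in objective noisy_performance, the neighborhood size k, and the candidate-sampling scheme are all invented for this example.

```python
import math
import random

random.seed(1)

# Invented stand-in for an expensive, noisy objective: the (negated) loss
# of running a search procedure with evaluation-function weights theta.
# Here it is just a quadratic bowl plus Gaussian noise.
def noisy_performance(theta):
    ideal = [0.7, -0.3]
    loss = sum((t - s) ** 2 for t, s in zip(theta, ideal))
    return -loss + random.gauss(0.0, 0.05)

memory = []  # every (theta, score) evaluation is remembered

def predicted_score(theta, k=5):
    # Memory-based model: distance-weighted average of the scores of the
    # k nearest previously evaluated points.
    if not memory:
        return 0.0
    nearest = sorted((math.dist(theta, t), s) for t, s in memory)[:k]
    weights = [1.0 / (d + 1e-6) for d, _ in nearest]
    return sum(wt * s for wt, (_, s) in zip(weights, nearest)) / sum(weights)

best_theta = [random.uniform(-1.0, 1.0) for _ in range(2)]
best_score = -float("inf")
for step in range(200):
    # Propose candidates near the incumbent, rank them with the cheap
    # memory-based model, and spend the expensive evaluation only on the
    # most promising one.
    cands = [[b + random.gauss(0.0, 0.2) for b in best_theta]
             for _ in range(10)]
    theta = max(cands, key=predicted_score)
    score = noisy_performance(theta)
    memory.append((theta, score))
    if score > best_score:
        best_theta, best_score = theta, score

print("best theta:", [round(t, 3) for t in best_theta],
      " score:", round(best_score, 3))
```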
The following research topics are closely related to this thesis but
nonetheless outside the scope of what I plan to explore:
- Combining learning of the world's transition model with learning
of the evaluation function. Instead, all of the algorithms in this
thesis assume that the dynamics of the world are fully known in
advance.
- Learning approaches based on analyzing the problem-space
operators, such as [Prieditis1993], or based on human-labelled
training data, such as [Tesauro and Sejnowski1989]. All of the
algorithms in this thesis learn from simulation, using function
approximation.
- Theoretical results and analysis. The new algorithms presented
here make no assumptions about the function-approximation scheme
used, so strong theoretical results are unlikely. My focus will
instead be on designing well-motivated algorithms and validating
them empirically.
- Direct policy optimization. A reasonable approach for some
domains is to bypass the evaluation function altogether, and instead
learn a direct mapping from states to actions [Moriarty and Miikkulainen1995]. I
will not consider this approach.