This thesis will contribute to the state of the art in machine
learning, reinforcement learning, heuristic search, and combinatorial
optimization. Its specific contributions will include:
- A reference source for researchers interested in the automatic
learning and tuning of evaluation functions.
- A new algorithm for learning approximate value functions in
large, acyclic Markov decision problems (a toy sketch of this
setting follows the list).
- Formulations for posing general combinatorial optimization
problems as value function approximation (VFA) tasks, extending
[Zhang1996].
- A new algorithm for using prediction learning to bootstrap
search algorithms.
- Development of new techniques, based on memory-based stochastic
optimization, for direct meta-optimization of evaluation functions
(a second sketch follows the list).
- An empirical comparison, on several large-scale applications, of
the VFA and direct-optimization approaches.
- A software product for public distribution that implements the
major algorithms for evaluation function learning.
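To make the value-function-learning contributions concrete, the following is a minimal, purely illustrative Python sketch, not code from the thesis: it invents a toy acyclic "staircase" MDP with known dynamics, approximates its value function with hand-picked polynomial features, and fits the weights by sampled one-step Bellman backups. The problem, the features, and the step size are all assumptions made for this example.

```python
import random

# A toy acyclic "staircase" MDP (invented for illustration): from state i
# the two available actions jump to i+1 or i+2, each with a fixed random
# cost; any state >= N is terminal with value 0.
N = 50
random.seed(0)
cost = {(i, j): random.uniform(1.0, 3.0)
        for i in range(N) for j in (i + 1, i + 2)}

def successors(i):
    return [j for j in (i + 1, i + 2) if (i, j) in cost]

def features(i):
    x = i / N                  # normalize the state index to [0, 1]
    return [1.0, x, x * x]     # hand-picked polynomial features (an assumption)

def v_hat(w, i):
    if i >= N:                 # terminal states have no remaining cost
        return 0.0
    return sum(wk * fk for wk, fk in zip(w, features(i)))

# Fit the weights by stochastic gradient descent on sampled one-step
# Bellman backups: target(i) = min_j [cost(i, j) + V_hat(j)].  The model
# is assumed fully known, so each backup uses exact successor costs.
w = [0.0, 0.0, 0.0]
alpha = 0.05
for sweep in range(2000):
    i = random.randrange(N)    # sample a nonterminal state
    target = min(cost[(i, j)] + v_hat(w, j) for j in successors(i))
    err = target - v_hat(w, i)
    w = [wk + alpha * err * fk for wk, fk in zip(w, features(i))]

# The acyclic structure permits an exact backward sweep for comparison.
V = [0.0] * (N + 2)
for i in range(N - 1, -1, -1):
    V[i] = min(cost[(i, j)] + V[j] for j in successors(i))
for i in (0, N // 2, N - 1):
    print(f"state {i:3d}: exact {V[i]:7.2f}  approx {v_hat(w, i):7.2f}")
```

The exact backward sweep at the end is possible only because the MDP is acyclic, which is precisely what makes such problems a convenient testbed for VFA methods.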
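The direct meta-optimization contribution can be illustrated in the same hedged spirit. The second sketch below treats the performance of a search procedure as an expensive, noisy black-box function of the evaluation-function weights theta, remembers every evaluation it has made, and uses a simple distance-weighted nearest-neighbor model over that memory to decide where to evaluate next. The stand-in objective noisy_performance, the neighborhood size k, and the candidate-sampling scheme are all invented for this example.

```python
import math
import random

random.seed(1)

# Invented stand-in for an expensive, noisy objective: the (negated) loss
# of running a search procedure with evaluation-function weights theta.
# Here it is just a quadratic bowl plus Gaussian noise.
def noisy_performance(theta):
    ideal = [0.7, -0.3]
    loss = sum((t - s) ** 2 for t, s in zip(theta, ideal))
    return -loss + random.gauss(0.0, 0.05)

memory = []  # every (theta, score) evaluation is remembered

def predicted_score(theta, k=5):
    # Memory-based model: distance-weighted average of the scores of the
    # k nearest previously evaluated points.
    if not memory:
        return 0.0
    nearest = sorted((math.dist(theta, t), s) for t, s in memory)[:k]
    weights = [1.0 / (d + 1e-6) for d, _ in nearest]
    return sum(wt * s for wt, (_, s) in zip(weights, nearest)) / sum(weights)

best_theta = [random.uniform(-1.0, 1.0) for _ in range(2)]
best_score = -float("inf")
for step in range(200):
    # Propose candidates near the incumbent, rank them with the cheap
    # memory-based model, and spend the expensive evaluation only on the
    # most promising one.
    cands = [[b + random.gauss(0.0, 0.2) for b in best_theta]
             for _ in range(10)]
    theta = max(cands, key=predicted_score)
    score = noisy_performance(theta)
    memory.append((theta, score))
    if score > best_score:
        best_theta, best_score = theta, score

print("best theta:", [round(t, 3) for t in best_theta],
      " score:", round(best_score, 3))
```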
The following research topics are closely related to this thesis but
nonetheless outside the scope of what I plan to explore:
- Combining learning of the world's transition model with learning
of the evaluation function. Instead, all of the algorithms in this
thesis assume that the dynamics of the world are fully known in
advance.
- Learning approaches based on analyzing the problem-space
operators, such as [Prieditis1993], or based on human-labelled
training data, such as [Tesauro and Sejnowski1989]. All of the
algorithms in this thesis learn from simulation, using function
approximation.
- Theoretical results and analysis. The new algorithms presented
here make no assumptions about the function-approximation scheme
used, so strong theoretical results are unlikely. My focus will
instead be on designing well-motivated algorithms and validating
them empirically.
- Direct policy optimization. A reasonable approach for some
domains is to bypass the evaluation function altogether, and instead
learn a direct mapping from states to actions [Moriarty and Miikkulainen1995]. I
will not consider this approach.