Value Function Approximation Machine Learning Conference Tahoe City, California, July 9, 1995 |
---|
Andrew Moore, chair (Carnegie Mellon University), awm@cs.cmu.edu |
Justin Boyan (Carnegie Mellon University), jab@cs.cmu.edu |
Rich Sutton (University of Massachusetts), rich@cs.umass.edu |
|
Leemon Baird (United States Air Force) |
Satinder Singh (MIT, Brain and Cognitive Sciences) |
John Tsitsiklis (MIT, Lab for Information and Decision Sciences) |
Overview | [see below] |
Andrew Moore, "Value Function Approximation" | [47 slides on 12 pages] |
Christopher Atkeson, "Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming" | [6 pages] |
Leemon Baird, "Residual Algorithms: Reinforcement Learning with Function Approximation" | [9 pages] ; [19 slides] |
Justin Boyan and Andrew Moore, "Robust Value Function Approximation by Working Backwards" | [6 pages] |
Bob Crites, "A Case Study from the Elevator Dispatching Domain" | [14 slides] |
Wei Zhang and Tom Dietterich, "Value Function Approximations and Job-Shop Scheduling" | [12 pages] |
Stuart Dreyfus, "AHC for Stochastic Minimum Cost Path Problems Using Neural Net Function Approximators" | [4 pages] |
Geoffrey Gordon, "Online Fitted Reinforcement Learning" | [3 pages] |
Michael Littman, "Solving Partially Observable Markov Decision Processes via VFA" | [14 slides] |
Richard Sutton, "On the Virtues of Linear Learning and Trajectory Distributions" | [1 page] |
These proceedings have been printed as Carnegie Mellon School of Computer Science Technical Report CMU-CS-95-206. For information on ordering, see the CMU Tech Report archive or email reports@cs.cmu.edu.
The key computational idea underlying reinforcement learning is the iterative approximation of the value function---the mapping from states (or state-action pairs) to an estimate of the long-term future reward obtainable from that state. For large problems, the approximation of the value function must involve generalization from examples to reduce memory requirements and training time. Moreover, generalization may help beat the curse of dimensionality for problems in which the complexity of the value function increases sub-exponentially with the number of state variables. Generalizing function approximators have been used effectively in reinforcement learning as far back as Samuel's checker player, which used a linear approximator, and Michie and Chambers' BOXES system, which used state aggregation. Tesauro's TD-Gammon, which used a backpropagation network, provides a tantalizing recent demonstration of just how effective this can be.However, almost all of the theory of reinforcement learning is dependent on the assumption of tabular representations of the value function, for which generalization is impossible. Moreover, several researchers have argued that there are serious hazards intrinsic to the approximation of value functions by reinforcement learning methods. It is an important and still unclear question as to just how serious these hazards might be. In this workshop we surveyed the substantial recent and ongoing work pertaining to this question, and identified the crucial outstanding issues.
If existing theory of reinforcement learning no longer applies when an approximate value function is learned, what are we to do? The workshop explored the problem and the following possible responses, among others:
- ignore the problem and proceed empirically, perhaps discovering as we create applications that some seem to work and some don't. A danger with this approach is that successes will be reported loudly and failures whispered quietly, and we may become a community of tweakers.
- analyze what properties function approximators need in order to work well with current reinforcement learning algorithms: are there function approximators specially suited to reinforcement learning?
- invent new reinforcement learning methods that are specifically designed to work well with function approximators.
- invent new theory for value function approximation. What bounds can be placed on the errors? Can particular function approximators give us better bounds or stability guarantees? Can online training get us better bounds or stability guarantees?
- learn from the literature of other fields:
- optimal control theory and LQG regulation
- heuristic function generation techniques in computer game playing
- operations research and differential games
The papers contained in this report represent the majority of the topics presented at the workshop. They demonstrate the diversity of approaches to value function approximation, and offer many directions for continued research in this important area.