There are a variety of reinforcement-learning techniques that work
effectively on a range of small problems, but very few of them
scale well to larger problems. This is not because
researchers have done a bad job of inventing learning techniques, but
because it is very difficult to solve arbitrary problems in the
general case. In order to solve highly complex problems, we must give
up tabula rasa learning techniques and begin to incorporate bias
that will give leverage to the learning process.
The necessary bias can come in a variety of forms, including the
following:
- shaping: The technique of shaping is used in training
animals [45]; a teacher presents very simple problems to solve
first, then gradually exposes the learner to more complex problems.
Shaping has been used in supervised-learning systems, and can be used
to train hierarchical reinforcement-learning systems from the bottom
up [59], and to alleviate problems of delayed reinforcement by
decreasing the delay until the problem is well
understood [37, 38]. (See the first sketch after this list.)
- local reinforcement signals: Whenever possible, agents should be
given reinforcement signals that are local. In applications in which
it is possible to compute a gradient, rewarding the agent for taking
steps up the gradient, rather than just for achieving the final goal,
can speed learning significantly [73]. (See the second sketch after
this list.)
- imitation: An agent can learn by "watching" another agent perform
the task [59]. For real robots, this requires perceptual
abilities that are not yet available. An alternative strategy is to
have a human supply appropriate motor commands to a robot through a
joystick or steering wheel [89]. (See the third sketch after
this list.)
- problem decomposition: Decomposing a huge learning problem into a
collection of smaller ones, and providing useful reinforcement
signals for the subproblems, is a very powerful technique for biasing
learning. Most interesting examples of robotic reinforcement learning
employ this technique to some extent [28]. (See the fourth
sketch after this list.)
- reflexes: One thing that keeps agents that know nothing from
learning anything is that they have a hard time even finding the
interesting parts of the space; they wander around at random, never
getting near the goal, or they are always "killed" immediately.
These problems can be ameliorated by programming a set of "reflexes"
that cause the agent to act initially in some reasonable
way [73, 107]. These reflexes can eventually be overridden by
more detailed and accurate learned knowledge, but they at least keep
the agent alive and pointed in the right direction while it is trying
to learn. Recent work by Millán [78] explores the use of
reflexes to make robot learning safer and more efficient. (See the
final sketch after this list.)
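The sketches below are illustrative, not implementations from the
cited literature. First, shaping as a training schedule: a minimal
tabular Q-learner on a hypothetical corridor task, trained on
progressively longer corridors so that reward is easy to find at
first and the learned values carry over to harder instances. The
task, the schedule, and all parameters here are assumptions made for
illustration.

    import random

    def episode(q, length, alpha=0.5, gamma=0.95, eps=0.1):
        """One epsilon-greedy Q-learning episode on a corridor task:
        states 0..length, actions -1 (left) and +1 (right),
        reward 1.0 only upon reaching cell `length`."""
        s = 0
        while s != length:
            if random.random() < eps:
                a = random.choice((-1, 1))          # explore
            else:                                   # greedy, random ties
                vals = {b: q.get((s, b), 0.0) for b in (-1, 1)}
                best = max(vals.values())
                a = random.choice([b for b, v in vals.items() if v == best])
            s2 = max(0, min(length, s + a))
            r = 1.0 if s2 == length else 0.0        # sparse, goal-only reward
            target = r + gamma * max(q.get((s2, b), 0.0) for b in (-1, 1))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))
            s = s2

    # Shaping: one shared Q-table, trained on harder and harder versions
    # of the task. A long corridor is very slow to solve from scratch with
    # goal-only reward, but is reached easily via the shorter ones first.
    q = {}
    for length in (2, 4, 8, 16):                    # hypothetical schedule
        for _ in range(100):
            episode(q, length)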
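Second, local reinforcement signals. A minimal sketch, assuming a
task where the distance to the goal is computable: the sparse
goal-only reward in the corridor example above is replaced with a
dense reward for each step of progress up the gradient.

    def local_reward(s, s2, goal):
        """Dense, local reward: positive for a step toward the goal,
        negative for a step away. Substituting this for the sparse
        `r = 1.0 if s2 == length else 0.0` line in the sketch above
        gives informative feedback on every step, not only at the end."""
        return abs(goal - s) - abs(goal - s2)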
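Third, imitation. A minimal sketch of one simple way to exploit a
demonstration (an illustrative scheme, not the cited systems):
recorded (state, action) pairs, e.g. from a human driving the robot
with a joystick, seed the Q-table with a small bonus so that early
greedy choices follow the demonstrated behavior.

    def seed_from_demonstration(q, trace, bonus=0.1):
        """Bias a fresh Q-table toward a teacher's (state, action)
        trace. The bonus only breaks ties early on; genuinely learned
        values eventually override it."""
        for s, a in trace:
            q[(s, a)] = max(q.get((s, a), 0.0), bonus)

    teacher_trace = [(0, 1), (1, 1), (2, 1), (3, 1)]  # hypothetical demo
    q = {}
    seed_from_demonstration(q, teacher_trace)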
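Fourth, problem decomposition. A minimal sketch (an illustrative
scheme that reuses the `episode` helper from the shaping sketch): a
long task is split into two subtasks, each learned with its own
Q-table and its own local reinforcement signal, rather than by one
monolithic learner rewarded only at the very end.

    # Hypothetical decomposition of a "reach the far door" task into
    # two legs, each an independently learnable corridor subproblem
    # with its own goal and its own reward.
    subtasks = {
        "reach_waypoint":   {"q": {}, "goal": 4},
        "waypoint_to_door": {"q": {}, "goal": 8},
    }
    for name, task in subtasks.items():
        for _ in range(100):
            episode(task["q"], task["goal"])  # `episode` from the first sketch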
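Finally, reflexes. A minimal sketch (a hypothetical scheme, not
Millán's method): a hand-coded reflex supplies the action whenever
the learned Q-values express no preference, so the untrained agent
still behaves sensibly; once learning differentiates the actions, the
learned policy overrides the reflex.

    def act(q, s, reflex, actions=(-1, 1)):
        """Choose the greedy learned action, falling back on a
        hand-coded reflex while the Q-values at state `s` are
        still uninformative."""
        vals = [q.get((s, a), 0.0) for a in actions]
        if max(vals) == min(vals):     # learner has no opinion yet
            return reflex(s)           # reflex keeps the agent "alive"
        return actions[vals.index(max(vals))]

    # Example reflex for the corridor task: always head toward the goal.
    print(act({}, 0, lambda s: 1))     # -> 1, supplied by the reflex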
With appropriate biases, supplied by human programmers or teachers,
complex reinforcement-learning problems will eventually be solvable.
There is still much work to be done, and many interesting questions
remain, both for learning techniques themselves and, especially, for
methods of approximating, decomposing, and incorporating bias into
problems.