[1]
Martin Stolle and Christopher G. Atkeson.
Transfer of policies based on trajectory libraries.
In Proceedings of the International Conference on Intelligent
Robots and Systems (IROS 2007), 2007.
Recently, libraries of trajectory plans have been shown to be a promising
way of creating policies for difficult problems. However, often it
is not desirable or even possible to create a new library for every
task. We present a method for transferring libraries across tasks,
which allows us to build libraries by learning from demonstration
on one task and apply them to similar tasks. Representing the libraries
in a feature-based space is key to supporting transfer. We also search
through the library to ensure a complete path to the goal is possible.
Results are shown for the Little Dog task. Little Dog is a quadruped
robot that has to walk across rough terrain at reasonably fast speeds.
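
As a rough illustration of the feature-based representation that enables transfer, the sketch below (Python; the data layout, the feature function, and the Euclidean matching are assumptions for illustration, not taken from the paper) builds a library keyed by local features and looks up actions for a new task in that feature space.

    import numpy as np

    def build_feature_library(demonstrations, feature_fn):
        # demonstrations: list of trajectories, each a list of (state, action)
        # pairs recorded on the source task; feature_fn maps a raw state to a
        # vector of local features (e.g. terrain heights around the robot).
        return [(feature_fn(s), a) for traj in demonstrations for (s, a) in traj]

    def feature_lookup(feature_library, feature_fn, query_state):
        # Match in local-feature space rather than raw state space, which is
        # what lets a library built on one terrain be reused on a similar one.
        feats = np.array([f for f, _ in feature_library])
        dists = np.linalg.norm(feats - feature_fn(query_state), axis=1)
        return feature_library[int(np.argmin(dists))][1]

As the abstract notes, a complete system would additionally search the library to make sure the selected segments chain into a full path to the goal.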
[2]
Martin Stolle and Christopher G. Atkeson.
Knowledge transfer using local features.
In Proceedings of the IEEE Symposium on Approximate Dynamic
Programming and Reinforcement Learning (ADPRL 2007), 2007.
We present a method for reducing the effort required to compute policies
for tasks based on solutions to previously solved tasks. The key
idea is to use a learned intermediate policy based on local features
to create an initial policy for the new task. In order to further
improve this initial policy, we developed a form of generalized policy
iteration. We achieve a substantial reduction in computation needed
to find policies when previous experience is available.
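
A minimal sketch of the transfer step described here, assuming (purely for illustration) that the intermediate policy is a 1-nearest-neighbor mapping from local features to actions; the subsequent refinement by generalized policy iteration is only indicated by a comment.

    import numpy as np

    def seed_policy_from_source(source_pairs, feature_fn, new_task_states):
        # source_pairs: (state, action) pairs sampled from a policy already
        # computed on a previously solved task.
        feats = np.array([feature_fn(s) for s, _ in source_pairs])
        actions = [a for _, a in source_pairs]
        policy = {}
        for s in new_task_states:
            nearest = int(np.argmin(np.linalg.norm(feats - feature_fn(s), axis=1)))
            policy[s] = actions[nearest]
        # This initial policy would then be refined on the new task, e.g. by
        # alternating policy evaluation and policy improvement (a form of
        # generalized policy iteration).
        return policy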
[3]
Martin Stolle and Christopher G. Atkeson.
Policies based on trajectory libraries.
In Proceedings of the International Conference on Robotics and
Automation (ICRA 2006), 2006.
We present a control approach that uses a library of trajectories
to establish a global control law or policy. This is an alternative
to methods for finding global policies based on value functions using
dynamic programming and also to using plans based on a single desired
trajectory. Our method has the advantage of providing reasonable
policies much faster than dynamic programming can provide an initial
policy. It also has the advantage of providing more robust and global
policies than following a single desired trajectory. Trajectory libraries
can be created for robots with many more degrees of freedom than
dynamic programming can handle, as well as for robots with
discontinuities in their dynamic models. Results are shown for the “Labyrinth”
marble maze, both in simulation and on a real-world version.
The marble maze is a difficult task that requires both fast control
and planning ahead.
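
As a hedged sketch of how a trajectory library can serve as a global policy, the snippet below stores every (state, action) pair from the library and, at run time, replays the action of the nearest stored state; the data layout and Euclidean metric are illustrative assumptions, not the paper's implementation.

    import numpy as np

    class TrajectoryLibraryPolicy:
        def __init__(self, trajectories):
            # trajectories: list of trajectories, each a list of (state, action)
            # pairs, e.g. produced by a planner or recorded from demonstration.
            self.states = np.array([s for traj in trajectories for (s, _) in traj])
            self.actions = [a for traj in trajectories for (_, a) in traj]

        def act(self, state):
            # Nearest-neighbor lookup over all stored states; a real controller
            # would use a task-appropriate metric and an efficient index.
            dists = np.linalg.norm(self.states - np.asarray(state), axis=1)
            return self.actions[int(np.argmin(dists))]

    # Hypothetical use with 2-D states and 2-D actions:
    policy = TrajectoryLibraryPolicy([[([0.0, 0.0], [1.0, 0.0]),
                                       ([1.0, 0.0], [1.0, 0.0])]])
    print(policy.act([0.2, 0.1]))  # -> [1.0, 0.0]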
[4]
Martin Stolle.
Automated discovery of options in reinforcement learning.
Master's thesis, McGill University, February 2004.
AI planning benefits greatly from the use of temporally-extended or
macro-actions. Macro-actions allow for faster and more efficient
planning as well as the reuse of knowledge from previous solutions.
In recent years, a significant amount of research has been devoted
to incorporating macro-actions in learned controllers, particularly
in the context of Reinforcement Learning. One general approach is
the use of options (temporally-extended actions) in Reinforcement
Learning. While the properties of options are well understood, it
is not clear how to find new options automatically. In this thesis
we propose two new algorithms for discovering options and compare
them to one algorithm from the literature. We also contribute a new
algorithm for learning with options which improves on the performance
of two widely used learning algorithms. Extensive experiments are
used to demonstrate the effectiveness of the proposed algorithms.
[5]
Martin Stolle and Doina Precup.
Learning options in reinforcement learning.
Lecture Notes in Computer Science, 2371:212-223, 2002.
Temporally extended actions (e.g., macro actions) have proven very
useful in speeding up learning, ensuring robustness and building
prior knowledge into AI systems. The options framework (Precup, 2000;
Sutton, Precup & Singh, 1999) provides a natural way of incorporating
such actions into reinforcement learning systems, but leaves open
the issue of how good options might be identified. In this paper,
we empirically explore a simple approach to creating options. The
underlying assumption is that the agent will be asked to perform
different goal-achievement tasks in an environment that is otherwise
the same over time. Our approach is based on the intuition that “bottleneck”
states, i.e. states that are frequently visited on system trajectories,
could prove to be useful subgoals (e.g. McGovern & Barto, 2001; Iba,
1989). We present empirical studies of this approach in two gridworld
navigation tasks. One of the environments we explored contains bottleneck
states, and the algorithm indeed finds these states, as expected.
The second environment is an empty gridworld with no obstacles. Although
the environment does not contain bottleneck states, our approach
still finds useful options, which essentially allow the agent to
travel around the environment more quickly.
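
A small sketch of the visit-frequency heuristic described above, assuming tabular states and a set of collected trajectories; the exclusion of start and goal states and the number of subgoals returned are illustrative choices, not the paper's exact procedure.

    from collections import Counter

    def candidate_subgoals(trajectories, start_states, goal_states, num_subgoals=3):
        # Count how often each state is visited across trajectories; the most
        # frequently visited states (excluding starts and goals) are proposed
        # as subgoals for new options.
        counts = Counter()
        for traj in trajectories:
            counts.update(traj)
        excluded = set(start_states) | set(goal_states)
        ranked = [s for s, _ in counts.most_common() if s not in excluded]
        return ranked[:num_subgoals]

    # Hypothetical gridworld trajectories that all pass through a doorway at (2, 3):
    trajs = [[(0, 0), (1, 1), (2, 3), (3, 4), (4, 4)],
             [(0, 1), (1, 2), (2, 3), (3, 3), (4, 4)]]
    print(candidate_subgoals(trajs, start_states=[(0, 0), (0, 1)], goal_states=[(4, 4)]))
    # The doorway state (2, 3) is reported first, as a bottleneck.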
[6]
François Rivest, Martin Stolle, and Thomas Shultz.
LNSC cascade-correlation simulator applet.
WWW, 2001.
This file has been generated by bibtex2html 1.88.