Abstract
We contribute Policy Reuse, a technique for improving a reinforcement
learner with guidance from similar policies learned in the past. Our
method uses the past policies in a novel way, as a probabilistic bias
under which the learner faces three choices: exploiting the policy
currently being learned, exploring random unexplored actions, and
exploiting past policies. We introduce the algorithm and its major
components: an exploration strategy that incorporates the new reuse
bias, and a similarity metric that estimates the similarity of past
policies with respect to a new one. We provide empirical results
demonstrating that Policy Reuse improves the learning performance over
different strategies that learn without reuse. Policy Reuse further
contributes to learning the structure of a domain. Interestingly,
and almost as a side effect, Policy Reuse identifies classes of
similar policies, revealing a basis of "eigen-policies" of the
domain. In general, Policy Reuse contributes to the overall goal of
lifelong reinforcement learning, as (i) it incrementally builds a
policy library; (ii) it provides a mechanism to reuse past policies;
and (iii) it learns an abstract domain structure in terms of
eigen-policies of the domain.
This is joint work with Prof. Manuela Veloso.
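To illustrate the three-way exploration choice described in the abstract, below is a minimal sketch of a reuse-biased episode loop on top of tabular Q-learning. The environment interface (env.reset, env.step, env.actions), the parameter names (psi, nu, epsilon), and the decay schedule are illustrative assumptions, not details taken from the talk; selection among multiple past policies in the library is omitted for brevity.

    import random
    from collections import defaultdict

    def reuse_biased_episode(env, Q, past_policy, psi=0.9, nu=0.95,
                             epsilon=0.1, alpha=0.1, gamma=0.95, max_steps=100):
        """Run one learning episode with a probabilistic reuse bias.

        At each step the learner faces three choices: exploit a past policy
        (with probability psi), exploit the ongoing learned policy (greedy on
        Q), or explore a random action.
        """
        state = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            if random.random() < psi:                  # exploit a past policy
                action = past_policy[state]
            elif random.random() > epsilon:            # exploit the ongoing policy
                action = max(env.actions, key=lambda a: Q[(state, a)])
            else:                                      # explore a random action
                action = random.choice(env.actions)

            next_state, reward, done = env.step(action)

            # Standard Q-learning update of the ongoing policy
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

            total_reward += reward
            state = next_state
            psi *= nu                                  # let the reuse bias decay over time
            if done:
                break
        return total_reward

    # Usage sketch (hypothetical environment):
    # Q = defaultdict(float)
    # returns = [reuse_biased_episode(env, Q, past_policy) for _ in range(100)]
    # The average return obtained while reusing a given past policy can serve
    # as one possible estimate of its similarity to the task being learned.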