Peter Stone and Manuela Veloso
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213
{pstone,veloso}@cs.cmu.edu
http://www.cs.cmu.edu/{~pstone,~mmv}
Submitted to the 15th International Conference on Machine Learning, March 1998
In this paper, we present a novel multi-agent learning paradigm called
team-partitioned, opaque-transition reinforcement learning (TPOT-RL).
TPOT-RL introduces the concept of using action-dependent features to
generalize the state space. In our work, we use a learned
action-dependent feature space. TPOT-RL is an effective technique for
enabling a team of agents to learn to cooperate towards a specific
goal. It is an adaptation of traditional RL methods
that is applicable in complex, non-Markovian, multi-agent domains with
large state spaces and limited training opportunities. Such multi-agent
scenarios are opaque-transition: team members are not always in full
communication with one another, and adversaries may affect the
environment. Hence, each learner cannot rely on knowing the state
transitions that follow its actions in the world. TPOT-RL enables
teams of agents to learn effective policies with very few training
examples even in the face of a large state space with large amounts of
hidden state. The features mainly responsible for this are: dividing the
learning task among team members, using a very coarse, action-dependent
feature space, and allowing agents to gather reinforcement directly from
observation of the environment. TPOT-RL is fully implemented and has
been tested in the robotic soccer domain, a complex, multi-agent
framework. This paper presents the algorithmic details of TPOT-RL as
well as empirical results demonstrating the effectiveness of the
developed multi-agent learning approach with learned features.
Keywords: reinforcement learning, layered learning, multi-agent
learning
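
As a purely illustrative reading of the ideas summarized above, the following
minimal sketch shows how a single team member might learn values over a
coarse, action-dependent feature space and reinforce them directly from
observed rewards. The names (e, Q, select_action, update), the action set,
and the specific update rule are assumptions made for illustration, not the
paper's exact formulation.

    # Illustrative sketch of a TPOT-RL-style learner (names and update
    # rule are assumptions, not the paper's exact formulation).
    from collections import defaultdict

    ACTIONS = ["pass_left", "pass_right", "dribble"]  # hypothetical action set
    ALPHA = 0.1                                       # learning rate (assumed)

    # Per-agent value table: each team member learns only over the
    # situations it encounters (team-partitioned learning).
    Q = defaultdict(float)

    def e(state, action):
        """Coarse, action-dependent feature: here, a rough estimate of
        whether the action is likely to succeed in the current state."""
        return "likely_success" if state.get(action, 0) > 0 else "likely_failure"

    def select_action(state):
        """Pick the action with the highest learned value for its feature."""
        return max(ACTIONS, key=lambda a: Q[(e(state, a), a)])

    def update(state, action, observed_reward):
        """Reinforce directly from observation of the environment; no model
        of the (opaque) state transitions is needed."""
        key = (e(state, action), action)
        Q[key] += ALPHA * (observed_reward - Q[key])

    # Example: one training step for a single agent.
    s = {"pass_left": 1, "pass_right": 0, "dribble": 1}
    a = select_action(s)
    update(s, a, observed_reward=0.5)

Because each agent keeps only a small table indexed by a coarse,
action-dependent feature and the action itself, the number of learned values
stays small even when the underlying state space is very large and largely
hidden, which is what allows learning from few training examples.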