Game-Theoretic Control for
Robot Teams
Rosemary Emery-Montemerlo
12th August, 2005
CMU-RI-TR-05-36
Thesis Committee:
Jeff Schneider (co-chair)
Sebastian Thrun (co-chair)
Geoff Gordon
Michael Littman, Rutger, The State University of New Jersey
Planning for a decentralized team of robots is a fundamentally
different problem from that of centralized control. During decision
making, robots must take into account not only their own observations
of world state, but also the possible observations and actions of
teammates. While the interconnectedness of such a reasoning process
seems to require an infinite recursion of beliefs to be modelled by
each member of the team, game theory provides an alternative
approach. Partially observable stochastic games (POSGs) generalize
notions of single-stage games and Markov decision processes to both
multiple agents and partially observable worlds. Even if there is only
limited communication between teammates, POSGs allow robots to come up
with policies that still take into account possible teammate
experiences without the need to explicitly model any recursive beliefs
about those experiences.
While a powerful model of decentralized teams, POSGs are
computationally intractable for all but the smallest problems. This
dissertation proposes a Bayesian game approximation to POSGs in which
game-theoretic reasoning about action selection is retained, but
robots reason only a limited time ahead about uncertainty in world
state and the experiences of their teammates. Planning and execution
are interleaved to further reduce computational burdens: at each
timestep, robots perform a step of full game-theoretic reasoning about
their current action selection given any possible history of
observations and a heuristic evaluation of the expected future value
of those decisions.
The Bayesian game approximation algorithm (BaGA) is able to find
solutions to much larger problems than previously solved. Further
computational savings are gained by reasoning about groups of similar
observation histories rather than single histories. Finally,
efficiency and performance are also improved through the use of
run-time communication policies that trade off expected gains in
performance with the costs of using bandwidth. In this dissertation,
the performance of BaGA is compared to policies generated for full
POSGs as well as heuristics. BaGA is also used to develop real-time
robot controllers for a series of simulated and physical robotic tag
problems that gradually increase in realism.
full text
|