Moore's PartiGame algorithm [81] is another solution to the problem of
learning to achieve goal configurations in deterministic
high-dimensional continuous spaces by learning an adaptive-resolution
model. It also divides the environment into cells; but in each cell,
the actions available consist of aiming at the neighboring cells (this
aiming is accomplished by a local controller, which must be provided
as part of the problem statement). The graph of cell transitions is
solved for shortest paths in an online, incremental manner, but a
minimax criterion is used to detect when a group of cells is too
coarse to permit movement between obstacles or to avoid limit
cycles. The offending cells are split to a higher resolution.
Eventually, the environment is divided up just enough to choose
appropriate actions for achieving the goal, but no unnecessary
distinctions are made. An important feature is that, as well as
reducing memory and computational requirements, the algorithm
structures exploration of the state space in a multi-resolution
manner. Given a failure, the agent will initially try something very
different to rectify it, and will resort to small local changes only
when all the qualitatively different strategies have been exhausted.
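To make the cycle concrete, here is a minimal Python sketch of the
partition/plan/split loop. It is an illustration under simplifying
assumptions, not Moore's implementation: Cell, minimax_cost_to_goal,
and refine are hypothetical names, actions are given unit cost, and
the split rule shown (divide losing cells whose actions can reach
winning ones) simplifies the actual win/lose boundary criterion.

    import itertools

    class Cell:
        """An axis-aligned box of state space; actions aim at neighbor cells."""
        _ids = itertools.count()

        def __init__(self, lo, hi):
            self.id = next(Cell._ids)
            self.lo, self.hi = tuple(lo), tuple(hi)
            # outcomes[action] = set of cells actually reached when the local
            # controller aims at that neighbor (filled in from experience)
            self.outcomes = {}

        def split(self):
            """Halve the cell along its longest axis to raise resolution."""
            axis = max(range(len(self.lo)),
                       key=lambda d: self.hi[d] - self.lo[d])
            mid = (self.lo[axis] + self.hi[axis]) / 2.0
            hi1, lo2 = list(self.hi), list(self.lo)
            hi1[axis], lo2[axis] = mid, mid
            return Cell(self.lo, hi1), Cell(lo2, self.hi)

    def minimax_cost_to_goal(cells, goal):
        """Worst-case steps-to-goal over the cell graph: an action costs
        one plus the cost of the WORST cell among its observed outcomes."""
        cost = {c.id: float('inf') for c in cells}
        cost[goal.id] = 0.0
        changed = True
        while changed:                      # Bellman-Ford-style sweeps
            changed = False
            for c in cells:
                if c is goal:
                    continue
                for outcome_set in c.outcomes.values():
                    worst = 1.0 + max(cost[o.id] for o in outcome_set)
                    if worst < cost[c.id]:
                        cost[c.id] = worst
                        changed = True
        return cost

    def refine(cells, cost):
        """Split each 'losing' cell (infinite minimax cost) whose actions
        can reach a 'winning' one; the win/lose boundary is where more
        resolution is needed. Outcome data would be re-derived afterwards."""
        refined = []
        for c in cells:
            losing = cost[c.id] == float('inf')
            sees_winner = any(cost[o.id] < float('inf')
                              for outs in c.outcomes.values() for o in outs)
            refined.extend(c.split() if losing and sees_winner else [c])
        return refined

In the full algorithm, the outcome sets are updated after every
observed transition and the shortest-path computation is incremental
rather than performed by full sweeps.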
Figure 7a shows a two-dimensional continuous maze.
Figure 7b shows the performance of a robot using the
PartiGame algorithm during the very first trial.
Figure 7c shows the second trial, started from a slightly
different position.
Figure 7: (a) A two-dimensional maze problem. The point robot must
find a path from start to goal without crossing any of the barrier
lines. (b) The path taken by PartiGame during the entire first trial.
It begins with intense exploration to find a route out of the almost
entirely enclosed start region. Having eventually reached a
sufficiently high resolution, it discovers the gap and proceeds
greedily towards the goal, only to be temporarily blocked by the
goal's barrier region. (c) The second trial.
This is a very fast algorithm, learning policies in spaces of up to
nine dimensions in less than a minute. The restriction of the current
implementation to deterministic environments limits its applicability,
however. McCallum [76] suggests some related
tree-structured methods.