Moore's PartiGame algorithm [81] is another solution to the problem of
learning to achieve goal configurations in deterministic
high-dimensional continuous spaces by learning an adaptive-resolution
model. It also divides the environment into cells; but in each cell,
the actions available consist of aiming at the neighboring cells (this
aiming is accomplished by a local controller, which must be provided
as part of the problem statement). The graph of cell transitions is
solved for shortest paths in an online, incremental manner, but a
minimax criterion is used to detect when a group of cells is too
coarse to permit movement between obstacles or to avoid limit
cycles. The offending cells are split to a higher resolution.
Eventually, the environment is divided up just enough to choose
appropriate actions for achieving the goal, but no unnecessary
distinctions are made. An important feature is that, as well as
reducing memory and computational requirements, the algorithm
structures exploration of the state space in a multi-resolution
manner. Given a failure, the agent will initially try something very
different to rectify it, and will resort to small local changes only
when all the qualitatively different strategies have been exhausted.
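To make the cycle concrete, here is a minimal Python sketch of the
partition/plan/split loop. It is an illustration under simplifying
assumptions, not Moore's implementation: Cell, minimax_cost_to_goal,
and refine are hypothetical names, actions are given unit cost, and
the split rule shown (divide losing cells whose actions can reach
winning ones) simplifies the actual win/lose boundary criterion.

    import itertools

    class Cell:
        """An axis-aligned box of state space; actions aim at neighbor cells."""
        _ids = itertools.count()

        def __init__(self, lo, hi):
            self.id = next(Cell._ids)
            self.lo, self.hi = tuple(lo), tuple(hi)
            # outcomes[action] = set of cells actually reached when the local
            # controller aims at that neighbor (filled in from experience)
            self.outcomes = {}

        def split(self):
            """Halve the cell along its longest axis to raise resolution."""
            axis = max(range(len(self.lo)),
                       key=lambda d: self.hi[d] - self.lo[d])
            mid = (self.lo[axis] + self.hi[axis]) / 2.0
            hi1, lo2 = list(self.hi), list(self.lo)
            hi1[axis], lo2[axis] = mid, mid
            return Cell(self.lo, hi1), Cell(lo2, self.hi)

    def minimax_cost_to_goal(cells, goal):
        """Worst-case steps-to-goal over the cell graph: an action costs
        one plus the cost of the WORST cell among its observed outcomes."""
        cost = {c.id: float('inf') for c in cells}
        cost[goal.id] = 0.0
        changed = True
        while changed:                      # Bellman-Ford-style sweeps
            changed = False
            for c in cells:
                if c is goal:
                    continue
                for outcome_set in c.outcomes.values():
                    worst = 1.0 + max(cost[o.id] for o in outcome_set)
                    if worst < cost[c.id]:
                        cost[c.id] = worst
                        changed = True
        return cost

    def refine(cells, cost):
        """Split each 'losing' cell (infinite minimax cost) whose actions
        can reach a 'winning' one; the win/lose boundary is where more
        resolution is needed. Outcome data would be re-derived afterwards."""
        refined = []
        for c in cells:
            losing = cost[c.id] == float('inf')
            sees_winner = any(cost[o.id] < float('inf')
                              for outs in c.outcomes.values() for o in outs)
            refined.extend(c.split() if losing and sees_winner else [c])
        return refined

In the full algorithm, the outcome sets are updated after every
observed transition and the shortest-path computation is incremental
rather than performed by full sweeps.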
Figure 7a shows a two-dimensional continuous maze.
Figure 7b shows the performance of a robot using the
PartiGame algorithm during the very first trial.
Figure 7c shows the second trial, started from a slightly
different position.
Figure 7: (a) A two-dimensional maze problem. The point robot must
find a path from start to goal without crossing any of the barrier
lines. (b) The path taken by PartiGame during the entire first trial.
It begins with intense exploration to find a route out of the almost
entirely enclosed start region. Having eventually reached a
sufficiently high resolution, it discovers the gap and proceeds
greedily towards the goal, only to be temporarily blocked by the
goal's barrier region. (c) The second trial.
This is a very fast algorithm, learning policies in spaces of up to
nine dimensions in less than a minute. The restriction of the current
implementation to deterministic environments limits its applicability,
however. McCallum [76] suggests some related
tree-structured methods.