Next: Variable Resolution Dynamic Programming
Up: Adaptive Resolution Models
Previous: Adaptive Resolution Models
In environments that are characterized
by a set of boolean or discrete-valued variables, it is possible to
learn compact decision trees for representing Q values. The
G-learning algorithm [21], works as follows. It starts
by assuming that no partitioning is necessary and tries to learn Q
values for the entire environment as if it were one state. In
parallel with this process, it gathers statistics based on individual
input bits; it asks the question whether there is some bit b in the
state description such that the Q values for states in which b=1
are significantly different from Q values for states in which b=0.
If such a bit is found, it is used to split the decision tree. Then,
the process is repeated in each of the leaves. This method was able
to learn very small representations of the Q function in the
presence of an overwhelming number of irrelevant, noisy state
attributes. It outperformed Q-learning with backpropagation in a
simple video-game environment and was used by
McCallum [74] (in conjunction
with other techniques for dealing with partial observability) to learn
behaviors in a complex driving-simulator.
It cannot, however, acquire partitions
in which attributes are only significant in combination (such as those
needed to solve parity problems).
Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996