One way to restore the
Markov property is to allow decisions to be based on the history of
recent observations and perhaps actions. Lin and
Mitchell [62] used a fixed-width finite history window
to learn a pole-balancing task. McCallum [76] describes
"utile suffix memory," which learns a variable-width window that
serves simultaneously as a model of the environment and a
finite-memory policy. This system has had excellent results in a very
complex driving-simulation domain [74]. Ring [92]
has a neural-network approach that uses a variable history window, adding
history when necessary to disambiguate situations.
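
To make the fixed-width window idea concrete, here is a minimal Python sketch. It is not the implementation used in any of the cited systems. The class name `HistoryWindow`, its methods, and the `pad` placeholder are hypothetical, chosen only for illustration. The window keeps the last `width` (action, observation) pairs and exposes them as a hashable tuple, which a tabular learner could use as its state in place of the raw observation.

```python
from collections import deque


class HistoryWindow:
    """Fixed-width window over the most recent (action, observation) pairs.

    Illustrative sketch only: names and interface are assumptions,
    not taken from the cited work.
    """

    def __init__(self, width, pad=None):
        self.width = width
        self.pad = pad  # filler for slots that have no history yet
        self._buf = deque([pad] * width, maxlen=width)

    def reset(self, first_observation):
        """Start a new episode; no action precedes the first observation."""
        self._buf = deque([self.pad] * self.width, maxlen=self.width)
        self._buf.append((None, first_observation))
        return self.state()

    def step(self, action, observation):
        """Record the action taken and the observation that followed it."""
        self._buf.append((action, observation))  # oldest entry drops out
        return self.state()

    def state(self):
        """Hashable window contents, usable as a key in a Q-table."""
        return tuple(self._buf)


# Two consecutive steps yield the same observation "B", yet the
# window states differ because the preceding history differs.
window = HistoryWindow(width=2)
s0 = window.reset("A")
s1 = window.step("right", "B")
s2 = window.step("right", "B")
```

With a window of width 2, `s1` and `s2` are distinct states even though the current observation is the same in both. A variable-width scheme would instead grow the window only for situations that remain ambiguous.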
Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996