Another strategy for generalization in
reinforcement learning is to reduce the learning problem to an
associative problem of learning boolean functions. A boolean function
has a vector of boolean inputs and a single boolean output. Taking
inspiration from mainstream machine learning work, Kaelbling developed
two algorithms for learning boolean functions from reinforcement: one
uses the bias of k-DNF to drive the generalization
process [54]; the other searches the space of syntactic
descriptions of functions using a simple generate-and-test
method [53].
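
To make the k-DNF bias concrete, the following is a minimal sketch of the representation only; it is not the learning algorithm of [54]. A k-DNF formula is a disjunction of conjunctive terms, each containing at most k literals. The names `conjunctive_terms`, `term_holds`, and `eval_dnf` are hypothetical and chosen for illustration.

```python
from itertools import combinations, product

def conjunctive_terms(n_inputs, k):
    """Enumerate every conjunction of 1..k literals over n_inputs boolean inputs.

    A term is a tuple of (input_index, required_value) pairs.
    """
    terms = []
    for size in range(1, k + 1):
        for idxs in combinations(range(n_inputs), size):
            for values in product((False, True), repeat=size):
                terms.append(tuple(zip(idxs, values)))
    return terms

def term_holds(term, x):
    """A conjunction holds when every literal matches the input vector x."""
    return all(x[i] == v for i, v in term)

def eval_dnf(terms, x):
    """A DNF formula is true when at least one of its terms holds."""
    return any(term_holds(t, x) for t in terms)

# Illustrative formula: (x0 AND NOT x1) OR x2
formula = [((0, True), (1, False)), ((2, True),)]
```

Restricting each term to at most k literals keeps the number of candidate terms polynomial in the number of inputs for fixed k, which is one reason such a bias can drive generalization.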
The restriction to a single boolean output makes these techniques
difficult to apply. In very benign learning situations, it is
possible to extend this approach to use a collection of learners to
independently learn the individual bits that make up a complex output.
In general, however, that approach suffers from very unreliable
reinforcement: if a single learner generates an inappropriate output
bit, all of the learners receive a low reinforcement value, whether or
not their own bits were appropriate. The CASCADE method [52]
allows a collection of learners to be trained collectively to generate
appropriate joint outputs; it is considerably more reliable, but can
require additional computational effort.
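
The unreliability of shared reinforcement for independent bit learners can be illustrated with a small sketch. It assumes a binary reward that is high only when the whole output vector is appropriate, and it assumes that the learners' errors are independent; neither assumption is stated in the text above, and the function names are hypothetical. This is not the CASCADE method.

```python
def shared_reward(output_bits, target_bits):
    """Every learner receives the same reward: 1 only if all bits are right."""
    return 1 if tuple(output_bits) == tuple(target_bits) else 0

def joint_success_probability(p_bit_correct, n_bits):
    """Chance of a high shared reward when each of n_bits learners is
    independently correct with probability p_bit_correct (an assumption)."""
    return p_bit_correct ** n_bits

# One wrong bit out of three lowers the reward for all three learners.
reward_with_one_error = shared_reward((1, 0, 0), (1, 0, 1))
```

Under these assumptions the probability of a high reward falls geometrically with the number of output bits, so a learner whose own bit was correct is frequently given a low reinforcement value.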
Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996