Interval-based Techniques

Next: More General Problems Up: Ad-Hoc Techniques Previous: Randomized Strategies

Interval-based Techniques

Exploration is often more efficient when it is based on second-order information about the certainty or variance of the estimated values of actions. Kaelbling's interval estimation algorithm [52] stores statistics for each action : is the number of successes and the number of trials. An action is chosen by computing the upper bound of a confidence interval on the success probability of each action and choosing the action with the highest upper bound. Smaller values of the parameter encourage greater exploration. When payoffs are boolean, the normal approximation to the binomial distribution can be used to construct the confidence interval (though the binomial should be used for small n). Other payoff distributions can be handled using their associated statistics or with nonparametric methods. The method works very well in empirical trials. It is also related to a certain class of statistical techniques known as experiment design methods [17], which are used for comparing multiple treatments (for example, fertilizers or drugs) to determine which treatment (if any) is best in as small a set of experiments as possible.

Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996