Reinforcement Learning

Modern computer systems face complex decision-making tasks such as controlling traffic signals, scheduling factory production, planning medical treatments, allocating investment portfolios, routing data through communications networks, and playing expert-level backgammon or chess. Such tasks are difficult sequential decision problems:
  • the task calls for not a single decision, but rather a whole series of decisions over time;
  • the outcome of any decision may depend on random environmental factors beyond the computer's control; and
  • the ultimate objective---measured in terms of traffic flow, patient health, business profit, or game victory---depends in a complicated way on many interacting decisions and their random outcomes.

In such complex problems, optimal decision policies are generally unknown, and it is often difficult, even for human domain experts, to hand-code even reasonably good decision policies in software. A growing body of research in Artificial Intelligence suggests the following alternative methodology:

    A decision-making algorithm can autonomously learn effective policies for sequential decision tasks, simply by simulating the task and keeping statistics on which decisions lead to good ultimate performance and which do not.
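As a minimal sketch of this simulate-and-keep-statistics idea, consider the following toy example. The task, its payoff probabilities, and all function names are invented for illustration; the learner is only told the average outcome of each simulated decision, not the underlying probabilities:

```python
import random

# Invented toy task: two candidate decisions with noisy payoffs.
# Decision "a" succeeds with probability 0.7, decision "b" with 0.4;
# neither probability is known to the learner.
def simulate(decision, rng):
    p = {"a": 0.7, "b": 0.4}[decision]
    return 1.0 if rng.random() < p else 0.0

def learn_by_simulation(trials=5000, seed=0):
    """Simulate each decision repeatedly and keep running statistics
    on which one leads to good outcomes on average."""
    rng = random.Random(seed)
    totals = {"a": 0.0, "b": 0.0}
    for _ in range(trials):
        for decision in totals:
            totals[decision] += simulate(decision, rng)
    averages = {d: totals[d] / trials for d in totals}
    # Keep the decision whose simulated outcomes were best on average.
    return max(averages, key=averages.get), averages
```

After enough simulated trials the statistics reliably identify "a" as the better decision, even though the program was never told the payoff probabilities. Real sequential tasks require statistics over whole series of decisions, but the principle is the same.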

The field of reinforcement learning provides a principled foundation for this methodology, based on classical dynamic programming algorithms for solving Markov Decision Problems. The goal of reinforcement learning is to approximate the optimal value function: a special evaluation function that predicts the expected long-term quality of visiting any state. For example, the optimal value function in the game of backgammon gives the probability of Player X winning from any given board position, assuming optimal play by both players for the rest of the game. Having an (approximately) optimal value function allows (approximately) optimal decisions to be made.
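The dynamic programming side can be sketched with value iteration, which computes the optimal value function of a small Markov Decision Problem exactly. The two-state task below and all names in it are illustrative assumptions, not part of any standard library:

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Classical dynamic programming: repeatedly apply the Bellman
    optimality backup until the value estimates stop changing.
    P[s][a] maps successor states to probabilities; R[s][a] is the
    immediate reward for taking action a in state s."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Invented two-state task: state "B" pays 1 per step while you stay
# there; state "A" pays nothing, but "move" leads to "B".
states, actions = ["A", "B"], ["stay", "move"]
P = {"A": {"stay": {"A": 1.0}, "move": {"B": 1.0}},
     "B": {"stay": {"B": 1.0}, "move": {"A": 1.0}}}
R = {"A": {"stay": 0.0, "move": 0.0},
     "B": {"stay": 1.0, "move": 0.0}}
V = value_iteration(states, actions, P, R)
# V["B"] converges to 1/(1-0.9) = 10 and V["A"] to 0.9 * 10 = 9.
```

Once V is known, the (approximately) optimal decision in each state is simply the action whose expected one-step backup achieves the maximum. Reinforcement learning methods approximate this computation when, unlike here, the transition probabilities and rewards are not given explicitly.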
