Tuesday, April 10, 2018. 12:00 PM. NSH 3305.
Wen Sun -- Efficient Reinforcement Learning via Imitation
Abstract: A fundamental challenge in Artificial Intelligence (AI), robotics, and language processing is sequential prediction: to reason, plan, and make a sequence of predictions or decisions that minimize accumulated cost, achieve a long-term goal, or optimize for a loss acquired only after many predictions. Reinforcement Learning (RL), as a general framework for learning from experience to make predictions and decisions, is often considered an ideal tool for this challenge. Recently, equipped with advances from the Deep Learning literature, we have pushed the state of the art of RL on a number of applications, including simulated high-dimensional robotic control, video games, and board games (e.g., AlphaGo).
Because of its generality (RL is a general framework that subsumes many specialized machine learning algorithms and applications), RL is hard. Since there is no direct supervision, one central challenge in RL is how to explore an unknown environment and collect useful feedback efficiently. In recent RL success stories (e.g., super-human performance on video games [Mnih et al., 2015]), we notice that most rely on random exploration strategies, which usually require a huge number of interactions with the environment before the agent learns anything useful. Another challenge is credit assignment: if a learning agent successfully achieves some task after making a long sequence of decisions, how can we assign credit for the success among those decisions?
We first attempt to gain purchase on RL problems by introducing an additional source of information: an expert who knows how to solve tasks (near-)optimally. By imitating an expert, we can significantly reduce the burden of exploration (i.e., we imitate instead of exploring randomly) and solve the credit assignment problem (i.e., the expert tells us which decisions are bad). We study, both in theory and in practice, how one can imitate experts to reduce sample complexity compared to a pure RL approach.
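To make the imitation step concrete, below is a minimal sketch of an interactive, DAgger-style imitation loop, written under illustrative assumptions: the env, expert, and policy interfaces (reset, step, action, predict, fit) are hypothetical placeholders, not the specific algorithm analyzed in the talk. The learner rolls out its own policy, the expert labels every visited state with a (near-)optimal action, and the policy is retrained by supervised learning on the aggregated data; this is how imitation sidesteps random exploration and resolves credit assignment.

    # Sketch of a DAgger-style interactive imitation loop (illustrative only).
    def imitate(env, expert, policy, n_iters=10, horizon=100):
        dataset = []                                  # aggregated (state, expert action) pairs
        for _ in range(n_iters):
            state = env.reset()
            for _ in range(horizon):
                dataset.append((state, expert.action(state)))  # expert labels the visited state
                state, done = env.step(policy.predict(state))  # learner's own action drives the rollout
                if done:
                    break
            policy.fit(dataset)                       # supervised learning on all data so far
        return policy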
Since Imitation Learning is efficient, we next provide a general reduction from RL to Imitation Learning, with a focus on applications where experts are not available. We explore the possibility of learning local models and then using off-the-shelf model-based RL solvers to compute an intermediate "expert" for efficient policy improvement via imitation. Furthermore, we give a general convergence analysis that generalizes and provides a theoretical foundation for recent successful practical RL algorithms such as ExIt and AlphaGo Zero [Anthony et al., 2017, Silver et al., 2017], and that offers a theoretically sound and practically efficient way of unifying model-based and model-free RL approaches.
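As a rough illustration of this reduction (a sketch under assumed interfaces, not the exact algorithm from the talk or from ExIt/AlphaGo Zero), each iteration fits a local dynamics model from on-policy data, lets an off-the-shelf model-based solver compute an intermediate "expert" inside that model, and then improves the reactive policy by imitating that expert. Here fit_local_model and plan_in_model are hypothetical callables supplied by the user.

    # Sketch: RL via imitation of a model-based intermediate expert (illustrative only).
    def rollout(env, policy, horizon):
        """Collect the states visited along one trajectory under the current policy."""
        states, state = [], env.reset()
        for _ in range(horizon):
            states.append(state)
            state, done = env.step(policy.predict(state))
            if done:
                break
        return states

    def rl_via_imitation(env, policy, fit_local_model, plan_in_model,
                         n_iters=20, n_rollouts=10, horizon=100):
        for _ in range(n_iters):
            trajs = [rollout(env, policy, horizon) for _ in range(n_rollouts)]
            model = fit_local_model(trajs)            # local model around the current state distribution
            expert = plan_in_model(model)             # model-based solver yields an intermediate "expert"
            data = [(s, expert.action(s)) for traj in trajs for s in traj]
            policy.fit(data)                          # policy improvement by imitating that expert
        return policy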