Tuesday, February 20, 2018. 12:00 PM. NSH 1507.
Alec Koppel -- Nonparametric Stochastic Methods for Continuous Reinforcement Learning
Abstract: Reinforcement learning is a generic framework to describe an autonomous agent seeking to learn behavior sequentially in uncertain environments based on rewards. This framework has gained increasing relevance for autonomous control, management science, and econometrics. Unfortunately, heuristics or intractably complicated tools are still prevalent when state and action spaces are continuous. In this talk, we develop new algorithms for estimating the value function or action-value function in continuous Markov Decision Problems (MDPs). The core of these methods is nonparametric (kernelized) extensions of stochastic quasi-gradient methods operating in tandem with sparse subspace projections. The resulting tools yield the first convergence results for value or Q-function estimation when these functions have an infinite nonlinear parameterization, addressing in the affirmative a long-standing open question posed by Tsitsiklis and Van Roy (1997). We then demonstrate on the classic Mountain Car domain that we can obtain comparable performance to existing approaches to TD or Q-learning with orders of magnitude fewer data samples and interpretable representations of the learned functions.
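To give a concrete flavor of the ingredients named in the abstract (a kernel expansion of the value function, stochastic updates from observed transitions, and a sparsity-enforcing rule that keeps the representation small), the following is a minimal sketch, not the speaker's algorithm: the Gaussian kernel, the novelty-threshold pruning rule, the KernelTD class, and the toy random-walk environment are all illustrative assumptions made for this example rather than details drawn from the talk.

# Minimal sketch: kernelized TD(0) value estimation with a crude
# dictionary-sparsification rule. Illustrative stand-in only -- not the
# speaker's method. Kernel choice, novelty threshold, and the toy
# environment are assumptions made for this example.
import numpy as np

def gaussian_kernel(x, y, bandwidth=0.5):
    """Gaussian (RBF) kernel between two state vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * bandwidth ** 2))

class KernelTD:
    """Nonparametric TD(0): V(s) = sum_i w_i * k(s, d_i) over a retained dictionary."""

    def __init__(self, step_size=0.1, discount=0.95, novelty_tol=0.05):
        self.alpha = step_size
        self.gamma = discount
        self.tol = novelty_tol      # admit a state only if it is "novel enough"
        self.dictionary = []        # retained states (kernel centers)
        self.weights = []           # one weight per retained center

    def value(self, state):
        return sum(w * gaussian_kernel(state, d)
                   for w, d in zip(self.weights, self.dictionary))

    def update(self, state, reward, next_state, terminal=False):
        # TD error from one observed transition (s, r, s'); no bootstrap at terminal states.
        bootstrap = 0.0 if terminal else self.value(next_state)
        td_error = reward + self.gamma * bootstrap - self.value(state)
        # Stochastic (semi-)gradient step on the weights of the existing expansion.
        for i, d in enumerate(self.dictionary):
            self.weights[i] += self.alpha * td_error * gaussian_kernel(state, d)
        # Crude sparsification: add a new center only if the state is far
        # (in kernel similarity) from everything already kept.
        novelty = 1.0 - max((gaussian_kernel(state, d) for d in self.dictionary),
                            default=0.0)
        if novelty > self.tol:
            self.dictionary.append(np.asarray(state, dtype=float))
            self.weights.append(self.alpha * td_error)

# Toy usage: a random walk on states 0..10 with reward only at the right end.
rng = np.random.default_rng(0)
agent = KernelTD()
for episode in range(200):
    s = 5
    while 0 < s < 10:
        s_next = s + rng.choice([-1, 1])
        r = 1.0 if s_next == 10 else 0.0
        agent.update([s / 10.0], r, [s_next / 10.0], terminal=(s_next in (0, 10)))
        s = s_next
print("dictionary size:", len(agent.dictionary))
print("V(0.2), V(0.5), V(0.9):",
      [round(agent.value([x]), 3) for x in (0.2, 0.5, 0.9)])

The point of the sketch is the trade-off the abstract highlights: the learned value function is a compact kernel expansion over a small set of retained states, so it stays interpretable while still being updated from streaming transitions.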
Biography: Alec Koppel began as a Research Scientist at the U.S. Army Research Laboratory in the Computational and Information Sciences Directorate in September of 2017. He completed his master's degree in Statistics and doctorate in Electrical and Systems Engineering, both at the University of Pennsylvania (Penn), in August of 2017. He is also a participant in the Science, Mathematics, and Research for Transformation (SMART) Scholarship Program sponsored by the American Society of Engineering Education. Before coming to Penn, he completed his master's degree in Systems Science and Mathematics and bachelor's degree in Mathematics, both at Washington University in St. Louis (WashU), Missouri. His research interests are in the areas of signal processing, optimization, and learning theory. His current work focuses on optimization and learning methods for streaming data applications, with an emphasis on problems arising in autonomous systems. He co-authored a paper selected as a Best Paper Finalist at the 2017 IEEE Asilomar Conference on Signals, Systems, and Computers.