Tuesday, November 8, 2016. 12:00PM. NSH 3305.
Zhaohan (Daniel) Guo - A PAC RL Algorithm for Episodic POMDPs
Many interesting real-world domains involve reinforcement learning (RL) in partially observable environments. Efficient learning in such domains is important, but existing sample complexity bounds for partially observable RL are at least exponential in the episode length. Polynomial sample complexity bounds are prevalent for fully observable environments, so we looked for a way to achieve the same for POMDPs. In general, polynomial sample complexity bounds for POMDPs are impossible, since the observations may give no information about the underlying dynamics. However, we can build on recent advances in estimating latent variable models with the method of moments, specifically confidence bounds on method-of-moments estimators for HMMs. These bounds quantify problem-specific properties, allowing us to give, to our knowledge, the first partially observable RL algorithm with a problem-specific polynomial bound on its sample complexity.
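As a rough illustration of the moment-based estimation the abstract refers to (not the algorithm from the talk itself), the sketch below builds a toy HMM with invented parameters, samples short episodes, and estimates the pairwise observation co-occurrence matrix. Low-order moments like this are exactly what spectral method-of-moments HMM learners consume, and finite-sample confidence bounds on such estimates are the kind of quantity a PAC analysis can leverage.

```python
import numpy as np

# Toy 2-state, 3-observation HMM; all parameters are illustrative assumptions.
rng = np.random.default_rng(0)
T = np.array([[0.9, 0.1],        # T[i, j] = P(next state = j | state = i)
              [0.2, 0.8]])
O = np.array([[0.7, 0.2, 0.1],   # O[i, k] = P(observation = k | state = i)
              [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])        # initial state distribution

def sample_episode(length):
    """Sample one observation sequence of the given length from the toy HMM."""
    s = rng.choice(2, p=pi)
    obs = []
    for _ in range(length):
        obs.append(rng.choice(3, p=O[s]))
        s = rng.choice(2, p=T[s])
    return obs

# Method of moments: estimate the co-occurrence matrix
# P21[k, l] = P(o_2 = k, o_1 = l) by counting over many episodes.
n_episodes = 50000
P21_hat = np.zeros((3, 3))
for _ in range(n_episodes):
    o = sample_episode(2)
    P21_hat[o[1], o[0]] += 1.0
P21_hat /= n_episodes

# The true moment implied by the model, for comparison:
# P21[k, l] = sum_{s1, s2} pi[s1] O[s1, l] T[s1, s2] O[s2, k].
P21_true = O.T @ T.T @ np.diag(pi) @ O
print(np.abs(P21_hat - P21_true).max())  # small estimation error
```

Spectral learners (e.g., the Hsu-Kakade-Zhang line of work) recover HMM parameters, up to permutation of the hidden states, from moments like `P21_hat` together with third-order moments; no example of that recovery step is attempted here.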