Date |
Topic |
Teacher |
Links |
8/31 |
No class this week, CSD immigration course
| |
|
9/2 |
No class this week, CSD immigration course
| |
|
9/7 |
Labor Day, no class |
|
|
9/9 |
Overview
| Brunskill |
slides |
9/14 |
Monte Carlo estimation, TD(0), and Fitted Value Iteration |
Brunskill |
notes |
9/16 |
Fitted Value Iteration |
Brunskill |
FQI paper (Ernst et al. 2005), lecture notes, M. Ghavamzadeh's lecture notes on AVI, API |
9/21 |
Least-Squares Policy Iteration |
Brunskill |
lecture notes, LSPI (Lagoudakis and Parr, 2003) |
9/23 |
Approximate Model-based learning |
Brunskill |
lecture notes, lecture slides (different than notes), "An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection for Reinforcement Learning"
|
9/28 |
Constructing a good set of features |
Brunskill |
lecture notes, lecture slides (different than notes), "Greedy Algorithms for Sparse Reinforcement Learning"
|
9/30 |
Constructing a good set of features 2 |
Brunskill |
lecture notes, lecture slides from 9/28, "Batch iFDD: A Scalable Matching Pursuit Algorithm for Solving MDPs"
|
10/5 |
Evaluating the output of Batch RL Methods |
Brunskill |
lecture notes, lecture slides (read these then notes for temporal ordering), "An analysis of model-based interval estimation for Markov decision processes" paper
|
10/7 |
Evaluating the output of Batch RL Methods |
Brunskill/Thomas |
Slides from start of class, lecture notes on Bias and Variance approach, Phil Thomas's slides, "Bias and Variance Approximation in Value Function Estimates" paper
|
10/12 |
Importance Sampling Approaches to evaluating Batch RL |
Thomas |
lecture slides, paper: "High Confidence Off-Policy Evaluation", paper: "High Confidence Policy Improvement"
|
10/19 |
Selecting among Models in Batch RL for future performance
| Brunskill |
lecture notes, lecture slides, "Offline Policy Evaluation Across Representations with Applications to Educational Games", "Model Selection in Markovian Processes"
|
10/21 |
Online learning
| Brunskill |
lecture notes, "Incremental Model-based Learners with Formal Learning-Time Guarantees"
|
10/26 |
Regret bounds
| Brunskill |
lecture notes, lecture slides
|
10/28 |
Project meetings
| |
|
11/2 |
Bayes-optimal RL
| Brunskill |
POMDP lecture notes (mostly background reference), lecture slides, "Monte-Carlo Planning in Large POMDPs", "Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search", "Bayes-Optimal Reinforcement Learning for Discrete Uncertainty Domains"
|
11/4 |
Sample Efficient Model-based RL
| Brunskill |
lecture slides, "Gaussian processes for sample efficient reinforcement learning with RMAX-like exploration", "TEXPLORE: Real-Time Sample-Efficient Reinforcement Learning for Robots"
|
11/9 |
Policy Search: Policy Gradient
| Brunskill |
lecture slides, scribed notes from Pieter Abbeel's class that include the derivation I replicated on the board (see pages 1-2), "Policy Gradient Methods for RL with Function Approximation"
|
11/11 |
Policy Search: Sample Efficiency with Bayesian Optimization
| Brunskill |
lecture slides, Ryan Adams's intro to Bayesian Optimization, "Bayesian Optimization for Learning Gaits under Uncertainty"
|
11/16 |
RL for DARPA Robotics Challenge & Pouring Tasks
| Akihiko Yamaguchi |
lecture slides
|
11/18 |
Risk Sensitive RL: Optimizing CVaR
| Brunskill |
lecture slides, "Optimizing the CVaR Via Sampling", A. Tamar's PhD thesis (see section 1.2.1 for different risk-sensitive objectives that can be of interest)
|
11/23 |
Safe Exploration
| Brunskill |
(rough) lecture notes to support paper presentation, "Safe Exploration in MDPs"
|
11/25 |
Thanksgiving break
|
|
|
11/30 |
Why doesn't the stuff you learn in class work in real life? A robotics-focused perspective.
| Chris Atkeson |
Dynamic Optimization class website
|
12/2 |
Inverse Reinforcement Learning |
Brunskill |
lecture notes, "Maximum Entropy Inverse Reinforcement Learning", Abbeel's slides on IRL
|
12/7 |
Project Presentations
| Students |
|
12/9 |
Project Presentations
| Students |
|