16-745: Optimal Control and Reinforcement Learning: Course Description
This course surveys the use of optimization
to design behavior. We will explore ways to represent
policies including hand-designed parametric functions, basis
functions, tables, and trajectory libraries. We will also explore
algorithms to create policies including parameter optimization and
trajectory optimization (first and second order gradient methods,
sequential quadratic programming, random search methods, evolutionary
algorithms, etc.). We will discuss how to handle the discrepancy
between models used to create policies and the actual system being
controlled (evaluation and robustness issues). The course will combine
lectures, student-presented material, and projects. The goal of this
course will be to help participants find the most effective methods
for their problems.
Why teach an optimal control course at the Robotics Institute at CMU?
Progress in computer animation based on dynamic
optimization has demonstrated
solutions to problems that were previously out of reach. We are
getting closer to practical use of optimal control and reinforcement
learning for animation and robot planning.
New hardware, such as cost-effective supercomputer clusters and thousand-core
GPUs/CPUs, also helps make optimal control and reinforcement learning
practical.
CMU has been a leader in applying optimal control to animation and
robotics.
We honor Andy Witkin
(1952-2010) for his contributions in applying
trajectory optimization to computer
animation. Andy was a professor at CMU before moving to Pixar.
Andrew Witkin and Michael Kass. Spacetime constraints. Computer Graphics, 22:159-168, 1988. Proc. Siggraph '88:
Topics
We will focus on systems with continuous states and continuous actions.
We will focus on deterministic systems and systems with "mild" stochastic
effects (most of the time any additive noise distribution is
assumed to be unimodal or Gaussian, and there are few discontinuities). We will talk about both
continuous and discrete time systems.
-
Introduction. What are
function optimization,
trajectory optimization,
policy optimization,
value functions, etc.?
-
Optimizing an operating point: function optimization.
-
Trajectory optimization I: Use function optimization (see the first sketch after this list).
-
Trajectory optimization II: Use the chain rule.
-
Differential Dynamic Programming
-
Policy optimization I: Use function optimization. Known in machine learning/reinforcement learning as policy search, policy refinement, policy gradient, ...
-
Policy optimization II: Dynamic programming (DP); see the second sketch after this list.
-
Traditional dynamic programming and the curse of dimensionality. Can supercomputers help? Random action DP.
-
Adaptive grids. Random sampling of states. Trajectory library approaches.
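Two minimal sketches to make the topics above concrete. Both use Python/NumPy purely for illustration (the course itself suggests Matlab or C), and the dynamics model, horizon, costs, and grid below are illustrative assumptions, not course-provided code.
The first sketch treats trajectory optimization as plain function optimization (direct shooting): the entire control sequence is one decision vector handed to a generic optimizer.
```python
# A minimal sketch (illustrative assumptions throughout): trajectory
# optimization as function optimization, via direct (single) shooting on a
# double integrator.  The whole control sequence is one decision vector.
import numpy as np
from scipy.optimize import minimize

dt, N = 0.1, 50                          # time step and horizon length
A = np.array([[1.0, dt], [0.0, 1.0]])    # double-integrator dynamics x+ = Ax + Bu
B = np.array([0.0, dt])
x0 = np.array([1.0, 0.0])                # start at position 1, velocity 0
x_goal = np.array([0.0, 0.0])            # drive the state to the origin

def rollout(u):
    """Simulate the state trajectory produced by the control sequence u."""
    x, traj = x0.copy(), []
    for uk in u:
        x = A @ x + B * uk
        traj.append(x)
    return np.array(traj)

def cost(u):
    """Quadratic control effort plus a terminal penalty on the goal error."""
    traj = rollout(u)
    return 0.01 * np.sum(u**2) + 100.0 * np.sum((traj[-1] - x_goal)**2)

# Hand the whole control sequence to a generic optimizer (numerical gradients).
result = minimize(cost, np.zeros(N), method="BFGS")
print("final state:", rollout(result.x)[-1])
```
The second sketch is tabular dynamic programming (value iteration) on a discretized one-dimensional state space. It also makes the curse of dimensionality concrete: one state dimension needs 101 table entries here, while n dimensions would need 101**n.
```python
# A minimal sketch (illustrative assumptions throughout): tabular dynamic
# programming (value iteration) on a discretized 1-D state space.
import numpy as np

states = np.linspace(-1.0, 1.0, 101)     # discretized positions
actions = np.array([-0.02, 0.0, 0.02])   # move left, stay, move right
gamma = 0.99                             # discount factor
V = np.zeros(len(states))                # value table, one entry per state

def nearest(x):
    """Index of the grid state closest to x (nearest-neighbor discretization)."""
    return int(np.argmin(np.abs(states - x)))

def q_values(s):
    """One-step cost (distance to the goal at 0 plus action cost) plus discounted value."""
    return [abs(s) + 0.1 * abs(a) + gamma * V[nearest(s + a)] for a in actions]

for _ in range(1000):                    # value-iteration sweeps until convergence
    V_new = np.array([min(q_values(s)) for s in states])
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new

# The greedy policy with respect to V picks the action with the lowest Q-value.
policy = [actions[int(np.argmin(q_values(s)))] for s in states]
print("value at x = 1.0:", V[nearest(1.0)])
```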
Optional topics: Based on class interest, we will pick from this list and
topics suggested by participants:
-
Stochastic effects on all of the above.
-
Belief states/information states and dual control
-
Robust versions of all of the above. What happens when the model is wrong
(as it always is)?
-
How can we manually choose good features or basis functions? How
can we do it automatically?
-
Handling periodic systems such as walking.
-
Relationship to robot motion planning techniques.
-
Receding horizon control (see the sketch after this list)
-
Games
-
Changing/simplifying/abstracting problem representations
-
Abstraction/Aggregation (state and/or action aggregation)
-
Decoupling
-
Approximation with simpler problem
-
Multigrid methods for trajectory optimization and DP.
-
Meta-optimization: avoid local minima by varying structural parameters
such as sampling rate, resolution, number of knot points, regularization
parameters, etc.
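For receding horizon control above, a minimal sketch may help (again Python/NumPy purely for illustration; the double-integrator model, horizon, and cost weights are assumptions, not course material): the controller re-solves a short-horizon trajectory optimization from the current state at every step and applies only the first control of each plan.
```python
# A minimal sketch (illustrative assumptions throughout): receding horizon
# control (model-predictive control).  Re-plan at every step, apply only the
# first control of the plan, then repeat from the new state.
import numpy as np
from scipy.optimize import minimize

dt, H = 0.1, 10                          # time step and short planning horizon
A = np.array([[1.0, dt], [0.0, 1.0]])    # double-integrator dynamics
B = np.array([0.0, dt])
x_goal = np.array([0.0, 0.0])

def plan_cost(u, x):
    """Cost of running the control sequence u from state x over the horizon."""
    c = 0.0
    for uk in u:
        x = A @ x + B * uk
        c += (x - x_goal) @ (x - x_goal) + 0.01 * uk**2
    return c

x = np.array([1.0, 0.0])                 # current (true) state
u_warm = np.zeros(H)                     # warm-start each solve with the last plan
for step in range(50):
    u_warm = minimize(plan_cost, u_warm, args=(x,), method="BFGS").x
    x = A @ x + B * u_warm[0]            # apply only the first planned control
print("state after 50 receding-horizon steps:", x)
```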
Prerequisites
Graduate standing or permission of the instructor.
You must be able to write programs in some computer
language, or use a package such as Matlab.
It is helpful to have had prior exposure to numerical methods.
Textbook
There is no textbook for this course. See the resources listed below.
Work
-
Assignments will typically involve solving optimal control and reinforcement learning problems
using packages such as Matlab, or writing programs in a language like C
with numerical libraries.
-
Project: Described on main course web page.
-
Presentations: Students will be asked to make presentations in class
and to present their project.
-
There will be no exams. Your
grade will be based on assignments,
the project, and presentations.
Relevant Books
There is no one reading with the coverage I want, but here are some
relevant readings.
Dynamic Optimization, Arthur E. Bryson, 1999 (Matlab code)
Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, Second Edition (Advances in Design and Control), John T. Betts, 2009
Dynamic Programming and Optimal Control, Dimitri P. Bertsekas,
Vol. I, 3rd edition, 2005; Vol. II, 3rd edition, 2007.
Relevant Courses
-
16-899 Adaptive Control and Reinforcement Learning (Spring 2020)
-
16-881 Deep Reinforcement Learning for Robotics
-
10-703: Deep Reinforcement Learning and Control
-
10-403: Deep Reinforcement Learning and Control (undergrad version)
-
CMU ChemE: 6-720 Advanced Process Systems Engineering
-
CMU Tepper: 47-840 Dynamic Programming, 47-832 Nonlinear Programming
-
CMU MATH: 21-690 Methods of Optimization
-
CMU MechE 24-785 Engineering Optimization
-
CMU ChemE: 6-462 Optimization Modeling and Algorithms
-
Berkeley's Deep Reinforcement Learning
-
David Silver, UCL: Reinforcement Learning
-
Udemy: Advanced AI: Deep Reinforcement Learning in Python
The Complete Guide to Mastering Artificial Intelligence using Deep Learning and Neural Networks
-
Udacity/Georgia Tech: Reinforcement Learning
-
Coursera
-
ECE 553 - Optimal Control,
Spring 2008, ECE, University of Illinois at Urbana-Champaign,
Yi Ma
-
U. Washington, Todorov
-
MIT: 6.231 Dynamic Programming and Stochastic Control, Fall 2008.
See
Dynamic Programming and Optimal Control/Approximate Dynamic Programming
for the Fall 2009 course slides.
-
EPFL: IC-32, Winter Semester 2006/2007: Nonlinear and Dynamic
Optimization: From Theory to Practice
-
AGEC 637:
Lectures in Dynamic Optimization:
Optimal Control and Numerical Dynamic Programming
-
U. Florida: AEB 6533: Static and Dynamic Optimization Models in Agriculture
-
USC: Syllabus: Reinforcement Learning and Learning Control
-
Duke: BA591.12 Dynamic programming and optimal control
-
MIT: 14.451 Dynamic Optimization Methods with Applications, Fall 2009
-
MIT: 16.323 Principles of Optimal Control
-
MIT 16.410/16.413 Principles of Autonomy and Decision Making
-
Berkeley CS 294-40
Learning for robotics and control, Fall 2009
-
UPenn: Dynamic Programming and Stochastic Control
-
UBC
CPSC 532M, Topics in AI: Dynamic Programming 2007-8
-
U. of Cambridge:
Optimization and Control
-
Stanford: MS&E 351 Dynamic Programming 2008
-
Google "courses optimal control" or "courses reinforcement learning"
Stuff From Economics
Worked Examples in Dynamic Optimization:
Analytic and Numeric Methods
Dynamic Optimization in Continuous-Time Economic Models
(A Guide for the Perplexed)
Dynamic Optimization: The Calculus of Variations and Optimal Control in Economics and Management (Advanced Textbooks in Economics)
Elements of Dynamic Optimization (also see Amazon.com page)
Econ 610: Stochastic Dynamic Optimization in Economics
Lancaster U. Management School: PhD Course - Foundations of Optimization
Other Resources
Dynamic Programming and Optimal Control/Approximate Dynamic Programming,
Dimitri P. Bertsekas
B. van Roy: See Dynamic Optimization publications
DOTcvpSB, a software toolbox for dynamic optimization in systems biology
Optimal control (Wikipedia) - too narrow
Optimal control (Scholarpedia) - also too narrow