16-745: Optimal Control and Reinforcement Learning: Course Description
This course surveys the use of optimization
to design behavior. We will explore ways to represent
policies including hand-designed parametric functions, basis
functions, tables, and trajectory libraries. We will also explore
algorithms to create policies including parameter optimization and
trajectory optimization (first and second order gradient methods,
sequential quadratic programming, random search methods, evolutionary
algorithms, etc.). We will discuss how to handle the discrepancy
between models used to create policies and the actual system being
controlled (evaluation and robustness issues). The course will combine
lectures, student-presented material, and projects. The goal of this
course will be to help participants find the most effective methods
for their problems.
Why teach an optimal control course at the Robotics Institute at CMU?
Progress in computer animation based on dynamic
optimization has demonstrated
solutions to problems that were previously out of reach. We are
getting closer to practical use of optimal control and reinforcement
learning for animation and robot planning.
New hardware, such as cost-effective supercomputer clusters and thousand-core
GPUs/CPUs, also helps make optimal control and reinforcement learning
practical.
CMU has been a leader in applying optimal control to animation and
robotics.
We honor Andy Witkin
(1952-2010) for his contributions in applying
trajectory optimization to computer
animation. Andy was a professor at CMU before moving to Pixar.
Andrew Witkin and Michael Kass. Spacetime constraints. Computer Graphics, 22:159-168, 1988. Proc. Siggraph '88:
Topics
We will focus on systems with continuous states and continuous actions.
We will focus on deterministic systems and systems with "mild" stochastic
effects (most of the time any additive noise distribution is
assumed to be unimodal or Gaussian, and there are few discontinuities). We will talk about both
continuous and discrete time systems.
-
Introduction. What are
function optimization,
trajectory optimization,
policy optimization,
value functions, etc.?
-
Optimizing an operating point: function optimization.
-
Trajectory optimization I: Use function optimization (see the first sketch after this list).
-
Trajectory optimization II: Use the chain rule.
-
Differential Dynamic Programming
-
Policy optimization I: Use function optimization. Known in machine learning/reinforcement learning as policy search, policy refinement, policy gradient, ...
-
Policy optimization II: Dynamic programming (DP); see the second sketch after this list.
-
Traditional dynamic programming and the curse of dimensionality. Can supercomputers help? Random action DP.
-
Adaptive grids. Random sampling of states. Trajectory library approaches.
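Two minimal sketches to make the topics above concrete. Both use Python/NumPy purely for illustration (the course itself suggests Matlab or C), and the dynamics model, horizon, costs, and grid below are illustrative assumptions, not course-provided code.
The first sketch treats trajectory optimization as plain function optimization (direct shooting): the entire control sequence is one decision vector handed to a generic optimizer.
```python
# A minimal sketch (illustrative assumptions throughout): trajectory
# optimization as function optimization, via direct (single) shooting on a
# double integrator.  The whole control sequence is one decision vector.
import numpy as np
from scipy.optimize import minimize

dt, N = 0.1, 50                          # time step and horizon length
A = np.array([[1.0, dt], [0.0, 1.0]])    # double-integrator dynamics x+ = Ax + Bu
B = np.array([0.0, dt])
x0 = np.array([1.0, 0.0])                # start at position 1, velocity 0
x_goal = np.array([0.0, 0.0])            # drive the state to the origin

def rollout(u):
    """Simulate the state trajectory produced by the control sequence u."""
    x, traj = x0.copy(), []
    for uk in u:
        x = A @ x + B * uk
        traj.append(x)
    return np.array(traj)

def cost(u):
    """Quadratic control effort plus a terminal penalty on the goal error."""
    traj = rollout(u)
    return 0.01 * np.sum(u**2) + 100.0 * np.sum((traj[-1] - x_goal)**2)

# Hand the whole control sequence to a generic optimizer (numerical gradients).
result = minimize(cost, np.zeros(N), method="BFGS")
print("final state:", rollout(result.x)[-1])
```
The second sketch is tabular dynamic programming (value iteration) on a discretized one-dimensional state space. It also makes the curse of dimensionality concrete: one state dimension needs 101 table entries here, while n dimensions would need 101**n.
```python
# A minimal sketch (illustrative assumptions throughout): tabular dynamic
# programming (value iteration) on a discretized 1-D state space.
import numpy as np

states = np.linspace(-1.0, 1.0, 101)     # discretized positions
actions = np.array([-0.02, 0.0, 0.02])   # move left, stay, move right
gamma = 0.99                             # discount factor
V = np.zeros(len(states))                # value table, one entry per state

def nearest(x):
    """Index of the grid state closest to x (nearest-neighbor discretization)."""
    return int(np.argmin(np.abs(states - x)))

def q_values(s):
    """One-step cost (distance to the goal at 0 plus action cost) plus discounted value."""
    return [abs(s) + 0.1 * abs(a) + gamma * V[nearest(s + a)] for a in actions]

for _ in range(1000):                    # value-iteration sweeps until convergence
    V_new = np.array([min(q_values(s)) for s in states])
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new

# The greedy policy with respect to V picks the action with the lowest Q-value.
policy = [actions[int(np.argmin(q_values(s)))] for s in states]
print("value at x = 1.0:", V[nearest(1.0)])
```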
Optional topics: Based on class interest, we will pick from this list and
topics suggested by participants:
-
Stochastic effects on all of the above.
-
Belief states/information states and dual control
-
Robust versions of all of the above. What happens when the model is wrong
(as it always is)?
-
How can we manually choose good features or basis functions? How
can we do it automatically?
-
Handling periodic systems such as walking.
-
Relationship to robot motion planning techniques.
-
Receding horizon control (see the sketch after this list)
-
Games
-
Changing/simplifying/abstracting problem representations
-
Abstraction/Aggregation (state and/or action aggregation)
-
Decoupling
-
Approximation with simpler problem
-
Multigrid methods for trajectory optimization and DP.
-
Meta-optimization: avoid local minima by varying structural parameters
such as sampling rate, resolution, number of knot points, regularization
parameters, etc.
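For receding horizon control above, a minimal sketch may help (again Python/NumPy purely for illustration; the double-integrator model, horizon, and cost weights are assumptions, not course material): the controller re-solves a short-horizon trajectory optimization from the current state at every step and applies only the first control of each plan.
```python
# A minimal sketch (illustrative assumptions throughout): receding horizon
# control (model-predictive control).  Re-plan at every step, apply only the
# first control of the plan, then repeat from the new state.
import numpy as np
from scipy.optimize import minimize

dt, H = 0.1, 10                          # time step and short planning horizon
A = np.array([[1.0, dt], [0.0, 1.0]])    # double-integrator dynamics
B = np.array([0.0, dt])
x_goal = np.array([0.0, 0.0])

def plan_cost(u, x):
    """Cost of running the control sequence u from state x over the horizon."""
    c = 0.0
    for uk in u:
        x = A @ x + B * uk
        c += (x - x_goal) @ (x - x_goal) + 0.01 * uk**2
    return c

x = np.array([1.0, 0.0])                 # current (true) state
u_warm = np.zeros(H)                     # warm-start each solve with the last plan
for step in range(50):
    u_warm = minimize(plan_cost, u_warm, args=(x,), method="BFGS").x
    x = A @ x + B * u_warm[0]            # apply only the first planned control
print("state after 50 receding-horizon steps:", x)
```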
Prerequisites
Graduate standing or permission of the instructor.
You must be able to write programs in some computer
language, or use a package such as Matlab.
It is helpful to have had prior exposure to numerical methods.
Textbook
There is no textbook for this course. See the resources listed below.
Work
-
Assignments will typically involve solving optimal control and reinforcement learning problems
using packages such as Matlab, or writing programs in a language like C
with numerical libraries.
-
Project: Described on main course web page.
-
Presentations: Students will be asked to make presentations in class
and to present their project.
-
There will be no exams. Your
grade will be based on assignments,
the project, and presentations.
Relevant Books
There is no one reading with the coverage I want, but here are some
relevant readings.
Dynamic Optimization, Arthur E. Bryson, 1999 (Matlab code)
Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, Second Edition (Advances in Design and Control), John T. Betts, 2009
Dynamic Programming and Optimal Control, Dimitri P. Bertsekas,
Vol. I, 3rd edition, 2005; Vol. II, 3rd edition, 2007.
Relevant Courses
-
16-899 Adaptive Control and Reinforcement Learning (Spring 2020)
-
16-881 Deep Reinforcement Learning for Robotics
-
10-703: Deep Reinforcement Learning and Control
-
10-403: Deep Reinforcement Learning and Control (undergrad version)
-
CMU ChemE: 6-720 Advanced Process Systems Engineering
-
CMU Tepper: 47-840 Dynamic Programming, 47-832 Nonlinear Programming
-
CMU MATH: 21-690 Methods of Optimization
-
CMU MechE 24-785 Engineering Optimization
-
CMU ChemE: 6-462 Optimization Modeling and Algorithms
-
Berkeley's Deep Reinforcement Learning
-
David Silver, UCL: Reinforcement Learning
-
Udemy: Advanced AI: Deep Reinforcement Learning in Python
The Complete Guide to Mastering Artificial Intelligence using Deep Learning and Neural Networks
-
Udacity/Georgia Tech: Reinforcement Learning
-
Coursera
-
ECE 553 - Optimal Control,
Spring 2008, ECE, University of Illinois at Urbana-Champaign,
Yi Ma
-
U. Washington, Todorov
-
MIT: 6.231 Dynamic Programming and Stochastic Control, Fall 2008.
See
Dynamic Programming and Optimal Control/Approximate Dynamic Programming
for the Fall 2009 course slides.
-
EPFL: IC-32, Winter Semester 2006/2007: Nonlinear and Dynamic
Optimization: From Theory to Practice
-
AGEC 637:
Lectures in Dynamic Optimization:
Optimal Control and Numerical Dynamic Programming
-
U. Florida: AEB 6533: Static and Dynamic Optimization Models in Agriculture
-
USC: Syllabus: Reinforcement Learning and Learning Control
-
Duke: BA591.12 Dynamic programming and optimal control
-
MIT: 14.451 Dynamic Optimization Methods with Applications, Fall 2009
-
MIT: 16.323 Principles of Optimal Control
-
MIT 16.410/16.413 Principles of Autonomy and Decision Making
-
Berkeley CS 294-40
Learning for robotics and control, Fall 2009
-
UPenn: Dynamic Programming and Stochastic Control
-
UBC
CPSC 532M, Topics in AI: Dynamic Programming 2007-8
-
U. of Cambridge:
Optimization and Control
-
Stanford: MS&E 351 Dynamic Programming 2008
-
Google "courses optimal control" or "courses reinforcement learning"
Stuff From Economics
Worked Examples in Dynamic Optimization:
Analytic and Numeric Methods
Dynamic Optimization in Continuous-Time Economic Models
(A Guide for the Perplexed)
Dynamic Optimization: The Calculus of Variations and Optimal Control in Economics and Management (Advanced Textbooks in Economics)
Elements of Dynamic Optimization (also see Amazon.com page)
Econ 610: Stochastic Dynamic Optimization in Economics
Lancaster U. Management School: PhD Course - Foundations of Optimization
Other Resources
Dynamic Programming and Optimal Control/Approximate Dynamic Programming,
Dimitri P. Bertsekas
B. van Roy: See Dynamic Optimization publications
DOTcvpSB, a software toolbox for dynamic optimization in systems biology
Optimal control (Wikipedia) - too narrow
Optimal control (Scholarpedia) - also too narrow