This assignment explores designing a controller and a global policy for an unstable system (cart-pole).
Here is a simulation of an inverted pendulum balanced on a cart. Your goal is to develop controllers for this system that get and keep the pole pointed upwards. Use the useful and lib directories from a previous assignment.
Part 1: Try to design a controller by hand for this system. The point of this part of the assignment is to make you appreciate automatic controller design methods. If you think this is too easy, manually design a controller for a jointed pole on a cart, a flexible pole, two unequal length poles on the same cart, or a system with a 0.2 second delay in responding to commands.
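As a starting point for Part 1, here is a minimal sketch of a hand-tuned linear feedback law tested on a frictionless cart-pole model. Everything in it is an assumption rather than part of the course software: the parameter values are the classic Barto/Sutton ones, the angle is measured from upright, positive force pushes the cart toward positive x, and the gains in `hand_controller` were picked by trial and reasoning, not by any design method. Adapt the names and signs to the simulator you are given.

```python
import math

# Assumed parameters (classic Barto/Sutton values), not the course simulator's.
G, M_CART, M_POLE, HALF_LEN = 9.8, 1.0, 0.1, 0.5


def cartpole_accel(theta, theta_dot, force):
    """Frictionless cart-pole accelerations; theta = 0 is upright."""
    total = M_CART + M_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + M_POLE * HALF_LEN * theta_dot ** 2 * sin_t) / total
    theta_acc = (G * sin_t - cos_t * temp) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total)
    )
    x_acc = temp - M_POLE * HALF_LEN * theta_acc * cos_t / total
    return x_acc, theta_acc


def hand_controller(state):
    """Hand-picked gains (an illustration only). Pushing the cart under the
    falling pole means the force has the same sign as theta."""
    x, x_dot, theta, theta_dot = state
    return 1.0 * x + 2.0 * x_dot + 30.0 * theta + 5.0 * theta_dot


def simulate(state, controller, seconds, dt=0.01):
    """Explicit Euler integration of the closed loop; returns the final state."""
    x, x_dot, theta, theta_dot = state
    for _ in range(int(round(seconds / dt))):
        force = controller((x, x_dot, theta, theta_dot))
        x_acc, theta_acc = cartpole_accel(theta, theta_dot, force)
        x, x_dot = x + dt * x_dot, x_dot + dt * x_acc
        theta, theta_dot = theta + dt * theta_dot, theta_dot + dt * theta_acc
    return x, x_dot, theta, theta_dot


if __name__ == "__main__":
    # Start with the pole tilted 0.1 rad and see where the state ends up.
    print(simulate((0.0, 0.0, 0.1, 0.0), hand_controller, 20.0))
```

Trying larger initial tilts, or adding a delay or a second pole as suggested above, quickly shows how fragile hand-picked gains are.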
Part 2: Design an LQR-based controller for this system. You must choose the optimization criterion. Find the LQR-based controller with the biggest volume of initial conditions in state space for which it works. You can model the volume with a grid, or try to estimate the volume with an ellipsoid. Test what happens when the optimization criterion has a very small penalty on the pole angle vs. a very small penalty on the cart position. Why is there a difference?
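One possible way to compute an LQR gain for Part 2 is sketched below. It makes several assumptions that are not part of the assignment: the Barto/Sutton parameter values, a frictionless model linearized about the upright position with state [x, x_dot, theta, theta_dot], a simple Euler discretization with a 0.01 s step, and a plain Riccati iteration (the helper `dlqr` is hypothetical, written here for illustration). The Q and R shown are placeholders; choosing them is your job.

```python
import numpy as np

# Assumed parameters (classic Barto/Sutton values); replace with your simulator's.
g, m_cart, m_pole, half_len = 9.8, 1.0, 0.1, 0.5
dt = 0.01  # assumed control period in seconds

# Linearization about upright, state = [x, x_dot, theta, theta_dot].
total = m_cart + m_pole
denom = half_len * (4.0 / 3.0 - m_pole / total)
a = g / denom                                 # theta_ddot per unit theta
b = -1.0 / (total * denom)                    # theta_ddot per unit force
c = -m_pole * half_len * a / total            # x_ddot per unit theta
d = (1.0 - m_pole * half_len * b) / total     # x_ddot per unit force
A = np.array([[0, 1, 0, 0],
              [0, 0, c, 0],
              [0, 0, 0, 1],
              [0, 0, a, 0]], dtype=float)
B = np.array([[0.0], [d], [0.0], [b]])


def dlqr(Ad, Bd, Q, R, tol=1e-10, max_iter=20000):
    """Discrete-time LQR by iterating the Riccati recursion to a fixed point.
    Returns (K, P) with control law u = -K x and cost-to-go x' P x."""
    P = Q.copy()
    for _ in range(max_iter):
        K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)
        P_next = Q + Ad.T @ P @ (Ad - Bd @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)
    return K, P


# Euler discretization; the costs are scaled by dt so that P approximates
# the continuous-time cost-to-go.
Ad = np.eye(4) + dt * A
Bd = dt * B
Q = np.eye(4) * dt           # placeholder state penalty
R = np.array([[1.0]]) * dt   # placeholder action penalty
K, P = dlqr(Ad, Bd, Q, R)

# Spectral radius below 1 means the linearized closed loop is stable.
rho = np.max(np.abs(np.linalg.eigvals(Ad - Bd @ K)))
print("K =", K, "spectral radius =", rho)
```

Stability of the linearized loop says nothing about how large the basin of attraction is. To estimate the volume, simulate the nonlinear system under u = -K x from a grid of initial conditions, then repeat with different diagonal entries in Q.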
Part 3: Design a global policy that can swing the pendulum up to the top using dynamic programming. The criterion you should use is a pure quadratic criterion x'*Q*x + u'*R*u where x is the state vector, u is the action vector, and Q and R are identity matrices. What is the volume of initial conditions in state space for which the LQR controller for the same criterion has a value function that is less than twice the value function of the optimal policy?
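The volume comparison at the end of Part 3 could be measured on a grid roughly as follows. This sketch assumes you already have the optimal value function tabulated on a regular grid (the array `V_opt`, produced by your own dynamic programming code) and that the LQR value function is taken to be the quadratic form x'*P*x from the Riccati solution. That quadratic form is only the true cost of the LQR controller where the linear model holds, so you may prefer to roll out the LQR controller on the nonlinear system instead. The function name and arguments are hypothetical.

```python
import numpy as np


def lqr_within_factor_volume(axes, V_opt, P, factor=2.0):
    """Estimate the state-space volume where x' P x < factor * V_opt(x).

    axes  : list of 1-D arrays, the evenly spaced grid coordinates per state
    V_opt : array of shape (len(axes[0]), len(axes[1]), ...) from your DP code
    P     : LQR cost-to-go matrix
    Returns (volume, mask), where mask marks the grid points that qualify.
    """
    grids = np.meshgrid(*axes, indexing="ij")
    X = np.stack(grids, axis=-1)                      # shape (..., n_states)
    V_lqr = np.einsum("...i,ij,...j->...", X, P, X)   # x' P x at every point
    mask = V_lqr < factor * V_opt
    cell_volume = np.prod([ax[1] - ax[0] for ax in axes])
    return mask.sum() * cell_volume, mask


if __name__ == "__main__":
    # Synthetic stand-in for a DP result, only to show the call.
    axes = [np.linspace(-1.0, 1.0, 5)] * 4
    X = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    fake_V_opt = 0.75 * np.sum(X ** 2, axis=-1)
    volume, _ = lqr_within_factor_volume(axes, fake_V_opt, np.eye(4))
    print("volume estimate:", volume)
```

The estimate is only as fine as the grid: each qualifying grid point contributes one cell of volume, so a coarse grid gives a coarse answer.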
Part 4: Just for fun: Let's measure how humans do this task, and try to identify what the human controller is. Software to be provided.
Things to think about: Does a longer or shorter pole make this easier or harder? Does viscous friction help? Does Coulomb friction help? How do you handle static friction?
Another thing to think about: How do you get LQR control design to generate an integral gain?
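One common answer is to augment the state with the integral of the error you care about and penalize it in the criterion. The sketch below does this for the cart position, adding a fifth state z with z_dot = x. It is an illustration under assumptions, not the required method: it uses the Barto/Sutton parameters, a frictionless linearization about upright, Euler discretization, identity weights, and a plain Riccati iteration.

```python
import numpy as np

# Assumed parameters (classic Barto/Sutton values) and control period.
g, m_cart, m_pole, half_len, dt = 9.8, 1.0, 0.1, 0.5, 0.01

# Linearized model about upright, state = [x, x_dot, theta, theta_dot].
total = m_cart + m_pole
denom = half_len * (4.0 / 3.0 - m_pole / total)
a = g / denom
b = -1.0 / (total * denom)
c = -m_pole * half_len * a / total
d = (1.0 - m_pole * half_len * b) / total
A = np.array([[0, 1, 0, 0],
              [0, 0, c, 0],
              [0, 0, 0, 1],
              [0, 0, a, 0]], dtype=float)
B = np.array([[0.0], [d], [0.0], [b]])

# Augment with z = integral of cart position: z_dot = x.
A_aug = np.zeros((5, 5))
A_aug[:4, :4] = A
A_aug[4, 0] = 1.0
B_aug = np.vstack([B, [[0.0]]])

# Euler discretization and placeholder identity weights (scaled by dt).
Ad = np.eye(5) + dt * A_aug
Bd = dt * B_aug
Q = np.eye(5) * dt   # the (5, 5) entry is the penalty on the integral state
R = np.array([[1.0]]) * dt

# Riccati iteration to a fixed point.
P = Q.copy()
for _ in range(50000):
    K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)
    P_next = Q + Ad.T @ P @ (Ad - Bd @ K)
    converged = np.max(np.abs(P_next - P)) < 1e-10
    P = P_next
    if converged:
        break
K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)

rho = np.max(np.abs(np.linalg.eigvals(Ad - Bd @ K)))
print("gains [x, x_dot, theta, theta_dot, integral of x]:", K)
print("spectral radius:", rho)
```

At run time the controller keeps its own accumulator, z += x * dt, and applies u = -K @ [x, x_dot, theta, theta_dot, z]. The last entry of K is the integral gain, and its size is set by the penalty placed on z.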
http://www.coneural.org/florian/papers/05_cart_pole.pdf
http://mil.engr.utk.edu/wiki/Cart-Pole_Dynamics_Testbed_and_Analysis
http://mil.engr.utk.edu/wiki/Example_Cart-Pole_Controllers
http://65.44.200.132/Library/1996/Correction_Cart-Pole.pdf
http://www.cs.ualberta.ca/~sutton/book/code/pole.c
http://brain.cc.kogakuin.ac.jp/~kanamaru/NN/CPRL/
http://portal.acm.org/citation.cfm?id=869873
http://www.cmap.polytechnique.fr/~munos/variable/cartpole.html
http://www-clmc.usc.edu/Resources/Publications?id=2654
http://www-clmc.usc.edu/publications/S/schaal-NIPS1997.pdf
http://www.stanford.edu/class/cs229/ps/ps4/q6/control.m
http://www-anw.cs.umass.edu/rlr/domains.html
http://mlg.eng.cam.ac.uk/marc/learn_ctrl.php
http://www.serbi.ula.ve/sira/Congresos%20Internacionales/con1994_5.pdf
http://www.ict.swin.edu.au/personal/jbrownlee/2005/TR07-2005.pdf
http://www.ijee.dit.ie/articles/Vol18-6/IJEE1333.pdf
http://inst.eecs.berkeley.edu/~ee128/fa08/labs/EECS128_lab5a.pdf
http://www.mitpressjournals.org/doi/abs/10.1162/0899766053011528
http://www.cs.cmu.edu/~sandholm/cs15-381/hw4/index.html
You can use any type of computer/OS/language you want. You can work in groups or alone.