You need to be happy about Markov Decision Processes (the previous Andrew Tutorial) before venturing into Reinforcement Learning. It concerns the fascinating question of whether you can train a controller to perform optimally in a world where it may be necessary to suck up some short-term punishment in order to achieve long-term reward. We will discuss certainty-equivalent RL, Temporal Difference (TD) learning, and finally Q-learning. The curse of dimensionality will be constantly leering over our shoulder, salivating and cackling.
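To give a flavour of the last of those topics, here is a minimal tabular Q-learning sketch. It is purely illustrative and not taken from the slides: the toy chain MDP, the constants (`ALPHA`, `GAMMA`, `EPSILON`), and all function names are assumptions made for this example.

```python
import random

# Tiny deterministic chain MDP: states 0..3, actions 0 (left) and 1 (right).
# Reaching state 3 yields reward 1 and ends the episode; other steps pay 0,
# so the agent must forgo immediate reward to reach the distant goal.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # illustrative settings, not tuned

def step(s, a):
    """One environment transition: returns (next_state, reward, done)."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
rng = random.Random(0)

for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy exploration.
        if rng.random() < EPSILON:
            a = rng.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: Q[s][i])
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap off the best next-state value.
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# The learned greedy policy heads right from every non-goal state.
policy = [max(range(N_ACTIONS), key=lambda i: Q[s][i]) for s in range(GOAL)]
print(policy)  # expect [1, 1, 1]
```

Note that the update rule is model-free: the agent never learns the transition function, in contrast to the certainty-equivalent approach, which estimates a model first and then solves it as an MDP.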
Powerpoint Format: The Powerpoint originals of these slides are freely available to anyone who wishes to use them for their own work, or who wishes to teach using them in an academic institution. Please email Andrew Moore at awm@cs.cmu.edu if you would like him to send them to you. The only restriction is that they are not freely available for use as teaching materials in classes or tutorials outside degree-granting academic institutions.
Advertisement: I have recently joined Google, and am starting up the new Google Pittsburgh office on CMU's campus. We are hiring creative computer scientists who love programming, and Machine Learning is one of the focus areas of the office. If you might be interested, feel welcome to send me email: awm@google.com .