How do you plan efficiently if the results of your actions are
uncertain? There is some remarkably good news, and some
significant computational hardship. We begin by discussing Markov
Systems (which have no actions) and the notion of Markov Systems with
Rewards. We then motivate and explain the idea of infinite horizon
discounted future rewards. And then we look at two competing approaches
to deal with the following computational problem: given a Markov
System with Rewards, compute the expected long-term discounted rewards.
The two methods, which usually sit at opposite corners of the ring and
snarl at each other, are straight linear algebra and dynamic programming.
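As a concrete (and entirely made-up) illustration of that computational problem, here is a minimal sketch in Python/NumPy: a three-state Markov System with Rewards whose expected long-term discounted rewards are computed both ways, once by solving the linear system (I - gamma*P)J = r and once by iterating the dynamic-programming backup J <- r + gamma*P*J. The transition matrix, rewards, and discount factor are invented purely for the example.

    import numpy as np

    gamma = 0.9                      # discount factor for infinite-horizon rewards

    # A made-up 3-state Markov System with Rewards (no actions).
    P = np.array([[0.5, 0.5, 0.0],   # P[i, j] = probability of moving from state i to j
                  [0.2, 0.3, 0.5],
                  [0.0, 0.0, 1.0]])
    r = np.array([1.0, 0.5, 0.0])    # expected immediate reward in each state

    # Method 1 -- straight linear algebra:
    #   J = r + gamma * P J   =>   (I - gamma * P) J = r
    J_exact = np.linalg.solve(np.eye(3) - gamma * P, r)

    # Method 2 -- dynamic programming: iterate the backup until it converges.
    J = np.zeros(3)
    for _ in range(10_000):
        J_next = r + gamma * (P @ J)
        if np.max(np.abs(J_next - J)) < 1e-10:
            break
        J = J_next

    print(J_exact)   # the two methods agree (up to the convergence tolerance)
    print(J)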
We then make the leap up to Markov Decision Processes, and find that
we've already done 82% of the work needed to compute not only the
long-term rewards of each MDP state, but also the optimal action to
take in each state.
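And here, in the same hedged spirit, is a sketch of that step up to a Markov Decision Process: the same kind of backup as above, except that each state now offers a choice of actions and the backup takes the best one, yielding both the long-term value of each state and the optimal action to take there. The two actions, their transition matrices, the rewards, and the discount factor are again invented for the example.

    import numpy as np

    gamma = 0.9                      # discount factor for infinite-horizon rewards

    # A made-up 3-state MDP with 2 actions.
    # P[a, i, j] = probability of moving from state i to j under action a.
    P = np.array([[[0.5, 0.5, 0.0],
                   [0.2, 0.3, 0.5],
                   [0.0, 0.0, 1.0]],
                  [[0.9, 0.1, 0.0],
                   [0.1, 0.8, 0.1],
                   [0.0, 0.0, 1.0]]])
    r = np.array([[1.0, 0.5, 0.0],   # r[a, i] = expected reward for action a in state i
                  [0.8, 0.6, 0.0]])

    # Value iteration: back up the best action in every state until convergence.
    V = np.zeros(3)
    while True:
        Q = r + gamma * (P @ V)      # Q[a, i] = value of taking action a in state i
        V_next = Q.max(axis=0)
        if np.max(np.abs(V_next - V)) < 1e-10:
            break
        V = V_next

    policy = Q.argmax(axis=0)        # optimal action to take in each state
    print(V)                         # long-term discounted value of each state
    print(policy)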
Powerpoint Format: The Powerpoint originals of these slides are freely available to anyone who wishes to use them for their own work, or who wishes to teach using them in an academic institution. Please email Andrew Moore at awm@cs.cmu.edu if you would like him to send them to you. The only restriction is that they are not freely available for use as teaching materials in classes or tutorials outside degree-granting academic institutions.
Advertisement: I have recently joined Google, and am starting up the new Google Pittsburgh office on CMU's campus. We are hiring creative computer scientists who love programming, and Machine Learning is one of the focus areas of the office. If you might be interested, feel welcome to send me email: awm@google.com .