Assignment 2

Due in class at 1:30pm, July 23. You may work in a group of up to three students, but each individual must be involved in every question. Do not assign problems to individuals within a team! Please submit only one solution per group.

Consider the following state space, an extension of the one we did in class. There is a reward of 72 for taking the right (R) action from state d. We start out in state a.

  1. Using the basic Q-learning algorithm described in Section 3.2 (with gamma = 1/2), show how the Q values change as you repeatedly take the R action starting from a. Keep going right until Q(d,R) exceeds 72. (A Python sketch of this procedure is given after problem 3.)

    Fill out a table like the following. (I've done the first few steps for you.)
    step            1   2   3   4   5   ...
    state (s)       a   b   c   d   a   ...
    action (A)      R   R   R   R   R   ...
    reward (r)      0   0   0   72  0   ...
    new state (s')  b   c   d   a   b   ...
    Q(a,R)          0   0   0
    Q(b,R)          0   0   0
    Q(c,R)          0   0   0
    Q(d,R)          0   0   0

  2. Now do the same thing, using the variant algorithm described in Section 3.4. Use an alpha of 1 and a lambda of 1. Complete a table like the one in the previous problem. (A second sketch for this variant also appears after problem 3.)

  3. In problem 1, how many steps did you take before the Q hypothesis was refined to the point where Q(d,R) exceeded 72? What about in problem 2?
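
To check your hand computations for problem 1, here is a minimal Python sketch. It assumes the basic algorithm of Section 3.2 is the deterministic update Q(s,a) <- r + gamma * max_a' Q(s',a'); if the reading states the rule differently, follow the reading.

    # Problem 1 sketch: deterministic Q-learning on the four-state cycle.
    # Only the R action is ever taken, so the max over next actions is just Q(s',R).
    GAMMA = 0.5

    NEXT = {"a": "b", "b": "c", "c": "d", "d": "a"}   # effect of the R action
    REWARD = {"d": 72}                                # reward for taking R from d

    Q = {s: 0.0 for s in NEXT}   # Q(s,R), initially all zero

    s, step = "a", 0
    while Q["d"] <= 72:
        step += 1
        r = REWARD.get(s, 0)
        s_next = NEXT[s]
        Q[s] = r + GAMMA * Q[s_next]   # Q(s,R) <- r + gamma * Q(s',R)
        print(step, s, "R", r, s_next, dict(Q))
        s = s_next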
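
For problem 2, the sketch below assumes the Section 3.4 variant adds a learning rate alpha and eligibility traces governed by lambda in a Q(lambda)-style update, with gamma still 1/2. The trace handling shown here (accumulating traces, decayed by gamma*lambda each step) is an assumption, so treat it only as an illustration of where alpha and lambda enter; your table must follow the rule exactly as the reading states it.

    # Problem 2 sketch: Q(lambda)-style update with learning rate alpha.
    # delta is the TD error; every (state, R) pair is credited in proportion
    # to its eligibility trace, which decays by gamma*lambda each step.
    GAMMA, ALPHA, LAMBDA = 0.5, 1.0, 1.0

    NEXT = {"a": "b", "b": "c", "c": "d", "d": "a"}   # effect of the R action
    REWARD = {"d": 72}                                # reward for taking R from d

    Q = {s: 0.0 for s in NEXT}   # Q(s,R)
    e = {s: 0.0 for s in NEXT}   # eligibility trace for (s,R)

    s, step = "a", 0
    while Q["d"] <= 72:
        step += 1
        r = REWARD.get(s, 0)
        s_next = NEXT[s]
        delta = r + GAMMA * Q[s_next] - Q[s]   # TD error for this transition
        e[s] += 1.0                            # accumulating trace
        for x in Q:
            Q[x] += ALPHA * delta * e[x]       # credit earlier states too
            e[x] *= GAMMA * LAMBDA             # decay all traces
        print(step, s, "R", r, s_next, dict(Q))
        s = s_next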