Due in class at 1:30pm, July 23. You may work in a group of up to three students, but each individual must be involved in every question. Do not assign problems to individuals within a team! Please submit only one solution per group.
Consider the following state space, an extension of the one we did in class. There is a reward of 72 for taking the right (R) action from d. We'll start out in position a.
Fill out a table like the following. (I've done the first few steps for you.)
step | 1 | 2 | 3 | 4 | 5 | ... |
state (s) | a | b | c | d | a | ... |
action (A) | R | R | R | R | R | ... |
reward (r) | 0 | 0 | 0 | 72 | 0 | ... |
new state (s') | b | c | d | a | b | ... |
Q(a,R) | 0 | 0 | 0 |  |  | ... |
Q(b,R) | 0 | 0 | 0 |  |  | ... |
Q(c,R) | 0 | 0 | 0 |  |  | ... |
Q(d,R) | 0 | 0 | 0 |  |  | ... |
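
If you want to check your hand-computed entries, here is a minimal Python sketch of a tabular update. It assumes the standard Q-learning rule, Q(s,A) <- Q(s,A) + alpha * (r + gamma * max Q(s',A') - Q(s,A)), which may not match the exact form we used in class. The values ALPHA = 0.5 and GAMMA = 0.5 are placeholders and are not given in this handout; substitute the ones from class. The sketch also models only the R action and the cycle a, b, c, d, a shown in the table, so the max over next actions reduces to Q(s',R). The names run_q_learning, NEXT_STATE, and REWARD are made up for this illustration.

```python
# Placeholder parameters: NOT taken from the handout, replace with the class values.
ALPHA = 0.5  # learning rate (assumed)
GAMMA = 0.5  # discount factor (assumed)

STATES = ["a", "b", "c", "d"]
# Taking R moves one state to the right; R from d returns to a (as in the table).
NEXT_STATE = {"a": "b", "b": "c", "c": "d", "d": "a"}
# Reward for taking R from each state; only d pays out (72).
REWARD = {"a": 0, "b": 0, "c": 0, "d": 72}


def run_q_learning(num_steps, alpha=ALPHA, gamma=GAMMA, start="a"):
    """Always take R and return one row per step:
    (step, s, r, s_next, copy of the Q(., R) values AFTER that step's update)."""
    q = {s: 0.0 for s in STATES}  # Q(s, R), initialised to 0
    history = []
    s = start
    for step in range(1, num_steps + 1):
        s_next = NEXT_STATE[s]
        r = REWARD[s]
        # Only R is modelled, so max over next actions is just Q(s_next, R).
        target = r + gamma * q[s_next]
        q[s] += alpha * (target - q[s])
        history.append((step, s, r, s_next, dict(q)))
        s = s_next
    return history


if __name__ == "__main__":
    for step, s, r, s_next, q in run_q_learning(8):
        print(step, s, "R", r, s_next, q)
```

With these placeholder values, every Q entry stays 0 through step 3, and Q(d,R) first becomes nonzero at step 4, when the reward of 72 is received. Your numbers will differ if the class parameters differ.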