Assignment 2

Due in class at 1:30pm, July 23. You may work in a group of up to three students, but each individual must be involved in every question. Do not assign problems to individuals within a team! Please submit only one solution per group.

Consider the following state space, an extension of the one we did in class. There is a reward of 72 for taking the right (R) action from state d. We start out in state a.

  1. Using the basic Q-learning algorithm described in Section 3.2 (with gamma = 1/2), show how the Q values change as you repeatedly take the R action starting from a. Keep going right until Q(d,R) exceeds 72. (A Python sketch of this procedure is given after problem 3.)

    Fill out a table like the following. (I've done the first few steps for you.)
    step            1   2   3   4   5   ...
    state (s)       a   b   c   d   a   ...
    action (A)      R   R   R   R   R   ...
    reward (r)      0   0   0   72  0   ...
    new state (s')  b   c   d   a   b   ...
    Q(a,R)          0   0   0
    Q(b,R)          0   0   0
    Q(c,R)          0   0   0
    Q(d,R)          0   0   0

  2. Now do the same thing, using the variant algorithm described in Section 3.4. Use an alpha of 1 and a lambda of 1. Complete a table like the one in the previous problem. (A second sketch for this variant also appears after problem 3.)

  3. In problem 1, how many steps did you take before the Q hypothesis was refined to the point where Q(d,R) exceeded 72? What about in problem 2?
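
To check your hand computations for problem 1, here is a minimal Python sketch. It assumes the basic algorithm of Section 3.2 is the deterministic update Q(s,a) <- r + gamma * max_a' Q(s',a'); if the reading states the rule differently, follow the reading.

    # Problem 1 sketch: deterministic Q-learning on the four-state cycle.
    # Only the R action is ever taken, so the max over next actions is just Q(s',R).
    GAMMA = 0.5

    NEXT = {"a": "b", "b": "c", "c": "d", "d": "a"}   # effect of the R action
    REWARD = {"d": 72}                                # reward for taking R from d

    Q = {s: 0.0 for s in NEXT}   # Q(s,R), initially all zero

    s, step = "a", 0
    while Q["d"] <= 72:
        step += 1
        r = REWARD.get(s, 0)
        s_next = NEXT[s]
        Q[s] = r + GAMMA * Q[s_next]   # Q(s,R) <- r + gamma * Q(s',R)
        print(step, s, "R", r, s_next, dict(Q))
        s = s_next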
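
For problem 2, the sketch below assumes the Section 3.4 variant adds a learning rate alpha and eligibility traces governed by lambda in a Q(lambda)-style update, with gamma still 1/2. The trace handling shown here (accumulating traces, decayed by gamma*lambda each step) is an assumption, so treat it only as an illustration of where alpha and lambda enter; your table must follow the rule exactly as the reading states it.

    # Problem 2 sketch: Q(lambda)-style update with learning rate alpha.
    # delta is the TD error; every (state, R) pair is credited in proportion
    # to its eligibility trace, which decays by gamma*lambda each step.
    GAMMA, ALPHA, LAMBDA = 0.5, 1.0, 1.0

    NEXT = {"a": "b", "b": "c", "c": "d", "d": "a"}   # effect of the R action
    REWARD = {"d": 72}                                # reward for taking R from d

    Q = {s: 0.0 for s in NEXT}   # Q(s,R)
    e = {s: 0.0 for s in NEXT}   # eligibility trace for (s,R)

    s, step = "a", 0
    while Q["d"] <= 72:
        step += 1
        r = REWARD.get(s, 0)
        s_next = NEXT[s]
        delta = r + GAMMA * Q[s_next] - Q[s]   # TD error for this transition
        e[s] += 1.0                            # accumulating trace
        for x in Q:
            Q[x] += ALPHA * delta * e[x]       # credit earlier states too
            e[x] *= GAMMA * LAMBDA             # decay all traces
        print(step, s, "R", r, s_next, dict(Q))
        s = s_next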