next up previous contents
Next: STAGE Up: Algorithms Previous: TD() for Control

ROUT

 

Here are the ROUT algorithm and HUNTFRONTIERSTATE subroutine, as described in Section 2.3.

ROUT(start states tex2html_wrap_inline1864 , fitter F):
/* Assumes that the world model MDP is known and acyclic. */
initialize training set tex2html_wrap_inline1890 , and F := an arbitrary fit;
repeat:
for each start state tex2html_wrap_inline1894 not yet marked ``done'', do:
s := HUNTFRONTIERSTATE(x, F);
add tex2html_wrap_inline1900 s tex2html_wrap_inline1902 to training set S and re-train fitter F on S;
if (s = x), then mark start state x as ``done''.
until all start states in tex2html_wrap_inline1864 are marked ``done''.
HUNTFRONTIERSTATE(state x, fit F):
/* tex2html_wrap1952
for each legal action tex2html_wrap_inline1926 , do:
repeat up to H times:
generate a trajectory tex2html_wrap_inline1930 from x to termination, starting with action a;
let y be the last state on tex2html_wrap_inline1930 with Bellman residual tex2html_wrap_inline1940 ;
if tex2html_wrap_inline1942 and tex2html_wrap_inline1944 , then break out of loops, and
restart procedure with HUNTFRONTIERSTATE(y, F).
/* reaching this point, x's subtree is deemed all self-consistent and correct! */
return x.



Justin A. Boyan
Sat Jun 22 20:49:48 EDT 1996