ROUT

Here are the ROUT algorithm and HUNTFRONTIERSTATE subroutine, as described in Section 2.3.

ROUT(start states , fitter F):

/* Assumes that the world model MDP is known and acyclic. */

initialize training set , and F := an arbitrary fit;

repeat:

for each start state not yet marked ``done'', do:

s := HUNTFRONTIERSTATE(x, F);

add s to training set S and re-train fitter F on S;

if (s = x), then mark start state x as ``done''.

until all start states in are marked ``done''.

HUNTFRONTIERSTATE(state x, fit F):

/* tex2html_wrap1952

for each legal action , do:

repeat up to H times:

generate a trajectory from x to termination, starting with action a;

let y be the last state on with Bellman residual ;

if and , then break out of loops, and

restart procedure with HUNTFRONTIERSTATE(y, F).

/* reaching this point, x's subtree is deemed all self-consistent and correct! */

return x.