STAGE

Here is the main loop of the STAGE optimization algorithm, along with two alternative subroutines for training the fitter V, as described in Section 3.3.

STAGE(Algorithm A, Objective-function f(x)):
    /* Assumption: A(v) is a graph-search algorithm which, given any evaluation
       function v(x), acts as a Markov chain over the graph. */
    repeat:
        run A(f), producing a trajectory T;
        Update_Fitter_From_Traj(V, T, z), where z is the result value of T;
        run the two-stage optimization procedure and print the result.
    until results stop improving.
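The main loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: A is taken to be greedy hill-climbing (minimizing), the fitter is a least-squares line over a one-dimensional integer state, the two-stage step is read as "search on V from the end of the trajectory, then restart A(f) from wherever that search stops" (the exact expression is not given in the listing above), and a fixed number of rounds stands in for "until results stop improving". The names `hill_climb`, `fit_line`, `stage`, and `random_state`, and the toy objective, are all hypothetical.

```python
import random

def hill_climb(score, x, neighbors, max_steps=10_000):
    """Greedy descent on `score` from x; returns the list of visited states."""
    traj = [x]
    for _ in range(max_steps):
        best = min(neighbors(x), key=score)
        if score(best) >= score(x):
            break  # local minimum of `score`
        x = best
        traj.append(x)
    return traj

def fit_line(pairs):
    """Least-squares fit z ~ a + b*x; returns (a, b). Flat line if x is constant."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    mz = sum(z for _, z in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    b = sum((x - mx) * (z - mz) for x, z in pairs) / sxx if sxx else 0.0
    return mz - b * mx, b

def stage(f, neighbors, random_state, rounds=20):
    """Alternate search on f with search on the learned predictor V."""
    memory = []                  # every (state, result-value) pair seen so far
    x = random_state()
    best = x
    for _ in range(rounds):      # stand-in for "until results stop improving"
        traj = hill_climb(f, x, neighbors)           # run A(f)
        z = f(traj[-1])                              # result value of this run
        if z < f(best):
            best = traj[-1]
        memory.extend((s, z) for s in traj)          # update the fitter
        a, b = fit_line(memory)
        V = lambda s: a + b * s
        x = hill_climb(V, traj[-1], neighbors)[-1]   # search on V for a new start
        if x == traj[-1]:        # assumption: restart randomly if V suggests no move
            x = random_state()
    return best

# Toy problem (hypothetical): every multiple of 5 is a local minimum of f.
N = 100
f = lambda x: abs(x - 60) / 10 + (x % 5)
neighbors = lambda x: [y for y in (x - 1, x + 1) if 0 <= y < N]
rng = random.Random(0)
best = stage(f, neighbors, lambda: rng.randrange(N))
print(best, f(best))
```

The fitted line only captures the coarse trend of which direction leads to better results, which is all this toy problem needs; a richer function approximator would take its place in practice.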

Update_Fitter_From_Traj_by_TD(Fitter V, Trajectory T, result-value z):
    /* Assumes that function approximator V is parametrized by weight vector w. */
    for i := index of the last state of T downto 0, do:
        update V's weights w by the delta rule;
    end.
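A backward pass of this kind can be sketched as below. The update formulas are not legible in the listing, so this sketch substitutes a standard TD(λ) formulation and should be read as an assumption: V is linear in a feature vector, there are no intermediate rewards and no discounting, the target for the last state is z, and earlier targets blend the next target with V's current prediction for the next state. The names `td_lambda_update`, `features`, `alpha`, and `lam` are hypothetical.

```python
def td_lambda_update(w, features, traj, z, alpha=0.1, lam=0.7):
    """One backward pass over `traj`; returns the updated weight list.

    Assumes V(x) = w . features(x), so the gradient of V with respect to w
    is simply features(x).
    """
    w = list(w)

    def value(x):
        return sum(wi * fi for wi, fi in zip(w, features(x)))

    target = z  # target for the last state is the observed result value
    for i in range(len(traj) - 1, -1, -1):
        if i < len(traj) - 1:
            # lambda-return recursion with zero rewards and no discounting
            target = lam * target + (1.0 - lam) * value(traj[i + 1])
        phi = features(traj[i])
        error = target - value(traj[i])
        for k in range(len(w)):
            w[k] += alpha * error * phi[k]  # delta rule
    return w

# With lam = 1 every target equals z, so the pass reduces to supervised
# delta-rule training on (state, z) pairs.
w = td_lambda_update([0.0], lambda x: [1.0], [10, 11, 12], z=4.0, alpha=0.5, lam=1.0)
print(w)
```

Because the weights change during the pass, the prediction for the next state used in each target already reflects the updates made for later states in the trajectory.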

Update_Fitter_From_Traj_by_Batch_Fit(Fitter V, Trajectory T, result-value z):
    /* Assumes that V stores all data it has ever been trained on. */
    for i := 0 to index of the last state of T, do:
        Add training pair to V's memory: (x_i, z), where x_i is the i-th state of T;
    Re-train V from the updated training set.
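The batch variant can be sketched as a small class that keeps every training pair and refits from scratch after each trajectory. This is an illustrative sketch only: the choice of a least-squares line over a scalar state is an assumption made to keep the example short, and the names `BatchFitter` and `update_from_traj` are hypothetical.

```python
class BatchFitter:
    """Stores every (state, result-value) pair and refits V(x) = a + b*x."""

    def __init__(self):
        self.memory = []
        self.a, self.b = 0.0, 0.0

    def update_from_traj(self, traj, z):
        # Pair every state on the trajectory with the same result value z.
        for x in traj:
            self.memory.append((x, z))
        self._retrain()

    def _retrain(self):
        # Ordinary least squares over the whole memory, not just the new pairs.
        n = len(self.memory)
        if n == 0:
            return
        mx = sum(x for x, _ in self.memory) / n
        mz = sum(z for _, z in self.memory) / n
        sxx = sum((x - mx) ** 2 for x, _ in self.memory)
        sxz = sum((x - mx) * (z - mz) for x, z in self.memory)
        self.b = sxz / sxx if sxx else 0.0
        self.a = mz - self.b * mx

    def __call__(self, x):
        return self.a + self.b * x

V = BatchFitter()
V.update_from_traj([0, 1, 2], 4.0)    # a run that ended with result 4.0
V.update_from_traj([8, 9, 10], 2.0)   # a later run that ended with result 2.0
print(V(0), V(10))
```

After the second trajectory the fitted line slopes downward, so a search on V would move toward larger states, where earlier runs ended with better results.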