Here is the main loop of the STAGE optimization algorithm, along with two alternative subroutines for training the learned evaluation function $\tilde{V}$, as described in Section 3.3.
STAGE(Algorithm A, Objective-function f(x)):
    /* Assumption: A(v) is a graph-search algorithm which, given any evaluation function v(x),
       acts as a Markov chain over the graph. */
    repeat:
        run A(f), producing a trajectory $T = (x_0, x_1, \ldots, x_n)$;
        Update_Fitter_From_Traj($\tilde{V}$, $T$, $f(x_n)$);
        run the two-stage optimization procedure: $A(\tilde{V})$, followed by $A(f)$ from the state where $A(\tilde{V})$ stopped, and print the result.
    until results stop improving.
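To make the control flow of the main loop concrete, here is a minimal Python sketch. Everything problem-specific is hypothetical and not taken from the text: bitstring states with single-bit-flip neighbours, the objective `toy_objective`, stochastic first-improvement descent as the algorithm A, a "mean outcome per number of ones" table as the fitter, a random restart for each training run of A(f), starting the A(Ṽ) stage at $x_n$, and a fixed iteration budget standing in for "until results stop improving".

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, ...]
Eval = Callable[[State], float]


def toy_objective(x: State) -> float:
    """Hypothetical f(x) to minimize: adjacent unequal bits + 0.1 per zero.
    Under single-bit flips its only local minima are all-ones (0.0)
    and all-zeros (a trap)."""
    boundaries = sum(a != b for a, b in zip(x, x[1:]))
    return boundaries + 0.1 * x.count(0)


def hill_climb(v: Eval, start: State, rng: random.Random) -> List[State]:
    """Hypothetical A(v): stochastic first-improvement descent on v.
    Returns the whole trajectory (x_0, ..., x_n)."""
    traj = [start]
    x = start
    while True:
        flips = list(range(len(x)))
        rng.shuffle(flips)  # random neighbour order: A(v) is a Markov chain
        nxt = None
        for i in flips:
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if v(y) < v(x):
                nxt = y
                break
        if nxt is None:
            return traj  # no single flip improves v: local minimum
        x = nxt
        traj.append(x)


def stage(f: Eval, n_bits: int, iterations: int, seed: int = 0) -> Tuple[State, float]:
    if iterations < 1:
        raise ValueError("need at least one iteration")
    rng = random.Random(seed)
    # Hypothetical fitter: V~(x) = mean outcome z seen for states with sum(x) ones.
    memory: Dict[int, List[float]] = {}

    def v_tilde(x: State) -> float:
        zs = memory.get(sum(x))
        if zs:
            return sum(zs) / len(zs)
        everything = [z for bucket in memory.values() for z in bucket]
        return sum(everything) / len(everything)  # unseen bucket: global mean

    best: Optional[State] = None
    for _ in range(iterations):  # fixed budget instead of "until results stop improving"
        start = tuple(rng.randint(0, 1) for _ in range(n_bits))  # assumed random restart
        traj = hill_climb(f, start, rng)   # run A(f): T = (x_0, ..., x_n)
        z = f(traj[-1])                    # result value f(x_n)
        for x in traj:                     # Update_Fitter_From_Traj (batch style)
            memory.setdefault(sum(x), []).append(z)
        # Two-stage procedure: A(V~) from x_n, then A(f) from where it stopped.
        x_prime = hill_climb(v_tilde, traj[-1], rng)[-1]
        result = hill_climb(f, x_prime, rng)[-1]
        for cand in (traj[-1], result):
            if best is None or f(cand) < f(best):
                best = cand
    assert best is not None
    return best, f(best)


state, value = stage(toy_objective, n_bits=12, iterations=20, seed=1)
print(state, value)
```

The popcount table is deliberately crude. It can only help because, on this toy objective, states with more ones tend to lead A(f) to the better of its two local minima, so descending on the fitted values pushes the second stage away from the all-zeros trap.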
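Update_Fitter_From_Traj_by_TD(Fitter V, Trajectory T, result-value z):
    /* Assumes that function approximator V is parametrized by weight vector w,
       that T = (x_0, ..., x_n), and that α is the learning rate. */
    for i := n downto 0, do:
        update V's weights by the delta rule:
            $w := w + \alpha\,\big(V(x_{i+1}) - V(x_i)\big)\,\nabla_w V(x_i)$, taking $V(x_{n+1}) := z$;
    end.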
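A minimal Python sketch of the TD variant for a linear approximator $V(x) = w^\top \phi(x)$, whose gradient $\nabla_w V(x)$ is simply $\phi(x)$. It assumes the TD(0) reading of the delta rule: the target for $x_n$ is the result value $z$, and the target for every earlier $x_i$ is $V(x_{i+1})$. The feature map `features`, the step size `alpha`, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")


def linear_value(w: Sequence[float], phi: Sequence[float]) -> float:
    """V(x) = w . phi(x) for a linear approximator."""
    return sum(wi * fi for wi, fi in zip(w, phi))


def update_fitter_by_td(
    w: List[float],
    trajectory: Sequence[S],
    z: float,
    features: Callable[[S], Sequence[float]],
    alpha: float = 0.1,
) -> List[float]:
    """One backward TD(0) sweep over T = (x_0, ..., x_n); returns new weights."""
    w = list(w)
    n = len(trajectory) - 1
    for i in range(n, -1, -1):  # for i := n downto 0
        phi_i = features(trajectory[i])
        if i == n:
            target = z  # final state is trained toward the result value
        else:
            target = linear_value(w, features(trajectory[i + 1]))  # bootstrap
        error = target - linear_value(w, phi_i)
        # Delta rule: w := w + alpha * error * grad_w V(x_i), with grad = phi(x_i).
        w = [wi + alpha * error * fi for wi, fi in zip(w, phi_i)]
    return w


def one_hot(s: int) -> List[float]:
    """Hypothetical tabular features for states 0, 1, 2."""
    return [1.0 if s == k else 0.0 for k in range(3)]


print(update_fitter_by_td([0.0, 0.0, 0.0], [0, 1, 2], z=1.0, features=one_hot, alpha=0.5))
```

Sweeping backward is what lets a single trajectory pass $z$ all the way to $x_0$: with zero initial weights, a forward sweep would change only the weight for $x_n$, because every earlier target would still be zero when it was used.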
Update_Fitter_From_Traj_by_Batch_Fit(Fitter V, Trajectory T, result-value z):
    /* Assumes that V stores all data it has ever been trained on,
       and that T = (x_0, ..., x_n). */
    for i := 0 to n, do:
        Add training pair $(x_i, z)$ to V's memory;
    end.
    Re-train V from the updated training set.
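A minimal Python sketch of the batch-fit variant, assuming (hypothetically) that V is linear in a feature vector $\phi(x)$ and is re-trained by ridge-regularized least squares. The class name, the `ridge` parameter, and passing precomputed feature vectors instead of raw states are all illustrative choices, not part of the text.

```python
from typing import List, Sequence, Tuple


class BatchLinearFitter:
    """Hypothetical fitter: keeps every (phi(x_i), z) pair and refits w from scratch."""

    def __init__(self, n_features: int, ridge: float = 1e-6) -> None:
        self.n = n_features
        self.ridge = ridge
        self.memory: List[Tuple[List[float], float]] = []
        self.w = [0.0] * n_features

    def predict(self, phi: Sequence[float]) -> float:
        return sum(wi * fi for wi, fi in zip(self.w, phi))

    def update_from_traj(self, trajectory_features: Sequence[Sequence[float]], z: float) -> None:
        for phi in trajectory_features:  # for i := 0 to n: add (x_i, z)
            self.memory.append((list(phi), z))
        self._refit()  # re-train V from the updated training set

    def _refit(self) -> None:
        n = self.n
        # Augmented normal equations [A^T A + ridge*I | A^T y] over all stored pairs.
        m = [[0.0] * (n + 1) for _ in range(n)]
        for phi, z in self.memory:
            for r in range(n):
                for c in range(n):
                    m[r][c] += phi[r] * phi[c]
                m[r][n] += phi[r] * z
        for r in range(n):
            m[r][r] += self.ridge
        # Gauss-Jordan elimination with partial pivoting.
        for col in range(n):
            piv = max(range(col, n), key=lambda r: abs(m[r][col]))
            m[col], m[piv] = m[piv], m[col]
            for r in range(n):
                if r != col:
                    factor = m[r][col] / m[col][col]
                    for c in range(col, n + 1):
                        m[r][c] -= factor * m[col][c]
        self.w = [m[r][n] / m[r][r] for r in range(n)]


fitter = BatchLinearFitter(n_features=2)
fitter.update_from_traj([[1.0, 0.0], [1.0, 0.0]], z=2.0)  # one trajectory, outcome 2
fitter.update_from_traj([[1.0, 1.0]], z=5.0)              # another, outcome 5
print(fitter.w)
```

For this particular linear least-squares fitter, keeping running sums of $\phi\phi^\top$ and $\phi z$ would yield the same weights without storing every pair; storing the pairs, as the subroutine assumes, is the form that carries over to fitters that must be re-trained from the raw data.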