Here is the main loop of the STAGE optimization algorithm, along with two
alternative subroutines for training the fitter V, as described in
Section 3.3.
STAGE(Algorithm A, Objective-function f(x)):
  /* Assumption: A(v) is a graph-search algorithm which, given any evaluation function v(x), acts as a Markov chain over the graph. */
  repeat:
    run A(f), producing a trajectory $T = (x_0, x_1, \ldots, x_n)$ with result value $z = f(x_n)$
    Update_Fitter_From_Traj(V, T, z)   /* by TD or by batch fit, as below */
    run the two-stage optimization procedure: run A(V) from the final state of T, then run A(f) from the state where A(V) stops
  until results stop improving.
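The main loop can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes a toy state space of integers `0..n_states-1` whose neighbours are `x-1` and `x+1`, a greedy-descent `hill_climb` standing in for A, quadratic `features` with a batch least-squares fit as the fitter V, that the A(f) run started from V's optimum also supplies the next training trajectory, and a `patience` counter as the "results stop improving" test. All of these names and parameters are illustrative.

```python
import random

import numpy as np


def hill_climb(v, start, n_states):
    """Hypothetical stand-in for A(v): steepest descent on v over states
    0..n_states-1, where the neighbours of x are x-1 and x+1.
    Returns the trajectory of visited states."""
    traj = [start]
    x = start
    while True:
        nbrs = [y for y in (x - 1, x + 1) if 0 <= y < n_states]
        best = min(nbrs, key=v)
        if v(best) >= v(x):  # no improving neighbour: local minimum of v
            return traj
        x = best
        traj.append(x)


def features(x, n_states):
    """Assumed quadratic features of a state for the fitter V."""
    s = x / n_states
    return [1.0, s, s * s]


def stage(f, n_states, max_iters=50, patience=5, seed=0):
    """Minimal STAGE sketch for minimisation; V is refit by batch least squares."""
    rng = random.Random(seed)
    X, y = [], []  # every training pair V has ever seen (batch-fit memory)
    start = rng.randrange(n_states)
    best_x, stale = start, 0
    for _ in range(max_iters):
        traj = hill_climb(f, start, n_states)  # run A(f), producing T
        x_n = traj[-1]
        z = f(x_n)  # result value z = f(x_n)
        for x in traj:  # every state on T is labelled with the outcome z
            X.append(features(x, n_states))
            y.append(z)
        w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
        if z < f(best_x):
            best_x, stale = x_n, 0
        else:
            stale += 1
            if stale >= patience:  # assumed stopping rule
                break

        def V(x, w=w):
            return float(np.dot(features(x, n_states), w))

        x_next = hill_climb(V, x_n, n_states)[-1]  # run A(V) from x_n
        # Restart A(f) where A(V) stopped, or at random if A(V) did not move.
        start = x_next if x_next != x_n else rng.randrange(n_states)
    return best_x


def f(x):
    """Toy objective: a local minimum at every multiple of 7, global minimum at 70."""
    return (x - 70) ** 2 / 100 + (0 if x % 7 == 0 else 10)


print(stage(f, 100))
```

On this toy objective a single run of A(f) stops at some nearby multiple of 7; the learned V is meant to steer each restart toward the region where past runs of A(f) ended with low values.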
Update_Fitter_From_Traj_by_TD(Fitter V, Trajectory T, result-value z):
  /* Assumes that function approximator V is parametrized by weight vector w. */
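  for each state x_i of T, from the final state back to x_0:
    compute the target: $y_i = z$ if $x_i$ is the final state of T, otherwise $y_i = \lambda\, y_{i+1} + (1 - \lambda)\, V(x_{i+1})$   /* λ ∈ [0, 1] is the TD(λ) parameter */
    update V's weights by delta rule: $w \leftarrow w + \alpha\,(y_i - V(x_i))\,\nabla_w V(x_i)$   /* α is the learning rate */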
end.
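The TD subroutine can be sketched in Python for a linear fitter $V(x) = w \cdot \phi(x)$, for which $\nabla_w V(x_i) = \phi(x_i)$. This is a minimal sketch under stated assumptions: the linear form of V, the feature vectors passed in as `feats`, and the parameter names `lam` and `alpha` are illustrative choices, not part of the original text. Targets are computed backward from the final state, and V's prediction for $x_{i+1}$ uses the weights as already updated in this pass, following the loop order above.

```python
import numpy as np


def update_fitter_from_traj_by_td(w, feats, z, lam=0.7, alpha=0.1):
    """Sketch of Update_Fitter_From_Traj_by_TD for a linear V(x) = w . phi(x).

    feats : feature vectors phi(x_0), ..., phi(x_n) along the trajectory T
    z     : result value of the trajectory
    lam   : assumed TD(lambda) parameter; alpha: assumed step size
    Returns the updated weights (the input w is not modified).
    """
    w = np.array(w, dtype=float)
    feats = [np.asarray(p, dtype=float) for p in feats]
    n = len(feats) - 1
    target = z  # y_n = z for the final state
    for i in range(n, -1, -1):  # final state back to x_0
        if i < n:
            # lambda-return: y_i = lam * y_{i+1} + (1 - lam) * V(x_{i+1})
            target = lam * target + (1 - lam) * float(feats[i + 1] @ w)
        error = target - float(feats[i] @ w)
        w += alpha * error * feats[i]  # delta rule: grad_w V(x_i) = phi(x_i)
    return w
```

With `lam=1` every state's target is z, the same label the batch-fit variant stores; with `lam=0` each state is pulled toward V's current estimate for its successor.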
Update_Fitter_From_Traj_by_Batch_Fit(Fitter V, Trajectory T, result-value z):
  /* Assumes that V stores all data it has ever been trained on. */
  for each state x_i of T:
    Add training pair to V's memory: $(x_i, z)$
  Re-train V from the updated training set.
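A fitter that suits the batch-fit subroutine can be sketched as below. This is an illustrative sketch only: the class name `BatchLinearFitter`, the linear form of V over a feature vector, and ordinary least squares as the re-training step are assumptions, not details from the text. The point it shows is that V keeps every pair it has ever been given and refits on the whole memory after each trajectory.

```python
import numpy as np


class BatchLinearFitter:
    """Hypothetical fitter for Update_Fitter_From_Traj_by_Batch_Fit.

    Assumes V(x) = w . phi(x) and re-trains by ordinary least squares
    over every (phi(x_i), z) pair stored so far.
    """

    def __init__(self, dim):
        self.X = []  # all feature vectors V has ever been trained on
        self.y = []  # matching target values
        self.w = np.zeros(dim)

    def predict(self, phi):
        return float(np.asarray(phi, dtype=float) @ self.w)

    def update_from_traj(self, feats, z):
        # Add one training pair (phi(x_i), z) per state of the trajectory.
        for phi in feats:
            self.X.append(np.asarray(phi, dtype=float))
            self.y.append(float(z))
        # Re-train V from the whole updated training set.
        self.w, *_ = np.linalg.lstsq(np.array(self.X), np.array(self.y), rcond=None)


fit = BatchLinearFitter(2)
fit.update_from_traj([[1, 0], [1, 1]], 3.0)  # first trajectory, result z = 3
fit.update_from_traj([[1, 2]], 5.0)          # second trajectory, result z = 5
print(fit.w, fit.predict([1, 3]))
```

Because the memory only grows, each refit costs more than the last; the TD variant above avoids storing data at the price of incremental, step-size-dependent updates.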