The second experiment is essentially a repeat of the first, but in the robot arm domain. The initial number of steps before the goal was moved was reduced to 300,000 to speed up the experiments. Because the arm has only two degrees of freedom, and because of the restrictions discussed in Section 2.4, the number of possible variations is small. Only three obstacle configurations were therefore used, constructed by hand, each containing two obstacles. To increase the number of experiments and allow for greater statistical variation, each configuration was repeated with the goal in each of three possible positions, as shown in Figure 28. The black diamonds represent the obstacles; the black rectangles, the goal positions. Solutions to all of these tasks were loaded into the case base. When composing a function, however, the system is prevented from selecting a case that comes from the same goal and obstacle configuration as the task being solved, as illustrated in the sketch below.
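This exclusion rule can be stated compactly. The following is a minimal sketch, assuming the case base is a list of records tagged with obstacle-configuration and goal identifiers; the names `Case` and `eligible_cases` are hypothetical illustrations, not taken from the system described here.

```python
from dataclasses import dataclass

@dataclass
class Case:
    """One stored solution; field names are illustrative assumptions."""
    config_id: int    # which of the three hand-built obstacle configurations
    goal_id: int      # which of the three goal positions in that configuration
    subgraph: object  # the stored solution fragment used during composition

def eligible_cases(case_base, config_id, goal_id):
    """Cases usable for composition on the current task: every case except
    those from the same goal and obstacle configuration as the task itself."""
    return [c for c in case_base
            if (c.config_id, c.goal_id) != (config_id, goal_id)]
```

Filtering at retrieval time, rather than removing the cases from the case base, keeps each task's own solution available when it is the source case for a different task.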
The curves in Figure 29 are the average of 18 experimental runs: two new goal positions for each of the three original goal positions in the three obstacle configurations shown in Figure 28. There are only two learning curves, as non-reinitialized Q-learning was dropped. As in the first experiment, the function composition system (the lower curve) performed much better than Q-learning. The knee of the function composition curve occurs at 2,000 steps and the knee of the Q-learning curve at 50,000 steps, giving a speedup of 25. In this experiment, the case base contained matching subgraphs for all new tasks, so default functions were not needed. The composed functions tend to be very accurate, and little further refinement is necessary.
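To make the knee comparison concrete, the sketch below shows one way to estimate the knee of an averaged learning curve, assuming the curve records a performance value that rises toward an asymptote. The threshold-based definition and the name `knee_step` are illustrative assumptions, not the definition used to produce the figures.

```python
import numpy as np

def knee_step(steps, performance, fraction=0.9):
    """Estimate a learning curve's knee as the first training step at which
    performance reaches `fraction` of its final value. The 0.9 threshold is
    an illustrative choice, not the paper's definition."""
    steps = np.asarray(steps)
    performance = np.asarray(performance, dtype=float)
    target = fraction * performance[-1]
    first = int(np.argmax(performance >= target))  # first index meeting target
    return steps[first]

# Applied to the averaged curves, knees at 2,000 and 50,000 steps give a
# speedup of 50_000 / 2_000 = 25.
```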