To explore the possible limitations of the approach, this section reviews the fundamental assumptions on which it is based.
It is a fundamental assumption that features arise in the reinforcement learning function that qualitatively define its shape. The features used in this paper are violations of a smoothness assumption: that neighboring states have very similar utility values. A wall, by preventing transitions between neighboring states, typically causes such a violation. Other things, such as actions with a significant cost, would have a similar effect. Smaller, more varied costs will not generate the features required by this approach, which therefore offers little speed up in such cases. If there is a mixture of large and small costs, it is expected that the system will capture the features generated by the former and initialize the function, leaving normal reinforcement learning to address the latter.
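To make the notion of a smoothness violation concrete, the sketch below flags neighboring cells of a discretized value function whose utilities differ by more than a cutoff. The function name and threshold are illustrative assumptions, not the paper's actual feature detector.

```python
import numpy as np

def smoothness_violations(V, threshold):
    """Flag grid cells where the learned value function V changes
    sharply between neighboring states (a candidate wall/feature).

    V: 2-D array of utility values over a discretized state space.
    threshold: hypothetical cutoff on the neighbor-to-neighbor jump.
    Returns a boolean mask of cells adjacent to a large discontinuity.
    """
    mask = np.zeros(V.shape, dtype=bool)
    # Jumps between horizontal neighbors.
    dx = np.abs(np.diff(V, axis=1)) > threshold
    mask[:, :-1] |= dx
    mask[:, 1:] |= dx
    # Jumps between vertical neighbors.
    dy = np.abs(np.diff(V, axis=0)) > threshold
    mask[:-1, :] |= dy
    mask[1:, :] |= dy
    return mask
```

A wall shows up as a connected run of flagged cells, whereas the small, varied costs discussed above fall below any reasonable threshold and leave no feature to capture.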
The smoothness assumption is less clear if the dimensions are not numeric. The neighborhood relation used here is a predefined distance metric over a continuous space. In nominal, binary or mixed domains it is not obvious how such a metric would be defined, although there is some work on such metrics for other applications (Osborne & Bridge 1997). If the dimensions are mixed, feature location might be limited to the continuous ones. If the dimensions are purely nominal or binary, a generalization of the snake may be appropriate: the snake is, at an abstract level, a constrained hill climber. Whether this idea would usefully generalize in this way is, at present, somewhat speculative.
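One hypothetical way to define a neighborhood relation over mixed dimensions is a heterogeneous metric that combines range-normalized differences on numeric dimensions with a simple overlap measure on nominal ones. The sketch below is an illustration of that idea, not a metric used in this paper.

```python
def mixed_distance(x, y, is_nominal, ranges):
    """Heterogeneous distance over mixed nominal/numeric dimensions.

    x, y: sequences of attribute values for two states.
    is_nominal: per-dimension flags marking nominal attributes.
    ranges: per-dimension (max - min) for normalizing numeric ones.
    All argument names are illustrative assumptions.
    """
    total = 0.0
    for xi, yi, nominal, r in zip(x, y, is_nominal, ranges):
        if nominal:
            d = 0.0 if xi == yi else 1.0   # simple overlap measure
        else:
            d = abs(xi - yi) / r           # range-normalized difference
        total += d * d
    return total ** 0.5
```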
It is a fundamental assumption that the features clearly delimit subtasks. In the domains discussed in this paper, the obstacles and walls subdivide the state space into regions connected by small ``doorways''. The subtask of reaching one doorway is largely unaffected by the subsequent subtask. In other domains this may not be the case: as the doorways become larger, the context sensitivity increases. As long as the composed solution is reasonably accurate, reinforcement learning can easily correct the error, although the speed up will be reduced. At some point, however, with a very large amount of context sensitivity, the advantage of dividing the task into subtasks becomes questionable. It would be possible to account for some of the context dependency in the graph matching stage by looking at larger units than single subgraphs. If two adjacent subgraphs match the new problem, they might be used as a pair, thereby including any contextual relationship between them. Even if single subgraphs were used, the context in which they appear, i.e. the shape of neighboring subgraphs, could be taken into account. In the limit, the whole task could be graph matched. But, as was argued in the introduction, this would considerably limit when transfer is applicable, and thus its overall effectiveness.
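As an illustration of matching larger units, the sketch below enumerates both single subgraphs and adjacent pairs from a region adjacency graph and scores each as one matching unit. The adjacency representation and the scoring function are assumptions made for illustration, not the paper's graph matcher.

```python
def best_unit_match(adjacency, match_score):
    """Score single subgraphs and adjacent pairs as matching units.

    adjacency: dict mapping each subgraph id to the set of its
        neighbors in the region adjacency graph (assumed structure).
    match_score: hypothetical function scoring how well a unit
        (a frozenset of one or two subgraph ids) matches the
        corresponding region of the new problem.
    Returns the best-scoring unit and its score.
    """
    units = set()
    for g, neighbors in adjacency.items():
        units.add(frozenset([g]))           # single subgraphs
        for h in neighbors:
            units.add(frozenset([g, h]))    # adjacent pairs carry context
    return max(((u, match_score(u)) for u in units), key=lambda t: t[1])
```

Matching pairs roughly doubles the size of the units considered while still stopping far short of matching the whole task, which, as noted above, would sharply limit when transfer applies.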
It is a fundamental assumption that the absolute position of the features is unimportant; it is the shape of the delimited region that matters. To increase the likelihood of transfer, solutions to subtasks have been subjected to a variety of transformations. In some domains many, if not all, of these transformations will be invalid. If actions cannot be rotated or reflected, or if many small costs affect different regions of the state space, the effectiveness of transfer will be reduced. This could be addressed, to some extent, by additional penalties on particular transformations, but again this would limit the opportunities for transfer. Which transformations are appropriate, and whether this can be determined automatically from the domain, will be the subject of future research.
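For a two-dimensional subtask solution stored as a grid, the candidate transformations amount to the rotations and reflections of that grid. The sketch below generates the eight candidates with a hypothetical per-transformation penalty; the penalty weights are illustrative assumptions, and setting one high effectively disables that transformation in domains where it is invalid.

```python
import numpy as np

def transformed_candidates(patch, rotation_penalty=0.0, reflection_penalty=0.0):
    """Yield (transformed_patch, penalty) pairs for a stored 2-D
    subtask solution. Penalty weights are illustrative assumptions.
    """
    for flipped in (False, True):
        base = np.fliplr(patch) if flipped else patch
        for k in range(4):                  # 0, 90, 180, 270 degrees
            penalty = k * rotation_penalty
            if flipped:
                penalty += reflection_penalty
            yield np.rot90(base, k), penalty
```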
It is a fundamental assumption that a vision processing technique can locate these features in a timely fashion, even in very high dimensional domains. Learning in very high dimensional domains is likely to be slow whatever technique is used: normal reinforcement learning will take time to navigate the much larger space, slowing the emergence of the features. Although the time taken to partition the function will increase, the frequency with which partitioning is applicable will decrease, so the amortized cost will rise more slowly than the partitioning cost itself. Further, as high dimensional spaces are generally problematic, methods such as principal components analysis and projection pursuit (Nason 1995) can be used to reduce dimensionality. It may prove in practice that the dimensionality that is important, and is therefore the focus of feature extraction, is much smaller than the actual dimensionality of the space.
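As a sketch of the dimensionality-reduction step, principal components analysis can project sampled states onto the few directions that capture most of the variance before features are sought. The sample matrix and component count below are illustrative assumptions.

```python
import numpy as np

def pca_project(states, n_components):
    """Project sampled states onto their top principal components.

    states: (n_samples, n_dims) array of visited states (assumed).
    n_components: number of directions to keep, ideally far fewer
        than n_dims if the important dimensionality is small.
    Returns the projected states and the component directions.
    """
    centered = states - states.mean(axis=0)
    # SVD of the centered data; rows of Vt are principal directions.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]
    return centered @ components.T, components
```

Feature extraction would then operate in the reduced space, where the cost of locating smoothness violations no longer scales with the full dimensionality.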