These experiments demonstrate that the E-PCA algorithm can scale to finding low-dimensional surfaces embedded in very high-dimensional spaces.
The algorithm is iterative, so no simple expression for the total running time is available. For a data set of samples of dimensionality , computing a surface of size , each iteration of the algorithm is . Each step of Newton's algorithm is dominated by a set of matrix multiplies and the final step of inverting an matrix, which is . The step consists of iterations, where each iteration has multiplies and the inversion. The step consists of iterations, where each iteration has multiplies and the inversion, leading to the total complexity given above.
Figure 23 shows the time to compute the E-PCA bases for 500 sample beliefs, for 20,230 states. This implementation used Java 1.4.0 and Colt 1.0.2, on a 1 GHz Athlon CPU with 900 MB of RAM. Also shown are the computation times for conventional PCA decomposition. For small state space problems, the E-PCA decomposition can be faster than PCA for a small number of bases, if the implementation of PCA always computes the full decomposition (, where is the reduced dimensionality and is the full dimensionality).
By far the dominant term in the running time of our algorithm is the time to compute the E-PCA bases. Once the bases have been found and the low-dimensional space has been discretized, value iteration converged to a policy in roughly 50 to 100 ms for the problems we have described.
In all the example problems we have addressed, we used a standard sample size of 500 beliefs, sampled from the model using hand-coded heuristic controllers. In practice, we found 500 sample beliefs collected using a semi-random controller sufficient for our example problems. However, we may be able to improve the overall performance of our algorithm on future problems by iterating between phases of building the belief space representation (i.e., collecting beliefs and generating the low-dimensional representation) and computing a good controller. Once an initial set of beliefs has been collected and used to build an initial set of bases and a corresponding policy, we can continue to evaluate the error of the representation (e.g., the K-L divergence between the current belief and its low-dimensional representation). If the initial representation has been learned from too few beliefs, then it may over-fit those beliefs; we can detect this situation by noticing that the representation does a poor job of representing new beliefs. Validation techniques such as cross-validation may also be useful in determining when enough beliefs have been acquired.
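The held-out check described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in our experiments: `kl_divergence`, `mean_kl` and `generalization_gap` are hypothetical helper names, `reconstruct` stands in for whatever procedure maps a belief to its low-dimensional representation and back, and the smoothing constant `eps` is an assumption added only to avoid taking the logarithm of zero.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """K-L divergence KL(p || q) between two discrete beliefs of equal length."""
    # Terms with p_i == 0 contribute nothing; eps (an assumption) guards q_i == 0.
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def mean_kl(beliefs, reconstruct):
    """Average K-L divergence between each belief and its reconstruction."""
    return sum(kl_divergence(b, reconstruct(b)) for b in beliefs) / len(beliefs)

def generalization_gap(train_beliefs, new_beliefs, reconstruct):
    """Error on new beliefs minus error on the beliefs used to fit the bases.

    A large positive gap suggests the representation over-fits the
    training beliefs and that more beliefs should be collected.
    """
    return mean_kl(new_beliefs, reconstruct) - mean_kl(train_beliefs, reconstruct)
```

How large a gap should trigger further belief collection is problem-dependent; the sketch only reports the gap and leaves that threshold to the user.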
One of the open questions we have not addressed so far is that of choosing the appropriate number of bases for our representation. Unless we have problem-specific information, such as the true number of degrees of freedom in the belief space (as in the toy example of section 3), it is difficult to identify the appropriate dimensionality of the underlying surface for control. One common approach is to examine the eigenvalues of the decomposition, which can be recovered using the orthonormalization step of the algorithm in Table 1. (This assumes our particular link function is capable of expressing the surface that our data lies on.) The eigenvalues from conventional PCA are often used to determine the appropriate dimensionality of the underlying surface; certainly the reconstruction will be lossless if we use as many bases as there are non-zero eigenvalues.
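For conventional PCA, the eigenvalue-based heuristic can be sketched as below. The function names are hypothetical, the eigenvalues are assumed to have been computed already, and the 95% default for the retained fraction is a common rule of thumb rather than a value taken from our experiments.

```python
def num_nonzero_eigenvalues(eigenvalues, tol=1e-10):
    """Number of bases needed for a lossless conventional PCA reconstruction."""
    # tol (an assumption) treats numerically tiny eigenvalues as zero.
    return sum(1 for ev in eigenvalues if abs(ev) > tol)

def bases_for_fraction(eigenvalues, fraction=0.95):
    """Smallest number of leading components whose eigenvalues sum to at
    least `fraction` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        return 0
    running = 0.0
    for k, ev in enumerate(ordered, start=1):
        running += ev
        if running >= fraction * total:
            return k
    # Fall back to all components if rounding keeps the sum just short.
    return len(ordered)
```

With the made-up spectrum `[4.0, 3.0, 2.0, 1.0, 0.0]`, four bases give a lossless reconstruction, while three already capture 80% of the total.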
Unfortunately, recall from the description of E-PCA in section 4 that we do not generate a set of singular values, or eigenvalues. The non-linear projection introduced by the link function causes the eigenvalues of the matrix to be uninformative about the contribution of each basis to the representation. Instead of using eigenvalues to choose the appropriate surface dimensionality, we use reconstruction quality, as in Figure 11. Using reconstruction quality to estimate the appropriate dimensionality is a common choice for both PCA and other dimensionality reduction techniques [TdSL00]. An alternative would be to evaluate the reward for policies computed at different dimensionalities and choose the most compact representation that achieves the highest reward, essentially using control error rather than reconstruction quality to determine dimensionality.
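Both selection criteria can be sketched in a few lines. This is an illustrative sketch only: the function names, the `tolerance` and `slack` parameters, and the dictionary inputs (mapping a candidate number of bases to a measured reconstruction error or average reward) are assumptions introduced here, not part of the algorithm as described.

```python
def choose_by_reconstruction(error_by_num_bases, tolerance):
    """Smallest number of bases whose reconstruction error (e.g. mean K-L
    divergence) is at most `tolerance`; None if no candidate qualifies."""
    for k in sorted(error_by_num_bases):
        if error_by_num_bases[k] <= tolerance:
            return k
    return None

def choose_by_reward(reward_by_num_bases, slack=0.0):
    """Most compact representation whose policy reward is within `slack`
    of the best reward observed over all candidates."""
    best = max(reward_by_num_bases.values())
    return min(k for k, r in reward_by_num_bases.items() if r >= best - slack)
```

A result of `None` from the first function means that no tested dimensionality reconstructs the sample beliefs well enough at the chosen tolerance.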
Recall from our discussion in section 2 that we are using dimensionality reduction to represent beliefs from POMDPs with a specific kind of structure. In particular, the E-PCA representation will be most useful in representing beliefs that are relatively sparse and have a small number of degrees of freedom. However, E-PCA will be unable to find good low-dimensional representations for POMDP models that do not exhibit this kind of structure, that is, if the beliefs cannot be represented as lying on a low-dimensional hyperplane linked to the full belief space via the appropriate link function. One additional problem, then, is how to know a priori whether or not a specific POMDP has the appropriate structure. It is unlikely that there is a general technique that can determine the usefulness of E-PCA, but we can use model selection techniques to determine whether or not E-PCA will find a usefully low-dimensional representation for a specific POMDP. For example, if the K-L divergence between a set of sample beliefs and their reconstructions is large even when using a large number of bases, then the problem may not have the right structure.