next up previous
Next: Convergence of Modified Value Up: Point-Based DP Update: The Previous: The Algorithm

Stopping Point-Based Value Iteration

Consider the do-while loop of POINT-BASED-VI (Figure 2). Starting from an initial set of vectors, it generates a sequence of sets. If the initial set is uniformly improvable, then the value functions represented by the sets are monotonically increasing and are upper bounded by the optimal value function. As such, they converge to a value function (which is not necessarily the optimal value function). The question is when to stop the do-while loop.

A straightforward method would be to compute the distance tex2html_wrap_inline1240 between two consecutive sets tex2html_wrap_inline1144 and tex2html_wrap_inline1142 and stop when the distance falls below a threshold. To compute the distance, one needs to solve tex2html_wrap_inline1636 linear programs, which is time consuming. We use a metric that is less expensive to compute. To be more specific, we stop the do-while loop when

displaymath1628

In words, we calculate the maximum difference between tex2html_wrap_inline1144 and tex2html_wrap_inline1142 at the witness points of vectors in tex2html_wrap_inline1144 and stop the do-while loop when this quantity is no larger than tex2html_wrap_inline1644 . Here tex2html_wrap_inline1646 is the threshold on the Bellman residual for terminating value iteration and tex2html_wrap_inline1648 is a number between 0 and 1. In our experiments, we set it at 0.1.



Dr. Lian Wen Zhang
Thu Feb 15 14:47:09 HKT 2001