Consider the do-while loop of POINT-BASED-VI (Figure 2). Starting from an initial set of vectors, it generates a sequence of sets. If the initial set is uniformly improvable, then the value functions represented by the sets are monotonically increasing and are upper bounded by the optimal value function. As such, they converge to a value function (which is not necessarily the optimal value function). The question is when to stop the do-while loop.
A straightforward method would be to compute the distance between two consecutive sets and and stop when the distance falls below a threshold. To compute the distance, one needs to solve linear programs, which is time consuming. We use a metric that is less expensive to compute. To be more specific, we stop the do-while loop when
In words, we calculate the maximum difference between and at the witness points of vectors in and stop the do-while loop when this quantity is no larger than . Here is the threshold on the Bellman residual for terminating value iteration and is a number between 0 and 1. In our experiments, we set it at 0.1.