...points.
As will be seen later, the point-based DP update also backs up on other points.
...selected,
We will show how in Section 5.5.
...31#31.
Since b is a witness of 31#31 w.r.t. 48#48, we have 90#90. Since 33#33 is uniformly improvable, we also have 91#91. Together with the obvious fact that 92#92 and the condition 93#93, we have 94#94. Consequently, there cannot be any vector in 34#34 that equals 31#31.
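The mathematical symbols in this footnote were lost in conversion (the "#" placeholders), so the exact argument cannot be restored here. As a hedged sketch only, under assumed notation in which \mathcal{V} is the current, uniformly improvable set of vectors (so H\mathcal{V} \ge \mathcal{V}), \beta is the vector obtained by backing up at the point b, and the optimal value of the objective function at b is assumed positive, the general shape of such an argument is:

\[
\beta \cdot b \;=\; H\mathcal{V}(b) \;>\; \mathcal{V}(b) \;=\; \max_{\alpha \in \mathcal{V}} \alpha \cdot b \;\ge\; \alpha \cdot b
\quad \text{for all } \alpha \in \mathcal{V},
\]

and hence \beta cannot equal any vector in \mathcal{V}. The notation here is assumed, not the paper's.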
...otherwise:
In our actual implementation, the solution point b is used for backup even when the optimal value of the objective function is negative. In this case, a duplicate check is needed.
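To make the footnote concrete, the following is a minimal sketch of a backup step guarded by a duplicate check, in the spirit described above. It is not the paper's code: the helper backup, the vector representation, and the tolerance are illustrative assumptions.

    import numpy as np

    def add_backup_vector(vectors, b, backup, tol=1e-9):
        """Back up at belief point b and add the resulting vector to `vectors`.

        `vectors`: list of alpha-vectors (numpy arrays); `b`: a belief point;
        `backup`: an assumed helper returning the vector whose value at b equals
        the DP-updated value at b (hypothetical, not taken from the paper).
        """
        new_vec = backup(b)
        # Optimal value of the LP objective at b: how much the backed-up value
        # improves on the current value at b.
        improvement = float(np.dot(new_vec, b) - max(np.dot(a, b) for a in vectors))
        if improvement <= 0:
            # Non-positive objective: the backed-up vector may coincide with one
            # already in the set, so check for duplicates before adding it.
            if any(np.allclose(new_vec, a, atol=tol) for a in vectors):
                return vectors
        # Positive objective: the vector is strictly better at b, hence new.
        vectors.append(new_vec)
        return vectors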
...code.
The implementation is available on request.
...4).
Note that the times shown there do not include the time for testing the stopping condition.
...quality
The quality of a policy is estimated using the Bellman residual.
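The footnote does not restate the residual's definition; as a reminder of the standard form only (notation assumed here, not taken from the paper), with H the DP-update operator and V the current value function,

\[
\mathrm{res}(V) \;=\; \max_{b \in \mathcal{B}} \left| HV(b) - V(b) \right|,
\]

where \mathcal{B} is the belief space (in practice, a finite set of belief points). A small residual certifies quality through standard bounds such as \|V^{\pi} - V^{*}\|_{\infty} \le 2\gamma\,\mathrm{res}(V)/(1-\gamma) for the greedy policy \pi and discount factor \gamma, though the exact bound used in the paper is not reproduced here.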
...previously.
Hauskrecht (2000) has conducted an extensive survey of previous value function approximation methods and has empirically compared them in terms of, among other criteria, complexity and quality. It would be interesting to also include point-based value iteration in this empirical comparison. We do not do so in the present paper because our focus is on using point-based value iteration to speed up value iteration, rather than on using it as a value function approximation method.

...policy
In Hansen's writings, policy improvement includes the DP update as a substep. Here, the DP update is not considered part of policy improvement.
