- ...points.
- As will be seen later, point-based DP update
also backs up on other points.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...selected,
- We will show how in Section 5.5.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...31#31.
- Since b is a witness of 31#31
w.r.t 48#48, we have 90#90. Since 33#33 is uniformly improvable,
we also have 91#91.
Together with the obvious fact that 92#92 and
the condition 93#93, we have 94#94.
Consequently, there cannot be any vector in 34#34 that
equals 31#31.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...otherwise:
- In our
actual implementation, the solution point b is used for backup even when
the optimal value of the objective function is negative. In this case, duplication check
is needed.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...code.
- The implementation is available on request.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...4).
-
Note that times shown there do not include time for testing
the stopping condition.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...quality
- Quality of
a policy is estimated using the Bellman residual.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...previously.
-
Hauskrecht (2000) has conducted an extensive survey on previous value
function approximation methods and has empirically compared them
in terms of, among other criteria, complexity and quality.
It would be interesting to also
include point-based value iteration in the empirical comparison.
This is not done in the present paper because our focus is on
using point-based value iteration to speed value iteration,
rather than using as a value function approximation method.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...policy
- In Hansen's writings,
policy improvement includes DP update as a substep.
Here DP update is not considered part of
policy improvement.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.