Suppose V and U are two value functions. We say that U dominates V and write if for every belief state b. A value function V is said to be uniformly improvable if . A set of vectors dominates another set of vectors if the value function induced by dominates that induced by . A set of vectors is unformly improvable if the value function it induces is.
This lemma is obvious and is well known in the MDP community (Puterman 1990). Nonetheless, it enables us to explain the intuition behind the term ``uniformly improvable". Suppose V is a uniformly improvable value function and suppose value iteration starts with V. Then the sequence of value functions generated is monotonically increasing and converges to the optimal value function . This implies . That is, TV(b) is closer to than V(b) for all belief states b.
The following lemma will be used later to address the issues listed at the end of the previous section.
Proof: Since , we have by Lemma 1. We also have the condition . Consequently, . That is, U is uniformly improvable.