A value function V is represented by a set of vectors if it equals the value function induced by the set. When a value function is representable by a finite set of vectors, there is a unique parsimonious set of vectors that represents the function (Littman et al. 1995a).
Sondik (1971) has shown that if a value function V is representable by a finite set of vectors, then so is the value function TV. The process of obtaining the parsimonious representation for TV from the parsimonious representation of V is usually referred to as dynamic programming (DP) update. Let be the parsimonious set of vectors that represents V. For convenience, we use to denote the parsimonious set of vectors that represents TV.
In practice, value iteration for POMDPs is not carried out directly in terms of value functions themselves. Rather, it is carried out in terms of sets of vectors that represent the value functions (Figure 2). One begins with an initial set of vectors . At each iteration, one performs a DP update on the previous parsimonious set of vectors and obtains a new parsimonious set of vectors . One continues until the Bellman residual , which is determined by solving a sequence of linear programs, falls below a threshold.
Figure 2: Value Iteration for POMDPs.