As we mentioned at the outset of Section , reuse only translates to locality if the subsequent use of data occurs before the data are displaced from the cache. The likelihood of displacement depends on (i) the amount of data brought into the cache between reuses, which is influenced by factors such as loop iteration counts, and (ii) the ability of the cache to retain data, which depends on factors such as the cache size. In this subsection, we describe how our algorithm takes such factors into account to compute locality through the notion of a localized iteration space.
We begin with the example in Figure which illustrates how loop iteration counts and cache size affect locality. The code in Figures (a) and (b) is similar to our original example in Figure (a), except that the upper bound of the j loop is small in Figure (a) (8 iterations) and large in Figure (b) (10,000 iterations). The B[j+1][0] reference in both cases has temporal reuse along the outer loop (as discussed earlier in Section ). In Figure (a), this reuse is likely to result in temporal locality since the eight j iterations bring only 192 bytes into the 8 Kbyte cache between reuses of the same B[j+1][0] locations. In Figure (b), however, the temporal reuse does not result in locality since the 10,000 j iterations will sweep 240 Kbytes of data through the 8 Kbyte cache, flushing a given B[j+1][0] location before it can be reused.
Predicting accurately whether data will remain in the cache is infeasible for the compiler, due to factors such as symbolic loop iteration counts, cache mapping conflicts, etc. Rather than trying to represent exactly which reuses would result in cache hits, we adopt Wolf and Lam's approach of capturing only the dimensionality of the iteration space that has data locality [87]. We define the localized iteration space to be the set of loops that can exploit reuse. For example, the localized iteration space includes both loops in Figure (a), and only the innermost loop in Figure (b). In the latter case, data fetched in the loop body will be available to subsequent iterations within the same innermost loop, but not to subsequent iterations of the outer loop.