As we mentioned at the outset of Section ,
reuse only translates to locality if the subsequent use of data occurs
before the data are displaced from the cache. The likelihood of
displacement depends on (i) the amount of data brought into the cache
between reuses, which is influenced by factors such as loop iteration
counts, and (ii) the ability of the cache to retain data, which depends on
factors such as the cache size. In this subsection, we describe how our
algorithm takes such factors into account to compute locality through
the notion of a localized iteration space.
We begin with the example in Figure which illustrates
how loop iteration counts and cache size affect locality. The code in
Figures
(a) and
(b) is similar to
our original example in Figure
(a), except that
the upper bound of the j loop is small in
Figure
(a) (8 iterations) and large in
Figure
(b) (10,000 iterations). The B[j+1][0]
reference in both cases has temporal reuse along the outer loop (as
discussed earlier in Section
). In
Figure
(a), this reuse is likely to result in temporal
locality since the eight j iterations bring only 192 bytes into the 8
Kbyte cache between reuses of the same B[j+1][0] locations. In
Figure
(b), however, the temporal reuse does not
result in locality since the 10,000 j iterations will sweep 240
Kbytes of data through the 8 Kbyte cache, flushing a given B[j+1][0] location before it can be reused.
Predicting accurately whether data will remain in the cache is infeasible
for the compiler, due to factors such as symbolic loop iteration counts,
cache mapping conflicts, etc. Rather than trying to represent exactly which
reuses would result in cache hits, we adopt Wolf and Lam's approach of
capturing only the dimensionality of the iteration space that has
data locality [87]. We define the localized iteration
space to be the set of loops that can exploit reuse. For example, the
localized iteration space includes both loops in
Figure (a), and only the innermost loop in
Figure
(b). In the latter case, data fetched in the
loop body will be available to subsequent iterations within the same
innermost loop, but not to subsequent iterations of the outer loop.