There are three key behavioral distinctions between prefetches and loads: prefetches are (i) non-binding, (ii) non-blocking, and (iii) non-excepting. The non-binding property gives prefetches the flexibility to be issued far in advance of the actual references, without worrying about the impact on correctness. The non-blocking property allows prefetches to be overlapped with other references and with computation. The non-excepting property allows speculative prefetching of addresses that may be invalid. In this subsection, we discuss the importance of each of these properties in more detail.
The non-binding aspect of prefetching is implemented by fetching data
into the cache rather than a register. As we discussed earlier,
non-binding prefetches are
essential in multiprocessors since they allow the compiler to prefetch a
location without worrying about whether the value may have been modified by
another processor in the meantime. Even in a uniprocessor, the non-binding
property is important since it avoids the correctness problems that can
arise when using registers for temporary storage given imperfect
memory disambiguation. For example, if prefetches fetched data into
registers, it would be illegal to move a prefetch ahead of a store
unless it was certain that the store was to a different location (otherwise
the prefetched value would be stale). Proving that two addresses never
coincide is extremely difficult in the presence of pointers and aliasing.
Therefore, the non-binding property frees the compiler from
correctness problems that can occur both across threads and within a single thread.
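The store-versus-prefetch hazard described above can be illustrated with a minimal sketch. It assumes a GCC- or Clang-compatible compiler, whose `__builtin_prefetch` intrinsic stands in for a non-binding prefetch instruction; the function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: increments *dst, then returns *src.
 * dst and src may alias, and the compiler generally cannot prove otherwise. */
int prefetch_hoisted_above_store(int *dst, const int *src)
{
    assert(dst != NULL && src != NULL);

    /* Non-binding: only moves the line holding *src toward the cache.
     * No value is captured, so issuing it before the store is always legal. */
    __builtin_prefetch(src, 0 /* read */, 3 /* keep in cache */);

    *dst += 1;      /* may modify the very location src points to */

    return *src;    /* the real load still observes the store */
}

/* What a register-binding "prefetch" would amount to: the value itself
 * is read early and held in a register across the store. */
int load_hoisted_above_store(int *dst, const int *src)
{
    assert(dst != NULL && src != NULL);

    int early = *src;   /* binds the value now */
    *dst += 1;
    return early;       /* stale whenever dst == src */
}
```

When both arguments point to the same location, the first function returns the updated value while the second returns the stale one, which is why a binding scheme would need a proof that the two addresses differ.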
An additional advantage of prefetching into the cache rather than the register file is that the limited size of the register file would otherwise be a significant constraint on how far ahead one can prefetch. This is crucial since extending register lifetimes to hundreds of cycles (in order to hide large latencies) is almost guaranteed to cause significant register spilling, which can hurt performance considerably. The register lifetime problem is most severe in scientific code, where common techniques such as loop unrolling, software pipelining, and register blocking result in very high register pressure even without prefetching. The cache, on the other hand, is substantially larger than the register file, and is therefore not expected to constrain how far ahead one would reasonably want to prefetch.
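As a rough sketch of why the prefetch distance is decoupled from register pressure, consider the loop below. It again assumes the GCC/Clang `__builtin_prefetch` intrinsic; the constant `PREFETCH_DISTANCE` and the function name are hypothetical, and the value 64 is a placeholder rather than a tuned setting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical distance, in elements; a real value depends on the
 * memory latency and the work per iteration of the target machine. */
#define PREFETCH_DISTANCE 64

double sum_with_prefetch(const double *a, size_t n)
{
    assert(a != NULL || n == 0);

    double sum = 0.0;   /* the only value kept live across iterations */
    for (size_t i = 0; i < n; i++) {
        /* The data for iteration i + PREFETCH_DISTANCE waits in the
         * cache, not in a register, until the loop reaches it. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        sum += a[i];
    }
    return sum;
}
```

Increasing `PREFETCH_DISTANCE` changes how many cache lines are in flight, but the number of live registers in the loop body stays the same. Holding 64 elements in registers instead would not be feasible on typical register files.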
The non-blocking aspect of prefetching is essential since the very essence of this latency-hiding mechanism is overlapping memory accesses with computation. Normal loads could in principle also be non-blocking, but this would require a mechanism for interlocking and forwarding the data whenever the load result is used before the access completes. (Because of this hardware complexity, few commercial microprocessors have implemented non-blocking loads.) In contrast, it is easy to make prefetches non-blocking since they produce no result value, and therefore no instructions can depend upon their completion.
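The following sketch illustrates the point that a prefetch has no destination operand. It assumes the GCC/Clang `__builtin_prefetch` intrinsic and a hypothetical block size of eight doubles (one 64-byte cache line); whether the requests actually overlap with the arithmetic depends on the hardware.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 8  /* hypothetical: doubles per 64-byte cache line */

double dot_overlapped(const double *x, const double *y, size_t n)
{
    assert((x != NULL && y != NULL) || n == 0);

    double acc = 0.0;
    for (size_t i = 0; i < n; i += BLOCK) {
        size_t next = i + BLOCK;
        if (next < n) {
            /* Two independent requests issued back to back. Neither
             * produces a value, so no later instruction depends on them. */
            __builtin_prefetch(&x[next], 0, 3);
            __builtin_prefetch(&y[next], 0, 3);
        }
        /* Computation on the current block can proceed while the
         * next block is being fetched. */
        size_t end = next < n ? next : n;
        for (size_t j = i; j < end; j++)
            acc += x[j] * y[j];
    }
    return acc;
}
```

Because `__builtin_prefetch` returns nothing, there is no register to interlock on. Only the ordinary loads of `x[j]` and `y[j]` can stall, and by then the data is ideally already in the cache.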
Finally, the non-excepting aspect of prefetching (i.e., prefetches do
not take memory exceptions on invalid addresses) is important since it
allows data-dependent addresses (e.g., pointers) to be prefetched without
being absolutely certain that the address is valid. We have already
discussed how this is important
when prefetching indirect references, such as in sparse-matrix code. Even
in dense-matrix code, this property is useful because it makes it safe to prefetch
off the end of an array whenever generating a proper epilog would be too
expensive (i.e., when it would result in a code size explosion). Therefore,
the non-excepting property offers considerable flexibility to the compiler
since it is much easier to generate valid prefetch addresses most of
the time rather than all of the time.
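Both uses of the non-excepting property can be sketched as follows. The example assumes the GCC/Clang `__builtin_prefetch` intrinsic, which GCC documents as not faulting on an invalid address as long as the address expression itself can be evaluated. The type `struct node`, the function names, and `PREFETCH_DISTANCE` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_DISTANCE 16  /* hypothetical distance, in elements */

struct node {
    int value;
    struct node *next;
};

/* Data-dependent address: p->next may be NULL on the last node, yet it
 * is prefetched without a validity check. Reading p->next is an ordinary
 * load, so p itself must be valid; only the prefetch target may not be. */
long sum_list(const struct node *p)
{
    long sum = 0;
    while (p != NULL) {
        __builtin_prefetch(p->next, 0, 3);
        sum += p->value;
        p = p->next;
    }
    return sum;
}

/* Dense case without an epilog: the last PREFETCH_DISTANCE iterations
 * prefetch past the end of the array instead of being peeled off. */
long sum_array_no_epilog(const int *a, size_t n)
{
    assert(a != NULL || n == 0);

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* The address is formed with integer arithmetic so that no
         * out-of-bounds pointer arithmetic is performed in C. */
        uintptr_t ahead = (uintptr_t)a + (i + PREFETCH_DISTANCE) * sizeof *a;
        __builtin_prefetch((const void *)ahead, 0, 3);
        sum += a[i];
    }
    return sum;
}
```

In both functions the prefetch addresses are valid most of the time but not all of the time. The trade-off is a few wasted prefetches at the end of the traversal in exchange for a simpler loop with no boundary checks or peeled iterations.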