
Chapter Summary

Our study of compiler-based prefetching for array-based uniprocessor applications has produced the following results:

The selective prefetching algorithm presented in Chapter is successful at hiding memory latency while minimizing prefetching overhead, thus improving overall performance by as much as twofold.
Our prefetching algorithm is robust with respect to the compile-time parameters that describe the memory hierarchy.
Prefetching and locality optimizations are complementary and therefore should be combined. Locality optimizations reduce the number of accesses to main memory, and prefetching tolerates the latency of the remaining misses.
Through a minor extension of our software pipelining algorithm, our compiler can automatically prefetch indirect array references.
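As a rough illustration of the last result, the following hand-written C sketch shows what a software-pipelined prefetch of an indirect reference `a[index[i]]` might look like. It is not the compiler's actual output. The function name `sum_indirect`, the `PREFETCH` macro, and the prefetch distance `PF_DIST` are all assumptions made for this example, and `__builtin_prefetch` is the GCC/Clang builtin standing in for a machine prefetch instruction.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in loop iterations. A compiler would
   derive this from the memory latency and the cost of the loop body. */
#define PF_DIST 16

/* Assumption: use the GCC/Clang prefetch builtin when available,
   otherwise compile the prefetch away. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH(addr) ((void)(addr))
#endif

/* Sums a[index[i]] for i in [0, n), prefetching each indirect target
   PF_DIST iterations before it is used. */
double sum_indirect(const double *a, const int *index, size_t n)
{
    double sum = 0.0;
    size_t i;

    /* Prolog: issue prefetches for the first PF_DIST iterations. */
    for (i = 0; i < PF_DIST && i < n; i++)
        PREFETCH(&a[index[i]]);

    /* Steady state: prefetch PF_DIST iterations ahead while computing. */
    for (i = 0; i + PF_DIST < n; i++) {
        PREFETCH(&a[index[i + PF_DIST]]);
        sum += a[index[i]];
    }

    /* Epilog: the remaining iterations were already prefetched. */
    for (; i < n; i++)
        sum += a[index[i]];

    return sum;
}
```

The sketch prefetches only the indirect target. The `index` array itself is read sequentially, so it could be prefetched like any other dense array reference, further ahead than the data it points to.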

tcm@
Sat Jun 25 15:13:04 PDT 1994