Chapter describes our core prefetching algorithm,
which handles affine array references and is therefore suited to dense-matrix
codes. A key feature of this algorithm is that it minimizes prefetching
overhead by prefetching only those references predicted to suffer cache
misses. This core algorithm is the basis for all of our experiments and is
extended in later chapters.
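The idea of prefetching only the references predicted to miss can be illustrated with a short hand-written C sketch. It is not the output of the dissertation's compiler. It assumes a hypothetical cache line holding `LINE_ELEMS = 4` doubles, an array `a` aligned to a line boundary, and a hypothetical prefetch distance `PF_DIST` of 16 elements. It uses the GCC/Clang builtin `__builtin_prefetch`. Because only the first element of each line misses, the loop issues one prefetch per line rather than one per element. The work is split into a prologue that warms up the first lines, a steady state, and an epilogue that issues no prefetches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameters, chosen for illustration only. */
#define LINE_ELEMS 4  /* doubles per cache line (assumes 32-byte lines) */
#define PF_DIST    16 /* prefetch distance, in elements */

/* Sums a[0..n) while prefetching each cache line PF_DIST elements ahead.
   Assumes a is aligned to a cache-line boundary. */
double sum_prefetched(const double *a, size_t n)
{
    assert(n == 0 || a != NULL);
    double sum = 0.0;
    size_t i = 0;

    /* Prologue: prefetch the lines the first iterations will touch. */
    for (size_t p = 0; p < PF_DIST && p < n; p += LINE_ELEMS)
        __builtin_prefetch(&a[p], 0, 3); /* read access, keep in cache */

    /* Steady state: one prefetch per line, issued only while the
       prefetched line lies entirely inside the array. */
    for (; i + PF_DIST + LINE_ELEMS <= n; i += LINE_ELEMS) {
        __builtin_prefetch(&a[i + PF_DIST], 0, 3);
        for (size_t j = 0; j < LINE_ELEMS; j++)
            sum += a[i + j];
    }

    /* Epilogue: these lines were already prefetched above. */
    for (; i < n; i++)
        sum += a[i];
    return sum;
}
```

Unrolling by the line size, rather than guarding each prefetch with a test such as `i % LINE_ELEMS == 0`, keeps the per-iteration overhead down. Avoiding that overhead is the motivation for prefetching selectively in the first place.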
Chapter studies the performance benefits of
prefetching for uniprocessor applications, beginning with a detailed
evaluation of the core algorithm described above.
Next, we evaluate the interaction between prefetching and locality
optimizations, another important latency-hiding technique for
dense-matrix codes. Finally, we extend our core compiler algorithm to
handle indirect references (and hence sparse-matrix codes) and measure the
resulting performance improvement on applications that contain such references.
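An indirect reference such as `x[idx[i]]` is harder to prefetch because its address is unknown until the index itself has been loaded. The following hand-written C sketch shows the general shape of such a prefetch. It does not reproduce the dissertation's compiler output. It assumes a hypothetical prefetch distance `IND_PF_DIST` of 8 iterations and again uses the GCC/Clang builtin `__builtin_prefetch`. The index array is prefetched twice as far ahead, so that `idx[i + IND_PF_DIST]` is likely to be cached by the time it is used to form the indirect prefetch address.

```c
#include <assert.h>
#include <stddef.h>

#define IND_PF_DIST 8 /* hypothetical prefetch distance, in iterations */

/* Computes the sum of x[idx[i]] for i in [0, n), as in the inner loop
   of a sparse matrix-vector product. */
double gather_sum_prefetched(const double *x, const int *idx, size_t n)
{
    assert(n == 0 || (x != NULL && idx != NULL));
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Prefetch the index array far ahead. For simplicity this issues
           one prefetch per element rather than one per cache line. */
        if (i + 2 * IND_PF_DIST < n)
            __builtin_prefetch(&idx[i + 2 * IND_PF_DIST], 0, 3);
        /* Prefetch the indirectly referenced element nearer ahead. */
        if (i + IND_PF_DIST < n)
            __builtin_prefetch(&x[idx[i + IND_PF_DIST]], 0, 3);
        sum += x[idx[i]];
    }
    return sum;
}
```

Unlike the affine case, the target lines of `x` cannot be predicted statically. Each indirect element is therefore treated as a potential miss and prefetched individually.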
Chapter focuses on prefetching for large-scale
shared-memory multiprocessors. These machines are of interest both for their
large performance potential and because they are particularly susceptible to
memory latency. We begin by discussing how our compiler algorithm, including
its extension to indirect references, is modified to address the issues unique
to multiprocessing. We then evaluate its effect on the performance of the
entire SPLASH [72] application suite. We also compare compiler-inserted
prefetching with hand-inserted prefetching to determine whether the compiler
is realizing its potential and to identify opportunities for further
improvement.
Chapter explores the architectural issues
associated with prefetching and is divided into three sections.
The first section examines the architectural support required by the
basic prefetching model assumed in the earlier chapters. The second section
considers architectural enhancements that further improve prefetching. The
third section compares prefetching with other latency-hiding techniques that
require architectural support, namely hardware-controlled prefetching,
relaxed consistency models, and multithreading.
Finally, Chapter summarizes the important results of this
dissertation, discusses their implications, and outlines directions for future
work in this area.