Software-controlled prefetching is only one of several different approaches
for coping with latency. We discussed a number of these techniques
in earlier sections, where we also
demonstrated that locality optimizations and prefetching are
complementary. While locality optimizations are strictly a compiler-based
technique, we will now consider several other techniques that require
architectural support to see how they compare and interact with
software-controlled prefetching. We begin
by comparing hardware-controlled prefetching
with software-controlled prefetching. Next, we discuss relaxed memory
consistency models, which
are primarily useful for hiding write latency in multiprocessors. Finally, we
evaluate multithreading,
which is a technique for hiding latency by exploiting parallelism across
multiple threads of execution.