Finally, for our experiments in Section , we set
the prefetch latency to 300 cycles. We chose a value greater than 75 cycles
to account for bandwidth-related delays. To evaluate whether this value was
a good choice, we compiled each benchmark again using prefetch latencies of
100 and 1000 cycles. In nearly all cases, the impact on performance is
small. In many of them, the 100-cycle latency performs slightly worse than
the 300-cycle latency due to bandwidth-related delays. The most interesting
case is CHOLSKY, as shown in Figure (c). In this case,
prefetched data tends to be replaced from the cache shortly after it
arrives, so ideally it should arrive ``just in time''. Therefore, the
lowest prefetch latency (100 cycles) offers the best performance, as we
see in Figure (c). However, in such cases the best
approach may be to eliminate the cache conflicts that cause this
behavior [49].
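To make the role of this parameter more concrete, the following sketch shows
how a prefetch latency value might translate into a software prefetch
distance for a simple loop. This is not the compiler algorithm described in
this paper, merely an illustrative C fragment: the 30-cycle loop body cost is
an assumed value, and \texttt{\_\_builtin\_prefetch} is simply one convenient
way to express a non-binding prefetch.

\begin{verbatim}
#include <stddef.h>

/* Illustrative only: assume a prefetch latency of 300 cycles and a loop
 * body that takes roughly 30 cycles, so prefetches are issued about
 * 300/30 = 10 iterations ahead of their use. */
#define PREFETCH_LATENCY_CYCLES 300
#define CYCLES_PER_ITERATION     30   /* assumed loop body cost */
#define PREFETCH_DISTANCE (PREFETCH_LATENCY_CYCLES / CYCLES_PER_ITERATION)

double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Fetch the element needed PREFETCH_DISTANCE iterations from now
         * (read access, modest temporal locality). */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);
        sum += a[i];
    }
    return sum;
}
\end{verbatim}

If the latency parameter is overestimated, the prefetch distance grows and
each line sits in the cache longer before it is used, increasing its exposure
to the kind of displacement seen in CHOLSKY.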
In general, we observe that it is better to be conservative with the
prefetch latency parameter. Clearly, if the value is not large enough to
hide the latency, it will always hurt performance. If we specify more
latency than is actually experienced, it hurts performance only if
prefetched data is displaced from the cache before it is used. As caches
become larger, this should become less and less of a problem. Moreover,
only a relatively small number of new lines can be fetched into the cache
within 300-500 cycles, as the rough estimate below illustrates. If cache
conflicts are a problem within this relatively small window of time,
chances are that the conflicts will occur even if the prefetch latency is
set to the smallest value that can hide the latency. These chronic cache
conflicts must be dealt with in another way, as we will discuss later in
Section .
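To give a rough sense of how few lines that window admits, suppose (purely
for the sake of illustration) that the memory system delivers at most one new
cache line every 75 cycles, the baseline latency mentioned above. Then a
300-500 cycle window admits only
\[
\frac{300~\mbox{cycles}}{75~\mbox{cycles/line}} = 4
\qquad \mbox{to} \qquad
\frac{500~\mbox{cycles}}{75~\mbox{cycles/line}} \approx 7~\mbox{lines}.
\]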