During a normal cache miss, the levels of the memory hierarchy closer to
the processor are always checked before proceeding to subsequent levels.
For example, the secondary cache is only checked if the data is not found
in the primary cache. With prefetching, however, one might argue that since
the prefetches are scheduled early enough to hide the worst-case miss
latency, it is no longer necessary to check each level of the cache while
searching for the data. To evaluate this, we modified the uniprocessor
architecture such that prefetches proceed directly to memory without
checking either level of the cache. The results of this experiment are
shown in Figure . As the figure shows, it is still important to check the
levels of the cache closest to the processor for the prefetched data.
The primary reason for this is to minimize bandwidth
consumption, not latency. The deeper levels of the memory hierarchy are
slower and offer less bandwidth. Prefetches sent directly to memory
therefore congest the memory system, delaying both the issue of subsequent
prefetches and the servicing of normal cache misses. Checking the cache
thus alleviates bandwidth-related delays by filtering out prefetches that
can be serviced close to the processor (including unnecessary prefetches).
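This filtering effect can be illustrated with a toy model (a sketch for intuition only; the structure and names are illustrative assumptions, not taken from the evaluated architecture). Each cache level is modeled as a set of block addresses, and we count how many requests reach each level of the hierarchy when prefetches do check the caches first:

```python
def service_prefetch(addr, l1, l2, counts):
    """Service one prefetch, checking caches closest to the processor first.

    counts tracks how many requests reach each level of the hierarchy,
    i.e. the bandwidth consumed at that level.
    """
    counts["l1"] += 1
    if addr in l1:
        return                # already in the primary cache: no traffic below L1
    counts["l2"] += 1
    if addr in l2:
        l1.add(addr)          # promote into the primary cache
        return
    counts["memory"] += 1     # only now is memory bandwidth consumed
    l2.add(addr)
    l1.add(addr)

# Hypothetical example: half the prefetched blocks already reside in L1.
l1, l2 = {0, 2, 4, 6}, set()
counts = {"l1": 0, "l2": 0, "memory": 0}
for addr in range(8):
    service_prefetch(addr, l1, l2, counts)
print(counts)   # -> {'l1': 8, 'l2': 4, 'memory': 4}
```

A direct-to-memory policy would instead send all 8 prefetches to memory; checking the caches filters out the 4 that can be serviced at L1, halving the demand on the scarcest bandwidth in this example.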
Once the prefetched data has been found, the next step is moving it close to the processor. Just how close is the next question we address.