Earlier in Section , we discussed the possibility of placing prefetched data in a separate target buffer rather than in the normal cache hierarchy, and suggested that as a general policy this was not a good idea. However, there is one case where it may make sense: when prefetched data is used only once (i.e., it has no locality). Data with no locality will not be reused after it is brought into the cache, so placing a large amount of it in the cache may displace other data that would have benefited from locality. One possibility, therefore, is to issue uncached prefetches whenever the compiler determines that a reference has no locality.
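As a rough source-level illustration of a prefetch that is marked as having no locality, the sketch below uses the `__builtin_prefetch` intrinsic provided by GCC and Clang, whose third argument is a temporal-locality hint (0 meaning none). This is an assumption made purely for illustration: the hint is only advice that the hardware may ignore, it is not the separate target buffer discussed here, and the function name `sum_streaming` and the prefetch distance `PF_DIST` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; a real value would be
 * derived from the memory latency and the work per iteration. */
#define PF_DIST 16

/* Sum a[0..n-1]. Each element is read exactly once, so the reference
 * a[i] has no locality. */
double sum_streaming(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PF_DIST < n) {
            /* rw = 0: prefetch for a read; locality = 0: hint that the
             * data need not be kept in the cache after it is used. */
            __builtin_prefetch(&a[i + PF_DIST], 0, 0);
        }
        sum += a[i];
    }
    return sum;
}
```

The bounds check keeps the prefetch address inside the array; it costs a branch per iteration, which a compiler would normally avoid by splitting the loop.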
Using uncached prefetches has several complications. First, if the compiler predicts that a reference has no locality when in fact it does, performance will suffer, because data that bypasses the cache is not there to be reused. For example, the A[j][0] reference in Figure may or may not have temporal locality, depending on the size of n. If the compiler assumes that n is large, and therefore that A[j][0] has no locality, it must recognize that using uncached prefetches for A[j][0] will hurt performance if n turns out to be small. Therefore, uncached prefetches should be used only when the compiler is certain that a reference has no locality.
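To make the dependence on n concrete, the following sketch counts the cache hits seen by an A[j][0] reference under a deliberately simple model. Everything in it is assumed for illustration and is not taken from the figure: the loop nest (an outer loop of m iterations around an inner loop over j from 0 to n-1), a fully associative LRU cache of `CACHE_LINES` lines, and each row A[j] beginning on its own cache line.

```c
#include <assert.h>

enum { CACHE_LINES = 64 };  /* hypothetical cache capacity, in lines */

/* Count hits for the reference A[j][0] in the assumed loop nest
 *     for (i = 0; i < m; i++)
 *         for (j = 0; j < n; j++)
 *             ... A[j][0] ...
 * Row j is identified with cache line j. */
long count_hits(int m, int n)
{
    assert(m >= 0 && n >= 0);

    int  tag[CACHE_LINES];       /* line held in each slot, -1 if empty */
    long last_use[CACHE_LINES];  /* time of most recent access */
    long now = 0, hits = 0;

    for (int k = 0; k < CACHE_LINES; k++) {
        tag[k] = -1;
        last_use[k] = -1;
    }

    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            int victim = 0, found = -1;
            for (int k = 0; k < CACHE_LINES; k++) {
                if (tag[k] == j) { found = k; break; }
                /* Track the least recently used (or empty) slot. */
                if (last_use[k] < last_use[victim]) victim = k;
            }
            if (found >= 0) {    /* reuse captured by the cache */
                hits++;
                victim = found;
            }
            tag[victim] = j;
            last_use[victim] = now++;
        }
    }
    return hits;
}
```

Under these assumptions, `count_hits(4, 32)` returns 96, since every access after the first pass over j hits, whereas `count_hits(4, 65)` returns 0, since each line is evicted just before it is needed again. An uncached prefetch forfeits nothing in the second case but gives up all of the hits in the first.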
The second complication, as we mentioned earlier in Section , is the hardware complexity of building the separate target buffer. Given this complexity, it is unclear whether it would be worth building such a structure even to support uncached prefetches.