The final hardware-related issue we will discuss is whether it is useful to have a separate prefetch issue buffer in an architecture that already contains a write buffer, or whether both writes and prefetches should be placed in the same buffer. One possible performance disadvantage of using a combined buffer is that prefetches may be delayed behind writes. From an implementation perspective, a buffer that handles only prefetch requests would be smaller, since it does not need to hold the data being written. However, it may be simpler to build just a single buffer.
The uniprocessor architecture we have been using does not contain a write
buffer, but the multiprocessor architecture does, since it has a
write-through primary data cache (versus the copy-back cache of the
uniprocessor architecture). In our experiments so far, the multiprocessor
architecture has included both a sixteen-entry write buffer and a
sixteen-entry prefetch issue buffer. To evaluate the performance impact of
having a common buffer, we ran an experiment where both writes and
prefetches were placed in a combined sixteen-entry buffer. Our results
showed absolutely no difference in performance. This is partly because the
lockup-free cache (which allows up to eight outstanding misses for the
multiprocessor architecture) handles requests quickly enough that
prefetches are rarely delayed behind writes. In an earlier study where we
did not use a lockup-free cache [61], the performance
advantage of having a separate prefetch issue buffer was also rather small.
Therefore, the choice between separate and combined buffers should be
dictated by whichever is easier to implement, since both schemes offer
similar performance.
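
To illustrate why a fast memory system narrows the gap between the two
organizations, the following is a minimal sketch of a toy queueing model. It
is not the simulator used in our experiments. The function name
`mean_prefetch_delay`, the `service_cycles` parameter, and the policy of
issuing a waiting prefetch ahead of waiting writes in the separate-buffer case
are all assumptions made for illustration. The model also ignores buffer
capacity and processor stalls.

```python
from collections import deque


def mean_prefetch_delay(requests, service_cycles, combined):
    """Average number of cycles a prefetch waits before it is issued.

    requests:       list of (arrival_cycle, kind) sorted by arrival_cycle,
                    where kind is "write" or "prefetch".
    service_cycles: cycles the memory port stays busy per issued request
                    (hypothetical parameter, not taken from the text).
    combined:       True  -> one shared FIFO for writes and prefetches.
                    False -> separate FIFOs; a waiting prefetch is issued
                             ahead of waiting writes (assumed policy).
    """
    pending = deque(requests)
    shared, writes, prefetches = deque(), deque(), deque()
    cycle = 0
    delays = []

    while pending or shared or writes or prefetches:
        # Move every request that has arrived by now into its buffer.
        while pending and pending[0][0] <= cycle:
            arrival, kind = pending.popleft()
            if combined:
                shared.append((arrival, kind))
            elif kind == "prefetch":
                prefetches.append((arrival, kind))
            else:
                writes.append((arrival, kind))

        # Select the buffer to drain from on this step.
        if combined:
            queue = shared
        else:
            queue = prefetches if prefetches else writes

        if queue:
            arrival, kind = queue.popleft()
            if kind == "prefetch":
                delays.append(cycle - arrival)
            cycle += service_cycles  # port is busy while this request issues
        else:
            cycle = pending[0][0]    # idle until the next request arrives

    return sum(delays) / len(delays) if delays else 0.0


# A burst of four writes followed one cycle later by a single prefetch.
burst = [(0, "write")] * 4 + [(1, "prefetch")]
for cycles in (10, 1):
    print(cycles,
          mean_prefetch_delay(burst, cycles, combined=True),
          mean_prefetch_delay(burst, cycles, combined=False))
```

In this toy model the prefetch waits noticeably longer in the combined buffer
when each request occupies the port for many cycles, and the difference
shrinks to a few cycles when requests are handled quickly, which is the
qualitative effect described above.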