In this subsection, we discuss an issue that arises under invalidation-based coherence schemes (the model we assume for the remainder of this section): the use of exclusive-mode prefetching. Under the invalidation-based coherence model, a processor wishing to read a location receives a sharable copy of the line, which allows the line to be replicated in other caches as long as each processor only reads it. To write to a line, however, a processor must first acquire an exclusive copy by invalidating the line in all other processors' caches. This prevents the replicated copies from becoming stale, thus preserving coherence.
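The invalidation behavior described above can be sketched with a toy model of a single memory line. This is only an illustration under simplifying assumptions: the three states, the fixed processor count `NPROC`, and the function names are hypothetical and are not taken from any particular protocol or machine.

```c
#include <assert.h>

#define NPROC 4  /* hypothetical number of processors */

/* Simplified per-cache states for one line (an assumption for illustration). */
typedef enum { INVALID, SHARED, EXCLUSIVE } LineState;

/* The state of one memory line in each processor's cache. */
typedef struct {
    LineState state[NPROC];
} Line;

/* Read by processor p: obtain a sharable copy of the line.
 * If another cache holds the line exclusively, it is downgraded to shared. */
void line_read(Line *l, int p) {
    assert(p >= 0 && p < NPROC);
    if (l->state[p] != INVALID) {
        return;  /* already holds a readable copy */
    }
    for (int q = 0; q < NPROC; q++) {
        if (l->state[q] == EXCLUSIVE) {
            l->state[q] = SHARED;
        }
    }
    l->state[p] = SHARED;
}

/* Write by processor p: acquire an exclusive copy by invalidating
 * the line in every other cache. */
void line_write(Line *l, int p) {
    assert(p >= 0 && p < NPROC);
    for (int q = 0; q < NPROC; q++) {
        if (q != p) {
            l->state[q] = INVALID;
        }
    }
    l->state[p] = EXCLUSIVE;
}

/* Count how many caches hold the line in state s. */
int count_in_state(const Line *l, LineState s) {
    int n = 0;
    for (int q = 0; q < NPROC; q++) {
        if (l->state[q] == s) {
            n++;
        }
    }
    return n;
}
```

In this model, several processors can call `line_read` and all end up holding the line as `SHARED`, whereas a single `line_write` leaves exactly one `EXCLUSIVE` copy and invalidates the rest.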
Just as normal memory accesses have two variations (shared accesses for reads, and exclusive accesses for writes), it also makes sense to have two types of prefetches: one that fetches a shared copy of a line, and one that fetches an exclusive copy. If a processor only intends to read a line, it will use the shared-mode prefetch. However, if the processor intends to modify the line (even if the line will be read first and modified shortly thereafter), an exclusive-mode prefetch should be issued, which not only fetches a copy of the line but also gains ownership of it.
Proper use of exclusive-mode prefetching can provide two performance benefits. First, it can reduce the latency of the subsequent write since exclusive ownership of the line has already been obtained. This may or may not have a direct impact on execution time, depending on whether writes can be buffered.
The second benefit occurs in the common case where a value is read before
it is written. Intuitively, these cases occur frequently because it is more
common to update a shared variable (e.g., incrementing a shared
counter or updating the position of a particle in a wind tunnel) than to
simply overwrite it without reading it first. In such
"read-modify-write" cases, what normally occurs is that the processor
first requests a sharable copy of the line, and then immediately
afterward requests an exclusive copy of the same line to perform the
write. Rather than issuing two separate requests, a better approach is
to issue a single exclusive-mode prefetch, as illustrated in
Figure . Therefore, exclusive-mode prefetches
can potentially eliminate up to half of the total memory traffic, which
can improve the performance of all references (both reads and
writes) by reducing the amount of contention in the memory subsystem.
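As a minimal sketch of the read-modify-write case, the loop below increments each array element and issues a write-intent prefetch ahead of the access. It assumes a GCC- or Clang-compatible compiler, whose `__builtin_prefetch` takes a second argument of 1 to indicate that the data will be written; whether this maps to a true exclusive-mode prefetch depends on the target architecture. The function name `increment_all` and the value of `PREFETCH_DISTANCE` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; a real value would be tuned
 * to the memory latency and the amount of work per iteration. */
#define PREFETCH_DISTANCE 16

/* Increment every element of a[0..n-1]. Each element is read and then
 * written, so the prefetch requests the data with write intent. */
void increment_all(int *a, size_t n) {
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 1: prefetch in anticipation of a write. */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 1);
        }
        a[i] = a[i] + 1;  /* read-modify-write of the same location */
    }
}
```

For simplicity, this sketch issues one prefetch per element; a compiler would normally unroll the loop so that only one prefetch is issued per cache line.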
We modify our compiler algorithm to exploit exclusive-mode prefetching as
follows. After performing locality analysis, the references have been
partitioned into equivalence classes (see Section ), which are sets of
references that can be
treated as a single reference. An equivalence class may contain multiple
references if they share group locality. We insert an exclusive-mode
prefetch rather than a shared-mode prefetch for a given equivalence class
if at least one member of the equivalence class is a write. For example,
for the code in Figure (a), locality analysis
would determine that both the read and write of A[i] are in the same
equivalence class. Therefore, despite the fact that the leading
reference to A[i] (i.e., the reference that first accesses the data)
is a read, our algorithm would schedule a single exclusive-mode
prefetch of A[i], thus achieving the desired effect illustrated in
Figure (b).
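The selection rule above can be sketched as follows. This is an illustrative fragment rather than the compiler's actual implementation: the types `Reference` and `EquivClass` and the function `choose_prefetch_mode` are hypothetical names, and the equivalence classes are assumed to have already been produced by locality analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { PREFETCH_SHARED, PREFETCH_EXCLUSIVE } PrefetchMode;

/* One array reference in the loop body (hypothetical representation). */
typedef struct {
    const char *expr;  /* source text of the reference, e.g. "A[i]" */
    bool is_write;     /* true if the reference stores to the location */
} Reference;

/* A set of references that can be treated as a single reference. */
typedef struct {
    const Reference *refs;
    size_t count;
} EquivClass;

/* Use an exclusive-mode prefetch if at least one member of the
 * equivalence class is a write; otherwise use a shared-mode prefetch. */
PrefetchMode choose_prefetch_mode(const EquivClass *ec) {
    assert(ec != NULL);
    for (size_t k = 0; k < ec->count; k++) {
        if (ec->refs[k].is_write) {
            return PREFETCH_EXCLUSIVE;
        }
    }
    return PREFETCH_SHARED;
}
```

For a class that contains both the read and the write of A[i], the function selects the exclusive mode even though the leading reference is a read; a class that contains only reads gets a shared-mode prefetch.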