Prefetching data the proper amount of time in advance of when it is used is
tricky for the hardware, since it has difficulty knowing where the
instruction stream will be a hundred or so cycles in the future. Whereas
the compiler uses software pipelining, the hardware must rely on branch
prediction and possibly instruction lookahead buffering in order to issue
the prefetches at the right time. This can be quite difficult when, for
instance, a loop contains a conditional statement whose outcome varies
erratically. If the outcome of function foo() in
Figure
is unpredictable, the lookahead
mechanism will be ineffective since it must back up and start over each
time the branch is mispredicted. With software pipelining, however, the
data would still be prefetched properly, since the compiler realizes that
subsequent loop iterations are executed regardless of the outcome of the
conditional statement.
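The point about software pipelining can be illustrated with a minimal sketch in C. It is not the figure's code: the function name signed_sum, the body of foo(), and the constant PREFETCH_DIST are all hypothetical, and __builtin_prefetch is a GCC/Clang builtin used here as a stand-in for whatever prefetch instruction the target provides. The prefetch for iteration i + PREFETCH_DIST is issued unconditionally, so an unpredictable foo() has no effect on when the prefetches are issued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in iterations. A real compiler would
   derive it from the memory latency and the cost of the loop body. */
#define PREFETCH_DIST 16

/* Hypothetical stand-in for foo(): one bit of a multiplicative hash,
   chosen only so that the branch outcome looks erratic. */
static int foo(size_t i)
{
    unsigned x = (unsigned)i * 2654435761u;
    return (int)((x >> 15) & 1u);
}

/* Adds a[i] when foo(i) is true and subtracts it otherwise, so a[i] is
   needed on every iteration whichever way the branch goes. */
long signed_sum(const long *a, size_t n)
{
    assert(a != NULL || n == 0);

    long sum = 0;
    size_t i;
    /* Number of iterations that still have data left to prefetch. */
    size_t steady = n > PREFETCH_DIST ? n - PREFETCH_DIST : 0;

    /* Prolog: prefetch the data for the first PREFETCH_DIST iterations. */
    for (i = 0; i < n && i < PREFETCH_DIST; i++)
        __builtin_prefetch(&a[i], 0, 3);

    /* Steady state: prefetch PREFETCH_DIST iterations ahead. The prefetch
       sits outside the conditional, so it does not depend on foo(). */
    for (i = 0; i < steady; i++) {
        __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 3);
        if (foo(i))
            sum += a[i];
        else
            sum -= a[i];
    }

    /* Epilog: the remaining data has already been prefetched. */
    for (; i < n; i++) {
        if (foo(i))
            sum += a[i];
        else
            sum -= a[i];
    }
    return sum;
}
```

For simplicity the sketch issues one prefetch per element. A compiler would normally unroll the loop so that only one prefetch is issued per cache line.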

The second challenge of making prefetches effective is avoiding cache
conflicts, which is a problem common to both hardware-based and
software-based techniques. Therefore, the techniques discussed earlier in
Section are also applicable to
hardware-controlled prefetching.