While software-controlled prefetching requires support from both hardware and software, several schemes have been proposed that are strictly hardware-based. Porterfield [64] evaluated several cacheline-based hardware prefetching schemes. In some cases they were quite effective at reducing miss rates, but they often increased memory traffic substantially. Lee [53] proposed an elaborate lookahead scheme for prefetching in a multiprocessor where all shared data is uncacheable. He found that the effectiveness of the scheme was limited by branch prediction and by synchronization. Baer and Chen [7] proposed a scheme that uses a history buffer to detect constant-stride access patterns. In their scheme, a "lookahead PC" speculatively walks through the program ahead of the normal PC using branch prediction. When the lookahead PC finds a matching stride entry in the table, it issues a prefetch. They evaluated the scheme in a memory system with a 30-cycle miss latency and found encouraging results.
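To make the idea of constant-stride detection concrete, the following is a minimal software sketch of a history table indexed by instruction address. It is not Baer and Chen's actual design: the class name `StrideTable`, the method `access`, and the rule of prefetching once the same nonzero stride has been seen twice in a row are all assumptions made for illustration. The lookahead PC and branch prediction are omitted, so prefetches here are triggered by the normal access stream.

```python
class StrideTable:
    """Hypothetical per-instruction stride detector (simplified sketch)."""

    def __init__(self):
        # Maps an instruction's PC to (last_address, last_stride).
        self.entries = {}

    def access(self, pc, addr):
        """Record a memory access; return an address to prefetch, or None."""
        prev = self.entries.get(pc)
        if prev is None:
            # First time this instruction is seen: no stride is known yet.
            self.entries[pc] = (addr, 0)
            return None
        last_addr, last_stride = prev
        stride = addr - last_addr
        self.entries[pc] = (addr, stride)
        # Assumed policy: prefetch only when the same nonzero stride repeats.
        if stride != 0 and stride == last_stride:
            return addr + stride
        return None


# A load at (hypothetical) PC 0x40 walking an array with an 8-byte stride.
table = StrideTable()
prefetches = [table.access(0x40, a) for a in (100, 108, 116, 124)]
# prefetches == [None, None, 124, 132]
```

Under this assumed policy the first two accesses only train the table, so a short loop may finish before any prefetch is issued. An irregular access pattern never produces a repeated stride and therefore generates no prefetches at all.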
To compare hardware-controlled prefetching with software-controlled prefetching, we will discuss how hardware-controlled prefetching addresses the three goals introduced in an earlier section: performing analysis, maximizing effectiveness, and minimizing the overheads associated with prefetching.