An-Cheng Huang <pach@cs.cmu.edu> Leejay Wu <lw2j@cs.cmu.edu>
The following graphs each track one statistic, such as instructions per cycle or the level 1 instruction cache miss rate, across different benchmarks and configurations. Each graph compares the standard SimpleScalar out-of-order simulator, sim-outorder, with our modified version, which supports instruction prefetching.
The benchmarks include various object-oriented C++ programs from the OOCSB suite, as well as C and Fortran programs from SPEC95. The remaining SPEC95 benchmarks are omitted because their runs have not yet completed for all configurations.
Each configuration name can be decoded as follows. The string assoc8 refers to an 8-way set-associative level 1 instruction cache, and direct to a direct-mapped one; in either case, this cache was 16 KB with 32-byte blocks. The string normal indicates a prefetching version in which blocks retrieved from the prefetch buffer were not copied into the level 1 instruction cache, and move a prefetching version in which they were; the absence of both indicates a non-prefetching version.
All prefetching versions are marked pf; non-prefetching versions are marked !pf.
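As a reading aid, the naming scheme can be written as a small decoder. The exact form of the labels (tokens separated by spaces, underscores, or hyphens, as in "direct move pf") is an assumption, and decode_config is a hypothetical helper, not part of our tools.

```python
import re


def decode_config(name):
    """Decode a configuration label such as "direct move pf" (label format assumed)."""
    # Split on whitespace, underscores, hyphens, dots, or slashes; "!" is kept
    # so that "!pf" survives as a token of its own.
    tokens = {t for t in re.split(r"[\s_\-./]+", name.strip()) if t}

    # Level 1 instruction cache organization: 8-way or direct-mapped.
    if "assoc8" in tokens and "direct" in tokens:
        raise ValueError(f"conflicting cache organizations in {name!r}")
    if "assoc8" in tokens:
        assoc = 8
    elif "direct" in tokens:
        assoc = 1
    else:
        raise ValueError(f"no cache organization in {name!r}")

    # Prefetch-buffer policy: "normal", "move", or neither (no prefetching).
    if "normal" in tokens and "move" in tokens:
        raise ValueError(f"conflicting prefetch policies in {name!r}")
    policy = "move" if "move" in tokens else "normal" if "normal" in tokens else None
    prefetch = policy is not None

    # The pf / !pf marker, if present, must agree with the policy token.
    if ("pf" in tokens and not prefetch) or ("!pf" in tokens and prefetch):
        raise ValueError(f"pf marker disagrees with policy in {name!r}")

    return {
        "l1i_assoc": assoc,
        "l1i_size_kb": 16,      # same in every configuration
        "l1i_block_bytes": 32,  # same in every configuration
        "prefetch": prefetch,
        "copy_buffer_hits_to_l1": policy == "move",
    }
```

For example, decode_config("assoc8_normal_pf") reports an 8-way cache with prefetching whose buffer hits are not copied into the level 1 instruction cache.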
In all configurations, the level 1 data cache was a 16 KB direct-mapped cache with 32-byte blocks, and the unified level 2 cache was 256 KB, 8-way set-associative, with 64-byte blocks.
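The cache parameters above determine how each address is split into tag, index, and offset fields. The following sketch does that arithmetic; the 32-bit address width is an assumption not stated here, and cache_geometry is a hypothetical helper.

```python
def cache_geometry(size_bytes, block_bytes, assoc, addr_bits=32):
    """Return set count and tag/index/offset widths (addr_bits=32 is assumed)."""
    for n in (size_bytes, block_bytes, assoc):
        if n <= 0 or (n & (n - 1)) != 0:
            raise ValueError("sizes and associativity must be powers of two")
    num_sets = size_bytes // (block_bytes * assoc)
    if num_sets < 1:
        raise ValueError("cache too small for this block size and associativity")
    offset_bits = block_bytes.bit_length() - 1  # log2(block size)
    index_bits = num_sets.bit_length() - 1      # log2(number of sets)
    return {
        "sets": num_sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": addr_bits - index_bits - offset_bits,
    }


CONFIGS = {
    "L1 I-cache, direct": (16 * 1024, 32, 1),
    "L1 I-cache, assoc8": (16 * 1024, 32, 8),
    "L1 D-cache": (16 * 1024, 32, 1),
    "unified L2": (256 * 1024, 64, 8),
}

if __name__ == "__main__":
    for label, params in CONFIGS.items():
        print(label, cache_geometry(*params))
```

With these parameters the direct-mapped instruction cache has 512 one-block sets, while the 8-way version has 64 sets of eight blocks each, so the latter can hold several blocks that share an index where the direct-mapped cache can hold only one.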
We note that with a direct-mapped level 1 instruction cache, prefetching can have a significant impact on instructions per cycle.
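To make the normal and move policies concrete for a direct-mapped cache, here is a toy model. It is not our simulator: it tracks only tags in a 16 KB direct-mapped level 1 instruction cache with 32-byte blocks, and it assumes a hypothetical next-line prefetcher that, after each fetch, queues the following block in a 4-entry FIFO prefetch buffer. The prefetcher, the buffer size, and the trace are assumptions for illustration only.

```python
from collections import OrderedDict

BLOCK_BYTES = 32         # L1 I-cache block size from the text
CACHE_BYTES = 16 * 1024  # L1 I-cache size from the text


class DirectMappedCache:
    """Tag-only model of a direct-mapped cache (block numbers stand in for tags)."""

    def __init__(self, size=CACHE_BYTES, block=BLOCK_BYTES):
        self.num_sets = size // block
        self.lines = [None] * self.num_sets

    def contains(self, blk):
        return self.lines[blk % self.num_sets] == blk

    def fill(self, blk):
        # Overwrites whatever block previously occupied this set.
        self.lines[blk % self.num_sets] = blk


class PrefetchBuffer:
    """Hypothetical fully associative FIFO buffer of prefetched block numbers."""

    def __init__(self, entries=4):
        self.entries = entries
        self.blocks = OrderedDict()

    def contains(self, blk):
        return blk in self.blocks

    def insert(self, blk):
        if blk in self.blocks:
            return
        if len(self.blocks) >= self.entries:
            self.blocks.popitem(last=False)  # drop the oldest prefetch
        self.blocks[blk] = None

    def remove(self, blk):
        self.blocks.pop(blk, None)


def simulate(addresses, policy=None, entries=4):
    """Replay byte addresses; policy is None (no prefetching), "normal", or "move"."""
    if policy not in (None, "normal", "move"):
        raise ValueError(f"unknown policy {policy!r}")
    cache = DirectMappedCache()
    buf = PrefetchBuffer(entries)
    stats = {"cache_hits": 0, "buffer_hits": 0, "misses": 0}
    for addr in addresses:
        blk = addr // BLOCK_BYTES
        if cache.contains(blk):
            stats["cache_hits"] += 1
        elif policy is not None and buf.contains(blk):
            stats["buffer_hits"] += 1
            if policy == "move":
                # move: copy the block into the L1 cache and free its buffer entry.
                cache.fill(blk)
                buf.remove(blk)
            # normal: serve the fetch from the buffer and leave the cache untouched.
        else:
            stats["misses"] += 1
            cache.fill(blk)
        if policy is not None:
            # Assumed next-line prefetcher: queue the following block if absent.
            nxt = blk + 1
            if not cache.contains(nxt) and not buf.contains(nxt):
                buf.insert(nxt)
    return stats


if __name__ == "__main__":
    # Block 512 maps to the same set as block 0 (512 sets in a 16 KB / 32 B cache).
    trace = [0, 511 * BLOCK_BYTES, 512 * BLOCK_BYTES, 0]
    for policy in (None, "normal", "move"):
        print(policy, simulate(trace, policy))
```

On this trace, block 512 conflicts with block 0. Under normal, block 512 is served from the buffer without displacing block 0, so the final access to block 0 hits; under move, copying block 512 into the cache evicts block 0 and the final access misses. Other traces can favor move, for instance when a block left in the buffer under normal is pushed out of the FIFO before it is fetched again.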
[Six graphs, one per statistic, comparing sim-outorder with the prefetching version across benchmarks and configurations]