The benchmarks evaluated in this study are all scientific and engineering
applications drawn from several benchmark suites. This collection includes
NASA7 and TOMCATV from the SPEC benchmarks [77]; OCEAN, a
uniprocessor version of a SPLASH benchmark [72]; and CG (conjugate
gradient), EP (``embarrassingly parallel'', a Monte Carlo simulation), IS
(integer sort), and MG (multigrid) from the NAS Parallel
Benchmarks [8]. Since the NASA7 benchmark actually consists of seven
independent kernels, we study each kernel separately (MXM, CFFT2D, CHOLSKY,
BTRIX, GMTRY, EMIT, and VPENTA).
In addition, for our study of prefetching indirect references in a later
section, we also evaluate MP3D (another uniprocessor version of a SPLASH
benchmark) and SPARSPAK [25] (a sparse matrix application), since both
contain many indirect references. A pair of tables summarizes the
applications, including their input data sets, and lists some of their
general characteristics.
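To make the term concrete, the sketch below (written in C purely for
illustration; it is not code from MP3D or SPARSPAK) shows the kind of
indirect reference these applications perform, where the address of one
access depends on the value loaded by another:
\begin{verbatim}
/* Illustrative gather: x[idx[i]] is an "indirect reference" because
 * its address is computed from idx[i], which must itself be loaded
 * from memory first.  This dependence is what makes such references
 * harder to prefetch than affine array accesses. */
double gather_sum(const double *x, const int *idx, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[idx[i]];   /* address depends on a prior load */
    return sum;
}
\end{verbatim}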
For four of the applications (MXM, CFFT2D, VPENTA, and TOMCATV), the mapping
conflicts in the direct-mapped cache occurred so frequently that we
manually changed the alignment of some of the matrices to help reduce these
conflicts. These problematic matrices tend to have dimensions that are
powers of two, which causes the cache size (also a power of two) to divide
evenly into the size of a row, or possibly of the entire matrix. Therefore,
adjacent elements in the same column (and sometimes elements with similar
access functions in adjacent matrices) often mapped into the same cache
entry, resulting in large numbers of conflicts within inner loops. We
manually fixed this problem by adding 13 (an arbitrary prime number) to the
size of each dimension for these problematic matrices, while being careful
that these changes affected only the data layout and not the actual
computation. In a later section, we will examine these mapping conflicts in
more detail and evaluate possible architectural enhancements to minimize
their performance impact.
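To illustrate the effect of this padding, the sketch below (written in C
with hypothetical array and cache parameters; it is not code taken from the
benchmarks themselves) shows how a power-of-two row length can make every
element of a column map to the same entry of a direct-mapped cache, and how
adding a small prime to the declared dimension breaks up these conflicts
while leaving the computation itself unchanged:
\begin{verbatim}
#define N    1024              /* power-of-two problem size (assumed)   */
#define PAD  13                /* arbitrary prime, as described above   */

/* With a hypothetical 8 KB direct-mapped cache, one row of 1024 doubles
 * is exactly 8 KB, so conflicting[i][j] and conflicting[i+1][j] map to
 * the same cache entry, and walking down a column thrashes that entry.
 * Padding the declared row length staggers the columns across different
 * entries; the loops still touch only the first N columns, so only the
 * data layout changes, not the results. */
double conflicting[N][N];
double padded[N][N + PAD];

double column_sum(int j)
{
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        sum += padded[i][j];   /* consecutive accesses no longer collide */
    return sum;
}
\end{verbatim}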