This page contains information about computer systems that goes beyond the core course material. You won't be tested on any of this, but you might find it interesting, anyhow.
In class, I was not able to fully explain the behavior of the ``memory mountain'' shown on Slide #29. In particular, I could not identify the three levels of caching. I tracked down the paper the figure was taken from: T. Stricker and T. Gross, ``Global Address Space, Non-Uniform Bandwidth: A Memory System Performance Characterization of Parallel Systems,'' ACM Conf. on High Performance Computer Architecture, 1997. The paper is available online as Text Abstract, PDF, and PostScript (huge!).
The relevant discussion is in Sect. 5.1 of this paper. For those of you who just want the management summary, the L1 cache case corresponds to the highest ridge and the region to its right. The authors attribute the decreased performance at smaller working set sizes to loop overhead, i.e., the processor cannot utilize the full cache bandwidth. The L2 region extends from the left of this highest ridge to the first darkened band. The L3 region is the clear ``plateau'' that peaks at 600 MB/s, and main memory is the lowest ``plain,'' peaking at ~150 MB/s.
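If you would like to generate a mountain for your own machine, the measurement can be sketched roughly as follows. This is not the benchmark Stricker and Gross used. The function names (`strided_sum`, `elems_touched`, `read_throughput_mbs`, `print_mountain`), the `TARGET_READS` constant, the power-of-two sweep, and timing with the standard `clock()` call are all assumptions made for illustration. The idea is simply to read a working set of a given size at a given stride, time it, and report MB/s.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical tuning knob: roughly how many reads each measurement performs. */
#define TARGET_READS ((size_t)1 << 22)

/* Results are stored here so the compiler cannot discard the read loops. */
static volatile long sink;

/* Sum every stride-th element of the first elems entries of data.
   The pointer is volatile-qualified so every read is really performed. */
long strided_sum(const volatile long *data, size_t elems, size_t stride)
{
    assert(stride > 0);               /* a zero stride would never terminate */
    long sum = 0;
    for (size_t i = 0; i < elems; i += stride)
        sum += data[i];
    return sum;
}

/* Number of elements strided_sum actually reads: ceil(elems / stride). */
size_t elems_touched(size_t elems, size_t stride)
{
    return (elems + stride - 1) / stride;
}

/* Read throughput in MB/s (1 MB = 10^6 bytes) over reps timed passes.
   Returns 0.0 if the run is too short for clock() to resolve. */
double read_throughput_mbs(const volatile long *data, size_t elems,
                           size_t stride, int reps)
{
    sink = strided_sum(data, elems, stride);      /* warm-up pass fills the caches */
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink = strided_sum(data, elems, stride);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    double bytes = (double)reps * (double)elems_touched(elems, stride)
                   * (double)sizeof(long);
    return secs > 0.0 ? bytes / secs / 1e6 : 0.0;
}

/* Print one row per working-set size (doubling from min_bytes to max_bytes)
   and one column per stride (doubling from 1 to max_stride, in elements). */
void print_mountain(size_t min_bytes, size_t max_bytes, size_t max_stride)
{
    if (min_bytes < sizeof(long))
        min_bytes = sizeof(long);
    size_t max_elems = max_bytes / sizeof(long);
    long *data = malloc(max_elems * sizeof(long));
    if (data == NULL)
        return;
    for (size_t i = 0; i < max_elems; i++)
        data[i] = (long)i;

    for (size_t bytes = min_bytes; bytes <= max_bytes; bytes *= 2) {
        size_t elems = bytes / sizeof(long);
        printf("%8zu KB", bytes / 1024);
        for (size_t stride = 1; stride <= max_stride; stride *= 2) {
            /* Keep the number of reads per measurement roughly constant. */
            int reps = (int)(TARGET_READS / elems_touched(elems, stride));
            if (reps < 1)
                reps = 1;
            printf(" %8.0f", read_throughput_mbs(data, elems, stride, reps));
        }
        printf("\n");
    }
    free(data);
}
```

Calling, say, `print_mountain(1 << 10, 1 << 26, 16)` from a `main` function sweeps working sets from 1 KB to 64 MB at strides of 1 to 16 elements. The absolute numbers will be lower than a tuned benchmark would report, since the `volatile` reads block some compiler optimizations, but the drops as the working set outgrows each cache level should still be visible. The smallest working sets may also look slower than expected because of loop overhead, which matches the explanation given in the paper.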