Abstract: This paper considers issues of memory performance in shared memory multiprocessors that provide a high-bandwidth network and in which the memory banks are slower than the processors. We are concerned with the effects of memory bank contention, memory bank delay, and the bank expansion factor (the ratio of number of banks to number of processors) on performance, particularly for irregular memory access patterns. This work was motivated by observed discrepancies between predicted and actual performance in a number of irregular algorithms implemented for the Cray C90 when the memory contention at a particular location is high.
We develop a formal framework for studying memory bank contention and delay, and show several results, both experimental and theoretical. First, we show experimentally that our framework is a good predictor of performance on the Cray C90 and J90, providing a good accounting of bank contention and delay. Second, we show that additional memory banks often improve performance, even beyond the natural choice of d banks per processor to compensate for a bank delay of d. Third, we explore scenarios under which high-level models, the EREW PRAM and QRQW PRAM, can be efficiently mapped onto high-bandwidth machines. We provide a work-preserving QRQW PRAM emulation whose slowdown is a nonlinear function of the bank delay and the number of banks per processor. Finally, we evaluate the impact of contention on performance for several algorithms.
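To make the roles of bank delay and bank contention concrete, here is a minimal Python sketch of a toy cost estimate. It is an illustration only and is not the framework developed in the paper: it assumes that each processor issues one request per cycle, that a bank is busy for `bank_delay` cycles per request, and that a batch of requests finishes when the slower of the two (the processors or the busiest bank) is done. The names `batch_time` and `random_pattern` are hypothetical.

```python
import random
from collections import Counter


def batch_time(bank_ids, num_procs, bank_delay):
    """Toy estimate (assumed model) of cycles needed to serve one batch.

    bank_ids   -- list with the target bank of every request in the batch
    num_procs  -- number of processors sharing the requests evenly
    bank_delay -- cycles a bank stays busy per request (the 'd' of the abstract)
    """
    # Requests issued by each processor, rounded up.
    per_proc = -(-len(bank_ids) // num_procs)
    # Number of requests that land on the most heavily loaded bank.
    busiest = max(Counter(bank_ids).values())
    # The batch is limited by issue rate or by the busiest bank, whichever is slower.
    return max(per_proc, bank_delay * busiest)


def random_pattern(num_requests, num_banks, seed=0):
    """Irregular access pattern: each request targets a uniformly random bank."""
    rng = random.Random(seed)
    return [rng.randrange(num_banks) for _ in range(num_requests)]


if __name__ == "__main__":
    procs, delay, requests = 8, 4, 8 * 256
    # Vary the expansion factor (banks per processor) around the 'natural' choice d.
    for banks_per_proc in (delay, 2 * delay, 8 * delay):
        pattern = random_pattern(requests, procs * banks_per_proc)
        print(banks_per_proc, batch_time(pattern, procs, delay))
```

Under these assumptions, a perfectly balanced pattern with d banks per processor is limited only by the issue rate, while a random pattern usually loads some bank above the average. That imbalance is one intuition for why an expansion factor larger than d can help, and a pattern that sends every request to one bank shows the cost of high contention at a single location.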
@inproceedings{dxbsp,
  author    = "Guy Blelloch and Phil Gibbons and Yossi Matias and Marco Zagha",
  title     = "Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors",
  booktitle = "Proceedings Symposium on Parallel Algorithms and Architectures",
  pages     = "84--94",
  year      = 1995,
  month     = jul
}