Guy Blelloch

Research

My research has largely been in the interaction of Algorithms and Programming Languages, much of it in the area of parallel computing.
Here are some of the more recent topics I have worked on with my students and collaborators.

Parallelism

Benchmarking Parallel Algorithms. We are developing a benchmark suite of parallel algorithmic problems for testing machines and programming paradigms. The ideas is that they will be defined in terms of the interface instead of the code so they can be compared across machine types and programming paradigms.

Cache efficient parallel algorithms. We have been developing cost models, algorithms, and schedulers for taking advantage of locality on parallel cache hierarchies without worrying about details of the machine.

Synchronization and concurrent algorithms. We are looking into the idea of adding transactions on individual memory blocks to the memory system and how this can be used to simplify many concurrent algorithms. We have also looked into the idea "room-synchronizations".

Thread Scheduling. We are developing scheduling dynamic parallel languages and studying how they affect time, memory and cache performance.

Data structures and algorithms

Self-adjusting Computation. The idea is to keep track of dependencies while executing an algorithm so that you can go back, change history, and propagate the change to the output.

Computational Biology: algorithms for phylogenetic tree construction. We have been looking at algorithms that generate trees with maximum-parsimony.

Uniquely Represented Data Structures. Data structures that only depend on their contents and not the order of insertion or deletion have applications in security and concurrency.

and here are other things I've worked on:

Algorithms for Multiprocessor Garbage Collection. The goal here is to bound the wait time for a GC while also bounding the memory required.

The NESL Language

Provably Efficient Language Implementations

Compact Data Structures. We look at how to represent graphs, meshes, indices and sets with a number of bits that are asymptotically optimal and access/update times that are also asymptotically optimal.

Parallel Algorithms including work on Delaunay Triangulation, Treaps, and Sorting.

Other: Pipelining with Futures, Multibank Memory Systems.

Cache Efficient Parallel Algorithms

We have been developing high-level algorithmic models for accounting for locality on parallel machines, developing algorithms which are efficient under these models, developing schedulers for the models, and proving bounds on runtime based on the models. This includes work reported in the following papers:

Guy E. Blelloch, Phillip B. Gibbons, and Harsha Vardhan Simhadri
Brief announcement: low depth cache-oblivious sorting.
Proc. ACM Symposium on Parallel Algorithms and Architectures (SPAA), August 2009.

Guy E. Blelloch and Phillip B. Gibbons, and Harsha Vardhan Simhadri
Low Depth Cache Oblivious Algorithms
Tech Report CMU-CS-09-134

Rezaul A. Chowdhury, Vijaya Ramachandran, Guy E. Blelloch, Phillip B. Gibbons, Shimin Chen and Michael Kozuch.
Provably Good Multicore Cache Performance for Divide-and-Conquer Algorithms.
SIAM/ACM Symposium on Discrite Algortithms (SODA), January 2008.
Shimin Chen, Phillip B. Gibbons, Michael Kozuch, Vasileios Liaskovitis, Anastassia Ailamaki, Guy E. Blelloch, Babak Falsafi, Limor Fix, Nikos Hardavellas, Todd C. Mowry, and Chris Wilkerson.
Scheduling threads for constructive cache sharing on CMPs.
Proc. ACM Symposium on Parallel Algorithms and Architectures (SPAA), June 2007.
Guy E. Blelloch, and Phillip B. Gibbons
Effectively Sharing a Cache Among Threads
ACM Symposium on Parallelism in Algorithms and Architecture (SPAA), June 2004.
Umut Acar, Guy Blelloch, and Robert Blumofe.
The Data Locality of Work Stealing.
Theory of Computing Systems (TCS), 35(3), May 2002.

Synchronization Primitives and Concurrent Algorithms

Guy E. Blelloch, Phillip B. Gibbons, and S. Harsha Vardhan
Combinable Memory-Block Transactions.
Proc. ACM Symposium on Parallel Algorithms and Architectures (SPAA), June 2008.

Guy E. Blelloch, Perry Cheng, and Phillip B. Gibbons.
Scalable Room Synchronizations
Theory of Computing Systems (TCS), 36(5), September 2003. (Preliminary version in SPAA 2001).

Self Adjusting Computation

It is often the case that by slightly changing the input to an algorithm, the execution path will only change slightly. The idea of self-adjusting computation is to let the algorithm "adjust" to such small changes by only updating the part of the computation that is required to be updated. The idea is implemented by maintaining a dependence graph of the computation and using this graph to propagate changes. The propagation will both update the output and update the graph itself (so that it is ready for another change).

We are interested both in techniques for maintaining the graph, and in proving bounds for how much needs to be updated for different algorithms.

Umut A. Acar, Guy E. Blelloch, Ruy Ley-Wild, Kanat Tangwongsan, and Duru Turkoglu.
Traceable Data Types for Self-Adjusting Computation.
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2010.
Umut A. Acar, Guy E. Blelloch, Matthias Blume, Robert Harper, and Kanat Tangwongsan
An experimental analysis of self-adjusting computation.
ACM Transactions on Programming Languages and Systems (TOPLAS), 32(1), September 2009.

Umut A. Acar, Guy E. Blelloch, Kanat Tangwongsan, and Duru Trkoglu
Robust Kinetic Convex Hulls in 3D.
European Symposium on Algorithms (ESA), September 2008.

Umut A. Acar, Guy E. Blelloch, and Kanat Tangwongsan.
Kinetic 3D convex hulls via self-adjusting computation. (Video presentation, Short description)
Proc. ACM Symposium on Computational Geometry (SoCG), June 2007.
Umut Acar, Guy Blelloch, and Robert Harper.
Adaptive Functional Programming.
ACM Transactions on Programming Languages and Systems (TOPLAS), November 2006.
(Originally in POPL 2002).
Umut A. Acar, Guy E. Blelloch, Kanat Tangwongsan, and Jorge L. Vittes.
Kinetic Algorithms via Self-Adjusting Computation.
European Symposium on Algorithms (ESA), September 2006.
Umut A. Acar, Guy E. Blelloch, Matthias Blume, and Kanat Tangwongsan.
An Experimental Analysis of Self-Adjusting Computation.
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2006.
Umut A. Acar, Guy E. Blelloch, and Jorge L. Vittes.
An Experimental Analysis of Change Propagation in Dynamic Trees
Workshop on Algorithms Engineering and Experiments (ALENEX), January 2005.
Umut A. Acar, Guy E. Blelloch, Robert Harper, Jorge L. Vittes, and Shan Leung Maverick Woo
Dynamizing Static Algorithms, with Applications to Dynamic Trees and History Independence
ACM/SIAM Symposium on Discrete Algorithms (SODA), January 2004.
Umut A. Acar, Guy E. Blelloch and, Robert Harper.
Selective Memoization
ACM Symposium on Principles of Programming Languages (POPL), January 2003.

Compact Data Structures for Graphs, Indices, Meshes and Sets

We have looked at developing data structures that are compressed to within a constant factor of optimal while allowing access within a constant factor of optimal.

Daniel K. Blandford and Guy E. Blelloch.
Compact Dictionaries for Variable-Length Keys and Data, with applications.
ACM Transactions on Algorithms, May 2008. (Preliminary version in SODA 05).

Daniel K. Blandford, Guy E. Blelloch, and Clemens Kadow.
Engineering a Compact Parallel Delaunay Algorithm in 3D.
ACM Symposium on Computational Geometry (SoCG), June 2006.
Daniel K. Blandford, Guy E. Blelloch, David E. Cardoze, and Clemens Kadow.
Compact representations of simplicial meshes in two and three dimensions.
International Journal of Computational Geometry and Applications, February 2005. (Preliminary version in IMR 2003)
Daniel K. Blandford and Guy E. Blelloch.
Compact Representations of Ordered Sets
ACM/SIAM Symposium on Discrete Algorithms (SODA), January 2004.
Daniel Blandford, Guy E. Blelloch, and Ian Kash.
An Experimental Analysis of a Compact Graph Representation
Workshop on Algorithms Engineering and Experiments (ALENEX), January 2004.
Daniel Blandford, Guy E. Blelloch, and Ian Kash.
Compact Representations of Separable Graphs.
ACM/SIAM Symposium on Discrete Algorithms (SODA), January 2003 (to appear).
Daniel Blandford and Guy E. Blelloch.
Index Compression through Document Reordering.
IEEE Data Compression Conference (DCC), April 2002.

Uniquely Represented Data Structures

A data structure is uniquely represented if its layout in memory depends only on the contents (as seen by the interface). In particular it cannot depend on the ordering that objects were added or deleted. Such data structures have applications to security (to make sure that no information is revealed about the order of operations) and concurrency (to make sure the data structure ends up in the same state, independent of how operations end up being interleaved).

Guy E. Blelloch, Daniel Golovin, and Virginia Vassilevska
Uniquely Represented Data Structures for Computational Geometry.
Proc. Scandinavian Workshop on Algorithm Theory (SWAT), July 2008.

Guy E. Blelloch and Daniel Golovin.
Strongly History-Independent Hashing with Applications
IEEE Symposium on Foundations of Computer Science (FOCS), October 2007.

Multiprocessor Garbage Collection

Daniel Spoonhower, Guy Blelloch, and Robert Harper.
Objects and their collection: Using page residency to balance tradeoffs in tracing garbage collection
ACM/USENIX international conference on Virtual execution environments, June 2005.
Perry Cheng and Guy E. Blelloch.
A Parallel, Real-Time Garbage Collector.
ACM SIGPLAN Symposium on Programming Language Design and Implementation (PLDI), June 2001.
Guy E. Blelloch and Perry Cheng.
On Bounding Time and Space for Multiprocessor Garbage Collection.
ACM SIGPLAN Symposium on Programming Language Design and Implementation (PLDI), May 1999.
Presents the first real-time multiprocessor garbage collection algorithm with provable bounds on time and space. The algorithm requires at most 2 (R(1 + 2/k) + N + 5PD) memory locations, where P is the number of processors, R is the maximum reachable space during a computation, N is the maximum number of reachable objects, D is the maximum depth of any data structure, and k is a parameter specifying how many locations are copied each time a location is allocated. Furthermore the client processes are never stopped for more than time proportional to k non-blocking machine instructions.

Thread Scheduling

Many of today's high level parallel languages support dynamic, fine-grained parallelism. The language implementation typically requires an efficient scheduling algorithm to assign computations to processors at runtime. However, in an attempt to expose a sufficient degree of parallelism to keep all processors busy, schedulers often create many more parallel threads than necessary, leading to excessive memory usage. Further, the order in which the threads are scheduled can greatly affect the total size of the live data at any instance during the parallel execution, and unless the threads are scheduled carefully, the parallel execution of a program may require much more memory than its serial execution.

Our research involves finding theoretically space- and time-efficient scheduling algorithms for multithreaded programs, including programs with nested parallelism and synchronization variables. In addition to proving the space and time bounds of the resulting parallel schedules, we demonstrate that they are efficient in practice. We have implemented a runtime system that uses our algorithm to schedule threads in computations with nested parallelism. The following papers describe our results.

Daniel Spoonhower, Guy E. Blelloch, Phillip B. Gibbons, and Robert Harper
Beyond nested parallelism: tight bounds on work-stealing overheads for parallel futures.
Proc. ACM Symposium on Parallel Algorithms and Architectures (SPAA), August 2009.

Rezaul A. Chowdhury, Vijaya Ramachandran, Guy E. Blelloch, Phillip B. Gibbons, Shimin Chen and Michael Kozuch.
Provably Good Multicore Cache Performance for Divide-and-Conquer Algorithms.
SIAM/ACM Symposium on Discrite Algortithms (SODA), January 2008.

Shimin Chen, Phillip B. Gibbons, Michael Kozuch, Vasileios Liaskovitis, Anastassia Ailamaki, Guy E. Blelloch, Babak Falsafi, Limor Fix, Nikos Hardavellas, Todd C. Mowry, and Chris Wilkerson.
Scheduling threads for constructive cache sharing on CMPs.
Proc. ACM Symposium on Parallel Algorithms and Architectures (SPAA), June 2007.
Guy E. Blelloch, and Phillip B. Gibbons
Effectively Sharing a Cache Among Threads
ACM Symposium on Parallelism in Algorithms and Architecture (SPAA), June 2004.
Umut Acar, Guy Blelloch, and Robert Blumofe.
The Data Locality of Work Stealing.
Theory of Computing Systems (TCS), 35(3), May 2002.
G. E. Blelloch, P. Gibbons and Y. Matias.
Provably Efficient Scheduling for Languages with Fine-Grained Parallelism.
Journal of the ACM (JACM), 46(2), March 1999.
Describes a parallel algorithm for scheduling based on the notion of a parallel depth-first schedule. It proves that any computation which does W work (total number of operations), has D depth (longest chain of dependences, or critical path), and requires S1 space if run sequentially can be scheduled to run on P processors in O(W/P + D) time and S1 + O(P D) total space. We show how to generate such a schedule for any computation off-line and give an algorithm for nested-parallel computations that can generate an efficient on-line schedule. The previous best known bounds on space were S1 P based on results by Blumofe and Leiserson. Ours are therefore the first bounds that only require an additive term instead of a multiplicative factor over the sequential space.
Girija J. Narlikar and Guy E. Blelloch.
Space-Efficient Implementation of Nested Parallelism
ACM Transactions on Programming Languages and Systems (TOPLAS), 21(1), January 1999.
Describes an asynchronous version of the scheduling algorithm and the first implementation of our scheduling ideas. This version is more practical the than the above mentioned on-line scheduler since it allows processors to run mostly asynchronously and does not require any preemption (threads can run unimpeded until they decide to terminate). Experimental results are given for a variety of applications. The experiments show improved space usage over previous schedulers, and approximately equal performance.
Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias, and Girija Narlikar.
Space-Efficient Scheduling of Parallelism with Synchronization Variables
ACM Symposium on Parallel Algorithms and Architectures, June 1997.
Describes an on-line scheduling algorithm for languages with synchronization variables (e.g. futures, i-structures, events). Presents the first space bounds for such languages since previous results only applied to nested-parallel or strict languages. The technique uses the parallel depth-first schedule as described in our original paper, but requires balanced-trees to properly maintain the thread priorities on-line.
Guy E. Blelloch and John Greiner.
A Provable Time and Space Efficient Implementation of NESL
ACM International Conference on Functional Programming, May 1996.
Applies the ideas from the previous paper to the NESL language by specifying a provably-efficient language implementation. The paper describes a "profiling" semantics for NESL---which is an operational semantics extended with cost measures. The profiling semantics tracks work and depth using by returning a DAG, and tracks sequential space by threading a store in a fixed order. The paper then proves relationship of these costs measures to runtime on various machines models.

Parallel Algorithms

We have studied and experimented with parallel algorithms for a variety of problems, including sorting, Delaunay-triangulation, graph-connectivity, N-body simulations and set operations. The goal of our work has been to look at key problems and see what the very fastest algorithm we could develop for the problem is. Our work has consisted of both looking at theoretical and practical consequences. We have implemented all the algorithms we describe. In many cases our algorithms are the fastest available (at least at the time they were implemented). I've also written the chapter in the CRC Press Computer Science and Engineering Handbook (with Bruce Maggs). Our work includes the following.

Daniel K. Blandford, Guy E. Blelloch, and Clemens Kadow.
Engineering a Compact Parallel Delaunay Algorithm in 3D.
ACM Symposium on Computational Geometry (SoCG), June 2006.
Guy E. Blelloch, Jonathan C. Hardwick, Gary L. Miller, and Dafna Talmor.
Design and Implementation of a Practical Parallel Delaunay Algorithm.
Algorithmica, 24(3/4), 1999.
This is a composition of the two papers: Developing a practical projection-based parallel Delaunay algorithm and Implementation and Evaluation of an Efficient Parallel Delaunay Triangulation Algorithm.
Describes a new divide-and-conquer parallel algorithm for 2-d Delaunay triangulation, and its implementation in NESL and MPI. The algorithm has O(n log n) work and has O(log^3 n) depth. The main idea of the algorithm is to do all the partitioning work on the way down the recursion as opposed to on the way up (as done by Shammos and Hoey's algorithm -- the original O(n log n) time sequential algorithm). Experimental results show a 32-fold speedup on 64-processors (on a Cray T3D). The speedup is relative to the fastest sequential implementation of 2-D Delaunay we could find.
Guy E. Blelloch and Margaret Reid-Miller.
Fast Set Operations Using Treaps.
ACM Symposium on Parallel Algorithms and Architectures, June 1998.
Describes parallel algorithms for union, intersection and difference on ordered sets using random balanced binary trees (treaps), and presents experimental results for the algorithms on the SGI power challenge and Sun Enterprise server. For two sets of size n and m (m <= n) the algorithms run in expected O(m log (n/m)) work and O(log n) depth (parallel time) on an EREW PRAM with scan operations. The experiments show speedups of 6.3 to 6.8 speedup on 8 processors of the SGI power challenge.
Guy Blelloch and Girija Narlikar.
A Practical Comparison of N-Body Algorithms.
Proceedings DIMACS implementation challenge, 1997.
Guy E. Blelloch, Siddhartha Chatterjee, and Marco Zagha.
Solving Linear Recurrences with Loop Raking.
Journal of Parallel and Distributed Computing, February 1995.
John Greiner.
A Comparison of Data-Parallel Algorithms for Connected Components.
ACM Symposium on Parallel Algorithms and Architectures, June 1994 (and CMU-CS-93-191).
Marco Zagha and Guy E. Blelloch.
Radix Sort for Vector Multiprocessors.
Supercomputing '91, November 1991.
Guy E. Blelloch, Charles E. Leiserson, Bruce M. Maggs, C. Gregory Plaxton, Stephen J. Smith, and Marco Zagha.
A Comparison of Sorting Algorithms for the Connection Machine CM-2.
ACM Symposium on Parallel Algorithms and Architectures, 1991.

The NESL Programming Language

In the early 90s we designed a parallel programming language called NESL (NESted-parallel Language) and implemented it on a wide variety of parallel machines. The goal has been to experiment with ideas of nested parallelism, and with language-based cost models. We have coded up a large number of algorithms in NESL, and in most cases the code is very much more concise than in other parallel programming languages. Many of these algorithms are available off of the NESL algorithm library page, and demos of some of them are available off of the algorithm animation page. An online tutorial and interpreter are also available. The Chapter on Parallel Algorithms in the CRC Computer Science and Engineering Handbook uses pseudocode based on NESL.

Perhaps the most important feature of NESL is the model it supplies for analyzing costs. In particular it defines costs in terms of work (total number of operations) and depth (longest sequence of dependences, or critical path) and defines rules for composing these costs across expression. In general when making calls sequentially the sum is taken of both the work and depth, and when making calls in parallel the maximum is taken over the work and depth. This idea of using work and depth as cost measures has been adopted by other languages, such as the CILK programming language.

In addition to the NESL home page the following papers describe our work on NESL.

Guy Blelloch.
Programming Parallel Algorithms.
Communications of the ACM, 39(3), March 1996.
Describes and motivates the two main ideas behind NESL: nested data parallelism, and the language based performance model. It goes through several examples of parallel algorithms written and analyzed in NESL, including quicksort, primes, sparse matrix vector multiply, FFT, quickmedian, and a convex-hull algorithm.
Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Jay Sipelstein, and Marco Zagha.
Implementation of a Portable Nested Data-Parallel Language.
Journal of Parallel and Distributed Computing, 21(1), April 1994.
Outlines the implementation of NESL and gives performance numbers on a variety of parallel machines.
Guy E. Blelloch and John Greiner.
A Provable Time and Space Efficient Implementation of NESL
ACM International Conference on Functional Programming, May 1996.
Applies the ideas from the previous paper to the NESL language by specifying a provably-efficient language implementation. The paper describes a "profiling" semantics for NESL---which is an operational semantics extended with cost measures. The profiling semantics tracks work and depth using by returning a DAG, and tracks sequential space by threading a store in a fixed order. The paper then proves relationship of these costs measures to runtime on various machines models.
Guy E. Blelloch.
NESL: A Nested Data-Parallel Language.
Carnegie Mellon, Technical Report CMU-CS-95-170, September 1995.
The language definition, including a complete list of functions. It does not contain the operational semantics (see below).
Guy E. Blelloch, Jonathan C. Hardwick, Jay Sipelstein, and Marco Zagha.
NESL User's Manual.
Carnegie Mellon, Technical Report CMU-CS-95-169, September 1995.
Describes how to set up the NESL environment and how to use the various features in NESL (such as tracing, profiling and using remote machines).

Provably-Efficient Language Implementations

The idea of a provably-efficient language implementations is to

Formally define an abstract cost model from which the users of the language can analyze their code.
Specify a mapping from these costs to running times on a concrete machine model.
Describe an implementation and prove that this mapping is achieved with the implementation.

This work was originally motivated by the need to model time in the parallel programming language NESL, but we have applied it to other parallel programming models. We formalize the cost model using a profiling semantics, which is an operational semantics extended with cost measures. The following papers describe our results.

Guy E. Blelloch and John Greiner.
Parallelism in Sequential Functional Languages
ACM Symposium on Functional Programming and Computer Architecture, June 1995.
Describes a profiling semantics for the lambda-calculus assuming that in an expression e1 e2 the two subexpressions e1 and e2 can be evaluated in parallel, but that they must both fully evaluate before the result e1 can be applied to e2. This is called call-by-value parallelism. The semantics tracks work and depth as the two cost measures. The paper shows several results including the following.
1. An implementation of the lambda-calculus on a CREW PRAM that guarantees that any expression that evaluates in W work and D depth, will run on P processors in O(W/P + D log P) time.
2. A simulation of the PRAM on the lambda calculus that guarantees that any computation that takes T time and M memory on a P processor PRAM will run in T P log M work and T log M log P depth.
3. Shows how sorting can be implemented in O(n log n) work and O(log^2 n) depth in the lambda-calculus.
The first two results imply that the class NC (Nick's class) can be defined as computations in the lambda-calculus that can run within polynomial work and polylogarithmic depth.
John Greiner and Guy E. Blelloch.
A Provably Efficient Parallel Implementation of Full Speculation
ACM Symposium on Principles of Programming Languages, January 1996.
This extends the previous work to languages in which in an expression e1 e2 the function resulting from evaluating e1 can start running before e2 is complete. This is the parallelism available with futures and with lenient languages. Such parallelism can reduce the depth relative to call-by-value parallelism, but implementing such parallelism requires synchronizing on individual variables. The paper describes an implementation that guarantees that any expression that evaluates in W work and D depth, will run on P processors in O(W/P + D log P) time on a CRCW PRAM.
Guy E. Blelloch and John Greiner.
A Provable Time and Space Efficient Implementation of NESL .
ACM International Conference on Functional Programming, May 1996.
Formally defines a cost model for NESL using a profiling semantics. This model defines work (W), depth (D), and sequential space (S1). It then proves a mapping from this model onto various machine models. For example it shows that on a CREW PRAM a computation will run in O(W/P + D log P) time and S1 + O(D P) space.

Computational Biology: Maximum-Parsimony Phylogeny

Navodit Misra, Guy Blelloch, R Ravi and Russell Schwartz.
Generalized Buneman pruning for inferring the most parsimonious multi-state phylogeny.
Conference on Research in Computational Molecular Biology (RECOMB), April 2010.
Srinath Sridhar, Fumei Lam, Guy E. Blelloch, R. Ravi, and Russell Schwartz
Mixed Integer Linear Programming for Maximum-Parsimony Phylogeny Inference.
ACM/IEEE Transactions on Computational Biology and Bioinformatics (TCBB), July 2008. (Preliminary version in ISBRA 2007)

Srinath Sridhar, Kedar Dhamdhere, Guy E. Blelloch, Eran Halperin, R. Ravi and Russell Schwartz
Algorithms for Efficient Near-Perfect Phylogenetic Tree Reconstruction in Theory and Practice.
ACM/IEEE Transactions on Computational Biology and Bioinformatics (TCBB), October 2007. (A combination of preliminary versions in ICALP 2006 and CSB 2006.)

Other Research

Guy E. Blelloch and Margaret Reid-Miller.
Pipelining with Futures
ACM Symposium on Parallel Algorithms and Architectures. June 1997.
Pipelining has been used in the design of many PRAM algorithms to reduce their asymptotic running time, but properly maintaining the pipeline is very complicated for the programmer. This paper shows how futures (a parallel language construct) can be used to implement pipelining without requiring the user to code it explicitly, allowing for much simpler code and more asynchronous execution. It describes and analyzes three algorithms using this technique: a parallel merging algorithm on trees, a parallel version of insertion into and deletion from randomized balanced trees (treaps), and insertion into 2-3 trees.
Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias, and Marco Zagha .
Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors.
ACM Symposium on Parallel Algorithms and Architectures, July 1995.
This work considers issues of memory performance in shared memory multiprocessors for algorithms with irregular memory accessesses (e.g. pointer based algorithms). In particular it develops a model to account for the effect of contention and delay at the memory banks. The work was motivated by observed discrepancies between predicted and actual performance in a number of irregular algorithms implemented for the Cray C90 when the memory contention at a particular location is high. We show experimentally that our model much better accounts for the performance of such algorithms than previous models.

Guy E. Blelloch's Home Page

Research

Parallelism

Data structures and algorithms