From Programming Parallel Algorithms.
Communications of the ACM, 39(3), March, 1996.

Next: Acknowledgments Up: Programming Parallel Algorithms Previous: Three Other Algorithms

Summary

The NESL language was designed to be useful for programming and teaching parallel algorithms. For these purposes it was important that it allowed for simple descriptions of algorithms that closely match our high level intuition, and also supplied a well-defined model for analyzing performance. We believe the language has successfully achieved these goals. There are many aspects of NESL, and the purpose of this article was to extract the two features that are most important for programming parallel algorithms. They are:

A performance model based on work and depth. An important aspect is that the model is defined directly in terms of language constructs rather than trying to appeal to any intuition of a machine. As discussed, the model is a virtual one for which we give mappings onto running times for various physical machine models.
The use of data-parallel constructs for expressing parallelism, and the ability to nest such constructs. We certainly do not mean to exclude any other parallel constructs, but having some way of mapping a function over a set of values in parallel seems critical for expressing many parallel algorithms.

This article is suggesting a change in the underlying models we use for analyzing parallel algorithms. In particular it suggests that we move away from using theoretical performance models based on machines to models based on languages. As mentioned in the article, some reference works already informally analyze parallel algorithms in terms of work and depth before mapping them onto a PRAM [17, 16]. We suggest that the extra step is taken of formalizing a model based on work and depth. With this formal model the PRAM can be cut out of the loop, directly mapping the model onto more realistic machines. We furthermore argue that language-based models seem to be the most reasonable way to define a programming model based on work and depth.

A full implementation of NESL is currently available on the Web. The compiler is based on a technique called flattening nested parallelism [4], and compiles to an intermediate language called VCODE. Benchmark results for this implementation for the Connection Machines CM-2 and CM-5 and the Cray C90 are described in [6]. These results show that NESL's performance is competitive with that of machine-specific codes for those benchmarks.

Next: Acknowledgments Up: Programming Parallel Algorithms Previous: Three Other Algorithms

Guy Blelloch, blelloch@cs.cmu.edu