From Programming Parallel Algorithms.
Communications of the ACM, 39(3), March, 1996.
Next: Acknowledgments
Up: Programming Parallel Algorithms
Previous: Three Other Algorithms
The NESL language was designed to be useful for programming and
teaching parallel algorithms. For these purposes it was important
that it allowed for simple descriptions of algorithms that closely
match our high level intuition, and also supplied a well-defined model
for analyzing performance. We believe the language has successfully
achieved these goals. There are many aspects of NESL, and the
purpose of this article was to extract the two features that are most
important for programming parallel algorithms. They are:
-
A performance model based on work and depth. An important aspect is
that the model is defined directly in terms of language constructs
rather than trying to appeal to any intuition of a machine. As
discussed, the model is a virtual one for which we give mappings
onto running times for various physical machine models.
-
The use of data-parallel constructs for expressing parallelism, and
the ability to nest such constructs. We certainly do not mean to
exclude any other parallel constructs, but having some way of mapping a
function over a set of values in parallel seems critical for
expressing many parallel algorithms.
This article is suggesting a change in the underlying models we use for
analyzing parallel algorithms. In particular it suggests that we move
away from using theoretical performance models based on machines to
models based on languages. As mentioned in the article, some reference
works already informally analyze parallel algorithms in terms of work
and depth before mapping them onto a PRAM [17, 16]. We
suggest that the extra step is taken of formalizing a model based on
work and depth. With this formal model the PRAM can be cut out of the
loop, directly mapping the model onto more realistic machines. We
furthermore argue that language-based models seem to be the most
reasonable way to define a programming model based on work and depth.
A full implementation of NESL is currently available on the Web.
The compiler is based on a technique called flattening nested
parallelism [4], and compiles to an intermediate
language called VCODE. Benchmark results for this implementation for
the Connection Machines CM-2 and CM-5 and the Cray C90 are described
in [6]. These results show that NESL's performance is
competitive with that of machine-specific codes for those benchmarks.
Next: Acknowledgments
Up: Programming Parallel Algorithms
Previous: Three Other Algorithms
Guy Blelloch, blelloch@cs.cmu.edu