This article uses NESL [5] as an example of a nested data-parallel language. This section gives an overview of the language and Section 3 gives several examples of parallel algorithms described and analyzed with NESL. NESL was designed to express nested parallelism in a simple way with a minimum set of structures and was therefore designed as a language on its own rather than as an extension of an existing sequential language. The ideas, however, can clearly be used in other languages. NESL is loosely based on ML [19], a language with a powerful type system, and on SETL [22], a language designed for concisely expressing sequential algorithms. As with ML, NESL is mostly functional (has only limited forms of side effects), but this feature is tangential to the points made in this article.
NESL supports data-parallelism by means of operations on sequences--one dimensional arrays. All elements of a sequence must be of the same type, and sequence indices are zero based (a[0] extracts the first element of the sequence a). The main data-parallel construct is apply-to-each, which uses a set-like notation. For example, the expression
{a * a : a in [3, -4, -9, 5]};
squares each elements of the sequence [3, -4, -9, 5]
returning the sequence [9, 16, 81, 25]. This can be read:
``in parallel, for each a in the sequence
adds the two sequences elementwise returning [4, -2, -6, 9].
The apply-to-each construct also provides the ability to subselect
elements of a sequence based on a filter. For example
can be read: ``in parallel, for each a in the
sequence
Any function, whether primitive or user defined, may be
applied to each element of a sequence. So, for example, we could define
and then apply it over the elements of a sequence as in
In addition to the parallelism supplied by apply-to-each, NESL
provides a set of functions on sequences, each of which can be
implemented in parallel. For example the function sum adds the
elements of a sequence, and the function reverse reverses the
elements of a sequence. Perhaps the most important function on
sequences is write, which supplies the only mechanism to modify
multiple values of a sequence in parallel. write takes two
arguments: the first is the sequence to modify and the second
is a sequence of integer-value pairs that specify what to modify.
For each pair (i,v) the value v is inserted into position
i of the destination sequence. For example
inserts the -2, 5 and 9 into the
sequence at locations 4, 2 and 5, respectively, returning
Nested parallelism is supplied in NESL by allowing sequences to be
nested and allowing parallel functions to be used in an apply-to-each.
For example, we could apply the sum function in parallel over a
nested sequence, as in
{a + b : a in [3, -4, -9, 5]; b in [1, 2, 3, 4]};
{a * a : a in [3, -4, -9, 5] | a > 0};
function factorial(n) =
if (n == 1) then 1
else n*factorial(n-1);
{factorial(i) : i in [3, 1, 7]};
which returns the sequence [6, 1, 5040].
write([0, 0, 0, 0, 0, 0, 0, 0],
[(4,-2),(2,5),(5,9)]);
[0, 0, 5, 0, -2, 9, 0, 0] .
If an index is repeated, then one
value is written non-deterministically. For readers familiar with the
variants of the PRAM model, we note that the write function is
analogous to an ``arbitrary'' concurrent write. NESL also includes
a function e_write that does not allow repeated indices, and is
analogous to an exclusive write. If repeated indices are used with
e_write, the current implementation reports an error.
{sum(a) : a in [[2,3], [8,3,9], [7]]};
which would return [5, 20, 7]. Here there is
parallelism both within each sum and across the sums. The Quicksort
algorithm showed another example of nested calls--the algorithm is
itself used in an apply-to-each to invoke two recursive calls in
parallel.
Next: The Performance Model
Up: Nested Data-Parallelism and NESL
Previous: Nested Data-Parallelism and NESL
Guy Blelloch, blelloch@cs.cmu.edu