Next: 5 Tree contraction Up: Communication-Efficient Parallel Algorithms Previous: 3 Conservative algorithms

4 List contraction

In this section we present a conservative ``recursive pairing'' algorithm, Algorithm LC , that can perform many of the same functions on lists as recursive doubling. The idea is to contract an input list by repeatedly pairing and merging adjacent elements of the list until only a single element remains. The merges are recorded as internal nodes of a binary contraction tree whose leaves are the elements in the input list. After building the contraction tree, operations such as broadcasting from the root or parallel prefix can be performed in a conservative fashion. Algorithm LC is a randomized algorithm, and with high probability, the height of the contraction tree and the number of steps on a DRAM are both , where n is the number of elements in the input list. A deterministic variant based on deterministic coin tossing [5] runs in steps, where m is the number of processors in the DRAM, and produces a contraction tree of height .

The recursive pairing strategy is illustrated in Figure 3 for a list . In the first step, elements B and C pair and merge, as do elements D and E. The merges are shown as contours in the figure. A new contracted list is formed from the unpaired element A and the two compound elements BC and DE. After the second step of the algorithm, the contracted list consists of the elements ABC and DE. The third and final step reduces the list to the single element ABCDE.

In Algorithm LC , the contours of Figure 3 are represented in a data structure called a contraction tree. The leaves of the contraction tree are the list elements, and the internal nodes are the contours. To maintain the contraction-tree data structure, the algorithm requires constant extra space for each element in the input list. Each processor contains two elements: an element in the input list, and a spare element that will act as an internal node in the contraction tree. We call the two elements in the same processor mates. Each element holds a pointer to an unused internal node, which for each list element initially points to its mate. The use of spare nodes allows the algorithm to distribute the space for the internal nodes of the contraction tree uniformly over the elements in the list. (Spare internal nodes are used in [1] and [17] for similar reasons, but in a different context.)

We now describe the operation of Algorithm LC , which is illustrated in Figure 4 for the example of Figure 3. (A description in pseudocode can be found in [19].) In the first step, each element of the input list randomly picks either its left or right neighbor. An element at the left or right end of the list always picks its only neighbor. If two elements pick each other, then they merge. The merge is recorded by making the spare of the left element of the pair be the root of a new contraction tree. The spare of the right element becomes the spare for the root, and the elements themselves become the children of the root. The roots of the new contraction trees and the unpaired list elements now form themselves into a new list representing the contracted list, upon which the algorithm operates recursively.

At each step of the algorithm, any given element of the contracted list is a set of consecutive elements in the input list--a contour in Figure 3. The set is represented by a contraction-tree data structure whose leaves are the elements of the set and whose internal nodes record the merges. When the entire input list has been contracted to a single node, the algorithm terminates and a single contraction tree records all of the merges.

Figure 3: The recursive pairing strategy operating on a list . Merged nodes are shown as contours, and the nesting of contours gives the structure of the contraction tree.

Figure 4: The operation of Algorithm LC on the example of Figure 3. The input list is , and the corresponding spares are in lower case. When elements B and C pair and merge in the first step of the algorithm, the spare b becomes the root of a contraction tree with leaves B and C to represent the compound node BC. The spare for b is c. At the end of the first step, the list consisting of the elements A, b, and d represents the contracted list . After two more contraction steps of Algorithm LC , the input list is contracted to a single element ABCDE, which is represented by a contraction tree whose root is c and whose leaves are the elements of the input list .

To describe the efficiency of randomized algorithms such as Algorithm LC , we shall sometimes say that an algorithm runs in steps ``with high probability,'' by which we shall mean that for any constant k>0, there are constants and such that with probability , the algorithm terminates in at most steps.

Proof: We show that the algorithm terminates after iterations with probability at least . We use an accounting scheme involving ``tokens'' to analyze the algorithm. Initially, a unique token resides between each pair of elements in the input list. Whenever two list elements pick each other, we destroy the token between them. For each token destroyed, the length of the list decreases by one, and the algorithm terminates when no token remains. In any iteration, an existing token has probability at least of being destroyed. Thus, after m iterations, a token has probability at most of remaining in existence. Let be the event that token i exists after m iterations, and let T be the event that any token remains after m iterations. Then the probability that any token remains after m iterations is given by

eqnarray300

For iterations, we have

eqnarray310

to .667emto .667em

Proof: The height of the contraction tree is no greater than the number of iterations of Algorithm LC .

to .667emto .667em

We now prove that Algorithm LC is conservative.

Proof: By convention, let the mate of an element in the input list lie in the order between that element and its right neighbor. The key idea is that the order of the list elements and their spares is preserved by the merging operation, and consequently, after each contraction step, the pointers in the contracted list correspond to disjoint paths in the original list, and the pointers between elements and their spares also correspond to disjoint paths. By the Shortcut Lemma these two sets of pointers are each conservative with respect to the input list, and since each set of memory accesses in a contraction step of the algorithm is a subset of one of these two sets, the algorithm is conservative.

to .667emto .667em

Although a contraction tree itself is not conservative with respect to an input list, it can be used as a data structure in conservative algorithms. For example, contraction trees can be used to efficiently broadcast a value to all of the elements of a list and to accumulate values stored in each element of a list.

More generally, contraction trees are useful for performing prefix computations in a conservative fashion. Let be a domain with a binary associative operation and an identity . A prefix computation [3, 7, 23] on a list with elements in puts the value in element i for each .

A prefix computation on a list can be performed by a conservative, two-phase algorithm on the contraction tree. The leaves of the contraction tree from left to right are the elements in the list from to . The first phase proceeds bottom up on the tree. Each leaf passes its x value to its parent. When an internal node receives a value from its left child and a value from its right child, the node saves the value and passes to its parent. When the root receives values from its children, the second top-down phase begins. The root passes to its left child and its value to its right child. When an internal node receives a value from its parent, it passes to its left child, and passes to its right child. When a leaf receives it computes .

The number of steps required by the prefix computation is proportional to the height of the tree, which with high probability is . At each step, the algorithm communicates across a set of pointers in the contraction tree, all of which are the same distance from the leaves in the first phase, and the same distance from the root in the second. That this computation is performed in a conservative fashion is a consequence of the following lemma.

theorem324

Proof: An inorder traversal of CT alternately visits list elements (leaves) and their mates (internal nodes) in the same order that the list elements and mates appear in L. Thus, if no pointer in P is an ancestor of another pointer in P, the pointers in P correspond to disjoint paths in L. By the Shortcut Lemma, any set of pointers that correspond to disjoint paths in the list L are conservative with respect to L.

to .667emto .667em

Algorithm LC , which constructs a contraction tree in steps, is a randomized algorithm. By using the ``deterministic coin tossing'' technique of Cole and Vishkin [5], the algorithm can be performed nearly as well deterministically. Specifically, the randomized pairing step can be performed deterministically in steps on a DRAM with m processors, where is the number of times the logarithm function must be successively applied to reduce m to a value at most 1. The overall running time for list contraction is thus .

As a final comment, we observe that with minor modifications, Algorithm LC can be used to contract circular lists with the same complexity bounds as for linear lists.

Next: 5 Tree contraction Up: Communication-Efficient Parallel Algorithms Previous: 3 Conservative algorithms

Bruce Maggs
Thu Jul 25 19:12:36 EDT 1996