next up previous
Next: 5 Tree contraction Up: Communication-Efficient Parallel Algorithms Previous: 3 Conservative algorithms

4 List contraction

 

In this section we present a conservative ``recursive pairing'' algorithm, Algorithm LC , that can perform many of the same functions on lists as recursive doubling. The idea is to contract an input list by repeatedly pairing and merging adjacent elements of the list until only a single element remains. The merges are recorded as internal nodes of a binary contraction tree whose leaves are the elements in the input list. After building the contraction tree, operations such as broadcasting from the root or parallel prefix can be performed in a conservative fashion. Algorithm LC is a randomized algorithm, and with high probability, the height of the contraction tree and the number of steps on a DRAM are both tex2html_wrap_inline1233 , where n is the number of elements in the input list. A deterministic variant based on deterministic coin tossing [5] runs in tex2html_wrap_inline1237 steps, where m is the number of processors in the DRAM, and produces a contraction tree of height tex2html_wrap_inline1241 .

The recursive pairing strategy is illustrated in Figure 3 for a list tex2html_wrap_inline1243 . In the first step, elements B and C pair and merge, as do elements D and E. The merges are shown as contours in the figure. A new contracted list tex2html_wrap_inline1253 is formed from the unpaired element A and the two compound elements BC and DE. After the second step of the algorithm, the contracted list consists of the elements ABC and DE. The third and final step reduces the list to the single element ABCDE.

In Algorithm LC , the contours of Figure 3 are represented in a data structure called a contraction tree. The leaves of the contraction tree are the list elements, and the internal nodes are the contours. To maintain the contraction-tree data structure, the algorithm requires constant extra space for each element in the input list. Each processor contains two elements: an element in the input list, and a spare element that will act as an internal node in the contraction tree. We call the two elements in the same processor mates. Each element holds a pointer to an unused internal node, which for each list element initially points to its mate. The use of spare nodes allows the algorithm to distribute the space for the internal nodes of the contraction tree uniformly over the elements in the list. (Spare internal nodes are used in [1] and [17] for similar reasons, but in a different context.)

We now describe the operation of Algorithm LC , which is illustrated in Figure 4 for the example of Figure 3. (A description in pseudocode can be found in [19].) In the first step, each element of the input list randomly picks either its left or right neighbor. An element at the left or right end of the list always picks its only neighbor. If two elements pick each other, then they merge. The merge is recorded by making the spare of the left element of the pair be the root of a new contraction tree. The spare of the right element becomes the spare for the root, and the elements themselves become the children of the root. The roots of the new contraction trees and the unpaired list elements now form themselves into a new list representing the contracted list, upon which the algorithm operates recursively.

At each step of the algorithm, any given element of the contracted list is a set of consecutive elements in the input list--a contour in Figure 3. The set is represented by a contraction-tree data structure whose leaves are the elements of the set and whose internal nodes record the merges. When the entire input list has been contracted to a single node, the algorithm terminates and a single contraction tree records all of the merges.

   figure269
Figure 3: The recursive pairing strategy operating on a list tex2html_wrap_inline1267 . Merged nodes are shown as contours, and the nesting of contours gives the structure of the contraction tree.

   figure276
Figure 4: The operation of Algorithm LC on the example of Figure 3. The input list is tex2html_wrap_inline1269 , and the corresponding spares are in lower case. When elements B and C pair and merge in the first step of the algorithm, the spare b becomes the root of a contraction tree with leaves B and C to represent the compound node BC. The spare for b is c. At the end of the first step, the list consisting of the elements A, b, and d represents the contracted list tex2html_wrap_inline1293 . After two more contraction steps of Algorithm LC , the input list is contracted to a single element ABCDE, which is represented by a contraction tree whose root is c and whose leaves are the elements of the input list tex2html_wrap_inline1299 .

To describe the efficiency of randomized algorithms such as Algorithm LC , we shall sometimes say that an algorithm runs in tex2html_wrap_inline1301 steps ``with high probability,'' by which we shall mean that for any constant k>0, there are constants tex2html_wrap_inline1305 and tex2html_wrap_inline1307 such that with probability tex2html_wrap_inline1309 , the algorithm terminates in at most tex2html_wrap_inline1311 steps.

theorem297

Proof: We show that the algorithm terminates after tex2html_wrap_inline1317 iterations with probability at least tex2html_wrap_inline1319 . We use an accounting scheme involving ``tokens'' to analyze the algorithm. Initially, a unique token resides between each pair of elements in the input list. Whenever two list elements pick each other, we destroy the token between them. For each token destroyed, the length of the list decreases by one, and the algorithm terminates when no token remains. In any iteration, an existing token has probability at least tex2html_wrap_inline1321 of being destroyed. Thus, after m iterations, a token has probability at most tex2html_wrap_inline1325 of remaining in existence. Let tex2html_wrap_inline1327 be the event that token i exists after m iterations, and let T be the event that any token remains after m iterations. Then the probability that any token remains after m iterations is given by

eqnarray300

For tex2html_wrap_inline1339 iterations, we have

eqnarray310


to .667emto .667em

theorem318

Proof: The height of the contraction tree is no greater than the number of iterations of Algorithm LC .


to .667emto .667em

We now prove that Algorithm LC is conservative.

theorem320

Proof: By convention, let the mate of an element in the input list lie in the order between that element and its right neighbor. The key idea is that the order of the list elements and their spares is preserved by the merging operation, and consequently, after each contraction step, the pointers in the contracted list correspond to disjoint paths in the original list, and the pointers between elements and their spares also correspond to disjoint paths. By the Shortcut Lemma these two sets of pointers are each conservative with respect to the input list, and since each set of memory accesses in a contraction step of the algorithm is a subset of one of these two sets, the algorithm is conservative.


to .667emto .667em

Although a contraction tree itself is not conservative with respect to an input list, it can be used as a data structure in conservative algorithms. For example, contraction trees can be used to efficiently broadcast a value to all of the elements of a list and to accumulate values stored in each element of a list.

More generally, contraction trees are useful for performing prefix computations in a conservative fashion. Let tex2html_wrap_inline1345 be a domain with a binary associative operation tex2html_wrap_inline1347 and an identity tex2html_wrap_inline1349 . A prefix computation [3, 7, 23] on a list with elements tex2html_wrap_inline1351 in tex2html_wrap_inline1353 puts the value tex2html_wrap_inline1355 in element i for each tex2html_wrap_inline1359 .

A prefix computation on a list can be performed by a conservative, two-phase algorithm on the contraction tree. The leaves of the contraction tree from left to right are the elements in the list from tex2html_wrap_inline1361 to tex2html_wrap_inline1363 . The first phase proceeds bottom up on the tree. Each leaf passes its x value to its parent. When an internal node receives a value tex2html_wrap_inline1367 from its left child and a value tex2html_wrap_inline1369 from its right child, the node saves the value tex2html_wrap_inline1371 and passes tex2html_wrap_inline1373 to its parent. When the root receives values from its children, the second top-down phase begins. The root passes tex2html_wrap_inline1375 to its left child and its tex2html_wrap_inline1377 value to its right child. When an internal node receives a value tex2html_wrap_inline1379 from its parent, it passes tex2html_wrap_inline1381 to its left child, and passes tex2html_wrap_inline1383 to its right child. When a leaf receives tex2html_wrap_inline1385 it computes tex2html_wrap_inline1387 .

The number of steps required by the prefix computation is proportional to the height of the tree, which with high probability is tex2html_wrap_inline1389 . At each step, the algorithm communicates across a set of pointers in the contraction tree, all of which are the same distance from the leaves in the first phase, and the same distance from the root in the second. That this computation is performed in a conservative fashion is a consequence of the following lemma.

theorem324

Proof: An inorder traversal of CT alternately visits list elements (leaves) and their mates (internal nodes) in the same order that the list elements and mates appear in L. Thus, if no pointer in P is an ancestor of another pointer in P, the pointers in P correspond to disjoint paths in L. By the Shortcut Lemma, any set of pointers that correspond to disjoint paths in the list L are conservative with respect to L.


to .667emto .667em

Algorithm LC , which constructs a contraction tree in tex2html_wrap_inline1423 steps, is a randomized algorithm. By using the ``deterministic coin tossing'' technique of Cole and Vishkin [5], the algorithm can be performed nearly as well deterministically. Specifically, the randomized pairing step can be performed deterministically in tex2html_wrap_inline1425 steps on a DRAM with m processors, where tex2html_wrap_inline1429 is the number of times the logarithm function must be successively applied to reduce m to a value at most 1. The overall running time for list contraction is thus tex2html_wrap_inline1435 .

As a final comment, we observe that with minor modifications, Algorithm LC can be used to contract circular lists with the same complexity bounds as for linear lists.



next up previous
Next: 5 Tree contraction Up: Communication-Efficient Parallel Algorithms Previous: 3 Conservative algorithms



Bruce Maggs
Thu Jul 25 19:12:36 EDT 1996