Fast Set Operations Using Treaps

Balanced trees provide a wide variety of low-cost operations on ordered sets and dynamic dictionaries. Of the many types of balanced trees that have been developed over the years, randomized search trees (treaps)[4] have the advantage of being both simple and general--in addition to insertion and deletion they easily support efficient finger searching, joining, and splitting. Furthermore, by using appropriate hash functions they require no balance information to be stored at the nodes. Treaps, however, have only been studied in the context of sequential algorithms, and there has been little study of their performance.

Our interest is in developing fast parallel algorithms for aggregate set operations intersection, union, and difference. These operations play an important role in databases queries and index searching--each term (word) can be represented as a set of the ``documents'' it appears in and searches on logical conjunctions of terms are implemented as set operations (intersection for and, union for or , and difference for and-not)[5,3]. Searching, inserting, and deleting multiple elements from a dictionary in parallel have similar algorithms.

Treaps are randomized search trees in which each node in the tree has a key and an associated random priority. The nodes are organized so that the keys appear in in-order and the priorities appear in heap-order. Our set operation algorithms based on treaps run in optimal O(m log (n/m)) work and in O(log n) depth (parallel time) on an EREW PRAM with unit-time scan operations (used for load balancing)--all expected case. This bound is based on automated pipelining)[1]. Without pipelining the algorithms run in O(log² n) depth. Previous work has focused either on parallel merging algorithms that take O(n+m) work and are optimal when the two sets have nearly equal sizes or on multi-insertion algorithms that take O(m log n) work and are optimal when the input values to be inserted are not presorted.

To analyze the effectiveness of the algorithms in practice we ran several experiments. We were interested in various properties including how well treaps compare sequentially with other balanced trees such as red-black trees, splay trees, and skip lists, how well our algorithms perform for various overlaps in the keys, and how well the algorithms parallelize. The parallel implementation and memory management was coded in Cilk, a parallel extension of C, and run on a Sun Ultra Enterprise and a SGI Power Challenge. The results of this work is summarized in [2].

Relevant Papers

1

G. Blelloch and M. Reid-Miller Pipelining with Futures. In Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 249-259, Newport, RI, June 1997

2

G. Blelloch and M. Reid-Miller Fast Set Operations Using Treaps. In Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 16-26, Puerto Vallarta, Mexico, June 1998.

3

M.L. Mauldin. Lycos: Design choices in an Internet search service IEEE Expert, 12(1):8-11, 1997.

4

R. Seidel and C.R. Aragon. Randomized Search Trees. Algorithmica, 16:464-497, 1996.

5

I.H. Witten, A. Moffat, and T.C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold.

Up to the Irregular Algorithms page or the Scandal page.

mrmiller@cs.cmu.edu.

Last modified: Tue Jul 14 15:23:01 EDT 1998