Abstract:
Parallelization has become a popular mechanism to speed up data classification tasks that deal with large amounts of data. This paper describes a high-level, fine-grained parallel formulation of a decision tree-based classifier for memory-resident datasets on SMPs. We exploit two levels of divide-and-conquer parallelism in the tree builder: at the outer level across the tree nodes, and at the inner level within each tree node. Lightweight Pthreads are used to express this highly irregular and dynamic parallelism in a natural manner. The task of scheduling the threads and balancing the load is left to a space-efficient Pthreads scheduler. Experimental results on large datasets indicate that the space and time performance of the tree builder scales well with both the data size and the number of processors.
@techreport{NarlikarTR98,
  author      = "Girija J. Narlikar",
  title       = "A Parallel, Multithreaded Decision Tree Builder",
  institution = "Computer Science Department, Carnegie Mellon University",
  number      = "CMU-CS-98-184",
  year        = 1998,
  month       = "Dec"
}