Previously in Proc. Symposium on Parallel Algorithms and Architectures, pages 1-12. July 1995.
Abstract: Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to dynamically schedule tasks to processors so as to not only minimize the execution time, but also to minimize the amount of memory needed. Without careful scheduling, the parallel execution on P processors can use a factor of P or larger more memory than a sequential implementation of the same program.
This paper first identifies a class of parallel schedules that are provably efficient in both time and space, even for programs whose task structure is revealed only during execution. For programs with sufficient parallelism, the schedule guarantees that the amount of memory used by the program is within a factor of $1 + o(1)$ of a sequential implementation. This space bound is obtained by proving a graph-theoretic result relating parallel and sequential traversals of directed acyclic graphs.
The paper then describes an efficient dynamic scheduling algorithm that generates schedules in this class, for languages with nested fine-grained parallelism. The algorithm is relatively simple, performing the necessary processor allocation and task synchronization while incurring at most a constant factor overhead in time and space. The correctness and performance guarantees of the algorithm rely on properties of depth-first-like traversals of series-parallel graphs. The algorithm is the first efficient solution to the scheduling problem discussed here, even if space considerations are ignored.
@article{BGM99, author = "Guy Blelloch and Phil Gibbons and Yossi Matias", title = "Provably Efficient Scheduling for Languages with Fine-Grained Parallelism", journal = "Journal of the Association for Computing Machinery", number = 2, volume = 46, pages = "281--321", year = 1999}