Oracle Scheduling: Controlling Granularity in Implicitly Parallel Languages

Abstract

A classic problem in parallel computing is determining whether to execute a task in parallel or sequentially. If small tasks are executed in parallel, the overheads due to task creation can be overwhelming. If large tasks are executed sequentially, processors may spin idle, resulting again in suboptimal speedups. Although this "granularity problem" is widely recognized as important, it is not well understood, and broadly applicable solutions remain elusive.
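To make the trade-off concrete, the sketch below shows the granularity problem in OCaml 5 using the standard Domain module. This is an illustration only, not the paper's Parallel ML system, and the cutoff constant is a hypothetical, hand-tuned value of the kind the paper aims to derive automatically.

    (* Plain sequential Fibonacci, used below the cutoff. *)
    let rec fib_seq n =
      if n < 2 then n else fib_seq (n - 1) + fib_seq (n - 2)

    (* Naive version: forks an OS-level domain at every recursive call,
       so task-creation overhead dominates (and can exhaust system
       resources) for all but tiny inputs. *)
    let rec fib_naive n =
      if n < 2 then n
      else
        let left = Domain.spawn (fun () -> fib_naive (n - 1)) in
        let right = fib_naive (n - 2) in
        Domain.join left + right

    (* Hand-tuned version: below a hardware-specific cutoff, fall back
       to sequential code so the spawn cost is amortized over enough
       work.  The constant 20 is a hypothetical, machine-dependent
       value that must be re-tuned per platform. *)
    let cutoff = 20

    let rec fib_tuned n =
      if n <= cutoff then fib_seq n
      else
        let left = Domain.spawn (fun () -> fib_tuned (n - 1)) in
        let right = fib_tuned (n - 2) in
        Domain.join left + right

The catch is that the cutoff must be re-tuned for every machine, which is the manual burden the oracle-based approach below is meant to remove.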

We propose techniques for controlling granularity in implicitly parallel programming languages to achieve parallel efficiency. To understand the importance of granularity control, we extend Brent's theorem (a.k.a. the work-time principle) to include task-creation overheads. Using a cost semantics for a general-purpose language in the style of the lambda calculus with parallel tuples, we then show that task-creation overheads can slow down parallel execution by a multiplicative factor. We propose oracle scheduling to reduce these overheads by using estimates of the sizes of parallel tasks. We show that if the oracle provides, in constant time, estimates that are accurate to within a constant multiplicative factor, then oracle scheduling provably reduces the task-creation overheads for a class of parallel computations.
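For intuition, the classic bound and one way to charge for task creation can be written as follows. This is a hedged reconstruction in our own notation (W, D, N, tau), not necessarily the paper's exact statement.

    % Brent's theorem: on P processors, a greedy scheduler runs a
    % computation with total work W and span (critical-path length) D in
    \[ T_P \;\le\; \frac{W}{P} + D. \]
    % Charging a fixed cost \tau for each of the N task creations adds
    % \tau N to the total work, giving
    \[ T_P \;\le\; \frac{W + \tau N}{P} + D. \]
    % In fine-grained programs the number of tasks is proportional to
    % the work, N = \Theta(W), so the first term is inflated by a factor
    % of 1 + \Theta(\tau): a multiplicative slowdown.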

We realize oracle scheduling by combining static and dynamic techniques. We require the programmer to provide the asymptotic complexity of each parallel task and use run-time profiling to determine hardware-specific constant factors. We implement the proposed approach as an extension of the Manticore compiler for Parallel ML. Our empirical evaluation shows that we can reduce the run-time overheads due to task creation to between 3 and 13 percent of the sequential time and can obtain scalable speedups when running on multiple processors.
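A minimal OCaml sketch of how such an oracle could be wired together, assuming the programmer supplies a cost function and the runtime maintains a profiled constant factor. All names here (kappa, spawn_cutoff, par_pair) are hypothetical; the actual implementation lives inside the Manticore compiler and runtime. Requires OCaml 5 and linking with the unix library.

    (* Hardware-specific constants: spawn_cutoff is the predicted task
       duration (in seconds) below which spawning is not worthwhile;
       kappa is the profiled wall-clock cost of one abstract work unit.
       Both names and initial values are hypothetical. *)
    let spawn_cutoff = 1e-4
    let kappa = ref 1e-8

    (* Fold a measured sequential run into the estimate of kappa using
       an exponential moving average. *)
    let profile ~cost ~elapsed =
      if cost > 0. then kappa := 0.9 *. !kappa +. 0.1 *. (elapsed /. cost)

    (* Run [f ()] sequentially and feed its measured duration back into
       the estimator.  [cost] is the programmer-supplied asymptotic
       complexity evaluated at the input size, e.g. [float n] for a
       linear-time task. *)
    let run_and_profile ~cost f =
      let t0 = Unix.gettimeofday () in
      let r = f () in
      profile ~cost ~elapsed:(Unix.gettimeofday () -. t0);
      r

    (* Oracle-guided parallel tuple: spawn the left component only when
       its predicted duration exceeds the cutoff; otherwise run both
       components sequentially and keep profiling. *)
    let par_pair ~cost_a ~cost_b fa fb =
      if cost_a *. !kappa < spawn_cutoff then
        (run_and_profile ~cost:cost_a fa, run_and_profile ~cost:cost_b fb)
      else
        let a = Domain.spawn fa in
        let b = fb () in
        (Domain.join a, b)

For example, splitting an n-element linear-time scan in half would pass cost_a = cost_b = float (n / 2), letting the oracle sequentialize the recursion once subproblems become too small to amortize the spawn cost.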

Paper

Umut A. Acar, Arthur Charguéraud, and Mike Rainey
OOPSLA, October 2011