Heartbeat Scheduling: Provable Efficiency for Nested Parallelism

Abstract

A classic problem in parallel computing is to take a high-level parallel program written, for example, in nested-parallel style with fork-join constructs and run it efficiently on a real machine. The problem could be considered solved in theory, but not in practice, because the overheads of creating and managing parallel threads can overwhelm their benefits. Developing efficient parallel codes, therefore, usually requires extensive tuning and optimizations, whose sole purpose is to reduce parallelism just to a point where the overheads become acceptable.

In this paper, we present a scheduling technique that delivers provably efficient results for arbitrary nested-parallel programs, without the tuning needed for controlling parallelism overheads. The basic idea behind our technique is to create threads only at a beat (what we refer to as the "heartbeat") and make sure to do useful work in between. We specify our heartbeat scheduler using an abstract machine semantics and provide mechanized proofs that the scheduler guarantees low overheads for all nested-parallel programs. We present a prototype C++ implementation and an evaluation that shows that Heartbeat competes well with manually optimized Cilk Plus codes, without requiring manual tuning.
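
The heartbeat idea can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the prototype described above: the beat is assumed to be counted in units of work rather than elapsed time, a plain vector stands in for a scheduler's task pool, and all names (`Range`, `Stats`, `heartbeat_sum`, `period`) are hypothetical. The loop does useful work on every iteration and creates a new task only when at least `period` units of work have been done since the last task was created.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout: a sequential simulation of the idea,
// not the paper's implementation.
struct Range { std::size_t lo, hi; };

struct Stats {
    std::uint64_t sum = 0;       // result of the computation
    std::size_t work = 0;        // units of useful work performed
    std::size_t promotions = 0;  // tasks created at heartbeats
};

// Sums `data`, creating a task at most once per `period` units of work.
// Assumes period >= 1.
Stats heartbeat_sum(const std::vector<std::uint64_t>& data, std::size_t period) {
    Stats s;
    std::vector<Range> pending;   // stands in for a scheduler's task pool
    pending.push_back({0, data.size()});
    std::size_t since_beat = 0;   // work done since the last task creation

    while (!pending.empty()) {
        Range r = pending.back();
        pending.pop_back();
        while (r.lo < r.hi) {
            // Useful work between beats.
            s.sum += data[r.lo++];
            ++s.work;
            ++since_beat;
            // At a beat, split off half of the remaining range as a new
            // task, provided there is enough left to split.
            if (since_beat >= period && r.hi - r.lo >= 2) {
                std::size_t mid = r.lo + (r.hi - r.lo) / 2;
                pending.push_back({mid, r.hi});
                r.hi = mid;
                ++s.promotions;
                since_beat = 0;
            }
        }
    }
    return s;
}
```

Because every task creation is preceded by at least `period` units of work that are charged to no other creation, the simulation satisfies `promotions * period <= work`. In this simplified model, that is how the cost of task creation is amortized against useful work, whatever the shape of the input.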

Paper

Umut A. Acar, Arthur Charguéraud, Adrien Guatto, Mike Rainey, and Filip Sieczkowski
PLDI: Programming Language Design and Implementation, June 2018