Scalable Join Patterns
Aaron Turon, Northeastern University ([email protected])
Claudio V. Russo, Microsoft Research ([email protected])
Abstract

Coordination can destroy scalability in parallel programming. A comprehensive library of scalable synchronization primitives is therefore an essential tool for exploiting parallelism. Unfortunately, such primitives do not easily combine to yield solutions to more complex problems. We demonstrate that a concurrency library based on Fournet and Gonthier's join calculus can provide declarative and scalable coordination. By declarative, we mean that the programmer needs only to write down the constraints of a coordination problem, and the library will automatically derive a correct solution. By scalable, we mean that the derived solutions deliver robust performance both as the number of processors increases and as the complexity of the coordination problem grows. We validate our claims empirically on seven coordination problems, comparing our generic solution to specialized algorithms from the literature.

Categories and Subject Descriptors: D.3.3 [Programming Languages]: Language Constructs and Features—Concurrent programming structures

General Terms: Algorithms, Languages, Performance

Keywords: message passing, concurrency, parallelism

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
OOPSLA'11, October 22–27, 2011, Portland, Oregon, USA.
Copyright © 2011 ACM 978-1-4503-0940-0/11/10. . . $10.00

1. Introduction

Parallel programming is the art of keeping many processors busy with real work. But except for embarrassingly-parallel cases, solving a problem in parallel requires coordination between threads, which entails waiting. When coordination is unavoidable, it must be carried out in a way that minimizes both waiting time and interprocessor communication. Effective implementation strategies vary widely, depending on the coordination problem. Asking an application programmer to grapple with these concerns—without succumbing to concurrency bugs—is a tall order.

The proliferation of specialized synchronization primitives is therefore no surprise. For example, java.util.concurrent [13] contains a rich collection of carefully engineered classes, including various kinds of locks, barriers, semaphores, count-down latches, condition variables, exchangers and futures, together with nonblocking collections. Several of these classes led to research publications [14, 15, 26, 28]. A Java programmer faced with a coordination problem covered by the library is therefore in great shape.

Inevitably, though, programmers are faced with new problems not directly addressed by the primitives. The primitives must then be composed into a solution, and doing so correctly and scalably can be as difficult as designing a new primitive.

Take the classic Dining Philosophers problem [3], in which philosophers sitting around a table must coordinate their use of the chopstick sitting between each one; such competition over limited resources appears in many guises in real systems. The problem has been thoroughly studied, and there are solutions using primitives like semaphores that perform reasonably well. There are also many natural "solutions" that do not perform well—or do not perform at all. Naive solutions may suffer from deadlock, if for example each philosopher picks up the chopstick to their left and then finds the one to their right taken. Correct solutions may scale poorly with the number of philosophers (threads). For example, using a single global lock to coordinate philosophers is correct, but will force non-adjacent philosophers to take turns through the lock, adding unnecessary sequentialization. Avoiding these pitfalls takes experience and care.

In this paper, we demonstrate that Fournet and Gonthier's join calculus [4, 5] provides the basis for a declarative and scalable coordination library. By declarative, we mean that the programmer needs only to write down the constraints for a coordination problem, and the library will automatically derive a correct solution. By scalable, we mean that the derived solutions deliver robust, competitive performance both as the number of processors or cores increases, and as the complexity of the coordination problem grows.

Figure 1 shows a solution to Dining Philosophers using our library, which is a drop-in replacement for Russo's C# JOINS library [24]:

```csharp
var j = Join.Create();
Synchronous.Channel[] hungry;
Asynchronous.Channel[] chopstick;
j.Init(out hungry, n); j.Init(out chopstick, n);
for (int i = 0; i < n; i++) {
  var left = chopstick[i];
  var right = chopstick[(i+1) % n];
  j.When(hungry[i]).And(left).And(right).Do(() => {
    eat(); left(); right(); // replace chopsticks
  });
}
```

Figure 1. Dining Philosophers, declaratively

The library uses the message-passing paradigm of the join calculus. For Dining Philosophers, we use two arrays of channels (hungry and chopstick) to carry value-less messages; being empty, these messages represent unadorned events. The declarative aspect of this example is the join pattern starting with j.When. The declaration says that when events are available on the channels hungry[i], left, and right, they may be simultaneously and atomically consumed. When the pattern fires, the philosopher, having obtained exclusive access to two chopsticks, eats and then returns the chopsticks. In neither the join pattern nor its body is the order of the chopsticks important. The remaining details are explained in §2.

Most implementations of join patterns, including Russo's, use coarse-grained locks to achieve atomicity, resulting in poor scalability (as we show experimentally in §4). Our contribution is a new implementation of the join calculus that uses ideas from fine-grained concurrency to achieve scalability on par with custom-built synchronization primitives:

• We first recall how join patterns can be used to solve a wide range of coordination problems (§2), as is well-established in the literature [2, 4, 5]. The examples provide basic implementations of some of the java.util.concurrent classes mentioned above. In each case, the JOINS-based solution is as straightforward to write as the one for dining philosophers.

• To maximize scalability, we must allow concurrent, independent processing of messages, avoid centralized contention, and spin or block intelligently. This must be done, of course, while guaranteeing atomicity and progress for join patterns. We meet these challenges by (1) using lock-free [9] data structures to store messages and (2) treating messages as resources that threads race to claim (§3). We present our algorithm incrementally, both through diagrams and through working C# code (§3).

• We validate our scalability claims experimentally on seven different coordination problems (§4). For each coordination problem we evaluate a joins-based implementation running in both Russo's lock-based library and our new fine-grained library. We compare these results to the performance of direct, custom-built solutions. In all cases, our new library scales significantly better than Russo's. We scale competitively with—sometimes better than—the custom-built solutions, though we suffer from higher constant-time overheads in some cases.

• We state and sketch proofs for the key safety and liveness properties characterizing our algorithm (§5).

Our goal is not to replace libraries like java.util.concurrent, but rather to complement them, by exposing some of their insights in a way easily and safely usable by application programmers. We discuss this point further, along with related work in general, in the final section.

2. The JOINS library

Russo's JOINS [24] is a concurrency library derived from Fournet and Gonthier's join calculus [4, 5], a process algebra similar to Milner's π-calculus but conceived for efficient implementation. In this section, we give an overview of the library API and illustrate it using examples drawn from the join calculus literature [2, 4, 5].

The join calculus takes a message-passing approach to concurrency: messages are sent over channels, and channels are themselves first-class values that can be sent as messages. What makes the calculus interesting is the way messages are received. Programs do not actively request to receive messages from a channel. Instead, they employ join patterns (also called chords [2]) to declaratively specify reactions to message arrivals. The power of join patterns lies in their ability to respond atomically to messages arriving simultaneously on several different channels.

Suppose, for example, that we have two channels, Put and Get, used by producers and consumers of data. When a producer and a consumer message are available, we would like to receive both simultaneously, and transfer the produced value to the consumer. Using Russo's Joins API, we write:

```csharp
class Buffer ...
```

This simple example introduces many aspects of the API. First, we are using two different kinds of channels: Put is an asynchronous channel that carries messages of type T, while Get is a synchronous channel that returns T replies but takes no argument. A sender never blocks on an asynchronous channel, even if the message cannot immediately be received through a join pattern. For the Buffer class, that means that a single producer may send many Put messages, even if none of them are immediately consumed. Because Get is a synchronous channel, on the other hand, senders will block until a pattern involving it can fire.

```csharp
Asynchronous.Channel Fst;
Asynchronous.Channel Snd;
Synchronous ...
```

The pattern will consume messages Fst(a), Snd(b) and Both() atomically, when all three are available. Only the first two messages carry arguments, so the body of the pattern takes two arguments. Its return value, a pair, becomes the return value of the call to Both(). On the other hand, several patterns may mention the same channel, expressing choice:
Figure 6. Sending a message
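The matching semantics of join patterns can be made concrete with a small executable model. The sketch below is ours, not Russo's library: a single-threaded Python analogue in which each channel keeps a bag of pending messages and a pattern fires as soon as every channel it mentions is nonempty, consuming one message from each. Like the paper's simplified Figure 5 code, it ignores patterns that repeat a channel.

```python
# Our single-threaded model of join-pattern matching (illustrative
# names; not the C# JOINS API). A pattern fires when every channel it
# mentions has a pending message, consuming one from each atomically.

class Channel:
    def __init__(self, name):
        self.name = name
        self.bag = []          # pending messages

class Join:
    def __init__(self):
        self.patterns = []     # (channels, body) pairs

    def when(self, channels, body):
        self.patterns.append((channels, body))

    def send(self, chan, msg=None):
        chan.bag.append(msg)
        return self.try_fire()

    def try_fire(self):
        for channels, body in self.patterns:
            if all(c.bag for c in channels):
                # consume one message per channel, pass them to the body
                args = [c.bag.pop(0) for c in channels]
                return body(*args)
        return None            # no pattern matchable yet

# A buffer in the style of the Put/Get example: when a Get and a Put
# are both available, hand the produced value to the consumer.
j = Join()
put, get = Channel("Put"), Channel("Get")
j.when([get, put], lambda _event, value: value)
```

Sending on `put` with no consumer simply queues the message (the asynchronous sender never blocks in this model); a later send on `get` joins with it and returns the value.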
ensuring that only one thread will succeed in claiming a given message. If at any point we fail to claim a message, we roll back all of the messages claimed so far (lines 23–24). The implementation ensures that the Chans arrays for each chord are ordered consistently, so that in any race at least one thread entering the second phase will complete the phase successfully (§5).

Note that the code in Figure 5 is a simplified version of our implementation that does not handle patterns with repeated channels, and does not stack-allocate or recycle message arrays. These differences are discussed in §3.6.

3.4 Firing, blocking and rendezvous

Message resolution does not depend on the (a)synchrony of a channel, but the rest of the message-sending process does. The code for sending messages is shown in Figure 6, with separate entry points AsyncSend and SyncSend. The actions taken while sending depend, in part, on the result of message resolution:

```
Send         We CLAIMED       They CONSUMED          No match
Async (AC)   Spawn (10)       Return (5)             Return (5)
Async (SC)   Wake (13–20)     Return (5)             Return (5)
Sync         Fire (35)        Wait for result (28)   Block (28)
```

where AC and SC, respectively, stand for asynchronous chord (no synchronous channels) and synchronous chord (at least one synchronous channel).

First we follow the path of an asynchronous message, which begins by adding and resolving the message (lines 2–3). If either the message was CONSUMED by another thread (in which case that thread is responsible for firing the chord) or no pattern is matchable, we immediately exit (line 5). In both of these cases, we are assured that any chords involving the message will be (or have been) dealt with elsewhere, and since the message is itself asynchronous, we need not wait for this to occur.

On the other hand, if we resolved the message by claiming it and enough other messages to fire a chord, we proceed by consuming all involved messages (which does not require a memory fence). If the chord's pattern involved only asynchronous channels (line 10), we spawn a new thread to execute the chord body asynchronously. Otherwise we must communicate with a synchronous sender, telling it to execute the body.

A key difference from asynchronous senders is that synchronous sending should not return until a relevant chord has fired. Moreover, synchronous chord bodies should be executed by a synchronous caller, rather than by a newly-spawned thread. Since multiple synchronous callers can be combined in a single chord, exactly one of them should be chosen to execute the chord, and then share the result with (and wake up) all the others (we call this "rendezvous").

Each synchronous message has a Signal associated with it. Signals provide methods Block and Set, allowing synchronous senders to block (a sender spins a bit first; see §3.6) and be woken. Calls to Set atomically trigger the signal. If a thread has already called Block, it is awoken and the signal is reset. Otherwise, the next call to Block will immediately return, again resetting the signal. We ensure that Block and Set are each called by at most one thread; the implementation ensures that waking only occurs as a result of triggering the signal (no "spurious wakeups").

If an asynchronous sender consumed a synchronous sender's message (line 6), the synchronous sender will be blocked waiting to be informed of this event (line 28). The asynchronous sender chooses one such synchronous sender to wake (lines 15–18), telling it which chord and messages to fire (line 16).

Now we consider synchronous senders. Just as before, we first add and resolve the message (lines 24–25). If the message was resolved by claiming enough additional messages to fire a chord, all relevant messages are immediately consumed (line 32). Otherwise, the sender must block (line 28). There are two ways a blocked, synchronous sender can be woken: by an asynchronous sender or by another synchronous sender (rendezvous). In the former case, the (initially null) ShouldFire field will contain a match that the synchronous caller is responsible for firing. In the latter case, ShouldFire remains null, but the Result field will contain the result of a chord body as executed by another synchronous sender, which is immediately returned (line 30).

If a synchronous sender does execute a chord body (line 35), it must then wake up any other synchronous senders involved in the chord and inform them of the result (lines 36–42). For simplicity, we ignore the possibility that the chord body raises an exception, but proper handling is easy to add and is addressed by the benchmarked implementation.

3.5 Lazy queuing, counter channels, and message stealing

To achieve competitive scalability, we make three enhancements to the above implementation. The importance of these enhancements is shown empirically in §4.

First, we do not always need to allocate or enqueue a message when sending. For example, in the Lock class, when sending an Acquire message we could first check whether a corresponding Release message is available, and if so, immediately claim and consume it without ever touching the Acquire bag. This shortcut saves both on allocation and potentially on interprocessor communication. More generally, we provide an optimistic fast path that attempts to immediately match the remaining messages to fire a chord. We call this lazy queuing. Implementing lazy queuing is relatively straightforward, so we do not provide the code here.

The second enhancement applies to void, asynchronous channels (e.g. Lock.Release). Sophisticated lock-based implementations of join patterns typically optimize the representation of such channels to a simple counter, neatly avoiding the cost of allocation for messages used just as signals [2]. We have implemented a similar optimization, adapted to suit our protocol. The main challenge in employing the counter representation is that, in our fine-grained protocol, it must be possible to tentatively decrement the counter (the analog of claiming a message) in such a way that other threads do not incorrectly assume the message has actually been consumed. Our approach is to represent void, asynchronous message bags as a word-sized pair of half-words, separately counting claimed and pending messages. Implementations of, for example, Chan.AddPending and Msg.TryClaim are specialized to atomically update the shared-state word by CASing in a classic read-modify-try-update loop. For example, we claim a "message" as follows:

```csharp
bool TryClaim() {
  uint startState = chan.state; // shared state
  uint curState = startState;
  while (true) {
    startState = curState;
    ushort claimed;
    ushort pending = Decode(startState, out claimed);
    if (pending > 0) {
      var nextState = Encode(++claimed, --pending);
      curState = CAS(ref chan.state,
                     comparand: startState,
                     value: nextState);
      if (curState == startState)
        return true;
    } else return false;
  }
}
```

More importantly, Chan.FindPending no longer needs to traverse a data structure but can merely atomically read the bag's encoded state once, setting sawClaimed if the claimed count is non-zero. While the counter representation avoids allocation, it does lead to more contention over the same shared state (compared with a proper bag). It also introduces the possibility of overflow, which we ignored here. Nevertheless, we have found it to be beneficial in practice (§4), especially for non-singleton resources like Semaphore.Release messages.

The final enhancement is only relevant to synchronous channels. When an asynchronous sender matches a synchronous chord, it consumes all the relevant messages, and then wakes up one of the synchronous senders. If the synchronous caller is actually blocked—so that waking requires a context switch—significant time may elapse before the chord is actually fired. Since we do not provide a fairness guarantee, we can instead permit "stealing": we wake up one synchronous caller, but roll back the rest of the messages to PENDING status, putting them back up for grabs by currently-active threads—including the thread that just sent the asynchronous message. In low-traffic cases, messages are unlikely to be stolen; in high-traffic cases, stealing is likely to lead to better throughput. This strategy is similar to the one taken in Polyphonic C# [2], as well as the "barging" allowed by the java.util.concurrent synchronizer framework [15].

Some care must be taken to ensure that our key liveness property still holds: when an asynchronous message wakes a synchronous sender, it moves from a safely resolved state (CLAIMED as part of a chord) to an unresolved state (PENDING). There is no guarantee that the woken synchronous sender will be able to fire a chord involving the original asynchronous message (see [2] for an example). Yet AsyncSend simply returns to its caller. We must somehow ensure that the original asynchronous message is successfully resolved. Thus, when a synchronous sender is woken, we record the asynchronous message that woke it, transferring responsibility for the message.

The new code for an asynchronous sender notifying a synchronous waiter is shown in Figure 7; it replaces lines 12–20 of AsyncSend.

```csharp
 1 bool foundSleeper = false;
 2 for (int i = 0; i < m.Chord.Chans.Length; i++) {
 3   if (m.Chord.Chans[i].IsSync && !foundSleeper) {
 4     foundSleeper = true;
 5     m.Claims[i].AsyncWaker = myMsg;      // hand over msg
 6     m.Claims[i].Status = Stat.WOKEN;
 7     m.Claims[i].Signal.Set();            // assumed fence
 8   } else {
 9     m.Claims[i].Status = Stat.PENDING;   // relinquish
10   }
11 }
```

Figure 7. Waking a synchronous sender while allowing stealing; replaces lines 12–20 of AsyncSend

We introduce a new status, WOKEN. A synchronous message is marked WOKEN if an asynchronous sender is transferring responsibility, and CONSUMED if a synchronous sender is going to fire a chord involving it. In both cases, the signal is set after the status is changed; in the latter case, it is set after the chord is actually fired and the return value is available. Resolve is revised to return null at any point that the message is seen at status WOKEN or CONSUMED. The WOKEN status ensures that a blocked synchronous caller is woken only once, which is important both for our use of Signal, and to ensure that the synchronous sender will only be responsible for one waking asynchronous message. The message field AsyncWaker replaces ShouldFire, and is used to inform the woken thread which asynchronous message woke it up.

Figure 8 gives the revised code for sending a synchronous message in the presence of stealing; it is meant to replace lines 27–33 of the original implementation.

```csharp
 1 Msg waker = null;
 2 var backoff = new Backoff();
 3 while (true) {
 4   m = Resolve(myMsg);
 5
 6   if (m != null) break;        // claimed a chord; exit
 7   if (waker != null)
 8     RetryAsync(waker);         // retry last async waker
 9
10   myMsg.Signal.Block();
11
12   if (myMsg.Status == Stat.CONSUMED) {
13     return myMsg.Result;       // rendezvous
14   } else {                     // status is Stat.WOKEN
15     waker = myMsg.AsyncWaker;  // will retry later
16     myMsg.Status = Stat.PENDING;
17     backoff.Once();
18   }
19 }
20
21 ConsumeAll(claims);
22 // retry last async waker, *after* consuming myMsg
23 if (waker != null) RetryAsync(waker);
```

Figure 8. Sending a synchronous message while coping with stealing; replaces lines 27–33 of SyncSend

A synchronous sender loops until its message is resolved by claiming a chord (exit on line 6) or by another thread consuming it (exit on line 13). In each iteration of the loop, the sender blocks (line 10); even if its message has already been CONSUMED, it must wait for the signal to get its return value. In the case that the synchronous sender is woken by an asynchronous message (lines 15–17), it records the waking message and ultimately tries once more to resolve its own message. We perform exponential backoff every time this happens, since continually being awoken only to find messages stolen indicates high traffic.

After every resolution of the synchronous sender's message (myMsg), the sender retries sending the last asynchronous message that woke it, if any (lines 7–8, 22). Doing so fulfills the liveness requirements outlined above: the synchronous sender has become responsible for the message that woke it. We use RetryAsync, which is similar to AsyncSend but uses an already-added message rather than adding a new one. We are careful to avoid calling RetryAsync while myMsg is CLAIMED by the calling thread; doing so could result in the retry code forever waiting for us to revert or consume the claim. On the other hand, it is fine to retry the message even if it has already been successfully consumed as part of a chord; RetryAsync will simply exit in this case.

3.6 Pragmatics and extensions

There are a few smaller differences between the presented code and the actual implementation:

• We avoid boxing (allocation) and downcasts whenever possible: on .NET, additional polymorphism (beyond what the code showed) can help avoid uses of object.

• We do not allocate a fresh message array every time TryClaim is called; in fact, we do not heap-allocate message arrays at all. Instead, we stack-allocate an array in SyncSend and AsyncSend (stack-allocated arrays are not directly provided by .NET, so we use a custom value type built by polymorphic recursion) and reuse this array for every call to TryClaim.

• We handle nonlinear patterns, in which a single channel appears multiple times. The code is straightforward.

An important pragmatic point is that the Signal class first performs some spinwaiting before blocking. Spinning is performed on a memory location associated with the signal, so each spinning thread will wait on a distinct memory location whose value will only be changed when the thread should be woken, an implementation strategy long known to perform well on cache-coherent architectures [17]. The amount of spinning performed is determined adaptively on a per-thread, per-channel basis.

We expect it to be straightforward to add timeouts and nonblocking attempts for synchronous sends to our implementation, because we can always use CAS to consume a message we have added to a bag to abort a send (which will, of course, also discover if the send has already succeeded).

Finally, to add channels with ordering constraints, one needs only use a queue or stack rather than a bag. Switching from bags to fair queues and disabling message stealing yields per-channel fairness for joins. In Dining Philosophers, for example, queues will guarantee that requests from waiting philosophers are fulfilled before those of philosophers that have just eaten. Such guarantees come at the cost of significantly decreased parallelism, since they entail sequential matching of join patterns. At an extreme, programmers can enforce a round-robin scheme for matching patterns using an additional internal channel [25].

4. Performance

In this section we provide empirical support for the claims that our implementation is (1) scalable and (2) competitive with purpose-built solutions.

4.1 Methodology

To evaluate our implementation, we constructed a series of microbenchmarks for seven classic coordination problems: dining philosophers, producers/consumers, mutual exclusion, semaphores, barriers, rendezvous, and reader-writer locking. Our solutions for these problems are fully described in the first two sections of the paper. They cover a spectrum of shapes and sizes of join patterns. In some cases (producer/consumer, locks, semaphores, rendezvous) the size and number of join patterns stays fixed as we increase the number of processors, while in others a single pattern grows in size (barriers) or there is an increasing number of fixed-size patterns (philosophers).

Each benchmark follows standard practice for evaluating synchronization primitives: we repeatedly synchronize, for a total of k synchronizations between n threads [17]. We use k ≥ 100,000 and average over three trials for all benchmarks. To test interaction with thread-scheduling preemption, we let n range up to 96—twice the 48 cores in our benchmarking machine.

Each benchmark has two variants for measuring different aspects of synchronization:

PARALLEL SPEEDUP: In the first variant, we simulate doing a small amount of work between synchronization episodes (and during the critical section, when appropriate). By performing some work, we can gauge to what extent a synchronization primitive inhibits or allows parallel speedup. By keeping the amount of work small, we gauge in particular speedup for fine-grained parallelism, which presents the most challenging case for scalable coordination.

PURE SYNCHRONIZATION: In the second variant, we synchronize in a tight loop, yielding the cost of synchronization in the limiting case where the actual work is negligible. In addition to providing some data on constant-time overheads, this variant serves as a counterpoint to the previous one: it ensures that scalability problems were not hidden by doing too much work between synchronization episodes. Rather than looking for speedup, we are checking for slowdown.

To simulate work, we use .NET's Thread.SpinWait method, which spins in a tight loop for a specified number of times (and ensures that the spinning will not be optimized away). To make the workload a bit more realistic—and to avoid "lucky" schedules—we randomize the number of spins between each synchronization, which over 100,000 iterations will yield a normal distribution of total work with very small standard deviation. We ensure that the same random seeds are provided across trials and compared algorithms, so we always compare the same amount of total work. The mean spin counts are determined per-benchmark and given in the next section.

For each problem we compare performance between:

• a join-based solution using our fully-optimized implementation (S-Join, for "scalable joins"),
• a join-based solution using Russo's library (L-Join, for "lock-based joins"),
• at least one purpose-built solution from the literature or the .NET libraries (label varies), and
• when relevant, our implementation with some or all optimizations removed, to demonstrate the effect of the optimization (e.g., S-J w/o S,C drops the Stealing and Counter optimizations; S-J w/o L drops the Lazy queuing optimization).

We detail the purpose-built solutions below.

Figure 9. Speedup on simulated fine-grained workloads: throughput versus threads
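The two-phase discipline described here (claim one pending message per channel of a chord, in the chord's fixed channel order, rolling every claim back if any channel comes up empty, then consume the whole set) can be sketched as a small executable model. The Python below is ours and models only the status transitions, not the lock-free atomics or the racing threads.

```python
# Our sketch of two-phase claiming with rollback: a channel is a list
# of messages, each message carries a status. Phase one CLAIMs one
# PENDING message per channel in a consistent order; any failure rolls
# claims back to PENDING. Phase two marks the claimed set CONSUMED.

PENDING, CLAIMED, CONSUMED = "PENDING", "CLAIMED", "CONSUMED"

class Msg:
    def __init__(self):
        self.status = PENDING

def find_pending(chan):
    for m in chan:
        if m.status == PENDING:
            return m
    return None

def try_claim_chord(chans):
    claimed = []
    for chan in chans:                 # same order for every thread
        m = find_pending(chan)
        if m is None:                  # no match: roll back all claims
            for c in claimed:
                c.status = PENDING
            return None
        m.status = CLAIMED
        claimed.append(m)
    for m in claimed:                  # second phase: consume the set
        m.status = CONSUMED
    return claimed
```

The consistent channel ordering is the point of the exercise: when two threads race over overlapping chords, at least one of them finds every message it needs still PENDING and completes the second phase.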
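The Signal contract stated here (Set and Block each called by at most one thread; a Set before Block makes the next Block return immediately, resetting the signal; no spurious wakeups) is precisely the behavior of a binary semaphore. The Python sketch below is our reading of that contract, not the library's code, and it omits the adaptive spinning discussed in §3.6.

```python
# Our model of the Signal contract via a semaphore initialized to 0:
# set() triggers the signal; block() returns immediately if the signal
# was already triggered (resetting it), and otherwise waits for set().

import threading

class Signal:
    def __init__(self):
        self._sem = threading.Semaphore(0)

    def set(self):
        self._sem.release()    # wake the (at most one) blocked sender

    def block(self):
        self._sem.acquire()    # consumes the trigger, i.e. resets it
```

With a single setter and a single blocker, the semaphore count is only ever 0 or 1, which is what makes "woken exactly once, reset on wake" hold.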
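The half-word counter protocol is easy to model executably. In the Python sketch below, `encode`, `decode`, and the CAS cell are our stand-ins for the C# helpers in the listing; a real implementation CASes a single shared machine word, whereas here atomicity is only simulated.

```python
# Our model of the claimed/pending counter word: the high half-word
# counts claimed "messages", the low half-word counts pending ones.
# try_claim tentatively moves one message from pending to claimed.

class Cell:
    def __init__(self, v=0):
        self.v = v
    def cas(self, expect, new):
        # simulated compare-and-swap: returns the value actually seen
        old = self.v
        if old == expect:
            self.v = new
        return old

def encode(claimed, pending):
    return (claimed << 16) | pending

def decode(state):
    return state >> 16, state & 0xFFFF   # (claimed, pending)

def try_claim(chan):
    cur = chan.v
    while True:                          # read-modify-try-update loop
        start = cur
        claimed, pending = decode(start)
        if pending == 0:
            return False                 # nothing to claim
        cur = chan.cas(start, encode(claimed + 1, pending - 1))
        if cur == start:
            return True                  # our update won the race
```

Because a claim only moves a count from pending to claimed, other threads see that a message is spoken for without concluding it has been consumed, which is the property the prose asks for.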
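The lazy-queuing fast path for the Lock example can be illustrated with a toy model. The sketch below is ours and is single-threaded: it shows only the shape of the optimization (try to consume a pending Release before allocating and enqueuing an Acquire message), not the claiming races of the real protocol.

```python
# Our toy model of lazy queuing for a join-based lock: acquire() first
# tries the fast path against the Release counter; only on failure does
# it enqueue an Acquire "message". release() prefers a queued waiter.

class LockModel:
    def __init__(self):
        self.release_count = 1   # lock starts free: one Release pending
        self.acquire_queue = []  # Acquire messages we had to enqueue

    def acquire(self, who):
        if self.release_count > 0:       # fast path: no allocation,
            self.release_count -= 1      # no touch of the Acquire bag
            return True
        self.acquire_queue.append(who)   # slow path: enqueue and wait
        return False

    def release(self):
        if self.acquire_queue:           # hand the lock to a waiter
            self.acquire_queue.pop(0)
            return
        self.release_count += 1          # otherwise record the Release
```

The fast path is what saves allocation and interprocessor traffic in the common uncontended case; under contention the model degenerates to ordinary queuing.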
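The seeding discipline of §4.1 is worth making concrete: with a fixed seed, every trial and every compared algorithm draws the identical sequence of spin counts, so the total simulated work is exactly the same across comparisons. The Python sketch below is ours; the uniform distribution and the mean of 200 are illustrative choices, not figures from the paper.

```python
# Our sketch of seeded workload randomization (a stand-in for calls to
# .NET's Thread.SpinWait): spin counts are drawn uniformly around a
# per-benchmark mean from a fixed-seed generator, so compared runs
# perform identical total work, while per-iteration work still varies.

import random

def spin_counts(seed, iterations, mean):
    rng = random.Random(seed)            # fixed seed => same schedule
    # uniform in [mean/2, 3*mean/2]; over many iterations the total
    # concentrates tightly around iterations * mean
    return [rng.randint(mean // 2, 3 * mean // 2)
            for _ in range(iterations)]
```

Replaying the same seed for each algorithm under test is what makes the throughput numbers directly comparable.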
[Figure 9 comprises eight panels, one per benchmark with simulated work: Philosophers, Producer/consumer, Lock, Semaphore, Rendezvous, Barrier, RWLock (50/50), and RWLock (75/25). Each panel plots throughput (iterations per 10µs; bigger is better) against threads (6 to 96 on the 48-core machine) for S-Join, L-Join, the relevant purpose-built competitors (Dijkstra, .NET-queue, .NET-bag, .NET, .NET-Slim, .NET-spin, Exchanger, S-Join-Tree), and de-optimized variants (S-J w/o C,L; S-J w/o C,S; S-J w/o L; S-J w/o C).]
Figure 10. Pure synchronization performance: throughput versus threads
[Figure 10 comprises the corresponding eight pure-synchronization panels (no simulated work): Philosophers, Producer/consumer, Lock, Semaphore, Rendezvous, Barrier, RWLock (50/50), and RWLock (75/25), with the same axes and series as Figure 9.]
Two benchmarks (rendezvous and barriers) required extending Russo's library to support multiple synchronous channels in a pattern; in these cases, and only in these cases, we use a modified version of the library.

Our benchmarking machine has four AMD "Magny-Cours" Opteron 6100 Series 1.7GHz processors, with 12 cores each (for a total of 48 cores), 32GB RAM, and runs Windows Server 2008 R2 Datacenter. All benchmarks were run under the 64-bit CLR.

4.2 The benchmarks

The results for all benchmarks appear in Figure 9 (for parallel speedup) and Figure 10 (for pure synchronization). The axes are consistent across all graphs: the x-axis measures the number of threads, and the y-axis measures throughput (as iterations performed every 10µs). Larger y values reflect better performance.

For measuring parallel speedup, we used the following mean spin counts for simulated work:

Benchmark      In crit. section   Out of crit. section
Philosophers   25                 5,000
Prod/Cons      N/A                producer: 5,000; consumer: 500
Lock           50                 200
Semaphore      25                 100
Rendezvous     N/A                5,000
Barrier        N/A                10,000
RWLock         50                 200

With too little simulated work, there is no hope of speedup; with too much, the parallelism becomes coarse-grained and thus insensitive to the performance of synchronization. These counts were chosen to be high enough that at least one implementation showed speedup, and low enough to yield significant performance differences.

The particulars of the benchmarks are as follows, where n is the number of threads and k the total number of iterations (so each thread performs k/n iterations):

Philosophers Each of the n threads is a philosopher; the threads are arranged around a table. An iteration consists of acquiring and then releasing the appropriate chopsticks. We compare against Dijkstra's original solution, using a lock per chopstick, acquiring these locks in a fixed order, and releasing them in the reverse order.

Producer/consumer We let n/2 threads be producers and n/2 be consumers. Producers repeatedly generate trivial output and need not wait for consumers, while consumers repeatedly take and throw away that output. We compare against the .NET 4 BlockingCollection class, which transforms a nonblocking collection into one that blocks when attempting to extract an element from an empty collection. We wrap the BlockingCollection around the .NET 4 ConcurrentQueue class (a variant of Michael and Scott's classic lock-free queue [18]) and ConcurrentBag.

Lock An iteration consists of acquiring and then releasing a single, global lock. We compare against both the built-in .NET lock (which is a highly-optimized part of the CLR implementation itself) and System.Threading.SpinLock (which is implemented in .NET).

Semaphore We let the initial semaphore count be n/2; an iteration consists of acquiring and then releasing the semaphore. We compare to two .NET semaphores: the Semaphore class, which wraps kernel semaphores, and SemaphoreSlim, a faster, pure .NET implementation of semaphores.

Rendezvous The n threads perform a total of k synchronous exchanges as quickly as possible. Unfortunately, .NET 4 does not provide a built-in library for rendezvous, so we ported Scherer et al.'s exchanger [26] from Java; this is the exchanger included in java.util.concurrent.

Barriers An iteration consists of passing through the barrier. We show results for both the tree and the flat versions of the join-based barrier. We compare against the .NET 4 Barrier class, a standard sense-reversing barrier.

RWLock An iteration consists of (1) choosing at random whether to be a reader or a writer and (2) acquiring, and then releasing, the appropriate lock. We give results for 50-50 and 75-25 splits between readers and writers. We compare against two .NET implementations: the ReaderWriterLock class, which wraps the kernel RWLocks, and the ReaderWriterLockSlim class, which is a pure .NET implementation.

4.3 Analysis

The results of Figure 9 demonstrate that our scalable join patterns are competitive with—and can often beat—state-of-the-art custom libraries. Application programmers can solve coordination problems in the simple, declarative style we have presented here, and expect excellent scalability, even for fine-grained parallelism.

In evaluating benchmark performance, we are most interested in the slope of the throughput graph, which measures scalability with the number of cores (up to 48) and then scheduler robustness (from 48 to 96). In the parallel speedup benchmarks, both in terms of scalability (high slope) and absolute throughput, we see the following breakdown:

S-Join clear winner    Producer/consumer, Semaphore, Barrier
S-Join competitive     Philosophers, Lock, Rendezvous, RWLock 50/50
.NET clear winner      RWLock 75/25

The .NET concurrency library could benefit from replacing some of its primitives with ones based on the joins implementation we have shown—the main exception being locks. With further optimization, it may be feasible to build an entire scalable synchronization library around joins.

The performance of our implementation is mostly robust as we oversubscribe the machine. The Barrier benchmark is a notable exception, but this is due to the structure of the problem: every involved thread must pass through the barrier at every iteration, so at n > 48 threads, a context switch is required for every iteration. Context switches are very expensive in comparison to the small amount of work we are simulating.

Not all is rosy, of course: the pure synchronization benchmarks show that scalable join patterns suffer from constant-time overheads in some cases, especially for locks. The table below approximates the overhead of pure synchronization in our implementation compared to the best .NET solution, by dividing the scalable join pure synchronization time by the best .NET pure synchronization time:

Overhead compared to best custom .NET solution
n    Phil  Pr/Co  Lock  Sema  Rend  Barr  RWL
6    5.2   0.7    6.5   2.9   0.7   1.5   4.2
12   5.2   0.9    7.4   4.0   1.7   0.3   3.9
24   1.9   0.9    6.6   3.0   1.1   0.2   1.8
48   1.6   1.2    7.4   2.3   1.0   0.2   1.4
(n threads; smaller is better)

Overheads are most pronounced for benchmarks that use .NET's built-in locks (Philosophers, Lock). This is not surprising: .NET locks are mature and highly engineered, and are not themselves implemented as .NET code. Notice, too, that in Figure 10 the overhead of the spinlock (which is implemented within .NET) is much closer to that of scalable join patterns. In the philosophers benchmark, we are able to compensate for our higher constant factors by achieving better parallel speedup, even in the pure-synchronization version of the benchmark.

One way to decrease overhead, we conjecture, would be to provide compiler support for join patterns. Our library-based implementation spends some of its time traversing data structures representing user-written patterns. In a compiler-based implementation, these runtime traversals could be unrolled, eliminating a number of memory accesses and conditional control flow. Removing that overhead could put scalable joins within striking distance of the absolute performance of .NET locks. On the other hand, such an implementation would probably not allow the dynamic construction of patterns that we use to implement barriers.

In the end, constant overhead is trumped by scalability: for those benchmarks where the constant overhead is high, our implementation nevertheless shows strong parallel speedup when simulating work. The constant overheads are dwarfed by even the small amount of work we simulate. Finally, even for the pure synchronization benchmarks, our implementation provides competitive scalability, in some cases extracting speedup despite the lack of simulated work.

Effect of optimizations Each of the three optimizations discussed in §3.5 is important for achieving competitive throughput. Stealing tends to be most helpful for those problems where threads compete for limited resources (Philosophers, Lock), because it minimizes the time between resource release and acquisition, favoring threads that are in the right place at the right time (this is essentially the same observation Doug Lea made about barging for abstract synchronizers [15]). Lazy queuing improves constant factors across the board, in some cases (Producer/Consumer, Rendezvous) also aiding scalability. Finally, the counter representation provides a considerable boost for benchmarks, like Semaphore, in which the relevant channel often has multiple pending messages.

Performance of lock-based joins Russo's lock-based implementation of joins is consistently—often dramatically—the poorest performer for both parallel speedup and pure synchronization. On the one hand, this result is not surprising: the overserialization induced by coarse-grained locking is a well-known problem. On the other hand, Russo's implementation is fairly sophisticated in the effort it makes to shorten critical sections. The implementation includes all the optimizations proposed for Polyphonic C# [2], including a form of stealing, a counter representation for void-async channels (simpler than ours, since it is lock-protected), and bitmasks approximating the state of message queues for fast matching. Despite this sophistication, it is clear that lock-based joins do not scale.

We consider STM-based join implementations in §6.2.

5. Correctness

There are many ways to characterize correctness of concurrent algorithms. The most appropriate specification for our algorithm is something like the process-algebraic formulation of the join calculus [4]. In that specification, multiple messages are consumed—and a chord is fired—in a single step. A full proof that our implementation satisfies this specification is too much to present here, and we have not yet carried out such a rigorous proof. We have, however, identified what we believe are the key lemmas for performing such a proof. We take for granted that our bag implementation is linearizable [10] and lock-free [9]; roughly, this means that operations on bags are observably atomic and dead/livelock-free even under unfair scheduling. There are a pair of key properties—one safety, one liveness—characterizing the Resolve method:

Lemma 1 (Resolution Safety). Assume that msg has been inserted into a channel. If a subsequent call to Resolve(msg) returns, msg is in a resolved state; moreover, the return value correctly reflects how the message was resolved.

Lemma 2 (Resolution Liveness). Assume that threads are scheduled fairly. If a sender is attempting to resolve a message, eventually some message is resolved by its sender.

Recall that there are three ways a message can be resolved: it and a pattern's worth of messages can be marked CLAIMED by the calling thread; it can be marked CONSUMED by another thread; and it can be in an arbitrary status when it is determined that there are not enough messages prior to it to fire a chord.

Safety for the first two cases is fairly easy to show: we can assume that CAS works properly, and can see that

• Once a message is CLAIMED by a thread, the next change to its status is by that thread.
• Once a message is CONSUMED, its status never changes.

These facts mean that interference cannot "unresolve" a message that has been resolved in those three ways. The other fact we need to show is that the retry flag is only false if, indeed, no pattern is matched using only the message and messages that arrived before it. Here we use the linearizability assumptions about bags, together with the facts about the status flags just given.

Now we turn to the liveness property. Notice that a call to Resolve fails to return only if retry is repeatedly true. This can only happen as a result of messages being CLAIMED. We can prove, using the consistent ordering of channels during the claiming process, that if any thread reaches the claiming process (lines 33–42), some thread succeeds in claiming a pattern's worth of messages. The argument goes: claiming by one thread can fail only if claiming/consuming by another thread has succeeded, which means that the other thread has […]

[…] find no evaluation of its performance on parallel hardware. Benton, Cardelli and Fournet proposed an object-oriented version of join patterns for C# called Polyphonic C# [2]; around the same time, von Itzstein and Kearney independently described JoinJava [11], a similar extension of Java. The advent of generics in C# 2.0 led Russo to encapsulate join pattern constructs in the Joins library [24], which served as the basis for our library. There are also implementations for Erlang [22], C++ [16], and VB [25].

All of the above implementations use coarse-grained locking to achieve the atomicity present in the join calculus semantics. In some cases (e.g. Polyphonic C#, Russo's library) significant effort is made to minimize the critical section, but as we have shown (§4), coarse-grained locking remains an impediment to scalability.

By implementing joins as a library we forgo some expressivity, not just static checking. JoCaml, for example, supports restricted polymorphic sends: the type of a channel can be generalized in those type variables that do not appear in the types of other, conjoined channels [6]. Since our channels are monomorphic C# delegates, we are, unfortunately, unable to capture that polymorphism. Nevertheless, one can still express a wide range of useful generic abstractions (e.g. Buffer […]).
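For concreteness, the coarse-grained locking strategy shared by these earlier implementations can be sketched in miniature. The sketch below is Java rather than C#, uses hypothetical names (CoarseJoin, when, send), and is not the code of any cited library: a single lock guards every channel's queue, and each send scans the registered patterns for an enabled chord while holding that lock.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical miniature of a coarse-grained lock-based join library:
// one global lock serializes every channel operation and every match.
final class CoarseJoin {
    static final class Channel {
        final Queue<Object> pending = new ArrayDeque<>();
    }

    static final class Pattern {
        final List<Channel> channels;
        final Runnable body;
        Pattern(List<Channel> channels, Runnable body) {
            this.channels = channels;
            this.body = body;
        }
    }

    private final Object lock = new Object();
    private final List<Pattern> patterns = new ArrayList<>();

    Channel newChannel() { return new Channel(); }

    void when(List<Channel> chans, Runnable body) {
        synchronized (lock) { patterns.add(new Pattern(chans, body)); }
    }

    // Send a message on c; if some pattern now has a pending message on
    // every one of its channels, consume one message per channel and
    // fire the body. The whole match runs under the single global lock.
    void send(Channel c, Object msg) {
        Runnable toRun = null;
        synchronized (lock) {
            c.pending.add(msg);
            for (Pattern p : patterns) {
                boolean enabled = true;
                for (Channel ch : p.channels) {
                    if (ch.pending.isEmpty()) { enabled = false; break; }
                }
                if (enabled) {
                    for (Channel ch : p.channels) ch.pending.remove();
                    toRun = p.body;
                    break;
                }
            }
        }
        if (toRun != null) toRun.run();  // run the body outside the lock
    }
}
```

Every sender contends for the one lock even when its channels are unrelated to any other sender's; this is the overserialization that the measurements of §4 show to be the bottleneck.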
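By contrast, the scalable implementation replaces any global lock with a per-message status word resolved by CAS, as described in §5. The following is a minimal Java sketch of the PENDING/CLAIMED/CONSUMED discipline behind Lemmas 1 and 2; the names are ours for illustration, not the paper's C# implementation.

```java
import java.util.concurrent.atomic.AtomicReference;

// A message moves PENDING -> CLAIMED -> CONSUMED via CAS. A claim may
// be rolled back to PENDING, but CONSUMED is final, so interference
// can never "unresolve" a consumed message.
final class Message<T> {
    enum Status { PENDING, CLAIMED, CONSUMED }

    final T payload;
    private final AtomicReference<Status> status =
        new AtomicReference<>(Status.PENDING);

    Message(T payload) { this.payload = payload; }

    // A thread assembling a pattern's worth of messages tentatively
    // claims each one; a failed CAS means another thread won the race.
    boolean tryClaim() {
        return status.compareAndSet(Status.PENDING, Status.CLAIMED);
    }

    // Only the claiming thread may take these steps: once a message is
    // CLAIMED, the next change to its status is by that thread.
    void consume()  { status.set(Status.CONSUMED); }
    void rollback() { status.set(Status.PENDING); }

    boolean isConsumed() { return status.get() == Status.CONSUMED; }
}
```

Two threads racing to claim the same message cannot both succeed, and neither blocks the other: the loser simply rolls back any claims it already holds and retries, which is where the liveness argument of §5 does its work.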
Figure 11. Comparison with Haskell-STM implementations on 48-core machine. Note log scale.

[…] guards and propagated clauses in patterns, and to handle these features they spawn a thread per message; Haskell threads are lightweight enough to make such an approach viable. The manuscript provides some performance data, but only on a four-core machine, and does not provide comparisons against direct solutions to the problems they consider.

The simplest—but perhaps most important—advantage of our implementation over STM-based implementations is that we do not require STM, making our approach more portable. STM is an active area of research, and state-of-the-art implementations require significant effort.

The other advantage over STM is the specificity of our algorithm. An STM implementation must provide a general mechanism for declarative atomicity, conflict resolution and contention management. Since we are attacking a more constrained problem, we can employ a more specialized (and likely more efficient and scalable) solution. For example, in our implementation one thread can be traversing a bag looking for PENDING messages, determine that none are available, and exit message resolution, all while another thread is adding a new PENDING message to the bag. Or worse: two threads might both add messages to the same bag. It is not clear how to achieve the same degree of concurrency with STM: depending on the implementation, such transactions would probably be considered conflicting, and one aborted and retried. While such spurious retries might be avoidable by making STM aware of the semantics of bags, or by carefully designing the data structures to play well with the STM implementation, the effort involved is likely to exceed that of the relatively simple algorithm we have presented.

To test our suspicions about the STM-based join implementations, we replicated the pure synchronization benchmarks for producer/consumer and locks on top of both Singh's and Sulzmann's implementations. Figure 11 gives the results on the same 48-core benchmarking machine we used for §4. Note that, to avoid losing too much detail, we plot the throughput in these graphs using a log scale. The comparison is, of course, a loose one, as we are comparing across two very different languages and runtime systems. However, it seems clear that the STM implementations suffer from both drastically increased constant overheads and much poorer scalability. Surprisingly, of the two STM implementations, Singh's much simpler implementation was the better performer.

Given these results, and the earlier results for lock-based implementations, our Joins implementation is the only one we know to scale when used for fine-grained parallelism. While STM can be used to implement joins, it is also possible to see STM and joins as two disparate points in the spectrum of declarative concurrency: STM allows arbitrary shared-state computation to be declared atomic, while joins only permits highly-structured atomic blocks in the form of join patterns. From this standpoint, our library is a bit like k-compare single swap [20] in attempting to provide scalable atomicity somewhere between CAS and STM. There is a clear tradeoff: by reducing expressiveness relative to a framework like STM, our joins library admits a relatively simple implementation with robust performance and scalability.

6.3 Coordination in java.util.concurrent

The java.util.concurrent library contains a class called AbstractQueuedSynchronizer that provides basic functionality for queue-based, blocking synchronizers [15]. Internally, it represents the state of the synchronizer as a single 32-bit integer, and requires subclasses to implement tryAcquire and tryRelease methods in terms of atomic operations on that integer. It is used as the base class for at least six synchronizers in the java.util.concurrent package, thereby avoiding substantial code duplication. In a sense, our Joins library is a generalization of the abstract synchronizer framework: we support arbitrary internal state (represented by asynchronous messages), n-way rendezvous, and the exchange of messages at the time of synchronization.

Another interesting aspect of java.util.concurrent is its use of dual data structures [27], in which blocking calls to a data structure (such as Pop on an empty stack) insert a "reservation" in a nonblocking manner; they can then spinwait to see whether that reservation is quickly fulfilled, and otherwise block. Reservations provide an analog to the conditions used in monitors, but apply to nonblocking data structures. Joins provide another perspective on reservations: for us, Pop can be used as a method, but it is really a message-send on a synchronous channel. Reservations, spinwaiting and blocking thus fall out as a natural consequence. With Joins, it is also easy to allow such synchronous messages to be involved in several different patterns, allowing the requests to be fulfilled in multiple ways, each of which can involve complex synchronization; the correct protocol for handling reservations is then automatically provided.

Our benchmarking results indicate that it may be reasonable to build some parts of a library like java.util.concurrent around Joins; certainly, some of .NET's concurrency library could be profitably replaced with a Joins-based implementation. While this is a nice result, our goal is not to replace such carefully-engineered libraries. Rather, we want to push forward the frontier, taking the insights that made java.util.concurrent successful and putting them in the hands of application programmers, without exposing their intricacies.

6.4 Parallel CML

Our algorithm draws some inspiration from Reppy, Russo and Xiao's Parallel CML, a combinator library for first-class synchronous events [23]. The difficulty in implementing CML is, in a sense, dual to that of the join calculus: disjunctions of events (rather than conjunctions of messages) must be atomically resolved. Parallel CML implements disjunction (choice) by adding a single event to the queue of each involved channel. Events have a "state" similar to our message statuses; event resolution is performed by an optimistic protocol that uses CAS to claim events.

The Parallel CML protocol is, however, much simpler than the protocol we have presented: events are resolved while holding a channel lock. In particular, if an event is offering a choice between sending on channel A and receiving on channel B, the resolution code will first lock channel A while looking for partners, then (if no partners are found) unlock A and lock channel B. These channel locks prevent concurrent changes to channel queues, allowing the implementation to avoid subtle questions about when it is safe to stop running the protocol—exactly the questions we address in §3.3. The tradeoff for this simplicity is reduced scalability under high contention due to reduced concurrency, as in our experimental results for lock-based joins.

7. Conclusion

We have demonstrated that it is possible to place many of the insights behind scalable synchronization algorithms directly into the hands of library users—without requiring those users to understand those insights, to use a fixed set of primitives, or to give up scalability. There is still much work to be done, both in evaluation (does declarative coordination help in real applications?) and scope (can we expand beyond the join calculus?), but we believe the algorithms presented in this paper provide a promising foundation for further exploration.

Acknowledgements We are indebted to both Sam Tobin-Hochstadt and Vincent St-Amour for careful reading and many helpful discussions; Jesse Tov, Daniel Brown, and Stephen Chang for feedback on drafts of this paper; Stephen Toub and Microsoft's PCP team for providing access to our test machine; Luc Maranget for motivating our counter optimization; and Samin Ishtiaq and Matthew Parkinson for general support.

References

[1] N. Benton. Jingle bells: Solving the Santa Claus problem in Polyphonic C#. Unpublished manuscript, Mar. 2003. URL http://research.microsoft.com/~nick/santa.pdf.
[2] N. Benton, L. Cardelli, and C. Fournet. Modern concurrency abstractions for C#. ACM Transactions on Programming Languages and Systems, 26(5), Sept. 2004.
[3] E. W. Dijkstra. Hierarchical ordering of sequential processes. Acta Informatica, 1(2):115–138, 1971.
[4] C. Fournet and G. Gonthier. The reflexive chemical abstract machine and the join-calculus. In ACM SIGPLAN Symposium on Principles of Programming Languages (POPL), 1996.
[5] C. Fournet and G. Gonthier. The join calculus: a language for distributed mobile programming. In APPSEM Summer School, Caminha, Portugal, Sept. 2000, 2002.
[6] C. Fournet, C. Laneve, L. Maranget, and D. Rémy. Implicit typing à la ML for the join-calculus. In International Conference on Concurrency Theory (CONCUR), 1997.
[7] C. Fournet, F. Le Fessant, L. Maranget, and A. Schmitt. JoCaml: a language for concurrent distributed and mobile programming. In Advanced Functional Programming, 4th International School, Oxford, Aug. 2002, 2003.
[8] T. Harris, S. Marlow, S. Peyton-Jones, and M. Herlihy. Composable memory transactions. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2005.
[9] M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, 2008.
[10] M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3):463–492, July 1990.
[11] G. S. Itzstein and D. Kearney. Join Java: An alternative concurrency semantics for Java. Technical Report ACRC-01-001, University of South Australia, 2001.
[12] F. Le Fessant and L. Maranget. Compiling join-patterns. In International Workshop on High-Level Concurrent Languages (HLCL), Sept. 1998.
[13] D. Lea. URL http://gee.cs.oswego.edu/dl/concurrency-interest/.
[14] D. Lea. A Java fork/join framework. In ACM Conference on Java, 2000.
[15] D. Lea. The java.util.concurrent synchronizer framework. Science of Computer Programming, 58(3):293–309, 2005.
[16] Y. Liu, 2009. URL http://channel.sourceforge.net/.
[17] J. M. Mellor-Crummey and M. L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems (TOCS), 9(1):21–65, February 1991.
[18] M. M. Michael and M. L. Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In ACM Symposium on Principles of Distributed Computing (PODC), 1996.
[19] M. M. Michael and M. L. Scott. Nonblocking algorithms and preemption-safe locking on multiprogrammed shared memory multiprocessors. Journal of Parallel and Distributed Computing, 51(1):1–26, May 1998.
[20] M. Moir, V. Luchangco, and N. Shavit. Non-blocking k-compare single swap. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2003.
[21] M. Odersky. An overview of functional nets. In APPSEM Summer School, Caminha, Portugal, Sept. 2000, 2002.
[22] H. Plociniczak and S. Eisenbach. JErlang: Erlang with Joins. In Coordination Models and Languages, 2010.
[23] J. Reppy, C. V. Russo, and Y. Xiao. Parallel Concurrent ML. In ACM SIGPLAN International Conference on Functional Programming (ICFP), 2009.

A. Library Reference

A new Join instance j is allocated by calling one of two overloads of the factory method Join.Create:

Join j = Join.Create();
Join j = Join.Create(size);

The optional integer size is used to explicitly bound the number of channels supported by Join instance j. An omitted size argument defaults to 32; size initializes the constant, read-only property j.Size.

A Join object notionally owns a set of channels, each obtained by calling an overload of method Init, passing the location, channel(s), of a channel or array of channels using an out argument:

j.Init(out channel);
j.Init(out channels, length);

The second form takes a length argument to initialize location channels with an array of length distinct channels.

Channels are instances of delegate types. In all, the library provides six channel flavors:

// void-returning asynchronous channels
delegate void Asynchronous.Channel();
delegate void Asynchronous.Channel<A>(A a);
// void-returning synchronous channels
delegate void Synchronous.Channel();
delegate void Synchronous.Channel<A>(A a);
// value-returning synchronous channels
delegate R Synchronous<R>.Channel();
delegate R Synchronous<R>.Channel<A>(A a);

[…] for some type P:

• If ci is a channel of type Channel<P>, then When(ci)/And(ci) adds one parameter pj of type Pj = P to delegate d.
• If ci is an array of type Channel<P>[], then When(ci)/And(ci) adds one parameter pj of array type Pj = P[] to delegate d.

Parameters are added to d from left to right, in increasing order of i. In the current implementation, a continuation can receive at most m ≤ 16 parameters.

A join pattern associates a set of channels with a body d. A body can execute only once all the channels guarding it have been invoked. Invoking a channel may enable zero, one or more patterns:

• If no pattern is enabled then the channel invocation is queued up. If the channel is asynchronous, then the argument is added to an internal bag. If the channel is synchronous, then the calling thread is blocked, joining a notional bag of threads waiting on this channel.
• If there is a single enabled join pattern, then the arguments of the invocations involved in the match are consumed, any blocked thread involved in the match is awakened, and the body of the pattern is executed in that thread. Its result—some value or exception—is broadcast to all other waiting threads, awakening them. If the pattern contains no synchronous channels, then its body runs in a new thread.
• If there are several enabled patterns, then an unspecified one is chosen to run.
• Similarly, if there are multiple invocations of a particular channel pending, which invocation will be consumed when there is a match is unspecified.

The current number of channels initialized on j is available as read-only property j.Count; its value is bounded by j.Size. Any invocation of j.Init that would cause j.Count to exceed j.Size throws JoinException.

Join patterns must be well-formed, both individually and collectively. Executing Do(d) to complete a join pattern will throw JoinException if d is null, the pattern repeats a channel (and the implementation requires linear patterns), a chan[…]
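As a cross-language point of comparison for the discussion in §6.3, the AbstractQueuedSynchronizer recipe described there (all state in one 32-bit integer, subclass-supplied tryAcquire/tryRelease) can be seen in a minimal, non-reentrant mutex. This is a standard illustrative use of the class, not code taken from java.util.concurrent itself.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Minimal non-reentrant mutex built on AbstractQueuedSynchronizer.
// The Sync subclass encodes its entire state in the single integer
// AQS provides (0 = free, 1 = held); AQS supplies the wait queue,
// blocking, and wakeup machinery.
final class SimpleMutex {
    private static final class Sync extends AbstractQueuedSynchronizer {
        @Override protected boolean tryAcquire(int ignored) {
            return compareAndSetState(0, 1);  // atomically take the lock
        }
        @Override protected boolean tryRelease(int ignored) {
            setState(0);                      // free it; AQS wakes a waiter
            return true;
        }
    }

    private final Sync sync = new Sync();

    public void lock()   { sync.acquire(1); }
    public void unlock() { sync.release(1); }
}
```

Where an AQS subclass gets one integer and one wait queue, a join pattern gets arbitrary internal state as asynchronous messages; that is the sense in which the paper calls Joins a generalization of this framework.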