A Design Framework for Highly Concurrent Systems

Total Pages: 16

File Type: PDF, Size: 1020 KB

A Design Framework for Highly Concurrent Systems

Matt Welsh, Steven D. Gribble, Eric A. Brewer, and David Culler
Computer Science Division, University of California, Berkeley
Berkeley, CA 94720 USA
{mdw,gribble,brewer,culler}@cs.berkeley.edu

Abstract

Building highly concurrent systems, such as large-scale Internet services, requires managing many information flows at once and maintaining peak throughput when demand exceeds resource availability. In addition, any platform supporting Internet services must provide high availability and be able to cope with burstiness of load. Many approaches to building concurrent systems have been proposed, which generally fall into the two categories of threaded and event-driven programming. We propose that threads and events are actually on the ends of a design spectrum, and that the best implementation strategy for these applications is somewhere in between.

We present a general-purpose design framework for building highly concurrent systems, based on three design components (tasks, queues, and thread pools) which encapsulate the concurrency, performance, fault isolation, and software engineering benefits of both threads and events. We present a set of design patterns that can be applied to map an application onto an implementation using these components. In addition, we provide an analysis of several systems (including an Internet services platform and a highly available, distributed, persistent data store) constructed using our framework, demonstrating its benefit for building and reasoning about concurrent applications.

1 Introduction

Large Internet services must deal with concurrency at an unprecedented scale. The number of concurrent sessions and hits per day to Internet sites translates into an even higher number of I/O and network requests, placing enormous demands on underlying resources. Microsoft's web sites receive over 300 million hits with 4.1 million users a day; Lycos has over 82 million page views and more than a million users daily. As the demand for Internet services grows, and as their functionality expands, new system design techniques must be used to manage this load.

In addition to high concurrency, Internet services have three other properties which necessitate a fresh look at how these systems are designed: burstiness, continuous demand, and human-scale access latency. Burstiness of load is fundamental to the Internet; dealing with overload conditions must be designed into the system from the beginning. Internet services must also exhibit very high availability, with a downtime of no more than a few minutes a year. Finally, because access latencies for Internet services are at human scale and are limited by WAN and modem access times, an important engineering tradeoff is to optimize for high throughput rather than low latency.

Building highly concurrent systems is inherently difficult. Structuring code to achieve high throughput is not well supported by existing programming models. While threads are a commonly used device for expressing concurrency, the high resource usage and scalability limits of many thread implementations have led many developers to prefer an event-driven approach. However, these event-driven systems are generally built from scratch for particular applications, and depend on mechanisms not well supported by most languages and operating systems. In addition, using event-driven programming for concurrency can be more complex to develop and debug than threads.

We argue that threads and events are best viewed as the opposite ends of a design spectrum; the key to developing highly concurrent systems is to operate in the middle of this spectrum.
Event-driven techniques are useful for obtaining high concurrency, but when building real systems, threads are valuable (and in many cases required) for exploiting multiprocessor parallelism and dealing with blocking I/O mechanisms. Most developers are aware that this spectrum exists, utilizing both thread- and event-oriented approaches for concurrency. However, the dimensions of this spectrum are not currently well understood.

We propose a general-purpose design framework for building highly concurrent systems. The key idea behind our framework is to use event-driven programming for high throughput, but leverage threads (in limited quantities) for parallelism and ease of programming. In addition, our framework addresses the other requirements for these applications: high availability and maintenance of high throughput under load. The former is achieved by introducing fault boundaries between application components; the latter by conditioning the load placed on system resources.

This framework provides a means to reason about the structural and performance characteristics of the system as a whole. We analyze several different systems in terms of the framework, including a distributed persistent store and a scalable Internet services platform. This analysis demonstrates that our design framework provides a useful model for building and reasoning about concurrent systems.

2 Motivation: Robust Throughput

To explore the space of concurrent programming styles, consider a hypothetical server (as illustrated in Figure 1) that receives A tasks per second from a number of clients, imposes a server-side delay of L seconds per task before returning a response, but overlaps as many tasks as possible. For each response that a client receives, it immediately issues another task to the server; this is therefore a closed-loop system. We denote the task completion rate of the server as S. A concrete example of such a server would be a web proxy cache; if a request to the cache misses, there is a large latency while the page is fetched from the authoritative server, but during that time the task doesn't consume CPU cycles.

Figure 1: Concurrent server model: The server receives A tasks per second, handles each task with a latency of L seconds, and has a service response rate of S tasks per second; at any moment, A x L tasks are resident in the server. The system is closed loop: each service response causes another task to be injected into the server; thus, S = A in steady state.
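The steady-state accounting in Figure 1 is Little's law applied to a closed loop; a worked restatement in the paper's notation (the numeric example is ours, not from the paper):

```latex
% Number of tasks resident in the server (Little's law):
%   A = task arrival rate (tasks/sec), L = per-task latency (sec)
N = A \cdot L
% The loop is closed: every response immediately triggers a new task,
% so in steady state the completion rate equals the arrival rate:
S = A
% Example: A = 1000 tasks/sec and L = 0.05 sec give N = 50 tasks in flight.
```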
There are two prevalent strategies for handling concurrency in modern systems: threads and events. Threading allows programmers to write straight-line code and rely on the operating system to overlap computation and I/O by transparently switching across threads. The alternative, events, allows programmers to manage concurrency explicitly by structuring code as a single-threaded handler that reacts to events (such as non-blocking I/O completions, application-specific messages, or timer events). We explore each of these in turn, and then formulate a robust hybrid design pattern, which leads to our general design framework.

2.1 Threaded Servers

A simple threaded implementation of this server (Figure 2) uses a single, dedicated thread to service the network, and hands off incoming tasks to individual task-handling threads, which step through all of the stages of processing that task. One handler thread is created per task. An optimization of this simple scheme creates a pool of several threads in advance and dispatches tasks to threads from this pool, thereby amortizing the high cost of thread creation and destruction. In steady state, the number of threads T that execute concurrently in the server is S x L. As the per-task latency increases, there is a corresponding increase in the number of concurrent threads needed to absorb this latency while maintaining a fixed throughput; likewise, the number of threads scales linearly with throughput for fixed latency.

Figure 2: Threaded server: For each task that arrives at the server, a thread is either dispatched from a statically created pool (dispatch( )) or created on demand (create( )) to handle the task; each handler simulates the service delay (thread.sleep( L secs )). At any given time, there are a total of T threads executing concurrently, where T = A x L.

Threads have become the dominant form of expressing concurrency. Thread support is standardized across most operating systems, and is so well established that it is incorporated in modern languages, such as Java [9]. Programmers are comfortable coding in the sequential programming style of threads, and tools are relatively mature. In addition, threads allow applications to scale with the number of processors in an SMP system, as the operating system can schedule threads to execute concurrently on separate processors.

Thread programming presents a number of correctness and tuning challenges. Synchronization primitives (such as locks, mutexes, or condition variables) are a common source of bugs. Lock contention can cause serious performance degradation as the number of threads competing for a lock increases.

Figure 3: Threaded server throughput degradation: This benchmark has a very fast client issuing many concurrent 150-byte tasks over a single TCP connection to a threaded server as in Figure 2, with L = 50 ms, on a 167 MHz UltraSPARC running Solaris 5.6. The arrival rate determines the number of concurrent threads; sufficient threads are preallocated for the load. As the number of concurrent threads T increases, throughput increases until T >= T', after which the throughput of the system degrades substantially. (The plot shows server throughput S, in tasks/sec, against the number of threads T executing in the server, from 1 to 10,000 on a logarithmic axis, peaking near T = T'.)

Regardless of how well the threaded server is crafted, as the number of threads in a system grows, operating system overhead (scheduling and aggregate memory footprint) increases, leading to a decrease in the overall performance of the system.

... event-driven programmers. However, event-driven programming avoids many of the bugs associated with synchronization ...
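The threaded design of Figure 2, and the event-driven alternative the excerpt begins to describe, can both be sketched in a few lines of Java. First the pooled thread-per-task server (a minimal sketch; the paper gives no code, and the port, pool size, and 50 ms simulated latency are illustrative assumptions):

```java
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the pooled variant of Figure 2: one dedicated thread services the
// network and hands each incoming task to a worker from a fixed pool,
// amortizing thread creation/destruction costs.
public class ThreadedServer {
    static final long L_MILLIS = 50;   // per-task latency L (Figure 3 uses 50 ms)
    static final int POOL_SIZE = 64;   // in steady state the pool needs about S x L threads

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        try (ServerSocket listen = new ServerSocket(8080)) {
            while (true) {                          // the single network-servicing thread
                Socket task = listen.accept();
                pool.execute(() -> handle(task));   // dispatch( ) from the pre-created pool
            }
        }
    }

    static void handle(Socket s) {
        try (s) {
            Thread.sleep(L_MILLIS);                 // stands in for thread.sleep( L secs )
            s.getOutputStream().write("done\n".getBytes());
        } catch (Exception e) {
            // a failure here is isolated to this task's worker thread
        }
    }
}
```

In steady state the pool needs roughly S x L threads; the degradation in Figure 3 comes from pushing T well past the saturation point T'. For contrast, a single-threaded event loop over java.nio readiness events (again an illustration of the event-driven style, not code from the paper):

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// One thread, no locks: concurrency comes from interleaving small event
// handlers instead of from many blocked threads.
public class EventDrivenServer {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocate(512);
        while (true) {
            selector.select();                       // wait for any readiness event
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {            // new-connection event
                    SocketChannel c = server.accept();
                    c.configureBlocking(false);
                    c.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {       // non-blocking I/O completion event
                    SocketChannel c = (SocketChannel) key.channel();
                    buf.clear();
                    if (c.read(buf) < 0) { c.close(); continue; }
                    buf.flip();
                    c.write(buf);                    // echo; a real server queues further stages
                }
            }
        }
    }
}
```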
Recommended publications
  • Applying the Proactor Pattern to High-Performance Web Servers
APPLYING THE PROACTOR PATTERN TO HIGH-PERFORMANCE WEB SERVERS. James Hu ([email protected]), Irfan Pyarali ([email protected]), Douglas C. Schmidt ([email protected]), Department of Computer Science, Washington University, St. Louis, MO 63130, USA. This paper is to appear at the 10th International Conference on Parallel and Distributed Computing and Systems, IASTED, Las Vegas, Nevada, October 28-31, 1998.

ABSTRACT: Modern operating systems provide multiple concurrency mechanisms to develop high-performance Web servers. Synchronous multi-threading is a popular mechanism for developing Web servers that must perform multiple operations simultaneously to meet their performance requirements. In addition, an increasing number of operating systems support asynchronous mechanisms that provide the benefits of concurrency while alleviating much of the performance overhead of synchronous multi-threading. This paper provides two contributions to the study of high-performance Web servers.

1 INTRODUCTION: Computing power and network bandwidth on the Internet have increased dramatically over the past decade. High-speed networks (such as ATM and Gigabit Ethernet) and high-performance I/O subsystems (such as RAID) are becoming ubiquitous. In this context, developing scalable Web servers that can exploit these innovations remains a key challenge for developers. Thus, it is increasingly important to alleviate common Web server bottlenecks, such as inappropriate choice of concurrency and dispatching strategies, excessive filesystem access, and unnecessary data copying. Our research vehicle for exploring the performance impact of applying various Web server optimization techniques is the JAWS Adaptive Web Server [1]. JAWS is both an adaptive Web server and a development framework for Web servers that runs on multiple OS platforms. (A proactor-style sketch follows this entry.)
    [Show full text]
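The proactor style is easiest to see against an asynchronous-I/O API; a minimal sketch using Java's AsynchronousServerSocketChannel (an illustration of the pattern only, not JAWS code; JAWS itself is a C++ framework):

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

// Proactor style: the OS performs the I/O asynchronously and *completes*
// into our handlers, instead of us reacting to readiness and doing the I/O.
public class ProactorEcho {
    public static void main(String[] args) throws Exception {
        AsynchronousServerSocketChannel server =
            AsynchronousServerSocketChannel.open().bind(new InetSocketAddress(8080));

        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override public void completed(AsynchronousSocketChannel conn, Void a) {
                server.accept(null, this);           // re-arm for the next connection
                ByteBuffer buf = ByteBuffer.allocate(512);
                conn.read(buf, buf, new CompletionHandler<Integer, ByteBuffer>() {
                    @Override public void completed(Integer n, ByteBuffer b) {
                        if (n < 0) return;           // peer closed
                        b.flip();
                        conn.write(b);               // handler runs when the read has completed
                    }
                    @Override public void failed(Throwable t, ByteBuffer b) { }
                });
            }
            @Override public void failed(Throwable t, Void a) { }
        });
        Thread.sleep(Long.MAX_VALUE);                // handlers run on the runtime's channel group
    }
}
```

The inversion relative to a reactor is that handlers receive completed I/O (the buffer is already filled) rather than readiness notifications.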
  • Thread Management for High Performance Database Systems - Design and Implementation
Nr.: FIN-003-2018. Thread Management for High Performance Database Systems: Design and Implementation. Technical Report. Robert Jendersie, Johannes Wuensche, Johann Wagner, Marten Wallewein-Eising, Marcus Pinnecke, and Gunter Saake. Database and Software Engineering Group, Fakultät für Informatik, Otto-von-Guericke-Universität Magdeburg. Electronic technical-report series of the Fakultät für Informatik, ISSN 1869-5078. Editorial deadline: 21.08.2018. Contact: Marcus Pinnecke, [email protected]; http://www.cs.uni-magdeburg.de/Technical_reports.html
    [Show full text]
  • Concurrency & Parallel Programming Patterns
Concurrency & Parallel Programming Patterns. Evgeny Gavrin.

Outline: 1. Concurrency vs Parallelism. 2. Patterns by groups. 3. Detailed overview of parallel patterns. 4. Summary. 5. Proposal for language.

Concurrency vs Parallelism:
● Parallelism is the simultaneous execution of computations: "doing lots of things at once".
● Concurrency is the composition of independently executing processes: "dealing with lots of things at once".

Patterns by groups. Architectural Patterns: these patterns define the overall architecture for a program.
● Pipe-and-filter: view the program as filters (pipeline stages) connected by pipes (channels). Data flows through the filters, which take input and transform it into output. (A minimal sketch of this pattern follows this entry.)
● Agent and Repository: a collection of autonomous agents update state managed on their behalf in a central repository.
● Process control: the program is structured analogously to a process control pipeline, with monitors and actuators moderating feedback loops and a pipeline of processing stages.
● Event-based implicit invocation: the program is a collection of agents that post events they watch for and issue events for other agents. The architecture enforces a high-level abstraction, so invocation of an agent is implicit, i.e. not hardwired to a specific controlling agent.
● Model-view-controller: an architecture with a central model for the state of the program, a controller that manages the state, and one or more agents that export views of the model appropriate to different uses of the model.
● Bulk iterative (AKA bulk synchronous): a program that proceeds iteratively: update state, check against a termination condition, complete coordination, and proceed to the next iteration.
● Map-reduce: the program is represented in terms of two classes of functions.
    [Show full text]
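As a concrete rendering of the pipe-and-filter entry above, a minimal two-stage pipeline in Java with blocking queues as the pipes (stage names and data are invented for illustration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Pipe-and-filter: each stage is a thread (filter) consuming from an input
// queue (pipe) and producing to an output queue.
public class PipeAndFilter {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> raw = new ArrayBlockingQueue<>(100);
        BlockingQueue<String> upper = new ArrayBlockingQueue<>(100);

        Thread stage1 = new Thread(() -> {          // transform filter
            try {
                while (true) upper.put(raw.take().toUpperCase());
            } catch (InterruptedException e) { }
        });
        Thread stage2 = new Thread(() -> {          // sink filter
            try {
                while (true) System.out.println(upper.take());
            } catch (InterruptedException e) { }
        });
        stage1.setDaemon(true); stage2.setDaemon(true);
        stage1.start(); stage2.start();

        for (String s : new String[] {"pipe", "and", "filter"}) raw.put(s);
        Thread.sleep(200);                          // let the pipeline drain (sketch only)
    }
}
```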
  • Model Checking Multithreaded Programs With
Model Checking Multithreaded Programs with Asynchronous Atomic Methods. Koushik Sen and Mahesh Viswanathan, Department of Computer Science, University of Illinois at Urbana-Champaign. {ksen,vmahesh}@uiuc.edu

Abstract. In order to make multithreaded programming manageable, programmers often follow a design principle where they break the problem into tasks which are then solved asynchronously and concurrently on different threads. This paper investigates the problem of model checking programs that follow this idiom. We present a programming language SPL that encapsulates this design pattern. SPL extends a simplified form of sequential Java, to which we add the capability of making asynchronous method invocations in addition to the standard synchronous method calls, and the ability to execute asynchronous methods in threads atomically and concurrently. Our main result shows that the control state reachability problem for finite SPL programs is decidable. Therefore, such multithreaded programs can be model checked using the counter-example guided abstraction-refinement framework.

1 Introduction. Multithreaded programming is often used in software as it leads to reduced latency, improved response times of interactive applications, and more optimal use of processing power. Multithreaded programming also allows an application to progress even if one thread is blocked for an I/O operation. However, writing correct programs that use multiple threads is notoriously difficult, especially in the presence of a shared mutable memory. Since threads can interleave, there can be unintended interference through concurrent access of shared data, resulting in software errors due to data races and atomicity violations. (A plain-Java sketch of the asynchronous-atomic idiom follows this entry.)
    [Show full text]
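SPL is the paper's own language, but the idiom it captures, synchronous calls plus asynchronous method invocations that execute atomically, can be approximated in plain Java; a hedged sketch (class and method names are ours, and the single-threaded executor is just one way to obtain atomicity):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Asynchronous invocations are queued and each runs atomically with respect
// to the shared state, here enforced by a single-threaded executor.
public class AsyncAtomic {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private int counter = 0;                // shared state touched only by the worker

    // standard synchronous method call: runs on the caller's thread
    public int peek() { return counter; }   // safe here only after shutdown completes

    // asynchronous method invocation: queued, executed atomically later
    public void asyncIncrement() {
        worker.execute(() -> counter++);     // no interleaving within the method body
    }

    public static void main(String[] args) throws Exception {
        AsyncAtomic a = new AsyncAtomic();
        for (int i = 0; i < 1000; i++) a.asyncIncrement();
        a.worker.shutdown();
        a.worker.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(a.peek());        // prints 1000: invocations never interfered
    }
}
```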
  • Lecture 26: Creational Patterns
Creational Patterns. CSCI 4448/5448: Object-Oriented Analysis & Design, Lecture 26, 11/29/2012. © Kenneth M. Anderson, 2012.

Goals of the Lecture:
• Cover material from Chapters 20-22 of the textbook
• Lessons from design patterns: factories
• Singleton pattern
• Object Pool pattern
• Also discuss: Builder pattern, lazy instantiation

Pattern Classification:
• The Gang of Four classified patterns in three ways
• The behavioral patterns are used to manage variation in behaviors (think Strategy pattern)
• The structural patterns are useful to integrate existing code into new object-oriented designs (think Bridge)
• The creational patterns are used to create objects: Abstract Factory, Builder, Factory Method, Prototype & Singleton

Factories & Their Role in OO Design:
• It is important to manage the creation of objects. Code that mixes object creation with the use of objects can quickly become non-cohesive.
• A system may have to deal with a variety of different contexts, with each context requiring a different set of objects; in design patterns, the context determines which concrete implementations need to be present.
• The code to determine the current context, and thus which objects to instantiate, can become complex, with many different conditional statements.
• If you mix this type of code with the use of the instantiated objects, your code becomes cluttered: the use scenarios can often happen in a few lines of code, but if combined with creational code, the operational code gets buried behind the creational code.

Factories Provide Cohesion:
• The use of factories can address these issues: the conditional code can be hidden within them. Pass in the parameters associated with the current context and get back the objects you need for the situation; then use those objects to get your work done.
• Factories concern themselves just with creation, letting your code focus on other things. (A minimal factory sketch follows this entry.)
    [Show full text]
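The cohesion argument reads more concretely in code; a minimal hypothetical example of hiding context-dependent creation behind a factory (all names invented for illustration):

```java
// Without the factory, this conditional would clutter every use site.
interface Renderer { void draw(String s); }

class ConsoleRenderer implements Renderer {
    public void draw(String s) { System.out.println(s); }
}

class HtmlRenderer implements Renderer {
    public void draw(String s) { System.out.println("<p>" + s + "</p>"); }
}

// The factory concerns itself just with creation...
class RendererFactory {
    static Renderer forContext(String context) {
        return context.equals("web") ? new HtmlRenderer() : new ConsoleRenderer();
    }
}

public class FactoryDemo {
    public static void main(String[] args) {
        // ...letting the operational code stay focused on its own work.
        Renderer r = RendererFactory.forContext("web");
        r.draw("hello");
    }
}
```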
  • An Enhanced Thread Synchronization Mechanism for Java
SOFTWARE—PRACTICE AND EXPERIENCE. Softw. Pract. Exper. 2001; 31:667–695 (DOI: 10.1002/spe.383). An Enhanced Thread Synchronization Mechanism for Java. Hsin-Ta Chiao and Shyan-Ming Yuan, Department of Computer and Information Science, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu 300, Taiwan.

SUMMARY: The thread synchronization mechanism of Java is derived from Hoare's monitor concept. In the authors' view, however, it is oversimplified and suffers from the following four drawbacks. First, it belongs to the category of no-priority monitors, whose design, as reported in the literature on concurrent programming, is not well rated. Second, it offers only one condition queue; where more than one long-term synchronization event is required, this restriction both degrades performance and further complicates the ordering problems that a no-priority monitor presents. Third, it lacks support for building more elaborate scheduling programs. Fourth, during nested monitor invocations, deadlock may occur. In this paper, we first analyze these drawbacks in depth before proceeding to present our own proposal, which is a new monitor-based thread synchronization mechanism that we term EMonitor. This mechanism is implemented solely in Java, thus avoiding the need for any modification to the underlying Java Virtual Machine. A preprocessor is employed to translate the EMonitor syntax into pure Java code that invokes the EMonitor class libraries. We conclude with a comparison of the performance of the two monitors and allow the experimental results to demonstrate that, in most cases, replacing the Java version with the EMonitor version for developing concurrent Java objects is perfectly feasible. (A sketch of the single-condition-queue limitation follows this entry.)
    [Show full text]
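The second drawback (a single condition queue) is concrete in code: Java's built-in synchronized monitor gives each object one wait set, whereas java.util.concurrent.locks permits one Condition per long-term event. A sketch of the latter, not of EMonitor itself:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// A bounded buffer with two separate condition queues ("not full" and
// "not empty"), which a plain synchronized monitor cannot express directly.
public class BoundedBuffer<T> {
    private final Object[] items = new Object[16];
    private int head, tail, count;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull  = lock.newCondition(); // producers wait here
    private final Condition notEmpty = lock.newCondition(); // consumers wait here

    public void put(T x) throws InterruptedException {
        lock.lock();
        try {
            while (count == items.length) notFull.await();
            items[tail] = x; tail = (tail + 1) % items.length; count++;
            notEmpty.signal();               // wake exactly the waiters that care
        } finally { lock.unlock(); }
    }

    @SuppressWarnings("unchecked")
    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (count == 0) notEmpty.await();
            T x = (T) items[head]; head = (head + 1) % items.length; count--;
            notFull.signal();
            return x;
        } finally { lock.unlock(); }
    }
}
```

With plain synchronized, producers and consumers would share a single wait set, and notifyAll would wake both kinds of waiter.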
  • Designing for Performance: Concurrency and Parallelism COS 518: Computer Systems Fall 2015
Designing for Performance: Concurrency and Parallelism. COS 518: Computer Systems, Fall 2015. Logan Stafman. Adapted from slides by Mike Freedman.

Definitions:
• Concurrency: execution of two or more tasks overlaps in time.
• Parallelism: execution of two or more tasks occurs simultaneously.

Concurrency without parallelism? Parts of tasks interact with other subsystems (network I/O, disk I/O, GPU, ...); another task can be scheduled while the first waits on the subsystem's response.

Scheduling for fairness: on a time-sharing system we also want to schedule between tasks, even if one is not blocking; otherwise, certain tasks can keep processing, which leads to starvation of other tasks. Preemptive scheduling interrupts the processing of one task to process another (why with tasks and not network packets?). There are many scheduling disciplines: FIFO, Shortest Remaining Time, Strict Priority, Round-Robin.

Concurrency with parallelism: execute code concurrently across CPUs (clusters, cores). CPU parallelism differs from distributed systems in the ready availability of shared memory; yet to avoid the difference between parallelism across local and remote cores, many apps just use message passing for both (like HPC's use of MPI). Symmetric Multiprocessors (SMPs); Non-Uniform Memory Architectures (NUMA). Pros of NUMA: applications split between different processors can share memory close to the hardware, and bus bandwidth usage is reduced. Cons: one must ensure that applications sharing memory run on processors that share that memory.

Forms of task parallelism:
• Processes: isolated process address space; higher overhead when switching between processes.
• Threads: concurrency within a process; shared address space. Three forms: kernel threads (1:1; kernel support, can leverage hardware parallelism), user threads (N:1; thread library in the system runtime, fastest context switching, but cannot benefit from multi-threaded/multiprocessor hardware), and hybrid (M:N; schedule M user threads on N kernel threads).
    [Show full text]
  • Assessment of Barrier Implementations for Fine-Grain Parallel Regions on Current Multi-Core Architectures
Assessment of Barrier Implementations for Fine-Grain Parallel Regions on Current Multi-core Architectures. Simon A. Berger and Alexandros Stamatakis, The Exelixis Lab, Department of Computer Science, Technische Universität München, Boltzmannstr. 3, D-85748 Garching b. München, Germany. Email: [email protected]; WWW: http://wwwkramer.in.tum.de/exelixis/

Abstract: Barrier performance for synchronizing threads on current multi-core systems can be critical for scientific applications that traverse a large number of relatively small parallel regions, that is, that exhibit an unfavorable computation-to-synchronization ratio. By means of a synthetic and a real-world benchmark we assess 4 alternative barrier implementations on 7 current multi-core systems with 2 up to 32 cores. We find that barrier performance is application- and data-specific with respect to cache utilization, but that a rather naïve lock-free barrier implementation yields good results across all applications and multi-core systems tested. We also assess distinct implementations of reduction operations that are computed in conjunction with the barriers. The synthetic and real-world benchmarks are made available as open-source code for further testing. Keywords: barriers; multi-cores; threads; RAxML

Figure 1. Classic fork-join paradigm: a master thread forks worker threads for each parallel region and joins them afterwards.

I. INTRODUCTION: The performance of barriers for synchronizing threads on modern general-purpose multi-core systems is of vital importance for the efficiency of scientific codes. Barrier performance can become critical if a scientific code exhibits a high number of relatively small parallel regions. In addition, we analyze the efficient implementation of reduction operations (sums over the double values produced by each for-loop iteration) that are frequently required in conjunction with barriers. (A fork-join barrier sketch follows this entry.)
    [Show full text]
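A minimal Java rendering of the fork-join cycle in the paper's Figure 1, with the reduction folded into the barrier action (the paper's own implementations are lower-level; this only shows the shape, and the thread count and workload are illustrative):

```java
import java.util.concurrent.CyclicBarrier;

// Fork-join with a barrier per parallel region: workers compute partial
// sums, and the barrier action performs the reduction at each join point.
public class BarrierReduction {
    static final int THREADS = 4;
    static final double[] partial = new double[THREADS];

    public static void main(String[] args) {
        CyclicBarrier barrier = new CyclicBarrier(THREADS, () -> {
            double sum = 0;
            for (double p : partial) sum += p;       // reduction at the join
            System.out.println("iteration sum = " + sum);
        });
        for (int t = 0; t < THREADS; t++) {
            final int id = t;
            new Thread(() -> {
                for (int iter = 0; iter < 3; iter++) { // many small parallel regions
                    partial[id] = work(id, iter);
                    try { barrier.await(); }           // join, then the next fork
                    catch (Exception e) { return; }
                }
            }).start();
        }
    }

    static double work(int id, int iter) { return id + 0.1 * iter; }
}
```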
  • Migrating Thread-Based Intentional Concurrent Programming to a Task-Based Paradigm
University of New Hampshire, University of New Hampshire Scholars' Repository. Master's Theses and Capstones, Student Scholarship, Fall 2016. MIGRATING THREAD-BASED INTENTIONAL CONCURRENT PROGRAMMING TO A TASK-BASED PARADIGM. Seth Adam Hager, University of New Hampshire, Durham. Follow this and additional works at: https://scholars.unh.edu/thesis. Recommended Citation: Hager, Seth Adam, "MIGRATING THREAD-BASED INTENTIONAL CONCURRENT PROGRAMMING TO A TASK-BASED PARADIGM" (2016). Master's Theses and Capstones. 885. https://scholars.unh.edu/thesis/885. This Thesis is brought to you for free and open access by the Student Scholarship at University of New Hampshire Scholars' Repository. It has been accepted for inclusion in Master's Theses and Capstones by an authorized administrator of University of New Hampshire Scholars' Repository. For more information, please contact [email protected]. BY Seth Hager, B.M., University of Massachusetts Lowell, 2004. THESIS submitted to the University of New Hampshire in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, September 2016. This thesis has been examined and approved in partial fulfillment of the requirements for the degree of Master of Science in Computer Science by: thesis director Philip J. Hatcher, Professor of Computer Science; Michel H. Charpentier, Associate Professor of Computer Science; R. Daniel Bergeron, Professor of Computer Science. August 16th, 2016. Original approval signatures are on file with the University of New Hampshire Graduate School. DEDICATION: For Lily and Jacob. ACKNOWLEDGMENTS: I would like to thank the members of my committee for all of their time and effort.
    [Show full text]
  • Intrinsically-Typed Mechanized Semantics for Session Types
Intrinsically-Typed Mechanized Semantics for Session Types. Peter Thiemann, University of Freiburg, Germany. [email protected]

ABSTRACT: Session types have emerged as a powerful paradigm for structuring communication-based programs. They guarantee type soundness and session fidelity for concurrent programs with sophisticated communication protocols. As type soundness proofs for languages with session types are tedious and technically involved, it is rare to see mechanized soundness proofs for these systems. We present an executable intrinsically typed small-step semantics for a realistic functional session type calculus. The calculus includes linearity, recursion, and recursive sessions with subtyping. Asynchronous communication is modeled with an encoding. The semantics is implemented in Agda as an intrinsically typed, interruptible CEK machine. This implementation proves type preservation and a particular notion of progress by construction.

There are synchronous and asynchronous variants [Gay and Vasconcelos 2010; Lindley and Morris 2015; Wadler 2012], or variants with additional typing features (e.g., dependent types [Toninho and Yoshida 2018], context-free [Padovani 2017; Thiemann and Vasconcelos 2016]), as well as extensions to deal with more than two participants in a protocol [Honda et al. 2008]. They also found their way into functional and object-oriented languages [Dezani-Ciancaglini et al. 2009]. Generally, session types are inspired by linear type systems and systems with typestate, as channels change their type at each communication operation and must thus be handled linearly. Some variants are directly connected to linear logic via the Curry-Howard correspondence [Caires and Pfenning 2010; Toninho et al. 2013; Wadler 2012].
    [Show full text]
  • A Comparison of Parallel Design Patterns for Game Development
Faculty of Technology & Society (Teknik och samhälle), Computer Science and Media Technology (Datavetenskap och medieteknik). Bachelor-level thesis, 15 credits (Examensarbete 15 högskolepoäng, grundnivå).

A Comparison of Parallel Design Patterns for Game Development (En Jämförelse av Parallella Designmönster för Spelutveckling). Robin Andblom, Carl Sjöberg. Malmö, Sweden. Degree: Bachelor, 180 hp. Main field: Computer Science. Program: Game Development. Supervisor: Carl Johan Gribel. Examiner: Carl-Magnus Olsson. Date of final seminar: 2018-01-15.

Abstract: As processor performance capabilities can only be increased through the use of a multicore architecture, software needs to be developed to utilize the parallelism offered by the additional cores. Especially game developers need to seize this opportunity to save cycles and decrease the general rendering time. One of the existing advances towards this potential has been the creation of multithreaded game engines that take advantage of the additional processing units. In such engines, different branches of the game loop are parallelized. However, the specifics of the parallel design patterns used are not outlined. Neither are any ideas of how to combine these patterns proposed. These missing factors are addressed in this article, to provide a guideline for when to use which one of two parallel design patterns: fork-join and pipeline parallelism.
    [Show full text]
  • Leader/Followers a Design Pattern for Efficient Multi-Threaded I/O
Leader/Followers: A Design Pattern for Efficient Multi-threaded I/O Demultiplexing and Dispatching. Douglas C. Schmidt and Carlos O'Ryan ({schmidt,coryan}@uci.edu), Electrical and Computer Engineering Dept., University of California, Irvine, CA 92697, USA; Michael Kircher ([email protected]), Siemens Corporate Technology, Munich, Germany; Irfan Pyarali ([email protected]), Department of Computer Science, Washington University, St. Louis, MO 63130, USA.

1 Intent: The Leader/Followers design pattern provides a concurrency model where multiple threads can efficiently demultiplex events and dispatch event handlers that process I/O handles shared by the threads.

2 Example: Consider the design of a multi-tier, high-volume, on-line transaction processing (OLTP) system shown in the following figure. In this design, front-end communication servers route requests to back-end database servers. The front-end communication servers are actually "hybrid" client/server applications that perform two primary tasks. First, they receive requests arriving simultaneously from hundreds or thousands of remote clients over wide-area communication links, such as X.25 or TCP/IP. Second, they validate the remote client requests and forward valid requests over TCP/IP connections to back-end database servers. In contrast, the back-end database servers are "pure" servers that perform the designated transactions. After a transaction commits, the database server returns its results to the associated communication server, which then forwards the results back to the originating remote client. The servers in this OLTP system spend most of their time processing various types of I/O operations in response to requests. (A minimal Java sketch of the pattern follows this entry.)
    [Show full text]
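A minimal shape sketch of the pattern in Java (the pattern's published examples are C++; the accept()-based handle, port, and pool size here are illustrative assumptions):

```java
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Leader/Followers: all threads share one handle; exactly one (the leader)
// waits for an event, promotes a follower, then processes the event itself,
// avoiding a hand-off of the event between threads.
public class LeaderFollowers implements Runnable {
    private final ServerSocket handle;                  // shared handle: a listening socket
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition followers = lock.newCondition();
    private boolean leaderPresent = false;

    LeaderFollowers(ServerSocket s) { handle = s; }

    public void run() {
        while (true) {
            lock.lock();                                // join the follower set
            try {
                while (leaderPresent) followers.awaitUninterruptibly();
                leaderPresent = true;                   // this thread is now the leader
            } finally { lock.unlock(); }

            Socket event;
            try { event = handle.accept(); }            // leader demultiplexes I/O
            catch (Exception e) { promoteFollower(); return; }

            promoteFollower();                          // a follower becomes the new leader
            process(event);                             // ...while this thread dispatches
        }
    }

    private void promoteFollower() {
        lock.lock();
        try { leaderPresent = false; followers.signal(); }
        finally { lock.unlock(); }
    }

    private void process(Socket s) {
        try (s) { s.getOutputStream().write('!'); } catch (Exception e) { }
    }

    public static void main(String[] args) throws Exception {
        LeaderFollowers lf = new LeaderFollowers(new ServerSocket(8080));
        for (int i = 0; i < 4; i++) new Thread(lf).start();  // a small pool of peers
    }
}
```

The point of the protocol is that the event is processed by the very thread that detected it; there is no producer/consumer queue between an I/O thread and worker threads.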