Microarchitectural Implications of Event-driven Server-side Web Applications

Yuhao Zhu Daniel Richins Matthew Halpern Vijay Janapa Reddi

The University of Texas at Austin, Department of Electrical and Computer Engineering {yzhu, drichins, matthalp}@utexas.edu, [email protected]

Abstract

Enterprise Web applications are moving towards server-side scripting using managed languages. Within this shifting context, event-driven programming is emerging as a crucial programming model to achieve scalability. In this paper, we study the microarchitectural implications of server-side scripting, JavaScript in particular, from a unique event-driven programming model perspective. Using the Node.js framework, we come to several critical microarchitectural conclusions. First, unlike traditional server workloads such as CloudSuite and BigDataBench that are based on the conventional thread-based execution model, event-driven applications are heavily single-threaded, and as such they require significant single-thread performance. Second, the single-thread performance is severely limited by the front-end inefficiencies of today's server processor microarchitecture, ultimately leading to overall execution inefficiencies. The front-end inefficiencies stem from the unique combination of limited intra-event code reuse and large inter-event reuse distance. Third, through a deep understanding of event-specific characteristics, architects can mitigate the front-end inefficiencies of the managed-language-based event-driven execution via a combination of instruction cache insertion policy and prefetcher.

Categories and Subject Descriptors

C.1 [Processor Architecture]: General

Keywords

Microarchitecture, Event-driven, JavaScript, Prefetcher

1. Introduction

Processor architecture advancements have been largely driven by careful observations made of software. By examining and leveraging the inherent workload characteristics, such as instruction-, thread-, and data-level parallelism, processor architects have been able to deliver more efficient computing by mapping software efficiently to the hardware substrate. We must continue to track the developments in the software ecosystem in order to sustain architecture innovation.

At the cusp of the software evolution are managed scripting languages, which provide portability, enhanced security guarantees, extensive library support, and automatic memory management. In particular, JavaScript has risen to the top of the programming language rankings, surpassing C, C++, and Java to become the most widely used language among developers [1]. From interactive applications in mobile systems to large-scale analytics software in datacenters, JavaScript is ushering in a new era of execution challenges for the underlying processor architecture.

In this paper, we focus on server-side JavaScript, specifically its implications for the design of future server processor architectures. While numerous studies have focused on various aspects of dynamic languages on hardware, such as garbage collection [2], type checking [3], exploiting parallelism [4, 5], and leveraging hardware heterogeneity [6], we study the implications of the programming model that is emerging in server-side JavaScript applications, i.e., asynchronous event-driven programming.

In server-side asynchronous event-driven programming [7, 8], user requests are treated as application events and inserted into an event queue. Each event is associated with an event callback. The event-driven system employs a single-threaded event loop that traverses the event queue and executes any available callbacks sequentially. Event callbacks may initiate additional I/O events that are executed asynchronously to the event loop in order to not block other requests. The event-driven model has a critical scalability advantage over the conventional thread-based model because it eliminates the major inefficiencies associated with heavy threading, e.g., context switching and thread-local storage [9, 10].
Thus, the event-driven model has been widely adopted in building scalable Web applications, mainly through the Node.js [11] framework.

We find that event-driven server applications are fundamentally bounded by single-core performance because of their reliance on the single-threaded event loop. However, unlike conventional single-threaded benchmarks (e.g., SPEC CPU 2006) for which current processor architectures are highly optimized, event-driven server applications suffer from severe microarchitecture inefficiencies, particularly front-end bottlenecks, i.e., high instruction cache and TLB miss rates and high branch misprediction rates. Moreover, unlike conventional heavily threaded enterprise workloads that also suffer from front-end issues, the front-end bottleneck of an event-driven server application stems from the single-threaded event execution model, rather than from microarchitectural resources being clobbered by multiple threads. With the front end constituting up to half of the execution cycles, it is clear that current server processor designs are suboptimal for executing event-driven workloads.

To improve the front-end efficiency of event-driven server applications, we study them from an event perspective. We find that the severe front-end issue arises fundamentally because events have large instruction footprints with little intra-event code reuse. Recent studies on client-side event-driven applications derive similar conclusions [12, 13]. We take this research a step further to make the key observation that event-driven programming inherently exposes strong inter-event code reuse. Taking the L1 I-cache as a starting point, we show that coordinating the cache insertion policy and the instruction prefetcher can exploit the unique inter-event code reuse and reduce the I-cache MPKI by 88%.

In summary, we make the following contributions:
• To the best of our knowledge, we are the first to systematically characterize server-side event-driven execution inefficiencies, particularly the front-end bottlenecks.
• We tie the root cause of the front-end inefficiencies to characteristics inherent to the event-driven programming model, which gives critical microarchitectural optimization insights.
• We show that it is possible to drastically optimize away the instruction cache inefficiencies by coordinating the cache insertion policy and the prefetching strategy.

The remainder of the paper is structured as follows. Sec. 2 provides a background into asynchronous event-driven programming and why it has emerged as a crucial tipping point in server-side programming. Sec. 3 presents the workloads we study and shows the event-driven applications' single-threaded nature. Sec. 4 discusses our microarchitecture analysis and presents the extreme front-end bottlenecks. Sec. 5 shows that it is possible to leverage the inherent event-driven execution characteristics to largely mitigate instruction cache inefficiencies through a combined effort between the cache insertion policy and a suitable prefetcher. Sec. 6 discusses the related work, and Sec. 7 concludes the paper.

Figure 1: In the thread-based execution model, each incoming client request is assigned to a unique thread, which is responsible for returning a response to the client.

Figure 2: In the event-based execution model, each incoming client request is handled by the single-threaded event loop. I/O operations are handled asynchronously.

2. Background

Web applications employ server-side scripting to respond to network requests and provide dynamic content to end-users. The traditional approach to providing responsiveness to end-users at scale has been thread-based programming, i.e., increasing the number of threads as the number of incoming requests increases. Recently, because of several fundamental limitations of heavy multi-threading that limit system scalability, many industry leaders, such as eBay, PayPal, and LinkedIn, have started to adopt event-driven programming as an alternative to achieve scalability more efficiently.

In this section, we first discuss thread-based execution and its limitations (Sec. 2.1). On that basis, we explain why event-driven programming is emerging as an alternative for developing large-scale Web applications (Sec. 2.2).

2.1. Thread-based Programming

Traditional server-side scripting frameworks, such as PHP and Ruby, pair user requests with threads, commonly known as the "thread-per-request" or "single-request-per-script" execution model. These thread-based execution models, shown in generality in Fig. 1, consist of a dispatch thread that assigns each incoming request to a worker thread for processing. The result is that at any given time, the server holds the same number of threads as the number of outstanding requests.

While the thread-based execution model is intuitive to program with, it suffers from fundamental drawbacks.
As the number of client requests increases, so does the number of active threads in the system. As a result, the operating system overhead, such as thread switching and the aggregated memory footprint, grows accordingly. Therefore, operating systems typically impose a maximum number of threads that they support, which fundamentally limits server scalability [10].

To address the scalability limitations of thread-based execution on a single node, modern datacenters scale out the hardware resources. Instead of scaling the number of threads on a single compute node, threads are distributed and balanced across a large number of compute nodes. However, increasing the number of physical compute nodes has direct financial implications for the service provider. Scaling out requires more power, cooling, and administration, which directly affect total cost of ownership [20].

2.2. Event-driven Programming

A more scalable alternative to the conventional thread-per-request execution model is event-driven execution, as employed in Node.js [11]. Fig. 2 shows the event-driven execution model. Incoming I/O requests are translated to events, each associated with an event handler, also referred to as a callback function. Events are pushed into an event queue, which is processed by the single-threaded event loop. Each event loop iteration checks whether any new I/O events are waiting in the queue and executes the corresponding event handlers sequentially. An event handler may also initiate additional I/O operations, which are executed asynchronously with respect to the event loop in order to keep the main event loop free.

Event-driven server designs achieve orders of magnitude performance improvements over their thread-based counterparts in both industry [21, 22] and academia [9, 10], because they are not limited by the number of threads a system supports. Rather, their scalability depends on the performance of the single-threaded event loop. As such, event-driven programming restores the emphasis on scale-up single-core processor designs for large-scale Web applications.
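To make the model in Fig. 2 concrete, the following is a minimal, self-contained sketch of an event-driven Node.js server (illustrative only; the file name and port are hypothetical and not taken from the workloads studied in this paper). The request callback and the I/O-completion callback both run on the single-threaded event loop, while the file read itself is serviced asynchronously by the I/O helper threads:

    // Minimal event-driven server sketch (Node.js). Every callback below is
    // queued as an event and executed sequentially by the single event loop.
    const http = require('http');
    const fs = require('fs');

    const server = http.createServer((req, res) => {
      // Request callback: runs on the event loop for each incoming request.
      fs.readFile('./page.html', (err, data) => {
        // I/O completion callback: queued as a new event once the
        // asynchronous read finishes; the event loop was never blocked.
        if (err) {
          res.writeHead(500);
          res.end('internal error');
          return;
        }
        res.writeHead(200, { 'Content-Type': 'text/html' });
        res.end(data);
      });
    });

    server.listen(8080); // all requests multiplex onto one event loop thread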

3. Event-driven Server Applications

Event-driven server-side scripting has not been extensively investigated in the past. In this section, we identify several important Web application domains that have embraced event-driven programming and describe the workloads we use to represent them (Sec. 3.1). These applications use Node.js, which is the most popular server-side event-driven platform based on JavaScript [11]. Numerous companies [23] such as eBay, PayPal, and LinkedIn have adopted it to improve the efficiency and scalability of their application services [21, 22, 24]. We then describe how we generate loads to study realistic usage scenarios of these applications (Sec. 3.2). Given the selected applications and their loads, we conduct a system-level performance analysis of the Node.js software architecture to understand the applications' execution behaviors (Sec. 3.3). The key observation is that while Node.js is multi-threaded, the single-threaded event loop that sequentially executes all the event callbacks dominates the CPU time. We use this observation as our rationale to focus on analyzing the event loop execution in the rest of the paper.

3.1. Workloads

We study important domains that have begun to adopt event-driven programming on the server side, as shown in Table 1. To enable future research, as well as reproducibility of our results, we intentionally choose applications that are open-sourced. We release the workload suite at https://github.com/nodebenchmark.

Table 1: Summary of event-driven server-side Node.js workloads studied in this paper
  Etherpad Lite [14] (Document Collaboration): An online word processing engine, similar to services such as Google Docs and Microsoft Office Online, for real-time document editing and management. The users we simulate create documents, add and edit document contents, and delete documents.
  Let's Chat [15] (Messaging): A multi-person messaging platform, akin to Google Chat and WhatsApp, where users participate in group chats. Each user enters a group chat and then sends and receives messages.
  Lighter [16] (Content Management): A blogging platform comparable to Blogspot. Each of the users requests resources, such as HTML, CSS, JavaScript, and images, corresponding to blog post webpages.
  Mud [17] (Gaming): A real-time multi-user dungeon game with multiple users playing the game simultaneously.
  Todo [18] (Task Management): A productivity tool and service, similar to the Google Task list management service within Gmail. Users create, modify, and delete multiple tasks within their task lists.
  Word Finder [19] (API Services): A word search engine that finds words matching a user-specified pattern. The pattern is searched against a 200,000-word corpus. Users execute several pattern queries.

Document Collaboration. Services such as Google Docs and Microsoft Office Online allow multiple users to collaborate on documents in real time. As users edit documents, they communicate with the application server to report document updates. We study Etherpad Lite [14], a real-time collaborative text editing application.
Each user creates a new document, makes several edits to that document, and then finally deletes it once all edits have been made.

Messaging. Messengers are amongst the most popular applications used today [25]. Each user sends and receives messages by communicating with the application server that manages a message database. We study the Let's Chat [15] messaging application. Each user initiates a new chat, sends multiple messages to other users, and exits the service.

Content Management. Many applications and services, from news outlets (e.g., CNN) to blogging platforms (e.g., WordPress), rely on content management platforms to serve a variety of file resources to users. We study Lighter [16], a blogging platform that serves the webpage resources corresponding to a blog post. We focus on users who request different blog entry resources using a regular Web browser.

Gaming. Online computer gaming, already very popular, is increasingly moving to support multiplayer interaction. These games, such as Farmville and Words With Friends, have to manage the shared game state across multiple users. We study the multiplayer game Mud [17], wherein each player is assigned a position on a grid. The players can then update their positions to navigate the environment.

Task Management. Cloud-based task management tools such as Asana and Trello have become increasingly popular for organizations and individuals to stay organized. Tasks can be created, modified, and deleted based on what the user desires. We study a simplified form of task management, using the Todo [18] task management application. Users can create, view, and modify various tasks in a task list.

API Services. Third-party API services provide functionalities that otherwise would not be available in an application. One such example is the Google autocomplete API that helps applications automatically fill out forms [26]. We restrict our study to Word Finder [19], a pattern matching API service used for autocompletion and spell checking. Each user queries the API with various word patterns, which are matched against an English dictionary of 200,000 words.

Figure 3: Execution time distribution of the single-threaded event loop within the Node.js workloads we study. Event callback execution dominates in all six applications: (a) Etherpad (94%), (b) Let's Chat (85%), (c) Lighter (95%), (d) Mud (95%), (e) Todo (90%), (f) Word Finder (99%); the remainder is split among idle time, event loop management, and other work.

3.2. Load Generation

Our study is focused on processor design at the microarchitecture level, thus we focus on a single instance of Node.js. Typically, a production server will run multiple instances of Node.js to handle a massive number of requests [27]. However, by design, a single instance of Node.js runs primarily in a single thread, coupled with a handful of helper threads.

All of our applications run on the most recent stable release of Node.js at the time of writing (version 0.12.3). To exercise the Node.js applications, we develop a query generator to emulate multiple users making requests to the Node.js application under study. We model individual users making requests to the server under realistic usage scenarios, which we obtain by observing requests made by real users. We also interleave and serialize concurrent user requests to the server. This does not change the internal execution characteristics of the Node.js application (events will eventually be serialized by the single-threaded event loop anyway) but enables crucial reproducibility across experiments. Unless otherwise noted, our results are based on simulating 100 users to minimize experiment runtime, as our detailed microarchitectural analyses are based on simulations. We verified that load-testing with a larger number of users or using different request interleavings did not change the results and conclusions presented throughout the paper.
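The following is a hypothetical sketch of such a query generator (the endpoint paths, port, and per-user request script are illustrative assumptions, not the actual scripts used in this study). Each emulated user replays a fixed request sequence, and requests are serialized to keep runs reproducible:

    // Hypothetical load-generator sketch: emulate N users, each replaying a
    // recorded request sequence against a locally running Node.js server.
    const http = require('http');

    const userScript = ['/create', '/edit?op=insert', '/edit?op=append', '/delete'];

    function issue(path) {
      return new Promise((resolve, reject) => {
        http.get({ host: 'localhost', port: 8080, path }, (res) => {
          res.resume();              // drain the response body
          res.on('end', resolve);
        }).on('error', reject);
      });
    }

    async function run(users) {
      for (let u = 0; u < users; u++) {
        for (const path of userScript) {
          await issue(path);         // serialize requests for reproducibility
        }
      }
    }

    run(100).catch(console.error);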
3.3. Performance Analysis

In order to understand how these applications exercise the Node.js runtime and to guide the simulations used throughout the remainder of the paper, we conduct a system-level performance analysis using the Intel VTune system profiler on a quad-core Intel i5 processor. Because we are running on real hardware, in this section we are able to conduct the performance analysis using 100,000 users for each application.

While Node.js is actually multi-threaded, we find that the execution time of each Node.js application is primarily compute-bound within the single-threaded event loop that is responsible for executing the JavaScript-based event callbacks. Our measured results emphasize the need to deeply understand the event-driven execution nature of these workloads.

Event Loop. Although the Node.js architecture possesses multiple threads for handling asynchronous I/O, its computation is bounded by the single-threaded event loop. The thread-level parallelism (TLP [28]) column in Table 2 shows that Node.js applications are effectively single-threaded, indicating the importance of single-core performance. This aligns with the experience reported from industry [24, 29].

To further understand the importance of single-core performance, we study the compute versus memory boundedness of the single-threaded event loop by measuring how performance changes with CPU frequency. Fig. 4 shows each application's normalized execution time as the CPU frequency scales from the peak (3.2 GHz) down to 50%. We observe that the overall performance scales almost linearly with the CPU frequency. For example, for Mud, halving the CPU frequency translates to an almost 2X slowdown, emphasizing the importance of single-core performance.

Figure 4: Application performance as CPU frequency scales.

Table 2: Execution characteristics of Node.js applications.
  Application     User  System  TLP    Generated  VM      Code-Gen
  Etherpad Lite   98%   2%      1.002  79.8%      19.21%  0.5%
  Let's Chat      95%   5%      1.010  56.0%      38.21%  5.3%
  Lighter         96%   4%      1.011  55.6%      40.59%  4.2%
  Mud             99%   1%      1.000  53.7%      44.09%  1.8%
  Todo            85%   15%     1.002  63.1%      33.95%  3.1%
  Word Finder     99%   <1%     1.000  63.8%      30.66%  5.2%
  (User, System, and TLP are system-level metrics; Generated, VM, and Code-Gen break down the callback-level execution time.)

Figure 5: CPI stacks for the different Node.js applications.

Due to the dominant nature of the event loop thread, we provide an execution breakdown of the event loop for all Node.js applications in Fig. 3. We divide the event loop execution time into four major categories: event callback execution, event loop management (through libuv [30]), idle, and other. We see that event callback execution dominates the event loop execution time. It consumes between 85% (Let's Chat) and nearly 100% (Word Finder) of the event loop execution time. In contrast, the event loop management overhead is minimal. In all applications but Mud, event loop management is responsible for less than 1% of the execution time. Idleness due to I/O is also relatively small across all of the applications. Except for Let's Chat, whose idleness is 10%, the applications exhibit idleness of less than 5%.

Event Callbacks. Because of the dominance of event callback execution in the event loop thread, we provide further details of callback execution behavior. Event callbacks are written in JavaScript. To execute event callbacks, Node.js relies on Google's V8 JavaScript engine [31]. V8 supports JavaScript callback execution through various functionalities such as just-in-time (JIT) compilation, garbage collection, and built-in libraries. Table 2 dissects the callback function execution time into three major categories. Generated indicates the execution of the dynamically compiled code of callback functions. VM corresponds to virtual machine management such as garbage collection and code cache handling. Code-Gen corresponds to JIT compilation.

We make two important observations. First, the majority of the callback time is spent executing the application code (Generated). Second, V8 spends little time generating code (Code-Gen), with the largest share observed in Let's Chat at 5.3%. This is important to verify because it confirms that our analysis is conducted on each application's steady state, which is the normal state for server applications.

4. Microarchitectural Analysis

Given the single-threaded nature of the event loop, we conduct a microarchitectural analysis to identify the bottlenecks for efficient processing of event-driven server applications. We conduct microarchitectural bottleneck analysis using cycle-per-instruction (CPI) statistics to show that instruction delivery dominates the execution overhead (Sec. 4.1). This finding motivates us to analyze the three major microarchitectural structures that impact instruction delivery efficiency: the L1 I-cache (Sec. 4.2), the branch predictor (Sec. 4.3), and the L1 I-TLB (Sec. 4.4).

Throughout our analysis, we compare the Node.js applications against SPEC CPU 2006 because the latter has long been the de facto benchmark for studying single-threaded performance. A head-to-head comparison between Node.js and SPEC CPU 2006 workloads reveals critical insights into the unique microarchitecture bottlenecks of Node.js. Other server-side applications, such as CloudSuite [32], MapReduce [33], BigDataBench [34], and OLTP [35], do not specifically emphasize single-thread performance.

We focus on the CPU microarchitecture due to the importance of single-core performance. Hence, our experimental setup is geared toward studying core activity and does not capture I/O effects (i.e., storage and network). Node.js applications may also be I/O intensive, but a complete I/O characterization is beyond the scope of this paper.

4.1. Microarchitectural Bottleneck Analysis

We analyze the microarchitecture bottlenecks of event-driven Node.js applications by examining their cycle-per-instruction (CPI) stacks. A CPI stack breaks down the execution time of an application into different microarchitectural activities (e.g., accessing the cache), showing the relative contribution of each activity. Optimizing the largest component(s) in the CPI stack leads to the largest performance improvement. Therefore, CPI stacks are used to identify sources of microarchitecture inefficiency [36, 37].
We use SniperSim [38] to simulate all the Node.js applications and generate their corresponding CPI stacks. The CPI stack for the main event loop thread within each application is shown in Fig. 5. The components of each bar represent the percentage of cycles that an application spends on a particular type of microarchitectural activity. For example, the base component represents an application's execution time if there were no pipeline stalls. The ifetch and branch components indicate the total processing overhead due to instruction cache misses and branch mispredictions, respectively. The mem- components indicate the time spent accessing the different memory hierarchy levels. The sync- and imbalance-end components correspond to multithreaded execution overheads.

We make two observations from Fig. 5. First, about 80% of the processing time is spent on various types of on-chip microarchitectural activities. Amongst all sources of overall processing overhead, fetching instructions (ifetch) and branch prediction (branch) emerge as the two most significant. These two components alone contribute about 50% of the total execution overhead, which implies that delivering instructions to the backend of the pipeline is of critical importance for improving the performance of event-driven Node.js applications.

Second, the processing overhead for synchronization (sync-sleep and sync-futex) is negligible. We corroborate this result by performing a VTune analysis with 50,000 users and find that the time spent on synchronization is only about 0.5%. This is not an unexpected result because Node.js uses a single thread to execute event callbacks, and all I/O operations are asynchronous without having to explicitly synchronize. This implies that improving the performance of synchronization primitives in the processor will likely not yield much benefit for event-driven server applications.

Thus, we devote the rest of the section to a detailed analysis of the three microarchitectural resources that significantly impact the front-end's execution efficiency: the L1 instruction cache and L1 instruction TLB (for instruction fetching) and the branch predictor (for branch prediction).

4.2. Instruction Cache Analysis

To understand the front-end execution inefficiency of Node.js applications, we start by examining the workloads' instruction cache (I-cache) behavior. We sweep a wide range of cache configurations in order to study the workloads' instruction footprints. We show that all the Node.js applications suffer from significantly higher misses per kilo-instruction (MPKI) than the vast majority of SPEC CPU 2006 applications on a standard cache configuration. To achieve SPEC CPU-like MPKI, the processor core would require an I-cache so large that it cannot be practically implemented in hardware.

Current Design Performance Implications. We study a modern CPU cache with 32 KB capacity, 64-byte line size, and 8-way set associativity. At this default configuration, we examine the I-cache's efficiency using the MPKI metric. We compare the Node.js programs against the SPEC CPU 2006 workloads' MPKIs and present the results in Fig. 6a.

We make two important observations. First, the average I-cache MPKI of Node.js applications is 4.2 times higher than that of the SPEC applications. Even Etherpad, which has the lowest I-cache MPKI of all the Node.js applications, shows over twice the MPKI of the SPEC average (indicated by the horizontal dotted line in the figure). At the other extreme, Word Finder and Mud have I-cache MPKIs higher than all but one of the SPEC applications.

Second, the typical behavior of event-driven applications is on par with the worst-case behavior of single-threaded SPEC CPU 2006 applications that are known to stress the microarchitecture. The event-driven Node.js applications have MPKIs comparable to some of the worst MPKIs of SPEC applications, such as gobmk, omnetpp, and cactusADM.

Figure 6: I- and D-cache MPKI comparison for Node.js applications and SPEC CPU 2006. (a) Node.js applications show worse I-cache locality than SPEC CPU 2006. (b) Current designs already capture the data locality of Node.js.

To understand the reason for the poor I-cache performance, we study the instruction reuse distance to quantify the applications' working set size. Reuse distance is defined as the number of distinct instructions between two dynamic instances of the same instruction [39]. Fig. 7 shows the instruction reuse distances for all of the Node.js applications. Each (x, y) point corresponds to the percentage of instructions (y) that are at or below a particular reuse distance (x). For comparative purposes, we also include two extreme applications from the SPEC CPU 2006 suite: lbm has the lowest I-cache MPKI and omnetpp suffers from one of the highest I-cache MPKIs.

The event-driven Node.js applications have very large reuse distances. The instruction footprint of omnetpp, the worst SPEC CPU 2006 application, can be effectively captured within a reuse distance of 2^11. In our measurement, the average instruction size is about 4 bytes; this means an I-cache of just 8 KB would be sufficient to capture omnetpp's instruction locality (assuming a fully associative cache). In contrast, Let's Chat has a significantly larger reuse distance of up to 2^18 instructions, requiring a 1 MB cache to capture.
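As a concrete illustration of the metric, the sketch below computes the reuse distance for every dynamic instruction in a trace of program counters (the trace format is an assumption for illustration, not the paper's measurement tool):

    // Reuse-distance sketch: for each dynamic instruction, count the distinct
    // PCs executed since the previous dynamic instance of the same PC.
    // O(n^2) in trace length; fine for illustration, not for full traces.
    function reuseDistances(trace) {
      const lastSeen = new Map();   // PC -> index of its previous occurrence
      const distances = [];
      for (let i = 0; i < trace.length; i++) {
        const pc = trace[i];
        if (lastSeen.has(pc)) {
          const between = new Set(trace.slice(lastSeen.get(pc) + 1, i));
          distances.push(between.size);
        }
        lastSeen.set(pc, i);
      }
      return distances;
    }

    // Example: reuseDistances([0x40, 0x44, 0x48, 0x40]) returns [2].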
For comparison purposes, we also examine the data cache behavior of Node.js applications and contrast it against the SPEC CPU 2006 applications. Fig. 6b shows the D-cache MPKI of Node.js and SPEC CPU 2006 applications. Event-driven Node.js applications do not appear to stress the data cache heavily. All the Node.js applications have MPKIs that are significantly lower than the SPEC CPU average. Even the extreme cases, Word Finder and Mud, which have the highest MPKIs of 37 and 34, are comparable to the lowest MPKIs of the SPEC CPU applications.

Ideal Resource Requirements. To determine the ideal instruction cache resource requirements for our event-driven Node.js applications, we sweep the I-cache size and determine application sensitivity under a variety of resource configurations. We find that the instruction working set sizes approach 1 MB, which far exceeds the typical L1 cache capacity.

In Fig. 8, the cache size is swept from 16 KB to 1024 KB on the x-axis (in log scale) and the resulting MPKIs of the Node.js applications are shown on the y-axis. The SPEC CPU 2006 average I-cache MPKI for a 32 KB cache is indicated by the horizontal dotted line. The most significant observation from Fig. 8 is that the I-cache MPKI keeps improving as the cache size increases for all of the Node.js applications. Some applications, such as Word Finder and Etherpad, show a knee in their MPKIs at the 128 KB cache size. However, it is not until 256 KB, or even 512 KB, that all the event-driven applications have MPKIs comparable to the SPEC CPU 2006 average. Such a large L1 I-cache is infeasible for a practical implementation.

Instruction cache performance on the event-driven applications also cannot be easily improved by adjusting conventional cache parameters, such as line size and associativity. Using Mud, which has an MPKI close to the average of all Node.js applications, as an example, Fig. 9 shows the impact of line size and associativity while keeping the same cache capacity. The line size is held constant while sweeping the associativity from 4 to 16 ways, and the associativity is then held at 8 ways while sweeping the line size from 32 to 128 bytes. The I-cache MPKI improves when the two parameters are properly selected. For example, increasing the line size from 64 bytes to 128 bytes improves the MPKI by nearly 25%, indicating that Node.js applications exhibit a noticeable level of spatial locality in their code execution behavior. However, the 43 MPKI on a 128-byte line is still significantly higher than the average 14 MPKI of the SPEC CPU 2006 applications.

Comparing the impact of the two parameters, changing the associativity has less impact than changing the cache line size. Increasing the cache associativity actually worsens the MPKI, by about 10 on average for Mud. Between increasing associativity and increasing line size while keeping the cache size the same, increasing the line size to capture spatial locality is a better design trade-off than increasing the associativity to reduce cache conflict misses. But even this cannot reduce the cache misses to the average level of the SPEC CPU 2006 workloads; the difference is still as much as two orders of magnitude or more.

Figure 7: Instruction reuse distances for Node.js and SPEC CPU applications.
Figure 8: I-cache MPKI sensitivity of Node.js applications to cache sizes.
Figure 9: Mud's MPKI with respect to cache line size and associativity.

4.3. Branch Prediction Analysis

Event-driven Node.js applications suffer from poor branch prediction performance. This behavior stems from the large number of branch instructions in the Node.js applications. In SPEC CPU 2006, only 12% of all dynamic instructions are branches; in Node.js applications, 20% of all instructions are branches. As such, different branch instructions tend to alias into the same branch prediction counters and are thus likely to pollute each other's predictions. We further show that reducing branch aliasing by simply scaling the branch predictor structures would require excessive resources that are infeasible to implement.

Current Design Performance Implications. We compare Node.js applications with SPEC CPU 2006 applications under three common branch predictor designs: global, local, and tournament predictors. Intel and AMD do not provide the necessary details to mimic their actual implementations. However, they do provide sufficient information about program optimization [40] that indirectly indicates reasonable predictor parameters. Based on those informational resources, we mimic branch predictor parameters that are typically adopted in today's processors. For all three predictors, we use history registers of 12 bits, which leads to 4 K unique bimodal predictors. The local predictor is configured to use 256 local branch histories. The tournament predictor further utilizes another 4 K bimodal prediction array of its own.

Branch misprediction results are shown in Fig. 10. We draw two conclusions. First, even under the best-performing predictor (the tournament predictor), Node.js applications have an average misprediction rate (8.8%) over 2 times higher than that of SPEC CPU (3.7%). Four Node.js applications (Todo, Mud, Let's Chat, and Lighter) are as hard to predict as the hardest of the SPEC CPU programs (e.g., gobmk and astar). Second, the performance difference between the local and global predictors depends heavily on the application. Therefore, a tournament predictor is necessary to achieve better prediction results. The local predictor is equivalent to or more accurate than the global predictor for Word Finder and Etherpad but performs worse on the other four applications.

Figure 10: Comparison of Node.js and SPEC CPU 2006 applications under three classic branch predictor designs.

To understand the high branch misprediction rate of the event-driven applications, we study the possibility for destructive branch aliasing to occur. Destructive branch aliasing arises when two or more branch instructions rely on the same prediction counter. We quantify branch aliasing by capturing the number of unique branches between two dynamic instances of the same branch instruction being predicted. We call this number the "branch aliasing distance," which, conceptually, is similar to the instruction reuse distance that indicates the number of unique instructions occurring between two dynamic instances of a specific static instruction.

We bin the branch aliasing distances of all the bimodal predictors into 18 bins, each representing a single distance from 0 to 16, plus a final 17+ bin. A zero aliasing distance is ideal because it indicates that the branch predictor predicts the same branch instruction back-to-back without any intervening interference from other branches. A non-zero value indicates the degree of branch aliasing.
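The sketch below makes the metric concrete (the trace format, a list of {pc, index} records where index is the predictor counter selected for the branch, is an assumption for illustration; for the global predictor the index would be derived from the global history register):

    // Branch-aliasing-distance sketch: for each dynamic branch, count the
    // unique other branches that used the same prediction counter since the
    // previous dynamic instance of this branch.
    function aliasingDistances(accesses) {
      const perCounter = new Map();   // counter index -> PCs that used it, in order
      const distances = [];
      for (const { pc, index } of accesses) {
        const seq = perCounter.get(index) || [];
        const last = seq.lastIndexOf(pc);
        if (last !== -1) {
          distances.push(new Set(seq.slice(last + 1)).size);
        }
        seq.push(pc);
        perCounter.set(index, seq);
      }
      return distances;               // bin these into 0..16 and 17+
    }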

Fig. 11 shows the branch aliasing distances for the Node.js applications. It also includes the average for the SPEC CPU 2006 applications. Each (x, y) point in the figure corresponds to the percentage of dynamic branch instruction instances (y) that are at a particular branch aliasing distance (x).

Node.js applications suffer from heavier branch aliasing than SPEC CPU 2006. For the global predictor, Fig. 11a shows that about 90% of the branch predictions in the SPEC applications have zero aliasing distance. By comparison, in the Node.js applications that figure drops to about 60%. Furthermore, about 10% of the predictions in Node.js applications have an aliasing distance of 17+; SPEC has none that far.

The contrast between Node.js and SPEC applications is more prominent in the local predictor (Fig. 11b). Over 50% of the Node.js predictions have an aliasing distance of 17+, while only about 10% do in the SPEC applications. The local aliasing is higher than the global aliasing because local histories are more varied than the global history. Note that we omit the tournament predictor's aliasing as it is indexed identically to the global predictor and so would produce the same results.

Figure 11: The branch "reuse" (aliasing) distance for the global and local branch predictors across Node.js and SPEC CPU 2006. (a) Global predictor. (b) Local predictor.

Ideal Resource Requirements. To determine whether scaling the hardware resources will address the branch prediction and aliasing issues, we sweep the global and local predictor sizes. Even with much larger predictors, the full set of Node.js applications never becomes universally well predicted.

Fig. 12a shows the misprediction rates of the Node.js applications as the number of prediction table entries in the global predictor grows from 128 (2^7) to 64 K (2^16). Even with 64 K entries, Word Finder, Todo, Let's Chat, and Lighter still exceed the average SPEC CPU 2006 misprediction rate achieved with the much smaller 4 K-entry predictors. In addition, for most of the applications, as the predictor size increases, we observe a remarkably linear trend without a knee in the curve. This indicates that the branch misprediction is far from entering the region of diminishing returns, and further reducing the mispredictions would require significantly more hardware resources.

Local predictor trends are similar to the global predictor trends. Fig. 12b shows the misprediction rates as the number of local histories increases from 128 (2^7) to 16 K (2^14). Mud is a notable example in that it approaches the prediction accuracy of Word Finder and Etherpad, which are the easiest to predict (see Fig. 10). The remaining three applications, however, require heavy branch prediction hardware investment to even approach the average of the SPEC CPU 2006 applications.

Figure 12: The sensitivity of the branch misprediction rate with respect to global predictor size and local history table size. (a) Global predictor. (b) Local predictor.

4.4. Instruction TLB Analysis

Traditionally, I-TLBs have not been the primary focus of TLB-related studies due to their extremely low miss rates [41-43]. However, I-TLB performance is crucial because every instruction fetch requires a TLB lookup, and TLB misses result in expensive page table walks. Our analyses show that the event-driven Node.js applications suffer from high I-TLB miss rates. Scaling the number of TLB entries reduces the miss ratio, but only at prohibitive hardware cost.

Current Design Performance Implications. We simulate a TLB using a Pin tool that resembles the TLB found in modern Intel server processors, with 64 entries and 4-way set associativity. Because TLB results are typically sensitive to system-level activity and Pin only captures user-level code, we validated our tool's accuracy using hardware performance counters. Its accuracy is within 95% of the measured hardware TLB results on an Intel processor.

The I-TLB MPKIs of Node.js applications dwarf those of the SPEC CPU 2006 suite. Fig. 13a compares the I-TLB MPKI of Node.js applications with the SPEC applications. SPEC applications hardly experience any I-TLB misses, whereas almost all Node.js applications have close to 3 MPKI. In stark contrast, Node.js applications fall far short of the worst applications in SPEC in terms of D-TLB MPKI. As Fig. 13b shows, Node.js applications are roughly comparable to the average D-TLB miss rate of the SPEC applications.

Figure 13: I-TLB and D-TLB MPKIs for the Node.js applications and SPEC CPU 2006 applications. (a) I-TLB MPKI. (b) D-TLB MPKI.

To understand whether the poor I-TLB performance is caused by a large code footprint, we analyze the contribution of the static code footprint to dynamic instruction execution behavior. Specifically, we study whether the event-driven Node.js applications contain a few hot instructions or a lot of cold instructions that contribute to the majority of the dynamic instructions that impact the TLB's performance.

We discover that Node.js applications have a small number of hot instructions that contribute to a large percentage of the total dynamic instruction count. Fig. 16 shows the hotness of static instructions as a cumulative distribution function. On average, 5% of the static instructions are responsible for 90% of the dynamic instructions. This behavior is similar to many SPEC CPU 2006 applications whose code footprints are attributed to a few hot static instructions [44] and yet do not suffer from poor I-TLB performance.

The data in Fig. 16 suggests that the poor I-TLB performance of Node.js applications is not due to a lack of hot code pages; rather, it must be due to the poor locality of execution. Sec. 3.3 showed that the Node.js applications rely heavily on native call bindings that are supported by the V8 VM, thus we hypothesize that the user-level context switches between the Node.js event callbacks and native code (inside the VM) are the main reason for the poor I-TLB performance.

Ideal Resource Requirements. The event-driven Node.js applications require unconventionally large I-TLB sizes to achieve SPEC-like I-TLB performance. Fig. 14 shows the I-TLB behavior as the TLB size is progressively increased from 8 to 512 entries. In order to get SPEC-like behavior (indicated by the arrow in the figure and so close to the 0 line as to be nearly indistinguishable from it), the I-TLB would have to be increased to 256 or more entries.

Building such a large I-TLB is inefficient. Current TLB lookups already impose non-negligible energy costs, and therefore scaling the TLB sizes will likely increase the energy per access [45, 46]. In fact, TLB sizes have largely remained stable over the past several processor generations [47].

Figure 14: I-TLB sensitivity to varying TLB sizes for Node.js applications.
Figure 15: I-TLB MPKI reduces to almost zero with superpages.
Figure 16: Hotness of instructions as a CDF for Node.js applications.
The alternative to increasing the TLB size is to use superpages. In event-driven applications, switching to a large page size reduces the MPKI significantly. Fig. 15 compares the I-TLB MPKI under 4 KB and 2 MB (i.e., superpage) page sizes. Although superpages are traditionally used for reducing D-TLB misses [43, 48], our results indicate that large pages would also be helpful for improving I-TLB performance.

5. Event-based Optimizations

To improve the execution efficiency of event-driven applications, we must mitigate several front-end inefficiencies. Given all of the major bottlenecks in the front-end, this section specifically focuses on alleviating the instruction cache inefficiencies. The insights are likely to be generalizable to the other structures (i.e., the TLB and the branch predictor).

We study I-cache misses from an event callback perspective. We find that individual events have large instruction footprints with little reuse, which leads to cache thrashing. Fortunately, event-driven programming inherently exposes strong inter-event instruction reuse (Sec. 5.1). Such heavy inter-event instruction reuse exposes a unique opportunity for improving instruction cache performance. We demonstrate that it is necessary to coordinate the cache insertion policy and the instruction prefetcher (Sec. 5.2). The combined effort reduces the instruction cache MPKI by 88% (Sec. 5.3).

5.1. Optimization Opportunity

We examine event execution along two important dimensions to discover opportunities for mitigating the I-cache inefficiencies: intra-event and inter-event. In the intra-event case, the execution characteristics correspond to one event, whereas in the inter-event case the characteristics correspond to the shared execution activity across two or more events.

We analyze intra-event and inter-event instruction reuse to understand the poor I-cache behavior of event-driven applications. Fig. 17 shows the percentage of instructions (y-axis) that are reused a certain number of times (x-axis) both within and across events for all six Node.js applications. The reuses are reported as buckets on the x-axis. The n-th bucket represents reuse counts between the bucket boundaries X_{n-1} and X_n, with the exception of the first bucket, which represents fewer than 32 reuses, and the last bucket, which represents 256 or more reuses.

When we consider the event callbacks in isolation (i.e., intra-event), almost 100% of the instructions across all the Node.js applications are reused fewer than 32 times. The low intra-event reuse is inherent to event-driven programming. Developers consciously program the event callbacks to avoid hot, compute-intensive loops in order to ensure application responsiveness. Recall that events in the event queue are executed sequentially by the single-threaded event loop, thus all of the events must execute quickly, similar to interrupts (Sec. 3.3).

When the low intra-event code reuse is coupled with the large instruction footprint of each event, it leads to the large instruction reuse distances that result in poor I-cache performance (as previously shown in Fig. 7). In Fig. 18, we show the code footprint (i.e., the total byte size of static instructions) for all the events in each application. Each (x, y) point in the figure indicates the percentage of events (x) whose footprints are at or below a particular size (y). We overlay a 32 KB line marker to indicate the L1 I-cache capacity. Almost all of the events have a footprint greater than the standard 32 KB I-cache capacity; in some events, the footprints exceed 1 MB.

In contrast, instruction reuse is much higher for inter-event application activity. Fig. 17 shows that over 40% of the instructions are reused over 256 times in the inter-event case. Such frequent reuse implies that events share a considerable number of instructions; otherwise the inter-event behavior would be similar to the intra-event behavior.

Inter-event reuse is a direct result of the event-driven programming paradigm: events exercise the same JavaScript VM system functionalities (i.e., the VM category in Table 2). In particular, for Node.js applications, different event callbacks need the same V8 JavaScript runtime features, which provide support for compiler optimizations, inline cache handling, garbage collection, various built-in functions, etc.

To quantitatively demonstrate that different events indeed share similar code paths within V8's VM, we show the instruction-level similarity between different events. Fig. 19 is a heat map where each (i, j) point corresponds to an event pair (i, j), where event i appears earlier than event j in the application. Each (i, j) point indicates the percentage of V8 instructions used by event j that can also be found in event i. The darkness of the heat map at any given point is proportional to the percentage of code sharing between those two events, as indicated by the color scale on the right side of the figure. For the purposes of presentation, we limit the data in the figure to 100 randomly chosen consecutive events. We verified that the results hold true when we expand the graph to include all events in the application. The figure confirms that most of the events share close to 100% of the V8 code.

Figure 17: Frequency of instruction reuse across intra-event and inter-event execution behavior for Node.js applications. (a) Etherpad. (b) Let's Chat. (c) Lighter. (d) Word Finder. (e) Todo. (f) Mud.

Figure 18: Event footprints.
Figure 19: Event similarity.

5.2. Optimization Strategy

The low intra-event reuse coupled with the large event footprints suggests that even an optimal cache cannot fully capture the entire working set of all the events. However, the heavy inter-event reuse indicates the locality that is potentially available. Intuitively, the instruction cache needs to first retain the "hot" fraction of the event working set in the cache so that at least that portion reduces cache misses. In addition, it is necessary to deploy an instruction prefetcher that can prefetch the instructions that are not retained in the cache by capturing the pattern of the instruction miss sequence.

Caching. We propose to use the LRU Insertion Policy (LIP) [49] for the instruction cache (while still maintaining the LRU eviction policy) to retain the hot portion of the event footprint. LIP is known for being able to effectively preserve a subset of a large working set in the cache by inserting all incoming cache lines into the LRU way instead of the MRU way, and only promoting a line from the LRU way to the MRU way if it receives a cache hit. As such, the less frequently used instructions that cause the instruction footprint to exceed the cache capacity are quickly evicted from the LRU way instead of thrashing the cache. A critical advantage of LIP is that it requires little hardware and design effort and can be readily adopted in existing designs. LIP was originally proposed for last-level caches and used primarily for addressing large data working sets. To the best of our knowledge, we are the first to apply LIP to the instruction stream and show its benefits.
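A minimal sketch of the insertion policy on a single cache set, matching the description above (a functional model only, not an RTL or timing description):

    // One set of a set-associative I-cache under LIP.
    // this.lines[0] is the MRU position; the last element is the LRU position.
    class LIPCacheSet {
      constructor(ways) {
        this.ways = ways;
        this.lines = [];                     // array of tags, MRU -> LRU order
      }

      access(tag) {
        const i = this.lines.indexOf(tag);
        if (i !== -1) {
          this.lines.splice(i, 1);           // hit: promote to MRU
          this.lines.unshift(tag);
          return true;
        }
        if (this.lines.length === this.ways) {
          this.lines.pop();                  // evict from the LRU position
        }
        this.lines.push(tag);                // LIP: insert at LRU, not MRU
        return false;                        // miss
      }
    }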
Prefetching   Although LIP preserves a subset of the event footprint in the I-cache, the improvement is still fundamentally limited by the cache capacity. As discussed in Sec. 4.2, simply increasing the cache size leads to practical design issues. To compensate for the capacity limitation, we must orchestrate the prefetcher to accurately fetch instructions in the miss sequence. Our key observation is that the instruction miss sequence in event-driven applications exhibits strong recurring patterns, primarily because inter-event execution has significant code similarities. For instance, as Fig. 19 shows, different events heavily share code from the V8 JavaScript engine.

To quantify the recurring patterns in Node.js applications, we perform an oracle analysis that determines the number of repetitive patterns in the instruction cache miss sequence. We use the SEQUITUR [50] tool, which is widely used to detect patterns in a given stream, to analyze the miss instruction stream of an LIP cache. We classify instruction cache misses into three categories, as originally defined in [51]. Non-repetitive misses do not belong to any recurring pattern. New misses are those instruction misses that appear in a pattern when it first occurs. The subsequent misses in a recurring pattern are classified as Opportunity misses.

[Figure 20: Repetition study — breakdown of I-cache misses (%) into Non-repetitive, New, and Opportunity misses for each Node.js application.]

The oracle repetitive-pattern analysis results are shown in Fig. 20. For all the Node.js applications, over 50% of the cache misses are Opportunity misses. This means that up to 50% of the instruction misses could be eliminated if the prefetcher captured all the recurring patterns and accurately matched instruction misses to their corresponding patterns.

We propose to use the Temporal Instruction Fetch Streaming (TIFS) prefetcher [51] to prefetch the recurring missing instructions. TIFS predicts and prefetches future instruction misses by recording and replaying the recurring instruction miss pattern. Specifically, it records all missing instructions into an instruction miss log (IML). Upon an instruction miss, TIFS finds the location in the IML where the miss address was most recently seen and begins prefetching subsequent instructions from the addresses in the IML.
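The record-and-replay structure of TIFS can be sketched as follows. This is again an illustrative model under names of our own choosing (TIFSPrefetcher, OnMiss, the prefetch degree, and the log sizing are assumptions for exposition), not the exact hardware organization of [51].

```cpp
// tifs_sketch.cc -- illustrative model of Temporal Instruction Fetch
// Streaming (TIFS) [51]: every I-cache miss is appended to an instruction
// miss log (IML); an index table remembers where each miss address was last
// seen in the log; a new miss to that address replays the subsequent log
// entries as prefetches.
#include <cstdint>
#include <cstdio>
#include <deque>
#include <unordered_map>
#include <vector>

class TIFSPrefetcher {
 public:
  TIFSPrefetcher(size_t log_entries, size_t degree)
      : log_entries_(log_entries), degree_(degree) {}

  // Called on every instruction cache miss; returns line addresses to prefetch.
  std::vector<uint64_t> OnMiss(uint64_t miss_addr) {
    std::vector<uint64_t> prefetches;

    // Replay: stream out the addresses that followed this miss last time.
    auto it = index_.find(miss_addr);
    if (it != index_.end() && it->second + 1 >= base_pos_) {
      uint64_t pos = it->second + 1;
      for (size_t i = 0; i < degree_ && pos + i < next_pos_; ++i)
        prefetches.push_back(iml_[pos + i - base_pos_]);
    }

    // Record: append the miss to the IML and update the index table.
    if (iml_.size() == log_entries_) { iml_.pop_front(); ++base_pos_; }
    iml_.push_back(miss_addr);
    index_[miss_addr] = next_pos_++;   // a real design would bound this table
    return prefetches;
  }

 private:
  size_t log_entries_, degree_;
  std::deque<uint64_t> iml_;                      // instruction miss log
  std::unordered_map<uint64_t, uint64_t> index_;  // miss addr -> last log position
  uint64_t next_pos_ = 0;                         // position of the next log entry
  uint64_t base_pos_ = 0;                         // position of iml_.front()
};

int main() {
  TIFSPrefetcher tifs(/*log_entries=*/8192, /*degree=*/4);
  // Two "events" missing on the same code: the second pass finds the first
  // pass's entries in the IML and issues prefetches for the rest of the run.
  const uint64_t seq[] = {0x400000, 0x400040, 0x400080, 0x4000c0};
  for (int event = 0; event < 2; ++event)
    for (uint64_t addr : seq)
      std::printf("event %d miss %#llx -> %zu prefetches\n", event,
                  (unsigned long long)addr, tifs.OnMiss(addr).size());
  return 0;
}
```

In the full system the prefetched lines would be installed in the I-cache so that later fetches within the event hit instead of missing at all; the sketch exposes only the log and replay bookkeeping.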
5.3. Evaluations

We evaluate our proposal using an in-house instruction cache Pin tool. We choose to simulate only the instruction cache because doing so isolates other microarchitectural effects and provides crisp insights into the I-cache issue.

We implemented LIP as described by Qureshi et al. [49]. We do not find noticeable differences between LIP and the bimodal insertion policy (BIP), unlike what was observed for the LLC in [49]; because of BIP's additional hardware cost, we choose LIP. TIFS is implemented as described by Ferdman et al. [51]. We find that it is sufficient for the IML to keep track of 8 K instruction misses; more IML entries lead to only marginal improvements. The total storage overhead of TIFS is about 100 KB per core.

The baseline we compare against is a standard LRU cache: a 32 KB I-cache with 64-byte lines and 8-way set associativity. We compare it against the following schemes. First, an instruction cache with the LIP insertion policy, to understand the effectiveness of retaining the hot fraction of the event working set. Second, a LIP-enabled cache with a next-line prefetcher, to understand the effectiveness of the common prefetching scheme. Third, a LIP-enabled instruction cache enhanced with the TIFS prefetcher, to understand the benefits of prefetching recurring instruction misses. Finally, a 128 KB LIP instruction cache without TIFS, to understand whether the storage overhead introduced by TIFS could simply be used to increase the cache size.
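The sketch below shows the kind of trace-driven MPKI accounting such a comparison implies: replay an instruction-address trace through a cache model plus an optional prefetcher and divide misses by the instruction count. The DirectMappedICache, the next-line prefetch stand-in, and the synthetic trace are deliberate simplifications of our own so the example stays self-contained; the actual study modeled the 8-way LIP cache and the TIFS prefetcher inside a Pin tool as described above.

```cpp
// mpki_harness.cc -- illustrative trace-driven harness: replay a stream of
// instruction addresses through an I-cache model (with an optional next-line
// prefetch on each miss) and report misses per 1000 fetched instructions.
#include <cstdint>
#include <cstdio>
#include <vector>

struct DirectMappedICache {            // simple stand-in for the cache models in the text
  explicit DirectMappedICache(size_t size_bytes, size_t line_bytes = 64)
      : line_bytes_(line_bytes), tags_(size_bytes / line_bytes, UINT64_MAX) {}
  bool Access(uint64_t pc) {           // true on hit; fills the line on a miss
    uint64_t tag = pc / line_bytes_;
    uint64_t& slot = tags_[tag % tags_.size()];
    if (slot == tag) return true;
    slot = tag;
    return false;
  }
  void Fill(uint64_t pc) {             // install a prefetched line
    uint64_t tag = pc / line_bytes_;
    tags_[tag % tags_.size()] = tag;
  }
  size_t line_bytes_;
  std::vector<uint64_t> tags_;
};

// Replay the trace; prefetch_next_line mimics the simplest scheme compared in
// Sec. 5.3 (a next-line prefetch triggered on each miss).
double RunMPKI(const std::vector<uint64_t>& trace, size_t cache_bytes,
               bool prefetch_next_line) {
  DirectMappedICache icache(cache_bytes);
  uint64_t misses = 0;
  for (uint64_t pc : trace) {
    if (!icache.Access(pc)) {
      ++misses;
      if (prefetch_next_line) icache.Fill(pc + 64);
    }
  }
  return 1000.0 * misses / trace.size();  // misses per 1000 fetched instructions
}

int main() {
  // Synthetic stand-in trace: repeated event handlers sharing code, mimicking
  // the inter-event reuse described in the text.
  std::vector<uint64_t> trace;
  for (int event = 0; event < 200; ++event)
    for (uint64_t pc = 0; pc < 48 * 1024; pc += 4) trace.push_back(pc);

  std::printf("32 KB, no prefetch : %.2f MPKI\n", RunMPKI(trace, 32 * 1024, false));
  std::printf("32 KB, next-line   : %.2f MPKI\n", RunMPKI(trace, 32 * 1024, true));
  std::printf("128 KB, no prefetch: %.2f MPKI\n", RunMPKI(trace, 128 * 1024, false));
  return 0;
}
```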
[Figure 21: MPKI Results — I-cache MPKI of the compared schemes, with the SPEC CPU2006 average overlaid.]

The I-cache MPKI comparison results are shown in Fig. 21, where we also overlay the average MPKI of the SPEC CPU2006 applications at 32 KB. LIP alone improves the MPKI by at least 45%, and by 70% on average, compared with the LRU-only cache policy. This suggests that, even without any prefetching scheme, simple changes to the cache insertion policy can already reduce the I-cache MPKI of Node.js applications significantly. In two applications, Word Finder and Etherpad, LIP eliminates almost all of the cache misses. The other applications' MPKIs, however, remain higher than the SPEC applications' average.

The TIFS-based instruction prefetcher reduces the MPKI by another 60% on top of the insertion-policy improvement, whereas a next-line prefetcher reduces the MPKI by only 33.6%. With TIFS, every application's MPKI falls below the SPEC CPU2006 average, which shows the necessity of capturing the recurring pattern of instruction misses for prefetching. Combining the LIP cache with TIFS prefetching reduces the I-cache MPKI by 88%, which would otherwise be impossible to achieve without event-specific optimization. LIP+TIFS is almost as effective as an extremely large L1 I-cache: in all but one application (Mud), LIP+TIFS achieves an MPKI equivalent to or better than that of a 128 KB I-cache.

Cost Analysis   The cost of LIP is negligible. TIFS operations (e.g., logging miss sequences in the IML and updating the index table) are off the critical path, following the general design principle of prefetching structures [52, 53]; hence, TIFS is unlikely to affect the cycle time. We also estimate, using CACTI v5.3 [54], that the additional power consumption of the TIFS-related structures is only about 92 mW.

6. Related Work

Characterization of Emerging Paradigms   At the time multicore was becoming ubiquitous on commodity hardware, Bienia et al. developed the PARSEC multicore benchmark suite [55]. Similarly, Ranger et al. characterized the implications of MapReduce applications when MapReduce was becoming prevalent for developing large-scale datacenter applications [33]. More recently, numerous research efforts have been devoted to characterizing warehouse-scale and big data workloads [32, 34, 35, 56–58].

We address a new and emerging computing paradigm, i.e., event-driven programming, as others have done in other domains at the time those domains were becoming important. Although event-driven programming has existed for many years in highly concurrent server architectures [9, 59, 60], large-scale simulations [61, 62], and interactive graphical user interface (GUI) application design [63], server-side event-driven applications that are tightly coupled with scripting languages have only recently become important. In this context, our work is the first to present a comprehensive analysis of the microarchitectural bottlenecks of scripting-language-based server-side event-driven applications.

Asynchronous/Event Execution Analysis   Prior work on event-driven applications primarily focuses on client-side applications [12, 13], whereas we study server-side applications. While prior art also attributes front-end bottlenecks to little intra-event code reuse and proposes instruction prefetching and pre-execution techniques, we take the event-level analysis a step further to demonstrate heavy inter-event code reuse. As a result, we show that simple changes to the instruction cache insertion policy can drastically improve front-end efficiency, even without prefetching. Hempstead et al. [64] designed a specialized event-driven architecture for embedded wireless sensor network applications; our work focuses on server-side event-driven programming and studies its implications for general-purpose processors. EBS [65] improves the energy efficiency of client-side event-driven Web applications and is orthogonal to the performance study of our paper.

Scripting Languages   Richards et al. explore language-level characteristics of client-side JavaScript programs [66]. Our work studies server-side JavaScript and focuses on the nature of events and their microarchitectural implications. Prior work on improving the performance of JavaScript, especially its dynamic typing system [3, 67], complements our event-level optimizations. Ogasawara conducted a source-code analysis of server-side JavaScript applications, also using Node.js, and found that little time is spent in dynamically compiled code, leading to limited optimization opportunity [68]. We take an event perspective and demonstrate significant optimization opportunities by exploiting event-specific characteristics. In addition, that prior work neither focuses on nor investigates the microarchitectural implications of event-driven execution.

7. Concluding Remarks

As computer architects, it is important to understand how to optimize the (micro)architecture in light of emerging application paradigms. This paper systematically studies the microarchitectural implications of Node.js applications, which represent the unique intersection of two trends in emerging server applications: managed language systems and event-driven programming. We show that Node.js applications are bottlenecked by front-end inefficiencies. By leveraging the heavy inter-event code reuse inherent to event-driven programming, we drastically improve front-end efficiency by orchestrating the instruction cache insertion policy with an instruction prefetcher. Our results are readily useful for building an optimized server architecture for event-driven workloads and provide a baseline upon which further research can build.

8. Acknowledgments

We are thankful to our colleagues as well as the anonymous reviewers for the many comments that have contributed to this work. This work is partially supported by AMD Corporation. Any opinions expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.

References

[1] "The redmonk programming language rankings: January 2015." http://redmonk.com/sogrady/2015/01/14/language-rankings-1-15/.
[2] S. M. Blackburn, P. Cheng, and K. S. McKinley, "Myths and realities: The performance impact of garbage collection," in Proc. of SIGMETRICS, 2004.
[3] O. Anderson, E. Fortuna, L. Ceze, and S. Eggers, "Checked load: Architectural support for type-checking on mobile processors," in Proc. of HPCA, 2011.
[4] M. Mehrara, P.-C. Hsu, M. Samadi, and S. Mahlke, "Dynamic parallelization of javascript applications using an ultra-lightweight speculation mechanism," in Proc. of HPCA, 2011.
[5] M. Mehrara and S. Mahlke, "Dynamically accelerating client-side web applications through decoupled execution," in Proc. of CGO, 2011.
[6] T. Cao, T. Gao, S. M. Blackburn, and K. S. McKinley, "The yin and yang of power and performance for asymmetric hardware and managed software," in Proc. of ISCA, 2012.
[7] F. Dabek, N. Zeldovich, F. Kaashoek, D. Mazières, and R. Morris, "Event-driven programming for robust software," in Proc. of SIGOPS European Workshop, 2002.
[8] "Why threads are a bad idea (for most purposes)." http://web.stanford.edu/~ouster/cgi-bin/papers/threads.pdf.
[9] M. Welsh, D. Culler, and E. Brewer, "SEDA: An architecture for well-conditioned, scalable internet services," in Proc. of SOSP, 2001.
[10] M. Welsh, S. D. Gribble, E. A. Brewer, and D. Culler, "A design framework for highly concurrent systems," Technical Report No. UCB/CSD-00-1108, 2000.
[11] Joyent, Inc., "Node.js." https://nodejs.org/.
[12] G. Chadha, S. Mahlke, and S. Narayanasamy, "Efetch: Optimizing instruction fetch for event-driven web applications," in Proc. of PACT, 2014.
[13] G. Chadha, S. Mahlke, and S. Narayanasamy, "Accelerating asynchronous programs through event sneak peek," in Proc. of ISCA, 2015.
[14] "Etherpad Lite." https://github.com/ether/etherpad-lite.
[15] "Let's Chat." https://github.com/sdelements/lets-chat.
[16] "Lighter." https://github.com/mehfuzh/lighter.
[17] "Mud." https://github.com/gumho/simple-node.js-mud.
[18] "Todo." https://github.com/amirrajan/nodejs-todo.
[19] "Word Finder." https://github.com/amirrajan/word-finder.
[20] D. A. Wood and M. D. Hill, "Cost-effective parallel computing," IEEE Computer, 1995.
[21] "Exclusive: How linkedin used node.js and html5 to build a better, faster app." https://www.paypal-engineering.com/2013/11/22/node-js-at-paypal/.
[22] "Node.js at paypal." https://www.paypal-engineering.com/2013/11/22/node-js-at-paypal/.
[23] "Projects, applications, and companies using node." https://github.com/joyent/node/wiki/Projects,-Applications,-and-Companies-Using-Node.
[24] "How we built ebay's first node.js application." http://www.ebaytechblog.com/2013/05/17/how-we-built-ebays-first-node-js-application/.
[25] "Why Apps for Messaging Are Trending." http://www.nytimes.com/2015/01/26/technology/why-apps-for-messaging-are-trending.
[26] "Autocomplete for Addresses and Search Terms." https://developers.google.com/maps/documentation/javascript/places-autocomplete.
[27] Joyent, Inc., "Node.js Cluster." https://nodejs.org/api/cluster.html.
[28] K. Flautner, R. Uhlig, S. Reinhardt, and T. Mudge, "Thread-level parallelism and interactive performance of desktop applications," in Proc. of ASPLOS, 2000.
[29] "Node.js high availability at box." https://www.box.com/blog/node-js-high-availability-at-box/.
[30] "libuv." https://github.com/libuv/libuv/.
[31] "Chrome V8." https://developers.google.com/v8/.
[32] M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A. D. Popescu, A. Ailamaki, and B. Falsafi, "Clearing the clouds: A study of emerging scale-out workloads on modern hardware," in Proc. of ASPLOS, 2012.
[33] C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis, "Evaluating mapreduce for multi-core and multiprocessor systems," in Proc. of HPCA, 2007.
[34] L. Wang, J. Zhan, C. Luo, Y. Zhu, Q. Yang, Y. He, W. Gao, Z. Jia, Y. Shi, S. Zhang, C. Zheng, G. Lu, K. Zhan, X. Li, and B. Qiu, "Bigdatabench: A big data benchmark suite from internet services," in Proc. of HPCA, 2014.
[35] I. Atta, P. Tozun, A. Ailamaki, and A. Moshovos, "Slicc: Self-assembly of instruction cache collectives for oltp workloads," in Proc. of MICRO, 2012.
[36] S. Eyerman, L. Eeckhout, T. Karkhanis, and J. E. Smith, "A performance counter architecture for computing accurate cpi components," in Proc. of ASPLOS, 2006.
[37] W. Heirman, T. E. Carlson, S. Che, K. Skadron, and L. Eeckhout, "Using cycle stacks to understand scaling bottlenecks in multi-threaded workloads," in Proc. of IISWC, 2011.
[38] T. E. Carlson, W. Heirman, S. Eyerman, I. Hur, and L. Eeckhout, "An evaluation of high-level mechanistic core models," ACM Transactions on Architecture and Code Optimization (TACO), 2014.
[39] K. Hoste and L. Eeckhout, "Microarchitecture-independent workload characterization," IEEE Micro, 2007.
[40] "Intel 64 and ia-32 architectures optimization reference manual," 2014.
[41] B. Pham, V. Vaidyanathan, A. Jaleel, and A. Bhattacharjee, "Colt: Coalesced large-reach tlbs," in Proc. of MICRO, 2012.
[42] G. B. Kandiraju and A. Sivasubramaniam, "Characterizing the d-tlb behavior of spec cpu2000 benchmarks," in Proc. of SIGMETRICS, 2002.
[43] J. Navarro, S. Iyer, P. Druschel, and A. Cox, "Practical, transparent operating system support for superpages," in Proc. of OSDI, 2002.
[44] A. Phansalkar, A. Joshi, and L. K. John, "Analysis of redundancy and application balance in the spec cpu2006 benchmark suite," in Proc. of ISCA, 2007.
[45] A. Basu, M. D. Hill, and M. M. Swift, "Reducing memory reference energy with opportunistic virtual caching," in Proc. of ISCA, 2014.
[46] A. Sodani, "Exascale: Opportunities and challenges," MICRO 2011 Keynote address, 2011.
[47] "Intel 64 and ia-32 architectures optimization reference manual." http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.
[48] M. Talluri, S. Kong, M. D. Hill, and D. A. Patterson, "Tradeoffs in supporting two page sizes," in Proc. of ISCA, 1992.
[49] M. K. Qureshi, A. Jaleel, Y. N. Patt, S. C. Steely Jr., and J. Emer, "Adaptive insertion policies for high performance caching," in Proc. of ISCA, 2007.
[50] C. G. Nevill-Manning and I. H. Witten, "Identifying hierarchical structure in sequences: A linear-time algorithm," Journal of Artificial Intelligence Research, 1997.
[51] M. Ferdman, T. F. Wenisch, A. Ailamaki, B. Falsafi, and A. Moshovos, "Temporal instruction fetch streaming," in Proc. of MICRO, 2008.
[52] E. Ebrahimi, O. Mutlu, C. J. Lee, and Y. N. Patt, "Coordinated control of multiple prefetchers in multi-core systems," in Proc. of MICRO, 2009.
[53] E. Ebrahimi, O. Mutlu, and Y. N. Patt, "Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems," in Proc. of HPCA, 2009.
[54] "Cacti 5.3." http://www.hpl.hp.com/research/cacti.
[55] C. Bienia, S. Kumar, J. P. Singh, and K. Li, "The parsec benchmark suite: Characterization and architectural implications," in Proc. of PACT, 2008.
[56] S. Kanev, J. P. Darago, K. Hazelwood, P. Ranganathan, T. Moseley, G.-Y. Wei, and D. Brooks, "Profiling a warehouse-scale computer," in Proc. of ISCA, 2015.
[57] K. Lim, D. Meisner, A. G. Saidi, P. Ranganathan, and T. F. Wenisch, "Thin servers with smart pipes: Designing soc accelerators for memcached," in Proc. of ISCA, 2013.
[58] K. Lim, P. Ranganathan, J. Chang, C. Patel, T. Mudge, and S. Reinhardt, "Understanding and designing new server architectures for emerging warehouse-computing environments," in Proc. of ISCA, 2008.
[59] V. S. Pai, P. Druschel, and W. Zwaenepoel, "Flash: An efficient and portable web server," in Proc. of USENIX ATC, 1999.
[60] J. C. Hu, I. Pyarali, and D. C. Schmidt, "High performance web servers on windows nt: Design and performance," in Proc. of USENIX Windows NT Workshop, 1997.
[61] J. Grossman, J. S. Kuskin, J. A. Bank, M. Theobald, R. O. Dror, D. J. Ierardi, R. H. Larson, U. B. Schafer, B. Towles, C. Young, and D. E. Shaw, "Hardware support for fine-grained event-driven computation in anton 2," in Proc. of ASPLOS, 2013.
[62] B. Wang, Y. Zhu, and Y. Deng, "Distributed time, conservative parallel logic simulation on gpus," in Proc. of DAC, 2010.
[63] R. W. Scheifler and J. Gettys, "The x window system," ACM Transactions on Graphics, 1986.
[64] M. Hempstead, N. Tripathi, P. Mauro, G.-Y. Wei, and D. Brooks, "An ultra low power system architecture for sensor network applications," in Proc. of ISCA, 2005.
[65] Y. Zhu, M. Halpern, and V. J. Reddi, "Event-based scheduling for energy-efficient qos (eqos) in mobile web applications," in Proc. of HPCA, 2015.
[66] G. Richards, S. Lebresne, B. Burg, and J. Vitek, "An analysis of the dynamic behavior of javascript programs," in Proc. of PLDI, 2010.
[67] W. Ahn, J. Choi, T. Shull, M. J. Garzarán, and J. Torrellas, "Improving javascript performance by deconstructing the type system," in Proc. of PLDI, 2014.
[68] T. Ogasawara, "Workload characterization of server-side javascript," in Proc. of IISWC, 2014.