Optimizing Server Architectures for Microservice Diversity @Scale

SoftSKU: Optimizing Server Architectures for Microservice Diversity @Scale Akshitha Sriraman†*, Abhishek Dhanotia*, Thomas F. Wenisch† University of Michigan†, Facebook* [email protected], [email protected], [email protected] 1.E+06 ABSTRACT 1.E+04 The variety and complexity of microservices in warehouse- 1.E+02 1.E+00 scale data centers has grown precipitously over the last few 1.E-02 years to support a growing user base and an evolving product IPC portfolio. Despite accelerating microservice diversity, there CPU util. CPU ITLB MPKI ITLB util. Throughput is a strong requirement to limit diversity in underlying server latency Req. µservices (log scale)(logµservices hardware to maintain hardware resource fungibility, preserve MPKI coDe LLC Context switches Context Mem. banDwiDth procurement economies of scale, and curb qualification/test Diversityorvariation rangeacross System-level parameters Architectural parameters overheads. As such, there is an urgent need for strategies that enable limited server CPU architectures (a.k.a “SKUs”) to Figure 1: Variation in system-level & architectural traits across mi- provide performance and energy efficiency over diverse microservices: our microservices face extremely diverse bottlenecks. croservices. To this end, we first undertake a comprehensive characterization of the top seven microservices that run on 1 Introduction the compute-optimized data center fleet at Facebook. The increasing user base and feature portfolio of web ap- Our characterization reveals profound diversity in OS and plications is driving precipitous growth in the diversity and I/O interaction, cache misses, memory bandwidth utilization, complexity of the back-end services comprising them [1]. instruction mix, and CPU stall behavior. Whereas customiz- There is a growing trend towards microservice implementa- ing a CPU SKU for each microservice might be beneficial, it tion models [2–6], wherein a complex application is decom- is prohibitive. Instead, we argue for “soft SKUs”, wherein we posed into distributed microservices [7–10] that each provide exploit coarse-grain (e.g., boot time) configuration knobs to specialized functionality [11], such as HTTP connection ter- tune the platform for a particular microservice. We develop a mination, key-value serving [12], protocol routing [13,14], or tool, mSKU, that automates search over a soft-SKU design ad serving [15]. This deployment model enables application space using A/B testing in production and demonstrate how it components’ independent scalability by ramping the number can obtain statistically significant gains (up to 7:2% and 4:5% of physical servers/cores dedicated to each in response to performance improvement over stock and production servers, diurnal and long-term load trends [5]. respectively) with no additional hardware requirements. At global user population scale, important microservices can grow to account for an enormous installed base of physi- CCS CONCEPTS cal hardware. Across Facebook’s global server fleet, seven Computer systems organization ! Cloud computing key microservices in four service domains run on hundreds of thousands of servers and occupy a large portion of the KEYWORDS compute-optimized installed base. These microservices’ im- Microservice, resource fungibility, soft SKU portance begs the question: do our existing server platforms serve them well? Are there common bottlenecks across mi- ACM Reference Format: croservices that we might address when selecting a future Akshitha Sriraman, Abhishek Dhanotia, Thomas F. Wenisch, 2019. Soft- server CPU architecture? SKU: Optimizing Server Architectures for Microservice Diversity @Scale. To this end, we undertake comprehensive system-level and In Proceedings of ISCA ’19, Phoenix, AZ, USA, June 22-26, 2019, 14 pages. architectural characterizations of these microservices on Face- https://doi.org/10.1145/3307650.3322227 book production systems serving live traffic. We find that Permission to make digital or hard copies of part or all of this work for personal or application functionality disaggregation across microservices classroom use is granted without fee provided that copies are not made or distributed has yielded enormous diversity in system and CPU architec- for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. tural requirements, as shown in Fig.1. For example, caching For all other uses, contact the owner/author. microservices [16] require intensive I/O and microsecond- ISCA ’19, June 22-26, 2019, Phoenix, AZ, USA scale response latency and frequent OS context switches can © 2019 Copyright is held by the owner/author(s). ACM ISBN 978-1-4503-6669-4/19/06. comprise 18% of CPU time. In contrast, a Feed [17] mi- https://doi.org/10.1145/3307650.3322227 croservice computes for seconds per request with minimal ISCA ’19, June 22-26, 2019, Phoenix, AZ, USA A. Sriraman et al. OS interaction. Our Web [18] microservice entails massive in- necks, highlighting potential design optimizations. struction footprints, leading to astonishing instruction cache and ITLB misses and branch mispredictions, while others • mSKU: A design tool that automatically tunes important execute much smaller instruction footprints. Some microser- configurable server parameters to create microservice- vices depend heavily on floating-point performance while specific “soft” server SKUs on existing hardware. others have no floating-point instructions. The microarchitec- • A detailed performance study of configurable server tural trends we discover differ markedly from those of SPEC parameters tuned by mSKU. CPU2006/2017 [19, 20], academic cloud workloads [21, 22], and even some of Google’s major services [1, 23]. The rest of the paper is organized as follows: We describe Such diversity might suggest a strategy to specialize CPU and measure these seven production microservices’ perfor- architectures to suit each microservice’s distinct needs. Opti- mance traits in Sec.2. We argue the need for Soft SKUs in mizing one or more of these microservices to achieve even Sec.3. We describe mSKU’s design in Sec.4 and we discuss single-digit percent speedups can yield immense performance- the methodology used to evaluate mSKU in Sec.5. We evalu- per-watt benefits. Indeed, we report observations that might ate mSKU in Sec.6, discuss limitations in Sec.7, compare inform future hardware designs. However, large-scale in- against related work in Sec.8, and conclude in Sec.9. ternet operators have strong economic incentives to limit hardware platforms’ diversity to (1) maintain fungibility of 2 Understanding Microservice Performance hardware resources, (2) preserve procurement advantages that We aim to identify software and hardware bottlenecks faced arise from economies of scale, and (3) limit the overhead of by Facebook’s key production microservices to see if they qualifying/testing myriad hardware platforms. As such, there share common bottlenecks that might be addressed in future is an immediate need for strategies that enable a limited set server CPU architectures. In this section, we (1) describe each of server CPU architectures (often called “SKUs,” short for microservice, (2) explain our characterization methodology, “Stock Keeping Units”) to provide performance and energy (3) discuss system-level characteristics to provide insights efficiency over microservices with diverse characteristics. into how each microservice is operated, (4) report on the Rather than diversify the hardware portfolio, we argue architectural characteristics and bottlenecks faced by each for “soft SKUs,” a strategy wherein we exploit coarse-grain microservice, and (5) summarize our characterization’s most (e.g., boot time) OS and hardware configuration knobs to important conclusions. A key theme that emerges throughout tune limited hardware SKUs to better support their presently our characterization is diversity; the seven microservices assigned microservice. Unlike data centers that co-locate differ markedly in their performance constraints’ time-scale, services via virtualization, Facebook’s microservices run on instruction mix, cache behavior, CPU utilization, bandwidth dedicated bare metal servers, allowing us to easily create requirements, and pipeline bottlenecks. Unfortunately, this microservice-specific soft SKUs. As microservice allocation diversity calls for sometimes conflicting optimization choices, needs vary, servers can be redeployed to different soft SKUs motivating our pursuit of “soft SKUs” (Section3) rather than through reconfiguration and/or reboot. Our OS and CPUs custom hardware for each microservice. provide several specialization knobs; in this study, we focus on seven: (1) core frequency, (2) uncore frequency, (3) active 2.1 The Production Microservices core count, (4) code vs. data prioritization in the last-level cache ways, (5) hardware prefetcher configuration, (6) use of We characterize seven microservices in four diverse service transparent huge pages, and (7) use of static huge pages. domains running on Facebook’s compute-optimized data center fleet. The workloads with longer work-per-request (e.g. Identifying the best microservice-specific soft-SKU config- Feed2, Ads1) might be called “services” by some readers; uration is challenging: the design space is large, service code we use “microservice” since none of these systems is entirely evolves quickly, synthetic load tests do not necessarily cap- stand-alone. We characterize on

Load more