Elmem: Towards an Elastic Memcached System

ElMem: Towards an Elastic Memcached System Ubaid Ullah Hafeez, Muhammad Wajahat, Anshul Gandhi; Stony Brook University PACE Lab, Department of Computer Science, Stony Brook University {uhafeez,mwajahat,anshul}@cs.stonybrook.edu Abstract—Memory caches, such as Memcached, are a critical Recent studies have shown that latencies and throughput component of online applications as they help maintain low la- degrade significantly, by as much as 10×, when autoscaling tencies by alleviating the load at the database. However, memory a caching tier [8]; worse, performance recovery can take tens caches are expensive, both in terms of power and operating costs. It is thus important to dynamically scale such caches in response of minutes [19]. Our own results confirm these findings as to workload variations. Unfortunately, stateful systems, such as well (see Section II-D). Conversely, tail latencies for online Memcached, are not elastic in nature. The performance loss that services such as Amazon and Facebook are on the order of follows a scaling action can severely impact latencies and lead to tens of milliseconds [20], [21]; even a subsecond delay can SLO violations. quickly translate to significant revenue loss due to customer This paper proposes ElMem, an elastic Memcached system abandonment. that mitigates post-scaling performance loss by proactively migration hot data between nodes. The key enabler of our work is an Most of the prior work on autoscaling focus on stateless efficient algorithm, FuseCache, that migrates the optimal amount web or application tier nodes which do not store any data [4]– of hot data to minimize performance loss. Our experimental [7]. There is also some prior work on autoscaling replicated results on OpenStack, across several workload traces, show that database tiers [2], [3]. Memory caching tiers are neither ElMem elastically scales Memcached while reducing the post- scaling performance degradation by about 90%. stateless nor replicated, and are thus not amenable to the above approaches. While there are some works that discuss Index Terms—elastic, memcached, auto-scaling, data- migration autoscaling of Memcached nodes (see Section VI), they do not address the key challenge of performance loss following I. INTRODUCTION a scaling action. A crucial requirement for online applications is elasticity – This paper presents ElMem, an elastic Memcached system the ability to add and remove servers in response to changes designed specifically to mitigate post-scaling performance in workload demand, also referred to as autoscaling. In a loss. ElMem seamlessly migrates hot data items between physical deployment, elasticity can reduce energy costs by Memcached nodes before scaling to realize the cost and energy ∼ 30-40% [1]–[4]. Likewise, in a virtual (cloud) deployment, benefits of elasticity. To minimize overhead, we implement elasticity can reduce resource rental costs [5]–[8]; in fact, elas- ElMem in a decentralized manner and regulate data movement ticity is often touted as one of the key motivating factors for over the network (Section III). cloud computing [9]–[11]. Elasticity is especially important for The key enabler of ElMem is our novel cache merging customer-facing services that experience significant variability algorithm, FuseCache, that determines the optimal subset of in workload demand over time [12]–[15]. hottest items to move between retiring and retained nodes. Customer-facing services and applications, including Face- FuseCache is based on the median-of-medians algorithm [22], book [12], [16] and YouTube [17], typically employ distributed and finds the n hottest items across any k sorted lists. We also memory caching systems, e.g., Memcached [18], to allevi- show that FuseCache is within a factor log(n) of the lower ate critical database load and mitigate tail latencies. While bound time complexity, O(klog(n)) (Section IV). Memcached enables significant performance improvements, We experimentally evaluate ElMem on a multi-tier, memory (DRAM) is an expensive resource, both in terms Memcached-backed, web application deployment using sev- of power and cost. Our analysis of Memcached usage in eral workload traces, including those from Facebook [12] and Facebook [12] (see Section II) suggests that a cache node Microsoft [23]. Our results show that ElMem significantly is 66% costlier and consumes 47% more power than an reduces tail response times, to the tune of 90%, and enables application or web tier node. Clearly, an elastic Memcached cost/energy savings by autoscaling Memcached (Section V). solution would be invaluable to customer-facing applications. Further, compared to existing solutions, ElMem reduces tail Unfortunately, memory caching systems are not elastic in response times by about 85%. nature owing to their statefulness. Consider a caching node To summarize, we make the following contributions: that is being retired in response to low workload demand. All incoming requests for data items that were cached on the 1) We present the design and implementation of ElMem, an retiring node will now result in cache misses and increased elastic Memcached system. load on the (slower) database, leading to high tail latencies. 2) We develop an optimal cache migration algorithm, Fuse- Cache, that enables ElMem and runs in near-optimal time. Consistent hashing is typically employed to minimize the 3) We implement ElMem and experimentally illustrate its change in key membership upon node failures. benefits over existing solutions. Note that the client library, and not Memcached, determines The rest of the paper is organized as follows. Section II which node to contact. Memcached nodes are not aware of provides the necessary background and motivation for our the key range that they (or the other nodes) are responsible work. Section III describes the system design of ElMem and for storing, thus placing this responsibility on the client. Each Section IV presents our FuseCache algorithm. We present our Memcached node can be treated as a simple cache which stores evaluation results in Section V. We discuss related work in items in memory. If the number of items stored exceeds the Section VI and conclude in Section VII. memory capacity, items are evicted using the Least-Recently- Used (LRU) algorithm in O(1) time by simply deleting the II. BACKGROUND,MOTIVATION, AND CHALLENGES tail of the MRU list. Online services are often provided by multi-tier deploy- B. Cost/Energy analysis of Memcached ments consisting of load-balanced web/application servers and Despite the many benefits of Memcached, it is an expensive data storage servers. To avoid performance loss due to slow solution, both in terms of cost and energy, because of its I/O access at the data tier(s), many application owners employ DRAM usage. Recent numbers from Facebook suggest that memory caching systems. These systems provide low latency they use 72GB of memory per Memcached node [27]. By responses to clients and alleviate critical database load by contrast, the web or application tier nodes are equipped with caching hot data items on memory (DRAM); several caching 12GB of memory. Memcached nodes typically have a Xeon servers can be employed in a distributed manner to provide CPU [16], and we expect application tier nodes to have scalable memory caching. A popular caching system that is about twice the computing power as that of a Memcached employed by several companies, including Facebook [12], node. Using power numbers reported by Fan et al. [28], [16], Twitter [24], Wikipedia [25], and YouTube [17], is and normalizing them to get per GB and per CPU socket Memcached [18]. In the remainder of this paper, we focus power consumption, we estimate that an application tier server on Memcached as our memory caching system. (2 CPU sockets, 12GB) will consume 204 Watts of (peak) A. Memcached overview power, whereas a Memcached node (1 CPU socket, 72GB) Memcached Nodes will consume 299 Watts (47% additional power). In terms Web/Application Servers Client of cost, a compute-optimized EC2 instance currently costs Requests Load $0.1/hr, whereas a memory-optimized EC2 instance currently Balancer Storage Tier (Database) costs $0.166/hr (66% higher cost), based on numbers for large sized instances [29]. C. Potential benefits of an elastic Memcached Fig. 1. Illustration of a multi-tier Memcached-backed application. Online services that employ Memcached often exhibit large Memcached is a distributed in-memory key-value (KV) variations in arrival rate [1], [14], [30], presenting an op- store that serves as an in-memory cache. Memcached sits in portunity for reducing operating costs by dynamic scaling. between the client and the back-end database or storage tier, For example, production traces from Facebook [12] indicate as shown in Figure 1, and aggregates the available memory load variations on the order of 2× due to the diurnal nature of all nodes in the caching tier to cache data. The memory of customer-facing applications, and a further 2–3× due to allocated to Memcached on a node is internally divided into traffic spikes. Our preliminary analysis of these traces reveals 1MB pages. The pages are grouped into slabs, where each that a perfectly elastic Memcached tier—one that instantly slab is responsible for storing KV pairs, or items, of a given adds/removes the optimal number of nodes and consolidates size range (to minimize fragmentation) by assigning each item all hot data on the resulting nodes—can reduce the number of to a chunk of memory. Within a slab, KV pairs are stored as caching nodes by 30–70%. Unfortunately, dynamic scaling of a doubly-linked list in Most-Recently-Used (MRU) order. Memcached is a difficult problem. Clients read (get) and write (set) KV data from Mem- D. The inelastic nature of Memcached cached via a client-side library, such as libmemcached [26]. Stateful systems store data that is required for the efficient The library hashes the requested key and determines which functioning of the application.

Elmem: Towards an Elastic Memcached System

Redis and Memcached

Modeling and Analyzing Latency in the Memcached System

Release 2.5.5 Ask Solem Contributors

An End-To-End Application Performance Monitoring Tool

WEB2PY Enterprise Web Framework (2Nd Edition)

Performance at Scale with Amazon Elasticache

High Performance with Distributed Caching

Opinnäytetyön Malli Vanhoille Word-Versioille

VSI's Open Source Strategy

Amazon Elasticache Deep Dive Powering Modern Applications with Low Latency and High Throughput

Developing Replication Plugins for Drizzle

Designing and Implementing Scalable Applications with Memcached and Mysql