Performance Analysis of Memcached

Vijay Chidambaram
Department of Computer Science, University of Wisconsin-Madison
[email protected]

Deepak Ramamurthi
Department of Computer Science, University of Wisconsin-Madison
[email protected]

Abstract

Memcached is an open-source, high-performance, distributed memory object caching system. It is widely used by large-scale Web 2.0 companies to speed up dynamic web applications by alleviating database load. However, there is little or no academic literature on memcached - it is not understood very well, and its performance has not been analyzed so far. In this paper, we present an analysis of memcached. We identify avenues for optimizing memcached and explore two of them. We analyze the effects on performance for both avenues.

1. Introduction

Memcached[6] is a distributed in-memory object caching system, originally developed at Danga to improve their website performance. Since then, it has evolved into a high-performance object caching system which is widely used by large-scale companies. Facebook has the world's largest memcached deployment - it is used in their photo service to cache the on-disk locations of photos that are requested by users all over the world. LiveJournal uses memcached to alleviate its database load due to hundreds of thousands of users accessing blog entries. Twitter uses memcached to reduce database load when millions of users tweet and access each other's streams.

The power of memcached lies in the fact that it is very easy to scale - to increase performance or to increase the amount of data cached, one just adds nodes. Memcached is very fast since it resides entirely in main memory. This makes it very attractive to companies that might need to scale very quickly.

In spite of the huge deployments of memcached in several companies all over the world, it remains a poorly understood system. The research community does not understand memcached well - there have been no academic papers on the subject. There are also no publicly available performance studies made by industry. It is very poorly documented in spite of being widely used. Development is carried out by a dozen programmers, and outside this elite circle, knowledge of how memcached works is not widespread.

In this work, we set out to understand memcached, analyze its performance, and find possible bottlenecks. Our main contributions in this paper are an analysis of memcached's performance (both with respect to the number of clients contacting the server, and the size of the data objects that memcached stores), and the identification of opportunities for optimizing memcached. As proof of concept, we explore two such opportunities and report the effects on performance.

We look at tuning the e1000 network driver for memcached: interrupt blanking, where the system is non-responsive to interrupts for a period of time; opportunistic polling, which switches between polling and interrupts for I/O based on the load; and modifying the number of buffers for transmitting and receiving packets, along with the delays associated with transmitting and receiving packets.

We also look at providing operating system support for memcached in the form of zero-copying. The Linux kernel has two interesting system calls, splice and vmsplice, which implement zero-copy. We modify memcached to use these system calls and document the effect of this modification on performance.

The rest of the paper is organized as follows: Section 2 explains what memcached is, and some of its important properties. In Section 3, we motivate our work and enumerate our goals. Section 4 discusses the experimental setup that we used, along with the specific tools used in the work. We analyze the performance of memcached in Section 5. We present the design of our modified memcached in Section 6, and explain its implementation in Section 7. We present a comparison of performance between the original memcached and the modified memcached in Section 8, along with an analysis of the performance. We discuss related work in Section 9, and conclude in Section 10.

2. Background

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering. Each data item in memcached is associated with a key, which is used to retrieve that particular data item. Memcached uses a simple hash table to store keys and values, so a lookup takes constant time.

Memcached is a distributed caching system - there may be several physical machines running memcached, but a unified view is presented to the end user. Memcached uses a technique called consistent hashing to map each key permanently to a single server among all the servers running memcached. Identifying which server contains a particular key is done at the client side, thereby relieving the servers of this load. This contributes to the high scalability of memcached.

Figure 1 demonstrates the working of memcached. Upon each update, the application servers SET the value in memcached. Note that the value might be stored in any one of the machines running memcached - but this is oblivious to the application servers. When the Nginx http server requires the result of a query, it first asks memcached whether it has that value. If memcached has that value, a trip to the application servers is avoided and the value is returned directly to the http server. This is how memcached helps alleviate database and application load.

Figure 1. Memcached in action. Once values have been cached in the memcached server, GET requests from the Nginx http server can be directly served by memcached, easing load on the app servers. Upon value updates, the app servers SET the values in memcached.

Instead of using malloc and free, memcached uses slab allocation[7] to efficiently allocate space for key-value pairs. Upon start-up, chunks of memory are pre-allocated. Upon a request for space for a particular value, the chunk closest in size to it is returned. This is done to avoid fragmentation and to avoid the overhead of finding contiguous blocks of memory.

Memcached uses TCP for communicating with clients. The latest version supports UDP partially: you can send "set" messages to the server via UDP, but you cannot send "get" requests using UDP, since you need to reliably get the response back. Memcached can run in blocking and non-blocking modes. Currently, the maximum size of a key-value pair that can be stored is 1 MB.

Memcached supports multi-threading - the 1.4.3 version which we use runs 4 threads by default. The number of threads can be controlled by setting an option at the time of starting memcached.
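To make the flow in Figure 1 concrete, the following is a minimal sketch of the cache-aside pattern using the libmemcached C API, roughly as it existed around version 0.34 (the client library used later in this paper; in later releases some type names changed, e.g. memcached_return became memcached_return_t). The server address, port, and the query_database() helper are illustrative placeholders, not part of memcached or libmemcached.

    /* Minimal sketch of the cache-aside pattern from Figure 1, using the
     * libmemcached C API. The server address, port and query_database()
     * are illustrative placeholders. */
    #include <libmemcached/memcached.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    extern char *query_database(const char *key); /* hypothetical app-server call */

    int main(void)
    {
        memcached_return rc;
        memcached_st *memc = memcached_create(NULL);
        memcached_server_st *servers =
            memcached_server_list_append(NULL, "127.0.0.1", 7000, &rc);
        memcached_server_push(memc, servers);

        const char *key = "user:42";
        size_t vlen;
        uint32_t flags;

        /* GET first: on a hit, the app servers and database are never contacted. */
        char *value = memcached_get(memc, key, strlen(key), &vlen, &flags, &rc);
        if (rc == MEMCACHED_SUCCESS) {
            printf("hit: %.*s\n", (int)vlen, value);
            free(value);
        } else {
            /* Miss: compute the value, then SET it for future requests. */
            char *fresh = query_database(key);
            memcached_set(memc, key, strlen(key), fresh, strlen(fresh),
                          (time_t)0, (uint32_t)0);
            printf("miss: %s\n", fresh);
            free(fresh); /* assuming query_database() returns malloc'd memory */
        }

        memcached_server_list_free(servers);
        memcached_free(memc);
        return 0;
    }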
3. Motivation

Memcached is widely deployed by a number of large-scale Web 2.0 companies such as Facebook, Youtube, Digg, Twitter, LiveJournal and Wordpress. These companies operate at huge scale, typically serving hundreds of thousands of customers per second, and use memcached to alleviate database load and speed up their web applications. This makes memcached one of the most widely deployed distributed systems of recent times.

However, there has been little or no academic research about memcached. Due to the small amount of documentation available online about it, the research community does not have a good understanding of it. Our goals in this project are:

1. To gain a deep understanding of memcached
2. To find bottlenecks in memcached performance
3. To optimize memcached

The last goal of optimizing memcached is difficult because several companies, especially Facebook, have invested time and money into optimizing it. The memcached project leader at Facebook considers the system currently so well tuned that adding even a single system call will have a considerable impact on throughput.

4. Test Setup

4.1 Hardware Setup

For our experiments, we borrowed machines from the Wisconsin Advanced Internet Laboratory. For preliminary experiments, we used 10 machines; for the final experiments, we used 24 machines. The specifications of the server node are shown in Table 1. The frequency of the other nodes varies between 2-3 GHz, with main memory capacity varying between 1 GB and 3 GB. All nodes in the experiment ran Fedora Core 6. All of the nodes were connected to a 1 Gigabit switch, and each node used the Intel e1000 network driver.

Table 1. Specification of the server node used in the experiments

    Parameter              Value
    Operating System       Fedora Core 6
    Linux Kernel Version   2.6.20
    Disk size              120 GB
    Frequency              3000 MHz
    Memory                 4 GB
    Processor              Pentium 4
    Number of processors   2

4.2 Memcached version and options

The version of memcached we used is 1.4.3, which was the latest version at the time we began this work. The command which we used to invoke memcached was:

    nice -n -20 memcached -m 3000 -p 7000 -u root

This invokes memcached at the highest priority, with a memory allocation of 3000 MB. Memcached runs as the user root, on port 7000.

4.3 Client Library

For client-server communication, we used the libmemcached 0.34 client library. The reasons for choosing this particular client were:

1. Libmemcached is written in C, making it one of the fastest memcached client libraries. Since we use it for performance analysis, we wanted the client library to be as fast as possible so that it does not adversely affect performance.
2. Libmemcached undergoes active development, and is one of the more "mature" client libraries for memcached.

4.4 Profiler

For profiling, we used OProfile[8]. OProfile is a system-wide profiler for Linux systems, capable of profiling all running code at low overhead. It consists of a kernel driver and a daemon for collecting sample data, along with several post-profiling tools for turning data into information. OProfile leverages the hardware performance counters of the CPU to enable profiling of a wide variety of interesting statistics, which can also be used for basic time-spent profiling. All code is profiled: hardware and software interrupt handlers, kernel modules, the kernel, shared libraries, and applications.

4.5 Workload

Memcached is a general-purpose key-value store - it does not assume anything about the values that it is asked to store. To faithfully mimic this in our experiments, we randomly generated strings containing ASCII characters. In our experiments, the actual value stored in memcached is always random - only the length of the value is controlled during the experiments. Upon startup, memcached is first pre-populated with keys from the range 0 to 10000. Then each client randomly picks a number in this range and requests the value for that key. In our workload, all of the GET requests are hits and successfully return a random string.
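As an illustration, a condensed sketch of this workload is shown below, again using libmemcached. The value length, constants, and helper names are our own; as in the paper's experiments, only the length of each value is controlled.

    /* Sketch of the workload in Section 4.5: pre-populate keys "0".."10000"
     * with random ASCII values of a controlled length, then issue GETs for
     * randomly chosen keys. VALUE_LEN and the helper names are illustrative. */
    #include <libmemcached/memcached.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NUM_KEYS  10000
    #define VALUE_LEN 1024            /* e.g., the 1 KB workload */

    static void random_ascii(char *buf, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            buf[i] = (char)('!' + rand() % 94);   /* printable ASCII */
    }

    void run_workload(memcached_st *memc, long num_requests)
    {
        char key[16], value[VALUE_LEN];
        memcached_return rc;

        /* Pre-populate: only the length of each value is controlled. */
        for (int i = 0; i <= NUM_KEYS; i++) {
            snprintf(key, sizeof(key), "%d", i);
            random_ascii(value, VALUE_LEN);
            memcached_set(memc, key, strlen(key), value, VALUE_LEN,
                          (time_t)0, (uint32_t)0);
        }

        /* Request phase: every GET is a hit by construction. */
        for (long n = 0; n < num_requests; n++) {
            size_t vlen; uint32_t flags;
            snprintf(key, sizeof(key), "%d", (int)(rand() % (NUM_KEYS + 1)));
            char *v = memcached_get(memc, key, strlen(key), &vlen, &flags, &rc);
            free(v);   /* free(NULL) is safe on a failed GET */
        }
    }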
5. Analysis

We started up the memcached server, pre-populated it with values, and let the 9 client machines continuously request values from the server. For this analysis, we used keys and values of size 4 bytes. We ran OProfile and profiled the entire system, including kernel code. When we first profiled the system, we noticed that memcached was spending a lot of its time inside the kernel and the network driver, so we switched on additional options in OProfile that enabled us to see where inside the kernel and the network driver memcached was spending time.

5.1 Observations

1. A lot of the time (~15%) is spent in the code of the e1000 network driver. Around 3% of the time is spent waiting for input.
2. Copying data from user space to kernel space also takes up around 3% of the time. An additional 3% is spent in system calls, for switches between kernel space and user space.
3. Around 2% of the time is spent in handling TCP packets. The profiled results show that all the data arrives in order - the handler for out-of-order data packets is never called.
4. Around 2% of the time is spent in idle mode - the mwait_idle kernel method is used to make the processor sleep, waking it up when a write is done in a monitored memory region.
5. Locking among threads also takes up around 2% of the time.
6. Malloc accounts for around 2% of the time during network packet handling. This was initially surprising, since memcached uses a slab allocator instead of malloc/free. However, slab allocation is done only for internally storing data items, not for network packet handling and other functions.
7. The code structure of memcached is very modular - for every function that needs to be mutually exclusive, there is a wrapper function that does locking and unlocking, and then calls the function that actually does the work (see the sketch after this list).
8. The command interpretation of memcached takes around 2% of the time.
9. The hash of memcached takes around 1% of the time. Given that the hash function is an O(1) operation, this is surprising.
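The wrapper pattern of observation 7 looks roughly like the following sketch. This is an illustration in the spirit of memcached's code, not its actual source; the names and the single global lock are simplifications.

    /* Illustrative sketch of the wrapper/workhorse split in observation 7:
     * the wrapper takes a lock and delegates to the function doing the work.
     * Not memcached's actual source; names and locking are simplified. */
    #include <pthread.h>
    #include <stddef.h>

    typedef struct item item;   /* stand-in for memcached's item struct */

    static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Workhorse: assumes the lock is already held. */
    static item *do_item_get(const char *key, size_t nkey)
    {
        (void)key; (void)nkey;
        return NULL;            /* ... hash table lookup would go here ... */
    }

    /* Wrapper: what callers actually invoke. */
    item *item_get(const char *key, size_t nkey)
    {
        item *it;
        pthread_mutex_lock(&cache_lock);
        it = do_item_get(key, nkey);
        pthread_mutex_unlock(&cache_lock);
        return it;
    }

Merging the wrapper and the workhorse on hot paths is the optimization suggested in item 5 of the next section.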

5.2 Opportunities for optimizing memcached

1. Since a lot of time is spent in the network driver, it can be optimized for memcached's operations. This is one of the lines of optimization that we pursue in this work.
2. A lot of time is spent in switching between user and kernel space and copying data between the two. It should be kept in mind that the profiling was done on 1 KB sized values; for bigger sizes such as 256 KB, the amount of time spent on context switches and copying increases a lot. This is the second line of optimization that we pursue in this work.
3. TCP imposes a lot of overhead on memcached, both during connection startup and during data transfer. Performance could be improved if UDP is used instead of TCP. However, this would involve a complete rewrite of memcached's interactions with the network, and an additional implementation of reliability for memcached's get operations.
4. If memcached's workload is known in advance, such as the key-value sizes, the TCP implementation can be optimized for that workload. For example, all the TCP packets could be made the same size as the memcached key-value pairs. This would involve a considerable amount of work, tweaking the TCP implementation of the system.
5. Consider the actual workhorse functions and the wrapper functions in memcached. Given the number of times that some functions are called, merging these two functions and avoiding the function call overhead could give some performance improvement.
6. The command interpretation part of memcached could be optimized by replacing the text commands of memcached with a binary protocol. This is being worked on by the memcached developers.
7. The hash function of memcached seems to be very complex. It is complex because it provides certain properties. The hash function could be examined and simplified, cutting away unnecessary properties.

6. Design

From the analysis, we identified that memcached spends a significant amount of time in sending out packets over the network, and in copying information between user space and kernel space. In order to reduce the time spent in network transfer, we looked at tuning the e1000 network driver to be more efficient for memcached. In order to reduce the copying of information between user and kernel space, we looked at providing operating system support for memcached in the form of zero-copy techniques. Sections 6.1 and 6.2 explain e1000 tuning and operating system support respectively.

6.1 Tuning the e1000 driver

The e1000 supports a feature called interrupt blanking[2]. Servicing an interrupt consumes some amount of processing power, and servicing many interrupts all the time increases CPU load drastically. When interrupt blanking is enabled, the processor only responds to interrupts every few microseconds, which reduces the load on the processor. This feature is turned on by default in the e1000. While interrupt blanking reduces CPU load, it adversely affects throughput, since memcached does not respond to requests as fast as it can. We turned off interrupt blanking (also called interrupt coalescing) and measured the throughput of the system.

When the system is handling a lot of packets, polling might be a more efficient way to do I/O than interrupts. Our preliminary experiments using polling demonstrated that 9 clients do not create enough traffic for polling to be effective, so it was not explored further. We also increased the receive and transmit buffers in TCP to the maximum value allowed by the system.

6.2 Operating System Support

We investigated the option of providing operating system support for memcached. From the profiling, we noticed that there was a significant amount of copying of data from user space to kernel space and back. We came across two interesting system calls that were added recently to the Linux kernel - splice and vmsplice[10]. These system calls allow a user-space process access to a kernel buffer in the form of a pipe. The vmsplice system call transfers data from user space to a pipe. The splice system call transfers data from one file descriptor to another.

The key idea in both these system calls is that they are zero-copy[3] - they minimize the amount of copying between user space and kernel space, and also within kernel space. We expected to get some performance improvement by minimizing the amount of copying currently happening in memcached.
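The send path we aim for looks roughly like the sketch below: vmsplice() moves a user buffer into a pipe, and splice() then moves it from the pipe to the socket, with no user-to-kernel copy. This is a simplified sketch; the partial-transfer handling that an event-driven server actually needs is discussed in Section 7.2.2.

    /* Sketch of the zero-copy send path: user buffer -> pipe via vmsplice(),
     * pipe -> socket via splice(). Simplified; see Section 7.2.2 for the
     * partial-transfer handling a real event-driven server needs. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sys/uio.h>
    #include <unistd.h>

    /* pipefd[2] comes from pipe(). Returns 0 on success, -1 on error.
     * Caveat: without SPLICE_F_GIFT, the caller must not overwrite buf
     * until the data has drained from the pipe. */
    int zero_copy_send(int sockfd, int pipefd[2], void *buf, size_t len)
    {
        struct iovec iov = { .iov_base = buf, .iov_len = len };

        /* Map the user pages into the pipe (no copy). */
        ssize_t n = vmsplice(pipefd[1], &iov, 1, 0);
        if (n < 0)
            return -1;

        /* Move the pipe contents to the socket (no copy). */
        while (n > 0) {
            ssize_t m = splice(pipefd[0], NULL, sockfd, NULL,
                               (size_t)n, SPLICE_F_MOVE);
            if (m <= 0)
                return -1;
            n -= m;
        }
        return 0;
    }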
7. Implementation

We describe the hurdles we faced while implementing both of the approaches explained in the previous section, and the optimizations we made to memcached while implementing them.

7.1 Tuning the e1000 driver

Interrupt blanking in the e1000 driver is controlled by the InterruptThrottleRate parameter of the e1000 driver. The value set for this parameter is the maximum number of interrupts per second that the adapter will generate for incoming packets. We set this parameter to 0, which denotes that blanking is turned off - as each packet arrives, an interrupt is generated.

The parameter RxDescriptors specifies the number of receive buffer descriptors allocated by the driver. Increasing this value allows the driver to buffer more incoming packets, at the expense of increased system memory utilization. We set this parameter to the maximum value allowed.

The parameter RxIntDelay specifies the delay imposed on the generation of receive interrupts, in units of 1.024 microseconds. Increasing this value adds extra latency to frame reception and can end up decreasing the throughput of TCP traffic. We set this parameter to zero to reduce the delay and increase throughput.

The parameter TxDescriptors specifies the number of transmit descriptors allocated by the driver. Increasing this value allows the driver to queue more transmits. Each descriptor is 16 bytes. We set this parameter to the maximum value allowed.

The parameter TxIntDelay specifies the delay imposed on the generation of transmit interrupts, in units of 1.024 microseconds. We set this parameter to zero to reduce the delay and increase throughput.

We also carried out a preliminary evaluation of opportunistic polling. In opportunistic polling, the driver measures the number of packets arriving per second, and switches to interrupt mode under low load and polling under high load. This involved recompiling the e1000 driver with the NAPI option enabled.

Setting these parameters was done by changing the options for the e1000 driver in the /etc/modprobe.conf file, as sketched below.
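A plausible version of the resulting configuration line is shown below. The parameter names are the e1000 module options named above; the descriptor counts of 4096 are an assumption on our part - the actual maximum depends on the adapter, and comma-separated lists can be used to set per-port values.

    # /etc/modprobe.conf (illustrative; descriptor maxima depend on the adapter)
    options e1000 InterruptThrottleRate=0 RxDescriptors=4096 RxIntDelay=0 TxDescriptors=4096 TxIntDelay=0

The driver has to be reloaded (or the machine rebooted) for the new options to take effect.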
7.2 Operating System Support

7.2.1 Increasing kernel buffer sizes

The splice and vmsplice system calls are zero-copy and hence should enable us to reduce the amount of data copying happening due to memcached. Splice and vmsplice use a pipe for communication, and this pipe is implemented as a kernel buffer in the Linux kernel. This was the first hurdle we ran into while implementing splice support in memcached - the maximum size of a kernel buffer is set inside the kernel and cannot be changed via any system call. So we modified the Linux kernel to support bigger kernel buffers - this was made difficult by the complexity of the Linux kernel.

7.2.2 Using splice in memcached

Once we modified the Linux kernel to support kernel buffers of sizes up to 256 KB, we needed to modify memcached to use splice[9] and vmsplice. This turned out to be a non-trivial task because of the complexity of memcached's state machine and the asynchronous nature of vmsplice and splice. From the profiling, we zeroed in on the part of memcached that is responsible for sending messages over the network.

The amount of data transferred by splice from one file descriptor to another (in our case, from the pipe kernel buffer to the socket buffer) is not guaranteed to be the amount of data in the pipe kernel buffer. This might happen for a number of reasons, such as the socket buffer being temporarily full. In such cases, the splice system call returns the number of bytes actually transferred, indicating that not all the data was moved. We needed to identify when this was happening and to call splice again. Note that we do not need to call vmsplice again, as all the data has already been transferred from the user-space buffer to the kernel buffer. This complicated the error handling quite a bit, and we needed to store separate state to check whether splice had transferred all the data or not.

7.3 Optimizations

Once we implemented memcached with splice calls, we found that performance was terrible - there was a delay of about 1 second for each set/get operation. We needed to make some optimizations to memcached to make the performance better. It should be noted that sendmsg is already quite optimized and fine-tuned through wide usage, while the splice calls are just starting to be used and hence will need to be optimized a lot to deliver the promised performance benefit.

7.3.1 Creating a pool of pipes at startup

Our first optimization was creating a pool of pipe kernel buffers at memcached setup - because of the large number of connections, pipe creation was slowing down memcached by a considerable amount. Performance improved when pipe creation was moved to memcached startup time, rather than connection startup time. However, the throughput of modified memcached was still around 30% lower than that of original memcached. While we were expecting some performance degradation, this was higher than what we expected.

7.3.2 Avoiding lookup costs for pipe-connection information

The second optimization was changing the structure of a memcached connection so as to store information about the kernel buffer pipe allocated to that connection. Previously, we had implemented a table that stored this information; upon connection startup, the table was scanned, a pipe was allocated to that connection, and the information was stored in the table. However, the table lookup was reducing performance - once we stored this information inside each connection, the performance of modified memcached was on par with that of original memcached when we stored large key-value pairs in memcached.
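A sketch of the two optimizations follows, under the assumption of a simple free-list pool and an added spipe field in the connection structure. All names are ours, not memcached's, and a real multi-threaded memcached would need a lock around the pool.

    /* Sketch of 7.3.1/7.3.2: a pool of pipes created once at startup, with
     * each connection caching its assigned pipe in the connection structure,
     * avoiding both per-connection pipe() calls and a per-request table
     * lookup. Illustrative only; thread safety is omitted. */
    #include <unistd.h>

    #define MAX_PIPES 1024
    static int pipe_pool[MAX_PIPES][2];
    static int pipes_free = 0;

    typedef struct conn {
        int sfd;        /* socket for this connection */
        int spipe[2];   /* pipe assigned to this connection (7.3.2) */
        /* ... rest of memcached's connection state ... */
    } conn;

    void pipe_pool_init(void)           /* called once at memcached startup */
    {
        while (pipes_free < MAX_PIPES && pipe(pipe_pool[pipes_free]) == 0)
            pipes_free++;
    }

    void conn_assign_pipe(conn *c)      /* called at connection setup */
    {
        if (pipes_free > 0) {
            pipes_free--;
            c->spipe[0] = pipe_pool[pipes_free][0];
            c->spipe[1] = pipe_pool[pipes_free][1];
        } else {
            pipe(c->spipe);             /* pool exhausted: create on demand */
        }
    }

    void conn_release_pipe(conn *c)     /* called at connection teardown */
    {
        if (pipes_free < MAX_PIPES) {
            pipe_pool[pipes_free][0] = c->spipe[0];
            pipe_pool[pipes_free][1] = c->spipe[1];
            pipes_free++;
        } else {
            close(c->spipe[0]);
            close(c->spipe[1]);
        }
    }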
8. Evaluation

To analyze the performance of memcached and to measure the effects of our modifications, we evaluate performance via two criteria: how memcached scales with the number of clients connecting to it, and how memcached scales with the size of the data objects stored in it.

Each test was performed at least 10 times; the average value was computed and is shown in the graphs. The standard deviation was also calculated and is shown in the form of y-error bars on the graphs.

The maximum data value size used in the experiments is 250 KB, since we increased the kernel buffer size in the kernel to a maximum of 256 KB. Increasing the kernel buffer size further caused the kernel to crash.

8.1 Splice vs Sendmsg

In our modification to memcached, we have replaced the usage of sendmsg with system calls of the splice family. For every sendmsg system call, there are now two splice-family system calls being used - splice and vmsplice. This means that we are trading increased efficiency in copying data for additional system call overhead. Therefore, unless the amount of data copied is significant, and the inefficiency of sendmsg with regard to data copying outweighs the system call overhead, the splice system calls will perform worse than sendmsg. This should be kept in mind while considering the following results.

8.2 Scalability with respect to number of clients

For the client scalability test, we borrowed 24 machines from the Wisconsin Advanced Internet Lab (as mentioned in Section 4). We varied the number of clients connecting to the server, and measured throughput. To further test scalability, we ran multiple instances of the client code on each of the 23 client machines, effectively allowing us to have 46 and 69 clients connecting to the memcached server.

We performed this test with different data sizes: 1 KB, 80 KB, 120 KB and 250 KB. These data sizes were selected from all across the spectrum and emphasize different aspects of the workload. For example, the 1 KB workload has a lot of small packets that shift the focus from copying data to responding to interrupts and sending the packets over the network. On the other hand, the 250 KB workload focuses on copying and handling of data, with interrupt handling taking less importance.
8.2.1 1 KB Workload

The 1 KB workload has the clients bombarding the server with lots of small packets at a high rate. Thus, the capacity of the server to handle interrupts and efficiently send packets over the network comes into play, rather than how efficiently the server copies data back and forth between user and kernel space. Using splice system calls is therefore not expected to give any performance benefit here.

Figure 2 shows the performance of memcached for the 1 KB workload. The performance of original memcached and our modified version, both with and without interrupt blanking disabled, is shown. As expected, modified memcached, which uses the splice system calls, performs badly when compared to original memcached.

Up to 10 clients, disabling interrupt blanking works well, consistently outperforming original memcached. However, for any number of clients above 10, the performance degrades. This behavior becomes less mystifying if Figure 3 is considered. The performance of memcached depends heavily on the load that the processor is under. Since the server has two processors, the maximum load is 200%. Whenever the load becomes equal to or greater than 180%, memcached performance degrades very badly.

Looking at Figure 3, memcached with interrupt blanking disabled reaches 180% CPU load at 10 clients. This is exactly the point at which its performance starts degrading. We believe that on a more powerful processor, or one with dual cores, the technique of disabling interrupt blanking will scale to many more than 10 clients.

8.2.2 80 KB Workload

For the 80-160 KB workloads, the focus is on a mixture of how the server handles interrupts and how efficiently the data is being copied. Figure 4 shows the performance for this workload. Because of the larger data size, the number of packets sent and processed per second by the server is reduced.

For the technique of disabling interrupt blanking to work, the rate at which interrupts happen must be high. With 80 KB packets, the rate of sending packets is considerably reduced. Moreover, the number of clients connecting to the server is not high. Hence, memcached with interrupt blanking disabled does not outperform original memcached.

Modified memcached still performs worse than original memcached, but the degradation is now within 1%, as compared to the 23% of the 1 KB workload (this value is for 10 clients). The additional copying that must be done for the 80 KB workload enables modified memcached to catch up to original memcached.

8.2.3 120 KB Workload

The 120 KB workload is in the middle of the range between 1 KB and 250 KB, and as such represents the middle ground between interrupt handling efficiency and data copying efficiency. Figure 5 shows the performance for this workload. Once again, because of the larger size of the data, the rate at which packets arrive at the server decreases, and disabling interrupt blanking does not provide us with a performance gain.

Even though the curve for modified memcached looks well below that of original memcached in the graph, in reality the performance degradation is within 1% of the original memcached values. Note that sendmsg has been in use for a long time, while splice is a newly introduced system call. Thus, it is not surprising that the splice system calls need to be further optimized before providing a performance gain.

8.2.4 250 KB Workload

The 250 KB workload represents the other end of the spectrum, with the focus being on data copying efficiency rather than interrupt handling. Because of the large data size, the rate of packets arriving at the server is low, ruling out performance gains due to disabling interrupt blanking.

Figure 6 shows the performance for this workload. We notice an anomaly at the 20-client point, where the performance of modified memcached (both with and without interrupt blanking disabled) is better than that of original memcached. Unfortunately, investigating the CPU load did not give us a clue as to why this anomaly occurred.

Looking at the curve for modified memcached with interrupt blanking disabled, we notice an interesting trend. In all of the workloads so far, modified memcached with interrupt blanking disabled has always performed poorly in comparison with the others. For the 250 KB workload, from 8 clients onwards, modified memcached with interrupt blanking disabled outperforms all the other modification techniques we tried, namely only splice and only interrupt blanking disabled. It is within 1% of the performance of original memcached at all times. This is highly promising, and may indicate that the combination of our approaches has potential as the data size increases.

8.3 Scalability with respect to data size

For the data scalability test, we used 10 clients and workloads of sizes varying from 1 KB to 250 KB. The reason that we used 10 clients is that when interrupt blanking is disabled, the CPU load reaches the maximum at 16 clients, leading to performance degradation.

Figures 7 and 8 show the performance of memcached on these workloads. The reason that there are two graphs is that throughput values for the smaller workload sizes are in the thousands, while throughput for the larger sizes is in the hundreds; if both graphs were combined, it would be extremely hard to read. Note that the throughput is in terms of packets per second - so while the number of packets decreases as the size of each packet increases, the throughput in terms of bytes is sustained.

Figure 7 shows that for smaller workload sizes, disabling interrupt blanking improves performance. Beyond the 24 KB mark, the modifications and original memcached perform almost the same, thus corroborating the less than 1% difference in performance that was seen in Figures 4, 5 and 6.

From Figure 8, we can infer that as the data size increases, the performance of modified memcached gets closer and closer to that of original memcached. This is expected, since larger data sizes cause more data copying. The performance of modified memcached is almost on par with the performance of original memcached. Further optimizations are needed for the splice system call family in order to provide a performance gain.

Figure 2. Performance of memcached for the 1 KB workload (throughput in packets/sec vs. number of clients). Disabling interrupt blanking improves performance consistently until the number of clients increases beyond 10. Modified memcached performs poorly due to the small amount of data copying taking place.

Figure 3. CPU load for the 1 KB workload (CPU load percentage vs. number of clients). Original memcached scales beautifully, with the CPU load increasing only slightly as the number of clients increases. When interrupt blanking is disabled, the extra CPU load causes memcached to reach peak CPU load earlier than normal.

Figure 4. Performance of memcached for the 80 KB workload (throughput in packets/sec vs. number of clients). Due to the bigger data size, the packet arrival rate is reduced and memcached with interrupt blanking disabled performs poorly. Modified memcached catches up to original memcached, with the difference in performance being less than 14 packets/sec, or about 1%.

Memcached Performance for 120 KB workload 980 Original memcached Modified memcached Original memcached + Interrupts 960 Modified memcached + Interrupts

940

920

900

880 Throughput (packets/sec)

860

840

820 10 20 30 40 50 60 70 # of clients Figure 5. Performance of memcached for 120 KB workload. Disabling interrupt blanking does not give performance gains, while modified memcached still performs slightly worse ( approx 1%) than original memcached Memcached Performance for 250 KB workload 465 Original memcached Modified memcached 460 Original memcached + Interrupts Modified memcached + Interrupts 455

Figure 6. Performance of memcached for the 250 KB workload (throughput in packets/sec vs. number of clients). Modified memcached, while getting very close to original memcached's performance, is still unable to do better. Interestingly, modified memcached with interrupt blanking disabled does very well on this workload, performing better than the splice modifications alone or disabling interrupt blanking alone.

Figure 7. Performance of memcached for workloads of sizes 1-48 KB (throughput in packets/sec vs. workload size). Disabling interrupt blanking provides benefits when the data size is small, as this leads to more packets arriving at the server per second. Modified memcached's performance is very close to that of original memcached as data size increases.

Figure 8. Performance of memcached for workloads of sizes 48-250 KB (throughput in packets/sec vs. workload size). As data size increases, the difference between the performance of original and modified memcached decreases. Modified memcached's performance is almost on par with that of original memcached.
9. Related Work

Facebook, having the largest deployment of memcached in the world, has gone to some lengths to optimize it[11]. They have implemented opportunistic polling, which switches to polling under high load and interrupts under low load. They have also replaced TCP with UDP. Both of these approaches are suggested by the profiling in our work. We did a preliminary test with opportunistic polling and concluded that, with our small experimental setup, we would not be able to see the benefit derived from it. Switching from TCP to UDP is a huge task and we would not have been able to complete it within the project deadline.

Our system-wide profiling was inspired by DEC's early work[1] on continuous profiling. Shanti, an engineer at Sun, has done some coarse-grained performance analysis[12] of memcached. However, that study does not explain what key-value pairs were used, how many clients were used, and many other crucial details.

Josh Simons' blog[13] talks about interrupt blanking and how it can be used to increase throughput and reduce latency. We have utilized this technique in our work. [3] is a good resource for understanding zero copy. The splice system call is explained by Linus Torvalds, the creator of Linux, in a mailing list post[14]. It is also discussed in [5] and [4].

10. Conclusion

In this work, we have analyzed the performance of memcached with respect to the number of clients and the size of key-value pairs. We have profiled memcached and identified possible bottlenecks. We have enumerated opportunities for optimizing memcached and have explored two such opportunities: interrupt blanking and zero-copying. We have implemented both of these modifications in memcached and have analyzed the effect on performance. From the analysis, we conclude that:

1. The memcached application in user space is very finely tuned and offers very little scope for further optimization.
2. Memcached offers a lot of opportunities for optimization in its network and kernel components. An operating system and a kernel driver optimized for memcached are likely to give very high performance gains.
3. Disabling interrupt blanking will provide performance benefits when the number of packets arriving at the server per second is large.
4. Theoretically, using splice system calls should provide performance gains. However, the splice system calls have been introduced only recently and need more optimization before they can equal the performance of the long-used-and-optimized sendmsg.

References

[1] Jennifer M. Anderson, Lance M. Berc, Jeffrey Dean, Sanjay Ghemawat, Monika R. Henzinger, Shun-Tak A. Leung, Richard L. Sites, Mark T. Vandevoorde, Carl A. Waldspurger, and William E. Weihl. Continuous profiling: Where have all the cycles gone? ACM Trans. Comput. Syst., 15(4):357-390, 1997.
[2] Intel. Linux* base driver overview and installation. http://www.intel.com/support/network/sb/cs-009209.htm.
[3] Linux Journal. Zero copy I: User-mode perspective. http://www.linuxjournal.com/article/6345.
[4] LWN.net mailing list. And what becomes of zero-copy? http://lwn.net/Articles/178682/.
[5] LWN.net. Some new system calls. http://lwn.net/Articles/164887/.
[6] Memcached official website. http://memcached.org/.
[7] Trond Norbye. Communication with memcached, from Trond Norbye to Anatoly Vorobey. http://code.sixapart.com/svn/memcached/trunk/server/doc/memory_management.txt.
[8] OProfile. http://oprofile.sourceforge.net.
[9] Wikipedia. Splice (system call). http://en.wikipedia.org/wiki/Splice_(system_call).
[10] Linux man pages. splice(2). http://manpages.courier-mta.org/htmlman2/splice.2.html.
[11] Paul Saab. Scaling memcached at Facebook. http://www.facebook.com/note.php?note_id=39391378919.
[12] Shanti. Memcached performance on Sun's Nehalem system. http://blogs.sun.com/shanti/entry/memcached_on_nehalem1.
[13] Josh Simons. The navel of Narcissus: Solaris TCP latency for HPC. http://blogs.sun.com/simons/entry/solaris_tcp_latency_for_hpc.
[14] Linus Torvalds. Linux: Explaining splice() and tee(). http://kerneltrap.org/node/6505.