
SlimCache: Exploiting Data Compression Opportunities in Flash-based Key-value Caching

Yichen Jia (Computer Science and Engineering, Louisiana State University, [email protected])
Zili Shao (Computer Science and Engineering, The Chinese University of Hong Kong, [email protected])
Feng Chen (Computer Science and Engineering, Louisiana State University, [email protected])

Abstract—Flash-based key-value caching is becoming popular in data centers for providing high-speed key-value services. These systems adopt slab-based space management on flash and provide a low-cost solution for key-value caching. However, optimizing cache efficiency for flash-based key-value cache systems is highly challenging, due to the huge number of key-value items and the unique technical constraints of flash devices. In this paper, we present a dynamic on-line compression scheme, called SlimCache, to improve the cache hit ratio by virtually expanding the usable cache space through data compression. We have investigated the effect of compression granularity to achieve a balance between compression ratio and speed, and leveraged the unique workload characteristics of key-value systems to efficiently identify and separate hot and cold data. In order to dynamically adapt to workload changes during runtime, we have designed an adaptive hot/cold area partitioning method based on a cost model. In order to avoid unnecessary compression, SlimCache also estimates data compressibility to determine whether the data are suitable for compression or not. We have implemented a prototype based on Twitter's Fatcache. Our experimental results show that SlimCache can accommodate up to 125.9% more key-value items in flash, effectively increasing throughput by up to 255.6% and reducing average latency by up to 78.9%.

I. INTRODUCTION

Today's data centers still heavily rely on hard disk drives (HDDs) as their main storage devices. To address the performance problems of disk drives, especially for handling random accesses, in-memory key-value cache systems, such as Memcached [37], have become popular in data centers for serving various applications [20], [48]. Although memory-based key-value caches can eliminate a large number of key-value data retrievals (e.g., "User ID" and "User Name") from the back-end data stores, they also raise concerns about high cost and power consumption in a large-scale deployment. As an alternative, flash-based key-value cache systems have recently attracted increasing interest in industry. For example, Facebook has deployed a flash-based key-value cache system, called McDipper [20], as a replacement for its expensive Memcached servers. Twitter has a similar key-value cache solution, called Fatcache [48].

A. Motivations

The traditional approach to improving caching efficiency is to develop sophisticated cache replacement algorithms [36], [26]. Unfortunately, this is highly challenging in the scenario of flash-based key-value caching, for two reasons.

First, compared to memory-based key-value caches, such as Memcached, flash-based key-value caches are usually 10-100 times larger. As key-value items are typically small (e.g., tens to hundreds of bytes), a flash-based key-value cache often needs to maintain billions of key-value items, or even more. Tracking such a huge number of small items in cache management would result in an unaffordable overhead. Also, many advanced cache replacement algorithms, such as ARC [36] and CLOCK-Pro [26], need to maintain a complex data structure and a deep access history (e.g., information about evicted data), making the overhead even more pronounced. Therefore, a complex caching scheme is practically infeasible for flash-based key-value caches.

Second, unlike DRAM, flash memories have several unique technical constraints, such as the well-known "no in-place overwrite" and "sequential-only writes" requirements [7], [15]. As such, flash devices generally favor large, sequential, log-like writes rather than small, random writes. Consequently, flash-based key-value caches do not directly "replace" small key-value items in place as Memcached does. Instead, key-value data are organized and replaced in large, coarse-grained chunks, relying on Garbage Collection (GC) to recycle the space occupied by obsolete or deleted data. This unfortunately further reduces the usable cache space and hurts caching efficiency.

For these two reasons, it is difficult to rely solely on a complicated, fine-grained cache replacement algorithm to improve the cache hit ratio for key-value caching in flash. In fact, real-world flash-based key-value cache systems often adopt a simple, coarse-grained caching scheme. For example, Twitter's Fatcache uses a First-In-First-Out (FIFO) policy to manage its cache at the large granularity of slabs (a group of key-value items) [48]. Such a design, we should note, is an unwillingly-made but necessary compromise to fit the needs of caching many small key-value items in flash.

This paper seeks an alternative solution to improve the cache hit ratio, one that is often ignored in practice: increasing the effective cache size. The key idea is that for a given cache capacity, the data could be compressed to save space, which would "virtually" enlarge the usable cache space and allow us to accommodate more data in the flash cache, in turn increasing the hit ratio.

In fact, on-line compression fits flash devices very well.
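The "virtual enlargement" argument can be made concrete with a few lines of Python. This is an illustrative sketch only: zlib stands in for the compressor, and the record format, 64KB budget, and 4KB container size are our own assumptions, not measurements from the paper.

```python
import json
import zlib

# Synthetic small key-value items (illustrative record format).
def make_item(i):
    return json.dumps({"user_id": i, "user_name": f"user{i:06d}",
                       "bio": "flash caching " * 6}).encode()

budget = 64 * 1024                 # a fixed slice of cache space, in bytes
item_size = len(make_item(0))

# Compression ratio (uncompressed / compressed) of a ~4KB container
# of packed items.
container = b"".join(make_item(i) for i in range(64))[:4096]
ratio = len(container) / len(zlib.compress(container))

raw_items = budget // item_size                   # items that fit as-is
virtual_items = int(budget * ratio) // item_size  # items after compression

print(f"compression ratio ~ {ratio:.2f}")
print(f"{raw_items} items fit raw, ~{virtual_items} with compression")
```

For a fixed byte budget, any compression ratio above 1 translates directly into more cached items and thus a higher potential hit ratio, which is the trade the paper exploits.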
[Figures omitted. Fig. 1: I/O time vs. computation time. Fig. 2: Compression ratio vs. granularity. Fig. 3: Distribution of item sizes.]

Figure 1 shows the percentage of I/O and computation time for compressing and decompressing random data at different request sizes. The figure illustrates that for read requests, the decompression overhead contributes only a relatively small portion of the total time, less than 2% for requests smaller than 64KB. For write requests, the compression operations are more computationally expensive, contributing about 10%-30% of the overall time, but this is still of the same order of magnitude as an I/O access to flash. Compared to schemes that compress data in memory, such as zExpander [54], the relative computing overhead accounts for an even smaller percentage, indicating that it is feasible to apply on-line compression in flash-based caches.

B. Challenges and Critical Issues

Though promising, efficiently incorporating on-line compression into flash-based key-value cache systems is non-trivial. Several critical issues must be addressed.

First, various compression algorithms differ significantly in compression efficiency and computational overhead [3], [5], [32]. Lightweight algorithms, such as lz4 [32] and snappy [3], are fast, but provide only a moderate compression ratio (i.e., uncompressed size / compressed size); heavyweight schemes, such as the deflate algorithm used in gzip [2] and zlib [5], can provide better compression efficacy, but are relatively slow and incur higher overhead. We need to select a proper algorithm.

Second, compression efficiency is highly dependent on the compression unit size. A small unit size suffers from a low compression ratio, while an aggressively oversized compression unit could incur a severe read amplification problem (i.e., reading more than needed). Figure 2 shows the average compression ratio of three datasets (Weibo, Tweets, Reddit) with different container sizes. We can see that these three datasets are all compressible, as expected, and that a larger compression granularity generally results in a higher compression ratio. In contrast, compressing each key-value item individually or using a small compression granularity (e.g., smaller than 4 KB) cannot reduce the data size effectively.

Third, not all data are suitable for compression; we need to estimate the data compressibility and conditionally apply on-line compression to minimize the overhead.

Last but not least, we also need to be fully aware of the unique properties of flash devices. For example, flash devices generally favor large, sequential writes. The traditional log-based solution, though able to avoid generating small, random writes, relies on an asynchronous Garbage Collection (GC) process, which would leave a large amount of obsolete data occupying the precious cache space and negatively affect the cache hit ratio.

All these issues must be well considered for an effective adoption of compression in flash-based key-value caching.

C. Our Solution: SlimCache

In this paper, we present an adaptive on-line compression scheme for key-value caching in flash, called SlimCache. SlimCache identifies the key-value items that are suitable for compression, applies compression and decompression at a proper granularity, and expands the effectively usable flash space for caching more data.

In SlimCache, the flash cache space is dynamically divided into two separate regions, a hot area and a cold area, to store frequently and infrequently accessed key-value items, respectively. Based on the highly skewed access patterns of key-value systems [8], the majority of infrequently accessed key-value items are cached in flash in a compressed format for the purpose of space saving. A small set of frequently accessed key-value items is cached in its original, uncompressed format to avoid the read amplification and decompression penalty. The partitioning is automatically adjusted based on the runtime workload. In order to create the desired large sequential write pattern on flash, the cache eviction process and the hot/cold data separation mechanism are integrated to minimize the cache space waste caused by data movement between the two areas.

To our best knowledge, SlimCache is the first work introducing compression into flash-based key-value caches.
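The granularity effect that Figure 2 reports is easy to reproduce with any deflate-style compressor. The sketch below uses Python's zlib as a stand-in and synthetic records as illustrative data (not the Weibo/Tweets/Reddit datasets); it compares compressing each small item individually against packing items into ~4KB containers first.

```python
import json
import zlib

# Synthetic small key-value items standing in for tweets/posts.
items = [json.dumps({"id": i, "user": f"user{i}",
                     "text": "hello world " * 8}).encode()
         for i in range(256)]
raw = sum(len(it) for it in items)

# Per-item compression: each small item is compressed on its own,
# so cross-item redundancy is invisible to the compressor.
per_item = sum(len(zlib.compress(it)) for it in items)

# Container compression: pack items into ~4KB containers first, letting
# the compressor exploit redundancy across neighboring items.
containers, buf = [], b""
for it in items:
    buf += it
    if len(buf) >= 4096:
        containers.append(buf)
        buf = b""
if buf:
    containers.append(buf)
per_container = sum(len(zlib.compress(c)) for c in containers)

print(f"compression ratio per item:      {raw / per_item:.2f}")
print(f"compression ratio per container: {raw / per_container:.2f}")
```

The container variant wins both because one compression stream amortizes its header overhead over many items and because repeated field names and values across items become compressible, which is the intuition behind choosing a multi-KB compression granularity.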
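The hot/cold separation described in Section C can be sketched in miniature. This is a toy model under simplifying assumptions: real SlimCache manages slabs on flash and adapts the partition with a cost model, whereas this sketch uses per-item zlib compression, a fixed hot-area size, and LRU-style demotion.

```python
import zlib
from collections import OrderedDict

class TwoAreaCache:
    """Toy sketch of a two-area cache: a small uncompressed hot area
    plus a compressed cold area (not SlimCache's actual mechanism)."""

    def __init__(self, hot_capacity=4):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()   # key -> raw bytes, in recency order
        self.cold = {}             # key -> zlib-compressed bytes

    def set(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            # Demote the least recently used item, compressing it so the
            # cold area holds more data per byte of cache space.
            old_key, old_val = self.hot.popitem(last=False)
            self.cold[old_key] = zlib.compress(old_val, 1)  # fast level

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)   # hot hits avoid decompression
            return self.hot[key]
        if key in self.cold:
            value = zlib.decompress(self.cold.pop(key))
            self.set(key, value)        # promote to hot on access
            return value
        return None
```

Reads of hot items pay no decompression cost, while the long tail of cold items sits compressed; a promotion on access keeps the frequently read set uncompressed, mirroring the skew-driven design the text describes.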