Reducing DRAM Footprint with NVM in Facebook

Assaf Eisenman¹٫², Darryl Gardner², Islam AbdelRahman², Jens Axboe², Siying Dong², Kim Hazelwood², Chris Petersen², Asaf Cidon¹, Sachin Katti¹
¹Stanford University, ²Facebook, Inc.

ABSTRACT

Popular SSD-based key-value stores consume a large amount of DRAM in order to provide high-performance database operations. However, DRAM can be expensive for data center providers, especially given recent global supply shortages that have resulted in increasing DRAM costs. In this work, we design a key-value store, MyNVM, which leverages an NVM block device to reduce DRAM usage, and to reduce the total cost of ownership, while providing comparable latency and queries-per-second (QPS) as MyRocks on a server with a much larger amount of DRAM. Replacing DRAM with NVM introduces several challenges. In particular, NVM has limited read bandwidth, and it wears out quickly under a high write bandwidth.

We design novel solutions to these challenges, including using small block sizes with a partitioned index, aligning blocks post-compression to reduce read bandwidth, utilizing dictionary compression, implementing an admission control policy for which objects get cached in NVM to control its durability, as well as replacing interrupts with a hybrid polling mechanism.

We implemented MyNVM and measured its performance in Facebook's production environment. Our implementation reduces the size of the DRAM cache from 96 GB to 16 GB, and incurs a negligible impact on latency and queries-per-second compared to MyRocks. Finally, to the best of our knowledge, this is the first study on the usage of NVM devices in a commercial data center environment.

                      DRAM       NVM        TLC Flash
  Max Read Bandwidth  75 GB/s    2.2 GB/s   1.3 GB/s
  Max Write Bandwidth 75 GB/s    2.1 GB/s   0.8 GB/s
  Min Read Latency    75 ns      10 µs      100 µs
  Min Write Latency   75 ns      10 µs      14 µs
  Endurance           Very High  30 DWPD    5 DWPD
  Capacity            128 GB     375 GB     1.6 TB

Table 1: An example of the key characteristics of DDR4 DRAM with 2400 MT/s, an NVM block device, and a 3D TLC flash device in a typical data center server. DWPD means Device Writes Per Day.

EuroSys '18, April 23–26, 2018, Porto, Portugal. © 2018 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-5584-1/18/04. https://doi.org/10.1145/3190508.3190524

1 INTRODUCTION

Modern key-value stores like RocksDB and LevelDB are highly dependent on DRAM, even when using a persistent storage medium (e.g. flash) for storing data. Since DRAM is still about 1000× faster than flash, these systems typically leverage DRAM for caching hot indices and objects.

MySQL databases are often implemented on top of these key-value stores. For example, MyRocks, which is built on top of RocksDB, is the primary MySQL database in Facebook and is used extensively in data center environments. MyRocks stores petabytes of data and is used to serve real-time user activities across large-scale web applications [23]. In order to achieve high throughput and low latency, MyRocks consumes a large amount of DRAM. In our case, we leverage a MyRocks server that consists of 128 GB DRAM and 3 TB of flash. Many other key-value stores are also highly dependent on DRAM, including Tao [9] and Memcached [24].

As DRAM faces major scaling challenges [19, 22], its bit supply growth rate has experienced a historical low [17]. Together with the growing demand for DRAM, these trends have led to problems in global supply [17], increasing the total cost of ownership (TCO) for data center providers. Over the last year, for example, the average DRAM DDR4 price has increased by 2.3× [1].

Our goal in this work is to reduce the DRAM footprint of MyRocks, while maintaining comparable mean latency, 99th percentile latency, and queries-per-second (QPS). The first obvious step would be to simply reduce the DRAM capacity on a MyRocks server configuration. Unfortunately, reducing the amount of DRAM degrades the performance of MyRocks. To demonstrate this point, Figure 1 shows that when reducing the DRAM cache in MyRocks from 96 GB to 16 GB, the mean latency increases by 2×, and P99 latency increases by 2.2×.

NVM offers the potential to reduce the dependence of systems like MyRocks on DRAM. NVM comes in two forms: a more expensive byte-addressable form, and a less expensive block device. Since our goal is to reduce total cost of ownership, we use NVM as a block device. Table 1 provides a breakdown of key DRAM, NVM, and Triple-Level Cell (TLC) flash characteristics in a typical data center server. An NVM block device offers 10× faster reads and has 5× better durability than flash [3]. Nevertheless, NVM is still more than 100× slower than DRAM. To make up for that, NVM can be used to reduce the number of accesses to the slower flash layer, by significantly increasing the overall cache size. To give a sense of the price difference between NVM and DRAM, consider that on October 23, 2017, the listed retail price for an Intel Optane NVM device with 16 GB was $39, compared to $170 for 16 GB of DDR4 DRAM 2400 MHz.

However, there are several unique challenges when replacing DRAM with an NVM device. First, NVM suffers from significantly lower read bandwidth than DRAM, and its bandwidth heavily depends on the block size. Therefore, simply substituting DRAM with


Figure 1: Average and P99 latencies for different cache sizes in MyRocks, compared with MyNVM, using real production workloads.

NVM will not achieve the desired performance due to read bandwidth constraints. In addition, significantly reducing the DRAM footprint results in a smaller amount of data that can be cached for fast DRAM access. This requires a redesign of the indices used to look up objects in RocksDB. Second, using smaller data blocks reduces their compression ratio, thus increasing the total database size on flash. Third, unlike DRAM, NVM has write durability constraints. If it is used as a DRAM replacement without modification, it will wear out too quickly. Fourth, because NVM has a very low latency relative to other block devices, the operating system interrupt overhead becomes significant, adding a 20% average latency overhead.

We present MyNVM, a system built on top of MyRocks, that significantly reduces the DRAM cache using a second-layer NVM cache. Our contributions include several novel design choices that address the problems arising from adopting NVM in a data center setting:

(1) NVM bandwidth and index footprint: Partitioning the database index enables the caching of fine-grained index blocks. Thus, the amount of cache consumed by the index blocks is reduced, and more DRAM space is left for caching data blocks. Consequently, the read bandwidth of the NVM is also reduced. When writing to NVM, compressed blocks are grouped together in the same NVM page (when possible) and aligned with the device pages, to further reduce NVM read bandwidth.

(2) Database size: When using smaller block sizes, compression becomes less efficient. To improve the compression ratio we utilize dictionary compression.

(3) Write durability: Admission control to NVM only permits writing objects to NVM that will have a high hit rate, thus reducing the total number of writes to NVM.

(4) Interrupt latency: Hybrid polling (where the kernel sleeps for most of the polling cycle) can reduce the I/O latency overhead by 20% in certain server configurations, while minimizing CPU consumption.

We implemented MyNVM and deployed it in a MyRocks production environment in Facebook, and were able to reduce the DRAM cache by 6×. Our implementation achieves a mean latency and P99 latency of 443 µs and 3439 µs, respectively. As shown in Figure 1, both its mean and P99 latencies are 45% lower than the MyRocks server with a low amount of DRAM. Compared to the MyRocks server with the high amount of DRAM, MyNVM's average latency is only 10% higher, and its P99 latency is 20% higher. Its QPS is 50% higher than the MyRocks server with a small amount of DRAM, and only 8% lower than MyRocks with the large DRAM cache.

2 BACKGROUND

To give context for our system, we provide a primer on RocksDB and MyRocks, and discuss the performance and durability challenges of NVM technology.

2.1 RocksDB and MyRocks

RocksDB is a popular log-structured merge tree (LSM) key-value store for SSD. Since many applications still rely on SQL backends, MyRocks is a system that adapts RocksDB to support MySQL queries, and is used as the primary database backend at Facebook [23].

In RocksDB [5], data is always written to DRAM, where it is stored in Memtables, which are in-memory data structures based on Skiplists. Once a Memtable becomes full, its data is written into a Sorted String Table (SST), which is depicted in Figure 2. An SST is a file on disk that contains sorted variable-sized key-value pairs that are partitioned into a sequence of data blocks. In addition to data blocks, the SST stores meta blocks, which contain Bloom filters and metadata of the file (e.g. properties such as data size, index size, number of entries, etc.). Bloom filters are space-efficient probabilistic bit arrays that are used to determine whether the SST file may contain an arbitrary key. At the end of the file, the SST stores one index block. The key of an index entry is the last key in the corresponding data block, and the value is the offset of that block in the file.

Figure 2: The format of a Sorted String Table (SST).

When executing a read query, RocksDB first checks whether its key exists in the Bloom filters. If it does, it will try to read the result of the query. It will always first check the in-memory data structures before reading from disk. In addition to storing Memtables in DRAM, RocksDB also caches hot data blocks, as well as index and filter blocks.

SST files are stored in flash in multiple levels, called L0, L1, and so forth. L0 contains files that were recently flushed from the Memtable. Each level after L0 is sorted by key. To find a key, a binary search is first done over the start key of all SST files to identify which file contains the key. Only then, an additional binary search is done inside the file (using its index block) to locate the exact position of the key. Each level has a target size (e.g., 300 MB, 3 GB, 30 GB), and once it reaches its target size, a compaction is triggered. During compaction, at least one file is picked and merged with its overlapping range in the next level.

2.2 LinkBench

LinkBench is an open-source database benchmark that is based on traces from production databases that store the "social graph" data at Facebook [6]. The social graph is composed of nodes and links (the edges in the graph). The operations used to access the database include point reads, range queries (e.g. the list of edges leading to a node), count queries, and simple create, delete, and update operations. Figure 3 depicts the operation distribution we used when running LinkBench, which reflects the same distribution as data center production workloads.

Figure 3: The distribution of requests in the LinkBench benchmark (GetLinkList 50.7%, GetNode 12.9%, AddLink 9.0%, UpdateLink 8.0%, UpdateNode 7.4%, CountLink 4.9%, DeleteLink 3.0%, AddNode 2.6%, DeleteNode 1.0%, GetLink 0.5%).

The LinkBench benchmark operates in two phases. The first phase populates the database by generating and loading a synthetic social graph. The graph is designed to have similar properties to the social graph, so that the number of edges from each node follows a power-law distribution. In the second phase, LinkBench runs the benchmark with a generated workload of database queries. It spawns many request threads (we use 20), which make concurrent requests to the database. Each thread generates a series of database operations that mimics the production workload. For example, the access patterns create a similar pattern of hot (frequently accessed) and cold nodes in the graph. The nodes and edges have a log-normal payload size distribution with a median of 128 bytes for the nodes and 8 bytes for the edges. We utilize LinkBench to drive our design section, then evaluate MyNVM using real production traffic in the evaluation section.

2.3 NVM

NVM is a new persistent storage technology with the potential of replacing DRAM both in data center and consumer use cases. NVM can be used in two form factors: the DIMM form factor, which is byte-addressable, or alternatively as a block device. Unlike much of the previous work on NVM that focused on the DIMM use case [25, 27, 28], in this work, we use NVM solely as a block device. For our use case, we preferred the block-device form factor, given its lower cost.

To better understand the performance characteristics of NVM, we ran Fio 2.19 [2], a widely used I/O workload generator, on an NVM device. We ran the Fio workloads with 4 concurrent jobs with different queue depths (the queue depths represent the number of outstanding I/O requests to the device), using the Libaio I/O engine. We compared the latency and bandwidth of an NVM device with a capacity of 375 GB, to a 3D Triple-Level Cell (TLC) flash device with a capacity of 1.5 TB. TLC devices are currently the prevailing flash technology used in data centers, because they can store 3 bits per cell, and are more efficient than devices of lower density such as Multi-Level Cell (MLC) flash devices (which store only 2 bits per cell).

2.3.1 Latency. Figure 4 presents the mean and P99 read latencies under a 100% reads workload, and under a workload of 70% reads and 30% writes with random accesses, while Figure 5 presents the results for write latencies. For a workload that solely consists of 4 KB read operations, NVM is up to 10× faster than flash for low queue depths. In the more realistic workload of mixed reads and writes, flash suffers from read/write interference that induces a high performance penalty. This penalty is negligible for NVM, which makes NVM's speedup over flash even more significant.

Figure 4: Mean and P99 read latencies for a 100% reads workload, and for a workload with 70% reads and 30% writes under variable queue depths. All figures use 4 KB blocks.

Figure 7(a) and Figure 7(b) show the mean and P99 latencies, respectively, of reading 4 KB blocks compared to reading larger blocks. Using read block sizes that are larger than 4 KB is not beneficial, as the mean and P99 latencies increase almost proportionally.

2.3.2 Bandwidth. Figure 6(a) shows the bandwidths of NVM and flash for a workload of 100% reads. Since the capacity of NVM devices is usually lower than flash, the bandwidth per capacity is much higher than flash. For a workload of mixed reads and writes, the total bandwidth of both flash and NVM is affected, yet flash deteriorates more significantly, as shown in Figure 6(b).

To measure the maximum NVM bandwidth for different read and write ratios, we ran Fio with a constant queue depth of 32 under different ratios. Figure 6(c) depicts the read bandwidth, write bandwidth, and the combined bandwidth for various ratios. The results show that when the relative number of writes increases, the combined bandwidth decreases from 2.2 GB/s (reads only) to 1.8 GB/s for the case of 50% reads and 50% writes. This

Figure 5: Mean and P99 write latencies for a 100% writes workload, and for a workload with 70% reads and 30% writes under variable queue depths. All figures use 4 KB blocks.

Figure 6: (a) Bandwidth for a 100% read workload under different queue depths, (b) total bandwidth for a workload with 70% reads and 30% writes under different queue depths, (c) read, write, and total bandwidth for different ratios of read and write operations, (d) total bandwidth for 4 KB read blocks with different write block sizes under variable ratios of read and write operations.
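Measurements like those in Figures 4-6 can be approximated with an Fio job file along the following lines. This is a sketch, not the authors' actual configuration: the text only specifies the engine, job count, block size, and queue depths, so the device path, runtime, and remaining options are assumptions to adapt to the hardware under test.

```
; Sketch of an Fio job for the mixed 70% read / 30% write workload.
[global]
ioengine=libaio   ; asynchronous I/O engine used in the experiments
direct=1          ; bypass the page cache to measure the device itself
bs=4k             ; 4 KB blocks, as in Figures 4 and 5
iodepth=32        ; outstanding I/Os per job (vary 1-32 to sweep queue depth)
numjobs=4         ; 4 concurrent jobs, as in the text
runtime=60
time_based=1

[mixed-rw]
rw=randrw         ; random mixed reads and writes
rwmixread=70      ; 70% of operations are reads
filename=/dev/nvme0n1   ; assumed device path
```

Sweeping `iodepth` and `rwmixread` reproduces the queue-depth and read-ratio axes of the figures.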

Figure 7: Mean and P99 read latency for different block sizes under variable queue depths.

decrease can be mitigated by using larger write blocks, as shown in Figure 6(d).

2.3.3 Durability. Similar to flash, NVM endurance deteriorates as a function of the number of writes to the device. In general, it can tolerate about 6× more writes than a typical TLC flash device [3, 4]. Flash suffers from write amplification because it must be erased before it can be rewritten, and its erase operation has a much coarser granularity than its writes. Unlike flash, NVM does not suffer from write amplification, and therefore it does not deteriorate faster if it is written to randomly or in small chunks. However, since it is faster than flash, and in the case of our system it is used as a replacement for DRAM, it is likely to get written to more than flash.
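To make the endurance constraint concrete, the DWPD rating in Table 1 can be converted into a sustained write-bandwidth budget. This back-of-the-envelope arithmetic is ours, not a figure from the text:

```python
# Convert a DWPD (Device Writes Per Day) rating into the average
# write bandwidth that exactly exhausts it, for the 375 GB NVM
# device of Table 1 rated at 30 DWPD.
CAPACITY_GB = 375
DWPD = 30
SECONDS_PER_DAY = 24 * 60 * 60

# Total bytes that may be written per day without exceeding the rating.
bytes_per_day = DWPD * CAPACITY_GB * 1e9

# Sustained write bandwidth that consumes the whole daily budget.
budget_mb_per_s = bytes_per_day / SECONDS_PER_DAY / 1e6
print(round(budget_mb_per_s))  # 130 (MB/s, device-wide)
```

Any cache layered on this device must therefore keep its long-term write rate near this order of magnitude; Figure 18 later plots write bandwidth against an endurance limit derived from the same DWPD specification.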

3 INTUITION: THE CHALLENGE OF REPLACING DRAM WITH NVM

NVM is less expensive than DRAM on a per-byte basis, and is an order of magnitude faster than flash, which makes it attractive as a second-level storage tier. However, there are several attributes of NVM that make it challenging to use as a drop-in replacement for DRAM, namely its higher latency, lower bandwidth, and limited endurance.

As we will show below, even though NVM has higher latency than DRAM, it can compensate for its higher latency if it is provisioned at a higher capacity (e.g., in our case, we use 140 GB of NVM as a replacement for 80 GB of DRAM). By that, it offers a higher cache hit rate, which results in a similar overall end-to-end latency for the entire system.

However, NVM's bandwidth is a bottleneck on the system's end-to-end performance. As we showed in §2.3, NVM's peak read bandwidth is about 2.2 GB/s. That is about 35× lower than the DRAM read bandwidth of modern servers. In addition, we use NVM as a block device. Hence, when it is used as a substitute for DRAM, the system needs to account for the fact that data can only be read from NVM at a granularity of 4 KB pages. As a result, even a small read request from the NVM causes a 4 KB page to be read, and if the system naïvely writes data to the device, a small amount of data might be stored on two different pages. Both of these factors further amplify the read bandwidth consumption of the system. Consequently, due to all of these factors, we found NVM's limited read bandwidth to be the most important inhibiting factor in the adoption of NVM for key-value stores.

In addition, unlike DRAM, NVM suffers from an endurance limitation. If its cells are written to more than a certain number of times, they wear out, and the device may start to experience a higher probability of uncorrectable bit errors, which shortens its lifetime. This problem is exacerbated when using NVM as a cache. In cache environments, cold objects are frequently evicted and replaced by newly written objects. As we will demonstrate, this results in the cache workload far exceeding the allowed number of writes, and rapidly wearing out the device. Therefore, NVM cannot simply be used as a drop-in replacement for a DRAM cache, and the number of writes to it needs to be limited by the system.

Finally, since NVM has a very low latency relative to other block devices, the operating system interrupt overhead becomes significant when NVM is used as a block device. As shown in §2.3, the average end-to-end read latency of NVM is on the order of 10 µs when used at low queue depths. Included in this number is the overhead of two context switches due to the operating system's interrupt handling, which are on the order of 2 µs. These result in roughly 20% extra overhead under low queue depths.

We present MyNVM, a system that addresses these challenges by adapting the architecture of the key-value store and its cache to the unique constraints of NVM. The same principles we used in the design of MyNVM can be utilized when building other data center storage systems that today rely solely on DRAM as a cache.

4 DESIGN

The architecture of MyNVM is depicted in Figure 8. MyNVM is built on top of MyRocks. Similar to MyRocks, MyNVM utilizes DRAM as a write-through cache for frequently accessed blocks, and flash as the medium that persists the entire database. MyNVM uses NVM as a second-level write-through cache, which has an order of magnitude greater capacity than DRAM. Both DRAM and NVM by default use least recently used (LRU) as their eviction policy.

Figure 8: MyNVM's architecture. NVM is used as a second level block cache.

The design goal for MyNVM is to significantly reduce the amount of DRAM compared to a typical MyRocks server, while maintaining comparable latency (mean and P99), as well as QPS, by leveraging NVM. In this section, we describe the challenges in replacing DRAM with NVM, including bandwidth consumption, increased database size, NVM endurance, and interrupt latency, as well as our implemented solutions in MyNVM.

A MyRocks (or MySQL) setup consists of instances. Backing up and migrating data between hosts is done at the granularity of an instance. When the instances are too large, tasks such as load balancing, and backup creation and restoration, become less efficient. Thus, MyRocks servers are often configured to run multiple smaller MyRocks instances, rather than using a single large instance. Unfortunately, LinkBench currently supports loading only a single MyRocks instance. To capture the bottlenecks and requirements of incorporating NVM into a typical MyRocks server, we run our LinkBench experiments with a single instance and extrapolate the results to 8 instances.

4.1 Satisfying NVM's Read Bandwidth

The first obstacle to replacing DRAM with NVM is NVM's limited read bandwidth. Figure 9 depicts the bandwidth consumed by MyNVM using 16 KB blocks, which is the default setting, running the LinkBench [6] workload. As the figure shows, the NVM device only provides 2.2 GB/s of maximum read bandwidth, while MyNVM requires more than double that bandwidth.

The main reason that MyNVM consumes so much read bandwidth is that in order to read an object, it has to fetch the entire data block that contains that object. Read sizes in the LinkBench workload are on the order of tens to hundreds of bytes. Hence, every time MyNVM needs to read an object of 100 B, it would need to read about 160× more data from the NVM, because data blocks are 16 KB.
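The read amplification above follows directly from the block granularity. A small helper makes the arithmetic explicit (an illustration only; the function is ours, with the object and block sizes taken from the text):

```python
def read_amplification(object_bytes: int, block_bytes: int) -> float:
    """Bytes transferred from the device per byte actually requested,
    when an entire data block must be fetched to serve one object."""
    return block_bytes / object_bytes

# A ~100 B LinkBench object served from a 16 KB data block:
print(round(read_amplification(100, 16 * 1024)))  # ~160x, as in the text

# The same object served from a 4 KB block moves 4x less data per read,
# but only if the larger index footprint does not erase the gain.
print(round(read_amplification(100, 4 * 1024)))
```

This is why the design first shrinks the blocks (§4.1.1) and then has to contain the index growth that shrinking causes.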

Figure 9: Read bandwidth consumed by MyNVM using 16 KB blocks, 4 KB blocks, and 4 KB blocks with a partitioned index, running the LinkBench [6] workload.

Figure 10: Total index size for different block sizes in MyNVM.

Figure 11: DRAM hit rate of different block types when using 16 KB blocks, 4 KB blocks, and 4 KB blocks with a partitioned index.

Figure 12: Non-partitioned and partitioned index structures. The partitioned scheme uses a top-level index, which points to lower-level indices.

4.1.1 Reducing Block Size. A straightforward solution, therefore, might be to simply reduce the data block size, so that each read consumes less bandwidth. However, instead of decreasing the bandwidth consumption, when we reduce the block size from 16 KB to 4 KB, the bandwidth consumption more than doubles (see Figure 9).

The reason the bandwidth consumption increases is that as the block size decreases, the DRAM footprint of the index blocks becomes significant. Figure 10 depicts the size of the index with different block sizes in MyNVM. Moving from 16 KB blocks to 4 KB quadruples the size of the index. As shown by Figure 11, this results in a lower DRAM hit rate, since the index takes up more space.

The index blocks contain one entry per data block and service an entire SST file, which accommodates tens of thousands of data blocks, some of which might be relatively cold. Therefore, to address this problem, we partition the index into smaller blocks, and add an additional top-level index where each entry points to a partitioned index block. Thus, instead of having a single index block per SST, the index is stored in multiple smaller blocks. Figure 12 illustrates the partitioned index.

When reading an index block from disk, only the top-level index is loaded to memory. Based on the lookup, only the relevant index partitions are read and cached in DRAM. Hence, even with 4 KB blocks, partitioning increases the hit rate to a similar level as 16 KB blocks (see Figure 11), and reduces overall read bandwidth consumption to less than 2 GB/s, which is within the limit of the NVM device. However, even 2 GB/s of bandwidth consumption is almost at the full capacity of the device. Consequently, we need to find another way to reduce bandwidth consumption without affecting performance.

4.1.2 Aligning Blocks with Physical Pages. In RocksDB, data blocks stored in levels L1 and higher are compressed by default. Since we reduced the block size in MyNVM, a 4 KB block on average would consume less than 4 KB when it is compressed and written to a physical page on NVM. The remainder of the page would be used for other blocks. Consequently, some compressed blocks are spread over two pages, and require double the read bandwidth to be read.

Thus, instead of using 4 KB blocks, MyNVM uses 6 KB blocks, which are compressed on average to about 4 KB pages when stored on flash or NVM in levels L1 and above. Since the block size is higher, this also reduces the bandwidth further, because the index size is reduced (Figure 10).

However, even with 6 KB blocks, some blocks may be compressed to less than 4 KB (e.g., as low as 1 KB), and some may not be compressed at all, and would be spread over two physical 4 KB pages. Figure 13 shows the percentage of logical 6 KB blocks that are spread over two pages.

To address this problem, when trying to pack multiple compressed blocks into a single page, we zero-pad the end of the page if the next compressed block cannot fully fit into the same page. Figure 14 depicts MyNVM's block alignment. Figure 13 shows that zero-padding the blocks reduces the number of blocks that are spread over two pages by about 5×. Note that even with zero-padding, some logical blocks are spread over two pages, because they cannot be compressed down to 4 KB.

Figure 13: The percentage of blocks that are spread over two physical 4 KB pages or more. "Unaligned blocks" is the case when we do not use zero-padding at the end of the physical page, while "aligned blocks" is with zero-padding.

Figure 14: An illustration of blocks that are unaligned with the NVM pages, and of blocks that are aligned with the NVM pages.

Figure 15 demonstrates the effect of aligning the blocks on read and write bandwidth. It shows that aligning the blocks reduces the read bandwidth by about 30% to 1.25 GB/s, because on average it requires reading fewer pages. Aligning blocks very slightly increases write bandwidth consumption, because it requires writing more pages for some blocks (due to zero-padding). In addition, the effective capacity of the NVM cache is a little smaller, because zeros occupy some of the space. This reduces the NVM cache hit rate by 1%-2%, which is compensated by reading fewer pages in the case of aligned blocks. For the same reason, block alignment also reduces the P99 latency. Figure 16 shows that the P99 latency with the LinkBench workload is improved with block alignment.

Figure 15: Read and write bandwidth for unaligned and aligned blocks.

Figure 16: P99 latencies with unaligned and aligned blocks.

4.2 Database Size

A negative side-effect of using smaller block sizes is that compression becomes less efficient, because compression is applied on each data block individually. For each compressed block, RocksDB starts from an empty dictionary, and constructs the dictionary during a single pass over the data. Therefore, the space consumed by the database on the flash increases.

Figure 17 depicts the increase in database size. Logical blocks of 4 KB increase the database size by almost 20%. Using logical blocks of 6 KB still has an overhead of more than 11%.

Figure 17: Database size increase when using 4 KB and 6 KB logical blocks, compared with the database size with 16 KB blocks.

To address this overhead, MyNVM preloads the dictionary with data that is uniformly sampled from multiple blocks. This increases the compression ratio when the same patterns are repeated across

Figure 18: Write bandwidth to NVM for different simulated cache sizes, per instance. Disabled means the NVM is used without an admission policy. The endurance limit is derived from the Device Writes Per Day specification.

Figure 19: Hit rates for different simulated cache sizes per instance. Disabled means the NVM is used without an admission policy.
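The admission policy evaluated in Figures 18 and 19 — admitting a block into NVM only if its key already appears in a key-only "simulated cache" LRU — can be sketched as follows. This is a simplified illustration; the class and method names are ours, not MyNVM's actual code:

```python
from collections import OrderedDict

class SimulatedLRUAdmission:
    """Key-only LRU sized to mirror the NVM cache's capacity, without
    storing any block contents. A block is written to NVM only if it
    was accessed recently enough to still appear here, which filters
    out blocks that would likely be evicted quickly."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.keys = OrderedDict()  # block key -> None, in LRU order

    def should_admit(self, block_key: str) -> bool:
        """Record an access; return True if the block should be
        cached in NVM (i.e., its key was already seen recently)."""
        seen = block_key in self.keys
        if seen:
            self.keys.move_to_end(block_key)   # refresh recency
        else:
            self.keys[block_key] = None        # remember this access
            if len(self.keys) > self.capacity:
                self.keys.popitem(last=False)  # forget the coldest key
        return seen

# First access only records the key; a re-access admits the block.
ac = SimulatedLRUAdmission(capacity_blocks=2)
print(ac.should_admit("blk-7"))  # False: first touch, record only
print(ac.should_admit("blk-7"))  # True: recently accessed, admit
```

The capacity of this structure is the "simulated cache size" on the x-axis of Figures 18 and 19: a larger simulated cache admits more blocks, raising the hit rate but also the write bandwidth to the device.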

multiple blocks. Figure 17 shows that with dictionary compression, the size overhead decreases to only 1.5%.

4.3 NVM's Durability Constraint
While NVM improves on flash durability, it is still constrained by the number of writes it can tolerate before it experiences uncorrectable bit errors.

Figure 18 depicts the write bandwidth to NVM under the LinkBench workload. It shows that if NVM is simply used as a second-level cache, its write bandwidth significantly exceeds the endurance limit of the device (i.e., its DWPD limit), which would cause it to wear out rapidly.

Therefore, instead of naïvely using NVM as a second-level cache, we design an admission control policy that stores in NVM only blocks that are not likely to be quickly evicted. To do so, we maintain an LRU list in DRAM that represents the size of the NVM and stores only the keys of the blocks that are being accessed. This simulates a cache without actually caching the values of the blocks. When a block is loaded from flash, we cache it in NVM only if it has been recently accessed, and therefore already appears in the simulated cache LRU.

Figure 19 shows how the hit rate increases as a function of the size of the simulated cache. As the simulated cache grows, MyNVM can more accurately predict which objects are worth caching in the NVM layer. Beyond a size of 30-40 GB, increasing the size of the simulated cache offers diminishing returns.

Figure 18 also shows the write bandwidth after we apply admission control to NVM, as a function of the size of the simulated cache. A large simulated cache will admit more objects to NVM, which increases the hit rate, but also incurs a higher write bandwidth. In the case of MyNVM, a simulated cache that represents 40 GB of NVM data per instance achieves a sufficiently low write bandwidth to accommodate the endurance limitation of the NVM device.

4.4 Interrupt Latency
Due to NVM's low latency as a block device, the overhead of interrupt handling becomes significant. In order to serve a block I/O request, the operating system needs to submit the request asynchronously. Once that is done, the application sleeps until the request has been completed on the hardware side. This will either trigger a context switch, if the CPU scheduler has other tasks to run on that CPU, or a switch to a lower power state if the CPU goes idle. Both of those outcomes have a cost; in particular, the enter and exit latencies of the lower power states are non-trivial. Once the device completes the request, it raises an interrupt which wakes up the task that submitted the I/O. Figure 20(a) illustrates this process.

Replacing interrupts with polling is a way to reduce this overhead. With polling, the operating system continuously checks whether the device has completed the request. As soon as the device is ready, the operating system is able to access it, either avoiding a context switch or the expensive power-state enter/exit latencies. This process is described in Figure 20(b). By reducing the overall latency, the effective bandwidth of the NVM device increases, and thus also its overall QPS.

To explore the potential of polling, we ran Fio with the pvsync2 I/O engine, which provides polling support. Figure 21(a) shows the increase in available NVM bandwidth when using polling compared with interrupts. However, continuous polling brings the CPU consumption of an entire core to 100%, as shown in Figure 21(b). To overcome this, we utilize a hybrid polling policy, where the process sleeps for a fixed time threshold after the I/O is issued before it starts polling, as illustrated in Figure 20(c). Figure 21(a) and Figure 21(b) present the bandwidth and CPU consumption of the hybrid polling policy with a threshold of 5 µs.

While this policy significantly reduces CPU consumption, the CPU consumption still increases with the number of threads because the polling threshold is static. To overcome this, we propose

a dynamic hybrid polling technique. Instead of using a preconfigured time threshold, the kernel collects the mean latencies of past I/Os and triggers polling based on them. Hence, polling begins only after a configurable fraction of the past mean I/O latency has elapsed. By default, the kernel sleeps for 50% of the mean completion time for an I/O of the given size; that is, it starts polling only after a duration that is half the mean latency of past requests. We added a patch that makes this parameter configurable, allowing us to experiment with settings other than 50% of the mean completion time. Figure 21(c) and Figure 21(d) depict the bandwidth and CPU consumption for different latency percentage thresholds.

While polling can provide a significant boost to the NVM bandwidth as long as the number of threads is small, with 8 or more threads the benefits diminish. Further experimentation is needed, but we hypothesize that this is largely due to several effects. First, as more threads access the device, the individual I/O latencies start to increase, diminishing the benefits of polling. Second, as more threads run in the system, there is a higher chance that a CPU will have pending work outside of the thread submitting I/O; this aborts the polling done by the kernel, as it prefers to schedule other work instead of polling if other work is available. Third, as more threads poll for I/O completion, the polling overhead itself becomes larger. One way to fix this in the kernel would be to offload polling to dedicated kernel threads, instead of letting the application itself poll the NVMe completion queue. Application-directed polling yields lower latencies for synchronous (or low queue depth) I/O, where offloading work to another thread would add latency; however, for higher queue depth and pipelined I/O, dedicated kernel polling threads could be beneficial. Lastly, the Linux kernel utilizes all NVMe submission/completion queue pairs for polling. As the number of polling threads increases, so does the number of queues that need to be polled. A related optimization would be to create dedicated pollable submission queues, reducing the number of queues that need to be checked for completions, which in turn could reduce the number of threads needed to reap completions by polling.

An improved polling mechanism in the kernel could remove many of these limitations. Until one is available, we decided not to integrate polling in our production implementation of MyNVM, but plan to incorporate it in future work.

Figure 20: Diagrams of the I/O latencies as perceived by the application, using (a) interrupts, (b) polling, and (c) hybrid polling.

4.5 Applicability to Fast Flash Devices
Similar to NVM devices, fast flash devices promise lower latencies and better endurance compared with typical flash devices, while still suffering from challenging throughput and endurance limitations. Some of the solutions presented in MyNVM may also be applied to fast flash devices. However, since these devices suffer from flash side effects that do not exist in NVM, such as write amplification and read/write interference, they would require additional design considerations.

5 IMPLEMENTATION
MyNVM is implemented on top of the open-source MyRocks and RocksDB. The initial implementation has been committed to the repository, and we plan to add the rest of it.

MyNVM uses DRAM as a first-level write-through block cache. Since the vast majority of queries hit in DRAM, blocks are always cached in DRAM uncompressed; otherwise, compressing and decompressing the requests on each access would add a significant CPU overhead. To enable this, reads from and writes to the NVM and flash devices are done using direct I/O, bypassing the operating system page cache. Avoiding the page cache also prevents redundant copy operations between the devices and DRAM (where the page cache acts as a proxy).

NVM serves as a second-level write-through block cache. Due to its read bandwidth and endurance limits, blocks in the NVM are compressed, similar to the blocks stored in flash.

During lookup, MyNVM first accesses the DRAM, and only in case of a miss does it look for the object in NVM. In case of a miss in the NVM, MyNVM will not always load the missed key to NVM, due to the NVM endurance limit. MyNVM's admission control policy decides which keys are loaded to NVM by simulating a larger cache. The simulated cache is LRU-based and is updated on each query, regardless of whether the requested block resides in the block caches.

To avoid frequent writes and updates to NVM, the NVM lookup table is maintained in DRAM. In order to keep the table lightweight, it does not store entire keys, but instead uses a hashed signature of up to 4 bytes per block. Due to the rare cases of aliasing (where two keys hash to the same signature), a block is always validated after it is read from the NVM. If its key differs from the desired key, it is considered a miss and the block is read from flash.

Writes to NVM are done in batches of at least 128 KB, because we found that smaller write sizes degrade the achieved read bandwidth (see §2.3). Cached blocks are appended into larger NVM files with a configurable size.
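The simulated-cache admission policy described in §4.3 and referenced above can be sketched with a key-only LRU. This is a minimal illustration in Python (an `OrderedDict` stands in for the LRU list; the class and function names are ours, not MyNVM's actual code):

```python
from collections import OrderedDict

# Sketch of the admission policy: a DRAM-resident, key-only LRU that
# represents the NVM's size. A block is admitted to NVM only if its key
# was already seen recently in this simulated cache.
class SimulatedLRU:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.keys = OrderedDict()  # keys only; block values are not cached

    def touch(self, key) -> bool:
        """Record an access; return True if the key was already present."""
        present = key in self.keys
        self.keys[key] = None
        self.keys.move_to_end(key)
        if len(self.keys) > self.capacity:
            self.keys.popitem(last=False)  # evict the least recently used key
        return present

def admit_to_nvm(sim: SimulatedLRU, key) -> bool:
    # First-time keys are served from flash but not cached in NVM;
    # recently accessed keys are admitted.
    return sim.touch(key)

sim = SimulatedLRU(capacity=2)
print(admit_to_nvm(sim, "a"))  # False: first access, not admitted
print(admit_to_nvm(sim, "a"))  # True: recently accessed, admit to NVM
```

Because only keys are stored, the simulated cache can represent a much larger capacity than DRAM could hold as actual blocks, which is what lets it throttle NVM writes without hurting the hit rate much.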

Figure 21: Left to right: (a) the increase in NVM read bandwidth when using polling compared to interrupts, (b) the increase in CPU consumption when using polling and hybrid polling compared to interrupts, (c) the increase in NVM read bandwidth with dynamic hybrid polling, and (d) the CPU consumption of dynamic hybrid polling with different thresholds.

The eviction policy for NVM is maintained at the granularity of these files, which significantly reduces the in-memory overhead. The files are kept in an eviction queue, and once the NVM capacity limit is reached, the least recently used file is evicted to make room for a new one.

As we showed in §4.1, MyNVM uses a partitioned index and 6 KB blocks to reduce read bandwidth. Thus, instead of having a single block index for each SST file, the index is partitioned into multiple smaller blocks with a configurable size. We partitioned the index into 4 KB blocks, so they align with the memory page size as well as the NVM page size (index blocks are never compressed). In addition, the partitioned index includes a new top-level index block, which serves as an index for the partitioned index blocks. Each of the index blocks (including the top-level index block) is sorted. To find whether a key exists in the SST file, MyNVM first performs a binary search in the top-level index block. Once the relevant partition is found, it performs a binary search within the partition to find the key and its corresponding data block.

In addition, we implemented block alignment, so that single blocks are stored over the minimal number of physical NVM pages possible (see §4.1.2). This number is calculated by the following formula:

⌊(Compressed Block Size − 1) / NVM Page Size⌋ + 1

Before a block is written to the NVM, its expected starting address and end address in NVM are compared to predict the number of NVM pages it would occupy. If this number is larger than the minimal number of required NVM pages (e.g., 1 page for a 4 KB block), the previous block is padded with zeroes until the end of the page, so that the new block is written at the beginning of the next page.

MyNVM uses the Zstandard compression algorithm [12] with dictionary compression. When a new SST file is written to flash (during LSM compactions), MyNVM samples the SST file with a uniform distribution (each sample is 64 bytes) to create a joint dictionary of up to 16 KB (we found that larger dictionaries did not improve the compression ratios). Because keys are sorted inside the blocks, only the delta between consecutive keys is kept in the block. Thus, rather than sampling from the keys and values of the block, the sampling is done across the raw blocks. The dictionary is stored in a dedicated block in the SST file, and is used for the compression and decompression of all the SST data blocks.

In order to incorporate dynamic hybrid polling in MyNVM (see §4.4), we used the synchronous pwritev2 and preadv2 system calls for writing to and reading from the NVM. These two system calls are a recent addition to the Linux kernel. Their interface is similar to pwritev and preadv, but they take an additional flags argument. One of the supported flags is RWF_HIPRI, which allows an application to inform the kernel that the I/O being submitted is high priority. If this flag is used in conjunction with a file descriptor opened for O_DIRECT (unbuffered) I/O, the kernel will initiate polling after submitting the request, instead of putting the task to sleep. Device-level settings define what kind of polling is used: 100% CPU busy polling, fixed-delay hybrid polling, or dynamic hybrid polling. However, because of the inefficiencies experienced with more than 8 threads, we did not include polling in the final system.

6 EVALUATION
This section presents a performance analysis of MyNVM in a production environment. We evaluate MyNVM's mean latency, P99 latency, QPS, and CPU consumption, and compare them with MyRocks. We use dual-socket Xeon E5 processor servers with 3 different configurations: MyRocks with a large DRAM cache (96 GB), MyRocks with a small DRAM cache (16 GB), and MyNVM with a 16 GB DRAM cache and 140 GB of NVM. For most of this section (except for the QPS evaluation), we ran a shadow workload by capturing production MyRocks workloads of the user database (UDB) tier and replaying them in real time on these machines.

6.1 Mean and P99 Latency
Figure 1 shows that with MyRocks, reducing the DRAM cache from 96 GB to 16 GB more than doubles both the mean latency and the P99 latency.

With a mean latency of 443 µs and a P99 latency of 3439 µs, MyNVM's mean and P99 latencies are both 45% lower than those of the MyRocks server with a small amount of DRAM. The mean latency of MyNVM is 10% higher, and its P99 latency is 20% higher, than the MyRocks configuration with a 96 GB DRAM cache.

Figure 22 and Figure 23 provide the mean latency and P99 latency over time, at intervals of 5M queries, for a period of 24 hours.
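The page-occupancy formula and the padding rule given in §5 can be illustrated with a short sketch (the 4 KB page size and the function names are ours, for illustration only):

```python
NVM_PAGE_SIZE = 4096  # illustrative NVM page size

def pages_occupied(compressed_block_size: int) -> int:
    # floor((size - 1) / page_size) + 1, i.e. ceiling division:
    # the minimal number of physical NVM pages a block can occupy
    return (compressed_block_size - 1) // NVM_PAGE_SIZE + 1

def needs_padding(start_addr: int, block_size: int) -> bool:
    """True if writing at start_addr would spread the block over more
    pages than the minimum, in which case the previous block is padded
    so this block starts on a fresh page boundary."""
    end_addr = start_addr + block_size - 1
    spanned = end_addr // NVM_PAGE_SIZE - start_addr // NVM_PAGE_SIZE + 1
    return spanned > pages_occupied(block_size)

# A 3 KB compressed block needs only 1 page, but written at offset
# 3,000 it would straddle 2 pages, so padding is required.
print(pages_occupied(3072), needs_padding(3000, 3072))  # 1 True
```

Aligning blocks this way trades a small amount of wasted capacity (the zero padding) for fewer NVM page reads per block, which matters given the device's limited read bandwidth.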

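The preadv2 path with RWF_HIPRI described in §5 can also be exercised from user space. A minimal sketch using Python's `os.preadv` (available on Linux with Python ≥ 3.7; note that kernel-side polling is only actually triggered for O_DIRECT I/O on a device with polling enabled, so on a regular file the flag is merely a hint):

```python
import os
import tempfile

# Sketch of a high-priority read via preadv2's RWF_HIPRI flag.
# On kernels or filesystems that reject the flag, fall back to a
# plain positional read; this is illustrative, not MyNVM's code.
def hipri_read(fd: int, offset: int, length: int) -> bytes:
    buf = bytearray(length)
    flags = getattr(os, "RWF_HIPRI", 0)  # Linux >= 4.6, Python >= 3.7
    try:
        n = os.preadv(fd, [buf], offset, flags)
    except OSError:
        n = os.preadv(fd, [buf], offset)  # retry without the flag
    return bytes(buf[:n])

with tempfile.NamedTemporaryFile() as f:
    f.write(b"example block data")
    f.flush()
    fd = os.open(f.name, os.O_RDONLY)
    try:
        print(hipri_read(fd, 8, 5))  # b'block'
    finally:
        os.close(fd)
```

In a real deployment the file descriptor would be opened with O_DIRECT and the buffer aligned to the device's block size; those details are omitted here for brevity.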
Figure 22: Mean latencies of MyRocks with 96 GB of DRAM cache, compared to MyRocks with 16 GB of DRAM cache, and MyNVM with 16 GB of DRAM cache and 140 GB of NVM, over a time period of 24 hours.

Figure 23: P99 latencies of MyRocks with 96 GB of DRAM cache, compared to MyRocks with 16 GB of DRAM cache, and MyNVM with 16 GB of DRAM cache and 140 GB of NVM, over a time period of 24 hours.

Figure 24: NVM hit rate and QPS in MyNVM as a function of NVM size.

Figure 25: Queries per second (QPS) for different cache sizes in MyRocks, compared with MyNVM.

6.2 QPS
To evaluate MyNVM's QPS, we could not use the shadow workload, because it does not drive the maximum amount of traffic MyNVM can handle. Therefore, we use the LinkBench workload.

Figure 24 presents the NVM hit rate and MyNVM's QPS as a function of the size of the NVM cache. The graph shows that beyond 140 GB of NVM, the hit rate sees diminishing returns, and consequently so does MyNVM's QPS. This led us to choose 140 GB of NVM as the default configuration for MyNVM.

Figure 25 compares the QPS of MyRocks with different cache sizes and MyNVM. The QPS of MyNVM is 50% higher than that of the MyRocks server with a small amount of DRAM, and only 8% lower than that of the server with the large DRAM cache.

6.3 CPU Consumption
Figure 26 depicts the CPU consumption of MyRocks with a large amount of DRAM cache (96 GB), MyRocks with a small amount of DRAM cache (16 GB), and MyNVM, over a time period of 24 hours (with 15 minute intervals), running the shadow production traffic. The differences in CPU consumption trends over time are due to traffic fluctuations during the day.

On average, reducing the DRAM cache in MyRocks from 96 GB to 16 GB increases the CPU consumption from 18.4% to 28.6%. The first reason for this difference is that with less DRAM, fewer requests hit in the uncompressed DRAM cache and are instead loaded from flash; thus, more blocks have to be decompressed. In addition, with the smaller DRAM configuration the CPU spends more time waiting for outstanding requests to the flash. Figure 27 depicts the I/O-Wait percentage over 24 hours (at 15 minute intervals), which represents the portion of time in which the CPU is waiting for storage I/O to complete. On average, it increases from 3.7% to 7.8% when switching to MyRocks with the smaller amount of DRAM.

MyNVM, on the other hand, has an average I/O-Wait of only 2.7%. Because its overall cache size is larger, it triggers fewer accesses to flash compared with MyRocks (in both the low and high DRAM configurations), and NVM accesses are faster. In addition, MyNVM uses smaller blocks of 6 KB, which take less time to compress or decompress than 16 KB blocks. Consequently, MyNVM's CPU consumption is lower than that of the low-DRAM MyRocks, with an average of 20%. It still has higher CPU consumption than the high-DRAM MyRocks, because fewer requests hit in the uncompressed DRAM and are instead loaded from the NVM cache, which is compressed.

Figure 26: CPU consumption over 24 hours of MyRocks with 96 GB of DRAM cache, compared to MyRocks with 16 GB of DRAM cache, and MyNVM with 16 GB of DRAM cache and 140 GB of NVM.

Figure 27: I/O-Wait percentage over 24 hours of MyRocks with 96 GB of DRAM cache, compared to MyRocks with 16 GB of DRAM cache, and MyNVM with 16 GB of DRAM cache and 140 GB of NVM.

7 RELATED WORK
The related work belongs to two main categories: prior work on NVM in the context of key-value stores, and systems that influenced the design of MyNVM.

7.1 NVM for Key-value Stores
There are several previous projects that adapt persistent key-value stores to NVM. However, all prior work has relied on NVM emulation and was not implemented on real NVM hardware. Therefore, prior work does not address practical concerns of using NVM as a replacement for DRAM, such as dealing with NVM bandwidth limitations. In addition, prior work assumes byte-addressable NVM, while our work is focused on block-level NVM due to its cost. To the best of our knowledge, this is the first implementation of a key-value store on NVM that deals with its practical performance implications.

CDDS [27], FPTree [25], and HiKV [28] introduce B-Trees across DRAM and NVM, which minimize writes to NVM. Echo [7] implements key-value functionality using hash tables. All of these systems rely on emulations or simulations of NVM. In addition, unlike MyNVM, these systems do not rely on flash as an underlying storage layer, and are therefore much more expensive per bit.

Unlike these systems, NV-Tree [31] does not rely on emulation, but on a real evaluation. However, its evaluation is based on NVDIMM, which is not NVM, but rather a hybrid DRAM and flash system that persists data from DRAM to flash upon power failure.

There has been additional work on indices that reduce writes to byte-addressable NVM, to reduce its wear [11, 14, 18, 32]. Prior research has also explored differential logging [10, 20, 21], which logs only modified bytes to avoid excessive writes to NVM. Other systems utilize atomic writes for metadata, and write-ahead logging and copy-on-write for data [13, 15, 29].

7.2 Related Systems
Several aspects of our design were inspired by prior work on non-NVM key-value stores and polling.

Prior systems use admission control to manage access to a slower second-tier cache layer. Similar to how MyNVM controls accesses to NVM, Flashield [16] applies admission control to reduce the number of writes to flash and protect it from wearing out. In Flashield, objects that have not been read multiple times are only cached in DRAM, and do not get cached in flash. AdaptSize [8] applies admission control to decide which objects get written to a CDN cache based on their size and prior access patterns. In future work we plan to explore more sophisticated admission control policies for NVM, which incorporate features like the number of prior accesses and object size.

Polling has been used in prior work [26, 30] as a potential way to deliver higher performance for fast storage devices. However, there is no prior research on using polling in conjunction with NVM, and no work addressing the resulting high CPU consumption, which makes its adoption impractical for many use cases. We introduced a dynamic hybrid polling mechanism, which reduces the CPU consumption significantly, and evaluated it on top of an NVM device.

8 CONCLUSIONS
To the best of our knowledge, this is the first study exploring the usage of NVM in a production data center environment. We presented MyNVM, a system built on top of MyRocks, which utilizes NVM to significantly reduce the consumption of DRAM while maintaining comparable latencies and QPS. We introduced several novel solutions to the challenges of incorporating NVM, including using small block sizes with a partitioned index, aligning blocks with physical NVM pages, utilizing dictionary compression, applying admission control to the NVM, and reducing the interrupt latency overhead. Our results indicate that while it faces some unique challenges, such as bandwidth and endurance, NVM is a potentially lower-cost alternative to DRAM.

NVM can be utilized in many other data center use cases beyond the one described in this paper. For example, since key-value caches, such as memcached, are typically accessed over the network, their data can be stored on NVM rather than DRAM without incurring a large performance cost. Furthermore, since NVM is persistent, a node does not need to be warmed up in case it reboots. In addition, NVM can be deployed to augment DRAM in a variety of other databases.

9 ACKNOWLEDGMENTS
We thank Kumar Sundararajan, Yashar Bayani, David Brooks, Andrew Kryczka, Banit Agrawal, and Abhishek Dhanotia for their valuable assistance and suggestions. We also thank our shepherd, Dejan Kostic, and the anonymous reviewers for their thoughtful feedback.

REFERENCES
[1] DRAM prices continue to climb. https://epsnews.com/2017/08/18/dram-prices-continue-climb/.
[2] Flexible I/O tester. https://github.com/axboe/fio.
[3] Intel Optane DC P4800X specifications. https://www.intel.com/content/www/us/en/products/memory-storage/solid-state-drives/data-center-ssds/optane-dc-p4800x-series.html.
[4] Introducing the Samsung PM1725a NVMe SSD. http://www.samsung.com/semiconductor/insights/tech-leadership/brochure-samsung-pm1725a-nvme-ssd/.
[5] RocksDB wiki. github.com/facebook/rocksdb/wiki/.
[6] T. G. Armstrong, V. Ponnekanti, D. Borthakur, and M. Callaghan. LinkBench: A database benchmark based on the Facebook social graph. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD '13, pages 1185–1196, New York, NY, USA, 2013. ACM.
[7] K. A. Bailey, P. Hornyack, L. Ceze, S. D. Gribble, and H. M. Levy. Exploring storage class memory with key value stores. In Proceedings of the 1st Workshop on Interactions of NVM/FLASH with Operating Systems and Workloads, INFLOW '13, pages 4:1–4:8, New York, NY, USA, 2013. ACM.
[8] D. S. Berger, R. K. Sitaraman, and M. Harchol-Balter. AdaptSize: Orchestrating the hot object memory cache in a content delivery network. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pages 483–498, Boston, MA, 2017. USENIX Association.
[9] N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, S. Kulkarni, H. Li, M. Marchukov, D. Petrov, L. Puzar, Y. J. Song, and V. Venkataramani. TAO: Facebook's distributed data store for the social graph. In 2013 USENIX Annual Technical Conference (USENIX ATC 13), pages 49–60, San Jose, CA, 2013.
[10] J. Chen, Q. Wei, C. Chen, and L. Wu. FSMAC: A file system metadata accelerator with non-volatile memory. In Mass Storage Systems and Technologies (MSST), 2013 IEEE 29th Symposium on, pages 1–11. IEEE, 2013.
[11] S. Chen, P. B. Gibbons, and S. Nath. Rethinking database algorithms for phase change memory. In CIDR, pages 21–31. www.cidrdb.org, 2011.
[12] Y. Collet and C. Turner. Smaller and faster with Zstandard, 2016.
[13] J. Condit, E. B. Nightingale, C. Frost, E. Ipek, B. Lee, D. Burger, and D. Coetzee. Better I/O through byte-addressable, persistent memory. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP '09, pages 133–146, New York, NY, USA, 2009. ACM.
[14] B. Debnath, A. Haghdoost, A. Kadav, M. G. Khatib, and C. Ungureanu. Revisiting hash table design for phase change memory. In Proceedings of the 3rd Workshop on Interactions of NVM/FLASH with Operating Systems and Workloads, INFLOW '15, pages 1:1–1:9, New York, NY, USA, 2015. ACM.
[15] S. R. Dulloor, S. Kumar, A. Keshavamurthy, P. Lantz, D. Reddy, R. Sankaran, and J. Jackson. System software for persistent memory. In Proceedings of the Ninth European Conference on Computer Systems, EuroSys '14, pages 15:1–15:15, New York, NY, USA, 2014. ACM.
[16] A. Eisenman, A. Cidon, E. Pergament, O. Haimovich, R. Stutsman, M. Alizadeh, and S. Katti. Flashield: a key-value cache that minimizes writes to flash. CoRR, abs/1702.02588, 2017.
[17] DRAMeXchange. DRAM supply to remain tight with its annual bit growth for 2018 forecast at just 19.6%. www.dramexchange.com.
[18] W. Hu, G. Li, J. Ni, D. Sun, and K.-L. Tan. Bp-tree: A predictive B+-tree for reducing writes on phase change memory. IEEE Transactions on Knowledge and Data Engineering, 26(10):2368–2381, 2014.
[19] U. Kang, H.-s. Yu, C. Park, H. Zheng, J. Halbert, K. Bains, S. Jang, and J. S. Choi. Co-architecting controllers and DRAM to enhance DRAM process scaling. In The Memory Forum, pages 1–4, 2014.
[20] W.-H. Kim, J. Kim, W. Baek, B. Nam, and Y. Won. NVWAL: Exploiting NVRAM in write-ahead logging. SIGPLAN Not., 51(4):385–398, Mar. 2016.
[21] E. Lee, S. Yoo, J.-E. Jang, and H. Bahn. Shortcut-JFS: A write efficient journaling file system for phase change memory. In Mass Storage Systems and Technologies (MSST), 2012 IEEE 28th Symposium on, pages 1–6. IEEE, 2012.
[22] S.-H. Lee. Technology scaling challenges and opportunities of memory devices. In Electron Devices Meeting (IEDM), 2016 IEEE International. IEEE, 2016.
[23] Y. Matsunobu. MyRocks: A space and write-optimized MySQL database. code.facebook.com/posts/190251048047090/.
[24] R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H. C. Li, R. McElroy, M. Paleczny, D. Peek, P. Saab, D. Stafford, T. Tung, and V. Venkataramani. Scaling Memcache at Facebook. In 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), pages 385–398, Lombard, IL, 2013.
[25] I. Oukid, J. Lasperas, A. Nica, T. Willhalm, and W. Lehner. FPTree: A hybrid SCM-DRAM persistent and concurrent B-Tree for storage class memory. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD '16, pages 371–386, New York, NY, USA, 2016. ACM.
[26] W. Shin, Q. Chen, M. Oh, H. Eom, and H. Y. Yeom. OS I/O path optimizations for flash solid-state drives. In 2014 USENIX Annual Technical Conference (USENIX ATC 14), pages 483–488, Philadelphia, PA, 2014. USENIX Association.
[27] S. Venkataraman, N. Tolia, P. Ranganathan, and R. H. Campbell. Consistent and durable data structures for non-volatile byte-addressable memory. In Proceedings of the 9th USENIX Conference on File and Storage Technologies, FAST '11, Berkeley, CA, USA, 2011. USENIX Association.
[28] F. Xia, D. Jiang, J. Xiong, and N. Sun. HiKV: A hybrid index key-value store for DRAM-NVM memory systems. In 2017 USENIX Annual Technical Conference (USENIX ATC 17), pages 349–362, Santa Clara, CA, 2017. USENIX Association.
[29] J. Xu and S. Swanson. NOVA: A log-structured file system for hybrid volatile/non-volatile main memories. In 14th USENIX Conference on File and Storage Technologies (FAST 16), pages 323–338, Santa Clara, CA, 2016. USENIX Association.
[30] J. Yang, D. B. Minturn, and F. Hady. When poll is better than interrupt. In Proceedings of the 10th USENIX Conference on File and Storage Technologies, FAST '12, Berkeley, CA, USA, 2012. USENIX Association.
[31] J. Yang, Q. Wei, C. Chen, C. Wang, K. L. Yong, and B. He. NV-Tree: Reducing consistency cost for NVM-based single level systems. In 13th USENIX Conference on File and Storage Technologies (FAST 15), pages 167–181, Santa Clara, CA, 2015. USENIX Association.
[32] P. Zuo and Y. Hua. A write-friendly hashing scheme for non-volatile memory systems. In Proceedings of the 33rd Symposium on Mass Storage Systems and Technologies, MSST '17, pages 1–10, 2017.