WLFC: Write Less in the Flash-Based Cache
Chaos Dong, Fang Wang, Jianshun Zhang
Huazhong University of Science and Technology, Wuhan, China
Email: [email protected]

Abstract—Flash-based disk caches, for example Bcache [1] and Flashcache [2], have gained tremendous popularity in industry in the last decade because of their low energy consumption, non-volatile nature and high I/O speed. But these cache systems have worse write performance than read performance because of the asymmetric I/O costs and the internal GC mechanism. In addition to the performance issues, since NAND flash is a type of EEPROM device, the lifespan is also limited by Program/Erase (P/E) cycles. So how to improve the performance and the lifespan of flash-based caches in write-intensive scenarios has always been a hot issue. Benefiting from Open-Channel SSDs (OCSSDs) [3], we propose a write-friendly flash-based disk cache system, called WLFC (Write Less in the Flash-based Cache). In WLFC, a strictly sequential writing method is used to minimize write amplification. A new replacement algorithm for the write buffer is designed to minimize the erase count caused by eviction. And a new data layout strategy is designed to minimize the metadata size persisted in SSDs. As a result, the Over-Provisioned (OP) space is completely removed, the erase count of the flash is greatly reduced, and the metadata size is 1/10 or less of that in BCache. Even with a small amount of metadata, data consistency after a crash is still guaranteed. Compared with the existing mechanism, WLFC brings a 7%-80% reduction in write latency, a 1.07×-4.5× increase in write throughput, and a 50%-88.9% reduction in erase count, with a moderate overhead in read performance.

Index Terms—flash-based cache, Open-Channel SSD, flash lifetime, write-friendly

I. INTRODUCTION

Flash-based disk cache systems, such as Flashcache [2] and Bcache [1], are widely used in industry because of the advantages of flash memory, which include shock resistance, low energy consumption, non-volatile nature and high I/O speed compared to the Hard Disk Drive (HDD) [4]. In general, the flash memory is utilized as a write cache (in write-back mode) or a read cache (in write-through or write-around mode) for another, slower device (a rotating HDD or an HDD array) [1]. The flash-based cache enables a lower Total Cost of Ownership (TCO) than a DRAM cache and better performance than HDDs. However, the characteristics of NAND flash, including programming by pages, erasing by blocks, no in-place overwrites and a limited number of Program/Erase (P/E) cycles, are quite different from those of HDDs. Simply treating the SSD as a faster storage device to cache data raises several critical concerns, including the unsuitable replacement algorithm, complex mapping, and log-on-log [5]. At the same time, the Open-Channel SSD (OCSSD) has become increasingly popular in academia. Through a standard OCSSD interface, the behavior of the flash memory can be managed by the host to improve the utilization of the storage.

In this paper, we make three main contributions to optimize the write performance of the flash-based disk cache. First, we design a write-friendly flash-based disk cache system upon Open-Channel SSDs, named WLFC (Write Less in the Flash-based Cache), in which requests are handled by a strictly sequential writing method to reduce write amplification. Second, we propose a new cache replacement algorithm which takes both the asymmetric I/O costs of flash memory and the recent access history into account. Thus, the erase count caused by eviction is greatly reduced while the miss ratio remains similar to that of the classical LRU algorithm. Third, we propose a new strategy to manage data so that data consistency can be guaranteed after a crash even with a small amount of metadata.

The rest of the paper is organized as follows. Section II introduces the physical characteristics of flash memory, SSDs and the existing mechanisms in flash-based caches. Section III discusses the critical concerns of the existing works on flash-based caches. Section IV shows the design of WLFC. Section V details the simulation experiments by comparing the performance of WLFC with B like, a cache system modeled after BCache. And in Section VI, we summarize this paper.
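As a point of reference for the cache modes mentioned in the introduction, the following sketch (ours, not code from the paper nor from Bcache/Flashcache) shows how a cache front-end might dispatch a write under the write-back, write-through and write-around policies; the class name FlashCacheFrontEnd and the flash/hdd objects are hypothetical placeholders.

```python
# Illustrative sketch only: dispatching a write under the three cache modes
# mentioned in the introduction. The devices are duck-typed placeholders.
class FlashCacheFrontEnd:
    def __init__(self, flash, hdd, policy="write-back"):
        self.flash = flash          # fast cache device (SSD)
        self.hdd = hdd              # slow backing device
        self.policy = policy
        self.dirty = set()          # LBAs cached on flash but not yet on HDD

    def write(self, lba, data):
        if self.policy == "write-back":
            # Absorb the write in flash only; flush to the HDD later.
            self.flash.write(lba, data)
            self.dirty.add(lba)
        elif self.policy == "write-through":
            # Write both devices; flash keeps a clean copy for future reads.
            self.flash.write(lba, data)
            self.hdd.write(lba, data)
        elif self.policy == "write-around":
            # Bypass flash entirely; only reads populate the cache.
            self.hdd.write(lba, data)
```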
II. BACKGROUND

A. Flash memory

The minimum programming unit of NAND flash memory is a page. Each page contains an out-of-band (OOB) area to store additional metadata, such as Error Correction Codes (ECC) and the page mapping. A fixed number of pages form a block, and a fixed number of blocks form a plane, which is the smallest single parallel unit within the storage media. A set of planes sharing the same transfer bus is called a die [3]. NAND flash memory is a type of EEPROM device which supports three basic operations, namely Read, Write and Erase. A page is the unit of reads and writes, which are typically fast (e.g., 50us and 500us respectively). A block is the unit of erases, which are typically slow (e.g., 5ms) [6]. In flash memory, the pages in a block can only be programmed in a strictly sequential manner, meaning that a page cannot be overwritten in place until the entire block is erased. These features introduce a lot of extra performance overhead. What's more, a NAND flash cell has a limited number of P/E cycles: for example, SLC flash endures 100,000 P/E cycles, MLC flash 10,000, and TLC flash only 1,000 [7].

B. Open-channel SSD

An SSD consists of an SSD controller and flash memory. The SSD controller can be divided into three parts. The Host Interface provides a block device interface of SATA or PCIe/NVMe to the host. The Flash Translation Layer (FTL) is the core part of the SSD controller, consisting of modules such as Page Mapping, Wear-Leveling, Garbage Collection (GC), Bad Block Management and Error Correction (EC). The Flash Interface handles the data transmission between the SSD controller and the flash memory.

The biggest difference between a traditional SSD and an Open-Channel SSD is that the latter does not have a firmware Flash Translation Layer (FTL); instead, a software FTL is implemented in the host's operating system. The implementation of the FTL is therefore more flexible on Open-Channel SSDs.
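To make these constraints concrete, below is a minimal, illustrative model (ours, not WLFC or OCSSD driver code) of a single flash block: pages are programmed strictly in order through a write pointer, nothing can be overwritten until the whole block is erased, and each operation is charged the rough latencies quoted in Section II-A. The names and the page count are assumptions made for the sketch.

```python
# Minimal model of one NAND block: sequential programming, erase-before-
# rewrite, asymmetric latencies (values from Section II-A, in microseconds).
READ_US, PROG_US, ERASE_US = 50, 500, 5000

class FlashBlock:
    def __init__(self, pages_per_block=64):
        self.pages = [None] * pages_per_block
        self.next_page = 0            # write pointer: next programmable page
        self.pe_cycles = 0            # erases endured so far (lifetime budget)
        self.busy_us = 0              # accumulated device time

    def program(self, data):
        # A full block must be erased before any page can be written again.
        if self.next_page == len(self.pages):
            raise RuntimeError("block full: erase before programming again")
        self.pages[self.next_page] = data   # no in-place overwrite is possible
        self.next_page += 1
        self.busy_us += PROG_US
        return self.next_page - 1           # index of the page just written

    def read(self, page_idx):
        self.busy_us += READ_US
        return self.pages[page_idx]

    def erase(self):
        self.pages = [None] * len(self.pages)
        self.next_page = 0
        self.pe_cycles += 1                 # wear: ~1K-100K P/E cycles total
        self.busy_us += ERASE_US
```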
III. RELATED WORK AND MOTIVATION

In previous studies, separating the cache into a read cache and a write cache is an efficient way to improve cache performance [8], because their roles are subtly different. The aim of the read cache is to keep the hottest data to reduce the miss ratio, while the aim of the write cache is to absorb many small writes into a bulk. Some previous works propose to buffer writes as logs in sequence, called the log buffer-way, which fits the physical characteristics of flash memory [9]. However, there are still some critical concerns in flash-based caches, which include the unsuitable replacement algorithm, the complex mapping and the log-on-log problem.

A. Unsuitable replacement algorithm

The extensively-used cache replacement algorithms, and the flash-oriented works built on them, either take nothing about the recent access history of the block into account or are based on LRU, which is not as accurate as LFU or ARC, resulting in an increased number of cache misses. Some studies pay attention to resizing the over-provisioned space dynamically according to the I/O pressure to reduce the erase count [16], [17]. Some adopt the idea of selective caching to filter out seldom-accessed blocks and prevent them from entering the cache. For example, Huang et al. [18] proposed LARC in the SSD cache, a new cache replacement algorithm based on ARC. Hua Wang et al. [19] use machine learning to train a new cache algorithm to avoid unnecessary writes. These works take effect in reducing wear-out and extending SSD lifespan, but still do not consider the extra overhead caused by evicting. A new replacement algorithm should consider both the inner I/O mechanism of flash memory and the recent access history.

B. Complex mapping

Many studies propose to buffer writes as logs in sequence [8] to improve performance. The mapping of the write logs to Logical Block Addresses (LBAs) is usually stored on disk for recovery and copied into DRAM to accelerate random access at run time. For example, in BCache, the mapping is persisted in the SSD as the journal and managed as a B+ tree in DRAM. Meanwhile, the mapping of logical pages to physical flash memory pages is managed by the FTL in the SSD. These two types of mapping exist at two levels simultaneously, which increases both the DRAM consumption and the SSD overhead [20]. Fusing these two levels of mapping into one is a good way to reduce the metadata size and shorten the software stack.

C. Log-on-log

First, the I/O requests are buffered in the SSD sequentially as logs. Second, for recovery after a crash, a journal is introduced to record the updates of the log mapping. And third, at the physical level, the data is stored in the flash memory in an appending way, too. As a result, one I/O request produces multilevel logs, called log-on-log [5], bringing a serious write amplification problem.
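The following back-of-the-envelope sketch (ours; the unit sizes are assumptions, not measurements from the paper) illustrates the amplification that Sections III-B and III-C describe: under a stacked design, one 4 KiB user write touches the cache's data log, a journal record for the mapping update, and, in the worst case, an extra device-level page copy during GC, whereas a fused host-managed mapping on an OCSSD appends the data once and updates a single mapping table.

```python
# Rough byte accounting for one user write under stacked logs vs. a fused
# host-managed mapping. All constants are assumed values for illustration.
PAGE = 4096          # flash page size (assumed)
JOURNAL_ENTRY = 64   # bytes of journal metadata per mapping update (assumed)

def stacked_log_bytes(user_bytes):
    data_log = user_bytes        # level 1: cache write log in the SSD
    journal = JOURNAL_ENTRY      # level 2: journal record for the log mapping
    # level 3: the device FTL also writes out of place; in the worst case
    # (GC copying a partially valid block) the page is rewritten once more.
    ftl_rewrite = PAGE
    return data_log + journal + ftl_rewrite

def fused_log_bytes(user_bytes):
    # Host FTL on an OCSSD: one sequential append; the single mapping table
    # is updated in memory and persisted in batches (amortized to ~0 here).
    return user_bytes

if __name__ == "__main__":
    w = PAGE
    print("stacked logs :", stacked_log_bytes(w), "bytes written per 4 KiB")
    print("fused mapping:", fused_log_bytes(w), "bytes written per 4 KiB")
```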