Write Amplification Analysis in Flash-Based Solid State Drives
Xiao-Yu Hu, Evangelos Eleftheriou, Robert Haas, Ilias Iliadis, Roman Pletka
IBM Zurich Research Laboratory, CH-8803 Rüschlikon, Switzerland
{xhu,ele,rha,ili,rap}@zurich.ibm.com

[Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SYSTOR '09, Haifa, Israel. Copyright 2009 ACM X-XXXXX-XX-X/XX/XX ...$5.00.]

ABSTRACT

Write amplification is a critical factor limiting the random write performance and write endurance of storage devices based on NAND-flash memories, such as solid-state drives (SSD). The impact of garbage collection on write amplification is influenced by the level of over-provisioning and by the choice of reclaiming policy. In this paper, we present a novel probabilistic model of write amplification for log-structured flash-based SSDs. Specifically, we quantify the impact of over-provisioning on write amplification analytically and by simulation, assuming workloads of uniformly distributed random short writes. Moreover, we propose modified versions of the greedy garbage-collection reclaiming policy and compare their performance. Finally, we analytically evaluate the benefits of separating static and dynamic data in reducing write amplification, and we discuss how to address endurance with proper wear leveling.

Categories and Subject Descriptors

B.3.3 [Memory Structures]: Performance Analysis and Design Aids—formal models, simulation; C.3 [Special-purpose and application-based systems]: Real-time and embedded systems; D.4.2 [Storage Management]: Garbage collection

General Terms

Design, Performance, Algorithms

Keywords

Solid State Drives, Solid State Storage Systems, Write Amplification, Flash Memory

1. INTRODUCTION

The advent of solid-state drives (SSD) based on NAND-flash memories is currently revolutionizing the primary storage computer architecture, ranging from notebooks to enterprise storage systems. These devices provide random I/O performance and access latency that are orders of magnitude better than those of rotating hard-disk drives (HDD). Moreover, SSDs significantly reduce power consumption and dramatically improve robustness and shock resistance thanks to the absence of moving parts.

NAND-flash memories have unique characteristics that pose challenges to SSD system design, especially with respect to random write performance and write endurance. They are organized in terms of blocks, each block consisting of a fixed number of pages, typically 64 pages of 4 KiB each. A block is the elementary unit for erase operations, whereas reads and writes are processed in terms of pages. Before data can be written to a page (i.e., the page is programmed with that data), the page must have been erased. Moreover, NAND-flash memories have a limited program-erase cycle count: typically, flash chips based on single-level cells (SLC) sustain 10^5 program-erase cycles, and those based on multi-level cells (MLC) 10^4.

Flash memory uses relocate-on-write – also called out-of-place write – mainly for performance reasons: if write-in-place were used instead, flash would exhibit high latency owing to the necessary reading, erasing, and reprogramming (writing) of the entire block in which data is being updated.

However, relocate-on-write necessitates a garbage-collection process, which results in additional read and write operations. Whereas the reclaiming policy that selects the blocks to garbage-collect in Sprite LFS [13] was based only on the amount of free space to be gained, the policy defined in [10] also included the time elapsed since the block was last written with data. In general, the objective is to keep the number of valid pages in the blocks selected for garbage collection at a minimum. The efficiency of garbage collection could, for instance, be improved by delaying the collection of blocks whose data is being actively invalidated. The number of read and write operations resulting from garbage collection depends on the number of valid pages in the block.
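The latency argument for relocate-on-write can be made concrete with a back-of-the-envelope count of page operations, using the block geometry given above (64 pages of 4 KiB per block). This is only an illustrative sketch; the function names and the simple cost model are ours:

```python
# Why write-in-place is costly on NAND flash: updating even a single 4 KiB
# page in place forces a read-modify-erase-reprogram of the entire block,
# whereas relocate-on-write programs one free page and invalidates the old one.

PAGES_PER_BLOCK = 64  # typical geometry: 64 pages of 4 KiB per block

def write_in_place_cost(pages_updated=1):
    """Page operations needed to update pages of one block in place."""
    reads = PAGES_PER_BLOCK - pages_updated   # salvage the unmodified pages
    erases = 1                                # erase works only on whole blocks
    programs = PAGES_PER_BLOCK                # reprogram the entire block
    return reads, erases, programs

def relocate_on_write_cost(pages_updated=1):
    """Page operations when the update goes out of place to free pages."""
    return 0, 0, pages_updated                # old copies are merely invalidated

print(write_in_place_cost())     # -> (63, 1, 64)
print(relocate_on_write_cost())  # -> (0, 0, 1)
```

The price of the cheap out-of-place write is deferred: the invalidated pages must eventually be reclaimed by garbage collection.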
In contrast to disks, flash memory blocks eventually wear out as the number of program-erase cycles increases, until they can no longer be written. Wear-leveling techniques are therefore used to exhaust the available program-erase cycles (i.e., the cycle budget) of as many blocks as possible, in order to serve the largest number of user writes (or host writes), thereby maximizing endurance. Their performance is measured by the total unconsumed cycle budget left when garbage collection can no longer return a free block. Note that retention is another issue that can be addressed by wear leveling as well.

Assuming independent and uniformly distributed random short writes, the optimal wear-leveling technique consists of wearing out all blocks over time as uniformly as possible. This can be achieved, for instance, by minimizing the delta between the maximum wear and the average wear over all blocks, with this delta corresponding to the wear-leveling inefficiency as described in [6].
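As a worked example of this delta (a minimal sketch; the function name is ours, and only the metric itself is from [6]), the inefficiency follows directly from per-block erase counts:

```python
# Wear-leveling inefficiency: delta between the wear of the most-worn block
# and the average wear, computed from per-block erase counts.

def wear_leveling_inefficiency(erase_counts):
    average = sum(erase_counts) / len(erase_counts)
    return max(erase_counts) - average

# Same total wear in both cases, distributed differently across four blocks:
print(wear_leveling_inefficiency([100, 100, 100, 100]))  # -> 0.0 (ideal)
print(wear_leveling_inefficiency([400, 0, 0, 0]))        # -> 300.0 (poor)
```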
In reality, host writes are not uniformly distributed. If a distinction can be made between blocks with static data (i.e., addresses to which the host only infrequently rewrites data) and blocks with dynamic data (with frequent rewrites), wear leveling can benefit from treating these two types of blocks differently instead of wearing them out uniformly. Hence, in this case, wear-leveling performance depends not only on the unconsumed cycle budget, but also on the number of cycles wasted by repeatedly moving unmodified static data.

In all cases, wear leveling causes additional read and write operations. Therefore, in flash, write amplification corresponds to the additional writes caused by garbage collection and by wear leveling.
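To illustrate how garbage collection alone already amplifies writes, the following sketch simulates a log-structured SSD under uniformly distributed random page writes with a purely greedy reclaiming policy. All parameters and names are our own illustrative choices (including a spare factor of 12.5%); this is a toy model, not the analytical model developed below. Write amplification is measured as the ratio of pages programmed in flash to pages written by the host.

```python
import random

PAGES_PER_BLOCK = 64
NUM_BLOCKS = 256
SPARE_FACTOR = 0.125                  # over-provisioned fraction (assumption)
USER_PAGES = round(NUM_BLOCKS * (1 - SPARE_FACTOR)) * PAGES_PER_BLOCK

def write_amplification(host_writes, seed=1):
    rng = random.Random(seed)
    valid = [0] * NUM_BLOCKS                       # valid-page count per block
    contents = [set() for _ in range(NUM_BLOCKS)]  # logical pages per block
    where = {}                                     # logical page -> block
    free = list(range(1, NUM_BLOCKS))              # pool of erased blocks
    state = {"active": 0, "fill": 0, "flash_writes": 0}

    def reclaim():
        # Greedy policy: recycle the block with the most invalid pages,
        # i.e., the fewest valid ones, relocating its valid pages first.
        victim = min((b for b in range(NUM_BLOCKS) if b != state["active"]),
                     key=lambda b: valid[b])
        for lpn in list(contents[victim]):
            program(lpn)                           # relocation costs a write
        valid[victim] = 0
        free.append(victim)                        # erased block rejoins pool

    def program(lpn):
        # Out-of-place update: invalidate the old copy, program a free page.
        if lpn in where:
            old = where[lpn]
            valid[old] -= 1
            contents[old].discard(lpn)
        valid[state["active"]] += 1
        contents[state["active"]].add(lpn)
        where[lpn] = state["active"]
        state["fill"] += 1
        state["flash_writes"] += 1
        if state["fill"] == PAGES_PER_BLOCK:       # active block is full
            state["active"], state["fill"] = free.pop(), 0
            if not free:                           # pool empty: must reclaim
                reclaim()

    for _ in range(host_writes):
        program(rng.randrange(USER_PAGES))         # uniform random short writes
    return state["flash_writes"] / host_writes

print(round(write_amplification(100_000), 2))      # noticeably above 1.0
```

With more over-provisioning (a larger spare factor), reclaimed blocks carry fewer valid pages and the measured amplification drops, which is the trade-off quantified analytically below.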
Hence, the total number of user writes that can be served depends on the total cycle budget available, on write amplification, and on the eventual unconsumed cycle budget due to wear-leveling inadequacy.

Finally, the management of out-of-place updates involves a mapping between logical block addresses (LBA), i.e., the user (or host) address space, and physical block addresses (PBA). This mapping may be used to distinguish dynamic from static data.

In the remainder of this paper, we present a probabilistic analysis of write amplification in log-structured flash-based SSDs. The analysis assumes a windowed greedy reclaiming policy, which is a variation of the age-threshold-based policy described in [10]. Write amplification is derived assuming 4 KiB independently and uniformly distributed user write requests. The analytical results are confirmed by simulation results, and are then extended to the case in which static and dynamic data can be distinguished.

This paper is organized as follows. Section 2 reviews the relevant work in garbage collection and flash wear leveling. In Section 3, we introduce an analytical model based on a probabilistic approach, and we then continue with a description of our flash storage simulator in Section 4. Based on these two sections, we present essential analytical and simulation-based numerical results that allow us to quantify write amplification in Section 5, before we extend our analytical model to the more realistic scenario with static and dynamic data (Section 6).

2. RELATED WORK

[…] garbage collection. Kawaguchi et al. [7] showed that, in a flash-based log-structured file system, garbage collection has a significant impact on performance when utilization is high. A common strategy for garbage collection is the greedy reclaiming policy [3], in which the block that has the largest number of invalid pages is recycled. However, as garbage collection also contributes to using up the cycle budget of blocks, it is usually beneficial to combine it with wear leveling.

Various such combined algorithms have been proposed. The one described by Chang et al. [3] avoids unnecessary reclamations in garbage collection and combines this with wear leveling in the form of a periodical task that performs a linear search for blocks with a small erase count to identify blocks to be recycled. Agrawal et al. [1] describe another combined algorithm, called the modified greedy garbage-collection strategy. This algorithm generally selects the block with the most invalid pages for garbage collection, while avoiding a large spread in the remaining cycle budget among all blocks and limiting the frequent movement of static data. Such a strategy is referred to as static wear-leveling in [14, 4] and exhibits a fourfold improvement in endurance over a strategy that does not relocate static data (assuming 75% dynamic and 25% static data). Ben-Aroya et al. [2] performed a worst-case competitive analysis with a focus on endurance-based randomized algorithms. To achieve nearly ideal endurance, they suggest separating garbage collection from wear leveling.

Write amplification was initially studied by Rosenblum et al. [13] for log-structured file systems as a function of disk utilization. Whereas the Sprite LFS analysis distinguishes between hot and cold data (including both reads and writes), we instead distinguish between static and dynamic data, as only writes contribute to write amplification. Furthermore, the Sprite LFS write-cost comparison includes time for seeks, rotational latency, and cleaning costs. Therefore, our results can only be qualitatively compared with those from Sprite LFS.

Although some of the flash memory papers briefly mention performance impacts from garbage collection and wear leveling, we did not find in the literature a detailed analysis of write amplification in flash-based storage systems and of how it relates to parameters such as the spare factor.

3. WRITE AMPLIFICATION ANALYSIS

In this section, we first introduce a generic architecture for a log-structured SSD with a windowed greedy reclaiming