Reducing Write Amplification of Flash Storage through Cooperative Data Management with NVM

Eunji Lee, Chungbuk Nat'l University, Cheongju, Korea ([email protected])
Julie Kim, Ewha University, Seoul, Korea ([email protected])
Hyokyung Bahn*, Ewha University, Seoul, Korea ([email protected])
Sam H. Noh, UNIST, Ulsan, Korea ([email protected])

Abstract—Write amplification is a critical factor that limits the stable performance of flash-based storage systems. To reduce write amplification, this paper presents a new technique that cooperatively manages data in flash storage and nonvolatile memory (NVM). Our scheme basically considers NVM as the cache of flash storage, but allows the original data in flash storage to be invalidated if there is a cached copy in NVM, which can temporarily serve as the original data. This scheme eliminates the copy-out operation for a substantial amount of cached data, thereby enhancing garbage collection efficiency. Experimental results show that the proposed scheme reduces the copy-out overhead of garbage collection by 51.4% and decreases the standard deviation of response time by 35.4% on average.

Keywords—Write Amplification; Non-volatile Memory

I. INTRODUCTION

Developments in nonvolatile memory (NVM) technologies such as PCM (phase-change memory), STT-MRAM (spin torque transfer magnetic RAM), and 3D XPoint are advancing rapidly [1-5]. While deploying emerging nonvolatile memories requires changes in the storage management mechanisms devised for conventional volatile memory based architectures [6], it also opens opportunities to improve on limitations that exist in current systems. In this paper, we consider the use of NVM as a cache for flash based storage devices and propose the Cooperative Data Management (CDM) scheme that cooperatively manages data in flash and the NVM cache in order to reduce the write amplification of flash memory.

Flash memory is widely used as storage in high-end systems as well as small embedded devices. Flash memory is an erase-before-write medium, and the erasure unit (called a block) is much larger than the write unit (called a page) [7]. Thus, an entire block needs to be erased even if only a small portion of the block is updated. To alleviate this inefficiency, writes in flash memory are performed out-of-place, and space containing obsolete data is periodically recycled. This procedure, called garbage collection, selects blocks to be recycled, copies out the valid pages in those blocks, if any, and then erases the blocks. Garbage collection thus incurs additional writes to flash; that is, writes are amplified. This write amplification is a critical factor that limits the stable performance of flash storage [8-15].

The key idea of the CDM scheme comes from the observation that we can recycle victim blocks in flash without copying out their valid data if the data resides in the nonvolatile cache, which provides durability, instead of in flash storage. This scheme exploits the non-volatility of the scalable cache to relieve the wear-out of flash storage. This is an important contribution when one considers the fact that NVM density, cost, and performance are constantly improving, while the endurance of flash keeps deteriorating [16-19].

The basic workings of our scheme can be summarized in the following two situations. First, when cached data becomes dirty, our scheme immediately notifies the storage of this state change, allowing early invalidation of storage data. Second, when garbage collection is activated, our scheme erases victim blocks without copying out their valid data if the data, possibly being clean, resides in NVM. To support this mechanism, we define a new flash page state that we call "removable" and discuss how this state can be utilized when managing data in flash storage and the NVM cache.

The proposed scheme is implemented on SSDsim, an extended version of DiskSim for SSDs [20]. Experimental results with various storage workloads show that the proposed scheme reduces the copy-out overhead of garbage collection by 51.4% on average. This also leads to an average 15% reduction in response time. Such reduction in write amplification also results in a 35.4% reduction in the standard deviation of the response time.

II. ANALYZING WRITE AMPLIFICATION FACTOR

The write amplification factor is defined as the amount of data actually written to flash storage divided by the amount of write data requested by the host. We investigate the write amplification factor of an SSD by making use of SSDsim, a high-fidelity event-driven SSD simulator, as commercial SSD devices do not provide internal information to end users. (Detailed configurations of the experiment are described in Section IV.)
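Written as a formula (our notation; the paper gives the definition only in words), and assuming that garbage-collection copy-out is the only source of extra flash writes:

\mathrm{WAF} \;=\; \frac{\text{data written to flash}}{\text{data written by the host}} \;=\; 1 + \frac{\text{GC copy-out writes}}{\text{host writes}}

For example, the factor of 2.84 measured below for the Random workload means that, for every 1GB of host writes, roughly 1.84GB of additional copy-out writes occur internally during garbage collection.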

We observe the write amplification factor with respect to the workload characteristics by using two synthetic workloads (Random and Sequential) and two real workloads (JEDEC and OLTP). For the synthetic workloads, we generate 5 million write operations in random and sequential patterns, respectively, by making use of the internal workload generator in SSDsim [20]. Each operation is in 8-sector (4KB) units and the total footprint is 20GB. Table 1 summarizes the characteristics of the real workloads used in our experiments. JEDEC (JEDEC 219A) is a workload that is used for SSD endurance verification [21]. The OLTP trace, attained at financial institutions, contains I/O accesses for financial transactions [22].

Table 1: Summary of workload characteristics.

                 JEDEC                        OLTP
# of ops.        5,000,000                    9,034,179
Ratio of ops.    write 89%, 6%, flush 5%      read 52%, write 48%
Footprint        31.2GB                       30.7GB

Before measurements begin we warm up the simulator by writing data sequentially until the number of free blocks reduces to less than 5% of the total flash blocks. The results reported are those of repeatedly running the workload 10 times for each workload. As shown in Figure 1, the average write amplification factors of random and sequential are 2.84 and 1.00, respectively. In random workloads, as flash pages are updated at random, the victim block selected during garbage collection is likely to have many valid pages. This leads to increased write amplification. In contrast, as pages within a block are invalidated together in sequential workloads, there is no write amplification.

Figure 1: Write amplification factor for various workloads as SSD capacity is varied from 4GB to 64GB: (a) Random, (b) Sequential, (c) JEDEC, (d) OLTP.

However, most workloads in real systems are a certain mixture of sequential and random workloads. To mimic such real situations, we generate mixed workloads and investigate the write amplification factor as the ratio of sequential and random accesses is varied. All other configurations other than the ratio of random to sequential requests are the same as that of Figure 1. Figure 2 shows the write amplification factor of a 32GB SSD as the ratio of sequential and random accesses is varied. As shown in the figure, the write amplification factor of mixed workloads is almost identical to that of a pure random workload and does not decrease even when most accesses are sequential. That is, only a small fraction of random accesses is necessary to intensify the write amplification factor, which implies that the write amplification of real workloads would be similar to that of random accesses. This is also consistent with the write amplification factor measured from real I/O workloads as shown in Figures 1(c) and 1(d), which are very similar to that of Random in Figure 1(a). When storage capacity is small compared to the data to be stored, the write amplification factor becomes very high, going up to as much as 6.0 in our experiments. Such write amplification degrades the performance of flash storage and also reduces the lifetime of flash storage.

Figure 2: Write amplification factor of a 32GB SSD as the access ratio (random:sequential) is varied from 10:0 to 0:10.

To solve this problem, we propose a new data management scheme that enables flash memory to minimize valid page copy-out by making use of nonvolatile cache. The next section describes the design and implementation details of the Cooperative Data Management (CDM) scheme that we propose.

III. COOPERATIVE DATA MANAGEMENT WITH FLASH MEMORY AND NVM

A. Cooperative Data Management

Traditional caching systems employ volatile media such as DRAM and thus, the original data in storage must be preserved even though there is another copy in the cache. However, when the cache becomes nonvolatile and data is cached, we now have two persistent copies in the system: one in flash, which we refer to as the storage (or flash) copy, and the other in the NVM cache, which we refer to as the cached copy. One of these copies can (and should) be eliminated as keeping both is redundant and thus may be costly. Whether to discard the cached or the flash copy will depend on which benefits the system more. For example, if the storage copy is a candidate to be moved due to garbage collection, it might be better to simply discard it instead, as a valid, nonvolatile cached copy exists. In this manner, the Cooperative Data Management (CDM) scheme that we propose recycles victim blocks without copying out their valid data if the data resides in the cache.

Dirty Valid N/A Up-to-date S0 cache allocation for write S2 Writable

Valid Valid cache allocation read Clean for write write for S1 evict Clean read Removable S1 Up-to-date Writable

Removable

State in erase S# S2 Dirty cache checkpoint in flash checkpoint

State in Dirty Dirty read Invalid storage S3 Up-to-date S4 Up-to-date read / Write-protected commit Writable write Invalid Invalid Figure 3: States of cache and storage data when the NVM cache and flash storage cooperate. write read / write

State in Dirty S# cache S5 Out-of-date invalid. Thus, instead of invalidating storage copy right after it Write-protected is cached, we change the state from “valid” to “removable”, State in storage Invalid where “removable” is a flash page state that is defined as being currently “valid” but may be removed (i.e., be Figure 4: States of storage data and cached copy in the consistency guaranteed considered to be “invalid”) without copying it out if an erasure model. occurs on the storage copy. If the cached copy is evicted before the storage copy is recycled, the state of the storage copy simply returns to “valid.” level of reliability is also necessary when using our CDM On the other hand, if the cached copy is updated (i.e., scheme. Here, we discuss the consistency mechanism for becomes dirty), our scheme immediately invalidates the CDM. storage copy because the cached copy must be written back to Recall that unlike flash memory, in-place updates are flash anyhow. Note here that periodic flushing done in possible with NVM but atomicity of in-place updates are traditional volatile caches to maintain consistency becomes guaranteed in only relatively small sizes [23, 24]. Now unnecessary as the cache is now nonvolatile. suppose that only the cached copy of a certain page remains. Figure 3 shows the state diagram of data in flash (storage) In this situation, if the system crashes while updating the and cache when the NVM cache cooperates with flash. Each cached copy, the data may become inconsistent as page circle represents the state of the storage copy and its cached overwrite is not atomic. To overcome this problem, we copy. For simplicity, let us assume that storage and cached prohibit in-place updates of cached copies when it serves as copies are managed in page units. The storage copy has three the original data. states, “valid,” “invalid,” and “removable,” and the cached Figure 4 extends the states defined in Section 3.A to copy has two states “clean” and “dirty.” In the figure, in state guarantee consistency. We define two additional indicators S0 the data only exists in flash. When the storage copy is “writable/write-protected” and “up-to-date/out-of-date.” The retrieved and cached, the state changes to S1. Now, as the up- writable/write-protected indicator distinguishes whether the to-date copy exists in both cache and flash, the state of the cached copy allows in-place update or not. If a write is storage copy becomes “removable.” If the cached copy is requested on a write-protected copy, it is first copied and the updated or the storage copy is removed, the up-to-date data write is performed on that copy. The up-to-date/out-of-date exists only in the cache, and the storage copy becomes indicator distinguishes whether the cached copy is the most “invalid” (S2). Finally, if the cached copy in the “dirty” state recent version or not. This is necessary as multiple copies for is evicted, it is flushed to flash and the state returns to S0. If a the same data may exist in the cache. cached copy in the “clean” state (S1) is replaced, it is simply Let us see how our scheme works with the state changes. discarded from the cache without flushing (S0). Initially, data exists only in flash (S0). Upon read/write requests, the data is cached and S0 transits to S1/S2. In S1 B. Consistency Issues state, as up-to-date data exists in both cache and flash, the Management of cached data requires a guaranteed level of storage copy becomes “removable.” In this situation, suppose reliability. 
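The basic transitions of Figure 3 can be sketched in code. The following is a minimal Python sketch under our own naming (FlashPage, on_cache_fill, on_cache_write, on_cache_evict); it illustrates the state changes described above and is not the implementation used in the paper.

from dataclasses import dataclass
from enum import Enum

class FlashState(Enum):
    VALID = "valid"          # flash holds the original copy (S0)
    REMOVABLE = "removable"  # valid, but may be erased without copy-out (S1)
    INVALID = "invalid"      # up-to-date data exists only in the NVM cache (S2)

class CacheState(Enum):
    CLEAN = "clean"
    DIRTY = "dirty"

@dataclass
class FlashPage:
    lpn: int
    state: FlashState = FlashState.VALID

    def write_back(self) -> None:
        # Stand-in for the FTL's out-of-place write of an evicted dirty copy.
        self.state = FlashState.VALID

def on_cache_fill(page: FlashPage) -> CacheState:
    # S0 -> S1: a clean cached copy now exists, so the flash copy may later be
    # erased during garbage collection without being copied out.
    page.state = FlashState.REMOVABLE
    return CacheState.CLEAN

def on_cache_write(page: FlashPage) -> CacheState:
    # S1 -> S2: the dirty cached copy will be written back eventually anyway,
    # so the flash copy is invalidated early.
    page.state = FlashState.INVALID
    return CacheState.DIRTY

def on_cache_evict(page: FlashPage, cache_state: CacheState) -> None:
    # S2 -> S0: flush the dirty copy to flash; S1 -> S0: simply drop the clean copy.
    if cache_state is CacheState.DIRTY:
        page.write_back()
    else:
        page.state = FlashState.VALID

Note that only state bits change on a cache fill or an early invalidation; no data is moved until a dirty copy is actually evicted.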
B. Consistency Issues

Management of cached data requires a guaranteed level of reliability. For example, modern reliable file systems perform out-of-place updates such as journaling (e.g., ReiserFS) or copy-on-write (e.g., ZFS) to support recovery of the file system to the latest consistent state. A certain level of reliability is also necessary when using our CDM scheme. Here, we discuss the consistency mechanism of CDM.

Recall that unlike flash memory, in-place updates are possible with NVM, but the atomicity of in-place updates is guaranteed only for relatively small sizes [23, 24]. Now suppose that only the cached copy of a certain page remains. In this situation, if the system crashes while updating the cached copy, the data may become inconsistent, as a page overwrite is not atomic. To overcome this problem, we prohibit in-place updates of a cached copy when it serves as the original data.

Figure 4 extends the states defined in Section III.A to guarantee consistency. We define two additional indicators, "writable/write-protected" and "up-to-date/out-of-date." The writable/write-protected indicator distinguishes whether the cached copy allows in-place updates or not. If a write is requested on a write-protected copy, the copy is first duplicated and the write is performed on the duplicate. The up-to-date/out-of-date indicator distinguishes whether the cached copy is the most recent version or not. This is necessary as multiple copies of the same data may exist in the cache.

Figure 4: States of storage data and cached copy in the consistency-guaranteed model (states S0-S5).

Let us see how our scheme works with the state changes. Initially, data exists only in flash (S0). Upon read/write requests, the data is cached and S0 transits to S1/S2. In state S1, as up-to-date data exists in both cache and flash, the storage copy becomes "removable." In this situation, suppose that garbage collection occurs and the storage copy needs to be erased. Then, the cached copy becomes "write-protected" before the storage copy becomes "invalid" (S3). This protects the cached copy from being corrupted upon a crash. The data in this state still services read requests, but upon a write request, a copy-on-write occurs to protect the original data (S5) and the update is performed on a new cache location (S4). The out-of-date copy (S5) is maintained until its up-to-date version (S4) is successfully committed, and is then reclaimed (S0).

Returning to state S1, if the cached copy is updated while its original data exists in flash, the storage copy becomes out-of-date. However, as the cached copy has not yet been committed, the storage copy is still in the "valid" state (S2). This is different from the basic model presented in Section III.A, which immediately changes the out-of-date data to "invalid" but does not guarantee consistency.

Similar to journaling, which periodically commits updated data to a separate storage area (e.g., every 5 seconds), the proposed scheme can periodically perform commits by changing the state of cached "dirty" data (S2/S4) to "write-protected" (S3). Note that this procedure just changes the state of the cached copy, incurring neither storage writes nor copy operations within the cache. Once the data is committed, it becomes the new original data and the data before the update needs to be invalidated. This old data may exist either in flash or in the cache. If the storage copy is invalidated (S2→S3), unnecessary copy overhead during garbage collection can be eliminated. If the cached copy is invalidated (S5→S0), it is reclaimed and becomes free. The committed up-to-date data (S3) is finally reflected to a permanent storage location via checkpointing. Checkpointing should be triggered when free memory drops below a low watermark. Even if the committed data (S3) becomes out-of-date (S5) due to subsequent writes, it (S5) is written to the storage during checkpointing, as the up-to-date copy (S4) has not yet been committed. After checkpointing, the committed obsolete data (S5) is reclaimed (S0) and the up-to-date copy (S4) transits its state (to S2) as it now has a backup copy in storage. The up-to-date cached copy (S3 or S4) still serves as a cache block (S1 or S2) after checkpointing.
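As an illustration of the commit and checkpoint procedure just described, the following Python sketch mirrors the state changes of Figure 4. All names (CachedCopy, cache_write, commit, checkpoint, flash_write) are our own, and several details (e.g., cache indexing and crash recovery) are omitted; this is a sketch of the described protocol, not the authors' code.

from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class CachedCopy:
    lpn: int
    data: bytes
    dirty: bool = False                    # dirty and writable: S2/S4
    write_protected: bool = False          # committed original: S3
    stale: Optional["CachedCopy"] = None   # out-of-date copy kept until commit: S5

def cache_write(copy: CachedCopy, new_data: bytes) -> CachedCopy:
    # Writable copies are updated in place; write-protected copies are duplicated
    # first (copy-on-write), leaving the old committed version as an S5 copy.
    # The caller is assumed to point the cache index at the returned copy.
    if not copy.write_protected:
        copy.data, copy.dirty = new_data, True
        return copy
    return CachedCopy(copy.lpn, new_data, dirty=True, stale=copy)

def commit(copies: List[CachedCopy]) -> None:
    # Periodic commit: dirty data (S2/S4) becomes the write-protected original (S3).
    # Only state bits change; no storage write or in-cache copy is performed.
    for c in copies:
        if c.dirty:
            c.dirty, c.write_protected = False, True
            c.stale = None                 # the obsolete S5 copy is reclaimed (S0)

def checkpoint(copies: List[CachedCopy],
               flash_write: Callable[[int, bytes], None]) -> None:
    # Committed data is reflected to flash. If a committed copy has since become
    # out-of-date (S5), that older version is what gets checkpointed, because the
    # newer copy (S4) has not been committed yet; the S5 copy is then reclaimed.
    for c in copies:
        if c.write_protected:
            flash_write(c.lpn, c.data)
            c.write_protected = False      # behaves as an ordinary clean copy again
        elif c.stale is not None:
            flash_write(c.stale.lpn, c.stale.data)
            c.stale = None

The key design point, as in the paper, is that commit manipulates only state indicators, so durability of committed data costs no extra writes until checkpointing.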
C. Implementation Issues

The scheme that we propose can be deployed at two different levels in real systems. The first is using NVM as a host-side cache for flash-based primary storage. In this case, our scheme can be supported by the kernel through modification of the file system buffer cache layer. However, in this case, current storage interfaces need to be revised, because the host OS and the storage system must be able to notify each other of state changes that occur in the cache and the storage system. One way to get around this limitation is to extend the host interface to transfer the state information [25]. Fortunately, this is becoming a viable approach as emerging storage interfaces like NVMe or Universal PCI Express are flexible enough that adding proprietary extensions is becoming feasible [26].

The other level is to use NVM as an internal write buffer in flash storage devices such as SSDs. In this case, our scheme can be incorporated into the design of the FTL in SSD internals without modifying host interfaces. This approach does not require any modifications to the storage stack as the FTL is a purely internal mechanism within the flash storage device. Even though the scheme that we propose can be deployed at both levels as just discussed, for our performance evaluation we focus only on the latter case.

IV. PERFORMANCE EVALUATION

For our evaluation, we implement the proposed scheme in DiskSim's MSR SSD extension [20]. The SSD simulator emulates SLC NAND flash memory chip operations, and the parameters that we use are presented in Table 2. In all configurations, there are 8 flash memory chips and the total storage capacity is 64GB. The simulator assumes a PCI-e interface with 8 lanes and 8b/10b encoding, providing 2.0 Gbps per lane.

Table 2: Experimental parameters.

SSD capacity             64GB
Page size                4KB
Block size               256KB (64 pages)
Page read latency        25us
Page write latency       200us
Block erase latency      1.5ms
Data transfer latency    100us (for a 4KB page)
Overprovisioning ratio   15%

In this SSD simulator, we add a nonvolatile cache and implement the CDM scheme within the FTL. The cache is managed in 4KB page units using the LRU replacement policy. Specifically, we add the "removable" state to flash pages, in conjunction with the "valid/invalid" states, by modifying the page table entries in the FTL. We then revise the garbage collector to skip copy-out operations for pages in the "removable" state. Before erasing blocks, the garbage collector determines that the pages in removable states are to be erased and sets their states to write-protected. Figure 5 shows the system architecture of the proposed scheme. For all experiments, we warm up the simulator in the same manner used for the write amplification experiments.

Figure 5: System architecture used in our experiments.
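The garbage-collection change described above can be summarized as the following sketch, again in our own notation (collect, nv_cache.write_protect, and flash.copy_out are hypothetical stand-ins); the actual modification is made inside the simulator's FTL.

def collect(victim_block, nv_cache, flash) -> int:
    # Garbage-collect one victim block under CDM. `victim_block`, `nv_cache`, and
    # `flash` are illustrative stand-ins with the obvious operations; they are not
    # SSDsim objects. Returns the number of pages copied out.
    copied = 0
    for page in victim_block.pages:
        if page.state == "valid":
            flash.copy_out(page)              # ordinary GC path: relocate valid data
            copied += 1
        elif page.state == "removable":
            nv_cache.write_protect(page.lpn)  # cached copy now serves as the original
            page.state = "invalid"            # safe to erase without copy-out
        # pages already "invalid" are simply dropped by the erase
    flash.erase(victim_block)
    return copied

The returned count corresponds to the copy-out overhead that Figure 6 measures.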

We compare our scheme with NVM-basic, which uses the same NVM cache architecture, thus providing durability against power failures, but does not perform cooperative data management. Figure 6 shows the number of pages copied out during garbage collection as a function of the cache size. As shown in the figure, the proposed scheme significantly reduces the copy-out overhead. Specifically, the improvement becomes larger as the cache size increases. This is because a large cache allows for aggressive invalidation of copies in flash. The average reduction of copied pages is 48.3% and 54.4% for JEDEC and OLTP, respectively, compared to the original system. This will eventually lead to prolonging the lifetime of flash memory.

Figure 6: Number of pages copied out for NVM-basic and CDM as the cache size (MB) is varied: (a) JEDEC, (b) OLTP.

Figure 7 shows the write amplification factor as a function of the cache size. As shown in the figure, the proposed scheme significantly reduces the write amplification factor, particularly when the cache size becomes large. The reduction in the write amplification factor is in the range of 2.1-17.6% and 4.3-38.2% for the JEDEC and OLTP workloads, respectively.

Figure 7: Write amplification factor for NVM-basic and CDM as the cache size (MB) is varied: (a) JEDEC, (b) OLTP.

Figures 8 and 9 show the average response times and their standard deviations for CDM normalized to NVM-basic. Due to the large reduction in garbage collection overhead, CDM improves the average response time by 9.7% and 20.3% and reduces the average standard deviation by 31% and 39% for JEDEC and OLTP, respectively.

Figure 8: Average response time (normalized to NVM-basic) as the cache size (MB) is varied: (a) JEDEC, (b) OLTP.

Figure 9: Standard deviation of response time (normalized to NVM-basic) as the cache size (MB) is varied: (a) JEDEC, (b) OLTP.

V. RELATED WORKS

Lu et al. observe that the layered design of the file system and FTL accelerates flash memory wear-out as it prevents file systems from exploiting the characteristics of flash storage devices [8]. To remedy this deficiency, they propose an object-based flash translation layer (OFTL) to which the file system storage management component is offloaded so that flash memory can be managed directly. Kang et al. propose a multi-streamed I/O mechanism where the host explicitly informs the storage of the lifetime of the data being transferred, such that the storage device can manage data more efficiently with respect to garbage collection [9]. Horn proposes dynamic overprovisioning methods for storage systems that use compression and deduplication of data to reduce write amplification and increase endurance and longevity [10]. Skourtis et al. propose redundant data management and separation of reads and writes to avoid unpredictable delays caused by garbage collection [11]. Jagmohan et al. propose a multi-write coding mechanism that enables a NAND flash page to be programmed more than once without a block erase, thereby relieving the write amplification caused by garbage collection [12]. Boboila and Desnoyers present a method that uses actual chip-level measurements to reverse engineer FTL details [13]. They show that performance and endurance can be estimated by reverse engineering the FTL, and this method is used to suggest FTL parameter settings such that performance and endurance can be efficiently balanced. Yang et al. present analytic modeling for evaluating write amplification in garbage collection and reveal the relationship between endurance and performance metrics [14].

VI. CONCLUSION

This paper presented a new data management scheme for flash storage when NVM is adopted as the cache. The proposed scheme cooperatively manages data in flash and the cache in order to efficiently perform garbage collection. Specifically, we allow victim blocks to be erased without copying out their valid data if the data are in the cache. Experimental results show that the proposed scheme reduces the copy-out overhead of garbage collection by 51.4% on average.
This results in reduced write amplification, which in turn results in reduced response time and reduced variance of response time.

VII. ACKNOWLEDGEMENT

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (No. 2014R1A1A3053505 and No. 2015R1A2A2A05027651) and by the IT R&D program MKE/KEIT (No. 10041608). Hyokyung Bahn is the corresponding author of this paper.

REFERENCES

[1] J. C. Mogul, E. Argollo, M. Shah, and P. Faraboschi, "Operating system support for NVM+DRAM hybrid main memory," Proceedings of the USENIX Workshop on Hot Topics in Operating Systems (HotOS), 2009.
[2] M. K. Qureshi, V. Srinivasan, and J. A. Rivers, "Scalable high performance main memory system using phase-change memory technology," Proceedings of the 36th International Symposium on Computer Architecture (ISCA), 2009.
[3] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger, "Phase change memory architecture and the quest for scalability," Communications of the ACM, 53(7), 2010.
[4] S. Lee, H. Bahn, and S. H. Noh, "CLOCK-DWF: a write-history-aware page replacement algorithm for hybrid PCM and DRAM memory architectures," IEEE Transactions on Computers, 63(9), pp. 2187-2200, 2014.
[5] https://en.wikipedia.org/wiki/3D_XPoint
[6] E. Lee, H. Bahn, and S. H. Noh, "Unioning of the buffer cache and journaling layers with non-volatile memory," Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST), pp. 73-80, 2013.
[7] J. Kim, J. M. Kim, S. H. Noh, S. L. Min, and Y. Cho, "A space-efficient flash translation layer for compact flash systems," IEEE Transactions on Consumer Electronics, 48(2), 2002.
[8] Y. Lu, J. Shu, and W. Zheng, "Extending the Lifetime of Flash-based Storage through Reducing Write Amplification from File Systems," Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST), pp. 73-80, 2013.
[9] J. Kang, J. Hyun, H. Maeng, and S. Cho, "The Multi-streamed Solid-State Drive," Proceedings of the 6th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage), 2014.
[10] R. L. Horn, "Dynamic overprovisioning for data storage systems," 2015.
[11] D. Skourtis, D. Achlioptas, N. Watkins, C. Maltzahn, and S. Brandt, "Flash on Rails: Consistent Flash Performance through Redundancy," Proceedings of the USENIX Annual Technical Conference (ATC), 2014.
[12] A. Jagmohan, M. Franceschini, and L. Lastras, "Write Amplification Reduction in NAND Flash through Multi-Write Coding," Proceedings of the 26th IEEE Symposium on Mass Storage Systems and Technologies (MSST), 2010.
[13] S. Boboila and P. Desnoyers, "Write Endurance in Flash Drives: Measurements and Analysis," Proceedings of the 8th USENIX Conference on File and Storage Technologies (FAST), 2010.
[14] Y. Yang and J. Zhu, "Algebraic modeling of write amplification in hotness-aware SSD," Proceedings of the 8th ACM International Systems and Storage Conference (SYSTOR), 2015.
[15] P. Desnoyers, "Analytic modeling of SSD write performance," Proceedings of the 5th ACM International Systems and Storage Conference (SYSTOR), 2012.
[16] M. Yang, Y. Chang, C. Tsao, and P. Huang, "New ERA: new efficient reliability-aware for endurance enhancement of flash storage devices," Proceedings of the 50th Annual Design Automation Conference (DAC), 2013.
[17] D. Apalkov, A. Khvalkovskiy, S. Watts, V. Nikitin, X. Tang, D. Lottis, K. Moon, X. Luo, E. Chen, A. Ong, A. Driskill-Smith, and M. Krounbi, "Spin-transfer torque magnetic random access memory (STT-MRAM)," ACM Journal on Emerging Technologies in Computing Systems, 9(2), 2013.
[18] O. Zilberberg, S. Weiss, and S. Toledo, "Phase-change memory: An architectural perspective," ACM Computing Surveys, 45(3), 2013.
[19] Y. Li and K. N. Quader, "NAND Flash memory: challenges and opportunities," Computer, pp. 23-29, 2013.
[20] N. Agrawal, V. Prabhakaran, T. Wobber, J. Davis, M. Manasse, and R. Panigrahy, "Design tradeoffs for SSD performance," Proceedings of the USENIX Annual Technical Conference (ATC), pp. 57-70, 2008.
[21] JEDEC, Master trace for 128 GB SSD, http://www.jedec.org/standards-documents/docs/jesd219a_mt.
[22] UMASS trace repository, http://traces.cs.umass.edu.
[23] J. Condit, E. B. Nightingale, C. Frost, E. Ipek, B. Lee, D. Burger, and D. Coetzee, "Better I/O through byte-addressable, persistent memory," Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), pp. 133-146, 2009.
[24] S. R. Dulloor, S. Kumar, A. Keshavamurthy, P. Lantz, D. Reddy, R. Sankaran, and J. Jackson, "System software for persistent memory," Proceedings of the 9th European Conference on Computer Systems (EuroSys), 2014.
[25] F. Shu, "Data set management commands proposal for ATA8-ACS2," T13 Technical Committee, AT Attachment, e07154r1, 2007.
[26] A. Huffman, "NVM Express: Going Mainstream and What's Next," Developers Forum, 2014.