UNIVERSITY OF CALIFORNIA, SAN DIEGO

Model and Analysis of Trim Commands in Solid State Drives

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy

in

Electrical Engineering (Intelligent Systems, Robotics, and Control)

by

Tasha Christine Frankie

Committee in charge:

Ken Kreutz-Delgado, Chair
Pamela Cosman
Philip Gill
Gordon Hughes
Gert Lanckriet
Jan Talbot

2012

The Dissertation of Tasha Christine Frankie is approved, and it is acceptable in quality and form for publication on microfilm and electronically:

Chair

University of California, San Diego

2012

TABLE OF CONTENTS

Signature Page ...... iii

Table of Contents ...... iv

List of Figures ...... vi

List of Tables ...... x

Acknowledgements ...... xi

Vita ...... xii

Abstract of the Dissertation ...... xiv

Chapter 1 Introduction ...... 1

Chapter 2  NAND Flash SSD Primer ...... 5
    2.1  SSD Layout ...... 5
    2.2  Greedy Garbage Collection ...... 6
    2.3  Write Amplification ...... 8
    2.4  Overprovisioning ...... 10
    2.5  Trim Command ...... 11
    2.6  Uniform Random Workloads ...... 11
    2.7  Object Based Storage ...... 13

Chapter 3  Trim Model ...... 15
    3.1  Workload as a Markov Birth-Death Chain ...... 15
        3.1.1  Steady State Solution ...... 17
    3.2  Interpretation of Steady State Occupation Probabilities ...... 18
    3.3  Region of Convergence of Taylor Series Expansion ...... 22
    3.4  Higher Order Moments ...... 23
    3.5  Effective Overprovisioning ...... 24
    3.6  Simulation Results ...... 26

Chapter 4  Application to Write Amplification ...... 35
    4.1  Write Amplification Under the Standard Uniform Random Workload ...... 35
    4.2  Write Amplification Analysis of Trim-Modified Uniform Random Workload ...... 37
    4.3  Alternate Models for Write Amplification Under the Trim-Modified Uniform Random Workload ...... 41
    4.4  Simulation Results and Discussion ...... 41

Chapter 5  Workload Variation: Hot and Cold Data ...... 47
    5.1  Mixed Hot and Cold Data ...... 48
    5.2  Separated Hot and Cold Data ...... 50
    5.3  Simulation Results ...... 51
    5.4  N Temperatures of Data ...... 54

Chapter 6  Workload Variation: Variable Sized Objects ...... 56
    6.1  From LBAs to Objects ...... 57
    6.2  Fixed Sized Objects ...... 58
    6.3  Variable Sized Objects ...... 59
    6.4  Simulation Results ...... 62
    6.5  Application to Write Amplification ...... 67

Chapter 7 Conclusion ...... 72

Appendix A  Asymptotic Expansion Analysis ...... 77
    A.1  Useful Theorems and Facts ...... 77
    A.2  Asymptotic Expansion Analysis ...... 80
    A.3  An Accurate Gaussian Approximation ...... 81
    A.4  A Higher Order PDF Approximation ...... 84
        A.4.1  Approximating the Mean ...... 92
        A.4.2  Approximating the Variance ...... 93
        A.4.3  Approximating the Skew ...... 95
        A.4.4  Approximating the Kurtosis ...... 96
    A.5  Accuracy of the Truncation PDFs ...... 96
    A.6  The K-L Divergence to and from the Normal Distribution ...... 97
        A.6.1  Kullback-Leibler Divergence ...... 99
        A.6.2  Application to P_X ...... 100

References ...... 104

LIST OF FIGURES

Figure 2.1: Schematic of greedy garbage collection. Data is written to the current writing block, which is the first block in the free block pool. When garbage collection is triggered, the block with fewest valid pages is selected from the occupied block queue. After its valid pages are copied to the current writing block, the selected block is erased and appended to the end of the free block pool...... 7

Figure 3.1: Steady state distribution of the number of In-Use LBAs for the Trim-modified uniform random workload under various amounts of Trim, with u = 1000. Note that a higher percentage of Trim requests in the workload results in a lower mean but a higher variance. ...... 19

Figure 3.2: Steady state distribution of the effective spare factor for the Trim-modified uniform random workload with 25% of requests Trim over 1000 user-allowed LBAs, and various amounts of manufacturer specified spare factor. Note the improvement in the effective spare factor over the manufacturer specified spare factor has a larger variance for smaller manufacturer specified spare factor. ...... 25

Figure 3.3: Comparison of analytic and simulation PDF, for 40% Trim. The analytic values closely match the simulation results. ...... 26

Figure 3.4: A histogram of one run of simulation results for u = 1000 and q = 0.4, along with plots of the exact analytic PDF (Equation 3.1) in red, the Stirling approximation of the PDF (Equation 3.2) in black dots, and the Gaussian approximation of the PDF (Equation 3.3) in green. The line for the exact analytic PDF is not even visible because the line for the Gaussian approximation of the PDF covers it completely. The Stirling approximation of the PDF (black dots) lines up well with the Gaussian approximation and the (covered) analytic PDF. Even at this low value of u = 1000, the theoretical approximations are effectively identical to the exact analytic PDF. ...... 27

Figure 3.5: A histogram of one run of simulation results for u = 25 and q = 0.3, along with plots of the exact analytic PDF (Equation 3.1) in red, the Stirling approximation of the PDF (Equation 3.2) in black, and the Gaussian approximation of the PDF (Equation 3.3) in green. A slight negative skew of the exact analytic PDF is evident when compared to the Gaussian approximation. Note that the exact analytic PDF is the best fit of the histogram. A series of 64 Monte-Carlo simulations with these values result in an average skew of mean ± one standard deviation = −0.299 ± 0.004, which agrees well with the theoretically computed approximation value of −0.306 ± R(1/σ³) = −0.306 ± R(0.029). The average simulation kurtosis is 0.064; the approximate theoretical is 0.070, both values are well within overlapping error bounds. The top graph is a plot of the numerically normalized equations as shown in the text. In the bottom graph, a half-bin shift correction has been made to the Stirling and Gaussian approximation densities as described in the text. ...... 28

Figure 3.6: Theoretical results for utilization under various amounts of Trim. When more Trim requests are in the workload, a lower percentage of LBAs are In-Use at steady state, but the variance is higher than seen in a workload with fewer Trim requests. ...... 30

Figure 3.7: Theoretical results for the effective spare factor under various amounts of Trim, with a manufacturer specified spare factor of 0.11. When more Trim requests are in the workload, a higher effective spare factor is seen, but with a higher variance than seen in a workload with fewer Trim requests. ...... 31

Figure 3.8: Mean and 3σ values of effective spare factor versus manufacturer specified spare factor. This shows that an SSD with zero manufacturer specified spare factor effectively has a spare factor of 33% when the workload contains 25% Trim requests. This level of overprovisioning without the Trim command requires 50% more physical storage than the user is allowed to access. The ±3σ values were calculated using a small Tp = 1000, so they would be visible. Typical devices have a Tp >> 1000, resulting in a negligible variance. ...... 32

Figure 3.9: Theoretical results comparing the effective spare factor to the manufacturer specified spare factor. The mean and 3σ standard deviations are shown. For this figure, Tp = 1280 was used so that the 3σ values are visible. Larger gains in the effective spare factor are seen for smaller specified spare factor. In this figure, we see that a workload of 10% Trim requests transforms an SSD with a manufacturer specified spare factor of zero into one with an effective spare factor of 0.11. To obtain this spare factor through physical overprovisioning alone would require 12.5% more physical space than the user is allowed to access! ...... 33

Figure 4.1: Equations needed to compute the empirical model-based algorithm for A_Hu (adapted from [1, 2]). w is the window size, and allows for the windowed greedy garbage collection variation. Setting w = Tp/np − r is needed for the standard greedy garbage collection discussed in this dissertation. For an explanation of the theory behind these formulas, see [1, 2]. ...... 36

Figure 4.2: Write amplification versus probability of Trim under a Trim-modified uniform random workload and greedy garbage collection, when Sf = 0.2. The plots labeled Hu, Agarwal, and Xiang all show the Trim-modified models for write amplification. A^(T)_Agarwal gives such an optimistic prediction that for large q it foresees fewer actual writes than requested writes! A^(T)_Hu gives a more realistic, but still optimistic prediction for write amplification. A^(T)_Xiang gives the best, and generally quite accurate, prediction for write amplification. ...... 42

Figure 4.3: Simulation and theoretical results for write amplification under various amounts of Trim and Sf = 0.1. The Trim-modified Xiang calculation is a very good match. The Trim-modified calculations for Hu and Agarwal are optimistic about the write amplification, with the Agarwal calculation being so optimistic as to give an impossible write amplification of less than 1 for high amounts of Trim! ...... 43

Figure 5.1: Write amplification for mixed hot and cold data. Physical spare factor is 0.2. Trim requests for cold data occur with probability 0.1; Trim requests for hot data occur with probability 0.2. Cold data is requested 10% of the time, with hot data requested the remaining 90% of the time. Notice that the theoretical values, based only on the level of effective overprovisioning, are optimistic when cold data accounts for more than 20% of the user space...... 49

Figure 5.2: Write amplification for separated hot and cold data. Physical spare factor is 0.2. Trim requests for cold data occur with probability 0.1; Trim requests for hot data occur with probability 0.2. Cold data is requested 10% of the time, with hot data requested the remaining 90% of the time. After allocating the minimum number of blocks needed to store all of the user-allowed data for each temperature of data, the remaining blocks were split with the fraction shown on the x-axis. In this case, the optimal split was 50-50; further investigation of the optimization problem is needed to determine whether this is always the case. ...... 52

Figure 5.3: Write amplification for mixed hot and cold data compared to separated hot and cold data. Physical spare factor is 0.2. Trim requests for cold data occur with probability 0.1; Trim requests for hot data occur with probability 0.2. Cold data is requested 10% of the time, with hot data requested the remaining 90% of the time. For the separated hot and cold data, extra blocks were allocated evenly between the two data types. The theoretical values for separated data are a good prediction of the simulation values. Note that the separated hot and cold data has superior write amplification to the mixed hot and cold data when cold data is more than 20% of the workload. ...... 53

Figure 6.1: Comparison of analytic and simulation PDF for the number of valid pages when objects are a fixed size of 32 pages. For this example, u = 1000 and q = 0.2. The mean is shifted, and the variance is larger than if objects are a fixed size of 1 page. ...... 64

Figure 6.2: Comparison of analytic and simulation PDF for the number of valid pages when the size of objects is drawn from a discrete uniform distribution over [1, 32]. For this example, u = 1000 and q = 0.2. ...... 67

Figure 6.3: Comparison of analytic and simulation PDF for the number of valid pages when the size of objects is drawn from a binomial distribution with parameters p = 0.4 and n = 32. For this example, u = 1000 and q = 0.2. ...... 68

Figure 6.4: Write amplification versus pages per block for constant effective overprovisioning of Seff = 0.29. The modified Xiang model (Equation 6.3) fails when the number of pages per block is low. The Trim-modified Hu model provides an optimistic, but reasonable, prediction. ...... 71

LIST OF TABLES

Table 3.1: List of common variables and their descriptions, as defined in this dissertation. ...... 16

Table 3.2: Mean and standard deviation (σ) for Number of In-Use LBAs computed over 128 Monte-Carlo simulations. Values in this table were computed with u = 1000. ...... 29

Table 4.1: Analytic and simulation write amplification results for various values of probability of Trim, q, and two choices of manufacturer specified spare factor. In all cases, u/np = 1024, and np = 256. Five Monte-Carlo simulations were run for each simulation value in the table, with each simulation having identical results, measured to three significant digits, for write amplification. A plot for the case Sf = 0.2 (equivalently ρ = 0.25) is shown in Figure 4.2...... 44

Table 6.1: Mean and σ of Number of In-Use Objects and Number of Valid Pages for Fixed Sized Objects. Values in this table were computed with u = 1000. The size of the objects was fixed at 32. Simulation values are the average of 64 Monte-Carlo simulations. ...... 63

Table 6.2: Mean and σ of Number of In-Use Objects and Number of Valid Pages for Variable Sized Objects. Values in this table were computed with u = 1000. The size of the objects was drawn from a discrete uniform distribution over [1, 32]. Simulation values are the average of 64 Monte-Carlo simulations. ...... 65

Table 6.3: Mean and σ of Number of In-Use Objects and Number of Valid Pages for Variable Sized Objects. Values in this table were computed with u = 1000. The size of the objects was drawn from a binomial distribution with parameters p = 0.4 and n = 32. Simulation values are the average of 64 Monte-Carlo simulations. ...... 66

Table 6.4: Write amplification for varying number of pages per block, equivalent to fixed sized objects that evenly divide np. The level of effective overprovisioning was kept constant at Seff = 0.29 for each simulation. The amount of Trim was q = 0.1. All models referenced are Trim-modified; the Xiang model was additionally modified to account for np, as shown in Equation 6.3. ...... 70

ACKNOWLEDGEMENTS

Chapters 1, 2, 3, and 7, in part, are a reprint of the material as it appears in “A Mathematical Model of the Trim Command in NAND-Flash SSDs,” T. Frankie, G. Hughes, K. Kreutz-Delgado, in Proceedings of the 50th Annual Southeast Regional Conference. ACM, 2012. The dissertation author was the primary investigator and author of this paper.

Chapters 1, 2, 4, and 7, in part, are a reprint of the material as it appears in “The Effect of Trim Requests on Write Amplification in Solid State Drives,” T. Frankie, G. Hughes, K. Kreutz-Delgado, in Proceedings of the IASTED-CIIT Conference. IASTED, 2012. The dissertation author was the primary investigator and author of this paper.

Chapters 1, 2, 3, 4, 5, and 7, in part, have been submitted for publication of the material as it may appear in “Analysis of Trim Commands on Overprovisioning and Write Amplification in Solid State Drives,” T. Frankie, G. Hughes, K. Kreutz-Delgado, in ACM Transactions on Storage (TOS), 2012. The dissertation author was the primary investigator and author of this paper.

Chapters 1, 2, 6, and 7, in part, have been submitted for publication of the material as it may appear in “Solid State Disk Object-Based Storage with Trim Commands,” T. Frankie, G. Hughes, K. Kreutz-Delgado, in IEEE Transactions on Computers, 2012. The dissertation author was the primary investigator and author of this paper.

VITA

2003 B. S. in Electrical and Computer Engineering, California Institute of Technology

2006 M. S. in Electrical Engineering (Intelligent Systems, Robotics, and Control), University of California, San Diego

2012 Ph. D. in Electrical Engineering (Intelligent Systems, Robotics, and Control), University of California, San Diego

PUBLICATIONS

T. Frankie, G. Hughes, K. Kreutz-Delgado, “Solid State Disk Object-Based Storage with Trim Commands”, IEEE Transactions on Computers, 2012 (in submission).

T. Frankie, G. Hughes, K. Kreutz-Delgado, “Analysis of Trim Commands on Over- provisioning and Write Amplification in Solid State Drives”, ACM Transactions on Storage (TOS), 2012 (in submission).

T. Frankie, G. Hughes, K. Kreutz-Delgado, “The Effect of Trim Requests on Write Amplification in Solid State Drives”, Proceedings of the IASTED-CIIT Conference. IASTED, 2012.

T. Frankie, G. Hughes, K. Kreutz-Delgado, “A Mathematical Model of the Trim Command in NAND-Flash SSDs”, Proceedings of the 50th Annual Southeast Re- gional Conference. ACM, 2012.

T. Vanesian, K. Kreutz-Delgado, G. Burgin, D. Fogel, “Algorithmic Tools for Ad- versarial Games: Intent Analysis Using Evolutionary Computation”, Proceedings of the IEEE Symposium on Computational Intelligence in Security and Defense Applications, 2007.

OTHER WORKSHOP AND CONFERENCE PRESENTATIONS

T. Frankie, G. Hughes, and K. Kreutz-Delgado, “A Statistical Model of Overpro- visioning”, Non-Volatile Memories Workshop, UCSD, March 2012.

T. Frankie, G. Hughes, and K. Kreutz-Delgado, “Improved Effective Overprovi- sioning Using SSD Trim Commands”, CMRR Research Review, UCSD, October 2011.

T. Frankie, G. Hughes, and K. Kreutz-Delgado, “SSD Trim Commands Considerably Improve Overprovisioning”, Summit, San Jose, August 2011.

T. Frankie, G. Hughes, and K. Kreutz-Delgado, “Impact of the Trim Command on Write Amplification”, Non-Volatile Memories Workshop, UCSD, March 2011. Also presented at the Jacobs School of Engineering Research Expo, UCSD, April 2011.

ABSTRACT OF THE DISSERTATION

Model and Analysis of Trim Commands in Solid State Drives

by

Tasha Christine Frankie

Doctor of Philosophy in Electrical Engineering (Intelligent Systems, Robotics, and Control)

University of California, San Diego, 2012

Ken Kreutz-Delgado, Chair

NAND flash solid state drives (SSDs) have recently become popular storage alternatives to traditional magnetic hard disk drives (HDDs), due partly to their superior performance in write speed. However, SSDs suffer from a decrease in write speed as they fill with data, due largely to write amplification, a phenomenon in which more writes than requested are performed by the device. Use of the Trim command is known to result in improvement in NAND flash SSD performance; the Trim command informs the SSD flash controller which data records are no longer necessary, and need not be copied during garbage collection.

In this dissertation, we analytically model the amount of improvement provided by the Trim command for uniform random workloads in terms of effective

overprovisioning, a measure of the device utilization. We show the effective spare factor is Gaussian with mean and variance dependent on the percentage of Trim requests in the workload and the manufacturer-provided physical overprovisioning, and the variance is also dependent on the total storage capacity of the SSD. We then utilize this information to compute the expected write amplification, and to verify our results by simulation. Our theoretical (formula-based) prediction suggests, and our simulations verify, that a considerable write amplification reduction is found as the proportion of Trim requests to Write requests increases. In particular, write amplification drops by almost half over the no-Trim case if just 10% of the requests are Trim instead of Write.

We extend our models of effective overprovisioning and write amplification to allow for variation of the workload. We explore data of varying sizes as well as data with varying frequency of access. These extensions allow the flexibility needed to more closely model real-world workloads.

Models predicting the write amplification for workloads including Trim requests can be used in real-world situations ranging from helping cloud storage data center operators manage resources to reduce costs without sacrificing performance to optimizing the performance of apps on mobile devices.

Chapter 1

Introduction

Portable devices such as smartphones and tablets are increasing in popularity, feeding the demand for storage that meets the unique requirements of small devices with a limited battery power supply. Additionally, consumers expect instantaneous responses from their electronic devices, making the speed of the storage device a key element as well. NAND flash based solid state drives (SSDs) meet these storage needs with their small form factor, low power consumption, and fast speed compared to magnetic hard disk drives (HDDs) [3]. NAND flash SSDs are thus frequently found in consumer mobile devices. Although the advertised sequential write speed of NAND flash exceeds the speeds of currently available wireless networks, Kim, et al. found that in practice, the speed of the storage on a mobile device has a significant impact on the performance of popular apps, regardless of the speed of the network [4]. It is clear that manufacturers of faster SSDs have a competitive advantage, especially when the cost per unit of capacity is comparable across products.

SSDs are not exclusively used in consumer applications; they have become a ubiquitous storage solution in enterprise applications as well. One example of an enterprise application that utilizes SSDs is cloud storage, which has become an attractive option for storing large amounts of data off-site. When choosing a cloud storage provider, in addition to examining the cost per unit of storage, customers should also take into account performance aspects such as latency [5]. Use of low-latency storage like SSDs helps to reduce the storage bottleneck and

improve performance, making data storage and retrieval speeds more appealing to users [6].

Despite their popularity, SSDs have some drawbacks, notably a performance slowdown due to write amplification when the device is filled with data, and a limited lifetime due to wearout of the floating-gate transistors, which limits the number of program/erase cycles before device failure. Additionally, SSDs currently need to conform with legacy interfaces from HDDs, which makes performance optimization of SSDs more challenging. Even though there is a performance increase when moving from magnetic storage to flash SSDs, it is evident that further improvement of the performance characteristics of SSDs can help to enhance the end-user experience.

One recent change to the legacy HDD interface, the Trim command, helps increase the performance of SSDs. Although the Trim command is known to improve performance, models explaining why there is an improvement and quantifying the amount of the improvement have not yet been proposed in the scientific literature. The Trim analysis models presented in this dissertation are original first investigations necessary to advance the important design decisions concerning SSD architecture.

The models we present in this dissertation apply directly to the Trim-modified uniform random workload, described in Section 2.6, as well as to variations introduced to allow flexibility in that workload, including frequently and infrequently written (“hot and cold”) data, and data of varying sizes. Testing the applicability of the models on real-world workloads is desirable, but I/O traces¹ from SSDs, especially those containing the Trim command, are not easily available [7]. However, our modeled workload and variations are good approximations to the real-world HDD workloads analyzed by Wang, et al. [7], giving us confidence in the applicability of our Trim analysis models to real world systems.

A good model of the SSD behavior in response to the Trim command is important because it allows system designers to assess how changes in the amount of Trim in the workload alter the system’s performance. This helps to identify key points in the overall SSD architecture that can affect performance, and also leads to an improved understanding of the benefits and drawbacks to proposed computer architecture changes. Models offer insights into systems, and pave the way for the conception of new ideas and algorithms, which can lead to improved SSD architecture and future SSD interfaces.

In this dissertation, we model the performance improvement when the Trim command is implemented in SSDs, and offer an explanation for why the improvement is so sizeable for even a small amount of Trim added to the workload. In Chapter 2, we give the necessary background information about NAND flash SSDs for the reader to understand the subsequent theory. We then describe the impact on device utilization, measured as effective overprovisioning, under a Trim-modified uniform random workload in Chapter 3, and the resulting decrease in write amplification in terms of the proportion of Trim requests in the workload in Chapter 4. Finally, in Chapters 5 and 6, we examine variations to the Trim-modified uniform random workload that allow closer modeling of real-world workloads before we conclude in Chapter 7.

¹I/O traces record, along with other information, logical block level input and output commands from the file system to the storage device. I/O traces are often used as a reproducible way to benchmark the performance of a system.

Chapter Acknowledgements

Chapter 1, in part, is a reprint of the material as it appears in “A Mathematical Model of the Trim Command in NAND-Flash SSDs,” T. Frankie, G. Hughes, K. Kreutz-Delgado, in Proceedings of the 50th Annual Southeast Regional Conference. ACM, 2012. The dissertation author was the primary investigator and author of this paper. “The Effect of Trim Requests on Write Amplification in Solid State Drives,” T. Frankie, G. Hughes, K. Kreutz-Delgado, in Proceedings of the IASTED-CIIT Conference. IASTED, 2012. The dissertation author was the primary investigator and author of this paper.

Chapter 1, in part, has been submitted for publication of the material as it may appear in “Analysis of Trim Commands on Overprovisioning and Write Amplification in Solid State Drives,” T. Frankie, G. Hughes, K. Kreutz-Delgado, in ACM Transactions on Storage (TOS), 2012. The dissertation author was the primary investigator and author of this paper. “Solid State Disk Object-Based Storage with Trim Commands,” T. Frankie, G. Hughes, K. Kreutz-Delgado, in IEEE Transactions on Computers, 2012. The dissertation author was the primary investigator and author of this paper.

Chapter 2

NAND Flash SSD Primer

In this chapter, we provide background about NAND flash solid state drives (SSDs) that is necessary to understand the models presented in this dissertation.

2.1 SSD Layout

The smallest write unit on a NAND flash chip is a page, which stores a fixed amount of data, often 2 KB or 4 KB. Pages are grouped into physical blocks, which are the smallest erase unit on a flash device¹. The number of pages per block varies by device, but is often 32, 64, 128, or 256 pages [8, 9]. Flash chips contain multiple blocks to reach the desired capacity of the device, and a single SSD may contain multiple flash chips. We denote the total number of blocks on the device by t, and the number of pages per block by np.

Unlike magnetic media, data on NAND flash SSDs cannot simply be overwritten. Writing, or programming, of NAND flash can only change bits from 1 to 0 – to change a bit back to 1 again, an erase cycle must occur. Writing can affect a single page, but erasure happens at the block level. Since all pages in a block must be erased at the same time, write-in-place schemes are impractical. For this reason, a flash translation layer (FTL) dynamically maps logical block addresses (LBAs) to physical locations. The FTL also performs garbage collection. Garbage collection is the process of choosing a used physical block for reclaiming, copying its valid pages to an unused block, then erasing the old block so that it is ready for writing again. The extra writes performed when copying valid pages in order to avoid data loss contribute to a phenomenon called write amplification, and cause a reduction in write speed and the consumption of more of the finite number of writes in the lifetime of each block.

There are many different algorithms used for FTLs, with varying performance for access speed [10, 11]; a survey and taxonomy of these is found in [12, 10]. These algorithms are often quite complex and beyond current analytic mathematical models. The choice of FTL is beyond the scope of this dissertation, but it is important to note that when overwriting a logical block, good FTLs will place the new data in a different physical location, and invalidate the pages where the data was previously stored. In this research, we utilize a log-structured style of data placement [13] and the greedy garbage collection policy, described in Section 2.2, which has been previously analyzed by Hu, et al. [1, 2], Agarwal and Marrow [14], and Xiang and Kurkoski [15] to determine the amount of write amplification in workloads without Trim.

¹Note that physical blocks are different from logical blocks, and the type referred to when the word “block” is used needs to be determined from the context. In this dissertation, we attempt to specify logical or physical block if the context is potentially unclear.

2.2 Greedy Garbage Collection

At some point in the process of writing data, most of the pages will be filled with either valid or invalidated data, and more space is needed for writing fresh data. Garbage collection occurs in order to reclaim additional blocks. In this process, a victim block is chosen for reclamation, its valid pages are copied elsewhere to avoid data loss, then the entire block is erased and the FTL map pointer is changed. Many algorithms exist for choosing the victim block, most of which are based on either selecting the block that creates the most free space or one that also takes into account the age of the data on the block, as originally described in the seminal paper by Rosenblum and Ousterhout [16]. A discussion of proposals for garbage collection can be found in [16, 17, 18].

[Diagram for Figure 2.1: greedy garbage collection, after Hu, et al. [1, 2]; see the caption below.]

Figure 2.1: Schematic of greedy garbage collection. Data is written to the current writing block, which is the first block in the free block pool. When garbage collection is triggered, the block with fewest valid pages is selected from the occupied block queue. After its valid pages are copied to the current writing block, the selected block is erased and appended to the end of the free block pool.

A log-structured algorithm is used to map logical locations to physical locations [1, 2]. At any given time, one physical block is designated as the current writing block. Incoming LBA Write and Trim requests are processed in order, and data is written to the next available physical page on the current writing block. This continues until the block is full, at which point a new writing block is chosen from the beginning of the free block pool. When the number of physical blocks in the free block pool reaches a predetermined reserve², r, a used block must be chosen for garbage collection. Under greedy garbage collection, the emptiest block (containing the fewest valid pages) is chosen from the occupied block queue. If there is a tie for the emptiest block, the oldest block is chosen, where the age of a block is determined by when it was most recently filled with data. Once a block is chosen for reclamation by garbage collection, all valid pages in the block are copied to the new writing block, then the reclaimed block is erased and appended to the end of the free block pool. See Algorithm 1 and Figure 2.1 for visual descriptions of this process.

2.3 Write Amplification

When a block is chosen for erasure during garbage collection, any valid pages on the block first need to be copied elsewhere to avoid data loss. Copying these valid pages leads to write amplification, a phenomenon in which the device performs more writes than are requested of it. Write amplification, A, is a way of measuring the efficiency of garbage collection, and is defined as the average ratio of physical page writes to user page write requests [1]. Most FTLs and garbage collection algorithms attempt to reduce write amplification and perform wear leveling in order to keep the number of program/erase cycles uniform over all blocks. Greedy garbage collection has been proven optimal for random workloads in terms of write amplification reduction [2]. Write amplification under a uniform random workload (without Trim commands) in SSDs has been studied by Hu, et al. [1, 2], Agarwal and Marrow [14], and Xiang and

²In practice, this predetermined reserve is very small compared to the total number of blocks, and is often ignored in analysis.

Input: LBA requests of types Write and Trim.
Output: Data placement and garbage collection procedures.

while there are LBA requests do
    if request is Write then
        write new data to first free page in current-block;
        if LBA written was previously In-Use then
            invalidate appropriate page in FTL mapping;
        end
        if size(free pages in current-block) = 0 then
            append current-block to end of occupied-block-queue;
            current-block ← first block in reserve-queue;
            if size(reserve-queue) < r then
                // garbage collection:
                victim-block ← block with fewest valid pages from occupied-block-queue;
                copy valid pages from victim-block to current-block;
                erase victim-block and append to reserve-queue;
            end
        end
    else
        // request was Trim
        invalidate appropriate page in FTL mapping;
    end
end

Algorithm 1: Data Placement and Greedy Garbage Collection

Kurkoski [15]; their theory is discussed in Chapter 4. Reducing write amplification improves performance by increasing the average write speed, since fewer extra writes are performed by the device. Additionally, by reducing the number of program/erase cycles, SSD lifetime is extended, since flash devices wear out and lose the ability to store data after being overwritten many times. Manufacturers use several techniques to reduce write amplification in an effort to mitigate the negative performance impact. Two of these techniques are physically overprovisioning the device with extra storage and implementing the Trim command [19].
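To make the relationship between garbage collection and write amplification concrete, the following short Python sketch implements the data placement and greedy garbage collection procedure of Algorithm 1 for a uniform random workload and reports the measured write amplification. It is an illustration only, not the simulator used for the results in this dissertation; the device parameters (u, np, t, r) and request count are arbitrary example values.

```python
# Illustrative sketch of Algorithm 1 (log-structured placement + greedy garbage
# collection); parameters are example values, not those used in the dissertation.
import random

def write_amplification(u=2048, n_p=64, t=40, r=2, n_requests=500_000, q=0.0, seed=1):
    """Return (physical page writes) / (requested page writes) for a uniform
    random workload with Trim probability q (q = 0 gives the no-Trim workload)."""
    rng = random.Random(seed)
    free_blocks = list(range(t))       # free block pool (FIFO)
    occupied = []                      # occupied block queue, oldest first
    valid = [0] * t                    # valid pages per block
    loc = {}                           # In-Use LBA -> (block, slot)
    owner = {}                         # (block, slot) -> LBA, for GC copying
    current, slot = free_blocks.pop(0), 0
    user_writes = physical_writes = 0

    def place(lba):
        nonlocal current, slot, physical_writes
        physical_writes += 1
        loc[lba] = (current, slot)
        owner[(current, slot)] = lba
        valid[current] += 1
        slot += 1
        if slot == n_p:                          # current writing block is full
            occupied.append(current)
            current, slot = free_blocks.pop(0), 0
            if len(free_blocks) < r:             # greedy garbage collection
                victim = min(occupied, key=lambda b: valid[b])
                occupied.remove(victim)
                for s in range(n_p):
                    moved = owner.pop((victim, s), None)
                    if moved is not None:        # still-valid page: copy it
                        place(moved)
                valid[victim] = 0
                free_blocks.append(victim)       # erased block rejoins the pool

    def invalidate(lba):
        old = loc.pop(lba, None)
        if old is not None:
            valid[old[0]] -= 1
            owner.pop(old, None)

    for _ in range(n_requests):
        if rng.random() < q and loc:             # Trim a uniformly chosen In-Use LBA
            invalidate(rng.choice(list(loc)))
        else:                                    # Write a uniformly chosen LBA
            lba = rng.randrange(u)
            invalidate(lba)                      # old copy (if any) becomes invalid
            user_writes += 1
            place(lba)
    return physical_writes / user_writes

print(write_amplification())   # example device with spare factor 0.2 and no Trim
```

The sketch includes the initial fill of an empty device in the measured ratio, which slightly lowers the reported value relative to a steady-state measurement.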

2.4 Overprovisioning

Overprovisioning of an SSD is the placing of more physical storage capacity on the device than the user is allowed to fill [1]. The measure of overprovisioning that we use is the spare factor

Sf = (Tp − u)/Tp

which is calculated using the total number of pages on the device, Tp, and the number of pages the user is allowed to fill, u. The spare factor is a convenient measure when making comparisons because it has a finite range between 0 and 1. Smaller spare factors indicate that the user has access to more of the storage space on the device, which results in a larger write amplification and poorer performance. Larger spare factors correspond to better performance but a smaller user-allowed proportion of the storage. Overprovisioning reduces write amplification by increasing the amount of time between garbage collections of the same block, which in turn increases the probability that more pages in the block are invalid at the time of garbage collection, and thus need not be copied. The mathematical relationship between overprovisioning and write amplification in uniform random workloads without Trim has been studied by Hu, et al. [1, 2], Agarwal and Marrow [14], and Xiang and Kurkoski [15], and is discussed in Chapter 4.

2.5 Trim Command

The Trim command also improves performance by reducing write amplification. Defined in the working draft of the ATA/ATAPI Command Set - 2 (ACS-2) [19] and supported by recent operating systems [19, 20] and most newer SSDs, the Trim command declares logical blocks inactive, and marks the corresponding physical pages as invalid in the FTL mapping, meaning they do not need to be copied before the blocks containing them are erased.

The performance increase seen with the Trim command is often measured by first testing the read and write speeds of both sequential and random data without enabling support for Trim, so that no Trim requests are sent to the SSD. Then, Trim support is enabled and the test is repeated. The results of this type of test vary based on the specific, and often proprietary, features of the FTL implemented by the SSD tested, and do not reveal information about the effective level of overprovisioning due to the Trim command.

We show the degree to which an SSD that supports Trim requires less physical overprovisioning to achieve the same write amplification performance observed with a non-Trim SSD.

2.6 Uniform Random Workloads

There exists a spectrum of data storage workloads seen in the real world, from the single user of a personal computer to multiple users of cloud computing, database transactions, sequential serial access, and random page access. The statistics of the workload depend on the application, and can have a large effect on the performance of a storage device [21]. On one end of the workload spectrum is a completely sequential workload; on the other end is a random workload. Most real-world workloads are somewhere between these two ends. Purely sequential workloads do not result in write amplification, since the sequential nature of the requests means that when garbage collection occurs, invalid pages are concentrated in complete blocks, and no extra writes are required to copy valid pages during the reclamation process [1]. Random workloads do cause write amplification though, because invalid pages are distributed across many blocks when garbage collection occurs. This means pages with valid data need to be copied from a block that is garbage collected, resulting in more actual writes than were requested. The challenging nature of random workloads makes them interesting to study.

Previous analysis of write amplification assumes a standard uniform random workload of single-page Write requests under greedy garbage collection. The input workload consists only of Write requests; it does not contain read requests since they do not affect write amplification or significantly affect I/O rate. Because of the assumption that one LBA Write request occupies one physical page, the number of allowed user LBAs is the same as the maximum number of pages in which the user can have data stored, u. Writes are chosen uniformly and randomly from all allowed user LBAs. When a Write request is made, there are two possibilities: 1) the request is for an LBA that has never been written, or 2) the request is for an LBA that has been written previously. In both cases, the new data is written to the next empty page in the current writing block, but in the second case, the physical page where the old data was written must also be invalidated in the FTL mapping.

We define the Trim-modified uniform random workload, which introduces Trim requests to the standard uniform random workload. The input workload consists of both Trim and Write requests, with Trim requests occurring with probability q, and Write requests occurring with probability 1 − q. Write requests are still chosen uniformly over all user LBAs, but now the two possibilities for Write requests are slightly altered: 1) the request is for an LBA that is not In-Use, or 2) the request is for an LBA that is In-Use, so the physical page where that LBA data was previously stored must be invalidated. An In-Use LBA is defined as one for which the most recent request was Write; all other LBAs are considered not In-Use. In the Trim-modified uniform random workload, Trim requests are chosen uniformly randomly over all In-Use LBAs. The standard uniform random workload (without Trim requests) eventually results in all LBAs being In-Use.
The Trim-modified uniform random workload reaches a steady state distribution in the number of In-Use LBAs, which we show in Chapter 3 to be well approximated by a Gaussian with mean and variance given respectively by

u · (1 − 2q)/(1 − q) = us    and    u · q/(1 − q) = us̄

where the substitutions s = (1 − 2q)/(1 − q) and s̄ = q/(1 − q) represent the steady-state probability of a Write requesting an In-Use LBA and the steady state probability of a Write requesting a not In-Use LBA, respectively.³

The steady state number of In-Use LBAs is used to compute the mean effective spare factor of a device. The effective spare factor is defined similarly to the manufacturer specified spare factor, but uses Xn, the number of valid pages (equivalent to the number of In-Use LBAs) at time n:

Seff = (Tp − Xn)/Tp

The mean effective spare factor

S̄eff = q/(1 − q) + ((1 − 2q)/(1 − q)) Sf = s̄ + s Sf

does not depend on the size of the device, and is calculated in terms of the manufacturer specified spare factor, Sf. The variance of the effective spare factor is dependent on the size of the SSD, and becomes negligible with increasing device size, Tp. Details of the calculations are in Chapter 3.
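As an illustrative check (not part of the original analysis), the following short Python sketch simulates the Trim-modified uniform random workload directly and compares the observed number of In-Use LBAs against the steady-state mean us and variance us̄ quoted above. The values of u, q, and the request count are arbitrary example choices.

```python
# Sketch: simulate the Trim-modified uniform random workload and compare the
# observed In-Use LBA count to the steady-state mean u*s and variance u*s_bar.
# u, q, and n_requests are example values only.
import random

u, q, n_requests = 1000, 0.25, 1_000_000
s, s_bar = (1 - 2 * q) / (1 - q), q / (1 - q)

rng = random.Random(0)
in_use = set()
samples = []
for i in range(n_requests):
    if rng.random() < q and in_use:                  # Trim a random In-Use LBA
        in_use.discard(rng.choice(tuple(in_use)))
    else:                                            # Write a random user LBA
        in_use.add(rng.randrange(u))
    if i > n_requests // 2:                          # crude warm-up cut
        samples.append(len(in_use))

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"mean {mean:.1f}  vs  u*s     = {u * s:.1f}")
print(f"var  {var:.1f}  vs  u*s_bar = {u * s_bar:.1f}")
```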

2.7 Object Based Storage

In object-based storage, the file system does not assign data to particular LBAs. Instead, the file system tells the object-based storage device to store certain data; the device chooses where to store it, and gives the file system a unique object ID with which the data can be retrieved and/or modified [22]. Object-based storage allows for variable-sized objects with identifiers that need not indicate any information about the physical location of the data [23]. Rajimwale, et al. have proposed that object-based storage is an optimal interface for flash-based SSDs [24].

We use object-based storage as motivation to remove the constraint that one LBA Write occupies one physical page. Under object-based storage, the workload is extended to allow one object Write to occupy one or more physical pages; this constraint change is applied for extensions to the theory of effective overprovisioning and write amplification, and is discussed in Chapter 6.

³Note that s = 1 − s̄.

Chapter Acknowledgements

Chapter 2, in part, is a reprint of the material as it appears in “A Mathematical Model of the Trim Command in NAND-Flash SSDs,” T. Frankie, G. Hughes, K. Kreutz-Delgado, in Proceedings of the 50th Annual Southeast Regional Conference. ACM, 2012. The dissertation author was the primary investigator and author of this paper. “The Effect of Trim Requests on Write Amplification in Solid State Drives,” T. Frankie, G. Hughes, K. Kreutz-Delgado, in Proceedings of the IASTED-CIIT Conference. IASTED, 2012. The dissertation author was the primary investigator and author of this paper.

Chapter 2, in part, has been submitted for publication of the material as it may appear in “Analysis of Trim Commands on Overprovisioning and Write Amplification in Solid State Drives,” T. Frankie, G. Hughes, K. Kreutz-Delgado, in ACM Transactions on Storage (TOS), 2012. The dissertation author was the primary investigator and author of this paper. “Solid State Disk Object-Based Storage with Trim Commands,” T. Frankie, G. Hughes, K. Kreutz-Delgado, in IEEE Transactions on Computers, 2012. The dissertation author was the primary investigator and author of this paper.

Chapter 3

Trim Model

In this chapter, a model for the effect of the Trim command on effective overprovisioning is derived. The Trim-modified uniform random workload is analyzed for Write requests that occupy one physical page. For the convenience of the reader, a list of variables and their descriptions used in the analysis of this and subsequent chapters is found in Table 3.1.

3.1 Workload as a Markov Birth-Death Chain

The Trim-modified uniform random workload can be viewed as a Markov birth-death chain in which the state, Xn, is the number of In-Use LBAs. This means that Trim requests correspond to a reduction in state by 1, and Write requests can result in either an increase of the state by 1 (when the LBA requested is not In-Use) or no change in the state (when the LBA requested is already In-Use). A general Markov birth-death chain has the following transition probabilities:

P(Xn+1 = x − 1 | Xn = x) = qx
P(Xn+1 = x     | Xn = x) = rx
P(Xn+1 = x + 1 | Xn = x) = px

with the values of qx, rx, and px depending on the current state. The unnormalized steady state occupation probabilities are calculated by [25]:

πx := Punnormalized(Xsteady = x) = (p0 p1 ··· px−1)/(q1 q2 ··· qx)   for x ≥ 1,   and   πx = 1   for x = 0

Table 3.1: List of common variables and their descriptions, as defined in this dissertation.

  Variable                                   Description
  Tp                                         Number of pages on SSD
  t = Tp/np                                  Number of blocks on SSD
  u                                          Number of pages the user is allowed to fill
  np                                         Number of pages in a block
  r                                          Minimum number of reserved blocks in reserve-queue
  Sf = (Tp − u)/Tp                           Manufacturer specified spare factor
  Seff = (Tp − Xn)/Tp                        Effective spare factor at time n
  Xn                                         Number of In-Use logical block addresses (LBAs) at time n
  q                                          Probability of Trim request
  πx                                         Unnormalized steady state probabilities
  s = (1 − 2q)/(1 − q)                       Steady state probability of a Write being for an In-Use LBA
  s̄ = q/(1 − q)                              Steady state probability of a Write being for a not In-Use LBA
  ρ = (Tp − u)/u                             Alternate measure of overprovisioning
  ρeff = S̄eff/(1 − S̄eff) = (1 + ρ)/s − 1     Alternate measure of effective overprovisioning
  ueff = us                                  Mean number of In-Use LBAs at steady state
  σ² = us̄                                    Variance of the number of In-Use LBAs at steady state
  fh, fc                                     Fraction of the data that is hot or cold
  ph, pc                                     Probability of requesting hot or cold data
  A(T)                                       Trim-modified write amplification

3.1.1 Steady State Solution

To compute the steady state occupation probabilities of the Trim-modified uniform random workload, first compute qx, rx, and px. Because the Trim rate is constant, qx = q for all x. The probability of requesting an In-Use LBA is as follows. Let W be the event that a write request occurs. Let H be the event that an In-Use LBA is chosen (or “hit”), and H^c be the complement of H, the event that an In-Use LBA is not chosen (equivalently, that a not In-Use LBA is chosen).

rx = P(H, W | Xn = x)
   = P(H | W, Xn = x) P(W | Xn = x)
   = (x/u)(1 − q)

Similarly, the probability of requesting a not In-Use LBA is

px = P(H^c, W | Xn = x)
   = P(H^c | W, Xn = x) P(W | Xn = x)
   = ((u − x)/u)(1 − q)

Plugging these values for qx, rx, and px into the unnormalized steady state distribution gives

πx = (p0 ··· px−1)/(q1 ··· qx)
   = [((u − 0)/u)(1 − q)] ··· [((u − (x − 1))/u)(1 − q)] / q^x
   = ((1 − q)/q)^x · u!/(u^x (u − x)!)

Note that when x = 0, the above equation has the value 1; we no longer need to worry about separating πx into the cases based on the value of x. We can simply state that

πx = ((1 − q)/q)^x · u!/(u^x (u − x)!),   ∀x ∈ {0, ..., u}        (3.1)

Although Equation 3.1 is an accurate representation of the unnormalized steady state occupation probabilities, it is not easy to understand the distribution without graphing it (see Figure 3.1).
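Although normalizing Equation 3.1 in closed form is difficult, it is straightforward to do so numerically. The short Python sketch below (an illustration with example values of u and q, not part of the original derivation) evaluates Equation 3.1 in log-space to avoid overflow of the factorial terms, normalizes the result, and confirms that the mean and variance are close to the asymptotic values us and us̄ discussed in Section 3.2.

```python
# Sketch: numerically normalize Equation 3.1 (computed in log-space) and check
# its mean and variance against the asymptotic values u*s and u*s_bar.
# u and q are example values only.
import math

u, q = 1000, 0.4
s, s_bar = (1 - 2 * q) / (1 - q), q / (1 - q)

log_pi = [x * math.log((1 - q) / q) + math.lgamma(u + 1)
          - x * math.log(u) - math.lgamma(u - x + 1) for x in range(u + 1)]
m = max(log_pi)
w = [math.exp(v - m) for v in log_pi]     # rescaled, still unnormalized weights
z = sum(w)
p = [wi / z for wi in w]                  # normalized steady-state PDF

mean = sum(x * px for x, px in enumerate(p))
var = sum((x - mean) ** 2 * px for x, px in enumerate(p))
print(mean, u * s)       # mean is close to u*s (about 333 for these values)
print(var, u * s_bar)    # variance is close to u*s_bar (about 667 for these values)
```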

3.2 Interpretation of Steady State Occupation Probabilities

Normalization of Equation 3.1 is difficult, so we continue to work with the unnormalized equation as we manipulate it to a more familiar form. Start by defining s = (1 − 2q)/(1 − q) and s̄ = 1 − s = q/(1 − q), noting that s = 1 − s̄.¹ Now, rewrite Equation 3.1 as

πx = s̄^(−x) · u!/(u^x (u − x)!)

This is still a challenging equation to understand, but manipulation in log-space helps:

log (πx) = −x log (s̄) + log (u!) − x log (u) − log ((u − x)!)

Because this is an unnormalized probability, terms that do not involve x can be added or subtracted, since they only affect the normalization factor. We use ∼ to denote an equivalence up to the normalization factor.

log (πx) ∼ − log ((u − x)!) − x log (us̄)

Now, apply Stirling’s approximation: log (n!) ≈ n log (n) − n, using ≈ to denote an asymptotic approximation. It is well known that Stirling’s approximation for

¹The choice for s is not immediately obvious; however, subsequent analysis and simulation of the workload shows that the mean of the asymptotic distribution is µ = us and the variance is σ² = us̄. s and s̄ turn out to be the steady state probabilities of a Write request being for an In-Use LBA or for a not In-Use LBA, respectively.

[Plot for Figure 3.1: PDF of the uniform random workload at steady state for 10%, 25%, and 40% Trims; x-axis: State (# In-Use LBAs); y-axis: Probability.]

Figure 3.1: Steady state distribution of the number of In-Use LBAs for the Trim-modified uniform random workload under various amounts of Trim, with u = 1000. Note that a higher percentage of Trim requests in the workload results in a lower mean but a higher variance.

factorials converges very quickly, and for relatively small values of n can practically be considered to provide a “ground truth” equivalent for the value of the factorial function². The application of Stirling’s approximation to this problem is reasonable. For SSDs of several gigabytes, u is on the order of hundreds of thousands, so the Stirling approximation of (u − x)! is effectively ground truth except for a few x very close in value to u.

log (πx) ≈ −(u − x) log (u − x) + (u − x) − x log (us̄)

Add another term for convenience and rearrange

log (πx) ∼ −(u − x) log (u − x) + (u − x) − x log (us̄) + u log (us̄)
         = −(u − x) log (u − x) + (u − x) + (u − x) log (us̄)
         = −(u − x) log ((u − x)/(us̄)) + (u − x)                       (3.2)

We call Equation 3.2 the Stirling PDF. Appendix A contains a rigorous proof that the Stirling PDF converges to the Gaussian distribution in the limit as u → ∞.³ At this point, a Taylor series expansion is a natural way to handle the log term, but a change of variables must be made before applying the standard log expansion

log(1 − α) = −Σ_{n=1}^∞ αⁿ/n        for −1 ≤ α < 1

Let

α = (x − us)/(us̄)

Then after some algebra and defining σ² = us̄,

(u − x) log ((u − x)/(us̄)) = σ²(1 − α) log(1 − α)

Plug back in to Equation 3.2, apply the Taylor series approximation⁴, then drop terms that do not depend on x.

log (πx) ≈ −(u − x) log ((u − x)/(us̄)) + (u − x)
         = −σ²(1 − α) log(1 − α) + σ²(1 − α)
         = σ²(1 − α) Σ_{n=1}^∞ αⁿ/n + σ²(1 − α)
         = −σ²(−α + Σ_{n=1}^∞ α^(n+1)/(n(n + 1))) + σ²(1 − α)
         ≈ −σ²(−α + α²/2) + σ²(1 − α)
         ∼ −σ²α²/2
         = −(x − us)²/(2us̄)                                            (3.3)

Exponentiate to remove the log to obtain the Taylor series first-order approximation of πx,

πx = s̄^(−x) · u!/(u^x (u − x)!) ≈ e^(−(x − us)²/(2us̄))

an unnormalized Gaussian random variable with mean us and variance us̄. The Gaussian nature of the result is gratifying but unanticipated, as the central limit theorem is not obviously applicable. The approach presented in this section is quite powerful, and makes it possible to perform a higher-order Taylor series expansion resulting in a non-Gaussian PDF that applies in the non-far asymptotic limit. A summary of the analysis of the higher-order terms is in Section 3.4; a full analysis is in Appendix A.

²Stirling’s Formula, one of several possible Stirling approximations to the factorial function, provides a highly accurate approximation to n! for values of n as small as 10-to-100. For this reason, for even moderately sized values of n, the Stirling Formula is used to derive a variety of useful probability distributions in fields such as probability theory [26], equilibrium statistical mechanics [27], and information theory [28]. Because of the remarkable accuracy of the Stirling Approximation, which very rapidly increases with the size of n, the resulting distributions can be taken to be “ground truth” distributions in many practical domains. Derivations of the Stirling Approximation, and discussions of its accuracy and convergence, can be found in pages 52-54 of [26], Appendix A.6 of [27], and pages 522-524 of [29].

³We omit the proof from the main text in order to present an unimpeded description of the research, as the particular details are lengthy.

⁴See Section 3.3 for proof that α lies in the region of convergence.
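The following short numerical sketch (illustrative only; u and q are example values) compares the numerically normalized exact PDF of Equation 3.1, the Stirling PDF of Equation 3.2, and the Gaussian approximation of Equation 3.3, in the spirit of the comparison shown later in Figure 3.4. The range is restricted to x < u so that the log(u − x) term in Equation 3.2 stays finite; the probability mass omitted near x = u is negligible for these parameter values.

```python
# Sketch: compare the exact PDF (Eq. 3.1), the Stirling PDF (Eq. 3.2), and the
# Gaussian approximation (Eq. 3.3) after numerical normalization.
import math

u, q = 1000, 0.4
s, s_bar = (1 - 2 * q) / (1 - q), q / (1 - q)

def normalize(log_w):
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    z = sum(w)
    return [x / z for x in w]

xs = range(u)   # x = u excluded: Eq. 3.2 contains log(u - x)
exact = normalize([x * math.log((1 - q) / q) + math.lgamma(u + 1)
                   - x * math.log(u) - math.lgamma(u - x + 1) for x in xs])
stirling = normalize([-(u - x) * math.log((u - x) / (u * s_bar)) + (u - x) for x in xs])
gauss = normalize([-(x - u * s) ** 2 / (2 * u * s_bar) for x in xs])

print(max(abs(a - b) for a, b in zip(exact, gauss)))      # both differences are
print(max(abs(a - b) for a, b in zip(exact, stirling)))   # small for u this large
```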

3.3 Region of Convergence of Taylor Series Expansion

The Taylor series expansion for log(1 − α) converges when −1 ≤ α < 1. This is equivalent to

u(2s − 1) ≤ x < u        (3.4)

Since x can only take values representing the state of the Markov chain, it is limited to the range [0, u]. The RHS of Equation 3.4 is shown to actually be x ≤ u. Examination of the application of the Taylor series expansion shows

σ²(1 − α) log(1 − α) = −σ²(1 − α) Σ_{n=1}^∞ αⁿ/n
                     = σ²(−α + Σ_{n=1}^∞ α^(n+1)/(n(n + 1)))        (3.5)

At α = 1, which corresponds to x = u, the LHS of Equation 3.5 is σ² · 0 · log 0 = 0. The RHS of Equation 3.5 evaluates to 0 as well, since when α = 1, the RHS sum becomes a standard telescoping series.

A closer examination of the LHS of Equation 3.4 is needed. Interestingly, u(2s − 1) has a range from −u to u, depending on the value of s (and ultimately on q, whose meaningful range is 0 ≤ q < 1/2). If u(2s − 1) ≤ 0, then the region of convergence covers all x that are of interest (that is, 0 ≤ x ≤ u). This happens when s ≤ 1/2 (equivalently q ≥ 1/3) so all 1/3 ≤ q ≤ 1/2 result in convergence for all allowed values of x. q < 1/3 has a region of convergence that is smaller than the range of x, but we show it is large enough to contain all x that are likely to occur at steady state.

Compute the value of q needed for convergence when x is γ standard deviations below the mean, thus taking the value us − γ√(us̄). Set

u(2s − 1) ≤ us − γ√(us̄)

to compute

q ≥ γ²/(u + γ²)        (3.6)

For any value of γ, as u → ∞, q ≥ 0. This means that the Taylor series converges for all values of q, so long as a sufficiently large u is chosen. Solving Equation 3.6 for u results in a determination of the size of “sufficiently large” for u when the value of γ is fixed. This results in

u ≥ γ²/s̄

The three-sigma rule states that in normal distributions, 99.7% of values lie within 3 standard deviations from the mean. Since values outside ±3σ are unlikely to be seen, it is sufficient to choose a u that causes the region of convergence to cover this range. This means that a small q = 0.01 only requires that u = 891, and a slightly larger q = 0.1 needs only u = 81 for the analysis of this dissertation to be accurate.
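The two values of u quoted above follow directly from the bound u ≥ γ²/s̄ with γ = 3; a two-line check (illustrative only) is:

```python
# Check of the bound u >= gamma^2 / s_bar for gamma = 3.
gamma = 3
for q in (0.01, 0.1):
    s_bar = q / (1 - q)
    print(q, gamma ** 2 / s_bar)   # prints 891.0 for q = 0.01 and 81.0 for q = 0.1
```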

3.4 Higher Order Moments

The analysis of Section 3.2 allows the computation of higher-order moments such as skewness and kurtosis by truncating the Taylor series expansion at later terms. Appendix A heuristically argues that all of the approximation PDFs obtained via truncation of the Taylor series for the Stirling PDF should become highly accurate in the limit u → ∞. The skewness is calculated as⁵

SkewX = −1/σ + R(1/σ³)        (3.7)

and the kurtosis is calculated as

KurtosisX = 3/(4σ²) + R(1/σ⁴)        (3.8)

For the details of the derivation of the skewness and kurtosis, see Appendix A.

⁵R(1/σ³) is defined to mean that the remaining terms are at least as small as O(1/σ³), in terms of the “big O” terminology.

The slight negative skew is toward a higher level of effective overprovisioning. This is more evident for small u with a narrow variance (low but non-zero q). Figure 3.5 shows simulation, exact analytic values, and approximate Gaussian values for very small u = 25. With this small value of u, the slight negative skew is evident. It is unlikely that such a low value of u would ever be encountered in a real device; the higher-order analysis is included here for completeness. The more detailed asymptotic analysis given in Appendix A shows that even in the limit of large u there is an irreducible, residual absolute error in the location of the mean of 0.5 which, consistent with the theory of asymptotic analysis, is asymptotically relatively negligible (lim_{u→∞} 0.5/u = 0). We correct for this known residual error in the approximation models via a shift, which results in the shown fits in Figures 3.4 and 3.5.
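The leading terms of Equations 3.7 and 3.8 can be checked against the values quoted in the caption of Figure 3.5 (u = 25, q = 0.3); a small illustrative computation is:

```python
# Leading-order skewness and kurtosis terms for the Figure 3.5 parameters.
u, q = 25, 0.3
sigma = (u * q / (1 - q)) ** 0.5     # sigma^2 = u * s_bar
print(-1 / sigma)                    # about -0.306, the skew quoted in Figure 3.5
print(3 / (4 * sigma ** 2))          # about 0.070, the kurtosis quoted in Figure 3.5
```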

3.5 Effective Overprovisioning

The effective spare factor is computed using the results of the computation of the distribution of In-Use LBAs. The definition of the spare factor given in the introduction needs to be modified slightly to compute the effective spare factor

Seff = (Tp − Xn)/Tp

where Tp is the total number of pages in the SSD, and Xn is the number of In-Use LBAs, which is equivalent to the number of valid pages, at time n. The mean effective spare factor is

S̄eff = (Tp − us)/Tp = s̄ + s Sf

If the manufacturer specified spare factor is known, the impact of the Trim command on the mean effective spare factor does not depend on the total number of pages. However, the variance of the effective spare factor is dependent on the total number of pages in addition to the manufacturer specified spare factor:

Var(Seff) = (1/Tp²) Var(Xn)

[Plot for Figure 3.2: PDF of the effective spare factor at steady state with 25% Trims, for specified spare factors of 0.00, 0.17, 0.33, and 0.50; x-axis: Effective Spare Factor; y-axis: Probability.]

Figure 3.2: Steady state distribution of the effective spare factor for the Trim- modified uniform random workload with 25% of requests Trim over 1000 user- allowed LBAs, and various amounts of manufacturer specified spare factor. Note the improvement in the effective spare factor over the manufacturer specified spare factor has a larger variance for smaller manufacturer specified spare factor.

As the size of the device grows large, the variance of the effective spare factor tends toward zero, indicating that for large enough SSDs, the penalty for ignoring the variance of the effective spare factor is negligible. Figure 3.2 shows a plot of the effective spare factor for various manufacturer specified spare factors.
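A small numerical sketch of these formulas follows; the q-to-s mapping and the relation u = T_p(1 − S_f) are assumptions consistent with this section, and the function name is illustrative. The printed values correspond to the 25%-Trim example discussed later in this chapter.

# Minimal sketch: mean and standard deviation of the effective spare factor from
# the steady-state In-Use LBA statistics. Assumes s = (1 - 2q)/(1 - q),
# s_bar = 1 - s, and u = Tp*(1 - Sf).
import math

def effective_spare_factor(Sf: float, q: float, Tp: int):
    s = (1.0 - 2.0 * q) / (1.0 - q)
    s_bar = 1.0 - s
    mean = s_bar + s * Sf                 # mean S_eff = s_bar + s*Sf
    var = s_bar * (1.0 - Sf) / Tp         # Var(S_eff) = s_bar*(1 - Sf)/Tp
    return mean, math.sqrt(var)

mean, sigma = effective_spare_factor(Sf=0.0, q=0.25, Tp=1000)
print(mean, mean - 3 * sigma, mean + 3 * sigma)   # ~0.33, ~0.28, ~0.39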


Figure 3.3: Comparison of analytic and simulation PDF, for 40% Trim. The analytic values closely match the simulation results.

3.6 Simulation Results

The analytic results were checked with Monte-Carlo simulations for various values of u and q. Results shown in this section are for u = 1000, with the simulation means and variances computed over 9 million requests, then averaged over 128 Monte-Carlo simulations. Results for larger values of u are comparable. Figure 3.3 shows one example of how closely the analysis aligns with the simulation. Table 3.2 shows the analytic and simulation values. Figure 3.4 demonstrates the accuracy of the Stirling and Gaussian approximations for a very low u = 1000 and q = 0.4. The low value of u and high value of q were necessary in order to keep the graph from looking like a delta function, which is what a zoomed-out PDF of larger, more practically realistic, values of u


Figure 3.4: A histogram of one run of simulation results for u = 1000 and q = 0.4, along with plots of the exact analytic PDF (Equation 3.1) in red, the Stirling approximation of the PDF (Equation 3.2) in black dots, and the Gaussian approximation of the PDF (Equation 3.3) in green. The line for the exact analytic PDF is not even visible because the line for the Gaussian approximation of the PDF covers it completely. The Stirling approximation of the PDF (black dots) lines up well with the Gaussian approximation and the (covered) analytic PDF. Even at this low value of u = 1000, the theoretical approximations are effectively identical to the exact analytic PDF. 28


Figure 3.5: A histogram of one run of simulation results for u = 25 and q = 0.3, along with plots of the exact analytic PDF (Equation 3.1) in red, the Stirling approximation of the PDF (Equation 3.2) in black, and the Gaussian approximation of the PDF (Equation 3.3) in green. A slight negative skew of the exact analytic PDF is evident when compared to the Gaussian approximation. Note that the exact analytic PDF is the best fit of the histogram. A series of 64 Monte-Carlo simulations with these values result in an average skew of mean ± one standard deviation = −0.299 ± 0.004, which agrees well with the theoretically computed approximation value of −0.306 ± R(1/σ³) = −0.306 ± R(0.029). The average simulation kurtosis is 0.064; the approximate theoretical is 0.070; both values are well within overlapping error bounds. The top graph is a plot of the numerically normalized equations as shown in the text. In the bottom graph, a half-bin shift correction has been made to the Stirling and Gaussian approximation densities as described in the text.

Table 3.2: Mean and standard deviation (σ) for Number of In-Use LBAs computed over 128 Monte-Carlo simulations. Values in this table were computed with u = 1000.

          Analytic Values        Simulation Values
% Trim    Mean      σ            Mean      σ
10        888.89    10.54        888.87    10.55
20        750.00    15.81        749.97    15.80
30        571.43    20.70        571.42    20.72
40        333.33    25.82        333.32    25.80

appears to be. Even at such a low value of u, the simulation results, exact analytic PDF (Equation 3.1), Stirling approximation of the PDF (Equation 3.2), and the Gaussian approximation of the PDF (Equation 3.3) are indistinguishable. In order to see the slight skew predicted by higher order terms, we are forced to use an unrealistically low value of u = 25, as seen in Figure 3.5. However, even in this unrealistically small example, both the Stirling and Gaussian approximations are close matches to the exact PDF. Figure 3.6 illustrates the utilization of the user space as the percentage of In-Use LBAs at steady state for several rates of Trim. Higher rates of Trim correspond to lower utilization, but with a wider variance. The corresponding graph of the PDF of the steady state effective spare factor is found in Figure 3.7, with the example utilizing a manufacturer specified spare factor of 0.11. Here, higher rates of Trim correspond to a higher effective spare factor, but with a wider variance. The practical application of this theory shows the potential for impressive savings. A workload with 25% Trim requests can transform an SSD with zero manufacturer specified spare factor into one with a mean effective spare factor of 0.33. To achieve this level of overprovisioning without the Trim command would require having 50% more physical pages than user LBAs. This means that for a known workload, 50% of the cost of the NAND flash inside the SSD could be saved, without sacrificing performance! Figure 3.8 shows a graph of the mean effective spare factor versus the manufacturer specified spare factor, demonstrating the improvement in spare factor provided by the Trim command.
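The analytic columns of Table 3.2 follow directly from the mean us and standard deviation √(us̄); the short sketch below reproduces them. The q-to-s mapping is assumed as above, and variable names are illustrative.

# Minimal sketch: the analytic columns of Table 3.2, i.e. mean = u*s and
# sigma = sqrt(u*s_bar), with s = (1 - 2q)/(1 - q) assumed.
import math

u = 1000
for q in (0.1, 0.2, 0.3, 0.4):
    s = (1.0 - 2.0 * q) / (1.0 - q)
    mean = u * s
    sigma = math.sqrt(u * (1.0 - s))
    print(f"{int(q * 100):2d}% Trim: mean = {mean:7.2f}, sigma = {sigma:5.2f}")
# 10% -> 888.89, 10.54   20% -> 750.00, 15.81
# 30% -> 571.43, 20.70   40% -> 333.33, 25.82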


Figure 3.6: Theoretical results for utilization under various amounts of Trim. When more Trim requests are in the workload, a lower percentage of LBAs are In-Use at steady state, but the variance is higher than seen in a workload with fewer Trim requests. 31


Figure 3.7: Theoretical results for the effective spare factor under various amounts of Trim, with a manufacturer specified spare factor of 0.11. When more Trim requests are in the workload, a higher effective spare factor is seen, but with a higher variance than seen in a workload with fewer Trim requests. 32


Figure 3.8: Mean and 3σ values of effective spare factor versus manufacturer specified spare factor. This shows that an SSD with zero manufacturer specified spare factor effectively has a spare factor of 33% when the workload contains 25% Trim requests. This level of overprovisioning without the Trim command requires 50% more physical storage than the user is allowed to access. The ±3σ values were calculated using a small Tp = 1000, so they would be visible. Typical devices have a Tp >> 1000, resulting in a negligible variance. 33


Figure 3.9: Theoretical results comparing the effective spare factor to the manufacturer specified spare factor. The mean and 3σ standard deviations are shown. For this figure, Tp = 1280 was used so that the 3σ values are visible. Larger gains in the effective spare factor are seen for smaller specified spare factor. In this figure, we see that a workload of 10% Trim requests transforms an SSD with a manufacturer specified spare factor of zero into one with an effective spare factor of 0.11. To obtain this spare factor through physical overprovisioning alone would require 12.5% more physical space than the user is allowed to access!

Figure 3.9 shows similar results with a slightly larger SSD; in this experiment, a workload of 10% Trim requests transforms an SSD with a manufacturer specified spare factor of zero into one with an effective spare factor of 0.11. To obtain this spare factor through physical overprovisioning alone would require 12.5% more physical space than the user is allowed to access! Application of this theory should not ignore the variance for small values of Tp. The three-sigma rule means that the effective spare factor that will be seen 99.7% of the time lies within S̄_eff ± 3σ. For the specific case of 25% Trim requests, no manufacturer specified spare factor, and a small physical capacity of 1000 pages, this means 0.28 ≤ S_eff ≤ 0.39, which is the equivalent of having 39% to 63% more physical pages than user LBAs with physical overprovisioning alone. Typically, in practical applications Tp >> 1000, resulting in a much narrower range of S_eff. In fact, lim_{Tp→∞} Var(S_eff) = 0, indicating that for sufficiently large devices, S̄_eff can be used without paying heed to the variance.

Chapter Acknowledgements

Chapter 3, in part, is a reprint of the material as it appears in "A Mathematical Model of the Trim Command in NAND-Flash SSDs," T. Frankie, G. Hughes, K. Kreutz-Delgado, in Proceedings of the 50th Annual Southeast Regional Conference. ACM, 2012. The dissertation author was the primary investigator and author of this paper.

Chapter 3, in part, has been submitted for publication of the material as it may appear in "Analysis of Trim Commands on Overprovisioning and Write Amplification in Solid State Drives," T. Frankie, G. Hughes, K. Kreutz-Delgado, in ACM Transactions on Storage (TOS), 2012. The dissertation author was the primary investigator and author of this paper.

Chapter 4

Application to Write Amplification

Several researchers have analyzed the write amplification that results under the standard uniform random workload and greedy garbage collection, but they did not address the effect of Trim requests on write amplification. In this chapter, we summarize the previously published derivations for write amplification under the standard uniform random workload, and then extend the models to work with the Trim-modified uniform random workload. We compare analytic and simulation results, demonstrating the range of performance for each model.

4.1 Write Amplification Under the Standard Uniform Random Workload

In Hu, et al. [1, 2], the authors perform a probabilistic analysis of a vari- ation of greedy garbage collection, the windowed greedy garbage collection. The windowed greedy garbage collection varies from greedy garbage collection by se- lecting a block for garbage collection from the w oldest blocks in the occupied block queue. When w is the size of the occupied block queue, Tp/np − r, win- dowed greedy garbage collection reduces to greedy garbage collection. Hu, et al. derive the write amplification based on the probability that the block selected for


Computation of Write Amplification Factor

Compute p = (p_1, ..., p_j, ..., p_w):

p_1 = e^{-1.9\left(\frac{T_p}{u} - 1\right)}    (4.1)

p_j = \min\!\left(1,\; \frac{1.1\, p_{j-1}}{\left(1 - \frac{1}{u}\right)^{n_p}}\right)    (4.2)

Compute v = (v_{j,k}), a matrix of size w × n_p, for j = 1, ..., w and k = 0, ..., n_p − 1:

v_{j,k} = 1 - \sum_{m=0}^{k} \binom{n_p}{m} p_j^m (1 - p_j)^{n_p - m}

Compute V = (V_1, ..., V_k, ..., V_{n_p}):

V_k = \prod_{j=0}^{w-1} v_{j,k}

Compute p^* = (p^*_0, ..., p^*_k, ..., p^*_{n_p}):

p^*_0 = 1 - V_0, \qquad p^*_k = V_{k-1} - V_k \;\text{ for } k = 1, \ldots, n_p - 1, \qquad p^*_{n_p} = 1 - V_{n_p - 1}

Compute the write amplification:

A_{Hu} = \frac{n_p}{n_p - \sum_{k=0}^{n_p} k\, p^*_k}    (4.3)

Figure 4.1: Equations needed to compute the empirical model-based algorithm for A_Hu (adapted from [1, 2]). w is the window size, and allows for the windowed greedy garbage collection variation. Setting w = T_p/n_p − r is needed for the standard greedy garbage collection discussed in this dissertation. For an explanation of the theory behind these formulas, see [1, 2].

garbage collection has 0, 1, ..., n_p valid pages. Their analysis results in a series of formulas used to empirically compute write amplification based on SSD and FTL parameters. Figure 4.1 lists these formulas in the order needed for computation. An intuitive explanation for each formula is beyond the scope of this chapter; more details can be found in [1, 2].

Agarwal and Marrow [14] derive a closed-form expression for write amplification by examining the number of valid pages in an arbitrary block. Because the workload is uniformly random, the number of valid pages in an arbitrary block is binomially distributed. However, their simulation indicated the empirical number of valid pages to be approximately uniformly distributed. By equating the expected values of each distribution, the authors computed the write amplification as

A_{Agarwal} = \frac{1}{2}\,\frac{1 + \rho}{\rho}    (4.4)

where ρ = (T_p − u)/u is a different measure of overprovisioning than the spare factor.¹

Xiang and Kurkoski [15] improve upon Agarwal's model by examining the number of invalid pages freed by garbage collection, which is represented by a binomially distributed random variable X. They assume that at steady state, X has a constant expected value, then equate this to the expected value of the binomial distribution in order to compute the write amplification, A. Finally, they look at the asymptotic value of A as the user space becomes large, and compute

A_{Xiang} = \frac{-1 - \rho}{-1 - \rho - W\!\left((-1 - \rho)\, e^{(-1 - \rho)}\right)}    (4.5)

where W(·) denotes the Lambert W function [30, 15].

¹Note that ρ = S_f/(1 − S_f), and that 0 ≤ ρ < ∞.
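For reference, Equations 4.4 and 4.5 are easy to evaluate numerically; the sketch below uses scipy.special.lambertw for W(·). Function names are illustrative, and the printed values correspond to the no-Trim rows of Table 4.1 for S_f = 0.2.

# Minimal sketch of the two closed-form baseline models quoted above:
# Equation 4.4 (Agarwal and Marrow) and Equation 4.5 (Xiang and Kurkoski).
import numpy as np
from scipy.special import lambertw

def A_agarwal(rho: float) -> float:
    return 0.5 * (1.0 + rho) / rho                              # Equation 4.4

def A_xiang(rho: float) -> float:
    x = -(1.0 + rho)
    return float(np.real(x / (x - lambertw(x * np.exp(x)))))    # Equation 4.5

print(A_agarwal(0.25), A_xiang(0.25))   # roughly 2.50 and 2.69 (q = 0 rows of Table 4.1, Sf = 0.2)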

4.2 Write Amplification Analysis of Trim-Modi- fied Uniform Random Workload

In this section, we analyze the write amplification of the Trim-modified uniform random workload under greedy garbage collection, paralleling the write

amplification analysis of the standard uniform random workload by Xiang and Kurkoski [15].

Let Y be a random variable representing the number of invalid pages reclaimed by garbage collection, analogous to Xiang and Kurkoski's X. After a large number of both Write and Trim requests, Y reaches a steady state with constant expected value E[Y] = y. An alternate way to consider the statistics of Y is to examine a given block chosen for garbage collection. Assume that all pages in a block selected for garbage collection have the same probability p_ie of being invalid, due to the uniform selection of LBAs for Write and Trim requests. This assumption is justified by Xiang and Kurkoski [15] when only Write requests are involved; the argument is not changed when the workload is modified to include Trim requests. The result of this assumption is that Y is a binomial random variable with n_p trials and probability p_ie of success. Hence, E[Y] = n_p p_ie. To reconcile the two ways of viewing Y, we note that the expected values must be equal: n_p p_ie = y.

Now, derive p_ie in a similar fashion to Xiang and Kurkoski [15], by considering a valid page α, corresponding to one In-Use LBA. We start by computing p_valid, the probability that this page α remains valid after k requests, in which Trims occur with probability q. To compute this, first consider the probability that a single request makes α invalid, remembering that a request can be either a Trim or a Write.

P(\text{request makes } \alpha \text{ invalid}) = P(\text{request is for } \alpha \mid \text{Trim})P(\text{Trim}) + P(\text{request is for } \alpha \mid \text{Write})P(\text{Write})
= q\,\frac{1}{us} + (1 - q)\,\frac{1}{u}
= \frac{1}{s(2 - s)u}
= \frac{1}{s(1 + \bar{s})u}

Thus,

p_{valid} = \left(1 - \frac{1}{s(1 + \bar{s})u}\right)^{k}

and the probability that page α becomes invalid after k user requests is

p_{invalid} = 1 - \left(1 - \frac{1}{s(1 + \bar{s})u}\right)^{k}

Because we assumed that all pages in a block selected for garbage collection have the same probability p_ie of being invalid, p_ie = p_invalid when k is the number of requests that occurred between garbage collections of the same block. Since each garbage collection frees an expected number of y pages, that means that y writes must occur before the next garbage collection (likely of a different block) is triggered. y Writes corresponds to y(1 + s̄) total requests, including both Writes and Trims. Next, we need to determine how many garbage collections occur during an entire block cycle, the time between garbage collections of the same block. Xiang and Kurkoski use simulation to verify that t = T_p/n_p blocks must be garbage collected during an entire block cycle. We agree that t blocks is the correct number, but argue it without simulation. Consider a single block β. The valid pages in β are equally likely to be invalidated as the valid pages in any other block. If more than t garbage collections occur between garbage collections of β, then that would mean that some other block has pages being invalidated more frequently than those of β, which would imply that another block's valid pages are more likely to be invalidated, but we know this is not true. If fewer than t garbage collections occur in the block cycle of β, that would mean that the pages in β are being invalidated more frequently than the pages of other blocks; but again, we know that this is not true. Therefore, the only value left for the number of garbage collections in a block cycle is t, which also indicates that greedy garbage collection tends to reduce to choosing the least recently used block. Combining the information about the number of blocks that are garbage collected in an entire block cycle with the number of requests made between consecutive garbage collections, we calculate the number of requests between garbage collections of the same block is ty(1 + s̄), making

p_{ie} = 1 - \left(1 - \frac{1}{s(1 + \bar{s})u}\right)^{ty(1 + \bar{s})}

Now, equate the two expected values:

n_p\left[1 - \left(1 - \frac{1}{s(1 + \bar{s})u}\right)^{ty(1 + \bar{s})}\right] = y

This equation is solved using the Lambert W function [30, 15].

y = n_p + \frac{-W\!\left(n_p t(1 + \bar{s})\left(1 - \frac{1}{s(1 + \bar{s})u}\right)^{n_p t(1 + \bar{s})} \ln\!\left(1 - \frac{1}{s(1 + \bar{s})u}\right)\right)}{t(1 + \bar{s})\,\ln\!\left(1 - \frac{1}{s(1 + \bar{s})u}\right)}

The write amplification is computed as n_p/y. We are more interested in the asymptotic approximation of y as u (the user space) becomes large:

\hat{y} = \lim_{u \to \infty} y

To compute this, substitute t = u(1 + ρ)/n_p and v = s(1 + s̄):

\hat{y} = n_p + \lim_{u \to \infty} \frac{-W\!\left(vu\,\frac{1 + \rho}{s}\left(1 - \frac{1}{vu}\right)^{vu\frac{1 + \rho}{s}} \ln\!\left(1 - \frac{1}{vu}\right)\right)}{\frac{vu}{n_p}\,\frac{1 + \rho}{s}\,\ln\!\left(1 - \frac{1}{vu}\right)}
= n_p + \frac{-W\!\left(\frac{-(1 + \rho)}{s}\, e^{\frac{-(1 + \rho)}{s}}\right)}{\frac{-(1 + \rho)}{n_p s}}

Now, computing A^{(T)}_{Xiang}² = n_p/\hat{y} results in

A^{(T)}_{Xiang} = \frac{\frac{-(1 + \rho)}{s}}{\frac{-(1 + \rho)}{s} - W\!\left(\frac{-(1 + \rho)}{s}\, e^{\frac{-(1 + \rho)}{s}}\right)}    (4.6)

Thus, the write amplification for Trim-modified uniform random workloads under greedy garbage collection is dependent on the amount of overprovisioning, as measured by ρ, and s, a function of the proportion of Trim requests in the workload. In fact, Equation 4.6 is the same as Equation 4.5, but with

\rho_{eff} = \frac{\bar{S}_{eff}}{1 - \bar{S}_{eff}} = \frac{1 + \rho}{s} - 1    (4.7)

in place of ρ.

²We denote the calculation of write amplification with Trim requests in the workload as A^{(T)}.
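Equations 4.6 and 4.7 reduce to evaluating the baseline Xiang model at ρ_eff; the following sketch does exactly that. Function names are illustrative, and the q-to-s mapping is assumed as before.

# Minimal sketch of Equations 4.6 and 4.7: the Trim-modified Xiang model is the
# baseline model of Equation 4.5 evaluated at rho_eff = (1 + rho)/s - 1.
import numpy as np
from scipy.special import lambertw

def A_xiang(rho: float) -> float:
    x = -(1.0 + rho)
    return float(np.real(x / (x - lambertw(x * np.exp(x)))))

def A_xiang_trim(rho: float, q: float) -> float:
    s = (1.0 - 2.0 * q) / (1.0 - q)      # assumed steady-state In-Use fraction
    rho_eff = (1.0 + rho) / s - 1.0      # Equation 4.7
    return A_xiang(rho_eff)              # Equation 4.6

print(A_xiang_trim(0.25, 0.1))   # ~1.94, matching the Sf = 0.2, q = 0.1 row of Table 4.1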

4.3 Alternate Models for Write Amplification Under the Trim-Modified Uniform Random Workload

Hu, et al. [2] find that, compared to the utilization of the device, the exact parameters of the SSDs have negligible effect on the write amplification. Both Agarwal and Marrow [14] and Xiang and Kurkoski al. [15] note that their expres- sions for write amplification are functions of only the overprovisioning factor. Since three different derivation methods all determine a way to compute write amplifi- cation that is either mostly or completely dependent on the amount of overpro- visioning, it makes sense to adjust these formulas for the Trim-modified uniform random workload by replacing the overprovisioning factor with the effective over- provisioning factor, calculated in Equation 4.7.

To compute A^{(T)}_{Hu}, the Trim-modified write amplification for the Hu, et al. [1, 2] model, modify the calculation of Equation 4.3 by changing Equations 4.1 and 4.2 to

p_1 = e^{-1.9\left(\frac{T_p}{us} - 1\right)}    (4.8)

p_j = \min\!\left(1,\; \frac{1.1\, p_{j-1}}{\left(1 - \frac{1}{us}\right)^{n_p}}\right)    (4.9)

To compute A^{(T)}_{Agarwal} and A^{(T)}_{Xiang}, the Trim-modified write amplifications for the Agarwal and Marrow [14] and Xiang and Kurkoski [15] models, replace ρ in Equations 4.4 and 4.5 with ρ_eff (Equation 4.7). This results in the Trim-based write amplifications,

A^{(T)}_{Agarwal} = \frac{1}{2}\left(\frac{1 + \rho}{1 + \rho - s}\right)

and Equation 4.6 for A^{(T)}_{Xiang}, derived in Section 4.2.

4.4 Simulation Results and Discussion

Analytic results for write amplification under the Trim-modified random workload are verified by simulation. The custom simulation program was written 42


Figure 4.2: Write amplification versus probability of Trim under a Trim-modified uniform random workload and greedy garbage collection, when S_f = 0.2. The plots labeled Hu, Agarwal, and Xiang all show the Trim-modified models for write amplification. A^{(T)}_{Agarwal} gives such an optimistic prediction that for large q it foresees fewer actual writes than requested writes! A^{(T)}_{Hu} gives a more realistic, but still optimistic prediction for write amplification. A^{(T)}_{Xiang} gives the best, and generally quite accurate, prediction for write amplification.


Figure 4.3: Simulation and theoretical results for write amplification under vari- ous amounts of Trim and Sf = 0.1. The Trim-modified Xiang calculation is a very good match. The Trim-modified calculations for Hu and Agarwal are optimistic about the write amplification, with the Agarwal calculation being so optimistic as to give an impossible write amplification of less than 1 for high amounts of Trim! 44

Table 4.1: Analytic and simulation write amplification results for various values of probability of Trim, q, and two choices of manufacturer specified spare factor. In all cases, u/n_p = 1024, and n_p = 256. Five Monte-Carlo simulations were run for each simulation value in the table, with each simulation having identical results, measured to three significant digits, for write amplification. A plot for the case S_f = 0.2 (equivalently ρ = 0.25) is shown in Figure 4.2.

ρ              q     ρ_eff   S_eff   A^(T)_Hu   A^(T)_Agarwal   A^(T)_Xiang   A_Simulation
0.1            0.0   0.100   0.091   5.75       5.50            5.68          5.58
(S_f = 0.09)   0.1   0.238   0.192   2.74       2.60            2.79          2.77
               0.2   0.467   0.319   1.69       1.57            1.78          1.78
               0.3   0.926   0.481   1.20       1.04            1.29          1.29
               0.4   2.302   0.697   1.00       0.72            1.04          1.05
0.25           0.0   0.250   0.200   2.63       2.50            2.69          2.68
(S_f = 0.2)    0.1   0.406   0.289   1.85       1.73            1.94          1.93
               0.2   0.667   0.400   1.38       1.25            1.48          1.48
               0.3   1.188   0.543   1.11       0.92            1.19          1.19
               0.4   2.750   0.733   1.00       0.68            1.03          1.03

in MATLAB to implement greedy garbage collection, as shown in Figure 2.1.

Simulation results shown in this section are obtained using u/np = 1024, np = 256, various values of q, and several values of Sf (equivalently, several values of ρ). Each simulation started with an empty SSD, and consisted of two phases. The first phase was a workload dependent preconditioning phase [31], in which 5 million requests (in the proportion of Trim and Write of the desired workload) were given to the SSD. This allowed an average of at least 9 writes (in the worst case) to each physical page, bringing the device into a steady state before measuring the write amplification. In the second phase, another 5 million requests were issued to the SSD, with the write amplification measured by keeping track of the number of requested writes and the number of actual writes. Each simulation was repeated five times; identical results, measured to three significant digits, were found in simulations with the same parameters. Simulation results are plotted along with analytic results in Figures 4.2

and 4.3; more results are in Table 4.1. A^{(T)}_{Agarwal} gives such an optimistic prediction that for large q it foresees fewer actual writes than requested writes, giving a write amplification value of less than 1! A^{(T)}_{Hu} gives a more realistic, but still optimistic prediction for write amplification. A^{(T)}_{Xiang} gives the best, and generally quite accurate, prediction for write amplification. When garbage collection occurs, the benefit of a single Trim is twofold. 1) The first benefit is that one fewer page needs to be copied during garbage collection, saving one write. 2) The second benefit is that one more page is available for writing before garbage collection must occur again. This double benefit is what causes the considerable reduction in write amplification as the proportion of Trim requests to Write requests is increased. Each increase of 0.1 in the probability of Trim in the workload results in A decreasing approximately half the distance to the minimum A = 1. Examination of the results shows that a Trim-enabled SSD requires less physical overprovisioning than a non-Trim SSD in order to obtain the same write amplification. For example, to achieve a write amplification of 1.48 without Trim, a spare factor of 0.4 (ρ = 0.6667) is required. This same write amplification is reached with a spare factor of 0.2 (ρ = 0.25) and a Trim probability of 0.2. This means that with a workload of 20% Trims, the user can have an extra 33% capacity without any performance degradation, and without any extra material cost! Models such as the ones presented in this chapter that can reliably determine the point at which there is no tradeoff between cost savings and performance have the potential for great utility for both manufacturers and consumers as well as for future design considerations.
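The capacity trade-off in the example above can be checked directly; the sketch below (illustrative names, q-to-s mapping assumed) evaluates the Trim-modified Xiang model for the two configurations and the resulting capacity ratio.

# Minimal sketch of the trade-off quoted above: Sf = 0.4 with no Trim and
# Sf = 0.2 with 20% Trim give the same predicted write amplification, while the
# latter exposes (1 - 0.2)/(1 - 0.4) - 1 = 33% more user capacity for the same
# physical capacity. Uses the Trim-modified Xiang model (Equation 4.6).
import numpy as np
from scipy.special import lambertw

def A_xiang(rho: float) -> float:
    x = -(1.0 + rho)
    return float(np.real(x / (x - lambertw(x * np.exp(x)))))

def A_trim(Sf: float, q: float) -> float:
    s = (1.0 - 2.0 * q) / (1.0 - q)
    rho = Sf / (1.0 - Sf)
    return A_xiang((1.0 + rho) / s - 1.0)

print(A_trim(0.4, 0.0), A_trim(0.2, 0.2))   # both ~1.48
print((1 - 0.2) / (1 - 0.4) - 1)            # ~0.33 extra user capacity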

Chapter Acknowledgements Chapter 4, in part, is a reprint of the material as it appears in “The Ef- fect of Trim Requests on Write Amplification in Solid State Drives,” T. Frankie, G. Hughes, K. Kreutz-Delgado, in Proceedings of the IASTED-CIIT Conference. IASTED, 2012. The dissertation author was the primary investigator and author of this paper. Chapter 4, in part, has been submitted for publication of the material as it may appear in “Analysis of Trim Commands on Overprovisioning and Write 46

Amplification in Solid State Drives," T. Frankie, G. Hughes, K. Kreutz-Delgado, in ACM Transactions on Storage (TOS), 2012. The dissertation author was the primary investigator and author of this paper.

Chapter 5

Workload Variation: Hot and Cold Data

The Trim-modified uniform random workload analyzed above is unlikely to be found exactly in real-world applications. Most workloads will have some data that is accessed frequently (called hot data), and some data that is accessed infrequently (called cold data). Rosenblum and Ousterhout discussed these types of data in their paper [16], suggesting as a representative mix that hot data be requested 90% of the time while occupying only 10% of the user space, and that cold data be requested 10% of the time while occupying 90% of the user space. Hu, et al. [1, 2] showed through simulation that a standard uniform random workload consisting of hot and cold data under the log-structured writing and greedy garbage collection described above had high write amplification, but when the hot data and the cold data were kept in separate blocks, the write amplification improved. In their simulation, the cold data was read-only. We generalize the workload to allow cold data to be written and Trimmed infrequently compared to the hot data. In this chapter, we perform analysis of a Trim-modified uniform random workload with hot and cold data in which the hot and cold data are kept separate, and use simulation to examine the resulting write amplification when hot and cold data are mixed together in blocks. We assume that the temperature of the data is perfectly known; in reality, this information would need to be learned by the device; learning the temperature of the data is beyond the scope of this

chapter.

5.1 Mixed Hot and Cold Data

In the mixed hot and cold data scenario, no special treatment is given to the data based on its temperature; hot and cold data are mixed within blocks. When all LBAs are not accessed uniformly, however, as with hot and cold data, the theory already described for computing write amplification does not hold. We show this for the mixed case of hot and cold data by using the amount of effective overprovisioning to compute the write amplification and comparing it with the simulation values. To compute the amount of effective overprovisioning, first compute the expected number of In-Use LBAs for hot and cold data, then add these to get ueff. Then, the effective overprovisioning is ρeff = (Tp − ueff)/ueff and it can be plugged directly into the ρ of Equation 4.5 to compute the (incorrect) write amplification. As seen in Figure 5.1, the theoretical results are not too far off when cold data is up to 20% of the user space. For larger fractions of cold data in the user space, however, the theoretical results are too optimistic to be of value, and the actually observed write amplification grows large. This is undesirable, since cold data is likely to occupy a larger fraction of the user space than hot data. An analytic solution for the write amplification of mixed hot and cold data is not straightforward, and does not easily parallel the derivation of Xiang and Kurkoski [15]. It is known that mixing hot and cold data leads to higher write amplification than keeping the data separate [1, 2], so a mathematical analysis of the scenario is not likely to provide utility beyond mathematical interest. Instead, we provide simulation results in Fig. 5.1 for comparison with the separated data scenario. 49
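For concreteness, the following sketch reproduces this (deliberately naive) mixed-data prediction using the parameters of Figure 5.1; the hot and cold In-Use fractions come from the assumed mapping s(q) = (1 − 2q)/(1 − q), and all names are illustrative.

# Minimal sketch of the (incorrect, as discussed above) mixed-data prediction:
# compute u_eff from the expected In-Use LBAs of each temperature, then feed the
# resulting effective overprovisioning into the Xiang model of Equation 4.5.
# Parameters follow Figure 5.1 (Sf = 0.2, q_hot = 0.2, q_cold = 0.1).
import numpy as np
from scipy.special import lambertw

def A_xiang(rho: float) -> float:
    x = -(1.0 + rho)
    return float(np.real(x / (x - lambertw(x * np.exp(x)))))

def s_of(q: float) -> float:
    return (1.0 - 2.0 * q) / (1.0 - q)

def mixed_prediction(u: float, Sf: float, f_cold: float, q_hot: float, q_cold: float) -> float:
    Tp = u / (1.0 - Sf)
    u_eff = u * ((1.0 - f_cold) * s_of(q_hot) + f_cold * s_of(q_cold))
    return A_xiang((Tp - u_eff) / u_eff)

for f_cold in (0.1, 0.4, 0.8):
    print(f_cold, round(mixed_prediction(1.0, 0.2, f_cold, 0.2, 0.1), 2))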


Figure 5.1: Write amplification for mixed hot and cold data. Physical spare factor is 0.2. Trim requests for cold data occur with probability 0.1; Trim requests for hot data occur with probability 0.2. Cold data is requested 10% of the time, with hot data requested the remaining 90% of the time. Notice that the theoretical values, based only on the level of effective overprovisioning, are optimistic when cold data accounts for more than 20% of the user space. 50

5.2 Separated Hot and Cold Data

Based on the results from Hu, et al. [1, 2], we expect to see an improvement in the write amplification over the mixed data case. In the separated data scenario, a set of physical blocks are assigned to the cold data, and a separate set of physical blocks are assigned to the hot data. Each set of blocks is kept logically separated¹, and data placement and garbage collection for each temperature operates independently, in the same way described in Algorithm 1 of Section 2.2. Assume that h blocks are assigned to the hot data, and c blocks are assigned to the cold data, where h + c = T_p/n_p. Also assume that the f_h fraction of the data is hot, and that the f_c fraction of the data is cold (note that f_h + f_c = 1), so that hot data has uf_h LBAs and cold data has uf_c LBAs. We also allow the hot and cold data to have their own rates of Trim, q_h and q_c, respectively². Finally, the frequency at which requests arrive for hot and cold data is denoted as p_h and p_c, respectively, where p_h + p_c = 1³. Although it may take longer for cold data to reach steady state than it takes for hot data, we can compute the expected number of In-Use LBAs for hot and for cold data at steady state. We expect to see uf_h s_h hot In-Use LBAs, and uf_c s_c cold In-Use LBAs, with the number of valid pages the same as the number of In-Use LBAs for each temperature. From this, we can compute the level of effective overprovisioning for each temperature, and then the write amplification.

A^{(T,\,hot)}_{Xiang} = \frac{\frac{-(1 + \rho_{hot})}{s_h}}{\frac{-(1 + \rho_{hot})}{s_h} - W\!\left(\frac{-(1 + \rho_{hot})}{s_h}\, e^{\frac{-(1 + \rho_{hot})}{s_h}}\right)}    (5.1)

¹The complete logical segregation of hot and cold blocks is useful for mathematical analysis. However, in an actual device, wear leveling is an issue that needs to be addressed. For this reason, physical blocks will occasionally need to be swapped between the hot and cold queues to spread the wear evenly. If this is timed to coincide with near-simultaneous garbage collection of hot and cold blocks, the amount of write amplification contributed by this wear leveling is expected to be minimal.

²Allowing each temperature of data to have its own rate of Trim makes sense, as there is no reason to believe that hot and cold data will be Trimmed at the same rate.

³We expect requests for hot data to arrive more frequently than requests for cold data, hence the choice of names for each. However, the analysis of this section does not require this to be the case.

where ρ_hot = (h n_p − u f_h)/(u f_h). A similar equation can be computed for A^{(T,\,cold)}_{Xiang}. Finally, the two values are combined in a weighted fashion to obtain the overall write amplification. Although it may seem the obvious choice, the weighting to use is not the ratio of hot and cold requests to the total number of requests; it is necessary to compute a ratio of hot and cold writes to the total number of writes

\alpha_h = \frac{p_h(1 - q_h)}{p_h(1 - q_h) + p_c(1 - q_c)}    (5.2)

\alpha_c = \frac{p_c(1 - q_c)}{p_h(1 - q_h) + p_c(1 - q_c)}    (5.3)

A_{separated} = \alpha_h A^{(T,\,hot)}_{Xiang} + \alpha_c A^{(T,\,cold)}_{Xiang}    (5.4)

This analysis points to an optimization problem in how to allocate the overprovisioned physical blocks on the device after the minimum number of blocks needed for each temperature is assigned. For the parameters chosen in graphing the results (see Fig. 5.2), an equal allocation of the overprovisioned physical blocks is optimal. However, it would require further exploration to determine whether this is true for all choices of parameters or if it is dependent on the parameter values.
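A sketch of Equations 5.1 through 5.4 is given below. It works in pages rather than whole blocks, which is an assumption of the sketch; the Trim rates, spare factor, and request split mirror Figures 5.2 and 5.3, while the 10% hot-data fraction in the usage example is the Rosenblum-style mix and is likewise assumed. All names are illustrative.

# Minimal sketch of Equations 5.1-5.4 for separated hot and cold data, working in
# pages rather than whole blocks (a simplifying assumption of this sketch).
import numpy as np
from scipy.special import lambertw

def s_of(q: float) -> float:
    return (1.0 - 2.0 * q) / (1.0 - q)

def A_temp(rho: float, s: float) -> float:
    x = -(1.0 + rho) / s                      # Equation 5.1 for one temperature
    return float(np.real(x / (x - lambertw(x * np.exp(x)))))

def A_separated(u, Sf, f_hot, p_hot, q_hot, q_cold, cold_extra_share):
    f_cold, p_cold = 1.0 - f_hot, 1.0 - p_hot
    extra = u / (1.0 - Sf) - u                # physical pages beyond the user space
    rho_hot = (1.0 - cold_extra_share) * extra / (u * f_hot)
    rho_cold = cold_extra_share * extra / (u * f_cold)
    w_hot = p_hot * (1.0 - q_hot)             # Equations 5.2 and 5.3, unnormalized
    w_cold = p_cold * (1.0 - q_cold)
    A_hot = A_temp(rho_hot, s_of(q_hot))
    A_cold = A_temp(rho_cold, s_of(q_cold))
    return (w_hot * A_hot + w_cold * A_cold) / (w_hot + w_cold)   # Equation 5.4

# Example: 10% of the data hot, extra pages split evenly between temperatures.
print(A_separated(u=100000, Sf=0.2, f_hot=0.1, p_hot=0.9,
                  q_hot=0.2, q_cold=0.1, cold_extra_share=0.5))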

5.3 Simulation Results

An examination of Fig. 5.3 shows that physically separating hot and cold data reduces the write amplification over allowing the data to be mixed within blocks, unless cold data occupies only a very small fraction of the user space. It is generally expected that infrequently accessed (cold) data, however, will be a large fraction of the data stored, so accurate determination of hot versus cold data logical blocks in order to keep them on physically separate blocks is very important. The theoretical values for write amplification of separated hot and cold data are very close to simulation values. 52


Figure 5.2: Write amplification for separated hot and cold data. Physical spare factor is 0.2. Trim requests for cold data occur with probability 0.1; Trim requests for hot data occur with probability 0.2. Cold data is requested 10% of the time, with hot data requested the remaining 90% of the time. After allocating the minimum number of blocks needed to store all of the user-allowed data for each temperature of data, the remaining blocks were split with the fraction shown on the x-axis. In this case, the optimal split was 50-50; further investigation of the optimization problem is needed to determine whether this is always the case. 53


Figure 5.3: Write amplification for mixed hot and cold data compared to sepa- rated hot and cold data. Physical spare factor is 0.2. Trim requests for cold data occur with probability 0.1; Trim requests for hot data occur with probability 0.2. Cold data is requested 10% of the time, with hot data requested the remaining 90% of the time. For the separated hot and cold data, extra blocks were allocated evenly between the two data types. The theoretical values for separated data are a good prediction of the simulation values. Note that the separated hot and cold data has superior write amplification to the mixed hot and cold data when cold data is more than 20% of the workload. 54

5.4 N Temperatures of Data

The idea of hot and cold data can be extended to N temperatures of data. This can be used to model a more complicated workload consisting of N temperatures of data, each with its own profile for probability of Trim. Additionally, the idea of N temperatures of data can also be thought of as many users who access data at different rates and with different amounts of Trim in their workloads, which could be helpful in analyzing cloud storage workloads. If each temperature of data is kept separate, the analysis of separated hot and cold data is readily extended to the case of N temperatures. Assume there are N temperatures, with each temperature j ∈ [1, N] having access to u_j LBAs. Then, the total number of user LBAs is u = u_1 + u_2 + ··· + u_N = \sum_{j=1}^{N} u_j. Also assume that each temperature has its own probability of Trim q_j

(and associated s_j) that is constant over time. Then at steady state, the number of In-Use LBAs from temperature j is X_j with a mean of u_j s_j and a variance of u_j s̄_j. Define the random variable Y to be the total number of In-Use LBAs. Then Y = \sum_{j=1}^{N} X_j. Because the X_j are independent Gaussian random variables, their sum, Y, is also Gaussian, with mean \sum_{j=1}^{N} u_j s_j and variance \sum_{j=1}^{N} u_j s̄_j. This information can be used to compute the overall effective overprovisioning. As demonstrated in the hot and cold data section, however, the overall effective overprovisioning is not useful in computing the write amplification of a non-uniform workload. The write amplification for N temperatures of data can be computed if the data is kept separated, as described in Section 5.2⁴. First, assume each temperature j has k_j space available for writing (so that \sum_{j=1}^{N} k_j = T_p) and frequency of writing p_j (so that \sum_{j=1}^{N} p_j = 1). Then compute the value of ρ_j = (k_j − u_j)/u_j for each temperature j, and use this to compute the write amplification for each temperature

⁴Wear leveling would still need to be taken into consideration in a real device.

A^{(T,\,j)}_{Xiang} = \frac{\frac{-(1 + \rho_j)}{s_j}}{\frac{-(1 + \rho_j)}{s_j} - W\!\left(\frac{-(1 + \rho_j)}{s_j}\, e^{\frac{-(1 + \rho_j)}{s_j}}\right)}

At first glance, it may appear that the overall write amplification is just a weighted sum of the individual write amplifications, where the weight is p_j. If there are no Trim requests, this would be true. However, the Trim requests in the workload need to be accounted for. To properly weight the individual temperature write amplifications, we need a measure of the ratio of writes requested by each user. Compute the weights

\alpha_j = \frac{p_j(1 - q_j)}{\sum_{j=1}^{N} p_j(1 - q_j)}

Then, the overall write amplification is

A_{separated,\,N} = \sum_{j=1}^{N} \alpha_j A^{(T,\,j)}_{Xiang}

The computation of write amplification in workloads with varying frequency of access is important in modeling real-world workloads, many of which have data that is written once and never altered. The models discussed in this chapter maintain the ability to determine the effect on performance in the storage device while allowing flexibility in the user workload.

Chapter Acknowledgement

Chapter 5, in part, has been submitted for publication of the material as it may appear in "Analysis of Trim Commands on Overprovisioning and Write Amplification in Solid State Drives," T. Frankie, G. Hughes, K. Kreutz-Delgado, in ACM Transactions on Storage (TOS), 2012. The dissertation author was the primary investigator and author of this paper.

Chapter 6

Workload Variation: Variable Sized Objects

In the previously described models, one page stored the data associated with one LBA. In this section, we augment the model to allow larger amounts of data to be stored under a single identifier, which lends itself naturally to object-based storage, as described in Section 2.7. Object based storage offers multiple advantages for systems using solid state storage devices [32]. When the multiple thousand pages in a flash block are divided into object records of larger size, the resultant fewer objects per block directly translates into fewer data write relocations during garbage collection, leading to a lower write amplification and faster performance. This offers an opportunity for user application storage metadata, via metadata servers, to actively allow flash storage devices to self-optimize their hardware, to fully utilize SSD write speed advantages over magnetic HDD while ameliorating SSD limitations like block erase and finite write lifetime. Object storage metadata servers could allocate SSDs to object classes [33] of distinct object size and thereby allow SSD to internally manage their object physical mapping and relocation. This has the potential to improve I/O rate and QoS, user access blocking probability, and SSD flash chip lifetime through the reduction of write amplification.

56 57

6.1 From LBAs to Objects

The utilization problem previously solved in Chapter 3 can be viewed in object space instead of LBA space. Instead of LBAs that are In-Use or not In-Use, there are object IDs that are In-Use or not In-Use. Previously, there were u LBAs allowed; now we allow object IDs in the range [1, u], for a maximum of u objects. The object-view of the Trim-modified uniform random workload is the same as the LBA view of the problem when each object occupies 1 page. To remove the restriction that one object occupies only one page, we start by examining the workload in which each object occupies b pages, where b is constant and identical for all objects in the workload. Then, we examine the workload in which b varies each time an object is written. For all workloads, we assume that when an object ID is selected for Trimming (or, more appropriately for objects, deletion), all pages associated with that object ID are invalidated. Likewise, when an object is rewritten, all pages previously containing data associated with that object ID are invalidated. Note that there are now two quantities of interest: the number of object IDs that are In-Use, and the number of pages that are valid. When each object occupies one page, these two quantities are the same.

Define the following variables:

χ_t = (χ_{1,t}, χ_{2,t}, ..., χ_{u,t})^T is a vector of indicator random variables at time t,

\chi_{i,t} = 1_{A_i}(\omega) = \begin{cases} 1 & \omega \in A_i \text{ (object } i \text{ is In-Use)} \\ 0 & \text{otherwise (object } i \text{ is not In-Use)} \end{cases}

where

A_i = \{\omega \mid \text{the STATE}(\omega) \text{ is such that location } i \text{ is occupied}\}

X_t = \sum_{i=1}^{u} \chi_{i,t} is a random variable representing the number of objects In-Use at time t. The mean and variance of this random variable at steady state were computed in Chapter 3.

Z_t = (Z_{1,t}, Z_{2,t}, ..., Z_{u,t})^T is a vector of the size of each object at time t.

Now, compute the number of valid pages at time t: Y_t = Z_t^T χ_t = \sum_{i=1}^{u} \chi_{i,t} Z_{i,t}

We are concerned with the values at steady state, and drop the time index to indicate the steady state solution.

6.2 Fixed Sized Objects

This workload is an extension of the original workload whose steady state PDF was derived in Chapter 3. However, objects now occupy b pages instead of 1 page. Recall that the number of In-Use objects is the same as the number of In-Use LBAs previously derived; only the number of valid pages changes. We assume that at least ub pages are allowed for the user to write to, so that it is not possible to request more space than is available.

For this workload, Z_t = (b, b, ..., b)^T for all t. This means

Y_{t,\,fixed} = \sum_{i=1}^{u} \chi_{i,t} Z_{i,t} = \sum_{i=1}^{u} \chi_{i,t}\, b = b \sum_{i=1}^{u} \chi_{i,t} = b X_t

To compute the steady state mean and variance of Y_fixed, we use the known steady state mean and variance of X:

E[Y_{fixed}] = E[bX] = b\,E[X] = bus

Similarly,

\mathrm{Var}[Y_{fixed}] = \mathrm{Var}[bX] = b^2\,\mathrm{Var}[X] = b^2 u\bar{s}
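Numerically, these fixed-size formulas reproduce the analytic columns of Table 6.1; the sketch below assumes the q-to-s mapping used earlier, and the function name is illustrative.

# Minimal sketch: valid-page statistics for fixed-size objects, using
# E[X] = u*s and Var(X) = u*s_bar from Chapter 3.
import math

def fixed_object_pages(u: int, q: float, b: int):
    s = (1.0 - 2.0 * q) / (1.0 - q)
    s_bar = 1.0 - s
    return b * u * s, b * math.sqrt(u * s_bar)   # mean and sigma of valid pages

print(fixed_object_pages(1000, 0.1, 32))   # ~(28444.4, 337.3), as in Table 6.1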

6.3 Variable Sized Objects

In this workload, the size of each object varies independently over time. Each time an object ID is selected to be written, the size of the object is chosen randomly, according to the distribution from which Z_{i,t} is drawn, regardless of any size that object ID may have had in the past. Z_{i,t} is drawn from a distribution with a minimum value B_1 ≥ 1, a maximum value B_2 ≥ B_1, a mean value m_Z, and a standard deviation σ_Z. Again, we assume there is sufficient space for writing all requests; this means there are at least uB_2 pages allowed for the user. Because Z_{i,t} is not constant, Y_{t,\,variable} = \sum_{i=1}^{u} \chi_{i,t} Z_{i,t} cannot be simplified, as it was in the fixed-sized object case.

The mean is still relatively straightforward to calculate, since χi and Zi are independent, and the Zis are iid:

E[Y_{variable}] = E\!\left[\sum_{i=1}^{u} \chi_i Z_i\right]
= \sum_{i=1}^{u} E[\chi_i Z_i]
= \sum_{i=1}^{u} E[\chi_i]\,E[Z_i]
= m_Z \sum_{i=1}^{u} E[\chi_i]
= m_Z\, E\!\left[\sum_{i=1}^{u} \chi_i\right]
= m_Z\, E[X]
= m_Z us    (6.1)

The variance is more complicated because of cross-terms. The χi values are not independent, so it is not simple to compute the variance1.

1 If the χi values were independent, X would have a binomial distribution, but because we know from previous analysis that the variance is us¯ and not uss¯, this is obviously not the case. 60

To compute the variance of Yvariable, compute

\mathrm{Var}(Y_{variable}) = E[Y_{variable}^2] - E[Y_{variable}]^2

The value of E[Y_{variable}] was computed in Equation 6.1. Computing E[Y_{variable}^2] is more complicated. When following the steps below, note that \chi_i^2 = \chi_i, that \chi_i is independent of Z_i and Z_j, and that the Z_i values are iid.

E[Y_{variable}^2] = E\!\left[\left(\sum_{i=1}^{u} \chi_i Z_i\right)^{\!2}\right]
= E\!\left[\sum_{i=1}^{u} \chi_i Z_i^2 + 2\sum_{i<j} \chi_i \chi_j Z_i Z_j\right]

Most of these expected values are straightforward to calculate. However, care must be taken when computing E[χiχj], because the χi values are NOT independent.

E[χiχj] = P (Ai ∩ Aj)

= P (Ai|Aj)P (Aj)

Obviously, when i = j, P(A_i|A_j) = 1. Calculating this value, however, when i ≠ j is important. Since there is nothing special about any particular i or j, assume that P(A_i|A_j) is the same for all i ≠ j, and thus is a constant with respect to i and j.

It is tempting to claim that P (Ai|Aj) = (us − 1)/(u − 1), since us is the expected number of In-Use objects at steady state. This is an invalid statement, however, because it implicitly assumes a zero variance. In fact, using this value in the calculation of V ar(Y ) results in a statement that V ar(Y ) = 0, which is exactly what is expected because of the implicit zero variance assumption. 61

Instead of trying to compute P(A_i|A_j) directly, it is possible to compute E[χ_iχ_j] by using the result of the one-object-occupies-one-page model. Working backwards to calculate this probability, and noting that there are (u − 1)u/2 terms in the sum over i < j:

\mathrm{Var}(Y_{single}) = \mathrm{Var}\!\left(\sum_{i=1}^{u} \chi_i\right) = \sum_{i=1}^{u} \mathrm{Var}(\chi_i) + 2\sum_{i<j} \mathrm{Cov}(\chi_i, \chi_j)

With Var(Y_single) = us̄, Var(χ_i) = ss̄, and Cov(χ_i, χ_j) = C constant for i ≠ j, this becomes us̄ = u ss̄ + (u − 1)u C.

Solving for C yields \bar{s}^2/(u - 1), which is the value of the covariance². Continuing with the calculations,

\mathrm{Cov}(\chi_i, \chi_j) = C = E[\chi_i\chi_j] - P(A_i)P(A_j) = E[\chi_i\chi_j] - s^2

so that

E[\chi_i\chi_j] = C + s^2 = \frac{\bar{s}^2}{u - 1} + s^2 = \frac{s\left(us - 1 + \frac{\bar{s}}{s}\right)}{u - 1}    (6.2)

Continuing with the computation of E[Y_{variable}^2],

E[Y_{variable}^2] = (\sigma_Z^2 + m_Z^2)us + 2 m_Z^2 \sum_{i<j} \frac{s\left(us - 1 + \frac{\bar{s}}{s}\right)}{u - 1}
= (\sigma_Z^2 + m_Z^2)us + 2 m_Z^2\,\frac{(u - 1)u}{2}\,\frac{s\left(us - 1 + \frac{\bar{s}}{s}\right)}{u - 1}
= \sigma_Z^2 us + m_Z^2 us + m_Z^2 us\left(us - 1 + \frac{\bar{s}}{s}\right)
= \sigma_Z^2 us + (m_Z us)^2 + m_Z^2 u\bar{s}

Now it is possible to compute

\mathrm{Var}(Y_{variable}) = E[Y_{variable}^2] - E[Y_{variable}]^2
= \sigma_Z^2 us + (m_Z us)^2 + m_Z^2 u\bar{s} - (m_Z us)^2
= \sigma_Z^2 us + m_Z^2 u\bar{s}

²Interestingly, as u gets large, the covariance goes to 0, indicating that the χ_i values are effectively independent for large u. However, the rate at which the covariance goes to 0 is counteracted by the rate at which the number of terms in the sum over i < j increases, resulting in a variance that is NOT binomially distributed.
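The mean from Equation 6.1 and the variance just derived are easy to evaluate; the sketch below does so for the discrete uniform [1, 32] object-size distribution of Table 6.2. The moments of that distribution and the q-to-s mapping are stated assumptions, and names are illustrative.

# Minimal sketch: E[Y] = m_Z*u*s (Equation 6.1) and
# Var(Y) = sigma_Z^2*u*s + m_Z^2*u*s_bar for variable-size objects.
import math

def variable_object_pages(u, q, m_z, var_z):
    s = (1.0 - 2.0 * q) / (1.0 - q)
    s_bar = 1.0 - s
    mean = m_z * u * s
    var = var_z * u * s + m_z ** 2 * u * s_bar
    return mean, math.sqrt(var)

# Object sizes ~ discrete uniform on [1, 32]: mean 16.5, variance (32^2 - 1)/12.
print(variable_object_pages(1000, 0.1, 16.5, (32 ** 2 - 1) / 12.0))
# ~(14666.7, 325.6), matching the q = 0.1 row of Table 6.2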

6.4 Simulation Results

Monte-Carlo simulation was used to check the analytic results, with various values of u, q, and distributions of Z. In this section, results are shown for u = 1000. Simulations were run for 1 million requests to ensure steady state was reached, then the simulation means and variances were averaged over the next 9 million requests. The results in each table are the average of 64 Monte-Carlo simulations. Although results are only shown for u = 1000, results for larger values of u are comparable. Table 6.1 shows the analytic and simulation values of the mean and standard deviation of the number of In-Use objects as well as the mean and standard deviation of the number of valid pages at steady state when the objects are all a fixed size of 32 pages. When rounded to the nearest integer, the mean number of objects is predicted exactly by analysis; the mean number of valid pages is predicted to within 3 pages. The slight error in the prediction of the number of valid pages is likely due to the extremely large variances seen when q ≥ 0.3. An overlay of the simulation PDF and the analytic PDF for the number of valid pages is shown in Fig. 6.1 for q = 0.2. Table 6.2 shows the analytic and simulation values of the mean and standard deviation of the number of In-Use objects as well as the mean and standard deviation of the number of valid pages at steady state when the size of the objects

Table 6.1: Mean and σ of Number of In-Use Objects and Number of Valid Pages for Fixed Sized Objects. Values in this table were computed with u = 1000. The size of the objects was fixed at 32. Simulation values are the average of 64 Monte-Carlo simulations.

                Objects                                Pages
Trim      Mean                σ                 Mean                     σ
Amount    Analytic  Sim       Analytic  Sim     Analytic   Sim           Analytic  Sim
0.05      947.37    947.37    7.25      7.24    30315.79   30315.80      232.15    231.80
0.1       888.89    888.89    10.54     10.54   28444.44   28444.37      337.31    337.37
0.2       750.00    750.01    15.81     15.82   24000.00   24000.37      505.96    506.18
0.3       571.43    571.47    20.70     20.73   18285.71   18287.10      662.46    663.29
0.4       333.33    333.43    25.82     25.83   10666.67   10669.70      826.24    826.46
0.45      181.82    181.89    28.60     28.62   5818.18    5820.40       915.32    915.93


Figure 6.1: Comparison of analytic and simulation PDF for the number of valid pages when objects are a fixed size of 32 pages. For this example, u = 1000 and q = 0.2. The mean is shifted, and the variance is larger than if objects are a fixed size of 1 page.

is drawn from a discrete uniform distribution over [1, 32]. Analytic and simulation values match exactly when rounded to the nearest integer. An overlay of the simulation PDF and the analytic PDF for the number of valid pages is shown in Fig. 6.2 for q = 0.2. Table 6.3 shows the analytic and simulation values of the mean and standard deviation of the number of In-Use objects as well as the mean and standard deviation of the number of valid pages at steady state when the size of the objects is drawn from a binomial distribution with parameters p = 0.4 and n = 32. Analytic and simulation values match exactly when rounded to the nearest integer. An overlay of the simulation PDF and the analytic PDF for the number of valid pages is shown in Fig. 6.3 for q = 0.2.

Table 6.2: Mean and σ of Number of In-Use Objects and Number of Valid Pages for Variable Sized Objects. Values in this table were computed with u = 1000. The size of the objects was drawn from a discrete uniform distribution over [1, 32]. Simulation values are the average of 64 Monte-Carlo simulations.

                Objects                                Pages
Trim      Mean                σ                 Mean                     σ
Amount    Analytic  Sim       Analytic  Sim     Analytic   Sim           Analytic  Sim
0.05      947.37    947.36    7.25      7.27    15631.58   15631.33      308.37    307.92
0.1       888.89    888.92    10.54     10.53   14666.67   14667.03      325.62    325.63
0.2       750.00    750.00    15.81     15.80   12375.00   12374.67      363.32    363.42
0.3       571.43    571.41    20.70     20.70   9428.57    9429.00       406.69    406.49
0.4       333.33    333.35    25.82     25.79   5500.00    5500.13       458.17    457.71
0.45      181.82    181.81    28.60     28.58   3000.00    2999.85       488.11    487.65

Table 6.3: Mean and σ of Number of In-Use Objects and Number of Valid Pages for Variable Sized Objects. Values in this table were computed with u = 1000. The size of the objects was drawn from a binomial distribution with parameters p = 0.4 and n = 32. Simulation values are the average of 64 Monte-Carlo simulations.

                Objects                                Pages
Trim      Mean                σ                 Mean                     σ
Amount    Analytic  Sim       Analytic  Sim     Analytic   Sim           Analytic  Sim
0.05      947.37    947.36    7.25      7.27    12126.32   12126.39      126.09    126.06
0.1       888.89    888.88    10.54     10.53   11377.78   11377.73      158.21    158.10
0.2       750.00    750.01    15.81     15.80   9600.00    9600.25       216.15    216.00
0.3       571.43    571.37    20.70     20.68   7314.29    7313.46       273.14    272.77
0.4       333.33    333.32    25.82     25.79   4266.67    4266.57       334.35    334.03
0.45      181.82    181.84    28.60     28.63   2327.27    2327.60       368.03    368.36


Figure 6.2: Comparison of analytic and simulation PDF for the number of valid pages when the size of objects is drawn from a discrete uniform distribution over [1, 32]. For this example, u = 1000 and q = 0.2.

6.5 Application to Write Amplification

In this section, we examine write amplification when one object requires a fixed number of pages to store data. We assume that the number of pages required to store the data associated with the object evenly divides the number of pages per block (i.e. mod(np, b) = 0), so that an entire object is stored within a single block. As in previous work, data is stored in the fashion of a log-structured file system with greedy garbage collection. We argue that fixed sized objects with sizes that evenly divide the number 68


Figure 6.3: Comparison of analytic and simulation PDF for the number of valid pages when the size of objects is drawn from a binomial distribution with parameters p = 0.4 and n = 32. For this example, u = 1000 and q = 0.2.

of pages per block is equivalent to reducing the number of pages in a block. For instance, if two objects can fit in one block, this is equivalent to having a block with two (large size) pages, and objects that require one of these (large) pages to store the associated data. With this equivalence between reduced number of pages per block and increased object size, the formulas for write amplification with Trim that we previously developed should apply. However, the Trim-modified Agarwal and the Trim-modified Xiang models do not take n_p into account! The derivation of the Trim-modified Agarwal model causes n_p to drop out naturally; it is not a good candidate for modification. The Trim-modified Xiang model is a better candidate, as it asymptotically removes n_p in the derivation. By eliminating this

asymptotic approximation, the write amplification is computed as np/y, where

y = n_p + \frac{-W\!\left(T_p(1 + \bar{s})\left(1 - \frac{1}{s(1 + \bar{s})u}\right)^{T_p(1 + \bar{s})} \ln\!\left(1 - \frac{1}{s(1 + \bar{s})u}\right)\right)}{\frac{T_p}{n_p}(1 + \bar{s})\,\ln\!\left(1 - \frac{1}{s(1 + \bar{s})u}\right)}    (6.3)

and W(·) is the Lambert W function [30, 15]. The Trim-modified Hu model does not need to be changed in order to take n_p into account, as it already does this. The model can be found in Figure 4.1 with modifications given in Equations 4.8 and 4.9. Results comparing the simulated write amplification and the computed write amplification are in Table 6.4 and Figure 6.4. In this simulation, there were 1280 blocks, with a manufacturer specified spare factor of 0.2, and a Trim factor of 0.1. As the number of pages per block changed, the effective spare factor was held constant at S_eff = 0.29 to enable comparisons. As expected, the write amplification was reduced as the number of pages per block decreased, with the minimum possible write amplification of 1 achieved with one page in a block. The modification of the Xiang model shown in Equation 6.3 was a very poor approximation as the number of pages in a block decreased; this is likely due to the assumption that the number of invalid pages freed in a block by garbage collection is binomial; with a small number of pages per block, this approximation becomes too coarse. However, the Trim-modified Hu model provides a reasonable, if optimistic, prediction of write amplification. Note that the Trim-modified Hu model does not make sense to compute if there is one page per block, so the entry in the table is left blank.
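Equation 6.3 can be evaluated directly with a Lambert W routine; the sketch below (illustrative names, q-to-s mapping assumed) reproduces the large-n_p behaviour reported in Table 6.4.

# Minimal sketch of Equation 6.3: the Xiang-style model with the asymptotic
# removal of n_p undone, so pages per block (equivalently object size) enters
# the prediction. Uses scipy's Lambert W.
import numpy as np
from scipy.special import lambertw

def A_modified_xiang(u, Tp, n_p, q):
    s = (1.0 - 2.0 * q) / (1.0 - q)
    s_bar = 1.0 - s
    a = 1.0 / (s * (1.0 + s_bar) * u)          # per-request invalidation probability
    ln1a = np.log1p(-a)                        # ln(1 - a)
    arg = Tp * (1.0 + s_bar) * (1.0 - a) ** (Tp * (1.0 + s_bar)) * ln1a
    y = n_p + float(np.real(-lambertw(arg))) / ((Tp / n_p) * (1.0 + s_bar) * ln1a)
    return n_p / y

# 1280 blocks, Sf = 0.2, q = 0.1, 256 pages per block (last row of Table 6.4):
n_p = 256
Tp = 1280 * n_p
print(A_modified_xiang(u=int(Tp * 0.8), Tp=Tp, n_p=n_p, q=0.1))   # ~1.94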

When objects do not fit neatly within blocks, as is the case for mod(np, b) ≠ 0, for variable sized objects, or for objects larger than one block, the computation of write amplification is not as straightforward. First, design decisions must be made: should an object be split over multiple blocks, or should empty, unwritten pages be left in a block, perhaps to be filled in later by a smaller object? The design decision will affect the analysis of the write amplification; this is an area for future research.

The models presented in this chapter provide utility in determining the performance of workloads with varying sizes of data to be written, not only for the standard file systems common today, but also for forward-looking object-based storage systems.

Table 6.4: Write amplification for varying number of pages per block, equivalent to fixed sized objects that evenly divide np. The level of effective overprovisioning was kept constant at Seff = 0.29 for each simulation. The amount of Trim was q = 0.1. All models referenced are Trim-modified; the Xiang model was additionally modified to account for np, as shown in Equation 6.3.

  Pages/Block, np   Simulation   Modified Xiang Model   Hu Model
        1              1.000            1.936               -
        2              1.191            1.937             1.000
        4              1.432            1.938             1.065
        8              1.631            1.938             1.274
       16              1.771            1.938             1.628
       32              1.853            1.938             1.732
       64              1.896            1.938             1.793
      128              1.918            1.938             1.828
      256              1.929            1.938             1.847


Figure 6.4: Write amplification versus pages per block for constant effective overprovisioning of Seff = 0.29. The modified Xiang model (Equation 6.3) fails when the number of pages per block is low. The Trim-modified Hu model provides an optimistic, but reasonable, prediction.

Understanding the effect of data size on performance is important for optimizing the use of currently available SSDs as well as for making design decisions for future generations of SSDs.

Chapter Acknowledgement

Chapter 6, in part, has been submitted for publication of the material as it may appear in "Solid State Disk Object-Based Storage with Trim Commands," T. Frankie, G. Hughes, K. Kreutz-Delgado, in IEEE Transactions on Computers, 2012. The dissertation author was the primary investigator and author of this paper.

Chapter 7

Conclusion

In this dissertation, we have developed an analytic model for the Trim command under a Trim-modified uniform random workload, along with variations to that workload including varying the size of the data records and the incorporation of hot and cold data. These changes to the workload allow the flexibility needed to model realistic workloads that are neither strictly sequential nor completely random.

We have shown that a small amount of LBA Trim requests results in a large gain in effective overprovisioning, which can be used to reduce the amount of physical overprovisioning to save costs. The steady state number of In-Use LBAs in the Trim-modified uniform random workload was shown to be well approximated as a Gaussian random variable, but the analysis allows for the inclusion of higher-order terms in order to compute higher order non-Gaussian corrections to moments such as skewness and kurtosis. These modifications provide a foundation for handling more complicated workloads, and the required analysis is surprisingly non-trivial.

Additionally, we have modified several methods for computing predictions of write amplification under standard uniform random workloads in log-structured flash-based Trim-enabled SSDs utilizing a greedy garbage collection policy. These modifications permit the computation of write amplification under a Trim-modified uniform random workload. We have compared analytic results to simulation results and shown that the Trim-modified Xiang model, A^(T)_Xiang, produces the best approximations when the data written occupies one page. Furthermore, we have shown that a modest amount of LBA Trim requests can significantly and predictably reduce SSD write amplification for a uniform random workload model with Trim requests. Compared to a non-Trim SSD, the amount of physical overprovisioning necessary can be reduced in a Trim-enabled SSD without affecting the write amplification; the size of the reduction is dependent on the proportion of Trim requests in the workload.

The Trim-modified uniform random workload was further extended to include various temperatures of data, allowing for closer approximation of real-world workloads. The write amplification under this workload with up to N data temperatures was analytically computed under the condition that each temperature is stored in a physically segregated fashion. For the case of N = 2, hot and cold data, the importance of separating different data temperatures in order to reduce write amplification was demonstrated. Additionally, an optimization problem of how to allocate overprovisioned blocks to different data temperatures was recognized, with the simulation solution for one set of parameters shown. Further work is needed to determine the optimal solution for all parameters.

The Trim-modified uniform random workload model was extended to encompass the idea of object-based storage and allow objects to utilize more than one page for data storage. A utilization model was derived and shown to be accurate via simulation. Additionally, models for write amplification were modified and compared to simulated values, demonstrating the failure point of one model and the need for a more accurate model in another, with A^(T)_Hu producing the best approximations when the data written occupies more than one page, especially as the size of the data approaches the amount of data that can be held by one block.

We initially used the standard assumption that one LBA requires one physical page in order to store the data [1, 2, 14, 15], which is equivalent to saying that all data records stored are the same size. In a real system, however, data records are likely to be of varying sizes, perhaps requiring even more than one physical block in order to be stored, which is why we incorporated the idea of object-based storage into the model. We also initially assumed that data was accessed uniformly randomly; the extension to hot and cold data, and N temperatures of data, relaxed this assumption to more closely match real systems in which some data is accessed frequently and other data is essentially read-only. Although extensions have been made to the model for objects and for hot and cold data, these extensions were made separately and have not yet been combined. Additionally, we assumed a constant rate of Trim that was not allowed to vary over time; relaxing this assumption would allow for a more realistic model of various workloads. It would be pragmatic to compare real workloads to the developed model in order to further identify where more flexibility is necessitated, but real-world SSD I/O traces are not yet easily available [7], making this an area for future research.

Models such as the ones presented in this dissertation are useful tools for understanding the problems that an FTL design needs to address, and can lead to improved future designs which show increased write performance and lifetime. These models provide an efficient means to optimize physical overprovisioning, which saves money on materials. This also has implications in the application and utilization of SSDs. For example, in a data center, knowledge of the amount of Trim requests in various user workloads can help operators to appropriately allocate resources to reduce storage latency due to write amplification. In a mobile device, a model of the influence of Trim on the performance of the storage can be exploited to maximize the user experience by improving the response time of popular applications.

Additionally, this model demonstrates the advantage of using object-based storage to improve processing speed by reducing write amplification because related data is stored on the same block. It also makes evident the advantages of using Trim properly, which helps to move the SSD interface forward. Until it is economically feasible in the marketplace to move to a new interface, SSDs will continue to be constrained by the legacy HDD interface with the slight modifications that are currently used, like Trim. Understanding how these modifications affect performance will lead to better designs for future interfaces.

We expect future research to focus on analyzing the benefits of the Trim command under different data placement and garbage collection policies. Additionally, we expect that commands similar to Trim will be proposed for RAID drives and other NAND flash interfaces, and anticipate that our models will be helpful in determining the potential impact such commands would have on write performance and lifetime. We foresee new file system proposals based on an improved understanding of the Trim command due to our model.
By using this model in conjunction with object-based storage, we envision metadata servers that separate object classes of distinct sizes to unique flash SSDs in order to assist in self-optimization of hardware, which would improve overall data access performance and improve the lifetime of the SSD.

Other researchers are already proposing radical new designs for flash devices. For example, Wang et al. explore the idea of a NAND flash device that has pages and blocks of varying sizes, dubbed heterogeneous flash [7]. Our analysis and simulation of the write amplification for workloads of varying sizes supports the utility of heterogeneous flash, as we have demonstrated the degree to which fewer pages per block reduces write amplification. Due to the flexibility of the model presented in this dissertation, we expect that manufacturers will use our model to optimize the design of NAND flash at the chip level as they create custom NAND flash SSDs with the specific sizes of pages and blocks optimized for distinct workloads, as demanded by customers looking to improve performance.

NAND flash SSDs are already playing an important role in storage, offering improvements in power consumption, small form factor, and speed. A better understanding of how to improve speed by reducing write amplification will make SSDs even more useful in both consumer and enterprise applications. The model presented in this dissertation provides a comprehensive analysis of the relationship between the use of the Trim command and the performance of an SSD, and paves the way for the conception of new ideas and algorithms that will lead to improved future SSD architectures and SSD interfaces.

Chapter Acknowledgements

Chapter 7, in part, is a reprint of the material as it appears in "A Mathematical Model of the Trim Command in NAND-Flash SSDs," T. Frankie, G. Hughes, K. Kreutz-Delgado, in Proceedings of the 50th Annual Southeast Regional Conference, ACM, 2012. The dissertation author was the primary investigator and author of this paper. "The Effect of Trim Requests on Write Amplification in Solid State Drives," T. Frankie, G. Hughes, K. Kreutz-Delgado, in Proceedings of the IASTED-CIIT Conference, IASTED, 2012. The dissertation author was the primary investigator and author of this paper.

Chapter 7, in part, has been submitted for publication of the material as it may appear in "Analysis of Trim Commands on Overprovisioning and Write Amplification in Solid State Drives," T. Frankie, G. Hughes, K. Kreutz-Delgado, in ACM Transactions on Storage (TOS), 2012. The dissertation author was the primary investigator and author of this paper. "Solid State Disk Object-Based Storage with Trim Commands," T. Frankie, G. Hughes, K. Kreutz-Delgado, in IEEE Transactions on Computers, 2012. The dissertation author was the primary investigator and author of this paper.

Appendix A

Asymptotic Expansion Analysis

In this Appendix, we include a more detailed higher-order asymptotic ex- pansion analysis that was omitted from the main text in order to present an un- obstructed report of our research. We start by reviewing some mathematical the- orems and facts that are useful in the subsequent derivations, then complete the full asymptotic expansion analysis. We use the series expansion of Chapter 3 to provide a family of truncated approximations to the Stirling PDF in the region of convergence. The third order truncation is used to approximate the mean, variance, skewness, and kurtosis. We then rigorously prove that in the region of convergence of this series, the Stirling PDF converges to a Gaussian distribution in the limit as u → ∞, and heuristically argue that as a consequence of the con- vergence to a known Gaussian PDF, all of the approximation PDFs obtained via a truncation of the Taylor series of the Stirling PDF should become highly accurate in the limit as u → ∞.

A.1 Useful Theorems and Facts

Here we state some well-known and useful theorems and facts that were used in the derivations above. Recall that every Riemann integrable function is Lebesgue integrable.


Theorem A.1.1. A necessary and sufficient condition for a function f(x) to be Riemann integrable on a finite interval [α, β] ⊂ R is that the function be bounded and piecewise continuous on that interval.

The Riemann integral is not defined on the infinite interval (−∞, ∞) = R, in which case one defines the improper Riemann integral as the limit

\int_{-\infty}^{\infty} f(x)\,dx = \lim_{0 < \alpha \to \infty} \int_{-\alpha}^{\alpha} f(x)\,dx ,    (A.1)

when the limit exists. Not every improper Riemann integral on R is a Lebesgue integral, but when it is, the use of the limit of a sequence of Riemann integrals as in Equation A.1 is a procedure commonly used to compute the Lebesgue integral.

Theorem A.1.2. A sufficient condition for the improper Riemann integral Equation A.1 both to exist and to be a Lebesgue integral is that f(x) be absolutely integrable on the interval [−α, α],

\int_{-\alpha}^{\alpha} |f(x)|\,dx < \infty ,

and that this integral be bounded as α → ∞.

A merit of the Lebesgue measure-theoretic formulation of integration theory is that discrete sums, \sum_i f(x_i), are placed on an equal footing with integrals, \int f(x)\,d\mu(x), via the use of counting measure. Thus, sufficient conditions which ensure that one can interchange the operations of limit and integral/sum,¹

\lim_{\theta \to \infty} \int_{-\alpha}^{\alpha} f(x; \theta)\,dx = \int_{-\alpha}^{\alpha} \lim_{\theta \to \infty} f(x; \theta)\,dx    (A.2)

or

\lim_{\theta \to \infty} \sum_{i=1}^{\infty} f(x_i; \theta) = \sum_{i=1}^{\infty} \lim_{\theta \to \infty} f(x_i; \theta) ,    (A.3)

are essentially equivalent. In particular, a sufficient condition for the interchange to hold when the integrals on both sides of the equality are Lebesgue integrals is provided by the Lebesgue Dominated Convergence Theorem (DCT):

[Footnote 1: Note that θ is here taken to be continuous and positive. When we say θ → ∞, this is shorthand for the statement that θ_n → ∞ for every possible unbounded and countable subset, {θ_n | θ_{n+1} > θ_n, n = 1, 2, 3, ···}, of the positive reals. In the interest of time and space, we are glossing over this fine point here.]

Theorem A.1.3. (DCT) Let f(x; θ) be Lebesgue integrable for each θ on an interval [−α, α], for α ∈ (0, ∞) = R+ (i.e., including the case α → ∞). A sufficient condition for the interchange of the operations of limit and integral as shown in Equation A.2 is that f(x; θ) be majorized by an absolutely Lebesgue integrable function g(x),

|f(x; \theta)| \le g(x)  ∀ θ > 0 and ∀ x ∈ [−α, α] ,   and   \int_{-\alpha}^{\alpha} |g(x)|\,dx < \infty .

The DCT is a powerful theorem provided by the Lebesgue theory of integration.² The equivalent sufficient condition for the interchange of limits and sums as shown in Equation A.3 is

|f(x_i; \theta)| \le g(x_i)  ∀ θ > 0 and ∀ i = 1, 2, ··· ,   and   \sum_{i=1}^{\infty} |g(x_i)| < \infty .

A classic example where the interchange of limit and integral does not hold is found for the Gaussian distribution shown in Equation A.22. We have for every α > 0,³

1 = \lim_{\sigma \to 0} \int_{-\alpha}^{\alpha} \phi_X(x; 0, \sigma^2)\,dx \;\neq\; \int_{-\alpha}^{\alpha} \lim_{\sigma \to 0} \phi_X(x; 0, \sigma^2)\,dx = \text{undefined} .    (A.4)

For the general Gaussian density

\phi_X(x) = \phi_X(x; m, \sigma^2) \triangleq \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^2} ,

the integral \int_{-\alpha}^{\alpha} \phi_X(x)\,dx exists as both a Riemann and Lebesgue integral. Further, the integral

\int_{-\infty}^{\infty} \phi_X(x)\,dx = \lim_{\alpha \to \infty} \int_{-\alpha}^{\alpha} \phi_X(x)\,dx = 1

exists in both the (improper) Riemann and Lebesgue sense. More generally,

\int_{-\alpha}^{\beta} \phi_X(x)\,dx   and   \Phi_X(\alpha) = \int_{-\infty}^{\alpha} \phi_X(x)\,dx

both exist as (proper and improper, respectively) Riemann and Lebesgue integrals for all α and β, where Φ_X(α) is a strictly monotonically increasing function of α with the properties

\lim_{\alpha \to -\infty} \Phi_X(\alpha) = 0 ,   \Phi_X(m) = \frac{1}{2} ,   and   \lim_{\alpha \to \infty} \Phi_X(\alpha) = 1 .

Finally, note that because of the monotonicity property of Φ_X(α),

\Phi_X(\alpha) < \Phi_X(m) = \frac{1}{2}   for all α < m .

[Footnote 2: Recall that every Riemann integrable function (necessarily defined on a finite interval) is Lebesgue integrable, so that the DCT applies to all Riemann integrals. Not every improper Riemann integral is Lebesgue integrable, however, so the DCT does not apply to all improper Riemann integrals. In practice, one exploits the relationships, and equivalences (when they exist), between the Riemann, Lebesgue, and Stieltjes (and other) integrals, and utilizes the particular integral which is most convenient to use at any moment.]
[Footnote 3: Note that σ → 0 ⟺ θ = σ^{-1} → ∞.]

A.2 Asymptotic Expansion Analysis

Although an exact analytic form for the normalization factor of the Stirling PDF (Equation 3.2, derived in Chapter 3) is not known, Monte Carlo simulations show that it is extremely accurate, as expected from the fact that the theory of asymptotic expansions (and actual practice) shows that the Stirling approximation has very fast convergence in the size of u. In this section we are interested in the development of good approximations to Equation 3.2 that will allow for accurate computations of the mean, variance, skew, and kurtosis of the asymptotic number of In-Use LBAs of the PDF P_X(x), the normalization of π_x. It is important to realize that the expression

\log P_X(x) \sim (u - x)\left(1 - \log\frac{u - x}{u\bar{s}}\right) ,   \bar{s} = 1 - s ,    (A.5)

is only appropriate for values of x that are physically realizable. The correct statement is that the Stirling PDF is given by

\log P_X(x) \sim (u - x)\left(1 - \log\frac{u - x}{u\bar{s}}\right)   for 0 ≤ x ≤ u,   with P_X(x) = 0 for x < 0 and for x > u.    (A.6)

In practice one effectively works in the limit that u → ∞, in which case the probability that one sees a value of x near the boundary points becomes zero, which is why it is common to refer to (A.5) in lieu of (A.6). In the sequel, however, we will need to maintain the distinction at various points of the development. We are interested in values of s for which the feasible region

0 ≤ x ≤ u    (A.7)

is contained within the region of convergence (see Chapter 3 for a discussion of the region of convergence). It is evident that this corresponds to values s ≤ 0.5, which places constraints on the parameters of our SSD workload model. Fortunately, it is argued in Chapter 3 that this condition can always be satisfied in the region of large values of u (e.g., as u → ∞). Henceforth, we assume s ≤ 0.5 so that the feasible region (A.7) is contained in the region of convergence.⁴ Note that our assumption is equivalent to

\frac{s}{\bar{s}} \le 1 ,    (A.8)

a fact which will be used in the sequel.

[Footnote 4: This assumption can be relaxed. It has been made to reduce the notational overhead.]

A.3 An Accurate Gaussian Approximation

Define the quantities m = us and σ = \sqrt{u\bar{s}}. The right-hand side of Equation 3.2 can be expanded in a formal Taylor series about the point m, yielding

\log P_X(x) \sim (u - x)\left(1 - \log\frac{u - x}{u\bar{s}}\right)    (A.9)
            = (u - x) - (u - x)\log\frac{u - x}{u\bar{s}}    (A.10)
            = \sigma^2(1 - \alpha) - \sigma^2(1 - \alpha)\log(1 - \alpha)    (A.11)
            = -\sigma^2 \sum_{n=1}^{\infty} \frac{\alpha^{n+1}}{n(n + 1)}    (A.12)
            \sim -\alpha\sigma^2 + \log\left((1 - \alpha)^{-\sigma^2(1-\alpha)}\right) ,    (A.13)

for −1 ≤ α ≤ 1. This shows that in the feasible region x ∈ [0, u], the Stirling PDF is given by

P_X(x) \sim e^{-\alpha(x)\sigma^2}\,\left(1 - \alpha(x)\right)^{-\sigma^2(1-\alpha(x))} \sim e^{-\sigma^2 \sum_{n=1}^{\infty} \frac{\alpha^{n+1}(x)}{n(n+1)}}    (A.14)

for

\alpha(x) = \frac{x - m}{\sigma^2} ,   m = us ,   \sigma^2 = u\bar{s} ,   \bar{s} = 1 - s   and   |\alpha(x)| \le 1 .    (A.15)

Note that we can write the Stirling PDF expansion

\log P_X(x) \sim -\sigma^2 \sum_{n=1}^{\infty} \frac{\alpha^{n+1}(x)}{n(n + 1)}    (A.16)

as

\log P_X(x) \sim -\sum_{n=1}^{\infty} \frac{1}{n(n + 1)\,\sigma^{n-1}} \left(\frac{x - m}{\sigma}\right)^{n+1}    (A.17)

or

\log P_X(x) \sim -\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2 - \frac{1}{6\sigma}\left(\frac{x - m}{\sigma}\right)^3 - \frac{1}{12\sigma^2}\left(\frac{x - m}{\sigma}\right)^4 - \cdots    (A.18)

which is convergent for

|\alpha(x)| = \frac{1}{\sigma}\left|\frac{x - m}{\sigma}\right| \le 1 .    (A.19)

Defining

z = z(x) = \frac{x - m}{\sigma} = \sigma\alpha(x)

we can write this formal series simply as

\log P_X(x) \sim -\frac{z^2(x)}{2} - \frac{z^3(x)}{6\sigma} - \frac{z^4(x)}{12\sigma^2} - \cdots .    (A.20)

Although the RHS of Equation A.20 is not globally convergent, convergence will hold for z-values within some region of convergence which is generally dependent on the specific value of σ (e.g., for all values of z such that |z| < 1 if σ > 1). Further, all possible truncations of the formal series shown in Equation A.20 agree in value with the RHS of Equation 3.2 in the limit z → 0 (equivalently, in the limit x → m) for any fixed value of σ. Writing the formal expansion in the equivalent form

P_X(x) \sim e^{-\frac{z^2(x)}{2}}\, e^{-\left(\frac{z^3(x)}{6\sigma} + \frac{z^4(x)}{12\sigma^2}\right) - \cdots}    (A.21)

we see that it is entirely plausible that the "damage" done when estimating P_X(x) of Equation 3.2 via a truncation of the formal series shown in Equation A.20 is limited for z-values outside of the region of convergence due to the exponential attenuation caused by the factor e^{-z^2(x)/2}. This is, in fact, true and can be made rigorous via an application of the theory of asymptotic expansions [34]. In particular, the theory of asymptotic expansions explains the surprising fact that the lowest order truncation can be quite good, and often even optimal. In this case, we obtain the Gaussian approximation,

P_X(x) = \phi_X(x) = \phi_X(x; m, \sigma^2) \triangleq \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^2} .    (A.22)

This approximation is extremely accurate in the high u limit. Indeed, the mean m = us and variance σ² = us̄ always predict the empirically estimated mean and variance in Monte Carlo simulations with high accuracy, even for relatively small values of u. In the limit of very high values of u, the empirical estimates of the skew and (relative) kurtosis are both essentially zero, which is consistent with the fact that the resulting empirical PDF is indistinguishable from the Gaussian distribution.

Within the region of convergence, the truncation of the power series to an arbitrary order ℓ ≥ 1 given by

\log P_X^{(\ell)}(x) \sim -\sigma^2 \sum_{n=1}^{\ell} \frac{\alpha^{n+1}(x)}{n(n + 1)} ,    (A.23)

produces a family of approximations to the Stirling PDF whose members are indexed by ℓ and whose accuracy increases with the size of ℓ (i.e., the number of terms kept in the series approximation). Only odd numbered truncations are admissible, as the resulting approximations must conform to the constraint that admissible PDFs are normalizable (to one).⁵ Note from Equation (A.18) that the 1st order truncation yields a Gaussian approximation to the Stirling PDF,

\log P_X^{(1)}(x) = \log \phi_X(x) \sim -\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2 ,    (A.24)

whereas the 3rd order truncation yields the approximation,

\log P_X^{(3)}(x) \sim -\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2 - \frac{1}{6\sigma}\left(\frac{x - m}{\sigma}\right)^3 - \frac{1}{12\sigma^2}\left(\frac{x - m}{\sigma}\right)^4 .    (A.25)

Note that while the different truncation PDFs can be rank ordered on the region of convergence in terms of greater or lesser accuracy relative to the Stirling PDF, each one has a misspecification error (value of the truncation probability relative to the value of the Stirling PDF) that goes to zero as α(x) → 0. In particular, for values of x very close to m relative to the size of σ, both the misspecification error of a low-order approximation will be small and the stability condition A.19 will be satisfied. This is an important insight which we will later use to show that the probability of misspecifying the value of the Stirling PDF of x will go to zero as u → ∞.

[Footnote 5: This seems counterintuitive at first as the feasible value of x is bounded from above by u. In an asymptotic analysis, however, we are ultimately interested in the limit as u → ∞.]
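As a quick sanity check of the Gaussian approximation (A.22), the sketch below numerically normalizes the Stirling log-PDF of Equation A.5 over the feasible region and compares it with N(m = us, σ² = us̄). The parameter values u = 1000 and s = 0.4 are illustrative choices (satisfying s ≤ 0.5), not values taken from the simulations reported in the main text.

```python
# Illustrative comparison (assumed parameter values) of the normalized Stirling
# PDF, exp{(u - x)(1 - log[(u - x)/(u*sbar)])}, with the Gaussian approximation
# of Equation A.22, N(m = u*s, sigma^2 = u*sbar).
import numpy as np

u, s = 1000, 0.4
sbar = 1.0 - s
m, var = u * s, u * sbar

x = np.arange(0, u)                       # feasible values 0 <= x < u (x = u carries no mass)
log_p = (u - x) * (1.0 - np.log((u - x) / (u * sbar)))   # unnormalized Stirling log-PDF
p = np.exp(log_p - log_p.max())
p /= p.sum()                              # normalize over the feasible grid

gauss = np.exp(-0.5 * (x - m) ** 2 / var) / np.sqrt(2 * np.pi * var)
gauss /= gauss.sum()                      # normalized on the same grid for a fair comparison

print("max |Stirling - Gaussian|:", np.abs(p - gauss).max())
print("Stirling mean vs m       :", (x * p).sum(), m)
print("Stirling var  vs sigma^2 :", ((x - (x * p).sum()) ** 2 * p).sum(), var)
```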

A.4 A Higher Order PDF Approximation

As mentioned above, the mean and variance predicted by the Gaussian density in Equation A.22 are always observed to be accurate in practice. Further, the empirically computed density for x is indistinguishable from the Gaussian of Equation A.22 for high values of u. For intermediate values of u, however, slight deviations from Gaussianity are observed. These discrepancies are captured in

small, but significant, nonzero values of the empirically computed skew and kurtosis. As mentioned previously, in the limit of high u the empirically computed skew and kurtosis values do both vanish. We want to construct a model of the density that is valid for both the intermediate and high u-value limit. We do this by using all three terms in the expansion shown on the RHS of Equation A.20. All of the terms are used to a) ensure that the resulting function is integrable in order to find the normalizing factor required of a proper density function, and b) ensure that the leading factor in Equation A.21 attenuates the second factor in the limit of x → ±∞ in order to attenuate the error in the PDF tails that would otherwise be caused by the asymptotic approximation. If only the first two terms are used, the resulting PDF will not converge in the limit x → −∞. The behavior of the third term dominates that of the second term, and its presence ensures that we have constructed an integrable PDF that takes the value 0 in the limit as x → ±∞. The proposed PDF of the random variable X is therefore taken to be

P_X(x) = \frac{1}{C}\, e^{-\frac{z^2(x)}{2}}\, e^{-\left(\frac{z^3(x)}{6\sigma} + \frac{z^4(x)}{12\sigma^2}\right)}    (A.26)

where the constant normalization factor C = C(σ) (also known as the partition function) is determined from the condition that the PDF P_X(x) integrates to the value 1, assuming that the density is indeed integrable.⁶ One can readily determine that the second exponential factor in Equation A.26 is bounded from above by its maximum value of e^{9\sigma^2/64}, attained at z = −1.5σ, which guarantees that P_X(x) is integrable for every finite value of the parameter σ² = us̄.

[Footnote 6: Integrability, and hence the existence of the normalizing constant C, is further discussed below and proved in Section A.1.]

In the sequel, an expectation of a function f(X) with respect to the PDF Equation A.26 will be denoted as E_P\{f(X)\}. We also denote the unit normal (Gaussian) PDF by P_Z(z) = \phi_0(z),

\phi_0(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}} ,

and denote the expectation of a function g(Z) with respect to the unit normal density by E_{\phi_0}\{g(Z)\}. Note the relationship between the unit normal φ_0 and the general Gaussian distribution φ_X shown earlier in Equation A.22,

\phi_0(z(x)) = \sigma\, \phi_X(x)   for   z = z(x) = \frac{x - m}{\sigma} = \sigma\alpha(x) .

We also use the well-known fact that for a unit normal random variable [35]

E_{\phi_0}\{Z^p\} = (p - 1)!!    (A.27)

where the double factorial is defined for integers n = 1, 2, 3, ··· by n!! = 0 for n even, and n!! = n(n − 2)(n − 4) ··· (1) for n odd. (For example: 7!! = 7 · 5 · 3 · 1, 4!! = 0, 1!! = 1, etc.)
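The sketch below is a small numerical illustration (not part of the dissertation) of the moment identity (A.27) together with the double-factorial convention above: quadrature estimates of E[Z^p] for a unit normal Z are compared against (p − 1)!!.

```python
# Illustrative check of E[Z^p] = (p-1)!! for a unit normal Z, using the
# double-factorial convention above (n!! = 0 for n even, including n = 0).
import numpy as np
from scipy import integrate

def double_factorial(n):
    """n!! with the convention used in the text: 0 for even n, n(n-2)...(1) for odd n."""
    if n % 2 == 0:
        return 0
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def gaussian_moment(p):
    """E[Z^p] for Z ~ N(0, 1), estimated by numerical quadrature."""
    integrand = lambda z: z ** p * np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)
    val, _ = integrate.quad(integrand, -np.inf, np.inf)
    return val

if __name__ == "__main__":
    for p in range(1, 9):
        print(f"p = {p}   E[Z^p] ~ {gaussian_moment(p):8.4f}   (p-1)!! = {double_factorial(p - 1)}")
```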

We can write the PDF in Equation A.26 as

P_X(x) = \frac{\sqrt{2\pi}}{C}\, \phi_0[z(x)]\, e^{-\left(\frac{z^3(x)}{6\sigma} + \frac{z^4(x)}{12\sigma^2}\right)} .    (A.28)

Inspection of Equation A.28 indicates at least two qualitative properties we can expect of P_X(x). First, for values of σ ≫ 1 (the standard deviation of the unit normal density), we can expect that the (exponential) factor on the right will be essentially constant (and close to 1 in value) for values of z ranging over many standard deviations of the unit normal (at which point the unit normal density will have a virtually zero value that will zero out any non-Gaussianity caused by the exponential factor), a fact consistent with the observation that for even moderately large values of σ² = us̄ the empirical behavior of x is observed to be essentially Gaussian. Second, any observed skew (asymmetry) must be entirely due to the z³ term which, if discernible in the computer experiments, should tend to skew the density to the left, as indicated by a negative skewness.

Note that

z = \frac{x - m}{\sigma} \iff x = m + \sigma z \implies dx = \sigma\, dz .    (A.29)

This means (as a simple consequence of the change of variables formula for random variables) that for any composite function of the form (f ∘ z)(x) = f(z(x)) we have

E_{\phi_X}\{f(Z(X))\} = E_{\phi_0}\{f(Z)\} .    (A.30)

From Equations A.16 and A.29 we have that for feasible values x ∈ [0, u] the Stirling PDF is given by

\log P_X(x; \sigma) \sim -\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2 - \sum_{n=2}^{\infty} \frac{1}{n(n + 1)}\left(\frac{z(x)}{\sigma}\right)^{n+1} .    (A.31)

This yields

P_X(x; \sigma) = \frac{1}{\Delta(\sigma)}\, \phi_X(x; \sigma)\, g(z(x); \sigma) = \frac{1}{\sigma\,\Delta(\sigma)}\, \phi_0(z(x))\, g(z(x); \sigma)    (A.32)

where

g(z(x); \sigma) \triangleq \begin{cases} 0 & x > u \\ e^{-\sum_{n=2}^{\infty} \frac{1}{n(n+1)}\left(\frac{z(x)}{\sigma}\right)^{n+1}} & 0 \le x \le u \\ 0 & x < 0 \end{cases}    (A.33)

Note that Equations A.32 and A.33 are defined for all real values x ∈ R. Equations A.28 and A.29 yield the very useful fact that

E_P\{g(z(X))\} = \frac{\sqrt{2\pi\sigma^2}}{C}\, E_{\phi_0}\left\{ g(Z)\, e^{-\left(\frac{Z^3}{6\sigma} + \frac{Z^4}{12\sigma^2}\right)} \right\} .    (A.34)

In particular, the normalization requirement for the PDF P_X(x) is 1 = E_P\{1\}, which yields for the truncated case

C = \sqrt{2\pi\sigma^2}\, E_{\phi_0}\left\{ e^{-\left(\frac{Z^3}{6\sigma} + \frac{Z^4}{12\sigma^2}\right)} \right\} \triangleq \sqrt{2\pi\sigma^2}\, \Delta(\sigma) ,    (A.35)

allowing us to rewrite Equation A.34 as

E_P\{g(z(X))\} = \frac{1}{\Delta(\sigma)}\, E_{\phi_0}\left\{ g(Z)\, e^{-\left(\frac{Z^3}{6\sigma} + \frac{Z^4}{12\sigma^2}\right)} \right\}    (A.36)

with

\Delta(\sigma) \triangleq E_{\phi_0}\left\{ e^{-\left(\frac{Z^3}{6\sigma} + \frac{Z^4}{12\sigma^2}\right)} \right\} ,    (A.37)

or in the more generalized case

1 = \int_{-\infty}^{\infty} \frac{1}{\Delta(\sigma)}\, \phi_X(x; \sigma)\, g(z(x); \sigma)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sigma\,\Delta(\sigma)}\, \phi_0(z(x))\, g(z(x); \sigma)\,dx = \int_{-\infty}^{\infty} \frac{1}{\Delta(\sigma)}\, \phi_0(z)\, g(z; \sigma)\,dz    (A.38)

or

\Delta(\sigma) = E_{\phi_0}\{g(Z; \sigma)\} = \int_{-\infty}^{\infty} g(z; \sigma)\, \phi_0(z)\,dz .    (A.39)

Note that g(z; σ)φ0(z) is positive and piecewise continuous. If g(z; σ) is bounded as a function of z,

0 ≤ g(z; σ) ≤ M < ∞, then we have

g(z; \sigma)\,\phi_0(z) \le \frac{1}{\sqrt{2\pi}}\, M(\sigma) < \infty ,

showing that g(z; σ)φ_0(z) is absolutely integrable and therefore that the integral defining Δ(σ) is guaranteed to exist. Even more can be said: if we can show that the bound M < ∞ is independent of σ, then Mφ_0(z) is an absolutely Lebesgue integrable majorizing function for g(z; σ)φ_0(z), the Lebesgue Dominated Convergence Theorem (DCT) applies, and the limit and integration operations commute,

\lim_{\sigma \to \infty} \Delta(\sigma) = \lim_{\sigma \to \infty} \int_{-\infty}^{\infty} g(z; \sigma)\, \phi_0(z)\,dz = \int_{-\infty}^{\infty} \lim_{\sigma \to \infty} g(z; \sigma)\, \phi_0(z)\,dz .    (A.40)

To determine a bound M is straightforward once we recognize that Equation (A.33) is equivalent to

g(z; \sigma) \triangleq \begin{cases} 0 & z > \sigma \\ e^{-\sum_{n=2}^{\infty} \frac{1}{n(n+1)}\left(\frac{z}{\sigma}\right)^{n+1}} & -\frac{s}{\bar{s}}\sigma \le z \le \sigma \\ 0 & z < -\frac{s}{\bar{s}}\sigma \end{cases}    (A.41)

We have for all z ∈ R,

g(z; \sigma) \le \max_z g(z; \sigma)
           = \max_{-\frac{s}{\bar{s}}\sigma \le z \le \sigma} e^{-\sum_{n=2}^{\infty} \frac{1}{n(n+1)}\left(\frac{z}{\sigma}\right)^{n+1}}
           \le \max_{|z| \le \sigma} e^{-\sum_{n=2}^{\infty} \frac{1}{n(n+1)}\left(\frac{z}{\sigma}\right)^{n+1}}
           = e^{-\min_{|z| \le \sigma} \sum_{n=2}^{\infty} \frac{1}{n(n+1)}\left(\frac{z}{\sigma}\right)^{n+1}}
           \le e^{\sum_{n=2}^{\infty} \frac{1}{n(n+1)}} = e^{\frac{1}{2}} = \sqrt{e} = M < \infty ,

using the fact that \sum_{n=1}^{\infty} \frac{1}{n(n+1)} = 1. Thus the normalizing factor Δ(σ) exists. Indeed, we immediately have that for all σ² = us̄,

\Delta(\sigma) = E_{\phi_0}\{g(Z; \sigma)\} \le E_{\phi_0}\{M\} = M = \sqrt{e} \approx 1.649 < \infty .

To determine the limiting value of Δ(σ) from (A.40), note that in the limit σ → ∞, for each fixed value of z (this is legitimate, as the DCT allows us to pass the limit through the integral) we have

g(z; \sigma) \to e^{-\sum_{n=2}^{\infty} \frac{1}{n(n+1)}\left(\frac{z}{\sigma}\right)^{n+1}} \to 1 ,

showing that

\lim_{\sigma \to \infty} \Delta(\sigma) = 1 .

The normalization constant C = C(σ) exists and is finite (i.e., the density Equation A.26 exists) for every σ > 0 if and only if Δ(σ) exists and is finite for every σ > 0. The existence of Δ(σ) has just been shown, as well as the important fact that

\lim_{\sigma \to \infty} \Delta(\sigma) = 1 .    (A.42)

It is evident from Equation A.35 that Δ(σ) gives a measure of the degree to which the normalization factor C deviates from the Gaussian normalization factor \sqrt{2\pi\sigma^2}.

Indeed, as discussed further in this Appendix, it is reasonable to interpret ∆(σ) as providing a measure of the lack of Gaussianity of the density PX (x) because of the important fact shown there that

PX (x) → φX (x) ⇐⇒ ∆ → 1 (A.43)

Finally, note that Equations A.42 and A.43 imply that

σ → ∞ =⇒ PX (x) → φX (x) (A.44)

Using the fact that the series expansion for the exponential function

e^y = \sum_{k=0}^{\infty} \frac{y^k}{k!}

is absolutely convergent for all values of y allows us to usefully expand

e^{-\left(\frac{z^3}{6\sigma} + \frac{z^4}{12\sigma^2}\right)} = e^{-\frac{z^3}{6\sigma}\left(1 + \frac{z}{2\sigma}\right)}

for a variety of purposes, including the calculation of an approximate value of Δ(σ) (and hence also of the normalization factor C via the relationship in Equation A.35). The expansion yields⁷

e^{-\left(\frac{z^3}{6\sigma} + \frac{z^4}{12\sigma^2}\right)} = 1 - \frac{z^3}{6\sigma} - \frac{z^4}{12\sigma^2} + \frac{z^6}{72\sigma^2} + \frac{z^7}{72\sigma^3} + \frac{z^8}{288\sigma^4} - \frac{z^9}{1296\sigma^3} - \frac{z^{10}}{864\sigma^4} + \frac{z^{12}}{31104\sigma^4} + R\!\left(\frac{1}{\sigma^5}\right)    (A.45)

or

e^{-\left(\frac{z^3}{6\sigma} + \frac{z^4}{12\sigma^2}\right)} = 1 - \frac{z^3}{6\sigma} + \frac{z^6 - 6z^4}{72\sigma^2} - \frac{z^9 - 18z^7}{1296\sigma^3} + \frac{z^{12} - 36z^{10} + 108z^8}{31104\sigma^4} + R\!\left(\frac{1}{\sigma^5}\right) ,    (A.46)

where R(1/σ⁵) is defined to mean that the remaining terms are at least as small as O(1/σ⁵), in terms of the "big O" terminology.⁸ Note that in terms of the "little o" terminology, R(1/σ⁵) ∼ o(1/σ⁴). Because of the property in Equation A.27 it is very useful to note that the terms in the expansion having even powers of σ have only even powers of z in the numerator, while the terms having odd powers of σ contain only odd powers of z.

From Equations A.27, A.35, A.37, and A.46 it is quite straightforward to compute an approximate value for the term Δ(σ) defined in Equation A.37. We have

\Delta(\sigma) = E_{\phi_0}\left\{ 1 - \frac{Z^3}{6\sigma} + \frac{Z^6 - 6Z^4}{72\sigma^2} - \frac{Z^9 - 18Z^7}{1296\sigma^3} + R\!\left(\frac{1}{\sigma^4}\right) \right\}
              = 1 - 0 + \frac{5!! - 6(3!!)}{72\sigma^2} - 0 + R\!\left(\frac{1}{\sigma^4}\right)

or

\Delta(\sigma) = 1 - \frac{1}{24\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right) .    (A.47)

[Footnote 7: This expansion converges for every fixed value of z. However, any finite truncation of the expansion to some specific power of 1/σ generally will yield an arbitrarily large approximation error if z is allowed to grow without bound. Fortunately, because of the nice structural form of P_X(x) – in particular, because of the existence of the term φ_0(z) – we can treat the values of z in the expansion as being "essentially bounded". This is because values of φ_0(z) for large z-values (values that are more than a few unit standard deviations from the unit normal mean value of zero) are "essentially zero" and don't contribute appreciably to the value of the density P_X(x) nor to any integration (i.e., expectation) that uses the density.]
[Footnote 8: For a discussion of the "big O" and "little o" terminologies see [36].]
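The expansion (A.47) is easy to verify numerically. The sketch below (illustrative only; the σ values are arbitrary) evaluates Δ(σ) of Equation A.37 by quadrature and compares it with 1 − 1/(24σ²).

```python
# Illustrative numerical check of Equation A.47: Delta(sigma) from Equation A.37
# versus its second-order expansion 1 - 1/(24 sigma^2).
import numpy as np
from scipy import integrate

def delta(sigma):
    """Delta(sigma) = E_phi0[exp(-(Z^3/(6 sigma) + Z^4/(12 sigma^2)))], by quadrature."""
    integrand = lambda z: np.exp(-(z ** 2 / 2 + z ** 3 / (6 * sigma) + z ** 4 / (12 * sigma ** 2))) / np.sqrt(2 * np.pi)
    val, _ = integrate.quad(integrand, -np.inf, np.inf)
    return val

if __name__ == "__main__":
    for sigma in (5.0, 10.0, 20.0, 40.0):
        print(f"sigma = {sigma:5.1f}   Delta = {delta(sigma):.8f}   1 - 1/(24 sigma^2) = {1 - 1/(24 * sigma**2):.8f}")
```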

This results in a normalization constant approximation of⁹

C = \Delta(\sigma)\sqrt{2\pi\sigma^2} = \sqrt{2\pi\sigma^2}\left(1 - \frac{1}{24\sigma^2} + R\!\left(\frac{1}{\sigma^3}\right)\right) .    (A.48)

[Footnote 9: A more careful derivation shows that the next highest order estimate of Δ(σ) is
\Delta(\sigma) = 1 - \frac{1}{24\sigma^2} - \frac{117}{2592\sigma^4} + R\!\left(\frac{1}{\sigma^6}\right) ,
indicating that Δ(σ), and therefore the normalization constant C, appears to converge quickly as a function of σ² = us̄.]

Summarizing the results of the above derivations, we have the proposed

PDF¹⁰

P_X(x) = \frac{1}{\sigma\,\Delta(\sigma)}\, \phi_0[z(x)]\, e^{-\left(\frac{z^3(x)}{6\sigma} + \frac{z^4(x)}{12\sigma^2}\right)} = \frac{1}{\Delta(\sigma)}\, \phi_X(x)\, e^{-\left(\frac{z^3(x)}{6\sigma} + \frac{z^4(x)}{12\sigma^2}\right)}    (A.49)

with an R(1/σ⁴) = R(1/u²) ∼ o(1/σ³) ∼ o(1/u^{3/2}) expansion for Δ(σ) given by Equation A.47. Note that as u → ∞, we have both

\Delta(\sigma) \to 1

and, for each fixed value of x,

e^{-\left(\frac{z^3(x)}{6\sigma} + \frac{z^4(x)}{12\sigma^2}\right)} \to 1 ,

so that, for each fixed x, P_X(x) → φ_X(x) as u → ∞.

[Footnote 10: Unless Δ(σ) has been approximated, this statement is the exact definition of the proposed PDF. Of course, the proposed PDF has been introduced as a candidate accurate approximation to the Stirling approximation-based PDF given by Equation 3.2. Also recall the distinction between φ_0 and φ_X.]

A.4.1 Approximating the Mean

Denote the means of the random variables X and Z by

m_x = E_P\{X\}   and   m_z = E_P\{Z\} ,

and note that

m_x = m + \sigma m_z .

We have from Equations A.36 and A.46 that

m_z = \frac{1}{\Delta(\sigma)}\, E_{\phi_0}\left\{ Z\, e^{-\left(\frac{Z^3}{6\sigma} + \frac{Z^4}{12\sigma^2}\right)} \right\}
    = \frac{1}{\Delta(\sigma)}\, E_{\phi_0}\left\{ Z - \frac{Z^4}{6\sigma} + \frac{Z^7 - 6Z^5}{72\sigma^2} - \frac{Z^{10} - 18Z^8}{1296\sigma^3} + R\!\left(\frac{1}{\sigma^4}\right) \right\}
    = \frac{1}{\Delta(\sigma)}\left( 0 - \frac{3!!}{6\sigma} + 0 - \frac{9!! - (18)\,7!!}{1296\sigma^3} + 0 + R\!\left(\frac{1}{\sigma^5}\right) \right)

or

m_z = -\frac{1}{2\,\Delta(\sigma)\,\sigma} + R\!\left(\frac{1}{\sigma^3}\right) .    (A.50)

This yields

m_x = m - \frac{1}{2\,\Delta(\sigma)} + R\!\left(\frac{1}{\sigma^2}\right) .    (A.51)

Consideration of the expansion of Equation A.47 shows that

\frac{1}{\Delta(\sigma)} = \frac{1}{1 - \frac{1}{24\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right)} = 1 + \frac{1}{24\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right) = 1 + R\!\left(\frac{1}{\sigma^2}\right) .    (A.52)

Thus,

m_z = -\frac{1}{2\sigma} + R\!\left(\frac{1}{\sigma^3}\right) ,    (A.53)

so that the rate of convergence of m_z to 0 is of order R(1/σ³) ∼ o(1/σ²) in σ² = us̄. This results in a value for the mean, m_x, of x of

\mathrm{Mean}_x = m - \frac{1}{2} + R\!\left(\frac{1}{\sigma^2}\right) ,    (A.54)

showing that the rate of convergence of m_x to m − 1/2 is of order R(1/σ²) ∼ o(1/σ). Note that in the limit m = us → ∞, the term −1/2 becomes negligible and the relative difference between m_x and m vanishes as u → ∞,

\frac{m_x - m}{m} = -\frac{1}{2m} \to 0 .

A.4.2 Approximating the Variance

Denote the variances of the random variables X and Z by

\sigma_x^2 = \mathrm{Var}_P\{X\}   and   \sigma_z^2 = \mathrm{Var}_P\{Z\} ,

and note that

\sigma_x^2 = \sigma^2 \sigma_z^2 .

Also recall that

\sigma_z^2 = E_P\{Z^2\} - m_z^2 .

We have from Equations A.36 and A.46 that

E_P\{Z^2\} = \frac{1}{\Delta(\sigma)}\, E_{\phi_0}\left\{ Z^2\, e^{-\left(\frac{Z^3}{6\sigma} + \frac{Z^4}{12\sigma^2}\right)} \right\}
           = \frac{1}{\Delta(\sigma)}\, E_{\phi_0}\left\{ Z^2 - \frac{Z^5}{6\sigma} + \frac{Z^8 - 6Z^6}{72\sigma^2} - \frac{Z^{11} - 18Z^9}{1296\sigma^3} + R\!\left(\frac{1}{\sigma^4}\right) \right\}
           = \frac{1}{\Delta(\sigma)}\left( 1 - 0 + \frac{7!! - (6)\,5!!}{72\sigma^2} - 0 + R\!\left(\frac{1}{\sigma^4}\right) \right)

or

E_P\{Z^2\} = \frac{1}{\Delta(\sigma)}\left( 1 + \frac{5}{24\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right) \right) = 1 + \frac{1}{4\,\Delta(\sigma)\,\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right) .    (A.55)

Consideration of Equation A.52 shows that in fact

E_P\{Z^2\} = 1 + \frac{1}{4\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right) .    (A.56)

Continuing, we have

\sigma_z^2 = E_P\{Z^2\} - m_z^2
           = \left( 1 + \frac{1}{4\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right) \right) - \left( -\frac{1}{2\sigma} + R\!\left(\frac{1}{\sigma^3}\right) \right)^2
           = 1 + \frac{1}{4\sigma^2} - \frac{1}{4\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right)

or

\sigma_z^2 = 1 + R\!\left(\frac{1}{\sigma^4}\right) ,    (A.57)

so that the rate of convergence of σ_z² to 1 is R(1/σ⁴) ∼ o(1/σ³). Finally, from σ_x² = σ²σ_z² we find that the variance, σ_x², of x is given by

\mathrm{Var}_x = \sigma^2 + R\!\left(\frac{1}{\sigma^2}\right) ,    (A.58)

so that the rate of convergence of σ_x² to σ² is R(1/σ²) ∼ o(1/σ).

A.4.3 Approximating the Skew

The standardized moments of X are defined by

\gamma_p = E_P\left\{ \left(\frac{X - m_x}{\sigma_x}\right)^p \right\} .

In particular, the skew (or skewness) of X is given by the 3rd standardized moment, γ_3. The (relative or excess) kurtosis is given by κ = γ_4 − 3.

Consideration of Equations A.54 and A.58, and the definition of z = z(x), shows that essentially¹¹

\frac{x - m_x}{\sigma_x} = z(x) + \frac{1}{2\sigma} + R\!\left(\frac{1}{\sigma^4}\right) .

Thus

\gamma_p = E_P\left\{ \left( z(X) + \frac{1}{2\sigma} + R\!\left(\frac{1}{\sigma^4}\right) \right)^p \right\} .

To compute the skew, note that essentially

\left( z + \frac{1}{2\sigma} + R\!\left(\frac{1}{\sigma^4}\right) \right)^3 = z^3 + \frac{3z^2}{2\sigma} + \frac{3z}{4\sigma^2} + \frac{1}{8\sigma^3} + R\!\left(\frac{1}{\sigma^4}\right) .

Then, using Equations A.36, A.46, A.27, and A.52 we have

\gamma_3 = E_P\left\{ \left( z(X) + \frac{1}{2\sigma} + R\!\left(\frac{1}{\sigma^4}\right) \right)^3 \right\}
         = \frac{1}{\Delta(\sigma)}\, E_{\phi_0}\left\{ \left( Z^3 + \frac{3Z^2}{2\sigma} + \frac{3Z}{4\sigma^2} + R\!\left(\frac{1}{\sigma^3}\right) \right)\left( 1 - \frac{Z^3}{6\sigma} + \frac{Z^6 - 6Z^4}{72\sigma^2} + R\!\left(\frac{1}{\sigma^3}\right) \right) \right\}
         = \frac{1}{\Delta(\sigma)}\, E_{\phi_0}\left\{ \frac{3Z^2}{2\sigma} - \frac{Z^6}{6\sigma} + R\!\left(\frac{1}{\sigma^3}\right) \right\} = \frac{1}{\Delta(\sigma)}\left( \frac{3}{2\sigma} - \frac{5!!}{6\sigma} + R\!\left(\frac{1}{\sigma^3}\right) \right)
         = \left( 1 + R\!\left(\frac{1}{\sigma^2}\right) \right)\left( -\frac{1}{\sigma} + R\!\left(\frac{1}{\sigma^3}\right) \right) .

Thus the skew, γ_3, of x is given by the simple expression

\mathrm{Skew}_x = -\frac{1}{\sigma} + R\!\left(\frac{1}{\sigma^3}\right) .    (A.59)

Our numerical computer experiments match the predicted values of this slightly negative skew with very good accuracy.

[Footnote 11: We can treat x as "essentially bounded", so that x · R(1/σ⁴) ∼ R(1/σ⁴), because the probability of x being several standard deviations from its mean is "essentially zero".]

A.4.4 Approximating the Kurtosis

To find the kurtosis, κ = γ_4 − 3, we need to compute the 4th standardized moment

\gamma_4 = E_P\left\{ \left(\frac{X - m_x}{\sigma_x}\right)^4 \right\} = E_P\left\{ \left( z(X) + \frac{1}{2\sigma} + R\!\left(\frac{1}{\sigma^4}\right) \right)^4 \right\} .

Note that essentially

\left( z + \frac{1}{2\sigma} + R\!\left(\frac{1}{\sigma^4}\right) \right)^4 = z^4 + \frac{2z^3}{\sigma} + \frac{3z^2}{2\sigma^2} + \frac{z}{2\sigma^3} + R\!\left(\frac{1}{\sigma^4}\right) .

Using Equations A.36, A.46, A.27, and A.52 we have

\gamma_4 = E_P\left\{ \left( z(X) + \frac{1}{2\sigma} + R\!\left(\frac{1}{\sigma^4}\right) \right)^4 \right\}
         = \frac{1}{\Delta(\sigma)}\, E_{\phi_0}\left\{ \left( Z^4 + \frac{2Z^3}{\sigma} + \frac{3Z^2}{2\sigma^2} + \frac{Z}{2\sigma^3} + R\!\left(\frac{1}{\sigma^4}\right) \right) \cdot \left( 1 - \frac{Z^3}{6\sigma} + \frac{Z^6 - 6Z^4}{72\sigma^2} - \frac{Z^9 - 18Z^7}{1296\sigma^3} + R\!\left(\frac{1}{\sigma^4}\right) \right) \right\}
         = \frac{1}{\Delta(\sigma)}\, E_{\phi_0}\left\{ Z^4 + \frac{3Z^2}{2\sigma^2} - \frac{Z^6}{3\sigma^2} + \frac{Z^{10} - 6Z^8}{72\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right) \right\}
         = \frac{1}{\Delta(\sigma)}\left( 3 + \frac{3}{2\sigma^2} - \frac{5!!}{3\sigma^2} + \frac{9!! - 6(7!!)}{72\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right) \right)
         = \left( 1 - \frac{1}{24\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right) \right)\left( 3 + \frac{7}{8\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right) \right) = 3 + \frac{3}{4\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right) .

Thus the kurtosis, κ = γ_4 − 3, of x is given by the simple expression

\mathrm{Kurtosis}_x = \frac{3}{4\sigma^2} + R\!\left(\frac{1}{\sigma^4}\right) .    (A.60)

For moderately large values of σ, this predicts the existence of a virtually negligible, very slightly positive, kurtosis. Our numerical computer experiments match the values of this prediction within experimental error.
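The moment approximations above can also be checked directly against the proposed third-order PDF. The sketch below (illustrative values of m and σ, chosen only for demonstration) computes the mean, variance, skew, and kurtosis of the density in Equation A.49 by quadrature in z and compares them with the leading-order expressions m − 1/2, σ², and −1/σ; the small positive kurtosis is printed for comparison with the prediction above.

```python
# Illustrative numerical check of (A.54), (A.58), (A.59): moments of the proposed
# third-order PDF of Equation A.49, computed by quadrature in z = (x - m)/sigma.
import numpy as np
from scipy import integrate

def moments(sigma, m):
    # Unnormalized proposed density as a function of z.
    w = lambda z: np.exp(-(z ** 2 / 2 + z ** 3 / (6 * sigma) + z ** 4 / (12 * sigma ** 2)))
    norm, _ = integrate.quad(w, -np.inf, np.inf)
    ez = lambda f: integrate.quad(lambda z: f(z) * w(z), -np.inf, np.inf)[0] / norm
    mz = ez(lambda z: z)
    vz = ez(lambda z: (z - mz) ** 2)
    skew = ez(lambda z: ((z - mz) / np.sqrt(vz)) ** 3)
    kurt = ez(lambda z: ((z - mz) / np.sqrt(vz)) ** 4) - 3.0
    return m + sigma * mz, sigma ** 2 * vz, skew, kurt          # back to x = m + sigma*z

if __name__ == "__main__":
    sigma, m = 25.0, 400.0          # illustrative placeholders for sqrt(u*sbar) and u*s
    mean_x, var_x, skew_x, kurt_x = moments(sigma, m)
    print("mean:", mean_x, " vs m - 1/2 =", m - 0.5)
    print("var :", var_x,  " vs sigma^2 =", sigma ** 2)
    print("skew:", skew_x, " vs -1/sigma =", -1 / sigma)
    print("kurt:", kurt_x, " (small positive value predicted above)")
```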

A.5 Accuracy of the Truncation PDFs

Knowing that asymptotically (i.e., as σ → ∞) the Stirling PDF A.16 behaves like φ_X(x; m, σ²) for m = us and σ² = us̄ suggests a very crude and heuristic "bootstrap" argument that all of the approximate PDFs, P_X^{(ℓ)}(x; σ), arising from truncating the series as defined in A.23 behave well in the limit σ → ∞. In particular, because in the limit σ → ∞ the mean of x becomes m and the variance becomes σ², we can invoke the Chebyshev inequality,

\mathrm{Prob}\left( |z(x)| \le \gamma \right) = \mathrm{Prob}\left( \left|\frac{x - m}{\sigma}\right| \le \gamma \right) \ge 1 - \frac{1}{\gamma^2} ,

in that limit. In particular, note that in the limit of large σ we have

\mathrm{Prob}\left( |\alpha(x)| \le \epsilon \right) = \mathrm{Prob}\left( \left|\frac{z(x)}{\sigma}\right| \le \epsilon \right) = \mathrm{Prob}\left( |z(x)| \le \epsilon\sigma \right) \ge 1 - \frac{1}{\epsilon^2\sigma^2} .

Thus when approximating the Stirling PDF P_X(x; σ) by the truncation approximation P_X^{(ℓ)}(x), we can always make the neglected terms in the sum on the right-hand side of (A.16) as small as possible with high probability by increasing the size of σ² = us̄. For example, if we write A.16 as

\log P_X(x) \sim -\sigma^2\left( \sum_{n=1}^{\ell} \frac{\alpha^{n+1}(x)}{n(n + 1)} + \sum_{n=\ell+1}^{\infty} \frac{\alpha^{n+1}(x)}{n(n + 1)} \right) ,    (A.61)

then if σ is large enough so that with high probability |α(x)| ≤ ε, we have with high probability that (assuming ε < 1)

\left| \sum_{n=\ell+1}^{\infty} \frac{\alpha^{n+1}(x)}{n(n + 1)} \right| \le \sum_{n=\ell+1}^{\infty} \frac{|\alpha^{n+1}(x)|}{n(n + 1)} \le \sum_{n=\ell+1}^{\infty} \frac{\epsilon^{n+1}}{n(n + 1)} \le \epsilon^{\ell+2} \sum_{n=\ell+1}^{\infty} \frac{1}{n(n + 1)} \le \epsilon^{\ell+2} ,

whereas the first sum in Equation A.61 is at least of order O(ε^{ℓ+1}). Thus we heuristically argue that all of the truncation approximations to the Stirling PDF are accurate in the limit of high σ² = us̄. This certainly makes sense, as above we have shown that the crudest nontrivial truncation of ℓ = 1 yields the Gaussian approximation, which is accurate in the limit as σ → ∞.

A.6 The K-L Divergence to and from the Normal Distribution

We will now show that the Stirling PDF (A.24) converges to the Gaussian PDF in the limit that u → ∞. This is a most interesting and nontrivial result, as every term in the expansion (A.16) of the Stirling PDF has the same ("big O") order O(u), so that a straightforward asymptotic analysis is elusive.

To show that the Stirling PDF P_X(x; σ) converges to a Gaussian density in the limit as σ → ∞, we want to demonstrate that

D\left( \phi_X(\cdot; \sigma) \,\|\, P_X(\cdot; \sigma) \right) \to 0   as σ → ∞ ,

where D(· ‖ ·) is the Kullback-Leibler Divergence (KLD) (or Mutual Information). Unfortunately, we can't do this without some additional finessing of the problem statement. This is because P_X(x; σ) takes zero values on the support of the Gaussian PDF φ_X(x; σ).¹²

To ameliorate the problem, we modify g(z; σ) to g_ε(z; σ), where

g_\epsilon(z; \sigma) \triangleq \begin{cases} \epsilon & z > \sigma \\ e^{-\sum_{n=2}^{\infty} \frac{1}{n(n+1)}\left(\frac{z}{\sigma}\right)^{n+1}} & -\frac{s}{\bar{s}}\sigma \le z \le \sigma \\ \epsilon & z < -\frac{s}{\bar{s}}\sigma \end{cases}

for ε > 0. Obviously in the limit that ε → 0 we recover g(z; σ).¹³ Thus, we define the ε-parameterized family of "near-Stirling" PDFs,

P_X^\epsilon(x; \sigma) = \frac{1}{\Delta_\epsilon(\sigma)}\, \phi_X(x; \sigma)\, g_\epsilon(z(x); \sigma) = \frac{1}{\sigma\,\Delta_\epsilon(\sigma)}\, \phi_0(z(x))\, g_\epsilon(z(x); \sigma) ,    (A.62)

where

\Delta_\epsilon(\sigma) = E_{\phi_0}\{g_\epsilon(Z; \sigma)\} = \int_{-\infty}^{\infty} g_\epsilon(z; \sigma)\, \phi_0(z)\,dz .    (A.63)

[Footnote 12: That is, unfortunately φ_X is not absolutely continuous with respect to P_X. This causes a problem because the log of zero is generally undefined. If φ_X(x) were absolutely continuous with respect to P_X(x), then the problem vanishes because 0 log 0 = 0. However, note that because φ_X(x) has infinite support (its value is zero nowhere) it can never be absolutely continuous with respect to a density having only a finite support. In principle the problem could be sidestepped by working instead with the "reverse direction" KLD, D(P_X(x; σ) ‖ φ_X(x; σ)) (because the Stirling PDF is absolutely continuous with respect to the Gaussian density). This path, however, leads to an unpleasant morass of analysis.]
[Footnote 13: Also, for any ε, g_ε(z; σ) − g(z; σ) → 0 as σ → ∞.]

Note that because g_ε(z; σ)φ_0(z) satisfies the conditions of the DCT, we have

\lim_{\sigma \to \infty} \Delta_\epsilon(\sigma) = \lim_{\sigma \to \infty} \int_{-\infty}^{\infty} g_\epsilon(z; \sigma)\, \phi_0(z)\,dz = \int_{-\infty}^{\infty} \lim_{\sigma \to \infty} g_\epsilon(z; \sigma)\, \phi_0(z)\,dz = \int_{-\infty}^{\infty} \lim_{\sigma \to \infty} g(z; \sigma)\, \phi_0(z)\,dz = \lim_{\sigma \to \infty} \Delta(\sigma) ,

for every value of ε > 0. Note that this says that regardless of the size of ε > 0, Δ_ε(σ) becomes independent of ε (and equal to Δ(σ)) in the limit of large σ.

A.6.1 Kullback-Leibler Divergence

Given two densities p(x) and f(x), the divergence of f(x) from p(x) is defined by

D(p \,\|\, f) = \int p(x) \log\frac{p(x)}{f(x)}\,dx = E_p\left\{ \log\frac{p(X)}{f(X)} \right\} .

The density p(x) is the reference density and D(p ‖ f) is a measure of how much f(x) "diverges" from p(x) on the average, using an average computed using the reference density.¹⁴ Note that the definition of the divergence generally is not symmetric,

D(p \,\|\, f) \ne D(f \,\|\, p) ,

so that the "distance" from p(x) to f(x) is generally not equal to the "distance" from f(x) to p(x). Thus the K-L divergence is not a true metric (distance function). However, the divergence is positive definite:

∀ p, f, D(p || f) ≥ 0 and D(p || f) = 0 ⇐⇒ p = f .

Thus convergence of PX (x) to φX (x) can be shown by demonstrating that either of the two conditions

D(φX || PX ) → 0 or D(PX || φX ) → 0

hold when u → ∞.

[Footnote 14: Thus, the reference density p(x) is taken to be the baseline against which another density f(x) can be compared. In many areas of coding and communications theory, if p(x) is the "true density" that ideally should be used to optimize some computation of interest, and f(x) is an available "model density" that is actually used to perform the optimization, D(p ‖ f) can be used to compute or predict the degree of performance loss that occurs.]

Given the forms of the densities φX (x) and PX (x), it is simpler to show the first condition. For moderately large values of u, however, PX is likely to be a better approximation to “truth” than the simpler normal density φX (x), whereas it is the normal density that will generally have greater utility in practice. From this perspective, it is D(PX || φX ) that is of interest as it is a measure of the divergence of the “approximation” density φX (x) from the “truth” density PX (x).

A.6.2 Application to PX

We now show that for every ε > 0,

\lim_{\sigma \to \infty} D\left( \phi_X(\cdot; \sigma) \,\|\, P_X^\epsilon(\cdot; \sigma) \right) = 0 .    (A.64)

With the framework we have established, it is straightforward to do this. We have

0 \le D\left( \phi_X(\cdot; \sigma) \,\|\, P_X^\epsilon(\cdot; \sigma) \right) = E_{\phi_X}\left\{ \log\frac{\phi_X(X)}{P_X^\epsilon(X; \sigma)} \right\}
  = E_{\phi_X}\left\{ \log\frac{\Delta_\epsilon(\sigma)}{g_\epsilon(z(X); \sigma)} \right\} = E_{\phi_0}\left\{ \log\frac{\Delta_\epsilon(\sigma)}{g_\epsilon(Z; \sigma)} \right\}
  = \log \Delta_\epsilon(\sigma) - E_{\phi_0}\{\log g_\epsilon(Z; \sigma)\}

or

0 \le D\left( \phi_X(\cdot; \sigma) \,\|\, P_X^\epsilon(\cdot; \sigma) \right) = \log E_{\phi_0}\{g_\epsilon(Z; \sigma)\} - E_{\phi_0}\{\log g_\epsilon(Z; \sigma)\} .    (A.65)

Note that nonnegativity of the difference of the two terms shown on the far right hand side must be true (which gives us confidence in our derivation) as it is also a consequence of Jensen’s inequality applied to the concave function log(y).
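Equation A.65 is also convenient to evaluate numerically. As a rough illustration (not from the dissertation), the sketch below uses the third-order factor of Section A.4 as a simple stand-in for g(z; σ): for that choice, log E_{φ0}[g] − E_{φ0}[log g] is exactly the divergence of the proposed PDF of Equation A.49 from φ_X, it is nonnegative by Jensen's inequality, and it visibly shrinks as σ grows.

```python
# Illustrative sketch of Equation A.65 with the third-order factor of Section A.4
# standing in for g(z; sigma): D = log E_phi0[g] - E_phi0[log g] >= 0, shrinking in sigma.
import numpy as np
from scipy import integrate

def divergence(sigma):
    phi0 = lambda z: np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)
    log_g = lambda z: -(z ** 3 / (6 * sigma) + z ** 4 / (12 * sigma ** 2))
    e_g, _ = integrate.quad(lambda z: np.exp(log_g(z)) * phi0(z), -np.inf, np.inf)
    e_log_g, _ = integrate.quad(lambda z: log_g(z) * phi0(z), -np.inf, np.inf)
    return np.log(e_g) - e_log_g

if __name__ == "__main__":
    for sigma in (5.0, 10.0, 20.0, 40.0):
        print(f"sigma = {sigma:5.1f}   log E[g] - E[log g] = {divergence(sigma):.3e}")
```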

We will have established our desired result once we've shown that the far right hand side goes to zero in the limit that σ → ∞. Noting that

\lim_{\sigma \to \infty}\left[ \log E_{\phi_0}\{g_\epsilon(Z; \sigma)\} - E_{\phi_0}\{\log g_\epsilon(Z; \sigma)\} \right]
  = \lim_{\sigma \to \infty} \log E_{\phi_0}\{g_\epsilon(Z; \sigma)\} - \lim_{\sigma \to \infty} E_{\phi_0}\{\log g_\epsilon(Z; \sigma)\} ,

we will establish the result by separately demonstrating that

\lim_{\sigma \to \infty} \log E_{\phi_0}\{g_\epsilon(Z; \sigma)\} = \log \lim_{\sigma \to \infty} E_{\phi_0}\{g_\epsilon(Z; \sigma)\} = 0    (A.66)

(the first equality is a known consequence of the continuity of the logarithm), and

\lim_{\sigma \to \infty} E_{\phi_0}\{\log g_\epsilon(Z; \sigma)\} = 0 .    (A.67)

To establish these assertions, note that both φ_0(z)g_ε(z; σ) and φ_0(z) log g_ε(z; σ) can be shown to satisfy the conditions of the DCT. Therefore

\log \lim_{\sigma \to \infty} E_{\phi_0}\{g_\epsilon(Z; \sigma)\} = \log E_{\phi_0}\left\{ \lim_{\sigma \to \infty} g_\epsilon(Z; \sigma) \right\} = 0

and

\lim_{\sigma \to \infty} E_{\phi_0}\{\log g_\epsilon(Z; \sigma)\} = E_{\phi_0}\left\{ \lim_{\sigma \to \infty} \log g_\epsilon(Z; \sigma) \right\} = 0 ,

as required. Thus we have shown that for every ε > 0

"\phi_X(\cdot; \sigma) = \lim_{\sigma \to \infty} P_X^\epsilon(\cdot; \sigma)"    (A.68)

in the sense of Equation A.64, which means that in the limit of large σ, P_X^ε(·; σ) and φ_X(·; σ) become indistinguishable almost surely for every ε > 0,¹⁵

P_X^\epsilon(x; \sigma) - \phi_X(x; \sigma) \xrightarrow{\;a.s.\;} 0   as σ → ∞ .    (A.69)

To complete the proof, we show that for every ε > 0 and for every x,

\lim_{\sigma \to \infty} \sigma\left( P_X^\epsilon(x; \sigma) - P_X(x; \sigma) \right) = 0 .    (A.70)

We have (recall that z = z(x))

\lim_{\sigma \to \infty} \sigma\left( P_X^\epsilon(x; \sigma) - P_X(x; \sigma) \right) = \lim_{\sigma \to \infty}\left( \sigma P_X^\epsilon(x; \sigma) - \sigma P_X(x; \sigma) \right)
  = \lim_{\sigma \to \infty}\left( \phi_0(z)\,\frac{g_\epsilon(z; \sigma)}{\Delta_\epsilon(\sigma)} - \phi_0(z)\,\frac{g(z; \sigma)}{\Delta(\sigma)} \right)
  = \phi_0(z)\,\lim_{\sigma \to \infty}\left( \frac{g_\epsilon(z; \sigma)}{\Delta_\epsilon(\sigma)} - \frac{g(z; \sigma)}{\Delta(\sigma)} \right)
  = \phi_0(z)\left( \lim_{\sigma \to \infty} \frac{g_\epsilon(z; \sigma)}{\Delta_\epsilon(\sigma)} - \lim_{\sigma \to \infty} \frac{g(z; \sigma)}{\Delta(\sigma)} \right)
  = \phi_0(z)\left( \frac{\lim_{\sigma \to \infty} g_\epsilon(z; \sigma)}{\lim_{\sigma \to \infty} \Delta_\epsilon(\sigma)} - \frac{\lim_{\sigma \to \infty} g(z; \sigma)}{\lim_{\sigma \to \infty} \Delta(\sigma)} \right)
  = \phi_0(z)\left( \frac{\lim_{\sigma \to \infty} g_\epsilon(z; \sigma)}{\Delta(\sigma)} - \frac{\lim_{\sigma \to \infty} g(z; \sigma)}{\Delta(\sigma)} \right)
  = \frac{\phi_0(z)}{\Delta(\sigma)}\left( \lim_{\sigma \to \infty} g_\epsilon(z; \sigma) - \lim_{\sigma \to \infty} g(z; \sigma) \right)
  = \frac{\phi_0(z)}{\Delta(\sigma)}\, \lim_{\sigma \to \infty}\left( g_\epsilon(z; \sigma) - g(z; \sigma) \right)
  = \frac{\phi_0(z)}{\Delta(\sigma)} \cdot 0 = 0 ,

which is true because

g_\epsilon(z; \sigma) - g(z; \sigma) = \begin{cases} \epsilon & z > \sigma \\ 0 & -\frac{s}{\bar{s}}\sigma \le z \le \sigma \\ \epsilon & z < -\frac{s}{\bar{s}}\sigma \end{cases} .

[Footnote 15: Convergence is almost surely (a.s.) with respect to the Gaussian density φ_X(x; σ). Since the Gaussian measure and Lebesgue measure are mutually absolutely continuous, this is equivalent to almost everywhere (a.e.) convergence with respect to the standard Lebesgue measure on the real line. Thus the convergence holds a.e., that is, everywhere except on a set of Lebesgue measure zero.]

Thus we have shown that pointwise in x (i.e., everywhere) we have

P_X^\epsilon(x; \sigma) - P_X(x; \sigma) \to 0   as σ → ∞ ,    (A.71)

for every ε > 0.

The final step in the proof follows from the fact that for large enough σ (and regardless of the value of ε) the PDFs P_X^ε(x; σ) and P_X(x; σ) become indistinguishable (equal in value pointwise in x, and hence equal almost everywhere), and since in the same limit P_X^ε(·; σ) and φ_X(·; σ) are equal almost everywhere, it must be the case that P_X(·; σ) and φ_X(·; σ) are equal almost surely. This follows from the almost sure limits

\lim_{\sigma \to \infty}\left( P_X(x; \sigma) - \phi_X(x; \sigma) \right) = \lim_{\sigma \to \infty}\left[ \left( P_X(x; \sigma) - P_X^\epsilon(x; \sigma) \right) + \left( P_X^\epsilon(x; \sigma) - \phi_X(x; \sigma) \right) \right]
  = \lim_{\sigma \to \infty}\left( P_X(x; \sigma) - P_X^\epsilon(x; \sigma) \right) + \lim_{\sigma \to \infty}\left( P_X^\epsilon(x; \sigma) - \phi_X(x; \sigma) \right)
  = 0 + 0 = 0 ,

which are a consequence of Equations A.69 and A.71. Thus it has been shown that

P_X(x; \sigma) - \phi_X(x; \sigma) \xrightarrow{\;a.e.\;} 0   as σ = \sqrt{u\bar{s}} → ∞ ,    (A.72)

showing that the Stirling PDF converges to a Gaussian distribution almost everywhere on the real line.

References

[1] X. Hu, E. Eleftheriou, R. Haas, I. Iliadis, and R. Pletka, “Write amplification analysis in flash-based solid state drives,” in Proceedings of SYSTEM 2009: The Israeli Experimental Systems Conference, ACM, 2009.

[2] X. Hu and R. Haas, “The fundamental limit of flash random write perfor- mance: Understanding, analysis and performance modelling,” IBM Research Report, vol. RZ 3771, Mar 2010.

[3] A. Jagmohan, M. Franceschini, and L. Lastras, “Write amplification reduc- tion in nand flash through multi-write coding,” in Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, pp. 1–6, IEEE, 2010.

[4] H. Kim, N. Agrawal, and C. Ungureanu, “Revisiting storage for smartphones,” in The 10th USENIX Conference on File and Storage Technologies (FAST), 2012.

[5] E. Walker, W. Brisken, and J. Romney, “To lease or not to lease from storage clouds,” Computer, vol. 43, no. 4, pp. 44–50, 2010.

[6] K. Kant, “Data center evolution: A tutorial on state of the art, issues, and challenges,” Computer Networks, vol. 53, no. 17, pp. 2939–2965, 2009.

[7] D. Wang, A. Sivasubramaniam, and B. Urgaonkar, "A case for heterogeneous flash," Tech. Rep. CSE-11-015, Pennsylvania State University, 2011.

[8] L. Grupp, A. Caulfield, J. Coburn, S. Swanson, E. Yaakobi, P. Siegel, and J. Wolf, “Characterizing flash memory: anomalies, observations, and applica- tions,” in Proceedings of the 42nd Annual IEEE/ACM International Sympo- sium on Microarchitecture, pp. 24–33, ACM, 2009.

[9] D. Roberts, T. Kgil, and T. Mudge, “Integrating NAND flash devices onto servers,” Commun. ACM, vol. 52, no. 4, pp. 98–103, 2009.

[10] T. Chung, D. Park, S. Park, D. Lee, S. Lee, and H. Song, “A survey of flash translation layer,” Journal of Systems Architecture, vol. 55, no. 5-6, pp. 332– 343, 2009.

104 105

[11] A. Gupta, Y. Kim, and B. Urgaonkar, “DFTL: a flash translation layer em- ploying demand-based selective caching of page-level address mappings,” in Proceedings of the 14th international conference on Architectural support for programming languages and operating systems, pp. 229–240, ACM, 2009.

[12] T. Chung, D. Park, S. Park, D. Lee, S. Lee, and H. Song, “System software for flash memory: a survey,” Embedded and Ubiquitous Computing, pp. 394–404, 2006.

[13] J. Ousterhout and F. Douglis, “Beating the I/O bottleneck: a case for log- structured file systems,” ACM SIGOPS Operating Systems Review, vol. 23, no. 1, pp. 11–28, 1989.

[14] R. Agarwal and M. Marrow, “A closed-form expression for Write Amplification in NAND Flash,” in GLOBECOM Workshops (GC Wkshps), 2010 IEEE, pp. 1846–1850, IEEE, 2010.

[15] L. Xiang and B. Kurkoski, “An Improved Analytical Expression for Write Amplification in NAND Flash,” in International Conference on Computing, Networking, and Communications, 2012. Also available at Arxiv preprint arXiv:1110.4245.

[16] M. Rosenblum and J. Ousterhout, “The design and implementation of a log- structured file system,” ACM Transactions on Computer Systems (TOCS), vol. 10, no. 1, pp. 26–52, 1992.

[17] J. Menon and L. Stockmeyer, “An age-threshold algorithm for garbage collec- tion in log-structured arrays and file systems,” High Performance Computing Syst. Applicat., pp. 119–132, 1998.

[18] L. Chang, T. Kuo, and S. Lo, “Real-time garbage collection for flash-memory storage systems of real-time embedded systems,” ACM Transactions on Em- bedded Computing Systems (TECS), vol. 3, no. 4, pp. 837–863, 2004.

[19] Information Technology - ATA/ATAPI Command Set - 2 (ACS- 2), INCITS Working Draft T13/2015-D Rev. 7 ed., June 2011. http://www.t13.org/documents/UploadedDocuments/docs2011/ d2015r7-ATAATAPI Command Set - 2 ACS-2.pdf.

[20] “TRIM.” http://en.wikipedia.org/wiki/TRIM, September 2011. See refer- ences on website for specific operating system TRIM support.

[21] N. Agrawal, V. Prabhakaran, T. Wobber, J. Davis, M. Manasse, and R. Pan- igrahy, “Design tradeoffs for SSD performance,” in USENIX 2008 Annual Technical Conference on Annual Technical Conference, pp. 57–70, USENIX Association, 2008. 106

[22] Information Technology - SCSI Object-Based Storage Device Commands- 3 (OSD-3), INCITS Working Draft T10/2128-D Rev. 0 ed., March 2009. http://www.t10.org/cgi-bin/ac.pl?t=f&f=osd3r00.pdf.

[23] Y. Kang, J. Yang, and E. Miller, “Object-based SCM: An Efficient Inter- face for Storage Class Memories,” in Mass Storage Systems and Technologies (MSST), 2011 IEEE 27th Symposium on, pp. 1–12, IEEE, 2011.

[24] A. Rajimwale, V. Prabhakaran, and J. Davis, “Block management in solid- state devices,” in Proceedings of the 2009 conference on USENIX Annual technical conference, pp. 21–21, USENIX Association, 2009.

[25] P. Hoel, S. Port, and C. Stone, Introduction to probability theory, vol. 12. Houghton Mifflin Boston, 1971.

[26] W. Feller, Introduction to Probability Theory and its Applications. New York: Wiley, 3rd ed., 1968.

[27] F. Reif, Fundamentals of Statistical and Thermal Physics. McGraw-Hill, 1965.

[28] T. Cover and J. Thomas, Elements of Information Theory. Wiley Series in Telecommunications and Signal Processing, Wiley-Interscience, 2006.

[29] R. Courant and D. Hilbert, Methods of Mathematical Physics - Vol. I. New York: Interscience, 1953.

[30] R. Corless, G. Gonnet, D. Hare, D. Jeffrey, and D. Knuth, "On the Lambert W function," Advances in Computational Mathematics, vol. 5, no. 1, pp. 329–359, 1996.

[31] STEC, “Benchmarking Enterprise SSDs.” http://www.stec-inc.com/ downloads/whitepapers/Benchmarking Enterprise SSDs.pdf, June 2009.

[32] Y. Kang, J. Yang, and E. Miller, “Efficient storage management for object- based flash memory,” in 2010 18th Annual IEEE/ACM International Sympo- sium on Modeling, Analysis and Simulation of Computer and Telecommuni- cation Systems, pp. 407–409, IEEE, 2010.

[33] D. Feng and L. Qin, “Adaptive object placement in object-based storage sys- tems with minimal blocking probability,” in Advanced Information Networking and Applications, 2006. AINA 2006. 20th International Conference on, vol. 1, pp. 611–616, IEEE, 2006.

[34] J. Boyd, “The devil’s invention: Asymptotic, superasymptotic and hyper- asymptotic series,” Acta Applicandae Mathematicae, vol. 56, no. 1, pp. 1–98, 1999. 107

[35] “Normal Distribution.” http://en.wikipedia.org/wiki/Normal distribution, May 2012.

[36] “Big O Notation.” http://en.wikipedia.org/wiki/Big O notation, May 2012.