Solid State Disk Object-Based Storage with Trim Commands Tasha Frankie, Member, IEEE, Gordon Hughes, Fellow, IEEE, and Ken Kreutz-Delgado, Senior Member, IEEE

1 Solid State Disk Object-Based Storage with Trim Commands Tasha Frankie, Member, IEEE, Gordon Hughes, Fellow, IEEE, and Ken Kreutz-Delgado, Senior Member, IEEE

Abstract—This paper presents a model of NAND flash SSD utilization and write amplification when the ATA/ATAPI SSD Trim command is incorporated into object-based storage under a variety of user workloads, including a uniform random workload with objects of fixed size and a uniform random workload with objects of varying sizes. We first summarize the existing models for write amplification in SSDs for workloads with and without the Trim command, then propose an alteration of the models that utilizes a framework of object-based storage. The utilization of objects and pages in the SSD is derived, with the analytic results compared to simulation. Finally, the effect of objects on write amplification and its computation is discussed along with a potential application to optimization of SSD usage through object storage metadata servers that allocate object classes of distinct object size.

Index Terms—Trim, Solid State Drive, Object-Based Storage, Write Ampliﬁcation, NAND Flash, Markov processes. !

1 INTRODUCTION SSD to internally manage their object physical map- N recent years, NAND flash solid state drives ping and relocation. This has the potential to improve I (SSDs) have become ubiquitous, appearing in small I/O rate and QoS, user access blocking probability, mobile electronics to large data centers. Part of the and SSD flash chip lifetime through the reduction of popularity of flash-based SSDs over traditional hard write amplification. disk drives (HDDs) is the increased speed and per- In this paper, we present a model for computing uti- formance that they offer. However, SSDs have their lization and write amplification under a modified ran- own set of problems, including a slowdown in per- dom workload including Write and Trim commands, formance as the device fills with data due to write and data records of varying sizes. This model is an amplification, and a limited lifetime due to wearout extension of models proposed for simpler workloads of the floating-gate transistors. Additionally, SSDs by previous researchers [3], [4], [5], [6] and builds on currently need to conform with legacy interfaces from our previous models of the Trim command [7], [8], [9]. HDDs, which makes optimization in using the device Good models assist in the development of algorithms more challenging. for managing the challenges of SSDs; our model adds Object based storage offers multiple advantages for flexibility to the theory in order to more closely match systems using solid state storage devices [1]. When the real-world workload characteristics. multiple thousand pages in a flash block are divided into object records of larger size, the resultant fewer 2 NAND FLASH SSDS arXiv:1210.5975v1 [cs.PF] 10 Oct 2012 objects per block directly translates into fewer data In this section, we provide background about NAND write relocations during garbage collection, leading to Flash SSDs which is necessary to understand the a lower write amplification and faster performance. model presented in this paper. This offers an opportunity for user application storage metadata, via metadata servers, to actively allow 2.1 Layout flash storage devices to self-optimize their hardware, to fully utilize SSD write speed advantages over The organization of a NAND flash SSD gives it char- magnetic HDD while ameliorating SSD limitations acteristics that cause unique challenges. The smallest like block erase and finite write lifetime. Object stor- storage unit of flash is the page, which typically holds age metadata servers could allocate SSDs to object 2K, 4K, or 8K of data [10], [11]. Pages are grouped into classes [2] of distinct object size and thereby allow blocks, with blocks typically containing 64, 128, or 256 pages, varying by manufacturer [10]. One challenge of • T. Frankie and K. Kreutz-Delgado are with the Department of Electrical flash SSDs is that a page must be erased before it can and Computer Engineering, UC San Diego, La Jolla, 92093. be programmed again, but the smallest erase unit of E-mail: [email protected], [email protected] • G. Hughes is with the Center for Magnetic Recording Research, UC flash SSDs is the block, and not all pages on the block San Diego, La Jolla, 92093. are ready to be erased at the same time. This means E-mail: [email protected] that write-in-place schemes are impractical, and most SSDs employ a dynamic logical-to-physical mapping 2 called the flash translation layer (FTL) [12], [13]. With section, we summarize previous work that we build this dynamic mapping, when data is overwritten, on in this paper. the old physical location of the data is marked as invalidated in the FTL mapping. In this paper, we utilize a log-structured file system, in which data is 3.1 Uniform Random Workload written to successively available pages, as described The type of workload used can have a large effect in [3]. on the performance of a storage device [16]. We view real-world workloads as existing along a con- 2.2 Garbage Collection tinuum from purely sequential workloads to com- Garbage collection is the process of selecting and pletely random workloads. For SSDs, analysis of a erasing a block to reclaim space for writing. When purely sequential workload is uninteresting, because a block is chosen for erasure, any valid pages on entire blocks of data are invalidated before garbage the block first need to be copied elsewhere to avoid collection, leading to no write amplification [3]. How- data loss. Copying these valid pages leads to write ever, at the other end of the spectrum, completely amplification, a phenomenon in which the device random workloads are very interesting to statistically performs more writes than are requested of it. analyze, since the random nature of the writes results There are many algorithms used to select a block for in most blocks still containing valid pages at the garbage collection, most of which are based on one of time of garbage collection, leading to potentially high two schemes originally described in the seminal paper write amplification. Hu, et al. [3], [4], Agarwal and by Rosenblum and Ousterhout [14]. Their first method Marrow [5], and Xiang and Kurkoski [6] have all pro- of selection chooses the block that creates the most posed models for write amplification of the uniform free space, and their second method also accounts for random workload under greedy garbage collection. the age of the block. In this paper, we utilize greedy Their workloads consist of write requests distributed garbage collection, in which the block with the fewest uniformly randomly over the user space, with each valid pages is selected. logical block address (LBA) requested requiring one page to store the data. Each of their models found that 2.3 Improving Performance the single most important factor in determining the write amplification was the level of overprovisioning Reducing write amplification creates an improvement provided on the SSD. in performance by increasing the average write speed due to the fewer number of extra writes performed by the device. Additionally, by reducing the number of 3.2 Trim-Modified Uniform Random Workload program/erase cycles, SSD lifetime is extended, since flash devices wear out and lose the ability to store Previously, we modified the uniform random work- data after being written many times. load to incorporate Trim requests, then analyzed the There are several ways to reduce write amplifi- resulting utilization and write amplification [7], [8], cation. One common way is for manufacturers to [9]. In the Trim-modified uniform random workload, provide extra storage space on the SSD beyond what both write and Trim requests are allowed, and the the user is allowed to access, a practice called over- data associated with each LBA occupies one page. provisioning. Overprovisioning reduces write ampli- As in the standard uniform random workload, write fication because the amount of time between garbage requests are uniformly randomly chosen over all u collections increases, which tends to lead to a decrease user LBAs, and Trim requests are uniformly randomly in the number of valid pages in the block selected for chosen over all In-Use LBAs, where an LBA is defined reclamation. We measure the amount of overprovi- as being In-Use if its most recent request was a Write. Trim requests occur with probability q, and write sioning with the spare factor, Sf = (T − u)/T, 0 ≤ requests occur with probability 1 − q. This workload Sf ≤ 1, where T is the total number of pages on the SSD, and u is the number of pages in the user space. was modeled as a Markov birth-death chain with the Another way to reduce write amplification is state being the number of In-Use LBAs, and the steady through use of the Trim command [15], which is a state solution was approximated as Gaussian with way for the file system to communicate to the SSD mean us and variance us¯, where s = (1 − 2q)/(1 − q) that certain data pages no longer need to be retained. and s¯ = q/(1 − q) are dependent on the rate of Trim. The pages associated with the data are invalidated in the FTL mapping, meaning that there are fewer valid 3.2.1 Effective Overprovisioning pages that need copying during garbage collection. Since the number of In-Use LBAs is the same as the number of valid pages, the level of effective overpro- 3 ANALYZING PERFORMANCE visioning is straightforward to calculate. The effective Several researchers have created analytic models to overprovisioning is measured in terms of the effective help understand the performance of SSDs. In this spare factor, Seff = (T − Xn)/T , where Xn is the 3 number of valid pages at time n. The mean effective are object IDs that are In-Use or not In-Use. Previ- spare factor was found to be ously, there were u LBAs allowed; now we allow [1, u] u T − us object IDs in the range , for a maximum of S¯ = objects. The object-view of the Trim-modified uniform eff T random workload is the same as the LBA view of the =s ¯ + sSf problem when each object occupies 1 page. The variance of the effective spare factor was also To remove the restriction that one object occupies calculated, and found to be negligible in practical only one page, we start by examining the workload devices (as T → ∞) [7], [9]. in which each object occupies b pages, where b is constant and identical for all objects in the workload. 3.2.2 Write Amplification Then, we examine the workload in which b varies We also explored write amplification under the Trim- each time an object is written. For all workloads, modified random workload. After paralleling the we assume that when an object ID is selected for derivation of Xiang and Kurkoski [6], we discovered trimming (or, more appropriately for objects, dele- that all three previously proposed models for write tion), all pages associated with that object ID are amplification could be modified by using the effective invalidated. Likewise, when an object is rewritten, all level of overprovisioning. In simulation, the Trim- pages previously containing data associated with that modified model of Xiang and Kurkoski provided the object ID are invalidated. Note that there are now two best prediction; the Trim-modified models of Hu, et quantities of interest: the number of object IDs that are al., and of Agarwal and Marrow were optimistic in in-use, and the number of pages that are valid. When their predictions of write amplification [8], [9]. each object occupies one page, these two quantities Additionally, we separated the workload into hot are the same. (frequently accessed) and cold (infrequently accessed) Define the following variables: T data in order to closer approximate a real-work work- χt = (χ1,t, χ2,t, . . . , χu,t) is a vector of indicator load. Analysis of this workload led to a more general random variables at time t. workload with N temperatures, and found that the write amplification of this workload type was lowest ( N 1 ω ∈ Ai (object i is In-Use) when the temperatures were kept physically sepa- χi,t = 1Ai (ω) = rated into different blocks [9]. 0 otherwise (object i is not In-Use) where 4 MODEL AUGMENTATION Ai = {ω|the STATE(ω) is such that location i is occupied} In the previous work described in Section 3, one page stored the data associated with one LBA. In this section, we augment the model to allow larger Pu Xt = i=1 χi,t is a random variable representing amounts of data to be stored under a single identifier, the number of objects in-use at time t. The mean and which lends itself naturally to a discussion of object- variance of this random variable at steady state were based storage. computed previously by Frankie, et al. [7], [9]. T Zt = (Z1,t,Z2,t,...,Zu,t) is a vector of the size of 4.1 From LBAs to Objects each object at time t. t In object-based storage, the filesystem does not assign Now, compute the number of valid pages at time : Y = ZTχ = Pu χ Z data to particular LBAs. Instead, the filesystem tells t t t i=1 i,t i,t the object-based storage disk to store certain data; the We are concerned with the values at steady state, disk chooses where to store it, and gives the filesys- and drop the time index to indicate the steady state tem a unique object ID with which the data can be solution. retrieved and/or modified [17]. Object-based storage allows for variable-sized objects with identifiers that 4.2 Fixed Sized Objects need not indicate any information about the physical location of the data [18]. Rajimwale, et al. have pro- This workload is an extension of the original work- posed that object-based storage is an optimal interface load whose steady state pdf was derived in [7]. How- for flash-based SSDs [19]. We use object-based storage ever, objects now occupy b pages instead of 1 page. as motivation for extensions to the theory of effective Recall that the number of In-Use objects is the same as overprovisioning and write amplification described in the number of In-Use LBAs previously derived; only Section 3. the number of valid pages will change. We assume The utilization problem we solved previously [7] that at least ub pages are allowed for the user to write can be viewed in object space instead of LBA space. to, so that it is not possible to request more space than Instead of LBAs that are In-Use or not In-Use, there is available. 4

T For this workload, Zt = (b, b, . . . , b) ∀t. This means = mZ E[X] u X = mZ us (1) Yt = χi,tZi,t i=1 The variance is more complicated because of cross- u terms. The χi values are not independent, so it is not X 1 = χi,tb simple to compute the variance . i=1 To compute the variance of Y , compute u X V ar(Y ) = E[Y 2] − E[Y ]2 = b χi,t i=1 The value of E[Y ] was computed in Equation 1. Com- = bX t puting E[Y 2] is more complicated. When following 2 To compute the steady state mean and variance of Y , the steps below, note that χi = χi , χi is independent we can use the known steady state mean and variance of Zi and Zj, and the Zi values are iid. of X.  u !2 2 X E[Y ] = E[bX] E[Y ] = E  χiZi  = bE[X] i=1  u  = bus X 2 X = E  χiZi + 2 χiχjZiZj Similarly, i=1 i

−4 x 10 Comparison of Analytic and Simulation pdf Obviously, when i = j, P (Ai|Aj) = 1. However, 8 calculating this value when i 6= j is important. Since there is nothing special about any particular i or j, 7 Gaussian Approximation of Analytic pdf assume that P (Ai|Aj) is the same for all i 6= j, and 6 Simulation pdf (histogram) i j thus is a constant with respect to and . 5 It is tempting to claim that P (Ai|Aj) = (us−1)/(u− 1), since us is the expected number of In-Use objects 4 at steady state. However, this is an invalid statement Probability 3 because it implicitly assumes a zero variance. In fact, 2 using this value in the calculation of V ar(Y ) results in a statement that V ar(Y ) = 0, which is exactly 1 what is expected because of the implicit zero variance 0 0 0.5 1 1.5 2 2.5 3 assumption. 4 # Valid Pages x 10 Instead of trying to compute P (Ai|Aj) directly, it is possible to compute it by using the result of the one Fig. 1. Comparison of analytic and simulation pdf for object occupies one page model. Working backwards the number of valid pages when objects are a ﬁxed size to calculate this probability, and noting that there are of 32 pages. For this example, u = 1000 and q = 0.2. (u−1)u 2 terms in the sum over i < j: u ! X V ar(Y ) = V ar χ i next 9 million requests. The results in each table are i=1 u the average of 64 Monte-Carlo simulations. Although X X u = 1000 = V ar (χi) + 2 Cov(χi, χj) results are only shown for , results for larger i=1 i

TABLE 1 Mean and σ of Number of In-Use Objects and Number of Valid Pages for Fixed Sized Objects. Values in this table were computed with u = 1000. The size of the objects was ﬁxed at 32. Simulation values are the average of 64 Monte-Carlo simulations.

Mean Objects σ Objects Mean Pages σ Pages Trim Amount q Analytic Simulation Analytic Simulation Analytic Simulation Analytic Simulation 0.05 947.37 947.37 7.25 7.24 30315.79 30315.80 232.15 231.80 0.1 888.89 888.89 10.54 10.54 28444.44 28444.37 337.31 337.37 0.2 750.00 750.01 15.81 15.82 24000.00 24000.37 505.96 506.18 0.3 571.43 571.47 20.70 20.73 18285.71 18287.10 662.46 663.29 0.4 333.33 333.43 25.82 25.83 10666.67 10669.70 826.24 826.46 0.45 181.82 181.89 28.60 28.62 5818.18 5820.40 915.32 915.93

TABLE 2 Mean and σ of Number of In-Use Objects and Number of Valid Pages for Variable Sized Objects. Values in this table were computed with u = 1000. The size of the objects was drawn from a discrete uniform distribution over [1, 32]. Simulation values are the average of 64 Monte-Carlo simulations.

Mean Objects σ Objects Mean Pages σ Pages Trim Amount q Analytic Simulation Analytic Simulation Analytic Simulation Analytic Simulation 0.05 947.37 947.36 7.25 7.27 15631.58 15631.33 308.37 307.92 0.1 888.89 888.92 10.54 10.53 14666.67 14667.03 325.62 325.63 0.2 750.00 750.00 15.81 15.80 12375.00 12374.67 363.32 363.42 0.3 571.43 571.41 20.70 20.70 9428.57 9429.00 406.69 406.49 0.4 333.33 333.35 25.82 25.79 5500.00 5500.13 458.17 457.71 0.45 181.82 181.81 28.60 28.58 3000.00 2999.85 488.11 487.65

TABLE 3 Mean and σ of Number of In-Use Objects and Number of Valid Pages for Variable Sized Objects. Values in this table were computed with u = 1000. The size of the objects was drawn from a binomial distribution over with parameters p = 0.4 and n = 32. Simulation values are the average of 64 Monte-Carlo simulations.

Mean Objects σ Objects Mean Pages σ Pages Trim Amount q Analytic Simulation Analytic Simulation Analytic Simulation Analytic Simulation 0.05 947.37 947.36 7.25 7.27 12126.32 12126.39 126.09 126.06 0.1 888.89 888.88 10.54 10.53 11377.78 11377.73 158.21 158.10 0.2 750.00 750.01 15.81 15.80 9600.00 9600.25 216.15 216.00 0.3 571.43 571.37 20.70 20.68 7314.29 7313.46 273.14 272.77 0.4 333.33 333.32 25.82 25.79 4266.67 4266.57 334.35 334.03 0.45 181.82 181.84 28.60 28.63 2327.27 2327.60 368.03 368.36

−3 −3 x 10 Comparison of Analytic and Simulation pdf x 10 Comparison of Analytic and Simulation pdf

1.8 Gaussian Approximation of Analytic pdf 1 1.6 Simulation pdf (histogram)

1.4 Gaussian Approximation 0.8 of Analytic pdf 1.2 Simulation pdf (histogram) 1 0.6 0.8 Probability Probability 0.4 0.6

0.4 0.2 0.2

0 0 0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3 4 4 # Valid Pages x 10 # Valid Pages x 10 Fig. 2. Comparison of analytic and simulation pdf for Fig. 3. Comparison of analytic and simulation pdf for the number of valid pages when the size of objects is the number of valid pages when the size of objects is drawn from a discrete uniform distribution over [1, 32]. drawn from a binomial distribution over with parame- For this example, u = 1000 and q = 0.2. ters p = 0.4 and n = 32. For this example, u = 1000 and q = 0.2. 7

PPLICATION TO RITE MPLIFICATION 5 A W A Computation of Trim-Modified Write Amplifica- In this section, we examine write amplification when tion Factor one object requires a fixed number of pages to store data. We assume that the number of pages required Compute p = (p1, ··· , pj, ··· , pw) to store the data associated with the object evenly di- −1.9 T − 1 mod(n , b) = p1 = e us (4) vides the number of pages per block (i.e. p ! 0, so that an entire object is stored within a single 1.1 pj−1 p = min 1, (5) block. As in previous work, data is stored in the j 1 np 1 − us fashion of a log-structured file system with greedy garbage collection. We argue that fixed sized objects with sizes that Compute v = vj,k , a matrix of size w × np, evenly divide the number of pages per block is j = 1, ··· , w & k = 0, ··· , np − 1 equivalent to reducing the number of pages in a k block. For instance, if two objects can fit in one block, X np n −m v = 1 − pm (1 − p ) p this is equivalent to having a block with two (large j,k m j j size) pages, and objects that require one of these m=0 (large) pages to store the associated data. With this equivalence between reduced number of pages per Compute V = (V1, ··· ,Vk, ··· ,Vnp ) block and increased object size, the formulas for write w−1 amplification with Trim that we previously devel- Y oped [8] should apply. However, the Trim-modified Vk = vj,k Agarwal and the Trim-modified Xiang models do not j=0 take np into account! The derivation of the Trim- modified Agarwal model causes n to drop out nat- p p∗ = (p∗, ··· , p∗, ··· , p∗ ) urally; it is not a good candidate for modification. Compute 0 k np The Trim-modified Xiang model is a better candidate, ∗ p = 1 − V0 as asymptotically removes np in the derivation. By 0 ∗ eliminating this asymptotic approximation, the write pk = Vk−1 − Vk for k = 1, . . . , np − 1 n /y amplification is computed as p , where p∗ = 1 − V np np−1 y = np + T (1+¯s) 1 1 Compute the write amplification, −W T (1 +s ¯) 1 − s(1+¯s)u ln 1 − s(1+¯s)u (3) np T 1 (1 +s ¯)ln 1 − AHu = np (6) np s(1+¯s)u P ∗ np − k=0 kpk and W (·) is the Lambert W function [20], [6], [8]. The Trim-modified Hu model does not need to be changed in order to take np into account, as it already does this. Fig. 4. Equations needed to compute the Trim- The model can be found in Figure 4. modified empirical model-based algorithm for AHu Results comparing the simulated write amplifica- (adapted from [3], [4] and [8]). w is the window size, tion and the computed write amplification are in Ta- and allows for the windowed greedy garbage collection ble 4 and Figure 5. In this simulation, there were 1280 variation of greedy garbage collection. Setting w = blocks, with a manufacturer-specified spare factor of t − r is needed for the standard greedy garbage np 0.2, and a Trim factor of 0.1. As the number of pages collection discussed in this paper. For an explanation per block changed, the effective spare factor was of the theory behind these formulas, see [3], [4]. held constant to enable comparisons. As expected, the write amplification was reduced as the number of pages per block decreased, with the minimum possible write amplification of 1 achieved with 1 page in a block. The modification of the Xiang model shown in Equation 3 was a very poor approximation as the number of pages in a block decreased; this When objects do not fit neatly within blocks, such as is likely due to the assumption that the number of the case for mod(np, b) 6= 0, for variable sized objects, invalid pages freed in a block by garbage collection or for objects larger than one block, the computation is binomial; with a small number of pages per block, of write amplification is not as straightforward. First, this approximation becomes too coarse. However, the design decisions must be made: should an object be Trim-modified Hu model provides a reasonable, if split over multiple blocks, or should empty, unwritten optimistic, prediction of write amplification. Note that pages be left on a block, that perhaps could be filled the Trim-modified Hu model does not make sense to in later by a smaller object? The design decision will compute if there is one page per block, so the entry affect the analysis of the write amplification; this is an in the table is left blank. area for future research. 8

TABLE 4 workloads. By using this model in conjunction with Write amplification for varying number of pages per object-based storage, we envision metadata servers block, equivalent to fixed sized objects that evenly that separate object classes of distinct sizes to unique divide n . The level of effective overprovisioning was p flash SSDs in order to assist in self-optimization of kept constant for each simulation. The amount of Trim hardware that would improve overall data access was q = 0.1. All models referenced are Trim-modified; performance and improve the lifetime of the SSD. the Xiang model was additionally modified to account for n , as shown in Equation 3. p REFERENCES

Write Amplification [1] Y. Kang, J. Yang, and E. Miller, “Efficient storage management Pages Modified for object-based flash memory,” in 2010 18th Annual IEEE/ACM Hu /Block Simulation Xiang International Symposium on Modeling, Analysis and Simulation of Model np Model Computer and Telecommunication Systems. IEEE, 2010, pp. 407– 1 1.000 1.936 - 409. 2 1.191 1.937 1.000 [2] D. Feng and L. Qin, “Adaptive object placement in object- 4 1.432 1.938 1.065 based storage systems with minimal blocking probability,” in Advanced Information Networking and Applications, 2006. AINA 8 1.631 1.938 1.274 2006. 20th International Conference on, vol. 1. IEEE, 2006, pp. 16 1.771 1.938 1.628 611–616. 32 1.853 1.938 1.732 [3] X. Hu, E. Eleftheriou, R. Haas, I. Iliadis, and R. Pletka, “Write 64 1.896 1.938 1.793 amplification analysis in flash-based solid state drives,” in 128 1.918 1.938 1.828 Proceedings of SYSTEM 2009: The Israeli Experimental Systems 256 1.929 1.938 1.847 Conference. ACM, 2009. [4] X. Hu and R. Haas, “The fundamental limit of flash random Write Amplification vs. Pages Per Block write performance: Understanding, analysis and performance 2 modelling,” IBM Research Report, vol. RZ 3771, Mar 2010. [5] R. Agarwal and M. Marrow, “A closed-form expression for Write Amplification in NAND Flash,” in GLOBECOM Work- 1.8 shops (GC Wkshps), 2010 IEEE. IEEE, 2010, pp. 1846–1850. [6] L. Xiang and B. Kurkoski, “An Improved Analytical Expres- sion for Write Amplification in NAND Flash,” in International Conference on Computing, Networking, and Communications 1.6 Simulation , 2012, Also available at Arxiv preprint arXiv:1110.4245. Hu [7] T. Frankie, G. Hughes, and K. Kreutz-Delgado, “A Mathe- Xiang 1.4 matical Model of the Trim Command in NAND-Flash SSDs,” in Proceedings of the 50th Annual Southeast Regional Conference.

Write Amplification, A ACM, 2012. 1.2 [8] ——, “The Effect of Trim Requests on Write Amplification in Solid State Drives,” in Proceedings of the IASTED-CIIT Confer- ence. IASTED, 2012. 1 0 50 100 150 200 250 [9] ——, “Analysis of trim commands on overprovisioning and Pages Per Block, n write amplification in solid state drives,” ACM Transactions on p Storage (TOS), 2012, in Submission. [10] L. Grupp, A. Caulfield, J. Coburn, S. Swanson, E. Yaakobi, Fig. 5. Write amplification versus pages per block P. Siegel, and J. Wolf, “Characterizing flash memory: anoma- for constant effective overprovisioning. The modified lies, observations, and applications,” in Proceedings of the 42nd Xiang model (Equation 3) fails when the number of Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 2009, pp. 24–33. pages per block is low. The Trim-modified Hu model [11] D. Roberts, T. Kgil, and T. Mudge, “Integrating NAND flash provides an optimistic, but reasonable, prediction. devices onto servers,” Commun. ACM, vol. 52, no. 4, pp. 98– 103, 2009. [12] T. Chung, D. Park, S. Park, D. Lee, S. Lee, and H. Song, “A sur- vey of flash translation layer,” Journal of Systems Architecture, 6 CONCLUSION vol. 55, no. 5-6, pp. 332–343, 2009. [13] A. Gupta, Y. Kim, and B. Urgaonkar, “DFTL: a flash translation In this paper, the Trim model we previously presented layer employing demand-based selective caching of page- level address mappings,” in Proceedings of the 14th international in [7], [8], [9] was extended to encompass the idea conference on Architectural support for programming languages and of object-based storage and allow objects to utilize operating systems. ACM, 2009, pp. 229–240. more than one page for data storage. A utilization [14] M. Rosenblum and J. Ousterhout, “The design and implemen- tation of a log-structured file system,” ACM Transactions on model was derived and shown to be accurate via sim- Computer Systems (TOCS), vol. 10, no. 1, pp. 26–52, 1992. ulation. Additionally, models for write amplification [15] Information Technology - ATA/ATAPI Command Set - 2 (ACS-2), were modified and compared to simulated values, INCITS Working Draft T13/2015-D Rev. 7 ed., June 2011, http://www.t13.org/documents/UploadedDocuments/ demonstrating the failure point of one model and the docs2011/d2015r7-ATAATAPI Command Set - 2 ACS-2. need for a more accurate model in another. Models pdf. such as the ones presented in this paper are useful [16] N. Agrawal, V. Prabhakaran, T. Wobber, J. Davis, M. Manasse, and R. Panigrahy, “Design tradeoffs for SSD performance,” in tools for understanding the problems that an FTL USENIX 2008 Annual Technical Conference on Annual Technical design needs to address, and can lead to improved Conference. USENIX Association, 2008, pp. 57–70. future designs. [17] Information Technology - SCSI Object-Based Storage Device Commands-3 (OSD-3), INCITS Working Draft T10/2128-D Rev. The model extension presented in this paper helps 0 ed., March 2009, http://www.t10.org/cgi-bin/ac.pl?t=f&f= make the workload a more characteristic of real-world osd3r00.pdf. 9

[18] Y. Kang, J. Yang, and E. Miller, “Object-based SCM: An Ef- ﬁcient Interface for Storage Class Memories,” in Mass Storage Systems and Technologies (MSST), 2011 IEEE 27th Symposium on. IEEE, 2011, pp. 1–12. [19] A. Rajimwale, V. Prabhakaran, and J. Davis, “Block management in solid-state devices,” in Proceedings of the 2009 conference on USENIX Annual technical conference. USENIX Association, 2009, pp. 21–21. [20] R. Corless, G. Gonnet, D. Hare, D. Jeffrey, and D. Knuth, “On the LambertW function,” Advances in Computational mathemat- ics, vol. 5, no. 1, pp. 329–359, 1996.