arXiv:1704.05138v1 [cs.AR] 17 Apr 2017 eletbihdtcnlg n aebe ieyue nco in used widely been have and technology well-established ihn efrac vredadas eysalcagsa t at changes small very also and overhead performance no with erlvln ehiusfraheiglne ieie I lifetime. longer achieving for techniques wear-leveling ie,u o1 er nmdr eie) eod yexploiti by second, devices); modern in years 10 to up (i.e., htcudptnilyb sdt osrc C,fls memory flash SCM, construct to used be potentially could techn that memory magnet nonvolatile solid-state hard-disk candidate conventional Among of age. cost low and a the capabilities with robustness, and solid- high-performance a as such of memory, benefits the combines (SCM) memory Storage-class ABSTRACT a ewe RM(oae erpoesr [ processor) performan near large (located is DRAM between there gap that mem observe conventional we the hierarchy, changing at storage by Looking performance co overall and hierarchy. higher problem memory-storage achieve wall memory to the how target sider might one instance, impro For performance for ways impera additional is developing substan it consider in Therefore, results computers. longer of no improvement constrain devices formance power logic the become of have scaling CPUs and the decade, last the During INTRODUCTION 1 software. FTL osgicnl mrv eielftm bten5.1 (between ab lifetime is device D-SLC improve that simulation-b demonstrate significantly extensive we to an SCM, flash-based Using a lifeti SCM. of the analysis an extending of for characteristic part this (F solid-state software use management to flash order the finally,in in and changes extension; required lifetime the cuss for a cycle into erase writes each multiple during perform D us called enables mechanism, which novel (D-SLC), a SLC propose we relaxation, time tion C,w hwta aoiyo h aasoe nasolid-stat a in memo flash stored by data provided times the retention long of require majority not do a SCM longevi that data show quantifying we by SCM, first, lifetime, flash improve reductio man to write by of attention combination of some using lot are a that received ufactures has is and This problem unusable). becomes documented it before technology, flash the on o ahbsdSM(ahato rtn i a lgtyda 10 slightly written be may can bit cell a flash one writing so of cell, a act age (each SCM cha flash-based significant a a for is than endurance power write However, and systems. space based performance, between tradeoffs enable better SCM Flash-based incarnations. SCM available cially xliigDt ogvt o nacn h ieieof Lifetime the Enhancing for Longevity Data Exploiting enyvnaSaeUniversity State Pennsylvania [email protected] [email protected] yugo Jung Myoungsoo osiUniversity Yonsei oi Choi Wonil ls-ae trg ls Memory Class Storage Flash-based 4 –10 21 5 ie,depending times, , 22 , × 25 neffort an n n 8.6 and eo the of me , greten- ng yi an in ty 38 vement. ilper- tial ologies ce-cost much s edis- we cstor- ic rchival well- a ieto tive mmer- llenge and n and ] ense- state disk- ased ory- TL) cell sa is the ed, × m- he ry le n- e ) - ihahg-efrac,hg-est n o-otnon-vo low-cost and high-density high-performance, a with nSCfls elhstovlaesae sdfrsoigone-b storing for used states voltage two has cell flash SLC An e-i)flsbtenDA n D,adi aldSoaeCl Storage [ called is (SCM) cos and Memory its HDD, as and well DRAM between (as falls time per-bit) access whose technology (NVM) memory ayfls eoyapiain,teeaeas oecsswhe cases some does also data are stored m there the for a applications, is data memory non-volatility retain flash long-term many to this Although expected traditi years. are devices more and flash or times The retention data. long stored have the hold correctly relaxing by can it improving for time opportunity retention the discuss and SCM o ierneo nepie(n LP okod ae fro taken workloads [ OLTP) suite (and I/O Cambridge enterprise S MSR of SLC the an range into wide data written a this of confirms for longevity work the measure this we in – SS study havior MLC characterization underlying Our re- the HDD. by I/O or handled the wi normally requests handle are I/O longevity to the term while SSD longevity, SLC short-term the with quests expect we hierarchy, storage e efrac nalknso optn ytm.Ti spo is boos This to systems. computing potential of kinds the all has in performance gap tem this Bridging advances. nology id fcmeca ytm agn rmlposaddeskto and various laptops computers. in from enterprise used ranging widely systems been commercial ( paper of has future kinds this which near Instead, SCM any flash-based in costs). NAND SCM fabrication a in high them their exploit of to cause unlikely quite is it [ HDD ihwiepesr ni fls eoysfesfo o ele cell low from 10 tolerate suffers pos can SLC memory cell which the (flash each traffic that i.e., it incoming durance, is on the reason pressure of The write portion high setup. great this significant a in a services SSD is SSD SLC endurance the Write for terab level. lenge with lower SSD the MLC an at and capacity level, upper the at capacity gabytes hr saSCSldSaeDie(S)[ (SSD) Drive State Solid st SLC internal same a hierarchal the is a in there long has technologies typically a densit the SCM has flash-based both higher a and of for tem, fast benefits metrics is the have these SLC to off time. order trades each MLC at a but data states bit lifetime, two more than more or has 2 cell stores flash MLC an while information, (ML fl Cell popular Multi-Level of and types (SLC) two Cell are Single-Level There memories: holds. it electrons of amount oois(uha hs hneMmr [ Memory Change Phase as (such nologies nti ae,w agttelftm rbe fSCSDi an in SSD SLC of problem lifetime the target we paper, this In ls eoysoe iaydt ntefr facag,i.e., charge, a of form the in data binary stores memory Flash 11 , 16 ,adti a a eoelre ihtercn tech- recent the with larger become has gap this and ], enyvnaSaeUniversity State Pennsylvania enyvnaSaeUniversity State Pennsylvania ftefls,ie,tepro ftm htafls cell flash a that time of period the i.e., flash, the of oamdArjomand Mohammad 4 [email protected] amtKandemir Mahmut , 10 not [email protected] .Dsietercn dacsi V tech- NVM in advances recent the Despite ]. eur t o xml,wti memory- a within example, For it. require 32 .W bev htamjrt of majority a that observe We ]. 4 –10 2 , 35 9 5 , rga/rs cycles). program/erase n antcRM[ RAM Magnetic and ] 13 , 20 ihtn fgi- of tens with ] assumes hlong- th ructure: s for ust onally ssible .In y. sys- t latile sto ps chal- one sys- ash SD the yte the ass be- be- C). nd es re m D n- it t- 39 ]), written data into the SCM for all evaluated workloads exhibits very program (write), and erase. Page is the unit of a read or a write op- short longevity. In fact, more than 90% of written data in these eration, and reprogramming a cell needs to be preceded by an erase. workloads have a longevity of up to 10 hours (it is less than 3 min- Erase is performed at unit of a block. Due to the need for erase- utes for some applications and less than 10 hours for some others). before-write operation and high latency of an erase, flash memory The retention time relaxation for flash memory is previously stud- usually employs an out-of-place update policy, i.e., when updating ied by some works [26, 33]. They have shown that, by controlling a data, the page containing the old data is marked as invalid, and the write process in flash device, it is possible to reduce its reten- the new data is written to an arbitrary clean page. The new page is tion time (more details on the theory behind this property are given marked as valid. in Section 3.2). The prior work mostly use this property for per- FTL: The FTL implements some important functionalities for formance improvement of the flash by reducing its write execution flash memory management. We go over two main FTL’s function- time. In this paper, however, we use the retention time relaxation of alities in below. flash to enhance its lifetime. The main idea is that, by relaxing the • Address mapping: On receiving an I/O request, FTL segments retention time of an SLC device, we can have more than two states it into multiple pages and maps each page onto flash chips sepa- in a cell. At each given time, similar to the baseline SLC, we use rately. Address mapping for a write request is a two-step process. every two states to write one bit information. In this way, a device First, a chip-level allocation strategy [19] determines which chip stores multiple bits (one bit at each time) before it needs an erase, in- each page should be mapped to. Then, the page is mapped to one creasing the number writes to cell during one erase cycle, or simply of the blocks and a page inside it (block allocation). For each increasing the PWE1 of the device beyond the conventional SLC chip, FTL always keeps one clean block as the active block for flash (i.e., one). Increasing PWE of a device directly translates into new page writes. Within an active block, clean pages are used lifetime improvement. to program data in a sequential order. Once the clean pages run Our proposed flash memory design is called Dense-SLC (D-SLC) out in the active block, a new active block is assigned. Finally, and its implementation needs two minor changes at FTL. First, the the mapping information of each page (i.e., chip number, block block-level allocation algorithm in FTL should be modified to en- number in the chip, and page index in the block) is stored in a able having blocks with different retention times and use them for mapping table which is kept by FTL. On receiving a read request, storing data values of different longevities. Our proposed block- the FTL looks up the mapping table for finding its physical page level allocation algorithm does not require any specific mechanism location. for retention time estimation. Instead it uses a simple and yet ef- • Garbage collection (GC): When the number of free pages falls fective page migration scheme that imposes negligible lifetime and below a specific threshold, the FTL triggers a GC procedure to bandwidth overhead. Second, the garbage collection algorithm, which reclaim the invalid pages and make some pages clean. When a is needed for the flash management, is modified to ease the system- GC is invoked, the target blocks are selected, their valid pages are level implementation of writing multiple bits in one erase cycle. moved (written) to other places, and finally the blocks are erased. These modifications are simple to implement in an FTL and need Due to the page migrations and erase operation, a GC generally two-bit metadata information per one block. Using a detailed im- takes a long time and consumes significant SSD bandwidth [14, plementation of D-SLC flash memory in DiskSim simulator [1, 3], 15, 24]. we evaluate its lifetime and performance efficiency for a large work- load set. Our experimental evaluation confirms that a typical imple- mentation of D-SLC is able to improve SLC SSD’s lifetime by up to 2.2 SLC-based SSD 8.6× (6.8×, on average) with no degradation in the overall system Flash memory conventionally stores one-bit information in each performance. cell (Single-Level Cell or SLC). However, during the last few years, manufactures leverage the ability to store multiple bits in a single 2 PRELIMINARIES cell – cells in recent products can store 3 bits (called Triple-Level 2.1 SSD and Flash Memory Cell or TLC) before which 2-bit cell (Multiple-Level Cell or MLC) was the norm. The multi-bit capability of a cell is provided by en- Figure 1 illustrates the internal architecture of an SSD which is com- abling multiple voltage states in it – MLC has four different volt- posed of three components: 1) Host Interface communicates with age states, whereas TLC has eight different voltage states (some- the host system, queues the incoming requests, and schedules them times called voltage levels). Despite of their low cost per bit, the for services; 2) The SSD controller is responsible for processing I/O TLC/MLC flash memories have higher access latencies and lower requests and managing SSD resources by executing Flash Transla- endurance than the SLCs[34]. tion Layer (FTL) software; 3) A set of NAND flash memory chips In order to have the benefits of both technologies in the same as the storage medium, which are connected to the controller via system, current SCM designs usually rely on a two-level and hy- multiple buses. brid flash-based hierarchy. At upper level, there is a small and fast NAND flash chip: A flash memory has thousands of blocks and SLC flash-based SSD (with tens of gigabyte capacity), and a dense each block has hundreds of pages. Each page is a row of NAND MLC flash-based SSD (with few terabyte capacity) is used at lower flash cells. Binary values of a cell is represented by its charge level. In such an architecture, the SLC-based SSD is responsible for holding property. Flash memory has three main operations: read, servicing most of the I/O traffic and hence its lifetime is very crucial 1We use the term “page writes per erase cycle” (PWE) as the maximum number of (because of writes). In this work, we focus on enhancing lifetime logical pages stored in one physical page during one P/E cycle. of the SLC SSD in SCM. However, the studied characterization and 2
CVH `QJ