Exploiting Memory Device Wear-Out Dynamics to Improve NAND Flash Memory System Performance
Total Page:16
File Type:pdf, Size:1020Kb
Exploiting Memory Device Wear-Out Dynamics to Improve NAND Flash Memory System Performance Yangyang Pan, Guiqiang Dong, and Tong Zhang Electrical, Computer and Systems Engineering Department Rensselaer Polytechnic Institute, USA. Abstract applications, from consumer electronics to personal and enterprise computing. In particular, it is now economi- This paper advocates a device-aware design strategy to cally viable to implement solid-state drives (SSDs) using improve various NAND flash memory system perfor- NAND flash memory, which is expected to fundamen- mance metrics. It is well known that NAND flash tally change the memory and storage hierarchy in future memory program/erase (PE) cycling gradually degrades computing systems. As the semiconductor industry is memory device raw storage reliability, and sufficiently aggressively pushing NAND flash memory technology strong error correction codes (ECC) must be used to en- scaling and the use of multi-bit-per-cell storage scheme, sure the PE cycling endurance. Hence, memory man- NAND flash memory increasingly relies on error correc- ufacturers must fabricate enough number of redundant tion codes (ECC) to ensure the data storage integrity. It memory cells geared to the worst-case device reliability is well known that NAND flash memory cells gradually at the end of memory lifetime. Given the memory de- wear out with the program/erase (PE) cycling [6], which vice wear-out dynamics, the existing worst-case oriented is reflected as gradually diminishing memory cell storage ECC redundancy is largely under-utilized over the en- noise margin (or increasing raw storage bit error rate). To tire memory lifetime, which can be adaptively traded for meet a specified PE cycling endurance limit, NAND flash improving certain NAND flash memory system perfor- memory manufacturers must fabricate enough number mance metrics. This paper explores such device-aware of redundant memory cells that can tolerate the worst- adaptive system design space from two perspectives, in- case raw storage reliability at the end of memory life- cluding (1) how to improve memory program speed, and time. Clearly, the memory cell wear-out dynamics tend (2) how to improve memory defect tolerance and hence to make the existing worst-case oriented ECC redun- enable aggressive fabrication technology scaling. To en- dancy largely under-utilized over the entire lifetime of able quantitative evaluation, we for the first time develop memory, especially at its early lifetime when PE cycling a NAND flash memory device model to capture the ef- number is relatively small. fects of PE cycling from the system level. We carry out simulations using the DiskSim-based SSD simula- Very intuitively, we may adaptively trade such under- tor and a variety of traces, and the results demonstrate utilized ECC redundancy for improving certain NAND up to 32% SSD average response time reduction. We flash memory system performance metrics throughout further demonstrate that the potential on achieving very the memory lifetime. This naturally leads to a PE- good defect tolerance, and finally show that these two cycling-aware adaptive NAND flash memory system de- design approaches can be readily combined together to sign paradigm. Based upon extensive open literature on noticeably improve SSD average response time even in flash memory devices, we first develop an approximate the presence of high memory defect rates. NAND flash memory device model that quantitatively captures the dynamic PE cycling effects, including ran- dom telegraph noise [15, 17] and interface trap recovery 1 Introduction and electron detrapping [26, 31, 45], and another major noise source: cell-to-cell interference [25]. Such a device The steady bit cost reduction over the past decade has en- model makes it possible to explores and quantitatively abled NAND flash memory to enter increasingly diverse evaluate possible adaptive system design strategies. In ∗This material is based upon work supported by the National Sci- particular, this paper explores the adaptive system de- ence Foundation under Grant No. 0937794 sign space from two perspectives: (1) Since NAND flash 1 memory program speed also strongly affects the mem- allelism. It is well known that the threshold voltage of ory cell storage noise margin, we could trade the under- erased memory cells tends to have a wide Gaussian-like utilized ECC redundancy to adaptively improve NAND distribution [41]. Hence, we can approximately model flash memory program speed; (2) We could also exploit the threshold voltage distribution of erased state as the under-utilized ECC redundancy to realize stronger 2 1 − (x−me) memory cell defect tolerance and hence enable more ag- 2s2 pe(x) = p e e ; (1) gressive technology scaling. We elaborate on the under- se 2p lying rationale and realizations of these two design ap- where me and se are the mean and standard deviation proaches. In addition, for the latter one, we propose a of the erased state threshold voltage. Regarding mem- simple differential wear-leveling strategy in order to min- ory program, a tight threshold voltage control is typi- imize its impact on effective PE cycling endurance. cally realized by using incremental step pulse program For the purpose of evaluation, using the developed (ISPP) [6, 39], i.e., all the memory cells on the same NAND flash memory device model, we first obtain de- word-line are recursively programmed using a program- tailed quantitative memory cell characteristics under dif- and-verify approach with a stair case program word-line ferent PE cycling times and different program speed for voltage Vpp. Let DVpp denote the incremental program a hypothetical 2bit/cell NAND flash memory. Accord- step voltage. For the k-th programmed state with the ver- ingly, with the sector size of 512B user data, we construct (k) ify voltage Vp , ideally ISPP program results in a uni- a binary BCH code (4798, 4096, 54) with 1.1% coding form threshold voltage distribution: redundancy that can ensure the data storage integrity at ( the PE cycling limit of 10K. Using representative work- 1 ; if V (k) ≤ x ≤ V (k) + DV (k) DVpp p p pp load traces and the SSD model [3] in DiskSim [8], we pp (x) = : (2) 0; else carry out extensive simulations to evaluate the potential of trading under-utilized ECC redundancy to improve Unfortunately, the above ideal memory cell thresh- memory program speed while assuming the memory is old voltage distribution can be (significantly) distorted defect-free. The simulation results show that we could in practice, mainly due to PE cycling and cell-to-cell in- reduce the SSD average response time by up to 32%. As- terference, which will be discussed in the remainder of suming memory defects follow Poisson distributions, we this section. further show that the proposed differential wear-leveling technique can very effectively improve the effectiveness 2.2 Effects of PE Cycling of allocating ECC redundancy for improving memory defect tolerance. Finally, we study the combined effects Flash memory PE cycling causes damage to the tunnel when we trade the under-utilized ECC redundancy to im- oxide of floating gate transistors in the form of charge prove memory program speed and realize defect toler- trapping in the oxide and interface states [9, 30, 34], ance at the same time. DiskSim-based simulations show which directly results in threshold voltage shift and fluc- that, even in the presence of high defect rates, we can still tuation and hence gradually degrades memory device achieve noticeable SSD average response time reduction. noise margin. Major distortion sources include 1. Electrons capture and emission events at charge 2 Background trap sites near the interface developed over PE cy- cling directly result in memory cell threshold volt- 2.1 Memory Erase and Program Basics age fluctuation, which is referred to as random tele- graph noise (RTN) [15, 17]; Each NAND flash memory cell is a floating gate tran- 2. Interface trap recovery and electron detrapping [26, sistor whose threshold voltage can be configured (or pro- 31,45] gradually reduce memory cell threshold volt- grammed) by injecting certain amount of charges into the age, leading to the data retention limitation. floating gate. Hence, data storage in NAND flash mem- ory is realized by programming the threshold voltage Moreover, electrons trapped in the oxide over PE cy- of each memory cell into two or more non-overlapping cling make it difficult to erase the memory cells, leading voltage windows. Before one memory cell can be pro- to a longer erase time, or equivalently, under the same grammed, it must be erased (i.e., remove the charges erase time, those trapped electrons make the threshold in the floating gate, which sets its threshold voltage to voltage of the erased state increase [4, 21, 27, 42]. Most the lowest voltage window). NAND flash memory uses commercial flash chips employ erase-and-verify opera- Fowler-Nordheim (FN) tunneling to realize both erase tion to prevent the increase of erase state threshold volt- and program [7], because FN tunneling requires very age at the penalty of gradually longer erase time with PE low current and hence enables high erase/program par- cycling. 2 PE cycling number N Memory Ideal Distorted by Distorted by Cell-to-Cell Distorted by interface trap Final Threshold Erase Programming RTN Interference recovery and electron detrapping Voltage Distribution V # (µe ,! e ) $ pp "r (µd ,! d ) t Figure 1: Illustration of the approximate NAND flash memory device model to incorporate major threshold voltage distortion sources. RTN causes random fluctuation of memory cell cell, and the coupling ratio g(k) is defined as threshold voltage, where the fluctuation magnitude is C(k) subject to exponential decay. Hence, we can model g(k) = ; (5) the probability density function pr(x) of RTN-induced Ctotal threshold voltage fluctuation as a symmetric exponential (k) function [15]: where C is the parasitic capacitance between the in- 1 − jxj terfering cell and the victim cell, and Ctotal is the total pr(x) = e lr : (3) capacitance of the victim cell.