i2WAP: Improving Non-Volatile Cache Lifetime by Reducing Inter- and Intra-Set Write Variations

Jue Wang¹, Xiangyu Dong², Yuan Xie¹,³, Norman P. Jouppi⁴
¹Pennsylvania State University, ²Qualcomm Technology, Inc., ³AMD Research, ⁴Hewlett-Packard Labs
¹{jzw175,yuanxie}@cse.psu.edu, [email protected], [email protected], [email protected]

Abstract

Modern computers require large on-chip caches, but the scalability of traditional SRAM and eDRAM caches is constrained by leakage and cell density. Emerging non-volatile memory (NVM) is a promising alternative for building large on-chip caches. However, limited write endurance is a common problem for non-volatile memory technologies. In addition, today's cache management might result in unbalanced write traffic to cache blocks, causing heavily-written cache blocks to fail much earlier than others. Unfortunately, existing wear-leveling techniques for NVM-based main memories cannot simply be applied to NVM-based on-chip caches, because cache writes have intra-set variations as well as inter-set variations. To solve this problem, we propose i2WAP, a new cache management policy that can reduce both inter- and intra-set write variations. i2WAP has two features: (1) Swap-Shift, an enhancement based on previous main memory wear-leveling techniques that reduces cache inter-set write variations; (2) Probabilistic Set Line Flush, a novel technique that reduces cache intra-set write variations. Implementing i2WAP only needs two global counters and two global registers. By adopting i2WAP, we can improve the lifetime of on-chip non-volatile caches by 75% on average and up to 224%.

1 Introduction

SRAM and embedded DRAM (eDRAM) are commonly used for on-chip cache designs in modern microprocessors. However, the scaling of SRAM and eDRAM is increasingly constrained by technology limitations such as leakage power and cell density. Recently, some emerging non-volatile memory technologies, such as Phase-Change RAM (PCM or PCRAM), Spin-Torque Transfer RAM (STTRAM or MRAM), and Resistive RAM (ReRAM), have been explored as alternative memory technologies. Compared to SRAM and eDRAM, these non-volatile memory technologies have the advantages of high density, low standby power, better scalability, and non-volatility. For example, among many ReRAM prototype demonstrations [11, 14, 21-25], a 4Mb ReRAM macro [21] can achieve a cell size of 9.5F^2 (15X denser than SRAM) and a random read/write latency of 7.2ns (comparable to SRAM caches of similar capacity). Although its write access energy is usually on the order of 10pJ per bit (10X the SRAM write energy, 5X that of DRAM), the actual energy saving comes from its non-volatility: non-volatility eliminates the standby leakage energy, which can be as high as 80% of the total energy consumption of an SRAM L2 cache [10].

Given the potential energy and cost saving opportunities (via reduced cell size) from the adoption of non-volatile technologies, replacing SRAM and eDRAM with them is attractive. Consistent with this expectation, such a technology shift is already happening at the lower levels of the memory hierarchy, such as storage and main memory. For example, PCM-based storage [1, 4, 5] and main memory [8, 13, 15-17, 19, 20, 26, 27] have already been explored.

However, adopting non-volatile memory at the on-chip cache level raises a write endurance issue. For example, PCM is not suitable for on-chip caches because it is only expected to sustain 10^8 writes before experiencing frequent stuck-at-1 or stuck-at-0 errors [8, 15, 17, 19, 20, 26]. For ReRAM, the write endurance bar is much improved but is still only around 10^11 [11]. For STTRAM, although a prediction of up to 10^15 write cycles is often cited, the best endurance test result for STTRAM devices so far is less than 4 x 10^12 cycles [6]. Although the absolute write endurance values of ReRAM and STTRAM seem sufficiently high for use in on-chip L2 or L3 caches, the actual problem is that current cache management policies are not write variation-aware. These policies were originally designed for SRAM caches and result in significant non-uniformity in the writes to cache blocks, which would cause heavily-written non-volatile cache blocks to fail much earlier than most other blocks.

Many wear-leveling techniques have been proposed to extend the lifetime of non-volatile main memories [15, 20, 27], but the difference between cache and main memory operational mechanisms makes the existing wear-leveling techniques for non-volatile main memories inadequate for non-volatile caches.

This is because writes to caches have intra-set variations in addition to inter-set variations, while writes to main memories only have inter-set variations. According to our analysis, intra-set variations can be comparable to inter-set variations for some workloads. This presents a new challenge in designing wear-leveling techniques for non-volatile caches.

To minimize both inter- and intra-set write variations, we introduce i2WAP (inter/intra-set Write variation-Aware cache Policy), a simple but effective wear-leveling policy for non-volatile caches. i2WAP features two schemes: 1) Swap-Shift is enhanced from existing main memory wear-leveling techniques and aims to reduce the cache inter-set write variation; 2) Probabilistic Set Line Flush is designed to alleviate the cache intra-set write variation, which is a severe problem for non-volatile caches and has not been addressed before.

Our experimental results show that i2WAP can effectively reduce the total write variation by 16X on average, and the overall lifetime improvement of non-volatile L2 caches is 75% (up to 224%) over conventional cache management policies. i2WAP has a small performance overhead (0.25% on average) and negligible hardware overhead, requiring only two extra global counters and two global registers.

2 Background

2.1 Benefits of Non-Volatile Caches

It is beneficial to use non-volatile memory technologies as on-chip caches. First, compared to SRAM and eDRAM, non-volatile memory can provide denser caches due to its smaller cell. For example, ReRAM [21] was demonstrated to be 15 times denser than SRAM with a similar read speed. Thus, using non-volatile memory can enable significantly larger caches with a correspondingly reduced cache miss rate, improving performance over SRAM-based caches. Second, non-volatile caches can reduce energy consumption. Previous studies have shown that caches may consume up to 50% of a microprocessor's energy [18], and leakage energy can be as much as 80% of the total cache energy consumption [10]. Using non-volatile memories can eliminate leakage energy when they are in standby, hence reducing the total energy consumption.

2.2 Limited Write Endurance

Write endurance is defined as the number of times a memory cell can be overwritten, and emerging non-volatile memories commonly have a limited write endurance. The ITRS [7] projects that the average PCM write endurance is in a range between 10^7 and 10^8. Recent ReRAM prototypes demonstrate the best write endurance ranging from 10^10 [14] to 10^11 [11]. The best endurance test on STTRAM devices so far is less than 4 x 10^12 cycles [6]. The write endurance of some non-volatile memories may seem sufficiently high for use in L2 or L3 caches, but the actual problem is the write variation brought by state-of-the-art cache replacement policies, which would cause heavily-written cache blocks to fail much earlier than their expected lifetime. Thus, eliminating cache write variation is one of the most critical problems that must be addressed before non-volatile memories can be used to build practical and reliable on-chip caches.

3 Inter-Set and Intra-Set Write Variations

Write variation is a significant concern in designing any cache/memory subsystem with a limited write endurance. A large write variation can greatly degrade the product lifetime, because only a small subset of memory cells experiencing the worst-case write traffic can result in an entirely dead cache/memory subsystem even when the majority of cells are far from wear-out.

While the write variation in non-volatile main memories has been widely studied [13, 15, 16, 20, 27], to our knowledge, the write variation in non-volatile caches has not. Wear-leveling in caches brings extra challenges since there are write count variations inside every cache set (i.e., intra-set variations) as well as across different cache sets (i.e., inter-set variations). To demonstrate how severe the problem is for non-volatile caches, we first do a quick experiment.

3.1 Definition

The objective of cache wear-leveling is to reduce write variations and make write traffic uniform. To quantify the cache write variation, we first define the coefficient of inter-set variations (InterV) and the coefficient of intra-set variations (IntraV) as follows:

\[ \mathrm{InterV} = \frac{1}{W_{aver}} \sqrt{ \frac{1}{N-1} \sum_{i=1}^{N} \left( \frac{1}{M}\sum_{j=1}^{M} w_{i,j} - W_{aver} \right)^{2} } \tag{1} \]

\[ \mathrm{IntraV} = \frac{1}{W_{aver}\,N} \sum_{i=1}^{N} \sqrt{ \frac{1}{M-1} \sum_{j=1}^{M} \left( w_{i,j} - \frac{1}{M}\sum_{j=1}^{M} w_{i,j} \right)^{2} } \tag{2} \]

where w_{i,j} is the write count of the cache line located at set i and way j, N is the total number of cache sets, M is the number of cache ways in one set, and W_{aver} is the average write count defined as:

\[ W_{aver} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} w_{i,j} \tag{3} \]
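For readers who want to reproduce these metrics from simulator output, the following sketch (our own illustration, not code from the paper's artifact) computes W_aver, InterV, and IntraV directly from Eqns. 1-3, given an N x M matrix of per-line write counts.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Computes W_aver, InterV, and IntraV (Eqns. 1-3).
// w[i][j] is the write count of the cache line in set i, way j.
struct Variation { double waver, interV, intraV; };

Variation computeVariation(const std::vector<std::vector<double>>& w) {
    const size_t N = w.size(), M = w[0].size();
    double total = 0.0;
    for (const auto& set : w)
        for (double c : set) total += c;
    const double waver = total / (N * M);

    // InterV: coefficient of variation of the per-set average write counts.
    double interSum = 0.0;
    for (const auto& set : w) {
        double setAvg = 0.0;
        for (double c : set) setAvg += c;
        setAvg /= M;
        interSum += (setAvg - waver) * (setAvg - waver);
    }
    const double interV = std::sqrt(interSum / (N - 1)) / waver;

    // IntraV: average (over sets) of the within-set deviation, normalized by W_aver.
    double intraSum = 0.0;
    for (const auto& set : w) {
        double setAvg = 0.0;
        for (double c : set) setAvg += c;
        setAvg /= M;
        double dev = 0.0;
        for (double c : set) dev += (c - setAvg) * (c - setAvg);
        intraSum += std::sqrt(dev / (M - 1));
    }
    const double intraV = intraSum / (waver * N);

    return {waver, interV, intraV};
}

int main() {
    // Toy example: 2 sets x 4 ways with one hot line in set 0.
    std::vector<std::vector<double>> w = {{40, 10, 10, 10}, {5, 5, 5, 5}};
    Variation v = computeVariation(w);
    std::printf("Waver=%.2f InterV=%.2f IntraV=%.2f\n", v.waver, v.interV, v.intraV);
}
```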

[Figure 1: The coefficient of variation for inter-set and intra-set write counts of the L2 and L3 caches in a simulated 4-core system with 32KB I-L1, 32KB D-L1, 1MB L2, and 8MB L3 caches. (a) L2 InterV and IntraV; (b) L3 InterV and IntraV.]

If every w_{i,j} has the same value, then InterV = IntraV = 0. In short, InterV is defined as the CoV (coefficient of variation) of the average write counts across cache sets, and IntraV is defined as the average of the CoVs of the write counts within each cache set (see footnote 1).

Footnote 1: We use the average CoV instead of the maximum CoV to keep the definitions of inter-set variation (the CoV of the averages) and intra-set variation (the average of the CoVs) symmetric.

Figure 1 shows the experimental results of InterV and IntraV in our simulated 4-core system with 32KB I-L1, 32KB D-L1, 1MB L2, and 8MB L3 caches. The detailed simulation methodology and settings are described in Section 8. Each graph compares inter-set and intra-set variations in the L2 and L3 caches, since we assume that emerging non-volatile memories will first be used in low-level caches. From Figure 1, we make two observations:

• Large inter-set write variations: The cache lines in different sets can experience totally different write frequencies because applications can have biased address residency. For instance, streamcluster has 189% InterV in L2, and swaptions has 115% InterV in L3. On average, InterV is 66% in L2 and 22% in L3.

• Large intra-set write variations: If just one cache line in a set is frequently visited by cache write hits, it will absorb a large number of cache writes, and thus the write accesses may be unevenly distributed over the remaining M-1 lines in the set (for an M-way associative cache). For example, freqmine has 84% IntraV in L2, and swaptions has 222% IntraV in L3. On average, IntraV is 17% in L2 and 27% in L3.

We should also notice that the write variations in L2 and L3 are greatly different. On average, inter-set variations in L3 caches are smaller than those in L2 caches. This is because the L2 caches are private to each processor while the L3 cache is shared by all cores; the write variance is reduced in L3 since it mixes requests from different cores. However, it is worth noticing that, compared to an L3 cache, L2 caches have a larger average write count on each cache line because they are at a higher level of the memory hierarchy and have smaller capacity. Thus, the higher variation of L2 caches makes their limited endurance a more severe problem.

In addition, our results show that the intra-set variation is roughly the same as, or even larger than, the inter-set variation for some workloads. Combining these two types of write variations significantly shortens the lifetime of non-volatile caches.

4 Cache Lifetime Metrics

Cache lifetime can be defined in two different ways: raw lifetime and error-tolerant lifetime. The raw lifetime is defined by the first failure of a cache line. This lifetime is called raw because no error recovery effort is considered. On the other hand, the raw lifetime can be extended by using error correction techniques and paying overhead in either memory performance or memory capacity [8, 17, 19, 26], resulting in an error-tolerant lifetime. In this work, we mainly focus on how to improve the raw lifetime because it is the basis of the error-tolerant lifetime. A further discussion of the error-tolerant lifetime is in Section 9.3.

Since the raw lifetime of non-volatile caches is determined by the first memory cell that wears out, it is necessary to consider three factors: the average write count, the inter-set write count variation, and the intra-set write count variation. To estimate the overall improvement (or degradation) of the cache lifetime, we introduce a fairness metric, Lifetime Improvement (LI).

The target of maximizing the cache lifetime is equivalent to minimizing the worst-case write count to a cache line. Although it is impractical to obtain the worst-case write count throughout the whole product lifetime spanning several years, in this work we sample the statistics of the cache write access behavior during a short period of time by simulation and then predict the write variations across the entire product lifespan. Our methodology is:

• First, the cache behavior is simulated during a short period of time t_sim (e.g., 10 billion instructions on a 3GHz CPU).

• Second, each cache line write count is collected to get the average write count W_aver. We also calculate InterV and IntraV according to Eqn. 1 and Eqn. 2.

• Third, assuming the total write variation of a cache line is the summation of its inter- and intra-set variations (see footnote 2), we then have W_var = W_aver * (InterV + IntraV).

• Finally, the worst-case write count is predicted as W_aver + W_var to make sure the vast majority of cases are covered. While this approach is approximate, Figure 2 validates the feasibility of this model.

Footnote 2: Although InterV and IntraV are not independent, the worst variation is the sum of these two values.

[Figure 2: The L2 cache write count probability distribution function (PDF) of blackscholes.]

Assuming the general characteristics of cache write operations for one application do not change with time (see footnote 3), the lifetime of the system can be defined as:

\[ t_{total} = \frac{W_{max}\, t_{sim}}{W_{aver} + W_{var}} = \frac{W_{max}\, t_{sim}}{W_{aver}(1 + \mathrm{InterV} + \mathrm{IntraV})} \tag{4} \]

Footnote 3: If the system runs different applications over time, the cache write variance can be reduced. However, in this work we only consider the worst case. It occurs in some practical scenarios, such as embedded applications in which the data layout could largely remain the same.

Thus, the lifetime improvement (LI) of a cache wear-leveling technique can be expressed as:

\[ LI = \frac{W_{aver,base}\,(1 + \mathrm{InterV}_{base} + \mathrm{IntraV}_{base})}{W_{aver,imp}\,(1 + \mathrm{InterV}_{imp} + \mathrm{IntraV}_{imp})} - 1 \tag{5} \]

The objective of cache wear-leveling is to increase LI. It needs to reduce inter-set and intra-set variations while not significantly increasing the average write count.

[Figure 3: The baseline lifetime of L2 and L3 caches normalized to the ideal lifetime (no write variations in the ideal case).]

Figure 3 shows the comparison between the baseline and the ideal lifetime of the L2 and L3 caches, in which there are no write variations in the ideal case. It can be seen that for some workloads, such as swaptions, the baseline cache lifetime is only about 20%-30% of the ideal lifetime. This means that many memory cells are still healthy when the system fails due to a few cells wearing out under the large write variation. Thus, designing a write variation-aware cache policy is critical to the future success of non-volatile on-chip caches, and we need a cache policy that can tackle both inter-set and intra-set write variations.
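The model in Eqns. 4 and 5 reduces to a few lines of arithmetic. The sketch below is our own illustration of it; the endurance value and the sampled statistics are hypothetical inputs, not numbers from the paper.

```cpp
#include <cstdio>

// Sampled statistics for one configuration over a fixed simulation window.
struct CacheStats { double waver, interV, intraV; };

// Predicted worst-case write count per line (denominator of Eqn. 4).
double worstCaseWrites(const CacheStats& s) {
    return s.waver * (1.0 + s.interV + s.intraV);
}

// Raw lifetime estimate (Eqn. 4): cell endurance scaled by the simulated
// window t_sim and divided by the worst-case write count in that window.
double rawLifetime(const CacheStats& s, double endurance, double tSim) {
    return endurance * tSim / worstCaseWrites(s);
}

// Lifetime Improvement (Eqn. 5) of an improved policy over the baseline.
double lifetimeImprovement(const CacheStats& base, const CacheStats& imp) {
    return worstCaseWrites(base) / worstCaseWrites(imp) - 1.0;
}

int main() {
    CacheStats baseline = {1000.0, 0.66, 0.17};        // hypothetical L2 sample
    CacheStats wearLeveled = {1020.0, 0.01, 0.04};     // hypothetical i2WAP sample
    std::printf("LI = %.0f%%\n",
                100.0 * lifetimeImprovement(baseline, wearLeveled));
}
```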

5 Starting from Inter-Set Write Variations

5.1 Challenges of Inter-Set Wear-Leveling

The existing wear-leveling techniques [15, 16, 20, 27] focus on increasing the lifetime of non-volatile main memory. The key principle behind these techniques is to introduce an address re-mapping layer. This remains the same for cache inter-set wear-leveling. However, there are some differences between designing wear-leveling policies for non-volatile main memory and for caches.

Using data movement: Main memory wear-leveling techniques usually use data movement to implement the address re-mapping. This is because in main memory the data cannot be lost and must be moved to a new position after re-mapping. However, data movement operations always incur overhead. First, they need temporary storage to move the data. Second, one cache set movement involves several reads and writes; the cache port is blocked during the data movement and the system performance is degraded. Start-Gap [15] is a recently proposed technique for main memory wear-leveling; if Start-Gap is directly extended to handle cache wear-leveling, it falls into this category.

Using data invalidation: Another option for implementing set address re-mapping in non-volatile caches is data invalidation. Cache line invalidations can be used because data in a cache can be restored later from lower-level memories if it is not dirty. This special feature of caches provides a new opportunity for designing cache inter-set wear-leveling techniques.

Compared to data movement, data invalidation has no area overhead. In addition, to quantify the performance overhead of these two choices, we design and simulate two systems (see Section 8 for detailed simulation settings). In the data movement system, the data in one cache set is moved to another after every 100 writes to the cache, which is extended from the Start-Gap technique [15]. In the second system, the data is not moved but invalidated. Figure 4 shows the normalized performance of these two systems. Compared to the data movement system, the data invalidation system degrades the system performance by only 2% on average and up to 7%.

[Figure 4: The performance comparison between data invalidation and data movement.]

For the data invalidation system, the performance overhead comes from writing back the dirty data in the cache set to main memory and restoring data from lower-level memories that would otherwise have been hit later. However, the latency of writing back and restoring data can often be hidden using write-back buffer and MSHR techniques [12]. On the other hand, the time overhead of the data movement system cannot be optimized without adding hardware complexity, since the data block movement within one cache needs to be done serially.

Therefore, as the first step of our work, we take the previous main memory wear-leveling approach and enhance it into the Swap-Shift (SwS) scheme to reduce the inter-set write variation in non-volatile caches.

5.2 Swap-Shift (SwS)

In contrast to the existing wear-leveling techniques for non-volatile main memory, our SwS scheme is designed for non-volatile caches. It uses data invalidation instead of data movement during cache set address re-mapping to minimize both area and performance overhead.

SwS architecture: The goal of the SwS scheme is to shift the mapping of cache physical sets so as to rotate the stored data between sets. However, shifting all cache sets at once brings a significant performance overhead. To solve this problem, SwS only swaps the mapping of two sets at a time, and all cache sets are automatically shifted after multiple swaps.

A global counter is used in SwS to store the number of writes to the cache, which we annotate as Wr. Two registers, SwV (ranging from 0 to N-2) and ShV (ranging from 0 to N-1), are used to track the current status of the swaps and shifts, respectively. The detailed mechanism is as follows:

Swap Round (SwapR): Every time Wr reaches a specific threshold (the Swap Threshold, ST), a swap between cache set[SwV] and set[SwV+1] is triggered. Note that this swap operation only exchanges the set IDs; the data stored in these two sets are simply invalidated (or written back to the next level of caches if dirty). After that, SwV is incremented by 1. One swap round (SwapR) consists of N-1 swaps, and one SwapR indicates that all the cache set IDs have been shifted by 1.

Shift Round (ShiftR): ShV is incremented by 1 after each SwapR, and at the same time SwV is reset to 0. One shift round (ShiftR) consists of N shifts (i.e., SwapRs).

[Figure 5: One shift rotation of SwS in a cache with 4 sets, showing the logical-to-physical set mapping under different SwV and ShV values.]

Figure 5 shows an example of the set ID mapping under different SwV and ShV values for a cache consisting of 4 sets. It shows the process of one complete ShiftR, which consists of 4 SwapRs. It can be seen that one SwapR consists of 3 swaps and all cache sets are shifted by 1 after each SwapR. In addition, after one ShiftR, all cache sets have returned to their original positions and all logical set indexes are the same as the physical ones.

The performance penalty of SwS is small because only two sets are swapped at a time and the swap interval can be made long enough (e.g., millions of cycles) by adjusting ST. The performance analysis of SwS is in Section 9.1.

[Figure 6: The mapping between the logical set index (LS) and the physical set index (PS) in SwS.]

SwS implementation: The implementation of SwS is shown in Figure 6. One global counter is used to store Wr, and two registers are used to store ShV and SwV. When a logical set number (LS) arrives, the physical set number (PS) can be computed based on three different situations:
1. If LS = SwV, this logical set is exactly the cache set being swapped in this ShiftR. Therefore, PS is mapped to the current shift value (ShV).

2. If LS > SwV, this set has not yet been shifted in this ShiftR. Therefore, PS is mapped to (LS + ShV) mod N.

3. If LS < SwV, this set has already been shifted. Therefore, PS is mapped to (LS + ShV + 1) mod N.

When a dirty cache line is written back to the next level of cache, the logical set address needs to be re-generated. The mapping from PS to LS is symmetric and is also given in Figure 6. This mapping policy can be verified with the simple example in Figure 5. Because SwV and ShV change along with cache writes, the mapping between LS and PS changes all the time. This ensures that the writes to different physical sets are balanced, reducing inter-set variations.

Compared to a conventional cache architecture, the set index translation step in SwS only adds a simple arithmetic operation and can be merged into the row decoder. We implemented the LS-to-PS address translation in a 45nm CMOS technology and it completes well within one cycle at a 3GHz clock frequency. In addition, this 1-cycle latency overhead is only paid on higher-level cache misses, which need to access lower-level caches.
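The LS-to-PS and PS-to-LS translations of Figure 6, together with the SwapR/ShiftR counter updates, can be captured compactly. The sketch below is our reading of that logic (illustrative C++, not the authors' hardware description); N is the number of sets, and the SwV/ShV update order follows the textual description above.

```cpp
#include <cassert>
#include <cstdint>

// Swap-Shift (SwS) set-index remapping, following the Figure 6 description.
// SwV in [0, N-2] tracks the swap position inside the current SwapR;
// ShV in [0, N-1] tracks how many whole shifts have completed.
struct SwS {
    uint32_t N;          // number of cache sets
    uint32_t SwV, ShV;   // swap / shift registers
    uint64_t writes;     // global write counter Wr

    // Called on every cache write; ST is the swap threshold.
    void onWrite(uint64_t ST) {
        if (++writes % ST != 0) return;
        SwV = (SwV + 1) % (N - 1);            // one swap completed
        if (SwV == 0) ShV = (ShV + 1) % N;    // a SwapR completed -> one more shift
    }

    // Logical set -> physical set (used on every access).
    uint32_t toPhysical(uint32_t LS) const {
        if (LS == SwV) return ShV;
        if (LS > SwV)  return (LS + ShV) % N;
        return (LS + ShV + 1) % N;
    }

    // Physical set -> logical set (used when a dirty line is written back).
    uint32_t toLogical(uint32_t PS) const {
        if (PS == ShV) return SwV;
        uint32_t d = (PS + N - ShV) % N;      // (PS - ShV) mod N without negatives
        if (d > SwV) return d;
        return (d + N - 1) % N;               // (PS - ShV - 1) mod N
    }
};

int main() {
    SwS s{4, 0, 0, 0};
    for (uint32_t ls = 0; ls < 4; ++ls)
        assert(s.toLogical(s.toPhysical(ls)) == ls);  // mapping must be invertible
}
```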
6 Intra-Set Variation: A More Severe Issue

SwS can distribute writes across all the cache sets, but it only reduces the inter-set write variation. Our experimental results in Section 8.2 show that SwS cannot eliminate intra-set variations. Thus, we need other schemes, using principles fundamentally different from previous wear-leveling techniques, to reduce intra-set variations.

6.1 Set Line Flush

Intra-set write variations are mainly caused by hot data being written much more frequently than other data. For example, if one cache line in a set is frequently targeted by cache write hits and absorbs a large number of cache writes, the write accesses may be unevenly distributed within this cache set.

In a traditional LRU (least recently used) cache policy, every accessed block is marked towards MRU (most recently used) to avoid being chosen as the victim line. As a result, the LRU policy rarely replaces hot data, since it is frequently accessed by cache write hits. This increases the write count of one block and the intra-set write variation of the corresponding set.

To solve this problem, we first consider a set line flush (LF) scheme. In LF, when a cache write hit happens, the new data is put into the write-back buffer directly instead of being written to the hit data block, and the cache line is then marked INVALID. This process is called a set line flush. Hence, the block containing the hot data has the opportunity to be replaced by other, cold data according to the LRU policy, and the hot data can be reloaded into another cache line. We invalidate the hot data line instead of moving it to another position for reasons similar to those given in Section 5.1, which showed that data invalidation is better than data movement.

LF balances the write counts of the cache blocks, but it flushes on every cache write hit, no matter whether the line contains hot data or not. Obviously, LF causes a large performance degradation, as the flushed data has to be reloaded if it is hot. To reduce the performance penalty, we need to add intra-set write count statistics and only flush hot cache lines.

6.2 Hot Set Line Flush

One of the simplest solutions is to add counters to the cache to store the write count of every cache line; we call this scheme hot set line flush (HoLF). Theoretically, if the difference between the largest write count of a line and the average value of the set is beyond a predetermined threshold, the data in this cache line is too hot and should be flushed.

However, HoLF has significant overhead in both area and performance:

• The area overhead is large since it requires adding one counter for every cache line. Considering that a cache line is 64B wide and the write counter is 20 bits, the hardware overhead is more than 3.7%.

• The performance is inevitably impaired because HoLF tracks both the maximum and the average write count in every cache set. It is infeasible to initiate multiple arithmetic calculations for every cache write.

We do not discuss HoLF further in this paper as it is not a practical solution. Instead, we introduce an improved solution called probabilistic set line flush (PoLF).

6.3 Probabilistic Set Line Flush

The motivation of probabilistic set line flush (PoLF) is to flush hot data probabilistically instead of deterministically.

Probabilistic invalidation: Unlike HoLF, PoLF maintains only one global counter that counts the number of write hits to the entire cache, and it flushes a cache line when the counter saturates, no matter whether the cache line to be flushed is hot or not. Although there is no guarantee that the hottest data will be flushed as desired, the probability of PoLF selecting a hot data line is high: the hotter the data is, the more likely it is to be the access that saturates the global counter. Theoretically, PoLF is able to flush the hottest data in a cache set most of the time, and the big advantage of PoLF is that it requires only one global counter.

Maintaining LRU: Under the normal LRU policy, when a cache line is invalidated, the age bits of that line are marked as LRU. However, for PoLF, because hot data are accessed more frequently, it is possible that after invalidating a single hot cache line, the same data will be reinstalled in the very same line on a subsequent miss. Thus, we modify PoLF to maintain the age bits across probabilistic invalidations. When PoLF flushes a line, it does not mark it as the LRU line and its age bits are not changed. Later, a subsequent miss will invalidate the actual LRU line and reinstall the hot data in that line. The cache line evicted by PoLF remains invalid until it becomes the LRU line.

Comparison with other policies: Figure 7 shows the behavior of a 4-way cache set under the LRU, LF, and PoLF policies for an exemplary access pattern. The following observations can be made from this example:

• For LRU, the hot data a0 is moved to the MRU position (age bits = 3) after each write hit and is never replaced by other data. Thus, the intra-set variation under LRU is the largest among all the policies.

• For LF, each write hit causes the corresponding cache line to be flushed. The age bits are not changed on write hits. The intra-set variation is reduced compared to the LRU policy because the hot data a0 is reloaded into another cache line. However, data a1 is also flushed, since every write hit causes one cache line flush, and this brings one additional access miss.

• For PoLF, we let every other write hit cause a cache line flush (i.e., line-flush threshold FT = 2; see footnote 4). Compared to the LRU policy, its intra-set variation is reduced because the hot data a0 is moved to another cache line. In addition, compared to LF, the number of misses is reduced because a1 is not flushed.

Thus, PoLF reduces the intra-set variation while keeping the probability of selecting a hot data line high.

Footnote 4: FT is set to 2 for this illustration. More typical values are much larger and are shown in Section 8.

[Figure 7: The behavior of one cache set composed of 4 ways under the LRU, LF, and PoLF policies for the same access pattern. The total write count of each cache way, the average write count, and the intra-set variation are marked for each policy (LRU: per-way write counts 2/1/1/1, AvgWr = 1.25, IntraV = 0.4; LF and PoLF: per-way write counts 1/1/1/1, AvgWr = 1, IntraV = 0).]

PoLF implementation: We can design the hardware implementation of PoLF as follows. First, we add a write hit counter (one counter for the entire cache). The counter is incremented only on write hit events. When the counter saturates at the threshold, the cache services the write hit that caused the saturation by writing the data back and invalidating the corresponding line, and the counter is reset. Figure 8 shows the architecture of PoLF. The only hardware overhead of PoLF is a global counter that tracks the total number of write hits to the cache. The tunable parameter of PoLF is the line-flush threshold FT.

[Figure 8: The cache architecture of PoLF. Only one global write-hit counter is added to the entire cache; when it reaches FT, the line hit by the current write is written back and invalidated.]
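As a concrete illustration of the PoLF flow described above (our own sketch; the cache model is stripped down to the bookkeeping PoLF actually adds), the code below keeps a single global write-hit counter and, whenever it reaches FT, turns the current write hit into a set line flush without touching the line's age bits.

```cpp
#include <cstdint>
#include <vector>

// Probabilistic Set Line Flush (PoLF) bookkeeping, reduced to its essentials.
// One global write-hit counter serves the whole cache; when it reaches the
// line-flush threshold FT, the write that caused saturation goes to the
// write-back buffer and its line is invalidated instead of being updated.
class PoLF {
public:
    explicit PoLF(uint64_t flushThreshold) : FT(flushThreshold) {}

    // Returns true if this write hit must be turned into a set line flush.
    bool onWriteHit() {
        if (++writeHits < FT) return false;
        writeHits = 0;
        return true;
    }

private:
    const uint64_t FT;
    uint64_t writeHits = 0;
};

struct CacheLine { bool valid = true; bool dirty = false; uint8_t age = 0; };

// Action taken on a write hit to `line` under PoLF.
void handleWriteHit(PoLF& polf, CacheLine& line) {
    if (polf.onWriteHit()) {
        // Flush: new data goes to the write-back buffer (not modeled here),
        // the line becomes INVALID, and its age bits are left unchanged so a
        // later miss evicts the true LRU line rather than this slot.
        line.valid = false;
        line.dirty = false;
    } else {
        line.dirty = true;
        line.age = 3;          // normal LRU behavior: promote toward MRU
    }
}

int main() {
    PoLF polf(10);             // FT = 10, the value used in the evaluation
    std::vector<CacheLine> set(8);
    for (int i = 0; i < 25; ++i) handleWriteHit(polf, set[0]);  // hot line
}
```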
Table 1. Workload characteristics in the L2 and L3 caches under our baseline configuration (WPKI/TPKI: writes/transactions per kilo-instructions).

Workload        L2 WPKI  L2 TPKI  L3 WPKI  L3 TPKI
blackscholes    0.07     0.4      0.04     0.3
canneal         0.04     23       0.01     15
dedup           1.1      4.8      0.4      0.8
facesim         3.3      4.7      1.1      1.4
ferret          1.8      6.3      0.2      0.5
fluidanimate    0.4      1.4      0.3      0.8
freqmine        1.3      6.7      0.2      0.4
raytrace        0.56     0.62     0.03     0.25
streamcluster   3.7      4.2      0.9      1.1
swaptions       1.4      2.9      0.02     0.06
vips            1.1      4.4      0.6      1.0
x264            0.7      16.1     0.2      0.5

7 i2WAP: Putting Them Together

We combine SwS and PoLF to form our inter- and intra-set write variation-aware policy, i2WAP. In i2WAP, SwS and PoLF work independently: SwS reduces inter-set variations, and PoLF reduces intra-set variations. The total write variation can be reduced significantly and the product lifetime can be improved. Moreover, the implementation overhead of i2WAP is quite small: SwS only requires one global counter to store the number of write accesses and two registers to store the current swapping and shifting values; PoLF only needs another global counter to store the number of write hits.

8 Experiments

In this section, we first describe our experimental methodology, then we demonstrate how SwS and PoLF reduce the inter- and intra-set write variations, respectively. Finally, we show how i2WAP improves the non-volatile cache lifetime.

8.1 Baseline Configuration

Our baseline is a 4-core CMP system. Each core has private L1 and L2 caches, and all the cores share an L3 cache. Our experiments use a 4-thread OpenMP version of the PARSEC 2.1 [2] benchmark workloads. We run a single application at a time since wear-leveling techniques are usually designed for the worst case. The native inputs are used for the PARSEC benchmarks to generate realistic program behavior. We modify the gem5 full-system simulator [3] to implement our proposed techniques and use it to collect cache accesses. Each gem5 simulation run is fast-forwarded to a pre-defined breakpoint at the code region of interest, warmed up for 100 million instructions, and then simulated for at least 10 billion instructions. The characteristics of the workloads are listed in Table 1, in which WPKI and TPKI are writes and transactions per kilo-instructions, respectively.

In this work, we use ReRAM L2 and L3 caches as an example. Our techniques and evaluations are also applicable to other non-volatile memory technologies. The system parameters are given in Table 2.

8.2 Write Variations Reduction

8.2.1 Effect of SwS on Inter-Set Variations

In SwS, the inter-set variation reduction is related to the number of shift rounds (ShiftR) during the period of interest. Assuming there are N sets in the cache, one ShiftR includes N swap rounds (SwapR) and one SwapR consists of N-1 swaps. After each ShiftR, every cache set has been shifted through all the possible locations. Thus, the more rounds the cache is shifted, the more evenly the write accesses are distributed across the cache sets.

We annotate the number of ShiftR rounds as RRN, and it can be computed as follows:

\[ RRN = \frac{W_{total}}{ST \times N \times (N-1)} = \frac{WPI \times I_n}{ST \times N \times (N-1)} \tag{6} \]

in which ST is the swap threshold and W_total is the product of WPI (write accesses per instruction) and I_n (the number of simulated instructions). For the same application, if the execution time is longer, which means I_n is larger, we can use a larger ST value to obtain the same RRN.

To illustrate the relationship between the inter-set variation reduction and RRN, we run simulations using configurations with different execution lengths. Figure 9 shows the result. We can see that as RRN increases, the inter-set variation is reduced significantly. When RRN is larger than 100, the inter-set variation is reduced to less than 5% of its original value.

For a 1MB cache running an application with a WPKI of 1, if we want to reduce its inter-set variation by 95% within one month, the swap threshold ST can be set larger than 100,000 according to Eqn. 6 with RRN set to 100. However, simulating a system for one month of wall-clock time is not realistic. To evaluate the effectiveness of SwS, we therefore use a smaller ST (e.g., ST = 10) over a relatively shorter execution (e.g., 100 billion instructions) to obtain a similar RRN. Figure 9(b) shows the inter-set variation of an L2 cache after adopting SwS when RRN equals 100: the average inter-set variation is reduced from 66% to 1.2%.

In practice, ST can be scaled along with the entire product lifespan, since our wear-leveling goal is to balance the cache line write counts on the scale of several months if not years. Thus, the swap operation in SwS is infrequent enough to hide its impact on system performance.
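Eqn. 6 is easy to sanity-check numerically. The sketch below (our own arithmetic, with deliberately small and hypothetical numbers) shows how RRN scales with the swap threshold ST for a fixed write budget, and how one would pick the largest ST that still reaches a target RRN.

```cpp
#include <cstdio>

// Eqn. 6: RRN = W_total / (ST * N * (N - 1)), with W_total = WPI * I_n.
double rotationRounds(double totalWrites, double ST, double numSets) {
    return totalWrites / (ST * numSets * (numSets - 1.0));
}

// Largest swap threshold that still reaches `targetRRN` for a given write budget.
double maxSwapThreshold(double totalWrites, double targetRRN, double numSets) {
    return totalWrites / (targetRRN * numSets * (numSets - 1.0));
}

int main() {
    // Hypothetical example: 64 sets and 1e9 total cache writes.
    const double numSets = 64.0, totalWrites = 1e9;
    for (double st : {1e3, 1e4, 1e5})
        std::printf("ST = %.0f -> RRN = %.1f\n",
                    st, rotationRounds(totalWrites, st, numSets));
    std::printf("Largest ST for RRN >= 100: %.0f\n",
                maxSwapThreshold(totalWrites, 100.0, numSets));
}
```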
Table 2. Baseline configurations.

System               4-core, 3GHz, out-of-order CPU model based on ALPHA 21264
SRAM* I-L1/D-L1      private, 32KB/32KB, 8-way, 64-byte cache line, LRU & write-back, write allocate, 2-cycle
ReRAM L2 cache       private, 1MB, 8-way, 64-byte cache line, LRU & write-back, write allocate, 30-cycle
ReRAM L3 cache       shared, 8MB, 8-way, 64-byte cache line, LRU & write-back, write allocate, 100-cycle
DRAM main memory     4GB, 128-entry write buffer, 200-cycle

* We envision that SRAM is still used in L1 due to performance concerns and the current ReRAM write endurance limit.

[Figure 9: Inter-set variations normalized to the baseline as RRN increases under the SwS scheme. The zoom-in sub-figure shows the per-workload L2 inter-set variation after adopting SwS when RRN equals 100 (baseline average 66%, SwS average 1.2%).]

[Figure 10: The average intra-set variation and the average write count, normalized to the baseline, for L2 and L3 caches after adopting the PoLF scheme. The zoom-in sub-figure shows the per-workload L2 intra-set variation under PoLF with a line-flush threshold (FT) of 10 (baseline average 17%, PoLF average 4%).]

242 400% 200% L2 IntraV L2 InterV 100% 84% 50% 25% 13% 6% 5% 3% 2% 1% 0% SwS SwS SwS SwS SwS SwS SwS SwS SwS SwS SwS SwS SwS PoLF PoLF PoLF PoLF PoLF PoLF PoLF PoLF PoLF PoLF PoLF PoLF PoLF i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP baseline baseline baseline baseline baseline baseline baseline baseline baseline baseline baseline baseline baseline blackscholes canneal dedup facesim ferret fluidanimate freqmine raytrace streamcluster swaptions vips x264 Average 400% 200% L3 IntraV L3 InterV 100% 50% 49% 25% 15% 13% 6% 3% 2% 1% 0% SwS SwS SwS SwS SwS SwS SwS SwS SwS SwS SwS SwS SwS PoLF PoLF PoLF PoLF PoLF PoLF PoLF PoLF PoLF PoLF PoLF PoLF PoLF i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP i2WAP baseline baseline baseline baseline baseline baseline baseline baseline baseline baseline baseline baseline baseline blackscholes canneal dedup facesim ferret fluidanimate freqmine raytrace streamcluster swaptions vips x264 Average Figure 11. The total variation for L2 and L3 caches under the baseline configuration, SwS scheme (RRN=100), PoLF scheme (FT=10) and i2WAP policy. Each value is broken down to the inter-set variation and the intra-set variation. Note that a log scale is used to cover a large range of variations.

200% 80% 224% 100% 180% SwS PoLF i2WAP 70% SwS PoLF i2WAP 160% 60% 140% 120% 50% 100% 40% 80% 30% 60% 20% 40% 20% 10% L2 lifetime improvement L3 lifetime improvement 0% 0%

Figure 12. The lifetime improvement after adopting i2WAP using Eqn. 5. (Left: L2, Right: L3) on the workload. Basically, the larger the original variation • In SwS, the interval of swap operations is long (e.g. value is, the bigger the improvement a workload has. The 10 million instructions), and only two cache sets are overall lifetime improvement is 75% (up to 224%) for L2 re-mapped for each operation. caches and 23% (up to 100%) for L3 caches. • In PoLF, write hit accesses are infrequent enough to ensure the frequency of set line flush operations is low 9 Analysis of Other Issues (e.g. once per 105 instructions), and only one cache line is flushed each time. In addition, designers can 2 9.1 Performance Overhead of i WAP tradeoff between the number of flush operations and the variation value by adjusting FT. Since i2WAP causes extra cache invalidations, it is necessary to compare its performance to a baseline system without wear-leveling5. Figure 13 shows the performance 9.2 Impact on Main Memory overhead of a system in which L2 and L3 caches using i2WAP with ST = 100, 000 and FT =10compared to We also evaluate the impact of extra write backs on main a baseline system in which an LRU policy is adopted. memory. Figure 14 shows the write count to the main As shown in Figure 13, on average, the IPC of the system memory compared to the baseline system after adopting the 2 using i2WAP is reduced only by 0.25% compared to the i WAP policy. The result shows that its impact on the write baseline. The performance penalty of i2WAP is very small count is very small, only increasing about 4.5% on average. because of two reasons: For most workloads, the write count is increased by less 5The simulator has a protocol to ensure cache coherency when than 1%. In addition, because most writes can be filtered by invalidations occur, thus the performance overhead of this part is included. caches and the write count of main memory is much smaller

243 10.00% Normalized performance Lifetime extention 100% 1.1 90% 1.09 1.00% 80% 1.08 70% 1.07 0.10% 60% 1.06 50% 1.05 40% 1.04

IPC degradationIPC 0.01% 30% 1.03

20% 1.02 Lifetime extension

compared to baseline 10% 1.01 Normalized performance 0.00% 0% 1 10% 20% 30% 40% 50% 60% 70% 80% 90% Percentage of failure cache line Figure 13. The system IPC degradation com- Figure 15. The performance degradation and pared to the baseline system after adopting lifetime extension during gradual cache line the i2WAP policy. failure on a non-volatile cache hierarchy. 130% 1, e.g. from 8-way associative to 7-way associative. 125% baseline i2WAP 120% The error-tolerant lifetime is at least the same as the 115% 110% raw lifetime and may be much longer. However, the 105% performance is degraded because the cache associativity is 100% 95% reduced. Figure 15 shows an analysis of an ReRAM-based 90% cache hierarchy with 32KB L1 caches, 1MB L2 caches, and 85% 8MB L3 caches. It shows that if the system can tolerate the 80% Write count to main Writememory failure of 50% of the cache lines at all levels, the lifetime can be extended by 6% and the performance penalty is 6 Figure 14. The write count to the main mem- 15% . ory normalized to the baseline system after adopting i2WAP. 10 Related Work than that of caches, the endurance requirement for non- There are various previous architectural proposals to volatile main memory is much looser. In the worst case, extend the lifetime of non-volatile memories. These prior although the write count to the main memory of swaptions work can be classified by two basic types of techniques: is increased by 25%, its absolute value of write backs is The first category focuses on evenly distributing very small (about 0.001 writes to the main memory per kilo- unbalanced write frequencies to all memory lines. instructions). Thus, it does not significantly degrade the Zhou et al. [27] proposed a segment swapping policy for lifetime of the main memory even though its write access PCM main memory. Qureshi et al. proposed fine-grain frequency is relatively higher. wear leveling (FGWL) [16] and start-gap wear leveling [15] to shift cache lines within a page to achieve uniform wear 9.3 Error-Tolerant Lifetime out of all lines in the page. Seong et al. [20] addressed potential attacks by applying security refresh. However, this While our analyses are all focused on the raw cache previous work was all focused on extending the lifetime lifetime, this lifetime can be easily extended by tolerating of PCM-based main memory. Other work on non-volatile partial cell failures. There are two factors causing the caches [9] only extended wear-leveling techniques for different failure time of cells. The first one is the variation of main memory without considering the different operating write counts, which is addressed mainly in this work. The mechanisms of main memory and caches. As discussed second one is the inherent variation of the cell’s lifetime in Section 5.2, these techniques only reduce the inter-set due to process variations, which needs another type of variation when they are adopted in caches. The cache intra- techniques to solve (discussed in Section 10). For both set variation is a new problem and is not considered in factors, the system lifetime can be extended by tolerating previous work. a small number of cell failures. The second category is about error corrections. The It is much simpler to extend i2WAP and tolerate the conventional Error Correction Code (ECC) is the most failed cache lines comparing to tolerating main memory common technique in this category. Dynamically replicated failures [8, 17, 19, 20, 26]. 
We can force the failed cache lines to be tagged INVALID, so that no further data is written to them. In this case, the number of ways in the corresponding cache set is only reduced by 1, e.g., from 8-way associative to 7-way associative. The error-tolerant lifetime is at least as long as the raw lifetime and may be much longer. However, performance is degraded because the cache associativity is reduced. Figure 15 shows an analysis of a ReRAM-based cache hierarchy with 32KB L1 caches, 1MB L2 caches, and an 8MB L3 cache. It shows that if the system can tolerate the failure of 50% of the cache lines at all levels, the lifetime can be extended by 6% while the performance penalty is 15% (see footnote 6).

Footnote 6: The values of the performance degradation and the lifetime extension depend on the cache hierarchy and capacity, but the trends for different configurations are similar.

[Figure 15: The performance degradation and lifetime extension during gradual cache line failure on a non-volatile cache hierarchy.]

10 Related Work

There are various previous architectural proposals to extend the lifetime of non-volatile memories. This prior work can be classified into two basic types of techniques.

The first category focuses on evenly distributing unbalanced write frequencies across all memory lines. Zhou et al. [27] proposed a segment swapping policy for PCM main memory. Qureshi et al. proposed fine-grain wear leveling (FGWL) [16] and start-gap wear leveling [15] to shift cache lines within a page to achieve uniform wear-out of all lines in the page. Seong et al. [20] addressed potential attacks by applying security refresh. However, this previous work all focused on extending the lifetime of PCM-based main memory. Other work on non-volatile caches [9] only extended wear-leveling techniques for main memory without considering the different operating mechanisms of main memories and caches. As discussed in Section 5.2, these techniques only reduce the inter-set variation when they are adopted in caches. The cache intra-set variation is a new problem and is not considered in previous work.

The second category is about error correction. The conventional Error Correction Code (ECC) is the most common technique in this category. Dynamically replicated memory [8] reuses memory pages containing hard faults by dynamically forming pairs of pages that act as a single one. Error Correction Pointer (ECP) [17] corrects failed bits in a memory line by recording their positions and correct values. Seong et al. [19] propose SAFER, which efficiently partitions a faulty line into groups and corrects the error within each group. FREE-p [26] proposed an efficient means of implementing line sparing. These architectural techniques add different types of data redundancy to handle the access errors caused by limited write endurance.

It should be noted that all the techniques in the second category are orthogonal to our proposed policies, and the total system lifetime could be extended further by combining i2WAP with them.

11 Conclusion

Non-volatile memories are promising technologies for replacing SRAM or eDRAM in low-level on-chip caches due to their advantages in reducing leakage or refresh power as well as their higher density. However, non-volatile memory technologies usually have a limited write endurance. Moreover, the existing wear-leveling techniques for non-volatile main memories cannot be effectively applied to on-chip non-volatile caches, because caches have intra-set write variations in addition to inter-set variations. In this work, we propose i2WAP, a new endurance-aware cache management policy. i2WAP uses SwS to reduce the inter-set variation and PoLF to reduce the intra-set variation. i2WAP balances the write traffic to each cache line, greatly reducing both intra-set and inter-set variations and thus improving the lifetime of non-volatile on-chip L2 and L3 caches. The implementation overhead of i2WAP is small, needing only two global counters and two global registers, yet it can improve the lifetime of non-volatile caches by 75% on average and up to 224% over conventional cache management policies.

12 Acknowledgements

This work is supported in part by SRC grants, NSF 1218867, 1213052, 0903432 and by DoE under Award Number DE-SC0005026.

References

[1] A. Akel et al. Onyx: A prototype phase-change memory storage array. In FAST, 2011.
[2] C. Bienia et al. The PARSEC benchmark suite: characterization and architectural implications. In PACT, 2008.
[3] N. Binkert et al. gem5: A multiple-ISA full system simulator with detailed memory model. Computer Architecture News, 2011.
[4] A. M. Caulfield et al. Moneta: A high-performance storage array architecture for next-generation, non-volatile memories. In MICRO, 2010.
[5] J. Condit et al. Better I/O through byte-addressable, persistent memory. In SOSP, 2009.
[6] Y. Huai. Spin-transfer torque MRAM (STT-MRAM): Challenges and prospects. AAPPS Bulletin, 18(6), 2008.
[7] International Technology Roadmap for Semiconductors. Process Integration, Devices, and Structures 2010 Update. http://www.itrs.net/.
[8] E. Ipek et al. Dynamically replicated memory: Building reliable systems from nanoscale resistive memories. In ASPLOS, 2010.
[9] Y. Joo et al. Energy- and endurance-aware design of phase change memory caches. In DATE, 2010.
[10] C. H. Kim et al. A forward body-biased low-leakage SRAM cache: device and architecture considerations. In ISLPED, 2003.
[11] Y.-B. Kim et al. Bi-layered RRAM with unlimited endurance and extremely uniform switching. In VLSI, 2011.
[12] D. Kroft. Lockup-free instruction fetch/prefetch cache organization. In ISCA, 1998.
[13] B. C. Lee et al. Architecting phase change memory as a scalable DRAM alternative. In ISCA, 2009.
[14] W. S. Lin et al. Evidence and solution of over-RESET problem for HfOx based resistive memory with sub-ns switching speed and high endurance. In IEDM, 2010.
[15] M. K. Qureshi et al. Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling. In MICRO, 2009.
[16] M. K. Qureshi et al. Scalable high performance main memory system using phase-change memory technology. In ISCA, 2009.
[17] S. Schechter et al. Use ECP, not ECC, for hard failures in resistive memories. In ISCA, 2010.
[18] S. Segars. Low power design techniques for microprocessors. In ISSCC, 2001.
[19] N. H. Seong et al. SAFER: Stuck-at-fault error recovery for memories. In MICRO, 2010.
[20] N. H. Seong et al. Security refresh: prevent malicious wear-out and increase durability for phase-change memory with dynamically randomized address mapping. In ISCA, 2010.
[21] S.-S. Sheu et al. A 4Mb embedded SLC resistive-RAM macro with 7.2ns read-write random-access time and 160ns MLC-access capability. In ISSCC, 2011.
[22] Y.-H. Tseng et al. High density and ultra small cell size of contact ReRAM (CR-RAM) in 90nm CMOS logic technology and circuits. In IEDM, 2009.
[23] R. Waser and M. Aono. Nanoionics-based resistive switching memories. Nature Materials, 6(11), 2007.
[24] J. J. Yang et al. Memristive switching mechanism for metal/oxide/metal nanodevices. Nature Nanotechnology, 3(7), 2008.
[25] J. J. Yang et al. High switching endurance in TaOx memristive devices. Applied Physics Letters, 97(23), 2010.
[26] D. H. Yoon et al. FREE-p: Protecting non-volatile memory against both hard and soft errors. In HPCA, 2011.
[27] P. Zhou et al. A durable and energy efficient main memory using phase change memory technology. In ISCA, 2009.