And Intra-Set Write Variations
Total Page:16
File Type:pdf, Size:1020Kb
i2WAP: Improving Non-Volatile Cache Lifetime by Reducing Inter- and Intra-Set Write Variations 1 2 1,3 4 Jue Wang , Xiangyu Dong ,YuanXie , Norman P. Jouppi 1Pennsylvania State University, 2Qualcomm Technology, Inc., 3AMD Research, 4Hewlett-Packard Labs 1{jzw175,yuanxie}@cse.psu.edu, [email protected], [email protected], [email protected] Abstract standby power, better scalability, and non-volatility. For example, among many ReRAM prototype demonstrations Modern computers require large on-chip caches, but [11, 14, 21–25], a 4Mb ReRAM macro [21] can achieve a the scalability of traditional SRAM and eDRAM caches is cell size of 9.5F 2 (15X denser than SRAM) and a random constrained by leakage and cell density. Emerging non- read/write latency of 7.2ns (comparable to SRAM caches volatile memory (NVM) is a promising alternative to build with similar capacity). Although its write access energy is large on-chip caches. However, limited write endurance is usually on the order of 10pJ per bit (10X of SRAM write a common problem for non-volatile memory technologies. energy, 5X of DRAM), the actual energy saving comes from In addition, today’s cache management might result in its non-volatility property. Non-volatility can eliminate the unbalanced write traffic to cache blocks causing heavily- standby leakage energy, which can be as high as 80% of the written cache blocks to fail much earlier than others. total energy consumption for an SRAM L2 cache [10]. Unfortunately, existing wear-leveling techniques for NVM- Given the potential energy and cost saving opportunities based main memories cannot be simply applied to NVM- (via reducing cell size) from the adoption of non-volatile based on-chip caches because cache writes have intra-set technologies, replacing SRAM and eDRAM with them can variations as well as inter-set variations. To solve this be attractive. Consistent with this expectation, such a problem, we propose i2WAP, a new cache management technology shift is happening starting from lower memory policy that can reduce both inter- and intra-set write hierarchy levels such as storage and main memory. For variations. i2WAP has two features: (1) Swap-Shift, example, PCM-based storage [1,4,5] and main memory [8, an enhancement based on previous main memory wear- 13, 15–17, 19, 20, 26, 27] have already been explored. leveling to reduce cache inter-set write variations; (2) However, adoption of non-volatile memory at the on- Probabilistic Set Line Flush, a novel technique to reduce chip cache level has a write endurance issue. For example, 2 cache intra-set write variations. Implementing i WAP only PCM is not suitable for on-chip caches because it is only needs two global counters and two global registers. By expected to sustain 108 writes before experiencing frequent 2 adopting i WAP, we can improve the lifetime of on-chip stuck-at-1 or stuck-at-0 errors [8, 15, 17, 19, 20, 26]. For non-volatile caches by 75% on average and up to 224%. ReRAM, the write endurance bar is much improved but is 1011 still around [11]. For STTRAM, although a prediction of up to 1015 write cycles is often cited, the best endurance 4 × 1012 1 Introduction test result for STTRAM devices so far is less than cycles [6]. Although the absolute write endurance values for SRAM and embedded DRAM (eDRAM) are commonly ReRAM and STTRAM seem sufficiently high for use in on- used for on-chip cache designs in modern microproces- chip L2 or L3 caches, the actual problem is that the current sors. However, the scaling of SRAM and eDRAM is cache management policies are not write variation-aware. increasingly constrained by technology limitations such as These polices were originally designed for SRAM caches leakage power and cell density. Recently, some emerg- and result in significant non-uniformity in terms of writing ing non-volatile memory technologies, such as Phase- to cache blocks, which would cause heavily-written non- Change RAM (PCM or PCRAM), Spin-Torque Transfer volatile cache blocks to fail much earlier than most other RAM (STTRAM or MRAM),andResistive RAM (ReRAM), blocks. have been explored as alternative memory technologies. Many wear-leveling techniques have been proposed to Compared to SRAM and eDRAM, these non-volatile extend the lifetime of non-volatile main memories [15, 20, memory technologies have advantages of high density, low 27], but the difference between cache and main memory 978-1-4673-5587-2/13/$31.00 ©2013 IEEE 234 operational mechanisms makes the existing wear-leveling prototypes demonstrate the best write endurance ranging techniques for non-volatile main memories inadequate for from 1010 [14] to 1011 [11]. The best endurance test on non-volatile caches. This is because writes to caches have STTRAM devices so far is less than 4 × 1012 cycles [6]. intra-set variations in addition to inter-set variations while The write endurance of some non-volatile memories may writestomainmemoriesonlyhaveinter-set variations. seem sufficiently high for use in L2 or L3 caches, but the According to our analysis, intra-set variations can be actual problem is the write variation brought by state-of-the- comparable to inter-set variations for some workloads. art cache replacement policies, which would cause heavily- This presents a new challenge in designing wear-leveling written cache blocks to fail much earlier than their expected techniques for non-volatile caches. lifetime. Thus, eliminating cache write variation is one of To minimize both inter- and intra-set write variations, the most critical problems that must be addressed before we introduce i2WAP (inter/intra-set Write variation-Aware non-volatile memories can be used to build practical and cache Policy), a simple but effective wear-leveling policy reliable on-chip caches. for non-volatile caches. i2WAP features two schemes: 1) Swap-Shift is enhanced from the existing main memory 3 Inter-Set and Intra-Set Write Variations wear-leveling techniques and aims to reduce the cache inter-set write variation; 2) Probabilistic Set Line Flush Write variation is a significant concern in designing any is designed to alleviate the cache intra-set write variation, cache/memory subsystems with a limited write endurance. which is a severe problem for non-volatile caches and has Large write variation can greatly degrade the product not been addressed before. lifetime because only a small subset of memory cells that Our experimental results show i2WAP can effectively experience the worst-case write traffic can result in an entire reduce the total write variation by 16X on average, and the dead cache/memory subsystem even when the majority of overall lifetime improvement of non-volatile L2 caches is cells are far from wear-out. 75% (up to 224%) over conventional cache management While the write variation in non-volatile main memories policies. i2WAP has small performance overhead (0.25% has been widely studied [13, 15, 16, 20, 27], to our knowl- on average) and negligible hardware overhead requiring edge, the write variation in non-volatile caches has not. only two extra global counters and two global registers. Wear-leveling in caches brings extra challenges since there are write count variations inside every cache set (i.e. intra- set variations) as well as across different cache sets (i.e. 2 Background inter-set variations). In order to demonstrate how severe 2.1 Benefits of Non-Volatile Caches the problem is for non-volatile caches, we first do a quick experiment. It is beneficial to use non-volatile memory technologies as on-chip caches. First, compared to SRAM and eDRAM, 3.1 Definition non-volatile memory can provide denser caches due to its The objective of cache wear-leveling is to reduce write smaller cell. For example, ReRAM [21] was demonstrated variations and make write traffic uniform. To quantify to be 15 times denser but with a similar read speed the cache write variation, we first define the coefficient of compared to SRAM. Thus, using non-volatile memory inter-set variations (InterV) and the coefficient of intra-set can enable significantly larger caches with correspondingly variations (IntraV) as follows, reduced cache miss rate, improving performance over 2 SRAM-based caches. Second, non-volatile caches can N M reduce energy consumptions. Previous studies have shown wi,j /M − Waver 1 i=1 j=1 that caches may consume up to 50% of a microprocessor’s InterV = W N − 1 (1) energy [18], and leakage energy can be as much as 80% of aver the total cache energy consumption [10]. Using non-volatile 2 M M memories can eliminate leakage energy when they are in w − w /M N i,j i,j standby, hence reducing the total energy consumption. 1 j=1 j=1 IntraV = W · N M − 1 aver i=1 2.2 Limited Write Endurance (2) w Write endurance is defined as the number of times a where i,j is the write count of the cache line located at set i j W memory cell can be overwritten, and emerging non-volatile and way , aver is the average write count defined as: memories commonly have a limited write endurance. The N M wi,j ITRS [7] projects that the average PCM write endurance i=1 j=1 107 108 Waver = is in a range between and . Recent ReRAM NM (3) 235 100% 189% 221% 100% 115% 222% 90% L2 InterV 90% L3 InterV 80% L2 IntraV 80% L3 IntraV 70% 70% 60% 60% 50% 50% 40% 40% 30% 30% 20% 20% 10% 10% 0% 0% (a) (b) Figure 1. The coefficient of variation for inter-set and intra-set write count of L2 and L3 caches in a simulated 4-core system with 32KB I-L1, 32KB D-L1, 1MB L2, and 8MB L3 caches.