Nonblocking Memory Refresh

Kate Nguyen, Kehan Lyu, Xianze Meng, Vilas Sridharan, Xun Jian

History of DRAM
[Figure: latency (ns) over time for three series: Refresh Latency, Bus Cycle Time, and Min. Read Latency. Timeline marks at 1968 (DRAM is patented), 2000, 2003, 2007, 2014, and 2018 (50th anniversary of the DRAM patent), spanning DDR, DDR2, DDR3, and DDR4. Data labels: 512, 550, 16, 13.5, 0.75, 0.5. A second timeline (2012, 2013, 2015, 2017) marks prior work on Skipping Refresh: ISCA '12, HPCA '13, HPCA '14, ISCA '15, ISCA '17, MICRO '17.]

Issues with Skipping Refresh
[Figure: tested DRAM chips from different manufacturers, plotted against memory cell refresh interval (ms).]
Source: Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai, and O. Mutlu, "Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors," in 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), pp. 361–372, June 2014.
• Skipping refresh reduces memory security.

Why DRAM Refresh Hurts Performance
[Figure: a DRAM cell (transistor, storage capacitor, bit line, word line), labeled Blocking Refresh, beside a six-transistor SRAM cell (T1 to T6, address line, bit lines), labeled Nonblocking Refresh.]

Our Proposal: Nonblocking Refresh
• Improve performance while retaining the same level of security as the conventional baseline.
• Transform DRAM refresh into the static/background refresh in SRAM at the system level.
• Refresh DRAM in the background without stalling read accesses to refreshing memory blocks.

How Nonblocking Refresh Works
• Conventional Refresh: pending read requests to the refreshing memory block are stalled.
• Nonblocking Refresh: the refreshing data of the refreshing memory block is calculated from redundant data.

Leveraging Existing Redundant Data for Free
• Each memory block in server memory holds program data plus redundant data (12.5% - 40.6%) for hardware failure protection.
[Figure: % of pages that remain fault-free, on average (0% to 100%), by year of operation (1 to 7). Avg 97%.]
• For server systems, Nonblocking Refresh can leverage existing underutilized redundant data without storage overheads.
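The idea of calculating refreshing data from redundant data can be illustrated with a minimal sketch. It assumes a single XOR parity chip, which is a simplification chosen for illustration: server memory typically uses stronger symbol-based codes, and the slides do not specify the code. The names xor_bytes, make_parity, and read_during_refresh are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chip contributions."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data_chips: List[bytes]) -> bytes:
    """Redundant data (assumed here to be simple XOR parity over the data chips)."""
    return reduce(xor_bytes, data_chips)


def read_during_refresh(data_chips: List[bytes], parity: bytes,
                        refreshing: Optional[int]) -> bytes:
    """Return the full block without touching the chip that is refreshing."""
    if refreshing is None:
        return b"".join(data_chips)
    # Only accessible chips are read; the refreshing chip's contribution is
    # rebuilt by XOR-ing the parity with every accessible contribution.
    accessible = [c for i, c in enumerate(data_chips) if i != refreshing]
    rebuilt = reduce(xor_bytes, accessible, parity)
    pieces = [rebuilt if i == refreshing else c
              for i, c in enumerate(data_chips)]
    return b"".join(pieces)


# Four data chips, each contributing two bytes to a memory block.
chips = [b"\x11\x22", b"\x33\x44", b"\x55\x66", b"\x77\x88"]
parity = make_parity(chips)

# Chip 2 is refreshing; whatever it would return is ignored.
unreadable = list(chips)
unreadable[2] = b"\x00\x00"
block = read_during_refresh(unreadable, parity, refreshing=2)
```

In this sketch the read completes with the correct block even though chip 2 is never consulted, which is the property that lets a read proceed while a chip refreshes.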
Primer on Server Memory Organization
[Figure: an example memory rank with Data Chip 1, Data Chip 2, Data Chip 3, Data Chip 4, Redundant Chip 1, and Redundant Chip 2, and the memory block fetched from the rank.]

Nonblocking Refresh for Server Memory
[Figure: the same example memory rank and fetched memory block, distinguishing data that is inaccessible due to refresh from accessible data; the inaccessible data is calculated.]

Challenge 1: Ensuring Same Amount of Refresh
[Figure, shown over four slides: chip ID (1 to 6) versus time for a memory rank with four data chips and two redundant chips, marking when each chip is refreshing (inaccessible) or not refreshing (accessible). Panels: conventional (blocking) refresh, then Nonblocking Refresh with the refresh interval marked.]

Challenge 2: Ensuring Memory Write Bandwidth
[Figure: Conventional Systems versus Nonblocking Refresh, each showing ranks on a shared memory bus, with a write queue and a writeback cache (36 KB/channel). Bandwidth labels on the ranks: 100%, 100%/N, and 0% for the refreshing rank.]
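The requirement in Challenge 1, that every chip receives the same amount of refresh even when chips no longer refresh together, can be sketched as a slot schedule. This is a minimal illustration under stated assumptions: the refresh interval is divided into equal slots, and max_concurrent (a hypothetical parameter) caps how many chips may refresh at once, standing in for how much data the redundant chips can rebuild. The slides do not give the actual scheduling policy.

```python
from typing import List, Set


def conventional_schedule(n_chips: int, n_slots: int) -> List[Set[int]]:
    """All chips refresh together in slot 0, so the whole rank is blocked then."""
    schedule: List[Set[int]] = [set() for _ in range(n_slots)]
    schedule[0] = set(range(n_chips))
    return schedule


def staggered_schedule(n_chips: int, n_slots: int,
                       max_concurrent: int = 1) -> List[Set[int]]:
    """Spread chip refreshes over the interval, at most max_concurrent per slot."""
    if n_chips > n_slots * max_concurrent:
        raise ValueError("not enough slots to refresh every chip once per interval")
    schedule: List[Set[int]] = [set() for _ in range(n_slots)]
    for chip in range(n_chips):
        schedule[chip // max_concurrent].add(chip)
    return schedule


def refreshes_per_chip(schedule: List[Set[int]], n_chips: int) -> List[int]:
    """Count how many times each chip refreshes within one interval."""
    return [sum(chip in slot for slot in schedule) for chip in range(n_chips)]


# Six chips per rank (four data, two redundant), six slots per refresh interval.
blocking = conventional_schedule(n_chips=6, n_slots=6)
nonblocking = staggered_schedule(n_chips=6, n_slots=6, max_concurrent=1)
```

Both schedules refresh each chip exactly once per interval; they differ only in how many chips are inaccessible at the same time.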
Challenge 3: Preserving Baseline Hardware Failure Protection
• Read a block from a refreshing rank.
• Use the block's existing redundant data both to calculate the inaccessible data stored in refreshing chips and to detect unknown hardware errors.
• Hardware error detected?
  • NO: the read completes.
  • YES: wait for refresh to complete, re-read the block from memory, and perform error correction.

Methodology
• Two memory systems:
  • Intel/AMD server memory systems
  • IBM server memory system
• Baselines:
  • Conventional Refresh: full compliance with manufacturer specification
  • Insecure Refresh: skips 75% of refresh operations
• Evaluated 7 multi-threaded and 7 multi-program workloads
• 16Gb and future 32Gb DRAM
• 4 memory channels with 4 ranks per channel

Performance Improvement
[Figure: performance improvement vs. Conventional Refresh (y-axis from -10% to 40%) for Intel/AMD Server Mem and IBM Server Mem, with 16Gb and 32Gb chips.]

Performance Improvement
[Figure: performance improvement vs. Insecure Refresh (y-axis from -10% to 10%) for Intel/AMD Server Mem and IBM Server Mem, with 16Gb and 32Gb chips.]

Power Consumption
[Figure: power vs. Conventional Refresh and vs. Insecure Refresh (y-axis from -5% to 9%) for Intel/AMD Server Mem and IBM Server Mem, with 16Gb and 32Gb chips.]

Performance of Systems with Faulty Chips
[Figure: performance of Nonblocking Refresh on faulty systems vs. on fault-free systems (y-axis from 80% to 100%) for Intel/AMD Server Mem and IBM Server Mem, with 16Gb and 32Gb chips. Series: 1 Faulty Rank/Channel, 2 Faulty Ranks/Channel, 3 Faulty Ranks/Channel, Average.]

Conclusion
• Since its invention 50 years ago, DRAM has always required expensive refresh operations that stall accesses to refreshing data.
• We propose Nonblocking Refresh to refresh data in DRAM without stalling read accesses to refreshing data.
• For server memory systems, Nonblocking Refresh improves average performance by 16.2% and 30.3% for 16Gb and 32Gb chips, respectively.
• Nonblocking Refresh preserves the conventional baseline's level of security by ensuring the same amount of refresh.

Questions?
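The read path on the Challenge 3 slide can be sketched as a small control flow. This is only an illustration of the decision structure under stated assumptions: the five callables passed in (read_fast, error_detected, wait_for_refresh, read_full, correct) are hypothetical stand-ins for memory controller operations and are not named in the slides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReadResult:
    data: bytes
    waited_for_refresh: bool


def nonblocking_read(read_fast: Callable[[], bytes],
                     error_detected: Callable[[bytes], bool],
                     wait_for_refresh: Callable[[], None],
                     read_full: Callable[[], bytes],
                     correct: Callable[[bytes], bytes]) -> ReadResult:
    """Serve a read to a refreshing rank, falling back if an error is detected."""
    # Fast path: inaccessible data is calculated from the redundant data,
    # and the same redundant data is used to check for hardware errors.
    data = read_fast()
    if not error_detected(data):
        return ReadResult(data, waited_for_refresh=False)
    # Slow path: wait until every chip is accessible, re-read the whole
    # block, and apply the baseline error correction.
    wait_for_refresh()
    raw = read_full()
    return ReadResult(correct(raw), waited_for_refresh=True)


events = []

# Fault-free case: the read completes without waiting.
clean = nonblocking_read(
    read_fast=lambda: b"good",
    error_detected=lambda d: False,
    wait_for_refresh=lambda: events.append("wait"),
    read_full=lambda: b"unused",
    correct=lambda d: d,
)

# Faulty case: detection fires, so the read waits, re-reads, and corrects.
faulty = nonblocking_read(
    read_fast=lambda: b"bad!",
    error_detected=lambda d: d != b"good",
    wait_for_refresh=lambda: events.append("wait"),
    read_full=lambda: b"gxod",
    correct=lambda d: b"good",
)
```

Only the faulty case pays the refresh wait, which matches the slide's claim that baseline hardware failure protection is preserved while fault-free reads avoid stalling.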
