Nonblocking Memory Refresh

Kate Nguyen, Kehan Lyu, Xianze Meng, Vilas Sridharan, Xun Jian

History of DRAM

[Figure: DRAM timeline with marks at 1968, 2000, 2003, 2007, 2014, and 2018, labeled "DRAM is patented", DDR, DDR2, DDR3, DDR4, and "50th Anniversary of DRAM patent". Plotted series: refresh latency, cycle time, and minimum read latency, in ns. Data labels: 512, 550; 16, 13.5; 0.75, 0.5. Annotation years: 2012, 2013, 2015, 2017.]

Skipping Refresh (ISCA ’12, HPCA ’13, HPCA ’14, ISCA ’15, ISCA ’17, MICRO ’17)

Issues with Skipping Refresh

Tested DRAM chips from different manufacturers

[Figure: x-axis: memory cell refresh interval (ms)]

Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai, and O. Mutlu, “Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors,” in 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), pp. 361–372, June 2014.

Skipping refresh reduces memory security

Why DRAM Refresh Hurts Performance

[Figure: circuit diagrams of a DRAM cell and an SRAM cell (labels: address line, storage, bit line, word line, bit, transistors T1 to T6); panel labels: Blocking Refresh, Nonblocking Refresh]

Our Proposal: Nonblocking Refresh

• Improve performance while retaining the same level of security as the conventional baseline.

• At the system level, make DRAM refresh behave like the static, background refresh of SRAM.

• Refresh DRAM in the background without stalling read accesses to refreshing memory blocks.

How Nonblocking Refresh Works
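To make the contrast concrete, here is a minimal timing sketch of a read that arrives while its memory block is refreshing. The function name, its parameters, and all numbers are illustrative assumptions, not values from the talk.

```python
def read_completion_time(arrival, refresh_start, refresh_end,
                         access_latency, nonblocking,
                         reconstruct_latency=0):
    """Return the time at which a read finishes (all times in ns).

    Hypothetical model: the block is inaccessible during
    [refresh_start, refresh_end).
    """
    refreshing = refresh_start <= arrival < refresh_end
    if not refreshing:
        return arrival + access_latency
    if nonblocking:
        # Nonblocking Refresh: serve the read now and rebuild the refreshing
        # data from redundant data (modeled as a fixed extra latency).
        return arrival + access_latency + reconstruct_latency
    # Conventional refresh: the read stalls until the refresh completes.
    return refresh_end + access_latency


# Illustrative numbers only: a read arrives 100 ns into a 350 ns refresh.
blocking = read_completion_time(100, 0, 350, 14, nonblocking=False)
nonblock = read_completion_time(100, 0, 350, 14, nonblocking=True,
                                reconstruct_latency=1)
print(blocking, nonblock)  # prints "364 115"
```

In this toy model the conventional read pays for the rest of the refresh, while the nonblocking read pays only the assumed reconstruction cost.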

[Figure: Conventional Refresh vs. Nonblocking Refresh for a refreshing memory block. Conventional Refresh: pending read requests to the block are stalled. Nonblocking Refresh: the refreshing data is calculated from redundant data.]

Leveraging Existing Redundant Data for Free

Each memory block in server memory holds program data plus redundant data (12.5% - 40.6%) for hardware failure protection.

[Figure: % of pages that remain fault-free, on average (x-axis, 0% to 100%) by year of operation (y-axis, 1 to 7); average: 97%]

For server systems, Nonblocking Refresh can leverage existing underutilized redundant data without storage overheads.

Primer on Server Memory Organization

[Figure: example memory rank with four data chips (Data chip 1 to Data chip 4) and two redundant chips (Redundant Chip 1, Redundant Chip 2), and the memory block fetched from the rank]

Nonblocking Refresh for Server Memory
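As a rough illustration of calculating a refreshing chip's data from the rest of the block, the sketch below uses plain XOR parity over four data chips. This is a simplifying assumption: the slides do not state which code the redundant chips hold, and real server memory generally uses stronger error-correcting codes. All names here are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length slices."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data_slices: List[bytes]) -> bytes:
    """Redundant slice: XOR of all data slices (stand-in for real ECC)."""
    return reduce(xor_bytes, data_slices)


def read_block(data_slices: List[bytes], parity: bytes,
               refreshing_chip: Optional[int] = None) -> List[bytes]:
    """Return every data slice without reading the refreshing chip."""
    result = []
    for chip_id in range(len(data_slices)):
        if chip_id == refreshing_chip:
            # Rebuild the inaccessible slice from the accessible slices
            # plus the parity slice.
            others = [s for i, s in enumerate(data_slices)
                      if i != refreshing_chip]
            result.append(reduce(xor_bytes, others, parity))
        else:
            result.append(data_slices[chip_id])
    return result


slices = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb", b"\x0f\xf0"]
parity = make_parity(slices)
rebuilt = read_block(slices, parity, refreshing_chip=2)
print(rebuilt == slices)  # prints "True"
```

With a single XOR parity slice only one chip can be rebuilt at a time; tolerating more refreshing chips at once would need more redundant data.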

[Figure: the example memory rank (Data chip 1 to Data chip 4, Redundant Chip 1, Redundant Chip 2) during refresh. Legend: inaccessible data due to refresh, accessible data. The inaccessible part of the fetched memory block is calculated from the accessible part.]

Challenge 1: Ensuring Same Amount of Refresh

[Figure: refresh timeline of the chips in a refreshing memory rank (Data chip 1 to Data chip 4, Redundant Chip 1, Redundant Chip 2). y-axis: chip ID (1 to 6); x-axis: time. Legend: refreshing (inaccessible), not refreshing (accessible). Panels: conventional (blocking) refresh; Nonblocking Refresh, with the refresh interval marked.]
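One way to picture "the same amount of refresh" is a staggered schedule in which chips take turns refreshing while each chip still performs as many refresh operations as it would under the conventional baseline. The slides do not spell out the actual schedule, so the round-robin policy, the function name, and the parameters below are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple


def staggered_schedule(num_chips: int, refreshes_per_chip: int,
                       max_concurrent: int) -> List[Tuple[int, ...]]:
    """Return time slots; each slot lists the chips refreshing in it.

    Hypothetical policy: chips refresh round-robin, at most
    `max_concurrent` at a time.
    """
    if not 1 <= max_concurrent <= num_chips:
        raise ValueError("max_concurrent must be between 1 and num_chips")
    # Every chip owes the same number of refresh operations as in the
    # conventional baseline; only the timing is staggered.
    pending = [chip for _ in range(refreshes_per_chip)
               for chip in range(num_chips)]
    return [tuple(pending[i:i + max_concurrent])
            for i in range(0, len(pending), max_concurrent)]


# Six chips (as in the example rank), two refreshing at a time.
schedule = staggered_schedule(num_chips=6, refreshes_per_chip=3,
                              max_concurrent=2)
counts = Counter(chip for slot in schedule for chip in slot)
print(schedule[:3])  # prints "[(0, 1), (2, 3), (4, 5)]"
```

Choosing `max_concurrent` no larger than what the redundant data can rebuild is what would keep reads serviceable in such a scheme.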

Challenge 2: Ensuring Memory Write Bandwidth

[Figure: write path in Conventional Systems vs. Nonblocking Refresh. Both panels: processor, write queue, shared memory bus, ranks 1 to N. The Nonblocking Refresh panel also shows a writeback cache (36 KB/channel) and a refreshing rank. Write-bandwidth labels: 100%, 100%/N, 0%.]

Challenge 3: Preserving Baseline Hardware Failure Protection
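The read flow on this slide can be sketched as a small control-flow function. The five callables are hypothetical stand-ins for memory-controller operations; nothing here is an actual controller interface.

```python
def read_block_from_refreshing_rank(read_accessible, decode,
                                    wait_for_refresh, read_full,
                                    correct_errors):
    """Sketch of the slide's flow; every argument is a stand-in callable."""
    # Read the block without the data held in refreshing chips.
    partial = read_accessible()
    # Use the redundant data both to rebuild the inaccessible data and to
    # detect unknown hardware errors.
    data, error_detected = decode(partial)
    if not error_detected:
        return data  # read completes
    # Error detected: fall back to the baseline protection path.
    wait_for_refresh()
    return correct_errors(read_full())


# Tiny demo with stand-in callables (no real hardware involved).
log = []
result = read_block_from_refreshing_rank(
    read_accessible=lambda: b"partial",
    decode=lambda p: (None, True),  # pretend an error was detected
    wait_for_refresh=lambda: log.append("wait"),
    read_full=lambda: b"full",
    correct_errors=lambda blk: blk.upper(),
)
print(result, log)
```

The fallback path is what keeps error correction working on the whole block, at the cost of waiting for the refresh in the rare error case.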

Read a block from a refreshing rank.

Use the block’s existing redundant data:
• to calculate inaccessible data stored in refreshing chips
• to detect unknown hardware errors

Hardware error detected?
• NO: Read completes.
• YES: Wait for refresh to complete, re-read block from memory, and perform error correction.

Methodology

• Two Memory Systems:
    • Intel/AMD Server Memory Systems
    • IBM Server Memory System
• Baselines:
    • Conventional Refresh: full compliance with manufacturer specification
    • Insecure Refresh: skips 75% of refresh operations
• Evaluated 7 multi-threaded and 7 multi-program workloads
• 16Gb and future 32Gb DRAM
• 4 memory channels with 4 ranks per channel

Performance Improvement

[Figure: performance improvement vs. Conventional Refresh for Intel/AMD server memory and IBM server memory with 16Gb and 32Gb chips; y-axis from -10% to 40%]

Performance Improvement

[Figure: performance improvement vs. Insecure Refresh for Intel/AMD server memory and IBM server memory with 16Gb and 32Gb chips; y-axis from -10% to 10%]

Power Consumption

[Figure: power vs. Conventional Refresh and vs. Insecure Refresh for Intel/AMD server memory and IBM server memory with 16Gb and 32Gb chips; y-axis from -5% to 9%]

Performance of Systems with Faulty Chips

[Figure: performance of Nonblocking Refresh on faulty systems vs. on fault-free systems for Intel/AMD server memory and IBM server memory with 16Gb and 32Gb chips; series: 1 faulty rank/channel, 2 faulty ranks/channel, 3 faulty ranks/channel, average; y-axis from 80% to 100%]

Conclusion

• Since its invention 50 years ago, DRAM has always required expensive refresh operations that stall accesses to refreshing data.

• We propose Nonblocking Refresh to refresh data in DRAM without stalling read accesses to refreshing data.

• For server memory systems, Nonblocking Refresh improves average performance by 16.2% and 30.3% for 16Gb and 32Gb chips, respectively.

• Nonblocking Refresh preserves the conventional baseline’s level of security by ensuring the same amount of refresh.

Questions?