Expanding the World of Heterogenous Memory Hierarchies The Evolving Non-Volatile Memory Story
Bill Gervasi Principal Systems Architect
16 May 2019 2 Data Checkpointing Memory Processing Tiers Challenges Persistence Sharing Time Agenda Current Solutions Security Seeking Mixed the Ideal Distributed A New Mode Processing Standard Solutions 3
Data processing is great 4
Data processing is great
Until something goes wrong The Cost of Power Failure 5 6 7
DRAM Checkpointing Run degrades performance Checkpoint Storage Checkpointing Run burns power Checkpoint Storage
Run Checkpointing sucks Checkpoint Storage 8
DRAM Run But checkpointing avoids data loss from failure Checkpoint Storage Checkpoint
Run FAIL! RESTART Run 9
System failure is a key factor in server software design
Data persistence is essential
Storage access time impacts transaction granularity 10 The game we play to trade off performance, capacity, and cost 11
To reduce the penalties from checkpointing…
…move non-volatile storage closer to the CPU Traditional Server Architecture Review 12
Network
Memory I/O $ CPU Control
… … Memory Memory Memory Memory Memory Memory Memory Memory … … …
Faster, lower latency 13
The Search for The Holy Grail 14
When we no longer fear power failure…
DATA PERSISTENCE What if you could 15 replace DRAM with a non-volatile memory?
You’d call it Memory Class Storage 16
When was the last time you read about a new volatile memory?
NRAM™
MRAM PCM
3DXP ReRAM
The non-volatile memory revolution is under way 17 To NVRAM To DRAM
To core memory From vacuum tubes 18
THIS is why the term “Persistent Memory” is insufficient
The industry must distinguish between deterministic and non-deterministic persistent memory
Only “Memory Class Storage” is fully deterministic AND persistent 19
NRAM FeRAM 3DXpoint MRAM Flash ReRAM DRAM SRAM Not all “persistence” is created equal 20
“Write endurance” determines HOW persistent Wear leveling needed if writes are limited 21 Temperature sensitivity impacts long term retention
Weeks of Data Retention 22 READ WRITE WRITE READ WRITE
DATA DATA DATA DATA DATA
DRAM interface is deterministic Data latency is FIXED
READ WRITE WRITE READ WRITE DATA DATA DATA X X HOUSEKEEPING Any endurance limit breaks determinism 23 Memory Class Storage
Full DRAM Speed
No endurance limits
Fully deterministic 24 NVRAM is a Memory Class Storage 25 In the future?
Memory Class Storage
Memory Class Storage = NVRAM NVRAM
For now… 26 Storage Class Memory Is NOT a Memory Class Storage 27
Hard Disk Storage Memory Class Storage Class SSD Memory Storage Flash NVMe ≥ DRAM performance = DRAM endurance Phase Change ≥ DRAM capacity 3DXpoint Resistive RAM Wasteland Magnetic RAM 3D NOR
DDR DRAM DDR NVRAM 28
Non-Deterministic Non-Deterministic Non-Deterministic
Deterministic Deterministic Deterministic 29
DRAM
NVDIMM-N
Optane
NVRAM Memory Class Storage
NVDIMM-P 30 Itty bitty leaky capacitors lose charge DRAM On power fail, you lose
ACT ACT ACT ACT
RD RD RD RD
WR WR WR WR
PRE PRE PRE PRE REFRESH Refresh time consumes up to 15% of bandwidth 31
Run DRAM
FAIL! Energy Source 32
Flash Backup Voltage NVDIMM-N NVM Regulator DRAM Array Control
Use DRAM normally Isolation Buffers -N NVDIMM
On Power Fail, copy to Flash Power Fail
Power restored, copy to DRAM Voltage CPU Regulator Host System 33 Run
NVDIMM-N FAIL! RESTORE
Switch to Battery Copy Flash to Power DRAM
Copy DRAM to Flash Run 34
NVDIMM-N One power fail cycle pays for 1-2 MINUTES Copy Flash to DRAM a LOT of protection Copy DRAM to Flash 1-2 MINUTES Faster than Flash!!! 35 3DXpoint Array But vs DRAM? Meh
NVM Optane Decent capacity, though Control Reads are slow Writes are deathly slow
RD WR CPU Data Host System
Could be used as a very slow DRAM but more common as expansion Data 36 512GB = 512GB 3DXpoint Array
3DXpoint Array
App Direct NVM Control NVM Control Memory Mode 512GB + 64GB = 512GB
DRAM as Cache
CPU Optane Host System CPU Host System Small Energy Source Non-Volatile Memory 37 Array – Any Kind Voltage Regulator NVDIMM-P DRAM NVM Cache Control NVDIMM-P Protocol
Read A Read B Send Read C Send Send
RSP Data A RSP RSP Data C Data B
CPU New non-deterministic protocol Host System Not backward compatible with DDR
Requires NVDIMM-P aware CPU 38
Volatile Mode Battery Backup No Persistence ala NVDIMM-N
NVDIMM-P Persistence Options
Reduced Explicit FLUSH Energy, Command Cacheless NVRAM 39
DRAM speed Non-volatility Unlimited write endurance Wide temperature range Scalable beyond DRAM Flexible fabrication & application Low power Low cost 40
Drop in replacement for DRAM NVRAM MemoryDRAM Class Storage
Fully Deterministic
Permanently persistent
Always available Host System 41
NRAM™ PCM *
DDR5 * Future generation devices NVRAM
ReRAM * MRAM * 42 43 Comparing DRAM & NVRAM
No refresh is required
“Self refresh” can be power OFF
Some timing differences (but deterministic!)
Data persistence definitions
Greater per-die capacity 44
NRAM™ PCM
ReRAM ≠ MRAM
Precharge Persistence Timings requirement definition
DDR5 NVRAM Specification brings coherence 45 DRAM NVRAM
IDLE IDLE
“0 ns” REFRESH
REFRESH “350 ns” = NOP
Refresh command is not needed Decoded as NOP for compatibility 46 DRAM NVRAM
IDLE IDLE
SELF REFRESH SELF REFRESH
FREQUENCY REFRESH CHANGE FREQUENCY CHANGE Power burned “No” power burned DRAM NVRAM 47
IDLE READ
ACTIVATE IDLE WRITE READ
PRECHARGE WRITE Precharge command is not needed Decoded as NOP for compatibility 48
Persistence Definitions*
Intrinsic: Power Fail: Immediately On After NVRAM Extrinsic: WRITE RESET After FLUSH Command * Discussions on-going Data is persistent 49
Intrinsic WR WR WR Persistence
Extrinsic WR WR FLUSH WR WR FLUSH Persistence
Power Fail WR WR WR WR WR RESET Persistence
* Discussions on-going 50
DDR5 DRAM DDR5 NVRAM is limited enables up to to 32Gb per die 128Tb per die 51 DDR5 SDRAM
ACT RD WR ACT RD WR ACT RD WR
DDR5 NVRAM
REXT ACT RD WR ACT RD WR REXT ACT RD WR
Row Extension adds up to 12 more bits of addressing
Backward compatible with DDR5 – Acts like REXT = 0 until needed 52 ROW Bank buffer 0 DDR5
ROW Bank buffer 31 … SDRAM COLUMNS
“ROW” includes bank group & bank…
REXT ROW Bank buffer 0 DDR5
REXT ROW Bank buffer 31 … NVRAM
COLUMNS 53
REXT A REXT C ACT BANK W, ROW K READ BANK Y Row A + M ACT BANK X ,ROW L READ BANK Z Row B + N ACT BANK Y, ROW M ACT BANK W, ROW P
REXT B READ BANK W Row C + P
ACT BANK Z, ROW N WRITE BANK X Row A + L READ BANK W Row A + K ACT BANK X, ROW L
WRITE BANK X Row C + L READ BANK X Row A + L REXT A READ BANK Y Row A + M ACT BANK X, ROW R READ BANK Z Row B + N READ BANK X Row A+R
Row Extension Example 54
REXT A REXT C ACT BANK W, ROW K READ BANK Y Row A + M ACT BANK X ,ROW L READ BANK Z Row B + N ACT BANK Y, ROW M ACT BANK W, ROW P
REXT B READ BANK W Row C + P
ACT BANK Z, ROW N WRITE BANK X Row A + L READ BANK W Row A + K ACT BANK X, ROW L
WRITE BANK X Row C + L READ BANK X Row A + L REXT A READ BANK Y Row A + M ACT BANK X, ROW R READ BANK Z Row B + N READ BANK X Row A+R
Row Extension Replacement Example 55
NVRAM Memory Class Storage 56
DRAM NVRAM Run Checkpointing Run can be made Checkpoint Storage to persistent memory Run Run OR
Checkpoint Storage Checkpointing Run can be turned Run off completely Run
Checkpoint Storage 57 Phase 1 Phase 2
DRAM NVRAM NVRAM Run Run Run
Checkpoint Storage Checkpoint NVRAM No checkpoint
Run Run Run
Checkpoint Storage Checkpoint NVRAM No checkpoint
Run Run Run
Checkpoint Storage Checkpoint NVRAM 58
Keep in mind…
Checkpoints may include system failure
Knowing when a task may resume is complicated Power failure is not the only thing to fear 59 Remember Those Persistence Definitions Immediately On After NVRAM WRITE RESET After FLUSH Command Tasks may not be safe until system Tasks may be safe in stability confirmed nanoseconds Tasks may be safe in microseconds 60
Persistence Capacity Performance
System designers have a lot of options to balance Homogenous 61 Main Memory
DRAM
NVDIMM-N MCS
NVDIMM-P Optane Heterogenous 62 Main Memory
DRAM + Optane MCS + NVDIMM-P
MCS + Optane 512GB 63
When capacity NVDIMM-P Optane meets persistence
64GB
DRAM NVRAM 32GB Memory Class Storage
NVDIMM-N Homogenous 64 Main Memory Combinations
Data Safe Performance Capacity
DRAM No Best 1.0 X
NVDIMM-N Yes Best 0.5 X
Optane Yes Worst 10 X
NVDIMM-P Yes Mid 10 X
MCS Yes Best+ 1 X+ Heterogeneous 65 Main Memory Combinations
Data Safe Performance Capacity
DRAM + Optane No High 6 X
DRAM + NVDIMM-P No High 6 X
MCS + Optane Yes High 6 X
MCS + NVDIMM-P Yes High 6 X Homogenous Heterogeneous 66 Main Memory Combinations Main Memory Combinations
Software need not care Software encouraged to put critical functions All functions take the in faster memory same time Often mount slower memory as RAM drive 67
Software support via DAX assists in moving…
from mounted drives…
…to RAM drive…
…to direct access mode 68
The Power
of
Zero Power 69 Putting a Node to Sleep
Operating Self Refresh Mode Mode
Instant On means power must stay alive
Refresh operations burn significant power 70 Memory Class Storage can be turned off entirely
Operating Power 33 Mode Off 71 Memory Module (DIMM)
DDR5 memory modules Module Power have on-DIMM voltage Memory Media regulation (PMIC) PMIC Data Buffers DIMM power may be shut off independently of system power System Power
System Motherboard DIMM1 DIMM2 72
Module Module Power Power Memory Media Memory Media
PMIC PMIC Data Buffers Data Buffers
System Power
System power off; both DIMMs off Multiple power System power on & both DIMMs off management options System power on & DIMM1 on, DIMM2 off 73
Nantero NRAM™
My favorite NVRAM
Full presentation on Wednesday… ELECTRODE 74
ELECTRODE
Van der Waals energy barrier keeps CNTs apart or together
Data retention >300 years @ 300 ֯C, >12,000 years @ 105 ֯C
Stochastic array of hundreds nanotubes per each cell 75
No temperature sensitivity
5 ns balanced read/write performance 76
NRAM Data Retention = 12,000 Years
10,000 years ago 4,500 years ago
2,500 years ago X 77 Y NRAM LAYER 64 Kb tile X Z 256 K tiles = 16 Gb Drivers Receivers Array size tuned to the size of drivers & receivers Chip-level timing is a function of bit line flight times Replicate this “tile” as needed for device capacity Add I/O drivers to emulate any PHY needed
I/O PHY 78
Carbon Nanotube Arrays
DDR4, DDR5 72 bits NRAM Row Column Bank Decode Decode Decode SECDED ECC Engine
Address 64 bits
FIFO FIFO Chip ID Die Selector x4/x8
Data Strobe Strobe Data 79
15-20% DDR4/DDR5
DDR4/DDR5 NRAM Architectural improvements improve data throughput Base throughput 15% or greater at the same Elimination of refresh clock frequency Elimination of tFAW restrictions Elimination of bank group restrictions
Elimination of power states
Elimination of inter-die delays
Bandwidth: larger is better NVRAM 80 Memory Class Storage NRAM NRAM NRAM NRAM NRAM NRAM NRAM NRAM NRAM
NRAM NRAM NRAM NRAM NRAM NRAM NRAM NRAM NRAM
Plugs into an RDIMM slot
Appears to the CPU as DRAM
Memory controller may optionally be tuned for NVRAM 81 One less layer of marshmallows to deal with Persistence
Persistence Non- deterministic Fully deterministic 82 Would you rather… 83 Step on broken glass? Know Your
Enemy A LEGO? Or some jacks? 84
…about those energy stores…
Batteries Supercapacitors Tantalums (etc.) 85
Batteries Supercapacitors Tantalums (etc.)
High capacity Medium capacity Low capacity High energy density Low energy density Low energy density Low reliability Degrade over time …but stable 86
Energy needed for backup of DRAM cache
Flash or Storage Storage Class DRAM Controller Energy Memory
I/O More room 87 for storage
Eliminate need X for backup energy
Flash or Storage Storage Class Controller NVRAM Energy Memory X I/O 88 NVRAM Changes the Math DRAM cache limited by 1GB/TB energy available No DRAM? Cache size dictated by cost/performance 89
Switching gears again…
…to Systems Evolution 90
Pop quiz How many CPUs in a 1980s PC? 91
Modem
One?
Graphics Network Adapter Adapter Sound Blaster They were killed by 92 “Native Signal Processing” They were called “DSPs” Digital Signal Processors
Drivers
They put processing next to the data Analog front end devices 93
With NSP… $ $ $ W W So why do it? W Now We Are 94 Trending Back FPGA
CPU Memory
Storage Fabric Storage
Memory CPU
AI 95 Processor 96 Elements Processor Processor Elements Elements Bridge
I/O Low I/O Elements Latency Elements Fabric Bridge
Storage Storage Elements Elements Storage Elements
Bridge 97
Distributed resources
Application-specific computing
In-memory computing
Artificial intelligence and deep learning
Security 98
Graphics Accelerator Search Engine Human interface Standard CPU
HTML processing Artificial Intelligence Accelerator Human interface management Low Latency Network Adapter Fabric
Memory Array
Filesystem Aware Storage Example AI accelerator 99 Tbps links
I/O NNP Control
Exec HBM SRAM Unit SIMD architectures Matrix interconnections Exec HBM SRAM Unit Fast pipes still limit load/save time
HBM … Challenges: • Model checkpointing Exec HBM SRAM • Data loss on power fail Unit • Temperature sensitivity I/O 100
Back propagation algorithms complicate things
Data loss problems are amplified
Checkpointing highly time and bandwidth consuming 101
The more distributed memory gets, the harder to load and unload
MEM MEM MEM MEM
MEM MEM MEM MEM
MEM MEM MEM MEM 102
NVRAM TO THE RESCUE!
Replacing dynamic memory with persistent memory resolves the data loss issues Just leave the data in place 103 as long as you want
I/O NNP Control Replace DRAM with NVRAM
MCS Exec Replace eRAM with NVRAM HBM SRAM HBM Unit
MCS Exec MCS MCS MCS MCS HBM SRAM HBM Unit MCS MCS MCS MCS
MCS HBM HBM … MCS MCS MCS MCS
MCS Exec HBM SRAM HBM Unit 104
SRAM & The final frontier… Registers 105 Continuing to look for ways to bring Memory Class Storage down under 1ns Voltage adjustment Faster edge rates Better error check Shadow registers Getting smarter
It will happen 106
When we no longer fear power failure…
DATA PERSISTENCE
Full END TO END persistence 107
Are we getting near the day when we look back at volatile memory… …and LAUGH? 108
……b…bu…but…but… 109
Persistent data introduces challenges, too 110 Data is ALWAYS there!
Data security is a growing concern 111 Application opens data from previous application
Memory moved from one system to another
Spy devices on memory buses
So many potential breaches 112
Infection via hack Infection via spy devices 113
Password: X2.Hd44**3#jj0%
General trend is to encrypt data before transmission or storage 114
X2.Hd44**3#jj0%
Keep the bad guys out
X2.Hd44**3#jj0% Some are adding in-memory 115 Small Energy Source Non-Volatile Memory compute functions Array – Any Kind including encryption Voltage X2.Hd44**3#jj0% Regulator Works as long as the bus Smart DRAM NVM Cache is secure Control Encryption quality may be Password: limited by block transfer size
CPU Management of many keys Host System can get complicated quickly 116 ISO/IEC 11889 Saving Data is Need tiers 117 Power Fail a Pain of memory Sucks & storage Persistence Sharing is Essential Time Today’s Summary Solutions Persistence Help Complications But We Data Can Do Mix & DDR5 Better Distribution Match NVRAM Spec Challenges Memories in Progress 118
Thank you for your time
Bill Gervasi [email protected] 119
I’m here to learn too
What do you deal with?