Rick Coulson, Senior Fellow, Intel NVM Solutions Group
Outline
• The history of storage latency and where we stand today
• The promise of Storage Class Memory (SCM) and 3D XPoint™ Memory
• Extensive compute platform changes driven by the quest for lower storage latency with SCM / 3D XPoint™ Memory
  – As traditional storage
  – As persistent memory
• Innovation opportunities abound

1956: IBM RAMAC 350
5 MBytes, $57,000, $15,200/MByte, ~1.5 random IOPs*, 600 ms latency
* IOPs depend on the workload and span a range.
*Other names and brands may be claimed as the property of others.
1980: IBM 3350
300 MBytes, $60,000, $200/MByte, 30 random IOPs, 33 ms latency
1983: IBM 3380
2.52 GBytes, $82,000, $36/MByte, ~160 IOPS total, 25 ms latency
Two hard disk assemblies, each with two independent actuators, each accessing 630 MB, within one chassis
2007: 15K RPM HDD
About 200 random IOPs*, ~5 ms latency
The IOPS scaling problem was addressed in the enterprise by running many HDDs in parallel.
2016: 10K RPM HDD
1.8 TB, ~150 IOPs, 6.6 ms latency

2016: NVMe NAND SSD
2 TB, 500,000+ IOPs, ~60 µs latency
The Continuing Need For Lower Latency
Over 59 years:
• RAMAC 350 → 10K RPM HDD: ~100x access-time reduction (600 ms → ~6 ms access)
• RAMAC 350 → NAND SSD: ~10,000x access-time reduction (~60 µs access)
• RAMAC 305 (~100 Hz best-case "clock") → Core™ i7 (~4 GHz clock): ~40,000,000x clock-speed increase
Source: Wikipedia

Lower Storage Latency Requires Media and Platform Improvements
• The drive for lower latency must overcome both media bottlenecks and platform HW / SW bottlenecks
• Media: 3D XPoint™ memory (SCM)
• Outcomes: ultra-fast SSDs and persistent memory
Addressing Media Latency: Next-Gen NVM / SCM
Resistive RAM NVM options: a scalable resistive memory element family, built as a cross-point array in backend layers (~4λ² cell) with a selector device and memory element between wordlines.

Memory element and its defining switching characteristics:
• Phase Change: energy (heat) converts material between crystalline (conductive) and amorphous (resistive) phases
• Magnetic Tunnel Junction (MTJ): switching of a magnetic resistive layer by spin-polarized electrons
• Electrochemical Cells (ECM): formation / dissolution of a "nano-bridge" by electrochemistry
• Binary Oxide Filament Cells: reversible filament formation by oxidation-reduction
• Interfacial Switching: oxygen vacancy drift/diffusion induced barrier modulation
Scalable, with potential for near-DRAM access times

3D XPoint™ Technology
• Crosspoint structure: selectors allow dense packing and individual access to bits
• Breakthrough material advances: compatible switch and memory cell materials
• Scalable: memory layers can be stacked in a 3D manner
• High performance: cell and array architecture that can switch states 1000x faster than NAND

1000x faster than NAND, 1000x the endurance of NAND, 10x denser than DRAM
*Results have been estimated or simulated using internal analysis or architecture simulation or modeling, and are provided for informational purposes. Any differences in your system hardware, software, or configuration may affect your actual performance.

3D XPoint™ Technology Instantiation
Intel® Optane™ SSDs and DIMMs based on 3D XPoint™

3D XPoint™ Technology Video
Please excuse the marketing.
Demonstration of a 3D XPoint™ SSD prototype

Need to Address System Architecture To Go Lower
[Chart: latency in µs for NAND MLC NVMe SSD (4 kB read), 3D XPoint™ NVMe SSD (4 kB read), and DIMM memory (64 B read)]

Block Storage Platform Changes
Intel® Optane™ SSDs
Addressing Interface Efficiency With NVMe / PCIe
• SSD NAND technology offers a ~500x reduction in media latency over HDD
• NVMe™ eliminates 20 µs of controller latency
• 3D XPoint™ SSD delivers <10 µs latency, a ~7x reduction
• 3D XPoint™ persistent memory goes lower still
[Chart: latency (µs, log scale) broken into drive latency, controller latency (i.e. SAS HBA), and software latency for HDD +SAS/SATA, NAND SSD +SAS/SATA, NAND SSD +NVMe™, 3D XPoint™ SSD +NVMe™, and 3D XPoint™ persistent memory]
Source: Storage Technologies Group, Intel

NVMe Delivers Superior Latency
Platform HW/SW average latency excluding media, 4 kB transfers:
• NVMe approaches the theoretical maximum of 800K IOPS at 18 µs
[Chart: latency (µs) vs IOPS for SAS (LSI, 2/4/6 workers), SATA (AHCI, 2/4 workers), and PCIe/NVMe (1 drive, 1/2/4 CPUs) configurations]
Source: Storage Technologies Group, Intel

NVMe/PCIe Provides More Bandwidth
• SATA: 0.55 GB/s
• 4x PCIe Gen3 / NVMe: 3.2 GB/s
• 8x PCIe Gen3 / NVMe: 6.4 GB/s
PCIe/NVMe provides more than 10x the bandwidth of SATA, and even more with Gen 4.
Source: Storage Technologies Group, Intel

Storage SW Stack Optimizations
• Much of the storage stack was designed with HDD latencies in mind
• There was no point in optimizing it until now
• Example: paging algorithms with seek optimization and grouping
Synchronous Completion for Queue Depth 1?
From Yang et al., FAST '12 (10th USENIX Conference on File and Storage Technologies):
• Async (interrupt-driven) completion: ~9.0 µs total; the storage device takes 4.1 µs, and OS cost = Ta + Tb = 4.9 + 1.4 = 6.3 µs, including a context switch to another process (P2) and interrupt handling on the return to user
• Sync (polling) completion: the storage device takes 2.9 µs, and OS cost = 4.4 µs; the CPU polls in the kernel, avoiding the context switch and interrupt

Standards for Low Latency Replication
In most datacenter usage models, a storage write does not "count" until it is replicated. High replication overhead diminishes the performance differentiation of 3D XPoint™ technology. NVMe over Fabrics is a developing NVM Express specification for low-overhead replication.

Summary: Block Storage Platform Changes
• Move to PCIe-based storage
• Streamlined command set: NVM Express
• OS / SW stack optimizations
• Fast replication standards
(Intel® Optane™ SSDs)
Persistent Memory Oriented Platform Changes
DIMMs based on 3D XPoint™
Why Persistent Memory?
[Chart: latency in µs for NAND MLC NVMe SSD (4 kB read), 3D XPoint™ NVMe SSD (4 kB read), and 3D XPoint™ DIMM memory (64 B read)]

Open NVM Programming Model
• 50+ member companies
• SNIA Technical Working Group
• Initially defined the 4 programming modes required by developers
Spec 1.0 was developed, approved by SNIA voting members, and published.
The four modes:
• Kernel support for block NVM extensions
• Interfaces for legacy applications to access block NVM extensions
• Interfaces for a PM-aware file system accessing kernel PM support
• Interfaces for applications accessing a PM-aware file system
NVM Library: pmem.io (64-bit Linux initially)
Application
• Standard file API plus direct load/store through MMU mappings (user-space library)
• pmem-aware file system (kernel space)
• Open source: http://pmem.io
• Libraries: libpmem, libpmemobj (transactional), libpmemblk, libpmemlog, libvmem, libvmmalloc
• Targets Intel 3D XPoint™ DIMMs
Write I/O Replaced with Persist Points
Application
• Standard file API plus load/store through a user-space NVM library
• No page cache
• pmem-aware file system with MMU mappings (kernel space), backed by NVDIMMs
• Traditional APIs: msync(), FlushViewOfFile() → NVML API: pmem_persist()

Operating System Support for Persistent Memory: The Data Path
[Diagram: MOV data path from four cores with private L1/L2 caches, through a shared L3, to two memory controllers and four NVDIMMs]
New Instructions For Flushing Writes
[Diagram: the same data path, with CLFLUSH, CLFLUSHOPT, and CLWB acting on the caches and PCOMMIT acting at the memory controllers, ahead of the NVDIMMs]
Flushing Writes from Caches
Instruction and meaning:
• CLFLUSH addr: cache line flush; available for a long time
• CLFLUSHOPT addr: optimized cache line flush; new, allows concurrency
• CLWB addr: cache line write-back; leaves the value in cache for performance of the next access
Flushing Writes from Memory Controller
Instruction and meaning:
• PCOMMIT: persistent commit; flushes stores accepted by the memory subsystem
• Asynchronous DRAM Refresh (ADR): flushes outstanding writes on power failure; a platform-specific feature
Example Code
MOV X1, 10       ; X1, X2 are in pmem
MOV X2, 20
MOV R1, X1       ; stores to X1 and X2 are globally visible, but may not be persistent
CLFLUSHOPT X1
CLFLUSHOPT X2    ; X1 and X2 moved from caches to memory
SFENCE
PCOMMIT
SFENCE           ; ensures PCOMMIT has completed

Join the Discussion about Persistent Memory
• Learn about the Persistent Memory programming model: http://www.snia.org/forums/sssi/nvmp
• Join the pmem NVM Libraries open source project: http://pmem.io
• Read the documents and code supporting ACPI 6.0 and the Linux NFIT drivers:
  http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
  https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/log/?h=nd
  https://github.com/pmem/ndctl
  http://pmem.io/documents/
  https://github.com/01org/prd
• Intel Architecture Instruction Set Extensions Programming Reference: https://software.intel.com/en-us/intel-isa-extensions
• Intel 3D XPoint™ Memory: http://www.intel.com/content/www/us/en/architecture-and-technology/non-volatile-memory.html

Persistent Memory Summary
• New storage model for low latency
• New instructions to support persistence
• OS support
• Lots of innovation opportunity
(DIMMs based on 3D XPoint™)

Low Latency Ahead
• Ultra-fast SSD (NVMe): <10 µs
• Persistent memory (3D XPoint™ memory): <1 µs