JANUARY 20, 2016, SAN JOSE, CA

PRESENTATIONRick TITLE Coulson GOES HERE Sr. Fellow

3D XPoint™ Technology Drives

System Architecture

The Journey to Low Latency Storage

The Latency Journey So Far New Media Enables the Next Step A little about 3D XPoint™ Technology System and SW Architecture Changes In Process Changes to Storage Stack to Minimize Latency Changes to Enable Persistent Memory

2 Starting Point: The First HDD

1957 IBM RAMAC 350 5 MBytes $57,000 $15200/Mbyte ~1.5 Random IOPs

*Other names and brands may be claimed as the property of others. Why the Drive for Low Latency?

HDD ~100x 10K RPM Access time reduction RAMAC 350 ~6ms access 600ms ~10,000x NAND SSD Access time reduction ~60us access

58 Years

Processor RAMAC 305 100 Hz best ~30,000,000x Core™ i7 Clock speed increase case “clock” ~3 Ghz clock

Source: Wikipedia *Other names and brands may be claimed as the property of others. Continuing the Storage Device Latency Journey

Persistent Memory

3D XPoint™ memory (SCM)

Ultra fast SSD Drive for Lower Latency

Media Bottlenecks

Platform HW / SW bottlenecks

5 Media Enabler: 3D XPoint™ Technology (Or any SCM)

Breakthrough Crosspoint Structure Selectors allow dense packing and Material Advances individual access to bits Compatible switch and materials

Scalable High Performance Memory layers can be Cell and array architecture that can stacked in a 3D manner switch states 1000x faster than NAND What is 3D Xpoint™ Technology

Video here 3D XPoint™ Technology Instantiation

Intel® based on 3D Optane™ SSD XPoint™

Demonstration of 3D Xpoint™ SSD Prototype

The Need to Address System Architecture

120

100

) 80 usecs 60

40 Latency ( Latency

20

0 NAND MLC NVMe SSD 3D Xpoint NVMe SSD DIMM Memory (4kB read) (4kB read) (64B read) Storage Enabler: NVMe Efficiency Exposes Low 3D XPoint™ Media Latencies

Latency (uS) 10,000

200 SSD NAND technology offers ~500X 175 reduction in media latency over HDD 150

125 NVMe™ eliminates 20 µs of 100 controller latency 75 50 ~7X 3D XPoint™ SSD delivers < 10 µs latency 25 3D XPoint™ Persistent Memory 0 HDD SSD SSD SSD PM +SAS/ NAND NAND 3D 3D SATA +SAS/ +NVMe™ XPoint™ Xpoint™ SATA +NVMe™

Drive Controller Software Latency Latency Latency (ie. SAS HBA) Source: Storage Technologies Group, Intel NVMe Delivers Superior Latency

100 Platform HW/SW Average Latency Excluding Media 4KB 90

80 AHCI maxes out at ~150K IOPS

70

60

50

40

microseconds (us) microseconds 30 PCIe NVMe approaches theoretical max of 800K IOPS 20 at 18us 10

0 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 IOPS 2 WV LSI 4 WV LSI 6 WV LSI 2 WV AHCI 4 WV AHCI 1 FD 1 CPU 1 FD 2 CPU 1 FD 4 CPU

Source: Storage Technologies Group, Intel NVMe/PCIe Provides More Bandwidth

Bandwidth (GB/sec) 6.4

PCIe/NVMe provides more than 10X the

3.2 Bandwidth of SATA. Even More with Gen 4 0.55

SATA 4X PCIEG3/NVME 8X PCIEG3/NVME

Source: Storage Technologies Group, Intel Enabler: NVMe Over Fabrics

In most Datacenter usage models, a storage write does not “count” until replicated High overhead diminishes the performance differentiation of 3D XPoint™ technology NVMe over Fabrics is a developing standard for low overhead replication Synchronous Completion for QD1?

Async (interrupt-driven) Context Switch 9.0 µs System call Return to user CPU Ta’ Tb=1.4 Tu = 2.7 µs Ta’’ user user kernel kernel user (P2) OS cost = Ta + Tb Storage = 4.9 + Device Device 4.1 µs 1.4 command Interrupt = 6.3 µs

Sync (polling) System call 4.4 µs Return to user CPU user kernel user polling

Storage Device 2.9 µs OS cost = 4.4 µs

Synchronous completion also costs less OS / CPU time Other Enablers

Storage Stack optimizations Reduced Paging Overhead HW RAID alternatives

16 Persistent Memory

30

25

) 20 usecs 15

10 Latency ( Latency

5

0 NAND MLC NVMe SSD 3D Xpoint NVMe SSD DIMM Memory (4kB read) (4kB read) (64B read) Open NVM Programming Model

50+ Member Companies

SNIA Technical Working Group Initially defined 4 programming modes required by developers

Spec 1.0 developed, approved by SNIA voting members and published Interfaces for PM-aware file interfaces for application Kernel support for Interfaces for legacy system accessing kernel PM accessing a PM-aware file block NVM applications to access block support system extensions NVM extensions

18 NVM Library: pmem.io 64-bit Linux Initially

Application

Standard Load/Store File API User Space Library

pmem-Aware MMU Mappings • Open Source Kernel • http://pmem.io Space • libpmem • libpmemobj • libpmemblk Transactional Intel DIMM • libpmemlog • libvmem 19 • libvmmalloc

19 New Instructions

MOV Core Core Core Core L1 L1 L1 L1 L1 L1 L1 L1 L2 L2 L2 L2 CLFLUSH, CLFLUSHOPT, CLWB L3

Memory Controller Memory Controller PCOMMIT

NVDIMM NVDIMM NVDIMM NVDIMM

20 Low Latency Ahead!

<1 usec

Persistent Memory

3D XPoint™ memory

NVMe SSD <10 usec Ultra fast SSD

21 Thank you

22