Introduction to Open-Channel Solid State Drives and What's Next!
Matias Bjørling, Director, Solid-State System Software

September 25th, 2018

Storage Developer Conference 2018, Santa Clara, CA
©2018 Western Digital Corporation or its affiliates. All rights reserved.

Forward-Looking Statements
Safe Harbor | Disclaimers

This presentation contains forward-looking statements that involve risks and uncertainties, including, but not limited to, statements regarding our solid-state technologies, product development efforts, software development and potential contributions, growth opportunities, and demand and market trends. Forward-looking statements should not be read as a guarantee of future performance or results, and will not necessarily be accurate indications of the times at, or by, which such performance or results will be achieved, if at all. Forward-looking statements are subject to risks and uncertainties that could cause actual performance or results to differ materially from those expressed in or suggested by the forward-looking statements.

Key risks and uncertainties include volatility in global economic conditions, business conditions and growth in the storage ecosystem, impact of competitive products and pricing, market acceptance and cost of commodity materials and specialized product components, actions by competitors, unexpected advances in competing technologies, difficulties or delays in manufacturing, and other risks and uncertainties listed in the company's filings with the Securities and Exchange Commission (the "SEC") and available on the SEC's website at www.sec.gov, including our most recently filed periodic report, to which your attention is directed. We do not undertake any obligation to publicly update or revise any forward-looking statement, whether as a result of new information, future developments or otherwise, except as required by law.

Agenda

1 Motivation

2 Interface

3 Eco-system

4 What’s Next? Standardization

0% Writes - Read Latency
[Chart: 4K random read latency across I/O percentiles for a 4K random read / 4K random write workload with 0% writes.]

20% Writes - Read Latency
[Chart: 4K random read latency across I/O percentiles for the same 4K random read / 4K random write workload with 20% writes. Significant outliers: worst-case read latency reaches 4 ms, roughly 30x worse than the read-only case.]

NAND Chip Density Continues to Grow While Cost/GB Decreases
[Chart: 3D NAND layer count over time - 48 layers (2015), 64 layers (2017), 96 layers (2018) - across SLC, MLC, TLC, and QLC cell types. Overlaid labels: Workload #1-#4.]

Ubiquitous Workloads
Cloud efficiency requires a single SSD to serve many different workloads

[Diagram: databases, sensors, analytics, virtualization, and video workloads all served by a single SSD]

Solid State Drive Internals

• Host interface (NVMe): Read/Write
• NAND media: Read/Program/Erase
• Highly parallel architecture
  – Tens of dies
• NAND access latencies
  – Read (50-100us), Write (1-10ms), Erase (3-15ms)
• Translation layer
  – Logical to physical translation (see the L2P sketch below)
  – Wear-leveling
  – Garbage collection
  – Bad block management
  – Media error handling
  – Etc.
• Read/Write/Erase -> Read/Write

[Diagram: the SSD controller implements the logical-to-physical translation map, wear-leveling, garbage collection, bad block management, and media error handling on top of a media interface controller that drives many dies in parallel.]
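To make the translation-layer DRAM cost concrete, below is a minimal sketch in C of a page-granularity logical-to-physical map of the kind a conventional device-side FTL keeps; the structure and names are illustrative, not firmware code. With one 4-byte entry per 4 KiB logical page, a 4 TB drive needs roughly 4 GB for the map alone, which is the device-side DRAM that the chunk model later reduces.

/* Illustrative sketch (not real firmware): a page-granularity
 * logical-to-physical map. One entry per 4 KiB logical page is why
 * device-side DRAM grows linearly with capacity. */
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE   4096ULL
#define INVALID_PPA UINT32_MAX

struct l2p_map {
    uint32_t *entries;   /* physical page address per logical page */
    uint64_t  nr_pages;
};

static struct l2p_map *l2p_alloc(uint64_t capacity_bytes)
{
    struct l2p_map *map = malloc(sizeof(*map));
    if (!map)
        return NULL;

    map->nr_pages = capacity_bytes / PAGE_SIZE;
    map->entries = malloc(map->nr_pages * sizeof(uint32_t));
    if (!map->entries) {
        free(map);
        return NULL;
    }
    for (uint64_t i = 0; i < map->nr_pages; i++)
        map->entries[i] = INVALID_PPA;  /* unmapped until first write */
    return map;
}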

Single-User Workloads
Indirection and indirect writes cause outliers

Host: Log-on-Log
• User space: a log-structured database (e.g., RocksDB) performs its own space metadata management, address mapping, and garbage collection, issuing pread/pwrite through VFS
• Kernel: a log-structured file system repeats the same space metadata management, address mapping, and garbage collection above the block layer (Read/Write/Trim)
• Hardware: the SSD repeats it a third time (metadata management, address mapping, garbage collection) in its write buffer, pipeline, and NAND controller

Device: Indirect Writes
• The drive maps logical writes to physical locations on a best-effort basis
• The host is oblivious to data placement due to the indirection
• Data cannot be aligned logically, which increases write amplification and adds extra garbage collection

Open-Channel SSDs

• I/O Isolation
• Predictable Latency
• Data Placement & I/O Scheduling

Solid State Drive Internals

• Host responsibility
  – Logical to physical translation map
  – Garbage collection
  – Logical wear-leveling
    • Hints to the host to place hot/cold data
• Integration
  – Block device
    • Host-side FTL that does L2P, GC, and logical wear-leveling
    • Similar overhead to traditional SSDs
  – Applications
    • Databases and file systems

[Diagram: the host (NVMe device driver) now owns the logical-to-physical translation map, garbage collection, and logical wear-leveling; the drive keeps bad block management, media error handling, and the media interface controller (Read/Program/Erase) in front of the dies.]

Concepts in an Open-Channel SSD Interface

• Chunks
  – Sequential-write-only LBA ranges
  – Align writes to internal block sizes
• Hierarchical addressing
  – A sparse addressing scheme projected onto the NVMe™ LBA address space
• Host-assisted media refresh
  – Improve I/O predictability
• Host-assisted wear-leveling
  – Improve wear-leveling

Chunks #1
Enable orders-of-magnitude reduction of device-side DRAM

• A chunk is a range of LBAs where writes must be sequential (see the chunk-state sketch after the next slide)
  – Reduces DRAM for the L2P table by orders of magnitude
  – Enables hot/cold data separation
• Rewriting a chunk requires a reset
• A chunk is in one of four states (free/open/closed/offline)
• An open chunk has an associated write pointer
• Same device model as the ZAC/ZBC standards
• A similar device model is to be standardized in NVMe (I'll come back to this)

[Diagram: the LBA range from 0 to the max LBA divided into Chunk 0, Chunk 1, ..., Chunk X]

Chunks #2

• Drive capacity (the LBA range) is divided into chunks
• Chunk types
  – Conventional: random or sequential writes allowed
  – Sequential Write Required: the chunk must be written sequentially and must be reset entirely before being rewritten
• Write commands advance the write pointer

Reset commands rewind the write pointer
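The following C sketch ties the two chunk slides together: per-chunk state, a sequential-write check against the write pointer, and a reset that rewinds it. The names and struct layout are illustrative, not the OCSSD 2.0 on-the-wire format.

#include <stdbool.h>
#include <stdint.h>

enum chunk_state {
    CHUNK_FREE,     /* reset; write pointer at the start of the chunk */
    CHUNK_OPEN,     /* partially written; write pointer is valid */
    CHUNK_CLOSED,   /* fully written */
    CHUNK_OFFLINE,  /* worn out or failed; no longer usable */
};

struct chunk {
    enum chunk_state state;
    uint64_t slba;           /* first LBA of the chunk */
    uint64_t nlbas;          /* chunk size in LBAs */
    uint64_t write_pointer;  /* next LBA that may be written */
};

/* Sequential-write-required: writes must land exactly at the write
 * pointer; anything else would be rejected by the device. */
static bool chunk_write_ok(const struct chunk *chk, uint64_t lba, uint32_t nlb)
{
    if (chk->state != CHUNK_FREE && chk->state != CHUNK_OPEN)
        return false;
    if (lba != chk->write_pointer)
        return false;
    return lba + nlb <= chk->slba + chk->nlbas;
}

/* Writes advance the write pointer; the chunk closes when full. */
static void chunk_advance(struct chunk *chk, uint32_t nlb)
{
    chk->write_pointer += nlb;
    chk->state = (chk->write_pointer == chk->slba + chk->nlbas)
                     ? CHUNK_CLOSED : CHUNK_OPEN;
}

/* A reset rewinds the write pointer and returns the chunk to FREE. */
static void chunk_reset(struct chunk *chk)
{
    chk->state = CHUNK_FREE;
    chk->write_pointer = chk->slba;
}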

Hierarchical Addressing
Channels and dies are mapped to logical groups and parallel units

• Expose device parallelism through groups and parallel units (PUs)
• One die, or a group of dies, is exposed as a parallel unit to the host
• Parallel units are a logical representation (address-packing sketch below)

[Diagram: the physical SSD layout (channels and dies) is projected onto the logical LBA address space of an NVMe namespace as a hierarchy: Group 0 .. Group-1 (channels), each containing PU 0 .. PU-1 (dies), each containing Chunk 0 .. Chunk-1, each containing LBA 0 .. LBA-1.]
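Below is a minimal sketch of how such a hierarchical address can be packed into, and unpacked from, a single sparse NVMe LBA. The bit widths are made up for illustration; a real device reports its own widths through the OCSSD geometry.

#include <stdint.h>

#define SECTOR_BITS 12  /* logical blocks per chunk (assumed width) */
#define CHUNK_BITS  10  /* chunks per parallel unit (assumed width) */
#define PU_BITS      7  /* parallel units per group (assumed width) */
#define GROUP_BITS   4  /* groups per device (assumed width) */

struct ppa {
    uint32_t group, pu, chunk, sector;
};

/* Pack (group, PU, chunk, sector) into one sparse LBA. */
static uint64_t ppa_to_lba(struct ppa a)
{
    return ((uint64_t)a.group << (PU_BITS + CHUNK_BITS + SECTOR_BITS)) |
           ((uint64_t)a.pu    << (CHUNK_BITS + SECTOR_BITS)) |
           ((uint64_t)a.chunk << SECTOR_BITS) |
            (uint64_t)a.sector;
}

/* Unpack a sparse LBA back into its hierarchical components. */
static struct ppa lba_to_ppa(uint64_t lba)
{
    struct ppa a;
    a.sector = lba & ((1u << SECTOR_BITS) - 1);
    a.chunk  = (lba >> SECTOR_BITS) & ((1u << CHUNK_BITS) - 1);
    a.pu     = (lba >> (SECTOR_BITS + CHUNK_BITS)) & ((1u << PU_BITS) - 1);
    a.group  = (lba >> (SECTOR_BITS + CHUNK_BITS + PU_BITS)) &
               ((1u << GROUP_BITS) - 1);
    return a;
}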

Host-assisted Media Refresh
Enable the host to assist SSD data refresh

• An SSD refreshes its data periodically to maintain reliability, through a data scrubbing process
  – Internal reads and writes make the drive's I/O latencies unpredictable
  – Writes dominate the I/O outliers
• 2-step data refresh
  – Step 1: the device performs only the read (scrubbing) part and reports affected chunks via an NVMe AER (chunk notification entry)
  – Step 2: data movement, if necessary, is managed by the host (handler sketch below)
  – Increases the predictability of the drive; the host manages the refresh strategy
    • Should it refresh? Is there a copy elsewhere?

[Diagram: the OCSSD's NAND controller runs read-only scrubbing across its dies (Step 1); the host refreshes data only if necessary (Step 2).]
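Below is a sketch, with hypothetical names, of how a host FTL might act on such a chunk notification. The notification fields and the helper functions are illustrative stand-ins for whatever the host FTL actually provides; they are not the OCSSD 2.0 log-page format or a real driver API.

#include <stdbool.h>
#include <stdint.h>

struct chunk_notification {
    uint64_t chunk_slba;      /* first LBA of the affected chunk */
    uint32_t nlbas;           /* number of LBAs to refresh */
    bool     refresh_needed;  /* device recommends rewriting the data */
};

/* Hypothetical host-FTL helpers assumed to exist for this sketch. */
extern bool host_has_replica(uint64_t slba, uint32_t nlbas);
extern int  host_read_lbas(uint64_t slba, uint32_t nlbas, void *buf);
extern int  host_write_to_fresh_chunk(const void *buf, uint32_t nlbas,
                                      uint64_t *new_slba);
extern void host_remap(uint64_t old_slba, uint64_t new_slba, uint32_t nlbas);

static int handle_chunk_notification(const struct chunk_notification *n,
                                     void *bounce_buf)
{
    if (!n->refresh_needed)
        return 0;

    /* If another copy exists (e.g., a replica on another drive), the
     * host can skip the copy and simply stop using this chunk. */
    if (host_has_replica(n->chunk_slba, n->nlbas))
        return 0;

    /* Otherwise the host moves the data itself: read, rewrite, remap. */
    uint64_t new_slba;
    if (host_read_lbas(n->chunk_slba, n->nlbas, bounce_buf))
        return -1;
    if (host_write_to_fresh_chunk(bounce_buf, n->nlbas, &new_slba))
        return -1;
    host_remap(n->chunk_slba, new_slba, n->nlbas);
    return 0;
}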

Host-assisted Wear-Leveling
Enable the host to separate hot/cold data into chunks depending on wear

• An SSD typically does not know the temperature of newly written data
  – Placing hot and cold data together increases write amplification
  – Write amplification is often 4-5x for SSDs with no optimizations
• Chunk characteristics
  – Limited reset cycles (as NAND blocks have limited erase cycles)
  – Place cold data on chunks that are nearer end-of-life and use younger chunks for hot data
• Approach
  – Introduce a per-chunk relative wear-level indicator (WLI)
  – The host knows the workload and places data with respect to the WLI (placement sketch below)
  – Reduces garbage collection → increases lifetime and improves I/O latency and performance

[Diagram: host writes go to a sequentially written superblock; successive GC passes separate hot, warm, and cold superblocks. Example chunks: Chunk X at WLI 0, Chunk Y at WLI 33%, Chunk Z at WLI 90%.]
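A minimal sketch of WLI-driven placement on the host side, with illustrative names: cold data is steered to the most worn free chunk, hot data to the least worn one.

#include <stddef.h>
#include <stdint.h>

enum data_temp { DATA_HOT, DATA_WARM, DATA_COLD };

struct chunk_info {
    uint64_t slba;
    uint8_t  wli;   /* relative wear: 0 = least worn, 100 = near end-of-life */
    int      free;  /* nonzero if the chunk is in the FREE state */
};

/* Pick the free chunk whose wear best matches the data temperature. */
static struct chunk_info *pick_chunk(struct chunk_info *chunks, size_t n,
                                     enum data_temp temp)
{
    struct chunk_info *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!chunks[i].free)
            continue;
        if (!best ||
            (temp == DATA_COLD ? chunks[i].wli > best->wli
                               : chunks[i].wli < best->wli))
            best = &chunks[i];
    }
    return best;  /* NULL if no free chunk is available */
}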

Interface Summary
The concepts together provide:

I/O Isolation through the use of Groups & Parallel Units

Fine-grained data refresh managed by the host

Reduced write amplification by enabling the host to place hot/cold data efficiently

DRAM & Over-provisioning reduction through append-only Chunks

Direct-to-media to avoid expensive internal data movement

Specification available at http://lightnvm.io

Eco-system
Large eco-system through Zoned Block Devices and OCSSD

• Linux® kernel
  – NVMe device driver
    • Detection of OCSSDs
    • Support for the 1.2 and 2.0 specifications
    • Registers with the LightNVM subsystem
    • Registers as a zoned block device (patches available)
  – LightNVM subsystem
    • Core functionality
    • Target management
    • Target interface: enumerate, get geometry, I/O interface, etc.
  – pblk host-side FTL
    • Maps an OCSSD to a regular logical block device
• User space
  – libzbc, fio (zoned block device support), liblightnvm (example below)
  – SPDK

[Diagram: applications and fio in user space; file systems with SMR support or regular file systems over the pblk logical block device; the LightNVM subsystem, block layer, and NVMe driver in the kernel; OCSSD hardware below.]
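For the user-space path, below is a minimal sketch of opening an Open-Channel SSD with liblightnvm and printing its geometry. The calls (nvm_dev_open, nvm_dev_get_geo, nvm_geo_pr, nvm_dev_close) are liblightnvm entry points as assumed here; exact signatures and geometry fields differ between library versions, and the device path is only an example.

#include <stdio.h>
#include <liblightnvm.h>

int main(void)
{
    /* Example device path; substitute the OCSSD namespace on your system. */
    struct nvm_dev *dev = nvm_dev_open("/dev/nvme0n1");
    if (!dev) {
        fprintf(stderr, "nvm_dev_open failed\n");
        return 1;
    }

    /* The geometry describes groups, parallel units, chunks, and sector
     * sizes, i.e. the hierarchical address space described earlier. */
    const struct nvm_geo *geo = nvm_dev_get_geo(dev);
    nvm_geo_pr(geo);

    nvm_dev_close(dev);
    return 0;
}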

Open-Source Software Contributions

• Initial release of the subsystem with Linux kernel 4.4 (January 2016)
• User-space library (liblightnvm) support upstream in Linux kernel 4.11 (April 2017)
• pblk available in Linux kernel 4.12 (July 2017)
• Open-Channel SSD 2.0 specification released (January 2018), with support available from Linux kernel 4.17 (May 2018)
• SPDK support for OCSSD (June 2018)
• fio with zone support (August 2018)
• Upcoming
  – OCSSD as a zoned block device (patches available)
  – RAIL: XOR support for lower latency
  – 2.0a revision

Tools and Libraries

• LightNVM: The Linux Open-Channel SSD Subsystem (FAST '17): https://www.usenix.org/conference/fast17/technical-sessions/presentation/bjorling
• LightNVM: http://lightnvm.io
• LightNVM Linux kernel subsystem: https://github.com/OpenChannelSSD/linux
• liblightnvm: https://github.com/OpenChannelSSD/liblightnvm
• QEMU NVMe with Open-Channel SSD support: https://github.com/OpenChannelSSD/qemu-nvme

Western Digital and the Western Digital logo are registered trademarks or trademarks of Western Digital Corporation or its affiliates in the US and/or other countries. Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries. The NVMe word mark is a trademark of NVM Express, Inc. All other marks are the property of their respective owners.

Storage Developer Conference 2018, Santa Clara, CA
©2018 Western Digital Corporation or its affiliates. All rights reserved.