SPDK FTL & Marvell OCSSD for Noisy Neighbor Problem

David Recker, Marketing VP

Circuit Blvd., Inc. April 2019

4/17/2019 © 2019 Circuit Blvd., Inc.

Who We Are

Industry: Enterprise/Cloud Database and Storage

Year Founded: 2017 (Sunnyvale, CA, U.S.A.)

Mission: We develop next-gen database/storage systems, leveraging our expertise in memory semiconductors, solid-state storage systems, and operating systems.

Open Source Contributions:
• Linux LightNVM, OCSSD 2.0 specification, OpenSSD FPGA platform
• SPDK (since SPDK v17.10)
• RocksDB

OCSSD with SPDK FTL

• SPDK FTL on Marvell’s OCSSD Platform
  – We have been evaluating SPDK FTL on Marvell’s SSD SoC platform since Jan ’19
  – SPDK FTL (Flash Translation Layer): the Flash Translation Layer library provides block device access on top of non-block SSDs implementing the Open Channel interface. It handles the logical-to-physical address mapping, responds to asynchronous media management events, and manages the defragmentation process*
  – We measured various performance metrics of the initial prototype and show how SPDK-driven OCSSDs can solve the noisy neighbor problem in multi-tenant environments
  – We share experimental data based on our current implementation (both SPDK FTL and Marvell’s controller are continuously being improved)

• Demo: SPDK-driven OCSSD comparison (isolation vs. non-isolation)
  – Demo table outside; please feel free to drop by with further questions

* SPDK FTL definition: https://spdk.io/doc/ftl.html
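To make the FTL's job more concrete, here is a toy sketch of log-structured logical-to-physical (L2P) mapping with greedy selection of a defragmentation victim. It is illustrative only: `ToyFtl` and its methods are hypothetical names, not SPDK FTL's API, and the sketch assumes one open chunk per PU, filled sequentially with PUs used round-robin, while ignoring ws_opt batching, padding, and media management events.

```python
"""Toy log-structured L2P map illustrating what an FTL does.

Hypothetical sketch: ToyFtl and its methods are illustrative names, not SPDK
FTL's API. Each PU has a single open chunk filled sequentially, and PUs are
used round-robin; ws_opt batching, padding, and media events are ignored.
"""
from typing import Dict, List, Optional, Tuple

PhysAddr = Tuple[int, int, int]  # (pu, chk, lbk), a hypothetical physical address


class ToyFtl:
    def __init__(self, num_pu: int, lbks_per_chunk: int) -> None:
        self.num_pu = num_pu
        self.lbks_per_chunk = lbks_per_chunk
        self.l2p: Dict[int, PhysAddr] = {}           # logical block -> physical location
        self.valid: Dict[Tuple[int, int], int] = {}  # (pu, chk) -> count of valid lbks
        self.open_chk: List[int] = [0] * num_pu      # currently open chunk per PU
        self.wptr: List[int] = [0] * num_pu          # write pointer inside that chunk
        self.next_pu = 0                             # round-robin cursor

    def write(self, lba: int) -> PhysAddr:
        # Out-of-place update: the old copy (if any) becomes invalid.
        old = self.l2p.get(lba)
        if old is not None:
            self.valid[(old[0], old[1])] -= 1
        pu = self.next_pu
        self.next_pu = (pu + 1) % self.num_pu
        if self.wptr[pu] == self.lbks_per_chunk:
            # Open chunk is full; simply advance (a real FTL picks a free chunk).
            self.open_chk[pu] += 1
            self.wptr[pu] = 0
        addr = (pu, self.open_chk[pu], self.wptr[pu])
        self.wptr[pu] += 1
        self.l2p[lba] = addr
        key = (pu, addr[1])
        self.valid[key] = self.valid.get(key, 0) + 1
        return addr

    def read(self, lba: int) -> Optional[PhysAddr]:
        # Unmapped LBAs return None.
        return self.l2p.get(lba)

    def defrag_candidate(self) -> Optional[Tuple[int, int]]:
        # Greedy victim: the non-open chunk with the fewest valid lbks is the
        # cheapest to relocate before the chunk can be reset and reused.
        closed = [(v, k) for k, v in self.valid.items()
                  if k[1] != self.open_chk[k[0]]]
        return min(closed)[1] if closed else None


ftl = ToyFtl(num_pu=2, lbks_per_chunk=2)
for lba in (0, 1, 0, 2, 0):  # LBA 0 is overwritten twice
    ftl.write(lba)
print(ftl.read(0), ftl.defrag_candidate())  # (0, 1, 0) (0, 0)
```

Because every overwrite lands at a new physical location, chunk 0 of PU 0 ends up holding no valid data, so it is the obvious defragmentation (and reset) candidate.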

Hardware Setup

• SuperMicro X11DPG
  – 2 * Xeon Scalable Gold 6126, 2.6 GHz (12 cores each), hyperthreading disabled
  – 8 * 32 GB DIMM, 2666 MT/s
• 2 * OCSSD 2.0 (OCSSD1, OCSSD2), each in a PCIe Gen3 x4 slot, one per CPU package (CPU1, CPU2)
  – Marvell 88SS1098 controller
  – 3D TLC NAND; write unit: 96KiB (one-shot program); read unit: 32KiB
  – nvme id-ns: LBADS=12 (4KiB), MS=0
  – ocssd geometry: 8 grp (3), 8 pu (3), 1478 chk (11), 6144 lbk (13); the value in () is the bit length in the LBAF
  – ws_opt=24 (96KiB)
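The geometry above determines how the 30 address bits are split. The sketch below assumes the OCSSD 2.0 ordering, with lbk in the least significant bits followed by chk, pu, and grp. The helper names `encode_lba`/`decode_lba` are hypothetical. It also checks the unit arithmetic from these slides (ws_opt=24 * 4KiB = 96KiB, 8 * 4KiB = 32KiB read unit) and the raw addressable capacity before any FTL overprovisioning.

```python
# Field widths (bits) and counts from the "ocssd geometry" line above.
GRP_BITS, PU_BITS, CHK_BITS, LBK_BITS = 3, 3, 11, 13
NUM_GRP, NUM_PU, NUM_CHK, NUM_LBK = 8, 8, 1478, 6144
LBK_SIZE = 1 << 12  # LBADS=12 -> 4KiB logical blocks

# Assumed ordering (MSB -> LSB): grp | pu | chk | lbk
CHK_SHIFT = LBK_BITS
PU_SHIFT = CHK_SHIFT + CHK_BITS
GRP_SHIFT = PU_SHIFT + PU_BITS


def _mask(bits: int) -> int:
    return (1 << bits) - 1


def encode_lba(grp: int, pu: int, chk: int, lbk: int) -> int:
    """Pack (grp, pu, chk, lbk) into an OCSSD LBA (hypothetical helper)."""
    # Validate against the reported geometry, not just the bit widths:
    # chk 1478..2047 and lbk 6144..8191 are unused holes in the LBA space.
    if not (0 <= grp < NUM_GRP and 0 <= pu < NUM_PU
            and 0 <= chk < NUM_CHK and 0 <= lbk < NUM_LBK):
        raise ValueError("address outside device geometry")
    return (grp << GRP_SHIFT) | (pu << PU_SHIFT) | (chk << CHK_SHIFT) | lbk


def decode_lba(lba: int) -> tuple:
    """Split an OCSSD LBA back into (grp, pu, chk, lbk)."""
    return ((lba >> GRP_SHIFT) & _mask(GRP_BITS),
            (lba >> PU_SHIFT) & _mask(PU_BITS),
            (lba >> CHK_SHIFT) & _mask(CHK_BITS),
            lba & _mask(LBK_BITS))


# Unit arithmetic from the slides.
assert NUM_LBK == 64 * 4 * 3 * 8            # layer * wordline * page * lbk
WS_OPT_BYTES = 24 * LBK_SIZE                # ws_opt=24 -> 96KiB write unit
READ_UNIT_BYTES = 8 * LBK_SIZE              # 32KiB TLC read unit
RAW_CAPACITY_BYTES = NUM_GRP * NUM_PU * NUM_CHK * NUM_LBK * LBK_SIZE

print(hex(encode_lba(7, 1, 1, 5)))          # 0x39002005
print(RAW_CAPACITY_BYTES // 2**30, "GiB")   # 2217 GiB
```

Since 1478 and 6144 are not powers of two, the LBA space is sparse. A host FTL therefore has to respect the reported geometry rather than iterate over every bit pattern.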


OCSSD Geometry

• PCIe OCSSD LBA (64 bits), field bit lengths: grp 3, pu 3, chk 11, lbk 13 (ex: 7 | 1 | 1 | 5)
• 3D TLC NAND: 8 groups (grp 0-7) * 8 PUs; PUs are numbered 0-63 across groups (grp 0: PUs 0-7, grp 1: PUs 8-15, ..., grp 7: PUs 56-63)
• Each PU holds chunks 0-1477; each chunk (2 plane blocks) holds lbk 0-6143, 4KiB each
  – 64 (layer) * 4 (wordline) * 3 (page) * 8 (lbk) = 6144
• ws_opt=24 lbks; rs_opt=8 lbks (*)
• NAND op times: tR < tPROG < tBERS
• PU ranges for pblk & SPDK FTL: (1) 0-15, 16-31, 32-47, 48-63 (isolated); (2) 0-63 (not isolated)

*: rs_opt (optimal read size) is not defined in the OCSSD 2.0 spec

Software Setup

• Linux 4.17 for pblk
  – with Marvell’s patches applied
• Linux 5.0 for SPDK
• fio 3.13
  – isolcpus=6-11,18-23 for fio threads
  – cpus_allowed=6-11 or 18-23
• SPDK
  – master: 7b0579d (4/9/2019)
• Additional changes to SPDK master
  – Marvell-specific patch (CircuitBlvd’s)
    » num_chk=1478, as posted on an SPDK GitHub issue
    » OCSSD identification quirk & edlp=0
    » vector reset (0x90) changed to DSM deallocate (0x09)
    » erase should be done in synchronous mode
    » vendor-specific command to build chunk info
  – Optimal read size (rsopt) patch
    » will be posted to SPDK GerritHub
  – Cherry-picks to avoid chk wptr errors
    » https://review.gerrithub.io/c/spdk/spdk/+/449068
    » https://review.gerrithub.io/c/spdk/spdk/+/450174
    » https://review.gerrithub.io/c/spdk/spdk/+/449239
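The "vector reset (0x90) to DSM deallocate (0x09)" change means chunk resets are issued as NVMe Dataset Management commands carrying the Deallocate attribute. Below is a minimal sketch of the data buffer and command dwords such a request uses. It assumes one 16-byte range entry per chunk, each covering exactly one chunk (6144 lbks starting at the chunk's first LBA). The helper names are hypothetical, and the actual submission path in the patched SPDK is not shown.

```python
import struct
from typing import List, Tuple

DSM_OPCODE = 0x09              # NVMe Dataset Management
DSM_ATTR_DEALLOCATE = 1 << 2   # CDW11 bit 2 (AD, Attribute - Deallocate)
LBKS_PER_CHUNK = 6144          # from the ocssd geometry on the hardware slide
MAX_RANGES = 256               # NR is an 8-bit, 0-based field


def dsm_range(slba: int, nlb: int, context_attrs: int = 0) -> bytes:
    """One 16-byte DSM range: context attributes, length in lbks, starting LBA (little-endian)."""
    return struct.pack("<IIQ", context_attrs, nlb, slba)


def chunk_reset_payload(chunk_slbas: List[int]) -> bytes:
    """Hypothetical helper: one deallocate range per chunk to reset."""
    if not 1 <= len(chunk_slbas) <= MAX_RANGES:
        raise ValueError("DSM carries 1..256 ranges")
    return b"".join(dsm_range(slba, LBKS_PER_CHUNK) for slba in chunk_slbas)


def dsm_cdws(num_ranges: int) -> Tuple[int, int]:
    """CDW10 (NR, 0-based) and CDW11 (attributes) for a deallocate request."""
    return (num_ranges - 1) & 0xFF, DSM_ATTR_DEALLOCATE


# Reset chunks 0 and 1 of grp 0 / pu 0. Chunk 1 starts at LBA 1 << 13,
# given the 13-bit lbk field from the geometry.
payload = chunk_reset_payload([0, 1 << 13])
cdw10, cdw11 = dsm_cdws(2)
print(len(payload), cdw10, cdw11)  # 32 1 4
```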

Throughput Comparison

• Single bdev on 64 PUs
• 2 * W (128k write T1Q64), then 1 * R (128k read T1Q64)
• SPDK FTL isn't aware of the TLC read unit, as posted on SPDK Trello

[Figure: throughput (GiB/s, 0-3.5) vs. time (seconds, 0-4000) for pblk, spdk ftl, and spdk ftl-rsopt over the 1st write, 2nd write, and read phases]

* pblk target created with op=20

Noisy Neighbor Problem Solved by OCSSD

• 4k randread T3Q64 & randwrite T1Q64 on four partitions
• Each partition is pre-conditioned with 128k writes
• Not isolated: single bdev on 64 PUs
• Isolated by 2 channels: four bdevs, 16 PUs each
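The two configurations differ only in which PUs each bdev spans. The sketch below assumes the global PU numbering grp * 8 + pu from the geometry slide and treats each group as one NAND channel. The function names are hypothetical. With four 16-PU bdevs, no group is shared, so the writer's partition cannot contend with the readers' channels. With a single 64-PU bdev, every partition can touch all 8 groups.

```python
from typing import List, Set

NUM_GRP, PUS_PER_GRP = 8, 8
NUM_PUS = NUM_GRP * PUS_PER_GRP  # 64


def global_pu(grp: int, pu: int) -> int:
    """Global PU index as on the geometry slide (grp 0: 0-7, grp 1: 8-15, ...)."""
    return grp * PUS_PER_GRP + pu


def pu_partitions(num_bdevs: int) -> List[range]:
    """Split the 64 PUs into equal, contiguous ranges, one per FTL bdev."""
    if NUM_PUS % num_bdevs:
        raise ValueError("PUs must divide evenly across bdevs")
    size = NUM_PUS // num_bdevs
    return [range(i * size, (i + 1) * size) for i in range(num_bdevs)]


def groups_of(pus: range) -> Set[int]:
    """Groups (treated here as NAND channels) that a PU range touches."""
    return {p // PUS_PER_GRP for p in pus}


def shared_groups(parts: List[range]) -> Set[int]:
    """Groups touched by more than one partition, i.e. where tenants can collide."""
    seen: Set[int] = set()
    shared: Set[int] = set()
    for part in parts:
        g = groups_of(part)
        shared |= seen & g
        seen |= g
    return shared


isolated = pu_partitions(4)       # four bdevs, 16 PUs each
not_isolated = pu_partitions(1)   # single bdev on 64 PUs
print([(r.start, r.stop - 1) for r in isolated])  # [(0, 15), (16, 31), (32, 47), (48, 63)]
print(sorted(groups_of(isolated[1])))             # [2, 3]
print(shared_groups(isolated))                    # set()
```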

[Figure: 4k random IOPS over time (X: seconds, ~7100-7200; Y: K IOPS, 0-250) for pblk (top row) and SPDK FTL (bottom row), with two panels per row for the not-isolated and isolated configurations; each panel shows the 3 read partitions and the 1 write partition]

Contributions & Future Works

• OCSSD 2.0 API & FTL
  – https://github.com/spdk/spdk/commits?author=youngtack
  – https://github.com/spdk/spdk/commits?author=iClaire
• FTL issues
  – https://trello.com/c/Osol93ZU
  – https://github.com/spdk/spdk/issues/created_by/youngtack
  – https://github.com/spdk/spdk/issues/created_by/iClaire
• Future works
  – Random IOPS bottleneck analysis
  – ANM analysis once Marvell firmware supports it
  – CPU affinity per FTL bdev analysis
  – PMDK and ZNS support for FTL bdev

Acknowledgements

• Wojciech Malikowski (Intel) – SPDK FTL
• Matias Bjørling () – QEMU NVMe, LightNVM PBLK
• Luan Ton-That (Marvell) – OCSSD firmware
• John Schadegg (Marvell) – OCSSD EVB

Open-Channel SSD Roadmap

Timeline: 2011, 2014, 2015, 2018, 2019, 2020 ~

• OCSSD Spec: LightNVM Architecture
• OCSSD Projects: Jasmine OpenSSD (Indilinx SoC, SATA); Cosmos OpenSSD (FPGA w/ PCIe Gen 2); Alibaba OCSSD; Microsoft Denali; OCSSD w/ SPDK; Marvell SoC w/ SPDK FTL
• Cinabro™ Storage Appliance: SPDK FTL + PMDK, OCSSD / ZNS, Optane DIMM

Cinabro™ Architecture: SW Stack and Storage Appliance

[Figure: software stack (App, OS, SPDK FTL / PMDK, OCSSD / ZNS, Optane DIMM) and a storage appliance with 20 ~ 30 SSDs]

Summary

www.circuitblvd.com

• SPDK + OCSSD shows promise in alleviating the noisy neighbor problem.

• SPDK OCSSD Reference Platform Availability: 2H ’19

• For inquiries or more information:

[email protected]

Marvell Data Center & Enterprise Open Channel SSD Controller

SPDK 2019
Marvell Confidential

Agenda

• Marvell 88SS1098 Datacenter NVMe SSD Controller
• Marvell OC Drive (Prototype)

88SS1098 - Marvell Datacenter NVMe SSD Controller

• Capacity: 8TB/8CH or 16TB/16CH (via 2x4GB/s MCI)
• NAND I/F speed: 800MT/s
• PCIe: Gen 3x4, single and dual port
• Reliability: Gen4 LDPC
• NVMe: 1.3, 64 VF, 64 IO queues, 256 commands
• SGL: Yes
• Virtualization: 64VF
• IO Determinism: Yes
• Metadata: T10 / DIF / DIX
• Program/Erase Suspend & Resume: natively supported, including out-of-order transfers
• T10 E2E DIX: Yes
• CPU: Quad Cortex-R5 ARM
• Encryption: AES-XTS

88SS1098 - Marvell Datacenter NVMe SSD Controller

88SS1098

128K Seq Write 2.73 GB/s

128K Seq Read 3.31 GB/s

4K Random Write 500 KIOPs

4K Random Read 650 KIOPs

NAND: BICS3 TLC; NFIF: 533 MT/s; 8 channels, 64 dies

Marvell OC Drive (Prototype)

• Host: Linux PC with PCIe 3.0
• Drive: M.2 SSD, PCIe 3.0 x4
• Approach:
  – Align with the Linux open-source community and SPDK
  – Evaluate an open-channel SSD solution with a prototype
• Targets:
  – Support open-channel SSD interface v2.0 in device/drive FW
    » In-house modification to support v2.0 read/write/erase operations
    » Aligned with Linux upstream kernel 4.17, 4.18, 5.0
  – Integrate with the Marvell SSD controller and expose NAND as a block device using the pblk path in LightNVM
    » Multiple pblk instances supported

[Diagram: Ubuntu Linux PC (Linux host) connected over a PCIe 3.0 x4 I/F to the Marvell OC SSD, comprising the NVMe controller, media FW, back-end controller, and NAND]

NVMe Command Support

Operation NVMe Command

Read Read Chunk

Write Write Chunk

Erase Reset Chunk (Free or Vacant)

Get Geometry Geometry

Get Chunk Information Get Log Page (Chunk Information)

Media Feedback Get/Set Features (Media Feedback)

OC Prototype Performance

88SS1098 OC Drive Prototype

128K Seq Write 2.7 GB/s

128K Seq Read 2.3 GB/s

4K Random Write 594 KIOPs

4K Random Read 448 KIOPs

NAND: Toshiba BICS3 TLC; NFIF: 533 MT/s; 8 channels, 64 dies

We can achieve the maximum possible chip performance with future product code.

Planned Features for OCSSD

• Vector I/O and asynchronous erase
  – High performance
• NAND error recovery
  – Highly efficient error recovery algorithms for best QoS and drive life
  – Reusable, compatible, and tested with all major NAND vendors
• Metadata support
  – To store host LBA in NAND
• Performance tuning

Summary

• The Marvell 88SS1098 controller is a perfect fit for both conventional enterprise and open-channel SSD products
• Marvell has highly efficient FW components
  – Unified HAL: provides access to and exercises all HW features
  – Full-featured media management and NAND error recovery
  – FW for NVMe block and other IPs

Q & A

The information contained in this presentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided “AS IS”, without warranty of any kind, express or implied. This information is based on Marvell’s current product roadmap, which is subject to change by Marvell without notice. Marvell assumes no obligation to update or otherwise correct or revise this information. Marvell shall not be responsible for any direct, indirect, special, consequential or other damages arising out of the use of, or otherwise related to, this presentation or any other documentation even if Marvell is expressly advised of the possibility of such damages. Marvell makes no representations or warranties with respect to the contents of the presentation and assumes no responsibility for any inaccuracies, errors or omissions that may appear in this presentation.

6-May-19