SPDK FTL & Marvell OCSSD for Noisy Neighbor Problem
David Recker, Marketing VP
Circuit Blvd., Inc.
April 2019
4/17/2019 © 2019 Circuit Blvd., Inc.

Who We Are
• Industry: Enterprise/Cloud Database and Storage
• Founded: 2017 (Sunnyvale, CA, U.S.A.)
• Mission: We develop next-generation database/storage systems, leveraging expertise in memory semiconductors, solid-state storage systems, and operating systems.
• Open Source Contributions:
  • Linux LightNVM, OCSSD 2.0 specification, OpenSSD FPGA platform
  • SPDK (since SPDK v17.10)
  • RocksDB

OCSSD with SPDK FTL
• SPDK FTL on Marvell's OCSSD platform
  • We have been evaluating SPDK FTL on Marvell's SSD SoC platform since January 2019.
  • SPDK Flash Translation Layer (FTL): "The Flash Translation Layer library provides block device access on top of non-block SSDs implementing Open Channel interface. It handles the logical to physical address mapping, responds to the asynchronous media management events, and manages the defragmentation process."*
  • We measured various performance metrics of the initial prototype and demonstrate how SPDK-driven OCSSDs can solve the noisy neighbor problem in multi-tenant environments.
  • We share experimental data based on our current implementation (both SPDK FTL and Marvell's controller are being continuously improved).
• (Demo) SPDK-driven OCSSD comparison (isolation vs. non-isolation)
  • Demo table outside (please feel free to drop by with further questions)
* SPDK FTL definition: https://spdk.io/doc/ftl.html
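The definition above puts logical-to-physical (L2P) address mapping at the core of the FTL. As a toy illustration only (not SPDK's code, which is written in C and also handles media management events and defragmentation), the Python sketch below shows why an FTL needs that map at all: flash is written out of place, so every overwrite lands at a new physical address and leaves the old copy invalid until defragmentation reclaims it. The class name and the flat physical address space are hypothetical simplifications.

```python
from typing import Dict, Optional


class ToyL2P:
    """Toy logical-to-physical (L2P) map with append-only, out-of-place writes.

    Hypothetical illustration; a real FTL maps onto grp/pu/chk/lbk addresses
    and runs defragmentation (garbage collection) when free space runs out.
    """

    def __init__(self, num_physical_blocks: int) -> None:
        self.l2p: Dict[int, int] = {}                  # logical block -> physical block
        self.valid = [False] * num_physical_blocks     # does a physical block hold live data?
        self.write_ptr = 0                             # next free physical block

    def write(self, lba: int) -> int:
        """Write a logical block to the next free physical block; return that address."""
        if self.write_ptr >= len(self.valid):
            raise RuntimeError("no free blocks: defragmentation would reclaim space here")
        old = self.l2p.get(lba)
        if old is not None:
            self.valid[old] = False                    # previous copy becomes garbage
        ppa = self.write_ptr
        self.write_ptr += 1
        self.valid[ppa] = True
        self.l2p[lba] = ppa
        return ppa

    def read(self, lba: int) -> Optional[int]:
        """Return the physical block holding the logical block, or None if unmapped."""
        return self.l2p.get(lba)


m = ToyL2P(num_physical_blocks=8)
m.write(5)   # -> physical block 0
m.write(7)   # -> physical block 1
m.write(5)   # overwrite -> physical block 2; block 0 is now invalid
print(m.read(5), m.valid[:3])  # 2 [False, True, True]
```

The append-only write pointer mirrors how OCSSD chunks must be written sequentially; reclaiming the invalid blocks is the defragmentation job the SPDK definition mentions.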

Hardware Setup
• Server: SuperMicro X11DPG
  • 2 × Xeon Scalable Gold 6126, 2.6 GHz (12 cores each), hyperthreading disabled
  • 8 × 32 GB DIMM, 2666 MT/s
• 2 × OCSSD 2.0 (OCSSD1 and OCSSD2)
  • Marvell 88SS1098 controller
  • 3D TLC NAND
  • write unit: 96 KiB (one-shot program); read unit: 32 KiB
  • each drive sits in a PCIe Gen3 x4 slot attached to a different CPU package (CPU1, CPU2)
• nvme id-ns: LBADS=12 (4 KiB), MS=0
• OCSSD geometry: 8 grp (3), 8 pu (3), 1478 chk (11), 6144 lbk (13)
  • the number in parentheses is the field's bit length in the LBA format (LBAF)
  • ws_opt=24 (96 KiB)

OCSSD Geometry
• OCSSD LBA (64 bits): grp (3 bits) | pu (3 bits) | chk (11 bits) | lbk (13 bits); example: grp=7, pu=1, chk=1, lbk=5
• 8 groups (grp 0-7) with 8 PUs each; PUs are numbered 0-63 across groups (grp 0: PUs 0-7, grp 1: PUs 8-15, ..., grp 7: PUs 56-63)
• each PU is 3D TLC NAND built from 2-plane blocks and holds chunks 0-1477
• each chunk holds 6144 lbk of 4 KiB: 64 (layers) × 4 (wordlines) × 3 (pages) × 8 (lbk) = 6144
• ws_opt=24 (optimal write size, 96 KiB); rs_opt=8 (*), i.e., 32 KiB
• NAND operation latency: tR < tPROG < tBERS
• PU ranges for pblk and SPDK FTL: (1) 0-15, 16-31, 32-47, 48-63; (2) 0-63
* rs_opt (optimal read size) is not defined in the OCSSD 2.0 spec.

Software Setup
• linux 4.17 for pblk
  • with Marvell's patches applied
• linux 5.0 for SPDK
• fio 3.13
  • isolcpus=6-11,18-23 for fio threads
  • cpus_allowed=6-11 or 18-23
• SPDK
  • master: 7b0579d (4/9/2019)
• Additional changes to SPDK master
  • Marvell-specific patches (CircuitBlvd's)
    • num_chk=1478, as posted in an SPDK GitHub issue
    • OCSSD identification quirk & edlp=0
    • vector reset (0x90) to DSM deallocate (0x09); erase should be done in synchronous mode
    • vendor-specific command to build chunk info
  • Optimal read size (rsopt) patch, to be posted to SPDK GerritHub
  • Cherry-picks to avoid chunk write pointer (chk wptr) errors (Intel's):
    • https://review.gerrithub.io/c/spdk/spdk/+/449068
    • https://review.gerrithub.io/c/spdk/spdk/+/450174
    • https://review.gerrithub.io/c/spdk/spdk/+/449239

Throughput Comparison
• single bdev on 64 PUs
• 2 × W (128k write, T1Q64: 1 job at queue depth 64), then 1 × R (128k read, T1Q64)
• spdk ftl isn't aware of the TLC read unit, as posted on SPDK Trello
[Figure: throughput (GiB/s, 0-3.5) over time (seconds, 0-4000) for pblk, spdk ftl, and spdk ftl-rsopt across the 1st write, 2nd write, and read phases.]
* pblk target created with op=20 (20% over-provisioning)

Noisy Neighbor Problem Solved by OCSSD
• 4k randread T3Q64 and randwrite T1Q64 on four partitions
• each partition is pre-conditioned with 128k writes
• not isolated: single bdev on 64 PUs
• isolated by 2 channels: four bdevs of 16 PUs each
[Figure: IOPS over time (X: seconds, 7100-7200; Y: K IOPS, 0-250) for pblk (top row) and spdk ftl (bottom row), not isolated vs. isolated; each panel shows the 3 read workloads and the 1 write workload.]

Contributions & Future Works
• OCSSD 2.0 API & FTL
  • https://github.com/spdk/spdk/commits?author=youngtack
  • https://github.com/spdk/spdk/commits?author=iClaire
• FTL issues
  • https://trello.com/c/Osol93ZU
  • https://github.com/spdk/spdk/issues/created_by/youngtack
  • https://github.com/spdk/spdk/issues/created_by/iClaire
• Future works
  • random IOPS bottleneck analysis
  • ANM analysis once Marvell firmware supports it
  • CPU affinity per FTL bdev analysis
  • PMDK and ZNS support for the FTL bdev

Acknowledgement
• Wojciech Malikowski (Intel) – SPDK FTL
• Matias Bjørling (Western Digital) – QEMU NVMe, LightNVM PBLK
• Luan Ton-That (Marvell) – OCSSD firmware
• John Schadegg (Marvell) – OCSSD EVB
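The LBA layout on the geometry slide and the isolated configuration on the noisy-neighbor slide fit together as sketched below. This is a minimal Python sketch, assuming the OCSSD 2.0 LBA format with lbk in the least-significant bits followed by chk, pu, and grp, and assuming the global PU number is grp × 8 + pu as drawn in the geometry figure; `encode`, `decode`, and `partition_of` are hypothetical helpers, not SPDK functions. It shows that each 16-PU bdev spans exactly two whole groups (channels), which is what "isolated by 2 channels" refers to.

```python
from typing import NamedTuple

# Geometry taken from the Hardware Setup / OCSSD Geometry slides.
GRP_BITS, PU_BITS, CHK_BITS, LBK_BITS = 3, 3, 11, 13    # LBAF bit lengths
NUM_GRP, NUM_PU, NUM_CHK, NUM_LBK = 8, 8, 1478, 6144    # groups, PUs/group, chunks/PU, lbk/chunk
LBK_SIZE = 4096                                          # LBADS=12 -> 4 KiB


class Addr(NamedTuple):
    grp: int
    pu: int
    chk: int
    lbk: int


def encode(a: Addr) -> int:
    """Pack (grp, pu, chk, lbk) into an OCSSD LBA, lbk in the least-significant bits."""
    if not (a.grp < NUM_GRP and a.pu < NUM_PU and a.chk < NUM_CHK and a.lbk < NUM_LBK):
        raise ValueError(f"address out of range: {a}")
    lba = a.grp
    lba = (lba << PU_BITS) | a.pu
    lba = (lba << CHK_BITS) | a.chk
    lba = (lba << LBK_BITS) | a.lbk
    return lba


def decode(lba: int) -> Addr:
    """Inverse of encode(): peel fields off from the least-significant end."""
    lbk = lba & ((1 << LBK_BITS) - 1)
    lba >>= LBK_BITS
    chk = lba & ((1 << CHK_BITS) - 1)
    lba >>= CHK_BITS
    pu = lba & ((1 << PU_BITS) - 1)
    lba >>= PU_BITS
    grp = lba & ((1 << GRP_BITS) - 1)
    return Addr(grp, pu, chk, lbk)


def partition_of(a: Addr, pus_per_partition: int = 16) -> int:
    """Hypothetical helper: which isolated bdev (0-3) owns this address's PU."""
    global_pu = a.grp * NUM_PU + a.pu    # PUs numbered 0-63 across groups
    return global_pu // pus_per_partition


a = Addr(grp=7, pu=1, chk=1, lbk=5)     # the example on the geometry slide
print(encode(a), decode(encode(a)) == a, partition_of(a))  # 956309509 True 3
print(NUM_LBK * LBK_SIZE // 2**20, "MiB per chunk")         # 24 MiB per chunk
```

With pus_per_partition=64 every address falls into partition 0, which corresponds to the non-isolated single-bdev setup.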

Open-Channel SSD Roadmap
[Figure: timeline from 2011 to 2020~ covering Jasmine OpenSSD (Indilinx SoC, SATA); Cosmos OpenSSD (FPGA w/ PCIe Gen 2); the OCSSD spec and LightNVM; OCSSD projects (Alibaba OCSSD, Microsoft Denali, OCSSD w/ SPDK, Marvell SoC w/ SPDK FTL); and the Cinabro™ storage appliance architecture (SPDK FTL + PMDK, OCSSD / ZNS, Optane DIMM).]

Cinabro™ Architecture
[Figure: SW stack and storage appliance. Components: App, SPDK FTL / PMDK, OS, OCSSD / ZNS, Optane DIMM; 20 ~ 30 SSDs per appliance.]

Summary
www.circuitblvd.com
• SPDK + OCSSD shows promise in alleviating the noisy neighbor problem.
• SPDK OCSSD Reference Platform availability: 2H '19
• For inquiries or more information: [email protected]

Marvell Data Center & Enterprise Open Channel SSD Controller
SPDK 2019
Marvell Confidential

Agenda
• Marvell 88SS1098 Datacenter NVMe SSD Controller
• Marvell OC Drive (Prototype)

88SS1098 - Marvell Datacenter NVMe SSD Controller (Features)
• Capacity: 8TB/8CH or 16TB/16CH (via 2x4GB/s MCI)
• NAND I/F speed: 800MT/s
• PCIe: Gen 3x4, single and dual port
• Reliability: Gen4 LDPC
• NVMe: 1.3, 64 VF, 64 IO queues, 256 commands
• SGL: Yes
• Virtualization: 64VF
• IO Determinism: Yes
• Metadata: T10 / DIF / DIX
• T10 E2E DIX: Yes
• Program/Erase Suspend & Resume: natively supported, including out-of-order transfers
• CPU: Quad Cortex-R5 ARM
• Encryption: AES-XTS

88SS1098 - Marvell Datacenter NVMe SSD Controller (Performance)
• 128K Seq Write: 2.73 GB/s
• 128K Seq Read: 3.31 GB/s
• 4K Random Write: 500 KIOPS
• 4K Random Read: 650 KIOPS
• NAND: Toshiba BICS3 TLC, NFIF: 533 MT/s, 8 channels, 64 dies

Marvell OC Drive (Prototype)
[Figure: Ubuntu Linux PC (Linux host) connected over a PCIe 3.0 x4 interface to the Marvell OC SSD: NVMe controller, media FW, device / back-end drive controller, and NAND.]
• Host: Linux PC with PCIe 3.0
• Drive: M.2 SSD, PCIe 3.0 x4
• Approach:
  – Align with the Linux open-source community and SPDK
  – Evaluate an open-channel SSD solution with a prototype
• Targets:
  – Support the open-channel SSD interface v2.0
    » In-house modification to support v2.0 read/write/erase operations
    » Aligned with Linux upstream kernels 4.17, 4.18, 5.0
  – Integrate with the Marvell SSD controller and expose the drive as a block device using the pblk path in LightNVM
    » Multiple pblk instances supported

NVMe Command Support (Operation: NVMe command)
• Read: Read Chunk
• Write: Write Chunk
• Erase: Reset Chunk (Free or Vacant)
• Get Geometry: Geometry
• Get Chunk Information: Get Log Page (Chunk Information)
• Media Feedback: Get/Set Features (Media Feedback)

OC Prototype Performance (88SS1098 OC Drive Prototype)
• 128K Seq Write: 2.7 GB/s
• 128K Seq Read: 2.3 GB/s
• 4K Random Write: 594 KIOPS
• 4K Random Read: 448 KIOPS
• NAND: Toshiba BICS3 TLC, NFIF: 533 MT/s, 8 channels, 64 dies
• We can achieve the maximum possible chip performance with future product code.

Planned Features for OCSSD
• Vector I/O and asynchronous erase
  – High performance
• NAND error recovery
  – Highly efficient error recovery algorithms for best QoS and drive life
  – Reusable, compatible, and tested with all major NAND vendors
• Metadata support
  – To store the host LBA in NAND
• Performance tuning

Summary
• The Marvell 88SS1098 controller is a perfect fit for both conventional enterprise and open-channel SSD products.
• Marvell has highly efficient FW components:
  – Unified HAL: provides access to, and exercises, all HW features
  – Full-featured media management and NAND error recovery
  – FW for NVMe block and other IPs

Q & A

The information contained in this presentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided "AS IS", without warranty of any kind, express or implied.
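Marvell's command table maps Erase to Reset Chunk, and the Circuit Blvd. Software Setup slide lists a Marvell-specific patch that turns the vector reset (0x90) into a DSM deallocate (0x09). That patch is not reproduced in these slides, so the Python sketch below is only an assumed shape of such a translation: every reset chunk becomes one Dataset Management range covering all 6144 logical blocks of that chunk. The 16-byte range layout and the NR/AD command fields follow the NVMe base specification; `chunk_start_lba` and `dsm_deallocate_for_chunks` are hypothetical names.

```python
import struct
from typing import List, Tuple

PU_BITS, CHK_BITS, LBK_BITS = 3, 11, 13   # LBAF bit lengths from the geometry slide
NUM_LBK = 6144                            # logical blocks per chunk

# Assumed replacement for the OCSSD 2.0 vector chunk reset (opcode 0x90).
DSM_OPCODE = 0x09                # NVMe Dataset Management
DSM_ATTR_DEALLOCATE = 1 << 2     # CDW11 bit 2 (AD): deallocate the listed ranges
DSM_MAX_RANGES = 256             # one DSM command carries at most 256 ranges


def chunk_start_lba(grp: int, pu: int, chk: int) -> int:
    """LBA of lbk 0 in the given chunk (grp | pu | chk | lbk layout)."""
    return ((((grp << PU_BITS) | pu) << CHK_BITS) | chk) << LBK_BITS


def dsm_deallocate_for_chunks(chunk_slbas: List[int]) -> Tuple[int, int, bytes]:
    """Hypothetical translation of a vector chunk reset into one DSM deallocate.

    Returns (cdw10, cdw11, payload). Each 16-byte range entry is little-endian:
    context attributes (u32), length in logical blocks (u32), starting LBA (u64).
    """
    if not 1 <= len(chunk_slbas) <= DSM_MAX_RANGES:
        raise ValueError("DSM accepts between 1 and 256 ranges")
    payload = b"".join(struct.pack("<IIQ", 0, NUM_LBK, slba) for slba in chunk_slbas)
    cdw10 = len(chunk_slbas) - 1   # NR (number of ranges) is 0's based
    cdw11 = DSM_ATTR_DEALLOCATE
    return cdw10, cdw11, payload


slbas = [chunk_start_lba(7, 1, 1), chunk_start_lba(0, 0, 0)]
cdw10, cdw11, payload = dsm_deallocate_for_chunks(slbas)
print(hex(DSM_OPCODE), cdw10, cdw11, len(payload))   # 0x9 1 4 32
```

Because one DSM command can carry up to 256 ranges, a single vector reset naming several chunks can map onto a single deallocate command in this sketch.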