NVMe over Fabrics Demystified

NVMe over Fabrics Demystified
Rob Davis, Mellanox
2019 Storage Developer Conference India. © All Rights Reserved.

Why NVMe over Fabrics? Storage Media Technology
[Chart: access time in microseconds, log scale from 0.1 to 1000 µs, for HDD, SSD, and NVM persistent memory]

NVMe Technology
▪ Optimized for flash and persistent memory (PM)
▪ Traditional SCSI interfaces were designed for spinning disk
▪ NVMe bypasses unneeded layers
▪ NVMe flash outperforms SAS/SATA flash
▪ 2.5x more bandwidth, 50% lower latency, 3x more IOPS

"NVMe over Fabrics" Was the Logical and Historical Next Step
▪ Sharing NVMe-based storage across multiple servers/CPUs was the next step
▪ Better utilization: capacity, rack space, power
▪ Scalability, management, fault isolation
▪ NVMe over Fabrics standard
▪ 50+ contributors
▪ Version 1.0 released in June 2016
▪ Pre-standard demos in 2014
▪ Able to almost match local NVMe performance

NVMe over Fabrics (NVMe-oF) Transports
▪ The NVMe-oF standard is not fabric specific
▪ Instead, there is a separate transport binding specification for each transport layer
▪ RDMA was 1st (InfiniBand)
▪ Later, Fibre Channel
▪ NVM Express just released a new binding specification for TCP

How Does NVMe-oF Maintain NVMe Performance?
▪ By extending NVMe efficiency over a fabric
▪ NVMe commands and data structures are transferred end to end
▪ Bypassing legacy stacks for performance
▪ First products and early demos all used the RDMA transport
▪ Performance is impressive
[Chart: end-to-end latency of a SAS/SATA device vs. NVMe/RDMA over Fabrics (RoCE or InfiniBand transport), NVMe/FC, and NVMe/TCP]
https://www.theregister.co.uk/2018/08/16/pavilion_fabrics_performance/
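The "commands and data structures transferred end to end" bullet is quite literal: the unit that crosses the fabric is a command capsule, i.e. the fixed 64-byte NVMe submission queue entry (SQE) plus optional in-capsule data. The minimal Python sketch below shows that framing; it is illustrative only (the field layout is simplified, and the helper names build_sqe and build_command_capsule are invented for this sketch, not part of any NVMe library).

import struct

# Minimal, illustrative framing of an NVMe-oF command capsule: the fixed
# 64-byte NVMe Submission Queue Entry (SQE) followed by optional in-capsule
# data.  Field layout is simplified (real SQEs also carry metadata pointers
# and SGL descriptors); helper names are invented for this sketch.

NVME_SQE_SIZE = 64  # every NVMe command is a fixed 64-byte SQE

def build_sqe(opcode: int, command_id: int, nsid: int,
              cdw10: int = 0, cdw12: int = 0) -> bytes:
    """Pack a simplified SQE: opcode and command ID in dword 0, namespace ID
    in dword 1, and two command-specific dwords (10 and 12)."""
    sqe = bytearray(NVME_SQE_SIZE)
    struct.pack_into("<BBH", sqe, 0, opcode, 0, command_id)  # dword 0
    struct.pack_into("<I", sqe, 4, nsid)                     # dword 1: NSID
    struct.pack_into("<I", sqe, 40, cdw10)                   # dword 10
    struct.pack_into("<I", sqe, 48, cdw12)                   # dword 12
    return bytes(sqe)

def build_command_capsule(sqe: bytes, in_capsule_data: bytes = b"") -> bytes:
    """A command capsule is the SQE plus optional in-capsule data; the fabric
    transport (RDMA, FC, TCP) carries exactly these bytes end to end."""
    assert len(sqe) == NVME_SQE_SIZE
    return sqe + in_capsule_data

# Hypothetical example: a 4 KiB write to LBA 0 of namespace 1, with the
# payload carried in-capsule instead of being fetched by the target.
WRITE_OPCODE = 0x01  # NVM command set: Write
capsule = build_command_capsule(
    build_sqe(opcode=WRITE_OPCODE, command_id=7, nsid=1, cdw10=0, cdw12=7),
    in_capsule_data=b"\x00" * 4096,
)
print(len(capsule))  # 64-byte SQE + 4096 bytes of payload = 4160

Because the capsule is just the native NVMe command carried verbatim, no SCSI translation layer sits between the host driver and the drive, which is where the efficiency claim comes from.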
Faster Storage Needs a Faster Network
[Diagram: 10GbE and Fibre Channel networks become the bottleneck in front of faster storage]

Faster Network Wires Solve Some of the Network Bottleneck Problem…
▪ Ethernet & InfiniBand, end to end: 25, 40, 50, 56, 100, 200 Gb/s
▪ Going to 400 Gb/s

Faster Protocols Solve the Rest

NVMe, NVMe-oF, and RDMA Protocols
[Diagram: how the NVMe, NVMe-oF, and RDMA protocol layers relate]

NVMe/RDMA: NVMe-oF over RoCE
▪ An adapter-based RDMA transport
▪ RDMA fabric options:
▪ 1) Ethernet: RoCE, iWARP
▪ 2) InfiniBand
▪ 3) OmniPath

NVMe Commands Encapsulated
[Diagram: command capsule exchange across the network between an NVMe initiator and an NVMe target through their RDMA NICs (RNICs): the initiator posts an NVMe command and its RNIC sends a Command Capsule; the target acks, completes the receive, and frees the buffer; data moves as RDMA Writes ("write first" … "write last"); the target then sends a Response Capsule, the initiator sees the completion, and both sides free their buffers]

Importance of Latency with NVMe-oF
▪ Network hops multiply latency
[Chart, logarithmic scale: request/response latency of the newest NVMe SSDs compared with common vs. low-latency switches & adapters]

Composable Infrastructure Use Case
▪ Also called Storage Disaggregation and Rack Scale
▪ Dramatically improves data center efficiency
▪ NVMe over Fabrics enables Composable Infrastructure
▪ Low latency
▪ High bandwidth
▪ Nearly local disk performance
[Diagram: a rack of compute nodes sharing disaggregated storage through a switch]

Hyperconverged and Scale-Out Storage Use Case
▪ Scale-out
▪ Cluster of commodity servers
▪ Software provides storage functions
▪ Hyperconverged collapses compute & storage
▪ Integrated compute-storage nodes & software
▪ NVMe-oF performs like local/direct-attached SSD
[Diagram: HCI nodes running VMs and a storage application, and scale-out storage nodes with NVMe drives, connected by a Mellanox switch]

Backend Scale Out Use Case
[Diagram: frontend nodes connected over the network to backend JBOFs]

NVMe-oF Use Cases: Classic SAN
▪ SAN features at higher performance
▪ Better utilization: capacity, rack space, and power
▪ Scalability
▪ Management
▪ Fault isolation

NVMe-oF Target Hardware Offloads
[Diagram: target data path in no-offload mode]

How Target Offload Works
▪ Offload:
▪ Only control path, management, and exceptions go through target CPU software
▪ Data path and NVMe commands are handled by the network adapter

Offload vs. No Offload Performance
[Diagram: two test setups, each driven by two 100Gb initiators (x86 servers with ConnectX-5 adapters and DDR4); the target connects NVMe SSDs through a PCIe switch, and in the offload case the data path runs through the adapter/SoC rather than target software]
▪ No offload: 6M IOPS at 512B block size; 2M IOPS at 4K block size; ~15 µs latency (not including the SSD)
▪ Offload: 8M IOPS at 512B block size; 5M IOPS at 4K block size; ~5 µs latency (not including the SSD)
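As a quick sanity check on the offload results above, IOPS multiplied by block size gives the bandwidth each configuration sustains. The Python sketch below uses the slide's own figures; the ~25 GB/s ceiling for two 100GbE ports is an assumed raw line rate that ignores protocol overhead.

# Bandwidth implied by the IOPS figures quoted above.  IOPS and block sizes
# come from the slide; the 2x100GbE ceiling is an assumed raw line rate.

LINE_RATE_GB_S = 2 * 100 / 8  # two 100 Gb/s ports, in gigabytes per second

results = {
    # (configuration, block size in bytes): IOPS
    ("no offload", 512):  6_000_000,
    ("offload",    512):  8_000_000,
    ("no offload", 4096): 2_000_000,
    ("offload",    4096): 5_000_000,
}

for (config, block), iops in results.items():
    gb_per_s = iops * block / 1e9
    share = gb_per_s / LINE_RATE_GB_S
    print(f"{config:>10} @ {block:>4}B: {gb_per_s:5.1f} GB/s "
          f"({share:4.0%} of 2x100GbE line rate)")

At 4K blocks the offloaded target moves roughly 20 GB/s, approaching the raw capacity of the two 100Gb links, which is consistent with the data path being handled in the adapter rather than by target software.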
NVMe Emulation
[Diagram: "Physical Local NVMe Storage" vs. "NVMe Drive Emulation"; in both cases the host server's OS/hypervisor uses the standard NVMe driver over the PCIe bus, but with emulation the NVMe device presented to the host is emulated in hardware and backed by remote storage rather than a local physical drive (caption: "Local Physical Storage to Hardware Emulated Storage")]

NVMe/TCP
▪ NVMe-oF commands are sent over standard TCP/IP sockets
▪ Each NVMe queue pair is mapped to a TCP connection (a minimal sketch of this mapping follows at the end of the deck)
▪ Easy to support NVMe over TCP with no changes to the network
▪ Good for distance, stranded servers, and out-of-band management connectivity

Latency: NVMe-RDMA vs. NVMe-TCP
[Chart: tail latency, shown as the fraction of write IOs completing within a given latency, for a local SSD, NVMe/RDMA, and NVMe/TCP]

NVMe over Fabrics Maturity
▪ UNH-IOL: a neutral environment for multi-vendor interoperability since 1988
▪ Four plugfests for NVMe-oF since May 2017
▪ Tests require participating vendors to mix and match in both target and initiator positions
▪ The June 2018 test included Mellanox, Broadcom, and Marvell ASIC solutions
▪ List of vendors who approved public results: https://www.iol.unh.edu/registry/nvmeof

NVMe Market Projection – $60B by 2021
▪ ~$20B in NVMe-oF revenue projected by 2021
▪ NVMe-oF adapter shipments will exceed 1.5M units by 2021
▪ This does not include ASICs, custom mezzanine cards, etc. inside AFAs and other storage appliances

Some NVMe-oF Storage Players
[Slide of vendor logos]

Conclusions
▪ NVMe-oF brings the value of networked storage to NVMe-based solutions
▪ NVMe-oF is supported across many network technologies
▪ The performance advantages of NVMe are not lost with NVMe-oF
▪ Especially with RDMA
▪ There are many suppliers of NVMe-oF solutions across a variety of important data center use cases

Thank You

NVMe over Fabrics Demystified
Rob Davis, Mellanox
2019 Storage Developer Conference India. © 2019 Mellanox Technologies. All Rights Reserved.
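Returning to the NVMe/TCP slide above: "each NVMe queue pair is mapped to a TCP connection" can be pictured with the minimal Python sketch below. It is not an NVMe/TCP implementation: the class and function names are invented, the ICReq/ICResp handshake and PDU framing are omitted, and port 4420 is simply the commonly used default NVMe-oF port. It only shows the one-socket-per-queue-pair structure.

import socket

# Illustrative only: one TCP connection per NVMe queue pair.  The real
# NVMe/TCP transport performs an ICReq/ICResp handshake and exchanges
# command, data, and response PDUs on each connection; none of that is
# modeled here.  Names are invented for this sketch.

NVME_TCP_PORT = 4420  # commonly used default NVMe-oF port

class QueuePairConnection:
    """One NVMe queue pair <-> one TCP connection."""

    def __init__(self, target_addr: str, qid: int):
        self.qid = qid  # 0 is the admin queue, 1..N are I/O queues
        self.sock = socket.create_connection((target_addr, NVME_TCP_PORT))

    def close(self) -> None:
        self.sock.close()

def connect_controller(target_addr: str, num_io_queues: int):
    """Open the admin queue plus the requested number of I/O queues."""
    admin = QueuePairConnection(target_addr, qid=0)
    io_queues = [QueuePairConnection(target_addr, qid=i)
                 for i in range(1, num_io_queues + 1)]
    return admin, io_queues

# Hypothetical usage: an initiator configured with 4 I/O queues holds five
# sockets to the same target, one per queue pair.
# admin_q, io_qs = connect_controller("192.0.2.10", num_io_queues=4)

In the real protocol the admin queue (queue ID 0) is established first and the number of I/O queues is negotiated over it; the sketch skips that negotiation and simply opens the requested number of connections.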
