Karol Latecki John Kariuki

SPDK, PMDK, Intel® Performance Analyzers Virtual Forum

Agenda

1. I/O Performance: Performance and Efficiency; Workloads
2. Local Storage Performance: Test case and objectives; Performance Test tools, environment and optimizations
3. Storage over Ethernet Performance: Test case and objectives; Performance Test tools, environment and optimizations
4. Virtualized Storage Performance: Test case and objectives; Performance Test tools, environment and optimizations

SPDK I/O Performance

Efficiency & Scalability: I/O per sec from 1 thread; I/O core scalability
Latency: Average; Tail (P90, P99, P99.99)

4 KiB workloads: 100% Random Read; 100% Random Write; 70%/30% Random Read/Write
128 KiB workloads: 100% Seq Read; 100% Seq Write; 70%/30% Seq Read/Write
Test areas: Local Storage Performance; Storage over Ethernet Performance; Virtualized Storage Performance

The Performance Reports: https://spdk.io/doc/performance_reports.html


Local Block Storage

Objectives:
• Measure SPDK NVMe BDEV performance
• Compare SPDK vs. Kernel (libaio, io_uring) block layers

Test Cases:
1. I/O per second from one thread
2. I/O core scalability
3. SPDK vs. Kernel Latency
4. IOPS vs. Latency

Test case execution automated with test/nvme/perf/run_perf.sh

[Diagram: test stack, top to bottom: SPDK/FIO, SPDK NVMe BDEV, SPDK NVMe Driver, Intel® TLC 3D NAND SSD]

SPDK NVMe BDEV I/O Efficiency

[Chart: SPDK NVMe BDEV IOPS, 1 CPU Core, 4 KB Rand Read @ QD=128. Y-axis: IOPS (Thousands), 0 to 6000, higher is better. X-axis: Number of SSDs (1, 2, 4, 6, 8, 10).]

Single-core IOPS scale linearly as the number of SSDs increases, up to 8 SSDs.
Maximum IOPS/Core: 5.2 million at 10 SSDs.
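To put the 5.2 million IOPS/Core figure in perspective, the arithmetic below converts it into bandwidth, per-SSD load and an approximate CPU cycle budget per I/O. This is a rough sketch, not a measurement: it assumes "4 KB" means 4096 bytes (as in the fio job's bs=4096) and uses the 2.30 GHz base clock of the Xeon Gold 6230N listed in the test system, although Turbo was ON, so the real clock was likely higher. The function and variable names are illustrative.

```python
# Back-of-the-envelope view of the single-core result from the slide.
IOPS_PER_CORE = 5.2e6   # from the slide: maximum IOPS on one core at 10 SSDs
BLOCK_SIZE = 4096       # assumption: "4 KB" taken as 4096 bytes
NUM_SSDS = 10           # from the slide
BASE_CLOCK_HZ = 2.30e9  # assumption: base clock; Turbo was ON during the test


def per_core_budget(iops, block_size, num_ssds, clock_hz):
    """Derive bandwidth, per-SSD load and cycles per I/O for one core."""
    return {
        "bandwidth_gb_s": iops * block_size / 1e9,  # decimal GB/s
        "iops_per_ssd": iops / num_ssds,
        "cycles_per_io": clock_hz / iops,
    }


budget = per_core_budget(IOPS_PER_CORE, BLOCK_SIZE, NUM_SSDS, BASE_CLOCK_HZ)
for name, value in budget.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions one core moves roughly 21 GB/s and has only a few hundred cycles to spend on each I/O, which is why the per-I/O software overhead matters so much.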

Slide 8. See configuration details – Slide 33

SPDK NVMe BDEV I/O Core Scalability

Lockless I/O path: IOPS scale linearly with the addition of I/O cores

Slide 9. See configuration details – Slide 33

Tools: fio, bdevperf, nvmeperf

Callouts on the fio job file: Minimize number of I/O threads; NUMA

    [global]
    direct=1
    thread=1
    ----
    bs=4096
    numjobs=1
    runtime=300
    ramp_time=60

    [filename0]
    iodepth=192
    cpus_allowed=0
    filename=Nvme0n1
    filename=Nvme1n1
    filename=Nvme2n1
    filename=Nvme3n1
    filename=Nvme4n1
    filename=Nvme5n1
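A job file like the one on this slide can be generated for any number of bdevs. The sketch below is a hypothetical helper (build_fio_job is not part of SPDK or fio) that emits only the options visible on the slide; the slide elides part of the [global] section, so a real run needs further options that are not reproduced here.

```python
def build_fio_job(devices, iodepth=192, cpu=0, bs=4096, runtime=300, ramp_time=60):
    """Return fio job-file text with one job section covering all devices.

    Only options shown on the slide are emitted. One job pinned to one CPU
    keeps the number of I/O threads minimal, as the slide recommends.
    """
    lines = [
        "[global]",
        "direct=1",
        "thread=1",
        f"bs={bs}",
        "numjobs=1",
        f"runtime={runtime}",
        f"ramp_time={ramp_time}",
        "",
        "[filename0]",
        f"iodepth={iodepth}",
        f"cpus_allowed={cpu}",  # pick a core on the same NUMA node as the SSDs
    ]
    lines += [f"filename={dev}" for dev in devices]
    return "\n".join(lines) + "\n"


# Six bdevs, named as on the slide (Nvme0n1 .. Nvme5n1).
devices = [f"Nvme{i}n1" for i in range(6)]
job = build_fio_job(devices)
print(job)
```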

Why SPDK perf tools?

fio:
• Industry standard
• High flexibility
• Lots of I/O metrics

SPDK perf tools:
• Less flexibility
• Optimized for I/O submission and completion
• Up to 2x more IOPS/Core

IOPS vs. Latency - 4 I/O Cores

[Chart: IOPS vs. Average Latency, 4 KB Random Read (4 I/O Cores). X-axis: Queue Depth (1, 2, 4, 8, 16, 32, 64, 128). Axes: IOPS (millions), higher is better; Avg. Latency (usec), lower is better. Series: SPDK Fio Bdev, Kernel Libaio and Kernel IO Uring, each plotted as IOPS and as average latency.]

SPDK BDEV delivers up to 2.9x and 5.8x more IOPS/Core than io_uring and libaio, respectively.
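The trade-off between IOPS and average latency at a given queue depth can be approximated with Little's Law (outstanding I/Os = throughput × latency). The sketch below assumes a steady-state, closed-loop workload such as an fio run after ramp-up; the numbers in the example are illustrative and are not read from the chart.

```python
def expected_latency_usec(outstanding_ios, iops):
    """Average latency implied by Little's Law, in microseconds."""
    return outstanding_ios / iops * 1e6


def expected_iops(outstanding_ios, latency_usec):
    """Throughput implied by Little's Law for a given average latency."""
    return outstanding_ios / (latency_usec * 1e-6)


# Illustrative only: 128 I/Os in flight in total, sustaining 1 million IOPS.
latency = expected_latency_usec(128, 1_000_000)
print(f"average latency: {latency:.1f} usec")
print(f"IOPS at that latency: {expected_iops(128, latency):,.0f}")
```

Once a device or core is saturated, raising the queue depth further no longer adds IOPS, so by the same relation the extra outstanding I/Os show up purely as added latency.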

Slide 12. See configuration details – Slide 33

• Intel Server System R2224WFTZS
• 2 x Intel® Xeon® Gold 6230N Processor (2.30 GHz, 20 cores per socket)
• 384 GB 2933MHz DDR4 RAM
• 24 x Intel® SSD DC P4610 1.6TB NVMe

Why Benchmark each release?

• Measure performance on new HW (SSDs, CPUs, NICs)
• Measure performance after SW optimizations
• Validate performance impact of new features


Block Storage over Ethernet

Focus areas:
• SPDK and Linux NVMe-oF Performance
• Interoperability of SPDK and Kernel components
• RDMA & TCP Transports
• Interoperability performance testing

Test Cases:
1. SPDK NVMe-oF target I/O core scalability
2. SPDK NVMe-oF initiator I/O core scalability
3. Latency
4. Performance with increasing number of connections

Automation scripts: spdk/scripts/perf/nvmf/run_nvmf.py

[Diagram: NVMe-oF test setup]
• Hosts: two NVMe-oF Initiators (SPDK/Linux Kernel); many CPU cores; benchmark tool: fio
• Links: QSFP28 cables (direct connection) from each initiator to the target
• Target: NVMe-oF Target (SPDK/Linux Kernel); storage and network
• Target NICs: 100GbE NIC1 (CPU Socket 0), 100GbE NIC2 (CPU Socket 1)
• Target PCIe: PCIe Switch 1 (CPU Socket 0), PCIe Switch 2 (CPU Socket 1)
• Target storage: 8x Intel P4610 SSDs behind each PCIe switch

Over 100 Gbps

8 target CPU cores saturate 100 Gbps with 4 KB Random Read.
Data from SPDK NVMe-oF TCP 21.01 Performance Report
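As a rough cross-check of what saturating 100 Gbps with 4 KB reads implies, the sketch below converts link speed into the IOPS needed to fill it with payload. It assumes 4 KB means 4096 bytes and ignores Ethernet, TCP/IP and NVMe-oF header overhead, so the IOPS actually achievable on the wire are somewhat lower. The names are illustrative.

```python
LINK_GBPS = 100     # from the slide: 100 Gbps of traffic
BLOCK_SIZE = 4096   # assumption: "4KB" taken as 4096 bytes
TARGET_CORES = 8    # from the slide: 8 target CPU cores


def iops_to_fill_link(link_gbps, block_size):
    """IOPS needed to fill the link with payload only (no protocol overhead)."""
    bytes_per_second = link_gbps * 1e9 / 8
    return bytes_per_second / block_size


total_iops = iops_to_fill_link(LINK_GBPS, BLOCK_SIZE)
per_core_iops = total_iops / TARGET_CORES
print(f"total IOPS to fill the link: {total_iops:,.0f}")
print(f"IOPS per target core:        {per_core_iops:,.0f}")
```

Under these assumptions the target needs to serve about 3 million IOPS in total, or a little under 400 thousand IOPS per core.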

Slide 18. See configuration details – Slide 33

SPDK relative efficiency is up to 2x better with an increasing number of connections.
Data from SPDK NVMe-oF TCP 21.01 Performance Report

Slide 19. See configuration details – Slide 33

SPDK NVMe-oF TCP Target Core Scaling, 128k Read Workload

[Chart: Bandwidth (Gbps, higher is better), 0 to 180, vs. # of CPU cores (1, 4, 8). Series: SPDK NVMe-oF TCP 20.01, SPDK NVMe-oF TCP 20.04.]

MSG_ZEROCOPY doubled the performance of a single-CPU SPDK Target process.
Data from SPDK NVMe-oF TCP 20.01 and 20.04 performance reports.

Slide 20. See configuration details – Slide 33

• Hardware NUMA alignment
• BIOS & OS performance settings
• NIC IRQ Affinity settings
• TCP/IPv4 Linux Sysctl settings


Virtualized Storage

Focus areas: SPDK & Kernel Vhost Performance; VM Density; Optimizations

The test cases:
1. SPDK Vhost single core VM saturation
2. SPDK Vhost I/O Core Scalability
3. VM Density: SPDK & Kernel Vhost
4. Latency vs IOPS with increasing Queue Depth
5. Performance Tuning:
   ▪ Link Time Optimization
   ▪ Qemu Packed Rings

More on SPDK Vhost: https://spdk.io/doc/vhost.html

[Diagram: SPDK Vhost test setup, top to bottom]
• VMs (QEMU/KVM; up to 36 VMs): VM0, VM1, ..., VM(N-1), VM N
• Vhost-Scsi & Virtio-Blk: lvol0, lvol1, ..., lvol(N-1), lvolN
• SPDK Vhost
• NVMe Bdev & Logical Volumes: lvol0, lvol1, ..., lvol(n-1), lvoln
• Local NVMe SSDs: Nvme0n1 ... Nvme23n1

Up to 1.8 million IOPS on 1 CPU core. Linear scaling with the addition of I/O cores.
Data from SPDK Vhost 21.01 Performance Report

Slide 25. See configuration details – Slide 33

SPDK Vhost is able to serve the required I/O with a high number of VMs.
Data from SPDK Vhost 21.01 Performance Report

Slide 26. See configuration details – Slide 33

[Chart: title and axes not recoverable. Data labels: +1.8%, +5.4%, +7.6%, +6.2%, +5.4%.]

Data from SPDK Vhost 21.01 Performance Report

Slide 27. See configuration details – Slide 33

Benchmarking Tools
• Benchmark tool: fio in client-server mode
• Automation script: test/vhost/perf_bench/vhost_perf.sh

Optimizations
• Test optimizations:
  • NUMA alignment
  • Fio measurement options
  • Resource limiting()


Continuous Performance
• Run in SPDK Continuous Integration
• Uses the same scripts as the quarterly benchmark reports
• Currently covers Vhost, NVMe-oF TCP and NVMe-oF RDMA

See configuration details – Slide 33

• Performance & Power: Using dynamic scheduler to measure IOPS/Watt
• NVMe over vfio-user performance
• Container Storage performance
• Data Services Performance: Compress bdev, Crypto bdev

Q&A

Local Storage (Slides 8, 9, 12) & Virtualized Storage (Slides 25-27): Test by Intel as of 2/10/2021. 1-node, 2x Intel® Xeon® Gold 6230N Processor, 20 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.02.01.0013.121520200651 (ucode:0x4003003), Fedora 33, Linux Kernel 5.10.19-200, gcc 9.3.1 compiler, fio 3.19, SPDK 21.01, Storage: 24x Intel® SSD DC P4610 1.6TB.

Network Storage (Slides 18-20): Test by Intel as of 2/10/2021.
Target Node: 1-node, 2x Intel® Xeon® Gold 6230 Processor, 20 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: 3.4 (ucode:0x5003003), Fedora 33, Linux Kernel 5.8.15-300, gcc 9.3.1 compiler, fio 3.19, SPDK 21.01, Storage: 16x Intel® SSD DC P4610 1.6TB, Network: 2x 100 GbE Mellanox ConnectX-5.
Host Nodes: 2-nodes, 2x Intel® Xeon® Gold 6252 Processor, 24 cores HT On Turbo ON Total Memory 192 GB (6 slots/ 32GB/ 2933 MHz), BIOS: 3.4 (ucode:0x5003003), Fedora 33, Linux Kernel 5.8.15-300, gcc 9.3.1 compiler, fio 3.19, SPDK 21.01, Network: 1x 100 GbE Mellanox ConnectX-5.

Slide 33 (configuration details above).

• Automated metric collection with scripts/perf/nvmf/run_nvmf.py
• SAR: CPU utilization measurement on Target side
• Bwm-ng: measures bandwidth utilization on network interfaces
• PCM: measurements on Target side include CPU, memory and power consumption

TCP/IPv4 Linux Sysctl settings:

    net.core.somaxconn = 4096
    net.core.netdev_max_backlog = 8192
    net.ipv4.tcp_max_syn_backlog = 16384
    net.core.rmem_max = 268435456
    net.core.wmem_max = 268435456
    net.ipv4.tcp_mem = 268435456 268435456 268435456
    net.ipv4.tcp_rmem = 8192 1048576 33554432
    net.ipv4.tcp_wmem = 8192 1048576 33554432
    net.ipv4.route.flush = 1
    vm.overcommit_memory = 1

NIC IRQ Affinity:

    [sys_sgci@localhost ofed_scripts]$ sudo ./set_irq_affinity.sh ens49f1np1 ens49f0np0
    ------
    Optimizing IRQs for Dual port traffic
    ------
    Discovered irqs for ens49f1np1: 645 646 647 648 649 650 651 [...]
    Assign irq 645 core_id 0
    Assign irq 646 to its affinity_hint 0000,00000000,00100000
    Assign irq 647 to its affinity_hint 0000,00000000,00200000
    Assign irq 648 to its affinity_hint 0000,00000000,00400000
    […]
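When reproducing this setup it helps to keep the sysctl settings as data and compare them with the running system. The sketch below parses "key = value" lines such as those listed on this slide and maps each key to its /proc/sys path. The helper names parse_sysctl and proc_path are hypothetical, only three of the slide's ten settings are included for brevity, and the comparison against a live system is left out because it depends on the host.

```python
from pathlib import PurePosixPath

# A subset of the settings listed on the slide.
SETTINGS = """\
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
net.ipv4.tcp_rmem = 8192 1048576 33554432
"""


def parse_sysctl(text):
    """Parse 'key = value' lines into {key: [value fields]}."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = value.split()
    return settings


def proc_path(key):
    """Map a dotted sysctl key to its file under /proc/sys."""
    return str(PurePosixPath("/proc/sys").joinpath(*key.split(".")))


for key, fields in parse_sysctl(SETTINGS).items():
    print(f"{proc_path(key)} -> {' '.join(fields)}")
```

On a Linux host, reading each printed path and comparing its whitespace-split contents with the parsed fields would show which settings still differ from the tuned configuration.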

Target system:
• 2 x Intel® Xeon® Gold 6230 Processor (2.10 GHz, 20 cores per socket)
• 16 x Intel® SSD DC P4610 1.6TB NVMe
• 2 x Mellanox ConnectX-5 100Gb

Initiator systems:
• 2 x Intel® Xeon® Gold 6252 Processor
• 1 x Mellanox ConnectX-5 100Gb

Our Test system:
• Intel Server System R2224WFTZS
• 2 x Intel® Xeon® Gold 6230N Processor (2.30 GHz, 20 cores per socket)
• 384 GB 2933MHz DDR4 RAM
• 24 x Intel® SSD DC P4610 1.6TB NVMe

VMs:
• CPU Cores: 2 vCPU/VM, up to 36
• Memory: 4 GB/VM, up to 72 VMs in density test case
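A quick budget check shows how the per-VM sizing relates to the host. The sketch below uses the per-VM figures from this slide (2 vCPU and 4 GB per VM) and the host figures from the configuration details (384 GB RAM, 2 sockets x 20 cores, HT On). Whether vCPUs were pinned or oversubscribed in the density test is not stated in the slides, so the comparison against hardware threads is only indicative; vm_footprint is an illustrative name.

```python
HOST_MEMORY_GB = 384          # from the slides
HOST_THREADS = 2 * 20 * 2     # 2 sockets x 20 cores, HT On


def vm_footprint(num_vms, vcpus_per_vm=2, mem_gb_per_vm=4):
    """Total vCPUs and guest memory requested by num_vms identical VMs."""
    return {
        "vcpus": num_vms * vcpus_per_vm,
        "memory_gb": num_vms * mem_gb_per_vm,
    }


for num_vms in (36, 72):
    fp = vm_footprint(num_vms)
    print(
        f"{num_vms} VMs: {fp['vcpus']} vCPUs "
        f"(host threads: {HOST_THREADS}), "
        f"{fp['memory_gb']} GB of {HOST_MEMORY_GB} GB"
    )
```

At 72 VMs the guest memory (288 GB) still fits within the 384 GB host, while the requested vCPUs exceed the host's hardware threads.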


Intel technologies may require enabled hardware, software or service activation. Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure. Your costs and results may vary. © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
