White Paper

Accelerating Network Virtualization Overlays

Marvell® FastLinQ™ Assists and Offloads for Linux KVM/XEN, Windows Hyper-V, and VMware® GENEVE/NVGRE/VXLAN

Generic Network Virtualization Encapsulation (GENEVE), Network Virtualization using Generic Routing Encapsulation (NVGRE), and Virtual Extensible LAN (VXLAN) tunneling overlay network technologies for Linux KVM/XEN, Windows Hyper-V, and VMware® address key network scalability challenges associated with hyperscale cloud infrastructure.

Marvell® FastLinQ® 3400/8400 (578xx)/41000/45000 Series 10GbE/25GbE/40GbE/50GbE/100GbE Adapters offer the most complete portfolio of overlay network offloads.

Marvell tunneling offloads enhance Linux KVM/XEN, Windows Hyper-V, and VMware ESXi to scale network performance.

CHALLENGES IN VIRTUALIZED CLOUD-SCALE NETWORKS

In today's cloud-scale networks, multiple organizations share the same physical infrastructure. Utilizing common processing and networking resources on an as-needed basis has become a standard business practice. Some cloud networks support implementations with dedicated physical servers for each customer, while other cloud network implementations support virtual servers per customer (on a common physical server).

A single network environment that hosts multiple customers (tenants) allows the customers to reduce up-front costs for processing or networking resources, yet provides them with the flexibility to increase or reduce those resources as needed. Such multi-tenant environments are increasingly adopting these new architectures due to the advantages of server virtualization. However, there are some challenges to overcome in order to deliver a truly scalable, elastic, and secure virtualized cloud-scale infrastructure.

KEY REQUIREMENTS FOR VIRTUALIZED CLOUD-SCALE NETWORKS

A virtualized, multi-tenant environment must allow the unlimited, transparent migration of workloads across physical servers, while controlling cost and maintaining the Quality of Service (QoS) the customer requires. Most importantly, virtualized data centers need the flexibility of provisioning resources that span multiple geographic locations. At the same time, virtualized data centers must maintain isolation between tenants and still allow seamless management of the multi-tenant environment.

Virtualized cloud networks must also accomplish the following:

• Handle MAC address growth in conjunction with the explosive growth of VMs in the cloud data center.

• Accommodate an increasingly large number of VLANs to handle VM traffic segregation in a VLAN "sprawl" situation.

• Provide isolation of the physical L2 network, so that each virtual tenant network has the illusion of being on its own physical network, without impacting network performance and scalability.


Figure 1 highlights some of the benefits of a virtualized cloud-scale network as well as the shortcomings this environment commonly faces. The benefits are outlined in green (+), and the shortcomings are outlined in red (-).

Figure 1. Benefits and Shortcomings of Virtualization in a Multi-Tenant Cloud Environment (benefits: Elastic (+), Extensible (+), Lower Cloud Cost (+), Scalable (+); shortcomings: Seamless Extension with VLANs/IP Subnets (-), Isolation (-), Logical Addressing Scheme for Tenants (-), Workload Mobility (-))

SOLUTIONS FOR VIRTUALIZED CLOUD-SCALE NETWORKS

To provide workload mobility and migration across geographic locations, one cloud network solution is to decouple the physical and logical addressing schemes. The tenant uses the logical address while the network infrastructure sees the physical address. This decoupling enables the flexibility required by the virtualized cloud data center for creating a faster, fatter, and flatter network.

SCALING CLOUD NETWORKS WITH LAYER 2 OVER LAYER 3 TUNNELING

GENEVE/NVGRE/VXLAN are used specifically to allow non-geographically located server farms (that are not on the same IP subnet) to transparently do VM migrations, and to allow tenant VM users to seamlessly interact with other pre-defined workloads, regardless of the geographic/IP subnet location of the workloads.

Scaling the cloud network with GENEVE/NVGRE/VXLAN is the first step toward enabling logical, software-based networks that can be created on demand, allowing enterprises to leverage capacity wherever it is available. In other words, GENEVE/NVGRE/VXLAN can now help companies build true global clouds that are the sum of their parts, rather than distinct sets of parts. A true global cloud essentially decouples the physical network design from the logical network design. GENEVE/NVGRE/VXLAN accomplish these goals by using an overlay of tunnels, where layer 2 Ethernet frames are encapsulated (or tunneled) within layer 3 packets.
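To make the encapsulation concrete, the following minimal Python sketch wraps an inner layer 2 frame in the 8-byte VXLAN header defined by RFC 7348 and sends it over UDP port 4789, letting the host stack add the outer layer 3 headers. The inner frame contents, the VNI value, and the 192.0.2.10 endpoint are illustrative placeholders, not values from any Marvell product.

```python
# Minimal sketch of VXLAN encapsulation (RFC 7348). The inner frame bytes,
# the VNI, and the remote endpoint address are illustrative placeholders.
import socket
import struct

VXLAN_PORT = 4789            # IANA-assigned VXLAN UDP port
VXLAN_FLAG_VNI = 0x08000000  # "I" flag: the VNI field is valid

def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Prefix an inner layer 2 Ethernet frame with the 8-byte VXLAN header:
    8 flag bits + 24 reserved bits, then a 24-bit VNI + 8 reserved bits."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI, vni << 8) + inner_frame

# Tunnel a placeholder inner frame to a remote VXLAN tunnel endpoint (VTEP).
# Sending over a UDP socket lets the host stack add the outer IP/UDP headers;
# a tunnel-aware NIC can then apply stateless offloads to the inner frame too.
inner = bytes(64)                      # stand-in 64-byte layer 2 frame
packet = vxlan_encap(inner, vni=5001)  # example 24-bit segment ID
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(packet, ("192.0.2.10", VXLAN_PORT))
```

GENEVE and NVGRE differ in framing (GENEVE adds variable-length options, while NVGRE rides on GRE rather than UDP), but all three place a 24-bit segment identifier in front of the tenant's original frame.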

Technically, this is achieved by deploying the GENEVE/NVGRE/VXLAN tunneling protocol in the vSwitch (Linux KVM/XEN, Windows Hyper-V, or VMware ESXi) together with any Ethernet adapter. However, choosing a non-fully-featured Ethernet adapter would compromise the performance and flexibility of the cloud infrastructure. GENEVE tunneling technology is supported on VMware 6.7 and later. VXLAN is supported on Linux KVM/XEN, Windows Server 2016 and later Hyper-V, and VMware ESXi 5.1/6.0 and later.

CHALLENGES OF TUNNELING TECHNOLOGY AND MARVELL SOLUTIONS

The main drawback of layer 2 over layer 3 tunneled traffic is that encapsulated traffic bypasses the normal stateless offload features of a GENEVE/NVGRE/VXLAN-unaware adapter, for the following reasons (a toy model after this list illustrates the resulting bottleneck):

• CPUs moving the packets individually (not as a block of data) to the send queue.

• CPUs calculating each sent/received packet's checksum value.

• A single CPU core (normally core 0) handling all GENEVE/NVGRE/VXLAN tunneled traffic of all ports.
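The single-core symptom above follows from how receive-side scaling (RSS) picks a queue: a tunnel-unaware NIC hashes only the outer headers, and on stacks that do not randomize the outer UDP source port, every tunneled flow between two tunnel endpoints shares one outer 4-tuple and therefore one queue. The toy model below illustrates this with fabricated flows and a stand-in CRC32 hash; real adapters use a keyed Toeplitz hash.

```python
# Toy model of RSS queue selection, with fabricated flows and a stand-in
# CRC32 hash (real NICs use a keyed Toeplitz hash over the packet headers).
import zlib

NUM_QUEUES = 8  # assume an 8-queue, 8-core receive path

def rss_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % NUM_QUEUES

# Ten tenant flows tunneled between the same two tunnel endpoints: the outer
# 4-tuple a tunnel-unaware NIC hashes is identical for every flow.
outer = [("10.0.0.1", "10.0.0.2", 4789, 4789)] * 10
# The same ten flows as a tunnel-aware NIC sees them: hashed on inner headers.
inner = [("172.16.0.1", "172.16.0.2", 49152 + i, 80) for i in range(10)]

print({rss_queue(*f) for f in outer})  # one queue -> one saturated core
print({rss_queue(*f) for f in inner})  # several queues -> work spread out
```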

With GENEVE/NVGRE/VXLAN-aware Marvell FastLinQ Adapters, the stateless offloads become available in both the transmit and receive directions, which helps resolve all the challenges outlined above and greatly increases tunneled traffic throughput with lower overall CPU utilization.

The Marvell FastLinQ Adapters' built-in assists and offloads for tunneled traffic help network architects overcome the negatives (shown in Figure 1) in a virtualized private cloud environment and improve performance via tunneled traffic offloading.

Marvell adapters, with assists and offloads for tunneled traffic technology, enable the necessary QoS for multiple VMs in a multi-core environment by removing single-queue restrictions and enabling multiple hardware queues for multiple VMs.


For the VMware ESXi Hypervisor, Marvell FastLinQ 3400/8400 (578xx) Series Adapter acceleration is enabled by UDP RSS packet task offloading support in ESXi 5.1, and by UDP/TCP RSS, TCP Segmentation Offload (TSO), NetQueue, and TX Checksum Offload packet task offloading support available in ESXi 5.5/6.0/6.5/6.7 and later. Marvell FastLinQ 41000/45000 Series Adapter acceleration is enabled through UDP/TCP RSS, TSO, NetQueue, Large Receive Offload (LRO), and TX/RX Checksum Offload packet task offloading on ESXi 6.0/6.5/6.7 and later.

The Marvell FastLinQ Adapters' assists and offloads for tunneled traffic enable efficient distribution of network transmit and receive processing for GENEVE/NVGRE/VXLAN traffic across servers with multiple CPU cores. With Marvell Server Network Virtualization Overlay acceleration, these FastLinQ adapters provide the ability to distribute workloads efficiently across all processor cores.

Table 1. Adapter Assists and Offloads for Tunneled Traffic (Linux KVM/XEN, Windows Hyper-V, and VMware ESXi)

GENEVE/NVGRE/VXLAN Tunneled Traffic Stateless Offload | Standard Adapter without Acceleration | 3400/8400 (578xx) Series Adapter | 41000/45000 Series Adapter | Benefit of Acceleration
Large Segmentation Offload (LSO) | — | ✓ | ✓ | Offloading reduces interrupts for large transfer sizes, which saves CPU cycles
TX Checksum Offload (CSO) | — | ✓ | ✓ | Offloading calculations saves CPU cycles
RX Checksum Offload (CSO) | — | ✓ | ✓ | Offloading calculations saves CPU cycles
RSS Queue Acceleration | — | ✓ | ✓ | Spreads workload over multiple CPU cores in the host, avoiding single-core bottlenecks
NetQueue/VMQ/MQ Acceleration | — | ✓ | ✓ | Spreads workload over multiple CPU cores in the VM, avoiding single-core bottlenecks
Large Receive Offload (LRO) with HW Transparent Packet Aggregation (TPA) | — | ✓ | ✓ | Offloading reduces interrupts for large transfer sizes, which saves CPU cycles
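On a Linux KVM/XEN host, the offloads in Table 1 surface as `ethtool` features. The sketch below, which assumes a Linux system with `ethtool` installed and uses the placeholder interface name eth0, reports the state of the tunnel-related features; exact feature names vary by kernel and driver version.

```python
# Sketch: report tunnel-related offload state on a Linux host via `ethtool -k`.
# Assumes ethtool is installed; "eth0" is a placeholder interface name, and
# feature names can vary across kernel and driver versions.
import subprocess

TUNNEL_FEATURES = (
    "tx-udp_tnl-segmentation",       # LSO/TSO of the inner frame of UDP tunnels
    "tx-udp_tnl-csum-segmentation",  # tunnel TSO with outer UDP checksum
    "tcp-segmentation-offload",      # plain TSO/LSO
    "large-receive-offload",         # LRO / hardware packet aggregation
    "rx-checksumming",               # RX checksum offload (CSO)
    "tx-checksumming",               # TX checksum offload (CSO)
)

def tunnel_offloads(iface: str = "eth0") -> dict:
    """Return {feature: "on" | "off"} for the tunnel-related offloads."""
    out = subprocess.run(["ethtool", "-k", iface],
                         capture_output=True, text=True, check=True).stdout
    state = {}
    for line in out.splitlines():
        name, _, value = line.strip().partition(": ")
        if name in TUNNEL_FEATURES:
            state[name] = value.split()[0]  # drop "[fixed]" annotations
    return state

if __name__ == "__main__":
    for feature, value in tunnel_offloads().items():
        print(f"{feature}: {value}")
```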

PERFORMANCE METRICS OF VXLAN ESXI 5.5 TCP/UDP RSS AND TSO/TX CSO ACCELERATION

Marvell performed benchmark tests between two identical x86 servers running VMware ESXi 5.5 set up for VXLAN encapsulation. The test drove network I/O using the industry-standard IxChariot application from multiple Windows virtual machines. Various scenarios and transfer sizes were tested. For detailed steps on ESXi 6.0 configuration, refer to the Marvell VXLAN Deployment Guide.

The Marvell FastLinQ 10Gb 3400/8400 Series Adapters delivered 126% of the throughput of an ESXi 5.5 configuration without any acceleration for receive VXLAN traffic, and 132% for bidirectional VXLAN traffic (see Figure 2 and Figure 3).

Figure 2. 3400/8400 (578xx) Series Adapter VMware ESXi 5.5 VXLAN RX Traffic Performance (RX throughput in Gbps with and without acceleration across 1K to 64K transfer sizes; up to 126% bandwidth advantage)

Table 2 shows a summary of the benchmark test results.

Table 2. VMware ESXi Tunneling Offload Test Results

Traffic | Without VXLAN Acceleration | With VXLAN Acceleration | Performance Advantage
10GbE Receive (RX) Only | 6.70 Gbps | 8.43 Gbps | 126% (Figure 2)
10GbE Bi-Directional (BiDir) Only | 13.1 Gbps | 17.3 Gbps | 132% (Figure 3)
25GbE Bi-Directional (BiDir) Only | 38 Gbps | 45.7 Gbps | 120% (Figure 4)
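The Performance Advantage column is simply accelerated throughput expressed as a percentage of the unaccelerated baseline, which can be verified from the throughput pairs in Table 2:

```python
# Recompute Table 2's "Performance Advantage" as accelerated throughput
# divided by the unaccelerated baseline, expressed as a percentage.
results = {
    "10GbE RX only":    (6.70, 8.43),  # (without, with) acceleration, Gbps
    "10GbE BiDir only": (13.1, 17.3),
    "25GbE BiDir only": (38.0, 45.7),
}
for name, (baseline, accelerated) in results.items():
    print(f"{name}: {100 * accelerated / baseline:.0f}%")  # 126%, 132%, 120%
```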


Figure 3. 3400/8400 (578xx) Series Adapter VMware ESXi 5.5 VXLAN Bi-Directional Traffic Performance (BiDir throughput in Gbps with and without acceleration across 1K to 64K transfer sizes; up to 132% bandwidth advantage)

The Marvell FastLinQ 25Gb QL452xx Series Adapters delivered 120% of the bandwidth and 164% of the CPU efficiency (Gbps/CPU utilization) of an ESXi 6.0 configuration without any acceleration for bidirectional VXLAN traffic (see Figure 4).

Figure 4. 45000 Series Adapter VMware ESXi 6.0 VXLAN Bi-Directional Traffic Performance (QL45212 offloaded vs. non-offloaded throughput in Gbps across 1K to 64K transfer sizes; >120% bandwidth and >164% Gbps/CPU advantage)

BENEFITS OF ACCELERATING NETWORK VIRTUALIZATION OVERLAYS

Marvell 3400/8400 (578xx)/41000/45000 Series Adapters, available as low-profile single-, dual-, and quad-port 10GbE/25GbE/40GbE/50GbE/100GbE PCIe/OCP adapters for rack and tower servers with support for L2 networking, Universal RDMA Offload, iSCSI-Offload, and FCoE-Offload, combine the benefits of GENEVE/NVGRE/VXLAN overlays with stateless offload acceleration:

• Marvell Adapters with tunneling acceleration can be deployed on an existing GENEVE/NVGRE/VXLAN infrastructure, which reduces Capital Expenditure (CAPEX) costs and increases cloud-scale network performance.

• Cloud networks can scale up the number of VMs deployed on their servers using Marvell FastLinQ Adapter tunneling acceleration, which increases the number of tenants on the same physical infrastructure and boosts the Return On Investment (ROI) of GENEVE/NVGRE/VXLAN deployments.

• Cloud network administrators using Marvell Adapter tunneling acceleration can provision additional bandwidth for resource-intensive applications or over-provision VMs with bandwidth for high-peak scenarios.

• GENEVE/NVGRE/VXLAN and Marvell Adapters support a larger number of VLANs (16 million) over a wider, non-geographic network, allowing tenant VMs to be deployed anywhere (see the arithmetic sketch after this list).

• GENEVE/NVGRE/VXLAN and Marvell Adapters enable network administrators to create or migrate VMs dynamically over geographically separated locations, increasing flexibility.
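The 16 million figure cited in the list comes from the width of the overlay segment identifier: GENEVE and VXLAN carry a 24-bit VNI (NVGRE a 24-bit VSID), compared with the 12-bit IEEE 802.1Q VLAN ID:

```python
# Overlay segment-ID space versus classic VLANs: the 24-bit VNI/VSID is the
# source of the "16 million" figure above.
print(f"802.1Q VLAN IDs:   {2 ** 12:>10,}")  # 4,096 (12-bit VLAN tag)
print(f"GENEVE/VXLAN VNIs: {2 ** 24:>10,}")  # 16,777,216 (24-bit identifier)
```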

ABOUT MARVELL: Marvell first revolutionized the digital storage industry by moving information at speeds never thought possible. Today, that same breakthrough innovation remains at the heart of the company’s storage, processing, networking, security and connectivity solutions. With leading intellectual property and deep system-level knowledge, Marvell semiconductor solutions continue to transform the enterprise, cloud, automotive, industrial, and consumer markets. For more information, visit www.marvell.com.

Copyright © 2019 Marvell. All rights reserved. Marvell and the Marvell logo are registered trademarks of Marvell. All other trademarks are the property of their respective owners.
