White Paper

Accelerating Network Virtualization Overlays

Marvell® FastLinQ™ Assists and Offloads for Linux KVM/XEN, Windows Hyper-V, and VMware® GENEVE/NVGRE/VXLAN

Generic Network Virtualization Encapsulation (GENEVE), Network Virtualization using Generic Routing Encapsulation (NVGRE), and Virtual Extensible LAN (VXLAN) tunneling overlay network technologies for Linux KVM/XEN, Windows Hyper-V, and VMware® address key network scalability challenges associated with hyperscale cloud infrastructure.

Marvell® FastLinQ® 3400/8400 (578xx)/41000/45000 Series 10GbE/25GbE/40GbE/50GbE/100GbE Adapters offer the most complete portfolio of overlay network offloads.

Marvell tunneling offloads enhance Linux KVM/XEN, Windows Hyper-V, and VMware ESXi to scale network performance.

CHALLENGES IN VIRTUALIZED CLOUD-SCALE NETWORKS

In today's cloud-scale networks, multiple organizations share the same physical infrastructure. Utilizing common processing and networking resources on an as-needed basis has become a standard business practice. Some cloud networks support implementations with dedicated physical servers for each customer, while other cloud network implementations support virtual servers per customer (on a common physical server).

A single network environment that hosts multiple customers (tenants) allows the customers to reduce up-front costs for processing or networking resources, yet provides them with the flexibility to increase or reduce those resources as needed. Such multi-tenant environments are increasingly adopting these new architectures due to the advantages of server virtualization. However, there are some challenges to overcome in order to deliver a truly scalable, elastic, and secure virtualized cloud-scale infrastructure.

KEY REQUIREMENTS FOR VIRTUALIZED CLOUD-SCALE NETWORKS

A virtualized, multi-tenant environment must allow the unlimited, transparent migration of workloads across physical servers, while controlling cost and maintaining the Quality of Service (QoS) the customer requires. Most importantly, virtualized data centers need the flexibility of provisioning resources that span multiple geographic locations. At the same time, virtualized data centers must maintain isolation between tenants and still allow seamless management of the multi-tenant environment.

Virtualized cloud networks must also accomplish the following:

• Handle MAC address growth in conjunction with the explosive growth of VMs in the cloud data center.

• Accommodate an increasingly large number of VLANs to handle VM traffic segregation in a VLAN "sprawl" situation.

• Provide isolation of the physical L2 network, so that each virtual tenant network has the illusion of being on its own physical network, without impacting network performance and scalability.


Figure 1 highlights some of the benefits of a virtualized cloud-scale network as well as the shortcomings this environment commonly faces. The benefits are outlined in green (+), and the shortcomings are outlined in red (-).

Figure 1. Benefits and Shortcomings of Virtualization in a Multi-Tenant Cloud Environment (benefits: Elastic (+), Extensible (+), Lower Cloud Cost (+), Scalable (+); shortcomings: Seamless Extension with VLANs/IP Subnets (-), Isolation (-), Logical Addressing Scheme for Tenants (-), Workload Mobility (-))

SOLUTIONS FOR VIRTUALIZED CLOUD-SCALE NETWORKS

To provide workload mobility and migration across geographic locations, one cloud network solution is to decouple the physical and logical addressing schemes. The tenant uses the logical address while the network infrastructure sees the physical address. This decoupling enables the flexibility required by the virtualized cloud data center for creating a faster, fatter, and flatter network.

SCALING CLOUD NETWORKS WITH LAYER 2 OVER LAYER 3 TUNNELING

GENEVE/NVGRE/VXLAN are used specifically to allow non-geographically located server farms (that are not on the same IP subnet) to transparently do VM migrations, and to allow tenant VM users to seamlessly interact with other pre-defined workloads, regardless of the geographic/IP subnet location of the workloads.

Scaling the cloud network with GENEVE/NVGRE/VXLAN is the first step toward enabling logical, software-based networks that can be created on demand, allowing enterprises to leverage capacity wherever it is available. In other words, GENEVE/NVGRE/VXLAN can now help companies build true global clouds that are the sum of their parts, rather than distinct sets of parts. A true global cloud essentially decouples the physical network design from the logical network design. GENEVE/NVGRE/VXLAN accomplish these goals by using an overlay of tunnels, where layer 2 Ethernet frames are encapsulated (or tunneled) within layer 3 packets.
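To make the encapsulation concrete, the following minimal Python sketch wraps an inner layer 2 frame in the 8-byte VXLAN header defined by RFC 7348 and sends it over UDP port 4789, letting the host stack add the outer layer 3 headers. The inner frame contents, the VNI value, and the 192.0.2.10 endpoint are illustrative placeholders, not values from any Marvell product.

```python
# Minimal sketch of VXLAN encapsulation (RFC 7348). The inner frame bytes,
# the VNI, and the remote endpoint address are illustrative placeholders.
import socket
import struct

VXLAN_PORT = 4789            # IANA-assigned VXLAN UDP port
VXLAN_FLAG_VNI = 0x08000000  # "I" flag: the VNI field is valid

def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Prefix an inner layer 2 Ethernet frame with the 8-byte VXLAN header:
    8 flag bits + 24 reserved bits, then a 24-bit VNI + 8 reserved bits."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI, vni << 8) + inner_frame

# Tunnel a placeholder inner frame to a remote VXLAN tunnel endpoint (VTEP).
# Sending over a UDP socket lets the host stack add the outer IP/UDP headers;
# a tunnel-aware NIC can then apply stateless offloads to the inner frame too.
inner = bytes(64)                      # stand-in 64-byte layer 2 frame
packet = vxlan_encap(inner, vni=5001)  # example 24-bit segment ID
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(packet, ("192.0.2.10", VXLAN_PORT))
```

GENEVE and NVGRE differ in framing (GENEVE adds variable-length options, while NVGRE rides on GRE rather than UDP), but all three place a 24-bit segment identifier in front of the tenant's original frame.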

Technically, this is achieved by deploying the GENEVE/NVGRE/VXLAN tunneling protocol in the vSwitch (Linux KVM/XEN, Windows Hyper-V, or VMware ESXi) together with any Ethernet adapter. However, choosing a non-fully-featured Ethernet adapter would compromise the performance and flexibility of the cloud infrastructure. GENEVE tunneling technology is supported on VMware 6.7 and later. VXLAN is supported on Linux KVM/XEN, Windows Server 2016 and later Hyper-V, and VMware ESXi 5.1/6.0 and later.

CHALLENGES OF TUNNELING TECHNOLOGY AND MARVELL SOLUTIONS

The main drawback of layer 2 over layer 3 tunneled traffic is that encapsulated traffic bypasses the normal stateless offload features of a GENEVE/NVGRE/VXLAN-unaware adapter, for the following reasons (a toy model after this list illustrates the resulting bottleneck):

• CPUs moving the packets individually (not as a block of data) to the send queue.

• CPUs calculating each sent/received packet's checksum value.

• A single CPU core (normally core 0) handling all GENEVE/NVGRE/VXLAN tunneled traffic of all ports.
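The single-core symptom above follows from how receive-side scaling (RSS) picks a queue: a tunnel-unaware NIC hashes only the outer headers, and on stacks that do not randomize the outer UDP source port, every tunneled flow between two tunnel endpoints shares one outer 4-tuple and therefore one queue. The toy model below illustrates this with fabricated flows and a stand-in CRC32 hash; real adapters use a keyed Toeplitz hash.

```python
# Toy model of RSS queue selection, with fabricated flows and a stand-in
# CRC32 hash (real NICs use a keyed Toeplitz hash over the packet headers).
import zlib

NUM_QUEUES = 8  # assume an 8-queue, 8-core receive path

def rss_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % NUM_QUEUES

# Ten tenant flows tunneled between the same two tunnel endpoints: the outer
# 4-tuple a tunnel-unaware NIC hashes is identical for every flow.
outer = [("10.0.0.1", "10.0.0.2", 4789, 4789)] * 10
# The same ten flows as a tunnel-aware NIC sees them: hashed on inner headers.
inner = [("172.16.0.1", "172.16.0.2", 49152 + i, 80) for i in range(10)]

print({rss_queue(*f) for f in outer})  # one queue -> one saturated core
print({rss_queue(*f) for f in inner})  # several queues -> work spread out
```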

With GENEVE/NVGRE/VXLAN-aware Marvell FastLinQ Adapters, the stateless offloads become available in both the transmit and receive directions, which helps resolve all the challenges outlined above and greatly increases tunneled traffic throughput with lower overall CPU utilization.

The Marvell FastLinQ Adapters' built-in assists and offloads for tunneled traffic help network architects overcome the negatives (shown in Figure 1) in a virtualized private cloud environment and improve performance via tunneled traffic offloading.

Marvell adapters, with assists and offloads for tunneled traffic technology, enable the necessary QoS for multiple VMs in a multi-core environment by removing single-queue restrictions and enabling multiple hardware queues for multiple VMs.


For the VMware ESXi Hypervisor, Marvell FastLinQ 3400/8400 (578xx) Series Adapter acceleration is enabled by UDP RSS packet task offloading support in ESXi 5.1, and by UDP/TCP RSS, TCP Segmentation Offload (TSO), NetQueue, and TX Checksum Offload packet task offloading support available in ESXi 5.5/6.0/6.5/6.7 and later. Marvell FastLinQ 41000/45000 Series Adapter acceleration is enabled through UDP/TCP RSS, TSO, NetQueue, Large Receive Offload (LRO), and TX/RX Checksum Offload packet task offloading on ESXi 6.0/6.5/6.7 and later.

The Marvell FastLinQ Adapters' assists and offloads for tunneled traffic enable efficient distribution of network transmit and receive processing for GENEVE/NVGRE/VXLAN traffic across servers with multiple CPU cores. With Marvell Server Network Virtualization Overlay acceleration, these FastLinQ adapters provide the ability to distribute workloads efficiently across all processor cores.

Table 1. Adapter Assists and Offloads for Tunneled Traffic (Linux KVM/XEN, Windows Hyper-V, and VMware ESXi)

GENEVE/NVGRE/VXLAN Tunneled Traffic Stateless Offload | Standard Adapter without Acceleration | 3400/8400 (578xx) Series Adapter | 41000/45000 Series Adapter | Benefit of Acceleration
Large Segmentation Offload (LSO) | — | ✓ | ✓ | Offloading reduces interrupts for large transfer sizes, which saves CPU cycles
TX Checksum Offload (CSO) | — | ✓ | ✓ | Offloading calculations saves CPU cycles
RX Checksum Offload (CSO) | — | ✓ | ✓ | Offloading calculations saves CPU cycles
RSS Queue Acceleration | — | ✓ | ✓ | Spreads workload over multiple CPU cores in the host, avoiding single-core bottlenecks
NetQueue/VMQ/MQ Acceleration | — | ✓ | ✓ | Spreads workload over multiple CPU cores in the VM, avoiding single-core bottlenecks
Large Receive Offload (LRO) with HW Transparent Packet Aggregation (TPA) | — | ✓ | ✓ | Offloading reduces interrupts for large transfer sizes, which saves CPU cycles
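On a Linux KVM/XEN host, the offloads in Table 1 surface as `ethtool` features. The sketch below, which assumes a Linux system with `ethtool` installed and uses the placeholder interface name eth0, reports the state of the tunnel-related features; exact feature names vary by kernel and driver version.

```python
# Sketch: report tunnel-related offload state on a Linux host via `ethtool -k`.
# Assumes ethtool is installed; "eth0" is a placeholder interface name, and
# feature names can vary across kernel and driver versions.
import subprocess

TUNNEL_FEATURES = (
    "tx-udp_tnl-segmentation",       # LSO/TSO of the inner frame of UDP tunnels
    "tx-udp_tnl-csum-segmentation",  # tunnel TSO with outer UDP checksum
    "tcp-segmentation-offload",      # plain TSO/LSO
    "large-receive-offload",         # LRO / hardware packet aggregation
    "rx-checksumming",               # RX checksum offload (CSO)
    "tx-checksumming",               # TX checksum offload (CSO)
)

def tunnel_offloads(iface: str = "eth0") -> dict:
    """Return {feature: "on" | "off"} for the tunnel-related offloads."""
    out = subprocess.run(["ethtool", "-k", iface],
                         capture_output=True, text=True, check=True).stdout
    state = {}
    for line in out.splitlines():
        name, _, value = line.strip().partition(": ")
        if name in TUNNEL_FEATURES:
            state[name] = value.split()[0]  # drop "[fixed]" annotations
    return state

if __name__ == "__main__":
    for feature, value in tunnel_offloads().items():
        print(f"{feature}: {value}")
```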

PERFORMANCE METRICS OF VXLAN ESXI 5.5 TCP/UDP RSS AND TSO/TX CSO ACCELERATION

Marvell performed benchmark tests between two identical x86 servers running VMware ESXi 5.5 set up for VXLAN encapsulation. The test drove network I/O using the industry-standard IxChariot application from multiple Windows virtual machines. Various scenarios and transfer sizes were tested. For detailed steps on ESXi 6.0 configuration, refer to the Marvell VXLAN Deployment Guide.

The Marvell FastLinQ 10Gb 3400/8400 Series Adapters delivered 126% of the throughput of an ESXi 5.5 configuration without any acceleration for receive VXLAN traffic, and 132% for bidirectional VXLAN traffic (see Figure 2 and Figure 3).

Figure 2. 3400/8400 (578xx) Series Adapter VMware ESXi 5.5 VXLAN RX Traffic Performance (RX throughput in Gbps with and without acceleration across 1K to 64K transfer sizes; up to 126% bandwidth advantage)

Table 2 shows a summary of the benchmark test results.

Table 2. VMware ESXi Tunneling Offload Test Results

Traffic | Without VXLAN Acceleration | With VXLAN Acceleration | Performance Advantage
10GbE Receive (RX) Only | 6.70 Gbps | 8.43 Gbps | 126% (Figure 2)
10GbE Bi-Directional (BiDir) Only | 13.1 Gbps | 17.3 Gbps | 132% (Figure 3)
25GbE Bi-Directional (BiDir) Only | 38 Gbps | 45.7 Gbps | 120% (Figure 4)
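The Performance Advantage column is simply accelerated throughput expressed as a percentage of the unaccelerated baseline, which can be verified from the throughput pairs in Table 2:

```python
# Recompute Table 2's "Performance Advantage" as accelerated throughput
# divided by the unaccelerated baseline, expressed as a percentage.
results = {
    "10GbE RX only":    (6.70, 8.43),  # (without, with) acceleration, Gbps
    "10GbE BiDir only": (13.1, 17.3),
    "25GbE BiDir only": (38.0, 45.7),
}
for name, (baseline, accelerated) in results.items():
    print(f"{name}: {100 * accelerated / baseline:.0f}%")  # 126%, 132%, 120%
```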


Figure 3. 3400/8400 (578xx) Series Adapter VMware ESXi 5.5 VXLAN Bi-Directional Traffic Performance (BiDir throughput in Gbps with and without acceleration across 1K to 64K transfer sizes; up to 132% bandwidth advantage)

The Marvell FastLinQ 25Gb QL452xx Series Adapters delivered 120% of the bandwidth and 164% of the CPU efficiency (Gbps/CPU utilization) of an ESXi 6.0 configuration without any acceleration for bidirectional VXLAN traffic (see Figure 4).

Figure 4. 45000 Series Adapter VMware ESXi 6.0 VXLAN Bi-Directional Traffic Performance (QL45212 offloaded vs. non-offloaded throughput in Gbps across 1K to 64K transfer sizes; >120% bandwidth and >164% Gbps/CPU advantage)

BENEFITS OF ACCELERATING NETWORK VIRTUALIZATION OVERLAYS

Marvell 3400/8400 (578xx)/41000/45000 Series Adapters, available as low-profile single-, dual-, and quad-port 10GbE/25GbE/40GbE/50GbE/100GbE PCIe/OCP adapters for rack and tower servers with support for L2 networking, Universal RDMA Offload, iSCSI-Offload, and FCoE-Offload, combine the benefits of GENEVE/NVGRE/VXLAN overlays with stateless offload acceleration:

• Marvell Adapters with tunneling acceleration can be deployed on an existing GENEVE/NVGRE/VXLAN infrastructure, which reduces Capital Expenditure (CAPEX) costs and increases cloud-scale network performance.

• Cloud networks can scale up the number of VMs deployed on their servers using Marvell FastLinQ Adapter tunneling acceleration, which increases the number of tenants on the same physical infrastructure and boosts the Return On Investment (ROI) of GENEVE/NVGRE/VXLAN deployments.

• Cloud network administrators using Marvell Adapter tunneling acceleration can provision additional bandwidth for resource-intensive applications or over-provision VMs with bandwidth for high-peak scenarios.

• GENEVE/NVGRE/VXLAN and Marvell Adapters support a larger number of VLANs (16 million) over a wider, non-geographic network, allowing tenant VMs to be deployed anywhere (see the arithmetic sketch after this list).

• GENEVE/NVGRE/VXLAN and Marvell Adapters enable network administrators to create or migrate VMs dynamically over geographically separated locations, increasing flexibility.
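The 16 million figure cited in the list comes from the width of the overlay segment identifier: GENEVE and VXLAN carry a 24-bit VNI (NVGRE a 24-bit VSID), compared with the 12-bit IEEE 802.1Q VLAN ID:

```python
# Overlay segment-ID space versus classic VLANs: the 24-bit VNI/VSID is the
# source of the "16 million" figure above.
print(f"802.1Q VLAN IDs:   {2 ** 12:>10,}")  # 4,096 (12-bit VLAN tag)
print(f"GENEVE/VXLAN VNIs: {2 ** 24:>10,}")  # 16,777,216 (24-bit identifier)
```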

ABOUT MARVELL: Marvell first revolutionized the digital storage industry by moving information at speeds never thought possible. Today, that same breakthrough innovation remains at the heart of the company’s storage, processing, networking, security and connectivity solutions. With leading intellectual property and deep system-level knowledge, Marvell semiconductor solutions continue to transform the enterprise, cloud, automotive, industrial, and consumer markets. For more information, visit www.marvell.com.

Copyright © 2019 Marvell. All rights reserved. Marvell and the Marvell logo are registered trademarks of Marvell. All other trademarks are the property of their respective owners.
