Resilient and highly performant network architecture for virtualized data centers

Danilo Cerovic

To cite this version:

Danilo Cerovic. Resilient and highly performant network architecture for virtualized data centers. Networking and Internet Architecture [cs.NI]. Sorbonne Université, 2019. English. NNT: 2019SORUS478. tel-02931840.

HAL Id: tel-02931840 https://tel.archives-ouvertes.fr/tel-02931840 Submitted on 7 Sep 2020

THÈSE DE DOCTORAT DE LA SORBONNE UNIVERSITÉ

Speciality

Computer Science

École doctorale Informatique, Télécommunications et Électronique de Paris

Presented by

Danilo CEROVIĆ

To obtain the degree of DOCTOR of SORBONNE UNIVERSITÉ

Thesis subject: Resilient and highly performant network architecture for virtualized data centers (Architecture réseau résiliente et hautement performante pour les datacenters virtualisés)

Defended on 7 February 2019 before a jury composed of:

M. Raouf BOUTABA, Reviewer, Professor - University of Waterloo
M. Olivier FESTOR, Reviewer, Director - Télécom Nancy
M. Djamal ZEGHLACHE, Examiner, Professor - Télécom SudParis
M. Christian JACQUENET, Examiner, Referent Expert, Networks of the Future - Orange Labs
Mme. Michelle SIBILLA, Examiner, Professor - IRIT
M. Marcelo DIAS DE AMORIM, Examiner, Research Director - CNRS
M. Guy PUJOLLE, Thesis supervisor, Professor - Sorbonne Université
M. Kamel HADDADOU, Advisor, Research and Development Director - Gandi

To the memory of my father

Abstract

The amount of traffic in data centers is growing exponentially and it is not expected to stop growing any time soon. This brings about a vast amount of advancements in the networking field. Network interface throughputs supported today are in the range of 40Gbps and higher. On the other hand, such high interface throughputs do not guarantee higher packet processing speeds, which are limited due to the overheads imposed by the architecture of the network stack. Nevertheless, there is a great need for a speedup in the forwarding engine, which is the most important part of a high-speed router. For this reason, many software-based and hardware-based solutions have emerged recently with the goal of increasing packet processing speeds. The networking stack of an operating system is not conceived for high-speed networking applications but rather for general purpose communications.

In this thesis, we investigate various approaches that strive to improve packet processing performance on server-class network hosts, either by using software, hardware, or the combination of the two. Some of the solutions are based on the Click modular router which offloads its functions on different types of hardware like GPUs, FPGAs or different cores among different servers with parallel execution. Furthermore, we explore other software solutions which are not based on the Click modular router. We compare software and hardware packet processing solutions based on different criteria and we discuss their integration possibilities in virtualized environments, their constraints and their requirements.

As our first contribution, we propose a resilient and highly performant fabric network architecture. Our goal is to build a layer 2 mesh network that only uses directly connected hardware acceleration cards that perform packet processing, instead of routers and switches. We have decided to use the TRILL protocol for the communication between these smart NICs as it provides a better utilization of network links while also providing least-cost pair-wise data forwarding. The data plane packet processing is offloaded onto programmable hardware with parallel processing capability. Additionally, we propose to use the ODP API so that packet processing application code can be reused by any other packet processing solution that supports the ODP API.

As our second contribution, we designed a data plane of the TRILL protocol on the MPPA (Massively Parallel Processor Array) smart NIC which supports the ODP API. Our experimental results show that we can process TRILL frames at full-duplex line-rate (up to 40Gbps) for different packet sizes while reducing latency.


As our third contribution, we provide a mathematical analysis of the impact of different network topologies on the control plane's load. The data plane packet processing is performed on the MPPA smart NICs. Our goal is to build a layer 2 mesh network that only uses directly connected smart NIC cards instead of routers and switches. We have considered various network topologies and we compared the loads induced by their control plane traffic. We have also shown that the hypercube topology is the most suitable for our PoP data center use case because it does not have a high control plane load and it has better resilience than the fat-tree while having a shorter average distance between the nodes.

Keywords

Networking, Packet processing, Cloud computing, ODP, MPPA, TRILL, DPDK.

Table of contents

1 Introduction
   1.1 Motivation
   1.2 Problematics
   1.3 Contributions
   1.4 Plan of the thesis

2 State of the art
   2.1 Introductory remark
   2.2 Solutions and Protocols used for fabric networks
      2.2.1 Fabric network solutions
         2.2.1.1 QFabric
         2.2.1.2 FabricPath
         2.2.1.3 Brocade VCS Technology
      2.2.2 Communication protocols for fabric networks
         2.2.2.1 TRILL
         2.2.2.2 SPB
   2.3 Fast packet processing
      2.3.1 Terminology
         2.3.1.1 Fast Path
         2.3.1.2 Slow Path
      2.3.2 Background on packet processing
   2.4 Software implementations
      2.4.1 Click-based solutions
         2.4.1.1 Click
         2.4.1.2 RouteBricks
         2.4.1.3 FastClick
      2.4.2 Netmap
      2.4.3 NetSlices
      2.4.4 PF_RING (DNA)


      2.4.5 DPDK
   2.5 Hardware implementations
      2.5.1 GPU-based solutions
         2.5.1.1 Snap
         2.5.1.2 PacketShader
         2.5.1.3 APUNet
         2.5.1.4 GASPP
      2.5.2 FPGA-based solutions
         2.5.2.1 ClickNP
         2.5.2.2 GRIP
         2.5.2.3 SwitchBlade
         2.5.2.4 Chimpp
      2.5.3 Performance comparison of different IO frameworks
      2.5.4 Other optimization techniques
   2.6 Integration possibilities in virtualized environments
      2.6.1 Packet processing in virtualized environments
      2.6.2 Integration constraints and usage requirements
   2.7 Latest approaches and future directions in packet processing
   2.8 Conclusion

3 Fabric network architecture by using hardware acceleration cards
   3.1 Introduction
   3.2 Problems and limitations of traditional layer 2 architectures
   3.3 Fabric networks
   3.4 TRILL protocol for communication inside a data center
   3.5 Comparison of software and hardware packet processing implementations
      3.5.1 Comparison of software solutions
         3.5.1.1 Operations in the user-space and kernel-space
         3.5.1.2 Zero-copy technique
         3.5.1.3 Batch processing
         3.5.1.4 Parallelism
      3.5.2 Comparison of hardware solutions
         3.5.2.1 Hardware used
         3.5.2.2 Usage of CPU
         3.5.2.3 Connection type
         3.5.2.4 Operations in the user-space and kernel-space
         3.5.2.5 Zero-copy technique
         3.5.2.6 Batch processing
         3.5.2.7 Parallelism
      3.5.3 Discussion on GPU-based solutions
      3.5.4 Discussion on FPGA-based solutions
      3.5.5 Other hardware solutions

   3.6 Kalray MPPA processor
      3.6.1 MPPA architecture
      3.6.2 MPPA AccessCore SDK
      3.6.3 Reasons for choosing MPPA for packet processing
   3.7 ODP (OpenDataPlane) API
      3.7.1 ODP API concepts
         3.7.1.1 Packet
         3.7.1.2 Thread
         3.7.1.3 Queue
         3.7.1.4 PktIO
         3.7.1.5 Pool
   3.8 Architecture of the fabric network by using the MPPA smart NICs
   3.9 Conclusion

4 Data plane offloading on a high-speed parallel processing architecture
   4.1 Introduction
   4.2 System model and solution proposal
      4.2.1 System model
      4.2.2 Frame journey
         4.2.2.1 Control frame
         4.2.2.2 Data frame
      4.2.3 Implementation of TRILL data plane on the MPPA machine
   4.3 Performance evaluation
      4.3.1 Experimental setup and methodology
      4.3.2 Throughput, latency and packet processing rate
   4.4 Conclusion

5 Analysis of the fabric network's control plane for the PoP data centers use case
   5.1 Data center network architectures
   5.2 Control plane
      5.2.1 Calculation of the control plane metrics
         5.2.1.1 Full-mesh topology
         5.2.1.2 Fat-tree topology
         5.2.1.3 Hypercube topology
      5.2.2 Discussion on parameters used
      5.2.3 Overhead of the control plane traffic
      5.2.4 Convergence time
      5.2.5 Resiliency and scalability
      5.2.6 Topology choice
   5.3 Conclusion

6 Conclusion

   6.1 Contributions
      6.1.1 Fabric network architecture by using hardware acceleration cards
      6.1.2 Data plane offloading on a high-speed parallel processing architecture
      6.1.3 Analysis of the fabric network's control plane for the PoP data centers use case
   6.2 Future works

Publications

Bibliography

Appendices

A TRILL protocol
   A.1 Introduction
   A.2 TRILL header
   A.3 TRILL data encapsulation over Ethernet
   A.4 TRILL LSPs and TRILL-Hellos
      A.4.1 TRILL LSPs
      A.4.2 TRILL-Hello protocol

List of Figures

1.1 Traditional data center network architecture. [1]
1.2 TRILL-based non-blocking layer-2 network. [1]
1.3 Implementation that performs data plane packet processing inside a) a Linux Bridge Module and b) a smart NIC.

2.1 Components of a router. [2]
2.2 Example of a Click IP router configuration. [3]
2.3 Netmap. [4]
2.4 NetSlice spatial partitioning. [5]
2.5 PF_RING with DNA driver. [6]
2.6 Major DPDK components. [7]
2.7 PacketShader software architecture. [8]
2.8 APUNet architecture: M (Master), W (Worker), RQ (Result Queue). [9]
2.9 Block diagram of the SLAAC-1V architecture modified for GRIP. [10]
2.10 SwitchBlade Packet Processing Pipeline. [11]
2.11 Open vSwitch with Data Plane Development Kit (DPDK) software architecture. [12]
2.12 ClickNP architecture. [13]
2.13 OpenFastPath system view. [14]

3.1 Advantages of TRILL protocol. [15]
3.2 MPPA PCIe card. [16]
3.3 MPPA architecture. [16]
3.4 NoC interconnections and the Smart Dispatch unit.
3.5 ODP application portability across platforms.
3.6 ODP packet structure. [17]
3.7 Architecture of an RBridge that uses the MPPA smart NIC inside a fabric network.

4.1 Communication between control plane and data plane. [18]


4.2 Partitioning of the forwarding table. [18]
4.3 Throughput in the case of encapsulation, forwarding and decapsulation of TRILL frames.
4.4 Packet rate in the case of encapsulation, forwarding and decapsulation of TRILL frames.
4.5 Performance comparison of MPPA and Linux solutions for TRILL frame forwarding (1500B): a) throughput and b) packet rate.
4.6 Round-trip time.
5.1 Fat-tree topology (k=4). [19]
5.2 Full-mesh topology.
5.3 Hypercube topology.
5.4 Number of TRILL Hello frames per minute in the network.
5.5 The average number of LSPs per minute per link.
5.6 Minimum number of link failures to cut off one RBridge (server).
5.7 Number of links in the topology.
5.8 Average distance of the topology.
A.1 TRILL header.

List of Tables

2.1 Summary of the constraints
3.1 Comparison of software implementations
3.2 Comparison of hardware implementations
5.1 Control plane characteristics of different topologies

Chapter 1

Introduction

Summary
   1.1 Motivation
   1.2 Problematics
   1.3 Contributions
   1.4 Plan of the thesis

1.1 Motivation

Packet processing represents a large set of algorithms which are applied to a packet of information or data which is transferred between different network elements. As the throughputs supported by network interfaces are getting higher, there is a corresponding need for faster packet processing. The forwarding engine is the most important part of a high-speed router and, with the exponential growth of Internet traffic, which is not expected to slow down [20], there is a high demand for the introduction of multi-terabit IP routers [21]. Similarly, inside data centers there is a rising need for switches that support throughputs that go from hundreds of Gbps to even multiple Tbps. Examples of such switches are the Intel FM10000 series [22] and the Broadcom Tomahawk 3 [23], and they are used to aggregate a large number of high-speed links.

Nevertheless, packet processing speeds in software can hardly follow the increased throughput supported by today's network interfaces. The reason behind this is the generality of the network stack's architecture and of the device driver design, which bound the traffic processing rate despite the advances in NIC (Network Interface Card) development. In chapter 2 we investigate fast packet processing solutions on server-class network hosts that try to overcome this limitation by using software, hardware or the combination of the two. These solutions allow building Virtual Network Functions (VNFs) without a need for dedicated networking hardware.


The trade-off is the performance gap that these solutions are constantly trying to reduce when compared with the dedicated hardware. Servers typically have network interfaces with speeds of 40Gbps [24] in today's data centers. However, the network stack's architectural design comes with a high performance cost which immensely limits the effectiveness of increasingly powerful hardware [25]. The overheads that it implies represent a key obstacle to effective scaling. In fact, in a modern OS (Operating System), moving a packet between the wire and the application can take 10-20 times longer than the time needed to send a packet over a 10-gigabit interface [26].

Packet processing over multiple 10Gbps interfaces at line-rate can hardly be supported by a standard general-purpose kernel stack. For this reason, many packet processing solutions that significantly improve network performance have been proposed and they are based either on hardware, software or their hybrid combination. Different user-space I/O frameworks allow bypassing the kernel and obtaining a batch of raw packets through a single syscall. Additionally, they support modern Network Interface Card (NIC) capabilities such as multi-queueing [27]. Namely, Intel's DPDK [28], Netmap [4], PacketShader [8], PF_RING [6], OpenOnload [29] and PACKET_MMAP [30] are examples of such frameworks, out of which the most relevant ones, in terms of their widespread usage, are reviewed in this thesis among other solutions. There are many solutions that are based on the Click Modular Router [3], which is a software architecture that is used for building flexible and configurable routers. Some of the most notable ones are RouteBricks [31], FastClick [27], ClickNP [13], Snap [32], DoubleClick [33], SMP Click [34] and NP-Click [35]. The majority of these solutions aim to improve Click's performance by allowing the packet processing workload to be parallelized on multicore systems.

There are two possible approaches to build programmable network infrastructures with high performance, as has been discussed in [31]. One is to start from high-end specialized equipment and make it programmable, such as the idea of OpenFlow [36, 37]. The other begins with software routers and optimizes their packet processing performance. This second approach allows programmers to work in the well-known environments of general-purpose computers. Nonetheless, there are "semi-specialized" solutions which use network processors to offload the part of the packet processing that represents a performance bottleneck. The rest of the packet processing is done on conventional processors. Examples of solutions which use NPs (Network Processors) are NP-Click [35] along with works such as [38], in which the authors implement an OpenFlow switch over the EZchip NP4 Network Processor. In [39], various programming techniques and optimization issues that can significantly improve the performance of Network Processors are investigated. Furthermore, a profound understanding of NP design trade-offs, obtained by using an analytic performance model configured with realistic workload and technology parameters, is given in [40]. At the same time, certain solutions use reconfigurable hardware (Field Programmable Gate Arrays - FPGAs) as they are more versatile compared to NPs. Namely, Chimpp [41], ClickNP [13], SwitchBlade [11], GRIP [10] and NetFPGA [42], along with other works in this area [43, 44].
The most well-known solutions which use Graphics Processing Units (GPUs) for packet processing are certainly PacketShader [8], Snap [32], APUNet [9] and GASPP [45]. The underlying hardware, such as a CPU, GPU or an FPGA, has a major impact on the achievable packet processing performance of each solution. Each hardware type has its own advantages and disadvantages. The GPUs exploit their extreme thread-level parallelism, well suited for packet processing applications (which can be parallelized). However, the CPUs introduce less latency than GPUs. Also, the programs written for CPUs can last longer (and generally execute faster on new CPU generations), which is not necessarily the case for GPUs, because the CPUs have better API consistency than GPUs. The FPGAs are not easily programmable as they use HDLs (Hardware Description Languages), but they are more energy efficient than GPUs and CPUs. Moreover, FPGAs provide deterministic timing where latencies are one order of magnitude lower when compared with GPUs [46]. This feature is crucial for hard real-time tasks and applications like network synchronization, encryption, etc. More details on this topic are given in section 3.5.

The majority of the mentioned solutions can be equally suitable for virtualized environments, as there is a great interest in packet processing acceleration because of the inherent performance loss that these environments bring. The Netmap API can be adapted to be used as a communication mechanism between a virtual machine and a host as in the VALE [47] solution. Furthermore, in [48] the packet I/O speed in the VMs is further improved, while ptnetmap [49] is the Virtual Passthrough solution based on the netmap framework. Similarly, there are DPDK-based solutions that also aim to accelerate the packet processing speed for VMs. Namely, NetVM [50] is a platform built on top of the KVM hypervisor and the DPDK library and it is used for high-speed packet processing. Moreover, the DPDK can be used to boost the performance of the open source virtual switch (OvS [51]) as in the OvS-DPDK [52] solution. OvS is a well-known example of an OpenFlow enabled switch. OpenFlow is a protocol which is used in SDN (Software Defined Networks) for the communication between a controller (which implements the control plane) and a switch (which in the case of SDN implements the data plane). The main purpose of OvS was not to accelerate the traffic but rather to allow communication in multi-server virtualization deployments, while OvS-DPDK improves its performance.

The data center network infrastructure interconnects end devices (i.e. servers) inside a data center or across data centers [53]. Nowadays, a VDC's (Virtualized Data Center) physical servers are typically divided into multiple mutually isolated virtual instances. Each instance can run a different OS and can use its own configuration set (CPU, disk space, memory size). An increasing number of such virtual instances needs to be hosted on a VDC [1], which brings about new challenging requirements such as: any-to-any connectivity with a full non-blocking fabric, dynamic scalability, fault tolerance, seamless VM (Virtual Machine) mobility, simplified management and private network support. A protocol that is able to fulfill such requirements is needed. One such protocol is TRILL (TRansparent Interconnection of Lots of Links) [54] and it has been conceived by the IETF (Internet Engineering Task Force). A more detailed introduction to the TRILL protocol can be found in Appendix A.
TRILL has a simple configuration and deployment like Ethernet and it brings certain advantages of layer 3 techniques to layer 2. It is also considered as a layer 2 and 1/2 protocol since it terminates traditional Ethernet clouds, similar to what IP routers do. Furthermore, a conversion from existing Ethernet deployments into a TRILL network can be done by replacing classical bridges with RBridges (Routing Bridges) which implement the TRILL protocol. TRILL is transparent to layer 3 protocols.

RBridges perform the SPF (Shortest Path First) algorithm, thus providing optimal pair-wise forwarding. TRILL provides multipathing for both unicast and multicast traffic [54] which enables better link utilization in the network by using multiple paths to transfer data between the two end hosts. This is enabled by the encapsulation of traffic with an additional TRILL header that includes the hop count field. This field is characteristic of layer 3 protocols and allows for safe forwarding during periods of temporary loops. RBridges are invisible to IP routers in the same way that IEEE 802.1 customer bridges are. Additionally, RBridges are backward-compatible with IEEE 802.1 customer bridges [54].

Whenever an Ethernet frame encounters the first RBridge (called the ingress RBridge) on its way to a destination, it gets encapsulated. The encapsulation process consists of adding a TRILL header and an outer Ethernet header around the original frame with its (inner) Ethernet header. The ingress RBridge specifies in the TRILL header the RBridge closest to the final destination (called the egress RBridge) to which the frame should be forwarded. Afterward, the frame gets forwarded towards the egress RBridge, while on each RBridge on the path the received frame needs to be analyzed and processed. On each hop, the outer Ethernet header is set according to the next hop in the direction of the egress RBridge. Additionally, the hop count field in the TRILL header is decremented. This frame processing represents the communication bottleneck between servers inside a data center as it can significantly impact the frame forwarding rate. The problem has been identified in the existing data plane implementation of the TRILL protocol in a modified Linux Bridge Module [1]. This network performance degradation also represents one of the main problems to be solved in this thesis.
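To make the per-hop processing just described more concrete, the sketch below shows in plain C what a transit RBridge has to do for each data frame: verify the TRILL Ethertype, decrement the 6-bit hop count and rewrite the outer Ethernet header towards the next hop. The field layout follows RFC 6325, but this is only an illustrative sketch and not the implementation developed in this thesis; in particular, lookup_next_hop() is a hypothetical helper standing in for the RBridge forwarding table.

```c
/* Minimal sketch of per-hop TRILL transit processing (not the thesis code).
 * Layout per RFC 6325: a 6-byte TRILL header carried after an outer
 * Ethernet header with Ethertype 0x22F3. */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define ETH_ALEN        6
#define ETHERTYPE_TRILL 0x22F3

struct eth_hdr {
    uint8_t  dst[ETH_ALEN];
    uint8_t  src[ETH_ALEN];
    uint16_t ethertype;          /* network byte order */
} __attribute__((packed));

struct trill_hdr {
    uint16_t ver_res_m_opl_hop;  /* V(2) R(2) M(1) OpLen(5) HopCount(6) */
    uint16_t egress_nickname;    /* RBridge closest to the destination  */
    uint16_t ingress_nickname;   /* RBridge that encapsulated the frame */
} __attribute__((packed));

static unsigned trill_hop_count(const struct trill_hdr *t)
{
    return ntohs(t->ver_res_m_opl_hop) & 0x3F;        /* low 6 bits */
}

static void trill_set_hop_count(struct trill_hdr *t, unsigned hc)
{
    uint16_t v = ntohs(t->ver_res_m_opl_hop) & (uint16_t)~0x3F;
    t->ver_res_m_opl_hop = htons(v | (hc & 0x3F));
}

/* Hypothetical forwarding-table lookup: egress nickname -> next-hop MAC. */
int lookup_next_hop(uint16_t egress_nickname, uint8_t mac[ETH_ALEN]);

/* Process one transit frame in place; returns 0 to forward, -1 to drop. */
int trill_transit(uint8_t *frame, const uint8_t my_mac[ETH_ALEN])
{
    struct eth_hdr   *outer = (struct eth_hdr *)frame;
    struct trill_hdr *trill = (struct trill_hdr *)(frame + sizeof(*outer));
    uint8_t next_mac[ETH_ALEN];

    if (ntohs(outer->ethertype) != ETHERTYPE_TRILL)
        return -1;                           /* not a TRILL data frame */
    unsigned hc = trill_hop_count(trill);
    if (hc == 0)
        return -1;                           /* loop protection: drop  */
    trill_set_hop_count(trill, hc - 1);      /* decrement hop count    */

    if (lookup_next_hop(ntohs(trill->egress_nickname), next_mac) < 0)
        return -1;
    memcpy(outer->dst, next_mac, ETH_ALEN);  /* rewrite outer header   */
    memcpy(outer->src, my_mac, ETH_ALEN);
    return 0;
}
```

Even in this stripped-down form, every transit frame costs a forwarding-table lookup and two header rewrites, which is exactly the per-hop work that this thesis proposes to move off the servers' CPUs.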

1.2 Problematics

Cloud service providers still widely use Ethernet technology in order to interconnect the servers inside a data center because of Ethernet's simple operation and configuration. Ethernet bridges automatically learn the MAC addresses of the hosts that are connected to them by associating, in the forwarding table, the receiving port of the bridge with the source MAC address of the frame that arrived on that port. However, if the destination MAC address is not known in the forwarding table, the frame gets broadcast over all the ports of the bridge except the one that received the frame. Additionally, Ethernet frames do not have a hop count field that could limit the number of hops of the frame inside a data center. Consequently, any loop in the network could cause a serious failure of a data center's functionality because of the replication of frames that are directed to an unknown destination and another replication in the case of a loop. Propagation of this problem continues until the frame forwarding capacity of the bridge is reached and link failures start to occur. For this reason, STP (Spanning Tree Protocol) has been introduced and it allows building a loop-free logical topology in the form of a tree which forms a single communication path for any pair of nodes in the network. Inside such a network, any VM migration is possible because of the unique MAC address that each node has.

Nevertheless, in cloud environments there is a scaling problem when tens of thousands of VMs are present in a single data center. In order to create forwarding tables, network bridges broadcast frames across the network and this flooding causes the overload of the bridges' forwarding tables. In fact, the CAM (Content Addressable Memory) of a layer 2 bridge is very responsive but at the same time very expensive, which limits the amount of memory it can have. Consequently, it would be too expensive to replace all the bridges inside a data center with ones that have CAM memories large enough to hold tens of thousands of VMs' MAC addresses or even more. Additionally, STP cuts off a significant number of links in the network and makes them unusable. This forces the majority of the communication between the VMs to go over the same links that belong to the spanning tree.
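The learning and flooding behaviour described above can be summarized by the following minimal sketch (plain C; the fixed-size table and linear search are illustrative simplifications, since a real bridge uses a hash table or a CAM):

```c
/* Minimal sketch of classical Ethernet bridge behaviour: source-address
 * learning plus flooding of unknown destinations. Illustrative only. */
#include <stdint.h>
#include <string.h>

#define MAC_LEN     6
#define TABLE_SIZE  1024
#define PORT_FLOOD  (-1)   /* broadcast on all ports except the ingress one */

struct fdb_entry {
    uint8_t mac[MAC_LEN];
    int     port;
    int     valid;
};

static struct fdb_entry fdb[TABLE_SIZE];

/* Learn: bind the frame's source MAC to the port it arrived on. */
static void learn(const uint8_t src[MAC_LEN], int in_port)
{
    int free_slot = -1;
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (fdb[i].valid && memcmp(fdb[i].mac, src, MAC_LEN) == 0) {
            fdb[i].port = in_port;           /* refresh existing entry */
            return;
        }
        if (!fdb[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0) {                    /* otherwise the table is full */
        memcpy(fdb[free_slot].mac, src, MAC_LEN);
        fdb[free_slot].port  = in_port;
        fdb[free_slot].valid = 1;
    }
}

/* Forwarding decision: known destination -> its port, unknown -> flood. */
static int output_port(const uint8_t dst[MAC_LEN])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (fdb[i].valid && memcmp(fdb[i].mac, dst, MAC_LEN) == 0)
            return fdb[i].port;
    return PORT_FLOOD;
}
```

The flood case is what makes unknown destinations expensive: such frames are replicated on every port, and it is this replication that a forwarding loop turns into a network-wide failure in the absence of a hop count.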

Figure 1.1: Traditional data center network architecture. [1]

However, there is a rising need for higher throughputs inside a data center, which often requires the existence of multiple paths between the bridges. For all these reasons, it is hardly possible to create a topology that has only one layer 2 segment for the whole data center, because a high number of nodes may create congestion in the network.

In traditional data center networks, the adopted standard architecture consists of connecting a set of physical servers that belong to the same rack to a ToR (Top of the Rack) switch. Also, all the ToR switches are connected to the AggSs (Aggregation Switches) that belong to the aggregation layer. This layer aggregates all the traffic from the ToR switches and it also brings redundant paths between ToR switches and the core layer. This architecture is illustrated in Figure 1.1, in which AccSs (Access Switches) correspond to ToR switches. The highest layer in the architecture has CRs (Core Routers) that connect the data center with the outside world and they operate at layer 3 of the OSI reference model. In the virtualized data center, the VMs are hosted on the servers that are connected to ToR switches (AccS switches in Figure 1.1). In this type of architecture, whenever a VM is migrated from one physical server to another one that belongs to a different layer 2 segment, its IP address needs to be changed. Moreover, the STP eliminates the possibility of using multipath communication between the servers. For this reason, we adopt a full-mesh non-blocking architecture that is shown in Figure 1.2. In this case, all the RBridges are directly connected and, through the use of the TRILL protocol, they are able to use multipathing.

Figure 1.2: TRILL-based non-blocking layer-2 network. [1]

On each server that hosts VMs, there will be an RBridge implementation that will provide more efficient communications between VMs that are hosted on different servers. The number of hops needed for communication is significantly reduced and the resilience is improved when compared with a traditional data center architecture. However, in order to overcome the frame processing bottleneck presented in the previous section, we have chosen to offload frame processing onto the Kalray MPPA (Massively Parallel Processor Array) smart NIC, which is power efficient. It has 256 user-programmable cores and allows developing in the C programming language. Additionally, it provides the ODP (OpenDataPlane) API which allows developers to create portable networking data plane applications. Despite the fact that the raw computational power of the Kalray MPPA smart NIC may theoretically speed up the processing of TRILL frames, the problem is how to organize the processing among clusters that contain many cores. Additionally, since this smart NIC does not have any networking drivers or networking stack, everything needs to be developed from scratch. The remaining question is to verify whether the full-mesh topology is the best suited for all data center sizes and to compare the load induced by the control plane traffic of different topologies. More specifically, we were interested in the use case of PoP (Point of Presence) data centers with a limited number of servers when MPPA smart NICs are used.

1.3 Contributions

The goal of this thesis is to propose a resilient and highly performant network architecture for virtualized data centers that uses Kalray MPPA smart NIC acceleration cards instead of routers and switches in order to overcome the performance bottleneck of the OS network stack.

Figure 1.3: Implementation that performs data plane packet processing inside a) a Linux Bridge Module and b) a smart NIC.

As our first contribution, we have proposed an architecture that aims to overcome the limitations of the existing solution [1] that is provisioned in the public platform of Gandi SAS. The data plane is implemented in a modified Linux Bridge Module (open-source code available in [55]) as shown in Figure 1.3.a). This allowed the creation of a full-mesh non-blocking layer 2 network based on the TRILL protocol which does not need to use switches, as shown in Figure 1.2. RBs represent the RBridges, while CRs represent core layer routers. Nevertheless, packet processing has limited performance in Linux because of the generality of the network stack architecture and the generality of the device driver design, which affect the network throughput. Our idea is to build a fabric network that is highly performant and that offloads the data plane packet processing tasks onto hardware acceleration cards instead of performing them in the OS on the servers' CPUs (as shown in Figure 1.3.b)). The plan is to replace the classical three-tier network topology with a full-mesh network of links which allows having predictable network performance and direct connectivity between the nodes. Furthermore, in our third contribution, we consider other network topologies in order to determine which one is best suited for our PoP data center use case. The flat architecture that we propose is scalable, resilient and generally decreases the number of hops needed for communication between the servers. We have decided to use the TRILL protocol for communication between the nodes as it provides a better utilization of network links while also providing least-cost pair-wise data forwarding. As the hardware packet processing accelerator, we use the MPPA smart NIC from Kalray because of its parallel processing capability and ease of programming. In fact, this card supports the ODP API which is used for programming the networking data plane. It also allows reusing the code on any other platform that supports the ODP API.

The objective of our second contribution is to overcome the communication bottleneck of the TRILL data plane implementation in the Linux Bridge Module and to do the performance evaluation of the newly designed solution. This bottleneck actually occurs on each TRILL frame hop between different RBridges, which requires the Outer Ethernet header that has been added to the original frame to be stripped off. Additionally, the hop count field of the TRILL header gets decreased and the frame gets encapsulated into a new Outer Ethernet header (with the appropriate MAC addresses, after having consulted the forwarding table). In order to solve this problem, the idea is to offload the TRILL frame processing from the CPUs onto the MPPA smart NIC cards. The implementation is done by using the ODP API which permits building protocols from scratch and manipulating header fields of various networking protocols. This allowed processing frames in parallel among all the cores that are available on the MPPA. The performance evaluation shows that it is possible to perform TRILL frame processing at full-duplex line rate (up to 40Gbps) for various frame sizes while also reducing latency when compared with the existing solution.
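As an illustration of how such an offloaded data plane can be expressed against the ODP API, the sketch below shows a per-core worker loop that receives a burst of frames, applies a TRILL transit function to each one and transmits the survivors. It is a minimal sketch rather than the code deployed on the MPPA: the pktio and queue setup is omitted, the burst size is arbitrary, and trill_transit() is the hypothetical frame-processing helper from the earlier sketch; the odp_* calls follow the public OpenDataPlane API.

```c
/* Sketch of a per-core ODP worker loop (illustrative, not the thesis code). */
#include <stdint.h>
#include <odp_api.h>

#define BURST 32

/* Hypothetical TRILL transit processing: 0 = forward, -1 = drop. */
int trill_transit(uint8_t *frame, const uint8_t my_mac[6]);

void worker_loop(odp_pktin_queue_t in_q, odp_pktout_queue_t out_q,
                 const uint8_t my_mac[6])
{
    odp_packet_t pkts[BURST], out[BURST];

    for (;;) {
        /* Receive a burst of frames from this core's input queue. */
        int received = odp_pktin_recv(in_q, pkts, BURST);
        int to_send = 0;

        for (int i = 0; i < received; i++) {
            uint8_t *frame = odp_packet_data(pkts[i]);

            if (trill_transit(frame, my_mac) == 0)
                out[to_send++] = pkts[i];        /* forward */
            else
                odp_packet_free(pkts[i]);        /* drop    */
        }
        if (to_send > 0)
            odp_pktout_send(out_q, out, to_send); /* untransmitted packets
                                                     are ignored for brevity */
    }
}
```

Running one such loop per core, each with its own input queue, is what allows frames to be processed in parallel without sharing per-packet state between cores.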
Furthermore, as our third contribution, we have investigated various network topologies and we compared the loads induced by their control plane traffic. We have provided an analytic model of the impact of different network topologies, that are well suited for the usage of smart NIC cards, on the load induced by the control plane. We take into account that Kalray MPPA cards have two 40Gbps interfaces which can be split with cables that support 40G QSFP+ to 4x10G SFP+ conversion. This allows connecting 8x10Gbps cables on one MPPA smart NIC. The goal is to build a mesh fabric network that does not use switches and that directly interconnects smart NICs. We have focused specifically on topologies used in PoP (Point of Presence) data centers that have a limited number of servers (30-80 servers), which is the case of the public platform of Gandi SAS. We examine how many TRILL Hello and LSP frames are generated per minute in various topologies but also the average distances between the RBridges. We have shown that the hypercube topology is the most suitable for our PoP data center use case because it does not have a high control plane load and it has better resilience than the fat-tree while having a shorter average distance between the nodes.
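A rough calculation with standard graph formulas (n = 64 servers is chosen here purely as an example) illustrates why the full-mesh stops being practical even at PoP scale, while the hypercube remains within the port budget of the smart NICs:

```latex
% Full mesh with n = 64 nodes:
L_{\text{mesh}} = \frac{n(n-1)}{2} = \frac{64 \cdot 63}{2} = 2016 \ \text{links},
\qquad \text{node degree} = n - 1 = 63
% Hypercube of dimension d = \log_2 n = 6:
L_{\text{cube}} = d \cdot 2^{d-1} = 6 \cdot 32 = 192 \ \text{links},
\qquad \text{node degree} = d = 6,
\qquad \bar{d}_{\text{cube}} = \frac{d \cdot 2^{d-1}}{2^d - 1} = \frac{192}{63} \approx 3.05
```

A node degree of 63 cannot be wired with the 8x10Gbps ports available per MPPA card, whereas a degree of 6 fits comfortably; chapter 5 quantifies the corresponding control plane load and resilience.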

1.4 Plan of the thesis

The rest of this thesis can be outlined as follows. In chapter 2 we start our state of the art by presenting solutions and protocols that are used for fabric networks. Then, we describe the terminology that is commonly used for this topic and we provide background information on packet processing on server-class network hosts. Additionally, we describe all the existing software-based and hardware-based solutions that aim to accelerate packet processing speed. Furthermore, we discuss their integration possibilities in virtualized environments and we compare them based on multiple criteria. Chapter 3 discusses our first contribution which is the architecture of the proposed solution. We compare the existing solutions that use specific hardware for packet processing. Then we gradually describe the advantages of the components that we chose to use, such as the Kalray MPPA smart NIC and the ODP API programming model. In chapter 4 we present our second contribution that is related to data plane offloading on a high-speed parallel processing architecture, where we evaluate the network performance of the TRILL implementation on the MPPA smart NIC. Afterward, in chapter 5 we present our third contribution in the form of a mathematical analysis of the impact of different network topologies on the control plane's load. Finally, we conclude in chapter 6 and we discuss our short-term and long-term future works.

Chapter 2

State of the art

Summary
   2.1 Introductory remark
   2.2 Solutions and Protocols used for fabric networks
      2.2.1 Fabric network solutions
      2.2.2 Communication protocols for fabric networks
   2.3 Fast packet processing
      2.3.1 Terminology
      2.3.2 Background on packet processing
   2.4 Software implementations
      2.4.1 Click-based solutions
      2.4.2 Netmap
      2.4.3 NetSlices
      2.4.4 PF_RING (DNA)
      2.4.5 DPDK
   2.5 Hardware implementations
      2.5.1 GPU-based solutions
      2.5.2 FPGA-based solutions
      2.5.3 Performance comparison of different IO frameworks
      2.5.4 Other optimization techniques
   2.6 Integration possibilities in virtualized environments
      2.6.1 Packet processing in virtualized environments
      2.6.2 Integration constraints and usage requirements


   2.7 Latest approaches and future directions in packet processing
   2.8 Conclusion

2.1 Introductory remark

The majority of the content in this chapter corresponds to the subject matter of our survey paper published in [56]. The goal of this chapter is to provide the state of the art on the existing packet processing implementations that are based either on software or hardware and that tend to accelerate packet processing speed. This is a very important part of the fabric network architecture that we try to build. The idea is to replace the classical three-layer network topology with the full-mesh network of links which is often referred to as the "network fabric". The advantage of such a solution is that it provides a predictable network performance with direct connectivity between any pair of nodes. Fabric networks have a flat architecture that is resilient, scalable and that decreases the number of hops needed for communication inside a data center. Packet processing speed is of crucial importance here as we plan to implement it directly on Linux servers either by using software acceleration techniques or hardware acceleration. For this reason, we survey various software-based and hardware-based implementations. Additionally, in order to have a better utilization of links and a network that can better suit the needs of a VDC, we analyze multiple fabric network solutions and protocols that are suitable for fabric network environments.

The rest of this chapter can be outlined as follows. In subsection 2.2 we present solutions and protocols that are used for fabric networks. Then, in subsection 2.3 we briefly present the terminology that is commonly used for the fast packet processing topic and we provide background information related to the data rates supported by communication links and the ability of systems to deal with the required packet processing speed. Subsection 2.4 introduces multiple software packet processing implementations, where we start with Click-based solutions and then continue with software solutions which are not based on the Click Modular Router. Subsection 2.5 discusses different hardware implementations of packet processing. Afterward, in subsection 2.6, we present the integration possibilities in virtualized environments. A discussion on the latest approaches and future directions in packet processing is given in subsection 2.7 and we conclude in subsection 2.8.

2.2 Solutions and Protocols used for fabric networks

Recently, several proprietary data center networking solutions based on the fabric network concept have emerged and they aim to provide higher throughput across the network and to reduce latency. Such solutions replace the traditional three-layer network topology with the full-mesh network of links which is often referred to as the "network fabric". Thus, any pair of nodes can have a predictable network performance with direct connectivity. Moreover, fabric networks have a flat architecture that is resilient, scalable and that decreases the number of hops needed for communication inside a data center. When compared with the three-tier architecture, fabric networks reduce the amount of north-south traffic while the east-west traffic (which is typical for virtualized environments) encounters less network congestion. One of the main advantages is that VMs can use multiple paths to communicate with each other by using protocols such as TRILL or SPB (Shortest Path Bridging) instead of using a single fixed path calculated by the STP. Additionally, the links that are cut off by the STP protocol will have a much higher utilization when multipathing is available.

2.2.1 Fabric network solutions

2.2.1.1 QFabric

Juniper Networks Quantum Fabric (QFabric) architecture delivers a data center design by creating a single-tier network that operates like a single Ethernet switch [57]. Such an architecture brings significant changes in scale, performance and simplicity while providing support for virtualized environments. Most data center traffic today is server-to-server or east-west because of the adoption of service-oriented architecture (SOA)-based applications and virtualization [57]. In traditional three-tier architectures, packets need to transit multiple switches up and down the hierarchy in order to get to the destination. This can have a significant impact on transaction latency, workload mobility and real-time cloud computing interactions [57]. Additionally, operational expenses are higher in the traditional architectures because of the complex network management that needs to track the VM migrations and maintain connectivity between VMs in a tree structure. Juniper Networks QFabric creates a single-tier network that is operated and managed like a single, logical, distributed switch [57]. It consists of edge, interconnect and control devices that run the Junos operating system.

In traditional data center networks, when the number of servers is increased, new switches are added to the network in order to interconnect them. Each switch needs to be managed individually, which is complicated in large data centers. Moreover, the amount of control traffic in the network also increases. In [57], the authors claim that in tree-based networks, 50% of the network's ports (which are the most expensive ports) are used to interconnect switches and not to link servers, storage and other devices. Also, around 50% of the network bandwidth is unavailable if STP is being run, as it disables almost half of the available links in order to avoid loops. With this in mind, capital and operational expenses are very high in traditional data center networks. The physical location of a server in a three-tier architecture also affects the application performance. A larger number of hops can increase latency, which can further contribute to unpredictable application performance [57].

The inspiration for QFabric was drawn from switch design, as inside every switch is a fabric which is a mesh that is completely flat [57]. It provides any-to-any connectivity between ports. A fabric network concept allows retaining simplicity by allowing multiple physical switches to be managed like, and to behave as, a single logical switch [57]. Any-to-any connectivity permits each device to be only one hop away from any other device. The QFabric architecture includes a distributed data plane that directly interconnects all ports one to another and an SOA-based control plane that provides high scalability, resiliency and a single point of management. Minimal administrative overhead is needed whenever a new compute node is added in a plug-and-play fashion. The three basic components of a self-contained switch fabric - line cards, backplane and Routing Engines - are broken out into independent standalone devices - the QF/Node, the QF/Interconnect, and the QF/Director respectively [57]. QF/Node is a high-density, fixed-configuration 1 RU edge device which provides access into and out of the fabric [57]. QF/Interconnect connects all QF/Node edge devices in a full mesh topology.
QF/Director integrates control and management services for the fabric and it builds a global view of the entire QFabric which could consist of thousands of server facing ports [57].

2.2.1.2 FabricPath

FabricPath [58] is an innovation in Cisco NX-OS Software that brings the stability and scalability of routing to layer 2 [59]. Unlike in traditional data centers, the network does not need to be segmented anymore. FabricPath also provides network-wide VM mobility with massive scalability. An entirely new layer 2 data plane is introduced and it is similar to the TRILL protocol. When a frame enters the fabric, it gets encapsulated with a new header that has routable source and destination addresses. From there, the frame is routed and when it reaches the destination switch, it gets decapsulated into the original Ethernet format. Each switch in the FabricPath has a unique identifier called a Switch-ID which is assigned dynamically. Layer 2 forwarding tables on the FabricPath are built by associating the end hosts' MAC addresses with the Switch-ID [58]. One of the advantages of FabricPath is that it preserves the plug-and-play benefits of classical Ethernet as it requires minimal configuration [58]. FabricPath uses a single control protocol (IS-IS) for unicast forwarding, multicast forwarding and VLAN pruning [58]. It also allows the use of N-way multipathing, while providing high availability. Forwarding in FabricPath is always performed across the shortest path to the destination, unlike in layer 2 forwarding where STP does not necessarily calculate the shortest paths between two endpoints.

2.2.1.3 Brocade VCS Technology

Brocade defines the Ethernet fabrics in which the control and management planes are extracted from the physical switch into the fabric, while the data plane remains at the switch level [60]. In this way, control and management become scalable distributed services that are integrated into the network instead of being integrated into the switch chassis. The advantage of such an approach is that the fabric scales automatically when another fabric-enabled switch is added [60]. In fact, a new switch automatically joins a logical chassis, which is similar to adding a port card to a chassis switch [60]. The security and policy configuration parameters are automatically inherited by the new switch, which simplifies monitoring, management and operations when compared with traditional layer 2 networks. Instead of performing configuration and management multiple times for each switch and each port, these functions are performed only once for the whole fabric. All switches in the fabric have the information about device connections to servers and storage, which enables Automated Migration of Port Profiles (AMPP) within the fabric [60]. Furthermore, AMPP ensures that all network policies and security settings are maintained whenever a VM or server moves to another port of the fabric, without having to reconfigure the entire network [60]. Brocade VCS Technology implements all the properties and requirements of the Ethernet fabric and it removes the limitations of classic Ethernet. For instance, it does not use STP but rather uses link-state routing. Moreover, equal-cost multipath forwarding is implemented at layer 2, which brings a more efficient usage of the network links. VCS technology is divided into three parts: the VCS Ethernet that is used for the data plane, the VCS Distributed Intelligence that is used for the control plane and the VCS Logical Chassis that is used for the management plane.

2.2.2 Communication protocols for fabric networks

2.2.2.1 TRILL

TRILL (TRansparent Interconnection of Lots of Links) [54] is the IETF's protocol that is used for communications inside a data center. The main goal of the TRILL protocol is to overcome the limitations of conventional Ethernet networks by introducing certain advantages of network layer protocols. This protocol is considered to be a layer 2 and 1/2 protocol since it terminates traditional Ethernet clouds, which is similar to what IP routers do. It has a simple configuration and deployment just like Ethernet and it is based on the IS-IS protocol. TRILL provides least-cost pair-wise data forwarding without configuration, it supports multipathing for both unicast and multicast traffic and it provides safe forwarding during the periods of temporary loops. Additionally, TRILL provides a possibility to create a cloud of links that act as a single IP subnet from the IP nodes' point of view. Also, TRILL is transparent to layer 3 protocols. A conversion from existing Ethernet deployments into a TRILL network can be done by replacing classical bridges with RBridges (Routing Bridges) which implement the TRILL protocol.

RBridges perform the SPF (Shortest Path First) algorithm, thus providing optimal pair-wise forwarding. TRILL supports multipathing which provides better link utilization in the network by using multiple paths to transfer data between the two end hosts. This is enabled by the encapsulation of the traffic with an additional TRILL header. This header introduces the hop count field, which is a characteristic of layer 3 protocols, and allows for safe forwarding during periods of temporary loops. RBridges are also backward-compatible with IEEE 802.1 customer bridges [54]. More details about TRILL can be found in Appendix A.

2.2.2.2 SPB

SPB (Shortest Path Bridging) is the IEEE's protocol that is classified as 802.1aq [61]. It has been conceived as a replacement for older spanning tree protocols such as IEEE 802.1D that disabled all the links that did not belong to the spanning tree. Instead, it uses shortest path trees (SPTs) which guarantee that traffic is always forwarded over the shortest path between two bridges. Due to the bidirectional property of SPTs, the forward and reverse traffic always take the same path. This also holds for unicast and multicast traffic between two bridges. SPB allows using multiple equal-cost paths which brings better link utilization in the network. The control plane of the SPB protocol is based on the IS-IS (Intermediate System to Intermediate System) link-state protocol (ISIS-SPB) which brings new TLV extensions. This protocol is used to exchange the information between the bridges which is then used for the SPT calculation. There are two variants of SPB. One uses the 802.1ad Q-in-Q datapath (shortest path bridging VID (SPBV)) and the other one uses the hierarchical 802.1ah MAC-in-MAC datapath (shortest path bridging MAC (SPBM)) [62]. Both of these variants have the same control plane, algorithms and a common routing mechanism. SPB's control plane has a global view of the network topology which allows it to have fast restoration after a failure. Both TRILL and SPB based Ethernet clouds scale much better than Ethernet networks that are based on spanning tree protocols [63].

One of the motivations for this thesis lies in the concept of fabric networks. However, the goal is to build a mesh network that is going to use highly performant smart NIC cards instead of switches, interconnected just like the switches in the network fabric. Additionally, in our experimental evaluation we use the TRILL protocol for the communication between smart NICs, which allows having a higher utilization of links in the data center.

2.3 Fast packet processing

2.3.1 Terminology

There are two categories of procedures performed by the router: time-critical and non-time-critical, depending on how often these operations are performed [2]. Time-critical operations are referred to as the fast path and represent the tasks which are performed on the majority of packets that pass through the router. In order to reach gigabit data rates, these procedures need to be performed in an optimized way. On the other hand, non-time-critical operations are referred to as the slow path. Such operations are carried out mostly on packets which are used for maintenance, error handling and management. Implementation of slow path operations is done in a way that they do not interfere with fast path operations, as the fast path always has a higher priority. In the rest of this section, we will define the terms Fast Path and Slow Path from the router architecture point of view. However, in a broad sense, these two terms can also be used to describe the type of solution in question. Namely, a fast path solution represents a mechanism which can accelerate the packet processing speed. For example, in some architectures the fast path (solution) is used to forward the high-priority packets at wire speed with minimal packet latency, while the slow path (e.g., the OS networking stack) will forward the low-priority packets. A fast path architecture generally includes a reduced number of instructions or particular optimization methods when compared to the slow path. In packet processing systems, the slow path is typically run on the operating system's networking stack while the fast path often optimizes the existing networking stack or performs hardware acceleration in order to avoid the overheads that occur in the networking stack.

2.3.1.1 Fast Path

There are two groups of time-critical operations [2]: forwarding and header processing. Forwarding tasks include packet classification, service differentiation and the destination address lookup operation, while header processing tasks include checksum calculation, packet validation and packet lifetime control. All these fast path operations have a hardware implementation in high-performance routers. The reason behind this is that these operations need to be executed in real time for each individual packet. As can be seen in Figure 2.1, a packet which travels on the fast path is only processed by the modules on the line cards and does not go through the route processor card.

Figure 2.1: Components of a router. [2]

The packets on the fast path are processed and transferred from the ingress line card to the egress line card through the backplane. Typically, ASICs are used for the implementation of fast path functions in order to achieve high throughput. In the header processing part of the fast path, each IP packet that enters the router passes through a series of validity checks. This is done in order to verify whether the packet should be discarded or not. The first verification is whether the version of the IP packet is properly set (IPv4 or IPv6) or not. Afterward comes the check of the length of the IP packet reported by the MAC or link layer, followed by a verification that the calculated checksum of the IP packet is equal to the one written in the IP header. Then the TTL field gets decremented and the new checksum for the IP packet is calculated. Packet forwarding consists of finding the appropriate egress network interface of the router to which a packet needs to be forwarded and finding the appropriate MAC address to which the packet is going to be forwarded. This is done by performing a lookup in the forwarding table. Packet classification is the process of identifying and mapping different types of IP traffic, based on the information contained in the packet, to a suitable action according to a sequence of rules [2, 64]. Packets can be differentiated by the source IP address, destination IP address, source port, destination port and protocol flags. This group is usually called a 5-tuple. In a situation where multiple packets need to be forwarded to a single port of the router at the same time, packet queueing and packet scheduling techniques are used. First of all, every packet is put in a queue. Then, according to the scheduling algorithm and based on the class of the packet and the service guarantees associated with it, the decision is made about which packet should be sent next.
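For illustration, the sketch below implements in plain C the header-processing steps just listed: the validity checks followed by the TTL decrement with an incremental checksum update (patching the existing checksum instead of recomputing it, in the spirit of RFC 1141 and of what the Linux kernel does). It is a simplified, generic example rather than code from any particular router; the lookup, classification and queueing steps are left out.

```c
/* Simplified fast-path IPv4 header processing (illustrative only). */
#include <stdint.h>
#include <stddef.h>
#include <arpa/inet.h>

struct ipv4_hdr {
    uint8_t  ver_ihl;      /* version (4 bits) + header length (4 bits) */
    uint8_t  tos;
    uint16_t tot_len;
    uint16_t id;
    uint16_t frag_off;
    uint8_t  ttl;
    uint8_t  protocol;
    uint16_t check;
    uint32_t saddr;
    uint32_t daddr;
} __attribute__((packed));

/* One's-complement sum over the header; returns 0 for a valid checksum. */
static uint16_t ipv4_checksum(const void *hdr, size_t ihl_words)
{
    const uint16_t *p = hdr;
    uint32_t sum = 0;
    for (size_t i = 0; i < ihl_words * 2; i++)
        sum += p[i];
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns 0 if the packet may continue on the fast path, -1 to drop it. */
int fast_path_ipv4(struct ipv4_hdr *ip, size_t link_len)
{
    size_t ihl = ip->ver_ihl & 0x0F;

    if ((ip->ver_ihl >> 4) != 4)        return -1;  /* wrong IP version   */
    if (ihl < 5 || ihl * 4 > link_len)  return -1;  /* bad header length  */
    if (ntohs(ip->tot_len) > link_len)  return -1;  /* truncated packet   */
    if (ipv4_checksum(ip, ihl) != 0)    return -1;  /* checksum mismatch  */
    if (ip->ttl <= 1)                   return -1;  /* lifetime expired   */

    /* Decrement TTL; it sits in the high byte of a 16-bit header word, so
     * the checksum is patched by adding 0x0100 and folding the carry. */
    ip->ttl--;
    uint32_t check = ip->check;
    check += htons(0x0100);
    ip->check = (uint16_t)(check + (check >= 0xFFFF));
    return 0;
}
```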

2.3.1.2 Slow Path

Slow path packets are partially processed by the line card before getting sent to the CPU for further processing, as can be seen in Figure 2.1. In the slow path, there are non-time-critical operations which include [2]:

• Processing of data packets that lead to errors in the fast path and informing the originating source of the packets by sending ICMP packets
• Processing of keep-alive messages of the routing protocol from the adjacent neighbors and sending such messages to the neighboring routers
• Processing of incoming packets which contain route table updates and informing adjacent routers on each modification in the network topology
• Processing of packets related to management protocols such as SNMP

One of the slow path functions is the Address Resolution Protocol (ARP) processing which is used for finding the layer 2 address when forwarding the packet. In fact, if the router knows the destination IP address of the next hop, it needs to determine the corresponding link-level address before sending the packet on the router interface. When the packet needs to be forwarded and the layer 2 address of the destination is unknown, ARP is used, since these link-level addresses are obtained during the address lookup operation. For this reason, some network designers implement ARP on the fast path, but others implement it on the slow path since those requests do not appear very frequently.

Packet fragmentation and reassembly can be performed when the Maximum Transmission Unit (MTU) is not the same on different interfaces of a router [2]. This is done in order to be able to send packets from one interface to some other interface which has an MTU size smaller than the size of the packet. However, a downside of fragmentation is the added complexity to the router's structure and the reduction of the data rate, because an IP packet needs to be retransmitted each time a fragment is lost. Also, more headers are transferred for the same network throughput, which consequently gives less transferred data. Since path MTU discovery is often used for discovering the smallest MTU size on the path before packet transmission, there is no need to implement the fragmentation function in the fast path. The fragmentation of IP packets is different for IPv4 and IPv6. For IPv4, whenever a router receives a PDU (Protocol Data Unit) which is larger than the next hop's MTU, it either drops the PDU (and sends the corresponding ICMP message) or it fragments the IP packet in order to be able to send it according to the MTU size. For IPv6, the two endpoint hosts of a communication session determine the optimal path MTU before they start exchanging packets. The IPv6 routers do not perform fragmentation, but they drop the packets which are larger than the MTU. There are also some advanced IP options like source routing, time stamping, route recording and ICMP error generation, but since these functions are used quite rarely, it is enough to implement them in the control processor in the slow path.

2.3.2 Background on packet processing

Ever since the beginning of computer networking, there has always been a contest between the data rate supported by the communication link and the ability of a system to deal effectively with the re- quired packet processing speed. They both had the same goal of maximizing the utilization of available resources and providing the fastest possible service. It should be noted that packet processing overhead comes on a per-packet basis and consequently, for certain throughput, smaller packets produce higher overhead. Supporting a high throughput implicates a sufficiently low packet processing latency (or delay). For example, in order to be able to support forwarding of the incoming traffic of a certain throughput by some packet processing solution, this solution needs to have sufficiently low packet processing delay. Otherwise, packet arrival speed will be higher than the packet departure speed which will certainly cause the overflow of a queue (buffer) which receives the packets. This means that if a mean process delay per packet is longer than the packet inter-arrival time, then the solution in question will not support the incoming throughput because the buffer will get overloaded with time and the packets will be dropped. For this reason, as described in the following sections, many packet processing solutions process packets in batches which allows taking more time to receive multiple packets and process them in parallel. When sending a packet from one node to another, there are 4 types of delays [65]: transmission delay (time needed to send a packet onto the wire), propagation delay (time needed to transmit a packet via a wire), processing delay (time needed to handle the packet on the network system) and the queuing delay (time during which the packet is buffered before being sent). Five main steps of software packet processing are [66]: 1) packet reception from a NIC and its transfer to the memory with DMA (Direct Memory Access), 2) packet processing by CPU, 3) transfer of processed packet to the ring buffer of a target NIC, 4) writing the tail of the ring buffer to the NIC register, 5) packet transfer to the NIC with DMA and further from the NIC. Multi-core architectures can improve the packet processing speed significantly if they are used in the right way. Nevertheless, partitioning the tasks across multiple concurrent execution threads is not enough because general-purpose operating systems provide mechanisms which are not suitable for high performance in packet forwarding. In order to use multi-core processors more efficiently, multiqueue NICs have been introduced. Network card manufacturers have modified the design of network adapters as they logically partition them into multiple independent RX/TX queues [67]. For instance, each RX queue is assigned to a separate interrupt, and each interrupt is routed to a different core. This allows distributing the traffic from a single NIC across different cores. The number of queues is usually limited to the number of available processor cores [67]. Receive-Side Scaling (RSS) [68] is a mechanism which enables distributing packet processing workload among different cores. Besides all these improvements, individual queues are not exposed to applications and operations need to be serialized since all threads are accessing the same Ethernet device. 
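The relation between line rate, packet size and the per-packet time budget can be made explicit with a short computation. The figures below reproduce the classical 64-byte case on a 10 Gbit/s link (14.88 Mpps, i.e., one packet every 67.2 ns), which is also recalled in the netmap description later in this chapter.

```cpp
#include <cstdio>

int main() {
    const double line_rate_bps = 10e9;        // 10 Gbit/s link
    const double frame_bytes   = 64;           // minimum Ethernet frame (including the FCS)
    const double overhead      = 7 + 1 + 12;   // preamble + start-of-frame delimiter + inter-packet gap
    double pps    = line_rate_bps / ((frame_bytes + overhead) * 8);  // ~14.88 Mpps
    double budget = 1e9 / pps;                                       // ~67.2 ns per packet
    std::printf("max rate: %.2f Mpps, per-packet budget: %.1f ns\n", pps / 1e6, budget);
    // A solution whose mean per-packet processing delay exceeds this budget will eventually
    // overflow its receive queue; batching relaxes the constraint by amortizing fixed
    // per-packet costs over several packets received and processed together.
    return 0;
}
```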
Even though network stack and network drivers are very flexible as they allow new protocols to be implemented as kernel modules, packet forwarding goes through many software layers from packet reception to packet transmission which leads to performance bottlenecks. Higher levels of the operating system do not have full hardware control and consequently, they add up cycles. In addition, locks or mutexes (in multicore environments) and context switching use even more core cycles. Network device drivers have been created mainly for general purpose packet processing. For each 38 2.4. SOFTWARE IMPLEMENTATIONS received packet, the operating system allocates a memory in which packets are queued and sent to network stack. After the packet has been sent to the network stack, this memory is freed. Packets are first copied to the kernel, in order to be transmitted to a user-space which increases the time of the packet’s journey. For the purpose of reducing the packet journey several packet capture solutions implement memory-mapping techniques which reduces the cost of packet transmission from kernel- space to user-space through system calls [69]. On the other hand, packets waiting for transmission are first put to memory on network adapters before transmission over the wire. Two circular buffers, one for transmission and one for reception are allocated by network device drivers. In order to accelerate the packet processing speed and offload (a part of) packet processing from the CPUs, some of the packet processing solutions that are presented in the following sections use a spe- cific type of hardware like GPUs and FPGAs. The GPUs offer extreme thread level parallelism while the CPUs maximize instruction-level parallelism. In general, GPUs are very well suited for packet processing applications as they offer data-parallel execution model. They can possess thousands of processing cores and they adopt single-instruction, multiple thread (SIMT) execution model where a group of threads (called waveforms in AMD and warp in NVIDIA hardware) execute concurrently [9]. Similarly, the FPGAs have a massive amount of parallelism built-in because they posses millions of Logic Elements (LEs) and thousands of DSP blocks. However, they have an increased program- ming complexity which is the main challenge for using an FPGA as an accelerator. The FPGAs are programmed with HDLs (Hardware Description Languages) such as Verilog and VHDL. Their pro- gramming, debugging and productivity difficulties are not negligible. However, there are HLS (High- Level Synthesis) tools that try to overcome this problem by allowing to program FPGAs in high-level programming languages.

2.4 Software implementations

In this section, we introduce seven free software implementations that are among the best known in this area of research, are in widespread use, or are highly performant. These solutions can be used either as a part of, or as the complete architecture of, a fast packet processing solution. We start with the Click Modular Router along with Click-based solutions such as RouteBricks and FastClick. Additionally, we describe the Netmap, NetSlice, PF_RING and DPDK solutions. Other solutions that are based on the Click Modular Router and that offload part of the processing onto specialized hardware are presented in the next section.

2.4.1 Click-based solutions

2.4.1.1 Click

Click [3] introduced one of the first modular software architectures used to create routers. Previously, routers had inflexible, closed designs, so it was not straightforward for network administrators to identify the interactions between different router functions. For the same reason, it was very difficult to extend the existing functions. The answer to such problems lies in the idea of building a router configuration from a collection of building blocks, or fine-grained components, called elements. Each building block represents a packet processing module. Connecting a collection of elements forms a directed graph. Thus, a router configuration can be extended by writing new elements or modifying existing ones.

Figure 2.2: Example of a Click IP router configuration. [3]

In Figure 2.2, an example of an IP router configuration is shown. The vertices of the directed graph represent elements in the Click router's configuration, and each edge represents a connection, that is, a possible path along which a packet may be transferred. The scheme has been simplified to show only one interface (eth0), but it can be extended with the appropriate Click elements to represent the case of multiple network interfaces. Each element in Figure 2.2 has its own functionality. First of all, the FromDevice element polls the receive DMA queue of its device and, if it finds newly arrived packets, pushes them through the configuration [3]. Afterward, the Classifier element classifies Ethernet packets into ARP queries, ARP responses and IP packets, and forwards them to the appropriate output port accordingly. The ARPResponder replies to ARP queries by sending ARP responses. The Paint element marks packets with an "integer" color, which allows the IP router to decide whether a packet is being sent over the same interface from which it was received [3]. Next, the Strip element strips the specified number of bytes from the beginning of the packet. After that, the CheckIPHeader block verifies that the properties of the header conform to the standard, such as the IP source address, the IP length fields and the IP checksum [3]. Then, the GetIPAddress element copies an address from the IP header into the annotation, a piece of information attached to the packet header which is not part of the packet data [3]. Moreover, the LookupIPRoute element looks up the routing table, using the IP address from the annotation, in order to find the appropriate output port. The DropBroadcasts element drops packets that arrived as link-level broadcasts [3]. Next, the CheckPaint element forwards a packet on its first output; if the packet has a specific paint annotation, it also forwards a copy of the packet to its second output. Afterward, the IPGWOptions element processes the IP Record Route and Timestamp options of the packet [3]; if the packet has invalid options, it is sent to the second port. The FixIPSrc element checks whether the packet's Fix IP Source annotation is set; in that case, it sets the IP header's source address field to the static IP address ip [3], and otherwise it forwards the packet unchanged. Then, the DecIPTTL element decrements the IP packet's time-to-live field, and the IPFragmenter fragments IP packets that are larger than the specified MTU size. The ARPQuerier uses the ARP protocol to find the Ethernet address that corresponds to the destination IP address annotation of each input packet [3]. Finally, the ToDevice element sends packets to a Linux device driver for transmission. Each element is implemented as a C++ object which may maintain private state, while the connections are implemented as pointers to element objects.
The essential properties of an element are:
• Element class - each element belongs to an element class, which defines the initialization procedure along with the code that is executed when a packet is processed by the element.
• Ports - each element can have an arbitrary number of ports, and each output port of one element is connected to the corresponding input port of another element.
• Configuration string - an optional string used to pass additional arguments to the element at router initialization.
• Method interfaces - one or more method interfaces can be supported by each element, either for packet transfer or for exporting some other relevant data such as queue length.
Connections in a Click router can be one of two types: push or pull. Between two push ports there is a push connection, and between two pull ports there is a pull connection. The difference between the two types of connections is that in a push connection the source element initiates the packet transfer, while in a pull connection the transfer initiator is the destination element. There are also agnostic ports, which act as push ports when connected to push ports and as pull ports when connected to pull ports. All queues are implemented as a separate Queue element, which allows explicit control over how packets are stored. Click uses a very simple configuration language which consists of two essential constructs: declarations and connections. Declarations create elements, while connections describe how they should be connected. Both a user-level driver and a Linux in-kernel driver can run Click router configurations. This router is modular and therefore supports many extensions like Differentiated Services, queueing, dropping, scheduling policies, etc. So it is possible to build either simple or complex packet processors by using a subset of the existing elements or by extending them with new functionalities. Even though the original Click system shared the Linux interrupt structure and device handling, since these two took up most of the time needed to forward a packet, it was decided that polling should be used instead of interrupts.
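As an illustration of the element abstraction and of the configuration language, a minimal counting element could look roughly as follows. The method names follow the public Click element interface described above, but exact signatures and macros can vary between Click versions, so this should be read as a sketch rather than as code taken from the Click distribution.

```cpp
#include <click/config.h>
#include <click/element.hh>
CLICK_DECLS

// A trivial element class: counts packets and forwards them unchanged.
class CountPackets : public Element {
  public:
    CountPackets() : _count(0) {}
    const char *class_name() const { return "CountPackets"; }  // element class
    const char *port_count() const { return "1/1"; }           // one input port, one output port
    const char *processing() const { return AGNOSTIC; }        // push or pull, decided by neighbors

    Packet *simple_action(Packet *p) {                          // called once per packet
        _count++;                                               // private per-element state
        return p;                                               // hand the packet to the next element
    }
  private:
    unsigned long _count;
};

CLICK_ENDDECLS
EXPORT_ELEMENT(CountPackets)

// A configuration is then written as declarations plus connections, for example:
//   src :: FromDevice(eth0);
//   cnt :: CountPackets;
//   src -> cnt -> Queue(1024) -> ToDevice(eth0);
```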

2.4.1.2 RouteBricks

RouteBricks [31] is a software router architecture which can run on multiple cores of a single server, or on multiple interconnected servers in order to leverage the performance of a software router. The idea behind RouteBricks is to start from a software router and try to optimize its packet-processing performance in order to attain speeds which are comparable to standard routers. Parallelizing across servers is used for dividing packet processing tasks among different servers and for their engagement in cluster-wide distributed switching. The following guarantees are required: 100% throughput (which means providing full line rate of R bps, if needed), fairness where each input port can get its fair share of the capacity of any of the output ports and avoiding packet reordering. However, certain limitations are imposed by commodity servers: limited rates of internal links (they can not be higher than the external line rate R), limited per-node processing rate and limited per node fanout. Direct VLB (Valiant Load Balancing) is a load-balancing routing algorithm that is used by Route- Bricks. The architecture presented in [31] relies on two key assumptions. The first assumption about RouteBricks router is that it can handle at least one router port of rate R [31]. A minimum rate at which the server needs to process packets is 2R in the case of uniform traffic matrix, or 3R in the worst case traffic matrix. The second assumption is that a practical server-based VLB implementation works inside theoretical limits such as per-server processing rate of 2R-3R. When performing parallelism within servers, multiple cores are used at the same time in order to increase the performance of the system. In [31] it has been shown that using multiple cores is not sufficient for attaining rates comparable with standard routers. In the first case, Xeon servers (with eight 2.4GHz cores in total) are used in a traditional shared bus architecture where the communication between the sockets, memory and I/O devices is routed over the shared front-side bus with the single external memory controller. In the second case, where the Nehalem [70] servers (with eight 2.8GHz cores in total) are used, this communication is done over a mesh of dedicated point-to-point links. Mul- tiple memory controllers are used, each integrated within sockets. It turns out that the first architecture has much lower aggregate memory bandwidth. In an earlier study done by the same authors [71] they found that the bottleneck is the shared bus which connects the CPU to the memory subsystem. This is why Nehalem architecture is used in RouteBricks, which implements parallelism at the CPUs coupled with parallelism in memory access. Multi-queue NICs are used in order to satisfy multiple requirements imposed when workload needs to be distributed among available cores. In fact, whenever a receive or transmit queue is accessed by multiple cores, each queue needs to be locked before access which is a problem in the case of a large 42 2.4. SOFTWARE IMPLEMENTATIONS number of packets. This is why each queue needs to be accessed by a single core. Secondly, in [31] it has been shown that each packet needs to be handled by a single core. As a matter of fact, they have shown that processing a packet in the pipeline across different cores compared to parallel processing (where each core does all the packet processing) gives significantly lower throughput. 
Batch processing is also important since it reduces the number of transactions on the PCIe and I/O buses and improves the performance by the factor of 6 for the configuration implemented in Route- Bricks. In performance evaluation, for different packet sizes, three types of applications are used. One is minimal forwarding which does not use routing table when forwarding traffic from port i to port j. The other two are IP routing and IPsec packet encryption. For every major system component, the upper bound on the per-packet load has been estimated as a function of the input packet rate. Major system components that have been tested are: CPUs, memory buses, socket-I/O links, the inter-socket link and the PCI-E buses connecting the NICs to the I/O hub. This allowed comparing actual loads to upper bounds and show which components are likely to be bottlenecks. It has been shown that for all three applications, CPU load reaches the nominal upper bound which means that the system bottleneck is CPU [31]. However, as the number of cores per CPU tends to increase over the years, it goes along with the need for the software routers. The results show that in the worst case where packet size is 64B, the throughput of a single server for minimal packet forwarding is 9.7Gbps, while for IP routing it is 6.35Gbps. A parallel router, which consists of 4 Nehalem servers which are connected in the mesh topology, is named RB4. This router manages to keep with the given workload at 35Gbps where each NIC performs near its limit of 12Gbps. In fact, at 35Gbps, each NIC handles approximately 8.75Gbps of traffic on its external port and approximately 3Gbps on its internal port.

2.4.1.3 FastClick

FastClick [27] is a solution which integrates both DPDK and Netmap in Click. In fact, the authors developed two versions of FastClick. The first one uses DPDK and the other one uses Netmap. For this reason, the FastClick can be considered as a fast path version of Click router in the way that it exploits the advantages of DPDK and Netmap in order to accelerate Click’s packet processing speed. The choice of DPDK is based on the fact that it seems to be the fastest solution in terms of I/O throughput [27]. Whereas Netmap is used for its fine-grained control of both RX and TX rings. Addi- tionally, Netmap has already been implemented for Click which provides the authors a base for their work. These solutions enhance Click by providing the following features: I/O batching, computation batching, multi-queue support, and zero-copy. The I/O batching is commonly used in packet processing frameworks as it allows to amortize the high per-packet overhead over the several packets. The costs of common tasks per each packet are reduced. For instance, the system call cost is amortized, while the instruction cache locality, the prefetching effectiveness and the prediction accuracy are improved [72]. Also the CPU consumption CHAPTER 2. STATE OF THE ART 43 on per-packet memory management is eliminated [33]. The I/O batching is achieved in FastClick thanks to both DPDK and Netmap which can receive and send packets in batches. For the computation batching, which is passing batches of packets between click elements and having each element working on the batches of packets instead of only one packet, the authors used a linked list and the click packets-annotation in order to stay Click compatible. Additionally, they chose the size of the computation batch as the total number of packets in the receive ring, therefore, the size evolves dynamically. This choice for the size of the compute batch reduces the latency of packets as the output operation will be done each time it receives a batch of packets instead of waiting for a certain number of packets. The batching of both I/O and computation improve throughput for both versions of FastClick (DPDK and Netmap). The support of multi-queue is achieved in different ways depending on the version of Fastclick used. The DPDK version can use a different number of queues in RX and TX whereas the Netmap version can not. This results in having opposite results for both versions. The DPDK version can have multiple TX queues and only one RX queue which improves its throughput. Whereas the Netmap version has a deteriorated throughput, having multiple RX queues forces the application to check every queue before processing the packets thus, inducing latency. The management of the zero-copy feature is also achieved differently based on the Fastclick ver- sion. The Netmap version possesses the ability to swap buffers from the transmit and receive rings. This means that when receiving a packet at the NIC, the packet is written in a buffer in the receiving ring and this buffer can then be swapped with an empty one in order to keep the receiving ring empty. By doing this swap of a buffer, there has been zero copy of the packet. For the DPDK version, DPDK already provides a swapped buffer which allows sending the received packet directly to the transmit ring instead of copying it. Therefore, the packet is never copied. Using this zero-copy feature in both versions increases their throughput. 
In addition to enhancing Click with these four features, the authors ran tests to verify whether they could further improve the throughput of FastClick. First, they checked the ring size of both the RX and TX queues. However, they found that this parameter does not significantly influence the performance of FastClick. Secondly, they modified the execution model. Instead of using a FIFO queue to store incoming packets, they chose a "full-push" model without queues, in which packets traverse the forwarding path without interruption and are managed by a single thread. In conclusion, the FastClick solution is a software solution based on Click. It integrates both DPDK and Netmap in Click and therefore provides I/O batching, computation batching, multi-queue support, and zero-copy to Click. Thanks to these features, FastClick improves Click's throughput performance.
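The idea of computation batching can be sketched independently of FastClick's actual implementation: a batch is simply a linked list of packets handed from one element to the next in a single call, so that fixed per-call costs are paid once per batch rather than once per packet. The types and names below are illustrative and are not FastClick's API.

```cpp
// Sketch of computation batching between two processing stages.
struct Pkt {
    Pkt *next;            // batches are chained through a per-packet annotation
    // ... headers, payload ...
};

template <typename PerPacketFn>
Pkt *process_batch(Pkt *head, PerPacketFn fn) {
    for (Pkt *p = head; p != nullptr; p = p->next)
        fn(p);            // per-packet work; call overhead and warm caches are amortized
    return head;          // the whole batch is handed to the next element in one call
}
```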

Figure 2.3: Netmap. [4]

2.4.2 Netmap

Netmap [4] is a framework which allows commodity hardware (without modifying applications or adding custom hardware) to handle millions of packets per second over 1-10 Gbit/s links. In [4] they claim to remove three main packet processing costs, namely: per-packet dynamic memory allocations (removed by preallocating resources), system call overheads (amortized over large batches) and memory copies (by sharing metadata and buffers between kernel and user-space), while the protection of access to device registers and other kernel memory areas still remains. The main goal of netmap is to build a fast path between the NIC and the applications [73]. The provided architecture is flexible, does not depend on specific hardware and is tightly integrated with existing operating system primitives. Netmap has been implemented in Linux and FreeBSD with minimal modifications for several 1 and 10 Gbit/s network adapters. For the minimum Ethernet frame size of 64 bytes, there are also 7 bytes of preamble, 1 byte for the start-of-frame delimiter and 12 bytes of interpacket gap, that is, 160 additional bits in total. This means that on a 10 Gbit/s link one can achieve a rate of 14.88 Mpps or, in other words, one packet is sent every 67.2 nanoseconds. According to [4], a speed of 14.88 Mpps, which represents the maximum packet rate for a 10 Gbit/s link, can be reached with a single core running at 900 MHz, and the same core can sustain well above the capacity of a 1 Gbit/s link when running at just 150 MHz. Each network interface can be switched between regular and netmap mode. Regular mode is the standard mode in which the NIC exchanges packets with the host stack. In netmap mode, the NIC rings are disconnected from the host network stack and packets are transferred through the netmap API (Application Programming Interface). An application can exchange packets with the host stack and with the NIC through netmap rings that are implemented in shared memory, as shown in Figure 2.3; a minimal receive loop over this API is sketched after the list below. Netmap uses different techniques in order to attain its high performance [4]:

• a metadata representation which is easy to use, lightweight, compact, and which hides device- specific features; • useful hardware features such as multiple hardware queues are supported; • zero-copy packet transfer between interfaces is supported along with a removal of data-copy costs by giving applications direct and protected access to packet buffers; • memory preallocation for the linear, fixed size packet buffers when the device is opened in order to save the cost of per-packet allocations/deallocations;
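A minimal receive loop over the netmap API, in the spirit of the examples distributed with netmap, could look as follows. Error handling and the transmit side are omitted, only the first RX ring is polled, and the interface name is only an example.

```cpp
#include <poll.h>
#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>

int main() {
    // Switch eth0 into netmap mode and map its rings into user space.
    struct nm_desc *d = nm_open("netmap:eth0", NULL, 0, NULL);
    struct pollfd pfd;
    pfd.fd = NETMAP_FD(d);
    pfd.events = POLLIN;

    for (;;) {
        poll(&pfd, 1, -1);                                   // one system call covers a whole batch
        struct netmap_ring *ring = NETMAP_RXRING(d->nifp, 0);
        while (!nm_ring_empty(ring)) {
            unsigned i   = ring->cur;
            char *buf    = NETMAP_BUF(ring, ring->slot[i].buf_idx);
            unsigned len = ring->slot[i].len;
            (void)buf; (void)len;                            // process the frame in place, no copy
            ring->head = ring->cur = nm_ring_next(ring, i);  // hand the slot back to the kernel
        }
    }
    return 0;
}
```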

2.4.3 NetSlices

NetSlice [5] is an operating system abstraction which processes packets in user-space and enables a linear increase of performance with the number of cores. Memory, multi-queue NIC resources and CPU cores are assigned exclusively (rather than time shared) in order to lower the interconnect and overall memory contention. In [5], the authors claim to closely match the performance of state-of-the-art in-kernel RouteBricks variants with their user-space packet processors built with NetSlice. Unlike alternative user-space implementations which rely on a conventional raw socket, NetSlice operates at nominal 10 Gbps network line speeds. The only requirement of NetSlice is a simple kernel extension which can be loaded at runtime; it does not depend on the hardware configuration and provides a replacement for conventional raw sockets. The main problem with raw sockets is that they were originally designed at a time when the ratio between CPU performance and network speed remained roughly constant. The introduction of many cores on the same silicon chip changed this paradigm, because the number of cores per chip has kept increasing while single-core performance has been stagnant for years. The NetSlice authors claim that:
• The raw socket abstraction gives user-mode applications no control over the physical resources involved in packet processing.
• Besides being simple and common to all types of sockets, the socket API is not efficient since it needs a system call for every packet send/receive operation.
• There is increased contention because packet processing hardware and software resources are loosely coupled.
• The path of a packet between a NIC and a user-space raw endpoint is too expensive.
NetSlice spatial partitioning of resources is shown in Figure 2.4. Spatial partitioning of hardware resources at coarse granularity is done in order to reduce interference/contention. In addition, fine-grained control of hardware resources is provided by the NetSlice API. NetSlice also provides a streamlined path for packets which move between user-space and the NICs. With spatial partitioning, network traffic is divided into "slices" and independent packet processing execution contexts allow parallelism and contention minimization. Each execution context is called a NetSlice.

Figure 2.4: NetSlice spatial partitioning. [5]

Each NetSlice consists of hardware components like NICs and CPUs, and software components like a user-mode task and an in-kernel network stack, which are tightly connected. Multiple transmit and receive queues are used in parallel by multiple cores because network speeds have continued to increase while CPUs have gained more cores rather than higher individual core performance. This is why each NetSlice consists of one such TX and one such RX queue per attached NIC and two (or more) tandem CPUs. NetSlice does not use zero-copy techniques but copies each packet once between user-space and kernel-space. Furthermore, it consists of a single module which can be loaded at runtime and which works out-of-the-box with vanilla Linux kernels running stock NIC device drivers. In the experimental setup from [5], NetSlice has been compared with the default in-kernel routing, an implementation of RouteBricks and the best configurations of pcap user-space solutions. It has been shown that when all CPUs and all NIC queues are used, NetSlice attains the nominal line rate of 9.7 Gbps for MTU packet size and MAC layer overhead, just like RouteBricks and kernel routing. When only a single NIC queue (per available NIC) is used, NetSlice reaches somewhat lower performance, but using more than one NetSlice easily attains the line rate.

2.4.4 PF_RING (DNA)

PF_RING [6] is a high-speed packet capture library that allows a commodity PC to perform efficient network measurement, enabling both packet and active traffic analysis and manipulation. The majority of networking tools (ethereal, tcpdump, snort) are based on the libpcap library, which provides a high-level interface to packet capture [74]. It allows capturing from different network media, provides the same programming interface on every platform, and implements advanced packet filtering capabilities based on Berkeley Packet Filtering (BPF) in the kernel for better performance.

Figure 2.5: PF_RING with DNA driver. [6]

Interrupt livelock [75] limits the possible performance of packet capturing. In fact, whenever a NIC receives a packet, it informs the OS by generating an interrupt, which turns out to take up the majority of CPU time in the case of a high traffic rate. This is why device polling was introduced. When the OS receives an interrupt from the NIC, it masks future interrupts generated by the NIC (so that the NIC cannot interrupt the kernel) and schedules periodic device polling to service its needs. When the NIC has been served by the driver, the NIC's interrupts are enabled once again. In [74] the author claims that the mmap-version of libpcap reduces the time needed for moving a packet from kernel to user-space but does not improve the journey from the NIC to the kernel. This is why the PF_RING socket was introduced. It allocates a memory buffer at socket creation and deallocates it when the socket is deactivated. Incoming packets are copied to the buffer and are accessible by user-space applications. In this way, system calls are avoided when reading packets. For reaching maximum packet processing speed, PF_RING can use zero-copy drivers (PF_RING ZC) which allow achieving 1/10 Gbit line-rate packet processing (both TX and RX) for any packet size. PF_RING consists of:
• the PF_RING kernel module - responsible for low-level packet copying to the PF_RING circular queues;
• the user-space PF_RING SDK - provides transparent PF_RING support to user-space applications;
• PF_RING-aware drivers - allow additional improvements in packet capturing by efficiently copying packets from the driver to the PF_RING without passing through the kernel.
PF_RING implements a new type of network socket (PF_RING), which allows user-space applications to communicate with the PF_RING kernel module. In the case of ZC drivers, packets are read directly from the network interface; both the Linux kernel and the PF_RING module are bypassed, and therefore the CPU utilization for copying packets to the host is 0%. Moreover, PF_RING DNA (Direct NIC Access) [67, 69], shown in Figure 2.5, uses a memory ring allocated by the device driver to host pointers to incoming packets, instead of using a per-socket PF_RING circular buffer. A modified network driver allocates a memory ring using contiguous non-swappable memory. The network adapter copies the received packets to the ring, and it is up to the user-space application, which manages the buffer, to read packets and update the index of the next available slot for the incoming packet. PF_RING maps the network card registers and the packet memory ring into user-space, as all communication with the network adapter is done through DMA. PF_RING polls packets from NICs by using Linux NAPI [6]. NAPI [76] copies packets from the network adapter to the PF_RING circular buffer, and the user-space application reads packets from the ring. In PF_RING DNA, the NIC memory and registers are mapped into user-space so that the packet copy from the network adapter to the DMA ring is done by the NIC NPU (Network Processing Unit) instead of NAPI. In this way, the CPU is used only for consuming packets and not for transferring them from the NIC.
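A minimal capture loop over the PF_RING user-space API could look roughly as follows. The calls shown (pfring_open, pfring_enable_ring, pfring_recv) follow the public PF_RING SDK, but options and error handling are omitted and the device name is only an example; this is a sketch, not a complete tool.

```cpp
#include <pfring.h>

int main() {
    // Open eth0 through a PF_RING socket: snaplen 1536 bytes, promiscuous mode.
    pfring *ring = pfring_open("eth0", 1536, PF_RING_PROMISC);
    if (ring == NULL) return 1;
    pfring_enable_ring(ring);

    for (;;) {
        u_char *pkt = NULL;
        struct pfring_pkthdr hdr;
        // buffer_len = 0 asks for a pointer into the ring (no copy into a user buffer);
        // the last argument blocks until a packet is available.
        if (pfring_recv(ring, &pkt, 0, &hdr, 1) > 0) {
            (void)pkt; (void)hdr;   // analyse or forward the packet here
        }
    }
    pfring_close(ring);
    return 0;
}
```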

2.4.5 DPDK

DPDK [28] is a set of data plane libraries and drivers which are used for fast packet processing. It has provided support for Intel x86 CPUs, which is now extended to IBM Power 8, EZchip TILE- Gx and ARM. DPDK’s main goal is to provide a framework for fast packet processing in data plane applications that is simple and complete. This allows a transition from separate architectures for each major workload to a single architecture which unites workloads in a more scalable and simplified solution [7]. The 6WINDGate [77] solution implements an isolated fast path networking stack which relies on DPDK in order to accelerate the packet processing speed. For packet processing, DPDK can use either run to completion or pipeline model. There is no scheduler and all devices are accessed by polling, because otherwise interrupt processing would impose performance overhead. Environment Abstraction Layer (EAL) [28] provides a generic interface and hides environment specifics from the libraries and applications. EAL allows gaining access to low- level resources such as memory space and hardware. Furthermore, an initialization routine decides how to allocate these resources. Major DPDK components are shown in Figure 2.6. DPDK runs as a user-space application using the pthread library, which removes the overhead of copying the data between the kernel-space and user-space. Performances are also improved by using cache alignment, core affinity, disabling interrupts, implementing huge pages to reduce translation lookaside buffer (TLB), prefetching and many other concepts [7]. Key software components of DPDK [7]: • Memory Pool Manager - responsible for allocating NUMA-aware pools of objects in mem- ory. In order to improve performance by reducing TLB misses, pools are created in huge-page memory space and rings are used for storing free objects [7]. CHAPTER 2. STATE OF THE ART 49

Figure 2.6: Major DPDK components. [7]

• Buffer Manager - greatly reduces the amount of time needed for allocating and de-allocating buffers.
• Queue Manager - implements safe lockless queues, instead of spinlocks, which allows software components to process packets while avoiding redundant wait times.
• Flow Classification - combines packets into flows, which enables faster processing and improved throughput, and also provides an efficient mechanism for hash generation.
• Poll Mode Drivers - drivers for 1GbE and 10GbE Ethernet controllers significantly improve the speed of the packet pipeline by working without asynchronous, interrupt-based signaling mechanisms.
The authors claim that DPDK can improve packet processing throughput by up to ten times and that it is possible to achieve over 80 Mpps throughput on a single Intel Xeon processor (which could be doubled with a dual-processor configuration).
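A typical DPDK run-to-completion loop polls a port with rte_eth_rx_burst and transmits with rte_eth_tx_burst. The sketch below omits EAL argument handling, port and queue configuration, and mempool creation, so it should be read as an outline of the polling model rather than a complete application.

```cpp
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

int main(int argc, char **argv) {
    rte_eal_init(argc, argv);                       // EAL: hugepages, lcores, PCI devices
    // ... mempool creation, rte_eth_dev_configure(), queue setup and dev_start() omitted ...
    const uint16_t port = 0, queue = 0;
    struct rte_mbuf *pkts[BURST];

    for (;;) {                                      // run-to-completion, pure polling: no interrupts
        uint16_t n = rte_eth_rx_burst(port, queue, pkts, BURST);
        if (n == 0) continue;
        // ... per-packet processing on the burst would go here ...
        uint16_t sent = rte_eth_tx_burst(port, queue, pkts, n);
        for (uint16_t i = sent; i < n; i++)         // free what the TX ring could not accept
            rte_pktmbuf_free(pkts[i]);
    }
    return 0;
}
```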

2.5 Hardware implementations

In this section, we introduce two types of hardware implementations: the GPU-based solutions and the FPGA-based solutions. We emphasize here that we classified as hardware implementations all the solutions that rely on specific hardware such as GPUs and FPGAs, while the solutions that rely solely on the processing power of CPUs have been described in the previous section. It is for this reason that several solutions based on the Click Modular Router are presented in this section, as they offload part of the processing onto specialized hardware.

2.5.1 GPU-based solutions

2.5.1.1 Snap

Snap [32] is a packet processing system based on Click which offloads some of the computation load on GPUs. Its goal is to improve the speed of Click packet processing by offloading heavy com- putation processes on GPUs while preserving Click’s flexibility. Therefore, the authors designed Snap to offload only specific elements to GPUs which, if not offloaded, could have created bottlenecks in the packet processing. The processing performed on the GPU is considered to be of the fast path type, while the slow path processing is performed on the CPU. To achieve that, Snap adds a new type of "batch" element and a set of adapter elements which allow the construction of pipelines toward GPUs. However, the standard Click pipeline between elements only lets through one packet at a time which is not enough to benefit from GPU offloading. As a matter of fact, offloading only one packet on a GPU results in having a lot of overhead and additionally, a GPU architecture is not adapted to such tasks. Therefore, Snap modifies Click pipeline in order to transmit a batch of packets thus, increasing the performance gain with offloading. The authors define and implement new methods to exchange data between elements using a new structure called PacketBatch. Thanks to these modifications, Snap provides a GPU-based parallel element which is composed of a GPU part and a CPU part. The GPU part is the GPU kernel code and the CPU part is the part which receives PacketBatches and sends them to the GPU kernel. To improve Snap interaction with the GPU, the authors developed a GPURuntime object programmed and managed with NVIDIA’s CUDA toolkit [78]. However, to send and retrieve batches of packets to the GPU, Snap needs two new elements called Batcher and Debatcher. The Batcher collects packets and sends them in batches to the GPU whereas the Debatcher does the inverse, it receives batches of packets and sends them one at a time back to the CPU. Both the Batcher and Debatcher elements manage the necessary copies of data between the host memory and the GPU memory in order to facilitate the use of Snap and also to improve packet copy times. To benefit from GPU offloading, the GPU has to manage several or all the packets of a batch in parallel. Parallel processing often reorders the packets which is not desirable for TCP or streaming performances. To prevent this issue, Snap uses a GPUCompletionQueue element which waits for all packets of a batch to be processed before sending the batch to the Debatcher. CHAPTER 2. STATE OF THE ART 51

Snap faces another issue. In Click not all packets follow the same path element-wise, there- fore, packets from a batch might have different paths. This separation of packets can happen before reaching the GPU or inside the GPU. The authors found that there are two main classes of packet divergence which are either routing/classification divergence or exception-path divergence. The rout- ing/classification divergence results in having a fragmented memory in the pipeline and happens mostly before reaching the GPU. This is a problem which implies that the unnecessary packets will be copied in the GPU memory which is time-consuming because of the scarce PCIe bandwidth. To solve this issue, Snap only copies the necessary packet in a contiguous memory space at the host level to create a batch of packets which is then sent to the GPU memory using a single transfer through the PCIe. Regarding path divergence happening inside the GPU, Snap solves this issue by attaching predicate bits to each packet. This way the path divergence is only treated once the batch is back in the CPU part thanks to the predicate bits which indicate which element should process the packet next. In order to further improve Snap performances, the authors used packet slicing to reduce the amount of data that needs to be copied between the host memory and the GPU memory. This slicing mecha- nism, defined in PacketBatch, allows GPU processing elements to operate on specific regions of data of a packet. For example, to realize an IP route lookup only the destination address is needed, therefore, only this IP address has to be copied to the GPU which reduces by at least 94% the size of data being copied when the packet is 64-bytes long. Snap, thanks to its improvement over Click, enables fast packet processing with the help of GPU offloading. As a matter of fact, the authors realized a packet forwarder with Snap and its perfor- mance was 4 times better than standard Click results. Nevertheless, some Snap elements such as the HostToDeviceMemcpy and the GPUCompletionQueue are GPU-specific which means that Snap is not compatible with all GPUs.

2.5.1.2 PacketShader

PacketShader [8] is a software router framework which uses Graphic Processing Units (GPUs), in order to alleviate the costly computing need from CPUs, for fast packet processing. Actually, this solution allows to take advantage of fast path processing of packets by exploiting the GPU’s processing power. In comparison to CPU, GPU cores can attain an order of magnitude higher raw computation power and they are very well suited for the data-parallel execution model which is typical for the majority of router applications. PacketShader idea consists of two parts: • optimization of I/O through the elimination of per-packet memory management overhead and batch packet processing • offloading of core packet processing tasks to GPUs and the use of massive parallelism for packet processing In [8] it has been shown that the peak performance of one NVIDIA GTX480 GPU is closely comparable to ten X5550 processors. However, in the case of a small number of packets in GPU parallel processing, CPU shows better performance since GPUs prioritize high-throughput processing of many parallel operations over the execution of a single task with low latency, as CPU does [79]. 52 2.5. HARDWARE IMPLEMENTATIONS

Figure 2.7: PacketShader software architecture. [8]

The Linux network stack uses packet buffers as a basic message unit between network layers. There are two buffers, one used for metadata called skb, and a buffer for packet data. Two main problems exist here. One is the buffer allocation and deallocation which happens tens of millions of times per second in multi-10G networks. The other is metadata size in skb which is long (it contains information required by protocols from different layers) and it is overkill for 64B packets [8]. As a solution for the first problem huge packet buffer is introduced in [8]. The idea is to allocate two huge buffers for packet data and metadata, instead of having to allocate buffers for each packet individually. Both buffers contain fixed-size cells and each cell corresponds to one packet in the RX queue. In order to reduce per-packet processing overhead, batch processing of multiple packets is intro- duced. For batch processing, which can be done in hardware, device driver or application, it is shown in [8] that the throughput increases with the number of packets in the batch. There are two issues which exist in packet I/O on multi-core systems: load balancing between CPU cores and linear performance scalability with additional cores [8]. The solution for these issues is to use multiple receive and transmit queues through Receive Side Scaling (RSS). NIC can send different packets on different RX queues by hashing 5-tuples (source and destination IP addresses, their ports and protocol). Then each of the TX or RX queues corresponds to one of the CPU cores which have exclusive access to their appropriate queues. According to tests performed in [8], node-crossing memory access has 40-50% increased access time and 20-30% lower bandwidth compared to in-node access. For this reason two decisions were taken in NUMA (Non-Uniform Memory Access) system. First, all data structures must be placed in the same node where they are used (huge packet buffers, packet descriptor arrays, metadata buffers and statistics data of NICs are placed in the same node as the receiving NICs of those packets). The second one is to remove node-crossing I/O transactions which are caused by RSS. It turns out that there is 60% performance improvement with NUMA-aware data placement and I/O compared to the NUMA-blind CHAPTER 2. STATE OF THE ART 53 packet I/O. With all the previously described techniques, with RX and TX together, results show that the minimal forwarding performance is above 40Gbps for packet sizes ranging from 64 to 1514 bytes. The simplified architecture of PacketShader is shown in Figure 2.7 (implementation of Packet- Shader is marked by grey blocks). In order to avoid situations where multiple CPU threads access the same GPU and cause frequent context switching overheads, CPU threads are divided into worker and master threads. Worker threads request from master to act as a proxy for communication with the GPU and are responsible for packet I/O. A master thread has an exclusive communication with the GPU in the same node for acceleration. Quad-core CPU of each node runs three worker threads and one master thread. Packet processing in PacketShader is divided into three phases [8]: • Pre-shading - Worker threads get chunks of packets from their own RX queues, classify them for further processing with GPU and drop the malformed ones. Then they build data structures in order to feed input data to the input queues of their master threads. 
• Shading - Input data from host memory is brought by the master thread to GPU memory, then it launches the GPU kernel and transfers back the results from GPU to host memory. After this, the results are placed back to the output queue of the worker thread for post-shading. • Post-shading - Worker thread gets the results from its output queue and modifies, duplicates or drops the packets in the chunk which depends on the processing results. In the end, worker thread splits the packets in the chunk into destination ports for transmission of packets.
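The multi-queue dispatch used here (hashing the 5-tuple through RSS so that all packets of a flow land on the same RX queue and are therefore handled by the same core) can be illustrated with a toy software version. Real NICs compute a Toeplitz hash with a configurable key, so the hash function and names below are purely illustrative.

```cpp
#include <cstdint>

// Toy stand-in for RSS: hash the 5-tuple and map the flow to one of N RX queues.
struct FiveTuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

static uint32_t flow_hash(const FiveTuple &t) {
    uint32_t h = 2166136261u;                       // FNV-1a over the tuple fields
    auto mix = [&](uint32_t v) {
        for (int i = 0; i < 4; i++) { h ^= (v >> (8 * i)) & 0xFF; h *= 16777619u; }
    };
    mix(t.saddr);
    mix(t.daddr);
    mix((uint32_t(t.sport) << 16) | t.dport);
    mix(t.proto);
    return h;
}

unsigned pick_rx_queue(const FiveTuple &t, unsigned num_queues) {
    return flow_hash(t) % num_queues;               // each queue is served by exactly one core
}
```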

2.5.1.3 APUNet

APUNet [9] is an APU-accelerated network packet processing system that exploits the power of integrated GPUs for parallel packet processing while using the CPU for scalable packet I/O. The term APU stands for Accelerated Processing Unit and is mostly used by AMD to denote their series of 64-bit microprocessors that integrate a CPU and a GPU on a single die. The authors of APUNet re-examined the claims from the article [80]. The first claim says that for a majority of applications, the benefits of a GPU come from fast hardware thread switching, which transparently hides memory access latency, rather than from the GPU's parallel computation power. For a profound discussion on the latency hiding capability of GPUs, readers can refer to the dissertation available in [81]. The second claim says that for many network applications, applying the optimization techniques of group prefetching and software pipelining to CPU code often makes it more resource-efficient when compared to the GPU-accelerated version. However, the authors of APUNet [9] have found that the GPU's parallel computation power is important for increasing the performance of many compute-intensive algorithms, such as cryptographic algorithms (e.g. RSA, SHA-1/SHA-2), that are used in network applications. They have also found a reason for the relative performance advantage of CPUs over GPUs. They claim that it comes from the PCIe communication bottleneck between the CPU and a discrete GPU, rather than from an insufficient capacity of GPUs. For instance, the PCIe bandwidth is an order of magnitude smaller than the memory bandwidth of the GPU. Consequently, they propose the use of integrated GPUs in APU platforms as an economical packet processing solution.

Figure 2.8: APUNet architecture: M (Master), W (Worker), RQ (Result Queue) [9]

Besides the advantages of APUs for packet processing applications, there are some drawbacks that need to be addressed. Unlike discrete GPUs, integrated GPUs do not benefit from the high-bandwidth memory of GDDR (Graphics Double Data Rate). The advantage of GDDR compared to DDR is that it has a much higher bandwidth (with a wider memory bus) and consumes less power. In fact, the integrated GPU needs to share the DRAM with the CPU, and they have to contend for the shared memory bus and controller, which impacts the efficiency of memory bandwidth usage and can degrade performance. For this reason, the APUNet solution uses a zero-copy technique in all stages of packet processing: packet reception, processing on the CPU and GPU, and packet transmission [9]. Additionally, in order to obtain low-latency communication between the CPU and the GPU, APUNet performs persistent GPU kernel execution. In this way, the GPU threads keep running in parallel over a continuous input packet stream. Furthermore, a solution to the problem of cache coherency between the CPU and the GPU is proposed through a technique that synchronizes cache memory access by the integrated GPU, making the processing results of the GPU available to the CPU at low cost [9]. Just like PacketShader [8], APUNet uses a single-master, multiple-worker framework, as shown in Figure 2.8. A dedicated master communicates only with the GPU, and the workers perform the packet I/O and request, through the master, the packet processing with the GPU. The worker threads and the master thread are each affinitized to one of the CPU cores. A packet memory pool which stores an array of packet pointers and the packet payloads is shared between the CPU and the GPU. Batches of packets are read using DPDK, and worker threads process a portion of the packets that is load-balanced through the RSS technique. The master notifies the GPU about the available packets which have been previously validated by the worker threads. After having processed the packets, the GPU notifies the master, which notifies the worker threads of the results, and afterward the packets are transmitted to the appropriate network port.

2.5.1.4 GASPP

GASPP [45] is a programmable network traffic processing framework that was made for modern GPUs. Many of the typical operations used by different types of network traffic processing applications are integrated into this solution as GPU-based implementations. This allows the applications to scale in terms of performance and to process infrequent operations on the CPU. The problem with the use of GPUs for packet processing applications is that programming abstrac- tions and GPU-based libraries are missing even for the most simple tasks [82]. Moreover, packets need to be transferred from the NICs to the user-space context of the application and after that through the kernel in order to get them to the GPU. For this reason, an increased effort is needed to develop high-performance GPU-based packet processing solutions even for the simpler cases. The authors of GASPP claim that their solution allows to create packet processing solutions flexibly and efficiently. They have developed mechanisms that avoid redundant data transfers by providing memory context sharing between the NICs and a GPU. Additionally, efficient packet scheduling allows better utiliza- tion of the GPU and the shared PCIe bus. The main advantages of GASPP are [82]: • purely GPU-based implementation of a TCP stream reconstruction and flow state management • packet scheduling technique that tackles load imbalance across GPU threads and control flow irregularities • zero-copy mechanism between a GPU and the NICs which improves the throughput between the devices Stateful Protocol Analysis component of GASPP is conceived for maintaining the state of TCP connections and reconstructing the application-level byte stream. This is performed by merging packet payloads and reordering out-of order packets [82]. All the states of TCP connections are saved in the global device memory of the GPU. Despite the fact that batch processing handles out-of-order packets from the same batch, it can not handle the out-of order problem for the packets from different batches. In order to be able to deal with TCP sequence hole scenarios, GASPP processes exclusively the packets that have sequence numbers lower or equal to the connection’s current sequence number. GASPP uses a single buffer for efficient data sharing between a GPU and the network interfaces by adjusting the Netmap module. This allows to avoid costly packet copies and context switches [82]. The authors claim that GASPP achieves multi-gigabit forwarding rates for the computationally in- tensive network operations like intrusion detection, stateful traffic classification and packet encryption [82].

2.5.2 FPGA-based solutions

2.5.2.1 ClickNP

ClickNP [13] represents an FPGA-accelerated platform for high performance and highly flexible Network Function (NF) processing on commodity servers. In order to enable multi-tenancy in the 56 2.5. HARDWARE IMPLEMENTATIONS cloud and provide security and performance isolation, each tenant is deployed in virtualized network environment. Software NFs on servers are used for maximizing the flexibility because conventional hardware-based network appliances are not flexible. Nevertheless, there are two main drawbacks of software NFs. First one is the limited capacity of software packet processing because usually multiple cores are needed to achieve a rate of 10 Gbps. The second one is large and highly variable latency. In the case of ClickNP, FPGA is used to overcome the limitations of software NFs even though there already exist some proposals which accelerate NFs by using GPUs or Network Processors (NP). Authors claim that FPGA is more power efficient compared to GPU and that it is more versatile than NPs because it can be virtually reconfigured with any hardware logic for any service. Despite that FPGAs are not expensive, the main challenge that stays behind FPGAs is their pro- grammability. FPGAs are programmed in low-level hardware description languages (HDLs) like Ver- ilog and VHDL which are complex, result in low productivity and are prone to debugging difficulties. Nevertheless, ClickNP tackles the programming challenges with 3 approaches. First, ClickNP has a modular architecture just like the previously described Click modular router and each complex network function is decoupled and composed of well-defined elements. Second, elements of ClickNP are written with a high-level C-like language and they are cross-platform [13]. ClickNP elements can be compiled into binaries on CPU or low-level HDL, by using high-level synthesis (HLS) tools. The third thing is that high-performance PCIE I/O channel is used between CPU and FPGA. This channel has low latency and high throughput which allows joint processing on CPU and FPGA, by arbitrarily partitioning the processing between the two. The FPGA in this case is used as a fast path where the advantage is the increase in the operations processing speed. The main goal of ClickNP was to build a platform which is flexible, modular, reaches high per- formance with low latency and supports joint CPU/FPGA packet processing. The authors of ClickNP [13] claim to achieve packet processing throughput of up to 200 million packets per second which is 10 times better throughput compared to state-of-the-art software NFs on CPU and 2.5 times better throughput compared to the case of CPU with GPU acceleration. ClickNP also achieves the ultra-low latency of 2µs which is 10 times and 100 times lower than state-of-the-art software NFs on CPU and the case of CPU with GPU acceleration, respectively. The reason that the GPU in combination with CPU produces higher latency is further discussed in subsection 3.5.3.

2.5.2.2 GRIP

In [10] the authors point out that transmitting or receiving data at gigabit speeds already fully mo- nopolize the CPU, therefore, it is not possible to realize any additional processing of these data without degrading throughput. If data were to be processed at either the application layer or the network layer then the CPU should share its computation time between transmitting/receiving data and processing the data which would result in a lower throughput. In order to provide data processing possibilities without degrading throughput, the authors propose an offloading architecture whose goal is to offload a signif- icant amount of the network processing to a network interface. In this way, the host CPU can focus on processing the data. The authors have prototyped the GRIP (Gigabit Rate IPSec) card which alleviates the load of the CPU. This card is based on an FPGA and a commodity Gigabit Ethernet MAC [10]. The CHAPTER 2. STATE OF THE ART 57

Figure 2.9: Block diagram of the SLAAC-1V architecture modified for GRIP. [10]

The card is a SLAAC-1V based on the Xilinx Virtex architecture, which provides three user-programmable Virtex FPGAs. In Figure 2.9 we can see these three FPGAs as X0, X1 and X2; we can also see, at the top, the GRIP card which provides the Gigabit Ethernet I/O extension for the SLAAC-1V. The GRIP card focuses on packet reception and filtering, mostly for the jumbo frames needed for IPSec. After this filtering, the packet is sent to the X0 chip, which plays the role of the bridge between the other chips. Among those chips, X1 manages encryption for outbound packets and X2 manages decryption for incoming packets. However, the most important part of the GRIP solution for achieving fast packet processing is the Direct Memory Access (DMA) engine located in the X0 chip. The original SLAAC-1V DMA engine has been modified in four ways. The first modification was performed in order to work with 64/33 or 64/66 PCI, thus reaching bidirectional processing of 1 Gb/s. The second modification added deep scatter-gather tables to allow 255 independent packets to be processed between host interrupt responses. The third modification added a 64-bit 8-way barrel rotator in order to perform transfers with arbitrary byte alignment. Lastly, logic was added to the DMA engine to generate the framing bits for packet boundaries.

On the software side, GRIP can be divided into two parts: a generalized driver on the one hand and a modified kernel/application function on the other. The role of the generalized driver is to make the GRIP card appear as a standard Gigabit Ethernet network interface to the operating system. The modified kernel part is restricted to the FPGAs on the card. It is this part of the software that manages the application-specific offloading, meaning that in the GRIP project this code manages the offloading of all the IPSec protocol layer encryption and decryption. The GRIP solution is a proof of concept showing that hardware-based offloading can improve packet processing speed and therefore overall throughput. However, the GRIP architecture focuses only on offloading the IPSec protocol; the authors plan to improve the GRIP architecture by adding more network function offloading possibilities.

2.5.2.3 SwitchBlade

SwitchBlade [11] represents a platform for rapid prototyping and deployment of custom protocols on programmable hardware. It was developed out of the need for a single platform that provides fast prototyping of new protocols while, at the same time, reaching a performance level that is difficult to attain with purely software router deployments. Although other solutions target the same need for programmable and fast routers, the authors claim that they fall short for different reasons such as portability, scalability and the cost of high-speed chips. The wire-speed performance and ease of programming provided by SwitchBlade allow rapid prototyping of custom data-plane functions that can be directly deployed into a production network. SwitchBlade addresses several challenges [11]:

• Design and implementation of a customizable hardware pipeline - SwitchBlade's packet processing pipeline consists of hardware modules that implement common data-plane functions. A subset of these hardware modules can be selected on the fly by new protocols, without resynthesizing hardware [11].

• Seamless support for software exceptions - when custom processing elements cannot be implemented in hardware, SwitchBlade needs to be able to invoke software routines for processing.

• Resource isolation for simultaneous data-plane pipelines - multiple protocols can run in parallel on the same hardware. Each data plane is called a VDP (Virtual Data Plane) and each one is provided with a separate forwarding table.

• Hardware processing of custom, non-IP headers - SwitchBlade provides modules that process the appropriate header fields in order to make forwarding decisions.

There are three main design goals, sorted by priority [11]:

1. Rapid development and deployment on fast hardware - the design of new protocols often requires changes in the data plane, which usually results in a performance drop when the implementation is done entirely in software. As a result, such protocols cannot be evaluated at the data rates of production networks. In order to provide wire-speed performance for designers of new protocols, SwitchBlade is implemented using NetFPGA, but it can be implemented with any FPGA. NetFPGAs are programmable, they are not tied to specific vendors and they provide acceptable speeds.

2. Customizability and programmability - since new protocols share common data-plane extensions, and in order to shorten the turnaround time for hardware-based implementations, SwitchBlade provides a rich set of extensions which are available as modules and allows protocols to dynamically choose the needed subset of modules. SwitchBlade modules are programmable and can operate on any offset within packet headers. New modules can be either added in hardware or implemented as exception handlers in software.

Figure 2.10: SwitchBlade Packet Processing Pipeline. [11]

3. Parallel custom data planes on a common hardware platform - SwitchBlade provides the possibility of running customized data planes in parallel, where each data plane is called a Virtual Data Plane (VDP). Each VDP has its own virtual interfaces and forwarding tables. Per-VDP rate control is used in order to provide isolation between VDPs. SwitchBlade ensures that data planes do not interfere with each other even though they share hardware modules.

Several unique design features allow for rapid development and deployment of new protocols with wire-speed performance. The SwitchBlade pipeline architecture consists of four main stages, where each stage contains one or more hardware modules, as can be seen in Figure 2.10. A pipelined architecture is used because it is the most straightforward one and NetFPGA also has a pipelined architecture. In the already mentioned custom VDPs, each VDP can provide custom modules or share modules with other VDPs. Shared modules are not replicated on the hardware in order to save resources. VDP identifiers are included in software exceptions, which makes it easy to separate software handlers that belong to different VDPs. The behavior of existing modules can be changed in the case when different VDPs need to use the same postprocessing module on different parts of a packet header, for different protocols, at the same time. In order to avoid wasting resources by introducing the same module twice, wrapper modules are used. They can customize the behavior of existing modules within the same data word and for the same length of data.

Software exceptions are used to avoid implementing complex forwarding in hardware when it is too expensive or when it only needs to be performed on a minority of packets whose processing requirements differ from the rest. Development, in this case, can be faster and less expensive if those packets are processed on the CPU.
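To make the idea of per-VDP processing on custom header fields more concrete, the following is a purely illustrative C sketch (it is not SwitchBlade's actual Verilog module or its API, and all names are hypothetical): a module extracts a field at a VDP-specific byte offset and looks it up in that VDP's private forwarding table, falling back to a software exception when the field lies outside the packet.

    #include <stdint.h>
    #include <string.h>

    #define SW_EXCEPTION 0xFF   /* sentinel: punt the packet to software */

    /* Hypothetical per-VDP state: the offset of the custom header field used
     * for forwarding and a private table mapping a 16-bit hash to a port. */
    struct vdp {
        uint16_t field_offset;               /* byte offset inside the packet */
        uint8_t  port_table[UINT16_MAX + 1]; /* per-VDP forwarding table      */
    };

    /* Extract a 32-bit field at the VDP-specific offset and return the output
     * port chosen by that VDP's own table; isolation between VDPs follows
     * simply from each VDP having its own struct vdp instance. */
    static uint8_t vdp_forward(const struct vdp *v, const uint8_t *pkt, uint16_t len)
    {
        uint32_t field;

        if ((uint32_t)v->field_offset + sizeof(field) > len)
            return SW_EXCEPTION;             /* handled by the software slow path */

        memcpy(&field, pkt + v->field_offset, sizeof(field));
        return v->port_table[(field ^ (field >> 16)) & 0xFFFF];
    }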

2.5.2.4 Chimpp

Chimpp [41] is a development environment for reconfigurable networking hardware that is based on the Click modular router and that targets the NetFPGA platform. The Click router's modular approach, with a simple configuration language, is used for the design of hardware-based packet processing systems. The Click router can be used in combination with Chimpp in order to provide a highly modular, mixed software and hardware design framework [41]. Despite the fact that designing FPGAs is easier than designing ASICs, the typical tools used for FPGA design are not familiar to many network experts and it takes considerably more time to develop packet processing solutions with FPGAs. The main goal of Chimpp is to provide a development environment that decreases the effort related to the integration and extension of packet processing solutions. It uses a high-level Click-like configuration language that makes it possible to build a variety of interesting designs without the use of Verilog [41]. The main contributions of Chimpp are:

• generation of a top-level Verilog file from a high-level script [41];
• definitions of elements and their interfaces with a simple XML syntax;
• a framework that allows systems to be designed with Click software elements and Chimpp hardware elements;
• a network simulator that integrates combined hardware and software simulation.

The Click part can function either as a control plane or as a slow path that handles uncommon traffic. Additionally, hardware elements can be converted to software ones when a hardware implementation is unfeasible or the part of processing in question is not performance-critical. Chimpp provides a simple language that allows its elements to be instantiated and interconnected at a high level. Scripts written in this simple language, along with libraries of existing elements, allow a top-level Verilog file to be generated for a given design [41]. The NetFPGA card appears to the host's OS as a NIC, and therefore the Click router can easily send and receive packets from the NetFPGA as it can use the appropriate interface name. Additionally, software elements can interact with hardware registers by using a known offset from the base address, which allows specific features to be co-designed using software and hardware elements.

2.5.3 Performance comparison of different IO frameworks

In [83], three I/O frameworks have been compared: DPDK, PF_RING and netmap. Four different hardware characteristics limit packet processing performance:

• the CPU, which is considered to be the dominating bottleneck;
• the NIC's maximum transfer rate (determined by the Ethernet standard);
• the version of PCI Express used to connect the NIC to the rest of the system;
• the RAM, which could restrict network bandwidth.

The first two characteristics are more important, as they are the first to limit packet processing speed. The authors' conclusion is that as long as the processing cost per packet is small enough, the packet processing rate only depends on the NIC limitations (set by the Ethernet standard). As the processing cost per packet increases, the limitation is no longer imposed by the NIC interface but rather by the CPU, which becomes fully loaded and, consequently, limits the final throughput. In [4], the author divides the cost of moving a packet between the application and the network card into per-byte and per-packet costs. The first relates to the number of CPU cycles necessary to move data to/from the buffers of the network interface. In the case of netmap, this number is zero, as these buffers are exposed to the application and there are no per-byte system costs. However, in the general case of packet-I/O APIs, a data copy is needed from/to user space and, accordingly, there is a per-byte CPU cost. The per-packet cost is the dominating one, and the most demanding situation in terms of system load is when the packets are of minimum size (64 B). According to [83], DPDK has a lower CPU cost per packet forwarding operation than PF_RING which, in turn, has a lower cost than netmap.

When comparing the influence of batch sizes on throughput, DPDK turns out to be the best solution, reaching its highest throughput with a batch size of 32 packets. PF_RING almost reaches the same performance with the same number of packets in a batch. On the contrary, netmap reaches its highest throughput with 128 packets in a batch, but with lower throughput than the former two [83]. The same authors compared the three solutions in terms of latency for batch sizes ranging from 8 to 256 packets. In all cases, PF_RING produces the smallest latency. DPDK gives a very close result, while netmap has significantly higher latency. From a batch size of 16, the latency gradually increases with the batch size for DPDK and PF_RING; for netmap, the opposite happens. Nevertheless, for a batch size of 256 packets, netmap reaches the latency of the other two frameworks.

The I/O model implemented in Windows is conceived as a general I/O platform that supports multiple media types with different characteristics. With the introduction of network virtualization technology, it is difficult for general-purpose network stacks to keep up with the increasing network workload. Consequently, the Network Driver Interface Specification (NDIS) I/O model in the Windows Server OS turns out to be insufficient to support the new throughput requirements. Furthermore, there are different mechanisms on other OSes that accelerate I/O and make them advantageous when building network-intensive applications [84]. For this reason, PacketDirect (PD) has been introduced, as it boosts the current NDIS model with an accelerated network I/O path [84]. The authors claim that it increases the packet-per-second (pps) count by an order of magnitude by reducing the number of cycles per packet, reducing latency and scaling up linearly with the use of additional system resources [84]. However, PD was not conceived to replace the traditional I/O model but rather to be used when it suits the application's needs and when sufficient hardware resources are available.

2.5.4 Other optimization techniques

Other techniques can be used to speed up packet processing, such as extended instruction sets like SSE (Streaming SIMD (Single Instruction Multiple Data) Extensions) and AVX/AVX2 (Advanced Vector Extensions), which improve parallelism. Intel has expanded SSE with the newer versions SSE2, SSE3 and SSE4. SSE4 is an instruction set that contains 54 instructions, where a subset called SSE4.1 contains 47 instructions and a subset called SSE4.2 contains the remaining 7 instructions. The AVX2 vector instruction set has 256-bit registers that can process eight 32-bit integers in parallel [80]. When the host, network and server chipsets do not support CRC generation in hardware, it is important to have an efficient software-based CRC generation [85]. For this purpose, Intel has introduced a new CRC32 instruction [86].
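To illustrate both points, the sketch below is a minimal, self-contained example (not taken from the cited references) that must be built with AVX2 and SSE4.2 support enabled (for example, gcc -O2 -mavx2 -msse4.2). It updates eight 32-bit counters with a single AVX2 operation and computes a CRC-32C checksum in software with the SSE4.2 CRC32 instruction, accessed through compiler intrinsics.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>
    #include <immintrin.h>   /* AVX2: _mm256_* intrinsics      */
    #include <nmmintrin.h>   /* SSE4.2: _mm_crc32_* intrinsics */

    /* AVX2: add eight 32-bit values to eight counters in one 256-bit operation. */
    static void add_counters8(uint32_t acc[8], const uint32_t inc[8])
    {
        __m256i a = _mm256_loadu_si256((const __m256i *)acc);
        __m256i b = _mm256_loadu_si256((const __m256i *)inc);
        _mm256_storeu_si256((__m256i *)acc, _mm256_add_epi32(a, b));
    }

    /* SSE4.2: CRC-32C over a buffer, consuming 8 bytes per CRC32 instruction. */
    static uint32_t crc32c(const uint8_t *buf, size_t len)
    {
        uint64_t crc = 0xFFFFFFFFu;

        while (len >= 8) {
            uint64_t word;
            memcpy(&word, buf, sizeof(word));  /* avoids unaligned-access issues */
            crc = _mm_crc32_u64(crc, word);
            buf += 8;
            len -= 8;
        }
        while (len--)
            crc = _mm_crc32_u8((uint32_t)crc, *buf++);

        return (uint32_t)crc ^ 0xFFFFFFFFu;
    }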

2.6 Integration possibilities in virtualized environments

In this section, we discuss several integration and usage directions, as well as the constraints of the previously described solutions in virtualized environments. The various advantages provided by virtualization technologies, such as flexibility, isolation, extensibility, resource sharing and cost reduction, make them suitable for the area of packet processing despite the inherent system overhead that they impose. Among all the described solutions, the majority (8) have their source code publicly available: Click [87], RouteBricks [88], FastClick [89], Snap [90], netmap [91], NetSlice [92], PF_RING [93] and DPDK [94]. The code of all these solutions is open source and fully available. PacketShader's code, on the other hand, is only partially available [95] according to its website [96]. The code for APUNet, ClickNP, GRIP, GASPP, Chimpp and SwitchBlade is still not available. The main goal of end users is to be able to efficiently exploit the features of the proposed solutions in virtualized environments, without a considerable waste of the performance achievable on traditional proprietary hardware. For this reason, the rest of this section is dedicated to various integration possibilities in such environments.

2.6.1 Packet processing in virtualized environments

In [97, 98], the Xen hypervisor [99] was used as the virtualization layer, while the privileged domain Dom0 was used as a driver domain. On top of it, the Click modular router was used for traffic generation, reception and forwarding. It was shown that forwarding reaches a much higher throughput when performed within the driver domain than within a guest domain. The authors found the bottleneck to be a long memory latency which prevents the processor from transferring packets from the driver domain to the guest domain fast enough to follow the increase in data rate. As a solution (called Fast Bridge), they proposed to group packets in containers and process them as a group instead of processing them individually. This grouping made it possible to reduce the memory latency cost for each packet. Moreover, even if the memory latency remains unchanged, as a group of packets is processed each time instead of only one packet, the processing time per packet is reduced. Consequently, a better packet transfer rate is obtained. Furthermore, in [100] this work has been extended to dynamically tune the packet aggregation mechanism by achieving the best trade-off between throughput and delay given the required flow QoS, the number of concurrent VMs and the system load.

VALE [47] is a system based on the netmap API which implements a high-performance virtual local Ethernet that can be used to interconnect virtual machines by providing access ports to multiple clients (such as hypervisors or generic host processes). The netmap API is used as the communication mechanism between a host and a hypervisor, and therefore it allows VALE to expose its multiple independent ports to the VMM. VALE is implemented as an extension of the netmap module (less than 1000 additional lines of code). However, the hypervisor must be extended as well in order to be able to access the new network backend. For this purpose, the authors modified the QEMU [101] and KVM [102] hypervisors.

The same authors continued working on different modifications that aim to improve the packet I/O speed in VMs [48]. In the case when only the hypervisor can be modified, the implementation of an interrupt moderation mechanism improves the TX and RX packet rates by amortizing the interrupt overhead over multiple processed packets. The interrupt moderation mechanism reduces the number of interrupts by keeping the NIC hardware from generating an interrupt immediately after receiving a packet; instead, the hardware waits for more packets to arrive before generating an interrupt. In the case when only the guest driver can be modified, the introduction of the send combining technique can improve the TX rate. This technique postpones transmission requests until the arrival of an interrupt and, when it happens, all pending packets get flushed (with only a single VM exit per batch). Moreover, when both the hypervisor and the driver can be modified, a simple paravirtual extension can overcome the majority of virtualization overheads and reach the highest possible packet rates without a negative impact on latency [48]. Paravirtualization reduces the number of VM exits by establishing a shared memory region through which the I/O request exchange between the guest and the VMM can take place without VM exits [48].

The ptnetmap [49] solution is a Virtual Passthrough solution based on the netmap framework, which is used as the "device" model exported to VMs.
Virtual Passthrough is similar to the Hardware Passthrough concept, where the host OS gives the guest OS exclusive control of a physical device, allowing the guest to run its own OS-bypass or network-bypass code on top of the physical device [49]. The ptnetmap solution allows complete independence from the hardware, as it provides communication at memory speed with physical NICs, software switches or high-performance point-to-point links (netmap pipes). The difference between Virtual Passthrough and Hardware Passthrough is that the former does not depend on the presence of specific hardware. In fact, this Virtual Passthrough solution focuses on networking, and the NIC "device" which is exported to the VM is not a piece of hardware but rather a software port. The code for ptnetmap has been developed as an extension to the netmap framework. An advantage of ptnetmap over a Hardware Passthrough solution is that the VM is not limited to a specific piece of hardware, so it can be migrated to hosts with different backing devices. Another advantage is that the communication between VMs does not need to go through the PCIe bus, which allows running at memory speed and provides very high packet and data rates [49]. The implementation of ptnetmap requires some extensions to the netmap code, along with minor changes in the guest device driver and the hypervisor.

Figure 2.11: Open vSwitch with Data Plane Development Kit (DPDK) software architecture [12]

The successor of ptnetmap is ptnet [103], a new paravirtualized device model. A main drawback of ptnetmap is that it prevents the use of devices as regular network interfaces for legacy applications running on the guest. This prevents its adoption by upstream communities (Linux, FreeBSD, QEMU, bhyve, etc.) [103]. ptnet overcomes these disadvantages, and the authors claim that its performance is comparable to the widely adopted VirtIO standard [104].

NetVM [50] is a high-speed network packet processing platform built on top of KVM and the DPDK library. It provides virtualization to the network by allowing high-bandwidth network functions to operate nearly at line speed. In fact, this platform is capable of chaining network functions in order to provide a flexible, high-performance network element incorporating multiple functions. NetVM facilitates dynamic scaling, deployment and reprogramming of network functions. Moreover, this architecture supports high-speed inter-VM communication, enabling complex network services to be spread across multiple VMs. Its shared memory framework exploits the DPDK library in order to provide zero-copy delivery to VMs and between VMs [50]. However, in [49] the authors claim that NetVM has a high CPU overhead because DPDK requires active polling to detect events. Despite this fact, NetVM reaches a line-rate speed of 10 Gbps between VMs. The authors of NetVM claim that their solution outperforms SR-IOV-based systems for forwarding functions, and also for functions spanning multiple VMs, both in terms of throughput and latency [50].
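VALE ports, ptnetmap and ptnet all expose the netmap API to their clients. As a brief illustration (a minimal sketch, not taken from [47, 49]; the VALE port name vale0:1 is only an example and error handling is omitted), a host process attaching to a VALE switch port could receive frames as follows.

    #include <stdio.h>
    #include <poll.h>
    #define NETMAP_WITH_LIBS
    #include <net/netmap_user.h>

    int main(void)
    {
        /* Attach to a VALE switch port; "netmap:eth0" would attach to a
         * physical NIC through the same API instead. */
        struct nm_desc *d = nm_open("vale0:1", NULL, 0, NULL);
        if (d == NULL)
            return 1;

        struct pollfd pfd = { .fd = NETMAP_FD(d), .events = POLLIN };

        for (;;) {
            poll(&pfd, 1, 1000);                 /* wait for incoming frames */

            struct nm_pkthdr h;
            const unsigned char *frame;
            while ((frame = nm_nextpkt(d, &h)) != NULL)
                printf("received %u bytes\n", (unsigned)h.len);  /* process frame */
        }
        /* not reached: nm_close(d); */
    }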

An open-source virtual switch with high performance is OvS-DPDK [12], which combines the functionalities of Open vSwitch [51] and DPDK. Open vSwitch is specifically conceived for virtual environments and it generally forwards packets in a kernel-space data path. Fastpath processing consists of packet forwarding according to flow tables which contain forwarding/action rules. The slowpath is concerned with packets (e.g., the first packet of a flow) which do not match any of the existing entries in the flow table, so they are sent for processing to a user-space daemon. This allows all the subsequent packets of a flow to be processed in kernel space, as the user-space daemon updates the flow table in kernel space. Consequently, the number of costly context switches per packet flow is significantly reduced.

As previously mentioned, DPDK uses Poll Mode Drivers (PMD) for user-space packet processing, which allows direct packet transfer between user space and the physical interface while bypassing the kernel network stack. This avoids the interrupt-driven network device models while allowing OvS to take advantage of advanced Intel instruction sets [12]. Hence, in the OvS-DPDK environment, the fastpath also runs in user space, and this boosts performance as there is no traversal of the kernel network stack. In Figure 2.11, the architecture of OvS-DPDK is depicted. Despite the fact that the OvS-DPDK solution considerably improves packet processing performance on a general-purpose platform, its main problem is that it consumes a lot of CPU resources and, consequently, fewer CPU resources are left for the applications. Multiple cores are often required just for packet switching, and for this reason the authors of [105] propose to improve the performance of Open vSwitch by offloading the processing onto integrated GPUs. The majority of existing approaches focus on offloading packet processing onto discrete GPUs. The advantage of integrated GPUs is that they reside on the same die as the CPU, offering many advanced features, of which the most important ones in terms of packet processing are the on-chip CPU-GPU interconnect and a shared physical/virtual memory [106]. The former provides much lower latency, while the latter allows a GPU and a CPU to share memory space and to have the same virtual address space, so that there is no more need to copy data back and forth as in the case of discrete GPUs [106]. In [105], the authors claim that their OvS-DPDK architecture using integrated GPUs improves the throughput by 3x when compared to the CPU-only OvS-DPDK implementation.

In [107], the authors compared how different virtual switches (Linux Bridge, Open vSwitch, VALE, L2FWD-DPDK, OVS-DPDK and Lagopus) that use different packet processing architectures (NAPI, netmap and DPDK) behave in physical and virtual environments. In their experiments, they tested the throughput and latency/jitter of virtual switches installed on a physical server or inside a VM. On a physical server, L2FWD-DPDK achieved the fastest throughput, while OVS-DPDK and VALE reached slightly lower performance. In a VM, L2FWD-DPDK and OVS-DPDK showed the best performance under DPDK/vhost-user, but their performance decreased by over 10 Mpps when compared to the physical server experiment. The authors claim that this is due to the vhost-user overhead.

2.6.2 Integration constraints and usage requirements

In the previous sections, we presented various software and hardware solutions which aim to increase packet processing speed. Each solution has different software and hardware constraints that mostly depend on the OS version, supported libraries, kernel modules and the underlying hardware which enables the acceleration of packet processing. Regarding ease of use, the software solutions seem to have the advantage. As a matter of fact, 7 of those software solutions (Click [87], RouteBricks [88], netmap [91], NetSlice [92], PF_RING [93], DPDK [94], FastClick [89]) do not require specific hardware. In contrast, solutions like Snap, PacketShader, GASPP and APUNet require a GPU to process the packets, while ClickNP, GRIP, Chimpp and SwitchBlade use a specific programmable card (based on an FPGA). However, even if they do not need a specific network card, software solutions still have hardware requirements that must be fulfilled.

Click requires a specific version of the Linux kernel with two additional drivers: a user-level driver in order to run the Click application and a Linux kernel driver to run the router configuration. To install this configuration, the user must first write a Click-language configuration file (named config) in the directory /proc/click/. The original Click system [3] used Linux's interrupt structure but suffered from the interrupt latency (~50% of the packet processing time), resulting in poor performance. Therefore, to solve this issue, another version of Click [108] uses polling instead of interrupts. However, this implies changes to Linux's structure representing network devices and to the drivers for Click network devices. To use Click, it is mandatory to use a modified version of Linux (2.2.16 in [108]) in order to manage Click's polling device drivers.

RouteBricks is also a software router. As it is based on Click, it has the same constraints as Click. In addition, RouteBricks uses parallelization in order to improve the speed of packet processing by load balancing packets over multiple servers. Therefore, RouteBricks requires multiple servers (cheap and general-purpose ones) to achieve good performance. As a matter of fact, the speed of packet processing depends on the speed of the servers and their number. For example, a 1 Tbps RouteBricks router requires 100 servers with a speed of 20 Gbps each.

Netmap is the only software solution in this comparison which has been developed for both Linux and FreeBSD. This solution is hardware-independent as long as the hardware supports an unmodified libpcap client.

The NetSlice API extends the device-file interface and leverages the flexibility of the ioctl mechanism. It is possible for the user to define user-mode libraries in order to improve or facilitate its use. However, this is not mandatory, as conventional file operations (read/write/poll) have been mapped to the corresponding NetSlice operations on data flows.

PF_RING is available for Linux kernels 2.6.32 and newer as a module which only needs to be loaded. To poll packets from the NICs, PF_RING uses Linux NAPI [76], which is an interface for using interrupt mitigation techniques for networking devices in the Linux kernel. However, it requires independent device drivers and, as of today, only supports certain network adapters.

The DPDK solution needs kernel configuration before being used. It requires Linux kernel 2.6.33 (or newer) and the activation of several options such as:

Table 2.1: Summary of the constraints

Software solutions:
- Click. Software constraints: Linux (kernel 2.2+); user-level driver; Linux kernel driver (supporting polling).
- RouteBricks. Software constraints: Click/Linux; modified Valiant load balancing (VLB) algorithm. Hardware constraints: multiple general-purpose, off-the-shelf servers.
- Netmap. Software constraints: FreeBSD or Linux; libpcap support.
- NetSlice. Software constraints: Linux; file-operation API; ioctl mechanism.
- PF_RING. Software constraints: Linux (kernel 2.6.32+); libpcap support; Linux NAPI. Hardware constraints: Myricom, Intel and Napatech network adapters; the NIC must be libpcap ready.
- DPDK. Software constraints: Linux (kernel 2.6.33+); glibc >= 2.7; kernel modules UIO, HUGETLBFS, PROC_PAGE_MONITOR, HPET and HPET_MMAP; if Xen is used, the kernel module rte_dom0_mm. Hardware constraints: Intel x86 CPUs, IBM Power 8, EZchip TILE-Gx and ARM architectures.
- FastClick. Software constraints: netmap; DPDK.

GPU offloading:
- Snap. Software constraints: Linux (kernel 3.2.16+). Hardware constraints: NVIDIA GPU (CUDA SDK); Intel 10 GbE PCIe adapters.
- PacketShader. Software constraints: Linux (kernel 2.6.28.10+). Hardware constraints: NVIDIA GPU (CUDA SDK 3.0); Intel 10 GbE PCIe adapters.
- APUNet. Software constraints: Linux (kernel 3.19.0-25+). Hardware constraints: AMD Carrizo APU platform; dual-port 40 Gbps Mellanox ConnectX-4 NIC.
- GASPP. Software constraints: Linux (kernel 3.5+); CUDA v5.0; netmap. Hardware constraints: NVIDIA GTX480; Intel 82599EB with two 10 GbE ports.

FPGA offloading:
- ClickNP. Software constraints: Windows Server 2012 R2; ClickNP compiler; 3 commercial HLS tools ([109], [110], [111]); ClickNP library; Catapult PCIe driver. Hardware constraints: Catapult Shell architecture; FPGA card supported by the HLS tools.
- GRIP. Software constraints: Linux; GRIP driver; modification of the kernel or the application. Hardware constraints: GRIP card; PCI 32/33 or 64/66.
- SwitchBlade. Software constraints: Linux; OpenFlow, Path Splicing or IPv6; OpenVZ container. Hardware constraints: NIC must be NetFPGA compliant.
- Chimpp. Software constraints: Linux; OMNeT++ (for simulations). Hardware constraints: NetFPGA card.

• UIO support
• HUGETLBFS (hugepage support)
• PROC_PAGE_MONITOR support
• HPET and HPET_MMAP

However, in a virtualized environment like Xen, DOM0 does not support hugepages (HUGETLBFS); therefore, a new kernel module, rte_dom0_mm, is needed to allow the allocation (via IOCTL) and mapping (via MMAP) of memory. FastClick requires both netmap and DPDK, and therefore it has the same requirements as both solutions.

Figure 2.12: ClickNP architecture. [13]

However, in order to integrate both solutions, FastClick proposes a new user-space I/O in Click, which implies a new implementation for both netmap and DPDK.

Snap is a packet processing framework which exploits the parallelism available on modern GPUs. The authors used Linux kernel version 3.2.16. They modified about 2000 lines of Click code, added more than 4000 lines of code for new elements and added more than 3000 lines of code for interaction with the GPU. An NVIDIA TESLA C2070 was used as the GPU in the experiments. They also used two Intel 82599EB dual-port 10 Gbps NICs.

The PacketShader solution offloads packet processing onto GPUs. However, only NVIDIA GPUs can be used, as the modifications of the GPU driver were done on the CUDA SDK 3.0 [112]. In more detail, the authors used an NVIDIA GTX 480 graphics card on an x16 PCIe link. Additionally, PacketShader requires an unmodified 64-bit Ubuntu Linux 9.04 server distribution (Linux kernel 2.6.28.10) and its packet I/O engine is based on the ixgbe 2.0.38.2 device driver for Intel 10 GbE PCIe adapters [113].

The APUNet solution employs an integrated GPU in a recent APU platform with the goal of offloading part of the packet processing from the CPU to the GPU. APUNet has been tested on the AMD Carrizo APU platform, which represented the packet processing server. On the other hand, a client machine with an octa-core Intel Xeon E3-1285 v4 (3.50 GHz) and 32 GB of RAM was used for packet generation. Communication is provided via dual-port 40 Gbps Mellanox ConnectX-4 NICs (MCX414A-B) that use a PCIe v3 interface. Both machines run Ubuntu 14.04 (kernel 3.19.0-25-generic).

GASPP is a flexible, efficient and high-performance framework for network traffic processing that exploits the massively parallel processing power of GPUs [45]. GASPP is programmable in the C/CUDA language. It uses a single buffer for efficient data sharing between the GPU and the NIC by adjusting the netmap module. The authors used two Intel Xeon E5520 quad-core CPUs at 2.27 GHz and 12 GB of RAM (6 GB per NUMA domain). Each CPU is connected via a separate I/O hub to an NVIDIA GTX480 GPU via a PCIe v2.0 x16 slot and to one Intel 82599EB with two 10 GbE ports. There are a few cases for which GASPP is not very well suited or has limited functionality. For instance, divergent workloads that perform lightweight processing (such as the forwarding application), or workloads for which it is difficult to know the execution paths for different packets in advance, may not have an efficient parallel execution on top of GASPP. Additionally, it has a relatively high packet processing latency.

ClickNP seems to be the solution with the most requirements. ClickNP is built on the Catapult Shell architecture [114] and uses its logic in order to communicate with several elements of the host such as the PCIe, the DRAM Memory Management Unit (DMA) and the Ethernet MAC. However, a ClickNP FPGA program is expressed as a Catapult role and requires commodity high-level synthesis (HLS) tool-chains to generate the FPGA Hardware Description Language (HDL). In [13], the authors give a list of three HLS tool-chains compatible with ClickNP: 1) Altera SDK for OpenCL [109], 2) SDAccel Development Environment [110], and 3) Vivado Design Suite [111].
Additionally, as those HLS tools do not produce the same results, the authors provide an HLS-specific runtime which performs the translation between the HLS and the shell interfaces. In Figure 2.12 we can see that ClickNP requires a ClickNP library for both the PCIe and the HLS communications in the host. Still, on the host, the Catapult PCIe driver is mandatory to communicate with the FPGA, which is considered as a Catapult shell.

GRIP requires the use of its SLAAC-1V high-performance platform, the GRIP card, which presents itself as a standard Gigabit Ethernet network interface to the operating system. In order to offload traffic onto the card, a generalized driver is needed to communicate with the card, and it is either up to the application or the kernel to offload the traffic. The choice is left to the user, and therefore so is the implementation. The GRIP driver is based on the SysKonnect [115] Gigabit Ethernet card driver, which also uses the XMACII [116] Media Access Controller chipset.

SwitchBlade, another hardware solution, is based on the NetFPGA reference implementation and on hardware programming in Verilog. It is also possible to communicate with the NIC via the SwitchBlade preprocessor. In this case, the developer can code the new functionalities for the NIC in C++, Perl or Python. However, by default, SwitchBlade is restricted to the following three routing protocols and forwarding mechanisms: OpenFlow, Path Splicing and IPv6. In order to use SwitchBlade in a virtualized environment, the authors used an OpenVZ container because of its two key advantages: OpenVZ provides namespace isolation between virtual environments and provides a CPU scheduler which manages CPU and memory resource usage.

Chimpp is a development environment for reconfigurable networking hardware. It uses the NetFPGA card for offloading part of the packet processing from the CPU. It has the same requirements as the Click router, as it uses it for the software modules, while the hardware modules are executed on the NetFPGA card. For the simulation part, it uses the OMNeT++ software.

In conclusion, almost all of the solutions require a Linux kernel, either patched or with specific modules activated; only ClickNP uses a Windows kernel. The hardware solutions also require the use of specific cards. It seems that the solution with the fewest requirements is netmap, which only requires an unmodified libpcap client and a NIC which is libpcap compatible. Table 2.1 summarizes the constraints of all the solutions.

Figure 2.13: OpenFastPath system view. [14]

2.7 Latest approaches and future directions in packet processing

A novel project [117] called FD.io (Fast Data - Input/Output) represents a collection of multiple subprojects that enable data plane services which are open source, highly performant, interoperable, multi-vendor, modular and extensible [118, 119]. They aim to fulfill the functional needs of operators, developers and deployers.

One of the main subprojects is concerned with the development of the Vector Packet Processor (VPP) platform [120]. The VPP platform is a framework which can be extended and which provides out-of-the-box production-quality switch/router functionality [121]. It is also a highly optimized packet processing stack for a general-purpose CPU and it leverages the functionality of DPDK. VPP runs as a Linux user-space application. It is conceived as a "packet processing graph" into which new customized graph nodes can be easily plugged. One of the advantages of VPP is that it can run on different architectures like x86, ARM and PowerPC, and it can be deployed in VM, container or bare-metal environments. VPP reaches a speed of multiple Mpps per single x86_64 core. The name vector processing refers to the processing of multiple packets at a time, compared to scalar packet processing where only one packet is processed at a time.

One of the latest approaches led to the integration of VPP on OpenDataPlane (ODP) enabled SmartNICs [122]. ODP [123] is a set of open-source, cross-platform APIs for the networking software defined data plane. ODP consists of three parts [17]:

• an abstract API specification that describes a functional model for data plane applications. It allows receiving and transmitting packet data and different types of packet manipulations, without specifying precisely how to perform those functions;

• multiple implementations of the API specification, conceived for specific target platforms. Each ODP implementation specifies how the ODP API is realized and how abstract types are represented. In this way, the ODP API may use specialized instructions to accelerate some API functions, or parts of the API may be offloaded to a hardware co-processing engine while saving CPU resources;

• an ODP validation test suite which verifies the consistency between different ODP implementations, as they must provide the specified functional behavior of the ODP API. This is an open-source component as well and can be used by application writers, system integrators and platform providers.

The three primary goals of ODP are to make applications portable across different platforms, to permit data plane applications to benefit from platform-specific features without specialized programming, and to allow applications to scale up automatically to support many-core architectures [17]. The main difference between ODP and DPDK is that ODP is an API, while DPDK is a specific implementation of an API. ODP is architecturally positioned high enough to allow platform abstraction without imposing overheads and strict models. There is also an ODP implementation that is integrated with DPDK [124, 125] (a minimal sketch of a typical ODP application is given below).

Another project intended for packet processing at high speed is OpenFastPath (OFP) [126]. This project provides an open-source implementation of a high-performance TCP/IP stack. The goal of OFP is to allow accelerated routing/forwarding of IPv4 and IPv6 as well as tunneling and termination for a variety of protocols [14]. The OFP system view is shown in Figure 2.13. All functionality which is not supported by OFP goes over the slow-path networking stack of the host OS through a TAP interface. OFP boosts throughput and scalability by reducing Linux overhead and provides application portability through the support of ODP and DPDK. OFP functionality is offered as a library to fast-path applications which are based on the ODP run-to-completion execution model and framework [14]. DPDK is supported as well, through the ODP-DPDK layer.

P4 [127] is a high-level language used to program protocol-independent packet processors. This programming language is also target-independent, since the packet processing functionality specifications should be provided independently of the underlying hardware. Furthermore, P4 provides reconfigurability, as a controller should be able to redefine packet parsing and processing in the field [127]. In this way, a switch is not tied to a specific packet format, and a controller can specify a packet parser which extracts particular header fields, together with a collection of match+action tables which process these headers. Consequently, P4 provides a programmable parser that allows new headers to be defined, unlike OpenFlow, which assumes a fixed parser. As the P4 language has received a lot of attention in the community, multiple solutions intended for different platforms are based on P4, such as P4FPGA [128], P4GPU [129], P4-to-VHDL [130] and DC.p4 [131]. Additionally, there is a prototype P4 compiler [132] that generates, directly from a P4 program, C code that uses the Intel DPDK. A possible optimizer for the P4C compiler is presented in [133]. The authors of this work claim that their optimizer provides significant performance improvements.
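Returning to ODP, the sketch below shows the typical structure of an ODP application using direct-mode packet input. The calls follow the public ODP API (odp_init_global, odp_pktio_open, odp_pktin_recv), but the interface name, the pool sizing and the single-queue setup are illustrative assumptions, and all error handling is omitted.

    #include <odp_api.h>

    #define BURST 32

    int main(void)
    {
        odp_instance_t inst;
        odp_init_global(&inst, NULL, NULL);        /* one-time global initialization */
        odp_init_local(inst, ODP_THREAD_CONTROL);  /* per-thread initialization      */

        /* Packet pool backing the interface (sizes are illustrative). */
        odp_pool_param_t pp;
        odp_pool_param_init(&pp);
        pp.type    = ODP_POOL_PACKET;
        pp.pkt.num = 2048;
        pp.pkt.len = 1518;
        odp_pool_t pool = odp_pool_create("pkt_pool", &pp);

        /* Open an interface in direct (poll) input mode; "eth0" is an example. */
        odp_pktio_param_t pio;
        odp_pktio_param_init(&pio);
        pio.in_mode = ODP_PKTIN_MODE_DIRECT;
        odp_pktio_t pktio = odp_pktio_open("eth0", pool, &pio);

        odp_pktin_queue_config(pktio, NULL);       /* default input queue setup  */
        odp_pktout_queue_config(pktio, NULL);      /* default output queue setup */
        odp_pktio_start(pktio);

        odp_pktin_queue_t inq;
        odp_pktin_queue(pktio, &inq, 1);           /* fetch the first input queue */

        for (;;) {
            odp_packet_t pkts[BURST];
            int n = odp_pktin_recv(inq, pkts, BURST);
            if (n > 0)
                odp_packet_free_multi(pkts, n);    /* real code would process the packets here */
        }
    }

The same application code can, in principle, be recompiled against a different ODP implementation (for example the ODP-DPDK layer or a SmartNIC implementation) without changes, which is precisely the portability property discussed above.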
OpenState [134], on the other hand, is an approach which allows stateful control functionalities to be performed directly inside a switch, without the intervention of an external controller [135]. This solution allows states to be introduced inside the device itself. The abstraction is based on eXtended Finite State Machines. OpenState can be considered as an OpenFlow extension; it allows an SDN switch to adapt itself and update its forwarding behavior without requiring interaction with the SDN controller.

Another recent project called BESS (Berkeley Extensible Software Switch) [136] is concerned with building a programmable platform called SoftNIC [137] that augments hardware NICs with software. This platform provides performance that is comparable to hardware and, at the same time, the flexibility of software. The authors of [137] claim that their software-augmented NIC solution serves as a fallback or an alternative to a hardware NIC. SoftNIC allows developers to build features in software that incur minimal performance overhead. In this way, SoftNIC can be used as a hardware abstraction layer (HAL) to develop software that uses NIC features without having to handle the cases where they are not fully available [137].

2.8 Conclusion

In this chapter, we have presented three existing fabric network solutions: Juniper QFabric, Cisco FabricPath and Brocade VCS technology, and we have explained the advantages that they bring when compared with traditional data center architectures. We have also presented the TRILL and SPB protocols, which are well suited as a basis for the fabric network that we want to build.

Additionally, we investigated different types of packet processing on server-class network hosts which try to improve the processing speed by using software, hardware, or a hybrid combination of the two. The main problem with today's networks is that the throughput of the network interfaces, which is in the range of 40 Gbps and higher, can barely be supported by the operating system's network stack, as it was built for general-purpose communication. For this reason, in order to provide high-speed networking applications, a lot of research has been conducted lately, which has produced various software-based and hardware-based solutions.

We examined multiple solutions based on the Click modular router, among which several offload Click functions to different types of hardware such as GPUs, FPGAs and parallel cores on different servers. Furthermore, we surveyed some software solutions which are not based on the Click modular router. We have also presented some hardware solutions that use FPGAs and GPUs. Moreover, we discussed the integration possibilities of the described solutions in virtualized environments, their constraints and their requirements.

All the analyzed packet processing solutions tend to alleviate the same limitations due to the overheads imposed by the architecture of the network stack, and their suitability mainly depends on the adequacy of their trade-offs and the cost required for their implementation.

Chapter 3

Fabric network architecture by using hardware acceleration cards

Summary

3.1 Introduction
3.2 Problems and limitations of traditional layer 2 architectures
3.3 Fabric networks
3.4 TRILL protocol for communication inside a data center
3.5 Comparison of software and hardware packet processing implementations
3.5.1 Comparison of software solutions
3.5.2 Comparison of hardware solutions
3.5.3 Discussion on GPU-based solutions
3.5.4 Discussion on FPGA-based solutions
3.5.5 Other hardware solutions
3.6 Kalray MPPA processor
3.6.1 MPPA architecture
3.6.2 MPPA AccessCore SDK
3.6.3 Reasons for choosing MPPA for packet processing
3.7 ODP (OpenDataPlane) API
3.7.1 ODP API concepts
3.8 Architecture of the fabric network by using the MPPA smart NICs


3.9 Conclusion

3.1 Introduction

An existing solution [1] is provisioned in the public platform of Gandi SAS, where the networking data plane is implemented in a modified Linux bridge module (open-source code available in [55]), as shown in Figure 1.3.a). A full-mesh non-blocking layer 2 network was built based on the TRILL protocol, which removes the need for switches. However, this solution has limited network performance. For this reason, our idea is to create a fabric network that is highly performant and that offloads packet processing tasks onto hardware acceleration cards instead of performing them in the Linux kernel on the servers' CPUs. Such an architecture could reduce network latency and provide higher throughput. The main principle of a fabric network is to replace the classical three-layer network topology with a mesh network of links. Its flat architecture is resilient because of the large number of links that connect any pair of nodes in the network. Since most of the nodes are directly connected to each other, the number of hops needed for communication inside a data center is decreased when compared with three-layer network topologies. On top of that, using smart NIC cards can reduce the cost of the equipment significantly when compared with the cost of proprietary fabric network solutions [138, 139, 140] or dedicated networking hardware such as switches and routers.

In the next subsection, we introduce the problems and limitations of classical layer 2 architectures. Then we introduce the concept of fabric networks and we explain the VDC requirements that the TRILL protocol is able to fulfill, which make it a suitable option for our fabric network. Afterward, in order to build such a network, we first need to choose which hardware to use for packet processing. In the following subsections, we compare several existing solutions described in sections 2.4 and 2.5 and we discuss FPGA-based, GPU-based and other hardware solutions. However, there is another type of hardware, the MPPA smart NIC from Kalray, that may be used for packet processing and that, to the best of our knowledge, has not yet appeared in the research literature related to the fast packet processing field. For this reason, we dedicate a separate subsection to describing the advantages of this platform, on which we subsequently implement the data plane of the TRILL protocol.

3.2 Problems and limitations of traditional layer 2 architectures

Ethernet technology is still widely used by cloud service providers to interconnect the servers inside a data center because of Ethernet's simple operation and configuration. Ethernet bridges automatically learn the MAC addresses of the hosts that are connected to them by associating, in the forwarding table, the receiving port of the bridge with the source MAC address of the frame that arrived on that port. However, if the forwarding table does not contain the destination MAC address of the received frame, then the frame is broadcast over all the ports of the bridge except the one on which the frame arrived. Additionally, Ethernet frames do not have a hop count field that could limit the number of hops of a frame inside a data center. The downside is that any loop in the network could cause a massive replication of frames, which could consequently severely impact the functionality of the data center. Such a problem can escalate until the frame forwarding capacity of one of the bridges is reached and link failures start to occur. In order to cope with this limitation, the STP (Spanning Tree Protocol) has been introduced. It allows a loop-free logical topology to be built in the form of a tree, in which each pair of nodes has a single communication path in the network.

In the case of a small network, VM migrations are possible, as unique MAC addresses persist after each VM's change of location. Nevertheless, in cloud environments there is a scaling problem when tens of thousands of VMs are present in a single data center. In order to create forwarding tables, network bridges broadcast frames across the network, and this flooding causes the overload of the bridges' forwarding tables. In fact, the CAM (Content Addressable Memory) of a layer 2 bridge is very responsive but at the same time very expensive. For this reason, it has a limited amount of memory, and it would be too expensive to replace all the bridges inside a data center with ones that have CAMs large enough to hold the MAC addresses of tens of thousands of VMs or even more.

Another problem is that STP cuts off a significant number of links in the network while building the spanning tree. Consequently, the majority of traffic between the VMs is carried over the same subset of links, those that belong to the spanning tree. Since most of a data center's traffic today is exchanged inside the data center, such a limitation in the number of active links becomes unacceptable, as the need for higher throughput is increasing exponentially. For this reason, it is often required to have multiple active paths between any pair of bridges.

In large data center networks, it is hardly possible to create a topology that has only one layer 2 segment for the whole data center, because a high number of nodes may create congestion in the network. In traditional data center networks, the adopted standard architecture consists of connecting the set of physical servers that belong to the same rack to a ToR (Top of Rack) switch. These switches can also be considered as edge or access switches, as shown in Figure 1.1. These access switches are connected to aggregation switches which are located in the aggregation layer. This layer aggregates all the traffic from the ToR switches and also provides redundant paths between the ToR switches and the core layer.
The core layer is placed above the aggregation layer and consists of CRs (Core Routers) that connect the data center with the outside world and operate at layer 3 of the OSI reference model. In the virtualized data center, the VMs are hosted on the servers that are connected to ToR switches (AccS switches in Figure 1.1). In this type of architecture, whenever a VM is migrated from one physical server to another one that belongs to a different layer 2 segment, its IP address needs to be changed. Moreover, the STP eliminates the possibility of using multipath communication between the servers.

With the virtualization of data centers come new challenging requirements such as any-to-any connectivity with a full non-blocking fabric, dynamic scalability, fault tolerance, seamless VM mobility, simplified management and private network support. For this reason, the goal of this chapter is to propose an architecture that overcomes the previously described limitations. Our architecture is based on the concept of fabric networks and it uses a protocol that fulfills the mentioned requirements. Additionally, we implement this protocol on a hardware acceleration card, as such solutions tend to reach the highest performance among all the packet processing solutions on server-class network hosts.

3.3 Fabric networks

In order to reduce latency and provide higher throughput, several proprietary data center networking solutions have emerged that are based on fabric network switches. The idea is to replace the classical three-layer network topology with a full-mesh network of links, which is often referred to as the "network fabric". The advantage of such a solution is that it provides predictable network performance with direct connectivity between any pair of nodes. Fabric networks have a flat architecture that is resilient, scalable and that decreases the number of hops needed for communication inside a data center. The flat network design reduces the amount of north-south traffic load when compared with the three-tier architecture, as the communication between VMs that are hosted on different servers is done over direct connections. This also means that east-west traffic, peculiar to virtualized environments, will encounter less network congestion. Moreover, VMs can use multiple paths to communicate with each other over protocols such as TRILL (more information on the TRILL protocol is given in Appendix A) instead of using a single fixed path. This also enables increased utilization of the entire network, whereas the STP (Spanning Tree Protocol) may cause inefficient utilization of certain links in the network. Additionally, such a solution is very well suited for virtualized data centers because it makes the configuration much easier. In the network fabric, all the switches can be managed as a single entity or individually, and network administration is simplified when compared with traditional data center networks. There is no need for manual reconfiguration when a VM is migrated, and all the devices in the fabric are aware of each other.

One of the best-known solutions for fabric networks is Juniper's QFabric technology [138]. According to [138], a fabric is a set of devices that act together to behave as a single switch. In this architecture, all the access layer devices are directly interconnected by using a very large scale fabric backplane (called the Interconnect device in QFabric terminology). From a high-level viewpoint, such a system can be considered as a single non-blocking, low-latency switch [138] that supports thousands of multi-Gigabit Ethernet ports. Moreover, QFabric provides enhanced scalability, as it supports thousands of servers while the QFabric system can still be managed as a single system. On the other hand, Cisco FabricPath [139] follows a similar idea to QFabric and is based on the TRILL standard [59]. Moreover, Brocade has its own VCS Fabric solution [140] that is based on the same flat-architecture principle.

One of the motivations for this thesis lies in the concept of fabric networks. However, the goal is to build a mesh network that uses highly performant smart NIC cards, interconnected just like the switches in a network fabric, instead of switches. Additionally, we use the TRILL protocol for the communication between the smart NICs, which allows a higher utilization of the links in the data center.

3.4 TRILL protocol for communication inside a data center

Nowadays, the data center infrastructure gets virtualized more often than not, which allows the creation of VDCs (Virtualized Data Centers) where physical servers are divided into mutually isolated virtual instances. These instances can have different properties such as the OS (Operating System) that they use, the quantity of CPU and memory resources that they use, etc. An increasing number of such virtual instances needs to be hosted on a VDC [1] while fulfilling requirements such as scalability, fault tolerance, any-to-any connectivity, seamless VM (Virtual Machine) mobility, private network support and simplified management. A protocol that is able to fulfill such requirements is needed. One such protocol is TRILL (TRansparent Interconnection of Lots of Links) [54], which has been conceived by the IETF (Internet Engineering Task Force). A more detailed introduction to the TRILL protocol can be found in Appendix A. In existing Ethernet deployments, classical bridge devices can be replaced with RBridge (Routing Bridge) devices as the latter are backward compatible with the former. TRILL has a simple configuration and deployment like Ethernet while also bringing the advantages of layer 3 routing techniques to layer 2. Additionally, the TRILL protocol is transparent to layer 3 protocols. RBridges (also known as routing bridges or TRILL switches) are the devices that implement the TRILL protocol. They perform the SPF (Shortest Path First) algorithm, thus providing optimal pair-wise forwarding. TRILL provides multipathing for both unicast and multicast traffic [54], which allows better link utilization in the network as multiple paths may be used for the communication between two hosts. This is enabled by the encapsulation of the traffic with an additional TRILL header that includes the hop count field. This field is characteristic of layer 3 protocols and allows for safe forwarding during periods of temporary loops. From Figure 3.1 we can see that the TRILL protocol has the advantages of both layer 2 and layer 3 protocols.

Figure 3.1: Advantages of TRILL protocol [15].

The main competitor of the TRILL protocol is the IEEE's SPB (Shortest Path Bridging) protocol, standardized as 802.1aq [61]. Both SPB and TRILL enable multipath routing inside a data center and both are based on the IS-IS protocol. SPB uses an existing IS-IS with TLV extensions while TRILL defines a new type of IS-IS instance with new PDU types [141]. The former uses MAC-in-MAC encapsulation while the latter uses TRILL header encapsulation. A comparison of the SPB and TRILL protocols is given in [141], from which we can see that the majority of properties are the same or very similar for both protocols. The question of which one should be used has been the subject of the "Great debate" [142], from which one cannot claim that one protocol is better than the other. Both protocols have their own pros and cons. Our choice remains TRILL as we have already developed certain extensions of the TRILL protocol that turned out to be useful for the Gandi public cloud infrastructure. One of them is VNT (Virtual Network over TRILL), a solution that introduces a new VNI Tag (24-bit segment ID) transported in the TRILL frame, thus providing a large-scale LAN segment number of up to 16 million [1]. Additionally, this would allow us to make a transition from the existing TRILL solution in the Linux kernel to the fabric network by gradually introducing hardware acceleration cards that perform TRILL packet processing until the whole network is covered by hardware accelerators.
By doing this, we would be able to overcome the communication bottleneck that is encountered when performing packet processing inside the Linux kernel. However, we think that SPB could also be well suited for our fabric network, and we consider the TRILL protocol implementation only as one possible use case of the architecture that we propose.

3.5 Comparison of software and hardware packet processing implementations

In section 2.4 we introduced 7 different free software implementations that are the most well-known in this area of research, that are in widespread use, or that are highly performant. We started with the Click Modular Router along with Click-based solutions such as RouteBricks and FastClick. Additionally, we described the Netmap, NetSlices, PF_RING and DPDK solutions. In the following subsections, we compare all these solutions based on different criteria. A summary of the comparison results is presented in Table 3.1. In section 2.5 two types of hardware solutions were described: the GPU-based solutions and the FPGA-based solutions. Among the GPU-based solutions we described: Snap, PacketShader, APUNet and GASPP. Among the FPGA-based solutions we described: ClickNP, GRIP, SwitchBlade and Chimpp. All these solutions are compared based on the type of hardware used to offload (a part of) packet processing from the CPU. They are also compared in terms of their usage of the CPU (how it cooperates in packet processing) and based on the connection type between the specific hardware and the CPU. Furthermore, the solutions are compared based on the domain in which they operate (user-space or kernel-space), and on whether they use zero-copy techniques, batch packet processing and parallelism. A summary of the comparison results is presented in Table 3.2.

3.5.1 Comparison of software solutions

In this section, we compare 7 different software implementations of fast packet processing based on the domain in which they operate (user-space or kernel-space), whether they use zero-copy techniques, batch packet processing and parallelism. A summary of comparison results is presented in Table 3.1.

3.5.1.1 Operations in the user-space and kernel-space

The first criterion of our comparison is to determine whether a solution operates in user-space or kernel-space. Click is implemented on general-purpose PC hardware as an extension to the Linux kernel [3]. There are two drivers which can run Click router configurations. One is a kernel-level driver and the other one is a user-level driver which is used mostly for debugging and profiling. It uses BPF in order to communicate with the network. In [143] it has been shown that the Click router is three times slower when run in user-space compared to the case when it is run in kernel-space. The NetSlice [5] solution, on the other hand, works in user-space and enables linear performance improvement with the number of cores by using spatial partitioning of the CPU cores, memory and multi-queue NICs. This radically reduces overall memory and interconnect contention. RouteBricks operates in kernel-space since it runs Linux with Click in polling mode. Netmap, just like NetSlice, operates in user-space. PF_RING differs from normal packet capture in that applications which rely on libpcap do not need to be recompiled against a modified (ring/mmap-aware) library, since received packets are stored in a ring buffer instead of kernel data structures [74]. FastClick is a high-speed user-space packet processor which is backward compatible with vanilla Click elements [27]. DPDK also runs as a user-space application in order to remove the high overhead of kernel applications and the copying of data between kernel-space and user-space.

3.5.1.2 Zero-copy technique

The Click Modular Router does not use a zero-copy technique. NetSlice does not require modified zero-copy drivers and it does not rely on zero-copy techniques even though it could benefit from them [5]. Netmap uses a simple data model which is well suited for zero-copy packet forwarding. RouteBricks does not use zero-copy techniques. PF_RING ZC (the kernel-bypass version of PF_RING) uses the zero-copy technique, whereas the standard RAW_SOCKET mode does not. FastClick uses zero-copy techniques in both versions, either when using Click with Netmap or with DPDK. Accordingly, DPDK also benefits from the zero-copy technique.

3.5.1.3 Batch processing

Click only handles multiple packets simultaneously at the NIC and device driver level, but processes packets one by one at the application level [8]. However, in [144] the authors extend the Click modular router by focusing on both I/O and computation batching in order to improve performance by an order of magnitude. NetSlice also provides efficient batched send/receive operations which reduce the overhead imposed by issuing a system call per operation. For the same reason this technique is used in RouteBricks, but besides receiving multiple packets per poll operation (called "poll-driven" batching), "NIC-driven" batching is introduced: the NIC driver is extended in order to relay packet descriptors to/from the NIC in batches of a constant number of packets [31]. The authors claim to achieve a performance improvement of 3 times with poll-driven batching, whereas an additional improvement of 2 times is attained with NIC-driven batching. Netmap also uses batching and the author has shown how the transmit rate increases with the batch size. FastClick uses batch processing since both DPDK and Netmap support batch packet transmission and reception. PF_RING does not use batch packet processing whereas DPDK does.
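To make the benefit of I/O batching concrete, the sketch below amortizes the per-call overhead by receiving several packets with a single recvmmsg() system call on a standard Linux socket. It only illustrates the general technique discussed above and is not code taken from any of the compared frameworks; the socket setup, port number and buffer sizes are arbitrary assumptions.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #define BATCH   32     /* packets fetched per system call (arbitrary) */
    #define PKT_BUF 2048   /* per-packet buffer size (arbitrary)          */

    int main(void)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);   /* plain UDP socket for illustration */
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port = htons(9000),
                                    .sin_addr.s_addr = INADDR_ANY };
        bind(sock, (struct sockaddr *)&addr, sizeof(addr));

        static char bufs[BATCH][PKT_BUF];
        struct iovec iov[BATCH];
        struct mmsghdr msgs[BATCH];

        for (int i = 0; i < BATCH; i++) {
            iov[i].iov_base = bufs[i];
            iov[i].iov_len  = PKT_BUF;
            memset(&msgs[i], 0, sizeof(msgs[i]));
            msgs[i].msg_hdr.msg_iov    = &iov[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }

        for (;;) {
            /* One kernel crossing delivers up to BATCH datagrams. */
            int n = recvmmsg(sock, msgs, BATCH, 0, NULL);
            if (n < 0)
                break;
            for (int i = 0; i < n; i++) {
                /* msgs[i].msg_len bytes of packet i are in bufs[i]; process here. */
            }
        }
        return 0;
    }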

Table 3.1: Comparison of software implementations (criteria: user-/kernel-space operation, zero-copy, batch packet processing, parallelism)

Click
  User-/kernel-space: implemented as an extension to the Linux kernel; two drivers, a Linux in-kernel driver and a user-level driver used mostly for debugging and profiling.
  Zero-copy: no.
  Batch packet processing: none.
  Parallelism: none (packets are processed individually by each task, while a particular parallelism results from the chaining of several tasks).

NetSlices
  User-/kernel-space: runs in user-space while providing a streamlined path for packets between NICs and user-space.
  Zero-copy: no (instead, it copies each packet once between user- and kernel-space while trading off CPU cycles for flexibility and portability).
  Batch packet processing: yes, by issuing batched packet send and receive operations in order to amortize the overhead produced by per-operation system calls.
  Parallelism: yes, by providing an array of independent packet processing execution contexts that "slice" the network traffic.

RouteBricks
  User-/kernel-space: kernel-space.
  Zero-copy: no.
  Batch packet processing: yes, by using poll-driven batching along with NIC-driven batching.
  Parallelism: yes, with parallelism across multiple servers and across multiple cores within a single server.

Netmap
  User-/kernel-space: user-space.
  Zero-copy: yes, zero-copy packet forwarding between interfaces requires only swapping the buffer indexes between the Rx slot of the incoming and the Tx slot of the outgoing interface.
  Batch packet processing: yes, while the system call overheads are amortized over large batches.
  Parallelism: yes, by spreading the load to multiple CPU cores without lock contention by mapping NIC rings to the equivalent number of netmap rings.

PF_RING
  User-/kernel-space: in-kernel packet processing by means of loadable kernel plugins; it creates a ring buffer in the kernel and uses a kernel/user memory map to share it with the user-space program.
  Zero-copy: no, but the PF_RING ZC version uses it.
  Batch packet processing: none.
  Parallelism: yes, through linear system scaling with the number of cores.

FastClick
  User-/kernel-space: user-space.
  Zero-copy: yes, in both versions, when using Click in combination with Netmap or with DPDK.
  Batch packet processing: yes, it is supported since both DPDK and Netmap support it in transmission and reception; a BatchElement class is introduced and it provides batch functionality.
  Parallelism: yes, as both Netmap and DPDK support it.

DPDK
  User-/kernel-space: user-space.
  Zero-copy: yes.
  Batch packet processing: yes, in order to amortize context-switch overhead.
  Parallelism: yes, by using per-core buffer caches for each buffer pool so that allocation/freeing can be done without using shared variables.

3.5.1.4 Parallelism

RouteBricks is an example of a router architecture which parallelizes packet processing tasks across multiple servers and across multiple cores within those servers. It extends the functionality of the Click Modular Router to exploit new server technologies and applies it to build a cluster-based router. NetSlice provides an array of independent packet processing execution contexts which exploit parallelism by "slicing" the network traffic. Netmap supports spreading the load to multiple CPU cores without lock contention by mapping NIC rings to the equivalent number of netmap rings. In the case of PF_RING, there is linear system scaling with the number of cores as each {queue, application} couple is independent of the others [67]. FastClick and DPDK also exploit parallelism.

3.5.2 Comparison of hardware solutions

3.5.2.1 Hardware used

Among the GPU-based solutions, PacketShader, Snap and GASPP use a discrete GPU, while APUNet uses an integrated GPU. The main difference between the two is that an integrated GPU shares the DRAM with the CPU, as the GPU and CPU are manufactured on the same die. Among the FPGA-based solutions, Chimpp and SwitchBlade use the NetFPGA (even though the authors of SwitchBlade claim that it can be implemented with any FPGA), while ClickNP uses an FPGA and GRIP uses the SLAAC-1V board that is based on an FPGA.

3.5.2.2 Usage of CPU

PacketShader uses the CPU by dividing its threads into worker and master threads. Worker threads are used for packet I/O, while the master threads are used for communication with the GPU. In Snap, the CPU only needs to perform simple operations while all the operations that need high performance are executed on the GPU. Similarly, ClickNP annotates some of its elements to be executed on the CPU, while the elements that perform better on an FPGA are annotated to be executed on it. In the APUNet solution, even though the integrated GPU shares the DRAM with the CPU, the CPU is used for scalable packet I/O, while the GPU is used for fast path operations. The GRIP solution uses the CPU for software exceptions in the case when the processing is less expensive on a CPU or needs to be done only for a minority of packets. In the same way, SwitchBlade uses the CPU for programmable software exceptions (when development can be faster and less expensive on a CPU) where individual packets or flows are directed to the CPU for additional processing. GASPP uses the CPU for infrequently occurring operations while the Chimpp solution uses the CPU for the execution of the software elements of the Click modular router.

3.5.2.3 Connection type

In the PacketShader solution, there are two GPUs that are connected over PCIe to two IOHs (I/O Hubs) that are further connected to two CPUs. Snap connects the CPU to the GPU over the PCIe bus. Similarly, ClickNP uses a PCIe I/O channel that is designed to provide a high-throughput and low-latency connection that enables joint CPU-FPGA processing. In the APUNet solution the integrated GPU is manufactured on the same die as the CPU. GRIP uses the PCIe bus to connect the CPU and the SLAAC-1V/GRIP card, while the GRIP card is connected to the SLAAC-1V card via a 72-pin external I/O connector. Likewise, SwitchBlade uses PCIe to send and receive packets from the host machine. In the GASPP solution, each CPU is connected with peripherals through a separate I/O hub which is linked to several PCIe slots. A GPU is connected to the I/O hub via a PCIe slot. Finally, Chimpp uses the PCIe interface to connect the NetFPGA card with the host machine.

3.5.2.4 Operations in the user-space and kernel-space

PacketShader operates in user-space in order to take advantage of user-level programming [8]: reliability in terms of fault isolation, integration with third-party libraries (e.g. OpenSSL or CUDA) and a familiar development environment. The main problem with user-space packet processing is that it can hardly saturate the network infrastructure, since it is common today to have 40GbE network interfaces; a typical example is the raw socket, which is not able to sustain high rates. Snap represents a Click-based solution which uses a user-space Click router in combination with netmap. Similarly, in [145] the user-space version of the Click Modular router is used, where the standard libpcap library has been replaced with an enhanced version running on top of netmap (called netmapcap). Despite the fact that the Click router is faster when run in kernel-space, as mentioned previously, the user-space Click router in combination with netmap matches or exceeds the speed of the in-kernel Click router [145]. The authors of SwitchBlade state that their solution offers header files in C++, Perl, and Python. Those header files refer to the register address space for the user's register interface only [11]. When the register file is included, the programmer can write a user-space program by reading and writing to the register interface. The register interface can be used to enable or disable modules in the SwitchBlade pipeline. In the GASPP solution, packets need to be transferred from the NIC to the user-space context of the application and then through the kernel to the GPU. However, GASPP proposes a mechanism which provides a shared memory context between the NIC and the GPU.

3.5.2.5 Zero-copy technique

PacketShader uses copying instead of zero-copy techniques (e.g., a huge packet buffer which is shared between user and kernel) for better abstraction [8]. Copying allows flexible usage of the user buffer, like manipulation and split transmission of batched packets to multiple NIC ports. It also simplifies the recycling of huge packet buffer cells. One of the reasons why Snap [32] does not use zero-copy is the divergence before reaching the GPU. The other reason is slicing (copying only the part of the packet which is necessary for packet processing, e.g., the destination address). APUNet uses zero-copy in all stages of packet processing: packet reception, processing by CPU and GPU, and packet transmission. GRIP also uses the zero-copy technique by moving a majority of packet handling onto the network interface, where a stream of received packets can be re-assembled into a data stream on the network interface, which facilitates true zero-copy in the TCP/IP socket model. The GASPP solution delivers the incoming packets to user-space by mapping the NIC's DMA packet buffer.

3.5.2.6 Batch processing

PacketShader has a highly optimized packet I/O engine and processes multiple packets in batches. Pipelining and batching are exploited in order to achieve multi-10G packet I/O performance in the software router [8]. The authors of Snap [32] have shown how the transmit rate increases with the batch size as they extended Click with architectural features which, among other things, support batched packet processing. Snap adds a new type of "batch" element that can be implemented on the GPU and a set of adapter elements that transfer packets from and to batch elements. ClickNP also uses batching in order to reduce the DMA overhead. The APUNet solution reads the incoming packets in a batch by using the DPDK packet I/O library. Similarly, the GASPP solution transfers the packets from the NIC to the memory space of the GPU in batches.

3.5.2.7 Parallelism

PacketShader takes advantage of massively parallel GPU processing power for the purpose of general packet processing. Each packet is mapped to an independent GPU thread. Similarly, Snap also exploits the parallelism of modern GPUs, but it is based on the Click Modular router unlike PacketShader. On the other hand, ClickNP can run in full parallel by using FPGAs, and parallelism can be provided both on the element level, since each element is mapped into a hardware block on the FPGA, and inside an element. APUNet exercises parallel packet processing by running GPU threads in parallel across a continuous input packet stream. The GRIP solution can provide hardware implementations of a variety of parallel communication primitives. SwitchBlade allows multiple custom data planes to run in parallel on the same physical hardware while the running protocols are completely isolated. Similarly to other GPU-based solutions, GASPP classifies the received packets and processes them in parallel on the GPU. All of Chimpp's hardware modules operate in parallel as well.

Table 3.2: Comparison of hardware implementations (criteria: hardware used, usage of CPU, connection type, user-/kernel-space operation, zero-copy, batch packet processing, parallelism)

PacketShader
  Hardware used: GPU.
  Usage of CPU: yes, by dividing the CPU threads into worker and master threads; the former are used for packet I/O while the latter are used for communication with the GPU.
  Connection type: there are two GPUs which are connected over PCIe to two IOHs (I/O Hubs) that are further connected to two CPUs.
  User-/kernel-space: user-space.
  Zero-copy: no (a huge packet buffer shared between user and kernel is used instead).
  Batch packet processing: yes, by batching multiple packets over a single system call to amortize the cost of the per-packet system call overhead.
  Parallelism: yes, by exploiting the massively-parallel processing power of the GPU to perform packet processing, where each packet is mapped to an independent GPU thread.

Snap
  Hardware used: GPU.
  Usage of CPU: yes, the CPU only needs to perform simple operations and can spend most of its time on packet I/O.
  Connection type: the CPU-GPU connection is done over the PCIe bus.
  User-/kernel-space: user-space.
  Zero-copy: no.
  Batch packet processing: yes, by executing the same element function on a set (batch) of packets in series in order to get a parallel speedup on the GPU.
  Parallelism: yes, by providing elements that act as adapters between serial and parallel parts of the processing pipeline in order to exploit the massively-parallel processing power of the GPU.

ClickNP
  Hardware used: FPGA.
  Usage of CPU: yes, as some of the Click elements are annotated to be executed on the CPU while the others are annotated to be executed on the FPGA.
  Connection type: a PCIe I/O channel is designed to provide a high-throughput and low-latency connection that enables joint CPU-FPGA processing.
  User-/kernel-space: user-space.
  Zero-copy: no.
  Batch packet processing: yes, in order to amortize the DMA overhead.
  Parallelism: yes, it exploits the massive parallelism of the FPGA as all element blocks can run in full parallel inside the FPGA logic blocks.

APUNet
  Hardware used: integrated GPU.
  Usage of CPU: yes, it uses the CPU for scalable packet I/O, the integrated GPU being manufactured on the same die as the CPU and sharing the DRAM with it.
  Connection type: the integrated GPU is manufactured on the same die as the CPU.
  User-/kernel-space: user-space.
  Zero-copy: yes, it uses zero-copy in all stages of packet processing: packet reception, processing by CPU and GPU, and packet transmission.
  Batch packet processing: yes, it reads the incoming packets in a batch by using the DPDK packet I/O library.
  Parallelism: yes, it exercises parallel packet processing by running GPU threads in parallel across a continuous input packet stream.

GRIP
  Hardware used: SLAAC-1V (FPGA).
  Usage of CPU: yes, for software exceptions, in the case when the processing is less expensive on a CPU or needs to be done only for a minority of packets.
  Connection type: the connection between the CPU and the SLAAC-1V/GRIP card is done over the PCIe bus, while the GRIP card is connected to the SLAAC-1V card via a 72-pin external I/O connector.
  User-/kernel-space: no details.
  Zero-copy: yes, by moving a majority of packet handling onto the network interface, where a stream of received packets can be re-assembled into a data stream on the network interface, which facilitates true zero-copy in the TCP/IP socket model.
  Batch packet processing: none.
  Parallelism: yes, it can provide hardware implementations of a variety of parallel communication primitives.

SwitchBlade
  Hardware used: NetFPGA (can be implemented with any FPGA).
  Usage of CPU: yes, for programmable software exceptions (when development can be faster and less expensive on a CPU), where individual packets or flows are directed to the CPU for additional processing.
  Connection type: uses the PCI interface to send or receive packets from the host machine.
  User-/kernel-space: user-space.
  Zero-copy: no.
  Batch packet processing: none.
  Parallelism: yes, by allowing multiple custom data planes to run in parallel on the same physical hardware while the running protocols are completely isolated.

GASPP
  Hardware used: GPU.
  Usage of CPU: yes, the CPU is used for the infrequently occurring operations.
  Connection type: each CPU is connected with peripherals through a separate I/O hub which is linked to several PCIe slots; a GPU is connected to the I/O hub via a PCIe slot.
  User-/kernel-space: user-space.
  Zero-copy: yes, it delivers the incoming packets to user-space by mapping the NIC's DMA packet buffer.
  Batch packet processing: yes, by transferring the packets from the NIC to the memory space of the GPU in batches.
  Parallelism: received packets are classified and processed in parallel by the GPU.

Chimpp
  Hardware used: NetFPGA.
  Usage of CPU: yes, for the execution of the software elements of the Click modular router.
  Connection type: uses the PCIe interface to connect the NetFPGA card with the host machine.
  User-/kernel-space: no details.
  Zero-copy: no.
  Batch packet processing: none.
  Parallelism: yes, as the hardware modules all operate simultaneously in parallel.

3.5.3 Discussion on GPU-based solutions

The programmability and raw performance of GPUs have led them to be used in many different applications beyond the most common ones such as graphics rendering, gaming, simulations, etc. [146]. GPUs, in comparison to CPUs, offer extreme thread-level parallelism while CPUs maximize instruction-level parallelism. Packet processing applications are well suited to the data-parallel execution model that GPUs offer. Fundamental architectural differences between CPUs and GPUs are the cause of the faster performance increase of GPUs compared to CPUs. In fact, CPUs are optimized for sequential code and many of the available transistors are assigned to non-computational tasks like caching and branch prediction. On the contrary, GPUs use the additional transistors for computation and achieve a higher arithmetic intensity with the same transistor count [147]. According to [80], the main strengths of the GPU for packet processing compared to the CPU are vectorization and memory latency hiding. Vectorization is provided by the large number of processing cores on the GPU, which makes it suitable for vector processing of packets. In this way, most of the functions such as lookups, hash computation and encryption can be executed for multiple packets at once. Modern GPUs hide latency using hardware, which is more efficient than the mechanisms used by the CPU. On the other hand, the weakness of the GPU is the latency which it introduces. Even though GPUs are able to accelerate packet processing, they introduce latency since it takes about 15µs to transfer a single byte to and from a GPU. Moreover, assembling large batches of packets for parallel processing on GPUs introduces additional latency. In order to cope with the latency problem of discrete GPUs, integrated GPUs can be used instead, as they are located on the same die and communicate through the on-chip interconnect instead of PCIe [106], which provides much lower latency. Additionally, integrated GPUs have lower power consumption because of the absence of an I/O hub used for communication between a discrete GPU and memory, as stated in [148]. The other GPU weakness is the memory subsystem of the GPU, which is optimized for contiguous access as opposed to the random memory access required by networking applications. In this way, GPUs lose an important fraction of their memory bandwidth advantage over CPUs [80]. Another disadvantage of GPUs is that their parallel processing can cause serious packet order violations which can affect end-to-end TCP throughput, as emphasized in [149]. This happens due to differing code complexity, which can cause a later batch of packets to be processed and returned to the host memory before an earlier batch. To address this problem, the authors of [149] proposed a framework called BLOP which preserves the order of batches by lightly tagging the batches in the CPU before transferring them to the GPU and by efficiently reordering the batches in the CPU after the GPU processing. There is also a proposal in [150] that offloads the indexing of packets to GPUs. The authors claim that with their solution they can achieve an order of magnitude faster indexing throughput when compared to state of the art solutions. Another difference between GPUs and CPUs is that CPUs usually maintain API consistency for many years, which is not the case for GPUs. In fact, code which is written for CPUs can be executed on new generations of CPUs much faster than on the previous ones, which is not necessarily the case for GPUs [151].
In order to cope with the problem of backward compatibility, some general-purpose programming environments have been introduced. One of the notable examples is CUDA from Nvidia [152], which is a parallel computing platform and programming model that allows sending C, C++ and Fortran code directly to the GPU. There is also a proposal called Hermes [153] which represents an integrated CPU/GPU shared-memory microarchitecture used for QoS-aware high-speed routing. The memory is shared between the CPU and the GPU and, thus, the memory copy overhead is removed. Hermes reaches 5x higher throughput when compared to a GPU-accelerated software router [154] with a reduction in average packet delay of 81.2%.

3.5.4 Discussion on FPGA-based solutions

In the setup used in [155] it has been shown that FPGAs are more energy efficient than GPUs. In [156] the authors claim that one cannot assert that either of the platforms is the most energy efficient, since the evaluation crucially depends on the devices and methodology used in the experiments. For the majority of works, however, FPGAs are more energy efficient than GPUs, and in turn GPUs are more energy efficient than CPUs. Also, FPGAs provide deterministic timing in the order of nanoseconds, which is one of their main advantages when compared to CPUs and GPUs. FPGAs often have a low clock frequency compared to GPUs and CPUs, typically about 200MHz. They also have a smaller memory bandwidth, usually 2~10GBps, whereas GPUs have 100GBps and an Intel XEON CPU has about 40GBps [13]. In contrast, FPGAs have a massive amount of parallelism built in as they possess millions of LEs (Logic Elements) and thousands of DSP blocks. On the other hand, the FPGA's programming complexity is not negligible, which is why a large community of software developers has stayed away from this technology [157]. Nevertheless, the ClickNP solution tries to overcome this problem by allowing the FPGAs to be programmed in high-level languages through the use of high-level synthesis (HLS) tools. Similarly, Click2NetFPGA [158] also compiles a Click Modular router program directly into an FPGA, but the authors of ClickNP claim that the performance attained with Click2NetFPGA is much lower compared to their own solution. This is due to several bottlenecks in the design of Click2NetFPGA, like packet I/O and memory. Also, unlike ClickNP, Click2NetFPGA does not support joint FPGA/CPU processing. NetFPGA [159, 160] is an FPGA-based platform commonly employed in research environments, which is used for building high-performance networking systems in hardware. A solution like NetFPGA SUME [161, 162] can reach speeds of up to 100Gbps and its low price allows it to be used by the wider academic and research community. There are also some projects like NetThreads [163] which allow running threaded software on the NetFPGA platform by writing programs in C instead of Verilog. Another highly flexible, programmable, FPGA-accelerated platform is called C-GEP [164]. This platform allows the implementation of different packet processing applications such as monitoring, routing, switching, filtering, classification, etc. The authors claim that this platform is capable of handling 100Gbps Ethernet traffic [164]. However, the packet rate (in Mpps or even Gpps) that C-GEP supports is not provided. Besides the interfaces that support 40Gbps and 100Gbps, this platform can also host multiple 1 and 10Gbps Ethernet interfaces [164]. The C-GEP solution relies on the use of a high-capacity Virtex 6 FPGA chip.

Figure 3.2: MPPA PCIe card [16]

3.5.5 Other hardware solutions

Today, one of the most common packet processing hardware offload approaches consists of using NPs (Network Processors) as accelerators. In the Ericsson whitepaper [165] the author claims that NPU (Network Processor Unit) architectures have reached a breaking point as they impose limitations which cannot fulfill new requirements. Namely, the NPU processing pipeline has been designed for layer 3 packet processing and this design cannot reassemble packets into flows for layer 4 processing or layer 7 inspection. At the same time, there is only a small number of software engineers who are capable of programming NPUs. For this reason, networking accelerators for multi-CPU processors have been introduced which close the performance gap to NPUs. High-end multicore SoCs which contain packet-processing hardware blocks attain throughputs of 40Gbps and higher. However, when power consumption is compared between the two systems, the estimation is that the latest multicore processors provide 1Gbps/W, while the latest NPUs provide 5Gbps/W according to [165]. Knapp [166] is a packet processing offloading framework for Intel's Xeon Phi co-processor [167]. Xeon Phi co-processors assist the main CPU by adding instruction throughput and high local memory access bandwidth to the system [166]. The authors claim that their preliminary evaluation shows that the Xeon Phi attains performance results comparable to CPUs and GPUs and that it can reach a line rate of 40Gbps on a single commodity x86 server. Another platform that uses specialized hardware is called MPPA (Massively Parallel Processor Array). We provide a detailed description of this smart NIC card in the following section.

3.6 Kalray MPPA processor

The MPPA2-256 (Bostan) processor was conceived by Kalray, a French company. It is the second processor of the MPPA (Massively Parallel Processor Array) processor family [16]. These processors are well suited for networking, storage, compute-intensive and high-performance embedded applications. There is a rising number of applications that move to the cloud and, as a result, there is an ever increasing amount of data traffic that needs to be handled in the data centers. Consequently, there is a growing need to offload intensive and real-time processing from mainstream processors to all kinds of programmable hardware. In this thesis, one of the contributions is to offload the data plane packet processing onto a programmable smart NIC card; the MPPA PCIe card that was used in our experimental testing is shown in Figure 3.2. Generally speaking, MPPA is an alternative to FPGAs and GPUs. However, it combines several advantages of both CPUs and DSPs. For instance, it has the CPU's ease of development as it is programmable in the C/C++ programming language, which provides it with a 2-4 times faster time to market than FPGAs and ASICs. It also has the DSP's energy efficiency and timing predictability. Additionally, MPPA has high-performance low-latency I/O just like FPGAs. Another advantage of MPPA processors is that they can be tiled together and operate simultaneously on massive calculations. Kalray also provides a simulator in the form of a VM which allows one to develop, build and run applications on the MPPA processor simulator (without the need for a real processor). Debugging using the Kalray GDB debugger and trace tools are also available for this execution mode. Different levels of simulation are provided, such as single processor, single cluster, I/O cluster, etc.

3.6.1 MPPA architecture

This smart NIC card has 256 user-programmable cores, as well as 32 system cores. In total, 288 cores are integrated on a single chip and can run in parallel with high performance and low power consumption. The group of 256 cores is divided into 16 clusters of 16 cores, as shown in Figure 3.3. Additionally, each cluster has one system core, also called a Resource Manager (RM), which schedules the tasks on the user-programmable cores (called the Processing Engines (PEs)). Clusters and cores are represented by gray and green squares, respectively. A single cluster is the elementary processing unit of the MPPA architecture and it runs a specialized, low-footprint OS called NodeOS. Its architecture is shown on the right side of Figure 3.3 while the global architecture of the MPPA processor with all the clusters is shown on the left side. Each individual cluster has 2MB of shared memory that is used by the cores of that cluster. This memory is represented as a red box and it is used both by the RM and the PEs. Each 64-bit VLIW (Very Long Instruction Word) core (whether it is an RM or a PE) is characterized by its Kalray Bostan Instruction Set Architecture (ISA). The cores operate between 400MHz and 600MHz and the typical power consumption is between 10W and 20W. On the left side of Figure 3.3, the I/O subsystems are depicted as orange shapes in the upper-right and lower-left corners around the clusters. Each I/O subsystem has quad-core CPUs and a 40GbE port (which also supports cables that provide a 40G QSFP+ to 4x10G SFP+ conversion if needed).

Figure 3.3: MPPA architecture [16]

Communication between the I/O subsystems and the clusters, as well as between the clusters themselves, is provided by the use of NoCs (Networks on Chips). Each cluster has its own NoC while each I/O subsystem has multiple NoCs. This is shown in Figure 3.4, where the black curves represent the interconnection links between NoCs. In fact, there are two parallel NoCs with bidirectional links, one for data (D-NoC) and another one for control (C-NoC). The two NoC networks have a 2D torus topology [16] but in Figure 3.4 we show only one 2D torus topology in order to make it clearer. The D-NoC performs Remote DMA transfers of high bandwidth between the 16 clusters and the I/O subsystems. The C-NoC provides low-latency messaging and synchronization between the clusters. The DMA (Direct Memory Access) unit of each cluster is used for communication between the shared memory and the NoC. Another important part shown in Figure 3.4 is the Smart Dispatch unit, which is a hashing function implemented in hardware that is used for dispatching packets among different cores for their processing. It allows a developer to define a policy and the exact fields of the packet's header on which this hashing function needs to be executed. More details about the usage of the Smart Dispatch unit are given in Subsection 4.2.3.
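To illustrate what such a dispatching policy amounts to, the sketch below hashes a configurable set of header fields and maps the result onto one of the 16 compute clusters. This is only an illustration of the concept written in plain C; it is not the Smart Dispatch hardware API, and the field selection, the hash function and the cluster count are assumptions made for the example.

    #include <stdint.h>
    #include <stddef.h>

    #define NB_CLUSTERS 16   /* one ODP instance per MPPA compute cluster */

    /* Fields selected by the (hypothetical) dispatch policy.
       The caller must zero the struct before filling it so that
       padding bytes are deterministic. */
    struct dispatch_key {
        uint32_t src_ip;
        uint32_t dst_ip;
        uint16_t src_port;
        uint16_t dst_port;
        uint8_t  proto;
    };

    /* FNV-1a, used here as a stand-in for the hardware hashing function. */
    static uint32_t fnv1a(const uint8_t *data, size_t len)
    {
        uint32_t h = 2166136261u;
        for (size_t i = 0; i < len; i++) {
            h ^= data[i];
            h *= 16777619u;
        }
        return h;
    }

    /* Pick the compute cluster that will process this flow. */
    static unsigned dispatch_cluster(const struct dispatch_key *key)
    {
        return fnv1a((const uint8_t *)key, sizeof(*key)) % NB_CLUSTERS;
    }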

3.6.2 MPPA AccessCore SDK

The Kalray MPPA technology includes the MPPA processor together with MPPA AccessCore. The latter is a complete SDK (Software Development Kit) that integrates the reference standard GNU tools (gcc, g++, gdb, Eclipse). It also provides the debug and system trace tools for advanced optimizations. MPPA AccessCore includes explicit and implicit programming models. The explicit ones are C low-level/Lib and POSIX-level programming, while the implicit ones are OpenCL and ODP (OpenDataPlane).

Figure 3.4: NoC interconnections and the Smart Dispatch unit.

Despite the fact that the implementation of the TRILL protocol could be done in a low-level programming model which gives full control of all the aspects of the MPPA processor, we have decided to use the ODP API for our implementation. The main reason is source code portability, which allows us to reuse the same code on the Linux ODP implementation or on any other hardware acceleration card that supports the ODP API. For instance, there is a solution called ODP-DPDK [125] that puts the ODP API on top of the DPDK solution in order to accelerate the packet processing performance of the ODP API on Linux systems. In this way, we could theoretically build a network that uses different types of hardware for the same purpose of packet processing. The more powerful hardware would be put in locations that have a greater need for processing resources, like gateways, load balancers, etc. Additionally, ODP has been conceived specifically for the purpose of packet processing on any platform for which the ODP API is available. Therefore, the development process of the TRILL protocol's data plane is much easier with the ODP API than with the other programming models supported by MPPA. Kalray's implementation of the ODP API is publicly available in [168]. More information on the ODP API is given in section 3.7.

3.6.3 Reasons for choosing MPPA for packet processing

First of all, we have opted for hardware acceleration of packet processing because hardware solutions achieve better performance in general. The main advantages of MPPA that make it a good choice for packet processing are the following:
• MPPA is programmable in the C/C++ programming language
• high-performance and low-latency I/O, just like FPGAs
• 2x40G QSFP+ interfaces
• support for the ODP API, which is very well suited for programming the networking data plane
• 2-4 times faster time to market than FPGAs and ASICs
• multiple MPPAs can be tiled together for even greater performance
• a Smart Dispatch hardware unit for dispatching packets among clusters of cores
In chapter 4 we evaluate the network performance of the data plane TRILL implementation and it turns out that MPPA can significantly improve network performance, reaching a throughput of 40Gbps for certain packet sizes while reducing latency.

3.7 ODP (OpenDataPlane) API

ODP [123] is a cross-platform set of APIs used for programming the networking data plane. The ODP API allows creating portable applications that can be executed on different types of platforms, as shown in Figure 3.5. Each platform can transparently exploit its own acceleration properties based on the underlying hardware resources on which ODP is implemented. In fact, multiple existing ODP implementations are maintained independently, and Kalray maintains the implementation of ODP on their MPPA processor.

Figure 3.5: ODP application portability across platforms.

ODP consists of three separate component parts that are related to one another. The first one is the ODP API specification that abstractly describes the functional model for data plane applications [17]. The goal of this specification is to provide the underlying platform with the ability to receive, process (manipulate) and transmit the packet data. However, there is no specific rule on how these processes need to be done and the exact implementation is left to the developer. For this purpose, the ODP API uses abstract data types whose implementation is the responsibility of the ODP implementer for each specific platform. The ODP API has the following attributes [17]:
• open source, open contribution, BSD-3 licensed
• platform and vendor neutral
• application-oriented, covering the functional needs of data plane applications
• specifies the functional behavior of ODP in order to ensure portability
• its definition is given jointly and openly by platform implementers and application developers
• its architecture allows it to be efficiently implemented on a wide range of platforms
• maintained, sponsored and governed by the Linaro Networking Group (LNG)
ODP's second component is a set of ODP implementations, where each one has been adapted for a specific target platform. The same functions can, on various platforms, be executed on CPUs or be offloaded to some hardware co-processing engine, depending on the platform architecture and the ODP API implementation. They can also be realized using specialized instructions that accelerate the functional behavior specified by the API [17]. In any case, the functional behavior remains the same from the application's point of view and it is independent of the actual realization on the platform. The main advantage of this approach is that anyone can create an ODP implementation that is specific to their platform. For instance, ODP can be directly deployed in a bare-metal environment or in the Linux user-space. Another advantage is that the owner can choose whether to release the source code or keep it closed depending on business needs. Also, various implementations can have different release cycles but their compatibility across different platforms remains. The third component of ODP is its validation test suite that is used to ensure consistency between different ODP implementations [17]. In fact, it verifies whether the functional behavior of a certain implementation correctly corresponds to the ODP API specification. This component makes sure that applications that use the ODP API can be executed on different platforms and that they remain portable. The tests can be performed both by users and by platform providers in order to validate ODP implementations.

In general, ODP applications share the following characteristics [123]:
• their processing is divided into one or more threads that run in parallel
• their threads communicate between themselves and use synchronization mechanisms
• they receive (transmit) packets from (to) one or more packet I/O interfaces
• they examine, transform and perform different packet processing operations
In the case of the ODP API implementation on the MPPA smart NIC, each cluster has its own ODP instance which uses the cluster's shared memory and is independent of the ODP instances of the other clusters. However, they can all communicate by using the libNOC library and also with the two Ethernet interfaces of the MPPA machine.
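As a concrete illustration of these characteristics, the sketch below shows the typical skeleton of an ODP application: global and local initialization, creation of a packet pool, and opening of a packet I/O interface before the worker threads start their receive-process-transmit loops. It is a minimal sketch against the generic ODP API as described in [17]; the interface name, pool sizing and error handling are simplifying assumptions, and a real application (in particular on the MPPA) would also create worker threads and tune the input/output queue configuration.

    #include <odp_api.h>

    int main(void)
    {
        odp_instance_t instance;
        odp_pool_param_t pool_param;
        odp_pool_t pool;
        odp_pktio_t pktio;

        /* Global and per-thread (local) initialization of the ODP runtime. */
        if (odp_init_global(&instance, NULL, NULL))
            return 1;
        if (odp_init_local(instance, ODP_THREAD_CONTROL))
            return 1;

        /* Packet pool from which received packets will be drawn. */
        odp_pool_param_init(&pool_param);
        pool_param.type    = ODP_POOL_PACKET;
        pool_param.pkt.num = 8192;    /* number of packets (assumption)   */
        pool_param.pkt.len = 1856;    /* enough for an Ethernet frame     */
        pool = odp_pool_create("pkt_pool", &pool_param);

        /* Open the packet I/O interface; "p0" is a placeholder name. */
        odp_pktio_param_t pktio_param;
        odp_pktio_param_init(&pktio_param);
        pktio_param.in_mode  = ODP_PKTIN_MODE_DIRECT;   /* poll mode      */
        pktio_param.out_mode = ODP_PKTOUT_MODE_DIRECT;
        pktio = odp_pktio_open("p0", pool, &pktio_param);

        /* Default input/output queue configuration, then start the port. */
        odp_pktin_queue_config(pktio, NULL);
        odp_pktout_queue_config(pktio, NULL);
        odp_pktio_start(pktio);

        /* ... launch worker threads that run the receive/process/send loop ... */

        odp_pktio_stop(pktio);
        odp_pktio_close(pktio);
        odp_pool_destroy(pool);
        odp_term_local();
        odp_term_global(instance);
        return 0;
    }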

3.7.1 ODP API concepts

The foundation of the ODP API is a set of conceptual structures that facilitate the development of data plane packet processing applications. The main ODP concepts are [17]: Packet, Thread, Queue, PktIO, Pool, Shared Memory, Event, Buffer, Time, Timer and Synchronizer. In this subsection we provide a brief description of the most important ODP concepts. Interested readers can refer to [17] for more details.

3.7.1.1 Packet

Packets are received and transmitted by the I/O interfaces. They are represented by handles of the abstract type odp_packet_t that are specifically defined by each implementation. Packets are drawn from pools of type ODP_POOL_PACKET and they have a rich set of semantics that allows them to be inspected and manipulated in complex ways [17]. Packets also support a rich set of metadata as well as user metadata, where the latter allows an application to associate specific side information with each packet for its own use. Packet objects are created upon their arrival at a PktIO and they can be received directly by the application. Whenever a packet is sent, its object is freed automatically, or it can be freed explicitly by using the odp_packet_free() function. The ODP API also allows applications to create packets from scratch or to make copies of existing packets. The parse results are stored in the metadata section and various APIs permit applications to examine and/or modify this information [17].

Figure 3.6: ODP packet structure [17]

In the ODP API, a packet represents a sequence of bytes that complies with a set of rules peculiar to the protocol in question. For instance, if Ethernet is used as the technology, then the structure of an Ethernet packet (or frame, to be more precise) complies with the rules of the Ethernet standard. Each packet has its own length, which represents the number of bytes that it contains, and offsets that give references to different parts of the packet such as the start of the layer 2 header, of the layer 3 header, of the packet data, etc. In fact, packet data consists of zero or more headers that are followed by zero or more payload bytes and also by zero or more trailers. Figure 3.6 shows the different ODP functions that allow examining and manipulating different parts of a packet. The packet size can be adjusted by manipulating the headroom and/or the tailroom. For instance, packets can be adjusted by pulling and/or pushing both of these areas.
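The headroom manipulation is exactly what an encapsulation such as TRILL needs: pushing the headroom makes room for a new outer header in front of the existing frame, and pulling it removes one. The sketch below illustrates this with the generic ODP packet calls; the header sizes are assumptions for the example and no error handling is shown.

    #include <string.h>
    #include <odp_api.h>

    #define OUTER_ETH_LEN 14   /* outer Ethernet header (no VLAN), assumption */
    #define TRILL_HDR_LEN 6    /* base TRILL header length, assumption        */

    /* Prepend an outer Ethernet + TRILL header to a frame (encapsulation). */
    static void encapsulate(odp_packet_t pkt,
                            const void *outer_eth, const void *trill_hdr)
    {
        /* Extend the packet into its headroom; returns the new start of data. */
        uint8_t *head = odp_packet_push_head(pkt, OUTER_ETH_LEN + TRILL_HDR_LEN);

        memcpy(head, outer_eth, OUTER_ETH_LEN);
        memcpy(head + OUTER_ETH_LEN, trill_hdr, TRILL_HDR_LEN);
    }

    /* Remove the outer Ethernet + TRILL header (decapsulation). */
    static void decapsulate(odp_packet_t pkt)
    {
        odp_packet_pull_head(pkt, OUTER_ETH_LEN + TRILL_HDR_LEN);
    }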

3.7.1.2 Thread

ODP applications are organized into threads that perform the required work. They are divided into two groups: control threads and worker threads. Both of them are represented by the abstract type odp_thread_type_t [17]. Control threads do not participate in the main packet flow through the system. Instead, they control, monitor and organize the operation of the worker threads while also handling exceptions. Worker threads do the majority of the packet processing work. They provide high packet rates and throughput with low and predictable latency. More often than not, they operate on dedicated processing cores. Nevertheless, multiple threads can be pinned to a single core if needed, depending on the implementation platform (typically when only a small number of cores is available). Additionally, we note that in the ODP API implementation on the MPPA platform, it is possible to define the number of threads that are going to be used specifically for packet reception. This value is contained in the abstract type odp_platform_init_t and it can severely impact the packet processing performance of this smart NIC.

3.7.1.3 Queue

A queue is a message passing channel that contains events [17]. Queues are represented by handles of the abstract type odp_queue_t. Enqueue operations add events to a queue while dequeue operations remove events from a queue. Queues can be polled or scheduled. The former need to be polled explicitly in order to receive packets, for instance, while for the latter a scheduler selects and dispatches their events to requesting threads.
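The sketch below illustrates the polled flavour with the generic ODP calls: a plain queue that the application dequeues from explicitly, as opposed to a scheduled queue whose events would be delivered through odp_schedule(). The queue name and the lack of error handling are simplifications.

    #include <odp_api.h>

    /* Create a plain queue, enqueue one event and poll it back. */
    static void queue_roundtrip(odp_event_t ev)
    {
        odp_queue_param_t qparam;
        odp_queue_param_init(&qparam);
        qparam.type = ODP_QUEUE_TYPE_PLAIN;       /* polled, not scheduled */

        odp_queue_t q = odp_queue_create("demo_queue", &qparam);

        odp_queue_enq(q, ev);                     /* producer side */

        odp_event_t got = odp_queue_deq(q);       /* consumer polls the queue */
        if (got != ODP_EVENT_INVALID) {
            /* ... process the event, e.g. convert it back to a packet ... */
        }

        odp_queue_destroy(q);
    }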

3.7.1.4 PktIO

Packet I/O (PktIO) is ODP’s representation of I/O interfaces. It is a logical port that allows ODP processing of Ingress and Egress traffic. PktIOs are represented by handles of abstract type odp_pktio_t. For PktIO abstractions it is possible to set the MAC address, the number of queues, etc. This interface can be used directly in poll mode with adequate function that allow to send and receive packets over the specified PktIO.

3.7.1.5 Pool

Pools are represented by handles of the abstract type odp_pool_t and they are shared memory areas from which elements may be drawn [17]. Pools are usually created during the application's initialization and they are destroyed during its termination. The two most relevant types of pools are Packet and Buffer. In fact, pools can be considered as the memory areas in which packets are stored upon their reception on a PktIO.

3.8 Architecture of the fabric network by using the MPPA smart NICs

The architecture that we propose is based on the fabric network concept, where we offload the TRILL data plane frame processing onto the MPPA smart NICs instead of processing the frames in the Linux kernel on the servers' CPUs. The implementation is done by using the ODP API instances that are running on each MPPA cluster. This is shown in Figure 3.7. Such an architecture can provide higher throughput and reduce network latency. Additionally, the use of smart NIC cards can reduce the cost of the equipment significantly when compared with the cost of proprietary fabric network solutions. The existing implementation of the TRILL control plane runs in the quagga daemon and it injects the calculated parts of the forwarding table into the shared memories of each cluster by communicating over the PCIe interface. VMs are hosted on the same Linux server that uses the MPPA as its networking interface. All the communication between the VMs inside a data center goes over the MPPA smart NICs. The data plane implementation of the TRILL protocol allows different RBridges to communicate inside a TRILL campus. We note here that the VMs do not necessarily belong to the RBridge, but they are shown in the figure as they are hosted on the same server on which the quagga daemon is running. The different RBridges are then connected between themselves inside a fabric network. A detailed discussion on the choice of the fabric network topology that interconnects the RBridges is given in Chapter 5.

Figure 3.7: Architecture of an RBridge that uses the MPPA smart NIC inside a fabric network.

3.9 Conclusion

Cloud service providers and an increasing number of enterprises run their businesses by relying on data centers. The requirements of the Virtualized Data Center cannot be fulfilled with classical layer 2 network solutions. For this reason, we propose an architecture that is based on the fabric network concept. The IETF has proposed the TRILL protocol, which is used for communication within the data center. We have chosen to use this protocol as the basis for our fabric network architecture as it turned out to have all the necessary advantages that allow an efficient communication between VMs in the data center. However, the TRILL protocol is only one possible use case in our fabric network architecture and some other protocol like SPB could be used in its place. It is of vital importance to increase TRILL's packet processing performance on server-class network hosts as it can limit the functionality of the whole data center. For this reason, our idea is to offload its major part, which represents the communication bottleneck, onto a programmable card which can significantly accelerate the TRILL frame processing and, consequently, increase the network throughput. In fact, packet processing speeds in software can barely keep up with interface throughputs, which tend to increase nowadays, and for that reason many different packet processing solutions have appeared lately. A combination of hardware and software tends to be the best option as hardware can alleviate the performance loss due to the virtualization of different parts of the system. Among the hardware accelerators, our decision is to use the MPPA smart NIC, mostly because of its massive parallel processing power and ease of development. Additionally, among the programming models that it supports, we have chosen to use the ODP API because of the source code portability which allows us to reuse the same code on the Linux ODP implementation or on any other hardware acceleration card that supports the ODP API. In the following chapter we deploy this offloading of packet processing on the MPPA smart NIC in order to quantify the performance increase in terms of throughput and delay.

Chapter 4

Data plane offloading on a high-speed parallel processing architecture

Summary
4.1 Introduction
4.2 System model and solution proposal
4.2.1 System model
4.2.2 Frame journey
4.2.3 Implementation of TRILL data plane on the MPPA machine
4.3 Performance evaluation
4.3.1 Experimental setup and methodology
4.3.2 Throughput, latency and packet processing rate
4.4 Conclusion

4.1 Introduction

In order to overcome a communication bottleneck of the existing solution [1] where the data plane of the TRILL protocol is implemented in a modified Linux Bridge Module, the idea is to offload the TRILL frame processing from servers’ CPUs to MPPA (Massively Parallel Processor Array) cards

which can provide highly performant communication between the RBridges. In this chapter, we evaluate the performance of the TRILL data plane implementation on the MPPA smart NIC from Kalray. Part of the content and the results obtained in this chapter have been published in [18].

4.2 System model and solution proposal

4.2.1 System model

The crucial question of the gateway location where the encapsulation/decapsulation of TRILL frames takes place has been a subject of discussion in one of our previous papers [1]. Apart from interconnecting TRILL equipment, one also needs to connect the standard Ethernet network with the TRILL network through a gateway. The TRILL gateway could be located in:
1. a silicon switch,
2. the end hosts, in a software switch, or
3. the endpoint's network interface.
The second option is the most suitable one as the software implementation is done directly on a physical server, which is described in detail in [1]. This solution benefits from the following advantages:
• The TRILL encapsulated VM traffic is separated from the server traffic, which continues to commute in standard Ethernet format.
• VMs on the same host are exempted from encapsulating/decapsulating the traffic between them.
• The number of MAC addresses that each server needs to handle is modest compared to the quantity that a silicon switch would need to address.
• The frame encapsulation overhead is reduced as frames are already generated on the server.
• The software implementation can easily be upgraded and does not require manufacturer support.
Despite the numerous advantages of the TRILL implementation in the Linux kernel [55], we have identified a bottleneck which limits the network and server performance at the same time. Actually, at each TRILL frame hop, there is a need to strip off the Outer Ethernet header that encapsulates the TRILL frame. Then, the TRILL header's hop count field needs to be decreased and the frame gets encapsulated inside a new Outer Ethernet header (with the assumption that the next hop link is of Ethernet type). The source MAC address of the Outer Ethernet header is the MAC address of the encapsulating RBridge interface from which the frame is going to be forwarded. The destination MAC address of the Outer Ethernet header is the MAC address of the next hop RBridge on the path to the egress RBridge. On the path of the frame to its destination, each RBridge consults its forwarding table upon frame reception in order to make the forwarding decision. The main problem here is that all these tasks need to be done for each TRILL frame and at every hop of the paths to the egress RBridges. Moreover, there is no fast path solution for the TRILL protocol. Therefore, each frame that is received by the RBridge network interface has to be managed by the OS and then has to be processed by the CPU. This represents the communication bottleneck as it uses up the CPU and memory resources [169]. Consequently, the lack of a fast path and the fact that the frame processing is performed in the OS kernel are the reasons for the limited throughput of the TRILL frames. A TRILL implementation for the Linux Bridge module has been proposed in [1]. Nevertheless, the previously described throughput issue has been encountered. In order to solve this issue, in this section we propose the first - to the extent of our knowledge - TRILL fast path solution. For this purpose, we use a dedicated MPPA smart NIC from Kalray. On the MPPA card, we offload all the data plane processing of the TRILL frames. The goal is to improve the network performance significantly while preserving the server's resources.
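To make the per-hop work listed above concrete, the sketch below shows one way to represent the outer Ethernet and TRILL headers in C and the rewrite that every transit RBridge performs: decrement the hop count and swap the outer MAC addresses for the next hop. The structure layout follows the TRILL base header of [54], but the field packing, the helper names and the absence of options handling are simplifying assumptions of this illustration, not the code that runs on the MPPA.

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    struct eth_hdr {                  /* outer Ethernet header        */
        uint8_t  dst[6];
        uint8_t  src[6];
        uint16_t ethertype;           /* 0x22F3 for TRILL             */
    } __attribute__((packed));

    struct trill_hdr {                /* TRILL base header (6 bytes)  */
        uint16_t ver_res_m_oplen_hop; /* V, R, M, op-length, hop count */
        uint16_t egress_nickname;
        uint16_t ingress_nickname;
    } __attribute__((packed));

    /* Hop count is the low 6 bits of the first 16-bit word (network order). */
    static int trill_decrement_hop(struct trill_hdr *th)
    {
        uint16_t w   = ntohs(th->ver_res_m_oplen_hop);
        uint16_t hop = w & 0x003F;

        if (hop == 0)
            return -1;                        /* frame must be dropped */
        w = (uint16_t)((w & ~0x003F) | (hop - 1));
        th->ver_res_m_oplen_hop = htons(w);
        return 0;
    }

    /* Per-hop rewrite of a transit TRILL frame. */
    static int forward_transit_frame(struct eth_hdr *outer, struct trill_hdr *th,
                                     const uint8_t local_mac[6],
                                     const uint8_t next_hop_mac[6])
    {
        if (trill_decrement_hop(th) < 0)
            return -1;
        memcpy(outer->src, local_mac, 6);     /* our forwarding interface */
        memcpy(outer->dst, next_hop_mac, 6);  /* next RBridge on the path */
        return 0;
    }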

Figure 4.1: Communication between control plane and data plane [18].

The control plane of the proposed solution remains on the server while the data plane traffic is offloaded on the smart NIC. An implementation of the TRILL control plane has been done in the Quagga daemon [1]. It computes all the shortest paths between any pair of nodes and the distribution trees for multicast frames by using the information from LSP (Link State PDU) messages, exchanged between RBridges. Once the forwarding table is computed, it gets provided to the NIC in order to forward the TRILL frames. The PCI-Express bus and the NIC's I/O module allow transferring the forwarding table from the CPU to the NIC (Figure 4.1). Then the card dispatches parts of the forwarding table stored in the NIC's I/O module into the shared memory of each compute cluster (Figure 4.2). The goal is to dispatch in the shared memory of each cluster only the part of the forwarding table that will be used to forward the frames processed by that cluster. As a matter of fact, the shared memory of each cluster is limited to 2 MB (Figure 3.3). Consequently, storing the entire forwarding table in the shared memory of a single cluster is not possible. To enable storing the entire forwarding table across all the clusters, we divide the forwarding table into 16 parts, one part per cluster. The division is done based on the egress nicknames such that each part contains the forwarding information for 2^16/16 = 4096 egress nicknames. In subsection 4.2.3, we provide more information on the division of the forwarding table across different clusters.
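As an illustration of this partitioning, the sketch below splits a global table indexed by egress nickname into 16 per-cluster slices of 4096 entries. The actual grouping on the MPPA is produced by the Smart Dispatch hashing function described in subsection 4.2.3, so the simple modulo mapping, the entry layout and the function names used here are assumptions made only for readability.

#include <stdint.h>

#define NB_CLUSTERS         16
#define NICKNAMES           65536                      /* 2^16 possible egress nicknames */
#define ENTRIES_PER_CLUSTER (NICKNAMES / NB_CLUSTERS)  /* 4096 entries per cluster */

/* One forwarding entry: next-hop RBridge MAC and output port (illustrative layout). */
struct fwd_entry {
    uint8_t next_hop_mac[6];
    uint8_t out_port;
    uint8_t valid;
};

/* Hypothetical per-cluster slice of the table: 4096 x 8B = 32KB,
 * well within the 2MB of shared memory per cluster. */
struct cluster_table {
    struct fwd_entry entry[ENTRIES_PER_CLUSTER];
};

/* Assumed grouping: cluster id derived from the egress nickname. */
static inline unsigned cluster_of(uint16_t egress_nickname)
{
    return egress_nickname % NB_CLUSTERS;
}

/* Split the global table (indexed by nickname) into 16 per-cluster slices. */
void partition_table(const struct fwd_entry global[NICKNAMES],
                     struct cluster_table slice[NB_CLUSTERS])
{
    for (uint32_t nick = 0; nick < NICKNAMES; nick++) {
        unsigned c   = cluster_of((uint16_t)nick);
        unsigned idx = nick / NB_CLUSTERS;   /* local index inside the slice */
        slice[c].entry[idx] = global[nick];
    }
}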

Figure 4.2: Partitioning of the forwarding table [18].

4.2.2 Frame journey

4.2.2.1 Control frame

Control frames are not processed by the smart NIC. Instead, they are received and sent directly to the CPU for processing. The implementation in Quagga follows the TRILL RFC [54]. However, in the following sections we analyze the load of the control plane depending on the topology used in the PoP data center.

4.2.2.2 Data frame

We have developed a TRILL fast path solution that offloads the data plane processing of the frames on the MPPA smart NIC in order to improve the throughput of the TRILL traffic. The MPPA card encapsulates the data packets from the VMs into TRILL data frames. Moreover, it decapsulates the TRILL frames addressed to the host of the MPPA card. Additionally, this smart NIC also performs the forwarding of the TRILL frames in the TRILL network. For this purpose, the smart NIC uses a forwarding table that is computed by the control plane. The MPPA is also in charge of updating the TRILL header by decrementing the hop count (TTL) field and by replacing the outer Ethernet header of the frame. The smart NIC performs the steps shown in Algorithm 1 when sending a frame. In line 2 of Algorithm 1, the MPPA checks if there is a destination RBridge Nickname that is associated with the destination MAC address. Similarly, Algorithm 2 details the steps performed by the smart NIC when receiving a frame.

Algorithm 1 Sending a frame
1: if (destNickname is known) then
2:     if (∃ destNickname ↔ destMAC) then
3:         MPPA encapsulates in a unicast TRILL frame;
4:         (with srcNickname ← MPPANickname)
5:         MPPA encapsulates in an Ethernet frame;
6:         (and sends towards a next hop)
7:     end if
8: else
9:     encapsulate in a multicast TRILL frame;
10:        (with srcNickname ← MPPANickname
11:        and destNickname ← Nickname of the distribution tree)
12:    encapsulate in an Ethernet frame;
13:    (and sends towards a next hop)
14: end if

Algorithm 2 Receiving a frame
1: if (destNickname = MPPANickname) then
2:     MPPA removes the TRILL header;
3:     lookup for destMAC in the local DB;
4:     if found then
5:         send frame to the host for delivery to the VM;
6:     else
7:         if no destination then
8:             drop the frame;
9:         end if
10:    end if
11: else
12:    MPPA removes the outer Ethernet header;
13:    decrease the hop count in the TRILL header;
14:    encapsulate in an Ethernet frame;
15:    (and sends towards a next hop)
16: end if
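For illustration, the C sketch below implements the forwarding branch of Algorithm 2 (lines 11-16) together with the per-hop header rewriting described in subsection 4.2.1. It assumes the header structures, hop-count helpers and per-cluster table from the two previous sketches are in scope; the drop when the hop count reaches zero follows the TRILL RFC rather than the algorithm listing, and the whole routine is an illustrative outline, not the MPPA code.

#include <string.h>

/* Forward a TRILL frame that is not addressed to this RBridge (Algorithm 2, lines 11-16).
 * 'frame' points to the outer Ethernet header; returns the frame length to transmit, or 0 to drop. */
size_t trill_forward(uint8_t *frame, size_t len,
                     const struct cluster_table *tbl,
                     const uint8_t own_mac[6])
{
    struct eth_hdr   *outer = (struct eth_hdr *)frame;
    struct trill_hdr *trill = (struct trill_hdr *)(frame + sizeof(*outer));

    if (len < sizeof(*outer) + sizeof(*trill) ||
        ntohs(outer->ethertype) != ETHERTYPE_TRILL)
        return 0;                                  /* not a TRILL frame: ignore here */

    uint8_t hc = trill_hop_count(trill);
    if (hc == 0)
        return 0;                                  /* hop count exhausted: drop (RFC behaviour) */
    trill_set_hop_count(trill, hc - 1);            /* decrease the hop count */

    /* Look up the next hop for the egress nickname in the local (per-cluster) table.
     * This assumes the dispatcher already sent the frame to the cluster owning this nickname. */
    uint16_t nick = ntohs(trill->egress_nickname);
    const struct fwd_entry *e = &tbl->entry[nick / NB_CLUSTERS];
    if (!e->valid)
        return 0;

    /* Re-encapsulate: rewrite the outer Ethernet header in place. */
    memcpy(outer->dst, e->next_hop_mac, 6);        /* next-hop RBridge on the path */
    memcpy(outer->src, own_mac, 6);                /* our forwarding interface */
    outer->ethertype = htons(ETHERTYPE_TRILL);

    return len;                                    /* frame ready to be sent on e->out_port */
}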

4.2.3 Implementation of TRILL data plane on the MPPA machine

We implemented the TRILL data plane on the MPPA machine by using the ODP API. When a packet arrives on one of the Ethernet interfaces of the MPPA, it gets dispatched by the Smart Dispatch unit on one of the active clusters. Inside the MPPA, the NoCs are used for the internal packet routing and each cluster has its own NoC, as shown in Figure 3.4. Interconnection links between NoCs are represented as black curves. The hardware packet dispatcher forwards each packet to one of the active clusters depending on the user-defined policy (rule) that is given to the dispatcher. Active clusters are clusters which have an open ODP Packet I/O (PktIO) logical port that allows receiving (transmitting) packets from (to) the NIC. By default, the Smart Dispatch unit forwards the packets to the active clusters in RR (Round Robin) manner if no policy is set. The Smart Dispatch unit can perform several different policies. Selective Round Robin is one of the available policies and it allows a user to predetermine the cluster to which a frame is sent depending on the result of the Smart Dispatch hashing function. The Smart Dispatch unit performs a hashing function on user-defined bytes of the received frame's header. Our goal is to divide all the TRILL traffic into 16 equal parts by using the Smart Dispatcher in order to balance the computing load on the 16 processing clusters of the NIC. To achieve that, we apply the Smart Dispatcher hashing function on the 2-byte-long Egress RBridge Nickname field of the TRILL header. The NIC possesses 16 computing clusters, while the number of possible nicknames is 2^16 = 65536. Since the hashing function always has the same number of possible outputs as the number of active clusters, we end up with 16 groups of nicknames (1 per cluster). Additionally, the dispatcher always tries to perform a uniform division of its hashed values, therefore each one of the 16 groups contains 2^16/16 = 4096 nicknames. Finally, as the Selective Round Robin policy allows the user to choose the cluster for each of the 16 resulting groups of the hashing function, each cluster needs to hold only a 1/16 part of the whole forwarding table (2^16/16 = 2^12 entries), as it will always receive the addresses from the same group. This not only reduces the amount of used shared memory inside a cluster (shown in Figure 3.3), which is already small (2MB), but it also accelerates the lookup process in the forwarding table, whose size per cluster is reduced significantly. The partitioning of the forwarding table is depicted in Figure 4.2. This dispatching procedure, along with the implementation of the steps described in Section 4.2.2.2 by using the ODP API, allows the TRILL frames to be processed efficiently on the MPPA processor.
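To give an idea of the overall shape of the per-cluster processing, the skeleton below uses the ODP packet I/O calls (odp_pktin_recv(), odp_pktout_send(), odp_packet_data(), odp_packet_len(), odp_packet_free()) in a simple burst loop. ODP global/local initialization, pool and PktIO configuration are omitted, and process_trill_frame() is a hypothetical stand-in for the logic sketched earlier; treat this as a simplified outline rather than the exact MPPA implementation.

#include <stddef.h>
#include <stdint.h>
#include <odp_api.h>

#define BURST 32

/* Hypothetical helper implementing encapsulation/decapsulation/forwarding
 * (see Section 4.2.2.2); returns non-zero if the frame must be transmitted. */
extern int process_trill_frame(uint8_t *data, size_t len);

/* Per-cluster worker: receive a burst, process each frame, send survivors back out. */
static void cluster_worker(odp_pktin_queue_t in, odp_pktout_queue_t out)
{
    odp_packet_t pkts[BURST];

    for (;;) {
        int n = odp_pktin_recv(in, pkts, BURST);   /* burst receive from the PktIO queue */
        if (n <= 0)
            continue;

        int kept = 0;
        for (int i = 0; i < n; i++) {
            uint8_t *data = odp_packet_data(pkts[i]);
            size_t   len  = odp_packet_len(pkts[i]);

            if (process_trill_frame(data, len))
                pkts[kept++] = pkts[i];            /* keep frames that must be transmitted */
            else
                odp_packet_free(pkts[i]);          /* drop the rest */
        }

        if (kept > 0)
            odp_pktout_send(out, pkts, kept);      /* burst transmit towards the next hop */
    }
}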

4.3 Performance evaluation

4.3.1 Experimental setup and methodology

The existing control plane is already fully tested and running on more than 400 machines in the public cloud infrastructure of Gandi SAS, and the proof of concept with the MPPA is sufficient to show that this card is able to execute highly performant data plane TRILL frame processing for the three different cases: encapsulation, forwarding and decapsulation of TRILL frames. Thus, for the performance evaluation of the proposed architecture, we used two servers which are directly connected over optical 40Gbps links to the two QSFP+ interfaces of the MPPA processor card. The traffic has been generated from both sides by the two servers (two Supermicro SYS-5018R-MR) using Intel XL710 (40GbE QSFP+) NICs. The Pktgen-dpdk traffic generator (Pktgen version 3.4.9 with DPDK 17.08.1) was used to generate Ethernet and TRILL frames with various frame sizes. Then, both servers received the traffic processed (forwarded) by the MPPA machine. The servers run Ubuntu 16.04.3 LTS. The packet generator was able to saturate the 40Gbps network interface when generating Ethernet and TRILL frames for all sizes except the smallest ones (64B for Ethernet and 84B for TRILL frames, respectively). This is due to the limited achievable packet rate of the Intel XL710 NIC for small packet sizes [170] (the maximum packet rate is 42.56Mpps according to our measurements when generating traffic with Pktgen-dpdk).

Figure 4.3: Throughput in the case of encapsulation, forwarding and decapsulation of TRILL frames.

For the round-trip time (RTT) measurements, we used a single server that generated TRILL frames towards the MPPA processor. All the frames were forwarded back to the same server in order to avoid synchronization inaccuracies between different servers. We recorded a timestamp for each packet both when transmitting and when receiving it on the interface. Then, the mean round-trip time for different TRILL frame sizes was calculated based on this data.

4.3.2 Throughput, latency and packet processing rate

We have evaluated the throughput that each server receives from the MPPA for three different cases: encapsulation of Ethernet frames in TRILL frames, forwarding of TRILL frames, and decapsulation of TRILL frames to Ethernet frames. These three cases are shown as red, blue and green hatched bars in Figures 4.3 and 4.4, respectively. The numbers on the x-axis in all three cases represent the sizes of the frames with all three headers included (outer Ethernet, TRILL and inner Ethernet header). Firstly, in the case of forwarding, the size of the frame at the input is the same as at the output of the MPPA for all frame sizes on the x-axis. As described in subsection 4.2.2, the frame has all three headers before and after the forwarding. From Figure 4.3 we can see that the forwarding throughput reaches the maximum values for packet sizes ranging from 256B to 1518B and that it is limited by the speed of the networking interface (40Gbps) and not by the processing power of the MPPA. For packet sizes of 84B and 128B, the limitation is due to the maximum packet rate supported by the MPPA, which is around 18Mpps. For all the other frame sizes, the packet rates correspond to the maximum throughput achievable on the link (40Gbps).

Figure 4.4: Packet rate in the case of encapsulation, forwarding and decapsulation of TRILL frames.

In the case of encapsulation, the value on the x-axis represents the size of the encapsulated frame (with all three headers). This means that at the input of the MPPA the frame size is smaller by 20B (=14B + 6B), which represents the sum of the Ethernet and TRILL header sizes. For instance, in the case of the first bar (in Figures 4.3 and 4.4), the (inner) Ethernet frame of 64B is encapsulated in an 84B (=64B+20B) frame. From Figure 4.3 we can conclude that in the case of encapsulation the MPPA is able to encapsulate Ethernet frames with TRILL headers at full-duplex line rate (40Gbps) for frame sizes ranging from 512B to 1518B. However, even though at the output of the MPPA we have the maximum throughput (40Gbps), some of the packets that are sent for encapsulation are dropped. This happens because it is not possible to encapsulate all the packets of a certain size if they are sent at the NIC's maximum throughput to the MPPA, as they would require an even higher throughput at the output (for example, 492B frames arriving at 40Gbps would need roughly 40 x 512/492 ≈ 41.6Gbps at the output once encapsulated into 512B frames). For this reason, the encapsulation throughput is limited by the speed of the networking interface for frame sizes ranging from 512B to 1518B. Other frame sizes (ranging from 84B to 256B when encapsulated) do not reach the line-rate throughput. Their data rate is limited by the maximum packet rate supported by the MPPA, which is around 18Mpps, as shown in Figure 4.4. For larger packet sizes, the packet rates correspond to the maximum throughput achievable on the link (40Gbps).

For the case of decapsulation, the value on the x-axis represents the size of the frame that gets decapsulated (with all three headers). This means that at the input of the MPPA the frame size is larger by 20B (=14B + 6B) than at the output of the MPPA. For example, in the case of the last bar (1518B), the frame gets decapsulated to a frame of 1498B (=1518B-20B). From Figure 4.3 we can conclude that the decapsulation throughput does not reach the line rate even for large frame sizes, which is expected since it is not possible to have the maximal throughput at both the input and the output in the case of decapsulation. However, from Figures 4.3 and 4.4 we can see that in the case of decapsulation the packet rate maintains the maximum speed for frame sizes ranging from 256B to 1518B. This means that all the packets get decapsulated, but the lower throughput (in Figure 4.3) corresponds to the smaller size of the decapsulated packets. For frame sizes ranging from 84B to 128B, line-rate throughput is not reached, due to the limited packet rate supported by the MPPA.

In Figure 4.4 we can see that the rate at which a server receives the processed frames in all three cases is not bounded by the processing capabilities of the MPPA processor for the large frame sizes (512B or larger). In fact, for these sizes the packet rate reaches the top limit for the given NIC throughput of 40Gbps. Both the received throughput and packet rate results show that offloading packet processing on a parallel processing architecture can largely overcome the limitations of software packet processing. Additionally, in all three cases for large packet sizes, the performance was bounded by the NIC throughput and not by the processing power of the MPPA card.

Figure 4.5: Performance comparison of MPPA and Linux solutions for TRILL frame forwarding (1500B): a) throughput and b) packet rate.

Figure 4.5 shows the TRILL frame forwarding performance comparison of the existing Linux solution and the MPPA solution that we propose. In the forwarding tests, the frame size has been set to 1500B, as in [1]. In Figure 4.5.a) we can see that the throughput is more than 8 times higher when the MPPA is used, compared to the existing TRILL frame forwarding solution for Linux. Accordingly, the packet rate in Figure 4.5.b) is significantly higher when the MPPA is used. In the case of the MPPA, the packet rate is limited by the NIC's throughput (40Gbps), as can be seen in Figure 4.4, where packet rates can go up to 18Mpps (for smaller frame sizes) compared to less than 4Mpps in Figure 4.5 (for 1500B frames). Moreover, the round-trip time needed to send a TRILL frame to and receive it back from the MPPA processor is shown in Figure 4.6. The Cumulative Distribution Function (CDF) of the RTT shows the probability that the RTT is lower than a certain value for different packet sizes. The RTT is significantly reduced when compared to the results that were obtained in the previous work [1], where TRILL frame processing was performed directly on the server. The main advantage when using the MPPA smart NIC is that upon arrival, the TRILL frame gets dispatched directly into the shared memory of the cluster responsible for processing it. This fast-path processing allows the card to quickly process the frame and send it back to the network interface that needs to forward it.
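As a rough cross-check of these packet-rate limits (our own back-of-the-envelope calculation, assuming the quoted frame sizes include the FCS and adding the standard 20 bytes of preamble, SFD and inter-frame gap per frame on the wire), the theoretical line-rate packet rates at 40Gbps are:

\[ R_{84B} = \frac{40\cdot 10^9}{(84+20)\cdot 8} \approx 48.1\ \mathrm{Mpps}, \qquad R_{256B} = \frac{40\cdot 10^9}{(256+20)\cdot 8} \approx 18.1\ \mathrm{Mpps}, \qquad R_{1518B} = \frac{40\cdot 10^9}{(1518+20)\cdot 8} \approx 3.25\ \mathrm{Mpps} \]

Under these assumptions, 84B frames would require far more than the roughly 18Mpps that the MPPA sustains, 256B frames sit almost exactly at that limit (which is why line rate is reached from 256B upwards), and 1500B-class frames need well under 4Mpps, consistent with Figure 4.5.b).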

Figure 4.6: Round-trip time (CDF of the RTT, in µs, for 84B, 512B and 1518B TRILL frames).

4.4 Conclusion

In this chapter, we have presented an architecture that offloads the data plane packet processing on a programmable high-speed parallel processor. Despite the benefits of the TRILL protocol implementation in the Linux kernel, which is provisioned in the public cloud infrastructure of Gandi SAS, we found a packet processing bottleneck that limits performance. Packet processing speeds in software can hardly keep up with the throughput of today's NICs. For this reason, the hybrid combination of hardware and software turns out to have the highest performance. Our experimental results show that the MPPA processor can provide significant performance improvements in packet processing while exploiting the parallelism of its manycore architecture. It can process packets at full-duplex line rate (up to 40Gbps) for different packet sizes while reducing latency. Since the MPPA card can be integrated over PCIe, it can replace the server's NIC and offload the computation- and memory-intensive workload of packet processing onto a parallel processing architecture while preserving the server's resources. This means that integrating the TRILL protocol in such devices would allow creating a fabric network without the need for a switch. Furthermore, the ODP API on the MPPA processor allows the easy integration of other networking protocols that could be suitable for a fabric network.

Chapter 5

Analysis of the fabric network’s control plane for the PoP data centers use case

Summary
5.1 Data center network architectures
5.2 Control plane
5.2.1 Calculation of the control plane metrics
5.2.2 Discussion on parameters used
5.2.3 Overhead of the control plane traffic
5.2.4 Convergence time
5.2.5 Resiliency and scalability
5.2.6 Topology choice
5.3 Conclusion

5.1 Data center network architectures

In this chapter, we present and discuss three different network topologies that are suitable for our PoP (Point of Presence) data center use case, where the number of servers is limited. Each topology has different scaling properties when the number of servers is increased. Data center network architectures have been a subject of research for quite a while now. As stated in [19], typical network topologies consist of two-level or three-level trees. The former have a core and an edge level, while the latter have an additional aggregate layer.

Figure 5.1: Fat-tree topology (k=4) [19].

Fat-tree topology, one of the best-known interconnection architectures, has been proposed by Leiserson [171]. Despite the fact that in chapters 2 and 3 we claimed that our goal is to build a fabric network with a flat architecture, in this chapter we analyze the behavior of the fat-tree topology as well in order to be able to compare it with the other two topologies. The authors of [19] proposed a special case of a Clos topology that they also called fat-tree topology. They formalized it as in Figure 5.1, where the network is divided into k pods, each of which contains two layers of k/2 switches. The lower layer is the edge layer, where each k-port switch connects with the aggregation layer over k/2 ports. Each switch in the aggregation layer uses k/2 ports to connect with the core layer and the remaining k/2 ports to connect with the edge layer. The core layer contains (k/2)^2 k-port switches [19].

Figure 5.2: Full-mesh topology.

All the switches are connected directly to each other in the full-mesh topology, as shown in Figure 5.2. This allows any pair of switches to communicate independently of the state of the other switches in the network. The main advantage of this topology is its resilience, as the number of redundant paths between any pair of nodes is maximized, which means that link failures do not have a severe impact on connectivity between the nodes. The disadvantage of this approach is that many switch ports are used for the interconnection with other switches in the network, and this number depends linearly on the size of the network.
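For example (anticipating the 8 usable 10Gbps ports per smart NIC discussed in subsection 5.2.2), a full mesh is limited to n = 9 RBridges in our setting, and even that small mesh already requires

\[ n - 1 = 8 \ \text{ports per node} \qquad \text{and} \qquad l_{FM}(9) = \frac{9\cdot 8}{2} = 36 \ \text{links.} \]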

Figure 5.3: Hypercube topology.

Hypercube topologies have been widely used for processor interconnections [172, 173]. Figure 5.3 shows an example of the hypercube topology with 16 switches, where a regular cube has been replicated and the corresponding switches have been directly connected. Moreover, the same replication principle can be applied in order to increase the dimension of the hypercube topology. The number of ports per switch needed to connect a switch with the rest of the network is a logarithmic function of the size of the network. Moreover, this topology is a mesh topology, which means that it allows any host in the network to communicate with any other host in the network. A downside of the hypercube topology, when compared with the full-mesh one, is that usually multiple hops are needed in order to reach the destination, which is not the case for a full-mesh topology because of its direct connections. An advantage of the hypercube topology is its high scalability, as only one additional port per switch is needed in order to double the size of the whole network. In our work, the goal is to create a topology in which we will not use conventional networking equipment like switches and routers, but rather a network where smart NICs are interconnected and perform the control and the data plane functions of the TRILL protocol.
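As a side note on the hypercube structure above: if the n nodes are numbered 0 to n-1, each of the log2(n) neighbors of a node is obtained by flipping exactly one bit of its index, which is also why the per-node port count grows only logarithmically. A minimal illustration (the numbering scheme is our own, not taken from the thesis):

#include <stdio.h>

/* Print the log2(n) neighbors of node 'id' in an n-node hypercube (n a power of two).
 * Each neighbor differs from 'id' in exactly one bit of its index. */
static void hypercube_neighbors(unsigned id, unsigned n)
{
    for (unsigned bit = 1; bit < n; bit <<= 1)
        printf("node %u <-> node %u\n", id, id ^ bit);
}

int main(void)
{
    hypercube_neighbors(5, 16);   /* the 4 direct neighbors of node 5 in a 16-node hypercube */
    return 0;
}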

5.2 Control plane

Each node in the network has a running implementation of the TRILL control plane, while the data plane is implemented on the node's smart NIC. The question that comes up is the choice of the most appropriate topology for our data center network. As we currently have multiple PoP data centers that have 30-80 servers each, we focus our analysis on topologies whose size has this order of magnitude. The comparison of full-mesh, fat-tree and hypercube topologies is shown in Table 5.1. The derivation of all the equations from Table 5.1 is given in the following subsection. We compare these three topologies in terms of: the total number of links in the topology, the number of TRILL Hello frames sent in the network per minute, the number of TRILL Hello frames sent per link, the number of LSPs sent in the network per minute, the number of LSPs sent per link and the average distance of the topology.

Table 5.1: Control plane characteristics of different topologies

Total number of links in the topology:
  Full-mesh: \( l_{FM}(n) = \frac{n(n-1)}{2} \)
  Fat-tree: \( l_{FT}(n) = \frac{1}{2}\left(\frac{4n}{5}\right)^{3/2} \)
  Hypercube: \( l_{HC}(n) = \frac{n}{2}\log_2 n \)

Number of TRILL Hello frames sent in the network per minute:
  Full-mesh: \( n_{FM-H}(n) = n(n-1)\,\lambda_H \ [\mathrm{min}^{-1}] \)
  Fat-tree: \( n_{FT-H}(n) = \lambda_H \left(\frac{4n}{5}\right)^{3/2} [\mathrm{min}^{-1}] \)
  Hypercube: \( n_{HC-H}(n) = n\,\lambda_H \log_2 n \ [\mathrm{min}^{-1}] \)

Number of TRILL Hello frames sent per link:
  Full-mesh: \( n_{FM-Hpl}(n) = 2\lambda_H \ [\mathrm{min}^{-1}] \)
  Fat-tree: \( n_{FT-Hpl}(n) = 2\lambda_H \ [\mathrm{min}^{-1}] \)
  Hypercube: \( n_{HC-Hpl}(n) = 2\lambda_H \ [\mathrm{min}^{-1}] \)

Number of LSPs sent in the network per minute:
  Full-mesh: \( n_{FM-LSP}(n) = n(n-1)\,\lambda_{LSP} \ [\mathrm{min}^{-1}] \)
  Fat-tree: \( n_{FT-LSP}(n) = n(n-1)\,\lambda_{LSP} \ [\mathrm{min}^{-1}] \)
  Hypercube: \( n_{HC-LSP}(n) = n(n-1)\,\lambda_{LSP} \ [\mathrm{min}^{-1}] \)

Number of LSPs generated per link:
  Full-mesh: \( n_{FM-LSPpl}(n) = 2\lambda_{LSP} \ [\mathrm{min}^{-1}] \)
  Fat-tree: \( n_{FT-LSPpl}(n) = \frac{5}{4}\,\lambda_{LSP}\,\frac{n-1}{\sqrt{n/5}} \ [\mathrm{min}^{-1}] \)
  Hypercube: \( n_{HC-LSPpl}(n) = 2\lambda_{LSP}\,\frac{n-1}{\log_2 n} \ [\mathrm{min}^{-1}] \)

Average distance of the topology:
  Full-mesh: \( \mu_{FM}(n) = 1 \)
  Fat-tree: \( \mu_{FT}(n) = \frac{\frac{16n}{5} - 4\sqrt{\frac{n}{5}} - 4}{\frac{4n}{5} - 2} \)
  Hypercube: \( \mu_{HC}(n) = \frac{1}{n-1}\sum_{k=1}^{\log_2 n} k\binom{\log_2 n}{k} \)

5.2.1 Calculation of the control plane metrics

Let n designate the number of nodes (where each node has a control plane, while a data plane is implemented on its smart NIC). Let λH represent the TRILL Hello frame generation rate (per minute) of a single RBridge on each of its interfaces. Additionally, let λLSP represent the generation rate of LSPs (per minute) that a single RBridge sends to each RBridge in the network. In the following subsections, we derive the equations from Table 5.1 for the hypercube, the fat-tree and the full-mesh topology.

5.2.1.1 Full-mesh topology

The number of links in the full-mesh topology is equal to the number of nodes n multiplied by the number of links per node (n - 1) and divided by 2, as each link is otherwise counted twice:

\[ l_{FM}(n) = \frac{n(n-1)}{2} \]

The total number of TRILL Hello frames in the network generated per minute is:

\[ n_{FM-H}(n) = n(n-1)\,\lambda_H \ [\mathrm{min}^{-1}] \]

The number of TRILL Hello frames sent per link is:

\[ n_{FM-Hpl}(n) = \frac{n_{FM-H}(n)}{l_{FM}(n)} = 2\lambda_H \ [\mathrm{min}^{-1}] \]

The total number of LSPs generated in the network is:

\[ n_{FM-LSP}(n) = n(n-1)\,\lambda_{LSP} \ [\mathrm{min}^{-1}] \]

The number of LSPs generated per link is:

\[ n_{FM-LSPpl}(n) = \frac{n_{FM-LSP}(n)}{l_{FM}(n)} = 2\lambda_{LSP} \ [\mathrm{min}^{-1}] \]

The average distance of the full-mesh topology is equal to 1, as all the RBridges are directly interconnected:

\[ \mu_{FM}(n) = 1 \]

5.2.1.2 Fat-tree topology

The fat-tree topology that we consider in this thesis is the one defined in [19]. The number of links in the fat-tree topology can be calculated as the sum of the links above and below the aggregate layer. The number of links above the aggregate layer is equal to the number of nodes in the aggregate layer, k(k/2), multiplied by the number of links per node in the direction of the upper layer, k/2. The same calculation is done for the edge layer, which gives the total number of links in the fat-tree topology:

\[ l_{FT}(n) = k\frac{k}{2}\frac{k}{2} + k\frac{k}{2}\frac{k}{2} = \frac{k^3}{2} = \frac{1}{2}\left(\frac{4n}{5}\right)^{3/2} \]
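As a quick sanity check of this expression (our own numerical example), for k = 4 the topology of Figure 5.1 has n = (k/2)^2 + 2k(k/2) = 4 + 16 = 20 RBridges, and

\[ l_{FT}(20) = \frac{k^3}{2} = \frac{64}{2} = 32 = \frac{1}{2}\left(\frac{4\cdot 20}{5}\right)^{3/2}, \]

which matches a direct count of the 16 core-aggregation links and 16 aggregation-edge links in Figure 5.1.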

The total number of TRILL Hello frames in the network generated per minute is the sum of such frames generated in the core, the aggregate and the edge layer:

\[ n_{FT-H}(n) = \left[\left(\frac{k}{2}\right)^2 k + k\frac{k}{2}k + k\frac{k}{2}\frac{k}{2}\right]\lambda_H \ [\mathrm{min}^{-1}] = \lambda_H\,k^3 \ [\mathrm{min}^{-1}] = \lambda_H \left(\frac{4n}{5}\right)^{3/2} [\mathrm{min}^{-1}] \]

The number of TRILL Hello frames sent per link is:

\[ n_{FT-Hpl}(n) = \frac{n_{FT-H}(n)}{l_{FT}(n)} = 2\lambda_H \ [\mathrm{min}^{-1}] \]

The total number of LSPs generated in the network is:

\[ n_{FT-LSP}(n) = n(n-1)\,\lambda_{LSP} \ [\mathrm{min}^{-1}] \]

The number of LSPs generated per link is:

\[ n_{FT-LSPpl}(n) = \frac{n_{FT-LSP}(n)}{l_{FT}(n)} = \frac{5}{4}\,\lambda_{LSP}\,\frac{n-1}{\sqrt{n/5}} \ [\mathrm{min}^{-1}] \]

In the fat-tree topology, we assume that the virtual machines are hosted on the same servers as the edge RBridges rather than on the servers where the core and aggregate RBridges are. The reason behind this is that, originally, in the fat-tree topology, the hosts are connected to the edge switches, and the switches in the aggregation and the core layer were intended only to provide a hierarchical interconnection of the edge switches. We follow this assumption in our case, where the switches are replaced with the corresponding RBridges in the fat-tree topology. The average distance of the fat-tree topology can be calculated as the sum of distances from one edge RBridge to the other edge RBridges in the same pod and the distances from the same RBridge to all the edge RBridges in all the other pods. Afterward, this sum is divided by the number of edge RBridges minus one (the one from which the distances are calculated):

\[ \mu_{FT}(n) = \frac{\left(\frac{k}{2}-1\right)\cdot 2 + (k-1)\frac{k}{2}\cdot 4}{\frac{k^2}{2}-1} = \frac{4k^2-2k-4}{k^2-2} = \frac{\frac{16n}{5} - 4\sqrt{\frac{n}{5}} - 4}{\frac{4n}{5} - 2} \]

5.2.1.3 Hypercube topology

The number of links in the hypercube topology is equal to the number of nodes n multiplied by the number of links per node, log2(n), and divided by 2, as each link is otherwise counted twice:

\[ l_{HC}(n) = \frac{n}{2}\log_2 n \]

The total number of TRILL Hello frames in the network generated per minute is:

\[ n_{HC-H}(n) = n\,\lambda_H \log_2 n \ [\mathrm{min}^{-1}] \]

The number of TRILL Hello frames sent per minute per link is:

\[ n_{HC-Hpl}(n) = \frac{n_{HC-H}(n)}{l_{HC}(n)} = 2\lambda_H \ [\mathrm{min}^{-1}] \]

The total number of LSPs in the network generated per minute is:

\[ n_{HC-LSP}(n) = n(n-1)\,\lambda_{LSP} \ [\mathrm{min}^{-1}] \]

The number of LSPs generated per link is:

\[ n_{HC-LSPpl}(n) = \frac{n_{HC-LSP}(n)}{l_{HC}(n)} = 2\lambda_{LSP}\,\frac{n-1}{\log_2 n} \ [\mathrm{min}^{-1}] \]

The average distance of the hypercube topology can be calculated in the following way. First, all the nodes of the hypercube are indexed with log2(n) bits. Neighboring nodes differ in only a single bit. Consequently, the distance between two nodes is equal to the number of bits in which their indexes differ. This implies that the number of nodes at distance k from a given node is equal to the binomial coefficient \( \binom{\log_2 n}{k} \). As a result, the average distance from one node to all the other nodes in the network is:

\[ \mu_{HC}(n) = \frac{1}{n-1}\sum_{k=1}^{\log_2 n} k\binom{\log_2 n}{k} \]

This also represents the average distance of the hypercube, as the hypercube is symmetric and each of its nodes has the same average distance.
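Using the identity \( \sum_{k=1}^{d} k\binom{d}{k} = d\,2^{d-1} \) with d = log2(n), this average distance can also be written in closed form (our own simplification of the formula above):

\[ \mu_{HC}(n) = \frac{\log_2 n \cdot 2^{\log_2 n - 1}}{n-1} = \frac{n\,\log_2 n}{2\,(n-1)} \]

For n = 16 this gives 32/15 ≈ 2.13, and for n = 64 it gives 192/63 ≈ 3.05, consistent with the hypercube curve in Figure 5.8.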

5.2.2 Discussion on parameters used

Each node in the existing implementation [1] is configured to generate a TRILL Hello frame every 3 seconds (20 times per minute) on each of its ports. These frames are used for RBridge neighbor discovery, and this parameter has been configured in Quagga, which runs the control plane implementation. Additionally, the LSP generation interval has been set to 30 seconds (2 times per minute). When all these parameters are taken into account, Figures 5.4, 5.5 and 5.6 show the load and the resiliency information of the data center network. We note here that the vertical bars represent the discrete values for the cases when the topology is "complete", in the sense that there are no missing vertices (RBridges) in the topology's structure. For instance, in the trivial case of a hypercube topology where 8 RBridges represent the vertices of a cube, we consider such a topology to be "complete". However, if one or more vertices (RBridges) were missing from this topology, we would not consider it to be "complete". Nevertheless, the plotted lines provide a clear visual representation of how the results of "complete" topologies behave when the appropriate topology dimension increases (along with the number of RBridges). As the Kalray MPPA cards that we use have two 40Gbps interfaces, it is possible to split the traffic from each interface with cables that support 40G QSFP+ to 4X10G SFP+ conversion. In ODP, it is possible to process separately the traffic that goes over each of the 4 cables/ports per network interface. This means that we can have 8 cables (from two 40Gbps interfaces) that connect one server (with the RBridge implementation and its Kalray card) to the rest of the network. Consequently, this is the same as having 8x10Gbps interfaces on one server. For this reason, in Figures 5.4, 5.5 and 5.6, for the full-mesh network, we have shown only the discrete values for up to 9 RBridges, as with Kalray cards we could only create a full-mesh network with one RBridge connected to 8 other RBridges, which gives 9 RBridges in total. Similarly, for the other two topologies, we have considered that each RBridge can have up to 8 interfaces depending on the topology's dimension in question. For instance, for the "complete" hypercube topologies, the total number of RBridges can be 2, 4, 8, 16, 32, 64, 128, etc.

Figure 5.4: Number of TRILL Hello frames per minute in the network (x-axis: number of RBridges, 0-90; y-axis: number of Hellos per minute, 0-12000) for the hypercube, fat-tree and full-mesh topologies.
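To make the comparison concrete, the small stand-alone program below evaluates the Table 5.1 formulas with the parameters just discussed (λH = 20/min, λLSP = 2/min) for a "complete" 64-node hypercube; the program is our own illustration, not part of the thesis tooling.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double n          = 64.0;   /* number of RBridges ("complete" hypercube) */
    double lambda_h   = 20.0;   /* TRILL Hello frames per minute per port    */
    double lambda_lsp = 2.0;    /* LSPs per minute towards each RBridge      */
    double d          = log2(n);/* hypercube dimension: 6                    */

    double links     = n * d / 2.0;                 /* l_HC(n)                      */
    double hellos    = n * d * lambda_h;            /* n_HC-H(n)                    */
    double lsps      = n * (n - 1.0) * lambda_lsp;  /* n_HC-LSP(n)                  */
    double lsps_link = 2.0 * lambda_lsp * (n - 1.0) / d;
    double avg_dist  = n * d / (2.0 * (n - 1.0));   /* closed form of mu_HC(n)      */

    printf("links             : %.0f\n", links);      /* 192   */
    printf("Hellos per minute : %.0f\n", hellos);     /* 7680  */
    printf("LSPs per minute   : %.0f\n", lsps);       /* 8064  */
    printf("LSPs per link/min : %.0f\n", lsps_link);  /* 42    */
    printf("average distance  : %.2f\n", avg_dist);   /* ~3.05 */
    return 0;
}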

5.2.3 Overhead of the control plane traffic

Figure 5.4 shows how the number of TRILL Hello frames increases much faster with the number of RBridges in the case of the full-mesh topology than for the other two topologies. The reason is that the number of TRILL Hello frames sent by one RBridge (and consequently the total number of TRILL Hello frames in the network) depends directly on the number of first neighbors of each RBridge. As in the full-mesh topology each RBridge is connected with all the other RBridges in the network, this causes a much higher load in the network. The fat-tree topology has a slightly lower number of TRILL Hello frames sent in the network than the hypercube topology when the number of RBridges in the network is lower than 80. From Table 5.1 we can see that the number of TRILL Hello frames sent per link is constant for all the topologies, which is logical because each link represents a direct connection between two RBridges, where both RBridges send TRILL Hello frames at the rate λH.

Figure 5.5: The average number of LSPs per minute per link (x-axis: number of RBridges, 0-80; y-axis: LSPs per link per minute, 0-55) for the hypercube, fat-tree and full-mesh topologies.

Figure 5.6: Minimum number of link failures to cut off one RBridge (server) (x-axis: number of RBridges, 0-80; y-axis: number of link failures, 0-6) for the hypercube, fat-tree and full-mesh topologies.

It can be concluded from Table 5.1 that for all three topologies, the total number of LSPs generated in the network is the same, because each RBridge sends an LSP to all of the RBridges in the TRILL campus. However, the number of LSPs sent per link is different, as represented in Figure 5.5. The lowest value is for the case of the full-mesh topology, because when an LSP is sent, it only needs one hop in order to get to the final destination. This is not the case in the other topologies, where LSPs destined to non-adjacent RBridges traverse multiple links. From Figure 5.5 we can also see that for the hypercube topology the number of LSPs per link is considerably lower in comparison with the fat-tree topology when the number of RBridges in the network is lower than 80.

Figure 5.7: Number of links in the topology (x-axis: number of RBridges, 0-80; y-axis: number of links, 0-300) for the hypercube, fat-tree and full-mesh topologies.

Figure 5.8: Average distance of the topology (x-axis: number of RBridges, 0-90; y-axis: average distance, 0-4.5) for the hypercube and fat-tree topologies.

5.2.4 Convergence time

We refer to our previous work [1], where we have shown that, in the worst case scenario, a new RBridge arrival is detected in \( 2\cdot\frac{1}{\lambda_{LSP}} \) (equal to 60s). The convergence time is independent of the number of nodes in the topology and it is the same for all the topologies. For the DRB (Designated RBridge), the convergence time is around 30s, which is equal to the time of the emission of the first LSP by the newly arrived node [1]. The first topology computation occurs after 60s for the newly arrived node, and this is the time when this node converges. For all the other nodes, the convergence times are between 45 and 60s due to the asynchronous processing in the DRB.

5.2.5 Resiliency and scalability

Figure 5.6 compares the topologies in terms of resiliency. As expected, the full-mesh topology needs the highest number of link failures to cut off one RBridge, as each one is connected with all the other RBridges in the network. Additionally, the hypercube topology behaves significantly better than the fat-tree topology, as it needs around double the number of link failures in order to cut off one switch. In terms of scalability, the full-mesh topology is the most limited one, as it uses up the available network interfaces very quickly. As previously mentioned, the maximum number of RBridges in the network is k+1 when using k-port RBridges. The number of links is very similar in the hypercube and fat-tree topologies, as can be seen in Figure 5.7. However, the average distance of these two topologies differs significantly, as can be seen in Figure 5.8. The fat-tree topology has an average distance around 4 for different numbers of RBridges in the TRILL campus, while the hypercube topology has a considerably lower average distance, especially when the number of RBridges is lower than 80. The average distance of the full-mesh topology is equal to 1, as all RBridges in the network are directly interconnected, and we have omitted this trivial case from Figure 5.8. We note here that the average distance for the fat-tree topology has been calculated as the average distance between the edge RBridges, as we assume that only the servers running the edge RBridge implementations will also be hosting the VMs. Otherwise, it would not make sense to use the fat-tree topology: if VMs were hosted behind aggregate or core RBridges, the initial hierarchical structure of this topology, where the traffic is aggregated and load-balanced over the upper layers, would be put into question.

5.2.6 Topology choice

When MPPA smart NICs are used, the full-mesh topology is certainly the best one when we have up to 9 RBridges in the network. However, despite the advantages of the full-mesh topology, which has a high resiliency and the smallest average distance, its very limited scalability makes it unsuitable for our use case, where we have between 30 and 80 RBridges in the network. In general, a full-mesh topology requires a networking device to have a very high number of network interfaces, which grows linearly with the number of networking devices in the network, and this is not the case with the smart NIC cards that we use. On the other hand, a hypercube topology scales very well, as it needs only one additional network interface in order to double the size of the whole network. Moreover, a hypercube topology does not have an excessively high control plane load when compared with the other two topologies, and it has a significantly better resilience than the fat-tree topology. Additionally, the hypercube topology has a shorter average distance between the nodes when compared with the fat-tree topology. All these facts make the hypercube topology the most suitable one for our PoP data center use case with a limited number of servers.

5.3 Conclusion

In this chapter, we provided an analytic model of the impact of different network topologies on the load induced by the control plane in PoP data centers with a limited number of servers. Our solution offloads the data plane packet processing on a smart NIC with parallel processing capability. Packets can be processed at full-duplex line rate (up to 40Gbps) for various packet sizes while reducing latency. The packet processing on the smart NIC allows preserving servers' CPU and memory resources. However, the question that remains is which topology is best suited for the MPPA smart NICs that we use for our fabric network. We have shown that, for our PoP data center use case, the most suitable one is the hypercube topology, because it does not have an excessively high control plane load when compared with the other two topologies, it scales very well, and it has a better resilience than the fat-tree topology while having a significantly shorter average distance between the nodes. Chapter 6

Conclusion

Summary
6.1 Contributions
6.1.1 Fabric network architecture by using hardware acceleration cards
6.1.2 Data plane offloading on a high-speed parallel processing architecture
6.1.3 Analysis of the fabric network's control plane for the PoP data centers use case
6.2 Future works

With the advent of cloud computing, virtualization technologies and an ever-increasing number of applications that rely on them, the amount of data center traffic continues to grow exponentially. Accordingly, network interface throughputs have tended to follow the same trend throughout the years and they currently support speeds of 40Gbps and higher. Nevertheless, such high interface throughputs do not guarantee faster packet processing in the OS, which is limited because of the overheads imposed by the architecture of the network stack. In this thesis, we have investigated and compared many software-based and hardware-based solutions that aim to improve the packet processing performance on server-class network hosts. Many solutions are based on the Click modular router and they try to improve the performance of some of its functions either in software or by using specific hardware. FPGAs and GPUs are mostly used in order to offload part of the packet processing and accelerate it. Virtualization of data center networks has brought new requirements such as any-to-any connectivity with a full non-blocking fabric, seamless VM mobility, fault tolerance, dynamic scalability, simplified management and private network support. The IETF's TRILL protocol is able to fulfill all these needs. It introduces the RBridge devices that are retro-compatible with classical Ethernet bridges by encapsulating the traffic with a TRILL header and an additional Outer Ethernet header. Additionally, it allows converting Ethernet deployments into a TRILL network by simply replacing classical bridges with RBridges.


However, there is a performance bottleneck when packets are processed on Linux servers with the TRILL implementation. In this thesis, we have also compared software-based and hardware-based solutions in terms of different properties, and we discussed their integration possibilities in virtualized environments, their requirements and their constraints. The goal of this dissertation was to conceive a fabric network solution that increases the packet processing performance on server-class network hosts. For this purpose we used the MPPA hardware acceleration cards. The implementation of the network protocol has been done on the smart NIC by using the ODP API, which allows creating portable networking applications. It turns out that such a solution can significantly improve the network performance. Moreover, in order to use the best-suited topology for the interconnection of smart NICs, we have considered various criteria, and it turns out that the hypercube topology is the best for our PoP data center use case.

6.1 Contributions

6.1.1 Fabric network architecture by using hardware acceleration cards

We started designing our new architecture by identifying the limitations of the existing architecture. As previously mentioned, packet processing has limited performance in Linux because of the generality of the network stack architecture and the generality of the device driver design, which affect the network throughput. Our goal was to build a fabric network that is highly performant and that offloads packet processing tasks on hardware acceleration cards instead of performing them in the Linux kernel on servers' CPUs. The idea is to replace the classical three-tier network topology with a full-mesh network of links, which allows having predictable network performance and direct connectivity between the nodes. However, we reconsider other network topologies as part of our third contribution in order to determine which one is best suited for our PoP data center use case. Such a flat architecture is scalable, resilient and generally decreases the number of hops needed. Additionally, the amount of north-south traffic is reduced because the communication between the VMs hosted on different servers is done over direct links. The TRILL protocol is used for communication between the nodes as it provides a better utilization of network links while also providing least-cost pair-wise data forwarding. As each node has a smart NIC, the data plane packet processing is offloaded on this programmable hardware with parallel processing capability. Among the programming models that were available on the MPPA smart NIC from Kalray, we have chosen to use the ODP API because it allows reusing the source code of the packet processing application on any other platform that supports the ODP API.

6.1.2 Data plane offloading on a high-speed parallel processing architecture

The communication bottleneck of the TRILL data plane implementation in the Linux Bridge module has been identified. It occurs on each TRILL frame hop between different RBridges, which requires the Outer Ethernet header that has been added to the original frame to be stripped off. Additionally, the hop count field of the TRILL header gets decreased and the frame gets encapsulated into a new Outer Ethernet header (with the appropriate MAC addresses, after having consulted the forwarding table). Moreover, there is no fast path solution for the TRILL protocol. For this reason, in order to avoid the communication bottleneck that uses up the server's CPU and memory resources, the idea is to offload the data plane packet processing on the MPPA smart NIC. The implementation is done by using the ODP API, which permits building protocols from scratch and manipulating header fields of various networking protocols. The performance evaluation shows that it is possible to perform TRILL frame processing at full-duplex line rate (up to 40Gbps) for various frame sizes while also reducing latency when compared with the existing solution.

6.1.3 Analysis of the fabric network’s control plane for the PoP data centers use case

In order to evaluate the impact of different network topologies on the control plane's load, we have performed a mathematical analysis that shows how various topology metrics change depending on the number of nodes in the network. We have taken into account that Kalray MPPA cards have two 40Gbps interfaces. Each interface can split its traffic by using cables that support 40G QSFP+ to 4X10G SFP+ conversion. The ODP API allows us to process separately the traffic that goes over each of the 4 cables from one network interface. In other words, 8x10Gbps cables can be connected to one MPPA smart NIC. This fact motivated us to reconsider the use of a full-mesh topology which, in this case, would only allow a very limited number of cards in the network. Additionally, we considered the use case where each PoP data center has 30-80 servers, as is the current state in the production network of Gandi SAS. Thus, despite the high resiliency of the full-mesh topology, its biggest problem is its scalability, as it uses up the available network interfaces on the MPPA smart NIC very quickly. A hypercube topology scales very well since it only needs one additional interface in order to double the size of the network. Moreover, it has a better resilience than the fat-tree topology for our use case and it does not have such a high control plane load. It also has a shorter average distance between the nodes, which makes it the most suitable topology for our PoP data center use case with a limited number of servers.

6.2 Future works

In this thesis we have presented our fabric network solution that increases the packet processing performance on server-class network hosts by using hardware acceleration cards. Our short-term future work consists of optimizing the ODP code so that we can process more than 18 million packets per second. This would be a valuable addition, especially for applications that use small packet sizes. Such optimizations would ensure that the full-duplex line rate (of 40Gbps) can be reached for all packet sizes. For instance, there are many low-level functions from the MPPA API that could perform certain memory management operations faster than the ODP functions. Moreover, we plan to obtain multiple acceleration cards and test them at the same time in the public platform of Gandi SAS in order to verify their reliability. Additionally, we will update the code to better suit the production environment needs. One possible mid-term future work would be to use the same TRILL code based on the ODP API, with eventual modifications, on some other supported hardware and thus build a hybrid fabric network that uses different acceleration methods across the network. This can include the ODP-DPDK solution along with ODP implementations on Cavium ThunderX [174], QorIQ [175], KeyStone II System-on-Chip (SoC) [176], Marvell ARMADA SoC [177] or other supported hardware. Platforms that provide higher packet processing performance could be put in the places with higher traffic load. In this way, we could still have some implementations that process packets on the server's CPU by using ODP-DPDK, but these solutions would be used only in places with low traffic loads. However, the problem of efficient packet processing remains a complicated task. Also, the implementation of protocols requires a certain amount of time even when open standards for packet processing, such as the ODP API, are used. A transition to the P4 programming language is also to be considered, especially because of the rising number of hardware accelerators that support it. In general, P4 is also considered to sit at a higher level of abstraction when compared with the ODP API. A possible direction for future research is to build a Virtualized Edge Data Center by using programmable cards. In such circumstances the achievable throughput of programmable hardware accelerators is sufficient to provide services that are hosted close to the end users. The amount of traffic at the edge of the cloud is not excessive when compared to the amount of traffic in large remote data centers. For this reason we consider that the service infrastructure at the edge of the network can use solutions based on programmable hardware, which represent an alternative to the proprietary cutting-edge network equipment that is better suited for the core of large data centers. Another direction for future research is to consider using hardware acceleration cards for the purpose of DPI (Deep Packet Inspection), because of their proven packet processing performance at high speeds. In fact, various ML (Machine Learning) algorithms could be implemented on hardware accelerators in order to build an Anti-DDoS (Distributed Denial of Service) solution. The ODP API on packet processors allows classifying the traffic based on the content of various header fields in the received packets. The ML algorithms would learn to recognize new types of attacks, which would enable us to mitigate them.
A hardware acceleration card would detect the network anomalies and distinguish the attack traffic from regular traffic. In this way, the attack traffic would simply be cut off from the network, while the useful traffic would continue to be transferred across the data center network. Long-term future work consists of conceiving a sort of hypervisor that is implemented directly on the programmable card. This would allow spawning VNFs on the hardware accelerator by using a strictly defined part of its processing and memory resources. For instance, an open standard could have an API implemented by using the HLS tools for FPGA cards. For GPUs it would use OpenCL, and for other programmable manycore processors it would use their own low-level libraries for the API implementation. Such an API would need to be able to expose the amount of available hardware resources across different platforms in a unified manner, so that it is possible to calculate how many VNFs can be executed in parallel on each platform.

Publications

Conferences

[1] D. Cerović, V. Del Piccolo, A. Amamou and K. Haddadou, "Offloading TRILL on a programmable card," 2016 3rd Smart Cloud Networks & Systems (SCNS), Dubai, 2016, pp. 1-5. doi: 10.1109/SCNS.2016.7870563
[2] V. Del Piccolo, D. Cerović and K. Haddadou, "Synchronizing BRBs routing information to improve MLTP resiliency," 2017 16th Annual Mediterranean Ad Hoc Networking Workshop (Med-Hoc-Net), Budva, 2017, pp. 1-5. doi: 10.1109/MedHocNet.2017.8001646
[3] D. Cerović, V. Del Piccolo, A. Amamou, K. Haddadou and G. Pujolle, "Improving TRILL Mesh Network's Throughput Using Smart NICs," 2018 IEEE/ACM International Symposium on Quality of Service (IWQoS), Banff, Alberta, Canada, 2018
[4] D. Cerović, V. Del Piccolo, A. Amamou, K. Haddadou and G. Pujolle, "Data Plane Offloading on a High-Speed Parallel Processing Architecture," 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), San Francisco, CA, 2018, pp. 229-236. doi: 10.1109/CLOUD.2018.00036

Journals

[5] D. Cerović, V. Del Piccolo, A. Amamou, K. Haddadou and G. Pujolle, "Fast Packet Processing: A Survey," in IEEE Communications Surveys & Tutorials. doi: 10.1109/COMST.2018.2851072

[6] D. Cerović, A. Amamou, K. Haddadou and G. Pujolle, "Data center's Parallel Processing Network Architecture Using Smart NICs," under review for IEEE Transactions on Parallel and Distributed Systems

Bibliography

[1] A. Amamou, K. Haddadou, and G. Pujolle, “A TRILL-based multi-tenant data center network,” Computer Networks, vol. 68, pp. 35 – 53, 2014, communications and Networking in the Cloud. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1389128614000851 [2] D. Medhi and K. Ramasamy, Network Routing: Algorithms, Protocols, and Architectures. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2007. [3] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek, “The Click Modular Router,” ACM Trans. Comput. Syst., vol. 18, no. 3, pp. 263–297, Aug. 2000. [Online]. Available: http://doi.acm.org/10.1145/354871.354874 [4] L. Rizzo, “Netmap: A Novel Framework for Fast Packet I/O,” in Proceedings of the 2012 USENIX Conference on Annual Technical Conference, ser. USENIX ATC’12. Berkeley, CA, USA: USENIX Association, 2012, pp. 9–9. [Online]. Available: http: //dl.acm.org/citation.cfm?id=2342821.2342830 [5] T. Marian, K. S. Lee, and H. Weatherspoon, “NetSlices: Scalable Multi-core Packet Processing in User-space,” in Proceedings of the Eighth ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ser. ANCS ’12. New York, NY, USA: ACM, 2012, pp. 27–38. [Online]. Available: http://doi.acm.org/10.1145/2396556.2396563 [6] ntop, “PF_RING,” http://www.ntop.org/products/packet-capture/pf\_ring/, (Accessed on 01/11/2018). [7] Intel, “Impressive Packet Processing Performance Enables Greater Workload Consolidation,” https://www.techonline.com/electrical-engineers/education-training/tech-papers/4415080/ Impressive-Packet-Processing-Performance-Enables-Greater-Workload-Consolidation, (Ac- cessed on 01/11/2018). [8] S. Han, K. Jang, K. Park, and S. Moon, “PacketShader: A GPU-accelerated Software Router,” in Proceedings of the ACM SIGCOMM 2010 Conference, ser. SIGCOMM ’10. New York, NY, USA: ACM, 2010, pp. 195–206. [Online]. Available: http://doi.acm.org/10.1145/1851182. 1851207


[9] Y. Go, M. A. Jamshed, Y. Moon, C. Hwang, and K. Park, “APUNet: Revitalizing GPU as Packet Processing Accelerator,” in 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). Boston, MA: USENIX Association, 2017, pp. 83–96. [Online]. Available: https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/go [10] P. Bellows, J. Flidr, T. Lehman, B. Schott, and K. D. Underwood, “GRIP: a reconfigurable architecture for host-based gigabit-rate packet processing,” in Field-Programmable Custom Computing Machines, 2002. Proceedings. 10th Annual IEEE Symposium on, 2002, pp. 121–130. [11] M. B. Anwer, M. Motiwala, M. b. Tariq, and N. Feamster, “SwitchBlade: A Platform for Rapid Deployment of Network Protocols on Programmable Hardware,” in Proceedings of the ACM SIGCOMM 2010 Conference, ser. SIGCOMM ’10. New York, NY, USA: ACM, 2010, pp. 183–194. [Online]. Available: http://doi.acm.org/10.1145/1851182.1851206 [12] Open vSwitch* Enables SDN and NFV Transformation. Intel. [Online]. Available: https://networkbuilders.intel.com/docs/open-vswitch-enables-sdn-and-nfv-transformation-paper.pdf [13] B. Li, K. Tan, L. Luo, R. Luo, Y. Peng, N. Xu, Y. Xiong, P. Chen, Y. Xiong, and P. Cheng, “ClickNP: Highly Flexible and High-performance Network Processing with Reconfigurable Hardware.” ACM, July 2016. [Online]. Available: https://www.microsoft.com/en-us/research/publication/clicknp-highly-flexible-high-performance-network-processing-reconfigurable-hardware/ [14] “OpenFastPath - Technical Overview,” http://www.openfastpath.org/index.php/service/technicaloverview/, (Accessed on 01/11/2018). [15] “Accelerate networking innovation through programmable data plane - Removing switches from datacenters with TRILL/VNT and smartNIC,” http://pres.gandi.net/ocpeu14/ocpeu14.pdf, (Accessed on 01/11/2018). [16] B. D. de Dinechin, “Kalray MPPA: Massively parallel processor array: Revisiting DSP acceleration with the Kalray MPPA Manycore processor,” in 2015 IEEE Hot Chips 27 Symposium (HCS), Aug 2015, pp. 1–27. [17] E. Warnicke, “OpenDataPlane (ODP) Users-Guide,” https://www.opendataplane.org/application-writers/users-guide/, (Accessed on 01/11/2018). [18] D. Cerović, V. D. Piccolo, A. Amamou, K. Haddadou, and G. Pujolle, “Data plane offloading on a high-speed parallel processing architecture,” in 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), July 2018, pp. 229–236. [19] M. Al-Fares, A. Loukissas, and A. Vahdat, “A Scalable, Commodity Data Center Network Architecture,” SIGCOMM Comput. Commun. Rev., vol. 38, no. 4, pp. 63–74, Aug. 2008. [Online]. Available: http://doi.acm.org/10.1145/1402946.1402967 [20] C. V. Networking, “Cisco Global Cloud Index: Forecast and Methodology 2016-2021 (White Paper),” 2018.

[21] W. Wu, Packet Forwarding Technologies. CRC Press, 2007. [Online]. Available: https://books.google.fr/books?id=ergnzAsxQUoC

[22] “Intel® Ethernet Switch FM10000,” https://goo.gl/eCdzqu, (Accessed on 01/11/2018).

[23] “Broadcom’s Tomahawk R 3 Ethernet switch chip delivers 12.8 Tb/s of speed in a single 16 nm device,” https://www.broadcom.com/blog/ broadcom-s-tomahawk-3-ethernet-switch-chip-delivers-12-8-tbps-of-speed-in-a-single-16-nm-device, (Accessed on 01/11/2018). [24] The Future Is 40 Gigabit Ethernet. Cisco. [Online]. Available: http://www.cisco.com/c/dam/en/ us/products/collateral/switches/catalyst-6500-series-switches/white-paper-c11-737238.pdf [25] I. Marinos, R. N. Watson, and M. Handley, “Network stack specialization for performance,” in Proceedings of the 2014 ACM Conference on SIGCOMM, ser. SIGCOMM ’14. New York, NY, USA: ACM, 2014, pp. 175–186. [Online]. Available: http://doi.acm.org/10.1145/2619239. 2626311 [26] L. Rizzo, “Revisiting Network I/O APIs: The Netmap Framework,” Queue, vol. 10, no. 1, pp. 30:30–30:39, Jan. 2012. [Online]. Available: http://doi.acm.org/10.1145/2090147.2103536 [27] T. Barbette, C. Soldani, and L. Mathy, “Fast userspace packet processing,” in Proceedings of the Eleventh ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ser. ANCS ’15. Washington, DC, USA: IEEE Computer Society, 2015, pp. 5–16. [Online]. Available: http://dl.acm.org/citation.cfm?id=2772722.2772727 [28] Intel, “DPDK: Data plane development kit.” http://dpdk.org/, (Accessed on 01/11/2018). [29] Solarflare, “OpenOnload,” http://www.openonload.org/, (Accessed on 01/11/2018). [30] “PACKET_MMAP. Linux Kernel Contributors. ,” https://goo.gl/2YqJyK, (Accessed on 01/11/2018). [31] M. Dobrescu, N. Egi, K. Argyraki, B.-G. Chun, K. Fall, G. Iannaccone, A. Knies, M. Manesh, and S. Ratnasamy, “Routebricks: Exploiting parallelism to scale software routers,” in Proceedings of the ACM SIGOPS 22Nd Symposium on Operating Systems Principles, ser. SOSP ’09. New York, NY, USA: ACM, 2009, pp. 15–28. [Online]. Available: http://doi.acm.org/10.1145/1629575.1629578 [32] W. Sun and R. Ricci, “Fast and Flexible: Parallel Packet Processing with GPUs and Click,” in Proceedings of the Ninth ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ser. ANCS ’13. Piscataway, NJ, USA: IEEE Press, 2013, pp. 25–36. [Online]. Available: http://dl.acm.org/citation.cfm?id=2537857.2537861 [33] J. Kim, S. Huh, K. Jang, K. Park, and S. B. Moon, “The power of batching in the Click modular router,” in Asia-Pacific Workshop on Systems, APSys ’12, Seoul, Republic of Korea, July 23-24, 2012. ACM, 2012, p. 14. [Online]. Available: http://doi.acm.org/10.1145/2349896.2349910 [34] B. Chen and R. Morris, “Flexible Control of Parallelism in a Multiprocessor PC Router.” in USENIX Annual Technical Conference, General Track, 2001, pp. 333–346. 138 BIBLIOGRAPHY

[35] N. Shah, W. Plishker, K. Ravindran, and K. Keutzer, “NP-Click: a productive software develop- ment approach for network processors,” IEEE Micro, vol. 24, no. 5, pp. 45–54, Sept 2004. [36] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “Openflow: Enabling innovation in campus networks,” SIGCOMM Comput. Commun. Rev., vol. 38, no. 2, pp. 69–74, Mar. 2008. [Online]. Available: http://doi.acm.org/10.1145/1355734.1355746 [37] J. C. Mogul, P. Yalag, J. Tourrilhes, R. Mcgeer, S. Banerjee, T. Connors, and P. Sharma, “API design challenges for open router platforms on proprietary hardware,” in in Proc. HotNets-VII, 2008. [38] O. E. Ferkouss, I. Snaiki, O. Mounaouar, H. Dahmouni, R. B. Ali, Y. Lemieux, and C. Omar, “A 100Gig Network Processor platform for Openflow,” in 2011 7th International Conference on Network and Service Management, Oct 2011, pp. 1–4. [39] Y. K. Chang and F.-C. Kuo, “Towards optimized packet processing for multithreaded network processor,” in 2010 International Conference on High Performance Switching and Routing, June 2010, pp. 127–132. [40] T. Wolf and M. A. Franklin, “Performance models for network processor design,” IEEE Trans- actions on Parallel and Distributed Systems, vol. 17, no. 6, pp. 548–561, June 2006. [41] E. Rubow, R. McGeer, J. C. Mogul, and A. Vahdat, “Chimpp: a click-based programming and simulation environment for reconfigurable networking hardware,” in Proceedings of the 2010 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, ANCS 2010, San Diego, California, USA, October 25-26, 2010, B. Lin, J. C. Mogul, and R. R. Iyer, Eds. ACM, 2010, p. 36. [Online]. Available: http://doi.acm.org/10.1145/1872007.1872052 [42] J. Naous, G. Gibb, S. Bolouki, and N. McKeown, “NetFPGA: Reusable Router Architecture for Experimental Research,” in Proceedings of the ACM Workshop on Programmable Routers for Extensible Services of Tomorrow, ser. PRESTO ’08. New York, NY, USA: ACM, 2008, pp. 1–7. [Online]. Available: http://doi.acm.org/10.1145/1397718.1397720 [43] M. Attig and G. Brebner, “400 Gb/s Programmable Packet Parsing on a Single FPGA,” in Ar- chitectures for Networking and Communications Systems (ANCS), 2011 Seventh ACM/IEEE Symposium on, Oct 2011, pp. 12–23. [44] A. Bitar, M. S. Abdelfattah, and V. Betz, “Bringing programmability to the data plane: Packet processing with a NoC-enhanced FPGA,” in Field Programmable Technology (FPT), 2015 In- ternational Conference on, Dec 2015, pp. 24–31. [45] G. Vasiliadis, L. Koromilas, M. Polychronakis, and S. Ioannidis, “Design and Implementation of a Stateful Network Packet Processing Framework for GPUs,” IEEE/ACM Trans. Netw., vol. 25, no. 1, pp. 610–623, Feb. 2017. [Online]. Available: https://doi.org/10.1109/TNET. 2016.2597163 [46] Berten, “White paper - GPU vs FPGA Performance Comparison,” https://goo.gl/raKHdS, (Ac- cessed on 01/11/2018). BIBLIOGRAPHY 139

[47] L. Rizzo and G. Lettieri, “VALE, a Switched Ethernet for Virtual Machines,” in Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies, ser. CoNEXT ’12. New York, NY, USA: ACM, 2012, pp. 61–72. [Online]. Available: http://doi.acm.org/10.1145/2413176.2413185 [48] L. Rizzo, G. Lettieri, and V. Maffione, “Speeding up packet I/O in virtual machines,” in Archi- tectures for Networking and Communications Systems, Oct 2013, pp. 47–58. [49] S. Garzarella, G. Lettieri, and L. Rizzo, “Virtual device passthrough for high speed VM net- working,” in 2015 ACM/IEEE Symposium on Architectures for Networking and Communica- tions Systems (ANCS), May 2015, pp. 99–110. [50] J. Hwang, K. K. Ramakrishnan, and T. Wood, “NetVM: High Performance and Flexible Net- working Using Virtualization on Commodity Platforms,” IEEE Transactions on Network and Service Management, vol. 12, no. 1, pp. 34–47, March 2015. [51] B. Pfaff, J. Pettit, T. Koponen, K. Amidon, M. Casado, and S. Shenker, “Extending networking into the virtualization layer,” in Proc. of workshop on Hot Topics in Networks (HotNets-VIII), 2009. [52] G. Robin, “Open vSwitch with DPDK Overview. Intel.” https://software.intel.com/en-us/ articles/open-vswitch-with-dpdk-overview, (Accessed on 01/11/2018). [53] W. Xia, P. Zhao, Y. Wen, and H. Xie, “A Survey on Data Center Networking (DCN): Infras- tructure and Operations,” IEEE Communications Surveys Tutorials, vol. 19, no. 1, pp. 640–656, Firstquarter 2017. [54] D. E. E. 3rd, D. G. Dutt, S. Gai, R. Perlman, and A. Ghanwani, “Routing Bridges (RBridges): Base Protocol Specification,” RFC 6325, Jul. 2011. [Online]. Available: https://rfc-editor.org/rfc/rfc6325.txt [55] “ktrill - TRILL implementation in the Linux Kernel,” accessed 7 September 2017. [Online]. Available: https://github.com/Gandi/ktrill/tree/devel [56] D. Cerovic,´ V. D. Piccolo, A. Amamou, K. Haddadou, and G. Pujolle, “Fast packet processing: A survey,” IEEE Communications Surveys Tutorials, 2018. [57] “The Juniper Networks QFabric Architecture: A Revolution in Data Center Network De- sign Flattening the Data Center Architecture,” https://www.slideshare.net/TechnologyBIZ/ juniper-networks-q-fabric-architecture, (Accessed on 01/11/2018). [58] S. Hooda, S. Kapadia, and P. Krishnan, Using TRILL, FabricPath, and VXLAN: Designing Massively Scalable Data Centers (MSDC) with Overlays, ser. Networking Technology. Pearson Education, 2014. [Online]. Available: https://books.google.fr/books?id=k5nFAgAAQBAJ [59] “Cisco Data Center Spine-and-Leaf Architecture: Design Overview White Paper,” https://www.cisco.com/c/en/us/products/collateral/switches/nexus-7000-series-switches/ white-paper-c11-737022.html, (Accessed on 26/10/2018). 140 BIBLIOGRAPHY

[60] “Ethernet Fabrics and Brocade VCS Technology: Networking for the Virtualized Data Center,” https://cstenet.com/brocade/files/Brocade%20Data%20Center.pdf, (Accessed on 01/11/2018). [61] “IEEE 802.1aq - Shortest Path Bridging,” http://www.ieee802.org/1/pages/802.1aq.html, (Ac- cessed on 01/11/2018). [62] D. Allan and N. Bragg, 802.1aq Shortest Path Bridging Design and Evolution: The Ar- chitect’s Perspective. Wiley, 2012. [Online]. Available: https://books.google.fr/books?id= NRuOJ7xdcHMC [63] R. van der Pol, “TRILL and IEEE 802.1aq Overview,” https://kirk.rvdp.org/publications/ TRILL-SPB.pdf, (Accessed on 01/11/2018). [64] C. R. Meiners, A. X. Liu, and E. Torng, Hardware Based Packet Classification for High Speed Internet Routers, 1st ed. Springer Publishing Company, Incorporated, 2010. [65] R. Ramaswamy, N. Weng, and T. Wolf, “Characterizing network processing delay,” in Global Telecommunications Conference, 2004. GLOBECOM ’04. IEEE, vol. 3, Nov 2004, pp. 1629– 1634 Vol.3. [66] H. Asai, “Where Are the Bottlenecks in Software Packet Processing and Forwarding?: Towards High-performance Network Operating Systems,” in Proceedings of The Ninth International Conference on Future Internet Technologies, ser. CFI ’14. New York, NY, USA: ACM, 2014, pp. 5:1–5:6. [Online]. Available: http://doi.acm.org/10.1145/2619287.2619300 [67] L. Rizzo, L. Deri, and A. Cardigliano, “10 Gbit/s Line Rate Packet Processing Using Commodity Hardware: Survey and new Proposals,” 2012. [68] S. Networking, “Eliminating the Receive Processing Bottleneck—Introducing RSS,” Microsoft WinHEC (April 2004), 2004. [69] L. Deri, “nCap: wire-speed packet capture and transmission,” in Workshop on End-to-End Mon- itoring Techniques and Services, 2005., May 2005, pp. 47–55. [70] First the Tick, Now the Tock Next Generation Intel Microarchitecture (Nehalem). Intel. [Online]. Available: https://goo.gl/KK6DH5 [71] N. Egi, M. Dobrescu, J. Du, K. Argyraki, B. Chun, K. Fall, G. Iannaccone, A. Knies, M. Manesh, L. Mathy et al., “Understanding the packet processing capability of multi-core servers,” Intel Technical Report, Tech. Rep., 2009. [72] M. Miao, W. Cheng, F. Ren, and J. Xie, “Smart Batching: A Load-Sensitive Self-Tuning Packet I/O Using Dynamic Batch Sizing,” in 2016 IEEE 18th International Conference on High Perfor- mance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Dec 2016, pp. 726–733. [73] L. Rizzo, “OVS: accelerating the datapath through netmap/VALE,” http://openvswitch.org/ support/ovscon2014/18/1630-ovs-rizzo-talk.pdf, (Accessed on 01/11/2018). BIBLIOGRAPHY 141

[74] L. Deri, N. S. P. A, V. D. B. Km, and L. L. Figuretta, “Improving Passive Packet Capture: Beyond Device Polling,” in In Proceedings of SANE 2004. [75] J. C. Mogul and K. K. Ramakrishnan, “Eliminating receive livelock in an interrupt-driven kernel,” ACM Trans. Comput. Syst., vol. 15, no. 3, pp. 217–252, Aug. 1997. [Online]. Available: http://doi.acm.org/10.1145/263326.263335 [76] J. H. Salim, “When NAPI comes to town,” in Linux 2005 Conf, 2005. [77] 6WIND, “6WINDGate Packet Processing Software For COTS Servers Scalable Data Plane Performance For Multicore Platforms,” http://www.6wind.com/wp-content/uploads/2016/08/ 6WINDGate-Data-Sheet.pdf, (Accessed on 01/11/2018). [78] J. Nickolls, “GPU parallel computing architecture and CUDA programming model,” in 2007 IEEE Hot Chips 19 Symposium (HCS), Aug 2007, pp. 1–12. [79] K. Fatahalian and M. Houston, “A Closer Look at GPUs,” Commun. ACM, vol. 51, no. 10, pp. 50–57, Oct. 2008. [Online]. Available: http://doi.acm.org/10.1145/1400181.1400197 [80] A. Kalia, D. Zhou, M. Kaminsky, and D. G. Andersen, “Raising the Bar for Using GPUs in Software Packet Processing,” in Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI’15. Berkeley, CA, USA: USENIX Association, 2015, pp. 409–423. [Online]. Available: http://dl.acm.org/citation.cfm?id=2789770.2789799 [81] V. Volkov, “Understanding latency hiding on gpus,” Ph.D. dissertation, EECS Department, University of California, Berkeley, Aug 2016. [Online]. Available: http://www2.eecs.berkeley. edu/Pubs/TechRpts/2016/EECS-2016-143.html [82] G. Vasiliadis, L. Koromilas, M. Polychronakis, and S. Ioannidis, “GASPP: A GPU-Accelerated Stateful Packet Processing Framework,” in 2014 USENIX Annual Technical Conference (USENIX ATC 14). Philadelphia, PA: USENIX Association, 2014, pp. 321–332. [Online]. Available: https://www.usenix.org/conference/atc14/technical-sessions/presentation/vasiliadis [83] S. Gallenmüller, P. Emmerich, F. Wohlfart, D. Raumer, and G. Carle, “Comparison of frame- works for high-performance packet IO,” in 2015 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), May 2015, pp. 29–38. [84] T. Hudek and D. MacMichael, “Introduction to the NDIS PacketDirect Provider Interface,” https://docs.microsoft.com/en-us/windows-hardware/drivers/network/ introduction-to-ndis-pdpi, (Accessed on 01/11/2018). [85] S. Gueron, “Speeding up CRC32C computations with Intel CRC32 instruction,” Information Processing Letters, vol. 112, no. 5, pp. 179 – 185, 2012. [Online]. Available: http: //www.sciencedirect.com/science/article/pii/S002001901100319X

[86] Intel, “Intel® SSE4 Programming Reference, p.61,” https://software.intel.com/sites/default/files/m/8/b/8/D9156103.pdf, (Accessed on 01/11/2018). [87] “Click Modular Router,” https://github.com/kohler/click/, (Accessed on 01/11/2018).

[88] “RouteBricks,” http://routebricks.org/code.html, (Accessed on 01/11/2018). [89] “FastClick,” https://gitlab.run.montefiore.ulg.ac.be/sdn-pp/fastclick/, (Accessed on 01/11/2018). [90] “Snap,” https://github.com/wbsun/snap, (Accessed on 01/11/2018). [91] “netmap,” https://github.com/luigirizzo/netmap, (Accessed on 01/11/2018). [92] “NetSlice,” http://fireless.cs.cornell.edu/netslice/, (Accessed on 01/11/2018). [93] “PF_RING,” https://github.com/ntop/PF\_RING, (Accessed on 01/11/2018). [94] “DPDK,” http://dpdk.org/download, (Accessed on 01/11/2018). [95] “PacketShader,” https://github.com/ANLAB-KAIST/Packet-IO-Engine, (Accessed on 01/11/2018). [96] “PacketShader website,” https://shader.kaist.edu/packetshader/, (Accessed on 01/11/2018). [97] M. Bourguiba, K. Haddadou, and G. Pujolle, “A Container-Based Fast Bridge for Virtual Routers on Commodity Hardware,” in 2010 IEEE Global Telecommunications Conference GLOBECOM 2010, Dec 2010, pp. 1–6. [98] M. Bourguiba, K. Haddadou, I. E. Korbi, and G. Pujolle, “A Container-Based I/O for Virtual Routers: Experimental and Analytical Evaluations,” in 2011 IEEE International Conference on Communications (ICC), June 2011, pp. 1–6. [99] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the art of virtualization,” in Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, ser. SOSP ’03. New York, NY, USA: ACM, 2003, pp. 164–177. [Online]. Available: http://doi.acm.org/10.1145/945445.945462 [100] M. Bourguiba, K. Haddadou, I. E. Korbi, and G. Pujolle, “Improving Network I/O Virtualization for Cloud Computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 3, pp. 673–681, March 2014. [101] F. Bellard, “QEMU, a fast and portable dynamic translator.” in USENIX Annual Technical Con- ference, FREENIX Track, 2005, pp. 41–46. [102] A. Kivity, Y. Kamay, D. Laor, U. Lublin, and A. Liguori, “KVM: the Linux Virtual Machine Monitor,” in In Proceedings of the 2007 Ottawa Linux Symposium (OLS’-07, 2007. [103] V. Maffione, L. Rizzo, and G. Lettieri, “Flexible virtual machine networking using netmap passthrough,” in 2016 IEEE International Symposium on Local and Metropolitan Area Net- works (LANMAN), June 2016, pp. 1–6. [104] R. Russell, “Virtio: Towards a De-facto Standard for Virtual I/O Devices,” SIGOPS Oper. Syst. Rev., vol. 42, no. 5, pp. 95–103, Jul. 2008. [Online]. Available: http: //doi.acm.org/10.1145/1400097.1400108 BIBLIOGRAPHY 143

[105] J. Tseng, R. Wang, J. Tsai, Y. Wang, and T.-Y. C. Tai, “Accelerating Open vSwitch with Integrated GPU,” in Proceedings of the Workshop on Kernel-Bypass Networks, ser. KBNets ’17. New York, NY, USA: ACM, 2017, pp. 7–12. [Online]. Available: http://doi.acm.org/10.1145/3098583.3098585 [106] J. Tseng, R. Wang, J. Tsai, S. Edupuganti, A. W. Min, S. Woo, S. Junkins, and T. Y. C. Tai, “Exploiting integrated GPUs for network packet processing workloads,” in 2016 IEEE NetSoft Conference and Workshops (NetSoft), June 2016, pp. 161–165. [107] R. Kawashima, S. Muramatsu, H. Nakayama, T. Hayashi, and H. Matsuo, “A Host-Based Per- formance Comparison of 40G NFV Environments Focusing on Packet Processing Architec- tures and Virtual Switches,” in 2016 Fifth European Workshop on Software-Defined Networks (EWSDN), Oct 2016, pp. 19–24. [108] E. Kohler, “The Click modular router [Ph. D. Thesis]. Department of Electrical Engineering and Computer Science,” 2001. [109] “Altera SDK for OpenCL,” http://www.altera.com/, (Accessed on 01/11/2018). [110] “SDAccel Development Environment,” http://www.xilinx.com/, (Accessed on 01/11/2018). [111] “Vivado Design Suite,” http://www.xilinx.com/, (Accessed on 01/11/2018). [112] “NVIDIA CUDA SDK 3.0.” https://developer.nvidia.com/cuda-toolkit-30-downloads, (Ac- cessed on 01/11/2018). [113] “Intel - Linux ixgbe* Base Driver Overview and Installation.” https://www.intel.com/ content/www/us/en/support/network-and-i-o/ethernet-products/000005688.html, (Accessed on 01/11/2018). [114] A. Putnam, A. M. Caulfield, E. S. Chung, D. Chiou, K. Constantinides, J. Demme, H. Esmaeilzadeh, J. Fowers, G. P. Gopal, J. Gray, M. Haselman, S. Hauck, S. Heil, A. Hormati, J.-Y. Kim, S. Lanka, J. Larus, E. Peterson, S. Pope, A. Smith, J. Thong, P. Y. Xiao, and D. Burger, “A Reconfigurable Fabric for Accelerating Large-scale Datacenter Services,” in Proceeding of the 41st Annual International Symposium on Computer Architecuture, ser. ISCA ’14. Piscataway, NJ, USA: IEEE Press, 2014, pp. 13–24. [Online]. Available: http://dl.acm.org/citation.cfm?id=2665671.2665678 [115] “Syskonnect gigabit ethernet card,” http://www.syskonnect.com/, (Accessed on 01/11/2018). [116] “XMACII chipset,” https://goo.gl/zUrcnr, (Accessed on 01/11/2018). [117] J. Corbet and G. Kroah-Hartman, “Linux kernel development: How fast it is going, who is doing it, what they are doing, and who is sponsoring the work,” The Linux Foundation, pp. 1–19, 2016. [118] “fd.io,” https://fd.io/, (Accessed on 01/11/2018). [119] E. Warnicke, “fd.io Intro,” https://wiki.fd.io/view/File:Fdio\_intro\_2016-03-10.pptx, (Ac- cessed on 01/11/2018). [120] “ (VPP),” https://wiki.fd.io/view/VPP, (Accessed on 01/11/2018). 144 BIBLIOGRAPHY

[121] “VPP/What is VPP?” https://goo.gl/q1Jqhw, (Accessed on 01/11/2018). [122] “VPP on OpenDataPlane enabled SmartNICs,” goo.gl/EV1Qep, (Accessed on 01/11/2018). [123] “OpenDataPlane (ODP) Project,” https://www.opendataplane.org/, (Accessed on 01/11/2018). [124] “PCIe NIC optimised implementation (odp-dpdk),” https://git.linaro.org/lng/odp-dpdk.git, (Ac- cessed on 01/11/2018). [125] “OpenDataPlane DPDK platform implementation,” https://github.com/Linaro/odp-dpdk, (Ac- cessed on 01/11/2018). [126] “OpenFastPath,” http://www.openfastpath.org/, (Accessed on 01/11/2018). [127] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker, “P4: Programming Protocol-independent Packet Processors,” SIGCOMM Comput. Commun. Rev., vol. 44, no. 3, pp. 87–95, Jul. 2014. [Online]. Available: http://doi.acm.org/10.1145/2656877.2656890 [128] H. Wang, R. Soulé, H. T. Dang, K. S. Lee, V. Shrivastav, N. Foster, and H. Weatherspoon, “P4FPGA: A Rapid Prototyping Framework for P4,” in Proceedings of the Symposium on SDN Research, ser. SOSR ’17. New York, NY, USA: ACM, 2017, pp. 122–135. [Online]. Available: http://doi.acm.org/10.1145/3050220.3050234 [129] P. Li and Y. Luo, “P4GPU: Accelerate Packet Processing of a P4 Program with a CPU-GPU Heterogeneous Architecture,” in Proceedings of the 2016 Symposium on Architectures for Networking and Communications Systems, ser. ANCS ’16. New York, NY, USA: ACM, 2016, pp. 125–126. [Online]. Available: http://doi.acm.org/10.1145/2881025.2889480 [130] P. Benácek, V. Pu, and H. Kubátová, “P4-to-VHDL: Automatic Generation of 100 Gbps Packet Parsers,” in 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2016, pp. 148–155. [131] A. Sivaraman, C. Kim, R. Krishnamoorthy, A. Dixit, and M. Budiu, “DC.P4: Programming the Forwarding Plane of a Data-center Switch,” in Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research, ser. SOSR ’15. New York, NY, USA: ACM, 2015, pp. 2:1–2:8. [Online]. Available: http://doi.acm.org/10.1145/2774993.2775007 [132] S. Laki, D. Horpácsi, P. Vörös, R. Kitlei, D. Leskó, and M. Tejfel, “High Speed Packet Forwarding Compiled from Protocol Independent Data Plane Specifications,” in Proceedings of the 2016 ACM SIGCOMM Conference, ser. SIGCOMM ’16. New York, NY, USA: ACM, 2016, pp. 629–630. [Online]. Available: http://doi.acm.org/10.1145/2934872.2959080 [133] A. Bhardwaj, A. Shree, V. B. Reddy, and S. Bansal, “A Preliminary Performance Model for Optimizing Software Packet Processing Pipelines,” in Proceedings of the 8th Asia-Pacific Workshop on Systems, ser. APSys ’17. New York, NY, USA: ACM, 2017, pp. 26:1–26:7. [Online]. Available: http://doi.acm.org/10.1145/3124680.3124747 [134] G. Bianchi, M. Bonola, A. Capone, and C. Cascone, “OpenState: Programming Platform-independent Stateful Openflow Applications Inside the Switch,” SIGCOMM BIBLIOGRAPHY 145

Comput. Commun. Rev., vol. 44, no. 2, pp. 44–51, Apr. 2014. [Online]. Available: http://doi.acm.org/10.1145/2602204.2602211 [135] S. Pontarelli, M. Bonola, G. Bianchi, A. Capone, and C. Cascone, “Stateful OpenFlow: Hard- ware proof of concept,” in 2015 IEEE 16th International Conference on High Performance Switching and Routing (HPSR), July 2015, pp. 1–8. [136] “BESS (Berkeley Extensible Software Switch),” http://span.cs.berkeley.edu/bess.html, (Ac- cessed on 01/11/2018). [137] S. Han, K. Jang, A. Panda, S. Palkar, D. Han, and S. Ratnasamy, “SoftNIC: A Software NIC to Augment Hardware,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2015-155, May 2015. [Online]. Available: http: //www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-155.html [138] “QFabric System Overview,” https://www.juniper.net/documentation/en_US/junos/topics/ concept/qfabric-overview.html, (Accessed on 26/10/2018). [139] “Cisco FabricPath,” https://www.cisco.com/c/en/us/solutions/data-center-virtualization/ fabricpath/index.html, (Accessed on 26/10/2018). [140] “VCS Fabrics overview,” http://wwwaem.brocade.com/content/html/en/administration-guide/ nos-700-adminguide/GUID-62474CEC-B04E-4B52-B527-27B17D8A4044.html, (Accessed on 26/10/2018). [141] “Shortest Path Bridging IEEE 802.1aq,” http://sites.ieee.org/houston/files/2015/12/ Team-Brian-Miller-Yuri-Spillman4.pdf, (Accessed on 01/11/2018). [142] “The Great Debate: TRILL Versus 802.1aq (SBP),” https://www.nanog.org/meetings/nanog50/ presentations/Monday/NANOG50.Talk63.NANOG50_TRILL-SPB-Debate-Roisman.pdf, (Ac- cessed on 01/11/2018). [143] Y. Liao, D. Yin, and L. Gao, “PdP: Parallelizing Data Plane in Virtual Network Substrate,” in Proceedings of the 1st ACM Workshop on Virtualized Infrastructure Systems and Architectures, ser. VISA ’09. New York, NY, USA: ACM, 2009, pp. 9–18. [Online]. Available: http://doi.acm.org/10.1145/1592648.1592651 [144] J. Kim, S. Huh, K. Jang, K. Park, and S. Moon, “The Power of Batching in the Click Modular Router,” in Proceedings of the Asia-Pacific Workshop on Systems, ser. APSYS ’12. New York, NY, USA: ACM, 2012, pp. 14:1–14:6. [Online]. Available: http://doi.acm.org/10.1145/2349896.2349910 [145] L. Rizzo, M. Carbone, and G. Catalli, “Transparent acceleration of software packet forwarding using netmap,” in 2012 Proceedings IEEE INFOCOM, March 2012, pp. 2471–2479. [146] D. Blythe, “Rise of the Graphics Processor,” Proceedings of the IEEE, vol. 96, no. 5, pp. 761– 778, May 2008. [147] J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn, and T. J. Purcell, “A survey of general-purpose computation on graphics hardware,” 146 BIBLIOGRAPHY

Computer Graphics Forum, vol. 26, no. 1, pp. 80–113, 2007. [Online]. Available: http://dx.doi.org/10.1111/j.1467-8659.2007.01012.x [148] E. Papadogiannaki, L. Koromilas, G. Vasiliadis, and S. Ioannidis, “Efficient Software Packet Processing on Heterogeneous and Asymmetric Hardware Architectures,” IEEE/ACM Transac- tions on Networking, vol. 25, no. 3, pp. 1593–1606, June 2017. [149] Z. Zheng, J. Bi, H. Yu, C. Sun, and J. Wu, “BLOP: Batch-Level Order Preserving for GPU-Accelerated Packet Processing,” in Proceedings of the SIGCOMM Posters and Demos, ser. SIGCOMM Posters and Demos ’17. New York, NY, USA: ACM, 2017, pp. 136–137. [Online]. Available: http://doi.acm.org/10.1145/3123878.3132013 [150] F. Fusco, M. Vlachos, X. Dimitropoulos, and L. Deri, “Indexing million of packets per second using gpus,” in Proceedings of the 2013 Conference on Internet Measurement Conference, ser. IMC ’13. New York, NY, USA: ACM, 2013, pp. 327–332. [Online]. Available: http://doi.acm.org/10.1145/2504730.2504756 [151] J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU Comput- ing,” Proceedings of the IEEE, vol. 96, no. 5, pp. 879–899, May 2008. [152] M. Garland, “Parallel computing with CUDA,” in Parallel Distributed Processing (IPDPS), 2010 IEEE International Symposium on, April 2010, pp. 1–1. [153] Y. Zhu, Y. Deng, and Y. Chen, “Hermes: An integrated CPU/GPU microarchitecture for IP routing,” in 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC), June 2011, pp. 1044–1049. [154] S. Mu, X. Zhang, N. Zhang, J. Lu, Y. S. Deng, and S. Zhang, “IP routing processing with graphic processors,” in 2010 Design, Automation Test in Europe Conference Exhibition (DATE 2010), March 2010, pp. 93–98. [155] B. Betkaoui, D. B. Thomas, and W. Luk, “Comparing performance and energy efficiency of FPGAs and GPUs for high productivity computing,” in Field-Programmable Technology (FPT), 2010 International Conference on, Dec 2010, pp. 94–101. [156] S. Mittal and J. S. Vetter, “A Survey of Methods for Analyzing and Improving GPU Energy Efficiency,” ACM Comput. Surv., vol. 47, no. 2, pp. 19:1–19:23, Aug. 2014. [Online]. Available: http://doi.acm.org/10.1145/2636342 [157] D. Bacon, R. Rabbah, and S. Shukla, “FPGA Programming for the Masses,” Queue, vol. 11, no. 2, pp. 40:40–40:52, Feb. 2013. [Online]. Available: http://doi.acm.org/10.1145/2436696. 2443836 [158] T. Rinta-Aho, M. Karlstedt, and M. P. Desai, “The Click2NetFPGA Toolchain,” in Proceedings of the 2012 USENIX Conference on Annual Technical Conference, ser. USENIX ATC’12. Berkeley, CA, USA: USENIX Association, 2012, pp. 7–7. [Online]. Available: http://dl.acm.org/citation.cfm?id=2342821.2342828 BIBLIOGRAPHY 147

[159] G. Gibb, J. W. Lockwood, J. Naous, P. Hartke, and N. McKeown, “NetFPGA - An Open Plat- form for Teaching How to Build Gigabit-Rate Network Switches and Routers,” IEEE Transac- tions on Education, vol. 51, no. 3, pp. 364–369, Aug 2008. [160] J. W. Lockwood, N. McKeown, G. Watson, G. Gibb, P. Hartke, J. Naous, R. Raghuraman, and J. Luo, “NetFPGA–An Open Platform for Gigabit-Rate Network Switching and Routing,” in Proceedings of the 2007 IEEE International Conference on Microelectronic Systems Education, ser. MSE ’07. Washington, DC, USA: IEEE Computer Society, 2007, pp. 160–161. [Online]. Available: http://dx.doi.org/10.1109/MSE.2007.69 [161] N. Zilberman, Y. Audzevich, G. A. Covington, and A. W. Moore, “NetFPGA SUME: Toward Research Commodity 100Gb/s,” 2014. [162] N. Zilberman, Y. Audzevich, G. Kalogeridou, N. M. Bojan, J. Zhang, and A. W. Moore, “NetF- PGA - rapid prototyping of high bandwidth devices in open source,” in 2015 25th International Conference on Field Programmable Logic and Applications (FPL), Sept 2015, pp. 1–1. [163] M. Labrecque, J. G. Steffan, G. Salmon, M. Ghobadi, and Y. Ganjali, “NetThreads: Program- ming NetFPGA with threaded software,” 2009. [164] P. Varga, L. Kovács, T. Tóthfalusi, and P. Orosz, “C-GEP: 100 Gbit/s capable, FPGA-based, reconfigurable networking equipment,” in 2015 IEEE 16th International Conference on High Performance Switching and Routing (HPSR), July 2015, pp. 1–6. [165] B. Wheeler, “A new era of network processing,” The Linley Group, Tech. Rep, 2013. [166] J. Shim, J. Kim, K. Lee, and S. Moon, “Knapp: A packet processing framework for manycore accelerators,” in 2017 IEEE 3rd International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB), Feb 2017, pp. 57–64.

[167] G. Chrysos, “Intel® Xeon Phi coprocessor (codename Knights Corner),” in 2012 IEEE Hot Chips 24 Symposium (HCS), Aug 2012, pp. 1–31. [168] “OpenDataPlane port for the MPPA platform,” accessed 16 September 2017. [Online]. Available: https://github.com/kalray/odp-mppa [169] A. Amamou, M. Bourguiba, K. Haddadou, and G. Pujolle, “DBA-VM: Dynamic bandwidth allocator for virtual machines,” in 2012 IEEE Symposium on Computers and Communications (ISCC), July 2012, pp. 000713–000718.

[170] Intel® Ethernet Controller 10 Gigabit and 40 Gigabit XL710 Brief. [Online]. Available: https://www.intel.com/content/www/us/en/embedded/products/networking/xl710-10-40-gbe-controller-brief.html?asset=8353 [171] C. E. Leiserson, “Fat-trees: Universal networks for hardware-efficient supercomputing,” IEEE Transactions on Computers, vol. C-34, no. 10, pp. 892–901, Oct 1985. [172] Y. Liu, J. Han, and H. Du, “A hypercube-based scalable interconnection network for massively parallel computing,” Journal of Computers, vol. 3, no. 10, pp. 58–65, 2008.

[173] A. Louri, B. Weech, and C. Neocleous, “A spanning multichannel linked hypercube: a gradually scalable optical interconnection network for massively parallel computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 5, pp. 497–512, May 1998. [174] “Cavium ThunderX implimentation of ODP,” https://github.com/Linaro/odp-thunderx, (Accessed on 01/11/2018). [175] “QorIQ - sdk/odp.git - Open Data Plane Interface Implementation,” http://git.freescale.com/git/cgit.cgi/ppc/sdk/odp.git/?h=fsl_odp_v16.07_qoriq, (Accessed on 01/11/2018). [176] “TIs Keystone2 with multi core System-on-Chip Cortex A15 and TMS320C66x DSP hardware acceleration for ODP,” https://git.linaro.org/lng/odp-keystone2.git, (Accessed on 01/11/2018). [177] “OpenDataPlane Marvell implementation - odp-marvell,” https://github.com/MarvellEmbeddedProcessors/odp-marvell, (Accessed on 01/11/2018). [178] D. E. E. 3rd, R. Perlman, A. Ghanwani, H. Yang, and V. Manral, “Transparent Interconnection of Lots of Links (TRILL): Adjacency,” RFC 7177, May 2014. [Online]. Available: https://rfc-editor.org/rfc/rfc7177.txt [179] D. E. E. 3rd, M. Zhang, R. Perlman, A. Banerjee, A. Ghanwani, and S. Gupta, “Transparent Interconnection of Lots of Links (TRILL): Clarifications, Corrections, and Updates,” RFC 7780, Feb. 2016. [Online]. Available: https://rfc-editor.org/rfc/rfc7780.txt [180] ISO/IEC 10589:2002(E), “Information technology — Telecommunications and information exchange between systems — Intermediate System to Intermediate System intra-domain routeing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode network service (ISO 8473),” International Organization for Standardization, Geneva, CH, Standard, Nov. 2002.

Appendices


Appendix A

TRILL protocol

A.1 Introduction

The TRILL protocol [54, 178, 179] is an IETF protocol which aims to overcome the limitations of conventional Ethernet networks while introducing the advantages of network-layer protocols. It provides least-cost pair-wise data forwarding without configuration, safe forwarding during periods of temporary loops, and multipathing of both unicast and multicast traffic [54]. Additionally, TRILL allows the creation of a cloud of links that acts as a single IP subnet from the point of view of the IP nodes. Nodes within this cloud can migrate without changing their IP addresses. Despite the simplicity of the spanning tree algorithm used in conventional Ethernet networks, it has certain limitations. The bandwidth across the subnet is limited because traffic is concentrated on the spanning tree path, even when more direct paths are available. For this reason, TRILL does not use a spanning tree but instead forwards along the shortest path between any two points, with multipath capability when several paths are equally short. TRILL uses an extension of IS-IS [180] as its routing protocol because IS-IS has several advantages [54]. First, it runs directly on layer 2, which allows it to operate without configuration (there is no need to assign IP addresses). Another advantage is that IS-IS can easily be extended through the definition of new TLV (type-length-value) data elements and sub-elements that carry TRILL information [54]. Converting a traditional Ethernet deployment into a TRILL network is simple: a set of conventional bridges is replaced with RBridges, which makes the cloud more stable and uses the available bandwidth more effectively.

RBridges are devices which implement the TRILL protocol, and their role is similar to that of conventional customer bridges. RBridges are addressed with 2-byte nicknames, which act as abbreviations of the IS-IS IDs of RBridges to achieve more compact encoding [54]. Each RBridge has a unique nickname within the TRILL network (called a campus). Each RBridge in the TRILL campus can choose its own nickname, or the nicknames may be configured by an administrator. If the nickname of an RBridge is not unique, the RBridge with the numerically higher priority to hold that nickname keeps it; if the priorities are equal, the RBridge with the numerically higher IS-IS System ID keeps the nickname, while the other RBridge must select a new one [54]. The unicast forwarding tables of RBridges are significantly smaller than the tables in conventional customer bridges, since they contain entries describing the RBridges in the TRILL campus rather than entries describing all end nodes.

Whenever an Ethernet frame encounters the first RBridge (called the ingress RBridge) on its way to a destination, it gets encapsulated. The encapsulation consists of adding a TRILL header and an Outer Ethernet header around the original frame with its (Inner) Ethernet header. In the TRILL header, the ingress RBridge specifies the RBridge closest to the final destination (called the egress RBridge) to which the frame should be forwarded. Each RBridge on the path between the ingress and the egress RBridge analyzes the received frame and rewrites the Outer Ethernet header according to the next hop.

A.2 TRILL header

The structure of the TRILL header is shown at the bottom of Figure A.1. It contains the following fields:

• Ethertype: TRILL (with value 0x22F3)
• V: version of the TRILL protocol
• R: reserved bits for future extensions
• M: indicates whether the frame is to be delivered to multiple destination end stations via the distribution tree specified by the egress RBridge nickname
• Op-length: length of the TRILL header options in 4-byte units
• Hop count: 6-bit unsigned integer that is decremented at every hop if greater than 0; otherwise the frame is dropped
• Egress RBridge Nickname: 16-bit identifier that specifies the egress RBridge when M==0; otherwise (when M==1) it specifies the distribution tree selected to forward the frame
• Ingress RBridge Nickname: 16-bit identifier that specifies the nickname of the ingress RBridge for TRILL data frames
• Options: present only if Op-length is different from zero. In that case, the first octet is used for flag bits, among which the first two are:
  – CHbH (Critical Hop-by-Hop): when this bit is equal to 1, one or more critical hop-by-hop options are present
  – CItE (Critical Ingress-to-Egress): when this bit is equal to 1, one or more critical ingress-to-egress options are present

A parsing sketch of this layout is given after Figure A.1.

[Figure A.1 (not reproduced here) shows: (top) the topology ESx — RB1 (ingress) — RB2 (transit) — RB3 (transit) — RB4 (egress) — ESy; (middle) the frame on the RB2→RB3 link, consisting of the Outer Ethernet header (DA = RB3, SA = RB2, TRILL Ethertype), the TRILL header (Egress = RB4, Ingress = RB1, Hop count), the Inner Ethernet header (DA = ESy, SA = ESx), the payload, and the FCS; (bottom) the TRILL header bit layout: Ethertype = TRILL, V, R, M, Op-length, Hop count in the first 32 bits, followed by the Egress RBridge Nickname, the Ingress RBridge Nickname, and Options.]

Figure A.1: TRILL header
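As a concrete illustration of this layout, the following C sketch (not taken from the thesis implementation; the struct and accessor names are hypothetical) declares the 6-byte TRILL header that follows the TRILL Ethertype and extracts the bit fields described above, assuming the header is in network byte order.

    #include <stdint.h>
    #include <arpa/inet.h>   /* ntohs() */

    #define TRILL_ETHERTYPE 0x22F3

    /* TRILL header as it appears on the wire, immediately after the outer
     * Ethernet header and its TRILL Ethertype (fields as in Figure A.1). */
    struct trill_hdr {
        uint16_t flags;             /* V(2) | R(2) | M(1) | Op-length(5) | Hop count(6) */
        uint16_t egress_nickname;   /* egress RBridge (M==0) or distribution tree (M==1) */
        uint16_t ingress_nickname;  /* nickname of the ingress RBridge */
        /* options follow if Op-length != 0, in 4-byte units */
    };

    static inline unsigned trill_version(const struct trill_hdr *h)   { return (ntohs(h->flags) >> 14) & 0x3;  }
    static inline unsigned trill_multidest(const struct trill_hdr *h) { return (ntohs(h->flags) >> 11) & 0x1;  }
    static inline unsigned trill_op_len(const struct trill_hdr *h)    { return (ntohs(h->flags) >>  6) & 0x1F; }
    static inline unsigned trill_hop_count(const struct trill_hdr *h) { return  ntohs(h->flags)        & 0x3F; }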

A.3 TRILL data encapsulation over Ethernet

Figure A.1 shows the communication between end stations ESx and ESy. RBridges (denoted RBs in Figure A.1) may be interconnected with IEEE 802.3 technology or with some other layer 2 technology such as PPP (Point-to-Point Protocol). The native frame is generated by ESx, encapsulated by the ingress RBridge RB1, and ultimately decapsulated by the egress RBridge RB4. The first RBridge that a frame encounters in the campus encapsulates it with a TRILL header, as previously described. The RBridge then performs a second, link-specific encapsulation of the frame. For instance, it could be an encapsulation over PPP, but in this example the encapsulation adds an outer Ethernet header since IEEE 802.3 technology is used. From Figure A.1 we can see that three headers are present when the frame is transferred from RB2 to RB3. The inner Ethernet header's MAC addresses are set by the original end station and specify the (final) destination ESy and the source of the original (inner) frame ESx. The TRILL header specifies the egress RBridge nickname RB4 and the ingress RBridge nickname RB1. The outer Ethernet header's MAC addresses are set by the transmitting RBridge RB2 and specify the next-hop RBridge RB3 and the transmitting RBridge RB2. This header helps maintain backward compatibility with conventional customer bridges, which may be located between two RBridges.
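To make this layering explicit, a minimal sketch of the encapsulated frame as seen on the RB2→RB3 link could look as follows (hypothetical type names; the Ethernet header is shown without a VLAN tag, and trill_hdr is the structure sketched in Section A.2).

    #include <stdint.h>

    struct eth_hdr {                 /* 14-byte Ethernet header (no VLAN tag) */
        uint8_t  dst[6];
        uint8_t  src[6];
        uint16_t ethertype;          /* 0x22F3 (TRILL) in the outer header */
    };

    /* TRILL data frame on the RB2 -> RB3 link of Figure A.1:
     *   outer : dst = MAC(RB3), src = MAC(RB2), ethertype = TRILL
     *   trill : egress = nickname(RB4), ingress = nickname(RB1), hop count
     *   inner : dst = MAC(ESy), src = MAC(ESx), original Ethertype
     * The payload and the FCS follow the inner header. */
    struct trill_frame {
        struct eth_hdr   outer;
        struct trill_hdr trill;
        struct eth_hdr   inner;
    };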

At each RBridge hop, the TRILL header must be processed. The outer Ethernet header is stripped off and replaced by another Ethernet header (when Ethernet is used on the next-hop link). The egress RBridge nickname is used to find the next-hop destination MAC address in the forwarding table, and the hop count in the TRILL header is decremented. This TRILL frame processing has to be done for each frame at each hop, which becomes a communication bottleneck when the processing is done on the server. In order to remove this bottleneck, we decided to offload the data plane processing onto the programmable card.
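The per-hop operations can be summarized by the following C sketch (an illustration only, not the offloaded implementation described in this thesis). It reuses the hypothetical trill_hdr and eth_hdr structures sketched earlier and assumes a forwarding-table lookup function, fib_lookup(), that maps an egress RBridge nickname to the next hop.

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    struct next_hop {                /* result of a forwarding-table lookup (hypothetical) */
        uint8_t dst_mac[6];          /* MAC address of the next-hop RBridge */
        uint8_t src_mac[6];          /* MAC address of our outgoing port */
        int     out_port;
    };

    /* Hypothetical lookup: egress RBridge nickname -> next hop; returns < 0 if unknown. */
    int fib_lookup(uint16_t egress_nickname, struct next_hop *nh);

    /* Process one TRILL data frame at a transit RBridge.
     * 'frame' points to the outer Ethernet header; returns the output port,
     * or -1 if the frame must be dropped. */
    int trill_transit(uint8_t *frame)
    {
        struct eth_hdr   *outer = (struct eth_hdr *)frame;
        struct trill_hdr *trill = (struct trill_hdr *)(frame + sizeof(*outer));
        struct next_hop   nh;

        if (ntohs(outer->ethertype) != TRILL_ETHERTYPE)
            return -1;                           /* not a TRILL frame */

        unsigned hop = trill_hop_count(trill);
        if (hop == 0)
            return -1;                           /* hop count exhausted: drop */

        if (fib_lookup(ntohs(trill->egress_nickname), &nh) < 0)
            return -1;                           /* unknown egress RBridge */

        /* Decrement the hop count (low 6 bits of the flags field). */
        trill->flags = htons((uint16_t)((ntohs(trill->flags) & ~0x3Fu) | (hop - 1)));

        /* Rewrite the outer Ethernet header for the next hop; the TRILL
         * and inner Ethernet headers are left untouched. */
        memcpy(outer->dst, nh.dst_mac, 6);
        memcpy(outer->src, nh.src_mac, 6);

        return nh.out_port;
    }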

A.4 TRILL LSPs and TRILL-Hellos

A.4.1 TRILL LSPs

TRILL IS-IS LSPs are exchanged between the RBridges in a TRILL campus and must contain the following information:

1. the IS-IS IDs of the neighbors (pseudonodes and RBridges) of the RBridge RBn and the costs of the links to each of these neighbors [54]
2. the 2-byte nickname value, including the 8-bit priority to have that nickname and the 16-bit priority of that nickname to become a distribution tree root [54]
3. the maximum TRILL header version supported
4. additional information related to distribution tree determination and announcement
5. the list of VLAN IDs of the VLANs directly connected to RBn, for links on which RBn is the appointed forwarder for that VLAN [54]

A.4.2 TRILL-Hello protocol

TRILL-Hellos are usually sent with the same timing as layer 3 IS-IS Hellos, since the former protocol is based on the latter. In general, the TRILL-Hello protocol is used to [178]:

• determine the RBridge neighbors that have acceptable connectivity, in order to report them as part of the topology
• elect a unique Designated RBridge (DRB) on broadcast (LAN) links
• determine the MTU that allows safe communication with each RBridge neighbor

The TRILL-Hello protocol differs slightly from the layer 3 IS-IS LAN Hello protocol, with a few main modifications. The first one is the limited size of Hello messages, which are not padded as in the layer 3 IS-IS LAN Hello protocol. Additionally, DRB election is based solely on the priority along with the MAC address, and not on two-way connectivity as in layer 3 IS-IS. This means that if RB2 receives a TRILL-Hello from RB1 with a higher priority, RB2 defers to RB1 as DRB regardless of whether RB1 lists RB2 as a neighbor in its TRILL-Hello packet [54]. The TRILL-Hello protocol has a separate mechanism for probing, which uses packets of different sizes to determine what packet sizes can be forwarded on the link. TRILL-Hello messages must not exceed 1470 octets in length.