IT 20 044
Degree project 30 credits (Examensarbete 30 hp)
July 2020

Network Service Mesh Solving Cloud Native IMS Networking Needs

Lionel Jouin

Department of Information Technology

Abstract

With the growing demand for mobile networks and especially the IP Multimedia Subsystem (IMS), new cloud native orchestration tools providing more flexibility and efficiency are starting to be used within telecommunication companies in order to improve the robustness and the reliability of these systems. However, Kubernetes, the most used among cloud native orchestration tools, does not completely fulfill all the needs and use cases in terms of networking that the telecommunication industry meets. Network Service Mesh (NSM), a new Cloud Native Computing Foundation (CNCF) project aiming to address complex networking use cases in Kubernetes, might solve the different issues the IP Multimedia Subsystem faces. Detailed designs and implementations using Network Service Mesh coupled with diverse networking technologies are shown in this thesis with the objective of solving the networking requirements of the IP Multimedia Subsystem (e.g. the NAT issue and the secondary network). In addition, an analysis and an evaluation of Network Service Mesh are given, together with a presentation of the ability of this new project to bring solutions to the IP Multimedia Subsystem based on a cloud native technology.

Supervisor: Saminathan Vijayabaskar
Subject reviewer: Thiemo Voigt
Examiner: Mats Daniels
IT 20 044
Printed by: Reprocentralen ITC

Acknowledgements

This work has been conducted in collaboration with Ericsson. I want to thank the company for having provided all the information and resources I required to complete this project. Special thanks to Jerker Zetterlund for his constant support and for giving me this wonderful opportunity to work at Ericsson. To my supervisor, Saminathan Vijayabaskar, I would like to express my gratitude for sharing his very helpful experience and for his kindness. I would also like to thank those who, in any way, have been involved in my thesis work. Further, I would like to thank my reviewer Thiemo Voigt at Uppsala University for his valuable advice and comments on structuring and writing this document.

July 3rd, 2020
Lionel Jouin

Contents

1 Introduction
  1.1 Motivation and Objectives
  1.2 Delimitations
  1.3 Structure of the Report

2 Background
  2.1 IP Multimedia subsystem
    2.1.1 Network address translation issues
    2.1.2 Traffic separation / Secondary network
    2.1.3 Environment
  2.2 Linux
    2.2.1 Namespaces
    2.2.2 Container
  2.3 Kubernetes
    2.3.1 Service
    2.3.2 Container Network Interface
  2.4 Network Function
    2.4.1 Load Balancing
    2.4.2 Firewall
    2.4.3 BGP / ECMP
  2.5 Service Mesh
  2.6 Network Service Mesh
    2.6.1 Control plane
    2.6.2 Data plane
    2.6.3 Service Function Chaining
    2.6.4 Community and future development
  2.7 Related work
    2.7.1 NAT
    2.7.2 Alternatives
    2.7.3 Performance

3 Design
  3.1 Ingress traffic alternatives
    3.1.1 Host shared
    3.1.2 VPN
    3.1.3 MACVLAN / IPVLAN
    3.1.4 Overlay Network / VxLAN
    3.1.5 Load Balancing and VIP advertisement
  3.2 Egress traffic alternatives
    3.2.1 Tunneling
    3.2.2 NSE delegation
    3.2.3 Connection Tracker / Port Allocation
    3.2.4 Multiple NSEs
    3.2.5 Dynamic allocation by the application
  3.3 Data plane / Control plane separation

4 Implementation
  4.1 Environment
    4.1.1 OpenStack
    4.1.2 Kubernetes
    4.1.3 Development
  4.2 Network Service Endpoint
    4.2.1 Interface
    4.2.2 VPN
    4.2.3 Load Balancing
    4.2.4 BGP
    4.2.5 Port allocation
  4.3 Network Function Chaining
  4.4 Network Service Client

5 Evaluation
  5.1 Benchmarking methodology
  5.2 Data plane performance
    5.2.1 External Connectivity
    5.2.2 Network Service Mesh Connectivity
  5.3 Security
  5.4 Scalability

6 Conclusions and Future work

List of Figures

2.1 Reference Architecture of the IP Multimedia Core Network Subsystem
2.2 Overview of networking in Kubernetes with NSM
2.3 Network Service Chaining example

3.1 NSM - Ingress - Host Shared
3.2 NSM - Ingress - VPN
3.3 NSM - Ingress - IPVLAN/MACVLAN
3.4 NSM - Ingress - VxLan
3.5 NSM - Ingress - BGP and IPVS
3.6 NSM - Egress - NAT
3.7 NSM - Egress - No NAT
3.8 NSM - Data plane / Control plane separation using namespaces

4.1 LVS - Packet Flow

5.1 External Connectivity performances
5.2 Network Service Mesh Connectivity performances

Listings

4.1 IPVS command to create a service
4.2 IPVS command to add a real server to a service
4.3 IPVS command to remove a real server from a service
4.4 IPTables command to mark TCP packets according to a destination port range
4.5 Specification of an NSM Network service
4.6 Specification of a Network Service Endpoint deployment
4.7 Specification of a Network Service Client deployment
4.8 IPTables to source NAT outgoing traffic

List of symbols and abbreviations

3GPP 3rd Generation Partnership Project
AS Autonomous System
ASIC Application-specific integrated circuit
BGP Border Gateway Protocol
CIDR Classless Inter-Domain Routing
CNCF Cloud Native Computing Foundation
CNF Cloud Network Function
CNI Container Network Interface
DC Data Center
DHCP Dynamic Host Configuration Protocol
DNS Domain Name System
DPDK Data Plane Development Kit
ECMP Equal-Cost Multi-path Routing
EVPN Ethernet VPN
fps Frames per Second
FQDN Fully Qualified Domain Name
FTP File Transfer Protocol
fwmark Firewall Mark
GRE Generic Routing Encapsulation
gRPC gRPC Remote Procedure Calls
HTTP Hypertext Transfer Protocol
IANA Internet Assigned Numbers Authority
IETF Internet Engineering Task Force
IMS IP Multimedia Subsystem
IoT Internet of Things
IP Internet Protocol
IPAM IP Address Management
IPsec Internet Protocol Security
IPVS IP Virtual Server
ipvsadm IPVS Administration
ISP Internet Service Provider
LAN Local Area Network
LB Load Balancer
LVS Linux Virtual Server
MAC Media Access Control
memif Shared Memory Packet Interface
MP-BGP Multiprotocol BGP
MTAS Multimedia Telephony Application Server
MTU Maximum Transmission Unit
NAPT Network Address Port Translation
NAT Network Address Translation
NFV Network Function Virtualization
NSE Network Service Endpoint
NSM Network Service Mesh
OCI Open Containers Initiative
OS Operating System
OSI Open Systems Interconnection
OVS Open vSwitch
PCIe Peripheral Component Interconnect Express
pps Packets per Second
QoS Quality of service
RDMA Remote Direct Memory Access
RFC Request for Comments
SCTP Stream Control Transmission Protocol
SDK Software Development Kit
SDN Software-Defined Networking
SDP Session Description Protocol
SFC Service Function Chaining
SIP Session Initiation Protocol
SR-IOV Single-root input/output virtualization
srv6 Segment Routing over IPv6
TCP Transmission Control Protocol
Telco Telephone Company
TOE TCP Offload Engine
UDP User Datagram Protocol
URLLC Ultra-Reliable and Low Latency Communications
us Microsecond
veth Virtual Ethernet Device
VIP Virtual IP
VLAN Virtual Local Area Network
VM Virtual Machine
VNF Virtual Network Function
VNI VLAN/VxLAN Network Identifier
VPN Virtual Private Network
VPP Vector Packet Processing
VxLAN Virtual eXtensible Local Area Network
YAML YAML Ain't Markup Language

Chapter 1

Introduction

In recent years, data production and consumption have never stopped increasing. One of the main motivations for the development of 5G is to manage the amount of data generated by existing technologies, but also by new technologies such as IoT and self-driving cars. According to the Ericsson Mobility Report [39], data traffic increased by 68 percent between Q3 2018 and Q3 2019, and forecasts announce that this traffic will continue to grow by 27 percent per year between 2019 and 2025. As demand and usage continue to grow, it is a real challenge for the telecommunications industry, which must find new innovative solutions to meet this demand, but also to remain competitive while keeping prices affordable. Several moves could be observed between hardware-based solutions, virtualization and cloud native solutions. A shift already happened a few years back from expensive, purpose-built hardware to general-purpose servers and software using virtualization [41]. Moving network functions, such as routing, security and load balancing, to virtual network functions (VNFs) and software-defined networking (SDN) allowed providers to dynamically offer these network services with the ability to spin them up or down on demand [94]. Nowadays, the new 3rd Generation Partnership Project (3GPP) standardization defines the system architecture of the 5G Core functions based on cloud native concepts and principles [15][38] to respond to the Ultra-Reliable and Low Latency Communications (URLLC) requirements such as throughput, latency and availability [45]. 5G has certainly announced a new switch from Virtual Network Functions (VNFs) to Cloud Native Network Functions (CNFs). In addition, this situation is accelerated by the general usage of cloud native technologies within IT companies and the creation of organizations such as the Cloud Native Computing Foundation (CNCF) or the Open Containers Initiative (OCI) that are standardizing protocols and software.

1.1 Motivation and Objectives

Moving to cloud native technologies would allow better performance, flexibility and reliability in comparison with the previously existing solutions the telecommunication world has used (dedicated hardware, virtualization). Cloud native technology is still very young, and many components and features are still missing, especially in networking: Docker was first released in 2013 and the initial release of Kubernetes was in 2014. Yet this technology has already solved many issues in IT and enterprise applications, as can be observed with the emergence of new highly available services such as Netflix or Uber. This research targets the IP Multimedia Subsystem (IMS) applications which are now running on cloud native environments with the Kubernetes orchestration tool. IMS has legacy and complex networking requirements that cannot be fulfilled by the default Kubernetes networking system. Kubernetes, in general, relies on the networking provided by different Container Network Interfaces (e.g. Calico), which use existing Linux functionalities such as IPTables or IPVS. Depending on the CNI, the functionalities and network functions might differ; Calico, for instance, offers BGP capabilities while Flannel does not. But, in general, the main components are the same: all CNIs provide, at least, a primary virtual network and load balancing functionalities. This study is therefore needed to determine whether the new Cloud Native Computing Foundation project called Network Service Mesh could fill the gaps in networking functionality within Kubernetes, and thereby bring a solution to IMS based on a cloud native technology. According to the website of Network Service Mesh, the project aims to bring solutions to the complex use cases met by Telcos, ISPs, and advanced enterprise networks in their architecture and development of NFV, 5G networks, Edge Computing and IoT within Cloud Native environments.

1.2 Delimitations

Some of the priority IMS networking requirements are listed below, but these should be considered as general requirements for any Telco application. Similar requirements have also been observed from other Ericsson application domains.

• IMS applications with Session Initiation Protocol (SIP) as signaling protocol do not work well with the NAT in the standard K8s network
• K8s NAT will add additional latency to the SIP communication
• Applications have to comply with a traffic separation requirement on VLAN level, which is currently not possible since standard K8s pods can be attached to only one network
• IMS applications running as many instances are addressed E2E using a Virtual IP address; the VIP address needs to be preserved for both ingress and egress scenarios
• Standard Kubernetes provided L3/L4 load balancing is based on the Linux IPVS, which does NAT to select a particular backend pod; an E2E load balancing together with the existing L7 load balancer based on Envoy has to be investigated as part of the thesis in NSM
• There are specific applications that need to support a large number of networks, making the applications VLAN aware
• IMS AKA (IPsec) is also a strict requirement which will not work with NAT
• Each individual application has different interface type requirements (trunked, non-trunked, kernel, DPDK)

Although an overview is given to be able to understand what issues IMS is facing, its implications and use cases, the aim of this thesis is not to describe and explain in detail how Kubernetes, Linux, containers, NFV and Service Mesh work; this thesis focuses on proposing solutions and designs, together with the implementation of the most promising ones. One of the requirements for this thesis is to not break any Linux, Kubernetes or Network Service Mesh component: solutions must be built on top of the existing projects IMS is using. With this requirement in mind, the performance of NSM will be evaluated and discussed according to the current state of the features of NSM.

1.3 Structure of the Report

This report is organized as follows. Chapter 2 introduces the concepts and technologies related to networking, cloud native systems and telecommunication companies. Chapter 3 presents the several possible solutions found for the given problem. In Chapter 4, the environment used and the implementation instructions are described, allowing readers to reproduce the experiments. Chapter 5 describes the benchmarking methodology, shows the results, and analyzes the performance, security and scalability of Network Service Mesh using the implemented architecture. Finally, Chapter 6 concludes the report with a discussion of the findings and the results, and provides suggestions for future work.

Chapter 2

Background

In this chapter, a brief high-level overview of all components involved in this study is given for a better understanding of the implications and concepts of the designs, implementations and evaluations presented in the respective Chapters 3, 4 and 5. In addition, this chapter includes related work from other researchers and technologies.

2.1 IP Multimedia subsystem

Started in the year 2000 with the third generation (3G) networks, the IP Multimedia Subsystem (IMS) vision was to merge cellular networks and the Internet, which were the two most successful paradigms in communications. The main ideas behind IMS were to offer Internet services everywhere and at any time using cellular technologies with a certain quality of service (QoS) to enjoy real-time multimedia sessions, but also to be able to charge multimedia sessions appropriately [44]. With IMS, an operator can offer services involving multiple parties, multiple connections and multiple media streams of one or different types such as speech, audio, video and data (e.g. chat text, shared whiteboard) to the wireless and wireline user, even when roaming [16][17]. IMS is a system architecture providing the Internet protocol for telephony and multimedia services, composed of a collection of functions linked by standardized interfaces. The 3rd Generation Partnership Project (3GPP) is the entity which has standardized IMS and which has been continually evolving it [44]. 3GPP does not standardize nodes but functions; implementers (vendors) are free to decide about their own implementation. Multiple functions can be developed in a monolithic node as long as they follow the standard functions, but, in general, the IMS architecture is followed closely and is implemented with one function for each node [44]. Figure 2.1 gives the reference architecture of the IP Multimedia Core Network Subsystem. This figure is present in Specification 23.228 [17], and the complete list of all the interfaces provided by IMS and its components is included in Specification 23.002 [14].

Figure 2.1: Reference Architecture of the IP Multimedia Core Network Subsystem

2.1.1 Network address translation issues
IMS components (P-CSCF, S-CSCF, etc.) are primarily responsible for routing and processing Session Initiation Protocol (SIP) traffic, which is used as the signaling protocol to deliver IP multimedia services to users. SIP, standardized as RFC 2543 [73] in 1999 and revised in 2002 in RFC 3261 [58], is an application-layer control (Layer 7) protocol mainly used in VoIP that creates, modifies and terminates a multimedia session with one or more participants [64]. It consists of an exchange of short messages that contain session descriptions using the Session Description Protocol (SDP) [72], which allows participants to agree on a common set of media parameters. The path between a pair of SIP clients is handled by SIP proxy/registrar servers, which also keep information related to the current location of clients, authenticate and authorize users for services, and route requests to those clients [23]. The client originating the request must insert into the request its host name or network address and the port number at which it wishes to receive responses [73]. Network Address Translator (NAT) traversal is then a challenge for IP communications and SIP messages, since messages can be blocked due to port mismatch [23]. Network address translation is a network function (see Section 2.4) allowing hosts in a private network to transparently communicate with destinations on an external network and vice versa, by modifying source and/or destination IP address(es) and port number(s) [87]. This process provides more security since internal addresses are hidden, and it also provides a short-term solution to the shortage of public IPv4 addresses, the long-term solution being the use of larger addresses such as IPv6.
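To make the mechanism concrete, the following commands sketch how source NAT is typically configured with iptables on a Linux gateway; the subnet, interface name and public address are assumptions chosen for the example.

    # Masquerade (source NAT) traffic from the private 10.0.0.0/24 network
    # leaving through the assumed public interface eth0.
    iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 -j MASQUERADE

    # Equivalent rule with an explicit public source address (203.0.113.10 assumed).
    iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 -j SNAT --to-source 203.0.113.10

Replies to translated connections are rewritten back automatically thanks to the connection tracking performed by Netfilter.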

2.1.2 Traffic separation / Secondary network
One of the legacy requirements of the IMS infrastructure is to support network separation with a dedicated network interface according to the type of the traffic and its policies. The type and the policies of a network can vary with the usage; for instance, an operations and maintenance (O&M) network would use a different quality of service (QoS), protocols and security policies than a control plane network. This architectural approach is tightly coupled with the security requirements of the telecommunication companies in order to secure the users'/consumers' data flow: since the traffic of each network is isolated, access between networks is prevented and better access control is provided. Moreover, in addition to the security, infrastructures and applications also benefit from this approach with a performance gain, thanks to reduced congestion and minimized local traffic. The global system also becomes more resilient: if an issue occurs on one of the networks, its effect will be contained and kept away from the other parts of the system.

2.1.3 Environment
Telecommunication vendors develop IMS solutions for multiple telecommunication companies which have different needs due to the different environments they are using, and therefore prefer implementing their solutions in a generic way in order to cover all use cases. The Cloud Native Computing Foundation (CNCF) provides a consensus for cloud native solutions, with many diverse projects covering a large set of needs (e.g. service, storage, networking) in a standard way and widely used in other industries. Some of the environments have restrictions on the impact the IMS solution may have on the system; for instance, modifying the host by installing new kernel modules could be forbidden. In addition, the generic projects used have to be kept intact: new solutions must be built on top of them, not by modifying them. Telecommunication infrastructures are costly; consequently, optimization has to be taken seriously into consideration, for instance by avoiding unnecessary east-west traffic.

2.2 Linux

Linux is an open source Unix-like operating system driven by the Linux Foundation and initially released by Linus Torvalds in 1991. Linux is now the world's largest open source software project in the history of computing, with massive adoption in almost every sector: Internet of Things, smartphones, cars, desktops, servers, etc. [2][7]. Most of the projects and tools used in this document are sub-projects of this Foundation, such as Kubernetes with the CNCF, VPP with FD.io, OCI, OVS, DPDK and many others [4]. Linux covers a wide area of different subjects; thus, this thesis mainly focuses on operating system-level virtualization, the applications at the user space level, and the network stack of Linux.

2.2.1 Namespaces
According to the Linux man pages, a namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes [10]. Linux provides seven different namespaces: Process ID (pid), Network (net), Mount (mnt), Interprocess Communication (ipc), UNIX Timesharing System (uts), User ID (user) and Control group (cgroup). Namespaces have many use cases; in particular, they are used to implement containers.
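As a small illustration of the network namespace, which is the namespace this thesis relies on the most, the commands below create an isolated network namespace and connect it to the host with a veth pair; the names ns1, veth0, veth1 and the addresses are arbitrary choices for the example.

    # Create a network namespace with its own interfaces, routes and firewall rules.
    ip netns add ns1

    # Create a veth pair and move one end into the new namespace.
    ip link add veth0 type veth peer name veth1
    ip link set veth1 netns ns1

    # Configure and enable both ends.
    ip addr add 10.10.1.1/24 dev veth0
    ip link set veth0 up
    ip netns exec ns1 ip addr add 10.10.1.2/24 dev veth1
    ip netns exec ns1 ip link set veth1 up

    # The host can now reach the namespace over the veth pair.
    ping -c 1 10.10.1.2

This is essentially what container runtimes and CNIs automate when they wire a pod to a network.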

2.2.2 Container
Operating system-level virtualization, or containerization, is a way to isolate a set of one or more processes from the host system [77]. This may seem, at first glance, similar or an alternative to virtual machines (VMs), but the main difference is that containers are built on top of the host operating system's kernel, and so they contain operating system APIs, services and applications running in user space [43]. Nowadays, containerization has become really important for its lightness, its autonomy and its ability to be orchestrated with, for instance, Kubernetes or Docker Swarm. An open standard for OS-level virtualization has been designed by the Linux Foundation project called the Open Container Initiative (OCI) [3]. This standard specifies the development and the use of runtimes (runtime-spec) and images (image-spec). The goal of this standardization is to "encapsulate a software component and all its dependencies in a format that is self-describing and portable, so that any compliant runtime can run it without extra dependencies, regardless of the underlying machine and the contents of the container" [9]. The OCI also develops the container runtime runc according to the OCI specification; Docker, containerd, rkt and cri-o are other examples of container runtimes compliant with the OCI specification.

2.3 Kubernetes

Kubernetes is an open source platform automating Linux container operations, originally designed and developed by Google engineers. This platform orchestrates containers, networking and storage across multiple hosts in order to deploy, update and scale containerized applications and their resources on the fly, for a better use of the hardware [6].

2.3.1 Service
By default, Kubernetes offers the possibility to expose applications using the resource called Service. These services, once deployed, get their own IP addresses and a fully qualified domain name (FQDN) through which they can be accessed. Services in Kubernetes are managed by the daemonset called Kube-proxy (one pod running on each node). Kube-proxy is responsible for managing and maintaining the network rules, using load balancing and routing principles, in order to forward packets to the right pod. Three types of services exist [12]. The first one is ClusterIP: this type exposes the service only with an internal IP, so the service is reachable only from within the cluster; other applications can then consume this service through its fully qualified domain name (FQDN) or its cluster IP. The second one, NodePort, exposes the service over a specific port (the default range is 30000-32767) on every node; the service is then accessible using any node IP and that port. LoadBalancer, the last service type, exposes the service through an external load balancer, which can be a load balancer provided by a cloud provider (AWS, Azure, etc.) or by a load balancer implementation such as MetalLB.
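To make the three types concrete, the commands below sketch how a hypothetical deployment named my-app could be exposed with each service type; the names and ports are assumptions for the example.

    # ClusterIP: reachable only from inside the cluster.
    kubectl expose deployment my-app --name=my-app-clusterip --port=80 --target-port=8080 --type=ClusterIP

    # NodePort: additionally reachable on every node through a port allocated in the 30000-32767 range.
    kubectl expose deployment my-app --name=my-app-nodeport --port=80 --target-port=8080 --type=NodePort

    # LoadBalancer: published through an external load balancer (cloud provider or MetalLB).
    kubectl expose deployment my-app --name=my-app-lb --port=80 --target-port=8080 --type=LoadBalancer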

2.3.2 Container Network Interface
There is no standard implementation of the networking model within Kubernetes [76]; the implementation depends on the use cases and the environment where the cluster is deployed. For instance, Microsoft Azure, Amazon Web Services and Google Cloud Platform have their own container network interfaces for their Kubernetes services. CNI, the CNCF project, is a generic plugin-based networking solution for application containers; the CNI plugin's responsibility is to allocate network interfaces to the newly created container network namespace and make the necessary changes on the host to enable connectivity with other containers on the same network. An IP address should also be assigned to the interface by using the correct IP address management (IPAM) plugin. Since CNIs are generic plugins, they should follow the specification [11] written by the CNCF. Kubernetes is not the only user of CNIs; OpenShift, Amazon ECS and a few other container environments also use this concept.
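For reference, a CNI plugin is selected through a JSON network configuration placed on every node; the minimal sketch below uses the reference bridge and host-local IPAM plugins, with the file name, bridge name and subnet chosen arbitrarily.

    # Minimal CNI network configuration (path and values assumed).
    cat <<'EOF' > /etc/cni/net.d/10-example.conf
    {
      "cniVersion": "0.3.1",
      "name": "example-net",
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.22.0.0/16",
        "routes": [ { "dst": "0.0.0.0/0" } ]
      }
    }
    EOF

The container runtime invokes the configured plugin with ADD and DEL operations when a pod sandbox is created or deleted.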

2.4 Network Function

This section describes what network functions are and how they have evolved over the last years. The understanding of network functions is crucial for the reading of this document, since they are the principal components used in the design and implementation sections. The purpose of the utilized network functions is described in the sections below. A network function is a functional building block within a network infrastructure, which has well-defined external interfaces and a well-defined functional behavior [95]. Traditionally, network functions were physical (PNFs): they were tightly coupled to dedicated proprietary hardware [34][75]. With the standardization of networks and the high demand, physical network functions have migrated to a Network Function Virtualization (NFV) architecture, which encapsulates PNFs into virtual machines, providing more scalability and agility. Virtual network functions rely on kernel hacks or otherwise do not restrict themselves to the Linux kernel user space. They also often need to use DPDK (Data Plane Development Kit) or SR-IOV (Single-root input/output virtualization) to achieve sufficient performance [46]. 5G brings new use cases and requirements that can make Kubernetes the operating system for 5G networks, and therefore drive the evolution of VNFs to Container Network Functions (CNFs), providing a direct access to the hardware and allowing better performance with less resource consumption [46].

2.4.1 Load Balancing
Applications are nowadays hosted on multiple servers located around the world within multiple clusters; distributing the traffic load evenly across these servers is then essential [37]. Load balancing is the technique of distributing network traffic across a group of servers in a way that optimizes traffic, maximizing throughput, minimizing response time and avoiding overload of any server [50]. Load balancing provides a failover concept: if a server fails, the load balancer stops sending requests to it and instantly redirects traffic to another one, thus mitigating the impact on users [50]. Connection tracking is another concept provided by load balancers in order to keep the context of established connections; for instance, if the acknowledgement packets of a TCP connection are sent to several different servers, the connection will never be established correctly. Load balancers usually work on layer 4 and layer 7 of the OSI model. On layer 4, routing decisions and connection tracking are based on IP addresses, port numbers and transport protocols such as TCP, UDP and SCTP. Layer 7 load balancers operate on the application layer: the routing path and connection tracking are determined using the content of the message; for example, in HTTP, URLs or cookies might be used. Load balancing on layer 4 is one of the few network functions Kubernetes embeds by default, with Kube-proxy as control plane and IPTables or IPVS as data plane. When a new service is created within Kubernetes, a new load balancer is set up to distribute network traffic evenly across the pods and the nodes (if "externalTrafficPolicy" is set to "cluster").

IPVS
IPVS (IP Virtual Server) implements transport-layer (Layer 4) load balancing inside the Linux kernel and has been merged into kernel versions 2.4.x and newer [70][68]. This load balancer is part of the Linux Virtual Server (LVS) open source project and depends upon the Linux Netfilter framework. IPVS supports three technologies (NATting, tunneling and direct routing) to forward packets to application servers, with persistent session mechanisms using a configurable timeout. IPVS distributes traffic based on IP addresses, port numbers, protocols (TCP, UDP, SCTP) and fwmark (netfilter firewall mark), using configurable scheduling algorithms among the 13 implemented inside the kernel, such as Round-Robin, Maglev Hashing and Least-Connection [74].
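As an illustration, with the addresses being arbitrary, a virtual service and its real servers are typically managed with ipvsadm as sketched below; Chapter 4 uses similar commands in the implemented NSE.

    # Create a virtual service for a VIP, using round-robin scheduling.
    ipvsadm -A -t 11.12.13.14:5060 -s rr

    # Add two real servers (backend pods) to the virtual service, using masquerading (NAT).
    ipvsadm -a -t 11.12.13.14:5060 -r 10.10.1.2:5060 -m
    ipvsadm -a -t 11.12.13.14:5060 -r 10.10.1.3:5060 -m

    # Remove a real server from the virtual service.
    ipvsadm -d -t 11.12.13.14:5060 -r 10.10.1.3:5060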

2.4.2 Firewall
A firewall is a network security component filtering and controlling traffic according to a pre-configured set of rules and policies. In Linux, the firewall is a component of the Netfilter framework; rules and policies enable packet filtering, network address and port translation, and packet mangling, and they are administered with the user space tools nftables, iptables, ip6tables, arptables and ebtables. Rules are organized into 5 chains, PREROUTING, POSTROUTING, INPUT, OUTPUT and FORWARD, which specify at which point in the packet flow a rule is applied. Kubernetes uses IPTables to NAT IP addresses, but also as a load balancer through a trick combining the probability, random number generation and redirection features IPTables provides.
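The sketch below illustrates the idea behind that trick: the statistic match spreads new connections across backends with fixed probabilities, similar in spirit to the rules kube-proxy generates (the chain name, addresses and port are invented for the example).

    # User-defined chain that a PREROUTING rule would jump to for the service.
    iptables -t nat -N EXAMPLE-SVC

    # Roughly one third of new connections go to the first backend...
    iptables -t nat -A EXAMPLE-SVC -m statistic --mode random --probability 0.33333 -j DNAT --to-destination 10.10.1.2:8080
    # ...half of the remaining connections go to the second backend...
    iptables -t nat -A EXAMPLE-SVC -m statistic --mode random --probability 0.5 -j DNAT --to-destination 10.10.1.3:8080
    # ...and everything left goes to the last backend.
    iptables -t nat -A EXAMPLE-SVC -j DNAT --to-destination 10.10.1.4:8080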

2.4.3 BGP / ECMP
Within a distributed system using Kubernetes and Network Service Mesh, multiple routes to the same service might be available. A service running on several independent nodes must be reachable using the same virtual IP address (VIP) on any of the nodes, with an equal distribution of the traffic among the nodes so that no single node is overloaded. Every node also has to advertise the availability of the service towards the entity distributing the traffic. This system, which can be built using BGP and ECMP [86], provides more reliability and scalability for a service. The Border Gateway Protocol started to be used on the Internet in 1994; the protocol has been standardized over several versions, the current one, version 4, being published in 2006 as RFC 4271 [99]. This routing protocol is designed to exchange routing and reachability information between autonomous systems (ASes) by advertising adjacent networks and propagating these networks to all other BGP systems. BGP keeps this information in a database and can construct a graph of AS connectivity to prune routing loops and enforce policy decisions. Equal-Cost Multipath, or ECMP, described in RFC 2992 [54], is a routing strategy that routes packets along multiple paths of equal cost. Various routing protocols, including OSPF, ISIS and, in this case, BGP, explicitly allow Equal-Cost Multipath routing [32]. ECMP evenly distributes the load over the different routes available and collected by BGP, using a flow-based (e.g. hash) or round-robin scheduling method. The advantages of this approach are a faster reaction to failures and a better load distribution, avoiding traffic congestion and increasing the usable bandwidth.
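On the data path side, the resulting forwarding state corresponds to an equal-cost multipath route towards the VIP; the sketch below installs such a route statically, whereas in practice a BGP daemon (e.g. BIRD or GoBGP) would install it dynamically, and the addresses are assumptions for the example.

    # Equal-cost multipath route for the VIP via two next hops (two nodes announcing it).
    ip route add 11.12.13.14/32 \
        nexthop via 192.168.0.11 weight 1 \
        nexthop via 192.168.0.12 weight 1

    # Use layer-4 (flow-based) hashing so all packets of a connection follow the same path.
    sysctl -w net.ipv4.fib_multipath_hash_policy=1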

2.5 Service Mesh

Cloud native applications architected as distributed systems bring new challenges, such as the communication between the different parts of the same application and between services. A new infrastructure layer, completely transparent to the application and called a service mesh, is then needed to address this challenge. This layer, responsible for the reliability of the service-to-service communication, allows developers to focus on the business logic of the application instead of having to spend time on the communication. Istio and Linkerd are the most famous among the many open source service meshes available for Kubernetes. In general, service meshes operate from layer 4 (transport) to layer 7 (application), using sidecars attached to the application pods to provide a set of features such as service discovery, load balancing, fault tolerance, traffic monitoring, circuit breaking and access control [66].

2.6 Network Service Mesh

Network Service Mesh (NSM) is a new open source sandbox project of the Cloud Native Computing Foundation (CNCF) which has been inspired by Istio and service meshes [8]. The main objective of this project is to solve complex Layer 2 (Data Link) and Layer 3 (Network) use cases by providing network features which are not originally offered in Kubernetes. The networking capabilities Network Service Mesh adds to Kubernetes modify neither the way the default networking works nor any pre-existing functionality; indeed, NSM, once deployed, runs as independent pods with separated networks. The new networking capabilities Network Service Mesh brings to Kubernetes make it possible to handle heterogeneous network configurations, exotic protocols, service function chaining, dynamic networking, on-demand connectivity and cross-cluster/multi-cloud/hybrid-cloud connectivity that telcos, ISPs, and advanced enterprise networks would require in their architectures and developments of NFV, 5G networks, Edge Computing and IoT within Cloud Native environments. As with most Kubernetes extensions, the installation and upgrades of Network Service Mesh are achieved using built-in components such as daemonsets, deployments, etc. Helm charts have been provided for the two released versions of NSM (v0.1.0 Andromeda and v0.2.0 Borealis) in order to facilitate the installation within Kubernetes clusters. Helm is an open source project from the CNCF aiming to help users define, install and upgrade Kubernetes applications. Figure 2.2 represents the overview of a Kubernetes cluster with Network Service Mesh deployed, a simple network service using a daemonset of network service endpoints (NSEs) and an application (NSC) deployment composed of 4 replicas. This figure shows that network services are not full meshes, meaning that, if there is more than one NSE for a network service, each NSC will be attached to only one NSE. For instance, in this figure, NSE-A and NSE-B are part of the same network service, but NSC-1 is connected only to NSE-A. The host where a network service endpoint is running has no impact on which network service client it will be attached to: it can be on the same host or on a different one; more explanation is given in Section 2.6.2.

Figure 2.2: Overview of networking in Kubernetes with NSM

Figure 2.2 also shows the main components that are installed with NSM:

• Network Service Mesh Forwarder (nsm-vpp-forwarder/nsm-kernel-forwarder): A pod used as the control plane for the VPP/kernel-based data plane (which pod is running depends on the configuration chosen when installing NSM). It provides end-to-end connections, wires, mechanisms and forwarding elements to a network service.

• NSM Mutating Admissions Controller (nsm-admission-webhook): A single pod using the mutating admission controller (MutatingAdmissionWebhook) of the Kubernetes API to intercept and inject an init container into network service client pods' specifications.
• Network Service Manager (NSMgr): A daemonset managing the gRPC requests for connections by matching clients with appropriate endpoints. It provides a full mesh by forming connections to every other NSMgr within an NSM domain.
• Network Service Endpoint (NSE): A pod with knowledge of network resources that implements a network service. An NSE accepts zero or more connection requests from clients which want to receive the network service it is offering. An NSE also acts as a control plane for a network service and can act as a data plane at the same time.
• Network Service Client (NSC): A requester or consumer of the network service. An NSC can be an NSE at the same time (for example when network services are chained).

In order to develop network service endpoints and network service clients, a software development kit (SDK) in Golang is provided. The SDK for the NSE enables the development of functionalities such as a load balancer, a firewall or a BGP speaker. The development is based on reusable components called composites, which implement the "NetworkServiceServer" interface. These composites are then chained together in a process called composition to build an NSE. Connection handler, IP Address Management (IPAM) and DNS server are a few of the several useful composites that are predefined by the community and that can be re-used to develop a new NSE. Three functions have to be implemented to create a new composite component: "Init", which is called just after the NSE has been created (e.g. add a service in IPVS), "Request", which is called when a new NSC connects (e.g. add a server to the service in IPVS), and "Close", which is called when an NSC disconnects (e.g. remove the server from the service in IPVS). An SDK for the client is also available; it is used to handle connections with the NSEs. The client SDK will not be used in this document, since an init container with all the required functionalities is already provided with Network Service Mesh.

2.6.1 Control plane
The main responsibility of the control plane is to take the decisions on how packets should be forwarded and to configure these decisions on the forwarding plane [62]. As described in the previous section, within Network Service Mesh, the main default components that are installed, as well as the network service endpoints, act as the control plane. Since the NSEs are configurable using the SDK, new control plane logic can be embedded; this can include load balancer configuration, VIP advertising using BGP, etc. When Network Service Mesh is configured to use VPP as the forwarding mechanism, the pod called "nsm-vpp-forwarder" is running. This pod uses the Ligato VPP-agent, which is a control plane management agent for VPP created for the development of cloud-native network functions (CNFs). When Network Service Mesh is configured to use the kernel-based forwarding mechanism, the pod called "nsm-kernel-forwarder" is running. This pod uses netlink and network namespaces (netns) to control the kernel forwarding plane. Netlink, in Linux, is an intra-kernel messaging system and a communication interface between kernel space and user space; it has been defined in RFC 3549 [59].

2.6.2 Data plane
The data plane, or forwarding plane, is a physical or virtual component in charge of the data path operations and the packet forwarding process according to the decisions made by the control plane [30][97]. To handle these responsibilities, the forwarding plane acts on Layers 2 (Data Link), 3 (Network) and 4 (Transport) of the OSI model using switch and router functionalities. In general, physical forwarding plane components are application-specific integrated circuits (ASICs), network processors, or general-purpose processor-based devices [48], while virtual forwarding planes are principally embedded into the kernel space of operating systems; however, user space forwarding planes also exist, such as Vector Packet Processing (VPP) or Open vSwitch (OVS). These user space data planes can be accelerated in terms of packet processing performance and throughput using frameworks (e.g. RDMA, TOE, DPDK, etc.) providing more advanced functionalities and configurability by bypassing the kernel. The trendiest technologies for network acceleration are the Data Plane Development Kit (DPDK) and the Single-root input/output virtualization (SR-IOV) specification. DPDK is a Linux Foundation project providing a set of libraries to accelerate packet processing workloads running on a wide variety of CPU architectures [1]. Two data planes are available in Network Service Mesh: kernel based and VPP. The option can be configured by setting the value of "forwardingPlane" in the values.yaml of the helm chart. Within Network Service Mesh, and since Kubernetes can run on several hosts, it might happen that a network service endpoint and its network service clients (application pods) require remote connectivity. Network Service Mesh has the ability to handle both local (two namespaces on the same host) and remote (two namespaces on two different hosts) connections through several configurable mechanisms, such as Virtual Ethernet Device (veth), tapv2 or Shared Memory Packet Interface (memif) for local connectivity, and Virtual eXtensible Local Area Network (VxLAN), WireGuard or Segment Routing over IPv6 (srv6) for remote connectivity. The mechanism depends on the forwarding plane configured (kernel based, VPP, etc.). As with the data plane option, the remote mechanism can be configured by setting the value of "preferredRemoteMechanism" in the values.yaml of the helm chart.
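As a hedged sketch, where the chart location and value strings are assumptions while the value names come from the helm chart described above, selecting the forwarding plane and the preferred remote mechanism could look as follows:

    # Install NSM from a local clone of its repository, selecting the VPP forwarding
    # plane and VxLAN as the preferred remote mechanism (chart path and values assumed).
    helm install nsm ./deployments/helm/nsm \
        --namespace nsm-system \
        --set forwardingPlane=vpp \
        --set preferredRemoteMechanism=VXLAN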

Linux kernel based
The Linux kernel based forwarding plane is composed of several independent components, all implemented as different pieces of code inside the Linux kernel. These components, namely netfilter (nftables, iptables, ebtables, etc.), IP route and bridge, together form the network stack from layer 2 to layer 4 (data link, network, transport) [92][26]. The Linux kernel is not recommended for a high-speed network environment; indeed, it suffers from a lack of performance, a lack of advanced features and a resource consumption overhead [98][88]. Previously working in an interrupt-driven mode, then updated with the New API (NAPI) from Linux 2.5/2.6, the kernel still has these performance bottleneck issues (e.g. in TCP [60]) [90]; this can be solved by bypassing the Linux kernel using a network acceleration framework such as DPDK [1] or SR-IOV [61] and/or a user space data plane/network stack solution such as VPP [40] or OVS [42]. The Linux kernel based forwarding plane is one of the two forwarding planes Network Service Mesh proposes. To handle local connections, the Linux kernel based forwarding plane has only veth available. For the remote connections, by default, VxLAN is used, but WireGuard is also available.
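For illustration, the remote VxLAN mechanism corresponds conceptually to creating a VxLAN tunnel interface on each side of the connection, as sketched below; the VNI, interface names and addresses are arbitrary assumptions.

    # VxLAN interface with VNI 42, tunnelling over the node interface ens3
    # towards the remote node 192.168.0.12 on the standard VxLAN UDP port.
    ip link add vxlan42 type vxlan id 42 dev ens3 remote 192.168.0.12 dstport 4789

    # Address and enable the tunnel endpoint.
    ip addr add 172.16.42.1/24 dev vxlan42
    ip link set vxlan42 up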

VPP
Vector Packet Processing (VPP), initially developed in 2002 [20] by Cisco and included in Cisco router products for about a decade [25], is now an open source Linux Foundation project, part of FD.io. The VPP platform is an extensible framework that provides out-of-the-box production quality switch/router functionality, allowing high performance, modularity, flexibility and access to a rich feature set [40]. As described in the previous section, VPP is a user space, high-performance framework for packet processing; it is designed to take advantage of general-purpose CPU architectures. VPP is the default forwarding plane of Network Service Mesh, as configured in the helm chart used for the deployment. To handle local connections, the VPP forwarding plane uses memif interfaces by default, but veth/tapv2 are also available (veth replaces tapv2 when the latter is not available). For the remote connections, by default, as with the Linux kernel based forwarding plane, VxLAN is used, but Segment Routing over IPv6 (srv6) is also available.

OVS
Another sophisticated data plane, not implemented in Network Service Mesh but which could be, is Open vSwitch (OVS) [42]. OVS is a widely used virtual switch supported in many environments such as Linux and Windows, and supporting network acceleration with DPDK or AF_XDP. Open vSwitch is available in the Linux kernel from version 2.6 and can also be installed in user space to benefit from all its advantages. In some cases, OVS is a really good alternative to VPP.

2.6.3 Service Function Chaining
One of the main features added by NSM to Kubernetes is Service Function Chaining (SFC). There is no standard specification, but SFC has been described and summarized by the Internet Engineering Task Force (IETF) in RFC 7665 [49]. SFC consists of an ordered list of service function instances through which the traffic travels. Using NSM, chains of network functions are defined only using Kubernetes specification files in YAML. The pod specification has to receive new annotations, or, for more complex use cases, a service chain can be defined using the new object called "NetworkService"; a hedged example is sketched below. The network functions, as in Figure 2.3 below, run on different pods, each pod containing an NSE container representing a network function. The network functions are executables compiled from code written using the SDK of NSM. These containers can be reused; for instance, in Figure 2.3, "NSM-Firewall-VxLan" and "NSM-Firewall-VPN" could use the same container, with the same executable, but their pod specifications will be different. Therefore, the NSE containers can be made generic enough to be re-used in order to effortlessly create multiple services.
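As an illustration only, a chain placing a firewall in front of the final endpoint could be declared roughly as below; the API version, field names and selectors are assumptions based on the NSM v0.2.0 examples and may differ between releases.

    # Hypothetical NetworkService object chaining a firewall before the endpoint.
    cat <<'EOF' | kubectl apply -f -
    apiVersion: networkservicemesh.io/v1alpha1
    kind: NetworkService
    metadata:
      name: secure-service
    spec:
      payload: IP
      matches:
        - match:
            sourceSelector:
              app: firewall
          routes:
            - destination:
                destinationSelector:
                  app: endpoint
        - match: {}
          routes:
            - destination:
                destinationSelector:
                  app: firewall
    EOF

Clients requesting the secure-service network service are first wired to a firewall NSE, which in turn requests the service and is wired to the final endpoint.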

Figure 2.3: Network Service Chaining example

2.6.4 Community and future development
Network Service Mesh is a CNCF project which is still in the early stage (Sandbox) of its development. As with all CNCF projects, Network Service Mesh is open source; it is mainly maintained by four people (Ed Warnicke (edwarnicke), Frederick Kautz (fkautz), Nikolay Nikolaev (nickolaev) and Andrey Sobolev (haiodo)) and is open to other contributors via pull requests on GitHub. To contribute to this project, maintainers and contributors have a mailing list, are present on the CNCF Slack and IRC, and also organize online meetings every Tuesday. From the beginning of its development, two versions have been released: v0.1.0, released in August 2019, and v0.2.0, released in November 2019, which has been used in Chapter 4. Promoting a project is important to make people and companies adopt it and make it a reference; the community therefore presents the possibilities, advantages and features of Network Service Mesh in different venues, such as conferences with NSMCon, the dedicated conference at KubeCon. The current SDK, as of now, targets Golang as the main client and endpoint implementation language, but, in the future, other languages such as C, C++ or Python might be considered [80]. The development of this project is still in progress; many other features that will be developed in the future already have parts of their specification written in the Google Drive folder of the project [82]. Among them, two features would have been interesting to study in this document: SR-IOV and the ingress gateway. The single root I/O virtualization (SR-IOV) is a PCI Express (PCIe) extension specification allowing the traffic to bypass the software networking stack (e.g. the Linux kernel) by making a physical PCIe device (in this case, a network adapter) appear as multiple separate physical devices. This permits reaching performance close to a native environment [61]. SR-IOV would have been important to study, since it could have been compared with the Linux kernel and VPP data planes. The ingress gateway, meanwhile, is more related to the designs described in Sections 3.1 and 3.2.

2.7 Related work

In recent years, a large amount of research and work has been carried out, especially related to cloud native technologies and networking. In this section, some of these works are presented since they were potentially useful for this study.

2.7.1 NAT
As discussed in Section 2.1.1, NAT is a well-known issue with the SIP protocol, and many research papers have been written on this topic. The authors of [64], [78], [100] and [23] examine the problems and potential solutions using, for instance, Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN) servers.

2.7.2 Alternatives
There is no project similar to Network Service Mesh supported by the CNCF, but alternatives for creating a secondary network, and service meshes running on the application layer, already exist within Kubernetes. Nokia, for instance, has built DANM (Damn, Another Network Manager!) [83], its own network manager framework fulfilling its needs in the development of telecommunication applications and networking. This project works as a CNI plugin providing attachment capabilities to multiple network interfaces of any type (IPVLAN, VxLAN, etc.), and it has the capability to work together with any other CNI solution (e.g. SR-IOV, Calico, Flannel, etc.). In the same way as Nokia, Huawei is developing CNI-Genie [55], a project supporting roughly the same functionalities as DANM. Otherwise, the most popular project enabling the attachment of multiple network interfaces is Multus [57], developed by Intel. Multus follows the Kubernetes Network Custom Resource Definition De-Facto Standard [36] defined by the Network Plumbing Working Group (NPWG), which specifies the configuration of additional network interfaces. Formed at KubeCon NA '17 in Austin, the NPWG is also working on another promising project about a new service abstraction for Kubernetes fulfilling more use cases [47]. In the future, this work can potentially become a common standard directly included in Kubernetes. Regarding service mesh projects, some research has analyzed the different solutions available in Kubernetes and presents them in real use cases. [66] presents the main challenges, the state of the art, and future research opportunities regarding service meshes. The authors of [63] have set up multiple service meshes in Kubernetes in order to obtain encryption and traffic separation using encapsulation, and to provide function-level traffic monitoring and control.

2.7.3 Performance
Performance is a crucial topic for new generations of telecommunication systems in order to always provide a better experience to users. Some research related to the performance of data planes and network function virtualization exists. [22] compares direct connection, MACVLAN, SR-IOV, Linux bridge and OVS in a containerized environment. In [89], the authors compare OVS-DPDK, SR-IOV and VPP. Performance analyses have also been carried out by the main maintainers of projects, such as FD.io for VPP, comparing VPP with OVS-DPDK [40][53], or OPNFV [84], comparing VPP and OVS using RFC 2544 [93]. In addition to pure performance comparisons, different methods to measure performance exist. [35] introduces the Network Function Performance Analyzer (NFPA), a standalone, RFC 2544 [93] compliant benchmarking tool allowing users to measure the performance of any network function running on any hardware and software combination. The researchers of this paper have also demonstrated its usage with a comparison between kernel and DPDK data planes over native, virtualized and containerized environments.

Chapter 3

Design

In order to fulfill the ingress and egress no-NAT scenarios and the secondary network requirement of the IP Multimedia Subsystem in a cloud native environment, this chapter presents multiple designs accompanied by their description, advantages and disadvantages. Some of the most promising solutions presented will then be selected and implemented in Chapter 4. Observations on the behavior and the utilization of Network Service Mesh are also included in this chapter to highlight the specificities of this new project.

3.1 Ingress traffic alternatives

This section proposes multiple solutions to solve the ingress no-NAT scenario for the IP Multimedia Subsystem in Kubernetes using external connectivity and Network Service Mesh. The first parts introduce different ways to reach a network service endpoint, and the last part suggests the design to reach the network service clients from incoming traffic arriving at a network service endpoint.

3.1.1 Host shared
Kubernetes provides the flexibility to control the different namespace scopes of the pods (PID, network, IPC, etc.), either to keep them isolated or to share them with the host. In this scenario, the network namespace is shared with the host by setting the parameter "hostNetwork" to true in the pod specification; a minimal sketch of such a specification is shown below. Sharing the network namespace with the host causes the iptables rules, network interfaces, routing tables, IPVS rules and ebtables rules to be shared. As described in Figure 3.1 below, during the deployment of the network service endpoint and the application, Network Service Mesh will create an interface on each pod of the application (e.g. NSM0 on NSC-1) and a corresponding interface on the hosts (e.g. NSM-1 on Node-A). The ingress traffic can then be injected through the host interface "ens3", processed (e.g. by a firewall) and load balanced to the corresponding pods of the application, as the default Kubernetes service model already provides.
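A minimal sketch of an NSE pod specification sharing the host network namespace is shown below; the pod name and image are placeholders, and only the hostNetwork field matters for this design.

    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: nse-host-shared
    spec:
      # Share the host network namespace: the pod sees the host interfaces,
      # routing tables and iptables/IPVS rules.
      hostNetwork: true
      containers:
        - name: nse
          image: example/nse:latest   # placeholder image
    EOF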

Figure 3.1: NSM - Ingress - Host Shared

This solution is easy to set up and could be considered a good idea at first glance, but it has several important drawbacks which prevent it from being further investigated and selected as the final design for this research. Since the network interfaces are created directly in the host network namespace, the network isolation is lost; this could cause problems in the infrastructure if any network overlap happens, as a network service could end up connected to the Kubernetes network or to another network service. The risk of a network overlap increases with the number of services deployed: if, for instance, the IMS application requires 20 services with 10 pods running on each node, 200 new network interfaces will be created on each node. Using the shared host network feature could also be prohibited by the infrastructure owner or by the clients of the IMS system. By default, Kubernetes with IPVS configured refreshes the IPVS table every 30 seconds with the services and internal traffic load balancing rules; IPVS is then not available to be used with this solution, since each new rule would be overridden by Kubernetes. When a pod is destroyed, the namespaces belonging to it are destroyed at the same time. In the shared host network concept, the namespace does not belong to the network service endpoint, so if the pod has to be destroyed (because of a crash, a restart, etc.), the previously created network interfaces will remain on the host as unused network interfaces.

3.1.2 VPN
To avoid the NAT in Kubernetes, a new container running a virtual private network client can be attached as a sidecar to the endpoint pod, so that traffic tunneling is set up in order to encapsulate the VIP and "hide" the NATting produced by Kubernetes. In Figure 3.2, we can see a VPN server using OpenVPN running as a new pod; a new network, 192.168.255.0/24 (the yellow one), is then created. The VPN server is exposed with a Kubernetes NodePort service, so any client can connect to it using the FQDN, or the public IP and the port of the service, provided it has access to the authentication keys. The sidecars of the endpoint pods are connected to this VPN server; they can receive traffic from entities connected to the VPN server via their private IP or via the VIP (11.12.13.14/32) handled by their loopback interface or, in this case, the tun0 interface. Each NSE can then run different network functions, for instance to advertise to BGP listeners that a new hop is available, or to load balance traffic to the NSCs.

Figure 3.2: NSM - Ingress - VPN

Using the Kubernetes services has the advantage of relying on existing components; other services using Kubernetes services will continue to work as before. The amount of work needed to develop and set up the VPN server and the VPN client might be high, since the security features of the VPN have to be considered. The VPN server has to transfer authentication keys to the clients (the NSEs' sidecars). A new, highly available VPN server has to be deployed, knowing that, if this service becomes unavailable, the NSEs will be impacted and unreachable. The sidecar also has to be injected and configured into the NSE pods. But this solution still has the advantage of being able to connect to an external network in a secure way. It is important to mention that this solution might generate latency: all the traffic has to be handled by the VPN server, and the use of existing components such as the Kubernetes services has some disadvantages. New layers are added in the traffic flow, since the traffic goes through the VPN server, then through the Kubernetes service resources, which are a first load balancing layer created by Kube-proxy, and then through another load balancer layer embedded within the network service endpoint. This behavior leads to a more complex traffic route and, therefore, to additional latency. This mechanism could also be built using a VPN server, instead of a client, as a sidecar container of the endpoint pod, but the issue raised with this solution is the persistence of the connections. When a new client wants to access the endpoint through the VPN, it has to connect in the same way as in the previous solution, but it will connect to one and only one VPN server, and all its traffic will go through the same endpoint pod. This is due to the fact that VPN servers do not synchronize active connections and because connection persistence is activated on the Kubernetes load balancer (IPVS or IPTables).

3.1.3 MACVLAN / IPVLAN

MACVLAN and IPVLAN are Linux network drivers exposing underlay networks directly to network namespaces running on the host. These drivers can therefore be used to create a new network interface in containers and pods. With MACVLAN, new network interfaces can be created on top of the host network interface with new unique MAC addresses and, consequently, new unique IP addresses. Four types/modes are available in MACVLAN [27][33]. The first one, ”private”, prevents the MACVLAN interfaces attached to the same network interface from communicating with each other. The second, ”passthru”, allows one and only one MACVLAN interface to be connected to the host interface. VEPA, the third mode, allows communication between MACVLAN interfaces, but data is transmitted out of the host interface to reach the switch (hairpin mode needs to be supported by the switch). The last mode, ”bridge”, creates a bridge on the host interface, so the MACVLAN interfaces can communicate with each other within the host. IPVLAN is conceptually very similar to the MACVLAN driver with one major difference: IPVLAN does not assign new MAC addresses to the new interfaces; they share the MAC address of the host. IPVLAN has two modes [24][33]: L2, which works like MACVLAN bridge, and L3, in which the host interface acts as a router. In Figure 3.3, the NSEs have received a new network interface; the ingress traffic will then bypass the Kubernetes networks and reach the pods directly instead of going through Kube-proxy. The NSEs can still communicate with each other since they have different IP addresses. A new pod also appears, ”ingress-controller”, which is in charge of setting up the IPVLAN/MACVLAN interface and injecting it into the NSE pod. Sharing the network namespace with the host is required in order to be able to create a new virtual interface based on the host network interface (ens3) and move it to the appropriate network namespace (the one of the NSE pod).

Figure 3.3: NSM - Ingress - IPVLAN/MACVLAN

This solution does not entirely meet the secondary network attachment requirement, since the NSEs will be connected to an underlay network (the same network as the host), but it would be interesting if the host is connected to multiple different networks through several network interfaces. The secondary network attachment requirement will then be delegated to the underlay network and to the host.

Depending on the configuration and the policies, the switch of the network (e.g. Open vSwitch on OpenStack) can block the IPVLAN and MACVLAN drivers. Some other limitations are also observable when using these two technologies. With MACVLAN, when the number of devices increases, the MAC address table grows, causing more latency and a possible loss of connectivity. When using IPVLAN in L3 mode, interfaces do not receive broadcast and multicast traffic [52]. In terms of performance, IPVLAN/MACVLAN does not add any level of encapsulation, operates at a lower layer of the OSI model, and is one of the default drivers of Linux; higher performance can therefore be expected than with a VPN or any other technology relying on encapsulation or on layer 7 (application) software.
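To make the mechanism above more concrete, the following Go sketch creates a bridge-mode MACVLAN interface on top of the host interface ens3 using the open source vishvananda/netlink library (the same library used later in Section 4.2.1). This is only an illustrative sketch: the interface name is a placeholder, and the created interface would still have to be moved into the target pod's network namespace as described in Section 4.2.1.

package main

import (
	"log"

	"github.com/vishvananda/netlink"
)

func main() {
	// Parent (host) interface the MACVLAN will be attached to.
	parent, err := netlink.LinkByName("ens3")
	if err != nil {
		log.Fatalf("parent interface not found: %v", err)
	}

	// Describe a MACVLAN interface in bridge mode: instances on the same
	// parent can communicate with each other within the host.
	attrs := netlink.NewLinkAttrs()
	attrs.Name = "macvlan0"
	attrs.ParentIndex = parent.Attrs().Index
	macvlan := &netlink.Macvlan{
		LinkAttrs: attrs,
		Mode:      netlink.MACVLAN_MODE_BRIDGE,
	}

	// Create the interface; it can later be moved into the NSE pod's
	// network namespace (see Section 4.2.1).
	if err := netlink.LinkAdd(macvlan); err != nil {
		log.Fatalf("failed to create MACVLAN interface: %v", err)
	}
}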

3.1.4 Overlay Network / VxLAN

This solution is quite similar to the one using IPVLAN/MACVLAN discussed in the previous Section 3.1.3, since it uses another type of Linux interface for virtual networking called VxLAN, an extensible version of VLAN. A Virtual Local Area Network (VLAN) enables the separation of the logical connectivity from the physical connectivity [91]. VLANs have been designed in several different protocols; the most commonly used today is IEEE 802.1Q [56]. The IEEE 802.1Q protocol is limited to 4094 VLANs on a given Ethernet network (due to the 12-bit length of the VLAN identifier), which can be inadequate. The VxLAN tunnelling protocol has then been designed to overcome this limitation [5]. The Virtual Extensible LAN (VxLAN) protocol is a 1-to-N tunnelling protocol (unlike most other tunnelling protocols) described in RFC 7348, with the capability to have 16 million VxLAN segments (the VxLAN Network Identifier (VNI) is 24 bits long) coexisting within the same Ethernet network [79]. VxLAN encapsulates Layer 2 traffic in User Datagram Protocol (UDP) packets, which can be transported over Layer 3 networks. Figure 3.4 describes an architecture similar to the one described in Section 3.1.2 about the VPN. A new network (the yellow one) has appeared, as well as a new pod called ”ingress-controller”; this pod has the same responsibilities as in the IPVLAN/MACVLAN solution (setting up the VxLAN interface and injecting it into the NSE pod). The new network is a VxLAN overlay network using a multicast group (e.g. 239.1.1.1) over the interface ”ens3” of the nodes and running with the destination port number set to the IANA-assigned value (4789) [13]. Each network service endpoint will learn the VxLAN IP addresses of the other endpoints dynamically.

Figure 3.4: NSM - Ingress - VxLAN

Unlike IPVLAN/MACVLAN, the secondary network requirement is met with this solution without any additional precondition on the underlying network. VxLAN is a proven concept, widely adopted by most of the CNIs, which use the same mechanism to create a new network for Kubernetes without any modification of the underlying network. The expected performance using VxLAN is probably worse than with IPVLAN/MACVLAN, since the packets are encapsulated, which should reduce the efficiency a bit. However, the performance will be better than with the VPN solution. One of the drawbacks of VxLAN is the lack of mechanisms for scale and fault tolerance. A database containing the connected hosts is built on each switch. Other switches learn about those connected hosts using a data plane-driven model called flood and learn. Flood and learn is the default control plane of VxLAN: a host becomes reachable by flooding its information across the network. A control plane is therefore needed to add the capability of distributing host reachability information across the network. Ethernet VPN (EVPN), defined in RFC 7209 [18] and standardized in RFC 8365 [19], together with the extensions of Multi-Protocol BGP (MP-BGP), standardized in RFC 4760 [96], could be the solution. MP-BGP EVPN for VxLAN provides a distributed control plane solution that significantly improves the ability to build and interconnect SDN overlay networks. It also minimizes network flooding and improves the robustness, the security and the scalability of VxLAN [29].
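As a comparison with the MACVLAN sketch in Section 3.1.3, the short Go sketch below describes a VxLAN interface with the parameters discussed above (a VNI, the multicast group 239.1.1.1, the IANA-assigned UDP port 4789 and the parent interface ens3) using the vishvananda/netlink library. The VNI value and the interface name are placeholders chosen for the example.

package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	// The VxLAN tunnel endpoint is built on top of the node interface ens3.
	parent, err := netlink.LinkByName("ens3")
	if err != nil {
		log.Fatalf("parent interface not found: %v", err)
	}

	attrs := netlink.NewLinkAttrs()
	attrs.Name = "vxlan100"
	vxlan := &netlink.Vxlan{
		LinkAttrs:    attrs,
		VxlanId:      100,                      // 24-bit VNI (placeholder value)
		Group:        net.ParseIP("239.1.1.1"), // multicast group used for flood and learn
		Port:         4789,                     // IANA-assigned VxLAN UDP port
		VtepDevIndex: parent.Attrs().Index,     // parent interface carrying the tunnel
	}

	if err := netlink.LinkAdd(vxlan); err != nil {
		log.Fatalf("failed to create VxLAN interface: %v", err)
	}
}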

3.1.5 Load Balancing and VIP advertisement

The designs defined previously allow the network service endpoint to be reachable from the external world and, in particular, from routers or DC gateways. This external device (router/DC gateway) is not part of the scope of the thesis, but it still has to be represented for this design. Since the external traffic comes from and is directed to this device, the VIP should be advertised to it. An ECMP load balancer and a BGP listener run on this device to be aware of the availability of the service and of its route, and then to forward the traffic to the right NSM network service. Figure 3.5 introduces the new device, the router/DC gateway, which is a machine outside the Kubernetes cluster but still connected to the same yellow network (which can be a VxLAN, a VPN, etc.). For this design, it runs a BGP listener which should be coupled with an ECMP load balancer. Each NSE handling services and VIPs runs a BGP speaker in order to advertise the availability of the NSM network service to the router/DC gateway. The NSEs advertise the route to use to reach the instance of the service and the VIP they are handling. An NSE should advertise its availability only if there is at least one NSC connected to it, because the real termination point will be an NSC. One possible improvement would be the usage of the weight parameter in BGP so that the router/DC gateway prefers NSEs with more NSCs connected. Until now, the traffic was reaching the NSEs without being redirected to the NSCs. A load balancer such as IPVS running on the NSEs can be used to balance traffic among the different NSCs. ECMP supports session stickiness and IPVS has a connection tracker, so for each established connection, the packets of a client will always use the same route and reach the same NSC. Using the SDK of NSM, the load balancing and the VIP advertisement can be implemented as reusable components in order to deploy one or multiple services serving one or multiple applications. This design is compatible with any of the designs discussed previously.

Figure 3.5: NSM - Ingress - BGP and IPVS

3.2 Egress traffic alternatives

Following the same approach as the previous section, this section proposes several designs aiming at solving the egress traffic scenario by removing the NAT for IMS. This case is more complex than the ingress traffic since all network service clients should have the same VIP; port collisions might then happen and have to be handled. The design used as a base is the last one, described in Section 3.1.5, about load balancing and VIP advertisement.

3.2.1 Tunneling

A solution to remove the NAT in the egress traffic would be to have a tunnel for each client. For each new connection, the network service client sets up a tunnel between itself and the destination point, so it can send an inner packet that will be encapsulated and reach the receiver intact, thus allowing a direct IP connection. The outer packet can be filtered, NATted and modified, and it can use any other protocol; as long as it reaches the right destination, it will be decapsulated. The protocol used can be Generic Routing Encapsulation (GRE), IPsec, IP in IP, or a VPN with a custom network protocol. The drawback of this solution is the number of tunnels, since every NSC will create a new one for each connection. The destination point will also have to support on-demand connectivity, to be able to create a tunnel each time it wants to receive a new connection.

3.2.2 NSE delegation

The responsibility of initializing and handling the packets to expose to the external world can be delegated to the network service endpoint; the NSC will then not have to pay attention to the port it is using. To achieve this, the network service endpoint can be developed as part of the application. The NSE and NSC development will then be tightly coupled, since the NSE becomes the initiation point of the traffic that is incompatible with the NAT. A communication channel has to be set up between the NSE and the NSC, since the data has to be transferred to the application. This data can then be encapsulated into packets and sent to the final application running on the NSCs using protocols such as gRPC, HTTP or any other. This solution, using encapsulation with a layer 7 (application) protocol, does not require the new functionalities brought by Network Service Mesh. Traditional service meshes such as Istio or Linkerd already use these protocols for the communication between pods and services and can therefore be used to achieve this solution. In Figure 3.6, the NSC transmits the appropriate data so that NSE-A can initiate a communication with the external world. After the NSE has received the response from the external world, it can embed the packets into another protocol (e.g. gRPC) and send them to the right network service client. We can observe that the traffic is NATted, but this design should work even if the protocol is not compatible with the NAT, since the connection between the NSC and the NSE is a different connection from the one between the NSE and the external world.

Figure 3.6: NSM - Egress - NAT

3.2.3 Connection Tracker / Port Allocation

In this scenario, since there is no NAT, all NSCs have the same IP; the only identifier available to route the traffic is the source port number (the destination port when the traffic comes back). Tracking the connection is important in case an application needs to use a protocol requiring an established connection, such as TCP or SCTP. Each pod attached to the NSE will have a specific port range it can use to send egress traffic. This port range division can be decided by the NSE and communicated to the NSCs, but this mechanism requires new components in the NSE and the NSC to communicate with each other. Another solution would be to have independent decisions on the port allocation for each NSC. If the port range is given as a parameter of the NSC together with a unique index, it is then possible to calculate unique port ranges for every NSC without any communication between them or with the NSE.

The traffic flow of a packet originated from an NSC is shown in Figure 3.7. The NSC-A2 has the authorization to create packets with a source port between 40000 and 41000; if a packet is created outside this range, it will be rejected by the NSE. For the response, the NSE has a rule in its routing table to redirect all the packets with a destination port between 40000 and 41000 to NSC-A2.

Figure 3.7: NSM - Egress - No NAT

The drawback of this solution is the difficulty of distributing the ports and deciding on the range width. If the range is too small, some NSCs could run out of ports if they require a lot of connections to a single destination. For instance, if the port range is 50 ports wide, and an NSC has to connect to the same destination IP and port more than 50 times, the NSC will not have enough source ports available. If the range is too large, it could cause issues in the scalability of the NSCs: no new ports will be available, so newly created network service clients will not be able to receive their port range. To fix these issues, a reallocation of the ports would be possible, but it could take some time and would be really complex, since the port allocation of every network service client would have to be redone without breaking the already established connections to the external world.

3.2.4 Multiple NSEs

Adding multiple network service endpoints to one service would add more complexity to the solutions presented previously in Sections 3.2.1, 3.2.2 and 3.2.3. An NSM service using several network service endpoints handling the same VIP address requires more flexibility in the port allocation, since two network service clients are not allowed to create requests using the same source ports. The network service endpoints will then require a new port allocation mechanism that synchronizes the ports in use and the ports allocated. This is the same idea as described in Section 3.2.3, but this time focused on the network service endpoints. For the port synchronization on the network service client side, any of the previous design solutions (Sections 3.2.2 and 3.2.3) is still applicable.

3.2.5 Dynamic allocation by the application

A less generic solution to the problem raised in the previous section (3.2.4) would be to remove this responsibility from the network infrastructure and from the network service endpoints, and to let the applications decide on the port allocation themselves. The idea is that each application has its own implementation of the port allocation, which can be more permissive in terms of features and possibilities. A possible implementation of this solution would use microservices. A new ClusterIP service can be created in Kubernetes to serve the instances of the application handling the VIP and to take the decisions about the port allocation. When one of the running instances of an application needs to connect to the external world (i.e. to generate egress traffic) using the VIP without NAT, the first thing it has to do is to contact the service to get a port assigned, and then send the packets using this port. If an instance tries to send packets with a port it is not allowed to use, the packets must be dropped to avoid potential conflicts. The implementation of the application serving this service can be seen as a common database with a connection tracker, as sketched below. The application must monitor the networks of the instances to verify that each running instance uses only the ports it is allowed to use, but also to free ports when, for example, a TCP connection is closed (according to the FIN or RST flag) or a timeout has been exceeded.
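A minimal sketch of the port allocation logic such a service could implement is shown below: a simple in-memory lease table protected by a mutex, acting as the common database with a connection tracker described above. The type and function names are illustrative assumptions, not part of any existing implementation, and the monitoring of FIN/RST flags and timeouts is left out.

package main

import (
	"errors"
	"sync"
)

// PortAllocator hands out source ports from a fixed range and keeps track of
// which application instance currently holds each port.
type PortAllocator struct {
	mu    sync.Mutex
	start int
	end   int
	owner map[int]string // port -> instance (pod) name
}

func NewPortAllocator(start, end int) *PortAllocator {
	return &PortAllocator{start: start, end: end, owner: map[int]string{}}
}

// Allocate reserves the first free port for the given instance.
func (a *PortAllocator) Allocate(instance string) (int, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	for p := a.start; p <= a.end; p++ {
		if _, used := a.owner[p]; !used {
			a.owner[p] = instance
			return p, nil
		}
	}
	return 0, errors.New("no free port available")
}

// Release frees a port, e.g. when a TCP connection is closed (FIN/RST)
// or a timeout has expired.
func (a *PortAllocator) Release(port int) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.owner, port)
}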

3.3 Data plane / Control plane separation

As of today, Network Service Mesh has a data plane well separated from the control plane. The data plane is defined by the network service deployed using VPP or the kernel, and the control plane is decentralized with the three main components installed with Network Service Mesh: nsm-vpp-forwarder/nsm-kernel-forwarder, nsm-admission-webhook and NSMgr (see Section 2.6). However, the development of a network service endpoint using the API and the SDK of NSM can easily end up with a data plane running in the same place as a part of the control plane. Indeed, the SDK is specially made to develop the network service endpoint, a pod which decides when a network service client requires connectivity to the network service, but which can also run the network service itself, with firewalls, load balancers and other components of the data plane. For instance, the load balancer described in Section 3.1.5 has IPVS (the data plane) running in the NSE pod while the NSE is also the component configuring this IPVS. To avoid this situation, it is possible, with some work, to separate the network service endpoint into two different components. In Figure 3.8, the usage of MACVLAN/IPVLAN is only an illustrative example; VxLAN, a VPN or any other connectivity could be used as well. As shown by the running pod called nsm-kernel-forwarder, this design is configured to use the kernel forwarding plane. Instead of having the network service clients directly connected to the network service endpoints, the NSCs are connected to a new network namespace which runs the data plane of the network service. The NSEs become a pure control plane whose role is to set up the rules in the new network namespaces. This example uses a network namespace to run the data plane, so a user space data plane such as VPP would not work with it, but the network namespace could also be replaced by a new pod or a simple container.

Figure 3.8: NSM - Data plane / Control plane separation using namespaces

This architecture would require a lot of work: a new implementation of the network service endpoint and some changes in the way Network Service Mesh is implemented. The drawback of this possible implementation is the added complexity in the management and deployment of the network service. Each network service would then require at least two pods to work and, in some situations, two pods for each instance of the network service. The advantage of having this architecture in place would be a clean separation between the control plane and the data plane. If any of the components has to evolve, it can do so independently and without any impact on the other; the control plane could, for instance, evolve without any interruption of the network service.

Chapter 4

Implementation

This chapter describes in detail the implementation and the setup of the prototype solutions. The solutions chosen for implementation are the most promising ones according to the issues and requirements of IMS: the design described in Section 3.1.5, coupled with an external connectivity using VxLAN (Section 3.1.4) and a VPN (Section 3.1.2), for the ingress traffic, and the static port allocation designed in Section 3.2.3 for the egress traffic.

4.1 Environment

In order to be able to reproduce the implementation under the same conditions as the ones employed in this study, this section describes the environment in detail, with a description of the hardware and of the software used, together with their versions.

4.1.1 OpenStack

The tests and PoCs run on Kubernetes clusters which are hosted on an OpenStack cluster. The OpenStack cluster uses KVM on a server equipped with Intel Xeon E5-2650 processors. The bandwidth of the internal traffic, measured using Iperf, is 6.74 Gbits/sec. A Heat file (an OpenStack Orchestration template) has been created to deploy one master, two nodes and another VM which is used as BGP listener and traffic forwarder. A network is deployed using the network address 192.168.12.0/24, with 192.168.12.1 as the gateway. The allocation pool of the addresses starts at 192.168.12.20 and finishes at 192.168.12.99, and two DNS servers have been configured to communicate with the Internet. The VMs of the stack have the same security group attached. ICMP and IPv4 encapsulation are enabled, all ports are open for UDP, and all ports are open for TCP except 65535, which is open only within the network. The VMs are all created with 2 vCPUs, 4 GB of memory, 10 GB of storage and one network interface connected to 192.168.12.0/24. Ubuntu 18.04.4 LTS is the operating system used; its kernel version is 4.15.0-76-generic.

4.1.2 Kubernetes

To connect the Kubernetes worker nodes to the master, a daemon service runs on the master, providing the token required by the nodes in order to join the Kubernetes cluster. The daemon service listens on port 65535 (only open to the VMs connected to the same network as the master) and sends the token for every request it receives. It is then easy for the nodes to connect automatically to the Kubernetes cluster, knowing that the master IP is, by default, 192.168.12.21. The Kubernetes cluster is a vanilla Kubernetes cluster, version 1.17.1, with IPVS configured on Kube-proxy, and Calico installed and configured as the Container Network Interface (CNI). The pod subnet is 10.244.0.0/16 and the service subnet is 10.96.0.0/12. On top of Kubernetes, the package manager Helm has been installed together with Network Service Mesh version v0.2.0, provided by the Helm repository accessible at the address ”helm.nsm.dev”.
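As an illustration only, a daemon of this kind could be as small as the Go sketch below, assuming it simply wraps the kubeadm token create --print-join-command command and serves its output over plain HTTP on port 65535; the daemon actually used in this environment may be implemented differently.

package main

import (
	"log"
	"net/http"
	"os/exec"
)

func main() {
	// For every request, generate a join command on the master and return
	// it to the worker node asking to join the cluster.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		out, err := exec.Command("kubeadm", "token", "create", "--print-join-command").Output()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Write(out)
	})

	// Port 65535 is only reachable from VMs on the same OpenStack network.
	log.Fatal(http.ListenAndServe(":65535", nil))
}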

4.1.3 Development

The approach in the development of the experiments and the solutions was to keep things simple by using the most popular tools, in order to be effective and to find more documentation and help. Therefore, Docker has been chosen as the container runtime. The development language widely used in the cloud computing community is Go (e.g. Kubernetes and Docker are developed in Go); the applications have therefore been developed using this language. Together with Go, Makefiles have been created to compile and test the code and to build the Docker images. Since the applications are not deployed using a single specification file but at least two (one for the NSE and one for the NSC), Helm has been used to manage the specifications and the installation of the applications.

4.2 Network Service Endpoint

This section presents the implementation of the ”bricks” composing the network service endpoints that have been used in the prototype solutions. The ”bricks”, once assembled, can create the different network service endpoints described in the design chapter (Chapter 3).

4.2.1 Interface

Adding a new VxLAN, IPVLAN or MACVLAN interface to a container, as explained in Sections 3.1.3 and 3.1.4, requires access to the host network namespace. For this, a DaemonSet must be created to be able to act on every node and to enable the attachment of the interfaces regardless of where the pods are running. Multus could have been used to achieve this implementation, but for learning purposes and a better understanding of Linux networking, a home-made implementation has been preferred.

Creating a new interface in a container is equivalent to creating a new interface in a network namespace. The first step is to create a new interface in the host network namespace by communicating with the Linux kernel space using netlink. In Go, the open source library vishvananda/netlink is available and provides all the required netlink functionalities. It supports the VxLAN, IPVLAN and MACVLAN types, each requiring different parameters. LinkAttrs is a parameter common to these interfaces; it is composed of several sub-parameters such as the MTU and the name. VxLAN, in addition to the LinkAttrs, requires an Id (VxlanId), a port, a group and the index of the parent interface (VtepDevIndex). IPVLAN/MACVLAN, meanwhile, requires the mode (see Section 3.1.3). The second step is to move the interface to the network namespace of the pod. To manipulate the network namespace, another open source library called vishvananda/netns is available. This library supports retrieving a network namespace by PID, name, thread or path, because, depending on which container runtime is used, the path to the network namespace might be different (usually in /var/run/netns). For this implementation, the namespace has been retrieved by name, using the container ID of the network service endpoint. To find this container ID, the network service endpoint had to be looked up using the Kubernetes API. Thirdly, one or multiple IP addresses must be added to the interface using an IPAM. DHCP could have been used for this task, but for simplicity, the IP address has been assigned statically, which can still be done using netlink. As explained in the design chapter (see Chapter 3), the VIP must be handled by the NSE, so it can be added to the new interface or to the loopback interface. Finally, the interface can be set up using netlink, and the network service endpoint then becomes reachable from the machines connected to the same network (VxLAN or host network).
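The Go sketch below summarizes the second, third and last steps with the two libraries mentioned above: the interface is looked up in the host namespace, moved into the target network namespace, given a static address and brought up. The interface name, namespace name and address are placeholders, and error handling is kept minimal.

package main

import (
	"log"

	"github.com/vishvananda/netlink"
	"github.com/vishvananda/netns"
)

func main() {
	// 1. The interface (VxLAN, IPVLAN or MACVLAN) has already been created
	//    in the host network namespace, here under the placeholder name "vxlan100".
	link, err := netlink.LinkByName("vxlan100")
	if err != nil {
		log.Fatal(err)
	}

	// 2. Move it into the network namespace of the NSE pod. The namespace is
	//    retrieved by name (derived from the container ID in this work).
	ns, err := netns.GetFromName("nse-container-id")
	if err != nil {
		log.Fatal(err)
	}
	defer ns.Close()
	if err := netlink.LinkSetNsFd(link, int(ns)); err != nil {
		log.Fatal(err)
	}

	// 3. From inside the target namespace, assign the static IP (here the VIP)
	//    and bring the interface up.
	handle, err := netlink.NewHandleAt(ns)
	if err != nil {
		log.Fatal(err)
	}
	defer handle.Delete()
	moved, err := handle.LinkByName("vxlan100")
	if err != nil {
		log.Fatal(err)
	}
	vip, err := netlink.ParseAddr("11.12.13.14/32")
	if err != nil {
		log.Fatal(err)
	}
	if err := handle.AddrAdd(moved, vip); err != nil {
		log.Fatal(err)
	}
	if err := handle.LinkSetUp(moved); err != nil {
		log.Fatal(err)
	}
}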

4.2.2 VPN

To attach a secondary network to the network service endpoint, a solution is described in Section 3.1.2. This solution requires the development and the deployment of new components such as a pod with a running VPN server, a service serving the VPN, and a sidecar container to attach to the network service endpoint. The development of a new NSM composite using the SDK of NSM is not required. To achieve this task, the open source software OpenVPN has been used as well as k8s-ovpn [31], an implementation of OpenVPN on Kubernetes. This implementation provides tools to generate authentication keys, configuration files, Kubernetes Secret objects to store certificates and keys, and Kubernetes ConfigMap objects to allow access to the different files. A YAML specification file is also provided to deploy a Deployment containing a running OpenVPN server and a Service to access and connect to the VPN. Once OpenVPN is configured and deployed on the Kubernetes cluster, the configuration file for a VPN client can be generated and injected into a new container. Alpine was used to create this container, and the OpenVPN client has been installed on it. The container can then be added to the specification of the network service endpoint as a secondary container; it will connect to the VPN server and make the NSE accessible from the external world, provided that the external machines are also connected to the VPN.

4.2.3 Load Balancing

The NSE handling the VIP can be seen as a termination point, but, in reality, it has to load balance the traffic to the different NSCs. The IPVS load balancer has therefore been used together with the SDK of NSM to implement a new NSM composite in order to create a new NSE. In Linux Virtual Server terms, the NSE is a director, whose function is to distribute incoming requests among real servers (the NSCs). The first step with IPVS to deploy a service is to create a virtual service handling an IP and a port and using a scheduling method. Using the SDK of NSM, deploying a service is easy: a function to implement is triggered just after the creation of the network service endpoint. The different parameters describing the service, such as the VIP, the port, the protocol and the scheduling method, are saved as environment variables in the specification file of the NSE and can be retrieved in Go. The IPVS service can then be configured using these parameters. Go has a library to control IPVS, but, for this implementation, ipvsadm [69] commands have been executed from Go. The command in Listing 4.1 below creates a TCP service with 11.12.13.14 as the VIP, listening on port 5000 and using round robin as the scheduling method.

Listing 4.1: IPVS command to create a service
ipvsadm -A -t 11.12.13.14:5000 -s rr

After the creation of the NSE, the NSCs can start to be added. Another function is called each time an NSC needs connectivity to the NSE. This function, named ”Request”, provides as parameters several pieces of information about the pod and, especially, its IP, which can then be added to the service according to the configuration (VIP, port and protocol). Since the requirement is to avoid the NAT (and also the NAPT), the port used for the real server should be the same as the one used for the service. The direct routing mode must also be used to avoid the NAT. The command in Listing 4.2 below adds a real server (an NSC) to the service using direct routing (or gatewaying) as the packet forwarding method.

Listing 4.2: IPVS command to add a real server to a service
ipvsadm -a -t 11.12.13.14:5000 -r 172.40.1.5:5000 -g

Since the network service endpoint is the entry point from the external world, and the NSCs are not connected to the same network (VxLAN, VPN, etc.), the NSE has to act as a gateway for the response packets when using the direct routing mode. By default, the Linux kernel drops packets with a local source address (in this case, the VIP) from the forward path as ”source martians” [21]. It is possible to fix this issue with a kernel patch and by setting the kernel parameter ”/proc/sys/net/ipv4/conf/all/forward_shared” to 1, but this would not meet the IMS requirements (see Section 2.1.3), since kernel modifications are not allowed. Otherwise, since Linux kernel 3.6, it is possible to fix it by setting the kernel parameter ”accept_local” to 1; the NSE will then be used as a gateway [21].

If a network service client is removed for any reason, the function ”Close” will be called, and the real server then has to be removed from the service to prevent the NSE from forwarding traffic to a non-existing address. Like the ”Request” function, ”Close” provides the IP as a parameter, so it is easy to remove it from the service. The command in Listing 4.3 below removes a real server (an NSC) from the service.

Listing 4.3: IPVS command to remove a real server from a service
ipvsadm -d -t 11.12.13.14:5000 -r 172.40.1.5:5000
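The way these three ipvsadm calls can be wired into the NSE is sketched below in Go, shelling out to ipvsadm as done in this implementation. The NSM SDK plumbing is omitted and the function names and signatures are simplified assumptions for illustration purposes.

package main

import (
	"fmt"
	"os/exec"
)

const service = "11.12.13.14:5000" // VIP and port of the virtual service

func ipvsadm(args ...string) error {
	out, err := exec.Command("ipvsadm", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("ipvsadm %v: %v: %s", args, err, out)
	}
	return nil
}

// CreateService is called once, after the NSE has been created (Listing 4.1).
func CreateService() error {
	return ipvsadm("-A", "-t", service, "-s", "rr")
}

// OnRequest is meant to be called from the "Request" handler: the NSC becomes
// a real server of the virtual service, using direct routing (Listing 4.2).
func OnRequest(nscIP string) error {
	return ipvsadm("-a", "-t", service, "-r", nscIP+":5000", "-g")
}

// OnClose is meant to be called from the "Close" handler: the real server is
// removed from the service (Listing 4.3).
func OnClose(nscIP string) error {
	return ipvsadm("-d", "-t", service, "-r", nscIP+":5000")
}

func main() {
	_ = CreateService()
	_ = OnRequest("172.40.1.5")
	_ = OnClose("172.40.1.5")
}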

4.2.4 BGP

For this part, the setup and implementation have been done on the network service endpoint side, to host a BGP speaker, and on the router/DC gateway side, to host a BGP listener. GoBGP [85], one of the open source implementations of BGP, has been used on both sides. The implementation on the network service endpoint side has involved the use of the SDK of NSM. GoBGP has been installed in the NSE pod, and a configuration file is included containing the NSM IP of the pod, the address of the router, and the Autonomous System (AS) used. A new NSM composite has been created that starts the GoBGP speaker and advertises the VIP when at least one NSC is available, meaning the service is reachable through this NSE. To count the number of NSCs connected to the NSE, a variable can be included in the composite struct and be increased when the function ”Request” is called and decreased when ”Close” is called. Similarly, on the router/DC gateway side, GoBGP has been installed on the machine and configured to use dynamic neighbors thanks to a file containing the Autonomous System (AS). GoBGP can then run in the background as a daemon to receive the different routes to use in ECMP (not implemented in this document).
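A simplified Go sketch of this counting and advertisement logic is given below. The gobgp global rib add/del commands are standard GoBGP CLI commands, but the way they are wrapped here (a small struct with two handlers) is an illustrative assumption rather than the exact composite implemented in this work.

package main

import (
	"os/exec"
	"sync"
)

// vipAdvertiser advertises the VIP through the local GoBGP daemon as soon as
// at least one NSC is connected, and withdraws it when the last one leaves.
type vipAdvertiser struct {
	mu   sync.Mutex
	nscs int
	vip  string // e.g. "11.12.13.14/32"
}

// OnRequest is meant to be called from the NSM "Request" handler.
func (v *vipAdvertiser) OnRequest() error {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.nscs++
	if v.nscs == 1 {
		// First NSC connected: announce the route to the router/DC gateway.
		return exec.Command("gobgp", "global", "rib", "add", v.vip).Run()
	}
	return nil
}

// OnClose is meant to be called from the NSM "Close" handler.
func (v *vipAdvertiser) OnClose() error {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.nscs--
	if v.nscs == 0 {
		// Last NSC gone: withdraw the route so no traffic reaches this NSE.
		return exec.Command("gobgp", "global", "rib", "del", v.vip).Run()
	}
	return nil
}

func main() {
	adv := &vipAdvertiser{vip: "11.12.13.14/32"}
	_ = adv.OnRequest()
	_ = adv.OnClose()
}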

4.2.5 Port allocation

As detailed in Section 3.2, a port collision between two network service clients using the same NSM service would make it impossible for the response to reach the original source. The design chosen for this implementation is a static port allocation using only one NSE. The implementation is not complex for this solution, since there is no port synchronization between several NSEs. An idea, for instance, would be to find a unique index for each pod of the deployment and calculate the port range according to a length given in an environment variable. The solution adopted for this implementation is based on the host part of the IP address of the NSM interface, which is used as a unique index (since each NSC has a unique IP). This avoids the development of new communication components between the NSE and the NSCs, but prevents the usage of multiple VIPs. For example, the NSC with the IP 172.20.1.1 will get the index 0 and a port range allocated between 40000 and 40999; the one with the IP 172.20.1.5 will get the index 1 and a port range allocated between 41000 and 41999, etc. To allocate ports statically within Linux, the kernel parameter ”net.ipv4.ip_local_port_range” can be modified by writing the range (e.g. ”40000 41000”) using sysctl.
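The calculation and the sysctl write can be sketched in Go as follows. The sketch assumes, as in the example above, that NSM allocates a /30 per connection so that usable host addresses advance in steps of 4, and that only the last octet of the IP is needed; the helper name is illustrative.

package main

import (
	"fmt"
	"net"
	"os"
)

// portRange derives a unique, non-overlapping port range for an NSC from the
// host part of its NSM interface IP. With a /30 per connection, usable host
// addresses advance in steps of 4: .1 -> index 0, .5 -> index 1, and so on.
func portRange(nsmIP string, base, width int) (int, int, error) {
	ip := net.ParseIP(nsmIP).To4()
	if ip == nil {
		return 0, 0, fmt.Errorf("not an IPv4 address: %s", nsmIP)
	}
	index := (int(ip[3]) - 1) / 4 // only the last octet is considered here
	start := base + index*width
	return start, start + width - 1, nil
}

func main() {
	start, end, err := portRange("172.20.1.5", 40000, 1000)
	if err != nil {
		panic(err)
	}
	// Restrict the local (ephemeral) port range of this NSC, which is the
	// equivalent of "sysctl -w net.ipv4.ip_local_port_range".
	value := fmt.Sprintf("%d %d", start, end)
	if err := os.WriteFile("/proc/sys/net/ipv4/ip_local_port_range", []byte(value), 0644); err != nil {
		panic(err)
	}
}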

Response handler

One of the requirements of the architecture described in Section 3.1.5 about the ingress traffic and the IPVS load balancer was to have a local interface handling the virtual IP (VIP) in order to catch the requests and then load balance the traffic to the different network service clients. This local interface handling the virtual IP changes the behavior and the path of a packet. A response packet, instead of being forwarded to the initial source, will be destined to the local process (NF_IP_LOCAL_IN), and will either be forwarded to the application (layer 7) handling the port or be dropped if there is no route. To avoid this behavior, a solution is possible using only existing components within Linux, without reaching layer 7 (the application). Indeed, as described in Figure 4.1 below, Linux Virtual Server (LVS) with IPVS hooks into LOCAL_IN to get the packets and decides to forward them to an NSC according to the defined rules [74]. This approach prevents packets from being processed by the local application and allows their transfer into the POSTROUTING chain.

Figure 4.1: LVS - Packet Flow

As described in Section 2.4.1, the rules in IPVS can be of two different types: IP-based or connmark-based (firewall mark (fwmark)). It is then possible to forward the response traffic using firewall marks, netfilter (IPTables) and IPVS if the network service clients have pre-defined, statically allocated ports. The usage of firewall marks (fwmarks) is required since IPVS does not support the load balancing of a port range by default, but this is still possible using fwmarks. Since the packets still go through the netfilter PREROUTING chain before being directed to LOCAL_IN and IPVS, the packets can be marked according to a range of ports. If the network service endpoint handles multiple VIPs, the destination IP has to be taken into consideration in the command. Marking packets according to a port range can be achieved in a Linux shell with this IPTables command:

Listing 4.4: IPTables command to mark TCP packets according to a destination port range
iptables -A PREROUTING -t mangle -p tcp \
    --dport 40000:41000 -j MARK --set-mark 1

Once the packets are marked, IPVS has to be configured to forward them to the right destination. For this task, the allocated ports of each network service client are known, together with their firewall marks. One IPVS service in direct routing mode then has to be created per NSC, using its fwmark.
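To make the combination concrete, the Go sketch below shows the commands involved for a single NSC (fwmark 1, destination ports 40000 to 41000, NSC IP 172.40.1.5), executed from Go as elsewhere in this implementation. The values are the illustrative ones used in this chapter, not a prescribed configuration.

package main

import (
	"log"
	"os/exec"
)

func run(name string, args ...string) {
	if out, err := exec.Command(name, args...).CombinedOutput(); err != nil {
		log.Fatalf("%s %v: %v: %s", name, args, err, out)
	}
}

func main() {
	// Mark response packets destined to the NSC's port range (Listing 4.4).
	run("iptables", "-A", "PREROUTING", "-t", "mangle", "-p", "tcp",
		"--dport", "40000:41000", "-j", "MARK", "--set-mark", "1")

	// Create one fwmark-based IPVS service for this NSC and add the NSC as
	// its only real server, in direct routing mode, so that marked packets
	// are forwarded back to that NSC instead of being delivered locally.
	run("ipvsadm", "-A", "-f", "1", "-s", "rr")
	run("ipvsadm", "-a", "-f", "1", "-r", "172.40.1.5", "-g")
}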

4.3 Network Function Chaining

Network Service Mesh provides on-demand connectivity with reusable components (network service endpoints) and uses the specifications of Kubernetes objects. The description of this implementation gives the information needed to set up connectivity between two pods and explains how to extend it to more complex use cases. A network service endpoint has first to be developed using the SDK of NSM to handle the network service. The simplest possible NSE can be built using two NSM composites called ”NewConnectionEndpoint” and ”NewIpamEndpoint”, which enable the connection between the pods (according to the NSM configuration) and assign IP addresses to the pods. Once an NSE has been developed, the specifications can be written. Three specifications are needed to deploy a minimal and working NSM network service. The first one, see Listing 4.5, is used to create a ”NetworkService”, which is a new Kubernetes object installed with Network Service Mesh. A ”NetworkService” establishes a relationship between one or multiple network service clients and network service endpoints by matching values contained in the specification files (spec.matches.sourceSelector.app, metadata.name and spec.matches.route.destinationSelector.app).

Listing 4.5: Specification of an NSM Network service
apiVersion: networkservicemesh.io/v1alpha1
kind: NetworkService
metadata:
  name: network
spec:
  payload: IP
  matches:
    - match:
      sourceSelector:
        app: link
      route:
        - destination:
          destinationSelector:
            app: nse

The second one, see Listing 4.6, is the specification of the network service endpoint. The container running the image of the NSE has to receive environment variables which will be handled by the different composites and by the NSE configuration [81]. ”ADVERTISE_NSE_NAME” requires the name of the ”NetworkService” (metadata.name in Listing 4.5), ”ADVERTISE_NSE_LABELS” requires the destination selector (spec.matches.route.destinationSelector.app in Listing 4.5) and ”IP_ADDRESS” requires the CIDR used to create the network. The current NSM version is v0.2.0; the names of the environment variables might change in future versions.

Listing 4.6: Specification of a Network Service Endpoint deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nse
spec:
  selector:
    matchLabels:
      app: "nse"
  replicas: 1
  template:
    metadata:
      labels:
        app: "nse"
    spec:
      serviceAccount: nse-acc
      containers:
        - name: nse
          image: simple-nse:latest
          env:
            - name: ADVERTISE_NSE_NAME
              value: "network"
            - name: ADVERTISE_NSE_LABELS
              value: "app=nse"
            - name: IP_ADDRESS
              value: "172.10.1.0/24"
          resources:
            limits:
              networkservicemesh.io/socket: 1

The last specification, see Listing 4.7, is the application one. The application can be anything: an HTTP server, a database or even another NSE. Within IMS, the application can be, for instance, the Multimedia Telephony Application Server (MTAS). To connect it to an NSM network service, this specification file requires only one modification, a new annotation (metadata.annotations.ns.networkservicemesh.io) whose value is the name of the ”NetworkService” (metadata.name in Listing 4.5) and a source selector (spec.matches.sourceSelector.app in Listing 4.5).

Listing 4.7: Specification of a Network Service Client deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "nsc"
  annotations:
    ns.networkservicemesh.io: network?app=link
spec:
  selector:
    matchLabels:
      app: "nsc"
  replicas: 2
  template:
    metadata:
      labels:
        app: "nsc"
    spec:
      serviceAccount: nsc-acc
      containers:
        - name: nsc
          image: alpine:latest
          command: ["tail", "-f", "/dev/null"]

The result of the installation of these three specification files will be three new running pods: two network service clients and one network service endpoint. Each network service client will receive a new network interface connected to the network service endpoint thanks to a new network (172.10.1.0/24). The network service endpoint will receive two new interfaces, one for each network service client. To extend this connectivity to more complex use cases, one client can connect to multiple network service endpoints by using a comma in the ”ns.networkservicemesh.io” annotation (e.g. ”network?app=nse,network?app=nse-b”). Multiple different clients can also be connected to the same network service endpoint if the clients use the same network and link in their ”ns.networkservicemesh.io” annotation.

4.4 Network Service Client

For all implementations of the designs presented in Chapter 3 and Section 4.2, the same network service client has been used. This NSC uses the unmodified version of the sidecar init container (networkservicemesh/nsm-init) provided by Network Service Mesh; this part therefore does not involve the SDK of NSM. Nevertheless, other components have to be set up: the VIP handler, the port limitation and a TCP testing server. Handling a VIP, as for the NSE, requires a network interface with the VIP attached to it. Moreover, since the NSC is the one responding to the requests and generating the egress traffic, the outgoing traffic should have the VIP as its source IP. Every outgoing packet leaving the network interface NSM0 then has to be source NATted with the VIP. The command in Listing 4.8 below achieves this task using netfilter on Linux.

Listing 4.8: IPTables command to source NAT outgoing traffic
iptables -t nat -A POSTROUTING -o NSM0 -j \
    SNAT --to 11.12.13.14

The port limitation implementation, as explained in Section 4.2.5, uses the NSM network interface to calculate the port range the pod is allowed to use, according to environment variables giving the length of the port range. Once the port range is calculated (e.g. ”40000 41000”), it can be registered in the kernel parameter ”net.ipv4.ip_local_port_range” using sysctl. As a testing server, Netcat has been used. Netcat is a networking utility widely used for debugging and testing, providing the capability to read from and write to network connections using TCP or UDP. It can therefore be used to create a simple TCP server on a specific port, returning the hostname of the pod for every incoming session. Similarly, Netcat has been used as a client to test the connection from the router to the NSCs and to test the egress traffic from an NSC. This tool is really useful to test long TCP sessions such as FTP and short TCP sessions such as HTTP. All of these components can be embedded in one container that can be attached as a sidecar to a real application without impacting it.
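For completeness, the behavior of the Netcat test server described above (answering every incoming TCP session on a given port with the hostname of the pod) can be expressed in a few lines of Go, as sketched below; this is only an illustrative equivalent, not the tool used in the tests.

package main

import (
	"log"
	"net"
	"os"
)

func main() {
	hostname, err := os.Hostname()
	if err != nil {
		log.Fatal(err)
	}

	// Listen on the test port; 5000 is the port used in the IPVS examples.
	ln, err := net.Listen("tcp", ":5000")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		// Return the pod hostname for every incoming session, then close.
		conn.Write([]byte(hostname + "\n"))
		conn.Close()
	}
}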

Chapter 5

Evaluation

Based on the implementation of the designs presented in the previous chapters, this chapter presents the evaluation and the results in three different parts: the performance, the security and the scalability. Together with the evaluation and the results, the methodology employed is described in detail.

5.1 Benchmarking methodology

To measure the performance, the system has been divided into two different parts. The first one is the external connectivity, with a comparison between the VxLAN and the VPN designed in Section 3.1 and implemented in Section 4.2. The second one is a comparison between the existing data planes available within Network Service Mesh, VPP and Kernel, and their different methods to add connectivity between pods depending on the host they are running on: local connectivity and remote connectivity. The separation between these two parts has been made since they are not tightly coupled. For instance, the external connectivity can be replaced without having any impact on the Network Service Mesh connectivity. This study can then be extended with new external connectivity options (e.g. IPVLAN) and new NSM connectivity options (e.g. OVS) based on the information given in this document. To evaluate the performance of network interconnect devices, the benchmarking methodology described in RFC 2544 [93] exists. Several specific test methods for parameters such as throughput, latency or frame loss rate, defined in RFC 1242 [28], are provided together with recommendations for the result format. In this document, unidirectional traffic using UDP is sent and measured in frames per second (fps) for the throughput, microseconds (us) for the latency, and percent (%) for the frame loss rate. These three tests have been performed at seven different frame sizes: 64, 128, 256, 512, 1024, 1280 and 1518 bytes. The throughput is the fastest rate at which the destination machine receives the same number of test frames as have been sent [28]. Frames were sent at a specific rate through the data plane under test (Kernel, VPP, etc.), and, if the number of frames received was lower than the number of frames sent, the test was rerun with a reduced frame rate [93][67]. According to RFC 1242 [28], two different definitions of latency exist: ”for store and forward devices” and ”for bit forwarding devices”. The one used in this document is ”for store and forward devices”, which is ”the time interval starting when the last bit of the input frame reaches the input port and ending when the first bit of the output frame is seen on the output port” [28]. To perform the latency tests, the throughput has been calculated beforehand, in order to send frames at the determined throughput rate to the destination. The time at which a frame was transmitted has been recorded, as well as the time at which the frame was received, and the difference has been calculated. For each frame size, 20 rounds have been performed and the average has been calculated to get the test results [93]. The frame loss rate is the percentage of frames which, due to lack of resources, have not been forwarded and which were therefore lost [28]. This test has been made under the constant load defined in Table 5.1 below. For each frame size, 20 rounds have been performed and the average has been calculated to get the test results [67].

Size (bytes)    Ethernet (pps)
64              14880
128             8445
256             4528
512             2349
1024            1197
1280            961
1518            812

Table 5.1: Maximum frame rates reference
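The throughput search described above can be summarized by the loop in the Go sketch below. The sendAtRate function is a hypothetical stand-in for the actual traffic generator, and the 5% rate reduction step is an arbitrary choice for the sketch; RFC 2544 does not mandate a particular step.

package main

import "fmt"

// sendAtRate is a hypothetical stand-in for the traffic generator: it sends
// `sent` frames of `size` bytes at `rate` frames per second through the data
// plane under test and returns how many frames were received.
func sendAtRate(rate int, size int, sent int) (received int) {
	// ... real measurement omitted ...
	return sent
}

// throughput lowers the offered rate until every transmitted frame is
// received, as prescribed by RFC 2544.
func throughput(size, maxRate int) int {
	const frames = 10000
	rate := maxRate
	for rate > 0 {
		if sendAtRate(rate, size, frames) == frames {
			return rate // fastest rate with no loss
		}
		rate = rate * 95 / 100 // reduce the frame rate and rerun the test
	}
	return 0
}

func main() {
	// Maximum rate for 64-byte frames taken from Table 5.1.
	fmt.Println(throughput(64, 14880))
}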

The evaluation of the security and of the scalability is not based on numbers. These parts of the evaluation focus more on the current state of Network Service Mesh and on the current state of the protocols. A few tests have been run in order to simulate a large demand on the service and then to observe the behavior and the potential limits of Network Service Mesh.

5.2 Data plane performance

This section presents the results of the RFC 2544 benchmarks. The data plane performance has been split into two different parts in order to facilitate a potential evolution of this work and to have a clear separation between the external connectivity and Network Service Mesh itself.

5.2.1 External Connectivity

Figure 5.1 shows the results of the throughput, latency and frame loss rate benchmarks for the two solutions designed in Section 3.1: VxLAN and VPN. As anticipated and discussed in Section 3.1, the performance of VxLAN is better than that of the VPN in every aspect. This difference in performance is probably due to three reasons. The first one is the layer of the OSI model they operate on: VxLAN operates at layer 4 to encapsulate Ethernet frames into UDP packets, while the VPN used (OpenVPN) operates at layer 7 with a custom security protocol. A VPN running on a lower layer using, for instance, IPsec or IP in IP might improve the performance. The second reason is that, to work, the VPN needs a service, which might have an impact on the connectivity. Indeed, depending on where the service is running, its quality could differ and, consequently, decrease the throughput and increase the latency and the frame loss rate. Finally, the security mechanisms the VPN solution uses might also increase the latency: the machine at the source has to encrypt all the traffic, and the destination machine has to decrypt it.

[Figure 5.1 plots the frame loss rate (%), throughput (fps) and latency (us) against the frame size (64 to 1518 bytes) for the VxLAN and VPN solutions.]

Figure 5.1: External Connectivity performances

These tests can be extended using more types of connectivity attached to the network service endpoint, such as IPVLAN, VLAN or any other. OpenVPN uses a custom security protocol, but other security protocols such as IPsec or IP in IP could also be considered.

5.2.2 Network Service Mesh Connectivity

Figure 5.2 shows the results of the throughput, latency and frame loss rate benchmarks for the two data planes (Kernel and VPP) that are currently available in Network Service Mesh. In addition to the two data planes, the results for their two strategies, local and remote, are also presented in Figure 5.2, since the connectivity between a network service endpoint and a network service client varies depending on whether the two pods are running on the same host or not. As can be observed with the two data planes Network Service Mesh offers, a trade-off exists between the usage of Kernel and VPP. VPP had better results on the throughput and frame loss rate aspects for both the local and remote strategies, while Kernel has a lower latency. VPP shows a rather high latency: even in local, its latency is higher than the kernel's in remote.

[Figure 5.2 plots the frame loss rate (%), throughput (fps) and latency (us) against the frame size (64 to 1518 bytes) for the Kernel and VPP data planes, each with local and remote connectivity.]

Figure 5.2: Network Service Mesh Connectivity performances

In the future, these tests can be extended with more data planes, types of interfaces and network acceleration frameworks such as OVS, DPDK or SR-IOV. For some needs, for instance an East-West or North-South traffic optimization, some of these solutions might improve the performance in every aspect presented above (throughput, latency and frame loss rate) compared with the two currently available solutions (VPP and Kernel).

5.3 Security

From a security perspective, by definition, a VPN has more capabilities than VxLAN. A VPN allows connections to be secured and encrypted using cryptographic methods that, depending on the implementation, can operate at layer 3, 4 or 7. These cryptographic methods provide certificates based on public/private key pairs, encrypt/decrypt and sign/verify operations in order to ensure the trustworthiness of the connections. For this VPN service implementation, an authentication infrastructure is needed to obtain and distribute credentials towards the different endpoints. The informational document published by the IETF as RFC 7348 about VxLAN [79] presents security recommendations to prevent several types of attacks, secure the tunneled traffic and ensure that VxLAN endpoints are authorized on the LAN. As highlighted during the RIPE 77 meeting in 2018 [65], these attacks can be injections of ARP traffic, SYN traffic or UDP packets.

5.4 Scalability

All the different solutions designed in Section 3.2 have scalability issues, including the one implemented in Section 4.2.5. Since the number of available ports is not infinite, the number of connections is limited. The Internet Assigned Numbers Authority (IANA) suggests the range 49152 to 65535 for the egress traffic [71], and many Linux kernels use the port range 32768 to 60999 (which can be retrieved from the system variable /proc/sys/net/ipv4/ip_local_port_range). Considering the solution designed and implemented in Sections 3.2.3 and 4.2.5, with a global port range set, for instance, between 35000 and 60000, the range size is 25000. Considering that each NSC receives an allocation of 100 ports, the maximum number of pods that can be created is then 250. Regarding the scalability of Network Service Mesh itself, on the currently released version, v0.2.0 - Borealis, some issues exist and have been observed during the scale-out (horizontal scaling) tests that were run. All network service clients are connected to a network service endpoint through different interfaces. In the past, in Linux kernel version 2.2 and before, the number of interfaces was limited to 255. However, the implementation of network interfaces has since been changed to a linked list data structure in newer kernel versions in order to remove this limitation [51]. First and foremost, between a network service endpoint and a network service client, in IPv4, NSM uses a /30 prefix (4 addresses) to create the connection, so, if the network service has, for instance, a /24 prefix (256 addresses), only 64 network service clients can be connected to the network service endpoint. If there is no policy to limit the number of pods, and the number of pods exceeds the number of available addresses, the default init container of some NSCs will be stuck, and these NSCs will crash continuously. For the pods which are not able to attach the interfaces, NSM will still try to create those interfaces, and they will remain on the host, even after the deletion of the network service. The second observation has been made based on a large-scale test with the CIDR of the network service set to /8. After reaching 80 network service clients connected to one unique network service endpoint, the connection was still working properly, but beyond that, around 100, Network Service Mesh started crashing and creating an undefined number of interfaces (more than 100) on the network service endpoint which were not connected to anything.
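The two limits discussed in this section can be restated numerically; the small Go snippet below only repeats the arithmetic above (a 25000-port range with 100 ports per NSC, and a /24 network service split into /30 connections).

package main

import "fmt"

func main() {
	// Port-based limit: 25000 available ports, 100 ports allocated per NSC.
	portsAvailable := 60000 - 35000
	portsPerNSC := 100
	fmt.Println("max NSCs by ports:", portsAvailable/portsPerNSC) // 250

	// Address-based limit: a /24 network service split into /30 connections.
	serviceCIDR, connectionCIDR := 24, 30
	fmt.Println("max NSCs by addresses:", 1<<(connectionCIDR-serviceCIDR)) // 64
}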

Chapter 6

Conclusions and Future work

This work presented solutions to some issues IP multimedia subsystem faces when running in a cloud native environment. No NAT ingress and egress scenarios, traffic separation are some of the multiple topics that have been covered using several different alternatives and technologies. Network Service Mesh, the main project used during this thesis, demonstrated its capability to offer generic and standard solutions to Kubernetes. Still in the early stage of its development, Network Service Mesh is a really promising project solving complicated layer 2 and layer 3 use cases never yet addressed by Kubernetes nor any existing service mesh. Its ambitions and the problems it solves might make it a standard for telecommunication companies, Internet service providers and advanced enterprise networks which want to adopt cloud native environments by implementing cloud-native Network Functions (CNFs) and Service Function Chainings (SFCs) in an easy way. Some of the widely used technologies in telecommunication companies such as network acceleration with OVS, DPDK or SR- IOV are not yet available in Network Service Mesh, but might be release in the future, or be implemented and integrated in the project by external contributors since Network Service Mesh is an open source project. Before a final usage of Network Service Mesh, components have to be developed using the SDK provided as done during this thesis with a load balancer and a BGP speaker, but also for the external connectivity. A good compromise with Network Service Mesh would be to couple it with Multus in order to attach an external connectivity allowing to completely bypass the primary network of Kubernetes. In the future, as defined in the Chapter5, this work can be expanded with new data planes benchmarks for external connectivity (e.g. IPVLAN, MACVLAN, ect.), but also for Network Service Mesh if there is any new integrated to this project (e.g. SR-IOV, OVS, etc.). Newly integrated mechanisms for local and remote connectivity can also be included in the expansion of this work. The RFC 2544, used to perform performance benchmarks are not only defining the steps to measure throughput, latency and frame loss rate, Back-to-back frames benchmark defined in the RFC 2544 and can also be considered to evaluate the performance of Network Service Mesh and of the different the solutions presented

As of now, only performance has been taken into consideration in the evaluation of Network Service Mesh; future work could include the resource consumption (e.g. CPU, memory) of Network Service Mesh, its impact on the system, and the time an application takes to be fully deployed. Further research can also be conducted on a potential overlap between Network Service Mesh and the ongoing discussion within the Network Plumbing Working Group (NPWG) about a new service abstraction that could replace the existing one in Kubernetes in the future.

Bibliography

[1] Dpdk. https://www.dpdk.org/. [Accessed 5 June 2020]. [2] Linux - the linux foundation. https://www.linuxfoundation.org/ projects/linux/. [Accessed 5 June 2020]. [3] Open containers initiative. https://www.opencontainers.org/. [Accessed 5 June 2020]. [4] Open source projects - the linux foundation. https: //www.linuxfoundation.org/projects/. [Accessed 5 June 2020]. [5] Virtual extensible local area networking documentation. https://www.kernel.org/doc/ Documentation/networking/vxlan.txt. [Accessed 5 June 2020]. [6] What is kubernetes? https://www.redhat.com/en/topics/ containers/what-is-kubernetes. [Accessed 5 June 2020]. [7] What is linux? https://www.linux.com/what-is-linux/. [Accessed 5 June 2020]. [8] What is network service mesh? https://networkservicemesh.io/ docs/concepts/what-is-nsm/. [Accessed 5 June 2020]. [9] The 5 principles of standard containers. https://github.com/opencontainers/ runtime-spec/blob/master/principles.md, dec 2016. [Accessed 5 June 2020]. [10] Namespaces - overview of linux namespaces. namespaces man page, aug 2019. [Accessed 5 June 2020]. [11] Container network interface specification. https://github.com/containernetworking/cni/ blob/master/SPEC.md, feb 2020. [Accessed 5 June 2020]. [12] Kubernetes - concepts - service. https://kubernetes.io/docs/ concepts/services-networking/service/, mar 2020. [Accessed 5 June 2020].

[13] Service name and transport protocol port number registry - 4789. https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?search=4789, apr 2020. [Accessed 5 June 2020]. [14] 3rd Generation Partnership Project (3GPP). Specification : 23.002, version 15.0.0. Specification, mar 2018. [15] 3rd Generation Partnership Project (3GPP). 5g for the connected world. https://www.3gpp.org/news-events/2088-5g-for-the-connected-world, nov 2019. [Accessed 5 June 2020].

[16] 3rd Generation Partnership Project (3GPP). Specification : 22.228, version 17.0.0. Specification, dec 2019. [17] 3rd Generation Partnership Project (3GPP). Specification : 23.228, version 16.4.0. Specification, mar 2020.

[18] J. Uttaro N. Bitar W. Henderickx A. Isaac A. Sajassi, R. Aggarwal. Requirements for ethernet vpn (evpn). RFC 7209, Internet Engineering Task Force, may 2014. [19] N. Bitar R. Shekhar J. Uttaro W. Henderickx A. Sajassi, J. Drake. A network virtualization overlay solution using ethernet vpn (evpn). RFC 8365, Internet Engineering Task Force, mar 2018. [20] Daniel Walton Aeneas Dodd-Noble. Cisco ultra packet core - high performance and features. PDF, oct 2018. [21] Julian Anastasov. Software, patches and docs - ssi. http://ja.ssi.bg/ #lvsgw, apr 2020. [Accessed 5 June 2020].

[22] J. Anderson, H. Hu, U. Agarwal, C. Lowery, H. Li, and A. Apon. Performance considerations of network functions virtualization using containers. In 2016 International Conference on Computing, Networking and Communications (ICNC), pages 1–7, 2016.

[23] M. Aurel Constantinescu, V. Croitoru, and D. Oana Cernaianu. Nat/firewall traversal for sip: issues and solutions. In International Symposium on Signals, Circuits and Systems, 2005. ISSCS 2005., volume 2, pages 521–524 Vol.2, 2005. [24] Mahesh Bandewar. Ipvlan driver howto - linux documentation. https://www.kernel.org/doc/ Documentation/networking/ipvlan.txt. [Accessed 5 June 2020]. [25] D. Barach, L. Linguaglossa, D. Marion, P. Pfister, S. Pontarelli, and D. Rossi. High-speed software data plane via vectorized packet processing. IEEE Communications Magazine, 56(12):97–103, 2018.

[26] Christian Benvenuti. Understanding Linux Network Internals. O'Reilly Media, 1 edition, dec 2005. [27] Arnd Bergmann. macvlan: implement bridge, vepa and private mode, 618e1b7482f7a8a4c6c6e8ccbe140e4c331df4e9. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=618e1b7482f7a8a4c6c6e8ccbe140e4c331df4e9, nov 2009. [Accessed 5 June 2020]. [28] S. Bradner. Benchmarking terminology for network interconnection devices. RFC 1242, Internet Engineering Task Force, jul 1991.

[29] David Jansen Jason Gmitter Jeff Ostermiller Jose Moreno Kenny Lei Lilian Quan Lukas Krattiger Max Ardica Rahul Parameswaran Rob Tappenden Satish Kondalam Brenden Buresh, Dan Eline. A modern, open and scalable fabric: Vxlan evpn. PDF, apr 2016. [30] Martín Casado, Teemu Koponen, Rajiv Ramanathan, and Scott Shenker. Virtualizing the network forwarding plane. page 8, 01 2010.

[31] chepurko. chepurko/k8s-ovpn: Openvpn on a kubernetes cluster. roll your own secure vpn cluster! https://github.com/chepurko/k8s-ovpn, may 2019. [Accessed 5 June 2020]. [32] Cisco. Mpls: Layer 3 vpns configuration guide, cisco ios xe release 3s (cisco asr 900 series). https://www.cisco.com/c/en/us/td/docs/ ios-xml/ios/mp_l3_vpns/configuration/xe-3s/asr903/ 16-9-1/b-mpls-l3-vpns-xe-16-9-asr900.pdf, jul 2016. [Accessed 5 June 2020]. [33] J. Claassen, R. Koning, and P. Grosso. Linux containers networking: Performance and scalability of kernel modules. In NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium, pages 713–717, 2016. [34] Common NFVI Telco Task Force (CNTT). Common nfvi telco task force (cntt) - glossary. https://github.com/cntt-n/CNTT/blob/master/ doc/tech/glossary.md, mar 2020. [Accessed 5 June 2020]. [35] L. Csikor, M. Szalay, B. Sonkoly, and L. Toka. Nfpa: Network function performance analyzer. In 2015 IEEE Conference on Network Function Virtualization and Software Defined Network (NFV-SDN), pages 15–17, 2015. [36] Network Plumbing Working Group Dan Williams, Doug Smith. Kubernetes network custom resource definition de-facto standard. https://github. com/k8snetworkplumbingwg/multi-net-spec/blob/master/ v1.1/%5Bv1.1%D%20Kubernetes%20Network%20Custom% 20Resource%20Definition%20De-facto%20Standard.pdf, Dec 2019. [Accessed 5 June 2020].

[37] Daniel E. Eisenbud, Cheng Yi, Carlo Contavalli, Cody Smith, Roman Kononov, Eric Mann-Hielscher, Ardas Cilingiroglu, Bin Cheyney, Wentao Shang, and Jinnah Dylan Hosein. Maglev: A fast and reliable software network load balancer. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16), pages 523–535, Santa Clara, CA, 2016.

[38] Ericsson. Cloud native is happening. https://www.ericsson.com/en/ digital-services/trending/cloud-native. [Accessed 5 June 2020]. [39] Ericsson. Ericsson mobility report. https://wcm.ericsson.net/ 4acd7e/assets/local/mobility-report/ documents/2019/emr-november-2019.pdf/, nov 2019. [Accessed 5 June 2020]. [40] fd.io. Vpp/what is vpp? https://wiki.fd.io/view/VPP/What_is_ VPP%3F, may 2017. [Accessed 5 June 2020].

[41] Andrew Feinberg Forbes. Virtualization: The future of telecom. https: //www.netcracker.com/assets/uploads/ Insights/virtualization-the-future-of-telecom.pdf, feb 2015. [Accessed 5 June 2020]. [42] The Linux Foundation. Open vswitch. https://www.openvswitch. org/. [Accessed 5 June 2020].

[43] Jason Gerend. Containers vs. virtual machines. https://docs.microsoft. com/en-us/virtualization/windowscontainers/about/ containers-vs-vm, oct 2019. [Accessed 5 June 2020].

[44] Miguel-Angel Garcia-Martin Gonzalo Camarillo. The 3G IP Multimedia Subsystem (IMS): Merging the Internet and the Cellular Worlds. Wiley, 3 edition, oct 2008. [45] 5G-PPP Software Network Working Group. From webscale to telco, the cloud native journey. PDF, jul 2018.

[46] 5G-PPP Software Network Working Group. Cloud-native and verticals' services, 5g-ppp projects analysis. PDF, aug 2019. [47] Network Plumbing Working Group. Npwg service abstraction discussion board. https://docs.google.com/document/d/1tYs_O7Dz-YQenwPz6QHwm4ZoQ3bu0-1m-2c_7dno4N8/edit#, mar 2020. [Accessed 5 June 2020]. [48] T. Anderson H. Khosravi. Requirements for separation of ip control and forwarding. RFC 3654, Internet Engineering Task Force, nov 2003.

[49] J. Halpern and C. Pignataro. Service function chaining (sfc) architecture. RFC 7665, Internet Engineering Task Force, oct 2015. [50] H. Handoko, S. M. Isa, S. Si, and M. Kom. High availability analysis with database cluster, load balancer and virtual router redundancy protocol. In 2018 3rd International Conference on Computer and Communication Systems (ICCCS), pages 482–486, 2018. [51] Red Hat. What is the maximum number of interface aliases supported in red hat enterprise linux? https://access.redhat.com/solutions/40500, may 2015. [Accessed 5 June 2020]. [52] Red Hat. Red hat enterprise linux 8 - system design guide - designing a rhel 8 system. PDF, feb 2020. [53] Thomas F Herbert. A comparison of fd.io and ovs/dpdk. https://www.dpdk.org/wp-content/uploads/sites/35/2016/08/Day02-Session04-ThomasHerbert-DPDKUSASummit2016.pdf, aug 2016. [Accessed 5 June 2020]. [54] C. Hopps. Analysis of an equal-cost multi-path algorithm. RFC 2992, Internet Engineering Task Force, nov 2000. [55] Huawei. huawei-cloudnative/cni-genie. https://github.com/huawei-cloudnative/CNI-Genie, feb 2019. [Accessed 5 June 2020]. [56] IEEE. 802.1q-2014 - bridges and bridged networks. IEEE, dec 2014. [57] Intel. intel/multus-cni. https://github.com/intel/multus-cni, may 2020. [Accessed 5 June 2020]. [58] G. Camarillo A. Johnston J. Peterson R. Sparks M. Handley E. Schooler J. Rosenberg, H. Schulzrinne. Sip: Session initiation protocol. RFC 3261, Internet Engineering Task Force, jun 2002. [59] A. Kleen A. Kuznetsov J. Salim, H. Khosravi. Linux netlink as an ip services protocol. RFC 3549, Internet Engineering Task Force, jul 2003. [60] E.Y. Jeong, S. Woo, M. Jamshed, H. Jeong, S. Ihm, D. Han, and KyoungSoo Park. Mtcp: A highly scalable user-level tcp stack for multicore systems. Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, NSDI'14, pages 489–502, 01 2014. [61] Tahlia Richardson Dayle Parker Laura Bailey Scott Radvan Jiri Herrmann, Yehuda Zimmerman. Red hat enterprise linux 6 - virtualization host configuration and guest installation guide. PDF, oct 2017. [62] J. Hadi Salim D. Meyer O. Koufopavlou K. Pentikousis, S. Denazis. Software-defined networking (sdn): Layers and architecture terminology. RFC 7426, Internet Engineering Task Force, jan 2015.

[63] M. Kang, J. Shin, and J. Kim. Protected coordination of service mesh for container-based 3-tier service traffic. In 2019 International Conference on Information Networking (ICOIN), pages 427–429, 2019. [64] Kimoon Bae and Kyuseob Cho. The problems and their solutions when using sip in nat environment. In 5th International Conference on Computer Sciences and Convergence Information Technology, pages 997–1000, 2010. [65] Henrik Lund Kramshoj. Vxlan security or injection. https://ripe77.ripe.net/presentations/32-vxlan-ripe77.pdf, oct 2018. [Accessed 5 June 2020].

[66] W. Li, Y. Lemieux, J. Gao, Z. Zhao, and Y. Han. Service mesh: Challenges, state of the art, and future research opportunities. In 2019 IEEE International Conference on Service-Oriented System Engineering (SOSE), pages 122–1225, 2019. [67] F. Lifu, Y. Dongming, T. Bihua, L. Yuanan, and H. Hefei. Technique for network performance measurement based on rfc 2544. In 2012 Fourth International Conference on Computational Intelligence and Communication Networks, pages 200–204, 2012. [68] linuxvirtualserver. Ipvs software - advanced layer-4 switching. http:// www.linuxvirtualserver.org/software/ipvs.html. [Accessed 5 June 2020].

[69] linuxvirtualserver. Ipvsadm - man page. http://kb.linuxvirtualserver.org/wiki/Ipvsadm, apr 2007. [Accessed 5 June 2020]. [70] linuxvirtualserver. Ipvs wiki. http://kb.linuxvirtualserver.org/ wiki/IPVS, aug 2012. [Accessed 5 June 2020]. [71] J. Touch M. Westerlund S. Cheshire M. Cotton, L. Eggert. Internet assigned numbers authority (iana) procedures for the management of the service name and transport protocol port number registry. RFC 6335, Internet Engineering Task Force, Aug 2011.

[72] C. Perkins M. Handley, V. Jacobson. Sdp: Session description protocol. RFC 4566, Internet Engineering Task Force, jul 2006. [73] E. Schooler J. Rosenberg M. Handley, H. Schulzrinne. Sip: Session initiation protocol. RFC 2543, Internet Engineering Task Force, mar 1999.

[74] Joseph Mack. Lvs-howto. http://www.austintek.com/LVS/ LVS-HOWTO/HOWTO/, jan 2012. [Accessed 5 June 2020]. [75] Peter Willis Andy Reid James Feger Michael Bugenhagen Waqar Khan Michael Fargano Dr. Chunfeng Cui Dr. Hui Deng Javier Benitez Uwe Michel Herbert Damker Kenichi Ogaki Tetsuro Matsuzaki Masaki Fukui Katsuhiro Shimano

Dominique Delisle Quentin Loudier Christos Kolias Ivano Guardini Elena Demaria Roberto Minerva Antonio Manzalini Diego Lopez Francisco Javier Ramon Salguero Frank Ruhl Prodip Sen Margaret Chiosi, Don Clarke. Network functions virtualisation - white paper. oct 2012. [76] Siarhei Matsiukevich. Kubernetes networking: How to write your own cni plug-in with bash. https://www.reddit.com/r/devops/comments/97nx32/kubernetes_networking_writing_your_own_simple_cni/, aug 2018. [Accessed 5 June 2020]. [77] Scott McCarty. A practical introduction to container terminology. https://developers.redhat.com/blog/2018/02/22/container-terminology-practical-introduction/, feb 2018. [Accessed 5 June 2020]. [78] X. Meng and R. Chen. Petri net modeling of sip of traversing nat based on stun. In 2012 International Conference on Computer Science and Electronics Engineering, volume 3, pages 134–138, 2012.

[79] K.Duda P.Agarwal L.Kreeger T.Sridhar M.Bursell C.Wright M.Mahalingam, D.Dutt. Virtual extensible local area network (vxlan): A framework for overlaying virtualized layer 2 networks over layer 3 networks. RFC 7348, Internet Engineering Task Force, aug 2014. [80] networkservicemesh. Network service mesh sdk, 8d64ff42c90d420c60b06be91cf16b45eb8ac781. https://github.com/ networkservicemesh/networkservicemesh/blob/ 8d64ff42c90d420c60b06be91cf16b45eb8ac781/sdk/, mar 2019. [Accessed 5 June 2020]. [81] networkservicemesh. sdk/common/configuration.go, master. https:// github.com/networkservicemesh/networkservicemesh/ blob/master/sdk/common/configuration.go, dec 2019. [Accessed 5 June 2020]. [82] networkservicemesh. Network service mesh - google drive. https://drive.google.com/drive/folders/ 1f5fek-PLvoycMTCp6c-Dn_d9_sBNTfag, may 2020. [Accessed 5 June 2020]. [83] Nokia. nokia/danm. https://github.com/nokia/danm, May 2020. [Accessed 5 June 2020].

[84] opnfv. Traffic generator testing - opnfv wiki. https://wiki. opnfv.org/display/vsperf/Traffic+Generator+Testing, oct 2018. [Accessed 5 June 2020]. [85] osrg. osrg/gobgp: Gobgp: Bgp implementation in go. https://github. com/osrg/gobgp, may 2020. [Accessed 5 June 2020].

[86] J. Tantsura P. Lapukhov. Equal-cost multipath considerations for bgp - draft-lapukhov-bgp-ecmp-considerations-03, nov 2019. [87] M. Holdrege P. Srisuresh. Ip network address translator (nat) terminology and considerations. RFC 2663, Internet Engineering Task Force, aug 1999. [88] Florian Wohlfart Paul Emmerich, Daniel Raumer and Georg Carle. Assessing soft- and hardware bottlenecks in pc-based packet forwarding systems. ICN 2015, apr 2015. [89] Nikolai Pitaev, Matthias Falkner, Aris Leivadeas, and Ioannis Lambadaris. Characterizing the performance of concurrent virtualized network functions with ovs-dpdk, fd.io vpp and sr-iov. pages 285–292, 03 2018. [90] Ramneek, S. Cha, S. H. Jeon, Y. J. Jeong, J. M. Kim, and S. Jung. Analysis of linux kernel packet processing on manycore systems. In TENCON 2018 - 2018 IEEE Region 10 Conference, pages 2276–2280, 2018. [91] James Edwards Rich Seifert. The All-New Switch Book: The Complete Guide to LAN Switching Technology. Wiley, 2 edition, aug 2008. [92] Rami Rosen. Linux Kernel Networking: Implementation and Theory. Apress, 1 edition, dec 2013. [93] J. McQuaid S. Bradner. Benchmarking methodology for network interconnect devices. RFC 2544, Internet Engineering Task Force, mar 1999. [94] Rowayda A. Sadek. An agile internet of things (iot) based software defined network (sdn) architecture. Egyptian Computer Science Journal, 42(2):13–29, may 2019. [95] William Stallings. Foundations of Modern Networking: SDN, NFV, QoE, IoT, and Cloud. Addison-Wesley, 1 edition, 2015. [96] D. Katz Y. Rekhter T. Bates, R. Chandra. Multiprotocol extensions for bgp-4. RFC 4760, Internet Engineering Task Force, jan 2007. [97] Wikipedia. Virtualizing the network forwarding plane. https://en.wikipedia.org/wiki/Forwarding_plane, oct 2019. [Accessed 5 June 2020]. [98] Wenji Wu and Matt Crawford. Potential performance bottleneck in linux tcp. Int. J. Communication Systems, 20:1263–1283, 11 2007. [99] S. Hares Y. Rekhter, T. Li. A border gateway protocol 4 (bgp-4). RFC 4271, Internet Engineering Task Force, jan 2006. [100] L. Yang and K. Lei. Combining ice and sip protocol for nat traversal in new classification standard. In 2016 5th International Conference on Computer Science and Network Technology (ICCSNT), pages 576–580, 2016.
