
A Framework for eBPF-Based Network Functions in an Era of Microservices

Sebastiano Miano, Member, IEEE, Fulvio Risso, Member, IEEE, Mauricio Vásquez Bernal, Matteo Bertrone, and Yunsong Lu

Abstract—By moving network functionality from dedicated hardware to software running on end-hosts, Network Functions Virtualization (NFV) pledges the benefits of cloud computing to packet processing. While most of the NFV frameworks today rely on kernel-bypass approaches, no attention has been given to kernel packet processing, which has always proved hard to evolve and to program. In this article, we present Polycube, a software framework whose main goal is to bring the power of NFV to in-kernel packet processing applications, enabling a level of flexibility and customization that was unthinkable before. Polycube enables the creation of arbitrary and complex network function chains, where each function can include an efficient in-kernel data plane and a flexible user-space control plane with strong characteristics of isolation, persistence, and composability. Polycube network functions, called Cubes, can be dynamically generated and injected into the kernel networking stack, without requiring custom kernels or specific kernel modules, simplifying debugging and introspection, which are two fundamental properties in recent cloud environments. We validate the framework by showing significant improvements over existing applications, and we prove the generality of the Polycube programming model through the implementation of complex use cases such as a network provider for Kubernetes.

Index Terms—NFV, eBPF, XDP, Linux.

I. INTRODUCTION

WITH the advent of Software Defined Networks (SDN) and Network Functions Virtualization (NFV), a large number of Network Functions (NFs)¹ are becoming pure software images executed on general-purpose servers, running either as virtual machines (VMs) or as cloud-native software. Possible examples include load balancing [1], [2], [3], [4], congestion control [5], [6], [7], and application-specific network workloads such as DDoS mitigation [8] or key-value stores [9], [10], thanks also to the development and availability of programmable network devices (e.g., SmartNICs) [11]. The increased flexibility (software is intrinsically easier to program compared to hardware) and the recent advances in the speed of software packet processing have then contributed to the proliferation of a myriad of VNF frameworks that provide implementations of efficient and easily programmable software middleboxes [12], [13], [14], [15], [16], [17], [18].

Current solutions to implement the data plane of those software packet processing applications rely mostly on kernel-bypass approaches, either by giving user space direct access to the underlying hardware (e.g., DPDK [19], netmap [20], FD.io [21]) or by following a unikernel approach, where only the minimal set of OS functionalities required for the application to run is built with the application itself (e.g., ClickOS [22], [23], [24]). These approaches have perfectly served their purposes, with efficient implementations of software network functions that have shown the potential for processing 10-100 Gbps on a single server [25], [26], [27].

Recently, new technologies such as 5G, edge computing and IoT, among others, have led to a significant increase in the total number of connected devices and in the consequent network load, originating two new trends. On one side, traditional "centralized" appliances (e.g., global datacenter firewalls) are hard to scale, leading to a more distributed approach in which network functions are implemented directly on end hosts (e.g., datacenter servers). On the other side, cloud-native technologies are used to build NFs packaged in containers, deployed as microservices, and managed on elastic infrastructure through agile DevOps processes and continuous delivery workflows [28]. These new requirements have also caused a visible change in the type and requirements of the network functionalities deployed across the data center. Network applications should be able to continuously adapt to the runtime behavior of cloud-native applications, which might regularly change or be scheduled by an orchestrator, or easily interact with existing "native" applications by leveraging kernel functionalities, all of this without sacrificing performance or flexibility. For instance, cloud-native platforms, like Kubernetes [29], can exploit different network plug-ins² to implement the underlying data plane functionalities and transparently steer packets between micro-services.

¹This article uses the term Network Function (NF) to specify components such as traditional individual appliances (e.g., bridge, router, NAT, etc.), while a service defines a more complex scenario in which multiple NFs must cooperate, e.g., through a chain of NFs.

²A list of network plugins (also known as Container Network Interface (CNI) plug-ins) is available in the Kubernetes Cluster Networking page, https://kubernetes.io/docs/concepts/cluster-administration/networking/.


Unfortunately, the previously mentioned kernel-bypass approaches suffer within this new scenario [30]. First, they require the exclusive allocation of resources (i.e., CPU cores) to achieve good performance; this is perfectly fine when we have machines dedicated to networking purposes, but it becomes overwhelming when this cost has to be paid for every server in the cluster, since they permanently steal precious CPU cycles from other application tasks. Second, they require re-implementing the entire network stack in userspace, losing all the well-tested configuration, deployment and management tools developed over the years within the operating system. Third, they rely on custom or modified versions of NIC (network interface card) drivers, which may not be available on public cloud platforms, also requiring a non-negligible maintenance cost. Last but not least, they have difficulties (and poor performance) when dealing with existing kernel implementations or communicating with applications that are not implemented using the same approach, requiring them to adhere to custom-defined APIs (e.g., mTCP [31]) or to change the original application logic (e.g., StackMap [32]). As a consequence, most of the existing cloud-native network plug-ins for Kubernetes still rely on functionalities and tools embedded into the operating system network stack (e.g., iptables, ipvs, linuxbridge), with Polycube (and, recently, Cilium) being notable exceptions. Unfortunately, the drawbacks of this approach are also evident. First of all, fixed kernel network applications are notoriously slow and inefficient given their generality, which impairs the possibility to specialize them depending on the workload or the type of application that is running on top. Secondly, software network functions (or the associated kernel modules [33]) that live in the kernel have also proven hard to evolve due to the complexity of the code and the difficulties in maintaining, up-streaming or modifying it.

This article proposes Polycube, a novel software framework that addresses these deficiencies by adopting a different approach for building and running NFs. Polycube exploits the extended Berkeley Packet Filter (eBPF), whose characteristics are presented in Section II, to build the data plane of the NFs and to provide the flexibility and easy development process that were almost impossible to achieve in "legacy" kernel implementations. Polycube enables the creation of efficient, modular and dynamically re-configurable network functions that are executed within the Linux kernel; these components can be easily chained to create more complex services.

Overall, this article makes the following contributions:
• We show the design and architecture of Polycube (Section IV), which moves to the kernel space some ideas and concepts already established in other NFV frameworks, hence with different challenges and solutions.
• We present the mechanism employed by Polycube to enable the creation of arbitrary chains of in-kernel network functions (Section V), which mimics the well-known model in use in the physical world that deploys functions (e.g., bridges, routers) connected by wires.
• We provide a description of the Polycube programming model, including the APIs and abstractions used to simplify the development of new NFs (Section VI).
• We introduce a generic model for the control and management plane of each NF that is used to simplify the manageability and accelerate the development of new network services (Section VII), which complements the dataplane-only approach proposed by eBPF.
• Finally, we identify and quantify both the overhead introduced by the Polycube programming model and its data plane abstractions, and the performance improvements brought to existing kernel implementations (Section IX).

Polycube is open-source and available at [34].

II. BACKGROUND

This section lists the main properties that we believe are fundamental in today's environments (Section II-A), provides a brief overview of the extended Berkeley Packet Filter subsystem (Section II-B), and discusses why we believe it may represent a good choice for networking applications (Section II-C).

A. Desired Properties for Cloud-Native NFs

Low Overhead: Although efficiency is always a desirable property, it assumes even more significance in the cloud-native context, where servers are mostly used to deliver high-level services (e.g., Web portals) and networking components are often perceived as an unavoidable overhead, particularly within edge clouds, where the number of available resources is limited.

Agile Development: Newer software data planes should follow the same continuous delivery (CD) software development typical of microservices, making it possible to easily update the existing application by providing a replacement that does not disrupt the typical service workflow [35].

Runtime Flexibility and Optimizations: From a developer's point of view, writing an efficient and, at the same time, easy-to-maintain software data plane is a daunting task. Most of the time it is just a matter of finding the right trade-off between simplicity (e.g., modularity, easy-to-read code) and performance. We often see application-specific, ad-hoc techniques applied only to particular use cases [26], [27], which do not perform well in other scenarios or for a broader spectrum of applications. To better adapt to this new, extremely dynamic environment, network components should be able to automatically adapt themselves to the runtime conditions with the minimum amount of programming effort.

Co-Existence With the "Traditional" Ecosystem: Applications are now running on a host operating system that is shared between the different components (e.g., containers), which in turn rely on existing kernel functionality to accomplish their tasks [36]. It is then crucial that network functions can easily interact with existing "native" applications and also leverage kernel functionalities (e.g., TSO, skb metadata, etc.), without sacrificing performance and flexibility [37].


B. Extended Berkeley Packet Filter (eBPF)

The Berkeley Packet Filter (BPF) is an in-kernel virtual machine for packet filtering that has been deeply revisited starting from 2013 and is now known as extended BPF (eBPF). In addition to several architectural improvements, eBPF introduces the capability of handling generic event processing in the kernel, JIT compilation for increased performance, stateful processing using maps, and libraries (helpers) to handle more complex tasks, available within the kernel.

eBPF programs can be written either using eBPF assembly instructions and converted to bytecode with the bpf_asm utility, or in restricted C and compiled with the LLVM Clang compiler. The bytecode can then be loaded using the bpf() system call. A loaded eBPF program follows an event-driven architecture and is therefore hooked to a particular type of event. Each occurrence of the event triggers its execution, and, based on the type of event, the program might be able to alter the event context. For networking purposes, program execution is triggered by the arrival of a packet. Two hooks are available to intercept packets and possibly mangle, forward or drop them: eXpress Data Path (XDP) [38] and Traffic Control (TC). XDP programs intercept RX packets right out of the NIC driver, possibly before the allocation of the Linux socket buffer (sk_buff), allowing, e.g., early packet drop. TC programs intercept data when it reaches the kernel traffic control function, in either the RX or TX direction. We refer the reader to [39], [40] for a more in-depth understanding of the eBPF subsystem.

C. Choosing eBPF for Network Functions

While eBPF is not the only technology proposed for high-speed and flexible packet processing (e.g., to limit our view to the alternatives with strong industry support, DPDK [19] and FD.io [21]), the authors believe this technology is a valid alternative for the following main reasons.

Easy Development Process: eBPF (data plane) programs follow the same development process of userspace applications and can be created independently from the kernel development process, without explicit use of kernel-level primitives.

Dynamic Update: eBPF programs are dynamically injected into the kernel using the bpf() system call, without having to install custom kernel modules or rely on modified kernels. Dynamic (hence, at run-time) updates enable the creation of data plane software that is tailored to the actual requirements in terms of applications and workloads [41], as opposed to static kernel implementations.

High Throughput: Although dynamically injected in the kernel, eBPF programs are executed natively on the target platform,³ which provides a notable speed-up compared to an interpreted execution. Moreover, the execution of these programs at the XDP level enables high-speed packet processing and forwarding, with performance close to user-level approaches [38].

Excellent Performance/Efficiency Trade-Off: eBPF programs are triggered only when a packet is received, hence they do not consume CPU cycles when there is no traffic waiting to be processed, as opposed to polling-based approaches (e.g., DPDK), where the CPU usage is always 100% even in the absence of traffic.

Integration With the Linux Subsystems: eBPF programs cooperate with the kernel TCP/IP stack, interact with other kernel-level data structures (e.g., the FIB or the neighbor table), and leverage kernel functionalities (e.g., TSO, skb metadata, etc.), possibly complementing existing networking features. Legacy applications or debugging tools can continue to be used without any change to the existing applications.

Security: As opposed to custom kernel modules, eBPF programs cannot harm the system. The in-kernel verifier ensures that all the operations performed inside an eBPF program are correct and safe, discarding the injection of faulty programs.

Explicit Separation Between Stateless and Stateful Primitives: eBPF programs have a clear distinction between the stateless and stateful portions of the code. Operations outside the eBPF environment are carried out only through specific "helper" functions that have a well-defined syntax and behavior. Several recent analysis frameworks [42], [43], [44] require this separation to simplify the analysis of NF operations, either to find buggy semantic behavior introduced during development (verifier safety does not imply "semantic safety") or to analyze the performance of a NF [45], [46]. eBPF programs have this separation as part of their original design, facilitating the adoption of the above types of analysis and concepts.

³By default, eBPF programs are JIT-compiled into native machine code before being executed, as the eBPF JIT flag is active by default in the latest kernel releases.

III. DESIGN GOALS AND CHALLENGES

The main objective of Polycube is to provide a common framework for network function developers that brings the power and innovation promised by NFV to the world of in-kernel packet processing, which is possible thanks to the introduction of eBPF. However, eBPF was not created with this goal in mind; it serves only as a generic virtual machine that enables the execution of user-defined programs in the kernel, attaching them to specific points of the Linux TCP/IP stack or to generic kernel functions (e.g., kprobes). Polycube therefore aims to achieve the following objectives, which are not covered by the eBPF subsystem.

G1 (Common Structure and Abstractions of in-Kernel NFs): The realization of complex network functionalities is not always possible in eBPF, given its security model, which is enforced by the in-kernel verifier to ensure that the execution of the program does not harm the system [47]. For instance, no abstractions currently exist to enable the concept of virtual ports from which traffic is received or sent out, or to implement the (complex) control plane of a NF, forcing developers to dedicate a considerable amount of time to handling common control plane operations (e.g., user-kernel interaction). Polycube must provide a common programming framework and models that allow developers to use high-level abstractions to solve common problems or known limitations of eBPF in an efficient and transparent way, while the framework optimizes the implementations of those abstractions.


G2 (Programmable and Extensible NF Chaining): In an NFV environment, a packet is typically processed by a sequence of NFs, giving the possibility to run different NFs at the same time, possibly manufactured by multiple vendors. Polycube must enable the creation of chains of NFs in the kernel, guaranteeing the correct forwarding sequence and an appropriate degree of isolation between them.

G3 (Simple Management and Execution of the NFs): Polycube must allow external operators (e.g., SDN controllers, orchestrators, network administrators) to configure in-kernel functionalities to support a diverse set of use cases. This implies the possibility to compose and configure the datapath functionalities or to dynamically upgrade or substitute a given NF at runtime. A clear separation between the in-kernel data plane and the control plane is desirable, so that it becomes easier to dynamically regenerate and reconfigure the data path to implement the user policies.

G4 (Simple Development of Control and Management Plane): The existence of an API that allows controlling the VNF behavior and its networking intent is fundamental and requires a non-negligible development effort. This task is often underestimated by most NFV frameworks, which mainly focus on the data plane design and performance, leaving the NF APIs to custom and often non-interoperable control plane implementations. Polycube should provide a common structure for the control plane, with a program-dependent API that is automatically generated for each NF, freeing the developer from this additional implementation burden.
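To ground these goals, the sketch below shows what a minimal stand-alone eBPF/XDP data plane program looks like without any framework support: a restricted-C program that counts packets in a per-CPU map and lets them continue up the stack. This is an illustrative example, not Polycube source code; it assumes a Clang/LLVM and libbpf-style toolchain, and the map and function names are invented. Sections IV-VI describe the abstractions Polycube builds on top of such programs.

```c
/* Illustrative sketch, not Polycube source: a minimal stand-alone XDP
 * program in restricted C. It counts packets in a per-CPU array map and
 * lets them continue up the stack. Assumes a Clang/LLVM + libbpf-style
 * toolchain; map and function names are invented for the example. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_counter SEC(".maps");

SEC("xdp")
int count_and_pass(struct xdp_md *ctx)
{
    __u32 key = 0;
    __u64 *value = bpf_map_lookup_elem(&pkt_counter, &key);

    if (value)           /* NULL check required by the in-kernel verifier */
        (*value)++;

    return XDP_PASS;     /* let the packet continue to the kernel stack */
}

char LICENSE[] SEC("license") = "GPL";
```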

IV. ARCHITECTURE OVERVIEW

This section introduces the main ideas that inspired the design of Polycube, its overall software architecture, and the most significant implementation details. A high-level description of the Polycube architecture is shown in Figure 1.

Fig. 1. The Polycube architecture.

A. Unified Point of Control

All network functions within Polycube feature a unified point of control, which enables the configuration of high-level directives such as the desired service topology. In addition, it facilitates the provisioning of cross-network-function optimizations that could not be applied with separately managed NFs. Polycube supports this model through a single, service-agnostic, userspace daemon, called polycubed, which is in charge of interacting with the different network function instances. Each different type of virtual network function in Polycube is called a Cube.⁴ A Cube can be plugged into the framework at run-time, upon a registration phase in which the new cube shares with polycubed the information required for its identification, such as its name, description or the minimum kernel version required to run the NF.

When the NF is registered, different instances of it can be created by contacting polycubed, which acts mainly as a proxy; it receives a request from a northbound REST interface and forwards it to the proper NF instance, returning the answer to the user.

⁴In the rest of this article, we will use the terms NF and Cube interchangeably. In Polycube a NF is indeed represented as a single Cube.

B. Overall Structure of Polycube NFs

Each Cube is made up of a control plane and a data plane. The data plane is responsible for per-packet processing and forwarding, while the control and management plane is in charge of NF configuration and non-dataplane tasks (e.g., routing protocols).

1) Data Plane: The data plane portion of a NF is executed per packet, with the consequent necessity to keep its cost as small as possible. The data plane of a Polycube NF is composed of a fast path, namely the eBPF code that is injected into the kernel, and a slow path, which handles packets that cannot be fully processed in the kernel or that would require additional operations, slowing down the processing of the other packets.

Fast Path: When triggered, the fast path retrieves the packet and its associated metadata from the receive queues, then it executes the injected eBPF instructions. Typical operations are usually very fast, such as packet parsing, lookups in memory (e.g., to classify the packet), and map updates, such as storing data in memory (e.g., statistics) for further processing. When those operations are carried out, the fast path returns a forwarding decision for that particular packet or sends it to the slow path for further processing.
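As an illustration of the per-packet operations just mentioned (header parsing with the bounds checks required by the verifier, followed by an early decision), the fragment below shows a generic restricted-C XDP program. It is not Polycube source code, and the filtering policy it applies is arbitrary and only serves the example.

```c
/* Illustrative sketch, not Polycube source: the kind of per-packet work a
 * fast path performs. Header parsing must carry explicit bounds checks
 * against data_end, otherwise the verifier rejects the program. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int parse_and_filter(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;                  /* truncated frame */

    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;                  /* not IPv4: let the stack decide */

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_DROP;

    /* Example policy: keep UDP, drop everything else. */
    return ip->protocol == IPPROTO_UDP ? XDP_PASS : XDP_DROP;
}

char LICENSE[] SEC("license") = "GPL";
```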


A Cube's fast path is composed of a single micro-block or a set of micro-blocks, called micro-cubes (µCubes), which represent the smallest programming unit of the Cube's data plane. A µCube is a single eBPF program that lies under the umbrella of a single Cube, which provides a unique control plane and slow path module shared among all its µCubes. To form the Cube's fast path, several µCubes can be stitched together and injected separately from the Cube's control plane using the Polycube service-independent APIs.

This modular design is particularly useful because it allows developers to handle each feature separately, enabling the creation of loosely coupled NFs with different functionalities (e.g., packet parsing, classification, field modification) that can be dynamically composed and replaced; each single µCube can be substituted at runtime with a different version or can be directly removed from the chain if its features are not needed anymore. Second, it can be useful to overcome some well-known eBPF limitations, such as the maximum size of an eBPF program or the inability to create unbounded loops in the code. Furthermore, chains of µCubes do not introduce any noticeable overhead, as presented in Section IX-C2.

This design choice introduces the necessity to specify an order of execution of the µCubes inside the NF; when a packet reaches a Cube composed of different micro-blocks, Polycube has to know the first module to execute, which in turn will trigger the execution of the other µCubes in an arbitrary order based on its internal logic. To do this, Polycube introduces the concept of HEAD µCube, which is unique within the Cube itself and represents the entry point of the entire NF; its execution is triggered upon the reception of a packet on a port, and it can then "jump" to the other µCubes.

Slow Path: Although eBPF offers the possibility to perform some complex and arbitrary actions on packets, it suffers from some well-known limitations due to its restricted environment, which are however necessary to guarantee the integrity of the system. Those limitations may impair the flexibility of the network function, which (i) may not be able to perform complex actions directly in the eBPF fast path or (ii) could slow down its execution, adding more instructions in the fast path to handle exceptional cases. To overcome those limitations, Polycube introduces an additional data plane component that is no longer limited by the eBPF virtual machine and can hence execute arbitrary code. The slow path module is executed in userspace and interacts with the eBPF fast path using a set of components provided by the framework. The eBPF fast path program can redirect packets (with custom metadata) to the slow path, similar to Packet-In messages in OpenFlow. Similarly, the slow path can send packets back to the fast path; in this case, Polycube provides the possibility to inject the packet into the ingress queue of a network function port, simulating the reception of a new packet from the network, or into the egress queue, hence pushing the packet out of the network function.

2) Control and Management Plane: The control plane of a virtual network function is the place where out-of-band tasks, needed to control the data plane and to react to possibly complex events (e.g., routing protocols, spanning tree), are implemented. It is the point of entry for external players (e.g., a service orchestrator, the user CLI) that need to access NF resources, modify (e.g., for configuration) or read NF parameters (e.g., reading statistics), and receive notifications from the NF fast path or slow path. Polycube defines a specific control and management module that performs the previously described functions. It exposes a set of REST APIs used to perform the typical CRUD (create-read-update-delete) operations on the NF itself; these APIs are automatically generated by the framework starting from the NF description (i.e., a YANG model of the NF), removing this additional implementation overhead from the programmer. To interact with the NF, an external player has to contact polycubed, which uses the service instance contained in the URL to identify the NF to which the request is directed and dispatches it to the corresponding NF control path, which in turn serves the request by modifying its internal state or reflecting the changes onto the NF data path instance. More details on how the control plane and the REST APIs of each NF are automatically generated are presented in Section VII.

3) Implementation: While the fast path is implemented by injecting the data plane programs into the eBPF kernel sandbox, both the slow path and the control/management plane are implemented in user space, although in different portions of the source code. In fact, the slow path is implemented by a callback named packet_in, which is fired only when the fast path detects an exception. Vice versa, functions in the control/management plane are triggered by a different set of events, such as a request to the REST API of the NF (e.g., to read/write data in maps), expiring timers (e.g., for periodic cleanup tasks), and more.

From the performance point of view, the vast majority of packets (actually, all packets in some NFs) are handled by the fast path, hence the userspace code does not introduce any performance penalty, nor does it represent a performance-critical component for the run-time behavior of the NF.

V. SERVICE CHAINING DESIGN

A Polycube service chain involves a set of network function instances (i.e., Cubes) that are connected to each other by means of virtual ports, which are in turn peered with a Linux networking device or another in-kernel NF instance. In the standard model, eBPF programs do not have the concept of a port from which traffic is received or sent out; eBPF only provides a tail call mechanism to "jump" from one program to another. To provide this abstraction, Polycube uses a set of additional eBPF components and wrappers around the user-defined code; Figure 2 shows the resulting design.

When a packet traverses a chain in Polycube, it carries some metadata (e.g., ingress virtual port, module index), which are internally used by Polycube to correctly isolate the various NFs and implement the desired chain. In particular, the cube index is used to uniquely identify the Cube fast path inside the framework and is uniquely generated when the Cube instance is created. Before injecting the Cube's fast path, Polycube augments the user-defined code with a set of wrappers that are executed before and after the NF itself. In particular, the pre-processor contains the set of functions necessary to process the incoming traffic, while the post-processor contains the helpers used by the fast path to send the traffic outside of the Cube.
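The sketch below anticipates, in simplified form, the mechanism detailed in the pipeline example that follows: an "Input"-style program attached to a netdevice records the ingress virtual port in a per-CPU map shared with the cube, then tail-calls into the cube's pre-processor through a program-array map, which is the role played by Polycube's PatchPanel. This is an illustrative reconstruction, not Polycube's actual code; map names are invented, and the port id and program index are hard-coded here while Polycube injects them at load time.

```c
/* Illustrative reconstruction, not Polycube source: an Input-style program
 * that passes the ingress vport to the next program and jumps to it. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u32);                /* ingress vport seen by the cube */
} cur_vport SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 256);
    __type(key, __u32);
    __type(value, __u32);
} patch_panel SEC(".maps");

#define MY_VPORT       1                 /* example values only */
#define NEXT_PROG_IDX  11

SEC("xdp")
int input_program(struct xdp_md *ctx)
{
    __u32 zero = 0, vport = MY_VPORT;

    /* Per-CPU and therefore lockless; safe because a packet is processed
     * on a single core without preemption. */
    bpf_map_update_elem(&cur_vport, &zero, &vport, BPF_ANY);

    /* Jump to the cube's pre-processor; does not return on success. */
    bpf_tail_call(ctx, &patch_panel, NEXT_PROG_IDX);

    return XDP_DROP;                     /* reached only if the tail call fails */
}

char LICENSE[] SEC("license") = "GPL";
```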


Fig. 2. Example of a service chain (bridge + router) in Polycube: internal details.

Pipeline Example: We will now walk through a simple example to illustrate how packets are passed through the Polycube service chain. In Figure 2 we have an instance of a Polycube Bridge NF, called br1, with two virtual ports connected respectively to a Linux networking device (i.e., netdev1) and to the first port of an instance of the Router Cube, called r1. First, when a new port is created in the br1 instance, Polycube assigns a unique virtual port identifier (i.e., vport) to each port; in our case, the port attached to the router has id #0, while the other has id #1. In the same way, the router's port attached to the bridge has id #0.

When a new packet is received on the physical interface netdev1, it has to execute the br1 fast path and be presented as coming from virtual port #1. To support this abstraction, every time a Cube port is peered with a Linux networking device, Polycube loads two additional eBPF programs. The Input eBPF program, which is attached to the ingress hook of the interface, is loaded and compiled at runtime with some pre-defined information, such as the virtual port id (i.e., #1 in the example in the figure) associated by Polycube to that Cube port and the index of the module to which the port is attached (i.e., #11). Upon the reception of a new packet from the physical interface attached to the bridge, the input program is triggered; it copies the vport value into an eBPF per-CPU array map shared with the bridge instance and performs a tail call to the br1 pre-processor, using the hard-coded module index.⁵ At this point, the control passes to the pre-processor module of br1, which extracts the vport from the shared map (together with possible additional metadata such as the packet length and headers) and invokes the br1 code. The Cube fast path can then use a vport to send traffic outside as the result of a forwarding decision; this is possible through a specific helper function contained in the post-processor, which will redirect the packet to the next module of the chain. The post-processor uses an additional auxiliary data structure, the ForwardChain, to obtain the index of the next module of the chain corresponding to the given vport. This map has a local scope and represents the actual connection matrix of the NF instance with the rest of the world. In our example, the lookup into the ForwardChain map with vport #0 returns the next virtual port id of r1 (i.e., 0) and the index of the eBPF program corresponding to that Cube (i.e., 10). As before, the post-processor copies the vport into the shared map and "jumps" to the r1 pre-processor; once it has obtained the index of the next module, the post-processor performs a tail call using the PatchPanel to get the real address of the next eBPF program. On the other hand, if br1 decides to redirect the packet to the port peered with netdev1, the post-processor retrieves the next module index (i.e., 1) from the ForwardChain and jumps to this module, which corresponds to the Output program associated with the physical interface. As for the Input program, this module is injected with a pre-defined if_index of the netdevice, which it uses in the bpf_redirect() helper function to send the packet out on netdev1.

To simplify the pipeline example, we have omitted the case in which the Cube is composed of several µCubes (Section IV-B1). Conceptually, the operations remain the same; the only difference is that, after the pre-processor, the code of the MASTER µCube is called, which in turn uses an internal ForwardChain to jump from one µCube to another.

⁵The only way to share information from one eBPF program to another is to copy the data into shared eBPF maps. For these internal communications, Polycube uses per-CPU maps, which provide better performance thanks to their lockless access. Since a packet is processed only within a single core (eBPF does not allow preemption), this mechanism is safe.

VI. APIS AND ABSTRACTIONS

Polycube provides a set of high-level APIs and abstractions to simplify the development of a new NF, from both the control and data plane perspectives. For example, it adds useful abstractions to manage special packets, to cope with special processing that may complicate (and slow down) the fast path, or to react to special events such as timeouts. Table I shows some of the main helper functions introduced by Polycube at the different levels of the NF code, i.e., the eBPF fast path, the slow path, and the control and management plane.

A. Transparent Port Handling

A Polycube NF instance is composed of a set of virtual ports that are uniquely identified through a name and an index inside the NF itself. Each port of the NF can be attached to a Linux network device or to another NF port by means of the peer parameter. When the fast path of the NF decides to redirect the packet to a specific output port, it can use the pcn_pkt_send() function to send the packet to the next hop, whether it is a netdevice or another Polycube NF. Although the implementations for the above two types of next hops are quite different, Polycube hides this difference by providing a generic helper that receives the virtual index of the output port and, if the port is connected to a netdevice, redirects the packet to the attached netdevice, otherwise jumps directly to the next Polycube network function in the chain.
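A hook-level sketch of what such a generic send helper could look like is shown below: if the destination port is peered with a netdevice, the packet is redirected to that interface, otherwise execution tail-calls into the next cube. This is a simplified illustration, not the actual implementation of pcn_pkt_send(); the per-port table, its layout, and all names are invented for the example.

```c
/* Illustrative sketch, not pcn_pkt_send(): a generic "send on port" helper
 * that hides whether the peer is a netdevice or another cube. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct port_peer {
    __u32 is_netdev;    /* 1: peer is a Linux netdevice, 0: peer is a cube */
    __u32 ifindex;      /* valid when is_netdev == 1 */
    __u32 prog_index;   /* valid when is_netdev == 0 */
};

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 64);
    __type(key, __u32);
    __type(value, struct port_peer);
} forward_chain SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 256);
    __type(key, __u32);
    __type(value, __u32);
} patch_panel SEC(".maps");

static __always_inline int send_on_port(struct xdp_md *ctx, __u32 out_port)
{
    struct port_peer *peer = bpf_map_lookup_elem(&forward_chain, &out_port);
    if (!peer)
        return XDP_DROP;

    if (peer->is_netdev)
        /* Peer is a netdevice: hand the frame to that interface. */
        return bpf_redirect(peer->ifindex, 0);

    /* Peer is another cube: jump to its pre-processor. */
    bpf_tail_call(ctx, &patch_panel, peer->prog_index);
    return XDP_DROP;    /* only reached if the tail call fails */
}

SEC("xdp")
int cube_post_processor(struct xdp_md *ctx)
{
    return send_on_port(ctx, 0);   /* example: send on virtual port 0 */
}

char LICENSE[] SEC("license") = "GPL";
```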


TABLE I: Helper functions provided by Polycube at the different levels of the NF.

B. Fast-Slow Path Interaction

In Polycube, each instance of a NF has its own private copies of the fast and slow paths; Polycube takes care of NF isolation by delivering packets generated by the fast path to the corresponding slow path instance and vice versa. It uses two separate eBPF programs (hidden from the developers), the Encapsulator and the Decapsulator, which are instantiated and injected into the kernel when polycubed is started.

a) Encapsulator: Figure 3(a) shows the flow of operations performed when sending a packet from the eBPF fast path to the slow path running in userspace. If the packet currently processed in the NF fast path requires additional inspections or further processing, it can be sent to the slow path module of the NF by means of the pcn_pkt_controller() helper. This function receives as parameters the reason why the packet has to be sent to the slow path and, optionally, additional metadata fields. Polycube hides the implementation details of the communication between the eBPF fast path program and the NF slow path; it sends the packet to an eBPF control module (the Encapsulator shown in Figure 3(a)) that copies the packet and its metadata into a perf ring buffer, which is used by polycubed to read the corresponding data from userspace.⁶,⁷ When a copy of the packet is sent to the slow path, the original one is retained in the fast path, where it can continue the processing or be dropped. Together with the custom metadata, the Encapsulator adds some internal information, such as the index of the Cube instance that has generated the message, which is used by the Polycube daemon to call the pcn_pkt_in() function of the associated NF's slow path.

Fig. 3. Message flow for the Encapsulator and Decapsulator.

b) Decapsulator: This component handles the reverse communication, which happens when the slow path (or the control plane) of the NF wants to inject a packet back into the fast path or send it out on a specific Cube port. In the first case, Polycube simulates the reception of the packet from a specific Cube port, which is specified by the control or slow path module through a dedicated Polycube helper function, while in the latter case the output port of the Cube is provided. When called, this function triggers the execution of the Demultiplexer, which copies the index of the Cube originating the message into a specific eBPF map shared with the Decapsulator; then, it sends the packet on a TAP interface specifically created by polycubed. Differently from the Encapsulator path, the Demultiplexer does not use the perf ring buffer to reach the Decapsulator, since that buffer is only available for kernel-to-userspace communication and not for the opposite direction. The reception of a packet on the TAP interface triggers the execution of the Decapsulator eBPF program, which is attached to the eBPF hook point of the TAP interface. When executed, the Decapsulator extracts the index of the eBPF program to call from the map and jumps to the next program, following the same operations described in Section V.

⁶eBPF provides specific helpers that allow storing custom data into a perf event ring buffer. Userspace programs can then use this buffer as a data channel for receiving events from the kernel.

⁷eBPF maps would be inefficient for this case because they would have to be sized for the largest packet length, even if in some cases we want to send only a truncated packet to the slow path.

C. Service Debugging

Debugging a NF that includes both data and control/management planes can be difficult because of the different contexts (kernel/userspace) the code is executed in. For instance, eBPF programs use the bpf_trace_printk() function to print debug messages; once the program is loaded, the verifier checks whether the program calls this function and, if so, allocates additional buffers, which may slow down the processing of the function. Furthermore, message ordering can be reversed depending upon the context (user or kernel) the generating code was running in.

To solve the above problems, Polycube provides a debug helper that can be used to print debug messages from both the data and slow/control planes. Polycube defines a new (and more efficient) pcn_log() helper that uses a perf ring buffer to send debug messages to polycubed, which redirects them to the current log file, as it does for the slow and control path. Finally, using different log levels, polycubed is able to dynamically remove all the references to the debug messages below the specified log level, reloading the NF fast path to reflect the changes.

D. Table Abstractions

To store the network function state across different runs of the same program, or to pass configuration data from the control path to the fast path, a Polycube NF uses eBPF tables, which are defined in the NF fast path and are created when the program is loaded. Each eBPF table is associated with a scope that determines the possibility to read and/or modify the table content from another eBPF program. Polycube introduces the possibility to define PRIVATE tables, which are accessible only from the same µCube where they have been declared, and PUBLIC tables, which are instead accessible from every µCube running in the machine. In addition, since Polycube supports the possibility to compose the network function data path as a collection of µCubes (i.e., simple eBPF programs), we added the concept of SHARED tables, where a table can be shared between a given set of µCubes. In this case, when the table is instantiated, it is possible to specify the namespace within which the table will be shared.

E. Dynamic Fast Path Reloading

In addition to the capability of dynamically attaching different Cubes to form a complex service chain, Polycube enables a single Cube to entirely regenerate its fast path (or part of it) by dynamically reloading the code of a µCube and injecting it back into the kernel. This is particularly useful when the control plane of a Cube recognizes that a set of conditions cannot be met given the runtime configuration or traffic characteristics, or simply when it wants to update the code without causing any traffic disruption.

Once a Cube's control plane calls the pcn_reload() function (Table I), it provides the new source code and the index of the µCube that should be replaced. At this point, Polycube compiles and injects the new µCube, without attaching it to the Cube's fast path. When the new program is ready, first we attach the maps of the old instance to the new one, then we atomically swap the code by replacing the pointer to the old program with the new one in the PatchPanel (Figure 2). This update is guaranteed by the eBPF subsystem to be not only atomic⁸ but also very fast, as it actually consists in updating a memory location. At this point, the new µCube starts processing the traffic and the old one is unloaded. The pcn_reload_all() function performs the reloading on the whole set of µCubes composing the Cube's fast path; the reloading process is the same, with the difference that, once the chain is ready, the MASTER µCube is activated and attached to the Cube's pre-processor, as mentioned before.

⁸The PatchPanel map is a PROG_ARRAY map, whose update is protected by the kernel RCU mechanism.

F. Support for Multiple Hook Points

Polycube supports two different types of NFs that correspond to the existing attachment points (a.k.a. eBPF hooks) available in the eBPF subsystem, namely Traffic Control (TC) and eXpress Data Path (XDP). The main differences between XDP and TC NFs are: (i) the initial context received as input by the triggered eBPF program, (ii) the type of standard eBPF helpers that the program is allowed to call, and (iii) the return type of the program, which communicates the forwarding decision for that specific packet.

With respect to the initial context (which represents by far the most critical issue), TC Cubes have access to the sk_buff, the standard Linux data structure for network packets, which is dynamically allocated upon packet arrival and includes a set of metadata that results from the packet's parsing. Instead, XDP provides the xdp_buff, a raw buffer that contains only the received packet, because it is allocated by the NIC driver before any TCP/IP stack processing. Although this contributes to a significant performance difference between the two hooks, on the other hand it gives TC Cubes an additional level of information and stack customization that cannot be achieved in XDP. For instance, TC programs can read or write the sk_buff's internal values, such as the mark value, used by a firewall to mark packets, the packet's priority, used to implement QoS, or the queue_mapping for the packet's transmission on a specific queue. However, this additional information and metadata available in the sk_buff complicate the re-writing of some packet values, which cannot be carried out by just modifying the packet data. In fact, the programmer has to take care of also updating the values in the sk_buff metadata, which is not required in the case of XDP.

To simplify the development of data plane code, Polycube provides a set of hook-independent helpers that facilitate the update of many packet fields (e.g., pcn_vlan_push_tag()). This is particularly useful for NF developers to provide a single version of a cube that works seamlessly, independently from the hook point at which it is attached. In particular, Polycube "wraps" the execution of the program with two additional components that are executed before and after a packet enters and/or leaves the cube. Those wrappers take care of converting the original context into a standard Polycube format and perform the reverse translation in the opposite direction (Section V).

Of course, it is still possible to retrieve the original context (e.g., the sk_buff or xdp_buff) to apply hook-specific operations (e.g., set custom XDP metadata, update specific skb values). In this case, Polycube allows the developers to separate the two versions of the code and compile the right one depending on the hook point selected when the cube is instantiated.
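One simple way to obtain this kind of hook independence is to map generic return values such as RX_OK and RX_DROP onto the hook-specific codes at compile time, so the same cube code builds for both TC and XDP. The sketch below illustrates the idea; Polycube performs an equivalent mapping internally, but the compile-time flag name used here (ATTACH_TO_XDP) is invented for the example.

```c
/* Illustrative sketch, not Polycube source: hook-independent return values. */
#include <linux/bpf.h>        /* XDP_PASS, XDP_DROP     */
#include <linux/pkt_cls.h>    /* TC_ACT_OK, TC_ACT_SHOT */

#ifdef ATTACH_TO_XDP
#define RX_OK    XDP_PASS
#define RX_DROP  XDP_DROP
#else                         /* TC hook */
#define RX_OK    TC_ACT_OK
#define RX_DROP  TC_ACT_SHOT
#endif
```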

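Referring back to the dynamic reloading of Section VI-E, the user-space side of the swap can be as small as a single map update, since PROG_ARRAY updates are applied atomically by the kernel (RCU-protected). The fragment below is an illustrative sketch, not Polycube source: error handling and file-descriptor management are omitted and the names are invented; it only assumes the libbpf bpf_map_update_elem() wrapper.

```c
/* Illustrative sketch, not Polycube source: atomically swapping a µCube by
 * updating its entry in a PatchPanel-like program array from user space. */
#include <bpf/bpf.h>
#include <linux/bpf.h>

int swap_cube_program(int patch_panel_fd, __u32 cube_index, int new_prog_fd)
{
    /* In-flight executions keep running the old program; new packets see the
     * new one as soon as the update completes. */
    return bpf_map_update_elem(patch_panel_fd, &cube_index, &new_prog_fd, BPF_ANY);
}
```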

With respect to the remaining issues, Polycube cannot overcome the limitations of eBPF helpers that are not available in all hook points (e.g., bpf_skb_get_xfrm_state(), available only in TC, or bpf_redirect_map(), available only in XDP); instead, it hides the different return values by defining new constants that are automatically mapped to the correct ones when the program is instantiated (e.g., RX_OK maps onto XDP_PASS).

The interaction between the fast and slow path also differs depending on the type of hook point in which the function is executed. As in the previous case, Polycube provides a transparent API for this interaction. An example is the Encapsulator and Decapsulator (Section VI-B), whose implementation depends on the hook point at which the service is attached; when a packet from the control/slow path of a service is injected into the fast path, polycubed selects the appropriate control program depending on the type of the service instance.

As a concluding remark, XDP Cubes are particularly useful to implement NFs whose final action would result in packets being redirected or dropped (e.g., DDoS mitigation, load balancing), given the early stage at which packets are processed (north-south traffic). A TC Cube is more suitable in situations where we cannot avoid the sk_buff allocation. For instance, containers operate on virtual devices such as veth to send packets to their destination (likely, another container; east-west traffic). In this scenario, in which the kernel operates only with the sk_buff, an XDP Cube will not bring any additional benefit. In addition, TC Cubes can leverage both the TC ingress and egress eBPF hooks, enabling the execution of two different pieces of code for packets entering or exiting a given interface. Vice versa, XDP supports only the ingress direction. Despite the above-mentioned differences, XDP and TC Cubes are mostly complementary to each other and can sometimes be used interchangeably or at the same time, depending on the use case.

Finally, although eBPF supports application-level data processing (e.g., to operate on the traffic generated by a local application) with the SOCKET attaching point, Polycube supports only hooks that operate on packets, namely TC and XDP.

G. Support for Large Cube Chains

While loops cannot be created within the eBPF code, they are still possible by performing the proper tail calls from one program to another. Hence, the eBPF subsystem enforces a hard limit of at most 32 chained programs for a single packet, which translates into a limit of at most 32 µCubes that can be called one after the other for a single packet. To support longer chains, Polycube keeps track of the number of hops (i.e., µCubes) traversed by the packet (using a mechanism similar to the one shown in Section V); once the 32-program limit is hit, the packet is sent to a dedicated slow path in polycubed and then injected back into the kernel at the same point of the chain. In this way, the 32-program limit is reset and the packet can continue the normal processing. However, to avoid a possible performance degradation, this behavior is disabled by default.

Fig. 4. (a) Transparent cubes attached to a port of the service. (b) Transparent cube attached to a network device.

H. Transparent Services

When a packet is received on a specific port, a Polycube standard Cube can either (i) forward it to one of its output interfaces, (ii) redirect it to the slow path/control plane, (iii) return it to the Linux stack to continue its journey (the RX_OK action in Polycube), or (iv) drop it (RX_DROP). This process may depend on the specific port configuration and the behavior configured for each service. For example, a router Cube checks the routing table for each incoming packet, determines the next-hop address and forwards the packet accordingly, using the port that connects to the next-hop router. On the other hand, there are services whose duty consists simply in determining whether the packet has to continue its journey or not (i.e., either RX_OK or RX_DROP), possibly with updated protocol headers. Possible examples are firewalls (packets can either be forwarded or dropped), NATs, load balancers (packets have to be transformed according to specific rules), or even traffic monitors (no actions at all, just counter updates).

While it seems that the above functions would also need to forward a packet on a specific port, we can note that the forwarding decision is always taken through a logic that is orthogonal to the function itself; in fact, the forwarding can happen at L2 (e.g., using bridging rules), at the network layer (e.g., using the IP routing mechanism), and more. Hence, having a fully functional firewall (or NAT, or load balancer) would imply integrating the forwarding logic into it, with all the associated variants (e.g., L2 or L3?), with a clear duplication of functions and additional complexity in terms of code maintenance.

To accommodate the above functions and to maintain a high degree of modularity, Polycube introduces the concept of a transparent cube, i.e., a cube that cannot take any forwarding decision, hence cannot live alone and must be attached to either a port of a standard cube or a Linux netdevice. Possible examples are depicted in Figure 4. The data plane of a transparent cube has only three allowed return values: (i) redirect the packet to the slow path/control plane, (ii) return it to the calling entity (e.g., a Linux netdevice, or a Polycube standard cube) (i.e., RX_OK), or (iii) drop it (i.e., RX_DROP). By design (and for the sake of simplicity), a transparent cube can be attached to a single port (of a Cube/netdev); the case depicted in Figure 4a requires two NAT instances, each one operating on a single port of the router. Optionally, services can share some data, e.g., to coordinate the behavior of different instances (e.g., stateful filtering in firewalls).


A transparent cube is composed of an ingress pipeline that is called when a packet enters the service, and an egress pipeline that is called when the packet leaves the cube. In case the transparent service is attached to a netdev, the ingress pipeline is applied to incoming traffic, either in XDP or TC_INGRESS, while the egress pipeline is always executed in the TC_EGRESS hook, given the unavailability of the egress hook in XDP. Transparent cubes can be stacked; in this case, they need to specify their position with respect to the others, hence defining the order of execution of the services. When attached, they have the possibility to inherit some specific parent's port configurations that can be used to automatically configure the service itself. For example, the NAT service attached to a router's port in Figure 4 can read the corresponding IP address and use it for the address translation; the same happens if the NAT is attached to a netdev, the IP address being the one associated to the above network port.

VII. MANAGEMENT AND CONTROL PLANE

The capability to add (or remove) a network function dynamically (even from a remote server) into polycubed provides several advantages, such as the possibility to update an existing NF, adding functionality without modifying the network functions currently deployed and running. To support this model, the Polycube core (i.e., polycubed) has been designed to be completely independent from the type of network function that is installed. Polycubed has no idea of how the network function is composed internally or what its functionalities are, and it only takes care of forwarding the request to the proper NF instance. This approach does not require changes to polycubed whenever changes to the individual NF are needed; when the NF is being updated, it is unplugged from the framework, updated and plugged in again without affecting existing NFs. On the other hand, it complicates the NF design, which has to define the interface to the outside (i.e., the REST APIs). To simplify this process, Polycube uses YANG [48] models, each one describing a specific NF, to automatically synthesize the REST interface of the NF.

A. Model-Driven Service Abstraction

The YANG data modeling language allows developers to (i) model the structure of the data and the functionalities provided by the Polycube NF, (ii) define the semantics of the NF data and their relationships and (iii) express their syntax, which will be used to interact with the NF itself. When a new NF is registered, polycubed reads the provided YANG model and generates an internal representation of the NF data together with a specific path mapping table used to access those data from outside. Whenever a new request for that NF arrives, polycubed performs a validation of the input data (e.g., checking the correct format of an IP address, or that ports are in a given range) according to the information specified in the YANG model, without having to rely on the NF itself for those "ancillary" tasks.

1) Service Base Data Model: Polycube provides to the developers a basic NF structure that can be extended to compose the desired network function, offering fundamental abstractions (e.g., VNF ports, port peers, hook type) that are used to simplify the interaction between the different NFs and system components. The basic structure, shown in the YANG Listing I, reflects the internal representation of a Polycube NF, with its primary parameters and components. Each NF within the framework is uniquely identified through a name, which is specified using the service-name extension; Polycube does not allow multiple NFs with the same name. The service-min-kernel-version is used to indicate the minimum kernel version required to execute the NF, since some eBPF functionalities are available only on newer kernel versions; when the NF is loaded, Polycube checks if the host is running a kernel version greater than or equal to this value. Finally, the service-description and service-version are used to describe the current NF. While the previously mentioned information describes the NF itself (e.g., a firewall Polycube NF), the variables under the grouping statement are specific to each NF instance (e.g., a firewall fw1). Each instance is identified with a name, which is unique inside the NF scope, the hook point to which the instance is attached, and a list of ports, identified with a unique name inside the NF instance and a peer.

2) Automatic REST API Generation: Polycube uses the information in the YANG model to automatically derive the set of REST APIs that are used to interact with the NF. Each YANG resource is automatically mapped to a specific URL, while the different HTTP methods are used to identify the operations required for a particular resource. The GET operation allows obtaining the current value of a given resource on the Polycube NF, the POST operation is used to create an instance of the resource, the PATCH operation is used to modify the current value of the resource and, finally, the DELETE operation is used to delete the specified resource. This approach is similar to the one adopted by the RESTCONF specification [49], which aims at providing a programmatic interface for accessing data defined in YANG, allowing any client to communicate with the Polycube NF by just knowing its YANG module.

VIII. IMPLEMENTATION

A. Polycube Core

As of today, the code of Polycube, i.e., polycubed, consists of 28k lines of C++ code, running within an unmodified Linux without having to install custom drivers or specific kernel modules. It only requires v4.15 as the minimum kernel version to run the daemon; then, each NF may have its own requirements depending on the functionalities that are used (e.g., eBPF helpers). The Polycube daemon contains both the code required to handle the different Polycube NFs and the service-agnostic server proxies, which parse at runtime the YANG model of every loaded NF to generate the appropriate REST API and to perform the validation of the NF parameters within the server itself.

Polycube is built around the BPF Compiler Collection (BCC) [52], which provides a set of abstractions to interact with eBPF data structures and to load/unload eBPF programs, together with a compilation toolchain that includes Clang/LLVM to allow a dynamic generation of the eBPF code that is injected in the kernel. Polycube extends those abstractions with additional helper functions targeted to networking services and to the Polycube NF structure. In particular, the availability of the compilation toolchain allows re-compiling the code at runtime, enabling more aggressive optimizations that can be dynamically applied within the NF. Although this approach, based on BCC, looks more limiting than other recent proposals such as BPF CO-RE (Compile Once—Run Everywhere), we can note that (i) the data structures available for TC and XDP programs are rather stable, different from the case of generic eBPF tracing programs that can attach to any point in the kernel, and (ii) such approaches can be supported in a future version of Polycube in which the compilation of the data plane program can be performed on a remote server.
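The runtime re-compilation just described is what lets the control plane compile unused features out of the data path before re-injecting it. The fragment below is an illustrative sketch written for this discussion, not Polycube source code: ENABLE_VLAN_HANDLING is a hypothetical macro that a control plane could define (or leave undefined) when it regenerates and recompiles the program.

/* Sketch only: an optional feature guarded by a macro set at (re)compilation
 * time; when unset, the VLAN branch is absent from the injected eBPF code. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct vlan_hdr {                 /* minimal 802.1Q header definition */
    __u16 h_vlan_TCI;
    __u16 h_vlan_encapsulated_proto;
};

SEC("xdp")
int handle_packet(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;
    __u16 h_proto;

    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;
    h_proto = eth->h_proto;

#ifdef ENABLE_VLAN_HANDLING
    /* Compiled in only when the user actually configured VLANs:
     * skip the 802.1Q tag to reach the encapsulated protocol. */
    if (h_proto == bpf_htons(ETH_P_8021Q)) {
        struct vlan_hdr *vh = (void *)(eth + 1);
        if ((void *)(vh + 1) > data_end)
            return XDP_DROP;
        h_proto = vh->h_vlan_encapsulated_proto;
    }
#endif

    /* ...the rest of the (omitted) pipeline would dispatch on h_proto... */
    return h_proto == bpf_htons(ETH_P_IP) ? XDP_PASS : XDP_DROP;
}

char _license[] SEC("license") = "GPL";

In Polycube the same effect is obtained by re-generating the source with different definitions and re-compiling it through the embedded BCC toolchain, as discussed above.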


TABLE II
A LIST OF NFs IMPLEMENTED WITH POLYCUBE

B. Polycube Network Functions

To stress test the generality of the Polycube programming model and abstractions, we have implemented a large range of network functions. Table II briefly describes the type and role of each of them and the lines of code (LoC) required to implement the fast path (FP), the slow path (SP) and the control path (CP). In the latter parameter we distinguish the LoC automatically generated from the YANG model (i.e., CP/AU) and the ones manually written (i.e., CP/MAN).

At the time of writing, there are 18 different NFs implemented in Polycube, with an overall number of about 54k lines of code, which include both the C++/C code of the control and slow path and the eBPF code of the fast path. These implementations suggest that Polycube succeeds in the role of providing a generic and highly customizable framework that can be used to implement a wide variety of NFs.

IX. EVALUATION

In this section we evaluate the performance of a set of Polycube NFs and we compare the results with existing in-kernel implementations. First, we evaluate the performance of standalone Polycube NFs and then we move to more complex scenarios where chains of NFs are involved (Section IX-B). Finally, we evaluate the overhead imposed by the Polycube programming model when compared to baseline programs written using vanilla eBPF (Section IX-C).

A. Setup

We run our experiments on a server equipped with an Intel Xeon Gold 5120 14-core CPU @2.20GHz (hyper-threading disabled), 19.25 MB of L3 cache and two 32GB RAM modules. The packet generator is equipped with an Intel Xeon CPU E3-1245 v5 4-core CPU @3.50GHz (8 cores with hyper-threading), 8MB of L3 cache and two 16GB RAM modules. We used Pktgen-DPDK [53] to generate 64-byte UDP packets and to count the received packets. Each server has a dual-port Intel XL710 40Gbps NIC, directly connected to the corresponding one on the other server. Both servers run Ubuntu 18.04.1 LTS, with the DUT running kernel v5.6 and the eBPF JIT flag enabled (the default behavior for newer kernels). For latency tests, we used Moongen [54], which generates the same traffic pattern as before but exploits the hardware timestamp of the NIC to determine the time a frame spends to return back to the sender; by default, one frame every millisecond is sampled. All tests were repeated ten times and the figures contain error bars representing the standard error calculated from the different runs.


Fig. 5. Packet forwarding throughput comparison between the Polycube pcn-bridge NF (in both XDP and TC mode) and "standard" Linux implementations such as Linux bridge (brctl) and OpenvSwitch (ovs).

Fig. 6. Throughput performance between a Polycube load balancer NF (i.e., pcn-lbdsr), ipvs, the standard L4 load balancing software inside the Linux kernel, and Katran, an XDP-based load balancer developed by Facebook.

B. Test Applications

1) Case Study 1 (L2 Switch): In this test scenario, we evaluate the performance of a Polycube NF that emulates the behavior of a fully functional L2 switch with support for VLAN and Spanning Tree Protocol (STP). The pcn-bridge data plane is implemented entirely in eBPF, including the MAC learning phase. More complex functionalities, such as the handling of STP protocol BPDUs or flooding as a result of a miss in the filtering database, are relegated to the slow path, given the impossibility of performing such actions entirely in eBPF. We measure the UDP forwarding performance between pcn-bridge and commonly used L2 switch Linux tools such as Linux bridge (v1.5) and OpenvSwitch (v2.13).9 In our test, pcn-bridge supports a maximum of 4K MAC entries, and we generate the traffic from 1K different source MACs towards 100 destination MACs. We start the test by sending traffic in both directions, in order to "warm up" the MAC table, and then we measure the throughput in one direction. In this test, the slow path of pcn-bridge is involved when the packet has to be flooded, given the impossibility to clone and redirect a packet directly in the fast path towards multiple destination ports.10 We can clearly notice, from Figure 5, that pcn-bridge outperforms the other tools in both TC and XDP mode, with a performance gain of about 3.6x for the latter. The advantages of XDP are more evident since packets are processed directly at driver level, avoiding the overhead given by the allocation of kernel data structures, which may be unnecessary for the simple forwarding use case. Moreover, we can notice how the performance of our pcn-bridge NF scales linearly with the number of cores used, reaching 10Gbps throughput with 64B packets with six cores involved.11

2) Case Study 2 (Load Balancer): In this test, we measure the performance of a Polycube load balancer NF (i.e., pcn-lbdsr). As for the previous scenario, this NF is implemented as a single µCube, with the data plane entirely handled in eBPF (no slow path is involved). Polycube pcn-lbdsr can be configured with a list of virtual IPs (VIP), each one with an associated list of back-ends; we use the Maglev [50] hash to select the back-end server, which provides a better resilience to back-end server failures, a better distribution of the traffic load among the back-ends and the possibility to set different weights for each back-end server. To test this scenario, we used a fixed number of hosts,12 and we compared the throughput results with IPVS v1.28 and Katran [1], an eBPF/XDP-based load-balancer developed and released as open source by Facebook. Figure 6 shows the results of this test, with pcn-lbdsr in TC and XDP mode outperforming ipvs by a factor of 2.2x and 6.5x respectively. Moreover, we want to notice that pcn-lbdsr in XDP mode offers performance comparable with (or even higher than) Katran. This result is mainly given by the heavy use the Polycube NFs make of the dynamic reloading feature, which allows the NF to better adapt to the runtime configuration by compiling out features that are not required at runtime, hence improving the overall data plane performance. The other big difference is that Katran is a standalone application built only for the load-balancing use case; on the other hand, Polycube offers a general framework to build and create complex NF chains, allowing to create more complex network topologies, while still providing performance comparable with "native" eBPF implementations.

3) Case Study 3 (Firewall): The Polycube pcn-firewall NF is implemented as a series of µCubes that compose the entire firewall pipeline (even in this case, the slow path is not involved). It implements the Linear Bit Vector Search (LBVS) [56] classification algorithm to filter packets, with a sequence of µCubes, each one in charge of handling specific fields of the packet (e.g., IP source, destination, protocol, etc.).13 Moreover, it is also able to "communicate" with the Linux routing table to check the next hop address before forwarding a packet to the egress interface.14

9 All the tests with OpenvSwitch have been carried out on kernel v5.0 since it does not support earlier versions.
10 This limitation holds only for the XDP version of pcn-bridge, while the TC version can exploit the bpf_clone_redirect helper to send a packet towards all the bridge's ports.
11 To gradually use all cores, we have configured the hardware filtering on the NIC (i.e., Flow Director [55] rules) to redirect different flows to distinct Linux NIC's RX queues.
12 We used the same configuration of [38] for the load balancer use case, setting one virtual IP per CPU core and 100 back-end servers for each VIP.
13 A more detailed explanation of the Polycube firewall architecture is provided in [51], whose data and control plane has been implemented as a standalone Polycube NF.
14 eBPF provides a specific helper that can be used to lookup the FIB table directly from the XDP code.
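To make the fast/slow path split of Case Study 1 more concrete, we sketch below the MAC-learning step of an XDP fast path. This is an illustrative fragment written for this discussion, not the actual pcn-bridge source: the map layout, its size and the miss behavior are assumptions.

/* Illustrative sketch: learn the source MAC, look up the destination MAC,
 * and leave filtering-database misses (which require flooding) to the
 * slow path; XDP_PASS stands in for the real slow-path delivery. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

struct mac_key {
    __u8 mac[ETH_ALEN];
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 4096);          /* 4K MAC entries, as in the test */
    __type(key, struct mac_key);
    __type(value, __u32);               /* output interface index */
} fwd_table SEC(".maps");

SEC("xdp")
int bridge_fast_path(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;
    struct mac_key src = {}, dst = {};
    __u32 in_port = ctx->ingress_ifindex;
    __u32 *out_port;

    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;

    /* Learning: remember on which port the source MAC was seen. */
    __builtin_memcpy(src.mac, eth->h_source, ETH_ALEN);
    bpf_map_update_elem(&fwd_table, &src, &in_port, BPF_ANY);

    /* Forwarding: a known destination is redirected from the fast path. */
    __builtin_memcpy(dst.mac, eth->h_dest, ETH_ALEN);
    out_port = bpf_map_lookup_elem(&fwd_table, &dst);
    if (out_port)
        return bpf_redirect(*out_port, 0);

    /* Miss: flooding cannot be done here, hand over to the slow path. */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

The TC flavor of the same service could instead rely on bpf_clone_redirect to flood from the fast path, as noted in footnote 10.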


Fig. 7. Throughput performance comparison with 1000 rules between a Polycube firewall NF (i.e., pcn-firewall), iptables and nftables, which are two commonly used Linux firewalls, and OpenvSwitch (ovs) with OpenFlow rules.

Fig. 8. Performance of different k8s network providers for direct Pod to Pod communication.

Fig. 9. Performance of different k8s network providers for Pod to ClusterIP communication.

Fig. 10. Throughput and latency overhead of the Polycube service chain compared to "vanilla" eBPF.

For this test we used a synthetic ruleset generated by classbench [57], with all the rules matching the TCP/IP 5-tuple. The default rule of the firewall is to drop all the traffic, while only the matching flows are "allowed" and redirected to the second interface of the DUT. The generated traffic is uniformly distributed among all the rules so that all generated packets should be forwarded. We compare the performance of pcn-firewall with both iptables and nftables, the most widely used packet filtering software in the Linux subsystem today. Then, we also load the same ruleset as a set of OpenFlow rules in the OpenvSwitch pipeline15 and we measure the performance under the same conditions mentioned before. The results are shown in Figure 7. Even in this case, we can clearly see how pcn-firewall outperforms the existing solutions by a 31.8x, 7.5x and 1.4x factor respectively for nftables, iptables and ovs. The reason for this is twofold. First, pcn-firewall implements a faster classification algorithm compared to the linear scanning implemented by Linux-native firewalls and, second, it can adopt more aggressive optimizations thanks to the dynamic reloading feature of Polycube, allowing the control plane to specialize the packet processing behavior depending on the actual firewall configuration (e.g., the deployed ruleset).

4) Case Study 4 (K8s Network Provider): To demonstrate the capability of Polycube to enable the creation of complex applications created by chaining different network functions together, we present a real-world use case that can be implemented within Polycube and the type of performance improvements that we can expect. In particular, we implemented a CNI plugin for Kubernetes [29], one of the most important open source orchestration systems for containerized applications. A K8s network provider must implement Pod-to-Pod communication,16 provide support for ClusterIP services17 and security policies; our prototype supports all of them. Our overall design includes five different components: a NAT NF (pcn-nat), a L3 routing module (pcn-router), a L2 switch application (pcn-bridge), a load balancer (pcn-lb) and a firewall component (pcn-firewall). These NFs are chained together by means of Polycube APIs, as shown in Figure 11, and are configured to support the main operations required by the k8s network plugin interface. Tests were carried out on a 3-node cluster, a master and two workers, with Linux kernel v4.15, Intel Xeon CPU E3-1245v5 @3.50GHz and dual-port Intel XL710 40Gbps NIC cards connected point-to-point. We report the TCP throughput measured with iperf3 using the default parameters; the server was always running in a Pod, while the client was either in a physical machine or in another Pod depending on the test.

15 To generate the OpenFlow rules we used Classbench-ng [58].
16 A Pod is the smallest manageable unit in a k8s cluster and is composed of a group of one or more containers sharing the same network.
17 A ClusterIP is a type of service that is only accessible within a Kubernetes cluster through a virtual IP. When a Pod communicates with this virtual IP, the request can be mapped to an arbitrary Pod running within the same physical host or into another one.
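The classification speed-up reported for Case Study 3 comes from the LBVS scheme: each field-specific stage produces a bit vector of the rules compatible with that field, and the verdict is the first rule that survives the intersection of all vectors. The fragment below is our own simplified, plain-C rendition of that bitwise core (two fields only, tiny fixed sizes); it is not the pcn-firewall implementation.

/* Simplified LBVS core: AND the per-field bit vectors and return the
 * lowest-numbered (highest-priority) matching rule. Sizes are illustrative. */
#include <linux/types.h>

#define MAX_RULES 128
#define WORDS     (MAX_RULES / 64)

static int match_rule(const __u64 src_ip_bv[WORDS],
                      const __u64 dst_port_bv[WORDS])
{
    for (int w = 0; w < WORDS; w++) {
        __u64 both = src_ip_bv[w] & dst_port_bv[w];
        if (both)
            /* Index of the first set bit = first matching rule. */
            return w * 64 + __builtin_ctzll(both);
    }
    return -1;  /* no rule matched: apply the default action (drop) */
}

In the real service each stage is a separate µCube (Section IX-B3) with its own matching map; how the partial vectors are carried between the stages is an implementation detail that we do not show here.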


Fig. 12. Single core throughput of the pcn-bridge Cube with a different percentage of packets (64B) sent to the slow path.

Fig. 11. Architecture of the Polycube K8s plugin.

TABLE III
COMPARISON BETWEEN VANILLA-EBPF APPLICATIONS AND A POLYCUBE NETWORK FUNCTION. ALL THROUGHPUT RESULTS ARE SINGLE-CORE

In Figures 8 and 9 we assess the performance of our Polycube network provider compared to other existing solutions. In particular, we consider the Pod-to-Pod connectivity and the Pod-to-ClusterIP connectivity. Results show that the Polycube k8s plugin reaches 15-20% higher throughput than other solutions in the case where server and client are on the same node. When pods are on different nodes, the advantage of the plugin becomes less evident because of the influence of the physical network, but it is still better than other solutions. Indeed, those providers often rely on existing kernel components, such as iptables and Linux bridge, that we proved in the previous cases to be less efficient than the Polycube counterparts.

Although our k8s plugin achieves better performance than the others in the two cases under consideration, it is always comparable in terms of functionality with the existing solutions, which are both more stable and complete. The main purpose here is to demonstrate the generality of the Polycube programming model and the performance benefits that can be obtained from eBPF-based NFs.

C. Framework Overheads

In this section we evaluate the overheads imposed by the Polycube programming model when compared to baseline eBPF programs written outside the Polycube environment.

1) Overhead for Simple NFs: To measure the baseline performance and the overhead introduced by the Polycube abstraction model to a single NF, we implemented the same operations performed by the xdp_redirect application [59], available under the Linux samples, as a standalone NF inside the Polycube framework (i.e., pcn-simplefwd). The application receives traffic from a given interface and, after swapping the source and destination L2 addresses of the packet, redirects it to a second interface.

Table III shows a comparison between two very simple vanilla eBPF applications and a Polycube NF that performs the same operations, attached to either XDP or Traffic Control (TC) hooks. As we can notice, Polycube introduces a very small overhead compared to vanilla eBPF applications, both in terms of throughput (6.86Mpps vs 6.97Mpps) and latency/jitter (6µs vs 5.93µs), which is required to provide the abstractions mentioned before (e.g., virtual ports). This is mainly given by the additional processing that happens before and after calling the fast path of the NF, which is totally hidden from the NF developer. As a result, the number of LoC for both the fast path (FP) and the slow and control path (S/CP) is considerably reduced, allowing the developer to focus on the core logic of the program and leaving the common tasks and the possible optimizations to the Polycube daemon. Note also that the sample vanilla applications that we are taking into account are extremely simple; for more complex applications, a developer using vanilla eBPF has to implement, for example, the entire fast-slow path interaction, which requires a non-negligible amount of effort.

2) Overhead for Chained NFs: Figures 10(a) and (b) show the overhead introduced by the Polycube service chain in both throughput and latency, compared to the standard eBPF tail call mechanism. Of course, eBPF does not support any type of abstraction required by a NF framework; a tail call performs only an indirect jump from one eBPF program to another. On the other hand, Polycube uses a set of additional components and abstractions that are executed before a packet enters and leaves a NF, e.g., to ensure isolation or the virtual port abstraction. This additional overhead is more evident when a packet runs through an increasing number of virtual Cubes, but it is necessary to ensure the correct execution of the NF chain. On the other hand, if the same Cube is composed of a set of µCubes, the overhead of crossing different µCubes is almost negligible, reflecting the same behavior of the tail call mechanism in the "standard" eBPF approach.

3) Slow Path Performance: To overcome the limitations of the eBPF sandbox, Polycube extends the in-kernel fast path with a slow path that can be used to perform operations on packets that cannot be done in the kernel (Section IV). In this subsection, we evaluate the performance of such a component. In particular, we evaluated the pcn-bridge function shown before and we sent an increasing percentage of 64B packets to trigger the slow path processing (i.e., to perform flooding).
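The hand-off being measured here is, in vanilla eBPF terms, a copy of the packet from the kernel fast path to a user-space process. The sketch below shows the generic building block commonly used for this purpose, a perf event channel; the map name, metadata layout and length cap are our assumptions and this is not the code of the Polycube Demultiplexer.

/* Sketch: push a packet (plus small metadata) to user space through a
 * perf event array, as a generic stand-in for a fast-to-slow path hand-off. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} slow_path SEC(".maps");

struct pkt_md {
    __u32 ifindex;   /* where the packet was received */
    __u32 reason;    /* e.g., "flooding needed" */
};

SEC("xdp")
int to_slow_path(struct xdp_md *ctx)
{
    struct pkt_md md = {
        .ifindex = ctx->ingress_ifindex,
        .reason  = 1,
    };
    __u64 len = ctx->data_end - ctx->data;

    if (len > 256)      /* cap the copied payload */
        len = 256;

    /* The upper 32 bits of the flags ask the kernel to append 'len' bytes
     * of packet payload after the metadata sample. */
    bpf_perf_event_output(ctx, &slow_path,
                          BPF_F_CURRENT_CPU | (len << 32), &md, sizeof(md));

    /* The in-kernel copy keeps flowing while user space handles its copy. */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

On the user-space side, a reader of the corresponding perf buffer would flood or re-inject the packet, which is exactly the double copy and context switch identified below as the cost of this path.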


Fig. 13. Single core CPU usage comparison between pcn-simplefw (XDP) and ovs (DPDK). The ovs (DPDK) line continues at 100% up until reaching maximum performance (i.e., 14.88Mpps with a single OpenFlow output action). NFVNice plots only the 4th core, excluding the three ones running at 100% CPU; CPU consumption grows until 100% with 14Mpps.

The results shown in Figure 12 demonstrate that for higher rates the slow path becomes a bottleneck. This overhead is mainly given by the double packet copy (and context switch) required to deliver the packet to the userspace application and send it back to the kernel, where it is injected into the Polycube service chain or to the output interface.

We note however that most of the applications implemented in Polycube do not require a high usage of the slow path, as shown in Table II, and for those where it is required, the percentage of slow path packets falls below 5%, which is still acceptable. If a developer requires a higher slow path usage, other mechanisms should be considered, such as AF_XDP [60], which however we do not cover in this article but which is part of our future work.

X. ADDITIONAL DISCUSSION

This section summarizes the most frequent discussions about Polycube, which are often due to the complexity of eBPF, a recent technology that is not yet fully known by the scientific community.

A. Polycube vs Other Userspace NFV Frameworks

The difference between kernel and user-space networking is well known in the literature, with pros and cons on both sides that we partially explored in Section II. Høiland-Jørgensen et al. [38] have analyzed the performance differences between XDP and DPDK, showing a gap between the two approaches in favor of the latter; our tests simply confirm previous results. This overhead is almost inevitable, and is mainly given by the generality of the Linux operating system, which is structured in a way that makes it easier to support different use cases and not only I/O intensive applications. However, it is important to note that for lower rates, Polycube provides better advantages in terms of CPU consumption compared to kernel bypass approaches. As an example, Figure 13 compares the pcn-simplefw Cube against OvS-DPDK and NFVNice [61], a DPDK-based solution that provides fair scheduling of NFs without a continuous poll on the NIC. For NFVnice, we tested the bridge NF available in their repository [62], which actually looks like a simple forwarding module. To run this application, NFVnice requires a total of 4 cores; three are used for master and TX/RX, plus another core for the actual NF, whose consumption increases with the traffic sent. Results confirm our initial statement that Polycube provides an excellent performance/efficiency trade-off compared to the existing solutions, particularly when assuming that the server workload does not include only NFV applications.

B. Reloading µCubes in Polycube

Polycube allows developers to reload the code of every single µCube inside the NF by using the pcn_reload() function (Section VI-E). This requires Polycube to re-compile the µCube's code by using the embedded Clang+LLVM toolchain, and to inject the code in the kernel. The compilation time depends on the complexity of the code and the number of µCubes that have to be replaced. For the majority of NFs shown in Table II this time varies between 250 and 900ms. This value has to be added to the time required to inject the µCube in the kernel, which is about 1ms for a single µCube or 50ms for more complex Cubes such as pcn-firewall. While the above control plane operations are in progress, the existing data plane pipeline continues to process packets, with an atomic swap of a function pointer performed only when the new pipeline is ready. To verify the impact of this operation, we measured the throughput that provides less than 1% of packet loss for pcn-simplefw with different reload rates (i.e., starting from one every 10ms), without noticing any change in the obtained numbers.

C. Scalability in Polycube

eBPF programs (and then µCubes) are executed on the same CPU core where the interrupt is dispatched, which means that the same Cube can be executed in parallel on different CPU cores, based on the outcome of the Linux interrupt load-balancing algorithm. This behavior can be customized either by configuring the hardware RSS of the NIC or by using the bpf_redirect_cpu() helper to "redirect" a specific set of flows to a different CPU core, where the same chain of Cubes will be executed in parallel. Consequently, the scalability of Polycube functions is almost automatic, with no tasks left to the programmer (except using per-CPU data structures whenever possible, which avoids cache realignments between CPU cores).

D. Interactions Between Cubes and External Services

The interaction between Cubes and external services (e.g., to read the content of a database, or to receive notifications from other application-level utilities) is devolved to the Cube's control plane, which can then "convert" the received information into data that can be pushed into the eBPF data plane (e.g., through maps). This limitation originates from the sandboxed nature of eBPF, which provides a very safe (but, in some cases, limiting) environment.

E. Isolation Between Cube's Chains

When instantiated, each Cube is associated with a different namespace that guarantees the proper resource isolation (e.g., eBPF maps in different cubes are not visible by default). This behavior can be modified when needed, allowing Cubes to share resources, such as a connection tracking table reused across different Cubes.

On the other hand, Polycube does not support the concept of having multiple tenants that are in charge of different NF chains. Although this behavior can be emulated with the namespaces mentioned before, multi-tenancy would require additional support inside Polycube to split traffic among the chains associated to different users. In this case, Polycube should instantiate an additional hidden program in front of the pipeline (following the same approach explained in Section V) that parses the packets and redirects them to the right tenant chains, as defined by high-level rules.
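As a small illustration of the only scalability task that Section X-C leaves to the programmer, the fragment below keeps a per-CPU counter: every core writes its own slot, so cores never contend on the same cache line. It is a generic example, not code taken from a Polycube service.

/* Per-CPU counter: each core updates its private slot; user space reads one
 * value per possible CPU for the same key and sums them when collecting. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_counter SEC(".maps");

SEC("xdp")
int count_packets(struct xdp_md *ctx)
{
    __u32 key = 0;
    __u64 *val = bpf_map_lookup_elem(&pkt_counter, &key);

    if (val)
        (*val)++;   /* no atomic needed: the slot is private to this CPU */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";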


F. Polycube vs. Hardware Approaches (e.g., P4)

Polycube might benefit from the P4 language as an additional way to express the data plane semantics and the operations performed when a packet is received. We explored this by extending the p4c compiler [63] to convert the P4 code into a Polycube-compatible format (using a polycube.p4 model), with the code available at [64]. This would first provide a "standard" way and semantics to express typical operations on a packet (e.g., parsing of headers, extraction of fields). The P4c-Polycube compiler can then implement common operations (e.g., packet parsing, header extraction) in a standard and more efficient way, avoiding potential performance differences that could come up with NFs written and deployed by different vendors.

XI. LIMITATIONS AND FUTURE WORK

We now briefly discuss Polycube's main limitations and possible opportunities for future work.

Sharing µCubes across different NFs: Currently, Polycube does not allow sharing a µCube among different NFs. A partial solution consists in duplicating the eBPF source code in both NFs, which results in the creation of two separated µCubes. However, this prevents a µCube with common operations (e.g., protocol header parsing) from being shared at run-time among different NFs, which could potentially improve the efficiency of the entire eBPF service chain. A possible future work may consist in analyzing the chain of instantiated Polycube NFs and removing the redundant features at run-time, by re-injecting a customized and optimized version of the original Polycube service chain.

Scalability of Polycube's slow-path: Figure 3 showed the mechanism used to send packets from the (kernel) fast path to the (user-space) slow path for each running Cube. Since, in the current prototype, polycubed's Demultiplexer is a unique component shared among all the Cubes, it could become a potential bottleneck in case of slowpath-intensive services. This could be improved by instantiating a separate perf ring buffer for each Cube instance and using a separate thread to pull packets from it. However, the current solution is simpler and is appropriate in case services make infrequent use of the slow path, as stated in Section IX-C3.

XII. RELATED WORK

We elaborate in this section the related efforts beyond the work mentioned throughout this article.

Network Functions With eBPF: The first work that proposed the use of eBPF as a subsystem to implement NFs was presented in 2015, almost one year after the patch that added support for eBPF in the Linux kernel [65]. In their work, Sánchez and Brazewell [66] proposed an architecture for a "tethered" CPE with an implementation of the data plane entirely based on eBPF to obtain unprecedented efficiency, flexibility and portability across a wide range of hardware platforms. Following this approach, Ahmed et al. [67] proposed a network virtualization platform, called InKeV, that uses eBPF as the base execution engine for building in-kernel virtualized network functions. InKeV is probably the closest work to Polycube. Both works share the idea of using eBPF to build more complex data plane applications inside the Linux kernel, but they differ significantly in the design goals and details. Polycube aims at building a NF framework that fits better in the modern cloud era where loosely coupled, dynamically re-configurable and more specialized services are a must. Moreover, it focuses on building a well-defined communication interface for each service that is exposed externally, providing an automatic mechanism to generate the RESTful API that an orchestrator and other services can use for communication. Last but not least, Polycube provides the implementation of more than eleven different network functions that are ready to use and can be chained and configured properly to create custom and user-defined virtual network topologies. On the other hand, InKeV does not provide such a framework but only a draft implementation of the functionalities presented in this article.

Industry Efforts: Within industry, many companies started providing networking solutions based on eBPF. Facebook presented Katran [1], an XDP-based load-balancer deployed on Facebook's points of presence. Cloudflare also introduced L4Drop [68], an XDP DDoS mitigation solution used in their system. Cilium [69] brings network security filtering mechanisms to Linux container frameworks like Docker and Kubernetes; they use eBPF to enforce both network and application-layer security policies on the containers and pods. All of these solutions can be considered as separate implementations that serve a specific purpose. On the other hand, Polycube offers a common software framework that addresses all the possible issues, limitations and optimizations that can be encountered within the eBPF subsystem. Moreover, it enables the interconnection of different services to create more complex applications and topologies, bringing the advantages of NFV to the world of in-kernel packet processing applications, something that, to the best of our knowledge, is not currently available in any other open-source framework.

XIII. CONCLUSION

This article presents Polycube, a framework for developing, deploying and managing in-kernel virtual network functions. While most of the NFV frameworks today rely on kernel-bypass approaches, allowing userspace applications to directly access the underlying hardware resources, Polycube brings all the advantages and power of NFV to the world of in-kernel packet processing. It exploits the eBPF subsystem available in the Linux kernel to dynamically inject custom user-defined applications into specific points of the Linux networking stack, providing an unprecedented level of flexibility and customization that would have been unthinkable before. In addition, Polycube has been created with the new requirements brought by the microservice evolution in cloud computing in mind. Polycube NFs follow the same continuous delivery development of server applications and are able to adapt to continuous configuration and topology changes at runtime, thanks to the possibility of dynamically injecting and updating existing NFs without any traffic disruption. At the same time, they offer a level of integration and co-existence with the "traditional" ecosystem that is difficult and inefficient to achieve with kernel-bypass solutions, enabling a high level of introspection and debugging that are fundamental in this environment.

We have implemented a vast range of applications with Polycube and shown that it is not only easy to deploy and to program but also improves the network performance of existing in-kernel solutions. Polycube adds a very small overhead compared to vanilla eBPF applications but provides several abstractions that simplify the programming and deployment of new NFs and enables the creation of complex topologies by concatenating different NFs together, while maintaining and improving performance. We are continuing to explore the possible improvements and automatic optimizations that are possible within Polycube, some of which are currently rather primitive or specific for each application. Finally, we have also made Polycube available to the community at [34].

ACKNOWLEDGMENT

The authors would like to thank the many students and colleagues who collaborated on this project and contributed with ideas, code, comments, and in particular A. Palesandro, F. Parola, J. Pi and A. Shaikh for their help to move the project forward. The authors also thank VMware and FutureWei for their generous support.

REFERENCES

[1] C. Hopps. (Sep. 2019). Katran: A High Performance Layer 4 Load Balancer. [Online]. Available: https://github.com/facebookincubator/katran
[2] N. Katta et al., "Clove: Congestion-aware load balancing at the virtual edge," in Proc. 13th Int. Conf. Emerg. Netw. Exp. Technol. (CoNEXT), 2017, pp. 323–335. [Online]. Available: https://doi.org/10.1145/3143361.3143401
[3] K. He et al., "PRESTO: Edge-based load balancing for fast datacenter networks," in Proc. ACM Conf. Special Interest Group Data Commun. (SIGCOMM), 2015, pp. 465–478. [Online]. Available: https://doi.org/10.1145/2785956.2787507
[4] T. Barbette et al., "A high-speed load-balancer design with guaranteed per-connection-consistency," in Proc. 17th USENIX Symp. Netw. Syst. Design Implement. (NSDI), Feb. 2020, pp. 667–683. [Online]. Available: https://www.usenix.org/conference/nsdi20/presentation/barbette
[5] M. Alizadeh et al., "CONGA: Distributed congestion-aware load balancing for datacenters," in Proc. ACM Conf. SIGCOMM, 2014, pp. 503–514. [Online]. Available: https://doi.org/10.1145/2619239.2626316
[6] V. Jeyakumar, M. Alizadeh, D. Mazières, B. Prabhakar, A. Greenberg, and C. Kim, "EyeQ: Practical network performance isolation at the edge," presented at the 10th USENIX Symp. Netw. Syst. Design Implement. (NSDI), 2013, pp. 297–311. [Online]. Available: https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/jeyakumar
[7] M. Nasimi, M. A. Habibi, B. Han, and H. D. Schotten, "Edge-assisted congestion control mechanism for 5G network using software-defined networking," in Proc. 15th Int. Symp. Wireless Commun. Syst. (ISWCS), 2018, pp. 1–5.
[8] S. Miano, R. Doriguzzi-Corin, F. Risso, D. Siracusa, and R. Sommese, "Introducing smartNICs in server-based data plane processing: The DDoS mitigation use case," IEEE Access, vol. 7, pp. 107161–107170, 2019.
[9] G. Siracusano and R. Bifulco, "Is it a SmartNIC or a key-value store? Both!" in Proc. SIGCOMM Posters Demos (SIGCOMM), 2017, pp. 138–140. [Online]. Available: http://doi.acm.org/10.1145/3123878.3132014
[10] K. Lazri, A. Blin, J. Sopena, and G. Muller, "Toward an in-kernel high performance key-value store implementation," in Proc. 38th Symp. Rel. Distrib. Syst. (SRDS), 2019, pp. 268–2680.
[11] Y. Le et al., "UNO: Unifying host and smart NIC offload for flexible packet processing," in Proc. Symp. Cloud Comput. (SoCC), 2017, pp. 506–519. [Online]. Available: http://doi.acm.org/10.1145/3127479.3132252
[12] A. Bremler-Barr, Y. Harchol, and D. Hay, "OpenBox: A software-defined framework for developing, deploying, and managing network functions," in Proc. ACM SIGCOMM Conf., 2016, pp. 511–524. [Online]. Available: https://doi.org/10.1145/2934872.2934875
[13] M. Kablan, A. Alsudais, E. Keller, and F. Le, "Stateless network functions: Breaking the tight coupling of state and processing," in Proc. 14th USENIX Symp. Netw. Syst. Design Implement. (NSDI), Mar. 2017, pp. 97–112. [Online]. Available: https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/kablan
[14] W. Zhang et al., "OpenNetVM: A platform for high performance network service chains," in Proc. Workshop Hot Topics Middleboxes Netw. Function Virtualization (HotMIddlebox), 2016, pp. 26–31. [Online]. Available: https://doi.org/10.1145/2940147.2940155
[15] G. P. Katsikas, T. Barbette, D. Kostić, R. Steinert, and G. Q. Maguire, Jr., "Metron: NFV service chains at the true speed of the underlying hardware," in Proc. 15th USENIX Symp. Netw. Syst. Design Implement. (NSDI), Apr. 2018, pp. 171–186. [Online]. Available: https://www.usenix.org/conference/nsdi18/presentation/katsikas
[16] R. Laufer, M. Gallo, D. Perino, and A. Nandugudi, "CliMB: Enabling network function composition with click middleboxes," SIGCOMM Comput. Commun. Rev., vol. 46, no. 4, pp. 17–22, Dec. 2016. [Online]. Available: https://doi.org/10.1145/3027947.3027951
[17] S. Palkar et al., "E2: A framework for NFV applications," in Proc. 25th Symp. Oper. Syst. Principles (SOSP), 2015, pp. 121–136. [Online]. Available: https://doi.org/10.1145/2815400.2815423
[18] A. Panda, S. Han, K. Jang, M. Walls, S. Ratnasamy, and S. Shenker, "NetBricks: Taking the V out of NFV," in Proc. 12th USENIX Conf. Oper. Syst. Design Implement. (OSDI), 2016, pp. 203–216.
[19] DPDK. (Jun. 2018). Data Plane Development Kit. [Online]. Available: https://www.dpdk.org/
[20] L. Rizzo, "NetMap: A novel framework for fast packet I/O," in Proc. 21st USENIX Security Symp. (USENIX Security), 2012, pp. 101–112.
[21] Cisco, "FD.io—Vector packet processing," Intel, San Jose, CA, USA, White Paper, 2017. [Online]. Available: https://fd.io/docs/whitepapers/FDioVPPwhitepaperJuly2017.pdf
[22] J. Martins et al., "ClickOS and the art of network function virtualization," in Proc. 11th USENIX Conf. Netw. Syst. Design Implement. (NSDI), 2014, pp. 459–473.
[23] M. Gallo and R. Laufer, "ClickNF: A modular stack for custom network functions," in Proc. USENIX Annu. Tech. Conf. (USENIX ATC), Jul. 2018, pp. 745–757. [Online]. Available: https://www.usenix.org/conference/atc18/presentation/gallo
[24] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek, "The click modular router," ACM Trans. Comput. Syst., vol. 18, no. 3, pp. 263–297, Aug. 2000. [Online]. Available: https://doi.org/10.1145/354871.354874
[25] L. Rizzo and G. Lettieri, "VALE, a switched Ethernet for virtual machines," in Proc. 8th Int. Conf. Emerg. Netw. Exp. Technol. (CoNEXT), 2012, pp. 61–72. [Online]. Available: https://doi.org/10.1145/2413176.2413185
[26] I. Marinos, R. N. Watson, and M. Handley, "Network stack specialization for performance," in Proc. ACM Conf. SIGCOMM, 2014, pp. 175–186. [Online]. Available: https://doi.org/10.1145/2619239.2626311
[27] S. Han, K. Jang, K. Park, and S. Moon, "PacketShader: A GPU-accelerated software router," in Proc. ACM SIGCOMM Conf., 2010, pp. 195–206. [Online]. Available: https://doi.org/10.1145/1851182.1851207
[28] A. Jaokar. (Mar. 2020). An Introduction to Cloud Native Applications and Kubernetes. [Online]. Available: https://web.archive.org/web/20200413093940/https://www.datasciencecentral.com/profiles/blogs/an-introduction-to-cloud-native-applications-and-kubernetes


[29] G. Inc. (Jul. 22, 2019). Kubernetes: Production-Grade Container Orchestration. [Online]. Available: https://kubernetes.io/
[30] Netronome. (Jan. 2017). Avoid Kernel-Bypass in Your Network Infrastructure. [Online]. Available: https://web.archive.org/save/https://www.netronome.com/blog/avoid-kernel-bypass-in-your-network-infrastructure/
[31] E. Jeong et al., "mTCP: A highly scalable user-level TCP stack for multicore systems," in Proc. 11th USENIX Symp. Netw. Syst. Design Implement. (NSDI), Apr. 2014, pp. 489–502. [Online]. Available: https://www.usenix.org/conference/nsdi14/technical-sessions/presentation/jeong
[32] K. Yasukata, M. Honda, D. Santry, and L. Eggert, "StackMap: Low-latency networking with the OS stack and dedicated NICs," in Proc. USENIX Annu. Tech. Conf. (USENIX ATC), 2016, pp. 43–56.
[33] B. Pfaff et al., "The design and implementation of OpenvSwitch," in Proc. 12th USENIX Conf. Netw. Syst. Design Implement. (NSDI), 2015, pp. 117–130.
[34] P. Authors. (Jan. 2019). Polycube: EBPF/XDP-Based Software Framework for Fast Network Services Running in the Linux Kernel. Accessed: Oct. 25, 2020. [Online]. Available: https://polycube.network
[35] Cisco. (Jan. 2020). Cloud-Native Network Function. [Online]. Available: https://web.archive.org/web/20200414130638/https://www.cisco.com/c/dam/m/en_us/network-intelligence/service-provider/digital-transformation/knowledge-network-webinars/pdfs/1128_TECHAD_CKN_PDF.pdf
[36] Cilium authors. (Jan. 2019). Diagram of Kubernetes/Kube-Proxy Iptables Rules Architecture. [Online]. Available: https://web.archive.org/web/20200414131802/https://github.com/cilium/k8s-iptables-diagram
[37] M. Majkowski. (Jul. 2016). Why We Use the Linux Kernel's TCP Stack. [Online]. Available: https://web.archive.org/web/20200210223048/https://blog.cloudflare.com/why-we-use-the-linux-kernels-tcp-stack/
[38] T. Høiland-Jørgensen et al., "The express data path: Fast programmable packet processing in the operating system kernel," in Proc. 14th Int. Conf. Emerg. Netw. Exp. Technol. (CoNEXT), 2018, pp. 54–66. [Online]. Available: http://doi.acm.org/10.1145/3281411.3281443
[39] Cilium Authors. (Oct. 2020). BPF and XDP Reference Guide. [Online]. Available: https://docs.cilium.io/en/latest/bpf/
[40] Linux Programmer's Manual. (Aug. 2019). BPF—Perform a Command on an Extended BPF Map or Program. [Online]. Available: https://web.archive.org/web/20200428005645/http://man7.org/linux/man-pages/man2/bpf.2.html
[41] S. Miano, G. Retvari, F. Risso, A. W. Moore, and G. Antichi, "Automatic optimization of software data planes," in Proc. ACM SIGCOMM Conf. Posters Demos (SIGCOMM), 2020. [Online]. Available: https://conferences.sigcomm.org/sigcomm/2020/cf-posters.html
[42] A. Zaostrovnykh, S. Pirelli, L. Pedrosa, K. Argyraki, and G. Candea, "A formally verified NAT," in Proc. Conf. ACM Special Interest Group Data Commun. (SIGCOMM), 2017, pp. 141–154. [Online]. Available: https://doi.org/10.1145/3098822.3098833
[43] R. Stoenescu, M. Popovici, L. Negreanu, and C. Raiciu, "SymNet: Scalable symbolic execution for modern networks," in Proc. ACM SIGCOMM Conf. (SIGCOMM), 2016, pp. 314–327. [Online]. Available: https://doi.org/10.1145/2934872.2934881
[44] L. Pedrosa, R. Iyer, A. Zaostrovnykh, J. Fietz, and K. Argyraki, "Automated synthesis of adversarial workloads for network functions," in Proc. Conf. ACM Special Interest Group Data Commun. (SIGCOMM), 2018, pp. 372–385. [Online]. Available: https://doi.org/10.1145/3230543.3230573
[45] R. Iyer, L. Pedrosa, A. Zaostrovnykh, S. Pirelli, K. Argyraki, and G. Candea, "Performance contracts for software network functions," in Proc. 16th USENIX Symp. Netw. Syst. Design Implement. (NSDI), 2019, pp. 517–530.
[46] F. Rath et al., "SymPerf: Predicting network function performance," in Proc. SIGCOMM Posters Demos (SIGCOMM), 2017, pp. 34–36. [Online]. Available: https://doi.org/10.1145/3123878.3131977
[47] S. Miano, M. Bertrone, F. Risso, M. Tumolo, and M. V. Bernal, "Creating complex network services with EBPF: Experience and lessons learned," in Proc. IEEE 19th Int. Conf. High Perform. Switch. Routing (HPSR), 2018, pp. 1–8.
[48] M. Bjorklund. (2016). The YANG 1.1 Data Modeling Language. [Online]. Available: https://tools.ietf.org/html/rfc7950
[49] A. Bierman, M. Björklund, and K. Watsen, "RESTCONF protocol," IETF, RFC 8040, Jan. 2017. [Online]. Available: https://rfc-editor.org/rfc/rfc8040.txt
[50] D. E. Eisenbud et al., "Maglev: A fast and reliable software network load balancer," in Proc. 13th USENIX Symp. Netw. Syst. Design Implement. (NSDI), Santa Clara, CA, USA, 2016, pp. 523–535. [Online]. Available: https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/eisenbud
[51] S. Miano, M. Bertrone, F. Risso, M. V. Bernal, Y. Lu, and J. Pi, "Securing linux with a faster and scalable iptables," SIGCOMM Comput. Commun. Rev., vol. 49, no. 3, pp. 2–17, Nov. 2019. [Online]. Available: https://doi.org/10.1145/3371927.3371929
[52] BCC Authors. BPF Compiler Collection (BCC). Accessed: Oct. 20, 2020. [Online]. Available: https://web.archive.org/web/20181106133143/https://www.iovisor.org/technology/bcc
[53] DPDK. (Aug. 2018). Pktgen Traffic Generator Using DPDK. [Online]. Available: http://dpdk.org/git/apps/pktgen-dpdk
[54] P. Emmerich, S. Gallenmüller, D. Raumer, F. Wohlfart, and G. Carle, "Moongen: A scriptable high-speed packet generator," in Proc. Internet Meas. Conf., 2015, pp. 275–287.
[55] Intel. Flow Director and Memcached Performance. Accessed: Oct. 20, 2020. [Online]. Available: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/intel-ethernet-flow-director.pdf
[56] T. Lakshman and D. Stiliadis, "High-speed policy-based packet forwarding using efficient multi-dimensional range matching," ACM SIGCOMM Comput. Commun. Rev., vol. 28, no. 4, pp. 203–214, 1998.
[57] D. E. Taylor and J. S. Turner, "ClassBench: A packet classification benchmark," IEEE/ACM Trans. Netw., vol. 15, no. 3, pp. 499–511, Jun. 2007.
[58] J. Matousek, G. Antichi, A. Lucansky, A. W. Moore, and J. Korenek, "ClassBench-NG: Recasting ClassBench after a decade of network evolution," in Proc. ACM/IEEE Symp. Architect. Netw. Commun. Syst. (ANCS), 2017, pp. 204–216.
[59] L. Community. (May 2020). XDP_Redirect_Map_Kern Application. [Online]. Available: https://web.archive.org/web/20200522140154/https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_redirect_map_kern.c/
[60] M. Karlsson and B. Töpel, "The path to DPDK speeds for AF XDP," in Proc. Tech. Conf. Linux Netw. (Netdev), 2018. [Online]. Available: https://www.semanticscholar.org/paper/The-Path-to-DPDK-Speeds-for-AF_XDP-Karlsson/92abbc6c959f5ef71ad51a154ac8954995308712
[61] S. G. Kulkarni et al., "NFVnice: Dynamic backpressure and scheduling for NFV service chains," in Proc. Conf. ACM Special Interest Group Data Commun. (SIGCOMM), 2017, pp. 71–84. [Online]. Available: https://doi.org/10.1145/3098822.3098828
[62] OpenNetVM authors. (Oct. 2020). NFVnice Sample Bridge Application. [Online]. Available: https://github.com/sdnfv/openNetVM/tree/experimental/nfvnice-reinforce/examples/bridge
[63] P. L. Consortium. (Oct. 2020). p4Lang-P4C: P4 Reference Compiler. [Online]. Available: https://github.com/p4lang/p4c
[64] R. Marchi. Polycube Extention of the P4C Compiler. Accessed: Oct. 2020. [Online]. Available: https://github.com/richiMarchi/p4c/tree/polycube_translation
[65] A. Starovoitov. (Mar. 2014). Net: Filter: Rework/Optimize Internal BPF Interpreter's Instruction Set. [Online]. Available: https://patchwork.ozlabs.org/patch/333456/
[66] F. Sánchez and D. Brazewell, "Tethered linux CPE for IP service delivery," in Proc. 1st IEEE Conf. Netw. Softw. (NetSoft), 2015, pp. 1–9.
[67] Z. Ahmed, M. H. Alizai, and A. A. Syed, "InKeV: In-kernel distributed network virtualization for DCN," SIGCOMM Comput. Commun. Rev., vol. 46, no. 3, pp. 1–6, Jul. 2018. [Online]. Available: https://doi.org/10.1145/3243157.3243161
[68] A. Fabre. L4Drop: XDP DDoS Mitigations. Accessed: Oct. 20, 2020. [Online]. Available: https://web.archive.org/web/20190927231336/
[69] Cilium Authors. (2020). Cilium: API-Aware Networking and Security Using eBPF and XDP. [Online]. Available: https://github.com/cilium/cilium

Sebastiano Miano (Member, IEEE) received the master's degree in computer engineering from the Politecnico di Torino, Italy, in 2015, where he is currently pursuing the Ph.D. degree. During his research career, he spent several months in both industrial and academic institutions such as EIT Digital, San Francisco, CA, USA, and the University of Cambridge, Cambridge, U.K. His research interests include programmable data planes, software defined networking, and high-speed network functions virtualization.


Fulvio Risso (Member, IEEE) received the M.Sc. and Ph.D. degrees in computer engineering from Politecnico di Torino, Italy, in 1995 and 2000, respectively. He is currently an Associate Professor with Politecnico di Torino. He has coauthored more than 100 scientific papers. His research interests focus on high-speed and flexible network processing, edge/fog computing, software-defined networks, and network functions virtualization.

Matteo Bertrone received the M.Sc. degree in computer engineering from the Politecnico di Torino in 2016, where he collaborated as a Research Assistant. He is particularly interested in cloud native computing, with a focus on monitoring and security. His research interests include software-defined networking, high-performance networking, and data-plane programmability.

Yunsong Lu is the Board Chair of the Linux Foundation IO Visor Project and the Co-Founder of the Polycube Project. He was a Visiting Scholar with Stanford University. He co-founded Amiasys Corporation, where he currently serves as CEO. Before Amiasys, he founded the Virtual Networking Lab at Huawei and created the elastic virtual switch, software-defined network acceleration, and AI-driving networking (network twins) products and technologies as the fundamental and key building blocks of SDN and cloud and edge networking. He worked with Sun Microsystems and Oracle, where he started his journey of pursuing and creating networking technologies.

Mauricio Vásquez Bernal received the M.Sc. degree in computer engineering from Politecnico di Torino, Italy, in 2015, where he collaborated as a Research Assistant. He is interested in computer networks, cloud computing technologies, tracing, and monitoring of distributed systems.
