Netvm: High Performance and Flexible Networking Using Virtualization on Commodity Platforms Jinho Hwang† K

Total Page:16

File Type:pdf, Size:1020Kb

Netvm: High Performance and Flexible Networking Using Virtualization on Commodity Platforms Jinho Hwang† K NetVM: High Performance and Flexible Networking using Virtualization on Commodity Platforms Jinho Hwang† K. K. Ramakrishnan∗ Timothy Wood† †The George Washington University ∗WINLAB, Rutgers University Abstract compared to purpose-built (often proprietary) network- NetVM brings virtualization to the Network by enabling ing hardware or middleboxes based on custom ASICs. high bandwidth network functions to operate at near Middleboxes are typically hardware-software pack- line speed, while taking advantage of the flexibility and ages that come together on a special-purpose appliance, customization of low cost commodity servers. NetVM often at high cost. In contrast, a high throughput platform allows customizable data plane processing capabilities based on virtual machines (VMs) would allow network such as firewalls, proxies, and routers to be embed- functions to be deployed dynamically at nodes in the net- ded within virtual machines, complementing the con- work with low cost. Further, the shift to VMs would let trol plane capabilities of Software Defined Networking. businesses run network services on existing cloud plat- NetVM makes it easy to dynamically scale, deploy, and forms, bringing multiplexing and economy of scale ben- reprogram network functions. This provides far greater efits to network functionality. Once data can be moved flexibility than existing purpose-built, sometimes propri- to, from and between VMs at line rate for all packet sizes, etary hardware, while still allowing complex policies and we approach the long-term vision where the line between full packet inspection to determine subsequent process- data centers and network resident “boxes” begins to blur: ing. It does so with dramatically higher throughput than both software and network infrastructure could be devel- existing software router platforms. oped, managed, and deployed in the same fashion. NetVM is built on top of the KVM platform and In- Progress has been made by network virtualization tel DPDK library. We detail many of the challenges standards and SDN to provide greater configurability in we have solved such as adding support for high-speed the network [1–4]. SDN improves flexibility by allowing inter-VM communication through shared huge pages software to manage the network control plane, while the and enhancing the CPU scheduler to prevent overheads performance-critical data plane is still implemented with caused by inter-core communication and context switch- proprietary network hardware. SDN allows for new flex- ing. NetVM allows true zero-copy delivery of data to ibility in how data is forwarded, but the focus on the con- VMs both for packet processing and messaging among trol plane prevents dynamic management of many types VMs within a trust boundary. Our evaluation shows of network functionality that rely on the data plane, for how NetVM can compose complex network functional- example the information carried in the packet payload. ity from multiple pipelined VMs and still obtain through- This limits the types of network functionality that can puts up to 10 Gbps, an improvement of more than 250% be “virtualized” into software, leaving networks to con- compared to existing techniques that use SR-IOV for vir- tinue to be reliant on relatively expensive network appli- tualized networking. ances that are based on purpose-built hardware. Recent advances in network interface cards (NICs) al- low high throughput, low-latency packet processing us- 1 Introduction ing technologies like Intel’s Data Plane Development Kit Virtualization has revolutionized how data center servers (DPDK) [5]. This software framework allows end-host are managed by allowing greater flexibility, easier de- applications to receive data directly from the NIC, elim- ployment, and improved resource multiplexing. A simi- inating overheads inherent in traditional interrupt driven lar change is beginning to happen within communication OS-level packet processing. Unfortunately, the DPDK networks with the development of virtualization of net- framework has a somewhat restricted set of options for work functions, in conjunction with the use of software support of virtualization, and on its own cannot support defined networking (SDN). While the migration of net- the type of flexible, high performance functionality that work functions to a more software based infrastructure is network and data center administrators desire. likely to begin with edge platforms that are more “con- To improve this situation, we have developed NetVM, trol plane” focused, the flexibility and cost-effectiveness a platform for running complex network functional- obtained by using common off-the-shelf hardware and ity at line-speed (10Gbps) using commodity hardware. systems will make migration of other network functions NetVM takes advantage of DPDK’s high throughput attractive. One main deterrent is the achievable per- packet processing capabilities, and adds to it abstractions formance and scalability of such virtualized platforms that enable in-network services to be flexibly created, chained, and load balanced. Since these “virtual bumps” User Applications can inspect the full packet data, a much wider range of Buffer Ring packet processing functionality can be supported than Management Management in frameworks utilizing existing SDN-based controllers Poll Mode Packet Flow manipulating hardware switches. As a result, NetVM Libraries Drivers Classification Data Plane makes the following innovations: Environmental Abstraction Layer 1. A virtualization-based platform for flexible network Linux service deployment that can meet the performance of H/W Platform customized hardware, especially those involving com- Figure 1: DPDK’s run-time environment over Linux. plex packet processing. 2. A shared-memory framework that truly exploits the long pipelines, out-of-order and speculative execution, DPDK library to provide zero-copy delivery to VMs and multi-level memory systems, all of which tend to and between VMs. increase the penalty paid by an interrupt in terms of cy- 3. A hypervisor-based switch that can dynamically ad- cles [12, 13]. When the packet reception rate increases just a flow’s destination in a state-dependent (e.g., further, the achieved (receive) throughput can drop dra- for intelligent load balancing) and/or data-dependent matically in such systems [14]. Second, existing operat- manner (e.g., through deep packet inspection). ing systems typically read incoming packets into kernel space and then copy the data to user space for the applica- 4. An architecture that supports high speed inter-VM tion interested in it. These extra copies can incur an even communication, enabling complex network services greater overhead in virtualized settings, where it may be to be spread across multiple VMs. necessary to copy an additional time between the hyper- 5. Security domains that restrict packet data access to visor and the guest operating system. These two sources only trusted VMs. of overhead limit the the ability to run network services We have implemented NetVM using the KVM and on commodity servers, particularly ones employing vir- DPDK platforms−all the aforementioned innovations tualization [15, 16]. are built on the top of DPDK. Our results show how The Intel DPDK platform tries to reduce these over- NetVM can compose complex network functionality heads by allowing user space applications to directly poll from multiple pipelined VMs and still obtain line rate the NIC for data. This model uses Linux’s huge pages throughputs of 10Gbps, an improvement of more than to pre-allocate large regions of memory, and then allows 250% compared to existing SR-IOV based techniques. applications to DMA data directly into these pages. Fig- We believe NetVM will scale to even higher throughputs ure 1 shows the DPDK architecture that runs in the ap- on machines with additional NICs and processing cores. plication layer. The poll mode driver allows applications to access the NIC card directly without involving ker- 2 Background and Motivation nel processing, while the buffer and ring management This section provides background on the challenges of systems resemble the memory management systems typ- providing flexible network services on virtualized com- ically employed within the kernel for holding sk buffs. modity servers. While DPDK enables high throughput user space ap- plications, it does not yet offer a complete framework for 2.1 Highspeed COTS Networking constructing and interconnecting complex network ser- Software routers, SDN, and hypervisor based switching vices. Further, DPDK’s passthrough mode that provides technologies have sought to reduce the cost of deploy- direct DMA to and from a VM can have significantly ment and increase flexibility compared to traditional net- lower performance than native IO1. For example, DPDK work hardware. However, these approaches have been supports Single Root I/O Virtualization (SR-IOV2) to al- stymied by the performance achievable with commodity low multiple VMs to access the NIC, but packet “switch- servers [6–8]. These limitations on throughput and la- ing” (i.e., demultiplexing or load balancing) can only be tency have prevented software routers from supplanting performed based on the L2 address. As depicted in Fig- custom designed hardware [9–11]. ure 2(a), when using SR-IOV, packets are switched on There are two main challenges that prevent commer- cial off-the-shelf (COTS) servers from being able to pro- 1 Until Sandy-bridge, the performance was close to half of native per- cess network
Recommended publications
  • SNMP Trap - Firewall Rules
    SNMP Trap - Firewall Rules Article Number: 87 | Rating: 1/5 from 1 votes | Last Updated: Wed, Jan 13, 2021 at 4:42 PM Fir e wall Rule s These steps explain how to check if the Operating System (OS) of the Nagios server has firewall rules enabled to allow inbound SNMP Trap UDP port 162 traffic. The different supported OS's have different firewall commands which are explained as follows. You will need to establish an SSH session to the Nagios server that is receiving SNMP Traps. RHEL 7/8 | C e nt O S 7/8 | O r ac le Linux 7/8 First check the status of the firewall: systemctl status firewalld.service IF the firewall is running , it should product output like: ● firewalld.service - firewalld - dynamic firewall daemon Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled) Active: active (running) since Tue 2018-11-20 10:05:15 AEDT; 1 weeks 0 days ago Docs: man:firewalld(1) Main PID: 647 (firewalld) CGroup: /system.slice/firewalld.service └─647 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid IF the firewall is NO T running, it will produce this output: ● firewalld.service - firewalld - dynamic firewall daemon Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled) Active: inactive (dead) since Tue 2018-11-27 14:11:34 AEDT; 965ms ago Docs: man:firewalld(1) Main PID: 647 (code=exited, status=0/SUCCESS) If the firewall is NOT running, this means that inbound traffic is allowed. To ENABLE the firewall on b o o t and to s ta rt it, execute the following commands: systemctl
    [Show full text]
  • Netfilter's Connection Tracking System
    FILTERING POLICIES BASED UNIQUELY on packet header information are obsolete. PABLO NEIRA AYUSO These days, stateful firewalls provide advanced mechanisms to let sysadmins Netfilter’s and security experts define more intelli- gent policies. This article describes the connection implementation details of the connection tracking system tracking system provided by the Netfilter project and also presents the required Pablo Neira Ayuso has an M.S. in computer science background to understand it, such as an and has worked for several companies in the IT secu- rity industry, with a focus on open source solutions. understanding of the Netfilter framework. Nowadays he is a full-time teacher and researcher at the University of Seville. This article will be the perfect complement to understanding the subsystem that [email protected] enables the stateful firewall available in any recent Linux kernel. The Netfilter Framework The Netfilter project was founded by Paul “Rusty” Russell during the 2.3.x development series. At that time the existing firewalling tool for Linux had serious drawbacks that required a full rewrite. Rusty decided to start from scratch and create the Netfilter framework, which comprises a set of hooks over the Linux network protocol stack. With the hooks, you can register kernel modules that do some kind of network packet handling at different stages. Iptables, the popular firewalling tool for Linux, is commonly confused with the Netfilter framework itself. This is because iptables chains and hooks have the same names. But iptables is just a brick on top of the Netfilter framework. Fortunately, Rusty spent considerable time writ- ing documentation [1] that comes in handy for anyone willing to understand the framework, al- though at some point you will surely feel the need to get your hands dirty and look at the code to go further.
    [Show full text]
  • Communicating Between the Kernel and User-Space in Linux Using Netlink Sockets
    SOFTWARE—PRACTICE AND EXPERIENCE Softw. Pract. Exper. 2010; 00:1–7 Prepared using speauth.cls [Version: 2002/09/23 v2.2] Communicating between the kernel and user-space in Linux using Netlink sockets Pablo Neira Ayuso∗,∗1, Rafael M. Gasca1 and Laurent Lefevre2 1 QUIVIR Research Group, Departament of Computer Languages and Systems, University of Seville, Spain. 2 RESO/LIP team, INRIA, University of Lyon, France. SUMMARY When developing Linux kernel features, it is a good practise to expose the necessary details to user-space to enable extensibility. This allows the development of new features and sophisticated configurations from user-space. Commonly, software developers have to face the task of looking for a good way to communicate between kernel and user-space in Linux. This tutorial introduces you to Netlink sockets, a flexible and extensible messaging system that provides communication between kernel and user-space. In this tutorial, we provide fundamental guidelines for practitioners who wish to develop Netlink-based interfaces. key words: kernel interfaces, netlink, linux 1. INTRODUCTION Portable open-source operating systems like Linux [1] provide a good environment to develop applications for the real-world since they can be used in very different platforms: from very small embedded devices, like smartphones and PDAs, to standalone computers and large scale clusters. Moreover, the availability of the source code also allows its study and modification, this renders Linux useful for both the industry and the academia. The core of Linux, like many modern operating systems, follows a monolithic † design for performance reasons. The main bricks that compose the operating system are implemented ∗Correspondence to: Pablo Neira Ayuso, ETS Ingenieria Informatica, Department of Computer Languages and Systems.
    [Show full text]
  • Introduction to Linux Virtual Server and High Availability
    Outlines Introduction to Linux Virtual Server and High Availability Chen Kaiwang [email protected] December 5, 2011 Chen Kaiwang [email protected] LVS-DR and Keepalived Outlines If you don't know the theory, you don't have a way to be rigorous. Robert J. Shiller http://www.econ.yale.edu/~shiller/ Chen Kaiwang [email protected] LVS-DR and Keepalived Outlines Misery stories I Jul 2011 Too many connections at zongheng.com I Aug 2011 Realserver maintenance at 173.com quiescent persistent connections I Nov 2011 Health check at 173.com I Nov 2011 Virtual service configuration at 173.com persistent session data Chen Kaiwang [email protected] LVS-DR and Keepalived Outlines Outline of Part I Introduction to Linux Virtual Server Configuration Overview Netfilter Architecture Job Scheduling Scheduling Basics Scheduling Algorithms Connection Affinity Persistence Template Persistence Granularity Quirks Chen Kaiwang [email protected] LVS-DR and Keepalived Outlines Outline of Part II HA Basics LVS High Avaliablity Realserver Failover Director Failover Solutions Heartbeat Keepalived Chen Kaiwang [email protected] LVS-DR and Keepalived LVS Intro Job Scheduling Connection Affinity Quirks Part I Introduction to Linux Virtual Server Chen Kaiwang [email protected] LVS-DR and Keepalived LVS Intro Job Scheduling Configuration Overview Connection Affinity Netfilter Architecture Quirks Introduction to Linux Virtual Server Configuration Overview Netfilter Architecture Job Scheduling Scheduling Basics Scheduling Algorithms Connection Affinity Persistence Template Persistence Granularity Quirks Chen Kaiwang [email protected] LVS-DR and Keepalived LVS Intro Job Scheduling Configuration Overview Connection Affinity Netfilter Architecture Quirks A Linux Virtual Serverr (LVS) is a group of servers that appear to the client as one large, fast, reliable (highly available) server.
    [Show full text]
  • Netkernel: Making Network Stack Part of the Virtualized Infrastructure
    NetKernel: Making Network Stack Part of the Virtualized Infrastructure Zhixiong Niu Hong Xu Peng Cheng Microsoft Research City University of Hong Kong Microsoft Research Qiang Su Yongqiang Xiong Tao Wang Dongsu Han City University of Hong Kong Microsoft Research New York University KAIST Keith Winstein Stanford University Abstract We argue that the current division of labor between appli- This paper presents a system called NetKernel that decou- cation and network infrastructure is becoming increasingly ples the network stack from the guest virtual machine and inadequate, especially in a private cloud setting. The central offers it as an independent module. NetKernel represents a issue is that the network stack is controlled solely by indi- new paradigm where network stack can be managed as part of vidual guest VM; the operator has almost zero visibility or the virtualized infrastructure. It provides important efficiency control. This leads to efficiency problems that manifest in vari- benefits: By gaining control and visibility of the network stack, ous aspects of running the cloud network. Firstly, the operator operator can perform network management more directly and is unable to orchestrate resource allocation at the end-points flexibly, such as multiplexing VMs running different applica- of the network fabric, resulting in low resource utilization. It tions to the same network stack module to save CPU. Users remains difficult today for the operator to meet or define per- also benefit from the simplified stack deployment and better formance SLAs despite much prior work [17,28,35,41,56,57], performance. For example mTCP can be deployed without as she cannot precisely provision resources just for the net- API change to support nginx natively, and shared memory work stack or control how the stack consumes these resources.
    [Show full text]
  • Ready; T=0.01/0.01 21:14:10 Ipl 160 Clear Zipl V1.8.0-44.45.2 Interactive Boot Menu
    Ready; T=0.01/0.01 21:14:10 ipl 160 clear zIPL v1.8.0-44.45.2 interactive boot menu 0. default (ipl) 1. ipl Note: VM users please use '#cp vi vmsg <number> <kernel-parameters>' Please choose (default will boot in 10 seconds): Booting default (ipl)... Initializing cgroup subsys cpuset Initializing cgroup subsys cpu Linux version 2.6.32.36-0.5-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3- branch revision 152973] (SUSE Linux) ) #1 SMP 2011 -04-14 10:12:31 +0200 setup.1a06a7: Linux is running as a z/VM guest operating system in 64-bit mode Zone PFN ranges: DMA 0x00000000 -> 0x00080000 Normal 0x00080000 -> 0x00080000 Movable zone start PFN for each node early_node_map[1] active PFN ranges 0: 0x00000000 -> 0x00020000 PERCPU: Embedded 11 pages/cpu @0000000000e92000 s12544 r8192 d24320 u65536 pcpu-alloc: s12544 r8192 d24320 u65536 alloc=16*4096 pcpu-alloc: [0] 00 [0] 01 [0] 02 [0] 03 [0] 04 [0] 05 [0] 06 [0] 07 pcpu-alloc: [0] 08 [0] 09 [0] 10 [0] 11 [0] 12 [0] 13 [0] 14 [0] 15 pcpu-alloc: [0] 16 [0] 17 [0] 18 [0] 19 [0] 20 [0] 21 [0] 22 [0] 23 pcpu-alloc: [0] 24 [0] 25 [0] 26 [0] 27 [0] 28 [0] 29 [0] 30 [0] 31 pcpu-alloc: [0] 32 [0] 33 [0] 34 [0] 35 [0] 36 [0] 37 [0] 38 [0] 39 pcpu-alloc: [0] 40 [0] 41 [0] 42 [0] 43 [0] 44 [0] 45 [0] 46 [0] 47 pcpu-alloc: [0] 48 [0] 49 [0] 50 [0] 51 [0] 52 [0] 53 [0] 54 [0] 55 pcpu-alloc: [0] 56 [0] 57 [0] 58 [0] 59 [0] 60 [0] 61 [0] 62 [0] 63 Built 1 zonelists in Zone order, mobility grouping on.
    [Show full text]
  • Nftables Och Iptables En Jämförelse Av Latens Nftables and Iptables a Comparison in Latency
    NFtables and IPtables Jonas Svensson Eidsheim NFtables och IPtables En jämförelse av latens NFtables and IPtables A Comparison in Latency Bachelors Degree Project in Computer Science Network and Systems Administration, G2E, 22.5 hp IT604G Jonas Svensson Eidsheim [email protected] Examiner Jonas Gamalielsson Supervisor Johan Zaxmy Abstract Firewalls are one of the essential tools to secure any network. IPtables has been the de facto firewall in all Linux systems, and the developers behind IPtables are also responsible for its intended replacement, NFtables. Both IPtables and NFtables are firewalls developed to filter packets. Some services are heavily dependent on low latency transport of packets, such as VoIP, cloud gaming, storage area networks and stock trading. This work is aiming to compare the latency between the selected firewalls while under generated network load. The network traffic is generated by iPerf and the latency is measured by using ping. The measurement of the latency is done on ping packets between two dedicated hosts, one on either side of the firewall. The measurement was done on two configurations one with regular forwarding and another with PAT (Port Address Translation). Both configurations are measured while under network load and while not under network load. Each test is repeated ten times to increase the statistical power behind the conclusion. The results gathered in the experiment resulted in NFtables being the firewall with overall lower latency both while under network load and not under network load. Abstrakt Brandväggen är ett av de viktigaste verktygen för att säkra upp nätverk. IPtables har varit den främst använda brandväggen i alla Linux-system och utvecklarna bakom IPtables är också ansvariga för den avsedda ersättaren, NFtables.
    [Show full text]
  • Intro to Linux Kernel Firewall
    IntroIntro toto LinuxLinux KernelKernel FirewallFirewall © 2010 Obsidian-Studios, Inc. Presenter William L. Thomson Jr. LinuxLinux KernelKernel FirewallFirewall ● Kernel provides Xtables (implemeted as different Netfilter modules) which store chains and rules – x_tables is the name of the kernel module carrying the shared code portion used by all four modules that also provides the API used for extensions – Xtables is more or less used to refer to the entire firewall (v4,v6,arp,eb) architecture ● Different kernel modules and user space programs are currently used for different protocols – iptables applies to IPv4 – ip6tables to IPv6 – arptables to ARP – ebtables for Ethernet frames ● All require elevated privileges to operate and must be executed by root ● Not ”essential binaries” – User must install in some distros – May reside in /sbin or /usr/sbin © 2010 Obsidian-Studios, Inc. Presenter William L. Thomson Jr. Netfilter.orgNetfilter.org ● Home to the software of the packet filtering framework inside the Linux kernel – Software commonly associated with netfilter.org is iptables. ● Software inside this framework enables packet filtering, network address [and port] translation (NA[P]T) and other packet mangling – netfilter is a set of hooks inside the Linux kernel that allows kernel modules to register callback functions with the network stack. – iptables is a generic table structure for the definition of rulesets. Each rule within an IP table consists of a number of classifiers (iptables matches) and one connected action (iptables target). – netfilter, ip_tables, connection tracking (ip_conntrack, nf_conntrack) and the NAT subsystem together build the major parts of the framework. ● Main Features – stateless and stateful packet filtering (IPv4 and IPv6) – network address and port translation, e.g.
    [Show full text]
  • Unbreakable Enterprise Kernel Release Notes for Unbreakable Enterprise Kernel Release 6 Update 1
    Unbreakable Enterprise Kernel Release Notes for Unbreakable Enterprise Kernel Release 6 Update 1 F35561-04 May 2021 Oracle Legal Notices Copyright © 2020, 2021 Oracle and/or its affiliates. This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or "commercial computer software documentation" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, reproduction, duplication, release, display, disclosure, modification, preparation of derivative works, and/or adaptation of i) Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs), ii) Oracle computer documentation and/or iii) other Oracle data, is subject to the rights and limitations specified in the license contained in the applicable contract.
    [Show full text]
  • State of the Linux Kernel Security Subsystem
    State of the Linux Kernel Security Subsystem James Morris <[email protected]> LinuxCon Japan 2012, Yokohama Introduction ● Who I am – Kernel security maintainer – Engineering manager ● Scope – Background – Discuss Linux-specific security – Ongoing developments Background ● Linux is a clone of Unix ● Inherits core security model ● DAC – Not sufficient for modern systems ● Malware, bugs etc. – User manages own object security – Root user overrides security – Does not protect against many threats ● Linux kernel has many security extensions... Linux Kernel Security Features ● Need to be retrofitted to existing design! - Constrained by that design ● Extensions of DAC – Access Control Lists (ACLs) – Posix Capabilities (privileges) ● Process-based ● File capabilities Linux Kernel Security Features ● Namespaces ● Seccomp (“mode 2” coming in 3.5) ● Netfilter/IPtables ● Cryptographic subsystem – Ipsec – Disk encryption ● dm-crypt ● ecryptfs Linux Kernel Security Features ● Mandatory Access Control (MAC) – SELinux – Smack – AppArmor – TOMOYO Linux Kernel Security Features ● System Hardening – ASLR – NX – /dev/mem restrictions – Toolchain hardening – Yama LSM (3.4) ● ptrace_scope (grsec) Linux Kernel Security Features ● Audit ● Keys ● Integrity & platform security – IMA/EVM – TPM – TXT – VT-d – dm-verity Integrity Management Architecture (IMA) ● Detects if files have been maliciously or accidentally altered ● Measures and stores file hashes in TPM – Remote attestation – Local validation ● IMA appraisal (ongoing) ● Protect security attributes against offline attack (EVM) Seccomp Mode 2 ● General system call filtering ● Reduces attack surface of kernel ● Not a sandbox! ● BPF filters installed with – prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog); ● Action may be set to trap, kill, errno, trace, allow. ● Also, PR_SET_NO_NEW_PRIVS – Prevents privilege granting via execve() Ongoing Work ● Security requirements also now being driven by mobile and virt – SE-Android – Tizen (Smack) – Svirt ● Integrity management a focus of current work – Signed modules – Trusted boot etc.
    [Show full text]
  • Replacing Iptables with Ebpf in Kubernetes with Cilium Cilium, Ebpf, Envoy, Istio, Hubble
    Replacing iptables with eBPF in Kubernetes with Cilium Cilium, eBPF, Envoy, Istio, Hubble Michal Rostecki Swaminathan Vasudevan Software Engineer Software Engineer [email protected] [email protected] [email protected] What’s wrong with iptables? 2 What’s wrong with legacy iptables? IPtables runs into a couple of significant problems: ● Iptables updates must be made by recreating and updating all rules in a single transaction. ● Implements chains of rules as a linked list, so all operations are O(n). ● The standard practice of implementing access control lists (ACLs) as implemented by iptables was to use sequential list of rules. ● It’s based on matching IPs and ports, not aware about L7 protocols. ● Every time you have a new IP or port to match, rules need to be added and the chain changed. ● Has high consumption of resources on Kubernetes. Based on the above mentioned issues under heavy traffic conditions or in a system that has a large number of changes to iptable rules the performance degrades. Measurements show unpredictable latency and reduced performance as the number of services grows. 3 Kubernetes uses iptables for... ● kube-proxy - the component which implements Services and load balancing by DNAT iptables rules ● the most of CNI plugins are using iptables for Network Policies 4 And it ends up like that 5 6 What is BPF? 7 Linux Network Stack Process Process Process ● The Linux kernel stack is split into multiple abstraction layers. System Call Interface ● Strong userspace API compatibility in Linux for years. Sockets ● This shows how complex the linux kernel is and its years of evolution.
    [Show full text]
  • Writing Netfilter Modules
    Writing Netfilter modules Jan Engelhardt, Nicolas Bouliane rev. July 3, 2012 The Netfilter/Xtables/iptables framework gives us the possibility to add features. To do so, we write kernel modules that register against this framework. Also, depending on the feature’s category, we write an iptables userspace module. By writing your new extension, you can match, mangle, track and give faith to a given packet or complete flows of interrelated connections. In fact, you can do almost everything you want in this world. Beware that a little error in a kernel module can crash the computer. We will explain the skeletal structures of Xtables and Netfilter modules with complete code examples and by this way, hope to make the interaction with the framework a little easier to understand. We assume you already know a bit about iptables and that you do have C programming skills. Copyright © 2005 Nicolas Bouliane <acidfu (at) people.netfilter.org>, Copyright © 2008–2012 Jan Engelhardt <jengelh (at) inai.de>. This work is made available under the Creative Commons Attribution-Noncom- mercial-Sharealike 3.0 (CC-BY-NC-SA) license. See http://creativecommons.org/ licenses/by-nc-sa/3.0/ for details. (Alternate arrangements can be made with the copyright holder(s).) Additionally, modifications to this work must clearly be indicated as such, and the title page needs to carry the words “Modified Version” if any such modifications have been made, unless the release was done by the designated maintainer(s). The Maintainers are members of the Netfilter Core Team, and any person(s) appointed as maintainer(s) by the coreteam.
    [Show full text]