Replacing iptables with eBPF in Kubernetes with Cilium: Cilium, eBPF, Envoy, Istio, Hubble


Michal Rostecki, Software Engineer, [email protected]
Swaminathan Vasudevan, Software Engineer, [email protected]

What's wrong with iptables?

What's wrong with legacy iptables?
iptables runs into a couple of significant problems:
● iptables updates must be made by recreating and updating all rules in a single transaction.
● Chains of rules are implemented as a linked list, so all operations are O(n).
● The standard practice of implementing access control lists (ACLs) with iptables is a sequential list of rules.
● Matching is based on IPs and ports; iptables is not aware of L7 protocols.
● Every time there is a new IP or port to match, rules need to be added and the chain changed.
● It has a high resource consumption on Kubernetes.
Because of these issues, performance degrades under heavy traffic or in a system with a large number of changes to iptables rules. Measurements show unpredictable latency and reduced performance as the number of services grows.

Kubernetes uses iptables for...
● kube-proxy, the component which implements Services and load balancing with DNAT iptables rules
● most of the CNI plugins, which use iptables for Network Policies

And it ends up like that.

What is BPF?

Linux Network Stack
● The Linux kernel stack is split into multiple abstraction layers.
● Linux has offered strong userspace API compatibility for years.
● This shows how complex the Linux kernel is and how many years of evolution it carries.
● It cannot be replaced in the short term.
● It is very hard to bypass the layers.
● The Netfilter module has been supported by Linux for more than two decades, and packet filtering is applied to packets as they move up and down the stack.
(diagram: processes, system call interface, sockets, TCP/UDP/raw, Netfilter, IPv4/IPv6, Ethernet, traffic shaping, netdevice/drivers, HW bridge/OVS)

BPF kernel hooks
(diagram: the same stack with the BPF attachment points: BPF system calls at the system call interface, BPF sockmap and sockops at the socket layer, BPF cgroups, BPF TC hooks at the traffic shaping layer, and BPF XDP at the netdevice/driver level)

(figure: performance in Mpps)

BPF replaces iptables
(diagram: the iptables path through the netfilter hooks PREROUTING, INPUT, FORWARD, OUTPUT and POSTROUTING, with their FILTER and NAT tables and routing decisions, compared with eBPF code attached at the TC and XDP hooks of the physical or virtual netdevices)

BPF based filtering architecture
(diagram: a TC/XDP ingress hook and a TC egress hook with connection tracking, ingress and egress chain selectors for local and remote sources and destinations, session state stored and updated per chain (INGRESS, FORWARD, OUTPUT), and packet labeling, alongside the netfilter path between the physical or virtual netdevices and the Linux stack)

BPF based tail calls
● LBVS is implemented with a chain of eBPF programs, connected through tail calls.
● Each eBPF program is injected only if there are rules operating on that field.
● Each eBPF program can exploit a different matching algorithm (e.g., exact match, longest prefix match, etc.).
● Packet header offsets are computed once and kept in a shared map for performance reasons; the bit vector with the temporary matching result lives in a per-CPU array shared across the entire program chain.
(diagram: header parsing, then per-field lookup programs such as IP.dst and IP.proto connected through tail calls, a bitwise AND of the resulting bit vectors, a search for the first matching rule, counter updates, and the final action, drop or accept)
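To make the tail-call mechanism concrete, here is a minimal sketch of the pattern: a parser program stores its results in a per-CPU scratch map and then tail-calls the next program of a classification chain kept in a BPF_MAP_TYPE_PROG_ARRAY. This is an illustrative example only, not code from any of the projects above; the map names, the chain slot number and the parse_ctx layout are invented for the sketch.

/* Minimal sketch of a tail-call chain (illustrative, not Cilium code).
 * The parser runs once, stores offsets in a per-CPU scratch map and
 * tail-calls the first matching program of the chain. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>

struct parse_ctx {
    __u32 l3_off;               /* offset of the IPv4 header */
    __u32 l4_off;               /* offset of the L4 header   */
};

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, struct parse_ctx);
} scratch SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 8);     /* one slot per matching program */
    __type(key, __u32);
    __type(value, __u32);
} chain SEC(".maps");

SEC("xdp")
int parse_headers(struct xdp_md *ctx)
{
    __u32 zero = 0;
    struct parse_ctx *pc = bpf_map_lookup_elem(&scratch, &zero);
    if (!pc)
        return XDP_ABORTED;

    /* Header parsing happens once for the whole chain. */
    pc->l3_off = sizeof(struct ethhdr);
    pc->l4_off = pc->l3_off + sizeof(struct iphdr);

    /* Jump to the next program of the chain, e.g. the IP.dst lookup. */
    bpf_tail_call(ctx, &chain, 1);

    /* Reached only if the tail call fails (empty slot). */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

A userspace loader is expected to populate the chain map with the file descriptors of the follow-up programs; if a slot is empty, bpf_tail_call() simply falls through to the code after it.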
BPF goes into...
● Load balancers: katran
● perf
● systemd
● Suricata
● Open vSwitch: AF_XDP
● And many, many others

BPF is used by...

Cilium

What is Cilium?

CNI Functionality
CNI is a CNCF (Cloud Native Computing Foundation) project for Linux containers. It consists of a specification and libraries for writing plugins, and it only cares about the network connectivity of containers (the ADD/DEL operations).
General container runtime considerations for CNI: the container runtime must
● create a new network namespace for the container before invoking any plugins;
● determine the networks for the container and add the container to each network by calling the corresponding plugin for each network;
● not invoke parallel operations for the same container;
● order ADD and DEL operations for a container, such that ADD is always eventually followed by a corresponding DEL;
● not call ADD twice (without a corresponding DEL) for the same (network name, container id, name of the interface inside the container).
When the CNI ADD call is invoked, the plugin adds the container to the network, creating the respective veth pair and assigning an IP address from the respective IPAM plugin or using the host scope. When the CNI DEL call is invoked, it removes the container network, releases the IP address back to the IPAM manager and cleans up the veth pair.

Cilium CNI Plugin control flow
(diagram: kubectl talks to the Kubernetes API server; the kubelet and CRI-containerd invoke cni-add() on the Cilium CNI plugin, which calls the Cilium agent; the agent programs the kernel through bpf() syscalls, populating BPF maps and the BPF hooks in the network stack)

Cilium components with BPF hook points and BPF maps shown in the Linux stack
(diagram: in userspace, the Cilium pod (control plane) with the CNI plugin, the cilium CLI, cilium-monitor, cilium-health, the agent daemon and the operator, next to the VMs and containers running the applications; in the kernel, BPF maps created with bpf_create_map and read with bpf_lookup_elem, sockmap/sockops programs attached with SO_ATTACH_BPF at the socket layer, per-container BPF programs at the IP layer, TC BPF programs at queueing and forwarding, and XDP at the device driver)
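Both diagrams above come down to the agent driving the kernel through the bpf() system call: creating maps, updating elements and looking them up. The userspace sketch below shows that call pattern with libbpf's low-level wrappers; the map name and the key/value layout (an IPv4 address mapped to a numeric identity) are made up for the example and do not reflect Cilium's actual maps.

/* Userspace sketch: create a BPF hash map and exercise it through libbpf's
 * low-level wrappers around the bpf(2) syscall. Build with -lbpf. */
#include <bpf/bpf.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical map: IPv4 address -> numeric identity. */
    int fd = bpf_map_create(BPF_MAP_TYPE_HASH, "endpoints",
                            sizeof(uint32_t), sizeof(uint32_t),
                            1024, NULL);
    if (fd < 0) {
        perror("bpf_map_create");
        return 1;
    }

    uint32_t ip = 0x0a000001;   /* 10.0.0.1 */
    uint32_t id = 42;           /* made-up identity value */

    bpf_map_update_elem(fd, &ip, &id, BPF_ANY);   /* BPF_MAP_UPDATE_ELEM */

    uint32_t out = 0;
    if (bpf_map_lookup_elem(fd, &ip, &out) == 0)  /* BPF_MAP_LOOKUP_ELEM */
        printf("identity for 10.0.0.1: %u\n", out);

    return 0;
}

A BPF program attached at one of the hook points can then share the same map (by file descriptor or via a pinned path), which is how userspace and the datapath exchange state.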
Cilium as CNI Plugin
(diagram: a Kubernetes cluster in which each pod's eth0 is wired to an lxc* interface on the node, and Cilium Networking CNI connects the nodes through their eth0 devices)

Networking modes
● Encapsulation. Use case: Cilium handles routing between nodes, using VXLAN tunnels between them.
● Direct routing. Use case: routing is done by cloud provider routers or by a BGP routing daemon.

Pod IP Routing - Overlay Routing (Tunneling mode)

Pod IP Routing - Direct Routing Mode

L3 filtering – label based, ingress
(diagram: pods labelled role=frontend (10.0.0.1, 10.0.0.2) are allowed to reach pods labelled role=backend (10.0.0.3, 10.0.0.4), while a pod without the frontend label (10.0.0.5) is denied)

L3 filtering – label based, ingress

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
description: "Allow frontends to access backends"
metadata:
  name: "frontend-backend"
spec:
  endpointSelector:
    matchLabels:
      role: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        role: frontend

L3 filtering – CIDR based, egress
(diagram: a pod labelled role=backend (10.0.0.1) is allowed to reach 10.0.1.1 in subnet 10.0.1.0/24 of cluster A, and is denied access to 10.0.2.1 in 10.0.2.0/24 and to any other IP not belonging to 10.0.1.0/24)

L3 filtering – CIDR based, egress

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
description: "Allow backends to access 10.0.1.0/24"
metadata:
  name: "frontend-backend"
spec:
  endpointSelector:
    matchLabels:
      role: backend
  egress:
  - toCIDR:
    - IP: "10.0.1.0/24"

L4 filtering

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
description: "Allow to access backends only on TCP/80"
metadata:
  name: "frontend-backend"
spec:
  endpointSelector:
    matchLabels:
      role: backend
  ingress:
  - toPorts:
    - ports:
      - port: "80"
        protocol: "TCP"

L4 filtering
(diagram: for the backend pod (10.0.0.1), traffic to TCP/80 is allowed and traffic to any other port is denied)
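In the datapath, a rule like this ends up as eBPF code attached at a hook such as TC ingress. The following is a hand-written, heavily simplified stand-in for what such a program does for the TCP/80 rule: IPv4 only, no connection tracking and no label/identity lookup. It is a sketch of the idea, not code generated by Cilium.

/* Simplified TC ingress filter: allow TCP/80, drop other L4 traffic.
 * A stand-in for a generated per-endpoint policy program, not Cilium code. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("tc")
int ingress_policy(struct __sk_buff *skb)
{
    void *data     = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return TC_ACT_SHOT;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return TC_ACT_OK;                 /* ignore non-IPv4 in this sketch */

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return TC_ACT_SHOT;
    if (ip->protocol != IPPROTO_TCP)
        return TC_ACT_SHOT;               /* the policy only allows TCP */

    struct tcphdr *tcp = (void *)ip + ip->ihl * 4;
    if ((void *)(tcp + 1) > data_end)
        return TC_ACT_SHOT;

    /* Allow TCP/80, deny any other port. */
    return tcp->dest == bpf_htons(80) ? TC_ACT_OK : TC_ACT_SHOT;
}

char _license[] SEC("license") = "GPL";

An object like this could be attached to an interface's clsact ingress hook, for example with something like tc filter add dev <dev> ingress bpf da obj policy.o sec tc, which corresponds to the TC hook points in the earlier diagrams.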
L7 filtering – API Aware Security
(diagram: pod 10.0.0.5 issuing GET /articles/{id} and GET /private requests against a pod labelled role=api (10.0.0.1))

L7 filtering – API Aware Security

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
description: "L7 policy to restrict access to specific HTTP endpoints"
metadata:
  name: "frontend-backend"
spec:
  endpointSelector:
    matchLabels:
      role: backend
  ingress:
  - toPorts:
    - ports:
      - port: "80"
        protocol: "TCP"
      rules:
        http:
        - method: "GET"
          path: "/article/$"

Standalone proxy, L7 filtering
(diagram: on each node, Envoy generates BPF programs for L7 filtering through libcilium.so, and Cilium generates the BPF programs for L3/L4 filtering; the nodes are connected over VXLAN)

Features

Cluster Mesh
(diagram: two clusters, A and B, with nodes, pods and containers attached through eth0 and BPF; the clusters are connected through an external etcd)

Istio (Transparent Sidecar injection) without Cilium
(diagram: on a node, every hop between a service, its sidecar proxy and the other pod traverses the full stack of socket, TCP/IP, iptables, Ethernet and loopback before reaching eth0 and the network)

Istio with Cilium and sockmap
(diagram: with the Cilium CNI, the sockets of the service and of the proxy are connected directly through sockmap, and only traffic leaving the node goes through TCP/IP, Ethernet and eth0)

Istio
(diagram: the Istio control plane, Pilot/Mixer/Citadel, above a Kubernetes cluster in which Services A, B and C run on nodes networked by Cilium Networking CNI)

Istio - Mutual TLS
(diagram: mutual TLS between Service A and Service B running on different nodes, on top of Cilium Networking CNI)

Istio - Deferred kTLS
(diagram: Service A and Service B with deferred kTLS encryption of traffic to external services, such as GitHub or an external cloud service, on top of Cilium Networking CNI)

Kubernetes Services: BPF and Cilium vs. iptables and kube-proxy
● BPF, Cilium: hash table.
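To make the hash-table point concrete, the sketch below shows the shape of a Service lookup when Services live in a BPF hash map: resolving a cluster IP and port is a single O(1) map lookup from the datapath, instead of walking a chain of iptables rules that grows with the number of Services. The map name and the key/value layout are invented for the example; Cilium's real service maps, backend selection and the DNAT and checksum rewriting are more involved.

/* Sketch: a Kubernetes Service lookup as an O(1) BPF hash map lookup.
 * Hypothetical map layout, not Cilium's actual service map. */
#include <linux/bpf.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct svc_key {
    __u32 vip;        /* service cluster IP (network byte order) */
    __u16 port;       /* service port (network byte order) */
    __u8  proto;      /* IPPROTO_TCP or IPPROTO_UDP */
    __u8  pad;
};

struct svc_backend {
    __u32 ip;         /* backend pod IP */
    __u16 port;       /* backend port */
    __u16 pad;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, struct svc_key);
    __type(value, struct svc_backend);
} services SEC(".maps");

SEC("xdp")
int svc_lookup_demo(struct xdp_md *ctx)
{
    /* In a real datapath the key comes from the parsed packet headers;
     * a fixed example key keeps the sketch short. */
    struct svc_key key = {
        .vip   = bpf_htonl(0x0a600001),   /* 10.96.0.1 */
        .port  = bpf_htons(443),
        .proto = IPPROTO_TCP,
    };

    struct svc_backend *be = bpf_map_lookup_elem(&services, &key);
    if (be) {
        /* A real program would rewrite the destination to be->ip:be->port
         * (DNAT) and fix up the IP/TCP checksums before forwarding. */
    }
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Adding or removing a Service or a backend then becomes a single map update from userspace instead of regenerating and reloading iptables chains.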