Packet filtering & Linux today: from iptables to nftables, back and more

/me
● Studied math and computer science
● 2018-     iNNOVO Cloud, Cloud Gardener: OpenStack, k8s, edge computing
● 2016-2018 FHE3, sysadmin, internal & external consultant
● 2012-2016 1&1, DNS team, system admin

About iNNOVO - IT as a Service platform provider
● Developing & operating 2x Tier 3+ DCs in Frankfurt
● Modular & standardised edge datacenters
● Bank-level compliance & security
● Standardised, agile ITaaS cloud platforms
● NEW: iMKE - iNNOVO managed Kubernetes engine
● 50+ employees, offices in Frankfurt and Berlin
● 80% tech engineers/admins, 20% business development + back office
● Great team, exciting tasks and interesting technology!
07.08.2019

Where are we?
● netfilter / iptables in the kernel since 11/2002
● in transition to nftables - migration?

How it works: hooks -> tables -> chains -> rules

A very basic example
● iptables -P INPUT DROP                    ← POLICY
● iptables -A INPUT -p icmp -j ACCEPT       ← CHAIN / MATCH / TARGET
● be more precise … why?
● iptables -A INPUT \
      -p icmp --icmp-type echo-request \
      -j ACCEPT

Where is iptables used?
● linux based router with firewall
● host firewalling
● docker
● k8s
● application level filtering
● debugging

How is iptables used?
- long list of n rules
- origin?
  - shell script
  - framework
  - …
- O(n) - worst case

How is iptables used? Issues?
Issues:
● long lists → tracking which rule matched which packet happens in the kernel
● O(n) worst case → high latencies
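The O(n) issue is easy to picture: plain iptables walks the rule list top to bottom for every packet. A minimal sketch (the set name, file name, and the TEST-NET-3 addresses are made up for illustration) that generates such a flat list in iptables-restore format:

```shell
# Generate a flat blocklist of N DROP rules in iptables-restore format.
# With plain iptables every packet traverses this list linearly, so the
# matching cost grows with the rule count: O(n) in the worst case.
gen_blocklist() {
    echo '*filter'
    for i in $(seq 1 "$1"); do
        # 203.0.113.0/24 is the TEST-NET-3 documentation range
        echo "-A INPUT -s 203.0.113.$((i % 254 + 1)) -j DROP"
    done
    echo 'COMMIT'
}

gen_blocklist 1000 > blocklist.rules
wc -l < blocklist.rules   # -> 1002 (header + 1000 rules + COMMIT)
```

Loading would be `iptables-restore < blocklist.rules`; the load itself is fast, but every packet still pays for the list length at match time.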
- Code duplication in userland and kernel:
  iptables, ip6tables, ebtables, arptables

How to cope with that?
● ignoring
  ○ missing knowledge/awareness
  ○ issue in big deployments
● big deployment?
  ○ linux based routers with many interfaces
  ○ host firewalls for IP blocking (before IP sets)
  ○ k8s network policies
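For the IP-blocking case, IP sets replace the long linear rule list with a single set lookup. A sketch in `ipset restore` format (the set name and addresses are made up):

```
create blocklist hash:ip family inet hashsize 4096
add blocklist 203.0.113.10
add blocklist 203.0.113.11
```

Load it with `ipset restore < blocklist.set`; one rule then matches the whole set: `iptables -A INPUT -m set --match-set blocklist src -j DROP`.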
Use Case - iptables performance for small rulesets

● enable services, simple stupid: SSH and HTTP(S) (DNS, or …) - how hard can that be? -> Easy
● Naive solution, via conntrack
● Pitfalls?

iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -i intern -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

Assume:
● Tables and chains empty
● iptables -P INPUT DROP

Use Case - iptables performance for small rulesets
● enable services, simple stupid: SSH and HTTP(S) (DNS, or …) - how hard can that be? -> Easy
● Naive solution, via conntrack

iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -i intern -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p udp --dport 53 -j ACCEPT
iptables -A INPUT -p tcp --dport 53 -j ACCEPT

Assume:
● Tables and chains empty
● iptables -P INPUT DROP

Use Case - iptables performance for small rulesets

● enable simple services: SSH and HTTP(S) (DNS, or …) - how hard can that be? -> Easy
● Naive solution, via conntrack
● Pitfalls?
  → the conntrack (expectation) tables might get exhausted
  → loss of control and service
# iptables -t raw -A PREROUTING -p udp --dport 53 -j NOTRACK
# iptables -t raw -A OUTPUT -p udp --sport 53 -j NOTRACK

iptables - performance for small rulesets
Use cases
● enable SSH and DNS, how hard can that be? -> Easy
● DNS DDoS, near line rate 10G/1G, many locations
● could not be filtered properly on AS borders
One possible solution: the u32 match
u32 filter generate-netfilter-u32-dns-rule
# python generate-netfilter-u32-dns-rule.py \
      --qname heise.de --qtype AAAA
0>>22&0x3C@20&0xFFDFDFDF=0x05484549&&0>>22&0x3C@24&0xDFDFFFDF=0x53450244&&0>>22&0x3C@28&0xDFFFFFFF=0x4500001C
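The hex constants in this expression are simply the DNS wire-format question, compared four bytes at a time: 0x05484549 is `\x05HEI`, the 0xDF mask bytes clear the ASCII case bit (making the match case-insensitive), and the trailing 0x001C is qtype AAAA (28). The wire form of `heise.de` can be inspected without iptables:

```shell
# DNS encodes a qname as length-prefixed labels ending in a 0x00 byte:
# "heise.de" -> \x05 h e i s e \x02 d e \x00
printf '\005heise\002de\000' | od -An -tx1
# -> 05 68 65 69 73 65 02 64 65 00
```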
# iptables [...] --match u32 --u32 "$rule" -j DROP

tune iptables performance
● state might kill -> connection tracking
● conntrack for protocols using UDP might be a bad idea
  * DNS, syslog, NTP ...
→ iptables -t raw -A <chain> -m <match> … -j NOTRACK
● sysctl tuneables for timeouts in the conntrack stack:
  net.netfilter.nf_conntrack_tcp_timeout_established=7200
  net.netfilter.nf_conntrack_udp_timeout=60
  net.netfilter.nf_conntrack_udp_timeout_stream=180
  ...

Examples: other cool matches
-m
● u32 - very flexible, but annoying to write
● bpf
● conntrack - use the state of connections
● cgroup
● probability - testing
● recent - port knocking without daemon
  https://www.digitalocean.com/community/tutorials/how-to-configure-port-knocking-using-only-iptables-on-an-ubuntu-vps
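A sketch of the `recent` port-knocking trick in iptables-restore format (the ports, list names, and the 10-second window are made up; a real setup, as in the linked tutorial, needs more surrounding policy): knocking on two ports in order, within the timeout, opens SSH for the knocking source.

```
*filter
:INPUT DROP [0:0]
# knock 1: remember the source address, but keep the port closed
-A INPUT -p tcp --dport 1234 -m recent --name KNOCK1 --set -j DROP
# knock 2: only counts if knock 1 happened within the last 10 seconds
-A INPUT -p tcp --dport 2345 -m recent --name KNOCK1 --rcheck --seconds 10 -m recent --name KNOCK2 --set -j DROP
# SSH opens only for sources that completed the sequence
-A INPUT -p tcp --dport 22 -m recent --name KNOCK2 --rcheck --seconds 10 -j ACCEPT
COMMIT
```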
Examples: other cool matches & targets

-j
● REDIRECT - application level filtering, debugging aid
● MARK / CONNMARK
● LOG / ULOG - logging / structured & flexible logging
● TRACE - ruleset debugging helper, shows packet flow through the rulesets

iptables - nftables - Transition

e.g. Debian 10 Buster - iptables-nft is standard
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
● iptables-nft vs. iptables-legacy
● What’s in /etc/modules, ...?
  ○ iptables-legacy-save | iptables-nft-restore
  ○ remove old modules: ipt_filter, ...
  ○ blacklist those modules
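Blacklisting the legacy modules can be done with a modprobe.d fragment; a sketch (the file name is made up, and the module list should be trimmed to what `lsmod` actually shows on the host):

```
# /etc/modprobe.d/blacklist-iptables-legacy.conf
blacklist ip_tables
blacklist iptable_filter
blacklist iptable_nat
blacklist iptable_mangle
blacklist iptable_raw
blacklist ip6_tables
blacklist ip6table_filter
```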
How it works: hooks -> tables -> chains -> rules

● dynamic table and chain creation
● no default tables and chains
→ netfilter hooks

nftables
# nft list tables
# nft list table inet filter
# nft flush ruleset
# nft add table inet filter
### iptables compat
# nft add chain inet filter input { type filter hook input priority 0 \; policy drop \; }
# nft add chain inet filter forward { type filter hook forward priority 0 \; policy drop \; }
# nft add chain inet filter output { type filter hook output priority 0 \; policy accept \; }

#!/usr/sbin/nft -f
# nft add rule inet filter input ct state related,established accept
# nft add rule inet filter input iif lo accept
# nft add rule inet filter input tcp dport 22 accept
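The same ruleset can be kept as an nft script; `nft -f` loads the whole file as one atomic transaction. A sketch combining the commands from this slide:

```
#!/usr/sbin/nft -f
flush ruleset

table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state related,established accept
        iif lo accept
        tcp dport 22 accept
    }
    chain forward {
        type filter hook forward priority 0; policy drop;
    }
    chain output {
        type filter hook output priority 0; policy accept;
    }
}
```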
atomicity

nftables: ingress hook
● no conntrack, runs before any other tables
● Why is this useful? → veth, macvtap, containers
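A sketch of an ingress-hook chain in the netdev family (the table name, device, and drop rule are examples); note the per-device attachment, which is what makes it handy for veth/macvtap:

```
table netdev early {
    chain ingress {
        type filter hook ingress device eth0 priority -500; policy accept;
        # dropped here: no conntrack entry is ever created
        ip saddr 192.0.2.0/24 drop
    }
}
```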
What else do we have?

● iptables/ip6tables/ebtables/arptables
● nftables
● tc
● bpfilter
● XDP

tc and tcpdump
● tc → traffic control, strange syntax, but useful
  ○ QoS
  ○ filtering
  ○ mirroring
  ○ network simulation
● tcpdump: libpcap compiles a BPF fragment
  → loaded into the kernel
  → attached to an interface
  → hands matching packets/frames over to tcpdump
● How to generate such fragments?
  Note: # tcpdump -ddd - use an interface with the same encapsulation

tc and tcpdump
# ip tuntap add dev tun0 mode tun; ip l set up tun0
# tcpdump -i tun0 -ddd icmp | tee filter.bpf
7
48 0 0 0
84 0 0 240
21 0 3 64
48 0 0 9
21 0 1 1
6 0 0 262144
6 0 0 0

Note: tun0 transports raw IP packets; this might look different on ethernet devices, which carry ethernet frames

tc and tcpdump
# tc qdisc add dev eth0 handle ffff: ingress
# tc filter add dev eth0 parent ffff: bpf bytecode-file filter.bpf action drop
# tc filter show dev eth0 parent ffff:

bpfilter, XDP
● similar to the nftables ingress hook: attach fragments to interfaces
● BPF, in fact eBPF
● hardware offloading possible!
  see Cilium's good quick start tutorial: https://docs.cilium.io/en/v1.4/bpf/
Fun fact: the BPF VM is a loopless 6502 derivative, but with proper register sizes

Questions?

tc and tcpdump - syntax pogo edition!
# tc qdisc add dev eth0 handle ffff: ingress
# tc filter add dev eth0 parent ffff: bpf bytecode-file filter.bpf action drop
# tc filter show dev eth0 parent ffff:
How to delete?
# tc filter del dev enp0s8 parent ffff:

local traffic redirection - debugging
# iptables -t nat -A OUTPUT -p tcp --dport 80 \
      -j REDIRECT --to-ports 8080
# iptables -t nat -A OUTPUT -p tcp --dport 443 \
      -j REDIRECT --to-ports 8080
# nc -l 0.0.0.0 8080
# mitmproxy --mode transparent --showhost -k
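The same local redirection can be expressed in nftables; a config sketch (the ports come from the slide, the table/chain names and the -100 nat priority are the conventional ones):

```
table ip nat {
    chain output {
        type nat hook output priority -100; policy accept;
        # redirect locally generated HTTP(S) traffic to the local proxy
        tcp dport { 80, 443 } redirect to :8080
    }
}
```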
tc and the network emulator

Simulate delays or losses:
# tc qdisc add dev eth0 root netem loss 10%
https://wiki.linuxfoundation.org/networking/netem

see also: iptables ... -m probability