Packet filtering & Linux today: from iptables to nftables, back and more

/me
● Studied math and computer science
● 2018-     iNNOVO Cloud, Cloud Gardener: OpenStack, k8s, edge computing
● 2016-2018 FHE3, sysadmin, internal & external consultant
● 2012-2016 1&1, DNS team, system admin

About iNNOVO - IT as a Service platform provider
● Developing & operating 2x Tier 3+ DCs in Frankfurt
● Modular & standardised edge datacenters
● Bank-level compliance & security
● Standardised, agile ITaaS cloud platforms
● NEW: iMKE - iNNOVO managed Kubernetes engine
● 50+ employees, offices in Frankfurt and Berlin
● 80% tech engineers/admins, 20% business development + back office
● Great team, exciting tasks and interesting technology!
07.08.2019

Where are we?
● netfilter / iptables in the kernel since 11/2002
● in transition to nftables - migration?

How it works: hooks -> tables -> chains -> rules

A very basic example
● iptables -P INPUT DROP                    ← POLICY
● iptables -A INPUT -p icmp -j ACCEPT       ← CHAIN / MATCH / TARGET
● be more precise … why?
● iptables -A INPUT \
      -p icmp --icmp-type echo-request \
      -j ACCEPT

Where is iptables used?
● linux based router with firewall
● host firewalling
● docker
● k8s
● application level filtering
● debugging

How is iptables used?
- long list of n rules
- origin?
  - shell script
  - framework
  - …
- O(n) - worst case

How is iptables used? Issues?
Issues:
● long lists → tracking which rule matched which packet happens in the kernel
● O(n) worst case → high latencies
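The O(n) issue is easy to picture: plain iptables walks the rule list top to bottom for every packet. A minimal sketch (the set name, file name, and the TEST-NET-3 addresses are made up for illustration) that generates such a flat list in iptables-restore format:

```shell
# Generate a flat blocklist of N DROP rules in iptables-restore format.
# With plain iptables every packet traverses this list linearly, so the
# matching cost grows with the rule count: O(n) in the worst case.
gen_blocklist() {
    echo '*filter'
    for i in $(seq 1 "$1"); do
        # 203.0.113.0/24 is the TEST-NET-3 documentation range
        echo "-A INPUT -s 203.0.113.$((i % 254 + 1)) -j DROP"
    done
    echo 'COMMIT'
}

gen_blocklist 1000 > blocklist.rules
wc -l < blocklist.rules   # -> 1002 (header + 1000 rules + COMMIT)
```

Loading would be `iptables-restore < blocklist.rules`; the load itself is fast, but every packet still pays for the list length at match time.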
- Code duplication in userland and kernel:
  iptables, ip6tables, ebtables, arptables

How to cope with that?
● ignoring
  ○ missing knowledge/awareness
  ○ issue in big deployments
● big deployment?
  ○ linux based routers with many interfaces
  ○ host firewalls for IP blocking (before IP sets)
  ○ k8s network policies
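For the IP-blocking case, IP sets replace the long linear rule list with a single set lookup. A sketch in `ipset restore` format (the set name and addresses are made up):

```
create blocklist hash:ip family inet hashsize 4096
add blocklist 203.0.113.10
add blocklist 203.0.113.11
```

Load it with `ipset restore < blocklist.set`; one rule then matches the whole set: `iptables -A INPUT -m set --match-set blocklist src -j DROP`.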
Use Case - iptables performance for small rulesets

● enable services, simple stupid: SSH and HTTP(S) (DNS, or …) - how hard can that be? -> Easy
● Naive solution, via conntrack
● Pitfalls?

iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -i intern -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

Assume:
● Tables and chains empty
● iptables -P INPUT DROP

Use Case - iptables performance for small rulesets
● enable services, simple stupid: SSH and HTTP(S) (DNS, or …) - how hard can that be? -> Easy
● Naive solution, via conntrack

iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -i intern -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p udp --dport 53 -j ACCEPT
iptables -A INPUT -p tcp --dport 53 -j ACCEPT

Assume:
● Tables and chains empty
● iptables -P INPUT DROP

Use Case - iptables performance for small rulesets

● enable simple services: SSH and HTTP(S) (DNS, or …) - how hard can that be? -> Easy
● Naive solution, via conntrack
● Pitfalls?
  → the conntrack (expectation) tables might get exhausted
  → loss of control and service
# iptables -t raw -A PREROUTING -p udp --dport 53 -j NOTRACK
# iptables -t raw -A OUTPUT -p udp --sport 53 -j NOTRACK

iptables - performance for small rulesets
Use cases
● enable SSH and DNS, how hard can that be? -> Easy
● DNS DDoS, near line rate 10G/1G, many locations
● could not be filtered properly on AS borders
One possible solution: the u32 match
u32 filter generate-netfilter-u32-dns-rule
# python generate-netfilter-u32-dns-rule.py \
      --qname heise.de --qtype AAAA
0>>22&0x3C@20&0xFFDFDFDF=0x05484549&&0>>22&0x3C@24&0xDFDFFFDF=0x53450244&&0>>22&0x3C@28&0xDFFFFFFF=0x4500001C
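The hex constants in this expression are simply the DNS wire-format question, compared four bytes at a time: 0x05484549 is `\x05HEI`, the 0xDF mask bytes clear the ASCII case bit (making the match case-insensitive), and the trailing 0x001C is qtype AAAA (28). The wire form of `heise.de` can be inspected without iptables:

```shell
# DNS encodes a qname as length-prefixed labels ending in a 0x00 byte:
# "heise.de" -> \x05 h e i s e \x02 d e \x00
printf '\005heise\002de\000' | od -An -tx1
# -> 05 68 65 69 73 65 02 64 65 00
```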
# iptables [...] --match u32 --u32 "$rule" -j DROP

tune iptables performance
● state might kill -> connection tracking
● conntrack for protocols using UDP might be a bad idea
  * DNS, syslog, NTP ...
→ iptables -t raw -A <chain> -m <match> … -j NOTRACK
● sysctl tuneables for timeouts in the conntrack stack:
  net.netfilter.nf_conntrack_tcp_timeout_established=7200
  net.netfilter.nf_conntrack_udp_timeout=60
  net.netfilter.nf_conntrack_udp_timeout_stream=180
  ...

Examples: other cool matches
-m
● u32 - very flexible, but annoying to write
● bpf
● conntrack - use the state of connections
● cgroup
● probability - testing
● recent - port knocking without daemon
  https://www.digitalocean.com/community/tutorials/how-to-configure-port-knocking-using-only-iptables-on-an-ubuntu-vps
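A sketch of the `recent` port-knocking trick in iptables-restore format (the ports, list names, and the 10-second window are made up; a real setup, as in the linked tutorial, needs more surrounding policy): knocking on two ports in order, within the timeout, opens SSH for the knocking source.

```
*filter
:INPUT DROP [0:0]
# knock 1: remember the source address, but keep the port closed
-A INPUT -p tcp --dport 1234 -m recent --name KNOCK1 --set -j DROP
# knock 2: only counts if knock 1 happened within the last 10 seconds
-A INPUT -p tcp --dport 2345 -m recent --name KNOCK1 --rcheck --seconds 10 -m recent --name KNOCK2 --set -j DROP
# SSH opens only for sources that completed the sequence
-A INPUT -p tcp --dport 22 -m recent --name KNOCK2 --rcheck --seconds 10 -j ACCEPT
COMMIT
```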
Examples: other cool matches & targets

-j
● REDIRECT - application level filtering, debugging aid
● MARK / CONNMARK
● LOG / ULOG - logging / structured & flexible logging
● TRACE - ruleset debugging helper, shows packet flow through the rulesets

iptables - nftables - Transition

e.g. Debian 10 Buster - iptables-nft is standard
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
● iptables-nft vs. iptables-legacy
● What’s in /etc/modules, ...?
  ○ iptables-legacy-save | iptables-nft-restore
  ○ remove old modules: ipt_filter, ...
  ○ blacklist those modules
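Blacklisting the legacy modules can be done with a modprobe.d fragment; a sketch (the file name is made up, and the module list should be trimmed to what `lsmod` actually shows on the host):

```
# /etc/modprobe.d/blacklist-iptables-legacy.conf
blacklist ip_tables
blacklist iptable_filter
blacklist iptable_nat
blacklist iptable_mangle
blacklist iptable_raw
blacklist ip6_tables
blacklist ip6table_filter
```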
How it works: hooks -> tables -> chains -> rules

● dynamic table and chain creation
● no default tables and chains
→ netfilter hooks

nftables
# nft list tables
# nft list table inet filter
# nft flush ruleset
# nft add table inet filter
### iptables compat
# nft add chain inet filter input { type filter hook input priority 0 \; policy drop \; }
# nft add chain inet filter forward { type filter hook forward priority 0 \; policy drop \; }
# nft add chain inet filter output { type filter hook output priority 0 \; policy accept \; }

#!/usr/sbin/nft -f
# nft add rule inet filter input ct state related,established accept
# nft add rule inet filter input iif lo accept
# nft add rule inet filter input tcp dport 22 accept
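The same ruleset can be kept as an nft script; `nft -f` loads the whole file as one atomic transaction. A sketch combining the commands from this slide:

```
#!/usr/sbin/nft -f
flush ruleset

table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state related,established accept
        iif lo accept
        tcp dport 22 accept
    }
    chain forward {
        type filter hook forward priority 0; policy drop;
    }
    chain output {
        type filter hook output priority 0; policy accept;
    }
}
```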
atomicity

nftables: ingress hook
● no conntrack, runs before any other tables
● Why is this useful? → veth, macvtap, containers
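A sketch of an ingress-hook chain in the netdev family (the table name, device, and drop rule are examples); note the per-device attachment, which is what makes it handy for veth/macvtap:

```
table netdev early {
    chain ingress {
        type filter hook ingress device eth0 priority -500; policy accept;
        # dropped here: no conntrack entry is ever created
        ip saddr 192.0.2.0/24 drop
    }
}
```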
What else do we have?

● iptables/ip6tables/ebtables/arptables
● nftables
● tc
● bpfilter
● XDP

tc and tcpdump
● tc → traffic control, strange syntax, but useful
  ○ QoS
  ○ filtering
  ○ mirroring
  ○ network simulation
● tcpdump: libpcap compiles a BPF fragment
  → loaded into the kernel
  → attached to an interface
  → hands matching packets/frames over to tcpdump
● How to generate such fragments?
  Note: # tcpdump -ddd - use an interface with the same encapsulation

tc and tcpdump
# ip tuntap add dev tun0 mode tun; ip l set up tun0
# tcpdump -i tun0 -ddd icmp | tee filter.bpf
7
48 0 0 0
84 0 0 240
21 0 3 64
48 0 0 9
21 0 1 1
6 0 0 262144
6 0 0 0

Note: tun0 transports raw IP packets; this might look different on ethernet devices, which carry ethernet frames

tc and tcpdump
# tc qdisc add dev eth0 handle ffff: ingress
# tc filter add dev eth0 parent ffff: bpf bytecode-file filter.bpf action drop
# tc filter show dev eth0 parent ffff:

bpfilter, XDP
● similar to the nftables ingress hook: attach fragments to interfaces
● BPF, in fact eBPF
● hardware offloading possible!
  see Cilium's good quick start tutorial: https://docs.cilium.io/en/v1.4/bpf/
Fun fact: the BPF VM is a loopless 6502 derivative, but with proper register sizes

Questions?

tc and tcpdump - syntax pogo edition!
# tc qdisc add dev eth0 handle ffff: ingress
# tc filter add dev eth0 parent ffff: bpf bytecode-file filter.bpf action drop
# tc filter show dev eth0 parent ffff:
How to delete?
# tc filter del dev enp0s8 parent ffff:

local traffic redirection - debugging
# iptables -t nat -A OUTPUT -p tcp --dport 80 \
      -j REDIRECT --to-ports 8080
# iptables -t nat -A OUTPUT -p tcp --dport 443 \
      -j REDIRECT --to-ports 8080
# nc -l 0.0.0.0 8080
# mitmproxy --mode transparent --showhost -k
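The same local redirection can be expressed in nftables; a config sketch (the ports come from the slide, the table/chain names and the -100 nat priority are the conventional ones):

```
table ip nat {
    chain output {
        type nat hook output priority -100; policy accept;
        # redirect locally generated HTTP(S) traffic to the local proxy
        tcp dport { 80, 443 } redirect to :8080
    }
}
```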
tc and the network emulator

Simulate delays or losses:
# tc qdisc add dev eth0 root netem loss 10%
https://wiki.linuxfoundation.org/networking/netem

see also: iptables ... -m probability