packet filtering & today from to , back and more

/me

Studied math and computer science

2018- iNNOVo Cloud, Cloud Gardener, OpenStack, k8s, edge computing

2016-2018 FHE3, Sysadmin, internal & external consultant

2012-2016 1&1, DNS Team, System Admin

About iNNOVO

IT as a Service Platform Provider

● Modular & standardised Edge Datacenters
● Bank-level Compliance & Security
● Developing & Operating standardised, agile ITaaS Cloud Platforms
● 2x Tier 3+ DCs in Frankfurt
● NEW!
● 50+ employees: 80% Tech Engineers/Admins, 20% Business Development + Backoffice
● Offices in Frankfurt and Berlin
● Great team, exciting tasks and interesting technology!
● iMKE: iNNOVO managed Kubernetes engine

07.08.2019

Where are we?

/ iptables since 11/2002
● in transition to nftables - Migration?

How it works: hooks -> tables -> chain -> rules

very basic example

● iptables -P INPUT DROP
  (POLICY: what happens to packets in the INPUT CHAIN that no rule matches)
● iptables -A INPUT -p icmp -j ACCEPT
  (CHAIN: INPUT, MATCH: -p icmp, TARGET: ACCEPT)
● be more precise … why?
● iptables -A INPUT \
    -p icmp --icmp-type echo-request \
    -j ACCEPT

Where is iptables used?

● linux based router with
● host firewalling
● docker
● k8s
● application level filtering
● debugging

How is iptables used?

- long list of n rules
  - origin?
    - shell script
    - framework
    - …
- O(n) - worst case

How is iptables used? Issues?

Issues
● long lists
  → tracking which rule matched which packet
  → evaluated in kernel, O(n) in the worst case
  → high latencies
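The O(n) worst case can be made concrete with a toy model of first-match evaluation. This is a minimal sketch, not the kernel's implementation: `Rule`, `evaluate` and the 1000-entry block list are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Packet = Dict[str, object]


@dataclass
class Rule:
    match: Callable[[Packet], bool]  # predicate, comparable to "-p icmp"
    target: str                      # "ACCEPT" or "DROP"


def evaluate(chain: List[Rule], policy: str, packet: Packet) -> Tuple[str, int]:
    """Walk the chain top to bottom; return (verdict, rules examined)."""
    for examined, rule in enumerate(chain, start=1):
        if rule.match(packet):
            return rule.target, examined
    # No rule matched: the chain policy decides.
    return policy, len(chain)


# Hypothetical ruleset: n per-address DROP rules, then one ACCEPT for SSH.
n = 1000
chain = [
    Rule(lambda p, ip=f"10.0.{i // 256}.{i % 256}": p["src"] == ip, "DROP")
    for i in range(n)
]
chain.append(Rule(lambda p: p["proto"] == "tcp" and p["dport"] == 22, "ACCEPT"))

first = evaluate(chain, "DROP", {"src": "10.0.0.0", "proto": "tcp", "dport": 22})
ssh = evaluate(chain, "DROP", {"src": "192.0.2.1", "proto": "tcp", "dport": 22})
other = evaluate(chain, "DROP", {"src": "192.0.2.1", "proto": "udp", "dport": 53})
print(first, ssh, other)
```

In this model the legitimate SSH packet is the expensive one: it has to pass every block rule before it reaches its ACCEPT.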

● Code duplication in userland and kernel
  - iptables, ip6tables, ebtables, arptables

How to cope with that?

● ignoring
  ○ missing knowledge/awareness
  ○ issue in big deployments
● big deployment?
  ○ linux based routers with many interfaces
  ○ host firewalls for IP blocking (before IP sets)
  ○ k8s network policies

Use Case - iptables performance for small rulesets

● enable services, simple stupid
  SSH and HTTP(S) (DNS, or …)
  how hard can that be? -> Easy
● Naive solution, via conntrack
● Pitfalls?

iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -i intern -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

Assume:
● Tables and chains empty
● iptables -P INPUT DROP

● enable services, simple stupid
  SSH and HTTP(S) (DNS, or …)
  how hard can that be? -> Easy
● Naive solution, via conntrack

iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -i intern -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p udp --dport 53 -j ACCEPT
iptables -A INPUT -p tcp --dport 53 -j ACCEPT

Assume:
● Tables and chains empty
● iptables -P INPUT DROP

Use cases

● enable simple services
  SSH and HTTP(S) (DNS, or …)
  how hard can that be? -> Easy
● Naive solution, via conntrack
● Pitfalls?
  → Conntrack expectation tables might get exhausted,
  → loss of control and service
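How quickly the table fills can be estimated with a back-of-the-envelope calculation. The sketch below applies Little's law (entries = arrival rate x entry lifetime). All numbers are assumptions for illustration: a table limit of 262144 entries (the kind of value set via `net.netfilter.nf_conntrack_max`), a 30 s UDP timeout, and an attack that creates 100000 new flows per second.

```python
import math


def steady_state_entries(new_flows_per_second: float, timeout_seconds: float) -> float:
    """Little's law: tracked entries = arrival rate x lifetime of one entry."""
    return new_flows_per_second * timeout_seconds


def seconds_until_full(table_size: int, new_flows_per_second: float,
                       timeout_seconds: float) -> float:
    """Time until the table is full, or infinity if it never fills."""
    if steady_state_entries(new_flows_per_second, timeout_seconds) <= table_size:
        return math.inf
    # The table fills before the first entry expires.
    return table_size / new_flows_per_second


# Assumed values, not measurements:
TABLE_SIZE = 262144      # hypothetical conntrack table limit
UDP_TIMEOUT = 30         # seconds an unreplied UDP entry is kept (assumed)
ATTACK_RATE = 100_000    # new (spoofed) flows per second (assumed)

needed = steady_state_entries(ATTACK_RATE, UDP_TIMEOUT)
time_to_full = seconds_until_full(TABLE_SIZE, ATTACK_RATE, UDP_TIMEOUT)
print(f"entries needed: {needed:.0f}, table full after {time_to_full:.2f} s")
```

Under these assumptions the table is full within seconds, after which new connections, including the admin's SSH session, can no longer be tracked.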

# iptables -t raw -A PREROUTING -p udp --dport 53 -j NOTRACK
# iptables -t raw -A OUTPUT -p udp --sport 53 -j NOTRACK

Use cases

● enable SSH and DNS, how hard can that be? -> Easy
● DNS DDoS, near line rate 10G/1G, many locations
● could not be filtered properly on AS borders

One possible solution, the u32 match:

u32 filter generate-netfilter-u32-dns-rule

# python generate-netfilter-u32-dns-rule.py \
    --qname heise.de --qtype AAAA

0>>22&0x3C@20&0xFFDFDFDF=0x05484549&&0>>22&0x3C@24&0xDFDFFFDF=0x53450244&&0>>22&0x3C@28&0xDFFFFFFF=0x4500001C
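To read the rule string above: `0>>22&0x3C@` skips the IP header, offset 20 skips the UDP header (8 bytes) and the DNS header (12 bytes), and each term compares one 32-bit word of the question section, with mask `0xDF` making letters case-insensitive. The following is a minimal sketch that rebuilds the same string. It is not the original `generate-netfilter-u32-dns-rule.py`; the function name `u32_dns_rule`, the small `QTYPES` table and the padding strategy are assumptions.

```python
import struct

# Small, assumed subset of DNS query types (name -> numeric type).
QTYPES = {"A": 1, "NS": 2, "MX": 15, "TXT": 16, "AAAA": 28}


def u32_dns_rule(qname: str, qtype: str) -> str:
    """Build an iptables u32 expression matching a DNS question over UDP/IPv4."""
    value = bytearray()
    mask = bytearray()
    for label in qname.rstrip(".").split("."):
        raw = label.encode("ascii")
        value.append(len(raw))          # length byte, compared exactly
        mask.append(0xFF)
        for byte in raw:
            if chr(byte).isalpha():
                value.append(byte & 0xDF)   # upper-case the letter
                mask.append(0xDF)           # ignore the case bit
            else:
                value.append(byte)
                mask.append(0xFF)
    value.append(0)                     # root label terminates the name
    mask.append(0xFF)
    value += struct.pack("!H", QTYPES[qtype])
    mask += b"\xff\xff"
    # Assumption: pad to a 4-byte boundary with ignored bytes (mask 0x00).
    # u32 still has to read those bytes, so very short packets would not match.
    while len(value) % 4:
        value.append(0)
        mask.append(0)
    terms = []
    for i in range(0, len(value), 4):
        v = int.from_bytes(value[i:i + 4], "big")
        m = int.from_bytes(mask[i:i + 4], "big")
        terms.append(f"0>>22&0x3C@{20 + i}&0x{m:08X}=0x{v:08X}")
    return "&&".join(terms)


rule = u32_dns_rule("heise.de", "AAAA")
print(rule)
```

For `heise.de` and `AAAA` the question section is exactly 12 bytes, so no padding is needed and the sketch yields the three terms shown above.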

# iptables [...] --match u32 --u32 "$rule" -j DROP

tune iptables performance

● state might kill -> connection tracking
● protocols using UDP -> might be a bad idea
  * DNS, syslog, NTP ...
  → iptables -t raw -A <chain> -m <match> … -j NOTRACK
● sysctl tuneable for timeouts in conntrack stack
  net.netfilter.nf_conntrack_tcp_timeout_established=7200
  net.netfilter.nf_conntrack_udp_timeout=60
  net.netfilter.nf_conntrack_udp_timeout_stream=180
  ...

Examples: other cool matches

-m

● u32 - very flexible, but annoying to write
● bpf
● conntrack - use the state of connections
● cgroup
● statistic (--mode random --probability) - testing
● recent - port knocking without daemon
  https://www.digitalocean.com/community/tutorials/how-to-configure-port-knocking-using-only-iptables-on-an-ubuntu-vps

Examples: other cool matches & targets

-j

● REDIRECT - Application level filtering, debugging aid
● MARK / CONNMARK
● LOG / ULOG - Logging / structured & flexible logging
● TRACE - ruleset debugging helper, shows packet flow through the rulesets

iptables - nftables - Transition

e.g. Debian 10 Buster - iptables-nft is standard

#Warning: iptables-legacy tables present, use iptables-legacy-save to see them

● iptables-nft vs. iptables-legacy
● What’s in /etc/modules, ...?
  ○ iptables-legacy-save | iptables-nft-restore
  ○ remove old modules iptable_filter, ....
  ○ blacklist those modules

How it works: hooks -> tables -> chain -> rules

● dynamic tables and chain creation
● no default tables and chains
  → netfilter hooks

nftables

# nft list tables
# nft list table inet filter

# nft flush ruleset

# nft add table inet filter

### iptables compat
# nft add chain inet filter input { type filter hook input priority 0 \; policy drop \; }
# nft add chain inet filter forward { type filter hook forward priority 0 \; policy drop \; }
# nft add chain inet filter output { type filter hook output priority 0 \; policy accept \; }

#!/usr/sbin/nft -f

# nft add rule inet filter input ct state related,established accept

# nft add rule inet filter input iif lo accept

# nft add rule inet filter input tcp dport 22 accept

atomicity

nftables: ingress hook

● no conntrack, before any other tables
● Why is this useful?
  → veth, macvtap, Containers

What else do we have?

● iptables/ip6tables/ebtables/arptables
● nftables
● tc
● bpfilter
● XDP

tc and tcpdump

● tc → traffic control, strange syntax, but useful
  ○ QoS
  ○ Filtering
  ○ Mirroring
  ○ Network simulation
● tcpdump
  pcap compiles BPF fragment
  → loaded into kernel,
  → attached to an interface
  → hand over matching packets/frames to tcpdump
● How to generate fragments?
  # tcpdump -ddd
  Note: use an interface with same encapsulation

# ip tuntap add dev tun0 mode tun; ip l set up tun0

# tcpdump -i tun0 -ddd icmp | tee filter.bpf
7
48 0 0 0
84 0 0 240
21 0 3 64
48 0 0 9
21 0 1 1
6 0 0 262144
6 0 0 0

Note: tun0 transports raw IP packets; the bytecode might look different for ethernet devices, which carry ethernet frames
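The `-ddd` output above is a count line followed by one classic BPF instruction per line, as `code jt jf k`. As a rough sketch of what these seven instructions do, the snippet below parses that text and runs it in a tiny interpreter. Only the four opcodes that occur in this filter are handled, and `parse_ddd`, `run` and the sample packets are hypothetical helpers, not part of tcpdump or tc.

```python
from typing import List, NamedTuple

FILTER = """7
48 0 0 0
84 0 0 240
21 0 3 64
48 0 0 9
21 0 1 1
6 0 0 262144
6 0 0 0
"""


class Insn(NamedTuple):
    code: int  # opcode
    jt: int    # relative jump if the comparison is true
    jf: int    # relative jump if the comparison is false
    k: int     # immediate operand


def parse_ddd(text: str) -> List[Insn]:
    """Parse 'tcpdump -ddd' output: instruction count, then code/jt/jf/k quads."""
    numbers = [int(token) for token in text.split()]
    count, rest = numbers[0], numbers[1:]
    if len(rest) != 4 * count:
        raise ValueError("instruction count does not match the data")
    return [Insn(*rest[i:i + 4]) for i in range(0, len(rest), 4)]


def run(program: List[Insn], packet: bytes) -> int:
    """Interpret the subset of classic BPF used here; 0 means 'no match'."""
    acc = 0
    pc = 0
    while pc < len(program):
        insn = program[pc]
        pc += 1
        if insn.code == 0x30:        # ldb [k]: load one byte at offset k
            if insn.k >= len(packet):
                return 0             # out-of-bounds load rejects the packet
            acc = packet[insn.k]
        elif insn.code == 0x54:      # and #k
            acc &= insn.k
        elif insn.code == 0x15:      # jeq #k, jt, jf
            pc += insn.jt if acc == insn.k else insn.jf
        elif insn.code == 0x06:      # ret #k: bytes to capture, 0 = drop
            return insn.k
        else:
            raise NotImplementedError(f"opcode {insn.code:#x}")
    return 0


program = parse_ddd(FILTER)
# Minimal raw IPv4 headers: version/IHL in byte 0, protocol in byte 9.
icmp = bytes([0x45] + [0] * 8 + [1] + [0] * 10)
tcp = bytes([0x45] + [0] * 8 + [6] + [0] * 10)
ipv6 = bytes([0x60] + [0] * 39)
print(run(program, icmp), run(program, tcp), run(program, ipv6))
```

Read this way, the filter checks that the IP version nibble is 4, then that the protocol byte is 1 (ICMP), and returns a non-zero snap length only in that case.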

# tc qdisc add dev eth0 handle ffff: ingress
# tc filter add dev eth0 parent ffff: bpf bytecode-file filter.bpf action drop

# tc filter show dev eth0 parent ffff:

bpfilter, XDP

● similar to nftables ingress hook, attach fragments to interfaces
● BPF, in fact eBPF
● Hardware offloading possible!
  see Cilium, good quick start tutorial, https://docs.cilium.io/en/v1.4/bpf/

Fun fact: loopless 6502 derivative, but with proper register sizes

Questions?

tc and tcpdump - syntax pogo edition!

# tc qdisc add dev eth0 handle ffff: ingress
# tc filter add dev eth0 parent ffff: bpf bytecode-file filter.bpf action drop

# tc filter show dev eth0 parent ffff:

How to delete?
tc filter del dev enp0s8 parent ffff:

local traffic redirection - debugging

# iptables -t nat -A OUTPUT -p tcp --dport 80 \
    -j REDIRECT --to-ports 8080

# iptables -t nat -A OUTPUT -p tcp --dport 443 \
    -j REDIRECT --to-ports 8080

# nc -l 0.0.0.0 8080

# mitmproxy --mode transparent --showhost -k

tc and the network emulator

Simulate delays or losses
tc qdisc add dev eth0 root netem loss 10%
https://wiki.linuxfoundation.org/networking/netem
iptables ... -m statistic --mode random --probability …