Performance Benchmarking and Tuning for Container Networking on Arm
Trevor Tao, Jianlin Lv, Jingzhao Ni, Song Zhu
Sep 2020
© 2020 Arm Limited (or its affiliates)

Agenda
• Introduction
• Container Networking Interfaces (CNIs) on Arm
• Benchmarking metrics, environment and tools
• Benchmarking results
• Initial performance analysis with perf tools
• Future work (provisional)

Introduction
What is CNI?
• CNI (Container Network Interface), a Cloud Native Computing Foundation project, consists of a specification and libraries for writing plugins to configure network interfaces in Linux containers, along with a number of supported plugins.
• CNI concerns itself only with the network connectivity of containers and with removing allocated resources when a container is deleted.
• CNI is widely supported; the specification itself is simple to implement, although implementing its extensions is not.
• CNIs are the de facto networking support for Kubernetes.
• We need to know how they perform on the Arm platform.

Kubernetes Networking Model
Kubernetes makes opinionated choices about how Pods are networked (a quick connectivity check is sketched after the CNI overview below):
• All Pods can communicate with all other Pods without using network address translation (NAT).
• All Nodes can communicate with all Pods without NAT.
• The IP that a Pod sees itself as is the same IP that others see it as.
Networking objects:
• Container-to-Container networking
• Pod-to-Pod networking
• Pod-to-Service networking
• Internet-to-Service networking

Container Networking Interfaces (CNIs) on Arm
High Performance CNIs available for the Arm Edge Stack
What is now available in the Akraino IEC Arm edge stack, as a reference (repo: https://gerrit.akraino.org/r/admin/repos/iec):

Calico
• Pure IP networking fabric
• High-level network policy management by iptables
• Supports access between hosts by direct (non-overlay) and overlay (IPinIP, VxLAN) network connections
• Good scalability
• Easy deployment
• Calico-VPP appears too (a VPP data plane for Calico)

Cilium
• Linux-native, API-aware networking and security for containers
• Linux eBPF based network policy, load balancing and security, which is believed to deliver incredible performance

Contiv-VPP
• Uses FD.io VPP to provide network connectivity between Pods
• Native DPDK interface support
• Native VPP ACL/NAT based network policy and L3 networking between hosts
• Good performance, but with rather complex configuration
• Hard to debug

OVN-K8s
• OVS/OVN-controller based K8s networking solution
• Rather good performance, inherited from OVS
• Uses OVN logical switches/routers to connect Pods and for outside access
• Good scalability
• No OVS-DPDK support for now

SRIOV
• Direct physical interface (PF/VF) support for Pods
• High performance with a direct Linux kernel Ethernet driver or a DPDK PMD driver for the physical NIC
• Usually co-works with other CNIs, such as Flannel or Calico, via Multus or another glue CNI
• Needs resource descriptions or annotations when configuring the CNI and the Pod setup

Flannel
• Widely used and almost the easiest deployment for a simple K8s cluster
• Linux network bridge for Pod connections and overlay-based communication for inter-host access
• Easy to integrate into other container networking solutions, e.g., Cilium
• No good network policy support
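As a quick illustration of the networking model above, the following shell sketch checks cross-node Pod-to-Pod reachability without NAT. It is only a sketch: it assumes a working cluster with one of the CNIs listed, the busybox image, and hypothetical pod names pod-a and pod-b; it is not part of the benchmark setup itself.

    # Start two test pods; the scheduler may place them on different nodes.
    kubectl run pod-a --image=busybox --restart=Never --command -- sleep 3600
    kubectl run pod-b --image=busybox --restart=Never --command -- sleep 3600

    # Show which node each pod landed on and which pod IP the CNI assigned.
    kubectl get pods -o wide

    # Ping pod-b from pod-a by its pod IP; with a conformant CNI this works
    # across nodes, and pod-b sees pod-a's own pod IP (no NAT).
    POD_B_IP=$(kubectl get pod pod-b -o jsonpath='{.status.podIP}')
    kubectl exec pod-a -- ping -c 3 "$POD_B_IP"

If both pods land on the same node, one of them can be recreated with a node selector; the check itself is CNI-agnostic.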
CNI Networking Models: Flannel and Cilium
(Figures referenced and modified from web sources.)
• Flannel: backend IPIP or VXLAN; tested version: v0.11.0
• Cilium: backend VXLAN or direct routing (not tested now); tested version: master branch compiled at 2020-09-09

CNI Networking Models: Calico and the Kubernetes Service Implementation
(Figures quoted from web sources.)
• Calico: tested version: v3.13.2

Benchmarking metrics, environment and tools
Benchmarking metrics
• Protocols: TCP, UDP, HTTP(s)
• TCP, UDP metrics: bandwidth in Mbits/sec or Gbits/sec, round-trip delay in ms
• HTTP(s) metrics: bandwidth in Mbits/sec or Gbits/sec, CPS (connections per second), RPS (requests per second)
Tools
• iperf, wrk
Network connection
• 10 Gbps connection from an Ethernet Controller XXV710 to an 82599ES 10-Gigabit SFI/SFP+ Network Connection (10fb)
Server platform (lscpu excerpt; exact core counts and cache sizes elided in the source)
• Architecture: aarch64, Byte Order: Little Endian
• CPU(s): xxx, On-line CPU(s) list: 0-xxx, Thread(s) per core: 4
• CPU max MHz: 2500.0000, CPU min MHz: 1000.0000, BogoMIPS: 400.00
• NUMA node0 CPU(s): 0-xxx, NUMA node1 CPU(s): xxx-yyy
• L1d cache: xxK, L1i cache: xxK, L2 cache: xxxK, L3 cache: xxxxK

Test topologies and commands
• iperf (v2) test topology, client command (an MTU sweep around it is sketched after the TCP results below):
    iperf -c ${SERVER_IP} -t ${time} -i 1 -w 100K -P 4
  Server command:
    iperf -s
• wrk (HTTP performance) test topology with an nginx server, test command:
    wrk -t12 -c1000 -d30s http://$IP/files/$file

Benchmarking Results
Benchmarking Results of TCP Throughput for CNIs with Different Backends
[Charts: node-to-pod and pod-to-pod TCP throughput, BW (Gbps) vs MTU size (1500 to 9000 bytes), for Calico IPIP tunnel, Flannel IPIP tunnel, Flannel VXLAN tunnel, Cilium VXLAN tunnel and Calico direct routing (no tunnel), over an inter-host 10 Gb/s Ethernet connection.]
Observations for TCP performance over CNIs:
• The performance gap between CNIs is not so pronounced when an overlay tunnel is used.
• Calico and Flannel show slightly better performance than Cilium for most MTUs here.
• With an IPIP/VXLAN overlay tunnel enabled, the larger the MTU size, the better the throughput (BW).
• With direct routing (here by Calico; Cilium also supports this mode), throughput is not significantly affected by the MTU size.
• The performance of direct routing is better than with the IPIP tunnel enabled.
• The IPIP tunnel performs a little better than the VXLAN tunnel.
• In general, node-to-pod TCP performance is better than pod-to-pod, which traverses one more step (a veth connection into the Linux kernel).
Finally, comparing the different scenarios shows that the IPIP/VXLAN overlay tunnels, which are implemented in the Linux kernel, are the key factor affecting the performance of CNIs on Arm.
Question: why is the node-to-pod performance no better than the pod-to-pod case for Cilium?
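The TCP curves above sweep the MTU from 1500 to 9000 bytes using the iperf v2 command shown on the tools slide. A minimal sketch of such a sweep follows; SERVER_IP, the interface name and the use of ip link to change the MTU are assumptions, since in the real setup the MTU is configured through the CNI (Calico/Flannel/Cilium) settings rather than on a single interface.

    #!/bin/bash
    # Hypothetical MTU sweep around the iperf v2 client command from the tools slide.
    # SERVER_IP and IFACE are placeholders; run "iperf -s" on the server side first.
    SERVER_IP=${SERVER_IP:-10.244.1.10}
    IFACE=${IFACE:-eth0}
    TIME=30

    for MTU in 1500 2000 3000 4000 5000 6000 7000 8000 9000; do
        # Assumption: set the MTU directly on the test interface; a real CNI
        # deployment would set this in the CNI configuration instead.
        ip link set dev "$IFACE" mtu "$MTU"
        echo "=== MTU ${MTU} ==="
        iperf -c "$SERVER_IP" -t "$TIME" -i 1 -w 100K -P 4 | tail -n 1
    done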
HTTP Performance Benchmarking for Calico CNI
Pod2Pod HTTP performance with the Calico IPIP overlay for cross-host communication
[Chart: BW (MB/s) vs file size accessed by wrk (600B, 10KB, 100KB, 1MB, 10MB, 100MB, 500MB) for MTUs 1480, 2000, 3000, 4000, 5000, 6000, 7000, 8000 and 8980; labelled points include 9.09, 100.82, 369.53, 510.09, 760.33, 767.05 and 775.54 MB/s.]
Initial observations:
• MTU has a rather bigger effect on performance when accessing large files, but when the accessed file size is small it has little effect.
• The accessed file size is a major factor in HTTP performance when there are only a small number of parallel threads.
• When the file size is big enough, performance cannot be improved much even with bigger MTUs.

Pod2Pod HTTP performance with Calico non-IPIP (no overlay) for cross-host communication
Wrk: 5 threads, 10 connections.
[Chart: BW (MB/s) vs file size for MTUs 1500 to 9000; labelled points include 9.78, 114.96, 596.93, 1000 and 1120 MB/s.]
Initial observations:
• Almost the same as that of IPIP.
• The file size has a much more significant performance impact than the MTU.
• For file sizes >= 10MB, the MTU has little effect on the final performance.
• The performance is much higher than with IPIP when the file size is >= 100KB (see the next chart).
Question: why, for small file sizes, is the performance with smaller MTUs even better than with larger MTUs?

Pod2Pod HTTP performance with Calico IPIP vs non-IPIP for cross-host communication
Wrk: 5 threads, 10 connections.
[Chart: BW (MB/s) vs file size for IPIP-MTU-1480, non-IPIP-MTU-1500, IPIP-MTU-5000, non-IPIP-MTU-5000, IPIP-MTU-8980 and non-IPIP-MTU-9000; labelled points include 10.28, 107.6, 580.31, 1020 and 1130 MB/s.]
Initial observations:
• For file sizes >= 10MB, the MTU has little effect on the final performance.
• The non-IPIP performance is much higher than IPIP when the file size is >= 100KB.
• When the MTU is small, the performance gap between IPIP and non-IPIP is larger.

Host2Pod vs Host2Service HTTP performance with Calico IPIP and non-IPIP for cross-host communication
[Chart: BW (MB/s) vs file size; labelled points include 9.12, 102.9, 559, 1010 and 1090 MB/s.]
Observations:
• The performance gap is minor when accessing small files.
• For small file sizes, the host2pod and host2service performance is almost the same, which means the service access (via iptables rules configured by kube-proxy) is not the bottleneck for the HTTP service.
• The performance of non-IPIP is much higher than IPIP when the file size is >= 100KB.
• For large MTUs and large file sizes, the host2pod performance is better than host2svc.
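The HTTP results above were taken with wrk against an nginx pod, using 5 threads, 10 connections and files from 600B to 500MB. A minimal sketch of such a file-size sweep follows; NGINX_IP, the /files/ path and the file names are assumptions based on the wrk test command shown on the tools slide.

    #!/bin/bash
    # Hypothetical wrk sweep over the file sizes used in the Calico HTTP charts.
    # NGINX_IP is a placeholder for the pod or service IP under test; the files
    # are assumed to be pre-generated under /files/ in the nginx pod.
    NGINX_IP=${NGINX_IP:-10.244.2.20}

    for FILE in 600B 10KB 100KB 1MB 10MB 100MB 500MB; do
        echo "=== file ${FILE} ==="
        wrk -t5 -c10 -d30s "http://${NGINX_IP}/files/${FILE}"
    done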