OS Kernel Support for a Low-Overhead Container Overlay Network
Danyang Zhuo, Kaiyuan Zhang, Yibo Zhu, Hongqiang Harry Liu Matthew Rockett, Arvind Krishnamurthy, Thomas Anderson Containers are ubiquitous
Cache Big data
Web server Deep learning
Database Microservice VM
App App Container
OS OS App App
Hypervisor OS
Hardware Hardware How do containers communicate?
Two containers cannot • Host mode bind to the same port. • Use the host network interface to communicate
• Macvlan mode (or SR-IOV) • Make container’s IP address routable on the host network Complicates host network routing • Overlay mode • Container network virtualization
Are the network virtualization High overheads overheads fundamental? In this talk…
• Existing approach: Packet-based network virtualization results in high overheads.
• Slim: connection-based network virtualization that is compatible with existing Linux applications.
• Saving up to 56% CPU cycles on popular cloud applications (e.g., Memcached, Nginx, PostgreSQL, Apache Kafka). Container network virtualization
Container Container Container Container A B C D 1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7
vSwitch vSwitch
Host 10.1.2.3 Host 10.1.2.4
Give a set of containers an illusion of owning a dedicated network. Container network virtualization
Container Container Container Container A B C D 1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7
vSwitch vSwitch 1.2.3.7 Data Host 10.1.2.3 Host 10.1.2.4 Container network virtualization
Container Container Container Container A B C D 1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7
vSwitch vSwitch
Host 10.1.2.3 10.1.2.4 1.2.3.7Host Data 10.1.2.4 Container network virtualization
Container Container Container Container A B C D 1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7
1.2.3.7 Data vSwitch vSwitch
Host 10.1.2.3 Host 10.1.2.4
Packet-based network virtualization Why packet-based virtualization?
Container Container Container Container A B C D 1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7
vSwitch vSwitch
Host 10.1.2.3 Host 10.1.2.4 Performance overheads
Setup Throughput Latency (RTT)
Intra, Host 48.4 Gbps 5.9 us
Intra, Container 37.4 Gbps (23%) 7.9 us (34%)
Inter, Host 26.8 Gbps 11.3 us
Inter, Container 14.0 Gbps (48%) 20.9 us (85%)
Intel Xeon E5-2680 (2.5 GHz), Linux 4.4, Intel XL710 NIC (40G). Performance overheads
Setup Throughput Latency (RTT)
Vanilla 14.0 Gbps (48%) 20.9 us (85%)
Improved 24.5 Gbps (9%) 21.2 us (88%)
Packet steering
Host 26.8 Gbps 11.3 us CPU overheads, 10 Gbps
0.8
0.6 93% 60% 0.4
Virtual Cores 0.2
0 Vanilla Improved Host Packet-based virtualization
Container Container Container Container A B C D 1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7
vSwitch vSwitch
Host 10.1.2.3 Host 10.1.2.4 Container D 1.2.3.7 Packet-based virtualization Host 10.1.2.4
Container A 1.2.3.4
vSwitch
Host 10.1.2.3 Container D 1.2.3.7 Packet-based virtualization Host 10.1.2.4
Container A Host
Application • Socket vNIC NIC vSwitch • Accept 1.2.3.4 10.1.2.3 • Connect • Send • Recv • Close • …. • ….
POSIX Socket interface Container D 1.2.3.7 Packet-based virtualization Host 10.1.2.4
Container A Host
Application • Start (Target IP) vNIC NIC vSwitch • Send (Buffer) 1.2.3.4 10.1.2.3 • Recv (Buffer) • End Container D 1.2.3.7 Packet-based virtualization Host 10.1.2.4
Container A Host File descriptor Application • Con = Start (1.2.3.7) vNIC NIC vSwitch • Con.Send(“ABC”) 1.2.3.4 10.1.2.3 • Con.End()
A capability to send/receive packets to/from 1.2.3.7 through vNIC. Container D 1.2.3.7 Packet-based virtualization Host 10.1.2.4
Container A Host File descriptor Application • Con = Start (1.2.3.7) vNIC NIC vSwitch • Con.Send(“ABC”) 1.2.3.4 10.1.2.3 • Con.End()
Network stack (vNIC) Connections: • Con: 1.2.3.4 <-> 1.2.3.7 Packet generation, Device driver Container D 1.2.3.7 Packet-based virtualization Host 10.1.2.4
Container A Host
Application • Con = Start (1.2.3.7) vNIC NIC vSwitch • Con.Send(“ABC”) 1.2.3.4 10.1.2.3 • Con.End() 1.2.3.7 A “ABC” 1.2.3.7 B 1.2.3.7 C Network stack (vNIC) Connections: • Con: 1.2.3.4 <-> 1.2.3.7 Packet generation, Device driver Container D 1.2.3.7 Packet-based virtualization Host 10.1.2.4
Container A Host
Application • Con = Start (1.2.3.7) vNIC NIC vSwitch • Con.Send(“ABC”) 1.2.3.4 10.1.2.3 • Con.End() 10.1.2.4 1.2.3.7 A “ABC” 10.1.2.4 1.2.3.7 B Network stack (vNIC) 10.1.2.4 1.2.3.7 C Connections: • Con: 1.2.3.4 <-> 1.2.3.7 Packet generation, Device driver Container D 1.2.3.7 Packet-based virtualization Host 10.1.2.4
Container A Host
Application • Con = Start (1.2.3.7) vNIC NIC vSwitch • Con.Send(“ABC”) 1.2.3.4 10.1.2.3 • Con.End()
“ABC” Network stack (vNIC) Network stack (NIC) Connections: Connections: • Con: 1.2.3.4 <-> 1.2.3.7 Packet generation, Device driver Packet generation, Device driver In this talk…
• Existing approach: Packet-based network virtualization results in high overheads.
• Slim: connection-based network virtualization that is compatible with existing Linux applications.
• Saving up to 56% CPU cycles on popular cloud applications (e.g., Memcached, Nginx, PostgreSQL, Apache Kafka). Container D 1.2.3.7 Slim: connection-based virtualization Host 10.1.2.4
Container A Host
Application • Con = Start (1.2.3.7) vNIC NIC vSwitch • Con.Send(“ABC”) 1.2.3.4 10.1.2.3 • Con.End()
“ABC” Network stack (vNIC) Network stack (NIC) Connections: Connections: • Con: 1.2.3.4 <-> 1.2.3.7 Packet generation, Device driver Packet generation, Device driver Container D 1.2.3.7 Slim: connection-based virtualization Host 10.1.2.4
Container A Host
Application • Con = Start (1.2.3.7) vNIC NIC vSwitch • Con.Send(“ABC”) 1.2.3.4 10.1.2.3 • Con.End()
“ABC” Network stack (vNIC) Network stack (NIC) Connections: Connections: • Con: 1.2.3.4 <-> 1.2.3.7 Packet generation, Device driver Packet generation, Device driver Container D 1.2.3.7 Slim: connection-based virtualization Host 10.1.2.4
Container A Host
Application • Con = Start (1.2.3.7) NIC • Con.Send(“ABC”) 10.1.2.3 • Con.End()
“ABC” Network stack (NIC) Connections:
Packet generation, Device driver Container D 1.2.3.7 Challenge #1: Network virtualization Host 10.1.2.4
Container A Host
Application • Con = Start (1.2.3.7) NIC • Con.Send(“ABC”) What’s 10.1.2.3 • Con.End() 1.2.3.7?
Network stack (NIC) Connections: How to give the container an illusion of a dedicated network? Packet generation, Device driver Container D 1.2.3.7 Challenge #2: Compatibility Host 10.1.2.4
Container A Host
Application • Con = Start (1.2.3.7) NIC • Con.Send(“ABC”) 10.1.2.3 • Con.End()
Where’s my NIC? Network stack (NIC) Connections: How to work with unmodified applications? Packet generation, Device driver Container D 1.2.3.7 Challenge #3: Network Policies Host 10.1.2.4
Container A Host
Application • Con = Start (1.2.3.7) NIC • Con.Send(“ABC”) 10.1.2.3 • Con.End()
“ABC” Network stack (NIC) Connections: How do we enforce network policies? Packet generation, Device driver Container D 1.2.3.7 Challenge #4: Security Host 10.1.2.4
Container A Host
Application • Con = Start (1.2.3.7) NIC • Con.Send(“ABC”) 10.1.2.3 • Con.End() 10.1.2.3?
Tell me my IP address Network stack (NIC) Connections: How do we enforce security? Packet generation, Device driver Container D 1.2.3.7 Slim: connection-based virtualization Host 10.1.2.4
Container A: 1.2.3.4 Host
Application • Con = Start (1.2.3.7) NIC • Con.Send(“ABC”) 10.1.2.3 • Con.End()
Network stack (NIC) SlimRouter Connections:
Packet generation, Device driver Container D 1.2.3.7 Slim: connection-based virtualization Host 10.1.2.4
Container A: 1.2.3.4 Host
Application • Con = Start (1.2.3.7) NIC • Con.Send(“ABC”) 10.1.2.3 • Con.End()
Network stack (NIC) SlimRouter • 1.2.3.4 <-> 1.2.3.7 is Connections: mapped to 10.1.2.3 • Con: 10.1.2.3 <->10.1.2.4 <-> 10.1.2.4 Packet generation, Device driver Container D 1.2.3.7 Slim: connection-based virtualization Host 10.1.2.4
Container A: 1.2.3.4 Host
Application 10.1.2.4 A • Con = Start (1.2.3.7) NIC 10.1.2.4 B • Con.Send(“ABC”) 10.1.2.3 • Con.End() 10.1.2.4 C
“ABC” Network stack (NIC) SlimRouter • 1.2.3.4 <-> 1.2.3.7 is Connections: mapped to 10.1.2.3 • Con: 10.1.2.3 <->10.1.2.4 <-> 10.1.2.4 Packet generation, Device driver Container D 1.2.3.7 Slim: compatibility Host 10.1.2.4
Container A: 1.2.3.4 Host
Application SlimSocket • Con = Start (1.2.3.7) NIC • Con.Send(“ABC”) Dynamically 10.1.2.3 • Con.End() linked
Network stack (NIC) SlimRouter • 1.2.3.4 <-> 1.2.3.7 is Connections: mapped to 10.1.2.3 • Con: 10.1.2.3 <->10.1.2.4 <-> 10.1.2.4 Packet generation, Device driver sock = socket() h_s = socket() h_s = socket() con = socket() h_s h_s
bind(sock, h_s bind(h_s, 10.1.2.3, 1.2.3.4, 80) 1234)
listen(sock) Host connection created. listen(h_s)
h_s h_s con = h_c = accept(h_s, connect(h_s, 10.1.2.3, connect(con, accept(sock, addr) 1234) 1.2.3.4, 80) addr) h_c SlimRouter SlimRouter dup2(h_c, dup2(h_s, con) con)
send(con, buf) SlimSocket SlimSocket recv(con, buf) NIC NIC
recv(con, buf) IP = 10.1.2.3 IP = 10.1.2.4 send(con, buf)
Container Container
(a) Web server (b) Web client sock = socket() h_s = socket() h_s = socket() con = socket() h_s h_s
bind(sock, h_s bind(h_s, 10.1.2.3, 1.2.3.4, 80) 1234)
listen(sock)
listen(h_s)
h_s h_s con = h_c = accept(h_s, connect(h_s, 10.1.2.3, connect(con, accept(sock, addr) 1234) 1.2.3.4, 80) addr) h_c SlimRouterHost connection SlimRouter dup2(h_c, dup2(h_s, con) con)
send(con, buf) SlimSocket SlimSocket recv(con, buf) NIC NIC
recv(con, buf) IP = 10.1.2.3 IP = 10.1.2.4 send(con, buf)
Container Container
(a) Web server (b) Web client Slim: Support network policies
Container A: 1.2.3.4 Host
Application SlimSocket • Con = Start (1.2.3.7) NIC • Con.Send(“ABC”) 10.1.2.3 • Con.End()
Reject a connection if destination IP = 1.2.3.7 Network stack (NIC) SlimRouter Connections:
Packet generation, Device driver Slim: Security enforcement
Container A: 1.2.3.4 Host A capability to access NIC. Application SlimSocket • Con = Start (1.2.3.7) NIC • Con.Send(“ABC”) 10.1.2.3 • Con.End()
Con.GetMyIP() Con.IncreaseMyPriority() Network stack (NIC) SlimRouter • 1.2.3.4 <-> 1.2.3.7 is Connections: mapped to 10.1.2.3 • Con: 10.1.2.3 <->10.1.2.4 <-> 10.1.2.4 Packet generation, Device driver Slim: Security enforcement
Container A: 1.2.3.4 Host
Application SlimSocket • Con = Start (1.2.3.7) NIC • Con.Send(“ABC”) 10.1.2.3 • Con.End()
Your IP is 1.2.3.4 Con.GetMyIP() Network stack (NIC) SlimRouter • 1.2.3.4 <-> 1.2.3.7 is Connections: mapped to 10.1.2.3 • Con: 10.1.2.3 <->10.1.2.4 <-> 10.1.2.4 Packet generation, Device driver Slim: Security enforcement
Container A: 1.2.3.4 Host
Application SlimSocket • Con = Start (1.2.3.7) Container 1.2.3.4 NIC • Con.Send(“ABC”) Con.GetMyIP() is malicious. 10.1.2.3 • Con.End() Request rejected. SlimKernModule
Network stack (NIC) SlimRouter • 1.2.3.4 <-> 1.2.3.7 is Connections: mapped to 10.1.2.3 • Con: 10.1.2.3 <->10.1.2.4 <-> 10.1.2.4 Packet generation, Device driver In this talk…
• Existing approach: Packet-based network virtualization results in high overheads.
• Slim: connection-based network virtualization that is compatible with existing Linux applications.
• Saving up to 56% CPU cycles on popular cloud applications (e.g., Memcached, Nginx, PostgreSQL, Apache Kafka). Microbenchmark: performance
Setup Throughput Latency (RTT)
Vanilla 14.0 Gbps (48%) 20.9 us (85%)
Improved 24.5 Gbps (9%) 21.2 us (88%)
Slim 26.8 Gbps 11.4 us
Host 26.8 Gbps 11.3 us Microbenchmark: CPU
1.6 1.4 40% 1.2 1 0.8 Improved 0.6 Host 0.4
Virtual Cores Slim 0.2 0 0 2.5 5 7.5 10 12.5 15 17.5 20 22.5 25 TCP Throughput (Gbps) Evaluation: applications
• In-memory key-value store • Memcached 1.5.6 • Web server • Nginx 1.10.3 • Database • PostgreSQL 9.5 • Stream-processing framework • Kafka 2.0.0 Evaluation: Memcached performance
400 1.2 1.0 300 71% 0.8 42% 200 0.6 0.4
100 Latency (ms)
Throughput (K) 0.2 0 0.0 Vanilla Improved Slim Host Vanilla Improved Slim Host Evaluation: Memcached CPU 5 205M 71% 4 reqs/s 351M 363M reqs/s reqs/s 3 25% CPU reduction: 2 56%
Virtual Cores 1 0 Improved Slim Host Evaluations: CPU
Application CPU utilization reduction Memcached 56% Nginx 24% PostgreSQL 22% Apache Kafka 10% Limitation
• Connection setup time is longer • 270 us -> 556 us
• Limited support for packet-based network policies
• Cannot support unmodified low-level network tools
• Cannot speed up datagram sockets (i.e., UDP) Summary
• Packet-based network virtualization results in high overheads in terms of throughput, latency, and CPU utilization.
• Slim integrates efficient connection-based network virtualization support natively into the OS network stack. • Saving up to 56% CPU cycles on popular cloud applications.
• Open sourced at https://github.com/danyangz/slim.