
OS Kernel Support for a Low-Overhead Container Overlay Network

Danyang Zhuo, Kaiyuan Zhang, Yibo Zhu, Hongqiang Harry Liu, Matthew Rockett, Arvind Krishnamurthy, Thomas Anderson

Containers are ubiquitous

[Figure: example containerized workloads: cache, big data, web server, deep learning, database, microservice.]

[Figure: VM stack (App / OS / Hypervisor / Hardware) next to container stack (App / OS / Hardware).]

How do containers communicate?

• Host mode
  • Use the host network interface to communicate
  • Drawback: two containers cannot bind to the same port.

• Macvlan mode (or SR-IOV)
  • Make container’s IP address routable on the host network
  • Drawback: complicates host network routing.

• Overlay mode
  • Container network virtualization
  • Drawback: high overheads.

Are the network virtualization overheads fundamental?

In this talk…

• Existing approach: Packet-based network virtualization results in high overheads.

• Slim: connection-based network virtualization that is compatible with existing applications.

• Saving up to 56% CPU cycles on popular cloud applications (e.g., Memcached, Nginx, PostgreSQL, Apache Kafka).

Container network virtualization

[Figure: containers A (1.2.3.4), B (1.2.3.5), C (1.2.3.6), and D (1.2.3.7) run on two hosts, 10.1.2.3 and 10.1.2.4, each with a vSwitch. Container A is on host 10.1.2.3; container D is on host 10.1.2.4.]

Give a set of containers an illusion of owning a dedicated network.

[Animation: a packet "1.2.3.7 | Data" reaches the vSwitch on host 10.1.2.3; it crosses the host network as "10.1.2.4 | 1.2.3.7 | Data"; the vSwitch on host 10.1.2.4 receives it and forwards "1.2.3.7 | Data" to container D.]

Packet-based network virtualization

Why packet-based virtualization?

[Figure: the same topology of four containers on two hosts.]

Performance overheads

Setup              Throughput        Latency (RTT)
Intra, Host        48.4 Gbps         5.9 us
Intra, Container   37.4 Gbps (23%)   7.9 us (34%)
Inter, Host        26.8 Gbps         11.3 us
Inter, Container   14.0 Gbps (48%)   20.9 us (85%)

Testbed: Intel Xeon E5-2680 (2.5 GHz), Linux 4.4, Intel XL710 NIC (40G).
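The percentages in the first performance table can be reproduced from its raw numbers. The sketch below assumes each percentage is the overhead relative to the host baseline in the same setting (throughput lost, latency added); the function names are illustrative and do not come from the talk.

```python
def throughput_overhead(host_gbps: float, container_gbps: float) -> float:
    """Fraction of the host throughput that is lost in container overlay mode."""
    return (host_gbps - container_gbps) / host_gbps


def latency_overhead(host_us: float, container_us: float) -> float:
    """Fraction of extra round-trip time relative to the host baseline."""
    return (container_us - host_us) / host_us


# Numbers copied from the table: (throughput overhead, latency overhead).
intra = (throughput_overhead(48.4, 37.4), latency_overhead(5.9, 7.9))
inter = (throughput_overhead(26.8, 14.0), latency_overhead(11.3, 20.9))

for name, (tput, lat) in (("Intra", intra), ("Inter", inter)):
    print(f"{name}: throughput -{tput:.0%}, latency +{lat:.0%}")
```

Under that assumption the computed values round to the 23%/34% and 48%/85% shown on the slide.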

Setup      Throughput        Latency (RTT)
Vanilla    14.0 Gbps (48%)   20.9 us (85%)
Improved   24.5 Gbps (9%)    21.2 us (88%)
Host       26.8 Gbps         11.3 us

Improved: packet steering.

CPU overheads, 10 Gbps

[Figure: bar chart of CPU usage in virtual cores (y-axis, 0 to 0.8) for Vanilla, Improved, and Host; annotations: 93%, 60%.]

Packet-based virtualization

[Figure: the same topology, then a zoom on container A (1.2.3.4) on host 10.1.2.3 sending to container D (1.2.3.7) on host 10.1.2.4. Inside host 10.1.2.3 the data path is Application -> vNIC (1.2.3.4) -> vSwitch -> NIC (10.1.2.3).]

POSIX Socket interface
• Socket
• Accept
• Connect
• Send
• Recv
• Close
• …

Abstracted as:
• Start (Target IP)
• Send (Buffer)
• Recv (Buffer)
• End

Example:
• Con = Start (1.2.3.7)
• Con.Send(“ABC”)
• Con.End()

Con is a file descriptor: a capability to send/receive packets to/from 1.2.3.7 through vNIC.

Network stack (vNIC)
• Connections: Con: 1.2.3.4 <-> 1.2.3.7
• Packet generation, Device driver

The vNIC network stack turns “ABC” into packets addressed to the container:
• 1.2.3.7 | A
• 1.2.3.7 | B
• 1.2.3.7 | C

The vSwitch then adds an outer header addressed to the destination host:
• 10.1.2.4 | 1.2.3.7 | A
• 10.1.2.4 | 1.2.3.7 | B
• 10.1.2.4 | 1.2.3.7 | C
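The per-packet work in this walkthrough can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the vSwitch implementation: the class names, the `OVERLAY_TO_HOST` table, and the one-byte payload size (chosen only to mirror the A, B, C packets on the slide) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OverlayPacket:
    dst_overlay_ip: str  # container address, e.g. 1.2.3.7
    payload: bytes


@dataclass(frozen=True)
class EncapsulatedPacket:
    dst_host_ip: str  # outer header: address of the destination host
    inner: OverlayPacket


# Hypothetical vSwitch table: overlay IP -> IP of the host running that container.
OVERLAY_TO_HOST = {"1.2.3.7": "10.1.2.4"}


def packetize(dst_overlay_ip: str, data: bytes, max_payload: int = 1) -> List[OverlayPacket]:
    """Container network stack (vNIC): split the byte stream into packets."""
    return [
        OverlayPacket(dst_overlay_ip, data[i:i + max_payload])
        for i in range(0, len(data), max_payload)
    ]


def encapsulate(pkt: OverlayPacket) -> EncapsulatedPacket:
    """Sending vSwitch: wrap every packet in an outer host header."""
    return EncapsulatedPacket(OVERLAY_TO_HOST[pkt.dst_overlay_ip], pkt)


def decapsulate(pkt: EncapsulatedPacket) -> OverlayPacket:
    """Receiving vSwitch: strip the outer header again."""
    return pkt.inner


wire = [encapsulate(p) for p in packetize("1.2.3.7", b"ABC")]
for p in wire:
    print(p.dst_host_ip, p.inner.dst_overlay_ip, p.inner.payload)
```

The point of the sketch is that `encapsulate` and `decapsulate` run once per packet, so their cost grows with the amount of traffic.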

[Figure: on host 10.1.2.3, “ABC” passes through two network stacks, each with its own connection table, packet generation, and device driver: Network stack (vNIC), which holds Con: 1.2.3.4 <-> 1.2.3.7, and Network stack (NIC).]

In this talk…

• Existing approach: Packet-based network virtualization results in high overheads.

• Slim: connection-based network virtualization that is compatible with existing Linux applications.

• Saving up to 56% CPU cycles on popular cloud applications (e.g., Memcached, Nginx, PostgreSQL, Apache Kafka).

Slim: connection-based virtualization

[Animation: starting from the packet-based design with two network stacks, the vNIC, the vSwitch, and Network stack (vNIC) are removed. The application in container A then sends “ABC” through Network stack (NIC) (connections, packet generation, device driver) and the NIC (10.1.2.3) toward container D (1.2.3.7) on host 10.1.2.4.]

Challenge #1: Network virtualization

[Figure: the application in container A calls Con = Start (1.2.3.7) directly on Network stack (NIC) of host 10.1.2.3; callout: “What’s 1.2.3.7?”]

How to give the container an illusion of a dedicated network?

Challenge #2: Compatibility

[Figure: the same setup; callout: “Where’s my NIC?”]

How to work with unmodified applications?

Challenge #3: Network Policies

[Figure: the same setup, with “ABC” sent through Network stack (NIC).]

How do we enforce network policies?

Challenge #4: Security

[Figure: the same setup; callouts: “Tell me my IP address” and “10.1.2.3?”]

How do we enforce security?

Slim: connection-based virtualization

[Figure: the application in container A (1.2.3.4) on host 10.1.2.3 calls Con = Start (1.2.3.7). A SlimRouter sits next to Network stack (NIC) on the host.]

SlimRouter
• 1.2.3.4 <-> 1.2.3.7 is mapped to 10.1.2.3 <-> 10.1.2.4

Network stack (NIC)
• Connections: Con: 10.1.2.3 <-> 10.1.2.4
• Packet generation, Device driver
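The mapping on this slide happens once per connection rather than once per packet. A minimal Python sketch of that lookup follows; the `CONTAINER_LOCATION` directory and the function name are assumptions made for illustration, not SlimRouter's actual data structures.

```python
from typing import Dict, NamedTuple


class Connection(NamedTuple):
    src: str
    dst: str


# Hypothetical directory: overlay IP of a container -> IP of the host it runs on.
CONTAINER_LOCATION: Dict[str, str] = {
    "1.2.3.4": "10.1.2.3",
    "1.2.3.7": "10.1.2.4",
}


def map_to_host(overlay: Connection) -> Connection:
    """Translate an overlay connection into a host connection at setup time."""
    return Connection(CONTAINER_LOCATION[overlay.src], CONTAINER_LOCATION[overlay.dst])


host_con = map_to_host(Connection("1.2.3.4", "1.2.3.7"))
print(host_con)
```

After this single translation, data sent on the connection only needs host addresses.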

[Figure: “ABC” now traverses only Network stack (NIC). Its packets carry a single header addressed to host 10.1.2.4:]
• 10.1.2.4 | A
• 10.1.2.4 | B
• 10.1.2.4 | C

Slim: compatibility

[Figure: SlimSocket is dynamically linked into the application in container A (1.2.3.4). SlimRouter sits next to Network stack (NIC) on host 10.1.2.3 and maps 1.2.3.4 <-> 1.2.3.7 to the host connection 10.1.2.3 <-> 10.1.2.4.]

[Figure: socket calls in the container (application and SlimSocket) and the matching calls in SlimRouter on the host. Descriptors h_s and h_c are passed from SlimRouter to SlimSocket.]

(a) Web server, host IP = 10.1.2.3

  Container                     SlimRouter
  sock = socket()               h_s = socket()
  bind(sock, 1.2.3.4, 80)       bind(h_s, 10.1.2.3, 1234)
  listen(sock)                  listen(h_s)
  con = accept(sock, addr)      h_c = accept(h_s, addr)
  dup2(h_c, con)
  recv(con, buf)
  send(con, buf)

(b) Web client, host IP = 10.1.2.4

  Container                     SlimRouter
  con = socket()                h_s = socket()
  connect(con, 1.2.3.4, 80)     connect(h_s, 10.1.2.3, 1234)
  dup2(h_s, con)
  send(con, buf)
  recv(con, buf)

Host connection created.
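The dup2 step in the figure is what lets an unmodified application keep using its original descriptor. The Python sketch below shows only that descriptor swap on a POSIX system. As a simplifying assumption, a pre-connected `socket.socketpair()` stands in for the host connection that SlimRouter would create; none of this is Slim's own code.

```python
import os
import socket


def descriptor_swap_demo() -> bytes:
    # Descriptor the application obtained from socket(). It is never connected.
    con = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    app_fd = con.fileno()

    # Stand-in for the host connection (assumption: a local socketpair instead
    # of a TCP connection between two hosts).
    h_s, remote_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # Key step: make the application's descriptor number refer to the host
    # connection, then close the extra reference.
    os.dup2(h_s.fileno(), app_fd)
    h_s.close()

    # The application keeps using `con` and its descriptor number is unchanged.
    assert con.fileno() == app_fd
    con.sendall(b"ABC")
    received = remote_end.recv(3)

    con.close()
    remote_end.close()
    return received


print(descriptor_swap_demo())
```

Because `dup2` replaces what the descriptor number points to, the application's later send and recv calls need no changes.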

Slim: Support network policies

[Figure: the application in container A (1.2.3.4) calls Con = Start (1.2.3.7) through SlimSocket; SlimRouter sits next to Network stack (NIC) on host 10.1.2.3.]

Example policy: reject a connection if destination IP = 1.2.3.7.
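A connection-level policy like this one can be checked once, when the connection is requested. The sketch below is a hypothetical illustration of such a check in Python; the deny list, the exception type, and the `start` function are invented names, and the talk does not show how Slim stores or evaluates its policies.

```python
from typing import FrozenSet

# Hypothetical deny list holding the example policy from the slide.
BLOCKED_DESTINATIONS: FrozenSet[str] = frozenset({"1.2.3.7"})


class ConnectionRejected(Exception):
    """Raised when a connection request violates a policy."""


def authorize(src_overlay_ip: str, dst_overlay_ip: str) -> None:
    """Check the request against the policy before any host connection exists."""
    if dst_overlay_ip in BLOCKED_DESTINATIONS:
        raise ConnectionRejected(f"{src_overlay_ip} -> {dst_overlay_ip} denied by policy")


def start(src_overlay_ip: str, dst_overlay_ip: str) -> str:
    """Toy version of Con = Start(dst): authorize first, then 'connect'."""
    authorize(src_overlay_ip, dst_overlay_ip)
    return f"{src_overlay_ip} <-> {dst_overlay_ip}"


try:
    start("1.2.3.4", "1.2.3.7")
except ConnectionRejected as err:
    print("rejected:", err)
```

A check of this kind costs nothing per packet, but it can only express rules that are decidable at connection setup.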

Slim: Security enforcement

[Figure: the application in container A (1.2.3.4) holds Con through SlimSocket. SlimRouter has mapped 1.2.3.4 <-> 1.2.3.7 to the host connection Con: 10.1.2.3 <-> 10.1.2.4.]

Con is a capability to access the NIC. Example calls on Con:
• Con.GetMyIP()
• Con.IncreaseMyPriority()

[Animation, first build: Con.GetMyIP() is answered with “Your IP is 1.2.3.4”. Second build: a SlimKernModule appears on the host; callout: “Container 1.2.3.4 is malicious. Request rejected.”]

In this talk…

• Existing approach: Packet-based network virtualization results in high overheads.

• Slim: connection-based network virtualization that is compatible with existing Linux applications.

• Saving up to 56% CPU cycles on popular cloud applications (e.g., Memcached, Nginx, PostgreSQL, Apache Kafka).

Microbenchmark: performance

Setup      Throughput        Latency (RTT)
Vanilla    14.0 Gbps (48%)   20.9 us (85%)
Improved   24.5 Gbps (9%)    21.2 us (88%)
Slim       26.8 Gbps         11.4 us
Host       26.8 Gbps         11.3 us

Microbenchmark: CPU

[Figure: CPU usage in virtual cores (y-axis, 0 to 1.6) versus TCP throughput in Gbps (x-axis, 0 to 25) for Improved, Host, and Slim; annotation: 40%.]

Evaluation: applications

• In-memory key-value store
  • Memcached 1.5.6
• Web server
  • Nginx 1.10.3
• Database
  • PostgreSQL 9.5
• Stream-processing framework
  • Kafka 2.0.0

Evaluation: Memcached performance

[Figure: two bar charts for Vanilla, Improved, Slim, and Host: throughput (K), y-axis 0 to 400, and latency (ms), y-axis 0 to 1.2; annotations: 71%, 42%.]

Evaluation: Memcached CPU

[Figure: bar chart of CPU usage in virtual cores (y-axis, 0 to 5) for Improved, Slim, and Host; bar labels: 205M reqs/s, 351M reqs/s, 363M reqs/s; annotations: 71%, 25%, CPU reduction: 56%.]

Evaluations: CPU

Application     CPU utilization reduction
Memcached       56%
Nginx           24%
PostgreSQL      22%
Apache Kafka    10%

Limitation

• Connection setup time is longer
  • 270 us -> 556 us

• Limited support for packet-based network policies

• Cannot support unmodified low-level network tools

• Cannot speed up datagram sockets (i.e., UDP)

Summary

• Packet-based network virtualization results in high overheads in terms of throughput, latency, and CPU utilization.

• Slim integrates efficient connection-based network virtualization support natively into the OS network stack.

• Saving up to 56% CPU cycles on popular cloud applications.

Open sourced at https://github.com/danyangz/slim.