Software Defining System Devices with the BANANA Double-Split Driver Model

Dan Williams, Hani Jamjoom (IBM Watson Research Center)

Hakim Weatherspoon (Cornell University)

Decoupling gives Flexibility

• Cloud’s flexibility comes from decoupling device functionality from physical devices – aka “virtualization”

• Can place VM anywhere
  – Consolidation
  – Instantiation
  – Migration
  – Placement optimizations

Are all Devices Decoupled?

• Today: the split driver model
  – Guests don't need a device-specific driver
  – A system portion interfaces with the physical device

[Figure: the split driver model. The Dom U front-end driver shares a ring buffer with the Dom 0 back-end driver; a system portion connects the back-end to the hardware driver, all running over the hypervisor on one physical machine.]

• Dependencies on hardware
  – Presence of device (e.g., GPU, FPGA)
  – Device-related configuration (e.g., VLAN)

Devices Limit Flexibility

• Split driver model: dependencies break if VM moves

• No easy place to plug into hardware driver
  – System portion connected in ad-hoc way

Banana Double-Split

• Clean separation between hardware driver (Corm) and backend driver (Spike)

• Standard interface (endpoint) between Corm and Spike

• Connected with wires

[Figure: the double-split model. The Dom U front-end driver still shares a ring buffer with Dom 0, but the back end is split into a Spike, which exposes an endpoint, and a Corm holding the hardware driver; endpoints are joined by wires over the hypervisor and physical machine.]

Roadmap

• Banana Double-Split Driver Model
• Implementation
• Experiments
• Related Work
• Summary

(Anatomy of a Banana Plant)

[Figure: anatomy of a banana plant, labeling the flower spike and the corm]

Corms

• One Corm per physical device

• Expose one or more endpoints
  – Virtual NIC-like interfaces

• Connect to one or more Spikes

Spikes

• One Spike per front-end driver

• Expose an endpoint

• Attach to a Corm

• No need to share a machine! A Spike and its Corm can sit on different physical hosts (see the sketch below).
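To make the Corm/Spike/endpoint relationship concrete, here is a minimal sketch in plain C; every type, field, and name below is an assumption made for illustration, not the paper's actual data structures.

```c
/* Hypothetical sketch of the Banana abstractions; all names are invented. */
#include <stdio.h>

#define MAX_ENDPOINTS 8

struct endpoint {
    char name[32];          /* e.g., "corm0.ep0" */
    char owner_host[64];    /* hypervisor that hosts this endpoint */
};

/* One Corm per physical device; it exposes one or more
 * virtual-NIC-like endpoints. */
struct corm {
    const char *device;                       /* e.g., "eth0" */
    struct endpoint endpoints[MAX_ENDPOINTS];
    int n_endpoints;
};

/* One Spike per front-end driver; it exposes a single endpoint that is
 * wired to some Corm endpoint, possibly on another physical machine. */
struct spike {
    const char *vm;             /* guest this Spike backs */
    struct endpoint endpoint;
    struct endpoint wired_to;   /* the Corm endpoint at the far side */
};

int main(void)
{
    struct corm nic = { .device = "eth0", .n_endpoints = 1,
                        .endpoints = { { "corm0.ep0", "host-A" } } };
    struct spike s = { .vm = "vm1",
                       .endpoint = { "spike-vm1", "host-B" },
                       .wired_to = nic.endpoints[0] };

    printf("%s on %s wired to %s on %s\n",
           s.endpoint.name, s.endpoint.owner_host,
           s.wired_to.name, s.wired_to.owner_host);
    return 0;
}
```

Note that the Spike and the Corm record different owner hosts: nothing in the abstraction forces them onto the same physical machine.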

Wires

• Connections between endpoints
  – E.g., tunnel, VPN, local bridge

• Each hypervisor contains an endpoint controller
  – Advertises endpoints (e.g., for Corms)
  – Looks up endpoints (e.g., for Spikes)
  – Sets the wire type
  – Integrates with VM migration

• Simple interface: connect and disconnect (see the sketch below)
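The slides only name the operations, so the following is a minimal sketch of what the endpoint controller's interface might look like; every identifier and the stubbed behavior are assumptions, not the controller's real API.

```c
/* Hypothetical sketch of the endpoint-controller interface implied by the
 * slides (advertise, lookup, connect/disconnect). All names are invented. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum wire_type {
    WIRE_NATIVE_BRIDGE,   /* both endpoints on the same host */
    WIRE_ENCAPSULATING,   /* in-kernel encapsulation across hosts */
    WIRE_TUNNEL           /* e.g., VPN across clouds */
};

struct endpoint_ref {
    char host[64];        /* hypervisor that owns the endpoint */
    char name[32];        /* e.g., "corm0.ep0" or "spike-vm1" */
};

/* Advertise a local endpoint (typically a Corm's) so other controllers
 * can discover it. Stubbed: a real controller would publish it. */
static bool ep_advertise(const struct endpoint_ref *ep)
{
    printf("advertise %s@%s\n", ep->name, ep->host);
    return true;
}

/* Look up a previously advertised endpoint (typically done for a Spike).
 * Stubbed: pretends the Corm endpoint lives on host-A. */
static bool ep_lookup(const char *name, struct endpoint_ref *out)
{
    strcpy(out->host, "host-A");
    strncpy(out->name, name, sizeof(out->name) - 1);
    out->name[sizeof(out->name) - 1] = '\0';
    return true;
}

/* Wire two endpoints together. A migration hook would disconnect the old
 * wire and call this again from the VM's new hypervisor. */
static bool ep_connect(const struct endpoint_ref *a,
                       const struct endpoint_ref *b, enum wire_type type)
{
    printf("connect %s@%s <-> %s@%s (wire type %d)\n",
           a->name, a->host, b->name, b->host, type);
    return true;
}

int main(void)
{
    /* On the Corm's hypervisor: */
    struct endpoint_ref corm_ep = { "host-A", "corm0.ep0" };
    ep_advertise(&corm_ep);

    /* On the Spike's hypervisor: */
    struct endpoint_ref spike_ep = { "host-B", "spike-vm1" }, found;
    ep_lookup("corm0.ep0", &found);
    return ep_connect(&spike_ep, &found, WIRE_ENCAPSULATING) ? 0 : 1;
}
```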

Implementation

• Xen network devices

• Endpoint exposes a netdev (see the sketch below)

• Grafted to existing devices via a bridge

[Figure: Xen network implementation. The Spike is the back-end (vif1.0) for the guest's front-end (eth0) over the ring buffer; the Spike's endpoint exposes a netdev that is bridged to a Corm endpoint, and the Corm's outgoing interface (eth0 or a vSwitch) carries endpoint traffic to remote hosts via eth0.]
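To make "Endpoint exposes a netdev" concrete, the following is a rough sketch (assumed names, reasonably recent Linux kernel) of an endpoint registering itself as a network device; it is not the paper's actual module.

```c
/* Sketch of a Banana endpoint exposed as a Linux net_device. */
#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>

static struct net_device *ep_dev;

static netdev_tx_t ep_xmit(struct sk_buff *skb, struct net_device *dev)
{
	/* A real endpoint would push the frame onto its wire (bridge,
	 * encapsulation, or tunnel); this stub just drops it. */
	dev_kfree_skb(skb);
	return NETDEV_TX_OK;
}

static const struct net_device_ops ep_ops = {
	.ndo_start_xmit = ep_xmit,
};

static void ep_setup(struct net_device *dev)
{
	ether_setup(dev);            /* look like an Ethernet NIC */
	dev->netdev_ops = &ep_ops;
	eth_hw_addr_random(dev);     /* endpoints have no real MAC */
}

static int __init ep_init(void)
{
	int err;

	ep_dev = alloc_netdev(0, "banana%d", NET_NAME_UNKNOWN, ep_setup);
	if (!ep_dev)
		return -ENOMEM;
	err = register_netdev(ep_dev);
	if (err)
		free_netdev(ep_dev);
	return err;
}

static void __exit ep_exit(void)
{
	unregister_netdev(ep_dev);
	free_netdev(ep_dev);
}

module_init(ep_init);
module_exit(ep_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Sketch of a Banana endpoint netdev");
```

Once such a device exists, grafting it onto the existing Xen networking is ordinary bridging (e.g., adding banana0 and vif1.0 to the same bridge).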

• Types of wires
  – Native (bridge)
  – Encapsulating (in a kernel module)
  – Tunneling (OpenVPN based)

• /proc interface for configuring wires (see the sketch below)

• Integrated with live migration
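The slides do not show the /proc interface itself, so the snippet below only illustrates the general shape of such a control file; the path /proc/banana/wires and the command syntax are invented for this example.

```c
/* Hypothetical user-space sketch of configuring a wire through /proc. */
#include <stdio.h>

/* Ask the endpoint controller to wire a local Spike endpoint to a remote
 * Corm endpoint using a given wire type. Returns 0 on success. */
static int connect_wire(const char *local_ep, const char *remote_host,
                        const char *remote_ep, const char *wire_type)
{
    FILE *f = fopen("/proc/banana/wires", "w");   /* assumed path */

    if (!f)
        return -1;
    fprintf(f, "connect %s %s:%s type=%s\n",
            local_ep, remote_host, remote_ep, wire_type);
    return fclose(f) == 0 ? 0 : -1;
}

int main(void)
{
    /* e.g., wire the Spike behind vm1 to a Corm endpoint on another host
     * using an encapsulating wire. */
    return connect_wire("spike-vm1", "10.0.0.2", "corm0.ep0", "encap");
}
```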

Initial Experimentation

• Ultimate test of location independence
  – Cross-cloud live migration
  – Keep using the same Corm

• Banana Wire Performance

Setup

• Amazon EC2 and local resources
  – EC2 (4XL): 33 ECUs, 23 GB memory, 10 Gbps Ethernet
  – Local: 12 cores @ 2.93 GHz, 24 GB memory, 1 Gbps Ethernet

• Xen-Blanket for nested virtualization
  – Dom 0: 8 vCPUs, 4 GB memory
  – PV guests: 4 vCPUs, 8 GB memory

• Local NFS server for VM disk images

• netperf to measure throughput and latency
  – 1400-byte packets

Cross-cloud Live Migration

• Continuous netperf traffic between VMs

• It works!

[Figure: throughput (Mbps) and latency (ms) over time (s) during migration, with the start and end of each VM's migration marked]

Corm Switching

• Dependency eliminated after the second VM migrated

• Switching Corms can recover performance

[Figure: throughput (Mbps) and latency (ms) over time (s) while switching Corms, with the start and end of each VM's migration marked]

Migration Numbers

• Experimental setup has overhead
  – Nesting increases downtime by 43%

• Flexibility to do such extreme migration has a cost
  – Migrating Banana endpoints introduces a further 30% overhead

• Cross-cloud live migration had 2.8 s downtime

Mean [std. dev] local-to-local live VM migration (s):

                      Downtime    Duration
  Banana              1.3 [0.5]   20.13 [0.1]
  None (nested)       1.0 [0.5]   20.04 [0.1]
  None (not nested)   0.7 [0.4]   19.86 [0.2]

Mean [std. dev] live VM migration (s):

                      Downtime    Duration
  Local to Local      1.3 [0.5]   20.13 [0.1]
  EC2 to EC2          1.9 [0.3]   10.69 [0.6]
  Local to EC2        2.8 [1.2]   162.25 [150.0]

Encapsulating Wires

[Figure: UDP throughput (Mbps), non-nested vs. nested, for baseline, tinc, OpenVPN, and the Banana wire]

• UDP throughput experiment

• Using a kernel module is especially important in the nested environment (see the sketch below)
  – A factor of 1.55 improvement becomes a factor of 3.28
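As background on what an encapsulating wire does, here is a user-space illustration of the idea: wrap an Ethernet frame in a small header naming the two endpoints and ship it over UDP. The header layout, port, and endpoint ids are invented; the actual Banana wire does this inside a kernel module, which presumably avoids the per-packet trips through user space that tools like OpenVPN and tinc incur.

```c
/* User-space illustration of an encapsulating wire (invented format). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define WIRE_PORT 9999           /* assumed UDP port for the wire */
#define MAX_FRAME 1514           /* Ethernet frame without FCS */

struct wire_hdr {
    uint32_t src_ep;             /* sending endpoint id */
    uint32_t dst_ep;             /* receiving endpoint id */
};

/* Encapsulate one Ethernet frame and send it to the remote endpoint. */
static int wire_send(int sock, const struct sockaddr_in *peer,
                     uint32_t src_ep, uint32_t dst_ep,
                     const void *frame, size_t len)
{
    unsigned char pkt[sizeof(struct wire_hdr) + MAX_FRAME];
    struct wire_hdr hdr = { htonl(src_ep), htonl(dst_ep) };

    if (len > MAX_FRAME)
        return -1;
    memcpy(pkt, &hdr, sizeof(hdr));
    memcpy(pkt + sizeof(hdr), frame, len);
    return (int)sendto(sock, pkt, sizeof(hdr) + len, 0,
                       (const struct sockaddr *)peer, sizeof(*peer));
}

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in peer = { .sin_family = AF_INET,
                                .sin_port = htons(WIRE_PORT) };
    unsigned char frame[64] = { 0 };    /* dummy Ethernet frame */

    if (sock < 0)
        return 1;
    inet_pton(AF_INET, "127.0.0.1", &peer.sin_addr);
    wire_send(sock, &peer, 1, 2, frame, sizeof(frame));
    close(sock);
    return 0;
}
```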

Related Work

• Other cut points – Block tap drivers (block devices)

• Continuous access – Netchannel

• Specialized solutions – Nomad, USB/IP

• Virtual networking – VL2, NetLord, VIOLIN, others

Summary

• The Banana Double-Split driver model decouples virtual devices from physical devices
  – Provides location independence
  – Mappings are software-defined

• So far, implemented network device
  – Demonstrated cross-cloud live migration

• Future directions
  – Controller to manage mappings/re-wirings?
  – Corm switching?
  – Other devices?

Thank You
