Unikernels as Processes

Dan Williams, Ricardo Koller (IBM Research)
Martin Lucina (robur.io / Center for the Cultivation of Technology)
Nikhil Prakash (BITS Pilani)

What is a unikernel?
• An application linked with library OS components
• Runs on a virtual hardware-like abstraction (a VM)
• Language-specific
  • MirageOS (OCaml)
  • IncludeOS (C++)
• Legacy-oriented
  • Rumprun (NetBSD-based)
  • Can run nginx, redis, node.js, python, etc.
Why unikernels?
• Lightweight
  • Only what the application needs
• Isolated
  • VM isolation is the "gold standard"
• Well suited for the cloud
  • Microservices
  • Serverless
  • NFV
Virtualization is a mixed bag
• Good for isolation, but…
• Tooling for VMs was not designed to be lightweight (cf. LightVM)
• How do you debug black-box VMs?
• Poor VM performance due to vmexits
• Deployment issues on already-virtualized infrastructure
Why not run unikernels as processes?
• Unikernels are a single process anyway!
• Many benefits as a process:
  • Better performance
  • Common tooling (gdb, perf, etc.)
  • ASLR
  • Memory sharing
  • Architecture independence
• Isolation by limiting the process interface to the host
  • 98% reduction in accessible kernel functions
Outline
• Introduction
• Where does unikernel isolation come from?
• Unikernels as processes
• Isolation evaluation
• Performance evaluation
• Summary
Isolation: definitions and assumptions
• Isolation: no cloud user can read or write another user's state, or modify its execution
• Focus on software deficiencies in the host
  • Code reachable through the interface is a metric for attack surface
• We trust hardware isolation (page tables, etc.)
• We do not consider covert channels, timing channels, or resource starvation
Unikernel architecture
• Unikernel monitor process (e.g., ukvm)
  • Userspace process
  • Uses Linux/KVM (VT-x)
• Setup and loading:
  1. Set up I/O file descriptors
  2. Load the unikernel and set up the virtual CPU context
• Exit handling for I/O
Unikernel isolation comes from the interface
• 10 hypercalls: walltime, puts, poll, blkinfo, blkwrite, blkread, netinfo, netwrite, netread, halt
• 6 are for I/O
  • Network: packet level
  • Storage: block level
• vs. >350 Linux system calls
Observations
• Unikernels are not kernels!
  • No page table management after setup
  • No interrupt handlers: cooperative scheduling and polling
• The ukvm monitor doesn't "do" anything!
  • One-to-one mapping between hypercalls and system calls
• Idea: maintain isolation by limiting the system calls available to the process
Unikernel as process architecture
• Tender: a modified ukvm unikernel monitor
  • Userspace process
  • Uses seccomp to restrict the interface
• Setup and loading:
  1. Set up I/O file descriptors
  2. Load the unikernel
  3. Configure seccomp
• "Exit" handling for I/O
Unikernel isolation comes from the interface
• Direct mapping between the 10 hypercalls and system call/resource pairs:

  Hypercall   System call     Resource
  walltime    clock_gettime
  puts        write           stdout
  poll        ppoll           net_fd
  blkinfo     (none)
  blkwrite    pwrite64        blk_fd
  blkread     pread64         blk_fd
  netinfo     (none)
  netwrite    write           net_fd
  netread     read            net_fd
  halt        exit_group

• 6 are for I/O
  • Network: packet level
  • Storage: block level
• vs. >350 Linux system calls
Implementation: nabla!
• Extended the Solo5 unikernel ecosystem and ukvm
• Prototype supports:
  • MirageOS
  • IncludeOS
  • Rumprun
• https://github.com/solo5/solo5
Measuring isolation: common applications
• Code reachable through the interface is a metric for attack surface
• Used kernel ftrace to count unique kernel functions accessed
• Results (nginx, nginx-large, node-express, redis-get, redis-set):
  • Processes access 5-6x more unique kernel functions than nabla
  • VMs (ukvm) access 2-3x more
[Figure: unique kernel functions accessed per application, for process, ukvm, and nabla]
Measuring isolation: fuzz testing
• Used the trinity system call fuzzer to try to access more of the kernel
• Result: the nabla seccomp policy reduces accessible kernel functions by 98% compared to a "normal" process
[Figure: unique kernel functions reached over time while fuzzing, with and without the nabla policy]
Measuring performance: throughput
• Applications include:
  • Web servers
  • Python benchmarks
  • Redis
  • etc.
• Results: nabla achieves 101%-245% of ukvm's throughput (normalized), both with and without I/O
[Figure: throughput normalized to ukvm for nabla and QEMU/KVM across py_tornado, py_chameleon, node_fib, mirage_HTTP, py_2to3, node_express, nginx_large, redis_get, redis_set, includeos_TCP, nginx, includeos_UDP]
24 Measuring performance: CPU utilization
120 100 • vmexits have an effect on 80 60 instructions per cycle (a) 40 CPU % 20 100 0 80 60
• Experiment with MirageOS (b) 40 20 VMexits/ms web server 0 1.5 • Results: 1 (c) • 12% reduction in cpu 0.5 nabla
IPC (ins/cycle) ukvm 0 utilization over ukvm 0 5000 10000 15000 20000 Requests/sec
Measuring performance: startup time
• Startup time is important for serverless and NFV
• Results:
  • ukvm has 30-370% higher startup latency than nabla
  • Mostly due to nabla avoiding KVM overheads
[Figure: hello-world and HTTP POST startup latency vs. number of cores, for QEMU/KVM, ukvm, nabla, and process]
Summary and Next Steps
• Unikernels should run as processes!
  • Maintain isolation via a thin interface
  • Improve performance, etc.
• Next steps: can unikernels as processes be used to improve container isolation?
  • Nabla containers: https://nabla-containers.github.io/