Unikernels as Processes

Dan Williams, Ricardo Koller (IBM Research)
Martin Lucina (robur.io / Center for the Cultivation of Technology)
Nikhil Prakash (BITS Pilani)

What is a unikernel?

• An application linked with library OS components
• Runs on a virtual hardware abstraction (like a VM)

• Language-specific
  • MirageOS (OCaml)
  • IncludeOS (C++)
• Legacy-oriented
  • Rumprun (NetBSD-based)
  • Can run nginx, redis, node.js, Python, etc.

Why unikernels?

• Lightweight
  • Only what the application needs
• Isolated
  • VM-isolation is the “gold standard”
• Well suited for the cloud
  • Microservices
  • Serverless
  • NFV

Virtualization is a mixed bag

• Good for isolation, but…

• Tooling for VMs is not designed to be lightweight (e.g., LightVM)
• How do you debug black-box VMs?
• Poor VM performance due to vmexits
• Deployment issues on already-virtualized infrastructure

Why not run unikernels as processes?

• Unikernels are a single process anyway!

• Many benefits as a process
  • Better performance
  • Common tooling (gdb, perf, etc.)
  • ASLR
  • Memory sharing
  • Architecture independence
• Isolation by limiting the process interface to the host
  • 98% reduction in accessible kernel functions

Outline

• Introduction
• Where does unikernel isolation come from?
• Unikernels as processes
• Isolation evaluation
• Performance evaluation
• Summary

Isolation: definitions and assumptions

• Isolation: no cloud user can read/write the state of another user or modify its execution

• Focus on software deficiencies in the host
  • Code reachable through the interface is a metric for attack surface

• We trust HW isolation (page tables, etc.)

• We do not consider covert channels, timing channels or resource starvation

Unikernel architecture

• ukvm: a unikernel monitor
  • Userspace process
  • Uses Linux/KVM (VT-x)
• Setup and loading:
  1. Set up I/O file descriptors
  2. Load the unikernel into a virtual CPU context
• Exit handling: the monitor performs I/O on the unikernel's behalf

[Figure: the unikernel runs as a KVM guest above the monitor process (e.g., ukvm), which sits on Linux/KVM and VT-x and talks to the I/O devices]
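
To make the exit-handling role concrete, the following is a minimal sketch of a ukvm-style monitor loop, assuming the VM, vCPU, guest memory, and I/O fds have already been set up (steps 1-2). The hypercall port number and handler name are illustrative, not the exact ukvm/Solo5 ABI.

    /* Sketch: the heart of a ukvm-style monitor's exit handling. */
    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define HC_PORT_NETWRITE 0x504           /* hypothetical hypercall port */

    void handle_netwrite(void *args);        /* hypothetical handler: sends the packet */

    void monitor_loop(int vcpufd, struct kvm_run *run, uint8_t *guest_mem)
    {
        for (;;) {
            if (ioctl(vcpufd, KVM_RUN, 0) < 0) {     /* run the guest until it exits */
                perror("KVM_RUN");
                exit(1);
            }
            switch (run->exit_reason) {
            case KVM_EXIT_IO:                        /* guest executed an out instruction */
                if (run->io.direction == KVM_EXIT_IO_OUT &&
                    run->io.port == HC_PORT_NETWRITE) {
                    /* The value written is the guest-physical address of the
                     * hypercall argument struct; resolve it in guest memory. */
                    uint32_t gpa = *(uint32_t *)((uint8_t *)run + run->io.data_offset);
                    handle_netwrite(guest_mem + gpa);
                }
                break;
            case KVM_EXIT_HLT:                       /* halt hypercall: unikernel is done */
                return;
            default:
                fprintf(stderr, "unhandled exit: %u\n", run->exit_reason);
                exit(1);
            }
        }
    }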

Unikernel isolation comes from the interface

• 10 hypercalls:
  walltime, puts, poll, blkinfo, blkread, blkwrite, netinfo, netread, netwrite, halt
  • 6 are for I/O
    • Network: packet level
    • Storage: block level
• vs. >350 syscalls
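
On the guest side, each hypercall is little more than an argument struct handed to the monitor. Below is a sketch of how an x86 ukvm-style unikernel might issue netwrite; the port number and struct layout are illustrative, not the exact Solo5 ABI.

    /* Sketch: guest-side hypercall stub. Writing the argument struct's address
     * to an I/O port causes a vmexit that the monitor handles (previous slide). */
    #include <stdint.h>
    #include <stddef.h>

    #define HC_PORT_NETWRITE 0x504              /* hypothetical port for netwrite */

    struct hc_netwrite {
        const void *data;                       /* packet buffer */
        size_t      len;                        /* packet length in bytes */
        int         ret;                        /* filled in by the monitor */
    };

    static inline void hypercall(uint16_t port, void *arg)
    {
        __asm__ __volatile__("outl %0, %1"
                             : : "a"((uint32_t)(uintptr_t)arg), "Nd"(port));
    }

    static int netwrite(const void *data, size_t len)
    {
        struct hc_netwrite hc = { .data = data, .len = len, .ret = -1 };
        hypercall(HC_PORT_NETWRITE, &hc);       /* monitor sends the packet */
        return hc.ret;
    }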

Observations

• Unikernels are not kernels!
  • No page table management after setup
  • No interrupt handlers: cooperative scheduling and poll

• The ukvm monitor doesn’t “do” anything!
  • One-to-one mapping between hypercalls and system calls

• Idea: maintain isolation by limiting the syscalls available to the process

Unikernel as process architecture

• Tender: a modified ukvm unikernel monitor
  • Userspace process
  • Uses seccomp to restrict its interface to the host
• Setup and loading:
  1. Set up I/O file descriptors
  2. Load the unikernel
  3. Configure seccomp
• “Exit” handling: hypercalls become calls into the tender, which performs I/O through the pre-opened fds

[Figure: the unikernel and the tender run in the same process directly on Linux; no KVM/VT-x layer is involved]
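
A minimal sketch of step 3, assuming the tender has already opened net_fd and blk_fd: it uses libseccomp for readability, while the real tender may install an equivalent BPF filter directly. The whitelist follows the hypercall-to-syscall mapping on the next slide.

    /* Sketch: restrict the tender (and the unikernel it hosts) to the handful
     * of syscalls the hypercall interface needs, pinned to pre-opened fds. */
    #include <seccomp.h>
    #include <stdlib.h>
    #include <unistd.h>

    void restrict_tender(int net_fd, int blk_fd)
    {
        /* Default action: kill the process on any syscall not whitelisted. */
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
        if (ctx == NULL)
            exit(1);

        /* walltime -> clock_gettime, halt -> exit_group: no argument checks */
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(clock_gettime), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);

        /* puts -> write on stdout, netwrite -> write on net_fd only */
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 1,
                         SCMP_A0(SCMP_CMP_EQ, STDOUT_FILENO));
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 1,
                         SCMP_A0(SCMP_CMP_EQ, net_fd));

        /* poll -> ppoll, netread -> read restricted to net_fd */
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(ppoll), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 1,
                         SCMP_A0(SCMP_CMP_EQ, net_fd));

        /* blkread/blkwrite -> pread64/pwrite64 restricted to blk_fd */
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(pread64), 1,
                         SCMP_A0(SCMP_CMP_EQ, blk_fd));
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(pwrite64), 1,
                         SCMP_A0(SCMP_CMP_EQ, blk_fd));

        if (seccomp_load(ctx) < 0)              /* install the filter */
            exit(1);
        seccomp_release(ctx);
    }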

Unikernel isolation comes from the interface

• Direct mapping between the 10 hypercalls and system call/resource pairs
  • 6 are for I/O
    • Network: packet level
    • Storage: block level
• vs. >350 syscalls

  Hypercall   System call     Resource
  walltime    clock_gettime
  puts        write           stdout
  poll        ppoll           net_fd
  blkinfo     (none)
  blkwrite    pwrite64        blk_fd
  blkread     pread64         blk_fd
  netinfo     (none)
  netwrite    write           net_fd
  netread     read            net_fd
  halt        exit_group
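
As a concrete example of the one-to-one mapping, here is a sketch of how the netwrite hypercall might be handled inside the tender; struct and function names are illustrative rather than the exact Solo5/nabla code. Because the unikernel and the tender share an address space, the guest's buffer pointer can be used directly, and the hypercall costs one function call plus one write on the pre-opened net_fd.

    /* Sketch: netwrite hypercall handled as a plain function call in the tender. */
    #include <stddef.h>
    #include <unistd.h>

    struct hc_netwrite {
        const void *data;                   /* guest buffer holding one packet */
        size_t      len;                    /* packet length in bytes */
        int         ret;                    /* result reported back to the unikernel */
    };

    static int net_fd;                      /* TAP fd opened by the tender during setup */

    void handle_netwrite(struct hc_netwrite *hc)
    {
        /* netwrite -> write(net_fd, ...) per the mapping above; seccomp only
         * allows write on this fd (and stdout), so a compromised unikernel
         * cannot redirect it elsewhere. */
        ssize_t n = write(net_fd, hc->data, hc->len);
        hc->ret = (n == (ssize_t)hc->len) ? 0 : -1;
    }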

Implementation: nabla

• Extended the Solo5 unikernel ecosystem and ukvm
• Prototype supports:
  • MirageOS
  • IncludeOS
  • Rumprun

• https://github.com/solo5/solo5

Measuring isolation: common applications

• Code reachable through the interface is a metric for attack surface
• Used kernel ftrace to count unique kernel functions accessed
• Results:
  • Processes: 5-6x more unique kernel functions accessed than nabla
  • VMs: 2-3x more

[Figure: unique kernel functions accessed by process, ukvm, and nabla for nginx, nginx-large, node-express, redis-get, and redis-set]

Measuring isolation: fuzz testing

• Used kernel ftrace
• Used the trinity system call fuzzer to try to access more of the kernel
• Results:
  • The nabla policy reduces unique kernel functions accessed by 98% over a “normal” process

[Figure: unique kernel functions accessed during fuzzing for a normal process vs. under the nabla seccomp policy]

23 Measuring performance: throughput

245 • Applications include: 200% ukvm 180% • Web servers nabla QEMU/KVM • Python benchmarks 160% • Redis 140% no I/O with I/O • etc. 120% 100% Normalized throughput

80% py_tornado py_chameleon node_fib mirage_HTTP py_2to3 node_express nginx_large redis_get redis_set includeos_TCP nginx includeos_UDP • Results: • 101%-245% higher throughput than ukvm

Measuring performance: CPU utilization

• vmexits have an effect on instructions per cycle (IPC)
• Experiment with a MirageOS web server
• Results:
  • 12% reduction in CPU utilization over ukvm

[Figure: (a) CPU %, (b) vmexits/ms, and (c) IPC (instructions/cycle) vs. requests/sec for nabla and ukvm]

Measuring performance: startup time

• Startup time is important for serverless, NFV
• Results:
  • ukvm has 30-370% higher startup latency than nabla
  • Mostly due to nabla avoiding KVM overheads

[Figure: “hello world” and HTTP POST startup latency vs. number of cores for QEMU/KVM, ukvm, nabla, and process]

Summary and Next Steps

• Unikernels should run as processes!
  • Maintain isolation via a thin interface
  • Improve performance, etc.

• Next steps: can unikernels as processes be used to improve container isolation?
  • Nabla containers
  • https://nabla-containers.github.io/
