Lupine: “A Linux in Unikernel Clothing”

Dan Williams (IBM)

With Hsuan-Chi (Austin) Kuo (UIUC + IBM), Ricardo Koller (IBM), and Sibin Mohan (UIUC)

Roadmap

• Context
  • Containers and isolation
  • Unikernels
  • Nabla containers
• Lupine Linux
  • A Linux in unikernel clothing
• Concluding thoughts

Containers are great!

• Have changed how applications are packaged, deployed, and developed
• Normal processes, but “contained”
  • Namespaces, cgroups, chroot
• Lightweight
  • Start quickly, “bare metal”
  • Easy image management (layered fs)
• Tooling/orchestration ecosystem

But…

• Large attack surface to the host
• Limits adoption of container-first architecture
• Fortunately, we know how to reduce attack surface!
[Diagram: containerized apps on a host kernel with namespacing (e.g., Linux); high level of abstraction (e.g., system calls)]

Deprivileging and unsharing kernel functionality

• Virtual machines (VMs)
  • Guest kernel
  • Thin interface
• Userspace kernel
  • Performance issues
[Diagram: a VM runs the app on a guest kernel (e.g., Linux) over a monitor (e.g., QEMU) and a host kernel/hypervisor (e.g., Linux/KVM), exposing a low level of abstraction (virtual hardware); a container runs the app directly on a host kernel with namespacing (e.g., Linux), exposing a high level of abstraction (system calls)]


But wait: aren’t VMs slow and heavyweight?

• Boot time?
• Memory footprint?
• Especially for environments like serverless?!

VMs are becoming lightweight

• Thin monitors
  • e.g., AWS Firecracker
  • Reduce complexity for performance (e.g., no PCI)
[Diagram: app on a guest kernel (e.g., Linux) over a monitor process (e.g., QEMU) and host kernel/hypervisor (e.g., Linux/KVM); low level of abstraction (virtual hardware)]

[Chart: Firecracker boot times as reported in Agache et al., NSDI 2020]

[Chart: VM boot times from Manco et al., SOSP 2017]

VMs are becoming lightweight

• Thin monitors
  • e.g., AWS Firecracker
  • Reduce complexity for performance (e.g., no PCI)
• Thin guests?
  • Userspace: e.g., Ubuntu --> Alpine Linux
  • Kernel configuration: e.g., TinyX, Lupine
  • Unikernels
(KubeCon 2020; EuroSys 2020)

Unikernels are thin guests to the extreme

• An application linked with library OS components
• Runs on a virtual-hardware(-like) abstraction
• Single-CPU VM
• Language-specific
  • MirageOS (OCaml)
  • IncludeOS (C++)
• Legacy-oriented
  • Rumprun (NetBSD-based)
  • HermiTux and OSv (both claim binary compatibility with Linux)

Deprivileging and unsharing kernel functionality

• Virtual machines (VMs)
  • Guest kernel
  • Thin interface
• Userspace kernel
  • e.g., UML
  • Performance issues
• Library OS / unikernel
  • Only what you need
  • Lightweight
[Diagram: VMs (guest kernel, e.g., Linux, on a monitor, e.g., QEMU, over a host kernel/hypervisor, e.g., Linux/KVM) expose a low level of abstraction (virtual hardware); containers (on a host kernel with namespacing, e.g., Linux) expose a high level of abstraction (system calls)]

What we learned from Nabla containers

• Nabla containers are unikernels as processes
• Can achieve or exceed the lightweight characteristics of containers
• Interfaces are what matter, not virtualization
• But we lose a lot: generality
• Lupine Linux: applying unikernel techniques to Linux VMs (EuroSys ’20; this talk)
[Diagram: unikernel-as-process stack (application; libraries and runtimes; library OS with TCP/IP, FS, ...; Solo5; HW), annotated with HotCloud ’16, HotOS ’17, HotCloud ’18, and SoCC ’18]

Lupine Linux Overview and Roadmap

• Introduction
• Lupine Linux: unikernel-like techniques
  • Specialization via Kconfig
  • System call overhead elimination via KML
  • Putting it together
• Evaluation
• Discussion
• Related work
[Diagram: the application (container) supplies an app rootfs and application manifest; Linux source goes through specialization and system call overhead elimination to become the Lupine Linux “unikernel”]

Unikernels are great

• Small kernel size
• Fast boot time
• Performance
• Security

Unikernels are great… but

• Lack full Linux support
  • HermiTux: supports only 97 system calls
  • OSv: applications must be compiled with -PIE and cannot use TLS; statically linked applications, fork() and execve(), and special files such as /proc are not supported; the signal mechanism is incomplete
  • Rumprun: only 37 curated applications
• Community is too small to keep it rolling

Can Linux (as a Lupine Linux “unikernel”) be as small as, boot as fast as, and outperform unikernels?

• Spoiler alert: yes!
  • 4 MB image size
  • 23 ms boot time
  • Up to 33% higher throughput

Lupine Linux Overview and Roadmap


Unikernel technique #1: Specialization

• Unikernels include only what is needed

• Linux is very configurable
  • Kconfig: 16,000 options
  • Drivers, filesystems, processor features, ...

Specializing Linux through configuration

• Start with the Firecracker microVM configuration
• Assuming a unikernel-like workload, we can remove even more:
  • Application-specific options
  • Multiprocessing
  • HW management

Application-specific options

• Example: system calls (Table 1)
• Kernel services (e.g., /proc, sysctl)
• Kernel library
  • Crypto routines
  • Compression routines
• Debugging/information

Table 1. Linux configuration options that enable/disable system calls:
  ADVISE_SYSCALLS: madvise, fadvise64
  AIO: io_setup, io_destroy, io_submit, io_cancel, io_getevents
  BPF_SYSCALL: bpf
  EPOLL: epoll_ctl, epoll_create, epoll_wait, epoll_pwait
  EVENTFD: eventfd, eventfd2
  FANOTIFY: fanotify_init, fanotify_mark
  FHANDLE: open_by_handle_at, name_to_handle_at
  FILE_LOCKING: flock
  FUTEX: futex, set_robust_list, get_robust_list
  INOTIFY_USER: inotify_init, inotify_add_watch, inotify_rm_watch
  SIGNALFD: signalfd, signalfd4
  TIMERFD: timerfd_create, timerfd_gettime, timerfd_settime

For example, a Lupine kernel compiled for redis does not contain the AIO- or EVENTFD-related system calls.
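As a rough illustration (not from the original deck) of how a disabled option surfaces to an application: a kernel built without one of the options in Table 1 simply returns ENOSYS from the corresponding system call. A minimal C probe, assuming a Linux environment with glibc:

/* probe_eventfd.c: hypothetical helper that checks whether the running
 * kernel was built with CONFIG_EVENTFD by looking for ENOSYS. */
#include <errno.h>
#include <stdio.h>
#include <sys/eventfd.h>

int main(void)
{
    int fd = eventfd(0, 0);                 /* thin wrapper around eventfd2() */
    if (fd < 0 && errno == ENOSYS)
        printf("kernel lacks CONFIG_EVENTFD (ENOSYS)\n");
    else if (fd < 0)
        perror("eventfd failed for another reason");
    else
        printf("eventfd available: fd=%d\n", fd);
    return 0;
}

This “function not implemented” pattern is exactly what guided the manual configuration process described in the “How to get an app-specific kernel config” slide.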

Other assumptions from unikernels

• Unikernels are not intended for multiple processes
  • Related to isolating and accounting for processes
  • cgroups, namespaces, SELinux, seccomp, KPTI
  • SMP, NUMA
  • Module support
• Unikernels are not intended for general hardware
  • Intended to run as VMs in the cloud
  • microVM removes many drivers and arch-specific configs
  • Lupine removes more, including power management

How to get an app-specific kernel config

• Start with lupine-base
• Manual trial and error, guided by application output
  • E.g., “the futex facility returned an unexpected error code” => enable CONFIG_FUTEX (see the example fragment below)

• In general, this is a hard problem
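To make the result of that trial-and-error process concrete, here is an illustrative fragment of an application-specific delta in Linux .config syntax, layered on top of lupine-base. It is hypothetical (not taken from the Lupine sources), and the exact set of options differs per application:

# Hypothetical app-specific additions on top of lupine-base (illustrative only)
CONFIG_FUTEX=y        # "the futex facility returned an unexpected error code"
CONFIG_EPOLL=y        # "epoll_create1 failed: function not implemented"
CONFIG_UNIX=y         # "can't create UNIX socket"
CONFIG_SYSVIPC=y      # needed by postgres, despite being multi-process related
# Options the application never exercises stay disabled, e.g.:
# CONFIG_AIO is not set
# CONFIG_EVENTFD is not set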

Lupine Linux Overview and Roadmap

Unikernel technique #2: System call overhead elimination

• Kernel Mode Linux (KML)
  • Non-upstream patch (latest: Linux 4.0)
  • Executes unmodified apps in kernel mode
  • The user program can directly access the kernel

• Replace the “syscall” instruction with “call” in libc (e.g., musl); the kernel entry location is exposed via vsyscall:

  - __asm__ __volatile__ ("syscall" : "=a"(ret) : ...
  + __asm__ __volatile__ ("call *%1" : "=a"(ret) : "r"(__kml),
        "a"(n), "D"(a1), "S"(a2), "d"(a3), "r"(r10), "r"(r8), "r"(r9)
        : "rcx", "r11", "memory");

• Requires a relink for static binaries
• Less invasive than the build modifications unikernels require

Lupine Linux Overview and Roadmap


Putting it all together

[Diagram: the unmodified app binary and its libraries (libc.so, libm.so, ...) sit alongside the Linux kernel source; application-specific requirements (a manifest) drive specialization of the kernel source into an application-specific Lupine config, and the KML patch plus a KML-enabled musl libc provide system call overhead elimination; the output is an application-specific Lupine kernel image]

Remaining issues

• How to build a root filesystem for Linux?
  • Container images are root filesystems already
  • They contain both the application and the necessary libraries

• How to start the (single) application?
  • The Linux kernel parameter “init” specifies the first program, usually “/sbin/init”
  • Boot the kernel with “init=/app”
  • Caveats:
    • May need some simple setup (e.g., network); see the sketch below
    • Application-specific!
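A minimal sketch of what such a startup program could look like (illustrative only, not Lupine’s actual application-specific init script; it assumes the application binary lives at /app and that any network setup is either unnecessary or done elsewhere):

/* tiny_init.c: hypothetical first program for a Lupine guest, booted with
 * init=/tiny_init. It performs minimal setup and then becomes the app. */
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

int main(void)
{
    /* Application-specific setup; here we only mount /proc for apps that want it. */
    if (mount("proc", "/proc", "proc", 0, NULL) != 0)
        perror("mount /proc");            /* non-fatal for many applications */

    /* Replace PID 1 with the single application. */
    char *argv[] = { "/app", NULL };
    execv(argv[0], argv);
    perror("execv /app");                 /* only reached if exec fails */
    return 1;
}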

Putting it all together

[Diagram: a container image (e.g., alpine) supplies the app binary, its libraries (libc.so, libm.so, ...), and metadata (entrypoint, environment variables); combined with an application-specific startup script (“init”), these become the Lupine app “rootfs”. The application-specific requirements (manifest) and the Linux kernel source, after specialization and KML-based system call overhead elimination, yield the application-specific Lupine kernel image]

Lupine Linux Overview and Roadmap

Evaluation setup

• Machine setup
  • CPU: Intel(R) Xeon(R) E3-1270 v6 @ 3.80 GHz
  • Memory: 16 GB
• VM setup
  • Hypervisor: Firecracker
  • 1 VCPU, 512 MB memory
  • Guest: Linux 4.0 with and without KML patches

Configuration Diversity

• Manually determined app-specific configurations
• 20 top apps on Docker Hub (83% of all downloads)
• Only 19 configuration options required to run all 20 applications: lupine-general

Table 3. Top twenty most popular applications on Docker Hub (by billions of downloads) and the number of additional configuration options each requires beyond lupine-base:
  nginx (1.7, web server): 13
  postgres (1.6, database): 10
  httpd (1.4, web server): 13
  node (1.2, language runtime): 5
  redis (1.2, key-value store): 10
  mongo (1.2, NoSQL database): 11
  mysql (1.2, database): 9
  traefik (1.1, edge router): 8
  memcached (0.9, key-value store): 10
  hello-world (0.9, C program “hello”): 0
  mariadb (0.8, database): 13
  golang (0.6, language runtime): 0
  python (0.5, language runtime): 0
  openjdk (0.5, language runtime): 0
  rabbitmq (0.5, message broker): 12
  php (0.4, language runtime): 0
  wordpress (0.4, PHP/mysql blog tool): 9
  haproxy (0.4, load balancer): 8
  influxdb (0.3, time series database): 11
  elasticsearch (0.3, search engine): 12

[Figure 5: Growth of Lupine application configurations needed to support more apps; the union of required options flattens as more applications are considered]

Kernel image size

[Figure 6: Image size for hello world (megabytes): microVM, lupine, lupine-general, hermitux, osv, rump]

• Configuration is effective
  • 4 MB kernel image
  • 27% (hello) to 33% of microVM’s image size
  • Even lupine-general produces smaller images than Rump and OSv

Boot time

• Measured via an I/O port write from the guest (sketch below)
• OSv boot time depends heavily on filesystem choice
• Lupine boot times reported without KML*
• Even lupine-general boots faster than HermiTux and OSv

*KML is currently incompatible with CONFIG_PARAVIRT

[Figure 7: Boot time for hello world (milliseconds): microVM, lupine-nokml, lupine-nokml-general, hermitux, osv-rofs, osv-zfs, rump]
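As a rough sketch (not from the deck) of how a guest can signal boot completion to the monitor through an I/O port write, the snippet below writes one byte to a port. The port number and value are placeholders; the actual magic values are defined by the monitor (e.g., Firecracker):

/* boot_signal.c: illustrative guest-side boot-complete signal via an x86 I/O
 * port write (requires root inside the guest). PORT/VALUE are placeholders,
 * not real Firecracker constants. */
#include <stdio.h>
#include <sys/io.h>

#define BOOT_DONE_PORT  0x3f0   /* placeholder: monitor-defined magic port  */
#define BOOT_DONE_VALUE 123     /* placeholder: monitor-defined magic value */

int main(void)
{
    if (ioperm(BOOT_DONE_PORT, 1, 1) != 0) {   /* request access to the port */
        perror("ioperm");
        return 1;
    }
    outb(BOOT_DONE_VALUE, BOOT_DONE_PORT);     /* monitor timestamps this write */
    return 0;
}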

• Repeatedly tested app with decreasing 50 memory allotment 40 hello nginx redis • Choice of apps limited 30 20

by unikernels MegaBytes 10 0 microVM lupine lupine-generalhermitux osv rump

Figure 8. Memory footprint. Figure 10. Relationship of KML syscall latency improve- ment to busying-waiting iterations (the more busy-waiting iterations the less frequent of user-kernel mode switching). 46

4.5 System call latency microbenchmark Unikernels claim low system call latency due to the fact that the application is directly linked with the library OS. Using Lupine, a Linux system, can achieve similar system call la- tency as other POSIX-like unikernel approaches. Figure 9 shows the lmbench system call latency benchmark for the various systems. The results show that Lupine is competitive with unikernel approaches for the null (getppid), read and write tests in lmbench. OSv shows the e￿ects of implemen- tation choices as getppid (issued by the null system call test) is hardcoded to always return 0 without any indirection. Figure 9. System call latency via lmbench. Read of /dev/zero is unsupported and write to /dev/null is almost as expensive as the microVM case. Experimentation with lupine-nokml shows that both spe- cialization and system call overhead elimination play a role. Specialization contributes up to 56% improvement (achieved during the write test) over microVM. However, we found no the memory footprint by repeatedly testing the unikernel di￿erences in system call latency between the application- with a decreasing memory parameter passed to the monitor. speci￿c and general variants (lupine-general) of Lupine. KML Our choice of applications was severely limited by what the provides Lupine an additional 40% (achieved during the null (non-Lupine) unikernels could run without modi￿cation; we test) improvement in system call latency over lupine-nokml. only present the memory footprint for three applications To better understand the potential performance improve- as shown in Figure 8. Unfortunately, HermiTux cannot run ments of KML on Lupine, we designed a microbenchmark nginx, so we omit that bar. in which we issued the null (getppid) system call latency Figure 8 shows the memory footprint for each applica- test in a loop, while inserting a con￿gurable amount of CPU tion. In both application-speci￿c and general cases, Lupine work via another tight loop to control the frequency of the achieves a comparable memory footprint that is even smaller switching between user and kernel mode: the frequency of than unikernel approaches for redis. This is due in part to switching decreases as the number of iterations increases. lazy allocation. While each of the unikernels shows variation In an extreme case where the application calls the system in memory footprint, the Linux-based approaches (microVM call without doing anything else (0 iterations) KML provides and Lupine) do not.10 There is no variation because the Linux a 40% performance improvement. However, Figure 10 shows kernel binary (the ￿rst binary to be loaded) is more or less how quickly the KML bene￿ts are amortized away: with the same size across applications. The binary size of the ap- only 160 iterations between the issued system calls the orig- plication is irrelevant if much of it is loaded lazily and even a inal 40% improvement in latency drops below 5%. We ￿nd large application-level allocation like the one made by redis similarly low KML bene￿ts for real-world applications in may not be populated until later. However, an argument can Section 4.6. be made in favor of eliminating laziness and upfront knowl- 10OSv is similar to Linux in this case in that it loads the application dynam- edge of whether su￿cient resources will be available for an ically, which is why nginx and hello exhibit the same memory footprint; application. 
We further discuss this issue in the context of we believe redis exhibits a larger memory footprint because of how the immutable infrastructure in Section 6. OSv memory allocator works. 9 Memory Footprint

• Repeatedly tested app with decreasing 50 memory allotment 40 hello nginx redis • Choice of apps limited 30 20

by unikernels MegaBytes 10 0 • No variation in lupine: microVM lupine lupine-generalhermitux osv rump lazy loading makes binary size irrelevant Figure 8. Memory footprint. Figure 10. Relationship of KML syscall latency improve- ment to busying-waiting iterations (the more busy-waiting iterations the less frequent of user-kernel mode switching). 47

4.5 System call latency microbenchmark Unikernels claim low system call latency due to the fact that the application is directly linked with the library OS. Using Lupine, a Linux system, can achieve similar system call la- tency as other POSIX-like unikernel approaches. Figure 9 shows the lmbench system call latency benchmark for the various systems. The results show that Lupine is competitive with unikernel approaches for the null (getppid), read and write tests in lmbench. OSv shows the e￿ects of implemen- tation choices as getppid (issued by the null system call test) is hardcoded to always return 0 without any indirection. Figure 9. System call latency via lmbench. Read of /dev/zero is unsupported and write to /dev/null is almost as expensive as the microVM case. Experimentation with lupine-nokml shows that both spe- cialization and system call overhead elimination play a role. Specialization contributes up to 56% improvement (achieved during the write test) over microVM. However, we found no the memory footprint by repeatedly testing the unikernel di￿erences in system call latency between the application- with a decreasing memory parameter passed to the monitor. speci￿c and general variants (lupine-general) of Lupine. KML Our choice of applications was severely limited by what the provides Lupine an additional 40% (achieved during the null (non-Lupine) unikernels could run without modi￿cation; we test) improvement in system call latency over lupine-nokml. only present the memory footprint for three applications To better understand the potential performance improve- as shown in Figure 8. Unfortunately, HermiTux cannot run ments of KML on Lupine, we designed a microbenchmark nginx, so we omit that bar. in which we issued the null (getppid) system call latency Figure 8 shows the memory footprint for each applica- test in a loop, while inserting a con￿gurable amount of CPU tion. In both application-speci￿c and general cases, Lupine work via another tight loop to control the frequency of the achieves a comparable memory footprint that is even smaller switching between user and kernel mode: the frequency of than unikernel approaches for redis. This is due in part to switching decreases as the number of iterations increases. lazy allocation. While each of the unikernels shows variation In an extreme case where the application calls the system in memory footprint, the Linux-based approaches (microVM call without doing anything else (0 iterations) KML provides and Lupine) do not.10 There is no variation because the Linux a 40% performance improvement. However, Figure 10 shows kernel binary (the ￿rst binary to be loaded) is more or less how quickly the KML bene￿ts are amortized away: with the same size across applications. The binary size of the ap- only 160 iterations between the issued system calls the orig- plication is irrelevant if much of it is loaded lazily and even a inal 40% improvement in latency drops below 5%. We ￿nd large application-level allocation like the one made by redis similarly low KML bene￿ts for real-world applications in may not be populated until later. However, an argument can Section 4.6. be made in favor of eliminating laziness and upfront knowl- 10OSv is similar to Linux in this case in that it loads the application dynam- edge of whether su￿cient resources will be available for an ically, which is why nginx and hello exhibit the same memory footprint; application. 
We further discuss this issue in the context of we believe redis exhibits a larger memory footprint because of how the immutable infrastructure in Section 6. OSv memory allocator works. 9 Figure 8. Memory footprint. Figure 10. Relationship of KML syscall latency improve- ment to busying-waiting iterations (the more busy-waiting System call latency microbenchmark iterations the less frequent of user-kernel mode switching).

• Lmbench

[Slide embeds excerpts from the paper: the §4.4–4.5 text, Figure 8 (memory footprint), Figure 9 (lmbench null/read/write system call latency for microVM, lupine-nokml, lupine, lupine-general, hermitux, osv and rump) and Figure 10 (KML latency improvement vs. busy-waiting iterations)]

48 System call latency microbenchmark

• Lmbench
• 56% improvement over microVM from specialization

[Slide embeds the same paper excerpts: Figure 9 (lmbench system call latency), Figure 8 (memory footprint) and Figure 10]

49 System call latency microbenchmark

• Lmbench
• 56% improvement over microVM from specialization
• Additional 40% from KML
• KML benefit vanishes quickly in more realistic workloads

[Slide embeds the same paper excerpts; Figure 10 shows the 40% KML improvement at 0 busy-waiting iterations dropping below 5% by roughly 160 iterations between system calls — the more busy-waiting iterations, the less frequent the user/kernel mode switching]

50
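The microbenchmark behind the 40%-to-under-5% numbers issues the null (getppid) system call in a loop while a configurable busy-wait loop spaces out kernel entries, so the user/kernel switching frequency drops as the busy-wait count grows. The sketch below is a minimal, hypothetical harness in that spirit; the call count, iteration values and timing code are illustrative assumptions, not the paper's actual benchmark.

```c
/* Minimal sketch: time the "null" (getppid) system call while a busy-wait
 * loop controls how often we cross the user/kernel boundary. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

static void busy_wait(volatile uint64_t iters)
{
    while (iters--)          /* volatile parameter keeps the loop alive */
        ;
}

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

int main(void)
{
    const uint64_t calls = 1000000;                 /* syscalls per setting */
    const uint64_t work[] = { 0, 20, 40, 80, 160 }; /* busy-wait iterations */

    for (size_t i = 0; i < sizeof(work) / sizeof(work[0]); i++) {
        uint64_t start = now_ns();
        for (uint64_t n = 0; n < calls; n++) {
            syscall(SYS_getppid);                   /* the null system call */
            busy_wait(work[i]);
        }
        printf("%3llu busy-wait iters: %.1f ns per (syscall + wait)\n",
               (unsigned long long)work[i],
               (double)(now_ns() - start) / calls);
    }
    return 0;
}
```

Comparing the 0-iteration case on KML and non-KML Lupine kernels isolates the syscall-overhead savings; increasing the busy-wait count shows how quickly that saving is amortized away.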

Application performance

• Throughput normalized to microVM
• Application choice limited by unikernels
• Lupine outperforms microVM by up to 33%
• Linux implementation is highly optimized

Table 4. Application performance normalized to microVM:

Name               redis-get  redis-set  nginx-conn  nginx-sess
microVM            1.00       1.00       1.00        1.00
lupine-general     1.19       1.20       1.29        1.15
lupine             1.21       1.22       1.33        1.14
lupine-tiny        1.15       1.16       1.23        1.11
lupine-nokml       1.20       1.21       1.29        1.16
lupine-nokml-tiny  1.13       1.13       1.21        1.12
hermitux           0.66       0.67       -           -
osv                0.87       0.53       -           -
rump               0.99       0.99       1.25        0.53

51 Takeaways

• Specialization is important:
  • 73% smaller image size, 59% faster boot time, 28% lower memory footprint and 33% higher throughput than the state-of-the-art VM
• Specialization per application may not be:
  • 19 options (lupine-general) cover at least 83% of downloaded apps with at most 4% reduction in performance
• System call overhead elimination may not be:
  • only 4% improvement for macrobenchmarks, unlike 40% for microbenchmarks

• Lupine avoids common unikernel pitfalls: support for unmodified Linux applications and a highly optimized implementation

52 Lupine is still Linux

• Graceful degradation of unikernel properties

• Fork crashes unikernels, not Lupine
• Virtually no overhead to support multiple address spaces
  • Especially for control processes (see the sketch below)
• At worst 8% overhead to support multiple processors
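A common shape for the "control process" case is a small launcher that sets up the environment, forks, execs the real workload, and then merely waits on it. The hypothetical C sketch below illustrates that pattern; the binary path and environment variable are made-up placeholders. On a unikernel with a stubbed-out fork such a wrapper typically crashes or misbehaves, whereas Lupine, being Linux, runs it unchanged with essentially no cost to the main application.

```c
/* Hypothetical launcher: the kind of control process that commonly wraps a
 * service (set environment, fork, exec, wait). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* Placeholder configuration; a real launcher script does the same. */
    setenv("APP_CONFIG", "/etc/app/config.yaml", 1);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* Child: become the actual application. */
        execl("/usr/bin/app-server", "app-server", (char *)NULL);
        perror("execl");       /* only reached if exec fails */
        _exit(127);
    }

    /* Parent: linger as a lightweight control process. */
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}
```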

Unachieved unikernel benefits

• Language-based unikernel benefits
  • Powerful static analysis / whole-system optimization

• Some unikernels (e.g., Solo5-based) have been proven to run on a thinner unikernel monitor interface
  • Potentially better security, debugging opportunities, unikernel as process, etc.
  • Linux does not (yet)

Related work

• Unikernel-like work that leverages Linux
  • LightVM (TinyX): VMs can be as light as containers
  • X-Containers: Xen paravirt for Linux to be a libOS
  • UKL: modify Linux build to include kernel call to application main

• Linux configuration studies
  • Alharthi et al.: 89% of 1530 studied vulnerabilities nullified via config specialization
  • Kurmus et al.: 50-85% attack surface reduction via configuration

Getting Lupine benefits into community

• Most benefits are achieved through specialized config
• But lupine-general.config can run the top 20 Docker containers

• Challenges/risks
  • How do we know lupine-general is general enough?
    • Research needed: discovery vs. failover vs. ?
  • Tension with container ecosystem (kata agent → more general kernel config)
    • Research needed: kernel configuration-aware design?

Lupine Conclusion

• Unikernels and library OSes seem attractive
• But trying to achieve generality/POSIX in unikernels is not worth it

• Linux can already behave like a unikernel!
  • Specialization via configuration
  • Can maintain the Linux community and the engineering effort of the past three decades

• Can we apply these techniques to virtualization-enabled containers?

Thank you!

• https://github.com/hckuo/Lupine-Linux
• https://nabla-containers.github.io/

[email protected]

58