“A Linux in Unikernel Clothing”
Lupine Linux
“A Linux in Unikernel Clothing”
Dan Williams (IBM)
With Hsuan-Chi (Austin) Kuo (UIUC + IBM), Ricardo Koller (IBM), Sibin Mohan (UIUC)

Roadmap
• Context
• Containers and isolation
• Unikernels
• Nabla containers
• Lupine Linux
• A Linux in unikernel clothing
• Concluding thoughts

Containers are great!
• Have changed how applications are packaged, deployed and developed
• Normal processes, but “contained”
• Namespaces, cgroups, chroot
• Lightweight
• Start quickly, “bare metal”
• Easy image management (layered fs)
• Tooling/orchestration ecosystem

But…
• Large attack surface to the host
• Limits adoption of container-first architecture
• Fortunately, we know how to reduce attack surface!
[Figure: Containers. The app reaches a host kernel with namespacing (e.g., Linux) through a high level of abstraction (e.g., system calls).]

Deprivileging and unsharing kernel functionality
• Virtual machines (VMs)
• Guest kernel
• Thin interface
• Userspace kernel
• Performance issues
[Figure: VMs and containers side by side. VMs: app on a guest kernel (e.g., Linux), over a monitor process (e.g., QEMU) and a host kernel/hypervisor (e.g., Linux/KVM), with a low level of abstraction (e.g., virtual hardware). Containers: app on a host kernel with namespacing (e.g., Linux), with a high level of abstraction (e.g., system calls).]

But wait? Aren't VMs slow and heavyweight?
• Boot time?
• Memory footprint?
• Especially for environments like serverless??!!

VMs are becoming lightweight
[Figure: VM stack. App on a guest kernel (e.g., Linux), over a monitor process (e.g., QEMU) and a host kernel/hypervisor (e.g., Linux/KVM), with a low level of abstraction (e.g., virtual hardware).]
[Figures: Firecracker boot times as reported in Agache et al., NSDI 2020; Manco et al., SOSP 2017]
• Thin monitors
• e.g., AWS Firecracker
• Reduce complexity for performance (e.g., no PCI)
• Thin guests?
• Userspace (e.g., Ubuntu --> Alpine Linux)
• Kernel configuration (e.g., TinyX, Lupine)
• Unikernels
[Figure labels: KubeCon 2020, Eurosys 2020]

Unikernels are thin guests to the extreme
• An application linked with library OS components
• Run on virtual hardware (like) abstraction
• Single CPU VM
• Language-specific
• MirageOS (OCaml)
• IncludeOS (C++)
• Legacy-oriented
• Rumprun (NetBSD-based)
• Hermitux
• OSv
[Annotation next to Hermitux and OSv: claim binary compatibility with Linux]

Deprivileging and unsharing kernel functionality
• Virtual machines (VMs)
• Guest kernel
• Thin interface
• Userspace kernel
• E.g., UML
• Performance issues
• Library OS / unikernel
• Only-what-you-need
• Lightweight

What we learned from Nabla containers
• Nabla containers are unikernels as processes
• Can achieve or exceed lightweight characteristics of containers
• Interfaces are what matter, not virtualization HW
• But we lose a lot: generality
• Lupine Linux: applying unikernel techniques to Linux VMs (Eurosys ’20, this talk)
[Citations on slide: HotCloud ’16, HotOS ’17; HotCloud ’18, SOCC ’18]
[Figure: Unikernel as Process. Labels: Application; libraries, runtimes; Library OS (TCP/IP, FS, …); Solo5.]

Lupine Linux Overview and Roadmap
• Introduction
• Lupine Linux unikernel-like techniques
• Specialization
• System Call Overhead Elimination
• Putting it together
• Evaluation
• Discussion
• Related Work
[Figure labels: Application (container); App rootfs; Application manifest; Specialization via Kconfig; System Call Overhead Elimination via KML; Linux source; Lupine Linux “Unikernel”]

Unikernels are great
• Small kernel size
• Fast boot time
• Performance
• Security

Unikernels are great… But
• Lack full Linux support
• Hermitux: supports only 97 system calls
• OSv:
• Application needs to be compiled with -PIE, can’t use TLS
• Static-linked applications are not supported
• fork(), execve() are not supported
• Special files such as /proc are not supported
• Signal mechanism is not complete
• Rumprun: only 37 curated applications
• Community is too small to keep it rolling

Can Linux be as small as, boot as fast as, and outperform unikernels?
[Figure label: Lupine Linux “Unikernel”]
• Spoiler alert: Yes!
• 4MB image size
• 23 ms boot time
• Up to 33% higher throughput

Unikernel technique #1: Specialization
• Unikernels include only what is needed
• Linux is very configurable
• Kconfig
• 16,000 options
• Drivers
• Filesystems
• Processor features
• ...

Specializing Linux through configuration
• Start with Firecracker microvm configuration
• Assuming unikernel-like workload, can remove even more!
• Application-specific options
• Multiprocessing
• HW management

Application-specific options
• Example: system calls
• Kernel services
• e.g., /proc, sysctl
• Kernel library
• Crypto routines
• Compression routines
• Debugging/information

A Lupine kernel compiled for redis does not contain the AIO or EVENTFD-related system calls. In addition to the above, some applications expect other services from the kernel, for instance, the /proc filesystem or sysctl functionality. Moreover, the Linux kernel maintains a substantial library that resides in the kernel because of its traditional position as a more privileged security domain. Unikernels do not maintain the traditional privilege separa-

Furthermore, the kernel is usually run in a separate, more privileged security domain than the application. As such, the kernel contains enhanced access control systems such as SELinux and functionality to guard the crossing from the application domain to the kernel domain, such as seccomp filters, all of which are unnecessary for unikernels. More importantly, security options with a severe impact on performance are also unnecessary for this reason. For example, KPTI (kernel page table isolation [9]) forbids the mapping of kernel pages into processes’ page table to mitigate the Meltdown [39] vulnerability. This dramatically affects system call performance; when testing with KPTI on Linux 5.0 we measured a 10x slowdown in system call latency. In total, we eliminated 12 configuration options due to the single security domain.

Linux is well equipped to run on multiple-processor systems. As a result, the kernel contains various options to include and tune SMP and NUMA functionality. On the other hand, since most unikernels do not support fork, the standard approach to take advantage of multiple processors is to run multiple unikernels.

Finally, Linux contains facilities for dynamically loading functionality through modules. A single application facilitates

Table 1. Linux configuration options that enable/disable system calls.
Option            Enabled System Call(s)
ADVISE_SYSCALLS   madvise, fadvise64
AIO               io_setup, io_destroy, io_submit, io_cancel, io_getevents
BPF_SYSCALL       bpf
EPOLL             epoll_ctl, epoll_create, epoll_wait, epoll_pwait
EVENTFD           eventfd, eventfd2
FANOTIFY          fanotify_init, fanotify_mark
FHANDLE           open_by_handle_at, name_to_handle_at
FILE_LOCKING      flock
FUTEX             futex, set_robust_list, get_robust_list
INOTIFY_USER      inotify_init, inotify_add_watch, inotify_rm_watch
SIGNALFD          signalfd, signalfd4
TIMERFD           timerfd_create, timerfd_gettime, timerfd_settime
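
To make the idea behind Table 1 concrete, here is a minimal Python sketch of how an application's list of needed system calls could be turned into a Kconfig fragment. It is only an illustration under stated assumptions, not the Lupine tooling: the function names (select_options, config_fragment) and the example syscall list are hypothetical, and only the option-to-syscall mapping is taken from Table 1.

```python
# Mapping copied from Table 1: Kconfig option -> system calls it enables.
SYSCALL_OPTIONS = {
    "ADVISE_SYSCALLS": {"madvise", "fadvise64"},
    "AIO": {"io_setup", "io_destroy", "io_submit", "io_cancel", "io_getevents"},
    "BPF_SYSCALL": {"bpf"},
    "EPOLL": {"epoll_ctl", "epoll_create", "epoll_wait", "epoll_pwait"},
    "EVENTFD": {"eventfd", "eventfd2"},
    "FANOTIFY": {"fanotify_init", "fanotify_mark"},
    "FHANDLE": {"open_by_handle_at", "name_to_handle_at"},
    "FILE_LOCKING": {"flock"},
    "FUTEX": {"futex", "set_robust_list", "get_robust_list"},
    "INOTIFY_USER": {"inotify_init", "inotify_add_watch", "inotify_rm_watch"},
    "SIGNALFD": {"signalfd", "signalfd4"},
    "TIMERFD": {"timerfd_create", "timerfd_gettime", "timerfd_settime"},
}


def select_options(needed_syscalls):
    """Split the Table 1 options into (enabled, disabled) for one application.

    An option is enabled if the application needs at least one of the
    system calls that the option provides. (Hypothetical helper.)
    """
    needed = set(needed_syscalls)
    enabled = sorted(opt for opt, calls in SYSCALL_OPTIONS.items() if calls & needed)
    disabled = sorted(opt for opt in SYSCALL_OPTIONS if opt not in enabled)
    return enabled, disabled


def config_fragment(needed_syscalls):
    """Render the selection in the usual .config style. (Hypothetical helper.)"""
    enabled, disabled = select_options(needed_syscalls)
    lines = [f"CONFIG_{opt}=y" for opt in enabled]
    lines += [f"# CONFIG_{opt} is not set" for opt in disabled]
    return "\n".join(lines)


# Assumed, made-up syscall list for illustration only; it is not a measured
# profile of redis or of any other real application.
EXAMPLE_SYSCALLS = ["epoll_wait", "epoll_ctl", "futex", "madvise"]

if __name__ == "__main__":
    print(config_fragment(EXAMPLE_SYSCALLS))
```

With the made-up list above, the AIO and EVENTFD options end up in the "is not set" group, which mirrors the kind of application-specific trimming the slide describes for redis.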