Study of Firecracker MicroVM

Madhur Jain
Khoury College of Computer Sciences
Northeastern University
Boston, MA
[email protected]

Abstract—Firecracker is a virtualization technology that makes use of the Kernel-based Virtual Machine (KVM). Firecracker belongs to a new virtualization class named micro-virtual machines (MicroVMs). Using Firecracker, we can launch lightweight MicroVMs in non-virtualized environments in a fraction of a second, at the same time offering the security and workload isolation provided by traditional VMs and the resource efficiency that comes with containers [1]. Firecracker aims to provide a slimmed-down MicroVM, comprising approximately 50K lines of Rust code, with a reduced attack surface for guest VMs. This report examines the internals of Firecracker and why it may be the next big thing in virtualization and cloud computing.

Index Terms—firecracker, microvm, Rust, VMM, QEMU, KVM

I. INTRODUCTION

Firecracker is a new open-source Virtual Machine Monitor (VMM) developed by AWS to handle serverless workloads; it has been running in production for AWS Lambda and AWS Fargate since 2018. Serverless workloads require isolation and security while also demanding container-like benefits such as fast boot times, and a lot of research has been done on the cold-start and warm-start behavior of VM instances for serverless workloads [17]. The idea behind Firecracker was to make use of the KVM module loaded in the host kernel while getting rid of the legacy devices that other virtualization technologies, such as Xen and VMware, offer. This way, Firecracker provides a VMM with a smaller memory footprint and improved performance.

Firecracker makes use of KVM, meaning it can run only on KVM-supported and KVM-enabled hosts with a corresponding Linux kernel (v4.14 or later). With the recommended guest kernel configuration, Firecracker claims a memory overhead of less than 5 MB per MicroVM, boots to application code within 125 ms, and supports the creation of up to 150 MicroVMs per second. The number of Firecracker MicroVMs running simultaneously on a single host is limited only by the availability of hardware resources [1].

Firecracker provides security and resource isolation by running the Firecracker userspace process inside a jailer process. The jailer sets up system resources that require elevated permissions (e.g., cgroups, chroot), drops privileges, and then exec()s into the Firecracker binary, which runs as an unprivileged process. Past this point, Firecracker can only access resources to which a privileged third party grants access (e.g., by copying a file into the chroot or passing in a file descriptor) [13]. Seccomp filters limit the system calls the Firecracker process can use. There are three levels of seccomp filtering, configurable by passing a command-line argument to the jailer: 0 (disabled), 1 (whitelists a set of trusted system calls by their identifiers), and 2 (whitelists a set of trusted system calls with trusted parameter values), the last being the most restrictive and the recommended one. The filters are loaded into the Firecracker process immediately before execution of the untrusted guest code starts [13].

Section II explains the high-level architecture of the Firecracker MicroVM, Section III dives deep into the boot sequence, Section IV describes the device model emulation provided, Section V discusses the scope for improvements, and Section VI presents the conclusions garnered through this study.

II. ARCHITECTURE OF FIRECRACKER

KVM is an enabler of the hardware virtualization extensions provided by vendors such as Intel and AMD (VMX and SVM, respectively). These extensions allow KVM to execute guest code directly on the host CPU. Three sets of ioctls make up the KVM API and are issued to control the various aspects of a virtual machine. The three classes that the ioctls belong to are [7]:
• System ioctls: query and set global attributes that affect the whole KVM subsystem. In addition, a system ioctl is used to create virtual machines.
• VM ioctls: query and set attributes that affect an entire virtual machine, for example, its memory layout. In addition, a VM ioctl is used to create virtual CPUs (vCPUs). VM ioctls must be issued from the same userspace process (address space) that was used to create the VM.
• vCPU ioctls: query and set attributes that control the operation of a single virtual CPU. vCPU ioctls must be issued from the same thread that was used to create the vCPU.
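To make the three ioctl classes concrete, the sketch below issues the simplest system ioctl, KVM_GET_API_VERSION, against /dev/kvm, and shows where the VM-creating and vCPU-creating ioctls fit. This is an illustrative Python sketch, not Firecracker code; the helper names are invented, the ioctl request numbers are taken from <linux/kvm.h>, and the version helper degrades to None on hosts without a usable /dev/kvm.

```python
import fcntl
import os

# ioctl request numbers from <linux/kvm.h>: _IO(KVMIO, nr) with KVMIO = 0xAE.
KVM_GET_API_VERSION = 0xAE00  # system ioctl: query the KVM API version
KVM_CREATE_VM       = 0xAE01  # system ioctl: returns an fd for a new VM
KVM_CREATE_VCPU     = 0xAE41  # VM ioctl: returns an fd for a new vCPU

def kvm_api_version(path="/dev/kvm"):
    """Issue the simplest system ioctl; return None if KVM is unavailable."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:               # no /dev/kvm, or no permission
        return None
    try:
        return fcntl.ioctl(fd, KVM_GET_API_VERSION)
    finally:
        os.close(fd)

def create_vm_and_vcpu(kvm_fd):
    """A system ioctl creates the VM fd; a VM ioctl on it creates vCPU 0."""
    vm_fd = fcntl.ioctl(kvm_fd, KVM_CREATE_VM, 0)
    vcpu_fd = fcntl.ioctl(vm_fd, KVM_CREATE_VCPU, 0)
    return vm_fd, vcpu_fd
```

On a KVM-enabled host, kvm_api_version() returns 12, the stable KVM API version; vCPU ioctls such as KVM_RUN would then be issued on the returned vcpu_fd, from the thread that created it.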

(arXiv:2005.12821v1 [cs.OS] 26 May 2020)

QEMU is an open-source machine emulator and virtualizer that can run KVM-accelerated virtual machines. Similar to QEMU, Firecracker uses a multithreaded, event-driven architecture. Each Firecracker process runs one and only one MicroVM, and the process consists of three main thread types: API, VMM, and vCPU. The API server thread is an HTTP server created to accept requests and perform actions on them. A vCPU thread is created for each virtual CPU; these are ordinary POSIX threads that drive KVM. To run guest code, a vCPU thread executes an ioctl with KVM_RUN as its argument.

Software running in root mode is the hypervisor. The hypervisor, or VMM, forms a new plane that runs in root mode, while the VMs run in non-root mode. KVM uses the hardware virtualization extensions to provide these different modes on the host CPUs.
In the case of Intel CPUs, VT-x provides the CPU virtualization and VT-d the IO virtualization. For vCPUs, VT-x provides the two modes of guest code execution: root and non-root. Whenever a VM attempts to execute an instruction that is prohibited in non-root mode, the vCPU immediately switches to root mode in a trap-like way. This is called a VMEXIT. The hypervisor deals with the reason for the VMEXIT and then executes VMRESUME to re-enter non-root mode for that VM instance. This interaction between root and non-root mode is the essence of hardware virtualization.

Fig. 1. Firecracker Design - Threads

The API server thread's structure comprises a socket connection for listening to requests on a port, an epoll file descriptor for listening to events on that socket, and a hashmap of connections. The hashmap holds token and connection pairs corresponding to the file descriptor of the stream, in this case the socket.

Traditionally, QEMU has made use of the select or poll system calls to monitor a set of file descriptors for new events [16]. These calls require a list of all open file descriptors to be maintained in the VMM structure, and each call walks through every descriptor to determine which ones have new events, taking O(N) time, where N is the number of file descriptors being monitored [14]. Firecracker instead takes the epoll approach, in which the host kernel maintains the list of file descriptors registered by the VM process and notifies the VMM whenever a new event occurs on any of them. This is a notification ("push") mechanism, whereas select/poll is a polling ("pull") mechanism; epoll additionally supports both level-triggered and edge-triggered operation. The epoll file descriptor created by Firecracker has the close-on-exec flag set, which means that if the Firecracker process forks, the descriptor is not shared with the child.

The API server exposes a number of REST APIs that can be used to configure the MicroVM. Once the MicroVM has been started, i.e., once the "InstanceStart" command has been received, the API server thread simply blocks on the epoll file descriptor until new events come in. Firecracker creates a channel (see Rust channels, which are comparable to Unix pipes) to enable communication between the API server thread and the VMM thread.

The VMM thread manages the entire MicroVM. Once created, it runs an event loop that takes the parsed requests one by one from the API server thread and dispatches them to the appropriate handlers. The handlers are defined according to the dispatch table set up by the event loop; for now, Firecracker supports the following handlers: Exit, Stdin, DeviceHandler, VMMActionRequest, and WriteMetrics. The dispatch table, managed through the epoll file descriptor, maintains a map of the file descriptors to be monitored and the kinds of events to monitor them for. When a vCPU thread creation request is received, the VMM spawns the required number of vCPUs using the KVM vCPU ioctls.
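The epoll-based event loop and dispatch table described above can be sketched with Python's stdlib epoll binding (Linux-only). The pipe here stands in for Firecracker's API socket, and the handler and table names are invented for illustration:

```python
import os
import select

# A pipe stands in for the API socket that Firecracker's VMM thread monitors.
r_fd, w_fd = os.pipe()

epoll = select.epoll()
epoll.register(r_fd, select.EPOLLIN)   # level-triggered by default

# A minimal dispatch table: file descriptor -> handler, mirroring the map of
# monitored fds and event kinds described above.
def on_readable(fd):
    return os.read(fd, 4096)

dispatch = {r_fd: on_readable}

os.write(w_fd, b"InstanceStart")       # simulate an incoming API request
events = epoll.poll(timeout=1)         # wakes only for ready descriptors
results = [dispatch[fd](fd) for fd, mask in events if mask & select.EPOLLIN]

epoll.close()
os.close(r_fd)
os.close(w_fd)
```

Unlike select/poll, the kernel tracks the registered descriptors itself, so the wakeup cost does not grow with the number of idle descriptors.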
III. FIRECRACKER BOOT SEQUENCE

Traditional PCs boot Linux with a BIOS and a bootloader. The primary responsibilities of the BIOS include starting the CPU in real mode and performing a Power-On Self Test (POST) before loading the bootloader. The BIOS determines the candidate boot devices, and once a bootable device is found, the bootloader is loaded into RAM and executed. Different systems execute different numbers of bootloader stages: LILO, for example, has a two-stage bootloader, while GRUB uses three stages. Multiple stages were needed because of the limitations of some of the older systems used to boot Linux.

The Linux kernel does not actually require a BIOS and a bootloader. Instead, Firecracker uses what is known as the Linux Boot Protocol [15]. Multiple versions of the Linux Boot Protocol standard exist; Firecracker follows the 64-bit Linux Boot Protocol. Thus, Firecracker can directly load the Linux kernel and mount the corresponding root file system.

A standard kernel image is a bzImage ("big" compressed image, usually larger than 512 KB), consisting of real-mode kernel setup code followed by the protected-mode kernel code. Instead of entering through the real-mode entry point, Firecracker boots directly into the 64-bit entry point located at offset 0x200 in the protected-mode kernel code. Firecracker loads an uncompressed Linux kernel as well as the init process, thereby saving the approximately 20 to 30 ms otherwise spent decompressing the kernel.

The Linux kernel can also be accompanied by another component, the initramfs [5]. There are four primary reasons to have an initramfs in an LFS environment: loading the rootfs from a network, loading it from an LVM logical volume, having an encrypted rootfs that requires a password, or the convenience of specifying the rootfs as a LABEL or UUID. Anything else usually means that the kernel was not configured properly. Firecracker needs none of the above-stated reasons for loading an initramfs before mounting the root file system.

        ~                        ~
        | Protected-mode kernel  |
100000  +------------------------+
        | I/O memory hole        |
0A0000  +------------------------+
        | Reserved for BIOS      |  Leave as much as possible unused
        ~                        ~
        | Command line           |  (Can also be below the X+10000 mark)
X+10000 +------------------------+
        | Stack/heap             |  For use by the kernel real-mode code.
X+08000 +------------------------+
        | Kernel setup           |  The kernel real-mode code.
        | Kernel boot sector     |  The kernel legacy boot sector.
X       +------------------------+
        | Boot loader            |  <- Boot sector entry point 0000:7C00
001000  +------------------------+
        | Reserved for MBR/BIOS  |
000800  +------------------------+
        | Typically used by MBR  |
000600  +------------------------+
        | BIOS use only          |
000000  +------------------------+

Fig. 2. Linux BzImage Memory Layout

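The header fields that locate the protected-mode kernel within a bzImage can be read as in the sketch below. Offsets follow the x86 Linux Boot Protocol documentation [15]; this is an illustrative parser, not Firecracker's loader (which loads an uncompressed vmlinux and skips this stage entirely):

```python
import struct

def parse_bzimage_header(image: bytes):
    """Locate the protected-mode kernel using the real-mode setup header."""
    if image[0x202:0x206] != b"HdrS":            # boot protocol magic
        raise ValueError("not a Linux boot protocol image")
    setup_sects = image[0x1F1] or 4              # 0 means 4, per the protocol
    version, = struct.unpack_from("<H", image, 0x206)
    pm_offset = (setup_sects + 1) * 512          # real-mode setup ends here
    return {
        "boot_protocol": (version >> 8, version & 0xFF),
        "protected_mode_offset": pm_offset,
        "entry64_offset": pm_offset + 0x200,     # the 64-bit entry point
    }
```

For a header with setup_sects = 15, for example, the protected-mode code starts at byte 8192 of the image and the 64-bit entry point at byte 8704.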
Since Firecracker needs none of these, it is recommended to avoid loading an initramfs at boot time, further reducing the overall boot time and the memory footprint of the kernel. If no initramfs is configured externally, Firecracker replaces it at boot time with a default empty, 134-byte initramfs.

IV. FIRECRACKER DEVICE MODEL

Up to this section, we have discussed the similar architectures and execution flows of Firecracker and QEMU. So what is different between QEMU and Firecracker? One of the main differences lies in device emulation. Only five device emulations are available in Firecracker: network, block devices, sockets, serial console, and a minimal keyboard controller, as shown in Figure 3. Firecracker does not support device emulations such as USB, GPU, and the 9P filesystem in order to provide increased security compared to other virtualization technologies; QEMU, on the other hand, makes most device model emulations available in its VMM. The careful reader will also notice that Firecracker does not use the vhost implementation in the host kernel, which offers more efficient IO by processing virtqueues in the kernel rather than in the userspace VMM.

An open specification for emulating device models in virtualization has been developed, named Virtio. Virtio is defined as a straightforward, efficient, standard mechanism that allows a guest OS to talk to a virtual device driver in much the same way the host OS would call an actual hardware device driver. It takes advantage of the fact that the guest can share memory with the host for IO. The general flow of the virtio specification [8] involves a front-end driver representing the virtual device in the guest and a corresponding device exposed by the hypervisor or VMM. A transport layer enables communication between the host and the guest; for this transport, Virtio employs a ring-buffer structure called a virtqueue. A virtqueue is a queue of guest-allocated buffers that the host interacts with, either by reading from or writing to them, and each device can have zero or more virtqueues. A back-end driver, present in the host kernel and connected to the virtqueue, completes the communication flow.

The Firecracker device model architecture using Virtio is shown in Figure 3. The following list describes the devices available within Firecracker:
• virtio-net: implementation of the network driver (tun/tap devices)
• virtio-blk: implementation of the block devices
• virtio-vsock: implementation of VM sockets, providing N:1 host-guest communication
• serial console: implementation of the legacy console device for serial (terminal) communication
• keyboard controller: implementation of the keyboard device, though only one function is implemented: Ctrl+Alt+Del to reboot/shut down the system.
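The virtqueue flow described above, in which the guest driver posts buffers and the host back end consumes them and returns results, can be modeled with a small toy class. This mimics only the logical flow; real virtio uses shared-memory descriptor, available, and used rings rather than Python objects, and the names here are invented for illustration:

```python
from collections import deque

class ToyVirtqueue:
    """Toy model of a virtqueue: an 'available' ring of guest-posted buffers
    and a 'used' ring of buffers completed by the host-side back end."""
    def __init__(self):
        self.available = deque()   # buffers posted by the guest front-end driver
        self.used = deque()        # buffers returned by the host back-end driver

    def guest_post(self, buf):
        self.available.append(buf)

    def device_process(self, handler):
        while self.available:
            self.used.append(handler(self.available.popleft()))

# A toy 'block device' back end that answers a request by uppercasing it.
vq = ToyVirtqueue()
vq.guest_post(bytearray(b"read sector 7"))
vq.device_process(lambda b: bytes(b).upper())
```

In real virtio, the rings live in guest memory shared with the host, which is what lets the back end consume requests without copying buffers through the VMM.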

Fig. 3. Firecracker Device Model

V. SCOPE FOR IMPROVEMENTS

Even with all the excellent features, near-native performance of guest code through KVM, faster boot times, and the lower memory footprint that results from supporting fewer device emulations, there still exist some areas of improvement that could make Firecracker more suitable for general use cases and not just serverless workloads:
• Support for virtio-fs: virtio-fs is an interface for efficient file sharing between the host and guest filesystems that avoids context switches (VMEXITs), thereby providing better performance. It is an upgrade of the existing virtio-9p interface for the same purpose, though more security research is required before it can be included in Firecracker.
• Increased IO performance: Tests comparing Firecracker, QEMU, and Cloud Hypervisor show limitations in Firecracker's virtio implementation and serial execution [1] [2].
• Larger number of device emulations: Currently, Firecracker can emulate only 10 devices, since each device gets its own IRQ [10].
• Support for attaching devices at runtime: Firecracker only allows devices to be specified at boot time; devices cannot be attached while the MicroVM is running.
• Hotplugging support: For any workload, it is beneficial to allow guest memory/CPU hotplugging at runtime in order to avoid interfering with the workload. Firecracker oversubscribes the memory allocated for the guest, but there is no way to expand the memory allocated to a running guest.
• Memory ballooning support: At present, Firecracker has no support for reclaiming unused memory from the guest, since no host-guest communication channel exists for this purpose. Together with hotplugging, ballooning would make it easy to dynamically add or remove memory/CPU at runtime, providing elasticity to the MicroVM [9].

REFERENCES

[1] "Firecracker: Lightweight Virtualization for Serverless Computing," AWS News Blog, https://aws.amazon.com/blogs/aws/firecracker-lightweight-virtualization-for-serverless-computing
[2] A. Agache, M. Brooker, A. Florescu, A. Iordache, A. Liguori, R. Neugebauer, P. Piwonka, and D.-M. Popa, "Firecracker: Lightweight Virtualization for Serverless Applications," https://www.usenix.org/system/files/nsdi20-paper-agache.pdf
[3] https://unixism.net/2019/10/how-aws-firecracker-works-a-deep-dive/
[4] https://developer.ibm.com/technologies/linux/articles/l-linuxboot/
[5] http://www.linuxfromscratch.org/blfs/view/cvs/postlfs/initramfs.html
[6] "Swift Birth and Quick Death: Enabling Fast Parallel Guest Boot and Destruction in the Xen Hypervisor," https://ssrg.ece.vt.edu/papers/vee2017.pdf
[7] P. Mukhedkar, H. Devassy Chirammal, and A. Vettathu, "Mastering KVM Virtualization," https://learning.oreilly.com/library/view/mastering-kvm-virtualization/9781784399054/ch02s04.html
[8] "Deep dive into Virtio-networking and vhost-net," https://www.redhat.com/en/blog/deep-dive-virtio-networking-and-vhost-net
[9] "Memory Ballooning Support," https://github.com/firecracker-microvm/firecracker/issues/1571
[10] "Enable IRQ sharing to remove the current limit of 10 devices," https://github.com/firecracker-microvm/firecracker/issues/1268
[11] "Building the virtualization stack of the future with rust-vmm," https://opensource.com/article/19/3/rust-virtual-machine
[12] rust-vmm, https://github.com/rust-vmm
[13] "Firecracker Design," https://github.com/firecracker-microvm/firecracker/blob/master/docs/design.md
[14] "The method to epoll's madness," https://medium.com/@copyconstruct/the-method-to-epolls-madness-d9d2d6378642
[15] "Linux 64-bit Boot Protocol," https://www.kernel.org/doc/html/latest/x86/boot.html#id1
[16] "QEMU Internals: Overall architecture and threading model," http://blog.vmsplice.net/2011/03/qemu-internals-overall-architecture-and.html
[17] "Cold start / warm start with AWS Lambda," https://blog.octo.com/en/cold-start-warm-start-with-aws-lambda/

VI. CONCLUSION / THOUGHTS

This paper reviews the implementation of a minimalist and modular VMM in the form of the Firecracker MicroVM. It identifies how Firecracker provides resource isolation and security through seccomp filters and the jailer process, and how it achieves faster boot times and a lower memory footprint through KVM and a minimal device model. Another point worth noting is that Firecracker embodies a modular approach to hypervisor design. This modular design approach has also led to the development of the community-driven, high-quality rust-vmm crates, which provide the core modules required to implement a hypervisor [11]. rust-vmm [12] is a community effort initiated by AWS; Amazon, along with Intel, Red Hat, and Google, is trying to provide a platform for building a hypervisor from scratch by consuming only the required modules from the rust-vmm crates. This approach also enables a plug-and-play architecture for hypervisors, something we have not seen so far.