Linux-Vserver Resource Efficient OS-Level Virtualization
Total Page:16
File Type:pdf, Size:1020Kb
Linux-VServer Resource Efficient OS-Level Virtualization Herbert Pötzl Marc E. Fiuczynski [email protected] [email protected] Abstract Hosting scenarios such as web hosting centers provid- ing Virtual Private Servers (VPS) and HPC clusters need Linux-VServer is a lightweight virtualization system to isolate different groups of users and their applications used to create many independent containers under a from each other. Linux-VServer has been in production common Linux kernel. To applications and the user of a use for several years by numerous VPS hosting centers Linux-VServer based system, such a container appears around the world. Furthermore, it has been in use since just like a separate host. 2003 by PlanetLab (www.planet-lab.org), which is a ge- ographically distributed research facility consisting of The Linux-Vserver approach to kernel subsystem con- roughly 750 machines located in more than 30 countries. tainerization is based on the concept of context isola- Due to its efficiency, a large number of VPSs can be tion. The kernel is modified to isolate a container into robustly hosted on a single machine. For example, the a separate, logical execution context such that it can- average PlanetLab machine today has a 2.4Ghz x86 pro- not see or impact processes, files, network traffic, global cessor, 1GB RAM, and 100GB of disk space and typi- IPC/SHM, etc., belonging to another container. cally hosts anywhere from 40-100 live VPSes. Hosting centers typically use even more powerful servers and it Linux-VServer has been around for several years and its is not uncommon for them to pack 200 VPSes onto a fundamental design goal is to be an extremely low over- single machine. head yet highly flexible production quality solution. It Server consolidation is another hosting scenario where is actively used in situations requiring strong isolation isolation between independent application services where overall system efficiency is important, such as (e.g., db, dns, web, print) improves overall system ro- web hosting centers, server consolidation, high perfor- bustness. Failures or misbehavior of one service should mance clusters, and embedded systems. not impact the performance of another. While hypervi- sors like Xen and VMware typically dominate the server 1 Introduction consolidation space, a Linux-VServer solution may be better when resource efficiency and raw performance are required. For example, an exciting development is that This paper describes Linux-VServer, which is a virtu- the One Laptop Per Child (OLPC) project has recently context isolation alization approach that applies tech- decided to use Linux-VServer for their gateway servers niques to the Linux kernel in order to create lightweight that will provide services such as print, web, blog, void, container instances. Its implementation consists of a backup, mesh/wifi, etc., to their $100 laptops at school separate kernel patch set that adds approximately 17K premises. These OLPC gateways servers will be based lines of code to the Linux kernel. Due to its architec- on low cost / low power consuming embedded systems ture independent nature it has been validated to work on hardware, and for this reason a resource efficient solu- eight different processor architectures (x86, sparc, al- tion like Linux-VServer rather than Xen was chosen. pha, ppc, arm, mips, etc.). While relatively lean in terms of overall size, Linux-VServer touches roughly 460 ex- Sandboxing scenarios for generic application plugins isting kernel files—representing a non-trivial software- are emerging on mobile terminals and web browsers to engineering task. Linux-VServer is an efficient and flex- isolate arbitrary plugins–not just Java lets–downloaded ible solution that is broadly used for both hosting and by the user. For its laptops OLPC has designed a secu- sandboxing scenarios. rity framework, called bitfrost [1], and has decided to 152 • Linux-VServer utilize Linux-VServer as its sandboxing solution, isolat- Host Services Apache Quake Srv. ing the various activities from each other and protecting MySQL Postgresql PHP ... the main system from all harm. Virtual Platform Hosting Platform Host Context Guest 1 ... Guest N Of course, Linux-VServer is not the only approach to Admin Guest Admin Guest Admin implementing containers on Linux. Alternatives in- ... /dev /usr /home /proc ... /dev /usr /home /proc ... /dev /usr /home /proc clude non-mainlined solutions such as OpenVZ, Vir- tuozzo, and Ensim; and, there is an active commu- Shared OS Image nity of developers working on kernel patches to incre- mentally containerize the mainline kernel. While these approaches differ at the implementation level, they all Figure 1: Container-based Platform typically focus in onto a single point within the over- all containerization design spectrum: system virtualiza- To applications and the user of a container-based sys- tion. Benchmarks run on these alternative approaches to tem, the container appears just like a separate linux sys- Linux-VServer reveal non-trivial time and space over- tem. head, which we believe are fundamentally due to their focus on system virtualization. In contrast, we have Figure 1 depicts a container-based system, which is found with Linux-VServer that using a context isola- comprised of two basic platform groupings. The host- tion approach to containerize critical kernel subsystems ing platform consists essentially of the shared OS image yields neglible overhead compared to a vanilla kernel— and a privileged host context. This is the context that and in most cases performance of Linux-VServer and a system administrator uses to manage containers. The Linux is indistinguishable. virtual platform is the view of the system as seen by the guest containers. Applications running in a guest con- This paper has two goals: 1) serve as a gentle intro- tainer work just as they would on a corresponding non- duction to container-based system for the general Linux container-based system. That is, they have their own community, and 2) highlight both the benefits (and root file system, IP addresses, /dev, /proc, etc. drawbacks) of our context isolation approach to kernel The subsequent sections will focus on the main contri- containerization. The next section presents a high-level butions that Linux-VServer makes, rather than being ex- overview of container-based systems and then describes haustively describing all required kernel modifications. the Linux-VServer approach in further detail. Section 3 evaluates efficiency of Linux-VServer. Finally, Sec- tion 4 offers some concluding remarks. 2.2 Kernel Containerization This section describes the kernel subsystem enhance- 2 Linux-VServer Design Approach ments that Linux-VServer makes to support contain- ers. These enhancements are designed to be low over- This section provides an overview of container-based head, flexible, as well as to enhance security in order systems, describes the general techniques used to to properly confine applications into a container. For achieve isolation, and presents the mechanisms with CPU scheduling, Linux-VServer introduces a novel fil- which Linux-VServer implements these techniques. tering technique in order to support fair-share, work- conserving, or hard limit container scheduling. How- ever, in terms of managing system resources such as 2.1 Container-based System Overview storage space, io bandwidth, for Linux-VServer it is mostly an exercise of leveraging existing Linux resource At a high-level, a container-based system provides a management and accounting facilities. shared, virtualized OS image, including a unique root file system, a set of system executables and libraries, and resources (cpu, memory, storage, etc.) assigned to 2.2.1 Guest Filesystems the container when it is created. Each container can be “booted” and “shut down” just like a regular operating Guest container filesystems could either be imple- system, and “rebooted” in only seconds when necessary. mented using loop-back mounted images—as is typical 2007 Linux Symposium, Volume Two • 153 for qemu, xen, vmware, and uml based systems—or by proach is that files common to more than one container, simply using the native file systems and using chroot. which are rarely going to change (e.g., like libraries The main benefit of using a chroot-ed filesystem over and binaries from similar OS distributions), can be hard the loop-back mounted images is performance. Guests linked on a shared filesystem. This is possible because can read/write files at native filesystem speed. How- the guest containers can safely share filesystem objects ever, there are two drawbacks: 1) chroot() information is (inodes). volatile and therefore only provides weak confinement, and 2) chroot-ed filesystems may lead to significant du- The only drawback with hard linking files is that without plication of common files. Linux-VServer addresses additional measures, a container could (un)intentionally both of these problems, as one of its central objectives is destroy or modify such shared files, which in turn would to support containers in a resource efficient manner that harm/interfere other containers. performs as well as native Linux. This can easily be addressed by adding an immutable attribute to the file, which then can be safely shared be- Filesystem Chroot Barrier tween two containers. In order to ensure that a container Because chroot() information is volatile, it is simple cannot modify such a file directly, the Linux capability to escape from a chroot-ed environment, which would to modify this attribute is removed from the set of capa- be bad from a security perspective when one wants to bilities given to a guest container. maintain the invariant that processes are confined within their containers filesystem. This invariant is nice to have However, removing or updating a file with immutable when using containers for generic hosting scenarios, but link attribute set from inside a guest container would be clearly is required for sandboxing scenario. To appreci- impossible. To remove the file the additional “permis- ate how easy it is to escape conventional chroot() con- sion to unlink” attribute needs to be set.