An Architectural Perspective for Cloud Virtualization

Yiming Zhang, Qiao Zhou, Dongsheng Li, Kai Yu, Chengfei Zhang (School of Computer, NUDT, Changsha, China)
[email protected], [email protected], [email protected], [email protected], [email protected]

Yaozheng Wang (School of Computer, NUDT, Changsha, China), Jinyan Wang (School of Computer, NUDT, Changsha, China), Ping Zhong (Central South University, Changsha, China), Yongqiang Xiong (MSRA, Beijing, China), Huaimin Wang (School of Computer, NUDT, Changsha, China)
[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract—Virtual machines (VMs) and processes are two important abstractions for cloud virtualization, where VMs usually install a complete operating system (OS) running user processes. Although existing in different layers of the virtualization hierarchy, VMs and processes have overlapping functionalities. For example, they are both intended to provide an execution abstraction (e.g., a physical/virtual memory address space), and they share similar objectives of isolation, cooperation and scheduling. Currently, neither can provide the benefits of the other: VMs provide higher isolation, security and portability, while processes are more efficient, flexible, and easier to schedule and coordinate. The heavyweight virtualization architecture degrades both the efficiency and the security of cloud services.

There are two trends in cloud virtualization: the first is to enhance processes to achieve VM-like security, and the second is to reduce VMs to achieve process-like flexibility. Based on these observations, our vision is that in the near future VMs and processes might be fused into one new abstraction for cloud virtualization that embraces the best of both, providing VM-level isolation and security while preserving process-level efficiency and flexibility. We present a reference implementation, dubbed cKernel (customized kernel), for the new abstraction. Essentially, cKernel is a LibOS-based virtualization architecture, which (i) removes the traditional OS and process layers and retains only VMs, and (ii) takes the hypervisor as an OS and imitates processes' dynamic mapping mechanism for VMs' pages and libraries.

1. Introduction

Currently, virtual machines (VMs) and processes are two important abstractions for cloud virtualization. Hardware virtual machines [1] have been widely used to guarantee isolation [2, 3] and improve system reliability and security [4]. VMs usually emulate existing hardware architectures that run a complete operating system (OS), providing the environment for executing various applications in the form of user processes.

Although existing in different layers of the virtualization hierarchy, VMs and processes have overlapping functionalities. They are both intended to provide an execution abstraction (e.g., a memory address space), and they share similar objectives of isolation, cooperation and scheduling. On the other hand, however, neither can provide the benefits of the other: VMs provide higher isolation, security and portability, while processes are more lightweight and easier to schedule and coordinate (with multi-process APIs like fork, IPC and signals). Therefore, both VMs and processes are needed by mainstream software stacks, which not only decreases efficiency but also increases vulnerability [4, 5].

Recently, commodity clouds (like Amazon's Elastic Compute Cloud [6] and Alibaba's Aliyun [7]) have completely changed the economics of large-scale computing [8], providing a public platform where tenants run their specific applications (e.g., a web server or a database server) on dedicated VMs. These highly-specialized VMs require only a very small subset of the overall functionality provided by a standard OS.

This has motivated two trends in cloud virtualization: the first is to enhance processes to achieve VM-like security, and the second is to reduce VMs to achieve process-like flexibility. For example, picoprocesses [9] augment processes to address the isolation and efficiency problems of conventional OS processes, and Unikernels [4] statically seal only the application binary and its requisite libraries into a single bootable appliance image, so that Unikernel VMs can be lightweight and flexible.
Based on these observations, our vision is that in the near future the two abstractions of VMs and processes might be fused into one new abstraction for cloud virtualization that embraces the best of both, providing VM-level isolation and security while preserving process-level efficiency and flexibility. We argue that a promising approach to this fusion is to start from the VM abstraction and add desirable process features following the library operating system (LibOS) [10] paradigm, which has been widely adopted in the virtualization literature [4, 11, 9, 12, 13, 14, 15, 16].

We present a reference implementation, dubbed cKernel (customized kernel), for the new abstraction. cKernel is essentially a LibOS-based virtualization architecture, which (i) removes the traditional OS and process layers and retains only VMs, and (ii) takes the hypervisor as an OS and imitates processes' dynamic mapping mechanism for VMs' pages and libraries. By drawing an analogy between conventional processes and VMs, cKernel has partially (i) realized the fork API for VMs, a basis for conventional multi-process programming abstractions, and (ii) supported dynamic library loading and linking for the minimized VMs.

Figure 1: Conventional VMs/processes vs. cKernel VM appliances.

The rest of this paper is organized as follows. Section 2 further introduces the background and motivation. Section 3 introduces a promising approach (cKernel) to realizing the fusion of processes and VMs. Section 4 presents cKernel's reference implementation. Section 5 introduces related work. Section 6 discusses future work. Finally, Section 7 concludes the paper.

2. Background

2.1. VMs vs. Processes

VMs rely on a virtual machine hypervisor, like Xen [1] or Linux KVM [17], to provide an application with all its dependencies in a virtualized, self-contained unit. As shown in Fig. 1 (left), the hardware dependency is provided through an emulated machine with CPU kernel mode, multiple address spaces and virtual hardware devices, which is able to provide the software dependency by running a conventional operating system together with various libraries, perhaps slightly modified for para-virtualization [1]. VMs have been widely used since they provide desirable compatibility for major applications by reusing existing OS functions, including the support of various hardware.

However, running a conventional OS in a VM brings substantial overhead, since the guest OS may run many duplicated management processes, and each guest OS may demand gigabytes of disk storage and hundreds of megabytes of physical memory. This has recently motivated the revival of LibOSs for VM appliances [4, 18, 11]. For example, Unikernel [4] implements the OS functions (e.g., the networking system and file system) as a set of libraries that can be separately linked into the VM appliance at compile time. The Unikernel (system libraries), application libraries and application binary are statically sealed into a single bootable image, which can run directly on the virtual hardware provided by the hypervisor. These VM-oriented LibOS designs effectively reduce the image size and memory footprint, taking the first step towards making a VM more like a process.

Meanwhile, another trend is to improve processes with VM-like isolation and security. For instance, the picoprocess [9, 19] is a process-based isolation abstraction that can be viewed as a stripped-down virtual machine without emulated CPU kernel mode, MMU or virtual devices. Picoprocess-oriented library OSs [12, 15, 13, 14, 16] provide an interface restricted to a narrowed set of host kernel ABIs [12], implementing the OS personalities as library functions and mapping high-level APIs onto a few interfaces to the host OS. For example, Drawbridge [12] refactors Windows 7 to create a self-contained LibOS running in a picoprocess, yet still supports rich desktop applications. Graphene [15] broadens the LibOS paradigm by supporting multi-process APIs in picoprocesses. Bascule [13] allows OS-independent extensions to be attached safely and efficiently at runtime. Tardigrade [14] utilizes picoprocesses to easily construct fault-tolerant services. And Haven [16] leverages the hardware protection of Intel Software Guard Extensions (SGX) to achieve shielded execution in picoprocesses against attacks from a malicious host.
Similar designs that enhance the process abstraction include Linux Container [20], Docker [21], OpenVZ [22], and VServer [23].

2.2. Vision

Currently, neither VMs nor processes can provide the benefits of the other, and thus both of them are needed by mainstream software stacks. This not only decreases efficiency but also increases the vulnerability of cloud services.

The two trends (of enhancing processes to achieve VM-like security and reducing VMs to achieve process-like flexibility) have motivated our vision that in the near future VMs and processes might be fused into one new abstraction for cloud virtualization, with the goal of making the best of both VMs and processes.

Considering the great success of VMs in security and backward compatibility, in this paper we discuss a promising approach for the fusion. We propose to start from the VM abstraction, strip off any functions unused by the accommodated applications, and add desirable process features like inter-process communication and dynamic library linking to the minimized VMs. We follow the technical and industrial trend of basing our design on the state-of-the-art LibOS paradigm, making virtual machines on the hypervisor comparable to processes on the OS, as depicted in Fig. 1 (right).

3. Fusion of VMs and Processes

This section introduces a promising approach (called cKernel) to realizing the new abstraction: a LibOS-based virtualization architecture that fuses VMs and processes and makes the best of both for the cloud.

3.1. cKernel VMs

Conventional general-purpose OSs are heavyweight and contain extensive features, so processes have been proposed to act as "lightweight VMs" that are cheaper to create, schedule and configure. In cloud environments, however, most cloud services only run a specific application in a single-purpose VM, which needs only a very small portion of conventional OS support. Therefore, we expect to reduce the OS and make the VM become (even) as lightweight as a process.

Based on this observation, cKernel proposes to start from the VM abstraction and take minimalism to the extreme, sealing only the application binary into a single VM. Previous work [5] has shown that a VM can be booted and run in less than 3 milliseconds, which is comparable to process fork/exec on Linux (about 1 millisecond). This demonstrates the feasibility for cKernel to realize high-performance process-like VM creation and scheduling (such as VM fork, IPC-like inter-VM communication, and dynamic library linking) by taking the hypervisor as an OS. By drawing an analogy between VMs and processes, cKernel naturally relies on the hypervisor for multicore scheduling, following the multikernel mechanism [24] of running one VM per vCPU.

In the conventional virtualization architecture, threads have been proposed as "lightweight processes": multiple threads running in one process can provide more efficient scheduling and communication. However, many production systems have shown that threads are difficult to rein in within large-scale distributed environments [25] due to problems like data races, deadlock, and livelock, which are usually nondeterministic and counter-intuitive. Moreover, in modern clouds one VM usually runs only one service instance, making threads less attractive than before. Therefore, cKernel adopts the simplified coroutine model [26] for concurrency, where many logical coroutines (each of which serves one client connection) run in the same VM/process so as to achieve fast sequential performance while alleviating the problems of threads.
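To make the coroutine model concrete, the following minimal, self-contained sketch shows many logical coroutines cooperatively yielding inside one address space, in the style cKernel adopts. It uses the POSIX ucontext API purely for illustration; the round-robin scheduler and the serve_connection name are hypothetical, and this is not cKernel's actual runtime.

    #include <stdio.h>
    #include <ucontext.h>

    #define NCOROS 2
    #define STACK_SIZE (64 * 1024)

    static ucontext_t sched_ctx, coro_ctx[NCOROS];
    static char stacks[NCOROS][STACK_SIZE];

    /* Each coroutine serves one logical client connection and yields
     * cooperatively after every step: no preemption, no locks. */
    static void serve_connection(int id)
    {
        for (int step = 0; step < 3; step++) {
            printf("connection %d: step %d\n", id, step);
            swapcontext(&coro_ctx[id], &sched_ctx);   /* yield to scheduler */
        }
    }

    int main(void)
    {
        for (int i = 0; i < NCOROS; i++) {
            getcontext(&coro_ctx[i]);
            coro_ctx[i].uc_stack.ss_sp   = stacks[i];
            coro_ctx[i].uc_stack.ss_size = STACK_SIZE;
            coro_ctx[i].uc_link          = &sched_ctx; /* resumed on return */
            makecontext(&coro_ctx[i], (void (*)(void))serve_connection, 1, i);
        }
        /* A trivial round-robin scheduler: resume each coroutine in turn. */
        for (int round = 0; round < 3; round++)
            for (int i = 0; i < NCOROS; i++)
                swapcontext(&sched_ctx, &coro_ctx[i]);
        return 0;
    }

Because coroutines yield only at explicit points, there is no preemption and hence no data race on shared state, which is what makes them easier to rein in than threads.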
Next, we will in turn introduce how to add two important features of conventional processes to cKernel VMs, namely, dynamic mapping of pages and dynamic linking of libraries.

3.2. Process-Like Fork of VMs

Since the process abstraction has been stripped off from the simplified cKernel architecture, cKernel VMs need to realize IPC (inter-process communication) to support conventional multi-process applications. cKernel lets a multi-process application keep the view that its processes run without any change, while these "processes" are actually cKernel VMs collaboratively running on the hypervisor.

The fork API is the basis for realizing IPC. It is straightforward to leverage the hypervisor's memory mapping functions (like Xen's page sharing mechanism) to implement fork. The fork API creates a child VM by duplicating the calling parent VM and returns the child's ID to the parent. cKernel needs to maintain all the information of the parent and child VMs, including the IDs, offsets within shared pages, references of the grant table, etc.

3.3. Process-Like Library Linking of VMs

Full VMs can easily realize dynamic library linking by leveraging their rich-featured OS functionalities. In the cKernel architecture, however, the minimized VMs are monolithically sealed at compile time, and thus libraries cannot be dynamically linked as before. Static library linking brings great inefficiency to cKernel. For example, the entire VM image has to be re-compiled and re-deployed to update each of its libraries.

Therefore, it is necessary to design a dynamic LibOS for cKernel's minimized VMs, which supports mapping the shared libraries (along with their states) onto the VMs' virtual address space at runtime. To achieve this goal, cKernel follows the "shell-core" mechanism [2], sealing only the application binary into a single VM (the shell) and dynamically linking libraries (the core) at runtime. It (i) adds a dynamic segment to the VM's virtual address space and (ii) dynamically loads and maps requisite libraries, in a way similar to the dynamic linking of conventional processes by the language runtime (like glibc) [27].

Specifically, cKernel initializes a VM and then jumps to an entry function provided by the hypervisor. cKernel adopts a single virtual address space where the basic OS functions (e.g., bootstrap and memory allocation), system libraries (including the language runtime), application binary and libraries, and global data structures co-locate and run without kernel/user space separation or process scheduling.
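For reference, the process-side mechanism that cKernel imitates is the standard dlfcn interface of conventional Linux processes, sketched below (libfoo.so and foo_init are placeholder names). The runtime maps the shared object into the calling process's single address space and resolves its symbols, which is the behavior cKernel reproduces for a VM's dynamic segment.

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* Map a shared library into this process's address space at
         * run time, just as ld.so maps DT_NEEDED libraries at startup. */
        void *handle = dlopen("./libfoo.so", RTLD_NOW);
        if (!handle) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return 1;
        }

        /* Resolve a symbol inside the freshly mapped segment. */
        int (*foo_init)(void) = (int (*)(void))dlsym(handle, "foo_init");
        if (foo_init)
            foo_init();

        dlclose(handle);   /* unmap when no longer referenced */
        return 0;
    }

(On glibc-based systems this is linked with -ldl.)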
4. Reference Implementation

We present a reference implementation of cKernel by modifying MiniOS [11], a para-virtualized LibOS for user domains on Xen [1], where the application binary and the necessary libraries are statically sealed into a single bootable image like Unikernel [4]. MiniOS has implemented the basic functionalities needed to run an appliance on Xen. It has a single address space and a simple scheduler without preemptive scheduling or threading.

cKernel leverages the GNU cross-compiling development tool chain (including the GCC compiler) to support source code compatibility (mainly C in our current prototype), so that a legacy Linux application can be re-compiled into a cKernel appliance that runs directly on Xen. It incorporates the lightweight RedHat Newlib [28] to provide the standard C library functions.

4.1. Fork

cKernel fork is designed to duplicate the VM that accommodates the caller application. When the fork API is invoked in the parent VM, cKernel uses inline assembly to get the values of the CPU registers. Dom0 is responsible for duplicating and starting the child VM.

The parent domain uses the grant table to selectively share its data with its child, including the stack, heap, .bss and data sections. This improves security by avoiding unauthorized access. For instance, the parent may choose not to share its stack if the child never accesses it; and in a key-value store application, a child domain servicing a specific user should be given access to only that user's data. Writable data is shared in a copy-on-write mode.

The forked child VM can access the authorized data of the parent VM. The child and parent may also use shared pages to communicate with each other, i.e., realizing conventional inter-process communication APIs in the cKernel architecture, such as pipe, signal, and message queue. The Xen-based implementation of inter-VM page mapping will be studied in our future work.
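The control flow of cKernel fork can be summarized by the sketch below. Every name here other than the steps described above is a hypothetical placeholder: ck_save_regs, ck_dom0_clone and ck_grant_share stand in for the inline assembly, the dom0 toolstack call, and the grant-table setup, respectively, and the section-boundary symbols are assumed to come from the linker script.

    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t domid_t;

    /* Hypothetical helpers standing in for the steps described above. */
    extern void    ck_save_regs(void *cpu_state);        /* inline asm: snapshot vCPU registers */
    extern domid_t ck_dom0_clone(const void *cpu_state); /* ask dom0 to duplicate and start the child */
    extern int     ck_grant_share(domid_t child, void *addr,
                                  size_t len, int cow);  /* grant-table sharing, optionally copy-on-write */

    /* Section bounds, assumed to be provided by the linker script. */
    extern char __stack_start[], __stack_end[];
    extern char __data_start[],  __bss_end[];

    /* vm_fork(): duplicate the calling VM and return the child's domain
     * ID to the parent, analogous to fork() returning the child's PID. */
    domid_t vm_fork(void)
    {
        char cpu_state[512];
        ck_save_regs(cpu_state);

        domid_t child = ck_dom0_clone(cpu_state);

        /* Selectively share writable sections copy-on-write, so the
         * child sees only what it is authorized to access. */
        ck_grant_share(child, __stack_start, __stack_end - __stack_start, 1);
        ck_grant_share(child, __data_start,  __bss_end   - __data_start,  1);
        return child;
    }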

4.2. Dynamic Library Linking

The original design of MiniOS cannot support dynamic libraries. Fig. 2 shows the augmented memory layout of cKernel, which is divided into multiple subspaces for the regions of text and data, front driver info, grant table, external I/O, heap, and Xen-reserved.

Figure 2: cKernel memory layout (regions include the text and data region with the kernel segment and dynamic segment, front driver info, grant table, p2m table, page tables, external I/O, program heap, program/boot stacks, console, xenstore, arch-specific start info structures, and a Xen-reserved area). The blue region is where dynamic libraries are loaded.

The text and data region contains (i) the kernel segment, including the pre-compiled basic OS functions, (if any) static libraries and the application binary, (ii) the dynamic segment where all the dynamic libraries are loaded, and (iii) the structures needed by cKernel such as the physical-to-machine (p2m) mapping table, arch-specific structures, page tables and stacks.

The front driver info region is for realizing the split driver model of Xen. The grant table region is used to share pages with dom0 and other domUs. The external I/O region maps the external devices to the virtual address space to facilitate I/O operations. The heap region is used to allocate memory growing in 4KB page chunks. And the Xen-reserved region is used by Xen for hypervisor-level management and is not available to cKernel.

By emulating the procedure of dynamic mapping in UNIX processes that support shared libraries, we have implemented cKernel's dynamic mapping mechanism for shared libraries in the Xen control library (libxc), which is used by the upper-layer control tool stacks like XL (XM), Libvirt/VIRSH, and XAPI/XE.

After creating the domain boot loader, XL builds a para-virtualized domU, during which it (i) parses the kernel image file, (ii) initiates the boot memory, (iii) builds the image in memory, and (iv) boots up the image for the domU. We modify the third step (building the image) and add a function xc_dom_map_dyn(), which maps the dynamic libraries at runtime to the virtual address of the dynamic segment in the text and data region depicted in Fig. 2. Algorithm 1 summarizes the procedure.

Algorithm 1 Dynamic Library Linking
 1: procedure XC_DOM_MAP_DYN
 2:     Create the domain boot loader
 3:     Parse the kernel image file
 4:     Initiate the boot memory
 5:     Build the image in memory
 6:     Read dynamic lib info from the program header table
 7:     for each dynamic lib do
 8:         Retrieve info of its dynamic sections
 9:         Load the dynamic sections
10:         Relocate unresolved symbols
11:     end for
12:     Jump to the program entry
13: end procedure
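A simplified sketch of what xc_dom_map_dyn() does is shown below, using standard ELF structures from <elf.h>. Only the name xc_dom_map_dyn comes from our implementation; struct dom_image and the two helper functions are illustrative stand-ins for libxc's domain-builder state and loading/relocation code, which we elide here.

    #include <elf.h>
    #include <stddef.h>

    /* Illustrative stand-ins for libxc's domain-builder state/helpers. */
    struct dom_image {
        Elf64_Ehdr *ehdr;      /* kernel image parsed in memory        */
        void       *dyn_base;  /* base of the dynamic segment (Fig. 2) */
    };
    extern int load_sections(struct dom_image *img, Elf64_Phdr *ph);
    extern int relocate_symbols(struct dom_image *img, Elf64_Dyn *dyn);

    /* Walk the program header table, map each dynamic library into the
     * dynamic segment of the domU's address space, and resolve its
     * relocations (Algorithm 1, steps 6-11). */
    int xc_dom_map_dyn(struct dom_image *img)
    {
        Elf64_Phdr *phdrs =
            (Elf64_Phdr *)((char *)img->ehdr + img->ehdr->e_phoff);

        for (int i = 0; i < img->ehdr->e_phnum; i++) {
            if (phdrs[i].p_type != PT_DYNAMIC)
                continue;                            /* step 6: dynamic lib info   */
            if (load_sections(img, &phdrs[i]) != 0)  /* steps 8-9: load sections   */
                return -1;
            Elf64_Dyn *dyn =
                (Elf64_Dyn *)((char *)img->ehdr + phdrs[i].p_offset);
            if (relocate_symbols(img, dyn) != 0)     /* step 10: fix up symbols    */
                return -1;
        }
        return 0;                                    /* caller jumps to the entry  */
    }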
5. Related Work & Discussion

This section discusses the work related to the fused virtualization architecture, including virtualization techniques of virtual machines and containers (Section 5.1), modular kernels and library OSs (Section 5.2), and security issues (Section 5.3).

5.1. Virtual Machines & Containers

Currently, full virtual machines like Xen [1] and Linux KVM [17] are the dominant virtualization technique, providing an application with all its dependencies on both hardware and software in a virtualized, self-contained unit [29].

The full-VM, rich-featured-OS architecture is heavyweight and inefficient. Therefore, researchers have proposed container-based virtualization techniques, such as picoprocesses [9], Linux Container [20], Docker [21], OpenVZ [22], and VServer [23]. A container is essentially an OS process plus its executing environment, so it is much smaller than a traditional full VM and boots up much faster.

The picoprocess [9] is a process-based isolation container. It is a highly restricted process that is prevented from making kernel calls, and it provides a single memory-isolated address space with strictly user-mode CPU execution and a narrow interface for implementing specific OS personalities [12].

Linux Container (LXC) [20] runs multiple isolated containers on a single Linux host, each of which creates its own virtual spaces for its processes. It uses Linux kernel cgroups [30] to multiplex and prioritize resources such as CPU cycles and memory, and leverages Linux namespaces to isolate an application's view of users, processes, file systems and networking [31].

A Docker [21] container is made up of a base image plus multiple committed layers, and it also uses the resource isolation features of Linux to allow multiple independent containers to run within a single Linux host. Traditionally, Docker supports only a single process, so a separate process manager like supervisor is needed to run multiple processes in Docker.

Although containers are smaller and have better performance, full VMs are still dominant in public commodity clouds and production platforms due to better isolation, security, portability and compatibility.

5.2. Modular Kernels & Library OSs

A microkernel [32] provides basic functions such as virtual memory management and interprocess communication (IPC), while many other functions, like file systems, drivers and protocols, run in user space. Microkernels have mainly argued for extensibility and flexibility [33, 34, 35, 36]. For example, the SPIN [33] microkernel allows applications to make policy decisions by safely downloading extensions into the kernel, while Scout [34], Vino [35] and Nemesis [36] efficiently realize communication, database and multimedia applications on top of microkernels, respectively.

The exokernel architecture [10] achieves higher flexibility than microkernels by providing application-level secure multiplexing of physical resources and enabling application-specific customization of conventional OS abstractions. A small exokernel securely exports all hardware resources through a low-level interface to untrusted library OSs, which use this interface to implement system functions. Exokernels provide applications with higher performance than traditional microkernel systems. For example, ExOS [10] and JOS [37] provide much more efficient application-level virtual memory and inter-process communication primitives, and SPACE [38] achieves better I/O performance while providing low-level kernel abstractions defined by the trap and architectural interface. Cache Kernel [39] provides a low-level kernel that supports multiple application-level OSs, and thus can be viewed as a variation of the exokernel; the difference is that Cache Kernel mainly focuses on reliability rather than secure multiplexing of physical resources.

With the proliferation of cloud computing and single-purpose VM appliances, Unikernels [4] target commodity clouds and compile the applications and libraries into a single bootable VM image that runs directly on a hypervisor (e.g., Xen). Mirage [4] is an OCaml [40]-based unikernel system where applications are written in high-level, type-safe OCaml code and are statically sealed into VM appliances at compile time against modification, which provides better security than previous library OSs like Singularity [41]. Mirage eschews backward compatibility [12] to achieve higher performance, but it is nontrivial to rewrite the numerous legacy services and applications (written in C, Java, Python, etc.) in OCaml.

Jitsu [42] uses Unikernel to design a power-efficient and responsive platform for hosting cloud services in the edge network, providing a new Xen toolstack that satisfies the demands of secure multi-tenant isolation on resource-constrained embedded ARM devices.
MiniOS [11] designs and implements a unikernel-style library OS that runs as a para-virtualized guest OS within a Xen domain. MiniOS has better backward compatibility and supports some single-process applications written in C. However, the original MiniOS also statically seals a monolithic appliance and suffers from problems similar to Mirage's.

Libra [43] provides a library OS abstraction for the JVM directly over Xen and relies on a separate Xen domain to provide networking and storage, which is similar to the design of MiniOS.

Library OSs may also run in picoprocesses [12, 15, 13, 14, 16]. The picoprocess-based library OSs implement the OS personalities as library functions and map high-level APIs onto a few interfaces to the host OS. A picoprocess-based LibOS is more like a traditional OS process and naturally supports dynamic shared libraries. However, currently picoprocess-based library OSs run only a handful of custom applications on small research prototypes, mainly because they require much engineering effort to achieve the same security and compatibility as full VMs.

5.3. Security Issues

cKernel VMs are expected to provide network-facing services in the public multi-tenant cloud and run inside full virtual machines above the Xen hypervisor layer; we therefore treat both the hypervisor and the control domain (dom0) as part of the trusted computing base (TCB). Software running in cKernel VMs is under potential threat both from other tenants in the cloud and from malicious hosts connected via the Internet. The full VM architecture facilitates cKernel's use of the hypervisor as the guard for isolation against internal attacks from other tenants, and the use of secure protocols like SSL and SSH helps cKernel appliances trust external entities.
Some modern OSs apply type-safe languages (e.g., OCaml [4], Haskell [44], and C# [41]) to achieve type-safety. For example, Mirage [4] uses OCaml to implement a set of type-safe protocol libraries and provides static type-safety with a single address-space layout that is immutable via a hypervisor extension; but there are still non-type-safe components (e.g., the garbage collector) in Mirage. Clearly, cKernel is written in C and thus is not type-safe. However, the cKernel architecture is orthogonal to the implementation language, and it is possible to implement a "type-safe" cKernel using type-safe languages. Note that to achieve complete type-safety, not only the OS and applications but also the hypervisor and dom0 must not allow the type-safe property to be violated.

Another approach to improving security is to reduce the TCB of a VM through disaggregation. For example, Anderson et al. propose to run security-sensitive components in separate Xen domains for MiniOS [11], leveraging the fully virtualized processors and the Trusted Computing Group's trusted platform module (TPM) provided by the hypervisor. Murray et al. implement "trusted virtualization" for Xen and improve the security of the virtual TPM by moving the domain builder, the most important privileged component, into a minimal trusted compartment [45]. Currently the TCB of cKernel includes the hypervisor, the privileged dom0 and the necessary management tools, which are hosted on top of a full Linux OS running in dom0; it is therefore worth studying the disaggregation approach to reduce the TCB of cKernel in future work.

6. Future Work

The fusion of VMs and processes provides the cloud with a desirable virtualization architecture that embraces the best of both, but a lot of work remains to be done in this promising direction.

First, cKernel currently supports fork but not yet IPC APIs like pipe, signal and message queue, so it is urgent to implement the complete IPC abstractions for cKernel. It is also necessary to study the subtle differences between VMs and processes. For example, it is common for a web server to fork a worker process to service a user request, but a worker VM will have a new IP address and lose the connection to the user. The fork API could also be extended to support a spawn() primitive for fast booting of a large number of VM appliances in OpenStack.
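Concretely, the goal is that an unmodified POSIX program like the following, in which a web-server-style parent forks a worker and talks to it over a pipe, runs with each process transparently backed by a cKernel VM; it is the pipe (and other IPC) half that cKernel does not yet support. This is a standard POSIX example, not cKernel code.

    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        if (pipe(fds) != 0)          /* the IPC channel cKernel lacks today */
            return 1;

        pid_t pid = fork();          /* already mapped to a cKernel VM fork */
        if (pid == 0) {              /* child: the worker */
            close(fds[0]);
            const char *msg = "request served";
            write(fds[1], msg, strlen(msg) + 1);
            _exit(0);
        }

        close(fds[1]);               /* parent: read the worker's reply */
        char buf[64];
        read(fds[0], buf, sizeof(buf));
        printf("worker says: %s\n", buf);
        return 0;
    }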
Second, the current cKernel implementation relies on the (unmodified) shared memory mechanism of Xen, whose networking system, file system and threading model are simple and inefficient. This might result in relatively poor performance [46]. For example, the current networking implementation of Xen is insufficient, and we could adopt the ClickOS [46] modifications to improve it. It is also desirable to incorporate the musl library [47] instead of RedHat Newlib for better runtime performance.

Third, it is important to develop applications that demonstrate the new architecture. For example, it is interesting to build a cloud platform on which multiple large-scale cKernel applications are deployed securely and quickly. It is challenging to build a cKernel pool for fast VM booting, which accommodates sleeping (empty) VMs that can be woken up by dynamically linking to the designated applications as shared libraries. It is also useful to build basic cloud services adopting the fused architecture, such as a large-scale persistent in-memory key-value store [48, 49].

Fourth, currently customers of a cloud must trust the cloud provider with both the hypervisor and the cKernel VMs, while recent advances in hardware protection (Intel SGX [16]) provide an opportunity to shield cKernel VMs not only against unforeseen bugs of the compiler, runtime, hypervisor or toolchain, but also against attacks from malicious hosts and operators.

Last, it is also vital to implement the fused virtualization architecture for other VM hypervisors like Linux KVM and VMware vSphere.

7. Conclusion

In this paper we propose a new virtualization architecture for the cloud which fuses the abstractions of VMs and processes. We present a LibOS-based reference implementation (cKernel) by modifying MiniOS on top of Xen, which seals only the application binary into a bootable VM and supports dynamic library linking and conventional process-like fork. The new virtualization architecture has a number of advantages. For instance, it (i) allows online library updates by leveraging dynamic library mapping, (ii) supports conventional multi-process applications by realizing VM fork, and (iii) improves the flexibility of VM management by emulating processes' scheduling and operations. Some key components of cKernel are already available as open source [50].

Acknowledgement

This work was supported in part by the National Basic Research Program of China (973) under Grant No. 2014CB340303, and the National Natural Science Foundation of China (NSFC) under Grants No. 61772541, 61379055 and 61379053. This work was performed while the first author was visiting the Computer Lab at the University of Cambridge between 2012 and 2013. The authors would like to thank Huiba Li, Ziyang Li, Yuxing Peng, Ling Liu, Jon Crowcroft, the iVCE Group, the NetOS group of the Computer Lab at the University of Cambridge, and the anonymous reviewers for their insightful comments.

References

[1] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," ACM SIGOPS Operating Systems Review, vol. 37, no. 5, pp. 164-177, 2003.
[2] X. Lu, H. Wang, and J. Wang, "Internet-based virtual computing environment (iVCE): Concepts and architecture," Science in China Series F: Information Sciences, vol. 49, no. 6, pp. 681-701, 2006.
[3] X. Lu, H. Wang, J. Wang, and J. Xu, "Internet-based virtual computing environment: Beyond the data center as a computer," Future Generation Computer Systems, vol. 29, no. 1, pp. 309-322, 2013.
[4] A. Madhavapeddy, R. Mortier, S. Smith, S. Hand, and J. Crowcroft, "Unikernels: Library operating systems for the cloud," in ACM ASPLOS. ACM, 2013, pp. 461-472.
[5] F. Manco, C. Lupu, F. Schmidt, J. Mendes, S. Kuenzer, S. Sati, K. Yasukata, C. Raiciu, and F. Huici, "My VM is lighter (and safer) than your container," 2017.
[6] "https://aws.amazon.com/ec2/."
[7] "http://www.aliyun.com/."
[8] A. Madhavapeddy, R. Mortier, R. Sohan, T. Gazagnaire, S. Hand, T. Deegan, D. McAuley, and J. Crowcroft, "Turning down the LAMP: software specialisation for the cloud," in Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud), vol. 10, 2010.
[9] J. R. Douceur, J. Elson, J. Howell, and J. R. Lorch, "Leveraging legacy code to deploy desktop applications on the web," in OSDI, vol. 8, 2008, pp. 339-354.
[10] D. R. Engler, M. F. Kaashoek et al., "Exokernel: An operating system architecture for application-level resource management," in Fifteenth ACM Symposium on Operating Systems Principles. ACM, 1995.
[11] M. J. Anderson, M. Moffie, C. I. Dalton et al., "Towards trustworthy virtualisation environments: Xen library OS security service infrastructure," HP Tech Report, pp. 88-111, 2007.
[12] D. E. Porter, S. Boyd-Wickizer, J. Howell, R. Olinsky, and G. C. Hunt, "Rethinking the library OS from the top down," ACM SIGPLAN Notices, vol. 46, no. 3, pp. 291-304, 2011.
[13] A. Baumann, D. Lee, P. Fonseca, L. Glendenning, J. R. Lorch, B. Bond, R. Olinsky, and G. C. Hunt, "Composing OS extensions safely and efficiently with Bascule," in Proceedings of the 8th ACM European Conference on Computer Systems. ACM, 2013, pp. 239-252.
[14] J. R. Lorch, A. Baumann, L. Glendenning, D. T. Meyer, and A. Warfield, "Tardigrade: Leveraging lightweight virtual machines to easily and efficiently construct fault-tolerant services," in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), 2015.
[15] C.-C. Tsai, K. S. Arora, N. Bandi, B. Jain, W. Jannen, J. John, H. A. Kalodner, V. Kulkarni, D. Oliveira, and D. E. Porter, "Cooperation and security isolation of library OSes for multi-process applications," in Proceedings of the Ninth European Conference on Computer Systems. ACM, 2014, p. 9.
[16] A. Baumann, M. Peinado, and G. Hunt, "Shielding applications from an untrusted cloud with Haven," in USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2014.
[17] "http://www.linux-kvm.org/."
[18] "http://osv.io/."
[19] J. Howell, B. Parno, and J. R. Douceur, "How to run POSIX apps in a minimal picoprocess," in USENIX Annual Technical Conference, 2013, pp. 321-332.
[20] "https://linuxcontainers.org."
[21] "https://www.docker.com."
[22] "http://openvz.org."
[23] "http://linux-vserver.org."
[24] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania, "The multikernel: a new OS architecture for scalable multicore systems," in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. ACM, 2009, pp. 29-44.
[25] A. Ho, S. Smith, and S. Hand, "On deadlock, livelock, and forward progress," Technical Report, University of Cambridge, Computer Laboratory, May 2005.
[26] S. R. Maxwell, "Experiments with a coroutine execution model for genetic programming," in Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence. IEEE, 1994, pp. 413-417.
[27] Y. Zhang, D. Li, and Y. Xiong, "Enabling online library update for VM appliances in the cloud," NUDT, Tech. Rep., 2016.
[28] "https://sourceware.org/newlib/."
[29] J. Rutkowska and R. Wojtczuk, "Qubes OS architecture," Invisible Things Lab Tech Rep, vol. 54, 2010.
[30] "https://en.wikipedia.org/wiki/Cgroups."
[31] R. Rosen, "Resource management: Linux kernel namespaces and cgroups," Haifux, May 2013.
[32] P. B. Hansen, "The nucleus of a multiprogramming system," Communications of the ACM, vol. 13, no. 4, pp. 238-241, 1970.
[33] B. N. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. E. Fiuczynski, D. Becker, C. Chambers, and S. Eggers, "Extensibility, safety and performance in the SPIN operating system," in Fifteenth ACM Symposium on Operating Systems Principles. ACM, 1995.
[34] A. B. Montz, D. Mosberger, S. W. O'Malley, L. L. Peterson, and T. A. Proebsting, "Scout: A communications-oriented operating system," in HotOS. USENIX, 1995, p. 58.
[35] C. Small and M. Seltzer, "Vino: an integrated platform for operating systems and database research," Harvard, Tech. Rep., 1994.
[36] I. M. Leslie, D. McAuley, R. Black, T. Roscoe, P. Barham, D. Evers, R. Fairbairns, and E. Hyden, "The design and implementation of an operating system to support distributed multimedia applications," IEEE Journal on Selected Areas in Communications, vol. 14, no. 7, pp. 1280-1297, 1996.
[37] "http://pdos.csail.mit.edu/6.828."
[38] D. Probert, J. Bruno, and M. Karaorman, "SPACE: A new approach to operating system abstraction," in Object Orientation in Operating Systems, 1991 International Workshop on. IEEE, 1991, pp. 133-137.
[39] D. R. Cheriton and K. J. Duda, "A caching model of operating system kernel functionality," in Proceedings of the 1st USENIX Conference on Operating Systems Design and Implementation. USENIX Association, 1994, p. 14.
[40] "https://ocaml.org."
[41] G. C. Hunt and J. R. Larus, "Singularity: rethinking the software stack," ACM SIGOPS Operating Systems Review, vol. 41, no. 2, pp. 37-49, 2007.
[42] A. Madhavapeddy, T. Leonard, M. Skjegstad, T. Gazagnaire, D. Sheets, D. Scott, R. Mortier, A. Chaudhry, B. Singh, J. Ludlam et al., "Jitsu: Just-in-time summoning of unikernels," in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), 2015.
[43] G. Ammons, J. Appavoo, M. Butrico, D. Da Silva, D. Grove, K. Kawachiya, O. Krieger, B. Rosenburg, E. Van Hensbergen, and R. W. Wisniewski, "Libra: a library operating system for a JVM in a virtualized execution environment," in VEE, vol. 7, 2007, pp. 44-54.
[44] T. Hallgren, M. P. Jones, R. Leslie, and A. Tolmach, "A principled approach to operating system construction in Haskell," in ACM SIGPLAN Notices, vol. 40, no. 9. ACM, 2005, pp. 116-128.
[45] D. G. Murray, G. Milos, and S. Hand, "Improving Xen security through disaggregation," in Proceedings of the Fourth ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments. ACM, 2008, pp. 151-160.
[46] J. Martins, M. Ahmed, C. Raiciu, V. Olteanu, M. Honda, R. Bifulco, and F. Huici, "ClickOS and the art of network function virtualization," in 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14). USENIX Association, 2014, pp. 459-473.
[47] "http://www.musl-libc.org."
[48] Y. Zhang, C. Guo, D. Li, R. Chu, H. Wu, and Y. Xiong, "CubicRing: Enabling one-hop failure detection and recovery for distributed in-memory storage systems," in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), Oakland, CA, USA, May 2015, pp. 529-542.
[49] J. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazières, S. Mitra, A. Narayanan, G. Parulkar, M. Rosenblum et al., "The case for RAMClouds: scalable high-performance storage entirely in DRAM," ACM SIGOPS Operating Systems Review, vol. 43, no. 4, pp. 92-105, 2010.
[50] "http://www.kylinx.com/."