The nonkernel: A Kernel Designed for the Cloud

Muli Ben-Yehuda1 Omer Peleg1 Orna Agmon Ben-Yehuda1 Igor Smolyar1,2 Dan Tsafrir1 1Technion—Israel Institute of Technology 2Open University of Israel {muli,omer,ladypine,igors,dan}@cs.technion.ac.il

Abstract

Infrastructure-as-a-Service (IaaS) cloud computing is causing a fundamental shift in the way computing resources are bought, sold, and used. We foresee a future whereby every CPU cycle, every memory word, and every byte of network bandwidth in the cloud would have a constantly changing market-driven price. We argue that, in such an environment, the underlying resources should be exposed directly to applications without kernel or hypervisor involvement. We propose the nonkernel, an architecture for kernel construction designed for such cloud computing platforms. A nonkernel uses modern architectural support for machine virtualization to securely provide unprivileged user programs with pervasive access to the underlying resources. We motivate the need for the nonkernel, we contrast it against its predecessor the exokernel, and we outline how one could go about building a nonkernel operating system.

1 The Cloud is Different

Infrastructure-as-a-Service (IaaS) cloud computing, where clients rent virtual machines from providers for short durations of time, is running more and more of the world's computing workloads. Despite representing a fundamentally new way of buying, selling, and using computing resources, nearly all virtual machines running in IaaS clouds run the same legacy operating systems that previously ran on traditional bare-metal stand-alone servers; that have been designed for the hardware available twenty and thirty years ago; and that assume that all system resources are always at their disposal. We argue that this is neither efficient nor sustainable and that the system software stack must adapt to the new and fundamentally different run-time platform posed by IaaS cloud computing.

We begin by contrasting IaaS clouds with traditional servers. We ask the following questions: Who owns resources? What is the economic model? At what granularity are resources acquired and released? And what architectural support does the platform provide?

Resource ownership and control. On a traditional server, the operating system assumes that it owns and controls all resources. For example, it may use all available memory for internal caches, assuming there is no better use for it; on the other hand, in case of memory pressure it swaps pages out, assuming it cannot get more physical memory. In an IaaS cloud, the operating system (running in a virtual machine) unwittingly shares a physical server with other operating systems running in other virtual machines, each of which assumes it has full ownership and control over all resources assigned to it. Resource clashes inevitably arise, leading to performance variability and security breaches.

Economic model. In the cloud, the guest operating system's owner and the hypervisor's owner are separate selfish economic entities. Due to economic pressure, resources are overcommitted and constantly change hands. In the Resource-as-a-Service (RaaS) cloud, into which the IaaS clouds are gradually turning, those ownership decisions are made on an economic basis [1], reflecting the operating systems' owners' valuation of the different resources at the time. Thus, in the cloud, each resource has a time-dependent associated cost—this can already be observed in, e.g., CloudSigma's (http://cloudsigma.com/) burst pricing—and the operating system must relinquish resources at a moment's notice when their prices rise beyond some application-specific threshold; conversely, the operating system must purchase the resources its applications need when they need them.

Resource granularity. In a traditional server, the operating system manages entire resources: all CPUs, all RAM, all available devices. In the cloud, the kernel acquires and releases resources at an increasingly finer granularity [1], with a goal of acquiring and releasing a few milliseconds of CPU cycles, a single page of RAM, a few Mb/s of network bandwidth. Although current cloud computing platforms operate at a coarser granularity, the trend toward fine granularity is evident from our earlier work [1].

Architectural support. The operating systems running on traditional servers strive to support both the ancient and the modern at the same time. Linux, for example, only recently dropped support for the 27-year-old original Intel 386. Modern x86 cloud servers have extensive support for machine virtualization at the CPU, MMU, chipset, and I/O device level. We contend that any new kernel designed for running on cloud servers should eschew legacy platforms and take full advantage of this architectural support for machine virtualization.

2 Designing a Cloud Kernel

Given the fundamental differences between a traditional server and an IaaS cloud, we now ask: what requirements should we impose on a kernel designed for the cloud?

The first requirement is to allow applications to optimize for cost. On a traditional server, costs are fixed and applications only optimize for "useful work". Useful work might be measured in run-time performance, e.g., in cache hits per second. In the cloud, where any work carried out requires renting resources and every resource has a momentary price-tag associated with it [1], applications would still like to optimize for "useful work"—more useful work is always better—but now they would also like to optimize for cost. Why pay the cloud provider more when you could pay less for the same amount of useful work? Thus the cloud kernel should enable applications to bi-objectively optimize for both useful work and cost.

The second requirement is to expose physical resources. On a traditional server, the kernel serves multiple roles: it abstracts and multiplexes the physical hardware, it serves as a library of useful functionality (e.g., file systems, network stacks), and it isolates applications from one another while letting them share resources. This comes at a price: applications must access their resources through the kernel, incurring run-time overhead and its associated costs; the kernel manages their resources in a one-size-fits-all manner; and the functionality the kernel provides, "good enough" for many applications, is far from optimal for any specific application.

In the cloud, where the costs of resources constantly change, the kernel should get out of the way and let applications manage their resources directly. This has several important advantages. First, applications can decide when and how much of each resource to use depending on its momentary price-tag. This enables applications to trade off cost with useful work, or to trade off the use of a momentarily expensive resource with a momentarily cheap one, according to the trade-offs that their algorithms are capable of making. For example, when memory is expensive, one application might use less memory but more bandwidth while another might use less memory but more CPU cycles. Second, applications know best how to use the resources they have. An application knows best what paging policy is best for it, or whether it wants a NIC driver that is optimized for throughput or for latency, or whether it needs a small or large routing table. The kernel, which has to serve all applications, cannot be optimal for any one application. Exposing physical resources directly to applications means that nearly all of the functionality of traditional kernels can be moved to application level, where applications can then specialize it to suit their specific needs.

The third requirement is to isolate applications. In the cloud, the kernel can rely on the underlying hardware for many aspects of safe sharing and isolation it previously had to take care of, and thus reduce costs and increase resource utilization. For example, using an IOMMU, the kernel can give each application direct and secure access to its own I/O device "instance" (an SR-IOV Virtual Function (VF) [10,18] or a paravirtual I/O device [20]) instead of multiplexing a few I/O devices in software between many different applications.

Figure 1: Traditional kernel structure compared with the nonkernel. The nonkernel grants applications safe direct access to their I/O devices.

3 The Nonkernel

We propose the nonkernel, a new approach to kernel construction designed for cloud computing platforms (Fig. 1). The nonkernel is a kernel/hypervisor designed to satisfy all three functional requirements mentioned in the previous section: it allows bi-objective optimization of both useful work and cost; it exposes resources and their costs directly to applications; and it isolates applications from one another.

At the end of the day, someone has to manage resources—but it is the application and the application alone that knows how best to manage resources for its purposes. Since all resource-related code runs in application context, application writers can co-design the application logic and the way it uses available resources. Application writers can optimize their application to take advantage of the changing costs of resources. This would be hard if not impossible when there is only a single I/O stack, as in a traditional operating system kernel. For interested applications, this can provide a big improvement in cost-per-useful-work. For applications that do not care about costs, it is no worse than letting the kernel manage resources.

So far we have described what the nonkernel does not do. What does the nonkernel do? It is in charge of (1) machine boot, (2) resource acquisition, and (3) application isolation.

The nonkernel contains the first instructions executed when the (physical or virtual) machine boots. It boots the system to a state in which the user can launch new applications.

The nonkernel takes advantage of hardware-assisted virtualization technology to achieve the functional requirements. It exposes and lets applications manipulate their CPU state directly, by running every application in a virtual machine. It lets applications manipulate their page tables directly using the architectural support for MMU virtualization. Most importantly, it lets applications bypass the kernel and access their I/O devices directly using the chipset and I/O device support for machine virtualization.

Bypassing the kernel has several implications. First, the kernel itself is minimized, since it no longer contains fine-grained scheduling or fine-grained memory management, file systems, storage device drivers, a TCP/IP stack, network device drivers, or any other device drivers. Instead, all resource-related code is provided to applications as libraries they can choose to link with. The libraries enable the application owners to control their level of optimization. Convenience-oriented owners will use a default library, or maybe benchmark several alternative libraries and choose among them. However, interested application owners can optimize the application for both useful work and cost by adjusting existing libraries and even by implementing specialized versions of "interesting" libraries.

The nonkernel can run in one of two modes: either as the bare-metal "hypervisor" (left side of Fig. 2) or as the operating system kernel of a virtual machine running on top of a legacy cloud hypervisor (right side of Fig. 2). Since the nonkernel uses hardware-assisted virtualization to run its applications each in its own virtual machine (in either mode), it assumes any underlying hypervisor supports nested virtualization [5].

[Figure 2 panels: left, the nonkernel running on bare-metal, alongside a legacy OS; right, the nonkernel on top of a legacy cloud hypervisor.]

Figure 2: The nonkernel can run on bare metal or virtualized.
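The library-selection step mentioned above, where a convenience-oriented owner benchmarks several alternative libraries and chooses among them, can be sketched as follows. The library names, measurements, and scoring rule are hypothetical stand-ins, not part of the nonkernel design; a real benchmark would measure actual useful work and actual resource cost.

```python
# Sketch: an application owner picking among alternative,
# application-linked I/O libraries by benchmarking them.
# Library names and measurements are hypothetical.

def benchmark(lib):
    """Stand-in for running a real benchmark against library `lib`:
    returns (useful work done, resource cost incurred)."""
    measurements = {
        "default_net":    (100.0, 10.0),  # baseline library
        "lowlat_net":     (120.0, 18.0),  # faster, but costlier
        "throughput_net": (180.0, 12.0),  # best work per unit cost
    }
    return measurements[lib]

def choose_library(libs):
    """Pick the library with the best useful-work-per-cost ratio,
    mirroring the bi-objective criterion of Section 2."""
    def score(lib):
        work, cost = benchmark(lib)
        return work / cost
    return max(libs, key=score)

chosen = choose_library(["default_net", "lowlat_net", "throughput_net"])
assert chosen == "throughput_net"  # 180/12 = 15, the highest ratio
```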

The nonkernel maintains a global view of available resources and their current prices. It exposes this information to applications and provides mechanisms by which they can acquire, pay for, and release these resources. It is in charge of resource arbitration—deciding which application gets which resource when there is contention—and of the initial allocation of enough CPU and memory to allow applications to start running and purchase more resources.

The nonkernel isolates applications while also permitting them to communicate directly. This is the result of containing each application in its own virtual machine. On x86, VMs access I/O devices using PIO and MMIO instructions, which the nonkernel isolates using architectural support [5,24]; when the I/O device performs Direct Memory Access (DMA) into the application's memory, the IOMMU (programmed by the nonkernel) validates and translates this access [6]; and when the I/O device raises an interrupt, the interrupt is delivered directly to the application [10]. For efficient communication between applications, the nonkernel provides an IPC mechanism with no kernel involvement other than for setup and tear-down.

There are two ways one could go about building a nonkernel: basing it on an existing operating system kernel/hypervisor, or implementing it from scratch. To turn the Linux kernel, for example, into a nonkernel, Linux could run applications in virtual machines using a mechanism such as Dune [4] and provide these applications with direct access to their I/O devices using direct device assignment [24]. However, this would still be a Linux kernel, containing hundreds of thousands of lines of the core Linux kernel code, which would constrain the design space and dictate certain decisions. Implementing a nonkernel from scratch would allow a wider and deeper investigation of the design space.

4 Nonkernel vs. Exokernel

One way of thinking of the nonkernel is as an exokernel designed for today's and future computing clouds. The original exokernel [9] broke ground in systems research. However, as originally designed, it was impossible to implement it as securely and completely as needed for today's public clouds. Lacking hardware support, the original exokernel implementation relied on downloading user code into the kernel, which exposed the kernel to arbitrary malicious code.

One way presented by the exokernel prototype to overcome this limitation was to give the user a context-specific language, similar to user-specified rule sets for kernel packet filters. While this solution might prevent the user from executing arbitrary code, it still allowed her to monopolize packets destined to other processes; thus the original exokernel's security relied on trusting processes, which is infeasible in a public cloud. Furthermore, context-specific languages are abstractions created by the kernel, the very thing the exokernel model is meant to prevent.

Rethinking the exokernel in light of modern hardware virtualization support allows the nonkernel to achieve two goals. First, it enables the nonkernel to securely assign resources to untrusted processes without adding any software abstractions, thus fulfilling the original goal of the exokernel model. Second, it allows the I/O critical path to be carried out without any kernel involvement (Fig. 3), removing the last redundant domain switching and giving the user maximum control over her software stack and the way it uses available resources.

[Figure 3 panels: Traditional, apps share the kernel's network stack and device driver; Exokernel, each app links its own network stack but the device driver remains shared in the kernel; Nonkernel, each app links its own network stack and device driver.]

Figure 3: The I/O critical path in different kernels.

5 Discussion: Pros and Cons

The nonkernel has several advantages when compared with traditional kernels and hypervisors. The first advantage is its performance. Traditional userspace-I/O systems (e.g., [8,21,23]) show that performance can be gained by limiting kernel-induced overhead. Among other things, better performance can be achieved by refraining from data copies between kernel data structures and user-supplied buffers, by avoiding the overhead of system calls, and by directly accessing device control structures. Nonkernel applications benefit from these performance boosts as they bypass the kernel completely on their I/O paths. Although nonkernel applications run in virtual machines, we have recently shown that virtual machines can achieve bare-metal performance even under the most demanding I/O-intensive workloads [10]; thus we do not expect this pervasive use of machine virtualization to cause any performance degradation.

Other benefits of the nonkernel include reduced driver complexity, since drivers now run completely in userspace, each driver instance serving a single application; easier debugging, development, and verification of drivers and I/O stacks, for the same reason; a simpler and easier-to-verify trusted computing base in the form of the nonkernel itself [12]; and hopefully a more secure system overall, for the same reason.

The nonkernel economic model can also be useful for systems where operating power is a concern, by letting applications tune their resource requirements to the current thermal envelope limits.

The main disadvantage of the nonkernel approach is that it forsakes legacy architectures and legacy applications. The nonkernel is designed for modern hardware—in some cases, for hardware that is pre-production—and thus will simply not run on older machines. Likewise, the nonkernel is not designed to run legacy applications; realizing its benefits to the fullest extent requires some level of cooperation and effort from the application developer. We believe that the cloud represents such a large shift in computing platform that breaking away from legacy is no longer unthinkable. Nonetheless, a nonkernel can support legacy applications to some extent, either through emulation libraries that provide a subset of POSIX or Win32 semantics or by running a full "library OS" inside each nonkernel application context [19].

6 Related work

The nonkernel design draws inspiration from several ideas in operating system and hypervisor construction. In addition to the original exokernel, the nonkernel's design also borrows from past work on userspace I/O (e.g., [7,8,21,23]), virtual machine device assignment (e.g., [14,15,24]), multi-core aware and extensible operating systems (e.g., [3,13]), and library operating systems (e.g., [2,19,22]). The nonkernel shares the underlying philosophy of specializing applications for the cloud with Mirage [16,17] and the underlying philosophy of a minimal hypervisor with NoHype [11].

7 Conclusions

The cloud is a different kind of run-time platform, which poses new challenges but also provides an opportunity to rethink how we build system software. We propose the nonkernel, a new kind of kernel where applications access their resources directly and securely and respond to changing resource costs at a fine granularity. We are building a nonkernel called nom to experiment with and evaluate the ideas brought forward in this paper.

8 Acknowledgements

The research leading to the results presented in this paper is partially supported by the Israeli Ministry of Science and Technology.

References

[1] Agmon Ben-Yehuda, O., Ben-Yehuda, M., Schuster, A., and Tsafrir, D. The Resource-as-a-Service (RaaS) cloud. In HotCloud (2012).

[2] Ammons, G., Silva, D. D., Krieger, O., Grove, D., Rosenburg, B., Wisniewski, R. W., Butrico, M., Kawachiya, K., and Hensbergen, E. V. Libra: A library operating system for a JVM in a virtualized execution environment. In VEE (2007).

[3] Baumann, A., Barham, P., Dagand, P.-E., Harris, T., Isaacs, R., Peter, S., Roscoe, T., Schüpbach, A., and Singhania, A. The multikernel: a new OS architecture for scalable multicore systems. In SOSP (2009).

[4] Belay, A., Bittau, A., Mashtizadeh, A., Terei, D., Mazières, D., and Kozyrakis, C. Dune: Safe user-level access to privileged CPU features. In OSDI (2012).

[5] Ben-Yehuda, M., Day, M. D., Dubitzky, Z., Factor, M., Har'El, N., Gordon, A., Liguori, A., Wasserman, O., and Yassour, B.-A. The Turtles project: Design and implementation of nested virtualization. In OSDI (2010).

[6] Ben-Yehuda, M., Mason, J., Krieger, O., Xenidis, J., Van Doorn, L., Mallick, A., Nakajima, J., and Wahlig, E. Utilizing IOMMUs for virtualization in Linux and Xen. In OLS (2006).

[7] Caulfield, A. M., Mollov, T. I., Eisner, L. A., De, A., Coburn, J., and Swanson, S. Providing safe, user space access to fast, solid state disks. In ASPLOS (2012).

[8] Chen, Y., Bilas, A., Damianakis, S. N., Dubnicki, C., and Li, K. UTLB: a mechanism for address translation on network interfaces. SIGPLAN Not. 33 (October 1998).

[9] Engler, D. R., Kaashoek, M. F., and O'Toole Jr., J. Exokernel: an operating system architecture for application-level resource management. In SOSP (1995).

[10] Gordon, A., Amit, N., Har'El, N., Ben-Yehuda, M., Landau, A., Tsafrir, D., and Schuster, A. ELI: Bare-metal performance for I/O virtualization. In ASPLOS (2012).

[11] Keller, E., Szefer, J., Rexford, J., and Lee, R. B. NoHype: virtualized cloud infrastructure without the virtualization. In ISCA (2010).

[12] Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., and Winwood, S. seL4: formal verification of an OS kernel. In SOSP (2009).

[13] Krieger, O., Auslander, M., Rosenburg, B., Wisniewski, R. W., Xenidis, J., Da Silva, D., Ostrowski, M., Appavoo, J., Butrico, M., Mergen, M., Waterland, A., and Uhlig, V. K42: building a complete operating system. In EuroSys (2006).

[14] LeVasseur, J., Uhlig, V., Stoess, J., and Götz, S. Unmodified device driver reuse and improved system dependability via virtual machines. In OSDI (2004).

[15] Liu, J., Huang, W., Abali, B., and Panda, D. K. High performance VMM-bypass I/O in virtual machines. In USENIX ATC (2006).

[16] Madhavapeddy, A., Mortier, R., Rotsos, C., Scott, D., Singh, B., Gazagnaire, T., Smith, S., Hand, S., and Crowcroft, J. Unikernels: library operating systems for the cloud. In ASPLOS (2013).

[17] Madhavapeddy, A., Mortier, R., Sohan, R., Gazagnaire, T., Hand, S., Deegan, T., McAuley, D., and Crowcroft, J. Turning down the LAMP: software specialisation for the cloud. In HotCloud (2010).

[18] PCI SIG. Single root I/O virtualization and sharing 1.0 specification, 2007.

[19] Porter, D. E., Boyd-Wickizer, S., How- ell, J., Olinsky, R., and Hunt, G. C. Re- thinking the library OS from the top down. In ASPLOS (2011).

[20] Russell, R. virtio: towards a de-facto standard for virtual I/O devices. ACM SIGOPS Oper. Sys. Rev. (OSR) 42, 5 (2008), 95–103.

[21] Schaelicke, L., and Davis, A. L. Design Trade-Offs for User-Level I/O Architectures. IEEE Trans. Comput. 55 (August 2006).

[22] Van Hensbergen, E. P.R.O.S.E.: parti- tioned reliable operating system environment. SIGOPS Oper. Syst. Rev. 40, 2 (Apr. 2006).

[23] von Eicken, T., Basu, A., Buch, V., and Vogels, W. U-Net: a user-level network interface for parallel and distributed computing. In SOSP (1995).

[24] Yassour, B.-A., Ben-Yehuda, M., and Wasserman, O. Direct device assignment for untrusted fully-virtualized virtual machines.
