The RESTless Cloud

Nathan Pemberton, UC Berkeley ([email protected])
Johann Schleier-Smith, UC Berkeley ([email protected])
Joseph E. Gonzalez, UC Berkeley ([email protected])

ABSTRACT

Cloud provider APIs have emerged as the de facto operating system interface for the warehouse scale computers that comprise the public cloud. Like single-server operating systems, they provide the resource allocation, protection, communication paths, naming, and scheduling for these large machines. Cloud provider APIs also provide all sorts of things that operating systems do not, things like big data analytics, machine learning model training, or factory automation. Somewhere, lurking within this menagerie of services, there is an operating system interface to a really big computer, the computer that today's application developers target. This computer works nothing like a single server, yet it also isn't a dispersed distributed system like the internet. It is something in-between. Now is the time to distill and refine a coherent "cloud system interface" from the multitude of cloud provider APIs, preferably a portable one. In this paper we discuss what goes in, what stays out, and the principles that inform these decisions.

CCS CONCEPTS

• Computer systems organization → Cloud computing; • Software and its engineering → Operating systems.

ACM Reference Format:
Nathan Pemberton, Johann Schleier-Smith, and Joseph E. Gonzalez. 2021. The RESTless Cloud. In Workshop on Hot Topics in Operating Systems (HotOS '21), May 31–June 2, 2021, Ann Arbor, MI, USA. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3458336.3465280

1 INTRODUCTION

The cloud is a diverse and complicated place. Cloud providers have added services one-by-one, their offerings growing organically to meet countless customer needs. Each service has its own set of interfaces and semantics, and the differences between cloud providers sometimes seem greater than their similarities. When deciding on a new feature, users must pick a set of services to commit to, then work out how to integrate and manage them. As needs evolve and platform services expand, they may be forced to rewrite their logic.

Contrast this with writing an application for a single server. While there are many languages and frameworks to choose from, a common set of underlying abstractions is found on just about every machine available. Whether the platform is Windows or Linux, files and processes work in roughly the same way. The Portable Operating System Interface (POSIX) [38] arose to formalize these patterns, not just for portability as the name implies, but also as a model of how an operating system behaves. It is time for the cloud to have its own POSIX, a standard model for state and computation.

There are many options for how such an interface might be designed. Most commonly today, programmers focus on building applications that comprise multiple networked web services [75], using REST-based protocols [26] to access storage, compute, and other data center resources. Alternatively, we could try to design a single system image (SSI) operating system [15] that presents the cloud as if it were a single machine. Some proposals, like LegoOS, have proposed extending POSIX abstractions to disaggregated resources [64]. Others propose a departure from POSIX to abstractions more suitable for distributed systems [62]. In this paper, we contend that only these latter approaches are suitable for the cloud. The cloud is not a single computer, and application designs need to reflect that. However, the cloud is also not a widely dispersed set of independent machines as assumed by web services.

In reality, the cloud is a collection of ever-changing, tightly managed resources shared by many independent users. It is also not a particular system, but a category of systems with many competing implementations.
Likewise, we are not proposing a particular operating system, but a category of system interfaces that can be implemented in a portable way by any vendor. Furthermore, this interface should integrate the wide range of constantly evolving features and services available in the cloud today while providing an easier path to innovation in the future. Critically, it will need to reflect the physical reality of the cloud in a natural and intuitive way. This helps ensure that applications can grow without encountering artificial scalability bottlenecks, and that their costs and resource consumption remain commensurate to their actual needs.


Serverless computing, and Function-as-a-Service (FaaS) in particular, attempts to address many of these requirements [17, 60]. Cloud functions scale in accordance with the number of requests they receive and free users from concerns of provisioning and configuring individual servers. However, current serverless interfaces remain limited in scope [34], offering an alternative, rather than a unifying, paradigm (Section 2). In this paper, we will present a sketch of a unified cloud interface that builds on serverless abstractions to provide a portable and unified view of the cloud (Section 3). While this is just one possible proposal for a portable cloud interface, we hope that it will serve as a starting point for discussion. We will follow our proposal with a discussion of the benefits such an interface can provide (Section 4). Finally, we will present some remaining open questions and challenges for the community (Section 5).

    Operation                        Latency
    2005 data center network RTT     1,000,000 ns
    2021 data center network RTT       200,000 ns
    Object marshaling (1 KB)           >50,000 ns
    HTTP protocol                       50,000 ns
    Socket overhead                      5,000 ns
    Emerging fast network RTT            1,000 ns
    KVM hypervisor call                    700 ns
    Linux system call                      500 ns
    WebAssembly call (V8 engine)            17 ns

Table 1: Representative latency of various operations. Emerging network technologies have RTT times much lower than web service overheads. Hypervisor calls and system calls have similar latency, and WebAssembly isolation [31, 66] can have lower latency still.
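To make the size of this gap concrete, the following back-of-the-envelope Python sketch (ours, not part of the paper's argument) simply totals the per-request web service overheads listed in Table 1 and compares them to an emerging fast network round trip.

    # Rough per-request software overhead of a web-service-style call, using Table 1's numbers.
    marshaling_ns = 50_000   # object marshaling (1 KB), lower bound from Table 1
    http_ns = 50_000         # HTTP protocol handling
    socket_ns = 5_000        # socket overhead

    web_service_overhead_ns = marshaling_ns + http_ns + socket_ns   # 105,000 ns
    fast_network_rtt_ns = 1_000                                     # emerging fast network RTT

    # The software stack alone costs roughly 100x an emerging network round trip,
    # which is why a faster network by itself does not rescue fine-grained calls.
    print(web_service_overhead_ns / fast_network_rtt_ns)            # 105.0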

2 TODAY'S ALTERNATIVES

Before we dive into the design of a new cloud interface, we first consider the inadequacies of existing solutions: the web services APIs that cloud providers offer today, POSIX-derived distributed operating systems, modern cross-cloud management solutions such as Kubernetes, and serverless computing as it exists today.

2.1 Why not web services?

Web services and the cloud are almost synonymous, and for good reason. Warehouse scale computing [8] is possible because it relies upon internet technologies with proven scalability. Internet Protocol provides routing, TCP provides flow control and congestion control, and HTTP load balancers distribute work across servers. Service endpoints provide stateless RESTful [25] interfaces, using another technology derived from the web.

While web services are excellent for scalability and interoperability, optimizing them for performance remains a stubborn problem. Table 1 shows the latency involved in various operations that might be invoked during a call into the cloud API. Web services APIs will always be adequate for certain things, such as provisioning servers, or even fetching large data objects from storage. However, web service overheads will certainly become prohibitive on future fast networks [6], especially when supporting fine-grained operations such as small-block reads and writes. Part of the problem comes from protocol and data formatting requirements, part of it from stream oriented transport (cf. scatter-gather file system APIs), and part from the statelessness of REST. Statelessness is particularly fundamental, and has consequences such as repeated access control checks.

Building a distributed implementation of an application when an efficient single-machine implementation could meet the need can be tremendously wasteful [42], in part because of overheads such as those of web services. As a concrete example, we observe that fetching a 1 KB object via the NFS protocol takes 1.5 ms and costs 0.003 USD/M (without the benefit of local caching), whereas fetching the same data from DynamoDB [68] takes 4.3 ms and costs 0.18 USD/M. We speculate that a part of the cost difference comes from the cloud provider passing the cost of providing a RESTful web service interface on to the customer.

At a minimum, cloud providers need a non-REST implementation of their existing APIs, but since performance problems are tied to the protocol statelessness, a simple translation is unlikely to suffice.
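The gap is easier to appreciate as ratios. The short Python sketch below is our own illustration (it assumes USD/M denotes US dollars per million requests) and simply works out the multipliers implied by the NFS and DynamoDB numbers quoted above.

    # Latency and cost ratios for a 1 KB object fetch, from the numbers quoted above.
    nfs_ms, nfs_usd_per_m = 1.5, 0.003
    dynamodb_ms, dynamodb_usd_per_m = 4.3, 0.18

    print(f"latency: {dynamodb_ms / nfs_ms:.1f}x")                 # ~2.9x slower
    print(f"cost:    {dynamodb_usd_per_m / nfs_usd_per_m:.0f}x")   # 60x more expensive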

2.2 Why not POSIX?

Making a collection of computers work like one powerful computer is a longstanding goal of distributed operating systems research [63, 71]. A flurry of work ensued in the decades after inexpensive workstation hardware and local networks first became available [3, 4, 20, 24, 33, 45, 47, 51, 61, 78]. These efforts generally sought to provide a UNIX-like interface to a group of machines. However this line of work was largely eclipsed by the emergence of the internet, which ushered in a new era of distributed systems that operated on a far larger scale [7, 8]. The internet technologies won in the market with the help of tremendous investment, which makes it hard to conclude whether POSIX-like distributed operating systems suffered from technical failings, or whether they simply were not ready to meet the needs of gigantic internet services.

In [63], Schwarzkopf, Grosvenor, and Hand argue that hardware trends have made warehouse-scale computers suitable for distributed operating systems. Indeed, there have been several projects exploring designs in this direction [16, 54, 55, 62, 64, 81, 82].

The problem with POSIX and locality-transparent operating system designs is the inverse of the problem with web services. While web services have a built-in design assumption that everything is remote, POSIX has the built-in assumption that everything is local. NFS provides a clear example of how interfaces designed in a local setting can prove troublesome in a distributed setting. For example, a remote file system that becomes unreachable may cause API responses not possible with a local file system [77]. Compliance with POSIX consistency guarantees [50], notably linearizability [35], has also been a perennial source of pain for distributed file system implementations [29, 37, 48, 79].

The assumption that everything is local infuses interface design and is even more pernicious than the assumption that everything is remote. A future-proof cloud system interface can make neither assumption—it must work well regardless of whether calls are serviced locally or remotely. We do not see an inherent trade-off, and believe it is possible to do both. We reinforce, however, that we do not advocate full location transparency, where local operations and remote operations are indistinguishable, as in RPC [11] or distributed shared memory [49]. Such abstractions have long been known to be harmful [77]. Operations against memory or local storage are still local and always fast. Operations against the cloud API could be remote and slow, but they could also be local and fast.

2.3 Why not Kubernetes?

A number of systems have arisen to provide a more uniform abstraction for deploying services in the cloud and providing them with resources. Kubernetes [13, 14] has particularly strong industry adoption. Notably, all major cloud providers offer support for it, and since it also runs on-premise, it is the closest thing to a portable abstraction for the cloud. Kubernetes derives from the Borg [73, 74] cluster scheduler, which along with systems such as Mesos [36] and OpenStack [1] might be considered to offer a core functionality of an operating system at data center scale [87].

Kubernetes and its ilk have been quite successful within their domain: scheduling of lightweight server instances. However they have little to offer in the way of state management or security, and so represent a limited and incomplete slice of system functionality.

2.4 What about Serverless?

Serverless computing represents an exciting evolution of the cloud that we expect will help it deliver fully on its promise and potential [60]. FaaS with its autoscaling stateless functions gets the most attention [17], but other technologies such as cloud object storage share the essential characteristics of serverless computing: abstraction that hides servers, pay-per-use without capacity reservations, and autoscaling from zero to practically infinite resources. A major shortcoming of serverless computing as it exists today is that it comprises disparate technologies residing in their own silos. Programmers are burdened with using disjoint application paradigms, data models, and security policies. Performance and efficiency also suffer [69]. FaaS and other serverless technologies offer important lessons, but they do not yet provide the unifying paradigm that we seek.

3 A NEW INTERFACE

To move forward, we will need a new interface to the cloud. Let's refer to this new interface as the Portable Cloud System Interface (PCSI). What might this interface look like?

To begin answering this question, we now present a proof-of-concept design based around two key abstractions: state and computation. Separating state from computation has the advantage of allowing independent resource scaling, and has emerged as a popular design pattern for cloud applications. The boundary it creates is also a natural place for interposing a system interface, as demonstrated in established operating system designs.

3.1 Computation

We define computation as any transformation over state and refer to these transformations generically as "functions". Functions receive state as input and produce state as output. They may also read and manipulate state as they execute, as described in Section 3.2.

In our PCSI proposal, functions are designed around three key properties (illustrated in the sketch after this list):

• Universal Compute Interface: Functions provide the structure necessary for modular software [53]. A function can be reimplemented without changing its external interface, thus preserving an essential benefit of today's cloud web services. Drop-in replacement is possible, even when the new function relies on new underlying technology (e.g., hardware, programming language, runtime system). Thus PCSI provides an evolutionary path that enables rapid innovation in the cloud ecosystem. Multiple implementations of the same function can even be provided simultaneously, allowing an optimizer to choose dynamically among them to meet performance and cost goals [58].

• No Implicit State: Functions receive state, produce state, and interact with external state via the data abstraction, however they cannot rely on internal state beyond a single invocation. As with current serverless FaaS offerings [17], or the vision of granular computing [41], this facilitates pay-per-use and allows functions to scale from a single invocation to thousands (or more).

• Narrow and Heterogeneous Implementations: A wide and evolving range of platforms may be used to implement functions (e.g., accelerators, containers, unikernels [84], WebAssembly [31], etc.). However, each function should focus on a narrow and resource-homogeneous operation. This decoupling enables maximum innovation and helps resource allocation by isolating bottlenecks [52] and maximizing resource utilization.
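As a purely illustrative Python sketch of these properties (the Ref type and the function signature are hypothetical, not an API the paper defines), a PCSI function might look like the following.

    # Hypothetical PCSI-style function: all state flows through explicit references.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Ref:
        """Capability-like reference to an object in the data layer (illustrative only)."""
        id: str

    def thumbnail(image: Ref, out_dir: Ref, size: int = 128) -> Ref:
        """A pure transformation over state: explicit inputs in, explicit outputs out."""
        # A real implementation would read `image` and write a resized copy under
        # `out_dir` through the state interface of Section 3.2. It keeps no state
        # between invocations (No Implicit State) and could be backed by a container,
        # a unikernel, WebAssembly, or an accelerator without changing this signature
        # (Universal Compute Interface; Narrow and Heterogeneous Implementations).
        return Ref(id=f"{out_dir.id}/thumb-{size}/{image.id}")

    print(thumbnail(Ref("cat.jpg"), Ref("thumbs")))   # Ref(id='thumbs/thumb-128/cat.jpg')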


Function arguments include explicit data layer inputs and outputs and a small pass-by-value request body. Users store the functions themselves as objects in the data layer, allowing them to be invoked by other functions. In addition to invoking individual functions, users can build task graphs, which opens up optimization opportunities such as pipelining or physical co-location. Such task graphs can either be specified ahead-of-time, as in Cloudburst [69], or dynamically as in Ray [44] or Ciel [46].

PCSI functions are inspired by serverless FaaS and share similar design motivations and aims. However, PCSI pushes these abstractions toward a more universal and integrated system interface. For example, rather than require distinct services for things like model serving or data analytics, PCSI exposes these features through the same interface as any other function. Likewise, new hardware and software platforms can be introduced without requiring new system interfaces. While there are various serverless storage services that can be used with FaaS [60], in PCSI the interface between compute and state is deeply integrated into the model.
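As an illustration of the ahead-of-time task graphs described above, the toy Python sketch below chains three stages explicitly. The TaskGraph class and its methods are our invention, not a PCSI API; the point is only that composition is declared up front, so the system can see which outputs feed which inputs.

    # Toy ahead-of-time task graph (illustrative only; PCSI defines no such API).
    class TaskGraph:
        def __init__(self):
            self.stages = []                 # functions in composition order

        def then(self, fn):
            self.stages.append(fn)
            return self                      # allow chaining: g.then(a).then(b)

        def run(self, value):
            for fn in self.stages:           # a real system would schedule and place these
                value = fn(value)
            return value

    def decode(req):     return req.strip().lower()
    def predict(text):   return "cat" if "cat" in text else "other"
    def respond(label):  return f"HTTP/1.1 200 OK\n\n{label}"

    graph = TaskGraph().then(decode).then(predict).then(respond)
    print(graph.run("  A photo of a CAT  "))   # HTTP/1.1 200 OK ... cat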
3.2 State

State in PCSI encompasses all information that is preserved beyond the lifetime of a single task, or that is transmitted outside of the scope of a single task. Access to state in PCSI is always explicit, which means that functions always access state over system interfaces. Our design centers around a few key principles:

• Universal Storage Interface: Applications interact with state through a common interface. This ensures that the system has full visibility into communication and storage patterns, allowing it to optimize scheduling and placement, and to provide fault-tolerance. This also provides a clear division between application and system, enabling implementations to evolve over time.

• Everything is a File: If applications must use a common state interface, then that interface must be able to express the wide range of functionality available in the cloud. We achieve this in much the same way as UNIX and its descendants [56, 57], by allowing various implementations of file system objects. While some objects may represent persistent data, others may represent network connections or interfaces to system services.

• Simple Consistency Menu: Cloud storage services offer a range of consistency models, and we can be sure that there is no "one size fits all" choice. We propose supporting just two consistency models, a strong one and a weak one, along with configurable restrictions on object mutability.

Objects in PCSI comprise several basic types including directories, regular files, FIFOs, sockets, and device interfaces to system services. This is analogous to POSIX, though the behaviors of each object type are somewhat different (see Section 3.3).

References are the primary method for accessing objects, as names are optional in PCSI. References also provide a capability-oriented security mechanism, as Capsicum does for POSIX file descriptors [80]. PCSI makes object reachability explicit. An object is only accessible by functions that hold a reference to it or to a namespace containing it. In clear contrast to web services, references make the PCSI API stateful. One benefit is that object access possibilities are known and constrained, opening opportunities for optimization. Another benefit is automated resource reclamation for unreachable objects.

Naming in PCSI provides a secondary access method and a mechanism for indirection. PCSI has no global namespace; rather, each function has a directory object as its file system root. Functions access multiple namespaces via directories passed as arguments. File system layering has proven valuable in building cloud applications, e.g., it is one of the key features provided by Docker [9]. PCSI will include support for union file systems [85], allowing one namespace to be superimposed on top of another.

PCSI only describes an interface to state; underlying implementations may vary. For example, the cloud provider may use any type of underlying storage medium, or a combination of several of them, to meet target performance, cost, and availability criteria. This could mean storage on disk in multiple data centers or keeping just one copy in the memory of a GPU. The latter case exemplifies how an efficient PCSI implementation can keep associated compute and state resources close together, even though the abstract model separates them.
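The following Python sketch illustrates the flavor of state access described here. The Dir type and its methods are a toy stand-in of our own, not a PCSI definition; the intent is only to show a function reaching objects solely through directory references passed as arguments, rather than through a global namespace.

    # Hypothetical capability-style state access: no global namespace, no ambient authority.
    class Dir:
        """Toy in-memory directory object standing in for a PCSI namespace."""
        def __init__(self, files=None):
            self.files = dict(files or {})
        def read(self, name):         return self.files[name]
        def write(self, name, data):  self.files[name] = data
        def append(self, name, data): self.files[name] = self.files.get(name, "") + data

    def summarize(uploads: Dir, results: Dir, log: Dir) -> None:
        # The function can only reach objects via the directories it was handed.
        total = sum(float(line.split(",")[1]) for line in uploads.read("report.csv").splitlines())
        results.write("summary.txt", f"total={total}\n")
        log.append("events", "summarize done\n")   # behaves like an APPEND_ONLY object

    uploads, results, log = Dir({"report.csv": "a,1.5\nb,2.5\n"}), Dir(), Dir()
    summarize(uploads, results, log)
    print(results.read("summary.txt"))             # total=4.0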

3.3 Concurrency and Consistency

Reconciling consistency with performance and availability is one of the persistently vexing challenges in distributed systems [2, 12]. We acknowledge it will likely remain an active research area for some time to come and design PCSI with this in mind. We provide limited options that allow applications to choose between well understood paradigms, and are careful to remove implementation concerns from the interface.

Figure 1: Allowable object mutability transitions (between the MUTABLE, APPEND_ONLY, FIXED_SIZE, and IMMUTABLE levels).

PCSI allows objects to be configured to one of four mutability levels. These levels and the transitions allowed between them are shown in Figure 1. IMMUTABLE objects can be implemented with the proven efficiency and scalability of cloud object storage whereas MUTABLE objects allow more flexibility to applications that require it. Intermediate levels can still offer improved performance, e.g., once written, the content of an APPEND_ONLY object may be safely cached anywhere.

Operations against objects can execute at one of two consistency levels: linearizability [35] and eventual consistency [72, 76]. This sort of configurable consistency can be provided through quorum systems like DynamoDB [68], though we deliberately hide mechanism details like quorum sizes from the application.
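A small sketch of how this menu might surface to applications. The enum values mirror Figure 1 and the two consistency levels named above; how they attach to objects, and the assumption that transitions only move toward less mutability, are our reading rather than anything the paper specifies.

    # Hypothetical per-object settings drawn from the "simple consistency menu".
    from enum import Enum, auto

    class Mutability(Enum):
        MUTABLE = auto()
        APPEND_ONLY = auto()
        FIXED_SIZE = auto()
        IMMUTABLE = auto()

    class Consistency(Enum):
        LINEARIZABLE = auto()   # the strong option
        EVENTUAL = auto()       # the weak option

    # Choices for the example application of Section 4 (see also Section 4.3):
    weights = {"mutability": Mutability.MUTABLE,     "consistency": Consistency.LINEARIZABLE}
    uploads = {"mutability": Mutability.APPEND_ONLY, "consistency": Consistency.EVENTUAL}
    metrics = {"mutability": Mutability.APPEND_ONLY, "consistency": Consistency.EVENTUAL}

    # Once fully written, an upload could be tightened so any cache may serve it
    # (assuming Figure 1 permits moving from APPEND_ONLY toward IMMUTABLE).
    uploads["mutability"] = Mutability.IMMUTABLE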
We also believe that the separation of compute and state, a foundational assumption of PCSI, is at odds with some intermediate consistency models. For example, CRDTs [65] and lattice-based approaches [21, 22, 86] require the state management system to support a merge operation, in effect blending the notions of state and computation. We believe such techniques will play an important role in the cloud, however their implementations should be largely parallel to PCSI, as we discuss next.

3.4 Limitations

In system design, what is not included is just as important as what is. We believe that PCSI will enable a broad range of cloud workloads, but we do not believe that all workloads will run well as a collection of functions interacting with one another and with the storage layer. Yet even those applications that run best with a server-based implementation can be integrated with the PCSI—we allow them to be invoked just like any other function. Things like OLTP databases and key-value stores benefit from detailed control over system resources [70], and can appear as part of a universal abstraction. The same is true of certain scientific computing applications and machine learning training systems, which can benefit from precisely coordinated scheduling and application specific network topology.

4 DISCUSSION

As a derivative of current serverless offerings, our design inherits the benefits of pay-per-use, simplified deployment, and autoscaling. It also benefits from being an evolutionary, rather than a revolutionary, path from current systems to as-yet unconceived improvements.

To understand PCSI's benefits further, we now review an example application: serving deep learning models. Figure 2 shows a pipelined composition of three FaaS functions. The first runs in response to input on a TCP connection and decodes an incoming HTTP request, including streaming user image uploads to a file. Next is a GPU-enabled prediction function which operates on the uploaded file. It also takes as input the model weights, which rarely change but need to be updated with strong consistency and replicated widely. Finally the output is sent through a FIFO to a post-processing function, which then uses the original TCP object to complete the HTTP response.

Figure 2: Model serving pipeline with separation of compute and state (an HTTP preprocessing function, a GPU neural network function, and a post-processing function, connected through file system objects such as uploads and saved weights, function arguments, a FIFO, and a TCP connection).

4.1 Making it Fast

While the abstractions in PCSI are designed to support distributed systems, logical disaggregation does not imply physical disaggregation [69]. A naive implementation might send intermediate data from the preprocessing function to remote storage before pulling it onto a remote GPU to run the model. However, a more sophisticated implementation could use knowledge of application behavior to make much better decisions [10]. Since the task graph indicates that these two functions will be composed, the system can schedule the first CPU function on a physical server that also contains a GPU. Since data were intended only for the next task (explicit inputs and outputs), data movement is reduced to a single cudaMemcpy. This implementation would achieve performance similar to a monolithic server-based service.
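The scheduling intuition can be made concrete with a toy placement pass. This is purely our illustration (PCSI prescribes no particular scheduler): because the task graph is explicit, the system can see that the preprocessing stage feeds the GPU stage and keep both on the same server, turning the hand-off into a local copy rather than a round trip through remote storage.

    # Toy placement over an explicit task graph (illustrative only).
    stages = [("preprocess", {"cpu"}), ("predict", {"cpu", "gpu"}), ("postprocess", {"cpu"})]
    servers = {"cpu-host": {"cpu"}, "gpu-host": {"cpu", "gpu"}}

    def place(stages, servers):
        placement, prev = {}, None
        for i, (name, needs) in enumerate(stages):
            nxt = stages[i + 1][1] if i + 1 < len(stages) else set()
            candidates = [s for s in servers if needs <= servers[s]]
            # Rank: stay on the predecessor's server if possible, otherwise prefer a
            # server that can also host the successor stage.
            placement[name] = max(candidates, key=lambda s: (s == prev, nxt <= servers[s]))
            prev = placement[name]
        return placement

    print(place(stages, servers))
    # {'preprocess': 'gpu-host', 'predict': 'gpu-host', 'postprocess': 'gpu-host'}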


4.2 Making it Efficient

In the previous example, we described how user applications could be run with little performance overhead. However, being fast is not enough. After all, a dedicated bare-metal cluster would also be fast. The logically disaggregated design of PCSI also enables other optimization targets like cost or resource efficiency, even at the expense of performance. Rather than wait for a large enough server to handle the entire graph, the provider is free to scavenge underutilized resources from around the cluster for each function independently. Even though this may affect performance, it makes much more efficient use of expensive resources. Fortunately, the concept of "good enough" performance is prevalent in cloud workloads. Many applications come with service level objectives (SLOs) that stipulate a maximum acceptable latency, and experience little or no benefit from lower latency [43].

More generally, PCSI enables flexible scheduling and scaling of resources. Preprocessing functions can be scaled independently of the GPU-enabled model functions, precisely matching resource demands, even under rapidly varying load or skew. Since functions can be specialized to resource types, we can develop specialized hardware platforms with tailored thermal, packaging, and networking designs. Existing platforms like Microsoft's Catapult [18] or Google's TPU Pods [28] suggest that significant advantages can come from such specialization. While these existing systems require a specialized software environment, PCSI offers a unified interface that would enable more rapid development and deployment of specialized hardware platforms.

4.3 Making it Flexible

Cloud platforms launch new products at a rapid pace. Any successful cloud interface needs to be flexible enough to integrate new technologies and techniques with minimal application changes.

On the data side, we notice that our application has multiple inputs and outputs with differing consistency requirements, say strong consistency for model weights and eventual consistency for the upload archive and user metrics. PCSI supports these needs through a single unified interface.

While our example application utilized GPUs to execute the neural network, hardware for machine learning is advancing quickly [19, 39]. To take advantage of the latest accelerator, PCSI developers may need to modify their neural network function implementation, but the rest of the application would remain unchanged. It is not just the accelerators themselves that can see advancements. New hardware integration technologies are being developed that provide them with efficient memory hierarchies and networking support [18, 27, 28, 32, 83]. Since state management is explicit, the PCSI implementation can integrate these new technologies without requiring application changes. Even the non-accelerated functions can benefit from operating system advancements like unikernels [40, 84].

We observe that cloud-native interfaces can naturally take advantage of cloud-native hardware and operating systems while traditional interfaces are far more difficult to adapt.
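A strawman sketch of the drop-in replacement idea from Sections 3.1 and 4.3: several implementations registered behind one logical function, with the choice left to an optimizer, in the spirit of model-less serving systems like INFaaS [58]. The registry schema and selection policy below are our own illustration, not part of PCSI; a newly added accelerator backend becomes usable without touching any caller.

    # Strawman registry of interchangeable implementations behind one logical function.
    IMPLEMENTATIONS = {
        "predict": [
            {"backend": "cpu-container", "p99_ms": 120, "usd_per_1k": 0.020},
            {"backend": "gpu",           "p99_ms": 9,   "usd_per_1k": 0.110},
            {"backend": "new-accel",     "p99_ms": 4,   "usd_per_1k": 0.060},
        ],
    }

    def choose(function, slo_ms):
        """Cheapest implementation that still meets the caller's latency SLO."""
        feasible = [i for i in IMPLEMENTATIONS[function] if i["p99_ms"] <= slo_ms]
        return min(feasible, key=lambda i: i["usd_per_1k"])

    print(choose("predict", slo_ms=50)["backend"])    # new-accel
    print(choose("predict", slo_ms=500)["backend"])   # cpu-container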
5 THE PATH FORWARD

An abstraction is not useful if it is never deployed. Ultimately, we hope to see a common core of cloud interfaces that helps extend and sustain innovation. The path to a common model will be driven by user demands and open collaboration. We have seen successes before. Kubernetes [13] adoption has grown quickly, with user demand leading cloud vendors to release their own hosted Kubernetes services. Further afield, the computer architecture community has escaped the bonds of proprietary ISAs by coming together around the RISC-V open source ISA [5], enabling an explosion of industrial and academic innovation. We will need to learn from these experiences if we wish to have similar success.

In the immediate future, this means continuing to develop the core technologies and interfaces underlying the PCSI approach. The authors are currently building some of these components, including serverless interfaces to GPUs and file systems for cloud functions. Other challenges remain to be addressed. Are the proposed consistency models sufficient? Will existing security models for the cloud and warehouse-scale computers suffice, or are new strategies needed [59, 62, 80]? Can techniques to drive performance and utilization of accelerators be broadened to a general multi-tenant setting [23, 30, 58, 67]? As these and other questions are answered, existing serverless offerings can evolve toward a common portable cloud system interface.

6 CONCLUSION

"What got you here won't get you there"
- Marshall Goldsmith

The cloud is a unique platform. The warehouse scale computers that power it are nothing like the individual servers that comprise it, but they also bear little resemblance to the global internet, the distributed system from which many of their technologies are drawn. A well defined core system interface for the cloud could unlock a great deal of innovation. Much as POSIX and REST brought sanity to their respective environments, a portable cloud system interface can tame the wild-west of cloud programming. The recent growth of serverless computing demonstrates that the community is ready and willing to redesign their applications around truly cloud-native interfaces—let's give them one.


ACKNOWLEDGEMENTS

The authors would like to thank the anonymous reviewers for their thoughtful feedback. This research was supported by NSF CISE Expeditions Award CCF-1730628 and gifts from Amazon Web Services, Ant Group, Ericsson, Facebook, Futurewei, Google, Intel, Microsoft, Nvidia, Scotiabank, Splunk and VMware.

REFERENCES

[1] [n.d.]. OpenStack. https://www.openstack.org/.
[2] Daniel Abadi. 2012. Consistency tradeoffs in modern distributed database system design: CAP is only part of the story. Computer 45, 2 (2012), 37–42.
[3] Guy T. Almes, Andrew P. Black, Edward D. Lazowska, and Jerre D. Noe. 1985. The Eden system: A technical review. IEEE Transactions on Software Engineering 1 (1985), 43–59.
[4] Thomas E. Anderson, David E. Culler, and David Patterson. 1995. A case for NOW (networks of workstations). IEEE Micro 15, 1 (1995), 54–64.
[5] Krste Asanović and David A. Patterson. 2014. Instruction Sets Should Be Free: The Case For RISC-V. Technical Report UCB/EECS-2014-146. EECS Department, University of California, Berkeley. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-146.html
[6] Luiz Barroso, Mike Marty, David Patterson, and Parthasarathy Ranganathan. 2017. Attack of the killer microseconds. Commun. ACM 60, 4 (2017), 48–54.
[7] Luiz André Barroso, Jeffrey Dean, and Urs Hölzle. 2003. Web search for a planet: The Google cluster architecture. IEEE Micro 23, 2 (2003), 22–28.
[8] Luiz André Barroso, Urs Hölzle, and Parthasarathy Ranganathan. 2018. The datacenter as a computer: Designing warehouse-scale machines. Synthesis Lectures on Computer Architecture 13, 3 (2018), i–189.
[9] David Bernstein. 2014. Containers and cloud: From LXC to Docker to Kubernetes. IEEE Cloud Computing 1, 3 (2014), 81–84.
[10] Pramod Bhatotia, Rodrigo Rodrigues, and Akshat Verma. 2012. Shredder: GPU-accelerated incremental storage and computation. In FAST, Vol. 14. 14.
[11] Andrew D. Birrell and Bruce Jay Nelson. 1984. Implementing remote procedure calls. ACM Transactions on Computer Systems (TOCS) 2, 1 (1984), 39–59.
[12] Eric Brewer. 2012. CAP twelve years later: How the "rules" have changed. Computer 45, 2 (2012), 23–29.
[13] Eric A. Brewer. 2015. Kubernetes and the path to cloud native. In Proceedings of the Sixth ACM Symposium on Cloud Computing. 167–167.
[14] Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, and John Wilkes. 2016. Borg, Omega, and Kubernetes: Lessons learned from three container-management systems over a decade. Queue 14, 1 (2016), 70–93.
[15] Rajkumar Buyya, Toni Cortes, and Hai Jin. 2001. Single system image. The International Journal of High Performance Computing Applications 15, 2 (2001), 124–135.
[16] Michael Cafarella, David DeWitt, Vijay Gadepally, Jeremy Kepner, Christos Kozyrakis, Tim Kraska, Michael Stonebraker, and Matei Zaharia. 2020. DBOS: A Proposal for a Data-Centric Operating System. arXiv preprint arXiv:2007.11112 (2020).
[17] Paul Castro, Vatche Ishakian, Vinod Muthusamy, and Aleksander Slominski. 2019. The rise of serverless computing. Commun. ACM 62, 12 (2019), 44–54.
[18] Adrian M. Caulfield, Eric S. Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, et al. 2016. A cloud-scale acceleration architecture. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 1–13.
[19] Yu-Hsin Chen, Tushar Krishna, Joel S. Emer, and Vivienne Sze. 2016. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits 52, 1 (2016), 127–138.
[20] David Cheriton. 1988. The V distributed system. Commun. ACM 31, 3 (1988), 314–333.
[21] Alvin Cheung, Natacha Crooks, Matthew Milano, and Joseph M. Hellerstein. 2021. New directions in cloud programming. CIDR (2021).
[22] Neil Conway, William R. Marczak, Peter Alvaro, Joseph M. Hellerstein, and David Maier. 2012. Logic and lattices for distributed programming. In Proceedings of the Third ACM Symposium on Cloud Computing. 1–14.
[23] Feras Daoud, Amir Watad, and Mark Silberstein. 2016. GPUrdma: GPU-side library for high performance networking from GPU kernels. In Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers. ACM, 1–8. https://doi.org/10.1145/2931088.2931091
[24] Partha Dasgupta, Richard J. LeBlanc, Mustaque Ahamad, and Umakishore Ramachandran. 1991. The Clouds distributed operating system. Computer 24, 11 (1991), 34–44.
[25] Xinyang Feng, Jianjing Shen, and Ying Fan. 2009. REST: An alternative to RPC for Web services architecture. In 2009 First International Conference on Future Information Networks. IEEE, 7–10.
[26] Roy T. Fielding. 2000. Architectural styles and the design of network-based software architectures. Vol. 7. University of California, Irvine.
[27] Gen-Z Consortium. 2018. Gen-Z Overview. Technical Report. Gen-Z Consortium. https://genzconsortium.org/wp-content/uploads/2018/05/Gen-Z-Overview-V1.pdf
[28] Google 2021. Cloud TPU - Documentation - System Architecture. Google. https://cloud.google.com/tpu/docs/system-architecture
[29] Cary Gray and David Cheriton. 1989. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. ACM SIGOPS Operating Systems Review 23, 5 (1989), 202–210.
[30] Arpan Gujarati, Reza Karimi, Safya Alzayat, Antoine Kaufmann, Ymir Vigfusson, and Jonathan Mace. 2020. Serving DNNs like Clockwork: Performance predictability from the bottom up. arXiv:2006.02464 [cs] (Jun 2020).
[31] Andreas Haas, Andreas Rossberg, Derek L. Schuff, Ben L. Titzer, Michael Holman, Dan Gohman, Luke Wagner, Alon Zakai, and JF Bastien. 2017. Bringing the web up to speed with WebAssembly. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation. 185–200.
[32] Mark Harris. 2017. NVIDIA DGX-1: The Fastest Deep Learning System. Technical Report. Nvidia. https://developer.nvidia.com/blog/dgx-1-fastest-deep-learning-system/
[33] Roger Haskin, Yoni Malachi, and Gregory Chan. 1988. Recovery management in QuickSilver. ACM Transactions on Computer Systems (TOCS) 6, 1 (1988), 82–108.
[34] Joseph M. Hellerstein, Jose Faleiro, Joseph E. Gonzalez, Johann Schleier-Smith, Vikram Sreekanti, Alexey Tumanov, and Chenggang Wu. 2019. Serverless computing: One step forward, two steps back. CIDR (2019).
[35] Maurice P. Herlihy and Jeannette M. Wing. 1990. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems (TOPLAS) 12, 3 (1990), 463–492.
[36] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI, Vol. 11. 22–22.


[37] John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, and Michael J. West. 1988. Scale and performance in a distributed file system. ACM Transactions on Computer Systems (TOCS) 6, 1 (1988), 51–81.
[38] Andrew Josey, Eric Blake, Geoff Clare, et al. 2018. The Open Group base specifications issue 7. https://pubs.opengroup.org/onlinepubs/9699919799/.
[39] Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. 2017. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture. 1–12.
[40] Ricardo Koller and Dan Williams. 2017. Will serverless end the dominance of Linux in the cloud?. In Proceedings of the 16th Workshop on Hot Topics in Operating Systems. 169–173.
[41] Collin Lee and John Ousterhout. 2019. Granular Computing. In Proceedings of the Workshop on Hot Topics in Operating Systems. 149–154.
[42] Frank McSherry, Michael Isard, and Derek G. Murray. 2015. Scalability! But at what COST?. In 15th Workshop on Hot Topics in Operating Systems (HotOS XV).
[43] Jeffrey C. Mogul and John Wilkes. 2019. Nines are not enough: Meaningful metrics for clouds. In Proceedings of the Workshop on Hot Topics in Operating Systems. 136–141.
[44] Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, et al. 2018. Ray: A distributed framework for emerging AI applications. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 561–577.
[45] Sape J. Mullender, Guido Van Rossum, Andrew S. Tanenbaum, Robbert Van Renesse, and Hans Van Staveren. 1990. Amoeba: A distributed operating system for the 1990s. Computer 23, 5 (1990), 44–53.
[46] Derek G. Murray, Malte Schwarzkopf, Christopher Smowton, Steven Smith, Anil Madhavapeddy, and Steven Hand. 2011. Ciel: A universal execution engine for distributed data-flow computing. In Proc. 8th ACM/USENIX Symposium on Networked Systems Design and Implementation. 113–126.
[47] Roger Michael Needham and Andrew J. Herbert. 1983. The Cambridge distributed computing system. (1983).
[48] Michael N. Nelson, Brent B. Welch, and John K. Ousterhout. 1988. Caching in the Sprite network file system. ACM Transactions on Computer Systems (TOCS) 6, 1 (1988), 134–154.
[49] Bill Nitzberg and Virginia Lo. 1991. Distributed shared memory: A survey of issues and algorithms. Computer 24, 8 (1991), 52–60.
[50] Gian Ntzik, Pedro da Rocha Pinto, Julian Sutherland, and Philippa Gardner. 2018. A concurrent specification of POSIX file systems. In 32nd European Conference on Object-Oriented Programming (ECOOP 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[51] John K. Ousterhout, Andrew R. Cherenson, Fred Douglis, Michael N. Nelson, and Brent B. Welch. 1988. The Sprite network operating system. Computer 21, 2 (1988), 23–36.
[52] Kay Ousterhout, Christopher Canel, Sylvia Ratnasamy, and Scott Shenker. 2017. Monotasks: Architecting for performance clarity in data analytics frameworks. In Proceedings of the 26th Symposium on Operating Systems Principles. 184–200.
[53] David L. Parnas. 1972. On the criteria to be used in decomposing systems into modules. In Pioneers and Their Contributions to Software Engineering. Springer, 479–498.
[54] Larry Peterson, Scott Baker, Marc De Leenheer, Andy Bavier, Sapan Bhatia, Mike Wawrzoniak, Jude Nelson, and John Hartman. 2015. XOS: An extensible cloud operating system. In Proceedings of the 2nd International Workshop on Software-Defined Ecosystems. 23–30.
[55] Fabio Pianese, Peter Bosch, Alessandro Duminuco, Nico Janssens, Thanos Stathopoulos, and Moritz Steiner. 2010. Toward a cloud operating system. In 2010 IEEE/IFIP Network Operations and Management Symposium Workshops. IEEE, 335–342.
[56] Rob Pike, Dave Presotto, Sean Dorward, Bob Flandrena, Ken Thompson, Howard Trickey, and Phil Winterbottom. 1995. Plan 9 from Bell Labs. Computing Systems 8, 3 (1995), 221–254.
[57] Dennis M. Ritchie and Ken Thompson. 1974. The UNIX Time-Sharing System. Commun. ACM (1974).
[58] Francisco Romero, Qian Li, Neeraja J. Yadwadkar, and Christos Kozyrakis. 2019. INFaaS: A Model-less Inference Serving System. arXiv:1905.13348 [cs] (Sep 2019).
[59] Ravi S. Sandhu. 1998. Role-based access control. In Advances in Computers. Vol. 46. Elsevier, 237–286.
[60] Johann Schleier-Smith, Vikram Sreekanti, Anurag Khandelwal, Joao Carreira, Neeraja J. Yadwadkar, Raluca Ada Popa, Joseph E. Gonzalez, Ion Stoica, and David A. Patterson. 2021. What serverless computing is and should become: The next phase of cloud computing. Commun. ACM 64, 5 (2021), 55–63.
[61] Frank Schmuck and Jim Wylie. 1991. Experience with transactions in QuickSilver. In ACM SIGOPS Operating Systems Review, Vol. 25. ACM, 239–253.
[62] Malte Schwarzkopf. 2015. Operating system support for warehouse-scale computing. Ph.D. Dissertation. University of Cambridge.
[63] Malte Schwarzkopf, Matthew P. Grosvenor, and Steven Hand. 2013. New wine in old skins: The case for distributed operating systems in the data center. In Proceedings of the 4th Asia-Pacific Workshop on Systems. 1–7.
[64] Yizhou Shan, Yutong Huang, Yilun Chen, and Yiying Zhang. 2018. LegoOS: A disseminated, distributed OS for hardware resource disaggregation. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 69–87.
[65] Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. 2011. Conflict-free replicated data types. In Symposium on Self-Stabilizing Systems. Springer, 386–400.
[66] Simon Shillaker and Peter Pietzuch. 2020. Faasm: Lightweight isolation for efficient stateful serverless computing. In 2020 USENIX Annual Technical Conference (USENIX ATC 20). 419–433.
[67] Mark Silberstein. 2017. OmniX: An accelerator-centric OS for omni-programmable systems. In Proceedings of the 16th Workshop on Hot Topics in Operating Systems - HotOS '17. ACM Press, 69–75.
[68] Swaminathan Sivasubramanian. 2012. Amazon DynamoDB: A seamlessly scalable non-relational database service. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. 729–730.
[69] Vikram Sreekanti, Chenggang Wu, Xiayue Charles Lin, Johann Schleier-Smith, Joseph E. Gonzalez, Joseph M. Hellerstein, and Alexey Tumanov. 2020. Cloudburst: Stateful Functions-as-a-Service. Proceedings of the VLDB Endowment 13, 11 (2020).
[70] Michael Stonebraker. 1981. Operating system support for database management. Commun. ACM 24, 7 (1981), 412–418.
[71] Andrew S. Tanenbaum and Robbert Van Renesse. 1985. Distributed operating systems. ACM Computing Surveys (CSUR) 17, 4 (1985), 419–470.
[72] Douglas B. Terry, Marvin M. Theimer, Karin Petersen, Alan J. Demers, Mike J. Spreitzer, and Carl H. Hauser. 1995. Managing update conflicts in Bayou, a weakly connected replicated storage system. ACM SIGOPS Operating Systems Review 29, 5 (1995), 172–182.
[73] Muhammad Tirmazi, Adam Barker, Nan Deng, Md E. Haque, Zhijing Gene Qin, Steven Hand, Mor Harchol-Balter, and John Wilkes. 2020. Borg: The next generation. In Proceedings of the Fifteenth European Conference on Computer Systems. 1–14.


[74] Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, and John Wilkes. 2015. Large-scale cluster management at Google with Borg. In Proceedings of the Tenth European Conference on Computer Systems. 1–17.
[75] Werner Vogels. 2003. Web services are not distributed objects. IEEE Internet Computing 7, 6 (2003), 59–66.
[76] Werner Vogels. 2009. Eventually consistent. Commun. ACM 52, 1 (2009), 40–44.
[77] Jim Waldo, Geoff Wyant, Ann Wollrath, and Sam Kendall. 1996. A note on distributed computing. In International Workshop on Mobile Object Systems. Springer, 49–64.
[78] Bruce Walker, Gerald Popek, Robert English, Charles Kline, and Greg Thiel. 1983. The LOCUS distributed operating system. ACM SIGOPS Operating Systems Review 17, 5 (1983), 49–70.
[79] Randolph Y. Wang and Thomas E. Anderson. 1993. xFS: A wide area mass storage file system. In Proceedings of IEEE 4th Workshop on Workstation Operating Systems. WWOS-III. IEEE, 71–78.
[80] Robert N. M. Watson, Jonathan Anderson, Ben Laurie, and Kris Kennaway. 2012. A taste of Capsicum: Practical capabilities for UNIX. Commun. ACM 55, 3 (2012), 97–104.
[81] David Wentzlaff and Anant Agarwal. 2009. Factored operating systems (fos): The case for a scalable operating system for multicores. ACM SIGOPS Operating Systems Review 43, 2 (2009), 76–85.
[82] David Wentzlaff, Charles Gruenwald III, Nathan Beckmann, Kevin Modzelewski, Adam Belay, Lamia Youseff, Jason Miller, and Anant Agarwal. 2010. An operating system for multicore and clouds: Mechanisms and implementation. In Proceedings of the 1st ACM Symposium on Cloud Computing. 3–14.
[83] Bruce Wile. 2014. Coherent Accelerator Processor Interface (CAPI) for POWER8 Systems. Technical Report. IBM Systems and Technology Group.
[84] Dan Williams and Ricardo Koller. 2016. Unikernel monitors: Extending minimalism outside of the box. In 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 16).
[85] Charles P. Wright, Jay Dave, Puja Gupta, Harikesavan Krishnan, David P. Quigley, Erez Zadok, and Mohammad Nayyer Zubair. 2006. Versatility and Unix semantics in namespace unification. ACM Transactions on Storage (TOS) 2, 1 (2006), 74–105.
[86] Chenggang Wu, Jose Faleiro, Yihan Lin, and Joseph Hellerstein. 2019. Anna: A KVS for any scale. IEEE Transactions on Knowledge and Data Engineering (2019).
[87] Matei Zaharia, Benjamin Hindman, Andy Konwinski, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. 2011. The datacenter needs an operating system. In HotCloud.
