Cloudburst: Stateful Functions-as-a-Service

Vikram Sreekanti, Chenggang Wu, Xiayue Charles Lin, Johann Schleier-Smith, Joseph E. Gonzalez, Joseph M. Hellerstein, Alexey Tumanov†
U.C. Berkeley, †Georgia Tech
{vikrams, cgwu, charles.lin, jssmith, jegonzal, hellerstein}@berkeley.edu, [email protected]

ABSTRACT

Function-as-a-Service (FaaS) platforms and "serverless" computing are becoming increasingly popular due to ease-of-use and operational simplicity. Current FaaS offerings are targeted at stateless functions that do minimal I/O and communication. We argue that the benefits of serverless computing can be extended to a broader range of applications and algorithms while maintaining the key benefits of existing FaaS offerings. We present the design and implementation of Cloudburst, a stateful FaaS platform that provides familiar Python programming with low-latency mutable state and communication, while maintaining the autoscaling benefits of serverless computing. Cloudburst accomplishes this by leveraging Anna, an autoscaling key-value store, for state sharing and overlay routing, combined with mutable caches co-located with function executors for data locality. Performant cache consistency emerges as a key challenge in this architecture. To this end, Cloudburst provides a combination of lattice-encapsulated state and new definitions and protocols for distributed session consistency. Empirical results on benchmarks and diverse applications show that Cloudburst makes stateful functions practical, reducing the state-management overheads of current FaaS platforms by orders of magnitude while also improving the state of the art in serverless consistency.

Figure 1: Median (bar) and 99th percentile (whisker) latency for square(increment(x: int)). Cloudburst matches the best distributed Python systems and outperforms other FaaS systems by over an order of magnitude (§6.1).

PVLDB Reference Format:
Vikram Sreekanti, Chenggang Wu, Xiayue Charles Lin, Johann Schleier-Smith, Jose M. Faleiro, Joseph E. Gonzalez, Joseph M. Hellerstein, Alexey Tumanov. Cloudburst: Stateful Functions-as-a-Service. PVLDB, 13(11): 2438-2452, 2020.
DOI: https://doi.org/10.14778/3407790.3407836

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 13, No. 11 ISSN 2150-8097.
DOI: https://doi.org/10.14778/3407790.3407836

1. INTRODUCTION

Serverless computing has become increasingly popular in recent years, with a focus on autoscaling Function-as-a-Service (FaaS) systems. FaaS platforms allow developers to write functions in standard languages and deploy their code to the cloud with reduced administrative burden. The platform is responsible for transparently autoscaling resources from zero to peak load and back in response to workload shifts. Consumption-based pricing ensures that developers' cost is proportional to usage of their code: there is no need to overprovision to match peak load, and there are no compute costs during idle periods. These benefits have made FaaS platforms an attractive target for both research [26, 44, 4, 43, 37, 47, 10, 81, 28, 25] and industry applications [7].

The hallmark autoscaling feature of serverless platforms is enabled by an increasingly popular design principle: the disaggregation of storage and compute services [32]. Disaggregation allows the compute layer to quickly adapt computational resource allocation to shifting workload requirements, packing functions into VMs while reducing data movement. Similarly, object or key-value stores can pack multiple users' data storage and access workloads into shared resources with high volume and often at low cost. Disaggregation also enables allocation at multiple timescales: long-term storage can be allocated separately from short-term compute leases. Together, these advantages enable efficient autoscaling. User code consumes expensive compute resources as needed and accrues only storage costs during idle periods.

Unfortunately, today's FaaS platforms take disaggregation to an extreme, imposing significant constraints on developers. First, the autoscaling storage services provided by cloud vendors—e.g., AWS S3 and DynamoDB—are too high-latency to access with any frequency [84, 37]. Second, function invocations are isolated from each other: FaaS systems disable point-to-point network communication between functions. Finally, and perhaps most surprisingly, current FaaS offerings have very slow nested function calls: argument- and result-passing is a form of cross-function communication and exhibits the high latency of current serverless offerings [4]. We return to these points in §2.1, but in short, today's popular FaaS platforms only work well for isolated, stateless functions.

As a workaround, many applications—even some that were explicitly designed for serverless platforms—are forced to step outside the bounds of the serverless paradigm altogether. For example, the ExCamera serverless video encoding system [26] depends upon a single server machine as a coordinator and task assignment service. Similarly, numpywren [74] enables serverless linear algebra but provisions a static Redis machine for low-latency access to shared state for coordination. These workarounds might be tenable at small scales, but they architecturally reintroduce the scaling, fault tolerance, and management problems of traditional server deployments.

1 from cloudburst import *
2 cloud = CloudburstClient(cloudburst_addr, my_ip)
3 cloud.put('key', 2)
4 reference = CloudburstReference('key')
5 def sqfun(x): return x * x
6 sq = cloud.register(sqfun, name='square')
7
8 print('result: %d' % sq(reference))
9 > result: 4
10
11 future = sq(3, store_in_kvs=True)
12 print('result: %d' % future.get())
13 > result: 9

Figure 2: A script to create and execute a Cloudburst function.

1.1 Toward Stateful Serverless via LDPC

Given the simplicity and economic appeal of FaaS, we are interested in exploring designs that preserve the autoscaling and operational benefits of current offerings, while adding performant, cost-efficient, and consistent shared state and communication. This "stateful" serverless model opens up autoscaling FaaS to a much broader array of applications and algorithms. We aim to demonstrate that serverless architectures can support stateful applications while maintaining the simplicity and appeal of the serverless programming model.

For example, many low-latency services need to autoscale to handle bursts while dynamically manipulating data based on request parameters. This includes webservers managing user sessions, discussion forums managing threads, ad servers managing ML models, and more. Similarly, many parallel and distributed protocols require fine-grained messaging, from quantitative tasks like distributed aggregation [46] to system tasks like membership [20] or leader election [6]. These protocols form the backbone of parallel and distributed systems. As we see in §6, these scenarios are infeasible on today's stateless FaaS platforms.

To enable stateful serverless computing, we propose a new design principle: logical disaggregation with physical colocation (LDPC). Disaggregation is needed to provision, scale, and bill storage and compute independently, but we want to deploy resources to different services in close physical proximity. In particular, a running function's "hot" data should be kept physically nearby for low-latency access. Updates should be allowed at any function invocation site, and cross-function communication should work at wire speed.

Colocation of compute and data is a well-known method to overcome performance barriers, but it can raise thorny correctness challenges at the compute layer. For locality, each compute node should be able to independently update its copy of the data. However, given that a single composition of multiple functions may run across multiple nodes, we need to preserve a consistent "session" [80] that is distributed across those nodes. Distributed session consistency is a challenge we must address in this context.

1.2 Cloudburst: A Stateful FaaS Platform

In this paper, we present a new Function-as-a-Service platform called Cloudburst that removes the shortcomings of commercial systems highlighted above, without sacrificing their benefits. Cloudburst is unique in achieving logical disaggregation and physical colocation of computation and state, and in allowing programs written in a traditional language to observe consistent state across function compositions. Cloudburst is designed to be an autoscaling Functions-as-a-Service system—similar to AWS Lambda or Cloud Functions—but with new abstractions that enable performant, stateful programs.

Cloudburst achieves this via a combination of an autoscaling key-value store (providing state sharing and overlay routing) and mutable caches co-located with function executors (providing data locality). The system is built on top of Anna [85, 86], a low-latency autoscaling key-value store designed to achieve a variety of coordination-free consistency levels by using mergeable monotonic lattice data structures [75, 17]. For performant consistency, Cloudburst takes advantage of Anna's design by transparently encapsulating opaque user state in lattices so that Anna can consistently merge concurrent updates.

In addition, we present novel protocols that ensure consistency guarantees (repeatable read and causal consistency) across function invocations that run on separate nodes. We evaluate Cloudburst via microbenchmarks as well as two application scenarios using third-party code, demonstrating benefits in performance, predictable latency, and consistency. In sum, this paper's contributions include:

1. The design and implementation of an autoscaling serverless architecture that combines logical disaggregation with physical co-location of compute and storage (LDPC) (§4).
2. Identification of distributed session consistency concerns and new protocols to achieve two distinct distributed session consistency guarantees—repeatable read and causal consistency—for compositions of functions (§5).
3. The ability for programs written in traditional languages to enjoy coordination-free storage consistency for their native data types via lattice capsules that wrap program state with metadata that enables the automatic conflict resolution APIs supported by Anna (§5.2).
4. An evaluation of Cloudburst's performance and consistency on workloads involving state manipulation, fine-grained communication and dynamic autoscaling (§6).

2. MOTIVATION AND BACKGROUND

Although serverless infrastructure has gained traction recently, there remains significant room for improvement in performance and state management. In this section, we discuss common pain points in building applications on today's serverless infrastructure (§2.1) and explain Cloudburst's design goals (§2.2).

2.1 Deploying Serverless Functions Today

Current FaaS offerings are poorly suited to managing shared state, making it difficult to build applications, particularly latency-sensitive ones. There are three kinds of shared state management that we focus on in this paper: function composition, direct communication, and shared mutable storage.

Function Composition. For developers to embrace serverless as a general programming and runtime environment, it is necessary that function composition work as expected. Pure functions share state by passing arguments and return values to each other. Figure 1 (discussed in §6.1) shows the performance of a simple composition of side-effect-free arithmetic functions. AWS Lambda

imposes a latency overhead of up to 20ms for a single function invocation, and this overhead compounds when composing functions. AWS Step Functions, which automatically chains together sequences of operations, imposes an even higher penalty. Since the overheads compound linearly, the overhead of a call stack as shallow as 5 functions saturates tolerable limits for an interactive service (∼100ms). Functional programming patterns for state sharing are not an option in current FaaS platforms.

Direct Communication. FaaS offerings disable inbound network connections, requiring functions to communicate through high-latency storage services like S3 or DynamoDB. While point-to-point communication may seem tricky in a system with dynamic membership, distributed hashtables (DHTs) or lightweight key-value stores (KVSs) can provide a lower-latency solution than deep storage for routing messages between migratory function instances [67, 78, 72, 71]. Current FaaS vendors do not offer autoscaling, low-latency DHTs or KVSs. Instead, as discussed in §1, FaaS applications resort to server-based solutions for lower-latency storage, like hosted Redis and memcached.

Low-Latency Access to Shared Mutable State. Recent studies [84, 37] have shown that latencies and costs of shared autoscaling storage for FaaS are orders of magnitude worse than underlying infrastructure like shared memory, networking, or server-based shared storage. Worse, the available systems offer weak data consistency guarantees. For example, AWS S3 offers no guarantees across multiple clients (isolation) or for inserts and deletes from a single client (atomicity). This kind of weak consistency can produce very confusing behavior. For example, simple expressions like f(x, g(x)) may produce non-deterministic results: f and g are different "clients", so there is no guarantee about the versions of x read by f and g.

2.2 Towards Stateful Serverless

Logical Disaggregation with Physical Colocation. As a principle, LDPC leaves significant latitude for designing mechanisms and policy that co-locate compute and data while preserving correctness. We observe that many of the performance bottlenecks described above can be addressed by an architecture with distributed storage and local caching. A low-latency autoscaling KVS can serve as both global storage and a DHT-like overlay network. To provide better data locality to functions, a KVS cache can be deployed on every machine that hosts function invocations. Cloudburst's design includes consistent mutable caches in the compute tier (§4).

Consistency. Distributed mutable caches introduce the risk of cache inconsistencies, which can cause significant developer confusion. We could implement strong consistency across caches (e.g., linearizability) via quorum consensus (e.g., Paxos [51]). This offers appealing semantics but has well-known issues with latency and availability [14, 15]. In general, consensus protocols are a poor fit for the internals of a dynamic autoscaling framework: consensus requires fixed membership, and membership ("view") change involves high-latency agreement protocols (e.g., [12]). Instead, applications desiring strong consistency can employ a slow-changing consensus service adjacent to the serverless infrastructure.

Coordination-free approaches to consistency are a better fit for the elastic membership of a serverless platform. Bailis et al. [8] categorized consistency guarantees that can be achieved without coordination. We chose the Anna KVS [85] as Cloudburst's storage engine because it supports all these guarantees. Like CvRDTs [75], Anna uses lattice data types for coordination-free consistency. That is, Anna values offer a merge operator that is insensitive to batching, ordering and repetition of requests—merge is associative, commutative and idempotent. Anna uses lattice composition [17] to implement consistency; Anna's lattices are simple and composable as in Bloom [17]; we refer readers to [85] for more details. Anna also provides autoscaling at the storage layer, responding to workload changes by selectively replicating frequently-accessed data, growing and shrinking the cluster, and moving data between storage tiers (memory and disk) for cost savings [86].

However, Anna only supports consistency for individual clients, each with a fixed IP-port pair. In Cloudburst, a request like f(x, g(x)) may involve function invocations on separate physical machines and requires consistency across functions—we term this distributed session consistency. In §5, we provide protocols for various consistency levels.

Programmability. We want to provide consistency without imposing undue burden on programmers, but Anna can only store values that conform to its lattice-based type system. To address this, Cloudburst introduces lattice capsules (§5.2), which transparently wrap opaque program state in lattices chosen to support Cloudburst's consistency protocols. Users gain the benefits of Anna's conflict resolution and Cloudburst's distributed session consistency without having to modify their programs.

We continue with Cloudburst's programmer interface. We return to Cloudburst's design in §4 and consistency mechanisms in §5.

3. PROGRAMMING INTERFACE

Cloudburst accepts programs written in vanilla Python.¹ An example client script to execute a function is shown in Figure 2. Cloudburst functions act like regular Python functions but trigger remote computation in the cloud. Results by default are sent directly back to the client (line 8), in which case the client blocks synchronously. Alternately, results can be stored in the KVS, and the response key is wrapped in a CloudburstFuture object, which retrieves the result when requested (lines 11-12).

¹There is nothing fundamental in our choice of Python—we chose it simply because it is a commonly used high-level language.

Function arguments are either regular Python objects (line 11) or KVS references (lines 3-4). KVS references are transparently retrieved by Cloudburst at runtime and deserialized before invoking the function. To improve performance, the runtime attempts to execute a function call with KVS references on a machine that might have the data cached. We explain how this is accomplished in §4.3. Functions can also dynamically retrieve data at runtime using the Cloudburst communication API described below.

To enable stateful functions, Cloudburst allows programmers to put and get Python objects via the Anna KVS API. Object serialization and encapsulation for consistency (§5.2) is handled transparently by the runtime. In the common case, put and get are fast due to the presence of caches at the function executors.

For repeated execution, Cloudburst allows users to register arbitrary compositions of functions. We model function compositions as DAGs in the style of systems like Apache Spark [88], Dryad [42], Apache Airflow [2], and TensorFlow [1]. This model is also similar in spirit to cloud services like AWS Step Functions that automatically chain together functions in existing serverless systems.

Each function in the DAG must be registered with the system (line 4) prior to use in a DAG. Users specify each function in the DAG and how they are composed—results are automatically

Table 1: The Cloudburst object API. Users can interact with the key-value store and send and receive messages.

API Name          Functionality
get(key)          Retrieve a key from the KVS.
put(key, value)   Insert or update a key in the KVS.
delete(key)       Delete a key from the KVS.
send(recv, msg)   Send a message to another executor.
recv()            Receive outstanding messages for this function.
get_id()          Get this function's unique ID.
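To make the object API concrete, the following is an illustrative, in-memory sketch—not Cloudburst's implementation; the MockRuntime and Executor classes are ours—of how two function invocations might use the Table 1 calls to rendezvous: one advertises its unique ID under a pre-agreed KVS key, and a peer discovers that ID and sends it a direct message.

```python
# Hypothetical in-memory stand-in for the Table 1 API, for illustration only.
import queue

class MockRuntime:
    """Minimal local model of the KVS plus per-executor mailboxes."""
    def __init__(self):
        self.kvs = {}        # stands in for Anna
        self.mailboxes = {}  # unique ID -> message queue

    def executor(self, uid):
        self.mailboxes[uid] = queue.Queue()
        return Executor(self, uid)

class Executor:
    def __init__(self, runtime, uid):
        self.rt, self.uid = runtime, uid

    # The Table 1 calls, modeled locally:
    def get(self, key):        return self.rt.kvs.get(key)
    def put(self, key, value): self.rt.kvs[key] = value
    def delete(self, key):     self.rt.kvs.pop(key, None)
    def get_id(self):          return self.uid
    def send(self, recv, msg): self.rt.mailboxes[recv].put(msg)
    def recv(self):
        msgs = []
        while not self.rt.mailboxes[self.uid].empty():
            msgs.append(self.rt.mailboxes[self.uid].get())
        return msgs

rt = MockRuntime()
a, b = rt.executor('exec-a'), rt.executor('exec-b')

# Invocation A advertises its ID under a well-known key.
a.put('rendezvous:aggregator', a.get_id())

# Invocation B discovers A's ID and sends it a direct message.
target = b.get('rendezvous:aggregator')
b.send(target, {'partial_sum': 42})

print(a.recv())  # -> [{'partial_sum': 42}]
```

In the real system, as described in §4, send would map the ID to an IP-port pair and open a TCP connection, falling back to a KVS "inbox" key when no connection can be established; the mock above elides both.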

passed from one DAG function to the next by the Cloudburst runtime. The result of a function with no successor is either stored in the KVS or returned directly to the user, as above. Cloudburst's resource management system (§4.4) is responsible for scaling the number of replicas of each function up and down.

Cloudburst System API. Cloudburst provides developers an interface to system services—Table 1 provides an overview. The API enables KVS interactions via get and put, and it enables message passing between function invocations. Each function invocation is assigned a unique ID, and functions can advertise this ID to well-known keys in the KVS. These unique IDs are translated into physical addresses and used to support direct messaging.

Note that this process is a generalization of the process that is used for function composition, where results of one function are passed directly to the next function. We expose these as separate mechanisms because we aimed to simplify the common case (function composition) by removing the hassle of communicating unique IDs and explicitly sharing results.

In practice this works as follows. First, one function writes its unique ID to a pre-agreed upon key in storage. The second function waits for that key to be populated and then retrieves the first thread's ID by reading that key. Once the second function has the first function's ID, it uses the send API to send a message. When send is invoked, the executor thread uses a deterministic mapping to convert from the thread's unique ID to an IP-port pair. The executor thread opens a TCP connection to that IP-port pair and sends a direct message. If a TCP connection cannot be established, the message is written to a key in Anna that serves as the receiving thread's "inbox". When recv is invoked by a function, the executor returns any messages that were queued on its local TCP port. On a recv call, if there are no messages on the local TCP port, the executor will read its inbox in storage to see if there are any messages stored there. The inbox is also periodically read and cached locally to ensure that messages are delivered correctly.

Figure 3: An overview of the Cloudburst architecture.

4. ARCHITECTURE

Cloudburst implements the principle of logical disaggregation with physical colocation (LDPC). To achieve disaggregation, the Cloudburst runtime autoscales independently of the Anna KVS. Colocation is enabled by mutable caches placed in the Cloudburst runtime for low-latency access to KVS objects.

Figure 3 provides an overview of the Cloudburst architecture. There are four key components: function executors, caches, function schedulers, and a resource management system. User requests are received by a scheduler, which routes them to function executors. Each scheduler operates independently, and the system relies on a standard stateless cloud load balancer (AWS Elastic Load Balancer). Function executors run in individual processes that are packed into VMs along with a local cache per VM. The cache on each VM intermediates between the local executors and the remote KVS. All Cloudburst components are run in individual Docker [21] containers. Cloudburst uses Kubernetes [49] simply to start containers and redeploy them on failure. Cloudburst system metadata, as well as persistent application state, is stored in Anna, which provides autoscaling and fault tolerance.

4.1 Function Executors

Each Cloudburst executor is an independent, long-running Python process. Schedulers (§4.3) route function invocation requests to executors. Before each invocation, the executor retrieves and deserializes the requested function and transparently resolves all KVS reference function arguments in parallel. DAG execution requests span multiple function invocations, and after each DAG function invocation, the runtime triggers downstream DAG functions. To improve performance for repeated execution (§3), each DAG function is deserialized and cached at one or more executors. Each executor also publishes local metrics to the KVS, including the executor's cached functions, stats on its recent CPU utilization, and the execution latencies for finished requests. We explain in the following sections how this metadata is used.

4.2 Caches

To ensure that frequently-used data is locally available, every function execution VM has a local cache, which executors contact via IPC. Executors interface with the cache, not directly with Anna; the cache issues requests to the KVS as needed. When a cache receives an update from an executor, it updates the data locally, acknowledges the request, and asynchronously sends the result to the KVS to be merged. If a cache receives a request for data that it does not have, it asynchronously retrieves it from the KVS.

Cloudburst must ensure the freshness of data in caches. A naive (but correct) scheme is for the Cloudburst caches to poll the KVS for updates, or for the cache to blindly evict data after a timeout. In a typical workload where reads dominate writes, this generates unnecessary load on the KVS. Instead, each cache periodically publishes a snapshot of its cached keys to the KVS. We modified Anna to accept these cached keysets and incrementally construct an index that maps each key to the caches that store it; Anna uses this index to periodically propagate key updates to caches. Lattice encapsulation enables Anna to correctly merge conflicting key updates (§5.2). The index itself is partitioned across storage nodes following the same scheme Anna uses to partition the key space, so Anna takes the index overhead into consideration when making autoscaling decisions.

4.3 Function Schedulers

A key goal of Cloudburst's architecture is to enable low-latency function scheduling. However, policy design is not a main goal of this paper; Cloudburst's scheduling mechanisms allow pluggable policies to be explored in future work. In this section, we describe Cloudburst's scheduling mechanisms, illustrating their use with

policy heuristics that enable us to demonstrate benefits from data locality and load balancing.

Scheduling Mechanisms. Schedulers handle requests to register or invoke functions. New functions are registered by storing them in Anna and updating a shared KVS list of registered functions. For new DAGs, the scheduler verifies that each function in the DAG exists before picking an executor on which to cache it.

For single function execution requests, the scheduler picks an executor and forwards the request to it. DAG requests require more work: the scheduler creates a schedule by picking an executor for each DAG function—which is guaranteed to have the function stored locally—and broadcasts this schedule to all participating executors. The scheduler then triggers the first function(s) in the DAG and, if the user wants the result stored in the KVS, returns a CloudburstFuture.

DAG topologies are the scheduler's only persistent metadata and are stored in the KVS. Each scheduler tracks how many calls it receives per DAG and per function and stores these statistics in the KVS. Finally, each scheduler constructs a local index that tracks the set of keys stored by each cache; this is used for the scheduling policy described next.

Scheduling Policy. Our scheduling policy makes heuristic decisions using metadata reported by executors, including cached key sets and executor load. We prioritize data locality when scheduling both single functions and DAGs. If the function's arguments have KVS references, the scheduler inspects its local cached key index and picks the executor with the most data cached locally—to take advantage of locality, the user must specify KVS references (§3). Otherwise, the scheduler picks an executor at random.

Hot data and functions are replicated across many executor nodes via backpressure. The few nodes initially caching hot keys will quickly be saturated with requests and report high utilization (above 70%). The scheduler tracks this utilization and avoids overloaded nodes, picking new nodes instead. The new nodes will then fetch and cache the hot data, effectively increasing the replication factor and hence the number of options the scheduler has for the next request containing a hot key.

4.4 Monitoring and Resource Management

An autoscaling system must track system load and performance metrics to make effective decisions. Cloudburst uses Anna as a substrate for metric collection. Each thread independently tracks an extensible set of metrics (described above) and publishes them to the KVS. The monitoring system asynchronously aggregates these metrics from storage and uses them for its policy engine.

For each DAG, the monitoring system compares the incoming request rate to the number of requests serviced by executors. If the incoming request rate is significantly higher than the request completion rate of the system, the monitoring engine will increase the resources allocated to that DAG function by pinning the function onto more executors. If the overall CPU utilization of the executors exceeds a threshold (70%), then the monitoring system will add nodes to the system. Similarly, if executor utilization drops below a threshold (20%), we deallocate resources accordingly. We rely on Kubernetes to manage our clusters and efficiently scale the cluster. This simple approach exercises our monitoring mechanisms and provides adequate behavior (see §6.1.4).

When a new node is allocated, it reads the relevant data and metadata (e.g., functions, DAG metadata) from the KVS. This allows Anna to serve as the source of truth for system metadata and removes concerns about efficiently scaling the system.

The heuristics that we described here are based on the existing dynamics of the system (e.g., node spin-up time). We discuss potential advanced auto-scaling mechanisms and policies as a part of Future Work (§8), which might draw more heavily on understanding how workloads interact with our infrastructure.

4.5 Fault Tolerance

At the storage layer, Cloudburst relies on Anna's replication scheme for k-fault tolerance. For the compute tier, we adopt the standard approach to fault tolerance taken by many FaaS platforms. If a machine fails while executing a function, the whole DAG is re-executed after a configurable timeout. The programmer is responsible for handling side-effects generated by failed programs if they are not idempotent. In the case of an explicit program error, the error is returned to the client. This approach should be familiar to users of AWS Lambda and other FaaS platforms, which provide the same guarantees. More advanced guarantees are a subject for future work (§8).

5. CACHE CONSISTENCY

As discussed in Section 3, Cloudburst developers can register compositions of functions as a DAG. This also serves as the scope of consistency for the programmer, sometimes called a "session" [80]. The reads and writes in a session together experience the chosen definition of consistency, even across function boundaries. The simplest way to achieve this is to run the entire DAG in a single thread and let the KVS provide the desired consistency level. However, to allow for autoscaling and flexible scheduling, Cloudburst may choose to run functions within a DAG on different executors—in the extreme case, each function could run on a separate executor. This introduces the challenge of distributed session consistency: because a DAG may run across many machines, the executors involved in a single DAG must provide consistency across different physical machines.

In the rest of this section, we describe distributed session consistency in Cloudburst. We begin by explaining two different guarantees (§5.1), describe how we encapsulate user-level Python objects to interface with Anna's consistency mechanisms (§5.2), and present protocols for the guarantees (§5.3).

5.1 Consistency Guarantees

A wide variety of coordination-free consistency and isolation guarantees have been identified in the literature. We focus on two guarantees here; variants are presented in §6.2 to illustrate protocol costs. In our discussion, we will denote keys with lowercase letters like k; k_v is a version v of key k.

We begin with repeatable read (RR) consistency. RR is adapted from the transactions literature [11], hence it assumes sequences of functions—i.e., linear DAGs. Given a read-only expression f(x, g(x)), RR guarantees that both f and g read the same version x_v. More generally:

Repeatable Read Invariant: In a linear DAG, when any function reads a key k, either it sees the most recent update to k within the DAG, or in the absence of preceding updates it sees the first version k_v read by any function in the DAG².

The second guarantee we explore is causal consistency, one of the strongest coordination-free consistency models [54, 56, 8]. In a nutshell, causal consistency requires reads and writes to respect Lamport's "happens-before" relation [50]. One key version k_i influences another version l_j if a read of k_i happens before a write of l_j; we denote this as k_i → l_j. Within a Cloudburst DAG, this

²Note that RR isolation also prevents reads of uncommitted versions from other transactions [11]. Transactional isolation is a topic for future work (§8).

2442 means that if a function reads lj , subsequent functions must not Algorithm 1 Repeatable Read see versions of k that happened before ki: they can only see ki, Input: k, R versions concurrent with ki, or versions newer than ki. Note that 1:// k is the requested key; R is the set of keys previously read causal consistency does not impose restrictions on key versions by the DAG read by concurrent functions within a DAG. For example, if func- 2: if k ∈ R then tion f calls two functions, g and h (executed in parallel), and both 3: cache_version = cache.get_metadata(k) g and h read key k, the versions of k read by g and h may diverge. 4: if cache_version == NULL ∨ cache_version ! = Prior work introduces a variety of causal building blocks that we R[k].version then extend. Systems like Anna [85] track causal histories of individual 5: return cache.fetch_from_upstream(k) objects but do not track ordering between objects. Bolt-on causal 6: else consistency [9] developed techniques to achieve multi-key causal 7: return cache.get(k) snapshots at a single physical location. Cloudburst must support 8: else multi-key causal consistency that spans multiple physical sites. 9: return cache.get_or_fetch(k) Causal Consistency Invariant: Consider a function f in DAG G that reads a version kv of key k. Let V denote the set of key versions read previously by f or by any of f’s ancestors in G. Denote the dependency set for f at this point as D = {di|di → lj ∈ V }. The version kv that is read by f must satisfy the invariant kv 6→ ki ∈ D. That is, kv is concurrent to or newer than any version of k in the dependency set D. 5.2 Lattice Encapsulation Mutable shared state is a key tenet of Cloudburst’s design (§3), and we rely on Anna’s lattices to resolve conicts from concurrent updates. Typically, Python objects are not lattices, so Cloudburst transparently encapsulates Python objects in lattices. 
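To make the encapsulation idea concrete, the following is a minimal sketch of a last-writer-wins wrapper; the class and field names are invented for illustration and are not Cloudburst's actual implementation:

```python
import time

class LWWLattice:
    """Last-writer-wins wrapper: a (clock, node_id) timestamp plus a value.
    merge() is commutative, associative, and idempotent, so replicas can
    apply merges in any order and still converge on the same value."""

    def __init__(self, value, node_id, clock=None):
        # The global timestamp concatenates a local clock with a unique
        # node ID, so timestamps are totally ordered without coordination.
        self.ts = (clock if clock is not None else time.time(), node_id)
        self.value = value

    def merge(self, other):
        # Keep whichever side has the higher (clock, node_id) timestamp.
        return self if self.ts >= other.ts else other

a = LWWLattice("v1", node_id=1, clock=100)
b = LWWLattice("v2", node_id=2, clock=101)
c = LWWLattice("v3", node_id=3, clock=101)
assert a.merge(b).value == "v2"   # higher clock wins
assert b.merge(a).value == "v2"   # merge order does not matter
assert b.merge(c).value == "v3"   # clock ties broken by node ID
```

Because the merge is order-insensitive, replicas that apply concurrent updates in different orders still agree on the surviving value.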
Figure 4: An illustration of a scenario in which two functions executing on separate machines might unwittingly read causally inconsistent data.

By default, Cloudburst encapsulates each bare program value into an Anna last-writer-wins (LWW) lattice—a composition of an Anna-provided global timestamp and the value. The global timestamp is generated by each node in a coordination-free fashion by concatenating the local system clock and the node's unique ID. Anna merges two LWW versions by keeping the value with the higher timestamp. This allows Cloudburst to achieve eventual consistency: all replicas will agree on the LWW value that corresponds to the highest timestamp for the key [82]. It also provides timestamps for the RR protocol below.

In causal consistency mode, Cloudburst encapsulates each key k in a causal lattice—the composition of an Anna-provided vector clock [68] that identifies k's version, a dependency set that tracks key versions that k depends on, and the value. Each vector clock consists of a set of ⟨id, clock⟩ pairs, where the id is the unique ID of the function executor thread that updated k, and the clock is a monotonically growing logical clock. Upon merge, Anna keeps the causal consistency lattice whose vector clock dominates. Vector clock vc1 dominates vc2 if it is at least equal in all entries and greater in at least one; otherwise, vc1 and vc2 are concurrent. If two vector clocks are concurrent, Anna merges the keys by: (1) creating a new vector clock with the pairwise maximum of each entry in the two keys' vector clocks; and (2) merging the dependency sets and values via set union. In most cases, an object has only one version. However, to de-encapsulate a causally-wrapped object with multiple concurrent versions, Cloudburst presents the user program with one version chosen via an arbitrary but deterministic tie-breaking scheme. Regardless of which version is returned, the user program sees a causally consistent history; the cache layer retains the concurrent versions for the consistency protocol described below. Applications can also choose to retrieve all concurrent versions and resolve updates manually.

5.3 Distributed Session Protocols

Distributed Session Repeatable Read. To achieve repeatable read, the Cloudburst cache on each node creates "snapshot" versions of each locally cached object upon first read, and the cache stores them for the lifetime of the DAG. When invoking a downstream function in the DAG, we propagate a list of cache addresses and version timestamps for all snapshotted keys seen so far.

In order to maintain repeatable read consistency, the downstream executor needs to read the same versions of variables as read by the upstream executor. Algorithm 1 shows pseudocode for the process that downstream executors follow to ensure this. When such an executor in a DAG receives a read request for key k, it includes the prior version snapshot metadata in its request to the cache (R in Algorithm 1). If k has been previously read and the exact version is not stored locally, the cache queries the upstream cache that stores the correct version (line 5 in Algorithm 1). If the exact version is stored locally (line 7), we return it directly to the user. Finally, if the key has not been read thus far, we return any available version to the client (line 9). If the upstream cache fails, we restart the DAG from scratch. Finally, the last executor in the DAG (the "sink") notifies all upstream caches of DAG completion, allowing version snapshots to be evicted.

Distributed Session Causal Consistency. To support causal consistency in Cloudburst, we use causal lattice encapsulation and augment the Cloudburst cache to be a causally consistent store, implementing the bolt-on causal consistency protocol [9]. The protocol ensures that each cache always holds a "causal cut": for every pair of versions a_i, b_j in the cache, there is no a_k such that a_i → a_k and a_k → b_j. Storing a causal cut ensures that key versions read by functions executed on one node (accessing a single causal cut) satisfy the causal consistency invariant discussed in Section 5.1.

However, maintaining a causal cut within each individual cache is not sufficient to achieve distributed session causal consistency. Consider a DAG with two functions f(k) and g(l), which are executed in sequence on different machines—Figure 4 illustrates the following situation. Assume f reads k_v and there is a dependency l_u → k_v.
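The hazard in this scenario, and the version check that rules it out, can be sketched with plain vector-clock arithmetic. The helper names below are illustrative, not Cloudburst's API:

```python
# Sketch of the check that prevents the Figure 4 anomaly: before g reads l,
# the cache must verify that its local version of l is not causally older
# than the dependency l_u -> k_v established when f read k.

def dominates(vc1, vc2):
    """True if vector clock vc1 is >= vc2 in every entry and > in at least one."""
    geq = all(vc1.get(tid, 0) >= c for tid, c in vc2.items())
    gt = any(vc1.get(tid, 0) > c for tid, c in vc2.items())
    return geq and gt

def concurrent(vc1, vc2):
    return vc1 != vc2 and not dominates(vc1, vc2) and not dominates(vc2, vc1)

def safe_to_serve(local_vc, dependency_vc):
    """A locally cached version may be served only if it is equal to,
    concurrent with, or newer than the version in the dependency set."""
    return (local_vc == dependency_vc
            or dominates(local_vc, dependency_vc)
            or concurrent(local_vc, dependency_vc))

# f read k_v, which depends on l_u written by thread "t1" at clock 2:
dep_l = {"t1": 2}
stale_l = {"t1": 1}   # l_w happened before l_u: serving it violates causality
fresh_l = {"t1": 3}
assert not safe_to_serve(stale_l, dep_l)   # must fetch from upstream instead
assert safe_to_serve(fresh_l, dep_l)
```

When the check fails, the cache falls back to fetching a version snapshot from the upstream cache, as the protocol below describes.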

If the causal-cut cache of the node executing g is unaware of the constraint on valid versions of l, g could read an old version l_w : l_w → l_u, thereby violating causality. The following protocol solves this challenge: in addition to shipping read-set metadata (as in RR), each executor ships the set of causal dependencies (pairs of keys and their associated vector clocks) of the read set to downstream executors. Caches upstream store version snapshots of these causal dependencies.

For each key k requested, the downstream cache first checks whether the locally-cached key's vector clock is causally concurrent with or dominates that of the version snapshot stored at the upstream cache (lines 2-3 of Algorithm 2). If so, the cache returns the local version (line 5); otherwise, it queries the upstream cache for the correct version snapshot (line 7). We perform the same check if the key being requested is in the set of dependencies shipped from the upstream cache—if we have a valid version locally, we return it, and otherwise, we fetch a snapshot from the upstream cache (lines 8-13 of Algorithm 2). This protocol guarantees that key versions read by functions executed across different nodes (i.e., different causal cuts) follow the causal consistency invariant, guaranteeing distributed session causal consistency.

Algorithm 2 Causal Consistency
Input: k, R, dependencies
1: // k is the requested key; R is the set of keys previously read by the DAG; dependencies is the set of causal dependencies of keys in R
2: if k ∈ R then
3:     cache_version = cache.get_metadata(k)
4:     // valid(a, b) returns true if a is causally concurrent with or dominates b
5:     if valid(cache_version, R[k]) then
6:         return cache.get(k)
7:     else
8:         return cache.fetch_from_upstream(k)
9: if k ∈ dependencies then
10:    cache_version = cache.get_metadata(k)
11:    if valid(cache_version, dependencies[k]) then
12:        return cache.get(k)
13:    else
14:        return cache.fetch_from_upstream(k)

6. EVALUATION

We now present a detailed evaluation of Cloudburst. We first study the individual mechanisms implemented in Cloudburst (§6.1), demonstrating orders-of-magnitude improvements in latency relative to existing serverless infrastructure for a variety of tasks. Next we study the overheads introduced by Cloudburst's consistency mechanisms (§6.2), and finally we implement and evaluate two real-world applications on Cloudburst: machine learning prediction serving and a Twitter clone (§6.3).

All experiments were run in the us-east-1a AWS availability zone. Schedulers were run on AWS c5.large EC2 VMs (2 vCPUs, 4GB RAM), and function executors were run on c5.2xlarge EC2 VMs (8 vCPUs, 16GB RAM); 2 vCPUs comprise one physical core. Function execution VMs used 3 cores for Python execution and 1 for the cache. Clients were run on separate machines in the same AZ. All Redis experiments were run using AWS ElastiCache, using a cluster with two shards and three replicas per shard.

6.1 Mechanisms in Cloudburst

In this section, we evaluate the primary individual mechanisms that Cloudburst enables—namely, low-latency function composition (§6.1.1), local cache data accesses (§6.1.2), direct communication (§6.1.3), and responsive autoscaling (§6.1.4).

6.1.1 Function Composition

To begin, we compare Cloudburst's function composition overheads with other serverless systems, as well as a non-serverless baseline. We chose functions with minimal computation to isolate each system's overhead. The pipeline was composed of two functions: square(increment(x: int)). Figure 1 shows median and 99th percentile latencies measured across 1,000 requests run in serial from a single client.

First, we compare Cloudburst and Lambda using a "stateless" application, where we invoke one function—both bars are labelled stateless in Figure 1. Cloudburst stored results in Anna, as discussed in Section 3. We ran Cloudburst with one function executor (3 worker threads). We find that Cloudburst is about 5× faster than Lambda for this simple baseline.

For a composition of two functions—the simplest form of statefulness we support—we find that Cloudburst's latency is roughly the same as with a single function and significantly faster than all other systems measured. We first compared against SAND [4], a new serverless platform that achieves low-latency function composition by using a hierarchical message bus. We could not deploy SAND ourselves because the source code is unavailable, so we used the authors' hosted offering [73]. As a result, we could not replicate the setup we used for the other experiments, where the client runs in the same datacenter as the service. To compensate for this discrepancy, we accounted for the added client-server latency by measuring the latency for an empty HTTP request to the SAND service. We subtracted this number from the end-to-end latency for a request to our two-function pipeline running on SAND to estimate the in-datacenter request time for the system. In this setting, SAND is about an order of magnitude slower than Cloudburst both at the median and at the 99th percentile.

To further validate Cloudburst, we compared against Dask, a "serverful" open-source distributed Python execution framework. We deployed Dask on AWS using the same instances used for Cloudburst and found that performance was comparable to Cloudburst's. Given Dask's relative maturity, this gives us confidence that the overheads in Cloudburst are reasonable.

We compared against four AWS implementations, three of which used AWS Lambda. Lambda (Direct) returns results directly to the user, while Lambda (S3) and Lambda (Dynamo) store results in the corresponding storage service. All Lambda implementations pass arguments using the user-facing Lambda API. The fastest implementation was Lambda (Direct), as it avoided high-latency storage, while DynamoDB added a 15ms latency penalty and S3 added 40ms. We also compared against AWS Step Functions, which constructs a DAG of operations similar to Cloudburst's and returns results directly to the user in a synchronous API call. The Step Functions implementation was 10× slower than Lambda and 82× slower than Cloudburst.

Takeaway: Cloudburst's function composition matches state-of-the-art Python runtime latency and outperforms commercial serverless infrastructure by 1-3 orders of magnitude.

6.1.2 Data Locality

Next, we study the performance benefit of Cloudburst's caching techniques. We chose a representative task, with large input data but light computation: our function returns the sum of all elements across 10 input arrays. We implemented two versions on AWS Lambda, which retrieved inputs from AWS ElastiCache (using Redis) and AWS S3, respectively. ElastiCache is not an au-

toscaling system, but we include it in our evaluation because it offers best-case latencies for data retrieval for AWS Lambda. We compare two implementations in Cloudburst. One version, Cloudburst (Hot), passes the same array in to every function execution, guaranteeing that every retrieval after the first is a cache hit. This achieves optimal latency, as every request after the first avoids fetching data over the network. The second, Cloudburst (Cold), creates a new set of inputs for each request; every retrieval is a cache miss, and this scenario measures the worst-case latencies of fetching data from Anna. All measurements are reported across 12 clients issuing 3,000 requests each. We run Cloudburst with 7 function execution nodes.

Figure 5: Median and 99th percentile latency to calculate the sum of 10 arrays, comparing Cloudburst with and without caching against AWS Lambda over Redis and over AWS S3. We vary array lengths from 1,000 to 1,000,000 by multiples of 10 to demonstrate the effects of increasing data retrieval costs.

The Cloudburst (Hot) bars in Figure 5 show that the system's performance is consistent across the first two data sizes for cache hits, rises slightly for 8MB of data, and degrades significantly for the largest array size as computation costs begin to dominate. Cloudburst performs best at 8MB, improving over Cloudburst (Cold)'s median latency by about 10×, over Lambda on Redis' by 25×, and over Lambda on S3's by 79×.

While Lambda on S3 is the slowest configuration for smaller inputs, it is more competitive at 80MB. Here, Lambda on Redis' latencies rise significantly. Cloudburst (Cold)'s median latency is the second fastest, but its 99th percentile latency is comparable with S3's and Redis'. This validates the common wisdom that S3 is efficient for high-bandwidth tasks but imposes a high latency penalty for smaller data objects. However, at this size, Cloudburst (Hot)'s median latency is still 9× faster than Cloudburst (Cold)'s and 24× faster than S3's.

Takeaway: While performance gains vary across configurations and data sizes, avoiding network roundtrips to storage services enables Cloudburst to improve performance by 1-2 orders of magnitude.

6.1.3 Low-Latency Communication

Another key feature in Cloudburst is low-latency communication, which allows developers to leverage distributed systems protocols that are infeasibly slow in other serverless platforms [37].

As an illustration, we consider distributed aggregation, the simplest form of distributed statistics. Our scenario is to periodically average a floating-point performance metric across the set of functions that are running at any given time. Kempe et al. [46] developed a simple gossip-based protocol for approximate aggregation that uses random message passing among the current participants in the protocol. The algorithm is designed to provide correct answers even as the membership changes. We implemented the algorithm in 60 lines of Python and ran it over Cloudburst with 4 executors (12 threads). We compute 1,000 rounds of aggregation with 10 actors each in sequence and measure the time until the result converges to within 5% error.

The gossip algorithm involves repeated small messages, making it highly inefficient on stateless platforms like AWS Lambda. Since AWS Lambda disables direct messaging, the gossip algorithm would be extremely slow if implemented via reads/writes from slow storage. Instead, we compare against a more natural approach for centralized storage: each Lambda function publishes its metrics to a KVS, and a predetermined leader gathers the published information and returns it to the client. We refer to this algorithm as the "gather" algorithm. Note that this algorithm, unlike [46], requires the population to be fixed in advance, and is therefore not a good fit for an autoscaling setting. But it requires less communication, so we use it as a workaround to enable the systems that forbid direct communication to compete. We implement the centralized gather protocol on Lambda over Redis for similar reasons as in §6.1.2—although serverful, Redis offers best-case performance for Lambda. We also implement this algorithm over Cloudburst and Anna for reference.

Figure 6: Median and 99th percentile latencies for distributed aggregation. The Cloudburst implementation uses a distributed, gossip-based aggregation technique, and the Lambda implementations share state via the respective key-value stores. Cloudburst outperforms communication through storage, even for a low-latency KVS.

Figure 6 shows our results. Cloudburst's gossip-based protocol is 3× faster than the gather protocol using Lambda and DynamoDB. Although we expected gather on serverful Redis to outperform Cloudburst's gossip algorithm, our measurements show that gossip on Cloudburst is actually about 10% faster than the gather algorithm on Redis at the median and 40% faster at the 99th percentile. Finally, gather on Cloudburst is 22× faster than gather on Redis and 53× faster than gather on DynamoDB. There are two reasons for these discrepancies. First, Lambda has very high function invocation costs (see §6.1.1). Second, Redis is single-mastered and forces serialized writes, creating a queuing delay for writes.

Takeaway: Cloudburst's low-latency communication mechanisms enable developers to build fast distributed algorithms with fine-grained communication. These algorithms can have notable performance benefits over workarounds involving even relatively fast shared storage.

6.1.4 Autoscaling

Finally, we validate Cloudburst's ability to detect and respond to workload changes. The goal of any serverless system is to smoothly scale program execution in response to changes in request rate. As described in §4.4, Cloudburst uses a heuristic policy that accounts for incoming request rates, request execution times, and executor load. We simulate a relatively computationally intensive workload with a function that sleeps for 50ms. The function reads in two keys drawn from a Zipfian distribution with coefficient 1.0 over 1 million 8-byte keys stored in Anna, and it writes to a third key drawn from the same distribution.
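A driver for this workload might look like the following sketch, with a plain dict standing in for Anna and invented helper names (the paper's actual benchmark harness is not shown):

```python
import random
from itertools import accumulate

def make_zipf_sampler(n_keys, coeff=1.0, seed=0):
    """Draw key ranks 1..n_keys with P(rank r) proportional to 1 / r**coeff."""
    weights = [1.0 / (r ** coeff) for r in range(1, n_keys + 1)]
    cum = list(accumulate(weights))          # precompute the CDF once
    rng = random.Random(seed)
    return lambda: rng.choices(range(1, n_keys + 1), cum_weights=cum)[0]

def benchmark_request(kvs, sample):
    """One request of the autoscaling workload: read two Zipf-drawn keys,
    simulate ~50 ms of compute (elided here), write a third Zipf-drawn key."""
    a = kvs.get(sample(), 0)
    b = kvs.get(sample(), 0)
    # time.sleep(0.05)  # the sleep that stands in for computation
    kvs[sample()] = a + b

sampler = make_zipf_sampler(n_keys=1000, coeff=1.0, seed=1)
kvs = {}
for _ in range(100):
    benchmark_request(kvs, sampler)
```

The skewed sampler concentrates traffic on a few hot keys, which is what exercises both the caches and the autoscaling policy under load.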

Figure 7: Cloudburst's responsiveness to load changes. We start with 180 executor threads, issue requests from 60 clients, and measure throughput. Cloudburst quickly detects load spikes and allocates more resources. Plateaus in the figure are the wait times for EC2 instance startup.

The system starts with 60 executors (180 threads) and one replica of the function deployed—the remaining threads are all idle. Figure 7 shows our results. At time 0, 400 client threads simultaneously begin issuing requests. The jagged curve measures system throughput (requests per second), and the dotted line tracks the number of threads allocated to the function. Over the first 20 seconds, Cloudburst takes advantage of the idle resources in the system, and throughput reaches around 3,300 requests per second. At this point, the management system detects that all nodes are saturated and adds 20 EC2 instances, which takes about 2.5 minutes; this is seen in the plateau that lasts until time 2.67. As soon as resources become available, they are allocated to our task, and throughput rises to 4.4K requests a second.

This process repeats itself twice more, with the throughput rising to 5.6K and 6.7K requests per second with each increase in resources. After 10 minutes, the clients stop issuing requests, and by time 10.33, the system has drained itself of all outstanding requests. The management system detects the sudden drop in request rate and, within 20 seconds, reduces the number of threads allocated to the sleep function from 360 to 2. Within 5 minutes, the number of EC2 instances drops from a max of 120 back to the original 60. Our current implementation is bottlenecked by the latency of spinning up EC2 instances; we discuss that limitation and potential improvements in Section 8.

We also measured the per-key storage overhead of the index in Anna that maps each key to the caches it is stored in. We observe small overheads even for the largest deployment (120 function execution nodes). For keys in our working set, the median index overhead is 24 bytes and the 99th percentile overhead is 1.3KB, corresponding to keys being cached at 1.6% and 93% of the function nodes, respectively. Even if all keys had the maximum overhead, the total index size would be around 1GB for 1 million keys.

Takeaway: Cloudburst's mechanisms for autoscaling enable policies that can quickly detect and react to workload changes. We are mostly limited by the high cost of spinning up new EC2 instances. The policies and the cost of spinning up instances can be improved in future without changing Cloudburst's architecture.

6.2 Consistency Models

In this section, we evaluate the overheads of Cloudburst's consistency models. For comparison, we also implement and measure weaker consistency models to understand the costs involved in distributed session causal consistency. In particular, we evaluated single-key causality and multi-key causality, which are both weaker forms of causal consistency than the distributed session causality supported by Cloudburst. Single-key causality tracks the causal order of updates to individual keys (omitting the overhead of dependency sets). Multi-key causality is an implementation of bolt-on causal consistency [9], avoiding the overhead of distributed session consistency.

We populate Anna with 1 million keys, each with a payload of 8 bytes, and we generate 250 random DAGs which are 2 to 5 functions long, with an average length of 3. We isolate the latency overheads of each consistency mode by avoiding expensive computation and using small data to highlight any metadata overheads. Each function takes two string arguments, performs a simple string manipulation, and outputs another string. Function arguments are either KVS references—which are drawn from the set of 1 million keys with a Zipfian coefficient of 1.0—or the result of a previous function execution in the DAG. The sink function of the DAG writes its result to the KVS into a key chosen randomly from the set of keys read by the DAG. We use 8 concurrent benchmark threads, each sequentially issuing 500 requests to Cloudburst. We ran Cloudburst with 5 executor nodes (15 threads).

Figure 8: Median and 99th percentile latencies for Cloudburst's consistency models, normalized by the depth of the DAG. We measure last-writer wins (LWW), distributed session repeatable read (DSRR), single-key causality (SK), multi-key causality (MK), and distributed session causal consistency (DSC). Median latency is uniform across modes, but stronger consistency levels have higher tail latencies due to increased data and metadata overheads.

6.2.1 Latency Comparison

Figure 8 shows the latency of each DAG execution under five consistency models, normalized by the longest path in the DAG. Median latency is nearly uniform across all modes, but performance differs significantly at the 99th percentile.

Last-writer wins has the lowest overhead, as it only stores the 8-byte timestamp associated with each key and requires no remote version fetches. The 99th percentile latency of distributed session repeatable read is 1.8× higher than last-writer wins'. This is because repeated reference to a key across functions requires an exact version match; even if the key is cached locally, a version mismatch will force a remote fetch.

Single-key causality does not involve metadata passing or data retrieval, but each key maintains a vector clock that tracks the causal ordering of updates performed across clients. Since the size of the vector clock grows linearly with the number of clients that modified the key, hot keys tend to have larger vector clocks, leading to higher retrieval latency at the tail. Multi-key causality forces each key to track its dependencies in addition to maintaining the vector clock, adding to its worst-case latency. We observe that the median per-key causal metadata (vector clock and dependency set) storage overhead is 624 bytes and the 99th percentile overhead is 7.1KB. Techniques such as vector clock compression [59] and dependency garbage collection [54] can be used to reduce the metadata overhead, which we plan to explore in future work.

Distributed session causal consistency incurs the cost of passing causal metadata along the DAG as well as retrieving version snapshots to satisfy causality. In the worst case, a 5-function DAG performs 4 extra network round-trips for version snapshots. This leads to a 1.7× slowdown in 99th percentile latency over single- and multi-key causality and a 9× slowdown over last-writer wins.

Takeaway: Although Cloudburst's non-trivial consistency models increase tail latencies, median latencies are over an order of magnitude faster than DynamoDB and S3 for similar tasks, while providing stronger consistency.

6.2.2 Inconsistencies

Table 2: The number of inconsistencies observed across 4,000 DAG executions under Cloudburst's consistency levels relative to what is observed under LWW. The causal levels are increasingly strict, so the numbers accrue incrementally left to right. DSRR anomalies are independent.

               Inconsistencies Observed
        LWW     SK      MK      DSC     DSRR
         0      904     939     1043     46

Stronger consistency models introduce overheads but also prevent anomalies that would arise in weaker models. Table 2 shows the number of anomalies observed over the course of 4,000 DAG executions run in LWW mode, tracking the anomalies that each stronger level would flag. The causal consistency levels have increasingly strict criteria; anomaly counts accrue with the level. We observe 904 single-key (SK) causal inconsistencies when the system operates in LWW mode. With single-key causality, two concurrent updates to the same key must both be preserved and returned to the client; LWW simply picks the largest timestamp and drops the other update, leading to a majority of observed anomalies. Multi-key (MK) causality flagged 35 additional inconsistencies corresponding to single-cache read sets that were not causal cuts. Distributed session causal consistency (DSC) flagged 104 more inconsistencies where the causal cut property was violated across caches. Distributed session repeatable read (DSRR) flagged 46 anomalies.

Takeaway: A large number of anomalies arose naturally in our experiments, and the Cloudburst consistency model was able to detect and prevent these anomalies.

6.3 Case Studies

In this section, we discuss the implementation of two real-world applications on top of Cloudburst. We first consider low-latency prediction serving for machine learning models and compare Cloudburst to a purpose-built cloud offering, AWS Sagemaker. We then implement a Twitter clone called Retwis, which takes advantage of our consistency mechanisms, and we report both the effort involved in porting the application to Cloudburst as well as some initial evaluation metrics.

6.3.1 Prediction Serving

ML model prediction is a computationally intensive task that can benefit from elastic scaling and efficient sparse access to large amounts of state. For example, the prediction serving infrastructure at Facebook [34] needs to access per-user state with each query and respond in real time, with strict latency constraints. Furthermore, many prediction pipelines combine multiple stages of computation—e.g., clean the input, join it with reference data, execute one or more models, and combine the results [18, 52].

We implemented a basic prediction serving pipeline on Cloudburst and compare against a fully-managed, purpose-built prediction serving framework (AWS Sagemaker) as well as AWS Lambda. We also compare against a single Python process to measure serialization and communication overheads. Lambda does not support GPUs, so all experiments are run on CPUs.

We use the MobileNet [41] image classification model implemented in Tensorflow [1] and construct a three-stage pipeline: resize an input image, execute the model, and combine features to render a prediction. Qualitatively, porting this pipeline to Cloudburst was easier than porting it to other systems. The native Python implementation was 23 lines of code (LOC). Cloudburst required adding 4 LOC to retrieve the model from Anna. AWS Sagemaker required adding serialization logic (10 LOC) and a Python web server to invoke each function (30 LOC). Finally, AWS Lambda required significant changes: managing serialization (10 LOC) and manually compressing Python dependencies to fit into Lambda's 512MB container limit³. The pipeline does not involve concurrent modification to shared state, so we use the default last-writer wins for this workload. We run Cloudburst with 3 workers, and all experiments used a single client.

³AWS Lambda limits disk space to 512MB. Tensorflow exceeds this limit, so we removed unnecessary components. We do not report the LOC changed, as it would artificially inflate the estimate.

Figure 9: Cloudburst compared to native Python, AWS Sagemaker, and AWS Lambda for serving a prediction pipeline.

Figure 10: A measure of Cloudburst's ability to scale a simple prediction serving pipeline. The blue whiskers represent 95th percentile latencies, and the black represent the 99th percentile.

Figure 9 reports median and 99th percentile latencies. Cloudburst is only about 15ms slower than the Python baseline at the median (210ms vs. 225ms). AWS Sagemaker, ostensibly a purpose-built system, is 1.7× slower than the native Python implementation and 1.6× slower than Cloudburst. We also measure two AWS Lambda implementations. One, AWS Lambda (Actual), computes a full result for the pipeline and takes over 1.1 seconds. To better understand Lambda's performance, we isolated compute costs by removing all data movement. This result (AWS Lambda (Mock))

is much faster, suggesting that the latency penalty is incurred by the Lambda runtime passing results between functions. Nonetheless, AWS Lambda (Mock)'s median is still 44% slower than Cloudburst's median latency and only 9% faster than AWS Sagemaker.

Figure 10 measures throughput and latency for Cloudburst as we increase the number of worker threads from 10 to 160 by factors of two. The number of clients for each setting is set to ⌊workers/3⌋ because there are three functions executed per client. We see that throughput scales linearly with the number of workers. We see a climb in median and 99th percentile latency from 10 to 20 workers due to increased potential conflicts in the scheduling heuristics. From this point on, we do not see a significant change in either median or tail latency until 160 executors. For the largest deployment, only one or two executors need to be slow to significantly raise the 99th percentile latency—to validate this, we also report the 95th percentile latency in Figure 10, and we see that there is a minimal increase between 80 and 160 executors.

Takeaway: An ML algorithm deployed on Cloudburst delivers both smooth scaling and low, predictable latency comparable to native Python, out-performing a purpose-built commercial service.

6.3.2 Retwis

Web serving workloads are closely aligned with Cloudburst's features. For example, Twitter provisions server capacity of up to 10x the typical daily peak in order to accommodate unusual events such as elections, sporting events, or natural disasters [33]. Furthermore, causal consistency is a good model for many consumer workloads because it matches end-user expectations for information propagation: e.g., Google has adopted it as part of a universal model for privacy and access control [64].

To this end, we considered an example web serving workload. Retwis [69] is an open source Twitter clone built on Redis and is often used to evaluate distributed systems [77, 40, 91, 19, 87]. Conversational "threads" like those on Twitter naturally exercise causal consistency: It is confusing to read the response to a post (e.g., "lambda!") before you have read the post it refers to ("what comes after kappa?"). We adapted a Python Retwis implementation called retwis-py [70] to run on Cloudburst and compared its performance to a vanilla "serverful" deployment on Redis. We ported Retwis to our system as a set of six Cloudburst functions. The port was simple: We changed 44 lines, most of which were removing references to a global Redis variable.

We created a graph of 1000 users, each following 50 other users (zipf=1.5, a realistic skew for online social networks [60]) and pre-populated 5000 tweets, half of which were replies to other tweets. We compare Cloudburst in LWW mode, Cloudburst in causal consistency mode, and Retwis over Redis; all configurations used 10 executor threads (webservers for Retwis), 1 KVS node, and 10 clients. Each client issues 5000 requests—10% PostTweet (write) requests and 90% GetTimeline (read) requests.

Figure 11: Median and 99th percentile latencies for Cloudburst in LWW and causal modes and Retwis over Redis.

Figure 11 shows our results. Median and 99th percentile latencies for LWW mode are 27% and 2% higher than Redis', respectively. This is largely due to different code paths; for Retwis over Redis, clients communicate directly with web servers, which interact with Redis. Each Cloudburst request interacts with a scheduler, a function executor, a cache, and Anna. Cloudburst's causal mode adds a modest overhead over LWW mode: 4% higher at the median and 20% higher at the tail. However, causality prevents anomalies on over 60% of requests—when a timeline returns a reply without the original tweet—compared to LWW mode.

Figure 12: Cloudburst's ability to scale the Retwis workload up to 160 worker threads.

Figure 12 measures throughput and latency for Cloudburst's causal mode as we increase the number of function executor threads from 10 to 160 by factors of two. For each setting, the number of clients is equal to the number of executors. From 10 threads to 160 threads, median and 99th percentile latencies increase by about 60%. This is because increased concurrency means a higher volume of new tweets. With more new posts, each GetTimeline request forces the cache to query the KVS for new data with higher probability in order to ensure causality—for 160 threads, 95% of requests incurred cache misses. Nonetheless, these latencies are well within the bounds for interactive web applications [35]. Throughput grows nearly linearly as we increase the executor thread count. However, due to the increased latencies, throughput is about 30% below ideal at the largest scale.

Takeaway: It was straightforward to adapt a standard social network application to run on Cloudburst. Our implementation adds a modest overhead to the serverful Redis baseline and scales smoothly as the workload increases.

7. RELATED WORK

Architecture. Client-side caching and consistency has a long history in the database literature [27] that bears similarity to our LDPC principle. In more recent examples, [13] describes client-side techniques for a database built on S3, and [89] uses caching in a distributed transactional system based on RDMA over dedicated memory nodes. This line of work primarily focuses on strong transactional consistency for static or slowly-changing configurations. Cloudburst explicitly chooses to pursue coordination-free techniques for serverless computing, for the reasons described in Section 2.2. The serverless setting also drives our need to handle consistency when a single transaction moves across multiple caches, which is not seen in prior work.

Serverless Execution Frameworks. In addition to commercial offerings, there are many open-source serverless platforms [38, 63, 62, 48] which provide standard stateless FaaS guarantees. Among research platforms [39, 4, 58], SAND [4] is most similar to Cloudburst, reducing overheads for low-latency function compositions. Cloudburst achieves better latencies (§6.1.1) and adds shared state and communication abstractions.
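The reply-before-post anomaly exercised by the Retwis experiments reduces to a dependency check in the cache layer: a causally consistent cache must not reveal a version until everything it depends on is also visible. The sketch below is illustrative only—the class and method names are our own, not Cloudburst's actual API—showing how a cache might buffer a reply until the post it refers to arrives.

```python
# Hypothetical sketch (not Cloudburst's real API): a cache that reveals
# a version only once all of its causal dependencies are visible locally.

class CausalCache:
    def __init__(self):
        self.visible = {}   # key -> (value, deps): versions safe to read
        self.pending = {}   # key -> (value, deps): buffered versions

    def put(self, key, value, deps):
        # Reveal immediately if every dependency is visible; otherwise buffer.
        if all(d in self.visible for d in deps):
            self.visible[key] = (value, set(deps))
            self._drain()
        else:
            self.pending[key] = (value, set(deps))

    def _drain(self):
        # A newly visible key may satisfy buffered versions, transitively.
        progress = True
        while progress:
            progress = False
            for key, (value, deps) in list(self.pending.items()):
                if all(d in self.visible for d in deps):
                    self.visible[key] = (value, deps)
                    del self.pending[key]
                    progress = True

    def get(self, key):
        entry = self.visible.get(key)
        return entry[0] if entry else None

cache = CausalCache()
# A reply arrives before the post it depends on: it stays buffered...
cache.put("reply:1", "lambda!", deps={"post:1"})
assert cache.get("reply:1") is None
# ...and becomes visible only after its dependency does.
cache.put("post:1", "what comes after kappa?", deps=set())
assert cache.get("reply:1") == "lambda!"
```

This mirrors the guarantee measured in Figure 11: under LWW mode the buffering is skipped, which is cheaper but allows a timeline to show a reply without its original tweet.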

Recent work has explored faster, more efficient serverless platforms. SOCK [61] introduces a generalized-Zygote provisioning mechanism to cache and clone function initialization; its library loading technique could be integrated with Cloudburst. Also complementary are low-overhead sandboxing mechanisms released by cloud providers—e.g., gVisor [31] and Firecracker [24].

Other recent work has demonstrated the ability to build data-centric services on top of commodity serverless infrastructure. Starling [65] implements a scalable database query engine on AWS Lambda. Similarly, the PyWren project [44] and follow-on work [74] implemented highly parallel map tasks on Lambda, with a focus on linear algebra applications. The ExCamera project [26] enabled highly efficient video encoding on FaaS. These systems explored what is possible to build on stateless serverless infrastructure. By contrast, our work explores the design of a new architecture, which raises new challenges for systems issues in distributed consistency.

Perhaps the closest work to ours is the Archipelago system [76], which also explores the design of a new serverless infrastructure. They also adopt the model of DAGs of functions, but their focus is complementary to ours. They are interested in scheduling techniques to achieve latency deadlines on a per-request basis.

Serverless IO Infrastructure. Pocket [47] is a serverless storage system that improves the efficiency of analytics (large read-oriented workloads). Locus [66] provides a high-bandwidth shuffle functionality for serverless analytics. Neither of these systems offers a new serverless programming framework, nor do they consider issues of caching for latency or consistent updates.

Finally, Shredder [92] enables users to push functions into storage. This raises fundamental security concerns like multi-tenant isolation, which are the focus of the research. Shredder is currently limited to a single node and does not address autoscaling or other characteristic features of serverless computing.

Language-Level Consistency. Programming languages also offer solutions to distributed consistency. One option from functional languages is to prevent inconsistencies by making state immutable [16]. A second is to constrain updates to be deterministically mergeable, by requiring users to write associative, commutative, and idempotent code (ACID 2.0 [36]), use special-purpose types like CRDTs [75] or DSLs for distributed computing like Bloom [17]. As a platform, Cloudburst does not prescribe a language or type system, though these approaches could be layered on top of Cloudburst.

Causal Consistency. Several existing storage systems provide causal consistency [54, 59, 3, 55, 23, 22, 5, 90]. However, these are fixed-deployment systems that do not meet the autoscaling requirements of a serverless setting. In [59, 3, 5, 22, 23], each data partition relies on a linear clock to version data and uses a fixed-size vector clock to track causal dependencies across keys. The size of these vector clocks is tightly coupled with the system deployment—specifically, the shard and replica counts. Correctly adjusting this metadata requires an expensive coordination protocol, which we rejected in Cloudburst's design (§2.2). [54] and [55] reveal a new version only when all of its dependencies have been retrieved. [90] constructs a causally consistent snapshot across an entire system. All of these systems are susceptible to "slowdown cascades" [59], where a single straggler node limits write visibility and increases the overhead of write buffering.

In contrast, Cloudburst implements causal consistency in the cache layer as in Bolt-On Causal Consistency [9]. Each cache creates its own causally consistent snapshot without coordinating with other caches, eliminating the possibility of a slowdown cascade. The cache layer also tracks dependencies in individual keys' metadata rather than tracking the vector clocks of fixed, coarse-grained shards. This comes at the cost of increased dependency metadata overhead. Various techniques including periodic dependency garbage collection [54], compression [59], and reducing dependencies via explicit causality specification [9] can mitigate this issue, though we do not measure them here.

8. CONCLUSION AND FUTURE WORK

In this paper we demonstrate the feasibility of general-purpose stateful serverless computing. We enable autoscaling via logical disaggregation of storage and compute and achieve performant state management via physical colocation of caches with compute services. Cloudburst demonstrates that disaggregation and colocation are not inherently in conflict. In fact, the LDPC design pattern is key to our solution for stateful serverless computing.

The remaining challenge is to provide performant correctness. Cloudburst embraces coordination-free consistency as the appropriate class of guarantees for an autoscaling system. We confront challenges at both the storage and caching layer. We use lattice capsules to allow opaque program state to be merged asynchronously into replicated coordination-free persistent storage. We develop distributed session consistency protocols to ensure that computations spanning multiple caches provide uniform correctness guarantees. Together, these techniques provide a strong contract to users for reasoning about state—far stronger than the guarantees offered by the storage that backs commercial FaaS systems. Even with these guarantees, we demonstrate performance that rivals and often beats baselines from inelastic server-centric approaches.

The feasibility of stateful serverless computing suggests a variety of potential future work.

Isolation and Fault Tolerance. As noted in §5.1, storage consistency guarantees say nothing about concurrent effects between DAGs, a concern akin to transactional isolation (the "I" in ACID). On a related note, the standard fault tolerance model for FaaS is to restart on failure, ignoring potential problems with non-idempotent functions. Something akin to transactional atomicity (the "A" in ACID) seems desirable here. It is well-known that serializable transactions require coordination [8], but it is interesting to consider whether sufficient atomicity and isolation are achievable without strong consistency schemes.

Auto-Scaling Mechanism and Policy. In this work we present and evaluate a simple auto-scaling heuristic. However, there are opportunities to reduce boot time by warm pooling [83] and more proactively scale computation [29, 45, 79] as a function of variability in the workloads. We believe that cache-based co-placement of computation and data presents promising opportunities for research in elastic auto-scaling of compute and storage.

Streaming Services. Cloudburst's internal monitoring service requires components to publish metadata to well-known KVS keys. In essence, the KVS serves a rolling snapshot of an update stream. There is a rich literature on distributed streaming that could offer more, e.g. as surveyed by [30]. The autoscaling environment of FaaS introduces new challenges, but this area seems ripe for both system internals and user-level streaming abstractions.

Security and Privacy. As mentioned briefly in §4, Cloudburst's current design provides container-level isolation, which is susceptible to well-known attacks [53, 57]. This is unacceptable in multi-tenant cloud environments, where sensitive user data may coincide with other user programs. It is interesting to explore how Cloudburst's design would address these concerns.
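The "deterministically mergeable" state model referenced above—and the lattice capsules of Section 8—rest on merge functions that are associative, commutative, and idempotent, so replicas converge regardless of delivery order. As a minimal sketch (assuming nothing about Anna's internals), here is a last-writer-wins (LWW) register, the lattice underlying Cloudburst's LWW mode in our Retwis experiments:

```python
# Minimal sketch of a last-writer-wins (LWW) register as a lattice.
# merge() is associative, commutative, and idempotent, so any replica
# applying the same set of updates in any order converges to one value.

from dataclasses import dataclass

@dataclass(frozen=True)
class LWWRegister:
    timestamp: int  # logical clock; ties broken deterministically by value
    value: str

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep the lexicographically larger (timestamp, value) pair.
        if (self.timestamp, self.value) >= (other.timestamp, other.value):
            return self
        return other

a = LWWRegister(1, "x")
b = LWWRegister(2, "y")
# Order-insensitive and idempotent:
assert a.merge(b) == b.merge(a) == LWWRegister(2, "y")
assert a.merge(a) == a
```

Richer lattices (sets, counters, causal metadata) follow the same contract; CRDTs [75] and Bloom [17] systematize this discipline, and Cloudburst leaves the choice of type to layers above the platform.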

9. REFERENCES
[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
[2] Apache Airflow. https://airflow.apache.org.
[3] D. D. Akkoorath, A. Z. Tomsic, M. Bravo, Z. Li, T. Crain, A. Bieniusa, N. Preguiça, and M. Shapiro. Cure: Strong semantics meets high availability and low latency. In 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), pages 405–414. IEEE, 2016.
[4] I. E. Akkus, R. Chen, I. Rimac, M. Stein, K. Satzke, A. Beck, P. Aditya, and V. Hilt. SAND: Towards high-performance serverless computing. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 923–935, 2018.
[5] S. Almeida, J. Leitão, and L. Rodrigues. ChainReaction: A causal+ consistent datastore based on chain replication. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys '13, pages 85–98, New York, NY, USA, 2013. ACM.
[6] B. Awerbuch. Optimal distributed algorithms for minimum weight spanning tree, counting, leader election, and related problems. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, pages 230–240. ACM, 1987.
[7] AWS Lambda - case studies. https://aws.amazon.com/lambda/resources/customer-case-studies/.
[8] P. Bailis, A. Davidson, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Highly available transactions: Virtues and limitations. PVLDB, 7(3):181–192, 2013.
[9] P. Bailis, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Bolt-on causal consistency. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD '13, pages 761–772, New York, NY, USA, 2013. ACM.
[10] I. Baldini, P. Castro, K. Chang, P. Cheng, S. Fink, V. Ishakian, N. Mitchell, V. Muthusamy, R. Rabbah, A. Slominski, et al. Serverless computing: Current trends and open problems. In Research Advances in Cloud Computing, pages 1–20. Springer, 2017.
[11] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil. A critique of ANSI SQL isolation levels. In ACM SIGMOD Record, volume 24, pages 1–10. ACM, 1995.
[12] K. Birman and T. Joseph. Exploiting virtual synchrony in distributed systems. SIGOPS Oper. Syst. Rev., 21(5):123–138, Nov. 1987.
[13] M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Building a database on S3. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 251–264, 2008.
[14] E. Brewer. CAP twelve years later: How the "rules" have changed. Computer, 45(2):23–29, Feb 2012.
[15] T. D. Chandra, R. Griesemer, and J. Redstone. Paxos made live: An engineering perspective. In Proceedings of the Twenty-Sixth Annual ACM Symposium on Principles of Distributed Computing, pages 398–407. ACM, 2007.
[16] M. Coblenz, J. Sunshine, J. Aldrich, B. Myers, S. Weber, and F. Shull. Exploring language support for immutability. In Proceedings of the 38th International Conference on Software Engineering, pages 736–747. ACM, 2016.
[17] N. Conway, W. R. Marczak, P. Alvaro, J. M. Hellerstein, and D. Maier. Logic and lattices for distributed programming. In Proceedings of the Third ACM Symposium on Cloud Computing, SoCC '12, pages 1:1–1:14, New York, NY, USA, 2012. ACM.
[18] D. Crankshaw, X. Wang, G. Zhou, M. J. Franklin, J. E. Gonzalez, and I. Stoica. Clipper: A low-latency online prediction serving system. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pages 613–627, Boston, MA, 2017. USENIX Association.
[19] N. Crooks, Y. Pu, N. Estrada, T. Gupta, L. Alvisi, and A. Clement. TARDiS: A branch-and-merge approach to weak consistency. In Proceedings of the 2016 International Conference on Management of Data, pages 1615–1628. ACM, 2016.
[20] A. Das, I. Gupta, and A. Motivala. SWIM: Scalable weakly-consistent infection-style process group membership protocol. In Proceedings International Conference on Dependable Systems and Networks, pages 303–312. IEEE, 2002.
[21] Enterprise application container platform | Docker. https://www.docker.com.
[22] J. Du, S. Elnikety, A. Roy, and W. Zwaenepoel. Orbe: Scalable causal consistency using dependency matrices and physical clocks. In Proceedings of the 4th Annual Symposium on Cloud Computing, SOCC '13, pages 11:1–11:14, New York, NY, USA, 2013. ACM.
[23] J. Du, C. Iorgulescu, A. Roy, and W. Zwaenepoel. GentleRain: Cheap and scalable causal consistency with physical clocks. In Proceedings of the ACM Symposium on Cloud Computing, pages 1–13. ACM, 2014.
[24] Announcing the Firecracker open source technology: Secure and fast microVM for serverless computing. https://aws.amazon.com/blogs/opensource/firecracker-open-source-secure-fast-microvm-serverless/.
[25] S. Fouladi, F. Romero, D. Iter, Q. Li, S. Chatterjee, C. Kozyrakis, M. Zaharia, and K. Winstein. From laptop to Lambda: Outsourcing everyday jobs to thousands of transient functional containers. In 2019 USENIX Annual Technical Conference (USENIX ATC 19), pages 475–488, 2019.
[26] S. Fouladi, R. S. Wahby, B. Shacklett, K. V. Balasubramaniam, W. Zeng, R. Bhalerao, A. Sivaraman, G. Porter, and K. Winstein. Encoding, fast and slow: Low-latency video processing using thousands of tiny threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pages 363–376, Boston, MA, 2017. USENIX Association.
[27] M. J. Franklin and M. J. Carey. Client-server caching revisited. Technical report, University of Wisconsin-Madison Department of Computer Sciences, 1992.
[28] Y. Gan, Y. Zhang, D. Cheng, A. Shetty, P. Rathi, N. Katarki, A. Bruno, J. Hu, B. Ritchken, B. Jackson, et al. An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 3–18. ACM, 2019.
[29] A. Gandhi, M. Harchol-Balter, R. Raghunathan, and M. A. Kozuch. AutoScale: Dynamic, robust capacity management for multi-tier data centers. ACM Transactions on Computer Systems (TOCS), 30(4):14, 2012.
[30] M. Garofalakis, J. Gehrke, and R. Rastogi. Data stream management: Processing high-speed data streams. Springer, 2016.
[31] Open-sourcing gVisor, a sandboxed container runtime. https://cloud.google.com/blog/products/gcp/open-sourcing-gvisor-a-sandboxed-container-runtime.
[32] S. Han, N. Egi, A. Panda, S. Ratnasamy, G. Shi, and S. Shenker. Network support for resource disaggregation in next-generation datacenters. In Proceedings of the Twelfth ACM Workshop on Hot Topics in Networks, page 10. ACM, 2013.
[33] M. Hashemi. The infrastructure behind Twitter: Scale. https://blog.twitter.com/engineering/en_us/topics/infrastructure/2017/the-infrastructure-behind-twitter-scale.html.
[34] K. Hazelwood, S. Bird, D. Brooks, S. Chintala, U. Diril, D. Dzhulgakov, M. Fawzy, B. Jia, Y. Jia, A. Kalro, J. Law, K. Lee, J. Lu, P. Noordhuis, M. Smelyanskiy, L. Xiong, and X. Wang. Applied machine learning at Facebook: A datacenter infrastructure perspective. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 620–629, Feb 2018.
[35] Y. He, S. Elnikety, J. Larus, and C. Yan. Zeta: Scheduling interactive services with partial execution. In Proceedings of the Third ACM Symposium on Cloud Computing, pages 1–14, 2012.
[36] P. Helland and D. Campbell. Building on quicksand. CoRR, abs/0909.1788, 2009.
[37] J. M. Hellerstein, J. M. Faleiro, J. Gonzalez, J. Schleier-Smith, V. Sreekanti, A. Tumanov, and C. Wu. Serverless computing: One step forward, two steps back. In CIDR 2019, 9th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 13-16, 2019, Online Proceedings, 2019.
[38] S. Hendrickson, S. Sturdevant, T. Harter, V. Venkataramani, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Serverless computation with OpenLambda. In 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 16), Denver, CO, 2016. USENIX Association.
[39] S. Hendrickson, S. Sturdevant, T. Harter, V. Venkataramani, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Serverless computation with OpenLambda. Elastic, 60:80, 2016.
[40] B. Holt, I. Zhang, D. Ports, M. Oskin, and L. Ceze. Claret: Using data types for highly concurrent distributed transactions. In Proceedings of the First Workshop on Principles and Practice of Consistency for Distributed Data, page 4. ACM, 2015.
[41] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017.
[42] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, EuroSys '07, pages 59–72, New York, NY, USA, 2007. ACM.
[43] E. Jonas, J. Schleier-Smith, V. Sreekanti, C.-C. Tsai, A. Khandelwal, Q. Pu, V. Shankar, J. Menezes Carreira, K. Krauth, N. Yadwadkar, J. Gonzalez, R. A. Popa, I. Stoica, and D. A. Patterson. Cloud programming simplified: A Berkeley view on serverless computing. Technical Report UCB/EECS-2019-3, EECS Department, University of California, Berkeley, Feb 2019.
[44] E. Jonas, S. Venkataraman, I. Stoica, and B. Recht. Occupy the cloud: Distributed computing for the 99%. CoRR, abs/1702.04024, 2017.
[45] V. Kalavri, J. Liagouris, M. Hoffmann, D. Dimitrova, M. Forshaw, and T. Roscoe. Three steps is all you need: Fast, accurate, automatic scaling decisions for distributed streaming dataflows. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 783–798, 2018.
[46] D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 482–491. IEEE, 2003.
[47] A. Klimovic, Y. Wang, P. Stuedi, A. Trivedi, J. Pfefferle, and C. Kozyrakis. Pocket: Elastic ephemeral storage for serverless analytics. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 427–444, 2018.
[48] Kubeless. http://kubeless.io.
[49] Kubernetes: Production-grade container orchestration. http://kubernetes.io.
[50] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, July 1978.
[51] L. Lamport et al. Paxos made simple. ACM SIGACT News, 32(4):18–25, 2001.
[52] Y. Lee, A. Scolari, B.-G. Chun, M. D. Santambrogio, M. Weimer, and M. Interlandi. PRETZEL: Opening the black box of machine learning prediction serving systems. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 611–626, Carlsbad, CA, 2018. USENIX Association.
[53] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, A. Fogh, J. Horn, S. Mangard, P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg. Meltdown: Reading kernel memory from user space. In 27th USENIX Security Symposium (USENIX Security 18), 2018.
[54] W. Lloyd, M. Freedman, M. Kaminsky, and D. G. Andersen. Don't settle for eventual: Scalable causal consistency for wide-area storage with COPS. In SOSP '11 - Proceedings of the 23rd ACM Symposium on Operating Systems Principles, pages 401–416, Oct. 2011.
[55] W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Stronger semantics for low-latency geo-replicated storage. In Presented as part of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), pages 313–328, 2013.
[56] P. Mahajan, L. Alvisi, and M. Dahlin. Consistency, availability, convergence. Technical Report TR-11-22, Computer Science Department, University of Texas at Austin, May 2011.
[57] F. Manco, C. Lupu, F. Schmidt, J. Mendes, S. Kuenzer, S. Sati, K. Yasukata, C. Raiciu, and F. Huici. My VM is lighter (and safer) than your container. In Proceedings of the 26th Symposium on Operating Systems Principles, pages 218–233. ACM, 2017.
[58] G. McGrath and P. R. Brenner. Serverless computing: Design, implementation, and performance. In 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW), pages 405–410. IEEE, 2017.
[59] S. A. Mehdi, C. Littley, N. Crooks, L. Alvisi, N. Bronson, and W. Lloyd. I can't believe it's not causal! Scalable causal consistency with no slowdown cascades. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pages 453–468, Boston, MA, Mar. 2017. USENIX Association.
[60] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, IMC '07, pages 29–42, New York, NY, USA, 2007. ACM.
[61] E. Oakes, L. Yang, D. Zhou, K. Houck, T. Harter, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. SOCK: Rapid task provisioning with serverless-optimized containers. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 57–70, Boston, MA, 2018. USENIX Association.
[62] Home | OpenFaaS - serverless functions made simple. https://www.openfaas.com.
[63] Apache OpenWhisk is a serverless, open source cloud platform. https://openwhisk.apache.org.
[64] R. Pang, R. Caceres, M. Burrows, Z. Chen, P. Dave, N. Germer, A. Golynski, K. Graney, N. Kang, L. Kissner, J. L. Korn, A. Parmar, C. D. Richards, and M. Wang. Zanzibar: Google's consistent, global authorization system. In 2019 USENIX Annual Technical Conference (USENIX ATC '19), Renton, WA, 2019.
[65] M. Perron, R. C. Fernandez, D. DeWitt, and S. Madden. Starling: A scalable query engine on cloud function services. arXiv preprint arXiv:1911.11727, 2019.
[66] Q. Pu, S. Venkataraman, and I. Stoica. Shuffling, fast and slow: Scalable analytics on serverless infrastructure. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), pages 193–206, 2019.
[67] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. SIGCOMM Comput. Commun. Rev., 31(4):161–172, Aug. 2001.
[68] M. Raynal and M. Singhal. Logical time: Capturing causality in distributed systems. Computer, 29(2):49–56, Feb. 1996.
[69] Tutorial: Design and implementation of a simple Twitter clone using PHP and the Redis key-value store | Redis. https://redis.io/topics/twitter-clone.
[70] pims/retwis-py: Retwis clone in python. https://github.com/pims/retwis-py.
[71] R. Rodrigues, A. Gupta, and B. Liskov. One-hop lookups for peer-to-peer overlays. In Proceedings of the 11th Workshop on Hot Topics in Operating Systems (HotOS '03), 2003.
[72] A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In IFIP/ACM International Conference on Distributed Systems Platforms and Open Distributed Processing, pages 329–350. Springer, 2001.
[73] Nokia Bell Labs Project SAND. https://sandserverless.org.
[74] V. Shankar, K. Krauth, Q. Pu, E. Jonas, S. Venkataraman, I. Stoica, B. Recht, and J. Ragan-Kelley. numpywren: Serverless linear algebra. CoRR, abs/1810.09679, 2018.
[75] M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski. Conflict-free replicated data types. In Symposium on Self-Stabilizing Systems, pages 386–400. Springer, 2011.
[76] A. Singhvi, K. Houck, A. Balasubramanian, M. D. Shaikh, S. Venkataraman, and A. Akella. Archipelago: A scalable low-latency serverless platform. CoRR, abs/1911.09849, 2019.
[77] Y. Sovran, R. Power, M. K. Aguilera, and J. Li. Transactional storage for geo-replicated systems. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pages 385–400. ACM, 2011.
[78] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. ACM SIGCOMM Computer Communication Review, 31(4):149–160, 2001.
[79] R. R. Y. Taft. Elastic database systems. PhD thesis, Massachusetts Institute of Technology, 2017.
[80] D. B. Terry, A. J. Demers, K. Petersen, M. J. Spreitzer, M. M. Theimer, and B. B. Welch. Session guarantees for weakly consistent replicated data. In Proceedings of 3rd International Conference on Parallel and Distributed Information Systems, pages 140–149. IEEE, 1994.
[81] E. Van Eyk, A. Iosup, S. Seif, and M. Thömmes. The SPEC cloud group's research vision on FaaS and serverless architectures. In Proceedings of the 2nd International Workshop on Serverless Computing, pages 1–4. ACM, 2017.
[82] W. Vogels. Eventually consistent. Communications of the ACM, 52(1):40–44, 2009.
[83] T. A. Wagner. Acquisition and maintenance of compute capacity, Sept. 4 2018. US Patent 10067801B1.
[84] L. Wang, M. Li, Y. Zhang, T. Ristenpart, and M. Swift. Peeking behind the curtains of serverless platforms. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 133–146, 2018.
[85] C. Wu, J. Faleiro, Y. Lin, and J. Hellerstein. Anna: A KVS for any scale. IEEE Transactions on Knowledge and Data Engineering, 2019.
[86] C. Wu, V. Sreekanti, and J. M. Hellerstein. Autoscaling tiered cloud storage in Anna. PVLDB, 12(6):624–638, 2019.
[87] X. Yan, L. Yang, H. Zhang, X. C. Lin, B. Wong, K. Salem, and T. Brecht. Carousel: Low-latency transaction processing for globally-distributed data. In Proceedings of the 2018 International Conference on Management of Data, pages 231–243. ACM, 2018.
[88] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, pages 2–2. USENIX Association, 2012.
[89] E. Zamanian, C. Binnig, T. Kraska, and T. Harris. The end of a myth: Distributed transactions can scale. arXiv preprint arXiv:1607.00655, 2016.
[90] M. Zawirski, N. Preguiça, S. Duarte, A. Bieniusa, V. Balegas, and M. Shapiro. Write fast, read in the past: Causal consistency for client-side applications. In Proceedings of the 16th Annual Middleware Conference, Middleware '15, pages 75–87, New York, NY, USA, 2015. ACM.
[91] I. Zhang, N. Lebeck, P. Fonseca, B. Holt, R. Cheng, A. Norberg, A. Krishnamurthy, and H. M. Levy. Diamond: Automating data management and storage for wide-area, reactive applications. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 723–738, 2016.
[92] T. Zhang, D. Xie, F. Li, and R. Stutsman. Narrowing the gap between serverless and its state with storage functions. In Proceedings of the ACM Symposium on Cloud Computing, pages 1–12, 2019.
