<<

Draft of Sep. 25, 2017 (under double-blind review; please do not redistribute) . ], gg 29 , 19 , better than 16 [ × parallelism out model substitution icecc is built on two key con- , and can be applied to arbitrary gg provides a novel mechanism gg can extract gg , which are self-contained units of distcc gg , keeps track of intermediate dependen- represents work graphs through an ab- on one of the most challenging and well- thunks ’s thunks are self-contained units of com- gg can be used to implement video processing gg gg gg gg can evaluate them in diverse environments, in- gg Because thunks identify their complete functional foot- These gains are due both to increased parallelism ( To achieve these results, Unlike previous work, Thunks: processing job, a thunk might specify one chunk of the print, cluding a local sandboxEach or thunk is an named AWS canonically Lambdation in task terms function. and of dependencies its (which computa- can be other thunks), outsourcing to a long-running 64-core build server. can exploit thousand-way parallelism fromfrastructure, serverless and in- can infer parallelismfound opportunities in not the original Makefile)trips, and because fewer network round cies in the cloud andbuild does products not back need to to the fetchshow local intermediate that machine. In addition, we and MapReduce workloads similar toera [21] those and in PyWren ExCam- [32]. cepts. First, straction called computation specifying both the executabledependencies. to Second, run and its for automatically capturing an accurate dependencyfrom graph an existing application, called putation that specify both anflow executable step to and run its for dependencies aample, and work- in environment. For a - build system, the source thunk file for will compiling reference, ahashes as single of dependencies, the the content sourcedependencies, file, and the the header files binary it itself. In includes a as video processes and executing it on serverless platforms. scientific workflows, and video-processing pipelines.first We evaluate studied parallelization problems: software builds.setting, In we this show that of builds than traditionaltools parallel such and distributed as build tomatically, and that it performs up to 3.9 video and the encoder binary that will operate on it. workflows of -like processes, such as software builds, while inferring the dependencies between build steps au- We briefly outline these concepts in turn. 1 ]. Re- 22 APIs and Stanford University , 18 locally on the ] and MapRe- , as fast as out- 7 A Thunk to Remember: 21 × general ], and all the major to 4 32 × , make -j64 21 , 4 (and other jobs) on functions-as-a-service infrastructure Sadjad Fouladi Dan Iter Shuvo Chatterjee was 3.7 Christos Kozyrakis Matei Zaharia Keith Winstein gg outperforms existing schemes for gg : a system for executing interdependent soft- as fast as running , a system for efficiently and easily captur- gg × gg to 2.2 make -j1000 × Making hyper-elastic platforms broadly available to We found that systems for accessing them, as opposedspecific to systems the designed application- in priorpropose work. To this end, we ing a parallelizable application consisting of a DAG of minimum billing increments forsearchers have VMs already [ taken advantage ofto these implement platforms hyper-elastic versions ofapplications compute-intensive including video encoding [ duce [32]. more applications, however, will require cations. 
As cloud- platforms have developed, they have invariably become more fine-grained:ple, for cloud exam- vendors today offercomputing” hyper-elastic platforms “serverless that can launchcontainers thousands within of Linux seconds [ cloud vendors have switched to one-minute or ten-minute applicable to a broad range of tasks. 1 Introduction opportunity to dramatically rethink many computer appli- infrastructure with thousand-way parallelism. accelerating compilation—with large projectsinkscape and such LLVM, as sourcing compilation to a remote 64-core machine, and 64-core machine itself—and that the thunk abstraction is anywhere; (b) a system todency automatically infer tree the of depen- a softwareas build a system directed acyclic and graph synthesizeof of it the thunks, by system replacing with stages “models” that capture dependencies solves thunks recursively on public functions-as-a-service that run in parallelsystem on includes public three cloud major infrastructure.change contributions: The format (a) for an representing inter- “thunks”—programstheir and complete data dependencies—that can be executed Abstract The scale and elasticity of cloud platforms gives us an with fine granularity; and (c) an execution engine that re- ware workflows across thousands of short-lived “lambdas” We present 1.2 Draft of Sep. 25, 2017 (under double-blind review; please do not redistribute) ]. ], ], is gg 35 32 39 gg icecc to imple- inkscape gg ], CIEL [ com/gg-anon. 31 . offers general and gg to capture each task, and thunk treats computations as a DAG can build up the thunk graph from gg gg . gg can execute the same build system in 1.25 ’s thunk abstraction is general enough to differs from these systems in two important gg gg gg ], and many scientific workflow systems [ 50 ] and a MapReduce engine similar to PyWren [ is open-source software; we have posted an anony- ’s thunk abstraction differs from tasks in typical 21 is related to several different classes of systems. In summary, we believe that the ability to divide com- gg gg Compiling LLVM (a toolkit for writing ) re- These gains come with a caveat: automatically infer- Apart from software builds, we also use a self-contained process invocation with alldencies of captured, its making depen- itunmodified possible multi-process applications to (e.g., take buildand scripts) pieces execute of them incrementally orcomputing in parallel infrastructure. on Second, remote higher-order thunks capture these applications. mon computational tasks into fine-grainedcute tasks them and over exe- hyper-elastic functions-as-a-service plat- forms like AWS Lambda, Google CloudOpenWhisk, Functions, and IBM Azure Functionsfoundational use will of become these platforms. apowerful new mechanisms to achieve this forof applications—multi-process a Unix-like common applications. class is effective for a widebarrassingly range parallel” of MapReduce tasks to ranging parallel fromcomplex, builds “em- irregular with dependencies. mous version for review https://github of tasks, in a similarSpark manner [ to Dryad [ However, the manner in which an arbitrary (Unix-like) application using models. By contrast, minutes on AWS Lambda,subsecond with granularity. each stage billed with quires 86 minutes on a singleto core, 4 outsource minutes to using a 64-core VM in the same region, and ring dependencies in the firstoriginal place build requires system running (e.g. 
the make) withother the programs compiler replaced and with models.itself The build can system become a bottleneck,recursive make especially and if many it dependencies involves andchine the is client not ma- well-endowed with CPUstart, resources. inferring On a the cold treerequired of 2.5 dependencies minutes for on the client machine (the 4-core ment a video processingera [ workload similar toto show ExCam- that 2 Related Work gg Workflow systems. workflow systems in two ways. First, a thunk in ways: its abstraction of a VM), and 2.75 minutes for LLVM. 1.2 minutes with 2 , and can sub- ranlib , gg distcc to outsource , ld (a free-software , description of the ): overspecifying ). This sometimes icecc g++ , make also provides a novel and gcc . As the driver process runs, inkscape Makefile gg fine-grained , or even a shell script), by sim- Although applications can spec- PATH through several example applica- for each CPU-intensive executable ninja gg , to memoize and reuse thunk results. can automatically capture the dependency ], require many round trips between a mas- gg automatically discovers a fine-grained but gg ’s model for the C linker determines which cmake 29 , ), , gg gg ’s execution engine allows it to upload all input 16 common executables ( [ model programs gg make six on AWS Lambda outperforms existing parallel out- to implement a parallel build accelerator. Builds have For example, compiling We demonstrate Model substitution: illustration tool) requires 33.5 minutescore on VM, one and core 5 of minutes abuilds when 4- to using a separate 64-core VM in the same EC2 region. sourced build systems running inrequiring the changes EC2 to cloud, the without program’s build system. stitution, accurate dependency graph for athe build simply existing by build running systemfinds (e.g. more parallelism than the build system hasfiles itself. and submit theround execution trips, running graph with without up to repeated 1000-way parallelism. 1.1 Summary of results gg ifying them will leadrent to parallel incorrect build-outsourcing tools, results. such Second,icecc as cur- ter server and its workersbuild (e.g., products), to and send perform back poorly intermediate connection on to a a higher-latency public cloud. In contrast, using model sub- not supported by previous “serverless”gg systems, we used traditionally been challenging tofor parallelize two reasons. efficiently First, totool leverage needs parallelism, an the accurate build and dependencies (e.g., from a dependencies will reduce parallelism, while underspec- legacy applications; for example, byjust including models for and graphs of many large open-source projects. tions. First, to showcase a complex application that is quired to determine each execution’sexample, dependencies; for libraries will be consulted todoes produce the not output actually file, link but outputs them. Each a model thunk program representinginputs. then We that found that computation models and aretomatically an its and effective correctly way infer to dependencies au- from large called model substitution. In thisstitute mechanism, invoked by a high-level driver process for the application ply placing them in the these models execute only the minimal computation re- ify a thunk graph directly, easy-to-use mechanism for capturing andency accurate graph depen- from an existing multi-process application, Then, (e.g., which allows Draft of Sep. 
25, 2017 (under double-blind review; please do not redistribute) ] 33 ’s main ’s mem- ’s goal of gg gg ], VM migra- ’s thunks can the build input 49 gg , ], whether alone ] and relational ]. Snowflock [ all 9 [ 48 , allows for cached , 50 17 ] uses AWS Lambda , 44 dependencies, sufficient 21 45 allows leveraging server- Beyond build systems, au- icecc all ccache , gg can upload or 24 applies this technique to thunks gg automatically enables incremen- gg ]. ExCamera [ gg distcc 34 , 25 , however, we chose to explicitly represent , gg ] and container services [ ], so steps such as multiple file compilations, 10 9 ] proposes a MapReduce implementation on 40 . In , 32 12 lar execution graphs suchsystem. as Moreover, those that arisetal in re-execution a when build only partprogram of changes, the as input well data asintermediate or shared files the caching among of multiple input users. also or be used to express videoas coding we tasks show or in MapReduce, Section 4.5. tomatic incremental computationfor has data beendatabases flow proposed [ systemsrepresenting [ arbitrary Unix-like processes andstorage uses to cloud efficiently cache and reuse computation results. Process migration and distributed OSes. transparently running a process on remotesimilar infrastructure to is that of distributed OSestion [ [ combines VM cloning with distributederate execution to parallel accel- applications through aand VM-fork abstraction shows that thisdistcc abstraction works efficiently under the computation graph up-front innumber order of to network minimize round the tripscomputation and and to caching. enable incremental Cloud computing. to scale out videothousands encoding of and serverless processing function invocations, tasks while over Py- less computing platforms for aloads much using broader set the of abstraction work- of thunks, including irregu- retrieving each intermediate file back tosystems the thus master. perform These best onsubstantial latency a when local building network, on andin more the add remote cloud. servers In contrast, once and have thunks execute and exchange data purely Caching of build products. or in concert with compilations, reducing the requiredcompiles for of subsequent files that havelimitation not is changed. that at present,pilations it [ only caches single filelinking, com- or any other partBazel of also the caches build process build products, area but re-run. it system will include not file detect hasoization if changed. In system contrast, is agnosticbeing to performed the and captures offor computation diverse users to share a common cache (§3.4). Incremental recomputation. AWS Lambda. In contrast, within the cloud, minimizing the impact of latency. 
Wren [ 3 or ], and is able 29 is able to [ gg ]) and out- 8 gg Makefile icecc ], ], misconfiguring the 16 [ 27 (with 64-way parallelism) ], and Bazel [ , send data between a master 19 file; it seeks to work with ex- Numerous build systems (in- gg [ (also 64-way parallelism) when distcc tool processes subdirectories se- icecc make BUILD ], and parallelism than the original application script, to identify dependencies and en- 28 running the full computation (e.g., dedu- is not a build system itself and does not make -j64 automake or Bazel more gg distcc ]) seek to incrementalize, parallelize, or dis- to reference and analyze properties of derived BUILD without 37 gg [ does require model programs to be designed for ’s model programs, meanwhile, are a novel way to tries to address: requiring a manually-specified de- can identify dependencies at a fine granularity within In addition, many existing remote compilation systems, gg Such misconfiguration is not uncommon: for example, Virtually all existing build systems analyze a manually- gg they are shared across many applications. including node and the workers frequently during the build, e.g., each case. The number of coresto is unlock the more same, parallelism but from the underlying Makefile. every executable that will consumesources, significant and CPU every re- executable thatstream” will to process be the used output “down- number of of such such programs, executables but is the typically small (5–10), and quentially in order; in effect creatingedges spurious that dependency reduce the available parallelism. infer these dependencies accurately with fine granularity. outperforms compiling FFmpeg and Mosh, by about a factor of two in Bazel’s able incrementalization or parallelism. Whilesis this can analy- be highly sophisticatedbuild [ will cause the system to run slower than necessary the popular gg pendency graph, which createsspecify problems dependencies, if and users inefficient over- high-latency execution network (large over number a of round trips). specified configuration file, such as a cluding Vesta [ sourcing tools (such as mrcc tribute compilation to more-powerful remote machines. unlocking exposed. require rewriting or replacingMakefile mechanisms such as the isting software packages by automatically inferringdencies depen- from the existing buildexecution system, to then serverless outsourcing infrastructure. Build systems and tools. capture a dependency graph. By replacing each executable termines which files agg program invocation depends on, existing applications, such as build scripts—sometimes allow results plicating two references to theor same running higher-order a thunk model programdetermine that the can effect process on a a thunk downstream to computation). The state of the art generally has two limitations that (if unneeded dependencies are given). As we show in Section 4, with a model that can take thunks as input and only de- Draft of Sep. 25, 2017 (under double-blind review; please do not redistribute) ]. The process 1 represents a thunk as gg computation on data (a) a name, (c) the size in bytes, and (b) the SHA-256 hash of its contents, (d) its “order.” ’s goal is to outsource computations from the user’s The SHA-256 hash ofcutable an using x86-64 the ELF Linux binary kernel ABI. exe- A list of arguments andthe environment variables executable that will be invoked with. The name of thehas output finished running, file. the After contents theconsidered of the this executable value file of will the be thunk. 
A list of “infiles”—theaccess. files These that include the data program filesgram, accessed will other by executables invoked the by pro- it, andlibraries any loaded shared at startup time.fied Each by: infile is identi- An infile whose contents are known at the time the In addition, the thunk needs to be portable to a variety To satisfy these requirements, gg This representation has several design goals. First, it 2. 3. 4. 1. thunk is created will have order zero, and the SHA-256 one by one, would require too many round trips compared of computing environments and toa-service different providers. functions-as- It should identifycacheable, inputs in content-addressed a way, granular, because someenced data refer- may already be availablemay in the need cloud to and be some uploaded.a For particular any dataset, particular there execution should on the be thunk, a so canonical that name for thenot thunk be can executed be twice. memoized and need a JSON document (Figure 1) that contains: 3.1 Thunks: units of delayed deterministic In the functional-programming literature,parameterless closure a (a function) thunk that is capturesof a a snapshot its environment for laterof evaluation evaluating the [ thunk—which requires firstany evaluating thunks that the thunkknown has as captured, “forcing” the recursively—is thunk. computer to thousands of functions in theswer cloud, wherever in the a computationdesigned is a concrete run. realization To of docomputation—a a thunk—in self-contained this, this unit we context. of needs to identify all thethe information computation. necessary Because to a execute cloudmay execution be environment a significant distancethat from simply the exporting user, the we user’sand believe filesystem allowing to dependencies the to be cloud discovered dynamically, with loading all dependencies upfront in one shot. way that guarantees that the user will get the same an- 4 , etc.) ”. The bazel , : gg ninja make -j1000 , ’s representation, so that cmake gg , for forcing a thunk, and all thunks a directed acyclic graph of inter- make infer abstraction for representing a morsel of thunk is designed as a general system for representing the hash of its output,forcing and of can the short-circuit same the thunk. later Execution engines that it depends on,ments: recursively, in locally, outsourced various to environ- a remoteon computer, functions-as-a-service or infrastructure. Thesegines en- also memoize thunka evaluation: correspondence they between record the hash of a thunk and Software to dependent thunks from an arbitrarysystem software (whether build that makes use of alinker, C etc.). This or component C++ is toolchain specificcompilation. (compiler, to software tional footprint: the arguments,content environment, hashes and of all files it canthunks. access, This component some is of agnosticment to where the the environ- thunk willecuted), be and “forced” agnostic (meaning to ex- the computation performed. The deterministic computation in terms of ahash function of (the an x86-64 executable) and its complete func- which may be the output of other as-yet-unevaluated In total, these components are implemented in about To explore this idea in concrete terms, we built a set of gg 2. 3. 1. 8,800 lines of C++. We describe each in turn. users can outsource builds to the cloud withfollowing thousand- are the major components of filtering), and instead outsourcing itlived to thousands parallel of threads short- innear-interactive the completion time. 
cloud, in order to achieve components that transform one type of CPU-intensive job large workflows that can beservice executed on infrastructure. The functions-as-a- expectation ispursue that a users variety will of “laptopcomputation extension” tasks, that by might taking normally a runtime locally (software compilation, for interactive a data long exploration and visualization, machine learning, video encoding and to invoke thousands of parallelstartup threads times with and subsecond subsecond billing.intended Although originally for asynchronous Web microservices, recent encoding [21] and data analysis [32]. 3 Design and Implementation Functions-as-a-service infrastructure presentsparallel-processing substrate: services a like AWS Lambda, new Google Cloud Functions, and Azure Functions allow users voke thousands of threads at once, for tasks such as video (software compilation) into way parallelism—the equivalent of “ work has used such infrastructure to synchronously in- Draft of Sep. 25, 2017 (under double-blind review; please do not redistribute) , bazel , which a cloud , which , to be used brk gg within ninja socket , , and cmake , getcpu make , 1 Linux system call. It exits with an ptrace ’s thunks are designed to be: (1) deterministic gg gettimeofday , cache (indexed by a hash of its contents) to see if its If not, the executable isments run with and the environment provided variables—optionally argu- insandbox a that enforces the boundariesand fails of with the an thunk errorreference if data the not executable attempts included to in the thunk. The contents of the outputof file the are thunk. taken as the value value has already been memoized. We also link executables with a modified version of libc that pre- 1 A thunk is somewhat akin to an operating-system con- The basic approach is to model each stage of the compi- We implemented the sandbox in about 800 lines of 5. 4. dencies of the underlying program with perfectelse recall forcing the thunk will fail.dependencies It with should also good capture precision, these orinclude else unnecessary the thunk files will that will need to be uploaded footprint specified in the thunk. tainer. Compared with Dockerschemes, containers andand similar content-addressed, by naming theits executable inputs and by all hash, (2) expressableorder in thunks terms of as other, input lower- function files, provided (3) by functions-as-a-service runnable infrastructure, and (4) factorable—some infiles (liketem the libraries) compiler may or already sys- bethose available in infiles the should not cloud, need and the to user be wishes re-uploaded to every time force a thunk remotely. 3.2 Modeling compilation withHaving thunks designed a thunk abstractioncation to express of the an appli- x86-64 executablebuilt to a series named of data, tools wea to next software infer build a system. thunk These for tools each allow output of raw shell scripts, etc. lation process with a programior that of understands the the underlying behav- programthe just model enough is so invoked, that it when . can The resulting out thunk a must thunk capture and the quickly future depen- C++, using the error if the program referencesthe any thunk. file It not also mentioned prevents in getpid the use of system callscould such be as used to introducebox outside does information. not The guarantee sand- perfectlyan deterministic executable execution; can still call system callsmay such return as different results onmore different like invocations. 
It a is unit tois gain not confidence inadvertently that accessing the data executable outside the functional vents use of the vDSO to get the time or CPU number without a syscall. (every file referenced must be mentioned in the thunk), or with most build systems: 5 "preprocessed_output", "-o", "compiled_output", "-S", "-specs=/__gg__/gcc-specs" ], "args": [ "-x", "cpp-output", "hash": "sJ.rk55", }, ) is a thunk of order 1, making this a second-order An example thunk, describing the compilation stage "hash": "KSwy9c", "order": 1, }, "hash": "sJ.rk55", "size": "1197328", "order": 0, }, "hash": "wYsT5z3", "size": "10381", "order": 0, }, "hash": "LuVgfa3", "size": "23787992", "order": 0, } ], infiles (whose contents are theoriginal result of infiles). forcing This the changesoverall the thunk so contents that of it is the nowThe a first-order first-order thunk. thunk can then be looked up in a files are named by their hashes). If the thunk is notzero-order a infiles first-order thunk, must then firstthemselves. its After be non- this forced process completes, recursively nal the thunk origi- is then rewritteninfiles so are that replaced the with higher-order corresponding zero-order Retrieve the infiles, eachof one its identified contents. by These a mayaddressed come hash storage from (e.g. any a content- directory or database where KSwy9c { "filename": "g++", { "filename": "gcc-specs", { "filename": "cc1", { "filename": "preprocessed_output", "outfile": "compiled_output" } "infiles": [ 2. 3. 1. 3.1.1 Execution environment follows several steps: order, with one or more ofother its (as-yet-unevaluated) infiles thunks. taken In as this the case,256 output hash the of is SHA- taken over the contents of the referenced thunk thunk is one greater than the maximum order of any infile. some less-important fields have been omitted. hash that identifies the filecontents. will The resulting be thunk taken is overthunk: considered its a a actual thunk first-order that canits immediately infiles be are executed all because actual files. A thunk may also be of higher in a “Hello world” program,assembly. from The preprocessed preprocessed C C to code compiled (the first infile, hash starting thunk. SHA-256 hashes have been shortened for display, and Figure 1: { "function": { "exe": "/__gg__/g++", To force a thunk and retrieve its value, the execution with (the JSON document), and the order of the referencing Draft of Sep. 25, 2017 (under double-blind review; please do not redistribute) , , -E model is a thunk inkscape gcc takes this as to detect if a ) to obtain this models by run- model g++ ptrace gg-mock gcc -M options, which the mod- read a single file that is and gcc gcc that uses ranlib toolkit, applications like and directive, recursively. The model uses are more difficult. We model the com- gg-mock llvm gcc strip cause the compiler to stop after preprocessing, , and OpenSSH, and the Linux kernel. These build -c #include Tools like We torture-tested the To model the preprocessor, the thunk must include the In some cases, these references may not be sufficient; , and full list of dependencies, including every file included a feature of the reallist. preprocessor The ( model locally caches thekeyed output by of the this hashes command, of theIf contents none of of the those files referenced has files. can changed, safely the reuse preprocessor the model previous dependency list. 
We use a they are “thunk-aware” by tryingfile to that open has a a nonexistent canonicala filename; signal to stop tracing the program. 3.2.1 Implementation details of models named on the command linemodels and are modify trivial. it in-place; these piler as aing, sequence compiling, assembling, of and up linking.parses The to the command-line four options and stages:of determines the which preprocess- four stages will be executed. (Arguments like compiling, or assembling.) The model constructsthunks separate for each stage,source so files that may still compiling resultexample, two their in different preprocessed a output is hit the if, same. for ning them against anincluding array the of free-softwareffmpeg programs, systems use a wide variety of els now handle. derlying tool. A thunk referenceparticular thunk, includes but the is hash also ofof valid a file. syntax For for example, another the kind an output executable, of so the the linker output isreference of typically that the linker acts like anthat executable: includes it is the a hash shellforce of script the the thunk thunk. to If compileexecute the executed, it. helper it When program modeling will and a then compilera stage library, that the produces model writestactically a valid thunk linker reference script, that in isto case a link the syn- against build a system build tries product. for example, a build systempiled might output into seek a to ZIP bundleagainst file, com- or a compare canonical compiled reference output and take action basedper on tool called reading; if so, it pauses executionuntil of the the thunk thread can in question bebe forced replaced and with the the thunk value reference of can the thunk. Models indicate Tools like with a whether it matches. To handle this case, we built a wrap- “non-thunk-aware” program opens a thunk reference for -S 6 gcc hello.c gg-infer followed by . This tool runs stdio.h hello.i hello.s hello.o ) replace the underly- gg-infer gcc hello hello step, and their output is then ), and for ancillary tools like (stripped) closeout.o libhello.a closeout.c closeout.i closeout.s gg-infer make : a directory in the filesystem that must handle the case where the example program. (Many system header ranlib libc gg-infer hello and string.h example program, is shown in Figure 2. object store gg-infer .o dirname.s dirname.i ar Part of the DAG of thunks inferred with compilers, the GNU assembler and linker, the . Each of these models parses the command-line hello g++ To deal with this case, our models instead write a thunk However, Build systems often include scripts that run in addition At runtime, the user can extract the dependency tree We built models for the GNU preprocessor, the dirname.c the helper program isable a to thunk, be the executed. program will not be run during the captured in a thunk (e.g. by the preprocessor). build system tries to runtimes or build systems link do against not acompile simply thunk. a run Some- helper scripts: program they and first cause then difficulties execute it. because This if could the file that should contain GNU to these standard toolsheader files), (e.g. but to typically generateof such configuration the scripts preprocessor, run compiler, “upstream” difficulty etc. for These our approach, do because not the cause scripts are simply for a software package by running the build command, e.g., the given command after changing the PATH environment ing tools. 
An example DAG of thunks, inferred from the and archiver ( strip arguments for the underlying tool,output and files writes that to the the real same a program reference would to write a to, thunk but instead with of the true output. files and other dependencies omitted to simplify the diagram.) thunk is created, all ofcopied the to dependencies the are captured and contains each infile, named by the hash of its contents. Figure 2: from the GNU variable so the models (e.g. for when forcing the thunk on a remote machine. When the “reference” to whatever files would be written by the un- Draft of Sep. 25, 2017 (under double-blind review; please do not redistribute) proceeds as . Any thunks writes a mem- y are rewritten to x —in our implemen- gg-force gg-force reads the thunk in question and remote object store only runs as many parallel thunks as the gg-force (the local directory that stores infiles named evaluated to an output of hash gg to explicitly force it. The method of forcing x , and their order is recalculated (as one plus the y Path subgraphs are linear chainsthe of output thunks, of where each thunk is theciently input forced in to the the samethe next. cloud dependencies function, will because already benot available and have to will be retrieved again from storage. These can be compressed into one thunk, or effi- needs to ensure that the thunk’s infiles are available When each thunk completes, The user starts with a thunk that represents the desired 3. puter or a cloudgg function). Before executing thein thunk, the foreign environment. Therepresentation system of maintains a a local tation, an Amazon S3 bucket. Every time it uploads astore file so it will not need to re-upload it subsequently. gg-force depends on which environment the user has selected. 3.3.1 Executing thunks locally In this mode, recursively builds the DAG of thunks that it depends on. object store by hash). All first-order thunks canBy be default, executed in parallel. user has cores. oization entry to the filesystem, recording that athat thunk depended on the completedrefer thunk to maximum order of any infile).can Any new then first-order be thunks executed, andoriginal the thunk process is continues executed. until the 3.3.2 Executing thunks remotely above, with the exceptionhappens on that a actual remote thunk computer (either execution a persistent com- 3.3 Forcing thunks locally or in the cloud locally on a Linux computer,ning remotely a to persistent a webserver, computer and run- onfunctions a service commercial (AWS cloud- Lambda).the These thunk systems, abstraction, and are agnosticexecuted to and whether the it process is being relatedor to not. software The compilation models inspecific the to previous software section, compilation. however, are output of a large computation.reference, If they it can is simply an executable executeand thunk it then to run the force resulting the program; thunk otherwise they can run These thunks are all identified by hash and located in the (named by hash), it records this fact to the local object with hash When outsourcing computation, We built systems to force thunks in three environments: 7 , ’s cc1 gg , and musl , as , , ld ar , sandbox. to generate an gg . At startup, the C ranlib , ar ranlib , ’s approach opens up the or another library function gg g++ , (to create the archive)—which thunk-aware open minimalist standard C library to gcc , ar s musl (debugging) flag can be eliminated from the collect2 -g , . ally.) 
The compiler’s command-line if the eventualgoing output to is be stripped. Many build systemsarchive run index, but thisit comes step after is ait null often operation does. In if thatskipped. (Currently, case, we this do stage this optimization can manu- be safely seeks to execute thunks efficiently either locally or in We ensure the correctness of this indirection layer, and We did not modify the tools, but we statically linked The models also cache the hash of referenced files by 1. 2. of thunks. For example: produce the same output. 3.2.3 Whole-tree optimizations of aBy DAG of first thunks extracting the entireduces chain the of operations compiled that output, pro- possibility of optimizations that apply to the whole DAG of the models, bynormally compiling with the many toolchain, projects andsteps two by and modeling ways: then the forcing same thunksregression inside the tests verify that both methods succeed and correspondence table between names and hashes. Later, that references a filename, thea C library reference replaces to this the with of file’s its true contents). name This (theprograms layer to SHA-256 become of hash transparently indirection thunk-aware,modifying without allows the these filenames visible to the program, which them against the reduce their total size. Additionally,to we make modified the programs library checks for an environment variablethe that thunk, contains parses the thunk’s list of infiles, and makes a code), but also the executable itself. We built canonical run efficiently on the user’s machineable but in will the also be cloud. avail- Thisprograms toolchain from includes GCC the 7.2.0 following andcc1plus binutils 2.2.8: strip the cloud (on a remote VMstructure). or The functions-as-a-service thunk’s infra- functional footprint includeshashes the of data referenced by the executable (e.g. source similar technique to model the linker. inode number and nanosecond timestamp;not if changed, these the have models do not re-hash the input file. 3.2.2 Building a canonical toolchain gg x86-64 versions of the GNU C/C++ toolchain that can would have affected debugging and diagnostic output. when the program calls Draft of Sep. 25, 2017 (under double-blind review; please do not redistribute) ], 46 [ has a ], and x 13 achieves [ A crypto- gg achieves sig- as fast as out- locally on the × protobuf cmake gg memoization as- ], ], to 4 38 42 [ [ × A cryptographically every make -j64 mosh was 3.5 openssh gg ], —will be more cautious in whom .” y 20 gg [ ’s performance and that of comparison gg ]. No changes were made to the underlying , we measured its start-to-finish build times 30 gg as fast as running ffmpeg [ × ], has the following contents.” 36 [ to 3 graphically signed statement that, “The filex with hash Memoization assertions. signed statement that, “The thunk with hash Content-addressed storage assertions. value with hash × Our findings show that for local builds, We expect that entities that execute thunks might pub- In both cases, a skeptical organization can double- 2. 1. schemes, we collected agrams written selection in of C or open-source C++.of These pro- packages build cover complexity, a parallelization variety levels, andset size. of This programs consisted of llvm inkscape build system of these packages. collaborative use of they trust to makeassertions. memoization assertions vs. storage 4 Evaluation under multiple scenarios with severalsource unmodified packages. 
open- We compared these timesbuild with tools existing under the same scenarios. either similar performance to the underlyingor build in system, some cases (FFmpegnificantly and higher performance Mosh), by inferringinformation finer-grained about dependencies, unlocking morelelism. paral- For distributed builds of largeLLVM) outsourced projects from (Inkscape, a low-powered clientEC2 (a VM) 4-core to the cloud, sourcing compilation to a remote 64-core machine, and 64-core machine itself. 4.1 Evaluation set could be risky to blindlyoutput, trust in another case entity’s the compiler compiler has been compromised. lish two kinds of statements: check the assertion. Ifor an is entity compromised, makes it a will mistake, be lies, possiblehave to issued prove a signed to statement the One that is is demonstrably unlikely false. to double-check sertion, because doing so isthe equivalent to thunk. simply We expect forcing that users—even those interested in To evaluate To benchmark world that it should no longer be trusted, because it will 1.2 8 in uses collaborate gg-force , so that no two y may The lambda function 2 gg compiler requires more gg llvm We wrote routines in C++ to to predict this situation in advance. evaluates to output x first uploads any missing infiles to the remote gg-force We created different lambda functions for each combination of 2 Many entities build a lot of software—e.g., continuous Remote execution on a persistent host is similar: we To remotely execute a thunk on AWS Lambda, once by one of these organizations. At the same time, it executables, so the executables wouldlambda starts already and be would available not when need the to be re-downloaded from S3. users need run the same computation twice. testing infrastructure like Travis-CI, andbian, projects like Ubuntu, De- and Fedora. It would be convenient, and needed to rebuild a file that had already been compiled their caching of objects byparticular hash and thunk memoization that a these routines pipeline theloaded HTTP requests files), (e.g. uploading the 32 up- files back-to-back without found that S3 allows up to 100 pipelined requests in aupload row. and download throughput whenfiles, transferring even many compared with Amazon’s C++ SDK. 3.4 Collaborative use of a persistent host asreason the for backup the “size” “overflow” field server.to in The allow the thunk infiles structure is Implementation detail. speedily upload and download fileseral parallel from HTTPS S3 connections. using Within each sev- connection, can accommodate. This typically happenslinking during step; a e.g., final linking the than two gigabytes of infiles,500 compared MB with for a the limitaccommodate lambda’s the of filesystem, output. In which these also cases, must of the output are notchine; downloaded only to the the user’s hash, local sobe ma- that rewritten the in referencing terms thunks of can this output. created a CGI script thatinterface implements and the AWS run Lambda itexecuting with thunks the on AWS Apache Lambda, webserver. occasionally When a thunk over from a previous invocation), deletesnot any included files as that infiles are to themissing current infiles thunk, from downloads S3 any (the remotethe object function, store), and executes uploads thehash. output The to S3, lambda named function with alsothe its responds, SHA-256 via hash HTTP, with and size of the output. The contents gg-force object store, then invokesthunk a in the lambda HTTP function POSTlooks payload. 
with at its the local filesystem (which may contain infiles left This pipelining makes a considerable improvement in S3 would further improve the speed of compilation, if no user waiting for a response in between each one. We have will require infiles bigger than the lambda’s filesystem We envision that deployments of Draft of Sep. 25, 2017 (under double-blind review; please do not redistribute) ’s gg 2× gg ’s fine- gg caches the in building gg is roughly build using gg gg mosh in favor of newer, more , generate separate build ’s performance is hindered by and indicates that at a certain cmake or make automake The benefits of using Although AWS Lambda allows the user and FFmpeg locally, ffmpeg make running with the same degree of available is capable of extracting more parallelism has about 1500 files that require compilation; gg mosh make ffmpeg features, such as global caching. Figure 5 shows the timings from a The overhead is less severe in the case of an incremen- larger software packages are more significant. Asple, an exam- given enough cores, almosthappen all of in that parallel. compilation Figureend can 6 timings shows for the thunkpoint, start increasing parallelism and yields diminishing returns; In building faster than spurious dependency edges. This indicatesgrained that approach is effective infor reducing packages the that build consist time ofbuild many small systems, modules. like Some recipes for each module, andafter they another, build even each though module there one maybetween be each no of dependencies these modules.pendencies, By extracting the real de- than the original build system. Small packages. to execute thousands ofpackages jobs usually at do once, not have smallerNonetheless, this software these degree programs of can parallelism. stillgg benefit from other on AWS Lambda. For the preprocess stage, most of the preprocessing a single file mayheader depend files. on a Because few preprocessing hundred happensfiles for simultaneously, its all effect the on thetime start-to-finish is build negligible. Larger packages. for each package, running the nativemodel build substitution. system These through times areof significant larger in packages, the and case indicatechinery that itself the build (e.g. system recursive ma- make),required as to well hash the as contents thecapture of time them all in build thunks, dependencies can to consumeof a CPU significant and amount I/O. tal build, or without ahashes cold and cache, because dependencies of fileslying on build disk system and will the not under- However, attempt it to does re-build suggest every a file. significantapproach weakness of to attempting to retainunderlying compatibility build system, with rather the thanthors asking software to au- discard efficient build systems (e.g. Bazel).plan In to future make work, it we possible to adapt inferred thunksalongside into these a open-source packages. 4.4 Discussion of evaluation results (64-way) parallelism. workers’ time is spent on downloading the dependencies— “build system template” that can be distributed with or 9 . Ice- make ccache (not shown be- against As described in . No jobs were exe- gg separates the depen- 3 gg ccache gg ) and no remote machines ), outsourcing to a 64-core and on a completely cold cache gg . 
The package was built on a 4- In these two scenarios, the pack- With the same setup as the with 1000-way parallelism, again gg The package was built on AWS gg experiment, the target package is built (64): To evaluate the effectiveness of our gg-infer .xlarge The build was initiated on a 4-core client m4.16xlarge : The package was built locally using 64-way gg (1000): (64): ’s benchmarks ). For both, the build was done on a 64-core does preprocessing and linking locally, so it is not possible λ λ (64): - - gg gg standby EC2 VM acted asthunks the whose “overflow” worker total for infile sizelambda, was as was described too in big Section for 3.3. a Lambda using cute on AWS Lambda. gg cream (64+1) in a distributed fashion using cuted on the client machine. gg machine, and thunks were executed on AWS Lambda parallelism using core client ( comparison, the number of localthe jobs master) (executed on was limited to one. gg-remote age’s own build systemcore, was or executed with on up a to single 64-way parallelism (e.g. EC2 VM ( Icecream (64+1): make, make (64): with a standby EC2 VM for thunks too large to exe- with a maximum of 64 workers running at a time. A VM in the same region. In order to achieve a fair were involved. -j64 icecc 3 We ran each scenario on each package at least 5 times. 4. 1. 3. 2. 3. dency inference and execution stages.running Figure times 4 of shows the to fully outsource the build job. to populate the cache. Then,repeated after again, cleaning, this the time build on was almost a identical “hot between cache.” Results were cause of space limits). Dependency inference performance. § 3, unlike the other approaches, Figure 3 shows the median timings. Cached builds. caching mechanism, we compared Starting with an empty cache, the package was built once to evaluate 4.3 4.2 Baselines For each package, we measuredtime the in start three to different finish scenariosand as build distributed the builds: baseline for local We conducted the following experiments for each package 1, 2. Draft of Sep. 25, 2017 (under double-blind review; please do not redistribute) (1000) λ - partition gg : i P → ) (64) λ - gg mapped rebasing a compressed video . and then, in a serial process, function. Figure 8 shows the N ]. To demonstrate the expres- : , ¨ the first line of map output is a () () concatenates the infiles together 21 Y : … re-encoding the first frame of an : , → : extracting the final state of a com- ) reduce function goes through the infile Y encoding an uncompressed video to a ¨ : out ¨ , we also implemented this scheme in vpxenc rebase : : i -remote (64) → S → mapped R C ) gg mapped ) i. gg . → 1 → → ) R3 ( → , ) ) ) i U U,C,S,S P R2 input files and 3 reduce workers. ( ( ( C,S U,C,S , ( ( txt starting with a state, on top of another state. and sums up all of the values for each key. to create the outfile. compressed VP8 video. pressed video starting with a state. compressed video, starting with a state. list of offsets for partition program. program read the firstrange line that of it the should infile read. to find the N R1 i. ( ( In summary, the algorithm first encodes each chunk Figure 7 shows the dataflow for word fequency problem in parallel using rebases each output on topous of chunk the using state left bydependency the graph for previ- encoding athese batch functions. of 4 chunks using 4.5.2 Video encoding Previous work has used functions-as-a-serviceture infrastruc- to run interdependentmany-way video parallelism processing [ tasks with sive power of terms of thunks. 
The functions necessary to encodevpxenc a xcdec xcenc rebase map partition reduce with we defined the following set of functions: “batch” of video chunks were: 10 , gg ] and 15 (64) Icecream (64+1) build system, gg Comparison of build times in different scenarios described in § 4.1. inkscape (combining inference and to generate the thunks that it Local Distributed gg gg . In a truly cold client where to other applications 4 cores 64 cores in 1 minute, 15 seconds on AWS Figure 3: 4× gg ]. To implement this workflow using make make (64) 32 inkscape must be run anew on the 4-core client to extract . This indicates that serial steps (e.g. linking) MoshOpenSSHCMakeProtobufFFmpeg 05sInkscape 02sLLVM 15s 01m 28s 12s 02m 33s 03s 02s 02m 45s 09s 36s 03s 44s 49s Time required for compiles Results for LLVM are similar, even though the single- Figure 3 demonstrates that for very large packages, MoshOpenSSHCMakeProtobufFFmpeg 36s 04m 29sInkscape 31s 05m 26sLLVM 09m 45s 33m 35s 1h 16m 02s 18s 11s 17s 20s 01m 32s 40s 02m 33s 01m 03s 46s 05s 15s 02m 21s 22s 23s 05m 01s 04m 06s 06s 12s 28s 01m 36s 23s 02m 25s 03m 29s 01m 36s 02m 27s 53s 12s 49s 02m 35s 41s 01m 15s 01m 11s 59s 13s 10s 33s 40s 35s 15s 12s 25s 33s the goal is to counthas the appeared number in of a corpus times of thatimplemented documents. each such Previous word work workloads has on functions-as-a-service infrastructure [ dominate the job completion time, as in Figure 6. 4.5 Extending 4.5.1 Word frequency with MapReduce the total time to buildexecution) with would be 3m48 (2m33faster + than 1m15), outsourcing about to 32% the nearby 64-core VM. core build time for this projectinkscape is more than double that of gg Lambda with a cold cache, compared with 5 minutes gg-infer the dependency graph from the the last four jobs areand primarily stripping) serial (archiving, and linking, consume almostcompletion half time. of the total job a high degree ofbuild parallelism times, can but significantly that serial improve stages continue to dominate. Figure 4: This is a speedup of will execute later, either locally or remotely. when outsourced to a 64-core VM in the same EC2 region. Word frequency is a classic MapReduce problem [

Draft of Sep. 25, 2017 (under double-blind review; please do not redistribute)

time (s) time time (s) time ’s approach of at- 10 8 6 4 2 0 30 25 20 15 10 5 0 gg 97 ’s modeling approach gg does not have an approach gg on AWS Lambda. job completed  archive and link  job completed  gg worker # 80 , which pauses execution of any “non- archive, link and strip  using 5080 5095 5115 mosh on AWS Lambda. Serial stages (archiving, linking, gg-mock gg 30 25 20 15 10 5 0 a wide range of open-source programs, but often  assemble 60 using reference or check if they arerun empty. under These builds must be thunk-aware” thread that tries to readdisk. a However, thunk performance reference of on the resultingbecause build thunks is end poor, up forcedsystem one-at-a-time, gets as to the each build insystem turn. of We will intelligently need pre-fetching to thunks implement so a they can be peg with 1000-way parallelismsuggests in a AWS significant Lambda. weakness This tempting to to retain compatibility withsystem. the underlying In build future work,adapt we inferred plan thunks to into a makecan “build it system be possible template” distributed to that withpackages. or alongside these open-source Often, tests are what isonly slow. infers dependencies forthe programs compiler, it linker, knows archiver, about: build etc. This is sufficientit to is the testingthat of is these the programs, true not bottleneck. to the automatically compilation, infer the dependencies of such tests; de- these processes to be outsourced. Some build systems wantBuild to systems like read the intermediate Linuxate kernel data. files try and, to in read scripts, intermedi- compare them against a canonical velopers would need to manually create thunks to allow with only 35 seconds to actually compile all of FFm- job completed  11 ffmpeg , archive, link and strip  worker # make infers out 40 gg 3000 3500 4000 4500 5000  compile R1 R2 R3 worker # thunks. gg 20 P3 P1 P2 partition reduce cat Breakdown of workers’ execution time when building Fetching the dependencies Executing the thunk Uploading the results Fetching the dependencies Executing the thunk Uploading the results  preprocess 0  preprocess, compile and assemble 0 500 1000 1500 2000 2500 , etc.) by running it under model substi- Figure 5: 3.mapped N.mapped 1.mapped 2.mapped Dependency graph for a MapReduce workflow calcu- Breakdown of workers’ execution time when building bazel , ⋮ has a number of important limitations that will be the N.txt 1.txt 2.txt 3.txt (infiles) map thunks, this step can be aple, significant on bottleneck. a For totally exam- cold cache,server it to took infer 88 the seconds dependencies on from a FFmpeg, 4-core compared subject of future work. The underlying build system is adependencies bottleneck. from the underlyingcmake build system ( tution to generate thunks. In cases where the client is Figure 7: lating word frequency, expressed in 5 Limitations and future work gg and stripping) consume almost half the total job completion time. The right-hand plot shows a zoomed-in version of the figure. Figure 6: weak in comparison with available infrastructure to force Draft of Sep. 25, 2017 (under double-blind review; please do not redistribute) ] in terms 32 , 21 . It will be useful to ’s utility and perfor- gg 4.vp8 gg thunks. , a system for executing gg ], or cluster-wide like AWS gg 14 , xcdec rebase 11 3-1.state 3.vp8 and pledges [ ’s thunk abstraction, so it can be run by the same gg Cloud functions may tell a similar story. 
Although in- As a computing substrate, we suspect cloud functions 6 Conclusion networks and deep learning. tended for asynchronous Web microservices, wethat believe with sufficient systems,capable the of same broad and infrastructure exciting is GPGPU new computing applications. did Just as alambda decade computing ago, may have general-purpose far-reaching effects. contending applications, and failures. Itto may introduce be priority necessary and admission-controlfor mechanisms lambda-based frameworks like In this paper, weinterdependent described software workflows across thousandsshort-lived of functions that run in parallelinfrastructure. on We demonstrated public cloud mance benefits in compiling largethousand-way software parallelism, projects and with we expressedin prior general-purpose work serverless computing [ of execution engines. are in a similar positionthe to Graphics 2000s. Processing At Units the in time,3D GPUs graphics, were but designed the solelythat community for they had gradually become recognized programmablesome enough parallel to algorithms execute unrelated to graphics.this Over “general-purpose time, GPU” (GPGPU) movementsystems-support created technologies, and todayGPGPU non-graphics applications are acal major simulations, database use queries, of and GPUs: especially neural physi- cgroups IAM and limits [3]. Multiple applications and users. develop techniques that allow multiple applicationsmultiple and users to share aCurrent approaches fixed such set as of quotes lambdacurrently on resources. the active number lambdas of con- can lead to deadlock among 2-1.state 12 that . The 2.vp8 xcenc xcdec rebase 4-1.vp8 3-1.vp8 ] for this ap- 47 , data locality 3-0.state 2-0.state 1-0.state 43 , ]. For applications like 41 ephemeral storage 26 is 1.vp8 gg 2-0.vp8 4-0.vp8 3-0.vp8 on AWS Lambda has led us Dependency graph for a video-processing workflow [21], expressed in gg 2.raw 1.raw 4.raw 3.raw Figure 8: (infiles) vpxenc xcdec ]. The scalability and durability of these ] and frameworks [ Lambda functions are currently scheduled 5 , 23 2 , AWS Lambda functions can access long-term 6 As we expand the scope of lambda uses, it is , 4 , it is also important to consider consume and for how longcan (i.e., spend how and much over money how they using much and time). extending This existing mechanisms, may both involve local like could match tasks tooff their communication data, overheads prefetch to the data,resources cost or to of become trade waiting available for in a containerSafety. with data. important to provide mechanisms thatsingle constrain lambda what or a an ensemble of lambdas can do, with sources and next based on codeavailable locality or (container running already on agg host) [ scheduler should steer each taskstore towards or functions have that fast accesslarge to set its of input the data—or inputencoded at data. in least, The thunks a explicit allows data for dependencies an intelligent scheduler that tions. The scalability, speed, and pricingIOPS for should match capacity the and characteristicsputation, of and serverless referenced com- objects shouldgarbage-collected automatically when be a job terminates. Scheduling. first based on the availability of processor and memory re- storage and database services, suchnamoDB as AWS [ S3 and Dy- services makes them appropriateput for data the of input serverless andponent computations. 
Safety. As we expand the scope of lambda uses, it is important to provide mechanisms that constrain what a single lambda or an ensemble of lambdas can do, with whom they can communicate, and what resources they can consume and for how long (i.e., how much money they can spend and over how much time). This may involve using and extending existing mechanisms, both local, like cgroups [12] and pledges [16], and cluster-wide, like AWS IAM and limits [3].

Multiple applications and users. It will be useful to develop techniques that allow multiple applications and multiple users to share a fixed set of lambda resources. Current approaches, such as quotas on the number of concurrently active lambdas, can lead to deadlock among contending applications, and to failures. It may be necessary to introduce priority and admission-control mechanisms for lambda-based frameworks like gg.

6 Conclusion

In this paper, we described gg, a system for executing interdependent software workflows across thousands of short-lived functions that run in parallel on public cloud infrastructure. We demonstrated gg's utility and performance benefits in compiling large software projects with thousand-way parallelism, and we expressed prior work in general-purpose serverless computing [21, 32] in terms of gg's thunk abstraction, so it can be run by the same execution engines.

As a computing substrate, we suspect cloud functions are in a similar position to Graphics Processing Units in the 2000s. At the time, GPUs were designed solely for 3D graphics, but the community gradually recognized that they had become programmable enough to execute some parallel algorithms unrelated to graphics. Over time, this "general-purpose GPU" (GPGPU) movement created new systems-support technologies, and today non-graphics applications are a major use of GPUs: physical simulations, database queries, and especially neural networks and deep learning.

Cloud functions may tell a similar story. Although intended for asynchronous Web microservices, we believe that with sufficient systems support, the same infrastructure is capable of broad and exciting new applications. Just as GPGPU computing did a decade ago, general-purpose lambda computing may have far-reaching effects.

References

[1] Abelson, H., and Sussman, G. J. Structure and Interpretation of Computer Programs, 2nd edition. MIT Press/McGraw-Hill, Cambridge, 1996.
[2] Amazon S3. https://aws.amazon.com/s3/.
[3] AWS IAM. https://aws.amazon.com/iam/.
[4] AWS Lambda. https://aws.amazon.com/lambda.
[5] AWS DynamoDB. https://aws.amazon.com/dynamodb/.
[6] New – Per-Second Billing for EC2 Instances and EBS Volumes. https://aws.amazon.com/blogs/aws/new-per-second-billing-for-ec2-instances-and-ebs-volumes/.
[7] Azure Functions. https://azure.microsoft.com/en-us/services/functions.
[8] Azure pricing overview. https://azure.microsoft.com/en-us/pricing/.
[9] Bazel build system. https://bazel.build.
[10] ccache – a fast C/C++ compiler cache. https://ccache.samba.org.
[11] Ceri, S., and Widom, J. Deriving production rules for incremental view maintenance. In Proceedings of the 17th International Conference on Very Large Data Bases (San Francisco, CA, USA, 1991), VLDB '91, Morgan Kaufmann Publishers Inc., pp. 577–589.
[12] cgroups. https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt.
[13] Clark, C., Fraser, K., Hand, S., Hansen, J. G., Jul, E., Limpach, C., Pratt, I., and Warfield, A. Live migration of virtual machines. In Proceedings of the 2nd Conference on Symposium on Networked Systems Design & Implementation - Volume 2 (Berkeley, CA, USA, 2005), NSDI'05, USENIX Association, pp. 273–286.
[14] CMake. https://cmake.org.
[15] Dean, J., and Ghemawat, S. MapReduce: Simplified data processing on large clusters. Communications of the ACM 51, 1 (2008), 107–113.
[16] de Raadt, T. Pledge: a new mitigation mechanism. https://www.openbsd.org/papers/hackfest2015-pledge/.
[17] distcc distributed compiler. https://github.com/distcc/distcc.
[18] Docker. https://www.docker.com.
[19] Feldman, S. I. Make – a program for maintaining computer programs. Software – Practice and Experience 9, 4 (1979), 255–265.
[20] FFmpeg. https://github.com/FFmpeg/FFmpeg.
[21] Fouladi, S., Wahby, R. S., Shacklett, B., Balasubramaniam, K. V., Zeng, W., Bhalerao, R., Sivaraman, A., Porter, G., and Winstein, K. Encoding, fast and slow: Low-latency video processing using thousands of tiny threads. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17) (Boston, MA, 2017), USENIX Association, pp. 363–376.
[22] Google Cloud Functions. https://cloud.google.com/functions.
[23] Google Compute Engine pricing. https://cloud.google.com/compute/pricing.
[24] Gunda, P. K., Ravindranath, L., Thekkath, C. A., Yu, Y., and Zhuang, L. Nectar: Automatic management of data and computation in datacenters. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (Berkeley, CA, USA, 2010), OSDI'10, USENIX Association, pp. 75–88.
[25] Halevy, A. Y. Answering queries using views: A survey. The VLDB Journal 10, 4 (Dec. 2001), 270–294.
[26] Hendrickson, S., Sturdevant, S., Harter, T., Venkataramani, V., Arpaci-Dusseau, A. C., and Arpaci-Dusseau, R. H. Serverless computation with OpenLambda. In Proceedings of the 8th USENIX Conference on Hot Topics in Cloud Computing (Berkeley, CA, USA, 2016), HotCloud'16, USENIX Association, pp. 33–39.
[27] Heydon, A., Levin, R., Mann, T., and Yu, Y. Software Configuration Management Using Vesta. Springer Publishing Company, Incorporated, 2011.
[28] Heydon, A., Levin, R., and Yu, Y. Caching function calls using precise dependencies. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation (New York, NY, USA, 2000), PLDI '00, ACM, pp. 311–320.
[29] icecream distributed compiler. https://github.com/icecc/icecream.
[30] Inkscape, a powerful, free design tool. https://inkscape.org.
[31] Isard, M., Budiu, M., Yu, Y., Birrell, A., and Fetterly, D. Dryad: Distributed data-parallel programs from sequential building blocks. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007 (New York, NY, USA, 2007), EuroSys '07, ACM, pp. 59–72.
[32] Jonas, E., Venkataraman, S., Stoica, I., and Recht, B. Occupy the cloud: Distributed computing for the 99%. CoRR abs/1702.04024 (2017).
[33] Lagar-Cavilla, H. A., Whitney, J. A., Scannell, A. M., Patchin, P., Rumble, S. M., de Lara, E., Brudno, M., and Satyanarayanan, M. SnowFlock: Rapid virtual machine cloning for cloud computing. In Proceedings of the 4th ACM European Conference on Computer Systems (New York, NY, USA, 2009), EuroSys '09, ACM, pp. 1–12.
[34] Lee, K. Y., Son, J. H., and Kim, M. H. Efficient incremental view maintenance in data warehouses. In Proceedings of the Tenth International Conference on Information and Knowledge Management (New York, NY, USA, 2001), CIKM '01, ACM, pp. 349–356.
[35] Liu, J., Pacitti, E., Valduriez, P., and Mattoso, M. A survey of data-intensive scientific workflow management. Journal of Grid Computing 13, 4 (Dec. 2015), 457–493.
[36] The LLVM compiler infrastructure. http://llvm.org.
[37] Ma, Z., and Gu, L. The limitation of MapReduce: A probing case and a lightweight solution. In Proceedings of the 1st International Conference on Cloud Computing, GRIDs, and Virtualization (2010), pp. 68–73.
[38] mosh – Mobile Shell. https://mosh.org.
[39] Murray, D. G., Schwarzkopf, M., Smowton, C., Smith, S., Madhavapeddy, A., and Hand, S. CIEL: A universal execution engine for distributed data-flow computing. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (Berkeley, CA, USA, 2011), NSDI'11, USENIX Association, pp. 113–126.
[40] Nelson, M., Lim, B.-H., and Hutchins, G. Fast transparent migration for virtual machines. In Proceedings of the Annual Conference on USENIX Annual Technical Conference (Berkeley, CA, USA, 2005), ATEC '05, USENIX Association, pp. 25–25.
The LLVM compiler Infrastructure. http://llvm incremental view maintenance in dataIn warehouses. ence on Information and Knowledge Management chine cloning for cloud computing.of In the 4th ACM European Conference on Computer puting for the 99%. programs from sequential building blocks.ceedings In of the 2Nd ACMpean SIGOPS/EuroSys Conference Euro- on Computer Systems 2007 icecream distributed compiler. https://github icecc/icecream. Inkscape, a powerful,inkscape free design tool. https:// TOSO AND AND Systems 72. (New York, NY, USA, 2001), CIKM ’01, ACM, USENIX Conference on Networked Systems Design A probing case and a lightweight solution. In workflow management. ACM, pp. 1–12. York, NY, USA, 2007), EuroSys ’07, ACM, pp. 59– [40] [39] [37] [38] mosh - Mobile Shell. https://www [35] [36] [34] [32] [31] [33] [29] [30]