Webmesh: a Browser-Based Computational Framework for Serverless Applications
Total Page:16
File Type:pdf, Size:1020Kb
WebMesh: A Browser-Based Computational Framework for Serverless Applications Joseph Romano ([email protected]) Advisor: Rodrigo Fonseca ([email protected]) May 1, 2019 Abstract based computational framework for serverless applications. WebMesh is able to use idle computing WebMesh is a browser-based computational frame- time by executing distributed applications on any work for serverless applications. Unlike existing environment that can run an up-to-standards web serverless frameworks which run in managed data- browser such as Google Chrome. We initially centers, WebMesh nodes run in browser tabs, using implemented WebMesh with MapReduce and tested WebAssembly to allow C++ code to run portably using the canonical word count use case for in web browsers and WebRTC to allow nodes di- MapReduce. This paper evaluates the scalability, rect peer-to-peer communication. We evaluate the adaptability, resilience despite unreliable workers, scalability and feasibility of such a system by im- and versatility of such a system. plementing the MapReduce distributed computing WebMesh advances existing research into browser- model on the framework. We find that, in tests of based, volunteer, and serverless computing by using a word count job running on MapReduce, WebMesh WebAssembly to allow C++ code to run in a is able to handle 200,000+ concurrent jobs, execute portable environment, and by using WebRTC for jobs 3.6 times faster than running locally, and support peer-to-peer transmission of data, a first step towards a variety of tasks. We suggest it is possible to cre- decentralization. ate value from otherwise unused computing power, The remainder of this paper is organized as allowing more cost-efficient distributed computing follows. Section 2 provides pertinent background than existing serverless providers. information on serverless computing, WebAssembly, and WebRTC. Section 3 details the design decisions, execution model, and programming model. Section 1 Introduction 4 evaluates the system empirically and subjectively, provides a brief financial analysis, describes With the increasing ubiquity of personal computing shortcomings, and lays out ideas for future work. devices which continue to grow in power, a large Related work is reviewed in Section 5. We conclude amount of processing power remains primarily idle, in Section 6. within smartphones in people’s pockets or within computers on their desks. While there are many frameworks for distributed computing, commercial 2 Background and non-commercial, none have lowered the barriers to entry to a simple task like opening a web page. 2.1 Serverless computing With over two billion installations of Google Chrome worldwide, there is a massive opportunity to create Cloud computing has grown in popularity over value from preexisting and unused computing power, the past few decades and providers have slowly filling gaps in time between peak usage [23]. shifted to manage additional services on behalf of To fill this gap, we present WebMesh, a browser- customers. Virtual machines and containers were the 1 first wave, abstracting the need to manage machines, exported functions that the JavaScript code is able to operating systems, and execution environments to call. The instantiated WebAssembly binary manages some extent. Recently, serverless computing has its own memory but is sandboxed in the same way become increasingly popular as a way of abstracting that browsers currently sandbox JavaScript code. nearly every aspect of cloud computing, besides the This sandboxing provides the same security actual code that is to run. Serverless computing guarantees that existing JavaScript sandboxes providers manage the environment, provisioning, and provide, ensuring that malicious WebAssembly more, promising customers that the code they submit modules could only get so far as a malicious will scale based on demand. This is advertised as not JavaScript file could. While it is out of the scope only easier for customers, but more cost efficient due of this project to explore the security ramifications to fine-grained billing. of WebAssembly, there is a baseline assurance that One of the first mainstream providers of serverless WebAssembly provides similar security promises as computing, Amazon Web Services, markets their JavaScript. product as AWS Lambda, advertising “continuous WebAssembly also provides performance that can scaling” and offers an event-driven service [1]. be close to native, with an average of an 89% Lambda will run based on particular triggers, such slowdown and peaking at a 3.14x slowdown in as a request to an HTTP endpoint or other triggers Google Chrome on the SPEC CPU benchmarks [21]. from applications within AWS. Each Lambda worker Given that there could be dozens or hundreds of is limited to only 15 minutes of execution, up to workers dividing time on a job, even the maximum 3,008 MB of memory, and has no default support slowdown of 3.14x would provide fast performance for inter-process communication (i.e. from Lambda in a highly distributed setting in a system with low to Lambda). Lambda supports a variety of languages overhead. This performance will continue to increase such as Python, Java, Go, C#, and Ruby. In total, as WebAssembly develops and matures. AWS Lambda provides an event-based serverless While still novel, WebAssembly is able to integrate offering to fill a specific niche, and not necessarily for with existing compilers and toolchains to allow general computing. A framework developed to run existing optimizations and libraries to be used on on AWS Lambda offering more general distributed the web. Support exists for the newest versions computing power, Pywren, is discussed in Section 5. of C++ using Emscripten, a toolchain branched off Google also offers a serverless computing of LLVM [8]. Emscripten supports WebAssembly architecture, not only providing event-based “Cloud binaries by providing them the needed “glue code” Functions” similar to AWS Lambda, but also is with native JavaScript implementations of syscall, file alpha-testing running containers and Kubernetes on system, I/O, and library functions. This is particularly their serverless framework to provide the scaling important as WebAssembly runs entirely sandboxed, and pricing benefits of serverless in more traditional and must provide all of its own library code – even settings [13]. This opens up more opportunity for syscalls must be re-implemented in JavaScript to general computing than AWS’s offering. be imported into the WebAssembly module. Also, Emscripten manages the memory allocation given 2.2 WebAssembly to the WebAssembly modules, growing the memory allocated in JavaScript whenever syscalls are made WebAssembly is an instruction set designed for which increase memory pressure. languages such as C++ to run portably in web Emscripten also provides a tool, Embind, for browsers or other environments [20]. Already easily accessing C++ functions and objects from implemented in major web browsers, these browsers’ JavaScript. This makes interoperability easier virtual machines are able to execute WebAssembly when providing input data to C++ functions from just like JavaScript is currently executed. The virtual JavaScript. Furthermore, Embind provides access to machine takes in a WebAssembly binary format file, class constructors and creates analogous JavaScript provides it any requested imports from JavaScript objects matching C++ objects (see Figure 1). This functions, and the binary file specifies a set of lets C++ functions take in and return C++ objects, so 2 emscripten::value_array<pair<string, int server offering STUN and TURN services to ensure >>("PairStrInt") complete reachability. This creates significant cost .element(&pair<string, int >::first) overhead for a service using WebRTC, as TURN .element(&pair<string, int >::second); servers could potentially be forced to relay all peer traffic in the case of highly restrictive NATs or Figure 1: Using Embind to match a C++ std::pair firewalls. type to a JavaScript array 3 Design long as proper bindings are made to JavaScript types using Embind. The system is broken up into four major parts: the Another web technology important for We- coordinator server, submitters, intermediary nodes, bAssembly is Web Workers [14]. Web Workers al- and workers. Submitters input C++ code and select low background tasks to run independent of the main input files, the coordinator manages tasks and relays event loop, ostensibly creating “multithreading” in some data and information between submitters and JavaScript. Without Web Workers, long-running We- workers, intermediary nodes optionally relay results bAssembly tasks would potentially block the main between dependent workers, and workers receive thread, thus freezing the entire webpage. By using jobs from the coordinator and compute results. A a Web Worker, the WebAssembly code can be exe- separate WebRTC STUN and TURN server is used cuted in the background, leaving the main page free to assist in WebRTC establishment and reachability. to request data, take input from the user, and more. An overview of the architecture and job lifecycle can be seen in Figure 2. 2.3 WebRTC WebRTC is a browser technology allowing for direct 3.1 Coordinator inter-browser networking and communication. While WebMesh relies on a central coordinator for traditionally, web messages would require a server compiling code to WebAssembly, job scheduling, to process relay information as separate requests negotiating WebRTC connections via the signalling between