Bringing the Web up to Speed with WebAssembly
Andreas Rossberg (Dfinity Stiftung, Germany)
Ben L. Titzer (Google GmbH, Germany)
Andreas Haas (Google GmbH, Germany)
Derek L. Schuff (Google Inc, USA)
Dan Gohman (Mozilla Inc, USA)
Luke Wagner (Mozilla Inc, USA)
Alon Zakai (Mozilla Inc, USA)
JF Bastien (Apple Inc, USA)
Michael Holman (Microsoft Inc, USA)

ABSTRACT

The maturation of the Web platform has given rise to sophisticated Web applications such as 3D visualization, audio and video software, and games. With that, efficiency and security of code on the Web have become more important than ever. WebAssembly is a portable low-level bytecode that addresses these requirements by offering a compact representation, efficient validation and compilation, and safe execution with low to no overhead. It has recently been made available in all major browsers. Rather than committing to a specific programming model, WebAssembly is an abstraction over modern hardware, making it independent of language, hardware, and platform and applicable far beyond just the Web. WebAssembly is the first mainstream language that has been designed with a formal semantics from the start, finally utilizing formal methods that have matured in programming language research over the last four decades.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Copyright 2018 ACM 0001-0782/08/0X00 ...$5.00.

1. INTRODUCTION

The Web began as a simple hypertext document network but has now become the most ubiquitous application platform ever, accessible across a vast array of operating systems and device types. By historical accident, JavaScript is the only natively supported programming language on the Web. Because of its ubiquity, rapid performance improvements in modern implementations, and perhaps through sheer necessity, it has become a compilation target for many other languages. Yet JavaScript has inconsistent performance and various other problems, especially as a compilation target.

WebAssembly (or "Wasm" for short) addresses the problem of safe, fast, portable low-level code on the Web. Previous attempts, from ActiveX to Native Client to asm.js, have fallen short of properties that such a low-level code format should have:

• Safe, fast, and portable semantics:
  - safe to execute
  - fast to execute
  - language-, hardware-, and platform-independent
  - deterministic and easy to reason about
  - simple interoperability with the Web platform
• Safe and efficient representation:
  - maximally compact
  - easy to decode, validate and compile
  - easy to generate for producers
  - streamable and parallelizable

Why are these goals important? Why are they hard?

Safe. Safety for mobile code is paramount on the Web, since code originates from untrusted sources. Protection for mobile code has traditionally been achieved by providing a managed language runtime such as the browser's JavaScript VM or a language plugin. Managed languages enforce memory safety, preventing programs from compromising user data or system state. However, managed language runtimes have traditionally not offered much for low-level code, such as C/C++ applications that do not use garbage collection.

Fast. Low-level code like that emitted by a C/C++ compiler is typically optimized ahead-of-time. Native machine code, either written by hand or as the output of an optimizing compiler, can utilize the full performance of a machine. Managed runtimes and sandboxing techniques have typically imposed a steep performance overhead on low-level code.

Universal. There is a large and healthy diversity of programming paradigms, none of which should be privileged or penalized by a code format, beyond unavoidable hardware constraints. Most managed runtimes, however, have been designed to support a particular language or programming paradigm well while imposing significant cost on others.

Portable. The Web spans not only many device classes, but different machine architectures, operating systems, and browsers. Code targeting the Web must be hardware- and platform-independent to allow applications to run across all browser and hardware types with the same deterministic behavior. Previous solutions for low-level code were tied to a single architecture or have had other portability problems.

Compact. Code that is transmitted over the network should be small to reduce load times, save bandwidth, and improve overall responsiveness. Code on the Web is typically transmitted as JavaScript source, which is far less compact than a binary format, even when minified and compressed. Binary code formats are not always optimized for size either.

WebAssembly is the first solution for low-level code on the Web that delivers on all of the above design goals. It is the result of an unprecedented collaboration across all major browser vendors and an online community group to build a common solution for high-performance applications.¹

While the Web is the primary motivation for WebAssembly, its design, despite the name, carefully avoids any dependencies on the Web. It is an open standard intended for embedding in a broad variety of environments, and other such embeddings are already being developed.

To our knowledge, WebAssembly is also the first industrial-strength language that has been designed with a formal semantics from the start. It not only demonstrates the "real world" feasibility of applying formal techniques, but also that they lead to a remarkably clean and simple design.

2. A TOUR OF THE LANGUAGE

Even though WebAssembly is a binary code format, we define it as a programming language with syntax and structure. As we will see, that makes it easier to explain and understand and, moreover, allows us to apply well-established formal techniques for defining its semantics and for reasoning about it. Hence, Figure 1 presents WebAssembly in terms of a grammar for its abstract syntax.²

2.1 Basics

Let us start by introducing a few unsurprising concepts before diving into less obvious ones in the following.

Modules. A WebAssembly binary takes the form of a module. It contains definitions for functions, globals, tables, and memories. Definitions may be exported or imported. While a module corresponds to the static representation of a program, a module's dynamic representation is an instance, complete with its mutable state. Instantiating a module requires providing definitions for all imports, which may be exports from previously created instances. Computation is initiated by invoking an exported function.

Modules provide both encapsulation and sandboxing: because a client can only access the exports of a module, other internals are protected from tampering; dually, a module can only interact with its environment through its imports, which are provided by a client, so that the client has full control over the capabilities given to a module. Both these aspects are essential ingredients to the safety of WebAssembly.

Functions. The code in a module is organized into individual functions, each taking parameters and returning results as defined by its function type. Functions can call each other, including recursively, but are not first class and cannot be [...]

[...] immediately aborts the current computation. Traps cannot (currently) be handled by WebAssembly code, but an embedder will typically provide means to handle this condition, for example, by reifying them as JavaScript exceptions.

Machine Types. WebAssembly has only four basic value types t to compute with. These are integers and IEEE 754 floating point numbers, each with 32 or 64 bits, as available in common hardware. Most WebAssembly instructions provide simple operators on these data types. The grammar in Figure 1 conveniently distinguishes categories such as unary and binary operators, tests, comparisons, and conversions. Like hardware, WebAssembly makes no distinction between signed and unsigned integer types. Instead, where it matters, a sign extension suffix u or s to an instruction selects either unsigned or two's complement signed behavior.

Variables. Functions can declare mutable local variables, which essentially provide an infinite set of zero-initialized virtual registers. A module may also declare typed global variables that can be either mutable or immutable and require an explicit initializer. Importing globals allows a limited form of configurability, e.g. for linking. Like all entities in WebAssembly, variables are referenced by integer indices.

So far so boring. In the following sections we turn our attention to more unusual features of WebAssembly's design.

2.2 Memory

The main storage of a WebAssembly program is a large array of raw bytes, the linear memory or simply memory. Memory is accessed with load and store instructions, where addresses are simply unsigned integer operands.

Creation and Growing. Each module can define at most one memory, which may be shared with other instances via import/export. Memory is created with an initial size but may be dynamically grown. The unit of growth is a page, which is defined to be 64 KiB, a choice that allows reusing virtual memory hardware for bounds checks on modern hardware (Section 5). Page size is fixed instead of being system-specific to prevent portability hazards.

Endianness. Programs that load and store to aliased locations with different types can observe byte order. Since most contemporary hardware has converged on little endian, or at least can handle it equally well, we chose to define WebAssembly memory to have little endian byte order. Thus the semantics of memory access is completely deterministic and portable across all engines and platforms.
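
The memory model described in Section 2.2 can be illustrated with a small Python sketch. It is a toy model, not how any engine implements memory. The class LinearMemory and its method names are hypothetical. The optional max_pages limit, the zero-filling of new pages, the -1 result for a refused grow, and the use of a Python exception for out-of-bounds accesses are all assumptions made for illustration. Only the 64 KiB page size, growth in whole pages, unsigned integer addresses, and little endian byte order are taken from the text.

```python
import struct

PAGE_SIZE = 64 * 1024  # 64 KiB page size, as stated in the text


class LinearMemory:
    """Toy model (hypothetical) of a linear memory: a growable array of raw bytes."""

    def __init__(self, initial_pages, max_pages=None):
        self.max_pages = max_pages  # assumed optional upper limit, in pages
        self.data = bytearray(initial_pages * PAGE_SIZE)  # zero-filled

    def size_in_pages(self):
        return len(self.data) // PAGE_SIZE

    def grow(self, delta_pages):
        """Grow by whole pages; return the previous size in pages, or -1 if refused."""
        old = self.size_in_pages()
        if self.max_pages is not None and old + delta_pages > self.max_pages:
            return -1
        self.data.extend(bytes(delta_pages * PAGE_SIZE))
        return old

    def _check(self, addr, width):
        # Addresses are unsigned; any access outside the current size is rejected.
        if addr < 0 or addr + width > len(self.data):
            raise IndexError("out-of-bounds memory access")

    def store_i32(self, addr, value):
        self._check(addr, 4)
        # "<I" selects little endian byte order for a 32-bit unsigned value.
        struct.pack_into("<I", self.data, addr, value & 0xFFFFFFFF)

    def load_i32(self, addr):
        self._check(addr, 4)
        return struct.unpack_from("<I", self.data, addr)[0]

    def load_u8(self, addr):
        self._check(addr, 1)
        return self.data[addr]


# Storing a 32-bit value and reading the same location back as single bytes
# makes the byte order observable, as the Endianness paragraph describes.
mem = LinearMemory(initial_pages=1, max_pages=2)
mem.store_i32(0, 0x11223344)
lowest_byte = mem.load_u8(0)   # 0x44: the least significant byte is stored first
previous_pages = mem.grow(1)   # 1: the memory now spans two pages
```

Because the byte order is fixed by the model rather than inherited from the host, the aliased read returns the same byte on any machine, which is the determinism the section argues for.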