Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code
Abhinav Jangda, Bobby Powers, Emery D. Berger, and Arjun Guha, University of Massachusetts Amherst
https://www.usenix.org/conference/atc19/presentation/jangda

This paper is included in the Proceedings of the 2019 USENIX Annual Technical Conference. July 10–12, 2019 • Renton, WA, USA ISBN 978-1-939133-03-8

Open access to the Proceedings of the 2019 USENIX Annual Technical Conference is sponsored by USENIX.

Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code

Abhinav Jangda, Bobby Powers, Emery D. Berger, and Arjun Guha University of Massachusetts Amherst

Abstract

All major web browsers now support WebAssembly, a low-level bytecode intended to serve as a compilation target for code written in languages like C and C++. A key goal of WebAssembly is performance parity with native code; previous work reports near parity, with many applications compiled to WebAssembly running on average 10% slower than native code. However, this evaluation was limited to a suite of scientific kernels, each consisting of roughly 100 lines of code. Running more substantial applications was not possible because compiling code to WebAssembly is only part of the puzzle: standard libraries are not available in the web browser environment. To address this challenge, we build BROWSIX-WASM, a significant extension to BROWSIX [29] that, for the first time, makes it possible to run unmodified WebAssembly-compiled Unix applications directly inside the browser. We then use BROWSIX-WASM to conduct the first large-scale evaluation of the performance of WebAssembly vs. native.

Across the SPEC CPU suite of benchmarks, we find a substantial performance gap: applications compiled to WebAssembly run slower by an average of 45% (Firefox) to 55% (Chrome), with peak slowdowns of 2.08× (Firefox) and 2.5× (Chrome). We identify the causes of this performance degradation, some of which are due to missing optimizations and code generation issues, while others are inherent to the WebAssembly platform.

1 Introduction

Web browsers have become the most popular platform for running user-facing applications, and until recently, JavaScript was the only programming language supported by all major web browsers. Beyond its many quirks and pitfalls from the perspective of programming language design, JavaScript is also notoriously difficult to compile efficiently [12, 17, 30, 31]. Applications written in or compiled to JavaScript typically run much slower than their native counterparts. To address this situation, a group of browser vendors jointly developed WebAssembly.

WebAssembly is a low-level, statically typed language that does not require garbage collection, and supports interoperability with JavaScript. The goal of WebAssembly is to serve as a universal compiler target that can run in a browser [15, 16, 18].1 Towards this end, WebAssembly is designed to be fast to compile and run, to be portable across browsers and architectures, and to provide formal guarantees of type and memory safety. Prior attempts at running code at native speed in the browser [4, 13, 14, 38], which we discuss in related work, do not satisfy all of these criteria.

WebAssembly is now supported by all major browsers [8, 34] and has been swiftly adopted by several programming languages. There are now backends for C, C++, C#, Go, and Rust [1, 2, 24, 39] that target WebAssembly, and a curated list currently includes more than a dozen others [10]. Today, code written in these languages can be safely executed in browser sandboxes across any modern device once compiled to WebAssembly.

A major goal of WebAssembly is to be faster than JavaScript. For example, the paper that introduced WebAssembly [18] showed that when a C program is compiled to WebAssembly instead of JavaScript (asm.js), it runs 34% faster in Chrome. That paper also showed that the performance of WebAssembly is competitive with native code: of the 24 benchmarks evaluated, the running time of seven benchmarks using WebAssembly is within 10% of native code, and almost all of them are less than 2× slower than native code. Figure 1 shows that WebAssembly implementations have continuously improved with respect to these benchmarks. In 2017, only seven benchmarks performed within 1.1× of native, but by 2019, this number increased to 13. These results appear promising, but they beg the question: are these 24 benchmarks representative of WebAssembly's intended use cases?

1 The WebAssembly standard is undergoing active development, with ongoing efforts to extend WebAssembly with features ranging from SIMD primitives and threading to tail calls and garbage collection. This paper focuses on the initial and stable version of WebAssembly [18], which is supported by all major browsers.

The Challenge of Benchmarking WebAssembly. The aforementioned suite of 24 benchmarks is the PolybenchC benchmark suite [5], which is designed to measure the effect of polyhedral loop optimizations in compilers. All the benchmarks in the suite are small scientific computing kernels rather than full applications (e.g., matrix multiplication and LU Decomposition); each is roughly 100 LOC. While WebAssembly is designed to accelerate scientific kernels on the Web, it is also explicitly designed for a much richer set of full applications.

[Figure 1: bar chart of the number of PolyBenchC benchmarks performing within 1.1×, 1.5×, 2×, and 2.5× of native, comparing the PLDI 2017 results, April 2018, and this paper.]

Figure 1: Number of PolyBenchC benchmarks performing within x× of native. In 2017 [18], seven benchmarks performed within 1.1× of native. In April 2018, we found that 11 performed within 1.1× of native. In May 2019, 13 performed within 1.1× of native.

The WebAssembly documentation highlights several intended use cases [7], including scientific kernels, image editing, video editing, image recognition, scientific visualization, simulations, programming language interpreters, virtual machines, and POSIX applications. Therefore, WebAssembly's strong performance on the scientific kernels in PolybenchC does not imply that it will perform well given a different kind of application.

We argue that a more comprehensive evaluation of WebAssembly should rely on an established benchmark suite of large programs, such as the SPEC CPU benchmark suites. In fact, the SPEC CPU 2006 and 2017 suites of benchmarks include several applications that fall under the intended use cases of WebAssembly: eight benchmarks are scientific applications (e.g., 433.milc, 444.namd, 447.dealII, 450.soplex, and 470.lbm), two benchmarks involve image and video processing (464.h264ref and 453.povray), and all of the benchmarks are POSIX applications.

Unfortunately, it is not possible to simply compile a sophisticated native program to WebAssembly. Native programs, including the programs in the SPEC CPU suites, require operating system services, such as a filesystem, synchronous I/O, and processes, which WebAssembly and the browser do not provide. The SPEC benchmarking harness itself requires a file system, a shell, the ability to spawn processes, and other Unix facilities. To overcome these limitations when porting native applications to the web, many developers painstakingly modify their programs to avoid or mimic missing operating system services. Modifying well-known benchmarks, such as SPEC CPU, would not only be time consuming but would also pose a serious threat to validity.

The standard approach to running these applications today is to use Emscripten, a toolchain for compiling C and C++ to WebAssembly [39]. Unfortunately, Emscripten only supports the most trivial system calls and does not scale up to large-scale applications. For example, to enable applications to use synchronous I/O, the default Emscripten MEMFS filesystem loads the entire filesystem image into memory before the program begins executing. For SPEC, these files are too large to fit into memory.

A promising alternative is to use BROWSIX, a framework that enables running unmodified, full-featured Unix applications in the browser [28, 29]. BROWSIX implements a Unix-compatible kernel in JavaScript, with full support for processes, files, pipes, blocking I/O, and other Unix features. Moreover, it includes a C/C++ compiler (based on Emscripten) that allows programs to run in the browser unmodified. The BROWSIX case studies include complex applications, such as LaTeX, which runs entirely in the browser without any modifications.

Unfortunately, BROWSIX is a JavaScript-only solution, since it was built before the release of WebAssembly. Moreover, BROWSIX suffers from high performance overhead, which would be a significant confounder while benchmarking. Using BROWSIX, it would be difficult to tease apart poorly performing benchmarks from performance degradation introduced by BROWSIX.

Contributions

• BROWSIX-WASM: We develop BROWSIX-WASM, a significant extension to and enhancement of BROWSIX that allows us to compile Unix programs to WebAssembly and run them in the browser with no modifications. In addition to integrating functional extensions, BROWSIX-WASM incorporates performance optimizations that drastically improve its performance, ensuring that CPU-intensive applications operate with virtually no overhead imposed by BROWSIX-WASM (§2).

• BROWSIX-SPEC: We develop BROWSIX-SPEC, a harness that extends BROWSIX-WASM to allow automated collection of detailed timing and hardware on-chip performance counter information in order to perform detailed measurements of application performance (§3).

• Performance Analysis of WebAssembly: Using BROWSIX-WASM and BROWSIX-SPEC, we conduct the first comprehensive performance analysis of WebAssembly using the SPEC CPU benchmark suite (both 2006 and 2017). This evaluation confirms that WebAssembly does run faster than JavaScript (on average 1.3× faster across SPEC CPU). However, contrary to prior work, we find a substantial gap between WebAssembly and native performance: code compiled to WebAssembly runs on average 1.55× slower in Chrome and 1.45× slower in Firefox than native code (§4).

• Root Cause Analysis and Advice for Implementers: We conduct a forensic analysis with the aid of performance counter results to identify the root causes of this performance gap. We find the following results:

1. The instructions produced by WebAssembly have more loads and stores than native code (2.02× more loads and 2.30× more stores in Chrome; 1.92× more loads and 2.16× more stores in Firefox). We attribute this to reduced availability of registers, a sub-optimal register allocator, and a failure to effectively exploit a wider range of memory addressing modes.

2. The instructions produced by WebAssembly have more branches, because WebAssembly requires several dynamic safety checks.

3. Since WebAssembly generates more instructions, it leads to more L1 instruction cache misses.

We provide guidance to help WebAssembly implementers focus their optimization efforts in order to close the performance gap between WebAssembly and native code (§5, §6).

BROWSIX-WASM and BROWSIX-SPEC are available at https://browsix.org.

2 From BROWSIX to BROWSIX-WASM

BROWSIX [29] mimics a Unix kernel within the browser and includes a compiler (based on Emscripten [33, 39]) that compiles native programs to JavaScript. Together, they allow native programs (in C, C++, and Go) to run in the browser and freely use operating system services, such as pipes, processes, and a filesystem. However, BROWSIX has two major limitations that we must overcome. First, BROWSIX compiles native code to JavaScript and not WebAssembly. Second, the BROWSIX kernel has significant performance issues. In particular, several common system calls have very high overhead in BROWSIX, which makes it hard to compare the performance of a program running in BROWSIX to that of a program running natively. We address these limitations by building a new in-browser kernel called BROWSIX-WASM, which supports WebAssembly programs and eliminates the performance bottlenecks of BROWSIX.

Emscripten Runtime Modifications. BROWSIX modifies the Emscripten compiler to allow processes (which run in WebWorkers) to communicate with the BROWSIX kernel (which runs on the main thread of a page). Since BROWSIX compiles native programs to JavaScript, this is relatively straightforward: each process' memory is a buffer that is shared with the kernel (a SharedArrayBuffer), thus system calls can directly read and write process memory. However, this approach has two significant drawbacks. First, it precludes growing the heap on-demand; the shared memory must be sized large enough to meet the high-water-mark heap size of the application for the entire life of the process. Second, JavaScript contexts (like the main context and each web worker context) have a fixed limit on their heap sizes, which is currently approximately 2.2 GB in Chrome [6]. This cap imposes a serious limitation on running multiple processes: if each process reserves a 500 MB heap, BROWSIX would only be able to run at most four concurrent processes. A deeper problem is that WebAssembly memory cannot be shared across WebWorkers and does not support the Atomics API, which BROWSIX processes use to wait for system calls.

BROWSIX-WASM uses a different approach to process-kernel communication that is also faster than the BROWSIX approach. BROWSIX-WASM modifies the Emscripten runtime system to create an auxiliary buffer (of 64MB) for each process that is shared with the kernel, but is distinct from process memory. Since this auxiliary buffer is a SharedArrayBuffer, the BROWSIX-WASM process and kernel can use the Atomics API for communication. When a system call references strings or buffers in the process's heap (e.g., writev or stat), the runtime system copies data from the process memory to the shared buffer and sends a message to the kernel with the locations of the copied data in auxiliary memory. Similarly, when a system call writes data to the auxiliary buffer (e.g., read), the runtime system copies the data from the shared buffer to the process memory at the location specified. Moreover, if a system call specifies a buffer in process memory for the kernel to write to (e.g., read), the runtime allocates a corresponding buffer in auxiliary memory and passes it to the kernel. In case the system call is either reading or writing data of size more than 64MB, BROWSIX-WASM divides this call into several calls such that each call only reads or writes at most 64MB of data. The cost of these memory copy operations is dwarfed by the overall cost of the system call invocation, which involves sending a message between process and kernel JavaScript contexts. We show in §4.2.1 that BROWSIX-WASM has negligible overhead.
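The division of oversized reads and writes into 64 MB pieces can be sketched with a simple loop. The following C code is our own illustration, not the actual BROWSIX-WASM runtime (which is JavaScript); the names `aux_buf`, `kernel_write_chunk`, and the deliberately tiny 8-byte buffer size are invented for demonstration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shrunken stand-in for the 64 MB shared auxiliary buffer, so the
 * chunking behavior is easy to observe in a test. */
#define AUX_BUF_SIZE 8

static char aux_buf[AUX_BUF_SIZE];  /* stands in for the SharedArrayBuffer */
static char file_data[128];         /* stands in for the kernel-side file */
static size_t file_len;
static int round_trips;             /* number of process->kernel messages */

/* One process->kernel round trip: copy into the shared buffer, then
 * let the "kernel" consume it. Each trip moves at most AUX_BUF_SIZE. */
static long kernel_write_chunk(const void *buf, size_t len) {
    assert(len <= AUX_BUF_SIZE);
    memcpy(aux_buf, buf, len);                   /* process -> shared buffer */
    memcpy(file_data + file_len, aux_buf, len);  /* kernel -> file */
    file_len += len;
    round_trips++;
    return (long)len;
}

/* Split a large write so that no single call exceeds the buffer size,
 * mirroring how BROWSIX-WASM divides oversized system calls. */
long chunked_write(const char *buf, size_t count) {
    size_t done = 0;
    while (done < count) {
        size_t chunk = count - done;
        if (chunk > AUX_BUF_SIZE)
            chunk = AUX_BUF_SIZE;
        long n = kernel_write_chunk(buf + done, chunk);
        if (n < 0)
            return -1;  /* propagate the error */
        done += (size_t)n;
    }
    return (long)done;
}
```

With the 8-byte buffer, a 20-byte write is delivered in three round trips (8 + 8 + 4), and the copying cost is dwarfed by the per-message cost, as the text notes.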

[Figure 2: A Google Chrome instance contains a WebWorker (running the 401.bzip2 WebAssembly module and the BROWSIX-WASM userspace) and the main JS context (running Harness.js and the BROWSIX-WASM kernel), connected by a shared memory segment; the BROWSIX-SPEC benchmark harness launches the browser, runs perf record on the 401.bzip2 subprocess, and collects results (steps 1-7, described in §3.1).]
Figure 2: The framework for running SPEC benchmarks in browsers. Bold components are new or heavily modified (§3).

Performance Optimization. While building BROWSIX-WASM and doing our preliminary performance evaluation, we discovered several performance issues in parts of the BROWSIX kernel. Left unresolved, these performance issues would be a threat to the validity of a performance comparison between WebAssembly and native code. The most serious case was in the shared filesystem component included with BROWSIX/BROWSIX-WASM, BROWSERFS. Originally, on each append operation on a file, BROWSERFS would allocate a new, larger buffer, copying the previous and new contents into the new buffer. Small appends could impose substantial performance degradation. Now, whenever a buffer backing a file requires additional space, BROWSERFS grows the buffer by at least 4 KB. This change alone decreased the time the 464.h264ref benchmark spent in BROWSIX from 25 seconds to under 1.5 seconds. We made a series of improvements that reduce overhead throughout BROWSIX-WASM. Similar, if less dramatic, improvements were made to reduce the number of allocations and the amount of copying in the kernel implementation of pipes.

3 BROWSIX-SPEC

To reliably execute WebAssembly benchmarks while capturing performance counter data, we developed BROWSIX-SPEC. BROWSIX-SPEC works with BROWSIX-WASM to manage spawning browser instances, serving benchmark assets (e.g., the compiled WebAssembly programs and test inputs), spawning perf processes to record performance counter data, and validating benchmark outputs.

We use BROWSIX-SPEC to run three benchmark suites to evaluate WebAssembly's performance: SPEC CPU2006, SPEC CPU2017, and PolyBenchC. These benchmarks are compiled to native code using Clang 4.0, and to WebAssembly using BROWSIX-WASM. We made no modifications to Chrome or Firefox, and the browsers are run with their standard sandboxing and isolation features enabled. BROWSIX-WASM is built on top of standard web features and requires no direct access to host resources; instead, benchmarks make standard HTTP requests to BROWSIX-SPEC.

3.1 BROWSIX-SPEC Benchmark Execution

Figure 2 illustrates the key pieces of BROWSIX-SPEC in play when running a benchmark, such as 401.bzip2 in Chrome. First (1), the BROWSIX-SPEC benchmark harness launches a new browser instance using a WebBrowser automation tool, Selenium.2 (2) The browser loads the page's HTML, harness JS, and BROWSIX-WASM kernel JS over HTTP from the benchmark harness. (3) The harness JS initializes the BROWSIX-WASM kernel and starts a new BROWSIX-WASM process executing the runspec shell script (not shown in Figure 2). runspec in turn spawns the standard specinvoke (not shown), compiled from the C sources provided in SPEC 2006. specinvoke reads the speccmds.cmd file from the BROWSIX-WASM filesystem and starts 401.bzip2 with the appropriate arguments. (4) After the WebAssembly module has been instantiated but before the benchmark's main function is invoked, the BROWSIX-WASM userspace runtime does an XHR request to BROWSIX-SPEC to begin recording performance counter stats. (5) The benchmark harness finds the Chrome thread corresponding to the 401.bzip2 process and attaches perf to the process. (6) At the end of the benchmark, the BROWSIX-WASM userspace runtime does a final XHR to the benchmark harness to end the perf record process. When the runspec program exits (after potentially invoking the test binary several times), the harness JS POSTs (7) a tar archive of the SPEC results directory to BROWSIX-SPEC. After BROWSIX-SPEC receives the full results archive, it unpacks the results to a temporary directory and validates the output using the cmp tool provided with SPEC 2006. Finally, BROWSIX-SPEC kills the browser process and records the benchmark results.

2 https://www.seleniumhq.org/

4 Evaluation

We use BROWSIX-WASM and BROWSIX-SPEC to evaluate the performance of WebAssembly using three benchmark

110 2019 USENIX Annual Technical Conference USENIX Association 3.5 Google Chrome Firefox 3.5 Google Chrome Mozilla Firefox

3.0 3.0

2.5 2.5

2.0 2.0

1.5 1.5 (native = 1.0) (native = 1.0) 1.0 1.0 Relative execution time Relative execution time 0.5 0.5

lu adi bicg mvt syrk 2mm3mm symmsyr2k trmm 429.mcf 470.lbm durbin gemm trisolv 433.milc 473.astar geomean doitgen gemver ludcmp 401.bzip2 444.namd 450.soplex 458.sjeng cholesky fdtd−2d gesummv geomean 445.gobmk 453.povray 641.leela_s644.nab_s correlationcovariance seidel−2d 464.h264ref 482.sphinx3 gramschmidt 462.libquantum PolyBenchC Benchmarks SPEC CPU Benchmarks

(a) (b)

Figure 3: The performance of the PolyBenchC and the SPEC CPU benchmarks compiled to WebAssembly (executed in Chrome and Firefox) relative to native, using BROWSIX-WASM and BROWSIX-SPEC. The SPEC CPU benchmarks exhibit higher overhead overall than the PolyBenchC suite, indicating a significant performance gap exists between WebAssembly and native.

suites: SPEC CPU2006, SPEC CPU2017, and PolyBenchC. We include the PolybenchC benchmarks for comparison with the original WebAssembly paper [18], but argue that these benchmarks do not represent typical workloads. The SPEC benchmarks are representative and require BROWSIX-WASM to run successfully. We run all benchmarks on a 6-core Xeon E5-1650 v3 CPU with hyperthreading and 64 GB of RAM running Ubuntu 16.04 with Linux kernel v4.4.0. We run all benchmarks using two state-of-the-art browsers: Google Chrome 74.0 and Mozilla Firefox 66.0. We compile benchmarks to native code using Clang 4.0 (footnote 3) and to WebAssembly using BROWSIX-WASM, which is based on Emscripten with Clang 4.0 (footnote 4). Each benchmark was executed five times. We report the average of all running times and the standard error. The execution time measured is the difference between wall clock time when the program starts, i.e., after WebAssembly JIT compilation concludes, and when the program ends.

4.1 PolyBenchC Benchmarks

Haas et al. [18] used PolybenchC to benchmark WebAssembly implementations because the PolybenchC benchmarks do not make system calls. As we have already argued, the PolybenchC benchmarks are small scientific kernels that are typically used to benchmark polyhedral optimization techniques, and do not represent larger applications. Nevertheless, it is still valuable for us to run PolybenchC with BROWSIX-WASM, because it demonstrates that our infrastructure for system calls does not have any overhead. Figure 3a shows the execution time of the PolyBenchC benchmarks in BROWSIX-WASM and when run natively. We are able to reproduce the majority of the results from the original WebAssembly paper [18]. We find that BROWSIX-WASM imposes a very low overhead: an average of 0.2% and a maximum of 1.2%.

4.2 SPEC Benchmarks

We now evaluate BROWSIX-WASM using the C/C++ benchmarks from SPEC CPU2006 and SPEC CPU2017 (the new C/C++ benchmarks and the speed benchmarks), which use system calls extensively. We exclude four data points that either do not compile to WebAssembly (footnote 5) or allocate more memory than WebAssembly allows (footnote 6). Table 1 shows the absolute execution times of the SPEC benchmarks when running with BROWSIX-WASM in both Chrome and Firefox, and when running natively.

WebAssembly performs worse than native for all benchmarks except 429.mcf and 433.milc. In Chrome, WebAssembly's maximum overhead is 2.5× over native, and 7 out of 15 benchmarks have a running time within 1.5× of native. In Firefox, WebAssembly is within 2.08× of native and performs within 1.5× of native for 7 out of 15 benchmarks. On average, WebAssembly is 1.55× slower than native in Chrome, and 1.45× slower than native in Firefox.

3 The flags to Clang are -O2 -fno-strict-aliasing.
4 BROWSIX-WASM runs Emscripten with the flags -O2 -s TOTAL_MEMORY=1073741824 -s ALLOW_MEMORY_GROWTH=1 -fno-strict-aliasing.
5 400.perlbench, 403.gcc, 471.omnetpp, and 456.hmmer from SPEC CPU2006 do not compile with Emscripten.
6 From SPEC CPU2017, the ref datasets of 638.imagick_s and 657.xz_s require more than 4 GB of RAM. However, these benchmarks do work with their test dataset.

Benchmark        Native       Google Chrome   Mozilla Firefox
401.bzip2        370 ± 0.6    864 ± 6.4       730 ± 1.3
429.mcf          221 ± 0.1    180 ± 0.9       184 ± 0.6
433.milc         375 ± 2.6    369 ± 0.5       378 ± 0.6
444.namd         271 ± 0.8    369 ± 9.1       373 ± 1.8
445.gobmk        352 ± 2.1    537 ± 0.8       549 ± 3.3
450.soplex       179 ± 3.7    265 ± 1.2       238 ± 0.5
453.povray       110 ± 1.9    275 ± 1.3       229 ± 1.5
458.sjeng        358 ± 1.4    602 ± 2.5       580 ± 2.0
462.libquantum   330 ± 0.8    444 ± 0.2       385 ± 0.8
464.h264ref      389 ± 0.7    807 ± 11.0      733 ± 2.4
470.lbm          209 ± 1.1    248 ± 0.3       249 ± 0.5
473.astar        299 ± 0.5    474 ± 3.5       408 ± 1.0
482.sphinx3      381 ± 7.1    834 ± 1.8       713 ± 3.6
641.leela_s      466 ± 2.7    825 ± 4.6       717 ± 1.2
644.nab_s        2476 ± 11    3639 ± 5.6      3829 ± 6.7
Slowdown: geomean    –        1.55×           1.45×
Slowdown: median     –        1.53×           1.54×

Table 1: Detailed breakdown of SPEC CPU benchmark execution times (average of 5 runs) for native (Clang) and WebAssembly (Chrome and Firefox); all times are in seconds. The median slowdown of WebAssembly is 1.53× for Chrome and 1.54× for Firefox.

Benchmark        Clang 4.0      Google Chrome
401.bzip2        1.9 ± 0.018    0.53 ± 0.005
429.mcf          0.3 ± 0.003    0.15 ± 0.005
433.milc         2.2 ± 0.02     0.3 ± 0.003
444.namd         4.6 ± 0.02     0.78 ± 0.004
445.gobmk        12.1 ± 0.2     1.4 ± 0.014
450.soplex       6.9 ± 0.01     1.2 ± 0.009
453.povray       15.3 ± 0.03    1.2 ± 0.012
458.sjeng        1.9 ± 0.01     0.35 ± 0.001
462.libquantum   6.9 ± 0.03     0.15 ± 0.002
464.h264ref      10.3 ± 0.06    1.0 ± 0.03
470.lbm          0.3 ± 0.001    0.14 ± 0.004
473.astar        0.73 ± 0.005   0.24 ± 0.004
482.sphinx3      3.0 ± 0.04     0.48 ± 0.007
641.leela_s      4.3 ± 0.05     0.74 ± 0.003
644.nab_s        4.1 ± 0.03     0.41 ± 0.001

Table 2: Compilation times of SPEC CPU benchmarks (average of 5 runs) for Clang 4.0 and WebAssembly (Chrome); all times are in seconds.

Table 2 shows the time required to compile the SPEC benchmarks using Clang and Chrome. (To the best of our knowledge, Firefox cannot report WebAssembly compile times.) In all cases, the compilation time is negligible compared to the execution time. However, the Clang compiler is orders of magnitude slower than the WebAssembly compiler. Finally, note that Clang compiles benchmarks from C++ source code, whereas Chrome compiles WebAssembly, which is a simpler format than C++.

[Figure 4: bar chart of the percentage of execution time each SPEC benchmark spends in BROWSIX-WASM calls.]

Figure 4: Time spent (in %) in BROWSIX-WASM calls in Firefox for SPEC benchmarks compiled to WebAssembly. BROWSIX-WASM imposes an overhead of only 0.2% on average.

4.2.1 BROWSIX-WASM Overhead

It is important to rule out the possibility that the slowdown that we report is due to poor performance in our implementation of BROWSIX-WASM. In particular, BROWSIX-WASM implements system calls without modifying the browser, and system calls involve copying data (§2), which may be costly. To quantify the overhead of BROWSIX-WASM, we instrumented its system calls to measure all time spent in BROWSIX-WASM. Figure 4 shows the percentage of time spent in BROWSIX-WASM in Firefox using the SPEC benchmarks. For 14 of the 15 benchmarks, the overhead is less than 0.5%. The maximum overhead is 1.2%. On average, the overhead of BROWSIX-WASM is only 0.2%. Therefore, we conclude that BROWSIX-WASM has negligible overhead and does not substantially affect the performance counter results of programs executed in WebAssembly.

4.2.2 Comparison of WebAssembly and asm.js

A key claim in the original work on WebAssembly was that it is significantly faster than asm.js. We now test that claim using the SPEC benchmarks. For this comparison, we modified BROWSIX-WASM to also support processes compiled to asm.js. The alternative would have been to benchmark the asm.js processes using the original BROWSIX. However, as we discussed earlier, BROWSIX has performance problems that would have been a threat to the validity of our results. Figure 5 shows the speedup of the SPEC benchmarks using WebAssembly, relative to their running time using asm.js, in both Chrome and Firefox. WebAssembly outperforms asm.js in both browsers: the mean speedup is 1.54× in

Chrome and 1.39× in Firefox.

[Figure 5: bar chart of the execution time of asm.js relative to WebAssembly (WebAssembly = 1.0) for each SPEC benchmark, in Chrome and Firefox.]

Figure 5: Relative time of asm.js to WebAssembly for Chrome and Firefox. WebAssembly is 1.54× faster than asm.js in Chrome and 1.39× faster than asm.js in Firefox.

Since the performance difference between Chrome and Firefox is substantial, in Figure 6 we show the speedup of each benchmark by selecting the best-performing browser for WebAssembly and the best-performing browser for asm.js (i.e., they may be different browsers). These results show that WebAssembly consistently performs better than asm.js, with a mean speedup of 1.3×. Haas et al. [18] also found that WebAssembly gives a mean speedup of 1.3× over asm.js using PolyBenchC.

[Figure 6: bar chart of the best asm.js time relative to the best WebAssembly time (WebAssembly = 1.0) for each SPEC benchmark.]

Figure 6: Relative best time of asm.js to the best time of WebAssembly. WebAssembly is 1.3× faster than asm.js.

5 Case Study: Matrix Multiplication

In this section, we illustrate the performance differences between WebAssembly and native code using a C function that performs matrix multiplication, as shown in Figure 7a. Three matrices are provided as arguments to the function, and the result of multiplying A (NI × NK) and B (NK × NJ) is stored in C (NI × NJ), where NI, NK, and NJ are constants defined in the program.

In WebAssembly, this function is 2×-3.4× slower than native in both Chrome and Firefox with a variety of matrix sizes (Figure 8). We compiled the function with -O2 and disabled automatic vectorization, since WebAssembly does not support vectorized instructions.

Figure 7b shows the native code generated for the matmul function by clang-4.0. Arguments are passed to the function in the rdi, rsi, and rdx registers, as specified in the System V AMD64 ABI calling convention [9]. Lines 2-26 are the body of the first loop with iterator i stored in r8d. Lines 5-21 contain the body of the second loop with iterator k stored in r9d. Lines 10-16 comprise the body of the third loop with iterator j stored in rcx. Clang is able to eliminate a cmp instruction in the inner loop by initializing rcx with −NJ, incrementing rcx on each iteration at line 15, and using jne to test the zero flag of the status register, which is set to 1 when rcx becomes 0.

Figure 7c shows the x86-64 code JITed by Chrome for the WebAssembly-compiled version of matmul. This code has been modified slightly: nops in the generated code have been removed for presentation. Function arguments are passed in the rax, rcx, and rdx registers, following Chrome's calling convention. At lines 1-3, the contents of registers rax, rdx, and rcx are stored on the stack, due to the register spills at lines 7-9. Lines 7-45 are the body of the first loop with iterator i stored in edi. Lines 18-42 contain the body of the second loop with iterator k stored in r11. Lines 27-39 are the body of the third loop with iterator j stored in eax. To avoid memory loads due to the register spilling at lines 7-9 in the first iteration of the first loop, an extra jump is generated at line 5. Similarly, extra jumps are generated for the second and third loops at lines 16 and 25, respectively.

5.1 Differences

The code JITed by Chrome has more instructions, suffers from increased register pressure, and has extra branches compared to Clang-generated native code.

5.1.1 Increased Code Size

The number of instructions in the code generated by Chrome (Figure 7c) is 53, including nops, while the Clang-generated code (Figure 7b) consists of only 28 instructions. The poor instruction selection algorithm of Chrome is one of the reasons for the increased code size.

Additionally, Chrome does not take advantage of all available memory addressing modes for x86 instructions. In Figure 7b, Clang uses the add instruction at line 14 with a register addressing mode, loading from and writing to a memory address in the same operation. Chrome, on the other hand, loads

(a) matmul source code in C:

1  void matmul (int C[NI][NJ],
2               int A[NI][NK],
3               int B[NK][NJ]) {
4    for (int i = 0; i < NI; i++) {
5      for (int k = 0; k < NK; k++) {
6        for (int j = 0; j < NJ; j++) {
7          C[i][j] += A[i][k] * B[k][j];
8        }
9      }
10   }
11 }

(b) Native x86-64 code for matmul generated by Clang:

1   xor r8d, r8d                  # i <- 0
2   L1:                           # start first loop
3   mov r10, rdx
4   xor r9d, r9d                  # k <- 0
5   L2:                           # start second loop
6   imul rax, r8, 4*NK
7   add rax, rsi
8   lea r11, [rax + r9*4]
9   mov rcx, -NJ                  # j <- -NJ
10  L3:                           # start third loop
11  mov eax, [r11]
12  mov ebx, [r10 + rcx*4 + 4400]
13  imul ebx, eax
14  add [rdi + rcx*4 + 4*NJ], ebx
15  add rcx, 1                    # j <- j+1
16  jne L3                        # end third loop
17
18  add r9, 1                     # k <- k+1
19  add r10, 4*NK
20  cmp r9, NK
21  jne L2                        # end second loop
22
23  add r8, 1                     # i <- i+1
24  add rdi, 4*NJ
25  cmp r8, NI
26  jne L1                        # end first loop
27  pop rbx
28  ret

(c) x86-64 code JITed by Chrome from WebAssembly matmul:

1   mov [rbp-0x28], rax
2   mov [rbp-0x20], rdx
3   mov [rbp-0x18], rcx
4   xor edi, edi                  # i <- 0
5   jmp L1'
6   L1:                           # start first loop
7   mov ecx, [rbp-0x18]
8   mov edx, [rbp-0x20]
9   mov eax, [rbp-0x28]
10  L1':
11  imul r8d, edi, 0x1130
12  add r8d, eax
13  imul r9d, edi, 0x12c0
14  add r9d, edx
15  xor r11d, r11d                # k <- 0
16  jmp L2'
17  L2:                           # start second loop
18  mov ecx, [rbp-0x18]
19  L2':
20  imul r12d, r11d, 0x1130
21  lea r14d, [r9 + r11*4]
22  add r12d, ecx
23  xor esi, esi                  # j <- 0
24  mov r15d, esi
25  jmp L3'
26  L3:                           # start third loop
27  mov r15d, eax
28  L3':
29  lea eax, [r15 + 0x1]          # j <- j+1
30  lea edx, [r8 + r15*4]
31  lea r15d, [r12 + r15*4]
32  mov esi, [rbx + r14*1]
33  mov r15d, [rbx + r15*1]
34  imul r15d, esi
35  mov ecx, [rbx + rdx*1]
36  add ecx, r15d
37  mov [rbx + rdx*1], ecx
38  cmp eax, NJ                   # j
39  jnz L3                        # end third loop
40  add r11, 0x1                  # k++
41  cmp r11d, NK                  # k
42  jnz L2                        # end second loop
43  add edi, 0x1                  # i++
44  cmp edi, NI                   # i
45  jnz L1                        # end first loop
46  retl

Figure 7: Native code for matmul is shorter, has less register pressure, and fewer branches than the code JITed by Chrome. §6 shows that these inefficiencies are pervasive, reducing performance across the SPEC CPU benchmark suites.

the address into ecx, adds the operand to ecx, and finally stores ecx back to the address (lines 35–37), requiring three instructions rather than one.
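The case study is straightforward to reproduce. The sketch below wraps the kernel of Figure 7a in a self-contained harness; the dimensions NI, NK, and NJ are small illustrative values chosen here, not the sizes used in the paper's measurements, and the self-test helper is our addition. Compiling this file with clang -O2 -S, or to WebAssembly with Emscripten's emcc, lets a reader inspect output analogous to Figures 7b and 7c.

```c
/* matmul kernel from Figure 7a, made self-contained.
 * The dimensions are illustrative placeholders, not the values
 * baked into the paper's generated code (0x1130, 0x12c0, ...). */
#define NI 4
#define NK 3
#define NJ 5

void matmul(int C[NI][NJ], int A[NI][NK], int B[NK][NJ]) {
    for (int i = 0; i < NI; i++)
        for (int k = 0; k < NK; k++)
            for (int j = 0; j < NJ; j++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Self-check: with A all ones and B[k][j] = j, every result entry
 * must satisfy C[i][j] = NK * j. Returns 1 on success, 0 on failure. */
int matmul_selftest(void) {
    int A[NI][NK], B[NK][NJ], C[NI][NJ] = {{0}};
    for (int i = 0; i < NI; i++)
        for (int k = 0; k < NK; k++)
            A[i][k] = 1;
    for (int k = 0; k < NK; k++)
        for (int j = 0; j < NJ; j++)
            B[k][j] = j;
    matmul(C, A, B);
    for (int i = 0; i < NI; i++)
        for (int j = 0; j < NJ; j++)
            if (C[i][j] != NK * j)
                return 0;
    return 1;
}
```

Because NI, NK, and NJ are compile-time constants, the compiler folds them into the address arithmetic, which is why constants such as 4*NK and 4*NJ appear directly in the generated code of Figure 7b.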

5.1.2 Increased Register Pressure

The code generated by Clang in Figure 7b produces no spills and uses only 10 registers. On the other hand, the code generated by Chrome (Figure 7c) uses 13 general-purpose registers, which is every register available to it (r13 and r10 are reserved by V8). As described in Section 5.1.1, eschewing the register-indirect addressing mode of the add instruction requires the use of a temporary register. All of this register inefficiency compounds, introducing three register spills to the stack at lines 1–3. Values stored on the stack are loaded back into registers at lines 7–9 and line 18.

5.1.3 Extra Branches

Clang (Figure 7b) generates code with a single branch per loop by inverting the loop counter (line 15). In contrast, Chrome (Figure 7c) generates more straightforward code, which requires a conditional jump at the start of the loop. In addition, Chrome generates extra jumps to avoid memory loads due to register spills in the first iteration of a loop. For example, the jump at line 5 avoids the spills at lines 7–9.

6 Performance Analysis

We use BROWSIX-SPEC to record measurements from all supported performance counters on our system for the SPEC CPU benchmarks compiled to WebAssembly and executed in Chrome and Firefox, and for the SPEC CPU benchmarks compiled to native code (Section 3). Table 3 lists the performance counters we use here, along with a summary of WebAssembly's impact on these counters compared to native code. We use these results to explain the performance overhead of WebAssembly over native code. Our analysis shows that the inefficiencies described in Section 5 are pervasive and translate to reduced performance across the SPEC CPU benchmark suite.

[Figure 8 plots WebAssembly execution time in Chrome and Firefox relative to native code (native = 1.0) for matrix sizes NI×NK×NJ from 200×220×240 up to 2000×2200×2400.]

Figure 8: Performance of WebAssembly in Chrome and Firefox for different matrix sizes relative to native code. WebAssembly is always between 2× and 3.4× slower than native.

Performance counter              Figure       Summary
all-loads-retired (r81d0)        Figure 9a    Increased register pressure
all-stores-retired (r82d0)       Figure 9b    Increased register pressure
branches-retired (r00c4)         Figure 9c    More branch statements
conditional-branches (r01c4)     Figure 9d    More branch statements
instructions-retired (r1c0)      Figure 9e    Increased code size
cpu-cycles                       Figure 9f    Increased code size
L1-icache-load-misses            Figure 10    Increased code size

Table 3: Performance counters highlight specific issues with WebAssembly code generation. When a raw PMU event descriptor is used, it is indicated by rn.

6.1 Increased Register Pressure

This section focuses on two performance counters that show the effect of increased register pressure. Figure 9a presents the number of load instructions retired by the WebAssembly-compiled SPEC benchmarks in Chrome and Firefox, relative to the number of load instructions retired in native code. Similarly, Figure 9b shows the number of store instructions retired. Note that a "retired" instruction is one that leaves the instruction pipeline with its results correct and visible in the architectural state (that is, it was not speculative).

Code generated by Chrome retires 2.02× more load instructions and 2.30× more store instructions than native code. Code generated by Firefox retires 1.92× more load instructions and 2.16× more store instructions than native code. These results show that the WebAssembly-compiled SPEC CPU benchmarks suffer from increased register pressure and thus increased memory references. Below, we outline the reasons for this increased register pressure.

6.1.1 Reserved Registers

In Chrome, matmul generates three register spills but does not use two x86-64 registers: r13 and r10 (Figure 7c, lines 7–9). This occurs because Chrome reserves these two registers.⁷ For the JavaScript garbage collector, Chrome reserves r13 to point to an array of GC roots at all times. In addition, Chrome uses r10 and xmm13 as dedicated scratch registers. Similarly, Firefox reserves r15 as a pointer to the start of the heap, and r11 and xmm15 as JavaScript scratch registers.⁸ None of these registers are available to WebAssembly code.

⁷ https://github.com/v8/v8/blob/7.4.1/src/x64/register-x64.h
⁸ https://hg.mozilla.org/mozilla-central/file/tip/js/src/jit/x64/Assembler-x64.h

6.1.2 Poor Register Allocation

Beyond a reduced set of registers available to allocate, both Chrome and Firefox do a poor job of allocating the registers they have.

[Figure 9 comprises six bar charts, one per hardware counter, showing samples relative to native (native = 1.0) for each SPEC CPU benchmark in Chrome and Firefox: (a) all-loads-retired, (b) all-stores-retired, (c) branch-instructions-retired, (d) conditional-branches, (e) instructions-retired, (f) cpu-cycles.]

Figure 9: Processor performance counter samples for WebAssembly relative to native code.

For example, the code generated by Chrome for matmul uses 12 registers, while the native code generated by Clang uses only 10 (Section 5.1.2). This increased register usage, in both Firefox and Chrome, results from their use of fast but not particularly effective register allocators: Chrome and Firefox both use a linear-scan register allocator [36], while Clang uses a greedy graph-coloring register allocator [3], which consistently generates better code.

6.1.3 x86 Addressing Modes

The x86-64 instruction set offers several addressing modes for each operand, including a register mode, where the instruction reads data from or writes data to a register, and memory addressing modes such as register-indirect or direct-offset addressing, where the operand resides at a memory address and the instruction can read from or write to that address. A code generator can avoid unnecessary register pressure by using the latter modes. However, Chrome does not take advantage of them. For example, the code generated by Chrome for matmul does not use the register-indirect addressing mode for the add instruction (Section 5.1.2), creating unnecessary register pressure.

6.2 Extra Branch Instructions

This section focuses on two performance counters that measure the number of branch instructions executed. Figure 9c shows the number of branch instructions retired by WebAssembly, relative to the number of branch instructions retired in native code. Similarly, Figure 9d shows the number of conditional branch instructions retired. In Chrome, there are 1.75× more branch instructions and 1.65× more conditional branch instructions retired, respectively, whereas in Firefox, there are 1.65× and 1.62× more retired. These results show that all the SPEC CPU benchmarks incur extra branches; we explain why below.

6.2.1 Extra Jump Statements for Loops

As with matmul (Section 5.1.3), Chrome generates unnecessary jump statements for loops, leading to significantly more branch instructions than Firefox.

6.2.2 Stack Overflow Checks Per Function Call

A WebAssembly program tracks the current stack size with a global variable that it increases on every function call. The host environment can define a maximum stack size for the program. To ensure that a program does not overflow the stack,

both Chrome and Firefox add stack checks at the start of each function to detect whether the current stack size is still below the maximum stack size. These checks include extra comparison and conditional jump instructions, which must be executed on every function call.

6.2.3 Function Table Indexing Checks

WebAssembly dynamically checks all indirect calls to ensure that the target is a valid function and that the function's type at runtime is the same as the type specified at the call site. In a WebAssembly module, the function table stores the list of functions and their types, and the code generated from WebAssembly uses the function table to implement these checks. These checks are required when calling function pointers and virtual functions in C/C++. They lead to extra comparison and conditional jump instructions, which are executed before every indirect function call.

Figure 10: L1-icache-load-misses samples counted for SPEC CPU compiled to WebAssembly executed in Chrome and Firefox, relative to native. 458.sjeng, not shown in the graph, exhibits 26.5× more L1 instruction cache misses in Chrome and 18.6× more in Firefox. The increased code size generated for WebAssembly leads to more instruction cache misses.

Performance Counter              Chrome    Firefox
all-loads-retired                2.02×     1.92×
all-stores-retired               2.30×     2.16×
branch-instructions-retired      1.75×     1.65×
conditional-branches             1.65×     1.62×
instructions-retired             1.80×     1.75×
cpu-cycles                       1.54×     1.38×
L1-icache-load-misses            2.83×     2.04×

Table 4: The geomean of performance counter increases for the SPEC benchmarks using WebAssembly.

6.3 Increased Code Size

The code generated by Chrome and Firefox is considerably larger than the code generated by Clang. We use three performance counters to measure this effect. Figure 9e shows the number of instructions retired by benchmarks compiled to WebAssembly and executed in Chrome and Firefox, relative to the number of instructions retired in native code. Similarly, Figure 9f shows the relative number of CPU cycles spent by benchmarks compiled to WebAssembly, and Figure 10 shows the relative number of L1 instruction cache load misses.

Figure 9e shows that Chrome executes an average of 1.80× more instructions than native code, and Firefox an average of 1.75× more. Due to poor instruction selection, a register allocator that generates more spills (Section 6.1), and extra branch statements (Section 6.2), the code generated for WebAssembly is larger than native code, leading to more instructions being executed. This increase in the number of instructions executed leads to the increased L1 instruction cache misses shown in Figure 10: on average, Chrome suffers 2.83× more L1 instruction cache misses than native code, and Firefox 2.04× more. More cache misses mean that more CPU cycles are spent waiting for instructions to be fetched.

We note one anomaly: although 429.mcf retires 1.6× more instructions in Chrome and 1.5× more in Firefox than native code, it runs faster than native code. Figure 3b shows that its slowdown relative to native is 0.81× in Chrome and 0.83× in Firefox. This anomaly is directly attributable to its lower number of L1 instruction cache misses: 429.mcf contains a main loop, and most of the instructions in that loop fit in the L1 instruction cache. Similarly, 433.milc performs better due to fewer L1 instruction cache misses. In 450.soplex, there are 4.6× more L1 instruction cache misses in Chrome and Firefox than in native code because several virtual functions are executed, leading to more indirect function calls.

6.4 Discussion

It is worth asking whether the performance issues identified here are fundamental. We believe that two of them are not; that is, they could be ameliorated by improved implementations. WebAssembly implementations today use register allocators (§6.1.2) and code generators (§6.2.1) that perform worse than Clang's counterparts. However, an offline compiler like Clang can spend considerably more time generating better code, whereas WebAssembly compilers must be fast enough to run online. Therefore, solutions adopted by other JITs, such as further optimizing hot code, are likely applicable here [19, 32].
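To make the per-call cost of the safety checks described in §6.2.2 and §6.2.3 concrete, the following C sketch models what an engine's generated code must do around every call. Every name in it (wasm_stack_size, fn_table, call_indirect, the frame size, and the type indices) is a hypothetical illustration, not actual V8 or SpiderMonkey internals, which implement these checks directly in generated machine code.

```c
/* Illustrative model of WebAssembly's per-call safety checks.
 * All names and constants here are invented for exposition. */
#include <stddef.h>

#define MAX_STACK  (1 << 20)   /* maximum stack size set by the host */
#define TABLE_SIZE 2

static long wasm_stack_size = 0;   /* global tracking current stack use */
static int  trapped = 0;           /* set when a check fails (a "trap") */

typedef int (*wasm_fn)(int);

/* Function table: each entry pairs a type index with a code pointer. */
struct table_entry { int type_index; wasm_fn fn; };

static int square(int x) { return x * x; }
static struct table_entry fn_table[TABLE_SIZE] = {
    { /* type_index = */ 7, square },
    { /* type_index = */ 9, NULL },
};

/* Stack check (cf. §6.2.2): executed in every function prologue. */
static int enter_function(long frame_size) {
    wasm_stack_size += frame_size;
    if (wasm_stack_size > MAX_STACK) { trapped = 1; return 0; }
    return 1;
}
static void leave_function(long frame_size) { wasm_stack_size -= frame_size; }

/* Indirect-call checks (cf. §6.2.3): bounds, validity, and type match,
 * all performed before transferring control through the table. */
static int call_indirect(int index, int expected_type, int arg, int *result) {
    if (index < 0 || index >= TABLE_SIZE || fn_table[index].fn == NULL ||
        fn_table[index].type_index != expected_type) {
        trapped = 1;               /* invalid index or type mismatch: trap */
        return 0;
    }
    if (!enter_function(64)) return 0;   /* 64: illustrative frame size */
    *result = fn_table[index].fn(arg);
    leave_function(64);
    return 1;
}
```

Each successful indirect call thus pays for a bounds check, a null check, a type comparison, and a stack-limit comparison; these correspond to the extra compare-and-branch instructions measured in Figures 9c and 9d.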

The four other issues that we have identified appear to arise from the design constraints of WebAssembly: the stack overflow checks (§6.2.2), indirect call checks (§6.2.3), and reserved registers (§6.1.1) have a runtime cost and lead to increased code size (§6.3). Unfortunately, these checks are necessary for WebAssembly's safety guarantees. A redesigned WebAssembly, with richer types for memory and function pointers [23], might be able to perform some of these checks at compile time, but that could complicate the implementation of compilers that produce WebAssembly. Finally, a WebAssembly implementation in a browser must interoperate with a high-performance JavaScript implementation, which may impose its own constraints. For example, current JavaScript implementations reserve a few registers for their own use, which increases register pressure on WebAssembly.

7 Related Work

Precursors to WebAssembly. There have been several attempts to execute native code in browsers, but none of them met all the design criteria of WebAssembly.

ActiveX [13] allows web pages to embed signed x86 libraries; however, these binaries have unrestricted access to the Windows API. In contrast, WebAssembly modules are sandboxed. ActiveX is now deprecated.

Native Client (NaCl) [11, 37] adds a module to a web application that contains platform-specific machine code. NaCl introduced sandboxing techniques to execute platform-specific machine code at near-native speed. Since NaCl relies on static validation of machine code, it requires code generators to follow certain patterns and hence supports only a subset of the x86, ARM, and MIPS instruction sets in the browser. To address the inherent portability issue of NaCl, Portable NaCl (PNaCl) [14] uses LLVM Bitcode as a binary format. However, PNaCl does not provide a significant improvement in compactness over NaCl and still exposes compiler- and/or platform-specific details, such as the layout of the call stack. Both have been deprecated in favor of WebAssembly.

asm.js is a subset of JavaScript designed to be compiled efficiently to native code. asm.js uses type coercions to avoid the dynamic type system of JavaScript. Since asm.js is a subset of JavaScript, adding new native features to asm.js, such as 64-bit integers, first requires extending JavaScript. Compared to asm.js, WebAssembly provides several improvements: (i) WebAssembly binaries are compact, due to a lightweight representation compared to JavaScript source; (ii) WebAssembly is more straightforward to validate; (iii) WebAssembly provides formal guarantees of type safety and isolation; and (iv) WebAssembly has been shown to provide better performance than asm.js.

As a portable, low-level bytecode, WebAssembly is similar to the Java Virtual Machine [21] and the Common Language Runtime [25]. However, WebAssembly is very different from these platforms: for example, WebAssembly supports neither objects nor unstructured control flow.

The WebAssembly specification defines its operational semantics and type system, along with a proof of type soundness. This proof was mechanized using the Isabelle theorem prover, and that mechanization effort found and addressed a number of issues in the specification [35]. RockSalt [22] is a similar verification effort for NaCl: it implements the NaCl verification toolchain in Coq, along with a proof of correctness with respect to a model of the subset of x86 instructions that NaCl supports.

Analysis of SPEC Benchmarks using performance counters. Several papers use performance counters to analyze the SPEC benchmarks. Panda et al. [26] analyze the SPEC CPU2017 benchmarks, applying statistical techniques to identify similarities among benchmarks. Phansalkar et al. perform a similar study on SPEC CPU2006 [27]. Limaye and Adegbija identify workload differences between SPEC CPU2006 and SPEC CPU2017 [20].

8 Conclusions

This paper performs the first comprehensive performance analysis of WebAssembly. We develop BROWSIX-WASM, a significant extension of BROWSIX, and BROWSIX-SPEC, a harness that enables detailed performance analysis, to let us run the SPEC CPU2006 and CPU2017 benchmarks as WebAssembly in Chrome and Firefox. We find that the mean slowdown of WebAssembly vs. native across the SPEC benchmarks is 1.55× for Chrome and 1.45× for Firefox, with peak slowdowns of 2.5× in Chrome and 2.08× in Firefox. We identify the causes of these performance gaps, providing actionable guidance for future optimization efforts.

Acknowledgements. We thank the reviewers and our shepherd, Eide, for their constructive feedback. This work was partially supported by NSF grants 1439008 and 1413985.

118 2019 USENIX Annual Technical Conference USENIX Association References [13] David A Chappell. Understanding ActiveX and OLE. Press, 1996. [1] . https://blazor.net/. [Online; accessed 5-January-2019]. [14] Alan Donovan, Robert Muth, Brad Chen, and David [2] Compiling from Rust to WebAssembly. Sehr. PNaCl: Portable Native Client Executa- https://developer.mozilla.org/en-US/docs/ bles. https://css.csail.mit.edu/6.858/2012/ WebAssembly/Rust_to_wasm. [Online; accessed readings/pnacl., 2010. 5-January-2019]. [15] . From ASM.JS to Web- [3] LLVM Reference Manual. https://llvm.org/docs/ Assembly. https://brendaneich.com/2015/ CodeGenerator.html. 06/from-asm-js-to-/, 2015. [Online; [4] NaCl and PNaCl. https://developer.chrome.com/ accessed 5-January-2019]. native-client/nacl-and-pnacl. [Online; accessed 5-January-2019]. [16] Eric Elliott. What is WebAssembly? https:// [5] PolyBenchC: the polyhedral benchmark suite. tinyurl.com/o5h6daj, 2015. [Online; accessed 5- http://web.cs.ucla.edu/~pouchet/software/ January-2019]. polybench/. [Online; accessed 14-March-2017]. [17] , Brendan Eich, Mike Shaver, David An- [6] Raise Chrome JS heap limit? - Stack Over- derson, David Mandelin, Mohammad . Haghighat, https://stackoverflow.com/questions/ flow. Blake Kaplan, Graydon Hoare, Boris Zbarsky, Jason 43643406/raise-chrome-js-heap-limit . [Online; Orendorff, Jesse Ruderman, Edwin W. Smith, Rick Re- accessed 5-January-2019]. itmaier, Michael Bebenita, Chang, and Michael [7] Use cases. https://webassembly.org/docs/ Franz. Trace-based Just-in-time Type Specialization use-cases/. for Dynamic Languages. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language [8] WebAssembly. https://webassembly.org/. [On- Design and Implementation, PLDI ’09, pages 465–478. line; accessed 5-January-2019]. ACM, 2009. [9] System V Application Binary Interface AMD64 Architecture Processor Supplement. [18] Andreas Haas, Andreas Rossberg, Derek L. 
Schuff, https://software.intel.com/sites/default/ Ben L. Titzer, Michael Holman, Dan Gohman, Luke files/article/402129/mpx-linux64-abi.pdf, Wagner, Alon Zakai, and JF Bastien. Bringing the Web 2013. Up to Speed with WebAssembly. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Lan- [10] Steve Akinyemi. A curated list of languages guage Design and Implementation, PLDI 2017, pages that compile directly to or have their VMs in Web- 185–200. ACM, 2017. Assembly. https://github.com/appcypher/ awesome-wasm-langs. [Online; accessed 5-January- 2019]. [19] Thomas Kotzmann, Christian Wimmer, Hanspeter Mössenböck, Thomas Rodriguez, Kenneth Russell, and [11] Jason Ansel, Petr Marchenko, Úlfar Erlingsson, Elijah David Cox. Design of the Java HotSpot Client Com- Taylor, Brad Chen, Derek L. Schuff, David Sehr, Cliff L. piler for Java 6. ACM Trans. Archit. Code Optim., Biffle, and Bennet Yee. Language-independent Sand- 5(1):7:1–7:32, 2008. boxing of Just-in-time Compilation and Self-modifying Code. In Proceedings of the 32nd ACM SIGPLAN [20] Ankur Limaye and Tosiron Adegbija. A Workload Char- Conference on Programming Language Design and Im- acterization of the SPEC CPU2017 Benchmark Suite. plementation, PLDI ’11, pages 355–366. ACM, 2011. In 2018 IEEE International Symposium on Performance [12] Michael Bebenita, Florian Brandner, Manuel Fahndrich, Analysis of Systems and Software (ISPASS), pages 149– Francesco Logozzo, Wolfram Schulte, Nikolai Tillmann, 158, 2018. and Herman Venter. SPUR: A Trace-based JIT Com- piler for CIL. In Proceedings of the ACM International [21] Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Conference on Object Oriented Programming Systems Buckley. The Specification, Java Languages and Applications, OOPSLA ’10, pages 708– SE 8 Edition. Addison-Wesley Professional, 1st edition, 725. ACM, 2010. 2014.

USENIX Association 2019 USENIX Annual Technical Conference 119 [22] Greg Morrisett, Gang Tan, Joseph Tassarotti, Jean- [31] Marija Selakovic and Michael Pradel. Performance Baptiste Tristan, and Edward Gan. RockSalt: Better, Issues and Optimizations in JavaScript: An Empirical Faster, Stronger SFI for the x86. In Proceedings of Study. In Proceedings of the 38th International Confer- the 33rd ACM SIGPLAN Conference on Programming ence on , ICSE ’16, pages 61–72. Language Design and Implementation, PLDI ’12, pages ACM, 2016. 395–404. ACM, 2012. [32] Toshio Suganuma, Toshiaki Yasue, Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani. A Dynamic Op- [23] Greg Morrisett, David Walker, Karl Crary, and Neal timization Framework for a Java Just-in-time Compiler. Glew. From System F to Typed . In Proceedings of the 16th ACM SIGPLAN Conference ACM Trans. Program. Lang. Syst., 21(3):527–568, on Object-oriented Programming, Systems, Languages, 1999. and Applications, OOPSLA ’01, pages 180–195. ACM, 2001. [24] Richard Musiol. A compiler from Go to JavaScript for running Go code in a browser. https://github. [33] Luke Wagner. asm.js in Firefox Nightly | Luke Wagner’s com/gopherjs/gopherjs, 2016. [Online; accessed Blog. https://blog.mozilla.org/luke/2013/03/ 5-January-2019]. 21/asm-js-in-firefox-nightly/. [Online; ac- cessed 21-May-2019]. [25] George C. Necula, Scott McPeak, Shree P. Rahul, and [34] Luke Wagner. A WebAssembly Milestone: Westley Weimer. CIL: Intermediate Language and Experimental Support in Multiple Browsers. Tools for Analysis and Transformation of C Programs. https://hacks.mozilla.org/2016/03/ In R. Nigel Horspool, editor, Compiler Construction, a-webassembly-milestone/, 2016. [Online; pages 213–228. Springer, 2002. accessed 5-January-2019].

[26] Reena Panda, Shuang Song, Joseph Dean, and Lizy K. [35] Conrad Watt. Mechanising and Verifying the Web- John. Wait of a Decade: Did SPEC CPU 2017 Broaden Assembly Specification. In Proceedings of the 7th the Performance Horizon? In 2018 IEEE International ACM SIGPLAN International Conference on Certified Symposium on High Performance Architec- Programs and Proofs, CPP 2018, pages 53–65. ACM, ture (HPCA), pages 271–282, 2018. 2018. [36] Christian Wimmer and Michael Franz. Linear Scan [27] Aashish Phansalkar, Ajay Joshi, and Lizy K. John. Register Allocation on SSA Form. In Proceedings of Analysis of Redundancy and Application Balance in the 8th Annual IEEE/ACM International Symposium on the SPEC CPU2006 Benchmark Suite. In Proceedings Code Generation and Optimization, CGO ’10, pages of the 34th Annual International Symposium on Com- 170–179. ACM, 2010. puter Architecture, ISCA ’07, pages 412–423. ACM, 2007. [37] Bennet Yee, David Sehr, Greg Dardyk, Brad Chen, Robert Muth, Tavis Ormandy, Shiki Okasaka, Neha [28] Bobby Powers, John Vilk, and Emery D. Berger. Narula, and Nicholas Fullagar. Native Client: A Sand- Browsix: Unix in your browser tab. https://browsix. box for Portable, Untrusted x86 Native Code. In org. IEEE Symposium on Security and Privacy (Oakland’09), IEEE, 2009. [29] Bobby Powers, John Vilk, and Emery D. Berger. [38] Alon Zakai. asm.js. http://asmjs.org/. [Online; Browsix: the Gap Between Unix and the accessed 5-January-2019]. Browser. In Proceedings of the Twenty-Second Inter- national Conference on Architectural Support for Pro- [39] Alon Zakai. Emscripten: An LLVM-to-JavaScript Com- gramming Languages and Operating Systems, ASPLOS piler. In Proceedings of the ACM International Con- ’17, pages 253–266. ACM, 2017. ference Companion on Object Oriented Programming Systems Languages and Applications Companion, OOP- [30] Gregor Richards, Sylvain Lebresne, Brian Burg, and SLA ’11, pages 301–312. ACM, 2011. Jan Vitek. 
An Analysis of the Dynamic Behavior of JavaScript Programs. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language De- sign and Implementation, PLDI ’10, pages 1–12. ACM, 2010.

120 2019 USENIX Annual Technical Conference USENIX Association