Taint Tracking for WebAssembly


ARON SZANTO, Harvard University
TIMOTHY TAMM, Harvard University
ARTIDORO PAGNONI, Harvard University

arXiv:1807.08349v1 [cs.CR] 22 Jul 2018

WebAssembly seeks to provide an alternative to running large and untrusted binaries within web browsers by implementing a portable, performant, and secure bytecode format for native web computation. However, WebAssembly is largely unstudied from a security perspective. In this work, we build the first WebAssembly virtual machine that runs in native JavaScript, and implement a novel taint tracking system that allows a user to run untrusted WebAssembly code while monitoring the flow of sensitive data through the application. We also introduce indirect taint, a label that denotes the implicit flow of sensitive information between local variables. Through rigorous testing and validation, we show that our system is correct, secure, and relatively efficient, benefiting from the native performance of WebAssembly while retaining precise security guarantees of more mature software paradigms.

1 INTRODUCTION

As web applications grow in size and complexity, they require users to rely on third-party browser plugins. These large programs offer the capacity to handle heavy computational loads in exchange for bulky and potentially insecure non-native implementations. In the past, growing demand for complex applications like video editing software, 3D games, and scientific programs left both users and developers little choice by way of software models for heavy-duty software on the client side. However, an alternative framework known as WebAssembly (wasm) was recently released as the standard of the future for native high-performance computing in the browser. wasm allows for the compilation of C or C++ code into a novel binary instruction set, which browsers will be configured to execute in a sandboxed virtual environment within their runtime engines. Since wasm code is compiled, optimized, static, has a linear memory model, and does not include built-in automatic garbage collection, it is 20x-40x faster than JavaScript [2]. And because wasm is intended to run natively, its developers have focused intently on security guarantees that were previously intractable in the face of large, third-party codebases.

Though wasm is both economical and performant, wide adoption by the community requires, as with all new languages, the development of comprehensive security tools atop it so that code can be checked for safety. One important challenge in security analysis is to monitor the flow of sensitive information through a particular program. In other environments, taint tracking has been deployed as a model for strict bookkeeping of sensitive data [5]. However, there does not yet exist a platform for taint tracking inside the wasm execution environment. Native wasm taint tracking requires the browser to interpret wasm binary code using client-side (JavaScript) software to track the flow of information at runtime. However, (to our knowledge) there does not exist a JavaScript virtual machine (VM) for wasm. Our contributions are thus twofold: we develop the first JavaScript wasm VM, and atop this framework we institute native binary-code-granularity taint tracking.

The remainder of this paper will proceed as follows: Section 2 gives an overview of the WebAssembly technical specification and describes related work in taint tracking. Section 3 describes our technical approach to building both the JavaScript wasm Virtual Machine and the taint tracking software linked to it. Section 4 describes the test environment, including three parts: a parallel compilation of C into assembly and WebAssembly with verification of their equivalence via our virtual machine, an extensive suite of taint tracking tasks to validate the correctness of our methods, and a performance evaluation to demonstrate the relative efficiency of our taint tracking implementation. Section 5 concludes and suggests avenues for future work.

2 BACKGROUND AND RELATED WORK

2.1 WebAssembly Technical Overview

WebAssembly is a low-level bytecode format designed to be compiled from C and C++ and run natively in web browsers. In the past, users of complex applications would have to install browser plugins, which are cumbersome and untrusted. WebAssembly allows for native execution of high-performance code within the browser while adhering to strict security guidelines like sandboxed execution and deterministic behavior. Since its development and MVP phase in 2016, WebAssembly has enjoyed quick adoption by major browsers: an October 2017 estimate put the share of browsers supporting wasm at 61.34% [1].

WebAssembly's runtime engine is described as a "structured stack machine" [3] in that most wasm computations involve a local stack of values, function calls push and pop values from the stack, and control flow is organized into blocks, ifs, and loops. Each binary operation code (opcode) is parsed serially and independently, with the full binary syntax specified as the instantiation of a formal semantics. This allows wasm to define an abstract runtime structure that is hardware-agnostic, allowing for full portability across languages, browsers, operating systems, and machines [6]. However, in exchange for this flexibility, instructions are of variable length, which complicates interpreter implementations.

WebAssembly bytecode functions are organized into blocks of instructions which are decoded and executed in sequence. An opcode specifies either a control instruction, which may change the state of the program instruction counter (similar to the %eip value in x86), or a simple instruction, which performs an operation over the values at the top of the stack before pushing the results back onto the stack. wasm opcodes are strongly typed, with each operation specifying an exact datatype (or datatypes) over which it operates. Along with the serial nature of the instruction stream, this guarantees that secure verification of a wasm program can be done in one pass.

WebAssembly's memory model is simple: a linear, contiguous block of memory that is sandboxed away from the stack, local variables, and the runtime engine's memory. This preserves security and ensures simplicity for access (and taint tracking). Last, wasm does not have direct access to system resources, instead relying on external JavaScript code to pass in data to the virtual environment. However, WebAssembly is able to export data to the runtime environment, meaning that there is a need for a system that monitors wasm code to ensure the proper handling of secure data.

2.2 Taint Tracking

Most users regularly use a wide variety of software that (perhaps unbeknownst to the user) has access to sensitive information, including credit card numbers; device hardware data; system, personal, and advertising preferences; and personal identifiers like social security numbers and birth dates. Because software permissions are both coarse-grained and highly opaque, mechanisms for monitoring the flow of sensitive information during the execution of a semi-trusted program are a valuable tool in security analysis.

Taint tracking is a technique that assigns each data object in a program a taint label that contains information about its sensitivity. Taint sources are those that are inherently sensitive (e.g., personal information, IMEI numbers), and their labels are initialized as tainted. As the program executes, taint is transitively propagated between data objects when one object's value is influenced by that of a tainted object. As such, at any point in time the taint tracking engine is able to determine precisely which data are tainted (i.e., either directly or indirectly store sensitive information) and are thus unsafe to transmit to untrusted parties. Many systems for information flow monitoring in previous work are coarse-grained and operate at the emulator level [7], but for our purposes a bytecode-granularity tracking system is required; [5] describes some of these previous efforts. One recent bytecode-level implementation is TaintDroid [5], a version of taint tracking built atop the Android mobile operating system. TaintDroid leverages instruction code taint propagation in order to shield users from exposing information to untrusted sources, and to identify applications that act carelessly or maliciously towards users' data privacy.

We motivate our work by noting that despite the many advantages promised by WebAssembly, its relative youth implies a lack of security tools built atop it. Since JavaScript is the only language that can run natively in a browser, it is essential that there be a JavaScript-based virtual machine that can execute and instrument the wasm bytecode for applications like taint tracking. As such, we build such a machine as a substrate for our taint tracking and other future security software.

The largest distributable and executable unit of code in wasm is known as a module. We represent a module as a JavaScript process, meaning that each VM instance contains one module. Within a module, the WebAssembly abstract runtime is organized by section, independent components of the engine with global scope and idiosyncratic responsibilities. The sections defined by the specification are: import, export, start, global, memory, data, table, elements, function, and code.

The scaffolding routine (build_module() in our implementation) sets up the local environment for the current module, creating JavaScript objects for local memory structures like the runtime stack, instantiating objects for the various sections (e.g., reading and storing static data), and making space for function execution and dynamically-allocated memory. In more detail, scaffolding proceeds as follows: Check the magic values and version codes at the beginning of the bytecode to ensure initial validity. Allocate JavaScript objects to hold tables, data, globals, memory, functions, exports, and types (each defined in the formal specification). Starting with the first byte after the header information, read instructions sequentially. For each section code encountered, instantiate JavaScript objects as required by the wasm binary. For example, the scaffolding may encounter a binary code
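
To make the first scaffolding steps concrete, the following is a minimal sketch of a header check followed by a sequential walk over section codes. It is not the paper's build_module(); the names scanModule, readVarUint32, and SECTION_NAMES are hypothetical and chosen only for illustration. The sketch assumes the standard wasm binary layout: a 4-byte magic value, a 4-byte version, then sections that each consist of a one-byte id, a LEB128-encoded size, and a payload. The LEB128 size field is one place where the variable-length encoding mentioned in Section 2.1 shows up.

```javascript
// Illustrative sketch only; these names are assumptions, not the paper's API.
const SECTION_NAMES = [
  "custom", "type", "import", "function", "table", "memory",
  "global", "export", "start", "element", "code", "data",
];

// Decode an unsigned 32-bit LEB128 integer starting at `offset`.
function readVarUint32(bytes, offset) {
  let result = 0;
  let shift = 0;
  let pos = offset;
  while (true) {
    if (pos >= bytes.length) throw new Error("unexpected end of input");
    const byte = bytes[pos++];
    result |= (byte & 0x7f) << shift;
    if ((byte & 0x80) === 0) break; // high bit clear: last byte
    shift += 7;
    if (shift > 28) throw new Error("varuint32 is longer than 5 bytes");
  }
  return { value: result >>> 0, next: pos };
}

// Check the header, then record where each section's payload lives.
function scanModule(bytes) {
  const magic = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
  const version = [0x01, 0x00, 0x00, 0x00]; // version 1, little-endian
  if (bytes.length < 8) throw new Error("truncated header");
  for (let i = 0; i < 4; i++) {
    if (bytes[i] !== magic[i]) throw new Error("bad magic value");
  }
  for (let i = 0; i < 4; i++) {
    if (bytes[4 + i] !== version[i]) throw new Error("unsupported version");
  }

  const sections = [];
  let pos = 8; // first byte after the header
  while (pos < bytes.length) {
    const id = bytes[pos++];
    const size = readVarUint32(bytes, pos);
    const start = size.next;
    const end = start + size.value;
    if (end > bytes.length) throw new Error("section overruns module");
    sections.push({ id, name: SECTION_NAMES[id] || "unknown", start, end });
    pos = end; // skip the payload; a real VM would decode it here
  }
  return sections;
}

// A tiny module: a type section (one empty function type) and a function section.
const moduleBytes = Uint8Array.of(
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,
  0x03, 0x02, 0x01, 0x00
);
const found = scanModule(moduleBytes);
console.log(found.map((s) => s.name).join(", "));
```

A full scaffolding routine would decode each payload into the corresponding JavaScript object at the point where this sketch only records offsets.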
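
The transitive propagation rule from Section 2.2 can also be illustrated on a toy stack machine. The sketch below is an assumption-laden simplification, not the paper's implementation: the TaintedValue class, the run function, and the four-instruction subset are hypothetical, and only direct taint is modeled (the indirect taint label introduced in the abstract is not covered). Each value carries a boolean label, and the result of an arithmetic instruction is tainted whenever either operand is.

```javascript
// Illustrative sketch only; class and function names are hypothetical.
class TaintedValue {
  constructor(value, tainted = false) {
    this.value = value;
    this.tainted = tainted;
  }
}

function pop(stack) {
  if (stack.length === 0) throw new Error("stack underflow");
  return stack.pop();
}

// Execute a list of [opcode, argument] pairs against an array of locals.
function run(program, locals) {
  const stack = [];
  for (const [op, arg] of program) {
    switch (op) {
      case "i32.const": // constants are never taint sources
        stack.push(new TaintedValue(arg | 0, false));
        break;
      case "local.get": { // copy the value together with its label
        const v = locals[arg];
        stack.push(new TaintedValue(v.value, v.tainted));
        break;
      }
      case "local.set": // the label travels with the stored value
        locals[arg] = pop(stack);
        break;
      case "i32.add": { // result is tainted if either operand is
        const b = pop(stack);
        const a = pop(stack);
        stack.push(new TaintedValue((a.value + b.value) | 0, a.tainted || b.tainted));
        break;
      }
      case "i32.mul": {
        const b = pop(stack);
        const a = pop(stack);
        stack.push(new TaintedValue(Math.imul(a.value, b.value), a.tainted || b.tainted));
        break;
      }
      default:
        throw new Error("unsupported opcode: " + op);
    }
  }
  return stack;
}

// Local 0 is a taint source (for example, a sensitive identifier).
const locals = [
  new TaintedValue(1234, true),
  new TaintedValue(10, false),
  new TaintedValue(0, false),
];
const stack = run(
  [
    ["local.get", 0], ["i32.const", 1], ["i32.add"], ["local.set", 2],
    ["local.get", 1], ["i32.const", 2], ["i32.mul"],
  ],
  locals
);
console.log(locals[2], stack[0]);
```

Here local 2 ends up tainted because its value was derived from local 0, while the product left on the stack stays untainted because neither of its operands was sensitive.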
