Phosphor: Illuminating Dynamic Data Flow in Commodity Jvms

Phosphor: Illuminating Dynamic Data Flow in Commodity Jvms

Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs Jonathan Bell and Gail Kaiser Columbia University, New York, NY USA rtifact A Comple * t * te n * A * te is W E s A e n C l fjbell,[email protected] l L o D C S o * * c P u e m s E u O e e n v R t e O o d t a y * s E a * l u d a e t Abstract 1. Introduction Dynamic taint analysis is a well-known information flow Dynamic taint analysis (also referred to as dynamic informa- analysis problem with many possible applications. Taint tion flow tracking) is a powerful form of information flow tracking allows for analysis of application data flow by analysis useful for identifying the origin of data during ex- assigning labels to data, and then propagating those la- ecution. Inputs to an application are “tainted,” or labeled bels through data flow. Taint tracking systems traditionally with a tag. As computations are performed, these labels are compromise among performance, precision, soundness, and propagated through the system such that any new values de- portability. Performance can be critical, as these systems rived from a tagged value also carry a tag derived from these are often intended to be deployed to production environ- source input tags. In this way, we can inspect any object and ments, and hence must have low overhead. To be deployed determine if it is derived from a tainted input by inspecting in security-conscious settings, taint tracking must also be its label. By maintaining a precise mapping from objects to sound and precise. Dynamic taint tracking must be portable labels, we can enable a broad range of analyses, for such in order to be easily deployed and adopted for real world purposes as end-user privacy testing [16], fine-grained data purposes, without requiring recompilation of the operating security [3, 10, 27, 31], detection of code-injection attacks system or language interpreter, and without requiring access [22, 33, 35] and improved debugging [17, 25]. to application source code. Taint tracking systems typically face challenges in both We present PHOSPHOR, a dynamic taint tracking system precision and soundness, in that it is generally difficult to for the Java Virtual Machine (JVM) that simultaneously determine the bounds of a variable in memory. Therefore, achieves our goals of performance, soundness, precision, some parts of a variable may become dissociated with their and portability. Moreover, to our knowledge, it is the first intended taint tag, or multiple variables may inadvertently portable general purpose taint tracking system for the JVM. be tracked together as a single variable. Taint tracking sys- We evaluated PHOSPHOR’s performance on two commonly tems similarly face a challenge of performance: many ap- used JVM languages (Java and Scala), on two successive plications for taint tracking rely on its use in real, deployed revisions of two commonly used JVMs (Oracle’s HotSpot systems, demanding an acceptably low run time overhead. and OpenJDK’s IcedTea) and on Android’s Dalvik Virtual Traditional approaches to building taint tracking sys- Machine, finding its performance to be impressive: as low tems that address these challenges normally rely on mod- as 3% (53% on average; 220% at worst) using the DaCapo ifications to the operating system [39, 44], modifications to macro benchmark suite. This paper describes our approach the language interpreter [3, 9, 16, 20, 28, 29], or access to toward achieving portable taint tracking in the JVM. source code [24, 42]. While taint tracking at the interpreter Categories and Subject Descriptors D.3.3 [Programming or source code level can improve soundness and precision Languages]: Language Constructs and Features; D.4.6 (by providing variable-level memory semantics), these ap- [Operating Systems]: Security and Protection—Information proaches all introduce a new challenge: portability. Such ap- flow controls proaches can have limited applicability due to the need for Keywords Taint Tracking, Dataflow Analysis modifications to client systems (e.g. specialized operating systems), need for specialized interpreters, or need to access source code. For example, in the case of two taint tracking systems that target the JVM [9, 28], they are restricted to support only research-oriented JVMs which do not support the full Java specification. For example, these JVMs are un- able to execute the entirety of the popular macro-benchmark suite DaCapo [5], an indicator that they may be impracti- cal to deploy in client environments. We are not aware of Columbia University Computer Science Technical Report #CUCS-008-14 any taint tracking systems that target the JVM that are suffi- ciently portable to function with commonly used JVMs from 2. Motivation vendors such as Oracle and the OpenJDK project. Although several existing systems target Java applications Our key insight is that we can leverage the same ben- (e.g. [12, 21, 22]) by modifying application or library byte efits (e.g., in soundness and precision) that interpreter-level code, these are not general purpose: they can only track data and source code-level approaches gain without requiring any flow of Java Strings (and not of any other type), and there- modification to the underlying interpreter by taking advan- fore are unable to continue tracking those Strings in the event tage of the strong specification for the intermediate language that they are converted by the application to another repre- (i.e. byte code) that runs in that interpreter. This approach is sentation (such as a character array). Moreover, these sys- a compromise between interpreter level approaches (which tems can not track inputs that are not Strings (e.g. integers, do not require access to source code but require modifica- or a language-specific version of String in another, non-Java tions to the interpreter) and source level approaches (which JVM language). require access to source code but do not change the inter- Several existing systems can perform taint tracking on all preter), functioning instead at the level of byte code. PHOS- data types in Java, but are highly restricted in portability, PHOR provides taint tracking within the Java Virtual Ma- functioning only on research JVMs. The JVMs targeted by chine (JVM) without requiring any modifications to the lan- [28] (Kaffe [37]) and [9] (Jikes RVM [36]) support only a guage interpreter, VM, or operating system, and without re- subset of Java version 6, severely limiting applicability. We quiring any access to source code. Moreover, PHOSPHOR will refer to both of these incomplete JVMs as “research can be applied to any commodity JVM, and functions with JVMs,” as they do not implement the complete Java spec- code written in any language targeting the JVM, such as Java ification, and are principally used within the research com- and Scala. munity (rather than in production environments). PHOSPHOR’s approach to tracking variable level taint We also note that while we focus on dynamic taint track- tags (without modifying the JVM) seems simple at first: ing, static taint analysis is also a topic of interest. However, we essentially need only instrument all code such that ev- while static taint analysis for Java [2, 34, 38] can determine ery variable maps to a “shadow” variable, which stores the a priori where data might leak from a system, it may report taint tag for that variable. However, such changes are actu- false positives from code which can not execute in practice, ally quite invasive, and become complicated as our modified and as with all static analysis tools for Java, it must model Java code begins to interact with (non-modified) native li- reflective calls, possibly further increasing the likelihood of braries. In fact, we are unaware of any previous work that false positives. makes such invasive changes to the bytecode executed by There is a need for a general purpose taint tracking system the JVM: most previous taint tracking systems for the JVM that is sufficiently decoupled from specific data types to use slower mechanisms to maintain this shadow data [40]. support a wide range of precise and sound analyses (i.e. with We present a detailed description of the challenges that we no false positives or false negatives) for applications running faced implementing our instrumentation, and describe how on any production JVM. We briefly describe work in three our general approach could be used for other sorts of dy- broad areas that could benefit from PHOSPHOR. namic data and control flow analyses in Java. We evaluated PHOSPHOR on a variety of macro and mi- 2.1 Detecting injection attacks cro benchmarks on several widely-used JVMs from Oracle and the OpenJDK project, finding its overhead to be impres- Taint tracking has been widely studied as a mechanism for sively low: as low as 3.32%, on average 53.31% (and up improving application security. Taint tracking can be used to 220%) in macro benchmarks. We also compared PHOS- to ensure that untrusted inputs from external sources (such PHOR to the popular, state of the art Android-only taint track- as an end-user) are not used as inputs to critical functions ing system, TaintDroid [16], finding that our approach is far [22, 33, 35]. For instance, consider an application that takes more portable, is more precise, and is comparable in perfor- an input string from the user, and then reads a file based mance. on that input, returning the file to the user. An attacker The contributions of this paper are: could perhaps craft an input to coerce the application to read and return an arbitrary file, including sensitive files such • A general purpose approach to efficiently storing meta- as /etc/passwd.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    19 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us