Ptrsplit: Supporting General Pointers in Automatic Program Partitioning

PtrSplit: Supporting General Pointers in Automatic Program Partitioning Shen Liu Gang Tan Trent Jaeger The Pennsylvania State University The Pennsylvania State University The Pennsylvania State University University Park, PA University Park, PA University Park, PA [email protected] [email protected] [email protected] ABSTRACT these programs are prone to attacks (e.g., attacks enabled by mem- Partitioning a security-sensitive application into least-privileged ory vulnerabilities). For instance, OpenSSH was refactored by Provos components and putting each into a separate protection domain et al. to have unprivileged monitor processes for handling user con- have long been a goal of security practitioners and researchers. nections and one privileged server process [28]. Another example However, a stumbling block to automatically partitioning C/C++ is the microkernel operating-system design, in which a minimum applications is the presence of pointers in these applications. Point- amount of code is kept in the kernel and most OS functionalities ers make calculating data dependence, a key step in program par- are pushed outside. Yet another example is Google’s Chromium titioning, difficult and hard to scale; furthermore, C/C++ pointers browser, which isolates each tab into a sandboxed process [1, 29]. do not carry bounds information, making it impossible to automat- These manual restructuring efforts have significantly improved ically marshall and unmarshall pointer data when they are sent the security of the relevant software; however, they are labor inten- across the boundary of partitions. In this paper, we propose a set of sive and sometimes error-prone. techniques for supporting general pointers in automatic program Several systems [3, 5, 21, 32] have been proposed to apply pro- partitioning. Our system, called PtrSplit, constructs a Program De- gram analysis to separate C/C++ applications automatically into pendence Graph (PDG) for tracking data and control dependencies partitions, from a small number of user annotations about sensitive in the input program and employs a parameter-tree approach for data. These systems demonstrate automatic program partitioning representing data of pointer types; this approach is modular and can be practical. However, one major limitation of these systems is avoids global pointer analysis. Furthermore, it performs selective that they lack good support for pointer data, which are prevalent pointer bounds tracking to enable automatic marshalling/unmar- in C/C++ applications. In particular, shalling of pointer data, even when there is circularity and arbitrary • C-style pointers do not carry bounds information; when aliasing. As a result, PtrSplit can automatically generate executable a pointer needs to be sent across the partition boundary partitions for C applications that contain arbitrary pointers. in an RPC call, marshalling does not know the size of the KEYWORDS underlying buffer and consequently cannot marshall the buffer automatically. Some systems adopt heuristics when Automatic program partitioning; bounds tracking; data marshalling marshalling pointer data (e.g., a “char *” pointer is assumed to point to a null-terminated string); however, programmers 1 INTRODUCTION are often required to write marshalling and unmarshalling code manually for pointer data, especially for pointers that Following the principle of least privilege, privilege separation in point to dynamically allocated buffers. Some systems avoid software refers to separating a software application into multiple the problem by restricting the partitioning algorithm to not partitions, each with its own set of privileges. Partitions are isolated create partitions that require pointer passing; this design, so that the compromise of one partition does not directly lead to the however, limits the flexibility of where partitions can be compromise of other partitions. Function calls between partitions created. are realized by Remote-Procedure Calls (RPCs); data for an RPC • A program-partitioning algorithm needs to reason about are marshalled and sent to the callee, which unmarshalls the data, dependence in a program to decide where to split. When the performs its computation, and sends the result back to the caller. program has pointers, a global pointer analysis is typically Privilege separating programs in low-level, type-unsafe lan- required to understand aliasing and how data flow in mem- guages such as C/C++ is especially beneficial to security because ory. However, global pointer analysis is often complex and does not scale to large programs. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation In this paper, we propose a series of techniques that enable the on the first page. Copyrights for components of this work owned by others than the support of pointers in automatic program partitioning. These tech- author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or niques are implemented and evaluated in a system called PtrSplit. republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. Its major techniques and contributions are as follows: CCS ’17, October 30-November 3, 2017, Dallas, TX, USA © 2017 Copyright held by the owner/author(s). Publication rights licensed to Associa- • Taking source code as input, PtrSplit constructs a static tion for Computing Machinery. Program-Dependence Graph (PDG) for the program. A fea- ACM ISBN 978-1-4503-4946-8/17/10...$15.00 https://doi.org/10.1145/3133956.3134066 ture in PtrSplit’s PDG that distinguishes it from previous PDGs for imperative programs is a technique called parame- with sensitive information and an unprivileged slave process. Pro- ter trees. It provides a modular way of constructing the PDG gramCutter [32] collects a dynamic dependence graph via profiling for a program with pointers; as a result, only an intraproce- and performs graph partitioning to produce partitions that bal- dural pointer analysis is needed, instead of a global pointer ance performance and security using a multi-terminal minimal cut analysis. Our tree representation generalizes the object-tree algorithm. SeCage [21] employs hybrid static/dynamic analysis approach in prior work [19], which discussed a tree repre- to compute a set of functions that can access secrets and isolates sentation for objects in object-oriented languages and did the sensitive partition via hardware virtualization support. Jif/s- not cover pointers at the language level; our system uses the plit [33, 35] automatically partitions a Java source program based tree representation for representing pointers in imperative on security-label annotations and a description of trust relation- languages and deals with circular data structures resulting ships between protection domains. Swift [7] generalizes Jif/split from pointers. for the development of web applications by exploring general al- Based on the PDG, PtrSplit performs a standard reachability- gorithms for improving both security and performance. With the based program-partitioning algorithm that separates the emergence of Trusted Execution Environments (TEEs), there have program into a partition that accesses sensitive data and also been program-partitioning frameworks that target Intel’s SGX a partition for the rest of the code. or ARM’s TrustZone. For instance, Rubinov et al. [30] proposed a • To marshall pointers, PtrSplit instruments the program so static-analysis framework that partitions an Android application that pointers carry bounds information. However, prior work into one component that runs in TrustZone’s secure world and one shows that full pointer bounds tracking incurs significant that runs in TrustZone’s normal world. A similar system called performance overhead. PtrSplit makes the critical observa- Glamdring [20] targets Intel’s SGX. tion that program partitioning does not need full pointer All the aforementioned automatic program partitioning frame- tracking—it is sufficient to track the bounds of pointers that works, either partition programs in languages that do not have cross the partitioning boundary. Therefore, given an arbi- explicit pointers (e.g., Java) or require programmers to manually trary partitioning of the program, PtrSplit computes a set of write marshalling and unmarshalling code for pointer data [5, 32]; pointers that require bounds information and instruments furthermore, data dependence computed by these frameworks that the program to track the bounds of only those pointers. We partition C/C++ application are incomplete in the presence of point- call this selective pointer bounds tracking. ers and can lead to incorrect partitioning results. In contrast, Ptr- • PtrSplit generates code that performs marshalling and un- Split uses a PDG representation that soundly represents pointer marshalling for data sent over an RPC call. This is auto- data as parameter trees, and tracks pointer bounds for automatic matic even for pointer data because all pointers that cross marshalling and unmarshalling without user involvement. the partition boundary carry bounds information. We de- PtrSplit partitions programs so that sensitive data within the scribe a type-based algorithm for performing deep copies of sensitive partition

Load more