ConfLLVM: A Compiler for Enforcing Data Confidentiality in Low-Level Code

Ajay Brahmakshatriya∗ Piyus Kedia∗ Derrick P. McKee∗ MIT, USA IIIT Delhi, India Purdue University, USA [email protected] [email protected] [email protected] Deepak Garg Akash Lal Aseem Rastogi MPI-SWS, Germany Microsoft Research, India Microsoft Research, India [email protected] [email protected] [email protected] Hamed Nemati Anmol Panda Pratik Bhatu∗ CISPA, Saarland University, Germany Microsoft Research, India AMD, India [email protected] [email protected] [email protected]

Abstract models are trained on private inputs. Bugs or exploited vul- We present a compiler-based scheme to protect the confiden- nerabilities in these programs may leak the private data to tiality of sensitive data in low-level applications (e.g. those public channels that are visible to unintended recipients. For written in C) in the presence of an active adversary. In our example, the OpenSSL buffer-overflow vulnerability Heart- scheme, the programmer marks sensitive data by lightweight bleed [6] can be exploited to exfiltrate a ’s private annotations on the top-level definitions in the source code. keys to the public network in cleartext. Generally speaking, The compiler then uses a combination of static dataflow anal- the problem here is one of information flow control [28]: We ysis, runtime instrumentation, and a novel taint-aware form would like to enforce that private data, as well as data de- of control-flow integrity to prevent data leaks even in the rived from it, is never sent out on public channels unless it presence of low-level attacks. To reduce runtime overheads, has been intentionally declassified by the program. the compiler uses a novel memory layout. The standard solution to this problem is to use static We implement our scheme within the LLVM framework dataflow analysis or runtime taints to track how private data and evaluate it on the standard SPEC-CPU benchmarks, and flows through the program. While these methods work well on larger, real-world applications, including the web- in high-level, memory-safe languages such as Java [37, 40] server and the OpenLDAP directory server. We find that the and ML [46], the problem is very challenging and remains performance overheads introduced by our instrumentation broadly open for low-level compiled languages like C that are moderate (average 12% on SPEC), and the programmer are not memory-safe. First, the cost of tracking taint at run- effort to port the applications is minimal. time is prohibitively high for these languages (see Section 9). Second, static dataflow analysis cannot guarantee data confi- 1 Introduction dentiality because the lack of memory safety allows for buffer overflow and control-flow hijack attacks [1, 2, 4, 11, 49,54], Many programs compute on private data: Web servers use both of which may induce data flows that cannot be antici- private keys and serve private files, medical pro- pated during a static analysis. cesses private medical records, and many machine learning One possible approach is to start from a safe dialect of arXiv:1711.11396v3 [cs.PL] 13 Mar 2019 ∗ C (e.g. CCured [42], Deputy [24], or SoftBound [41]) and Work done while the author was at Microsoft Research, India. leverage existing approaches such as information-flow type systems for type-safe languages [32, 50]. The use of safe Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies dialects, however, (a) requires additional annotations and are not made or distributed for profit or commercial advantage and that program restructuring that cause significant programming copies bear this notice and the full citation on the first page. Copyrights overhead [38, 42], (b) are not always backwards compatible for components of this work owned by others than the author(s) must with legacy code, and (c) have prohibitive runtime overhead be honored. Abstracting with credit is permitted. 
To copy otherwise, or making them a non-starter in practical applications (see republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. further discussion in Section 9). EuroSys ’19, March 25–28, 2019, Dresden, Germany In this paper, we present the first end-to-end, practical © 2019 Copyright held by the owner/author(s). Publication rights licensed compiler-based scheme to enforce data confidentiality in C to ACM. programs even in the presence of active, low-level attacks. ACM ISBN 978-1-4503-6281-8/19/03...$15.00 Our scheme is based on the insight that complete memory https://doi.org/10.1145/3302424.3303952 1 safety and perfect control-flow integrity (CFI) are neither vanilla compiler. It can access all of U’s memory, specifically, sufficient nor necessary for preventing data leaks. Wede- it can perform declassification by copying data from U’s pri- sign a compiler that is able to guarantee data confidentiality vate region to U’s public region (e.g. after encrypting the without requiring these properties. data). T has its own separate stack and heap, and we re-use We accompany our scheme with formal security proofs the range checks on memory accesses in U to prevent them and a thorough empirical evaluation to show that it can from reading or writing to T ’s memory. We give an example be used for real applications. Below, we highlight the main and general guidelines for refactoring U and T in Section 2. components of our scheme. Formal guarantees. We have formalized our scheme using Annotating private data. We require the programmer to a core language of memory instructions (load and store) and add private type annotations only to top-level function sig- control flow instructions (goto, conditional, and direct and natures and globals’ definitions to mark private data. The indirect function call). We prove a termination-insensitive programmer is free to use all of the C language, including non-interference theorem for U, assuming that the T func- pointer casts, aliasing, indirect calls, varargs, variable length tions it calls are non-interfering. In other words, we prove arrays, etc. that if two public-equivalent configurations of U take a Flow analysis and runtime instrumentation. Our com- step each, then the resulting configurations are also public- piler performs standard dataflow analysis, statically propa- equivalent. Our formal model shows the impossibility of gating taint from global variables and function arguments sensitive data leaks even in the presence of features like that have been marked private by the programmer, to detect aliasing and casting, and low-level vulnerabilities such as any data leaks. This analysis assumes the correctness of de- buffer overflows and ROP attacks (Appendix A). clared taints of each pointer. These assumptions are then protected by inserting runtime checks that assert correctness Implementation and Evalution We have implemented of the assumed taint at runtime. These checks are necessary our compiler, ConfLLVM, as well as the complementary low- to guard against incorrect casts, memory errors and low- level verifier, ConfVerify, within the LLVM framework [36]. level attacks. Our design does not require any static alias We evaluate our implementation on the standard SPEC- analysis, which is often hard to get right with acceptable CPU benchmarks and three large applications—NGINX web precision. 
server [10], OpenLDAP directory server [45], and a neural- Novel memory layout. To reduce the overhead associated network-based image classifier built on Torch13 [ , 14]. All with runtime checks, our compiler partitions the program’s three applications have private data—private user files in virtual address space into a contiguous public region and a the case of NGINX, stored user passwords in OpenLDAP, disjoint, contiguous private region, each with its own stack and a model trained on private inputs in the classifier. In all and heap. Checking the taint of a pointer then simply reduces cases, we are able to enforce confidentiality for the private to a range check on its value. We describe two partitioning data with a moderate overhead on performance and a small schemes, one based on the Intel MPX ISA extension [7], amount of programmer effort for adding annotations and and the other based on segment registers (Section 3). Our declassification code. runtime checks do not enforce full memory safety: a private (public) pointer may point outside its expected object but 2 Overview our checks ensure that it always dereferences somewhere in Threat model. We consider C applications that work with the private (public) region. These lightweight checks suffice both private and public data. Applications interact with the for protecting data confidentiality. external world using the network, disk, and other channels. Information flow-aware CFI. Similar to memory safety, They communicate public data in clear, but want to protect we observe that perfect CFI is neither necessary nor sufficient the confidentiality of the private data by, for example, en- for information flow. We introduce an efficient, taint-aware crypting it before sending it out. However, the application CFI scheme. Our compiler instruments the targets of indirect could have logical or memory errors, or exploitable vulnera- calls with magic sequences that summarize the output of bilities that may cause private data to be leaked out in clear. the static dataflow analysis at those points. At the source of The attacker interacts with the application and may send the indirect calls, a runtime check ensures that the magic carefully-crafted inputs that trigger bugs in the application. sequence at the target is taint-consistent with the output of The attacker can also observe all the external communica- the static dataflow analysis at the source (Section 4). tion of the application. Our goal is to prevent the private Trusted components. For selective declassification, as re- data of the application from leaking out in clear. Specifi- quired for most practical applications, we allow the pro- cally, we address explicit information flow: any data directly grammer to re-factor trusted declassification functions into derived from private data is also treated as private. While a separate component, which we call T . The remaining un- this addresses most commonly occurring exploits [44], op- trusted application, in contrast, is called U. Code in T is tionally, our scheme can be used in a stricter mode where not subject to any taint checks and can be compiled using a it disallows branching on private data, thereby preventing 2 implicit leaks too. We ran all our experiments (Section 7) in Partitioning U’s memory. To enforce confidentiality in this stricter mode. Side channels (such as execution time and U, we minimally require the programmer to tell ConfLLVM memory-access patterns) are outside the scope of this work. 
where private data enters and exits U. Since U relies on Our scheme can also be used for integrity protection in T for I/O and communication, the programmer does so by a setting where an application computes over trusted and marking private data in the signatures of all functions ex- untrusted data [19]. Any data (explicitly) derived from un- ported from T to U with a new type qualifier private [29]. trusted inputs cannot be supplied to a sink that expects Additionally, to help ConfLLVM’s analysis, the program- trusted data (Section 7.5 shows an example). mer must annotate private data in U’s top-level definitions, Example application. Consider the code for a web server i.e., globals, function signatures, and in struct definitions. in Figure 1. The server receives requests from the user These latter annotations within U are not trusted. Getting (main:7), where the request contains the username and a file them wrong may cause a static error or runtime failure, but name (both in clear text), and the encrypted user password. cannot leak private data. ConfLLVM needs no other input The server decrypts the password and calls the handleReq from the programmer. Using a dataflow analysis (Section 5), helper routine that copies the (public) file contents into the it automatically infers which local variables carry private out buffer. The server finally prepares the formatted response data. Based on this information, ConfLLVM partitions U’s (format), and sends the response (buf), in clear, to the user. memory into two regions, one for public and one for private The handleReq function allocates two local buffers, passwd data, with each region having its own stack and heap. A third and fcontents (handleReq:4). It reads the actual user password region of memory, with its own heap and stack, is reserved (e.g., from a database) into passwd, and authenticates the user. for T ’s use. On successful authentication, it reads the file contents into In our example, the trusted annotated signatures of T fcontents, copies them to the out buffer, and appends a mes- against which ConfLLVM compiles U are: sage to it signalling the completion of the request. int recv (int fd , char *buf , int buf_size ); The code has several bugs that can cause it to leak the int send (int fd , char *buf , int buf_size ); user password. First, at line 10, the code leaks the clear-text void decrypt (char * ciphertxt , private char *data ); password to a log file by mistake. Note that memory-safety void read_passwd( char *uname , private char *pass , alone would not prevent this kind of bugs. Second, at line 14, int size ); memcpy reads out_size bytes from fcontents and copies them to out. If out_size is greater than SIZE, this can cause passwd while the untrusted annotations for U are: out fcontents to be copied to because an overflow past would void handleReq ( char *uname , private char *upasswd , go into the passwd buffer. Third, if the format string fmt in char *fname , char *out , int out_sz ); the sprintf call (line 16) contains extra formatting directives, int authenticate(char *uname , private char *upass , it can print stack contents into out ([56]). The situation is private char *pass ); worse if out_size or fmt can be influenced by the attacker. Our goal is to prevent such vulnerabilities from leaking ConfLLVM automatically infers that, for example, passwd out sensitive application data. Below we discuss the three (line 4) is a private buffer. Based on this and send’s prototype, main components of our approach. 
ConfLLVM raises a compile-time error flagging the bug Identifying trusted code. Figure 2 shows the workflow of at line 10. Once the bug is fixed by the programmer (e.g. our toolchain. The programmer starts by identifying code by removing the line), ConfLLVM compiles the program that must be trusted. This code, called T (for trusted), con- and lays out the stack and heap data in their corresponding sists of functions that legitimately or intentionally declassify regions (Section 3). The remaining two bugs are prevented private data, or provide I/O. The remaining bulk of code, by runtime checks that we briefly describe next. denoted U, is untrusted and subjected to our compilation Runtime checks. ConfLLVM inserts runtime checks to scheme. A good practice is to contain most of the applica- ensure that, (a) at runtime, the pointers belong to their anno- tion logic in U and limit T to a library of generic routines tated or inferred regions (e.g. a private char * actually points that can be hardened over time, possibly even formally ver- to the private region in U), (b) U does not read or write ified [57]. For example, in the web server code from Fig- beyond its own memory (i.e. it does not read or write to T ’s ure 1, T would consist of: recv, send, read_file (network, I/O), memory), and (c) U follows a taint-aware form of CFI that decrypt (cryptographic primitive), and read_passwd (source of prevents circumvention of these checks and prevents data sensitive data). The remaining web server code (parse, format, leaks due to control-flow attacks. In particular, the bugs on and even sprintf and memcpy) would be in U. lines 14 and 16 in our example cannot be exploited due to The programmer compiles T with any compiler (or even check (a). We describe the details of these checks in Sections 3 uses pre-compiled binaries), but U is compiled with our and 4. compiler ConfLLVM. T code is allowed to access all memory. However, T func- tions must check their arguments to ensure that the data 3 void handleReq ( char *uname , char *upasswd , char *fname , 2 char *out , int out_size ) 1 # define SIZE 512 { 4 char passwd[SIZE], fcontents[SIZE]; 3 int main (int argc , char ** argv ) read_password (uname, passwd, SIZE); { 6 if (!(authenticate (uname, upasswd, passwd))) { 5 ... //variable declarations return ; while (1) { 8 } 7 n = recv(fd, buf, buf_size); //inadvertently copying the password to the log file parse(buf, uname, upasswd_enc, fname); 10 send(log_file, passwd, SIZE); 9 decrypt(upasswd_enc, upasswd); handleReq(uname, upasswd, fname, out, 12 read_file(fname, fcontents, SIZE); 11 size ); //(out_size>SIZE) can leak passwd to out format(out, size, buf, buf_Size); 14 memcpy(out, fcontents, out_size); 13 send(fd, buf, buf_size); //a bug in the fmt string can print stack contents } 16 sprintf(out + SIZE, fmt,"Request complete"); 15 } }

Figure 1. Request handling code for a web server

ConfVerify ConfLLVM Original Separate U Annotated User Verfied Memory code U DLL source code U DLL Execute code Separate T code Vanilla 34 GB Guard area LLVM / Trusted code T T GCC Executable Executable Private Stack Private Globals Memory 4 GB Figure 2. Workflow of our scheme and toolchain bnd0.upper Private Heap Public Globals passed by U has the correct sensitivity label. For example, gs.base Public Heap the read_passwd function would check that the range [pass, 36 GB Guard area Partition size pass+size-1] falls inside the private memory segment of U. Public Stack bnd1.upper/ Public Stack OFFSET (Note that this is different from complete memory safety; T Public Globals bnd0.lower 4 GB Private Stack need not know the size of the passwd buffer.) Public Heap Private Globals Trusted Computing Base (TCB). We have also designed fs.base and implemented a static verifier, ConfVerify, to confirm 2 GB Guard area Private Heap that a binary output by ConfLLVM has enough checks in bnd1.lower place to guarantee confidentiality (Section 5). ConfVerify Memory Memory guards against bugs in the compiler. It allows us to extend (a) Segment scheme (b) MPX scheme the threat model to one where the adversary has full con- trol of the compiler that was used to generate the binary Figure 3. Memory layout of U of the untrusted code. To summarize, our TCB, and thus the security of our scheme, does not depend on the (large) with a custom memory layout and runtime instrumentation. untrusted application code U or the compiler. We only trust We have developed two different memory layouts as well as the (small) library code T and the static verifier. We discuss runtime instrumentation schemes. The schemes have differ- more design considerations for T in Section 8. ent trade-offs, but they share the common idea – all private and all public data are stored in their own respective con- 3 Memory Partitioning Schemes tiguous regions of memory and the instrumentation ensures ConfLLVM uses the programmer-supplied annotations, and that at runtime each pointer respects its statically inferred with the help of type inference, statically determines the taint taint. We describe these schemes next. of each memory access (Section 5), i.e., for every memory MPX scheme. This scheme relies on the Intel MPX ISA ex- load and store in U, it infers if the address contains private tension [7] and uses the memory layout shown in Figure 3b. or public data. It is possible for the type-inference to detect The memory is partitioned into a public region and a private a problem (for instance, when a variable holding private region, each with its own heap, stack and global segments. data is passed to a method expecting a public argument), in The two regions are laid out contiguously in memory. Their which case, a type error is reported back to the programmer. ranges are stored in MPX bounds registers (bnd0 and bnd1). On successful inference, ConfLLVM proceeds to compile U Each memory access is preceded with MPX instructions 4 (bndcu and bndcl) that check their first argument against the avoiding the need to change or recompile T . Generated code (upper and lower) bounds of their second argument. The for our example under this scheme is shown in Figure 4b. concrete values to be stored in the bounds registers are de- The figure uses the convention that ei (resp., esp) represents termined at load time (Section 6). the lower 32 bits of the register ri (resp., rsp). 
The public and The user selects the maximum stack size OFFSET at com- private stacks are still maintained in lock-step. Taking the pile time (at most 231 − 1). The scheme maintains the public address of a private stack variable requires extra support: and private stacks in lock-step: their respective top-of-stack the address of variable x in our example is rsp+4+size, where are always at offset OFFSET to each other. For each function size is the total segment size (40GB). call, the compiler arranges to generate a frame each on the The segmentation scheme has a lower runtime overhead public and the private stacks. Spilled private local variables than the MPX scheme as it avoids doing bound-checks (Sec- and private arguments are stored on the private stack; ev- tion 7.1). However, it restricts the segment size to 4GB. erything else is on the public stack. Consider the procedure Multi-threading support. Both our schemes support multi- in Figure 4a. The generated (unoptimized) assembly under threading. All inserted runtime checks (including those in the MPX scheme (using virtual registers for simplicity) is Section 4) are thread-safe because they check values of regis- shown in Figure 4c. ConfLLVM automatically infers that x ters. However, we do need additional support for thread-local is a private int and places it on the private stack, whereas storage (TLS). Typically, TLS is accessed via the segment reg- y is kept on the public stack. The stack pointer rsp points ister gs: the base of TLS is obtained at a constant offset from to the top of the public stack. Because the two stacks are gs. The takes care of setting gs on a per- kept at constant OFFSET to each other, x is accessed simply thread basis. However, U and T operate in different trust as rsp+4+OFFSET. The code also shows the instrumented MPX domains, thus they cannot share the same TLS buffer. bound check instructions. We let T continue to use gs for accessing its own TLS. Segmentation scheme. x64 memory operands are in the ConfLLVM changes the compilation of U to access TLS in form [base + index ∗ scale + displacement], where base and a different way. The multiple (per-thread) stacks in U are index are 64-bit unsigned registers, scale is a constant with all allocated inside the stack regions; the public and private maximum value of 8, and displacement is a 32-bit signed con- stacks for each thread are still at a constant offset to each stant. The architecture also provides two segment registers other. Each thread stack is, by default, of maximum size 1MB fs and gs for the base address computation; conceptually, and its start is aligned to a 1MB boundary (configurable at fs:base simply adds fs to base. compile time). We keep the per-thread TLS buffer at the We use these segment registers to store the lower bounds beginning of the stack. U simply masks the lower 20-bits of of the public and private memory regions, respectively, and rsp to zeros to obtain the base of the stack and access TLS. prefix the base of memory operands with these registers. The The segment-register scheme further requires switching public and private regions are separated by (at least) 36GB of the gs register as control transfers between U and T . We of guard space (unmapped pages that cause a fault when use appropriate wrappers to achieve this switching, how- accessed). 
The guard sizes are chosen so that any memory ever T needs to reliably identify the current thread-id when operand whose base is prefixed with fs cannot escape the called from U (so that U cannot force two different threads public segment, and any memory operand prefixed with gs to use the same stack in T ). ConfLLVM achieves this by cannot escape the private segment (Figure 3a). instrumenting an inlined-version of the _chkstk routine1 to The segments are each aligned to a 4GB boundary. The make sure that rsp does not escape its stack boundaries. usable space within each segment is also 4GB. We access the base address stored in a 64-bit register, say a private 4 Taint-aware CFI value stored in rax, as fs+eax, where eax is the lower 32 bits We design a custom, taint-aware CFI scheme to ensure that of rax. Thus, in fs+eax, the lower 32 bits come from eax and an attacker cannot alter the control flow of U to circumvent the upper 32 bits come from fs (because fs is 4GB aligned). the instrumented checks and leak sensitive data. Further, the index register is also constrained to use lower Typical low-level attacks that can hijack the control flow 32 bits only. This implies that the maximum offset within of a program include overwriting the return address, or the U ∗ a segment that can access is 38GB (4 + 4 8 + 2). This is targets of function pointers and indirect jumps. Existing rounded up to 40GB for 4GB alignment, with 4GB of usable approaches use a combination of shadow stacks or stack ca- space and 36GB of guard space. Since the displacement value naries to prevent overwriting the return address, or use fine- can be negative, the maximum negative offset is 2GB, for grained taint tracking to ensure that the value of a function which we have the guard space below the public segment. pointer is not derived from user (i.e. attacker-controlled) in- The usable parts of the segments are restricted to 4GB puts [26, 35, 57]. While these techniques may prevent certain because it is the maximum addressable size using a single 32 bit register. This restriction also ensures that we don’t have to translate U pointers when the control is passed to T , thus 1https://msdn.microsoft.com/en-us/library/ms648426.aspx 5 ;argument registersp= r1,q= r2 private int bar ( private int *p, int *q) ;stack offsets from rsp:x: 4,y:8 { sub rsp , 16;rsp= rsp- 16 int x = *p; bndcu [r1], bnd1;MPX instructions to check that- int y = *q; bndcl [r1], bnd1;-r1 points to private region return x + y; r3 = load [r1];r3= *p } bndcu [rsp+4+ OFFSET ], bnd1;check that rsp+4+OFFSET- (a) A sample U procedure bndcl [rsp+4+ OFFSET ], bnd1;-points to private region store [rsp+4+ OFFSET ], r3;x= r3 bndcu [r2], bnd0;check that r2 points to- ;argument registersp= r1,p= r2 bndcl [r2], bnd0;-the public region ; private memory operands are gs prefixed r4 = load [r2];r4= *q ; public memory operands are fs prefixed bndcu [rsp+8], bnd0;check that rsp+8 points to- bndcl [rsp+8], bnd0;-the public region sub rsp , 16;rsp= rsp- 16 store [rsp+8], r4;y= r4 r3 = load gs :[ e1];r3= *p bndcu [rsp+4+ OFFSET ], bnd1 store gs :[ esp +4] , r3;x= r3 bndcl [rsp+4+ OFFSET ], bnd1 r4 = load fs :[ e2];r4= *q r5 = load [rsp+4+ OFFSET ];r5=x store fs :[ esp +8] , r4;y= r4 bndcu [rsp+8], bnd0 r5 = load gs :[ esp +4];r5=x bndcl [rsp+8], bnd0 r6 = load fs :[ esp +8];r6=y r6 = load [rsp+8];r6=y r7 = r5 + r6 r7 = r5 + r6 add rsp , 16;rsp= rsp+ 16 add rsp , 16;rsp= rsp+ 16 ret r7 ret r7 (b) Assembly code under segment scheme (c) Assembly code under MPX scheme

Figure 4. The (unoptimized) assembly generated by ConfLLVM for an example procedure. attacks, our only goal is ensuring confidentiality. Thus, we Callee-save registers are also live at function entry and exit designed a custom taint-aware CFI scheme. and their taints cannot be determined statically by the com- Our CFI scheme ensures that for each indirect transfer of piler. ConfLLVM forces their taint to be public by making control: (a) the target address is some valid jump location, i.e., the caller save and clear all the private-tainted callee-saved the target of an indirect call is some valid procedure entry, registers before making a call. All dead registers (e.g. un- and the target of a return is some valid return site, and (b) used argument registers and caller-saved registers at the the register taints expected at the target address match the beginning of a function) are conservatively marked private current register taints (e.g., when the rax register holds a to avoid accidental leaks. We note that our scheme can be private value then a ret can only go to a site that expects extended easily to support other calling conventions. a private return value). These suffice for our goal of data Consider the following U: confidentiality. Our scheme does not need to ensure, for private int add ( private int x) { return x + 1; } instance, that a return matches the previous call. private int incr ( private int *p, private int x) { CFI for function calls and returns. We use a magic- int y = add (x); *p = y; return *p; } sequence based scheme to achieve this CFI. We follow the x64 calling convention for Windows that has 4 argument The compiled code for these functions is instrumented with registers and one return register. Our scheme picks two bit magic sequences as follows. The 5 taint bits for the add proce- sequences MCall and MRet of length 59 bits each that appear dure are 11111 as its argument x is private, unused argument nowhere else in U’s binary. Each procedure in the binary is registers are conservatively treated as private, and its return preceded with a string that consists of MCall followed by a type is also private. On the other hand, the taint bits for incr 5-bit sequence encoding the expected taints of the 4 argu- are 01111 because its first argument is a public pointer (note ment registers and the return register, as per the function that the private annotation on the argument is on the int, not signature. Similarly, each valid return site in the binary is the pointer) , second argument is private, unused argument preceded by MRet followed by a 1-bit encoding of the taint registers are private, and the return value is also private. For of the return value register, again according to the callee’s the return site in incr after the call to add, the taint bits are signature. To keep the length of the sequences uniform at 64 00001 to indicate the private return value register (with 4 bits bits, the return site taint is padded with four zero bits. The of padding). The sample instrumentation is as shown below: 64-bit instrumented sequences are collectively referred to as #M_call#11111# magic sequences. add: 6 ... ;assembly code for add 5.1 ConfLLVM #M_call#01111# Compiler front-end. We introduce a new type qualifier, incr : private, in the language that the programmers can use to ... ;assembly code of incr annotate sensitive data. 
For example, a private integer-typed call add x #M_ret#00001#; private -tainted ret with padded0s variable can be declared as private int , and a (public) ... ;assembly code for rest of incr pointer pointing to a private integer as private int *p. The struct fields inherit their outermost annotation from the cor- Our CFI scheme adds runtime checks using these sequences responding struct-typed variable. For example, consider a struct st { private int *p; } x as follows. Each ret is replaced with instructions to: (a) fetch declaration , and a variable of the return address, (b) confirm that the target location has type struct st. Then x.p inherits its qualifier from x: if x is de- clared as private st x;, then x.p is a private pointer pointing MRet followed by the taint-bit of the return register, and if so, (c) jump to the target location. For our example, the ret to a private integer. We follow the same convention with union inside add is replaced as follows: s as well: all fields of a union inherit their outermost annotation from the union-typed variable. #M_call#11111# This convention ensures that despite the memory parti- add: tioning into public and private regions, each object is laid out ... contiguously in memory in only one of the regions. It does, r1 = pop ;fetch return address however, carry the limitation that one cannot have structures r2 = #M_ret_inverted#11110#;bitwise negation or unions whose fields have mixed outermost annotations, r2 = not r2;of M_ret e.g., a struct with a public int field as well asa private int cmp [r1], r2 field, or a union over two structures, one with a public int jne fail private int r1 = add r1 , 8;skip magic sequence field and another with a field. In all such cases, jmp r1;return the programmer must restructure their code, often by simply fail : call __debugbreak introducing one level of indirection in the field (because the type constraints only apply to the outermost level). private We use the bitwise negation of MRet in the code to maintain We modified the Clang3 [ ] frontend to parse the the invariant that the magic sequence does not appear in the type qualifier and generate the LLVM Intermediate Represen- binary at any place other than valid return sites. There is no tation (IR) instrumented with this additional metadata. Once requirement that the negated sequence not appear elsewhere the IR is generated, ConfLLVM runs standard LLVM IR op- in the binary. timizations that are part of the LLVM toolchain. Most of the For direct calls, ConfLLVM statically verifies that the optimizations work as-is and don’t require any change. Op- register taints match between the call site and the call target. timizations that change the metadata (e.g. remove-dead-args At indirect calls, the instrumentation is similar to that of a changes the function signatures), need to be modified. While ret: check that the target location contains MCall followed by we found that it is not much effort to modify an optimization, taint bits that match the register taints at the call site. we chose to modify only the most important ones in order Indirect jumps. ConfLLVM does not generate indirect to bound our effort. We disable the remaining optimizations (non-call) jumps in U. Indirect jumps are mostly required in our prototype. for jump-table optimizations, which we currently disable. LLVM IR and type inference. 
After all the optimizations We can conceptually support them as long as the jump tables are run, our compiler runs a type qualifier inference [29] pass are statically known and placed in read-only memory. over the IR. This inference pass propagates the type quali- The insertion of magic sequences increases code size but it fier annotations to local variables, and outputs an IR where makes the CFI-checking more lightweight than the shadow all the intermediate variables are optionally annotated with private stack schemes. The unique sequences MCall and MRet are cre- qualifiers. The inference is implemented using a stan- ated post linking when the binaries are available (Section 6). dard algorithm based on generating subtyping constraints on dataflows, which are then solved using an SMT solver [27]. 5 Implementation If the constraints are unsatisfiable, an error is reported to the user. We refer the reader to [29] for details of the algorithm. We implemented ConfLLVM as part of the LLVM frame- After type qualifier inference, ConfLLVM knows the taint work [36], targeting Windows and Linux x64 platforms. It is of each memory operand for load and store instructions. With possible to implement all of ConfLLVM’s instrumentation a simple dataflow analysis17 [ ], the compiler statically deter- using simple branching instructions available on all plat- mines the taint of each register at each instruction. forms, but we rely on x64’s MPX support or x64’s segment Register spilling and code generation. We made the reg- registers to optimize runtime performance. We leave the ister allocator taint-aware: when a register is to be spilled on optimizations on other architectures as future work. 7 the stack, the compiler appropriately chooses the private or current implementation uses the LLVM disassembler.) This the public stack depending on the taint of the register. Once is three orders of magnitude smaller than ConfLLVM’s the LLVM IR is lowered to machine IR, ConfLLVM emits the 5MLOC. Moreover, ConfVerify is much simpler than Con- assembly code inserting all the checks for memory bounds fLLVM; ConfVerify does not include register allocation, and taint-aware CFI. optimizations, etc. and uses a simple dataflow analysis to MPX Optimizations. ConfLLVM optimizes the bounds- check all the flows. Consequently, it provides a higher degree checking in the MPX scheme. MPX instruction operands of assurance for the security of our scheme. are identical to x64 memory operands, therefore one can Disassembly. ConfVerify requires the unique prefixes of check bounds of a complex operand using a single instruc- magic sequences (Section 4) as input and uses them to iden- tion. However, we found that bounds-checking a register is tify procedure entries in the binary. It starts disassembling faster than bounds-checking a memory operand (perhaps the procedures and constructs their control-flow graphs because using a memory operand requires an implicit lea). (CFG). ConfVerify assumes that the binary satisfies CFI, Consequently, ConfLLVM uses instructions on a register (as which makes it possible to reliably identify all instructions opposed to a complex memory operand) as much as possible in a procedure. If the disassembly fails, the binary is rejected. for bounds checking. 
It reserves 1MB of space around the Otherwise, ConfVerify checks its assumptions: that the public and private regions as guard regions and eliminates magic sequences were indeed unique in the procedures iden- the displacement from the memory operand of a check if its tified and that they have enough CFI checks. absolute value is smaller than 220. Usually the displacement Data flow analysis and checks. Next, ConfVerify per- value is a small constant (for accessing structure fields or forms a separate dataflow analysis on every procedure’s doing stack accesses) and this optimization applies to a large CFG to determine the taints of all the registers at each in- degree. Further, by enabling the _chkstk enforcement for the struction. It starts from the taint bits of the magic sequence MPX scheme also (Section 3), ConfLLVM eliminates checks preceding the procedure. It looks for MPX checks or the on stack accesses altogether because the rsp value is bound use of segment registers to identify the taints of memory to be within the public region (and rsp+OFFSET is bound to be operands; if it cannot find a check in the same basic block, within the private region). the verification fails. For each store instruction, it checks that ConfLLVM further coalesces MPX checks within a basic the taint of the destination operand matches the taint of the block. Before adding a check, it confirms if the same check source register. For direct calls, it checks that the expected was already added previously in the same block, and there are taints of the arguments, as encoded in the magic sequence no intervening call instructions or subsequent modifications at the callee, matches the taints of the argument registers at to the base or index registers. the callsite (this differs from ConfLLVM which uses func- Implicit flows. By default, ConfLLVM tracks explicit flows. tions signatures). For indirect control transfers (indirect calls It additionally produces a warning when the program and ret), ConfVerify confirms that there is a check for the branches on private data, indicating the presence of a pos- magic sequence at the target site and that its taint bits match sible implicit flow. Such a branch is easy to detect: itisa the inferred taints for registers. After a call instruction, Con- conditional jump on a private-tainted register. Thus, if no fVerify picks up taint of the return register from the magic such warning is produced, then the application indeed lacks sequence (there should be one), marks all caller-save regis- both explicit and implicit flows. Additionally, we also allow ters as private, and callee-save registers as public (following the compiler to be used in an all-private scenario where all ConfLLVM’s convention). data manipulated by U is tainted private. In such a case, Additional checks. ConfVerify additionally makes sure the job of the compiler is easy: it only needs to limit mem- that a direct or a conditional jump can only go a location ory accesses in U to its own region of memory. Implicit in the same procedure. ConfVerify rejects a binary that flows are not possible in this mode. None of our applications has an indirect jump, a system call, or if it modifies a seg- (Section 7) have implicit flows. ment register. ConfVerify also confirms correct usage of _chkstk to ensure that rsp is kept within stack bounds. 
For the 5.2 ConfVerify segment-register scheme, ConfVerify additionally checks ConfVerify checks that a binary produced by ConfLLVM that each memory operand uses only the lower 32-bits of has the required instrumentation in place to guarantee that registers. there are no (explicit) private data leaks. The design goal of Formal analysis. Although ConfVerify is fairly simple, ConfVerify is to guard against bugs in ConfLLVM; it is not to improve our confidence in its design, we built a formal meant for general-purpose verification of arbitrary binaries. model consisting of an abstract assembly language with es- ConfVerify actually helped us catch bugs in ConfLLVM sential features like indirect calls, returns, as well as Con- during its development. fVerify’s checks. We prove formally that any program that ConfVerify is only 1500 LOC in addition to an off-the- shelf disassembler that it uses for CFG construction. (Our 8 passes ConfVerify’s checks satisfies the standard informa- both for enforcing bounds and for enforcing CFI; (b) Demon- tion flow security property of termination-insensitive nonin- strate that ConfLLVM scales to large, existing applications terference [50]. In words, this property states that data leaks and quantify changes to existing applications to make them are impossible. We defer the details of the formalization to ConfLLVM-compatible; (c) Check that our scheme actually Appendix A. stops confidentiality exploits in applications.

6 Toolchain 7.1 CPU benchmarks This section describes the overall flow to launch an applica- We measured the overheads of ConfLLVM’s instrumenta- tion using our toolchain. tion on the standard SPEC CPU 2006 benchmarks [12]. We U Compiling U using ConfLLVM. The only external func- treat the code of the benchmarks as untrusted (in ) and tions for U’s code are those exported by T . U’s code is compile it with ConfLLVM. We use the system’s native libc, T compiled with an (auto-generated) stub file, that imple- which is treated as trusted (in ). These benchmarks use ments each of these T functions as an indirect jump from no private data, so we added no annotations to the bench- marks, which makes all data public by default. Nonetheless, a table externals, located at a constant position in U (e.g., the code emitted by ConfLLVM ensures that all memory jmp (externals + offset)i for the i-th function). The table accesses are actually in the public region, it enforces CFI, externals is initialized with zeroes at this point, and Con- T fLLVM links all the U files to produce a U dll. and switches stacks when calling functions, so this exper- The U dll is then post-processed to patch all the references iment accurately captures ConfLLVM’s overheads. We ran to globals, so that they correspond to the correct (private or the benchmarks in the following configurations. public) region. The globals themselves are relocated by the - Base: Benchmarks compiled with vanilla LLVM, with loader. The post-processing pass also sets the 59-bit prefix O2 optimizations. This is the baseline for evaluation.2 for the magic sequences (used for CFI, Section 4). We find - BaseOA: Benchmarks compiled with vanilla LLVM but these sequences by generating random bit sequences and running with our custom allocator. checking for uniqueness; usually a small number of iterations - OurBare: Compiled with ConfLLVM, but without any are sufficient. runtime instrumentation. However, all optimizations Wrappers for T functions. For each of the functions in unsupported by ConfLLVM are disabled, the memo- T ’s interface exported to U, we write a small wrapper that: ries of T and U are separated and, hence, stacks are (a) performs the necessary checks for the arguments (e.g. switched in calling T functions from U. the send wrapper would check that its argument buffer is - OurCFI: Like OurBare but additionally with CFI instru- contained in U’s public region), (b) copies arguments to T ’s mentation, but no memory bounds enforcement. stack, (c) switches gs, (d) switches rsp to T ’s stack, and (e) - OurMPX: Full ConfLLVM, memory-bounds checks use calls the underlying T function (e.g. send in libc). On return, MPX. it the wrapper switches gs and rsp back and jumps to U in - OurSeg: Full ConfLLVM, memory-bounds checks use a similar manner to our CFI return instrumentation. Addi- segmentation. tionally, the wrappers include the magic sequences similar Briefly, the difference between OurCFI and OurBare is the to those in U so that the CFI checks in U do not fail when cost of our CFI instrumentation. The difference between calling T . These wrappers are compiled with the T dll, and OurMPX (resp. OurSeg) and OurCFI is the cost of enforcing the output dll exports the interface functions. bounds using MPX (resp. segment registers). Loading the U and T dlls. 
When loading the U and T We ran all of the C benchmarks of SPEC CPU 2006, ex- dlls, the loader: (1) populates the externals table in U with cept perlBench, which uses fork that we currently do not addresses of the wrapper functions in T , (2) relocates the support. All benchmarks were run on a Microsoft Surface globals in U to their respective, private or public regions, Pro-4 Windows 10 machine with an Intel Core i7-6650U 2.20 (3) sets the MPX bound registers for the MPX scheme or GHz 64-bit processor with 2 cores (4 logical cores) and 8 GB the segment registers for the segment-register scheme, and RAM. (4) initializes the heaps and stacks in all the regions, marks Figure 5 shows the results of our experiment. The over- them non-executable, and jumps to the main routine. head of ConfLLVM using MPX (OurMPX) is up to 74.03%, Memory allocator. ConfLLVM uses a customized memory while that of ConfLLVM using segmentation (OurSeg) is up allocator to enclose the private and public allocations in their to 24.5%. As expected, the overheads are almost consistently respective sections. We modified dlmalloc5 [ ] to achieve this. significantly lower when using segmentation than when using MPX. Looking further, some of the overhead (up to 7 Evaluation 10.2%) comes from CFI enforcement (OurCFI−OurBare). The

The goal of our evaluation is three-fold: (a) Quantify the 2O2 is the standard optimization level for performance evaluation. Higher performance overheads of ConfLLVM’s instrumentation, levels include “optimizations” that don’t always speed up the program. 9 URIs, which are actually private, we add a new encrypt_log function to T that U invokes to encrypt the request URI before adding it to the log. This function encrypts using a key that is isolated in T ’s own region. The key is pre-shared with the administrator who is authorized to read the logs. The encrypted result of encrypt_log is placed in a public buffer. The standard SSL_recv function in T decrypts the incoming payload with the session key, and provides it to U in a private buffer.

size_t SSL_recv(SSL * connection, private void *buffer , size_t length );

In total, we added or modified 160 LoC in NGINX (0.13% of NGINX’s codebase) and added 138 LoC to T . Our goal is to measure the overhead of ConfLLVM on Figure 5. Execution time as a percentage of Base for SPEC CPU the maximum sustained throughput of NGINX. We run our 2006 benchmarks. Numbers above the Base bars are absolute exe- experiments in 6 configurations: Base, Our1Mem, OurBare, ms cution times of the baseline in . All bars are averages of 10 runs. OurCFI, and OurMPX-Sep, OurMPX. Of these, Base, OurBare, Standard deviations are all below 3%. OurCFI, and OurMPX are the same as those in Section 7.1. average CFI overhead is 3.62%, competitive with best known We use MPX for bounds checks (OurMPX), as opposed to techniques [26]. Stack switching and disabled optimizations segmentation, since we know from Section 7.1 that MPX (OurBare) account for the remaining overhead. The overhead is worse for ConfLLVM. Our1Mem is like OurBare (compiled due to our custom memory allocator (BaseOA) is negligible with ConfLLVM without any instrumentation), but also does and, in many benchmarks, the custom allocator improves not separate memories for T and U. OurMPX-Sep includes performance. all instrumentation, but does not separate stacks for private We further comment on some seemingly odd results. On and public data. Briefly, the difference between OurBare and mcf, the cost of CFI alone (OurCFI, 4.17%) seems to be higher Our1Mem is the overhead of separating T ’s memory from U’s than that of the full MPX-based instrumentation (OurMPX, and switching stacks on every call to T , while the difference 4.02%). We verified that this is due to an outlier inthe OurCFI between OurMPX and OurMPX-Sep is the overhead of increased experiment. On hmmer, the overhead of OurBare is negative cache pressure from having separate stacks for private and because the optimizations that ConfLLVM disables actually public data. slow it down. Finally, on milc, the overhead of ConfLLVM is We host NGINX version 1.13.12 on an Intel Core i7-6700 negative because this benchmark benefits significantly from 3.40GHz 64-bit processor with 4 cores (8 logical cores), 32 GB the use of our custom memory allocator. Indeed, relative to RAM, and a 10 Gbps Ethernet card, running Ubuntu v16.04.1 BaseOA, the remaining overheads follow expected trends. with Linux kernel version 4.13.0-38-generic. Hyperthreading was disabled in order to reduce experimental noise. NGINX 7.2 Web server: NGINX is configured to run a single worker process pinned toacore. Next, we demonstrate that our method and ConfLLVM scale We connect to the server from two client machines using the to large applications. We cover three applications in this and wrk2 tool [15], simulating a total of 32 concurrent clients. the next two sections. We first use ConfLLVM to protect logs Each client makes 10,000 sequential requests, randomly cho- in NGINX, the most popular web server among high-traffic sen from a corpus of 1,000 files of the same size (we vary websites [10]. NGINX has a logging module that logs time the file size across experiments). The files are served froma stamps, processing times, etc. for each request along with RAM disk. This saturates the CPU core hosting NGINX in request metadata such as the client address and the request all setups. URI. An obvious confidentiality concern is that sensitive Figure 6 shows the steady-state throughputs in the six content from files being served may leak into the logs due configurations for file sizes ranging from 0 to 40KB.For to bugs. 
We use ConfLLVM to prevent such leaks. file sizes beyond 40 KB, the 10 Gbps network card saturates We annotate NGINX’s codebase to place OpenSSL in T , in the base line before the CPU, and the excess CPU cycles and the rest of NGINX, including all its request parsing, absorb our overheads. processing, serving, and logging code in U. The code in U Overall, ConfLLVM’s overhead on sustained throughput is compiled with ConfLLVM (total 124,001 LoC). Within U, ranges from 3.25% to 29.32%. The overhead is not mono- we mark everything as private, except for the buffers in the tonic in file size or base line throughput, indicating that logging module that are marked as public. To log request there are compensating effects at play. For large file sizes, 10 in a private buffer, so ConfLLVM prevents U from leaking it. The part we compile with ConfLLVM is 300,000 lines of C code, spread across 728 source files. Our modifications amount to 52 new LoC for T and 100 edited LoC in U. To- gether, these constitute about 0.5% of the original codebase. We configure OpenLDAP as a multi-threaded server (the default) with a memory-mapped backing store (also the de- fault), and simple username/password authentication. We use the same machine as for the SPEC CPU benchmarks (Section 7.1) to host an OpenLDAP server configured to run 6 concurrent threads. The server is pre-populated with 10,000 random directory entries. All memory-mapped files are cached in memory before the experiment starts. In our first experiment, 80 concurrent clients connected to the server from another machine over a 100Mbps direct Figure 6. Maximum sustained throughput as a percentage of Base Ethernet link issue concurrent requests for directory entries for the NGINX web server with increasing response sizes. Numbers that do not exist. Across three trials, the server handles on av- above the Base bars are absolute throughputs in req/s for the erage 26,254 and 22,908 requests per second in the baseline baseline. All bars are averages of 10 runs. Standard deviations are (Base) and ConfLLVM using MPX (OurMPX). This corre- all below 0.3%, except in Base for response sizes 0 KB and 1 KB, sponds to a throughput degradation of 12.74%. The server where they are below 2.2%. CPU remains nearly saturated throughout the experiment. The standard deviations are very small (1.7% and 0.2% in the relative amount of time spent outside U, e.g., in the ker- Base and OurMPX, respectively). nel in copying data, is substantial. Since code outside U is Our second experiment is identical to the first, except that not subject to our instrumentation, our relative overhead 60 concurrent clients issue small requests for entries that falls for large file sizes (>10 KB here) and eventually tends exist on the server. Now, the baseline and ConfLLVM handle to zero. The initial increase in overhead up to file size 10 29,698 and 26,895 queries per second, respectively. This is a KB comes mostly from the increased cache pressure due to throughput degradation of 9.44%. The standard deviations the separation of stacks for public and private data (the dif- are small (less than 0.2%). ference OurMPX−OurMPX-Sep). This is unsurprising: As the The reason for the difference in overheads in these two file size increases, so does the cache footprint of U. 
In con- experiments is that OpenLDAP does less work in U look- trast, the overheads due to CFI (difference OurCFI−OurBare) ing for directory entries that exist than it does looking for and the separation of the memories of T and U (difference directory entries that don’t exist. Again, the overheads of OurBare−Our1Mem) are relatively constant for small file sizes. ConfLLVM’s instrumentation are only moderate and can be Note that these overheads are moderate and we expect reduced further by using segmentation in place of MPX. that they can be reduced further by using segmentation in place of MPX for bounds checks. 7.4 ConfLLVM with Intel SGX 7.3 OpenLDAP Hardware features, such as the Intel Software Guard Exten- Next, we apply ConfLLVM to OpenLDAP [45], an imple- sions (SGX) [25], allow isolating sensitive code and data into mentation of the Lightweight Directory Access Protocol a separate enclave, which cannot be read or written directly (LDAP) [53]. LDAP is a standard for organizing and access- from outside. Although this affords strong protection (even ing hierarchical information. Here, we use ConfLLVM to against a malicious operating system), bugs within the iso- protect root and user passwords stored in OpenLDAP ver- lated code can still leak sensitive data out from the enclave. sion 2.4.45. By default, the root password (which authorizes This risk can be mitigated by using ConfLLVM to compile access to OpenLDAP’s entire store) and user passwords are the code that runs within the enclave. all stored unencrypted. We added new functions to encrypt We experimented with Privado [60], a system that per- and decrypt these passwords, and modified OpenLDAP to forms image classification by running a pre-trained model on use these functions prior to storing and loading passwords, a user-provided image. Privado treats both the model (i.e., respectively. The concern still is that OpenLDAP might leak its parameters) and the user input as sensitive information. in-memory passwords without encrypting them. To prevent These appear unencrypted only inside an enclave. Privado this, we treat all of OpenLDAP as untrusted (U), and protect contains a port of Torch [13, 14] made compatible with In- it by compiling it with ConfLLVM. The new cryptographic tel SGX SDK. The image classifier is an eleven-layer neural functions are in T . Specifically, decryption returns its output network (NN) that categorizes images into ten classes. 11 We treat both Torch and the NN as untrusted code U and compile them using ConfLLVM. We mark all data in the enclave private, and map U’s private region to a block of memory within the enclave. U contains roughly 36K Loc. The trusted code (T ) running inside the enclave only con- sists of generic application-independent logic: the enclave SDK, our memory allocator, stubs to switch between T and U, and a declassifier that communicates the result of im- age classification (which is a private value) back to thehost running outside the enclave. Outside the enclave, we deploy a web server that takes encrypted images from clients and classifies them inside the enclave. The useof ConfLLVM crucially reduces trust on the application-sensitive logic that is all contained in U: ConfLLVM enforces that the declas- sifier is the only way to communicate private information outside the enclave. We ran this setup on an Intel Core i7-6700 3.40GHz 64-bit Figure 7. 
7.5 Data integrity and scaling with parallelism

The goal of our next experiment is two-fold: to verify that ConfLLVM's instrumentation scales well with thread parallelism, and to test that ConfLLVM can be used to protect data integrity, not just confidentiality. We implemented a simple multi-threaded userspace library that offers standard file read and write functionality, but additionally provides data integrity by maintaining a Merkle hash tree of the file system contents. A security concern is that a bug in the application or the library may clobber the hash tree and nullify the integrity guarantees. To prevent this, we compile both the library and its clients using ConfLLVM (i.e., as part of U). All data within the client and the library is marked private. The only exception is the hash tree, which is marked public. As usual, ConfLLVM prevents private data from being written to public data structures accidentally, thus providing integrity for the hash tree. To write hashes to the tree intentionally, we place the hashing function in T, allowing it to "declassify" data hashes as public.

We experiment with this library on a Windows 10 machine with an Intel i7-6700 CPU (4 cores, 8 hyperthreaded cores) and 32 GB RAM. Our client program creates between 1 and 6 parallel threads, all of which read a 2 GB file concurrently. The file is memory-mapped within the library and cached beforehand, giving us a CPU-bound workload. We measure the total time taken to perform the reads in three configurations: Base, OurSeg and OurMPX. Figure 8 shows the total runtime as a function of the number of threads and the configuration. Until the number of threads exceeds the number of cores (4), the absolute execution time (written above the Base bars) and the relative overhead of both the MPX and segmentation schemes remain nearly constant, establishing linear scaling with the number of threads. The actual overhead of OurSeg is below 10% and that of OurMPX is below 17% in all configurations.

Figure 8. Total execution time as a percentage of Base for reading a memory-mapped 2 GB file in parallel, as a function of the number of reading threads. Numbers above the Base bars are absolute execution times of the baseline in ms. Each bar is an average of 5 runs. Standard deviations are all below 3%.
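The integrity idiom above can be pictured with a small sketch. All identifiers and the taint qualifier are hypothetical; this is not the library used in the experiment. The essential point is that the hash tree is the only public structure, so the only way private file contents can influence it is through the trusted hashing routine.

/* Sketch of the Section 7.5 idiom: everything private except the hash
 * tree; only T may turn private contents into a public hash. */

#include <stddef.h>
#include <stdint.h>

#define PRIVATE
#define PUBLIC

/* Public hash tree: U may read it, but writing private data into it is a
 * taint violation caught by the compiler or the runtime checks. */
PUBLIC extern uint64_t hash_tree[1u << 20];

/* T: the only component allowed to declassify a hash of private data. */
uint64_t t_hash_and_declassify(PRIVATE const uint8_t *block, size_t len);

/* U: file write path; the data buffer stays private end to end. */
void fs_write_block(size_t block_idx, PRIVATE const uint8_t *data, size_t len) {
    /* ... write `data` to the (private) file cache ... */
    hash_tree[block_idx] = t_hash_and_declassify(data, len);
    /* A buggy line such as
     *     hash_tree[block_idx] = *(const uint64_t *)data;
     * would copy private bytes into a public structure and be rejected. */
}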
7.6 Vulnerability-injection experiments

To test that ConfLLVM stops data-extraction vulnerabilities from being exploited, we hand-crafted three vulnerabilities. In all cases, compiling the vulnerable applications with ConfLLVM (after adding suitable private annotations) prevented the vulnerabilities from being exploited.

First, we introduced a buffer-bounds vulnerability in the popular Mongoose web server [9] in the code path for serving an unencrypted file over HTTP. The vulnerability transmits an arbitrary amount of stale data from the stack, unencrypted. We wrote a client that exploits the vulnerability by first requesting a private file, which causes some contents of the private file to be written to the stack in clear text, and then requesting a public file with the exploit. This leaks stale private data from the first request. Using ConfLLVM on Mongoose with proper annotations stops the exploit, since ConfLLVM separates the private and public stacks: the contents of the private file are written to the private stack, but the exploit reads from the public stack.

Second, we modified Minizip [8], a file compression tool, to explicitly leak the file encryption password to a log file. ConfLLVM's type inference detects this leak once we annotate the password as private. To make it harder for ConfLLVM, we added several pointer type casts on the password, which make it impossible to detect the leak statically. But, then, the dynamic checks inserted by ConfLLVM prevent the leak.

Third, we wrote a simple function with a format-string vulnerability in its use of printf. printf is a vararg function whose first argument, the format string, determines how many subsequent arguments the function tries to print. If the format string has more directives than the number of arguments, potentially due to adversary-provided input, printf ends up reading other data from either the argument registers or the stack. If any of this data is private, it results in a data leak. ConfLLVM prevents this vulnerability from being exploited if we include printf's code in U: since printf tries to read all arguments into buffers marked public, the bounds enforcement of ConfLLVM prevents it from reading any private data.
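The third vulnerability is easy to reproduce in isolation. The snippet below is a minimal, self-contained illustration (not the paper's actual test program); the PRIVATE macro is an illustrative stand-in for the taint qualifier. Compiled normally, the exploit call can print the secret; under the enforcement described above, printf runs in U and reading extra words from the private stack or registers into its public buffers trips the region bounds checks instead.

#include <stdio.h>

#define PRIVATE  /* illustrative stand-in for the taint qualifier */

void greet(const char *user_supplied_fmt) {
    PRIVATE long secret = 0x5ec7e7;      /* lives in the private region   */
    /* BUG: the format string is attacker-controlled. An input such as
     * "%lx %lx %lx %lx %lx %lx" makes printf read values it was never
     * passed, from argument registers and the stack, possibly `secret`. */
    printf(user_supplied_fmt);
    (void)secret;
}

int main(void) {
    greet("hello\n");                     /* benign use      */
    greet("%lx %lx %lx %lx %lx %lx\n");   /* exploit attempt */
    return 0;
}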
8 Discussion and Future Work

Currently, our scheme allows classifying data into two levels: public and private. It cannot be used for finer classification, e.g., to separate the private data of Alice, the private data of Bob, and public data at the same time. We also do not support label polymorphism at present, although that could potentially be implemented using C++-like templates.

In our scheme, T is trusted and therefore must be implemented with care. ConfLLVM guarantees that U cannot access T's memory or jump to arbitrary points in T, so the only remaining attack surface for T is the API that it exposes to U. T must ensure that stringing together a sequence of these API calls cannot cause leaks. We recommend the following (standard) strategies. First, T should be kept small, containing mostly application-independent functionality, e.g., communication interfaces, cryptographic routines, and optionally a small number of libc routines (mainly for performance reasons), moving the rest of the code to U. Such a T can be re-used across applications and can be subjected to careful audit or verification. Further, declassification routines in T must provide guarded access to U. For instance, T should disallow an arbitrary number of calls to a password-checking routine, to prevent probing attacks.

We rely on the absence of the magic sequence in T's binary to prevent U from jumping inside T. We ensure this by selecting the magic string when the entire code of U and T is available. While dynamic loading in U can simply be disallowed, any dynamic loading in T must ensure that the loaded library does not contain the magic sequence. Since the (59-bit) magic sequence is generated at random, the chance of it appearing in the loaded library is minimal. A stronger defense is to instrument indirect control transfers in U so that they remain inside U's own code.

ConfLLVM supports callbacks from T to U with the help of trusted wrappers in U that return to a fixed location in T, where T can restore its stack and start execution from where it left off (or fail if T never called into U).

At present, ConfLLVM only supports the C language. Implementing support for C++ remains interesting future work. It requires integrating the private type qualifier with the C++ type system and ensuring that the C++ runtime and object system respect the user-intended taint flow.
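To illustrate the "guarded declassification" recommendation above, the following sketch shows a rate-limited password check in T. The API and the qualifier spelling are hypothetical, and a production T would additionally need thread safety and constant-time comparison; the sketch only shows that T, not U, decides how much information each declassifying call may release.

/* Sketch of a guarded declassifier in T (hypothetical API): the boolean
 * outcome of a password check is released, but T rate-limits calls so U
 * cannot use the API as a probing oracle. */

#include <stdbool.h>
#include <string.h>

#define PRIVATE

static unsigned attempts_left = 5;   /* policy: at most 5 probes per session */

bool t_check_password(PRIVATE const char *stored, PRIVATE const char *attempt) {
    if (attempts_left == 0)
        return false;                /* refuse to act as an oracle */
    attempts_left--;
    /* The comparison runs on private data inside T; only the single bit of
     * the outcome is declassified to U. */
    return strcmp(stored, attempt) == 0;
}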

9 Related Work

Our work bears similarities to that of Sinha et al. [57], who proposed a design methodology for programming secure enclaves (e.g., those that use Intel SGX instructions for memory isolation). The code inside an enclave is divided into U and L. U's code is compiled via a special instrumenting compiler [23], while L's code is trusted and may be compiled using any compiler. This is similar in principle to our U-T division. However, there are several differences. First, even the goals are different: their scheme does not track taints; it only ensures that all unencrypted I/O done by U goes through L, which encrypts all outgoing data uniformly. Thus, the application cannot carry out plain-text communication even on public data without losing the security guarantee. Second, their implementation does not support multi-threading (it relies on page protection to isolate L from U). Third, they maintain a bitmap of writeable memory locations for enforcing CFI, resulting in time and memory overheads; our CFI is taint-aware and avoids these overheads. Finally, their verifier does not scale to the SPEC benchmarks, whereas our verifier is faster and scales to all binaries that we have tried.

In an effort parallel to ours, Carr et al. [22] present DataShield, whose goal, like ConfLLVM's, is information flow control in low-level code. However, there are several differences between DataShield and our work. First and foremost, DataShield itself only prevents non-control data-flow attacks, in which data is leaked or corrupted without relying on a control-flow hijack; a separate CFI solution is needed to prevent leaks of information in the face of control-flow hijacks. In contrast, our scheme incorporates a customized CFI that provides exactly the minimum necessary for information flow control. One of the key insights of our work is that (standard) CFI is neither necessary nor sufficient to prevent information flow violations due to control-flow hijacks. Second, DataShield places blind trust in its compiler. In contrast, in our work, the verifier ConfVerify eliminates the need to trust the compiler ConfLLVM. Third, DataShield enforces memory safety at object granularity on sensitive objects. This allows DataShield to enforce integrity for data invariants, which is mostly outside the scope of our work. However, as we show in Section 7.5, our work can be used to prevent untrusted data from flowing into sensitive locations, which is a different form of integrity.

Rocha et al. [48] and Banerjee et al. [18] use a combination of hybrid and static methods for information flow control, but in memory- and type-safe languages like Java. Cimplifier by Rastogi et al. [47] tracks information flow only at the level of process binaries, by using separate Docker containers. Region-based memory partitioning has been explored before in the context of safe and efficient memory management [30, 59], but not for information flow. In ConfLLVM, regions obviate the need for dynamic taint tracking [39, 51, 55]. TaintCheck [44] first proposed the idea of dynamic taint tracking and forms the basis of Valgrind [43]. DECAF [33] is a whole-system binary analysis framework that includes a taint-tracking mechanism. However, such dynamic taint trackers incur heavy performance overheads: DECAF, for example, has an overhead of 600%, and TaintCheck can impose a 37x performance overhead for CPU-bound applications. Suh et al. [58] report less than 1% overhead for their dynamic taint-tracking scheme, but they rely on custom hardware.

Static analyses [20, 21, 31, 52] of source code can prove security-relevant criteria such as safe downcasts in C++ or the correct use of variadic arguments. When proofs cannot be constructed, runtime checks are inserted to enforce the relevant policy at runtime. This is similar to our use of runtime checks, but the purposes are different.

Memory-safety techniques for C such as CCured [42] and SoftBound [41] do not provide confidentiality in all cases and already have overheads higher than those of ConfLLVM (see [22, Section 2.2] for a summary of the overheads). Techniques such as control-flow integrity (CFI) [16] and code-pointer integrity (CPI) [35] prevent control-flow hijacks but not all data leaks. While our new taint-aware CFI is an integral component of our enforcement, our goal of preventing data leaks goes beyond CFI and CPI. Our CFI mechanism is similar to Abadi et al. [16] and Zeng et al. [61] in its use of magic sequences, but our magic sequences are taint-aware.

References

[1] 2011 CWE/SANS Top 25 Most Dangerous Software Errors. http://cwe.mitre.org/top25/.
[2] Chrome owned by exploits in hacker contests, but Google's $1M purse still safe. https://www.wired.com/2012/03/pwnium-and-pwn2own/.
[3] Clang: A C language family frontend for LLVM. http://clang.llvm.org.
[4] CVE-2012-0769, the case of the perfect info leak. http://zhodiac.hispahack.com/my-stuff/security/Flash_ASLR_bypass.pdf.
[5] dlmalloc: A Memory Allocator. http://g.oswego.edu/dl/html/malloc.html.
[6] The Heartbleed bug. http://heartbleed.com/.
[7] Intel Memory Protection Extensions (Intel MPX). https://software.intel.com/en-us/isa-extensions/intel-mpx.
[8] Minizip. https://github.com/nmoinvaz/minizip.
[9] Mongoose. https://github.com/cesanta/mongoose.
[10] NGINX web server. https://www.nginx.com/.
[11] Smashing the stack for fun and profit. insecure.org/stf/smashstack.html.
[12] SPEC CPU 2006. https://www.spec.org/cpu2006/.
[13] Torch TH library. https://github.com/torch/TH.
[14] Torch THNN library. https://github.com/torch/nn/tree/master/lib/THNN.
[15] wrk2: A constant throughput, correct latency recording variant of wrk. https://github.com/giltene/wrk2.
[16] Martín Abadi, Mihai Budiu, Ulfar Erlingsson, and Jay Ligatti. Control-flow integrity. In Proceedings of the 12th ACM Conference on Computer and Communications Security, pages 340–353. ACM, 2005.
[17] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1986.
[18] Anindya Banerjee and David A. Naumann. Stack-based access control and secure information flow. Journal of Functional Programming, 15(2):131–177, 2005.
[19] Ken Biba. Integrity considerations for secure computer systems. Technical report, 1977.
[20] Priyam Biswas, Alessandro Di Federico, Scott A. Carr, Prabhu Rajasekaran, Stijn Volckaert, Yeoul Na, Michael Franz, and Mathias Payer. Venerable variadic vulnerabilities vanquished. In 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, 2017. USENIX Association.
[21] Fraser Brown, Andres Nötzli, and Dawson Engler. How to build static checking systems using orders of magnitude less code. SIGPLAN Not., 51(4):143–157, March 2016.
[22] Scott A. Carr and Mathias Payer. DataShield: Configurable data confidentiality and integrity. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS '17, pages 193–204, New York, NY, USA, 2017. ACM.
[23] Miguel Castro, Manuel Costa, Jean-Philippe Martin, Marcus Peinado, Periklis Akritidis, Austin Donnelly, Paul Barham, and Richard Black. Fast byte-granularity software fault isolation. In Symposium on Operating Systems Principles (SOSP), 2009.
[24] Jeremy Condit, Matthew Harren, Zachary Anderson, David Gay, and George C. Necula. Dependent types for low-level programming. In Proceedings of the 16th European Symposium on Programming, ESOP'07, pages 520–535, Berlin, Heidelberg, 2007. Springer-Verlag.
[25] Victor Costan and Srinivas Devadas. Intel SGX explained. Cryptology ePrint Archive, Report 2016/086, 2016. https://eprint.iacr.org/2016/086.
[26] Thurston H.Y. Dang, Petros Maniatis, and David Wagner. The performance cost of shadow stacks and stack canaries. In Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security, ASIA CCS '15, pages 555–566, New York, NY, USA, 2015. ACM.
[27] Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS'08/ETAPS'08, pages 337–340, Berlin, Heidelberg, 2008. Springer-Verlag.
[28] Dorothy E. Denning. A lattice model of secure information flow. Commun. ACM, 19(5):236–243, May 1976.
[29] Jeffrey S. Foster, Manuel Fähndrich, and Alexander Aiken. A theory of type qualifiers. In Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 192–203, Atlanta, Georgia, May 1999.
[30] Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling Wang, and James Cheney. Region-based memory management in Cyclone. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI '02, pages 282–293, New York, NY, USA, 2002. ACM.
[31] Istvan Haller, Yuseok Jeon, Hui Peng, Mathias Payer, Cristiano Giuffrida, Herbert Bos, and Erik van der Kouwe. TypeSan: Practical type confusion detection. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS '16, pages 517–528, New York, NY, USA, 2016. ACM.
[32] Nevin Heintze and Jon G. Riecke. The SLam calculus: Programming with secrecy and integrity. In Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '98, pages 365–377, New York, NY, USA, 1998. ACM.
[33] A. Henderson, L. K. Yan, X. Hu, A. Prakash, H. Yin, and S. McCamant. DECAF: A platform-neutral whole-system dynamic binary analysis platform. IEEE Transactions on Software Engineering, 43(2):164–184, Feb 2017.
[34] Sebastian Hunt and David Sands. On flow-sensitive security types. In Conference Record of the 33rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '06, pages 79–90, New York, NY, USA, 2006. ACM.
[35] Volodymyr Kuznetsov, László Szekeres, Mathias Payer, George Candea, R. Sekar, and Dawn Song. Code-pointer integrity. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI'14, pages 147–163, Berkeley, CA, USA, 2014. USENIX Association.
[36] Chris Lattner and Vikram Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO '04, pages 75–, Washington, DC, USA, 2004. IEEE Computer Society.
[37] Jed Liu, Michael D. George, K. Vikram, Xin Qi, Lucas Waye, and Andrew C. Myers. Fabric: A platform for secure distributed computation and storage. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), pages 321–334, 2009.
[38] Shan Lu, Zhenmin Li, Feng Qin, Lin Tan, Pin Zhou, and Yuanyuan Zhou. BugBench: Benchmarks for evaluating bug detection tools. In Workshop on the Evaluation of Software Defect Detection Tools, 2005.
[39] Jiang Ming, Dinghao Wu, Jun Wang, Gaoyao Xiao, and Peng Liu. StraightTaint: Decoupled offline symbolic taint analysis. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016, pages 308–319, New York, NY, USA, 2016. ACM.
[40] Andrew C. Myers. JFlow: Practical mostly-static information flow control. In Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 228–241, 1999.
[41] Santosh Nagarakatte, Jianzhou Zhao, Milo M. K. Martin, and Steve Zdancewic. SoftBound: Highly compatible and complete spatial memory safety for C. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2009, Dublin, Ireland, June 15-21, 2009, pages 245–258, 2009.
[42] George C. Necula, Jeremy Condit, Matthew Harren, Scott McPeak, and Westley Weimer. CCured: Type-safe retrofitting of legacy software. ACM Trans. Program. Lang. Syst., 27(3):477–526, 2005.
[43] Nicholas Nethercote and Julian Seward. Valgrind: A framework for heavyweight dynamic binary instrumentation. SIGPLAN Not., 42(6):89–100, June 2007.
[44] James Newsome and Dawn Xiaodong Song. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. In Proceedings of the Network and Distributed System Security Symposium, NDSS 2005, San Diego, California, USA, 2005.
[45] OpenLDAP Project. OpenLDAP. http://www.openldap.org/.
[46] François Pottier and Vincent Simonet. Information flow inference for ML. In The 29th SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 319–330, 2002.
[47] Vaibhav Rastogi, Drew Davidson, Lorenzo De Carli, Somesh Jha, and Patrick McDaniel. Cimplifier: Automatically debloating containers. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017.
[48] B. P. S. Rocha, M. Conti, S. Etalle, and B. Crispo. Hybrid static-runtime information flow and declassification enforcement. IEEE Transactions on Information Forensics and Security, 8(8):1294–1305, Aug 2013.
[49] Ryan Roemer, Erik Buchanan, Hovav Shacham, and Stefan Savage. Return-oriented programming: Systems, languages, and applications. ACM Trans. Inf. Syst. Secur., 15(1):2:1–2:34, March 2012.
[50] A. Sabelfeld and A. C. Myers. Language-based information-flow security. IEEE J. Sel. A. Commun., 21(1):5–19, September 2006.
[51] D. Schoepe, M. Balliu, B. C. Pierce, and A. Sabelfeld. Explicit secrecy: A policy for taint tracking. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 15–30, March 2016.
[52] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov. AddressSanitizer: A fast address sanity checker. In Proceedings of the 2012 USENIX Conference on Annual Technical Conference, USENIX ATC'12, pages 28–28, Berkeley, CA, USA, 2012. USENIX Association.
[53] J. Sermersheim. Lightweight Directory Access Protocol (LDAP): The Protocol. RFC 4511, June 2006.
[54] Hovav Shacham. The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). In Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS '07, pages 552–561, New York, NY, USA, 2007. ACM.
[55] Zhiyong Shan. Suspicious-taint-based access control for protecting OS from network attacks. CoRR, abs/1609.00100, 2016.
[56] Umesh Shankar, Kunal Talwar, Jeffrey S. Foster, and David Wagner. Detecting format string vulnerabilities with type qualifiers. In Proceedings of the 10th Conference on USENIX Security Symposium - Volume 10, SSYM'01, Berkeley, CA, USA, 2001. USENIX Association.
[57] Rohit Sinha, Manuel Costa, Akash Lal, Nuno P. Lopes, Sriram K. Rajamani, Sanjit A. Seshia, and Kapil Vaswani. A design and verification methodology for secure isolated regions. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016, pages 665–681, 2016.
[58] G. Edward Suh, Jae W. Lee, David Zhang, and Srinivas Devadas. Secure program execution via dynamic information flow tracking. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XI, pages 85–96, New York, NY, USA, 2004. ACM.
[59] Mads Tofte and Jean-Pierre Talpin. Region-based memory management. Inf. Comput., 132(2):109–176, February 1997.
[60] Shruti Tople, Karan Grover, Shweta Shinde, Ranjita Bhagwan, and Ramachandran Ramjee. Privado: Practical and secure DNN inference. CoRR, abs/1810.00602, 2018.
[61] Bin Zeng, Gang Tan, and Greg Morrisett. Combining control-flow integrity and static analysis for efficient and validated data sandboxing. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 29–40. ACM, 2011.

A Formal model of verifier

Conceptually, ConfVerify operates in two stages. First, it disassembles a binary, constructs its control-flow graph (CFG), and re-infers the taints of registers at all program points. Second, it checks that the taints at the beginning and end of every instruction are consistent with the instruction's semantics, and that the other dynamic checks described in Section 5.2 are correctly inserted. Here, we formalize the second stage and show that if a program passes those checks, then the program is secure in a formal sense. We assume that the disassembly and the reconstruction of the CFG in the first stage, both of which use an existing tool, are correct.

We model the disassembled CFG abstractly. For each function, the CFG has one bit of metadata, which indicates whether the function is part of the trusted code or the untrusted code (abstractly written T and U, respectively). Additionally, there is a 64-bit magic sequence for each function, which encodes the taints of the function's arguments and its return value (see Section 4). For trusted functions (whose code ConfVerify does not analyze), this is all the CFG contains. For untrusted functions (whose code ConfVerify does analyze), there is an additional block-graph describing the code of the function. The block-graph for an untrusted function f is a tuple Gf = ⟨Vf, E⟩ of nodes of f and edges E ⊆ Vf × Vf. The edges represent all direct control transfers. A node ⟨pc, C, Γ, Γ′⟩ ∈ Vf consists of a program counter pc, a single command (assembly instruction) C, and register taints Γ and Γ′ before and after the execution of C. pc is a number, like a line number, that can be used in an indirect jump to this node. Γ and Γ′ are maps from machine register ids to {H, L}, representing high (private) and low (public) data.

Cmd := ldr reg, exp | str reg, exp | goto (exp) | ifthenelse (exp, goto (exp), goto (exp)) | ret | call{U|T} f (exp*) | icall exp (exp*) | assert exp
exp := n ∈ Val | reg ∈ Reg | ♢u exp | exp ♢b exp | &f

Table 1. Command syntax

Commands C are represented in the abstract syntax shown in Table 1. The auxiliary syntax of expressions consists of constants, which here can only be integers that may represent ordinary data or line numbers, standard binary and unary operators (ranged over by ♢b and ♢u, respectively), and the special operation &f, which returns the pc of the starting instruction of function f. We assume that the operations used in expressions are total, i.e., expressions never get stuck. Commands include register load ldr reg, exp and store str reg, exp, where the input exp evaluates to a memory location, unconditional jumps goto (exp), conditionals ifthenelse, direct function calls call{U|T} f (exp*), indirect function calls icall, and return ret. The subscript of call{U|T} denotes whether the command invokes an untrusted or a trusted function. An additional command, assert, models checks inserted by the compiler. The asserted expressions must be true, else the program halts in a special state ⊥. Figure 9 presents the operational semantics of the commands in Table 1.
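One way to picture this model is as a small C data structure. The rendering below is purely illustrative (ConfVerify's actual implementation is not shown in the paper and will differ); it mirrors Table 1 and the node definition above.

/* Illustrative C rendering of the abstract CFG model; not ConfVerify's
 * real data structures. */

#include <stdint.h>
#include <stddef.h>

typedef enum { TAINT_L, TAINT_H } taint_t;            /* public / private */
typedef enum { CMD_LDR, CMD_STR, CMD_GOTO, CMD_IFTHENELSE,
               CMD_RET, CMD_CALL_U, CMD_CALL_T, CMD_ICALL,
               CMD_ASSERT } cmd_kind_t;

#define NREGS 16

typedef struct {
    uint64_t   pc;              /* node label, usable as an indirect target */
    cmd_kind_t kind;            /* the single command C at this node        */
    taint_t    gamma_in[NREGS]; /* Γ : register taints before C             */
    taint_t    gamma_out[NREGS];/* Γ′: register taints after C              */
} cfg_node_t;

typedef struct {
    int         trusted;        /* 1 = in T (body not analyzed), 0 = in U   */
    uint64_t    magic;          /* 64-bit sequence encoding argument and    */
                                /* return-value taints (Section 4)          */
    cfg_node_t *nodes;          /* block-graph, present only for U functions*/
    size_t      nnodes;
} cfg_function_t;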

Every CFG G has a designated entry function, similar to main() in C programs.

Dynamic semantics. A program (CFG) G is evaluated in the context of a data memory µ : Val → Val, a register state ρ : Reg → Val, and a program counter pc that points to a node in the program's CFG. The memory µ is actually a union of two maps µL and µH over disjoint domains, representing the low and the high regions of memory in our scheme. Given a fixed G, a configuration s of G is a record ⟨µ, ρ, [σH : σL], pc⟩ of the entire state: the memory, the register state, the stack, and the program counter. We distinguish between the high (σH) and low (σL) stacks. We write ⟨.⟩.i to project a tuple to its i-th component; thus, if s = ⟨µ, ρ, [σH : σL], pc⟩, then s.pc = pc. We call a configuration s initial (final) if s.pc is the first (last) node of the entry function of G.

Next, we define the dynamic semantics of a program as a transition relation s → s′ (Figure 9). We use e ⇓ n to mean that an expression evaluates to its corresponding (concrete) value. Moreover, we extend the configuration s of G with an additional component ν that represents the memory fragment available only to the trusted code and protected against untrusted accesses, i.e., ⟨ν, µ, ρ, [σH : σL], pc⟩. Further, we introduce the record F = {fi ↦ ⟨ni, M_calli⟩ | fi ∈ P} and extend the typing judgment with this record: F, G ⊢ Γ {pc} Γ′. F keeps, for each function, its starting node in the CFG and the magic sequence associated with that function. We only note that calls to trusted functions, whose code is not modeled in the CFG, are represented via an external relation s ↪f s′ that models the entire execution of the trusted function f and its effects on the configuration. The special transition s → ⊥ means that s.pc contains an assert command one of whose arguments evaluates to false; ⊥ does not transition any further. Furthermore, the transition s → ↯ is used to model adversarial behavior, for example when the target of an icall is not in the CFG; ↯ resembles a configuration in which memory locations are loaded with random values.
Security Analysis. We formalize the checks that ConfVerify makes via the judgment G ⊢ Γ {pc} Γ′, which means that the command in the node labeled pc in the CFG G is consistent with the beginning and ending taints Γ and Γ′, and that the checks from Section 5.2 corresponding to this command are in place. The rules for this judgment are shown in Figure 10. The function pred(G, pc) returns the predecessors of the node labeled pc in G. For ℓ ∈ {L, H}, assert(e ∈ Dom(µℓ)) represents the dynamic check that e evaluates to an address in the domain of µℓ, and the auxiliary judgment Γ ⊢ e : ℓ means that the expression e depends only on values with secrecy ℓ or lower.

The rules are mostly self-explanatory. The first rule, for the command ldr(reg, e), says that if reg has taint ℓe after the command, then on all paths leading to this command there must be a check that whatever e evaluates to actually points into µℓe. The rule for str is similar. In the rules for indirect branching, we insist that the addresses of the branch targets have taint L. Overall, our type system's design is inspired by the flow-sensitive information flow type system of Hunt and Sands [34]; the runtime checks, however, are new to our system.

We formally define the call- and return-site magic sequences based on the CFG structure. Let Cons be the bit-concatenation function. Then: (i) for a function entry node v, M_call = Cons_{i=0...3}(v.Γ(reg_i)), and (ii) for a given return address adr and a node v such that v ∈ pred(G, adr), M_ret = v.Γ′(reg_0). We now explain the meaning of the rules in Figure 10.

call. For the call statement we check that the expected taints of the arguments, as encoded in the magic sequence at the callee, match the taints of the argument registers at the call site. It is also worth noting that, at runtime, all return addresses are stored on the stack allocated for the low-security context.

icall. For indirect calls, ConfVerify confirms that the function pointer is low, that there is a check for the magic sequence at the target site, and that its taint bits match the inferred taints of the registers. Note that since the check assert(ℓ[1−4] ⊑ M_call and ef ∈ G and (ef ↦ M_call) ∈ F) is a runtime condition, the pointer ef is evaluated to the corresponding function at execution time, and we can retrieve the magic string directly from F. The condition ef ∈ G ensures that the target of the call is a valid node in the CFG of the program.

ret. As for indirect calls, for the ret command we check that there is a check for the magic sequence at the target site and that its taint bits match the inferred taints of the registers. Additionally, ConfVerify confirms that the return address on the stack has a magic signature whose taint bit is the inferred return type ℓ of the function. In this rule we use the function top, with its standard meaning, to manipulate the stack content. Again, since assert(∀v′ ∈ pred(G, top(σL)). ℓ ⊑ v′.Γ′(reg_0)) is checked at runtime, we have access to the stack.

ifthenelse. For this rule we require that the security level of the conditional expression is not H. The checks on goto and ifthenelse guarantee that the program's control flow is secret-independent.

We say that a CFG G is well-typed (passes ConfVerify's checks), written ⊢ G, when two conditions hold for all nodes v in (untrusted functions of) G: 1) the node (locally) satisfies the type system, formally G ⊢ v.Γ {v.pc} v.Γ′; and 2) the ending taint of the node is consistent with the beginning taints of all its successors, formally, for all nodes v′ ∈ succ(G, v.pc), v.Γ′ ⊑ v′.Γ, where succ(G, pc) returns the successors of the node labeled pc in G. We emphasize again that only untrusted functions are checked.

Security theorem.
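For concreteness, the following sketch shows one way the call-site magic sequence could be packed into 64 bits: four argument-taint bits (Cons over Γ(reg_0..reg_3)) and a return-taint bit next to a 59-bit random payload, matching the bit counts mentioned in Sections 4 and 8. The exact layout used by ConfLLVM may differ; this is an illustration of the Cons definition above, not the implementation.

/* Illustrative packing of a magic sequence; the real bit layout may differ. */

#include <stdint.h>

typedef enum { TAINT_L = 0, TAINT_H = 1 } taint_t;

uint64_t make_magic(const taint_t arg_taint[4], taint_t ret_taint,
                    uint64_t random59) {
    uint64_t m = random59 & ((1ULL << 59) - 1);   /* 59 random bits          */
    for (int i = 0; i < 4; i++)
        m = (m << 1) | (uint64_t)arg_taint[i];    /* Cons of argument taints */
    m = (m << 1) | (uint64_t)ret_taint;           /* return-value taint      */
    return m;                                      /* 64-bit magic sequence  */
}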
The above checks suffice to prove our key theorem: if a program passes ConfVerify, then, assuming that its trusted functions do not leak private data, the whole program does not leak private data, end-to-end. We formalize non-leakage of private data as the standard information flow property of termination-insensitive noninterference. Roughly, this property requires a notion of low-equivalence of program configurations (of the same CFG G), written s =L s′, which allows the memories of s and s′ to differ only in the private region. A program is noninterfering if it preserves =L across any two runs, except when one run halts safely (e.g., on a failed assertion). Intuitively, noninterference means that no information from the private part of the initial memory can leak into the public part of the final memory.

For our model, we define s =L s′ to hold if: (i) s and s′ point to the same command, i.e., s.pc = s′.pc; (ii) the contents of their low stacks are equal, s.σL = s′.σL; (iii) for all low memory addresses m ∈ Dom(µL), s.µ(m) = s′.µ(m); and (iv) for all registers r such that G(s.pc).Γ(r) = L, s.ρ(r) = s′.ρ(r).

The assumption that trusted code does not leak private data is formalized as follows.

Assumption 1. For all s0, s1, s0′ such that s0 =L s1, if s0 ↪f s0′ then ∃s1′. s1 ↪f s1′ and s0′ =L s1′.

Under this assumption on the trusted code, we can show the noninterference (security) theorem. A necessary condition for noninterference, however, is to ensure that no well-typed program can reach an ill-formed configuration.

Lemma 1. Suppose ⊢ G. Then for all configurations s of G it holds that s ↛∗ ↯.

Lemma 1 rules out possible nondeterminism caused by adversarial behavior and allows us to formalize the security theorem as follows.

Theorem 1 (Termination-insensitive noninterference). Suppose ⊢ G. Then, for all configurations s0, s0′ and s1 of G such that s0 =L s1 and s0 →∗ s0′, either s1 →∗ ⊥ or ∃s1′. s1 →∗ s1′ and s0′ =L s1′.

When s0 and s0′ are initial and final configurations, respectively, then s1 and s1′ must also be initial and final configurations, so the theorem guarantees freedom from data leaks, end-to-end (modulo assertion-check failures).

[ldr]       C = ldr(reg, e)   µ, ρ ⊢ e ⇓ n   n ∈ Dom(µL) ∪ Dom(µH)
            ⟨ν, µ, ρ, [σ], pc⟩ → ⟨ν, µ, ρ[reg ↦ µ(n)], [σ], pc + 1⟩

[ldr-err]   C = ldr(reg, e)   µ, ρ ⊢ e ⇓ n   n ∉ Dom(µL) ∪ Dom(µH)
            ⟨ν, µ, ρ, [σ], pc⟩ → ↯

[str]       C = str(reg, e)   µ, ρ ⊢ e ⇓ n   n ∈ Dom(µL) ∪ Dom(µH)
            ⟨ν, µ, ρ, [σ], pc⟩ → ⟨ν, µ[n ↦ ρ(reg)], ρ, [σ], pc + 1⟩

[str-err]   C = str(reg, e)   µ, ρ ⊢ e ⇓ n   n ∉ Dom(µL) ∪ Dom(µH)
            ⟨ν, µ, ρ, [σ], pc⟩ → ↯

[call-U]    C = callU fu (e1, …, e4)   (fu ↦ ⟨pcf, −⟩) ∈ F
            ⟨ν, µ, ρ, [σ], pc⟩ → ⟨ν, µ, ρ, [(pc + 1); σL : σH], pcf⟩

[call-T]    C = callT fτ (e1, …, e4)   (fτ ↦ ⟨pcf, −⟩) ∈ F
            ⟨ν, µ, ρ, [σ], pc⟩ ↪fτ ⟨ν′, µ′, ρ′, [(pc + 1); σ′L : σ′H], pcf⟩

[icall]     C = icall ef (e1, …, e4)   µ, ρ ⊢ ef ⇓ pcf   pcf ∈ G
            ⟨ν, µ, ρ, [σ], pc⟩ → ⟨ν, µ, ρ, [(pc + 1); σL : σH], pcf⟩

[icall-err] C = icall ef (e1, …, e4)   µ, ρ ⊢ ef ⇓ pcf   pcf ∉ G
            ⟨ν, µ, ρ, [σ], pc⟩ → ↯

[ret]       C = ret   adr ∈ G
            ⟨ν, µ, ρ, [adr; σL : σH], pc⟩ → ⟨ν, µ, ρ, [σL : σH], adr⟩

[ret-err]   C = ret   adr ∉ G
            ⟨ν, µ, ρ, [adr; σL : σH], pc⟩ → ↯

[goto]      C = goto (e)   µ, ρ ⊢ e ⇓ pc′
            ⟨ν, µ, ρ, [σ], pc⟩ → ⟨ν, µ, ρ, [σ], pc′⟩

[if-true]   C = ifthenelse (e, goto (e1), goto (e2))   µ, ρ ⊢ e ⇓ T   µ, ρ ⊢ e1 ⇓ n1
            ⟨ν, µ, ρ, [σ], pc⟩ → ⟨ν, µ, ρ, [σ], n1⟩

[if-false]  C = ifthenelse (e, goto (e1), goto (e2))   µ, ρ ⊢ e ⇓ F   µ, ρ ⊢ e2 ⇓ n2
            ⟨ν, µ, ρ, [σ], pc⟩ → ⟨ν, µ, ρ, [σ], n2⟩

[assert-T]  C = assert(e)   µ, ρ ⊢ e ⇓ T
            ⟨ν, µ, ρ, [σ], pc⟩ → ⟨ν, µ, ρ, [σ], pc + 1⟩

[assert-F]  C = assert(e)   µ, ρ ⊢ e ⇓ F
            ⟨ν, µ, ρ, [σ], pc⟩ → ⊥

Figure 9. Operational semantics rules. In each rule, the premises are listed on the first line and the resulting transition on the second; [σ] abbreviates the stack pair [σH : σL].

[ldr]    C = ldr(reg, e)   ∀v ∈ pred(G, pc). v.C = assert(e ∈ Dom(µℓe))
         F, G ⊢ Γ {pc} Γ[reg ↦ ℓe]

[str]    C = str(reg, e)   Γ ⊢ reg : ℓr   ℓr ⊑ ℓe   ∀v ∈ pred(G, pc). v.C = assert(e ∈ Dom(µℓe))
         F, G ⊢ Γ {pc} Γ

[call]   C = call{U|T} f (e1, …, e4)   (f ↦ ⟨−, M_call⟩) ∈ F   Γ ⊢ ei : ℓi for i = 1…4   ℓ[1−4] ⊑ M_call
         F, G ⊢ Γ {pc} Γ[rcaller ↦ H, rcallee ↦ L]

[icall]  C = icall ef (e1, …, e4)   Γ ⊢ ef : ℓf ⊑ L   Γ ⊢ ei : ℓi for i = 1…4
         ∀v ∈ pred(G, pc). v.C = assert(ℓ[1−4] ⊑ M_call and ef ∈ G and (ef ↦ ⟨−, M_call⟩) ∈ F)
         F, G ⊢ Γ {pc} Γ[rcaller ↦ H, rcallee ↦ L]

[ret]    C = ret   Γ ⊢ rcallee ⊑ L   Γ ⊢ reg0 : ℓ
         ∀v ∈ pred(G, pc). v.C = assert(∀v′ ∈ pred(G, top(σL)). ℓ ⊑ v′.Γ′(reg0))
         F, G ⊢ Γ {pc} Γ

[goto]   C = goto (e)   Γ ⊢ e : ℓe ⊑ L
         F, G ⊢ Γ {pc} Γ

[if]     C = ifthenelse (e, goto (e1), goto (e2))   Γ ⊢ e : ℓe ⊑ L
         F, G ⊢ Γ {pc} Γ

Figure 10. Complete list of type rules. In each rule, the premises are listed first and the conclusion last. C is the command at the CFG node pointed to by pc; rcallee and rcaller are the callee- and caller-save registers.
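As a constructed instance of the ldr rule (not taken from the paper), consider a node pc whose only predecessor carries the compiler-inserted bounds check for the high region; the derivation below, in the notation above, shows why the taint on the check and the taint of the destination register must agree.

% Illustrative instance of the ldr typing rule; constructed example.
% Predecessor node: assert(e \in Dom(\mu_H));  current node pc: ldr(reg_1, e).
\[
\frac{C = \mathsf{ldr}(reg_1, e) \qquad
      \forall v \in pred(G, pc).\; v.C = \mathsf{assert}(e \in Dom(\mu_H))}
     {F, G \vdash \Gamma\,\{pc\}\,\Gamma[reg_1 \mapsto H]}
\]
% Had the predecessor asserted e \in Dom(\mu_L) instead, the rule would yield
% \Gamma'(reg_1) = L, so the register's inferred taint always matches the
% memory region that the runtime check actually guards.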
