
DATA – Differential Address Trace Analysis: Finding Address-based Side-Channels in Binaries Samuel Weiser, Graz University of Technology; Andreas Zankl, Fraunhofer AISEC; Raphael Spreitzer, Graz University of Technology; Katja Miller, Fraunhofer AISEC; Stefan Mangard, Graz University of Technology; Georg Sigl, Fraunhofer AISEC; Technical University of Munich https://www.usenix.org/conference/usenixsecurity18/presentation/weiser

This paper is included in the Proceedings of the 27th USENIX Security Symposium. August 15–17, 2018 • Baltimore, MD, USA ISBN 978-1-939133-04-5

Open access to the Proceedings of the 27th USENIX Security Symposium is sponsored by USENIX.

DATA – Differential Address Trace Analysis: Finding Address-based Side-Channels in Binaries

Samuel Weiser1, Andreas Zankl2, Raphael Spreitzer1, Katja Miller2, Stefan Mangard1, and Georg Sigl2,3

1Graz University of Technology, 2Fraunhofer AISEC, 3Technical University of Munich

Abstract

Cryptographic implementations are a valuable target for address-based side-channel attacks and should, thus, be protected against them. Countermeasures, however, are often incorrectly deployed or completely omitted in practice. Moreover, existing tools that identify information leaks in programs either suffer from imprecise abstraction or only cover a subset of possible leaks. We systematically address these limitations and propose a new methodology to test software for information leaks.

In this work, we present DATA, a differential address trace analysis framework that detects address-based side-channel leaks in program binaries. This accounts for attacks exploiting caches, DRAM, branch prediction, controlled channels, and likewise. DATA works in three phases. First, the program under test is executed to record several address traces. These traces are analyzed using a novel algorithm that dynamically re-aligns traces to increase detection accuracy. Second, a generic leakage test filters differences caused by statistically independent program behavior, e.g., randomization, and reveals true information leaks. The third phase classifies these leaks according to the information that can be obtained from them. This provides further insight to security analysts about the risk they pose in practice.

We use DATA to analyze OpenSSL and PyCrypto in a fully automated way. Among several expected leaks in symmetric ciphers, DATA also reveals known and previously unknown leaks in asymmetric primitives (RSA, DSA, ECDSA), and DATA identifies erroneous bug fixes of supposedly fixed constant-time vulnerabilities.

1 Introduction

Side-channel attacks infer sensitive information, such as cryptographic keys or private user data, by monitoring inadvertent information leaks of computing devices. Cryptographic implementations are a valuable target for various side-channel attacks [11, 45, 77], as a successful attack undermines cryptographic security guarantees. Especially software-based microarchitectural attacks (e.g., cache attacks, DRAM attacks, branch-prediction attacks, and controlled-channel attacks) are particularly dangerous since they can be launched from software and, thus, without the need for physical access. Many of these software-based attacks exploit address-based information leakage to recover cryptographic keys of symmetric [6, 36] or asymmetric [28, 87] primitives.

Various countermeasures against address-based information leakage have been proposed on an architectural level [52, 62, 81]. However, these require changing the hardware, which prohibits fast and wide adoption. A more promising line of defense is software countermeasures, which remove address-based information leaks by eliminating key-dependent memory accesses to data and code. For example, data leakage can be thwarted by means of bit-slicing [43, 47, 66], and control-flow leakage by unifying the control flow [21]. Even though software countermeasures are already well studied, in practice their adoption to crypto libraries is often partial, error-prone, or non-transparent, as demonstrated by recent attacks on OpenSSL [27, 28, 88].

To address these issues, leakage detection tools have been developed that allow developers and security analysts to identify address-based side-channel vulnerabilities. Most of these tools, however, primarily focus on cache attacks and can be classified into static and dynamic approaches. Many static analysis methods use abstract interpretation [24, 25, 48, 57] to give upper leakage bounds, ideally proving the absence of information leaks in already secured implementations, e.g., the evaluation of Salsa20 [24]. However, these approaches struggle to accurately describe and pinpoint information leaks due to over-approximation [24, page 443], rendering leakage bounds meaningless in the worst case. Moreover, their approximations of the program's data plane fundamentally prohibit the analysis of interpreted code.

In contrast, dynamic approaches [41, 84, 89] focus on concrete program executions to reduce false positives. Contrary to static analysis, dynamic analysis cannot prove the absence of leakage without exhaustive input search, which is infeasible for large input spaces. However, in the case of cryptographic algorithms, testing a subset of inputs is enough to encounter information leaks with a high probability, because crypto primitives heavily diffuse the secret input during processing. Thus, there is a fundamental trade-off between static analysis (minimizing false negatives) and dynamic analysis (minimizing false positives).

We aim for a pragmatic approach towards minimizing false positives, allowing developers to identify information leaks in real-world applications. Thus, we focus on dynamic analysis and tackle the limitations of existing tools. In particular, existing tools either focus on control-flow leaks or data leaks, but not both at the same time [80, 89]; they consider the strongest adversary to observe cache-line accesses only [41], which is too coarse-grained in light of recent attacks (CacheBleed [88]); many of them lack the capability to properly filter program activity that is statistically independent of secret input [50, 80, 84]; and most do not provide any means to further assess the severity of information leaks, i.e., the risk they bring and the urgency with which they must be fixed. Based on these shortcomings, we argue that tools designed to identify address-based information leaks must tackle the following four challenges:

1. Leakage origin: Detect the exact location of data and control-flow leaks in programs on byte-address granularity instead of cache-line granularity.
2. Detection accuracy: Minimize false positives, e.g., caused by non-determinism that is statistically independent of the secret input, and provide reasonable strategies to also reduce false negatives.
3. Leakage classification: Provide means to classify leaks with respect to the information gained by an adversary.
4. Practicality: Report information leaks (i) fully automated, i.e., without requiring manual intervention, (ii) using only the program binary, i.e., without requiring the source code, and (iii) efficiently in terms of performance.

In this work, we tackle these challenges with differential address trace analysis (DATA), a methodology and tool to identify address-based information leaks in application binaries. DATA is intended to be a companion during testing and verification of security-critical software.[1] It targets programs processing secret input, e.g., keys or passwords, and reveals dependencies between the secret and the program execution. Every leak that DATA identifies in a program is potentially exposed to side-channel attacks. DATA works in three phases.

Difference Detection: The first phase generates noiseless address traces by executing the target program with binary instrumentation. It identifies differences in these traces on a byte-address granularity. This accounts for all address-based side-channel attacks such as cache attacks [61, 64, 87], DRAM attacks [65], branch-prediction attacks [1], controlled-channel attacks [86], and many blackbox timing attacks [11].

Leakage Detection: The second phase tests data and control-flow differences for dependencies on the secret input. A generic leakage test compares the address traces of (i) a fixed secret input and (ii) random secret inputs. If the traces differ significantly, the corresponding data or control-flow differences are labeled as secret-dependent leaks. This minimizes false positives and explicitly addresses non-deterministic program behavior introduced by blinding or probabilistic constructions, for example.

Leakage Classification: The third phase classifies the information leakage of secret-dependent data and control-flow differences. This is achieved with specific leakage tests that find linear and non-linear relations between the secret input and the address traces. These leakage tests are a valuable tool for security analysts to determine the severity and exploitability of a leak.

We implement DATA in a fully automated evaluation tool that allows analyzing large software stacks, including initialization operations, such as key loading and parsing, as well as cryptographic operations. We use DATA to analyze OpenSSL and PyCrypto, confirming existing and identifying new vulnerabilities. Among several expected leaks in symmetric ciphers (AES, Blowfish, Camellia, CAST, Triple DES, ARC4), DATA also reveals known and previously unknown leaks in asymmetric primitives (RSA, DSA, ECDSA) and identifies erroneous bug fixes of supposedly resolved vulnerabilities.

Outline. The remainder of this paper is organized as follows. In Section 2, we discuss background information and related work. In Section 3, we present DATA on a high level. In Sections 4–6 we describe the three phases of DATA. In Section 7, we give implementation details. In Section 8, we evaluate DATA on OpenSSL and PyCrypto. In Section 9, we discuss possible leakage mitigation techniques. Finally, we conclude in Section 10.

[1] DATA is open-source and can be retrieved from https://github.com/Fraunhofer-AISEC/DATA.

2 Background and Related Work

2.1 Microarchitectural Attacks

Microarchitectural side-channel attacks rely on the exploitation of information leaks resulting from contention for shared hardware resources. Especially microarchitectural components such as the CPU cache, the DRAM,

and the branch prediction unit, where contention is based on memory addresses, enable powerful attacks that can be conducted from software only. For instance, attacks exploiting the different memory access times to CPU caches (aka cache attacks) range from timing-based attacks [11] to more fine-grained attacks that infer accesses to specific memory locations [61, 64, 87]. Likewise, DRAM row buffers have been used to launch side-channel attacks [65] by exploiting row buffer conflicts of different memory addresses. Also, the branch prediction unit has been exploited to attack OpenSSL RSA implementations [1]. Xu et al. [86] demonstrated a new class of attacks on shielded execution environments like Intel SGX, called controlled-channel attacks. They enable noise-free observations of memory access patterns on a page granularity. For a detailed overview on microarchitectural attacks we refer to recent survey papers [29, 76].

2.2 Detection of Information Leaks

2.2.1 Terminology

We consider a program secure if it does not contain address-based information leaks. We distinguish between data and control-flow leakage. Data leakage occurs if accessed memory locations depend on secret inputs. Control-flow leakage occurs if code execution depends on secret inputs. We further distinguish between deterministic and non-deterministic programs. The latter include any kind of non-determinism such as randomization of intermediates (blinding) or results (probabilistic constructions). A false positive denotes an identified information leak that is in fact none. A false negative denotes an information leak which was not identified.

2.2.2 Blackbox Timing Leakage Detection

These techniques measure the execution time of implementations for different classes of inputs and rely on statistical tests to infer whether or not the implementation leaks information [23]. Reparaz et al. [67] use Welch's t-test [83] to identify vulnerable cryptographic implementations. More advanced approaches use symbolic execution to give upper leakage bounds [63]. However, these approaches fall short for more fine-grained address-based attacks such as cache attacks.

2.2.3 Address-based Leakage Detection

We distinguish between static and dynamic approaches.

Static Approaches. Well-established static approaches are CacheAudit [24, 48] and follow-up works [25, 57], which symbolically evaluate all program paths. Rather than pinpointing the leakage origin, CacheAudit accumulates potential leakage into a single metric, which represents an upper bound on the maximum leakage possible. While a zero leakage bound guarantees absence of address-based side channels, a non-zero leakage bound could become rather imprecise (false positives) due to abstractions made on the data of the program. Abstraction also fundamentally prohibits analysis of interpreted code as it is encoded in the data plane of the interpreter.

Dynamic Approaches. Dynamic analysis relies on concrete program executions, which possibly introduce false negatives. Ctgrind [50] propagates secret memory throughout the program execution to detect its usage in conditional branches or memory accesses. However, ctgrind suffers from false positives as well as false negatives [4]. In contrast, Stacco [84] records address traces and analyzes them with respect to Bleichenbacher attacks [15], for which finding a single control-flow leak suffices. Stacco does not consider data leakage, and they do not consider reducing false negatives, i.e., finding multiple control-flow leaks within the traces. If they did, they would suffer from false positives due to improper trace alignment (they use the Linux diff tool). None of the above approaches supports specific leakage models to further assess the information leak. Zankl et al. [89] analyze modular exponentiations under the Hamming weight model, but they do not consider other leakage models and only detect control-flow leaks.

Combined Approaches. CacheD [80] combines dynamic trace recording with static analysis, introducing both false negatives and false positives. They symbolically execute only instructions that might be influenced by the secret key. Since they only analyze a single execution, they miss leakage in other execution paths. Moreover, they do not model control-flow leaks.

Attack-based Approaches. These are dynamic approaches that conduct specific attacks but do not generalize to other attacks. For instance, Brumley and Hakala [19] as well as Gruss et al. [36] suggested to detect implementations vulnerable to cache attacks by relying on template attacks. Irazoqui et al. [41] use cache observations and a mutual information metric to identify control-flow and data leaks. Basu et al. [9] and Chattopadhyay et al. [20] quantify the information leakage in cache attacks.

Orthogonal Work. Other approaches analyze source code [14], which does not account for compiler-introduced information leaks or platform-specific behavior (cf. [4]). Yet others demand source-code annotations [4, 5, 7] or specify entirely new languages [16]. While they can prove absence of leakage for already secured code, they struggle to pinpoint leaks in vulnerable code. In contrast, DATA is designed to find and pinpoint leakage in insecure, unannotated programs. After mitigating leakage found by DATA, absence of leakage could be proven using [4, 5, 7, 16, 25].
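Before turning to DATA itself, the fixed-vs-random idea underlying its generic leakage test (phase two, cf. Section 1) can be sketched in a few lines of Python. This is a deliberate simplification: a set comparison stands in for a proper statistical test, and the toy program, addresses, and helper names are illustrative assumptions, not part of DATA.

```python
# Hypothetical sketch of a fixed-vs-random leakage test: a difference is
# kept only if the addresses observed at its instruction (ip) vary with
# the secret. Real tools use statistical tests; a set comparison suffices
# for this toy example.
def addresses_at(traces, ip):
    """All data addresses the instruction at `ip` accessed."""
    return {d for t in traces for (i, d) in t if i == ip}

def secret_dependent(fixed_traces, random_traces, ip):
    # Fixed secret: the address set collapses to a single value.
    # Random secrets: a secret-dependent access spreads over many values.
    return addresses_at(random_traces, ip) != addresses_at(fixed_traces, ip)

def run(key):
    """Toy target: ip 17 indexes a table with the key; ip 20 does not."""
    return [(17, 0x100 + key % 16), (20, 0x200)]

fixed = [run(7) for _ in range(10)]      # traces for one fixed secret input
rnd = [run(k) for k in (3, 7, 12, 14)]   # traces for varying secret inputs
print(secret_dependent(fixed, rnd, 17))  # → True  (key-dependent access)
print(secret_dependent(fixed, rnd, 20))  # → False (key-independent access)
```

The key-independent access at ip 20 is filtered out even though it appears in every trace, which is the point of the test.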

Table 1: Comparison of leakage detection tools.

Tool               | Approach                | Finest granularity | Output        | Key dependency    | Source code required
CacheAudit [24]    | Static analysis         | Cache line         | Leakage bound |                   | no
CacheAudit 2 [25]  | Static analysis         | Byte address       | Leakage bound |                   | no
CacheD [80]        | Combined                | Cache line         | Leak origin   |                   | no
ctgrind [50]       | Dynamic                 | Byte address       | Leak origin   |                   | yes
Stacco [84] (a)    | Dynamic (trace-based)   | Byte address       | Leak origin   |                   | no
MI-Tool [41]       | Dynamic (attack-based)  | Cache line         | Leak origin   | generic           | yes
Zankl et al. [89]  | Dynamic (trace-based)   | Byte address       | Leak origin   | HW                | no
DATA               | Dynamic (trace-based)   | Byte address       | Leak origin   | generic, HW, etc. | no

(a) Only the first control-flow leak is reliably identified. Reporting multiple leaks could cause false positives.

2.3 Improvement Over Existing Tools

By addressing the identified challenges in Section 1, DATA overcomes several shortcomings of existing approaches, as shown in Table 1.

Leakage Origin. DATA follows a dynamic trace-based approach to identify both control-flow and data leakage on byte-address granularity. This avoids wrong assumptions about attackers, e.g., only observing memory accesses at cache-line granularity [24, 41, 80], which were disproved by more advanced attacks [1, 88]. Nevertheless, identifying information leaks on a byte granularity still allows mapping them to more coarse-grained attacks.

Detection Accuracy. Static approaches like CacheAudit suffer from false positives. In contrast, DATA filters key-independent differences with a high probability, thereby reducing false positives even for non-deterministic program behavior. However, as with all dynamic approaches, DATA could theoretically miss leakage that is not triggered during execution. Nevertheless, we found that in practice few traces already suffice, e.g., ≤ 10 for asymmetric algorithms and ≤ 3 for symmetric algorithms, due to the high diffusion provided by these algorithms. Although without formal guarantee, this gives evidence that DATA reduces false negatives successfully. Compared to others, we take multiple measures to reduce false negatives in DATA. In contrast to CacheD and ctgrind, we analyze several execution paths. Compared to Stacco, which has improper trace alignment, we report all leaks visible in the address traces. Contrary to MI-Tool, we do not only focus on a specific attack technique (e.g., cache attacks). In contrast to Zankl et al. [89], we can detect generic key dependencies. This advantage is indicated in Table 1.

Leakage Classification. While Zankl et al. [89] use the Hamming weight (HW) model only, DATA allows testing for various leakage models as well as defining new ones. Besides pinpointing the information leaks, this represents valuable information to determine key dependencies in the identified information leaks.

Practicality. DATA analyzes information leaks fully automatically. It does so on the program binary without the need for source code, allowing analysis of proprietary software. As will be outlined in our evaluation, we achieve competitive performance, support analysis of large software stacks and even interpreted code (PyCrypto and CPython), and DATA is open source.

3 Differential Address Trace Analysis

DATA is a methodology and a tool to identify address-based information leaks in program binaries.

Threat Model. To cover a wide variety of possible attacks, we consider a powerful adversary who attempts to recover secret information from side-channel observations. In practice, attackers will likely face noisy observations because side channels typically stem from shared resources affected by noise from system load. Also, practical attacks only monitor a limited number of addresses or memory blocks. For DATA, we assume that the attacker can accurately observe full, noise-free address traces. More precisely, the attacker does not only learn the sequence of instruction pointers [59], i.e., the addresses of instructions, but also the addresses of the operands that are accessed by each instruction. This is a strong attacker model that covers many side-channel attacks targeting the processor (e.g., branch prediction) and the memory hierarchy (e.g., various CPU caches, prefetching, DRAM). A strong model is preferable here to detect as many vulnerabilities as possible. In line with [35], we consider defenses, such as address space layout randomization (ASLR) and code obfuscation, as ineffective against powerful attackers.

Limitations. DATA covers software side channels of components that operate on address information only, e.g., cache prefetching and replacement, and branch prediction. In contrast, the recent Spectre [44] and Meltdown [51] bugs exploit not only address information but actual data which is speculatively processed but insufficiently isolated across different execution contexts. In these attacks, sensitive data spills over to the address bus. These hardware bugs cannot be detected by analyzing software binaries with tools listed in Table 1.
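The leakage-model tests mentioned under Leakage Classification can be illustrated with a short sketch: it checks for a linear relation between the Hamming weight (HW) of the secret and the address observed at a single leak. The synthetic "leak" addresses and all helper names are our illustration, not DATA's actual tests or API.

```python
# Illustrative sketch of a leakage-model test: correlate a model of the
# secret (here: Hamming weight) with the address observed at one leak.
def hamming_weight(x):
    return bin(x).count("1")

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

keys = list(range(1, 64))
model = [hamming_weight(k) for k in keys]
# A leak whose address is linear in HW(k): correlation is 1.
addr_hw = [0x400 + 4 * hamming_weight(k) for k in keys]
# Accesses essentially unrelated to the model: correlation stays low.
addr_other = [0x800 + (k % 2 == 0) for k in keys]
print(round(pearson(model, addr_hw), 3))        # → 1.0
print(abs(pearson(model, addr_other)) < 0.5)    # → True
```

A high correlation under a leakage model tells the analyst what an attacker learns (here, the HW of the key) from observing the leaking addresses; non-linear relations would require other specific tests.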

Figure 1: Overview of differential address trace analysis (DATA).

While software-only defenses exist for specific CPU models [22, 78], a generic solution should fix the hardware.

Methodology. DATA consists of three phases, the difference detection phase, the leakage detection phase, and the leakage classification phase, as depicted in Figure 1.

In the difference detection phase, we execute the target program multiple times with varying secret inputs and record all accessed addresses with dynamic binary instrumentation in so-called address traces. Thereby, we ensure to capture both control-flow and data leakage at their exact origin. The recorded address traces are then compared and address differences are reported.

The leakage detection phase verifies whether reported address differences are actually secret-dependent and filters all that are statistically independent. For this step, the program is repeatedly executed with one fixed secret input and a set of varying (random) secret inputs. In contrast to the previous phase, only the initially reported differences need to be monitored. The address traces belonging to the fixed input are then compared to those of the random inputs using a generic leakage test. Statistical differences are reported as true information leaks.

The leakage classification phase helps security analysts to assess the severity of previously confirmed leaks. This is done with specific leakage tests that find linear or non-linear relations between a given secret input and the previously recorded address traces. Such relations are formulated as so-called leakage models, e.g., the Hamming weight model. If a relation is found, the corresponding leakage model defines the information an attacker can learn about the secret input by observing memory accesses to the identified addresses. All detected relations are included in the final leakage report.

Relation to Similar Concepts. The idea of DATA is similar to differential power analysis (DPA) [46], which works on power traces. However, power traces are often noisy due to measurement uncertainty and the underlying physics. Hence, DPA often requires several thousand measurements, and non-constant-time implementations demand heavy pre-processing to correctly align power traces [55]. In contrast, address traces are noise-free, which minimizes the number of required measurements and allows perfect re-alignment for non-constant-time traces (due to control-flow leaks).

DATA is also related to differential computation analysis (DCA) [17]. DCA relies on software execution traces to attack white-box crypto implementations. While DCA is conceptually similar to DATA, DCA attacks (white-box model) consider a much stronger adversary who can read the actual content of accessed memory locations.

4 Difference Detection Phase

We now introduce address-based information leaks and discuss the steps to identify them, namely recording of address traces and finding differences within the traces.

Notation. DATA analyzes a program binary P with respect to address leakage of secret input k. Let P(k) denote the execution of a program with controllable secret input k. We write t = trace(P(k)) to record a trace of accessed addresses during program execution. We define an address trace t = [a0, a1, a2, a3, ...] as a sequence of executed instructions, augmented with memory addresses. For instructions operating on CPU registers, ai = ip holds the current instruction pointer ip. In case of memory operations, ai = (ip, d) also holds the accessed memory address d. Information leaks appear as differences in address traces. We develop an algorithm diff(t1, t2) that, given a pair of traces (t1, t2), identifies all differences. If the traces are equal, diff(t1, t2) = ∅. A deterministic program P is leakage free if and only if no differences show up for any pair of secret inputs (ki, kj):

    ∀ki, kj : diff(trace(P(ki)), trace(P(kj))) = ∅    (1)

4.1 Address-based Information Leakage

Data leakage is characterized by one and the same instruction (ip) accessing different memory locations (d). Consider the code snippet in Listing 1, assuming line numbers equal code addresses. Execution with two different keys keyA = [10, 11, 12] and keyB = [16, 17, 18] yields two address traces tA = trace(P(keyA)) and tB = trace(P(keyB)), with differences marked bold:

    tA = [0, 18, 19, (17,1), 20, (17,11), 21, (17,12), 22, (17,13), 23]
    tB = [0, 18, 19, (17,1), 20, (17,01), 21, (17,02), 22, (17,03), 23]

The function 'transform' leaks the argument kval, which is used as index into the array LUT (line 17).

 0 program entry: call process with user-input
 1 unsigned char LUT[16] = { 0x52,
 2    0x19, ...
16    0x37 };
17 int transform(int kval) { return LUT[kval%16]; }
18 int process(int key[3]) {
19    int val = transform(0);
20    val += transform(key[0]);
21    val += transform(key[1]);
22    val += transform(key[2]);
23    return val;
   }

Listing 1: Table look-up causing data leak.

Since the base address of LUT is 1, this operation leaks memory address kval + 1. The first call to transform (line 19) with kval = 0 results in a1 = (17,1). Subsequent calls (lines 20–22) leak sensitive key bytes. The differences in the traces (marked bold) reveal key dependencies.

To accurately report data leakage and to distinguish non-leaking cases (line 19) from leaking cases (lines 20–22), we take the call stack into account. We formalize data leaks as tuples (ip, cs, ev) of the leaking instruction ip, its call stack cs, and the evidence ev. The call stack is a list of caller addresses leading to the leaking function. For example, the first leak has the call stack cs = [0, 20]. The evidence is a set of leaking data addresses d. The larger the evidence set, the more information leaks. For example, ev = {11, 01} for the first leak, ev = {12, 02} for the second one, etc. Our diff algorithm would report:

    diff(tA, tB) = {(17, [0,20], {11,01}),
                    (17, [0,21], {12,02}),
                    (17, [0,22], {13,03})}

 0 program entry: call exp with user-input
 1 function exp(key, *p) {
 2    foreach (bit b in key)
 3       if (b)
 4          mul(r, p);
         else
 5          mul(t, p);
 6    return r;
   }
   function mul(*a, *b) {
 7    tmpA = *a;
 8    tmpB = *b;
      // calculate res = tmpA * tmpB
 9    *a = res;
   }

Listing 2: Branch causing control-flow leak.

Control-flow leakage is caused by key-dependent branches. Consider the exponentiation in Listing 2, executed with two keys kA = 4 = 100b and kB = 7 = 111b. This yields the following address traces, where R, P, and T denote the data addresses of the variables r, p, and t:

    trace(P(kA)) = tA = [0,1,2,3,4,(7,R),(8,P),(9,R),
                         2,3,5,(7,T),(8,P),(9,T),
                         2,3,5,(7,T),(8,P),(9,T),2,6]
    trace(P(kB)) = tB = [0,1,2,3,4,(7,R),(8,P),(9,R),
                         2,3,4,(7,R),(8,P),(9,R),
                         2,3,4,(7,R),(8,P),(9,R),2,6]

There are two differences in the traces, both marked bold. The differences occur due to the if in line 3, which branches to line 4 or 5, depending on the key bit b, and causes the operations in lines 7 and 9 to be done either on the intermediate variable r or a temporary variable t.

A control-flow leak is characterized by its branch point, where the control flow diverges, and its merge point, where the branches coalesce again. In this example, the branch point is at line 3 and the merge point at line 2, when the next loop iteration starts. We model control-flow leaks as tuples (ip, cs, ev) of branch point ip, call stack cs, and evidence ev. For example, both differences occur at the same call stack cs = [0]. Hence, they are reported as the same leak. The evidence is a set of sub-traces corresponding to the two branches. Our diff algorithm would report:

    diff(tA, tB) = {(3, [0], {[4,(7,R),(8,P),(9,R)],
                              [5,(7,T),(8,P),(9,T)]})}

4.2 Recording Address Traces

We execute the program on a dynamic binary instrumentation (DBI) framework, namely Intel Pin [54], and store the accessed code and data addresses in an address trace. To execute the program in a clean and noise-free environment, we disable ASLR and keep public inputs (e.g., command line arguments, environment variables) to the program fixed. As shown in Figure 1, we repeat this multiple times with varying inputs, causing address leaks to show up as differences in the address traces.

The concept of DATA is agnostic to concrete trace recording tools and, hence, could also rely on other tools [71] or hardware acceleration like Intel Processor Trace (IPT) [39]. Since the recording time is small compared to trace analysis, we did not investigate other tools.

4.3 Finding Trace Differences

The trace comparison algorithm (diff) in Algorithm 1 sequentially scans a pair of traces (tA, tB) for address differences, while continuously re-aligning the traces in the same pass. Whenever ip values match but data addresses (d) do not, a data difference is detected (lines 4–6).

608 27th USENIX Security Symposium USENIX Association Algorithm 1: Identifying address trace differences (diff). > 0 (lines 7–8). The functions isCall(a) and isRet(a) re-

Algorithm 1: diff
  input : tA, tB ... the two traces
  output: rep ... the report of all differences
   1  rep = ∅, i = 0, j = 0
   2  while i < |tA| ∧ j < |tB| do
   3      a = tA[i], b = tB[j]
   4      if a.ip = b.ip then
   5          if a.d ≠ b.d then
   6              rep = rep ∪ report_data_diff(tA, tB, i, j)
   7          end
   8          i++, j++
   9      else
  10          rep = rep ∪ report_cf_diff(tA, tB, i, j)
  11          (i, j) = find_merge_point(tA, tB, i, j)
  12      end
  13  end
  14  return rep

Algorithm 2: find_merge_point
  input : tA, tB ... the two traces
  input : i, j ... the trace indices of the branch point
  output: k, l ... the indices of the merge point
   1  k = i, l = j, CA = 0, CB = 0, SA = ∅, SB = ∅
   2  while k < |tA| ∧ l < |tB| do
   3      if isCall(tA[k]) then CA++
   4      if isRet(tA[k])  then CA--
   5      if isCall(tB[l]) then CB++
   6      if isRet(tB[l])  then CB--
   7      if CA ≤ 0 then SA = SA ∪ tA[k].ip
   8      if CB ≤ 0 then SB = SB ∪ tB[l].ip
   9      M = SA ∩ SB
  10      if M ≠ ∅ then
  11          k = find(tA[i...k], M)
  12          l = find(tB[j...l], M)
  13          return (k, l)
  14      end
  15      if CA ≥ 0 then k++
  16      if CB ≥ 0 then l++
  17  end
  18  error "No merge point found"

Control-flow differences occur when ip differs (lines 9–11). Differences are reported via report_data_diff and report_cf_diff in the format specified in Section 4.1.

Trace Alignment. For control-flow differences, it is crucial to determine the correct merge points, as done by Algorithm 2. Starting from the branch point, it sequentially scans both traces, extending two sets SA and SB (lines 7–8) with the scanned instructions. If their intersection M becomes non-empty (lines 9–10), M holds the merge point's ip. We then determine the first occurrence of M in both branches using find (lines 11–12) and realign the traces before proceeding (Algorithm 1, line 11).

Context-Sensitivity. Since control-flow leaks could incorporate additional function calls (e.g., function mul in Listing 2), we need to exclude those from the merge point search. Therefore, we maintain the current calling depth in counters CA and CB (lines 3–6) and skip calling depths larger than zero when extending SA and SB (lines 7–8). The functions isCall and isRet return true iff the assembler instruction at address a.ip is a function call or return, respectively. If the calling depth drops below zero, the trace returned to the function's call-site. We stop scanning this trace (lines 15–17) and wait for the other trace to hit a merge point.

Our context-sensitive alignment also works for techniques like retpoline [78] that aim to prevent Spectre attacks, since they just add additional call/ret layers. Code directly manipulating the stack pointer (return stack refill [78], setjmp/longjmp, exceptions, etc.) could be supported by detecting such stack pointer manipulations alongside calls and rets.

Comparison to Related Work. Trace alignment has been studied before as the problem of correspondence between different execution points. Several approaches for identifying execution points exist [74]. Instruction counter based approaches [58] uniquely identify points in one execution but fail to establish a correspondence between different executions. Using calling contexts as a correspondence metric could introduce temporal ambiguity in distinguishing loop iterations [75]. Xin et al. [85] formalize the problem of relating execution points across different executions as execution indexing (EI). They propose structural EI (SEI), which uses taken program paths for indexing but could lose comprehensiveness by mismatching execution points that should correspond [74]. Other approaches combine call stacks with loop counting to avoid problems of ambiguity and comprehensiveness [74]. Many demand recompilation [74, 75, 85], which prohibits their usage in our setting. Specifically, EI requires knowledge of post-dominators, typically extracted from control-flow graphs (CFGs) [30], which are not necessarily available (e.g., obfuscated binaries or dynamic code generation). Using EI, Johnson et al. [42] align traces in order to propagate differences back to their originating input. We use a similar intuition as Johnson et al. in processing and aligning traces in a single pass, however, without the need to make program execution indices explicit. By constantly re-aligning traces, we inherently maintain correspondence of execution points. Our set-based approach does not require CFG or post-dominator information. In contrast to EI, we do not explicitly recover loops. This could cause imprecision when merging control-flow leaks embedded within loops. If the two branches are significantly asymmetric in length, we might match multiple shorter loop iterations against one longer iteration, thus introducing an artificial control-flow leak (false positive) when one branch leaves the loop while the other does not. Should such leaks occur, they would be dismissed as key-independent in phase two. Note that correspondence (correct alignment) would be automatically restored as soon as both branches leave the loop.
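As an illustration, Algorithms 1 and 2 can be sketched in Python. The trace-entry layout (instruction pointer, accessed data address, call/ret marker) and the report tuples are assumptions made for this sketch, not DATA's actual implementation, which operates on recorded Pin traces.

```python
from collections import namedtuple

# Illustrative trace entry: instruction pointer, accessed data address
# (None if not a memory access), and kind ("call", "ret", or "").
Entry = namedtuple("Entry", ["ip", "data", "kind"])

def find_merge_point(tA, tB, i, j):
    """Algorithm 2: scan forward from the branch point at (i, j) until both
    traces share an instruction address at the original calling depth."""
    k, l, cA, cB = i, j, 0, 0
    sA, sB = set(), set()
    advA = advB = True
    while k < len(tA) and l < len(tB):
        if advA:                                    # track depth, extend SA
            cA += {"call": 1, "ret": -1}.get(tA[k].kind, 0)
            if cA <= 0: sA.add(tA[k].ip)
        if advB:                                    # track depth, extend SB
            cB += {"call": 1, "ret": -1}.get(tB[l].kind, 0)
            if cB <= 0: sB.add(tB[l].ip)
        m = sA & sB                                 # merge point found?
        if m:  # realign both traces at the first occurrence of the merge point
            k = next(x for x in range(i, k + 1) if tA[x].ip in m)
            l = next(x for x in range(j, l + 1) if tB[x].ip in m)
            return k, l
        advA, advB = cA >= 0, cB >= 0               # freeze a trace that has
        if advA: k += 1                             # returned past the branch
        if advB: l += 1                             # point's call site
        if not (advA or advB):
            break
    raise RuntimeError("no merge point found")

def diff(tA, tB):
    """Algorithm 1: report data and control-flow differences of two traces."""
    rep, i, j = [], 0, 0
    while i < len(tA) and j < len(tB):
        a, b = tA[i], tB[j]
        if a.ip == b.ip:
            if a.data != b.data:                    # same instruction, other address
                rep.append(("data", a.ip, a.data, b.data))
            i, j = i + 1, j + 1
        else:                                       # diverging control flow
            rep.append(("cf", a.ip, b.ip))
            i, j = find_merge_point(tA, tB, i, j)
    return rep

# Two traces that diverge at ip 2 vs. 3 and re-join at ip 5:
tA = [Entry(1, None, ""), Entry(2, 0x10, ""), Entry(5, None, ""), Entry(6, None, "")]
tB = [Entry(1, None, ""), Entry(3, None, ""), Entry(4, None, ""),
      Entry(5, None, ""), Entry(6, None, "")]
print(diff(tA, tB))   # [('cf', 2, 3)]
```

A single control-flow difference is reported at the branch point, and comparison resumes at the merge point, mirroring the re-alignment of Algorithm 1, line 11.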

USENIX Association 27th USENIX Security Symposium 609

This is not a fundamental limitation of DATA, as other trace alignment methods could be implemented as well.

Combining Results. We run our diff algorithm pairwise on all recorded traces and accumulate the results in an intermediate report. Testing multiple traces helps capture nested leakage, that is, leakage which appears conditionally, depending on which branches are taken in a superordinate control-flow leak. Nested leakage would remain hidden when testing trace pairs which either take the wrong superordinate branch or exercise both branches.

5 Leakage Detection Phase

We implement a generic leakage test to reduce the number of false positives in case of (randomized) program behavior and events that are statistically independent of the secret input. The program is repeatedly executed with one fixed secret input and a set of random secret inputs. If the distributions of accessed addresses in these two sets can be distinguished, the corresponding address differences are marked as secret-dependent. A challenge that arises during this generic leakage test is that false negatives might occur if the fixed input is particularly similar to the average random case. We address this challenge by repeating the generic leakage test with multiple distinct fixed inputs and merging the results in the end. We introduce an appropriate leakage-evidence representation to compare distributions of accessed addresses.

5.1 Evidence Representation

We unify the representation of both data and control-flow evidences in so-called evidence traces. These traces hold a time-ordered sequence of memory addresses that a particular instruction accesses during one program execution. Note the difference to evidence sets used in Section 4.1, which are computed over multiple program executions. Evidence traces contain all essential information exploited in practical attacks, such as how often an address is accessed [11, 45] and also when, i.e., at which position an address is accessed in the trace [87].

Recording. Similar to the difference detection phase, the target program is executed to gather address traces. This time, however, we only monitor the previously detected differences, which significantly reduces trace sizes and instrumentation time. For each instruction that caused address differences in the first phase, we gather individual evidence traces. Addresses accessed in case of data differences are written to the trace in chronological order. For control-flow differences, the branch target addresses taken at the branch points are written to the evidence trace, again in chronological order.

Building Histograms. As we execute the target program with multiple inputs, we accumulate the evidence traces of the same instruction in a two-dimensional histogram, as depicted in Figure 2. The y-axis contains the addresses accessed by the instruction, r0 and r1 in this case. The x-axis specifies their positions in the trace. A single evidence trace, e.g., [r0,r1,r1,r0,r0,r1], would appear as dots in the x-y plane. When aggregating multiple traces, the z-axis accumulates all dots into bars, specifying the overall number of accesses for each address and position.

[Figure 2: Histogram Hfull over evidence traces. (Axes: Positions × Addresses (r0, r1) × Accesses.)]

This histogram, named Hfull, fully captures the characteristics of evidence traces, namely when and how often addresses are accessed. The downside of Hfull is that a large number of traces is required to accurately estimate it. This would prolong the leakage detection phase and increase storage requirements. We therefore use two simplified histograms, each of which captures one characteristic of Hfull. The first one, Haddr, tracks the total number of accesses per address, thus collapsing the x-axis and omitting time information. The second one collapses the y-axis and counts the total number of accesses per position. This omits address information and is comparable to counting the length of evidence traces. Observe that counting the length of evidence traces equals the (negative) difference between consecutive positions. We therefore define Hpos as counting the length of evidence traces. We illustrate how Haddr and Hpos are compiled with the following example of three evidence traces:

    ev0 = [r1, r2],   ev1 = [r3, r3, r2, r3, r1],   ev2 = [r2, r1, r2]

Haddr contains one entry per address, counting how often each address occurs in the traces. Thus, Haddr = [3,4,3] for addresses [r1,r2,r3]. Hpos records the lengths of the traces, which yields Hpos = [0,1,1,0,1] for lengths 1 to 5. For illustration purposes, counting the number of accesses per position would yield [3,3,2,1,1,0] for positions 1 to 6. The (negative) differences between the positions are [0,1,1,0,1], which is exactly Hpos.

Implications. While the use of Haddr and Hpos reduces the measurement effort, we might miss leaks that only show up in Hfull.
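The compilation of Haddr and Hpos just described can be sketched as follows, using the example traces ev0–ev2 with addresses encoded as strings (a minimal illustration, not DATA's implementation):

```python
from collections import Counter

# Evidence traces ev0-ev2 from the example above.
traces = [["r1", "r2"],
          ["r3", "r3", "r2", "r3", "r1"],
          ["r2", "r1", "r2"]]

# Haddr: total number of accesses per address (collapses the time axis).
haddr = Counter(addr for ev in traces for addr in ev)
print([haddr[a] for a in ["r1", "r2", "r3"]])   # [3, 4, 3]

# Hpos: number of evidence traces of each length (lengths 1 to 5).
hpos = [sum(len(ev) == n for ev in traces) for n in range(1, 6)]
print(hpos)                                     # [0, 1, 1, 0, 1]
```

Both histograms deliberately discard one axis of Hfull: Haddr keeps only address counts, Hpos keeps only trace lengths.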

Such a leak would occur if the secret permutes the addresses in the evidence traces, e.g., [r1,r2] and [r2,r1], while the length of the evidence traces as well as the number of accesses per address remains the same. These special cases can still be detected with a multi-dimensional generic leakage test using Hfull.

5.2 Generic Leakage Test

We compile the evidence traces into two histograms, namely Hfix_addr and Hfix_pos for fixed secret inputs, and Hrnd_addr and Hrnd_pos for random inputs. If these histograms can be distinguished, the corresponding address difference constitutes a true information leak. In side-channel literature [33, 67], this fixed-vs-random input testing is typically done by applying Welch's t-test [83] to distributions of power consumption, electromagnetic emanation, or execution time measurements. For DATA, we cannot use the t-test, because it assumes normal distributions and evidence trace distributions are not necessarily normal. Instead, we use the more generic Kuiper's test [49], which does not make this assumption. The test essentially determines whether two probability distributions stem from the same base distribution or not. It is closely related to the Kolmogorov-Smirnov (KS) test but performs better when distributions differ in the tails instead of around the median. Since we do not assume anything about the tested distributions, we choose the increased sensitivity of Kuiper's test over the KS test at almost identical computational cost.

In preparation for Kuiper's test, we normalize our previously compiled histograms to obtain probability distributions. For the explanation of the test, assume two random variables X and Y, for which nX and nY samples are observed. The first step of the test is to derive the empirical distribution functions FX(x) and FY(x) as

    F_X(x) = \frac{1}{n_X} \sum_{i=1}^{n_X} I_{[X_i,\infty)}(x)    (2)

I is the indicator function, which is 1 if X_i ≤ x, and 0 otherwise. FY(x) is calculated accordingly. The Kuiper statistic V is then computed as

    V = \sup_x \left[ F_X(x) - F_Y(x) \right] + \sup_x \left[ F_Y(x) - F_X(x) \right]    (3)

The deviation of both distributions is significant if the Kuiper statistic V exceeds the significance threshold:

    V_{st} = \frac{Q_{st}^{-1}(1-\alpha)}{C_{st}(n_X, n_Y)}    (4)

Cst relates the threshold to the number of samples each empirical distribution is based on. This is important, as a larger number of samples increases the sensitivity of the Kuiper statistic. It is approximated as

    C_{st}(n_X, n_Y) = \sqrt{\tfrac{n_X n_Y}{n_X + n_Y}} + 0.155 + \frac{0.24}{\sqrt{\tfrac{n_X n_Y}{n_X + n_Y}}}    (5)

Qst is derived from the asymptotic distribution of the Kuiper statistic. It links the test statistic to a certain confidence level and is defined as

    Q_{st}(\lambda) = 2 \sum_{i=1}^{\infty} \left( 4 i^2 \lambda^2 - 1 \right) e^{-2 i^2 \lambda^2}    (6)

Its inverse, Q_{st}^{-1}, is calculated numerically. The value (1 − α) determines the probability with which Kuiper's test produces false positives. For all tests performed in this work, this probability is set to 0.0001. If Kuiper's test statistic is significant, the corresponding data or control-flow difference is flagged as an information leak.

Accuracy. The probability of reporting false positives is sufficiently minimized by the choice of (1 − α). False negatives can occur if the histograms Haddr and Hpos are insufficient estimations of the underlying evidence distributions. This happens if the number of program executions for fixed and random inputs is too small. It is, however, a common problem of unspecific leakage testing to determine a required minimum number [55, 72]. Analysts using DATA should therefore add traces until the test results stabilize and no new leaks are detected.

6 Leakage Classification Phase

The leakage classification phase is based on a specific leakage test, which tests for linear and non-linear relations between the secret input and the evidences of information leaks. Finding these relations requires appropriate representations for both input and evidence traces, which are described in the following two sections.

6.1 Evidence Representation

Similar to the leakage detection phase, we collect evidence traces for multiple random secret inputs. Unlike before, however, we do not merge evidence traces into histograms, since this would dismiss information about which input belongs to which evidence trace. Instead, we aggregate evidence traces into evidence matrices, where each column represents a unique trace (and unique secret input). Since evidence traces might differ both in length and accessed addresses, we cannot store them directly in a matrix. Instead, we capture the characteristics of the evidence traces in two separate matrices, Mev_addr and Mev_pos. The rows in both matrices correspond to the possible addresses in the traces. Mev_addr stores the number of accesses per address.
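Stepping back briefly, the generic leakage test of Section 5.2 (Equations 2–6) can be sketched as follows. The bisection-based inversion of Qst and the truncation of its series to 100 terms are implementation choices of this sketch, not DATA's actual test code:

```python
import math
from bisect import bisect_right

def kuiper_v(x, y):
    """Two-sample Kuiper statistic V = sup(FX-FY) + sup(FY-FX), Eqs. (2)-(3)."""
    sx, sy = sorted(x), sorted(y)
    fx = lambda t: bisect_right(sx, t) / len(sx)   # empirical CDF of X
    fy = lambda t: bisect_right(sy, t) / len(sy)   # empirical CDF of Y
    pts = sx + sy
    return max(fx(t) - fy(t) for t in pts) + max(fy(t) - fx(t) for t in pts)

def q_st(lam, terms=100):
    """Asymptotic tail distribution of the Kuiper statistic, Eq. (6)."""
    return 2.0 * sum((4 * i * i * lam * lam - 1) * math.exp(-2 * i * i * lam * lam)
                     for i in range(1, terms + 1))

def kuiper_threshold(n_x, n_y, p_false=0.0001):
    """Significance threshold V_st of Eq. (4), with C_st from Eq. (5)
    and Q_st inverted numerically by bisection."""
    ne = math.sqrt(n_x * n_y / (n_x + n_y))
    c_st = ne + 0.155 + 0.24 / ne
    lo, hi = 0.5, 5.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if q_st(mid) > p_false else (lo, mid)
    return lo / c_st

# A clearly distinguishable fixed-vs-random pair is flagged; identical
# distributions are not.
fixed, random_ = [0.0] * 50, [1.0] * 50
v_st = kuiper_threshold(50, 50)
print(kuiper_v(fixed, random_) > v_st)   # True
print(kuiper_v(fixed, fixed) > v_st)     # False
```

Unlike Welch's t-test, nothing here assumes normality; only the empirical distribution functions of the two sample sets enter the statistic.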

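The evidence matrices of the leakage classification phase can likewise be sketched. The example below rebuilds Mev_addr and Mev_pos for the traces ev0–ev2 from Section 5.1, using −1 for addresses absent from a trace and the rounded median position for repeated ones (a sketch for illustration, not DATA's implementation):

```python
from statistics import median

# Example evidence traces ev0-ev2 (Section 5.1), one column per trace.
traces = [["r1", "r2"],
          ["r3", "r3", "r2", "r3", "r1"],
          ["r2", "r1", "r2"]]
addresses = ["r1", "r2", "r3"]          # one row per address

m_addr, m_pos = [], []
for a in addresses:
    counts, positions = [], []
    for ev in traces:
        occ = [p for p, addr in enumerate(ev) if addr == a]
        counts.append(len(occ))
        # -1 labels an absent address; repeated ones get the rounded median
        positions.append(round(median(occ)) if occ else -1)
    m_addr.append(counts)
    m_pos.append(positions)

print(m_addr)   # [[1, 1, 1], [1, 1, 2], [0, 3, 0]]
print(m_pos)    # [[0, 4, 1], [1, 2, 1], [-1, 1, -1]]
```

Each column corresponds to one trace (and thus one secret input), which preserves the input-to-evidence association that the histograms of Section 5 discard.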
If an address does not occur in a trace, the corresponding entry in Mev_addr is set to zero. Mev_pos stores the position of each address in the evidence trace. If an address does not occur in a trace, the matrix entry is set to −1. This labels an absent address and has no negative impact on the statistical test. Any other negative value works as well, because all valid positions are non-negative integers. If an address occurs more than once in a trace, the matrix entry is set to the rounded median of the trace positions. The median adequately determines around which position in the evidence trace an address is accessed most frequently.

The following example illustrates how evidence matrices are compiled. We reuse the evidence traces ev0 to ev2 from Section 5.1 and insert one column for each trace. For each of the addresses r1 to r3, we insert one row. After adding the data, we obtain:

    M^{ev}_{addr} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 2 \\ 0 & 3 & 0 \end{pmatrix}, \qquad M^{ev}_{pos} = \begin{pmatrix} 0 & 4 & 1 \\ 1 & 2 & 1 \\ -1 & 1 & -1 \end{pmatrix}

6.2 Leakage Model

The transformation of the input is called leakage model. It defines which property or part of the secret input is compared to the evidence representations stored in Mev_addr and Mev_pos. This serves two purposes. First, it confines the scope of the statistical test. This is important because the complete input space of a secret is often too large to handle in practice, e.g., > 2^128 for strong cryptographic keys. Second, this confinement implicitly quantifies the information an adversary could gain from observing evidences. A well-known leakage model is the Hamming weight model [55], which reduces a secret input to the number of its 1-bits. In [89], the Hamming weight model is used to find leaks in asymmetric implementations. Another popular approach is slicing the secret input into smaller chunks [46], e.g., bytes or bits. While input slices are a good fit for byte- and bit-wise operations in symmetric ciphers, they might not be the best fit for big-integer operations in asymmetric ciphers. Clearly, the choice of an appropriate leakage model is important, but ultimately depends on the target program. It requires some degree of domain knowledge, which we assume that analysts have. Our framework is designed to support a variety of leakage models, including Hamming weight and input slicing.

6.3 Specific Leakage Test

For the specific leakage test, the target binary is executed n times with random secret inputs. Instead of gathering new measurements, we reuse the (random input) traces from the leakage detection phase. In preparation for the test, we derive Mev_addr and Mev_pos from the traces. We also transform the secret inputs according to the chosen leakage model L and store the results in the input matrix Min_L. Similar to the evidence matrices, every input gets assigned a column in Min_L. The number of rows is defined by the model, e.g., the Hamming weight of the entire input requires one row. All rows in Min_L are then compared to all rows in Mev_addr and Mev_pos. For these comparisons, the selected rows are interpreted as pairwise observations of two random variables, X and Y, with length nX = nY = n. We then use the Randomized Dependence Coefficient (RDC) [53] to determine the relation between the observations. The RDC detects linear and non-linear relations between random variables; its test statistic R is defined between 0 and 1, with R = 1 showing perfect dependency and R = 0 stating statistical independence. The parameters of the RDC are set to the values proposed in [53]: k = 20 and s = 1/6. In contrast to mutual information estimators and similar metrics [68], which are also used in side-channel literature [31], the RDC can be calculated efficiently, especially for large sample sizes (n > 100). We precompute the significance threshold Rst for a given confidence level α by generating a sufficiently large number (≥ 10^4) of statistically independent sequences of length n (the same length as the rows in Mev_addr, Mev_pos, and Min_L) and estimating the distribution of R. Since the resulting distribution is approximately normal, we estimate the mean µ and the standard deviation σ. The significance threshold is then derived from Φ^{-1}(x), the inverse cumulative distribution function of the standard normal distribution, as follows:

    R_{st} = \mu + \sigma \cdot \Phi^{-1}(\alpha)    (7)

The value (1 − α) determines the probability with which the RDC produces false positives. For all tests performed in this work, it is set to 0.0001. If R exceeds Rst, the tested rows exhibit a significant statistical relation. This means that an adversary is able to infer, from side-channel observations, the values and properties of the secret input that are defined by the leakage model.

Accuracy. The probability of reporting false positives is sufficiently minimized by the choice of (1 − α). False negatives can occur if the number of observations n is too small. Similar to the discussion in Section 5, it is not possible to determine a required minimum number of observations that holds for arbitrary target programs. Naturally, simple and direct relations will be discovered with far fewer observations than faint and indirect ones.

7 Implementation and Optimizations

While the concept of DATA is platform independent, we implement trace recording on top of the Intel Pin framework [38] for analyzing x86 binaries. We record address traces in separate trace files. To reduce their size, we only monitor instructions with branching behavior and their branch targets as well as instructions performing memory operations. This suffices to detect control-flow and data leakage. To speed up recording of evidence traces in the second phase, we only record those instructions flagged as potential leaks in the first phase.

We implement the difference detection as well as the generic and the specific leakage tests in Python scripts, which condense all findings into human-readable leakage reports in XML format, structuring information leaks by libraries, functions, and call stacks.

Tracking Heap Allocations. Depending on the utilization of the heap, identical memory objects could get assigned different addresses by the memory allocator. During trace analysis, this could cause the same objects to be interpreted as different ones. We encountered such behavior for OpenSSL, which dynamically allocates big numbers on the heap and resizes them on demand. This causes frequent re-allocations and big numbers hopping between different heap addresses for different program executions. Our Pintool can therefore be configured to detect heap objects and replace their virtual addresses with relative address offsets. Currently, our analysis treats all heap objects equally, making the results more readable. More elaborate approaches like [73] are left as future work.

8 Evaluation and Results

We used Pin version 3.2-81205 for instrumentation and compiled glibc 2.24 as well as OpenSSL 1.1.0f (specifically, we tested commit 7477c83e15) in a default configuration with additional debug information, using GCC version 6.3.0. Although debug symbols are not required by DATA, it incorporates available debug symbols in the final report. This allows mapping detected leaks to the responsible functions and data symbols.

8.1 Analysis Results

Table 2 shows the results of the three phases of DATA, namely address differences, generic and specific leaks.

Table 2: Leakage summary of algorithms.

                                                 Generic       Specific
              Algorithm          Differences Dismissed   CF  Data  Byte/Bit   HW
    OpenSSL   AES-NI                   0 (2)         0    0  0 (2)    0 (2)    -
              AES-VP                       0         0    0      0        0    -
              AES bit-sliced              4         0    0      4        4    -
              AES T-table                20         0    0     20       20    -
              Blowfish                  194         0    0    194      171    -
              Camellia                   82         0    0     82       55    -
              CAST                      202         0    0    202      133    -
              DES                       138         0    0    138       63    -
              Triple DES                410         0    0    410      292    -
              ECDSA (secp256k1)         515       487    1     27        3    1
              DSA                       781       354  160    267       19   33
              RSA                      2248      1510  278    460       11  139
    PyCrypto  AES                        96         0    0     96       96    -
              ARC4                        5         0    0      5        5    -
              Blowfish                  384         0    0    384      384    -
              CAST                      284         0    0    284      216    -
              Triple DES                108         0   12     96      101    -

OpenSSL (Symmetric Primitives). As summarized in the upper part of Table 2, AES-NI (AES new instructions [37]) as well as AES-VP (vector permutations based on SSSE3 extensions) do not leak. However, when using AES-NI (and other ciphers) via the OpenSSL command-line tool, the key parsing yields two data leaks, as indicated in brackets. Calling the AES-NI implementation without this command-line tool, as also done for the other three AES implementations, does not trigger these two data leaks. Besides, we identified four data leaks in the bit-sliced AES. While OpenSSL uses the protected implementation by Käsper and Schwabe [43] for the actual encryption, it uses the same unprotected key expansion as used in T-table implementations.

All other tested symmetric implementations yield a significant number of data leaks since they rely on lookup tables with key-dependent memory accesses, which makes them vulnerable to cache attacks [11, 77]. These leaks have also been confirmed by the byte leakage model test. Figure 3 shows statistical test results of the vulnerable AES T-table implementation for the first five rounds, averaged over the 16 table lookups in each round. Phase two finds generic key dependencies, regardless of the round (values well above Vst), confirming its accuracy. The chosen byte leakage model detects linear dependencies to the first round state (s1), which allows known-plaintext attacks [11]. For intermediate rounds, for which the chosen byte leakage model is not applicable, the test output is well below the threshold Rst. By adapting the leakage model to the last round state, one could also test for ciphertext-only attacks [60]. Moreover, one can see that the Hamming weight model on key bytes detects the same leakage but with a lower confidence, since it loses information about the key. This emphasizes the importance of choosing appropriate leakage models. We summarize results in Appendix A.

[Figure 3: OpenSSL AES T-table leakage classification. (Test statistics per AES round 1–5; thresholds Vst and Rst.)]

OpenSSL (Asymmetric Primitives). The asymmetric primitives show significant non-deterministic behavior, which is dismissed in the leakage detection phase. For example, OpenSSL uses RSA base blinding with a random blinding value. From 2248 differences in RSA, 1510 are dismissed, leaving 278 control-flow and 460 data leaks with key dependency. Among those, we found

two constant-time vulnerabilities in RSA and DSA, respectively, which bypass constant-time implementations in favor of vulnerable implementations. This could allow key recovery attacks similar to [3, 82]. Moreover, DATA reconfirms address differences in the ECDSA wNAF implementation, as exploited in [10, 26, 79].

For asymmetric ciphers, we applied the Hamming weight (HW) model as well as the key bit model. The majority of leaks reported by the HW model indicate that the length of the key or of intermediate values leaks (as the HW usually correlates with the length). For example, we detect leaks in functions that determine the length of big numbers, reconfirming the findings of [80]. Also, OpenSSL uses lazy heap allocation to resize objects on demand. This can cause different heap addresses for different key lengths, which will show up as data leakage. In contrast to the HW, the key bit model is more fine-grained and thus targets very specific leaks only, e.g., it reveals leaks that occur when the private key is parsed. This constitutes an insecure usage of the private key, and a very subtle bug to find. Details about leaking functions are given in Appendix A.

Python. We tested PyCrypto 2.6.1 running on CPython 2.7.13. The lower part of Table 2 summarizes our results. PyCrypto incorporates native shared libraries for certain cryptographic operations. From a side-channel perspective, this is desirable since those native libraries could be tightened against side-channel attacks, independently of the used interpreter. However, we found that all ciphers leak key bytes via unprotected lookup table implementations within those shared libraries, as indicated by the byte leakage model. We list the leaks in Appendix A.

Leakage-free Crypto. We analyzed Curve25519 in NaCl [13] as well as the corresponding Diffie-Hellman variant of OpenSSL (X25519) and found no address-based information leakage (apart from OpenSSL's key parsing), approving their side-channel security.

8.2 Discussion

Detection Accuracy. For symmetric algorithms in OpenSSL, we recorded up to 10 traces in the difference detection phase. We found that 3 traces are sufficient, as more traces did not uncover additional differences. The low number of traces results from the high diffusion and the regular design of symmetric ciphers, which yields a high probability of quickly hitting all variations in the program execution. This suggests that the difference detection phase achieves good accuracy for symmetric ciphers. Symmetric ciphers are typically deterministic, thus all differences are key-dependent. Indeed, Table 2 shows that the leakage detection phase confirms all differences as leaks.

[Figure 4: Dismissed non-deterministic differences and discovered leaks for OpenSSL RSA as stacked plot. (No. discovered leaks/differences over no. traces; series: non-determinism, RSA data leaks, RSA control-flow leaks.)]

To evaluate DATA's accuracy on non-deterministic programs, we tested OpenSSL asymmetric ciphers and collected up to 30 traces, as shown in Figure 4. While the address differences found in the difference detection phase do not settle within 30 traces (introducing false negatives), an important finding is that the majority of these differences are due to statistically independent program activity, e.g., RSA base blinding. These differences are characterized as key-independent and successfully filtered in the leakage detection phase. The number of actual data and control-flow leaks with key dependencies already settles at 4 traces. The few leaks observable with more traces are due to heap cleanup (these leaks were already discovered at heap allocation), leakage of the heap object's size, and exploring more paths of already discovered programming bugs. For example, DATA discovered the aforementioned RSA constant-time vulnerability, which was missed by other solutions, with only two traces. Analyzing more traces identifies more information leaks caused by the same programming bug. Hence, we recommend ≤ 10 traces for asymmetric primitives as a conservative choice. We observed similar behavior for DSA and ECDSA, but omit the details for brevity.

Performance. We ran our experiments on a Xeon E5-2630v3 with 386 GB RAM. DATA achieves good performance, adapting its runtime to the number of discovered leaks. Analysis of the leakage-free AES-NI and AES-VP took around 6 s, as only the first phase is needed. Finding leaks in the OpenSSL AES T-table implementation took 5 CPU minutes. Leakage classification took 8 CPU min. Asymmetric algorithms require more traces

and yield significantly more differences. Hence, the first phases took between 29.8 (for DSA) and 79.8 CPU minutes (for ECDSA). Running all three phases on RSA takes 233.8 CPU minutes with a RAM utilization of less than 4.5 GB (single core). By exploiting parallelism, the actual execution time can be significantly reduced, e.g., from 55 min to approximately 250 s for the first phase of RSA. Analyzing PyCrypto yields large address traces due to the interpreter (1 GB and more); nevertheless, DATA handles such large traces without hassle: the first phase discards all non-leaking instructions, stripping down trace sizes of the subsequent phases to kilobytes (see Appendix B).

Summary. The adoption of side-channel countermeasures is often partial, error-prone, and non-transparent in practice. Even though countermeasures have been known for over a decade [66], most OpenSSL symmetric ciphers as well as PyCrypto do not rely on protected implementations like bit-slicing. Also, the bit-sliced AES adopted by OpenSSL leaks during the key schedule, as the developers integrated it only partially [43] since practical attacks have not been shown yet. Moreover, we discovered two new vulnerabilities, bypassing OpenSSL's constant-time implementations for RSA and DSA initialization. Considering incomplete bug fixes of similar vulnerabilities identified by Garcia et al. [27, 28], this sums up to four implementation bugs related to the same countermeasure. This clearly shows that the tedious and error-prone task of implementing countermeasures should be backed by appropriate tools such as DATA to detect and appropriately fix vulnerabilities as early as possible.

We found issues in loading and parsing cryptographic keys as well as in initialization routines. Finding these issues demands analysis of the full program execution, from program start to exit, which is out of reach for many existing tools. Also, analysis often neglects these information leaks because an attacker typically has no way to trigger key loading and other single events in practice. However, when using OpenSSL inside SGX enclaves (cf. Intel's SGX SSL library [40]), the attacker can trigger arbitrarily many program executions, making single-event leakage practically relevant, as demonstrated by the RSA key recovery attack in [82].

Responsible Disclosure. We informed the library developers as well as Intel of our findings. In response, OpenSSL merged our proposed patches upstream.

Security Implications. A leak found by DATA does not necessarily constitute an exploitable vulnerability. The leakage classification phase helps in rating its severity; however, an accurate judgment often demands significant effort in assembling and improving concrete attacks [12]. We argue that, unless good counter-arguments are given, any leak should be considered serious.

9 Mitigating Address-based Leaks

After using DATA to identify address-based information leaks in cryptographic software implementations, the following approaches could be applied as mitigation.

Software-based Mitigations. Coppens et al. [21] proposed compiler transformations to eliminate key-dependent control-flow dependencies. Similar approaches are followed by other program transformations [2, 56] and transactional branching [8]. Data leaks of lookup table implementations can be mitigated by bit-slicing [43, 47, 66]. Scatter-gather prevents data leaks in RSA exponentiation by interleaving data in memory such that cache lines are accessed irrespective of the used index. However, scatter-gather must be implemented correctly to prevent more sophisticated attacks [88]. Oblivious RAM [32, 77, 91] has been proposed as a generic countermeasure against data leaks by hiding memory access patterns. Hardened software implementations could then be proven leakage-free using [4, 5, 7, 16, 25].

Mitigations on Architectural/OS Level. Cache coloring [69] and similar cache isolation mechanisms [52] have been proposed to mitigate cache attacks. Others [90] proposed OS-level defenses against last-level cache attacks by controlling page sharing via a copy-on-access mechanism. Hardware transactional memory can be used to mitigate cache attacks by keeping all sensitive data in the cache during the computation [34]. Compiler-based tools aim to protect SGX enclaves against cache attacks [18] or controlled-channel attacks [70].

10 Conclusion

In this work, we proposed differential address trace analysis (DATA) to identify address-based information leaks. We use statistical tests to filter non-deterministic program behavior, thus improving detection accuracy. DATA is efficient enough to analyze real-world software – from program start to exit. Thereby, we include key loading and parsing in the analysis and found leakage which has been missed before. Based on DATA, we confirmed existing and identified several unknown information leaks as well as already (supposedly) fixed vulnerabilities in OpenSSL. In addition, we showed that DATA is capable of analyzing interpreted code (PyCrypto) including the underlying interpreter, which is conceptually impossible with current static methods. This shows the practical relevance of DATA in assisting security analysts to identify information leaks as well as developers in the tedious task of correctly implementing countermeasures.

Outlook. The generic design of DATA also allows detecting other types of leakage, such as variable-time floating point instructions, by including the instruction operands in the recorded address traces. DATA also

paves the way for analyzing other interpreted languages and quantifying the effects of interpretation and just-in-time compilation on side-channel security. Moreover, DATA could be extended to analyze multi-threaded programs by recording and analyzing individual traces per execution thread.

    1 int BN_MONT_CTX_set(BN_MONT_CTX *mont,
    2                     BIGNUM *mod, BN_CTX *ctx) {
    3   ...
    4   BN_copy(&(mont->N), mod);
    5   ...
    6   BN_mod_inverse(Ri, R, &mont->N, ctx);
    7   ...
    8 }

    Listing 3: OpenSSL RSA vulnerability.

Acknowledgments

We would like to thank the anonymous reviewers, as well as Mario Werner and our shepherd Stephen McCamant, for their valuable feedback and insightful discussions that helped improve this work.

This work was partially supported by the TU Graz LEAD project "Dependable Internet of Things in Adverse Environments", by the Austrian Research Promotion Agency (FFG) via the K-project DeSSnet, which is funded in the context of COMET – Competence Centers for Excellent Technologies by BMVIT, BMWFW, Styria and Carinthia, as well as by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 681402).

A Leaking Functions

OpenSSL (Symmetric Primitives). To analyze AES, we implemented a wrapper that calls the algorithm directly. For other algorithms, we used the openssl enc command-line tool with keys in hex format. DATA identified information leaks in the code that parses these keys. In particular, the leaks occur in function set_hex, which uses stdlib's isxdigit function that performs leaking table lookups. Besides, OPENSSL_hexchar2int uses a switch-case to convert key characters to integers. Although symmetric keys are usually stored in binary format, one should be aware of such leaks.

The bit-sliced AES implementation uses the vulnerable x86_64_AES_set_encrypt_key function for the key schedule. In addition, the unprotected AES leaks in function x86_64_AES_encrypt_compact. Blowfish leaks at BF_encrypt, Camellia leaks the Camellia SBOX at Camellia_Ekeygen and x86_64_Camellia_encrypt, CAST leaks the CAST_S_table0 to CAST_S_table7 at CAST_set_key as well as CAST_encrypt, and DES leaks the des_skb at DES_set_key_unchecked as well as DES_SPtrans at DES_encrypt2.

OpenSSL (Asymmetric Primitives). For the analysis of asymmetric ciphers, we use OpenSSL to generate keys in PEM format and then invoke the openssl pkeyutl command-line tool to create signatures with those keys. Similar to symmetric ciphers, asymmetric implementations leak during key loading and parsing. We found leaks in EVP_DecodeUpdate, in EVP_DecodeBlock via the lookup table data_ascii2bin, in c2i_ASN1_INTEGER, which uses c2i_ibuf, and in BN_bin2bn. Although the key is typically loaded only once at program startup, this has direct implications on applications using Intel SGX SSL.

DATA discovered two new vulnerabilities regarding OpenSSL's handling of constant-time implementations. The first one leaks during the initialization of Montgomery constants for the secret RSA primes p and q. This is a programming bug: the so-called constant-time flag is set for p and q in function rsa_ossl_mod_exp but not propagated to temporary working copies inside BN_MONT_CTX_set, as shown in Listing 3, since the function BN_copy in line 4 does not propagate the consttime flag from mod to mont->N. This causes the inversion in line 6 to fall back to non-constant-time implementations (int_bn_mod_inverse and BN_div). The second vulnerability is a missing constant-time flag for the DSA private key inside dsa_priv_decode. This causes the DSA key loading to use the unprotected exponentiation function BN_mod_exp_mont. Moreover, DATA confirms that ECDSA still uses the vulnerable point multiplication in ec_wNAF_mul, which was exploited in [10, 26, 79].

Finally, we found that the majority of information leaks reported for OpenSSL leak the length of the key or of intermediate variables. For example, we reconfirm the leak in BN_num_bits_word [80], which leaks the number of bits of the upper word of big numbers. There are several examples where the key length in bytes is leaked, e.g., via ASN1_STRING_set, BN_bin2bn, strlen of glibc, as well as via heap allocation.

PyCrypto. PyCrypto symmetric ciphers leak during encryption, mostly via lookup tables. AES leaks the tables Te0 to Te4 and Td0 to Td3 in functions ALGnew, rijndaelKeySetupEnc, and rijndaelEncrypt. Blowfish leaks in functions ALGnew and Blowfish_encrypt. CAST leaks the tables S1 to S4 in function block_encrypt and the tables S5 to S8 in schedulekeys_half. Triple DES leaks the table des_ip in function desfunc as well as in deskey. ARC4 leaks in function ALGnew.
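The two constant-time-flag bugs above reduce to flag propagation: OpenSSL's BN_copy copies the value of a big number but not its flags, so BN_FLG_CONSTTIME is silently lost on temporary copies unless the call site re-applies it. The following self-contained C model (simplified stand-in types, not OpenSSL's real BIGNUM API) sketches the leaky copy and the kind of call-site fix that addresses it:

```c
#include <assert.h>

/* Model of the consttime-flag bug behind Listing 3. The types and
 * functions are simplified stand-ins, not OpenSSL's real API. */

#define FLG_CONSTTIME 0x04           /* stands in for BN_FLG_CONSTTIME */

typedef struct {
    unsigned long value;             /* stands in for the BIGNUM value */
    int flags;                       /* security-relevant flags        */
} Bn;

/* Models BN_copy: the value is copied, the flags are NOT. */
static void bn_copy(Bn *dst, const Bn *src)
{
    dst->value = src->value;
}

/* Models the fixed call site: after copying, re-apply the
 * constant-time flag so later operations (e.g., the modular
 * inversion in Listing 3) stay on the protected code path. */
static void bn_copy_consttime(Bn *dst, const Bn *src)
{
    bn_copy(dst, src);
    dst->flags |= (src->flags & FLG_CONSTTIME);  /* the missing step */
}
```

With bn_copy alone, a copy of a flagged modulus ends up unflagged, which models why the inversion in Listing 3 falls back to int_bn_mod_inverse and BN_div; bn_copy_consttime preserves the flag on the working copy.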

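Most of the leaks listed in Appendix A, from stdlib's isxdigit to the AES Te/Td tables, share a single pattern: an array is indexed with secret data, so the accessed address, and thus the touched cache line, depends on the secret. A minimal sketch of that pattern, together with a branchless full-table scan whose access sequence is independent of the secret (illustrative function names, not library code):

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Leaky pattern: the load address of table[secret] depends on the
 * secret index, which cache, DRAM, and similar address-based side
 * channels can observe. */
static uint8_t lookup_leaky(const uint8_t table[256], uint8_t secret)
{
    return table[secret];
}

/* Mitigated pattern: touch every entry and select the wanted one
 * with a mask, so the sequence of accessed addresses is the same
 * for every secret. (Compiled output still needs inspection to
 * rule out a secret-dependent branch in the comparison.) */
static uint8_t lookup_scanned(const uint8_t table[256], uint8_t secret)
{
    uint8_t result = 0;
    for (size_t i = 0; i < 256; i++) {
        /* mask is 0xFF iff i == secret, computed without a branch */
        uint8_t mask = (uint8_t)-(uint8_t)(i == (size_t)secret);
        result |= (uint8_t)(table[i] & mask);
    }
    return result;
}
```

Scanning trades a factor of 256 in loads for a secret-independent address trace; fully protected implementations such as bit-sliced AES avoid secret-indexed tables altogether.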
B Performance

Table 3 summarizes the performance figures of DATA for each phase.³ Unless stated otherwise, all timings reflect the runtime in CPU minutes (single-core) and thus represent a fair and conservative metric. If tasks are parallelized, the actual runtime can be significantly reduced.

Difference Detection Phase. For OpenSSL, the trace size is < 30 MB for symmetric and < 55 MB for asymmetric ciphers. For PyCrypto, each trace has approximately 1 GB, because the execution of the interpreter is included. Regarding runtime, OpenSSL symmetric ciphers require less than a minute. PyCrypto ciphers finish in 5 minutes or less, despite large trace sizes. OpenSSL asymmetric ciphers need between 29.8 and 79.8 CPU minutes for two reasons. First, they require more traces. As we compare traces pairwise in the first phase, the runtime grows quadratically in the number of traces. Second, asymmetric ciphers yield significantly more differences that need to be analyzed. Especially control-flow differences demand costly re-alignment of traces. Yet, these results are quite encouraging, especially since the automated analysis of large real-world software stacks is out of reach for many existing tools. Also, we see possible improvements in further speeding up analysis times.

Leakage Detection Phase. We analyze three fixed and one random set à 60 traces, yielding 240 traces in total. Since this phase only analyzes address differences reported by the previous phase, the sizes of the recorded traces are significantly smaller. From several MB to over 1 GB in phase one, the traces are now several KB to around 1.3 MB for RSA. This makes recording and analyzing an even larger number of traces, e.g., more than 240, efficient. For example, the analysis of OpenSSL bit-sliced AES takes less than 5 CPU minutes. As expected, analyzing PyCrypto takes longer due to the instrumentation of the Python interpreter. Also, analysis of RSA is slower due to the high number of address differences to analyze. For example, RSA generates traces of up to 1343.9 KB to be analyzed. Nevertheless, phase two completes within less than 61 CPU minutes.

Leakage Classification Phase. The last phase records and analyzes 200 traces with random keys. To speed up recording, we reuse traces from the random input set of the previous phase. We benchmarked symmetric ciphers with the byte leakage model. Analysis times vary heavily between ciphers, because the performance critically depends on the number of reported address leaks and the size of the evidences, which need to be classified. For instance, most ciphers complete in less than 80 minutes, and AES bit-sliced even in 3.2 minutes. In contrast, PyCrypto Blowfish took almost 9 CPU hours because of a much larger number of evidences compared to PyCrypto AES, as can be seen from their trace sizes (271.8 kB for Blowfish versus 13.6 kB for AES). In general, testing the HW model is faster than the bit model because the HW model cumulates all key bits into a single metric, while for the bit model we need to analyze multiple key bits independently. Table 2 shows that the cumulative runtime over both models is between 55 and 95 min. Also, the classification phase is generally slower than the leakage detection phase. This is because, first, DATA performs more specific leakage tests than generic ones (H^ev_addr/pos vs. M_addr/pos), and second, the RDC is more costly to compute than Kuiper's test. We believe significant performance savings are possible by pruning large evidence lists and by optimizing the RDC implementation.

Summary. The last two columns illustrate that the overall performance of DATA adapts to the amount of discovered leakage, which is desirable. Leakage-free implementations finish within 6 s, while leaky ones take up to 580 CPU minutes. In any of the phases, analysis requires less than 4.5 GB of RAM when executing on a single core. This is within the range of desktop computers and commodity laptops. When multi-core environments are available, one can exploit parallelism to greatly speed up analysis times. In fact, we parallelized phase one and reduced its runtime for RSA from 55 CPU minutes to approximately 250 real seconds. Similar optimizations could be implemented for phases two and three. Moreover, when doing frequent testing, software developers could not only omit the leakage classification phase intended for security analysts but also skip the leakage detection phase in case of deterministic algorithms.

³ The overall performance might be higher than the sum of all phases because it includes the generation of final reports.

References

[1] ACIIÇMEZ, O., KOÇ, Ç. K., AND SEIFERT, J. Predicting Secret
Keys Via Branch Prediction. In Topics in Cryptology – CT-RSA 2007 (2007), vol. 4377 of LNCS, Springer, pp. 225–242.
[2] AGAT, J. Transforming Out Timing Leaks. In Principles of Programming Languages – POPL 2000 (2000), ACM, pp. 40–53.
[3] ALDAYA, A. C., GARCÍA, C. P., TAPIA, L. M. A., AND BRUMLEY, B. B. Cache-Timing Attacks on RSA Key Generation. IACR Cryptology ePrint Archive 2018 (2018), 367.
[4] ALMEIDA, J. B., BARBOSA, M., BARTHE, G., DUPRESSOIR, F., AND EMMI, M. Verifying Constant-Time Implementations. In USENIX Security Symposium 2016 (2016), USENIX Association, pp. 53–70.
[5] ALMEIDA, J. B., BARBOSA, M., PINTO, J. S., AND VIEIRA, B. Formal Verification of Side-Channel Countermeasures Using Self-Composition. Sci. Comput. Program. 78 (2013), 796–812.
[6] APECECHEA, G. I., EISENBARTH, T., AND SUNAR, B. S$A: A Shared Cache Attack That Works across Cores and Defies VM Sandboxing – and Its Application to AES. In IEEE Symposium on Security and Privacy – S&P 2015 (2015), IEEE Computer Society, pp. 591–604.
[7] BARTHE, G., BETARTE, G., CAMPO, J. D., LUNA, C. D., AND PICHARDIE, D. System-level Non-interference for Constant-time Cryptography. In Conference on Computer and Communications Security – CCS 2014 (2014), ACM, pp. 1267–1279.

Table 3: Performance of DATA during the analysis of OpenSSL (top) and PyCrypto (bottom). Sizes are per trace. Time is in CPU minutes. Trace sizes for Classification are identical to Leakage Detection.

                       Difference Detection     Leakage Detection      Classification      Total
Algorithm             Traces   Size    Time    Traces   Size    Time   Traces   Time     Time      RAM
                              (MB)    (min.)          (kB)    (min.)          (min.)   (min.)    (MB)
OpenSSL:
AES-NI                   3     0.5     0.1       -       -       -       -       -        0.1     72.0
AES-VP                   3     0.5     0.1       -       -       -       -       -        0.1     72.2
AES bit-sliced           3     0.5     0.4      240     0.2     4.6     200     3.2       8.4     77.1
AES T-table              3     0.5     0.4      240     1.8     4.6     200     8.0      13.2    101.4
Blowfish                 3    28.2     0.8      240   264.8    13.7     200    79.1      96.0    717.8
Camellia                 3    27.3     0.6      240     2.5     9.0     200    17.5      27.3    146.8
CAST                     3    27.3     0.6      240     5.4     9.2     200    36.3      46.4    247.5
DES                      3    27.3     0.6      240     3.9     9.1     200     9.9      19.9    139.5
Triple DES               3    27.3     0.7      240    13.9    10.5     200    49.2      60.9    351.7
ECDSA (secp256k1)       10    54.1    79.8      240   387.9    18.3     200    55.3     161.2  1,316.3
DSA                     10    35.6    29.8      240   195.4    14.7     200    56.9     106.1  1,054.6
RSA                     10    44.2    55.0      240 1,343.9    60.9     200    94.3     233.8  4,414.0
PyCrypto:
AES                      3  1081.6     4.0      240    13.6    43.6     200    88.2     136.2  1,223.0
ARC4                     3  1081.5     3.9      240     6.4    43.1     200    60.3     107.6  1,222.7
Blowfish                 3  1082.3     5.0      240   271.8    47.9     200   526.5     582.2  2,302.6
CAST                     3  1081.6     4.0      240    11.8    44.0     200    76.7     125.1  1,223.0
Triple DES               3  1082.4     4.2      240    65.8    45.0     200    63.3     113.3  1,223.8

[8] BARTHE, G., REZK, T., AND WARNIER, M. Preventing Timing Leaks Through Transactional Branching Instructions. Electr. Notes Theor. Comput. Sci. 153 (2006), 33–55.
[9] BASU, T., AND CHATTOPADHYAY, S. Testing Cache Side-Channel Leakage. In International Conference on Software Testing, Verification and Validation Workshops – ICST Workshops (2017), IEEE Computer Society, pp. 51–60.
[10] BENGER, N., VAN DE POL, J., SMART, N. P., AND YAROM, Y. "Ooh Aah... Just a Little Bit": A Small Amount of Side Channel Can Go a Long Way. In Cryptographic Hardware and Embedded Systems – CHES 2014 (2014), vol. 8731 of LNCS, Springer, pp. 75–92.
[11] BERNSTEIN, D. J. Cache-Timing Attacks on AES, 2004. Technical report: https://cr.yp.to/antiforgery/cachetiming-20050414.pdf. Accessed: 2018-05-29.
[12] BERNSTEIN, D. J., BREITNER, J., GENKIN, D., BRUINDERINK, L. G., HENINGER, N., LANGE, T., VAN VREDENDAAL, C., AND YAROM, Y. Sliding Right into Disaster: Left-to-Right Sliding Windows Leak. In Cryptographic Hardware and Embedded Systems – CHES 2017 (2017), vol. 10529 of LNCS, Springer, pp. 555–576.
[13] BERNSTEIN, D. J., LANGE, T., AND SCHWABE, P. NaCl: Networking and Cryptography library. https://nacl.cr.yp.to/. Accessed: 2018-05-29.
[14] BLAZY, S., PICHARDIE, D., AND TRIEU, A. Verifying Constant-Time Implementations by Abstract Interpretation. In European Symposium on Research in Computer Security – ESORICS 2017 (2017), vol. 10492 of LNCS, Springer, pp. 260–277.
[15] BLEICHENBACHER, D. Chosen Ciphertext Attacks Against Protocols Based on the RSA Encryption Standard PKCS #1. In Advances in Cryptology – CRYPTO 1998 (1998), vol. 1462 of LNCS, Springer, pp. 1–12.
[16] BOND, B., HAWBLITZEL, C., KAPRITSOS, M., LEINO, K. R. M., LORCH, J. R., PARNO, B., RANE, A., SETTY, S. T. V., AND THOMPSON, L. Vale: Verifying High-Performance Cryptographic Assembly Code. In USENIX Security Symposium 2017 (2017), USENIX Association, pp. 917–934.
[17] BOS, J. W., HUBAIN, C., MICHIELS, W., AND TEUWEN, P. Differential Computation Analysis: Hiding Your White-Box Designs is Not Enough. In Cryptographic Hardware and Embedded Systems – CHES 2016 (2016), vol. 9813 of LNCS, Springer, pp. 215–236.
[18] BRASSER, F., CAPKUN, S., DMITRIENKO, A., FRASSETTO, T., KOSTIAINEN, K., MÜLLER, U., AND SADEGHI, A. DR.SGX: Hardening SGX Enclaves against Cache Attacks with Data Location Randomization. CoRR abs/1709.09917 (2017).
[19] BRUMLEY, B. B., AND HAKALA, R. M. Cache-Timing Template Attacks. In Advances in Cryptology – ASIACRYPT 2009 (2009), vol. 5912 of LNCS, Springer, pp. 667–684.
[20] CHATTOPADHYAY, S., BECK, M., REZINE, A., AND ZELLER, A. Quantifying the Information Leak in Cache Attacks via Symbolic Execution. In International Conference on Formal Methods and Models for System Design – MEMOCODE 2017 (2017), ACM, pp. 25–35.
[21] COPPENS, B., VERBAUWHEDE, I., BOSSCHERE, K. D., AND SUTTER, B. D. Practical Mitigations for Timing-Based Side-Channel Attacks on Modern x86 Processors. In IEEE Symposium on Security and Privacy – S&P 2009 (2009), IEEE Computer Society, pp. 45–60.
[22] CORBET, J. The Current State of Kernel Page-Table Isolation, 2018. https://lwn.net/Articles/741878/. Accessed: 2018-05-29.
[23] CORON, J., KOCHER, P. C., AND NACCACHE, D. Statistics and Secret Leakage. In Financial Cryptography – FC 2000 (2000), vol. 1962 of LNCS, Springer, pp. 157–173.
[24] DOYCHEV, G., FELD, D., KÖPF, B., MAUBORGNE, L., AND REINEKE, J. CacheAudit: A Tool for the Static Analysis of Cache Side Channels. In USENIX Security Symposium 2013 (2013), USENIX Association, pp. 431–446.

[25] DOYCHEV, G., AND KÖPF, B. Rigorous Analysis of Software Countermeasures Against Cache Attacks. In Programming Language Design and Implementation – PLDI 2017 (2017), ACM, pp. 406–421.
[26] FAN, S., WANG, W., AND CHENG, Q. Attacking OpenSSL Implementation of ECDSA with a Few Signatures. In Conference on Computer and Communications Security – CCS 2016 (2016), ACM, pp. 1505–1515.
[27] GARCÍA, C. P., AND BRUMLEY, B. B. Constant-Time Callees with Variable-Time Callers. In USENIX Security Symposium 2017 (2017), USENIX Association, pp. 83–98.
[28] GARCÍA, C. P., BRUMLEY, B. B., AND YAROM, Y. "Make Sure DSA Signing Exponentiations Really are Constant-Time". In Conference on Computer and Communications Security – CCS 2016 (2016), ACM, pp. 1639–1650.
[29] GE, Q., YAROM, Y., COCK, D., AND HEISER, G. A Survey of Microarchitectural Timing Attacks and Countermeasures on Contemporary Hardware. J. Cryptographic Engineering 8 (2018), 1–27.
[30] GEORGIADIS, L., WERNECK, R. F. F., TARJAN, R. E., TRIANTAFYLLIS, S., AND AUGUST, D. I. Finding Dominators in Practice. In European Symposium on Algorithms – ESA 2004 (2004), vol. 3221 of LNCS, Springer, pp. 677–688.
[31] GIERLICHS, B., BATINA, L., TUYLS, P., AND PRENEEL, B. Mutual Information Analysis. In Cryptographic Hardware and Embedded Systems – CHES 2008 (2008), vol. 5154 of LNCS, Springer, pp. 426–442.
[32] GOLDREICH, O., AND OSTROVSKY, R. Software Protection and Simulation on Oblivious RAMs. J. ACM 43 (1996), 431–473.
[33] GOODWILL, G., JUN, B., JAFFE, J., AND ROHATGI, P. A Testing Methodology for Side Channel Resistance Validation, 2011. http://csrc.nist.gov/news_events/non-invasive-attack-testing-workshop/papers/08_Goodwill.pdf. Accessed: 2018-05-29.
[34] GRUSS, D., LETTNER, J., SCHUSTER, F., OHRIMENKO, O., HALLER, I., AND COSTA, M. Strong and Efficient Cache Side-Channel Protection using Hardware Transactional Memory. In USENIX Security Symposium 2017 (2017), USENIX Association, pp. 217–233.
[35] GRUSS, D., MAURICE, C., FOGH, A., LIPP, M., AND MANGARD, S. Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR. In Conference on Computer and Communications Security – CCS 2016 (2016), ACM, pp. 368–379.
[36] GRUSS, D., SPREITZER, R., AND MANGARD, S. Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches. In USENIX Security Symposium 2015 (2015), USENIX Association, pp. 897–912.
[37] GUERON, S. White Paper: Intel Advanced Encryption Standard (AES) Instructions Set, 2010. https://software.intel.com/file/24917. Accessed: 2018-05-29.
[38] INTEL. Pin – A Dynamic Binary Instrumentation Tool. https://software.intel.com/en-us/articles/pintool/. Accessed: 2018-05-29.
[39] INTEL. Intel 64 and IA-32 Architectures Software Developers Manual, 2016. Reference no. 325462-061US.
[40] INTEL. Intel SgxSSL Library User Guide, 2018. Rev. 1.2.1. https://software.intel.com/sites/default/files/managed/3b/05/Intel-SgxSSL-Library-User-Guide.pdf. Accessed: 2018-05-29.
[41] IRAZOQUI, G., CONG, K., GUO, X., KHATTRI, H., KANUPARTHI, A. K., EISENBARTH, T., AND SUNAR, B. Did we Learn from LLC Side Channel Attacks? A Cache Leakage Detection Tool for Crypto Libraries. CoRR abs/1709.01552 (2017).
[42] JOHNSON, N. M., CABALLERO, J., CHEN, K. Z., MCCAMANT, S., POOSANKAM, P., REYNAUD, D., AND SONG, D. Differential Slicing: Identifying Causal Execution Differences for Security Applications. In IEEE Symposium on Security and Privacy – S&P 2011 (2011), IEEE Computer Society, pp. 347–362.
[43] KÄSPER, E., AND SCHWABE, P. Faster and Timing-Attack Resistant AES-GCM. In Cryptographic Hardware and Embedded Systems – CHES 2009 (2009), vol. 5747 of LNCS, Springer, pp. 1–17.
[44] KOCHER, P., GENKIN, D., GRUSS, D., HAAS, W., HAMBURG, M., LIPP, M., MANGARD, S., PRESCHER, T., SCHWARZ, M., AND YAROM, Y. Spectre Attacks: Exploiting Speculative Execution. meltdownattack.com (2018).
[45] KOCHER, P. C. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In Advances in Cryptology – CRYPTO 1996 (1996), vol. 1109 of LNCS, Springer, pp. 104–113.
[46] KOCHER, P. C., JAFFE, J., AND JUN, B. Differential Power Analysis. In Advances in Cryptology – CRYPTO 1999 (1999), vol. 1666 of LNCS, Springer, pp. 388–397.
[47] KÖNIGHOFER, R. A Fast and Cache-Timing Resistant Implementation of the AES. In Topics in Cryptology – CT-RSA 2008 (2008), vol. 4964 of LNCS, Springer, pp. 187–202.
[48] KÖPF, B., MAUBORGNE, L., AND OCHOA, M. Automatic Quantification of Cache Side-Channels. In Computer Aided Verification – CAV 2012 (2012), vol. 7358 of LNCS, Springer, pp. 564–580.
[49] KUIPER, N. H. Tests Concerning Random Points on a Circle. Indagationes Mathematicae (Proceedings) 63, Supplement C (1960), 38–47.
[50] LANGLEY, A. ctgrind: Checking that Functions are Constant Time with Valgrind. https://github.com/agl/ctgrind. Accessed: 2018-05-29.
[51] LIPP, M., SCHWARZ, M., GRUSS, D., PRESCHER, T., HAAS, W., MANGARD, S., KOCHER, P., GENKIN, D., YAROM, Y., AND HAMBURG, M. Meltdown. meltdownattack.com (2018).
[52] LIU, F., GE, Q., YAROM, Y., MCKEEN, F., ROZAS, C. V., HEISER, G., AND LEE, R. B. CATalyst: Defeating Last-Level Cache Side Channel Attacks in Cloud Computing. In High Performance Computer Architecture – HPCA 2016 (2016), IEEE Computer Society, pp. 406–418.
[53] LÓPEZ-PAZ, D., HENNIG, P., AND SCHÖLKOPF, B. The Randomized Dependence Coefficient. In Neural Information Processing Systems – NIPS 2013 (2013), pp. 1–9.
[54] LUK, C., COHN, R. S., MUTH, R., PATIL, H., KLAUSER, A., LOWNEY, P. G., WALLACE, S., REDDI, V. J., AND HAZELWOOD, K. M. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. In Programming Language Design and Implementation – PLDI 2005 (2005), ACM, pp. 190–200.
[55] MANGARD, S., OSWALD, E., AND POPP, T. Power Analysis Attacks – Revealing the Secrets of Smart Cards. Springer, 2007.
[56] MANTEL, H., AND STAROSTIN, A. Transforming Out Timing Leaks, More or Less. In European Symposium on Research in Computer Security – ESORICS 2015 (2015), vol. 9326 of LNCS, Springer, pp. 447–467.
[57] MANTEL, H., WEBER, A., AND KÖPF, B. A Systematic Study of Cache Side Channels Across AES Implementations. In Engineering Secure Software and Systems – ESSoS 2017 (2017), vol. 10379 of LNCS, Springer, pp. 213–230.

[58] MELLOR-CRUMMEY, J. M., AND LEBLANC, T. J. A Software Instruction Counter. In Architectural Support for Programming Languages and Operating Systems – ASPLOS 1989 (1989), ACM Press, pp. 78–86.
[59] MOLNAR, D., PIOTROWSKI, M., SCHULTZ, D., AND WAGNER, D. A. The Program Counter Security Model: Automatic Detection and Removal of Control-Flow Side Channel Attacks. In Information Security and Cryptology – ICISC 2005 (2005), vol. 3935 of LNCS, Springer, pp. 156–168.
[60] NEVE, M., AND SEIFERT, J. Advances on Access-Driven Cache Attacks on AES. In Selected Areas in Cryptography – SAC 2006 (2006), vol. 4356 of LNCS, Springer, pp. 147–162.
[61] OSVIK, D. A., SHAMIR, A., AND TROMER, E. Cache Attacks and Countermeasures: The Case of AES. In Topics in Cryptology – CT-RSA 2006 (2006), vol. 3860 of LNCS, Springer, pp. 1–20.
[62] PAGE, D. Partitioned Cache Architecture as a Side-Channel Defence Mechanism. IACR Cryptology ePrint Archive 2005 (2005), 280.
[63] PASAREANU, C. S., PHAN, Q., AND MALACARIA, P. Multi-run Side-Channel Analysis Using Symbolic Execution and Max-SMT. In Computer Security Foundations – CSF 2016 (2016), IEEE Computer Society, pp. 387–400.
[64] PERCIVAL, C. Cache Missing for Fun and Profit, 2005. Technical report: http://www.daemonology.net/hyperthreading-considered-harmful/. Accessed: 2018-05-29.
[65] PESSL, P., GRUSS, D., MAURICE, C., SCHWARZ, M., AND MANGARD, S. DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks. In USENIX Security Symposium 2016 (2016), USENIX Association, pp. 565–581.
[66] REBEIRO, C., SELVAKUMAR, A. D., AND DEVI, A. S. L. Bitslice Implementation of AES. In Cryptology and Network Security – CANS 2006 (2006), vol. 4301 of LNCS, Springer, pp. 203–212.
[67] REPARAZ, O., BALASCH, J., AND VERBAUWHEDE, I. Dude, is my code constant time? In Design, Automation & Test in Europe – DATE 2017 (2017), IEEE, pp. 1697–1702.
[68] RESHEF, D. N., RESHEF, Y. A., SABETI, P. C., AND MITZENMACHER, M. M. An Empirical Study of Leading Measures of Dependence. CoRR abs/1505.02214 (2015).
[69] SHI, J., SONG, X., CHEN, H., AND ZANG, B. Limiting Cache-based Side-channel in Multi-tenant Cloud using Dynamic Page Coloring. In Dependable Systems and Networks Workshops – DSNW (2011), IEEE, pp. 194–199.
[70] SHIH, M., LEE, S., KIM, T., AND PEINADO, M. T-SGX: Eradicating Controlled-Channel Attacks Against Enclave Programs. In Network and Distributed System Security Symposium – NDSS 2017 (2017), The Internet Society.
[71] SONG, D. X., BRUMLEY, D., YIN, H., CABALLERO, J., JAGER, I., KANG, M. G., LIANG, Z., NEWSOME, J., POOSANKAM, P., AND SAXENA, P. BitBlaze: A New Approach to Computer Security via Binary Analysis. In International Conference on Information Systems Security – ICISS 2008 (2008), vol. 5352 of LNCS, Springer, pp. 1–25.
[72] STANDAERT, F. How (not) to Use Welch's T-test in Side-Channel Security Evaluations. IACR Cryptology ePrint Archive 2017 (2017), 138.
[73] SUMNER, W. N., AND ZHANG, X. Memory Indexing: Canonicalizing Addresses Across Executions. In Foundations of Software Engineering – FSE 2010 (2010), ACM, pp. 217–226.
[74] SUMNER, W. N., AND ZHANG, X. Identifying Execution Points for Dynamic Analyses. In Automated Software Engineering – ASE 2013 (2013), IEEE, pp. 81–91.
[75] SUMNER, W. N., ZHENG, Y., WEERATUNGE, D., AND ZHANG, X. Precise Calling Context Encoding. In International Conference on Software Engineering – ICSE 2010 (2010), ACM, pp. 525–534.
[76] SZEFER, J. Survey of Microarchitectural Side and Covert Channels, Attacks, and Defenses. IACR Cryptology ePrint Archive 2016 (2016), 479.
[77] TROMER, E., OSVIK, D. A., AND SHAMIR, A. Efficient Cache Attacks on AES, and Countermeasures. J. Cryptology 23 (2010), 37–71.
[78] TURNER, P. Retpoline: A Software Construct for Preventing Branch-Target-Injection, 2018. https://support.google.com/faqs/answer/7625886. Accessed: 2018-05-29.
[79] VAN DE POL, J., SMART, N. P., AND YAROM, Y. Just a Little Bit More. In Topics in Cryptology – CT-RSA 2015 (2015), vol. 9048 of LNCS, Springer, pp. 3–21.
[80] WANG, S., WANG, P., LIU, X., ZHANG, D., AND WU, D. CacheD: Identifying Cache-Based Timing Channels in Production Software. In USENIX Security Symposium 2017 (2017), USENIX Association, pp. 235–252.
[81] WANG, Z., AND LEE, R. B. New Cache Designs for Thwarting Software Cache-Based Side Channel Attacks. In International Symposium on Computer Architecture – ISCA 2007 (2007), ACM, pp. 494–505.
[82] WEISER, S., SPREITZER, R., AND BODNER, L. Single Trace Attack Against RSA Key Generation in Intel SGX SSL. In ASIA Conference on Information, Computer and Communications Security – AsiaCCS 2018 (2018), ACM.
[83] WELCH, B. L. The Generalization of Student's Problem when Several Different Population Variances are Involved. Biometrika 34, 1-2 (1947), 28–35.
[84] XIAO, Y., LI, M., CHEN, S., AND ZHANG, Y. STACCO: Differentially Analyzing Side-Channel Traces for Detecting SSL/TLS Vulnerabilities in Secure Enclaves. In Conference on Computer and Communications Security – CCS 2017 (2017), ACM, pp. 859–874.
[85] XIN, B., SUMNER, W. N., AND ZHANG, X. Efficient Program Execution Indexing. In Programming Language Design and Implementation – PLDI 2008 (2008), ACM, pp. 238–248.
[86] XU, Y., CUI, W., AND PEINADO, M. Controlled-Channel Attacks: Deterministic Side Channels for Untrusted Operating Systems. In IEEE Symposium on Security and Privacy – S&P 2015 (2015), IEEE Computer Society, pp. 640–656.
[87] YAROM, Y., AND FALKNER, K. FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. In USENIX Security Symposium 2014 (2014), USENIX Association, pp. 719–732.
[88] YAROM, Y., GENKIN, D., AND HENINGER, N. CacheBleed: A Timing Attack on OpenSSL Constant-time RSA. J. Cryptographic Engineering 7 (2017), 99–112.
[89] ZANKL, A., HEYSZL, J., AND SIGL, G. Automated Detection of Instruction Cache Leaks in Modular Exponentiation Software. In Smart Card Research and Advanced Applications – CARDIS 2016 (2016), vol. 10146 of LNCS, Springer, pp. 228–244.
[90] ZHOU, Z., REITER, M. K., AND ZHANG, Y. A Software Approach to Defeating Side Channels in Last-Level Caches. In Conference on Computer and Communications Security – CCS 2016 (2016), ACM, pp. 871–882.
[91] ZHUANG, X., ZHANG, T., AND PANDE, S. HIDE: An Infrastructure for Efficiently Protecting Information Leakage on the Address Bus. In Architectural Support for Programming Languages and Operating Systems – ASPLOS 2004 (2004), ACM, pp. 72–84.
