CIDPro: Custom Instructions for Dynamic Program Diversification

Thinh Hung Pham∗, Alexander Fell∗, Arnab Kumar Biswas†, Siew-Kei Lam∗, and Nandeesha Veeranna∗
∗Nanyang Technological University, Singapore
Email: fpham ht, afell, [email protected], siewkei [email protected]
†University of South Brittany, France
[email protected]

Abstract—Timing side-channel attacks pose a major threat to embedded systems due to their ease of accessibility. We propose CIDPro, a framework that relies on dynamic program diversification to mitigate timing side-channel leakage. The proposed framework integrates the widely used LLVM compiler infrastructure and the increasingly popular RISC-V FPGA soft-processor. The compiler automatically generates custom instructions in the security-critical segments of the program, and the instructions execute on the RISC-V custom co-processor to produce diversified timing characteristics on each execution instance. CIDPro has been implemented on the Zynq7000 XC7Z020 FPGA device to study the performance overhead and security trade-offs. Experimental results show that our solution can achieve 80% and 86% timing side-channel capacity reduction for two benchmarks with an acceptable performance overhead compared to existing solutions. In addition, the proposed method incurs only a negligible hardware area overhead of 1% slices of the entire RISC-V system.

I. INTRODUCTION

Embedded systems have become an integral part of our lives, and hence it is essential that they are secure. Unfortunately, the cryptographic schemes in embedded systems are deployed in unforeseen adversarial settings where keys can be compromised through side-channel attacks. Such attacks have been reported on embedded devices [1], where the attacker extracts secret information from a victim program by observing a physical phenomenon, e.g. execution time or power consumption, during its execution [2]. In this paper, we consider the information leakage wherein the secret information of a victim program is unintentionally leaked to the attacker via the timing channel. For example, a modular exponentiation function (modExp) used in RSA decryption (as shown in Algorithm 1) leaks information through its execution time because the condition of the if-branch depends on the secret key k and there is no else-branch to balance it. The attacker can observe the execution time of the victim program and establish a correlation between the observation and a hypothesis of the secret information, such as a cryptographic key. In this paper, we show that existing software countermeasures for mitigating timing leakage are ineffective, especially when the cryptographic algorithms execute in bare metal mode or with an embedded OS (Operating System), which is common in embedded systems.

Algorithm 1 Modular Exponentiation
Input: data y, private key k of length d, integer N
Output: $y^k \bmod N$
procedure MODEXP(y, k, N)
    r ← 1
    for i ← 1 to d do
        if (k mod 2 == 1) then    ▷ Critical Condition
            r ← (r × y) mod N
        y ← (y × y) mod N
        k ← k ≫ 1
    return r mod N

To overcome the limitations of the existing software countermeasures, we propose a framework called CIDPro that relies on dynamic hardware diversification to vary the timing characteristics of the program during execution. CIDPro is built upon the widely used LLVM compiler infrastructure [3] and targets the Rocket core [4], which is based on the RISC-V ISA. The framework is able to reduce timing side-channel leakage significantly by utilizing hardware diversification modules invoked by custom instructions that are inserted in the original source code.

The paper is organized as follows: Section II discusses existing countermeasures for timing leakage reduction, while CIDPro is presented in Section III. In Section IV, we provide a detailed discussion of the experimental results, and Section V concludes the paper.

[Figure 1: four timing-histogram panels with legend key0 / key1. Panel labels as printed: (a) Baseline, (b) Conditional Assignment, (c) Cross-Copying, (d) Left-to-Right Sliding Window.]
Fig. 1: Timing histogram of (a) a baseline program, (b) after transformation using cross-copying, (c) using conditional assignment and (d) using left-to-right sliding window. In each plot, the X-axis and Y-axis show the execution time in clock cycles and the number of instances with that execution time, respectively.

II. RELATED WORK

A typical countermeasure against timing side-channel attacks is to apply program transformations such that the critical functions (e.g. modExp) complete in constant time. Two representative program transformation techniques are cross-copying [5] and conditional assignment [6]. In cross-copying, an else-branch is added with a dummy task that mimics the execution pattern in the if-branch to equalize the execution time of both branches. In conditional assignment, the condition is directly encoded into the branches using bit masks and bit-wise logical operators.

The countermeasures mentioned above suffer from several limitations. First, the program transformations are typically performed on high-level programs. As such, compiler optimizations may inadvertently reduce the effectiveness of the countermeasures, causing timing side-channel vulnerabilities at the micro-architecture level. In addition, even though conditional assignment removes critical conditions by flattening and converting them into primitive arithmetic and bitwise instructions, certain instructions (especially multiplications, divisions, etc.) may cause timing side-channel leakage because the number of clock cycles they require varies with the operand values. Second, neither of the existing techniques can minimize the effect of the Hamming weight of different keys. Third, both existing techniques result in performance degradation, as additional instructions are introduced to equalize the execution times.

The sliding window technique [7] has been previously presented to reduce the effect of the Hamming weight difference of keys on the execution time of the modExp function. This technique not only improves the performance but also reduces the timing-channel leakage. However, the sliding window technique is not able to eliminate the information leakage. In particular, the residual timing leakage is notable on embedded systems with a low noise runtime environment.

These limitations of the existing approaches are highlighted in Figures 1(b)-(d), which show the timing histograms of the RSA algorithm (that includes the modExp function) when it is executed with two different keys in the absence of an OS, i.e. in a bare metal environment that is common in embedded systems [8]. The timing characteristics corresponding to the two different keys are clearly distinguishable with the existing approaches, and hence they do not provide effective countermeasures against timing side-channel attacks. As such, executing functions in constant time cannot be easily achieved in software, particularly for embedded systems.

Timing side-channel attacks that extract not only the Hamming weight but also the value of fixed Diffie-Hellman exponents and RSA keys have been reported in [9]. To mitigate the attacks, the authors presented blinding techniques to randomize the execution time. However, the blinding technique requires a significant amount of computation, which poses a heavy burden for lightweight processors in embedded systems.

Hardware solutions usually require substantial changes to the processor architecture that impact all programs running on the processor and also incur performance overhead on non-security-critical programs. A secure processor that can protect against side-channel attacks using masking and hiding techniques is proposed in [10]. Besides an independent data path to implement the masking scheme, a pipeline randomizer adds non-deterministic dummy control and data signals to the processor data path. Another secure processor, called Ascend, is presented in [11]; it relies on an ORAM controller to obfuscate the address bus. However, information leakage cannot be prevented due to on-chip resource sharing. An instruction shuffler is proposed in [12] to shuffle independent instructions randomly for protection against side-channel attacks.

In this paper, we propose the CIDPro framework as a countermeasure against timing side-channel attacks, especially in low noise embedded systems, by utilizing both software and hardware methods with minimal resource overhead.

III. PROPOSED SOLUTION

In this section, we describe the proposed CIDPro framework and its hardware implementation that utilizes custom instructions to effectively reduce timing side-channel leakage with low hardware cost and acceptable performance overhead.

A. CIDPro Framework

The premise of our framework is hardware diversification using custom instructions to execute diversified (time-varying) instructions (DIs) as shown in Figure 2.

The original source code of the program is given to the framework, which consists of two parts: 1) an LLVM pass transforming the source code into an assembly file, and 2) custom instruction generation of a set of m diversified instructions $DI = \{DI_1, DI_2, \ldots, DI_m\}$. Each $DI_i \in DI$ with $1 \le i \le m$ performs an arithmetic operation such as an ADD, MUL, etc., which is usually found in the source code of cryptographic primitives. Further, each $II_i^j \in DI_i$ with $1 \le j \le n$ represents a diversified version of the same operation. Therefore every instance $II_i^j$ provides the same functionality (i.e. $f(II_i^1) = f(II_i^2) = \cdots = f(II_i^n)$), but exhibits different execution time characteristics.

For each $DI_i$, a corresponding hardware module $CoPr(DI_i)$ is implemented in a hardware description language (HDL). These modules are integrated as custom instructions (CI) in the Secured Processor together with a Diversity Control Unit and a Pseudo Random Number Generator (PRNG). At runtime, the Diversity Control Unit selects the corresponding $DI_i$ based on the CI, while the random number determines the $II_i^j$ to be executed.
