Novel Side-Channel Attacks on Emerging Cryptographic Algorithms and Computing Systems
A Dissertation Presented by
Chao Luo
to
The Department of Electrical and Computer Engineering
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
in
Computer Engineering
Northeastern University Boston, Massachusetts
December 2018

To my family.
Contents
List of Figures
List of Tables
List of Acronyms
Acknowledgments
Abstract of the Dissertation
1 Introduction
1.1 Motivation
1.2 Research Agenda
2 Side-Channel Analysis of XTS-AES
2.1 Introduction and Motivation
2.2 Preliminaries
2.2.1 XTS-AES Algorithm
2.2.2 Attack and Leakage Model
2.3 Simple Power Analysis of Software Implementation on Microcontroller
2.4 Horizontal Attack of Hardware Implementation: Analysis of Modular Multiplication
2.4.1 Tweak Generation Leakage Analysis without Noise
2.4.2 Improved Tweak Recovery
2.4.3 Block Tweak Leakage Analysis with Noise
2.4.4 Experimental Results
2.5 Vertical Attack of Hardware Implementation: CPA on XTS-AES
2.6 Countermeasures
2.7 Summary
3 Side-channel Analysis of AES on GPU
3.1 Introduction and Motivation
3.2 Preliminaries
3.2.1 GPU Basics
3.2.2 AES and a CUDA implementation of AES
3.2.3 Side-channel Attack and Typical Correlation Power analysis
3.2.4 Attack Model
3.3 Experimental Setup and Power Trace Acquisition
3.4 Power Model Building
3.4.1 Hamming Distance Based Power Leakage Extraction
3.4.2 GPU’s Power Leakage Model
3.5 Key Discovery of GPU by Power Analysis Attacks
3.5.1 Full Key Extraction
3.5.2 A More Realistic Execution Environment
3.6 Countermeasures
3.7 Summary
4 Side-channel Analysis of RSA on GPU
4.1 Introduction and Motivation
4.2 Background: RSA and GPU Implementation
4.2.1 Sliding Window Exponentiation
4.2.2 Montgomery Multiplication
4.2.3 GPU Parallelization of RSA
4.3 The Timing Models of RSA on GPUs
4.3.1 GPU Timing Model
4.3.2 Timing Model Verification
4.4 Correlation Timing Attack
4.4.1 Attack CLNW
4.4.2 Attack VLNW
4.5 Success Rate Analysis
4.6 Error Correction
4.7 Experimental Results
4.8 Countermeasures
4.9 Summary
5 Side-channel Analysis of ECC on Embedded Systems
5.1 Introduction and Motivation
5.2 Preliminary
5.2.1 ECC Background
5.2.2 Side-Channel Countermeasures of micro-ecc
5.3 Novel Simple Power Analysis
5.4 Collision Attack
5.5 Discussion
5.6 Summary
6 Conclusion
Bibliography
List of Figures
2.1 Diagram of XTS-AES sector encryption
2.2 Power difference for Tj[127] = 1 and 0
2.3 Operation of modular multiplication for different cases of {Tj[127], Tj+1[127]}
2.4 Hamming weights/distances of block tweaks
2.5 Probability distribution of number of possible values for the 7 least significant bits
2.6 Bit error rate of Bayesian test and ML-based test
2.7 Distribution of power difference ΔPj
2.8 Comparison of ΔHDj and ΔPj
2.9 Experiment results with Bayesian test
2.10 Correlation coefficient with T[0] and RT[0]
2.11 Power difference for Tj[127] = 1 and 0 with dummy XOR protection
2.12 Comparison of ΔHDj and ΔPj with dummy bit protection
3.1 Typical CUDA threads and blocks present in a single grid [1]
3.2 Block diagram of a TESLA C2070 streaming multiprocessor [1]
3.3 The round operation running as one thread
3.4 The power measurement setup used in this work
3.5 A sample power trace of our GPU running AES, with the DC signal subtracted
3.6 Last round operation on registers for one state byte
3.7 Distribution of confusion coefficient for one byte of the key for the GPU
3.8 Distribution of the confusion coefficient without linearity
3.9 Correlation between the power traces and the Hamming distances for all possible subkey byte values
3.10 Our CPA attack results
3.11 Success rate with different combinations of linear and nonlinear Hamming distances
3.12 Empirical and theoretical success rates for 8, 16 and 32 blocks of plaintext
4.1 Timing model verification
4.2 Operations on Mtemp with CLNW
4.3 Operations on Mtemp with VLNW
4.4 Theoretic and empirical success rate
4.5 Sequence of correlation coefficients of a timing attack when an error happens
4.6 Correlation coefficients of attacking zero and nonzero windows
4.7 Always reduce countermeasure
4.8 Random assignment countermeasure
5.1 Modular multiplication and addition
5.2 Simple power leakage from power and EM traces
5.3 Correlation of power trace with sliding multiplication pattern
5.4 Count number of additions after modular multiplication
5.5 Ephemeral key candidate search
5.6 Power and EM trace collision
List of Tables
2.1 Threshold and BER for Different SNR
2.2 Complexity of Search Among Erroneous Bits
4.1 Attack result with error correction
5.1 Attacks and Countermeasures
List of Acronyms
AES Advanced Encryption Standard. The Advanced Encryption Standard (AES), also known by its original name Rijndael, is a specification for the encryption of electronic data established by the U.S. National Institute of Standards and Technology (NIST) in 2001.
CLNW Constant Length Nonzero Window. A sliding window exponentiation algorithm for RSA that partitions the private key into windows; each nonzero window has a constant bit length.
CPA Correlation Power Analysis.
DPA Differential Power Analysis.
ECC Elliptic-Curve Cryptography. ECC is an approach to public-key cryptography based on the algebraic structure of elliptic curves over finite fields. ECC requires smaller keys compared to non-EC cryptography (based on plain Galois fields) to provide equivalent security.
RSA Rivest–Shamir–Adleman. RSA is an algorithm used by modern computers to encrypt and decrypt messages. It is an asymmetric cryptographic algorithm, meaning that there are two different keys. This is also called public-key cryptography, because one of the keys can be given to anyone, while the other key must be kept private. The algorithm is based on the difficulty of factoring a large composite number that is the product of two large primes, a problem known as prime factorization.
SPA Simple Power Analysis.
VLNW Variable Length Nonzero Window. A sliding window exponentiation algorithm for RSA that partitions the private key into windows; each nonzero window has a variable bit length.
XTS-AES XEX-based (XOR-encrypt-XOR) tweaked-codebook mode with ciphertext stealing for AES. An AES mode designed for disk encryption and standardized on 2007-12-19 as IEEE P1619.
Acknowledgments
Thanks to my advisor Professor Yunsi Fei, who guided and supported me through the years of my PhD study and research. Thanks to Professor David Kaeli for all the help with my research. Thanks to Professor Adam Ding for the help with the mathematics involved in my research. Thanks to Professor Pau Closas for his contribution to the statistical analysis. Thanks to Professor Aatmesh Shrivastava for proofreading.
Abstract of the Dissertation
Novel Side-Channel Attacks on Emerging Cryptographic Algorithms and Computing Systems
by
Chao Luo
Doctor of Philosophy in Computer Engineering
Northeastern University, December 2018
Dr. Yunsi Fei, Advisor
After more than 20 years of research and development, side-channel attacks continue to pose serious threats to various computing systems. When targeting cryptographic implementations to retrieve the secret, side-channel attacks exploit the peculiarities of specific implementations. They achieve much better efficiency than brute-force attacks and than traditional cryptanalysis, which attacks weaknesses of the cryptographic algorithms themselves. Typical side channels include power consumption, electromagnetic emanation, and execution time. Because this side-channel information is inherently correlated with the secret, statistical analysis can be employed to find the secret. However, side-channel research still faces many challenges, driven by two trends: new ciphers and emerging computing platforms. New ciphers or variants are being developed to provide a higher level of security or to be tailored to different applications. For example, XTS-AES (XEX-based tweaked-codebook mode with ciphertext stealing AES) is a security-hardened mode of AES for storage systems. It increases the algorithm complexity and hides more system-dependent parameters from users (attackers). Meanwhile, we see more emerging computing platforms, both for general-purpose computing and for acceleration of specific algorithms. Graphics Processing Units (GPUs) have been used to run a range of cryptographic algorithms for higher performance. However, the security of GPUs when processing sensitive data has received little attention and is largely unexplored, especially with respect to the highly relevant side-channel vulnerabilities. Yet GPUs differ distinctly from other computing platforms in terms of hardware structure and software programming model, making side-channel attacks on GPUs much more challenging. In this dissertation, I propose several novel side-channel attacks. They target new ciphers, including XTS-AES and ECC, and also popular accelerators, namely GPUs. Some of our vulnerability analyses and security evaluations are the first of their kind, and we anticipate that they will pave the way for mitigations and lead to more active side-channel research. The contributions of this dissertation include:
• Evaluation of the security of XTS-AES. XTS-AES features two secret keys instead of one, and an additional tweak for each data block. These characteristics make the mode not only resistant to cryptanalysis attacks, but also more challenging for side-channel attacks. In this project, I comprehensively analyze the side-channel power leakage of various XTS-AES implementations and invent effective attacks. I first perform a simple power analysis of a software implementation. For a hardware implementation on FPGA, I analyze the side-channel leakage of the particular modular multiplication in XTS-AES mode. In addition, I utilize the relationship between two consecutive block tweaks and propose a method to work around the masking of the ciphertext by the tweak. These attacks are verified on an FPGA implementation of XTS-AES. The results show that XTS-AES is susceptible to side-channel power analysis attacks, and therefore dedicated protections are required to secure XTS-AES in storage devices.
• Analysis of the power side channel of a GPU running AES. I propose and implement a side-channel power analysis methodology to extract all the last-round key bytes of an AES implementation on an NVIDIA TESLA GPU. I first analyze the challenges of capturing GPU power traces, which arise from the degree of concurrency and the underlying architectural features of a GPU, and propose techniques to overcome these challenges. I then construct an appropriate power model for the GPU. I describe effective methods to process the GPU power traces and launch a correlation power attack on the processed data. I carefully consider the scalability of the attack with increasing degrees of parallelism, a key challenge on the GPU. Both our empirical and theoretical results show that parallel computing hardware systems such as a GPU are vulnerable to power analysis side-channel attacks, and need to be hardened against such threats.
• Analysis of the timing leakage of a public-key cipher, RSA, on GPU. I build a timing model to capture the parallel characteristics of an RSA public-key cipher as implemented on a GPU. The model considers optimizations that include Montgomery multiplication and sliding window exponentiation. Our timing model considers the challenges of parallel execution, complications that do not occur on single-threaded computing platforms. I describe the first successful timing attack on RSA running on a GPU, extracting the private key of RSA. I also present an effective error detection and correction mechanism. The results demonstrate that GPU acceleration of RSA is vulnerable to side-channel timing attacks. Countermeasures to defend against such attacks are proposed for incorporation into next-generation GPU implementations.
• Power analysis methods on a real-world ECC library, micro-ecc [2]. micro-ecc incorporates many countermeasures against power analysis attacks on ECC and is the state-of-the-art implementation. Still, I discover two new side-channel leakages. I propose and evaluate attacks that exploit these two weaknesses. I demonstrate practical full-key recovery on both AVR and ARM embedded systems using either a single power trace or a single electromagnetic emanation trace. These attacks also apply to other IoT systems that use micro-ecc, and represent information leakage threats to a wide class of security services built on top of ECC.
Chapter 1
Introduction
1.1 Motivation
Side-channel attacks were first proposed by Kocher in the seminal work [3] in 1996. They fundamentally changed the notion of cryptanalysis. A side-channel attack exploits information leaked by the physical implementation of a cryptographic algorithm, rather than only analyzing the algorithm's theoretical weaknesses. Typical side channels include power consumption, electromagnetic (EM) emanation, and execution time. The analysis method depends on the side-channel leakage and the properties of the system under attack. It can be simple power (EM) analysis, differential power (EM) analysis, or advanced statistical analysis such as the Mutual Information Attack [4], the Template Attack [5] and the Timing Attack [6]. If no dedicated protection is employed, side-channel attacks have been demonstrated to be powerful and effective against all widely used ciphers, such as DES, AES and RSA. This holds across various computing systems, such as embedded systems, FPGAs and CPUs. However, the creation of new ciphers and the emergence of new computing platforms present new challenges and opportunities for side-channel analysis. New ciphers or variants are designed to increase the security level or to be tailored to specific applications. These ciphers significantly impact our daily lives through the wide deployment of security engines in diverse applications and infrastructures. Their mathematical security has been scrutinized by classic cryptanalysis, and they have been certified and standardized by government agencies. However, the side-channel vulnerabilities of these ciphers have not yet been fully evaluated. They also introduce new challenges for side-channel research, because the algorithms are much more complex. Existing side-channel attacks may not apply to them, and the algorithms have to be strictly scrutinized for potential side-channel leakage. My dissertation first targets XTS-AES, a security-hardened advanced mode of AES proposed by IEEE and approved
by NIST as an encryption mode for block-oriented storage systems. It enhances security over existing AES modes in multiple ways. First, it uses two secret keys instead of one. The two keys are independent of each other, so knowing only one of them does not break the system. Second, the input and output of AES encryption are masked by a tweak that depends on the logical address. This tweak is generated by another AES encryption and modular multiplication. As a result, the internal state of the cipher is blinded to attackers. Since its approval by NIST, it has been widely used in hard disk drives, solid-state drives and flash drives. It was also introduced into Windows 10’s BitLocker encryption. Given this broad usage in real systems, any side-channel vulnerability would be detrimental to system security. I thoroughly analyze the algorithmic characteristics of XTS-AES, discover unique side-channel power leakage of XTS-AES operations, and invent novel attacks, revealing the weakness of XTS-AES to side-channel analysis. Post-quantum cryptography is currently being developed for next-generation security engines, and this line of research on new ciphers will provide guidelines for its side-channel analysis. Another emerging development for cryptographic implementations is that ciphers are migrating from conventional sequential computing platforms (CPUs) to parallel computing platforms for higher performance. Graphics Processing Units (GPUs) have been employed for general-purpose computing in addition to traditional graphics rendering, including accelerating a range of cryptographic algorithms. Block ciphers are well suited to parallel computing because their data blocks are independent and undergo common operations. RSA, a computation-intensive public-key cipher, is notorious for its low performance, so acceleration is desirable.
There have been many crypto libraries porting both block ciphers and public-key ciphers onto GPUs, with throughput improvements reaching hundreds to thousands of times compared to the fastest CPUs. However, research on GPU security, especially on its side-channel vulnerability, is still in its infancy. Our exploration along this direction is the first of its kind. Analyzing the side-channel vulnerabilities of GPUs is very challenging, because their characteristics differ significantly from those of other computing platforms, including CPUs and FPGAs. Under the unique SIMT (Single Instruction Multiple Thread) model, many threads run concurrently on different data, introducing enormous noise into the physical side channels the attacker monitors. In addition, the timing of execution is uncertain because these threads are scheduled on the underlying processing elements in a non-deterministic and unknown way. Side-channel analysis, by contrast, often requires alignment between execution and measurement. Performing effective side-channel analysis on GPUs is therefore a daunting task. In this dissertation, I target side-channel analysis of GPUs. I propose methods to overcome these challenges and design novel attacks that succeed against both AES and RSA implementations on GPUs.
The last work of my PhD research evaluates the side-channel (power analysis) vulnerability of micro-ecc, a real-world ECC library designed to be side-channel resistant. ECC, a relatively new public-key cipher, achieves much higher computational efficiency with a shorter key size than RSA. Many countermeasures developed against known side-channel vulnerabilities of ECC have been incorporated into micro-ecc, making it the state-of-the-art implementation of ECC. micro-ecc is widely deployed in embedded systems, including the burgeoning Internet-of-Things (IoT) devices and systems. Despite its claimed side-channel resistance [2], I discovered two vulnerabilities in the implementation and designed efficient attack methods to retrieve the secret key. Countermeasures against these attacks are also proposed in this dissertation.
1.2 Research Agenda
With the aforementioned motivations, my PhD dissertation investigates several new ciphers and cipher modes, as well as an emerging computing platform for cryptographic implementations, the GPU. I investigate the following topics in the rest of the dissertation:
1. Power analysis of XTS-AES: I target an XTS-AES implementation on an FPGA using power analysis. The novelty lies in the discovery of leakage from the modular multiplication of XTS-AES, which had not been investigated before. I also discover that the relationship between consecutive block encryptions can be leveraged to break the system. I design attack methods that utilize these new vulnerabilities to retrieve the full secret key.
2. Power analysis of AES on GPU: This is the first successful side-channel power analysis of a GPU. I derive a power model and a corresponding data processing method to overcome the challenges introduced by the GPU’s parallel computing features. Correlation power analysis is then used with this power model to extract the secret key. The results show, for the first time, that GPUs are vulnerable to side-channel power analysis, and that side-channel countermeasures should be developed for GPUs as well.
3. Timing analysis of RSA on GPU: I target a popular RSA implementation on GPU with Montgomery multiplication and sliding window exponentiation. A hierarchical timing model is built to explicitly capture various complex interactions in a massively parallel computing platform. Based on this model, I design a successful correlation timing analysis that retrieves the private key iteratively. An effective error correction algorithm is designed to detect and correct attack errors, significantly improving the success rate.
4. Power analysis of a state-of-the-art ECC implementation: I target a real-world implementation of ECC, micro-ecc. Two new vulnerabilities are discovered even though it is already protected against common side-channel power attacks, including SPA and DPA. A simple power analysis and a collision attack are designed to exploit the two weaknesses, respectively, and they successfully recover the private key of ECC. I also propose effective countermeasures to prevent such attacks.
Chapter 2
Side-Channel Analysis of XTS-AES
2.1 Introduction and Motivation
XTS-AES (XEX-based tweaked-codebook mode with ciphertext stealing) [7] is an AES (Advanced Encryption Standard) mode designed specifically for data protection on block-oriented storage devices. It has been widely used on hard disk drives (HDD), solid-state disks (SSD) and flash cards. It protects the confidentiality of sensitive data even when adversaries have physical access to the device. The US National Institute of Standards and Technology (NIST) approved its use in 2010 [8]. It adopts AES as its block cipher, but involves two phases of AES encryption with different keys. In the first phase, a series of 128-bit block tweaks is generated by one AES encryption under a tweak key, followed by repeated multiplication by 2 in the finite field GF(2^128). The second phase is for data encryption: every input and output block of AES is XORed with a distinct block tweak generated in the first phase. To reveal the plaintext of the data stored on a protected device, the attacker has to infer the block tweaks in addition to the data encryption key, both of which are unknown to adversaries. The security of different modes of AES under side-channel analysis has been evaluated by many researchers. Jaffe described a differential power analysis (DPA) attack against the counter (CTR) mode without knowledge of the initial counter [9]. In [10], Jayasinghe et al. quantitatively studied the common advanced modes under side-channel analysis, including Cipher Block Chaining (CBC), Cipher Feedback (CFB), Output Feedback (OFB) and CTR. Recently, the security of the new XTS-AES mode has also attracted considerable research attention. Unterluggauer et al. pointed out that the data encryption key can be extracted by one extra attack on the second-last round [11], in addition to the normal attack on the last round. However, the attack on the second-last round needs to deal with
the MixColumn operation [12], which increases the attack complexity because four subkey bytes are involved. Side-channel information may also leak through modular multiplication in a finite field and be used to launch attacks. In [13, 14], Belaïd et al. described a side-channel analysis of finite field multiplication. Their attack works when the secret operand of the multiplication is constant and the known operand changes. However, their attack is not applicable to XTS-AES, where the secret operand (the tweak) keeps changing: it is multiplied by the constant 2 at every step of XTS-AES’s modular multiplication.
2.2 Preliminaries
In this section, I first give a brief introduction of XTS-AES and its modular multiplication. I then describe the attack model and the leakage model of XTS-AES on sector-based storage drives.
2.2.1 XTS-AES Algorithm
XTS-AES mode is an instantiation of Rogaway’s XEX (XOR Encrypt XOR) tweakable block cipher [15]. It provides stronger security than the common Electronic Codebook (ECB) mode, and it can be parallelized, giving better performance than CBC mode [16]. It protects data against ciphertext manipulation and cut-and-paste attacks [17]. The data to be encrypted is divided into equal-sized sectors containing multiple data blocks (e.g., 128 bits each for AES-128). The typical sector size of storage devices was 512 bytes before 2011 and 4 KB afterwards [18]. In this chapter, we consider a sector size of 4 KB, which consists of 256 128-bit blocks. Fig. 2.1 shows the process of an XTS-AES sector encryption. The initial 128-bit tweak, T0, is normally the AES-128 encryption, under a tweak key KeyT, of the logical address (i) of the sector where the data is stored. T0 is then multiplied by α, a primitive element of GF(2^128) that equals 2 here, to produce the next block tweak T1, and so on for further block tweaks. Each block tweak is XORed with the plaintext block before the data encryption (masking), and again with the AES output after the data encryption (demasking) to generate the final ciphertext. The modular multiplication by α can be described as follows. With each block tweak
Tj represented as a vector of 128 bits (Tj[127], Tj[126], ..., Tj[0]), the relationship between Tj and Tj+1 is:
$$
T_{j+1}[i] =
\begin{cases}
T_j[(i-1)\,\%\,128] \oplus T_j[127] & \text{for } i \in \{1, 2, 7\} \\
T_j[(i-1)\,\%\,128] & \text{otherwise}
\end{cases}
\tag{2.1}
$$
Figure 2.1: Diagram of XTS-AES sector encryption
where ⊕ is bitwise XOR and % is the integer modulo operation (so that (0 − 1) % 128 = 127). That is, when Tj[127] = 0, the modular multiplication is a one-bit cyclic left shift; otherwise, it is a one-bit cyclic left shift followed by inverting the bits at indices 1, 2 and 7.
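As a cross-check of Equation (2.1), the following Python sketch (our own illustration, not code from [7]; the function names are hypothetical) stores a tweak as a 128-bit integer whose bit i is Tj[i]. It applies the bit-level rule of (2.1) and compares the result with the shift-and-reduce form, which XORs the shifted value with 0x87 (binary 10000111, i.e., bits 0, 1, 2 and 7) when Tj[127] = 1.

```python
import random

MASK128 = (1 << 128) - 1  # keep results within 128 bits


def mul_alpha_bits(t: int) -> int:
    """Tj+1 = Tj * alpha computed bit by bit as in Equation (2.1).

    Bit i of the integer t is Tj[i]; bit 127 is the most significant bit.
    """
    msb = (t >> 127) & 1
    out = 0
    for i in range(128):
        bit = (t >> ((i - 1) % 128)) & 1  # cyclic left shift by one bit
        if i in (1, 2, 7):
            bit ^= msb                    # conditional inversion when Tj[127] = 1
        out |= bit << i
    return out


def mul_alpha_shift(t: int) -> int:
    """Same operation written as a logical shift followed by XOR with 0x87."""
    msb = (t >> 127) & 1
    shifted = (t << 1) & MASK128
    return shifted ^ (0x87 if msb else 0)


if __name__ == "__main__":
    for _ in range(1000):
        t = random.getrandbits(128)
        assert mul_alpha_bits(t) == mul_alpha_shift(t)
    print("Equation (2.1) matches the shift-and-XOR-0x87 form")
```

Bit 0 needs no special case in the bit-level version. The cyclic shift already moves Tj[127] into position 0, which is why only indices 1, 2 and 7 appear in (2.1), while the constant 0x87 also has bit 0 set.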
2.2.2 Attack and Leakage Model
In the attack model, we assume that the adversary knows only the stored ciphertext,
Cj, which can be obtained physically from the storage device. This is also the threat model considered in the design of XTS-AES [7]. The plaintext and the block tweaks are never revealed. The logical sector address, i, can be observed by probing the memory bus [11]. However, it can easily be hidden by adding an unknown offset to it or by other obfuscation methods, so we consider it unknown for general attacks. The output of the data encryption, CCj, is also unknown because the block tweak Tj is unknown. For the leakage model, we assume that the Hamming weight or Hamming distance of intermediate data is leaked with Gaussian noise. The Hamming weight model is commonly used for attacking microprocessor implementations, while the Hamming distance model is more suitable for hardware implementations such as FPGAs and ASICs. The goal of our side-channel power analysis is to recover both the encryption key and the tweak key.
The block tweak is the key element here. Once T0 is known for every sector, each associated with a different address i, the tweak key KeyT can be recovered by a classical CPA that treats T0 as the ciphertext. To recover the encryption key KeyE, the ciphertext on the disk is first demasked with the block tweak (CCj = Cj ⊕ Tj). CPA can then be applied using the now-known encryption output CC.
2.3 Simple Power Analysis of Software Implementation on Microcontroller
In our prior work [19], we focused on attacking hardware implementations. For a comprehensive power analysis of the XTS-AES algorithm, we also have to consider software implementations. In this section, we evaluate the side-channel vulnerability of software implementations and perform a simple power analysis on a microcontroller. The recommended software implementation of XTS-AES in [7] is vulnerable to simple power analysis, which leads to the disclosure of the tweak T0. When the (j+1)th block’s tweak Tj+1 is updated from Tj as in Equation (2.1), there is a conditional extra XOR operation that depends on the most significant bit of Tj. Algorithm 1 depicts the tweak updating process, where Tj(i) is the ith byte of Tj, and each tweak has 16 bytes.
Algorithm 1 Tweak Updating
Input: Tj
Output: Tj+1 = Tj ⊗ α
1: Cin ← 0
2: for i = 0 to 15 do {16-byte left shifting}
3:   Cout ← (Tj(i) >> 7) & 1
4:   Tj+1(i) ← ((Tj(i) << 1) + Cin) & 0xFF
5:   Cin ← Cout
6: end for
7: if Cout == 1 then
8:   Tj+1(0) ← Tj+1(0) ^ 0x87
9: end if
10: return Tj+1
In Algorithm 1, the input Tj and output Tj+1 are two consecutive 128-bit tweaks. Lines 2-6 implement the 16-byte logical left shift (the vacated rightmost bit is filled with zero), and the leftmost bit (MSB) is saved in Cout. The loop starts from the least significant byte, Tj(0), and iterates 16 times over all the bytes. For each byte, Cout extracts the most significant bit of the byte, and Cin is the MSB of the next lower byte. Lines 7-9 perform the conditional XOR: if the final Cout, i.e., Tj[127], is one, the generated tweak Tj+1 is XOR-ed with 0x87.
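Algorithm 1 can be transcribed into Python for experimentation as follows. This is a sketch that assumes the byte order of Algorithm 1: Tj(0) is the least significant byte, and the MSB of Tj(15) is Tj[127]. The function name update_tweak is hypothetical. The comments map each statement to its line in Algorithm 1, and the if-statement reproduces the data-dependent branch whose extra XOR at line 8 shows up in the power trace.

```python
def update_tweak(t: bytes) -> bytes:
    """Algorithm 1: return Tj+1 = Tj * alpha for a 16-byte tweak.

    t[0] is the least significant byte Tj(0); the MSB of t[15] is Tj[127].
    """
    if len(t) != 16:
        raise ValueError("tweak must be 16 bytes")
    out = bytearray(16)
    c_in = 0                                    # line 1
    c_out = 0
    for i in range(16):                         # line 2: 16-byte left shift
        c_out = (t[i] >> 7) & 1                 # line 3: MSB of byte i
        out[i] = ((t[i] << 1) + c_in) & 0xFF    # line 4
        c_in = c_out                            # line 5: carry into next byte
    if c_out == 1:                              # line 7: branch on Tj[127]
        out[0] ^= 0x87                          # line 8: the extra XOR
    return bytes(out)
```

For example, update_tweak(bytes(15) + b"\x80"), a tweak whose only set bit is Tj[127], returns b"\x87" followed by fifteen zero bytes. A tweak with Tj[127] = 0 never reaches line 8.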
If Tj[127] = 1, there will be an extra XOR operation at line 8. As a result, the power consumed by this XOR operation can be observed in the power trace when it is compared with another trace without the extra XOR operation. The power profile is longer and has a different shape. The most significant bit of Tj can therefore be extracted by simple power analysis, as shown in Fig. 2.2. The power traces are measured from an ATmega328p microcontroller running XTS-AES encryption. As shown in Section 2.4, T0 can be recovered from Tj[127] for j = 0, ..., 127. There are multiple quick fixes for this vulnerability. One is to balance the branch by also performing an XOR operation when Tj[127] = 0, with the constant 0x00 instead of 0x87. Another is to use two extra variables to hold the values of Tj+1(0) and Tj+1(0) ^ 0x87, and copy one of them to the final output Tj+1(0) according to Tj[127]. This incurs execution overhead, and the extra operations may be optimized away by the compiler. Both methods are effective for hiding the value of Tj[127] on the ATmega328p from simple power analysis. However, they may still be vulnerable to advanced side-channel attacks such as template attacks [20].
[Plot: normalized voltage versus sampling points (0 to 8000) for Tj[127] = 1 and Tj[127] = 0, with the extra XOR operation marked]
Figure 2.2: Power difference for Tj[127] = 1 and 0
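The first fix, balancing the branch, can be sketched in Python as follows; the function name is hypothetical. Instead of an if-statement, the XOR at line 8 of Algorithm 1 is always executed. The constant is selected arithmetically: a mask that is 0xFF when Tj[127] = 1 and 0x00 otherwise turns 0x87 into either 0x87 or 0x00. Python itself gives no guarantees about timing or power, so this only illustrates the data flow that a C implementation on the ATmega328p would need. As noted above, such a balanced implementation may still leak to template attacks.

```python
def update_tweak_balanced(t: bytes) -> bytes:
    """Tweak update whose XOR at line 8 is executed unconditionally (sketch).

    The XOR constant is 0x87 when Tj[127] = 1 and 0x00 when Tj[127] = 0,
    selected with a mask instead of an if-statement.
    """
    if len(t) != 16:
        raise ValueError("tweak must be 16 bytes")
    out = bytearray(16)
    c_in = 0
    for i in range(16):                         # same 16-byte left shift
        c_out = (t[i] >> 7) & 1
        out[i] = ((t[i] << 1) | c_in) & 0xFF
        c_in = c_out                            # after the loop: Tj[127]
    mask = (-c_in) & 0xFF                       # 0xFF if Tj[127] = 1, else 0x00
    out[0] ^= 0x87 & mask                       # XOR performed on every call
    return bytes(out)
```

The same idea applies to the second fix. A selection by mask avoids the two-variable copy, which a compiler could turn back into a branch.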
2.4 Horizontal Attack of Hardware Implementation: Analysis of Modular Multiplication
For hardware implementations, simple power analysis is no longer effective. Lines 7 to 9 of Algorithm 1 are commonly implemented as a multiplexer that selects either the original Tj+1(0) or the
XOR-ed result as the final value of Tj+1(0), based on the value of Tj[127]. For different values of
Tj[127], there is no timing difference, and the power difference is minimal. We therefore attack the modular multiplication in XTS-AES to recover the block tweaks. These operations proceed horizontally within a data sector encryption, as shown in Fig. 2.1, generating the block tweaks consecutively, so we call this attack a horizontal attack. We first ignore the effect
of noise and demonstrate how to obtain the value of T0 bit by bit. We then show how to recover the bit information in the presence of noise and how to correct errors when they occur.
2.4.1 Tweak Generation Leakage Analysis without Noise
We assume that all the tweaks are stored in the same register. Its Hamming weight and Hamming distance are HW(Tj) and HDTj+1 = HW(Tj+1 ⊕ Tj), respectively, for j ≥ 0. Under the Hamming weight power leakage model, the Hamming weights can be obtained by simple power analysis (SPA) of a single side-channel power trace. If Tj[127] is 0, then according to (2.1), Tj+1 is Tj rotated left by one bit, so HW(Tj+1) = HW(Tj). Otherwise, three bits are inverted in Tj+1, which changes the Hamming weight by either ±1 or ±3. We denote the Hamming weight difference between two consecutive block tweaks in a sector as:
$$\Delta HW_{j+1} = HW(T_{j+1}) - HW(T_j), \quad j \ge 0 \tag{2.2}$$
This can be summarized as:
$$\Delta HW_{j+1} = \begin{cases} 0, & T_j[127] = 0 \\ \Delta, & T_j[127] = 1,\ \Delta \in \{\pm 1, \pm 3\} \end{cases} \tag{2.3}$$
By observing ΔHWj+1, we can tell the value of Tj[127]. After obtaining {T0[127], T1[127], ..., T127[127]}, the MSBs of 128 consecutive block tweaks within one sector, we can reconstruct T0 based on (2.1):
$$T_0[n] = \begin{cases} T_{127-n}[127], & 127 \ge n \ge 7 \\ T_{127-n}[127] \oplus T_0[121+n], & 6 \ge n \ge 2 \\ T_{126}[127] \oplus T_0[122] \oplus T_0[127], & n = 1 \\ T_{127}[127] \oplus T_0[121] \oplus T_0[127] \oplus T_0[126], & n = 0 \end{cases} \tag{2.4}$$
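The noise-free Hamming weight attack can be checked end to end with a small simulation. The Python sketch below assumes that Eq. (2.1) is the standard XTS multiplication by α modulo x^128 + x^7 + x^2 + x + 1 (shift left, then XOR 0x87 if the shifted-out bit was 1), with bit i of an integer standing for Tj[i]; the helper names are hypothetical. It reads Tj[127] from whether the Hamming weight changes, as in (2.3), and rebuilds T0 with (2.4).

```python
import random

MASK128 = (1 << 128) - 1


def next_tweak(t: int) -> int:
    """T_{j+1} = T_j * alpha (assumed XTS definition); bit i of t is T_j[i]."""
    msb = t >> 127
    t = (t << 1) & MASK128
    return t ^ 0x87 if msb else t


def hw(x: int) -> int:
    return bin(x).count("1")


def msbs_from_hw(t0: int) -> list[int]:
    """Noise-free HW leakage, Eq. (2.3): HW changes exactly when T_j[127] = 1."""
    msbs, t = [], t0
    for _ in range(128):                    # j = 0..127 needs T_0..T_128
        t_next = next_tweak(t)
        msbs.append(int(hw(t_next) != hw(t)))
        t = t_next
    return msbs


def reconstruct_t0(m: list[int]) -> int:
    """Rebuild T0 from m[j] = T_j[127] (j = 0..127) following Eq. (2.4)."""
    b = [0] * 128
    for n in range(127, 6, -1):             # 127 >= n >= 7
        b[n] = m[127 - n]
    for n in range(6, 1, -1):               # 6 >= n >= 2
        b[n] = m[127 - n] ^ b[121 + n]
    b[1] = m[126] ^ b[122] ^ b[127]
    b[0] = m[127] ^ b[121] ^ b[127] ^ b[126]
    return sum(bit << i for i, bit in enumerate(b))


if __name__ == "__main__":
    t0 = random.getrandbits(128)
    assert reconstruct_t0(msbs_from_hw(t0)) == t0
    print(f"recovered T0 = {t0:032x}")
```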
For the Hamming distance power leakage model, we define the differential of two consecutive Hamming distances in the register as:
$$\Delta HD_{j+2} = HD_{T_{j+2}} - HD_{T_{j+1}} = HW(T_{j+2} \oplus T_{j+1}) - HW(T_{j+1} \oplus T_j) \tag{2.5}$$
We consider all four cases of (Tj[127], Tj+1[127]) and illustrate the composition of Tj, Tj+1, and Tj+2 in Fig. 2.3. Each bit of Tj is represented as a box labeled with its index, and Tj+1 and Tj+2 are expressed in terms of the bits of Tj. A number with a bar over it denotes the inverse of that bit, and a number with two bars denotes the inverse of its inverse, which is the bit itself.
Figure 2.3: Operation of modular multiplication for different cases of {Tj[127], Tj+1[127]}
We denote the set of bit indices by I = {0, 1, 2, ..., 127}, and by I\A the complement of a subset A in I.
In the case (Tj[127], Tj+1[127]) = (0, 0), the bit values of the three tweaks do not change; only the bit positions in Tj+1 and Tj+2 change. The Hamming distances HDTj+1 and HDTj+2 are therefore both equal to the sum of the XOR of every bit of Tj with its neighboring bit:
$$HD^{00}_{T_{j+1}} = HD^{00}_{T_{j+2}} = \sum_{i \in I} T_j[i] \oplus T_j[(i+1) \bmod 128] \tag{2.6}$$
$$\Delta HD^{00}_{j+2} = HD^{00}_{T_{j+2}} - HD^{00}_{T_{j+1}} = 0 \tag{2.7}$$
The superscript denotes the case of (Tj[127], Tj+1[127]).
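For the case of (0, 1), three bits of Tj, namely Tj[127], Tj[0], and Tj[5], are flipped in Tj+2 at different positions, and therefore HDTj+1 and HDTj+2 differ in three places. We have
$$\begin{aligned} \Delta HD^{01}_{j+2} &= HD^{01}_{T_{j+2}} - HD^{01}_{T_{j+1}} \\ &= (\overline{T_j[127]} \oplus T_j[0]) + (\overline{T_j[0]} \oplus T_j[1]) + (\overline{T_j[5]} \oplus T_j[6]) \\ &\quad - (T_j[127] \oplus T_j[0]) - (T_j[0] \oplus T_j[1]) - (T_j[5] \oplus T_j[6]) \end{aligned} \tag{2.8}$$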
We know that $(\bar{a} \oplus b) - (a \oplus b) = 1 - 2(a \oplus b)$, whose value is either 1 or −1. Enumerating all the possibilities, we therefore have $\Delta HD^{01}_{j+2} \in \{-3, -1, 1, 3\}$. The case of (1, 0) is similar to (0, 1), with only the positions of the differences varied, so we also have $\Delta HD^{10}_{j+2} \in \{-3, -1, 1, 3\}$. For (1, 1), the same analysis gives $\Delta HD^{11}_{j+2} = 0$. In conclusion,
$$\Delta HD_{j+2} = \begin{cases} 0, & T_j[127] = T_{j+1}[127] \\ \Delta, & \text{otherwise},\ \Delta \in \{\pm 1, \pm 3\} \end{cases} \tag{2.9}$$
Unlike the Hamming weight model, SPA of the power trace only reveals the relationship between consecutive bits Tj[127] and Tj+1[127]. To reconstruct T0, we start with a guess of T0[127]; Tj[127] for j ≥ 1 can then be determined according to (2.9). T0 is then reconstructed as in the Hamming weight leakage model using (2.4), but with two candidates instead of one (one for T0[127] = 0 and one for T0[127] = 1). The erroneous candidate can be eliminated by the MSE-based verification described in Section 2.4.3.
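The Hamming distance variant can be sketched the same way. Under the same assumption about (2.1) (standard XTS multiplication by α, integer bit i standing for Tj[i], helper names hypothetical), the code below uses only whether consecutive Hamming distances are equal, as in (2.9), propagates both guesses of T0[127], and applies (2.4) to each, yielding the two candidates described above.

```python
import random

MASK128 = (1 << 128) - 1


def next_tweak(t: int) -> int:
    msb = t >> 127
    t = (t << 1) & MASK128
    return t ^ 0x87 if msb else t


def hw(x: int) -> int:
    return bin(x).count("1")


def hd_leakage(t0: int) -> list[int]:
    """hds[k] = HD_{T_{k+1}} = HW(T_{k+1} xor T_k) for k = 0..127 (noise-free)."""
    hds, t = [], t0
    for _ in range(128):
        t_next = next_tweak(t)
        hds.append(hw(t_next ^ t))
        t = t_next
    return hds


def msb_candidates(hds: list[int]) -> list[list[int]]:
    """Eq. (2.9): Delta HD_{j+2} == 0 iff T_j[127] == T_{j+1}[127]."""
    out = []
    for guess in (0, 1):                    # the two guesses of T_0[127]
        m = [guess]
        for j in range(127):                # Delta HD_{j+2} = hds[j+1] - hds[j]
            same = hds[j + 1] - hds[j] == 0
            m.append(m[-1] if same else 1 - m[-1])
        out.append(m)
    return out


def reconstruct_t0(m: list[int]) -> int:
    """Eq. (2.4) applied to m[j] = T_j[127]."""
    b = [0] * 128
    for n in range(127, 6, -1):
        b[n] = m[127 - n]
    for n in range(6, 1, -1):
        b[n] = m[127 - n] ^ b[121 + n]
    b[1] = m[126] ^ b[122] ^ b[127]
    b[0] = m[127] ^ b[121] ^ b[127] ^ b[126]
    return sum(bit << i for i, bit in enumerate(b))


if __name__ == "__main__":
    t0 = random.getrandbits(128)
    candidates = [reconstruct_t0(m) for m in msb_candidates(hd_leakage(t0))]
    print("true T0 among the two candidates:", t0 in candidates)
```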
A simulation of HW(Tj) and HDTj+1 with a random T0 is shown in Fig. 2.4, where HW(Tj) and HDTj+1 are marked ◦ when Tj[127] = 1 and ∗ otherwise. Under the Hamming weight model, the next Hamming weight changes whenever Tj[127] = 1. Under the Hamming distance model, the next Hamming distance changes whenever Tj[127] differs from the previous MSB.
Figure 2.4: Hamming weights/distances of block tweaks (top: Hamming weight HW(Tj); bottom: Hamming distance HDTj; x-axis: block sequence j; markers distinguish Tj[127] = 0 and Tj[127] = 1)
2.4.2 Improved Tweak Recovery
The method described in Section 2.4.1 requires 128 consecutive tweak generations: the MSBs of the tweaks are recovered first, and the individual tweaks are then derived from them. We take the Hamming weight model as an example for its simplicity; the method also applies to the Hamming distance model.
For one modular multiplication that generates Tj+1 from Tj, if Tj[127] = 1, the Hamming weight difference Δ between Tj+1 and Tj depends only on Tj[0], Tj[1], and Tj[6], because only these three bits are toggled (after the one-bit rotation they occupy positions 1, 2, and 7 of Tj+1). Each of them that is 0 becomes 1 in Tj+1 and contributes +1 to Δ; each that is 1 contributes −1. The relationship can be expressed as:
$$\Delta = \sum_{k \in \{0, 1, 6\}} \left(1 - 2\,T_j[k]\right) \tag{2.10}$$
When Tj[127] = 0, we gain no extra information, because all bit values are preserved and Δ = 0. Assuming a Hamming weight power model without noise, the differential of two consecutive Hamming weights, Δ, can be read directly from the power trace. The first Δ determines the MSB, T0[127], and, if T0[127] is one, also a relationship among the three bits T0[0], T0[1], and T0[6] according to Equation (2.10). Similarly, the second Δ determines the second MSB, T0[126], and also a relationship among bits {0, 1, 6} of T1, which, due to the circular shifting, consist of the already known T0[127] and of T0[0] and T0[5] (or their flipped values). The seven MSBs of T0 thus yield at most seven equations over the seven LSBs of T0. The problem can be formulated as a classical Boolean satisfiability problem and solved by a SAT solver. Intuitively, with at most seven equations over seven unknowns (we learn nothing from an MSB that is zero), the solution need not be unique. However, on average only a very small number of solutions (legal values) remain. To obtain the probability distribution of the number of solutions, we simulate the tweak generation with a randomly generated initial tweak, construct the corresponding equations using Equation (2.10) for j = 0, 1, ..., 6, and feed them into a SAT solver to obtain all possible solutions. The probability distribution of the number of solutions for the 7 least significant bits is shown in Fig. 2.5. The most likely outcome is 8 possible values, and the probability that at most 32 values remain is 94.34%. Hence, when we launch the attack with only 121 power traces instead of 128, most of the time we only have to test at most 32 candidate values, which is trivial for a modern computer.
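The dissertation solves these equations with a SAT solver. Because only seven unknown bits are involved, the Python sketch below swaps in plain enumeration of all 2^7 candidates instead; it assumes T0[127..7] has already been recovered from the MSBs and keeps every 7-bit value whose first seven Hamming weight differences match the observed ones. It also checks Eq. (2.10) against the simulation. The function names are hypothetical, the multiplication follows the standard XTS definition assumed for (2.1), and the printed histogram comes from this toy simulation, not from Fig. 2.5.

```python
import random
from collections import Counter

MASK128 = (1 << 128) - 1


def next_tweak(t: int) -> int:
    """Multiply by alpha: shift left, XOR 0x87 if T_j[127] was 1 (bit i of t is T_j[i])."""
    msb = t >> 127
    t = (t << 1) & MASK128
    return t ^ 0x87 if msb else t


def hw(x: int) -> int:
    return bin(x).count("1")


def delta_eq_2_10(t: int) -> int:
    """Eq. (2.10): HW(T_{j+1}) - HW(T_j) for a tweak with T_j[127] = 1."""
    return sum(1 - 2 * ((t >> k) & 1) for k in (0, 1, 6))


def first_deltas(t0: int, count: int = 7) -> list[int]:
    """Hamming weight differences Delta_1 .. Delta_count of the tweak sequence."""
    deltas, t = [], t0
    for _ in range(count):
        t_next = next_tweak(t)
        deltas.append(hw(t_next) - hw(t))
        t = t_next
    return deltas


def legal_lsb_values(t0: int) -> list[int]:
    """7-bit values consistent with the known T0[127..7] and the observed Delta_1..Delta_7."""
    high = t0 & ~0x7F                       # T0[127..7], assumed already recovered
    observed = first_deltas(t0)
    return [low for low in range(128) if first_deltas(high | low) == observed]


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 300
    hist = Counter(len(legal_lsb_values(rng.getrandbits(128))) for _ in range(trials))
    for n_values in sorted(hist):
        print(f"{n_values:4d} legal values: {hist[n_values] / trials:.3f}")
```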
Figure 2.5: Probability distribution of number of possible values for the 7 least significant bits (probability vs. number of legal values, from 1 to 128)
2.4.3 Block Tweak Leakage Analysis with Noise
Here we target an FPGA implementation, for which the Hamming distance is the more suitable power model. The physical side-channel measurements (e.g., power consumption) Pj are noisy observations of the targeted Hamming distances, Pj = ε·HDTj + Nj, j ≥ 1, where ε is the unit power consumption of one bit switching and Nj is the noise, normally following a Gaussian distribution N(c, σ²). The side-channel signal-to-noise ratio (SNR) is defined as the ratio between the variance of the deterministic component of the power consumption, HDTj, and the variance of the random noise, Nj. For the 128-bit Tj, Var(HDTj) = 128/4 = 32, so SNR = 32/σ², and the SNR can be obtained empirically. The attacker can find the noise variance σ² by repeating the encryption of a sector at a given logical address (the same i, and therefore the same T0 and the same sequence of modular multiplications) and computing the variance of the power measurement at a fixed position j, whose variation is due only to noise. The attacker can also compute the variance of the power consumption across different j within one sector; assuming that HDTj and Nj are independent, this variance is Var(HDTj) + σ². For simplicity, we consider the side-channel information normalized by ε, so that
Pj = HDTj + Nj for j ≥ 1. (2.11)
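The SNR estimation procedure described above can be simulated as follows. This Python sketch assumes zero-mean Gaussian noise (the mean c does not affect any variance), normalized units, and the standard XTS multiplication by α for (2.1); sigma, the number of repetitions, and the helper names are illustrative choices. It estimates σ² from repeated encryptions of the same sector at one position j, and shows that averaging r traces divides the noise variance by r.

```python
import random
import statistics

MASK128 = (1 << 128) - 1


def next_tweak(t: int) -> int:
    msb = t >> 127
    t = (t << 1) & MASK128
    return t ^ 0x87 if msb else t


def hd_values(t0: int, n: int = 128) -> list[int]:
    """Noise-free HD_{T_j} = HW(T_j xor T_{j-1}) for j = 1..n."""
    hds, t = [], t0
    for _ in range(n):
        t_next = next_tweak(t)
        hds.append(bin(t_next ^ t).count("1"))
        t = t_next
    return hds


def noisy_traces(hds: list[int], sigma: float, reps: int,
                 rng: random.Random) -> list[list[float]]:
    """`reps` encryptions of the same sector: P_j = HD_{T_j} + N_j, N_j ~ N(0, sigma^2)."""
    return [[h + rng.gauss(0.0, sigma) for h in hds] for _ in range(reps)]


def estimate_noise_variance(traces: list[list[float]], j: int = 0) -> float:
    """At a fixed position j only the noise varies across repetitions."""
    return statistics.variance([tr[j] for tr in traces])


def average_trace(traces: list[list[float]]) -> list[float]:
    return [sum(col) / len(col) for col in zip(*traces)]


if __name__ == "__main__":
    rng = random.Random(0)
    hds = hd_values(rng.getrandbits(128))
    traces = noisy_traces(hds, sigma=1.0, reps=2000, rng=rng)   # true SNR = 32 / 1.0
    print("estimated SNR:", 32 / estimate_noise_variance(traces))
    print("variance across j (Var(HD) + sigma^2):", statistics.variance(traces[0]))
    avg16 = average_trace(traces[:16])
    print("noise variance after averaging 16 traces:",
          statistics.variance([a - h for a, h in zip(avg16, hds)]))
```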
With noisy observations, the maximum likelihood principle [21] leads us to search for the T0 value that maximizes the likelihood of the observed data. Under the assumption of white
Gaussian noise, for an observed trace (P1, ..., Pn) within one sector, this reduces to the least-squares estimator, which minimizes the mean squared error (MSE):
$$\mathrm{MSE}(T_0^*) = \frac{1}{n} \sum_{j=1}^{n} \left(P_j - HD_{T_j^*}\right)^2 \tag{2.12}$$
where $HD_{T_j^*}$ (j ≥ 1) is the Hamming distance corresponding to $T_j^*$, produced by the modular multiplication of (2.1) from a guessed value $T_0^*$. Since $(P_j - HD_{T_j^*})^2 = (HD_{T_j} - HD_{T_j^*} + N_j)^2$, for large n we have MSE(T0*) ≈ σ² for T0* = T0, and MSE(T0*) ≈ 64 + σ² for T0* ≠ T0 (64 = 2 × 32, since the correct and guessed Hamming distances then behave approximately as two independent variables of variance 32). Therefore, when σ² is small (i.e., the SNR is large), the least-squares estimator can distinguish the correct tweak value T0. The attacker can always increase the SNR by attacking the averages of r traces (P̄1, ..., P̄n): averaging r independent noise samples gives N̄j a variance of σ²/r, so the SNR increases r times to 32r/σ². However, finding the minimum of (2.12) directly is computationally infeasible, since it requires enumerating all 2^128 possible values of T0. Therefore, we first recover the relationship
$$\delta_j = T_j[127] \oplus T_{j-1}[127] \tag{2.13}$$
from the noisy power observations, and then reconstruct T0 as shown in Section 2.4.1.
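The least-squares criterion (2.12) can be evaluated directly for any single guess. The Python sketch below, under the same assumed tweak multiplication and noise model (σ = 1, i.e., SNR = 32, chosen for illustration; helper names hypothetical), compares the MSE of the correct T0 with the average MSE of randomly drawn wrong guesses. It does not attempt the infeasible 2^128 search.

```python
import random

MASK128 = (1 << 128) - 1


def next_tweak(t: int) -> int:
    msb = t >> 127
    t = (t << 1) & MASK128
    return t ^ 0x87 if msb else t


def hd_values(t0: int, n: int = 128) -> list[int]:
    """Noise-free HD_{T_j} for j = 1..n, starting from T0."""
    hds, t = [], t0
    for _ in range(n):
        t_next = next_tweak(t)
        hds.append(bin(t_next ^ t).count("1"))
        t = t_next
    return hds


def mse(observed: list[float], t0_guess: int) -> float:
    """Eq. (2.12) for a guessed T0*: mean of (P_j - HD_{T_j*})^2 over j = 1..n."""
    predicted = hd_values(t0_guess, len(observed))
    return sum((p - h) ** 2 for p, h in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    rng = random.Random(0)
    t0, sigma = rng.getrandbits(128), 1.0
    observed = [h + rng.gauss(0.0, sigma) for h in hd_values(t0)]
    wrong = [mse(observed, rng.getrandbits(128)) for _ in range(200)]
    print("MSE of correct T0:", mse(observed, t0))               # analysis above: about sigma^2
    print("mean MSE of wrong guesses:", sum(wrong) / len(wrong))  # analysis above: about 64 + sigma^2
```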
Relationship δj Recovery Using Maximum Likelihood Based Test: From Equation (2.9), δj = 0 when ΔHDj+1 = 0, and δj = 1 when ΔHDj+1 ≠ 0. We therefore judge whether ΔHDj+1 is zero from the differential of two consecutive noisy power measurements:
$$\Delta P_{j+1} = P_{j+1} - P_j = \Delta HD_{j+1} + \Delta N_{j+1} \tag{2.14}$$
where ΔNj+1 = Nj+1 − Nj follows the distribution N(0, σ̃² = 2σ²), and ΔHDj+1 ∈ {−3, −1, 0, 1, 3} with corresponding probabilities {1/16, 3/16, 1/2, 3/16, 1/16}. Hence the absolute value of ΔPj+1 tends to be smaller when ΔHDj+1 = 0 than when ΔHDj+1 ≠ 0. We apply a threshold TH to the observed |ΔPj+1|: when it is below the threshold, we decide ΔHDj+1 = 0 and δj = 0; otherwise δj = 1. The bitwise error rate (BER) of the recovered δj is therefore:
$$\mathrm{BER} = \frac{3}{8}\left[\Phi_{\tilde\sigma}(TH - 1) - \Phi_{\tilde\sigma}(-TH - 1)\right] + \frac{1}{8}\left[\Phi_{\tilde\sigma}(TH - 3) - \Phi_{\tilde\sigma}(-TH - 3)\right] + \Phi_{\tilde\sigma}(-TH) \tag{2.15}$$
where Φσ(·) denotes the cumulative distribution function of N(0, σ²). The first two terms are the probabilities of missing a nonzero ΔHDj+1 of magnitude 1 or 3, and the last term is the probability of a false detection when ΔHDj+1 = 0. We can find the best TH and the minimal error rate for a given SNR by numerically minimizing (2.15). The result is shown in
Table 2.1: Threshold and BER for Different SNR
SNR | 8 | 16 | 32 | 64 | 128 | 256 | 512
TH | 3.03 | 2.26 | 1.74 | 1.33 | 0.98 | 0.74 | 0.62
BER | 0.460 | 0.430 | 0.385 | 0.330 | 0.266 | 0.182 | 0.093
Table 2.1. When the SNR is large (SNR > 100), the optimal threshold can be obtained analytically by equating the false-alarm density φσ̃(TH) with the dominant miss density (3/8)φσ̃(TH − 1), where φσ̃ is the probability density of N(0, σ̃²), which gives TH ≈ 1/2 + ln(8/3)·σ̃². With probability (1 − BER)^127, the noisy attack recovers all 127 δj bits correctly, just as the noiseless attack of Section 2.4.1. Thus, for the noisy attack to achieve the same result as the noiseless attack with 99% probability, we need BER = 0.00008, which corresponds to SNR = 3750. As mentioned above, attacking the averages of r traces increases the SNR r times, so with a large enough r the attacker can make the noisy attack equivalent to the noiseless one. The two initial guesses of T0[127], together with (δ1, ..., δ127), result in two guessed T0 values, and the correct one should minimize the MSE in (2.12).
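Equation (2.15) and the threshold search can be reproduced numerically. The Python sketch below uses σ² = 32/SNR and σ̃² = 2σ², evaluates the Gaussian CDF with math.erf, and minimizes the BER over a simple grid of thresholds (grid range and resolution are arbitrary choices; function names are hypothetical). It also prints the high-SNR approximation of TH and the BER required for a 99% chance of recovering all 127 δj bits; its output is meant to be checked against Table 2.1, not read as a new result.

```python
import math


def phi(x: float, s: float) -> float:
    """CDF of N(0, s^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf(x / (s * math.sqrt(2.0))))


def ber(th: float, snr: float) -> float:
    """Eq. (2.15) with sigma^2 = 32 / SNR and sigma_tilde^2 = 2 sigma^2."""
    s = math.sqrt(2.0 * 32.0 / snr)
    return (3 / 8 * (phi(th - 1, s) - phi(-th - 1, s))
            + 1 / 8 * (phi(th - 3, s) - phi(-th - 3, s))
            + phi(-th, s))


def optimal_threshold(snr: float, hi: float = 6.0, steps: int = 6000) -> tuple[float, float]:
    """Grid search of TH in [0, hi]; returns (TH, minimal BER)."""
    th = min((hi * k / steps for k in range(steps + 1)), key=lambda t: ber(t, snr))
    return th, ber(th, snr)


def th_high_snr(snr: float) -> float:
    """High-SNR approximation TH ~ 1/2 + ln(8/3) * sigma_tilde^2."""
    return 0.5 + math.log(8 / 3) * 2.0 * 32.0 / snr


def required_ber(p_success: float = 0.99, bits: int = 127) -> float:
    """Largest BER with (1 - BER)^bits >= p_success."""
    return 1.0 - p_success ** (1.0 / bits)


if __name__ == "__main__":
    for snr in (8, 16, 32, 64, 128, 256, 512):
        th, b = optimal_threshold(snr)
        print(f"SNR {snr:4d}: TH = {th:.2f} (approx. {th_high_snr(snr):.2f}), BER = {b:.3f}")
    print(f"required BER: {required_ber():.2e}; "
          f"BER at SNR 3750: {optimal_threshold(3750)[1]:.2e}")
```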
Relationship δj Recovery Using the Bayesian Test: In our prior work [19], we presented only the maximum likelihood attack method. In this work, we further improve the attack success rate by adopting the Bayesian test [22]. The idea of the Bayesian test is to decide between two hypotheses H1 and H2 from the observation of a random variable Y by comparing the conditional probabilities p(H1|Y = y) and p(H2|Y = y). We decide in favor of H1 when p(H1|Y = y) > p(H2|Y = y); otherwise, we choose H2. Here the two hypotheses are H1: Δ = 0 and H2: Δ ≠ 0, and the observations are the power measurements. From the r traces, we construct two power measurement vectors at two time points, Pj = {Pj,1, ..., Pj,r} and Pj+1 = {Pj+1,1, ..., Pj+1,r}, for recovering δj, where Pj,k is the power of computing Tj during the k-th repetition. We assume that the noise samples in the measurements are i.i.d. (independent and identically distributed) draws from a zero-mean Gaussian distribution with variance σ². The distributions of Pj and Pj+1 are
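$$\mathbf{P}_j \sim \mathcal{N}\left(HD_{T_j}\mathbf{1},\ \sigma^2 \mathbf{I}\right) \tag{2.16}$$
$$\mathbf{P}_{j+1} \sim \mathcal{N}\left((HD_{T_j} + \Delta)\mathbf{1},\ \sigma^2 \mathbf{I}\right) \tag{2.17}$$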
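The decision between H1 and H2 can be sketched numerically. This is a simplified illustration, not the derivation that continues from (2.19): it assumes the two vectors are reduced to the mean d of the per-trace differences Pj+1,k − Pj,k, which under (2.16) and (2.17) is Gaussian with mean Δ and variance 2σ²/r, and it spreads the prior of H2 over Δ ∈ {±1, ±3} in proportion to the probabilities {1/16, 3/16, 3/16, 1/16} given earlier for ΔHD. The function names and simulation parameters are hypothetical.

```python
import math
import random

# Under H2 (Delta != 0): {1/16, 3/16, 3/16, 1/16} for Delta in {-3, -1, 1, 3}, renormalized.
H2_PRIOR = {-3: 1 / 8, -1: 3 / 8, 1: 3 / 8, 3: 1 / 8}


def log_gauss(x: float, mean: float, var: float) -> float:
    return -((x - mean) ** 2) / (2 * var) - 0.5 * math.log(2 * math.pi * var)


def log_sum_exp(values: list[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def decide_delta_zero(p_j: list[float], p_j1: list[float], sigma2: float,
                      prior_h1: float = 0.5) -> bool:
    """True (H1: Delta = 0) iff p(H1 | d) > p(H2 | d), with d ~ N(Delta, 2 sigma^2 / r)."""
    r = len(p_j)
    d = sum(b - a for a, b in zip(p_j, p_j1)) / r
    var = 2 * sigma2 / r
    log_h1 = math.log(prior_h1) + log_gauss(d, 0.0, var)
    log_h2 = math.log(1 - prior_h1) + log_sum_exp(
        [math.log(w) + log_gauss(d, delta, var) for delta, w in H2_PRIOR.items()])
    return log_h1 > log_h2


def simulate_pair(hd_j: int, delta: int, sigma2: float, r: int,
                  rng: random.Random) -> tuple[list[float], list[float]]:
    """r repetitions drawn from (2.16) and (2.17)."""
    s = math.sqrt(sigma2)
    p_j = [hd_j + rng.gauss(0.0, s) for _ in range(r)]
    p_j1 = [hd_j + delta + rng.gauss(0.0, s) for _ in range(r)]
    return p_j, p_j1


if __name__ == "__main__":
    rng = random.Random(0)
    sigma2, r, trials, errors = 1.0, 10, 2000, 0          # SNR = 32 / sigma2 = 32
    for _ in range(trials):
        delta = rng.choices([-3, -1, 0, 1, 3], weights=[1, 3, 8, 3, 1])[0]
        p_j, p_j1 = simulate_pair(64, delta, sigma2, r, rng)
        errors += decide_delta_zero(p_j, p_j1, sigma2) != (delta == 0)
    print(f"empirical error rate of the delta_j decision: {errors / trials:.4f}")
```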
Recovering δj is equivalent to testing these two hypotheses. According to Bayesian decision theory,
the optimal test is given by the ratio
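$$\frac{p(\Delta = 0 \mid \mathbf{P}_j, \mathbf{P}_{j+1})}{p(\Delta \neq 0 \mid \mathbf{P}_j, \mathbf{P}_{j+1})} \underset{H_2}{\overset{H_1}{\gtrless}} 1 \tag{2.18}$$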
It can be expanded in terms of the likelihood and a priori distribution as
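$$\frac{p(\mathbf{P}_j, \mathbf{P}_{j+1} \mid \Delta = 0)\, p(\Delta = 0)}{p(\mathbf{P}_j, \mathbf{P}_{j+1} \mid \Delta \neq 0)\, p(\Delta \neq 0)} \underset{H_2}{\overset{H_1}{\gtrless}} 1 \tag{2.19}$$
where we take p(Δ = 0) = p(Δ ≠ 0) = 0.5. From Equations (2.16) and (2.17), we derive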