EM Side-channel Analysis on Smartphone Early Boot Encryption
Oleksiy Lisovets
Master’s Thesis. June 1, 2020. Chair for Security Engineering – Priv.-Doz. Dr. Amir Moradi Advisor: David Knichel
Abstract
Modern smartphones often encrypt their boot components in order to add an obstacle for attackers who want to analyse and possibly exploit the device. This gives a false sense of security, as obscurity through encryption does not protect against vulnerabilities. In this thesis, the EM side-channel is used to analyse the hardware AES implementation of a smartphone and to recover the hardware-fused encryption key. To this end, a BootROM exploit is used to deploy a payload in boot loader context, which allows communicating with the hardware AES engine. Furthermore, the payload exposes a low-latency interface to the CPU by repurposing a hardware button as a GPIO output, and it modifies the boot loader's crypto engine invocation function such that the exposed GPIO pin signals the start and end of AES decryptions. This signal is then used as a trigger, which allows performing EM measurements for timing, SNR and correlation analysis, eventually leading to a CPA attack which recovers the hardware-fused encryption key. The recovered key allows offline decryption of current and future firmware files for the target device model.
Eidesstattliche Erklärung
Ich erkläre, dass ich keine Arbeit in gleicher oder ähnlicher Fassung bereits für eine andere Prüfung an der Ruhr-Universität Bochum oder einer anderen Hochschule eingereicht habe.
Ich versichere, dass ich diese Arbeit selbständig verfasst und keine anderen als die angegebenen Quellen benutzt habe. Die Stellen, die anderen Quellen dem Wortlaut oder dem Sinn nach entnommen sind, habe ich unter Angabe der Quellen kenntlich gemacht. Dies gilt sinngemäß auch für verwendete Zeichnungen, Skizzen, bildliche Darstellungen und dergleichen.
Ich versichere auch, dass die von mir eingereichte schriftliche Version mit der digitalen Version übereinstimmt. Ich erkläre mich damit einverstanden, dass die digitale Version dieser Arbeit zwecks Plagiatsprüfung verwendet wird.
Official Declaration
Hereby I declare that I have not submitted this thesis in this or similar form to any other examination at the Ruhr-Universität Bochum or any other institution of university.
I officially ensure that this paper has been written solely on my own. I herewith officially ensure that I have not used any other sources but those stated by me. Any and every parts of the text which constitute quotes in original wording or in its essence have been explicitly referred by me by using official marking and proper quotation. This is also valid for used drafts, pictures and similar formats.

I also officially ensure that the printed version as submitted by me fully confirms with my digital version. I agree that the digital version will be used to subject the paper to plagiarism examination.

Not this English translation but only the official version in German is legally binding.
Datum / Date Unterschrift / Signature
Contents

1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Related Work
  1.4 Organization of This Thesis
2 Background
  2.1 Advanced Encryption Standard
    2.1.1 AES Implementation
  2.2 Side-channel Analysis
    2.2.1 Welch's t-test
    2.2.2 Power Consumption of ICs
    2.2.3 Power Model
    2.2.4 Signal-to-Noise Ratio
    2.2.5 Differential Power Analysis
    2.2.6 The EM Side-channel
    2.2.7 Update Formulas
  2.3 Secure Boot
3 Setup
  3.1 Target Device
  3.2 Measurement Setup
  3.3 Computations
4 Boot Loader Code Execution and AES Engine Access
  4.1 Entrypoint
    4.1.1 Preparation of Images
    4.1.2 Payload Execution
  4.2 Payload Creation
  4.3 Payload Description
    4.3.1 Payload Entrypoint
    4.3.2 Main Function
    4.3.3 AES Hook
    4.3.4 Command Handlers
    4.3.5 Helper Functions
5 AES Side-channel Analysis
  5.1 AES Engine Modes
  5.2 AES Timing
  5.3 Initial Probe Placement
  5.4 Non-Specific t-test
  5.5 Signal-to-Noise Ratio
  5.6 AES CPA Power Models
  5.7 Leaking Power Model
  5.8 Evaluating Smaller Power Models
  5.9 Full Chip Scan
  5.10 AES CPA Attack
    5.10.1 CPA With Known Key
    5.10.2 CPA on Target Key
6 Tooling
  6.1 Alignment of Traces
    6.1.1 Manual Alignment
    6.1.2 Automatic Full Chip Scan Alignment
  6.2 Efficient Correlation Implementation
    6.2.1 Server
    6.2.2 Clients
    6.2.3 OpenCL GPU Client
7 Conclusion
  7.1 Summary
  7.2 Future Work
A Acronyms
B Code
List of Figures
List of Listings
Bibliography

1 Introduction
Secure Boot is a common mechanism used in modern smartphones to prevent code that is not authorized by the vendor from being loaded in the early stages of the boot process. This creates a foundation for assuring the integrity of the operating system. To this end, each component in the boot process is responsible for verifying the next component before launching it, up until the kernel is booted[ios]. The root of trust is usually a small immutable piece of software which is fused into hardware, also called BootROM.

Exploiting a vulnerability in any of the components involved in the boot process lets an attacker fully compromise the system before it has even started up, thus breaking every assumption the system makes about its integrity. This makes it possible to fully circumvent all mitigations which are not yet initialized at that point in time but would have been a problem at a later stage, such as Apple's Kernel Text Readonly Region (KTRR), a mitigation which assures the immutability of the kernel, or ARMv8.3 Pointer Authentication, which is meant to prevent code reuse attacks such as Return Oriented Programming (ROP).

To hinder examination, debugging and exploitation of these boot components by unauthorized parties, vendors sometimes choose to encrypt firmware components in addition to cryptographically authenticating them. For this purpose, a key is usually fused into the hardware and used to decrypt individual components during the boot sequence. This layer of obscurity does not protect against boot loader vulnerabilities, but adds an additional obstacle for an attacker, who now first needs to break the encryption before a vulnerability can be debugged and exploited.

Physical attacks like side-channel analysis can be valuable tools to break security barriers on a level where software engineers cannot defend the system. In this thesis the applicability of Electromagnetic (EM) side-channel attacks for recovering the hardware-fused encryption keys is examined.
1.1 Motivation
As a security researcher, it is beneficial to be able to analyse the code running on one's own device, which requires bypassing the encryption. In this thesis, the EM side-channel is used to analyse the concrete Advanced Encryption Standard (AES) implementation of the Apple iPhone 4, as well as to recover the hardware-fused Group Identifier (GID) key. This key allows offline decryption of boot loader components, removing the need for a physical device and a boot loader software exploit. Since such an attack requires a decryption oracle, which on its own would already be sufficient to decrypt boot loaders, the goal here is to develop a method which can use a temporary decryption oracle, obtained for example by glitching the boot loader. This establishes a permanent, unfixable primitive for decrypting and analysing future firmware for this device, which in turn can help to find and fix or exploit software vulnerabilities.
1.2 Contribution
This thesis evaluates the practical applicability of EM side-channel attacks on smartphones. To this end, the hardware AES is analysed using timing analysis, the t-test, Signal-to-Noise Ratio (SNR) and Correlation Power Analysis (CPA), which makes it possible to recover the structure of the underlying implementation with high confidence. Furthermore, the position on the System on Chip (SoC) with the greatest EM leakage is determined using a systematic approach. Finally, CPA is used to recover the GID key, which is fused into the Apple iPhone 4 SoC.
1.3 Related Work
Even though many people seem interested in recovering the GID key in order to permanently be able to decrypt firmware without possessing a device and a software exploit, there is not much publicly accessible research on this topic. A compilation of available resources can be found on theiphonewiki[Theb]. According to an article[Intc] by The Intercept based on two documents leaked by Edward Snowden, the CIA was trying to retrieve the GID key using Differential Power Analysis (DPA)[Intb] and by physically de-processing the chip[Inta] in 2012. However, to date there is no public information on anyone extracting a GID key, and no information is available on whether any private attempts were successful.

The work in this thesis is most useful when combined with glitching in order to achieve code execution in early boot stages. Practical applications of this were shown by Lu, who was able to get boot loader code execution on the PlayStation Vita using fault injection[Lu19]. Another practical application of glitching was shown by @derrek, @nedwill and @naehrwert[@de16] at 33C3 in 2016, who glitched the Nintendo 3DS to bypass security protections at the highest privilege level.

Schneider et al. provide a roadmap for side-channel evaluations[SM15], which can be used as a starting point when targeting a new device. Furthermore, Schneider et al. provide one-pass formulas[SMG15] for efficient parallel computation. This is especially useful for processing large amounts of data, which might be required for attacking a particular device. An alternative method for detecting side-channel leakage using the χ²-test is presented by Moradi et al.[MRSS18]. In addition, previous work by Moradi et al. evaluates practical attacks and countermeasures on reconfigurable hardware and shows that attacks are indeed possible given enough traces and resources even with countermeasures in place, making side-channel analysis a viable attack vector for modern smartphones[MMP11]. Furthermore, Moradi et al. demonstrate a real-world attack against the bitstream encryption of Xilinx Virtex-4 and Virtex-5 off-the-shelf FPGAs, which is supposed to protect intellectual property and to prevent fraud through cloning or manipulating a design or even implanting hardware Trojans[MKP12].
1.4 Organization of This Thesis
The thesis first gives an introduction to the required background knowledge in Chapter 2. Next, Chapter 3 presents the setup used in this thesis, including the target device and the modifications which were applied to it, measurement hardware such as the oscilloscope and probe, as well as the computing hardware used to process the data. Afterwards, Chapter 4 explains what software needs to be executed on the target device and how to deploy it in order to create an accessible decryption oracle required for EM measurements. Chapter 5 deals with various ways of analysing the target and what information can be gathered from each of them. Then, in Section 5.10 a CPA attack is performed in order to retrieve the hardware-fused key from the target device. The algorithms and programs used to perform data processing are discussed in Chapter 6. Finally, Chapter 7 sums up the information retrieved about the target device and gives an outlook on future work.
2 Background
This chapter provides background knowledge required to follow the side-channel analysis in this thesis.
2.1 Advanced Encryption Standard
The Advanced Encryption Standard (AES) is a symmetric cryptographic algorithm standardised by the U.S. National Institute of Standards and Technology (NIST) in 2001[DR02]. It is a subset of the Rijndael block cipher, developed by Vincent Rijmen and Joan Daemen. AES encrypts 128-bit blocks with a 128-bit, 192-bit or 256-bit key. The algorithm consists of a key schedule, an initial AddRoundKey step, and 9, 11 or 13 rounds (depending on the key length) of SubBytes, ShiftRows, MixColumns and AddRoundKey, followed by a final round which lacks the MixColumns step. The first 16 master key bytes are used as the first round key. In the case of 192-bit or 256-bit AES, the subsequent master key bytes are used for the second round key. Further key bytes are derived using the AES key schedule until there is a 16-byte round key for each round plus the initial AddRoundKey. The AES state is modeled as a 4x4 byte matrix whose bytes are indexed column-wise: globally from left to right across the columns and locally from top to bottom within each column. Given the plaintext as byte array p0, p1, ..., p15, the initial data state of the AES is given as:
\[
S = \begin{pmatrix}
p_0 & p_4 & p_8 & p_{12} \\
p_1 & p_5 & p_9 & p_{13} \\
p_2 & p_6 & p_{10} & p_{14} \\
p_3 & p_7 & p_{11} & p_{15}
\end{pmatrix}
\]

The AddRoundKey step xors the round key bytewise to the AES state. Afterwards, the SubBytes step replaces each byte using a non-linear 8-bit mapping referred to as Substitution Box (S-Box). Subsequently, ShiftRows permutes the position of the bytes in the matrix such that the individual rows are rotated to the left by their row index, counting from zero. This means the first row is not rotated, the second row is rotated to the left by one byte, the third row is rotated by two bytes and the last row is rotated by three bytes. Finally, MixColumns multiplies each column of the state matrix with a fixed multiplication matrix as shown below.

\[
\begin{pmatrix} new_0 \\ new_1 \\ new_2 \\ new_3 \end{pmatrix}
=
\begin{pmatrix}
2 & 3 & 1 & 1 \\
1 & 2 & 3 & 1 \\
1 & 1 & 2 & 3 \\
3 & 1 & 1 & 2
\end{pmatrix}
\cdot
\begin{pmatrix} old_0 \\ old_1 \\ old_2 \\ old_3 \end{pmatrix}
\]

This multiplication is performed in the Galois Field $GF(2^8)$ using the irreducible polynomial $x^8 + x^4 + x^3 + x + 1$.
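The ShiftRows and MixColumns steps described above can be sketched in a few lines of Python. This is only an illustrative software model, not the implementation analysed in this thesis; the function names `xtime`, `gf_mul`, `mix_column` and `shift_rows` are chosen here for the example, and the state is assumed to be a flat list of 16 bytes in the column-wise order introduced above.

```python
def xtime(b):
    """Multiply a byte by x (i.e. by 2) in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B  # 0x11B encodes the irreducible polynomial
    return b


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


# Fixed MixColumns multiplication matrix.
MIX_MATRIX = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]


def mix_column(col):
    """Multiply one 4-byte state column with the MixColumns matrix."""
    out = []
    for row in MIX_MATRIX:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gf_mul(coeff, byte)  # addition in GF(2^8) is xor
        out.append(acc)
    return out


def shift_rows(state):
    """Rotate row r to the left by r bytes; state[4 * c + r] is row r, column c."""
    return [state[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]


print([hex(v) for v in mix_column([0xDB, 0x13, 0x53, 0x45])])
print(shift_rows(list(range(16))))
```

Applying `shift_rows` to the indices 0 to 15 makes the permutation visible directly: the first output column collects the bytes 0, 5, 10 and 15 of the input state.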
2.1.1 AES Implementation

AES can be implemented in many different ways depending on the available resources and optimization goals. A few common implementations are described in the following.

2.1.1.1 Round Based

The most straightforward approach is the round based implementation, where each step is implemented in one single circuit which is re-used in every round, as described in Section 2.1. An implementation in hardware usually follows one of the following two approaches: In the first one, a register is put after every step, which minimizes the critical path and allows a higher clock speed at the cost of additional hardware (registers). In the second one, a full round is computed in one clock cycle before the state is saved in a register. This slightly reduces the maximum possible clock speed due to a longer critical path, but also requires fewer registers.

2.1.1.2 T-Table

Another way of implementing AES is using T-Tables as described by the authors Daemen and Rijmen, which technically is also a round based implementation[DR02]. The idea is to create four 1-to-4 byte lookup tables which combine SubBytes, ShiftRows and MixColumns such that one column of a round can be written as:
\[
\begin{pmatrix} s'_{4i+0} \\ s'_{4i+1} \\ s'_{4i+2} \\ s'_{4i+3} \end{pmatrix}
= T_0[s_{4i}] \oplus T_1[s_{4(i+1)+1}] \oplus T_2[s_{4(i+2)+2}] \oplus T_3[s_{4(i+3)+3}] \oplus K_r[i]
\]

Here $s$ denotes the old and $s'$ the new state, $T_i$ is the i-th T-Table and $K_r$ is an array indexing 4 key bytes of the r-th round at a time. Furthermore, all indices to $s$ are taken modulo 16, which was omitted in the formula for the sake of readability. This implementation is used when high performance is desired and large amounts of memory or physical space (in case of a hardware implementation) are not an issue. One round needs 16 table lookups and 16 32-bit xor operations, but requires $4 \cdot 4 \cdot 256 = 4096$ bytes of memory for the T-Tables.
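To make the T-Table idea more concrete, the following Python sketch builds the four tables and checks that a T-Table round gives the same result as computing SubBytes, ShiftRows, MixColumns and AddRoundKey step by step. It is an illustrative software model under the assumptions stated in the comments (flat 16-byte state in column-wise order, S-box generated from its mathematical definition); all function and variable names are chosen for this example only.

```python
def xtime(b):
    """Multiply by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def sbox_entry(a):
    """AES S-box: multiplicative inverse in GF(2^8) followed by the affine map."""
    inv = 1
    for _ in range(254):  # a^254 is the inverse of a; 0 is mapped to 0
        inv = gf_mul(inv, a)
    out = inv
    for k in range(1, 5):
        out ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
    return out ^ 0x63


SBOX = [sbox_entry(a) for a in range(256)]

# T0[a] is the column contributed by a row-0 byte: (2*S[a], S[a], S[a], 3*S[a]).
T0 = [[gf_mul(2, s), s, s, gf_mul(3, s)] for s in SBOX]
# T1..T3 are T0 with every entry rotated downwards by 1..3 bytes.
T = [[col[4 - k:] + col[:4 - k] for col in T0] for k in range(4)]


def round_ttable(state, round_key):
    """One full AES round using only table lookups and xors."""
    out = []
    for i in range(4):
        col = [0, 0, 0, 0]
        for k in range(4):
            entry = T[k][state[(4 * (i + k) + k) % 16]]
            col = [c ^ e for c, e in zip(col, entry)]
        out += [c ^ rk for c, rk in zip(col, round_key[4 * i:4 * i + 4])]
    return out


def round_stepwise(state, round_key):
    """The same round computed as SubBytes, ShiftRows, MixColumns, AddRoundKey."""
    s = [SBOX[b] for b in state]
    s = [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        out += [
            gf_mul(2, a[0]) ^ gf_mul(3, a[1]) ^ a[2] ^ a[3],
            a[0] ^ gf_mul(2, a[1]) ^ gf_mul(3, a[2]) ^ a[3],
            a[0] ^ a[1] ^ gf_mul(2, a[2]) ^ gf_mul(3, a[3]),
            gf_mul(3, a[0]) ^ a[1] ^ a[2] ^ gf_mul(2, a[3]),
        ]
    return [x ^ k for x, k in zip(out, round_key)]


state = list(range(16))
key = list(range(16, 32))
print(round_ttable(state, key) == round_stepwise(state, key))
```

Each of the four tables holds 256 entries of 4 bytes, which matches the 4096 bytes of table memory mentioned above.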
2.1.1.3 Fully Unrolled

Having a fully unrolled hardware implementation of AES means that there is a dedicated circuit for each step in every round. This allows a full encryption in one single clock cycle, but has several disadvantages.

One disadvantage is that due to a very long critical path, the clock speed needs to be low. Another is that a fully unrolled implementation consumes a lot of physical area on the chip. Furthermore, such an implementation is not very flexible, meaning it is not possible to switch between 128-bit, 192-bit and 256-bit key sizes (due to the different number of rounds) or between encryption and decryption modes. It is possible to overcome the low clock speed issue by placing a register after every round. This allows a high throughput if a lot of data needs to be encrypted with the same key, as this pipeline produces one encrypted block in every clock cycle after an initial latency equal to the number of clock cycles required for one full encryption.
2.2 Side-channel Analysis
The AES cipher is considered cryptographically secure, meaning there is currently no known attack breaking its mathematical properties in order to extract the key, even with an encryption/decryption oracle available. Individual implementations, on the other hand, can be vulnerable to attacks even when the mathematical properties of a cipher are secure. Physical attacks exploit physical properties of a device and can be divided into several categories as seen in Figure 2.1. This thesis focuses on passive non-invasive attacks, or more specifically, side-channel attacks.
| | Passive | Active |
| Invasive | Probing | Forcing, Permanent Faults |
| Semi-Invasive | Optical Inspection | Light/Radiation Attacks |
| Non-Invasive | Side-channel Attacks | Clock/Power/Temperature Glitching Attacks |
Figure 2.1: Overview of Physical attacks
When a device is inspected while it processes data, e.g. during an encryption, an attacker may observe physical properties like the runtime of the particular implementation, the power consumption or EM emanations of the device during runtime, or even the produced heat. These properties directly correlate with the processes on the device and can leak information about the internal state while the algorithm is executing. They are thus referred to as 'side-channels'.

If for example the runtime of an operation depends on a secret value, it might be possible to measure the time to gain information about that value. This is called a timing side-channel attack. Another example is that when a value in memory is replaced with a different value, the electrical circuit only needs to change its state if the new value differs from the old value, but not if both values are the same. For example, if a 1 in memory is replaced with a 0 (or vice versa), more power is consumed compared to the case where a 1 is replaced with a 1 (or 0 with 0). By measuring the power it might be possible to learn whether the value changed. This is called a power side-channel attack. The same principle applies to electromagnetic emanations, which directly correlate with the power consumption. When exploiting an EM side-channel, the electromagnetic emanations are measured rather than the power consumption.

Because the measured power consumption or electromagnetic emanation is not only influenced by informative data, called signal, but also by the environment and other unrelated processes, called noise, several statistical tools are used in order to filter out the noise and extract the signal from the measurements. The amount of useful signal that can be extracted highly depends on the implementation and the presence or absence of countermeasures. One countermeasure is hiding, which tries to decrease the SNR. This can be achieved by adding unrelated circuits processing random data, which increases the noise. Another countermeasure is masking, which attempts to hide leakage in higher statistical orders. The secret is split into multiple shares by xoring it with a random value. For example, let $x$ be the secret and $r$ a random value. Then one share is $x_1 = x \oplus r$ and the other is $x_2 = r$. Now both $x_1$ and $x_2$ are random values which, on their own, are independent of the secret $x$. The idea is to process both shares individually and combine them afterwards. For example, if $f(\cdot)$ is a linear function, then $f(x) = f(x_1 \oplus x_2) = f(x_1) \oplus f(x_2)$ and therefore the shares can be processed individually.
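The masking idea can be illustrated with a short Python sketch. The linear function `linear_f` below is a hypothetical example (an xor of bit rotations, which is linear over GF(2)); it is not taken from any particular masked implementation, and the helper names are chosen for this example only.

```python
import secrets


def share(x):
    """Split the secret byte x into two Boolean shares x1 = x ^ r and x2 = r."""
    r = secrets.randbelow(256)
    return x ^ r, r


def rotl8(v, k):
    """Rotate an 8-bit value to the left by k bits."""
    return ((v << k) | (v >> (8 - k))) & 0xFF


def linear_f(v):
    """Example linear function over GF(2): xor of rotated copies of the input."""
    return v ^ rotl8(v, 1) ^ rotl8(v, 3)


secret = 0xA7
x1, x2 = share(secret)

# Each share is processed on its own; the secret itself is never handled directly.
y1, y2 = linear_f(x1), linear_f(x2)

# Recombining the processed shares yields the same result as f(secret).
print(hex(y1 ^ y2), hex(linear_f(secret)))
```

Non-linear operations such as the AES S-box do not have this property, which is why masking them requires considerably more effort than this sketch suggests.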
2.2.1 Welch's t-test

Welch's t-test is a statistical hypothesis test which quantifies whether the means of two normally distributed sets differ from each other[Wel47]. Using the t-test it is possible to examine the validity of the null hypothesis, which states that two sets are drawn from normally distributed populations with the same mean and are thus not distinguishable by the mean. The value $t$ is computed using the formula

\[
t = \frac{\mu_0 - \mu_1}{\sqrt{\frac{\sigma_0^2}{n_0} + \frac{\sigma_1^2}{n_1}}}
\quad \text{where} \quad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2
\quad \text{and} \quad
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i
\]

for two sets $Q_0$ and $Q_1$ with cardinalities $n_0$ and $n_1$, means $\mu_0$ and $\mu_1$, and variances $\sigma_0^2$ and $\sigma_1^2$ (where all values are equally likely), respectively.

In practical applications such as side-channel attacks, where both populations are large enough, a threshold of $|t| > 4.5$ is defined to reject the null hypothesis, as it leads to a confidence of >0.99999[SM15].

The non-specific t-test can be used for assessing the leakage of a device in order to determine whether it might be exploitable. The idea is to measure the power consumption or EM emanation of two sets of encryptions, one with a fixed input and one where each measurement is done with a randomly varied input. Because all intermediate values of an encryption are always the same for the same input, the power consumption of the fixed-input measurements should be similar to each other when no countermeasures are implemented.
If the t-test rejects the null hypothesis, then it is possible to distinguish the fixed input encryptions from random input encryptions solely by their power traces. That in turn means there is leakage which can possibly be exploited in order to recover the secret. A special measurement phase is needed, where the decision on encrypting a fixed or random input is based on a random coin toss. This is required because otherwise there might be dependencies not related to exploitable leakage, causing false positives in the t-test. If it is not possible to reject the null hypothesis using the first-order t-test, higher-order testing can help to detect leakage in case countermeasures like masking are implemented. However, higher-order attacks are out of the scope of this thesis.
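The non-specific t-test described above can be sketched on simulated data as follows. The traces here are purely hypothetical: each trace has three sample points, and only the second one leaks the Hamming weight of the processed input byte on top of Gaussian noise. The function names and the simulation parameters are assumptions made for this example and do not reflect the measurements in this thesis.

```python
import math
import random


def hamming_weight(v):
    return bin(v).count("1")


def welch_t(a, b):
    """Welch's t statistic for two lists of samples (variance normalized by n)."""
    n0, n1 = len(a), len(b)
    mu0, mu1 = sum(a) / n0, sum(b) / n1
    var0 = sum((x - mu0) ** 2 for x in a) / n0
    var1 = sum((x - mu1) ** 2 for x in b) / n1
    return (mu0 - mu1) / math.sqrt(var0 / n0 + var1 / n1)


def simulate_trace(value, rng):
    """Hypothetical 3-sample trace: only sample 1 leaks HW(value)."""
    trace = [rng.gauss(0.0, 1.0) for _ in range(3)]
    trace[1] += hamming_weight(value)
    return trace


rng = random.Random(1)
FIXED_INPUT = 0x07
fixed, rand = [], []
for _ in range(4000):
    # A random coin toss decides whether the fixed or a random input is processed.
    if rng.random() < 0.5:
        fixed.append(simulate_trace(FIXED_INPUT, rng))
    else:
        rand.append(simulate_trace(rng.randrange(256), rng))

# The t statistic is evaluated independently for every sample point.
t_values = [
    welch_t([tr[j] for tr in fixed], [tr[j] for tr in rand]) for j in range(3)
]
leaking = [j for j, t in enumerate(t_values) if abs(t) > 4.5]
print(t_values, leaking)
```

In this simulation only the sample point that depends on the input is expected to exceed the threshold, while the pure-noise points stay well below it.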
2.2.2 Power Consumption of ICs

The power consumption of an Integrated Circuit (IC) can be divided into several components, not all of which are interesting for the attacker. The total power consumption consists of the amount of power the device consumes when idle (Pconst) plus the amount of power which depends on the instruction or data the IC processes (Pvar). The majority of today's hardware is built using Complementary Metal-Oxide-Semiconductor (CMOS) technology, which consists of a pull-up network, connecting the output with power (Vdd), and a pull-down network, connecting the output with ground (Vss). An example of a CMOS inverter is shown in Figure 2.2.
Figure 2.2: CMOS inverter[Wika]
When the circuit is in a stable state it consumes very little power. Even then, however, there is a difference in power consumption between different states. For example, if the input is low, the inverter connects Vout to Vdd. Since the transistors are not perfectly blocking, there is a small leakage current between Vdd and Vss. This issue becomes more relevant the smaller the technology becomes. Furthermore, whatever is connected to Vout will drain a little bit of energy from Vdd (when Vin is low), thus creating a very small current. This consumes more power than the case where Vout is connected to Vss.

When the inverter changes its state, there is dynamic power consumption. For example, when Vout changes from low to high, the circuit needs to be charged. This consumes more power than when Vout changes from high to low, in which case the circuit needs to be discharged. Another important effect is the short-circuit current: a short circuit between Vdd and Vss is created because both transistors are conductive for a short period of time when the input signal switches.

The overall power consumption can be modeled as Ptotal = Pconst + Pconst_expl + Pvar_expl + Pnoise. Using a statistical approach it is possible to overcome Pconst and Pnoise, where the latter is usually modeled as Gaussian distributed, in order to extract the exploitable part.
2.2.2.1 Glitches

Glitches are unnecessary signal transitions without functionality, which occur when a circuit changes its state. When all inputs change at the same time (e.g. on a rising clock edge), some signals arrive later at certain gates than others due to a longer physical path or more gates that need to be passed. Figure 2.3 shows a circuit consisting of two logic gates, namely an or gate and an and gate.
Figure 2.3: Logic gates[Wikb]
When for example the inputs (A = 1, B = 0, C = 0) change to (A = 0, B = 0, C = 1), the signal C = 1 arrives at the and gate before the or gate output AB holds the correct value. Thus, for a short period, the and gate sees the values AB = 1 and C = 1, which causes the output O to be 1 for some time before it holds the correct value 0. Since the output was 0 in the previous clock cycle and is again 0 in the current clock cycle, the temporary transition to 1 was unnecessary.

The longer the critical path is, the more unnecessary transitions there are in a circuit before the output holds a stable value. This has no effect on the correct functioning of the circuit, since the clock cycle is always long enough to let the signals propagate correctly before the output is used in the next cycle. However, it causes the power consumption to be much higher in a clock cycle where inputs change compared to a clock cycle where the inputs stay the same. Glitch power dissipation is 20%-70% of total power dissipation and thus should be considered for side-channel attacks.
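The glitch in this example can be reproduced with a very small Python model. The sketch assumes, as in the text, that the output is O = (A or B) and C and that the new value of C reaches the and gate before the or gate output has settled; real gate delays are not modeled, and the function names are chosen for this example only.

```python
def and_gate(or_out, c):
    return or_out & c


def output_sequence(old, new):
    """Output O before, during and after the input change.

    Assumption: C reaches the and gate first, so for a short period the
    and gate combines the new C with the stale or gate output.
    """
    or_old = old["A"] | old["B"]
    or_new = new["A"] | new["B"]
    before = and_gate(or_old, old["C"])
    transient = and_gate(or_old, new["C"])  # or gate output is still stale
    after = and_gate(or_new, new["C"])
    return [before, transient, after]


seq = output_sequence({"A": 1, "B": 0, "C": 0}, {"A": 0, "B": 0, "C": 1})
toggles = sum(a != b for a, b in zip(seq, seq[1:]))
print(seq, toggles)
```

Although the stable output is 0 both before and after the input change, the output toggles twice, and each of these transitions (dis-)charges the output and therefore consumes power.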
2.2.3 Power Model

When attacking an IC using power side-channels, the exploitable power consumption is often modeled using a so-called power model, which is used to recover the IC's internal values from measurements of the power consumption during runtime. Given data stored in a register, two possible power models can be applied. The first one models the static power consumption, i.e. if the register stores a 1 it will consume more power compared to when it stores a 0 (or vice versa in case of an active-low circuit). The other one models the power consumption on state transition, i.e. at the moment the register gets updated with a new value. If the old value matches the new value, less power is consumed for transitioning between the states compared to the case where the values are different. This is because a changing value causes a high number of glitches, during which the circuit needs to be (dis-)charged several times.

In order to recover a one-byte value from a register using power traces, a hypothetical power consumption is created for each possible value and compared to the real trace. The hypothetical power consumption closest to the real trace is most likely the one yielding the correct internal value. Applying this to the previously given example, the power model would either be the Hamming weight of the value or the Hamming distance between the old and the new value.
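Both power models mentioned above are straightforward to express in code. The following Python sketch shows them for byte-sized register values; the function names are chosen for this example.

```python
def hamming_weight(value):
    """Static model: number of bits set to 1 in the stored value."""
    return bin(value).count("1")


def hamming_distance(old, new):
    """Transition model: number of bits that toggle when old is overwritten by new."""
    return hamming_weight(old ^ new)


# Hypothetical power consumption of a register holding 0xA5 ...
print(hamming_weight(0xA5))
# ... and of a register being updated from 0xFF to 0x0F.
print(hamming_distance(0xFF, 0x0F))
```

For an 8-bit value both models produce an integer between 0 and 8, so a byte-wide power model splits the traces into at most 9 classes.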
2.2.4 Signal-to-Noise Ratio
The Signal-to-Noise Ratio (SNR) provides a way to find the points in time where Pexp is largest and thus most promising for an attack. The general definition is

\[
SNR = \frac{Var(Signal)}{Var(Noise)}, \quad \text{which in the given context translates to} \quad SNR = \frac{Var(P_{exp})}{Var(P_{noise})}
\]

When attacking an implementation, the SNR can be used to determine how much and at what point in time a specific power model (see Section 2.2.3) leaks. To do so, the traces are separated into groups based on the power model. If for example the 8-bit Hamming weight of the first plaintext byte is used as a power model, the traces are divided into 9 groups according to that model. Then, the SNR at each point in time of the trace is computed as

\[
SNR = \frac{\sigma^2_{means}}{\mu_{variances}}
\quad \text{with} \quad
\mu_{variances} = \frac{1}{n_{groups}} \sum_{i=1}^{n_{groups}} \sigma^2_{g_i}
\]

and

\[
\sigma^2_{means} = \frac{1}{n_{groups}} \sum_{i=1}^{n_{groups}} (\mu_{g_i} - \mu_{means})^2
\quad \text{while} \quad
\mu_{means} = \frac{1}{n_{groups}} \sum_{i=1}^{n_{groups}} \mu_{g_i}
\]

Here $\mu_{g_i}$ and $\sigma^2_{g_i}$ are the mean and variance of group $i$, respectively, at a single point in time over all traces.
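The group-based SNR computation can be sketched in Python as follows. The traces are simulated and purely hypothetical: each has two sample points, of which only the second leaks the Hamming weight of a plaintext byte plus Gaussian noise. The function names and simulation parameters are assumptions made for this example.

```python
import random


def hamming_weight(v):
    return bin(v).count("1")


def snr_at_point(samples, labels):
    """SNR at one point in time: variance of group means / mean of group variances."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    means, variances = [], []
    for values in groups.values():
        mu = sum(values) / len(values)
        means.append(mu)
        variances.append(sum((v - mu) ** 2 for v in values) / len(values))
    mu_means = sum(means) / len(means)
    var_means = sum((m - mu_means) ** 2 for m in means) / len(means)
    mu_variances = sum(variances) / len(variances)
    return var_means / mu_variances


rng = random.Random(2)
plaintext_bytes = [rng.randrange(256) for _ in range(5000)]
# Power model: 8-bit Hamming weight of the plaintext byte (9 groups).
labels = [hamming_weight(p) for p in plaintext_bytes]
# Sample 0 is pure noise, sample 1 leaks the Hamming weight.
traces = [
    [rng.gauss(0.0, 1.0), hamming_weight(p) + rng.gauss(0.0, 1.0)]
    for p in plaintext_bytes
]

snr = [snr_at_point([tr[j] for tr in traces], labels) for j in range(2)]
print(snr)
```

The leaking sample point shows an SNR well above that of the pure-noise point, which is how promising points in time can be identified.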
2.2.5 Differential Power Analysis

Differential Power Analysis (DPA), published by Kocher et al. in 1999, is a statistical approach to exploit the data-dependent part of the power consumption[KJJ99]. The idea of the attack is to analyse the power consumption at a fixed moment in time for different inputs. Each point in time is considered independently, which allows recovering not only the secret, but also the point in time at which an intermediate value is processed on the device. Similar to evaluating the SNR, a power model is required to perform a DPA attack, depending on which only a small subset of the key space is considered at a time. The key is usually recovered either bytewise or in small groups of 2 or 4 bytes at once.

2.2.5.1 Classical Differential Power Analysis

In classical DPA, traces are divided into groups based on a power model and a key candidate. Then, the mean is computed for each point in time on the trace for each of those groups. The general idea is that for an incorrect key candidate the traces are grouped randomly such that the means of the individual groups are similar, resulting in a low variance of means. If an accurate power model is selected, the right key guess will lead to a sorting with distinguishable means. For example, when using the Hamming weight as power model, traces where a value with higher Hamming weight is processed are expected to consume more power than traces where a value with lower Hamming weight is processed. This will be reflected by the mean power consumption of the groups if the traces are grouped correctly. In that case the variance of means is higher than the variance of means of randomly grouped traces.
2.2.5.2 Correlation Power Analysis

A more recent variant is Correlation Power Analysis (CPA), which uses hypothetical power consumptions rather than splitting up the traces into groups. For each key guess, a hypothetical power consumption is created for each trace, which is then correlated with the real power consumption. Again, the idea is that the correct key candidate yields a hypothetical power consumption which correlates much better than hypothetical power consumptions generated from wrong candidates. For this purpose the Pearson correlation coefficient is used, which determines the linear dependency between two random variables and yields a value between -1 and 1[BCHC09]. It is computed by dividing the covariance of two sets by the product of their standard deviations:

\[
\rho = \frac{cov(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
\]

For the attack, a key candidate with high absolute correlation is an indicator for a correct guess, while a correlation value close to zero suggests an incorrect guess.
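A minimal CPA can be sketched in Python on simulated data. To keep the example short it does not use AES: a 4-bit toy S-box (the S-box of the PRESENT cipher) stands in for the non-linear step, each "trace" consists of a single sample point, and the leakage is assumed to be the Hamming weight of the S-box output plus Gaussian noise. All names and parameters are assumptions made for this illustration and do not describe the attack performed in this thesis.

```python
import math
import random

# 4-bit toy S-box (PRESENT), used here in place of the AES S-box for brevity.
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(v):
    return bin(v).count("1")


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def cpa(plaintexts, samples):
    """Return the key candidate with the highest absolute correlation and all scores."""
    scores = {}
    for candidate in range(16):
        # Hypothetical power consumption for this key guess.
        hypothesis = [hamming_weight(TOY_SBOX[p ^ candidate]) for p in plaintexts]
        scores[candidate] = abs(pearson(hypothesis, samples))
    best = max(scores, key=scores.get)
    return best, scores


# Simulated measurement with a hypothetical secret key nibble.
rng = random.Random(3)
SECRET_KEY = 0xB
plaintexts = [rng.randrange(16) for _ in range(2000)]
samples = [
    hamming_weight(TOY_SBOX[p ^ SECRET_KEY]) + rng.gauss(0.0, 0.5)
    for p in plaintexts
]

best, scores = cpa(plaintexts, samples)
print(hex(best), round(scores[best], 3))
```

In a real attack the same correlation is computed for every point in time of the measured traces, so the result additionally reveals when the targeted intermediate value is processed.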
2.2.6 The EM Side-channel
Side-channel attacks like Simple Power Analysis (SPA) and Differential Power Analysis (DPA) by Kocher et al. have been exploiting the power consumption since 1999[KJJ99]. However, sometimes the power side-channel is unavailable due to countermeasures or other practical obstacles. The EM side-channel provides a more powerful alternative to the classical power side-channel, which can sometimes be exploited even in the presence of power side-channel countermeasures. While the power side-channel provides only a single aggregated view of the circuit's power consumption, the EM side-channel can be used to inspect various signals at different frequencies. Furthermore, by carefully choosing a probe position, it is possible to focus on only one specific area of the circuit and to ignore unrelated processes happening at other physical locations.

Agrawal et al. divide EM emanations into two categories[AARR03], as described in the following. Direct emanations result from intentional current flows, usually consisting of short bursts of current observable over a wide frequency band. In order to get good results with direct emanations, tiny field probes need to be positioned very close to the signal source, which might require decapsulating the chip packaging. Unintentional emanations are caused by electrical and electromagnetic coupling between components in close proximity. They are the most interesting for this thesis. The effect is amplified by capacitors and the capacitance of power lines, as they emanate a stronger signal when their charge changes compared to the emanations of logic gates and data lines.
2.2.7 Update Formulas
When further processing a large-scale measurement, available resources like Random Access Memory (RAM) or computing power limit the computation of correlations using the standard formulas. A solution to this problem is provided by incremental and pairwise update formulas[SMG15], which offer an efficient, scalable and parallelizable way of computing the mean, variance and correlation.
2.2.7.1 Incremental Mean Update Formula
The standard formula for computing the mean is

\[
\mu_Q = \frac{1}{n} \sum_{i=1}^{n} x_i
\]

If we add a new element to the set without updating the mean, the old mean is computed as

\[
\mu_Q = \frac{1}{n-1} \sum_{i=1}^{n-1} x_i
\]

We call the updated mean

\[
\mu_{Q'} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{n} \left( \sum_{i=1}^{n-1} x_i + x_n \right)
\]

Next, the equation of the old mean is written as

\[
\sum_{i=1}^{n-1} x_i = (n-1)\mu_Q
\]

and inserted into the new mean equation, resulting in

\[
\mu_{Q'} = \frac{1}{n} \left( (n-1)\mu_Q + x_n \right)
\]

By applying the transformation

\[
\frac{1}{n} \left( (n-1)\mu_Q + x_n \right) = \frac{1}{n} \left( n\mu_Q - \mu_Q + x_n \right) = \frac{n\mu_Q}{n} + \frac{-\mu_Q + x_n}{n} = \mu_Q - \frac{\mu_Q - x_n}{n}
\]

we get the incremental update formula

\[
\mu_{Q'} = \mu_Q - \frac{\Delta}{n} \quad \text{for} \quad \Delta = \mu_Q - x_n
\]
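The incremental mean update derived in this section translates directly into code. The following Python sketch uses a hypothetical `RunningMean` class, named for this example only, which processes one element at a time and never stores the full data set.

```python
class RunningMean:
    """Incremental mean: new_mean = old_mean - delta / n with delta = old_mean - x_n."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        self.n += 1
        delta = self.mean - x
        self.mean -= delta / self.n
        return self.mean


data = [2.0, 4.0, 9.0, 1.5, -3.25]
running = RunningMean()
for value in data:
    running.update(value)

# The incremental result agrees with the standard formula up to rounding.
print(running.mean, sum(data) / len(data))
```

Only the element count and the current mean need to be kept in memory, which is what makes this form attractive for large numbers of traces.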
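2.2.7.2 Incremental Variance Update Formula

The standard formula for computing the variance is

\[
\sigma^2_Q = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_Q)^2
\]

However, in order to create an incremental update formula, we will use the unnormalized second central moment (the sum of squared deviations from the mean) instead, which is defined as

\[
s^2_Q = \sum_{i=1}^{n} (x_i - \mu_Q)^2
\]

Following the same procedure as before we get

\[
s^2_Q = \sum_{i=1}^{n-1} (x_i - \mu_Q)^2
\]

if we add one element to the set, but do not update the central moment or the mean. Thus, to get the new central moment we write

\[
s^2_{Q'} = \sum_{i=1}^{n} (x_i - \mu_{Q'})^2 = \sum_{i=1}^{n-1} (x_i - \mu_{Q'})^2 + (x_n - \mu_{Q'})^2
\]

From Section 2.2.7.1 we know that $\mu_{Q'}$ can be written as

\[
\mu_{Q'} = \mu_Q - \frac{\Delta}{n} \quad \text{with} \quad \Delta = \mu_Q - x_n
\]

By inserting that into the central moment formula, we get

\[
\begin{aligned}
s^2_{Q'} &= \sum_{i=1}^{n-1} \left( x_i - \mu_Q + \frac{\Delta}{n} \right)^2 + \left( x_n - \mu_Q + \frac{\Delta}{n} \right)^2 \\
&= \sum_{i=1}^{n-1} \left( (x_i - \mu_Q)^2 + 2(x_i - \mu_Q)\frac{\Delta}{n} + \left( \frac{\Delta}{n} \right)^2 \right) + \left( x_n - \mu_Q + \frac{\Delta}{n} \right)^2 \\
&= \sum_{i=1}^{n-1} (x_i - \mu_Q)^2 + \sum_{i=1}^{n-1} 2(x_i - \mu_Q)\frac{\Delta}{n} + (n-1)\left( \frac{\Delta}{n} \right)^2 + \left( x_n - \mu_Q + \frac{\Delta}{n} \right)^2 \\
&= s^2_Q + \frac{2\Delta}{n} \sum_{i=1}^{n-1} (x_i - \mu_Q) + \frac{(n-1)\Delta^2}{n^2} + \left( -(\mu_Q - x_n) + \frac{\Delta}{n} \right)^2
\end{aligned}
\]

Next, we insert

\[
(\mu_Q - x_n) = \Delta \quad \text{and} \quad \sum_{i=1}^{n-1} (x_i - \mu_Q) = 0
\]

This results in

\[
s^2_{Q'} = s^2_Q + 0 + \frac{(n-1)\Delta^2}{n^2} + \left( \frac{\Delta}{n} - \Delta \right)^2
\]

which we transform to

\[
\begin{aligned}
s^2_{Q'} &= s^2_Q + \frac{(n-1)\Delta^2}{n^2} + \left( \frac{\Delta}{n} - \frac{n\Delta}{n} \right)^2 \\
&= s^2_Q + \frac{(n-1)\Delta^2}{n^2} + \left( \frac{(1-n)\Delta}{n} \right)^2 \\
&= s^2_Q + \frac{(n-1)\Delta^2}{n^2} + \frac{(1-n)^2\Delta^2}{n^2} \\
&= s^2_Q + \frac{(n-1)\Delta^2 + (1 - 2n + n^2)\Delta^2}{n^2} \\
&= s^2_Q + \frac{(n^2 - n)\Delta^2}{n^2} \\
&= s^2_Q + \frac{\Delta^2 (n-1)}{n}
\end{aligned}
\]

Thus, we get the incremental formula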