
NOVEL METHODS OF PASSIVE AND ACTIVE SIDE-CHANNEL ATTACKS

DISSERTATION

submitted in fulfillment of the requirements for the degree of Doktor-Ingenieur at the Faculty of Electrical Engineering and Information Technology of Ruhr-Universität Bochum

by Falk Schellenberg
Bochum, October 2018

Copyright © 2018 by Falk Schellenberg. All rights reserved. Printed in Germany.

Falk Schellenberg
Place of birth: Gera, Germany
Author’s contact information: https://www.emsec.rub.de/chair/_staff/Falk_Schellenberg/

Thesis Advisor: Prof. Dr.-Ing. Christof Paar, Ruhr-Universität Bochum, Germany
Secondary Referee: Prof. Dr. Daniel Holcomb, University of Massachusetts Amherst, USA
Thesis submitted: October 30, 2018
Thesis defense: November 30, 2018
Last revision: December 3, 2018


Abstract

Large mainframes heralded the start of the Digital Revolution, which soon permeated almost all facets of our daily life. Starting in the mid-1970s, shrinking integrated circuits enabled the “one computer per user” paradigm. In its most recent evolution, numerous interconnected devices and sensors form the Internet of Things. The information stored or transferred within these heterogeneous networks needs protection, e.g., to guarantee its authenticity or to ensure its confidentiality. Such security features are realized through cryptographic protocols and algorithms. Although sometimes hidden by multiple layers of abstraction, every cryptographic operation is executed on some form of hardware. While webservers and similar systems might only be remotely accessible, cryptographic hardware in the hands of a potential attacker opens an entirely new attack surface: implementation attacks target the physical realization of algorithms and are mostly independent of the mathematical security of the employed scheme. Passive side-channel attacks try to gain insight into the cryptographic operation through unintentional information channels such as the execution time or power consumption. Active fault injection attacks induce a computational error during the target’s operation, so that the faulty output reveals internals of the operation. This thesis’s research contribution covers both main classes of implementation attacks, i.e., novel passive and active attacks. In the first part, we demonstrate that a voltage sensor built in the user-available fabric of a Field Programmable Gate Array (FPGA) can be used for side-channel attacks on nearby circuits. Even if the targeted circuit is logically isolated, its activity causes fluctuations on the Power Distribution Network (PDN) that might spread far enough to be picked up by the sensor.
We demonstrate the capabilities of the sensor in three scenarios: First, we successfully attack a cryptographic algorithm residing on the same FPGA as the voltage sensor. For multi-tenant FPGAs, this could allow one user to spy on another. As no direct connection to the target is required, the sensor could also be deployed as a malicious IP core. Second, many Systems on Chip (SoCs) today come with additional FPGA fabric as a hardware accelerator. We show that a sensor implemented in the fabric can pick up side-channel leakage of software running on the CPU core. In the third scenario, we reach even further and target cryptographic implementations running on a separate chip on the same circuit board. The only connection between attacker and victim is the shared PDN. Besides FPGAs, our sensor can be implemented in Application Specific Integrated Circuits (ASICs). Both variants pose a large threat to board-level integration. Our contribution indicates that power side-channel countermeasures should be considered in such scenarios, even when an attacker has no physical access. In the second part of our research contribution, we describe novel methods to aid laser fault injection. Whether a useful fault is injected depends heavily on the parameters used and on the location on the die. Iterating over the whole parameter space for every potential point of interest is infeasible. Instead, we propose measuring the Optical Beam Induced Current (OBIC) as an imaging technique, using a setup intended for laser fault injection. We demonstrate that in the image captured through laser scanning, we can easily identify the locations of flip-flops as the primary target for the desired single-bit faults. We stress that, in contrast to a coarse-to-fine fault injection scan, the chip is not powered during imaging. Thus, the identification of flip-flops is independent of other parameters such as pulse length, energy, etc. Further, potential reactive countermeasures such as deleting the key cannot be triggered. In an additional work, we combine Fault Sensitivity Analysis (FSA) and laser fault injection, resulting in relaxed requirements on the fault precision, especially regarding the required spot size. The motivation here is that the minimal laser spot size is physically bounded. Numerous research articles push this boundary by investigating down to which feature size single-bit faults can be injected. In contrast, by precisely timing the laser fault injection, only the longest critical path(s) will affect the target, similar to a timing fault. For shorter paths, it is irrelevant whether they are affected by the laser or not, as a potential faulty value will be overwritten anyway. Thus, we can compensate for the required laser spot size by proper timing.

Keywords. Implementation Attacks, Side-Channel Analysis, Field Programmable Gate Arrays, Laser Fault Attacks

Kurzfassung

Novel Methods of Passive and Active Side-Channel Attacks

The Digital Revolution emerged from mainframe computers that were used by multiple users simultaneously. This changed in the mid-1970s, when more compact integrated circuits ushered in the new paradigm of “one computer per user”. The latest evolutionary stage connects a multitude of small devices and sensors, giving birth to the Internet of Things. The information and data stored or transported in such heterogeneous networks require cryptographic protection, e.g., to guarantee the authenticity or confidentiality of a message. Often hidden behind many layers of abstraction, every cryptographic scheme is executed on some kind of hardware. In contrast to, e.g., webservers, to which physical access is usually not possible, an embedded device in the hands of an attacker can open entirely new attack paths: so-called implementation attacks target the physical realization of a security function and are largely independent of its mathematical security. Passive side-channel attacks use unintentional information channels such as the required computation time or the power consumption to draw conclusions about intermediate values during the computation. Active fault attacks try to induce an error during the cryptographic computation. An analysis of the faulty output may likewise allow conclusions about intermediate values. The research contributions of this dissertation span both of these main areas of implementation attacks: novel active and passive attacks. In the first part, we demonstrate how a self-developed voltage sensor within a Field Programmable Gate Array (FPGA) can be used for side-channel attacks on neighboring circuits.
Even if the victim circuit is isolated from the sensor at the logic level, it causes voltage fluctuations on the Power Distribution Network that can be detected by the sensor. The capabilities of the sensor are demonstrated in three different scenarios: First, a neighboring circuit within the same FPGA is attacked. In use cases where multiple users share an FPGA, one user could spy on another. Moreover, no direct logic connection to the sensor is required. Hence, the sensor could also be smuggled in through a manipulated IP core. In the second scenario, the sensor is located in the FPGA logic of a System-on-Chip and successfully spies on software running on the CPU. In the third scenario, we capture side-channel information of a chip that merely resides on the same circuit board. Again, the only connection is the shared power supply. Our contribution shows that side-channel countermeasures may be necessary even when an attacker has no physical access.

The second part of our research contribution covers novel methods of laser fault injection. Whether a fault useful to the attacker can be injected depends on many parameters. Testing the complete parameter space exhaustively is infeasible in terms of time. Instead, we propose measuring the current induced by the laser and generating an image from it (Optical Beam Induced Current). We show that flip-flops can thus be identified easily, so that the frequently desired “single-bit” faults can later be injected there. Since the chip is switched off during imaging, important fault injection parameters do not matter at this stage, and, moreover, reactive countermeasures cannot be triggered. In another work, we combine “Fault Sensitivity Analysis” with laser fault injection. This results in a very convenient fault model, especially with respect to the required spot size. Since the minimal achievable spot size is physically bounded, many articles have investigated down to which technology size single-bit faults can still be injected. In contrast, we use temporally precise laser pulses so that only the critical paths have an effect. Similar to timing attacks, faults in shorter paths are irrelevant because they are overwritten anyway. Thus, we can compensate for a large laser spot through precise timing.

Keywords. Physical Attacks, Side-Channel Analysis, Field Programmable Gate Arrays, Laser Fault Attacks

Acknowledgements

The work described in this thesis spans almost six years of research at the Chair of Embedded Security (EMSEC) at Ruhr-Universität Bochum (RUB). This thesis would not have been possible without the support of my family, my friends, and my colleagues at RUB: First, I thank Christof Paar for being a great supervisor and for his tremendous support while, at the same time, leaving me room to develop my own projects. A thousand thanks! I also thank Daniel Holcomb for agreeing to be my external referee. I thank my later office neighbors Timo Kasper and David Oswald for supporting me and for raising my interest in physical attacks, which convinced me to pursue my PhD at EMSEC. I thank Amir Moradi, the guru of side-channel attacks, for his remarkable support. I thank Irmgard Kühn and Horst Edelmann for keeping EMSEC running and for their administrative and technical support. I also thank all my co-authors (in alphabetical order): Anita Aghaie, Florian Bache, Carsten Brenner, Benedikt Driessen, Markus Finkeldey, Nils C. Gerhardt, Dennis R.E. Gnad, Lena Göring, Martin R. Hofmann, Daniel Holcomb, Timo Kasper, Shahrzad Keshavarz, Gregor Leander, Amir Moradi, David Oswald, Shahram Rasoolzadeh, Bastian Richter, Maximilian Schäpers, Tobias Schneider, Daehyun Strobel, and Mehdi B. Tahoori. Further, I thank all my friends and colleagues at RUB for their support, especially (in no particular order): Bastian Richter, with whom I shared an office after David and Timo, for many technical discussions; Markus Kasper, from whom I inherited funding through truly interesting projects; Markus Finkeldey and Nils C. Gerhardt for advice in optics and for making the numerous train rides enjoyable; Dennis Gnad for the fruitful collaboration on voltage sensors in fabric; Tobias Schneider for joining me at soccer and, together with Pascal Sasdrich, for explaining a lot of math to me; Georg T. Becker for his coffee visits as a postdoc; Klaus Gomann for testing chemical processes for decapsulation with me and, later on, Norbert Ischler and Nicole Lütkemöller for allowing me to use their fume hood; Gregor Leander and Stefan Heyse for discussions and advice during many 5-minute breaks; Susanne Engels for joining me at soccer and Tony’s choreography; Endres Puschner for continuing the work on the Brett; David Knichel for keeping project-related workload off me while I was writing this thesis; and Georg T. Becker, Christof Beierle, Gesine Hinterwälder, Thorben Moos, Bastian Richter, Tobias Schneider, Daehyun Strobel, Pawel Swierczynski, Carina Wiesen, and Christian Zenger for various (self-paid!) private stays that extended conference trips into enjoyable holidays. Thank you! Finally, I thank all the students I happily supervised.


Table of Contents

Preface
Imprint
Abstract
Kurzfassung
Acknowledgements

I Preliminaries

1 Introduction
1.1 Motivation
1.2 Summary of Research Contribution
1.2.1 Novel Applications of Side-Channel Attacks
1.2.2 Advanced Laser Fault Injection Attacks
1.3 Structure of this Thesis

2 Implementation Attacks
2.1 Introduction
2.2 Side-Channel Analysis
2.2.1 Simple Power Analysis (SPA)
2.2.2 Differential Power Analysis (DPA)
2.2.3 Countermeasures
2.3 Fault Attacks
2.3.1 Physical Means of Fault Injection
2.3.2 Fault Exploitation
2.3.3 Countermeasures

II Novel Applications of Side-Channel Attacks

3 Preliminaries
3.1 Introduction
3.2 Field Programmable Gate Arrays
3.3 Power Distribution and Consumption
3.3.1 Power Distribution Networks (PDNs)
3.3.2 Power Requirements of CMOS devices
3.4 Measuring the Power Side Channel
3.4.1 Traditional Measurement
3.4.2 Voltage Sensors in FPGA Fabric

4 Intra-FPGA Side-Channel Attacks
4.1 Introduction
4.2 Adversary Model
4.3 Experimental Setup
4.3.1 Voltage Sensor in FPGA Fabric
4.3.2 AES Module
4.4 Results on the SAKURA-G
4.4.1 Implementation
4.4.2 Sensor placed close to the AES core
4.4.3 Distant Sensor
4.5 Results on General-Purpose FPGA Development Platforms
4.5.1 Digilent Basys3
4.5.2 Digilent PYNQ
4.6 Conclusion

5 System-on-Chip Side-Channel Attacks, from fabric to CPU
5.1 Introduction
5.2 Experimental Setup
5.3 Results
5.4 Conclusion

6 Inter-FPGA Side-Channel Attacks
6.1 Introduction
6.2 Adversary Model
6.3 Experimental Setup
6.4 Results
6.4.1 Attack on AES
6.4.2 Attack on RSA
6.4.3 Discussion
6.5 Conclusion

III Active Side-Channel Attacks

7 Background on Laser Fault Injection
7.1 Introduction
7.2 Physical Properties
7.2.1 Photoelectric Effect in Semiconductors
7.2.2 Single Event Transient
7.2.3 Single Event Upset
7.2.4 Wavelength
7.2.5 Spot Size
7.3 Sample Preparation

8 Locating Points-of-Interest for Laser Fault Injection using OBIC Measurements
8.1 Introduction
8.1.1 Related Work
8.1.2 Our Contribution
8.2 Optical Beam Induced Current
8.3 Experimental Setup
8.3.1 Device under Test (DUT)
8.3.2 Optical Setup
8.4 Results
8.4.1 Estimating the Rough Location of AES
8.4.2 OBIC Measurements
8.4.3 Correlation-Based Pattern Recognition
8.4.4 Finding the Correct Timing
8.4.5 Targeting found Locations
8.4.6 Differential Fault Attack
8.5 Discussion
8.5.1 Reduction of Points of Interest
8.5.2 OBIC versus Reflective Imaging
8.5.3 Influence of the Technology Node
8.6 Conclusion

9 Large Laser Spots and Fault Sensitivity Analysis
9.1 Introduction
9.2 Fault Sensitivity Analysis
9.3 Laser-Based Fault Sensitivity Analysis
9.3.1 Timing Violations by Different Laser Pulse Lengths
9.3.2 Attack Strategy
9.4 Practical Evaluation
9.4.1 Experimental Setup
9.4.2 Measuring Individual Timings
9.4.3 Attack Results
9.5 Discussion
9.6 Conclusion

IV Conclusion

10 Conclusion and Future Work
10.1 Conclusion
10.2 Future Work
10.2.1 Increasing the Distance between Sensor and Victim
10.2.2 Additional Scenarios and Countermeasures
10.2.3 Laser Fault Injection on Latest Technology Nodes and Related Attacks

V Appendix

Bibliography

List of Abbreviations

List of Figures

About the Author

Publications and Academic Activities

Part I

Preliminaries


Chapter 1 Introduction

Motivated by the increasing pervasiveness of embedded devices in the Internet of Things, we introduce physical attacks as an ongoing threat to cryptographic implementations. We summarize our contributions to the fields of passive and active implementation attacks and outline the structure of the remainder of this thesis.

Contents of this Chapter

1.1 Motivation
1.2 Summary of Research Contribution
1.3 Structure of this Thesis

1.1 Motivation

The Digital Revolution was paved by the invention of the transistor in 1947: as a replacement for bulky and error-prone vacuum tubes, a digital switch was, for the first time, made of semiconductor material. Soon after, multiple transistors were fabricated on a single crystal, forming the first integrated circuit in 1949. Continued integration drastically reduced the size and power consumption of computers and other digital systems. From the late 1950s until the 1970s, a single large mainframe computer was typically shared among multiple users. Roughly every following decade revolutionized the interaction between humans and computers anew. Starting in the mid-1970s, personal computers and gaming consoles opened computing and digital entertainment to everybody, starting the “one computer per user” paradigm. The first commercially available hand-held mobile phone was introduced by Motorola in 1983. In 1991, the World Wide Web as we know it today became publicly accessible. This new territory opened numerous new markets, which eventually led to the dot-com bubble in the late 1990s. Starting in the early 2000s, cloud storage and, more prominently, cloud computing can be seen as a back-to-the-roots development, as they share characteristics with mainframe computers: users can dynamically rent storage or computing power suiting their business without buying actual hardware. Another important milestone is the availability of smartphones starting in 2007: mobile phones with increased computing power and better cellular networks merged Internet access and PCs into a single mobile personal computer. Going even further, multiple (small) connected devices now form the Internet of Things. This includes mostly stationary devices in Smart Home applications as well as mobile sensors, actuators, etc., all connected through a local network or the Internet. The equivalent of the Internet of Things in industry is called Industry 4.0, the most recent IT evolution in the industrial manufacturing sector.

All the developments described above require strong security features or services tailored to the specific use case. For example, whenever sensitive or private user data is handled, privacy and confidentiality are of concern. An orthogonal example is transferring money digitally, e.g., from one bank account to another. While keeping the transferred amount secret is convenient, the integrity and authenticity of the ordered transfer are crucial. For old mainframe computers, such goals might have been achieved easily through simple access rights or locks on the door. However, the vast number of connected devices and of data being stored or in transit requires more complex and adapted security measures. Security goals in virtually every application, be it SSL, WhatsApp, or Bitcoin, are achieved using one or multiple cryptographic primitives as atomic building blocks. There are two basic classes of cryptographic primitives, characterized by how the cryptographic key material is distributed. In symmetric schemes, all communicating parties share a common secret key. In asymmetric schemes, there is a public key, e.g., for encryption, and a distinct private key, e.g., for decryption. Asymmetric schemes are often constructed based on number-theoretic problems that are believed to be hard, such as factoring large integers. For most symmetric schemes, there are no strict mathematical proofs. Instead, trust in such schemes is usually based on their description having been public, and thus open to public evaluation, for years or even decades. Ultimately, every cryptographic protocol or algorithm is executed on some form of hardware, be it a CPU within Amazon’s server farm, a coprocessor in a smartphone, or a tiny microprocessor in a medical sensor.
The crux is that while webservers might only be remotely accessible, most embedded microprocessors are in the hands of a potential attacker. Thus, an attacker might try to tamper with the device, take it apart, record its electromagnetic emanation, etc. Going beyond mere access to the inputs and outputs of the device, this opens an additional attack surface for a physical attacker. Insufficient physical security can result in the disclosure of secret data such as cryptographic keys and, potentially, in a full compromise of a system. Such implementation attacks can be divided into two categories: passive and active attacks. Passive side-channel attacks exploit unintentional information channels that might leak information about the secret key. Such side channels include the response time of the device, its power consumption, or, physically closely related, its electromagnetic emanation. If the response time of a device depends on the value of secret data, it might be easy for an attacker to recover this secret, even remotely over the Internet. Conditional branches or cache misses are often the source of such timing variations. The power consumption of a circuit depends on the operation currently being performed, i.e., it is linked to the transistors switching within a clock cycle of some computation. This might allow deducing information on what is currently being processed: for example, a binary exponentiation might be vulnerable when an attacker can distinguish a multiplication from a squaring, e.g., because they are implemented in distinct functions. If the power consumption is linked to a key-dependent intermediate value, e.g., when this value is transferred over a bus, multiple traces can be statistically evaluated with Differential Power Analysis (DPA). Active attacks form the second main class of implementation attacks.
Here, the attacker either operates the device outside its specification, e.g., by clock glitches or underpowering, or affects the computation by an additional external stimulus, e.g., laser radiation or electromagnetic (EM) pulses. The goal of the attacker is to create a computational error, or fault, during the target’s computation. Afterwards, the attacker uses the genuine and the faulty output of a cryptographic algorithm to perform Differential Fault Analysis (DFA), i.e., tries to deduce which secret keys could lead to such an output pair for the given fault. The mathematical security of a scheme is evaluated considering a black-box model, i.e., only inputs and outputs are available to an attacker. In contrast, all implementation attacks are based on gaining some insight into the black box. This means that the success of physical attacks is usually independent of the mathematical security. For example, if the device under test can somehow be forced to spill out its secret key using fault injection, the employed scheme is entirely irrelevant. Regarding countermeasures, specific programming styles can avoid timing side channels. Against DPA, one option is to bury the signal in noise (hiding). A large body of research aims at making the power consumption independent of the secret through randomization (masking). In practice, a combination of both is often used. For active attacks, there are sensors to detect specific methods of fault injection, e.g., by monitoring the supply voltage for glitches. More generally, one can employ several variants of redundancy (in time or area) with a final comparison. Finally, it is important to highlight the cycle in security research: as there are no strict mathematical proofs for most cryptographic schemes, trust in them can only be established through continuous public testing and evaluation. The same applies to research within the field of physical security. For a designer, it is crucial to know potential attacks that compromise the security of a device. Otherwise, it is impossible to select or develop appropriate countermeasures.

1.2 Summary of Research Contribution

This thesis’s research contributions span a broad area within the field of implementation attacks, i.e., novel passive side-channel attacks and active fault injection. First, we demonstrate that a voltage sensor can be built in the fabric of a Field Programmable Gate Array (FPGA). Through voltage fluctuations in the Power Distribution Network (PDN), this sensor can capture power traces without requiring an external oscilloscope at all. This enables spying on other cores on the same FPGA, on the CPU of an SoC, or even on an external chip connected to the same power network. Indeed, our contributions imply that power side-channel countermeasures might have to be deployed even when an attacker does not have physical access to the device. In the field of active attacks, we present an imaging technique for finding points of interest for laser fault injection: by measuring the Optical Beam Induced Current (OBIC), we can easily identify registers without requiring the chip to be powered on. Because the minimal laser spot size is physically bounded, there is an ongoing discussion whether single-bit errors are possible at the smallest feature sizes. Instead of pursuing ever smaller spots, we demonstrate in an additional work how to combine laser fault injection with Fault Sensitivity Analysis (FSA), enabling successful fault attacks even when the laser hits many transistors at once. In the following, we provide a more detailed summary of our contributions and their security implications.
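Why timing can compensate for a large spot can be pictured with a deliberately simplified model. The assumptions are made for this sketch only and are not the exact model of the later chapters: the laser pulse disturbs every path under the spot until `pulse_end`; afterwards each path needs its propagation delay to settle again; and the capturing register samples at `clock_period`. Only paths that are still unsettled at the clock edge latch a fault. All other disturbed paths have been overwritten with their correct value in time. The path names and delays are hypothetical.

```python
def faulted_paths(delays: dict[str, float], pulse_end: float, clock_period: float) -> list[str]:
    """Paths under the spot that have not re-settled when the capturing clock edge arrives.

    Simplified assumption: every path is disturbed until `pulse_end` and then
    needs its full propagation delay to recover its correct value.
    """
    return sorted(name for name, d in delays.items() if pulse_end + d > clock_period)


if __name__ == "__main__":
    # Hypothetical path delays (ns) of logic covered by one large laser spot.
    delays = {"short_a": 1.2, "short_b": 2.8, "medium": 4.1, "critical": 6.3}
    clock_period = 10.0
    # Move the end of the laser pulse step by step towards the clock edge.
    for pulse_end in (2.0, 4.0, 5.0, 6.5, 8.0):
        print(pulse_end, faulted_paths(delays, pulse_end, clock_period))
```

When the end of the pulse moves towards the clock edge, the model first faults only the critical path and then progressively shorter ones. The pulse timing, not the spot size, thus selects which of the illuminated logic actually produces a fault.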


1.2.1 Novel Applications of Side-Channel Attacks

We present a circuit built within the user-available fabric of an FPGA that can measure the power consumption of other circuits nearby. We exploit the fact that dynamic power consumption causes fluctuations on the Power Distribution Network (PDN), which in turn change the switching speed of connected transistors. To measure these fluctuations, we use a tapped delay line. When the chip (instantaneously) consumes a lot of power, the voltage in the PDN drops due to the increased current demand. Consequently, the buffers of the delay line become slower, and a signal traveling through the delay line does not get as far. Likewise, if the chip consumes only little power, the voltage is high, the circuit is fast, and a signal through the delay line reaches further. Thus, the progress of our test signal through the delay line directly corresponds to the current power consumption. We compare the performance of our sensor for side-channel analysis to that of a traditional measurement using an external oscilloscope. Our power sensor can be deployed through an inconspicuous IP core to measure the power consumption remotely without requiring physical access. In the following, we briefly highlight the security implications of such a sensor in different scenarios.
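The measurement principle of the tapped delay line can be summarized in a small behavioural model. The Python sketch below assumes a linear relation between core voltage and buffer delay and uses made-up parameters (`NUM_TAPS`, `T_CAPTURE_NS`, `SENSITIVITY`, etc.) that are not characterized values of any FPGA. It only illustrates how a voltage drop shortens the distance an edge travels before the tap registers sample it. Counting the ones in the captured thermometer code then yields the sensor value.

```python
import math

NUM_TAPS = 64          # buffers/tap registers in the delay line (assumed)
T_CAPTURE_NS = 5.0     # time between launching the edge and sampling the taps (assumed)
T_BUF_NOM_NS = 0.1     # delay of one buffer at nominal voltage (assumed)
V_NOM = 1.0            # nominal core voltage in volts (assumed)
SENSITIVITY = 1.5      # relative delay increase per volt of droop (assumed, linearized)


def buffer_delay(v_core: float) -> float:
    """Delay of one buffer: a lower supply voltage makes the buffer slower."""
    return T_BUF_NOM_NS * (1.0 + SENSITIVITY * (V_NOM - v_core))


def sample_taps(v_core: float) -> list[int]:
    """Thermometer code captured by the tap registers (1 = edge has already passed)."""
    reached = min(NUM_TAPS, math.floor(T_CAPTURE_NS / buffer_delay(v_core)))
    return [1] * reached + [0] * (NUM_TAPS - reached)


def sensor_value(v_core: float) -> int:
    """Sensor output for one sampling period: number of taps the edge travelled through."""
    return sum(sample_taps(v_core))


if __name__ == "__main__":
    # Hypothetical supply voltage over time: idle, a droop while a victim switches, recovery.
    voltages = [1.00, 1.00, 0.97, 0.93, 0.95, 0.99, 1.00]
    print([sensor_value(v) for v in voltages])
```

In this model, a lower core voltage yields a smaller count. Recording one count per sampling period produces a trace that can be processed like an oscilloscope trace.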

Intra-FPGA

Multiple users per physical CPU is already common practice in cloud services using virtual machines. This scenario made researchers investigate possible vulnerabilities, indeed uncovering all sorts of microarchitectural cache timing attacks. We follow in those footsteps by pointing out possible security issues for FPGAs: concepts for multi-tenant FPGAs are already widely investigated in academia. Here, different users might use different parts of the available fabric. The large players in cloud computing already offer to rent machines with powerful FPGA add-ons. Thus, we might see multiple users sharing a single FPGA in practice, with the potential of one user trying to spy on another. Besides multi-tenant FPGAs, there is of course always some risk when integrating potentially malicious third-party Intellectual Property (IP) cores. To counter obvious security concerns, logical and physical isolation of the individual users or cores must be ensured. However, even when there is no logical connection and enough physical separation to prevent crosstalk, the entire configurable logic of the FPGA shares a common Power Distribution Network (PDN). We test whether the sensor can capture the leakage of an AES hardware implementation through the PDN on the SAKURA-G side-channel evaluation board. Indeed, our results show that the sensor can successfully measure the power consumption and that key extraction is possible. Compared to a traditional measurement using an oscilloscope, we require only a slightly increased number of traces. As the PDN spreads over the whole fabric, the attack succeeds even with physical isolation and with both cores (sensor and AES) placed in different corners of the FPGA.
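Key extraction from such traces is commonly done with Correlation Power Analysis (CPA). The Python sketch below simulates one sample point per trace, assuming a Hamming-weight leakage of the first-round AES S-box output plus Gaussian noise, and correlates every key hypothesis with the simulated leakage. The leakage model, the noise level, and the helper names (`simulate_traces`, `cpa_key_byte`) are assumptions for illustration and do not reproduce the actual measurement setup.

```python
import math
import random


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def rotl8(v: int, n: int) -> int:
    return ((v << n) | (v >> (8 - n))) & 0xFF


def build_sbox() -> list[int]:
    """AES S-box: GF(2^8) inverse followed by the affine transform."""
    sbox = []
    for a in range(256):
        inv = 0 if a == 0 else next(b for b in range(1, 256) if gf_mul(a, b) == 1)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def hamming_weight(v: int) -> int:
    return bin(v).count("1")


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def simulate_traces(key_byte: int, n_traces: int, noise_sd: float, rng: random.Random):
    """Hypothetical traces: HW(S(p ^ k)) plus Gaussian noise at one sample point."""
    plaintexts = [rng.randrange(256) for _ in range(n_traces)]
    leakage = [hamming_weight(SBOX[p ^ key_byte]) + rng.gauss(0.0, noise_sd) for p in plaintexts]
    return plaintexts, leakage


def cpa_key_byte(plaintexts: list[int], leakage: list[float]):
    """Return the key guess with the highest absolute correlation and all scores."""
    scores = []
    for k in range(256):
        hypothesis = [hamming_weight(SBOX[p ^ k]) for p in plaintexts]
        scores.append(abs(pearson(hypothesis, leakage)))
    best = max(range(256), key=lambda k: scores[k])
    return best, scores


if __name__ == "__main__":
    rng = random.Random(42)
    pts, leak = simulate_traces(0x2B, 500, 1.0, rng)
    guess, scores = cpa_key_byte(pts, leak)
    print(f"best guess {guess:#04x} with |rho| = {scores[guess]:.2f}")
```

Increasing `noise_sd`, e.g., to mimic a less precise sensor, lowers the correlation of the correct key and therefore raises the number of traces needed, which matches the qualitative comparison with the oscilloscope above.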

System-on-Chip

Besides cloud computing with FPGA accelerators, there exist other systems consisting of multiple hardware modules. Systems on Chip often include user-dedicated FPGA fabric as an accelerator in addition to one or more CPU cores and ASIC cores. If the FPGA fabric and the CPUs share a common power supply, there is a high chance of an additional power side channel. We were able to successfully attack a straightforward RSA implementation running on a Xilinx ZYNQ SoC.


The RSA implementation used the GnuMP multi-precision library and was executed on the ARM CPU. Indeed, the sensor implemented in the FPGA fabric was able to pick up the differences between a squaring and a multiplication, allowing direct key recovery from a single trace.
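The following Python sketch illustrates why telling squarings from multiplications suffices for key recovery. It implements a generic left-to-right square-and-multiply exponentiation and assumes an idealized single-trace leakage in which every operation is perfectly labelled `S` (square) or `M` (multiply). It does not model GnuMP or the ZYNQ sensor, and the function names are hypothetical.

```python
def square_and_multiply(base: int, exponent: int, modulus: int) -> tuple[int, str]:
    """Left-to-right binary exponentiation that also records its operation sequence."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus  # a squaring for every exponent bit
        ops.append("S")
        if bit == "1":
            result = (result * base) % modulus  # an extra multiplication only for 1-bits
            ops.append("M")
    return result, "".join(ops)


def exponent_from_ops(ops: str) -> int:
    """Idealized SPA: 'SM' encodes a 1-bit, a lone 'S' encodes a 0-bit."""
    bits = ops.replace("SM", "1").replace("S", "0")
    return int(bits, 2)


if __name__ == "__main__":
    n, d = 3233, 2753  # textbook toy RSA modulus (61 * 53) and private exponent
    m, ops = square_and_multiply(2790, d, n)
    print(m, ops, exponent_from_ops(ops) == d)
```

With these textbook toy parameters, the call decrypts the ciphertext 2790 to the plaintext 65. The recorded operation string alone is enough for `exponent_from_ops` to rebuild the private exponent d.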

Inter-FPGA

The power distribution network does not stop at chip level but continues to the board level with dedicated Power Management Integrated Circuits (PMICs). We show that the sensor can pick up side-channel signals originating from other chips that are connected to the same PDN. Leaving the package boundary of a single FPGA or SoC has severe security implications for board-level integration. While FPGAs are often thought of as a security add-on, a malicious firmware or bitstream could potentially be used to spy on another chip on the same Printed Circuit Board (PCB). Further, our design is not limited to FPGAs but can be implemented in ASICs as well. Thus, virtually any otherwise unsuspicious device could be used to mount an attack, as long as it is connected to the same PDN as the target. As the SAKURA-G features an additional control FPGA, we used this platform to test our assumption: the AES core and an RSA implementation were running on one FPGA, and the sensor was implemented on the other FPGA. We modified the PCB so that both FPGAs were powered by a single PMIC. Indeed, the RSA implementation was vulnerable to Simple Power Analysis (SPA) using a single trace. Further, we captured enough side-channel information for a Correlation Power Analysis (CPA) to succeed, although with a larger number of traces than in the intra-FPGA attack.

1.2.2 Advanced Laser Fault Injection Attacks

Fault attacks are based on enforcing some computational error during the cryptographic operation. A subsequent mathematical analysis of the genuine and the faulty output might reveal the secret key used. To inject a physical fault into the target device, multiple methods with varying granularity are known. For example, voltage or clock glitches usually affect the whole device, and electromagnetic fault injection affects at least a large area. Laser fault injection allows faulting even single transistors and has thus attracted interest in both academia and practice. Here, a laser beam is focused on the device so that an electrical current is generated, similar to the current in a solar cell caused by the photoelectric effect. If this current is generated at certain positions, it may charge or discharge circuit nodes, creating the desired computational error. For combinational logic, the fault is only transient, as the circuit will return to its genuine state depending on its input. When targeting memory elements, e.g., SRAM or registers, the fault will be stored permanently until it is overwritten or reset. In the following, we briefly summarize our research contribution relating to laser fault injection.

Measuring the Optical Beam Induced Current for Finding Points of Interest

Many parameters determine the outcome of a fault injection, i.e., whether a physical fault occurs or not. Such parameters include the correct clock cycle, the duration of the effect, its physical intensity, etc. Laser fault injection adds some more parameters, e.g., the spot size, the focal plane, and the precise location on the chip. When using a multi-beam setup, all those parameters must be correct for every individual laser for a useful fault to appear. A brute-force approach for finding an optimal set of parameters is often infeasible because of the sheer number of tests to be executed. Thus, a clever search strategy is required. This especially holds for high-precision methods like laser fault injection targeting single transistors, due to the vast number of transistors in the latest technology nodes. Previous work used machine learning techniques based on the reasonable assumption that a useful fault occurs somewhere between "nothing happened" and "the device stops working". To determine points of interest for laser fault injection, we propose measuring the Optical Beam Induced Current (OBIC) while scanning over the device to create an image of it. Advantages over brute-force or "coarse-then-fine" scanning include that we can identify multiple areas of interest while the chip is not powered. This means that (reactive) countermeasures are shut off as well and that the locations can be found independently of the correctness of other parameters. Further, our method is possible with every setup for laser fault injection, and scanning provides a better resolution compared to a regular camera. We provide experimental results by successfully locating the registers within the combinational logic of an ATXmega16A4U microcontroller. Indeed, when only targeting the desired state registers of AES, we were able to drastically reduce the search space compared to a naive search over the whole area.

Large Laser Spots and Fault Sensitivity Analysis

The diffraction limit given by the wave nature of light forms a physical limit for the minimal size of the laser spot, which directly depends on the wavelength. Nowadays, laser fault injection is usually performed from the backside, as multiple metal layers and metal fill block the light from reaching the transistors from the front side. For attacks from the backside, however, the laser beam must pass through the bulk silicon. Even with backside thinning, we still require a wavelength within the near-infrared spectrum for the silicon to become transparent. This limits the minimal spot size to around a single micrometer, which is orders of magnitude larger than the minimum feature sizes of the latest technology nodes. Therefore, one might be tempted to think that such small features result in inherent security, as it is impossible to target single transistors. However, the minimal feature size is usually determined by the gate width of a transistor; a whole logic gate constructed from multiple transistors is usually much larger. On top of that, we present a fault injection and evaluation method that can easily cope with numerous affected transistors. To this end, we combine laser fault injection with Fault Sensitivity Analysis (FSA) in its correlation-enhanced version. FSA was previously only considered for clock glitches. However, we show that it is also applicable when using lasers by adjusting the pulse length of the laser or the point in time at which the radiation is stopped. Like the traditional timing-based FSA, our method does not require a precise understanding of the fault model, but only that an identical input results in an identical faulty behavior. Likewise, in contrast to classical DFA, we do not require the faulty output but solely the information whether a fault occurred or not. We adjust the pulse width so that faults appear only for some of the inputs applied to the circuit.
This allows finding collisions between key bytes: since identical inputs result in identical fault behavior, two key bytes collide when the circuit processes the same key-dependent value for both. Especially noteworthy in the context of laser fault injection is that we can allow a rather large spot affecting many transistors. The reason is that only the longest path(s) will result in an actual fault, while all other induced faults are ineffective, as they are overwritten by the genuine, non-faulty input. We provide experimental results targeting the AES implementation of an ATXmega with an artificially large spot size of 40 µm. Although the ATXmega is manufactured in a rather old technology, extrapolating our results to the latest feature sizes shows that laser fault injection might still be applicable. Thus, we conclude that one should not expect circuits to become inherently secure with shrinking feature sizes and that suitable countermeasures still need to be applied.
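The collision idea can be illustrated with a small simulation. The sketch below is a toy model under explicit assumptions, not the experimental procedure of this thesis: a hypothetical `sensitivity` table gives, for each value x processed by a shared circuit (e.g., an S-box input p ⊕ k), the probability that a laser pulse of fixed length causes a fault, and the fault rate observed per plaintext byte value is modeled as this probability plus Gaussian noise. Correlating the profile of one key byte with XOR-shifted versions of another yields a candidate for the key difference k_a ⊕ k_b, in the spirit of a correlation-enhanced collision attack.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def fault_profile(key_byte, sensitivity, rng, noise=0.05):
    """Hypothetical fault rate per plaintext byte value v: it depends only on the
    key-dependent value v ^ key_byte processed by the shared circuit, plus noise."""
    return [sensitivity[v ^ key_byte] + rng.gauss(0.0, noise) for v in range(256)]

def recover_key_difference(profile_a, profile_b):
    """Correlation-enhanced collision: the XOR shift that best aligns both
    profiles is the candidate for key_a ^ key_b."""
    def score(delta):
        return pearson(profile_a, [profile_b[v ^ delta] for v in range(256)])
    return max(range(256), key=score)

rng = random.Random(1)
# Hypothetical sensitivity table: probability that a laser pulse of fixed length
# causes a fault while the shared circuit processes value x.
sensitivity = [rng.random() for _ in range(256)]
k_a, k_b = 0x3C, 0xA7                      # unknown to the attacker
prof_a = fault_profile(k_a, sensitivity, rng)
prof_b = fault_profile(k_b, sensitivity, rng)
print(hex(recover_key_difference(prof_a, prof_b)))  # → 0x9b (= 0x3C ^ 0xA7)
```

Only fault/no-fault statistics enter the profiles, so neither the faulty output nor a precise fault model is needed. Once all pairwise differences are known, the remaining key bytes follow from guessing a single byte (256 candidates).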

1.3 Structure of this Thesis

This thesis consists of four parts: We start with a short introduction and a summary of our research contribution. Further, we provide a more detailed overview of implementation attacks in general. The following two parts constitute the main part of this thesis and present our contribution in detail, split into passive and active attacks. For each of the two topics, we introduce the technical background and related work that the reader might require later. Finally, we conclude and give directions for future research.

Preliminaries In this part, we first motivate research in the field of implementation attacks and summarize our research contribution. In the following chapter, we provide an overview of the seminal work for passive and active attacks and countermeasures.

(1) Introduction

(2) Implementation Attacks

Novel Applications of Side-Channel Attacks This part constitutes the first part of our research contribution, i.e., to the field of passive side-channel analysis. We start with an explanation of the working principle of Field Programmable Gate Arrays (FPGAs), the source of power side-channel information, and fluctuations on the Power Distribution Network (PDN). Further, we describe different types of sensors to detect such fluctuations and their implementation in FPGAs. Then, we use such a sensor for successful side-channel analysis in the three different scenarios listed below and discuss the consequential security implications.

(1) Preliminaries

(2) Intra-FPGA Side-Channel Attacks

(3) System-on-Chip Side-Channel Attacks, from fabric to CPU

(4) Inter-FPGA Side-Channel Attacks

Active Side-Channel Attacks The second main part of this thesis covers our contribution to the field of laser fault injection. We start with an introduction providing background on the principles of laser fault injection and the involved parameters. We present a method for finding flip-flops as targets for laser fault injection by creating images based on the current induced by the laser. Further, we show a way to perform fault sensitivity analysis using laser fault injection, resulting in a relaxed fault model with respect to the required spot size.

(1) Background on Laser Fault Injection


(2) Locating Points-of-Interest for Laser Fault Injection using OBIC Measurements

(3) Large Laser Spots and Fault Sensitivity Analysis

Conclusion Finally, we provide a summary of our research contribution and the consequential security implications. Further, we discuss directions of future research in the covered areas of implementation attacks.

Chapter 2 Implementation Attacks

We introduce the field of implementation attacks, providing background on passive side-channel analysis and active fault injection attacks.

Contents of this Chapter

2.1 Introduction
2.2 Side-Channel Analysis
2.3 Fault Attacks

2.1 Introduction

Implementation attacks target the specific implementation of a cryptographic scheme and are mostly independent of its mathematical security. Such attacks usually involve probing the target during execution and thus provide information in addition to its inputs and outputs. For example, an adversary might try to capture unintentional information channels leaking secrets (passive attacks), try to inject faults to manipulate intermediate values (active attacks), or even both at the same time. This thesis's main contributions are in these two major areas of implementation attacks. Hence, we briefly describe the ideas behind such attacks and point to the seminal work.

2.2 Side-Channel Analysis

Side-channel attacks are passive implementation attacks. The adversary captures unintended information channels that might exist besides the intentional input/output behavior. One example is the timing side channel [Koc96]. An implementation might be vulnerable if the execution time of the algorithm depends on some secret value. We explain the concept using a bad implementation of a string-compare operation that works as follows: the symbols are compared consecutively to a secret reference value, and the next symbol is only processed if the preceding symbol was correct. It is easy to deduce the secret reference value by testing symbols and measuring the response time, starting with the first symbol and moving on. Here, we can observe a crucial property that holds for virtually all implementation attacks: with increasing (key) length of the target algorithm, the complexity of the attack only increases linearly, instead of exponentially as for a brute-force approach testing all possible combinations.
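A minimal sketch of the string-compare example follows. To keep it deterministic, the execution time is modeled by the number of compared symbols instead of a (noisy) wall-clock measurement; the secret `SECRET`, the alphabet, and all function names are hypothetical.

```python
SECRET = "s3cr3t"   # hypothetical secret reference value
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def insecure_compare(guess, secret=SECRET):
    """Early-exit comparison. Returns (match, steps); steps, the number of
    compared symbols, stands in for the measured execution time."""
    steps = 0
    for g, s in zip(guess, secret):
        steps += 1
        if g != s:
            return False, steps
    return len(guess) == len(secret), steps

def recover_secret(length):
    """Recover the secret one symbol at a time: the candidate that makes the
    comparison run longest (or succeed) is the correct one."""
    known = ""
    for pos in range(length):
        filler = "?" * (length - pos - 1)   # '?' is not in ALPHABET, so it never matches
        known += max(ALPHABET, key=lambda c: insecure_compare(known + c + filler))
    return known

print(recover_secret(len(SECRET)))  # → s3cr3t
```

For the 6-symbol secret and the 36-symbol alphabet, the attack performs exactly 6 · 36 = 216 comparisons, whereas brute force over all strings of that length would need up to 36^6 ≈ 2.2 · 10^9, illustrating the linear versus exponential growth.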


Practical attacks have also been reported using more exotic side channels like temperature [HS13] or sound [GST14, GST17]. The most popular side channel is the power consumption of the target device [KJJ99], which can be measured with an oscilloscope via the voltage drop over a shunt resistor. Physically closely related is measuring the EM emanation, which is caused by the relation between EM fields and changing electric currents. This means that the power consumption can also be captured through coils picking up the EM emanation. In the following, we briefly describe the two basic methods of side-channel attacks and corresponding countermeasures.

2.2.1 Simple Power Analysis (SPA)

Kocher et al. introduced SPA [KJJ99]. Here, only a single or a few side-channel traces are evaluated. A popular example is the binary exponentiation, e.g., as used for RSA decryption. The algorithm iterates over the single bits of the secret exponent. In every step, a squaring is performed. If the current bit of the exponent is set, an additional multiplication with the base is calculated. Both operations are computationally intensive because of the large operands, i.e., at least 1024 bits per operand. As modular squaring can be implemented more efficiently than multiplication, the power consumption (and probably the duration) differs likewise. If an adversary can observe these differences in the power consumption, the secret exponent can be extracted directly from the occurrence of the multiplications.
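The following Python sketch shows left-to-right square-and-multiply together with the SPA step. It assumes an idealized trace in which every squaring (`S`) and multiplication (`M`) can be told apart perfectly; the base, exponent, and modulus are arbitrary toy values, far too small for real RSA.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation. Also returns the operation sequence
    ('S' = squaring, 'M' = multiplication), i.e., what an idealized single
    power trace would reveal to an SPA adversary."""
    result, ops = 1, []
    for bit in bin(exponent)[2:]:            # most significant bit first
        result = (result * result) % modulus
        ops.append("S")
        if bit == "1":
            result = (result * base) % modulus
            ops.append("M")
    return result, "".join(ops)

def exponent_from_ops(ops):
    """SPA step: each 'S' starts a new exponent bit; a following 'M' sets it to 1."""
    bits = []
    for op in ops:
        if op == "S":
            bits.append("0")
        else:                                # 'M' belongs to the preceding squaring
            bits[-1] = "1"
    return int("".join(bits), 2)

result, ops = square_and_multiply(7, 0b101101, 1009)   # toy values only
print(ops)                          # → SMSSMSMSSM
print(exponent_from_ops(ops))       # → 45
print(result == pow(7, 45, 1009))   # → True
```

Constant-time variants (e.g., always performing a multiplication) remove exactly this pattern, which is why the distinguishability of squarings and multiplications is the crucial assumption here.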

2.2.2 Differential Power Analysis (DPA)

In contrast to SPA, a large set of traces is evaluated statistically for DPA [KJJ99]. First, the power consumption of multiple encryptions under varying inputs is recorded. For the evaluation, an intermediate value of the cipher is selected that depends on a small portion of the key. For the classical difference-of-means attack, a single bit of this intermediate value is selected. By guessing this small part of the key, a hypothesis about the intermediate value can be made. For each hypothesis, the traces are separated into two bins, depending on whether the bit is set or not. Then, each bin is averaged and the two averages are subtracted from each other. If the key guess, and hence the separation, was correct, the difference will show a peak resulting from the different power consumption caused by a "1" or a "0" being processed. If the key guess was wrong, the separation is random and no peak should be visible.

This approach was later refined to Correlation Power Analysis (CPA) [BCO04]. Instead of using only a single bit, the intermediate value is mapped to a hypothetical power consumption. For software implementations running on a microcontroller, the power consumption might be proportional to the number of ones on the memory bus, i.e., the Hamming weight of the value. Then, the measured power consumption is correlated with the hypothetical values using the Pearson correlation coefficient. As before, if the partially guessed key and the power model were correct, there should be some correlation between both, resulting in a peak. If the key guess was wrong, the measured traces are uncorrelated with the hypothetical intermediate value and the resulting correlation will appear as noise.

Recently, a methodology for leakage assessment based on Welch's t-test was proposed [GJJR11] and has been widely adopted in academia. Two variants were proposed. For the first one, named "unspecific", traces are recorded for a fixed input and for random inputs and then compared. For first-order leakage, the method tests the hypothesis that the means of the two populations are equal. If they are not equal, this indicates leakage; however, the test does not state whether the leakage is exploitable. Higher-order leakage can be tested analogously by comparing the variance (2nd order), the skewness (3rd order), and so forth. The "specific" version works like a difference-of-means DPA but with Welch's t-test as distinguisher. As it thus incorporates a key-dependent intermediate value, it can be used for an actual attack.
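To illustrate CPA with the Hamming-weight model, the sketch below simulates traces instead of measuring them. Assumptions: the target intermediate is the output of the 4-bit PRESENT S-box applied to p ⊕ k (chosen only because its table is short; an attack on AES would use the 8-bit AES S-box in the same way), each trace has 20 samples of Gaussian noise, and the Hamming weight of the intermediate leaks at one hypothetical sample. The attack correlates the hypothesis of each of the 16 key guesses with every sample and keeps the guess with the highest absolute correlation.

```python
import random

# 4-bit S-box of the PRESENT block cipher, used as a small non-linear target.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def hw(x):
    """Hamming weight (number of set bits)."""
    return bin(x).count("1")

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_traces(key, n_traces, n_samples=20, leak_at=7, noise=0.5, seed=0):
    """Hypothetical traces: Gaussian noise in every sample, plus the Hamming
    weight of SBOX[p ^ key] added at sample `leak_at`."""
    rng = random.Random(seed)
    plaintexts, traces = [], []
    for _ in range(n_traces):
        p = rng.randrange(16)
        trace = [rng.gauss(0.0, noise) for _ in range(n_samples)]
        trace[leak_at] += hw(SBOX[p ^ key])
        plaintexts.append(p)
        traces.append(trace)
    return plaintexts, traces

def cpa(plaintexts, traces):
    """Correlate the HW hypothesis of every key guess with every sample and
    return the guess with the highest absolute correlation."""
    samples = list(zip(*traces))             # one column per point in time
    best_key, best_rho = None, -1.0
    for guess in range(16):
        hypothesis = [hw(SBOX[p ^ guess]) for p in plaintexts]
        rho = max(abs(pearson(hypothesis, col)) for col in samples)
        if rho > best_rho:
            best_key, best_rho = guess, rho
    return best_key, best_rho

pts, trs = simulate_traces(key=0xA, n_traces=1000)
best_key, rho = cpa(pts, trs)
print(best_key)  # → 10
```

With a noise standard deviation of 0.5, the correct key yields a correlation of roughly 0.9 at the leaking sample, while wrong guesses stay clearly lower. A non-linear S-box is used on purpose: for the plain XOR p ⊕ k, the complemented key guess yields a correlation of exactly -1 under the Hamming-weight model, so the absolute correlation could not distinguish the two.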

2.2.3 Countermeasures

Countermeasures against side-channel attacks can be split into three categories: (a) masking, (b) hiding, and (c) re-keying. In practice, a combination of these is often advisable [MW15], as they complement each other. Masking [CJRR99, RP10, DDF14] aims to make the power consumption independent of the processed value. Depending on the desired order of protection, the input is initially split into several shares so that their sum (e.g., their XOR for Boolean masking) equals the original value. For each encryption, the values of the shares are chosen randomly. Then, the cipher operations are performed on each share individually while preserving the above property. As masking linear layers is easy, virtually the entire research in this field focuses on masking the non-linear substitution layer of a cipher. Another approach is to hide the power consumption, which can be achieved in two different ways. One option is to add noise so that the side-channel signal is buried within the noise floor [GM11]. This, however, can be counteracted by increasing the number of measurements. Another idea is to balance the power consumption so that the same amount of power is always consumed. This is usually achieved by specific logic styles, e.g., dual-rail precharge logic [TAV02, TV04, GCS+09, WMG18]. Here, one must take care of early propagation and glitches, as these could still result in leakage. Even then, manufacturing tolerances might still cause the paths to be imbalanced. Re-keying, as a third approach, restricts the number of traces that can be recorded under the same key. Stateless [MSGR10, MPR+11] as well as stateful schemes [Koc98] have been proposed. However, special care must be taken when designing and implementing the key derivation function [DEMM14].
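The linear part of Boolean masking, which the paragraph above calls easy, can be sketched in a few lines. This is a hypothetical illustration, not a protected implementation: shares are drawn with Python's `secrets` module, a value is split into order+1 shares, and only XOR, a linear operation, is computed share-wise. Masking the non-linear S-box, the actual research challenge, is deliberately not covered.

```python
import secrets

def mask(value, order=1, bits=8):
    """Split value into order+1 Boolean shares whose XOR equals value."""
    shares = [secrets.randbits(bits) for _ in range(order)]
    last = value
    for s in shares:
        last ^= s
    return shares + [last]

def unmask(shares):
    """Recombine the shares (only done at the very end of a computation)."""
    value = 0
    for s in shares:
        value ^= s
    return value

def masked_xor(a_shares, b_shares):
    """XOR of two masked values, computed share by share without unmasking."""
    return [a ^ b for a, b in zip(a_shares, b_shares)]

def masked_xor_const(shares, const):
    """XOR with a public constant only needs to touch a single share."""
    return [shares[0] ^ const] + shares[1:]

x, y = 0x5A, 0x3C
z = masked_xor(mask(x, order=2), mask(y, order=2))
print(len(z), unmask(z) == x ^ y)  # → 3 True
```

Each share on its own is uniformly distributed and independent of the secret, so a first-order leakage of any single share reveals nothing; only the combination of all shares does.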

2.3 Fault Attacks¹

Lenstra and Boneh et al. were the first to show a practical fault attack by exploiting an erroneous computation on a cryptographic device to recover a secret key [Len96, BDL97]. For such attacks, the device is intentionally operated outside its specification so that some faulty output can be observed. Based on a subsequent mathematical analysis of the faulty (and genuine) output, the adversary can recover the secret key. Like all attacks that target the implementation of a cryptographic scheme, the success of a fault attack is mainly independent of the mathematical security of the scheme in a black-box model. Researchers started (a) to explore physical methods to inject faults and (b) to develop attacks on commonly used cryptographic schemes [BCN+06, KSV13]. Knowing that fault attacks can entirely compromise the security of a cryptographic device, (c) corresponding countermeasures were developed likewise, either specific to a certain method of fault injection or providing protection in general. In the following, we briefly introduce these three areas of research in the field of fault attacks.

¹ Section 2.3 is based to a large extent on one of the contributions of this thesis's author to [AMR+18].

2.3.1 Physical Means of Fault Injection

Many physical means of injecting a fault are based on operating the device outside its specification. For example, an attacker might target the supply voltage line by introducing voltage glitches, with voltage levels both above [ABF+02] and below [SGD08] the intended supply voltage. The latter is closely related to clock glitches [ADN+10]. Both violate the time the combinational logic requires for its computation, either by slowing down the circuit (lower voltage) or by exceeding the maximum frequency (clock). In consequence, the computation does not finish before the next clock edge, causing the corresponding registers to store a wrong value. Electromagnetic fields in the near vicinity can affect the target's execution as well. Strong electromagnetic pulses [QS02] might induce currents into the device by direct coupling. Further, ring oscillators might lock onto externally applied frequencies [BBA+12], causing a steady output when used as a True Random Number Generator (TRNG). While these methods usually affect the whole device or a large area, optical fault injection [SA02] using focused laser beams can scale the focus down to a single transistor. Recent works have shown that advanced optical setups even make it possible to target multiple transistors independently [SHS16]. A detailed description of the principles of laser fault injection is provided in Chapter 7. Finally, Rowhammer-type attacks [KDK+14] demonstrated that fault attacks can be executed even without physical access to the target device. In practice, all physical fault injection techniques involve many parameters, leading to a vast search space for a successful attack. For example, the adversary must consider the timing (i.e., the clock cycle), the physical intensity, the duration of the effect, and, for targeted methods, the location (x/y) on the device and even the distance (EM) or focal plane (laser), respectively. A useful fault might only occur if all parameters are correct.
This leads to various approaches trying to reduce the search space [CPB+13, SFR+15]. When analyzing the effect caused by the physical methods of fault injection, multiple parameters can be derived that describe how the target will be affected [VKS11, BCN+06, KSV13]. Most notably, we can refer to its electrical effect, e.g., whether some internal value will always be set to logical ONE or always be reset to logical ZERO. Note that although faults are sometimes modeled as bit flips or bit toggles (i.e., set or reset depending on the genuine value), there is no reliable physical method known that would achieve this effect. Nevertheless, the bit-toggle model is useful, as it covers both set and reset faults. Another parameter is the affected area, ranging from a single transistor or a few bits to entire registers, words, or full variables. A crucial aspect is the distribution of the resulting faults, as there is usually some form of bias [GYTS14, FJLT13, GSD+08, GRG+07, ADN+10]. Considering, for example, clock glitches or underpowering, the bits involved in the critical path will be the first to become faulty. For optical fault injection, only the exact area that is illuminated by enough photons will be affected.
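The critical-path bias of clock glitches can be illustrated with a simplified timing model. The sketch below uses hypothetical per-bit path delays and assumes that a register bit whose path is slower than the glitched clock period simply keeps its old value; real circuits may instead capture intermediate or glitchy values.

```python
def glitched_capture(old, new, path_delays_ns, period_ns):
    """Value stored by a register when the clock period is shortened to period_ns.

    Simplifying assumption: bit i takes its new value only if its combinational
    path delay fits into the period; otherwise it keeps the old value."""
    captured = 0
    for i, delay in enumerate(path_delays_ns):
        source = new if delay <= period_ns else old
        captured |= ((source >> i) & 1) << i
    return captured

def faulty_bits(old, new, path_delays_ns, period_ns):
    """Indices of bits that differ from the genuine (new) value."""
    diff = glitched_capture(old, new, path_delays_ns, period_ns) ^ new
    return [i for i in range(len(path_delays_ns)) if (diff >> i) & 1]

# Hypothetical path delays (ns) for bits 0..7 of an 8-bit register
DELAYS = [2.1, 3.5, 2.8, 4.9, 2.0, 3.1, 4.2, 2.5]

for period in (5.0, 4.0, 3.0):
    print(period, faulty_bits(0x00, 0xFF, DELAYS, period))
# 5.0 []
# 4.0 [3, 6]
# 3.0 [1, 3, 5, 6]
```

A moderate glitch only affects the two slowest bits (3 and 6); shortening the period further faults more bits in the order of their path delays. This ordering effect is also what Fault Sensitivity Analysis exploits.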


2.3.2 Fault Exploitation

The vast majority of attacks on ciphers is based on comparing one or multiple faulty outputs to the corresponding genuine ones, i.e., Differential Fault Analysis (DFA) [BS97]. DFA is closely related to reduced-round differential cryptanalysis [BS90]. The most naive fault attack directly targets the key storage and assumes an asymmetric physical fault. For this attack, we assume that an attacker can force a single bit i of the key k stored in the device to logical ONE using a transient fault. Thus, independent of the previously stored value, we observe the behavior $k_i: 0 \to 1$ or $k_i: 1 \to 1$. Note that such a fault manifestation is indeed realistic using lasers, cf. Chapter 7. When comparing the faulty output to a genuine one, we can derive a trivial fault attack: a change in the output means that the bit was ZERO. If, instead, the output under fault injection is identical to the genuine one, the corresponding bit of the key was already set. This attack defines a lower bound for attack complexity, similar to a naive brute-force attack in relation to mathematical cryptanalysis. For an advanced fault attack to be considered valid, it should outperform the trivial attack above in one of the following properties:

• Relaxed requirements for precision and repeatability

• A lower number of required faults to recover the whole secret key
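The trivial attack can be sketched as follows. Everything here is a stand-in: `toy_encrypt` is a SHA-256-based keyed function (not a block cipher), used only because any key change changes its output; the simulated device accepts a hypothetical `set_bit` argument modeling the stuck-at-ONE fault; and the 128-bit key is arbitrary. The attack needs one genuine output and one fault per key bit.

```python
import hashlib

KEY_BITS = 128

def toy_encrypt(key, plaintext):
    """Stand-in for a cipher (NOT a real block cipher): a keyed function whose
    output changes whenever the key changes."""
    return hashlib.sha256(key.to_bytes(KEY_BITS // 8, "big") + plaintext).digest()

def make_device(secret_key):
    """Simulated device; set_bit models a fault forcing that key bit to logical ONE."""
    def device(plaintext, set_bit=None):
        key = secret_key if set_bit is None else secret_key | (1 << set_bit)
        return toy_encrypt(key, plaintext)
    return device

def trivial_fault_attack(device, plaintext=b"fixed input"):
    """Recover the key bit by bit from one genuine and KEY_BITS faulty outputs."""
    genuine = device(plaintext)
    recovered = 0
    for i in range(KEY_BITS):
        if device(plaintext, set_bit=i) == genuine:  # no effect: bit i was already ONE
            recovered |= 1 << i
        # otherwise the output changed, so bit i was ZERO
    return recovered

device = make_device(0x2B7E151628AED2A6ABF7158809CF4F3C)
print(hex(trivial_fault_attack(device)))  # → 0x2b7e151628aed2a6abf7158809cf4f3c
```

The attacker only interacts with `device`; the secret key never leaves the closure, mirroring the black-box setting in which only genuine and faulty outputs are observed.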

Outperforming the trivial attack, of course, usually comes with a trade-off regarding the number of key hypotheses that need to be tested. The seminal attacks [Len96, BDL97] on an implementation of RSA using the Chinese Remainder Theorem (CRT) are certainly an extreme case of a relaxed fault requirement. Here, it is enough to create some arbitrary fault at an arbitrary point in time during the computationally expensive modular exponentiation to factor the modulus and thereby reveal the secret key. Considering differential fault attacks on AES, we can observe the full spectrum of balancing fault precision, computational complexity, and number of faults [BS03, Gir04, DLV03, PQ03, MSS06, Muk09, SMC09, AMT13]. In addition to the attacks above, there are multiple more "exotic" approaches that differ in certain aspects or requirements: Fault Collisions [BK06], Fault Sensitivity Analysis [MMP+11], Differential Fault Intensity Analysis [GYTS14], Statistical Fault Attacks [FJLT13], Safe-Error Attacks [YJ00], Statistical Ineffective Fault Attacks [DEK+18], etc. Using one or another of these attacks, implementations of both symmetric (AES [BS03], DES [BS97]) and asymmetric schemes (RSA [BDL97]) were found to be vulnerable. Candidates of the CAESAR competition for authenticated encryption were successfully analyzed as well [DEK+16]. In fact, nearly every newly proposed cipher is followed by a publication describing a corresponding DFA, which is indeed a form of differential cryptanalysis on the last few rounds of the cipher under a defined fault model.
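The RSA-CRT attack [Len96, BDL97] can be reproduced with toy numbers. In this sketch, the primes are far too small for real use, and the fault is modeled as adding 1 to the half-exponentiation modulo q; any error confined to one half works the same way. Since the faulty signature is still correct modulo p, a single gcd with N reveals p: either from the difference to a correct signature (Boneh et al.) or, without a correct signature, from s'^e - m (Lenstra).

```python
from math import gcd

# Hypothetical toy RSA parameters, far too small for real use.
p, q = 1009, 1013
N, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))   # modular inverse, requires Python 3.8+

def sign_crt(m, fault=False):
    """RSA-CRT signature m^d mod N; fault=True corrupts the half modulo q."""
    sp = pow(m, d % (p - 1), p)
    sq = pow(m, d % (q - 1), q)
    if fault:
        sq = (sq + 1) % q                 # simulated arbitrary error in the q-half
    h = (pow(q, -1, p) * (sp - sq)) % p   # Garner recombination
    return sq + q * h                     # s ≡ sp (mod p), s ≡ sq (mod q)

m = 123456
s, s_bad = sign_crt(m), sign_crt(m, fault=True)
print(gcd(s - s_bad, N))                  # Boneh et al.: → 1009 (= p)
print(gcd(pow(s_bad, e, N) - m, N))       # Lenstra, no correct signature: → 1009
```

A single faulty signature thus suffices to factor N, which is why this attack sits at the extreme end of relaxed fault requirements.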

2.3.3 Countermeasures

Considering countermeasures, one approach is to shield the cryptographic operation in some way. This might include actual metal shields to hinder EM pulses, or generating the clock signals and voltage levels internally so that they cannot be affected by an attacker. Other countermeasures detect specific fault injection methods with sensors, e.g., clock glitches [ELH+12], voltage perturbations [BTD+13, KH14], EM faults [ERM16], or laser illumination [BTD+13, HBB+16]. However, such countermeasures are usually applied ad hoc and counter only specific attacks. An obvious generic approach is to introduce some form of redundancy in area and/or time. For example, one might sign and verify to counter the attacks on RSA-CRT [BDL01]. For symmetric schemes, this translates to repeating the encryption and/or decryption for time redundancy, or to multiple encryptions in parallel for area redundancy. More sophisticated approaches employ coding schemes instead of plain redundancy [BBK+03, OKKG05, NFR07, BCC+14, APH+16, CG16, ABCV17, AMR+18]. Such Concurrent Error Detection (CED) schemes commonly check whether indeed no fault occurred before releasing the output. One interesting idea is Infective Computation [YJ00], which entirely omits the final check whether an error occurred. Instead, any faulty intermediate value will randomize the output of the cipher so that it is of no use to an attacker. At least for symmetric schemes, such approaches were unfortunately broken repeatedly, e.g., [GST12] was broken in [BG13], and its improved version [TBM14] was broken again in [BG15].
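The sign-and-verify countermeasure for RSA-CRT can be sketched in a few lines. The block repeats hypothetical toy parameters (far too small for real use) so that it is self-contained; the check uses the public exponent, and a failed check suppresses the output (here by returning `None`), so the faulty signature needed for the gcd-based attack is never released.

```python
# Hypothetical toy RSA parameters, far too small for real use.
p, q = 1009, 1013
N, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))   # modular inverse, requires Python 3.8+

def sign_crt(m, fault=False):
    """RSA-CRT signature; fault=True simulates an error in the half modulo q."""
    sp = pow(m, d % (p - 1), p)
    sq = pow(m, d % (q - 1), q)
    if fault:
        sq = (sq + 1) % q
    h = (pow(q, -1, p) * (sp - sq)) % p   # Garner recombination
    return sq + q * h

def sign_checked(m, fault=False):
    """Countermeasure: verify with the public key before releasing the signature."""
    s = sign_crt(m, fault)
    if pow(s, e, N) != m % N:
        return None                       # fault detected: suppress the output
    return s

m = 123456
print(sign_checked(m) == pow(m, d, N))   # → True
print(sign_checked(m, fault=True))       # → None
```

The check costs one additional exponentiation with the (usually small) public exponent, which is cheap compared to signing. Like all redundancy-based detection, it assumes the attacker cannot also fault the verification itself.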

Part II

Novel Applications of Side-Channel Attacks


Chapter 3 Preliminaries

In Part II, we use voltage sensors built in the fabric of an FPGA to measure fluctuations in the Power Distribution Network (PDN). For a better understanding of the following chapters, we briefly introduce the working principle of FPGAs, power distribution for and within Integrated Circuits (ICs), and different ways to measure such fluctuations.

Contents of this Chapter

3.1 Introduction
3.2 Field Programmable Gate Arrays
3.3 Power Distribution and Consumption
3.4 Measuring the Power Side Channel

3.1 Introduction

In the following chapters, we use the voltage fluctuations in the Power Distribution Network (PDN) of a complex design as a source of side-channel information. Measuring this dynamic power consumption externally or through EM emanation is the basic idea of any power side-channel analysis and is well established. Here, a shunt resistor is placed in the supply line to measure the voltage drop relating to the current drawn by the device. In contrast, we demonstrate later that we can measure such variations with a sensor built from the user-available fabric of an FPGA, without requiring an external oscilloscope. Further, we show that this sensor can pick up variations originating from the whole PDN. This includes measuring leakage (a) from other parts of the fabric, (b) from a CPU of an SoC implemented on the same silicon die, and (c) from ICs on the same Printed Circuit Board (PCB) as the sensor. As an introduction to the topic, we briefly describe the working principle of FPGAs as well as fluctuations on the PDN and how to measure them.

3.2 Field Programmable Gate Arrays

From an architectural point of view, FPGAs are located somewhere between running software on a CPU and a dedicated hardware implementation in an ASIC. Like software on a CPU or microcontroller, an FPGA can be programmed. However, instead of providing a list of instructions to be executed, the program directly determines how hardware elements are configured and connected. This includes registers and look-up tables (LUTs), so that any logic function can be implemented. Compared to ASIC implementations, FPGAs have a faster time-to-market and lower setup costs, as no masks etc. are required. However, for large quantities, the cost per unit of an FPGA is much higher. Compared to a CPU implementation, and depending on the task, FPGAs are much faster and more energy efficient. However, FPGAs might not be as versatile, and creating a configuration for an FPGA is much more complex. Thus, when deciding which architecture to choose, e.g., ASIC, FPGA, or CPU, there is usually some trade-off between performance, cost (setup and recurring), time-to-market, etc.

The properties above render FPGAs an ideal choice when high performance is required but only a small volume is to be produced. Further, in-field updates are beneficial for many applications. According to Xilinx, FPGAs are often used in practice in the areas of aerospace and defense, prototyping, automotive, data centers, high-performance computing, test & measurement, etc. Here, FPGAs are used as general-purpose accelerators in addition to a CPU to benefit from both worlds. For example, many SoCs include additional FPGA fabric next to multiple CPU cores. Further, all large cloud computing providers offer options to rent machines that are equipped with additional FPGAs. For both, the idea is that computationally intensive tasks can be outsourced to optimized implementations running on the FPGA. Security-critical tasks can profit as well: hardware implementations are not susceptible to classical attacks on CPUs such as buffer overflows.

[Figure 3.1 panels: (a) FPGA Overview, a grid of CLBs surrounded by IO cells; (b) 2-input LUT with SRAM cells a, b, c, d, inputs x1, x0, and output y]

Figure 3.1: FPGA overview consisting of CLBs, IO cells, and switch matrices (left) and functional description of a two-input/one-output LUT (right) (both based on [Xil13]).

We briefly introduce the working principle of SRAM-based FPGAs and follow the naming scheme of FPGAs manufactured by Xilinx Inc. [Xil10] along the way. Instead of providing a list of instructions to be executed, generic and fine-grained hardware elements available on the FPGA are configured and connected directly. The configuration is stored in SRAM cells and is called bitstream. As SRAM is volatile, the bitstream must be programmed into the FPGA at each power-up and is usually stored externally in flash memory. Figure 3.1(a) depicts the general structure of an FPGA. The Configurable Logic Block (CLB) is the primary element in which the desired functionality is implemented. Other parts include different variants of Random Access Memory (RAM), dedicated logic for Digital Signal Processing (DSP), I/O configuration, etc. Each element is connected to a switch matrix. These switch matrices can be connected to route signals of the CLB or other elements across the FPGA or directly to I/O pins. Besides connecting adjacent switch matrices, there is also the option to skip multiple nodes using long wires. Each CLB consists of two independent slices, and Xilinx FPGAs provide multiple variants of such slices. Figure 3.2 depicts a section of a SliceL. Only the upper half is depicted, as the lower half is mostly identical. Each slice consists of four 6-input LUTs, logic for carry handling, eight flip-flops that can also be bypassed, and wide multiplexers. Most notable are the LUTs: any desired Boolean logic function (or logic gate) can be realized in the form of one or more LUTs. Figure 3.1(b) depicts an exemplary two-input/one-output LUT: the output y is set to one of the SRAM elements a, b, c, d depending on the two inputs x0, x1. The values in the SRAM cells can be programmed by the user. An n-input LUT contains $2^n$ SRAM cells, so there are $2^{2^n}$ possible configurations. Thus, $2^{2^n}$ different Boolean functions can be realized in an n-input LUT.
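A LUT is essentially a small truth table addressed by its inputs. The following sketch models an n-input LUT in Python; the mapping of inputs to cells (x0 as the least significant address bit) is an assumption for illustration and need not match the cell order a, b, c, d in Figure 3.1(b).

```python
def make_lut(cells):
    """Model an n-input LUT: `cells` holds the 2**n configuration bits (SRAM cells),
    and the inputs select one of them, with x0 as the least significant address bit."""
    n = (len(cells) - 1).bit_length()
    if len(cells) != 2 ** n:
        raise ValueError("a LUT needs exactly 2**n configuration cells")

    def lut(*inputs):
        if len(inputs) != n:
            raise ValueError(f"expected {n} inputs")
        address = sum(bit << i for i, bit in enumerate(inputs))
        return cells[address]

    return lut

def num_functions(n):
    """Number of distinct Boolean functions an n-input LUT can realize: 2**(2**n)."""
    return 2 ** (2 ** n)

# Two-input LUT configured as XOR; cells listed for (x1, x0) = 00, 01, 10, 11.
xor_lut = make_lut([0, 1, 1, 0])
print([xor_lut(x0, x1) for x1 in (0, 1) for x0 in (0, 1)])  # → [0, 1, 1, 0]
print(num_functions(2), num_functions(6) == 2 ** 64)         # → 16 True
```

Reconfiguring the same LUT only means writing different cell values, e.g., `make_lut([0, 0, 0, 1])` yields an AND gate; this is exactly what the bitstream does for every LUT on the device.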

Figure 3.2: Section of a Xilinx Spartan-6 SliceL; the lower half is not depicted as it is identical to the upper one. It consists in total of four 6-input LUTs, the carry logic, eight flip-flops, and wide multiplexers. The blue arrow depicts the path between two buffers to be later used in the tapped delay line (cf. Sect. 3.4.2). The red arrow indicates the path to the latch tapping the delay line. (Base image source: [Xil10])


3.3 Power Distribution and Consumption

The PDN ensures that every part of the circuit is supplied with its average and peak current requirements at a stable supply voltage. There might be ICs and components placed on a circuit board that require different supply voltages. Further, today’s complex ASICs, FPGAs, or CPUs often require multiple supply voltages themselves. For example, different voltage levels might be required for the inner core voltage, the IO pins, integrated peripherals, memories, etc. For convenience, almost every electronic device is powered by only a single external power supply, often including a conversion from the alternating current of the wall socket to direct current. Internally, Power Management Integrated Circuits (PMICs) are then used to generate the different required voltages. In the following, we introduce the properties of PDNs based on [WH10].

3.3.1 Power Distribution Networks (PDNs)

Figure 3.3: Lumped model of a power distribution network, from left to right: the PMIC (voltage regulator), the PCB with various capacitors, the IC package with its pins and bonding wires, and the complex on-chip RCL-network (based on [WH10, p. 560/563])

Delivering the current from the PMIC to the demanding sinks involves multiple, partly parasitic effects that need to be considered. Figure 3.3 depicts all relevant parts in a lumped model for a single voltage [WH10]. The PMIC is located on the left and can be modeled as a non-ideal power supply. Most relevant is its internal resistance, as it limits the current that can be supplied instantaneously. Going further, both the PMIC and the demanding IC are placed on a PCB, together with capacitors to buffer the supply voltage. Here, we have both distant large capacitors and small-value capacitors with a lower Equivalent Series Resistance (ESR) that are usually placed closer to the pins of the IC. Besides such intentional capacitance to stabilize the instantaneous current demands, there are multiple unwanted parasitic effects such as wire resistance and inductance. Next, the package contributes with its solder connections, the pins or solder balls, and its lead frame carrying the silicon die. The main contribution of the package is inductive, caused by the bonding wires and pins. Finally, the metal layers and the active layer can be modeled as a rather complex network of all three components (resistance, capacitance, and inductance). Considering an FPGA, the individual slices are logically separated by default and only connected when the switch matrices are configured accordingly. However, even when there is no logical connection, every part within the inner core is still connected to the same PDN. Our goal is to use both the on-chip PDN and the outside PDN as a carrier for side-channel information. Thus, we discuss the influence of the PDN on the rest of the circuit in the following.

3.3.2 Power Requirements of CMOS devices

We briefly summarize the origin of the power demands P of Complementary Metal Oxide Semiconductor (CMOS) devices based on [WH10]. The overall current I(t) through the device (with $P = I \cdot V$) can be separated into a constant static part and a dynamic part that depends on the active gates:

$$I(t) = I_{\text{static}} + I_{\text{dynamic}}(t)$$

The main benefit of CMOS compared to other logic styles is the very low static power consumption, i.e., the majority of the power is only consumed when the circuit switches. However, with recent very small technology nodes, the static current through the device is becoming relevant [NDA+01]. The three main components contributing to the static power consumption are the following:

$$I_{\text{static}} = I_{\text{subthreshold}} + I_{\text{gate}} + I_{\text{junction}}$$

The subthreshold current $I_{\text{subthreshold}}$ describes the leakage current from source to drain through transistors that are actually turned off. $I_{\text{gate}}$ describes the leakage current passing through the gate dielectric from the gate terminal of on-state transistors. In addition, there might be some leakage $I_{\text{junction}}$ at the junction of source or drain diffusion areas caused by a potential difference to the substrate [WH10]. The dynamic current $I_{\text{dynamic}}$ depends on the transistors active at point-in-time t and thus usually occurs directly following a clock event:

$$I_{\text{dynamic}}(t) = I_{\text{short-circuit}}(t) + I_{\text{logic}}(t)$$

If an output signal changes, the connected combinatorial logic switches sequentially until a stable state is reached, ready to be stored in the following set of registers. Whenever the output of a CMOS gate changes its state, the pull-up network becomes conducting while the pull-down network starts to block (or vice versa). Because both networks switch at the same time, there is a short period in which both are partially conducting, causing current to flow directly from VDD to ground ($I_{\text{short-circuit}}$). Most of the dynamic power consumption is caused by the toggling logic gates themselves ($I_{\text{logic}}$). We must distinguish between output transitions from low to high and from high to low:

0 : 1 Whenever the output of a gate changes from low to high, its (parasitic) output capacitance, which is largely determined by its fanout, needs to be charged (cf. Fig. 3.4). This charging current must be provided by the power supply and flows through the on-state pMOS transistors. The small voltage drop across the on-state transistor results in additional energy dissipation [WH10].

1 : 0 When the output of a gate changes from high to low, an nMOS transistor connects the output capacitance to ground. As both sides of the capacitor are then connected to ground, the capacitor discharges without requiring current from the power supply.
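Power models in side-channel analysis commonly count the bit transitions between consecutive register values. The Python sketch below separates them by direction to mirror the two cases above: only 0 → 1 transitions draw charge from VDD, and charging a capacitance C to VDD takes $C \cdot V_{DD}^2$ from the supply. The load capacitance and supply voltage are hypothetical example values, and the sketch assumes the same load on every bit.

```python
def count_transitions(prev_state, next_state, width=8):
    """Return (#0->1, #1->0) bit transitions between two register values."""
    mask = (1 << width) - 1
    rising = ~prev_state & next_state & mask
    falling = prev_state & ~next_state & mask
    return bin(rising).count("1"), bin(falling).count("1")


def supply_energy(prev_state, next_state, c_load=1e-15, vdd=1.2, width=8):
    """Energy (J) drawn from VDD for one register update.

    Only 0->1 transitions charge a load capacitance from the supply, costing
    C * VDD**2 each (half is stored on the capacitor, half is dissipated in the
    pMOS network). 1->0 transitions discharge to ground and draw nothing.
    c_load and vdd are hypothetical values, identical for every bit.
    """
    rising, _ = count_transitions(prev_state, next_state, width)
    return rising * c_load * vdd ** 2


print(count_transitions(0b1100, 0b1010, width=4))
print(supply_energy(0x00, 0xFF), supply_energy(0xFF, 0x00))
```

Both updates toggle all eight bits, yet only the first one draws energy from the supply, which is the asymmetry described by the two cases above.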


Figure 3.4: Current flow at a CMOS inverter when charging and discharging its output capacitance for output transitions from low to high and high to low, respectively.

Most side-channel attacks target the dynamic part, as it is linked to the active transistors in the circuit. However, recent work investigated the static part as a side-channel source as well [Mor14, PSKM15, MMR17]. The idea is that the leakage current differs for nMOS and pMOS transistors. Thus, the static current depends on the current state of the gates, i.e., which transistors are currently conducting. When halting the clock, one might be able to measure this dependence. The varying current required by the device leads to a voltage drop in the PDN through the inductive components ($L \cdot dI/dt$). In addition, the resistive components add a steady-state offset ($I \cdot R$). Because the copper wires outside the chip have a low resistance, the resistive part is dominated by the on-chip resistance [WH10, p. 557]. Thus, the voltage drop of the PDN across the device due to current demands is

$$V_{\text{drop}} = L \cdot \frac{dI}{dt} + I \cdot R.$$
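As a worked example of this equation, the following Python sketch evaluates the voltage drop for a sampled current trace, approximating dI/dt by a backward difference. The inductance, resistance, and current values are hypothetical and only illustrate how a sudden current step (e.g., right after a clock edge) produces an inductive spike on top of the resistive offset.

```python
def pdn_voltage_drop(current_a, dt_s, inductance_h, resistance_ohm):
    """Evaluate V_drop = L * dI/dt + I * R for a sampled current trace.

    dI/dt is approximated by a backward difference between consecutive
    samples; the first sample is treated as steady state (dI/dt = 0).
    """
    drops = []
    previous = current_a[0]
    for current in current_a:
        di_dt = (current - previous) / dt_s
        drops.append(inductance_h * di_dt + current * resistance_ohm)
        previous = current
    return drops


# Hypothetical PDN: L = 0.1 nH, R = 10 mOhm, current sampled every 1 ns.
trace = [0.1, 0.1, 0.5, 0.5]  # A: a current step, e.g., after a clock edge
drops = pdn_voltage_drop(trace, dt_s=1e-9, inductance_h=0.1e-9, resistance_ohm=0.01)
for v in drops:
    print(f"{v * 1e3:.1f} mV")
```

Once the current settles, only the resistive term $I \cdot R$ remains.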

3.4 Measuring the Power Side Channel

As described in Sections 2.2 and 3.3.2, the dynamic power consumption of a circuit is linked to the data being processed and could potentially leak secret information. In the following, we describe traditional ways of measuring the power consumption and recent ideas on how to build analog sensors within the digital fabric of an FPGA.

3.4.1 Traditional Measurement


Figure 3.5: Measurement of the current through the Device under Test (DUT) by the voltage drop VR over a shunt resistor in the ground path.


Traditionally, a shunt resistor in the ground path of the PDN is used to measure the power consumption of the circuit (cf. Fig. 3.5). The voltage drop across this shunt resistor is

$$V_R(t) = I(t) \cdot R,$$

where I(t) is the dynamic current demand of the device as described in Section 3.3.2. The desired power consumption of the Device Under Test (DUT) is calculated as

$$P_{\text{DUT}}(t) = I(t) \cdot V_{\text{DUT}}(t) = I(t) \cdot \left(V_{CC} - V_R(t)\right).$$

As the term $I(t) \cdot V_R(t)$ introduces some error, the value of the shunt resistor should be kept small [Osw13]. In this case, measuring the voltage drop $V_R(t)$ across the resistor suffices to obtain a quantity proportional to the desired power consumption:

$$P(t) \propto V_R(t)$$
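The following Python sketch applies these two equations to a few shunt voltage samples. The shunt value, supply voltage, and samples are hypothetical; the point is that for small $V_R$ the ratio $P/V_R = (V_{CC} - V_R)/R$ stays close to $V_{CC}/R$, which is why $V_R$ alone is used as a proxy for the power consumption.

```python
def shunt_to_power(v_r, r_shunt_ohm, v_cc):
    """Convert shunt voltages V_R(t) into P_DUT(t) = I(t) * (V_CC - V_R(t)),
    using I(t) = V_R(t) / R."""
    return [(v / r_shunt_ohm) * (v_cc - v) for v in v_r]


# Hypothetical values: 1 Ohm shunt, 1.2 V supply, a few oscilloscope samples.
v_r = [0.010, 0.020, 0.015]
power = shunt_to_power(v_r, r_shunt_ohm=1.0, v_cc=1.2)

# For small V_R, P / V_R = (V_CC - V_R) / R stays close to V_CC / R,
# so P is approximately proportional to V_R.
ratios = [p / v for p, v in zip(power, v_r)]
print(power)
print(ratios)
```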

Note that there are various other ways to place the shunt and the probes of the oscilloscope; their advantages and disadvantages are described in [Mor16]. Further, an electrical current I passing through a straight conductor causes a magnetic field (around the axis of the conductor) of strength H at distance r:

$$H(t) = \frac{I(t)}{2\pi \cdot r}$$

Thus, it is also possible to obtain an indirect measurement of the power consumption by measuring the electromagnetic emanation of the device. One option is to measure the near field, leading to a wireless measurement of the power consumption [Sch10]. Another option is to measure only a small portion of the circuit in the ultra-near field, providing spatial resolution [SBO+15]. For power and EM measurements, a high-speed Analog to Digital Converter (ADC), e.g., a digital oscilloscope, is used to digitize the analog signal captured by the respective probe. For asynchronous sampling, the sampling rate should be much larger than the clock frequency of the DUT [Sch10]. Otherwise, the side-channel information might fall onto varying sample points. Synchronous sampling relaxes the performance requirements of the ADC drastically [OC15].
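A short numeric illustration of the field equation, as a Python sketch with a hypothetical 100 mA current: because H falls off with 1/r, halving the probe distance doubles the field strength, which is one reason EM probes are placed as close as possible to the source.

```python
import math


def h_field(current_a, distance_m):
    """Magnetic field strength H = I / (2*pi*r) in A/m around a straight conductor."""
    return current_a / (2 * math.pi * distance_m)


# A hypothetical 100 mA supply current probed at 1 mm and at 0.5 mm.
h_far = h_field(0.1, 1e-3)
h_near = h_field(0.1, 0.5e-3)
print(f"{h_far:.2f} A/m at 1 mm, {h_near:.2f} A/m at 0.5 mm")
```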

3.4.2 Voltage Sensors in FPGA Fabric

Besides the traditional measurement using a shunt, previous work has shown “exotic” ways to obtain an indirect measurement of the power consumption. For example, [GST14, GST17] used the acoustic characteristic and [CPM+18] showed that the power consumption might be modulated onto a Radio Frequency (RF) transmission. For directly measuring the power consumption of an FPGA, the internal power sensors have already been shown to be unsuitable for side-channel analysis [Nag12]. In the following, we describe how to use the PDN as a source of side-channel information. From a chip designer’s point of view, the noise carried through the PDN is of course unwanted and should be minimized. However, from an attacker’s point of view, this noise might carry side-channel information that is linked to secret data. Since we expect useful side-channel information in the PDN, our goal is to measure this voltage drop. While this is of course possible using an external measurement with an oscilloscope, our goal is to use a malicious circuit on the die. Further, we exclude dedicated system monitors, as such would not be available to a stealthy adversary. There is a direct connection between the supplied voltage level and the transition time of a signal through a logic gate. Thus, instead of measuring the voltage level of the PDN directly, we can also measure the time a circuit requires to perform some operation. This results in an indirect reading of the voltage level and thus also of the varying current required by the circuit. This basic connection was already shown in [GOKT16] by toggling many flip-flops next to a sensor. [GOKT16] also showed that voltage fluctuations on the PDN indeed have a major influence on the speed of a circuit, e.g., when compared to temperature. Two basic ways of measuring time are illustrated in the following.

Delay Line-based Time-to-Digital Converters

The tapped delay line as a voltage sensor within the FPGA fabric was first introduced in [ZSZF13]. Their implementation offered a sample rate 500 times faster than the built-in ADC of the 28 nm Xilinx FPGA. It is intended to characterize noise on the PDN, i.e., to capture unwanted nanosecond transients. An additional motivation is to detect an attacker who tries to fault the device using voltage glitches. In contrast, we use the sensor as a measurement tool to capture side-channel information originating from nearby circuits.

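Figure 3.6: Tapped delay line, consisting of an initial delay followed by an observable delay. The clock (clk) drives the delay line and the enable (en) input of the latches; the depicted latch contents 11000 are encoded as 3.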

The basic layout of the tapped delay line is depicted in Figure 3.6, consisting of a series of buffers. The delay line is split into an initial and an observable part to save area. For the observable part, the signals in between the buffers are tapped to a large register of latches. A clock input is connected to the delay line and to the enable input of the latches. The blue signal depicts the clock signal traveling through the delay line. At the depicted point in time, the rising clock edge has already passed all buffers of the initial delay and one buffer of the observable delay. The falling clock edge on the left has already reached the circuit. While it does not yet influence the observable delay line, the falling edge reaches the enable input of the latches immediately. Thus, the current value “11000” is stored in the latches. For compression, this value can be converted to an integer, e.g., by counting the number of zeros. As discussed in Section 3.3.2, the instantaneous current demands of a circuit result in voltage fluctuations on the overall PDN. If a circuit connected to the same PDN requires a high amount of current, the voltage drops. Thus, the supply voltage of the buffers decreases for a moment, resulting in a slower transition time. In other words, the clock signal requires longer to pass the buffer that is currently active. In contrast, if the device requires only little current, the supply voltage stays high and the clock signal can travel fast through the active buffer. Thus, to get a voltage reading of the PDN, one needs to measure how far a signal passed through the delay line. By counting the number of zeros, we get a high value when the clock did not reach far, i.e., due to a high power consumption. In turn, we get a low value if the clock traveled far, i.e., due to a low power consumption.

Figure 3.7: Floorplan (rotated right) of one voltage sensor with 18×(LUT, Latch) as part of the initial delay, followed by the observable delay built from CARRY4 primitives, the latches, and the output registers (Source: [SGMT18b]).

The circuit needs some tuning depending on the applied clock frequency: the falling edge should neither overshoot by passing the whole delay line nor undershoot by not reaching the observable delay line at all. This can be compensated either by adjusting the length of the initial delay line or by adjusting the phase of the enable signals of the latches at runtime. We use the sensor implementation for Xilinx FPGAs provided by [GOKT16]. Here, the observable part is implemented using the CARRY4 carry-chain primitives of a SliceL, providing the finest granularity. Figure 3.2 depicts the path between two buffers (blue arrow) and the path to a latch tapping the delay line (red arrow). The initial delay is implemented using a combination of LUTs and latches to save area. Figure 3.7 depicts the resulting implementation in the floor-planning view of the Xilinx ISE. Note that the construction of the sensor inherently entails a trade-off between the sampling rate and the signal variation, which might be falsely identified as a trade-off regarding quantization. For our straightforward adjustment of the phase, we adjust the delay added by the elements in the initial delay part of the sensor. Assume that the sensor is configured so that the falling edge of the clock is indeed sampled within the observable delay part. Now, we double the frequency of the clock signal. In consequence, we must approximately halve the initial delay. Of course, the exact factor depends on the ratio of the elements in the initial delay to the observable delay and on how far the clock signal reached. If the initial delay is not adjusted, the falling clock edge will still be stuck somewhere in the initial delay at the time the latches store their input.
However, this also means that the time during which the PDN can influence the delay chain is likewise halved, and we will see less variation. Decreasing the clock frequency requires a longer initial delay and thus leads to a higher variation. This effect can be observed in Figure 4.3.
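The following Python sketch illustrates the mechanism under a hypothetical first-order delay model in which every delay element slows down by the factor $V_{\text{nom}}/V$. The function names and all numbers are illustrative assumptions, not the parameters of the sensor from [GOKT16]. It shows the zero-counting encoding and why a longer initial delay at a lower sensor frequency amplifies the swing caused by the same voltage droop.

```python
import math


def latch_pattern(taps_passed, n_taps):
    """Latch contents: '1' for taps the edge has passed, '0' for the rest."""
    taps_passed = max(0, min(taps_passed, n_taps))
    return "1" * taps_passed + "0" * (n_taps - taps_passed)


def encode(pattern):
    """Compress the latch contents to an integer by counting zeros."""
    return pattern.count("0")


def tdc_reading(v_supply, half_period_ns, init_delay_ns, buf_delay_ns, n_taps, v_nom=1.2):
    """Sensor value for one sample of the tapped delay line.

    Hypothetical first-order model: every delay element slows down by the
    factor v_nom / v_supply. The edge has half a clock period to cross the
    initial delay; whatever time remains is spent in the observable part.
    """
    scale = v_nom / v_supply
    remaining_ns = half_period_ns - init_delay_ns * scale
    taps_passed = math.floor(remaining_ns / (buf_delay_ns * scale))
    return encode(latch_pattern(taps_passed, n_taps))


# 24 MHz vs. 48 MHz sensor clock: the initial delay is tuned so that at the
# nominal 1.2 V the edge lands at roughly the same tap in both cases.
cfg_24 = dict(half_period_ns=1e3 / 24 / 2, init_delay_ns=18.0, buf_delay_ns=0.05, n_taps=64)
cfg_48 = dict(half_period_ns=1e3 / 48 / 2, init_delay_ns=7.6, buf_delay_ns=0.05, n_taps=64)
for name, cfg in (("24 MHz", cfg_24), ("48 MHz", cfg_48)):
    nominal, drooped = tdc_reading(1.20, **cfg), tdc_reading(1.15, **cfg)
    print(name, nominal, drooped, "swing:", drooped - nominal)
```

In this model, the identical droop from 1.20 V to 1.15 V moves the reading by roughly twice as many steps at 24 MHz as at 48 MHz, because the longer initial delay stretches the same relative slowdown over more elapsed time.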

Ring Oscillators as Counters

Another option to measure the speed of a circuit is to count the number of oscillations of a ring oscillator. When the circuit consumes a large amount of power, the voltage supplied to the oscillator decreases, resulting in a lower number of oscillations. If the circuit consumes less power, the voltage does not drop and the oscillator is faster, resulting in a higher number of oscillations. As the length of a single oscillation determines the resolution of the sensor, the path must be as short as possible. Figure 3.8 depicts the approach using only a single inverter. The oscillator clocks a counter whose value is read out periodically. This design was used in [RPD+18, GRE18] to pick up signals originating from long wires (cf. Sect. 3.2). [ZS18] uses the average value of 20 ring oscillators to measure fluctuations on the PDN originating from an RSA implementation.


Figure 3.8: Ring oscillator as voltage sensor
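A minimal Python sketch of the ring-oscillator counter, assuming the same kind of hypothetical first-order model as for a delay element (loop delay scaling with $V_{\text{nom}}/V$) and an illustrative 2 ns loop delay for the LUT plus its routing. One oscillation period corresponds to two traversals of the loop. With a droop from 1.20 V to 1.15 V, the counter changes by a single step in this model.

```python
import math


def ro_count(v_supply, window_ns, loop_delay_ns, v_nom=1.2):
    """Counter value of a single-inverter ring oscillator after one read-out window.

    One oscillation period is two traversals of the loop (one rising, one
    falling transition). Hypothetical first-order model: the loop delay
    (LUT plus routing) scales with v_nom / v_supply.
    """
    period_ns = 2 * loop_delay_ns * (v_nom / v_supply)
    return math.floor(window_ns / period_ns)


window = 1e3 / 24  # read out once per 24 MHz clock cycle
for v in (1.20, 1.15):
    print(v, ro_count(v, window, loop_delay_ns=2.0))
```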

Comparison and Discussion on the Implementation

Both the tapped delay line and the ring oscillator are based on the very same principle of operation, i.e., measuring the speed of a circuit as a function of the voltage drop across the circuit, which is related to the power consumption. The tapped delay line is an unrolled version of the ring oscillator, requiring more area. At least for Xilinx FPGAs, the ring oscillator can only be constructed using a LUT as inverter and routing its output through a switch matrix back to the input of the CLB. This leads to a relatively long atomic path that increments the counter by one. In contrast, the observable part of the delay line uses a buffer in the carry chain as its atomic measure of time. Thus, even though the physical influence of the power consumption on the speed of the circuit might be identical, the ring oscillator provides a lower resolution given by its slower delay element. We implemented multiple ring oscillators on the SAKURA-G based on the description given in [ZS18]. Indeed, when measuring the fluctuations on the PDN caused by an AES core, the counter values of some oscillators differed by at most one, while others even remained constant. Using the same measurement environment, the tapped delay line resulted in up to 15 different delay values. However, the downside of the tapped delay line is that the delay of each buffer varies due to placement or manufacturing tolerances. This results in a nonlinear measurement, whereas the ring oscillator is linear because the same inverter is always measured. Finally, the balance might shift towards the ring oscillator for ASIC implementations, as the problematic long routing of the ring oscillator’s feedback does not exist there.
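The nonlinearity argument can be illustrated with a small Python sketch: with identical buffers, equal time steps advance the reading by equal numbers of taps, whereas with varying per-buffer delays the same time steps map to uneven increments. The delay values (in integer picoseconds, to avoid floating-point rounding at tap boundaries) are hypothetical.

```python
from itertools import accumulate


def taps_passed(elapsed_ps, tap_delays_ps):
    """Number of delay elements an edge has fully traversed after elapsed_ps."""
    count = 0
    for arrival_ps in accumulate(tap_delays_ps):
        if arrival_ps > elapsed_ps:
            break
        count += 1
    return count


uniform = [50] * 6                  # ideal, identical buffers (ps)
varying = [30, 70, 45, 80, 25, 50]  # hypothetical placement/process spread (ps)
readings_uniform = [taps_passed(t, uniform) for t in (100, 200, 300)]
readings_varying = [taps_passed(t, varying) for t in (100, 200, 300)]
print(readings_uniform, readings_varying)
```

Both chains have the same total delay, but only the uniform one converts equal time steps into equal steps of the sensor value.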

Chapter 4 Intra-FPGA Side-Channel Attacks

We present a successful side-channel attack from one part of the user-available fabric of an FPGA to another. Even though distinct cores implemented on an FPGA might be logically isolated, they still share the same Power Distribution Network (PDN) and power supply. Circuit activity causes fluctuations in the power network. Thus, the voltage on the PDN can be used as a side channel to spy on other circuits. We use a tapped delay line as a sensor to measure the speed of the circuit as a function of the supply voltage. We provide experimental results by targeting an FPGA implementation of AES. The SAKURA-G as our first target is specifically made for side-channel analysis and makes it possible to compare the performance of the sensor to a traditional oscilloscope. In addition, we present similar results for off-the-shelf general-purpose FPGA development platforms. The sensor can be deployed through an unsuspicious IP-core acting as a Trojan or can be used directly by another user on a multi-tenant FPGA.

The work described in this chapter relating to the SAKURA-G platform was published at DATE’18 [SGMT18b] and was joint work with Dennis Gnad of Karlsruhe Institute of Technology. The initial implementation of the voltage sensor by Dennis Gnad was jointly adapted for side-channel analysis, and the corresponding attacks were performed in close collaboration. We indicate in the following which sections are taken from [SGMT18b]. For these sections, both authors contributed equally.

Contents of this Chapter

4.1 Introduction
4.2 Adversary Model
4.3 Experimental Setup
4.4 Results on the SAKURA-G
4.5 Results on General-Purpose FPGA Development Platforms
4.6 Conclusion

4.1 Introduction

It has been known for a long time that timing attacks can be performed remotely, even through the noise added by the Internet [Ble98]. In contrast, fault attacks were thought to require physical access, e.g., to create glitches at the clock input. Only recently, Rowhammer-based attacks demonstrated that a fault can be injected into Dynamic Random Access Memory (DRAM) by repeatedly accessing neighboring cells [GMM16]. This can be triggered purely by executing specific code on the CPU. Further, [TSS17] demonstrated on a smartphone that clock glitches might occur if the circuit that dynamically adjusts the voltage levels and the clock speed is put into an unstable configuration. Following these ideas, [KGT18] shows that creating a large amount of switching activity on an FPGA leads to a voltage drop in the PDN. If the voltage drop is large enough, faults might occur in neighboring circuits. Side-channel attacks based on measuring the power consumption of a device are thought to require physical access as well, i.e., to connect an oscilloscope. Of course, the electromagnetic emanation might be picked up at a small distance, or even further away if the chip emits something that is a function of the power consumption. For example, [CPM+18] demonstrated that the power consumption of an encryption might be modulated onto the signal emitted by an RF transmitter. This chapter constitutes the start of our research contribution to the field of side-channel attacks: our seminal work described in the following is the first to demonstrate that a remote and direct measurement of the power consumption of a circuit is possible. We use a voltage sensor based on [GOKT16] to capture fluctuations on the PDN (cf. Sect. 3.3.2). There, it is already shown that the activity of logic has an influence on the PDN, i.e., varying current demands lead to a voltage drop on the PDN. The varying supply voltage can be picked up by other parts of the circuit (cf. Sect. 3.4.2). The sensor measures the progress of a test signal through a delay chain. The voltage level on the PDN influences the speed of transistors. Therefore, how far this test signal advances depends on the power consumption and thus on the activity of circuits connected to the same PDN. We build on this work and show that the information carried through the PDN is sufficient to act as a source of side-channel information.

In this chapter, we provide experimental results targeting an AES encryption implemented on the SAKURA-G while the sensor is implemented in the fabric nearby. Later, we will show that the sensor can also pick up leakage from a CPU core on the same silicon die (Chapter 5) as well as from another IC on the same PCB (Chapter 6). The SAKURA-G is designed for side-channel analysis and allows us to compare the performance of the sensor to the traditional external measurement using an oscilloscope. Indeed, we show that we can successfully extract the secret key using the traces captured with the sensor, with only a slight increase in the number of required traces. In addition, we port the sensor to two different off-the-shelf multipurpose FPGA development platforms, showcasing its general applicability beyond hardware specifically made for side-channel analysis. We highlight that the sensor does not require any logical connection to the victim. Further, the sensor measuring an analog signal can be implemented within the user-available fabric of an FPGA meant for digital logic. This leads to severe security implications in scenarios such as those discussed in the following.

4.2 Adversary Model

The availability of a sensor that can spy on other circuits implemented in the same fabric of an FPGA has severe security implications in the following scenarios among others:


• IP-cores are commonly used during the FPGA and ASIC tool flow to implement, e.g., memories, network interfaces, multipliers, etc. These ready-made logic elements are either provided directly by the FPGA manufacturer to configure dedicated elements like memories or are provided by a third party. Our voltage sensor might be deployed through such an unsuspicious IP-core as a Trojan or virus. Of course, a third-party IP-core for signal processing that is connected to an encryption module would immediately raise questions. In contrast, our sensor is especially stealthy in that sense, as we do not require a direct logical connection to the targeted circuit. While capturing the fluctuations of the PDN can be performed remotely, we still need a way to exfiltrate the captured traces. Thus, an IP-core providing a network interface, for example, would be an ideal target to implant the Trojan.

• Considering multi-tenant FPGAs, the sensor can be used directly to spy on another user residing on the same FPGA. This is especially relevant, e.g., for FPGA accelerators shared in the cloud [EV12, BSB+14, FVS15] or when SoCs with FPGA fabric are shared among multiple users. Because of obvious security concerns, isolation of different users and verification of uploaded designs have already been proposed in [TM17]. However, as the PDN spans the whole fabric, techniques such as logical separation with defined interfaces as bridges [HBW+07, Cor13] are entirely ineffective against this type of leakage.

Note that similar previous work tunneling through isolation requires some modification to the targeted circuit. For example, such covert channels might be formed by special routing to create electrical coupling [GE16] or by additional functionality [INK11]. Instead, we solely capture the voltage fluctuations of the genuine circuit and do not require any modification.

4.3 Experimental Setup

We present successful side-channel attacks on an AES-128 implementation on multiple platforms for traces captured with the internal voltage sensor in fabric. The SAKURA-G as our first target is built for side-channel analysis and is thus equipped with a dedicated shunt to measure the power consumption externally. This enables us to compare the traces captured with our sensor to the traditional measurement using an oscilloscope. As one might be tempted to think that the attack works especially well on the SAKURA-G, we provide further results using two different general-purpose FPGA development boards by Digilent. In the following, we briefly introduce the properties of the experimental setup that are shared between all targets.

4.3.1 Voltage Sensor in FPGA Fabric

We implement the same sensor circuit on all target platforms, i.e., the SAKURA-G side-channel platform and the Digilent Basys3 and PYNQ development boards. We omit a detailed description of the sensor at this point and refer to Section 3.4.2 for the working principle and implementation. The platform-specific properties, mainly relating to the used sampling rate and the way the captured traces are transferred, are described in the corresponding sections.


4.3.2 AES Module

For the sake of comparability, we implemented the same AES module on all three target platforms. Figure 4.1 depicts the main 32-bit datapath, not showing the key schedule and the ShiftRows operation. Because ShiftRows is just a permutation of the bytes of the state, we can move its position arbitrarily as long as we do not cross a MixColumn operation and adjust the key addition accordingly. In our case, ShiftRows is executed first by implementing a specific routing between the shift registers holding the state s0, ..., s15. Then, as shown in Figure 4.1, four Sboxes run in parallel and are followed by a single 32-bit MixColumn operation and the key addition. The last round bypasses the MixColumn operation. One cycle for ShiftRows and four cycles for processing the 128-bit state in chunks of 32 bits add up to five cycles per round and thus 50 cycles in total. Because the Sboxes are not required during ShiftRows, they can be shared with the key schedule, which is executed exactly in those cycles. For the SAKURA-G, the implementation of the AES module requires 265 flip-flops and 862 LUTs, i.e., 0.3% of the available flip-flops and 0.9% of the available LUTs on the FPGA. Even though this utilization is much lower than the 8% of the available flip-flops used for characterizing the sensor in [GOKT16], we can still perform a successful attack, as shown in the following.

Figure 4.1: Architecture of the underlying AES encryption core with four parallel Sboxes, the state registers s0, ..., s15 (one row s0 s4 s8 s12, s1 s5 s9 s13, s2 s6 s10 s14, s3 s7 s11 s15 per Sbox), and the RoundKey input (ShiftRows and KeySchedule not shown)
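The cycle count of this datapath can be made explicit with a small Python sketch that lists the activity per clock cycle, e.g., to locate the last-round cycles targeted by the attack. The operation labels are descriptive only; as an assumption of this sketch, the initial key addition before the first round is not modeled, since its timing is not described here.

```python
def aes_cycle_schedule(rounds=10):
    """Per-cycle activity of the 32-bit AES datapath described above (sketch).

    Each round: one ShiftRows cycle, during which the four Sboxes serve the
    key schedule, followed by four cycles that each process one 32-bit
    column (SubBytes, MixColumn except in the last round, key addition).
    The initial key addition before round 1 is not modeled.
    """
    schedule = []
    for rnd in range(1, rounds + 1):
        schedule.append((rnd, "ShiftRows; Sboxes used by key schedule"))
        mix = "" if rnd == rounds else " + MixColumn"
        for col in range(4):
            schedule.append((rnd, f"column {col}: SubBytes{mix} + AddRoundKey"))
    return schedule


schedule = aes_cycle_schedule()
last_round_cycles = [i for i, (rnd, _) in enumerate(schedule) if rnd == 10]
print(len(schedule), last_round_cycles)
```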

To evaluate whether the PDN poses an unintentional information channel for side-channel analysis, we performed a straightforward Correlation Power Analysis (CPA) on the last round of AES. However, as already expected from previous external measurements of this AES implementation using an oscilloscope, only a few bits of each internal state byte showed some leakage. Thus, we ran separate attacks for the individual bits and only depict the result for the bit showing the highest correlation. Let $k_{\text{hyp}}$ denote the key hypothesis and $c_i$ a byte of the ciphertext. Our model for the predicted intermediate bit b at the fixed bit position bitpos showing the highest correlation is then

$$b = \mathrm{Sbox}^{-1}(k_{\text{hyp}} \oplus c_i) \wedge 2^{\text{bitpos}}.$$
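The following self-contained Python sketch implements a single-bit CPA with this model for one ciphertext byte at one sample point. The AES Sbox is derived from its definition (inversion in GF($2^8$) followed by the affine map), so no table has to be copied. The traces are synthetic and hypothetical: each leaks the chosen bit of the last-round Sbox input plus Gaussian noise; real measurements would supply one column of the recorded sensor traces instead. ShiftRows byte positions are ignored, as in the model above.

```python
import random


def _gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return result


def _rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def _build_sboxes():
    """Derive the AES Sbox (inversion + affine map) and its inverse."""
    inverse = [0] + [next(b for b in range(1, 256) if _gf_mul(a, b) == 1) for a in range(1, 256)]
    sbox = [x ^ _rotl8(x, 1) ^ _rotl8(x, 2) ^ _rotl8(x, 3) ^ _rotl8(x, 4) ^ 0x63 for x in inverse]
    inv_sbox = [0] * 256
    for x, y in enumerate(sbox):
        inv_sbox[y] = x
    return sbox, inv_sbox


SBOX, INV_SBOX = _build_sboxes()


def predicted_bit(c, k_hyp, bitpos):
    """b = Sbox^-1(k_hyp XOR c) AND 2^bitpos, shifted down to 0 or 1."""
    return (INV_SBOX[k_hyp ^ c] >> bitpos) & 1


def pearson(xs, ys):
    """Pearson correlation coefficient (0.0 if one input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def cpa_last_round_bit(ciphertext_bytes, samples, bitpos):
    """Return the key guess with the highest |correlation| and all scores."""
    scores = []
    for k_hyp in range(256):
        model = [predicted_bit(c, k_hyp, bitpos) for c in ciphertext_bytes]
        scores.append(abs(pearson(model, samples)))
    best = max(range(256), key=lambda k: scores[k])
    return best, scores


# Synthetic traces (hypothetical): one sample per trace leaking the chosen bit.
rng = random.Random(1)
true_key, bitpos = 0x3C, 5
ciphertexts = [rng.randrange(256) for _ in range(2000)]
samples = [predicted_bit(c, true_key, bitpos) + rng.gauss(0, 0.5) for c in ciphertexts]
guess, scores = cpa_last_round_bit(ciphertexts, samples, bitpos)
print(hex(guess), round(scores[guess], 2))
```

Using the absolute correlation accounts for the unknown sign of the leakage, e.g., when a higher power consumption lowers rather than raises the sensor value.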

We also evaluated the influence of the sample rate and the initial delay (cf. Sect. 3.4.2). For our straightforward implementation of the sensor, we must adjust the initial delay and resynthesize our design. Unfortunately, every synthesis run slightly altered the leakage characteristic, resulting in different bits leaking. To still provide comparable results, we implemented the AES core only once and fixed its placement and routing. This way, only the implementation of the sensor varies. Of course, this was not possible when switching from the SAKURA-G to the boards manufactured by Digilent, as they use different FPGA architectures.

4.4 Results on the SAKURA-G¹

In the following, we provide experimental results for the SAKURA-G, showing a successful attack using the traces measured by the internal sensor. We compare the results to a traditional measurement setup, i.e., measuring the power consumption externally.

4.4.1 Implementation

We ran our experiments on the widely used side-channel evaluation platform SAKURA-G, featuring a main and a control Xilinx Spartan-6 FPGA. The main FPGA is a larger XC6SLX75 for security implementations, controlled by an auxiliary Spartan-6 XC6SLX9. As a proof of concept, we considered the AES encryption module as the targeted cryptographic core, implemented in the main FPGA and running at a frequency of 24 MHz. The control FPGA generates random plaintexts to be encrypted on the main FPGA. Our Trojan circuit to measure the voltage sits in the main FPGA, logically disconnected from the AES module. When the AES module sends out the ciphertext, we also receive the voltage data from our sensors on the workstation by utilizing the Xilinx Chipscope Integrated Logic Analyzer (ILA). Here, the sensor values are first stored in the internal Block RAM (BRAM) and are then read out using the JTAG interface. In Figure 4.2 (left), we show the entire floorplan of the experimental setup. We only place our design in the lower part of the Spartan-6. In the center region, the AES core is fixed, and the sensor is placed to the left of the AES module. The FPGA slices used for the sensor’s delay line, including latches and output register, are not shared with any other logic. For all experiments, we kept the same placed and routed partition for the AES core to keep the results comparable. However, the logic required for the ILA core is automatically added by the synthesis tool each time. The higher the initial delay is in relation to the observable delay, the further the observable part zooms into the variation. Thus, more fine-grained quantization levels are seen with a higher initial delay when checking the peak-to-peak variation of a given voltage fluctuation. In Table 4.1, we show the initial delay and the resulting variations observed in our experiments.

Data Acquisition

We compare the efficiency of our developed sensor to a traditional measurement setup. To this end, we measured the voltage drop over a ≈0 Ω shunt resistor² in the VDD path using a Picoscope 6403. Figure 4.3 (top) depicts the resulting trace, measured at 625 MS/s and showing approximately 120 quantization levels. Note that the round structure of the underlying AES implementation can be observed through ten similar patterns, each including five smaller peaks, one for each individual step. A 24 MHz clock is externally given to the main FPGA, which supplies both the AES module and our developed sensor. Thus, in contrast to the oscilloscope, which has an independent time base, the internal sensor samples the power consumption synchronously. The side-channel information is expected to be amplitude-modulated onto the clock signal, i.e., it is visible at the clock peaks. Therefore, it would be enough to sample the power consumption only at the exact moment at which the side-channel leakage occurs. This drastically lowers the sampling frequency required for a successful attack [OC15]. To verify this, we conducted experiments in which the sensor was clocked at different frequencies (24 MHz, 48 MHz, 72 MHz, and 96 MHz) while the AES module always ran at 24 MHz. To this end, we used a Digital Clock Manager (DCM) to generate the desired clock frequencies from the external 24 MHz clock.

¹ This section is taken from [SGMT18b] and was written in equal parts by Dennis Gnad and the author of this thesis.
² The built-in shunt resistor of the SAKURA-G was shorted with a jumper.

Table 4.1: Overview of the sensor's sampling frequencies with the AES module at 24 MHz.

Sampling frequency (MHz)                  | 96 | 72 | 48 | 24
No. of primitives used for initial delay  | 10 | 14 | 22 | 46
Observed peak-to-peak variation           |  6 |  6 |  8 | 15

Sensor Feasibility Discussion

In our experiments, we used Xilinx Chipscope to read out the sensor values. Note that, besides using the same clock source, there is no connection between the AES core and the sensors or the logic belonging to Chipscope. Moreover, our sensor has the additional advantage that, even if it is not synchronized with the AES clock, it catches all variations occurring within half of each clock cycle, since the clock traverses the delay chain during the half of the clock period in which the latches are enabled (see Section 3.4.2). Although we use JTAG to connect to Chipscope in our experimental setup, an actual attacker could use whatever remote connection is available to transmit the sensor values from the internal BRAM to the outside. Since no logical signaling between the attacked module and the sensor is desired, the attacker needs a different mechanism to trigger the start of storing the samples, e.g., into the BRAM. This can be achieved by observing the measured signal itself and triggering on a large peak: while the AES is inactive, the sensor value varies only slightly (cf. Fig. 4.3), whereas the power consumption of the first AES round results in a large negative peak in the sensor value, providing a stable reference point for aligning the traces. As described in Section 3.4.2, the initial delay of the sensor must be adjusted for each sensor frequency. This leads to different levels of quantization and thus to different observed peak-to-peak variations. Our experimental data in Figure 4.3 confirm this relationship: sensors at lower operating frequencies show a higher peak-to-peak variation (cf. Table 4.1).
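The self-triggering described above can be modeled in a few lines. The following is a minimal software sketch, assuming that the idle baseline is estimated as the mean of the first samples of a capture and that a fixed threshold below this baseline marks the large negative peak of the first AES round. The function name and both parameters are hypothetical and not taken from our hardware implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a self-trigger: estimate the idle baseline
 * from the first baseline_len samples, then return the index of the first
 * sample that lies more than threshold below it (the large negative peak of
 * the first AES round). Returns -1 if no such sample exists. */
long find_trigger(const uint8_t *samples, size_t len,
                  size_t baseline_len, unsigned threshold)
{
    if (baseline_len == 0 || baseline_len > len)
        return -1;

    unsigned long sum = 0;
    for (size_t i = 0; i < baseline_len; i++)
        sum += samples[i];
    long baseline = (long)(sum / baseline_len);   /* integer mean of idle samples */

    for (size_t i = baseline_len; i < len; i++) {
        if (baseline - (long)samples[i] > (long)threshold)
            return (long)i;                       /* first sample far below idle level */
    }
    return -1;
}
```

In hardware, the same comparison would run on the live sensor output and start writing samples into the BRAM once it fires; the threshold would have to be chosen above the idle fluctuation of the sensor.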

4.4.2 Sensor placed close to the AES core

Figure 4.4 depicts the results using the oscilloscope as well as the internal sensor placed close to the AES core, with a gap of just four FPGA slices to avoid potential crosstalk. In all cases, the correlation curves using 5 000 traces and the progressive curves over the number of traces are shown. Starting with the oscilloscope, we observed a maximum correlation of approximately −0.3 for the correct key hypothesis. As shown, the attacks using the traces measured internally by the sensor are also successful. The correct key hypothesis is clearly distinguishable from the others, albeit with a slightly lower maximum correlation of about −0.2. Comparing the results of the sensor at different sampling frequencies, we do not observe a large deviation. This is due to the synchronous sampling, as most of the information is contained in the respective peak anyway. Finally, we observe that a higher resolution (more quantization steps) slightly improves the maximum correlation.
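For reference, the CPA distinguisher behind such correlation curves can be sketched as follows. This is a minimal illustration, not our evaluation code: it computes the Pearson correlation between a hypothetical leakage vector and the measured samples at one time index, and it ranks key candidates by the squared correlation. Ranking by the squared value is equivalent to ranking by the absolute correlation, so the negative correlations observed here are found as well, and no square root is needed. The function names and the row-major layout of the hypotheses are assumptions.

```c
#include <stddef.h>

/* Squared Pearson correlation between hypothetical leakage h and measured
 * samples t (one entry per trace, at a fixed time index). The sign of the
 * covariance is written to *sign (if non-NULL), so a negative correlation
 * can still be recognized. Returns 0 if either input is constant. */
double corr_sq(const double *h, const double *t, size_t n, int *sign)
{
    double sh = 0, st = 0, shh = 0, stt = 0, sht = 0;
    for (size_t i = 0; i < n; i++) {
        sh += h[i];
        st += t[i];
        shh += h[i] * h[i];
        stt += t[i] * t[i];
        sht += h[i] * t[i];
    }
    double cov  = (double)n * sht - sh * st;   /* n^2 * covariance */
    double varh = (double)n * shh - sh * sh;   /* n^2 * variance of h */
    double vart = (double)n * stt - st * st;   /* n^2 * variance of t */
    if (sign)
        *sign = (cov > 0) - (cov < 0);
    if (varh <= 0 || vart <= 0)
        return 0.0;
    return (cov * cov) / (varh * vart);
}

/* CPA distinguisher: hyp holds n_cand rows of n hypothetical values each.
 * Returns the index of the candidate with the largest |rho|. */
int cpa_best_candidate(const double *hyp, size_t n_cand,
                       const double *t, size_t n)
{
    int best = -1;
    double best_r2 = -1.0;
    for (size_t k = 0; k < n_cand; k++) {
        double r2 = corr_sq(hyp + k * n, t, n, NULL);
        if (r2 > best_r2) {
            best_r2 = r2;
            best = (int)k;
        }
    }
    return best;
}
```

In a full attack, this ranking is repeated for every time sample and every key byte, and the progressive curves are obtained by re-evaluating it for a growing number of traces.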

4.4.3 Distant Sensor

We further investigated whether we can still detect side-channel leakage if the sensor is placed far away from the cryptographic block. To this end, we placed the sensor in the opposite region, as far away as possible from the AES module. The right part of Figure 4.2 depicts the corresponding layout. We examined this situation only with a 96 MHz sampling frequency, i.e., the worst case in Figure 4.4. The corresponding CPA results are depicted in Figure 4.5, indicating that the attack still succeeds with only a slight decrease in correlation. This highlights the high risks involved when sharing an FPGA among multiple users. Note that in a real-world design, additional logic might be placed between the AES and the sensor, resulting in more noise and an increased number of traces required for a successful attack. However, such effects are also present in an external measurement.

4.5 Results on General-Purpose FPGA Development Platforms

We deliberately started our experiments with the SAKURA-G, as it is well-established in the side-channel community. Further, it features a shunt in the supply path (cf. Sect. 3.4.1) and thus allows for a fair comparison to traces captured with an oscilloscope. Indeed, the sensor's performance on the SAKURA-G is similar to that of a traditional oscilloscope (cf. Fig. 4.4). In the following, we extend the results to general-purpose off-the-shelf FPGA development platforms. Our goal is to show that we do not require boards designed for optimal side-channel measurements, underlining the practicability of our sensor. The fact that the sensor performs well on easily available standard boards leads to an additional, constructive use case: the design could be used for side-channel evaluation of (protected) designs without requiring any measurement equipment besides the FPGA itself, e.g., in a lab course for students. To explore this use case, we implement the sensor on two widely available development boards, i.e., the Basys3 and the PYNQ. Both boards are manufactured by Digilent and come with special pricing for academic users. We plan to release the code as an open-source framework for an easy and cost-efficient introduction to side-channel analysis. Both Digilent boards are equipped with the fabric of a Xilinx Artix 7 FPGA. While the Xilinx Spartan 6 of the SAKURA-G is built in a 45 nm technology node, the Artix 7 is manufactured in a 28 nm process. The Digilent PYNQ development board is built around a Xilinx Zynq. This SoC features both the FPGA fabric of an Artix 7 and a dual-core ARM Cortex-A9 processor, connected through the Xilinx Advanced eXtensible Interface (AXI) bus. In Chapter 5, we show a successful SPA attack on an RSA implementation running on the ARM core using our voltage sensor.


We synthesized the AES and the same sensor configuration as for the SAKURA-G. Note that regarding the experimental setup and implementation aspects, we only mention key differences to the SAKURA-G in the following and refer to Section 4.4 otherwise.

4.5.1 Digilent Basys3

The Digilent Basys3 development board is built around a Xilinx Artix 7 FPGA and is available for around 150€, or 80€ with academic discount. The board is equipped with a 100 MHz crystal oscillator. Thus, we ran the sensor at this frequency as well and clocked the AES at 12.5 MHz. In contrast to the experiments in Section 4.4, which used Xilinx Chipscope, the sensor data is stored in the internal BRAM of the FPGA and sent out through a Universal Asynchronous Receiver Transmitter (UART). This resulted in around 30 traces per second, limited by the UART interface provided by an FTDI FT2232 USB-UART bridge. After collecting the first set of traces, we observed some periodic noise asynchronous to our measurement. As can be seen in Figure 4.6 (top), the disturbing signal has a period of around 2 µs. We quickly identified the Linear Technology LTC3633 switching voltage regulator as the source of this noise: its switching frequency can be selected between 500 kHz and 4 MHz by connecting a specific resistor value. Indeed, the switching frequency is set to 500 kHz (i.e., a period of 2 µs). This is the most energy-efficient option, but at the same time the one producing the most noise on the power rail. Setting the frequency to 4 MHz, i.e., replacing the 576 kΩ resistor by one with 81 kΩ, drastically reduced the amount of switching noise, cf. Figure 4.6 (bottom). We ran the same attack on the last round of AES as for the SAKURA-G. The right side of Figure 4.6 depicts the progress of the correlation. We expected a larger number of required traces compared to the SAKURA-G because of the small-value capacitors: on the SAKURA-G, the small-value capacitors close to the target FPGA are not populated, i.e., they are marked as "do not populate" by the manufacturer. In contrast, all small-value capacitors stabilizing the power rails of the Basys3 are populated for the results depicted in Figure 4.6.
Finally, we observe that increasing the switching frequency of the voltage regulator indeed lowers the number of required traces significantly. For comparison, we also tried using an external benchtop power supply to create all required voltages, avoiding the on-board power management IC (PMIC) entirely. However, we did not notice any additional improvement compared to the PMIC running at 4 MHz.

4.5.2 Digilent PYNQ

The Digilent PYNQ-Z1 comes with a Xilinx Zynq SoC, featuring both Artix 7 fabric and a dual-core ARM Cortex-A9 CPU. It is specifically designed for the Python programming language, hence the portmanteau "PYNQ". However, we do not require any Python-specific features and chose the board solely because of its SoC and its price tag of 65 US$ for academic users. We implemented the voltage sensor and the AES core in the programmable logic part of the SoC. For initial testing, we again used the BRAM as storage and an external UART interface provided by an FTDI cable for communication. This resulted in a measurement speed equivalent to that of the Basys3 board, i.e., around 30 traces per second, again limited by the FTDI chip. To increase the measurement speed, we chose to store the traces directly on the SD card that acts as non-volatile memory for the Linux operating system. The implementation of the corresponding AXI interface was mainly performed by Jonas Krautter of Karlsruhe Institute of Technology. The AXI bus is intended for high-speed communication between fabric and CPU. Indeed, the resulting measurement speed improved to around 100 000 traces in 90 s, i.e., around 1 100 traces per second. Based on our experience with the Basys3 board, we analyzed the power supply as well: the switching voltage regulator is set to a frequency of approximately 1.2 MHz. However, replacing the configuration resistor to set the frequency to the maximum of 2.2 MHz had no significant effect on the side-channel attack. Hence, we only present the results for the unaltered board in Figure 4.7. Note that we did not alter or remove any capacitors of the board.

4.6 Conclusion

In this chapter, we have demonstrated that a voltage sensor implemented in the user-available fabric of an FPGA can pick up side-channel information of neighboring circuits. It exploits the fact that the activity of the victim causes fluctuations on the PDN. In turn, such fluctuations result in a varying speed of transistors connected to the same PDN. Thus, the sensor can measure the propagation speed of a signal through a series of buffers to capture the power consumption indirectly, without requiring any logic connection to the victim. We provided experimental results by performing a side-channel attack on AES on the SAKURA-G, indicating that the sensor's performance is close to that of a traditional measurement using an oscilloscope. Further, we presented similar results for general-purpose FPGA development platforms. The availability of such a sensor has severe implications for emerging applications of FPGA fabric. For example, it might be deployed through an unsuspicious third-party IP core that is not connected to the victim. In multi-tenant FPGAs, it might enable one user to spy on the computation of another user, even when logical isolation is used to separate the designs. In the following, we demonstrate that the sensor can reach even further, e.g., by picking up side-channel leakage of the CPU core of an SoC as well as of another IC on the same PCB. Again, the only connection is made through the common PDN.


Figure 4.2: Floorplans showing the Experimental Setup with all the relevant parts. Left: the internal sensor is placed close to the AES module. Right: the internal sensor is placed far away from the AES.


[Figure 4.3 plot panels, top to bottom: Picoscope (ADC value over time samples); internal sensor at 96 MHz, 72 MHz, 48 MHz, and 24 MHz (sensor value over time samples)]

Figure 4.3: Single traces measured using an oscilloscope (top) and using our developed sensor at different sampling frequencies (below). Time samples refer to the individual samples captured at the respective sampling rate.


Figure 4.4: Results using the oscilloscope (top row) and using the internal sensor at different sampling frequencies (rows below), each showing the correlation using 5 000 traces (left) and the progressive curves over the number of traces (right). The correct key hypothesis is marked in black. Time samples refer to the individual samples captured at the respective sampling rate.

Figure 4.5: Correlation using 5 000 traces (left) and progress of the maximum correlation over the number of traces (right) using the internal sensor at 96 MHz sampling frequency, placed far away from the AES module.


Figure 4.6: AES on the Basys3 development board: Single trace (left) and progress of correlation (right) measured with different switching frequencies of the voltage regulator, top: 500 kHz, bottom: 4 MHz.

Figure 4.7: AES on the PYNQ-Z1 development board: Single trace (left) and progress of correlation (right).


Chapter 5 System-on-Chip Side-Channel Attacks, from fabric to CPU

As an evolution of our attack within the FPGA fabric, we now attack logic beyond the fabric with the voltage sensor. In this chapter, we successfully extract the secret exponent of an RSA implementation running on the ARM CPU of a System on Chip (SoC). The sensor is implemented in the FPGA fabric on the same silicon die.

Contents of this Chapter

5.1 Introduction
5.2 Experimental Setup
5.3 Results
5.4 Conclusion

5.1 Introduction

In Chapter 4, we demonstrated a successful attack within the user-available fabric of an FPGA, from one part to another. The attack even succeeded when the targeted AES core was placed far away from the sensor. This raises the question of whether we can reach even further, i.e., to other circuits connected to the same PDN of the IC. Recently, many manufacturers have integrated general-purpose CPUs and FPGA fabric into a single SoC. The idea is to use the FPGA fabric to speed up suitable computation tasks. An additional motivation might be to outsource security-critical tasks to the FPGA, as their correct behavior might be more easily verified: classical attacks on software running on a CPU, like buffer overflows, are not applicable to hardware implementations. In the following, we show that exactly the opposite might be true, i.e., unsuspicious but potentially malicious IP cores might be used to spy on the CPU. This threat is elevated when multiple users reside on the same SoC but only one user has access to the FPGA fabric. We present a proof of concept of the scenarios above for a Xilinx SoC. Indeed, we successfully extract the secret key bits of a software RSA implementation by using the voltage sensor implemented in the FPGA fabric. The work described in this chapter was mainly performed around the turn of the year 2017/2018, i.e., after submitting the results on the SAKURA-G to DATE'18 and after the initial experiments on the Basys3 and PYNQ (cf. Sect. 4.5). However, before being able to submit and publish the results, we became aware of independent work showing similar results in March 2018, to be published later this year at S&P'18 [ZS18]. The authors successfully attacked an RSA implementation on another Xilinx Zynq platform as well, using multiple ring oscillators as voltage sensor (cf. Sect. 3.4.2).

5.2 Experimental Setup

To see whether there is any leakage, we implemented RSA with a 1024-bit key using a straightforward left-to-right binary exponentiation in C (Alg. 1). The GNU multi-precision library GMP 6.0.0 is already installed, compiled for ARM, on the Ubuntu 15.10 image supplied by Digilent. We call GMP's mpz_mul for both squaring and multiplication, each followed by mpz_mod for reduction. However, GMP internally checks whether the operands are identical and calls optimized versions for squaring and multiplication accordingly. Thus, the attacker can extract the secret exponent by distinguishing a multiplication from a squaring: if a multiplication took place, the corresponding bit of the exponent was set.

Algorithm 1 Left-to-right binary exponentiation

```c
mpz_set(result, g);                         // result <- g (consumes the leading 1 bit of e)
for (int i = n - 2; i >= 0; i--) {          // n = bit length of exponent e
    mpz_mul(result, result, result);        // square
    mpz_mod(result, result, modulus);       // modular reduction
    if (mpz_tstbit(e, i)) {
        mpz_mul(result, result, g);         // multiply
        mpz_mod(result, result, modulus);   // modular reduction
    }
}                                           // result = g^e mod modulus
```
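To illustrate why distinguishing the two operations suffices, the following toy model mimics the control flow of Alg. 1 on 64-bit integers instead of GMP and records the sequence of operations an attacker would observe ('S' for a squaring, 'M' for a multiplication). It is a sketch under the assumption that every operation can be classified correctly; the function names are hypothetical, and the model ignores how GMP implements the operations internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of Alg. 1 (not GMP): left-to-right binary exponentiation with
 * 64-bit arithmetic, assuming 0 < m < 2^32 so that products cannot overflow.
 * ops receives the observed operation sequence ('S' = squaring,
 * 'M' = multiplication) and must hold at least 64 characters. Assumes e != 0. */
uint64_t modexp_ltr(uint64_t g, uint32_t e, uint64_t m, char *ops, size_t *n_ops)
{
    int n = 0;
    for (uint32_t t = e; t != 0; t >>= 1)
        n++;                                /* n = bit length of e */

    size_t k = 0;
    uint64_t r = g % m;                     /* result <- g: consumes the leading 1 bit */
    for (int i = n - 2; i >= 0; i--) {
        r = (r * r) % m;                    /* square */
        ops[k++] = 'S';
        if ((e >> i) & 1u) {
            r = (r * (g % m)) % m;          /* multiply */
            ops[k++] = 'M';
        }
    }
    ops[k] = '\0';
    *n_ops = k;
    return r;
}

/* Attacker's view: every 'S' appends a new exponent bit (initially 0), and a
 * following 'M' sets it. The leading 1 bit is implicit. */
uint32_t recover_exponent(const char *ops)
{
    uint32_t e = 1;
    for (size_t i = 0; ops[i] != '\0'; i++) {
        if (ops[i] == 'S')
            e <<= 1;
        else if (ops[i] == 'M')
            e |= 1u;
    }
    return e;
}
```

For the exponent 0xCCCC used in Section 5.3, the recorded sequence starts with "SMSS", i.e., the expected "SQ MUL SQ SQ", and recover_exponent returns 0xCCCC again.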

The sensor is triggered through the AXI interface to record a fixed number of samples. Because our goal was to capture the leakage of RSA using a single trace anyway, we simply used an external UART connected to the FPGA to read out the sensor data. Other than that, the implementation of the sensor is identical to the intra-FPGA attack described in Section 4.5.2. Digilent used a single PMIC to create the four supply voltages required by the entire board (1.0 V, 1.5 V, 1.8 V, and 3.3 V). The package of the Xilinx Zynq SoC of the PYNQ has distinct supply pins for the internal logic of the processing system (PS), i.e., the ARM cores, and of the programmable logic (PL). However, both VCCPINT for the ARM cores and VCCINT for the FPGA fabric have the same recommended operating voltage of 1.0 V [Xil18]. Thus, Digilent chose to power the ARM cores and the fabric from the same supply.

5.3 Results

The sensor samples at 100 MHz while the ARM core runs at its nominal frequency of 600 MHz. Because of this under-sampling (six CPU cycles per sensor sample), we are unable to see the power consumption of individual instructions. Figure 5.1 depicts the resulting trace (averaged over ten measurements) for the exponent 0xCCCC. The exponentiation takes place approximately between 100 µs and 600 µs.


While the exponentiation has a distinct signature, there is no clear indication pointing to the bits of the exponent. However, a spectrogram of even a single trace easily reveals the secret exponent (cf. Fig. 5.2). Here, the trace is partitioned into windows of 512 sample points with an overlap of 256 points. For each of these windows, the discrete Fourier transform is computed using the Fast Fourier Transform (FFT). The frequency spectrum of a window is plotted on the x-axis, with the color indicating the corresponding intensity at each frequency. This is repeated for each window to create a plot over time in the direction of the y-axis. Multi-precision multiplication and squaring are both repetitive operations, i.e., both iterate over the individual words of the operands. Depending on the duration of one iteration, this repetition shows up at a characteristic frequency, resulting in patterns as seen in Figure 5.2. For example, between around 120 µs and 500 µs, we can see some signal around 5 MHz, 23 MHz, and, most prominently, just above 30 MHz. In contrast, the signals at, for example, 2 MHz, 7 MHz, and 25 MHz are static and independent of the exponentiation. Figure 5.2 (right) depicts a detailed view of the spectrogram around 32 MHz. As squaring can be implemented more efficiently than multiplication, we expect both operations to have distinct frequency characteristics. Our test exponent was 0xCCCC. Thus, we try to identify patterns of 1100, or "SQ MUL SQ MUL SQ SQ", respectively. Setting the variable storing the result to the base (Line 1 in Alg. 1) inherently provides the first "1" of the exponent. Therefore, the start of the exponentiation should only consist of 100, or "SQ MUL SQ SQ". Indeed, we can observe distinct frequency characteristics in Figure 5.2 (right). This magnified view depicts "dots" at two different frequencies on the left and brief "bars" on the right spanning a larger frequency band.
After some profiling of the GMP operations, we concluded that the dots in the middle (at 32 MHz) correspond to the multiplication, whereas the dots on the left (at ≈31 MHz) correspond to the modular reduction. The bars on the right represent the squaring. As expected, integer squaring is faster than multiplication. Even though the squaring is not always visible, we can easily extract the secret exponent from a single trace by the positions of the dots caused by the multiplications.
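The resolution of the spectrogram follows directly from the window parameters. The sketch below assumes the 100 MHz sampling rate of the sensor and the 512-point windows with 256 points of overlap stated above; the helper names are hypothetical. It computes the bin width, the time step between consecutive windows, and the bin index of a given frequency. With a bin width of about 195 kHz, components at ≈31 MHz and 32 MHz fall about five bins apart and can thus be told apart.

```c
#include <stddef.h>

/* Number of windows (spectrogram rows) for a trace of len samples, using
 * windows of win samples that overlap by overlap samples. */
size_t num_windows(size_t len, size_t win, size_t overlap)
{
    if (overlap >= win || len < win)
        return 0;
    size_t hop = win - overlap;             /* distance between window starts */
    return (len - win) / hop + 1;
}

/* Frequency spacing of the DFT bins in Hz for sampling rate fs. */
double bin_width_hz(double fs, size_t nfft)
{
    return fs / (double)nfft;
}

/* Time between the starts of two consecutive windows in seconds. */
double window_step_s(double fs, size_t win, size_t overlap)
{
    return (double)(win - overlap) / fs;
}

/* Index of the DFT bin closest to frequency f (0 <= f <= fs / 2). */
size_t bin_of(double f, double fs, size_t nfft)
{
    return (size_t)(f * (double)nfft / fs + 0.5);
}
```

For example, a 1 ms capture at 100 MHz (100 000 samples) yields 389 windows, i.e., a new spectrogram row every 2.56 µs, while 31 MHz and 32 MHz map to bins 159 and 164.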

[Figure 5.1 plot: sensor value over time (µs), 0 to 1000 µs]

Figure 5.1: RSA on the PYNQ development board: Binary exponentiation on the ARM CPU captured with the voltage sensor, ten traces averaged. The exponentiation takes place between approximately 120 µs and 570 µs.


Figure 5.2: RSA on the PYNQ development board: Binary exponentiation on the ARM CPU captured with the voltage sensor, single trace spectrogram (window size: 512 sample points, overlap: 256, points for discrete FFT: 512). Left: full time and frequency span, Right: detail of dots representing squaring (left column) and multiplication (right column). The colorbar representing the Power/frequency is calculated based on the (arbitrary) sensor values.

5.4 Conclusion

On top of the side-channel leakage within the fabric of an FPGA demonstrated in Chapter 4, we successfully picked up leakage from the CPU of an SoC. Our proof-of-concept attack demonstrates that, while FPGAs are often thought of as a security add-on, they open up a new potential attack surface. Thus, deploying countermeasures against power side-channel attacks on SoCs should be considered even in a remote setting. For future work, it is interesting to see whether generating the supply voltages individually is already enough to separate the FPGA part from the ARM CPUs. Finally, attacking a vulnerable implementation of RSA through SPA is certainly much easier than attacking a symmetric cipher with DPA. Thus, another aspect of future work is to investigate whether attacking, e.g., a software AES implementation is practically feasible.

Chapter 6 Inter-FPGA Side-Channel Attacks

The current practice in board-level integration is to incorporate chips and components from numerous vendors. A fully trusted supply chain for all used components and chipsets is an important prerequisite for validating a complete board-level system for safe and secure operation, yet one that is extremely difficult to achieve. An increasing risk is that most chips nowadays run software or firmware, typically updated throughout the system lifetime, making it practically impossible to validate the full system at every point in the manufacturing, integration, and operational life cycle. This risk is elevated in devices that run third-party firmware. In this chapter, we show that an FPGA used as a common accelerator on various boards can be reprogrammed by software to introduce a sensor, suitable as a remote power analysis side-channel attack vector at the board level. We show successful power analysis attacks from one FPGA on the board to another chip implementing RSA and AES cryptographic modules. Since the sensor is only mapped through firmware, this threat is very hard to detect, because data can be exfiltrated without requiring inter-chip communication between victim and attacker. Our results also demonstrate that, potentially, any untrusted chip on the board can launch such attacks on the remaining system.

The work described in this chapter will be published at ICCAD'18 [SGMT18a] and was joint work with Dennis Gnad of Karlsruhe Institute of Technology. The initial implementation of the voltage sensor by Dennis Gnad was jointly adapted for side-channel analysis, and the corresponding attacks were performed in close collaboration. We indicate in the following which sections are taken from [SGMT18a]. In these cases, both authors contributed equally.

Contents of this Chapter

6.1 Introduction
6.2 Adversary Model
6.3 Experimental Setup
6.4 Results
6.5 Conclusion


6.1 Introduction

Trust is required at multiple levels to assure that a system performs its intended operation. For example, the program code running on a CPU must be verified, i.e., to ensure that no malicious instructions are executed. Although sometimes hidden behind multiple layers of abstraction, all software is executed on some form of hardware. Enabling trust in hardware is challenging due to the landscape of IC manufacturing [Tur99]: various components manufactured by a multitude of geographically scattered vendors need to be integrated onto a Printed Circuit Board (PCB). Even when a trusted state of the system is reached after manufacturing, potentially malicious or trojanized software, firmware, or bitstream updates could defeat the chain of trust. Likewise, employed security features such as secure boot or signed firmware images might be vulnerable as well (cf. [HvD04, MKP11, RSWO17]), or isolation techniques might be ineffective (cf. [YF14, GMM16, LSG+18, KGG+18]). Previously, we demonstrated that the voltage sensor implemented in the user-available fabric of the FPGA can pick up side-channel leakage originating from logic within the same fabric (Chapter 4) and from the ARM CPU of an SoC (Chapter 5). Extending this, we show in the following that the sensor can capture leakage from other ICs residing on the same PCB, i.e., even across multiple levels of the PDN. We provide experimental results using the SAKURA-G, as it features two separate FPGAs on a single PCB: the voltage sensor is implemented on the main FPGA to capture side-channel leakage of AES and RSA implemented on the control FPGA. A sensor able to spy on other ICs immensely increases the attack surface compared to the previous two scenarios. We discuss multiple potential threats next.

6.2 Adversary Model

The sensor might be implemented in a chip on the PCB through one of the following scenarios. In any case, the sensor does not rely on a logic connection to the victim. Instead, every IC residing on the same PDN is a potential target, e.g., the CPU of the host, a Hardware Security Module (HSM), or ICs within a trusted computing base.

• In the first scenario, the attacker has access to the supply chain. The design of the voltage sensor is not limited to FPGA implementations but can be ported to an ASIC as well. Thus, an attacker might integrate such a sensor into an otherwise unsuspicious chip. Monitoring the power consumption through the PDN might be much stealthier than, e.g., directly connecting a dedicated ADC.

• After manufacturing, (parts of) the bitstream running on an FPGA might come under the control of an adversary. For example, virtually every large cloud provider offers to rent machines with FPGA add-ons. Other examples include trojanized firmware, e.g., through interception at delivery or through tampered firmware updates.


6.3 Experimental Setup¹

The main reason for using the SAKURA-G in our experiments is that it contains two FPGAs on a single board, both of which can be freely programmed: a small auxiliary Spartan-6 FPGA (XC6SLX9) and a larger main Spartan-6 FPGA (XC6SLX75). We ran different configurations of the capacitors between VDD and GND, i.e., the unaltered default configuration of the SAKURA-G and one with all small-value capacitors close to the FPGAs removed. In both cases, we inserted a small bridge so that the core voltage of the main FPGA is provided by the same power supply as that of the auxiliary FPGA. This modification more closely resembles typical industrial boards, in which power supplies of the same voltage level are shared for efficiency reasons. The auxiliary FPGA receives plaintexts from the PC, encrypts them with an internal secret key, and sends the ciphertext back to the PC. The main FPGA uses a sensor and transfers the sensor data to the PC. For ease of experimentation, the main FPGA also receives a trigger signal whenever the auxiliary FPGA starts an encryption. In a real-world scenario, an encryption might be triggered externally (by the PC), or the traces can be realigned using existing techniques. Our victim designs are implemented on the smaller auxiliary FPGA. This FPGA is still large enough to fit an AES module and a small RSA implementation. The AES module is identical to the one used in Chapter 4; thus, we refer to Section 4.3.2 for a detailed description. In addition, we implemented RSA using a straightforward right-to-left binary exponentiation [MvOV96], i.e., following the same principles as the one used in [ZS18] for the sake of comparability. The pseudocode is given in Alg. 2. In each iteration step of the exponentiation, a squaring is executed. If the current (secret) bit of the exponent is set, the register storing the result is multiplied by the squared term of the previous step.
Like [ZS18], if a specific step of the algorithm does not require a multiplication, one of the inputs is set to 1, i.e., the identity function is calculated. Thus, to retrieve the secret exponent, the adversary tries to identify for each step of the binary exponentiation whether an actual multiplication took place. Both squaring and multiplication are implemented as separate modules using dedicated multiplication cores with integrated modular reduction. The multipliers themselves operate on the shift-and-add principle: in each clock cycle, one operand is multiplied by two (left shift) and reduced. If the current bit of the other operand is set, the shifted term is added to the result register. The limited resources of our auxiliary FPGA only sufficed for a rather small key size of 224 bit, running at 24 MHz. However, we stress that RSA implementations with a larger key size are usually much easier to attack using Simple Power Analysis (SPA) than smaller ones. This stems from the fact that, with larger operands, the multiplication cores require more cycles while also consuming more power. For our proof-of-concept implementation, we have 224 steps of the binary exponentiation, each requiring 224 clock cycles for the squaring and the multiplication running in parallel. Thus, for each step, we have 224 clock cycles from which we must decide whether a multiplication is taking place or not. For RSA with a 4096-bit key, for example, this window likewise grows to 4096 cycles per step.
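The shift-and-add principle of the multiplication cores can be modeled in software as follows. This is a hypothetical sketch on 64-bit words (moduli below 2^63), not the hardware description used in our experiments: each loop iteration stands for one clock cycle in which the running term is added to the result if the current bit of the other operand is set, and is then doubled and reduced. Hence, an n-bit multiplication takes n iterations, matching the 224 cycles per step of the 224-bit implementation.

```c
#include <stdint.h>

/* Software model of a shift-and-add modular multiplier computing a * b mod m,
 * assuming 0 < m < 2^63 and b < 2^nbits (nbits <= 64). Each loop iteration
 * corresponds to one clock cycle; *cycles (if non-NULL) receives the number
 * of iterations. */
uint64_t shift_add_modmul(uint64_t a, uint64_t b, uint64_t m,
                          unsigned nbits, unsigned *cycles)
{
    uint64_t r = 0;
    unsigned c = 0;
    a %= m;
    for (unsigned i = 0; i < nbits; i++, c++) {
        if ((b >> i) & 1u) {      /* current bit of b set: add the shifted term */
            r += a;
            if (r >= m)
                r -= m;
        }
        a <<= 1;                  /* multiply the shifted term by two ... */
        if (a >= m)
            a -= m;               /* ... and reduce it */
    }
    if (cycles)
        *cycles = c;
    return r;
}
```

With this timing, a full 224-step exponentiation takes 224 · 224 = 50 176 cycles, i.e., roughly 2.1 ms at 24 MHz. Note also that if the constant 1 of the dummy multiplication is the operand whose bits are scanned, only a single addition happens in all n cycles, which gives an intuition why a real and a dummy multiplication may differ in power consumption.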

¹ This section is taken from [SGMT18a] and was written in equal parts by Dennis Gnad and the author of this thesis.

49 Chapter 6 Inter-FPGA Side-Channel Attacks

Algorithm 2 Right-to-left binary exponentiation (cf. 14.76 of [MvOV96])
Require: Message x, Exponent e, Modulus N
Ensure: x^e mod N
1: A ← 1, S ← x
2: while e ≠ 0 do
3:   if e is odd then
4:     A ← A · S mod N
5:   else
6:     A ← A · 1 mod N
7:   end if
8:   e ← ⌊e/2⌋
9:   S ← S · S mod N
10: end while
11: return A
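The following Python sketch mirrors Algorithm 2 in software and additionally records, for each step, whether the real multiplication (exponent bit 1) or the dummy multiplication by 1 (bit 0) was executed. This per-step list is exactly what the SPA adversary tries to read from the trace. The function name is hypothetical, and the sketch runs squaring and multiplication sequentially, whereas the hardware design runs them in parallel.

```python
def right_to_left_binexp(x, e, n):
    """Right-to-left binary exponentiation following Algorithm 2.

    Returns x^e mod n and, for every step, whether a real multiplication
    A <- A * S took place (exponent bit 1) or the dummy A <- A * 1 (bit 0).
    The step list is ordered from the least significant exponent bit upwards.
    """
    a, s = 1 % n, x % n
    multiplied = []
    while e != 0:
        if e & 1:               # line 3: e is odd
            a = (a * s) % n     # line 4: real multiplication
            multiplied.append(True)
        else:
            a = (a * 1) % n     # line 6: dummy multiplication by 1
            multiplied.append(False)
        e >>= 1                 # line 8: e <- floor(e / 2)
        s = (s * s) % n         # line 9: squaring
    return a, multiplied


result, steps = right_to_left_binexp(7, 0b1011, 101)
print(result, steps)  # -> 51 [True, True, False, True]
```

Because the exponent is processed from right to left, the recovered multiplication pattern directly yields the exponent bits starting from the least significant one.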

6.4 Results²

In the following, we present experimental results attacking AES using CPA and RSA using SPA. For both cases, the respective cipher was running on the auxiliary FPGA while the voltage sensor captured the inter-chip voltage drop through its supply pin on the main FPGA.

6.4.1 Attack on AES

As discussed in Section 3.4.2 and already seen in Figure 4.4, the voltage sensors exhibit an inverse relation between the sampling frequency and the amount of variation that is captured. Hence, we chose to run the attack for different sampling frequencies as well. Figure 6.1 depicts exemplary traces for each sampling frequency, averaged over 1000 measurements. The AES encryption takes place approximately between 3.6 and 5.7 µs. Like in Chapter 4, we run a textbook CPA attack, using the ciphertexts to predict the state before the last S-box operation based on a key hypothesis. Figure 6.2 depicts the corresponding results: for each sensor frequency, the correlation after 500 000 traces is plotted on the left and the progress of the maximum correlation on the right. The curve belonging to the correct key candidate is marked in black. Indeed, the attack succeeds for all tested sampling frequencies, albeit with large differences in the number of required traces. For the 96 MHz sensor, the correct key candidate starts to stand out after approximately 200 000 processed traces. For the 24 MHz sensor, in contrast, the correct key candidate is already visible after around 20 000 traces.
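A textbook last-round CPA on a single key byte can be sketched as follows. Several details are assumptions rather than taken from the text: a Hamming-weight model of the state byte before the last S-box, ShiftRows ignored (each ciphertext byte is paired with the key byte at the same position), one leakage sample per trace, and synthetic traces with Gaussian noise. The synthetic leakage has a negative sign to mimic a sensor whose reading drops as power consumption rises, so guesses are ranked by absolute correlation. A real attack correlates every time sample and often uses a Hamming-distance model instead.

```python
import random


def _xtime(v):
    """Multiply by x in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    v <<= 1
    return v ^ 0x11B if v & 0x100 else v


def _gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = _xtime(a)
        b >>= 1
    return r


def _inverse_sbox():
    """Build the AES inverse S-box from field inversion and the affine map."""
    sbox = [0] * 256
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if _gf_mul(x, y) == 1)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox[x] = s ^ 0x63
    inv_sbox = [0] * 256
    for x, s in enumerate(sbox):
        inv_sbox[s] = x
    return inv_sbox


INV_SBOX = _inverse_sbox()


def hamming_weight(v):
    return bin(v).count("1")


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def cpa_last_round(ciphertext_bytes, leakage):
    """Rank all 256 guesses for one last-round key byte by |correlation|."""
    scores = []
    for guess in range(256):
        # Predicted state byte before the last S-box: InvSbox(c XOR k)
        hyp = [hamming_weight(INV_SBOX[c ^ guess]) for c in ciphertext_bytes]
        scores.append(abs(pearson(hyp, leakage)))
    return max(range(256), key=scores.__getitem__), scores


# Synthetic demo (all parameters are assumptions)
rng = random.Random(1)
KEY_BYTE = 0x3C
cts = [rng.randrange(256) for _ in range(2000)]
traces = [-hamming_weight(INV_SBOX[c ^ KEY_BYTE]) + rng.gauss(0, 1.0) for c in cts]
best, _ = cpa_last_round(cts, traces)
print(f"best guess: {best:#04x}")
```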

² This section is taken from [SGMT18a] and was written in equal parts by Dennis Gnad and the author of this thesis.


Figure 6.1: Averaged traces measured during AES using the voltage sensors at different sampling frequencies.

Figure 6.3: CPA attack on AES: Progressive curves over the number of traces on a board with all relevant capacitors removed (left) and with the default capacitor configuration (right), both sampled at 24 MHz. The correct key hypothesis is marked in black. Time samples refer to the individual samples captured at the respective sampling rate.


Comparing our results to the ones reported in Chapter 4 for sensor and target implemented within the same FPGA, we require more traces by a factor of 40, considering the respective best attacks. It should be noted that none of the smallest-value capacitors between VDD and GND were in place when measuring the inter-chip leakage. Only distant large-value capacitors were left in the VDD path, as these do not affect small variations anyway. Nevertheless, we ran additional experiments with the default capacitor configuration of the SAKURA-G, i.e., with all small capacitors close to the FPGA chip in place. A direct comparison at the sampling frequency of 24 MHz is depicted in Figure 6.3. As expected, the additional capacitance acts as a low-pass filter, which can be compensated for by increasing the number of captured traces. Indeed, when sampling at 24 MHz, the correct key candidate started to stand out after approximately 2.5 million traces using the default capacitor configuration, with both FPGAs still powered by the same supply. Note that 2.5 million traces still only correspond to around 38 Mbyte of encrypted data when using AES-128.
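The data volume follows from one 16-byte AES-128 block per trace. The figure of around 38 Mbyte matches binary megabytes (2^20 bytes), which is our reading of the unit:

```latex
2.5\times10^{6}\ \text{encryptions} \times 16\ \text{byte}
  = 4\times10^{7}\ \text{byte}
  = \frac{4\times10^{7}}{2^{20}}\ \text{MiB}
  \approx 38.1\ \text{MiB}
```

In decimal megabytes, the same amount of data is 40 MB.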

6.4.2 Attack on RSA

Based on the results for AES, we chose to measure the RSA core using a sampling rate of 24 MHz. As before, both FPGAs share a power supply, and all capacitors are in place. The RSA core runs at 24 MHz. Thus, we require at least 50 176 cycles to capture the whole binary exponentiation (224 clock cycles for each of the 224 steps). Figure 6.4 depicts the raw trace with an already visible variation over a time span of approximately 2100 µs. We recall that the adversary’s goal is to recover the secret exponent by identifying whether the multiplication took place or not. Every time a multiplication is performed (in parallel to a squaring), the circuit consumes more power. Figure 6.5 depicts a detailed view after applying a low-pass filter with a cut-off frequency of 900 kHz. Instead of simply capturing the increased power consumption during the multiplication, we observe that the voltage sensor receives a differential signal of the computation, i.e., it mainly responds to changes in the power consumption rather than to its absolute level. Thus, we must consider three different cases of how the conditional multiplication in the binary exponentiation affects the voltage sensor:

• If the multiplication module is switched from the off-state (factor of 1 applied to an input) to the enabled state, the voltage briefly drops until the power supply compensates for the increased current. The voltage drop slows down the signal in the voltage sensor, leading to a negative peak in the trace. An arrow pointing downwards indicates this case in Figure 6.5.

• When the multiplication is switched from the on-state to the off-state, i.e., the FPGA suddenly consumes less power, the voltage briefly overshoots until compensated. This leads to a positive peak in the trace due to the accelerated sensor. This case is marked using an arrow pointing upwards. Also note the large positive peak at the end of the exponentiation in Figure 6.4, indicating that both the multiplier and the squaring module were deactivated.

• If the state of the multiplication does not change, i.e., it either stays enabled or stays off, the power consumption remains identical. Thus, the voltage level is constant, causing a steady sensor value. This is indicated by a dash.
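The timing and filtering parameters of the RSA measurement can be reproduced with a short Python sketch. The exponentiation takes 224 × 224 clock cycles at 24 MHz, and the trace is low-pass filtered at 900 kHz. The text does not state which filter type was used, so the first-order IIR filter (exponential smoothing) below is an assumption, as are the function names.

```python
import math


def exponentiation_timing(key_bits, f_clk_hz=24e6):
    """Cycles and duration (in µs) of the binary exponentiation:
    key_bits steps with key_bits clock cycles each."""
    cycles = key_bits * key_bits
    return cycles, cycles / f_clk_hz * 1e6


def lowpass(samples, fs_hz=24e6, fc_hz=900e3):
    """First-order IIR low-pass (exponential smoothing).

    The filter type is an assumption; only the 900 kHz cut-off and the
    24 MHz sampling rate are taken from the text.
    """
    if not samples:
        return []
    dt = 1.0 / fs_hz
    rc = 1.0 / (2 * math.pi * fc_hz)
    alpha = dt / (rc + dt)  # about 0.19 for these parameters
    y = samples[0]
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


cycles, duration_us = exponentiation_timing(224)
print(cycles, round(duration_us, 1))  # -> 50176 2090.7
```

The computed duration of about 2091 µs is consistent with the span of approximately 2100 µs visible in the raw trace. A higher-order filter (e.g., Butterworth) would give a sharper cut-off at the cost of a longer impulse response.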


These three cases are marked in the magnified view of Figure 6.5. Indeed, the secret exponent can be read out easily even though the RSA and the voltage sensor are implemented on separate FPGAs only sharing the same power supply.
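The read-out via the three cases can be sketched as a small decoder. Assumptions not stated in the text: the step boundaries are known (the dotted lines in Figure 6.5), one peak value per step is taken from the low-pass filtered trace, the threshold is chosen by hand, and the multiplier is off before the first step. All names are hypothetical.

```python
def classify_step(peak, threshold):
    """Map the filtered sensor value at a step boundary to one of the three cases."""
    if peak < -threshold:
        return "down"   # off -> on: voltage drop slows the sensor (negative peak)
    if peak > threshold:
        return "up"     # on -> off: overshoot accelerates the sensor (positive peak)
    return "dash"       # multiplier state unchanged


def decode_exponent(symbols, initial_state=0):
    """Track the multiplier state across steps; each state is one exponent bit.

    Bits come out LSB first, matching the right-to-left exponentiation.
    `initial_state` (multiplier off before the first step) is an assumption.
    """
    bits, state = [], initial_state
    for sym in symbols:
        if sym == "down":
            state = 1
        elif sym == "up":
            state = 0
        bits.append(state)
    return bits


peaks = [-0.8, 0.02, 0.9, -0.05, -0.7]   # hypothetical per-step peak values
symbols = [classify_step(p, threshold=0.3) for p in peaks]
print(symbols, decode_exponent(symbols))
# -> ['down', 'dash', 'up', 'dash', 'down'] [1, 1, 0, 0, 1]
```

Since only transitions are visible, a single misclassified peak corrupts all following bits until the next correctly detected transition; a "down" while the tracked state is already 1 can serve as a consistency check.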

6.4.3 Discussion

Our results show that board-level power analysis side-channel threats exist, even in the presence of decoupling capacitors. Of course, the same or even better results can be achieved by adding a dedicated ADC directly to the power rails, but such an ADC would raise questions about its purpose. In contrast, a seemingly disconnected FPGA would not raise any alarm, even in a fully trusted supply chain. The malicious behavior can be enabled later by new firmware that measures the supply voltage with the sensors. Note that in board-level integrated systems without full supply chain trust, this threat is not limited to FPGAs but exists for any untrusted chip on the board. For example, an attacker could use an undocumented internal ADC connected to the shared power supply as a power analysis attack vector. The same is true for any other chip on the board that can be turned into a measurement device through maliciously altered firmware. An increasing number of sensors, even for voltage fluctuations [DSD+07, WDHB17], are integrated into all kinds of chips for improved reliability and monitoring, elevating the risk that they are maliciously used for power analysis attacks. In a system where only remote access is considered an attack vector, integrated cryptographic accelerators are often not protected against power analysis side channels (i.e., only timing side channels are avoided). Such systems could thus be attacked remotely if proper electrical isolation at the board integration level does not exist.

6.5 Conclusion

In this chapter, we demonstrated that the range of the voltage sensor extends beyond the chip package: a sensor implemented in the fabric of an FPGA can capture side-channel leakage from other chips on the same PCB. Most noteworthy, the sensor works despite logical isolation and is only connected to the same power supply as the victim. While we provide experimental results with two separate FPGAs on the same PCB, other types of ICs such as general-purpose CPUs or ASICs might be vulnerable as well.


Figure 6.2: CPA attack on AES: Results to estimate the sensor quality at different sampling rates, on a board with all relevant capacitors removed. Each row shows the correlation using 500 000 traces (left) and the progressive curves over the number of traces (right). The correct key hypothesis is marked in black. Time samples refer to the individual samples captured at the respective sampling rate.


Figure 6.4: Binary exponentiation for RSA captured with the voltage sensor on a separate FPGA, sampled at 24 MHz (raw trace).

Figure 6.5: Detail of the binary exponentiation captured with the voltage sensor after applying a 900 kHz low-pass filter. Dotted lines mark the time span of an individual step in the binary exponentiation. Arrows indicate whether the state of the multiplication module changed (on to off: arrow upwards, off to on: arrow downwards, and no change: dash). The bits above are a part of the (correctly) recovered secret exponent according to this classification.


Part III

Active Side-Channel Attacks


Chapter 7 Background on Laser Fault Injection

Laser fault injection has attracted particular interest in academia and practice, as it is one of the most precise methods of fault injection, offering the often-desired single-bit fault with high repeatability. In this chapter, we provide background on laser fault injection that aids the understanding of the following chapters.

Contents of this Chapter

7.1 Introduction
7.2 Physical Properties
7.3 Sample Preparation

7.1 Introduction

Many of the known advanced methods of physical attacks originate from failure analysis. A circuit might not work correctly directly after manufacturing, or it might stop working in the field after deployment. To identify the root cause of such a malfunction, manufacturers might need some insight into the chip, e.g., probing signals or looking for shorts or open connections. However, if an attacker is capable of probing signals, the secret key, for example, might be trivially revealed. Likewise, a focused ion beam workstation, which is able to cut or add connections, might be used to expose buried signals containing secret data [Zon14]. Closer to the interplay between semiconductors and photonics, laser voltage probing [TLSB17] and photonic emission analysis [SNK+12] have been used for successful attacks, both originating from failure analysis (cf. [YPER99] and [KT97], respectively). The same holds for laser fault injection. Semiconductors are susceptible to radiation [PJA88, GG74], e.g., a heavy ion hitting a circuit might cause single or multiple bit errors [LMH+98]. Such errors can be caused, for example, by nuclear radiation at ground level [GD77, p.350f] or, with decreasing protection by Earth’s magnetosphere and atmosphere, during high-altitude flight and in outer space. Of course, experimentally testing an IC using a heavy ion accelerator or through space flight requires tremendous effort. Instead, single or multiple bit errors can be simulated more easily using laser stimulation [Hab65]. Thus, without requiring an actual source of radiation, it is still possible to observe the response of the actual hardware to a computational fault. In 2002, laser fault injection was first used in the context of cryptographic implementations [SA02]. Numerous publications followed, still finding new attack strategies and refining fault models to this day [SZK+18, VTM+17, SSS+17, HBB+16]. Because the diffraction limit bounds the minimal spot size, many works explored down to which feature size single-bit faults can still be produced (cf. [MDR+10, ADM+10, CLFT14a, RSDT13, CLFT14b, SBHS15, DBC+18]).

7.2 Physical Properties

In the following, we describe the photoelectric effect and how it can be used to manipulate signals in integrated circuits. Further, we discuss the most influential parameters for laser fault injection, i.e., the spot size and the wavelength of the used light source.

7.2.1 Photoelectric Effect in Semiconductors

In his Annus Mirabilis paper on the photoelectric effect, Albert Einstein provided an explanation by attributing a particle nature to light. These particles, now called photons, can transfer their energy to electrons. As a result, the electrons might be raised into the conduction band of a semiconductor (photoconductivity) or even out of the crystal structure of a metal or semiconductor (photoemission). Figure 7.1(a) depicts the carriers generated along the path of the laser near a p-n junction of a transistor. Considering photoconductivity alone, the generated electron-hole pairs would quickly recombine. However, the electric field at the p-n junction separates them, pulling or repelling charges close by. The collection of these carriers and the resulting current (cf. Fig. 7.1(c)) can be separated into two phases [Bau04].


Figure 7.1: Effect of a laser beam hitting a p-n junction: (a) carriers generated along the laser path, (b) funneling effect, and (c) generated current ((a) and (b) based on, (c) taken from [Bau04]).

In the prompt charge collection phase, the generated carriers stretch the depletion region around the n+-well, and thus its electric field, along the laser path. This leads to the funnel shape in Figure 7.1(b), hence the name funneling effect. Within the funnel field, the free carriers are collected within a few picoseconds due to the increased drift. This generates a high current, as shown in the middle section of Figure 7.1(c).


When the enlarged field collapses, the remaining carriers are collected during the much slower diffusion phase. The effect stops when all carriers are either collected or recombined. The resulting current is lower but lasts longer, as depicted in the last section of Figure 7.1(c). Note that the above describes the dominant effect for pulse lengths in the nanosecond domain and above. Simulation results of [DPD+06] indicate that the parasitic bipolar effect dominates when using picosecond pulses while keeping the pulse energy constant. Since our laser source is only capable of pulses in the nanosecond domain anyway, and the cost of a laser source is usually inversely proportional to the pulse length [DPD+06], we refer to [DPD+06, PLL+99] for further reading.

7.2.2 Single Event Transient

We investigate the macroscopic effect of a laser beam using the cross section of an inverter as depicted in Figure 7.2. First, without any influence of a laser beam (i.e., ignoring the red arrows), the input to the gate is “0”, and the pMOS transistor is conducting while the nMOS is blocking. The conducting pMOS transistor thus charges the output capacitance to “1” (as indicated in Figure 7.2). We now aim the laser beam directly at the drain of the blocking nMOS transistor. As described before, the laser beam generates electron-hole pairs along its path that do not recombine near a p-n junction. Instead, a current is generated that flows through the substrate contact to ground. On the other side, this current is drawn from VDD through the conducting pMOS and from the output capacitance. If the current caused by the laser beam is large enough to overcome the current charging this capacitance through the pMOS, the output capacitance is discharged. In this case, the output changes from “1” to “0”. Stopping the laser beam also stops the generation of electron-hole pairs, depleting the current source. The still-conducting pMOS then charges the output capacitance again. This effect is usually called a Single Event Transient (SET), as the gate and the whole circuit will regain their original state depending on the non-affected inputs. Thus, the faulty value must be present at a clock event so that it is stored in the subsequent register. Otherwise, the faulty value vanishes without any effect. This property will be used later for FSA in Chapter 9. Note that once the fault is stored, its effect on the remaining part of the chip can be described purely digitally at the register transfer level (cf. [PTH+15]). The characteristics for the opposite case, i.e., the input to the inverter being “1”, can be derived similarly. With this input, the nMOS transistor is conducting.
Then, the drain of the blocking pMOS transistor becomes sensitive to photons of the laser beam: the resulting current flows through the conducting nMOS to ground and into the output capacitance. Again, if this current exceeds the current draining to ground, the capacitance is charged, changing the output from “0” to “1”. As before, when stopping the laser stimulation, the capacitance is discharged through the nMOS transistor, and the circuit regains its original state. In conclusion, the p-n junction at the drain of blocking transistors is susceptible to laser excitation. The laser beam creates a current source within the semiconductor that might charge or discharge the output capacitance of a gate.
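The requirement that a transient must be present at a clock event can be illustrated with a simplified timing check. This sketch is an assumption-laden model, not from the thesis: the transient is taken as an interval at the flip-flop input (propagation delays ignored), it is latched only if it fully covers the capture window from setup time before to hold time after a clock edge, metastability for partial overlaps is not modelled, and the setup and hold values are hypothetical.

```python
def first_latching_edge(pulse_start_ns, pulse_end_ns, clock_period_ns,
                        setup_ns=0.1, hold_ns=0.05, max_edges=10_000):
    """Return the index k of the first clock edge at k * clock_period_ns whose
    capture window [edge - setup, edge + hold] is fully covered by the
    transient at the flip-flop input, or None if the transient vanishes.
    """
    for k in range(1, max_edges + 1):
        edge = k * clock_period_ns
        if edge - setup_ns > pulse_end_ns:
            break  # all later edges come after the transient has ended
        if pulse_start_ns <= edge - setup_ns and pulse_end_ns >= edge + hold_ns:
            return k
    return None


# 10 ns clock: a transient between two edges is lost, one spanning an edge is stored
print(first_latching_edge(12.0, 18.0, 10.0), first_latching_edge(18.0, 21.0, 10.0))
```

The sketch makes explicit why the timing of the laser pulse within the clock cycle is a fault-injection parameter in its own right.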

Figure 7.2: Cross section of an inverter implemented in CMOS technology when hit by a laser beam (based on [CDR+16]). The laser beam (light-red) creates electron-hole pairs at a p-n junction. The resulting current (red) discharges the output capacitance, causing a faulty value to appear at the output.

7.2.3 Single Event Upset

The physical effect of the laser beam is only transient, and the faulty value must be stored in a register to be effective. When targeting storage elements like flip-flops or SRAM directly, the fault might be semi-permanent. For such a Single Event Upset (SEU), the faulty value is stored directly, independently of precise timing. Of course, the memory cell might be overwritten later, as the fault is not permanent. In the following, we explain SEUs using SRAM cells as an example. The storage element in SRAM is usually implemented using cross-coupled inverters, i.e., the output of one is connected to the input of the other and vice versa. Thus, their behavior can be directly derived from the description of a single inverter. Figure 7.3 depicts the schematic and the layout of an SRAM cell using six transistors. The inner four transistors M1 to M4 are used in pairs to build the inverters. M5 and M6 are used as pass transistors to access this exact cell in a large grid of SRAM through the wordline (WL) and the bitlines (BL and B̄L̄). As seen for the single inverter above, the drains of the blocking transistors are sensitive to laser fault injection, indicated by the colored circles in Figure 7.3. Depending on the stored value, the drain of either pass transistor M5 or M6 is sensitive as well. For example, if Q = 1, M4 is conducting, and M3 and M6 are blocking. Aiming the laser at the drain of M3 or M6 will thus pull the node Q to “0” and Q̄ to “1” through the inverter consisting of M2 and M1. The same outcome can be observed when directly targeting M2: Q̄ will be charged to “1”, and Q is pulled down through the inverter consisting of M3 and M4. If the cell stores a “0” instead of a “1”, the sensitive areas are switched likewise. Figure 7.4(a) depicts the SRAM of an Atmel ATXmega16A4U microcontroller using Scanning Electron Microscopy (SEM) as imaging technique.
The characteristic layout resembles the layout sketched in Figure 7.3(b). The metal layers were removed to expose the polysilicon layer. Thus, the white dots are remainders of the vias to the metal layer. The red box marks the area of two cross-coupled inverters: the actual transistors are implemented between adjacent vias on the top and on the bottom. The result of an exemplary laser fault injection, repeated multiple times per location, is depicted in Figure 7.4(b). Light-blue dots correspond to bit-set faults, yellow dots to bit-reset faults, and brown dots to inconsistent behavior.


Figure 7.3 panels: (a) Schematic, (b) Layout (based on [IKT+98])

Figure 7.3: Schematic and layout of an SRAM cell with six transistors. The zones sensitive to laser fault injection are marked in red for the state Q = 0 and in blue for the state Q = 1. The layout is not to scale for better visibility.

7.2.4 Wavelength

The energy of photons depends on the wavelength of the used light source, i.e., the energy rises for shorter wavelengths. Thus, it might seem desirable to use short wavelengths such as 532 nm in [RDT13] to minimize the requirements on the laser source. However, such wavelengths are only applicable for attacks from the frontside, as discussed in the following. First, the continuously increasing number of transistors available on ICs requires an increasing number of metal layers to connect the gates. In these layers, void areas are usually filled up with isolated metal structures. These structures ensure that the surface of each layer is planar during manufacturing, as otherwise unwanted tilt could accumulate over multiple processing steps. In addition, high-security ICs usually deploy some form of shield as the final layer on top to counter probing attacks. All these metal structures block the light from reaching the active layer, either completely or to a large extent, so that precise fault injection is impossible. This leads to attacks from the backside of the IC through the substrate. However, silicon blocks light in the visible spectrum and only becomes transparent for longer wavelengths (cf. Fig. 7.5). At 1200 nm, silicon starts to become transparent, and because of the reduced absorption, the photoelectric effect starts to vanish. Instead, the thermal effect dominates, i.e., metal tracks are heated up locally, resulting in a change of their electrical resistance. Finding the optimal wavelength thus involves a trade-off between photon energy and penetration depth, cf. Figure 7.5. In practice, laser sources are used that provide a wavelength in the Near Infrared (NIR) spectrum, i.e., around 1000 nm.
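The trade-off can be made concrete with E = hc/λ ≈ 1239.84 eV·nm / λ. The sketch compares the photon energy at 532 nm, 1064 nm and 1200 nm with the silicon band gap of about 1.12 eV at room temperature; the band-gap value is a textbook constant, not taken from the thesis. At 1064 nm the photon energy lies only slightly above the band gap, which is consistent with the weak absorption that lets NIR light pass through the thinned substrate, while at 1200 nm it falls below the band gap, so single-photon band-to-band generation of electron-hole pairs is no longer possible.

```python
HC_EV_NM = 1239.84     # Planck constant times speed of light, in eV·nm
SI_BANDGAP_EV = 1.12   # band gap of silicon at about 300 K


def photon_energy_ev(wavelength_nm):
    """Photon energy E = h*c / wavelength."""
    return HC_EV_NM / wavelength_nm


for wl in (532, 1064, 1200):
    e = photon_energy_ev(wl)
    print(f"{wl} nm: {e:.2f} eV, exceeds Si band gap: {e > SI_BANDGAP_EV}")
```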

Figure 7.4: SRAM of an Atmel ATXmega16A4U microcontroller: (a) Scanning Electron Microscopy (SEM) image of the die with the metal layers removed, exposing the polysilicon layer, and (b) fault occurrence map (yellow: reset, light-blue: set, brown: both). The red rectangle marks a single cell in both images.

7.2.5 Spot Size

To create a fault, the laser must create enough electron-hole pairs close to the targeted p-n junction (cf. Sect. 7.2.2). Thus, assuming that the energy density is high enough, the spot size mainly determines the affected area. The diffraction limit caused by the wave nature of light physically bounds the minimal size of the laser spot¹. Abbe and Rayleigh provided definitions of how to calculate this limit (cf. [Abb73, RF79]). The resolution d describes the minimal distance at which two structures (Airy disks) can still be separated. For a microscope in air, Abbe's formulation bounds it by d = λ/(2NA), where λ is the wavelength and NA is the numerical aperture of the used objective. For example, for a typical wavelength of λ = 1064 nm and an NA of 0.7 (e.g., Mitutoyo M Plan Apo NIR HR 100×), the resolution is bounded to around d = 760 nm. However, the effective spot size reaching the p-n junction is usually larger because of optical aberrations. Because of this physical limit, there is ongoing research on down to which technology node single-bit faults can be achieved, cf. [ADM+10, MDR+10, RSDT13, CLFT14a, CLFT14b, SBHS15, DBC+18]. Of course, the optical setups vary a lot throughout these publications, and the results might not be strictly comparable. Nevertheless, [SBHS15] compared fault injection on SRAM manufactured in 90 nm and 45 nm. As reported, single-bit faults were still possible for SRAM at the 90 nm technology node. However, when targeting SRAM in 45 nm technology, the success depended on the state of neighboring cells, i.e., single-bit faults could not be created reliably. Instead of SRAM, [CLFT14a, DBC+18] targeted registers, as their implementation requires more transistors and likewise more area. Indeed, [DBC+18] reported reliable single-bit faults in registers manufactured in a 28 nm process.

¹ Note that advanced optical methods, e.g., two-photon absorption or solid immersion lenses, can still push beyond this limit.

Figure 7.5: Photon energy and absorption depth (dashed, logarithmic scale) in silicon at 300 K over the wavelength (raw data from [GK95]) for light in the spectrum of 250 nm to 1450 nm.

7.3 Sample Preparation

Because of the absorption of light in silicon, it is advisable to thin the backside of the DUT as much as possible (cf. Sect. 7.2.4). Throughout our experiments, we found a remaining silicon thickness of 20 µm to 40 µm to be suitable. While dedicated machines are commercially available [ULT07], we opted for a low-cost approach and built one ourselves. We would like to thank Enrico Dietz of TU Berlin for valuable feedback on the construction and operation of such a device. In the following, we briefly summarize the involved requirements and parameters:

• Milling and polishing in a hole: The preferred solution would be to thin the entire package. However, when looking at the cross-section of virtually every variant of epoxy package for ICs, the pins to the outside sit at the same height as the carrier holding the silicon die. Thus, when sanding down the whole backside of the package, these pins get thinner as well. Ultimately, the pins are removed entirely, and the die is then no longer bonded to the outside. As dismantling and re-bonding the chip in a new package is also rather difficult, the only option is to mill a hole through the backside. This is certainly more complicated than placing the package, e.g., on top of a spinning disk with sandpaper.

• Alignment of the silicon die to the tool: We aim for a remaining silicon thickness of 20 µm to 40 µm over the whole surface of the die. However, the silicon die of the ATXmega16 that we use for our experiments later has a size of 4 mm × 4 mm. Thus, if the die is tilted by more than 1% relative to the plane of the milling machine, we will cut into the transistors at one corner while not having reached the desired depth at the opposite corner.

• Surface quality: Another property relates to the smoothness of the surface, especially with spot sizes reaching the diffraction limit. The surface should be smooth within this range, because otherwise the resulting spot in the active layer will be malformed. Since silicon and air have different refractive indices, the optimal distance of the objective to the silicon surface also depends on the thickness of the silicon underneath. Because laser fault injection usually involves some scanning over a large area, the surface should be even, as otherwise the focal plane needs to be adjusted all the time. Note that, depending on the NA of the objective, its depth of focus might only be a few micrometers, with an immense and sudden decrease of power density when out of focus.

• Fragile work piece: Finally, the mechanical properties of silicon are similar to those of glass, i.e., hard but brittle. Milling the silicon is only possible in small increments in the axial direction, i.e., in the range of a few micrometers. Otherwise, the silicon die might break due to mechanical stress. Further, the material added to the wafer to form the transistors and the metal layers on top is in the range of 10 µm in height. After milling and polishing, the remaining total thickness is only 30 µm to 50 µm, and the die is susceptible to mechanical stress.
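The alignment requirement follows from a small-angle estimate: a slope of s percent across a length L raises one end by L × s/100. For the 4 mm die edge, a 1% tilt already amounts to 40 µm, i.e., the full upper end of the 20 µm to 40 µm target range. The sketch below evaluates this estimate; the diagonal case is an added worst-case consideration and the function name is hypothetical.

```python
import math


def height_difference_um(length_mm, slope_percent):
    """Height difference across a length for a small tilt given in percent."""
    return length_mm * 1000.0 * slope_percent / 100.0


edge_mm = 4.0                        # ATXmega16 die edge length
diag_mm = edge_mm * math.sqrt(2)     # worst case: tilt along the diagonal
for slope in (0.5, 1.0):
    print(f"{slope}% tilt: {height_difference_um(edge_mm, slope):.1f} µm along the edge, "
          f"{height_difference_um(diag_mm, slope):.1f} µm along the diagonal")
```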

To stabilize our setup for laser fault injection, we drastically lowered the center of mass by installing a much lower actuator for the z-axis. The now-surplus z-axis is still suitable for sample preparation, with a minimum resolution of 1 µm. We combined this axis with a hobbyist-grade CNC machine for milling in x- and y-direction. Unfortunately, the CNC machine was intended for a large working area, and the resulting large construction caused some instability. Since the z-axis worked reasonably well, we replaced the x/y axes with much smaller but more precise linear actuators with a 50 mm range, resulting in a more compact and stable setup. Depending on the package, various additional preparation steps might be required. First, the backside of the silicon die needs to be exposed. For epoxy packages, we directly started with a diamond drill head to remove the epoxy encasing and the carrier. We also experimented with using nitric acid to expose the carrier, with good results. The goal was to electrically probe the distance to the tool and adjust for any tilt of the carrier. Of course, if the carrier is thin enough, it can be removed by hand as well. After cutting any remaining connection of the carrier to the lead frame, it can be twisted off the silicon die, exposing the attaching glue. An ASIC produced at EMSEC through Europractice [Eur07] came in a ceramic package with an open lid. Removing the carrier from the backside would mean that only the bonding wires hold the silicon die in place. Thus, we applied an epoxy-based two-component adhesive from the top. The volume of the epoxy stayed constant during curing so that it would not shear off the bonding wires. We used a diamond head to cut through the ceramic case and the metal carrier to expose the silicon die from the backside. For any package type, we then remove the bulk silicon from the backside in steps of 1 µm to 5 µm, with smaller steps when approaching the target depth.
At each layer, we first mill in a zigzag pattern, followed by the same pattern rotated by 90°. The milling steps are lubricated and cooled using oil. Finally, we used diamond polishing compound with a relatively sharp felt tip to polish the surface, starting with a coarse grain and going down to a grain size of 500 nm.
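The layer-wise procedure above can be sketched as a simple toolpath generator, assuming a rectangular milling area and hypothetical function names; the actual controller and its command format are not described here. The coarse and fine increments default to the 5 µm and 1 µm mentioned above, while the 10 µm fine zone before the target depth is an assumed value. Swapping the axes yields the perpendicular pass; strictly, this mirrors the pattern along the diagonal, which covers the same area as the 90° rotation.

```python
def zigzag(width_um, height_um, stepover_um):
    """Waypoints of a zigzag (boustrophedon) pass: rows along x, stepping in y."""
    n_rows = int(height_um // stepover_um) + 1
    path = []
    for row in range(n_rows):
        y = row * stepover_um
        x_from, x_to = (0.0, width_um) if row % 2 == 0 else (width_um, 0.0)
        path += [(x_from, y), (x_to, y)]
    return path


def perpendicular_pass(width_um, height_um, stepover_um):
    """Same zigzag with rows along y instead of x (axes swapped)."""
    return [(x, y) for (y, x) in zigzag(height_um, width_um, stepover_um)]


def depth_steps(total_um, coarse_um=5.0, fine_um=1.0, fine_zone_um=10.0):
    """Axial increments: coarse steps first, fine steps within the last fine_zone_um."""
    steps, removed = [], 0.0
    while removed < total_um - 1e-9:
        remaining = total_um - removed
        step = min(fine_um if remaining <= fine_zone_um else coarse_um, remaining)
        steps.append(step)
        removed += step
    return steps


# Per layer: one zigzag pass, then the perpendicular pass, then lower the z-axis
print(len(depth_steps(100.0)), zigzag(1000.0, 1000.0, 500.0)[:4])
```

If the area height is not a multiple of the step-over, the last row stops short of the edge; a real toolpath would add a final row at the boundary.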

Chapter 8 Locating Points-of-Interest for Laser Fault Injection using OBIC Measurements

Laser Fault Injection (LFI) is one of the most powerful methods of inducing a fault, as it allows targeting only specific areas down to single transistors. The downside compared to non-invasive methods like introducing clock glitches is the largely increased search space. An exhaustive search through all parameters, including the dimensions for correct timing, pulse intensity, or pulse length, might not be feasible. Existing solutions to this problem are either not directly applicable to the fault location or require additional device preparation and access to expensive equipment. Our method measures the Optical Beam Induced Current (OBIC) as an imaging technique to find target areas like flip-flops, thus reducing the search space drastically. This measurement is possible with existing laser scanning microscopes or well-equipped LFI setups. We provide experimental results targeting the AES hardware accelerator of an Atmel ATXMega microcontroller.

The work described in the following chapter was published at FDTC 2015 [SFR+15].

Contents of this Chapter

8.1 Introduction
8.2 Optical Beam Induced Current
8.3 Experimental Setup
8.4 Results
8.5 Discussion
8.6 Conclusion

8.1 Introduction

Physical methods of fault injection have many possible parameters, e.g., the physical intensity and duration, the correct clock cycle, or the correct timing within a clock cycle. Only if all of them are set correctly might a fault appear that is useful to an attacker. Even then, the fault injection might not work reliably, e.g., due to clock jitter of the target or jitter within the tool used to inject the faults. For some parameters, suitable ranges might be known from previous experiments. For others, proper values might have to be found for every new target. The high precision offered by Laser Fault Injection (LFI) comes, in turn, with many possible locations to be tested. Starting with a coarse grid followed by a detailed scan might not always be an option, especially if sensors are deployed as a countermeasure. As discussed in Section 7.2, the wavelength, the spot size, and the focal plane influence the success as well. An exhaustive search through the whole parameter space is infeasible. Instead, we propose a method for finding points-of-interest, i.e., locations on the die to be shot at, mostly independently of the remaining parameters.

A single-bit fault is the most precise fault that can be achieved, and many mathematical methods of fault analysis model the effect accordingly. Such faults can be realized when targeting flip-flops with the laser beam. Targeting only the locations of the flip-flops can save a tremendous amount of time compared to a scan over the entire area. [CLFT14b] shows that the locations of flip-flops can be found by removing the covering metal layers and taking an image with a Scanning Electron Microscope (SEM). In contrast to using an SEM as imaging method, we propose measuring the Optical Beam Induced Current (OBIC) to create an image. The measurement can be realized with any setup for laser fault injection with at most minor modifications and thus does not rely on expensive equipment. We provide experimental results identifying the registers of an AES hardware accelerator and perform a differential fault attack to extract the secret key.

8.1.1 Related Work

Depending on the desired method of fault injection, an exhaustive search through all the parameters can be impractical. Consequently, many recent publications deal with methods to reduce the search space of the parameters of fault injection campaigns. The approaches [CPB+13, PBJC14, PBBJ15, MSPB18] employ machine learning algorithms to find optimal parameters for voltage level and glitch length when considering power glitches. The prerequisite for such methods is often that the parameters are (steadily) continuous and bounded, e.g., depending on the voltage level of the power glitch, the device is either unaffected (lower bound) or stops responding entirely (upper bound). It is reasonable to assume that the optimal value lies somewhere between those bounds.

Courbon et al. present in [CLFT14b] a way to identify areas-of-interest for LFI: as an initial profiling step, the device is thinned down to the dopant layer from the frontside. Then, a high-resolution image is taken using an SEM; after a single flip-flop is found, the characteristic pattern of its dopant layer is used to find all other instances of this flip-flop on the die. The found locations can then be used as targets for LFI, drastically reducing the number of required shots compared to an exhaustive search. The downside is the requirement of an expensive SEM and an additional sample for profiling, which is destroyed afterwards.

In [Kiz09], Kizhvatov showed that the AES hardware acceleration used on an ATXMega128A1 is vulnerable to side-channel analysis measuring the power consumption and using CPA as distinguisher. The attack requires fewer than 3000 traces. Our DUT, an ATXMega16A4U, presumably implements an identical AES core and is vulnerable as well. Nevertheless, we use this device to demonstrate the concept of our attack, since it provides an easily accessible and well-documented hardware implementation of AES.

Measuring the OBIC, a well-known technique in semiconductor failure analysis (cf. [BLB+02, Col11]), is only rarely mentioned within a security context: the authors of [vWWM11] present an OBIC image (with a very low resolution) for backside navigation, but provide no further discussion or interpretation. Sergei Skorobogatov describes in [Sko05] that it is possible to read out masked ROM by measuring the OBIC of a nowadays rather outdated Motorola (now Freescale) MC68HC705P6A. In [Sko06], this technique is used from the frontside of a Microchip PIC16F84 (0.9 µm technology node) to identify sensitive areas in SRAM cells and to deduce the transistor layout. OBIC is also used in [Sko10] to locate the sensitive areas of the Flash control logic of an NEC PD78F9116 microcontroller (0.35 µm technology node). The found areas were then targeted with LFI to extract the firmware by “bumping” during the verify-operation. The same attack was carried out against an Actel ProASIC3 A3P250 FPGA as well (0.13 µm technology node). For the FPGA, however, the author used an exhaustive search over the whole chip area with a larger laser spot in a 20 µm grid to find areas sensitive to bumping attacks during the verify-operation. Note that even though the vulnerable areas were found quite efficiently, the latter approach still requires correct timing and could trigger potential reactive countermeasures.

8.1.2 Our Contribution

We present a method to reduce the location-dependent search space for laser fault injection, i.e., to reduce the number of points of interest to shoot at. Our method measures the OBIC to create a high-resolution image from the backside of the chip. We show that it is possible to identify flip-flops similar to the method in [CLFT14b]. In contrast, our solution does not require an SEM and can be implemented using an existing laser scanning microscope. The main benefits of our proposed method are:

• Finding the correct z-position and the points of interest (x/y) for LFI independently of other parameters like laser energy or pulse width.

• The device is not powered. Consequently, there is no clock signal, and potential areas can be found independently of the correct timing. Further, potential reactive countermeasures against (laser) fault injection are shut off as well.

• Minimal equipment overhead: To measure the OBIC, only a shunt resistor and a low-bandwidth ADC are required. A more sophisticated but still low-cost setup can use a transimpedance amplifier.

• No additional preparation step is required: For LFI, backside thinning is usually performed anyway to reduce absorption of the laser beam in silicon.

• Measuring the OBIC provides a considerably better resolution even when compared to a laser scanning reflective image (cf. [WM87, SW90]). Further, no additional (expensive) imaging equipment is required.

As a proof-of-concept implementation, we target the AES co-processor of an Atmel ATXMega microcontroller. In our example, we identify all flip-flops as points-of-interest. Subsequently, we show that we can extract the secret AES key by classical DFA when targeting those locations with LFI.


8.2 Optical Beam Induced Current

Induced faults in semiconductors were originally studied to understand the effects caused by cosmic radiation at high altitudes for aviation and space technologies. To simulate such effects, which are usually caused by high-energy particles, laser fault injection was used. When a semiconductor is irradiated by light, electron-hole pairs are generated along the beam path by the inner photoelectric effect. These pairs would normally recombine immediately, but if they are separated by an electric field, a photocurrent can be measured. In an IC, an electric field occurs at the junctions of differently doped areas. Roscian et al. model the induced current I_laser in [RSDT13] with Equations 8.1 to 8.4.

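$$I_{\mathrm{laser}} = (a \cdot V + b) \cdot \Omega_{\mathrm{laser}} \cdot S \tag{8.1}$$

$$a = p \cdot P_{\mathrm{laser}}^{2} + q \cdot P_{\mathrm{laser}} \tag{8.2}$$

$$b = s \cdot P_{\mathrm{laser}} \tag{8.3}$$

$$\Omega_{\mathrm{laser}} = \beta \cdot e^{-d^{2}/c_{1}} + \gamma \cdot e^{-d^{2}/c_{2}} \tag{8.4}$$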

Ω_laser is of special interest for OBIC since it describes the spatial dependency of the induced current. It reflects the Gaussian distribution of the laser spot and shows an exponential dependency on the distance d to the junction. Also, the surface S of the sensitive zone is an influencing factor. Since we measure the OBIC with the device shut off and without voltage bias, we have V = 0; the term a · V vanishes, and the observed current does not depend on a (Equation 8.2). β, γ, c_1, and c_2 are fitting parameters which depend on the optical setup and the lens used. b is a function of the laser power P_laser with the fitting parameter s.

While for OBIC we measure the induced current for imaging purposes, in a running device this current can change the state of the logic. When the laser spot hits the drain or gate of a transistor realized in CMOS, it can charge or discharge the circuit node it is connected to. This effect can be used to alter calculations by targeting flip-flops and altering their stored state. It is also possible to target the logic; since the logic will most likely recover its original state when the laser pulse ends, the pulse must then be held until the next clock edge so that the altered logic state gets stored.

Note that besides OBIC, there is a multitude of (optical-beam-based) imaging techniques known in failure analysis [Col11]. We specifically chose OBIC measurements because they are easy to integrate into a laser fault injection setup: a shunt resistor or a transimpedance amplifier and a device to measure the voltage are fully sufficient. In contrast, as described in [Col11], Light-Induced Voltage Alteration (LIVA) requires a constant current supply with nanoampere precision and range. Optical Beam Induced Resistance Change (OBIRCH), Thermally-Induced Voltage Alteration (TIVA), and Seebeck Effect Imaging (SEI) use localized heating caused by a laser with a wavelength above 1100 nm. Such a wavelength is usually not found in LFI setups (cf. [Ris17b, vWWM11]), since there the photoelectric effect is desired instead of the thermoelectric one.
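To make the role of each term in Equations 8.1 to 8.4 concrete, the following sketch evaluates the model directly. All numerical parameter values are hypothetical placeholders in arbitrary units (the fitting parameters depend on the optical setup and are not given here); the sketch only illustrates that with V = 0 the coefficient a drops out, so the OBIC follows b · Ω_laser · S and decays with the distance d to the junction.

```python
import math


def omega_laser(d, beta, gamma, c1, c2):
    """Spatial term (Eq. 8.4): two Gaussian-shaped terms in the distance d to the junction."""
    return beta * math.exp(-d**2 / c1) + gamma * math.exp(-d**2 / c2)


def i_laser(V, P, d, S, p, q, s, beta, gamma, c1, c2):
    """Induced current (Eq. 8.1) with a and b from Eqs. 8.2 and 8.3."""
    a = p * P**2 + q * P   # Eq. 8.2
    b = s * P              # Eq. 8.3
    return (a * V + b) * omega_laser(d, beta, gamma, c1, c2) * S


# Hypothetical fitting parameters (arbitrary units), only for illustration.
params = dict(beta=1.0, gamma=0.5, c1=1.0, c2=4.0)

# Unpowered, unbiased device (V = 0): p and q have no influence on the OBIC.
obic_a = i_laser(V=0.0, P=1.0, d=0.5, S=1.0, p=3.0, q=7.0, s=2.0, **params)
obic_b = i_laser(V=0.0, P=1.0, d=0.5, S=1.0, p=0.0, q=0.0, s=2.0, **params)
print(obic_a == obic_b)

# The OBIC decreases as the spot moves away from the junction.
profile = [i_laser(0.0, 1.0, d, 1.0, 0.0, 0.0, 2.0, **params) for d in (0.0, 0.5, 1.0, 2.0)]
print(profile)
```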


8.3 Experimental Setup

In the following, we describe the DUT, the microscopes and laser sources used, and the measurement setup.

8.3.1 Device under Test (DUT)

All our experiments were conducted using an Atmel ATXMega16A4U in a TQFP44 package [Atm14]. The technology node of the DUT is approximately 250 nm (based on the approximate minimum feature size in the SRAM, measured using an SEM solely for this purpose). The DUT includes 2 KB SRAM, 1 KB EEPROM, 20 KB of flash, up to 32 MHz clock speed and, most prominently from a security perspective, DES and AES acceleration in hardware. The DES is attached to the CPU by a special instruction executing a single round of DES and its key schedule using the regular CPU registers. In contrast, the AES engine is attached to the data bus and provides the STATE, KEY, CTRL, and STATUS registers to control its function. It takes 375 clock cycles to complete one encryption. The way the AES core is attached to the CPU makes it a favorable target for a controlled experiment. For fault attacks on DES, one might not be able to distinguish faults in the actual DES hardware from general faults in the CPU, e.g., disrupting the instruction fetch. In contrast, when the AES is running on its own and the CPU is simply executing NOP operations, one can be certain that a fault in the result, i.e., the ciphertext, was induced in the AES core.

The DUT was placed on a custom circuit board especially designed for laser fault injection and side-channel measurements: to recover from possible latch-ups caused by LFI, the power supply can be detached automatically via USB. To prevent the DUT from powering itself through its IO ports, the programming pins and a serial interface are connected through an electrical isolation IC. To reduce absorption of the laser beam in silicon, the package was opened from the backside, and the exposed silicon was thinned down close to the active layer and polished. The DUT used in our experiments has a silicon substrate thickness of around 20 µm (i.e., the distance from the backside surface to the doped layer). This is a typical approach for laser fault injection.

8.3.2 Optical Setup

Two different optical setups were used throughout the experiments: Setup 1 for creating the OBIC image and Setup 2 for fault injection.¹ The main difference is that Setup 1 is equipped with a continuous scanning stage with position output and thus was used for faster imaging. A common coordinate system for both setups was established by measuring different reference points on the DUT using OBIC measurements. Both setups are located in an air-conditioned class 4 laser laboratory on a vibration-isolated optical table.

Setup 1 (Laser Scanning Microscope)

Setup 1 is a modified, self-built laser scanning confocal microscope [KFS+14, Web96], which is used for OBIC and reflective-mode imaging with high resolution and precision. A temperature-stabilized, fiber-coupled laser diode module from Lumics at 1064 nm was used as light source. The Single Mode Fiber (SMF) output was collimated by a large-beam reflective collimator and propagated through a non-polarizing beam splitter (BS). The light was focused onto the DUT through a custom-made Leica objective (NA 0.75, magnification 100x), optimized for use in the NIR range. The diffraction-limited spot size for the given combination of objective and laser was calculated to be 1.7 µm. The DUT was scanned by three motorized stages (x, y: LTA-HS with M-462; z: LTA-HL with M-MVN80) from Newport Corp. with a measured precision better than 100 nm. For reflection-mode imaging, an additional achromatic lens (f = 35 mm) was used to focus the backscattered light from the sample through a custom 500 µm aperture onto a Si photodiode. The OBIC was amplified and converted to a voltage using a FEMTO DLPCA-200 variable-gain transimpedance amplifier and a Stanford Research SR560 low-noise amplifier. The low-pass filter of the SR560 was set to 10 kHz to match the maximum ADC bandwidth of the Newport XPS motion controller unit. A PC running MATLAB communicated with the XPS controller over Ethernet for operation, data collection, and creation of the point cloud. Figure 8.1 shows the electrical part of the setup, and Figure 8.2 shows the optical part.

¹ Note that this separation had purely organizational reasons during our experiments, as the main difference, the scanning stage, can easily be implemented in both setups.

Figure 8.1: A block diagram of the setup

Figure 8.2: Laser scanning microscope, the green part is for reflective mode only


Setup 2 (Modified Laser-Fault-Injection Microscope from Opto GmbH)

Setup 2 is a commercially available microscope especially designed for dual-spot laser fault injection. It was modified to reach the required mechanical precision and stability by replacing the z-stage with a precision z-stage (M-501.1DG) from Physik Instrumente (PI). In addition, a new sample holder was designed and integrated. A 976 nm SMF-coupled on-demand diode laser module from ALPhANOV [ALP16] was used as light source, focused through a Mitutoyo Plan Apo NIR HR objective (NA 0.65, magnification 50x). We calculated the diffraction-limited spot size to be 1.83 µm.
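The text does not state which formula the two spot sizes were calculated with. Assuming the common definition of the diffraction-limited spot as the diameter of the central Airy disk, 1.22 · λ / NA, the following check reproduces both stated values (1.7 µm for Setup 1 and 1.83 µm for Setup 2); the function name is hypothetical.

```python
def airy_spot_diameter_um(wavelength_nm: float, na: float) -> float:
    """Diameter of the central Airy disk (up to the first dark ring): 1.22 * lambda / NA."""
    return 1.22 * wavelength_nm / na / 1000.0  # nm -> µm


setup1 = airy_spot_diameter_um(1064, 0.75)  # Lumics 1064 nm, Leica objective NA 0.75
setup2 = airy_spot_diameter_um(976, 0.65)   # ALPhANOV 976 nm, Mitutoyo objective NA 0.65
print(f"Setup 1: {setup1:.2f} um, Setup 2: {setup2:.2f} um")
```

This prints `Setup 1: 1.73 um, Setup 2: 1.83 um`, i.e., the longer wavelength of Setup 1 is roughly compensated by its higher numerical aperture.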

8.4 Results

In the following, we describe a fault injection campaign on the hardware AES implementation of an ATXmega16A4U using our proposed method of reducing the search space. We start by using the OBIC measurements to identify flip-flops as candidates. Then, we target the found positions and show that it is possible to extract the secret AES key using LFI. Although not within the scope of this work, we utilized side-channel measurements twice during our experiments: first, to get a rough estimate of where the AES might be located based on its EM emanation, and second, to pinpoint a time span for fault injection based on CPA with a known key. If the attacker has no such control over the target device, these measurements can be skipped.

Note that, apart from rare cases and analog designs, all chips nowadays are based on the standard cell design principle. Here, the chip manufacturer has a library of layouts for many basic gates, e.g., NAND, and more complex cells like flip-flops. These cells are individually optimized; copies of them are instantiated on the silicon die wherever needed and are connected on the upper layers. Consequently, at least on the lower layers and especially the dopant/polysilicon layer, all instances of a cell look identical. Of course, one must account for mirrored versions in both directions: (1) in a standard layout, the lines for VDD and VSS alternate, so the cells must be mirrored, and (2) mirroring within the CMOS lanes is possible as well without changing the cell's functionality.

8.4.1 Estimating the Rough Location of AES

To initially reduce the search space, we measured the localized EM emanation of the DUT to get a rough location of the AES core. Note that this approach might not work for every target, as it heavily depends on the characteristic emanation and implementation. We used an open-source probe available at [OR14]. During the computation of the AES core, a trigger signal was pulled high, and the probe was moved across the surface manually. While the core was running, the CPU simply executed NOP operations. Figure 8.3 depicts the captured emanation at one side of the DUT, showing a strong signal not found elsewhere. A rectangle within this area was used for measuring the OBIC.

8.4.2 OBIC Measurements

In the following, we describe how OBIC measurements can be used to determine the correct z-position and to find candidates (x/y) for laser fault injection. Note that this profiling step must be performed only once per target. The chip is not powered, and thus these steps are independent of other parameters like correct timing, pulse length, or (to some extent) pulse energy. Further, no reactive countermeasure against (laser) fault injection is active.

Figure 8.3: Captured local EM signal (blue) and trigger pulled high during AES computation (green).

Establishing the correct z-position

The first step is to find the correct z-position for the subsequent OBIC measurements in the x/y-plane. We sweep through the z-axis while holding the laser energy constant. The measured current caused by the laser is maximal exactly when the photon density on the sensitive area is maximal, i.e., when the focus of the laser spot lies directly at the pn-junction. Consequently, the diameter of the laser spot is minimal as well, and this z-value can be used for imaging, providing the optimal resolution. Figure 8.4 depicts the measured OBIC signal while sweeping through the z-axis. The found position can be used for fault injection as well. Repeating this process at multiple x/y-positions, e.g., at each border, further enables a very precise measurement of the angle of the DUT relative to the objective. If the chip is slightly tilted, the z-position can then be adjusted automatically depending on the current x/y-position when targeting a very large area, both for OBIC and for fault injection.
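The focus search and tilt compensation described above can be sketched as follows. This is a minimal NumPy illustration, assuming that the OBIC has already been sampled along z at a few x/y-positions; the plane model z = a·x + b·y + c for the tilt, the function names, and the synthetic data are assumptions for the example, not the MATLAB code used in our experiments.

```python
import numpy as np


def best_focus(z_um, obic):
    """z-position where the measured OBIC is maximal (focus on the pn-junction)."""
    return float(np.asarray(z_um)[int(np.argmax(obic))])


def fit_tilt_plane(xy_um, z_focus_um):
    """Least-squares plane z = a*x + b*y + c through focus positions at several x/y points."""
    xy = np.asarray(xy_um, dtype=float)
    A = np.column_stack([xy[:, 0], xy[:, 1], np.ones(len(xy))])
    (a, b, c), *_ = np.linalg.lstsq(A, np.asarray(z_focus_um, dtype=float), rcond=None)
    return float(a), float(b), float(c)


def focus_at(x_um, y_um, plane):
    """Tilt-corrected z-position for an arbitrary x/y-position."""
    a, b, c = plane
    return a * x_um + b * y_um + c


if __name__ == "__main__":
    # Synthetic z-sweep: the OBIC peaks where the focus lies at the junction (here 23.4 µm).
    z = np.arange(0.0, 50.0, 0.1)
    obic = np.exp(-((z - 23.4) ** 2) / 4.0)
    print("focus:", best_focus(z, obic))

    # Synthetic focus positions at the four corners of a slightly tilted die.
    corners = [(0, 0), (200, 0), (0, 150), (200, 150)]
    z_focus = [0.01 * x - 0.02 * y + 20.0 for x, y in corners]
    plane = fit_tilt_plane(corners, z_focus)
    print("z at (100, 75):", focus_at(100, 75, plane))
```

On real, noisy sweeps one would likely smooth the OBIC curve or fit a peak model before taking the maximum; the plain `argmax` keeps the sketch short.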


Figure 8.4: OBIC over the z-axis, 100x/NA=0.75 1064nm.

Measuring in x- and y-direction

Once the correct z-position is found, the OBIC can be measured in x/y-direction, providing a detailed image of the active area of the DUT. Figure 8.5 depicts an OBIC measurement over an area of around 225 µm × 150 µm with a step size of 100 nm, taken in approximately 45 min. The location was roughly estimated using EM measurements as described above. One can clearly identify the 18 vertical lanes implementing CMOS logic and the supply lines in between. In more detail, Figure 8.6(a) spans two full “CMOS” lanes, each with a width of around 12 µm. The horizontal bars visible in the image directly correspond to the drain and source regions of the individual transistors. The electrons forming the channel of n-type MOSFETs have a higher mobility than the holes in p-type MOSFETs. For a symmetric switching voltage, the widths are therefore usually chosen such that the p-type transistor is 2.5 times as wide as the n-type MOSFET [KL96]. Thus, given that PMOS transistors usually have larger channel widths than NMOS transistors in CMOS designs, one can assume that the inner supply line corresponds to VDD and the two lines at the outside are connected to VSS.

Most importantly, one can easily identify repeated cell instances at the provided resolution. For example, in Figure 8.6(a), four large identical standard cells are visible, two of which are mirrored vertically. A quick manual inspection revealed that the marked pattern is repeated multiple times within the area, and we assumed that the pattern corresponds to a flip-flop. Flip-flops are usually among the largest cells in a standard cell library. We argue that even without exact knowledge of the layout of a flip-flop, we can still identify large repeated cells. Even if we obtain multiple candidate layouts, the slight overhead caused by some false positives is acceptable. Still, the time benefit will be very large compared to an exhaustive search over the whole area.


To verify whether the pattern in our case implements a flip-flop, we performed a fault injection test on one of its occurrences: during the laser pulse, we stopped the clock of the ATXmega. Only faults which directly affect flip-flops will persist and produce a fault in the calculation; faults in combinational logic will not be stored in the flip-flops due to the missing clock. Since we were indeed able to induce faults using this procedure, we concluded that the pattern corresponds to flip-flops. Thus, we used this pattern for automatically finding other cell instances as described in the following. Note that devices specifically designed for security-critical applications will not provide external access to the clock signal, so halting the clock is not possible there.

Figure 8.5: OBIC x/y, the Gray scale corresponds to the OBIC amplitude in arbitrary units. We assume that the wave around Y = 100 µm is a result of mechanical stress caused by the thinning of the silicon die.

8.4.3 Correlation-Based Pattern Recognition

Using the large characteristic pattern assumed to implement a flip-flop, we searched for all other instances in the area visible in Figure 8.5 using correlation. As discussed above, the instantiated standard cells might be mirrored because the vertical lines connecting to VDD and VSS alternate. To optimize routing, the instances might further be horizontally mirrored, i.e., so that the output of the cell is closer to the input of the following cell. Figure 8.7 depicts the two-dimensional correlation between the pattern (without its mirrored versions) and the selected area. This resulted in multiple spikes with correlations ranging from 0.6 up to 0.8. Note that for finding flip-flops based on correlation in a larger area, one might want to limit the search space to the individual lanes between the supply lines. We skipped this step because the computation in our example finished in a matter of seconds on a standard PC. Figure 8.8 depicts the 34 positions found for all four mirrored versions of the pattern. To obtain the expected minimum of 128 positions (at least one per bit of the 128-bit AES state), the search area must be increased further. The colors (or line types) in the figure differentiate the mirrored versions. Note that the distribution is consistent with the vertical lanes: only red and yellow boxes or blue and green boxes appear in one lane. Further, both types alternate due to the alternating VDD and VSS connections.

Now, instead of targeting the whole area with LFI, we can focus only on the found locations, reducing the search space. Usually, not the whole area within a box is sensitive to fault injection; instead, only specific areas or transistors of the cell have an effect. Thus, after finding those sensitivity zones for one pattern, the search space can be decreased even further by targeting only the correct zones with a few laser shots.

Figure 8.6: OBIC x/y in detail. (a) OBIC x/y in detail; boxes mark the assumed flip-flop layouts, colors represent the mirrored versions, and the Gray scale corresponds to the OBIC amplitude in arbitrary units (coordinates w.r.t. Figure 8.5). (b) Locations found sensitive to LFI within the previously isolated areas, detail of four cells.
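The correlation-based search can be sketched as a brute-force normalized cross-correlation (a Pearson correlation per window) over the OBIC image, repeated for all four mirrored versions of the template and followed by a simple suppression of overlapping hits. This is a hypothetical NumPy illustration, not the code used for Figure 8.8; the default threshold of 0.6 mirrors the lower end of the observed peak range, and the minimum peak distance is an assumption. For large scans, the memory footprint of the sliding-window view becomes prohibitive, and an FFT-based implementation such as `skimage.feature.match_template` is preferable.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def ncc_map(image, template):
    """Pearson correlation between the template and every same-sized window of the image."""
    image = np.asarray(image, dtype=float)
    t = np.asarray(template, dtype=float)
    t = t - t.mean()
    windows = sliding_window_view(image, t.shape)            # (H-h+1, W-w+1, h, w)
    w = windows - windows.mean(axis=(-2, -1), keepdims=True)
    num = (w * t).sum(axis=(-2, -1))
    den = np.sqrt((w ** 2).sum(axis=(-2, -1)) * (t ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, 0.0)          # flat windows -> 0


def find_instances(image, template, threshold=0.6, min_dist=None):
    """Return (row, col, variant, score) for peaks of all four mirrored template versions."""
    template = np.asarray(template, dtype=float)
    variants = {
        "original": template,
        "flip_ud": np.flipud(template),                   # flipped top-to-bottom
        "flip_lr": np.fliplr(template),                   # flipped left-to-right
        "flip_both": np.flipud(np.fliplr(template)),
    }
    if min_dist is None:
        min_dist = max(1, min(template.shape) // 2)
    hits = []
    for name, tpl in variants.items():
        corr = ncc_map(image, tpl)
        for r, c in zip(*np.nonzero(corr >= threshold)):
            hits.append((float(corr[r, c]), int(r), int(c), name))
    hits.sort(reverse=True)                               # strongest peaks first
    kept = []
    for score, r, c, name in hits:
        # Keep a hit only if it does not overlap a stronger hit that was already kept.
        if all(abs(r - kr) >= min_dist or abs(c - kc) >= min_dist for _, kr, kc, _ in kept):
            kept.append((score, r, c, name))
    return [(r, c, name, score) for score, r, c, name in kept]
```

Because all four variants feed into one candidate list, a location that correlates with several mirrored versions is reported only once, under the variant with the highest score.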

8.4.4 Finding the Correct Timing

Before performing fault attacks on the isolated locations, we performed a CPA attack with a known key to narrow down the point in time. If no profiling device with a known key is available, this step must be skipped. For profiling, we chose the last round to avoid the MixColumns step of AES. Thus, for a successful fault injection, we expect either single-bit faults (fault occurred after the SBox) or single-byte faults (fault before the SBox). We started by calculating intermediate values at the beginning of the last round, i.e., after the addition of the round key but before the S-Box computation, in ShiftRows order. The most suitable power model was found to be the Hamming Distance between consecutive bytes of the state: HD(s_i, s_{i+1}) for state byte s_i. Figure 8.9 shows the correlation with 15 clearly visible peaks corresponding to the 15 Hamming Distances, using 300 traces.² These peaks mark exactly the points in time at which the respective values are processed or overwritten, providing an exact time reference.

Figure 8.7: Correlation (Gray scale) of a single pattern resulting in multiple spikes. The white box depicts a detail for three found instances.
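The timing CPA can be sketched as follows, assuming that the last-round input state bytes have already been computed from the ciphertexts with the known key (InvShiftRows and InvSubBytes, not shown) and arranged in ShiftRows order. The sketch builds the Hamming-distance model HD(s_i, s_{i+1}) for the 15 consecutive byte pairs, correlates each model column with every time sample, and compares the result against the ±4/√#traces bound for uncorrelated noise. Function names and the synthetic traces in the usage example are hypothetical.

```python
import numpy as np


def hd_model(states):
    """HD(s_i, s_{i+1}) for the 15 consecutive byte pairs; states: (n_traces, 16) bytes."""
    s = np.asarray(states, dtype=np.uint8)
    diff = np.bitwise_xor(s[:, :-1], s[:, 1:])                  # (n_traces, 15)
    return np.unpackbits(diff[..., None], axis=-1).sum(axis=-1)   # popcount per byte pair


def cpa(traces, model):
    """Pearson correlation of each model column with each time sample -> (n_models, n_samples)."""
    t = np.asarray(traces, dtype=float)
    m = np.asarray(model, dtype=float)
    t = t - t.mean(axis=0)
    m = m - m.mean(axis=0)
    num = m.T @ t
    den = np.sqrt(np.outer((m ** 2).sum(axis=0), (t ** 2).sum(axis=0)))
    return num / den


def noise_bound(n_traces):
    """Bound 4/sqrt(n); correlations with smaller magnitude are attributed to noise."""
    return 4.0 / np.sqrt(n_traces)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 300
    states = rng.integers(0, 256, size=(n, 16), dtype=np.uint8)
    model = hd_model(states)
    traces = rng.normal(0.0, 1.0, size=(n, 200))
    for i in range(15):                      # synthetic leakage: pair i leaks at sample 10*i + 5
        traces[:, 10 * i + 5] += model[:, i]
    corr = cpa(traces, model)
    print(np.argmax(np.abs(corr), axis=1))
    print(bool(np.all(corr.max(axis=1) > noise_bound(n))))
```

In the synthetic example, each of the 15 model columns peaks at exactly the sample where it was injected, which is the property used above to obtain a time reference.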

8.4.5 Targeting Found Locations

Having found potential areas and the potential point in time, we performed actual fault attacks on the last round of AES. Note that the profiling steps above need to be performed only once per target. We chose to energize the laser at exactly 181 µs for a duration of 600 ns, i.e., from shortly before the rising edge of a clock cycle until shortly after the rising edge of the following cycle (cf. Fig. 8.9). This way, a full clock cycle is covered. The point in time is specifically chosen to cover the time shortly before and after an addition with the secret key. To ensure an even distribution of values at the input of the last round, we tested each position with 256 different plaintexts. Figure 8.10 depicts an overlay of the found sensitivity zones within the previously found locations: green dots correspond to Bit-Set and red dots to Bit-Reset faults. Note that not all found cells show sensitivity, presumably because their content gets overwritten later or the registers are not active.

² Note that the previous attack in [Kiz09] required 3000 traces using the HD model after key addition in the first round.


Figure 8.8: Found flip-flops marked in an OBIC image. Colors represent mirrored versions.

Figure 8.6(b) provides a more detailed view of four cells and their sensitivity zones for LFI. Given that both red boxes correspond to an identical orientation of the instantiated standard cell, the obtained sensitivity zones are very similar as well. Further, since the blue boxes are a horizontally mirrored version of the red ones, the sensitivity zones are mirrored as well. Consistent with the expected behavior discussed in Section 8.2, the faults occur where the OBIC measurement shows a high amplitude. However, differentiating the sensitive zones by their exact behavior, i.e., the green and the red areas, reveals interesting effects. There is always one complementary pattern visible, cf. the upper part of each box in Figure 8.6(b). The third sensitivity zone, however, changes its fault behavior: in the upper two boxes of Figure 8.6(b), the third zone at the bottom triggers Bit-Resets. In contrast, the corresponding zones in the lower two boxes of the same figure trigger Bit-Sets, although the other zones are unchanged. We assume that the complementary zones implement the storage part of the flip-flop. Hence, we obtain specific areas for Set and Reset, similar to those reported in [CLFT14a]. Because the third sensitivity zone appears uncorrelated to the other two in each cell, we assume that it is related to the input signal or the clock signal. Note that the orientation of the upper complementary zones is inverted for some instances as well. Since we are only able to determine the behavior based on the ciphertext and have no additional information about the implementation, we assume that those flip-flops store their content inverted.

8.4.6 Differential Fault Attack

Comparing the genuine and the faulty ciphertexts, we noticed that we only obtained single-bit faults. The last operation in AES is an exclusive-or between the current state and the last round key. Thus, for an unknown key, we cannot determine whether the state or the key was altered. Therefore, we changed the timing so that the laser is energized exactly one round earlier, i.e., directly within the ninth round of AES. This way, we obtained identical sensitivity zones, but this time resulting in single-byte and four-byte faults in the ciphertext. Consistent with the considerations above (cf. Sect. 8.4.4), this corresponds to some faults occurring before the MixColumns step and some faults afterwards. Picking only the single-byte faults, we performed a well-known, straightforward DFA [BS97]: for given pairs of ciphertext and faulty ciphertext, we test whether the difference between hypothetical state values at the input of the SBox in the last round resolves to a single-bit fault (at an identical position for all pairs). The key hypotheses that satisfy this condition are potential key candidates. Usually, only a few key candidates remain for one pair of ciphertext and faulty ciphertext. Indeed, we were able to determine a byte of the correct round key using two such pairs.

Figure 8.9: Correlation for the Hamming Distance between consecutive state bytes at the input of the last round after KeyAddition; the red lines represent the ±4/√#traces bound for uncorrelated noise (cf. [MOP07, p. 150]).
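The single-bit DFA described above can be sketched for one byte position as follows. Because the last round key is added after ShiftRows, a single-byte ciphertext difference maps directly to the byte of the last round key at the same position. To keep the example self-contained, the AES S-box is generated from its algebraic definition in FIPS-197 (multiplicative inverse in GF(2^8) followed by the affine transformation); the function names and the synthetic fault pairs are hypothetical.

```python
def _gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return r


def _build_sboxes():
    """AES S-box and its inverse: GF(2^8) inverse followed by the affine map (FIPS-197)."""
    sbox = [0] * 256
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if _gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):  # XOR the inverse with its rotations by 1..4 bits
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = s ^ 0x63
    inv_sbox = [0] * 256
    for x, y in enumerate(sbox):
        inv_sbox[y] = x
    return sbox, inv_sbox


SBOX, INV_SBOX = _build_sboxes()


def dfa_key_candidates(pairs):
    """Last-round key byte hypotheses for which all (ciphertext, faulty ciphertext) byte
    pairs trace back to the same single-bit difference at the last-round S-box input."""
    candidates = []
    for k in range(256):
        diffs = {INV_SBOX[c ^ k] ^ INV_SBOX[cf ^ k] for c, cf in pairs}
        if len(diffs) == 1:
            d = diffs.pop()
            if d != 0 and d & (d - 1) == 0:   # exactly one bit set
                candidates.append(k)
    return candidates


if __name__ == "__main__":
    key_byte, fault_bit = 0x2B, 0x10           # hypothetical values for the demo
    pairs = [(SBOX[x] ^ key_byte, SBOX[x ^ fault_bit] ^ key_byte) for x in (0x3A, 0xC4)]
    print([hex(k) for k in dfa_key_candidates(pairs)])
```

Adding further pairs can only shrink the candidate set, since every hypothesis must explain all pairs with the same faulted bit.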

8.5 Discussion

8.5.1 Reduction of Points of Interest

The time required for a fault injection campaign depends linearly on the number of positions to be tested: the setup first has to move to each position and then performs a test with constant runtime. The time required to move to a certain position depends solely on how the setup is constructed (moving stage or scanning within the field of view of the objective). Further, the runtime per position heavily depends on the execution time on the target and the communication overhead. Thus, we only consider the total number of positions in our calculation. For the given example, 255 × 150 = 38250 positions need to be tested when scanning the whole area in 1 µm steps. Using our method, the number of positions can be reduced to 34 × 17 × 10 = 5780 positions within the borders of the 34 flip-flops found. This is an improvement by a factor of 6.6, which is considerable given the high density of flip-flops in the targeted area. This number can be reduced further if the sensitivity zones are known. For example, one could theoretically target each of the sensitivity zones only once with a slightly larger laser spot. Then, the number of positions is reduced to 34 × 3 = 102, i.e., by a factor of 375 for the given area and 34 found instances.

Figure 8.10: Locations found sensitive to LFI within the previously isolated areas. Green dots correspond to bit-set and red dots to bit-reset faults.
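The position counts above amount to simple arithmetic; the snippet below reproduces them. It reads 34 × 3 as one shot for each of three sensitivity zones per flip-flop, which is our interpretation of the numbers given.

```python
# Position counts from Sect. 8.5.1 (1 µm grid)
full_scan = 255 * 150          # whole 255 µm x 150 µm area
flip_flop_scan = 34 * 17 * 10  # only within the borders of the 34 flip-flops found
zone_shots = 34 * 3            # one larger shot per sensitivity zone (3 zones per flip-flop assumed)

print(full_scan, flip_flop_scan, zone_shots)            # 38250 5780 102
print(f"reduction: {full_scan / flip_flop_scan:.1f}x")  # reduction: 6.6x
print(f"reduction: {full_scan / zone_shots:.0f}x")      # reduction: 375x
```

Since the campaign time scales linearly with the number of positions, these factors translate directly into time savings.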

8.5.2 OBIC versus Reflective Imaging

Based on the provided figures, one might be tempted to think that there is no significant difference between OBIC measurements and capturing the NIR reflection using a camera for the NIR range. To compare both approaches, we captured the reflection of the laser beam using the laser scanning microscope (cf. Sect. 8.3.B). Note that laser scanning microscopy with reflection measurement already provides a higher resolution than using a camera (cf. [ASH95, AVZ82]). Figure 8.11 depicts a direct comparison between OBIC and reflective measurements. As also discussed in [WM87, SW90], measuring the OBIC provides more detail and improved contrast compared to reflective laser scanning. In any case, OBIC measurement certainly enables a higher resolution than commercially available NIR lighting/camera solutions for fault injection setups, e.g., as shown in [Ris17a] for an ATMega163.

8.5.3 Influence of the Technology Node

To decide whether the described method can be applied, one must consider the following parameters: (a) the technology node, (b) the characteristic cell layout of the DUT, and (c) the (effective) spot size of the laser. Given the single-mode fiber (SMF) coupling of the laser diodes used, the photon (or energy) density of the laser spot follows a Gaussian distribution. The spatial resolution $d$, as defined by the Rayleigh criterion [RF79], is given by $d = \frac{\lambda}{2\,\mathrm{NA}}$. In the case of Setup 1 (LSM), this leads to d ≈ 710 nm.


Figure 8.11: Comparison between OBIC and reflective measurements. Jet color scale corresponds to arbitrary units of the respective signal.

Comparing this value with the measured technology node of the ATXMega16A4U of 250 nm might seem contradictory. However, the technology node refers to the minimal feature size incorporated into the silicon die, which is usually the channel length of the smallest n-type MOSFET. In the perpendicular direction, the channel width is linked to the current handling capability of the transistor and is, in CMOS designs, usually much larger than the channel length. For example, the ratio between channel width and length of a NAND gate of an ATXMega32 is approximately W/L = 8 (cf. Fig. 8.12). Adding the drain and source regions, connecting vias, and signal routing to the layout thus leads to a characteristic structure much larger than the technology node. This especially holds for flip-flops, which consist of a multitude of transistors. In fact, the flip-flops found in Section 8.4.2 occupy an area of around 17 µm × 12 µm, providing plenty of structural detail for the given resolution.

Figure 8.12: SEM image of a NAND gate of an ATXMega32 showing a ratio of W/L = 8 (assumed column width of 12 µm).

8.6 Conclusion

Our method drastically reduced the location-dependent search space for laser fault injection with minimal equipment overhead. The device is not powered during imaging, and thus, points-of-interest can be found independently of the correct timing, pulse length, etc. Further, potential reactive countermeasures are shut off as well. Similar to [CLFT14b], we identified flip-flops using correlation. However, instead of using an SEM and additional device preparation, we show that similar results can be achieved by measuring the OBIC. In our experiments, we verified that the locations found indeed implement flip-flops by inducing faults while the clock was not running. We successfully attacked the AES hardware implementation of an ATXmega16A4U and extracted the secret key. Even though our exemplarily chosen target area contains many flip-flops, this precise location information leads to a reduction factor of 6.6 when covering the full flip-flop area, or a factor of 375 when targeting each sensitivity zone with a single shot. Future work should determine down to which minimum feature size OBIC measurements provide enough resolution to find repeated patterns based on correlation. Based on our measurements, the feature size of the ATXMega16A4U is certainly not the limit.


Chapter 9 Large Laser Spots and Fault Sensitivity Analysis

Laser Fault Injection (LFI) is a powerful method of introducing faults into a specific area of an integrated circuit. Because the minimum size of the laser spot is physically bounded, many recent publications investigate down to which technology node individual transistors can be targeted. In contrast, we develop a novel attack that remains applicable at the smallest feature sizes, even when numerous gates are affected. To achieve this, we adapt Fault Sensitivity Analysis to the laser setting. Such attacks require reasoning about the critical path of a combinatorial circuit and were previously only considered for clock glitches. We show that this prerequisite is available for LFI as well. This leads to a very relaxed fault model, especially in terms of the required laser spot size. We conclude that the latest technology nodes offer no intrinsic protection and that LFI remains a serious threat to embedded devices. Experimental results are provided by targeting the combinatorial AES Sbox of an Atmel ATxmega microcontroller with an artificially large laser spot. Finally, we discuss why this attack is still applicable to the smallest structure sizes.

The work described in the following chapter was published at HOST 2016 [SFG+16] and received the best student paper award based on the paper and the presentation given at HOST’16.

Contents of this Chapter

9.1 Introduction
9.2 Fault Sensitivity Analysis
9.3 Laser-Based Fault Sensitivity Analysis
9.4 Practical Evaluation
9.5 Discussion
9.6 Conclusion

9.1 Introduction

The advantage of Laser Fault Injection (LFI) over other methods of fault injection is its high precision. For example, clock glitches or voltage glitches usually affect the entire device. Microcontrollers intended for security-critical applications often generate clocks and voltages internally and thus prevent such attacks entirely. Even EM fault injection usually affects a large area of the silicon die. In contrast, single transistors can be targeted using LFI.


As discussed in Section 7.2.5, the wavelength of the light source used bounds the minimal spot size. Attacks from the backside of the silicon die require a wavelength in the NIR range (or above) to pass enough energy through the silicon (cf. Sect. 7.2.4). The continuously shrinking feature sizes in modern IC fabrication lead directly to a potential conflict when single-bit faults are desired. Many articles investigate down to which technology node single-bit faults can be achieved, cf. [ADM+10, MDR+10, RSDT13, CLFT14a, CLFT14b, SBHS15, DBC+18]. For SRAM, the results of [SBHS15] indicate that reliable single-bit faults are not possible at 45 nm while still succeeding at 90 nm. However, hitting exactly one transistor might not be required to create a single-bit fault. For example, [CLFT14a, DBC+18] targeted registers instead, as disturbing an entire flip-flop consisting of multiple transistors should still result in a single-bit fault. Indeed, [DBC+18] reported reliable single-bit faults for a 28 nm process. In contrast, we show in the following that we can still produce useful faults even when numerous transistors are disturbed. To this end, we combine Fault Sensitivity Analysis (FSA) with laser fault injection. FSA was first introduced in [LSG+10] but required extensive profiling of the different paths in the combinatorial logic of the DUT. Instead, [MMP+11] proposed to skip this profiling step by using a collision-based analysis. We use this enhanced version for our attacks, which leads to a relaxed fault model with respect to the required spot size. In the following, we briefly introduce the working principles of FSA and then describe our adaptation to LFI.

9.2 Fault Sensitivity Analysis

FSA was originally introduced in [LSG+10] and targets the combinatorial part of an implementation. The important observation here is that the length of the critical path between two registers depends on the input of the circuit (cf. Fig. 9.1). In more detail, the accumulated propagation delay until the output of the circuit is stable depends mainly on two parameters: the active gates (different depths) and the signals applied to these gates. To exploit this behavior, the authors targeted different AES Sboxes implemented on an ASIC prototype by introducing clock glitches with increasing frequency. Faults started to occur at a certain frequency, called the critical fault injection intensity. For the attack, the authors correlated this critical frequency for multiple random plaintexts with a prediction based on the ciphertexts and a key hypothesis. Using only 50 plaintexts, the authors successfully attacked a 128-bit PPRM1 AES implementation. The downside of this approach is the mentioned prediction function, which must be generated using extensive device profiling.

As an improvement to this attack, [MMP+11] discussed two options to apply the Correlation-Enhanced Collision attack of [MME10] to FSA. Both options skip the profiling step entirely. These attacks also require that the input dependency of the critical path is similar for different instantiations of the same circuit. We omit Option 2 of [MMP+11] here since Option 1 is more similar to our approach. Option 1 captures the distribution of the resulting faulty ciphertexts when the clock glitch is set such that approximately 50 % of the ciphertexts are faulty. The authors create a list Cnt(i) that stores how often the faulty ciphertext i occurred. Repeating this for multiple instances of the Sbox allows finding collisions. Assuming that the hypothetical difference between two key bytes $\Delta k = k_i \oplus k_j$ is zero, i.e., $\Delta k = 0$, the distributions are expected to be similar. Other possible hypotheses can easily be tested by rearranging one of the distributions, i.e., $\mathrm{Cnt}'(i) = \mathrm{Cnt}(i \oplus \Delta k_{hyp})$. For measuring the similarity of the distributions, the authors used the Pearson correlation coefficient. Thus, the most probable hypothesis is expected to show the highest correlation. Note that full control over the clock signal is not available in every setting. For example, the clock is usually generated internally in security-enhanced smartcards.

Figure 9.1: General structure of clocked combinatorial logic between two registers (register A, combinatorial blocks 1 and 2, register B; clock edges c1 and c2; time/data flow from left to right).

9.3 Laser-Based Fault Sensitivity Analysis

We adapt the idea of FSA to the laser setting. A detailed explanation of the (physical) effects concerning LFI is available in Chapter 7. When the photons of a laser beam hit the pn-junction of a transistor, a current is created that might charge or discharge the output of the targeted gate. When targeting, e.g., SRAM or flip-flops, the state might be permanently altered. However, when shooting at general combinatorial logic, the effect caused by the laser is only transient: the circuit regains its original state, depending on its input, once the laser excitation stops. We use this transient effect as the basis for FSA as described in the following.

9.3.1 Timing Violations by Different Laser Pulse Lengths

Figure 9.1 depicts the general setting for (clocked) combinatorial logic. At the first rising edge of the clock signal (c1), register A takes over the values at its input and applies them to its output on the right. Then, the logic gates of the combinatorial circuit switch consecutively, each gate with a specific delay. Exactly at the next rising edge of the clock signal (c2), the values present at the input of register B are stored. Hence, the total propagation delay through the combinatorial circuit must be smaller than the clock period. Otherwise, faulty intermediate values might be stored in register B. The blocks 1 and 2 in Figure 9.1 represent individual gates or entire subnets. Now we target the laser beam at block 1 and force its output to some faulty value. Consequently, block 2 must re-evaluate and, at some point in time, the faulty values will have propagated to the input of register B. Regarding the time between the end of the laser pulse and the following clock edge (c2), we can distinguish two extremes. First, when the pulse ends long after the rising edge c2, the faulty value will surely be latched into register B and the fault is preserved. Second, when the laser pulse stops long before clock edge c2, block 1 restores its original value after the laser influence and block 2 has enough time to re-evaluate. Then, the correct values are latched into register B and the fault is not preserved. Now consider an offset or pulse length such that the laser pulse ends between these two extremes. Then, we expect to see multiple different faulty values in register B. With increasing pulse length, an increasing number of gates in the path(s) from the fault to the register will affect the result. Note that this behavior might be exploitable using Fault Intensity Analysis [GYTS14] as well, which exploits that the number of affected bits increases with the fault intensity. The characterization of our DUT confirms this behavior. However, it is not required for our attack to succeed.

Algorithm 3: Measurement phase for a single Sbox
Require: number of executions N, target byte j ∈ {0, ..., 15}
Ensure: the number of faults Cnt_j(p) that occurred for plaintext byte P_j = p (p = 0, 1, ..., 255)
  Cnt_j(p) ← 0 for p = 0, 1, ..., 255
  for n = 1 to N do
    choose a random plaintext P
    run encryption of P with injected fault
    if fault occurred then
      Cnt_j(P_j) ← Cnt_j(P_j) + 1
    end if
  end for

Algorithm 4: Evaluation phase
Require: target bytes i and j, respective distributions Cnt_i and Cnt_j
Ensure: most probable key difference ∆k = k_i ⊕ k_j
  for 0 ≤ ∆k ≤ 255 do
    Cnt'_j(a) ← Cnt_j(a ⊕ ∆k), ∀a ∈ {0, ..., 255}
    Cor(∆k) ← ρ(Cnt_i, Cnt'_j)   // Pearson correlation
  end for
  return arg max_∆k Cor(∆k)
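The timing argument of Section 9.3.1 can be condensed into a toy model, sketched below under simplifying assumptions that are ours: the pulse starts at clock edge c1, a 2 MHz clock places c2 at 500 ns, and each Sbox input has a single, hypothetical re-evaluation delay after the laser stops. A fault is latched exactly when the pulse end plus this delay exceeds c2, so inputs with longer paths fault at shorter pulse lengths. This input-dependent threshold is what FSA exploits; the model ignores the multiple intermediate faulty values discussed above.

```python
CLOCK_EDGE_NS = 500.0  # c2 for a 2 MHz clock, assuming the pulse starts at c1 (t = 0)

def fault_latched(pulse_end_ns: float, recovery_delay_ns: float) -> bool:
    """True if register B still sees a faulty value at clock edge c2.

    After the laser stops, blocks 1 and 2 need `recovery_delay_ns` to re-evaluate;
    if that has not finished before c2, the faulty value is stored.
    """
    return pulse_end_ns + recovery_delay_ns > CLOCK_EDGE_NS

def critical_pulse_end(recovery_delay_ns: float, start_ps: int = 490_000, step_ps: int = 5) -> float:
    """Shortest pulse end time (in ns, swept in step_ps increments) that yields a fault."""
    t_ps = start_ps
    while not fault_latched(t_ps / 1000, recovery_delay_ns):
        t_ps += step_ps
    return t_ps / 1000

# Hypothetical input-dependent re-evaluation delays (ns) for three Sbox inputs
recovery = {0x00: 2.30, 0x3A: 1.60, 0xC5: 2.95}
for x, delay in recovery.items():
    print(f"input {x:#04x}: faults once the pulse ends after ~{critical_pulse_end(delay):.3f} ns")
```

The sweep uses integer picoseconds to avoid accumulating floating-point error over thousands of steps.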

9.3.2 Attack Strategy

From the considerations above, we conclude that LFI might provide the same basis for FSA as the clock glitches applied in the original works. Yet, we slightly change the attack strategy of [MMP+11]. Instead of the last round, we target SubBytes in the first round of AES. Thus, we introduce faults during the computation of Sbox(p ⊕ k) for plaintext byte p and key byte k. We assume a serialized implementation. Hence, we repeat the following step for each Sbox instance. For random plaintexts, we increase the length of the laser pulse in small steps until a certain percentage of faults appears, and then we run N trials. We create a list Cnt(p) that stores the number of faults that occurred for plaintext byte p. Note that by targeting the first round, we do not require the exact value of the genuine or the faulty ciphertext but solely the plaintext and the knowledge of whether a fault occurred or not. For evaluation, we use Pearson correlation to find collisions as described in [MMP+11]. Likewise, the correct difference ∆k = k_i ⊕ k_j is expected to show the highest correlation. For completeness, we provide the pseudocode of the measurement phase in Alg. 3 and of the evaluation in Alg. 4.
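Algorithms 3 and 4 can be prototyped against a simulated device before going to the laser setup. The sketch below is a minimal Python version under assumptions that are ours: the serialized implementation uses one physical Sbox, and for the chosen pulse length a fixed, randomly drawn set of about 20 % of the Sbox inputs x = p ⊕ k produces a fault. The fault oracle, the key, and the counts are therefore simulated, not measured; only the plaintext byte and the fault/no-fault information enter the attack.

```python
import math
import random

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)

def measure(fault_occurs, key_byte: int, n_exec: int, rng: random.Random) -> list[int]:
    """Algorithm 3: Cnt(p) = number of faulty executions with plaintext byte p for one Sbox."""
    cnt = [0] * 256
    for _ in range(n_exec):
        p = rng.randrange(256)          # fresh random plaintext byte for every execution
        if fault_occurs(p ^ key_byte):  # laser pulse during Sbox(p xor k)
            cnt[p] += 1
    return cnt

def evaluate(cnt_i: list[int], cnt_j: list[int]) -> int:
    """Algorithm 4: most probable key difference dk = k_i xor k_j."""
    cor = [pearson(cnt_i, [cnt_j[a ^ dk] for a in range(256)]) for dk in range(256)]
    return max(range(256), key=lambda dk: cor[dk])

# Hypothetical DUT model: for the chosen pulse length, ~20 % of the Sbox inputs
# have a critical path long enough to latch the fault (same Sbox for all bytes).
rng = random.Random(2016)
slow_inputs = set(rng.sample(range(256), 51))

def fault_occurs(x: int) -> bool:
    return x in slow_inputs

key = [rng.randrange(256) for _ in range(16)]  # secret, used only inside the simulation
cnt = [measure(fault_occurs, key[j], 1000, rng) for j in range(4)]
for j in range(1, 4):
    print(f"dk_0,{j}: recovered {evaluate(cnt[0], cnt[j]):#04x}, true {key[0] ^ key[j]:#04x}")
```

In this idealized model, the correct difference shows a clearly higher correlation than all other hypotheses for N = 1000; real measurements are noisier.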

9.4 Practical Evaluation

In the following, we present a practical evaluation of the proposed attack targeting the AES coprocessor of an ATxmega16. After describing the experimental setup, we show that we can indeed measure individual timings using LFI. Finally, we run the attack and successfully recover the correct key differences.

9.4.1 Experimental Setup

We conducted our experiments using an Atmel ATxmega16A4U with a minimal feature size of around 250 nm (cf. Sect. 8.3.1). We chose this microcontroller as it allows easy access to a fully controllable real-world hardware implementation of AES. Note that an identical ATxmega16 was already found to be vulnerable when targeting its flip-flops with LFI (cf. Sect. 8.4.6). The AES core is loosely coupled to the CPU via status and control registers (as opposed to AES round instructions). Thus, we can be certain that any obtained faults originated from the AES core and not from other parts of the circuit such as instruction registers. A single AES encryption requires 375 clock cycles. We assume that a highly serialized implementation is used, in particular a serialized SubBytes operation. Figure 9.2 shows the silicon die, captured from the backside using NIR illumination. The area of the Sbox implementation is marked by a rectangle. Its exact location and dimensions were determined beforehand by an exhaustive search over all positions and all inputs. Gates related to the Sbox were found to cover an area of 230 µm × 310 µm. The backside of the DUT was thinned to approximately 20 µm of remaining silicon substrate. The supply voltage of the DUT was set to 1.6 V and the clock was provided externally at 2 MHz. We used a microscope by Opto GmbH built for LFI, slightly modified to meet our stability and throughput requirements. The laser was focused through a Mitutoyo Plan Apo NIR objective (NA 0.26, magnification 10x). Two 975 nm single-mode fiber-coupled on-demand diode laser modules from ALPhANOV were used. Both were set to maximum output and focused on the same spot to maximize the energy density. We measured a laser peak power of 0.52 W. Because of the very long pulse widths (cf. Fig. 9.3), the pulse energy is not relevant.
The timing was controlled by a Stanford Research Systems DG645 programmable delay generator based on a trigger signal at the beginning of the encryption. First, we established the optimal focal plane by gradually decreasing the energy to a minimum. During this process, we adjusted the location of the spot laterally and axially in such a way that faults were still observable. The minimum spot size for NA = 0.26 is calculated as $1.22\,\lambda/\mathrm{NA} = 4.5$ µm at a wavelength of λ = 975 nm (diffraction limit by Abbe). We measured the spot size using an Ophir Spiricon SP620U beam profiling camera with a pixel pitch of 4.4 µm. Indeed, the minimal measured spot size was a single pixel. For our experiments, we intentionally shifted the focal plane to simulate smaller technology nodes. Using an offset of 85 µm, the resulting spot size was measured to be approximately 45 µm (Gaussian spot, intensity above 10 %), i.e., larger by a factor of 10. The combinatorial logic of the ATxmega is made up of CMOS lanes of 12 µm (cf. Sect. 8.4.2). Thus, the area illuminated by the laser beam covers more than three entire lanes in width and a multitude of transistors with respect to the layout given in Sect. 8.4.2.

Figure 9.2: ATxmega16A4U backside image using NIR illumination, bottom-right corner (mirrored as seen from the front side). The rectangle marks the position of the combinatorial Sbox implementation.

Figure 9.3 depicts the laser pulse with respect to other relevant signals. The digital signals were scaled and shifted for better visibility, as their absolute values are of no relevance. The black and gray signals represent the current flowing through the device, measured over a shunt resistor. The laser was not powered for the gray signal. When powering the laser (black), the effect on the measured current can be clearly identified (cf. Sect. 8.2). The blue signal represents the trigger sent from the delay generator to the laser diode module. The laser pulse in purple was measured through a photodiode in the optical path (converted to a voltage by a shunt and amplified using a Langer PA303 amplifier). In fact, using this exact configuration, we did not obtain any faults. However, increasing the length of the pulse by less than 10 ns already resulted in stable faults, i.e., the faulty values did not change when increasing the length further. It might seem confusing that the measured pulse stops slightly after the rising edge of the clock. This is due to the positions at which the respective signals were measured: we can only measure the clock signal at the clock generator and not at the exact time it arrives at the respective register. If the signal from the photodiode and the current of the DUT have a similar propagation delay, we can observe the following. First, the measured pulse ends right before the foot of the peak of the gray signal at around 0.4 µs. Further, the current caused by the laser (visible in the black signal) ends right before the same peak of the power consumption trace. Note that the laser output is stable long before the pulse ends. Thus, we do not change the rate of photons affecting the device but solely the time when the effect stops.

9.4.2 Measuring Individual Timings

Note that the following is not required for the attack but supports the assumption, made in Section 9.3, that input dependencies are measurable using LFI. We ran multiple tests for different inputs to an Sbox while increasing the pulse length in steps of 5 ps within the interval mentioned above. We have no information about the delay of individual gates of the ATxmega processor. However, [Eur15] mentions a propagation delay of 41 ps for a single inverter (ring oscillator speed per gate) in a 250 nm technology. Thus, we expect to observe changing values at this time scale as well. Figure 9.4 depicts the obtained faulty behavior for four different (but fixed) inputs to the Sbox while increasing the pulse length. We shot 20 times for each delay value; the y-axis represents the percentage of shots in which a certain faulty value appeared at the output. Considering, for example, plot (a), we injected no fault up to a delay of 238.5 ns (measured from the start of the pulse). From around 238.5 ns on, a certain faulty value started to appear at the output, and for a slightly increased delay, this fault appeared in each of the 20 tests. When increasing the delay further, a different faulty value appeared at the output at 239.0 ns and remained stable until approx. 240.05 ns. Note that we focused on the range with changing faults; all faults were stable after 242 ns. For the attack, we especially require that faulty outputs start to appear at different pulse lengths for different inputs. Indeed, we can observe this effect in Fig. 9.4, with differences of up to 1 ns.

The ATxmega stores the final round key in a dedicated register, e.g., so that decryption can be performed using this key and a flag that runs the key schedule in reverse. We used this to confirm that the gates related to the key schedule were unaltered. Since we know the key used for the encryption, we can calculate backwards from the faulty ciphertext to investigate the faults (cf. Fig. 9.4). Indeed, for every test, only the output of the targeted Sbox operation in the correct cipher round was altered.

Figure 9.3: Timing diagram of the fault injection (amplitude in V over time in µs). Black: current through the device (laser on), gray: current (laser off), green: clock signal, blue: pulse to the laser diodes, purple: measured pulse (clock, trigger, and measured pulse are scaled for visibility).
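The pulse-length sweep behind Fig. 9.4 can be sketched as below. The function `shoot` is a hypothetical stand-in for one laser shot at a given pulse length (in picoseconds) that returns the faulty Sbox output, or None for a correct result; the toy model and its output values are invented, deterministic, and only loosely mimic plot (a), whereas the real device shows partial percentages near transitions.

```python
from collections import Counter

def sweep(shoot, pulse_lengths_ps, shots: int = 20) -> dict[int, dict[int, float]]:
    """Fire `shots` pulses per pulse length; return the percentage of shots per faulty output."""
    result = {}
    for length in pulse_lengths_ps:
        outcomes = Counter(shoot(length) for _ in range(shots))
        outcomes.pop(None, None)  # correct outputs are not plotted
        result[length] = {value: 100.0 * n / shots for value, n in outcomes.items()}
    return result

def fault_onset(result: dict[int, dict[int, float]]):
    """Shortest swept pulse length at which any faulty value appeared (None if never)."""
    return next((length for length in sorted(result) if result[length]), None)

def toy_shoot(length_ps: int):
    """Hypothetical DUT response for one fixed Sbox input."""
    if length_ps < 238_500:
        return None
    return 0x1F if length_ps < 239_000 else 0xA4

res = sweep(toy_shoot, range(238_000, 240_001, 50))
print(fault_onset(res) / 1000, "ns")  # 238.5 ns
```

Comparing `fault_onset` across different fixed inputs is the input dependency that the attack relies on.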

9.4.3 Attack Results

We performed the attack outlined in Section 9.3, choosing the pulse length in such a way that 20 %, 50 %, and 80 % of the inputs led to faulty outputs. We targeted an arbitrarily chosen area within the borders of the combinatorial Sbox. Since the ATxmega processes each byte of the state consecutively, we repeated the measurement for each byte by delaying the start of the pulse by one clock cycle, i.e., 500 ns. We found that the ATxmega processes the bytes in the order obtained after the ShiftRows operation. Thus, the hypotheses should consider the following order: (0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11).
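The processing order above is exactly the column-major state order after ShiftRows. The sketch below derives it together with the corresponding pulse start offsets, assuming (as described) that consecutive bytes are processed in consecutive 500 ns clock cycles; offsets are relative to the pulse for the first processed byte.

```python
CLOCK_PERIOD_NS = 500  # 2 MHz external clock

def shiftrows_order() -> list[int]:
    """State byte indices (index = row + 4 * column) in the order they appear after ShiftRows.

    ShiftRows rotates row r left by r positions, so output column c, row r
    holds the input byte from column (c + r) mod 4.
    """
    return [row + 4 * ((col + row) % 4) for col in range(4) for row in range(4)]

order = shiftrows_order()
# Pulse start offset for each state byte: one clock cycle per processed byte
pulse_offset_ns = {byte: pos * CLOCK_PERIOD_NS for pos, byte in enumerate(order)}
print(order)  # [0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11]
```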


Figure 9.4: Percentage of different faulty values at the output of the Sbox for increasing laser pulse length, for four exemplarily chosen inputs. Colors represent different faulty values at the output. The colors do not match across different inputs, i.e., all obtained faults are unique.

Figure 9.5 depicts the results of the correlation collision [MME10] for the first three ∆k using N = 1000 samples. For Figure 9.5(a), a delay value was chosen so that approximately 20 % of the random inputs resulted in a fault (i.e., 200 faults out of the 1000 samples). For each target ∆k (and for all 12 remaining ones), the peak corresponding to the correct hypothesis is clearly distinguishable. Figure 9.5(b) depicts the results for 50 % fault occurrence. The attack succeeds likewise, although with a smaller correlation. We ran the attack again using a pulse length long enough that all faults were stable, i.e., the laser pulse ends after the rising edge of the clock signal. This way, we obtained approximately 80 % faults, meaning that the laser beam does not affect the whole Sbox. Our results show that the attack fails for the same number of measurements (Fig. 9.5(c)). When using N = 5000 measurements, the correct hypotheses showed the highest correlation (cf. Fig. 9.5(d)). However, the peaks are still not easily distinguishable.

9.5 Discussion

From Section 9.3, we observe that it is neither critical which part of the Sbox is affected nor necessary to understand which fault exactly was injected. We only need to obtain some faulty behavior. Further and most importantly, the spot size (or the number of affected gates, respectively) is not critical either. Of course, if all faults are stable and the spot size is so large that we obtain a fault for every possible input, the attack clearly fails. However, we solved this issue by introducing variable pulse lengths, as shown above. For example, when targeting block 1 in Figure 9.1, it is not relevant whether we hit block 2 as well. When the laser pulse ends, block 2 must re-evaluate anyway since its input was changed by block 1. Transferring this to the complete Sbox, only the affected gate with the longest propagation path determines the success of the attack. The Sbox of the ATxmega covers an area of 230 × 310 µm² at a technology node of 250 nm. Although this is a very rough estimate, transferring this value to, e.g., 11 nm leads to an area of 10 × 13 µm². This still provides enough space to target the Sbox with LFI without hitting unrelated logic. For example, applying the Abbe diffraction limit to high-resolution long-working-distance NIR objectives results in a minimum spot size of 1.7 µm for a commercially available NA of 0.7. Note that scaling our artificially large spot down to 11 nm likewise leads to a spot size of 1.98 µm. Since we are trading spatial accuracy for timing precision, it should be noted that current laser systems certainly pose no limitations in terms of pulse length and jitter. Even considering clock frequencies above 1 GHz, i.e., clock periods below 1 ns, the clock jitter must scale with the frequency. Current research in laser technology considers jitter in the attosecond range for femtosecond lasers [BFK12]. For comparison, [Eur15] mentions a delay value of 4.98 ps for a 40 nm process.
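The rough scaling estimate above can be reproduced in a few lines. This sketch simply assumes that all lateral dimensions shrink linearly with the technology node (250 nm to 11 nm) and uses the same 1.22 λ/NA spot-size formula as in Section 9.4.1; both are simplifications. It gives an Sbox footprint of roughly 10.1 µm × 13.6 µm, a scaled spot of 1.98 µm, and about 1.7 µm for NA 0.7.

```python
def abbe_spot_um(wavelength_um: float, na: float) -> float:
    """Diffraction-limited spot size 1.22 * lambda / NA, as used in Sect. 9.4.1."""
    return 1.22 * wavelength_um / na

SCALE = 11 / 250                         # naive linear scaling from 250 nm to 11 nm
sbox_um = (230 * SCALE, 310 * SCALE)     # scaled Sbox footprint (width, height) in µm
scaled_spot_um = 45 * SCALE              # our artificially large 45 µm spot after scaling
nir_spot_um = abbe_spot_um(0.975, 0.7)   # 975 nm through a commercially available NA 0.7 objective

print(f"Sbox ~ {sbox_um[0]:.1f} x {sbox_um[1]:.1f} µm^2")
print(f"scaled spot ~ {scaled_spot_um:.2f} µm, NA 0.7 limit ~ {nir_spot_um:.2f} µm")
```

Both the achievable spot and the scaled test spot remain well below the scaled Sbox dimensions, which is the point of the argument.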

9.6 Conclusion

By adapting Fault Sensitivity Analysis to the LFI setting, we inherit a convenient fault model. For example, we allow random plaintexts and require neither genuine nor faulty ciphertexts. As opposed to classical Differential Fault Analysis, we solely use the information whether a fault occurred or not. Compared to the original approaches, we trade clock glitches for laser fault injection and no longer require control over the clock signal. We show that it is indeed possible to exploit input-dependent timing violations using finely adjusted laser pulse lengths. Most importantly, the requirements with respect to the laser spot size are loose. The attack still succeeds even if the laser spot is so large that every input to the combinatorial logic is affected. We provided experimental results targeting an AES hardware implementation with an artificially large spot and successfully extracted the secret key. Theoretically scaling the used parameters to the latest technology nodes still provides plenty of space to perform the attack. We conclude that the smallest feature sizes offer no inherent protection against LFI. Replacing clock glitches by laser fault injection as described in this work might be a promising direction for future research. It allows introducing timing violations into a specific area of the chip as opposed to affecting the whole chip. This holds especially for high frequencies, where clock glitches might not reach the target because board and on-chip capacitors act as a low-pass filter.


Figure 9.5: Correlation collision for different percentages of faulty outputs and numbers of measurements N; panels: (a) 20 %, N = 1000, (b) 50 %, N = 1000, (c) 80 %, N = 1000, (d) 80 %, N = 5000. Each panel plots the correlation over all 256 hypotheses; top to bottom: ∆k_{0,1}, ∆k_{0,2}, ∆k_{0,3}. Correct ∆k marked with a circle.

Part IV

Conclusion


Chapter 10 Conclusion and Future Work

In this chapter, we summarize our research contributions and provide prospects and directions of future work.

Contents of this Chapter

10.1 Conclusion
10.2 Future Work

10.1 Conclusion

In the first part of this thesis, we described how to build voltage sensors in the user-available fabric of an FPGA. Even when different circuits implemented in the same FPGA are logically isolated, they share a common Power Distribution Network (PDN). We demonstrated that the activity of a circuit causes fluctuations on the PDN that can be captured by the sensor. Indeed, the information carried through the PDN is sufficient to perform side-channel attacks, revealing the secret key of a cryptographic operation. The availability of such a sensor has severe security implications for multi-tenant FPGAs, as it allows one user to spy on another if no countermeasures are implemented. Because the sensor does not require any logical connection to the victim, it might also be deployed through an inconspicuous third-party IP core. To the best of our knowledge, this was the first time a power side channel of a circuit was measured directly without requiring an oscilloscope. We started our experiments on the SAKURA-G board, which is specifically built for side-channel analysis. Indeed, in terms of the number of traces required for a successful attack, the performance of the sensor is comparable to a measurement using an oscilloscope. However, we also provided experimental results on general-purpose FPGA boards, showcasing the general applicability of the design. We extended this idea by increasing the distance between the sensor and the victim in two steps. First, we presented a successful attack on an RSA implementation on an ARM CPU. Here, the sensor was implemented in the FPGA fabric on the same silicon die. In the second step, we showed that we can even leave the IC package and attack a chip residing on the same PCB, again through the shared PDN. This has a serious impact on board-level integration: it is already difficult to establish a chain of trust during manufacturing. The security of, for example, HSMs can be compromised through a malicious bitstream update even if the FPGA has no logical connection to the victim. We expect to see further variants of such attacks soon.


New computational features, such as attaching FPGA fabric to SoCs or to CPUs in the cloud, are promising, e.g., to accelerate all sorts of workloads. This holds especially for security-critical tasks, e.g., so that cryptographic keys are not accessible to malicious software running on the CPU. However, we demonstrated that such new features often open a new attack surface at the same time, which calls for appropriate countermeasures.

The second part of this thesis described our contributions to the field of laser fault injection. First, we used an imaging method that maps the current induced by the laser beam to find points of interest on the silicon die. This can tremendously reduce the time required for fault injection by shrinking the parameter space: for example, the locations of the targeted flip-flops do not depend on other parameters such as the pulse energy. Scanning usually yields a higher resolution than taking an image with a camera. Further, it can be performed with any setup intended for laser fault injection, with at most minor changes. Note that the target is not powered during imaging; thus, reactive countermeasures such as deleting the key cannot be triggered. In our second contribution to laser fault injection, we presented how to perform FSA using precisely timed laser pulses. Because the minimum spot size is physically bounded, there is ongoing research into the smallest feature size at which single-bit faults can still be achieved. In contrast, our work combining FSA with laser fault injection relaxes these requirements, especially regarding the spot size. Thus, we conclude that even at the smallest feature sizes, laser fault injection will remain relevant, offering faults that are useful to an attacker.

In conclusion, more than 20 years after the first publication introducing implementation attacks, these attacks remain an ongoing threat to cryptographic implementations. Novel use cases such as multi-tenant FPGAs might create new security threats, as demonstrated by the novel passive attacks we found during our research. The continuous growth of the Internet of Things connects more and more devices, e.g., large servers, smartphones, and a huge number of embedded low-cost actuators and sensors. Along the way, we will certainly see new threats and attacks caused by novel scenarios and use cases, again requiring new or adapted countermeasures.
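To make the parameter-space argument for laser imaging concrete, the following Python sketch compares a blind search, in which every scan position is tried with every pulse energy and trigger delay, against first imaging the die and then searching energy and delay only at the points of interest. The current map, the threshold, the grid and parameter sizes, and the assumption that one imaging measurement costs about as much as one injection attempt are all hypothetical; `points_of_interest` and `search_cost` are illustrative helpers, not part of our setup.

```python
from typing import List, Tuple

def points_of_interest(current_map: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Return the (x, y) scan positions whose laser-induced current exceeds the threshold."""
    return [
        (x, y)
        for y, row in enumerate(current_map)
        for x, value in enumerate(row)
        if value > threshold
    ]

def search_cost(nx: int, ny: int, n_energy: int, n_delay: int, n_poi: int) -> Tuple[int, int]:
    """Number of attempts for a blind joint search vs. imaging followed by a focused search.

    blind:   every position is combined with every pulse energy and trigger delay.
    imaging: one unpowered raster scan over all positions (assumed to cost one
             attempt per position), then energy/delay search only at n_poi points.
    """
    blind = nx * ny * n_energy * n_delay
    imaging = nx * ny + n_poi * n_energy * n_delay
    return blind, imaging

# Toy 2 x 2 current map (hypothetical values): only position (x=1, y=0) stands out.
print(points_of_interest([[0.0, 5.0], [1.0, 0.0]], threshold=2.0))
print(search_cost(100, 100, 10, 50, 20))
```

With a hypothetical 100 × 100 scan grid, 10 pulse energies, 50 trigger delays, and 20 points of interest, the blind search needs 5,000,000 attempts, whereas imaging first needs 20,000. The gain stems from decoupling the location from the other parameters, so that their sizes add instead of multiply.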

10.2 Future Work

In discussing possible directions of future work, we follow the structure of the thesis and thus start with possible extensions of our work on the voltage sensor built within an FPGA. Finally, we outline ideas for future work in the field of laser fault injection.

10.2.1 Increasing the Distance between Sensor and Victim

Throughout the first half of our research contribution, we progressively increased the distance between the sensor and the victim. For each step, i.e., the same fabric, the CPU of an SoC, and another package on the same PCB, we showed that there is detectable leakage. This raises the obvious question of how much further it is possible to go. For the inter-FPGA attack in Chapter 6, we modified the PDN so that both FPGAs are powered from the same PMIC. Going up one level in the hierarchy, i.e., so that the side-channel information must be carried through a PMIC, will require many more traces. One can assume that the FPGAs offered by cloud providers are installed as PCIe cards. Thus, it might be possible to observe side-channel leakage originating from the CPU of the host or from other PCIe FPGA cards in the system.

Building an analog measurement device within the entirely digital fabric of an FPGA is interesting on its own. Because of the close relation between current and electromagnetic fields, it might be rewarding to investigate whether an antenna can be built to capture the EM emanation of nearby chips. Such an antenna might be realized either directly in the fabric, e.g., through long wires, or by using the complex PDN, e.g., if the VDD plane of the PCB acts as an antenna. Even if the range is small, it might still be used to target another silicon die sitting on top of the sensor, e.g., in stacked-die packages.
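How strongly attenuation along the PDN hierarchy increases the number of required traces can be estimated with a back-of-the-envelope calculation. The Python sketch below relies on the Fisher z-transformation: the arctanh of a sample correlation over n traces is approximately normal with standard deviation 1/sqrt(n − 3). From this, it estimates how many traces a correct-key correlation ρ needs to stand out from zero at a chosen confidence. This is a simplification that ignores the correlations of wrong key hypotheses and the effect of comparing many hypotheses; the confidence level is an assumption.

```python
import math
from statistics import NormalDist

def required_traces(rho: float, confidence: float = 0.9999) -> int:
    """Estimate how many traces a correlation rho needs to stand out from zero.

    Fisher z-transformation: atanh(r) of a sample correlation over n traces is
    approximately normal with mean atanh(rho) and standard deviation 1/sqrt(n - 3).
    Requiring atanh(rho) to lie z_q standard deviations above zero gives
    n >= 3 + (z_q / atanh(rho))**2, where z_q is the standard-normal quantile.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    z_q = NormalDist().inv_cdf(confidence)
    return math.ceil(3 + (z_q / math.atanh(rho)) ** 2)

for rho in (0.4, 0.2, 0.1, 0.05):
    print(f"rho = {rho:<4}  ->  about {required_traces(rho)} traces")
```

Because n grows roughly with 1/atanh(ρ)², halving the correlation, e.g., due to additional attenuation between victim and sensor, roughly quadruples the number of required traces.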

10.2.2 Additional Scenarios and Countermeasures

Beyond the question of the maximum possible distance between victim and sensor, there are additional scenarios that should be explored. First, we plan to release the various implementations for the SAKURA-G, the Basys3, and the PYNQ as a measurement framework. It might be used as a cheap alternative to buying an oscilloscope and as a quick introduction to side-channel analysis, e.g., in a student lab course.

Similarly, the voltage sensor could be used on a machine of a cloud provider that is equipped with an FPGA. However, instead of using the sensor destructively to spy on other users or processes, it might be used constructively for side-channel analysis “as a service”. After an in-depth characterization of the measurement platform, such an implementation might be used to perform side-channel analysis without requiring any dedicated hardware at all. Because cloud providers offer easy scaling of the number of rented machines anyway, manufacturers could easily evaluate multiple designs or implementations in parallel and on demand.

At least for the scenario where the sensor is deployed as a malicious IP core, it is still an open question how an attacker can retrieve the captured traces. Simply pushing them out through a network interface would be easily detectable. One possible solution might be to perform the side-channel evaluation directly in the fabric. For example, for CPA and difference-of-means DPA, the key hypotheses can be tested independently of each other. This allows a straightforward time/area trade-off, e.g., evaluating only a few hypotheses at a time to keep the evaluation circuit small and stealthy. The resulting key could then be leaked through covert channels, e.g., by timing specific network packets.

Throughout 2018, multiple publications went in the same or a similar direction as our seminal work presented at DATE’18 (cf. [SGMT18b, ZS18, RPD+18, GRE18]). In contrast to our work, which uses a tapped delay chain, all other approaches count the oscillations of one or more ring oscillators.
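The independence of the key hypotheses in CPA can be illustrated with a small software model. The Python sketch below assumes an AES S-box output as the target, a Hamming-weight leakage model, and simulated single-sample traces with Gaussian noise; none of this is taken from our measurements, and all function names are hypothetical. Each hypothesis is scored by its own call to `score_hypothesis`, so the 256 hypotheses can be processed in batches. The hypothetical parameter `parallel` stands for the number of correlation units instantiated in the fabric, and the returned number of passes shows the resulting time/area trade-off.

```python
import random
from typing import Dict, List, Tuple

def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p

def aes_sbox() -> List[int]:
    """AES S-box: multiplicative inverse in GF(2^8) followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):  # XOR of the inverse rotated left by 1..4 bits
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = aes_sbox()

def hamming_weight(v: int) -> int:
    return bin(v).count("1")

def pearson(xs: List[float], ys: List[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_traces(key: int, n: int, noise: float, seed: int = 1) -> Tuple[List[int], List[float]]:
    """Simulated leakage: Hamming weight of SBOX[p ^ key] plus Gaussian noise, one sample per trace."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(256) for _ in range(n)]
    traces = [hamming_weight(SBOX[p ^ key]) + rng.gauss(0.0, noise) for p in plaintexts]
    return plaintexts, traces

def score_hypothesis(k: int, plaintexts: List[int], traces: List[float]) -> float:
    """Absolute correlation between the predicted leakage for key guess k and the traces."""
    model = [hamming_weight(SBOX[p ^ k]) for p in plaintexts]
    return abs(pearson(model, traces))

def cpa(plaintexts: List[int], traces: List[float], parallel: int = 1) -> Tuple[int, int]:
    """Test all 256 key hypotheses in batches of `parallel`; return (best key, number of passes)."""
    scores: Dict[int, float] = {}
    passes = 0
    for start in range(0, 256, parallel):
        passes += 1
        # In hardware, the hypotheses of one batch would be evaluated concurrently.
        for k in range(start, min(start + parallel, 256)):
            scores[k] = score_hypothesis(k, plaintexts, traces)
    return max(scores, key=scores.get), passes
```

For example, `cpa(*simulate_traces(0x3A, 300, 1.0), parallel=16)` should recover the simulated key 0x3A in 16 passes, whereas a single correlation unit (`parallel=1`) would need 256 passes for the same result.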
We already briefly discussed and compared the properties of both concepts in Section 3.4.2. However, a more detailed evaluation including measurements could reveal which sensor to choose for which scenario. Such a study should also consider ASIC implementations of the sensors as well as FPGAs from other manufacturers. There are also many other applications of side-channel analysis for which the sensor might be used. For example, one might build a side-channel disassembler that uses the sensor to extract the program code executed on the CPU of an SoC.

An obvious solution to prevent the attacks we showcased in Part II of this thesis is to employ traditional and established side-channel countermeasures even in such remote scenarios. Alternatively, one might think of lightweight countermeasures tailored to a potential sensor in the nearby fabric. This also raises the question of whether current FPGAs can be used for multi-tenant scenarios at all, or whether dedicated and better-isolated FPGAs should be manufactured. Besides preventing the side-channel leakage directly, it is also interesting to detect potential voltage sensors within the bitstream, e.g., for a multi-tenant cloud scenario. However, as the bitstream format is usually proprietary and kept secret by the manufacturer, efforts in this direction might require collaboration with manufacturers or reverse engineering of the format.
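The difference between a tapped delay chain and a ring-oscillator sensor can be sketched with a simplified delay model. The Python code below assumes an alpha-power-law gate delay with hypothetical parameters (a nominal stage delay of 20 ps at 1.0 V, a threshold voltage of 0.4 V, α = 1.3) and ignores the initial delay, routing, and metastability. The tapped delay chain yields a thermometer code per sampling window, decoded by counting the taps the edge has passed. The ring-oscillator sensor instead counts oscillation periods over a much longer window and thereby averages fast fluctuations. All names and values are illustrative assumptions, not the designs discussed in Section 3.4.2.

```python
def stage_delay_ps(vdd: float, d_nom_ps: float = 20.0, v_nom: float = 1.0,
                   v_th: float = 0.4, alpha: float = 1.3) -> float:
    """Alpha-power-law gate delay, scaled so that stage_delay_ps(v_nom) == d_nom_ps."""
    def raw(v: float) -> float:
        return v / (v - v_th) ** alpha
    return d_nom_ps * (raw(vdd) / raw(v_nom))

def tdc_sample(vdd: float, window_ps: float = 2000.0, n_taps: int = 128) -> list:
    """Thermometer code of a tapped delay line: ones for the taps the edge passed in one window."""
    reached = min(n_taps, int(window_ps // stage_delay_ps(vdd)))
    return [1] * reached + [0] * (n_taps - reached)

def decode_thermometer(bits: list) -> int:
    """Sensor reading: number of taps reached (count of ones in an ideal thermometer code)."""
    return sum(bits)

def ro_count(vdd: float, window_ps: float = 1_000_000.0, n_inverters: int = 5) -> int:
    """Completed periods of an n-stage ring oscillator within a much longer counting window."""
    period_ps = 2 * n_inverters * stage_delay_ps(vdd)
    return int(window_ps // period_ps)

for vdd in (1.00, 0.95):
    print(vdd, decode_thermometer(tdc_sample(vdd)), ro_count(vdd))
```

In this model, a supply droop from 1.0 V to 0.95 V lowers the decoded tap count from 100 to 94 and the ring-oscillator count from 5000 to 4700. The delay chain delivers a new value every sampling window, while the ring oscillator only reports one value per counting window.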

10.2.3 Laser Fault Injection on Latest Technology Nodes and Related Attacks

Research into the smallest technology node at which single-bit faults can still be produced should be continued. Arguably, the technology node of the ATXmega16 in Part II is rather old and large. Thus, one might try to transfer the results to smaller feature sizes, i.e., to see whether flip-flops can still be identified and whether FSA works as claimed. While the spot size should not be critical for laser-based FSA, jitter of the laser pulse larger than the delay of a single gate will add a lot of noise to the measurement. Because circuits manufactured at smaller feature sizes are (intentionally) faster, synchronizing the laser to the target might be challenging.

Finally, the CED schemes presented in [AMR+18] were intended to counter an attacker capable of high-precision and biased faults. Multiple variants of the countermeasure at different technology nodes are already implemented on an ASIC manufactured through Europractice [Eur07]. Thus, the security claims and the usefulness of the adversary model could be supported by experimental results in the future.
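The jitter argument for laser-based FSA can be quantified with a simple model. Assume Gaussian timing jitter with RMS value σ, and count a pulse as well-timed only if it lands within ± half a gate delay d of the intended instant. The probability of a well-timed pulse is then erf(d / (2√2 σ)). The Python sketch below evaluates this expression. The window definition and the Gaussian assumption are ours, and the 10 ps gate delay and the jitter values are hypothetical.

```python
import math

def well_timed_probability(gate_delay_ps: float, jitter_rms_ps: float) -> float:
    """P(|t_err| <= d/2) for a Gaussian timing error t_err ~ N(0, sigma^2): erf(d / (2*sqrt(2)*sigma))."""
    if jitter_rms_ps <= 0.0:
        return 1.0  # no jitter: every pulse hits the intended instant
    return math.erf(gate_delay_ps / (2.0 * math.sqrt(2.0) * jitter_rms_ps))

for jitter in (1.0, 10.0, 40.0):
    p = well_timed_probability(10.0, jitter)
    print(f"gate delay 10 ps, jitter {jitter:>4} ps -> {p:.3f}")
```

With jitter equal to the gate delay, only about 38% of the pulses are well-timed, whereas jitter of a tenth of the gate delay makes practically every pulse well-timed. Since a shorter gate delay at fixed jitter lowers this probability, faster circuits at smaller feature sizes make synchronization harder.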

Part V

Appendix


Bibliography

[Abb73] Ernst Abbe. Beiträge zur Theorie des Mikroskops und der mikroskopischen Wahrnehmung. Archiv für mikroskopische Anatomie, 9(1):413–418, Dec 1873. 64

[ABCV17] Sabine Azzi, Bruno Barras, Maria Christofi, and David Vigilant. Using linear codes as a fault countermeasure for nonlinear operations: application to AES and formal verification. J. Cryptographic Engineering, 7(1):75–85, 2017. 16

[ABF+02] Christian Aumüller, Peter Bier, Wieland Fischer, Peter Hofreiter, and Jean-Pierre Seifert. Fault Attacks on RSA with CRT: Concrete Results and Practical Countermeasures. In Burton S. Kaliski Jr., Çetin Kaya Koç, and Christof Paar, editors, Cryptographic Hardware and Embedded Systems - CHES 2002, 4th International Workshop, Redwood Shores, CA, USA, August 13-15, 2002, Revised Papers, volume 2523 of Lecture Notes in Computer Science, pages 260–275. Springer, 2002. 14

[ADM+10] Michel Agoyan, Jean-Max Dutertre, Amir-Pasha Mirbaha, David Naccache, Anne-Lise Ribotta, and Assia Tria. How to flip a bit? In 16th IEEE International On-Line Testing Symposium (IOLTS 2010), 5-7 July, 2010, Corfu, Greece, pages 235–239. IEEE Computer Society, 2010. 60, 64, 86

[ADN+10] Michel Agoyan, Jean-Max Dutertre, David Naccache, Bruno Robisson, and Assia Tria. When Clocks Fail: On Critical Paths and Clock Faults. In Dieter Gollmann, Jean-Louis Lanet, and Julien Iguchi-Cartigny, editors, Smart Card Research and Advanced Application, 9th IFIP WG 8.8/11.2 International Conference, CARDIS 2010, Passau, Germany, April 14-16, 2010. Proceedings, volume 6035 of Lecture Notes in Computer Science, pages 182–193. Springer, 2010. 14

[ALP16] ALPhANOV Centre technologique optique et lasers. Datasheet: Pulse-on-Demand Modules PDM Series, 2016. available at http://www.alphanov.com/client/document/pn10_pdm-_2016_8.pdf, accessed June 27, 2018. 73

[AMR+18] Anita Aghaie, Amir Moradi, Shahram Rasoolzadeh, Falk Schellenberg, and Tobias Schneider. Impeccable Circuits. IACR Cryptology ePrint Archive, 2018:203, 2018. 13, 16, 100

[AMT13] Subidh Ali, Debdeep Mukhopadhyay, and Michael Tunstall. Differential fault analysis of AES: towards reaching its limits. J. Cryptographic Engineering, 3(2):73–97, 2013. 15

[APH+16] Charalampos Ananiadis, Athanasios Papadimitriou, David Hély, Vincent Beroulle, Paolo Maistri, and Régis Leveugle. On the development of a new countermeasure based on a laser attack RTL fault model. In 2016 Design, Automation & Test in Europe Conference & Exhibition, DATE 2016, Dresden, Germany, March 14-18, 2016, pages 445–450, 2016. 16

[ASH95] Richard E. Anderson, Jerry M. Soden, and Christopher L. Henderson. Failure analysis: Status and future trends, 1995. available at http://www.iaea.org/inis/collection/NCLCollectionStore/_Public/26/041/26041072.pdf. 81

[Atm14] Atmel Corporation. ATxmega16A4U Datasheet, 2014. available at http://www.atmel.com/images/8-bitAVR.pdf. 71

[AVZ82] W. Jerry Alford, Richard D. VanderNeut, and Vicent J. Zaleckas. Laser scanning microscopy. Proceedings of the IEEE, 70(6):641–651, June 1982. 81

[Bau04] Robert C. Baumann. Soft Errors in Commercial Integrated Circuits. International Journal of High Speed Electronics and Systems, 14(02):299–309, 2004. 60

[BBA+12] Pierre Bayon, Lilian Bossuet, Alain Aubert, Viktor Fischer, François Poucheret, Bruno Robisson, and Philippe Maurine. Contactless Electromagnetic Active Attack on Ring Oscillator Based True Random Number Generator. In COSADE, volume 7275 of Lecture Notes in Computer Science, pages 151–166. Springer, 2012. 14

[BBK+03] Guido Bertoni, Luca Breveglieri, Israel Koren, Paolo Maistri, and Vincenzo Piuri. Error Analysis and Detection Procedures for a Hardware Implementation of the Advanced Encryption Standard. IEEE Trans. Computers, 52(4):492–505, 2003. 16

[BCC+14] Julien Bringer, Claude Carlet, Hervé Chabanne, Sylvain Guilley, and Houssem Maghrebi. Orthogonal Direct Sum Masking - A Smartcard Friendly Computation Paradigm in a Code, with Builtin Protection against Side-Channel and Fault Attacks. In WISTP, volume 8501 of Lecture Notes in Computer Science, pages 40–56. Springer, 2014. 16

[BCN+06] Hagai Bar-El, Hamid Choukri, David Naccache, Michael Tunstall, and Claire Whelan. The Sorcerer’s Apprentice Guide to Fault Attacks. Proceedings of the IEEE, 94(2):370–382, 2006. 13, 14

[BCO04] Éric Brier, Christophe Clavier, and Francis Olivier. Correlation Power Analysis with a Leakage Model. In Marc Joye and Jean-Jacques Quisquater, editors, Cryptographic Hardware and Embedded Systems - CHES 2004: 6th International Workshop Cambridge, MA, USA, August 11-13, 2004. Proceedings, volume 3156 of Lecture Notes in Computer Science, pages 16–29. Springer, 2004. 12

[BDL97] Dan Boneh, Richard A. DeMillo, and Richard J. Lipton. On the Importance of Checking Cryptographic Protocols for Faults (Extended Abstract). In Walter Fumy, editor, Advances in Cryptology - EUROCRYPT ’97, International Conference on the Theory and Application of Cryptographic Techniques, Konstanz, Germany, May 11-15, 1997, Proceeding, volume 1233 of Lecture Notes in Computer Science, pages 37–51. Springer, 1997. 13, 15


[BDL01] Dan Boneh, Richard A. DeMillo, and Richard J. Lipton. On the Importance of Eliminating Errors in Cryptographic Computations. J. Cryptology, 14(2):101–119, 2001. 16

[BFK12] Andrew J. Benedick, James G. Fujimoto, and Franz X. Kartner. Optical flywheels with attosecond jitter. Nature Photonics, 6(2):97–100, 2012. 93

[BG13] Alberto Battistello and Christophe Giraud. Fault Analysis of Infective AES Computations. In 2013 Workshop on Fault Diagnosis and Tolerance in Cryptography, Los Alamitos, CA, USA, August 20, 2013, pages 101–107, 2013. 16

[BG15] Alberto Battistello and Christophe Giraud. Fault Cryptanalysis of CHES 2014 Symmetric Infective Countermeasure. IACR Cryptology ePrint Archive, 2015:500, 2015. 16

[BK06] Johannes Blömer and Volker Krummel. Fault Based Collision Attacks on AES. In Luca Breveglieri, Israel Koren, David Naccache, and Jean-Pierre Seifert, editors, Fault Diagnosis and Tolerance in Cryptography, Third International Workshop, FDTC 2006, Yokohama, Japan, October 10, 2006, Proceedings, volume 4236 of Lecture Notes in Computer Science, pages 106–120. Springer, 2006. 15

[BLB+02] T. Beauchene, D. Lewis, F. Beaudoin, P. Perdu, and P. Fouillat. Backside SC-OBIC using a pulsed NIR-laser and its application to fault location. In Physical and Failure Analysis of Integrated Circuits, 2002. IPFA 2002. Proceedings of the 9th International Symposium on the, pages 193–195, 2002. 68

[Ble98] Daniel Bleichenbacher. Chosen Ciphertext Attacks Against Protocols Based on the RSA Encryption Standard PKCS #1. In Advances in Cryptology - CRYPTO ’98, 18th Annual International Cryptology Conference, Santa Barbara, California, USA, August 23-27, 1998, Proceedings, pages 1–12, 1998. 29

[BS90] Eli Biham and Adi Shamir. Differential Cryptanalysis of DES-like Cryptosystems. In Alfred Menezes and Scott A. Vanstone, editors, Advances in Cryptology - CRYPTO ’90, 10th Annual International Cryptology Conference, Santa Barbara, California, USA, August 11-15, 1990, Proceedings, volume 537 of Lecture Notes in Computer Science, pages 2–21. Springer, 1990. 15

[BS97] Eli Biham and Adi Shamir. Differential Fault Analysis of Secret Key Cryptosystems. In Advances in Cryptology - CRYPTO ’97, 17th Annual International Cryptology Conference, Santa Barbara, California, USA, August 17-21, 1997, Proceedings, pages 513–525, 1997. 15, 80

[BS03] Johannes Blömer and Jean-Pierre Seifert. Fault Based Cryptanalysis of the Advanced Encryption Standard (AES). In Rebecca N. Wright, editor, Financial Cryptography, 7th International Conference, FC 2003, Guadeloupe, French West Indies, January 27-30, 2003, Revised Papers, volume 2742 of Lecture Notes in Computer Science, pages 162–181. Springer, 2003. 15


[BSB+14] Stuart Byma, J. Gregory Steffan, Hadi Bannazadeh, Alberto Leon-Garcia, and Paul Chow. FPGAs in the Cloud: Booting Virtualized Hardware Accelerators with OpenStack. In 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014, Boston, MA, USA, May 11-13, 2014, pages 109–116. IEEE Computer Society, 2014. 31

[BTD+13] Rodrigo Possamai Bastos, Frank Sill Torres, Jean-Max Dutertre, Marie-Lise Flottes, Giorgio Di Natale, and Bruno Rouzeyre. A bulk built-in sensor for detection of fault attacks. In 2013 IEEE International Symposium on Hardware-Oriented Security and Trust, HOST 2013, Austin, TX, USA, June 2-3, 2013, pages 51–54, 2013. 16

[CDR+16] Stephan De Castro, Jean-Max Dutertre, Bruno Rouzeyre, Giorgio Di Natale, and Marie-Lise Flottes. Frontside Versus Backside Laser Injection: A Comparative Study. JETC, 13(1):6:1–6:15, 2016. 62

[CG16] Claude Carlet and Sylvain Guilley. Complementary dual codes for countermeasures to side-channel attacks. Adv. in Math. of Comm., 10(1):131–150, 2016. 16

[CJRR99] Suresh Chari, Charanjit S. Jutla, Josyula R. Rao, and Pankaj Rohatgi. Towards Sound Approaches to Counteract Power-Analysis Attacks. In Michael J. Wiener, editor, Advances in Cryptology - CRYPTO ’99, 19th Annual International Cryptology Conference, Santa Barbara, California, USA, August 15-19, 1999, Proceedings, volume 1666 of Lecture Notes in Computer Science, pages 398–412. Springer, 1999. 13

[CLFT14a] Franck Courbon, Philippe Loubet-Moundi, Jacques J. A. Fournier, and Assia Tria. Adjusting Laser Injections for Fully Controlled Faults. In Constructive Side-Channel Analysis and Secure Design - 5th International Workshop, COSADE 2014, Paris, France, April 13-15, 2014. Revised Selected Papers, pages 229–242, 2014. 60, 64, 79, 86

[CLFT14b] Franck Courbon, Philippe Loubet-Moundi, Jacques J. A. Fournier, and Assia Tria. Increasing the efficiency of laser fault injections using fast gate level reverse engineering. In 2014 IEEE International Symposium on Hardware-Oriented Security and Trust, HOST 2014, Arlington, VA, USA, May 6-7, 2014, pages 60–63, 2014. 60, 64, 68, 69, 82, 86

[Col11] Edward I. Cole, Jr. Beam-Based Defect Localization Techniques. In Richard J. Ross, editor, Microelectronics Failure Analysis: Desk Reference, Sixth Edition, pages 246–262. ASM International, 2011. 68, 70

[Cor13] John D. Corbett. The Xilinx Isolation Design Flow for Fault-Tolerant Systems, 2013. 31

[CPB+13] Rafael Boix Carpi, Stjepan Picek, Lejla Batina, Federico Menarini, Domagoj Jakobovic, and Marin Golub. Glitch It If You Can: Parameter Search Strategies for Successful Fault Injection. In Aurélien Francillon and Pankaj Rohatgi, editors, Smart Card Research and Advanced Applications - 12th International Conference, CARDIS 2013, Berlin, Germany, November 27-29, 2013. Revised Selected Papers, volume 8419 of Lecture Notes in Computer Science, pages 236–252. Springer, 2013. 14, 68

[CPM+18] Giovanni Camurati, Sebastian Poeplau, Marius Muench, Tom Hayes, and Aurélien Francillon. Screaming Channels: When Electromagnetic Side Channels Meet Radio Transceivers. In Proceedings of the 25th ACM conference on Computer and communications security (CCS), CCS ’18. ACM, October 2018. 25, 30

[DBC+18] Jean-Max Dutertre, Vincent Beroulle, Stephan De Castro, Louis-Bathelemy Faber, Marie-Lise Flottes, Philippe Gendrier, David Hely, Régis Leveugle, Paolo Maistri, Giorgio Di Natale, Athanasios Papadimitriou, and Bruno Rouzeyre. Laser fault injection at the CMOS 28 nm technology node: an analysis of the fault model. In 2018 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2018, Amsterdam, The Netherlands, September 13, 2018. IEEE Computer Society, 2018. 60, 64, 86

[DDF14] Alexandre Duc, Stefan Dziembowski, and Sebastian Faust. Unifying Leakage Models: From Probing Attacks to Noisy Leakage. In Phong Q. Nguyen and Elisabeth Oswald, editors, Advances in Cryptology - EUROCRYPT 2014 - 33rd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Copenhagen, Denmark, May 11-15, 2014. Proceedings, volume 8441 of Lecture Notes in Computer Science, pages 423–440. Springer, 2014. 13

[DEK+16] Christoph Dobraunig, Maria Eichlseder, Thomas Korak, Victor Lomné, and Florian Mendel. Statistical Fault Attacks on Nonce-Based Authenticated Encryption Schemes. In Jung Hee Cheon and Tsuyoshi Takagi, editors, Advances in Cryptology - ASIACRYPT 2016 - 22nd International Conference on the Theory and Application of Cryptology and Information Security, Hanoi, Vietnam, December 4-8, 2016, Proceedings, Part I, volume 10031 of Lecture Notes in Computer Science, pages 369–395, 2016. 15

[DEK+18] Christoph Dobraunig, Maria Eichlseder, Thomas Korak, Stefan Mangard, Florian Mendel, and Robert Primas. SIFA: Exploiting Ineffective Fault Inductions on Symmetric Cryptography. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2018(3):547–572, Aug. 2018. 15

[DEMM14] Christoph Dobraunig, Maria Eichlseder, Stefan Mangard, and Florian Mendel. On the Security of Fresh Re-keying to Counteract Side-Channel and Fault Attacks. In Marc Joye and Amir Moradi, editors, Smart Card Research and Advanced Applications - 13th International Conference, CARDIS 2014, Paris, France, November 5-7, 2014. Revised Selected Papers, volume 8968 of Lecture Notes in Computer Science, pages 233–244. Springer, 2014. 13

[DLV03] Pierre Dusart, Gilles Letourneux, and Olivier Vivolo. Differential Fault Analysis on A.E.S. In Jianying Zhou, Moti Yung, and Yongfei Han, editors, Applied Cryptography and Network Security, First International Conference, ACNS 2003. Kunming, China, October 16-19, 2003, Proceedings, volume 2846 of Lecture Notes in Computer Science, pages 293–306. Springer, 2003. 15

[DPD+06] A. Douin, V. Pouget, F. Darracq, D. Lewis, P. Fouillat, and P. Perdu. Influence of Laser Pulse Duration in Single Event Upset Testing. IEEE Transactions on Nuclear Science, 53(4):1799–1805, Aug 2006. 61

[DR02] Joan Daemen and Vincent Rijmen. The Design of Rijndael: AES - The Advanced Encryption Standard. Information Security and Cryptography. Springer, 2002.

[DSD+07] Alan J. Drake, Robert M. Senger, Harmander Deogun, Gary D. Carpenter, Soraya Ghiasi, Tuyet Nguyen, Norman K. James, Michael S. Floyd, and Vikas Pokala. A Distributed Critical-Path Timing Monitor for a 65nm High-Performance Microprocessor. In 2007 IEEE International Solid-State Circuits Conference, ISSCC 2007, Digest of Technical Papers, San Francisco, CA, USA, February 11-15, 2007, pages 398–399, 2007. 53

[ELH+12] Sho Endo, Yang Li, Naofumi Homma, Kazuo Sakiyama, Kazuo Ohta, and Takafumi Aoki. An Efficient Countermeasure against Fault Sensitivity Analysis Using Configurable Delay Blocks. In Guido Bertoni and Benedikt Gierlichs, editors, 2012 Workshop on Fault Diagnosis and Tolerance in Cryptography, Leuven, Belgium, September 9, 2012, pages 95–102. IEEE Computer Society, 2012. 15

[ERM16] David El-Baze, Jean-Baptiste Rigaud, and Philippe Maurine. An Embedded Digital Sensor against EM and BB Fault Injection. In 2016 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2016, Santa Barbara, CA, USA, August 16, 2016, pages 78–86. IEEE Computer Society, 2016. 16

[Eur07] Europractice. Activity Report 2017 - Side-Channel and Fault Injection Evaluation Chip, 2007. available at http://europractice-ic.com/docs/EPactivityReport2017.pdf. 66, 100

[Eur15] Europractice-IC. TSMC 0.25 um technology overview (MPW), accessed 30.10.2015. available at http://www.europractice-ic.com/technologies_TSMC.php?tech_id=025um. 91, 93

[EV12] Ken Eguro and Ramarathnam Venkatesan. FPGAs for trusted cloud computing. In Dirk Koch, Satnam Singh, and Jim Tørresen, editors, 22nd International Conference on Field Programmable Logic and Applications (FPL), Oslo, Norway, August 29-31, 2012, pages 63–70. IEEE, 2012. 31

[FGS+17a] Markus Finkeldey, Lena Göring, Falk Schellenberg, Carsten Brenner, Nils C. Gerhardt, and Martin R. Hofmann. Multimodal backside imaging of a microcontroller using confocal laser scanning and optical-beam-induced current imaging. In Proc. SPIE 10110, Photonic Instrumentation Engineering IV, 101101F, 2017.

[FGS+17b] Markus Finkeldey, Lena Göring, Falk Schellenberg, Nils C. Gerhardt, and Martin Hofmann. Backside imaging of a microcontroller with common-path digital holography. In Proc. SPIE 10127, Practical Holography XXXI: Materials and Applications, 1012704, 2017.

[FJLT13] Thomas Fuhr, Éliane Jaulmes, Victor Lomné, and Adrian Thillard. Fault Attacks on AES with Faulty Ciphertexts Only. In Wieland Fischer and Jörn-Marc Schmidt, editors, 2013 Workshop on Fault Diagnosis and Tolerance in Cryptography, Los Alamitos, CA, USA, August 20, 2013, pages 108–118. IEEE Computer Society, 2013. 14, 15

[FSG+16] Markus Finkeldey, Falk Schellenberg, Nils C. Gerhardt, Christof Paar, and Martin R. Hofmann. Common-path depth-filtered digital holography for high resolution imaging of buried semiconductor structures. In Proc. SPIE 9771, Practical Holography XXX: Materials and Applications, 97710G, 2016.

[FVS15] Suhaib A. Fahmy, Kizheppatt Vipin, and Shanker Shreejith. Virtualized FPGA Accelerators for Efficient Cloud Computing. In 7th IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2015, Vancouver, BC, Canada, November 30 - December 3, 2015, pages 430–435, 2015. 31

[GCS+09] Sylvain Guilley, Sumanta Chaudhuri, Laurent Sauvage, Tarik Graba, Jean-Luc Danger, Philippe Hoogvorst, Vinh-Nga Vong, Maxime Nassar, and Florent Flament. Shall we trust WDDL?, pages 208–215. Vieweg+Teubner, Wiesbaden, 2009. 13

[GD77] Samuel Glasstone and Philip J. Dolan. The Effects of Nuclear Weapons - Third Edition, 1977. Prepared and published by the United States Department of Defense and the Energy Research and Development Administration. 59

[GE16] Ilias Giechaskiel and Ken Eguro. Information Leakage Between FPGA Long Wires. CoRR, abs/1611.08882, 2016. 31

[GFS+17] Lena Göring, Markus Finkeldey, Falk Schellenberg, Carsten Brenner, Martin R. Hofmann, and Nils C. Gerhardt. Optical metrology for the investigation of buried technical structures. tm - Technisches Messen, 85(2):104–110, 2017.

[GG74] Bob L. Gregory and Charles W. Gwyn. Radiation effects on semiconductor devices. Proceedings of the IEEE, 62(9):1264–1273, Sept 1974. 59

[Gir04] Christophe Giraud. DFA on AES. In Advanced Encryption Standard - AES, 4th International Conference, AES 2004, Bonn, Germany, May 10-12, 2004, Revised Selected and Invited Papers, pages 27–41, 2004. 15

[GJJR11] Gilbert Goodwill, Benjamin Jun, Josh Jaffe, and Pankaj Rohatgi. A testing methodology for side channel resistance validation. In NIST non-invasive attack testing workshop, 2011. http://csrc.nist.gov/news_events/non-invasive-attack-testing-workshop/papers/08_Goodwill.pdf. 12

[GK95] Martin A. Green and Mark J. Keevers. Optical properties of intrinsic silicon at 300 K. Progress in Photovoltaics: Research and Applications, 3:189–192, 1995. 65


[GM11] Tim Güneysu and Amir Moradi. Generic Side-Channel Countermeasures for Reconfigurable Devices. In Bart Preneel and Tsuyoshi Takagi, editors, Cryptographic Hardware and Embedded Systems - CHES 2011 - 13th International Workshop, Nara, Japan, September 28 - October 1, 2011. Proceedings, volume 6917 of Lecture Notes in Computer Science, pages 33–48. Springer, 2011. 13

[GMM16] Daniel Gruss, Clémentine Maurice, and Stefan Mangard. Rowhammer.js: A Remote Software-Induced Fault Attack in JavaScript. In Detection of Intrusions and Malware, and Vulnerability Assessment - 13th International Conference, DIMVA 2016, San Sebastián, Spain, July 7-8, 2016, Proceedings, pages 300–321, 2016. 30, 48

[GOKT16] Dennis R. E. Gnad, Fabian Oboril, Saman Kiamehr, and Mehdi Baradaran Tahoori. Analysis of transient voltage fluctuations in FPGAs. In 2016 International Conference on Field-Programmable Technology, FPT 2016, Xi’an, China, December 7-9, 2016, pages 12–19, 2016. 26, 27, 30, 32

[GRE18] Ilias Giechaskiel, Kasper Bonne Rasmussen, and Ken Eguro. Leaky Wires: Information Leakage and Covert Communication Between FPGA Long Wires. In Jong Kim, Gail-Joon Ahn, Seungjoo Kim, Yongdae Kim, Javier López, and Taesoo Kim, editors, Proceedings of the 2018 on Asia Conference on Computer and Communications Security, AsiaCCS 2018, Incheon, Republic of Korea, June 04-08, 2018, pages 15–27. ACM, 2018. 27, 99

[GRG+07] D. Giot, P. Roche, G. Gasiot, J. L. Autran, and R. Harboe-Sorensen. Heavy ion testing and 3D simulations of Multiple Cell Upset in 65nm standard SRAMs. In European Conference on Radiation and Its Effects on Components and Systems, pages 1–6, 2007. 14

[GSD+08] Sylvain Guilley, Laurent Sauvage, Jean-Luc Danger, Nidhal Selmane, and Renaud Pacalet. Silicon-level Solutions to Counteract Passive and Active Attacks. In Luca Breveglieri, Shay Gueron, Israel Koren, David Naccache, and Jean-Pierre Seifert, editors, Fifth International Workshop on Fault Diagnosis and Tolerance in Cryptography, 2008, FDTC 2008, Washington, DC, USA, 10 August 2008, pages 3–17. IEEE Computer Society, 2008. 14

[GST12] Benedikt Gierlichs, Jörn-Marc Schmidt, and Michael Tunstall. Infective Computation and Dummy Rounds: Fault Protection for Block Ciphers without Check-before-Output. In LATINCRYPT, volume 7533 of Lecture Notes in Computer Science, pages 305–321. Springer, 2012. 16

[GST14] Daniel Genkin, Adi Shamir, and Eran Tromer. RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis. In Juan A. Garay and Rosario Gennaro, editors, Advances in Cryptology - CRYPTO 2014 - 34th Annual Cryptology Conference, Santa Barbara, CA, USA, August 17-21, 2014, Proceedings, Part I, volume 8616 of Lecture Notes in Computer Science, pages 444–461. Springer, 2014. 12, 25

[GST17] Daniel Genkin, Adi Shamir, and Eran Tromer. Acoustic Cryptanalysis. J. Cryptology, 30(2):392–443, 2017. 12, 25


[GYTS14] Nahid Farhady Ghalaty, Bilgiday Yuce, Mostafa M. I. Taha, and Patrick Schaumont. Differential Fault Intensity Analysis. In Assia Tria and Dooho Choi, editors, 2014 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2014, Busan, South Korea, September 23, 2014, pages 49–58. IEEE Computer Society, 2014. 14, 15, 88

[Hab65] Donald H. Habing. The Use of Lasers to Simulate Radiation-Induced Transients in Semiconductor Devices and Circuits. Nuclear Science, IEEE Transactions on, 12(5):91–100, Oct 1965. 59

[HBB+16] Wei He, Jakub Breier, Shivam Bhasin, Noriyuki Miura, and Makoto Nagata. Ring Oscillator under Laser: Potential of PLL-based Countermeasure against Laser Fault Injection. In 2016 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2016, Santa Barbara, CA, USA, August 16, 2016, pages 102–113. IEEE Computer Society, 2016. 16, 60

[HBW+07] Ted Huffmire, Brett Brotherton, Gang Wang, Timothy Sherwood, Ryan Kastner, Timothy E. Levin, Thuy D. Nguyen, and Cynthia E. Irvine. Moats and Drawbridges: An Isolation Primitive for Reconfigurable Hardware Based Systems. In 2007 IEEE Symposium on Security and Privacy (S&P 2007), 20-23 May 2007, Oakland, California, USA, pages 281–295, 2007. 31

[HS13] Michael Hutter and Jörn-Marc Schmidt. The Temperature Side Channel and Heating Fault Attacks. In Aurélien Francillon and Pankaj Rohatgi, editors, Smart Card Research and Advanced Applications - 12th International Conference, CARDIS 2013, Berlin, Germany, November 27-29, 2013. Revised Selected Papers, volume 8419 of Lecture Notes in Computer Science, pages 219–235. Springer, 2013. 12

[HvD04] James Hendricks and Leendert van Doorn. Secure bootstrap is not enough: shoring up the trusted computing base. In Proceedings of the 11th ACM SIGOPS European Workshop, Leuven, Belgium, September 19-22, 2004, page 11, 2004. 48

[IKT+98] M. Ishida, T. Kawakami, A. Tsuji, N. Kawamoto, M. Motoyoshi, and N. Ouchi. A novel 6T-SRAM cell technology designed with rectangular patterns scalable beyond 0.18 μm generation and desirable for ultra high speed operation. In International Electron Devices Meeting 1998. Technical Digest (Cat. No.98CH36217), pages 201–204, Dec 1998. 63

[INK11] Taras Iakymchuk, Maciej Nikodem, and Krzysztof Kepa. Temperature-based covert channel in FPGA systems. In Proceedings of the 6th International Workshop on Reconfigurable Communication-centric Systems-on-Chip, ReCoSoC 2011, Montpellier, France, 20-22 June, 2011, pages 1–7. IEEE, 2011. 31

[KA98] Markus G. Kuhn and Ross J. Anderson. Soft Tempest: Hidden Data Transmission Using Electromagnetic Emanations. In David Aucsmith, editor, Information Hiding, Second International Workshop, Portland, Oregon, USA, April 14-17, 1998, Proceedings, volume 1525 of Lecture Notes in Computer Science, pages 124–142. Springer, 1998.


[KDK+14] Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji-Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu. Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors. In ACM/IEEE 41st International Symposium on Computer Architecture, ISCA 2014, Minneapolis, MN, USA, June 14-18, 2014, pages 361–372, 2014. 14

[KFS+14] Nektarios Koukourakis, Markus Finkeldey, Moritz Stürmer, Christoph Leithold, Nils C. Gerhardt, Martin R. Hofmann, Ulrike Wallrabe, Jürgen W. Czarske, and Andreas Fischer. Axial scanning in confocal microscopy employing adaptive lenses (CAL). Optics Express, 22(5):6025–6039, Mar 2014. 71

[KGG+18] Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre Attacks: Exploiting Speculative Execution. CoRR, abs/1801.01203, 2018. 48

[KGT18] Jonas Krautter, Dennis Gnad, and Mehdi Tahoori. FPGAhammer: Remote Voltage Fault Attacks on Shared FPGAs, suitable for DFA on AES. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2018(3):44–68, Aug. 2018. 30

[KH14] Thomas Korak and Michael Hoefler. On the Effects of Clock and Power Supply Tampering on Two Microcontroller Platforms. In 2014 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2014, Busan, South Korea, September 23, 2014, pages 8–17, 2014. 16

[Kiz09] Ilya Kizhvatov. Side channel analysis of AVR XMEGA crypto engine. In Proceedings of the 4th Workshop on Embedded Systems Security, WESS 2009, Grenoble, France, October 15, 2009, 2009. 68, 78

[KJJ99] Paul C. Kocher, Joshua Jaffe, and Benjamin Jun. Differential Power Analysis. In Advances in Cryptology - CRYPTO ’99, 19th Annual International Cryptology Conference, Santa Barbara, California, USA, August 15-19, 1999, Proceedings, pages 388–397, 1999. 12

[KL96] Sung-Mo (Steve) Kang and Yusuf Leblebici. CMOS Digital Integrated Circuits Analysis & Design. McGraw-Hill, Inc., New York, NY, USA, 1996. 75

[Koc96] Paul C. Kocher. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In Neal Koblitz, editor, Advances in Cryptology - CRYPTO ’96, 16th Annual International Cryptology Conference, Santa Barbara, California, USA, August 18-22, 1996, Proceedings, volume 1109 of Lecture Notes in Computer Science, pages 104–113. Springer, 1996. 11

[Koc98] Paul C. Kocher. Leak-resistant cryptographic indexed key update, 1998. US Patent US6539092B1, https://patents.google.com/patent/US6539092B1. 13

[KSR+18] Shahrzad Keshavarz, Falk Schellenberg, Bastian Richter, Christof Paar, and Daniel Holcomb. SAT-based reverse engineering of gate-level schematics using fault injection and probing. In 2018 IEEE International Symposium on Hardware Oriented Security and Trust, HOST 2018, Washington, DC, USA, April 30 - May 4, 2018, pages 215–220. IEEE Computer Society, 2018.

[KSV13] Dusko Karaklajic, Jörn-Marc Schmidt, and Ingrid Verbauwhede. Hardware Designer’s Guide to Fault Attacks. IEEE Trans. VLSI Syst., 21(12):2295–2306, 2013. 13, 14

[KT97] J. A. Kash and J. C. Tsang. Dynamic internal testing of CMOS circuits using hot luminescence. IEEE Electron Device Letters, 18(7):330–332, July 1997. 59

[Len96] Arjen K. Lenstra. Memo on RSA signature generation in the presence of faults, 1996. manuscript. 13, 15

[LMH+98] Jie Liu, Feng Ma, Mingdong Hou, Youmei Sun, Jinming Quan, Yanpin Zhou, Yujin Zhong, Jinde Fan, Zhaoyun Chen, and Fuyin Feng. Heavy ion induced single event effects in semiconductor device. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms, 135(1):239–243, 1998. 59

[LSG+10] Yang Li, Kazuo Sakiyama, Shigeto Gomisawa, Toshinori Fukunaga, Junko Takahashi, and Kazuo Ohta. Fault Sensitivity Analysis. In Cryptographic Hardware and Embedded Systems, CHES 2010, 12th International Workshop, Santa Barbara, CA, USA, August 17-20, 2010. Proceedings, pages 320–334, 2010. 86

[LSG+18] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. Meltdown. CoRR, abs/1801.01207, 2018. 48

[MDR+10] Amir Pasha Mirbaha, Jean-Max Dutertre, Anne-Lise Ribotta, Michel Agoyan, Assia Tria, and David Naccache. Single-Bit DFA Using Multiple-Byte Laser Fault Injection. In 10th IEEE International Conference on Technologies for Homeland Security, Boston, United States, November 2010. 60, 64, 86

[MGdC+18] Pieter Maene, Johannes Götzfried, Ruan de Clercq, Tilo Müller, Felix C. Freiling, and Ingrid Verbauwhede. Hardware-Based Trusted Computing Architectures for Isolation and Attestation. IEEE Trans. Computers, 67(3):361–374, 2018.

[MKP11] Amir Moradi, Markus Kasper, and Christof Paar. On the Portability of Side-Channel Attacks - An Analysis of the Xilinx Virtex 4 and Virtex 5 Bitstream Encryption Mechanism. IACR Cryptology ePrint Archive, 2011:391, 2011. 48

[MME10] Amir Moradi, Oliver Mischke, and Thomas Eisenbarth. Correlation-Enhanced Power Analysis Collision Attack. In Cryptographic Hardware and Embedded Systems, CHES 2010, 12th International Workshop, Santa Barbara, CA, USA, August 17-20, 2010. Proceedings, pages 125–139, 2010. 86, 92

[MMP+11] Amir Moradi, Oliver Mischke, Christof Paar, Yang Li, Kazuo Ohta, and Kazuo Sakiyama. On the Power of Fault Sensitivity Analysis and Collision Side-Channel Attacks in a Combined Setting. In Cryptographic Hardware and Embedded Systems - CHES 2011 - 13th International Workshop, Nara, Japan, September 28 - October 1, 2011. Proceedings, pages 292–311, 2011. 15, 86, 88

[MMR17] Thorben Moos, Amir Moradi, and Bastian Richter. Static power side-channel analysis of a threshold implementation prototype chip. In David Atienza and Giorgio Di Natale, editors, Design, Automation & Test in Europe Conference & Exhibition, DATE 2017, Lausanne, Switzerland, March 27-31, 2017, pages 1324–1329. IEEE, 2017. 24

[MOP07] Stefan Mangard, Elisabeth Oswald, and Thomas Popp. Power Analysis Attacks: Revealing the Secrets of Smart Cards (Advances in Information Security). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007. 80

[Mor14] Amir Moradi. Side-Channel Leakage through Static Power - Should We Care about in Practice? In Lejla Batina and Matthew Robshaw, editors, Cryptographic Hardware and Embedded Systems - CHES 2014 - 16th International Workshop, Busan, South Korea, September 23-26, 2014. Proceedings, volume 8731 of Lecture Notes in Computer Science, pages 562–579. Springer, 2014. 24

[Mor16] Amir Moradi. Advances in Side-Channel Security, 2016. Habilitation, Ruhr-Universität Bochum, Germany. 25

[MPR+11] Marcel Medwed, Christophe Petit, Francesco Regazzoni, Mathieu Renauld, and François-Xavier Standaert. Fresh Re-keying II: Securing Multiple Parties against Side-Channel and Fault Attacks. In Emmanuel Prouff, editor, Smart Card Research and Advanced Applications - 10th IFIP WG 8.8/11.2 International Conference, CARDIS 2011, Leuven, Belgium, September 14-16, 2011, Revised Selected Papers, volume 7079 of Lecture Notes in Computer Science, pages 115–132. Springer, 2011. 13

[MRR+15] Ramya Jayaram Masti, Devendra Rai, Aanjhan Ranganathan, Christian Müller, Lothar Thiele, and Srdjan Capkun. Thermal Covert Channels on Multi-core Platforms. In 24th USENIX Security Symposium, USENIX Security 15, Washington, D.C., USA, August 12-14, 2015, pages 865–880, 2015.

[MSGR10] Marcel Medwed, François-Xavier Standaert, Johann Großschädl, and Francesco Regazzoni. Fresh Re-keying: Security against Side-Channel and Fault Attacks for Low-Cost Devices. In Daniel J. Bernstein and Tanja Lange, editors, Progress in Cryptology - AFRICACRYPT 2010, Third International Conference on Cryptology in Africa, Stellenbosch, South Africa, May 3-6, 2010. Proceedings, volume 6055 of Lecture Notes in Computer Science, pages 279–296. Springer, 2010. 13

[MSPB18] Antun Maldini, Niels Samwel, Stjepan Picek, and Lejla Batina. Genetic algorithm-based electromagnetic fault injection. In 2018 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2018, Amsterdam, The Netherlands, September 13, 2018. IEEE Computer Society, 2018. 68


[MSS06] Amir Moradi, Mohammad T. Manzuri Shalmani, and Mahmoud Salmasizadeh. A Generalized Method of Differential Fault Attack Against AES Cryptosystem. In Cryptographic Hardware and Embedded Systems - CHES 2006, 8th International Workshop, Yokohama, Japan, October 10-13, 2006, Proceedings, pages 91–100, 2006. 15

[Muk09] Debdeep Mukhopadhyay. An Improved Fault Based Attack of the Advanced Encryption Standard. In Bart Preneel, editor, Progress in Cryptology - AFRICACRYPT 2009, Second International Conference on Cryptology in Africa, Gammarth, Tunisia, June 21-25, 2009. Proceedings, volume 5580 of Lecture Notes in Computer Science, pages 421–434. Springer, 2009. 15

[MvOV96] Alfred Menezes, Paul C. van Oorschot, and Scott A. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996. 49, 50

[MW15] Amir Moradi and Alexander Wild. Assessment of Hiding the Higher-Order Leakages in Hardware - What Are the Achievements Versus Overheads? In Tim Güneysu and Helena Handschuh, editors, Cryptographic Hardware and Embedded Systems - CHES 2015 - 17th International Workshop, Saint-Malo, France, September 13-16, 2015, Proceedings, volume 9293 of Lecture Notes in Computer Science, pages 453–474. Springer, 2015. 13

[Nag12] Christoph Nagl. Exploiting the Virtex 6 System Monitor for Power-Analysis Attacks. Master’s thesis, IAIK - Graz University of Technology, 2012. 25

[NDA+01] Siva Narendra, Vivek De, Dimitri Antoniadis, Anantha Chandrakasan, and Shekhar Borkar. Scaling of stack effect and its application for leakage reduction. In Enrico Macii, Vivek De, and Mary Jane Irwin, editors, Proceedings of the 2001 International Symposium on Low Power Electronics and Design, 2001, Huntington Beach, California, USA, 2001, pages 195–200. ACM, 2001. 23

[NFR07] Giorgio Di Natale, Marie-Lise Flottes, and Bruno Rouzeyre. An On-Line Fault Detection Scheme for SBoxes in Secure Circuits. In 13th IEEE International On-Line Testing Symposium (IOLTS 2007), 8-11 July 2007, Heraklion, Crete, Greece, pages 57–62, 2007. 16

[OC15] Colin O’Flynn and Zhizhang Chen. Synchronous sampling and clock recovery of internal oscillators for side channel analysis and fault injection. J. Cryptographic Engineering, 5(1):53–69, 2015. 25, 34

[OKKG05] Vitalij Ocheretnij, G. Kouznetsov, Ramesh Karri, and Michael Gössel. On-Line Error Detection and BIST for the AES Encryption Algorithm with Different S-Box Implementations. In IOLTS, pages 141–146. IEEE Computer Society, 2005. 16

[OR14] David Oswald and Bastian Richter. SCATools–Open tools for side-channel analysis and related techniques, 2014. available at https://github.com/emsec/SCATools. 73


[OSS+13] David Oswald, Daehyun Strobel, Falk Schellenberg, Timo Kasper, and Christof Paar. When Reverse-Engineering Meets Side-Channel Analysis - Digital Lockpicking in Practice. In Selected Areas in Cryptography - SAC 2013 - 20th International Conference, Burnaby, BC, Canada, August 14-16, 2013, Revised Selected Papers, pages 571–588, 2013.

[Osw13] David Oswald. Implementation Attacks: From Theory to Practice, 2013. Dissertation, Ruhr-Universität Bochum, Germany. 25

[PBBJ15] Stjepan Picek, Lejla Batina, Pieter Buzing, and Domagoj Jakobovic. Fault Injection with a New Flavor: Memetic Algorithms Make a Difference. In Stefan Mangard and Axel Y. Poschmann, editors, Constructive Side-Channel Analysis and Secure Design - 6th International Workshop, COSADE 2015, Berlin, Germany, April 13-14, 2015. Revised Selected Papers, volume 9064 of Lecture Notes in Computer Science, pages 159–173. Springer, 2015. 68

[PBJC14] Stjepan Picek, Lejla Batina, Domagoj Jakobovic, and Rafael Boix Carpi. Evolving genetic algorithms for fault injection attacks. In 37th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2014, Opatija, Croatia, May 26-30, 2014, pages 1106–1111. IEEE, 2014. 68

[PJA88] Ronald L. Pease, Allan H. Johnston, and Joseph L. Azarewicz. Radiation testing of semiconductor devices for space electronics. Proceedings of the IEEE, 76(11):1510–1526, Nov 1988. 59

[PLL+99] V. Pouget, H. Lapuyade, D. Lewis, Y. Deval, P. Fouillat, and L. Sarger. SPICE modeling of the transient response of irradiated MOSFETs. In 1999 Fifth European Conference on Radiation and Its Effects on Components and Systems. RADECS 99 (Cat. No.99TH8471), pages 69–74, 1999. 61

[PQ03] Gilles Piret and Jean-Jacques Quisquater. A Differential Fault Attack Technique against SPN Structures, with Application to the AES and KHAZAD. In Colin D. Walter, Çetin Kaya Koç, and Christof Paar, editors, Cryptographic Hardware and Embedded Systems - CHES 2003, 5th International Workshop, Cologne, Germany, September 8-10, 2003, Proceedings, volume 2779 of Lecture Notes in Computer Science, pages 77–88. Springer, 2003. 15

[PSKM15] Santos Merino Del Pozo, François-Xavier Standaert, Dina Kamel, and Amir Moradi. Side-channel attacks from static power: when should we care? In Wolfgang Nebel and David Atienza, editors, Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, DATE 2015, Grenoble, France, March 9-13, 2015, pages 145–150. ACM, 2015. 24

[PTH+15] Athanasios Papadimitriou, Marios Tampas, David Hély, Vincent Beroulle, Paolo Maistri, and Régis Leveugle. Validation of RTL laser fault injection model with respect to layout information. In IEEE International Symposium on Hardware Oriented Security and Trust, HOST 2015, Washington, DC, USA, 5-7 May, 2015, pages 78–81. IEEE Computer Society, 2015. 61


[QS02] Jean-Jacques Quisquater and David Samyde. Eddy current for magnetic analysis with active sensor. In Proceedings of Esmart, 2002. 14

[RDT13] Cyril Roscian, Jean-Max Dutertre, and Assia Tria. Frontside laser fault injection on cryptosystems - Application to the AES’ last round -. In 2013 IEEE International Symposium on Hardware-Oriented Security and Trust, HOST 2013, Austin, TX, USA, June 2-3, 2013, pages 119–124. IEEE Computer Society, 2013. 63

[RF79] Lord Rayleigh F.R.S. XXXI. Investigations in optics, with special reference to the spectroscope. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 8(49):261–274, 1879. 64, 81

[Ris17a] Riscure BV. Datasheet: IR ring light with camera, 2017. available at https://www.riscure.com/uploads/2017/07/ringlight_datasheet.pdf, accessed June 27, 2018. 81

[Ris17b] Riscure BV. Datasheet: Twin Scan Laser Station 2 upgrade - Easy to use dual spot laser fault injection v1, 2017. available at https://www.riscure.com/uploads/2017/08/Twin-Scan-LS2-Upgrade-datasheet-v1.1.pdf, accessed June 27, 2018. 70

[RP10] Matthieu Rivain and Emmanuel Prouff. Provably Secure Higher-Order Masking of AES. In Stefan Mangard and François-Xavier Standaert, editors, Cryptographic Hardware and Embedded Systems, CHES 2010, 12th International Workshop, Santa Barbara, CA, USA, August 17-20, 2010. Proceedings, volume 6225 of Lecture Notes in Computer Science, pages 413–427. Springer, 2010. 13

[RPD+18] Chethan Ramesh, Shivukumar B. Patil, Siva Nishok Dhanuskodi, George Provelengios, Sébastien Pillement, Daniel Holcomb, and Russell Tessier. FPGA Side Channel Attacks without Physical Access. In 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2018, Boulder, CO, USA, April 29 - May 1, 2018, pages 45–52. IEEE Computer Society, 2018. 27, 99

[RSDT13] Cyril Roscian, Alexandre Sarafianos, Jean-Max Dutertre, and Assia Tria. Fault Model Analysis of Laser-Induced Faults in SRAM Memory Cells. In 2013 Workshop on Fault Diagnosis and Tolerance in Cryptography, Los Alamitos, CA, USA, August 20, 2013, pages 89–98, 2013. 60, 64, 70, 86

[RSWO17] Eyal Ronen, Adi Shamir, Achi-Or Weingarten, and Colin O’Flynn. IoT Goes Nuclear: Creating a ZigBee Chain Reaction. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pages 195–212, 2017. 48

[SA02] Sergei P. Skorobogatov and Ross J. Anderson. Optical Fault Induction Attacks. In Cryptographic Hardware and Embedded Systems - CHES 2002, 4th International Workshop, Redwood Shores, CA, USA, August 13-15, 2002, Revised Papers, pages 2–12, 2002. 14, 59


[SBHS15] Bodo Selmke, Stefan Brummer, Johann Heyszl, and Georg Sigl. Precise Laser Fault Injections into 90 nm and 45 nm SRAM-cells. In Smart Card Research and Advanced Applications - 14th International Conference, CARDIS 2015, Bochum, Germany, November 4-6, 2015. Revised Selected Papers, pages 193–205, 2015. 60, 64, 86

[SBO+15] Daehyun Strobel, Florian Bache, David Oswald, Falk Schellenberg, and Christof Paar. SCANDALee: A side-ChANnel-based DisAssembLer using Local Electromagnetic Emanations. In Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, DATE 2015, Grenoble, France, March 9-13, 2015, pages 139–144, 2015. 25

[Sch10] Falk Schellenberg. Comparing Power and Electromagnetic Analysis of Embedded Devices, 2010. Bachelor’s Thesis, Ruhr-Universität Bochum. 25

[SDK+13] Daehyun Strobel, Benedikt Driessen, Timo Kasper, Gregor Leander, David Oswald, Falk Schellenberg, and Christof Paar. Fuming Acid and Cryptanalysis: Handy Tools for Overcoming a Digital Locking and Access Control System. In Advances in Cryptology - CRYPTO 2013 - 33rd Annual Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2013. Proceedings, Part I, pages 147–164, 2013.

[SFG+16] Falk Schellenberg, Markus Finkeldey, Nils Gerhardt, Martin Hofmann, Amir Moradi, and Christof Paar. Large Laser Spots and Fault Sensitivity Analysis. In 2016 IEEE International Symposium on Hardware Oriented Security and Trust, HOST 2016, McLean, VA, USA, May 3-5, 2016, pages 203–208, 2016. 85

[SFR+15] Falk Schellenberg, Markus Finkeldey, Bastian Richter, Maximilian Schäpers, Nils Gerhardt, Martin Hofmann, and Christof Paar. On the Complexity Reduction of Laser Fault Injection Campaigns Using OBIC Measurements. In 2015 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2015, Saint Malo, France, September 13, 2015, pages 14–27, 2015. 14, 67

[SGD08] Nidhal Selmane, Sylvain Guilley, and Jean-Luc Danger. Practical Setup Time Violation Attacks on AES. In Seventh European Dependable Computing Conference, EDCC-7 2008, Kaunas, Lithuania, 7-9 May 2008, pages 91–96. IEEE Computer Society, 2008. 14

[SGMT18a] Falk Schellenberg, Dennis R. E. Gnad, Amir Moradi, and Mehdi B. Tahoori. Remote Inter-Chip Power Analysis Side-Channel Attacks at Board-Level. In 2018 International Conference On Computer Aided Design, ICCAD 2018, San Diego, CA, USA, November 5-8, 2018. 47, 49, 50

[SGMT18b] Falk Schellenberg, Dennis R. E. Gnad, Amir Moradi, and Mehdi Baradaran Tahoori. An Inside Job: Remote Power Analysis Attacks on FPGAs. In 2018 Design, Automation & Test in Europe Conference & Exhibition, DATE 2018, Dresden, Germany, March 19-23, 2018, pages 1111–1116, 2018. 27, 29, 33, 99


[SHS16] Bodo Selmke, Johann Heyszl, and Georg Sigl. Attack on a DFA Protected AES by Simultaneous Laser Fault Injections. In 2016 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2016, Santa Barbara, CA, USA, August 16, 2016, pages 36–46. IEEE Computer Society, 2016. 14

[Sko05] Sergei P. Skorobogatov. Semi-invasive attacks – A new approach to hardware security analysis. Technical Report UCAM-CL-TR-630, University of Cambridge, Computer Laboratory, April 2005. 69

[Sko06] Sergei P. Skorobogatov. Optically Enhanced Position-Locked Power Analysis. In Louis Goubin and Mitsuru Matsui, editors, Cryptographic Hardware and Embedded Systems - CHES 2006, 8th International Workshop, Yokohama, Japan, October 10-13, 2006, Proceedings, volume 4249 of Lecture Notes in Computer Science, pages 61–75. Springer, 2006. 69

[Sko10] Sergei Skorobogatov. Flash Memory ‘Bumping’ Attacks. In Stefan Mangard and François-Xavier Standaert, editors, Cryptographic Hardware and Embedded Systems, CHES 2010, 12th International Workshop, Santa Barbara, CA, USA, August 17-20, 2010. Proceedings, volume 6225 of Lecture Notes in Computer Science, pages 158–172. Springer, 2010. 69

[SMC09] Dhiman Saha, Debdeep Mukhopadhyay, and Dipanwita Roy Chowdhury. A Diagonal Fault Attack on the Advanced Encryption Standard. IACR Cryptology ePrint Archive, 2009:581, 2009. 15

[SNK+12] Alexander Schlösser, Dmitry Nedospasov, Juliane Krämer, Susanna Orlic, and Jean-Pierre Seifert. Simple Photonic Emission Analysis of AES - Photonic Side Channel Analysis for the Rest of Us. In Emmanuel Prouff and Patrick Schaumont, editors, Cryptographic Hardware and Embedded Systems - CHES 2012 - 14th International Workshop, Leuven, Belgium, September 9-12, 2012. Proceedings, volume 7428 of Lecture Notes in Computer Science, pages 41–57. Springer, 2012. 59

[SOR+14] Daehyun Strobel, David Oswald, Bastian Richter, Falk Schellenberg, and Christof Paar. Microcontrollers as (In)Security Devices for Pervasive Computing Applications. Proceedings of the IEEE, 102(8):1157–1173, 2014.

[SSS+17] Takeshi Sugawara, Natsu Shoji, Kazuo Sakiyama, Kohei Matsuda, Noriyuki Miura, and Makoto Nagata. Exploiting Bitflip Detector for Non-invasive Probing and its Application to Ineffective Fault Analysis. In 2017 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2017, Taipei, Taiwan, September 25, 2017, pages 49–56. IEEE Computer Society, 2017. 60

[SW90] Keith C. Stevens and Thomas J. Wilson. Locating IC defects in process monitors and test structures using optical beam induced current. Microelectronic Engineering, 12(1):397–404, 1990. Special Issue on Electron and Optical Beam Testing of Integrated Circuits. 69, 81

[SZK+18] Bodo Selmke, Kilian Zinnecker, Philipp Koppermann, Katja Miller, Johann Heyszl, and Georg Sigl. Latch-up-locked? - An empirical study on laser fault injection into ARM Cortex-M processors. In 2018 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2018, Amsterdam, The Netherlands, September 13, 2018. IEEE Computer Society, 2018. 60

[TAV02] Kris Tiri, Moonmoon Akmal, and Ingrid Verbauwhede. A dynamic and differential CMOS logic with signal independent power consumption to withstand differential power analysis on smart cards. In Proceedings of the 28th European Solid-State Circuits Conference, pages 403–406, Sept 2002. 13

[TBM14] Harshal Tupsamudre, Shikha Bisht, and Debdeep Mukhopadhyay. Destroying Fault Invariant with Randomization - A Countermeasure for AES Against Differential Fault Attacks. In CHES, volume 8731 of Lecture Notes in Computer Science, pages 93–111. Springer, 2014. 16

[TLSB17] Shahin Tajik, Heiko Lohrke, Jean-Pierre Seifert, and Christian Boit. On the Power of Optical Contactless Probing: Attacking Bitstream Encryption of FPGAs. In Bhavani M. Thuraisingham, David Evans, Tal Malkin, and Dongyan Xu, editors, Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017, pages 1661–1674. ACM, 2017. 59

[TM17] Steve Trimberger and Steve McNeil. Security of FPGAs in data centers. In IEEE 2nd International Verification and Security Workshop, IVSW 2017, Thessaloniki, Greece, July 3-5, 2017, pages 117–122, 2017. 31

[TSS17] Adrian Tang, Simha Sethumadhavan, and Salvatore J. Stolfo. CLKSCREW: Exposing the Perils of Security-Oblivious Energy Management. In 26th USENIX Security Symposium, USENIX Security 2017, Vancouver, BC, Canada, August 16-18, 2017, pages 1057–1074, 2017. 30

[Tur99] Ray Turner. System-Level Verification - A Comparison of Approaches. In Proceedings of the Tenth IEEE International Workshop on Rapid System Prototyping (RSP 1999), Clearwater, Florida, USA, June 16-18, 1999, pages 154–159, 1999. 48

[TV04] Kris Tiri and Ingrid Verbauwhede. A Logic Level Design Methodology for a Secure DPA Resistant ASIC or FPGA Implementation. In 2004 Design, Automation and Test in Europe Conference and Exposition (DATE 2004), 16-20 February 2004, Paris, France, pages 246–251. IEEE Computer Society, 2004. 13

[ULT07] ULTRA TEC Manufacturing, Inc. Selected Area Preparation - Solutions for Decapsulation, Substrate Thinning and Polishing, 2007. available at http://www.ultratecusa.com/sites/default/files/content_files/ASAP-1Brochurelowres-S-10-07.pdf. 64

[VKS11] Ingrid Verbauwhede, Dusko Karaklajic, and Jörn-Marc Schmidt. The Fault Attack Jungle - A Classification Model to Guide You. In Luca Breveglieri, Sylvain Guilley, Israel Koren, David Naccache, and Junko Takahashi, editors, 2011 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2011, Tokyo, Japan, September 29, 2011, pages 3–8. IEEE Computer Society, 2011. 14

[VTM+17] Aurelien Vasselle, Hugues Thiebeauld, Quentin Maouhoub, Adele Morisset, and Sebastien Ermeneux. Laser-Induced Fault Injection on Smartphone Bypassing the Secure Boot. In 2017 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2017, Taipei, Taiwan, September 25, 2017, pages 41–48. IEEE Computer Society, 2017. 60

[vWWM11] Jasper G. J. van Woudenberg, Marc F. Witteman, and Federico Menarini. Practical Optical Fault Injection on Secure Microcontrollers. In 2011 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2011, Tokyo, Japan, September 29, 2011, pages 91–99, 2011. 68, 70

[WDHB17] Paul N. Whatmough, Shidhartha Das, Zacharias Hadjilambrou, and David M. Bull. Power Integrity Analysis of a 28 nm Dual-Core ARM Cortex-A57 Cluster Using an All-Digital Power Delivery Monitor. J. Solid-State Circuits, 52(6):1643–1654, 2017. 53

[Web96] Robert H. Webb. Confocal optical microscopy. Reports on Progress in Physics, 59(3):427, 1996. 71

[WH10] Neil Weste and David Harris. CMOS VLSI Design: A Circuits and Systems Perspective. Addison-Wesley Publishing Company, USA, 4th edition, 2010. 22, 23, 24

[WM87] T. Wilson and E. M. McCabe. Theory of optical beam induced current images of defects in semiconductors. Journal of Applied Physics, 61(1):191–195, 1987. 69, 81

[WMG18] Alexander Wild, Amir Moradi, and Tim Güneysu. GliFreD: Glitch-Free Duplication Towards Power-Equalized Circuits on FPGAs. IEEE Trans. Computers, 67(3):375–387, 2018. 13

[Wra91] John C. Wray. An Analysis of Covert Timing Channels. In Proceedings. 1991 IEEE Computer Society Symposium on Research in Security and Privacy, pages 2–7, 1991.

[Xil10] Xilinx, Inc. Spartan-6 FPGA Configurable Logic Block - User Guide, UG384 (v1.1), 2010. https://www.xilinx.com/support/documentation/user_guides/ug384.pdf. 20, 21

[Xil13] Xilinx, Inc. Introduction to FPGA Design with Vivado High-Level Synthesis, UG998 (v1.0), 2013. https://www.xilinx.com/support/documentation/sw_manuals/ug998-vivado-intro-fpga-design-hls.pdf. 20

[Xil18] Xilinx, Inc. Zynq-7000 SoC (Z-7007S, Z-7012S, Z-7014S, Z-7010, Z-7015, and Z-7020): DC and AC Switching Characteristics, DS187 (v1.20.1), 2018. https://www.xilinx.com/support/documentation/data_sheets/ds187-XC7Z010-XC7Z020-Data-Sheet.pdf. 44


[YF14] Yuval Yarom and Katrina Falkner. FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. In Kevin Fu and Jaeyeon Jung, editors, Proceedings of the 23rd USENIX Security Symposium, San Diego, CA, USA, August 20-22, 2014, pages 719–732. USENIX Association, 2014. 48

[YJ00] Sung-Ming Yen and Marc Joye. Checking Before Output May Not Be Enough Against Fault-Based Cryptanalysis. IEEE Trans. Computers, 49(9):967–970, 2000. 15, 16

[YPER99] Wai Mun Yee, M. Paniccia, T. Eiles, and V. Rao. Laser voltage probe (LVP): a novel optical probing technology for flip-chip packaged microprocessors. In Proceedings of the 1999 7th International Symposium on the Physical and Failure Analysis of Integrated Circuits (Cat. No.99TH8394), pages 15–20, July 1999. 59

[Zon14] Andrew Zonenberg. Getting my feet wet with invasive attacks, part 2: The attack, 2014. available at http://siliconexposed.blogspot.com/2014/03/getting-my-feet-wet-with-invasive_31.html. 59

[ZS18] Mark Zhao and G. Edward Suh. FPGA-Based Remote Power Side-Channel Attacks. In 2018 IEEE Symposium on Security and Privacy, SP 2018, Proceedings, 21-23 May 2018, San Francisco, California, USA, pages 229–244. IEEE, 2018. 27, 28, 44, 49, 99

[ZSZF13] Kenneth M. Zick, Meeta Srivastav, Wei Zhang, and Matthew French. Sensing nanosecond-scale voltage attacks and natural transients in FPGAs. In The 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’13, Monterey, CA, USA, February 11-13, 2013, pages 101–104, 2013. 26

122 List of Abbreviations

ADC Analog to Digital Converter
AES Advanced Encryption Standard
ASIC Application Specific Integrated Circuit
AXI Advanced eXtensible Interface
BRAM Block RAM
CED Concurrent Error Detection
CLB Configurable Logic Block
CMOS Complementary Metal Oxide Semiconductor
CPA Correlation Power Analysis
CPU Central Processing Unit
DCM Digital Clock Manager
DES Data Encryption Standard
DFA Differential Fault Analysis
DPA Differential Power Analysis
DRAM Dynamic Random Access Memory
DSP Digital Signal Processing
DUT Device Under Test
EM Electromagnetic
ESR Equivalent Series Resistance
FFT Fast Fourier Transformation
FPGA Field Programmable Gate Array
FSA Fault Sensitivity Analysis
HSM Hardware Security Module
IC Integrated Circuit
ILA Integrated Logic Analyzer
IP Intellectual Property
LFI Laser Fault Injection
LIVA Light-Induced Voltage Alteration
LUT Look-Up Table
NA Numerical Aperture
NIR Near Infrared
nMOS n-channel Metal Oxide Semiconductor
OBIC Optical Beam Induced Current
OBIRCH Optical Beam Induced Resistance Change
PCB Printed Circuit Board
PCIe Peripheral Component Interconnect Express
PDN Power Distribution Network
PMIC Power Management Integrated Circuit
pMOS p-channel Metal Oxide Semiconductor
RAM Random Access Memory
RF Radio Frequency
RSA Rivest, Shamir, and Adleman (cryptosystem)
SEI Seebeck Effect Imaging
SEM Scanning Electron Microscopy
SET Single Event Transient
SEU Single Event Upset
SMF Single Mode Fiber
SoC System on Chip
SPA Simple Power Analysis
SRAM Static Random Access Memory
TIVA Thermally-Induced Voltage Alteration
TRNG True Random Number Generator
UART Universal Asynchronous Receiver Transmitter
USB Universal Serial Bus

List of Figures

3.1 FPGA overview consisting of CLBs, IO-cells, switch matrices, and a functional description of a two-input/one-output LUT
3.2 Section of a Xilinx Spartan6 SliceL
3.3 Lumped model of a power distribution network
3.4 Current flow at a CMOS inverter when charging and discharging its output capacitance
3.6 Tapped delay line
3.7 Floorplan (rotated right) of one voltage sensor with 18×(LUT, Latch) as part of the initial delay
3.8 Ring oscillator as voltage sensor

4.1 Architecture of the AES encryption core
4.2 Floorplans showing the experimental setup with all relevant parts
4.3 Single traces measured using an oscilloscope and using our developed sensor at different sampling frequencies
4.4 Results using the oscilloscope and the internal sensor at different sampling frequencies
4.5 Correlation using 5 000 traces from the internal sensor placed far away from the AES module
4.6 Trace and CPA attack of AES implemented on the Basys3 development board
4.7 Trace and CPA attack of AES implemented in the FPGA fabric of the PYNQ-Z1

5.1 Binary exponentiation on the ARM core captured with the sensor in fabric
5.2 Spectrogram of the binary exponentiation on the ARM core captured with the sensor in fabric

6.1 Averaged traces measured during AES using the voltage sensors at different sampling frequencies
6.2 CPA attack on AES: Results to estimate the sensor quality at different sampling rates
6.3 CPA attack on AES: Progressive curves over the number of traces
6.4 Binary exponentiation for RSA captured with the voltage sensor on a separate FPGA
6.5 Detail of the binary exponentiation captured with the voltage sensor after applying a 900 kHz low-pass filter

7.1 Effect of a laser beam hitting a p-n junction
7.2 Cross section of an inverter implemented in CMOS technology, effects of laser fault injection


7.3 Schematic and layout of an SRAM cell with six transistors, marked locations sensitive to laser fault injection
7.4 SRAM of an Atmel ATXmega16A4U microcontroller, SEM image of the die and locations of faults
7.5 Photon energy and absorption depth (dashed) in silicon at 300 K

8.1 Block diagram of the confocal laser scanning setup
8.2 Schematic of the used laser scanning microscope
8.3 Captured local EM signal of the ATXmega16A4U
8.4 OBIC over the z-axis
8.5 Measurement of the OBIC assembled into an image
8.6 OBIC x/y in detail
8.7 Correlation of a single pattern resulting in multiple spikes
8.8 Identified flip-flops marked in an OBIC image
8.9 Correlation for the Hamming distance between consecutive state bytes at the input of the last round after KeyAddition
8.10 Locations found sensitive to LFI within the previously isolated areas
8.11 Comparison between OBIC and reflective measurements
8.12 SEM image of a NAND gate of an ATXMega32

9.1 General structure of clocked combinatorial logic between two registers
9.2 ATxmega16A4U backside image using NIR illumination
9.3 Timing diagram of the laser fault injection
9.4 Percentage of different faulty values at the output of the Sbox for increasing laser pulse length
9.5 Correlation collision for laser-based FSA at different fault percentages and different numbers of tests

About the Author

Author information as of October 2018.

Personal Data

Name Falk Schellenberg

Address Chair for Embedded Security, Universitätsstr. 150, ID 2/615, 44780 Bochum, Germany

E-Mail [email protected]

Date of birth September 24, 1987

Place of birth Gera, Germany

Education

Since 01/2013 PhD student, Ruhr-Universität Bochum, Electrical and Information Engineering.

10/2010 - 12/2012 M.Sc., Ruhr-Universität Bochum, IT Security/Information Engineering.

10/2007 - 09/2010 B.Sc., Ruhr-Universität Bochum, IT Security/Information Engineering.

Professional Experience

Since 01/2013 Research Assistant, Ruhr-Universität Bochum.

01/2011 - 12/2012 Student Assistant, Ruhr-Universität Bochum.

10/2010 - 11/2010 Intern (working student, "Werkstudent"), ESCRYPT, Bochum.

07/2010 - 09/2010 Intern, ESCRYPT, Bochum.

Publications and Academic Activities

Peer-Reviewed Journal Papers

• Lena Göring, Markus Finkeldey, Falk Schellenberg, Carsten Brenner, Martin R. Hofmann, and Nils C. Gerhardt. Optical metrology for the investigation of buried technical structures. tm - Technisches Messen, 85(2):104–110, 2017

• Daehyun Strobel, David Oswald, Bastian Richter, Falk Schellenberg, and Christof Paar. Microcontrollers as (In)Security Devices for Pervasive Computing Applications. Proceedings of the IEEE, 102(8):1157–1173, 2014

Peer-Reviewed Conference Proceedings

• Falk Schellenberg, Dennis R. E. Gnad, Amir Moradi, and Mehdi B. Tahoori. Remote Inter-Chip Power Analysis Side-Channel Attacks at Board-Level. In 2018 International Conference On Computer Aided Design, ICCAD 2018, San Diego, CA, USA, November 5-8, 2018

• Shahrzad Keshavarz, Falk Schellenberg, Bastian Richter, Christof Paar, and Daniel Holcomb. SAT-based reverse engineering of gate-level schematics using fault injection and probing. In 2018 IEEE International Symposium on Hardware Oriented Security and Trust, HOST 2018, Washington, DC, USA, April 30 - May 4, 2018, pages 215–220. IEEE Computer Society, 2018

• Falk Schellenberg, Dennis R. E. Gnad, Amir Moradi, and Mehdi Baradaran Tahoori. An Inside Job: Remote Power Analysis Attacks on FPGAs. In 2018 Design, Automation & Test in Europe Conference & Exhibition, DATE 2018, Dresden, Germany, March 19-23, 2018, pages 1111–1116, 2018

• Markus Finkeldey, Lena Göring, Falk Schellenberg, Nils C. Gerhardt, and Martin Hofmann. Backside imaging of a microcontroller with common-path digital holography. In Proc. SPIE 10127, Practical Holography XXXI: Materials and Applications, 1012704, 2017

• Markus Finkeldey, Lena Göring, Falk Schellenberg, Carsten Brenner, Nils C. Gerhardt, and Martin R. Hofmann. Multimodal backside imaging of a microcontroller using confocal laser scanning and optical-beam-induced current imaging. In Proc. SPIE 10110, Photonic Instrumentation Engineering IV, 101101F, 2017

• Falk Schellenberg, Markus Finkeldey, Nils Gerhardt, Martin Hofmann, Amir Moradi, and Christof Paar. Large Laser Spots and Fault Sensitivity Analysis. In 2016 IEEE International Symposium on Hardware Oriented Security and Trust, HOST 2016, McLean, VA, USA, May 3-5, 2016, pages 203–208, 2016


• Markus Finkeldey, Falk Schellenberg, Nils C. Gerhardt, Christof Paar, and Martin R. Hofmann. Common-path depth-filtered digital holography for high resolution imaging of buried semiconductor structures. In Proc. SPIE 9771, Practical Holography XXX: Materials and Applications, 97710G, 2016

• Falk Schellenberg, Markus Finkeldey, Bastian Richter, Maximilian Schäpers, Nils Gerhardt, Martin Hofmann, and Christof Paar. On the Complexity Reduction of Laser Fault Injection Campaigns Using OBIC Measurements. In 2015 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2015, Saint Malo, France, September 13, 2015, pages 14–27, 2015

• Daehyun Strobel, Florian Bache, David Oswald, Falk Schellenberg, and Christof Paar. SCANDALee: A side-ChANnel-based DisAssembLer using Local Electromagnetic Emanations. In Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, DATE 2015, Grenoble, France, March 9-13, 2015, pages 139–144, 2015

• David Oswald, Daehyun Strobel, Falk Schellenberg, Timo Kasper, and Christof Paar. When Reverse-Engineering Meets Side-Channel Analysis - Digital Lockpicking in Practice. In Selected Areas in Cryptography - SAC 2013 - 20th International Conference, Burnaby, BC, Canada, August 14-16, 2013, Revised Selected Papers, pages 571–588, 2013

• Daehyun Strobel, Benedikt Driessen, Timo Kasper, Gregor Leander, David Oswald, Falk Schellenberg, and Christof Paar. Fuming Acid and Cryptanalysis: Handy Tools for Overcoming a Digital Locking and Access Control System. In Advances in Cryptology - CRYPTO 2013 - 33rd Annual Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2013. Proceedings, Part I, pages 147–164, 2013

Technical Reports

• Anita Aghaie, Amir Moradi, Shahram Rasoolzadeh, Falk Schellenberg, and Tobias Schneider. Impeccable Circuits. IACR Cryptology ePrint Archive, 2018:203, 2018

Invited Talks

• The Technology Impact on Laser Fault Injection. 2nd International Verification and Security Workshop (IVSW), July 3-5, 2017, Thessaloniki, Greece


Awards

04/2018 DATE 2018: Nominated for Best-Paper Award. Design, Automation & Test in Europe Conference & Exhibition 2018, Dresden, Germany

05/2016 HOST 2016: Best Student-Paper Award. 2016 IEEE International Symposium on Hardware Oriented Security and Trust, McLean, VA, USA

11/2013 2nd place, CAST-Förderpreis IT-Sicherheit 2013, category Master’s theses, CAST e.V., Darmstadt, Germany

Program Committee Member

• MAL-IoT 2018 Malicious Software and Hardware in Internet of Things, Ischia, Italy

• MAL-IoT 2017 Malicious Software and Hardware in Internet of Things, Siena, Italy

Participation in Selected Conferences, Workshops and Summer Schools

• ICCAD, 2018, San Diego, USA (planned)

• CHES, FDTC 2018, Amsterdam, The Netherlands

• DATE, 2018, Dresden, Germany

• CHES, FDTC, PROOFS, 2017, Taipei, Taiwan

• IVSW, 2017, Thessaloniki, Greece

• Nationale Konferenz zur IT-Sicherheitsforschung, 2017, Berlin, Germany

• CHES, FDTC, 2016, Santa Barbara, USA

• HOST, 2016, Washington D.C., USA

• CHES, FDTC, 2015, Saint-Malo, France

• DATE, 2015, Grenoble, France

• CHES, FDTC, 2014, Busan, Republic of Korea

• Joint MEDIAN–TRUDEVICE Open Forum, 2014, Amsterdam, The Netherlands

• CHES, CRYPTO, FDTC, 2013, Santa Barbara, USA

• SAC, 2013, Vancouver, Canada


• 2. IT-Sicherheitskonferenz, 2013, Stralsund, Germany

• Crypto for 2020, 2013, Tenerife, Spain

• CHES, 2012, Leuven, Belgium
