AN AUTOMATABLE WORKFLOW TO ANALYZE AND SECURE

INTEGRATED CIRCUITS AGAINST POWER ANALYSIS ATTACKS

by

KEVIN PERERA

Submitted in partial fulfillment of the requirements for the degree of Master of Science

Thesis Advisor: Dr. Daniel G. Saab

Department of Electrical Engineering and Computer Science

CASE WESTERN RESERVE UNIVERSITY

May, 2017

CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis of Kevin Perera, candidate for the degree of Master of Science.

Committee Chair Dr. Daniel G. Saab

Committee Member Dr. Christos Papachristou

Committee Member Dr. Ming-Chun Huang

Date of Defense March 31st 2017

*We also certify that written approval has been obtained for any proprietary material contained therein.

Table of Contents

List of Tables

List of Figures

Acknowledgements

Abstract

1 Introduction

1.1 Focus of Thesis

2 Background

2.1 Side Channel Attacks

2.1.1 Power Analysis Attacks

2.1.1.1 Simple Power Analysis (SPA)

2.1.1.2 Differential Power Analysis (DPA)

2.2 AES Algorithm

2.2.1 SubBytes sub stage

2.2.2 ShiftRows sub stage

2.2.3 MixColumns sub stage

2.2.4 AddRoundKey sub stage and Rijndael’s key schedule

2.2.5 ENS Method

2.3 Testability

2.3.1 Controllability and Observability

2.3.1.1 SCOAP (Sandia Controllability/Observability Analysis Program)

3 Literature Review

3.1 Security-Aware Design Methodology and Optimization

3.2 Dynamic Voltage and Frequency Scaling

3.3 Converter Reshuffling Power Management

3.4 Inductive Integrated Voltage Regulator

3.5 Signal Independent Power Consumption CMOS Logic

3.6 Differential Pass Transistor Precharge Logic

3.7 Dual Voltage Single Rail Logic

3.8 Masked Gates

4 Preliminary Gate Architecture Experiments

4.1 Conventional Gates

4.2 Smaller Feature Length Technology

4.3 Complementary Output Gates

4.3.1 Padded Complementary Gates

5 Gate Substitution Workflow

5.1 Synthesis Stage and Analysis Stage

5.2 Substitution Stage

6 Application of Workflow

6.1 RTL Design

6.2 Synthesis and Analysis

6.3 Substitution and Conversion to a Netlist

6.4 SPICE Simulation

6.5 Experiment Results

7 Future Work

8 Conclusion

Appendices

Appendix I: 0.5 micron SPICE Models

Appendix II: 45 nanometer SPICE Models

Appendix III: Conventional Gate Experiment 1 Power Trace

Appendix IV: Conventional Gate Experiment 2 Power Trace

Appendix V: Conventional Gate 45nm Power Trace

Appendix VI: Complementary Gate Power Trace

Appendix VII: Complementary Gate 45nm Power Trace

Appendix VIII: Padded Complementary Gate Power Trace

References

List of Tables

Table 2.1 S-Box lookup table [11]

Table 5.1 Algorithm Stats and System Info

Table 6.1 AES Circuit Statistics

Table 6.2 Changed SPICE options (descriptions from LTSpice help [33])

Table 6.3 Simulation Times and System information

List of Figures

Figure 2.1 Power consumption of an RSA algorithm that exposes its data [4]

Figure 2.2 Sample DPA traces [3]

Figure 2.3 Flow diagram of the AES algorithm [9]

Figure 2.4 Illustration of the SubBytes sub stage [10]

Figure 2.5 Illustration of the ShiftRows sub stage [10]

Figure 2.6 Illustration of Rijndael’s key schedule [11]

Figure 2.7 Illustration of the ENS method for the AES algorithm [12]

Figure 2.8 Calculations of combinational controllability for common gates [19]

Figure 2.9 Calculations of combinational observability for common gates [19]

Figure 3.1 Flow of the proposed algorithm [20]

Figure 3.2 DVFS architecture [21]

Figure 3.3 Power attack and proposed countermeasure [23]

Figure 3.4 Generic N gate and AND/NAND gate using SABL [24]

Figure 3.5 (A) Power trace comparison (B) XOR gate in DPPL [25]

Figure 3.6 DSDL AND gate [27]

Figure 3.7 Masked AND Gate [29]

Figure 4.1 Gate test structure

Figure 4.2 XOR gate

Figure 4.3 Conventional gate power trace 1

Figure 4.4 Conventional gate power trace 2

Figure 4.5 Conventional gate 45nm power trace

Figure 4.6 Complementary XOR gate

Figure 4.7 Complementary gate power trace

Figure 4.8 Complementary gates 45nm power trace

Figure 4.9 Padded complementary gates power trace

Figure 5.1 Flow diagram of workflow

Figure 5.2 OR Gate with obfuscation to counteract power analysis attacks

Figure 6.1 AES algorithm with the vulnerable stage highlighted

Figure 6.2 Flow diagram of the analysis and substitution stages

Figure 6.3 Pre gate substitution power trace

Figure 6.4 Post gate substitution power trace

Figure 6.5 Pre gate substitution power trace (magnified)

Figure 6.6 Post gate substitution power trace (magnified)

Figure 6.7 Stage isolated pre gate substitution power trace

Figure 6.8 Stage isolated post gate substitution power trace

Figure 6.9 Stage isolated pre gate substitution power trace (magnified)

Figure 6.10 Stage isolated post gate substitution power trace (magnified)

Acknowledgements

First and foremost, I would like to thank my advisor, Professor Daniel G. Saab of the Electrical Engineering and Computer Science Department at Case Western Reserve University. He has provided me with constant support and guidance throughout this research. He gave me access to tools that were crucial for this study and helped me whenever I was stuck on a problem. All things considered, his support was integral to the success of this research.

I would like to thank Dr. Christos Papachristou and Dr. Ming-Chun Huang for their support as members of the thesis defense committee. I thank all the professors I have had as teachers during my time at Case Western Reserve University for providing me with valuable knowledge and support. I would also like to thank the Case Western Reserve University School of Graduate Studies and the Electrical Engineering and Computer Science Department for providing me with the opportunity to pursue this research.

Finally, I would like to thank my family for their support and encouragement throughout my time at Case Western Reserve University. Without them, this would not have been possible.

An Automatable Workflow to Analyze and Secure Integrated Circuits Against Power Analysis Attacks

Abstract

By

KEVIN PERERA

This thesis presents a workflow that analyzes and secures a circuit during its synthesis stage, based on a security level specified by the designer. The workflow applies the security measures at the gate level and has three main stages: synthesis, analysis and substitution. After the circuit is synthesized and levelized, the testability measures of controllability and observability are used to identify vulnerable gates.

Once the vulnerable gates are identified, they are replaced with gates that perform the same operation but include measures to prevent power analysis attacks. Results are also presented from preliminary experiments carried out to determine the feasibility of gate-level solutions against power analysis attacks. The workflow was applied to the AES algorithm to verify its effectiveness, and the results of this experiment indicate that applying the workflow substantially increased the circuit’s resilience against power analysis attacks.

1 Introduction

Electronics play a major role in day-to-day life as a cornerstone of modern society. Ever since their origin in the early 20th century, electronics have grown exponentially in development and sophistication. This unprecedented growth has made society increasingly dependent on electronics and computers; while this is not inherently a bad thing, it is crucial that the systems society depends on continue to function safely and reliably, as expected. The widespread use of computing devices means that very sensitive data is processed in integrated circuits. Because of the delicate nature of the tasks some integrated circuits handle, they have become the targets of attackers looking to exploit that information in ways that can cause physical and financial damage on an immense scale. These attacks are divided into two main categories: software-based and hardware-based. Software-based attacks are the more widely publicized ones; they involve viruses, trojans and other malicious software that can cripple a system and/or steal information from it, and they can usually be remedied by patches and add-ons that provide robust software-based protection. Hardware-based attacks are more discreet and occur on a much smaller scale because they often require physical access to the device, hence their relative rarity. Nevertheless, hardware attacks pose a very serious threat to sensitive operations such as military computations and financial transactions, as they cannot be patched during runtime.

The potential victims of these attacks also explain the difference in the effects and exposure of software and hardware attacks. Software attacks usually target the end user, causing damage that is visible to society; such attacks are therefore heavily publicized, which has drawn a lot of attention to software security among legislative bodies [1]. Hardware attacks, on the other hand, usually target government agencies and corporate entities, which tend to suppress information about such attacks due to security concerns.

Therefore, despite the massive damage that hardware attacks can cause, the general public remains mostly unaware of them, resulting in a lack of attention from governing organizations. It has thus become crucial to update hardware design standards in order to maintain the production of reliable computing devices. Hardware security has been an active issue ever since the start of the information age in the mid-twentieth century, but in the past few years it has become crucial to maintaining the integrity of electronic systems.

One way to mitigate the hardware security problem is to use tools and methodologies that are aware of hardware attacks and can therefore help the designer build a circuit that resists them. With tools that detect and protect against potential vulnerabilities in the circuit, designers can choose a good compromise between the circuit’s resistance to hardware attacks and its performance and design parameters, since the relative importance of these two factors can vary significantly depending on the application of the circuit.

1.1 Focus of Thesis

This thesis focuses on an automatable workflow for the detection and mitigation of hardware attacks, specifically power analysis attacks, during the synthesis stage of the design. Of the many stages of circuit development at which security measures can be taken against hardware attacks, the synthesis stage is the first at which effective measures can be taken without affecting the operation of the circuit. Using common testability measures, it is possible to search the circuit for potential vulnerabilities and apply security measures to those that exceed a certain threshold. This is the main reason the synthesis stage was chosen for the security modifications.

The AES circuit, which implements one of the most widely used encryption schemes, is the focus of this research. It is a viable candidate because it is based on a National Institute of Standards and Technology (NIST) specification and is the only publicly accessible cipher approved by the NSA (National Security Agency) for protecting top secret information. Its robust security features make it a prime candidate to demonstrate both the possibility of attacks against it and methods to mitigate such attacks. Software simulations of the electrical properties of the circuit are used to test the effectiveness of the safeguards against hardware attacks. In this study, the individual safeguards are first tested in isolation and then applied to the AES circuit in order to observe their effects in a large circuit at a given security threshold.

2 Background

2.1 Side Channel Attacks

Attacks that target hardware and can be carried out regardless of the operation the circuit is running are known as side channel attacks (SCA). They are hardware-based attacks employed to gain information about a circuit and exploit it without affecting its operation (a noninvasive attack). The potential solutions to these attacks are varied and case specific, which results in partially and suboptimally protected devices being produced purely due to the lack of a proper standard. The increase in side channel attacks over the past couple of decades has two major causes [2]. The first is the increase in skill and resources: hardware exploits require a thorough knowledge of the operation of circuits as well as sophisticated equipment, and these high requirements kept many potential attackers at bay for a long time. However, due to globalization and new innovations in recent years, the required skills and equipment are now readily available, resulting in a larger number of attackers exploiting hardware.

The other major trend that contributes to hardware exploits is the attackers’ realization of the importance of hardware. In an electronic ecosystem, hardware is the most privileged platform, as all software components run on it; by gaining access to the hardware, attackers are therefore able to access all aspects of the software. In recent times, this realization has led many attackers to focus on hardware exploits rather than their software counterparts. Another attraction of side channel attacks for potential attackers is their noninvasive nature: the attacker can extract information from the circuit without disrupting its operation, which means that the attack can go unnoticed by the operators of the circuit. This makes side channel attacks very difficult to detect and address.

There are various types of side channel attacks that exploit different residual and unintended effects of a circuit’s operation. Some attacks focus on the electromagnetic radiation emitted by the circuit, some read remnant information from the circuit after an operation is complete, and others analyze subtle differences in timing between operations to extract information. However, some of the most popular and easiest side channel attacks to carry out are those that monitor the power consumption of the circuit while it operates and extract information from the differences in power consumption when it carries out different operations. These are categorized as power analysis attacks.

2.1.1 Power Analysis Attacks

Power analysis attacks are the category of side channel attacks that use the power consumption of a circuit during its operation to extract information from it. The key to a successful power analysis attack is the correlation between the circuit inputs or outputs that the attacker has access to and the power consumed by the operation that involves the information sought by the attack. Identifying such a relationship depends on many factors, including the attacker’s knowledge of the algorithm and of how it is implemented in the specific circuit, so every type of power analysis attack is highly case specific. Once this relationship is established, the attacker obtains power consumption traces of the circuit running the operation, recording the circuit’s outputs/inputs and/or manipulating its inputs one or more times, depending on the specific type of attack employed. Once sufficient data is obtained, it must be analyzed to extract the required information. The exact techniques used are specific to each type of power analysis attack, and each has its benefits and drawbacks.

While there are ways to prevent power analysis attacks, they are very difficult to implement and can have undesirable effects on the performance and design parameters of the circuit. As a result, most systems can be exploited using some form of power analysis attack.

Another factor that contributes to the danger of power analysis attacks is the ease of carrying out the attack once access to the circuit is gained. In most cases, a general-purpose digital oscilloscope suffices to gather the required information, and the analysis can be performed afterward with third-party software, without further access to the circuit. There are two major types of power analysis attacks: simple power analysis and differential power analysis. A third type, Correlation Power Analysis (CPA), is an extension of differential power analysis and uses the same basic approach when extracting information from a circuit.

2.1.1.1 Simple Power Analysis (SPA)

Simple power analysis is the type of power analysis attack in which the attacker obtains information about the circuit directly from the power trace, without any statistical processing. The key requirement for a circuit to be susceptible to simple power analysis is that the operation to be exploited has a direct effect on the power consumption of the circuit. This allows the attacker to deduce either the result of that operation or the type of operation taking place, and in turn to assemble this data into the information being sought. One of the major benefits of SPA is its simplicity: once the relationship between the data and the power consumption is identified, obtaining the power trace and using it to extract the data is easy and efficient. On the other hand, because the extraction is visual, the data can easily be corrupted by noise in circuits with a low signal-to-noise ratio, which makes SPA challenging on more complex circuits.

Features such as conditional branching, multipliers and exponentiators [3] make a circuit very vulnerable to simple power analysis attacks.
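To make the link between conditional branching and SPA concrete, the following Python sketch models left-to-right square-and-multiply modular exponentiation, the textbook exponentiation routine used in RSA. As a simplifying assumption, the list of operations stands in for the power trace: it presumes that a squaring ("S") and a multiplication ("M") produce visually distinguishable power patterns. The helper recover_exponent is hypothetical and simply reads the secret exponent back from that sequence, the way an attacker reads it from a trace such as the one in Figure 2.1.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that also logs each operation."""
    result = 1
    operations = []  # stand-in for the power trace: "S" = square, "M" = multiply
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        operations.append("S")
        if bit == "1":  # key-dependent branch: an extra multiplication
            result = (result * base) % modulus
            operations.append("M")
    return result, operations


def recover_exponent(operations):
    """Read the exponent back: a square followed by a multiply encodes a 1 bit."""
    bits = []
    for i, op in enumerate(operations):
        if op == "S":
            followed_by_multiply = i + 1 < len(operations) and operations[i + 1] == "M"
            bits.append("1" if followed_by_multiply else "0")
    return int("".join(bits), 2)


value, trace = square_and_multiply(7, 0b1011, 101)
print(trace)                    # ['S', 'M', 'S', 'S', 'M', 'S', 'M']
print(recover_exponent(trace))  # 11
```

Because the extra multiplication happens only for 1 bits, the operation sequence alone is enough to recover the exponent, without any statistics.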

Figure 2.1 Power consumption of an RSA algorithm that exposes its data [4]

SPA is easily prevented by measures such as avoiding conditional branching operations. In addition, in most implementations of symmetric cryptographic systems, the key-dependent variations in power consumption are so small that SPA cannot extract any information about the secret key.
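One well-known way to keep the operation sequence of an exponentiation independent of the secret exponent is the Montgomery ladder, sketched below in Python. As in any such toy model, the operation log is only an assumed stand-in for the power trace; a real implementation must also avoid data-dependent memory accesses and timing, which this sketch does not model. The function name montgomery_ladder is used here for illustration only.

```python
def montgomery_ladder(base, exponent, modulus):
    """Modular exponentiation with the same operation sequence for every bit."""
    r0, r1 = 1, base % modulus  # invariant: r1 == r0 * base (mod modulus)
    operations = []
    for bit in bin(exponent)[2:]:
        if bit == "1":
            r0 = (r0 * r1) % modulus
            r1 = (r1 * r1) % modulus
        else:
            r1 = (r0 * r1) % modulus
            r0 = (r0 * r0) % modulus
        operations += ["M", "S"]  # one multiply and one square per bit either way
    return r0, operations


result_a, ops_a = montgomery_ladder(7, 0b1011, 101)
result_b, ops_b = montgomery_ladder(7, 0b1000, 101)
print(result_a == pow(7, 11, 101), result_b == pow(7, 8, 101))  # True True
print(ops_a == ops_b)  # True: the sequence does not reveal the exponent bits
```

Every exponent bit costs one multiplication and one squaring, so two exponents of the same bit length produce identical operation sequences.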

2.1.1.2 Differential Power Analysis (DPA)

Differential power analysis is a much more powerful attack that uses statistical analysis to determine the secret key of a circuit. It can also be used when the changes in power consumption are distorted by measurement error and noise (a low signal-to-noise ratio). At the core of DPA is the selection function, which predicts the value of a certain intermediate bit given the ciphertext and a key guess. Another parameter used with the selection function is a threshold, which determines the output of the selection function; this threshold is set by the leakage model. The primary leakage model is the Hamming distance model, which assumes that the change in power consumption depends on the number of output bits that change, so that a large number of changing bits has a larger effect on the power consumption than a small number. Another leakage model is the Hamming weight model, which assumes that the power consumption is directly correlated with the number of output bits that are logical high.
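The two leakage models can be written as small Python functions. This is a minimal sketch: the linear power model in predicted_power, with its gain and offset parameters, is a hypothetical simplification for illustration and is not taken from the text.

```python
def hamming_weight(value):
    """Hamming weight leakage model: number of bits that are logical high."""
    return bin(value).count("1")


def hamming_distance(previous, current):
    """Hamming distance leakage model: number of bits that toggle."""
    return hamming_weight(previous ^ current)


def predicted_power(previous, current, model="HD", gain=1.0, offset=0.0):
    """Hypothetical linear power model: offset + gain * leakage."""
    if model == "HD":
        leakage = hamming_distance(previous, current)
    elif model == "HW":
        leakage = hamming_weight(current)
    else:
        raise ValueError("model must be 'HD' or 'HW'")
    return offset + gain * leakage


# A register changing from 0x0F to 0xF0 toggles all 8 bits but keeps 4 bits high.
print(predicted_power(0x0F, 0xF0, "HD"))  # 8.0
print(predicted_power(0x0F, 0xF0, "HW"))  # 4.0
```

The example shows why the choice of model matters: the same transition is predicted to be maximally expensive under the Hamming distance model but only average under the Hamming weight model.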

There are four major steps in carrying out a differential power analysis attack. First, the attacker records a predetermined number of power traces together with the corresponding ciphertexts. (The number of traces required depends on the signal-to-noise ratio of the circuit: as the amount of noise increases, the attacker needs more traces.) Second, the attacker partitions the traces into two sets according to the value of the selection function for each trace. Third, the attacker computes the average power trace of each set. Finally, the attacker computes the difference between the two averages. At this step, the attacker can determine whether the key guess is correct: if it is, the difference trace contains a spike where the targeted operation occurs; if not, the difference trace is relatively flat.
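The four steps can be sketched in Python on simulated data. This is a toy under stated assumptions, not a real attack setup: each "trace" is a single sample taken when the first-round S-box output SBOX[p XOR k] is computed; its power follows the Hamming weight model plus Gaussian noise; and the selection function is bit 0 of the predicted S-box output for a known plaintext byte p and a key guess, rather than the ciphertext-based selection function described above. The names simulate_traces and dpa_attack are hypothetical helpers, and the S-box is built from its GF(2^8) definition so the block is self-contained.

```python
import random


def _gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def _rotl8(value, shift):
    return ((value << shift) | (value >> (8 - shift))) & 0xFF


def _sbox_entry(x):
    """AES S-box: inverse in GF(2^8) (x^254; 0 maps to 0), then the affine map."""
    inv = 1
    for _ in range(254):
        inv = _gf_mul(inv, x)
    return inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2) ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63


SBOX = [_sbox_entry(x) for x in range(256)]


def simulate_traces(key_byte, n_traces, noise_sd, seed=1):
    """Step 1 (simulated): one sample per encryption, HW(S-box output) + noise."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(256) for _ in range(n_traces)]
    traces = [bin(SBOX[p ^ key_byte]).count("1") + rng.gauss(0.0, noise_sd)
              for p in plaintexts]
    return plaintexts, traces


def dpa_attack(plaintexts, traces, bit=0):
    """Difference-of-means DPA over all 256 guesses for one key byte."""
    differences = []
    for guess in range(256):
        sums, counts = [0.0, 0.0], [0, 0]
        for p, t in zip(plaintexts, traces):
            selection = (SBOX[p ^ guess] >> bit) & 1  # selection function
            sums[selection] += t                      # step 2: partition
            counts[selection] += 1
        means = [s / c for s, c in zip(sums, counts)]  # step 3: average each set
        differences.append(means[1] - means[0])        # step 4: difference
    best_guess = max(range(256), key=lambda g: abs(differences[g]))
    return best_guess, differences


plaintexts, traces = simulate_traces(key_byte=0x2B, n_traces=4000, noise_sd=1.0)
guess, differences = dpa_attack(plaintexts, traces)
print(hex(guess), round(differences[guess], 2))
```

For the correct guess the difference of means approaches 1 (the contribution of the selected bit to the Hamming weight), while wrong guesses produce smaller "ghost" differences; raising noise_sd or lowering n_traces shows how noise increases the number of traces an attacker needs.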

Figure 2.2 Sample DPA traces [3]


Figure 2.2 shows the differential power analysis traces obtained by Kocher et al. [3] from a DES (Data Encryption Standard) circuit for three key guesses used in the selection function. The key guess used to obtain the first DPA trace is clearly correct, as indicated by the spike, while the other two are incorrect, as their traces are relatively flat. The danger of differential power analysis is that, regardless of how the algorithm is implemented, any circuit without safeguards specifically designed to prevent DPA can be exploited, provided enough samples are taken. Through extensive analysis of output data and manipulation of input data, it is even possible to use differential power analysis on circuits that implement unknown algorithms.

The AES algorithm is also vulnerable to differential power analysis. The operation of the AES circuit that may be exploited using DPA depends on the information the attacker has access to. If the attacker has access to the input data of the circuit, the initial XOR operation between the data and the key is a prime target, as there is a direct and known relationship between the input data, which the attacker knows, and the key (the key is simply the output of this operation XOR’ed with the data bits).

Another possible target for differential power analysis is the final XOR operation in the final stage of the AES circuit. By observing the encrypted output data, the attacker can determine the final round key using DPA, and can then recover the initial key by reversing the key schedule stages of the AES circuit. Another vulnerable operation in the AES circuit is the S-box module, as it is non-linear and operates on each 8-bit unit separately, which makes it easy to target individual bits and in turn greatly simplifies the statistical techniques required in the selection function. It is therefore evident that the differential power analysis attack is versatile and effective enough to exploit even highly complex circuits such as the AES circuit if they do not have safeguards specifically implemented to prevent such attacks.

There are several methods to protect circuits against differential power analysis attacks, such as the use of dummy instructions [3], random power consumption, duplicate logic and masking; however, some of these safeguards can be bypassed by modifying the attack process. One example is second-order differential power analysis, which defeats masking [5]. Masking prevents operations from acting directly on the vulnerable data by combining the data with random values through logic operations before it reaches its next operation. Higher-order differential power analysis attacks threaten most conventional power analysis countermeasures, and they require no more data than an ordinary differential power analysis; the attacker only needs to develop the statistical technique to process that data. However, developing these complex techniques requires a lot of effort, especially if the architecture of the circuit is unknown. The best way to counteract such attacks is a security-aware design combined with safeguards that specifically disrupt this type of attack, such as splitting, where the vulnerable data is split into chunks when it is used in operations [6].
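Why masking stops a first-order attack but not a second-order one can be shown with a small exhaustive computation in Python. This sketch assumes Hamming weight leakage and a uniformly random 8-bit Boolean mask m applied as secret XOR m; multiplying the leakages of the two shares is only one simple combining function, chosen here for illustration, and the function names are hypothetical.

```python
def hamming_weight(value):
    return bin(value).count("1")


def first_order_leakage(secret):
    """Mean HW of the masked share (secret XOR mask) over all 256 masks."""
    return sum(hamming_weight(secret ^ mask) for mask in range(256)) / 256


def second_order_leakage(secret):
    """Mean product of the HWs of both shares (masked value and mask)."""
    return sum(hamming_weight(secret ^ mask) * hamming_weight(mask)
               for mask in range(256)) / 256


for secret in (0x00, 0x0F, 0xFF):
    print(hex(secret), first_order_leakage(secret), second_order_leakage(secret))
# 0x0 4.0 18.0
# 0xf 4.0 16.0
# 0xff 4.0 14.0
```

The average leakage of the masked share is 4.0 for every secret, so a first-order attack sees nothing, while the combined leakage of both shares still depends on the Hamming weight of the secret, which is what a second-order attack exploits.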


2.2 AES Algorithm

The Advanced Encryption Standard (AES) is an encryption standard issued by the National Institute of Standards and Technology (NIST) in 2001 [7]. The standard uses the Rijndael algorithm, developed by Joan Daemen of Proton World International and Vincent Rijmen of Katholieke Universiteit Leuven. The standard specifies three possible key lengths, 128, 192 and 256 bits, and encrypts blocks of 128 bits. One of the key features of the AES algorithm is its use of both linear and non-linear operations. As one of the most widely used algorithms, its applications range from file system encryption to wireless network security. The AES algorithm operates on a 4 by 4 matrix of 8-bit chunks, totaling 128 bits, which is the standard input and output size of the algorithm. The encryption is carried out in stages (rounds), and the number of stages depends on the key length of the particular implementation: 10, 12 and 14 stages for the 128-bit, 192-bit and 256-bit key variants, respectively. A typical stage consists of four sub-stages, called SubBytes, ShiftRows, MixColumns and AddRoundKey; the final stage omits the MixColumns sub-stage. The key used in each stage also differs from the original key; these round keys are obtained by repeatedly processing the original key with the Rijndael key schedule. It has been shown that no direct attack on the AES algorithm with the complete 10, 12 or 14 stages is significantly faster than an exhaustive key search, which requires on average 2^127 to 2^255 operations depending on the key length [8]. However, even though direct cryptanalytic attacks are futile against the AES algorithm, side channel attacks can be used to exploit it and extract information from it.
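The relationship between key length and the number of stages, written in FIPS-197 as Nr = Nk + 6 where Nk is the key length in 32-bit words, together with the order of the sub-stages, can be summarized in a short Python sketch. FIPS-197 also applies one AddRoundKey to the input before the first stage. The helper aes_stage_plan is hypothetical: it only lists sub-stage names and performs no encryption.

```python
def aes_stage_plan(key_bits):
    """List the sub-stages executed by AES encryption for a given key length."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    nk = key_bits // 32  # key length in 32-bit words: 4, 6 or 8
    nr = nk + 6          # number of stages (rounds): 10, 12 or 14
    plan = [["AddRoundKey"]]  # initial key addition before the first stage
    for _ in range(nr - 1):
        plan.append(["SubBytes", "ShiftRows", "MixColumns", "AddRoundKey"])
    plan.append(["SubBytes", "ShiftRows", "AddRoundKey"])  # final stage: no MixColumns
    return plan


for bits in (128, 192, 256):
    print(bits, len(aes_stage_plan(bits)) - 1)  # stages, excluding the initial AddRoundKey
# 128 10
# 192 12
# 256 14
```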

Figure 2.3 Flow diagram of the AES algorithm [9]


2.2.1 SubBytes sub stage

In this stage, each input byte is replaced by a value obtained from a lookup table called the S-box. The table is constructed by taking the multiplicative inverse of each byte in the Galois field GF(2^8), followed by an affine transformation. This yields a non-linear mapping; SubBytes is the main non-linear component of the AES algorithm. The S-box values are also chosen so that there are no fixed points (no byte is mapped to itself) and no opposite fixed points (the XOR of a byte with its S-box output is never all logical highs). In AES implementations, the S-box can be either calculated dynamically or read from a lookup table.
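Computing the S-box dynamically can be sketched in Python directly from the definition: the inverse is taken in GF(2^8) with the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), using x^254 = x^-1 for nonzero x and mapping 0 to 0, and the affine transformation is written in its rotate-and-XOR form with the constant 0x63 from FIPS-197. The function names are illustrative; this is a readable sketch, not an optimized or side-channel-hardened implementation.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inverse(a):
    """Multiplicative inverse as a^254; for a == 0 this yields 0 by convention."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(value, shift):
    return ((value << shift) | (value >> (8 - shift))) & 0xFF


def sbox_entry(x):
    """Inverse in GF(2^8) followed by the AES affine transformation."""
    b = gf_inverse(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]

fixed_points = [x for x in range(256) if SBOX[x] == x]
opposite_fixed_points = [x for x in range(256) if SBOX[x] ^ x == 0xFF]
print(hex(SBOX[0x00]), hex(SBOX[0x53]))     # 0x63 0xed
print(fixed_points, opposite_fixed_points)  # [] []
```

The last line checks the two design properties mentioned above: the table has neither fixed points nor opposite fixed points.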

Figure 2.4 Illustration of the SubBytes sub stage [10]

Table 2.1 S-Box lookup table [11]

2.2.2 ShiftRows sub stage

The ShiftRows operation cyclically shifts each horizontal row of the 4 by 4 matrix to the left, with an offset based on the row number. In AES, the offsets are 0, 1, 2 and 3 bytes for the first, second, third and fourth rows, respectively, for all three key lengths; the offsets 0, 1, 3 and 4 apply only to the original Rijndael cipher with a 256-bit block size, which AES does not use. The purpose of this sub-stage is to redistribute the data among the vertical columns at each stage, so that the algorithm does not reduce to four separate column-wise operations.
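A minimal Python sketch of ShiftRows follows. It assumes the state is held as a list of four rows, with the labels "s{r}{c}" marking row r and column c so that the movement of each byte is visible. In FIPS-197 the input bytes fill the state column by column, but that ordering does not affect the row rotation itself.

```python
def shift_rows(state):
    """Rotate row r of a 4x4 state (list of rows) left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


state = [[f"s{r}{c}" for c in range(4)] for r in range(4)]
for row in shift_rows(state):
    print(row)
# ['s00', 's01', 's02', 's03']
# ['s11', 's12', 's13', 's10']
# ['s22', 's23', 's20', 's21']
# ['s33', 's30', 's31', 's32']
```

After the shift, every column contains one byte from each of the four original columns, which is exactly the redistribution that prevents the cipher from splitting into four independent column operations.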


Figure 2.5 Illustration of the ShiftRows sub stage [10]

2.2.3 MixColumns sub stage

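In this stage, the four bytes of each vertical column of the 4 by 4 matrix are taken as the input, and the output is four bytes, each of which depends on all four inputs. The output is calculated by the following matrix multiplication over GF(2^8) [7]:

$$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}, \qquad 0 \le c < 4$$

This in turn simplifies to the following four equations, each corresponding to a different output byte of this stage, which replace the column in the original matrix (• denotes multiplication in GF(2^8) and ⊕ denotes bitwise XOR):

$$\begin{aligned}
s'_{0,c} &= (\{02\} \bullet s_{0,c}) \oplus (\{03\} \bullet s_{1,c}) \oplus s_{2,c} \oplus s_{3,c} \\
s'_{1,c} &= s_{0,c} \oplus (\{02\} \bullet s_{1,c}) \oplus (\{03\} \bullet s_{2,c}) \oplus s_{3,c} \\
s'_{2,c} &= s_{0,c} \oplus s_{1,c} \oplus (\{02\} \bullet s_{2,c}) \oplus (\{03\} \bullet s_{3,c}) \\
s'_{3,c} &= (\{03\} \bullet s_{0,c}) \oplus s_{1,c} \oplus s_{2,c} \oplus (\{02\} \bullet s_{3,c})
\end{aligned}$$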
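The four per-byte MixColumns equations of FIPS-197 [7] can be sketched in Python. Multiplication by {02} is the usual "xtime" operation (a left shift, reduced by 0x11B when the result overflows 8 bits), and multiplication by {03} is xtime(a) XOR a. The function names are illustrative, and the state is assumed to be held as a list of four columns.

```python
def xtime(a):
    """Multiply a byte by {02} in GF(2^8), reducing modulo 0x11B."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def mul3(a):
    """Multiply a byte by {03} = {02} XOR {01}."""
    return xtime(a) ^ a


def mix_single_column(column):
    """Apply the four MixColumns equations to one 4-byte column."""
    s0, s1, s2, s3 = column
    return [
        xtime(s0) ^ mul3(s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ mul3(s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ mul3(s3),
        mul3(s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


def mix_columns(state_columns):
    """Apply MixColumns to a state stored as a list of four columns."""
    return [mix_single_column(col) for col in state_columns]


print([hex(b) for b in mix_single_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```

Because every row of the matrix contains {02}, {03}, {01} and {01}, a column whose four bytes are equal is left unchanged, which gives a quick sanity check.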


2.2.4 AddRoundKey sub stage and Rijndael’s key schedule

Rijndael’s key schedule is used in the AES algorithm to generate a round key for each AES stage from the original key. The operation has three major components: RotWord, which rotates a 32-bit word 8 bits to the left with wrap-around; Rcon, described as the round constant word array [7]; and SubWord, which applies the same substitution as the SubBytes sub-stage to each byte of a word. The key schedule operation is as follows:

Figure 2.6 Illustration of Rijndael’s key schedule [11]

The AddRoundKey sub-stage of the AES algorithm is the bitwise XOR operation between the round key output by the key schedule and the data.
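The key schedule and AddRoundKey can be sketched in Python for the 128-bit variant. This is a minimal sketch following the FIPS-197 key expansion: every fourth word is passed through RotWord, SubWord and an XOR with the round constant before being combined with the word four positions earlier. The S-box is computed from its definition so the block is self-contained, and the function names are illustrative. The example key is the one used in FIPS-197 Appendix A.1.

```python
def _gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def _rotl8(value, shift):
    return ((value << shift) | (value >> (8 - shift))) & 0xFF


def _sbox_entry(x):
    inv = 1
    for _ in range(254):  # x^254 is the inverse of nonzero x; 0 stays 0
        inv = _gf_mul(inv, x)
    return inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2) ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63


SBOX = [_sbox_entry(x) for x in range(256)]
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def rot_word(word):
    """RotWord: rotate a 4-byte word one byte (8 bits) to the left."""
    return word[1:] + word[:1]


def sub_word(word):
    """SubWord: apply the S-box to each byte of a word."""
    return [SBOX[b] for b in word]


def expand_key_128(key):
    """Return the 11 round keys (16 bytes each) of AES-128."""
    if len(key) != 16:
        raise ValueError("AES-128 key must be 16 bytes")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:
            temp = sub_word(rot_word(temp))
            temp[0] ^= RCON[i // 4 - 1]  # round constant affects the first byte only
        words.append([x ^ y for x, y in zip(words[i - 4], temp)])
    return [bytes(sum(words[4 * r:4 * r + 4], [])) for r in range(11)]


def add_round_key(state, round_key):
    """AddRoundKey: bitwise XOR of the 16-byte state with a round key."""
    return bytes(s ^ k for s, k in zip(state, round_key))


round_keys = expand_key_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(round_keys[10].hex())  # d014f9a8c9ee2589e13f0cc8b6630ca6
```

Because each step of the expansion is invertible, the last round key is enough to run the schedule backwards to the original key, which is why recovering the final round key through DPA is as damaging as recovering the initial key.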


2.2.5 ENS Method

Another way to implement the AES circuit is to use the ENS method [12]. This process replaces the majority of the steps in the AES algorithm with three lookup tables, the third of which can be calculated dynamically from the first two. While an implementation of the AES algorithm with the ENS method minimizes the number of logical operations, it requires two 16 by 16 lookup tables, resulting in a high memory requirement.

Figure 2.7 Illustration of the ENS method for the AES algorithm [12]

2.3 Testability

Testability is a measure of how easily the operation of a circuit can be tested during or after its design. It is a very important part of the design process, as testing verifies and validates the circuit. If a circuit with low testability has already been designed, testing it is expensive, because methods such as automatic test pattern generation [13] are required to completely verify its functionality. Another way to test such circuits is with randomly generated patterns, which has two drawbacks: there is no certainty that all possible operations in the circuit are tested, and testing may take an indefinite amount of time. Designers therefore tend to adopt design for testability (DFT) techniques, which are design techniques that specifically increase the testability of a circuit in order to make it easier to test.

There are two main categories of testable design: structured design for testability and testability measure analysis. One example of structured design is level-sensitive scan design, which makes the operation of the circuit independent of transition delays and uses shift registers for all internal memory elements [14]. Testability measure analysis, on the other hand, can be divided into intrinsic and extrinsic measures. Extrinsic measures use additional hardware alongside the circuit under test to carry out the test, while intrinsic measures use the internal operations and properties of the circuit [15]. The two primary intrinsic measures of testability are controllability and observability, which can identify the components of a circuit that are testable given a certain threshold [16].

2.3.1 Controllability and Observability

Controllability is defined as follows: “In order to be able to do whatever we want with the given dynamic system under control input, the system must be controllable” [17]. In circuits, controllability measures how easily a tester can set a particular node within the circuit to a desired value using the circuit’s inputs. This is useful in determining how to test a particular operation in the circuit when the tester can control the circuit’s inputs. Observability is defined as “In order to see what is going on inside the system under observation, the system must be observable” [17]. In circuits, observability therefore measures how easily the tester can observe the result of a particular operation at the outputs of the circuit. This measure is useful when the tester has access to the circuit outputs and can check whether the intended operation took place. Combining these two testability measures, a program was developed to algorithmically determine the testability measures of a given circuit; it is called SCOAP (Sandia Controllability/Observability Analysis Program) [16].

2.3.1.1 SCOAP (Sandia Controllability/Observability Analysis Program)

SCOAP is a program developed at Sandia National Laboratories to assess the controllability and observability of a given circuit using an algorithm of linear complexity, O(n). SCOAP computes six measures for each node of the circuit, and the values at one node serve as weights from which the values of neighboring nodes are calculated. To compute these values efficiently, SCOAP reads the circuit as an array of cells, both combinational and sequential. The measures are divided into combinational and sequential groups. The combinational measures are combinational 0-controllability (CC0), combinational 1-controllability (CC1) and combinational observability (CO). The sequential measures are sequential 0-controllability (SC0), sequential 1-controllability (SC1) and sequential observability (SO) [18]. The final result of SCOAP for each node is a controllability value ranging from 1 (most controllable) to infinity (least controllable) and an observability value ranging from 0 (most observable) to infinity (least observable).

The execution of SCOAP starts with the calculation of the controllability values of each node. First, the combinational controllability of each primary input is set to 1 and its sequential controllability to 0; the controllability values of all other nodes are set to infinity. The program then iterates forward from the primary inputs. For each gate, the combinational controllability of the output is obtained by combining the controllability values of the inputs needed to set the output to the desired value (the lowest value when setting any one input suffices, the sum when all inputs must be set) and incrementing the result by one. The sequential controllability is incremented by one only when the signal travels through a sequential cell (such as a flip-flop). If the circuit contains a loop, the sequential controllability must be recalculated repeatedly until the values stabilize.
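The forward pass described above can be sketched for the combinational measures alone. This is a minimal illustration, not the original SCOAP program: the dictionary netlist format, the `combinational_controllability` function and the two-gate example circuit are hypothetical, the gate rules are the commonly published SCOAP rules of the kind tabulated in Figure 2.8, and sequential measures are omitted. Because the gates are assumed to be listed in topological order, the sketch needs neither the infinity initialization nor repeated iteration.

```python
# Hypothetical netlist format: output net -> (gate type, list of input nets),
# assumed to be listed in topological order (each gate after its drivers).
NETLIST = {
    "c": ("AND", ["a", "b"]),
    "d": ("XOR", ["c", "a"]),
}
PRIMARY_INPUTS = ["a", "b"]


def combinational_controllability(netlist, primary_inputs):
    """Return {net: (CC0, CC1)} using the usual SCOAP gate rules."""
    cc = {pi: (1, 1) for pi in primary_inputs}  # primary inputs start at 1
    for out, (kind, ins) in netlist.items():
        c0 = [cc[i][0] for i in ins]
        c1 = [cc[i][1] for i in ins]
        if kind == "AND":    # 0: any one input at 0; 1: all inputs at 1
            cc[out] = (min(c0) + 1, sum(c1) + 1)
        elif kind == "OR":   # 0: all inputs at 0; 1: any one input at 1
            cc[out] = (sum(c0) + 1, min(c1) + 1)
        elif kind == "NAND":
            cc[out] = (sum(c1) + 1, min(c0) + 1)
        elif kind == "NOR":
            cc[out] = (min(c1) + 1, sum(c0) + 1)
        elif kind == "NOT":
            cc[out] = (c1[0] + 1, c0[0] + 1)
        elif kind == "XOR":  # two-input XOR: equal inputs give 0, different give 1
            a, b = cc[ins[0]], cc[ins[1]]
            cc[out] = (min(a[0] + b[0], a[1] + b[1]) + 1,
                       min(a[0] + b[1], a[1] + b[0]) + 1)
        else:
            raise ValueError(f"unsupported gate type {kind}")
    return cc


print(combinational_controllability(NETLIST, PRIMARY_INPUTS))
# {'a': (1, 1), 'b': (1, 1), 'c': (2, 3), 'd': (4, 4)}
```

For the AND gate, setting the output to 0 needs only one input at 0 (min + 1 = 2), while setting it to 1 needs both inputs at 1 (sum + 1 = 3).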

Figure 2.8 Calculations of combinational controllability for common gates [19]

The next step in SCOAP is to calculate the observability values of the nodes. The combinational and sequential observability of each primary output is set to zero, and the observability values of all other nodes are set to infinity.

The algorithm then iterates backward through the circuit until it reaches the primary inputs. The calculations resemble those for controllability, but instead of computing a value for a gate's output from its inputs, the algorithm computes a value for each input of a gate from the observability of its output and the controllability of its other inputs. To simplify the calculation, a levelization algorithm assigns a level to each node: the primary inputs are assigned level zero, the algorithm then iterates through the fan-outs of the primary inputs, and a node is assigned its level once all of its inputs have levels assigned.

Figure 2.9 Calculations of combinational observability for common gates [19]


The key difference between the combinational and sequential measures is that combinational controllability and observability correspond to the number of operations between a node and the primary inputs and outputs, respectively, whereas sequential controllability and observability correspond to the number of clock cycles between a node and the primary inputs and outputs, respectively. One of the key advantages of SCOAP is the linear time complexity of its algorithm, which enables it to analyze large circuits in relatively short periods of time. The complexity of the SCOAP algorithm is calculated to be O(2n), which simplifies to O(n) [19].


3 Literature Review

3.1 Security-Aware Design Methodology and Optimization

This research presents a design methodology with security and safety constraints for controller area network (CAN) and time division multiple access (TDMA) based protocols in automotive systems [20]. System security is of utmost importance in automotive systems, as a breach in such a system can have catastrophic consequences. The article defines two types of security requirements: security properties (which must be fulfilled) and security constraints (which are quantitative). It gives several examples of security mechanisms: one key for all, pairwise key distribution, time-delayed release of keys, flexible key distribution and asymmetric cryptography. While each method has its advantages and disadvantages, the authors chose flexible key distribution for CAN-based systems and time-delayed release of keys for TDMA-based systems. The article also provides an algorithm with a time complexity of O(N log N) that solves the security-aware optimization problem, together with a proof that the algorithm is optimal for that problem. Experimental data gathered by applying the methodology demonstrate its effectiveness. In conclusion, the article presents a general design methodology that conforms to security and other design constraints.


Figure 3.1 Flow of the proposed algorithm [20]

3.2 Dynamic Voltage and Frequency Scaling

This research presents an approach to protecting cryptosystems from power analysis attacks, which use current traces obtained during circuit operation to extract information from the circuit [21]. Dynamic voltage and frequency scaling (DVFS) adjusts the supply voltage and clock frequency in order to maximize processor utilization, which reduces power consumption. A DVFS implementation has three main components: an OS that determines the desired frequency and voltage, a voltage and frequency regulation loop, and hardware that can operate over a wide voltage range. The article introduces three performance metrics for the DVFS system: signal trace entropy, energy overhead and time overhead. Experiments are carried out with three DVFS scheduler designs: a random generator, a generator that changes the voltage and frequency every clock cycle, and a generator that takes into account the timing budget and the total number of clock cycles. While all three designs make power analysis attacks difficult to carry out, the third performs best because it considers the timing budget when scheduling the voltage and frequency. In conclusion, the proposed DVFS implementation removes the timing correlation between power traces, which improves the circuit’s resistance to power analysis attacks such as differential power analysis. The implementation increases execution time, but it also lowers power consumption.

Figure 3.2 DVFS architecture [21]

3.3 Converter Reshuffling Power Management

This work proposes a new on-chip power management technique, converter reshuffling (CoRe), which inserts random spikes in the current draw to secure the circuit against power analysis attacks [22]. The authors also developed a security performance metric to compare the circuit against existing power delivery systems.

Previously proposed on-chip voltage regulators use an interleaved switched voltage converter, which creates spikes in the current draw; in that approach, a pseudo random number generator determines which stage is activated or gated. The disadvantage of this method is that, because the spikes are determined by the workload, an attacker can still gain information about the actual power draw of the circuit through the artificial spikes, and through small load-changing operations that do not trigger the interleaved stages. The proposed converter reshuffling technique scrambles the input current when the load is too small to trigger the converter stages, and a new set of stages is selected regularly by a pseudo random number generator. This provides two-pronged protection against power analysis attacks. The paper first provides a theoretical proof of the effectiveness of converter reshuffling and then a circuit-level evaluation of the technique. It uses power trace entropy as the performance metric and tests the circuit with and without dynamic voltage and frequency scaling. Both sets of results indicate that converter reshuffling is a stronger countermeasure against power analysis attacks, and therefore provides better security.

3.4 Inductive Integrated Voltage Regulator

This paper explores the use of integrated voltage regulators (IVRs) as a security measure to protect circuits from power analysis attacks [23]. With an integrated voltage regulator, the external current draw is uncorrelated with the internal current draw, which makes it difficult for attackers to gain information about the operations performed by the circuit. An AES-128 circuit running at 20 MHz with an output latency of 8 cycles is used as the test circuit in simulations to evaluate the effectiveness of the IVR. The IVR used in the simulations consists of a power stage and an L/C filter, and package parasitics are taken into account when the simulations are performed. The IVR is first tested with synthetic patterns, such as random load patterns and sinusoidal current patterns; these results show that the IVR decreases the correlation between the input and output currents. The simulations are then run with the AES circuit, and these tests also demonstrate the effectiveness of the IVR. The paper also describes the benefits and disadvantages of using an inductive IVR instead of a linear IVR (slower, yet more secure). In conclusion, although it does not perfectly conceal the operations of the circuit, the integrated voltage regulator makes a power analysis attack difficult to carry out by obscuring the external power consumption of the circuit. The paper states that security-aware IVR designs are required to protect a circuit against more complex attacks, such as those based on alternative signatures.

Figure 3.3 Power attack and proposed countermeasure [23]

3.5 Signal Independent Power Consumption CMOS Logic

This paper proposes a logic style in which a gate's output transitions are not correlated with the power consumption of the circuit [24]. This is very effective against power analysis attacks, because the relationship between gate output transitions and the current draw is the basis of the Hamming weight leakage model, the most commonly used model when performing power analysis attacks. The logic style is called Sense Amplifier Based Logic (SABL). The authors achieve operation-independent power consumption by switching the gate output at every evaluation, regardless of the result, and by keeping the load capacitance constant. The research shows that differential logic prevents the input value of a gate from affecting the current draw, while dynamic logic prevents the input sequence from correlating with the power consumption; combining the two logic styles therefore yields a design in which neither the input value nor the input sequence of a gate affects the power consumption. In addition, SABL ensures that the capacitance charged is the same for each of the four possible evaluation results. SABL is shown to be versatile, as it can be used for combinational gates, cascaded gates (domino logic) and memory elements such as flip-flops and latches. The paper provides experimental data showing that SABL reduces the effect of gate output changes on the current draw by a factor of 116, with only a twofold increase in circuit area and power consumption.


Figure 3.4 Generic N gate and AND/NAND gate using SABL [24]

3.6 Differential Pass Transistor Precharge Logic

This research presents a type of dual-rail precharge logic called differential pass transistor precharge logic (DPPL), which addresses the early propagation effect (through which the operations being performed affect the power consumption), producing a logic style that maintains constant power consumption in the circuit [25]. It is based on complementary pass transistor logic. In the precharge phase all nodes are set high, and during the evaluation phase the differential inputs drive the N-transistor logic to carry out the operation.

This architecture reduces the effect of the logic on the current draw while also completely removing early propagation effects, unlike wave dynamic differential logic (WDDL), another logic style that uses a precharge phase and is resistant to power analysis attacks [26]. WDDL achieves this with a method similar to that of DPPL, but at a much larger implementation cost. The authors of the DPPL paper present an area comparison between the two technologies: while WDDL has smaller inverter (NOT) and AND gates, DPPL yields an XOR gate half the size of the WDDL XOR gate, which is a significant advantage given the extensive use of XOR gates in most cryptosystems. While WDDL defeats conventional power analysis attacks, unlike DPPL it is still vulnerable to attacks that exploit the early propagation effect. DPPL also has the advantage of performing logic operations immediately, in the precharge stage and once the complementary input signals reach the gate in the evaluation stage. Another key feature of DPPL is that, regardless of its phase, it eliminates the correlation between the processed data and the power consumption that makes a circuit vulnerable to power analysis attacks.
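The data-independent switching that precharge logic relies on can be illustrated with a toy logic-level model of WDDL-style gates, which precharge both rails low (the DPPL described above precharges high instead). This is a simplified sketch, not a model of either paper's circuits: it counts rising rails as a stand-in for supply current and, by construction, cannot show the early propagation effect, which depends on signal arrival times. All names are hypothetical.

```python
from itertools import product

# WDDL-style dual-rail signal: (true_rail, false_rail).
# (1, 0) encodes logic 1, (0, 1) encodes logic 0, (0, 0) is the precharge value.
PRECHARGE = (0, 0)


def encode(bit):
    return (bit, 1 - bit)


def wddl_and(a, b):
    """AND on the true rails, OR on the false rails (positive gates only)."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails, AND on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def rising_rails(before, after):
    """Number of rails going 0 -> 1 (toy proxy for charge drawn from VDD)."""
    return sum(1 for x, y in zip(before, after) if x == 0 and y == 1)


# From precharge to evaluation, exactly one rail of the output pair rises
# for every input combination, so the count is independent of the data.
for gate in (wddl_and, wddl_or):
    counts = {rising_rails(gate(PRECHARGE, PRECHARGE), gate(encode(x), encode(y)))
              for x, y in product((0, 1), repeat=2)}
    print(gate.__name__, counts)  # prints {1} for both gates
```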

Figure 3.5 (A) Power trace comparison (B) XOR gate in DPPL [25]

3.7 Dual Voltage Single Rail Logic

This paper presents a logic style known as single rail dual dynamic logic (DSDL) [27]. The DSDL architecture is based on a three-stage design, as opposed to the two-stage design of most dynamic logic styles, and it uses two voltage sources. The three stages are precharge, evaluation and discharge. The precharge phase is implemented conventionally: while the clock is low, the circuit of this stage is pulled high. The evaluation stage carries out the logic operation; during this stage, the voltage is determined by a capacitor that connects the drain side of the first section of the circuit, supplied by one voltage source, with the source side of the second section, supplied by the second voltage source. This results in a synchronously varying voltage difference between the two supplies during the evaluation stage. In the third (discharge) stage, the clock and the DCH signal both being high discharges the gate. One of the factors that affects the speed of the circuit is the voltage difference between the two voltage sources: the paper presents experimental evidence that a first voltage source twice the second results in a twofold speed increase compared with a single power source solution. The paper also compares DSDL experimentally with other logic styles to show its resistance to differential power analysis attacks and its low overall power consumption. Finally, it presents a case study in which DSDL is used to implement an isolated S-Box in an AES circuit and shows that it is more resistant to differential power analysis attacks than conventional CMOS logic.


Figure 3.6 DSDL AND gate [27]

3.8 Masked Gates

Masked gates apply transformations at the gate level in which each value is split into two or more values, with the masks chosen so that the distribution of the split values is uniform; this results in a sequence of state transitions that is difficult to predict [28]. Masking can also be done at the algorithm level, but this is a complex task, because every value inside the circuit changes and different operations are therefore required. An easier method of masking is the use of masked gates. Because this transformation is highly generic, it can be automated, which makes the securing process much easier than changes at the algorithmic level. Once implemented, all internal values of a masked gate must differ from the original (unmasked) values in order to protect against power analysis attacks. Experimental results of differential power attacks against both masked gates and normal gates are presented, in which the masked gates provide a significant advantage against differential power attacks.

Figure 3.7 Masked AND Gate [29]
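The masking principle can be checked at the logic level with one well-known construction, a Trichina-style masked AND gate. This construction is chosen here for illustration and is assumed, not confirmed, to match Figure 3.7. Each input is carried as a share `x ^ m`, and a fresh mask `m_z` protects the output. The sketch verifies functional correctness exhaustively and shows that, for fixed unmasked inputs, the masked output is uniformly distributed over the masks. It says nothing about glitches in real hardware.

```python
from itertools import product


def mask(x, m):
    """Boolean masking: the value x is carried as the share x ^ m."""
    return x ^ m


def masked_and(a_m, b_m, m_a, m_b, m_z):
    """Trichina-style masked AND (an illustrative choice).

    a_m = a ^ m_a and b_m = b ^ m_b are masked inputs; m_z is a fresh output
    mask. The result is (a & b) ^ m_z. The XOR chain starts from m_z, so every
    intermediate value is itself XORed with the fresh mask.
    """
    t = m_z ^ (a_m & b_m)
    t ^= a_m & m_b
    t ^= m_a & b_m
    t ^= m_a & m_b
    return t


def is_correct():
    """Exhaustively check that unmasking the output gives a & b."""
    return all(
        masked_and(mask(a, m_a), mask(b, m_b), m_a, m_b, m_z) ^ m_z == (a & b)
        for a, b, m_a, m_b, m_z in product((0, 1), repeat=5)
    )


def output_distribution(a, b):
    """Count masked outputs over all mask values for fixed unmasked inputs."""
    counts = {0: 0, 1: 0}
    for m_a, m_b, m_z in product((0, 1), repeat=3):
        counts[masked_and(mask(a, m_a), mask(b, m_b), m_a, m_b, m_z)] += 1
    return counts


print(is_correct())               # True
print(output_distribution(1, 1))  # {0: 4, 1: 4}
```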


4 Preliminary Gate Architecture Experiments

To determine the best gate architecture for testing, several gate architectures were evaluated with a test setup (Figure 4.1) using different CMOS technologies. XOR gates were chosen for the experiments because of their prevalence in cryptographic circuits. All of the preliminary experiments were performed with NGSpice, a SPICE (Simulation Program with Integrated Circuit Emphasis) simulator. NGSpice is an open source SPICE simulator that combines the functionality of Spice3 (developed by the Center for Electronic Systems Design, University of California, Berkeley); Cider, which adds the DSIM device simulator to Spice3, increasing its accuracy; and Xspice, which provides simulation with an embedded event-driven algorithm [30]. The current trace from the primary voltage source was used in the analysis, as it reflects the power consumption of the circuit, which is what power analysis attacks exploit.

Figure 4.1 Gate test structure

4.1 Conventional Gates

To establish a baseline for comparing gate designs, a conventional XOR gate was implemented from NAND gates (Figure 4.2).
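For reference, the usual way to build XOR from NAND gates uses four gates, with the first NAND output shared by the two middle gates. The sketch below assumes Figure 4.2 follows this standard four-gate arrangement and checks its truth table at the logic level only. Function names are illustrative.

```python
def nand(a, b):
    return 1 - (a & b)


def xor_from_nands(a, b):
    """Standard four-NAND XOR: n1 is shared by the two middle gates."""
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    return nand(n2, n3)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_from_nands(a, b))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```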

Figure 4.2 XOR gate

This experiment used a 0.5 micron technology (generated by Mentor Graphics ADK) with a 5 V VDD. Please see Appendix I for the SPICE models of the N and P MOSFETs. The register held the value “1011”. Because this experiment used conventional gates, the circuit was expected to be highly prone to power analysis attacks. To test this, the data input of each bit was changed, causing a state change in every gate regardless of the stored value. The resulting power trace for each change was then observed (Figure 4.3).

[Plot: Conventional Gates Experiment 1; overlaid current traces I0, I1, I2 and I3 against time (0 to 1.00E-07)]

Figure 4.3 Conventional gate power trace 1


Please see individual power traces and a full page version of the overlaid power trace in Appendix III. Changing the input from 0 to 1 causes the output of the XOR gate to rise if the register bit stores 0 and to fall if it stores 1. From the current trace it is evident that bit 1 is storing 0 (smaller dip in current) and the others (larger dip in current) are storing 1. This matches the initial value “1011” in the register. To further verify the results, the experiment was repeated using conventional gates with a different data input. The results can be seen in Figure 4.4.

[Plot: Conventional Gates Experiment 2; overlaid current traces I0, I1, I2 and I3 against time (0 to 1.00E-07)]

Figure 4.4 Conventional gate power trace 2

Please see individual power traces and a full page version of the overlaid power trace in Appendix IV. From the current trace it is evident that bit 1 is storing 0 (smaller dip in current) and the others (larger dip in current) are storing 1. This tallies with the initial value “1100” in the register. It can therefore be concluded that this type of gate is susceptible to power analysis attacks.


4.2 Smaller Feature Length Technology

The next set of gates was identical in architecture to the gates tested before, but was simulated using a different technology: 45 nm feature length with a VDD of 1 V. The SPICE models were obtained from the Predictive Technology Model developed by the Nanoscale Integration and Modeling (NIMO) Group at Arizona State University [31]. Please see Appendix II for the SPICE models of the 45 nm N and P MOSFETs. This experiment was carried out to evaluate the effect of the technology (feature length and VDD) on susceptibility to power analysis attacks. As the gate architecture is unchanged, the expected result was that the gates would remain somewhat susceptible to power analysis attacks but be more resistant than those built with the 0.5 micron technology, due to the smaller spikes in current draw when the output of the gate transitions. The results of the experiment can be seen in Figure 4.5.

[Plot: Conventional Gates 45nm; overlaid current traces I0, I1, I2 and I3 against time (0 to 1.00E-07)]

Figure 4.5 Conventional gate 45nm power trace


Please see individual power traces and a full page version of the overlaid power trace in Appendix V. The power trace shows that bit 1 is storing 0 (smaller dip in current) and the others (larger dip in current) are storing 1. This tallies with the initial value “0100” in the register. Note that the scale of this power trace is much smaller, because the smaller technology with a lower VDD produces much more subtle differences in power consumption. It can therefore be concluded that changing the process technology can make a circuit less prone to power analysis attacks, given the resulting low signal-to-noise ratio, but differential power analysis can still be carried out, because it filters out the noise through statistical analysis.
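Why a low signal-to-noise ratio only slows differential power analysis down can be shown with a toy difference-of-means attack on an XOR stage, in the spirit of AddRoundKey. This is a synthetic illustration, not the SPICE data above. It assumes that each "trace" is a single number equal to the Hamming weight of the 4-bit output `p ^ key` plus Gaussian noise whose standard deviation (5.0) is far larger than the contribution of one bit. The function names and parameters are hypothetical.

```python
import random


def simulate_traces(key, n_traces, noise_sigma, seed=0):
    """Toy leakage model (an assumption): Hamming weight of the 4-bit XOR
    output p ^ key plus Gaussian noise, one number per trace."""
    rng = random.Random(seed)
    traces = []
    for _ in range(n_traces):
        p = rng.randrange(16)                      # random 4-bit input
        leak = bin(p ^ key).count("1")             # Hamming weight of the output
        traces.append((p, leak + rng.gauss(0.0, noise_sigma)))
    return traces


def dpa_recover_key(traces, n_bits=4):
    """Difference of means: for each bit j, split the traces by input bit j.
    If key bit j is 0 the output bit equals the input bit, so the group with
    input bit 1 leaks more on average (positive difference); if key bit j is 1
    the output bit is inverted and the difference is negative."""
    key = 0
    for j in range(n_bits):
        ones = [t for p, t in traces if (p >> j) & 1]
        zeros = [t for p, t in traces if not (p >> j) & 1]
        diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
        if diff < 0:
            key |= 1 << j
    return key


secret = 0b1011
traces = simulate_traces(secret, n_traces=20000, noise_sigma=5.0)
print(bin(dpa_recover_key(traces)))
```

A single trace reveals almost nothing at this noise level. Averaging thousands of traces, however, shrinks the noise in each group mean roughly with the square root of the number of traces, until the one-bit difference stands out.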

4.3 Complementary Output Gates

For this experiment, complementary output gates were used to check their resistance to power analysis attacks. Complementary gates provide both the inverted and the non-inverted output of the operation. These gates are expected to be robustly resistant to power analysis attacks, as the transitions occurring on the MOSFET outputs are always balanced regardless of the gate input. The complementary XOR gate used in this experiment consists of 8 NMOS and 8 PMOS transistors (including the inverters required to obtain the inverted signals). This is a 20% reduction in transistor count compared with two independent gates (optimized XOR and XNOR gates, which are much smaller than the ones constructed from NAND gates). Figure 4.6 depicts the transistor-level diagram of the complementary XOR gate.
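The intuition behind the balanced transitions can be captured in a deliberately crude logic-level model. It assumes that only the gate's output nodes matter, that each rising node charges an equal load from VDD, and that the input toggles from 0 to 1 as in the experiments. Under these assumptions a single-rail XOR draws charge only when the stored bit is 0, whereas the complementary gate always has exactly one rising rail. Internal nodes, capacitance mismatch and timing are all ignored, and the function names are hypothetical.

```python
def xor_rails(x, r):
    """True and complement outputs of a complementary XOR gate."""
    y = x ^ r
    return (y, 1 - y)


def rising_edges(before, after):
    """Count output nodes that go 0 -> 1. In this toy model each rising node
    charges an equal load from VDD, so the count stands in for supply charge."""
    return sum(1 for b, a in zip(before, after) if b == 0 and a == 1)


def supply_charge(stored_bit, complementary):
    before = xor_rails(0, stored_bit)  # input at 0
    after = xor_rails(1, stored_bit)   # input toggled to 1
    if not complementary:              # single-rail gate: true output only
        before, after = before[:1], after[:1]
    return rising_edges(before, after)


for r in (0, 1):
    print(r, supply_charge(r, complementary=False), supply_charge(r, complementary=True))
# 0 1 1
# 1 0 1
```

In the single-rail column the charge reveals the stored bit; in the complementary column it is the same for both values.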

Figure 4.6 Complementary XOR gate

The results of the experiment can be seen in figure 4.7.

[Plot: Complementary Gates; overlaid current traces I0, I1, I2 and I3 against time (0 to 1.00E-07)]

Figure 4.7 Complementary gate power trace

Please see individual power traces and a full page version of the overlaid power trace in Appendix VI. The traces look very similar to each other because of the nature of the dual-rail gate: every rise corresponds to an equal fall, so the bit stored in the register cannot be predicted. This design therefore provides substantial resistance against power analysis attacks. There are subtle artifacts that differ between the power traces, but they are very difficult to discern, and they would be harder still in practice: this is a noiseless simulation, whereas in an actual cryptographic circuit the effects of other operations add a significant amount of noise to the power trace. The same experiment was performed using the 45 nm technology; the results can be seen in Figure 4.8.

[Plot: Complementary Gates 45nm; overlaid current traces I0, I1, I2 and I3 against time (0 to 1.00E-07)]

Figure 4.8 Complementary gates 45nm power trace

Please see individual power traces and a full page version of the overlaid power trace in Appendix VII. The individual traces in this plot are almost indistinguishable from each other. It can therefore be concluded that complementary gates are highly resistant against power analysis attacks.


4.3.1 Padded Complimentary Gates

This experiment uses complementary gates as described in the previous experiments (0.5 micron technology), but with a buffer added at each input of the gate as padding. This architecture is expected to be resistant against power analysis attacks, as the buffers further obfuscate the power trace and prevent spikes and artifacts at the inputs from propagating to the gate. The results of the experiment can be seen in figure 4.9.

[Plot: Padded Complementary Gates; overlaid current traces i0, i1, i2 and i3 against time (0 to 1.00E-07)]

Figure 4.9 Padded complementary gates power trace

Please see individual power traces and a full page version of the overlaid power trace in Appendix VIII. The power trace shows that adding buffers at the inputs substantially increases the gates' resistance to power analysis attacks: the traces are almost indistinguishable from each other and much more consistent than those of the same gates without buffer padding. It can therefore be concluded from this experiment that these padded gates provide robust protection against power analysis attacks.

5 Gate Substitution Workflow

The workflow for analyzing a circuit's vulnerability to power analysis attacks and protecting it has three major steps. The first is the synthesis step, which uses standard synthesis software to synthesize the circuit into a gate-level design. The second is the analysis step, in which the circuit is analyzed by calculating controllability and observability values for each gate. The final step substitutes the gates determined to be vulnerable with modified gates that have better resistance against power analysis attacks.

Figure 5.1 Flow diagram of workflow


5.1 Synthesis Stage and Analysis Stage

The synthesis stage uses conventional synthesis software to convert the high-level RTL design into a gate-level design based on a given gate library. Synthesis is the second step (after RTL design) in industrial circuit design, and it is an important one, as it optimizes the circuit as well as converting it to a gate-level design. The analysis stage must occur after synthesis, because the algorithm for levelizing the circuit and assigning controllability and observability values works only on gate-level designs. It cannot be performed on the higher-level RTL form, not only because of the complexity but also because of implementation differences between synthesis settings and tools. Substitutions must also happen after synthesis, as synthesis removes redundancies from the circuit, including any put in place to counteract power analysis attacks.

The analysis stage runs the levelization algorithm on the synthesized circuit and assigns controllability and observability values to each gate, following the algorithm described in SCOAP [16]. This provides an overview of the circuit’s vulnerabilities to power analysis attacks, from which certain gates can be marked as vulnerable depending on the thresholds provided. The controllability threshold ranges from 2 (gates that occur directly after the primary inputs, which are the most vulnerable to power analysis attacks in which the attacker can manipulate the inputs) to infinity, where all gates are marked as vulnerable. The observability threshold ranges from 1 (gates that occur directly before the primary outputs, which are the most vulnerable to power analysis attacks in which the attacker has access to the output data of the circuit) to infinity, where all gates are marked as vulnerable. The runtime of a test run of the levelization algorithm and the system it ran on are given in Table 5.1.

Number of Gates and FlipFlops      17793
Algorithm Runtime (ms)             6251
System: Operating System           Windows 7 Home Premium
System: CPU                        Intel i7 2860QM (2.5 GHz)
System: Memory                     16GB DDR3 1600MHz

Table 5.1 Algorithm Stats and System Info.
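The threshold rule described above can be sketched as a simple filter. This is a minimal sketch under stated assumptions: each gate is assumed to already carry a single controllability number and a single observability number (how the workflow reduces CC0/CC1 to one value is not specified here), a gate is marked when either value is at or below its threshold, and the gate names and values in the example are hypothetical.

```python
import math


def mark_vulnerable(gates, c_threshold, o_threshold):
    """Return the names of gates whose controllability <= c_threshold or
    whose observability <= o_threshold. A threshold of math.inf marks every
    gate; an observability threshold of 0 marks no gate on observability
    alone, since the lowest gate observability in this scheme is 1."""
    return sorted(
        name for name, (controllability, observ) in gates.items()
        if controllability <= c_threshold or observ <= o_threshold
    )


# Hypothetical per-gate values: name -> (controllability, observability).
GATES = {"g1": (2, 5), "g2": (4, 1), "g3": (6, 3)}

print(mark_vulnerable(GATES, 2, 0))         # ['g1']
print(mark_vulnerable(GATES, 2, 1))         # ['g1', 'g2']
print(mark_vulnerable(GATES, math.inf, 0))  # ['g1', 'g2', 'g3']
```

The first call mirrors the setting used later for the AES circuit (controllability threshold 2, observability threshold 0).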

5.2 Substitution Stage

In this stage, gates marked as vulnerable in the previous stage are replaced by specific replacement gates. These replacement gates must be mapped to a corresponding gate in the synthesis gate library before they can be substituted. The type of gate used in the substitution can be customized. Since the substitution happens in the gate-level design, the substitute can be a combination of gates or a completely new gate; in the latter case, the gate must be described in the cell library, as the description is required when the circuit proceeds to the next stage of the design process. The preliminary experiments in the previous chapter indicate that using complementary gates as substitutes is one of the simplest ways to counteract power analysis attacks. The substitutes are not limited to complementary gates, however: combinations of gates that produce the same output, or the same gate with a different transistor-level design, can be used just as well.
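The substitution step can be sketched as a pass over the cell list that swaps the cell type of each marked instance while keeping its connections. This is a hypothetical illustration, not the thesis tools: the `Cell` record, the `CXOR2_PAD` cell name and the `PIN_COUNTS` table are invented for the example. The pin-count check reflects the requirement that a substitute has the same inputs and outputs as the gate it replaces.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cell:
    name: str       # instance name
    kind: str       # library cell type
    inputs: tuple   # input nets
    outputs: tuple  # output nets


# Hypothetical pin counts of library cells: kind -> (inputs, outputs).
PIN_COUNTS = {"XOR2": (2, 1), "AND3": (3, 1), "CXOR2_PAD": (2, 1)}


def substitute(cells, vulnerable, new_kind, pin_counts=PIN_COUNTS):
    """Replace the type of every cell named in `vulnerable` with `new_kind`,
    keeping its nets; refuse substitutes that are not pin-compatible."""
    result = []
    for cell in cells:
        if cell.name in vulnerable:
            if pin_counts[cell.kind] != pin_counts[new_kind]:
                raise ValueError(f"{new_kind} is not pin-compatible with {cell.kind}")
            cell = replace(cell, kind=new_kind)
        result.append(cell)
    return result


netlist = [Cell("u1", "XOR2", ("a", "k0"), ("s0",)),
           Cell("u2", "XOR2", ("b", "k1"), ("s1",))]
for cell in substitute(netlist, {"u1"}, "CXOR2_PAD"):
    print(cell.name, cell.kind, cell.inputs, cell.outputs)
# u1 CXOR2_PAD ('a', 'k0') ('s0',)
# u2 XOR2 ('b', 'k1') ('s1',)
```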

Figure 5.2 OR Gate with obfuscation to counteract power analysis attacks

Although the workflow allows combinations of gates such as the one depicted in Figure 5.2, they are quite costly in terms of gate count (especially with stricter analysis thresholds). Such gates should therefore be used only if they provide a specific advantage for a particular circuit. Complementary gates provide the best tradeoff between security and gate cost. A buffer-padded complementary gate provides better security at the cost of area and power consumption, as documented in the last experiment of the previous chapter.

Once the substitution stage is completed, the circuit is ready to proceed to the next stage of the design process. An advantage of this post-synthesis workflow in its current state is that none of its steps are computationally heavy. The levelization algorithm in the analysis stage is the costliest, but it has been optimized to handle large circuits quickly. It is therefore feasible to apply these steps during the design of a circuit without incurring a large computational and time cost.


6 Application of Workflow

The AES algorithm was chosen as the test circuit for this workflow because of its cryptographic nature and its complexity. Several parts of the AES algorithm are vulnerable to power analysis attacks when implemented conventionally, which makes it a prime candidate for testing vulnerability before and after applying gate substitutions through the proposed workflow. This chapter describes the process from the RTL implementation of the AES algorithm to the SPICE simulation of the post-workflow AES circuit.

6.1 RTL Design

The AES circuit was implemented using the ENS method, in which a combination of the S-Box and another lookup table replaces many of the intermediate operations. This method was chosen to reduce the circuit size, and it does not affect the part of the circuit on which the power analysis attack focuses. In addition, the lookup tables in the circuit were replaced with a simpler structure. While this does not provide accurate AES encryption, it drastically reduces the size of the circuit. The 128-bit variant of AES was used, as minimizing the size of the circuit was the priority: circuit size was the primary bottleneck in this experiment due to the computationally heavy nature of SPICE simulations. Verilog was chosen as the hardware description language for implementing the AES circuit.


6.2 Synthesis and Analysis

The Verilog RTL was synthesized using Mentor Graphics Leonardo Spectrum Level 3-2009a.6 with the ami05 library as the gate library. Once synthesis was complete, the resulting circuit was written as an EDIF (Electronic Design Interchange Format) file. This file was then processed by readedif, a program written by Prof. Daniel Saab that reads EDIF files, converts them to an intermediate format (based on cell models) and reports the statistics of the circuit. The statistics of the AES circuit are given in table 6.1.

Cell Type    Number of Occurrences
XNOR2        4816
XOR2         1712
INV          5904
FALSE        11136
DFFRS        5568
BUF          18000
Total        47136

Table 6.1 AES Circuit Statistics

The file output by readedif is then processed by B2Champ, another program written by Prof. Daniel Saab, which reads the bench intermediate format and converts the circuit into another intermediate format in which the circuit is described in terms of cells. The resulting file is ready for the analysis that assigns controllability and observability values. For this experiment the controllability threshold was 2 and the observability threshold was 0 (no substitutions due to observability). This setting marks the gates in the first AddRoundKey stage of the AES circuit (highlighted in red in figure 6.1).

Figure 6.1 AES algorithm with the vulnerable stage highlighted

This stage of the AES circuit consists of 128 two-input XOR gates, so these are the gates marked for replacement in the next stage.
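To illustrate why this stage maps to exactly 128 two-input XOR gates, the Python sketch below computes the initial AddRoundKey of AES-128 bit by bit and counts one XOR per bit of the 128-bit state. It is a behavioral model only, not the synthesized netlist, and `add_round_key` is a hypothetical name. Because each output bit is simply a plaintext bit XOR a key bit, whoever controls the plaintext directly controls the switching activity of these gates. The usage example takes the input block and cipher key of the cipher example in FIPS-197 [7].

```python
def add_round_key(state, round_key):
    """XOR a 16-byte AES-128 state with a 16-byte round key, bit by bit.

    Returns the new state and the number of two-input XOR operations used,
    which corresponds to one XOR2 gate per bit in hardware.
    """
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("AES-128 state and round key must be 16 bytes each")
    out = bytearray(16)
    xor_count = 0
    for i in range(16):
        for bit in range(8):
            p = (state[i] >> bit) & 1
            k = (round_key[i] >> bit) & 1
            out[i] |= (p ^ k) << bit  # one XOR2 gate
            xor_count += 1
    return bytes(out), xor_count


# Usage with the input block and cipher key of the FIPS-197 cipher example.
plaintext = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
state, gates = add_round_key(plaintext, key)
print(state.hex(), gates)  # 193de3bea0f4e22b9ac68d2ae9f84808 128
```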


6.3 Substitution and Conversion to a Netlist

A new cell type is created in the circuit file, and all the gates marked as vulnerable are replaced with this type. The new type has exactly the same number of inputs and outputs as the original XOR gate. The cell descriptions of all the standard cells are added to the circuit file, followed by the description of the new cell type. The cell descriptions are written in terms of NMOS and PMOS transistors. The new cell type is a complementary XOR gate with buffer padding (as tested in subchapter 4.3.1). The next step is to convert the circuit into a SPICE netlist so that it can be simulated. This is done with Champ2Spice, a program written by Prof. Daniel Saab that reads the circuit description and the cell descriptions and replaces each cell in the circuit with its transistors, flattening the design to the transistor level and writing it in SPICE netlist format. The output file from this program is modified to suit the particular SPICE simulator being used, and the relevant control statements and technology models are added. The technology used in this experiment is the 5V VDD 0.5 micron technology generated by Mentor Graphics ADK; the SPICE models of the N and P MOSFETs are given in Appendix I. Once this is done, the circuit is ready to be simulated.

Figure 6.2 Flow diagram of the analysis and substitution stages
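The substitution step amounts to retyping the marked gates while leaving their connections untouched. The sketch below assumes a hypothetical netlist format (a dictionary mapping each gate's output net to its cell type and input nets) and a hypothetical secure cell name, `XOR2_SEC`, standing in for the buffer-padded complementary XOR; it is not the file format used by B2Champ or Champ2Spice. The pin-count check mirrors the requirement that the new cell have exactly the same inputs and outputs as the gate it replaces.

```python
def substitute(gates, marked, new_cell, pin_counts):
    """Return a copy of the netlist with every marked gate retyped to new_cell.

    gates: {output_net: (cell_type, [input_nets])}  (hypothetical format)
    marked: set of output nets of the gates flagged by the analysis stage
    pin_counts: {cell_type: (n_inputs, n_outputs)}
    """
    result = {}
    for out, (cell, ins) in gates.items():
        if out in marked:
            # The secure cell must be a drop-in replacement for the original.
            if pin_counts[new_cell] != pin_counts[cell]:
                raise ValueError(f"{new_cell} cannot replace {cell}: pin mismatch")
            result[out] = (new_cell, list(ins))
        else:
            result[out] = (cell, list(ins))
    return result


pins = {"XOR2": (2, 1), "XOR2_SEC": (2, 1), "INV": (1, 1)}
netlist = {
    "x0": ("XOR2", ["p0", "k0"]),
    "x1": ("XOR2", ["p1", "k1"]),
    "n0": ("INV", ["x0"]),
}
secured = substitute(netlist, {"x0", "x1"}, "XOR2_SEC", pins)
print(secured["x0"], secured["n0"])  # ('XOR2_SEC', ['p0', 'k0']) ('INV', ['x0'])
```

Returning a new dictionary keeps the original netlist available, so pre and post substitution versions can both be exported for comparison, as was done for the simulations in this chapter.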


6.4 SPICE Simulation

Even though many measures were taken to minimize the size of the circuit, the SPICE netlists were still very large: 146335 transistors before the gate substitutions and 146847 transistors after. The simulation was first attempted with NGSpice, the software used for the preliminary experiments in chapter 4, but it was unsuccessful: the circuit was too large, and NGSpice crashed repeatedly while initializing the circuit matrix and performing the transient analysis. The next simulator tried was LTSpice XVII, a freeware SPICE simulator developed by Linear Technology Corporation with enhancements to accommodate large circuits [32]. LTSpice was chosen because its simulation engine can run with a high degree of parallelism and fully utilize multi-threaded CPUs. LTSpice also provides many configuration options for customizing a simulation run. However, even with LTSpice the circuit was initially too large to simulate effectively, resulting in crashes and infinite loops. This was mostly due to convergence problems. The initial node voltages of the circuit are determined by an iterative algorithm, and the size of the circuit can cause numerical errors that prevent the node voltages from stabilizing within the iteration limit. To counteract this, some of the LTSpice simulation settings were modified. The modified settings and their descriptions are given in table 6.2.


Option      Description                                                  Value
gmin        Conductance added at every PN junction to aid convergence    1e-10
abstol      Absolute current error tolerance                             1e-10
reltol      Relative error tolerance                                     0.003
cshunt      Optional capacitance added from every node to ground         1e-15
noopiter    Go directly to gmin stepping                                 true

Table 6.2 Changed SPICE options (descriptions from LTSpice help [33])
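The settings in table 6.2 amount to a single `.options` control statement in the netlist. The Python sketch below shows one way the post-processing step that adds control statements to the Champ2Spice output could be scripted; both helper names are hypothetical. Writing flags such as `noopiter` without a value is an assumption that should be checked against the LTSpice `.OPTIONS` help [33], and the `.tran 20n` line corresponds to the 20 ns transient analysis used in this experiment.

```python
def options_card(options):
    """Format a SPICE .options line; a value of True is written as a bare flag."""
    parts = []
    for name, value in options.items():
        parts.append(name if value is True else f"{name}={value}")
    return ".options " + " ".join(parts)


def add_controls(netlist_text, options, stop_time="20n"):
    """Append simulator control statements to a flattened netlist (hypothetical)."""
    lines = netlist_text.rstrip("\n").splitlines()
    if lines and lines[-1].strip().lower() == ".end":
        lines.pop()  # re-added below so that .end stays the last line
    lines += [options_card(options), f".tran {stop_time}", ".end"]
    return "\n".join(lines) + "\n"


# Values from table 6.2.
TABLE_6_2 = {
    "gmin": "1e-10",
    "abstol": "1e-10",
    "reltol": "0.003",
    "cshunt": "1e-15",
    "noopiter": True,
}
print(options_card(TABLE_6_2))
# .options gmin=1e-10 abstol=1e-10 reltol=0.003 cshunt=1e-15 noopiter
```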

Once these options were changed, the circuit converged and LTSpice was able to perform the transient analysis. A 20 ns simulation was performed on both the pre and post gate substitution netlists, each with two sets of inputs that differ only in selected bits. The simulation times and the system used are given in table 6.3.

Category                  Item                       Value
Pre Gate Substitution     Simulation Time (hours)    92.83
Post Gate Substitution    Simulation Time (hours)    97.50
System                    Operating System           Windows 7 Home Premium
System                    CPU                        Intel i7 2860QM (2.5 GHz)
System                    Memory                     16GB DDR3 1600MHz

Table 6.3 Simulation Times and System information
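As a worked check of these numbers, the short Python calculation below derives the overhead of the substitution from the transistor counts given earlier in this section and the simulation times in table 6.3. Attributing the whole transistor difference to the 128 substituted XOR gates is an assumption, justified only insofar as those were the only gates replaced.

```python
pre_transistors, post_transistors = 146335, 146847
substituted_gates = 128          # gates of the first AddRoundKey stage
pre_hours, post_hours = 92.83, 97.50

added = post_transistors - pre_transistors
per_gate = added / substituted_gates
transistor_overhead = 100 * added / pre_transistors
time_overhead = 100 * (post_hours - pre_hours) / pre_hours

print(added, per_gate)                    # 512 4.0
print(f"{transistor_overhead:.2f} %")     # 0.35 %
print(f"{time_overhead:.2f} %")           # 5.03 %
```

In other words, the substitution added a net of four transistors per replaced gate, a 0.35 % increase in transistor count, while the simulation took about 5 % longer.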

After the simulations finished, the current trace of the primary voltage source was saved; since the supply voltage is fixed, this current trace is directly proportional to the power consumption of the circuit.
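Because the supply voltage is constant, the saved current trace converts to instantaneous power by multiplying by VDD, and integrating the power over time gives the energy drawn. The sketch below assumes the trace has been exported as two equal-length lists (time in seconds, supply current in amperes) and uses the trapezoidal rule, which handles the non-uniform time steps that SPICE simulators typically produce. It also assumes the usual SPICE sign convention, in which the reported current through a voltage source is positive when it flows into the positive terminal, so a source delivering power reports a negative current; `supply_power` is a hypothetical helper.

```python
def supply_power(t, i_supply, vdd=5.0):
    """Return instantaneous power samples and total energy from a supply current trace.

    t: sample times in seconds (increasing, possibly non-uniform)
    i_supply: current through the supply source in amperes, SPICE sign convention
    vdd: supply voltage (5 V for the 0.5 micron technology used here)
    """
    if len(t) != len(i_supply):
        raise ValueError("time and current traces must have the same length")
    # Negate: a source that delivers power reports a negative current.
    power = [-vdd * i for i in i_supply]
    # Trapezoidal rule over possibly uneven time steps.
    energy = sum(
        0.5 * (power[k] + power[k + 1]) * (t[k + 1] - t[k])
        for k in range(len(t) - 1)
    )
    return power, energy


# Usage: a constant 1 mA draw for 3 ns, sampled unevenly.
p, e = supply_power([0.0, 1e-9, 3e-9], [-1e-3, -1e-3, -1e-3])
print(p[0], e)  # about 0.005 W and 1.5e-11 J
```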


6.5 Experiment Results

Once the initial simulations were concluded, it was clear that simple power analysis cannot be performed on the AES circuit regardless of the gate type used: the complexity of the circuit results in a very low signal-to-noise ratio in the power trace.

The main focus of the experiment was therefore resistance to differential power analysis. Because of computational constraints, running enough SPICE simulations to actually carry out a differential power analysis attack was infeasible. The scope of the experiment is therefore to investigate whether the circuit exhibits signs of vulnerability to differential power analysis attacks. This is possible because all the inputs can be controlled very precisely in simulation.

The results of the simulation runs for the pre and post gate substitution circuits are shown in figures 6.3 and 6.4, respectively.

[Plot: Pre Gate Substitution Power Trace; vertical axis -5.00E+00 to 5.00E+00, time axis 0 to 20 ns]

Figure 6.3 Pre gate substitution Power Trace


[Plot: Post Gate Substitution Power Trace; vertical axis -5.00E+00 to 5.00E+00, time axis 0 to 20 ns]

Figure 6.4 Post gate substitution power trace

To show the differences in correlation more clearly, magnified views of the pre and post gate substitution power traces are shown in figures 6.5 and 6.6.

[Plot: Pre Gate Substitution Power Trace (Magnified); vertical axis -5.00E+00 to 5.00E+00, time axis 6.45 ns to 6.55 ns]

Figure 6.5 Pre gate substitution power trace (magnified)


[Plot: Post Gate Substitution Power Trace (Magnified); vertical axis -5.00E+00 to 5.00E+00, time axis 6.45 ns to 6.55 ns]

Figure 6.6 Post gate substitution power trace (magnified)

Because the AES circuit is pipelined, all the stages operate simultaneously, which adds a lot of noise to the power trace. Although a difference can be seen in the magnified traces, some of the spikes are masked by this noise, making it very difficult to identify changes in the part of the trace being examined for vulnerabilities. This does not hinder a real differential power analysis attack, however, since statistical analysis over a large number of traces lets the attacker filter out the noise.
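The statistical filtering mentioned above can be illustrated with the difference-of-means test of Kocher et al. [3]. The sketch below uses synthetic traces, not data from this experiment: each trace is Gaussian noise, with a small leakage added at one sample whenever a selection bit is 1. In a real attack the selection bit would be predicted from known plaintext under a key hypothesis (for example, one output bit of the first AddRoundKey stage); here it is simply given. Because the standard error of each mean shrinks as one over the square root of the number of traces, the leakage stands out once enough traces are averaged, even though it is ten times smaller than the noise in any single trace. The function names and parameter values are illustrative assumptions.

```python
import random


def difference_of_means(traces, selection_bits):
    """Per-sample mean of traces with bit 1 minus mean of traces with bit 0."""
    n_samples = len(traces[0])
    sums = {0: [0.0] * n_samples, 1: [0.0] * n_samples}
    counts = {0: 0, 1: 0}
    for trace, bit in zip(traces, selection_bits):
        counts[bit] += 1
        for k, value in enumerate(trace):
            sums[bit][k] += value
    return [sums[1][k] / counts[1] - sums[0][k] / counts[0] for k in range(n_samples)]


def synthetic_traces(n_traces, n_samples=5, leak_sample=2, leak=0.1, noise=1.0, seed=0):
    """Synthetic traces: Gaussian noise plus `leak` at one sample when the bit is 1."""
    rng = random.Random(seed)
    traces, bits = [], []
    for _ in range(n_traces):
        bit = rng.getrandbits(1)
        trace = [rng.gauss(0.0, noise) for _ in range(n_samples)]
        trace[leak_sample] += leak * bit
        traces.append(trace)
        bits.append(bit)
    return traces, bits


traces, bits = synthetic_traces(40000)
diff = difference_of_means(traces, bits)
print([round(d, 3) for d in diff])  # largest value near 0.1, at index 2
```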

To investigate the vulnerabilities further, the power consumption of the stage under attack was isolated, giving a much clearer view of the behavior of its gates. Two simulations are presented for each circuit; they differ in that some of the input bits remain low in one and turn high in the other. The run in which the bits remain low is named Simulation A, and the run in which the bits transition from low to high is named Simulation B. The power trace of the pre gate substitution circuit is shown in figure 6.7 and that of the post gate substitution circuit in figure 6.8. For each circuit, both simulations are overlaid in a single chart for easier comparison.

[Plot: Stage Isolated Pre Gate Substitution Power Trace; series Simulation A and Simulation B; vertical axis -1.00E+00 to 1.00E+00, time axis 0 to 20 ns]

Figure 6.7 Stage Isolated pre gate substitution power trace

[Plot: Stage Isolated Post Gate Substitution Power Trace; series Simulation A and Simulation B; vertical axis -1.00E+00 to 1.00E+00, time axis 0 to 20 ns]

Figure 6.8 Stage Isolated post gate substitution power trace


To show the results more clearly, the two power traces were magnified over the time span in which the inputs differ (6.45 ns to 6.55 ns). These magnified traces are shown in figures 6.9 and 6.10 for the pre and post gate substitution circuits, respectively.

[Plot: Stage Isolated Pre Gate Substitution Power Trace (Magnified); series Simulation A and Simulation B; vertical axis -1.00E+00 to 1.00E+00, time axis 6.45 ns to 6.55 ns]

Figure 6.9 Stage Isolated pre gate substitution power trace (magnified)

[Plot: Stage Isolated Post Gate Substitution Power Trace (Magnified); series Simulation A and Simulation B; vertical axis -1.00E+00 to 1.00E+00, time axis 6.45 ns to 6.55 ns]

Figure 6.10 Stage Isolated post gate substitution power trace (magnified)


The magnified power traces clearly show that the two circuits behave very differently during the input transitions. In the circuit with the conventional gates, the power consumption correlates closely with the transitions: the dips in the power trace are clear, consistent and distinguishable. In contrast, the power trace of the circuit with the substituted gates shows no correlation with the input transitions.

The traces of the two simulations do not match as perfectly as in the preliminary experiments carried out in chapter 4. This may be caused by various factors, such as the input-dependent difference in rise and fall times between the two outputs of the complementary gate, and numerical errors during the simulation. Note that the sudden large positive and negative spikes in the simulations of both circuits are due to numerical errors introduced by the multistep method used to solve the differential equations.

It can therefore be concluded that, even though the gate substitution did not yield a perfect result (identical traces for the two different inputs), it substantially decreased the correlation between the data and the power consumption, resulting in a circuit that is more resistant to differential power analysis.
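A simple numerical complement to this visual comparison is to resample Simulation A and Simulation B onto a common time grid and measure their difference inside the window where the inputs differ. The sketch below is an illustration under assumptions, not part of the original workflow: it assumes each trace has been exported as arrays of increasing time points and currents, uses linear interpolation (`numpy.interp`) because a SPICE simulator generally chooses different time steps in different runs, and defaults to the 6.45 ns to 6.55 ns window of figures 6.9 and 6.10. A larger peak or RMS difference indicates a stronger dependence of the power consumption on the data in that window.

```python
import numpy as np


def window_difference(t_a, i_a, t_b, i_b, t_start=6.45e-9, t_stop=6.55e-9, n_points=1001):
    """Peak and RMS difference between two traces inside [t_start, t_stop].

    Both traces are linearly interpolated onto the same uniform grid, since the
    simulator's time steps generally differ between runs.
    """
    grid = np.linspace(t_start, t_stop, n_points)
    a = np.interp(grid, t_a, i_a)
    b = np.interp(grid, t_b, i_b)
    diff = a - b
    return float(np.max(np.abs(diff))), float(np.sqrt(np.mean(diff ** 2)))


# Usage with placeholder traces (not simulation data): B is A offset by 0.1.
t = np.linspace(0.0, 20e-9, 2001)
trace_a = np.zeros_like(t)
trace_b = trace_a + 0.1
peak, rms = window_difference(t, trace_a, t, trace_b)
print(round(peak, 6), round(rms, 6))  # 0.1 0.1
```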


7 Future Work

There are several ways to improve the gate substitution workflow that were not pursued due to resource constraints. The next step for this research is to combine the individual steps and automate the process. While the computationally heavy parts of the process, such as the levelization algorithm, are already implemented, they need to be integrated with the other required stages and conversions, ideally inside the circuit synthesizer, as this would make the workflow extremely easy for the circuit designer to use. Once integrated, the program would require only the desired level of resistance and the gate libraries, and its output would be the gate level circuit with all the relevant security measures already in place. Since there are no computationally heavy tasks in the workflow, this addition is not expected to increase the synthesis time of a circuit by a large amount. Another potential improvement is to extend the scope of the system beyond gate level substitutions. Substitutions at the transistor level can provide benefits such as smaller designs and stronger protection, but they would require the ability to create custom cells and extensive information about the lower level parameters of the design, since the logic style being used must be taken into consideration when security measures are added at the transistor level. Finally, an optimization step could be introduced at the end of the workflow to optimize the cost and security of the circuit. It may be able to remove redundant gates (and transistors, if the scope is extended to the transistor level) while still maintaining the specified controllability and observability thresholds, and therefore consistent resistance to power analysis attacks.


8 Conclusion

In this research a workflow was proposed to analyze and increase the security of a circuit during the synthesis stage. The proposed workflow is fully automatable and has a light computational cost. It consists of three stages: the synthesis stage (conventional circuit synthesis with a cell library in which each cell has a corresponding secure design to be used in the substitution stage); the analysis stage, in which the circuit is analyzed using a levelization algorithm to determine vulnerabilities; and the substitution stage, in which the gates prone to power analysis attacks are replaced with more resilient gates. The workflow works at the gate level and can be used as an extension of the synthesis process, requiring only the user's input of the gate library and security level thresholds. The feasibility of gate based defenses against power analysis was first tested on individual gates to verify the concept. An AES circuit was then designed and used to test the workflow; AES was chosen for its complexity and cryptographic functionality. The AES circuit (designed to minimize the number of gates used) was subjected to the proposed workflow and converted to a SPICE netlist for electronic simulation. The simulations were used to investigate the effects of the workflow on the circuit's resistance to power analysis attacks. From the results of the experiment it can be concluded that, while it does not completely eliminate the possibility of power analysis attacks, the workflow increased the circuit's resistance to them. Further progress can be made by using different gate architectures that provide better resilience against attacks, by extending the scope of the workflow, and by adding optimization.


Appendices

Appendix I: 0.5 micron SPICE Models

NMOS Transistor (generated by Mentor Graphics ADK):

.MODEL n NMOS LEVEL=3 PHI=0.700000 TOX=1.0000E-08 XJ=0.200000U TPG=1
+ VTO=0.7812 DELTA=2.4510E-01 LD=4.0510E-08 KP=1.8847E-04
+ UO=545.8 THETA=2.5170E-01 RSH=2.1290E+01 GAMMA=0.6200
+ NSUB=1.3810E+17 NFS=7.0710E+11 VMAX=1.8610E+05 ETA=2.2420E-02
+ KAPPA=9.6720E-02 CGDO=3.66E-10 CGSO=3.66E-10
+ CGBO=4.0161E-10 CJ=5.4E-04 MJ=0.6 CJSW=1.5000E-10
+ MJSW=0.32 PB=0.99

PMOS Transistor (generated by Mentor Graphics ADK):

.MODEL p PMOS LEVEL=3 PHI=0.700000 TOX=1.0000E-08 XJ=0.200000U TPG=-1
+ VTO=-0.9197 DELTA=2.4830E-01 LD=6.7120E-08 KP=4.4546E-05
+ UO=129.0 THETA=1.7800E-01 RSH=3.4290E+00 GAMMA=0.5230
+ NSUB=9.8260E+16 NFS=6.4990E+11 VMAX=3.0560E+05 ETA=1.7820E-02
+ KAPPA=6.3410E+00 CGDO=3.66E-10 CGSO=3.66E-10
+ CGBO=4.2772E-10 CJ=9.3191E-04 MJ=0.51 CJSW=1.5E-10
+ MJSW=0.193 PB=0.95


Appendix II: 45 nanometer SPICE Models

NMOS Transistor [31]:

.model n nmos level = 54
+version = 4.0 binunit = 1 paramchk= 1 mobmod = 0
+capmod = 2 igcmod = 1 igbmod = 1 geomod = 1
+diomod = 1 rdsmod = 0 rbodymod= 1 rgatemod= 1
+permod = 1 acnqsmod= 0 trnqsmod= 0
+tnom = 27 toxe = 1.4e-009 toxp = 7e-010 toxm = 1.4e-009
+dtox = 0 epsrox = 3.9 wint = 5e-009 lint = 1.2e-008
+ll = 0 wl = 0 lln = 1 wln = 1
+lw = 0 ww = 0 lwn = 1 wwn = 1
+lwl = 0 wwl = 0 xpart = 0 toxref = 1.4e-009
+vth0 = 0.22 k1 = 0.35 k2 = 0.05 k3 = 0
+k3b = 0 w0 = 2.5e-006 dvt0 = 2.8 dvt1 = 0.52
+dvt2 = -0.032 dvt0w = 0 dvt1w = 0 dvt2w = 0
+dsub = 2 minv = 0.05 voffl = 0 dvtp0 = 1e-007
+dvtp1 = 0.05 lpe0 = 5.75e-008 lpeb = 2.3e-010 xj = 2e-008
+ngate = 5e+020 ndep = 2.8e+018 nsd = 1e+020 phin = 0
+cdsc = 0.0002 cdscb = 0 cdscd = 0 cit = 0
+voff = -0.15 nfactor = 1.2 eta0 = 0.15 etab = 0
+vfb = -0.55 u0 = 0.032 ua = 1.6e-010 ub = 1.1e-017
+uc = -3e-011 vsat = 1.1e+005 a0 = 2 ags = 1e-020
+a1 = 0 a2 = 1 b0 = -1e-020 b1 = 0
+keta = 0.04 dwg = 0 dwb = 0 pclm = 0.18
+pdiblc1 = 0.028 pdiblc2 = 0.022 pdiblcb = -0.005 drout = 0.45
+pvag = 1e-020 delta = 0.01 pscbe1 = 8.14e+008 pscbe2 = 1e-007
+fprout = 0.2 pdits = 0.2 pditsd = 0.23 pditsl = 2.3e+006
+rsh = 3 rdsw = 150 rsw = 150 rdw = 150
+rdswmin = 0 rdwmin = 0 rswmin = 0 prwg = 0
+prwb = 6.8e-011 wr = 1 alpha0 = 0.074 alpha1 = 0.005
+beta0 = 30 agidl = 0.0002 bgidl = 2.1e+009 cgidl = 0.0002
+egidl = 0.8
+aigbacc = 0.012 bigbacc = 0.0028 cigbacc = 0.002
+nigbacc = 1 aigbinv = 0.014 bigbinv = 0.004 cigbinv = 0.004
+eigbinv = 1.1 nigbinv = 3 aigc = 0.012 bigc = 0.0028
+cigc = 0.002 aigsd = 0.012 bigsd = 0.0028 cigsd = 0.002
+nigc = 1 poxedge = 1 pigcd = 1 ntox = 1
+xrcrg1 = 12 xrcrg2 = 5
+cgso = 6.238e-010 cgdo = 6.238e-010 cgbo = 2.56e-011 cgdl = 2.495e-10
+cgsl = 2.495e-10 ckappas = 0.01 ckappad = 0.01 acde = 1
+moin = 15 noff = 0.9 voffcv = 0.02
+kt1 = -0.37 kt1l = 0.0 kt2 = -0.042 ute = -1.5
+ua1 = 1e-009 ub1 = -3.5e-019 uc1 = 0 prt = 0
+at = 53000
+fnoimod = 1 tnoimod = 0
+jss = 0.0001 jsws = 1e-011 jswgs = 1e-010 njs = 1
+ijthsfwd= 0.01 ijthsrev= 0.001 bvs = 10 xjbvs = 1
+jsd = 0.0001 jswd = 1e-011 jswgd = 1e-010 njd = 1
+ijthdfwd= 0.01 ijthdrev= 0.001 bvd = 10 xjbvd = 1
+pbs = 1 cjs = 0.0005 mjs = 0.5 pbsws = 1
+cjsws = 5e-010 mjsws = 0.33 pbswgs = 1 cjswgs = 3e-010
+mjswgs = 0.33 pbd = 1 cjd = 0.0005 mjd = 0.5
+pbswd = 1 cjswd = 5e-010 mjswd = 0.33 pbswgd = 1
+cjswgd = 5e-010 mjswgd = 0.33 tpb = 0.005 tcj = 0.001
+tpbsw = 0.005 tcjsw = 0.001 tpbswg = 0.005 tcjswg = 0.001
+xtis = 3 xtid = 3
+dmcg = 0e-006 dmci = 0e-006 dmdg = 0e-006 dmcgt = 0e-007
+dwj = 0.0e-008 xgw = 0e-007 xgl = 0e-008
+rshg = 0.4 gbmin = 1e-010 rbpb = 5 rbpd = 15
+rbps = 15 rbdb = 15 rbsb = 15 ngcon = 1


PMOS Transistor [31]:

.model p pmos level = 54
+version = 4.0 binunit = 1 paramchk= 1 mobmod = 0
+capmod = 2 igcmod = 1 igbmod = 1 geomod = 1
+diomod = 1 rdsmod = 0 rbodymod= 1 rgatemod= 1
+permod = 1 acnqsmod= 0 trnqsmod= 0
+tnom = 27 toxe = 1.4e-009 toxp = 7e-010 toxm = 1.4e-009
+dtox = 0 epsrox = 3.9 wint = 5e-009 lint = 1.2e-008
+ll = 0 wl = 0 lln = 1 wln = 1
+lw = 0 ww = 0 lwn = 1 wwn = 1
+lwl = 0 wwl = 0 xpart = 0 toxref = 1.4e-009
+vth0 = -0.22 k1 = 0.39 k2 = 0.05 k3 = 0
+k3b = 0 w0 = 2.5e-006 dvt0 = 3.9 dvt1 = 0.635
+dvt2 = -0.032 dvt0w = 0 dvt1w = 0 dvt2w = 0
+dsub = 0.7 minv = 0.05 voffl = 0 dvtp0 = 0.5e-008
+dvtp1 = 0.05 lpe0 = 5.75e-008 lpeb = 2.3e-010 xj = 2e-008
+ngate = 5e+020 ndep = 2.8e+018 nsd = 1e+020 phin = 0
+cdsc = 0.000258 cdscb = 0 cdscd = 6.1e-008 cit = 0
+voff = -0.15 nfactor = 2 eta0 = 0.15 etab = 0
+vfb = 0.55 u0 = 0.0095 ua = 1.6e-009 ub = 8e-018
+uc = 4.6e-013 vsat = 90000 a0 = 1.2 ags = 1e-020
+a1 = 0 a2 = 1 b0 = -1e-020 b1 = 0
+keta = -0.047 dwg = 0 dwb = 0 pclm = 0.55
+pdiblc1 = 0.03 pdiblc2 = 0.0055 pdiblcb = 3.4e-008 drout = 0.56
+pvag = 1e-020 delta = 0.014 pscbe1 = 8.14e+008 pscbe2 = 9.58e-007
+fprout = 0.2 pdits = 0.2 pditsd = 0.23 pditsl = 2.3e+006
+rsh = 3 rdsw = 250 rsw = 160 rdw = 160
+rdswmin = 0 rdwmin = 0 rswmin = 0 prwg = 3.22e-008
+prwb = 6.8e-011 wr = 1 alpha0 = 0.074 alpha1 = 0.005
+beta0 = 30 agidl = 0.0002 bgidl = 2.1e+009 cgidl = 0.0002
+egidl = 0.8
+aigbacc = 0.012 bigbacc = 0.0028 cigbacc = 0.002
+nigbacc = 1 aigbinv = 0.014 bigbinv = 0.004 cigbinv = 0.004
+eigbinv = 1.1 nigbinv = 3 aigc = 0.69 bigc = 0.0012
+cigc = 0.0008 aigsd = 0.0087 bigsd = 0.0012 cigsd = 0.0008
+nigc = 1 poxedge = 1 pigcd = 1 ntox = 1
+xrcrg1 = 12 xrcrg2 = 5
+cgso = 7.43e-010 cgdo = 7.43e-010 cgbo = 2.56e-011 cgdl = 1e-014
+cgsl = 1e-014 ckappas = 0.5 ckappad = 0.5 acde = 1
+moin = 15 noff = 0.9 voffcv = 0.02
+kt1 = -0.34 kt1l = 0 kt2 = -0.052 ute = -1.5
+ua1 = -1e-009 ub1 = 2e-018 uc1 = 0 prt = 0
+at = 33000
+fnoimod = 1 tnoimod = 0
+jss = 0.0001 jsws = 1e-011 jswgs = 1e-010 njs = 1
+ijthsfwd= 0.01 ijthsrev= 0.001 bvs = 10 xjbvs = 1
+jsd = 0.0001 jswd = 1e-011 jswgd = 1e-010 njd = 1
+ijthdfwd= 0.01 ijthdrev= 0.001 bvd = 10 xjbvd = 1
+pbs = 1 cjs = 0.0005 mjs = 0.5 pbsws = 1
+cjsws = 5e-010 mjsws = 0.33 pbswgs = 1 cjswgs = 3e-010
+mjswgs = 0.33 pbd = 1 cjd = 0.0005 mjd = 0.5
+pbswd = 1 cjswd = 5e-010 mjswd = 0.33 pbswgd = 1
+cjswgd = 5e-010 mjswgd = 0.33 tpb = 0.005 tcj = 0.001
+tpbsw = 0.005 tcjsw = 0.001 tpbswg = 0.005 tcjswg = 0.001
+xtis = 3 xtid = 3
+dmcg = 5e-006 dmci = 5e-006 dmdg = 5e-006 dmcgt = 6e-007
+dwj = 0e-008 xgw = 3e-007 xgl = 4e-008
+rshg = 0.4 gbmin = 1e-010 rbpb = 5 rbpd = 15
+rbps = 15 rbdb = 15 rbsb = 15 ngcon = 1


Appendix III: Conventional Gate Experiment 1 Power Trace

[Plots: panels i0, i1, i2 and i3; vertical axis -0.02 to 0.02, time axis 0 to 100 ns]


Appendix IV: Conventional Gate Experiment 2 Power Trace

[Plots: panels I0, I1, I2 and I3; vertical axis -2.00E-02 to 2.00E-02, time axis 0 to 100 ns]


Appendix V: Conventional Gate 45nm Power Trace

[Plots: panels I0, I1, I2 and I3; vertical axis -8.00E-05 to 4.00E-05, time axis 0 to 100 ns]


Appendix VI: Complementary Gate Power Trace

[Plots: panels I0, I1, I2 and I3; vertical axis -2.00E-02 to 2.00E-02, time axis 0 to 100 ns]


Appendix VII: Complementary Gate 45nm Power Trace

[Plots: panels I0, I1, I2 and I3; vertical axis -1.00E-04 to 5.00E-05, time axis 0 to 100 ns]


Appendix VIII: Padded Complementary Gate Power Trace

[Plots: panels i0, i1, i2 and i3; vertical axis -2.00E-02 to 2.00E-02, time axis 0 to 100 ns]


References

[1] S. Smith, "Magic boxes and boots: security in hardware", Computer, vol. 37, no. 10, pp. 106-109, 2004.

[2] N. Potlapally, "Hardware security in practice: Challenges and opportunities", 2011 IEEE International Symposium on Hardware-Oriented Security and Trust, 2011.

[3] P. Kocher, J. Jaffe and B. Jun, "Differential Power Analysis", Proceedings of the 19th Annual International Cryptology Conference on Advances in Cryptology, pp. 388-397, 1999.

[4] P. Rohatgi, "Protecting FPGAs from power analysis | EE Times", EETimes, 2017. [Online]. Available: http://www.eetimes.com/document.asp?doc_id=1278081.

[5] T. Messerges, "Using Second-Order Power Analysis to Attack DPA Resistant Software", Cryptographic Hardware and Embedded Systems - CHES 2000, pp. 238-251, 2000.

[6] S. Chari, C. Jutla, J. Rao and P. Rohatgi, "Towards Sound Approaches to Counteract Power-Analysis Attacks", Advances in Cryptology - CRYPTO '99, pp. 398-412, 1999.

[7] "Advanced encryption standard (AES)", Federal Information Processing Standards Publication 197, United States National Institute of Standards and Technology (NIST), 2001.

[8] J. Daemen and V. Rijmen, "AES Proposal: Rijndael", 1999.

[9] Y. Wang and Y. Ha, "A Performance and Area Efficient ASIP for Higher-Order DPA-Resistant AES", IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 4, no. 2, pp. 190-202, 2014.

[10] A. Sia, "SE4C03 Wiki - Advanced Encryption Standard (AES)", Imps.mcmaster.ca, 2007. [Online]. Available: http://imps.mcmaster.ca/courses/SE-4C03-07/wiki/siaa/se4c03_aes_wiki(7).html. [Accessed: 19-Mar-2017].

[11] L. Barbosa, "POWER8 in-core cryptography", Ibm.com, 2015. [Online]. Available: http://www.ibm.com/developerworks/library/se-power8-in-core-cryptography/index.html. [Accessed: 19-Mar-2017].

[12] N. Eftaxiopulos-Sarris and G. Zervakis, "Design and Implementation of a Versatile Hardware Crypto IP for Symmetric and Asymmetric Algorithms", 2012.

[13] H. Wunderlich, "PROTEST: A Tool for Probabilistic Testability Analysis", 22nd ACM/IEEE Design Automation Conference, 1985.

[14] E. Eichelberger and T. Williams, "A logic design structure for LSI testability", Papers on Twenty-five years of electronic design automation - 25 years of DAC, 1988.

[15] W. Keiner and R. West, "Testability Measures", Proc. 1977 AUTOTESTCON, pp. 49-55, 1977.

[16] L. Goldstein and E. Thigpen, "SCOAP: Sandia Controllability/Observability Analysis Program", 17th Design Automation Conference, pp. 190-196, 1980.

[17] Z. Gajic, Linear dynamic systems and signals, 1st ed. Upper Saddle River (N.J.): Prentice Hall/Pearson Education, 2003, ch. 5.

[18] D. Forte, S. Bhunia and M. Tehranipoor, Hardware Protection through Obfuscation, 1st ed. 2017, p. 41.

[19] C. Patel, "Testability measures", UMBC CSEE. [Online]. Available: https://www.csee.umbc.edu/~cpatel2/links/418/lectures/chap6_lect07_testability_measures.pdf.

[20] C. Lin, B. Zheng, Q. Zhu and A. Sangiovanni-Vincentelli, "Security-Aware Design Methodology and Optimization for Automotive Systems", ACM Transactions on Design Automation of Electronic Systems, vol. 21, no. 1, pp. 1-26, 2015.

[21] S. Yang, P. Gupta, M. Wolf, D. Serpanos, V. Narayanan and Y. Xie, "Power Analysis Attack Resistance Engineering by Dynamic Voltage and Frequency Scaling", ACM Transactions on Embedded Computing Systems, vol. 11, no. 3, pp. 1-16, 2012.

[22] W. Yu, O. Uzun and S. Köse, "Leveraging on-chip voltage regulators as a countermeasure against side-channel attacks", Proceedings of the 52nd Annual Design Automation Conference on - DAC '15, 2015.

[23] M. Kar, D. Lie, M. Wolf, V. De and S. Mukhopadhyay, "Impact of inductive integrated voltage regulator on the power attack vulnerability of encryption engines: A simulation study", Proceedings of the IEEE 2014 Custom Integrated Circuits Conference, 2014.

[24] K. Tiri, M. Akmal and I. Verbauwhede, "A dynamic and differential CMOS logic with signal independent power consumption to withstand differential power analysis on smart cards", Proceedings of the 28th European Solid-State Circuits Conference, pp. 403-406, 2002.

[25] X. Pang, J. Wang, C. Wang and X. Wang, "A DPA resistant dual rail precharge logic cell", 2015 IEEE 11th International Conference on ASIC (ASICON), 2015.

[26] I. Verbauwhede and K. Tiri, "Wave dynamic differential logic", 8947123, 2015.

[27] W. Tang, S. Jia and Y. Wang, "A dual-voltage single-rail dynamic DPA-resistant logic based on charge sharing mechanism", 2015 IEEE International Conference on Electron Devices and Solid-State Circuits (EDSSC), 2015.

[28] S. Mangard, T. Popp and B. Gammel, "Side-Channel Leakage of Masked CMOS Gates", Lecture Notes in Computer Science, pp. 351-365, 2005.

[29] J. Zeng, Y. Wang, C. Xu and R. Li, "Improvement on masked S-box hardware implementation", 2012 International Conference on Innovations in Information Technology (IIT), 2012.

[30] P. Nenzi, "Ngspice circuit simulator", Ngspice.sourceforge.net. [Online]. Available: http://ngspice.sourceforge.net/presentation.html.

[31] "Predictive Technology Model (PTM)", Ptm.asu.edu, 2017. [Online]. Available: http://ptm.asu.edu/.

[32] "Linear Technology - Design Simulation and Device Models", Linear.com, 2017. [Online]. Available: http://www.linear.com/designtools/software/.

[33] ".OPTIONS -- Set simulator options", Ltwiki.org, 2017. [Online]. Available: http://ltwiki.org/LTspiceHelp/LTspiceHelp/_OPTIONS_Set_simulator_options.htm.