Side Channel Attack Resistance: Migrating Towards High Level Methods

Submitted to the

Division of Research and Advanced Studies of the University of Cincinnati

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in the School of Electronics and Computing Systems of the College of Engineering and Applied Science, University of Cincinnati, July 2013

Mike Borowczak

B.S. Computer Engineering, University of Cincinnati, Cincinnati, OH, June 2007

Thesis Advisor and Committee Chair: Ranga Vemuri


Abstract

Our world is moving towards ubiquitous networked computing with unstoppable momentum. With technology available at our fingertips, we expect to connect quickly, cheaply, and securely on the sleekest devices. While the past four decades of design automation research have focused on making integrated circuits smaller, cheaper and quicker, the past decade has drawn more attention towards security. Though security within the scope of computing is a large domain, the focus of this work is on the elimination of computationally based power byproducts, from high-level device models down to physical designs and implementations.

The scope of this dissertation is within the analysis, attack and protection of power based side channels. Current research in the field concentrates on determining, masking and/or eliminating the sources of data-dependent information leakage within designs.

While a significant amount of research has been devoted to reducing this leakage at low levels of abstraction (e.g., logic style, gate/circuit layout), significantly less effort has gone into higher levels of abstraction (e.g., architectural, algorithmic). This dissertation focuses on both ends of the design spectrum while motivating the future need for hierarchical side channel resistance metrics for hardware designs.

Current low level solutions focus on creating perfectly balanced standard cells through various straightforward logic styles. Each of these existing logic styles, while enhancing side channel resistance by reducing the channels' variance, comes at significant design expense in terms of area footprint, power consumption, delay and even logic style structure. The first portion of this dissertation introduces a universal cell based on a dual multiplexer and implemented in pass-transistor logic (SDMLp), which approaches, and in some cases exceeds, standard cell cost benchmarks. The proposed cell and circuit level methods show significant improvements in security metrics over existing cells and approach standard CMOS cell and circuit performance by reducing area, power consumption and delay.

While most low level works stop at the cell level, this work also investigates the impact of environmental factors on security.

On the other end of the design spectrum, existing secure architecture and algorithm research attempts to mask side channels through random noise, variable timing, instruction reordering and other similar methods. These methods attempt to obscure the relationship between the primary source of information and the side channels. Unfortunately, in most cases these techniques are still susceptible to attack, and most of those that show promise are algorithm specific. This dissertation approaches high level security by eliminating the relationship between high level side channel models and the side channels themselves. This work discusses two different solutions targeting architecture level protection: the first deals with the protection of Finite State Machines (FSMs), while the second deals with the protection of a class of cryptographic algorithms using Feedback Shift Registers. While the high level solutions geared towards FSMs are functional, they can be optimized; this dissertation includes methods for reducing the power overhead of any FSM circuit (secured or not). The solutions proposed herein render potential side channel models moot by eliminating or reducing the models' data-dependent variability. The result is undefined correlations and the elimination of all mutual information between the device's actual side channel and the side channel models, at the cost of area alone. Designers unwilling to accept a doubling of area can instead incorporate some sub-optimal security into their devices.

© 2013 Mike Borowczak

Some rights reserved. CC BY-SA 3.0

Acknowledgments

It’s all about the Coffee, Family and Friends. Without you and a steady supply of coffee, this dissertation wouldn’t have seen the light of day. The past few years have been full of growth and change, both personally and academically.

To my parents, Christine and Marc and my sister Marie, Steve and Andrea: thank you for all your love and encouragement over the years. I’m truly fortunate to have a loving and supportive family who always stood by me regardless of the obstacles, challenges and choices I’ve come across. Without you, this work would have taken much longer and been only a shell of what it is today.

A special thank you to all the great friends I’ve made during graduate school. I am especially grateful for the bonds and friendships developed during and as a result of the GK-12 program (Andrea, Chelsea, Nick & Ken): two years seemed like an instant, but the memories are ingrained forever.

Within the lab: Annie & Jon, the 528 cohort; Lakshmi, Manoj, Greg, Aditi & Antar, the brave master’s students who decided to plunge into the unknowns of Side Channel Attack & Analysis Research and provided many of the building blocks in the Low Level and analysis realms; the senior DDELite PhD cohort of Shubankar, Angan, Almitra and Vijay; and finally Arun, Kristin, Tuhin and Nakul, thank you all for providing me outlets for hours of conversation.

A decade and two degrees later, I’ve had the distinct pleasure of working with and learning from some incredible faculty. From the mentor on my first journal publication as a freshman, Dr. John Franco, to my first opportunity with proof of concept development work at Clifton Labs with Dr. Phillip Wilsey, to my experiences (and NSF opportunities) in engineering education with Dr. Anant Kukreti and Dr. Richard Miller: you’ve all helped form me into the researcher, engineer and educator that I am today.

A special thank you to my committee members, Dr. Wen Ben Jone, Dr. George Purdy, Dr. Carla Purdy, and Dr. Vijay Sundaresan: your support and feedback throughout the years and during this dissertation process have been invaluable.

Finally, an especially big thank you to my advisor, Dr. Ranga Vemuri. Allowing me to work and develop as a researcher in all facets of academic research at my own pace, entrusting me with half a dozen master’s students and allowing me to manage a mini research thrust has taught me more than any book can ever contain. Your guidance, encouragement and opinions have carried me through these past several years, and I’m forever grateful for the experience, maturity and knowledge I’ve gained under your direction within DDEL.

Contents

Title Page
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Dedication

1 Introduction
1.1 Motivation
1.1.1 Perspective
1.1.2 Present Day Relevance
1.2 Thesis Statement
1.3 Organization of Dissertation

2 Fundamentals
2.1 Fundamentals of Cryptography
2.1.1 Classes of Algorithms
2.1.2 Attacks on Cryptography: Cryptanalysis and Hardware
2.2 Fundamentals of Side Channels
2.2.1 Types of Side Channels
2.2.2 Modeling Side Channels
2.3 Side Channel Attack Theory
2.3.1 Protection Mechanisms

3 Low Level Protection Mechanisms
3.1 Introduction
3.2 Existing Work
3.3 Designing a Secure Cell: SDMLp
3.3.1 SDMLp Fundamentals
3.3.2 Cell Level Characterization & Analysis
3.3.3 Implications
3.4 Secure Circuit-Level Design Methodology
3.4.1 Synthesizing SDMLp based Circuits Using Reduced Order BDDs
3.4.2 SDMLp Synthesis Flow
3.4.3 Implementation & Analysis
3.4.4 Implications
3.5 Temperature Variation Effects on Dynamic Power
3.5.1 Temperature Effects
3.5.2 Implications

4 FSM Based Protection Mechanisms
4.1 Introduction
4.2 Motivation
4.3 Background
4.3.1 Limitation to existing Research
4.4 Objective
4.5 FSM Example
4.6 S*FSM
4.6.1 Structural
4.6.2 Encoding
4.7 Implementation
4.7.1 FSM Conversion
4.7.2 Solving the Encoding Problem with an SMT Solver
4.8 Experimental Setup
4.8.1 Characterization
4.8.2 Security Analysis Phase
4.9 Characterization Using Theoretical Simulation
4.10 Physical Realization of S*FSM
4.10.1 Implemented Flow
4.11 Results
4.11.1 Characterization Results
4.11.2 Security Results
4.12 Implications

5 SFSMs with Power Constraints
5.1 Motivation
5.2 Introduction
5.3 Existing Solution
5.4 Proposed Extension
5.5 SMT Based Constraint Formulation
5.6 Results
5.6.1 Characterization Results
5.6.2 Security Results
5.7 Implications

6 Side Channel Attack and Analysis Related Work
6.1 Architecture Level Approach
6.1.1 Motivation
6.1.2 FSR-based Algorithms
6.1.3 Register Rotation with Substitution
6.1.4 System Level Security through Parallelization
6.2 Tool and Technologies Developed
6.2.1 SCARF
6.2.2 Secure Cell Logic Synthesis Methodology
6.2.3 Supplementary

7 Conclusions and The Path Forward
7.1 Conclusions & Contributions
7.2 Future Work
7.2.1 Low Level Methods and Logic Synthesis
7.2.2 Automata Based Side Channel Protection
7.2.3 System Level Security through Duplication
7.2.4 Attacks
7.3 Beyond the Dissertation
7.4 Final Thoughts

Bibliography

List of Figures

2.1 Symmetric and Asymmetric Cryptography
2.2 Cryptographic Algorithm Classification
2.3 XOR Cipher
2.4 DES: A Classic Feistel Based Symmetric Algorithm
2.5 ECG Sinus Rhythm
2.6 Expanded Target Model
2.7 Asymmetric CMOS Power Consumption
2.8 CMOS Power Components
2.9 CMOS XOR Power Consumption
2.10 Power Dissipation Forecast
2.11 Target, Model and Attack
2.12 Simple Power Analysis on AES

3.1 CPL Base DUAL MUX Cell
3.2 SDMLp Logic Cell
3.3 SDMLp cell layout
3.4 Complementary Nodes
3.5 Targeted SDMLp Design Flow
3.6 DES Layouts
3.7 SDMLp Correlation vs. Number of Vectors
3.8 SDMLp: Correlation vs. Key Guess
3.9 NLFSR Correlation at Temperature
3.10 NLFSR Correlation with High Vth

4.1 Branch Predictor
4.2 Motivating Restructuring and Encoding
4.3 Restructured FSM
4.4 Structurally Modified BP FSM
4.5 Mutual Information: Four Cases
4.6 FSM Evaluation Flow
4.7 FSM Experimental Flow
4.8 BENGEN FSM 137
4.9 BENGEN FSM 86
4.10 MCNC BBARA
4.11 MCNC OPUS
4.12 MCNC State, Transition Increase
4.13 MCNC Bit Requirements
4.14 MCNC Layout Requirements
4.15 BBARA Entropy
4.16 BBARA Mutual Information
4.17 MCNC Current Entropy
4.18 MCNC MI(Power, HW)
4.19 MCNC MI(Power, HD)

5.1 3-State FSM Model Correlations
5.2 Functionally equivalent SFSM
5.3 MCNC Bit Requirements Across Structures
5.4 MCNC Bit Requirements Under Constraint
5.5 MCNC Bit Requirements w.r.t. Modified Constraints
5.6 MCNC Area Requirements
5.7 Full Current Trace MCNC EX1
5.8 Partial Current Trace MCNC EX1
5.9 Normalized Power Requirements
5.10 MCNC Entropy Under Constraints
5.11 MCNC Constrained MI(Power, HW)
5.12 MCNC Constrained MI(Power, HD)
5.13 MCNC Constrained Costs
5.14 Design and Security Costs

6.1 A generic NLFSR-based cryptographic architecture
6.2 Hamming Weight and Distance of a Static Rotating Register
6.3 Doubled FSR System
6.4 Quad FSR System
6.5 PDF of HW/HD Change

List of Tables

2.1 RSA Summarized
2.2 Possible Inverter Output Transition

3.1 Generic CPL Gate
3.2 SDMLp Max Inst. Current Variance
3.3 SDMLp Delay
3.4 SDMLp Layout Area
3.5 SDMLp Avg Power
3.6 DES Layout Area
3.7 Total Power: DES
3.8 Max Inst. Current: DES
3.9 Full DES Requirements

4.1 Multiple Encodings for Secure BP FSM
4.2 States and Transitions for SOPT
4.3 Existence of Hamming Model Correlations
4.4 BenGen State Requirements
4.5 BenGen Transition Requirements
4.6 MCNC State Space Requirements
4.7 MCNC Transition Requirements
4.8 Bits Needed to Secure BenGen FSMs
4.9 Bits Needed to Secure MCNC FSMs
4.10 BenGen Layout Area Requirements
4.11 MCNC Layout Requirements

5.1 Hamming Model Variability without Constraint
5.2 Hamming Model Variability with Constraint

6.1 Simple Rotating Register Values
6.2 XOR pairs for A Static Rotating Register
6.3 PDF of HW Change
6.4 Values of a rotating register with substitutions
6.5 Shift Register XOR Pairs
6.6 MSB and LSB of Rotating Register
6.7 Probability of ∆HD: 1reg/1bit
6.8 Possible HD Values and Probabilities
6.9 Probability of ∆HW/HD: 2reg/1bit
6.10 Probability of ∆HW/HD: 4reg/1bit

Dedication

To the dreamers, believers and hopeless romantics... You will always have two hundred and thirty-four things to do; you might as well enjoy the journey.

Chapter 1

Introduction

I have some security that could protect me against provocations but of course there are more terrible actions that could not be stopped by any security. -Garry Kasparov

1.1 Motivation

Over the past decade, the proliferation of security-centric mobile technologies, such as smartcards, implantable cardiac defibrillators, and soldier-worn sensor networks, has grown steadily to meet the demand set by the consumer, medical and military electronics markets. Each of these markets is ultimately driven by the ever-present desire for the fastest, smallest and cheapest technology with the longest up-time between recharges. While some consumer devices, such as radios, coffee makers, smart thermostats and arguably most pure audio/video players, contain and allow access to mundane information or services, most electronic devices store, use and/or control far more sensitive, personal or secret information.

At this moment, there is a good chance that you are surrounded by mobile devices, credit cards, a laptop, and car keys, among a multitude of other electronic devices. In many cases, you or someone you know may have an implanted electronic medical device. On your commute today, you were surrounded by electronic devices controlling everything from airbags and brakes to your climate control, traffic signals and even the most efficient route to work. Behind the scenes, your paycheck, investments, credit and medical history are moving from device to device: accessed, modified, traded and saved. Irrespective of the location, functionality or complexity, in each scenario security-critical devices attempt to allow you, or some authorized agent(s), secure access to some critical content or service.

Historically, in the digital age, information security has been viewed as a problem of applied mathematics, namely a cryptographic exercise. Except for one-way hashes, cryptographic functions are simply reversible mathematical functions that apply a transform to an input signal. A cryptographic function either transforms an input stream of sensitive information D into a secured stream S using some secret K, through a process called encryption, or transforms a secured stream back to the original data using some other secret K′ that is related to K, through a process called decryption. A cryptographic function is considered mathematically secure, or hardened, if, even given both D and S, finding K or K′ is not possible without brute-force guessing.

Unfortunately, the long-standing view of most device designers has been that if a cryptographic function is hardened, then so is any device that implements that function. In the past decade, a new family of attacks based on side channel sources of information has emerged as a threat to the traditional security measures embedded within electronic devices. These attacks target familiar encryption algorithms found in many turn-key consumer devices and smart cards, as well as special-purpose algorithms such as those used in many security-critical devices across the general consumer, medical and military spectra.

These attacks have not been the primary focus of the hardware design community; rather, its focus has been on improving the functionality, performance and reliability of electronic devices while reducing their physical footprint, power consumption, time-to-market and, ultimately, cost. While the past few decades have seen a push towards mathematically strong, secure consumer devices, comparatively little unclassified research has gone into the automated design of devices resistant to side channel based attacks. Now, with the ever-increasing usage and prevalence of data-critical electronics in society, coupled with side channel mechanisms capable of circumventing traditional cryptographic protection methods, the need for top-down, generalized, secure hardware design methods is paramount.

1.1.1 Perspective

While the 1970s TEMPEST project is the earliest officially declassified U.S. military project to deal with side channels, the released material references their apparent use by the Japanese (1962), in the Soviet Union (early 1950s) and, finally, in work done by Bell Labs on a Bell 131-B2 (1940s) [1]. Early work by Bell Labs to protect cryptographic devices led to three main suppression measures: shielding against radiation through space and magnetic fields; filtering of transmitted signals; and masking of space-radiated or transmitted signals.

To understand why side channels are still an open problem today, the motivation and goals of a traditional hardware designer must be taken into account. Regardless of the size of the group, a designer is responsible for taking a functional requirement and translating it into a structure that can be fabricated into a physically realized device by following a design process. Regardless of the complexity of the functional requirements (e.g., an entire processor down to a single gate) and the final target technology (e.g., semiconductor based, wire-wrapped, mechanically driven), the goal of the designer is to ensure that, for all possible inputs to the device, the output matches the given specification. In addition to functional specifications, hardware designers are limited by the constraints in place (e.g., minimum speed, maximum power drawn, maximum area footprint, technology used).

Once the functional requirement is translated, implemented, tested and validated within the bounds of the imposed constraints, a designer moves on to the next design: typically, little to no additional consideration is given to the function being implemented, so a multiplier is treated the same as a cryptographic sub-component. Unfortunately, this approach has significant consequences in the realm of side channel security, since side channel based information leakage is rarely considered a constraint or design parameter in present-day hardware design.

The increased use of cryptographic functions in consumer electronics for authentication and secure communication, coupled with the increasing sophistication of attacks on them, necessitates new design methodologies for secure devices. The ultimate goal is to reduce the burden on hardware designers, who simply do not have the time or skill set to become cryptography experts. Two broad topics exist in the development of secure devices: the design of mathematically secure, or hardened, algorithms and the construction of secure physical implementations. Mathematical hardening, while potentially effective against cryptanalysis, does not guarantee the security of implemented integrated circuits: data-dependent information is leaked through secondary sources such as power consumption, timing characteristics, glitch occurrence, and electromagnetic (EM) and even acoustic emanations.

1.1.2 Present Day Relevance

While research and practicality often do not intersect, the first half of 2013 saw a new threat to consumer security publicly revealed via several media outlets [2][3]. Thieves are targeting subsets of vehicles with an electronic, turn-key black box device. Based on the characteristics of the targeted attacks (e.g., a limited set of targeted car manufacturers, penetration/unlock access only, speed of access), it is highly probable that these attacks are rooted in information gleaned from a type of side channel attack, used in conjunction with relaxed key retention policies. Mitigating information leakage from any source is paramount to increasing attack complexity; conversely, any information leakage can be used to reduce the attack search space. This dissertation provides a set of methods that increase the physical security of a device's emanations at various levels of abstraction.

1.2 Thesis Statement

The need for secure logic devices has outgrown the days when secure computations and messages were encoded and decoded in buildings protected by Faraday cages.

Every electronic device, from cryptographic smart cards to life-sustaining pacemakers, needs mechanisms that offer protection from side channel based attacks. This dissertation focuses on methods that span from low level and circuit level design methods to architecture level solutions, and finally briefly introduces the foundation for some system level approaches.

The ability to secure a device's computationally based by-products using low level methods has been well documented. The first third of this work details a new secure logic style and combines it with an efficient circuit level design methodology to mitigate the high design cost of most existing low level side channel resistance techniques. To provide more granularity and flexibility in securing devices, this dissertation also discusses two separate high level methods for reducing the risk of side channel attacks. The first of these methods, covered in complete detail, focuses on generating secure Finite State Machines (FSMs); the second, discussed as tangential work, centers on high level architectural modifications that increase device security while also increasing potential data throughput. The finite state machine work is further extended to provide greater design tuning of both area and power consumption.

1.3 Organization of Dissertation

Though motivated in this chapter, the fundamental concepts of cryptography and side channels needed to understand the bulk of this thesis are covered in Chapter 2. The remainder, and focus, of this dissertation is organized into the following chapters:

• Chapter 3: Cell & Circuit Level Protection continues one of the most prevalent areas of side channel research with a discussion of low level, cell based techniques. The new secure cell logic style (SDMLp) and the associated reduced-order BDD synthesis flow detailed in this chapter show marked improvements in many design metrics, including area and power consumption. Finally, this chapter highlights the impact of temperature variations on side channel leakage.

• Chapters 4 and 5: High Level FSM Protection and Constrained FSM Protection both focus on protecting finite automata from attack. The High Level FSM theory is motivated by first exploring the dichotomy between the minimal representation and encoding strategies that have been the focus of FSM design and the requirements for secure FSMs. To mitigate power concerns, the solution space is further constrained to minimize switching activity, and the secure FSM state space requirements are reduced by relaxing self-looping constraints. While the focus of the former chapter is on the physical requirements needed to produce secure FSMs, the focus of the latter is on the impact of security relaxation on the Mutual Information shared between the power side channels and attack models.

• Chapter 6: Tangential Work discusses other side channel related work, including the basis for system level resistance as well as the collection of tools developed throughout the course of this research.

Finally, Chapter 7: Conclusions and The Path Forward outlines both the overarching findings and remarks concerning this research and the potential future research veins that this work enables and motivates, in hardware security as well as in other areas.

Chapter 2

Fundamentals

This thesis focuses on removing the relationship between side channels and their models in the context of cryptographic devices. Most non-invasive attacks on cryptographic devices and implementations are based on the relationship between a simple side channel model and the information leaked from secondary sources [4–6]. This chapter contains information pertaining to both cryptography and side channels, culminating with an overview of side channel attacks as well as a brief overview of existing countermeasures.
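To illustrate what a "simple side channel model" can look like, the sketch below assumes the two models most often used in power analysis, Hamming weight (HW) and Hamming distance (HD), together with a simulated, noise-free leakage of HW(p ⊕ 0xBE) for each plaintext byte p. Ranking key guesses by the Pearson correlation between model predictions and leakage is one common attack strategy; the function names and the simulated leakage are illustrative assumptions, not the methodology used later in this work.

```python
from math import sqrt

def hamming_weight(value: int) -> int:
    """Count of 1 bits; models leakage proportional to the value being processed."""
    return bin(value).count("1")

def hamming_distance(before: int, after: int) -> int:
    """Count of bits that toggle when a register changes from `before` to `after`."""
    return hamming_weight(before ^ after)

def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def best_key_guess(plaintexts, leakage, key_bits: int = 8) -> int:
    """Return the key guess whose HW(p XOR guess) predictions best match the leakage."""
    plaintexts = list(plaintexts)
    scores = {}
    for guess in range(2 ** key_bits):
        predictions = [hamming_weight(p ^ guess) for p in plaintexts]
        scores[guess] = pearson(predictions, leakage)
    return max(scores, key=scores.get)

# Simulated, noise-free leakage (an assumption): HW of each plaintext byte XORed with 0xBE.
plaintexts = range(256)
leakage = [hamming_weight(p ^ 0xBE) for p in plaintexts]
print(hex(best_key_guess(plaintexts, leakage)))  # 0xbe
```

In this idealized setting a wrong guess differing from the key in h bits correlates at (8 - 2h)/8, so near-miss guesses still score well; this is why practical attacks typically target a non-linear step, such as an S-box output, rather than a bare XOR.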

The objective of this chapter is to provide the reader, regardless of previous electronics and computing background, with the fundamentals required to understand each of the remaining chapters. After this chapter, although the material progresses linearly from low level to high level security methods, each chapter can be read independently of the others.

2.1 Fundamentals of Cryptography

The goal of any cryptographic system (cryptosystem) is to transmit and receive encoded messages in the open without exposing the original message to anyone except the sender and receiver. Cryptographic functions, the primary components of any cryptosystem, are either encryption functions, if they convert known messages (plaintexts) into secret messages (ciphertexts), or decryption functions, if they convert ciphertexts back to plaintexts.

Both encryption and decryption use parameterized transforms that include a secret key.

2.1.1 Classes of Algorithms

The primary categorization of cryptosystems depends on the relationship between the encryption key, ke, and the decryption key, kd (Figure 2.1). Cryptographic algorithms in which a single key (ke = kd) is used for both encryption and decryption are referred to as Symmetric algorithms, while those that use two distinct keys (ke ≠ kd) are referred to as Asymmetric (Public Key) algorithms [7]. The distinct advantage of an asymmetric key system is that, although the two keys are mathematically related, knowing the public key does not reveal the private key in any computationally feasible way, which allows the public key to be transmitted safely.
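A textbook-sized RSA instance (RSA is one of the asymmetric algorithms listed in Figure 2.2) makes the ke ≠ kd relationship concrete. The primes 61 and 53, the public exponent 17 and the helper names are illustrative assumptions and far too small to be secure; with realistic key sizes, recovering the private exponent from (e, n) is believed to require factoring n.

```python
from math import gcd

def toy_rsa_keys(p: int, q: int, e: int):
    """Build a (public, private) key pair from two primes; for illustration only."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p - 1)(q - 1)")
    d = pow(e, -1, phi)  # modular inverse of e (requires Python 3.8+)
    return (e, n), (d, n)

def apply_key(key, message: int) -> int:
    """RSA transform: message ** exponent mod n."""
    exponent, modulus = key
    return pow(message, exponent, modulus)

public_key, private_key = toy_rsa_keys(61, 53, 17)
ciphertext = apply_key(public_key, 65)          # encrypt with k_e
recovered = apply_key(private_key, ciphertext)  # decrypt with k_d
print(public_key, private_key, ciphertext, recovered)
# (17, 3233) (2753, 3233) 2790 65
```

The encryption exponent (17) and decryption exponent (2753) differ, yet each undoes the other modulo 3233, which is exactly the ke ≠ kd property of Figure 2.1(b).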

Each major set of cryptosystems is further divided based on characteristics unique to that subset: symmetric algorithms are separated by how they process a message, while asymmetric algorithms are instead grouped by their underlying mathematical basis. The classification tree in Figure 2.2 has internal nodes that represent major classes of algorithms, while its leaf nodes are a representative subset of interesting algorithms that fit each classification.

Symmetric Algorithms

Of the two classes of cryptographic algorithms used in modern day communication, symmetric (or secret key) algorithms are less complex both mathematically and in implementation, making them orders of magnitude faster than their asymmetric counterparts. Symmetric algorithms, shown in Figure 2.1(a), use the same key (ke = kd) in both the encryption and decryption functions. Symmetric algorithms are further classified by how they process information during a single round of encryption or decryption: either on blocks of data (block ciphers) or on single bits or bytes of a stream of data (stream ciphers).

[Figure 2.1: (a) Symmetric Key Cryptography; (b) Asymmetric (Public) Key Cryptography, with a private key and a public key. Each panel shows plaintext → Encryption → ciphertext → Decryption → plaintext.]

Figure 2.1: Symmetric algorithms (a) and asymmetric algorithms (b) differ in their use of either a single key (ke = kd) or distinct keys (ke ≠ kd) for encryption and decryption during message passing. Asymmetric algorithms can also be used to create digital signatures (c).

[Figure 2.2: Classification tree of cryptographic algorithms. Symmetric: Block Cipher (Data Encryption Standard, Advanced Encryption Standard, Blowfish), Stream Cipher (RC4, FISH), Hash Functions (MD5, SHA-0,1,2,3). Asymmetric: Elliptic Curve, Discrete Logs and Prime Number Factorization, with leaves including Diffie-Hellman, ElGamal, EPOC-1,2,3, RSA (Rivest, Shamir & Adleman), BLS (Boneh, Lynn, Shacham) and DSA (Digital Signature Algorithm).]

Figure 2.2: The classification of cryptographic algorithms begins with key management (symmetric or asymmetric); depending on the branch, algorithms are then classified either by application (symmetric) or by major underlying mathematical basis (asymmetric).

Block Ciphers

Block ciphers are one-to-one, deterministic algorithms that operate on fixed-size blocks of data, converting each message block into another fixed-size block. In this scenario, encryption e and decryption d, using the same key, act as inverse functions (e = d⁻¹, d = e⁻¹). Encryption and decryption can be the very same function, as is the case for the exclusive-or (XOR) block cipher, but this does not hold for secret key ciphers in general; the two functions are usually distinct. Block cipher algorithms such as the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) are designed as iterated product ciphers, which consist of the simple arrangement and combination of two operations fundamental to cryptography (substitution and permutation) over multiple (iterated) rounds. To enhance security, these product ciphers use a distinct subkey, derived from the original key, during each iteration.
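The iterated product structure can be sketched in a few lines. The toy cipher below is an assumption for illustration only (an 8-bit block, trivially insecure, and neither DES nor AES): each of four rounds mixes in a different subkey, substitutes each nibble through an S-box and permutes the bits (here with a simple rotation). Decryption applies the inverse of each step in reverse order, so d = e⁻¹. The S-box is the 4-bit S-box of the PRESENT lightweight cipher; the key schedule is hypothetical.

```python
# Toy 8-bit substitution-permutation cipher: illustration only, not DES or AES.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # 4-bit S-box borrowed from PRESENT
INV_SBOX = [SBOX.index(i) for i in range(16)]     # INV_SBOX[SBOX[x]] == x

def substitute(x: int, box) -> int:
    """Apply a 4-bit S-box to each nibble of an 8-bit value."""
    return (box[x >> 4] << 4) | box[x & 0xF]

def rotl8(x: int, r: int) -> int:
    """Rotate an 8-bit value left by r bits (the permutation layer here)."""
    return ((x << r) | (x >> (8 - r))) & 0xFF

def subkeys(key: int, rounds: int = 4):
    """Hypothetical key schedule: rotate the key and mix in the round index."""
    return [rotl8(key, i) ^ i for i in range(rounds)]

def encrypt(block: int, key: int) -> int:
    for k in subkeys(key):
        block = rotl8(substitute(block ^ k, SBOX), 3)  # key mix, substitute, permute
    return block

def decrypt(block: int, key: int) -> int:
    for k in reversed(subkeys(key)):
        # Rotating left by 5 undoes rotl8(., 3); then invert the S-box and the key mix.
        block = substitute(rotl8(block, 5), INV_SBOX) ^ k
    return block

print(all(decrypt(encrypt(p, 0x3A), 0x3A) == p for p in range(256)))  # True
```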

XOR

One of the simplest block ciphers (and consequently also a naive stream cipher), the XOR cipher demonstrates two key properties of block ciphers: their operation on fixed block sizes and their one-to-one nature. In Figure 2.3, a 32-bit data message (plaintext or ciphertext) is segmented into four 8-bit blocks; each block is XORed with the same 8-bit key (0xBE), and the results are reassembled to form a 32-bit output (ciphertext or plaintext).

[Figure 2.3 diagram: Ciphertext/Plaintext 0xEADBEEFD (top); blocks 1 to 4 (0xFD, 0xEE, 0xDB, 0xEA), each XORed with key 0xBE; Plaintext/Ciphertext 0x54655043 (bottom)]

Figure 2.3: The blockwise XOR encryption/decryption of a 32-bit message (0xEADBEEFD) with an 8-bit key (0xBE), and the subsequent decryption/encryption of the resulting message (0x54655043) using the same key.
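The blockwise XOR of Figure 2.3 can be reproduced with a short Python sketch. The function name `xor_block_cipher` and its parameters are illustrative assumptions, not something defined in this thesis. Because XOR is its own inverse, the same call both encrypts and decrypts, which makes the 1-1 property concrete.

```python
def xor_block_cipher(message: int, key: int, n_blocks: int = 4, block_bits: int = 8) -> int:
    """Split `message` into n_blocks blocks of block_bits bits, XOR every block
    with the same key, and reassemble the result (block 0 = least significant)."""
    mask = (1 << block_bits) - 1
    out = 0
    for i in range(n_blocks):
        shift = i * block_bits
        block = (message >> shift) & mask   # extract block i
        out |= (block ^ key) << shift       # XOR with the key and put it back in place
    return out


ciphertext = xor_block_cipher(0xEADBEEFD, 0xBE)
print(hex(ciphertext))                          # 0x54655043
print(hex(xor_block_cipher(ciphertext, 0xBE)))  # 0xeadbeefd
```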

DES The Data Encryption Standard [8] is now considered insecure in its original form because of its small 56-bit key. It still serves as an introductory example of the key components of modern block ciphers. Furthermore, both the original and the tripled version of the algorithm (Triple DES) remain heavily used in commercial and consumer applications alike. DES is a classic 64-bit block cipher that uses 56 bits of a 64-bit key (the remaining 8 bits are used for parity checking). It runs over 16 identical rounds, with message permutations at both the input and the output of the algorithm. These rounds and permutations are arranged in the criss-crossing, butterfly-like Feistel network shown in Figure 2.4(a) [9]. A regular structure such as a Feistel network creates heavy redundancy between the encryption and decryption functions, so a single implementation can perform both operations. In the case of DES, the only required change is reversing the order of the subkeys supplied to the inner Feistel functions. Each internal Feistel function (Figure 2.4(b)) XORs a uniquely derived 48-bit subkey with one half of the 64-bit data block (32 bits, first expanded to 48 bits). It then transforms the result using 8 unique substitution boxes (look-up tables), each converting a 6-bit input to a 4-bit output.
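The following Python sketch implements a generic Feistel network to show why reversing the subkeys is the only change DES needs for decryption. It is not DES. The 16-bit halves, the four subkeys and `toy_round_function` are hypothetical stand-ins; DES uses 32-bit halves, 16 rounds and the expansion/S-box f-function described above. The round function is deliberately non-invertible, because the Feistel structure alone guarantees that decryption undoes encryption.

```python
MASK16 = 0xFFFF  # toy 16-bit halves; DES itself uses 32-bit halves


def toy_round_function(half: int, subkey: int) -> int:
    """Hypothetical, deliberately non-invertible stand-in for the DES f-function."""
    return ((half * 0x9E37) ^ (half >> 3) ^ subkey) & MASK16


def feistel(left: int, right: int, subkeys, f=toy_round_function):
    """Generic Feistel network: each round swaps halves and mixes one into the other."""
    for k in subkeys:
        left, right = right, left ^ f(right, k)
    return right, left  # undo the final swap so the same routine can also decrypt


subkeys = [0x1234, 0xBEEF, 0x0F0F, 0xA5A5]                  # hypothetical round keys
ciphertext = feistel(0xDEAD, 0xBEEF, subkeys)
plaintext = feistel(*ciphertext, list(reversed(subkeys)))   # same code, keys reversed
print(plaintext == (0xDEAD, 0xBEEF))                        # True
```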

(a) DES Feistel Structure

(b) Individual Feistel Function within DES

Figure 2.4: (a) The DES algorithm comprises 16 rounds organized in a classic Feistel network. (b) Each Feistel function XORs one (expanded) half of the data with a generated subkey and passes the result through 8 unique 6-to-4-bit S-box transforms (Source: [8]).

Stream Ciphers Unlike block ciphers, stream (or state) ciphers operate on a single bit or byte of data per cycle using a pseudo-random keystream¹. This allows for significant speed improvements, reduced implementation complexity and variable plaintext/ciphertext lengths. The cost is reduced cryptographic strength, mostly due to the pseudo-random keystream. Stream ciphers are used predominantly in media and communication-based applications, where high throughput with minimal resources far outweighs the increased security provided by large block ciphers. Of particular interest is the class of stream ciphers that rely on Feedback Shift Registers (FSRs) to form secure cryptosystems.

¹While the distinction between block and stream ciphers is not well defined, ciphers that operate on single bits and bytes are generally considered stream ciphers, while those that operate on 32 or more bits are considered block ciphers.
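The following sketch of an FSR-based keystream assumes a 16-bit Fibonacci linear feedback shift register. It uses taps 16, 14, 13 and 11 (feedback polynomial $x^{16} + x^{14} + x^{13} + x^{11} + 1$, a commonly used maximal-length choice) and an arbitrary seed. The tap choice, the seed and the helper names are assumptions for illustration only. A bare LFSR is linear and therefore cryptographically weak on its own; practical FSR-based stream ciphers combine several registers or add nonlinear feedback. The sketch only shows the mechanics. The keystream is XORed with the data byte by byte, and regenerating it from the same seed decrypts.

```python
def lfsr16_bits(state: int):
    """Yield keystream bits from a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11.
    The output bit is the least significant bit before each shift."""
    while True:
        yield state & 1
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (feedback << 15)


def stream_xor(data: bytes, seed: int) -> bytes:
    """Encrypt or decrypt `data` by XOR with 8 keystream bits per byte."""
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a non-zero 16-bit value")
    bits = lfsr16_bits(seed)
    out = bytearray()
    for d in data:
        k = 0
        for i in range(8):
            k |= next(bits) << i   # assemble one keystream byte, LSB first
        out.append(d ^ k)
    return bytes(out)


def lfsr16_period(seed: int) -> int:
    """Number of steps until the register returns to `seed`."""
    state, steps = seed, 0
    while True:
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (feedback << 15)
        steps += 1
        if state == seed:
            return steps


ciphertext = stream_xor(b"side channel", 0xACE1)
print(stream_xor(ciphertext, 0xACE1))  # b'side channel'
print(lfsr16_period(0xACE1))           # 65535 (maximal length)
```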

Asymmetric Algorithms

The major distinction between asymmetric and symmetric algorithms is the use of two separate keys for the encryption and decryption processes. Typically, as shown in Figure 2.1(b), the public key (open and shared with the world) is used to encrypt information, which can then only be decrypted using the private (secret) key. Asymmetric schemes are also at the core of digital signatures: the private key encrypts (signs) the message, and the publicly available key is then used to validate it, as shown in Figure 2.1(c). The commonly used RSA algorithm is detailed in the following section.

RSA draws its cryptographic strength from large prime numbers: its security rests on the difficulty of factoring the product of two large primes. The process is computationally intensive during the key generation phase and relatively simple in both encryption and decryption, as highlighted in Table 2.1. The example provided uses a small key for demonstration. In practice, RSA key sizes should exceed 1024 bits to provide any semblance of security: a 1024-bit RSA key is roughly equivalent to a symmetric key algorithm with a 96-bit key [10].
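The key generation, encryption and decryption steps of Table 2.1 can be checked with a few lines of Python. This is a textbook sketch using the table's toy parameters and no padding; it is not a secure RSA implementation. The helper name `rsa_toy_keys` is hypothetical. The built-in `pow(e, -1, phi)` computes the modular inverse and requires Python 3.8 or later.

```python
from math import gcd


def rsa_toy_keys(p: int, q: int, e: int):
    """Textbook RSA key generation following Table 2.1 (toy sizes, no padding)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if not (1 < e < phi and gcd(e, phi) == 1):
        raise ValueError("e must satisfy 1 < e < phi(n) and gcd(e, phi(n)) = 1")
    d = pow(e, -1, phi)          # modular inverse of e mod phi(n), Python 3.8+
    return (n, e), (n, d)


public, private = rsa_toy_keys(7, 13, 5)
print(public, private)           # (91, 5) (91, 29)

n, e = public
c = pow(42, e, n)                # encryption: C = M^e mod n
print(c)                         # 35
print(pow(c, private[1], n))     # decryption: M = C^d mod n, prints 42
```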

2.1.2 Attacks on Cryptography: Cryptanalysis and Hardware

Security in modern cryptography research revolves around a subset of principles established more than a century ago by Auguste Kerckhoffs. Of particular interest to cryptanalysis is the fundamental idea that a cryptographic system should be mathematically undecipherable and open [11]. In other words, any secure cryptographic system should be computationally hard to attack and should remain uncompromised even when all implementation details are known to an attacker. Designers must rely on the strength of the algorithm, not its obscurity². The goal is to create cryptosystems that are completely transparent in implementation and rely only on maintaining the secrecy of individual keys. Throughout this thesis the algorithms and implementations are completely exposed: the worst case scenario for any target, and the best case scenario for an attacker.

| Phase | Description | Formal | Example |
|---|---|---|---|
| Key Generation | Select two large primes $p$ and $q$ | | $p = 7$, $q = 13$ |
| | Compute $n$, $\varphi(n)$ | $n = p \times q$, $\varphi(n) = (p-1)(q-1)$ | $n = 91$, $\varphi(n) = 72$ |
| | Choose integer $e$ | $\gcd(\varphi(n), e) = 1$; $1 < e < \varphi(n)$ | $e = 5$ |
| | Compute $d$ | $d \equiv e^{-1} \pmod{\varphi(n)}$ | $d = 29$ |
| | Public key | $(n, e)$ | $(91, 5)$ |
| | Private key | $(n, d)$ | $(91, 29)$ |
| Encryption | Plaintext | $M < n$ | $M = 42$ |
| | Ciphertext | $C = M^e \bmod n$ | $C = 42^5 \bmod 91 = 35$ |
| Decryption | Ciphertext | $C$ | $C = 35$ |
| | Plaintext | $M = C^d \bmod n$ | $M = 35^{29} \bmod 91 = 42$ |

Table 2.1: RSA Key Generation, Encryption and Decryption

Attacks on Physical Implementations

Physical attacks on implementations are typically categorized using two classifiers: the engagement of the attacker and the invasiveness of the attack on the target.

Attack Engagement There are two primary classifications for attacks on (cryptographic) hardware implementations, differentiated by the level of engagement an attacker exerts on the target.

Passive Attacks gather physical byproduct information during the target's normal execution. These attacks can be difficult (if not impossible) to detect, as the attacker simply captures existing streams of information.

Active Attacks modify the target system, its logical and functional inputs and even its environment to gain additional information. These attacks generally force the target device into abnormal states.

This classifier is important to understand from an attack perspective. Kerckhoffs's principles and Shannon's maxim, however, together require any viable security solution to assume an active attacker when validating its security merit.

²Shannon's Maxim holds: "The enemy knows the system." [12]

Attack Invasiveness In addition to classifying attacks by the level of modification, attacks are further classified by the complexity (and invasiveness) of the interface between the target and the attacker:

Invasive Physical changes are made to the device, its packaging and/or its original functionality to gain access to information. For example, a device can be completely depackaged, allowing individual signals to be probed.

Semi-invasive Similar in nature to invasive attacks, these attacks also depackage a device, but generally to gain access to memory cells while bypassing the typical read-out circuitry.

Non-invasive Unlike invasive and semi-invasive attacks, these attacks target only accessible interfaces (passively or actively). Non-invasive attack methods generally leave no residual trace when done passively, though some active attacks place systems into detectable abnormal states.

The focus of this thesis is on the mitigation of side channel attacks. These attacks fall within the realm of non-invasive attacks, and while generally presented as passive in nature, a large majority of attacks require a highly active attacker/attack strategy.

2.2 Fundamentals of Side Channels

While side channels may appear to be unique and novel features of electronic devices, consider the following two scenarios from espionage and medicine:

Communication Analyst Alice would like to translate a message from her native English into two different languages: French and Lithuanian. Which will take her longer to translate? Which will contain fewer errors? Which will be closer to what a native writer would produce? Both require translation, but their individual complexities and available references would create a disparity in producing each translated message. This disparity could be quantified by the overall time to translate, the rate of word translation, the accuracy of translation or even the similarity of the translation to existing texts.

Cardiologist Alice is reading Bob's electrocardiogram and notices an abnormality in the signal: specifically, an increase in the amplitude of the QRS complex (Figure 2.5). How does this help Alice make a diagnosis? To an untrained eye an ECG/EKG is just a set of waveforms, but in this case it indicates that Bob probably has cardiac hypertrophy, i.e. enlarged ventricles (hopefully he is an athlete, not stressed or hypertensive). The heart does not have an "enlarged ventricles" signal; rather, that information is a byproduct of the electrical activity across the heart. Alice combines her existing knowledge with the ECG and other data from Bob to determine the likely underlying condition, and further testing will validate her hypothesis.

2.2.1 Types of Side Channels

Side channel attacks by definition belong to the more general class of non-invasive attacks. Figure 2.6 depicts the expected high-level input/output view of a target device, along with potential attack stimuli on the left and the resulting byproducts (side channels) on the right.

2.2.2 Modeling Side Channels

The following discussion centers on appropriate models for the commonly attacked power side channel, as it remains the focus of this work. Creating a high level model of a side channel starts with understanding the channel. These models are generally assumed, but the following section provides an overview of their derivation, since they are somewhat technology dependent³.

Figure 2.5: The heart's normal sinus rhythm and its associated physiological implications have been explored and documented since the dawn of the 20th century with the advent of electrocardiography [13].

Fundamentals of CMOS Power Consumption

The interest in the power consumed by a device (or sub-function) lies in its relationship to the data processed. A typical CMOS device consists of countless functions. To derive effective, simple models for power side channels, an inverter (Figure 2.7) is used to highlight the asymmetric nature of power consumption within CMOS devices. This asymmetry stems primarily from static and dynamic power dissipation (Figure 2.8). Static power consumption ($P_s$) consists of the power consumed while the device (or function) remains in a steady, unchanged state, while dynamic power ($P_{dyn}$) results from input switching activity. During the operation of a CMOS device, two complementary transistor networks function in tandem to realize any Boolean function. The pull-up network (PUN) pulls a common output node towards a high state, logic level 1. The pull-down network (PDN) pulls it towards a low state, logic level 0. The current passing through these two networks, together with the capacitive load seen by the combined CMOS network, forms the basis for fundamental CMOS power models.

³A more in-depth discussion of power dissipation in CMOS devices can be found in Chapter 5.5 of [14], while Chapter 3 of [15] contains an overview within the scope of side channel attacks.

Figure 2.6: A target with its traditional input/output pair, along with potential attack stimuli and byproduct responses.

Static Power - Ps Of the two sources of power dissipation, static power exists during the steady state of a device's operation. While ideally zero, this source of power is dominated by the leakage current, i.e. the current flowing through MOS transistors when the circuit is in a static state: $P_s = I_s \cdot V_{DD}$.

Figure 2.7: The CMOS inverter demonstrates the asymmetric nature of CMOS logic styles, in particular the 0 → 1 transition, which consumes the bulk of the power within CMOS devices. (Figure Source: [16], [17])

Dynamic Power - Pdyn In general, dynamic power is consumed whenever internal or output signals change state. Dynamic power (Eq. 2.1) is a function of the switching activity, the supply voltage and the capacitive load within the circuit.

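$$P_{dyn} = C_L V_{DD}^2 f_{0\to 1} \qquad (2.1)$$

where $C_L$ is the switched load capacitance, $V_{DD}$ the supply voltage and $f_{0\to 1}$ the rate of 0 → 1 output transitions.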
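As a worked example of Eq. (2.1), assume hypothetical values of $C_L = 10$ fF, $V_{DD} = 1.0$ V and $f_{0\to 1} = 100$ MHz. These are chosen for easy arithmetic and are not taken from any measured device:

$$P_{dyn} = (10 \times 10^{-15}\,\mathrm{F}) \cdot (1.0\,\mathrm{V})^2 \cdot (100 \times 10^{6}\,\mathrm{s^{-1}}) = 1.0 \times 10^{-6}\,\mathrm{W} = 1\,\mu\mathrm{W}$$

The dependence on $V_{DD}$ is quadratic. Halving $V_{DD}$ to 0.5 V with the same load and switching activity would therefore reduce this to 0.25 µW.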

In the case of the inverter, this simplifies to transitions of the input signal. Since only two possible inputs exist, there are only four possible transition cases: 0 → 0, 0 → 1, 1 → 0 and 1 → 1. Of these four cases, only two cause output transitions in a standard CMOS inverter: 0 → 1 and 1 → 0. When an inverter goes through a 1 → 0 input transition, its capacitive load $C_L$ is charged with current from the supply. During a 0 → 1 input transition, the capacitance is discharged. Figure 2.9 shows an XOR gate, which draws power from the supply only during 0 → 1 output transitions, to demonstrate the extension to multi-input static CMOS (SCMOS) logic gates.

Total Power An inverter has a single input that can be in one of two states, so a maximum of four input scenarios exist across consecutive time steps, as seen in Table 2.2. In present 45 nanometer or larger technologies, it is generally accepted that the static power dissipated in the two steady states ($P_{0\to0}$ and $P_{1\to1}$) is roughly equal. It is also much smaller than the dynamic power dissipated during the $P_{1\to0}$ and $P_{0\to1}$ transitions. For most present day electronic devices, static power consumption is a minor factor in the overall power consumed (accounting for about 1% of total power consumption). However, leakage power will become the dominant factor as feature sizes decrease and devices built with deep sub-micron technology become viable for mass production (see Figure 2.10).

Figure 2.8: The sources of power in the CMOS inverter. (a) and (b) show a switch model of dynamic behavior during low-to-high and high-to-low transitions, respectively. (c) shows the short circuit current associated with the low-to-high transitions. Finally, (d) shows the leakage associated with a non-transitioning inverter. Images from [14].

Figure 2.9: Power consumption profile for a two-input XOR gate in CMOS logic (Figure Source: [18]).

Figure 2.10: Predicted device power dissipation: static and dynamic sources as functions of exponentially decreasing gate length. (Figure Source: [19])

| Scenario | $V_{in}(t)$ | $V_{in}(t+1)$ | $V_{out}$ transition | Equivalent power components |
|---|---|---|---|---|
| $P_{0\to0}$ | 0 | 0 | - | $P_s$ |
| $P_{0\to1}$ | 0 | 1 | 1 → 0 | $P_s + P_{dyn}$ |
| $P_{1\to0}$ | 1 | 0 | 0 → 1 | $P_s + P_{dyn}$ |
| $P_{1\to1}$ | 1 | 1 | - | $P_s$ |

Table 2.2: The CMOS inverter has four possible input scenarios between time t and t+1, of which only two have output transitions.
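The bookkeeping behind Table 2.2 can be sketched in Python for an ideal static CMOS inverter. Following Eq. (2.1), the sketch assumes that each 0 → 1 output transition draws $C_L V_{DD}^2$ from the supply, and it ignores static and short-circuit power. The function `inverter_power_profile` and its default capacitance and voltage values are hypothetical. Two input sequences that differ only in their data draw different supply energy, which is exactly the data dependence a power side channel exposes.

```python
def inverter_power_profile(inputs, c_load=10e-15, v_dd=1.0):
    """Classify each input step of an ideal CMOS inverter as in Table 2.2 and
    total the supply energy drawn by 0 -> 1 output transitions."""
    outputs = [1 - x for x in inputs]          # ideal inverter
    components = []
    rising = 0
    for t in range(len(inputs) - 1):
        if inputs[t] == inputs[t + 1]:
            components.append("Ps")            # steady state: leakage only
        else:
            components.append("Ps + Pdyn")     # the output switches
            if outputs[t] == 0 and outputs[t + 1] == 1:
                rising += 1                    # input 1 -> 0 charges C_L from the supply
    energy = rising * c_load * v_dd ** 2
    return components, rising, energy


for seq in ([0, 1, 1, 0, 0, 1], [0, 0, 0, 0, 0, 0]):
    comps, rising, energy = inverter_power_profile(seq)
    print(seq, comps, rising, energy)
```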

Hamming Models

When using a power side channel, a typical attack model uses either the Hamming Weight, HW (2.2), of a register S or the Hamming Distance, HD (2.3), between two successive states, $R_i$ and $R_{i+1}$, of a register. Variability in either Hamming computation implies a correlated variability in the power consumed, and thus a potential security vulnerability [15].

$$HW(S[x..0]) = \sum_{i=0}^{x} S[i] \qquad (2.2)$$

$$HD(R_i[x..0], R_{i+1}[x..0]) = \sum_{b=0}^{x} R_i[b] \oplus R_{i+1}[b] \qquad (2.3)$$
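Equations (2.2) and (2.3) translate directly into code. The Python sketch below evaluates them on the values of Figure 2.3; the function names are illustrative. A register that updates from the message 0xEADBEEFD to the ciphertext 0x54655043 flips 24 bits, that is, HW(0xBE) = 6 per byte, since each byte differs by the key.

```python
def hamming_weight(s: int) -> int:
    """Eq. (2.2): number of set bits in a (non-negative) register value."""
    return bin(s).count("1")


def hamming_distance(r_i: int, r_next: int) -> int:
    """Eq. (2.3): number of bits that differ between successive register states."""
    return hamming_weight(r_i ^ r_next)


print(hamming_weight(0xBE))                       # 6
print(hamming_distance(0xEADBEEFD, 0x54655043))   # 24
```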

2.3 Side Channel Attack Theory

Regardless of the potential target (social, biological or, in the case of this work, electronic), protecting it from side channel attacks requires eliminating the relationship between the model and the byproduct. Side channel attacks are a relatively new strategy against hardware devices, but their popularity is growing due to their low implementation cost. Figure 2.11 depicts a common target, a side channel model and their use in a generic side channel attack scenario.

Side channel attacks exploit the relationship between the logical/functional operation of a device and its physical byproducts. Combined with other known information, these byproducts can reveal information stored within the device.

A typical side channel attack on a device Target, implementing a function F with input M, key K (e.g. a secret key), output C and side channel SC, requires three main components:

(a) Generic target T transforms an input to produce both an output and a side channel.

(b) Generic model M of either F or $F^{-1}$ that produces a predicted side channel.

specific details of individual attacks.

Figure 2.11: A typical attack target, simplified side channel model and their incorporation within a generic attack model.

1. Access to a target device's side channel: $Target_{SC} \propto Target_{F(K, M|C) \to C|M}$;

2. A model of the side channel: $Model_{SC} \propto G(K_{Guess}, M|C)$, where $G$ is related to $F$ or $F'$ and $K_{Guess}$ is a key guess belonging to a subset of possible secret keys; and

3. A relationship between $Target_{SC}$ and $Model_{SC}$ allowing ranking of key guesses.
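The three components above can be illustrated with correlation power analysis (CPA). CPA is one widely used way of relating the model to the measurement, and it is not necessarily the specific method used later in this work. The sketch simulates its own side channel under stated assumptions: the target computes M XOR K on 8-bit values, and each trace is a single sample equal to the Hamming weight of that result plus Gaussian noise. The secret key, noise level and seed are arbitrary, and all helper names are hypothetical. Every key guess is ranked by the Pearson correlation between its modelled Hamming weights and the simulated traces. Because XOR is linear, the complement of the correct key scores exactly the negative of the correct key's correlation, so the ranking uses signed rather than absolute correlation.

```python
import random


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def simulate_traces(secret_key: int, n_traces: int, noise_sigma: float, seed: int = 0):
    """Component 1 (simulated): one sample per trace = HW(M ^ K) + Gaussian noise."""
    rng = random.Random(seed)
    inputs = [rng.randrange(256) for _ in range(n_traces)]
    traces = [hamming_weight(m ^ secret_key) + rng.gauss(0.0, noise_sigma) for m in inputs]
    return inputs, traces


def pearson(xs, ys) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def rank_key_guesses(inputs, traces):
    """Component 3: rank every 8-bit key guess by the signed correlation between
    its modelled side channel (component 2) and the measured traces."""
    scores = {}
    for guess in range(256):
        model = [hamming_weight(m ^ guess) for m in inputs]
        scores[guess] = pearson(model, traces)
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


inputs, traces = simulate_traces(secret_key=0xBE, n_traces=500, noise_sigma=0.5, seed=1)
ranking, scores = rank_key_guesses(inputs, traces)
print(hex(ranking[0]), round(scores[ranking[0]], 3))
```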

Fortunately for attackers (and, conversely, unfortunately for designers), the physical realization of functions using fundamental building blocks, like standard cells, is generally unique. This property makes it relatively simple to capture and perform statistical analysis when algorithmic level models exist, thereby allowing access to internal device details. One of the most prevalent and original side channel attacks, due to its accessibility and low hardware cost, targets information embedded within a device's power consumption [20].

More sophisticated methods for measuring power dissipation from a distance have been documented and exploited. This work, however, remains focused on the worst case scenario principles described earlier: an attacker with direct physical access to the device, which enables the most effective attacks on its side channel. With physical access to a target, the attacker can insert a known resistor in series with the power or ground of the target. Measuring the voltage difference across this resistor captures the power dissipated over time.
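For a shunt resistor of resistance $R$ in series with the supply, the measured voltage $V_R(t)$ gives the supply current by Ohm's law. The dissipated power then follows approximately, assuming $V_R$ is small relative to $V_{DD}$ so that the device sees nearly the full supply. The values below are hypothetical and chosen only to show the arithmetic:

$$I(t) = \frac{V_R(t)}{R}, \qquad P(t) \approx V_{DD}\, I(t) = \frac{V_{DD}\, V_R(t)}{R}$$

$$R = 10\,\Omega,\; V_R = 5\,\mathrm{mV},\; V_{DD} = 1.2\,\mathrm{V} \;\Rightarrow\; I = 0.5\,\mathrm{mA},\; P \approx 0.6\,\mathrm{mW}$$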

Simple Power Analysis

Fundamentally, simple power analysis (SPA) consists of manually identifying the relationship between a power trace and the underlying functionality of a device. Figure 2.12(a) shows a single AES encryption on an ASIC device, while Figure 2.12(b) shows a zoomed-in, annotated power trace from a microcontroller. In an SPA attack, the attacker uses direct visual inspection of a power trace and its fluctuations to determine information about the cryptographic operations being performed. This idea was the catalyst for Differential Power Analysis attacks, first formally introduced by Kocher et al. [20].

The key point for both attackers and designers is that SPA can readily reveal the sequence of instructions being executed. More importantly, it can break cryptographic implementations with data dependent execution paths.
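A classic example of a data dependent execution path is left-to-right square-and-multiply modular exponentiation, the textbook way to compute the RSA operations of Table 2.1. The Python sketch below records the sequence of squarings (S) and multiplications (M); its helper names are hypothetical. It assumes the two operations can be told apart in a power trace, which is what SPA exploits. A multiplication occurs only for 1 bits of the exponent, so reading the sequence recovers the private exponent d = 29 directly.

```python
def square_and_multiply(base: int, exponent: int, modulus: int):
    """Left-to-right binary exponentiation that also returns the operation
    sequence an SPA attacker would see ('S' = square, 'M' = multiply)."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:            # most significant bit first
        result = (result * result) % modulus
        ops.append("S")
        if bit == "1":                       # key-dependent branch: the leak
            result = (result * base) % modulus
            ops.append("M")
    return result, "".join(ops)


def recover_exponent(ops: str) -> int:
    """Each exponent bit starts with an 'S'; an 'M' right after it marks a 1 bit."""
    bits = ["1" if chunk == "M" else "0" for chunk in ops.split("S")[1:]]
    return int("".join(bits), 2)


print(square_and_multiply(35, 29, 91))   # (42, 'SMSMSMSSM')
print(recover_exponent("SMSMSMSSM"))     # 29
```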

Differential Power Analysis

The fundamental idea behind Differential Power Analysis (DPA) based attacks is the same as in SPA: power dissipation during a device's operation is heavily data dependent. The impediment to using SPA on ASIC devices is that most are heavily parallelized and optimized. Their individual operations are therefore not clearly distinguishable, unlike on microcontrollers that execute at most one operation at a time. DPA harnesses statistical analysis and error correction methods to locate information that correlates with a device's secret key. DPA can be broken down into two major components: data collection and data analysis.
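Kocher-style DPA partitions traces according to a predicted intermediate bit and compares the mean power of the two groups. The Python sketch below is a simplified, bitwise variant for a hypothetical XOR target. It assumes one sample per trace equal to HW(M XOR K) plus Gaussian noise, with an arbitrary key, noise level and seed; the helper names are illustrative. For each bit position j, the traces are split by bit j of the known input. If key bit j is 0, inputs with that bit set produce a 1 in the XOR output, so their group mean is higher. If key bit j is 1, the sign of the difference of means flips. Data collection is simulated here rather than measured.

```python
import random


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def simulate_traces(secret_key: int, n_traces: int, noise_sigma: float, seed: int = 0):
    """Simulated data collection: one sample per trace = HW(M ^ K) + Gaussian noise."""
    rng = random.Random(seed)
    inputs = [rng.randrange(256) for _ in range(n_traces)]
    traces = [hamming_weight(m ^ secret_key) + rng.gauss(0.0, noise_sigma) for m in inputs]
    return inputs, traces


def difference_of_means(inputs, traces, bit: int) -> float:
    """Mean trace value when input bit `bit` is 1 minus the mean when it is 0."""
    ones = [t for m, t in zip(inputs, traces) if (m >> bit) & 1]
    zeros = [t for m, t in zip(inputs, traces) if not (m >> bit) & 1]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def dpa_recover_key(inputs, traces, n_bits: int = 8) -> int:
    key = 0
    for j in range(n_bits):
        # Key bit 0: input bit 1 -> XOR output bit 1 -> higher mean (delta > 0).
        # Key bit 1: input bit 1 -> XOR output bit 0 -> lower mean (delta < 0).
        if difference_of_means(inputs, traces, j) < 0:
            key |= 1 << j
    return key


inputs, traces = simulate_traces(0xBE, n_traces=1000, noise_sigma=0.5, seed=7)
print(hex(dpa_recover_key(inputs, traces)))
```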

2.3.1 Protection Mechanisms

Current protection mechanisms that are effective against side channel attacks fall predominantly into one of two categories, residing at either the low or the high levels of implementation. There is a distinct bias toward low level methods due to their effectiveness, though their cost leaves room for significant improvement.

Low Level Methods

Most present day research effort focuses on removing the variability of the side channel itself. If no variability exists within the side channel, no useful information is transmitted, and any correlation between the model and the side channel will be random in nature ($T_{SC} \not\propto M_{SC}$). Assuming an end target of automated design, the logical approach is the design of secure logic styles; among the first proposed were [21, 22]. When analyzed at the level of the fundamental building block, these dual logic, balanced cells reduced variability of the power side channel. This came at the expense of overall power dissipation and area footprint. When the cells were placed and routed, differences in wire length and cell placement caused mismatched capacitances and timing delays that were easily attacked. Fat-wire routing and other techniques have since mitigated these issues, but again at a significant penalty over standard CMOS designs [TIRI]. Other cell level techniques, one of which is highlighted in Chapter 3, use novel structures to further reduce power variation while also approaching CMOS performance targets [23, 24]. Furthermore, some of these logic styles have been incorporated into novel logic synthesis flows. In some cases these flows produce circuits that outperform standard CMOS circuits in terms of power consumption and footprint [25]. Recent advances such as the use of charge recovery logic in dual-rail logic styles like SABL have also mitigated power penalties [26, 27]. Finally, low-level approaches still come at a significant hardware cost in terms of standard cell restrictions, speed and design effort.

Figure 2.12: (a) Simple power analysis of a single AES encryption on an ASIC device easily reveals 10 rounds, each taking eight clock cycles to complete. (b) Annotated, zoomed-in view of an AES encryption on a microcontroller, with the corresponding instructions shown in (c) (Source: [15]).

Side channels exist because no two operations can ever be identical. Take two simple gates laid out on the same substrate: their locations differ, their wire routing differs and their loads differ. The resulting load and parasitic capacitances can therefore never be perfectly balanced. This capacitance differential yields slight variations in dynamic power.

High Level Methods

High level hardware methods to prevent side channel attacks fall into two main categories: hiding and masking. Hiding methods remove the linkage between the side channel and the underlying function, while masking methods randomize the observable values processed by a target device.

Masking techniques generally require changes to the underlying algorithm, making the process unsuitable for automation. Within hardware, specific masking techniques include multiplier masking [28–30], random precharging [31, 32] and bus masking [33]. Unfortunately, most current research in masking centers on a small subset of algorithms dominated by one algorithm, AES [15]. These techniques can also fall prey to higher-order attacks, because they rely on injecting random data whose leakage can be combined with that of the masked values.
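First-order Boolean masking illustrates the principle and its higher-order weakness. It is a simpler scheme than the hardware techniques cited above: a secret x is processed only as the share x XOR r, with a fresh uniform mask r. The Python sketch below uses hypothetical helper names and enumerates all 256 masks. It shows that the Hamming weight of the masked share has the same distribution for every x, so a first-order Hamming weight model learns nothing. Combining the two shares restores a dependence on x, which is the basis of higher-order attacks. The combination used here, |HW(r) - HW(x XOR r)|, is one illustrative choice.

```python
from collections import Counter


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def masked_share_distribution(secret: int, n_bits: int = 8) -> Counter:
    """Hamming weight histogram of the masked share secret ^ r over every mask r."""
    return Counter(hamming_weight(secret ^ r) for r in range(1 << n_bits))


def second_order_mean(secret: int, n_bits: int = 8) -> float:
    """Mean of |HW(r) - HW(secret ^ r)| over every mask r: combines both shares."""
    masks = range(1 << n_bits)
    total = sum(abs(hamming_weight(r) - hamming_weight(secret ^ r)) for r in masks)
    return total / (1 << n_bits)


print(masked_share_distribution(0x00) == masked_share_distribution(0xFF))  # True
print(second_order_mean(0x00), second_order_mean(0xFF))                    # 0.0 2.1875
```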

High level hiding mechanisms attempt to make side channels independent of the underlying data or device operations. Approaches have generally centered on either randomizing the side channel or making it constant, in one of two dimensions: time or amplitude. Time based hiding mechanisms include the insertion of dummy operations and operation shuffling, while amplitude based hiding either increases noise or reduces the signal. Hardware architecture level solutions focus almost exclusively on random events and modifications rather than on decreasing the signal (note that the complete opposite is true at the low level). Present high level methods require either specific, detailed manipulations of individual algorithms or modifications to targeted subsystems/functions. These modifications are not trivial to randomize, much less to recombine with the entire system's data and control logic. This work aims to provide a global solution for a class of circuits irrespective of individual algorithms or implementations.

Chapter 3

Low Level Protection Mechanisms

Security is an attempt to try to make the universe static so that we feel safe. -Anne Wilson Schaef

3.1 Introduction

From a low level perspective, side channel information leakage occurs due to the physical characteristics of logical functions that have been instantiated in particular hardware technologies. For example, logical functions can be built from electro-mechanical relays [34], pneumatic devices [35], Minecraft blocks [36, 37] and even DNA [38], though these are not immediately practical in VLSI. Each can leak information through different mediums and at different rates due to its particular construction and interactions within the physical world. The same holds true for silicon technologies. At low levels of implementation, the data dependent leakage of information from CMOS devices is dominated by one critical factor: dynamic power consumption. This chapter focuses chiefly on dynamic power. It covers how to balance dynamic power through cell design and automated synthesis so that it becomes data invariant, and how variations in device construction or environmental factors affect its variability.


Side channel security as a low-level implementation activity focuses on removing information leakage from a side channel. It does so by making the behavior of physical components either independent of, or constant with respect to, the processed data. This area of side channel attack resistance offers the greatest level of security, though at an extremely high cost. Such methods have struggled to enter mainstream design flows because of their significant area, routing, power, time and delay penalties. The cell and circuit level methods introduced in this chapter aim to overcome that resistance.

Because of the prolific nature of this area, this chapter begins with a discussion of existing work, focusing on similar works that ultimately serve as the baseline for comparative analysis. The proposed low level solution, a side channel invariant cell design (SDMLp), is then detailed. The chapter also describes how this custom cell (library) is integrated into an ROBDD-based synthesis flow for circuit level designs. Finally, a brief discussion of the impact of temperature on attackability concludes the chapter.

3.2 Existing Work

The past decade has seen significant work in the attack and security of side channels [39, 40]. Current low level countermeasures suffer significant power, area and delay penalties, making them unsuitable for a majority of consumer and low power device markets. Additionally, many of these logic styles require specific knowledge and skill to assemble and achieve true security. The proposed SDMLp cell narrows this gap by reducing data dependent power variations while also lowering overall area and power consumption, all with a single generic cell.

Recall that observable data dependencies originate from characteristics of complementary CMOS logic, which consumes a disproportionate amount of power in each of the four possible transition scenarios: 0 → 0, 1 → 1, 0 → 1 and 1 → 0. Several logic styles have already been proposed that attempt to mitigate this imbalance.

Most present day secure logic styles are based on Dynamic and Differential Logic (DDL), first introduced as the backbone of Sense Amplifier Based Logic (SABL) [16]. Fundamentally, DDL styles force individual gates to go through exactly one transition per cycle, regardless of input combinations. This class of logic styles has various implementations. In each case, a set of complementary and non-complementary signals generates differential outputs by pre-charging output nodes before evaluation.

The next leap in DDL, Wave Dynamic Differential Logic (WDDL), introduced pre-charge wave propagation, allowing it to use standard static CMOS cells as a backbone [41]. Unfortunately, WDDL implementations require double the power and area of standard SCMOS implementations. This overhead motivated Reduced Complementary Dynamic and Differential Logic (RCDDL), which uses less area and consumes less power than WDDL [42]. Both RCDDL and WDDL use complementary logic structures to create complementary sections of a circuit. As a result, both suffer increased current (and ultimately power) variations, caused by the differing switching capacitances of their physically different complementary structures.

The methods proposed in this chapter enable DPA resistant circuits that address the issues of high power and area overhead. They also equalize individual capacitances, allowing near constant power consumption for all operations.

3.3 Designing a Secure Cell: SDMLp¹

Advanced integrated circuit protection from differential power analysis attacks can be achieved through a hybrid logic style based on Complementary Pass-transistor Logic and Dynamic and Differential Logic (DDL). The capabilities of the Secure Dual Multiplexor Logic with pass transistors (SDMLp) cell are tested, validated and compared with Wave Dynamic Differential Logic and with traditional Standard Complementary CMOS Logic.

3.3.1 SDMLp Fundamentals

The Complementary Pass-transistor Logic (CPL) pseudo "dual multiplexer" shown in Figure 3.1 uses differential logic with a symmetrical structure [14], making it an ideal choice for a DDL-based secure logic style. On its own, however, CPL does not meet the dynamic behavior requirements of DDL: specifically, it does not impose exactly one logical transition per cycle.

¹The research discussed in this section is collaborative work with Lakshmi Narasimhan Ramakrishnan [23], with additional details in [24].

Figure 3.1: Complementary Pass-transistor Logic cell implementing a dual multiplexer. Configured using Table 3.1, it can realize any two-input function.

The required dynamic behavior of SDMLp is achieved, as in other secure logic styles, through a two phase process controlled by two primary transistor networks. The overall functionality of the circuit alternates between a setup phase (pre-discharging the network) and an evaluation phase.

Evaluation phase

The portion of the network responsible for the evaluation phase is common to the generic CPL cell shown in Figure 3.1 and the custom SDMLp cell shown in Figure 3.2. The four NMOS transistors, in conjunction with two inverters, realize Equations (3.1) and (3.2). Given the input assignments summarized in Table 3.1, these equations can implement any two-input function.

Figure 3.2: SDMLp logic cell implementing a dual multiplexer, together with the setup-phase (pre-discharge) sub-circuitry required by DDL.

Out = Ip1 · S + Ip2 · Sbar. (3.1)

Outbar = Ip1bar · S + Ip2bar · Sbar. (3.2)

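| IP1 | IP2 | S | Out | Outbar |
|---|---|---|---|---|
| A | B | B | $A \cdot B$ | $\overline{A \cdot B}$ |
| B | A | B | $A + B$ | $\overline{A + B}$ |
| $\overline{A}$ | A | B | $\overline{A} \cdot B + A \cdot \overline{B}$ | $A \cdot B + \overline{A} \cdot \overline{B}$ |
| B | A | S | $\overline{S} \cdot A + S \cdot B$ | $\overline{S} \cdot \overline{A} + S \cdot \overline{B}$ |
| A | $\overline{B}$ | $\overline{B}$ | $A \cdot \overline{B}$ | $\overline{A} + B$ |
| $\overline{B}$ | A | $\overline{B}$ | $A + \overline{B}$ | $\overline{A} \cdot B$ |
| A | A | A | A | $\overline{A}$ |

Table 3.1: Generic CPL gate input parameters with associated outputs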
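The evaluation-phase equations can be checked exhaustively in Python. The sketch below models only the ideal Boolean behavior of Equations (3.1) and (3.2) and ignores the pre-discharge phase and all electrical effects. It wires the cell with the (IP1, IP2, S) assignments listed in the code, which are two-input configurations of the kind summarized in Table 3.1, plus the multiplexer configuration IP1 = B, IP2 = A, S = S. It then confirms that Out realizes the intended function and Outbar its complement for every input combination. The names `dual_mux` and `CONFIGS` are hypothetical.

```python
from itertools import product


def dual_mux(ip1: int, ip2: int, s: int):
    """Ideal evaluation-phase CPL dual multiplexer, Eqs. (3.1) and (3.2)."""
    sbar = 1 - s
    out = (ip1 & s) | (ip2 & sbar)
    outbar = ((1 - ip1) & s) | ((1 - ip2) & sbar)
    return out, outbar


# (IP1, IP2, S) wiring as a function of inputs a, b, and the expected Out function.
CONFIGS = {
    "A AND B":     (lambda a, b: (a, b, b),         lambda a, b: a & b),
    "A OR B":      (lambda a, b: (b, a, b),         lambda a, b: a | b),
    "A XOR B":     (lambda a, b: (1 - a, a, b),     lambda a, b: a ^ b),
    "A AND NOT B": (lambda a, b: (a, 1 - b, 1 - b), lambda a, b: a & (1 - b)),
    "A OR NOT B":  (lambda a, b: (1 - b, a, 1 - b), lambda a, b: a | (1 - b)),
    "A (buffer)":  (lambda a, b: (a, a, a),         lambda a, b: a),
}


def row_is_correct(name: str) -> bool:
    wiring, expected = CONFIGS[name]
    for a, b in product((0, 1), repeat=2):
        out, outbar = dual_mux(*wiring(a, b))
        if out != expected(a, b) or outbar != 1 - out:
            return False
    return True


def mux_row_is_correct() -> bool:
    """IP1 = B, IP2 = A, S = S gives Out = S·B + Sbar·A."""
    for a, b, s in product((0, 1), repeat=3):
        out, outbar = dual_mux(b, a, s)
        if out != ((s & b) | ((1 - s) & a)) or outbar != 1 - out:
            return False
    return True


for name in CONFIGS:
    print(name, row_is_correct(name))
print("MUX", mux_row_is_correct())
```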

In an SDMLp cell (Figure 3.2), the evaluation phase begins only when S and Sbar are complements of each other. Specifically, during evaluation the pre-discharge network is isolated from the four NMOS transistors m1, m2, m7 and m8, which evaluate the complementary Out and Outbar signals.

Setup - Pre-discharge phase

To realize a dynamic logic cell, SDMLp includes a pre-discharge network made up of four PMOS transistors: m3, m4, m9 and m10. Including these transistors, which are controlled entirely by S and Sbar, yields a modified dual multiplexer whose two outputs, Out and Outbar, are functionally equivalent to equations (3.3) and (3.4), respectively.

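$$\mathit{Out} = \mathit{Ip1} \cdot S + \mathit{Ip2} \cdot \mathit{Sbar} + S \cdot \mathit{Sbar} \tag{3.3}$$

$$\mathit{Outbar} = \mathit{Ip1bar} \cdot S + \mathit{Ip2bar} \cdot \mathit{Sbar} + S \cdot \mathit{Sbar} \tag{3.4}$$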

The pre-discharge phase, also controlled via a phase controller, starts when S and Sbar are forced to logic low [41]. When both S and Sbar are pulled to logic 0, the pre-discharge network is activated (PMOS transistors m3, m4, m9 and m10 conduct while NMOS transistors m1, m2, m7 and m8 stop conducting), forcing both inverter outputs, Out and Outbar, to logic 0. Propagation of the pre-discharge signal initializes the interconnect and internal capacitances of the circuit before each evaluation phase and guarantees that one and only one transition occurs per phase. This pre-discharge propagation, together with the near-constant capacitance of the cell due to its symmetric structure, results in constant power consumption during evaluation. Since exactly one 0 → 1 transition occurs during the evaluation phase, exactly one 1 → 0 transition occurs during the subsequent setup phase.
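The one-transition-per-phase argument can be illustrated with an abstract, logic-level model. The hypothetical `sdmlp_cycle` below applies Equations (3.1) and (3.2) with S = Sbar = 0 during setup and complementary S and Sbar during evaluation, then counts output transitions on both rails. It ignores capacitance, timing and glitches entirely, so it demonstrates only the switching-activity property, not constant current.

```python
import random

def rails(ip1, ip2, s, sbar):
    """Out and Outbar per Eqs. (3.1)-(3.2), with S and Sbar driven independently."""
    out = (ip1 & s) | (ip2 & sbar)
    outbar = ((1 - ip1) & s) | ((1 - ip2) & sbar)
    return out, outbar

def sdmlp_cycle(ip1, ip2, sel):
    """One clock cycle: the setup (pre-discharge) value, then the evaluation value."""
    pre = rails(ip1, ip2, 0, 0)            # setup: S = Sbar = 0, both outputs low
    ev = rails(ip1, ip2, sel, 1 - sel)     # evaluation: S and Sbar complementary
    return pre, ev

def transitions_per_cycle(cycles):
    """(rising, falling) output transitions per cycle, summed over both rails."""
    counts, prev = [], (0, 0)
    for pre, ev in cycles:
        falls = sum(o > n for o, n in zip(prev, pre))   # previous evaluation -> setup
        rises = sum(o < n for o, n in zip(pre, ev))     # setup -> evaluation
        counts.append((rises, falls))
        prev = ev
    return counts

rng = random.Random(0)
cycles = [sdmlp_cycle(rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1))
          for _ in range(1000)]
counts = transitions_per_cycle(cycles)
print(set(counts[1:]))  # {(1, 1)}: one rise and one fall per cycle, for any data
```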

3.3.2 Cell Level Characterization & Analysis

Minimizing the maximum instantaneous current variation is the main security objective, since it represents the worst-case current leakage in a cell that could be exploited by DPA. Several key design metrics, including current variation, are compared for SDMLp (Figure 3.3) using traditional unsecured logic (SCMOS) and an existing secure logic style (WDDL) as baselines.

Figure 3.3: SDMLp cell layout

The experimental results in Table 3.2 show a significant reduction in maximum instantaneous current variance, across all possible input transitions, for several basic gates. This almost constant switched current improves the attack resistance of SDMLp; however, as Table 3.3 shows, SDMLp incurs a delay penalty relative to SCMOS for the simpler AND and OR gates, although its average delay across the four gates is slightly lower.

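Tables 3.4 and 3.5 show that SDMLp consumes significantly less power than WDDL (7.20 versus 18.13 × 10⁻⁶ W on average) and is slightly smaller on average (18.4 versus 19.4 λ²), although the WDDL AND and OR cells are smaller.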

As mentioned earlier, a constant load capacitance was used for both the Out and Outbar outputs during simulation. As with WDDL, balanced load capacitance is a requirement for optimum performance of SDMLp; it can be met using fat wire routing [43] or other routing techniques that balance the capacitances of complementary wires.

Gates(2X1) SCMOS SDMLp WDDL

AND 35.81 1.29 7.91

OR 32.63 1.36 6.96

XOR 69.74 1.39 7.73

MUX 74.43 1.26 8.64

Avg 53.15 1.325 7.81

Std Dev 21.98 0.06 0.69

Table 3.2: Maximum Instantaneous Current Variance (10⁻⁹ A²).

Gates(2X1) SCMOS SDMLp WDDL

AND 24.86 35.61 37.6

OR 25.6 36.1 35.56

XOR 49.81 36.52 78.61

MUX 47.62 37.23 77.22

Avg 36.97 36.37 57.25

Std Dev 13.59 0.69 23.89

Table 3.3: Propagation Delay (10⁻¹² s).

3.3.3 Implications

Gates(2X1) SCMOS SDMLp WDDL

AND 7.6 18.4 15.2

OR 7.6 18.4 15.2

XOR 12.2 18.4 24.4

MUX 11.4 18.4 22.8

Avg 9.7 18.4 19.4

Std Dev 2.45 0.00 4.89

Table 3.4: Layout Area (λ²).

Gates(2X1) SCMOS SDMLp WDDL

AND 5.29 7.41 13.95

OR 5.62 7.43 14.47

XOR 6.69 7.02 22.12

MUX 7.02 6.93 21.98

Avg 6.15 7.20 18.13

Std Dev 0.83 0.26 4.53

Table 3.5: Average Power Consumption (10⁻⁶ W).

A designer with flexibility and time can design an SDMLp circuit using a 1-to-1 translation process, since the SDMLp universal cell can instantiate any two-input logical function. Although this process can be optimized, as the next section discusses, even a manual translation has major implications in four main categories. Using SDMLp cells in a design affects the following critical parameters:

1. Instantaneous Current Variation falls to roughly 17% of that of existing secure cell logic (WDDL) and to roughly 2.5% of that of SCMOS (Table 3.2 averages).

2. Layout Area and its variation are lower than in existing secure logic styles; area roughly doubles relative to SCMOS, with no variation across gates.

3. Propagation Delay is reduced by 2% and 37% relative to SCMOS and secure logic styles, respectively.

4. Average Power Consumption approaches SCMOS (a 17% increase) while being a 60% reduction relative to other secure logic styles. Its variation also decreases to about 30% of that of SCMOS and only 6% of that of secure styles.
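The percentages above can be cross-checked against the averages reported in Tables 3.2 to 3.5. The short Python sketch below simply recomputes SDMLp's average for each metric as a fraction of the SCMOS and WDDL averages; the dictionary layout and the `ratio` helper are illustrative, and the numbers are copied from the tables.

```python
# Average values per logic style, copied from Tables 3.2-3.5.
AVG = {
    "current_var": {"SCMOS": 53.15, "SDMLp": 1.325, "WDDL": 7.81},   # 1e-9 A^2
    "delay":       {"SCMOS": 36.97, "SDMLp": 36.37, "WDDL": 57.25},  # 1e-12 s
    "area":        {"SCMOS": 9.7,   "SDMLp": 18.4,  "WDDL": 19.4},   # lambda^2
    "power":       {"SCMOS": 6.15,  "SDMLp": 7.20,  "WDDL": 18.13},  # 1e-6 W
}

def ratio(metric, baseline):
    """SDMLp's average for a metric, as a fraction of a baseline style's average."""
    return AVG[metric]["SDMLp"] / AVG[metric][baseline]

for metric in AVG:
    print(f"{metric:12s} SDMLp/SCMOS = {ratio(metric, 'SCMOS'):.3f}  "
          f"SDMLp/WDDL = {ratio(metric, 'WDDL'):.3f}")
```

A ratio below 1 is a reduction relative to the baseline; for example, 1 minus the power ratio against WDDL gives the quoted 60% saving.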

3.4 Secure Circuit-Level Design Methodology²

This section details a circuit synthesis methodology for SDMLp-based circuits built on Reduced Ordered Binary Decision Diagrams (ROBDDs). The motivation for and explanation of the synthesis flow are followed by experimental results for a DES layout in which, compared to an existing DDL style (WDDL), area usage dropped 43% and total power consumption dropped by over 50%, at a speed penalty of 20% relative to WDDL.

To motivate the need for an SDMLp-targeted circuit synthesis technique, consider two complementary logic functions: $f = A \cdot B$ and its complement $\overline{f} = \overline{A \cdot B} = \overline{A} + \overline{B}$. With standard CMOS logic cells, a minimum of two standard cells would be used, depending on timing constraints: an AND or a NAND would implement $f$ or $\overline{f}$, while an inverter would generate the other, complementary output. A one-to-one mapping from CMOS cells to SDMLp cells, while valid, is clearly sub-optimal. Since $f$ and $\overline{f}$ are complements, both can be implemented using a single SDMLp cell, which inherently generates complementary signals.

² Research work discussed in this section is collaborative work with Manoj Chakkaravarthy [23], with additional details in [25].

3.4.1 Synthesizing SDMLp-based Circuits Using Reduced Ordered BDDs

The pseudo dual multiplexer embedded within SDMLp cells is the proposed target for circuit-level optimization. A binary decision diagram (BDD) is an efficient data structure that can represent any Boolean equation as a directed acyclic graph (DAG). Each node in a BDD maps one-to-one onto a generic multiplexer [44]; equivalently, in the case of SDMLp, each node in the BDD is implemented using a single SDMLp cell.

As in other logic synthesis techniques, a package such as CUDD can provide an optimal variable ordering for the given Boolean equation [45]. Once a new BDD is generated based on that variable ordering, additional reduction steps are applied to generate a Reduced Ordered Binary Decision Diagram [46]. As Figure 3.4 illustrates, and in keeping with the earlier discussion of complementary signals, a BDD node can be eliminated whenever its children are complementary [44].

Figure 3.4: Complementary Nodes
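To make the node-to-cell mapping concrete, the following Python sketch builds a small ROBDD by exhaustive Shannon expansion (suitable only for tiny functions), applying the two standard reduction rules: drop a node whose children are identical, and share identical nodes through a unique table. It evaluates the diagram by treating each internal node as a 2:1 multiplexer. It then counts cells under one reading of the complementary-node idea, which is an assumption here: because an SDMLp cell already produces Outbar, a node computing the complement of an already-implemented node needs no cell of its own. The class, helpers and cell-counting rule are hypothetical illustrations, not the CUDD API or the exact procedure of [44].

```python
class ROBDD:
    """Tiny reduced ordered BDD built by exhaustive Shannon expansion."""

    def __init__(self, order):
        self.order = list(order)   # variable order, top to bottom
        self.unique = {}           # (level, low, high) -> node id
        self.node = {}             # node id -> (level, low, high)
        self.next_id = 2           # ids 0 and 1 are the constant terminals

    def mk(self, level, low, high):
        if low == high:  # reduction: the test on this variable is redundant
            return low
        key = (level, low, high)
        if key not in self.unique:  # sharing: reuse an identical node
            self.unique[key] = self.next_id
            self.node[self.next_id] = key
            self.next_id += 1
        return self.unique[key]

    def build(self, f, level=0, bits=()):
        """f maps a {variable: 0/1} dict to a truthy or falsy value."""
        if level == len(self.order):
            return int(bool(f(dict(zip(self.order, bits)))))
        low = self.build(f, level + 1, bits + (0,))
        high = self.build(f, level + 1, bits + (1,))
        return self.mk(level, low, high)

    def evaluate(self, root, env):
        """Walk the BDD; each internal node acts as a 2:1 mux on its variable."""
        n = root
        while n > 1:
            level, low, high = self.node[n]
            n = high if env[self.order[level]] else low
        return n

    def complements(self, a, b):
        """True if nodes a and b compute complementary functions."""
        if a <= 1 or b <= 1:
            return a <= 1 and b <= 1 and a != b
        la, lo_a, hi_a = self.node[a]
        lb, lo_b, hi_b = self.node[b]
        return la == lb and self.complements(lo_a, lo_b) and self.complements(hi_a, hi_b)


def sdmlp_cell_count(bdd, roots):
    """Return (internal BDD nodes, cells) when complementary nodes share a cell."""
    internal, stack = set(), list(roots)
    while stack:
        n = stack.pop()
        if n > 1 and n not in internal:
            internal.add(n)
            _, low, high = bdd.node[n]
            stack += [low, high]
    cells = []
    for n in sorted(internal):
        if not any(bdd.complements(n, c) for c in cells):
            cells.append(n)  # otherwise reuse the complementary cell's Outbar rail
    return len(internal), len(cells)


bdd = ROBDD(["A", "B"])
f = bdd.build(lambda v: v["A"] and v["B"])            # f = A . B
f_bar = bdd.build(lambda v: not (v["A"] and v["B"]))  # complement of f
print(sdmlp_cell_count(bdd, [f, f_bar]))  # (4, 2)
```

Here the two functions need four ROBDD nodes in total, but only two cells once complementary nodes are paired, because each cell's Outbar rail supplies the complement.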

3.4.2 SDMLp Synthesis Flow

With the motivation for a targeted secure circuit synthesis flow for SDMLp cells established, Figure 3.5 details a design flow from high-level description to ASIC layout. The synthesis program generates an SDMLp gate-level netlist for the combinational portion of a circuit. For DES, this included the cryptographic circuit, which comprises (in decreasing order of area) the S-boxes, XOR trees and multiplexer modules.

Figure 3.5: A targeted secure circuit synthesis flow for SDMLp design.

3.4.3 Implementation & Analysis

Using a standard flow as well as the custom SDMLp flow in Figure 3.5, several DES-based circuits and sub-circuits were implemented in 90nm technology.

DES Cryptographic circuit

The cell-based experiment from the previous section was repeated on a DES circuit and its major cryptographic components. The DES hardware includes eight substitution boxes (S-boxes) [8]. All eight DES S-boxes were implemented in the SCMOS, SDMLp and WDDL logic styles, and their area (Table 3.6), power (Table 3.7) and instantaneous current variance (Table 3.8) were characterized.

After the S-boxes were implemented and characterized, a full DES hardware implementation was also characterized in the three logic styles, showing significant reductions in area, power and instantaneous current variation relative to WDDL, at the cost of an expected speed penalty.

Table 3.9 summarizes all the results, highlighting in particular how SDMLp moves substantially closer to typical SCMOS design metrics, with the exception of speed. Figure 3.6 shows the three DES layouts in the same 260λ × 260λ bounding box.

Attack on DES Single Round Implementation

Finally, a Differential Power Analysis attack was mounted on one round of a DES circuit designed using SCMOS, SDMLp and WDDL cells. The attack followed the framework adopted by Junee for DES [47] and by Tanimura & Dutt for AES [48]. An arbitrarily selected secret key (20) was used for the attack. The secret key of the SCMOS circuit was easily revealed after applying 70 random vectors. In contrast, the secret key of the SDMLp and WDDL implementations could not be revealed even after applying 5,000 random vectors.
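The single-round attack can be illustrated with a simplified correlation power analysis in Python. The sketch below targets DES S-box S1 alone, assumes a hypothetical unprotected (SCMOS-like) leakage equal to the Hamming weight of the S-box output plus Gaussian noise, and ranks all 64 guesses of the 6-bit subkey by Pearson correlation. The key value 20 mirrors the one used above; the noise level, trace count and all names are assumptions, and real traces would come from circuit simulation rather than from this leakage model.

```python
import random

# DES S-box S1 (FIPS 46-3): four rows of sixteen 4-bit outputs.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def sbox1(x):
    """6-bit input: bits 5 and 0 pick the row, bits 4..1 pick the column."""
    row = ((x >> 4) & 0b10) | (x & 1)
    col = (x >> 1) & 0xF
    return S1[row][col]

def hw(v):
    return bin(v).count("1")

def pearson(m, p):
    n = len(m)
    m_bar, p_bar = sum(m) / n, sum(p) / n
    num = sum((a - m_bar) * (b - p_bar) for a, b in zip(m, p))
    den = (sum((a - m_bar) ** 2 for a in m) * sum((b - p_bar) ** 2 for b in p)) ** 0.5
    return num / den

def cpa(inputs, traces):
    """Correlate an HW(S1(input ^ guess)) model with the traces for every guess."""
    scores = [pearson([hw(sbox1(x ^ g)) for x in inputs], traces) for g in range(64)]
    best = max(range(64), key=lambda g: scores[g])
    return best, scores

rng = random.Random(1)
SECRET = 20                                         # 6-bit subkey, as in the text
inputs = [rng.randrange(64) for _ in range(1000)]   # S1 input bits before the key XOR
traces = [hw(sbox1(x ^ SECRET)) + rng.gauss(0, 0.5) for x in inputs]
best, scores = cpa(inputs, traces)
print(best, round(scores[best], 2))
```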

Figure 3.6: DES layouts for (a) SCMOS, (b) WDDL and (c) SDMLp.

DES S-box SCMOS SDMLp WDDL

1 1137.45 1766.09 2619.83

2 1125.54 1586.3 2503.26

3 1091.42 1537.81 2434.72

4 1101.71 1235.04 2393.85

5 1079.64 1615.82 2360.16

6 1120.24 1651.18 2408.69

7 1122.96 1512.04 2499.77

8 1055.29 1560.09 2461.09

Avg 1104 1558 2460

Std Dev 28 152 82

Table 3.6: DES Layout Area (λ²).

Figure 3.7 plots the correlation coefficient versus the number of vectors applied during the DPA attack on SCMOS, WDDL and SDMLp; the darkened line corresponds to the correct key. For the SCMOS design, the correlation of the correct key is easily distinguished from those of the other key values as vectors accumulate. For SDMLp and WDDL, in contrast, the correlation of the correct key is indistinguishable from the correlations of the other keys.

Figure 3.8 plots the correlation coefficient versus all 64 key guesses for SCMOS, WDDL and SDMLp; a distinguishable peak in correlation identifies the key. The secret key is thus revealed for SCMOS, but for WDDL and SDMLp no single key guess is distinguishable. Finally, the correlation for the correct key guess (20) is indistinguishable from the correlation coefficients of the other keys, and further traces would not make it distinguishable either.

DES S-box SCMOS SDMLp WDDL

1 9.03 13.9 20.1

2 9.00 13.2 17.9

3 8.99 12.7 18.5

4 9.04 11.0 14.9

5 8.99 13.9 18.7

6 9.00 13.1 19.2

7 9.02 13.0 17.1

8 8.98 13.2 18.3

Avg 9.00 13.0 18.09

Std Dev 0.02 0.91 1.56

Table 3.7: DES Total Power Consumption (10⁻⁵ W).

3.4.4 Implications

While a secure cell has certain benefits, the overhead of 1-to-1 cell replacement can be avoided in automated synthesis tools. By harnessing the fact that complementary signals are already generated "for free" by dual-routed complementary cells and circuits, the BDD-based synthesis techniques and methods presented in the previous section allow the automated generation of reduced-size secure circuits. The only caveat to this method is that the original synthesis can currently include only a subset of all standard cells, specifically a subset of two-input logic gates. Using this method, SDMLp cells minimize secure cell area overhead (a 40% increase over SCMOS versus WDDL's 115% increase) and reduce instantaneous current variation by roughly three orders of magnitude relative to SCMOS and by about 90% relative to WDDL secure cells (Table 3.8).

DES S-box SCMOS SDMLp WDDL

1 7490 3.03 43.6

2 186.0 4.01 83.6

3 323.0 4.11 38.9

4 8690 5.30 14.8

5 245.0 7.18 32.3

6 7670 4.29 53.5

7 9420 4.62 32.3

8 323.0 4.77 49.7

Avg 4293 4.66 43.59

Std Dev 4342 1.21 20.15

Table 3.8: DES Maximum Instantaneous Current Variation (10⁻⁷ A²).

DES Full Chip SCMOS SDMLp WDDL

Area (λ²) 13247.21 18715.26 32714.64

Total Power (mW) 1.37 1.47 2.96

Max. Op. Freq. (MHz) 100 66.67 83.33

Inst. Current Var. (10⁻⁶ A²) 1892.3 5.95 229.1

Table 3.9: Full Chip DES Implemented in SCMOS, SDMLp and WDDL.

Figure 3.7: Correlation coefficient vs. number of vectors for (a) SCMOS, (b) WDDL and (c) SDMLp.

Figure 3.8: Correlation coefficient vs. key guess for (a) SCMOS, (b) WDDL and (c) SDMLp.

3.5 Temperature Variation Effects on Dynamic Power³

A critical aspect of a logic device's low-level physical characteristics is that it exists within a real environment. Devices are designed and physically laid out using various standard and custom synthesis flows, yet previous works capture only the variations in the macro-scale parameters that affect power side channel information leakage. Specifically, these works have focused on load capacitance (due to routing and placement), timing/glitch and switching characteristics, and overall power requirements, aiming to make each data independent. This section focuses on the other half of the equation: the effect that variations in the physical world have on the variance of dynamic power, specifically the effect of temperature variations on dynamic power variation and the implications for side channel security.

3.5.1 Temperature Effects

To understand the effect of temperature on a CMOS circuit's power consumption, the fundamental construction and parameters of a MOSFET (Metal-Oxide-Semiconductor Field-Effect Transistor) must be examined. Of the several dozen variables that control the functionality of any given MOSFET, four are affected by the MOSFET's ambient temperature: threshold voltage [50], carrier mobility [51][52], saturation velocity [53] and parasitic drain/source resistance [54]. Synthesizing the results of these previous works, [49] details the following broad effects of increased temperature on these parameters:

³ Research work discussed in this section is collaborative work with Aditi Vijaykumar; additional details can be found in [49].

• Threshold Voltage [Vth]: decreases with increased temperature.

• Effective Carrier Mobility⁴ [µeff]: decreases with increased temperature [55].

• Saturation Velocity [VSAT = µeff · Ec]: has a weaker dependence on temperature, as the electric field Ec increases with temperature.

• Parasitic Drain/Source Resistance: increases linearly with temperature.

Based on these effects, and taking technology scaling into account, it has been hypothesized that the effective drive current at any temperature T depends on which effect dominates: the decrease in Vth or the decrease in carrier mobility [49]. Unless these two effects are perfectly balanced, changes in temperature result in changes in drive current.
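The competing effects can be sketched numerically with Sakurai and Newton's alpha-power law, I ∝ µ(T)·(Vdd − Vth(T))^α. In the Python sketch below, every parameter (α, a 1 mV/K threshold slope, a T^-1.5 mobility dependence, and both supply/threshold pairs) is a hypothetical placeholder rather than a fitted 90nm value. It shows that the sign of the drive-current change between 10 C and 125 C depends on which effect dominates: with a large gate overdrive the mobility loss wins, and with a small overdrive the threshold-voltage drop wins.

```python
T0 = 298.15  # reference temperature: 25 C in kelvin

def drive_current(t_celsius, vdd, vth0, alpha=1.3, kappa=1e-3, mu_exp=1.5):
    """Relative drive current from an alpha-power-law sketch (hypothetical values).

    vth(T)      = vth0 - kappa * (T - T0)    threshold voltage falls as T rises
    mu(T) / mu0 = (T / T0) ** -mu_exp        carrier mobility falls as T rises
    I(T)        ~ mu(T) / mu0 * (vdd - vth(T)) ** alpha
    """
    t = t_celsius + 273.15
    vth = vth0 - kappa * (t - T0)
    mobility = (t / T0) ** -mu_exp
    return mobility * (vdd - vth) ** alpha

# Two hypothetical operating points: large vs small gate overdrive (vdd - vth).
for label, vdd, vth0 in [("large overdrive", 1.0, 0.25), ("small overdrive", 0.6, 0.40)]:
    cold, hot = drive_current(10, vdd, vth0), drive_current(125, vdd, vth0)
    trend = "falls" if hot < cold else "rises"
    print(f"{label}: drive current {trend} from 10 C to 125 C (x{hot / cold:.2f})")
```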

To validate this hypothesis, two full cryptographic circuits were implemented in Spice using a 90nm standard cell logic style: the first a classic Non-Linear Feedback Shift Register (NLFSR) based block cipher [56][57], and the second a classic Feistel network implementation of DES [8]. Each circuit was simulated in Spice using five unique, random 64-bit keys at four temperature points (10, 25, 75 and 125 degrees Celsius), with thousands of random 32-bit plaintexts. The resulting ciphertext and power information were fed to an existing Side Channel Attack Research Framework Tool (see Chapter 6) to determine the relative dynamic power variability, and thus the DPA attackability, at each temperature point.

Figure 3.9 summarizes the average maximum correlation between dynamic power and Hamming models for both the NLFSR block cipher and the DES implementation. In 90nm technology, a 10-30% decrease in correlation is observed when the temperature increases from 10 C to 125 C. This drop in correlation is directly related to the reduction in dynamic power variation [49]. To enhance this effect, the use of high-Vth cells was also explored. Using a high Vth reduces carrier mobility as well as the drain current variance. The following experiments again use the same two cryptographic circuits and are repeated for high Vth as well as for an intermediate Vth (Vth = 26% high Vth). The correlation results for the NLFSR block cipher are shown in Figure 3.10: the use of high-Vth cells further reduces potential correlation, by up to 40%, when the temperature increases from 10 C to 125 C (a 44% drop from nominal Vth at 10 C).

⁴ Note that carrier mobility refers to both electron and hole mobility.

Figure 3.9: Average correlation of five correct keys for a classic NLFSR block cipher and for DES, showing a 10-30% reduction as the temperature increases from 10 C to 125 C.

3.5.2 Implications

The implications of this work are twofold. First, secure devices need to be tested more robustly at various temperatures, not simply to ensure functionality but to guarantee protected side channels. Second, while increasing a device's temperature increases its power consumption, data-dependent variations are increasingly masked, which makes attacks more difficult. While the ambient temperature of a device cannot be controlled (an outside force can easily change it), designers can modify Vth levels to set a baseline impact. Having shown that a device is extremely sensitive to a single environmental variation, this work strongly motivates the investigation of other environmental and process-based variations.

Figure 3.10: Average correlation of five correct keys for a classic NLFSR block cipher: a 10% reduction when the temperature increases from 10 C to 125 C with standard-Vth cells, growing to a 40% reduction with high-Vth cells over the same temperature increase.

Chapter 4

FSM Based Protection Mechanisms

4.1 Introduction

While the past thirty years of hardware design and design automation have focused, in part, on creating smaller, minimally sized circuits, this chapter proposes that security-centric designs require a departure from this minimalist mentality. Built-in protection mechanisms at all levels of design are paramount to providing cost-effective, efficient, secure systems. This chapter focuses on a missing weak link in the design of secure hardware systems: the high-level design of secure sequential circuits. The problem is approached by targeting Finite State Machines (FSMs) and their vulnerability to non-invasive, side channel based attacks.

The standard approach to FSM synthesis and encoding enables high-level modeling of the internal registers targeted by side channel attacks. Minimal encoding strategies and state reduction techniques allow the current state of an FSM, or the transition between its states, to be easily correlated to several common Hamming-based power models. This chapter proposes and details a two-fold method for designing side channel secure Finite State Machines (S*FSMs). The method focuses on FSM design with respect to physical topology and encoding. Theoretical results show the effectiveness and potential of security-driven approaches to FSM synthesis in eliminating the relationship between common side channel models and hardware implementations. These theoretical results are then supported by the characterization and security analysis of fully synthesized FSM circuits.

4.2 Motivation

In a world where access is power, authentication and secrecy control mechanisms are constantly under attack. From physical access, to decoding media, to the control or disruption of autonomous systems, attackers probe the weakest link in a network, device, subsystem or subroutine. Since the authentication schemes used in many critical devices are mathematically and computationally difficult to compromise, attackers now routinely seek out secondary sources of information to compromise a system. These secondary sources of information can directly attack an authentication scheme (e.g. direct key recovery) or can be used in conjunction with traditional sources of information (e.g. transmitted data) to form more intelligent, computationally easier attacks. A classical example of the latter is side channel attacks, which typically use energy information in the form of power or electromagnetic radiation to mount extremely powerful attacks on cryptographic modules.

The protection of hardware devices from side channel attacks is well documented at low levels of abstraction, including in the previous chapters. A majority of existing low-level solutions focus on methods that reduce intra-cycle current variations through dual-routed logic cells [16][41][42][23]. These methods, albeit effective, come at significant design cost and increased complexity in downstream design and implementation. The existing high-level masking and hiding alternatives tend to be restricted to specific algorithms and require detailed, specific knowledge to implement, making them less than desirable for formally creating secure hardware designs from arbitrary functions.

Rather than protecting a single class of cryptographic algorithms, this chapter focuses on a more general problem space: the realm of FSMs. FSMs are widespread in hardware devices, and their significance within a larger device varies from minute to critical; the information gleaned from an FSM's current state or transition can do everything from reducing the computational complexity of an attack, to directly revealing sensitive information, to allowing disruption of its normal operation. Specific direct uses of FSMs in security-critical hardware include parking meters, anti-lock brake systems [58], pacemakers [59] and other medical devices [60]. To alleviate this weakness, a high-level method described in [61] eliminates the relationship between FSMs and side channel models. This work applies that method to a subset of FSM benchmarks in order to validate the theoretical result and to characterize the final design impact.

4.3 Background

Side channel attacks are hardware-based attack vectors that exploit the relationships between the logical/functional operations of a device and their physical byproducts. In conjunction with some basic information-leakage models and standard device I/O, these correlation-driven attacks can be used to determine information stored within the device.

Consider a target device T that implements some set of functions F with input/output IO and a set of side channels SC. Assuming some sensitive information S is processed by a functional subset $f_s$ of F, an attack fundamentally requires the following:

1. A relationship between at least one side channel and an IO-sensitive functional subset: $\exists\, sc \subset SC : sc \propto f_s(IO)$;

2. A secret- and/or IO-dependent model of the side channel: $M(s, IO) \propto sc \propto f_s(IO)$.

One of the most prevalent, and earliest, side channel attacks, owing to its accessibility and low hardware cost, targets devices using information embedded within their data-dependent power consumption profile [20]. These attacks typically use Hamming-based models, including the Hamming Weight (HW) and the Hamming Distance (HD), given in Equations 4.1 and 4.2, respectively. These models target a set of bits within a device (e.g. a register or bus), modeling either the total number of bits that are "on" or the number of bits that are switching [62]. When left unguarded, side channel attacks are extremely powerful, as variations in a Hamming model correlate to variations in the power consumed [15].

$$HW(S[x..0]) = \sum_{i=0}^{x} S[i] \tag{4.1}$$

$$HD(S_i[x..0], S_{i+1}[x..0]) = \sum_{b=0}^{x} S_i[b] \oplus S_{i+1}[b] \tag{4.2}$$

Recall that there are two basic requirements for performing side channel attacks: the first requires a side channel that is related to the secure (sub)function, while the second requires a model of the secure sub-function that is proportional to the side channel and, intrinsically, to the underlying (sub)function. Most existing research removes the relationship between the side channel and the underlying data-dependent functionality. From a statistical perspective, given a model m and a power side channel p, each containing aligned data points indexed by j, the correlation C(m, p) is defined by Equation 4.3.

$$C(m,p) = \frac{\sum_j (m_j - \bar{m})(p_j - \bar{p})}{\sqrt{\sum_j (m_j - \bar{m})^2 \times \sum_j (p_j - \bar{p})^2}} \tag{4.3}$$
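Equations 4.1 to 4.3 translate directly into code. The Python sketch below is a minimal, unoptimized rendering with illustrative function names; the register trace and leakage coefficients are hypothetical, chosen only so that the power values are an exact linear function of the HD model.

```python
import math

def hamming_weight(s, width):
    """Eq. (4.1): number of 1 bits in register S[width-1..0]."""
    return sum((s >> i) & 1 for i in range(width))

def hamming_distance(s_now, s_next, width):
    """Eq. (4.2): number of bit positions that switch between two register states."""
    return sum(((s_now >> b) & 1) ^ ((s_next >> b) & 1) for b in range(width))

def correlation(m, p):
    """Eq. (4.3): Pearson correlation of model m and side channel p."""
    n = len(m)
    m_bar, p_bar = sum(m) / n, sum(p) / n
    num = sum((mj - m_bar) * (pj - p_bar) for mj, pj in zip(m, p))
    den = math.sqrt(sum((mj - m_bar) ** 2 for mj in m) *
                    sum((pj - p_bar) ** 2 for pj in p))
    return num / den

# Hypothetical 4-bit register trace and HD-proportional power samples.
states = [0b0000, 0b1011, 0b0110, 0b1111, 0b0001]
hd_model = [hamming_distance(a, b, 4) for a, b in zip(states, states[1:])]
power = [0.5 + 0.2 * d for d in hd_model]
print(hd_model, round(correlation(hd_model, power), 6))
```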

4.3.1 Limitations of Existing Research

Current research in side channel security focuses on preventing information leakage due to the physical realization of a design. In most cases, information leakage is mitigated using either low-level design techniques or methods targeted at specific algorithms and implementations. While Chapter 3 focuses on low-level design techniques, this option does not allow for a true top-down application and optimization of device security countermeasures.

The most prevalent side channel solutions focus on low-level implementation methods to reduce intra-cycle current variations, typically through dual-logic cell styles [16][41][42][23]. These methods, while effective, come at significant design cost and increased complexity in downstream design constraints and implementation. In particular, these styles suffer significant penalties in cell layout area, routing, delay and overall power consumption [21]. The alternative to low-level cell manipulation is high-level masking and hiding of information [63].

Other approaches include special asynchronous design methods, which again pose issues for widespread design automation [64]. Similarly, approaches that are specific to an algorithm typically mask specific computations; these methods are not generically or automatically adaptable to arbitrary logic circuits [30, 65]. These techniques not only trap designers into looking for patterns rather than solving the underlying problem, but are also impractical for widespread, generic use and remain vulnerable to higher-order side channel attacks.

Low-level techniques for mitigating power-based side channel attacks focus on removing information from the physical side channel. The general principle consists of removing the inter-cycle variation within a (sub)circuit that is due to data-dependent inputs. A majority of secure CMOS logic styles follow two conditions outlined by Tiri et al. [22]:

1. Each logic cell must go through exactly one output transition during each cycle (regardless of input); and

2. The total switched capacitance must always be constant.

When both conditions are satisfied, the result is a power side channel with no cycle-to-cycle variation ($\forall j,\ p_j = \bar{p}$). The correlation between the model and the modified side channel, as derived in Equations 4.4-4.6, is then indeterminate. In real-world implementations, the second condition of perfectly matching the total switched capacitance is infeasible, especially post-layout, and leads to slight variations in data-dependent power consumption.

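$$
\begin{aligned}
C(m,p) &= \frac{\sum_j (m_j - \bar{m})(\bar{p} - \bar{p})}{\sqrt{\sum_j (m_j - \bar{m})^2 \times \sum_j (\bar{p} - \bar{p})^2}} && (4.4)\\
&= \frac{\sum_j (m_j - \bar{m}) \times 0}{\sqrt{\sum_j (m_j - \bar{m})^2 \times 0}} && (4.5)\\
&= \frac{0}{\sqrt{0}} \;\Rightarrow\; \text{indeterminate} && (4.6)
\end{aligned}
$$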
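Numerically, the indeterminate form of Equation 4.6 shows up as a zero denominator. The Python sketch below returns None in that case. It shows that an ideal, perfectly constant power trace (both of Tiri's conditions met exactly) and a perfectly constant model both leave the correlation undefined, so no key hypothesis can be ranked; the latter is the property targeted in the next subsection. All trace values are hypothetical.

```python
import math

def correlation(m, p):
    """Eq. (4.3), returning None when the denominator is zero (Eqs. 4.4-4.6)."""
    n = len(m)
    m_bar, p_bar = sum(m) / n, sum(p) / n
    num = sum((mj - m_bar) * (pj - p_bar) for mj, pj in zip(m, p))
    den = math.sqrt(sum((mj - m_bar) ** 2 for mj in m) *
                    sum((pj - p_bar) ** 2 for pj in p))
    if den == 0:
        return None  # 0 / sqrt(0): indeterminate, nothing to rank
    return num / den

model = [0, 1, 2, 1, 2, 0, 1]          # e.g. Hamming distances under one key guess
ideal_ddl = [3.0] * len(model)         # hypothetical perfectly balanced power trace
constant_model = [2] * len(model)      # hypothetical constant-HD model
leaky = [3.0 + 0.1 * d for d in model] # hypothetical unbalanced power trace
print(correlation(model, ideal_ddl), correlation(constant_model, leaky))  # None None
```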

Finite State Machines: Encoding and Security

To implement a high-level solution, a similar strategy is employed within the side channel model space. The objective is to minimize and eliminate leakage from an information theory perspective. Specifically, for each side channel model, all target data points should be constant, that is, equal to the average of the entire set ($\forall j,\ m_j = \bar{m}$). As such, the correlation between a variable side channel and a constant, static model will be indeterminate.

Within the minimalist spectrum, Finite State Machine (FSM) synthesis and encoding have been explored extensively. Current research in FSM synthesis and encoding focuses on low-power applications [66–68], with some use as protection against fault-injection-based attacks [69–71]. To address this larger research thrust, Chapter 5 includes power as a competing, achievable objective by relaxing and constraining the parameters derived in this chapter.

4.4 Objective

Rather than removing the variability of the underlying physical components¹, the methods in this chapter remove the variability of the side channel models themselves. This approach follows the top-down architecture dogma: improvements at the high level are more efficient than those implemented at the low level [75]. While greater efficiency in no way implies greater security, the high-level methods in this and the next chapter act as a first-pass solution that any designer can implement to gain increased security, with overhead that is on average lower than that of low-level methods. In a top-down approach, increased security can be achieved by introducing security mechanisms at each design stage until the design's security metrics are met.

The major contribution of this chapter is the elimination of variation within the side channel models of Finite State Machines: it provides a high-level method to protect any FSM-based device against power side channel attacks. Recall two of the predominant power models used in side channel attacks: the Hamming Weight (HW) of an internal register S, and the Hamming Distance (HD) between two successive states $S_t$ and $S_{t+1}$ of register S.

¹ While possible in simulation, this task is non-trivial for mass-produced devices, since even perfectly laid out and routed designs are subject to intra-die process variations [72–74].

Variability in either of the Hamming models can be correlated to the power consumed, and thus exposes a vulnerability of the internal data [15].

4.5 FSM Example

FSMs are a computational model used heavily in the design of sequential circuit-level hardware and of software, found in everything from locking mechanisms and vending machines to communication protocols. While the security of a vending machine is hardly comparable to that of a communication protocol, the motivation for Secure FSMs (S*FSMs) starts with a basic FSM rather than one as complex as a communication protocol.

Consider the classical computer architecture problem of designing a branch predictor. Figure 4.1 shows the branch-predicting FSM, implemented as the expected 2-bit saturating counter with each state assigned a minimal binary encoding. The implementation is a generic four-state Moore machine: if the previous branch was taken (T = 1), the predictor moves towards (or remains in) the Strongly Taken state (ST); otherwise, when the previous branch was not taken (T = 0), the predictor moves towards the Strongly Not Taken state (SNT). The two intermediate states, Weakly Taken and Weakly Not Taken (WT and WNT), provide additional prediction history, or memory, greatly increasing its accuracy [76, 77].

(a) Branch Predictor with transition conditions and state encodings.

Scurrent   Snext   T   HD   HW[Scurrent]
ST         ST      1   0    2
ST         WT      0   1    2
WT         ST      1   1    1
WT         WNT     0   2    1
WNT        WT      1   2    1
WNT        SNT     0   1    1
SNT        WNT     1   1    0
SNT        SNT     0   0    0

(b) Transition Table with Hamming Measures.

Figure 4.1: A branch predictor implemented as a two-bit saturating counter.

The minimally encoded two-bit branch predictor shown in Figure 4.1 is summarized in the extended transition table in Figure 4.1(b), which shows both the HW and the HD ranging from 0 to 2. This variability in Hamming Weight and Hamming Distance is unavoidable regardless of which minimal binary encoding is applied: a four-state machine encoded in two bits must use all four codes 00, 01, 10 and 11, whose weights span 0 to 2, and its self-loops always have HD = 0 while every other transition has HD ≥ 1. A secure FSM strategy is required.
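As a quick check, the Hamming measures in Figure 4.1(b) can be regenerated from the predictor's transition function. The Python sketch below assumes the encoding implied by the table's HW and HD columns (ST = 11, SNT = 00, WT = 10, WNT = 01; swapping the WT and WNT codes yields the same table). The names `ENC`, `DELTA` and `transition_table` are illustrative, not part of the original tooling.

```python
# Two-bit saturating-counter branch predictor from Figure 4.1.
# Encoding inferred from Figure 4.1(b) (an assumption, see lead-in).
ENC = {"ST": 0b11, "WT": 0b10, "WNT": 0b01, "SNT": 0b00}

# DELTA[state][T] -> next state, where T = 1 means the branch was taken.
DELTA = {
    "ST": {1: "ST", 0: "WT"},
    "WT": {1: "ST", 0: "WNT"},
    "WNT": {1: "WT", 0: "SNT"},
    "SNT": {1: "WNT", 0: "SNT"},
}


def hw(x: int) -> int:
    """Hamming Weight: number of bits set in x."""
    return bin(x).count("1")


def hd(x: int, y: int) -> int:
    """Hamming Distance: number of bit positions in which x and y differ."""
    return hw(x ^ y)


def transition_table():
    """Yield (S_current, S_next, T, HD, HW[S_current]) rows."""
    for cur, succ in DELTA.items():
        for t in (1, 0):
            nxt = succ[t]
            yield cur, nxt, t, hd(ENC[cur], ENC[nxt]), hw(ENC[cur])


if __name__ == "__main__":
    rows = list(transition_table())
    for row in rows:
        print(*row)
    # Both models take several values, so each can leak information.
    print("HD values:", sorted({r[3] for r in rows}))
    print("HW values:", sorted({r[4] for r in rows}))
```

Both printed sets are {0, 1, 2}, matching the range stated above.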

4.6 S*FSM

Since this chapter focuses on the security of Finite State Machines, recall that an FSM is traditionally defined as the quintuple (Σ, S, s0, δ, F), where:

• Σ is the finite, non-empty set of input symbols.

• S is the finite, non-empty state space.

• s0 ∈ S is the initial state.

• δ : S × Σ → S is the state-transition function.

• F ⊆ S is the set of final states.

An S*FSM is defined as one that eliminates the relationship between the side channel models and the internal states and transitions. Two high-level conditions are necessary and sufficient to secure an FSM against the traditional HW and HD side channel models. Specifically, an S*FSM must eliminate both the relationship between the current state and the HW model and the relationship between transitions and the HD model. This is accomplished by imposing the following restrictions on the state encoding enc(·) within the S*FSM:

1. $\forall s \in S:\ HW(enc(s)) = c_1$.

2. $\forall s \in S,\ \forall \alpha \in \Sigma:\ s' = \delta(s, \alpha) \in S \rightarrow HD(enc(s), enc(s')) = c_2$.
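The two conditions translate directly into a check over a candidate encoding. The following is a minimal Python sketch, assuming states are hashable labels, codes are integers, and transitions are (source, destination) pairs; `is_sfsm` is a hypothetical helper name. It also enforces that codes are distinct, which any valid encoding needs.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]  # (source state, destination state)


def popcount(x: int) -> int:
    return bin(x).count("1")


def is_sfsm(enc: Dict[Hashable, int], edges: Iterable[Edge]) -> bool:
    """Check Conditions 1 and 2 for a given state encoding.

    Condition 1: every state code has the same Hamming Weight (c1).
    Condition 2: every transition s -> s' has the same Hamming Distance (c2).
    """
    unique = len(set(enc.values())) == len(enc)  # codes must be distinct
    weights = {popcount(code) for code in enc.values()}
    distances = {popcount(enc[s] ^ enc[d]) for s, d in edges}
    return unique and len(weights) <= 1 and len(distances) <= 1


if __name__ == "__main__":
    one_hot = {"A": 0b01, "B": 0b10}
    print(is_sfsm(one_hot, [("A", "B"), ("B", "A")]))  # no self-loop
    print(is_sfsm(one_hot, [("A", "A"), ("A", "B")]))  # with a self-loop
```

For a two-state FSM like Figure 4.2, the one-hot codes A = 01, B = 10 pass until the self-loop A → A is included; that transition has HD = 0 while A → B has HD = 2.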

The first condition, which requires that every state within an S*FSM has a constant HW, can be readily achieved with existing methods, though not optimally. The second condition, however, which requires that every state transition has a constant HD, requires structural modification as well as non-trivial state assignments.

Figure 4.2: The need for alternate construction and encoding strategies in FSMs is highlighted by this two-state FSM with a variable HD that corresponds to unique transitions.

4.6.1 Structural

A two-state FSM can be used to motivate the need for structural modifications of secured FSMs. Consider the two unique states, A and B, shown in Figure 4.2. When in state A, the transitions to A and to B result in two different Hamming Distances, regardless of the strategy used to encode the states [61]. The self-loop transition always results in HD(A, A) = 0, while the other transition results in HD(A, B) ≠ 0. For example, a one-hot encoding {A = 01, B = 10} gives HD = 2, while a minimal binary encoding {A = 0, B = 1} gives HD = 1. In order to completely eliminate the relationship between HD and an FSM, an S*FSM cannot contain any self-loops.

The need for this structural modification in S*FSMs is further demonstrated by the requirement of a constant HD across all state transitions. Consider first a single-node FSM with one self-looping transition. Although not useful, regardless of its encoding it will always have a constant HW as well as an HD of zero, so it satisfies both conditions needed for a secure FSM. On the other hand, the two-node FSM in Figure 4.3(a) can never satisfy the second condition: regardless of the encoding selected for states A and B, the HD between two distinct states can never be zero, while the HD between any node and itself is always zero.

The only solution capable of eliminating this conflict is the unrolling of self-loops, essentially the opposite of the state collapsing often used to reduce FSM complexity [78]. Thus, in order to satisfy the second condition, a multi-state, side-channel hardened FSM cannot contain any self-loops. Algorithm 1 (FSM Loop Unroll) is a straightforward yet effective method of removing self-loops from any FSM. It visits each node, checks for a self-loop, and upon finding one, removes it from the transition list. A new state is then added along with two corresponding edges (to and from the original node), each with the same transition condition(s) as the removed self-loop. Finally, all other outgoing edges of the original node are replicated on the new state, maintaining the original functionality of the FSM. Without loss of generality, we assume that self-loops are not present in any hardened FSM.

Algorithm 1 FSM Loop Unroll
procedure LoopRemove(V, E)
    for each v ∈ V do                                   ▷ states present on entry
        if ∃e : e = (v, v, t) ∈ E then                  ▷ self-loop with condition t
            E ← E − {e}
            v′ ← copy of v                              ▷ create new node
            V ← V ∪ {v′}
            E ← E ∪ {(v, v′, t), (v′, v, t)}
            for each u ∈ V with u ∉ {v, v′} and (v, u, c) ∈ E do
                E ← E ∪ {(v′, u, c)}
            end for
        end if
    end for
end procedure
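A minimal Python sketch of Algorithm 1 follows, assuming the FSM is given as a list of states and a set of (source, destination, condition) edges. The `_n` suffix for duplicated states and the function name are illustrative assumptions, and name collisions with existing states are not handled. Where Algorithm 1 handles one self-loop per state, this sketch unrolls every self-loop condition of a state onto the same duplicate.

```python
from typing import Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, str]  # (source, destination, condition)


def loop_remove(states: List[Hashable], edges: Set[Edge]):
    """Remove self-loops by unrolling them onto a duplicate state (Algorithm 1)."""
    states = list(states)
    edges = set(edges)
    for v in list(states):  # only the states present on entry are visited
        loops = [e for e in edges if e[0] == v and e[1] == v]
        if not loops:
            continue
        v_new = f"{v}_n"  # hypothetical naming scheme for the duplicate state
        states.append(v_new)
        for loop in loops:
            edges.discard(loop)
            t = loop[2]
            # v -> v' and v' -> v carry the condition of the removed self-loop
            edges.add((v, v_new, t))
            edges.add((v_new, v, t))
        # Replicate every other outgoing edge of v onto v' (never onto v' itself).
        for src, dst, c in list(edges):
            if src == v and dst not in (v, v_new):
                edges.add((v_new, dst, c))
    return states, edges


if __name__ == "__main__":
    predictor = {
        ("ST", "ST", "1"), ("ST", "WT", "0"), ("WT", "ST", "1"),
        ("WT", "WNT", "0"), ("WNT", "WT", "1"), ("WNT", "SNT", "0"),
        ("SNT", "WNT", "1"), ("SNT", "SNT", "0"),
    }
    new_states, new_edges = loop_remove(["ST", "WT", "WNT", "SNT"], predictor)
    print(len(new_states), len(new_edges))
```

Applied to the branch predictor of Figure 4.1, the two self-loops yield six states and twelve loop-free transitions.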


Figure 4.3: Motivating structural modification: (a) FSM which fails Condition 2 for Hardened FSM regardless of encoding; (b) Unrolled, structurally secure FSM.

Figure 4.4: Original two-bit branch predictor after the structural modification algorithm is applied.

State Minimal/Binary One-Hot S*Optimal

ST1 101 100000 0101

ST2 100 010000 0110

WT 011 001000 1100

WNT 010 000100 1001

SNT2 001 000010 0011

SNT1 000 000001 1010

HW Range [0-2] [1] [2]

HD Range [1-3] [2] [2]

Table 4.1: Three encodings applied to the structurally secured FSM from Fig. 4.4.
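The HW and HD ranges in Table 4.1 can be reproduced with a few lines of Python. The sketch assumes the transitions of Figure 4.4 are those produced by unrolling the self-loops of Figure 4.1 with Algorithm 1 (ST1/ST2 and SNT1/SNT2 being the original/duplicate pairs); because both members of each pair connect to each other and to the same neighbour, the result does not depend on which member is the duplicate. HD is evaluated only over state pairs joined by a transition.

```python
# Undirected transitions of the unrolled predictor (Figure 4.1 + Algorithm 1).
PAIRS = [
    ("ST1", "ST2"), ("ST1", "WT"), ("ST2", "WT"), ("WT", "WNT"),
    ("WNT", "SNT1"), ("WNT", "SNT2"), ("SNT1", "SNT2"),
]

# State codes copied from Table 4.1.
ENCODINGS = {
    "Minimal/Binary": {"ST1": "101", "ST2": "100", "WT": "011",
                       "WNT": "010", "SNT2": "001", "SNT1": "000"},
    "One-Hot": {"ST1": "100000", "ST2": "010000", "WT": "001000",
                "WNT": "000100", "SNT2": "000010", "SNT1": "000001"},
    "S*Optimal": {"ST1": "0101", "ST2": "0110", "WT": "1100",
                  "WNT": "1001", "SNT2": "0011", "SNT1": "1010"},
}


def hw(code: str) -> int:
    return code.count("1")


def hd(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def hamming_ranges(enc):
    """Return ((min HW, max HW), (min HD, max HD)) for one encoding."""
    weights = [hw(code) for code in enc.values()]
    dists = [hd(enc[a], enc[b]) for a, b in PAIRS]
    return (min(weights), max(weights)), (min(dists), max(dists))


if __name__ == "__main__":
    for name, enc in ENCODINGS.items():
        print(name, hamming_ranges(enc))
```

The three results, ((0, 2), (1, 3)), ((1, 1), (2, 2)) and ((2, 2), (2, 2)), agree with the HW and HD rows of the table.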

4.6.2 Encoding

Assuming structural modifications have been applied to a design, there is already an existing encoding strategy that satisfies both conditions for S*FSMs. One-hot encodings, by definition, have a constant HW, specifically HW = 1. Trivially, the HD between any two distinct states is also constant, HD = 2. Unfortunately, their ease of use is quickly overshadowed by the overhead required to encode each state: one-hot encodings require n bits, where n is equal to the number of states within the FSM. This linear growth in encoding length restricts one-hot encodings to small FSMs. While other common encoding strategies (e.g., Gray, minimal binary) exist, they fail one or both of the needed S*FSM conditions in most practical examples.

While a one-hot encoding, as mentioned previously, defined by n = s and c = 1, readily satisfies these constraints, a more balanced target for secure encodings in typical S*FSMs is choosing $c = \lfloor n/2 \rfloor$. This selection, referred to as S*Opt, is summarized in Table 4.2. S*Opt maximizes the allowable state space and number of internal transitions while minimizing the encoding length. In particular, S*Opt encodings allow for at most $T_{MAX} = \lfloor n^2/4 \rfloor$ connections from any given state. In cases where S*Opt does not satisfy the $T_{MAX}$ requirement, an increase in n is required (a reduction of c alone would be counter-productive).

The main challenge in deriving a minimal secure encoding is minimizing the bits required while maximizing the number of transitions that can be made from a single state.

To formally define the problem, assume a machine with s states. Clearly, in order to satisfy the first condition of S*FSMs, all state encodings should have the same number of bits "on." Thus, given an n-bit encoding (n ≤ s), each of the s states should have c bits on and n − c bits off. The selection of n and c must satisfy Equation 4.7 to ensure that all the states can be represented uniquely. Furthermore, in order to minimize power consumption, c should satisfy Equation 4.8. Finally, in order to satisfy the second security constraint while minimizing the HD between states (i.e., HD = 2, the smallest distance possible between two distinct codes of equal weight, since such codes always differ in an even number of bits), Equation 4.10 must be satisfied: a weight-c code has exactly c × (n − c) neighbours at HD = 2, obtained by swapping one "on" bit with one "off" bit.

$$\binom{n}{c} \ge s \tag{4.7}$$

$$c \le \frac{n}{2} \tag{4.8}$$

$$\binom{n}{\lfloor n/2 \rfloor} = S_{MAX} \tag{4.9}$$

$$c \times (n - c) \ge T_{MAX} \tag{4.10}$$

n c SMAX TMAX

2 1 2 1

3 1 3 2

4 2 6 4

5 2 10 6

6 3 20 9

7 3 35 12

8 4 70 16

9 4 126 20

10 5 252 25

11 5 462 30

12 6 924 36

Table 4.2: Maximum States (eq. 4.9) and Transitions (eq. 4.10) given S*Opt(n,c).
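Table 4.2 follows directly from Equations 4.9 and 4.10 with $c = \lfloor n/2 \rfloor$, and the same formulas give a quick lower bound on the S*Opt encoding length. The Python sketch below is illustrative (function names are hypothetical). It checks only these necessary counting conditions, so an actual assignment still has to be found, for example with the SMT formulation of Section 4.7.2. Because HD is symmetric, `max_neighbours` counts the distinct states adjacent to a state in either direction.

```python
from math import comb


def s_opt(n: int):
    """(c, S_MAX, T_MAX) for an n-bit S*Opt encoding with c = floor(n/2)."""
    c = n // 2
    return c, comb(n, c), c * (n - c)


def min_sopt_bits(num_states: int, max_neighbours: int) -> int:
    """Smallest n whose S*Opt code space holds num_states codes (Eq. 4.7/4.9)
    and allows max_neighbours codes at HD = 2 from any code (Eq. 4.10)."""
    n = 1
    while True:
        _, s_max, t_max = s_opt(n)
        if s_max >= num_states and t_max >= max_neighbours:
            return n
        n += 1


if __name__ == "__main__":
    for n in range(2, 13):
        print(n, *s_opt(n))  # reproduces the rows of Table 4.2
    # Unrolled branch predictor: 6 states, at most 3 neighbours per state.
    print(min_sopt_bits(6, 3))
```

For the unrolled branch predictor this gives n = 4, consistent with the 4-bit S*Optimal column of Table 4.1.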

4.7 Implementation

In order to ground the theory explored earlier, the following section provides a brief overview of some of the unique implementation details. In combination with the source code (available at [79]), this overview should give any researcher the tools needed to create a secure FSM implementation. Since the FSM conversion process is generic, only pseudo-code is provided for it here. The SMT-solver-based solution for encodings uses a specific Python API (a C interface is also available), and as such that code is written directly in Python.

4.7.1 FSM Conversion

The FSM conversion process operates on the standard KISS file format, similar to the file appearing in Listing 4.1. The first four lines set the input width, the output width, the number of product terms and the number of states, respectively. The last line (.e) signifies the termination of the model. The remaining lines, Lines 5-9, each contain a transition product term in the form INPUT SOURCE DEST OUTPUT.

1 .i 2
2 .o 8
3 .p 5
4 .s 4
5 1- st1 st2 00000000
6 00 st2 st1 11111111
7 -1 st2 st3 11111111
8 10 st2 st2 11111111
9 0- st3 st2 00010011
10 .e

Listing 4.1: FSM in KISS File Format
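The KISS format is simple enough that the two-level structure used by the conversion process can be built with a short parser. The following Python sketch is illustrative (function and variable names are hypothetical). It keeps header directives in a dictionary, stops at .e, and stores a list of (INPUT, OUTPUT) tuples per SOURCE/DEST pair, since several product terms may share the same pair.

```python
from collections import defaultdict

KISS_EXAMPLE = """\
.i 2
.o 8
.p 5
.s 4
1- st1 st2 00000000
00 st2 st1 11111111
-1 st2 st3 11111111
10 st2 st2 11111111
0- st3 st2 00010011
.e
"""


def parse_kiss(text: str):
    """Return (header, fsm) where fsm[SOURCE][DEST] -> [(INPUT, OUTPUT), ...]."""
    header = {}
    fsm = defaultdict(dict)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == ".e":  # end of model
            break
        if fields[0].startswith("."):  # .i, .o, .p, .s (and others)
            value = fields[1] if len(fields) > 1 else None
            header[fields[0]] = int(value) if value and value.isdigit() else value
            continue
        inp, src, dst, out = fields  # INPUT SOURCE DEST OUTPUT
        fsm[src].setdefault(dst, []).append((inp, out))
    return header, dict(fsm)


if __name__ == "__main__":
    hdr, table = parse_kiss(KISS_EXAMPLE)
    print(hdr)
    print([s for s in table if s in table[s]])  # states with self-loops
```

Run on Listing 4.1, only st2 is reported as having a self-loop (the 10 st2 st2 term).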

While some details are left out of Listing 4.2 for brevity, the conversion process assumes a two-level hash data structure with SOURCE and DEST as keys and the INPUT and OUTPUT as a value tuple.

foreach SRC in HASH
  foreach DEST in HASH{SRC}
    if (SRC == DEST)
      HASH{SRC}{TMP} = HASH{SRC}{DEST}
      HASH{TMP}{SRC} = HASH{SRC}{DEST}
      REMOVE HASH{SRC}{DEST}
    end if
  end DEST foreach
end SRC foreach

Listing 4.2: Secure FSM Creation Pseudo-code
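A runnable Python counterpart of Listing 4.2 is sketched below. It operates on a two-level dictionary `fsm[SRC][DEST] -> [(INPUT, OUTPUT), ...]` and additionally performs the outgoing-edge replication of Algorithm 1, which the listing omits for brevity. The `_n` suffix used for TMP states is a hypothetical naming choice, and collisions with existing state names are not handled.

```python
import copy


def unroll_hash(fsm):
    """Return a copy of fsm with every self-loop unrolled onto a new TMP state."""
    out = copy.deepcopy(fsm)
    for src, dests in fsm.items():  # iterate over the unmodified input
        if src not in dests:
            continue
        loop_terms = out[src].pop(src)        # REMOVE HASH{SRC}{SRC}
        tmp = f"{src}_n"                      # hypothetical name for TMP
        out[src][tmp] = list(loop_terms)      # HASH{SRC}{TMP}
        out[tmp] = {src: list(loop_terms)}    # HASH{TMP}{SRC}
        for dst, terms in dests.items():      # replicate other outgoing edges
            if dst != src:
                out[tmp][dst] = list(terms)
    return out


if __name__ == "__main__":
    kiss = {  # the transitions of Listing 4.1
        "st1": {"st2": [("1-", "00000000")]},
        "st2": {"st1": [("00", "11111111")], "st3": [("-1", "11111111")],
                "st2": [("10", "11111111")]},
        "st3": {"st2": [("0-", "00010011")]},
    }
    secured = unroll_hash(kiss)
    print(sorted(secured))  # ['st1', 'st2', 'st2_n', 'st3']
```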

4.7.2 Solving the Encoding Problem with an SMT Solver

The power of an existing Satisfiability Modulo Theories (SMT) solver, Z3 [80], is harnessed to check the satisfiability of the imposed security constraints within a range of bit-encoding lengths. Z3 is an extremely efficient SMT solver that can solve mixed-form first-order formulas drawn from various background theories. Of particular interest is its ability to generate and solve models involving arbitrary bit-vectors and the manipulations and calculations across those vectors. This allows us to check whether a given set of FSM encoding constraints can be satisfied within a specific bit-length range MIN ≤ bitlength ≤ MAX (see Listing 4.3). While it is apparent, for the current objectives, that the bounds are constrained to $\lceil \log_2(N_{States}) \rceil \le bitlength \le N_{States}$, only the upper bound is defined; the lower bound serves as a cross-check, since the SMT solver should never return an encoding shorter than $\lceil \log_2(N_{States}) \rceil$ bits.

The following code iterates one bit at a time from a predefined maximum bit length down to a predefined minimum (inclusive). The solver is instantiated in Line 5, while an optional timeout parameter (in milliseconds) is set in Line 6. Lines 7-9 represent the addition of constraints, or rules, to the solver; variable definitions, if any, would typically occur before the rules are added. Finally, still within the loop structure, Line 10 checks whether the constraints are satisfiable. If they are, the model (i.e., the encoding in this case) is printed; otherwise no encoding of that bit length exists and the program exits.

1  from z3 import *
2  from math import ceil, log  # used by the constraints in Listings 4.5 and 4.6
3  import sys
4  for bits in range(MAX, MIN - 1, -1):
5      s = Solver()
6      s.set("timeout", 1000000)
7      s.add(CONSTRAINT_1)
8      ...
9      s.add(CONSTRAINT_N)
10     if s.check() == sat:
11         m = s.model()
12         print("Sat, %d," % bits, ", ".join(d.name() for d in m.decls()))
13         print("ASSIGN, %d," % bits, ", ".join(str(m[d]) for d in m.decls()))
14     else:
15         print("NotSat, %d," % bits)
16         sys.exit()
17     sys.stdout.flush()

Listing 4.3: Z3 Control Structure

Uniqueness

First and foremost, regardless of security, every state in an N-state machine must be uniquely encoded. Any bit-encoding length (bitlength) less than $\lceil \log_2(N) \rceil$ is unsatisfiable, while bit-encoding lengths greater than or equal to N are trivially satisfiable (e.g., one-hot encodings). Within Z3, we define all the states (st1 ··· stn) of a given FSM assuming a specific bitlength encoding from within the range described earlier. Shown below in Listing 4.4 is the process used to define four state bit-vectors using the Z3 Python interface (Line 3). In order to satisfy the unique encoding requirement, Line 4 adds a rule to the solver that requires each bit-vector to be distinct.

1  for bits in range(MAX, MIN - 1, -1):
2      ...
3      st1, st2, st3, stn = BitVecs('st1 st2 st3 stn', bits)
4      s.add(Distinct(st1, st2, st3, stn))
5      ...

Listing 4.4: Defining Z3 Bit Vectors (Line 3) and Adding a Uniqueness Constraint (Line 4)

Next, a set of constraints is specified for the SMT solver. In order to guarantee that the HW constraint (Listing 4.5) is satisfied for all N states (i.e., all states share an unspecified HW = c1), no more than N − 1 distinct rules are required, namely $HW(st_i) = HW(st_{i+1})$ for $1 \le i < N$. Optimally, only $\lfloor N/2 \rfloor$ checks, representing the inner nodes of a complete binary tree, are needed. The code listing below assumes four distinct states, requiring three rules:

s.add(
    # HW(STATE1) == HW(STATE2): each extracted bit is zero-extended by
    # ceil(log2(bits)) + 1 bits so the popcount sum cannot overflow.
    Sum([ZeroExt(int(ceil(log(bits)/log(2)) + 1), Extract(i, i, STATE1)) for i in range(bits)]) ==
    Sum([ZeroExt(int(ceil(log(bits)/log(2)) + 1), Extract(i, i, STATE2)) for i in range(bits)])
)
s.add(
    Sum([ZeroExt(int(ceil(log(bits)/log(2)) + 1), Extract(i, i, STATE2)) for i in range(bits)]) ==
    Sum([ZeroExt(int(ceil(log(bits)/log(2)) + 1), Extract(i, i, STATE3)) for i in range(bits)])
)
s.add(
    Sum([ZeroExt(int(ceil(log(bits)/log(2)) + 1), Extract(i, i, STATE3)) for i in range(bits)]) ==
    Sum([ZeroExt(int(ceil(log(bits)/log(2)) + 1), Extract(i, i, STATEN)) for i in range(bits)])
)

Listing 4.5: Z3 Realization of Hamming Weight Constraints

By similar logic, the HD constraint can also be generated (Listing 4.6). Only states with transitions between them should be compared; otherwise the system is over-constrained. A fully connected FSM, with N × (N − 1) transitions, reverts to a solution similar to the Hamming Weight constraint, so the lower bound on the maximum number of checks is again $\lfloor N/2 \rfloor$. Assume for the following example that STATE1 is connected to STATE2 and STATE3, while STATE2 is connected to STATEN:

s.add(
    # HD(a, b) is the popcount of a XOR b: HD(STATE1, STATE2) == HD(STATE1, STATE3)
    Sum([ZeroExt(int(ceil(log(bits)/log(2)) + 1), Extract(i, i, STATE1 ^ STATE2)) for i in range(bits)]) ==
    Sum([ZeroExt(int(ceil(log(bits)/log(2)) + 1), Extract(i, i, STATE1 ^ STATE3)) for i in range(bits)])
)
s.add(
    Sum([ZeroExt(int(ceil(log(bits)/log(2)) + 1), Extract(i, i, STATE2 ^ STATEN)) for i in range(bits)]) ==
    Sum([ZeroExt(int(ceil(log(bits)/log(2)) + 1), Extract(i, i, STATE1 ^ STATE2)) for i in range(bits)])
)

Listing 4.6: Z3 Realization of Hamming Distance Constraints
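For very small FSMs the same constraints can be explored without an SMT solver. The Python sketch below swaps Z3 for a plain backtracking search (an illustrative substitute, not the tool used in this work). It walks bit lengths upward from $\lceil \log_2 N \rceil$ (the reverse of Listing 4.3's downward loop), fixes a Hamming Weight c, and tries even Hamming Distances d, since two distinct codes of equal weight always differ in an even number of bits. It assumes at least one transition and that self-loops were already unrolled; the search is exponential and only practical for a handful of states. All function names are hypothetical.

```python
from itertools import combinations
from math import ceil, log2
from typing import Dict, Hashable, Iterable, List, Optional, Sequence, Set, Tuple


def codes_with_weight(bits: int, c: int) -> List[int]:
    """All bits-wide integers with exactly c bits set."""
    return [sum(1 << i for i in ones) for ones in combinations(range(bits), c)]


def popcount(x: int) -> int:
    return bin(x).count("1")


def assign(states: Sequence[Hashable], nbrs: Dict[Hashable, Set[Hashable]],
           codes: List[int], d: int) -> Optional[Dict[Hashable, int]]:
    """Give every state a distinct code so each coded neighbour is at HD d."""
    enc: Dict[Hashable, int] = {}

    def place(i: int) -> bool:
        if i == len(states):
            return True
        s = states[i]
        used = set(enc.values())
        for code in codes:
            if code in used:
                continue
            if all(popcount(code ^ enc[t]) == d for t in nbrs[s] if t in enc):
                enc[s] = code
                if place(i + 1):
                    return True
                del enc[s]  # backtrack
        return False

    return dict(enc) if place(0) else None


def min_secure_encoding(states: Sequence[Hashable],
                        edges: Iterable[Tuple[Hashable, Hashable]],
                        max_bits: int = 12):
    """Return (bits, c1, c2, encoding) for the shortest encoding found."""
    nbrs: Dict[Hashable, Set[Hashable]] = {s: set() for s in states}
    for a, b in edges:
        if a == b:
            raise ValueError("unroll self-loops first (Algorithm 1)")
        nbrs[a].add(b)
        nbrs[b].add(a)
    for bits in range(max(1, ceil(log2(len(states)))), max_bits + 1):
        for c in range(bits + 1):
            codes = codes_with_weight(bits, c)
            if len(codes) < len(states):
                continue  # Eq. 4.7 violated: not enough weight-c codes
            for d in range(2, 2 * min(c, bits - c) + 1, 2):
                enc = assign(states, nbrs, codes, d)
                if enc is not None:
                    return bits, c, d, enc
    return None
```

Applied to the six-state unrolled branch predictor, this search finds a 4-bit encoding with HW = 2 and HD = 2, the same parameters as the S*Optimal column of Table 4.1.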

4.8 Experimental Setup

The experimental setup for this work consisted of two main phases: a characterization phase and a security analysis phase. Preliminary characterization results were obtained using over 150 BenGen FSM benchmarks [81], followed by more robust real-world FSMs from the MCNC Benchmark Suite [82], supplemented by our own independently created benchmark suite (see Chapter 6). The second phase, security analysis, while relatively straightforward at low levels owing to the use of correlation as a metric, became increasingly complex and uninformative due to the nature of the data being used for correlation. This motivates a different mechanism to compare the security of our solutions, specifically one grounded in information theory (Mutual Information).

4.8.1 Characterization

The characterization of the S*FSM was broken down into two classes: for each benchmark there is a standard, unmodified version (FSM) as well as a structurally secure, loop-unrolled version (SFSM). A first-pass comparison between the two classes, using the state and transition requirements of both FSMs, essentially reveals a predictable state space and transition increase, proportional to the frequency of self-loops times the original number of states.

Next, in order to characterize the bit-encoding requirement, the theorem-prover-based generator was tuned to ignore all security constructs and satisfy a single constraint: each state must have a unique encoding. The theorem prover, without knowledge of a theoretical lower bound, started at a predetermined upper bound and reduced the number of bits until it could no longer satisfy the single constraint. As expected, the theorem prover returned a minimal binary encoding for both the FSM and SFSM. Using the same solver with all security constraints in place (constant Hamming Weight and Distance), the minimal secure encoding was then derived for the SFSM.

In order to move beyond theoretical requirements, each FSM and SFSM was paired with an associated encoding. These pairings were then converted and synthesized using a 90nm standard cell library in order to calculate area requirements and to run low-level Nanosim power simulations for the next phase of the experiment.

4.8.2 Security Analysis Phase

In order to evaluate security, the relationship between the power side channel and the Hamming models had to be analyzed, quantified and ranked. In previous low-level work, many authors claimed that reduced variation in the power signal should reduce correlation; since some variation still existed in the signal, the correlation metric remained defined, though its true meaning was questionable. Using correlation as a security metric involves making some strong assumptions (e.g., linearity between the side channel and the model, non-sparse data, non-uniform vectors, and that correlation implies causation). Given these assumptions, a more robust approach based on Mutual Information was used. Consider the two data sets A and B in Figure 4.5: Mutual Information quantifies how much entropy can be removed from set A by knowing set B (or, symmetrically, from set B by knowing set A).

(a) A and B share some mutual information. (b) A and B share no mutual information.

Figure 4.5: Mutual Information between two data sets A and B with entropies E(A) and E(B), respectively.

In order to evaluate the relative security of FSMs given various structures and encodings, the entropy (Eq. 4.11) of the power side channel (P) and of the attack models was computed for each uniquely encoded FSM. Additionally, the Mutual Information (Eq. 4.12) was computed for each side channel and attack model pairing for all encoding combinations. For completeness, note that the secured encoding yields attack models with zero entropy: such a constant model fails the independent random variable requirement and, as expected, its MI is 0, since MI can never exceed the entropy of either variable (Eq. 4.13).

$$E(A) = H(A) = -\sum_{a \in A} p(a) \log_2 p(a) \tag{4.11}$$

$$I(A;B) \equiv I[p(A,B)] \equiv \sum_{a,b} p(a,b) \log_2 \frac{p(a,b)}{\sum_{b} p(a,b) \sum_{a} p(a,b)} \tag{4.12}$$

$$I(A;B)\Big|_{\forall b \in B:\, b = \bar{b}} = \sum_{a,b} p(a,b) \log_2 \frac{p(a,b)}{\sum_{b} p(a,b) \sum_{a} p(a,b)} = \sum_{a,b} p(a,b) \log_2(1) = 0 \tag{4.13}$$
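Equations 4.11 and 4.12 can be estimated from paired samples by counting. The Python sketch below uses simple plug-in (frequency) estimates, which is an assumption: the text does not state which estimator was used, and continuous power samples would first need to be discretized, with a binning choice not specified here. A constant model column reproduces the zero of Eq. 4.13.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Plug-in estimate of H(A) in bits (Eq. 4.11)."""
    n = len(xs)
    return -sum((k / n) * log2(k / n) for k in Counter(xs).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(A;B) in bits (Eq. 4.12) from paired samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must be paired")
    n = len(xs)
    p_x, p_y = Counter(xs), Counter(ys)
    total = 0.0
    for (a, b), k in Counter(zip(xs, ys)).items():
        p_ab = k / n
        total += p_ab * log2(p_ab / ((p_x[a] / n) * (p_y[b] / n)))
    return total


if __name__ == "__main__":
    power = [0, 1, 0, 1]            # hypothetical binned power samples
    leaky_model = [0, 1, 0, 1]      # varies with the power samples
    constant_model = [2, 2, 2, 2]   # e.g. HW under a secure encoding
    print(mutual_information(power, leaky_model))     # 1.0 bit
    print(mutual_information(power, constant_model))  # 0.0 (Eq. 4.13)
```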

4.9 Characterization Using Theoretical Simulation

This section experimentally validates the two-part method proposed for S*FSMs by demonstrating the variability of the two Hamming models on two different FSMs. The first is the unsecured FSM shown in Figure 4.3(a), while the second is the unrolled, structurally modified FSM in Figure 4.3(b). Each FSM was encoded with a minimal encoding (BE) derived by the satisfiability solver, as well as with the secure encoding methods where they are feasible: One Hot (OH) and/or S*Opt (S*O).

The experiment targets the worst-case variability of the side channel models: if a model itself shows no variability, it is rendered useless. In order to measure the variability of both the HW and HD models, a simple event-driven FSM simulator was constructed. The high-level simulator applied random transition vectors (ranging in size from 10 to 50,000 transitions) to a given FSM. During each transition, the current state and transition path were recorded along with the computed HW and HD values. While this is a simplification with respect to a real hardware system, the goal is to determine whether a relationship exists between the two recorded events (State and Transition) and the models (HW and HD).
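The variability measurement can be approximated with a tiny random-walk simulator. This Python sketch is not the simulator used in this work. It drives the unrolled branch predictor (Figure 4.4) with random branch outcomes, records state, transition, HW and HD at each step, and reports which model values occur. It does not compute the correlation values of Table 4.3; it only shows whether a model varies at all. The transition map treats ST1 and SNT1 as the original states (an assumption), and the codes are copied from Table 4.1.

```python
import random
from typing import Dict, List, Tuple

# Unrolled branch predictor: next state for branch outcome T = 1 / T = 0.
NEXT: Dict[str, Dict[int, str]] = {
    "ST1": {1: "ST2", 0: "WT"},
    "ST2": {1: "ST1", 0: "WT"},
    "WT": {1: "ST1", 0: "WNT"},
    "WNT": {1: "WT", 0: "SNT1"},
    "SNT1": {1: "WNT", 0: "SNT2"},
    "SNT2": {1: "WNT", 0: "SNT1"},
}
BINARY = {"ST1": 0b101, "ST2": 0b100, "WT": 0b011,
          "WNT": 0b010, "SNT2": 0b001, "SNT1": 0b000}
SOPT = {"ST1": 0b0101, "ST2": 0b0110, "WT": 0b1100,
        "WNT": 0b1001, "SNT2": 0b0011, "SNT1": 0b1010}


def simulate(enc: Dict[str, int], steps: int, start: str = "ST1", seed: int = 0):
    """Random walk over NEXT; each record is (state, transition, HW, HD)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    state = start
    trace: List[Tuple[str, Tuple[str, str], int, int]] = []
    for _ in range(steps):
        nxt = NEXT[state][rng.randint(0, 1)]        # random branch outcome T
        hw = bin(enc[state]).count("1")              # HW model of current state
        hd = bin(enc[state] ^ enc[nxt]).count("1")   # HD model of the transition
        trace.append((state, (state, nxt), hw, hd))
        state = nxt
    return trace


def model_values(trace):
    """Distinct HW and HD values seen; a single value means no variability."""
    return {r[2] for r in trace}, {r[3] for r in trace}


if __name__ == "__main__":
    print("binary:", model_values(simulate(BINARY, 1000)))
    print("S*Opt: ", model_values(simulate(SOPT, 1000)))
```

Under the S*Opt codes both models collapse to the single value 2, while the binary encoding exposes several values of each.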

Table 4.3 summarizes the simulation-based maximum theoretical correlation (CrMtheory) for the standard FSM as well as the structurally secure S*FSM under three encoding schemes: BE, OH and S*O. The CrMtheory values reported here are averaged over 100 unique simulation runs at the given cycle lengths. The unique set of 100 input vectors is shared across each row to eliminate random bias.

Model      # cycles   Standard FSM     S*FSM
                      BE     OH        BE     OH     S*O
State,HW   10         0.71   -         0.45   -      -
State,HW   500        0.62   -         0.59   -      -
State,HW   5000       0.63   -         0.59   -      -
State,HW   50000      0.63   -         0.60   -      -
Tran,HD    10         0.61   0.61      0.01   -      -
Tran,HD    500        0.17   0.17      0.30   -      -
Tran,HD    5000       0.17   0.17      0.32   -      -
Tran,HD    50000      0.18   0.18      0.32   -      -

Table 4.3: Correlation (when it exists) between HW/HD Model and State/Transition for three encoding strategies on Standard and S*FSMs. A dash (-) marks cases where no correlation exists.

The results highlight, first and foremost, that a proper encoding strategy is key: FSMs, regardless of structure, are insecure when encoded using typical binary encoding schemes. Secondly, the only way to eliminate variability in both the HD and HW models is through the combined use of a secure encoding scheme (S*O or OH) and structural modification. In order to quantify the relative security, the FSMs are next fully implemented and the Mutual Information between the side channel and the models is computed.

4.10 Physical Realization of S*FSM

The practicality and strength of S*FSMs are tested and verified with gate-level, power-accurate implementations of the logic circuits to ensure that the theoretical justifications hold. While the ultimate test is with physically realized designs, this is both impractical and expensive at the current time. S*FSM validation is instead accomplished using an analog electronic circuit simulator, in order to show the relationship between the theoretical maximum correlation (rMtheory) results presented earlier and the worst-case correlation to a simulated power side channel (rSpower). For the purpose of this discussion, rSpower is computed by correlating the power side channel against two different data sources: an "oracle" (rSpower[O]) and an attacker's best-case model (rSpower[M]). The final security metric for any FSM consists of the Mutual Information between the power side channel P and each model M, or more simply MI(P, M), for all models M (HW and HD).

• rSpower[O] is computed by capturing the actual state or transition of the FSM under attack and using that information rather than an assumed, predictive model.

• rSpower[M] is computed by using the oracle state information to perfectly compute the side channel models.

4.10.1 Implemented Flow

The objective of this work is to quantify, using real-world FSM benchmarks, the effectiveness and cost of implementing S*FSMs in hardware. Our experimental platform uses a collection of over 150 FSM benchmarks, generated and acquired from the authors of BenGen [81], ranging in size from 4 to 60 states, with total transitions ranging from 8 to 216. The implemented workflow, seen in Figure 4.7, begins with the original benchmarks, covering the extreme ranges of both the state and transition space. These benchmarks are then converted to structurally secured versions with near-S*Opt encodings. Both sets of FSMs, the original BenGen benchmarks as well as the S*FSMs, are converted to Verilog, where they can be synthesized at the gate level using DC Compiler. The resulting gate-level netlist is converted to SPICE in order to obtain cycle-accurate power information using Nanosim. The Nanosim data, in conjunction with worst-case FSM oracle data, is used to measure the correlation between the Hamming models and the current FSM state or transition between states.

Figure 4.6 depicts the low level FSM synthesis ﬂow used to generate the correlation data rSpower. The bold solid path represents the complete S*FSM path, while the relaxed dashed lines represent naive FSMs (either in structure and/or encoding). Finally, the compressed dashed-lines represent FSM speciﬁc information used in conjunction with oracle data to determine attack best-case correlations.

The process begins with a high-level structural description of an FSM (a). Structurally secure implementations follow the right branch (b), at which point the flow can either rejoin the naive left branch (c), which only enables standard encoding options, or continue to secure encoding styles (d). The information from either (c) or (d) is then passed to an existing high-level to low-level compiler (e). The resulting low-level circuit is simulated using a low-level simulator (f), which uses stimulus data (generated during the theoretical analysis) to create power-accurate traces. The stimulus generator (h) also computes the needed oracle and best-case attack models, which are all used in the side channel leakage analysis (g) to determine the maximum rSpower.

4.11 Results

Results from our experiment fall into two general classes. The first set deals with the characterization results, which contain the theoretical costs based entirely on the number of FSM states and transitions, as well as the implementation cost defined by the synthesized hardware layout of the original and secured FSMs using the Synopsys 90nm standard cell library. The second set of results focuses on security, in terms of the shared information afforded by the particular FSM+encoding strategy selected.

[Figure 4.6 diagram: FSM HLD (a); S*FSM Re-Structure (b); Std. Encoding (c); (d); HDL to LL, DC Compiler (e); LL Simulation, Spice (f); Stimulus Generator (h); Oracle/Models, Side Channel Leakage Analysis, Matlab, Perl (g)]

Figure 4.6: FSM flow to test and verify theoretical results using gate-level realization of FSMs and S*FSMs using multiple encodings.

4.11.1 Characterization Results

The characterization results are broken down into the following four parts. The first covers the physical topological changes that occur due to FSM restructuring; similar information is consolidated and more easily digested in the second part, which details the state space and transition requirements of the original and restructured FSMs. The third part reports the effects of encoding strategy on the bit-length requirements, while the final part, on layout space, takes into consideration state space and encoding requirements in synthesized circuits.

Figure 4.7: Experimental Flow to compare effectiveness and cost of existing FSM Benchmarks.

Topology Changes Due to Restructuring

In order to visualize the effect of the secure FSM strategy on several selected benchmarks, representative benchmarks are included for reference in Figures 4.8 and 4.9 (BENGEN) and Figures 4.10 and 4.11 (MCNC). Each figure shows the original version, with the secured version below it; newly added states are shaded and named after the state they duplicate, followed by "_n". Pictorial representations of the remaining benchmarks are omitted for brevity but can easily be generated with the custom tools referenced in Chapter 6.


Figure 4.8: FSM 137 of the BENGEN Benchmark Suite. (a) Original FSM requiring 4 states and 11 transitions, (b) Structurally Secured version requiring 7 states and 20 transitions.

State Space Requirements

The state space increase of the S*FSMs, seen in Table 4.4, ranged from 14% to the maximum possible increase of 100% (which occurs when every state contains a self-loop), with an average increase of 71.6%. Transition counts, shown in Table 4.5, increase drastically for most non-trivial FSMs, ranging from a 25% to a 100% increase. Though the overall average increase is 77%, more complex FSMs had an average increase closer to 93%.

While the BENGEN and our custom FSM benchmarks are purely synthetic, the MCNC benchmark suite represents a more realistic picture of typical FSMs, and therefore of the true impact on state space requirements (Tables 4.6 and 4.7). As such, a large majority of the following results are dominated by the MCNC benchmarks.

Figure 4.9: FSM 86 of the BENGEN Benchmark Suite. (a) Original FSM requiring 5 states and 8 transitions, (b) Structurally Secured version requiring 7 states and 12 transitions.

FSM    Original FSM    S*FSM    % Increase
137    4               7        75
94     7               8        14
86     5               7        40
108    60              117      95
60     57              114      100
147    61              112      84
146    60              116      93
avg    36.3            68.7     71.6

Table 4.4: State requirements of seven unique BENGEN FSMs. S*FSMs require on average 71% more states than their standard-form FSMs.
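The percentages in Table 4.4 can be recomputed from its state counts. In the short Python sketch below (names are illustrative), the reported 71.6% average turns out to be the mean of the per-FSM percentages rather than the increase between the two column averages, which would give about 89%.

```python
# (original states, S*FSM states) per BENGEN FSM, copied from Table 4.4.
TABLE_4_4 = {137: (4, 7), 94: (7, 8), 86: (5, 7), 108: (60, 117),
             60: (57, 114), 147: (61, 112), 146: (60, 116)}


def pct_increase(original: float, secured: float) -> float:
    return 100.0 * (secured - original) / original


per_fsm = {fsm: pct_increase(o, s) for fsm, (o, s) in TABLE_4_4.items()}
mean_of_pcts = sum(per_fsm.values()) / len(per_fsm)

orig_avg = sum(o for o, _ in TABLE_4_4.values()) / len(TABLE_4_4)
sec_avg = sum(s for _, s in TABLE_4_4.values()) / len(TABLE_4_4)
pct_of_means = pct_increase(orig_avg, sec_avg)

if __name__ == "__main__":
    print({fsm: round(p) for fsm, p in per_fsm.items()})
    print(round(mean_of_pcts, 1), round(pct_of_means, 1))  # 71.6 89.4
```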


Figure 4.10: BBARA of the MCNC Benchmark Suite. (a) Original FSM requiring 10 states and 37 transitions, (b) Structurally Secured version requiring 20 states and 74 transitions.

Bit Length Encoding Requirements

In order to demonstrate the effectiveness of the S*Opt encoding, the number of bits needed for the secure encoding, along with the increase over the original FSMs' binary encoding, is presented in Table 4.8. Note that on average the increase is around 79%, though for all but the smallest FSM the average is a 67% increase.

As with the state space and transition requirements, the MCNC benchmarks provide a more accurate baseline for the bit-length encoding impact in real FSMs (see Table 4.9 and Figure 4.13). The difference between the benchmark suites is less noticeable simply due to the logarithmic nature of bit encoding requirements.

Figure 4.11: OPUS of the MCNC Benchmark Suite. (a) Original FSM requiring 10 states and 21 transitions, (b) Structurally Secured version requiring 16 states and 38 transitions.

Layout Space Requirements

To validate the S*FSM approach, the original and secure designs were automatically synthesized without any optimizations using the Synopsys 90nm standard cell library. The resulting layout areas are summarized in Table 4.10. Note that while the increase ranged from 50% to 160% (the latter for the smallest circuit, due to overhead), the average increase, even without optimizations, is 4% smaller than that of 1-to-1 low-level duplication methods. Extending the analysis to include a larger subset (n = 25) of the original benchmarks shows an average total area increase closer to 75%.

FSM    Original FSM    S*FSM    % Change
137    11              20       82
94     8               10       25
86     8               12       50
108    193             380      97
60     211             422      100
147    202             384      90
146    202             394      95
avg    119             231      77.0

Table 4.5: Transition requirements of seven unique BENGEN FSMs. S*FSMs require on average 77% more transitions than their standard-form FSMs.

As with the previous metric - the MCNC benchmarks provide a more accurate baseline for layout area in real FSMs (See Table 4.11 and Figure 4.14). Overall increases in area, when including small FSM outliers, tend toward 100% increase - without the outliers the average tends closer to the mid 60% range.

4.11.2 Security Results

Before looking at all of the security results in aggregate, the results for a single benchmark are presented for completeness. The following progression will be used for both the individual benchmark and those in aggregate: first, the information contained in the three primary information sources will be presented (as Entropy), and second, the relative overlap in information (as Mutual Information) will be discussed. For each benchmark, three fully realized, instantiated, and simulated circuits, as well as three corresponding FSM oracles, were fed the exact same input vectors (5K inputs). The three instantiated circuits represent the following three FSM+encoding pairs: the base FSM with binary encoding (FSM), the restructured FSM with binary encoding (FSM S or SFSM) and the restructured FSM with secure encoding (SOPT FSM S or SFSM HW+HD).

FSM         Original FSM    S*FSM    % Increase
bbara       10              20       100
bbsse       16              23       44
ex1         21              37       76
modulo12    12              24       100
opus        11              17       55
planet      48              49       2
sand        32              62       94
scf         122             122      0
sse         16              23       44
styr        30              49       63
avg         31.8            42.6     58

Table 4.6: State requirements of ten unique MCNC FSMs. The percentage of self-loops (and therefore new states) ranges from 0% to 100%, with an average of 58%, about 10% fewer than the synthetic benchmarks in BENGEN.

Figure 4.15 shows the amount of information contained within the Power Side Channel as well as the two Hamming Models for the three versions of the BBARA FSMs.

FSM (transitions)   Original FSM   S*FSM   % Increase
bbara               60             120     100
bbsse               56             88      57
ex1                 136            263     93
modulo12            24             48      100
opus                22             39      77
planet              115            119     3
sand                184            363     97
scf                 166            166     0
sse                 56             88      57
styr                166            319     92
avg                 98.5           161.3   68

Table 4.7: Transition requirements of ten unique MCNC FSMs, with an average increase of 68%, roughly 10% less than the synthetic benchmarks.

First, note that there is a reduction in the Entropy of the power side channel even though that was not the main objective of this work. The objective, which is clearly shown in the restructured FSM with secure encoding, is to eliminate information from the attack models.

As expected, this information is clearly 0 for both Hamming Models. The figure also shows an increase in power entropy between the binary-encoded base and restructured FSMs, which is directly proportional to the increase in state space.
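To make the zero-entropy result concrete, here is a minimal sketch of the plug-in Shannon entropy of a Hamming Weight model trace. The state trace is hypothetical, and the codes are borrowed from the three-state example of Figure 5.1 (a binary code with varying HW and a one-hot code with HW fixed at 1); this is not the BBARA data or the dissertation's measurement flow.

```python
from collections import Counter
from math import log2

def entropy(samples) -> float:
    """Plug-in Shannon entropy, in bits, of a discrete sequence."""
    n = len(samples)
    return sum((c / n) * log2(n / c) for c in Counter(samples).values())

def hw(code: int) -> int:
    return bin(code).count("1")

# Hypothetical state trace and two illustrative 3-state encodings.
trace = ["A", "B", "B", "C", "A", "B", "C", "A"]
binary = {"A": 0b01, "B": 0b11, "C": 0b10}        # HW varies: 1, 2, 1
const_hw = {"A": 0b001, "B": 0b010, "C": 0b100}   # HW fixed at 1

h_bin = entropy([hw(binary[s]) for s in trace])
h_const = entropy([hw(const_hw[s]) for s in trace])
print(f"H(HW), binary encoding:      {h_bin:.3f} bits")
print(f"H(HW), constant-HW encoding: {h_const:.3f} bits")
```

A constant model takes a single value, so its entropy is zero no matter which states are visited.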

The Mutual Information results for the BBARA FSMs are found in Figure 4.16. Validating the theoretical results, the only way to eliminate information leakage between the physical side channel and both Hamming models is through use of the SOPT encoding on the restructured BBARA FSM.

Figure 4.12: Relative change in State and Transition counts for 10 MCNC benchmarks. Note the average state increase of 58% and transition count increase of 68%, and that individual benchmarks, excluding the 100% outliers, exhibit roughly an 18% differential.
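The mutual information figures rely on an MI estimate between power and model values. The estimator is not spelled out here, so the following is only a sketch assuming discretized (binned) power samples and the plug-in identity I(X;Y) = H(X) + H(Y) - H(X,Y); all sample values are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(samples) -> float:
    """Plug-in Shannon entropy, in bits, of a discrete sequence."""
    n = len(samples)
    return sum((c / n) * log2(n / c) for c in Counter(samples).values())

def mutual_information(xs, ys) -> float:
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) for paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Hypothetical binned power samples for one 8-cycle trace.
power = [3, 5, 5, 2, 3, 5, 2, 3]
hw_model_binary = [1, 2, 2, 1, 1, 2, 1, 1]   # model value varies with the state
hw_model_secure = [1, 1, 1, 1, 1, 1, 1, 1]   # constant model, as under SOPT

print(f"MI(power, HW), binary: {mutual_information(power, hw_model_binary):.3f} bits")
print(f"MI(power, HW), secure: {mutual_information(power, hw_model_secure):.3f} bits")
```

When the model is constant, H(Y) = 0 and H(X,Y) = H(X), so the MI vanishes regardless of the power samples; this mirrors the MI = 0 result for the SOPT-encoded restructured FSM.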

Presented in Figure 4.17 are the results for diverse MCNC benchmarks {styr, sand, ex1, bbsse, bbara, sse}. With the exception of styr and ex1, the entropy of the power is reduced, a trend that, while not targeted, is repeated in our validation benchmarks.

The mutual information between the power side channel and Hamming Weight in Figure 4.18 re-emphasizes that in almost every case, restructuring and using a binary encoding is much worse than simply using a binary encoding. Restructuring almost always increases the number of states; in doing so, the binary encoding has greater variability, leading to higher entropy and higher overlap with the power side channel. As expected, the power side channel of the restructured FSM with SOPT does not have any mutual information with the HW model.

FSM (bits needed)   BE Orig   BE S*FSM   Secure S*FSM   Overall % Increase
137                 2         3          5              150
94                  3         3          5              66.7
86                  3         3          5              66.7
108                 6         7          10             66.7
60                  6         7          10             66.7
147                 6         7          10             66.7
146                 6         7          10             66.7
avg                 4.6       5.3        7.9            78.6

Table 4.8: Bits needed to encode the FSM using a binary encoding, as well as to encode the SFSM using standard binary and S*Opt encodings.

The mutual information between the power side channel and Hamming Distance in Figure 4.19 behaves unlike the relationship with the Hamming Weight. Restructuring removes self-loops; as such, the Hamming Distances are normally distributed rather than bi-modal (with one mode near 0 and another at some other value). The resulting change in MI between baseline and restructured FSMs is currently less predictable, though likely related to the number of connections between states. As expected, the power side channel of the restructured FSM with SOPT does not have any mutual information with the HD model.

4.12 Implications

With the increased focus on creating electronic devices that are resistant to physical attack, there is a need to protect more than just cryptographic primitives. By focusing on securing Finite State Machines at high levels of abstraction, and showing that the hardware costs, even without optimization, are in line with low-level solutions, we are able to demonstrate the feasibility and practicality of S*FSMs for use by any designer.

FSM (bits needed)   BE Orig   BE S*FSM   % Increase   Secure S*FSM   Overall % Increase
modulo12            4         5          25           7              75
opus                4         5          25           7              75
bbara               4         5          25           8              100
bbsse               4         5          25           8              100
sse                 4         5          25           8              100
sand                5         6          20           8              60
planet              6         6          0            8              33
ex1                 5         6          20           9              80
styr                5         6          20           9              80
scf                 7         7          0            9              29
avg                 4.8       5.6        19           8.1            73

Table 4.9: Bits needed to encode the FSM using a binary encoding, as well as to encode the SFSM using standard binary and S*Opt encodings.

While the results presented cover a subset of the full FSM benchmarks, our existing methods have been applied to the remaining subset with similar results. Our future direction, in addition to validating the efficacy of our method, is in enabling select optimizations within the standard synthesis flow that target the combinational logic without optimizing the FSM structure. Additionally, with synthesized benchmarks, we are interested in more robust analysis that will quantify the amount of information leaked using various encodings and structural modification strategies. In a similar line, the next chapter investigates the effects of low-power encoding strategies in conjunction with the S*FSM security constraints, with the aim of providing low-power, tunable security encodings.

Figure 4.13: Bit requirements for MCNC Benchmarks with the standard FSM and Binary Encoding (BE) and the restructured FSM (Restruc. FSM) with both Binary (BE) and Optimal Secure (S*OPT) encodings. The increase in nominal/minimal-length bit encoding is approximately 20%, while going from the original FSM to the secure one requires a 75% increase in bitlength.

Our results, based on theoretical estimates and synthesized hardware layouts, show the feasibility of implementing S*FSMs using high-level methods. The increased state space (approx. 72%), the associated increase in bits needed for secure encodings (approx. 79%), and the increased transition space (approx. 77%) yield an increase in physical layout area (approx. 96%) that is comparable to some low-level techniques, without compromising downstream design efforts.

Finally, the theoretical results are validated using Mutual Information, indicating that restructuring an FSM is not sufficient to prevent information leakage; rather, in many scenarios it increases the information overlap. Presently, the only way to eliminate the Mutual Information between the power side channel and both common Hamming Models is to apply SOPT encodings. This clearly motivates the need for solutions that provide more flexibility to designers.

FSM   Orig (nm²)   S*FSM (nm²)   % Increase
137   407          1058          160
94    470          797           70
86    431          775           80
108   13707        28062         105
60    59517        88445         49
147   13868        29089         110
146   14148        28477         101
avg   14649.7      25243.3       96.43

Table 4.10: Layout Area Required for subset of BenGen Benchmarks using SAED 90nm Standard Cells.

FSM      Orig (nm²)   S*FSM (nm²)   % Increase
bbara    810          2498          208.3
bbsse    1430         2651          85.3
sse      3779         7372          95.0
sand     6007         9657          60.8
ex1      1387         2651          91.1
styr     4014         7699          91.8
avg      2905         5421          105.4

Table 4.11: Layout Area Required for subset of MCNC Benchmarks using SAED 90nm Standard Cells. Note that after removing the bbara outlier, the average increase is closer to 85%.
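The per-benchmark percentages and the outlier remark in the Table 4.11 caption follow from simple arithmetic on the tabulated areas. A quick check (variable names are ours):

```python
# (original, S*FSM) layout areas from Table 4.11, in the table's units
AREAS = {
    "bbara": (810, 2498), "bbsse": (1430, 2651), "sse": (3779, 7372),
    "sand": (6007, 9657), "ex1": (1387, 2651), "styr": (4014, 7699),
}

def pct_increase(orig: float, new: float) -> float:
    return 100.0 * (new - orig) / orig

increase = {name: pct_increase(*pair) for name, pair in AREAS.items()}
avg_all = sum(increase.values()) / len(increase)
rest = [v for name, v in increase.items() if name != "bbara"]
avg_without_bbara = sum(rest) / len(rest)

print(f"average increase, all six:       {avg_all:.1f}%")
print(f"average increase, without bbara: {avg_without_bbara:.1f}%")
```

Note that the avg row averages the per-benchmark percentages; the ratio of the average areas (5421/2905) would instead give an increase of about 87%.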

Figure 4.14: Layout Area Required for subset of MCNC Benchmarks using SAED 90nm Standard Cells.

Figure 4.15: Entropy for the MCNC BBARA Benchmark.

Figure 4.16: Mutual Information between Power and Hamming Models for MCNC's BBARA Benchmark. Note the increased MI for the restructured machine with binary encoding.

Figure 4.17: Current Entropy for MCNC's Benchmarks, sorted by the change in entropy between FSM and SFSM+SOPT.

Figure 4.18: Mutual Information between Power Side Channel and Hamming Weight Model for MCNC's Benchmarks (SFSM+SOPT: MI=0).

Figure 4.19: Mutual Information between Power Side Channel and Hamming Distance Model for all MCNC's Benchmarks (SFSM+SOPT: MI=0).

Chapter 5

SFSMs with Power Constraints

5.1 Motivation

As silicon-based technology feature sizes continue to decrease and designs remain susceptible to novel attacks, designers face competing goals when creating secure, low-power integrated circuits (ICs). Often, low-power designs rely on heavy minimization and optimization procedures, while many secure designs use low-level duplication mechanisms to thwart attacks. An area that requires special attention, and is crucial in both realms, is the power consumption profile of Finite State Machines (FSMs). This chapter specifically addresses the key concern of creating secure, low-power FSM encodings. It proposes a flexible, secure encoding strategy which, in conjunction with security-based structural modifications, can provide low-power security solutions against DPA side channel attacks. The secure encoding strategy includes methods that relax and tighten the original constraints in order to provide varying levels of protection that approach traditional low-power encoding methods.


5.2 Introduction

In the era of ubiquitous computing, with ICs in everything from identification badges and pacemakers to UAVs and sensor arrays, the requirements and constraints are no longer focused solely on transistor counts, but also on the minimization of power consumption. While not immediately apparent, this minimization with respect to power can exacerbate an underlying security vulnerability present in many ICs. IC security, long considered an algorithmic-level and physical packaging problem, is gaining momentum as an implementation problem due to continued advances in side channel based attacks [20,62].

While many factors contribute to power consumption and side channel based information leakage, a prominent intersection between the two design metrics exists in the instantaneous, data-dependent current profile of a device. The goal of low-power IC solutions is to reduce the current profile during operation (typically by making the common cases power-efficient), while the goal in side channel leakage mitigation is to create a data-independent current profile (making all cases equal). This work targets the creation of secure circuits using high-level methods that can also minimize the current of sequential circuits in the form of Finite State Machines (FSMs). This work showcases a family of high-level solutions, spanning structural and encoding design spaces, that can be used to apply varying levels of protection while maintaining consideration for overall power consumption.

While secure state encodings are a new design parameter, extensive work has already been done in the area of low-power/power-aware state encodings [83–88]. In particular, most solutions focus on the minimization of switching activity (by reducing the Hamming Distance between connected states) using a variety of techniques ranging from genetic local search, to integer linear programming, to SAT-based algorithms. This work builds upon some of these methods, specifically those that target peak current minimization (Npeak) as their objective [87, 88].

In this chapter, we extend the set of high-level solutions, introduced in the previous chapter, for reducing data-dependent current variations during FSM state transitions. This work provides methods to relax the security constraints, enabling FSM encodings that approach existing low-power solutions. The remainder of this chapter is organized as follows: Sections 5.3 and 5.4 summarize the existing solution as well as the proposed extension; Section 5.5 describes the solution in the context of the Z3 SMT solver; Section 5.6 provides results using the MCNC benchmark suite [82]; and finally, Section 5.7 provides the overall implications of this approach.

5.3 Existing Solution

The existing high-level solution targets the information leakage from within the models M, making every possible model value identical ($M_i = M$) and thereby rendering the correlation undefined, regardless of variations in the power side channel sc. When mapping the generic high-level solution onto the FSM problem space, two generic constraints are formulated that guarantee side channel protection against common Hamming model predictors.

The first set of constraints is derived using the HW of each FSM state's encoding; specifically, the HW must be constant across all states S (Eqs. 5.1-5.2).

$$\sum_{b} s_i[b] = c_1 \quad \forall i \mid s_i \in S \tag{5.1}$$

$$HW(s) = c_1 \quad \forall s \in S \tag{5.2}$$
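Eq. 5.2 also implies a simple counting bound on the encoding width: with b bits only C(b, w) distinct codes have weight w, so |S| states need a width b whose largest weight class, C(b, floor(b/2)), holds at least |S| codes. The sketch below (our illustration, not part of the S*FSM flow) computes that bound; it ignores the HD constraint, so it is only a lower bound on the secure width. The sample rows pair S*FSM state counts from Table 4.6 with S*Opt widths from Table 4.9.

```python
from math import comb

def min_constant_hw_bits(num_states: int) -> int:
    """Smallest width b such that some weight class C(b, w) holds num_states codes.

    Only Eqs. 5.1-5.2 (constant HW) are considered; the HD constraint is ignored.
    """
    b = 1
    while comb(b, b // 2) < num_states:   # C(b, floor(b/2)) is the largest class
        b += 1
    return b

# (S*FSM states from Table 4.6, S*Opt secure bits from Table 4.9)
SFSM = {"bbara": (20, 8), "modulo12": (24, 7), "planet": (49, 8), "scf": (122, 9)}
for name, (states, secure_bits) in SFSM.items():
    print(f"{name:9s} HW-only bound {min_constant_hw_bits(states)}, S*Opt {secure_bits}")
```

For modulo12, planet, and scf the bound coincides with the S*Opt width, while bbara needs 8 bits against a bound of 6, presumably because the HD constraint forces additional bits.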

The second set of constraints, similarly derived, requires that the HD remain constant between transitioning states $s_i \to s_j$ (Eqs. 5.3-5.4).

$$\sum_{b} \left(s_i[b] \oplus s_j[b]\right) = c_2 \quad \forall i, j \mid s_i, s_j \in S,\ \exists\, s_i \to s_j \tag{5.3}$$

$$HD(s_i, s_j) = c_2 \quad \forall s_i, s_j \mid \exists\, s_i \to s_j \tag{5.4}$$

Based on the encoding constraint imposed by Eq. 5.4, it is crucial to note that should a connection exist between a state and itself, i = j (e.g. B → B in Fig. 5.1(a)), the resulting HD is $HD(s_i, s_i) = HW(s_i \oplus s_i) = 0$. In all but trivial, single-state FSMs this will break the imposed HD constraint, since any other transition would be non-zero ($HD_{i \neq j}(s_i, s_j) \neq 0$), thus increasing the correlation between M and sc. This motivates the need for structural modifications to the FSM in order to guarantee side channel security; the result is a Structurally Secure FSM (S*FSM) [61]. The process of creating an S*FSM is the inverse of the state collapsing and minimization used in traditional FSM synthesis. Each self-loop requires the creation of a duplicate state, with duplicate outbound transitions, as well as two transitions between the original and duplicate state equivalent to the original self-loop condition. Figure 5.2(a), in the following example, is the functionally equivalent S*FSM of the FSM in Figure 5.1(a).
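The self-loop duplication step described above can be sketched as follows. This is an illustrative reconstruction from the prose, not the dissertation's tool: transitions are (source, destination, condition) tuples with opaque, hypothetical condition labels, and the duplicate of state s is named s + "n" (mirroring Bn in Figure 5.2); both conventions are assumptions.

```python
def restructure(transitions):
    """Return a self-loop-free transition list (a sketch of the S*FSM step)."""
    loops = {}                                    # state -> conditions of its self-loops
    for src, dst, cond in transitions:
        if src == dst:
            loops.setdefault(src, []).append(cond)

    result = []
    for src, dst, cond in transitions:
        if src == dst:
            continue                              # replaced by the loop pair below
        result.append((src, dst, cond))
        if src in loops:
            result.append((src + "n", dst, cond))  # duplicate inherits outbound edges
    for state, conds in loops.items():
        for cond in conds:
            result.append((state, state + "n", cond))  # original -> duplicate
            result.append((state + "n", state, cond))  # duplicate -> original
    return result

# Figure 5.1(a) with hypothetical condition labels x, y, z, w.
fsm = [("A", "B", "x"), ("B", "B", "y"), ("B", "C", "z"), ("C", "A", "w")]
for edge in restructure(fsm):
    print(edge)
```

Applied to Figure 5.1(a), this yields the four states and six transitions of Figure 5.2(a); transitions into a looping state are left pointing at the original state.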

Consider the three-state FSM in Figure 5.1(a) under two different encoding schemes: binary and one-hot (OH). Figure 5.1(b) shows the impact of the encoding strategy on the HW model and the potential for correlation between the model and the underlying state. Clearly, there is a perfect correlation between unique encodings and states, and a potential for correlation when the FSM is encoded using a binary code. Consider a sequence of HWs: if the binary encoding is used, state B (HW=2) can be used to reconstruct the entire sequence. Due to its constant nature, correlation to the underlying state using the OH encoding is not possible. Figure 5.1(c) shows a similar evaluation between the FSM transition space and the HD model. Unlike the HW scenario, both encodings show that correlation is possible (due to the self-loop, HD=0). In this particular FSM, a single occurrence of the self-loop would be enough to reconstruct the entire set of transitions.

(a) A three-state FSM with a self-loop demonstrates the need for structural modification as well as constrained state encodings.

State         BIN   BIN:HW   OH    OH:HW
A             01    1        001   1
B             11    2        010   1
C             10    1        100   1
Correlation   1     exists   1     undef

(b) Potential correlation between states and the HW model with two encodings: binary (BIN) and one-hot (OH).

Transition    BIN XOR   BIN:HD   OH XOR   OH:HD
A,B           10        1        011      2
B,C           01        1        110      2
B,B           00        0        000      0
C,A           11        2        101      2
Correlation   1         exists   1        exists

(c) Potential correlation between states and the HD model with two encodings: binary (BIN) and one-hot (OH).

Figure 5.1: A three-state FSM with potential correlation between HW and HD models when considering two different encodings.

Figures 5.2(b) and 5.2(c) illustrate the importance of encoding selection for FSMs, even if they are structurally modified S*FSMs. While an arbitrary-length binary encoding could satisfy the HW constraint imposed earlier, a minimal binary encoding cannot, allowing some potential for correlation between the HW and the underlying states. Similarly, no guarantee can be made about the HD constraint for minimal binary encodings. Finally, while the OH encoding solution presented in Figure 5.2 is capable of protecting any S*FSM, the required encoding bitlength is unjustified for all but small S*FSMs.

Clearly, given an original FSM (O) and its structurally modified counterpart (S), various encoding options exist: Binary (BIN), OH, and an optimal bitlength (OPT). A summary of the potential variability is given in Table 5.1.

FSM.Enc   HW      HD
O.BIN     Var     Var
O.OH      Const   Var
S.BIN     Var     Var
SS.OH     Const   Const
SS.OPT    Const   Const

Table 5.1: Variability of the HW and HD models given an FSM type (Original, Structurally Modified) and Encoding (Binary, OH, Bitlength Optimal). Note: secured FSMs are prefixed with SS.
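The first four rows of Table 5.1 can be reproduced mechanically for the small example of Figures 5.1 and 5.2: collect the HW values over the states and the HD values over the transitions, and call a model constant when it takes a single value. A minimal sketch follows; the SS.OPT row is omitted because no S*Opt encoding is listed for this example, and the state name Bn stands for the duplicate state of Figure 5.2.

```python
def hw(code: int) -> int:
    return bin(code).count("1")

def label(values) -> str:
    return "Const" if len(set(values)) == 1 else "Var"

def classify(encoding, transitions):
    """(HW variability over states, HD variability over transitions)."""
    hws = [hw(code) for code in encoding.values()]
    hds = [hw(encoding[a] ^ encoding[b]) for a, b in transitions]
    return label(hws), label(hds)

orig = [("A", "B"), ("B", "B"), ("B", "C"), ("C", "A")]                            # Fig. 5.1(a)
secured = [("A", "B"), ("B", "C"), ("B", "Bn"), ("Bn", "B"), ("Bn", "C"), ("C", "A")]  # Fig. 5.2(a)

ROWS = {
    "O.BIN": ({"A": 0b01, "B": 0b11, "C": 0b10}, orig),
    "O.OH": ({"A": 0b001, "B": 0b010, "C": 0b100}, orig),
    "S.BIN": ({"A": 0b00, "B": 0b01, "Bn": 0b11, "C": 0b10}, secured),
    "SS.OH": ({"A": 0b0001, "B": 0b0010, "Bn": 0b0100, "C": 0b1000}, secured),
}
for name, (encoding, transitions) in ROWS.items():
    print(name, *classify(encoding, transitions))
```

O.OH is constant in HW but variable in HD only because of the B → B self-loop (HD 0), which is exactly what restructuring removes.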

5.4 Proposed Extension

In order to provide designers flexibility and low-power options in their designs, we provide two methods which, when used in conjunction, mimic existing low-power solutions.

The first method, which increases flexibility at a potentially significant cost to security, involves reverting back to FSMs that contain self-loops. In certain scenarios, when many states have self-loops, detecting a self-loop does not provide as much information if other transitions are not identifiable. In order to maintain transition masking, the original HD constraint must be relaxed slightly (HD Relaxed, HDR), as shown in Equation 5.5. This modified HDR constraint still forces all non-looping transitions to have a constant HD but, unlike the constraint in Equation 5.4, is achievable in non-restructured FSMs.

(a) Structurally modified, functionally equivalent S*FSM of Fig. 5.1(a) with no self-loops.

State         BIN   BIN:HW   OH     OH:HW
A             00    0        0001   1
B             01    1        0010   1
Bn            11    2        0100   1
C             10    1        1000   1
Correlation   1     exists   1      undef

(b) Potential correlation between states and the HW model with two encodings: binary (BIN) and one-hot (OH).

Transition    BIN XOR   BIN:HD   OH XOR   OH:HD
A,B           01        1        0011     2
B,C           11        2        1010     2
B,Bn          10        1        0110     2
Bn,B          10        1        0110     2
Bn,C          01        1        1100     2
C,A           10        1        1001     2
Correlation   exists    weak     exists   undef

(c) Potential correlation between states and the HD model with two encodings: binary (BIN) and one-hot (OH).

Figure 5.2: An S*FSM, functionally equivalent to Figure 5.1(a), including the potential for correlation between HW and HD models with two different encodings.

The second method further restricts the original HW and HD/HDR constraints to minimize the peak power consumed, in order to satisfy low-power design objectives. Following previously established work [87, 88], this translates to minimizing the maximum number of 0 → 1 and 1 → 0 transitions (Npeak). Equations 5.6 and 5.7 formally define Npeak with unique technology-dependent weighting factors w1, w2 and with identical weights, respectively. While previous works were also concerned with switching activity [88] (Eq. 5.8), our current discussion relaxes this constraint in favor of our core security objectives. Minimization of Npeak is accomplished by forcing additional restrictions on HD/HDR, mainly minimizing c2 in Equations 5.4 and 5.5, respectively.

$$HD_R = HD(s_i, s_j) = c_2 \quad \forall s_i, s_j \mid \exists\, s_i \to s_j,\ i \neq j \tag{5.5}$$

$$N_{peak} = \max\{w_1 \cdot N_{0\to1},\ w_2 \cdot N_{1\to0}\} \tag{5.6}$$

$$N_{peak} = \max\{N_{0\to1},\ N_{1\to0}\} \tag{5.7}$$

$$SW_{tot} = \sum_{s_i \to s_j} HD(s_i, s_j) \cdot p_{i,j} \tag{5.8}$$

If the primary HW and HD constraints are maintained, then for a transition between two states s1 → s2, every 0 → 1 bit flip must be matched by a 1 → 0 flip (and vice versa), since the HW cannot change. Thus $N_{peak}^{secure} = c_2/2$, as seen in Eq. 5.9. When guaranteeing either the HD or HDR constraint, $N_{peak}^{secure}$ is equal to half of the HD between any two connected states, so minimizing the HD/HDR constraint constant c2 also minimizes $N_{peak}^{secure}$. For example, a one-hot encoding will have c2 = 2 and $N_{peak}^{secure} = 1$, the lower bound of Npeak. While not formally expressed, similar constraint restrictions can be placed on the HW constant c1 in order to mitigate static power consumption.

$$N_{peak}^{secure} = N_{0\to1} = N_{1\to0} = \frac{c_2}{2} \tag{5.9}$$
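Eqs. 5.6-5.9 can be evaluated directly on an encoded transition list. A minimal sketch, assuming the peak is taken as the worst case over all FSM transitions and, for Eq. 5.8, hypothetical uniform transition probabilities p_ij; the function names are ours.

```python
def hw(code: int) -> int:
    return bin(code).count("1")

def flips(src: int, dst: int) -> tuple[int, int]:
    """(N_0->1, N_1->0) for one state transition src -> dst."""
    changed = src ^ dst
    return hw(changed & dst), hw(changed & src)

def n_peak(encoding, transitions, w1=1.0, w2=1.0):
    """Eq. 5.6 (Eq. 5.7 when w1 = w2 = 1), worst case over all transitions."""
    return max(max(w1 * up, w2 * down)
               for up, down in (flips(encoding[a], encoding[b]) for a, b in transitions))

def sw_tot(encoding, probs):
    """Eq. 5.8: probability-weighted HD; probs maps (src, dst) -> p_ij."""
    return sum(hw(encoding[a] ^ encoding[b]) * p for (a, b), p in probs.items())

# S*FSM of Figure 5.2(a) with its one-hot (SS.OH) encoding: HW = 1, c2 = 2.
edges = [("A", "B"), ("B", "C"), ("B", "Bn"), ("Bn", "B"), ("Bn", "C"), ("C", "A")]
one_hot = {"A": 0b0001, "B": 0b0010, "Bn": 0b0100, "C": 0b1000}

print([flips(one_hot[a], one_hot[b]) for a, b in edges])   # each transition is (1, 1)
print(n_peak(one_hot, edges))                              # c2 / 2
print(sw_tot(one_hot, {e: 1 / len(edges) for e in edges}))  # assumed uniform p_ij
```

Under a non-constant HW code the two counts need not balance: for example, the binary transition 00 → 01 gives (1, 0).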

Both of these methods result in new families of encoding strategies, of which two subsets will be explored here: original FSMs with HDR, and S*FSMs with power-aware constraint modifications (CHD and CHW), as seen in Table 5.2. While it is possible to further constrain the original FSM using a CHW, that discussion is beyond the scope of this work.

FSM.Enc          HW    HD
O.HDR            Var   c2 and 0
O.HDR+OH         c1    c2 and 0
SS.CHD=4         c1    4
SS.CHD=2         c1    2
SS.CHD=2,CHW=1   1     2

Table 5.2: Variability of the HW and HD models given an FSM type (Original, Structurally Modified) and constrained encoding. Secured FSMs are prefixed with SS.

In summary, in order to prevent any information leakage from the HW and HD models, an S*FSM is required. While the structural requirement can be relaxed (assuming the HDR constraint is used), an FSM with self-loops will always have variability in the HD model. Further, it is possible to restrict the HW/HD constants (CHW/CHD) to minimize the peak current (Npeak); in the secure case, $N_{peak}^{secure}$ is directly proportional to the HD constraint constant c2, so minimizing c2 minimizes $N_{peak}^{secure}$.

5.5 SMT Based Constraint Formulation

Building upon the constraint sets in the previous chapter, modifying the SMT-based solver to include further restrictions is trivial. In order to generate constant Hamming weights, the original constraint was imposed by requiring each state's Hamming weight to be identical (Listing 4.5). To allow finer-grained control, e.g., should a designer want a one-hot encoding (HW = 1), they could simply add a rule to that effect for each state (Listing 5.1), or they could add an additional rule to the already existing set, forcing the equality to ripple through the rules (Listing 5.2). Similarly, Npeak can be controlled through modification of, or addition to, the Hamming Distance rules.

```python
from math import ceil, log
from z3 import BitVec, Extract, Solver, Sum, ZeroExt

# Illustrative declarations (not in the original listing): bitlength and states.
bits = 4
STATE1, STATE2, STATE3 = BitVec("STATE1", bits), BitVec("STATE2", bits), BitVec("STATE3", bits)
ext = int(ceil(log(bits) / log(2)) + 1)   # zero-extend each bit so the sum cannot overflow

def hw(state):
    """Hamming weight of a state's bit-vector encoding, as a Z3 term."""
    return Sum([ZeroExt(ext, Extract(i, i, state)) for i in range(bits)])

s = Solver()
# One rule per state: force HW = 1 (a one-hot style encoding).
s.add(hw(STATE1) == 1)
s.add(hw(STATE2) == 1)
s.add(hw(STATE3) == 1)
```

Listing 5.1: Realization of Constant Hamming Weight Constraint Across All Rules

```python
from math import ceil, log
from z3 import BitVec, Extract, Solver, Sum, ZeroExt

# Illustrative declarations (not in the original listing); STATEN stands for the last state.
bits = 4
STATE1, STATE2, STATE3, STATEN = (BitVec(n, bits) for n in ("STATE1", "STATE2", "STATE3", "STATEN"))
ext = int(ceil(log(bits) / log(2)) + 1)   # zero-extend each bit so the sum cannot overflow

def hw(state):
    """Hamming weight of a state's bit-vector encoding, as a Z3 term."""
    return Sum([ZeroExt(ext, Extract(i, i, state)) for i in range(bits)])

s = Solver()
# Existing constant-HW rules: a chain of pairwise equalities across the states.
s.add(hw(STATE1) == hw(STATE2))
s.add(hw(STATE2) == hw(STATE3))
s.add(hw(STATE3) == hw(STATEN))
# A single added rule ripples through the chain, fixing HW = 1 for every state.
s.add(hw(STATE3) == 1)
```

Listing 5.2: Realization of Constant Hamming Weight Constraint Ripple Eﬀect

Finally, when generating the original Z3 solver rules, a special check must be made for the relaxed HD FSMs: any self-loop (SourceState = DestinationState) must be excluded from the Hamming Distance rules.
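The constraint sets of this section can also be explored without an SMT solver on toy-sized FSMs. The sketch below replaces the Z3 formulation with a plain exhaustive search (our substitution, feasible only for a handful of states) over codes with a constant HW and a constant HD c2 on every transition; `relaxed=True` applies the HDR rule by skipping self-loops, as described above, and `c1` optionally pins the HW (e.g. CHW=1). The function name and parameters are hypothetical.

```python
from itertools import permutations

def hw(code: int) -> int:
    return bin(code).count("1")

def find_secure_encoding(states, transitions, bits, c2, c1=None, relaxed=False):
    """Return {state: code} with constant HW and HD == c2 on every transition, or None."""
    edges = [(a, b) for a, b in transitions if not (relaxed and a == b)]
    weights = [c1] if c1 is not None else range(bits + 1)
    for w in weights:
        pool = [code for code in range(1 << bits) if hw(code) == w]   # constant-HW codes
        for codes in permutations(pool, len(states)):                 # distinct codes only
            enc = dict(zip(states, codes))
            if all(hw(enc[a] ^ enc[b]) == c2 for a, b in edges):
                return enc
    return None

fsm_states = ["A", "B", "C"]
fsm_edges = [("A", "B"), ("B", "B"), ("B", "C"), ("C", "A")]                  # Fig. 5.1(a)
sfsm_states = ["A", "B", "Bn", "C"]
sfsm_edges = [("A", "B"), ("B", "C"), ("B", "Bn"), ("Bn", "B"), ("Bn", "C"), ("C", "A")]

print(find_secure_encoding(fsm_states, fsm_edges, bits=3, c2=2))                 # strict HD
print(find_secure_encoding(fsm_states, fsm_edges, bits=3, c2=2, relaxed=True))   # HDR
print(find_secure_encoding(sfsm_states, sfsm_edges, bits=4, c2=2))               # S*FSM
```

On Figure 5.1(a) the strict search fails because B → B always has HD 0, while the relaxed search finds a 3-bit one-hot code; the restructured FSM of Figure 5.2(a) needs 4 bits under c2 = 2.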

5.6 Results

The preliminary experimental results, as in the previous chapter, first focus on the general characterization of solutions in terms of bit encoding, layout area, and now power. Note that since no changes were made to the topology/FSM structure, those details are omitted (the reader is referred to the previous chapter, Chapter 4). As with the previous chapter, the results then focus on the security aspects of power-constrained solutions. A subset of the common MCNC [82] benchmarks is used for alignment with existing FSM restructuring and encoding solutions.

5.6.1 Characterization Results

In order to characterize solutions, three different aspects are considered: the bit encoding requirement, the physical layout requirements, and finally the power requirements. State space elements are not considered, as the two classes of FSMs, baseline and restructured, remain unmodified from previous discussions.

Bit Encoding Requirements

The bits required for a binary encoding of the original (O.Binary) and structurally modified FSMs (S.Binary), as well as the bits required for a secure optimal encoding (SS.Opt), can be found in Figure 5.3. The binary encodings were theoretically derived and experimentally validated, while the optimal secure encodings were generated using the SMT solver and methods described earlier. Using the minimum binary encoding for the original FSM (O.Binary) as a baseline, we find that a binary encoding for the restructured S*FSM (S.Binary) requires a 15% increase in the number of encoding bits. Similarly, a bitlength-optimal, secure encoding of the S*FSM (SS.Opt) requires a 70% increase over the original FSM, or a 47% increase over the binary-encoded version of the restructured S*FSM.

Figure 5.3: Bit requirement comparison between binary encodings of original (O.Binary) and structurally modiﬁed (S.Binary) MCNC benchmarks as well as the S*Opt encoding for the structurally modiﬁed benchmarks (SS.Opt).

In Figure 5.4, three power-driven, constrained versions of the S*FSMs are compared in order to evaluate the burden on the encoding requirement. The baseline, in which CHD=4, also happens to line up with SS.Opt for this subset of MCNC benchmarks. While further constraining the HD (CHD=2) more than doubles the required bitlength (an average increase of 112% in the number of bits), it does reduce Npeak to its minimum of 1. Further constraining the encoding by imposing a constant HW (CHW=1) forces the S*FSM to be encoded using a one-hot encoding, which significantly impacts the encoding bitlength by increasing it by almost 400%. We avoid this implementation in later discussions due to its limited practicality in anything but simple controllers.

Figure 5.4: Bit requirement for structurally modiﬁed S*FSMs with added power driven constraints.

Using the previous O.Binary and SS.Opt as lower and upper bounds, respectively, we applied the relaxed HD (O.RHD) and relaxed HD with HW (O.RHD+HW) constraints on the original FSM (see Figure 5.5). Since O.RHD allows for a variable HW, it is safely considered the weakest of the available modifications, though it still requires a 30% average increase in encoding length over O.Binary. When adding the constant HW requirement, the average increase is nearly 55% over O.Binary. In other words, the O.RHD+HW encoding is a 9% decrease relative to SS.Opt, at the cost of exposing any self-loop transitions.

Figure 5.5: Bit requirement comparison for FSMs with relaxed structural constraints. The lower bound (LB) is the original FSM with binary encoding (O.Binary), while the upper bound (UB) is the structurally modiﬁed FSM with bitlength-optimal secure encoding (SS.Opt).

Layout Requirements

The area requirements for both the baseline MCNC benchmarks (Figure 5.6(a)) and the restructured MCNC benchmarks (Figure 5.6(b)) show the same linear increase in the required layout area as additional constraints are added. Generally, the greatest increase in area is due to the addition of the HD constraint: within the baseline FSM (15% increase) this corresponds to additional encoding bits, whereas the SFSM sees a greater increase (20%) due to both additional unrolled states and encoding bits. The average maximum increases from baseline to fully constrained (e.g., FSM/SFSM to HW+HD/HDr=4) are 32% and 27%, respectively.

Power Requirements

The two classes of FSMs, base FSMs and restructured SFSMs, each under four unique encodings, all have unique power profiles. Before exploring the aggregated results, all eight current traces for a single 1000-input experimental run of the MCNC EX1 benchmark are shown in Figure 5.7. The figure shows the global picture: SFSMs in general have higher current profiles than standard FSMs. Additionally, it is possible to reduce the current drawn by both SFSMs and FSMs using special encoding techniques (that reduce Npeak).

(a) Base FSMs

(b) SFSMs

Figure 5.6: MCNC Area Layout requirements. (a) Base FSMs dominated by HD Constraint (15%), (b) SFSMs dominated by HD Constraint (20%).

Figure 5.7: The current (×10⁻¹⁵ A) required by 8 unique FSM and Encoding combinations over a 1000-input stimulus.

Focusing again on a single benchmark with the same eight FSM+Encoding choices, Figure 5.8 depicts the current drawn per round in a clearer manner. While some circuit instantiations are noisier than others (FSM HW+HDr and SFSM HD), the average current profiles are well delineated. The SFSMs consume the most, while the four FSMs consume the least. That said, the constrained SFSM is able to reduce the power overhead substantially, requiring an overhead of only 50%, while the other secure methods require closer to 100% overhead.

Since the structure and underlying characteristics of the MCNC benchmarks are varied, and represent a functional cross-section of FSMs, the aggregate, normalized power requirements are of interest to designers interested in fine-tuning power (security and power are explored in the following section). Figure 5.9 shows the aggregate normalized power for all of the MCNC benchmarks evaluated. The trends show the following: an FSM constrained in both HW and HD (or HDr) consumes less power, on average, than one that is not constrained, with the obvious penalty being layout area. In order to minimize current consumption, setting a low HD/HDr value is crucial, as it directly impacts Npeak ($N_{peak} = c_2/2$, where c2 is the HD/HDr constant). Implementing an HD/HDr constraint without the HW constraint is only beneficial in the baseline FSMs.

Figure 5.8: The current (×10⁻¹⁵ A) required by 8 unique FSM and Encoding combinations over a 4-input stimulus.

5.6.2 Security Results

As in the previous chapter, three factors are examined with respect to information security: Entropy, Mutual Information between the Power Side Channel and the Hamming Weight model, and Mutual Information between the Power Side Channel and the Hamming Distance model. Results here are further divided between baseline FSMs and those with additional restructuring (SFSMs).

First, Figure 5.10(a) shows that the amount of information contained within the baseline FSM power side channel fluctuates. In most cases the heavily constrained encoding, which minimizes power, has less information than the typical binary encoding. The middle two encoding strategies, Relaxed Hamming Distance (HDR) and the inclusion of Hamming Weight without constraint, are generally on par with the information contained in the baseline FSM. While similar to the standard FSM in terms of entropy, the entropy of the constrained SFSM encodings shown in Figure 5.10(b) is almost always less than that of the binary-encoded SFSM.

Figure 5.9: The normalized power requirements of the MCNC benchmarks in the presence of structural and encoding constraints.

The Mutual Information between the power side channel and the Hamming Weight model is significantly more interesting. First, for baseline FSMs (Figure 5.11(a)), the Mutual Information generally increases when only the Relaxed Hamming Distance (HDR) constraint is added. As expected, the Hamming Weight constrained encodings leak no MI.

The secure FSMs (Figure 5.11(b)) fare slightly better, but still expose information if only the Hamming Distance constraint is applied.

Figure 5.10: MCNC Entropy of Power Side Channel with respect to four unique encoding strategies. (a) Base FSMs, (b) SFSMs.

One might expect the Mutual Information between the Power Side Channel and the Hamming Distance model to behave like the Hamming Weight case, but this is not so. First, in Figure 5.12(a), note that only in instances where the base model has few or no self-loops can the Mutual Information be reduced to near 0. Second, the greatest reduction in MI occurs when only the relaxed HD (HDr) constraint is applied. When further constraints are applied, they exacerbate the bi-modal distribution of the Hamming Distance model, thereby increasing the MI. Finally, as expected, the restructured SFSM in Figure 5.12(b) leaks no information through the HD constrained encodings. The only parameters to consider here are therefore power and layout area costs.

Figure 5.11: MCNC Constrained MI(Power, HW) with respect to four unique encoding strategies; (a) Base FSMs, (b) SFSMs.

5.7 Implications

Overall results show that in order to provide a secure solution, the typical MCNC FSM requires a 50% increase in the number of states and a 57% increase in the number of product terms needed to define the state transitions. These increases translate to a minimum encoding space increase of 70%, raising the average number of bits needed to encode the MCNC benchmarks from 4.8 to 7.9. When factoring in relaxed structural constraints and the corresponding HDR constraint, we found respective increases of 53% and 67%, raising the average number of bits needed to 7.3 and 7.9. The focus of this work was on reducing the power requirements while providing security. In terms of power savings, current minimization was possible for both FSMs and SFSMs through the addition of HD/HDr constraints, with average current reductions of 30% and 70%, respectively.

Overall, a constrained SFSM can consume about 5% more power than a nominal FSM with binary encoding while requiring a 95% increase in layout area.

A designer with additional layout real estate can trade it for increased security while also mitigating overall power consumption. Figures 5.13(a) and 5.13(b) show the impacts on current and security with respect to increasing layout requirements for both standard FSMs and secure SFSMs. If a designer cannot afford to increase area by more than a third, FSM restructuring is out of the question and the base FSMs in Figure 5.13(a) are of interest. To increase security, these can only guarantee the mitigation of HW leakage, at a nominal current penalty. Reduction in HD leakage depends on its bi-modal behavior: decreasing the HDr constraint (to HDr=4 or HDr=2) mitigates the effect but slightly increases the required area (a 32% increase). If a designer has much greater flexibility in layout area, then the SFSMs in Figure 5.13(b) are recommended for complete side channel security. A nominal SFSM with binary encoding, while requiring the least overhead of the SFSMs, is never recommended; in fact, its use increases the vulnerability of the system. At a 90% increase in layout area, a designer can guarantee an FSM free of HD information leakage at a 30% current penalty. A 92% layout area increase guarantees an FSM free of both HW and HD information leakage, though still with a 30% current overhead penalty. Finally, a 94% layout increase still maintains complete HW and HD information secrecy while reducing the current penalty to under 5%.

Figure 5.12: MCNC Constrained MI(Power, HD) with respect to four unique encoding strategies; (a) Base FSMs, (b) SFSMs.

The original goal, to provide varying levels of security protection using high-level methods, has been demonstrated through a plausible solution. The physical state space and encoding requirements are feasible, and designs have been fully synthesized using commercial tools. Results are in line with previous expectations. Complete security is impossible without restructuring, but a designer can eliminate HW information leakage (zero leakage) rather easily. Using simple constraints, a designer can also reduce HD information leakage on average (an average reduction across all methods of more than 50%). Aggregating the information presented earlier, Figure 5.14 provides a complete view, with both baseline and secured FSMs sorted by increasing area requirements.

Figure 5.13: MCNC Overall Design and Security Normalized Costs (w.r.t. FSM with Binary Encoding) ranked in order of Area Penalty with respect to four unique encoding strategies; (a) Base FSMs, (b) SFSMs.

Figure 5.14: MCNC Overall Design and Security Normalized Costs (w.r.t. FSM with Binary Encoding) ranked in order of Area Penalty.

Chapter 6

Side Channel Attack and Analysis Related Work

6.1 Architecture Level Approach

When initially designed, most cryptosystems focus on reducing power and area footprint, sacrificing these later for enhanced side channel security. This motivates a different approach, one that targets a key principle behind the power of side channel attacks. A typical attack requires the isolation of a single intermediate value (IV) that results from an operation between some known data element and the secret key. If a single intermediate value is not distinguishable, the relative security of the system increases.

This concept will not prevent statistically based attacks, but it will increase a system's overall side channel attack resistance.

6.1.1 Motivation

In the previous chapters, two drastically different approaches were taken to solve the same problem: eliminating the relationship between a device's power side channel and its underlying functionality. Both approaches removed or weakened the relationship between a target device's side channel (Tsc) and models of the side channel (Msc). While the methods increase resistance to side channel based attacks, they do so at a high cost for most designers. They constrain the logic style and increase fundamental design metrics such as area, power and delay, with increased security as the only benefit. Currently, one other mention of architecture level security for FSRs exists [89]. While it shows marked improvements in several design constraints, it still comes at a high cost for improved side channel security.

This section proposes that, with the increasing prevalence of multi-core and soon many-core architectures, side channel security can be enhanced through clever, open modifications to specific cryptosystem components (Secure Modified Structural Parallelism, S*MSP).

To motivate this idea, the focus here is on generic security-centric architecture modifications. These modifications impact area, power and potentially delay, but they also have a positive impact on throughput.

6.1.2 FSR-based Algorithms

To showcase the proposed security-centric architectural modifications, this work focuses on the class of stream cipher algorithms that use Feedback Shift Registers (FSRs) [90]. These algorithms have been, and continue to be, heavily used in military and communication applications. This is due to their relatively simple software and hardware implementations on a wide range of target platforms, including both mechanical and electrical circuits [91, 92].

Two broad types of FSR cryptographic algorithms exist: Linear (LFSRs) and Non-Linear (NLFSRs). While NLFSRs are considered more cryptographically secure than LFSRs [93], both exist in many consumer hardware devices (e.g. car keys, garage door openers, secure entry badges and smart cards) and are equally susceptible to side channel based attacks [94].

Fundamentally, this section focuses on the major components shared by all FSR based architectures.

FSR based cryptographic algorithms rotate one or more registers while applying a transform to sets of bits before they are shifted back onto a register. Generally, these algorithms consist of two rotating registers, K and D, and some transform function F. This transform F takes a combination of bits from both K and D to produce a new bit that is shifted onto D. These components are realized in the generic cryptographic circuit shown in Figure 6.1 as Key, Data1 and CRYPTO, respectively.

Figure 6.1: A generic single-bit FSR-based cryptographic architecture with two registers, one static (Key) and the other undergoing substitution (Data1) due to a transform function (CRYPTO).
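As an illustration only, the generic architecture of Figure 6.1 can be sketched in a few lines of Python. The register width, the tap positions and the transform `crypto_bit` are hypothetical placeholders, not taken from any particular cipher. The sketch only shows the Key register rotating without substitution while Data1 rotates with substitution.

```python
from typing import List, Tuple

Bits = List[int]  # index 0 is the LSB, i.e. the bit shifted out next


def rotate(reg: Bits) -> Bits:
    """1-bit rotation without substitution: the LSB wraps around to the MSB."""
    return reg[1:] + reg[:1]


def crypto_bit(key: Bits, data: Bits) -> int:
    """Hypothetical transform F: XOR of a few taps plus one nonlinear AND term.

    The tap positions are arbitrary placeholders (registers of at least 8 bits).
    """
    return data[0] ^ key[0] ^ data[3] ^ (data[5] & data[7])


def step(key: Bits, data: Bits) -> Tuple[Bits, Bits]:
    """One clock cycle of the generic FSR cipher in Figure 6.1."""
    new_msb = crypto_bit(key, data)
    data = data[1:] + [new_msb]  # Data1: rotate with substitution
    key = rotate(key)            # Key: rotate without substitution
    return key, data


if __name__ == "__main__":
    key = [1, 0, 1, 1, 0, 0, 1, 0]
    data = [0, 1, 1, 0, 1, 0, 0, 1]
    for cycle in range(4):
        key, data = step(key, data)
        print(cycle + 1, "key HW =", sum(key), "data HW =", sum(data))
```

The key's Hamming Weight never changes, while the data register's weight can drift. This is the distinction the next two scenarios formalize.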

Two scenarios for register shifting are identified in traditional cryptosystems. The first is simple register rotation without modification (Rstatic); the second is rotation with substitution (Rsub). In both scenarios, the focus is on the impact to the two main side channel models: Hamming Weight and Hamming Distance. Future work will investigate the impact of system level modifications on the signal to noise ratio (SNR).

Assume a register, Rstatic, which undergoes a simple 1-bit rotation without substitution during each cycle (e.g. the Key register in Figure 6.1). The following two sections investigate the potential variation in both Hamming based models.

Hamming Weight

Recall that the Hamming Weight of a register is equal to the total number of bits that are at Logic Level High while in a steady state. Formally, the Hamming Weight of the static register is defined as:

$$HW(R_{static}[n \ldots 0]) = \sum_{bit=0}^{n} R_{static}[bit]$$

In the simple non-substitution scheme, the following bit assignment occurs during each rotation of the register:

$$R_{static}[n-1] \ldots R_{static}[0] = R_{static}[n] \ldots R_{static}[1]$$
$$R_{static}[n] = R_{static}[0]$$

The first n 1-bit rotations of a register undergoing no transformation are summarized in Table 6.1. Clearly, no new information is generated. The individual bits of the register may contain different values, but the register as a whole contains the same static information. The total number of bits at Logic Level High always remains constant, and therefore the Hamming Weight remains constant. Consider the register in Figure 6.2, which shows the first four 1-bit rotations of a 32-bit register. After each rotation, the register has a Hamming Weight (24) identical to that of the original register.

time   Shift Register Contents
0      r[n] r[n-1] r[n-2] ··· r[i] ··· r[2] r[1] r[0]
1      r[0] r[n] r[n-1] ··· r[i+1] ··· r[3] r[2] r[1]
2      r[1] r[0] r[n] ··· r[i+2] ··· r[4] r[3] r[2]
⋮
i-1    r[i-2] r[i-3] r[i-4] ··· r[n] ··· r[i+1] r[i] r[i-1]
i      r[i-1] r[i-2] r[i-3] ··· r[0] ··· r[i+2] r[i+1] r[i]
i+1    r[i] r[i-1] r[i-2] ··· r[1] ··· r[i+3] r[i+2] r[i+1]
⋮
n-2    r[n-2] r[n-3] r[n-4] ··· r[i-2] ··· r[0] r[n] r[n-1]
n-1    r[n-1] r[n-2] r[n-3] ··· r[i-1] ··· r[1] r[0] r[n]
n      r[n] r[n-1] r[n-2] ··· r[i] ··· r[2] r[1] r[0]

Table 6.1: The contents of a rotating register with no substitution during a complete n-cycle rotation.

Hamming Distance

The Hamming Distance (HD) measures the inter-cycle bit switching that occurs within a register, that is, the number of 1-to-0 and 0-to-1 transitions which occur within a register. The HD is easily computed by taking the XOR of two successive register states and computing the resulting Hamming Weight:

$$HD(R_{static,time}, R_{static,time+1}) = HW(R_{static,time} \oplus R_{static,time+1})$$

Assuming a single 1-bit shift each cycle, Table 6.1 summarizes the contents of the n locations of register Rstatic from time 0 to n: a full rotation through Rstatic. Table 6.2 shows the resulting sequential sets of XOR pairs. First, notice the 1-bit shift of each XOR pair in each successive set (from $0 \oplus 1$ to $(n-1) \oplus n$). Second, and more importantly, the individual XOR terms remain static and only shift, so their total sum (HW) also remains constant. Thus, as with the HW, the Hamming Distance remains constant when the register rotates without substitution; namely, $HD_1 = HD_2 = HD_x$ for any time $x$.

In Figure 6.2, the Hamming Distance between any two successive rounds remains constant at 16.
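A short sketch can confirm both invariants numerically. It assumes the magic number in Figure 6.2 is 0xDEADBEEF, whose Hamming Weight is 24, consistent with the figure. Each cycle is modelled as a 1-bit rotate-right, matching $R_{static}[n] = R_{static}[0]$. The helper names are hypothetical.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def popcount(x: int) -> int:
    """Hamming Weight of a non-negative integer."""
    return bin(x).count("1")


def rotr1(x: int, width: int = WIDTH) -> int:
    """1-bit rotate right: the LSB wraps around to the MSB."""
    return ((x >> 1) | ((x & 1) << (width - 1))) & ((1 << width) - 1)


def hamming_distance(a: int, b: int) -> int:
    """HD as the Hamming Weight of the XOR of two successive states."""
    return popcount(a ^ b)


def rotation_trace(start: int, cycles: int):
    """Return (states, HW of each state, HD between successive states)."""
    states = [start]
    for _ in range(cycles):
        states.append(rotr1(states[-1]))
    hws = [popcount(s) for s in states]
    hds = [hamming_distance(a, b) for a, b in zip(states, states[1:])]
    return states, hws, hds


if __name__ == "__main__":
    end = 0xDEADBEEF  # assumed magic number from Figure 6.2
    start = ((end << 4) | (end >> (WIDTH - 4))) & MASK  # undo four right-rotations
    states, hws, hds = rotation_trace(start, 4)
    print([hex(s) for s in states])
    print("HW per cycle:", hws)
    print("HD between cycles:", hds)
```

Because `rotr1` only permutes bits, the HW cannot change. The HD equals the number of cyclically adjacent bit pairs that differ, which a rotation also leaves unchanged.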

XORs          Hamming Distance Terms
0⊕1           r[n]⊕r[0] r[n-1]⊕r[n] r[n-2]⊕r[n-1] ··· r[i]⊕r[i+1] ··· r[2]⊕r[3] r[1]⊕r[2] r[0]⊕r[1]
1⊕2           r[0]⊕r[1] r[n]⊕r[0] r[n-1]⊕r[n] ··· r[i+1]⊕r[i+2] ··· r[3]⊕r[4] r[2]⊕r[3] r[1]⊕r[2]
⋮
(i-1)⊕i       r[i-2]⊕r[i-1] r[i-3]⊕r[i-2] r[i-4]⊕r[i-3] ··· r[n]⊕r[0] ··· r[i+1]⊕r[i+2] r[i]⊕r[i+1] r[i-1]⊕r[i]
i⊕(i+1)       r[i-1]⊕r[i] r[i-2]⊕r[i-1] r[i-3]⊕r[i-2] ··· r[0]⊕r[1] ··· r[i+2]⊕r[i+3] r[i+1]⊕r[i+2] r[i]⊕r[i+1]
⋮
(n-2)⊕(n-1)   r[n-2]⊕r[n-1] r[n-3]⊕r[n-2] r[n-4]⊕r[n-3] ··· r[i-2]⊕r[i-1] ··· r[0]⊕r[1] r[n]⊕r[0] r[n-1]⊕r[n]
(n-1)⊕n       r[n-1]⊕r[n] r[n-2]⊕r[n-1] r[n-3]⊕r[n-2] ··· r[i-1]⊕r[i] ··· r[1]⊕r[2] r[0]⊕r[1] r[n]⊕r[0]

Table 6.2: The XOR pairs associated with the simple rotating register in Table 6.1 used in Hamming Distance computation. Notice the diagonal trend of the pairs, with each pair existing exactly once per line. The pairing of the XORs remains constant, and as such the summation of all the XORs also remains constant.

Suppose all the elements within a device rotated without substitution. Beyond being trivial and non-cryptographic in nature, such a device would make any meaningful correlation impossible. Side channel attacks work by correlating a model to a side channel. In this scenario, the attack would try to correlate a set of i constant HD or HW values c with a set of i variable power side channel captures s. Correlating a non-variable dataset with a variable dataset yields an indeterminate (non-existent) Pearson correlation, as shown earlier in Chapter 4 and again here in Equation (6.1).

Figure 6.2: A 32-bit register undergoing four 1-bit rotations, ending on a famous assembly magic number. During each cycle the register keeps the same Hamming Weight (24); between cycles the Hamming Distance also remains constant (16).

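$$C(c,s) = \frac{\sum_i (c_i - \bar{c})(s_i - \bar{s})}{\sqrt{\sum_i (c_i - \bar{c})^2 \sum_i (s_i - \bar{s})^2}} = \frac{\sum_i (0)(s_i - \bar{s})}{\sqrt{0 \times \sum_i (s_i - \bar{s})^2}} = \frac{0}{\sqrt{0}} = \text{indeterminate} \tag{6.1}$$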
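The indeterminate case in Equation (6.1) can be made concrete with a small Pearson correlation helper. This is a minimal sketch, not SCARF's implementation. The trace values are made-up placeholders, and returning `None` is simply one convention for the 0/√0 case.

```python
import math
from typing import Optional, Sequence


def pearson(c: Sequence[float], s: Sequence[float]) -> Optional[float]:
    """Pearson correlation C(c, s) as in Eq. (6.1).

    Returns None when the denominator is zero: the 0 / sqrt(0) case that
    arises when the model c (or the trace set s) has no variance.
    """
    if len(c) != len(s) or len(c) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    c_bar = sum(c) / len(c)
    s_bar = sum(s) / len(s)
    num = sum((ci - c_bar) * (si - s_bar) for ci, si in zip(c, s))
    den = math.sqrt(sum((ci - c_bar) ** 2 for ci in c)
                    * sum((si - s_bar) ** 2 for si in s))
    if den == 0.0:
        return None  # indeterminate: no meaningful correlation exists
    return num / den


if __name__ == "__main__":
    traces = [3.1, 2.7, 3.9, 3.3, 2.8]  # hypothetical power samples
    constant_hw = [24, 24, 24, 24, 24]  # HW model of a register rotating without substitution
    varying_hw = [24, 23, 25, 24, 23]   # HW model of a register rotating with substitution
    print(pearson(constant_hw, traces))
    print(pearson(varying_hw, traces))
```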

6.1.3 Register Rotation with Substitution

With the previous results in mind, the interest, and the potential for an attack, lies within registers whose contents change over time (e.g. registers that rotate with substitution, Rsub). In the example NLFSR, this occurs in the data register, where a new bit is generated and shifted onto the register.

Hamming Weight

At time i+1, assume a register Rsub of length m, with the least significant bit (LSB) from time i (Rsub[i]) shifted out and transformed to form a new most significant bit (MSB) q[i]. The resulting bitwise contents for m cycles of substitution and rotation of a generic register Rsub are shown in Table 6.4. Three unique scenarios exist in the generation of the new bit q[i] that impact the Hamming Weight model and architectural-level security:

1. If $q[i] = R_{sub}[i]$ then $\Delta HW = 0$: $HW_{i+1} = HW_i$;

2. If $q[i] = 1$ and $R_{sub}[i] = 0$ then $\Delta HW = +1$: $HW_{i+1} = HW_i + 1$;

3. If $q[i] = 0$ and $R_{sub}[i] = 1$ then $\Delta HW = -1$: $HW_{i+1} = HW_i - 1$.

The first scenario may seem ideal, but it is crucial to note what happens if it is the only scenario that occurs during device operation. In that case no transform occurs, both the HW and HD remain constant, and the problem again reduces to a trivial, non-cryptographic example. Additionally, if the transformation is statistically balanced, the expected probabilities of the three scenarios are as summarized in Table 6.3.

∆HW   Probability   Condition
-1    25%           MSB = 0, LSB = 1
0     50%           MSB = LSB
+1    25%           MSB = 1, LSB = 0

Table 6.3: Probability distribution of the change in Hamming Weight (∆HW) assuming a single bit replacement with rotation (MSB denotes the new bit q[i]; LSB denotes the departing bit Rsub[i]).
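The distribution in Table 6.3 can be reproduced by enumerating the four possible bit pairs. This uses the same balance assumption: the new bit q[i] and the departing bit Rsub[i] are independent and uniformly distributed. The function names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def delta_hw(new_msb: int, dropped_lsb: int) -> int:
    """Change in Hamming Weight when dropped_lsb leaves and new_msb enters."""
    return new_msb - dropped_lsb


def delta_hw_distribution() -> dict:
    """Enumerate the four equally likely (q[i], Rsub[i]) pairs."""
    counts = Counter(delta_hw(q, r) for q, r in product((0, 1), repeat=2))
    total = sum(counts.values())
    return {d: Fraction(n, total) for d, n in sorted(counts.items())}


if __name__ == "__main__":
    for delta, p in delta_hw_distribution().items():
        print(f"dHW = {delta:+d}: {p} ({float(p):.0%})")
```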

Hamming Distance

While not as straightforward, the Hamming Distance can be derived from the register contents during a complete m-cycle rotation (Table 6.4). Table 6.5 lists the XOR of all sequential round pairings, and applying the Hamming Weight formula to these XOR results yields the Hamming Distance. The Hamming Distance itself appears to be the point of interest, but the vulnerability of a device lies in how the Hamming Distance changes over time. To determine this change for a register undergoing substitution, the difference between any two sequential XOR sets (e.g. between [i-1, i] and [i, i+1]) is derived. Computing the difference of terms in subsequent cycles, shown in Table 6.5, yields a simple iterative relationship in the Hamming Distance over time. Between any two sequential rows, exactly one new XOR term is generated and one is removed. The difference is the 4-term expression in Equation 6.2, formed from the XORs of the MSBs and LSBs.

$$HD_{i,i+1} - HD_{i-1,i} = (q[i] \oplus q[i-1]) - (r[i] \oplus r[i-1]) \tag{6.2}$$

time   Shift Register Contents
0      r[m] r[m-1] r[m-2] ··· r[i] ··· r[2] r[1] r[0]
1      q[0] r[m] r[m-1] ··· r[i+1] ··· r[3] r[2] r[1]
2      q[1] q[0] r[m] ··· r[i+2] ··· r[4] r[3] r[2]
⋮
i-1    q[i-2] q[i-3] q[i-4] ··· r[m] ··· r[i+1] r[i] r[i-1]
i      q[i-1] q[i-2] q[i-3] ··· q[0] ··· r[i+2] r[i+1] r[i]
i+1    q[i] q[i-1] q[i-2] ··· q[1] ··· r[i+3] r[i+2] r[i+1]
⋮
m-2    q[m-2] q[m-3] q[m-4] ··· q[i-2] ··· q[0] r[m] r[m-1]
m-1    q[m-1] q[m-2] q[m-3] ··· q[i-1] ··· q[1] q[0] r[m]
m      q[m] q[m-1] q[m-2] ··· q[i] ··· q[2] q[1] q[0]

Table 6.4: The values of a register during m cycles of substitution and feedback.

XORs          Hamming Distance Terms
0⊕1           r[m]⊕q[0] r[m-1]⊕r[m] r[m-2]⊕r[m-1] ··· r[i]⊕r[i+1] ··· r[2]⊕r[3] r[1]⊕r[2] r[0]⊕r[1]
1⊕2           q[0]⊕q[1] r[m]⊕q[0] r[m-1]⊕r[m] ··· r[i+1]⊕r[i+2] ··· r[3]⊕r[4] r[2]⊕r[3] r[1]⊕r[2]
⋮
(i-1)⊕i       q[i-2]⊕q[i-1] q[i-3]⊕q[i-2] q[i-4]⊕q[i-3] ··· r[m]⊕q[0] ··· r[i+1]⊕r[i+2] r[i]⊕r[i+1] r[i-1]⊕r[i]
i⊕(i+1)       q[i-1]⊕q[i] q[i-2]⊕q[i-1] q[i-3]⊕q[i-2] ··· q[0]⊕q[1] ··· r[i+2]⊕r[i+3] r[i+1]⊕r[i+2] r[i]⊕r[i+1]
⋮
(m-2)⊕(m-1)   q[m-2]⊕q[m-1] q[m-3]⊕q[m-2] q[m-4]⊕q[m-3] ··· q[i-2]⊕q[i-1] ··· q[0]⊕q[1] r[m]⊕q[0] r[m-1]⊕r[m]
(m-1)⊕m       q[m-1]⊕q[m] q[m-2]⊕q[m-1] q[m-3]⊕q[m-2] ··· q[i-1]⊕q[i] ··· q[1]⊕q[2] q[0]⊕q[1] r[m]⊕q[0]

Table 6.5: The XOR pairs associated with sequential cycles of the Shift Register in Table 6.4. Computing the difference of terms in subsequent cycles yields a simple iterative relationship in the Hamming Distance computation over time.

The key point of interest is that the Hamming Distance during any given round depends on the previous round and on the difference of two XORs, one for the current MSB and one for the former LSB (see Equation 6.3). Only three unique outcomes exist for the current single bit substitution and rotation. These potential values are summarized in Table 6.6 and follow the probability distribution detailed in Table 6.7.

$$HD_{i,i+1} = HD_{i-1,i} + \left[(q[i] \oplus q[i-1]) - (r[i] \oplus r[i-1])\right] \tag{6.3}$$
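Equation 6.3 can also be checked empirically. The sketch below assumes an idealized transform whose new bits are random, so it says nothing about any real cipher. It computes the Hamming Distance between successive register states directly and asserts that each step obeys the iterative relationship. All names are hypothetical.

```python
import random
from typing import List


def popcount(x: int) -> int:
    return bin(x).count("1")


def window(stream: List[int], t: int, width: int) -> int:
    """Register contents at time t: stream[t] is the LSB and stream[t + width - 1]
    is the MSB, so each cycle drops one LSB and shifts in one new MSB."""
    value = 0
    for j in range(width):
        value |= stream[t + j] << j
    return value


def check_hd_recurrence(width: int = 8, cycles: int = 200, seed: int = 1) -> List[int]:
    """Compute HD directly and confirm it follows the iterative form of Eq. (6.3)."""
    rng = random.Random(seed)
    # The first `width` bits play the role of r[...]; the remaining bits stand in
    # for the substituted bits q[...] produced by an idealized balanced transform.
    stream = [rng.randint(0, 1) for _ in range(width + cycles + 1)]
    hd = [popcount(window(stream, t, width) ^ window(stream, t + 1, width))
          for t in range(cycles)]
    for t in range(1, cycles):
        msb_term = stream[t + width] ^ stream[t + width - 1]  # newest pair: q[i] xor q[i-1]
        lsb_term = stream[t] ^ stream[t - 1]                  # departing pair: r[i] xor r[i-1]
        assert hd[t] == hd[t - 1] + msb_term - lsb_term
    return hd


if __name__ == "__main__":
    hds = check_hd_recurrence()
    print("observed HD changes:", sorted({b - a for a, b in zip(hds, hds[1:])}))
```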

6.1.4 System Level Security through Parallelization

Using the results in the previous section as motivation, note that, intuitively, the extreme cases (e.g. ∆HD = ±1) are easier to interpret in an attack scenario. In the single register model, a Hamming Distance change of ±1 reveals an immense amount of information, and together these two values make up 50% of all potentially observable Hamming Distance changes. Furthermore, observing a Hamming Distance change of 0 leaves only a small window of potential scenarios for the new bit. Recall that one already knows the final register value: the only unknowns are q[0] and q[1], and of those, q[0] has supposedly already been guessed correctly.

Now assume two registers that rotate with substitution (which, as shown above, is where data leaks), as seen in Figure 6.3. A simple probability analysis based on the earlier results follows.

Given a two-register system, Table 6.8 enumerates the potential outcomes and Table 6.9 summarizes their probability distribution.

q[i]  q[i-1]  q[i]⊕q[i-1]  r[i]  r[i-1]  r[i]⊕r[i-1]  ∆HD
0     0       0            0     0       0            0
0     0       0            0     1       1            -1
0     0       0            1     0       1            -1
0     0       0            1     1       0            0
0     1       1            0     0       0            1
0     1       1            0     1       1            0
0     1       1            1     0       1            0
0     1       1            1     1       0            1
1     0       1            0     0       0            1
1     0       1            0     1       1            0
1     0       1            1     0       1            0
1     0       1            1     1       0            1
1     1       0            0     0       0            0
1     1       0            0     1       1            -1
1     1       0            1     0       1            -1
1     1       0            1     1       0            0

Table 6.6: Truth table of all possible MSB (q) and LSB (r) combinations, with ∆HD = (q[i]⊕q[i-1]) − (r[i]⊕r[i-1]). The change in Hamming Distance for a 1-bit shift register undergoing substitution can only be 0 or ±1.

∆HD   Probability   Condition
-1    25%           ∆LSB, MSB const.
0     50%           LSB and MSB const. OR ∆LSB, ∆MSB
+1    25%           ∆MSB, LSB const.

Table 6.7: Probability distribution of the change in Hamming Distance (∆HD) assuming a single bit replacement with rotation.

Figure 6.3: Two FSR Data Elements (Data1 and Data2), each attached to the same underlying cryptographic function (CRYPTO) using the key or some unique derivation of the key.

Register 1    Combined ∆HD, register 2 ∆HD =    Joint probability, register 2 ∆HD =
∆HD           -1     0     +1                   -1 (1/4)   0 (2/4)   +1 (1/4)
-1 (1/4)      -2     -1    0                    1/16       2/16      1/16
0 (2/4)       -1     0     +1                   2/16       4/16      2/16
+1 (1/4)      0      +1    +2                   1/16       2/16      1/16

Table 6.8: The potential combined ∆HD values given two registers (left), with the associated joint probabilities (right). Each register's marginal probability is shown in parentheses.

∆HW/HD   P(∆)   %
-2       1/16   6.25%
-1       1/4    25%
0        3/8    37.5%
+1       1/4    25%
+2       1/16   6.25%

Table 6.9: Probability distribution of the change in Hamming measurements (∆HW/HD) assuming a single bit replacement in two registers simultaneously.

Note that nothing here explicitly prevents an attack; rather, the complexity of guessing the key correctly increases dramatically. With a single duplication of registers in place, a Hamming change of 0 is observed about 37.5% of the time, but it can arise from 3 unique scenarios. Similar statements can be made about the ±1 instances in the 2-register implementation. The only guaranteed leakage of information in this version occurs when the Hamming ∆ is ±2, which corresponds to the ±1 instance in the single register version.
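The single-register truth table (Table 6.6) and the distribution in Table 6.7 can be regenerated by brute-force enumeration of the four bits in Equation 6.2, assuming all 16 combinations are equally likely. This is an illustrative sketch with hypothetical function names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def delta_hd(q_i: int, q_prev: int, r_i: int, r_prev: int) -> int:
    """Eq. (6.2): change in HD from the new MSB pair and the departing LSB pair."""
    return (q_i ^ q_prev) - (r_i ^ r_prev)


def delta_hd_table():
    """All 16 rows of Table 6.6 as (q[i], q[i-1], r[i], r[i-1], dHD), in table order."""
    return [(q, qp, r, rp, delta_hd(q, qp, r, rp))
            for q, qp, r, rp in product((0, 1), repeat=4)]


def delta_hd_distribution():
    """Table 6.7, assuming all 16 bit combinations are equally likely."""
    counts = Counter(row[-1] for row in delta_hd_table())
    return {d: Fraction(n, 16) for d, n in sorted(counts.items())}


if __name__ == "__main__":
    for row in delta_hd_table():
        print(*row)
    print(delta_hd_distribution())
```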

Quadrupled

The added complexity of two data registers reduces the extreme tails (±2) to a total of 12.5% of the time, and our interest is in reducing them further. Following the same logic, Figure 6.4 depicts a quadrupled data register version of the original algorithm, which shows additional promise.

With the registers duplicated twice (four data registers), a Hamming change of 0 is observed about 27% of the time, but it can arise from 5 unique scenarios. Similar statements can be made about the ±1, ±2 and ±3 instances in the 4-register implementation. The only guaranteed leakage of information in this version occurs when the Hamming ∆ is ±4, which corresponds to the ±1 instance in the single register version.

Figure 6.4: Four FSR Data Elements (Data1 to Data4), each attached to the same underlying cryptographic function (CRYPTO) using the key or some unique derivation of the key.

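Potential combined ∆ values for four registers (left), with the associated joint probabilities in units of 1/256 (right). Rows give the ∆ of registers 1 and 2, and columns give the ∆ of registers 3 and 4, each with its Table 6.9 probability in parentheses.

Regs 1-2 ∆    Combined ∆, regs 3-4 ∆ =            Joint probability (×1/256), regs 3-4 ∆ =
              -2    -1    0     +1    +2          -2 (1/16)  -1 (4/16)  0 (6/16)  +1 (4/16)  +2 (1/16)
-2 (1/16)     -4    -3    -2    -1    0           1          4          6         4          1
-1 (4/16)     -3    -2    -1    0     +1          4          16         24        16         4
0 (6/16)      -2    -1    0     +1    +2          6          24         36        24         6
+1 (4/16)     -1    0     +1    +2    +3          4          16         24        16         4
+2 (1/16)     0     +1    +2    +3    +4          1          4          6         4          1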

Figure 6.5: Probability Distribution of the Possible Hamming Weight/Distance Changes for 1, 2 and 4 Registers undergoing a single bit rotation. (Series: Single, Doubled and Quadrupled Register; x-axis: change in Hamming Weight/Distance, -4 to +4; y-axis: probability, %.)

∆HW/HD   P(∆)     %
-4       1/256    <0.4%
-3       1/32     3.125%
-2       7/64     10.9%
-1       7/32     21.9%
0        35/128   27.3%
+1       7/32     21.9%
+2       7/64     10.9%
+3       1/32     3.125%
+4       1/256    <0.4%

Table 6.10: Probability distribution of the change in Hamming measurements (∆HW/HD) assuming a single bit replacement in four registers simultaneously.
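The doubled and quadrupled distributions follow from convolving the single-register distribution with itself. This requires the registers' changes to be independent, an assumption of this sketch that mirrors the joint probabilities in Table 6.8. Exact fractions are used so the results can be compared directly with Tables 6.9 and 6.10. The names are illustrative.

```python
from fractions import Fraction
from typing import Dict

Dist = Dict[int, Fraction]

# Single-register change distribution from Table 6.7 (balanced transform assumed).
SINGLE: Dist = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}


def convolve(a: Dist, b: Dist) -> Dist:
    """Distribution of the sum of two independent changes."""
    out: Dist = {}
    for da, pa in a.items():
        for db, pb in b.items():
            out[da + db] = out.get(da + db, Fraction(0)) + pa * pb
    return dict(sorted(out.items()))


def parallel_registers(n: int) -> Dist:
    """Change distribution for n independent data registers updated in the same cycle."""
    dist: Dist = {0: Fraction(1)}
    for _ in range(n):
        dist = convolve(dist, SINGLE)
    return dist


if __name__ == "__main__":
    for n in (1, 2, 4):
        dist = parallel_registers(n)
        tails = dist[n] + dist[-n]  # the fully revealing extremes, +n and -n
        print(n, {d: str(p) for d, p in dist.items()}, "extreme tails:", float(tails))
```

The n-register distribution is binomial over 2n fair bits, so the fully revealing tails shrink as 2/4^n: 50%, 12.5% and about 0.8% for 1, 2 and 4 registers.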

6.2 Tools and Technologies Developed

The following sections detail a collection of tools developed over the past six years to develop and test side channel vulnerabilities and solutions in a Design Automation context. At one end is a broad, flexible Side Channel Analysis Research Framework (SCARF), meant to be used as a hardware test bench driver and flexible DPA analysis tool. At the other is a secure logic flow meant to automatically implement and test secure logic styles against DPA attacks. Supplementary tools highlight some of the unique Perl based functions developed to perform rapid statistical calculations for Mutual Information.

6.2.1 SCARF 1

The Side-Channel Analysis Research Framework (SCARF) [95] was created to address the complexities of designing a software infrastructure capable of managing both simulation and measurement data related to side channel research. The ultimate goal is to compute relevant metrics and create human readable tables and graphical plots relating to cryptographic side channel analysis research.

The core framework of SCARF is written in Ruby. It uses the R language for additional statistical computation and Redis as an in-memory database solution. The source code is freely available as a gem online [96].

For interested researchers, the software framework provides the following:

Raw Trace Analysis: Allows plotting, inspection, aggregation, permutation and statistical analysis of raw data.

Leakage Modeling: Reusable functions included in the core are Hamming Weight and Hamming Distance.

Statistical Analysis: Reusable functions include standard statistical methods such as mean, standard deviation and correlation coefficient.

Side Channel Attack Models: The framework provides a flexible workflow that allows data to be processed in chunks across many phases of complex attack processes. The tool allows for arbitrary attack processes as defined by the user.

Post Processing: Provides the end user with human readable tabular and graphical data highlighting various metrics that are used to characterize the effectiveness of a given attack.

1 Research work discussed in this section is collaborative work with Greg Mefford; additional details can be found in [95].

The online gem also includes fully functioning implementations of two flows: a simple NLFSR block cipher and a simplified version of DES.

6.2.2 Secure Cell Logic Synthesis Methodology 2

There is an open need for design methodologies capable of developing cryptographic circuits using the SDMLp Logic Family. The methodology proposed in [97] uses Binary Decision Diagram (BDD) Logic Synthesis [25] for Pass Transistors to achieve efficient results, and consists of:

• A bottom-up design methodology from RTL to GDS;

• Symmetric and area efficient layout in 90nm technology; and

• An SDMLp Standard Cell Library developed in 90nm technology.

Generation of the SDMLp Standard Cell Library followed the standard ASIC design flow, limiting the available cells to those that are readily replaced by SDMLp cells, namely NAND, NOR, AND, OR, XOR and XNOR. Timing and power characterization was done using the standard Synopsys Liberty NCX tool.

In conjunction with the logic synthesis ﬂow, a DPA Attack Methodology was implemented with the following features:

• A push button, Makefile and Perl based flow;

• A modular flow capable of implementing various DPA attacks; and

• A highly configurable, parameterized flow that makes it easy to implement DPA on any cryptographic logic.

2 Research work discussed in this section is collaborative work with Antar Singh; additional details can be found in [97].

The push button attack flow is a Makefile based tool that uses Perl and compiled C code to:

• Generate gate level spice netlist for the target circuit;

• Generate spice stimuli for the DPA attack;

• Run Nanosim simulation to grab current values;

• Perform the Attack; and

• Generate human readable graphics.

6.2.3 Supplementary

Since the objective of the High Level security methods was to provide any researcher or designer the ability to create secure FSMs all of the tools used in the FSM translation, encoding, statistics and metric evaluation were written in Perl. One of the intermediate programs (automatically created using the evaluation framework) is written in Python and uses a freely available Z3 API. The major tools and contributions include:

• fsm2sfsm.pl to convert FSMs in KISS or exl format to structurally secured versions.

• fsm2z3.pl to create a Python-based interface to the Z3 theorem prover for a given FSM under the specified HW and HD constraints that, when executed, returns the minimal encoding that satisfies those requirements.

• fsm2oracle.pl to generate a Perl based oracle of an FSM with a specific (optional) encoding that can be passed a stimulus file to generate the exact state and transition history (including Hamming values when an encoding is specified).

• fsm2verliog.pl to allow designers with Verilog toolchains to automatically convert FSMs to synthesizable Verilog code.

• fsm2graphics.pl to convert FSMs to various graphical formats suitable for publication.

• genFSM.pl to generate FSM benchmarks of any given size, interconnection load and self-loop percentage.

• fsmStats.pl to perform advanced statistical analysis on two data streams (actual and side channel measurements) without the need for external mathematics software (Matlab, Octave, etc.) or custom libraries (GSL, BLAS, BLITZ, etc.).
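The sketch below illustrates the kind of search that fsm2z3.pl hands to Z3. It finds a smallest-width encoding by brute-force backtracking instead of an SMT solver, which is a deliberate substitution. It assumes the HW constraint means every state code has the same Hamming Weight. It also assumes the HD constraint means every transition between distinct states has the same Hamming Distance; self-loops always have HD 0 and are skipped. The function names are hypothetical, and this is not the tool's actual formulation.

```python
from itertools import combinations
from math import ceil, log2
from typing import Dict, List, Optional, Sequence, Tuple

Edge = Tuple[str, str]


def codes_with_weight(width: int, weight: int) -> List[int]:
    """All width-bit codes with exactly `weight` bits set."""
    return [sum(1 << b for b in bits) for bits in combinations(range(width), weight)]


def _assign(remaining: List[str], transitions: List[Edge], pool: List[int],
            dist: int, partial: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Backtracking: give each state a distinct code so every transition has HD == dist."""
    if not remaining:
        return dict(partial)
    state = remaining[0]
    used = set(partial.values())
    for code in pool:
        if code in used:
            continue
        partial[state] = code
        ok = all(bin(partial[a] ^ partial[b]).count("1") == dist
                 for a, b in transitions if a in partial and b in partial)
        if ok:
            result = _assign(remaining[1:], transitions, pool, dist, partial)
            if result is not None:
                return result
        del partial[state]
    return None


def find_encoding(states: Sequence[str], edges: Sequence[Edge],
                  max_width: int = 8) -> Optional[Tuple[int, Dict[str, int]]]:
    """Smallest width with a constant-HW, constant-HD encoding, or None."""
    transitions = [(a, b) for a, b in edges if a != b]  # self-loops always have HD 0
    start = max(1, ceil(log2(len(states))))
    for width in range(start, max_width + 1):
        for weight in range(width + 1):
            pool = codes_with_weight(width, weight)
            if len(pool) < len(states):
                continue
            for dist in range(1, width + 1):
                assignment = _assign(list(states), transitions, pool, dist, {})
                if assignment is not None:
                    return width, assignment
    return None


if __name__ == "__main__":
    states = ["A", "B", "C"]
    edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "A")]
    width, codes = find_encoding(states, edges)
    print(width, {s: format(c, f"0{width}b") for s, c in codes.items()})
```

Brute force is only practical for tiny machines, which is presumably why the actual flow relies on an SMT solver.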

These tools, as well as the command line shell scripts that automate their use, are freely available online under Creative Commons Attribution and Share-Alike licensing [79].

Chapter 7

Conclusions and The Path Forward

This chapter provides overarching conclusions and contributions for the three main areas discussed within this dissertation. The chapter also lays out potential avenues for continued work, ranging from low level cell and technology exploration to FSM and high level research areas, including some topics quite removed from the discussion presented earlier. The chapter concludes with final thoughts on the high level approach to side channel security, in particular opinions about the direction and effort that this research could take.

7.1 Conclusions & Contributions

This work provided both low level and high-level solutions to side channel attack resistance while using a novel set of tools. At the low level the focus was on mitigating current variations of a single universal cell and then scaling it to larger circuits:

• A universal cell using pass-transistors was defined, tested and validated. Its design costs approach those of SCMOS logic styles, and its side channel leakage is orders of magnitude lower than that of other secure logic styles.

• A modified BDD-based logic synthesis tool was presented to improve the layout efficiency of the secure cell. The tool was tested by synthesizing the DES encryption algorithm using three logic styles, again showing the secure cell's cost reduction over other secure logic styles.

• The effects of temperature and threshold voltage (Vth) were characterized, showing that devices, while requiring more power at high temperature, are more secure, since the data-dependent power is masked by the increased leakage power.

At the high level, the focus was on decoupling Hamming models from the underlying functionality of a device:

• The concept of a structurally secured FSM was introduced and shown to be the only way to remove the leakage of Hamming Distance information from an FSM.

• Secure encodings were defined and implemented using an SMT solver. Their cost, both in bits and in power, motivated alternative strategies.

• Modified encoding strategies for structurally relaxed FSMs were introduced, showing a trade-off between layout area and power at the cost of security.

• Finally, constrained encodings were applied to both standard and structurally secured machines, allowing power savings and security improvements for both secure and insecure circuits at the cost of area.
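The property that a secure encoding targets can be checked by hand on a tiny example. The sketch below is only an illustration: the dissertation implements secure encodings with an SMT solver, whereas here a brute-force search is swapped in, which is practical only for very small machines. It assumes the machine has already been restructured so that it contains no self-loops (a self-loop always has HD 0), and it searches for the narrowest injective encoding in which every transition toggles the same number of bits, so that a Hamming distance model observes one constant value. The function name constant_hd_encoding and the max_width bound are hypothetical.

```python
from itertools import permutations
from math import ceil, log2
from typing import Dict, List, Optional, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of bits that differ between codes a and b."""
    return bin(a ^ b).count("1")


def constant_hd_encoding(states: List[str], edges: List[Tuple[str, str]],
                         max_width: int = 6
                         ) -> Optional[Tuple[int, Dict[str, int]]]:
    """Return (width, encoding) for the narrowest encoding in which every
    edge has the same Hamming distance, or None if none exists up to
    max_width bits. Exhaustive search over injective code assignments."""
    if any(src == dst for src, dst in edges):
        raise ValueError("self-loops always have HD 0; restructure them first")
    start = max(1, ceil(log2(len(states))))
    for width in range(start, max_width + 1):
        for codes in permutations(range(2 ** width), len(states)):
            enc = dict(zip(states, codes))
            distances = {hamming_distance(enc[s], enc[d]) for s, d in edges}
            if len(distances) == 1:  # every transition toggles equally
                return width, enc
    return None


if __name__ == "__main__":
    ring = [("A", "B"), ("B", "C"), ("C", "A")]
    print(constant_hd_encoding(["A", "B", "C"], ring))
```

For the three-state ring, two bits are not enough: any three distinct 2-bit codes include two adjacent pairs (HD 1) and one diagonal pair (HD 2). The search therefore returns a 3-bit encoding, illustrating the bit-width cost that, per the conclusions above, motivated alternative strategies.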

7.2 Future Work

As with any research work, every question answered spawns other questions and potential research avenues. Some of the following areas of research are simply extensions of the work already done by the author and several master's students, while others are tangential to or completely removed from any existing work. Interested future researchers are cautioned to read the Final Thoughts section before investing time in the following extension-based work.

7.2.1 Low Level Methods and Logic Synthesis

• Identify any early propagation effects on SDMLp cells and circuits [98].

• Determine the root cause of the non-linear scaling properties of SDMLp cells within circuits.

• Reduce leakage by adjusting the time delay between the input signals, which would require many additional constraints within the circuit design [99].

• Complete a parametric evaluation of SDMLp cells in the presence of process and environmental variation, especially at smaller technology feature sizes and extreme temperatures (e.g. the full temperature range from -40 °C to 120 °C).

• Develop a dual-rail technique for routing SDMLp circuits.

• Perform a hardware attack on an SDMLp implementation of a DES or AES circuit to strengthen the result.

• Perform a software attack in smaller technologies (45nm, 32nm, etc.) to evaluate the effect of leakage power on the resistance of SDMLp against DPA attacks.

7.2.2 Automata Based Side Channel Protection

While this work has demonstrated the feasibility of protecting secure automata and mitigating their cost, it also opens avenues for further study and research. Having demonstrated mutual information (MI) to be a useful metric, future work can pursue “real-time” security synthesis using high-level methods.

• Investigate the feasibility of splitting a design's data path and control logic in order to apply security in two distinct parts: low-level SDMLp cells for the data path and S*FSMs for the control logic.

• Develop an optimization technique for Npeak selection based on a weighted function of other design parameters, including area and power.

• Characterize the impact of reset states on restructuring, on secure encodings, and on security if the HD constraint to the reset state is relaxed.

• The current benchmark generation does not approximate real-world circuits; a finer-grained benchmark generation system and a complete push-button flow would be beneficial.

• Real-time feedback, e.g. re-encoding based on synthesis results, can enhance the methodology for “automated” use (e.g. if power exceeds specified limits, re-encode with the given constraints).

• Add mutual information as a synthesis constraint. This requires the synthesis flow to generate the oracle and apply circuit stimulus. Because the flow cannot assume anything about the structure of the underlying system, treating the model as constant is not valid; MI must instead be evaluated as a real-time synthesis check.

• Validate and migrate the mutual information metric to low-level designs. Existing solutions are “all or nothing” approaches. The solutions presented in this work, including oracle generation and MI, can be used for selective or iterative evaluation of critical security paths and for targeted cell replacement.

• Validate the impact and overhead of using secure logic styles in conjunction with the high-level methodology.
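Using MI as a metric, whether as a synthesis check or for low-level evaluation, requires estimating it from paired samples: a model value from the oracle (e.g. the Hamming distance of each transition) and a measured or simulated side channel value. A minimal plug-in estimator for two discrete sequences is sketched below. It assumes the side channel samples have already been quantized into bins; it is not the estimator implemented in fsmStats.pl, and the function name mutual_information is hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def mutual_information(model: Sequence[Hashable],
                       leakage: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(model; leakage) in bits from paired samples."""
    if len(model) != len(leakage) or not model:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(model)
    joint = Counter(zip(model, leakage))  # counts of (x, y) pairs
    count_x = Counter(model)              # marginal counts of x
    count_y = Counter(leakage)            # marginal counts of y
    mi = 0.0
    for (x, y), c_xy in joint.items():
        p_xy = c_xy / n
        # p(x,y) / (p(x) p(y)) expressed with counts: c_xy * n / (c_x * c_y)
        mi += p_xy * log2(c_xy * n / (count_x[x] * count_y[y]))
    return mi


if __name__ == "__main__":
    hd_trace = [1, 1, 2, 1, 0, 2, 1, 0]      # hypothetical oracle HD values
    power_bins = [1, 1, 2, 1, 0, 2, 1, 0]    # hypothetical binned power
    print(mutual_information(hd_trace, power_bins))
```

A value near zero suggests that the model explains little of the measurement, while a value close to the entropy of the model suggests the model is fully recoverable from the side channel. Plug-in estimates are biased upward for small sample sizes, so a real-time synthesis check of this kind would need enough stimulus to make the estimate meaningful.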

7.2.3 System Level Security through Duplication

While the system-level duplication method shows promise, an open problem remains: preventing attacks that exploit data duplication. Recall that in the current configuration, the use of a single common key would open a large security vulnerability. Imagine a single key controlling all the crypto-devices within the system: a relatively simple attack would load all the data registers with the same value. Rather than masking the Hamming values, the duplication would then exaggerate them. For this high-level method to work, the following enhancements could be assessed:

• Use of one of the duplicated modules with a randomized key acting on the same input data stream.

• Use of a duplicate module with the same key and randomized data.

• Generation and use of unique sub-keys rather than the same master key in all the duplicate instantiations.

• Modification of the rotating key mechanism already in place in many key-hopping algorithms.

• Validation of the impact and overhead of using secure logic styles (SDMLp) in conjunction with the system-level methodology.
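The amplification problem described at the start of this subsection, and why randomized data in the duplicates helps, can be made concrete with a simple worked model. Assume, for illustration only, a Hamming weight power model, N duplicated modules whose w-bit data registers are r_1, ..., r_N, and attacker-chosen data v in the target module r_1. The total data-dependent power is then:

```latex
P = \sum_{i=1}^{N} \mathrm{HW}(r_i)

\text{Same value in every register } (r_i = v):\quad
P = N\,\mathrm{HW}(v), \qquad
\mathrm{Var}(P) = N^{2}\,\mathrm{Var}\big(\mathrm{HW}(v)\big)

\text{Independent, uniformly random } r_2,\dots,r_N:\quad
P = \mathrm{HW}(v) + \sum_{i=2}^{N} \mathrm{HW}(r_i), \qquad
\mathrm{Var}\Big(\sum_{i=2}^{N} \mathrm{HW}(r_i)\Big) = (N-1)\,\frac{w}{4}
```

In the first case the attacker's signal is scaled by N. In the second, the data-dependent term HW(v) is unchanged while the duplicates contribute (N-1)w/4 of variance that is independent of v (each HW(r_i) of a uniform w-bit value is binomial with variance w/4), which acts as noise from the attacker's point of view.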

7.2.4 Attacks

To develop effective countermeasures, investigating existing vulnerabilities is critical. Of particular interest are the following attacks on specific modules, as well as attack methods:

• Side Channel leakage of hardware-based random number generators.

• Side Channel leakage of Trusted Platform Modules.

• Data-fusion techniques for advanced attacks.

• Attacks on SoCs using standard interfaces with modiﬁed/hijacked protocols.

7.3 Beyond the Dissertation

Several intriguing possibilities exist beyond the scope of this dissertation in the realm of side channel attacks or, more generally, information leakage. If a side channel security metric is ever to become as common and accessible to hardware designers as speed, layout area, or power consumption, the methods and topics discussed within this dissertation are only the first step. While each of the methods holds merit independently, the binding between the various abstraction levels does not currently allow for the evaluation and propagation of a unified security metric for a given hardware device. Open research questions include how metrics depend on the abstraction level, how they can be unified, and how they can be passed interchangeably to higher and lower levels of abstraction. These ideas could also lead to efficient mechanisms for the automated detection of “critical security paths.”

Also of considerable interest is the rapid detection and classification of the information that exists within a signal. Previous work in the area of medical imaging and informatics, specifically fractal dimension analysis [100–103], could yield interesting results here. Regardless of the underlying mechanism, the natural extension is side channel self-monitoring, either by the device itself or by an oracle sitting outside the device. Such monitors would aid in mitigating active attacks on a system in which input parameters (possibly transparent to the device, e.g. temperature) are modified to elicit the leakage of information.

7.4 Final Thoughts

The main topic of discussion throughout this dissertation has been the relationship between side channels and their models. At low levels, methods to remove the data dependence of the side channel were explored, focusing mostly on the SDML universal cell using pass-transistor logic. The use of the dual-routed cell in circuits was optimized by mapping gate subsets onto binary decision diagrams and then applying custom minimizations relevant to dual-complementary logic styles. Additionally, low-level parametric variations of physical properties (process- and temperature-based) were investigated as a means of mitigating information leakage. At high levels, secure FSM approaches were investigated to allow any designer to remove the relationship between the model and the side channel. The theory was then extended beyond removal of this relationship to its mitigation, allowing for power-driven security compromises.

All of the solutions presented assume isolated systems. In reality, as IP blocks are integrated into larger hardware circuits and SoCs, this assumption will hold less often (except in ultra-low power circuits). This leads to two main research thrusts: first, the detection of information within large circuits, and second, the attackability of a target IP circuit through active attacks on remote IPs. The first of these problems likely involves data fusion in more than two dimensions. While current attacks use a single side channel and a single side channel model, the author suspects that any useful, compromising attack on an embedded system will require, at a minimum, a triangulation of multiple data sets. The second of these problems is the more probable real-world attack vector. Consider a complex system such as a mobile processor: inputs to the processor travel on shared communication buses and, once on the bus, act with certain privileges. Abuse of these direct inputs could easily fool the system into releasing sensitive information. While these ideas are not concrete, they provide interested researchers a starting point in novel areas of hardware security.

Bibliography

[1] “Tempest: A Signal Problem,” Cryptologic Spectrum, vol. 2, no. 3, pp. 26–30, 1972.

[2] J. Rossen and J. Davis, “Police admit they’re ’stumped’ by mystery car thefts,” in Today News, 2013.

[3] M. Geuss, “After burglaries, mystery car unlocking device has police stumped,” in Ars Technica, 2013.

[4] D. Agrawal, B. Archambeault, J. R. Rao, and P. Rohatgi, “The EM side-channel(s),” in Revised Papers from the 4th International Workshop on Cryptographic Hardware and Embedded Systems, ser. CHES ’02. London, UK: Springer-Verlag, 2003, pp. 29–45.

[5] P. C. Kocher, “Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems,” in Proceedings of the 16th Annual International Cryptology Conference on Advances in Cryptology, ser. CRYPTO ’96. London, UK: Springer-Verlag, 1996, pp. 104–113.

[6] C. Clavier, J.-S. Coron, and N. Dabbous, “Differential Power Analysis in the Presence of Hardware Countermeasures,” in Proceedings of the Second International Workshop on Cryptographic Hardware and Embedded Systems, ser. CHES ’00. London, UK: Springer-Verlag, 2000, pp. 252–263.

[7] C. K. Koc, Cryptographic Engineering. Springer US, 2009.

[8] U.S. Department of Commerce and NIST, “Data encryption standard,” FIPS, October 1999.

[9] H. Feistel, Cryptography and Computer Privacy. Scientific American, 1973.

[10] R. Silverman, “A cost-based security analysis of symmetric and asymmetric key lengths,” RSA, Tech. Rep., 2001.

[11] A. Kerckhoffs, “La cryptographie militaire,” Journal des sciences militaires, vol. IX, pp. 5–83, 161–191, January, February 1883.

[12] C. E. Shannon, “Communication theory of secrecy systems,” Bell system technical journal, 1949.


[13] W. Einthoven, “The string galvanometer and the human electrocardiogram,” in KNAW.

[14] J. Rabaey, Digital Integrated Circuits: A Design Perspective, ser. Prentice Hall electronics and VLSI series. Prentice Hall, 1996.

[15] S. Mangard, E. Oswald, and T. Popp, Power Analysis Attacks: Revealing the Secrets of Smart Cards (Advances in Information Security). Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2007.

[16] K. Tiri, M. Akmal, and I. Verbauwhede, “A Dynamic and Differential CMOS Logic with Signal Independent Power Consumption to Withstand Differential Power Analysis on Smart Cards,” in Solid-State Circuits Conference, 2002. ESSCIRC 2002. Proceedings of the 28th European, 2002, pp. 403–406.

[17] K. Tiri and I. Verbauwhede, “Dynamic and differential CMOS logic with signal-independent power consumption to withstand differential power analysis,” Patent US7417468.

[18] V. Sundaresan, “Architectural Synthesis Techniques for Design of Correct and Secure ICs,” Thesis, 2008.

[19] N. S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J. S. Hu, M. J. Irwin, M. Kandemir, and V. Narayanan, “Leakage current: Moore’s law meets static power,” Computer, 2003.

[20] P. Kocher, J. Jaffe, and B. Jun, “Differential Power Analysis,” in Advances in Cryptology - CRYPTO ’99, ser. Lecture Notes in Computer Science, M. Wiener, Ed. Springer Berlin / Heidelberg, 1999, vol. 1666, pp. 789–789.

[21] P. Schaumont and K. Tiri, “Masking and dual-rail logic don’t add up,” in Cryptographic Hardware and Embedded Systems - CHES 2007, ser. Lecture Notes in Computer Science, P. Paillier and I. Verbauwhede, Eds. Springer Berlin / Heidelberg, 2007, vol. 4727, pp. 95–106.

[22] K. Tiri, D. Hwang, A. Hodjat, B.-C. Lai, S. Yang, P. Schaumont, and I. Verbauwhede, “Prototype IC with WDDL and differential routing - DPA resistance assessment,” in Cryptographic Hardware and Embedded Systems CHES 2005, ser. Lecture Notes in Computer Science, J. Rao and B. Sunar, Eds. Springer Berlin / Heidelberg, 2005, vol. 3659, pp. 354–365.

[23] L. N. Ramakrishnan, M. Chakkaravarthy, A. S. Manchanda, M. Borowczak, and R. Vemuri, “SDMLp: On the use of Complementary Pass Transistor Logic for Design of DPA Resistant Circuits,” in International Symposium on Hardware-Oriented Security and Trust (HOST), 2012.

[24] L. N. Ramakrishnan, “SDMLp - secure differential multiplexer logic: Logic design for DPA-resistant cryptographic circuits,” Thesis, University of Cincinnati, 2011.

[25] M. Chakkaravarthy, “BDD based synthesis flow for design of DPA resistant cryptographic circuits,” Thesis, University of Cincinnati, 2012.

[26] K. Tiri and I. Verbauwhede, “Charge recycling sense amplifier based logic: securing low power security ICs against DPA [differential power analysis],” in Solid-State Circuits Conference, 2004. ESSCIRC 2004. Proceeding of the 30th European, sept. 2004, pp. 179–182.

[27] A. Moradi, M. Khatir, M. Salmasizadeh, and M. Manzuri Shalmani, “Charge recovery logic as a side channel attack countermeasure,” in Quality of Electronic Design, 2009. ISQED 2009. Quality Electronic Design, march 2009, pp. 686 –691.

[28] J. Zeng, Y. Wang, C. Xu, and R. Li, “Improvement on masked S-box hardware implementation,” in 2012 International Conference on Innovations in Information Technology (IIT), 2012.

[29] Y.-J. Baek and M.-J. Noh, DPA-Resistant Finite Field Multipliers and Secure AES Design. Springer Berlin Heidelberg, 2006.

[30] E. Oswald, S. Mangard, N. Pramstaller, and V. Rijmen, “A side-channel analysis resistant description of the AES S-box,” in Fast Software Encryption, 2005, pp. 413–423.

[31] E. Oswald and S. Mangard, Counteracting Power Analysis Attacks by Masking. Springer US, 2010.

[32] N. Mentens, B. Gierlichs, and I. Verbauwhede, “Power and fault analysis resistance in hardware through dynamic reconfiguration,” Cryptographic Hardware and . . . , 2008.

[33] A.-T. Hoang and T. Fujino, “Intra-masking dual-rail memory on LUT implementation for tamper-resistant AES on FPGA,” 2012.

[34] Q. Li, S.-M. Koo, M. D. Edelstein, J. S. Suehle, and C. A. Richter, “Silicon nanowire electromechanical switches for logic device application,” Nanotechnology, vol. 18, no. 31, p. 315202, 2007.

[35] J.-P. Pegourie, “Pneumatic logic circuits and their integrated circuits,” Sep. 6 1977.

[36] S. C. Duncan, “Minecraft, beyond construction and survival,” Well Played, vol. 1, no. 1, pp. 1–22, Jan. 2011.

[37] C. Wingrave, J. Norton, C. Ross, N. Ochoa, S. Veazanchin, E. Charbonneau, and J. LaViola, “Inspiring creative constructivist play,” in Proceedings of the 2012 ACM annual conference extended abstracts on Human Factors in Computing Systems Extended Abstracts, ser. CHI EA ’12. New York, NY, USA: ACM, 2012, pp. 2339–2344.

[38] A. Okamoto, K. Tanaka, and I. Saito, “DNA logic gates,” Journal of the American Chemical Society, vol. 126, no. 30, pp. 9458–9463, 2004.

[39] S. Yang, W. Wolf, N. Vijaykrishnan, D. N. Serpanos, and Y. Xie, “Power attack resistant cryptosystem design: A dynamic voltage and frequency switching approach,” in Proceedings of the conference on Design, Automation and Test in Europe - Volume 3, ser. DATE ’05. Washington, DC, USA: IEEE Computer Society, 2005, pp. 64–69.

[40] D. Suzuki, M. Saeki, and T. Ichikawa, “Random switching logic: A new countermeasure against DPA and second-order DPA at the logic level,” IEICE Trans. Fundam. Electron. Commun. Comput. Sci., vol. E90-A, no. 1, pp. 160–168, Jan. 2007.

[41] K. Tiri and I. Verbauwhede, “A logic level design methodology for a secure DPA resistant ASIC or FPGA implementation,” 2004, pp. 246–251.

[42] V. Sundaresan, S. Rammohan, and R. Vemuri, “Power Invariant Secure IC Design Methodology Using Reduced Complementary Dynamic and Differential Logic,” in Very Large Scale Integration, 2007. VLSI - SoC 2007. IFIP International Conference on, 2007, pp. 1–6.

[43] K. Tiri and I. Verbauwhede, “Place and route for secure standard cell design,” in Smart Card Research and Advanced Applications VI, ser. IFIP International Federation for Information Processing, J.-J. Quisquater, P. Paradinas, Y. Deswarte, and A. El Kalam, Eds. Springer Boston, 2004, vol. 153, pp. 143–158.

[44] G. Paul, S. Pradhan, A. Pal, and B. Bhattacharya, “Low power BDD-based synthesis using dual rail static DCVSPG logic,” in Circuits and Systems, 2006. APCCAS 2006. IEEE Asia Pacific Conference on, dec. 2006, pp. 1504–1507.

[45] F. Somenzi, “CUDD: CU decision diagram package,” 2000.

[46] R. E. Bryant, “Graph-based algorithms for boolean function manipulation,” IEEE Trans. Comput., vol. 35, no. 8, pp. 677–691, aug 1986.

[47] R. Junee, “Power analysis attacks :: A weakness in cryptographic smart cards and microprocessors,” Master’s thesis, Bachelor of Computer Engineering and Bachelor of Commerce, November 2002.

[48] K. Tanimura and N. Dutt, “Exccel: Exploration of complementary cells for efficient DPA attack resistivity,” in Hardware-Oriented Security and Trust (HOST), 2010 IEEE International Symposium on, june 2010, pp. 52–55.

[49] A. Vijaykumar, “DPA resistance of cryptographic circuits considering temperature and process variations,” Master’s thesis, University of Cincinnati, 2012.

[50] R. F. Pierret, Semiconductor device fundamentals. Pearson Education India, 1996.

[51] R. Kumar and V. Kursun, “Modeling of temperature effects on nano-CMOS devices with the predictive technologies,” in Circuits and Systems, 2007. MWSCAS 2007. 50th Midwest Symposium on. IEEE, 2007, pp. 694–697.

[52] G. Chindalore, S. Mudanai, W.-K. Shih, A. Tasch Jr, and C. Maziar, “Temperature dependence characterization of effective electron and hole mobilities in the accumulation layers of n- and p-type MOSFET’s,” Electron Devices, IEEE Transactions on, vol. 46, no. 6, pp. 1290–1294, 1999.

[53] J.-C. Sun, Y. Taur, R. H. Dennard, and S. P. Klepner, “Submicrometer-channel CMOS for low-temperature operation,” Electron Devices, IEEE Transactions on, vol. 34, no. 1, pp. 19–27, 1987.

[54] R. Kumar and V. Kursun, “Reversed temperature-dependent propagation delay characteristics in nanometer CMOS circuits,” Circuits and Systems II: Express Briefs, IEEE Transactions on, vol. 53, no. 10, pp. 1078–1082, 2006.

[55] B. Zeghbroeck, Principles of semiconductor devices and heterojunctions. Paperback- Nov, 2008, vol. 25.

[56] M. TB003, “An introduction to keeloq code hopping,” Microchip Technology Inc., DS91002A, 1996.

[57] C. Keeloq, “source code by ruptor,” See http://cryptolib. com/ciphers.

[58] V. Alagar and K. Periyasamy, “Extended finite state machine,” in Specification of Software Systems, ser. Texts in Computer Science. Springer London, 2011, pp. 105–128.

[59] Z. Jiang, M. Pajic, S. Moarref, R. Alur, and R. Mangharam, “Modeling and verification of a dual chamber implantable pacemaker,” in Tools and Algorithms for the Construction and Analysis of Systems. Springer, 2012, pp. 188–203.

[60] Z. Jiang, M. Pajic, and R. Mangharam, “Cyber–physical modeling of implantable cardiac medical devices,” Proceedings of the IEEE, vol. 100, no. 1, pp. 122–137, 2012.

[61] M. Borowczak and R. Vemuri, “S*FSM: A Paradigm Shift for Attack Resistant FSM Designs and Encodings,” in Redefining and Integrating Security Engineering, 2012. RISE 2012. ASE International Conference on cyber security, dec. 2012, pp. 651–655.

[62] I. Verbauwhede, Secure Integrated Circuits and Systems, ser. Integrated Circuits and Systems. Springer London, Limited, 2010.

[63] F. Macé, F.-X. Standaert, and J.-J. Quisquater, “Information theoretic evaluation of side-channel resistant logic styles,” in Cryptographic Hardware and Embedded Systems - CHES 2007, ser. Lecture Notes in Computer Science, P. Paillier and I. Verbauwhede, Eds. Springer Berlin / Heidelberg, 2007, vol. 4727, pp. 427–442.

[64] K. Kulikowski, A. Smirnov, and A. Taubin, “Automated design of cryptographic devices resistant to multiple side-channel attacks,” in Workshop on Cryptographic Hardware and Embedded Systems, 2006, pp. 339–413.

[65] J. Golić and C. Tymen, “Multiplicative masking and power analysis of AES,” in Cryptographic Hardware and Embedded Systems - CHES 2002, ser. Lecture Notes in Computer Science, B. Kaliski, Koetin, and C. Paar, Eds. Springer Berlin / Heidelberg, 2003, vol. 2523, pp. 31–47.

[66] M. Koegst, G. Franke, and K. Feske, “State assignment for FSM low power design,” in Design Automation Conference, 1996, with EURO-VHDL ’96 and Exhibition, Proceedings EURO-DAC ’96, European, 1996.

[67] C. Cao and B. C. Oelmann, “The analysis of power-related characteristics of FSM benchmarks,” Circuits and Systems, 2007. MWSCAS 2007. 50th Midwest Symposium on, 2007.

[68] M. A. Pasha, S. Derrien, and O. Sentieys, “Ultra low-power FSM for control oriented applications,” Circuits and Systems, 2009. ISCAS 2009. IEEE International Symposium on, 2009.

[69] K. Akdemir and B. Sunar, “Generic approach for hardening state machines against strong adversaries,” Computers & Digital Techniques, IET, 2010.

[70] Z. Wang and M. Karpovsky, “Robust FSMs for cryptographic devices resilient to strong fault injection attacks,” On-Line Testing Symposium (IOLTS), 2010 IEEE 16th International, 2010.

[71] H. Moradmand and A. Payandeh, “Secure finite state integer arithmetic codes,” in Advanced Technologies for Communications (ATC), 2011 International Conference on, 2011.

[72] T. Li, W. Zhang, and Z. Yu, “Full-chip leakage analysis in nano-scale technologies: Mechanisms, variation sources, and verification,” in Design Automation Conference, 2008. DAC 2008. 45th ACM/IEEE, 2008.

[73] R. Shen, S. X. D. Tan, and H. Yu, Statistical Performance Analysis and Modeling Techniques for Nanometer VLSI Designs. Springer, 2012.

[74] M. Dietrich and J. Haase, Process Variations and Probabilistic Integrated Circuit Design. Springer, 2011.

[75] J. Johansson and J. Forskitt, System Designs into Silicon. Taylor & Francis, 1993.

[76] J. Hennessy, D. Patterson, and A. Arpaci-Dusseau, Computer architecture: a quantitative approach, ser. The Morgan Kaufmann Series in Computer Architecture and Design. Morgan Kaufmann, 2007, no. v. 1.

[77] T.-Y. Yeh and Y. N. Patt, “Two-level adaptive training branch prediction,” in Proceedings of the 24th annual international symposium on Microarchitecture, ser. MICRO 24. New York, NY, USA: ACM, 1991, pp. 51–61.

[78] D. Grune and C. Jacobs, Parsing techniques: a practical guide, ser. Monographs in computer science. Springer, 2008.

[79] M. Borowczak, “SFSM,” in Github, 2013.

[80] L. De Moura and N. Bjørner, “Z3: An efficient SMT solver,” Tools and Algorithms for the Construction and Analysis of Systems, pp. 337–340, 2008.

[81] L. Jozwiak, D. Gawlowski, and A. Slusarczyk, “An effective solution of benchmarking problem: FSM benchmark generator and its application to analysis of state assignment methods,” in Digital System Design, 2004. DSD 2004. Euromicro Symposium on, aug.-3 sept. 2004, pp. 160–167.

[82] S. Yang, “Logic Synthesis and Optimization Benchmarks User Guide Version 3.0,” 1991.

[83] E. Olson and S. Kang, “State assignment for low-power FSM synthesis using genetic local search,” in Custom Integrated Circuits Conference, 1994., Proceedings of the IEEE 1994, may 1994, pp. 140 –143.

[84] F. Gao and J. Hayes, “ILP-based optimization of sequential circuits for low power,” in Low Power Electronics and Design, 2003. ISLPED ’03. Proceedings of the 2003 International Symposium on, aug. 2003, pp. 140 – 145.

[85] C. Cao and B. Oelmann, “Mixed synchronous/asynchronous state memory for low power FSM design,” in Digital System Design, 2004. DSD 2004. Euromicro Symposium on, aug.-3 sept. 2004, pp. 363–370.

[86] L. Yuan, G. Qu, T. Villa, and A. Sangiovanni-Vincentelli, “FSM re-engineering and its application in low power state encoding,” in Design Automation Conference, 2005. Proceedings of the ASP-DAC 2005. Asia and South Pacific, vol. 1, jan. 2005, pp. 254–259 Vol. 1.

[87] S.-H. Huang, C.-M. Chang, and Y.-T. Nieh, “State Re-Encoding for Peak Current Minimization,” in Computer-Aided Design, 2006. ICCAD ’06. IEEE/ACM International Conference on, nov. 2006, pp. 33–38.

[88] Y. Lee and T. Kim, “State encoding algorithm for peak current minimisation,” Computers Digital Techniques, IET, vol. 5, no. 2, pp. 113–122, march 2011.

[89] S. Mansouri and E. Dubrova, “An architectural countermeasure against power analysis attacks for FSR-Based stream ciphers,” Constructive Side-Channel Analysis and Secure Design, 2012.

[90] M. Goresky and A. Klapper, Algebraic Shift Register Sequences. Cambridge University Press, 2012.

[91] L. M. Paoletti, “Autodin,” Computer Communication Networks, 1973.

[92] N. Gupta and G. P. Biswas, “WEP implementation using linear feedback shift register (LFSR) and dynamic key,” in 2011 2nd International Conference on Computer and Communication Technology (ICCCT), 2011.

[93] B. Preneel, “A survey of recent developments in cryptographic algorithms for smart cards,” Computer Networks, 2007.

[94] W. Fischer, B. Gammel, O. Kniffler, and J. Velten, “Differential power analysis of stream ciphers,” Topics in Cryptology–CT-RSA 2007, 2006.

[95] G. Mefford, “Side channel analysis research framework,” Master’s thesis, University of Cincinnati, 2012.

[96] ——, “SCARF,” in Github, 2013.

[97] A. S. Manchanda, “Design methodology for differential power attack resistant circuits,” Master’s thesis, University of Cincinnati, 2013.

[98] K. Kulikowski, M. Karpovsky, and A. Taubin, “Power attacks on secure hardware based on early propagation of data,” in On-Line Testing Symposium, 2006. IOLTS 2006. 12th IEEE International, 0-0 2006, p. 6 pp.

[99] J. Domingo-Ferrer, J. Posegga, and D. Schreckling, Smart card research and advanced applications: 7th IFIP WG 8.8/11.2 International Conference, CARDIS 2006, Tarragona, Spain, April 19-21, 2006: proceedings, ser. Lecture notes in computer science. Springer, 2006.

[100] Y. Yu, S. Jingshan, L. Baoliang, and Y. Shixuan, “Signal feature extraction base on fractal dimensions of time-frequency domain,” Nature & Biologically Inspired Computing, 2009. NaBIC 2009. World Congress on, pp. 988–992, 2009.

[101] D. Easwaramoorthy and R. Uthayakumar, “Estimating the complexity of biomedical signals by multifractal analysis,” Students’ Technology Symposium (TechSym), 2010 IEEE, pp. 6–11, 2010.

[102] M. Mikhail, K. El-Ayat, R. El Kaliouby, J. Coan, and J. J. B. Allen, “Emotion detection using noisy EEG data,” in AH ’10: Proceedings of the 1st Augmented Human International Conference. ACM Request Permissions, Apr. 2010.

[103] H. Hassanpour and S. Anisheh, “An improved adaptive signal segmentation method using fractal dimension,” in Information Sciences Signal Processing and their Applications (ISSPA), 2010 10th International Conference on, 2010, pp. 720–723.

[104] O. Aciicmez, C. K. Koc, and J.-P. Seifert, “On the power of simple branch prediction analysis,” Cryptology ePrint Archive, Report 2006/351, 2006.

[105] Z. Wang and M. Karpovsky, “Robust FSMs for cryptographic devices resilient to strong fault injection attacks,” in On-Line Testing Symposium (IOLTS), 2010 IEEE 16th International, july 2010, pp. 240–245.

[106] H. Bar-El, H. Choukri, D. Naccache, M. Tunstall, and C. Whelan, “The sorcerer’s apprentice guide to fault attacks,” Proceedings of the IEEE, vol. 94, no. 2, pp. 370–382, feb 2006.

[107] S. Mangard, “Masked dual-rail pre-charge logic: DPA-resistance without routing constraints,” in Systems – CHES 2005, 7th International Workshop. Springer, 2005, pp. 172–186.

[108] C. Paar, P. Fleischmann, and P. Roelse, “Efficient multiplier architectures for Galois fields GF(2^{4n}),” IEEE Transactions on Computers, vol. 47, pp. 162–170, 1998.

[109] E. D. Mastrovito, “VLSI designs for multiplication over finite fields GF(2^m),” in Proceedings of the 6th International Conference, on Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, ser. AAECC-6. London, UK: Springer-Verlag, 1989, pp. 297–309.

[110] MATLAB, version 7.12.0 (R2011a). Natick, Massachusetts: The MathWorks Inc., 2011.

[111] AES, “Advanced encryption standard,” in FIPS PUB 197, Federal Information Processing Standards Publication, 2001.

[112] N. Kamoun, L. Bossuet, and A. Ghazel, “Correlated power noise generator as a low cost DPA countermeasures to secure hardware AES cipher,” in Signals, Circuits and Systems (SCS), 2009 3rd International Conference on, 2009.

[113] W. Einthoven, “The string galvanometer and the human electrocardiogram,” KNAW, 1903.

[114] N. Bjørner, “Taking satisfiability to the next level with z3,” in Automated Reasoning, ser. Lecture Notes in Computer Science, B. Gramlich, D. Miller, and U. Sattler, Eds. Springer Berlin Heidelberg, 2012, vol. 7364, pp. 1–8.

[115] S. Chari, C. S. Jutla, J. R. Rao, and P. Rohatgi, “Towards sound approaches to counteract power-analysis attacks.” Springer-Verlag, 1999, pp. 398–412.

[116] S. Yang, W. Wolf, N. Vijaykrishnan, D. Serpanos, and Y. Xie, “Power attack resistant cryptosystem design: a dynamic voltage and frequency switching approach,” in Design, Automation and Test in Europe, 2005. Proceedings, 2005, pp. 64–69 Vol. 3.

[117] A. J. Menezes, P. C. V. Oorschot, S. A. Vanstone, and R. L. Rivest, “Handbook of applied cryptography,” 1997.

[118] C. Paar, P. Fleischmann, and P. Roelse, “Efficient Multiplier Architectures for Galois Fields GF(2^{4n}),” IEEE Transactions on Computers, vol. 47, pp. 162–170, 1998.

[119] E. D. Mastrovito, “VLSI Designs for Multiplication over Finite Fields GF(2^m),” in Proceedings of the 6th International Conference, on Applied Algebra, Algebraic Algorithms and Error-Correcting Codes. London, UK: Springer-Verlag, 1989, pp. 297–309.

[120] J. Rabaey, A. Chandrakasan, and B. Nikolić, Digital Integrated Circuits: a Design Perspective. Pearson Education, 2003.