
The Pennsylvania State University The Graduate School College of Engineering

ENERGY-EFFICIENT AND SECURE DESIGNS OF SPINTRONIC MEMORY:

TECHNIQUES AND APPLICATIONS

A Dissertation in Computer Science and Engineering by Anirudh Srikant Iyengar

© 2018 Anirudh Srikant Iyengar

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

August 2018

The dissertation of Anirudh Srikant Iyengar was reviewed and approved* by the following:

Swaroop Ghosh Assistant Professor of EE Dissertation Advisor and Chair of Committee

Trent Jaeger Professor of CSE

Saptarshi Das Assistant Professor of ESM

Vijaykrishnan Narayanan Professor of CSE

Chitaranjan Das Head of the Department of CSE

*Signatures are on file in the Graduate School


Abstract

With increased integration of technology in our lives, the arms race between chip manufacturers to provide the latest and greatest to entice the consumer only intensifies. A by-product of this growth is an ever-increasing demand for performance and efficiency. To address this problem, CMOS transistors have always been scaled to smaller nodes to ‘fit in’ more functionality as well as lower the overall energy footprint of the operation. However, scaling down the size of the transistor is becoming difficult, which in turn dramatically reduces the profit motive of such an endeavor. Additionally, several new challenges are emerging in integrated circuit (IC) design: mainly leakage power (in caches) and the need for high-bandwidth computing. With this foresight, the industry began investigating alternative memory technologies such as Resistive RAM (RRAM), Phase Change RAM (PCRAM), Spin Transfer Torque RAM (STTRAM), Domain Wall Memory (DWM), Magnetic RAM (MRAM), Ferroelectric RAM (FeRAM), etc., that could replace CMOS in applications whilst providing non-volatility (to eradicate leakage), high density (high-bandwidth compute) and high endurance (long lifetime). A side effect of incorporating these new memory technologies is the issue of security, privacy and counterfeiting. As the demand for technology increases, the motivation for adversaries to tamper with it for economic, political and social gains will only increase.

A major perspective for “beyond CMOS” comes from spintronic memory (as per the International Technology Roadmap for Semiconductors), which exploits not only the charge of electrons but, more importantly, their magnetism, or spin. STTRAM and DWM offer much potential owing to their high endurance, retention and density while operating at low voltages. This motivated me to explore various possibilities of spintronic memory (STTRAM and DWM) in the domain of energy-efficiency, security and testing.

This thesis addresses: (i) energy-efficient applications and techniques for a system employing spintronic memory; (ii) the security challenges we might face adopting spintronic memory; and (iii) the need for securing traditional CMOS ICs from a counterfeiting as well as a duplication standpoint. In particular, we tackle the problems in energy-efficiency, authentication, privacy and secrecy.

The first part of the thesis describes the modeling aspects of spintronic memory, i.e., STTRAM and DWM. Then, we present three energy-efficient spintronic memory applications: (i) non-volatile flip-flop (NVFF), (ii) MTJ crossbar using selector diode (SD) and (iii) pulsed shifting of DWs. Apart from the traditional state retentivity, the proposed NVFF offers protection against unexpected power cuts, allowing for a fluid instant-ON experience. The MTJ crossbar using a MIIM SD allows for a high-density design with the necessary robustness and energy-efficiency demanded by high-bandwidth applications. The pulsed shifting technique of DWs reduces the impact of Joule heating in NWs, thus maintaining energy efficiency without sacrificing performance.

In the next part of the thesis we present potential security vulnerabilities of STTRAM and DWM, namely side channel analysis and privacy, along with some mitigation techniques to circumvent these issues. Following this, we explore the security aspects of said spintronic memory by illustrating their potential use as a PUF. Our proposed PUFs exploit the inherently large entropy surrounding the DW NW, making them a strong candidate for magnetic memory-based authentication. We then analyze the operation and robustness of the PUFs under varying supply and temperature conditions.


Finally, we describe threshold voltage defined CMOS switches for camouflaging logic. By modifying the doping concentration of selective CMOS switches at design time, we have been able to realize six different logic functionalities. We demonstrate the designs by implementing ring oscillators (RO) in a 65nm node test-chip on which we analyze the impact of supply voltage, process variation and temperature. Also, we demonstrate how we can reclaim lost performance by tuning the gate voltage under varying temperatures and supply voltages. The difficulty of reverse engineering (RE) the netlist when a portion of the gates is camouflaged is quantified by estimating the time taken for the decamouflaging process. We also describe camouflaged gate selection using controllability and observability conditions. Additionally, an alternative camouflaging technique that operates on charge-trap is described. The advantage of this technique is that the charge trapped in the gate oxide is responsible for gate selection, thus leaving no physical evidence of camouflaging.

In summary, this dissertation provides an overview of the design, analysis and applications of STTRAM and DWM for energy-efficiency and enhanced device security.


Table of Contents

List of Figures
List of Tables
List of Abbreviations
Acknowledgements

Chapter 1: Introduction
1.1 Contributions

Chapter 2: Introduction to STTRAM and DWM
2.1. Introduction
2.2. Basics of STTRAM and DWM
2.2.1. Design Fundamentals of STTRAM
2.2.2. Modeling of STTRAM Switching dynamics
2.2.3. Design Fundamentals of DWM
2.2.3.1. Basics of DWM
2.2.3.2. NW material and DW type
2.2.3.3. Pinning of DW
2.2.4. Modeling of DW dynamics
2.2.5. Microscopic Properties
2.2.6. Macroscopic Properties
2.3. Summary

Chapter 3: Energy-Efficient Spintronic Applications and Techniques
3.1. Introduction
3.2. Enhanced-Scan Enabled NVFF (ES-NVFF)
3.2.1. Base ES-NVFF
3.2.2. High Performance ES-NVFF (HPES-NVFF)
3.3. Design Analysis of HPES-NVFF
3.3.1. Write Asymmetry: Analysis and Mitigation
3.3.2. Power Gating Scheme
3.4. Other Energy Efficient Techniques
3.4.1. Exploration of Selector Diode-STTRAM Crossbar
3.4.1.1. Overview
3.4.2. Mitigating Joule Heating and in DW NW
3.5. Summary

Chapter 4: Secrecy and Privacy Issues of Spintronic Memory
4.1. Introduction
4.1.1. Threat Model
4.2. Side Channel Attacks on STTRAM & Countermeasures
4.2.1. STTRAM Functional Vulnerabilities
4.2.2. Exploiting these Functional Vulnerabilities
4.2.3. Prevention Techniques
4.2.3.1. Semi Non-Volatile Memory (SNVM)
4.2.3.2. Adding 1-Bit Parity
4.2.3.3. Adding Random Bits in a Word
4.2.3.4. Constant Current Write
4.2.3.5. Increasing Word Size
4.3. Data Privacy Issues
4.4. Other Potential Issues
4.5. Summary

Chapter 5: DWM PUFs for Security, Trust and Authentication
5.1. Introduction
5.1.1. Threat Model
5.2. Physically Unclonable Functions
5.2.1. Approach
5.2.2. Harvesting Entropy and Randomness
5.2.3. Relay-PUF Design
5.2.4. Memory-PUF Design
5.3. Simulation Results
5.3.1. PUF Strength
5.3.2. PUF Randomness and Stability
5.3.2.1. Relay-PUF
5.3.2.2. Memory-PUF
5.3.2.3. Quality Analysis of DW-PUFs
5.3.2.3.1. Uniqueness of Mapping
5.3.2.3.2. Stability or reliability
5.3.2.3.3. Randomness of Response
5.3.2.3.4. Challenge-Response Analysis
5.4. Attack Models
5.4.1. Magnetic Attack
5.4.2. Machine Learning Attack
5.4.3. Other Possible Threat Models
5.5. Summary

Chapter 6: IP Protection Using Camouflaging
6.1. Introduction
6.1.1. Threat Model
6.2. Background
6.2.1. Threshold defined switch
6.2.2. Multi-function camouflaged logic
6.2.3. Application in Hardware Security
6.3. Test-Chip Overview
6.3.1. Design
6.3.2. Test features
6.4. Experimental Results
6.4.1. Basic setup
6.4.2. Optimal VSN and/or VSP
6.4.3. Vdd scaling
6.4.4. Process variations
6.4.5. Temperature variation
6.5. Design and Security Analysis
6.5.1. Area, Power and Delay Overheads
6.5.2. RE Effort
6.5.3. Camouflaging Strategy and Evaluations
6.6. Discussion
6.6.1. Attack possibilities
6.6.2. Integration with EDA tools
6.6.3. Low-Overhead Camouflaged Gate
6.6.4. Other camouflaging techniques
6.6.5. Need for a security evaluation framework
6.7. Summary

Chapter 7: Future Work
7.1. Architecture Design
7.2. High Performance Compute
7.3. Energy Efficiency
7.4. Security

Chapter 8: Summary

Appendices
A.1. Modeling of STTRAM Retention
A.1.1. MTJ Size Vs Retention Time
A.1.2. Stochastic Retention Modeling
A.2. Modeling of DW dynamics
A.2.1. Modified Landau, Lifshitz and Gilbert Equation (LLG)
A.2.2. One Dimensional Model (1D)
A.2.3. Modeling of NW Resistance
A.2.3. Modeling of Pinning
A.2.4. Modeling Process Variation and Entropy
A.2.4.1 Process Variation
A.2.4.2 Entropy and Randomness in DWM
B.1. Referred Conferences
B.2. Referred Journals
B.3. Referred Patents
B.4. Referred Book Chapters

Bibliography


List of Figures

Figure

1.1 (a) Growth of fully connected devices [103]; (b) trend of revenue share of semiconductor-based devices.
1.2 (a) Percentage of area occupied by memory and logic, and (b) percentage of total power due to leakage in scaled technologies (increasing trend because of larger on-chip cache).
1.3 Some popular emerging memory technologies.
1.4 Typical semiconductor global supply chain for subassembly [102].
1.5 Overview of the different attack possibilities, from design to distribution [102].

2.1 (a) 1-T 1-MTJ bitcell schematic; (b) energy barrier separating the two MTJ magnetization states that determines the retention time.
2.2 Switching of the MTJ FL under the influence of: (a) current and, (b) magnetic field (the MTJ parameters used are outlined in Table 2.1).
2.3 (a) Schematic of Domain Wall Memory and governing equations (b) types of DWs and their dependency on NW dimensions and, (c) Bloch and Neel wall.
2.4 Domain wall pinning: (a) nanowire with pinning sites at q1, q2 and q3. (b) ψ vs. q plot for pinning at q1.
2.5 (a) DW velocity vs. applied current (experimental values are also plotted), (b) ψ & q in a NW with a DW.
2.6 DW dynamics with two different shift current magnitude and duty cycle but same average value is also shown.
2.7 Stochastic pinning of DWs with respect to three different shift currents.

3.1 (a) spin-MTJ based NVFF [3], (b) NVDFF [1] design, and (c) SHE-NVFF [2] based design.
3.2 (a) Schematic of the proposed base ES-NVFF circuit, and (b) corresponding timing diagram describing the various operation modes.
3.3 Base ES-NVFF current paths during (a) store and, (b) restore.
3.4 (a) Schematic of the proposed base HPES-NVFF circuit, and (b) corresponding timing diagram describing the various operation modes.
3.5 Current paths for the MTJ operation.
3.6 Asymmetry in the MTJ write times for (a) MTJ1 and, (b) MTJ2.
3.7 GHPES-NVFF, (a) circuit schematic, and (b) timing diagram illustrating the gating process.
3.8 Comparison of short circuit leakage energy between the HPES-NVFF and GHPES-NVFF.
3.9 (a) MIIM diode stack, (b) band-diagram at 0 bias, (c) band-diagram at negative bias (Vr) on TE (Vr < VT-), (d) band-diagram at negative bias (Vr) on TE (Vr > VT-), (e) band-diagram at positive bias (Vf) on TE (Vf > VT+). BE is grounded.
3.10 (a) I-V curve of 20nmx20nmx5nm MTJ obtained using [8] with Ms=780Oe, Ea=56kT erg, Ku=Ea/v erg/cc, alpha=0.007, pol=0.8. (b) I-V curve of 1D-1MTJ for various RL.
3.11 Sneak current during sensing.
3.12 (a) & (b) Max. resistance and temperature of NW for different current densities and operating modes.
3.13 (a) Transient velocity of the DW for constant voltage assumptions and, (b) avg. velocity of DW w.r.t for different operating modes

4.1 System level view comprising of CPU, LLC and external voltage regulator. The adversary can monitor die current and/or regulator current.
4.2 Power signature of an example system consisting of 4 flavors of ring-oscillators (250 each) to mimic CPU along with a 512b word STTRAM LLC.
4.3 (a) Write latency; (b) read latency distribution of an 8MB STTRAM cache under process variation. The long read and write latency presents wider attack window to the adversary.
4.4 (a) Supply current waveform (y-axis values are negative) for write ‘1’, and (b) write ‘0’. A significant gap is present between write ‘0’ and ‘1’ which can be employed as signature. Furthermore, the magnitude of write current is a function of stored data which also acts as a signature. The size of MTJ is (40X40X4)nm, ∆ is 56, damping constant α is 0.007, and saturation magnetization is 780Oe.
4.5 Supply current variation for read operation. 4 flavors of ring-oscillators (250 each) to mimic CPU along with a 512b word STTRAM LLC. A reasonable gap is present between read ‘0’ and ‘1’ currents which can be employed as signature.
4.6 Write latency for different values of thermal barrier.
4.7 Write currents for 4-bit operation.
4.8 Read currents for 4-bit operation.
4.9 (a) Retention time variation with respect to MTJ ; and, (b) retention time dependence on temperature.
4.10 (a) Current waveform for 4-bit write with 1-bit parity, (b) percent reduction in states with 1-bit parity for different word sizes. Substantial reduction in states is possible with 1-bit parity.
4.11 Percent reduction in states with multi-bit random write for different word sizes.
4.12 (a) Constant current write circuit [9]; and, (b) write latency difference with constant current write (current in mA).
4.13 Homogeneous write using reduced write current.
4.14 The architecture to erase the cache tag, data and valid bits in a direct mapped cache when the system is turned off.

5.1 Sources of entropy and randomness in DWM system.
5.2 Harvesting entropy and randomness through DW nucleation (1), shift (2), DW motion (3) and sense (4).
5.3 Schematic of DW relay-PUF. Ishift pulse magnitude and width can also be used as challenges. Sequence of events is numbered from 1 to 5.
5.4 Timing diagram representing the wWL, the shift signals for each stage, the rWL and the variation of resistance sensed by the read head.
5.5 (a) Timing diagram representing the wWL, the shift signal and the rWL and, (b) Schematic of memory-PUF.
5.6 Relationship of DW velocity on the three pulsed voltage conditions, (a) for various (Pulse Magnitude) PMs, (b) for different (Pulse width) PWs and, (c) for different (pulse frequencies) PFs (legend shows the off-on time = 5ns, pulse period for (a) and (b) is 10ns).
5.7 (a) The NW race between NW1 & NW2, which is being relayed to NW3 & NW4, and (b) response of 6-stage relay-PUF for 32 challenges for 32 different dies.
5.8 Arrival time distribution for (a) different shift voltage settings at 25°C and, (b) two voltages settings at 25°C and 125°C.
5.9 (a) Velocity distribution in the memory array, and (b) a memory array bitmap for the typical case.
5.10 Inter and intra-die Hamming distance distribution for: (a) relay-PUF; and (b) memory-PUF.
5.11 (a) Increase in responses w.r.t. increase in PW and PM modes, (b) Difference between Arbiter-PUF and DWM relay-PUF to achieve certain number of responses, (c) The reduction in power w.r.t increase in number of heads for fixed number of challenges.
5.12 (a) Schematic of MTJ; (b) flipping of MTJ due to STT (Happ=0Oe, I=0.638mA); and, (c) due to external magnetic field (Happ=260Oe, I=0). Plots are obtained by solving LLG.

6.1 Proposed camouflaging technique. Existing combinational logic gates could be replaced with camouflaged gates to protect the underlying IP by increasing RE effort.
6.2 (a) VT programmable NMOS switch. HVT: OFF, LVT: ON. PMOS switch works similarly; (b) The I-V plot for the HVT, NVT and LVT switches biased at VSN = (HVT+LVT)/2.
6.3 (a) NMOS-switch based camouflaged gate to hide 6 functionalities; and, (b) die-image of the test-chip.
6.4 Layout design of: (a) reference NAND gate, (b) NMOS-switch based camouflaged gate and, (c) CMOS-switch based camouflaged gate (the gates are upsized to counter the ill-effects of process variations).
6.5 Schematic overview of the test-chip design with a resistance ladder for generating VSN and VSP voltages.
6.6 Test setup consisting of (i) the test chip, (ii) a logic analyzer, (iii) an oscilloscope and (iv) a power supply. The oscilloscope capture of RO oscillations of camouflaged NAND gate is also shown.
6.7 Frequency variation with respect to (a) VSN – for NMOS-switch (optimal VSN = 500mV) and, (b) VSN + VSP for CMOS-switch (optimal VSN = 450mV and VSP = 600mV).
6.8 Frequency variation with respect to (a) VDD for NMOS-switch @ VSN = 500mV, (b) VDD for CMOS-switch @ VSN = 450mV and VSP = 600mV, and, (c) VSN at 650mV VDD. It must be noted that resistance ladder voltage changes with VDD, thus leading to a smaller VSN step size.
6.9 Optimal VSN and VSP for CMOS-switch at 650mV VDD (optimal VSN = 325mV and VSP = 275mV).
6.10 (a) Frequency distribution of 10 die for both the NMOS-switch and CMOS-switch based RO; (b) variation of oscillating frequency under change in temperature for an NMOS-switch based RO.
6.11 Optimal VSN bias shift under the effect of temperature (65ºC)
6.12 RE effort using SAT-based solver for a 2-input 6-function camouflaged gate.
6.13 Netlist generation algorithm for gate camouflaging.
6.14 Low-overhead camouflaged gate with 3 functionalities.
6.15 CTCG (a) Charge trapping circuit; and, (b) 2-input 2 function CTCG.

A.1 (a) Retention time variation with respect to disturb current and, (b) retention time variation with respect to free layer volume.
A.2 (a) Torques experienced by a magnetic moment, (b) The co-ordinate system used to convert from Cartesian to polar.
A.3 (a) Variation of the magnetization angle across the width of the domain wall and, (b) the sections of a NW used to better understand the dynamics of the domain wall (inclusive of tilt angle and magnetization position).
A.4 Modeling NW resistance: (a) normalized resistance vs. temperature w.r.t experimental data [8] and, (b) resistance vs. time for different number of DWs in the NW.
A.5 Domain wall pinning: (a) nanowire with pinning sites at q1, q2 and q3. (b) ψ vs. q plot for pinning at q1. The DW depins with u=100m/s, (c) ψ vs q plot of DW for one, two notches (Vpin = 2000 J/m³) and two notches (Vpin = 1000 J/m³), (d) ψ vs. q plot of DW for three notches with Vpin = 2000 J/m³.
A.6 Velocity degradation due to multiple notches, (a) transient velocity in 0, 1, 2, 3 pinning sites in the NW. (b) transient velocity for one deep pinning and two cases of shallow pinning sites. [α=0.01, β= 0.02 and ∆=25nm].
A.7 Distribution of depinning voltage and segregation of pulsed shifting.
A.8 DW dynamics in presence of physical roughness induced slowdown and eventual pinning. Stochastic motion of DW. A misaligned shift pulse reduces velocity.


List of Tables

Table

1.1 Salient features of the above-mentioned memory technologies.
2.1 MTJ parameters used.
2.2 DWM parameters used.
3.1 Design comparison with other works.
3.2 Comparative analysis of proposed techniques.
6.1 Comparative analysis of gate camouflaging techniques.


List of Abbreviations

ADC         Analog to Digital Converter
CTCG        Charge Trap-based Camouflaged Gate
DAC         Digital to Analog Converter
DWM         Domain Wall Memory
ESNVFF      Enhanced Scan enabled Non-Volatile Flip Flop
FeFET       Ferroelectric FET
FeRAM       Ferroelectric RAM
FF          Flip-flop
HPESNVFF    High Performance Enhanced Scan enabled Non-Volatile Flip Flop
IC          Integrated Circuit
IMA         In-plane Magnetic Anisotropy
IoT         Internet of Things
IP          Intellectual Property
LLG         Landau-Lifshitz-Gilbert
MIIM        Metal-Insulator-Insulator-Metal
MRAM        Magnetic RAM
MTJ         Magnetic Tunnel Junction
NVFF        Non-volatile Flip-Flop
NVM         Non-Volatile Memory
NW          Nanowire
PCM         Phase Change Memory
PMA         Perpendicular Magnetic Anisotropy
PUF         Physically Unclonable Function
RE          Reverse Engineering
RO          Ring Oscillator
RRAM        Resistive RAM
SCA         Side Channel Attack
SD          Selector Device
SHE         Spin Hall Effect
SIMD        Single Instruction, Multiple Data
SPA         Simple Power Analysis
STTRAM      Spin Transfer Torque RAM
TRNG        True Random Number Generator


Acknowledgements

I would like to take a moment to thank a few people who have made this enriching journey possible for me. Firstly, I thank my advisor and mentor, Dr. Swaroop Ghosh, for his continuous guidance, patience and support throughout the course of my doctoral studies. Being a part of the LOGICS group has been extremely fulfilling and enabled me to grow as an independent researcher. Dr. Ghosh’s insight and advice on both research and my career graph are simply invaluable.

I would like to thank Dr. Srikanth Srinivasan for his input on the underlying physics of memory. Collaborating with him on a journal publication has been enriching and informative. Dr. Jaideep Kulkarni provided me with much needed insight into the industry and offered a practical perspective to my research direction, and I thank him for that. I would like to thank Dr. Ram Krishnamurthy for his continued motivation and support, and Dr. Swarup Bhunia for his insight into hardware security. Collaborating with him on a few journal publications helped me understand the nuances of this field and helped shape my research direction. I would also like to thank Dr. Trent Jaeger, whose computer security class offered me a holistic view of system security and helped me understand relevant security issues that exist in the industry currently. I would like to thank Dr. Vijaykrishnan Narayanan for guiding me to look beyond what is currently being done and seek a more holistic view of problem solving, and Dr. Sumeet Gupta for his guidance and training in advanced VLSI. His hands-on approach to teaching and invaluable knowledge helped me deep dive into this field and learn all that it has to offer.


I would like to thank Burzin and Truc for making my two internship stints at Intel extremely enjoyable and fulfilling. They helped me greatly, in understanding and integrating into corporate life and the Intel Ecosystem. I would also like to thank Dr. Sacchhidh, Amir and Alex, for their continuous support and dedicated mentorship. I look forward to working with all of them when I continue my stint with Intel as a full-time employee.

My PhD journey would not be as fulfilling, challenging or enjoyable if not for my fellow lab-mates in the LOGICS group. I would like to thank Kenny, Jae, Asmit, Nasim, Saki, Faizal, Rekha and Hamid for their continued support, motivation, and encouragement.

I would like to give a special thanks to my fiancé Sowmya Srikanth for always being there and keeping me motivated through all the highs and lows. Lastly, I would like to thank my parents and my sister for their support, patience and encouragement throughout the course of my doctoral studies.

This material is based on work supported by the Semiconductor Research Corporation (SRC) under award number (#2727.001), the National Science Foundation (NSF) under award numbers (#CNS-1722557, #CCF-1718474 and #DGE-1723687), and the Defense Advanced Research Projects Agency (DARPA) Young Faculty Award under award number (#D15AP00089). Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the Semiconductor Research Corporation, National Science Foundation and Defense Advanced Research Projects Agency.


To my fiancé Sowmya Srikanth, my parents Dr. Srikant S. Iyengar and Rajyashree Srikant and my sister Anjali S. Iyengar.


Chapter 1

Introduction

The semiconductor industry has seen explosive growth in the past few years owing to a huge boom in the ‘Internet of Things’ (IoT) and high-performance computing (HPC) segments. It is projected that there would be around 50 billion fully connected devices by the year 2020, which would constitute nearly 2 trillion dollars of the market share (shown in Fig. 1.1(a-b)). These ‘Smart’ and fully connected devices are expanding into various fields, from automation to medicine to household services. With their increasing reach comes a demand for more and more performance with an even lower energy footprint. This has fueled the miniaturization of device dimensions and large-scale device integration in today’s ICs, which in turn raises two major issues (Fig. 1.2):

Figure 1.1 (a) Growth of fully connected devices [103]; (b) trend of revenue share of semiconductor-based devices (plotted as revenue in billion U.S. dollars against year).

Energy-Efficiency: In both the IoT and HPC spaces, system-wide power consumption has been identified as one of the key constraints moving forward [1]. The ever-increasing operating frequency/bandwidth, along with the increase in the number of transistors on a single die, has resulted in an increase in the overall power consumption (Fig. 1.2(b)). With more and more of the on-chip die area being dedicated to memory (Fig. 1.2(a)), the power wasted due to leakage has also increased [2]. Fig. 1.2(b) shows that the leakage power is poised to overtake the dynamic power if left unchecked. In the case of HPC systems, a significant portion of a node’s power is consumed by the DRAM (roughly 30 to 50 percent); in addition, the increasing difficulty of keeping memory at pace with the computational rates offered by next-generation processors is a major hurdle moving forward. To address this issue, several emerging nonvolatile memory (NVM) technologies are currently being investigated as alternatives for on-chip memory, main memory and storage. Fig. 1.3 highlights some of the popular emerging memory technologies, such as Spin-Transfer Torque RAM (STTRAM) [3], Domain Wall Memory (DWM) [4], Phase-Change RAM (PCRAM) [5], Ferroelectric RAM (FeRAM) [6] and Resistive RAM (RRAM) [7], that are being explored as potential alternatives to existing memories in future computing systems. These emerging NVM technologies combine the speed of SRAM, the density of DRAM, and the non-volatility of flash memory, and hence have become very attractive to the research community.

Figure 1.2 (a) Percentage of area occupied by memory and logic, and (b) percentage of total power due to leakage in scaled technologies (increasing trend because of larger on-chip cache).

Table 1.1 consolidates the salient features of some of the popular emerging memories. In this thesis we focus our efforts on investigating spintronic memory. Spintronic memory offers some unique opportunities owing to its non-volatility (zero leakage, an instant-ON feature and good power efficiency, especially useful in IoT applications); resistive storage (potential use in both Boolean computations such as logic operations and non-Boolean computations such as sorting and min/max operations); shift-based access (potential use in single instruction, multiple data (SIMD) operations); and small footprint, anywhere from 6-20F² for STTRAM and as low as 2.5F² for DWM [8] (large on-chip cache and main-memory applications).

Figure 1.3 Some popular emerging memory technologies (PCRAM, STTRAM, FeFET/FeRAM, DWM and RRAM).

Table 1.1 Salient features of the above-mentioned memory technologies.

Features            SRAM      STTRAM     DWM        RRAM        FeRAM         PCRAM
Density             ~100F²    ~6F²       ~2.5F²     ~4F²        ~20F²         ~4F²
Programming Energy  Low       Low        Low        Low         Low           High
Access Time (W/R)   <1/<1ns   ~10/~10ns  ~10/~10ns  ~100/~50ns  ~100ns/~20ns  50/75ns
Endurance           >10^21    ~10^16     ~10^16     ~10^10      ~10^15        ~10^9
Non-Volatility      No        Yes        Yes        Yes         Yes           Yes
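As a rough illustration of the density figures in Table 1.1, the short sketch below converts the approximate cell areas (in units of F², where F is the minimum feature size) into a bits-per-area gain relative to SRAM. The dictionary and function names are hypothetical and not part of the thesis; the numbers are the approximate table entries, and the estimate ignores peripheral circuitry and array overheads.

```python
# Approximate bitcell areas taken from Table 1.1, in units of F^2
# (F = minimum feature size). Illustrative names, not from the thesis.
CELL_AREA_F2 = {
    "SRAM": 100.0,
    "STTRAM": 6.0,
    "DWM": 2.5,
    "RRAM": 4.0,
    "FeRAM": 20.0,
    "PCRAM": 4.0,
}


def density_gain_over_sram(technology):
    """Return how many bits fit in the area of one SRAM bitcell.

    This is a first-order estimate: ratio of cell areas only, with no
    allowance for sense amplifiers, decoders or shift circuitry.
    """
    return CELL_AREA_F2["SRAM"] / CELL_AREA_F2[technology]


if __name__ == "__main__":
    for name in CELL_AREA_F2:
        gain = density_gain_over_sram(name)
        print(f"{name:7s} ~{gain:.1f}x the bit density of SRAM")
```

Under these assumptions a DWM array stores on the order of tens of bits in the footprint of a single SRAM cell, which is the motivation for considering it for large on-chip caches.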

In order to explore the design space for different goals, low-level models are required to estimate the spintronic design, from an area, timing and energy point of view. In the next chapter we describe the modeling of STTRAM and DW motion, following which we illustrate some potential energy-efficient applications and techniques.

Security: A corollary of having so many connected devices is the problem of ensuring security. The risk of a large-scale security breach is deeply concerning and could have catastrophic impacts. Hardware has long been viewed as a trusted party that supports the whole computing system. The IC supply chain was believed to be well protected, with high attack barriers that adversaries could not easily compromise. But with increasing globalization, the manufacturing supply chain can no longer be considered secure (Fig. 1.4). For example, a malicious foundry may manipulate the underlying design in ways that can potentially be exploited by attackers after the chips are integrated into their platforms.

Figure 1.4 Typical semiconductor global supply chain for subassembly [102].

Hardware security was a term that was coined in reference to hardware Trojan designs, categorization, detection, and isolation where the untrusted foundries were treated as the main threat. As a result, most of the effort was spent in developing hardware Trojan detection techniques aimed at the post-silicon phase with emphasis on the security enhancement of existing testing methods [9,10]. Given the fact that third-party IP cores may be another attack vector for malicious logic insertion, the protection of pre-synthesis designs becomes equally important. In light of this, pre-silicon circuit protection approaches have also been developed [9, 11].

Over time, the concept of hardware security has moved away from hardware Trojan detection and now leans towards trustworthy hardware development for the construction of the root-of-trust. The intrinsic properties of hardware devices which have a negative impact on circuit performance are leveraged for security applications. One leading example is the development of physical-unclonable functions (PUFs), which rely on device process variation to generate chip-specific fingerprints in the format of challenge-response pairs.

Figure 1.5 Overview of the different attack possibilities, from design to distribution [102].
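The challenge-response idea behind PUFs can be sketched with a toy model. The code below is only an illustration under stated assumptions and is not the DWM PUF proposed in this thesis: each chip is modeled as a list of random per-stage delay offsets (standing in for process variation, seeded by a hypothetical chip ID), a challenge is a bit vector that selects the sign of each stage's contribution, and the response bit is the sign of the accumulated delay difference. All function names are hypothetical.

```python
import random


def make_chip(chip_id, n_stages=16):
    """Toy stand-in for process variation: one random delay offset per stage."""
    rng = random.Random(chip_id)  # the seed plays the role of the physical chip
    return [rng.gauss(0.0, 1.0) for _ in range(n_stages)]


def response_bit(chip, challenge):
    """Response is 1 if the challenge-weighted delay difference is positive."""
    total = sum(d if c else -d for d, c in zip(chip, challenge))
    return 1 if total > 0 else 0


def fingerprint(chip, challenges):
    """Apply a list of challenges and collect the response bits."""
    return [response_bit(chip, c) for c in challenges]


def hamming_distance(a, b):
    """Number of positions in which two response vectors differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    challenges = [[rng.randint(0, 1) for _ in range(16)] for _ in range(64)]
    fp_a = fingerprint(make_chip(1), challenges)
    fp_b = fingerprint(make_chip(2), challenges)
    print("intra-chip distance:", hamming_distance(fp_a, fingerprint(make_chip(1), challenges)))
    print("inter-chip distance:", hamming_distance(fp_a, fp_b))
```

In this noise-free toy model the same chip always reproduces its fingerprint, while a different chip generally produces a different one; a real PUF must additionally tolerate supply and temperature noise.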

Aside from the threats imposed on the hardware itself, hardware counterfeiting and intellectual property (IP) reverse engineering (RE)/duplication are becoming a serious concern. Because counterfeiters can siphon off large profits, there is a growing need to protect the original hardware/IP. Camouflaging is one technique being investigated to hide the underlying design and make RE very difficult, if not impossible. Solutions such as the incorporation of dummy contacts/gates [cam ref] and threshold voltage modulation [our] are being investigated.

Fig. 1.5 summarizes the vast attack landscape: the vulnerabilities, the types of adversaries and the various control measures that are currently being researched.

With the potential for widespread adoption of emerging memory technologies (discussed in section 1.1), researchers are investigating the usage and/or protection of these technologies, such as spin-transfer torque (STT) devices, RRAM, FeRAM and domain wall memory, leveraging their special properties for hardware security applications and techniques.

1.1 Contributions

Spintronic memory is a promising alternative to traditional SRAM cache memory due to its high endurance, zero leakage, high density and high retention. In this thesis, we first present the modeling details of the underlying physics that govern the operation of the free layer of the MTJ. We also incorporate effects such as volume, temperature, current, magnetic field and thermal noise, which determine the retention time of STTRAM. We then describe the modeling details that govern the dynamics of DW motion. The impact of NW resistance, pinning, process variation and entropy on DW motion is also investigated.

Next, we investigate potential energy-efficient applications of spintronic (MTJ) memory, one of which is the non-volatile flip-flop (FF). These FFs offer an instant-ON experience even in the case of an unexpected power failure, along with enhanced-scan test capability. Furthermore, by incorporating MTJs in the traditional enhanced scan flip-flop, we are able to maintain a low design complexity and area overhead. We analyze both the store and restore operations and propose improvements to boost energy efficiency, namely whole-cycle store and input pattern-based power gating. Aside from this, we also investigate a crossbar-type STTRAM that uses a bi-directional diode as the selector device. In the case of DWM, however, shifting is the most common and power-consuming operation. We propose bank-interleaving and pulse-shifting to mitigate the impact of Joule heating and electromigration, which in turn boosts energy efficiency.

Non-volatility, although beneficial, can lead to potential privacy/security concerns. Persistent data gives an adversary a much longer window of attack in which to extract the necessary information and/or corrupt it. Furthermore, spintronic memory, being a current-driven memory, ‘leaks’ information based on the current drawn. An adversary can exploit this to snoop on the data that is being written to or read from the memory. We analyze these issues by first establishing attack models: side-channel power analysis, for snooping on the data during writes and reads, and persistent data attacks, in which the memory is simply hot-swapped and the stored data is read out. We propose mitigation/prevention techniques for both of these issues and analyze their effectiveness and robustness.

The high degree of entropy of spintronic memory is an obstacle to robust design, but it can prove especially useful in realizing hardware security primitives such as PUFs and TRNGs. We exploit such properties in realizing two DWM-based PUFs: relay-PUF and memory-PUF. We analyze the effectiveness of the two proposed PUFs and their robustness under the influence of temperature and voltage. Additionally, we describe some of the attack models, such as magnetic field and machine learning attacks, that can potentially compromise the proposed PUFs.

In the case of traditional CMOS technology, IP piracy and counterfeiting pose a serious security and commercial threat. We present a threshold voltage switch-based camouflaged logic that helps to protect against IP piracy and counterfeiting. We describe two camouflaged gates (NMOS-based and full CMOS-based) that can exhibit 6 functionalities by appropriately incorporating threshold-defined switches. These two flavors of gate camouflaging are realized as ring oscillators using the ST micro 65nm node. We also perform in-depth temperature, supply-voltage and process variation analysis on our test chips. Furthermore, we demonstrate how tuning the gate voltage can be used not only to guarantee functionality, but also to reclaim lost performance at low voltages and high temperatures. We quantify the difficulty of RE by employing SAT-based solvers on various benchmarks. Finally, we describe how controllability and observability metrics can be used to select the gates that are best suited for camouflaging.


Chapter 2

Introduction to STTRAM and DWM

Since CMOS technology is approaching the end of the scaling roadmap, there is a need to explore alternative technologies to assist or replace CMOS technology. Furthermore, the growth in new application areas such as data analytics, Internet-of-Things (IoT) and cognitive computing requires new features that are not readily available in CMOS technology.

Emerging memories can potentially solve the scalability issues and cater to the new applications by providing features such as non-volatility, high density and endurance. Spintronic technologies have demonstrated significant promise due to a multitude of features that can find applications in storage, cache, non-volatile combinational logic, sequential logic, search engines, security primitives and neuro-inspired computing, to name a few. In this chapter we study two popular flavors of spintronic memory: spin-transfer torque RAM (STTRAM) and domain wall memory (DWM).

The magnetic tunnel junction (MTJ) is the fundamental unit of STTRAM, so a reliable model of MTJ dynamics is paramount for STTRAM circuit and architecture analysis. In this chapter we first focus on modeling the magnetization dynamics of the free layer of the MTJ under the influence of current and magnetic field.

In the case of DWM, reliable shifting of DWs is critical to its functionality, so we focus on modeling the dynamics of DW motion. In this chapter, we describe the physics-based modeling of DW dynamics in a permalloy NW. Next, we explain the modeling of NW resistance with a DW and present simulation results. In addition, we provide a means to comprehend the influence of process variation on the dynamics of DW motion.


2.1. Introduction

Spintronic technology operates on the principle of manipulating electron spins to perform computation and storage. In contrast to charge-based computing, spintronic computing requires less energy to switch the output, which makes it energy-efficient. Non-volatility is also desirable for many applications, especially energy-constrained Internet-of-Things (IoT) devices that mostly stay OFF and only occasionally perform computing. Persistence of information during inactive cycles saves re-initialization energy. The structure of spintronic devices is an active area of research.

Magnetic RAM (MRAM) [12], STTRAM, DWM, and spin are some of the most investigated spintronic devices [8]. Interestingly, a variety of new structures have been proposed to suit certain applications. Examples include spintronic devices for interconnects, full adders, neurons, synapses, analog-to-digital converters (ADC) and digital-to-analog converters (DAC) [13]. The feasibility of such structures has been validated through experimental demonstration as well as micro-magnetic simulations.

Spintronic devices possess properties such as polarization-dependent resistance, asymmetric read/write latencies, current-based switching and dynamics of magnetization, non-linearity, chaotic dynamics, noise sensitivity and non-volatility [14]. These properties can be matched to appropriate applications to improve area, energy efficiency and quality. In this chapter, we describe the salient features of STTRAM and DWM and detail the physics-based modeling of STTRAM magnetization flipping as well as the dynamics of DW motion. Additionally, we illustrate the impact of process variation on DW motion via this model.

The highlights of the chapter are:

▪ Description of the fundamental concepts of STTRAM and DWM.

▪ Provision of a physics-based model of the STTRAM magnetization flipping and the DW motion that comprehends the number of bits and DW resistance.

▪ Discussion of the various microscopic and macroscopic properties that surround DWM.

2.2. Basics of STTRAM and DWM

2.2.1. Design Fundamentals of STTRAM

STTRAM employs an MTJ as the storage element. The MTJ is a spintronic device that contains two ferromagnetic layers (free and pinned layers) with a thin oxide tunnel barrier sandwiched between them (Fig. 2.1(a)). The Pinned Layer (PL) has a fixed magnetization state, while the Free Layer (FL) magnetization can be oriented parallel or anti-parallel with respect to the PL. STTRAM is a 1-T 1-MTJ device that stores data in the form of the magnetization orientation of the FL. The parallel and anti-parallel magnetization states of the FL relative to the PL are used to store a logic ‘0’ or ‘1’, respectively. The magnetization of the free layer is flipped by current-induced spin-transfer torque (STT), by passing the appropriate write current (Iw) (Fig. 2.1(a)). The resistance of the MTJ is high when the PL and FL are in the anti-parallel configuration (Rap), whereas the resistance is low when they are parallel (Rp) to each other. The value written to the MTJ depends on the direction and the strength of the write current.

Figure 2.1 (a) 1-T 1-MTJ bitcell schematic; (b) energy barrier separating the two MTJ magnetization states that determines the retention time.

Fig. 2.1 (a) shows a commonly implemented STTRAM. To read information stored in the cell, the word line (WL) is turned on and a small read current is applied to either the bit line (BL) or source line (SL) with the other end being grounded. The MTJ resistance state is determined by sensing the voltage drop across it for a given read current. In contrast, the write operation requires bidirectional currents to switch the FL magnetization to the corresponding state. For anti-parallel to parallel flip, the WL and the BL are activated, while the SL is grounded, and vice-versa for parallel to anti-parallel flip. For successful write, the write current Iw must be greater than the threshold current (Ico).
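The read and write rules described above can be captured in a small behavioral sketch. The resistance values, read current, reference voltage and threshold current below are hypothetical placeholders rather than values from this work, and the mapping of current sign to target state is an assumed convention chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical illustrative values (not taken from this work).
R_P = 2_000.0      # parallel-state resistance Rp, ohms
R_AP = 4_000.0     # anti-parallel-state resistance Rap, ohms
I_C0 = 50e-6       # threshold write current Ico, amperes


@dataclass
class MTJCell:
    """Behavioral 1T-1MTJ bitcell: 'P' stores logic 0, 'AP' stores logic 1."""
    state: str = "P"

    def resistance(self) -> float:
        return R_AP if self.state == "AP" else R_P

    def read(self, i_read: float, v_ref: float) -> int:
        """Sense the voltage drop across the MTJ for a given read current."""
        v_drop = i_read * self.resistance()
        return 1 if v_drop > v_ref else 0

    def write(self, i_write: float) -> bool:
        """Apply a signed write current.

        Assumed convention: positive current targets P, negative targets AP.
        The state flips only if |i_write| exceeds the threshold current.
        Returns True if the cell holds the target state afterwards.
        """
        target = "P" if i_write > 0 else "AP"
        if abs(i_write) > I_C0:
            self.state = target
        return self.state == target


cell = MTJCell()
i_read = 10e-6                      # small read current, below the threshold
v_ref = i_read * (R_P + R_AP) / 2   # midpoint reference voltage
cell.write(-80e-6)                  # above threshold: switches to AP
bit_after_strong_write = cell.read(i_read, v_ref)
cell.write(+20e-6)                  # below threshold: state is retained
bit_after_weak_write = cell.read(i_read, v_ref)
```

The second write illustrates why Iw must exceed Ico: a sub-threshold current leaves the stored bit unchanged.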

The two magnetization states of the FL form the stable states, separated by an energy barrier ‘EB’ (Fig. 2.1(b)). To switch the magnetization from one state to another, the FL is excited with enough energy to overcome this barrier. This is achieved by passing a current through the MTJ. It is possible to lower the amount of current required to switch the magnetization states by reducing EB (as shown by numbers 1-4 in Fig. 2.1(b)), which in turn lowers the retention time of STTRAM.
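The link between energy barrier and retention time is commonly described by the Néel-Arrhenius relation, t = τ0·exp(EB/(kB·T)). The sketch below uses that standard relation as an assumption (it is not stated in this section), together with an assumed attempt time τ0 = 1 ns, to show how strongly retention depends on the ratio EB/(kB·T).

```python
import math

TAU_0 = 1e-9                               # assumed attempt time, seconds
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def retention_time_s(delta: float, tau_0: float = TAU_0) -> float:
    """Mean retention time for thermal stability factor delta = EB / (kB*T)."""
    return tau_0 * math.exp(delta)


# Sweep a few barrier heights, including the 56 kB*T value of Table 2.1.
for delta in (20, 30, 40, 56):
    t = retention_time_s(delta)
    print(f"EB = {delta:2d} kB*T -> {t:.3e} s ({t / SECONDS_PER_YEAR:.3e} years)")
```

Because the dependence is exponential, a modest reduction in EB trades a large amount of retention for a lower switching current.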

2.2.2. Modeling of STTRAM Switching dynamics

In order to utilize STTRAM in novel circuit and architecture applications, it is necessary to correctly model its behavior. The Landau-Lifshitz-Gilbert (LLG) equation forms the foundation for formulating the behavior of magnetization m, of a nanomagnet in the presence of an effective magnetic field, Heff, and a spin current, Is with a few other terms describing the interactions between the nanomagnet and the spin-current [15]:

$$(1 + \alpha^2)\,\frac{\partial \vec{m}}{\partial t} = -\gamma\,(\vec{m} \times \vec{H}_{eff}) + \alpha\gamma\,(\vec{m} \times \vec{m} \times \vec{H}_{eff}) + \vec{\tau} + \alpha\,(\vec{m} \times \vec{\tau}) \qquad (1)$$

$$\vec{\tau} = \frac{\vec{m} \times \vec{I}_s \times \vec{m}}{q N_s} \equiv \text{spin torque [15]}$$

Figure 2.2 Switching of the MTJ FL under the influence of: (a) current and, (b) magnetic field (the MTJ parameters used are outlined in Table 2.1).

Here γ is the gyromagnetic ratio, α is the Gilbert damping parameter, q is the charge of an electron and Ns is the total number of spins in the nanomagnet, given by Ns = MsV/µB. Ms, V and µB are the saturation magnetization, the volume of the nanomagnet and the Bohr magneton respectively. The first two terms represent the precession and damping torques respectively; these govern the dynamics of the magnetization in the presence of an effective magnetic field. The last two terms represent the current-induced torques, which take Slonczewski-like and field-like forms.
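As a quick worked example of Ns = MsV/µB, the sketch below plugs in the values listed in Table 2.1. It assumes that the length, width and thickness in that table describe the free-layer volume, and it converts Ms from emu/cc to A/m (1 emu/cc = 1e3 A/m); both points are assumptions made for illustration.

```python
# Values from Table 2.1; unit conversion: 1 emu/cc = 1e3 A/m.
MS_EMU_PER_CC = 780.0
MU_B = 9.27e-24                                   # Bohr magneton, J/T
LENGTH, WIDTH, THICKNESS = 40e-9, 40e-9, 10e-9    # metres (assumed FL size)


def number_of_spins(ms_a_per_m: float, volume_m3: float, mu_b: float = MU_B) -> float:
    """Ns = Ms * V / muB, with Ms in A/m, V in m^3 and muB in J/T."""
    return ms_a_per_m * volume_m3 / mu_b


ms_si = MS_EMU_PER_CC * 1e3                       # A/m
volume = LENGTH * WIDTH * THICKNESS               # m^3
n_s = number_of_spins(ms_si, volume)
print(f"V = {volume:.3e} m^3, Ns = {n_s:.3e}")
```

Since the spin torque is divided by qNs, a larger free-layer volume needs proportionally more spin current for the same torque per spin.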

Fig. 2.2 illustrates the switching of the MTJ FL, as governed by the LLG equation, under the influence of current (Fig. 2.2(a)) and magnetic field (Fig. 2.2(b)). The values of the constants used in the model are provided in Table 2.1. Sufficient current/field will provide the necessary STT to flip the magnetization by 180° (to the other easy axis). It must be noted that the direction of the current and magnetic field plays a vital role in the FL switching.

Table 2.1 MTJ parameters used.

Parameter                                  Value
Ms                                         780 emu/cc
Demagnetization Field                      4*π*Ms
Bohr magneton (µB)                         9.27e-24 J/T
α                                          0.007
Exchange Constant (A)                      20e-12 J/m
Length (l)/Width (w)/Thickness (t) of NW   40e-9 m/40e-9 m/10e-9 m
γ                                          1.76e11 /G s
Energy Barrier (EB)                        56*kB*T

2.2.3. Design Fundamentals of DWM

2.2.3.1. Basics of DWM

DWM is an extension of STTRAM that retains all the benefits of STTRAM while providing even higher density and more varied functionality. DWM consists of three components: (a) write head, (b) read head and (c) NW. The read and write heads are similar to a conventional MTJ, whereas the NW holds the bits in terms of magnetic polarity. The most interesting feature of the NW is the formation of DWs between domains of opposite polarities. The dynamics of the NW is determined by the dynamics of the DWs. The DWs can be shifted forward and backward by injecting charge current from the left-shift (LS) and right-shift (RS) contacts. In essence, the NW is analogous to a shift register. New domains are injected by first pushing current through the shift contacts to move the bits in lockstep fashion and bring the desired bit under the write head. Next, spin-polarized current is injected through the write MTJ (using wBL and SL) in the positive or negative direction to write a ‘1’ or ‘0’ (up-spin or down-spin) in the NW. A read is performed by bringing the desired bit under the read head using shift and sensing the resistance of the MTJ formed by the DW under the read head (using rBL). This access mechanism makes the shifting of DWs critical to the functionality of the memory. The robustness, speed and power consumption of the memory depend significantly on DW dynamics. The schematic of DWM with the equations of the MTJ free layer and DW is illustrated in Fig. 2.3(a).

Figure 2.3 (a) Schematic of Domain Wall Memory and governing equations, (b) types of DWs and their dependency on NW dimensions and, (c) Bloch and Néel wall.
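The shift-register analogy can be made concrete with a purely behavioral sketch. The class below is a hypothetical illustration: the head positions, the number of domains and the method names are assumptions, and no device physics (DW velocity, pinning, shift current) is modeled.

```python
from typing import List, Optional


class DomainWallNanowire:
    """Behavioral model of a DWM nanowire as a bidirectional shift register.

    Each list entry is one magnetic domain (0 or 1, or None if never written).
    Head positions are fixed indices along the nanowire (hypothetical layout).
    """

    def __init__(self, n_domains: int, write_head: int, read_head: int) -> None:
        self.domains: List[Optional[int]] = [None] * n_domains
        self.write_head = write_head
        self.read_head = read_head
        self.shift_count = 0           # shifting is the dominant operation

    def shift_right(self) -> None:
        """Move every domain one position to the right; the last one is lost."""
        self.domains = [None] + self.domains[:-1]
        self.shift_count += 1

    def shift_left(self) -> None:
        """Move every domain one position to the left; the first one is lost."""
        self.domains = self.domains[1:] + [None]
        self.shift_count += 1

    def write(self, bit: int) -> None:
        """Set the domain currently under the write head."""
        self.domains[self.write_head] = bit

    def read(self) -> Optional[int]:
        """Sense the domain currently under the read head."""
        return self.domains[self.read_head]


nw = DomainWallNanowire(n_domains=8, write_head=2, read_head=5)
for bit in (1, 0, 1):
    nw.write(bit)          # write a domain under the write head
    nw.shift_right()       # move all bits in lockstep
read_back = []
for _ in range(3):
    read_back.append(nw.read())
    nw.shift_right()
```

Counting shifts in the model reflects the point made above: every access is preceded by shifting, so shift behavior dominates latency and energy.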

2.2.3.2. NW material and DW type

There are two types of magnetic NW materials for potential use in DWM memories namely, perpendicular magnetic anisotropy (PMA) and in-plane magnetic anisotropy (IMA). Both of these NWs store bits in terms of magnetic orientations. The easy axis of in-plane DWM is aligned horizontally whereas it is oriented vertically in PMA.

When two domains meet in the NW, a DW is formed where the magnetization changes orientation (shown schematically in the inset of Fig. 2.3(a)). The most prominent types of DWs are: (a) transverse wall (TW) and (b) vortex wall (VW). The type of DW depends on the width and thickness of the magnetic NW [12] (Fig. 2.3(b)). For the thickness and width under consideration in this work, we safely assume a TW. The TW can be either a Néel wall or a Bloch wall [20, 21] (Fig. 2.3(c)). For this study we have considered the Bloch wall, which is more prominent in thin NWs [22].

Figure 2.4 Domain wall pinning: (a) nanowire with pinning sites at q1, q2 and q3. (b) ψ vs. q plot for pinning at q1.

2.2.3.3. Pinning of DW

In order to move the DW in lockstep fashion, notches are built into the NW where the DW can stop (Fig. 2.4(a)). This also prevents the annihilation of back-to-back propagating DWs if they move with slightly different velocities. The width and depth of the notch determine the strength of pinning. Once the DW is pinned, a current pulse can dislodge it from the pinning site so that it can move to the next bit location (Fig. 2.4(b)). Process variation (PV) can result in an uneven NW surface, which may manifest as unwanted pinning potential (Fig. 2.4(a)). A minor pinning potential may not cause full pinning of the DW, but it can degrade the velocity. Accumulation of such minor velocity degradations may result in the eventual pinning of the DW (Fig. A.5(c, d)). Unwanted pinning causes functional failure from a circuit standpoint, as the desired bit may fail to propagate to the read/write head.

2.2.4. Modeling of DW dynamics

The magnetization dynamics of the DW (i.e., its movement with respect to time) is given by the Landau-Lifshitz-Gilbert (LLG) equation [23], modified to include adiabatic and non-adiabatic spin-transfer torque terms [24, 25], as follows:

$$\frac{\partial \vec{m}}{\partial t} = -\gamma\,\vec{m} \times \vec{H}_{eff} + \alpha\,\vec{m} \times \frac{\partial \vec{m}}{\partial t} - u\,(\vec{j} \cdot \nabla)\,\vec{m} + \beta u\,\vec{m} \times (\vec{j} \cdot \nabla)\,\vec{m} \qquad (2)$$

where $\vec{m}$ and $\vec{j}$ are unit vectors representing the magnetic moment of the DW and the current flow respectively, $\vec{H}_{eff} = -\frac{1}{\mu_0 M_s}\frac{\delta w}{\delta \vec{m}}$ is the effective field, α is the damping constant, β is the non-adiabatic spin-transfer torque term and u is a scalar quantity having the units of velocity. The term u depends on the current density J, the spin polarization P, the saturation magnetization $M_s$ and the Bohr magneton $\mu_B$ as follows:

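$$u = \frac{\mu_B J P}{e M_s}, \qquad \mu_B = \frac{\hbar e}{2 m_e} \qquad (3)$$

In the above expression, ℏ is the reduced Planck's constant, e is the electron charge and $m_e$ is the electron mass. Equation (2) is solved by converting the vectors from Cartesian to polar coordinates. The final equations of motion are given by

$$(1 + \alpha^2)\,\dot{q} = \gamma\Delta\,\frac{\mu_0}{2}\left(H_k \sin 2\psi - \pi H_T\right) + \alpha\Delta\gamma\left(\mu_0 H_A - \frac{V_q}{M_s d}\right) + (1 + \alpha\beta)\,u \qquad (4)$$

$$(1 + \alpha^2)\,\dot{\psi} = -\alpha\gamma\,\frac{\mu_0}{2}\left(H_k \sin 2\psi - \pi H_T\right) + \gamma\left(\mu_0 H_A - \frac{V_q}{M_s d}\right) + \frac{\beta - \alpha}{\Delta}\,u \qquad (5)$$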

Here, $\dot{q}$ and $\dot{\psi}$ are the time derivatives of the domain wall position and tilt angle respectively. The detailed derivation of the above equations from LLG is provided in Appendix A.2.
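To give a feel for how equations (4) and (5) behave, the sketch below integrates a simplified form of them in Python (the source implements the model in Verilog-A; Python is used here only as a stand-in). It assumes no applied fields and no pinning (HA = HT = 0, Vq = 0), takes γ = 1.76e11 rad/(s·T) with µ0Hk expressed in tesla, and uses an assumed DW width Δ = 20 nm; none of these simplifications come from this work. Under these assumptions, setting ψ̇ = 0 gives a steady-state velocity of βu/α, which the simulation can be checked against.

```python
import math

GAMMA = 1.76e11        # gyromagnetic ratio, rad/(s*T) (assumed SI units)
MU0_HK = 0.16          # mu0*Hk in tesla, corresponding to Hk = 1600 Oe
DELTA = 20e-9          # DW width parameter in metres (assumed value)


def simulate_dw(u, alpha, beta, t_end=50e-9, dt=1e-12):
    """Forward-Euler integration of the simplified q/psi equations.

    Assumes HA = HT = 0 and no pinning (Vq = 0).
    Returns (position q, tilt angle psi, last computed velocity q_dot).
    """
    q, psi, q_dot = 0.0, 0.0, 0.0
    denom = 1.0 + alpha * alpha
    for _ in range(int(round(t_end / dt))):
        s = 0.5 * MU0_HK * math.sin(2.0 * psi)
        q_dot = (GAMMA * DELTA * s + (1.0 + alpha * beta) * u) / denom
        psi_dot = (-alpha * GAMMA * s + (beta - alpha) * u / DELTA) / denom
        q += q_dot * dt
        psi += psi_dot * dt
    return q, psi, q_dot


u, alpha, beta = 100.0, 0.02, 0.04     # u in m/s; alpha, beta within Table 2.2 ranges
q_final, psi_final, v_final = simulate_dw(u, alpha, beta)
print(f"steady velocity = {v_final:.2f} m/s, beta*u/alpha = {beta * u / alpha:.2f} m/s")
print(f"average velocity = {q_final / 50e-9:.2f} m/s")
```

The average velocity (distance divided by time) is slightly lower than the steady-state value because it includes the initial transient while ψ settles.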

We model the position (q) and tilt angle (ψ) (Fig. 2.5(b)) of the DW using Verilog-A. The values of the constants used in the model are provided in Table 2.2. The average DW velocity is found by taking the ratio of the distance moved by the DW to the elapsed time. The outcome is illustrated for different magnitudes of α and of the non-adiabatic spin-transfer torque term (β). It can be observed that the velocity obtained from simulation (Fig. 2.5(a)) matches experimental data closely, which confirms the validity of our model.

Figure 2.5 (a) DW velocity vs. applied current (experimental values are also plotted), (b) ψ & q in a NW with a DW.

Table 2.2 DWM parameters used.

Parameter                                  Value
α                                          Varied (0.01 - 0.02)
β                                          Varied (0.0 - 0.1)
Bohr magneton (µB)                         9.27e-24 J/T
Ms                                         8e5 A/m
Exchange Constant (A)                      1.3e-11 J/m
Length (l)/Width (w)/Thickness (t) of NW   1e-6 m/1e-7 m/10e-9 m
γ                                          1.76e11 /G s
Demagnetization Field (Hk)                 1600~1800 Oe

2.2.5. Microscopic Properties

Besides entropy, the DWM possesses several microscopic properties (discussed below) that are modeled in our simulation framework:

Stochastic DW motion (speed and polarity): The DW speed depends on the type of wall, the spatial location of notches (which is stochastic in space), the shift current magnitude and frequency, and the duration for which the pulse is applied. The phase between a notch and the pulse arrival time is a stochastic process in the time domain. Fig. 2.6 represents the stochastic nature of notch location w.r.t. the shift pulse. A DW with the same initial velocity, simulated with two different shift pulse characteristics (with the same average value), moves with different velocities. Therefore, DW velocity is stochastic, and our model captures the behaviors for random roughness of the NW and DW nucleation.

Figure 2.6 DW dynamics with two different shift current magnitudes and duty cycles but the same average value (0.5 ns, 2 V pulse and 5 ns, 0.725 V pulse).

Stochastic DW pinning: The DW moves smoothly if the NW is free of roughness (notches). However, process variation-induced edge roughness can hinder and slow down the DW, which may result in eventual pinning. Pinning of a DW is a stochastic process and depends on surface roughness (magnitude and spatial location), injected shift current and magnetization dynamics. Once pinned, the DW behaves as a particle trapped in a potential well. Depinning is also a stochastic process and depends on the injected current magnitude, frequency and environmental conditions. Fig. 2.7 represents the stochastic nature of pinning due to notch location, DW speed and shift pulse, and shows that an out-of-sync pulse may degrade the DW speed and lead to pinning.

Figure 2.7 Stochastic pinning of DWs with respect to three different shift currents.

2.2.6. Macroscopic Properties

Initialization and resetting: Initially, the NW is magnetized in a preferred direction determined by the balance of exchange and anisotropy energies. Therefore, the NW is free of DWs before first access. In order to populate new information in the NW, DWs are nucleated using access MTJ (write head) by injecting sufficient current in the orthogonal direction to flip the local magnetization under the MTJ. The NW can be flushed out by simply injecting current through shift contacts and moving the bits out.

Bipolar DW nucleation: The nucleation of DW in the NW is bipolar in nature. The current polarity (through write MTJ) determines the type of DW (head-to-head or tail-to-tail).

Multiple domains/NW: The NW can hold multiple domains or information bits by nucleating new DWs through write MTJ and shifting them in the NW similar to shift register. The limit of bits per NW is set by the NW dimensions.


Serial access and bipolar shifting: In contrast to conventional magnetic devices, DWM stores multiple bits of information. The group of DWs is shifted together by injecting current. The bits in the NW can be shifted in both directions by changing the current polarity.

Misc. properties: DWM provides the opportunity to exploit multiple sense points (read heads) along the NW for continuous collection of entropy and randomness. The positions of the read heads are determined according to requirements and modeled accordingly. Each DWM is also characterized in terms of power and performance for DW nucleation, shift and sense operations. The physical dimension of the DWM with integrated MTJ access devices is estimated for footprint analysis.

2.3. Summary

We described the fundamental properties that govern the operation of STTRAM and DWM and discussed their salient features. We presented a physics-based model of STTRAM magnetization flipping as well as of the dynamics of domain wall motion. We also discussed the various microscopic and macroscopic properties surrounding DWM. These concepts form the foundation for the topics covered in Chapters 3, 4 and 5.


Chapter 3

Energy-Efficient Spintronic Applications and Techniques

Mobile devices such as smartphones, laptops and iPads demand ultra-low power and an instant-ON (ION) user experience after hibernation or power failure. Over the past decades, the power consumption of integrated circuits (ICs) has been lowered significantly through CMOS scaling. This has allowed Internet-of-Things (IoT) devices to become more portable while being powered by a modest battery or an ambient energy source (e.g. solar cells). Most IoT devices are intermittently active, making the store and restore operations fundamental to their performance and functionality.

Traditionally, this has been performed by backing-up the current processor states of the IoT into volatile memories (such as Flip-Flops (FFs)). When idle, the supply voltage to the FFs can only be lowered to the point of ensuring reliable data retention. The system therefore isn’t capable of being completely ‘powered down’, hence continually consuming energy. Additionally, the reliance on a stable power supply is essential in ensuring the processor states are maintained correctly. This makes it hard for IoTs that operate under ambient energy sources.

To address this problem, state-retentive sequential elements are being actively investigated, as they can store the processor state before the system is fully powered down. This can drastically cut the turn-ON time as well as the energy consumption of such IoT devices. Several nonvolatile flip-flops (NVFFs) have been investigated [36-40] to this effect. An NVFF saves the current logic state into its NV storage element before power gating. After wake-up, the data saved in the nonvolatile storage is restored in the flip-flop to resume normal operation.


Therefore, a quick and efficient ‘power-down’ and ‘power-up’ condition is of critical importance to provide an ION and reliable experience.

3.1. Introduction

A primary challenge in conventional NVFF [36,37,39,40] is the lack of support to handle sudden power outage. The NVFF in [38], (Fig. 3.1(a)) incorporates two additional write driver

Figure 3.1 (a) spin-MTJ based NVFF [38], (b) NVDFF [36] design, and (c) SHE-NVFF [37] based design.

Table 3.1 Design comparison with other works.

Spin-MTJ NV-DFF [36] SHE-NVFF [37] NMFF [39] NVFF [38] Technology 65nm NA 90nm 65nm Power-aware No write driver, Low power, Magnetic field Main feature bias-control small area small area based MTJ Store energy 0.304pJ 0.49pJ / 5pJ 1.5pJ

Power gating NA NA NA NA 5.65um × Area 1.2X (std FF) 1.28X (std FF) NA 10.15um 10.91ns & Clock period 10ns 3.3GHz 3.5GHz 18.54ns * 4ns for MTJ Only during Critical path 29.45ns ~300ps switching data store Test capability NO NO NO NO

FeFET NVFF [115] RRAM NVFF [116] Gated ES-NVFF

Technology 10nm 180nm 22nm predictive [42] Compact Design, Robust Sub Vt and Parallel write, fewer Main feature Low-power near Vt operation control signals Store energy 1.3fJ 0.74pJ [email protected]

Power gating NA NA YES 2X (std FF) Area 1.35X (std. FF) NA MTJ = 10X10nm Clock period ~1ns 10ns during store ~2GHz Only during data Only during data Dependent on MTJ Critical path store store write time (<1ns) Test capability NO NO YES NA – Not Available * - Two step operation with AP→ P delay (10.91ns) and P→AP delay (18.54ns).

circuitry to store the data into the MTJs, which increases the area and power overhead. Although the NVFF in [36] (Fig. 3.1(b)) provides a more power-efficient solution, incorporating the MTJ in the operational path incurs a delay overhead that limits the operating frequency of the flip-flop. A spin-Hall effect (SHE) based NVFF is presented in [37] (Fig. 3.1(c)) for energy efficiency. However, a delay of ~30ns for storing the data in the MTJ makes it impractical for per-cycle data backup. Other previously proposed methods such as [40,41] involve high delay and power overhead, as the resistance of the MTJs is sensed by a sense amplifier and forwarded to the slave circuitry.

In this chapter we propose two NVFF flavors that are capable of backing up data every cycle while maintaining a moderately low delay and providing delay testing capability. We also analyze the effect of supply voltage scaling, the asymmetry of writing into the MTJ and the impact of static leakage power using the 22nm predictive model [42]. Table 3.1 summarizes the comparison of the proposed NVFFs with [36-39, 115, 116]. It includes flip-flop performance metrics such as clock-to-Q delay, critical path delay (limited by MTJ write time) and total cycle time. It also includes store energy, area and the presence of other features such as enhanced scan capability and power gating (to save store energy when the flip-flop data is the same as the input data). The proposed design not only provides test capability and power gating, but also allows data backup in every cycle with a maximum operating frequency of ~2GHz. These features allow it to sustain unexpected power failures and provide an instant-ON user experience.

3.2. Enhanced-Scan Enabled NVFF (ES-NVFF)

Enhanced scan flip-flops are a widely accepted sequential design-for-test technique for enabling two-pattern delay testing. We incorporate the store and restore functionality in the hold latch of the enhanced scan circuitry. We propose two flavors of ES-NVFF, namely base ES-NVFF and high-performance ES-NVFF (HPES-NVFF).


3.2.1. Base ES-NVFF

The base ES-NVFF consists of two parallel latches to allow normal, enhanced scan and store-restore operations (Fig. 3.2(a)). The output of the master latch is provided to the slave as well as to the NV latch. The HOLD and REST signals control the operating mode of the FF. The primary property of this design is that the MTJs are written only during the negative phase of the clock cycle. Furthermore, the writes to the MTJs take place serially. In the following paragraphs we describe the operation of the ES-NVFF in detail.

Normal Mode: During the ‘normal’ mode of operation, both the HOLD and REST signals are low, which sets the ‘ST’ signal (also controlled by CLK), enabling transmission gate T1 and disabling T2 that is controlled by the ‘HOLD’ signal. The data from the master stage is fed to both the parallel latches. The output Q is driven by the slave latch. While the slave is pushing data out, it is also stored into the NVFF in parallel using control signals (SEN and CTRL).

Store operation: The timing diagram and the switching of the MTJ [15] resistance are captured in Fig. 3.2(b). The current paths for switching the magnetization of the MTJs with respect to their node voltages (SN1/SN2) are illustrated in Fig. 3.3(a). Signal ‘SEN’ is enabled (activating the access transistor (TA)) during the negative CLK phase. The CTRL signal is pulsed high for half of the ‘SEN’ signal to enable the writing of the MTJ with ‘0’ node voltage (SN1/SN2). A voltage difference is created between the node (SN1/SN2) and CTRL, which provides the current to switch the state of the MTJ magnetization (the transition from the AP to the P magnetization state is shown in Fig. 3.2(b)). In the next half phase, CTRL is made low to enable the MTJ with ‘1’ node voltage (SN1/SN2) to generate a current that switches its magnetization state to ‘1’. Note that the MTJ stores are done sequentially, which makes the operating frequency of the flip-flop dependent on the MTJ write latency. This is drastically different from conventional sequential element design, where the operating frequency is mostly determined by the combinational logic delay.
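The two-step store can be summarized in a short behavioral sketch. It is an illustration under stated assumptions rather than the circuit itself: SN1 and SN2 are assumed to hold complementary logic levels, the MTJ on the ‘0’ node is assumed to end in the P state and the MTJ on the ‘1’ node in the AP state, and the function names are hypothetical. The frequency bound additionally assumes equal write times for both MTJs.

```python
def store_sequence(sn1: int, sn2: int, mtj1: str, mtj2: str):
    """Behavioral model of the two-step store during the negative CLK phase.

    sn1/sn2 are the logic levels of the storage nodes; mtj1/mtj2 are the MTJ
    states ('P' or 'AP') before the store. Returns the states after each half.
    Assumed mapping: the MTJ on the '0' node is written to 'P' while CTRL is
    high, and the MTJ on the '1' node is written to 'AP' while CTRL is low.
    """
    if sn1 == sn2:
        raise ValueError("SN1 and SN2 are expected to be complementary")
    states = {"MTJ1": mtj1, "MTJ2": mtj2}
    nodes = {"MTJ1": sn1, "MTJ2": sn2}
    history = []
    # First half of SEN: CTRL = 1, current flows only where the node is at 0.
    for name, level in nodes.items():
        if level == 0:
            states[name] = "P"
    history.append(dict(states))
    # Second half of SEN: CTRL = 0, current flows only where the node is at 1.
    for name, level in nodes.items():
        if level == 1:
            states[name] = "AP"
    history.append(dict(states))
    return history


def max_store_frequency_hz(t_write_s: float, duty_cycle: float = 0.5) -> float:
    """Clock frequency bound when two serial MTJ writes must fit in the
    negative clock phase (assumes equal write times for both MTJs)."""
    negative_phase_fraction = 1.0 - duty_cycle
    return negative_phase_fraction / (2.0 * t_write_s)


after_first_half, after_second_half = store_sequence(0, 1, mtj1="AP", mtj2="P")
```

The bound makes the dependence explicit: because both writes share one clock phase, halving the MTJ write time doubles the achievable store frequency.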


Enhanced Scan Mode: Fig. 3.2(b) shows the waveforms in enhanced scan mode. The test pattern V1 is first shifted into the NV latch of all the flip-flops by the scan-out chain (SO). The HOLD signal is then asserted, which makes NORM=0 (disabling transmission gate T3) and enables transmission gate T2. The output is driven by the NV latch while the second test pattern V2 is scanned through the scan-in (SI) port and shifted in the scan chain. Next, HOLD is made ‘0’ (re-enabling T3) and the two-pattern transition is injected into the combinational logic. Note that the test clock is typically much slower than the functional clock. Therefore, MTJ write latency is not critical for performance.

Figure 3.2 (a) Schematic of the proposed base ES-NVFF circuit, and (b) corresponding timing diagram describing the various operation modes.

Figure 3.3 Base ES-NVFF current paths during (a) store and, (b) restore.

Note that different clocks namely system clock and test clock are used during normal mode and test mode respectively. In the proposed design a common clock is used in both normal flop and enhanced scan latch. The selection between clocks is done with the help of a MUX that is located elsewhere. The test clock is selected only during the test mode when the ‘HOLD’ signal is asserted.

The store and restore operation is performed during normal mode without asserting the ‘HOLD’ signal. Therefore, the transmission gates of enhanced scan latch are controlled by the system clock.

When the enhanced scan latch is activated in test mode by asserting HOLD signal, the test clock is selected for the scan, hold and launch operations. Note that the MTJs will get written during test mode as well. This may not be an issue since the test clock is typically slower than normal clock.

In other words, the proposed design performs the store operation every cycle (and not only before power-off) seamlessly, without the need of switching normal and test clock.
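The control relationships described for the base ES-NVFF can be summarized in a small behavioral sketch. It is a simplified reading of the text, not the circuit itself: it assumes ST is set only when both HOLD and REST are low, NORM is the complement of HOLD, and T4 follows REST. The additional dependence of ST on CLK is ignored, and the names `GateState` and `gate_state` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateState:
    t1: bool  # enabled by ST (CLK gating of ST is ignored in this sketch)
    t2: bool  # enabled by HOLD (output driven by the NV latch)
    t3: bool  # enabled while NORM=1, i.e. HOLD=0
    t4: bool  # enabled by REST (sets slave node S1 during restore)


def gate_state(hold: bool, rest: bool) -> GateState:
    """Map the HOLD/REST control signals to transmission-gate enables."""
    st = not (hold or rest)  # assumption: ST is set only when both are low
    norm = not hold          # HOLD asserted makes NORM=0
    return GateState(t1=st, t2=hold, t3=norm, t4=rest)


for name, hold, rest in [("normal", False, False),
                         ("enhanced scan hold", True, False),
                         ("restore", False, True)]:
    print(f"{name:>18}: {gate_state(hold, rest)}")
```

The sketch only enumerates which gates the text says are enabled in each mode; it does not model timing or the MTJ writes.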

Restore Mode: The restore operation is illustrated in Fig. 3.2(b). Fig. 3.3(b) describes the current paths during the restore mode. Initially, the power supply (Vdd) and SEN are ramped up while maintaining a low CLK signal. Because the two MTJs are in opposite magnetization states, their resistances differ and so does the current each path can conduct. As the node voltages (SN1/SN2) rise, charge is drained through the MTJ resistances. The difference in path resistance produces a current mismatch, which in turn leads to a voltage difference between SN1 and SN2. This enables the back-to-back inverters to latch to the corresponding logic state (as observed in Fig. 3.2(a)). Once latched, T4 is activated (with a high REST signal) to set the slave latch node voltage S1 (as shown in Fig. 3.3(b)) with minimal contention, thus completing the restore operation.

3.2.2. High Performance ES-NVFF (HPES-NVFF)

Figure 3.4 (a) Schematic of the proposed HPES-NVFF circuit, and (b) corresponding timing diagram describing the various operation modes.

From the previous design we note that the base ES-NVFF stores the data serially during the negative phase of the CLK cycle. Since the MTJ write is delay intensive, it limits the frequency of the flip-flop. The primary feature of the HPES-NVFF design is that it allows the MTJs to be written in parallel, thereby increasing the frequency of operation. Fig. 3.4(a) shows the schematic of the proposed HPES-NVFF design. We parallelize the write of the MTJs by removing the access transistor (TA) and CTRL and using the output of the master stage to drive the MTJs. Therefore, CTRL is replaced by the data input of the nonvolatile latch (NVL), through the inverters (I1, I2 and I3). Inverters I1, I2 and I3 form the necessary complementary inputs to the MTJs. By providing separate write drivers to the MTJs, the write operation can be performed during the entire CLK cycle.

The timing diagram and the switching of the MTJ [15] resistance are captured in Fig. 3.4(b). We observe a functionality similar to that of the base ES-NVFF; however, both MTJs are written in parallel (as shown in Fig. 3.4(b)) upon receiving a new input (during high CLK). Additionally, by removing the access transistor, the resistance of path 1 (shown in Fig. 3.5) is reduced. This allows a 5-8X larger current (by correspondingly sizing the INVs) to flow into the MTJs, thus reducing their effective write time, which in turn leads to an increase in operation frequency. As much as 5X frequency benefit can be obtained compared to the base ES-NVFF at the cost of extra area overhead (discussed in Section IV) due to the drivers. By controlling the transmission gates (T1, T2, T3 and T4), both ES-NVFF and HPES-NVFF can store the information every cycle or back up data prior to power gating.
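The frequency argument above can be made concrete with a small sketch. It assumes a 50% duty-cycle clock, so that the base ES-NVFF gives each MTJ a quarter of the clock period (the negative phase, split in two by CTRL), while the HPES-NVFF gives both MTJs the full period. The function name, the cycle-fraction constants and the 500ps write time are illustrative assumptions, not values taken from the design.

```python
def max_clock_hz(write_time_s: float, cycle_fraction: float) -> float:
    """Upper bound on the clock frequency when one MTJ write of duration
    write_time_s must fit inside cycle_fraction of the clock period."""
    if write_time_s <= 0.0:
        raise ValueError("write_time_s must be positive")
    if not 0.0 < cycle_fraction <= 1.0:
        raise ValueError("cycle_fraction must be in (0, 1]")
    return cycle_fraction / write_time_s


# Assumed write windows for a 50% duty-cycle clock:
SERIAL_FRACTION = 0.25    # base ES-NVFF: half of the negative phase per MTJ
PARALLEL_FRACTION = 1.0   # HPES-NVFF: entire CLK cycle for both MTJs

T_WRITE_S = 500e-12       # hypothetical worst-case MTJ write time

base_hz = max_clock_hz(T_WRITE_S, SERIAL_FRACTION)
hpes_hz = max_clock_hz(T_WRITE_S, PARALLEL_FRACTION)
print(f"base ES-NVFF bound: {base_hz / 1e9:.2f} GHz")
print(f"HPES-NVFF bound:    {hpes_hz / 1e9:.2f} GHz")
```

Under these assumptions the wider write window alone accounts for a 4X higher bound at equal write time; any further gain would have to come from the shorter write time that the larger write current provides.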

3.3. Design Analysis of HPES-NVFF

In this section we analyze the HPES-NVFF with respect to supply voltage scaling and the inherent asymmetry in write latency between the two MTJ states (i.e., parallel-to-antiparallel and vice versa). Additionally, the impact of static power during retention and back-to-back writing of the same data is investigated. We also present an input-dependent power gating scheme to mitigate static power.

Figure 3.5 Current paths for the MTJ operation.

3.3.1. Write Asymmetry: Analysis and Mitigation

The write asymmetry in NVFFs originates from two sources: (a) the inherent asymmetry between P→AP and AP→P switching, since the polarization of the charge current is higher during AP→P switching, which makes its write latency shorter than that of P→AP switching [43, 44]; and (b) asymmetry in the driving circuit for the two MTJs. The operating frequency of the flip-flop is determined by the worst-case MTJ write latency. Therefore, thorough analysis and mitigation are needed to eliminate the frequency bottlenecks originating from MTJ write latency.

It is clear from Fig. 3.5 that the node voltage of SN1 is maintained by the driving capability of the master stage inverter, which is typically sized larger. The MTJs are correspondingly written by the bi-directional currents flowing in paths 1 and 2 respectively. We note that path 1 (I1, T1 and M1) is the critical current path due to the presence of multiplexers, which increases the path resistance and thus effectively reduces the overall write current. Aside from this issue, the switching property of the MTJ is directly dependent on the current magnetization state of the free layer with respect to the fixed layer. This leads to an asymmetry in writing. The critical path delay is affected by the cumulative effect of the slower P→AP switching of the MTJs and the resistive path.

Figure 3.6 Asymmetry in the MTJ write times for (a) MTJ1 and, (b) MTJ2.

Fig. 3.6 captures the AP→P and P→AP switching times of both MTJs with respect to supply voltage. For circuit simulation we have used the Verilog-A model of the MTJ following [42], which incorporates the polarization difference between P→AP and AP→P. We note that the switching time for MTJ1 from P→AP is the slowest. In order to bridge the gap between the switching times of the two magnetization states, the sizing of the current path components needs to be carefully selected. To improve the path-1 write current, we increase the driving capability of the master stage inverter by increasing the size of the PMOS and the transmission gate, and the size of the NMOS transistor of inverter I1. In current path-2, we increase the size of the PMOS of the inverter that drives node SN2 and also the size of the NMOS of the feedback inverter.

The dotted lines in Fig. 3.6(a-b) represent the variation of write time of P→AP of MTJ1 and MTJ2 for 8X, 16X and 24X transistor size respectively. We notice a significant reduction of write time (~60%) with the increase in transistor sizing. Furthermore, the gap between the write times of the two magnetization states is reduced by 81% at 1V and 103% at 0.8V. A similar sizing exercise can be performed for the current path-2 to ensure near uniform write times. Aside from improving transistor sizing, boosting supply voltage also reduces the overall write time. A supply voltage of 1.1V allows us to operate the flip-flop at a frequency of ~2GHz with 16X sizes. At 0.7V the operating frequency is ~0.75GHz.

3.3.2. Power Gating Scheme

Upon completion of the write operation in both the MTJs, the high branch current continues to flow, leading to a large amount of short circuit leakage current. This not only increases the power consumed but also degrades the lifetime of the MTJ [44]. Furthermore, infrequent switching of the input consumes static power even when the MTJs are not written with new values (Fig. 3.8).

Figure 3.7 GHPES-NVFF, (a) circuit schematic, and (b) timing diagram illustrating the gating process.

In order to efficiently cut down the unnecessary static power, we apply a power gating technique to shut off the INVs I1 and I3 (as shown in Fig. 3.7(a)) when there is no change in the input of the flip-flop. This is achieved by comparing the current state of the flip-flop with the previous state. If the value is unchanged, the write drivers are disconnected from the supply using gating transistors. The gated HPES-NVFF (GHPES-NVFF) circuit schematic is shown in Fig. 3.7(a), wherein the NVL is shifted to the master section and a tertiary latch is incorporated to compare the previous and current states of the flip-flop. The tertiary latch is connected in series to the slave latch in order to provide a one-cycle delay between the states being compared. This allows the NVL one whole cycle to store the input in the respective MTJs (as shown in Fig. 3.7(b)). The gating transistors are controlled by a sleep signal, which is obtained by XOR-ing the node voltages SN4 and SN3 (previous and current voltage states). We note from Fig. 3.7(b) that the sleep signal is only active for one CLK cycle, when there is a mismatch between SN4 and SN3. When the input remains unchanged, the MTJ1 and MTJ2 nodes are floating, thereby disconnecting them from the current path. Note that both PMOS and NMOS gating is required due to the bidirectional nature of the current that flows through the MTJ depending on input data polarity. The XOR gate and tertiary latch add extra area overhead to the design.
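The gating decision described above can be sketched behaviorally. The following is a minimal model, assuming ideal one-bit states and a one-cycle-delayed copy of the previous state (the role played by the tertiary latch); the function names and the example input sequence are hypothetical and only illustrate the XOR comparison, not the transistor-level circuit.

```python
def write_drivers_enabled(previous_bit: int, current_bit: int) -> bool:
    """XOR of the previous and current states: the write drivers are
    connected only when the stored value actually changes."""
    return bool(previous_bit ^ current_bit)


def gating_trace(inputs, initial_state=0):
    """Return, per CLK cycle, whether the write drivers are connected."""
    previous = initial_state  # value held by the (modeled) tertiary latch
    trace = []
    for bit in inputs:
        trace.append(write_drivers_enabled(previous, bit))
        previous = bit        # one-cycle delay between compared states
    return trace


example_inputs = [1, 1, 1, 0, 0, 1]  # hypothetical input sequence
trace = gating_trace(example_inputs)
active_cycles = sum(trace)
print(trace)
print(f"drivers connected in {active_cycles} of {len(trace)} cycles")
```

In this model the drivers stay disconnected for every cycle in which the input repeats, which is the situation where the ungated design would keep drawing static current.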

Fig. 3.8 quantifies the leakage energy of both switching directions (AP→P and P→AP) for MTJ1 and MTJ2 when the input stays unchanged for multiple back-to-back CLK cycles. We observe a significant reduction (almost 100X) in the short circuit leakage energy of the GHPES-NVFF compared to the HPES-NVFF. This feature is absent in all current state retention flip-flops.

Figure 3.8 Comparison of short circuit leakage energy between the HPES-NVFF and GHPES-NVFF.

Table 3.2 summarizes the salient features of the three proposed NVFFs. It can be noted that the base ES-NVFF requires two control signals (one for the TA and the other to the MTJ) for storing the corresponding data. Additionally, the storage operation is performed serially (one MTJ after another) and only during the negative clock phase (Fig. 3.2(b)). This results in a reduction in the operation frequency. In order to improve performance, we proposed the HPES-NVFF, where the store operation occurs in parallel and the entire clock cycle is utilized for writing data into the MTJ. However, this leads to a high current through the MTJs, which consumes high power and might degrade MTJ lifetime. In order to reduce the static power, the input gating scheme is proposed. The GHPES-NVFF uses a tertiary latch to hold the previous data, which is compared with the current data. If there is no change between the previous and current data, the feedback inverters are disconnected from the power supply, thus eliminating the static power.

Note that the proposed design is free of read disturbance. During the restore operation the voltage at the nodes SN1 and SN2 is latched and the feedback inverters at the drain of the MTJs reset the MTJ into its corresponding correct state. Once the latch is initialized in the same polarity as the MTJ, it writes the same polarity back, thereby avoiding read disturbance. During normal operation, once the store is done, the latch again writes the same polarity as the stored value. Therefore, read disturbance is absent.

Table 3.2 Comparative analysis of proposed techniques.

Title                    | Base ES-NVFF                        | HPES-NVFF                             | GHPES-NVFF
Main feature             | With enhanced scan, no write driver | Parallel write, fewer control signals | Input gated to achieve low power consumption
Store Energy             | 0.55pJ @ 1.1V                       | 0.57pJ @ 1.1V                         | 0.57pJ @ 1.1V
Restore Energy           | 33fJ                                | 58fJ                                  | 58fJ
Timing                   | ~350MHz                             | ~2GHz                                 | ~2GHz
Power Gating             | NO                                  | NO                                    | YES
Area                     | 1.8X (std FF)                       | 2X (std FF)                           | 2.5X (std FF)
C-to-Q delay (fall/rise) | 30.9/33.5ps                         | 28.9/32.1ps                           | 31.2/33.8

3.4. Other Energy Efficient Techniques

In order to achieve the highest area density, a crossbar type memory architecture is required. However, a 1T-1STT cell suffers from high current requirements and the problem of sneak paths. To truly achieve a low-power and dense design, a novel selector switch that offers a very good ION to IOFF current ratio with a small footprint is required.

3.4.1. Exploration of Selector Diode-STTRAM Crossbar**

Transistor-based selectors increase the bitcell footprint. Usage of selector diodes (SD) can greatly reduce the bitcell footprint; however, due to the poor ION to IOFF ratio of the MTJ, the method was infeasible. In order to overcome this issue, we investigate a novel metal-insulator-insulator-metal (MIIM) diode which can conduct bi-directionally. We describe a 1D-1MTJ model which we use to perform design space exploration for large and robust arrays.

Figure 3.9 (a) MIIM diode stack, (b) band-diagram at 0 bias, (c) band-diagram at negative bias (Vr) on TE (Vr < VT-), (d) band-diagram at negative bias (Vr) on TE (Vr > VT-), (e) band-diagram at positive bias (Vf) on TE (Vf > VT+). BE is grounded.

3.4.1.1. Overview

A two-terminal MIIM tunneling diode contains a stack of metal-1/oxide-1/oxide-2/metal-2 (Fig. 3.9(a)). When the negative bias applied on the top electrode (TE) is less than the negative threshold voltage (VT-), the electrons tunnel through the full thicknesses of oxide-1 and oxide-2, and therefore this current (dominated by direct tunneling) is very low (Fig. 3.9(c)). When the voltage reaches VT-, the electrons experience just the thickness of oxide-2, and the tunneling current is expected to increase, leading to the turn-ON of the device in reverse bias (Fig. 3.9(d)). In forward bias (i.e. positive voltage on TE), Fowler-Nordheim (FN) current will dominate through the triangular barrier due to the lower barrier height of BE on HfO2 (ΦBE) (Fig. 3.9(e)). This will turn ON the device at VT+.

Figure 3.10 (a) I-V curve of 20nmx20nmx5nm MTJ obtained using [15] with Ms=780Oe, Ea=56kT erg, Ku=Ea/v erg/cc, alpha=0.007, pol=0.8. (b) I-V curve of 1D-1MTJ for various RL.

** This work was in collaboration with Dr. Rashmi Jha and Rekha Govindaraj.

Fig. 3.10 illustrates the MTJ and 1D-1MTJ I-V plots. It is noted from Fig. 3.10(a) that the MTJ flips at lower voltages since the critical current is delivered. However, in the case of the 1D-1MTJ, the I-V curve is dominated by the diode. This underscores the need for the diode to deliver a high ION quickly above VT in order to lower the write voltage. The opening of the I-V curve determines the sense margin. Fig. 3.10(b) illustrates the I-V curve of the 1D-1MTJ for various RL. It is evident that a higher RL can improve the sense margin. Additionally, the AP→P switching can limit the write voltage due to the asymmetric diode I-V. The dependence of the write voltage and sense margin on RL indicates the need for an optimization for robust read and lower Vmin.

Figure 3.11 Sneak current during sensing.

Fig. 3.11 shows the read schematic and explains the lower sneak current with the proposed SD. The selected bit is sensed in reverse bias, which in turn reverse biases two out of three diodes in the sneak path. Due to voltage division, each diode sees less than Vread/3, which is less than VT-. The sneak path current is cut down by orders of magnitude due to the extremely low leakage of the SD in reverse bias. Hence a robust sense margin is attained even for larger crossbar arrays.

3.4.2. Mitigating Joule Heating and Electromigration in DW NW

The shift operation is the most widely used operation in DWM. The high current densities required for shifting the DWs in the NW result in Joule heating and a corresponding rise in temperature and performance degradation. The crucial challenge in DWM is the reliable shift operation, which requires a high current density (~10^10-10^12 A/m^2) to push the DWs. The DW velocity degrades as the temperature increases over time. Therefore, the desired DW will fail to reach the read/write head, resulting in functional failure. In order to assess the impact of Joule heating we consider four operating modes:

Continuous shift (always ON): This is the worst-case condition with a high activity factor of 1. Here we assume that bits are always shifted left and then right (without any read/write operation).

Operational mode shift: This is a realistic worst-case condition where the bits are shifted to cover the entire length of the NW, followed by read/write and a shift backwards. The activity factor in this scenario is ~0.86.

Two bank interleaving (BI-2): In this case a full shift forward/backward operation is interleaved with another full shift operation of another bank. As the banks are selected alternately, the activity is 0.5.

Eight bank interleaving (BI-8): In this case a full shift forward/backward operation is interleaved with full shift operations of 7 other banks. This is the best-case scenario with an activity factor of 0.11.

For the thermal simulation we also assume a constant number of shift operations (10^4) on the target bank for each operating mode. Furthermore, we assume that the same row is accessed consecutively to simulate the worst-case condition. We extract the DW velocity for each current density and use that to estimate the shift frequency and simulation time.

Figure 3.12 (a) & (b) Max. resistance and temperature of NW for different current densities and operating modes.

Figure 3.13 (a) Transient velocity of the DW for constant voltage assumptions and, (b) avg. velocity of DW w.r.t current density for different operating modes.

Fig. 3.12(a) shows the maximum resistance of the NW for different operating modes and current densities. It can be noted that the resistance can increase by 15%-25% depending on the operating mode due to higher current density and Joule heating. The increase in resistance results in more heat dissipation and a corresponding rise in temperature (Fig. 3.12(b)). This creates a positive feedback loop where the NW can get damaged due to electromigration and overheating. BI-2 (BI-8) reduces the maximum temperature by 2% (24%) for J=4e12 A/m^2. Note that these results are valid under a constant current assumption. In reality the shift current is implemented using constant voltage sources. Therefore, the increase in NW resistance decreases the current, which in turn reduces the overheating at the cost of speed degradation. We have also simulated the constant voltage scenario for target initial current densities. Fig. 3.13(a) plots the transient DW velocity for different starting currents. It can be observed that the DW velocity can degrade significantly (~52%) due to the rise in operating temperature for J=4e12 A/m^2. Therefore, the operating frequency of the NW can degrade by 2X after a few thousand shifts. This will require significant guard-banding to ensure functionality. Fig. 3.13(b) shows the average DW velocity vs. current for the four cases under consideration. BI-16 (BI-8) shows 11% (6%) improvement in speed compared to the worst-case operating mode. Therefore, bank interleaving is a feasible approach to mitigate Joule heating.
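The feedback between resistance and temperature, and the difference between constant-current and constant-voltage shifting, can be illustrated with a lumped sketch. This is not the thermal simulation used in this chapter: it assumes a linear temperature coefficient of resistance, a single thermal resistance to ambient, and heating scaled by the activity factor. All parameter values and the function name are hypothetical; only the four activity factors are taken from the text.

```python
def steady_state_temperature(activity, mode, *, t_ambient=300.0, r0=1000.0,
                             alpha=2e-3, r_thermal=1e5, i0=1e-3,
                             iterations=200):
    """Fixed-point estimate of the nanowire temperature (K).

    mode="current": the shift current stays at i0 as resistance rises.
    mode="voltage": the source voltage stays at i0 * r0, so current drops.
    """
    if mode not in ("current", "voltage"):
        raise ValueError("mode must be 'current' or 'voltage'")
    v0 = i0 * r0
    temp = t_ambient
    for _ in range(iterations):
        # Assumed linear rise of resistance with temperature.
        resistance = r0 * (1.0 + alpha * (temp - t_ambient))
        if mode == "current":
            power = i0 ** 2 * resistance
        else:
            power = v0 ** 2 / resistance
        # Average heating scales with the fraction of time spent shifting.
        temp = t_ambient + activity * r_thermal * power
    return temp


ACTIVITY = {"always ON": 1.0, "operational": 0.86, "BI-2": 0.5, "BI-8": 0.11}
for name, a in ACTIVITY.items():
    t_cc = steady_state_temperature(a, "current")
    t_cv = steady_state_temperature(a, "voltage")
    print(f"{name:>12}: constant current {t_cc:6.1f} K, "
          f"constant voltage {t_cv:6.1f} K")
```

With these assumed parameters the constant-current case settles at a higher temperature than the constant-voltage case, and lower activity factors settle closer to ambient, which is the qualitative trend described above.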

3.5. Summary

We propose two NVFFs, which offer fast data store and restoration (within single clock cycle) from intentional and unintentional power outages. The proposed flip-flops also provide enhanced scan functionality that is needed for two-pattern delay testing. The design employs input from the master stage to store the values in MTJ which in turn eliminates the need for an external control and driver circuitry. Compared to existing techniques, HPES-NVFF utilizes the entire CLK cycle for the backup operation thus eliminating frequency bottleneck originating from MTJ write latency. Also, we looked into design issues that lead to asymmetry of MTJ write latency and high static power. We incorporate power gating technique to ensure low-power and robust operation.

We also investigate the MIIM diode as a Selector Device (SD) for the STTRAM crossbar. The SD design space is analyzed by considering the trade-off between retention time, read/write voltages, sense margin and array size.

Furthermore, our investigation revealed that DW can experience significant overheating and slowdown posing threat to its applicability as embedded cache. We proposed bank interleaving and pulsed shifting as design techniques to mitigate the impact of variability and temperature induced reliability degradation while enabling low-power and high frequency operation.


Chapter 4

Secrecy and Privacy Issues of Spintronic Memory **

Although promising, STTRAM LLC brings new security challenges that were absent in conventional volatile memories such as Static RAM (SRAM). The root cause is persistent data and the fundamental dependency of the memory technology on ambient parameters such as magnetic field and temperature that can be exploited to compromise the data. Additionally, STTRAM suffers from high write latency and write current, both of which depend on the polarity of the data being written. These factors introduce security vulnerabilities and expose the cache memory to side channel attacks.

In this chapter we discuss the STTRAM vulnerabilities such as high latency, high switching current, temperature and asymmetric read/write current. We also present attack models that build upon the vulnerabilities described. We then discuss preventive techniques to obfuscate the current signature and/or make the attack difficult or nearly impossible. Since the supply current signature is prominent during the write operation, we focus our efforts on obfuscating the write current signature. We then describe the privacy issues plaguing non-volatile memory cache by taking the example of STTRAM cache. Finally, we describe the applicability of the proposed attack model and countermeasures for various scenarios.

** This work was in collaboration with Nitin Rathi. Fig. 4.3, 4.5, 4.6, 4.7, 4.8, 4.10, 4.11 and 4.14 were provided by Nitin Rathi.

4.1. Introduction

Non-volatile memories (NVMs) such as Spin-Transfer Torque RAM (STTRAM), Magnetic RAM (MRAM), ferroelectric RAM etc. have drawn significant attention due to the complete elimination of bitcell leakage. In addition to a plethora of benefits, an NVM LLC brings new security challenges that were absent in its conventional volatile memory counterparts such as Static RAM (SRAM) and embedded Dynamic RAM (eDRAM). The root cause is persistent data that may allow the adversary to retrieve sensitive information like passwords or cryptographic keys.

STTRAM/DWM depends on ambient parameters that can be exploited to tamper with the stored data. The free layer of the MTJ flips under the influence of an external magnetic field, which can be exploited by the adversary to launch magnetic attacks using a horseshoe magnet or an electromagnet [46]. The switching of the MTJ also depends on the ambient temperature: at high temperature the MTJ resistance reduces, resulting in high read and write current [47]. The increased read current leads to read disturb failures, where the bits are accidentally flipped during the read operation. The temperature can also be exploited to extend the persistence of the memory [48]. The persistent user data in a non-volatile cache can also be compromised by launching unauthorized read and write operations and probing the data buses after the authentic user has logged off. The persistent data leaving the cache can also be accessed by probing the data bus between the cache and main memory [49].

Traditional cache attacks can also be extended to STTRAM/DWM, such as (a) micro-probing, where conductors are attached to the chip surface directly to interfere with the integrated circuit; (b) radiation imprinting, where the contents are burned in using X-ray radiation to prevent overwriting or erasing of stored data; and (c) optical probing, where a laser is shone on the surface, activating the underlying circuit. The active components glow, which can then be used to interpret the stored data.

Figure 4.1 System level view comprising the CPU, LLC and external voltage regulator. The adversary can monitor die current and/or regulator current.

In this chapter we describe the Simple Power Analysis (SPA) based side channel attack, which deciphers the contents of the STTRAM/DWM LLC by monitoring the current drawn from the supply during read and write operations. The fact that STTRAM/DWM is associated with high write latency, high write current and asymmetric (polarity dependent) writes makes it vulnerable to side channel attacks that can compromise data privacy and integrity. The current in a circuit can be measured by inserting a small resistance in series with the Vdd or ground rail and measuring the voltage drop across it. Sophisticated devices can be used to sample the voltages at high rates (1GHz) with excellent accuracy (< 1% error) [50]. The system level illustration of the die and the regulated power supply is shown in Fig. 4.1. Although on-chip regulators have been investigated, their limited presence in ICs makes SPA-based attacks a non-trivial threat.
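The series-resistance measurement mentioned above reduces to Ohm's law. The following minimal sketch converts sampled voltage drops into supply current; the function name, the 0.1 ohm shunt and the sample values are hypothetical and are only meant to show the conversion.

```python
def supply_current_a(shunt_voltages_v, shunt_ohms):
    """Convert voltage drops measured across a series shunt resistor (V)
    into supply current samples (A) using I = V / R."""
    if shunt_ohms <= 0.0:
        raise ValueError("shunt_ohms must be positive")
    return [v / shunt_ohms for v in shunt_voltages_v]


# Hypothetical samples across an assumed 0.1 ohm shunt in the Vdd rail.
samples_v = [0.010, 0.012, 0.020]
currents_a = supply_current_a(samples_v, shunt_ohms=0.1)
mean_current_a = sum(currents_a) / len(currents_a)
print([f"{i * 1e3:.0f} mA" for i in currents_a])
print(f"mean: {mean_current_a * 1e3:.0f} mA")
```

A smaller shunt disturbs the supply less but produces a smaller voltage to sample, so the choice trades measurement resolution against intrusiveness.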

Fig. 4.2 shows the variation of supply current when a 512b word is written into the LLC. In order to mimic the power signature of a processor core, we implement 15, 17, 19 and 21-stage ring-oscillators and instantiate each 250 times. We note the change in the DC current level upon a bit-flip from all-0 to all-1, which is a direct indication of the value of the data being written. The data can be extracted more easily by forcing the CPU into idle mode. Data privacy can be addressed to some extent by semi nonvolatile memory (SNVM), which is similar to NVM but with very low retention time (e.g. 1s instead of 1yr). The retention time is intentionally lowered to improve latency and power. Additionally, it provides better privacy as the data vanishes after power is turned OFF. However, we note that SNVM is not sufficient; therefore, we propose an erasure architecture to destroy the data at power OFF. Since erasure could be a power intensive operation (and might need a backup battery under a power failure attack), we propose to exploit the residual charge present in the power rail. A canary circuit is proposed to track the MTJ write time under unregulated supply.

Figure 4.2 Power signature of an example system consisting of 4 flavors of ring-oscillators (250 each) to mimic CPU along with a 512b word STTRAM LLC.
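The observation that the DC current level shifts with the written data can be turned into a simple estimate. The sketch below assumes, purely for illustration, that each bit flipped from 0 to 1 changes the DC supply current by the same fixed amount and that the CPU contribution is unchanged between the two measurements. The per-bit current step, the current levels and the function name are hypothetical.

```python
def estimate_flipped_bits(i_before_a, i_after_a, delta_per_bit_a,
                          word_bits=512):
    """Estimate how many bits changed state from the shift in DC supply
    current, assuming a fixed current step per flipped bit."""
    if delta_per_bit_a <= 0.0:
        raise ValueError("delta_per_bit_a must be positive")
    count = round((i_after_a - i_before_a) / delta_per_bit_a)
    # Clamp to the physically possible range for one word.
    return max(0, min(word_bits, count))


STEP_A = 1e-4  # hypothetical current step per flipped bit (100 uA)

all_flipped = estimate_flipped_bits(1.0, 1.0512, STEP_A)
half_flipped = estimate_flipped_bits(1.0, 1.0256, STEP_A)
none_flipped = estimate_flipped_bits(1.0, 1.0, STEP_A)
print(all_flipped, half_flipped, none_flipped)
```

In a real measurement the CPU activity adds a varying baseline, which is why forcing the CPU into idle mode makes the data easier to extract.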

The highlights of this chapter are the following:

• Description of STTRAM security vulnerabilities such as long write latency, high write current and asymmetric read/write currents.

• Proposition of side channel attack models to weaken the data privacy.

• Proposition of design techniques such as short retention STTRAM, parity encoding and random write to obfuscate the side channel signature.

• Proposition of constant current write technique to eliminate polarity dependent write current signature.

• Description of energy-efficient erasure by reusing the residual charge present in power rail.

4.1.1. Threat Model

We focus on defending against side channel-based power analysis attacks for ensuring secrecy. We assume that adversaries can accurately decipher the data being written to or read from the cache by filtering out the ‘noise’ and other power profiles. We also assume that the adversary will be able to decipher the data under varying temperature and voltage conditions.

We focus on erasing the data from the LLC to protect against privacy infringement. We assume that the system will have sufficient power in the power-rails to successfully erase most of the valid, tag and data bits during shut-down. We assume that the adversary will be able to accurately read-out the contents of the cache and piece together the necessary secret information.

We also assume that the system will be shut-down normally allowing for the proposed sequence of events.


4.2. Side Channel Attacks on STTRAM & Countermeasures

4.2.1. STTRAM Functional Vulnerabilities

Read/Write Latency: The write latency of STTRAM is a function of the thermal stability factor (Δ), which in turn depends on the retention time. For 10-year retention Δ = 56 is required [51], which corresponds to a write latency of 0.67ns at 1V supply. Furthermore, STTRAM is susceptible to process variation (PV) [52], which randomly increases the thermal stability of some bits, especially in larger arrays. Therefore, some bits suffer from excessively high read and write latencies. Fig. 4.3(a-b) shows the read and write latency distribution of a 40nm x 40nm x 4nm STTRAM under PV. A 5000-point Monte Carlo simulation is performed, and the data is extrapolated to 8MB using extreme value theory in Matlab. It is observed that the worst-case write (read) latency is 1.3X (3.4X) the mean value. To avoid read and write failures, the worst-case latency is followed for the entire memory array, which results in a longer wordline pulse. The longer read and write latency presents more opportunity to the adversary to analyze the side channels and weaken the data privacy.

[Figure 4.3 plot: # of occurrences (x10^6) vs. latency (ns). (a) Write latency: mean = 0.67ns, min = 0.49ns, max = 0.89ns (1.3X the mean). (b) Read latency: mean = 0.2ns, min = 0.06ns, max = 0.67ns (3.4X the mean).]

Figure 4.3 (a) Write latency; (b) read latency distribution of an 8MB STTRAM cache under process variation. The long read and write latency presents wider attack window to the adversary.

Read/Write Current: Another aspect of STTRAM is the high write current which is dependent on thermal stability, retention time and the polarity of the stored data. We assume constant voltage write which is commonly employed to simplify the write driver design [53].

STTRAM resistance is high (low) during state ‘1’ (‘0’). Fig. 4.4(a) shows the supply current waveform for a single-bit write of ‘1’ when the previously stored value is ‘0’. Initially the current is high (STTRAM resistance low) and it goes low after a successful write. Fig. 4.4(b) shows the supply current waveform for a write of ‘0’ when the previously stored value is ‘1’; in this case the current is initially low and goes high after a successful write. The high and low states of current are very distinct, and they reveal information about the previous and new data. The current difference between the states depends on the Tunnel Magneto Resistance (TMR) of STTRAM, which is given by (RH-RL)/RL. For robust read operation a higher TMR is desired, which adversely affects the data privacy. The read current is comparatively less than the write current (Fig. 4.5); thus the read and write operations can be distinctly identified from the current waveforms. The source degeneration based read sensing is used in this work [54].

[Figure 4.4 plot: |Current| (mA) vs. write latency (ns) for panels (a) and (b), with high-current and low-current levels labeled by data value (‘0’ and ‘1’).]

Figure 4.4 (a) Supply current waveform (y-axis values are negative) for write ‘1’, and (b) write ‘0’. A significant gap is present between write ‘0’ and ‘1’ currents which can be employed as signature. Furthermore, the magnitude of write current is a function of stored data which also acts as a signature. The size of MTJ is (40X40X4) nm, ∆ is 56, damping constant α is 0.007, and saturation magnetization is 780Oe.
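The dependence of the current gap on TMR can be made explicit with a simplified derivation. This is only a sketch: it assumes an ideal constant voltage write in which the voltage V is dropped entirely across the MTJ (the access transistor and wire resistance are ignored), and the symbols V, I_L and I_H (the supply current in the low- and high-resistance states) are introduced here for illustration.

```latex
\begin{align}
\mathrm{TMR} &= \frac{R_H - R_L}{R_L}
  \quad\Rightarrow\quad R_H = R_L\,(1 + \mathrm{TMR}) \\
I_L - I_H &= \frac{V}{R_L} - \frac{V}{R_H}
  = \frac{V}{R_L}\cdot\frac{\mathrm{TMR}}{1 + \mathrm{TMR}}
\end{align}
```

Under these assumptions the gap between the two current levels grows monotonically with TMR, which is the tension between read robustness and data privacy described in the text.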

Temperature Sensitivity: The thermal stability (Δ) of STTRAM is a function of ambient temperature, and the write current and write latency depend linearly on the thermal stability. The thermal stability is given by $\Delta = \frac{H_k M_s A_r t}{2 k_B T}$, where $H_k$ = uniaxial anisotropy, $M_s$ = saturation magnetization, $A_r$ = area of MTJ, $t$ = thickness of free layer, $k_B$ = Boltzmann constant, $T$ = ambient temperature.

Colder temperature increases Δ, which in turn increases the write current and latency. Fig. 4.6 shows the write latency with respect to Δ values. The write latency increases with the increase in thermal barrier. This can be exploited by the adversary to strengthen the side channel signature from STTRAM.
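As a rough illustration of the temperature dependence, the sketch below rescales Δ using only the 1/T term of the thermal stability expression. It assumes that Hk, Ms, Ar and t do not change with temperature, which is a simplification, and the 300 K reference temperature for Δ = 56 is an assumed value that is not given in this chapter. The function name is hypothetical.

```python
def scale_delta_with_temperature(delta_ref, t_ref_kelvin, t_new_kelvin):
    """Rescale thermal stability assuming Delta is proportional to 1/T.

    Assumption: Hk, Ms, Ar and t are treated as temperature-independent.
    """
    if t_ref_kelvin <= 0 or t_new_kelvin <= 0:
        raise ValueError("temperatures must be positive (kelvin)")
    return delta_ref * t_ref_kelvin / t_new_kelvin


# Assumed reference point: Delta = 56 at 300 K (the 300 K value is hypothetical).
delta_cold = scale_delta_with_temperature(56.0, 300.0, 250.0)
print(f"Delta at 250 K: {delta_cold:.1f}")
```

Under these assumptions, cooling the chip raises Δ, and since the write latency grows with Δ, the attack window widens.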

[Figure 4.5 plot: current (mA) vs. read latency (ns) for Data = ‘0’ and Data = ‘1’, with annotations of 80 uA and 10 uA.]

Figure 4.5 Supply current variation for read operation. 4 flavors of ring-oscillators (250 each) to mimic CPU along with a 512b word STTRAM LLC. A reasonable gap is present between read ‘0’ and ‘1’ currents which can be employed as signature.

[Figure 4.6 plot: write latency (ns) vs. thermal stability.]

Figure 4.6 Write latency for different values of thermal barrier

4.2.2. Exploiting these Functional Vulnerabilities

Exploiting Read/Write Current: The LLC contains sensitive data in raw form, such as login, password and credit card details entered during a web transaction, and encryption keys used to encrypt data to be sent over the network. In current processor architectures all the user data processed by the CPU passes through cache memory. The adversary can steal the raw data or get clues about the data so that the correct data can be predicted in linear time. For STTRAM LLC the adversary can perform a side channel attack by monitoring the supply current waveform of the memory array. It is assumed that the adversary can monitor the current flowing into the memory array from the power supply. Even if the adversary only has access to the die-level power supply, it can reveal the LLC side channel signature. Fig. 4.7 shows the write current waveforms for a 4-bit write operation in STTRAM. Out of 16 data values only 5 are unique in terms of the total number of 0’s and 1’s (1111, 0111, 0011, 0001, 0000). In a memory array all the bits in a word are written in parallel; thus the order of 0’s and 1’s in a word does not affect the supply current waveform, rather the overall number of 0’s and 1’s in a word defines the current signature. For 4 bits all 5 combinations are clearly distinct in the current waveform. Knowing the number of 0’s and 1’s weakens the security significantly as it reduces the reverse engineering effort to identify the correct data.

[Figure 4.7 plot: current (mA) vs. write latency (ns) for the data patterns 1-1-1-1, 0-1-1-1, 0-0-1-1, 0-0-0-1 and 0-0-0-0, with the attack windows for old data and new data and the noise region marked.]

Figure 4.7 Write currents for 4-bit operation
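The observation that only the number of 1’s in a word is visible in the supply current can be illustrated with a short enumeration. The sketch below is an idealized model, assuming every distinct count of 1’s produces a distinguishable current level; the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def weight_classes(n_bits):
    """Group all n-bit words by their number of 1's (the observable level)."""
    counts = Counter(sum(word) for word in product((0, 1), repeat=n_bits))
    return dict(sorted(counts.items()))


classes = weight_classes(4)
for ones, n_words in classes.items():
    print(f"{ones} ones: {n_words} candidate word(s)")
print(f"{len(classes)} distinct levels for {sum(classes.values())} words")
```

For a 4-bit word this gives 5 levels for 16 data values, so an adversary who observes, for example, the level with two 1’s only has to consider 6 candidate words instead of 16.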

Exploiting Read/Write Latency: The high read and write latency provides a larger attack window to the adversary. By monitoring the current waveforms, the adversary can not only predict the number of 0’s and 1’s in the new data that is being written but can also predict the previous data by sampling the current just after the wordline is asserted. The adversary samples the current during the attack window shown in Fig. 4.7. The difference in current states of each combination depends on the TMR of STTRAM as discussed before: the higher the TMR, the further apart the current states are. In Fig. 4.7 the write operation is completed in 800ps, but to avoid write failures under PV the wordline is active for a longer duration. This gives the adversary more time to identify the transient current and gain confidence in the results. Thus, the data dependency of the current reveals the stored and new data, and higher latency facilitates the attack. The figure also shows the attack window available to identify the old and new data. Note that a larger word size creates more states in the supply current signature; however, the difference between two consecutive states remains the same. Furthermore, a larger word size increases the total current, which makes the attack easier for the adversary. A similar attack can be performed during the read operation. Fig. 4.8 illustrates the attack window available to the adversary for deciphering data during reads.

[Figure 4.8 plot: read current for a 4-bit read operation, current (mA) vs. time (ns), for the data patterns 0-0-0-0, 0-0-0-1, 0-0-1-1, 0-1-1-1 and 1-1-1-1, with a ~320 uA annotation and the attack window (old data) marked.]

Figure 4.8 Read currents for 4-bit operation

Temperature-Assisted Attack: The adversary can intentionally increase the write latency by lowering the ambient temperature. The MTJ resistance increases at lower temperatures, which leads to less write current. The write latency is inversely related to the write current, and thus at lower temperatures the write latency increases, which provides the adversary more time to launch the attack.


[Figure 4.9(a) plot: retention time (sec, log scale) vs. volume from 1X to 2X of the base MTJ (40X40X4 nm3), with annotations of ~10 secs and ~10 years.]

Figure 4.9 (a) Retention time variation with respect to MTJ volume; and, (b) retention time dependence on temperature.

4.2.3. Prevention Techniques

4.2.3.1. Semi Non-Volatile Memory (SNVM)

SNVM is a non-volatile memory with lower retention time. The typical retention time for STTRAM is 10 years; however, such high retention time is not required for cache application as the data is invalidated when the system restarts or the virtual address space is changed. Instead the retention time can be lowered to improve the write latency and write current [54]. The write latency and write current (I) depend linearly on the Δ of STTRAM. The retention time ($t_{ret}$) is exponentially related to Δ by $t_{ret} = \frac{1}{f_0} e^{\Delta}$, where $f_0$ is the thermal attempt frequency.

[Figure 4.10(b) plot: % reduction in states vs. word size (bits) for 16, 32, 64, 128 and 256 bits.]

Figure 4.10 (a) Current waveform for 4-bit write with 1-bit parity, (b) percent reduction in states with 1-bit parity for different word sizes. Substantial reduction in states is possible with 1-bit parity.

Both write latency and write current can be lowered by reducing Δ, which in turn lowers the retention time. Since Δ depends on the free layer volume of STTRAM, the volume can be scaled to lower the retention time (Fig. 4.9 (a)). The lower write latency due to SNVM reduces the attack window as shown in Fig. 4.6. Lower write current brings the current states closer to each other, making it difficult to identify the states individually. However, simulations (Fig. 4.9(b)) show that at low temperatures the retention time increases dramatically, thus giving away the above benefits obtained from lower retention. Thus, SNVM cannot be used in isolation to prevent side channel attack.
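Because the retention time is exponential in Δ, the reduction in Δ needed for a given retention target depends only on the ratio of the two retention times. The sketch below illustrates this; the 1 s target is only an example value and the function name is hypothetical.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def delta_drop_for_retention(t_long_s, t_short_s):
    """Reduction in Delta that shortens retention from t_long_s to t_short_s.

    Follows from t_ret being proportional to exp(Delta); the prefactor cancels.
    """
    if t_long_s <= 0 or t_short_s <= 0:
        raise ValueError("retention times must be positive")
    return math.log(t_long_s / t_short_s)


# Example: from 10-year retention down to an assumed 1 s target.
drop = delta_drop_for_retention(10 * SECONDS_PER_YEAR, 1.0)
print(f"Delta must drop by about {drop:.1f}")
```

Taking the Δ = 56 quoted earlier for 10-year retention, this would correspond to a Δ of roughly 36 for a 1 s retention time. The same exponential sensitivity explains why a modest increase in Δ at low temperature can restore a long retention time.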

4.2.3.2. Adding 1-Bit Parity

The objective of this prevention technique is to merge multiple supply current levels in the side channel current waveform, which will make it difficult for the adversary to predict the states accurately. This is achieved by writing an extra parity bit along with the original data. Fig. 4.10(a) shows the current waveform of a 4-bit write with 1-bit even parity. So, instead of writing 4 bits we write 5 bits, with the last bit value decided by the parity of the 4 bits. By doing this we are able to merge 5 states (Fig. 4.10(a)) into 3 states. Compared to un-coded data the reverse engineering effort increases because each observed state maps to a larger number of possible data values. The solution works on the principle that the overall write current depends on the number of 0’s and 1’s and not on their order. This extra 1-bit write makes some states identical to each other in terms of total 1’s and 0’s. For example, the un-coded 0111 will become 01111, which will merge with 1111. Fig. 4.10(b) shows the percent reduction in states with 1-bit parity for different word sizes. For a 32-bit word the number of states reduces by 30%. The reduction in states due to 1-bit parity goes down with the increase in word size because the effect of 1-bit parity gets absorbed by the larger word size. For a 32-bit word the effect of a single bit is 1/32, whereas for a 256-bit word the effect reduces to 1/256. The reduction in states is maximum for a 16-bit word, at 70%. Below 16 bits the reduction drops because there are not many states available to merge. For a 4-bit word there are 5 states, out of which 2 are merged by 1-bit parity. Therefore, the 1-bit parity mitigation technique works best for 16-32 bit word sizes.
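The 4-bit example can be checked with a short sketch. It uses an idealized model in which the observable level is simply the number of 1’s written, ignoring process variation and the finite resolution of the current measurement, so it is not intended to reproduce the percentages of Fig. 4.10(b). The function names are hypothetical.

```python
def ones_written(data_bits):
    """Number of 1's written for a data word plus its even-parity bit."""
    ones = sum(data_bits)
    parity_bit = ones % 2  # even parity: total number of 1's becomes even
    return ones + parity_bit


def distinct_levels(n_bits, with_parity):
    """Count distinct numbers of 1's over all data weights 0..n_bits."""
    levels = set()
    for weight in range(n_bits + 1):
        if with_parity:
            levels.add(weight + weight % 2)
        else:
            levels.add(weight)
    return len(levels)


# 0111 and 1111 both result in four 1's once the parity bit is appended.
print(ones_written([0, 1, 1, 1]), ones_written([1, 1, 1, 1]))
# 4-bit word: 5 levels without parity, 3 levels with 1-bit even parity.
print(distinct_levels(4, False), distinct_levels(4, True))
```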

Note that the overhead associated with parity is negligible for practical word sizes.

Furthermore, parity encoding is typically present in the error correction code (ECC) protected memory arrays. Therefore, this technique is easily introduced in the design by reusing existing design features.

4.2.3.3. Adding Random Bits in a Word

The reduction in states with 1-bit parity diminishes as the word size increases. The signature of the current waveform at higher word sizes becomes difficult to interpret as the number of states increases. To further obfuscate the signature, we propose to add multiple random bits in the word during write. This technique further complicates and merges the states in the supply current signature. The results with the addition of 2, 3 and 4 random bits in the word are shown in Fig. 4.11. It can be observed that a larger number of extra random bits reduces the number of states substantially for larger word sizes. The random bits can be generated by employing a simple pseudo random number generator. For larger word sizes the overhead from a few extra bits is expected to be negligible.

[Figure 4.11 plot: % reduction in states vs. word size (bits, 0 to 250) for 2-bit, 3-bit and 4-bit random writes.]

Figure 4.11 Percent reduction in states with multi-bit random write for different word sizes.
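One way to see why random bits obfuscate the signature is to count how many data weights are consistent with a single observed current level. The sketch below is an idealized interpretation, assuming the adversary observes only the total number of 1’s in the data plus the random bits; the function name and the example values (a 32-bit word, an observed total of 18) are hypothetical.

```python
def candidate_data_weights(observed_ones, n_data_bits, n_random_bits):
    """Data weights consistent with an observed total number of 1's.

    The observed total is (1's in data) + (1's in the random padding), and
    the padding can contribute anywhere from 0 to n_random_bits ones.
    """
    if not 0 <= observed_ones <= n_data_bits + n_random_bits:
        raise ValueError("observed_ones is out of range")
    lowest = max(0, observed_ones - n_random_bits)
    highest = min(n_data_bits, observed_ones)
    return list(range(lowest, highest + 1))


# Number of random bits vs. the data weights the adversary cannot tell apart.
for k in (0, 2, 3, 4):
    print(k, candidate_data_weights(18, 32, k))
```

With no random bits the observed level pins down the data weight exactly, whereas with k random bits up to k + 1 neighbouring data weights map to the same level.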

4.2.3.4. Constant Current Write

In the previous Section it has been noted that the asymmetric, polarity-dependent write current is a manifestation of constant voltage write. If we write both ‘1’ and ‘0’ with the same amount of current, then there will be only one level in the current waveform and the write current will only depend on the word size. Constant current write can be achieved by using a current mirror with a voltage-controlled current source (Fig. 4.12(a)). The two PMOS transistors form the current mirror, whereas the NMOS MC controls the current to be mirrored depending on the STTRAM resistance [55]. Bias voltage (VB) is adjusted to provide the initial read current in the main branch, which will pass through the STTRAM in the auxiliary branch. However, constant current write will create a mismatch in switching times between the ‘0’ and ‘1’ states (Fig. 4.12(b)). This will affect the design of the wordline driver, but the adversary will have no clue about the data as the current will remain constant throughout the write access.

[Figure 4.12(a) schematic: current mirror with control transistor MC, bias voltage VB and the MTJ.]

Figure 4.12 (a) Constant current write circuit [9]; and, (b) write latency difference with constant current write (current in mA).

Reducing power overhead of constant current write: To ensure functional correctness, the constant current approach injects the worst-case write current to homogenize the write current. This leads to power wastage while writing logic ‘1’. To address this issue, it is possible to leverage the trade-off that exists between write current and error rate (as shown in Fig. 4.13). By lowering the write current for both ‘0→1’ and ‘1→0’, the write time of a certain number of bits may fall beyond the worst-case latency. These bits contribute to the write error rate. By maintaining the write error rate under permissible levels or increasing the permissible write latency, it is possible to lower the power overhead of constant current write.

4.2.3.5. Increasing Word Size

The supply current waveform highly depends on the number of bits that are being read and written at once, i.e., the word size. With the increase in word size and under PV the attack window for the adversary will reduce. This will affect the prediction accuracy and increase the difficulty for the adversary to correctly predict the number of 0’s and 1’s stored in the memory array. Thus, increasing word size during read/write can lower the attack window for the adversary.

[Figure 4.13 sketch: write time distributions for the original (high) current and for the reduced current; bits in the tail of the reduced-current distribution are failing bits (write errors).]

Figure 4.13 Homogeneous write using reduced write current.

4.3. Data Privacy Issues

Volatile Memory: The volatile cache such as SRAM and eDRAM is inherently more secure as the data vanishes on power OFF. A tamper-sensing unit is embedded in memory and in the event of tampering, the power to the memory is turned off or even shorted to ground [48]. As a result, when the power is back ON, the SRAM is initialized to random states with no correlation to the earlier stored value. Although the data remanence effect may cause some bit cells to retain their data, the vast majority of the data is lost when the power is turned OFF. A similar statement is true for eDRAM. Therefore, powering down is considered a successful protection mechanism for the data privacy of volatile memory.


Figure 4.14 The architecture to erase the cache tag, data and valid bits in a direct mapped cache when the system is turned off.

Nonvolatile Memory: NVM retains data after power OFF, providing an instant-ON experience as the operating system and application software are retained in an initialized and executable state. This feature is useful when NVM is used as main memory, but for cache the data may not be needed after power OFF. Moreover, the sensitive persistent data becomes vulnerable to information theft. Traditional measures like tamper sensing units fail as the attack can be launched after power OFF. The high latency associated with the encryption/decryption process makes it less suitable as a preventive measure. Therefore, we implement an erasure technique that uses the residual charge in the power rails to erase portions of the memory, resulting in memory corruption, i.e., invalidation. Fig. 4.14 illustrates the architecture that can be employed to erase the tag, data and valid bits of the LLC.


Semi-Nonvolatile Memory: The typical retention time of STTRAM is 10 years; however, such high retention is not needed for cache memory. Cache data is invalidated on system startup and also when the virtual address space is changed. Thus, the retention time can be lowered to improve the write latency and write energy [54]. The write energy can be lowered by reducing the energy barrier of the MTJ (Fig. 2.1). We know that the switching current decreases linearly with the reduction in thermal barrier, which in turn decreases the retention (2.2.3). Because the thermal barrier scales with the MTJ free layer volume, the retention time (t) depends on that volume. Therefore, downsizing the free layer lowers the retention time (Fig. 4.9), providing fast write latency and lower write energy. We note that lower retention is good from the data privacy standpoint as the data will be lost by the time the adversary tries to get it. Therefore, SNVM can be used as a first line of defense to protect the data. However, the retention time can be increased dramatically by freezing the chip. Thus, SNVM cannot be used in isolation to preserve data privacy.

4.4. Other Potential Issues

Impact of Scaling: With technology scaling the MTJ size reduces, which lowers the free layer thickness. Δ is linearly dependent on the free layer thickness and the retention time is exponentially related to Δ. Therefore, the write latency and write current of STTRAM are expected to scale down, making it more secure against power analysis attack. The introduction of perpendicular magnetic anisotropy (PMA) STTRAM makes it even more challenging for the adversary to perform a meaningful side channel attack due to the inherently lower write latency and write current.

Impact of TMR: As described earlier, the TMR ratio determines the resistance difference between the two MTJ states. It is therefore evident that the larger the TMR, the greater the difference in resistance between the two MTJ states. For a good sense margin, a large TMR is always desired. However, this can prove detrimental from a security point of view, as it allows a clearer distinction between the bits being written/read, thus improving the effectiveness of SPA.

Impact of Usage: Although STTRAM LLC is considered in this chapter, the proposed attack models are equally applicable to STTRAM main memory. The availability of a dedicated power supply makes it easy to probe the main memory active current. However, cryptographic keys cannot be revealed since the crypto operations are performed on chip. Nevertheless, the raw unencrypted sensitive data can be extracted.

Impact of Magnetic Tampering: An external DC magnetic field of opposing direction could be used to increase the switching time of the MTJ, which will increase the attack window for the adversary. Thus, with the help of a common horseshoe magnet the adversary can increase the write latency to facilitate attacks (especially for constant voltage write).

Cache Timing Attack: In a shared computer the main memory and hard disk are protected against use by another user on the same machine, but the cache is not. If two users are working on the same machine, the malicious user can fill the entire cache with his own data and wait for the other user to perform secret operations like encryption. The malicious user then measures the loading time to find which of his data has been replaced by the other user and learns about the cache addresses used in encryption. This timing information can be exploited for key recovery of encryption algorithms like AES [56]. Since a larger cache size can be afforded with STTRAM (due to the smaller footprint of the bitcell), the number of cache line replacements is expected to be lower, alleviating the cache timing attack. However, the persistence of data can be exploited to launch the attack at a later time to retrieve the sensitive information.
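The fill, wait and re-measure sequence described above is commonly called a prime-and-probe attack. The sketch below mimics it on a toy direct-mapped cache. It is a simplified model, not the attack of [56]: a cache miss stands in for a slow reload, real timing measurements are not modelled, and the class, function and owner names are hypothetical.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: one line per set, indexed by address modulo."""

    def __init__(self, n_sets):
        self.lines = [None] * n_sets

    def access(self, owner, address):
        """Load an address; return True on a hit (fast) and False on a miss."""
        index = address % len(self.lines)
        hit = self.lines[index] == (owner, address)
        self.lines[index] = (owner, address)
        return hit


def prime_and_probe(cache, victim_addresses):
    """Return the cache sets that the victim touched between prime and probe."""
    n_sets = len(cache.lines)
    for s in range(n_sets):            # prime: attacker fills every set
        cache.access("attacker", s)
    for address in victim_addresses:   # victim performs its secret operation
        cache.access("victim", address)
    # probe: a miss stands in for a slow reload of the attacker's data
    return [s for s in range(n_sets) if not cache.access("attacker", s)]


# Victim addresses 3 and 13 map to sets 3 and 5 of an 8-set cache.
print(prime_and_probe(DirectMappedCache(8), [3, 13]))
```

In this toy model a larger cache spreads the victim's accesses over more sets relative to its working set, which is one way to picture the reduced number of line replacements mentioned in the text.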

Other Side Channels: STTRAM resistance in the parallel and anti-parallel states is in the range of kΩ (5K-10K) and the write current is in the order of µA (100-150 µA). Thus, the IR drop will be in the order of mV, resulting in considerable droop in supply voltage. The adversary can monitor the droops in supply voltage to identify a write operation, and the amount of droop can give out information about the data being written, much like the supply current.

Considerations for Other NVMs: Long/asymmetric write latency and high/asymmetric write current are common challenges for other NVMs such as Resistive RAM, Phase Change RAM and Domain Wall Memory. Therefore, the attack models presented in this chapter are equally applicable to the emerging NVMs. Due to the generic nature of the solutions proposed in this chapter, similar techniques could also be extended to other NVMs for mitigation.

4.5. Summary

In this chapter we showed that the STTRAM read/write current, latency and asymmetry can be potential security vulnerabilities. We presented novel side channel attack models for STTRAM to compromise the sensitive data in LLC. We also provided a suite of preventive countermeasures such as constant current write, increased word size, SNVM and parity bit encoding to increase the reverse engineering effort required by the adversary to decipher the data from read and write current waveforms. The discussed techniques showed significant promise to protect against data privacy attacks to enable secure NVM design. The solutions proposed in this chapter could also be extended to other NVMs for attack mitigation.


Chapter 5

DWM PUFs for Security, Trust and Authentication **

The manufacturing of present-day integrated circuits (ICs) is mostly outsourced to external companies. Under this business model the design is exposed to tampering and cloning by the third party, breaching the Intellectual Property (IP). IC cloning also siphons off the economic benefits of the product. Due to the high-tech facilities employed by adversaries, isolating the fake chips from the genuine ones is becoming an increasingly difficult task. Traditionally, unique keys are generated by the ICs for important applications such as IP security, counter-plagiarism etc. These keys are then stored in on-chip non-volatile memory that is thought to be impervious to illegal access and duplication. However, adversaries can decode the secret key through Reverse Engineering (RE). A duplicated chip with the key obtained through RE cannot be distinguished from a genuine chip. In order to address these issues, an auxiliary circuit, i.e., a Physically Unclonable Function (PUF), is incorporated in the authentic chips. PUFs are designed to exploit the physical properties of the chip (e.g., process) to generate its unique identification key. A PUF is unclonable as a duplicate of this circuit will not provide the same identification tag as the original even if the ICs are functionally identical. PUFs work on the foundation of a challenge-response protocol, which functions on the basis of a complex and variable physical process.

We show that although process variation in the nanowire (NW) is detrimental to robustness, it can be very useful for device authentication. We propose two PUFs that exploit the non-linear DW-dynamics for secure key generation. The two flavors of PUF design, namely the relay-PUF and the memory-PUF, offer lower overhead and power compared to traditional CMOS-PUFs and offer a higher degree of resilience against cloning.

** This work was in collaboration with Kenneth Ramclam and Jae-Won Jang. Fig. 5.7 was provided by Kenneth Ramclam, and Fig. 5.12 was provided by Jae-Won Jang.

5.1. Introduction

Hardware security, trust and authentication are inherently intertwined with each other. The untrusted design environment results in infected hardware, which in turn brings the need to authenticate the ICs. Although software-based security solutions are easy to implement, hardware solutions such as hardware encryption, PUFs, True Random Number Generators (TRNGs) and tamper detection sensors have shown great promise to meet power/performance standards, while uncovering and solving emerging security issues such as Trojan insertion, IC recycling and side-channel attacks [57, 58]. The security primitives typically extract the spatial and temporal randomness and inherent entropy present in the system using carefully designed harvesting circuits for generating unique identification keys. The downsides of CMOS based circuits are area and power overhead, sensitivity to environmental fluctuations, and the limited randomness and entropy offered by the Silicon substrate. This brings the need to exploit emerging technologies containing an abundance of entropy and physical randomness while being robust, energy-efficient and fast. We note that spintronics [59-61] is one such possible candidate that possesses an untapped source of entropy in the system besides having an energy-efficiency orders of magnitude higher than that of CMOS. Some examples are shown in Fig. 5.1.

[Figure 5.1 diagram: entropy sources labeled in the figure include random and thermal fluctuations, DW metastability, stochastic DW motion, speed, pinning and annihilation, and NW roughness, with inset plots of velocity (m/s) vs. time (ns) and of an angle (deg) vs. Q for one pinning site.]

Figure 5.1 Sources of entropy and randomness in DWM system.

The experimental results on spin valves, magnetic-tunnel junctions (MTJ), domain wall magnets (DWM) etc. [23-25, 30-35, 62-66] have created enormous interest in spin-based computations. The most promising effect is the current-induced modulation of magnetization dynamics discovered in MTJ and DWM, as it opens the door to energy-efficient logic and memory design. Interaction between injected current and local magnetization creates several Spin-Transfer Torque (STT) mechanisms that are excellent sources of entropy in the magnet. The thermally activated electrons in the material add to the entropy. Besides, the magnet is also sensitive to physical randomness. One such magnetic system with an abundance of entropy is the DW in a permalloy NW with 20% Fe and 80% Ni (Fe20Ni80). We propose methodologies to harvest the entropy to realize hardware security primitives such as PUF. Designs of PUFs using memristors have been proposed in the past [67-68]. However, due to the emerging nature of spintronics and hardware security, very little research [69] has been done to bridge the two fields. Although a practical demonstration of a DWM based PUF is still missing, works illustrating DWM for cache [70], content addressable memory [71] and highly efficient DW motion [4, 72] have been previously described and show promise.


Although DWM is a promising memory technology, it brings in an important security concern. It is susceptible to contactless tampering efforts, e.g., by subjecting it to a strong external magnetic field, an adversary can corrupt the stored contents. The fixed layer of the MTJ is robust. However, the free layer can be toggled through both spin polarized current and magnetic field, making it vulnerable to tampering. The ease of tampering with the data underscores the need to quantify the impact of magnetic attack and explore effective, low-overhead protection mechanisms [73].

This chapter provides transformative applications that highlight:

• Engineering new techniques to harvest the randomness in magnetic NW.

• Combining the circuits and models of non-linear magnetic dynamics to realize hardware security primitives such as PUF.

• Validating the models and design ideas under harsh conditions.

• Attack scenarios such as magnetic field and machine learning.

Due to the superior energy-efficiency and footprint of DWM, the proposed circuits are orders of magnitude lower in power and higher in density compared to their CMOS counterparts. Owing to the non-linear dynamics of the magnetic system, the quality of the spintronic security primitives is also superior. Metrics such as entropy, uniqueness, repeatability, number of trials, robustness, power consumption and attack resilience are employed to quantify the strength of the proposed security primitives.

5.1.1. Threat Model

We focus on defending against masquerading and counterfeiting based attacks that can potentially compromise both authenticity and integrity. We assume that the sources of variations are truly random and are purely dependent on the stochastic nature of spintronic memory. We assume the measuring circuitry will not introduce any bias during operation. We assume that the adversary will not be able to replicate these signatures (due to them being purely random). We assume the connection between the device being authenticated and the authenticator is fully secure. We assume the devices are thoroughly tested and the various challenge-response sequences are accurately recorded in a safe facility prior to distribution. We also assume that the device is being operated under nominal temperature ranges for a pre-determined period of time (no aging issues).

5.2. Physically Unclonable Functions

Extensive research has been conducted to address hardware security, trust and authentication mechanisms. To deter IC cloning and stealing of secure keys, PUF [74] has been proposed. A PUF generates the response (key) to a particular challenge from physical properties of the chip. Several flavors of PUFs, e.g., optical PUF [75], delay PUF [76], SRAM (Static Random-Access Memory) PUF [77], flash PUF [78] and flip-flop PUF [79], have been introduced. The common limitations of CMOS-PUF designs are area/power overhead, a restricted number of challenge-response sets, sensitivity to environmental fluctuations and, most importantly, limited physical randomness in the system. Specially crafted circuit structures such as SRAM and flip-flop are used to amplify the effect of physical randomness. Nano-electronic PUFs [67, 68] using memristors have been presented to leverage properties like initialization, non-volatility, density, resistance states etc. to address some of the challenges faced by CMOS PUFs.

5.2.1. Approach

The first aim is to capture the non-linear dynamics of DW in detailed physics-based models (described in Chapter 3). This is accomplished by modeling the process variations in the DW nanowire. The Landau-Lifshitz-Gilbert (LLG) equation used to solve the DW dynamics is modified to incorporate the effect of process variations. A harvesting technique is designed to capture the noise and randomness with minimal disturbance to the underlying system and convert them into measurable quantities. Next, the harvesting circuit is employed as the building block for the PUF design. The quality of the PUF, e.g., entropy, uniqueness, repeatability, robustness and resiliency towards attacks, is ensured by detailed analysis. Finally, the proposed spintronic PUFs are benchmarked against conventional CMOS PUF architectures.

Figure 5.2 Harvesting entropy and randomness through DW nucleation (1), shift (2), DW motion (3) and sense (4).

5.2.2. Harvesting Entropy and Randomness

The objective of harvesting techniques is to convert the entropy to measurable quantities like voltage, current and resistance, without disturbing the stochastic process, by employing an accurate sense methodology. Fig. 5.2 illustrates an example where a DW is nucleated and moved in the NW by injecting shift current. The DW arrival time under the sense MTJ contains several sources of entropy, namely metastability of the DW, stochastic motion and stochastic pinning/speed degradation.

5.2.3. Relay-PUF Design

Figure 5.3 Schematic of DW relay-PUF. Ishift pulse magnitude and width can also be used as challenges. Sequence of events is numbered from 1 to 5.

We harvest the physical randomness in the DWM to generate challenge and response. Fig. 5.3 illustrates a relay-PUF design with series connected NW stages. The conventional muxing circuit between each stage is introduced to toggle paths and create new challenges. A larger number of stages also provides a higher degree of randomness in the signature. An arbiter block is placed at the end to compare the arrival times of the respective DWs. The operation of relay-PUF has three stages:

Challenge: In contrast to conventional delay-PUF, the relay-PUF also provides extra degrees of freedom to choose challenges namely shift pulse voltage, pulse width and pulse frequency. These new challenges can be employed to increase the number of challenges with low area overhead. Fig. 5.3 shows that obtaining the same number of challenges will require significant area and power overhead. With 1e19 challenges, it will take ~10 years to decode the response by an adversary, making the PUF attack-resistant.

DW nucleation and relay race: The first step of operation is to nucleate the DWs in all the NWs by applying a pulsed (+/-) current, during which the write word line (wWL) is activated. Next, the shift signal of stage-1 is activated, which triggers the DW race. The read head is activated by pulsing the read word line (rWL). As soon as the resistance sensed by the read head changes (by sensing the magnetization change), the shifting of the stage is stopped. Unlike an inverter chain, where the transition propagates from one stage to another, the DW vanishes once it reaches the end of the NW. To enable seamless propagation of the DW, in anticipation of the arrival of the DW after the nucleation stage, the read head is kept asserted to sense the arrival of the DW. Once the read head detects the arrival of the DW (the DW reaches the end of NW), the shift signal of the following stage is fired, thus relaying the DW information to the next stage. The mux select determines whether the upper or lower DW will be fired in the following stage. The sequence of events is illustrated in Fig. 5.5.

Figure 5.4 Timing diagram representing the wWL, the shift signals for each stage, the rWL and the variation of resistance sensed by the read head.

Response: The response of the relay-PUF (0 or 1) is determined by an arbiter that decides which of the DWs in the parallel NWs arrives first. The switching of paths in association with shift pulse width, duration and frequency provides several layers of randomness in the race condition. As the physical roughness varies from NW to NW, the DWs will race with different speeds and the response will vary between chips.
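The three stages above can be sketched as a small behavioural model. This is only an illustration under stated assumptions, not the physics-based LLG simulation used in this work: each NW stage is reduced to a single Gaussian transit time, and the names `make_die` and `relay_response`, together with the mean and sigma values, are hypothetical.

```python
import random
from itertools import product

def make_die(num_stages, seed, mean_ns=10.0, sigma_ns=1.0):
    """Draw a hypothetical DW transit time (ns) for the top and bottom NW of each stage.

    Assumption: process variation is lumped into one Gaussian delay per NW.
    """
    rng = random.Random(seed)
    return [(rng.gauss(mean_ns, sigma_ns), rng.gauss(mean_ns, sigma_ns))
            for _ in range(num_stages)]

def relay_response(die, challenge_bits):
    """Return the arbiter output (0 or 1) for one mux challenge.

    One challenge bit per mux, i.e. len(die) - 1 bits. A bit of 1 crosses
    the relay signals so that each NW of the next stage is triggered by the
    opposite NW of the previous stage.
    """
    assert len(challenge_bits) == len(die) - 1
    t_top, t_bottom = die[0]                    # arrival times after stage 1
    for (d_top, d_bottom), swap in zip(die[1:], challenge_bits):
        if swap:
            t_top, t_bottom = t_bottom, t_top   # mux toggles the paths
        t_top += d_top                          # relay continues in the top NW
        t_bottom += d_bottom                    # relay continues in the bottom NW
    return 1 if t_top < t_bottom else 0         # arbiter: top DW arrived first

# Usage: all mux challenges of a 6-stage relay-PUF on one hypothetical die.
die = make_die(num_stages=6, seed=1)
responses = [relay_response(die, c) for c in product((0, 1), repeat=5)]
print(len(responses))
```

Shift pulse magnitude, width and frequency are left out of this sketch; in the proposed design they would modulate the per-stage transit times and therefore act as further challenge dimensions.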

5.2.4. Memory-PUF Design

This PUF is similar to the SRAM-based PUF, where the entire memory bank is potentially used to obtain the authentication key unique to the chip at hand. The DWs in all NWs in the memory banks are fired simultaneously. The race concludes when the read signal is asserted. The DWs winning the race are set to 1 whereas the others are set to 0. In contrast to the relay-PUF, this design does not require any circuit overhead. Due to the non-volatile nature of the bitcell, this PUF is also low-power. The schematic of the memory-PUF is shown in Fig. 5.5(b).

Challenge: In contrast to the SRAM-PUF, where the memory pattern is solely dependent on power-up and variations, the DWM memory-PUF depends on both variations and shift pulse characteristics (magnitude and width). The challenges are the address of the array and the shift pulse.

DW nucleation and race: Similar to the relay-PUF, first a single DW is nucleated in all NWs present in the array (Fig. 5.5(a)). Next, the DWs are shifted/raced by a shift pulse challenge. The rWL is fired after a conservative time to screen the pinned DWs at the end of the race for determining the outcome.

Figure 5.5 (a) Timing diagram representing the wWL, the shift signal and the rWL, and (b) schematic of memory-PUF.

Response: The response of this PUF is the output of the array when a certain address is accessed for a particular pulse setting. The value of the bitcell is ‘1’ (‘0’) if a high (low) resistance is read from the read head.
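A minimal sketch of how the race outcome could be mapped to response bits is given below. It assumes that each NW is summarized by a single arrival time (or `None` for a pinned DW) and that the rWL firing time acts as the reference; the function names and the sample values are hypothetical.

```python
def memory_puf_bits(arrival_times_ns, read_time_ns):
    """Map per-NW DW arrival times to response bits.

    A DW that reaches the read head before the rWL fires wins the race ('1').
    A pinned DW (None) or a late DW loses ('0').
    """
    return [1 if (t is not None and t <= read_time_ns) else 0
            for t in arrival_times_ns]

def memory_puf_response(array_arrival_times_ns, row_address, read_time_ns):
    """Return the response for one challenge: a row address plus a read time.

    Assumption: the shift pulse setting is already reflected in the arrival times.
    """
    return memory_puf_bits(array_arrival_times_ns[row_address], read_time_ns)

# Usage with made-up arrival times for a 2x4 array.
arrivals = [[42.0, None, 55.5, 61.0],
            [48.0, 59.9, None, 70.2]]
print(memory_puf_response(arrivals, row_address=0, read_time_ns=60.0))
```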

Figure 5.6 Relationship of DW velocity on the three pulsed voltage conditions, (a) for various pulse magnitudes (PMs), (b) for different pulse widths (PWs) and, (c) for different pulse frequencies (PFs) (legend shows the off-on time = 5ns, pulse period for (a) and (b) is 10ns). All panels plot velocity (m/s) against time (ns).

5.3. Simulation Results

5.3.1. PUF Strength

The proposed PUFs not only employ the conventional challenges such as mux switching (for relay-PUF) and row address (for memory-PUF), but also shift current pulse magnitude, pulse width and pulse frequency as an additional set of challenges. Therefore, the relay-PUF could be categorized as a strong PUF whereas the memory-PUF could be categorized as a moderately strong PUF. The outcome of the race is highly randomized, as the process variation varies from NW to NW, and the locations of notches are random both spatially and temporally. The behavior of the DW in response to shift current pulse magnitude, width and frequency is illustrated in Fig. 5.6(a-c). It is evident that DW velocity is strongly dependent on the pulse characteristics. Therefore, the shift pulse could also be employed as a challenge. The non-linear dynamics of the DW also make the proposed PUFs resilient to modeling and machine learning attacks.

5.3.2. PUF Randomness and Stability

The PUF randomness is measured by the inter-die Hamming Distance (HD) whereas the stability is measured by intra-die HD. First, we present the results for relay-PUF. This is followed by memory-PUF results.

5.3.2.1. Relay-PUF

We demonstrate the relay race between two DWs due to process variations. For this simulation we assume two parallel NWs each containing two stages. The length of each NW is 2um and the process variation is modeled by assuming three pinning notches at 0.5um, 1um and 1.5um along the length of the NW. The values of pinning potentials are assumed to be 1000, 750 and 500 J/m3 for the top NW and 500, 750 and 1000 J/m3 for the bottom NW. The relay of the DWs from one stage to another through the challenge mux is shown in Fig. 5.7(a). It can be observed that the DWs race at different speeds due to the difference in pinning potentials. NW1 finishes the race and the sense amplifier triggers the shifting of the DW in NW3. At the end of the race in the second stage, NW4 finishes much earlier than NW3 due to the cumulative relay effect. It can be observed that there is a gap of a few picoseconds between the first and second stage, which is the time taken for passing the relay (i.e., sense and shift pulse triggering). But as this delay is uniform for all the NW stages, it doesn’t affect the overall result. The sense amplifier is designed to output a default high value. When the DW arrives, the output is toggled to low. We also note that the resolution between the two races is small.

Figure 5.7 (a) The NW race between NW1 & NW2, which is being relayed to NW3 & NW4, and (b) response of 6-stage relay-PUF for 32 challenges for 32 different dies.

In order to allow time for the read head to detect this difference, we operate the PUF under low voltages. This not only reduces the power consumed but also magnifies the effect of pinning and provides the read head adequate time to sense the outcome of the race.

For the detailed simulation, we extend the relay-PUF to a 2-parallel path, 6 stage design. The total number of challenges in this PUF is 2^5, i.e., 32. Therefore 32 different path combinations are possible, with each combination producing a one-bit response. As described in Section II, process variation within each NW can result in different pinning potentials for each notch. For this simulation, the pinning locations in the NW are kept the same as before. However, the mean pinning potential is assumed to be 500J/m3 and a variation of 150J/m3 (3σ) is added to the model to incorporate the effect of process variation-induced pinning potentials. The relay-PUF’s responses for all 32 possible challenges are simulated. Next, new sets of process variations are applied to the PUF to simulate inter-die process variation.
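The simulation setup described above can be sketched as follows. The mean (500J/m3), the 3σ spread (150J/m3), the three notches per NW, the six stages and the 32 dies are taken from the text; drawing each notch independently from a Gaussian and the helper name `sample_die` are assumptions made for illustration.

```python
import random
from itertools import product

MEAN_PINNING = 500.0     # J/m3, mean pinning potential
THREE_SIGMA = 150.0      # J/m3, quoted 3-sigma variation
NOTCHES_PER_NW = 3       # notches at 0.5um, 1um and 1.5um
NUM_STAGES = 6           # stages per path
PATHS = 2                # parallel NWs per stage

def sample_die(seed):
    """Return pinning potentials indexed as [stage][path][notch] for one die."""
    rng = random.Random(seed)
    sigma = THREE_SIGMA / 3.0
    return [[[rng.gauss(MEAN_PINNING, sigma) for _ in range(NOTCHES_PER_NW)]
             for _ in range(PATHS)]
            for _ in range(NUM_STAGES)]

# One select bit per mux between consecutive stages gives 2^5 = 32 challenges.
challenges = list(product((0, 1), repeat=NUM_STAGES - 1))

# Inter-die variation: a fresh set of pinning potentials per die.
dies = [sample_die(seed) for seed in range(32)]
print(len(challenges), len(dies))
```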

Fig. 5.7(b) shows the PUF response obtained from 32 different dies (y-axis) and 32 challenges (x-axis). It can be seen in the bit map that process variation within the nanowires can cause the arbiter outputs to change. The challenge also triggers a change in the PUF response. Note that the relay-PUF is robust to voltage variations because the voltage will speed up/slow down both paths by the same amount. Therefore, the net effect (race) will remain unaltered.

5.3.2.2. Memory-PUF

For this PUF flavor, we consider a 100x100 array of DWM. The intra-die variation is modeled by drawing the pinning width and depth from Gaussian distributions with (µnw, σnw) = (0, 5nm) and (µnd, σnd) = (0, 2nm). Three notches are assumed per NW at 0.5um, 1um and 1.5um. The pinning potentials are determined from the notch dimensions. The simulation at 1V shift pulse shows that only 34 out of 1000 NWs get the DWs pinned (Fig. 5.8(a)). Considering the fact that the pinned DWs will result in a ‘0’ response, this race condition will produce an uneven number of ‘1’s and ‘0’s.

In order to balance the ‘0’s and ‘1’s, we reduce the shift pulse voltage and note that shifting at 0.25V produces roughly 59% ‘1’s (i.e., the DWs that win the race). By operating the memory-PUF at this voltage, there is no need to carefully manage the reference read time, as the DWs that get pinned will always lose the race. The problem with this method is its susceptibility to variations in temperature, which directly impacts DW velocity.

Fig. 5.8(b) shows the DW arrival time distribution at 0.25V for two temperatures, 25°C and 125°C. It can be observed that high temperature pins more DWs (409 vs 498) and changes the signature of the memory-PUF. We propose a shift voltage boost at high temperature to negate the effect of the extra DW pinning. Our simulation indicates that boosting by 36.2mV brings the number of pinned DWs back to 408 at 125°C.

To analyze the die-to-die uniqueness in response, we model the process corners (fast, typical and slow) by skewing the NW width and thickness by a factor of 10%, i.e., the fast corner is (-10%, -10%) and the slow corner is (+10%, +10%). Fig. 5.9(a) shows the distribution of velocity for slow, typical and fast corners. Fig. 5.9(b) shows the bitmap pattern of the typical case. Upon XOR-ing the corner cases, a vivid distinction in the effective bitmap patterns is observed.

Figure 5.8 Arrival time distribution for (a) different shift voltage settings at 25°C and, (b) two voltage settings at 25°C and 125°C.

5.3.2.3. Quality Analysis of DW-PUFs

5.3.2.3.1. Uniqueness of Mapping

Uniqueness is the metric to differentiate the response of one PUF of a certain chip with respect to other chips of the same type. By analyzing the intra-die Hamming distance (the difference between PUF responses in the same die) and the inter-die Hamming distance (variation of the PUF response from die to die), the uniqueness of mapping of the PUF is established.
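The two Hamming distance metrics can be computed as sketched below. This is a generic formulation rather than the exact analysis script used here: responses are assumed to be equal-length lists of bits, and the function names are hypothetical.

```python
from itertools import combinations

def fractional_hd(a, b):
    """Fraction of bit positions in which two equal-length responses differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)

def inter_die_hd(responses):
    """Mean pairwise fractional HD across different dies (ideal value near 0.5)."""
    pairs = list(combinations(responses, 2))
    return sum(fractional_hd(a, b) for a, b in pairs) / len(pairs)

def intra_die_hd(reference, repeats):
    """Mean fractional HD between a die's reference response and its
    re-measurements under varied conditions (ideal value near 0)."""
    return sum(fractional_hd(reference, r) for r in repeats) / len(repeats)

# Usage with made-up 4-bit responses.
print(fractional_hd([0, 1, 1, 0], [0, 1, 0, 1]))
print(intra_die_hd([0, 1, 1, 0], [[0, 1, 1, 0], [0, 1, 1, 1]]))
```

A large gap between the inter-die and intra-die values is what allows one die to be told apart from another while still tolerating measurement noise.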

Each NW experiences many different arrival times due to process variation. This is because the width, depth and availability of every notch differ from one NW to another. There is also a possibility of both NWs getting pinned and not completing the race because of this process variation. In case of the relay-PUF, we assume a small NW segment of 50nm with a top and bottom pinning site. The notch width and depth are varied using a Gaussian distribution. The notches are used to represent the unintentional roughness that can be present in each NW. These segments are connected in series to make up a NW of length 2um. There is also an availability parameter that is used to determine whether the top and bottom notch will be available in each NW segment. This parameter uses a random distribution of ‘1’ (notch available) and ‘0’ (no notch available) in each NW segment. This simulates the unpredictability of notches that are created during the fabrication process of a NW. Fig. 5.10(a) shows the distribution of the inter-die Hamming distance for the relay-PUF operating under normal temperature and voltage (25°C & 1V). We notice an average 45% difference between the responses obtained.

Figure 5.9 (a) Velocity distribution in the memory array, and (b) a memory array bitmap for the typical case.

Figure 5.10 Inter and intra-die Hamming distance distribution for: (a) relay-PUF; and (b) memory-PUF.

In case of the memory-PUF, we assume 20 dies each with 20 such PUF blocks. The notch dimensions for the inter-die process variations are varied according to a Gaussian distribution. The circuit is operated at 0.25V and the Hamming distance (as seen in Fig. 5.10(b)) for the intra-die variations is plotted. The average Hamming distance is 50%.


5.3.2.3.2. Stability or reliability

PUF stability or reliability encapsulates how efficiently a PUF reproduces the correct response bits under a given set of intra-die variations. The relay-PUF is moderately susceptible towards thermal and voltage variations despite both the race-paths being affected equally. As the race is closely matched, a small change in temperature or voltage can affect the outcome. However, by performing the race at the minimum voltage that ensures no pinning in the paths, the gap between the racing DWs can be increased, which in turn enhances the reliability of the response across thermal and voltage variations.

In case of the memory-PUF, we capture the response under normal temperature and voltage (25°C and 0.25V). This is then compared to the response captured from chips under ambient temperature variations (-10°C & 90°C), and voltage noise (+/- 10%). The average difference between the responses for each bit provides the intra-die variation.

We note that a long tail in the distribution is obtained due to sensitive bits that are highly susceptible towards temperature and voltage variations (as seen in Fig. 5.10(a,b)). An average 30%/45% difference between the inter-die and the intra-die variation is observed in the case of relay-PUF and memory-PUF respectively. We also infer that the high inter-die Hamming distance improves the uniqueness of mapping and the low intra-die Hamming distance provides greater stability and reliability. In order to improve stability of the memory-PUF under varying ambient temperatures, we propose boosting or lowering the voltage (for high/low temp resp.) to maintain a constant current density.

5.3.2.3.3. Randomness of Response

In order to quantify the randomness of the actual response, i.e., the proportion of ‘0’s and ‘1’s, we simulate a 32x32 memory array under process variations at three locations (0.5um, 1um & 1.5um) along the NW. The notch dimensions are assumed to follow Gaussian distributions, with the pinning width and depth given by (µnw, σnw) = (0, 5nm) and (µnd, σnd) = (0, 2nm).

For the memory-PUF, we achieve 44% randomness under normal conditions. The randomness can be further improved by varying the PW, PM and PF, which can either speed-up or slow-down the DW velocity, thereby changing the randomness of the response obtained.
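One simple way to quantify the balance of a response bitmap is sketched below: the fraction of ‘1’s and the corresponding Shannon entropy per bit. Whether the 44% figure above corresponds exactly to this fraction is an assumption, and the function names are hypothetical.

```python
import math

def fraction_of_ones(bitmap):
    """Proportion of '1' bits in a 2-D response bitmap (0.5 is perfectly balanced)."""
    bits = [bit for row in bitmap for bit in row]
    return sum(bits) / len(bits)

def bit_entropy(p_one):
    """Shannon entropy (bits) of a single response bit with P('1') = p_one."""
    if p_one in (0.0, 1.0):
        return 0.0
    return -(p_one * math.log2(p_one) + (1.0 - p_one) * math.log2(1.0 - p_one))

# Usage on a made-up 2x2 bitmap.
p = fraction_of_ones([[1, 0], [0, 0]])
print(p, bit_entropy(p))
```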

5.3.2.3.4. Challenge-Response Analysis

So far, we have seen the design and analysis of the DWM-PUFs and their operation for a fixed shift PW and PM. Now, we exploit their dependence on the shift PM and PW to increase the number of challenge-response pairs. The PW and PM provide two additional knobs (challenges) for the operation of the relay-based PUF. Fig. 5.11(a) shows the relationship between the number of challenge-responses and the number of shift PW and PM settings for a fixed number of stages of the relay-PUF. We notice that, by increasing the number of PW and PM modes, the number of effective challenges increases quadratically. This relationship is shown for 6, 12, 24, and 48 stage relay-PUFs. For example, a 24 stage DW relay-PUF can be made to have 1e9 challenges just by increasing the number of PW and PM modes (10 in this case). However, in case of an arbiter-PUF, 31 stages are required to achieve the same result. Furthermore, from Fig. 5(a), we have seen that the DWM power is dominated by shift power; by increasing the number of responses obtained from fewer NWs, the power consumption is lowered. With the increase in the number of responses used for authentication, the time required to crack (decipher) a valid response increases dramatically. For example, assuming a 4 GHz clock frequency, 1e9 responses will require roughly 25 seconds to potentially decipher, assuming 100 clock cycles to obtain one valid response. By increasing the number of PW and PM modes to 20 and the number of stages to 48, the response obtained is of size 1e17. Deciphering this under the same conditions will take roughly 80 years (25*1e8 seconds). It must also be noted that, with the increase in the number of stages, the run time also proportionally increases. Thus, a tradeoff is made between the test time and the degree of authentication.
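The deciphering-time estimates above follow from simple arithmetic, reproduced in the sketch below. The 4 GHz clock and the 100 cycles per response come from the text; the function names, and the assumption in `challenge_space` that the mux challenges multiply with the PW and PM modes, are illustrative only.

```python
CLOCK_HZ = 4e9                    # clock frequency assumed in the text
CYCLES_PER_RESPONSE = 100         # cycles to obtain one valid response
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_time_seconds(num_responses):
    """Time to step through every response once at the assumed rate."""
    return num_responses * CYCLES_PER_RESPONSE / CLOCK_HZ

def challenge_space(mux_bits, pw_modes, pm_modes):
    """Hypothetical challenge count: mux settings times PW modes times PM modes."""
    return (2 ** mux_bits) * pw_modes * pm_modes

t_small = exhaustive_time_seconds(1e9)     # 25 seconds
t_large = exhaustive_time_seconds(1e17)    # 2.5e9 seconds
print(t_small, t_large / SECONDS_PER_YEAR)
```

With a fixed number of stages, doubling both the PW and PM mode counts multiplies `challenge_space` by four, which is the quadratic growth noted above.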

Fig. 5.11(b) shows the response comparison w.r.t the number of stages, between a traditional arbiter-PUF and our relay-PUF. As the number of responses in case of an arbiter-PUF varies directly w.r.t the number of stages, a linear relationship is obtained. In order to compare the number of stages required to achieve a similar number of responses, relay-PUFs of 12 stages with 2, 4, 8 and 16 PW and PM modes are used. We notice a clear difference of 2-8 stages between the relay-PUF and the arbiter-PUF for a fixed number of responses. It must be noted that for every change in PW and PM, a PUF reset is required, i.e., the DWs need to be flushed out or annihilated to allow a new race to begin. However, the values of PM and PW to test for each device are determined in advance, which reduces the need for resetting.

Figure 5.11 (a) Increase in responses w.r.t. increase in PW and PM modes, (b) Difference between Arbiter-PUF and DWM relay-PUF to achieve certain number of responses, (c) The reduction in power w.r.t increase in number of heads for fixed number of challenges.

So far, we have only described a NW with one read and one write head. However, it is possible to have multiple read heads on a NW [80], which are individually selected by a wordline (WL). This allows us to use the head selection as another challenge tier. Applying this to the memory-PUF based design, we can achieve a larger number of responses for the same size array, or maintain the number of responses and reduce the size of the memory array. It is important to note that the selection of heads must be done in an orderly fashion (i.e., head1-head2-head3…) to avoid the need to reset the NWs before every analysis. Additionally, since the shift power is dependent on the length of the NW, the use of multiple heads in the NW (as seen in Fig. 5.11(c)) will dramatically reduce the power consumed. By increasing the number of heads, a power reduction of 90% over SRAM can be achieved. Combining this with the shift PW and PM dependence, we can obtain more test modes for the device at hand.

5.4. Attack Models

In this section, we consider the resilience of the proposed PUFs against different threat models such as machine learning or modeling-based attack and magnetic field attack. We also present the limitations associated with the spintronic PUFs in this work.


5.4.1. Magnetic Attack

Fig. 5.12(a) shows the MTJ schematic. MTJs are used as read and write heads in the DWM. By subjecting the MTJ to a strong external magnetic field, the information read can be corrupted. Fig. 5.12(b)-(c) shows that the MTJ free layer could flip its polarity either using current or with a 250Oe magnetic field. The magnetic field produced by a common horseshoe magnet is ~126Oe, which is sufficient to flip the weak bits in the presence of process variations and thermal noise.

Figure 5.12 (a) Schematic of MTJ; (b) flipping of MTJ due to STT (Happ=0Oe, I=0.638mA); and, (c) due to external magnetic field (Happ=260Oe, I=0). Plots are obtained by solving LLG.

The attacks on MTJ could be launched through a (DC) magnetic field. For example, a magnetic field will cause failures only for the bits whose free layer orientation is opposite to the applied field. The attack could be launched either during retention mode or functional mode (read/write). The impact of attack during functional mode (especially read) could be more detrimental than retention due to two factors: (a) presence of disturb current; and, (b) higher frequency of reads compared to writes. This can result in either a hard failure (i.e., flipping of bitcell content) or soft failure (i.e., delay in write or degraded sense margin). The soft failures could be mitigated by slowing down the read/write operation, but the hard failures need to be avoided or corrected through error correction. It must be noted that the bits can fail easily when the current polarity and magnetic field are in the same direction (assistive), while the flip time is higher when current and magnetic field are in opposite directions (suppressive). Furthermore, the stability of the MTJ free layer is a function of its volume. Therefore, it is possible to enhance the robustness of the MTJ against tampering by increasing the size.

5.4.2. Machine Learning Attack

Machine learning deals with the ability of a computer algorithm to automatically learn a complex behavior from a limited set of observations (responses) and use this to predict the outcome (response) by generalizing the interactions of the device from these examples. PUFs are built upon a set of complex challenge-response pairs (CRPs) that exploit the underlying physical system with a limited number of unknowns. It must be noted that appropriate machine learning techniques must be employed to learn the behavior from a small training set of CRPs, which is then used to make accurate predictions of unknown responses. In this work, we use ‘Logistic Regression’ implemented using the Waikato Environment for Knowledge Analysis (WEKA) [81, 82] for the analysis.


Logistic regression is a well-established machine learning framework, which is used to predict a binary response from a binary input. By measuring the relationship between the dependent variable and one or more independent variables, the probability of the output response is calculated and the appropriate output is predicted. In this work we compare the traditional CMOS based arbiter and SRAM PUFs with the DWM-based relay-PUF and memory-PUF. We use 75% of the CRPs for training the machine learning algorithm and the remaining 25% for testing. The probability of correct prediction for the arbiter-PUF (relay-PUF) is 50.8% (48.4%), whereas for the SRAM-PUF (memory-PUF) it is 65.6% (69.1%). Therefore, the proposed spintronic PUFs perform on par with CMOS PUFs.
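The evaluation flow (75%/25% split, logistic regression on raw challenge bits, prediction accuracy on the held-out CRPs) can be sketched in plain Python as follows. This is a stand-in written for illustration and not the WEKA setup used in this work: the stochastic gradient training loop, the hyperparameters and all function names are assumptions, and the CRPs in the usage example are random placeholders rather than simulated PUF data.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    """Linear score; the last entry of w is the bias term."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]

def train_logistic(X, y, epochs=200, lr=0.1):
    """Fit weights with plain stochastic gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - sigmoid(score(w, xi))
            for j, xj in enumerate(xi):
                w[j] += lr * err * xj
            w[-1] += lr * err
    return w

def predict(w, x):
    """Predicted response bit for one challenge."""
    return 1 if sigmoid(score(w, x)) >= 0.5 else 0

def attack_accuracy(crps, train_fraction=0.75, seed=0):
    """Train on a fraction of the CRPs and report accuracy on the rest."""
    rng = random.Random(seed)
    crps = list(crps)
    rng.shuffle(crps)
    split = int(len(crps) * train_fraction)
    train, test = crps[:split], crps[split:]
    w = train_logistic([c for c, _ in train], [r for _, r in train])
    return sum(predict(w, c) == r for c, r in test) / len(test)

# Usage with placeholder CRPs: 5-bit challenges, random response bits.
rng = random.Random(42)
toy_crps = [([rng.randint(0, 1) for _ in range(5)], rng.randint(0, 1))
            for _ in range(64)]
print(attack_accuracy(toy_crps))
```

An accuracy close to 50% on a balanced response set means the model has learned nothing useful about the PUF, which is the desired outcome from the defender's point of view.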

5.4.3. Other Possible Threat Models

Instances of invasive attacks include disabling primitives partially or fully, tampering with inputs to the design, etc. Non-invasive attacks involve modulating the operating environment (voltage, temperature) to trigger corner cases or momentarily kill the entropy, and side-channel monitoring that exploits the operating modes (authentication vs normal).

5.5. Summary

In this chapter, we describe spintronic PUFs (modeling, circuit design and analysis) for security, trust and authentication. We show that the non-linear dynamics of the spintronic domain wall protect the proposed PUFs against machine learning based attacks. The NW dynamics allow us to use different pulsing techniques that generate new challenge-response pairs, which enhance the strength of the proposed PUFs. The simulations show that DWM-PUFs can achieve 30-45% separation between intra and inter-HD. New threat models such as magnetic field-based attacks and environment modulation attacks are also discussed.


Chapter 6

IP Protection Using Camouflaging **

The semiconductor supply chain is increasingly being exposed to a variety of security attacks such as Trojan insertion, cloning, counterfeiting, reverse engineering (RE), piracy of Intellectual Property (IP) or Integrated Circuit (IC) and side-channel analysis due to the involvement of untrusted parties [83-86]. RE is one of the biggest threats to hardware IP, as it has proven to be a powerful tool for IP piracy and counterfeiting. Although techniques such as watermarking and fingerprinting [87] have been used to curb the spread of counterfeit products, they do not increase the complexity of RE itself. RE involves de-packaging the chip, milling down layer-by-layer, imaging each of the metal layers and stitching the images together to identify the logic functionality and connectivity. The objective is to unlock the IP and clone the design. In order to address this issue, various camouflaging techniques have been proposed.

In this chapter, we describe the threshold voltage-defined switches that camouflage logic gates, both logically and physically, to resist RE and IP piracy. The proposed gate can function as NAND, AND, NOR, OR, XOR, and XNOR robustly using threshold defined switches. We also propose a flavor of camouflaged gate that represents reduced functionality (NAND, NOR and NOT) at much lower overhead. The proposed gates, if used to design the IP, will force an adversary to perform a brute-force guess-and-verify search of the underlying functionality, thus increasing the RE effort. The camouflaged design operates at nominal voltage and obeys conventional reliability limits. We propose two flavors of camouflaging, one employing only a pass transistor (NMOS-switch) and the other utilizing a full pass transistor (CMOS-switch). Both are used to design Ring-Oscillators (RO) in ST 65nm technology, one for each functionality, on which we have performed temperature, voltage, and process-variation analysis.

** This work was in collaboration with Deepak Vontela, Ithihasa Reddy, Syedhamidreza Motaman, Jae-won Jang, and Asmit De. Fig. 6.4 was provided by Deepak Vontela, Fig. 6.12 was provided by Jae-Won Jang, and Fig. 6.15 was provided by Asmit De.

6.1. Introduction

Camouflaging is a technique of hiding the circuit functionality of a few chosen gates to make RE impossible or extremely hard [88-9X]. Gate camouflaging using hollow vias [90] realizes three functions with a ~5X area and power overhead. Aside from requiring a process change (e.g., hollow via), this technique fails to force the adversary to resort to exhaustive RE. Techniques to deceive the attacker using filler cells [91] and dummy transistors [92] have also been proposed. Other obfuscation techniques [93, 94] suffer from extensive signal routing and/or process change. In order to increase the RE difficulty, we propose the threshold voltage (VT) modulation (implemented by changing channel doping concentration during manufacturing) of switches.

Figure 6.1 Proposed camouflaging technique. Existing combinational logic gates could be replaced with camouflaged gates to protect the underlying IP by increasing RE effort.

We illustrate two generic gates that can exhibit 6 functionalities based on the VT of the switches. Unlike the existing camouflaging techniques, the VT programmable technique does not add process cost and leaves no layout clues. Since the proposed camouflaged gates are static logic based, they can be easily integrated with the current Electronic Design Automation (EDA) tools to provide a seamless and effective implementation. Fig. 6.1 provides an overview of the application of the camouflaging technique.
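To illustrate why hiding one of six functions per gate drives up the RE effort, the sketch below models a camouflaged gate as an unknown choice among the six listed functions and counts the candidate assignments an adversary would face. The gate-level view, the assumption that an adversary can observe a single gate's input-output behavior directly, and the function names are all hypothetical simplifications; in practice only chip-level inputs and outputs are observable.

```python
# The six two-input functions that the camouflaged gate can realize.
GATE_FUNCTIONS = {
    "NAND": lambda a, b: 1 - (a & b),
    "AND":  lambda a, b: a & b,
    "NOR":  lambda a, b: 1 - (a | b),
    "OR":   lambda a, b: a | b,
    "XOR":  lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),
}

def candidate_assignments(num_camouflaged_gates):
    """Number of function assignments to check by brute force: 6^k for k gates."""
    return len(GATE_FUNCTIONS) ** num_camouflaged_gates

def consistent_functions(observations):
    """Guess-and-verify for one gate: keep functions matching all (a, b, out) triples."""
    return [name for name, fn in GATE_FUNCTIONS.items()
            if all(fn(a, b) == out for a, b, out in observations)]

print(candidate_assignments(10))
print(consistent_functions([(0, 0, 1), (1, 1, 0)]))
```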

We also present experimental demonstration of the static VT defined camouflaging technique by analyzing each of the six logic functionalities as Ring Oscillators (RO). Two flavors of camouflaged gates are proposed namely, NMOS-switch based, and CMOS-switch based camouflaged gates.

The highlights of this chapter are the following:

▪ Demonstration of the VT defined camouflaging technique for logic obfuscation.

▪ Analysis of the performance of the six logic functions realized as RO using both NMOS-switch and CMOS-switch based camouflaged gates.

▪ Supply voltage, temperature and process-variability analysis, since VT is sensitive to these parameters.

▪ Demonstration of the tuning of gate biasing of VT defined switches (using VSN and/or VSP) to not only guarantee the functionality of proposed camouflaged gates, but also reclaim lost performance at low voltages and high temperatures.

▪ Quantification of the RE effort for the proposed camouflaging techniques using SAT-based solvers [22].

▪ Description of a low-overhead version of the camouflaged gate that hides 3 functionalities.

▪ Presentation of an alternative flavor of gate camouflaging using charge-trap.

6.1.1. Threat Model

When designing the camouflaging technique, we focus on defending against large-scale IP theft/piracy and design counterfeiting. We assume that the adversary is well-versed in IC design (from both the schematic and layout points of view) and has access to the resources needed to accurately de-layer and RE the underlying design. Additionally, we assume that the adversary is not equipped with advanced tools to identify process-level modifications on individual transistors, such as doping density. We assume that no concrete de-camouflaging methodology exists that can quickly and accurately uncover the IP/design. We also assume that the designs are fabricated in trusted foundries.

6.2. Background

6.2.1. Threshold defined switch

The VT defined NMOS-switch (Fig. 6.2(a)) is realized using an NMOS transistor biased with gate voltage VSN (VSN = 0.5*(VLVT+VHVT)). The switch conducts when LVT is assigned to it and stops conducting when HVT is assigned. A similar process is adopted for the PMOS switch biased at VSP. Fig. 6.2(b) highlights the variation of current for HVT, NVT and LVT under various switch gate voltages (VSN). The value of VSN can range from 300-650mV, while the value of VSP ranges from 500-850mV for a VDD of 1.2V. This provides the flexibility to choose between a low-power mode (high ION-to-IOFF ratio of ≈10^4) when VSN is low and a high-performance mode when VSN is high. The switch is employed to compose a multi-function gate whose functionality is selected through VT assignment.
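The bias relation above can be illustrated with a minimal Python sketch. The threshold values below are hypothetical placeholders (the text does not list the actual VT values of the process), and the ON/OFF test is a first-order simplification that ignores subthreshold conduction.

```python
# Hypothetical threshold voltages in millivolts. These are placeholders for
# illustration only; the actual LVT/HVT values are not given in the text.
V_LVT_MV = 350.0
V_HVT_MV = 650.0


def switch_gate_bias(v_lvt_mv: float, v_hvt_mv: float) -> float:
    """Midpoint bias VSN = 0.5 * (V_LVT + V_HVT), as stated in the text."""
    return 0.5 * (v_lvt_mv + v_hvt_mv)


def nmos_switch_conducts(vt_mv: float, vsn_mv: float) -> bool:
    """First-order view: the switch is ON only if the gate bias exceeds its VT."""
    return vsn_mv > vt_mv


vsn = switch_gate_bias(V_LVT_MV, V_HVT_MV)
print(vsn)                                  # 500.0
print(nmos_switch_conducts(V_LVT_MV, vsn))  # True  (LVT switch conducts)
print(nmos_switch_conducts(V_HVT_MV, vsn))  # False (HVT switch is OFF)
```

With the bias placed midway between the two thresholds, the same gate voltage turns an LVT device ON and leaves an HVT device OFF, which is the property the camouflaged gates rely on.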

6.2.2. Multi-function camouflaged logic

Figure 6.2 (a) VT programmable NMOS switch. HVT: OFF, LVT: ON. PMOS switch works similarly; (b) The I-V plot for the HVT, NVT and LVT switches biased at VSN = (HVT+LVT)/2. Annotations in the figure: HVT = NVT + Δ, LVT = NVT - Δ, VSN = (HVT+LVT)/2; low VSN is good for low power (ION/IOFF ≈ 10^4) and high VSN is good for performance.

Fig. 6.3(a) shows the schematic of the proposed camouflaged gate that exhibits six functionalities (AND, OR, NAND, NOR, XOR and XNOR) depending on the VT of switches S1-S8. The switches (S1-S8) of the selected (unselected) function are programmed to LVT (HVT), whereas the input and output buffers are programmed using NVT. This design is based on an NMOS switch as the pass transistor in the camouflaged logic. For example, NAND logic can be realized by asserting LVT on switches S2 and S7 and HVT on all other switches. This leads to a parallel connection of PMOS transistors and a series connection of NMOS transistors. The design can be optimized for either low power or high performance by: (i) appropriately tuning the VT of the HVT and LVT transistors; (ii) modulating the VSN and VSP voltages; and, (iii) sizing the transistors accordingly. Note that the performance and area of the proposed camouflaged gate are strongly correlated to the resistance of the VT defined switches in the path. The CMOS-based camouflaged logic is achieved by replacing switches S1-S2 and S4-S8 with full transmission gates (i.e., NMOS and PMOS switches in parallel). The CMOS-based switch provides full conduction of VDD and GND at the cost of higher design overhead.

Figure 6.3 (a) NMOS-switch based camouflaged gate to hide 6 functionalities; and, (b) die-image of the test-chip.
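The VT programming rule described above (LVT on the switches of the selected function, HVT on all others) can be sketched as a small lookup. This is only an illustration: the text states the LVT switches for NAND alone, so the table below deliberately contains just that entry, and the function and variable names are hypothetical.

```python
# Switch names S1..S8 follow the text.
SWITCHES = [f"S{i}" for i in range(1, 9)]

# Only the NAND assignment (LVT on S2 and S7) is given in the text.
# Entries for the other five functions are intentionally left out.
LVT_SWITCHES = {"NAND": {"S2", "S7"}}


def vt_assignment(function: str) -> dict:
    """Return the VT flavor of every switch for the requested function."""
    lvt = LVT_SWITCHES[function]
    return {s: ("LVT" if s in lvt else "HVT") for s in SWITCHES}


nand = vt_assignment("NAND")
print(nand["S2"], nand["S7"], nand["S1"])  # LVT LVT HVT
```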


6.2.3. Application in Hardware Security

With the addition of multiple transistors to provide a 6-function camouflage, the design suffers from area, power and delay overheads. Therefore, the proposed camouflaged gates must be used carefully. For example, one can swap large gates with camouflaged gates to minimize area overhead, swap low activity factor gates to minimize power overhead, and swap off-critical path gates to minimize delay overhead. The gates with the least controllability and observability are potential candidates to be replaced with camouflaged gates to magnify the RE effort. Therefore, a trade-off exists between the overheads and security (i.e., RE effort). Note that the adversary will be able to locate the camouflaged gates due to their unique appearance; however, they will not know the functionality. The adversary may also be able to locate the switch gate voltages (VSN / VSP); however, these do not reveal the functionality. Probing the VSN / VSP voltage level will not provide any clue either. VSN / VSP are DC signals and could be routed along with the power rail. Unlike the power rail, VSN / VSP do not drive a load. Therefore, the routing overhead could be kept minimal by using thin tracks.

6.3. Test-Chip Overview

6.3.1. Design

The proposed camouflaged gates are implemented in ST-Micro 65nm technology. The die-image with the design components (annotated) is shown in Fig. 6.3(b). Fig. 6.5 shows the block diagram of the test-chip. The design is composed of three sets of 23-stage ROs: the first set is the reference (normal gate-based RO), the second set uses the NMOS-only (pass transistor) camouflaged gates, and the third set uses the full CMOS-based (transmission gate) camouflaged gates. Each set is composed of six ROs, one for each logic function. For example, the camouflaged gates are configured as NAND gates in the NAND-RO. Buffers are placed in between the RO stages to provide optimal swing. Additionally, the above sets (and ROs) are power-gated to ensure that only the set currently being used is selected (turned ON). The outputs of all the sets are MUXed to a single output pin. The VT switch voltages (VSN and VSP) are generated via a resistance ladder as shown in Fig. 6.5. A total of 8 voltage settings are present for both VSN and VSP, with VSN ranging from 300mV to 650mV and VSP ranging from 500mV to 850mV (for a supply voltage of 1.2V) with a 50mV step. Fig. 6.4 shows the layout of a standard NAND, NMOS-switch based camouflaged gate and CMOS-switch based camouflaged gate used in the test-chip. The VT defined switches are enlarged to reduce process variation induced VT shift.

Figure 6.4 Layout design of: (a) reference NAND gate (cell area = 47.17µm²), (b) NMOS-switch based camouflaged gate (cell area = 136.72µm²) and, (c) CMOS-switch based camouflaged gate (cell area = 144.1µm²). The gates are upsized to counter the ill-effects of process variations.

Figure 6.5 Schematic overview of the test-chip design with a resistance ladder for generating VSN and VSP voltages.
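The eight bias settings described above can be enumerated with a short Python sketch. It only reproduces the nominal values at a 1.2V supply (start, 50mV step, 8 settings); the helper names are hypothetical, and no attempt is made to model how the ladder taps move when VDD is scaled.

```python
def ladder_settings_mv(start_mv: int, step_mv: int = 50, count: int = 8) -> list:
    """Nominal tap voltages (in mV) of one bias ladder at VDD = 1.2V."""
    return [start_mv + i * step_mv for i in range(count)]


def nearest_setting_mv(target_mv: float, settings: list) -> int:
    """Pick the available tap closest to a requested bias voltage."""
    return min(settings, key=lambda s: abs(s - target_mv))


VSN_SETTINGS_MV = ladder_settings_mv(300)  # 300mV ... 650mV
VSP_SETTINGS_MV = ladder_settings_mv(500)  # 500mV ... 850mV

print(VSN_SETTINGS_MV)  # [300, 350, 400, 450, 500, 550, 600, 650]
print(VSP_SETTINGS_MV)  # [500, 550, 600, 650, 700, 750, 800, 850]
print(nearest_setting_mv(460, VSN_SETTINGS_MV))  # 450
```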

6.3.2. Test features

We have incorporated power gating for each set along with their corresponding ROs in the design (Fig. 6.5). This allows us to analyze the power drawn by only the RO that is currently active. Additionally, a scan-chain implementation (Fig. 6.5) is employed to correctly assert the necessary control signals and the select signals to choose the required RO with the necessary settings (i.e., VSN, VSP). We have added the flexibility to dynamically select VSN and VSP (to tune the camouflaged RO frequency) during test, thus allowing us to control the behavior of our ROs to achieve the best performance.

6.4. Experimental Results

In this section, we analyze the optimal VSN/VSP selection, and the impact of supply voltage

scaling, process variations and temperature on the proposed camouflaged gates. Additionally, we

also demonstrate reclaiming lost performance by dynamically varying VSN and VSP.

Figure 6.6 Test setup consisting of (i) the test chip, (ii) a logic analyzer, (iii) an oscilloscope and (iv) a power supply. The oscilloscope capture of RO oscillations of camouflaged NAND gate is also shown.

6.4.1. Basic setup

The experimental setup is composed of a logic analyzer to feed in the input stream to be scanned in, a high-sampling-rate oscilloscope to accurately analyze the observed oscillations, a DC power supply and the test-chip (as shown in Fig. 6.6). The oscilloscope capture of the NMOS-switch based NAND camouflaged gate oscillation is also depicted.

6.4.2. Optimal VSN and/or VSP

The camouflaged switches are controlled by VSN (NMOS) and VSP (PMOS). Fig. 6.7(a) shows the impact of VSN on the frequency of the NMOS-based camouflaged switch. We note that

500mV is the optimal VSN for the NMOS based switch at 1.2V. This is because 500mV provides the most overdrive for the LVT transistor without turning ON the HVT transistors. Similarly, the oscillation frequency of the CMOS-based switch for different VSN and VSP is shown in Fig. 6.7(b).

Figure 6.7 Frequency variation with respect to (a) VSN for NMOS-switch (optimal VSN = 500mV) and, (b) VSN + VSP for CMOS-switch (optimal VSN = 450mV and VSP = 600mV).

From this plot, we observe that 450mV for VSN and 600mV for VSP are the optimal biasing points. In this case, the driving strengths of the NMOS and PMOS differ; therefore, the optimal VSN and VSP are not necessarily between their respective HVT and LVT settings.

Figure 6.8 Frequency variation with respect to (a) VDD for NMOS-switch @ VSN = 500mV, (b) VDD for CMOS-switch @ VSN = 450mV and VSP = 600mV, and, (c) VSN at 650mV VDD. It must be noted that the resistance ladder voltage changes with VDD, thus leading to a smaller VSN step size.

6.4.3. Vdd scaling

Fig. 6.8(a,b) illustrates the impact of VDD scaling on the NMOS-switch and CMOS-switch based ROs. Optimal VSN and VSP are used for obtaining this data. Supply voltage scaling impacts the VSN and VSP of the VT switches due to the shift in the node voltages of the resistance ladder. Therefore, the oscillations die out at ~650mV VDD. We sweep VSN up to 325mV at VDD = 650mV (seen in Fig. 6.8(c)) and observe that the oscillation frequency increases with the increase in VSN (bringing it closer to the original optimal value). It must be noted that at 325mV both the HVT and LVT NMOS transistors are OFF. However, the LVT transistor is ‘less’ OFF than the HVT transistor, thus resulting in the increase of frequency with the increase in VSN.

Next, we perform a similar analysis for the CMOS-based RO at VDD = 650mV. Fig. 6.9 illustrates the variation of oscillating frequency with respect to VSN and VSP. The best-case

Figure 6.9 Optimal VSN and VSP for CMOS-switch at 650mV VDD (optimal VSN = 325mV and VSP = 275mV).

frequency is observed for maximum VSN and minimum VSP, i.e., VSN of 325mV and VSP of 275mV.

This is due to the fact that at max VSN and min VSP, the LVT NMOS and PMOS are ‘less’ OFF than their HVT counterparts. We also note that for minimum VSN and maximum VSP, the oscillations fail. The above-mentioned tuning method can be extended to different VDD, providing the necessary flexibility (performance to power) in selecting the bias voltage for a given VDD.

6.4.4. Process variations

We have analyzed the frequency response of the NMOS-switch and CMOS-switch based camouflaged gates for 10 test-chips (Fig. 6.10(a)). We observe ~5% variation in the frequency distribution for each function. A 5X to 7X improvement in speed is observed for the CMOS-switch based camouflaged gates over the NMOS-switch based ones. The designs exhibit less sensitivity to process variation due to the enlarged switch sizes. The experimental results indicate that the proposed gates are robust with respect to process variations.

6.4.5. Temperature variation

Temperature has a direct impact on the performance of the camouflaged gates since VT is a function of temperature. Fig. 6.10(b) illustrates the impact of temperature on the oscillation frequency of the NMOS-switch based RO. With the increase in temperature, the transistor’s VT reduces, which correspondingly shifts the HVT and LVT values. If left unchecked, the OFF switches with HVT will turn ON, start contending with internal signals and corrupt the functionality. Therefore, VSN and VSP need to be adjusted appropriately to restore the optimal functionality. This is illustrated in Fig. 6.11, where the device is heated to a temperature of ~65ºC and the VSN bias is swept from 300mV to 650mV to find the optimal bias point for an NMOS-based RO. It must be noted that the VT of the device reduces by ~2mV for each degree rise in temperature, which would therefore shift the VT of the device by ~80mV at 65ºC relative to 25ºC. From the experiment, we observe that the optimal VSN bias point has shifted from 500mV at 25ºC to 450mV at 65ºC. This result indicates that the bias voltage can be optimized to counter the effects of temperature. All modern processors include temperature sensors. The temperature and supply voltage combinations can be used to select appropriate VSN and VSP settings for the robust operation of the camouflaged gates.

Figure 6.10 (a) Frequency distribution of 10 die for both the NMOS-switch and CMOS-switch based RO; (b) variation of oscillating frequency under change in temperature for an NMOS-switch based RO.

Figure 6.11 Optimal VSN bias shift under the effect of temperature (65ºC).
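The temperature arithmetic and the idea of a sensor-driven bias selection can be sketched in Python as follows. The ~2mV per degree coefficient and the two optimal VSN points (500mV at 25ºC, 450mV at 65ºC, for the NMOS-switch RO at 1.2V) are taken from the text; the nearest-temperature lookup itself is an assumed, simplified selection policy and not the controller used on the test-chip.

```python
VT_TEMP_COEFF_MV_PER_C = -2.0  # ~2mV reduction per degree rise (from the text)
REFERENCE_TEMP_C = 25.0        # reference temperature used in Fig. 6.11

# Optimal VSN (mV) of the NMOS-switch RO at VDD = 1.2V, as reported in the text.
OPTIMAL_VSN_MV = {25: 500, 65: 450}


def vt_shift_mv(temp_c: float, ref_c: float = REFERENCE_TEMP_C) -> float:
    """Approximate VT shift relative to the reference temperature."""
    return VT_TEMP_COEFF_MV_PER_C * (temp_c - ref_c)


def select_vsn_mv(temp_c: float) -> int:
    """Assumed policy: use the setting characterized at the nearest temperature."""
    nearest = min(OPTIMAL_VSN_MV, key=lambda t: abs(t - temp_c))
    return OPTIMAL_VSN_MV[nearest]


print(vt_shift_mv(65.0))    # -80.0
print(select_vsn_mv(60.0))  # 450
print(select_vsn_mv(30.0))  # 500
```

A 40-degree rise at ~2mV per degree gives the ~80mV shift quoted above, while the measured optimum moves by 50mV, so a characterized lookup is used here rather than a purely analytical correction.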

6.5. Design and Security Analysis

6.5.1. Area, Power and Delay Overheads

Table 6.1 consolidates the power, delay and area overheads of the proposed methodology along with the previously discussed methods from the literature. It must be noted that our proposed design is upsized and is compared to an upsized NAND gate. We observe a ~33X and ~14X delay overhead for the NMOS-switch based and CMOS-switch based NAND-RO, respectively, relative to the reference NAND-RO design. Due to the upsized nature, the power drawn by the two camouflaged flavors is 22% and 17% lower, respectively, than that of the reference NAND design.


Table 6.1 Comparative analysis of gate camouflaging techniques.

| Feature | Hollow Via [90] | Obfuscation [110] | MUX Gate [111] | VT Dynamic Gates [112] | Proposed NMOS-Switch | Proposed CMOS-Switch |
|---|---|---|---|---|---|---|
| # of fn. | 3 | Varies | 2^(2^m) for 2^m:1 MUX | 2 | 6 | 6 |
| Area | 3.06X | 1.15X | 1.15X | 4.25X | 2.89X** | 3.05X** |
| Delay | 1.32X | 0.99X | 1.48X | 11.48X | ~33X** | ~14X** |
| Power | 3.67X | 1.16X | N/A | 3.83X | 0.78X** | 0.83X** |
| Static / dynamic | Static | Static | Static | Dynamic | Static | Static |
| Sim/Exp | Sim | Sim | Sim | Sim | Exp** | Exp** |

Note: ** experimental comparative analysis of an upsized NAND-RO (as fabricated) for the reference, NMOS and CMOS-based designs.
Note: All other comparisons are performed with respect to a standard NAND gate.

6.5.2. RE Effort

We tested our camouflaged circuit with the SAT-based solver proposed in [104]. We replaced gates using a Sandia Controllability / Observability Analysis Program (SCOAP) based algorithm and a random gate replacement strategy. The source code of the SAT-based solver is publicly available on the main author’s webpage [104]. This simulation was executed on an Intel Core i7-6700 3.4GHz quad-core processor with 16GB of RAM running the Ubuntu 16.04 LTS x86_64 operating system.

Fig. 6.12 shows the simulation results of RE effort for a 6-function gate camouflaging. The RE effort is reported in seconds (system CPU time); runs exceeding ~10^6 seconds were deemed unsolvable (highlighted with a red circle in the figure) and had to be manually terminated. Among the obtained simulation results, the longest time the SAT-based solver took without having to be manually terminated was 74456 seconds, for the c2670 benchmark using the 15% random camouflaging technique. These RE effort results are comparable to some recent camouflaging techniques [105, 106].

We can observe that by increasing the number of gates camouflaged using our proposed camouflaging strategy, the RE effort increases significantly. Additionally, it can be noted that there are a few exceptional benchmarks that become unsolvable independent of their size (such as c2670, which breaks the norm of a linear increase of the RE effort with respect to the gate count of the benchmark).

Figure 6.12 RE effort using SAT-based solver for a 2-input 6-function camouflaged gate. The plot is annotated with "Terminated after ~2.592 million seconds".
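As a rough complement to the SAT-based measurements, the size of the naive search space can be sketched in Python. This is not the metric reported in Fig. 6.12: it only counts the candidate function assignments an adversary would face without a solver, assuming each camouflaged gate independently hides one of six functions. The gate count used in the example is hypothetical.

```python
import math

FUNCTIONS_PER_GATE = 6  # AND, OR, NAND, NOR, XOR, XNOR


def camouflaged_gate_count(total_gates: int, percent: int) -> int:
    """Number of gates replaced when `percent` of the netlist is camouflaged."""
    return total_gates * percent // 100


def exhaustive_candidates(num_camouflaged: int) -> int:
    """Candidate netlists for a naive exhaustive search: 6^n."""
    return FUNCTIONS_PER_GATE ** num_camouflaged


def candidates_log10(num_camouflaged: int) -> float:
    """log10 of the candidate count, convenient for large n."""
    return num_camouflaged * math.log10(FUNCTIONS_PER_GATE)


n = camouflaged_gate_count(1000, 15)  # hypothetical 1000-gate netlist
print(n)                              # 150
print(exhaustive_candidates(2))       # 36
print(round(candidates_log10(n), 1))  # 116.7
```

The exponential growth of this count is the reason solver-based attacks, rather than enumeration, are the relevant yardstick for RE effort.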

6.5.3. Camouflaging Strategy and Evaluations

Since camouflaged gates are area, delay and power intensive, they cannot be used frequently in the design. Gate selection techniques such as random, non-resolvable and output corruptibility have been proposed [90] and can be used in this work. However, we employ a controllability (CC) and observability (Obs) based algorithm to identify interconnects/gates based on quantifiable values to maximize RE effort. Controllability and observability metrics have been widely used in the literature to analyze the testability of digital circuits [107, 108]. The difficulty of controlling and observing logical values of internal nodes from circuit I/O determines the ease of testability of the circuit. Hence, it follows that these metrics are suitable for determining camouflaging complexity, as the primary objective of camouflaging is to increase the RE effort for determining circuit functionality. We first compute the CC and Obs values using SCOAP [109] for every net, along with its number of fan-outs, in a circuit. The ‘0’ and ‘1’ controllability (CC0 and CC1) and observability values provide the relative difficulty of controlling and observing a logic signal on a particular net. By selecting nets that are difficult to control and observe (i.e., nets with high SCOAP CC0, CC1 and Obs values), it is possible to increase the RE effort of adversaries. Note that the controllability and observability of a net are assigned the same values as the controllability and observability of the gate that drives the net. For nets with fan-outs (FO), the controllability and observability are propagated to all fan-out nets.
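For readers unfamiliar with SCOAP, the combinational controllability rules can be sketched in Python as below. This is a generic textbook-style illustration of how CC0/CC1 propagate through basic gates, not the C++ implementation used in this work; observability is omitted for brevity and the tiny two-gate example is hypothetical.

```python
def scoap_controllability(gate_type: str, inputs: list) -> tuple:
    """Return (CC0, CC1) of a gate output from the (CC0, CC1) pairs of its inputs.

    Standard SCOAP combinational rules: primary inputs have CC0 = CC1 = 1 and
    every gate adds 1 to the cost of setting its inputs. Higher values mean the
    net is harder to control.
    """
    cc0s = [c0 for c0, _ in inputs]
    cc1s = [c1 for _, c1 in inputs]
    if gate_type == "AND":
        return min(cc0s) + 1, sum(cc1s) + 1
    if gate_type == "OR":
        return sum(cc0s) + 1, min(cc1s) + 1
    if gate_type == "NAND":
        return sum(cc1s) + 1, min(cc0s) + 1
    if gate_type == "NOR":
        return min(cc1s) + 1, sum(cc0s) + 1
    if gate_type == "NOT":
        return cc1s[0] + 1, cc0s[0] + 1
    raise ValueError(f"unsupported gate type: {gate_type}")


PRIMARY_INPUT = (1, 1)

# Hypothetical two-gate example: n1 = AND(a, b), n2 = NOR(n1, c)
n1 = scoap_controllability("AND", [PRIMARY_INPUT, PRIMARY_INPUT])
n2 = scoap_controllability("NOR", [n1, PRIMARY_INPUT])
print(n1)  # (2, 3)
print(n2)  # (2, 4)
```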

Figure 6.13 Netlist generation algorithm for gate camouflaging.

Fig. 6.13 displays the netlist generation algorithm for the gate camouflaging technique using the controllability and observability metrics (SCOAP), which is implemented in C++ and tested using HSPICE simulation. The algorithm imports Verilog benchmarks, finds the controllability / observability values of the gates (step 2) and then assigns these values to the output nets (step 3). Upon obtaining these parameters, we sort the output nets in descending order based on the CC0+CC1+Obs value (step 4). Once the nets are sorted, we select dummy / fake nets based on the priority of the CC0, CC1, Obs, and fan-out parameters (step 5). By selecting the fake nets that are difficult to control and observe, we can further improve RE effort (this is further evident from the output shown in Fig. 6.12). Afterwards, camouflaged gates are inserted, and the new netlist is created. This netlist is passed to Synopsys Design Compiler to perform synthesis and to evaluate the overall design in terms of area overhead, propagation delay, and power consumption compared with the original ISCAS85 benchmarks.
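The sorting and selection steps (steps 4 and 5) can be sketched in Python as follows. This is a simplified illustration rather than the C++ tool described above: it assumes the SCOAP values are already available, ranks nets only by CC0+CC1+Obs, and does not model the fan-out priority or the netlist rewriting. The net names and scores are hypothetical.

```python
from typing import NamedTuple


class NetScore(NamedTuple):
    name: str
    cc0: int
    cc1: int
    obs: int


def rank_nets(nets: list) -> list:
    """Sort nets in descending order of CC0 + CC1 + Obs (hardest first)."""
    return sorted(nets, key=lambda n: n.cc0 + n.cc1 + n.obs, reverse=True)


def select_for_camouflage(nets: list, percent: int) -> list:
    """Pick the hardest-to-control/observe nets, up to `percent` of all nets."""
    count = len(nets) * percent // 100
    return [n.name for n in rank_nets(nets)[:count]]


# Hypothetical SCOAP values for four nets.
nets = [
    NetScore("n1", 2, 3, 5),
    NetScore("n2", 4, 6, 9),
    NetScore("n3", 1, 1, 2),
    NetScore("n4", 3, 3, 3),
]
print(select_for_camouflage(nets, 50))  # ['n2', 'n1']
```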

6.6. Discussion

6.6.1. Attack possibilities

Although the proposed camouflaging is obtained free of cost due to the multi-VT feature offered in advanced nodes, the camouflaged gates are sensitive to temperature, which is a vulnerability. The adversary can use temperature to gain insights about the camouflaged gate functionality since each individual gate flavor provides a different delay signature with temperature. However, modulation of the VSN/VSP of the switches under temperature variation can eliminate the side-channel signature.

6.6.2. Integration with EDA tools

The proposed technique adds minimal changes to the EDA flow. First, the standard cell library is modified to include the NMOS-switch based and CMOS-switch based camouflaged gates. This is achieved by creating a liberty file of the above two flavors of camouflaged gates. The values used in the liberty file are populated by characterizing the gates in terms of area, delay and power. Then, tools such as Synopsys Design Compiler can be employed to perform in-depth design, analysis and synthesis of any combinational logic design that utilizes the two flavors of camouflaged gates. Finally, appropriate layout files (LEF/GDS) are developed and made available to the physical designers for implementation.

6.6.3. Low-Overhead Camouflaged Gate

The camouflaged gate proposed above offers high resistance to RE since it exhibits 6 functionalities. However, it comes at the expense of design overhead. We propose a low-overhead flavor of camouflaged gate with 3 functionalities i.e., NOT, NAND and NOR (Fig. 6.14). This design is based on static CMOS. The switches that must be asserted with HVT and LVT are also seen in the figure.

Figure 6.14 Low-overhead camouflaged gate with 3 functionalities. The figure annotates three VT assignments for switches 1-6: (LVT: 1,3,5,6; HVT: 2,4), (LVT: 1,2,3,4,6; HVT: 5) and (LVT: 1,4; HVT: 2,3,5,6).

It can be noted that the design complexity of the proposed low-overhead camouflaged gate is similar to [90, 95]. The proposed gate should be used judiciously in the design to minimize the overall design overhead. System-level techniques such as converting off-critical path gates (lower delay overhead), low-activity gates (lower power overhead) and more complex gates (lower area overhead) to camouflaged gates can be used to minimize the overheads.

6.6.4. Other camouflaging techniques

Aside from the VT switch-based camouflaging, a charge-trap based camouflaging technique (implemented by exploiting the gate capacitance of a transistor to store charge), which is impervious to most RE techniques, is explored. We first achieve a rudimentary charge trapping by utilizing two transistors with their gate terminals connected to each other and their source/drain terminals set to a pre-determined voltage (Fig. 6.15(a)). The charge at node P is injected by Fowler-Nordheim (FN) tunneling through transistor CTA. After injection, the voltages are lowered to prevent de-trapping. Using this trapped charge, we selectively activate/deactivate various functions in a camouflaged gate. To lower the overhead and eliminate the possibility of leakage of trapped charges, we also propose an alternative design that replaces the 3-transistor design with a single Non-Volatile Ferroelectric FET (NV-FeFET) [96-98]. The NV-FeFET can be polarized to retain charges in a non-volatile fashion. Therefore, we selectively activate/deactivate the different functions in the camouflaged gate by positively/negatively polarizing the NV-FeFET access transistor. The FeFET process is CMOS compatible, which makes integration practically feasible [99]. The camouflaged gate is designed using dynamic logic with multiple pull-down networks (PDNs) (Fig. 6.15(b)), each of which serves a particular gate function.

Figure 6.15 CTCG (a) Charge trapping circuit; and, (b) 2-input 2 function CTCG

Driven by the necessity for post-CMOS technology, a great deal of research has concentrated on investigating different memory technologies and possible MOS transistor alternatives. These new devices offer some unique properties that can also be leveraged towards either logic or timing camouflaging. For example, the FeFET based camouflaging [113] employs dummy contacts that offer varied functionalities with minimal overheads as compared to its CMOS counterparts. These post-CMOS technologies offer varied benefits, making them potentially excellent candidates for other security primitives such as polymorphic gates, TRNGs, PUFs, circuit protectors, etc.

6.6.5. Need for a security evaluation framework

Although a lot of work has been done in the field of IP camouflaging, there is a serious dearth of security analysis of the various techniques. In order to effectively evaluate the various camouflaging flavors, a standardized security metric needs to be developed. The work in [114] aims at addressing the problem of whether IC camouflaging is an effective technique. The authors address several open-ended questions, such as whether one can construct circuits that are indeed difficult to decamouflage, why the discriminating input set size is small and whether the attack models highlighted are a good measure of security. They also provide an attack procedure for decamouflaging based on the SAT solver (which we have followed for our analysis in Section 6.5.2).

6.7. Summary

We demonstrated VT defined switches that hide six logic functionalities and experimentally evaluated two proposed camouflaged gate designs, i.e., NMOS-switch based and CMOS-switch based. The biasing knobs, i.e., VSN and VSP, are used to study the impact on performance of supply voltage scaling and temperature variation. Our analysis revealed that VSN and VSP can be tuned dynamically to combat the ill effects of temperature, voltage and process variations. An analysis of the RE effort with respect to the percentage of the total gates camouflaged is also described. The proposed design does not leave any logic or physical level clues and, when used judiciously, improves the overall RE effort.


Chapter 7

Future Work

With the potential of revolutionizing the field of storage and computing, emerging memory technologies especially NVMs are being heavily researched. Furthermore, it is anticipated that these NVM technologies will break important ground and move closer to the market very rapidly.

As more and more devices are becoming ‘smart’, simply using these new technologies as replacements of existing CMOS-based designs may not be the most desirable approach. There is a lot left to be fully understood in how best to employ/deploy these technologies. Some of the key areas of research include:

7.1. Architecture Design

As the emerging memory technologies mature, integrating them into the memory hierarchy provides new opportunities for future memory architecture designs. Compared to SRAM/DRAM, these emerging memories usually have: (1) much higher density, with comparably fast access time; and (2) non-volatility, which gives them zero standby power and immunity to radiation-induced soft errors. Moreover, compared to NAND-Flash SSD, STT-RAM/PCRAM are byte-addressable. In addition, different hybrid compositions using SRAM, DRAM, and PCRAM or MRAM can be motivated by the different power and access behaviors of the various memory technologies. With the potential for change at every level of the memory hierarchy, researchers are actively investigating various designs and applications.


7.2. High Performance Compute

In large-scale systems, keeping pace with massive data processing is especially limited when employing a Von Neumann architecture, that is, the separation of memory and compute units interconnected via buses. Most of the delay is caused by memory accesses, I/O congestion and limited memory bandwidth. To address this limitation, in-memory computing architectures, circuits and devices are being widely investigated. They promise highly energy-efficient compute by ‘pre-processing’ the data and providing only the intermediate result to the processor rather than the raw data itself. The goal of this methodology is to lower the strain on the processor of processing all the raw data that it is fed and to reduce the amount of data that moves to and fro between the processor and memory, thus easing the bottleneck. Spintronic memory can be modified to perform certain basic operations in the memory itself. Furthermore, with the added density that can be leveraged by employing spintronic memory, we can potentially reduce a significant portion of the data that the processor needs to process.

7.3. Energy Efficiency

Technology scaling of SRAM and DRAM (which are the common memory technologies used in the traditional memory hierarchy) is increasingly constrained by fundamental technology limits. The proliferation of ‘smart’ devices such as IoT devices, automobiles, etc. has only increased the demand for higher and higher performance at low energy consumption. Emerging memory technologies offer much promise in this regard and can potentially overhaul the computing framework from a purely CMOS technology to a completely non-CMOS one, or at the very least a hybrid of both CMOS and NVM technology. This opens up a lot of research opportunities in energy-efficient design.


One popular application of an energy-efficient design is to leverage the properties of spintronic devices (MTJ/DWM) towards neuromorphic computing, which tries to emulate the human brain in vision, perception and cognition related tasks. Spintronic devices not only offer ultra-low current operation but also occupy a significantly lower area [100] as compared to a full CMOS implementation.

7.4. Security

Also, with the proliferation of these ‘smart’ devices, the need for ensuring secrecy, integrity, and availability of data/IP is paramount. With a vast variety of memory technologies being investigated, security protocols/learnings from one cannot be directly ported to another. There therefore exists a need to investigate the potential vulnerabilities associated with each new memory technology as well as to understand potential applications in the field of security.


Chapter 8

Summary

In this chapter we summarize the contributions of this thesis.

With the ever-increasing demand for highly scalable and energy-efficient devices, traditional SRAM and DRAM memories are not able to keep up. Therefore, other emerging memory technologies are being investigated to combat this problem. Spintronics offers some significant benefits, with its high density, retention and endurance, over other emerging memory technologies. To analyze the impact of this memory technology in the circuit and architecture design space, a mathematical model that captures the physics-based dynamics is realized. We describe two models, one for STTRAM and the other for DWM, that provide us the flexibility to perform circuit and architecture analysis. Also, we incorporate the effects of process variation and retention time variation in our modeling. This forms the foundation for all our work on spintronics.

With the incorporation of spintronic memory, there is a need to investigate potential energy-efficient applications. We propose a non-volatile Flip-Flop (NVFF) that incorporates the non-volatility of the MTJ whilst utilizing a modified CMOS-based D-FF design. The FF is also configured with enhanced scan capability. This hybrid circuitry provides the benefit of an ‘instant ON’ experience and can see potential use in various IoT based products, where an unexpected power failure can be catastrophic. We show that the HPES-NVFF utilizes the entire CLK cycle for the backup operation, thereby eliminating frequency bottlenecks originating from the MTJ write latency. By incorporating power-gating, we are able to eliminate the need for redundant writes, thereby lowering the overall energy requirement. Additionally, we investigate the issues plaguing MTJ-based crosspoint memory, such as sneak-path current and the need for a good ION-to-IOFF current ratio.


We describe a MIIM-based selector diode, which conducts bi-directionally after crossing its threshold. Also, we analyze the selector device based STTRAM crossbar under different read/write voltages, array sizes, retention time and sense margins.

Spintronic memories are current-driven devices and are therefore prone to side-channel attacks (SCA) that utilize simple power analysis. To make it harder for an adversary to decipher the data being written or read, we propose mitigation techniques such as constant current write, increased word size, SNVM and parity bit encoding. These techniques reduce the clarity of the data signature being detected, thereby reducing the accuracy of the SCA.

Persistent data has been a non-issue for CMOS memories, as the data is lost during power-down. However, spintronic memory is non-volatile, so the problem of data privacy is very evident. We provide two solutions to address this issue: a semi non-volatile memory and a novel cache erasure technique. We argue that a system does not need to be truly ‘non-volatile’ in order to get the most performance. We also highlight the benefits of utilizing a semi non-volatile memory, such as lower write current and the need for far fewer refreshes. However, there will remain areas where non-volatility is still preferred; for such situations we propose to use the residual charge in the power rails to erase the cache, leaving its contents unusable or corrupt.

The high degree of entropy inherent in spintronic memory, although detrimental to a robust and secure design, can prove especially beneficial for realizing hardware security primitives such as PUFs and TRNGs. PUFs have previously been proposed for traditional CMOS memories, but with the incorporation of spintronic memory, the need for a spintronic-based PUF is evident. We exploit the non-linear dynamics of the domain wall to realize two flavors of DW PUFs: the relay PUF and the memory PUF. The proposed designs provide additional knobs (e.g., shift pulse, number of access ports) to expand the set of challenge-response pairs. The designs have been analyzed against temperature and voltage variations, and threat models such as magnetic field-based attacks, environment modulation attacks and machine-learning attacks are also discussed.


Additionally, with the globalization of the semiconductor supply chain, the threat of counterfeiting to siphon off profits and/or steal the underlying IP is on the rise. We tackle this problem in this thesis by investigating a novel threshold voltage defined switch for camouflaging logic. Two types of switches, i.e. the NMOS-switch and the CMOS-switch, are experimentally realized to hide six logic functionalities. We describe the impact of manufacturing process variation, temperature and supply voltage, and illustrate how the biasing knobs, i.e. VSN and VSP, can be tuned dynamically to combat these ill effects. We also evaluate the RE effort involved in decamouflaging the proposed design. The proposed design does not leave any logic or physical level clues and, when used judiciously, increases the overall RE effort.


Appendix A

A.1. Modeling of STTRAM Retention

The Landau-Lifshitz-Gilbert (LLG) equation forms the foundation for formulating the behavior of the magnetization m of a nanomagnet in the presence of an effective magnetic field, Heff, and a spin current, Is, with a few other terms describing the interactions between the nanomagnet and the spin current [15]:

$$ (1 + \alpha^2)\frac{\partial \vec{m}}{\partial t} = -\gamma(\vec{m} \times \vec{H}_{eff}) + \alpha\gamma(\vec{m} \times \vec{m} \times \vec{H}_{eff}) + \vec{\tau} + \alpha(\vec{m} \times \vec{\tau}) \tag{1} $$

$$ \vec{\tau} = \frac{\vec{m} \times \vec{I}_s \times \vec{m}}{q N_s} \equiv \text{Spin torque [15]} $$

Here γ is the gyromagnetic ratio, α is the Gilbert damping parameter, q is the charge of an electron and Ns is the total number of spins in the nanomagnet, given by Ns = MsV/µB, where Ms, V and µB are the saturation magnetization, the volume of the nanomagnet and the Bohr magneton, respectively.

The first two terms represent the precession and damping torques, respectively; these govern the dynamics of the magnetization in the presence of an effective magnetic field. The last two terms represent the current-induced torques, which take Slonczewski-like and field-like forms.

The MTJ data retention time (Tret) is a function of thermal stability (Δ) and is given by [16]:

$$ T_{ret} = \frac{1}{f_0} e^{\Delta} \tag{2} $$

where f0 is the thermal attempt frequency, which is roughly 1 GHz [17]. Δ is given by:

$$ \Delta = \frac{K_u V}{k_B T} \tag{3} $$

where Ku is the magneto-crystalline anisotropy, V is the effective activation volume (volume of the free layer), T is the operating temperature and kB is the Boltzmann constant. It can be noted that the retention time is exponentially dependent on MTJ dimension and ambient temperature.

The switching threshold current density that flips the free layer in the absence of any external magnetic field at 0K is given by [16, 17]:

$$ J_{co} = \frac{2 q \alpha t}{\hbar \eta} M_S (H_k + 2\pi M_S) \tag{4} $$

Here q is the electron charge, α is the damping constant, t is the free layer thickness, ℏ is the reduced Planck’s constant, Hk is the effective anisotropy field, Ms is the saturation magnetization and η is the spin transfer efficiency.

The switching threshold current Ico is given by:

$$ I_{co} = J_{co} \cdot A \tag{5} $$

where A is the cross-sectional area of the free layer.

Injection of current (I) into the MTJ during read alters the retention time. In such a scenario, the thermal stability in (3) is modified as

$$ \Delta = \frac{K_u V}{k_B T}\left(1 - \frac{I}{I_{co}}\right) \tag{6} $$

From (3), (4) & (5), we note that shrinking the cell dimension can alter Δ, and subsequently vary the threshold current density required for switching. Fig. A.1(a) shows the variation of retention time with respect to the disturb current. A current of 100µA reduces the retention time from 10s to 1ms. The dependency of Δ on write current can be exploited for test time reduction.
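The relations in (2), (3) and (6) can be sketched numerically. The following is a minimal sketch, assuming the retention time takes the form e^Δ / f0 with f0 = 1 GHz; the function names are hypothetical and the critical current is not a value from the text but is inferred from the quoted 10 s to 1 ms drop at 100 µA.

```python
import math

F0_HZ = 1e9  # thermal attempt frequency, roughly 1 GHz as stated in the text


def retention_time(delta, f0=F0_HZ):
    """Retention time in seconds for a thermal stability factor delta, eq. (2)."""
    return math.exp(delta) / f0


def delta_for_retention(t_ret, f0=F0_HZ):
    """Invert eq. (2): thermal stability needed for a target retention time."""
    return math.log(f0 * t_ret)


def delta_under_current(delta0, current, i_co):
    """Eq. (6): thermal stability reduced by a read/disturb current."""
    return delta0 * (1.0 - current / i_co)


def implied_critical_current(t_ret0, t_ret_disturbed, current, f0=F0_HZ):
    """Solve eq. (6) for I_co from retention times without and with current."""
    ratio = delta_for_retention(t_ret_disturbed, f0) / delta_for_retention(t_ret0, f0)
    return current / (1.0 - ratio)


if __name__ == "__main__":
    delta0 = delta_for_retention(10.0)                     # stability for 10 s
    i_co = implied_critical_current(10.0, 1e-3, 100e-6)    # inferred, not quoted
    disturbed = retention_time(delta_under_current(delta0, 100e-6, i_co))
    print(f"delta0 = {delta0:.2f}, implied I_co = {i_co * 1e6:.0f} uA")
    print(f"retention at 100 uA = {disturbed:.2e} s")
```

Under these assumptions the quoted drop from 10 s to 1 ms at 100 µA would correspond to a critical current of about 250 µA; this is an inference from (6), not a measured device parameter.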

[Figure A.1, panel (a): "Retention Time vs. Current", retention time (sec, log scale) against disturb current (0 to 100 µA), falling from 10 s to 1 ms (~1e4X) for a base MTJ of 40X40X4 nm3. Panel (b): "Retention Time vs. Volume", log(retention time) in seconds against volume (1X to 2X), rising from ~10 secs to ~10 years for the same base MTJ.]

Figure A.1 (a) Retention time variation with respect to disturb current and, (b) retention time variation with respect to free layer volume.

A.1.1. MTJ Size Vs Retention Time

MTJ volume has a direct relationship with the retention time. The retention time can therefore be adjusted by varying the MTJ thickness (typically a few nm). We take a base MTJ of dimension 40nm X 40nm X 4nm (W, L & T) and sweep the volume to observe the retention time (Fig. A.1(b)). A 2X increase in MTJ volume increases the retention time from a few seconds to a few decades. Manufacturing variation can therefore alter the data retention time significantly, which makes characterizing the retention time of the STTRAM array challenging.
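The volume sweep can be illustrated with a short sketch. It assumes that Δ scales linearly with volume as in (3), that the retention time is e^Δ / f0, and that the base retention time is a hypothetical 1 s; none of these numbers are taken from the simulation behind Fig. A.1(b).

```python
import math

F0_HZ = 1e9  # thermal attempt frequency, roughly 1 GHz as stated in the text


def retention_after_volume_scaling(t_ret_base, volume_scale, f0=F0_HZ):
    """Retention time after scaling the free-layer volume by volume_scale."""
    delta_base = math.log(f0 * t_ret_base)      # invert eq. (2) for the base cell
    delta_scaled = volume_scale * delta_base    # eq. (3): delta is proportional to V
    return math.exp(delta_scaled) / f0


if __name__ == "__main__":
    base_retention_s = 1.0  # hypothetical base retention time
    for scale in (1.0, 1.2, 1.4, 1.6, 1.8, 2.0):
        t_ret = retention_after_volume_scaling(base_retention_s, scale)
        print(f"{scale:.1f}X volume -> {t_ret:.3e} s")
```

With a base retention of 1 s, doubling the volume squares the factor f0·Tret and yields about 1e9 s, i.e. roughly three decades, which is in line with the seconds-to-decades trend described above.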

A.1.2. Stochastic Retention Modeling

Retention time of the MTJ is stochastic in the presence of thermal noise. Therefore, the bitcell retention time could fluctuate between multiple measurements. Certifying the bitcell retention will require multiple tests to capture the worst-case behavior. The combined effect of a large number of tests and long test time makes the overall retention characterization a time-consuming process.

Thermal excitation (noise) causes the magnetic moment of the MTJ to precess about its axis, thus leading to a variation of the initial angle (θ) (Appendix A). Work done by [18] shows the dependence of the switching delay of an MTJ on the initial angle. Therefore, the initial angle of the magnet needs to be statistically characterized to correctly understand the impact on retention time.

In order to realize the variation of initial angle with respect to thermal noise, a corresponding field term (HTH) with zero mean is incorporated into the LLG ((7), (8) and (1)). The thermal “kicks” are the source of white noise that can be expressed as [19]:

$$ \vec{H}_{TH} = \vec{\xi}\sqrt{\frac{2 k_B T}{\gamma \mu_0 M_S V \Delta t}} \tag{7} $$

$$ \vec{H}_{eff} = \vec{H}_{ext} + \vec{H}_{ani} + \vec{H}_{demag} + \vec{H}_{TH} \tag{8} $$

where ξ is a standard Gaussian random variable in 3D space, T is the temperature, µ0 is the permeability of free space and Δt is the constant time step used in the numerical simulation.

Equation (3) is modified to take into account the off-easy axis magnetization [17]. It now reads:

$$ \Delta = \frac{K_u V}{k_B T}\cos^2\theta \tag{10} $$

Equation (1) is solved to determine the distribution of initial angle (θ), which is plugged into (10) to find the distribution of Δ and retention time.
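The last step, mapping a distribution of initial angles through (10) to a distribution of retention times, can be sketched as follows. This is only an illustration: the zero-mean Gaussian angle distribution and its spread are hypothetical stand-ins for the result of solving the stochastic LLG, and the retention time is assumed to follow e^Δ / f0.

```python
import math
import random

F0_HZ = 1e9  # thermal attempt frequency, roughly 1 GHz as stated in the text


def sample_retention_times(delta0, sigma_theta, n_samples, seed=0):
    """Monte Carlo retention times for random initial angles, using eq. (10)."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_samples):
        # Hypothetical angle distribution (radians); the thesis obtains it from the LLG.
        theta = rng.gauss(0.0, sigma_theta)
        delta = delta0 * math.cos(theta) ** 2   # eq. (10)
        times.append(math.exp(delta) / F0_HZ)   # eq. (2)
    return times


if __name__ == "__main__":
    delta0 = math.log(F0_HZ * 10.0)  # nominal cell with 10 s retention at theta = 0
    times = sorted(sample_retention_times(delta0, sigma_theta=0.1, n_samples=10000))
    print(f"worst case: {times[0]:.3f} s")
    print(f"median:     {times[len(times) // 2]:.3f} s")
    print(f"best case:  {times[-1]:.3f} s")
```

Because cos²θ never exceeds one, every sample falls at or below the nominal retention time, so the worst case in the sorted list is the quantity a retention test would need to certify.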

A.2. Modeling of DW dynamics

Magnetic domains in ferromagnetic materials arise due to the demagnetization field from the different orientations of magnetization, in order to reduce the total energy (magnetostatic, anisotropic, exchange, and Zeeman) of the system. The concept of magnetic domain dynamics was first proposed by Slonczewski [101]. Upon understanding the adiabatic and non-adiabatic spin-transfer torque (STT), the Landau and Lifshitz equation with the damping term formulated by Gilbert was modified to accommodate these torque terms.

A.2.1. Modified Landau-Lifshitz-Gilbert (LLG) Equation

In this section we solve the LLG equation to model the dynamics of the domain wall. The LLG equation including the adiabatic and non-adiabatic spin torque terms [23, 24] is given by:

$$ \frac{\partial \vec{m}}{\partial t} = -\gamma\,\vec{m} \times \vec{H}_{eff} + \alpha\,\vec{m} \times \frac{\partial \vec{m}}{\partial t} - u(\vec{j} \cdot \nabla)\vec{m} + \beta u\,\vec{m} \times (\vec{j} \cdot \nabla)\vec{m} $$

where $\vec{m}$ and $\vec{j}$ are unit vectors representing the local magnetic moments and the current flow, and α and β represent Gilbert’s damping parameter and the non-adiabatic spin transfer term, respectively. The effective field is represented by $H_{eff} = -\frac{1}{\mu_0 M_s}\frac{\delta w}{\delta \vec{m}}$. The parameter u is the spin transfer torque parameter; it is proportional to the current density J and the spin polarization P, and is given by $u = \frac{\mu_B J P}{e M_s}$, where Ms is the saturation magnetization, w is the energy density and µB is the Bohr magneton. Each term of the equation represents a torque exerted on the magnetic moment. The first term of the modified LLG defines the effect of the magnetostatic energy (responsible for the magnetic moment’s precession), the second term is Gilbert’s damping term, and the last two terms are the adiabatic and non-adiabatic spin transfer torque terms. Fig. A.2(a) shows the effect of these torque terms on the magnetic moment.

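[Figure A.2, panel (a): magnetic moment precessing about the field H, with the STT torques indicated. Panel (b): X, Y, Z axes with the polar angle θ and the azimuthal angle ϕ.]

Figure A.2 (a) Torques experienced by a magnetic moment, (b) The co-ordinate system used to convert from Cartesian to polar.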

A.2.2. One Dimensional Model (1D)

The 1D model provides a quantitative understanding of the dynamics of the domain walls even in the case of complex spin structures (e.g., vortex walls). This allows us to deduce the critical current density, DW velocity and also provides a crude effect of pinning and/or PVs.

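The spherical coordinate system (shown in Fig. A.2(b)) is employed, as it provides a more intuitive picture of the magnetic precession. This model assumes a constant domain wall profile, where the magnitude of magnetization is given by Ms, and the magnetic moment is expressed as $\vec{m}(x, y, z) = (1, \theta(x, y, z), \varphi(x, y, z))$, as Ms = 1. The rate of change of magnetic moment is given by,

$$ \frac{\partial \vec{m}}{\partial t} = \frac{\partial \vec{m}}{\partial M_s} \cdot \frac{\partial M_s}{\partial t} + \frac{\partial \vec{m}}{\partial \theta} \cdot \frac{\partial \theta}{\partial t} + \frac{\partial \vec{m}}{\partial \varphi} \cdot \frac{\partial \varphi}{\partial t} \tag{11} $$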

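The partial derivatives are given in terms of their unit vectors as,

$$ \hat{e}_m = \frac{\partial \vec{m}}{\partial M_s}\left|\frac{\partial M_s}{\partial \vec{m}}\right| = \frac{\partial \vec{m}}{\partial M_s}, \quad \hat{e}_\theta = \frac{\partial \vec{m}}{\partial \theta}\left|\frac{\partial \theta}{\partial \vec{m}}\right| = \frac{1}{M_s}\frac{\partial \vec{m}}{\partial \theta}, \quad \hat{e}_\varphi = \frac{\partial \vec{m}}{\partial \varphi}\left|\frac{\partial \varphi}{\partial \vec{m}}\right| = \frac{1}{M_s \sin\theta}\frac{\partial \vec{m}}{\partial \varphi} $$

Substituting these in (11),

$$ \frac{\partial \vec{m}}{\partial t} = \dot{M}_s\,\hat{e}_m + M_s\dot{\theta}\,\hat{e}_\theta + M_s \sin\theta\,\dot{\varphi}\,\hat{e}_\varphi \tag{12} $$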

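We now take each term of the LLG separately and convert them into their polar equivalent. The magnetic precession is given by,

$$ \vec{m} \times \vec{H}_{eff} = M_s\,\hat{e}_m \times \left[(H_{eff} \cdot \hat{e}_m)\hat{e}_m + (H_{eff} \cdot \hat{e}_\theta)\hat{e}_\theta + (H_{eff} \cdot \hat{e}_\varphi)\hat{e}_\varphi\right] $$

$$ \vec{m} \times \vec{H}_{eff} = \begin{vmatrix} \hat{e}_m & \hat{e}_\theta & \hat{e}_\varphi \\ 1 & 0 & 0 \\ H_{eff} \cdot \hat{e}_m & H_{eff} \cdot \hat{e}_\theta & H_{eff} \cdot \hat{e}_\varphi \end{vmatrix} $$

$$ \vec{m} \times \vec{H}_{eff} = (H_{eff} \cdot \hat{e}_\theta)\hat{e}_\varphi - (H_{eff} \cdot \hat{e}_\varphi)\hat{e}_\theta \tag{13} $$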

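The effective fields in θ and φ are:

$$ H_\theta = -\frac{1}{\mu_0 M_s}\frac{\delta w}{\delta \theta}, \qquad H_\varphi = -\frac{1}{\mu_0 M_s \sin\theta}\frac{\delta w}{\delta \varphi} \tag{14} $$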

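We next take the polar form of the damping term:

$$ \vec{m} \times \frac{\partial \vec{m}}{\partial t} = \begin{vmatrix} \hat{e}_m & \hat{e}_\theta & \hat{e}_\varphi \\ 1 & 0 & 0 \\ 0 & \dot{\theta} & \sin\theta\,\dot{\varphi} \end{vmatrix} = \dot{\theta}\,\hat{e}_\varphi - \sin\theta\,\dot{\varphi}\,\hat{e}_\theta \tag{15} $$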

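The adiabatic and the non-adiabatic torque terms are given by:

$$ \frac{\partial \vec{m}}{\partial y} = \frac{\partial \theta}{\partial y}\hat{e}_\theta + \sin\theta\frac{\partial \varphi}{\partial y}\hat{e}_\varphi $$

$$ \vec{m} \times \frac{\partial \vec{m}}{\partial y} = \begin{vmatrix} \hat{e}_m & \hat{e}_\theta & \hat{e}_\varphi \\ 1 & 0 & 0 \\ 0 & \frac{\partial \theta}{\partial y} & \sin\theta\frac{\partial \varphi}{\partial y} \end{vmatrix} = \frac{\partial \theta}{\partial y}\hat{e}_\varphi - \sin\theta\frac{\partial \varphi}{\partial y}\hat{e}_\theta \tag{16} $$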

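Substituting (13), (14), (15) and (16) in (11) we get:

$$ \frac{\partial \vec{m}}{\partial t} = \begin{bmatrix} 0 \\ \dot{\theta} \\ \sin\theta\,\dot{\varphi} \end{bmatrix} = \begin{bmatrix} 0 \\ \gamma H_\varphi\hat{e}_\theta - \alpha\sin\theta\,\dot{\varphi}\,\hat{e}_\theta - u\frac{\partial \theta}{\partial y}\hat{e}_\theta - \beta u \sin\theta\frac{\partial \varphi}{\partial y}\hat{e}_\theta \\ -\gamma H_\theta\hat{e}_\varphi + \alpha\dot{\theta}\,\hat{e}_\varphi - u\sin\theta\frac{\partial \varphi}{\partial y}\hat{e}_\varphi + \beta u\frac{\partial \theta}{\partial y}\hat{e}_\varphi \end{bmatrix} $$

Upon equating and using (14) we get:

$$ \sin\theta\,\dot{\varphi} = \frac{\gamma}{M_s}\frac{\delta w}{\delta \theta} + \alpha\dot{\theta} - u\sin\theta\frac{\partial \varphi}{\partial y} + \beta u\frac{\partial \theta}{\partial y} \tag{17} $$

$$ \dot{\theta} = -\frac{\gamma}{M_s \sin\theta}\frac{\delta w}{\delta \varphi} - \alpha\sin\theta\,\dot{\varphi} - u\frac{\partial \theta}{\partial y} - \beta u\sin\theta\frac{\partial \varphi}{\partial y} \tag{18} $$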

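Rearranging (17) & (18), with ∂φ/∂y = 0, we get,

$$ \frac{\delta w}{\delta \theta} = \frac{M_s}{\gamma}\left[\dot{\varphi}\sin\theta - \alpha\dot{\theta} - \beta u\frac{\partial \theta}{\partial y}\right] \tag{19} $$

$$ \frac{\delta w}{\delta \varphi} = -\frac{M_s \sin\theta}{\gamma}\left[\dot{\theta} + \alpha\dot{\varphi}\sin\theta + u\frac{\partial \theta}{\partial y}\right] \tag{20} $$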

Equations (19) and (20) give the functional derivative of the energy density w with respect to the direction of the magnetic moment. This is the general form of the LLG in 1D inclusive of the STT terms. Note that the flow of current is in the y-direction and is assumed to be homogeneous. We also lump the magnetic moments of the DW and take the azimuthal angle to be independent of position, which is shown by defining a new variable ψ(t). To calculate the functional derivative, we first change to variables that reflect the fact that changes in the magnetic moment’s direction occur where the DW is located. By adopting the traveling wave ansatz, the domain wall motion is given as [21],

$$ \theta(y, t) = 2\tan^{-1}\left(\exp\left[\frac{y - q(t)}{\Delta}\right]\right) \quad \text{and} \quad \varphi = \psi(t) \tag{21} $$

where Δ is the domain wall width parameter. This equation shows the variation of θ across the domain wall. This variation resembles the turning motion of a screw, as seen in Fig. A.3(a).

[Figure A.3, panel (a): "Variation of the Magnetization Angle θ in the DW", θ (radians, 0 to 3.5) against time (0 to 10 ns). Panel (b): nanowire along y divided into sections I, II and III, with the magnetization changing between 1 and -1 and the DW position q and tilt angle ψ marked.]

Figure A.3 (a) Variation of the magnetization angle across the width of the domain wall and, (b) the sections of a NW used to better understand the dynamics of the domain wall (inclusive of tilt angle and magnetization position).

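In order to remove the functional derivative, the areal energy density σ is used instead of the volume energy density w.

$$ \sigma = \int w\,dy $$

The functional derivative of the areal energy density is given as,

$$ d\sigma = \int \delta w\,dy = \int \left[\left(\frac{\delta w}{\delta \theta}\right)\delta\theta + \left(\frac{\delta w}{\delta \varphi}\right)\delta\varphi\right] dy \tag{22} $$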


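In order to solve the above equation, the spatial and time derivatives of θ are calculated. Rearranging (21) we get,

$$ \tan\frac{\theta}{2} = e^{\frac{y - q(t)}{\Delta}} \tag{23} $$

Differentiating (23) w.r.t. y, we get,

$$ \frac{\partial \theta}{\partial y} = \frac{2}{\Delta}\,\frac{\tan\frac{\theta}{2}}{1 + \tan^2\frac{\theta}{2}} = \frac{2}{\Delta}\left[\sin\frac{\theta}{2}\cos\frac{\theta}{2}\right] = \frac{\sin\theta}{\Delta} $$

Similarly, differentiating (23) w.r.t. t, we get,

$$ \dot{\theta} = -\dot{q}\,\frac{\sin\theta}{\Delta}, \qquad \delta\theta = -\left(\frac{\partial \theta}{\partial y}\right)\partial q = -\frac{\sin\theta}{\Delta}\,dq $$

According to the properties of a Bloch wall [13],

$$ \delta\varphi = d\psi, \qquad \int \sin^2\theta\,dy = 2\Delta, \qquad \int \sin\theta\,dy = \pi\Delta $$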

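Substituting these values in (22),

$$ d\sigma = \int \left[\frac{M_s}{\gamma}\left(\dot{\psi}\sin\theta + \frac{\alpha\dot{q}\sin\theta}{\Delta} - \beta\frac{u\sin\theta}{\Delta}\right)\left(-\frac{\sin\theta}{\Delta}\right)dq + \left(-\frac{M_s\sin\theta}{\gamma}\right)\left(-\frac{\dot{q}\sin\theta}{\Delta} + \alpha\dot{\psi}\sin\theta + \frac{u\sin\theta}{\Delta}\right)d\psi\right] dy $$

$$ d\sigma = \int \left[-\frac{M_s\sin^2\theta}{\gamma\Delta}\left(\dot{\psi} + \frac{\alpha\dot{q}}{\Delta} - \frac{\beta u}{\Delta}\right)dq - \frac{M_s\sin^2\theta}{\Delta\gamma}\left(-\dot{q} + \alpha\dot{\psi}\Delta + u\right)d\psi\right] dy $$

Upon further simplification, and taking the partial derivatives of the areal energy density, the equation becomes,

$$ \frac{\partial \sigma}{\partial q} = -\frac{2M_s}{\gamma}\left[\frac{\alpha\dot{q}}{\Delta} + \dot{\psi} - \frac{\beta u}{\Delta}\right] \tag{24} $$

$$ \frac{\partial \sigma}{\partial \psi} = \frac{2M_s}{\gamma}\left[\dot{q} - \alpha\dot{\psi}\Delta - u\right] \tag{25} $$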


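We now rearrange the terms to bring (24) & (25) in terms of $\dot{q}$ and $\dot{\psi}$:

$$ \dot{q} = \frac{\gamma}{2M_s}\frac{\partial \sigma}{\partial \psi} + \alpha\Delta\left[-\frac{\gamma}{2M_s}\frac{\partial \sigma}{\partial q} - \frac{\alpha\dot{q}}{\Delta} + \frac{\beta u}{\Delta}\right] + u $$

$$ \dot{q} = \frac{\gamma}{2M_s}\frac{\partial \sigma}{\partial \psi} - \frac{\alpha\Delta\gamma}{2M_s}\frac{\partial \sigma}{\partial q} - \alpha^2\dot{q} + \alpha\beta u + u $$

$$ (1 + \alpha^2)\dot{q} = -\frac{\gamma}{2M_s}\left[\alpha\Delta\left(\frac{\partial \sigma}{\partial q}\right) - \frac{\partial \sigma}{\partial \psi}\right] + (1 + \alpha\beta)u \tag{26} $$

Similarly,

$$ \dot{\psi} = -\frac{\gamma}{2M_s}\frac{\partial \sigma}{\partial q} - \frac{\alpha}{\Delta}\left[\frac{\gamma}{2M_s}\frac{\partial \sigma}{\partial \psi} + \alpha\dot{\psi}\Delta + u\right] + \frac{\beta u}{\Delta} $$

$$ \dot{\psi} = -\frac{\gamma}{2M_s}\frac{\partial \sigma}{\partial q} - \frac{\alpha\gamma}{2\Delta M_s}\frac{\partial \sigma}{\partial \psi} - \alpha^2\dot{\psi} - \frac{\alpha u}{\Delta} + \frac{\beta u}{\Delta} $$

$$ (1 + \alpha^2)\dot{\psi} = -\frac{\gamma}{2M_s}\left[\left(\frac{\partial \sigma}{\partial q}\right) + \frac{\alpha}{\Delta}\left(\frac{\partial \sigma}{\partial \psi}\right)\right] + \frac{\beta - \alpha}{\Delta}u \tag{27} $$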

The next objective is to compute the areal energy density σ of the system. This is obtained by integrating the volume energy density w over y (using (22)). The volume energy density consists of the exchange, anisotropy, demagnetization and Zeeman energy terms. As the system is assumed to be homogeneous, w can be expressed using the θ and φ directions of the moment.

We take the y-axis to be the easy axis; each of the energy terms is expressed as:

$$ w_{EX} = A(\nabla \vec{M})^2 = A\left(\frac{\partial \theta}{\partial y}\right)^2 $$

$$ w_{ANI} = K\sin^2\theta $$

$$ w_{DEMAG} = \frac{1}{2}\mu_0\,\vec{H}_d \cdot \vec{M} = \frac{1}{2}\mu_0 M_s^2 N_z \sin^2\theta\sin^2\varphi $$

$$ w_{ZEEMAN} = -\mu_0\,\vec{M} \cdot \vec{H}_{EXT} = -\mu_0 M_s(M_x H_x + M_y H_y + M_z H_z) $$

where A is the exchange constant, K is the uniaxial energy, Hd is the demagnetization field, Nz is the demagnetization tensor and Hext is the external (applied) field. To understand the contributions to the total energy, we divide the nanowire into 3 sections as shown in Fig. A.3(b). Sections I and III are the homogeneous sections of the wire, and section II is the one where the magnetization changes from -1 to 1 (the domain wall section). The domain wall energy $w_{DW}$ is given by:

$$ w_{DW} = A\left(\frac{\partial \theta}{\partial y}\right)^2 + K\sin^2\theta + \frac{1}{2}\mu_0 M_s^2 N_z \sin^2\theta\sin^2\varphi + w^{DW}_{ZEEMAN} \tag{28a} $$

The Zeeman energy can be expressed in terms of its longitudinal and its two transverse components as:

$$ w^{DW}_{ZEEMAN} = \mu_0\left(-M_s H_A\cos\theta - M_s H_T^z\sin\theta\sin\varphi - M_s H_T^x\sin\theta\cos\varphi\right) $$

where $H_A = H_y$, $H_T^x = H_x$ and $H_T^z = H_z$.

We now integrate (28a) over y, which gives us:

$$ \sigma_{DW} = \frac{2A}{\Delta} + 2K\Delta + \mu_0 M_s^2 N_z \Delta\sin^2\psi - \pi\mu_0 M_s\Delta\left(H_T^z\sin\psi + H_T^x\cos\psi\right) $$

The energy density in sections I and III is given by:

$$ \sigma_{out} = -2\mu_0\,q\,M_s H_A $$

In order to take the pinning effects into account, a phenomenological pinning energy [23] is added:

$$ \sigma_{pin} = \frac{Vq^2}{d}\,\vartheta(|q| - d) \tag{28b} $$

where V and d are the depth and width of the pinning potential and $\vartheta(q)$ is a bidirectional Heaviside function.


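The total energy per unit area can be written as,

$$ \sigma_{tot} = \frac{2A}{\Delta} + 2K\Delta + \mu_0 M_s^2 N_z \Delta\sin^2\psi - \pi\mu_0 M_s\Delta H_T - 2q\mu_0 M_s H_A + \frac{Vq^2}{d}\,\vartheta(|q| - d) \tag{29} $$

Taking the partial derivative of $\sigma_{tot}$ w.r.t. q and ψ, we get,

$$ \frac{\partial \sigma_{tot}}{\partial q} = -2\mu_0 M_s H_A + \frac{2Vq}{d} $$

$$ \frac{\partial \sigma_{tot}}{\partial \psi} = \mu_0 M_s H_k \Delta\sin 2\psi - \pi\mu_0 M_s\Delta H_T $$

where $V = V\vartheta(|q| - d)$, $H_T = H_T^z\cos\psi - H_T^x\sin\psi$ and $H_k = M_s N_z$. We now substitute the above values in (26) to get,

$$ (1 + \alpha^2)\dot{q} = -\frac{\gamma}{2M_s}\left[\alpha\Delta\left(-2\mu_0 M_s H_A + \frac{2Vq}{d}\right) - \left(M_s\mu_0 H_k\Delta\sin 2\psi - \pi\mu_0 M_s\Delta H_T\right)\right] + u(1 + \alpha\beta) $$

$$ (1 + \alpha^2)\dot{q} = \gamma\Delta\frac{\mu_0}{2}\left(H_k\sin 2\psi - \pi H_T\right) + \alpha\Delta\gamma\left(\mu_0 H_A - \frac{Vq}{M_s d}\right) + (1 + \alpha\beta)u \tag{30} $$

Similarly, substituting and simplifying (27) we get,

$$ (1 + \alpha^2)\dot{\psi} = -\frac{\gamma}{2M_s}\left[\left(-2\mu_0 M_s H_A + \frac{2Vq}{d}\right) + \frac{\alpha}{\Delta}\mu_0\left(M_s H_k\Delta\sin 2\psi - \pi M_s\Delta H_T\right)\right] + \frac{\beta - \alpha}{\Delta}u $$

$$ (1 + \alpha^2)\dot{\psi} = -\alpha\gamma\frac{\mu_0}{2}\left(H_k\sin 2\psi - \pi H_T\right) + \gamma\left(\mu_0 H_A - \frac{Vq}{M_s d}\right) + \frac{\beta - \alpha}{\Delta}u \tag{31} $$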

The equations (30 and 31) describe the dynamics of the domain wall in 1D plane which is used in this chapter.
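Equations (30) and (31) form a pair of coupled ordinary differential equations that can be integrated numerically. The following is a minimal sketch using a fourth-order Runge-Kutta step with the pinning term and the applied fields set to zero. The gyromagnetic ratio, the anisotropy flux density µ0·Hk = 0.1 T, the time step and all function names are assumptions made for illustration; α, β and Δ follow the values quoted in the caption of Fig. A.6. Fields are passed as flux densities (µ0·H, in tesla) so that their product with γ has units of 1/s.

```python
import math

GAMMA = 1.76e11  # gyromagnetic ratio in rad/(s*T); assumed value


def dw_rates(psi, u, alpha, beta, width, b_k, b_a=0.0, b_t=0.0):
    """Right-hand sides of eqs. (30) and (31) with the pinning term omitted.

    b_k, b_a and b_t are mu0*Hk, mu0*HA and mu0*HT in tesla.
    Returns (q_dot in m/s, psi_dot in rad/s).
    """
    norm = 1.0 + alpha ** 2
    torque = 0.5 * GAMMA * (b_k * math.sin(2.0 * psi) - math.pi * b_t)
    q_dot = (width * torque + alpha * width * GAMMA * b_a
             + (1.0 + alpha * beta) * u) / norm
    psi_dot = (-alpha * torque + GAMMA * b_a + (beta - alpha) * u / width) / norm
    return q_dot, psi_dot


def simulate(u, alpha=0.01, beta=0.02, width=25e-9, b_k=0.1,
             t_end=100e-9, dt=10e-12):
    """Integrate DW position q and tilt angle psi with RK4; returns (q, psi, velocity)."""
    q, psi = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        # Without pinning the rates depend only on psi, so q is not passed in.
        k1q, k1p = dw_rates(psi, u, alpha, beta, width, b_k)
        k2q, k2p = dw_rates(psi + 0.5 * dt * k1p, u, alpha, beta, width, b_k)
        k3q, k3p = dw_rates(psi + 0.5 * dt * k2p, u, alpha, beta, width, b_k)
        k4q, k4p = dw_rates(psi + dt * k3p, u, alpha, beta, width, b_k)
        q += dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
        psi += dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    velocity, _ = dw_rates(psi, u, alpha, beta, width, b_k)
    return q, psi, velocity


if __name__ == "__main__":
    q, psi, v = simulate(u=100.0)
    print(f"q = {q * 1e6:.2f} um, psi = {math.degrees(psi):.1f} deg, v = {v:.1f} m/s")
```

For drive levels where ψ settles to a constant, setting the left side of (31) to zero and substituting into (30) gives a steady velocity of βu/α, so the sketch should approach 200 m/s for u = 100 m/s with the parameters above. Adding the pinning term of (30) and (31) would make the rates depend on q as well.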


A.2.3. Modeling of NW Resistance

The resistance of the NW is a function of the wire resistance (RNW0), temperature (T) and DW resistance (RDW). It is given by,

$$ R_{NW} = R_{NW0}(\delta + C \cdot T) \tag{32} $$

where $R_{NW0} = \frac{\rho l}{w \cdot t}$, T is temperature, δ is an offset parameter, C is the temperature coefficient of resistance, ρ is resistivity, w is width, t is thickness and l is length of the permalloy. The normalized resistance with temperature is plotted in Fig. A.4(a). The experimental data is also shown. The DW resistance is given by [23]

$$ R_{DW} = \chi\,\delta R_{AMR}, \quad \text{where} \quad \delta R_{AMR} = \frac{\Delta_d}{100\,l}R_{NW0}, \quad \chi = 0.8 \tag{33} $$

In the above expression δRAMR is the change in Anisotropy Magneto Resistance (AMR) due to the presence of one DW and χ is a factor determined experimentally [26]. If the number of DWs in the NW is N then the effective resistance is given by

$$ R_{eff} = R_{NW} + N\left(R_{DW} + \delta R_{AMR}\cos^2\psi\right) \tag{34} $$

Fig. A.4(a) shows the simulation results w.r.t. temperature. The variation in resistance is due to the DW resistance, which is a function of ψ. The resistance for multiple DWs is shown in Fig. A.4(b).

A.2.3. Modeling of Pinning

Notches are built intentionally in the NW to pin the DW motion. This is done to ensure that bits can be shifted deterministically. However, PVs in the NW could create unwanted physical notches that could pin the DW and degrade velocity. Pinning sites create an extra potential that impedes DW motion and can eventually fix the DW in place. The magnitude of the pinning energy depends on the notch dimensions. We model the pinning energy as follows [27-29]:

$$ \sigma_{pin} = \frac{V}{M_s}\,\frac{(q - q_{pin})^2}{2d}, \qquad V = \begin{cases} V_{pin} & q_{pin} - d \le q \le q_{pin} + d \\ 0 & \text{otherwise} \end{cases} \tag{35} $$

[Figure A.4, panel (a): "Normalized Resistance vs. Temperature", Rn(T) (0.9 to 1.6) against temperature (200 to 700 K), numerical and experimental curves. Panel (b): "Variation of Resistance vs. Time", R (200 to 201.5 Ohms) against time (0 to 100 ns) for 1, 5 and 10 DWs.]

Figure A.4 Modeling NW resistance: (a) normalized resistance vs. temperature w.r.t experimental data [23] and, (b) resistance vs. time for different number of DWs in the NW.

where qpin is the pinning site, Vpin is the pinning potential at that particular location and d is the pinning width. Multiple pinning sites are modeled by changing qpin accordingly. The LLG is solved with the pinning sites in order to observe their impact on DW dynamics.

To understand the impact of PVs we first fix the pinning locations (q1, q1/q2 and q1/q2/q3) and set the pinning potential to 2000 J/m3 [23]. This condition is set to simulate intentional pinning and depinning (to study its impact on shift current).

For simulation we fix the pinning sites at q1=0.5um, q2=1um and q3=1.5um (Fig. A.5(a)). The pinning width (d) is assumed to be 150nm. Fig. A.5(b) shows the ψ vs. q plot of the DW (for a pinning site at q1 and Vpin=2000 J/m3) for three different magnitudes of injected current. The DW gets pinned in the first two cases (u=80m/s and 90m/s) but dislodges successfully with u=100m/s (Fig. A.5(b)). This plot indicates the need for higher current (i.e., higher power) to dislodge the DW. Fig. A.5(c) illustrates the results with pinning at two sites (at q1, q2, with Vpin1=Vpin2=2000 J/m3). With u=100m/s the DW gets depinned from q1 but gets pinned at q2. This indicates that velocity degradation due to the first notch (even though unpinned) can cause pinning in the next notch. The same current successfully dislodges two notches of half the pinning potential (i.e., Vpin=1000 J/m3). This clearly indicates that multiple deep notches hinder the DW motion more than multiple shallow notches.

Psi Vs Q for one Pinning Site 10 Process Variation 0 nw nd -10

Notch width (nw) Notch -20 (deg) depth  -30 (nd) U=80m/s w t -40 U=90m/s U=100m/s -50 0 0.5 1 1.5 L q1 q2 q3 Q(Um)

(a) (b) Current driven Variation of Psi Vs Q Current driven Variation of Psi Vs Q

20 Current10 driven variation of ψ Vs Q. U = 100m/s. 10 0

0 -10

-10 -20

Psi(deg) Psi(deg) -20 -30 1 Notch 3 -30 1,2 = 2000J/m 2 Notch -40 U=85m/s 2” = 1000J/m3 2" Notch U=100m/s -40 U=115m/s -50 0 0.5 1 1.5 2 0 0.5 1 1.5 2 Q(Um) Q(Um) (c) (d)

Figure A.5 Domain wall pinning: (a) nanowire with pinning sites at q1, q2 and q3. (b) ψ vs. q plot for pinning at q1. The DW depins with u=100m/s, (c) ψ vs q plot of DW for one, two notches (Vpin 3 3 =2000 J/m ) and two notches (Vpin =1000 J/m ), (d) ψ vs. q plot of DW for three notches with Vpin 135 =2000 J/m3.

The plot in Fig. A.5(d) shows the outcome of three full depth (Vpin1=Vpin2=Vpin3=2000 J/m3) notches at q1, q2, q3. With u=85m/s the DW is pinned at q1. It can be dislodged with u=100m/s; however, this is insufficient to depin from q2. Finally, we find that u=115m/s is sufficient to dislodge the DW from both the q1 and q2 pinning sites but not from q3. This is due to the location of the pinning sites. If multiple notches are located close to each other they can pin the DW. If a new notch arrives before the DW velocity has fully recovered from the previous notch, then the DW becomes prone to getting pinned. The corresponding transient velocity can be seen in Fig. A.6(a). The velocity of the DW without any notch is also plotted for reference. It can be observed that the average velocity of the DW can be affected significantly by the presence of unintentional PV induced notches.

Fig. A.6(b) shows the effect of distributing one deep notch into multiple shallow notches. Lumping multiple shallow notches into one deep notch (for faster simulation) could result in significant overestimation of the depinning voltage requirement.

[Figure A.6, panels (a) and (b): "Variation of Velocity vs. Time", velocity (m/s) against time (ns). Panel (a) covers no notch and 1, 2 and 3 notches over 0 to 20 ns; panel (b) covers 1 notch, 2" notch and 3" notches over 0 to 14 ns.]

Figure A.6 Velocity degradation due to multiple notches, (a) transient velocity in 0, 1, 2, 3 pinning sites in the NW. (b) transient velocity for one deep pinning and two cases of shallow pinning sites. [α=0.01, β= 0.02 and ∆=25nm].


A.2.4. Modeling Process Variation and Entropy

A.2.4.1 Process Variation

In order to study the impact of process variation, we first modeled the relationship between the depinning magnetic field Hth and the notch depth (nd) for a NW thickness of 10nm [29]. By curve fitting from [29] we obtain Hth = 2.34 nd + 2.60 and Vpin ≈ 2 Hth Ms. By substituting Hth and Vpin in (35) we relate the pinning energy with notch dimensions. Next, we study the presence of a single notch at qpin=0 under process variation induced notch width (nw) and depth (nd) fluctuations. The process variation in nw and nd (Fig. A.5(a)) is assumed to be Gaussian with mean (µ) and sigma (σ) of (µd, σd) = (0, 6.66nm) and (µnt, σnt) = (0, 50nm). The supply voltage of the shift circuit is swept from 0 to 3V in steps of 50mV, and the minimum voltage to dislodge the DW is plotted in Fig. A.7 for 1000 runs of Monte Carlo simulation. The (µ, σ) of the depinning voltage is found to be (0.70V, 0.42V).
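The sampling step of such a Monte Carlo study can be sketched as below. This is only an illustration of how the curve fit quoted above turns a notch-depth distribution into a distribution of depinning fields; the units of Hth follow [29] and are not restated here, using the magnitude of the sampled depth is an assumption, and the mapping from depinning field to shift-circuit voltage is not given in this section, so the sketch stops at the field.

```python
import random


def depinning_field(notch_depth_nm):
    """Curve fit quoted in the text: H_th = 2.34 * n_d + 2.60 (units as in [29])."""
    return 2.34 * notch_depth_nm + 2.60


def pinning_potential(h_th, m_s):
    """V_pin ~ 2 * H_th * M_s, as quoted in the text."""
    return 2.0 * h_th * m_s


def sample_depinning_fields(n_runs=1000, sigma_depth_nm=6.66, seed=1):
    """Monte Carlo samples of H_th for Gaussian notch-depth variation."""
    rng = random.Random(seed)
    # Assumption: the magnitude of the zero-mean deviation is taken as the notch depth.
    return [depinning_field(abs(rng.gauss(0.0, sigma_depth_nm)))
            for _ in range(n_runs)]


if __name__ == "__main__":
    fields = sorted(sample_depinning_fields())
    mean = sum(fields) / len(fields)
    print(f"mean H_th = {mean:.2f}, 99th percentile = {fields[int(0.99 * len(fields))]:.2f}")
```

Sorting the samples makes the long upper tail visible, which is the part of the distribution that sets the worst-case depinning requirement.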

[Figure A.7: "Depinning Voltage distribution under Process Variations", histogram of number of occurrences (0 to 100) against voltage (0 to 3 V), segregated into LOW, MEDIUM and HIGH bins.]

Figure A.7 Distribution of depinning voltage and segregation of pulsed shifting.


The unwanted pinning of DW is dominated by the long tail of the distribution which indicates that the worst-case voltage (and power) to ensure the depinning of process variation induced notches could be very high. This underscores the need of designing adaptive circuits (such as pulsed shifting) that can depin the notches without consuming worst case power. Fig. A.7 also illustrates a binning methodology where different parts of the memory array can be pulsed with different voltages.

A.2.4.2 Entropy and Randomness in DWM

The interaction between the conduction electrons of the injected current, thermally activated electrons and the local magnetization is a source of entropy [30-35]. We incorporate randomness in the physical parameters of the model. In the NW, the roughness is modeled as triangular notches with width (d) and depth (t) (Fig. A.5(a)). The pinning energy is given by (35).

[Figure A.8: "Current driven Variation of Q vs. Time", q (µm, 0 to 4) against time (0 to 25 ns) at u = 100 m/s for 1 notch, 2 notches (2000 J/m3) and 2" notches (1000 J/m3), annotated with slow speed due to roughness, pinning, and DW dynamics in the presence of randomness.]

Figure A.8 DW dynamics in presence of physical roughness induced slowdown and eventual pinning. Stochastic motion of DW. A misaligned shift pulse reduces velocity.


By determining an accurate relation between Vpin and t and assuming Gaussian distribution of d and t, we can obtain the statistical nature of DW dynamics in the non-uniform NW. Fig. A.8 illustrates the initial results, showing DW motion by using a curve fitting model of Vpin. It can be observed that the DW dynamics is non-linear in nature, which makes it resistant to modeling-type attacks.


Appendix B

Publications

B.1. Refereed Conferences

• Anirudh Iyengar, Deepak Vontela, Ithihasa Reddy, Swaroop Ghosh, Syedhamidreza Motaman, and Jae-won Jang. “Threshold Defined Camouflaged Gates in 65nm Technology for Reverse Engineering Protection”, In Proceedings of the 2018 International Symposium on Low-power Electronics and Design (accepted).
• Nirmala, Ithihasa Reddy, Deepak Vontela, Swaroop Ghosh, and Anirudh Iyengar. "A novel threshold voltage defined switch for circuit camouflaging." In Test Symposium (ETS), 2016 21th IEEE European, pp. 1-2. IEEE, 2016.
• Anirudh Iyengar and Swaroop Ghosh. "A novel threshold voltage defined switch for circuit camouflaging." In Test Symposium (ETS), 2016 21th IEEE European, pp. 1-2. IEEE, 2016.
• Anirudh Iyengar, and Swaroop Ghosh. "Modeling and analysis of domain wall dynamics for robust and low-power embedded memory." In Proceedings of the 51st Annual Design Automation Conference, pp. 1-6. ACM, 2014.
• Anirudh Iyengar, Kenneth Ramclam, and Swaroop Ghosh. "DWM-PUF: A low-overhead, memory-based security primitive." In Hardware-Oriented Security and Trust (HOST), 2014 IEEE International Symposium on, pp. 154-159. IEEE, 2014.
• Anirudh Iyengar, Swaroop Ghosh, Nitin Rathi, and Helia Naeimi. "Side channel attacks on STTRAM and low-overhead countermeasures." In Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2016 IEEE International Symposium on, pp. 141-146. IEEE, 2016.
• Rathi, Nitin, Swaroop Ghosh, Anirudh Iyengar, and Helia Naeimi. "Data privacy in non-volatile cache: Challenges, attack models and solutions." In Design Automation Conference (ASP-DAC), 2016 21st Asia and South Pacific, pp. 348-353. IEEE, 2016.
• Asmit De, Anirudh Iyengar et al., “CTCG: Charge-Trap Based Camouflaged Gates for Reverse Engineering Prevention” HOST 2018 (accepted).
• Motaman, Seyedhamidreza, Anirudh Iyengar, and Swaroop Ghosh. "Synergistic circuit and system design for energy-efficient and robust domain wall caches." In Proceedings of the 2014 international symposium on Low power electronics and design, pp. 195-200. ACM, 2014.
• Anirudh Iyengar, Nareen Vobilisetti, and Swaroop Ghosh. "Authentication of printed circuit boards." In 42nd International Symposium for Testing and Failure Analysis, ISTFA 2016. ASM International, 2016.
• Khan, Mohammad Nasim Imtiaz, Anirudh Iyengar, and Swaroop Ghosh. "Novel magnetic burn-in for retention testing of STTRAM." In Proceedings of the Conference on Design, Automation & Test in Europe, pp. 666-669. European Design and Automation Association, 2017.

B.2. Refereed Journals

• Anirudh Iyengar, Swaroop Ghosh, and Kenneth Ramclam. "Domain wall magnets for embedded memory and hardware security." IEEE Journal on Emerging and Selected Topics in Circuits and Systems 5, no. 1 (2015): 40-50. • Motaman, Seyedhamidreza, Anirudh Iyengar, and Swaroop Ghosh. "Domain wall memory-layout, circuit and synergistic systems." IEEE Transactions on Nanotechnology14, no. 2 (2015): 282-291. • Anirudh Iyengar, Swaroop Ghosh, and Jae-Won Jang. "MTJ-based state retentive flip- flop with enhanced-scan capability to sustain sudden power failure." IEEE Transactions on Circuits and Systems I: Regular 62, no. 8 (2015): 2062-2068. • Anirudh Iyengar, Swaroop Ghosh, Kenneth Ramclam, Jae-Won Jang, and Cheng-Wei Lin. "Spintronic PUFs for security, trust, and authentication." ACM Journal on Emerging Technologies in Computing Systems (JETC) 13, no. 1 (2016): 4. • Anirudh Iyengar, Swaroop Ghosh, and Srikant Srinivasan. "Retention Testing Methodology for STTRAM." IEEE Design & Test 33, no. 5 (2016): 7-15. • Anirudh Iyengar, Swaroop Ghosh and Nitin Rathi, “MTJ Reliability Assessment under Process Variations and Activity Factors and Mitigation Techniques”, JOLPE 2018 (accepted). • Nasim Imtiaz Khan, Anirudh Iyengar & Swaroop Ghosh “Novel Magnetic Burn-In for Retention and Magnetic Tolerance Testing of STTRAM” TVLSI 2018 (accepted).

• Ghosh, Swaroop, Rashmi Jha, Anirudh Iyengar, and Rekha Govindaraj. "Design Space Exploration for Selector Diode-STTRAM Crossbar Arrays." IEEE Transactions on Magnetics (2018).

B.3. Refereed Patents

• Physically unclonable function based on domain wall memory and method of use, Swaroop Ghosh, Anirudh Iyengar, and Kenneth Ramclam (US20170062072).

• Non-Volatile Flip-Flop with Enhanced-Scan Capability to Sustain Sudden Power Failure, Swaroop Ghosh and Anirudh Iyengar (US20160322093A1).

• Threshold Voltage Defined Switches for Programmable Camouflaged Gates, Anirudh Iyengar, Swaroop Ghosh, Deepakreddy Vontela, and Ithihasa Reddy Nirmala (filed June 2016).

B.4. Refereed Book Chapters

• Anirudh Iyengar & Swaroop Ghosh, “Hardware Trojans and Piracy of PCBs”, Springer International.

Bibliography

[1] Vetter, Jeffrey S., and Sparsh Mittal. "Opportunities for nonvolatile memory systems in

extreme-scale high-performance computing." Computing in Science & Engineering 17.2

(2015): 73-82.

[2] https://nanohub.org/courses/ss2014/01a/outline/unit8anandraghunathanmemorysystems/l82cachebasics#

[3] Hosomi, M., H. Yamagishi, T. Yamamoto, K. Bessho, Y. Higo, K. Yamane, H. Yamada et

al. "A novel nonvolatile memory with spin torque transfer magnetization switching: Spin-

RAM." In Electron Devices Meeting, 2005. IEDM Technical Digest. IEEE International, pp.

459-462. IEEE, 2005.

[4] Parkin, Stuart SP, Masamitsu Hayashi, and Luc Thomas. "Magnetic domain-wall racetrack

memory." Science 320, no. 5873 (2008): 190-194.

[5] Harshfield, Steven T., and David Q. Wright. "PCRAM and method of making

same." U.S. Patent 7,102,150, issued September 5, 2006.

[6] Choi, Ja Moon. "Ferroelectric RAM device." U.S. Patent 6,044,008, issued March 28, 2000.

[7] Govoreanu, B., G. S. Kar, Y. Y. Chen, V. Paraschiv, S. Kubicek, A. Fantini, I. P. Radu et al.

"10×10 nm² Hf/HfOx crossbar resistive RAM with excellent performance, reliability and

low-energy operation." In Electron Devices Meeting (IEDM), 2011 IEEE International, pp.

31-6. IEEE, 2011.

[8] Sun, Guangyu, Jishen Zhao, Matt Poremba, Cong Xu, and Yuan Xie. "Memory that Never

Forgets: Emerging Non-volatile Memory and the Implication for Architecture

Design." National Science Review (2017).

[9] Jin, Yier. "Introduction to hardware security." Electronics 4, no. 4 (2015): 763-784.

[10] Rad, R.M.; Wang, X.; Tehranipoor, M.; Plusquellic, J. Power Supply Signal Calibration

Techniques for Improving Detection Resolution to Hardware Trojans. In Proceedings of the

IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, USA, 10–

13 November 2008; pp. 632–639.

[11] Hicks, Matthew, Murph Finnicum, Samuel T. King, Milo MK Martin, and Jonathan M.

Smith. "Overcoming an untrusted computing base: Detecting and removing malicious

hardware automatically." In Security and Privacy (SP), 2010 IEEE Symposium on, pp. 159-

172. IEEE, 2010.

[12] Huai, Yiming, and Paul P. Nguyen. "Magnetic element utilizing spin transfer and an MRAM

device using the magnetic element." U.S. Patent 6,714,444, issued March 30, 2004.

[13] Roy, Kaushik, Deliang Fan, Xuanyao Fong, Yusung Kim, Mrigank Sharad, Somnath Paul,

Subho Chatterjee, Swarup Bhunia, and Saibal Mukhopadhyay. "Exploring spin transfer

torque devices for unconventional computing." IEEE Journal on Emerging and Selected

Topics in Circuits and Systems 5, no. 1 (2015): 5-16.

[14] Ghosh, Swaroop, Anirudh Iyengar, Seyedhamidreza Motaman, Rekha Govindaraj, Jae-Won

Jang, Jinil Chung, Jongsun Park, Xin Li, Rajiv Joshi, and Dinesh Somasekhar. "Overview of

Circuits, Systems, and Applications of Spintronics." IEEE Journal on Emerging and Selected

Topics in Circuits and Systems 6, no. 3 (2016): 265-278.

[15] Srinivasan, Srikant. "All spin logic: Modeling multi-magnet networks interacting via spin

currents." PhD diss., Purdue University, 2012.

[16] Raychowdhury, Arijit, Dinesh Somasekhar, Tanay Karnik, and Vivek De. "Design space and

scalability exploration of 1T-1STT MTJ memory arrays in the presence of variability and

disturbances." In Electron Devices Meeting (IEDM), 2009 IEEE International, pp. 1-4. IEEE,

2009.

[17] Sun, Zhenyu, Xiuyuan Bi, Hai Li, Weng-Fai Wong, and Xiaochun Zhu. "STT-RAM Cache

Hierarchy With Multiretention MTJ Designs." Very Large Scale Integration (VLSI) Systems,

IEEE Transactions on 22, no. 6 (2014): 1281-1293.

[18] Sun, J. Z. "Spin-current interaction with a monodomain magnetic body: A model study."

Physical Review B 62, no. 1 (2000): 570.

[19] Brown Jr, William Fuller. "Thermal fluctuations of a single-domain particle." Journal of

Applied Physics 34.4 (1963): 1319-1320.

[20] Boulle, O., G. Malinowski, and Mathias Kläui. "Current-induced domain wall motion in

nanoscale ferromagnetic elements." Materials Science and Engineering: R: Reports 72, no.

9 (2011): 159-187.

[21] Malozemoff, A. P., and J. C. Slonczewski. Magnetic Domain Walls in Bubble Materials:

Advances in Materials and Device Research. Vol. 1. Academic press, 2016.

[22] Slonczewski, J. C., and S. Middelhoek. "ENERGY OF WALLS IN THIN MAGNETIC

DOUBLE PERMALLOY (Ni–Fe) FILMS." Applied Physics Letters 6, no. 7 (1965): 139-

140.

[23] M. Hayashi, "Current driven dynamics of magnetic domain walls in permalloy nanowires."

PhD diss., Stanford University, 2006.

[24] Zhang, Jianwei, Peter M. Levy, Shufeng Zhang, and Vladimir Antropov. "Identification of

transverse spin currents in noncollinear magnetic structures." Physical review letters 93, no.

25 (2004): 256602.

[25] Thiaville, André, and Yoshinobu Nakatani. "Domain-wall dynamics in nanowires and

nanostrips." In Spin dynamics in confined magnetic structures III, pp. 161-205. Springer,

Berlin, Heidelberg, 2006.

[26] Hassel, C., S. Stienen, F. M. Römer, R. Meckenstock, G. Dumpich, and J. Lindner.

"Resistance of domain walls created by means of a magnetic force microscope in

transversally magnetized epitaxial Fe wires." Applied Physics Letters 95, no. 3 (2009):

032504.

[27] Hayashi, Masamitsu, Luc Thomas, Charles Rettner, Rai Moriya, and Stuart SP Parkin.

"Dynamics of domain wall depinning driven by a combination of direct and pulsed

currents." Applied Physics Letters 92, no. 16 (2008): 162503.

[28] Thomas, Luc, Masamitsu Hayashi, Xin Jiang, Rai Moriya, Charles Rettner, and Stuart SP

Parkin. "Oscillatory dependence of current-driven magnetic domain wall motion on current

pulse length." Nature 443, no. 7108 (2006): 197.

[29] Suzuki, T., S. Fukami, N. Ohshima, K. Nagahara, and N. Ishiwata. "Analysis of current-

driven domain wall motion from pinning sites in nanostrips with perpendicular magnetic

anisotropy." Journal of Applied Physics 103, no. 11 (2008): 113913.

[30] Duine, R. A., A. S. Núñez, and A. H. MacDonald. "Thermally assisted current-driven

domain-wall motion." Physical review letters 98, no. 5 (2007): 056605.

[31] Hermann, Donfack Gildas, and Jean-Pierre Nguenang. "Chaos Appearance during domain

wall motion under electronic transfer in nanomagnets." World Journal of Condensed Matter

Physics 3, no. 03 (2013): 136.

[32] Okuno, H. "Chaos and energy loss of nonlinear domain wall motion." Journal of applied

physics 81, no. 8 (1997): 5233-5235.

[33] Alekseev, K. N., G. P. Berman, V. I. Tsifrinovich, and A. M. Frishman. "Dynamical chaos

in magnetic systems." Physics-Uspekhi 35, no. 7 (1992): 572-590.

[34] Ott, Edward. Chaos in dynamical systems. Cambridge university press, 2002.

[35] Jamali, Mahdi, Kyung-Jin Lee, and Hyunsoo Yang. "Metastable magnetic domain wall

dynamics." New Journal of Physics 14, no. 3 (2012): 033010.

[36] Yamamoto, S. I., & Sugahara, S. (2010). Nonvolatile delay flip-flop based on spin-transistor

architecture and its power-gating applications. Japanese Journal of Applied Physics, 49(9R),

090204.

[37] Kwon, K. W., Choday, S. H., Kim, Y., Fong, X., Park, S. P., & Roy, K. (2014). SHE-NVFF:

spin Hall effect-based nonvolatile flip-flop for power gating architecture.

[38] Zhao, W., Belhaire, E., & Chappert, C. (2007, August). Spin-mtj based non-volatile flip-flop.

In Nanotechnology, 2007. IEEE-NANO 2007. 7th IEEE Conference on (pp. 399-402). IEEE.

[39] Sakimura, N., Sugibayashi, T., Nebashi, R., & Kasai, N. (2009). Nonvolatile magnetic flip-

flop for standby-power-free SoCs. Solid-State Circuits, IEEE Journal of, 44(8), 2244-2250.

[40] Chabi, D., Zhao, W., Deng, E., Zhang, Y., Ben Romdhane, N., Klein, J. O., & Chappert, C.

(2014). Ultra low power magnetic flip-flop based on checkpointing/power gating and self-

enable mechanisms. Circuits and Systems I: Regular Papers, IEEE Transactions on, 61(6),

1755-1765.

[41] Goel, A., Bhunia, S., Mahmoodi, H., & Roy, K. (2006, January). Low-overhead design of

soft-error-tolerant scan flip-flops with enhanced-scan capability. In Design Automation,

2006. Asia and South Pacific Conference on (pp. 6-pp). IEEE.

[42] Predictive technology model, http://ptm.asu.edu/

[43] Zhang, Y., Wang, X., Li, Y., Jones, A. K., & Chen, Y. (2012, March). Asymmetry of MTJ

switching and its implication to STT-RAM designs. In Proceedings of the Conference on

Design, Automation and Test in Europe (pp. 1313-1318). EDA Consortium.

[44] Khan, A. A., Schmalhorst, J., Thomas, A., Schebaum, O., & Reiss, G. (2008). Dielectric

breakdown in Co–Fe–B/MgO/Co–Fe–B magnetic tunnel junction. Journal of Applied

Physics, 103(12), 123705.

[45] Robertson, J. "High dielectric constant oxides." The European Physical Journal-Applied

Physics 28, no. 3 (2004): 265-291.

[46] Jang, Jae-Won, Jongsun Park, Swaroop Ghosh, and Swarup Bhunia. "Self-correcting

STTRAM under magnetic field attacks." In Proceedings of the 52nd Annual Design

Automation Conference, p. 77. ACM, 2015.

[47] Bi, Xiuyuan, Hai Li, and Jae-Joon Kim. "Analysis and optimization of thermal effect on STT-

RAM Based 3-D stacked cache design." In VLSI (ISVLSI), 2012 IEEE Computer Society

Annual Symposium on, pp. 374-379. IEEE, 2012.

[48] Halderman, J. Alex, Seth D. Schoen, Nadia Heninger, William Clarkson, William Paul,

Joseph A. Calandrino, Ariel J. Feldman, Jacob Appelbaum, and Edward W. Felten. "Lest we

remember: cold-boot attacks on encryption keys." Communications of the ACM 52, no. 5

(2009): 91-98.

[49] Rathi, Nitin, Swaroop Ghosh, Anirudh Iyengar, and Helia Naeimi. "Data privacy in non-

volatile cache: Challenges, attack models and solutions." In Design Automation Conference

(ASP-DAC), 2016 21st Asia and South Pacific, pp. 348-353. IEEE, 2016.

[50] Jameco Electronics, “PC-Multiscope (part# 142834),” p.103, 1999.

[51] Smullen, Clinton W., Vidyabhushan Mohan, Anurag Nigam, Sudhanva Gurumurthi, and

Mircea R. Stan. "Relaxing non-volatility for fast and energy-efficient STT-RAM caches."

In High Performance Computer Architecture (HPCA), 2011 IEEE 17th International

Symposium on, pp. 50-61. IEEE, 2011.

[52] Motaman, Seyedhamidreza, Swaroop Ghosh, and Nitin Rathi. "Impact of process-variations

in STTRAM and adaptive boosting for robustness." In Proceedings of the 2015 Design,

Automation & Test in Europe Conference & Exhibition, pp. 1431-1436. EDA Consortium,

2015.

[53] Kim, Jisu, Kyungho Ryu, Seung H. Kang, and Seong-Ook Jung. "A novel sensing circuit for

deep submicron spin transfer torque MRAM (STT-MRAM)." IEEE Transactions on very

large scale integration (VLSI) systems 20, no. 1 (2012): 181-186.

[54] Swaminathan, Karthik, Raghav Pisolkar, Cong Xu, and Vijaykrishnan Narayanan. "When to

forget: A system-level perspective on STT-RAMs." In Design Automation Conference (ASP-

DAC), 2012 17th Asia and South Pacific, pp. 311-316. IEEE, 2012.

[55] Halupka, David. "Effects of silicon variation on nano-scale solid-state memories." PhD diss.,

2011.

[56] Bernstein, Daniel J. "Cache-timing attacks on AES." (2005): 3.

[57] Abramovici, Miron, and Paul Bradley. "Integrated circuit security: new threats and

solutions." In Proceedings of the 5th Annual Workshop on Cyber Security and Information

Intelligence Research: Cyber Security and Information Intelligence Challenges and

Strategies, p. 55. ACM, 2009.

[58] Rostami, Masoud, Farinaz Koushanfar, Jeyavijayan Rajendran, and Ramesh Karri.

"Hardware security: Threat models and metrics." In Proceedings of the International

Conference on Computer-Aided Design, pp. 819-823. IEEE Press, 2013.

[59] Žutić, Igor, Jaroslav Fabian, and S. Das Sarma. "Spintronics: Fundamentals and

applications." Reviews of modern physics 76, no. 2 (2004): 323.

[60] Bandyopadhyay, Supriyo, and Marc Cahay. Introduction to spintronics. CRC press, 2008.

[61] Wolf, S. A., D. D. Awschalom, R. A. Buhrman, J. M. Daughton, S. Von Molnar, M. L.

Roukes, A. Yu Chtchelkanova, and D. M. Treger. "Spintronics: a spin-based electronics

vision for the future." Science 294, no. 5546 (2001): 1488-1495.

[62] Nikonov, Dmitri, George Bourianoff, and Paolo Gargini. "Taxonomy of spintronics (a zoo

of devices)." (2006).

[63] Driskill-Smith, Alexander. "Latest advances and future prospects of STT-RAM." In Non-

Volatile Memories Workshop, pp. 11-13. 2010.

[64] Berger, L. "Exchange interaction between ferromagnetic domain wall and electric current in

very thin metallic films." Journal of Applied Physics 55, no. 6 (1984): 1954-1956.

[65] Freitas, P. P., and Luc Berger. "Observation of s‐d exchange force between domain walls and

electric current in very thin Permalloy films." Journal of Applied Physics 57, no. 4 (1985):

1266-1269.

[66] Wang, Shan X., and Alex M. Taratorin. Magnetic Information Storage Technology: A

Volume in the Electromagnetism Series. Elsevier, 1999.

[67] Rose, Garrett S., Dhireesha Kudithipudi, Ganesh Khedkar, Nathan McDonald, Bryant

Wysocki, and Lok-Kwong Yan. "Nanoelectronics and hardware security." In Network

Science and Cybersecurity, pp. 105-123. Springer New York, 2014.

[68] Rajendran, Jeyavijayan, Ramesh Karri, James Bradley Wendt, Miodrag Potkonjak, Nathan

R. McDonald, Garrett S. Rose, and Bryant T. Wysocki. "Nanoelectronic Solutions for

Hardware Security." IACR Cryptology ePrint Archive 2012 (2012): 575.

[69] Tanamoto, Tetsufumi, Naoharu Shimomura, Sumio Ikegawa, Mari Matsumoto, Shinobu

Fujita, and Hiroaki Yoda. "High-speed magnetoresistive random-access memory random

number generator using error-correcting code." Japanese Journal of Applied Physics 50, no.

4S (2011): 04DM01.

[70] Annunziata, A. J., M. C. Gaidis, L. Thomas, C. W. Chien, C. C. Hung, P. Chevalier, E. J.

O'Sullivan et al. "Racetrack memory cell array with integrated magnetic tunnel junction

readout." In Electron Devices Meeting (IEDM), 2011 IEEE International, pp. 24-3. IEEE,

2011.

[71] Nebashi, R., N. Sakimura, Y. Tsuji, S. Fukami, H. Honjo, S. Saito, S. Miura et al. "A content

addressable memory using magnetic domain wall motion cells." In VLSI Circuits (VLSIC),

2011 Symposium on, pp. 300-301. IEEE, 2011.

[72] Yang, See-Hun, Kwang-Su Ryu, and Stuart Parkin. "Domain-wall velocities of up to 750

m s⁻¹ driven by exchange-coupling torque in synthetic antiferromagnets." Nature

nanotechnology 10, no. 3 (2015): 221.

[73] SGMI Research Themes & Subjects. Online:

http://www.samsung.com/global/business/semiconductor/-html/news-events/file/SGMI_Request_for_Proposal.pdf

[74] R. Pappu (2001), "Physical one-way functions," PhD thesis, Massachusetts Institute of

Technology.

[75] Tuyls, Pim, Geert-Jan Schrijen, Boris Škorić, Jan Van Geloven, Nynke Verhaegh, and Rob

Wolters. "Read-proof hardware from protective coatings." In International Workshop on

Cryptographic Hardware and Embedded Systems, pp. 369-383. Springer, Berlin, Heidelberg,

2006.

[76] Maiti, Abhranil, Jeff Casarona, Luke McHale, and Patrick Schaumont. "A large scale

characterization of RO-PUF." In Hardware-Oriented Security and Trust (HOST), 2010 IEEE

International Symposium on, pp. 94-99. IEEE, 2010.

[77] Holcomb, Daniel E., Wayne P. Burleson, and Kevin Fu. "Power-up SRAM state as an

identifying fingerprint and source of true random numbers." IEEE Transactions on

Computers 58, no. 9 (2009): 1198-1210.

[78] Wang, Yinglei, Wing-kei Yu, Shuo Wu, Greg Malysa, G. Edward Suh, and Edwin C. Kan.

"Flash memory for ubiquitous hardware security functions: True random number generation

and device fingerprints." In Security and Privacy (SP), 2012 IEEE Symposium on, pp. 33-47.

IEEE, 2012.

[79] Zheng, Yu, Aswin Raghav Krishna, and Swarup Bhunia. "ScanPUF: Robust ultralow-

overhead PUF using scan chain." In Design Automation Conference (ASP-DAC), 2013 18th

Asia and South Pacific, pp. 626-631. IEEE, 2013.

[80] Motaman, Seyedhamidreza, Anirudh Iyengar, and Swaroop Ghosh. "Synergistic circuit and

system design for energy-efficient and robust domain wall caches." In Proceedings of the

2014 international symposium on Low power electronics and design, pp. 195-200. ACM,

2014.

[81] Öztürk, Erdinç, Ghaith Hammouri, and Berk Sunar. "Towards robust low cost authentication

for pervasive devices." In Pervasive Computing and Communications, 2008. PerCom 2008.

Sixth Annual IEEE International Conference on, pp. 170-178. IEEE, 2008.

[82] Hall, Mark, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian

H. Witten. "The WEKA data mining software: an update." ACM SIGKDD explorations

newsletter 11, no. 1 (2009): 10-18.

[83] Semiconductor Industry Association. "Winning the battle against counterfeit semiconductor

products." Washington, DC: SIA. http://www.semiconductors.org/clientuploads/Anti-Counterfeiting/SIA%20Anti-Counterfeiting%20Whitepaper.pdf (2013).

[84] Guin, Ujjwal, Ke Huang, Daniel DiMase, John M. Carulli, Mohammad Tehranipoor, and

Yiorgos Makris. "Counterfeit integrated circuits: A rising threat in the global semiconductor

supply chain." Proceedings of the IEEE 102, no. 8 (2014): 1207-1228.

[85] Dignan, L. "Counterfeit chips: A $169 billion tech supply chain headache." (2012).

[86] Senate Armed Services Committee. "Inquiry into counterfeit electronic parts in the

department of defense supply chain." Washington, DC (2012).

[87] Charbon, Edoardo. "Hierarchical watermarking in IC design." In Custom Integrated Circuits

Conference, 1998. Proceedings of the IEEE 1998, pp. 295-298. IEEE, 1998.

[88] Nirmala, Ithihasa Reddy, Deepak Vontela, Swaroop Ghosh, and Anirudh Iyengar. "A novel

threshold voltage defined switch for circuit camouflaging." In Test Symposium (ETS), 2016

21th IEEE European, pp. 1-2. IEEE, 2016.

[89] De, Asmit, and Swaroop Ghosh. "Preventing Reverse Engineering using threshold voltage

defined multi-input camouflaged gates." In Technologies for Homeland Security (HST), 2017

IEEE International Symposium on, pp. 1-6. IEEE, 2017.

[90] Rajendran, Jeyavijayan, Michael Sam, Ozgur Sinanoglu, and Ramesh Karri. "Security

analysis of integrated circuit camouflaging." In Proceedings of the 2013 ACM SIGSAC

conference on Computer & communications security, pp. 709-720. ACM, 2013.

[91] Cocchi, Ronald P., Lap Wai Chow, James P. Baukus, and Bryan J. Wang. "Method and

apparatus for camouflaging a standard cell based integrated circuit with micro circuits and

post processing." U.S. Patent 8,510,700, issued August 13, 2013.

[92] Baukus, James P., Lap Wai Chow, and William M. Clark Jr. "Digital circuit with transistor

geometry and channel stops providing camouflage against reverse engineering." U.S. Patent

5,930,663, issued July 27, 1999.

[93] Imeson, Frank, Ariq Emtenan, Siddharth Garg, and Mahesh V. Tripunitara. "Securing

Computer Hardware Using 3D Integrated Circuit (IC) Technology and Split Manufacturing

for Obfuscation." In USENIX Security Symposium, pp. 495-510. 2013.

[94] Zhang, Jiliang. "A practical logic obfuscation technique for hardware security." IEEE

Transactions on Very Large Scale Integration (VLSI) Systems 24, no. 3 (2016): 1193-1197.

[95] Cocchi, Ronald P., James P. Baukus, Bryan J. Wang, Lap Wai Chow, and Paul Ouyang.

"Building block for a secure CMOS logic cell library." U.S. Patent 8,111,089, issued

February 7, 2012.

[96] Khan, Asif I., Chun W. Yeung, Chenming Hu, and Sayeef Salahuddin. "Ferroelectric

negative capacitance MOSFET: Capacitance tuning & antiferroelectric operation."

In Electron Devices Meeting (IEDM), 2011 IEEE International, pp. 11-3. IEEE, 2011.

[97] Wang, Danni, Sumitha George, Ahmedullah Aziz, Suman Datta, Vijaykrishnan Narayanan,

and Sumeet K. Gupta. "Ferroelectric transistor based non-volatile flip-flop." In Proceedings

of the 2016 International Symposium on Low Power Electronics and Design, pp. 10-15.

ACM, 2016.

[98] Aziz, Ahmedullah, Swapnadip Ghosh, Suman Datta, and Sumeet Kumar Gupta. "Physics-

based circuit-compatible SPICE model for ferroelectric transistors." IEEE Electron Device

Letters 37, no. 6 (2016): 805-808.

[99] Degans H., "Breakthrough in CMOS-compatible ferroelectric memory."

https://phys.org/news/2017-06-breakthrough-cmos-compatible-ferroelectric-memory.html

[100] Sengupta, Abhronil, Yong Shim, and Kaushik Roy. "Proposal for an all-spin artificial neural

network: Emulating neural and synaptic functionalities through domain wall motion in

ferromagnets." IEEE transactions on biomedical circuits and systems 10, no. 6 (2016): 1152-

1160.

[101] Slonczewski, J. C. "Theory of domain‐wall motion in magnetic films and platelets." Journal

of Applied Physics 44, no. 4 (1973): 1759-1770.

[102] Kerry Bernstein “Supply Chain Hardware Integrity for Electronics Defense Shield” DARPA.

[103] Free icons by FLATICON: https://www.flaticon.com/

[104] C. Yu, X. Zhang, D. Liu, M. Ciesielski, and D. Holcomb, “Incremental SAT-based Reverse

Engineering of Camouflaged Logic Circuits,” IEEE Transactions on Computer-Aided

Design of Integrated Circuits and Systems, 2017.

[105] Li, Meng, Kaveh Shamsi, Travis Meade, Zheng Zhao, Bei Yu, Yier Jin, and David Z. Pan.

"Provably secure camouflaging strategy for ic protection." IEEE Transactions on Computer-

Aided Design of Integrated Circuits and Systems (2017).

[106] Yasin, Muhammad, Bodhisatwa Mazumdar, Ozgur Sinanoglu, and Jeyavijayan Rajendran.

"Camoperturb: secure ic camouflaging for minterm protection." In Proceedings of the 35th

International Conference on Computer-Aided Design, p. 29. ACM, 2016.

[107] Dejka, William J. "Measure of testability in device and system design." In Proc. 20th

Midwest Symp. Circuits Syst, pp. 39-52. 1977.

[108] Goldstein, L. "Controllability/observability analysis of digital circuits." IEEE Transactions

on Circuits and Systems 26, no. 9 (1979): 685-693.

[109] Goldstein, Lawrence H., and Evelyn L. Thigpen. "SCOAP: Sandia

controllability/observability analysis program." In Proceedings of the 17th Design

Automation Conference, pp. 190-196. ACM, 1980.

[110] Chakraborty, Rajat Subhra, and Swarup Bhunia. "Security against hardware Trojan through

a novel application of design obfuscation." In Computer-Aided Design-Digest of Technical

Papers, 2009. ICCAD 2009. IEEE/ACM International Conference on, pp. 113-116. IEEE,

2009.

[111] Jang, Jae-Won, and Swaroop Ghosh. "A Novel Interconnect Camouflaging Technique using

Transistor Threshold Voltage." arXiv preprint arXiv:1705.02707 (2017).

[112] Collantes, Maria I. Mera, Mohamed El Massad, and Siddharth Garg. "Threshold-dependent

camouflaged cells to secure circuits against reverse engineering attacks." In VLSI (ISVLSI),

2016 IEEE Computer Society Annual Symposium on, pp. 443-448. IEEE, 2016.

[113] Bi, Yu, Kaveh Shamsi, Jiann-Shiun Yuan, Pierre-Emmanuel Gaillardon, Giovanni De

Micheli, Xunzhao Yin, X. Sharon Hu, Michael Niemier, and Yier Jin. "Emerging technology-

based design of primitives for hardware security." ACM Journal on Emerging Technologies

in Computing Systems (JETC) 13, no. 1 (2016): 3.

[114] El Massad, Mohamed, Siddharth Garg, and Mahesh V. Tripunitara. "Integrated Circuit (IC)

Decamouflaging: Reverse Engineering Camouflaged ICs within Minutes." In NDSS. 2015.

[115] Wang, Danni, Sumitha George, Ahmedullah Aziz, Suman Datta, Vijaykrishnan Narayanan,

and Sumeet K. Gupta. "Ferroelectric transistor based non-volatile flip-flop." In Proceedings

of the 2016 International Symposium on Low Power Electronics and Design, pp. 10-15.

ACM, 2016.

[116] Kazi, Ibrahim, Pascal Meinerzhagen, Pierre-Emmanuel Gaillardon, Davide Sacchetto, Yusuf

Leblebici, Andreas Burg, and Giovanni De Micheli. "Energy/reliability trade-offs in low-

voltage ReRAM-based non-volatile flip-flop design." IEEE Transactions on Circuits and

Systems I: Regular Papers 61, no. 11 (2014): 3155-3164.

Vita

Anirudh Srikant Iyengar

Anirudh Srikant Iyengar received his Bachelor’s degree in Instrumentation and Control Engineering in 2010 from Manipal Institute of Technology, Manipal, India, and his Master’s degree in Electrical Engineering in 2013 from the University of South Florida (USF), Tampa, USA. He is currently pursuing his Ph.D. degree in Computer Science and Engineering at the Pennsylvania State University after transferring from USF in 2016.

Anirudh’s primary research interests include device, circuit, and architectural techniques and applications of spintronic devices for energy efficiency and enhanced device security.

During his Ph.D. tenure, his research has also spanned topics such as spintronic memory retention testing and reliability analysis, synergistic systems based on domain wall memory, and authentication and protection of printed circuit boards.

His research work has culminated in several peer-reviewed journal and conference publications as well as best poster awards. Additionally, he holds three patents for his work on domain wall memory-based physically unclonable functions, magnetic tunnel junction-based non-volatile flip-flop and threshold voltage defined switch-based camouflaged logic.

During the course of his Ph.D., he also had the opportunity to intern twice at the Security Center of Excellence at Intel, during the summers of 2016 and 2017. He supported the team in understanding the security issues that plague emerging memories, namely Intel’s new 3D XPoint memory as well as spintronic memory. As part of service to the community, he has served as a technical reviewer for journals and conferences including IEEE TCAS-I, IEEE JETCAS, IEEE TVLSI, HOST and Asian-HOST.