HARDWARE TROJAN ATTACKS: THREAT ANALYSIS AND LOW-COST

COUNTERMEASURES THROUGH GOLDEN-FREE DETECTION AND

SECURE DESIGN

by

XINMU WANG

Submitted in partial fulfillment of the requirements

For the degree of Doctor of Philosophy

Dissertation Adviser: Dr. Swarup Bhunia

Department of Electrical Engineering and Computer Science

CASE WESTERN RESERVE UNIVERSITY

January, 2014

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of

Xinmu Wang

candidate for the Doctor of Philosophy degree*.

Swarup Bhunia (signed) (chair of the committee)

Christos Papachristou

Francis Merat

Andy Podgurski

(date) September 04, 2013

*We also certify that written approval has been obtained for any proprietary material contained therein.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 Introduction
2 Hardware Trojan Design
  2.1 Introduction
  2.2 Background
  2.3 Effective Hardware Trojan Design Techniques
    2.3.1 Sequential Hardware Trojan
    2.3.2 Side-Channel Aware Trojan Placement in Gate-Level Circuit Netlist
  2.4 Summary
3 Hardware Trojan Attack in Embedded Memory
  3.1 Introduction
  3.2 Background
    3.2.1 SRAM Fault Models
    3.2.2 SRAM Testing Algorithms
  3.3 Trojan Attacks in SRAM Array
    3.3.1 Trojan Trigger Mechanism
    3.3.2 Trojan Type 1: Resistive Short/Bridge
    3.3.3 Trojan Type 2: Resistive Open
    3.3.4 Feasibility Verification
  3.4 Simulation Results
    3.4.1 Trojan Type 1: Short
    3.4.2 Trojan Type 1: Bridge
    3.4.3 Trojan Type 2: Open
  3.5 Discussion
  3.6 Summary
4 Temporal Self-Referencing (TeSR) for Sequential Trojans Detection
  4.1 Introduction
  4.2 Background and Scope
    4.2.1 Related Work
    4.2.2 Scope of the proposed Trojan detection approach
  4.3 Motivational Examples
  4.4 TeSR Methodology
    4.4.1 Test Generation
    4.4.2 Circuit Characterization
    4.4.3 Trojan Detection Sensitivity
    4.4.4 Role of Scan Chain
    4.4.5 DfS for Detecting Transition-Proof Trojans
    4.4.6 Summary of Test Considerations
  4.5 Results
    4.5.1 Test Setup
    4.5.2 Simulation Results
    4.5.3 Experimental Validation
  4.6 Summary
5 Side-Channel Analysis based Reverse Engineering (SCARE) for Post-Silicon Validation
  5.1 Introduction
  5.2 Background
  5.3 Methodology
  5.4 Case Study: DLX Processor
  5.5 Summary
6 Design for SoC Security
  6.1 Introduction
  6.2 Background of IIP and Embedded Core Test
    6.2.1 Infrastructure IP
    6.2.2 IEEE 1500 Standard
  6.3 Overview of IIPS
  6.4 Design of IIPS Security Functions
    6.4.1 Attack Models and Mitigation Strategies
    6.4.2 Security Primitive Design
  6.5 Test Protocol under IEEE Std. 1500
    6.5.1 Wrapper Operation Modes
    6.5.2 SoC-Level IIPS Test Protocol
  6.6 Results
    6.6.1 IIPS Functional Validation
    6.6.2 SoC Authentication and Hardware Trojan Detection
    6.6.3 Hardware Overhead
    6.6.4 Experimental Validation
  6.7 Discussion
    6.7.1 Flexibility
    6.7.2 Scalability
    6.7.3 Configurability
  6.8 Summary
7 Conclusion and Future Work
REFERENCES

LIST OF TABLES

2.1 Area/Power Overhead of Sequential Trojans of Same Functionality but Varying Implementations
2.2 Hardware Overhead Incurred by the Trojans
2.3 Measured RO Frequency Changes for Different Trojans
2.4 Impact of Different Trojan Configurations (as shown in Fig. 2.7) on RO Frequency, 70nm PTM @1V, 25°C
3.1 Implemented Trojans of Type 1.
3.2 Implemented Trojans of Type 2.
3.3 Impact of Trojan Ts QB Vss x (x∈{2,3,4}) on a 32x64 SRAM array.
3.4 Impact of two other type-1 (Short) Trojans on a 32x64 SRAM array.
3.5 Payload of Trojan Tb QB QB x (x∈{2,3,4}).
3.6 Impact of untriggered Trojan Tb QB QB x (x∈{2,3,4}) on a 32x64 SRAM array.
3.7 Impact of untriggered Trojan To QB Vdd on a 32x64 SRAM array.
4.1 Difference metric and Test Length for three designs with three types of Trojan instances.
6.1 Control values for wrapper boundary cell.
6.2 Hardware overhead of IIPS w.r.t. two example SoCs.
6.3 IIPS overhead in FPGA platform.
6.4 Hardware Trojan detection results on FPGA.

LIST OF FIGURES

2.1 Sequential Trojan model and Trojan state diagram.
2.2 Four sequential Trojan design examples.
2.3 State diagram of a sequential Trojan with sequential and combinational logic sharing with original circuit.
2.4 Various trigger and payload conditions for the proposed Trojan inserted in an embedded processor.
2.5 (a) Trojan trigger mechanism; (b) state transition diagram for the sequential Trojan.
2.6 Hard macro creation flow for the FPGA platform [40].
2.7 Different payload insertion approaches: (a) stitching an extra gate (XOR) inside a delay path; (b) replacing an existing gate (e.g. NOT by XOR) and resizing; (c) stitching a gate outside built-in RO path; (d) inserting a NMOS pull-down transistor as payload; and (e) inserting the payload inside a master-slave FF.
3.1 Common data backgrounds used in SRAM testing.
3.2 Hardware Trojan attack in SRAM array: (a) a general model; (b) effective defect types.
3.3 Data patterns that can be leveraged by Trojan trigger mechanisms.
3.4 Trojans causing v-cell node shorted to Vss: (a) triggered by 2-cell data pattern/& a word line; (b) triggered by 3-cell data pattern/& a word line.
3.5 Trojans causing v-cell pull-up path broken: (a) controlled by one node; (b) controlled by two nodes.
3.6 Implemented Trojans of type 2.
3.7 Layout of Trojans causing short defects in a compact SRAM array: (a) Ts BL Vss 2 WL; (b) Ts QB Vss 2 WL.
3.8 Layout of Trojans causing bridge defects in a compact SRAM array: (a) Tb BLB BL 2 WL; (b) Tb QB QB 3 WL.
3.9 Layout of Trojans causing open defects in a compact SRAM array: (a) To Q(B) Vdd WL; (b) To Q(B) Vdd WL QB.
3.10 Hold-SNM of the v-cell while Trojan is on.
3.11 Read-SNM of the v-cell while Trojan is on.
3.12 Trojans cause read-destructive fault (RDF) in the v-cell during read-0 operation.
3.13 Trojans cause shifted write-0 trip point and degraded logic-1 voltage at QB.
3.14 Trojans cause write-0 failure (RDF) in the v-cell.
3.15 Trojan Tb QB QB 2 causes coupling faults in the v-cells: (a) with 2ns clock period; (b) with 3ns clock period.
3.16 Trojan To Q VDD 2 causes (a) data retention fault; (b) read-after-write dynamic faults.
3.17 Type-2 Trojans with WL as part of the trigger condition cause temporary negative SNM in the v-cell.
4.1 (a) Sequential Trojan model and examples: (b) Synchronous Counter, (c) Rarely-triggered Finite State Machine (FSM), (d) MOLES Trojan [3].
4.2 Comparison of challenges and scope of different Trojan detection approaches.
4.3 (a) Circuit-level parameter variations can be due to inter-die or intra-die variations in device parameters. (b) The effect of process variations on the average transient current can mask the effect of a Trojan circuit.
4.4 Effectiveness of temporal self-referencing in detecting Trojans even amidst process variations.
4.5 Basic concept of TeSR.
4.6 The major steps of the TeSR for Trojan detection.
4.7 Test application strategy considering the state transition diagrams for (a) full-scan and (b) no-scan designs. The example test signature consists of the average current for vectors I1, I2 and I3 applied when the circuit is in state S10. Different paths are used to arrive at state S10 to get same current signature for the golden circuit but different signatures if a Trojan is present and shows some activity for the particular test set under consideration.
4.8 STG of transition-proof Trojan.
4.9 (a) Flip-flops in original circuit FSM; (b) DfS-enhanced flip-flops.
4.10 IEU with a 8-bit counter.
4.11 AES with a MOLES Trojan LFSR (Linear Feedback Shift Register).
4.12 DLX with a FSM Trojan.
4.13 Difference Metric for varying size of a sequential Trojan inserted in 32-bit IEU circuit, using TeSR and other process-calibration approaches.
4.14 Experimental setup using FPGA-based board and measured current waveforms for validating the TeSR approach.
4.15 Measurement results for DLX with 8-bit counter Trojan.
5.1 Untrusted stages of the IC manufacturing flow. Steps of the proposed methodology to perform non-invasive RE and trust validation.
5.2 Spatial self-referencing for identifying hierarchical functional blocks.
5.3 Main steps of the proposed approach for IC reverse engineering.
5.4 Current signatures of RCA and CSA adders for 45nm and 65nm nodes used for self-referencing based reverse engineering.
5.5 Self-referencing current signatures of 8-bit Array Multiplier.
5.6 Self-referencing current signatures of an 8-bit Wallace Tree Multiplier and the corresponding current of Array Multiplier for comparison.
5.7 Steps of random logic structure identification.
5.8 An example of the verification unit: (a),(b) Dual implementations of function F=A&B0. (c) Here, B indirectly limits the switching caused by A.
5.9 Temporal self-referencing helps to identify the pipeline stage currents of a DLX processor.
5.10 Extraction of combinational logic current by subtracting sequential current component: (a) 3-bit binary counter shows the FF switching pattern of 1-2-1-3 which can be easily identified from the current at the positive or negative edge of CLK. (b) Extracting combinational current.
5.11 Transient current signatures corresponding to specific vectors used to identify random logic structure isolated from the MEM stage of the DLX processor, with dependence on (a) a0, a1 and a3; and (b) a2, a4 and a5.
5.12 Random logic structure of WB stage of the DLX processor.
6.1 (a) Security threats at different stages of IC development and deployment cycle; (b) Proposed infrastructure IP for security (IIPS) interfaces with constituent cores of a SoC and provides a flexible, convenient means of addressing various security concerns.
6.2 IEEE 1500 Standard: (a) core wrapper interface terminals; (b) mandatory components of the wrapper [88].
6.3 Block diagram of the IIPS module showing interconnection with other IP cores in an SoC using SoC boundary scan architecture.
6.4 State transition diagrams of the IIPS master FSM and Scan enable control FSM embedded inside it.
6.5 Functional core scan chain protection with SC EN.
6.6 ScanPUF, PUF realized in the scan chains of IPs by the IIPS module: (a) principle of signature generation [97]; (b) clock generator and timing of relevant signals.
6.7 Design of the programmable delay line [93].
6.8 (a) Concept of clock sweeping technique for hardware Trojan detection [98]; (b) Trojan detection through monitoring of delay shift by observing the latched value under clock sweep in two possible schemes.
6.9 Clock sweeping based hardware Trojan detection: (a) clock generator; (b) clock generation signal timing.
6.10 (a) Wrapper boundary registers concatenated for SoC test or IIPS functions; (b) wrapper boundary cell and its configurations.
6.11 Configuration of scan chains of different IPs during signature generation for chip authentication using scanPUF structures.
6.12 Example SoC configuration when performing Trojan detection for: (a) Trojans inside a core; (b) Trojans in SoC system bus.
6.13 IIPS timing diagram: (a) M-FSM and SE-FSM; (b) ScanPUF clock generation; (c) Trojan detection clock generation.
6.14 Inter-die HD distribution for scan-based PUF in case of 500 chips under tsig = 0.19ns.
6.15 Example combinational path used as a model of Trojan attack.
6.16 Minimum delay of Trojan detectable with clock sweeping technique implemented in IIPS for σ = 5% intra-die process variation.

Hardware Trojan Attacks: Threat Analysis and Low-Cost Countermeasures Through Golden-Free Detection and Secure Design

Abstract by XINMU WANG

Due to the multiple untrusted components in the integrated circuit (IC) life cycle, malicious modifications of ICs in design houses or foundries have emerged as a major security threat. Such modifications, popularly referred to as hardware Trojan attacks, are extremely difficult to detect during manufacturing test. The effectiveness of traditional logic testing and side-channel analysis based detection approaches is limited, respectively, by their ability to satisfy complex Trojan trigger conditions and by the masking effect of large process variations. In this thesis, we analyze hardware Trojan attacks of various forms from both an attacker’s and a defender’s perspective, with the final goal of developing effective defense mechanisms to thwart Trojan attacks and protect IC security. From an attacker’s point of view, we explore the design space of hardware Trojans by developing innovative and efficient Trojan design techniques at different stages of IC development. Hardware Trojans are designed and implemented to cause system malfunction and critical information leakage. Novel circuit-level design techniques are investigated for minimizing the Trojan side-channel fingerprint. A new class of hardware Trojans is proposed that can be mounted in Static Random-Access Memories (SRAMs) to tamper with data integrity in embedded memories (e.g. processor cache), which also validates the feasibility of mounting general hardware Trojan attacks in foundries by manipulating design layouts. As effective defense measures, we propose two robust side-channel analysis based Trojan detection approaches that do not require a golden IC instance and thus eliminate process noise. Finally, as a Design-for-Security (DfS) technique, the concept of Infrastructure IP for Security (IIPS) is proposed and implemented to provide comprehensive protection against various forms of hardware attacks. Both circuit-level simulations and experimental results are provided to demonstrate the effectiveness of the countermeasures at modest hardware overhead.

ACKNOWLEDGEMENTS

My first debt of gratitude goes to my research advisor, Dr. Swarup Bhunia, who has given me guidance and persistent help throughout my Ph.D. program. Dr. Bhunia has great enthusiasm for research and has always been able to motivate me to explore different possibilities when I faced an academic challenge. He has been a role model to me as an innovative researcher, and has given me both vision and guidance on detailed research problems. This thesis would not have been possible without his help.

I would like to thank my dissertation committee members, Dr. Chris Papachristou, Dr. Frank Merat, and Dr. Andy Podgurski. Their helpful suggestions and perspectives are valuable for improving the thesis and continuing the research in the future. I deeply appreciate their time, effort, and great patience.

Special thanks go to all my labmates in the Nanoscape lab. They have always been friendly, considerate, and generous in offering help whenever I faced difficulties. Discussions with them on research problems and their strong moral support have helped me get through the tough times in my Ph.D. study.

I also want to thank all my friends in the US and China. They are my sources of joy and moral support. It is all these friends who make my doctoral journey a glorious memory. I believe our friendships will extend well beyond our shared time.

Finally, I owe my deepest gratitude to my parents, my grandparents and all my family. Their love is the driving force for every stage of my life. This thesis is dedicated to them.

I would also like to acknowledge the National Science Foundation (NSF) for providing part of the financial support for the research (NSF CAREER grant CNS 1054744).

1. INTRODUCTION

Globalization in the integrated circuit (IC) industry decreases the control of IC vendors over the fabricated chips. Incorporation of third-party Intellectual Properties (IPs) and Computer-Aided Design (CAD) tools, as well as outsourcing of fabrication to off-shore facilities, helps to lower cost and meet aggressive time-to-market targets. However, these untrusted elements greatly reduce designers’ control over the fabricated chips, creating opportunities for attackers to secretly manipulate a design for malicious purposes, which can compromise an IC’s functional or parametric behavior. Such tampering through malicious modification of a design is referred to as a hardware Trojan. Hardware Trojans can be inserted in different phases of the IC development cycle in various forms, e.g. malicious insertion in IP cores, modification of design netlists, or tampering with GDS-II files in foundries; and can be designed to alter the intended IC functionality or leak secret information from inside an IC. This can have serious consequences during in-field operation, especially in security-critical applications such as military, communication and national infrastructure [57].

On the other hand, hardware Trojans are generally extremely difficult to detect during manufacturing test. Clever attackers can design intelligent and sophisticated trigger conditions, e.g. using rare node values or a sequence of rare events, so that the Trojan is unlikely to be triggered during normal testing with limited test time and test cases. Traditionally, two types of hardware Trojan detection techniques have been proposed in the literature: (1) logic-testing based techniques and (2) side-channel analysis based techniques [10]. Logic testing approaches rely on applying intelligently generated test vectors and verifying the circuit output against the golden output. They are, however, not effective in detecting large complex Trojans, e.g. sequential Trojans, because Trojan trigger conditions can be designed to be arbitrarily difficult to meet.
Side-channel analysis has emerged as an alternative class of Trojan detection approaches, which do not require triggering the Trojan payload malfunction but look for abnormal side-channel parameters, e.g. circuit path delay or power profile. However, existing side-channel approaches suffer from significant process variations, especially in deep submicron technologies, which largely reduce the detection sensitivity and set a lower limit on the size of detectable Trojans.

In this thesis, we analyze hardware Trojans from both an attacker’s and a defender’s perspective, with the final goal of developing effective defense mechanisms to thwart Trojan attacks and protect SoC security. From an attacker’s point of view, we explore the design space of hardware Trojans by developing innovative and efficient Trojan design techniques at different stages of IC development. Three case studies are provided: (i) We design and implement hard-to-detect sequential Trojans in an embedded processor to cause system malfunction and leak critical information. (ii) We investigate circuit-level design techniques to minimize the side-channel fingerprint of Trojans that are inserted in a synthesized circuit netlist, and demonstrate the capability of the implemented Trojans to bypass detection by a ring oscillator network (RON) based design hardening mechanism. (iii) For the first time, to our knowledge, we propose a new class of hardware Trojans that are mounted in Static Random-Access Memories (SRAMs). The Trojans are designed to evade industry-standard post-manufacturing tests while compromising data integrity in cache memories during deployment, through which we also demonstrate the general feasibility of inserting such Trojans in foundries by manipulating design layouts.

As a defender, we consider both post-manufacturing Trojan detection and design techniques to facilitate SoC security.
The Trojan detection sensitivity of existing side-channel analysis based approaches is largely limited by IC process variations, fundamentally because these methods rely on comparing one IC instance with another. In addition, the requirement of a golden reference chip is an extra burden demanding intensive effort. In this thesis, two golden-free robust Trojan detection approaches are proposed; neither requires a golden chip, and both thus eliminate process noise. In particular, Temporal Self-Referencing (TeSR) targets detection of sequential hardware Trojans. Its effectiveness is based on the lack of correlation between the state transitions of the Trojan finite state machine (FSM) and those of the original circuit. Side-Channel Analysis based Reverse Engineering (SCARE) is a fast non-destructive reverse engineering technique that can hierarchically extract structural information from an IC through its transient current signature. It combines side-channel analysis with logic testing, and can be applied as part of the post-silicon validation process to verify IC integrity.

Besides Trojan detection, design-time incorporation of Design-for-Security (DfS) features is essential for SoC security. A concept of Infrastructure IP for SoC Security (IIPS) is presented. The motivation is that DfS measures for diverse security threats require specific design modifications, leading to considerable design overhead, unaffordable design effort and test procedures incompatible with the SoC architecture. IIPS integrates multiple security measures and provides comprehensive security protection for an SoC, while featuring ultralow overhead, ease of integration (nearly plug-and-play), and good flexibility and functional scalability. We have implemented a representative IIPS to protect an SoC against scan-based attacks, IP piracy and hardware Trojan attacks.

The thesis performs a thorough analysis of hardware security attacks and countermeasures. The main contributions include:

1. The thesis has explored the design space of hardware Trojans by designing and implementing novel Trojans with various payloads, including system malfunction and information leakage, as well as by developing effective circuit-level design techniques to minimize the Trojan side-channel fingerprint.

2. It proposes a new class of hardware Trojans mounted in SRAMs, which can evade industry-standard post-manufacturing SRAM testing but tamper with data integrity in embedded memories during deployment. This also validates the feasibility of mounting general hardware Trojan attacks in foundries by manipulating design layouts.

3. Two golden-free robust side-channel analysis based Trojan detection approaches have been developed, which eliminate the impact of process variations and are thus able to achieve high Trojan coverage.

4. A concept of Infrastructure IP for SoC Security (IIPS) has been proposed and implemented. IIPS integrates multiple security measures to defend an SoC against a comprehensive range of hardware attacks. As a nearly plug-and-play module, IIPS provides an efficient way to integrate security protection into the IC design flow.

The rest of the thesis is organized as follows. In section 2, we investigate RTL and circuit-level hardware Trojan design techniques that minimize both the trigger probability in conventional testing and the side-channel fingerprint. In section 3, a new class of Trojans is proposed by exploring the design space in SRAM arrays; these Trojans can bypass industry-standard SRAM testing while compromising cache data integrity in deployment. Two golden-free side-channel analysis based hardware Trojan detection approaches are then presented in section 4 and section 5, for detection of hard-to-detect sequential Trojans and for non-destructive post-silicon IC integrity validation, respectively. Section 6 proposes the concept of IIPS as a DfS technique to provide comprehensive security protection for an SoC against various forms of hardware attacks. Finally, we conclude the thesis and provide further research directions in section 7.

2. HARDWARE TROJAN DESIGN

2.1 Introduction

Due to global outsourcing of fabrication services to foreign countries, there is an emerging security concern in integrated circuit (IC) manufacturing regarding potential malicious modification during fabrication [1] in an untrusted foundry. Such malicious hardware modifications, also referred to as hardware Trojans, can give rise to undesired functional behavior of a chip, for example by providing covert channels or back doors through which sensitive information such as cryptographic keys can be leaked [4]. These ICs could also be manipulated deliberately to cause in-deployment performance degradation or malfunctions. Besides compromising the functional robustness of generic consumer electronics, hardware Trojans can cause catastrophic consequences during in-field operation of security-critical applications such as military, communication and national infrastructure [57].

In this chapter, we investigate effective hardware Trojan design techniques from two perspectives: (1) designing sequential hardware Trojans to evade logic-based detection; (2) circuit-level design techniques to minimize the Trojan side-channel fingerprint. Chapter 3 will explore the feasibility of mounting hardware Trojans in SRAM arrays in the design layout. These aspects correspond to different stages of IC development, i.e. the front-end functional design phase, the synthesized gate-level netlist, and GDSII files in foundries. Two case studies are provided in this chapter to demonstrate the effectiveness of the proposed techniques and design considerations in the two scenarios. In case study 1, we implement RTL hardware Trojans in an 8051 embedded processor, including multiple design variations that exploit different hardware vulnerabilities to cause system malfunctions and information leakage. Case study 2 explores design techniques to mount Trojans in gate-level netlists hardened by a Ring-Oscillator Network, and demonstrates the Trojans’ ability to bypass detection by the hardening mechanism.

In particular, this chapter makes the following contributions:

1. It presents novel design of hardware Trojans in an embedded processor that target leaking secret information from inside the processor. The Trojans can be triggered in field by an adversary by manipulating the software or input data. Secret information can be leaked either through processor ports as logic values or through side channels (e.g. supply current).

2. It proposes innovative approaches of designing and placing Trojans in a gate-level circuit netlist in order to effectively evade existing protection mechanisms. It shows that clever design of Trojan trigger/payload circuits can incur ultra-low delay/power overhead, thus bypassing side-channel analysis based defenses.

The rest of the chapter is organized as follows. Section 2.2 presents related work on hardware Trojan design. The design techniques, implementation details and simulation/experimental results are described in Section 2.3. We conclude in Section 2.4.

2.2 Background

A detailed taxonomy of hardware Trojans and their detection mechanisms is presented in [10]. A common classification of Trojans [2, 11] is based on the activation mechanism (referred to as the Trojan trigger) and the effect on the circuit functionality (referred to as the Trojan payload). Trojans can be either combinationally or sequentially triggered. Typically, an adversary would choose an extremely rare activation condition so that it is highly unlikely for the Trojan to trigger during conventional manufacturing test. Sequentially triggered Trojans (the so-called “time bombs”), on the other hand, are activated by the occurrence of a sequence of rare events, or after a period of continuous operation. The output of the Trojan circuit can maliciously affect the functionality of the circuit by altering the logic values at its internal nodes (payload).

Another kind of Trojan which has a passive payload is used to leak the secret key used in cryptographic hardware by aiding in side-channel attacks. A classification of Trojans designed for information leakage is presented in [14].

2.3 Effective Hardware Trojan Design Techniques

2.3.1 Sequential Hardware Trojan

To prevent hardware Trojans from being detected during conventional post-silicon validation procedures, intelligent attackers are expected to design Trojans which are stealthy in nature. Typically, attackers would insert Trojans that trigger upon some rare conditions and compromise the security or functionality of the design. Hardware Trojan circuits can be either combinational or sequential [57]. Combinational Trojans are triggered on the occurrence of rare logic values at one or more internal nodes, while a sequential Trojan exhibits its malicious effect after a sequence of rare events during a long period of field operation, acting as a time bomb. Generally, sequential Trojans can be made exponentially harder to detect than combinational Trojans by increasing the length of the trigger sequence. In fact, these sequential Trojans can be extremely small in size and hard to detect during normal post-Si testing. They can also bypass exhaustive testing of a design in full-scan mode.

Functional Sequential Trojan Model

A sequential Trojan can be represented as a finite state machine (FSM), where the Trojan trigger sequence is mapped to one of the rarely-satisfied paths in its state transition diagram. The general FSM-based model of a sequential Trojan is illustrated in Fig. 2.1. The next-state logic of the Trojan FSM depends on the occurrence of certain rare events, i.e. combinations of rare logic values, at the original circuit’s internal nodes. The Trojan circuit undergoes a state transition when a pre-defined rare event occurs in the original circuit; if the expected rare event does not occur, the Trojan remains in the current state or goes back to the initial state. The Trojan output is activated only upon reaching the final Trojan state ($S_T$), when it affects the payload node, compromising the original circuit’s normal operation. Next, we provide examples of various types of sequential Trojans.

Fig. 2.1. Sequential Trojan model and Trojan state diagram.

Fig. 2.2. Four sequential Trojan design examples.
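As a software illustration of the FSM model of Fig. 2.1, the following minimal Python sketch steps a Trojan state machine over a stream of observed events. It is only a behavioral abstraction under stated assumptions: the function name `cycles_to_trigger`, the integer event labels and the `reset_on_miss` flag are hypothetical and do not come from the thesis, and a rare event is assumed to be encoded as a single integer per clock cycle.

```python
from typing import Optional, Sequence


def cycles_to_trigger(
    events: Sequence[int],
    trigger_sequence: Sequence[int],
    reset_on_miss: bool = False,
) -> Optional[int]:
    """Return the 0-based cycle at which the final Trojan state is reached.

    events[c] is a hypothetical integer label for the condition observed on the
    original circuit's internal nodes in cycle c. trigger_sequence (assumed
    non-empty) lists the rare events the Trojan waits for, in order.
    Returns None if the Trojan never activates.
    """
    state = 0  # number of rare events matched so far; 0 is the initial state
    for cycle, event in enumerate(events):
        if event == trigger_sequence[state]:
            state += 1  # expected rare event seen: advance one state
            if state == len(trigger_sequence):
                return cycle  # final state reached: payload activates
        elif reset_on_miss:
            state = 0  # variant 1: fall back to the initial state
        # variant 2 (default): remain in the current state
    return None


observed = [7, 1, 9, 4, 7, 9, 4]
print(cycles_to_trigger(observed, [7, 9, 4]))                      # hold-state variant
print(cycles_to_trigger(observed, [7, 9, 4], reset_on_miss=True))  # reset variant
```

The two calls differ only in what happens when the expected rare event is missed. The reset variant needs the whole trigger sequence in consecutive cycles, so it activates later on the same event stream.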

1) Free-running/Enabled Synchronous Counter Trojan: Fig. 2.2(a) shows a k-bit synchronous counter Trojan with or without an enable signal. A free-running synchronous counter works like a time bomb: there is no event-dependent trigger condition. The Trojan is triggered independently of the operation of the original circuit, and the only design parameter is the time needed to activate the Trojan (referred to as the time-to-trigger). The time-to-trigger is deterministic, $2^k - 1$ clock cycles, where k is the number of state elements in the counter. The drawback of this type of Trojan is the large area/power overhead required to guarantee a given trigger time. By using rare nodes of the original circuit to generate an enable signal for the counter, we can lower the trigger probability and thus greatly increase the time-to-trigger for the same area overhead.

2) Asynchronous Counter Trojan: The asynchronous counter-based Trojan uses an internal signal as the clock for counting the occurrences of a rare event. In the example shown in Fig. 2.2(b), p and q are two rare internal nodes in the original circuit, both of which have the rare logic value of 1. ANDing p and q therefore creates a signal that seldom switches from 0 to 1 and can thus be used as the clock signal for the counter. By proper choice of the rare events, one can ensure an extremely large time-to-trigger.

3) Hybrid Counter Trojan: To further lower the Trojan trigger probability, a hybrid counter Trojan model is developed, as shown in Fig. 2.2(c). It contains multiple cascaded counters, which can be synchronous or asynchronous, with the clock of the second counter depending on both the state of the first counter and rare internal events. In the particular example shown in Fig. 2.2(c), whenever the first counter reaches its maximum value of $2^{k_1} - 1$, the second counter is updated if both signals p and q happen to be at their rare value of logic 1.

4) FSM-based Trojan: The counter-based Trojans can be generalized to FSM-based Trojans, which contain a sequential part and a combinational part, with the inputs derived from rare circuit conditions. The advantage of FSM-based Trojans is that they can be made arbitrarily complicated with the same amount of resources, and can re-use both the combinational logic and the flip-flops (FFs) of the original circuit to host the FSM. Moreover, unlike counters, which are uni-directional, an FSM-based Trojan can have state transitions leading back to the initial state, so that the final Trojan state is reached only if the entire state sequence is satisfied in consecutive clock cycles.
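The counter-based triggers above can be explored with a small behavioral model. The sketch below is illustrative only: it assumes the enable (or rare clock event) is an independent random event with a fixed per-cycle probability, which is a simplification not stated in the text, and the function name and parameters are hypothetical.

```python
import random

def counter_time_to_trigger(k, enable_prob=1.0, seed=0):
    """Cycles until a k-bit counter first reaches its maximum value 2**k - 1.

    enable_prob is the ASSUMED per-cycle probability that the enable
    (or rare clock event) is active; 1.0 models the free-running counter.
    """
    rng = random.Random(seed)
    target = 2**k - 1
    count = cycles = 0
    while count < target:
        cycles += 1
        if rng.random() < enable_prob:  # counter advances only when enabled
            count += 1
    return cycles

# Free-running counter: deterministic time-to-trigger of 2**k - 1 cycles.
free_running = counter_time_to_trigger(k=8)
# Same counter gated by a rare enable: time-to-trigger grows for equal area.
enabled = counter_time_to_trigger(k=8, enable_prob=0.01)
print(free_running, enabled)
```

With the same number of state elements, the gated counter takes far longer to reach its terminal count, which is the effect the enabled and asynchronous variants rely on.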

Expected Time-to-Trigger

The time it takes for the inserted Trojan to be activated (the time-to-trigger) is not deterministic, except for a free-running counter whose trigger is independent of the circuit condition, because the workload of the IC can vary. It depends on the Boolean logic used as the Trojan state transition function, which is satisfied depending on the actual sequence of input vectors applied to the circuit.

To estimate the expected time of Trojan activation $T_{mean}$, consider that the Trojan passes through a sequence of states $S_1, S_2, \ldots, S_N$ before being activated, as shown in Fig. 2.1. Suppose the probability of the Trojan making the transition from state $S_{i-1}$ to state $S_i$ is $p_i$, $1 \le i \le N+1$, where $N = 2^k - 2$ and $k$ is the number of state elements. This is essentially a Markov process, with $p_i$ depending only on the present state $S_{i-1}$ of the Trojan FSM. For simplicity, assume that the Trojan, while inactive, stays in its present state for all input conditions except the unique condition that causes a state transition. Once in state $S_{i-1}$, the probability of the Trojan staying in state $S_{i-1}$ is $1 - p_i$. Hence, on average, the number of cycles the Trojan spends in state $S_{i-1}$ is:

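\begin{align}
T(S_{i-1}) &= p_i \cdot 1 + (1 - p_i)\, p_i \cdot 2 + (1 - p_i)^2\, p_i \cdot 3 + \cdots \tag{2.1} \\
&= \lim_{n \to \infty} \sum_{j=1}^{n} (1 - p_i)^{j-1}\, p_i \cdot j \tag{2.2} \\
&= \lim_{n \to \infty} \left[ \frac{1 - (1 - p_i)^n}{p_i} - n\,(1 - p_i)^n \right] \tag{2.3} \\
&= \frac{1}{p_i} \tag{2.4}
\end{align}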

Hence, the expected time-to-trigger for the Trojan in terms of clock cycles (assume continuous operation):

\begin{equation}
T_{mean} = \sum_{i=1}^{N+1} \frac{1}{p_i} \tag{2.5}
\end{equation}
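Equation 2.5 can be checked numerically. The following is a minimal sketch, assuming each transition is an independent per-cycle random event with probability $p_i$ and that the Trojan holds its state otherwise (the simplification made in the derivation). The function names and the example probabilities are hypothetical.

```python
import random

def expected_time_to_trigger(probs):
    """Closed form of Equation 2.5: sum of 1/p_i over all transitions."""
    return sum(1.0 / p for p in probs)

def simulated_time_to_trigger(probs, trials=20000, seed=1):
    """Monte Carlo estimate: the FSM waits in each state until its
    transition condition (probability p per cycle) is satisfied."""
    rng = random.Random(seed)
    total_cycles = 0
    for _ in range(trials):
        for p in probs:
            cycles = 1
            while rng.random() >= p:  # condition not met, stay in state
                cycles += 1
            total_cycles += cycles
    return total_cycles / trials

probs = [0.5, 0.25]  # illustrative values only
analytic = expected_time_to_trigger(probs)
estimate = simulated_time_to_trigger(probs)
print(analytic, estimate)
```

The simulated mean should approach the analytic value as the number of trials grows, since each state's dwell time is geometrically distributed with mean $1/p_i$.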

For an FSM-based Trojan that returns to the initial state in the absence of the rare state transition conditions, triggering requires the rare trigger sequence to be satisfied in consecutive cycles; the trigger probability is therefore:

\begin{equation}
P(S = S_T) = \prod_{i=1}^{N+1} p_i \tag{2.6}
\end{equation}

The Trojan model can be simplified to a two-state FSM containing only the initial state (S0) and the Trojan state (ST ) where the transition probability from S0 to ST is given by equation 2.6. Since equation 2.1 is applicable to this one-step model, the expected time-to-trigger is given by:

\begin{equation}
T_{mean} = \frac{1}{\prod_{i=1}^{N+1} p_i} \tag{2.7}
\end{equation}
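As a worked illustration of the difference between Equations 2.5 and 2.7, consider a hypothetical Trojan with two transitions ($N + 1 = 2$) and assumed probabilities $p_1 = 0.1$ and $p_2 = 0.2$. These values are chosen for illustration only and do not come from the experiments.

```latex
\begin{align*}
\text{Eq. 2.5 (state held between transitions):}\quad
  T_{mean} &= \frac{1}{p_1} + \frac{1}{p_2} = 10 + 5 = 15 \text{ cycles} \\
\text{Eq. 2.7 (return to initial state):}\quad
  T_{mean} &= \frac{1}{p_1\, p_2} = \frac{1}{0.02} = 50 \text{ cycles}
\end{align*}
```

Under these assumed values, requiring the sequence to be satisfied in consecutive cycles lengthens the expected time-to-trigger from 15 to 50 cycles without adding any state elements.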

Fig. 2.3. State diagram of a sequential Trojan with sequential and combinational logic sharing with original circuit.

Optimized Implementation

From an attacker's perspective, it is important to minimize the hardware overhead introduced by Trojans in order to reduce their impact on side-channel parameters (e.g. path delay and power profile) and thereby hide them from side-channel based detection. Although the previously described sequential Trojan model shows the Trojan state elements separately from those of the original circuit, a sequential Trojan need not introduce extra state elements. Instead, it can use existing unused states of the original circuit, if available. For example, Fig. 2.3 shows an FSM of five states, requiring 3 state elements with binary encoding. Here, the unused don't-care states (S5 and S6) can be leveraged by the attacker to implement a sequential Trojan. Such sharing of sequential elements benefits the attacker both by minimizing the area and power overhead, since only the next-state logic is modified, and by protecting the Trojan from formal verification based approaches. To further reduce the area/power overhead, the Trojan can be carefully designed to reuse the combinational logic of the original circuit. For example, the Trojan state machine in Fig. 2.3 reuses the transition conditions of the original FSM, whose consecutive occurrence in state S4 is an extremely rare event. Table 2.1 shows the area/power overhead of a sequential Trojan with the same functionality but different implementations. In particular, Trojan 1 is implemented with extra state elements; Trojan 2 reuses the existing don't-care states without next-state logic sharing; and Trojan 3 reuses both state elements and next-state logic by exploiting existing rare conditions in the combinational logic. For example, in a microprocessor, it is not difficult to find such rare conditions in the memory controller or ALU logic. The power overhead is mainly caused by the leakage power of the sequential Trojans, since their dynamic power is negligible owing to their low switching activity.

Table 2.1 Area/Power Overhead of Sequential Trojans of Same Functionality but Varying Implementations

Design/Overhead*       Area (Seq.)   Area (Comb.)   Area (Overall)   Power
Orig. Ckt w/ Troj.1    8.1%          4.3%           5.4%             3.5%
Orig. Ckt w/ Troj.2    0             3.4%           2.3%             1.2%
Orig. Ckt w/ Troj.3    0             0.8%           0.6%             0.4%

*All designs are synthesized at iso-delay as the original circuit

Case Study 1: Design Software Exploitable HTs in an Embedded Processor

Software-exploitable hardware Trojans can be designed to support general attacks with a variable payload effect defined by the malicious software. However, such Trojans are more suitable for general-purpose processors or complex embedded processors that already have hardware-supported security features, where various attacks can be performed by corrupting the security features (e.g. the privilege mechanism) through the Trojan-induced backdoor. In our work, the target system was a simple 8051 microcontroller without any security features, dedicated to performing an encryption function. We therefore focused on designing practical Trojan attacks that both exploit the features of a processor and explore possible vulnerabilities of an encryption system. In particular, we implemented software-exploitable hardware Trojans able to leak the program IP, steal the encryption key of the crypto system, and cause system malfunction (Fig. 2.4). The Trojan trigger mechanism makes use of both the instructions being executed and the data being used by the processor [9].

Fig. 2.4. Various trigger and payload conditions for the proposed Trojan inserted in an embedded processor.

Trojan Trigger Condition:

The simplest Trojan is an always-on Trojan, which requires no triggering condition to start malfunctioning. Though it causes less overhead, it is liable to be detected during post-manufacturing testing because of the evident malfunction. To circumvent this, it has been proposed to make the Trojan trigger condition either externally controllable by an attacker or dependent on rare conditions in the internal circuitry, e.g. sequential triggering of Trojans [6]. The Trojan can also take advantage of any easily perceptible test control signal to disable itself in test mode. For example, if the design has a scan chain enabled by a test control (TC) signal, the extra controllability/observability offered by the scan flip-flops can be negated by disabling the Trojan with the TC signal.

The Trojan trigger condition can be more powerful in the context of a processor, because a hardware Trojan can serve as the supporting hardware platform for a "software triggered Trojan" [12]: the hardware loopholes can be exploited by software code to trigger the Trojan. In principle, three aspects can be leveraged in a processor to define the Trojan trigger condition: a specific sequence of instructions, a specific sequence of data, and combinations of sequences of instructions and data, where the data can be obtained from main memory or I/O. In this way the Trojan trigger process is controllable by the attacker and flexible, since multiple instructions or data can trigger the same Trojan as long as they satisfy the feature required for the Trojan trigger.
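A trigger defined over a specific sequence of data can be modeled behaviorally as a small sequence monitor. The sketch below is a software abstraction for analysis, not the hardware implementation; the class name, the advance-or-reset policy and the example pattern are assumptions made for illustration.

```python
class SequenceMonitor:
    """Behavioral model of a trigger FSM that fires once a fixed
    sequence of data words has been observed in order."""

    def __init__(self, pattern):
        self.pattern = list(pattern)
        self.state = 0          # number of pattern words matched so far
        self.triggered = False

    def observe(self, word):
        """Feed one captured word; return True once the trigger has fired."""
        if self.triggered:
            return True
        if word == self.pattern[self.state]:
            self.state += 1     # move forward one state
        else:
            self.state = 0      # ASSUMED policy: re-initialize on mismatch
        if self.state == len(self.pattern):
            self.triggered = True
        return self.triggered

# Hypothetical 3-word pattern and input stream.
monitor = SequenceMonitor([0xDE, 0xAD, 0xBE])
stream = [0xDE, 0xAD, 0x00, 0xDE, 0xAD, 0xBE]
results = [monitor.observe(w) for w in stream]
print(results)
```

A longer pattern lowers the chance of an accidental match on random data at the cost of more states, which is the kind of trade-off a defender can quantify with such a model when estimating test coverage.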
For a processor running an operating system with multiple user programs, the hardware Trojan trigger condition can be designed in multiple ways, and all three options listed above can be used by the trigger mechanism. However, in the context of this competition, the 8051 embedded microcontroller is supposed to run a dedicated program to perform RC5 encryption [13]. Therefore, our Trojan trigger condition should be exploitable by this specific program. On the other hand, we need to make sure that the trigger condition is known to and controllable by the attacker only, and is not triggered during normal operation of the program. Therefore, we cannot simply use a sequence of instructions as the trigger condition. Instead, we can use 1) a specific sequence of data (plaintext in this case) and 2) a specific sequence of combinations of specific instructions and data. In particular, we use a specific sequence of instructions to capture a sequence of plaintext data, which is then compared with the pre-defined plaintext sequence to determine whether to trigger the Trojan. We embedded an FSM serving as a sequence monitor in the control logic of the 8051 microcontroller to watch for the execution of the following code segment to capture the plaintext:

    MOVX A, @DPTR
    ADDC A, Ri
    MOV  Ri, A

Based on high-level knowledge of the RC5 encryption algorithm, we know that the algorithm starts by XORing the plaintext stored in the internal memory with the encryption key stored in the external memory. Repetition of the above code segment allows the FSM to capture the plaintext, namely the data from external memory, upon observation of such a code segment. This means the data is captured at the Arithmetic Logic Unit (ALU) inputs instead of from the data bus. This is more reliable than directly monitoring the data on the data bus, because of the lack of knowledge of the software implementation of the algorithm. Upon capturing each plaintext word, it is compared with the predefined word to decide whether to move forward one state or re-initialize the sequence monitor. If the entire pre-defined plaintext sequence is seen, the FSM triggers the Trojan payload. The length of the plaintext sequence is a trade-off between the requirement of a low probability of accidental trigger during testing and the hardware overhead. Since the 8051 microcontroller has a multi-cycle microarchitecture that uses an FSM to control instruction execution, we can easily embed our Trojan trigger sequence monitor into the control logic. Design optimization through logic sharing can also be performed to reduce the hardware overhead. Fig. 2.5(a) and (b) show the Trojan trigger mechanism and the state transition diagram, respectively.

Fig. 2.5. (a) Trojan trigger mechanism; (b) state transition diagram for the sequential Trojan.

Trojan Payload:

Regarding the function of the Trojan payload, various Trojans have been proposed in the literature, ranging from simply inverting some internal node or presenting nonsense information at the primary outputs to cleverly leaking secret information from inside the hardware. Information can be leaked through output ports; through modulation of existing outgoing information or of the carrier frequency, phase or amplitude, to piggyback on existing modes of communication; or through side channels such as the power trace or EM radiation. In our work, we focused on designing Trojan payloads that make the most of the context of an embedded processor functioning as a cipher. Three types of information are of interest in a processor chip: the secret key stored inside the processor or fetched from external memory (e.g. hard disk), the code running in the processor, and the data being operated on by the processor for a given task. In particular, we designed three types of Trojan payloads: 1) leakage of the software IP; 2) leakage of the encryption key; and 3) causing various system malfunctions.

Nowadays software IPs are increasingly valuable properties, especially with the rapid growth of intelligent embedded portable devices, where application software is highly profitable. Protection of software IP in embedded systems is therefore of significant importance. Correspondingly, leakage of the executable software IP through a hardware-created backdoor poses a significant threat. In this work, we implemented 6 types of Trojans that can leak the software IP (the RC5 encryption program in our case) through primary outputs or to external memory:

Leak S/W IP payload #1: leak the program at run-time through LEDs. When an instruction is fetched from the instruction memory to the register, it is also passed on to the LED ports for display. In reality, the information leakage channel could be temporarily unused output ports or various side channels.

Leak S/W IP payload #2: leak program subroutines related to critical computing. Sometimes the program segment related to the critical computation of an encryption algorithm is implemented in functions or subroutines and used multiple times in the program. In such circumstances, the Trojan payload can be implemented to leak only such functions or subroutines, which requires extra hardware logic to identify the start and end points of a function/subroutine. In our work, the Trojan identifies the lcall and ret instructions to capture the subroutine.

Leak S/W IP payload #3: secretly store the program at run-time, then leak it during delay loops. Leaking information at run-time might be noticed by the user, because the output may be closely monitored for security purposes. Alternatively, we can secretly store a copy of the executed instructions in idle memory locations and leak them during the delay loops of the program. Delay loops are often used to wait for the next information transaction; in our case, one occurs in the gap between one encryption and the next plaintext. At these times, the output values might be neglected by the user and can be leveraged by the attacker.

Each payload mechanism listed above is also implemented in a variant form, where the instruction sequence is leaked to the external memory. This reflects real scenarios, where the processor runs an OS and multiple user programs. A user program can later access the memory locations storing the software IP, provided it is not stored in privileged memory regions.

Apart from the Trojan payloads that leak the software IP, we also implemented Trojans to leak the encryption key, which is central to the confidentiality of an encryption system. In the RC5 algorithm, the encryption key is used in bytes to XOR with the plaintext or intermediate values, where the XOR is actually realized through the ADD or ADDC instruction.
Therefore, the Trojan is implemented in such a way that it leaks the ADD and ADDC operands upon trigger. Specifically, two versions of the Trojan are implemented, leaking the key through the primary outputs and to the external memory for later access, respectively.

Besides confidentiality and integrity, availability is also a concern in information security. Correspondingly, Trojans are implemented to cause different system malfunctions:

Malfunction payload #1: cause an illegal memory write.

Malfunction payload #2: modify the stack pointer to change the subroutine return address location, e.g. to point to a malicious subroutine.

Malfunction payload #3: modify the key at the ALU input.

Design Overhead:

Table 2.2 provides the hardware overhead of the 10 implemented hardware Trojans, from which it can be seen that the combinational logic overhead is rather small (at most 3.1%). The negative percentage overheads arise because our implementation was based on an FPGA platform, where modifying the design results in re-placement and re-routing of the design and can reduce resource utilization even when the amount of HDL code increases. In addition, it is worth noting that a hardware Trojan does not necessarily mean extra logic: the Trojan function can be included by modifying the original design. Besides, we always try to optimize the design to maximize the logic sharing between the original and Trojan circuitry. The percentage overhead in flip-flops is relatively large because the given 8051 microprocessor is a simplified design and does not have many state elements. The overhead would be significantly smaller in real-world processors.

Table 2.2 Hardware Overhead Incurred by the Trojans

Design               # of LUTs (overhead)   # of FFs (overhead)
Reference Circuit    2791                   551
w/ Trojan 1          2866 (+2.7%)           594 (+7.8%)
w/ Trojan 2          2821 (+1.1%)           594 (+7.8%)
w/ Trojan 3          2805 (+0.5%)           625 (+13.4%)
w/ Trojan 4          2879 (+3.1%)           619 (+12.3%)
w/ Trojan 5          2763 (-1.0%)           619 (+12.3%)
w/ Trojan 6          2691 (-3.6%)           594 (+7.8%)
w/ Trojan 7          2777 (-0.5%)           622 (+12.9%)
w/ Trojan 8          2816 (+0.9%)           594 (+7.8%)
w/ Trojan 9          2812 (+0.8%)           594 (+7.8%)
w/ Trojan 10         2764 (-1.0%)           594 (+7.8%)
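The percentage overheads in Table 2.2 follow from the raw resource counts relative to the reference circuit. A minimal sketch of that arithmetic, using two rows of the table as input (the function and variable names are illustrative):

```python
def overhead_percent(count, reference):
    """Relative change of a resource count with respect to the reference, in %."""
    return 100.0 * (count - reference) / reference

REF_LUTS, REF_FFS = 2791, 551  # reference circuit, from Table 2.2

# (LUTs, FFs) for two of the Trojan-inserted designs in Table 2.2
designs = {"Trojan 1": (2866, 594), "Trojan 6": (2691, 594)}

for name, (luts, ffs) in designs.items():
    print(f"{name}: LUTs {overhead_percent(luts, REF_LUTS):+.1f}%, "
          f"FFs {overhead_percent(ffs, REF_FFS):+.1f}%")
```

A negative result, as for Trojan 6, simply means the re-implemented design used fewer LUTs than the reference after re-placement and re-routing.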

Fig. 2.6. Hard macro creation flow for the FPGA platform [40].

2.3.2 Side-Channel Aware Trojan Placement in Gate-Level Circuit Netlist

Recently, Ring Oscillator Network (RON) based design hardening approaches [78], [39] were proposed for hardware Trojan detection through changes in ring oscillator (RO) frequency. These RON-based approaches fall mainly into two categories: one secures the design by dynamically configuring circuit paths into ROs to monitor undesired design modification [39]; the other inserts an additional RON to detect voltage drops due to extra Trojan circuitry [78]. In this work, we consider designs hardened by the first approach, and analyze the effectiveness of the proposed Trojan insertions in both FPGA and ASIC scenarios. Our analysis shows that attackers can insert stealthy Trojans that successfully evade the hardening mechanism.

In an FPGA-based framework, the insertion of any extra circuitry would cause the entire design to be re-synthesized and re-routed, resulting in wide fluctuations in the embedded RO frequencies, which are dominated by interconnect delays. An effective Trojan insertion technique is to preserve the original design topology by making it a hard macro [6]. A hard macro generally refers to a pre-compiled module that can be re-used in its optimized form. Fig. 2.6 illustrates the flow of creating a hard macro using Xilinx ISE [40]. Since a hard macro consists of previously synthesized, mapped, placed and routed circuitry, the RO layout will not change due to Trojan insertion.

In the ASIC scenario, a clever attacker can reverse engineer and bypass existing embedded ROs when mounting Trojans. Even if all paths (and all gates) of the circuit are covered by some RO(s), the Trojan can still be inserted without affecting the delay significantly, as shown below. Generally, Trojan trigger logic only adds load capacitance to some circuit nodes, which can be distributed over different ROs. The Trojan payload, on the other hand, usually adds an (XOR) gate delay to the original circuit path, as shown in Fig. 2.7(a). Four ways of designing the Trojan payload avoid directly inserting gates in an RO path:

(i) Re-synthesizing the design and re-sizing the gates after insertion of the Trojan payload to preserve the path delay. For example, in Fig. 2.7(b) the Trojan payload is implemented by modifying an inverter to an XNOR gate, with the other input coming from the Trojan output, and sizing the gate to incur the same delay.

(ii) Inserting the Trojan payload outside the RO paths, at a primary output or flip-flop input, so as to add only extra load capacitance, as shown in Fig. 2.7(c). This load can be minimized by re-sizing the payload gate capacitance to match the original load capacitance.

(iii) Realizing the payload without adding an extra level of gates, e.g. one can simply add an NMOS transistor controlled by the Trojan trigger signal to pull down the payload node, as shown in Fig. 2.7(d), which is equivalent to a stuck-at-0 fault activated only under rare conditions. This has virtually no impact on a delay path because of the negligible diffusion capacitance load.

(iv) Merging the payload into a flip-flop. Fig. 2.7(e) provides an example in which one inverter in the D flip-flop is replaced with an XNOR gate. In this case, the change of load cannot be seen directly by the ROs and thus has negligible impact.

Fig. 2.7. Different payload insertion approaches: (a) stitching an extra gate (XOR) inside a delay path; (b) replacing an existing gate (e.g. NOT by XOR) and resizing; (c) stitching a gate outside built-in RO path; (d) inserting an NMOS pull-down transistor as payload; and (e) inserting the payload inside a master-slave FF.

Table 2.3 Measured RO Frequency Changes for Different Trojans

                             Adder w/ 2 ROs      Adder w/ 5 ROs
Trojan Type                  RO1       RO2       RO1       RO2
Synchronous counter          1.46%     1.44%     1.59%     2.83%
Synchronous counter w/ En    0.06%     0.49%     2.23%     1.89%
Asynchronous counter         0.05%     0.83%     0.77%     0.06%
Hybrid counter               0.55%     0.51%     0.85%     1.12%
FSM                          3.45%     2.35%     0.80%     3.49%

Case Study 2: Gate-level Trojan Implementation to Bypass RO-Network Hardened Design

Different types of sequential hardware Trojans are implemented in a 4-bit carry look-ahead adder (referred to as the Beta design [5]) hardened by ROs. The impact of the Trojans on RO frequency is validated on a Xilinx Spartan-3E FPGA platform, as shown in Table 2.3. In addition, HSPICE simulations of the configurations in Fig. 2.7 are performed on the Beta design and several ISCAS'85 benchmark circuits using the 70nm Predictive Technology Model (PTM) [95] with a supply voltage of 1V at 25°C, and the results are given in Table 2.4. The results demonstrate that the Trojan-induced impact on RO frequency is under 6.6% and thus can be masked by the effect of process variations [7].

Table 2.4 Impact of Different Trojan Configurations (as shown in Fig. 2.7) on RO Frequency, 70nm PTM @1V, 25°C

            # of levels     RO frequency change*
Circuit     in RO path      Config. A   Config. B   Config. C   Config. D
Beta        11              7.76%       2.21%       1.80%       0.28%
c880        13              6.40%       2.05%       1.77%       0.26%
c2670       15              5.92%       1.97%       1.51%       0.24%
c3540       15              5.25%       1.76%       1.12%       0.14%
c5315       17              4.38%       1.15%       0.85%       0.11%
c6288       17              3.95%       1.05%       0.74%       0.07%
c7552       25              2.89%       0.85%       0.56%       0.06%

*Config. E does not cause any change in RO frequency
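The trend in Table 2.4, where the frequency change shrinks as the RO path gets deeper, can be reasoned about with a first-order model. The sketch below assumes an RO whose period is twice the sum of its stage delays, identical stage delays, and an inserted gate whose delay equals one stage. These are simplifying assumptions for illustration, not the HSPICE setup used for the reported results, and the function names are hypothetical.

```python
def ro_frequency(stage_delays):
    """First-order RO frequency: f = 1 / (2 * total path delay)."""
    return 1.0 / (2.0 * sum(stage_delays))

def relative_frequency_drop(n_stages, extra_delay_ratio=1.0):
    """Fractional frequency drop when one extra gate is stitched into an
    n-stage RO path. extra_delay_ratio is the ASSUMED delay of the inserted
    gate relative to a nominal stage delay."""
    base = [1.0] * n_stages                   # normalized, identical stages
    f_clean = ro_frequency(base)
    f_trojan = ro_frequency(base + [extra_delay_ratio])
    return (f_clean - f_trojan) / f_clean

for levels in (11, 13, 15, 17, 25):           # path depths listed in Table 2.4
    print(levels, f"{100 * relative_frequency_drop(levels):.2f}%")
```

Under these assumptions the drop is 1/(n+1), about 8.3% for 11 levels and 3.8% for 25 levels, which follows the same qualitative trend as Config. A in Table 2.4 (7.76% and 2.89%). The other configurations add less than a full gate delay and would correspond to a smaller `extra_delay_ratio`.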

2.4 Summary

In this chapter, we have presented the design of novel hardware Trojans in an embedded processor that target leaking secret information from inside the processor. The Trojans can be triggered in the field by an adversary through manipulation of the software or input data. Secret information can be leaked either through processor ports as logic values or through side channels (e.g. supply current), and the trigger conditions can take diverse forms. We have also proposed innovative circuit-level techniques for designing Trojans and placing them inside a circuit in a way that effectively evades existing protection mechanisms. Simulation and experimental results demonstrate that cleverly designed Trojans can evade conventional logic testing while incurring ultralow delay/power overhead, thus bypassing side-channel analysis based detection, e.g. the on-chip monitor (RO) based DfS approaches considered here. The Trojan design approaches presented have been shown to be effective for both FPGA and ASIC platforms.

3. HARDWARE TROJAN ATTACK IN EMBEDDED MEMORY

3.1 Introduction

Static Random Access Memories (SRAMs) are an integral component of modern processors and System-on-Chip (SoC) designs. For example, cache memories, whose main components are SRAM arrays, are an essential part of any processor for bridging the rising disparity between processor and main memory speeds [15]. Nowadays on-chip memories typically occupy more than 50% of the SoC die area, a share expected to increase further as technology advances [90]. Reliable SRAMs are therefore indispensable for dependable computing, as SRAM failures can corrupt stored data, and the corruption can easily propagate through the system and compromise data integrity [16].

On the other hand, SRAM arrays are becoming denser with technology scaling, rendering cells more sensitive to defects and causing more complex faulty behaviors. Much research has been conducted on developing fault models to simulate these faulty behaviors, and on testing algorithms that achieve good coverage of these faults [17], [21], [22]. However, we observe that although the various proposed SRAM testing methods have advanced capability in detecting manufacturing-induced SRAM faults, they cannot guarantee detection of generic faults deliberately implemented by an attacker. In addition, the high test cost for large SRAM arrays prohibits IC vendors from applying exhaustive combinations of test algorithms and test stress conditions, which further limits the capability of industry-standard testing to detect well-designed, controlled faults.

In this chapter, we explore the possibility of mounting hardware Trojan attacks in SRAM arrays; such Trojans are essentially elaborately designed faults. We demonstrate the feasibility of inserting hardware Trojans in SRAM array core cells that can evade industry-standard post-manufacturing SRAM testing but cause array functional failures in deployment, tampering with the data integrity of cache memories. The proposed Trojan designs preserve the SRAM cell footprint and incur zero silicon area overhead. While not triggered, the Trojan has no noticeable impact on SRAM performance, power, or stability. Therefore, side-channel based hardware Trojan detection approaches (e.g. leakage or delay) are generally ineffective in detecting these Trojans, especially in large SRAM arrays, because of the large background current and the negligible change in memory access time. We validate the Trojan design in a compact SRAM layout, which demonstrates that attackers in foundries have opportunities to insert well-designed Trojans that can bypass conventional post-manufacturing SRAM testing. This also demonstrates the general feasibility of hardware Trojan insertion in foundries by manipulating design layouts. This work focuses on designing and implementing hardware Trojans in SRAM array core cells. For peripheral circuitry, the Trojan design techniques for logic circuits studied in Chapter 2 can be applied. In particular, this chapter makes the following contributions:

1. To our knowledge, it presents for the first time the design of a new class of hardware Trojans in embedded memory, i.e. SRAM arrays. The Trojan is designed to cause SRAM cell failure in deployment by tampering with the cell content or the ability to change the content, while evading detection by industry-standard SRAM testing and existing hardware Trojan detection approaches.

2. The proposed Trojans are essentially elaborately controlled faults. Test-time detection is bypassed by designing complex trigger mechanisms and payload faulty behaviors whose combination is beyond the coverage of existing testing methods.

3. The proposed Trojan model is side-channel benign, in the sense that it has no noticeable impact on the performance, power or cell stability of the SRAM array while it is not activated. It also does not change the footprint of the layout or incur any silicon area overhead.

4. As a representative example, the proposed SRAM Trojan design demonstrates the feasibility of mounting general hardware Trojan attacks during the IC fabrication phase in foundries.

The rest of the chapter is organized as follows. Section 3.2 provides background on common SRAM faults and on standard as well as advanced testing algorithms. The proposed SRAM Trojan designs are elaborated in Section 3.3, along with their implementation in a compact SRAM layout to validate feasibility. Section 3.4 provides simulation results for Trojan functional verification and side-channel impact characterization. The design space and possible extensions of the SRAM Trojan are discussed in Section 3.5. Finally, we summarize the chapter in Section 3.6.

3.2 Background

3.2.1 SRAM Fault Models

SRAM faults are caused by permanent defects in SRAM arrays due to manufacturing issues. The physical defects can be global, e.g. due to overly thin polysilicon or mask misalignment, or local, i.e. caused by extra, missing or inappropriate material [17]. Generally, SRAM testing algorithms are developed based on functional fault models, which define the functional behaviors of faulty cells without explicitly considering the corresponding physical defects. Since the quality of the test methods, in terms of fault coverage and test length, depends strongly on the fault model used, research has been conducted to relate functional fault models to physical defects in order to establish realistic functional models that benefit the development of effective and compact test methods [18]. Representative approaches include Inductive Fault Analysis (IFA), which predicts the likelihood of physical defects and translates them into functional models [23] [24], and structure-based electrical simulation of defects to derive new fault models [18]. Here we describe commonly used SRAM functional fault models and the possible underlying physical defects [18]. The definitions and notations follow the convention in [17]. Most SRAM faults can be classified as faults within a cell and faults between two cells.

Consider first the faults within a cell that impair the ability to change the stored content. In a stuck-at fault (SAF), the value a cell stores is stuck at logic 0 (<∀/0>) or 1 (<∀/1>) and cannot be changed. It can be caused by a short at one cell node or a broken cross-coupling interconnect. It is the most common fault in on-chip SRAMs, and can be detected by most industry-standard tests. A stuck-open fault (SOF) means that a cell cannot be accessed, which could be caused by a broken word line or a word line shorted to ground. The cell fails to undergo both transitions, i.e. <w↑/0> and <w↓/1>, and the output value when it is read depends on the implementation of the sense amplifier (SA): the SA may produce a fixed value or repeat the previous value (<rx/x/?>). With a transition fault (TF), the cell fails to undergo a particular type of transition, i.e. 0 → 1 (<w↑/0>) or 1 → 0 (<w↓/1>). It may be caused by an open access transistor or a short at one bit line.

There are also faults that compromise cell hold or read stability. With a data retention fault (DRF), a cell fails to hold its value for more than time T (<1T/0> and <0T/1>), perhaps due to a broken pull-up transistor or power supply path. Both the read destructive fault (RDF) and the deceptive read destructive fault (DRDF) cause the cell value to flip on a read operation. In particular, RDF is denoted by <r0/↑/1> and <r1/↓/0>, and may be caused by an open cell pull-up transistor or a bit line shorted to ground; DRDF is denoted by <r0/↑/0> and <r1/↓/1>, perhaps due to an open pull-up transistor or a cell node shorted to ground.

Faults involving two cells are called coupling faults (CFs). The cell causing the faulty behavior is called the aggressor cell (a-cell) or coupling cell, and the cell that is affected and exhibits the faulty behavior is called the victim cell (v-cell) or coupled cell. Based on both the sensitizing condition and the faulty behavior, general coupling faults can be broadly categorized into state coupling faults (CFst), inversion coupling faults (CFin) and idempotent coupling faults (CFid). A CFst is usually due to bridges between two cells or lines. Its faulty behavior is that a particular state of the a-cell forces a value in the v-cell, which can occur in one of four variations: <0; 0/1>, <0; 1/0>, <1; 0/1>, and <1; 1/0>. Unlike CFst, CFin and CFid are sensitized by a transition write operation on the a-cell. In CFin, the v-cell value is inverted, while in CFid, the v-cell is forced to a fixed value. CFin faults are denoted as <↑; ↓> and <↓; ↑>, and CFid faults as <↑; 1/0>, <↑; 0/1>, <↓; 1/0>, and <↓; 0/1>.

It is worth noting that the actual faulty behaviors may vary depending on the electrical properties of the physical defects, e.g. the resistance of the bridge, short or open. Weak faults that cause slight disturbances in SRAM operations may also occur [18], and may lead to more complex fault models such as dynamic faults, which require multiple read or write operations to eventually cause the fault. In addition, linked faults, i.e. multiple faults affecting the same cell such that the effect of one fault can mask that of another, can further hide faults during testing. This makes detection of each fault more difficult. Finally, functional fault models have been developed for faults sensitized by the data pattern of multiple cells, namely pattern sensitive faults (PSFs). Although the generic pattern sensitive fault is considered unrealistic and impossible to detect, neighborhood pattern sensitive faults (NPSFs) have been studied [20]. More details on realistic SRAM fault models can be found in [18].

3.2.2 SRAM Testing Algorithms

The development of SRAM testing has aimed at high fault coverage and short test time. Early testing algorithms, such as Zero-One, Walking 0/1 and GALPAT [25], tend to have long test times, i.e. O(n^2), and poor coverage due to the lack of reasonable fault models and proofs. To achieve more effective SRAM testing, a class of testing algorithms called march tests has been proposed. March tests reduce the test time to O(n), and can achieve excellent fault coverage for comprehensive fault models. A march test is composed of multiple march elements, where each march element is a group of operations executed as a unit on every cell and repeated throughout the SRAM array in a scanned manner. The complexity of the march elements in a test depends on the target fault models. A test attempting to cover a wider range of fault models generally needs more complex march elements, and thus requires longer test time. Here we briefly introduce two widely used march test algorithms, March C- and March G, that can cover almost all simple static faults and many linked faults, as well as several advanced tests that are designed for harder-to-detect faults.

The March C- [26] test algorithm is given in Equation 3.1. It contains 10n operations, where n is the number of cells in the SRAM array. It can detect all simple SAFs and TFs that are unlinked with CFs, because the test covers reading both 0 and 1, and exciting both 0 → 1 and 1 → 0 transitions. It can also detect all CFsts and unlinked CFins. Coverage of CFsts follows from the observation that each pair of cells experiences all four data combinations (0,0), (0,1), (1,0) and (1,1) during read operations on either cell. For CFin, <↑; ↕> with addr(a) < addr(v) and addr(v) < addr(a) can be detected by march elements M2 (⇑ (r1, w0)) and M4 (⇓ (r1, w0)), respectively, where addr(a) and addr(v) stand for the addresses of the a-cell and v-cell. Similarly, <↓; ↕> faults are covered by M1 and M3. However, March C- cannot detect linked faults, DRFs or SOFs. For example, if a TF happens on a cell in M1, it may be masked by a CFin with the a-cell at a higher address. Similarly, TFs occurring in M3 may be masked by a CFin from a lower address. In addition, SOFs are not covered by March C- because the test cannot determine whether the value obtained from the sense amplifier comes from the accessed cell or is a repetition of the value from the last read operation.

The March G [17] test is a comprehensive algorithm attempting to cover SAFs and TFs linked with CFs, CFins (unlinked and sometimes linked with CFids), linked CFids, and SOFs, as shown in Equation 3.1. Any march test can be extended with delay elements to cover DRFs with respect to both logic 0 and 1, at the cost of longer test time. An example of an extended March G test is also provided.

March C- (10n): {⇕ (w0); ⇑ (r0, w1); ⇑ (r1, w0); ⇓ (r0, w1); ⇓ (r1, w0); ⇕ (r0)}

March G (23n): {⇕ (w0); ⇑ (r0, w1, r1, w0, r0, w1); ⇑ (r1, w0, w1); ⇓ (r1, w0, w1, w0); ⇓ (r0, w1, w0); ⇕ (r0, w1, r1); ⇕ (r1, w0, r0)}

March G w/ delay elements: {⇕ (w0); ⇑ (r0, w1, r1, w0, r0, w1); ⇑ (r1, w0, w1); ⇓ (r1, w0, w1, w0); ⇓ (r0, w1, w0); Del; ⇕ (r0, w1, r1); Del; ⇕ (r1, w0, r0)}    (3.1)
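To make the march notation concrete, the following is a minimal behavioral sketch of a march test runner, not the test environment used in this work. The class names (`Memory`, `StuckAt`, `InversionCoupling`) and the function `run_march` are hypothetical, the memory is a simple bit-oriented list, and an "any" direction element is arbitrarily executed in ascending order.

```python
from typing import List, Tuple

# One march element: (address order, operations such as "r0" or "w1").
# "up" = ascending, "down" = descending, "any" = either (run ascending here).
MarchElement = Tuple[str, List[str]]

MARCH_C_MINUS: List[MarchElement] = [
    ("any", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("any", ["r0"]),
]


class Memory:
    """Fault-free bit-oriented memory with n cells (illustrative model)."""

    def __init__(self, n: int) -> None:
        self.cells = [0] * n

    def read(self, addr: int) -> int:
        return self.cells[addr]

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value


class StuckAt(Memory):
    """Cell `victim` is stuck at `value` (SAF): writes to it are ignored."""

    def __init__(self, n: int, victim: int, value: int) -> None:
        super().__init__(n)
        self.victim = victim
        self.cells[victim] = value

    def write(self, addr: int, value: int) -> None:
        if addr != self.victim:
            super().write(addr, value)


class InversionCoupling(Memory):
    """A 0 -> 1 write transition in `aggressor` inverts `victim` (a CFin)."""

    def __init__(self, n: int, aggressor: int, victim: int) -> None:
        super().__init__(n)
        self.aggressor = aggressor
        self.victim = victim

    def write(self, addr: int, value: int) -> None:
        rising = addr == self.aggressor and self.cells[addr] == 0 and value == 1
        super().write(addr, value)
        if rising:
            self.cells[self.victim] ^= 1


def run_march(mem: Memory, test: List[MarchElement]) -> bool:
    """Return True if every read in the test returns its expected value."""
    n = len(mem.cells)
    for order, ops in test:
        addrs = range(n - 1, -1, -1) if order == "down" else range(n)
        for addr in addrs:
            for op in ops:
                kind, value = op[0], int(op[1])
                if kind == "w":
                    mem.write(addr, value)
                elif mem.read(addr) != value:
                    return False
    return True
```

In this sketch a fault-free memory passes March C-, while a single stuck-at cell or a single unlinked inversion coupling fault (with the aggressor on either side of the victim) makes the test fail, in line with the coverage discussed above.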

Advanced algorithms to test against more complex fault models have also been investigated. Some of them are subject to significantly longer test time, and thus have fewer industrial applications. For example, March SL was presented to guarantee coverage of all simple linked faults [22]. March RAW is designed to detect dynamic faults sensitized by read-after-write operations [21]. The tests are provided in Equation 3.2.

Researchers have also considered a more general form of coupling fault, i.e. the pattern sensitive fault (PSF), in which multiple a-cells collaboratively cause a faulty behavior in the v-cell. The fault model assumes that fault sensitization depends on the data pattern in the a-cells, which is how the fault is named. Since general pattern sensitive fault testing is considered impossible, existing algorithms target neighborhood pattern sensitive faults (NPSFs), with the a-cells restricted to a region around the v-cell, and usually with the assumption that no more than one NPSF occurs at any time during the test [20]. Even with these assumptions, the tests are still overly complex compared to other march tests, e.g. the one proposed in [20] requires a test time of 68n. Moreover, there is no good evidence validating the reality of such fault models, making them less attractive to the semiconductor industry.

March SL (41n): {⇕ (w0); ⇑ (r0, r0, w1, w1, r1, r1, w0, w0, r0, w1); ⇑ (r1, r1, w0, w0, r0, r0, w1, w1, r1, w0); ⇓ (r0, r0, w1, w1, r1, r1, w0, w0, r0, w1); ⇓ (r1, r1, w0, w0, r0, r0, w1, w1, r1, w0)}

March RAW (26n): {⇕ (w0); ⇑ (r0, w0, r0, r0, w1, r1); ⇑ (r1, w1, r1, r1, w0, r0); ⇓ (r0, w0, r0, r0, w1, r1); ⇓ (r1, w1, r1, r1, w0, r0); ⇕ (r0)}    (3.2)
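The operation counts quoted for the tests in Equations 3.1 and 3.2 can be checked by counting the read and write operations in each test. The sketch below does so with a regular expression; the dictionary `MARCH_TESTS` and the helper names are illustrative, and the 32x64 array size used in the example call is only an assumed input.

```python
import re

# March tests transcribed from Equations 3.1 and 3.2.
MARCH_TESTS = {
    "March C-": "{⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0)}",
    "March G": "{⇕(w0); ⇑(r0,w1,r1,w0,r0,w1); ⇑(r1,w0,w1); ⇓(r1,w0,w1,w0); "
               "⇓(r0,w1,w0); ⇕(r0,w1,r1); ⇕(r1,w0,r0)}",
    "March SL": "{⇕(w0); ⇑(r0,r0,w1,w1,r1,r1,w0,w0,r0,w1); "
                "⇑(r1,r1,w0,w0,r0,r0,w1,w1,r1,w0); "
                "⇓(r0,r0,w1,w1,r1,r1,w0,w0,r0,w1); "
                "⇓(r1,r1,w0,w0,r0,r0,w1,w1,r1,w0)}",
    "March RAW": "{⇕(w0); ⇑(r0,w0,r0,r0,w1,r1); ⇑(r1,w1,r1,r1,w0,r0); "
                 "⇓(r0,w0,r0,r0,w1,r1); ⇓(r1,w1,r1,r1,w0,r0); ⇕(r0)}",
}


def ops_per_cell(test: str) -> int:
    """Count read/write operations applied to each cell (the k in 'kn')."""
    return len(re.findall(r"[rw][01]", test))


def total_operations(test: str, num_cells: int) -> int:
    """Total number of operations for an array with num_cells cells."""
    return ops_per_cell(test) * num_cells


if __name__ == "__main__":
    for name, test in MARCH_TESTS.items():
        print(name, ops_per_cell(test), total_operations(test, 32 * 64))
```

Delay elements (Del) contain no read or write operation, so they do not change the count; their cost is the wait time itself.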

Apart from various forms of SRAM testing algorithms, different stress conditions are also used in SRAM tests for better coverage. Widely used stress conditions include address sequences, data backgrounds and supply voltages. Common addressing directions include ‘fast x’ and ‘fast y’, in which the address increments or decrements such that each step goes to the next row or column [19]. Different data backgrounds can improve the coverage of coupling faults, and to some extent facilitate detection of pattern sensitive faults. The four data backgrounds used in industrial testing are displayed in Fig. 3.1. In addition, SRAM arrays can be tested under higher or lower than nominal Vdd to obtain better coverage of faults caused by resistive shorts/bridges or opens, e.g. dynamic faults caused by a broken pull-up transistor or Vdd [27]. In this work, the Trojans are evaluated in the context of fast-y march tests with commonly used data backgrounds. It is shown later that the proposed Trojan models remain valid in more complicated scenarios, including word-oriented SRAMs.

Fig. 3.1. Common data backgrounds used in SRAM testing.
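As an illustration of data backgrounds, the sketch below generates four regular patterns for a small array. It assumes that the four backgrounds of Fig. 3.1 are solid, checkerboard, row stripe and column stripe, which is inferred from the backgrounds named elsewhere in this chapter rather than read from the figure; the function name `data_background` is hypothetical.

```python
from typing import Callable, Dict, List

# Assumed set of backgrounds: each maps (row, column) to the background bit.
BACKGROUNDS: Dict[str, Callable[[int, int], int]] = {
    "solid": lambda r, c: 0,
    "checkerboard": lambda r, c: (r + c) % 2,
    "row_stripe": lambda r, c: r % 2,
    "column_stripe": lambda r, c: c % 2,
}


def data_background(name: str, rows: int, cols: int) -> List[List[int]]:
    """Return the rows x cols bit pattern of the named background."""
    bit = BACKGROUNDS[name]
    return [[bit(r, c) for c in range(cols)] for r in range(rows)]


if __name__ == "__main__":
    for name in BACKGROUNDS:
        print(name)
        for row in data_background(name, 4, 4):
            print(" ", "".join(str(b) for b in row))
```

During a march test each background is also applied in its inverted form, since every write of the complementary value replaces the background by its inverse.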

Fig. 3.2. Hardware Trojan attack in SRAM array: (a) a general model; (b) effective defect types.

3.3 Trojan Attacks in SRAM Array

Fig. 3.2(a) demonstrates the general attack model of the proposed SRAM Trojan, which is essentially an SRAM fault with a sophisticated trigger condition designed deliberately to bypass conventional SRAM testing. The trigger condition can be a combination of states of various elements in the SRAM array, e.g. cell nodes, word lines and bit lines, as shown in the figure. The payload mechanism of a Trojan upon trigger imitates that of a conventional fault, i.e. it can cause shorts, bridges and opens in the array, as depicted in Fig. 3.2(b).

3.3.1 Trojan Trigger Mechanism

To effectively evade detection by various SRAM testing algorithms, we exploit data patterns in the array and operating conditions denoted by the states of word lines and bit lines. As can be seen in the following description, a more sophisticated trigger condition is needed to bypass a more complex test, which usually requires more Trojan circuitry to be inserted.

Consider a basic fast-y march test with a solid data background, where fast-y means that each march element is applied throughout a row of cells before moving to the next row. An observation can be made that, at any time when a cell is being accessed in the test, the cells with lower addresses all hold the same value, and so do the cells with higher addresses. In other words, a block of cells that are not being operated on still has a solid data pattern, either the original or the inverted version. In this case, a Trojan that tampers with the read/write operation of a cell can have a trigger condition that leverages the values of two cells on the same side of the victim cell (the cell affected by the Trojan), i.e. both with higher addresses or both with lower addresses. In particular, the trigger condition can be “when the two trigger cells on the same side of (but in a row different from) the v-cell hold different values”, as this will never happen while the v-cell is being operated on, since two trigger cells in a different row cannot be accessed simultaneously with the v-cell.

When the four data backgrounds listed in Fig. 3.1 are applied, each pair of cells can have all four combinations of data pattern (0,0), (0,1), (1,0) and (1,1), even while not being accessed. However, the observation still holds that each side of the currently accessed cell has the original or inverted data pattern with respect to the background pattern. Consequently, a block of m × n adjacent cells (Fig. 3.3) on one side of the accessed cell will have the following properties:

1. If m is odd, the data pattern of the block is row-wise symmetric.

2. If n is odd, the data pattern of the block is column-wise symmetric.

3. If m is even, the data pattern of each row in the block is the same as or complement to the inverse of the row.

4. If n is even, the data pattern of each column in the block is the same as or complement to the inverse of the column.

5. If the block is square (m=n), the block data pattern must be either row-wise or column-wise or diagonal-wise symmetric, or simultaneously satisfy the three.
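The observation for the solid data background can be checked with a small behavioral simulation. The sketch below runs March C- in fast-y order (all cells of a row before the next row) on a fault-free array and counts how often two trigger cells hold different values, separately for operations on the victim cell and on all other cells. The function name, array size and cell coordinates are hypothetical choices for illustration.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, column)

MARCH_C_MINUS: List[Tuple[str, List[str]]] = [
    ("any", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("any", ["r0"]),
]


def count_trigger_hits(rows: int, cols: int, victim: Cell,
                       trig_a: Cell, trig_b: Cell) -> Tuple[int, int]:
    """Count operations during which the two trigger cells differ.

    Returns (hits while the victim is accessed, hits while another cell is
    accessed). The memory is fault-free with a solid background, so only
    write operations change state.
    """
    mem = [[0] * cols for _ in range(rows)]
    ascending = [(r, c) for r in range(rows) for c in range(cols)]  # fast-y order
    on_victim = elsewhere = 0
    for order, ops in MARCH_C_MINUS:
        addrs = ascending[::-1] if order == "down" else ascending
        for r, c in addrs:
            for op in ops:
                # Trigger condition evaluated on the state seen by this operation.
                differ = mem[trig_a[0]][trig_a[1]] != mem[trig_b[0]][trig_b[1]]
                if differ:
                    if (r, c) == victim:
                        on_victim += 1
                    else:
                        elsewhere += 1
                if op[0] == "w":
                    mem[r][c] = int(op[1])
    return on_victim, elsewhere


if __name__ == "__main__":
    # Victim in row 2, both trigger cells in row 5 (same side, different row).
    print(count_trigger_hits(8, 8, victim=(2, 3), trig_a=(5, 1), trig_b=(5, 4)))
```

With the trigger cells in a different row from the victim, the trigger condition only appears while the trigger row itself is being swept, never during an access to the victim, so a payload that acts on victim accesses stays hidden from this test.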

Based on these observations, Trojan trigger conditions can be designed exploiting data patterns that are unlikely to appear in march tests with the four commonly used data backgrounds. Consider the simple case where n=1, namely the data pattern block is within a row. If m=2, all four data combinations are valid patterns in march tests, denoted in black in Fig. 3.3. Thus 2-cell patterns are not adequate to establish a Trojan trigger condition. Half of the 3-cell patterns are invalid (marked in orange) in march tests when the block is not being accessed, and can be used to form the trigger mechanism. However, these patterns can occur when multi-column stripe data backgrounds are applied, i.e. when each stripe contains more than one column, although such backgrounds are hardly used in industrial tests due to time limits. As the cell number increases to 4, 12 out of the 16 possible patterns cannot occur except when the pattern is being changed, i.e. when one of the cells in the block is being written. Moreover, out of the 12 invalid patterns, 8 are theoretically possible in tests with multi-column stripe backgrounds (marked in orange), and the remaining 4 patterns are unlikely to occur with any regular data background (marked in red), including multi-row and multi-column stripes. Therefore, these patterns can be used to design the most reliable Trojans that can evade any march test. The statements hold true for column-wise patterns as well, which are considered against multi-row stripes instead.

Fig. 3.3. Data patterns that can be leveraged by Trojan trigger mechanisms.

It is worth noting that these invalid patterns are still highly likely to occur in the field, making the Trojans trigger-able; without any assumption on workloads, each pattern has an equal probability of occurrence. The fundamental reason they can bypass march tests with various data backgrounds is that the tests are designed to test against realistic faults, while Trojans are essentially intentionally designed unrealistic faults. In addition, NPSF testing algorithms cannot detect such Trojans effectively because Trojan trigger cells are not restricted to the defined neighborhood region.

States of word lines and bit lines in an SRAM array can also be used as part of a Trojan trigger condition. For instance, the invalid data patterns discussed above assume that the trigger cells are not being operated on. In reality, whether this is the case can be judged from the values of their word lines. If all the trigger cells are in one row, the corresponding word line value can be incorporated to guarantee an impossible condition, i.e. WL · P , where P is the invalid data pattern.

In summary, the enabling features of the proposed approach to designing Trojan trigger mechanisms that bypass the broad class of march tests with various data backgrounds, as well as NPSF testing algorithms, include:

1. The approach explores complex data patterns that are unlikely to occur during conventional SRAM testing. In this sense, a Trojan can be viewed as a group of complicated coupling faults that are undetectable by march tests.

2. Trojan trigger cells are not limited to the neighborhood region of the v-cell, thus able to evade detection of NPSF test algorithms.

3. The proposed trigger mechanisms make use of word line and bit line states in an array to establish extremely rare or impossible conditions in conventional testing.

4. Design of Trojan payload effects explores hard-to-detect faulty behaviors like data retention faults (DRFs) and dynamic faults.

5. Combinations of rare trigger conditions and hard-to-detect payloads enable Trojans to bypass post-manufacturing testing reliably.
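The pattern counts given earlier for 2-, 3- and 4-cell trigger blocks within a row can be reproduced by enumeration. The sketch below assumes that, within one row, the solid and row stripe backgrounds appear as all-equal values and the checkerboard and column stripe backgrounds appear as alternating values, and it models multi-column stripes as stripes of uniform width from 2 up to an arbitrary limit `max_stripe`. The function names are hypothetical.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Pattern = Tuple[int, ...]


def stripe_windows(k: int, widths: Iterable[int]) -> Set[Pattern]:
    """All k-cell windows of a row filled with column stripes of the given widths."""
    seen: Set[Pattern] = set()
    for w in widths:
        length = 2 * w + k  # one full stripe period plus one window
        row = [(c // w) % 2 for c in range(length)]
        for start in range(length - k + 1):
            seen.add(tuple(row[start:start + k]))
    return seen


def classify(k: int, max_stripe: int = 8):
    """Split k-cell row patterns into (valid, stripe-only, never-seen) sets.

    valid:       appear with the four common backgrounds (or their inverses)
    stripe-only: invalid for those, but possible with multi-column stripes
    never-seen:  absent from all regular backgrounds considered here
    """
    all_patterns = set(product((0, 1), repeat=k))
    valid = stripe_windows(k, [1]) | {(0,) * k, (1,) * k}
    multi = stripe_windows(k, range(2, max_stripe + 1))
    invalid = all_patterns - valid
    return valid, invalid & multi, invalid - multi


if __name__ == "__main__":
    for k in (2, 3, 4):
        valid, stripe_only, never = classify(k)
        print(k, len(valid), len(stripe_only), len(never), sorted(never))
```

Under these assumptions the enumeration agrees with the text: no 2-cell pattern is invalid, half of the 3-cell patterns are invalid but reachable with multi-column stripes, and for 4 cells there are 12 invalid patterns of which 4 do not occur with any of the regular backgrounds.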

Although the proposed Trojan design is presented in the context of bit-oriented SRAMs (BOMs), i.e. memories accessed bit by bit, the Trojan model remains valid in word-oriented SRAMs (WOMs), which are accessed by words containing B bits each (B > 1). In WOMs with a bit-interleaved structure, testing a word is equivalent to testing B bits in parallel, while the memory block containing each bit has the same data backgrounds as in BOMs. Therefore the capability of a given Trojan trigger mechanism to evade a march test is the same for a BOM and a bit-interleaved WOM. In the scenario of a WOM with adjacent bit organization, different multi-column stripe data backgrounds will be applied in the testing to detect intra-word faults [28]. Hence Trojan trigger conditions can exploit the data patterns that do not occur in march tests with multi-column stripe data backgrounds, as shown in Fig. 3.3. Moreover, an attacker can select specific locations (within a word or in neighboring words) for the trigger cells to protect the Trojan against testing, because the possible patterns cannot occur at every location within a word.

Two types of SRAM Trojan design are elaborated below. The classification is based on the effective payload defect type. The designs are given at transistor level, and are validated on a compact SRAM layout with the constraints of not changing the cell footprint or causing design rule violations in the original cells.

3.3.2 Trojan Type 1: Resistive Short/Bridge

Trojans can be designed to cause resistive shorts between one circuit node in the array and Vdd/Vss, or resistive bridges between two nodes, where a node can be a cell node (true or complementary), a word line, or a bit line. Fig. 3.4 demonstrates representative designs of Trojans that short a cell node to Vss upon trigger. Trojans causing bridges between circuit nodes can be implemented in a similar manner. Theoretically the same technique can be applied to any pair of nodes to cause shorting or bridging; the actual feasibility of implementation is constrained by the SRAM layout.

Fig. 3.4. Trojans causing v-cell node shorted to Vss: (a) triggered by 2-cell data pattern/& a word line; (b) triggered by 3-cell data pattern/& a word line.

The trigger condition of a type-1 Trojan is realized by multiple nmos pass transistors (PTs) connected in series. The gate terminals of the PTs are connected to the different nodes that form the trigger condition. Only when all gate terminals hold voltages of logic 1, so that all PTs are turned on, is the shorting path activated. This denotes the on state of the Trojan. Otherwise, if at least one PT on the path is off, the Trojan is not activated, and the v-cell node sees an extremely high resistance through the path (on the order of GΩ), which does not affect the normal functionality of the v-cell.

The choice of nmos instead of pmos PTs is based on the following reasons. First, nmos PTs have higher mobility than pmos PTs of the same size, and hence better conductivity. Considering the limited space an attacker has to insert Trojan circuitry, a shorting path with nmos PTs will have reasonably low effective resistance, while pmos PTs will result in very high resistance, compromising the effectiveness of the Trojan payload. This is further necessitated by the series connection of multiple PTs, which weakens the path conductivity significantly. Second, an nmos PT is good at conducting logic 0 and is thus suitable for Trojans shorting a node to Vss. A pmos PT is subject to a voltage degradation of Vthp when conducting logic 0, which is intolerable in deep submicron, especially low-power, technologies, where the degradation will lead to the opposite logic value. Finally, the traditional SRAM layout used in this work prohibits insertion of an arbitrary pmos transistor. The nmos PTs are implemented by exploiting the free space introduced by word lines and access transistors.

The trigger nodes of the Trojan, i.e. the PT gate terminals, are selected according to the method presented in section 3.3.1. Cell nodes from a row are selected to form data patterns that are unlikely to occur during march tests. Fig. 3.4(a) and (b) display the realization of a 2-cell trigger pattern and a 3-cell trigger pattern, respectively, for evading march tests with a solid data background and with the four commonly used data backgrounds. A 4-cell trigger pattern can be designed in the same manner to bypass all march tests. The cells that are required to have value 1 in the trigger pattern are connected through their true nodes, while those required to have value 0 are connected through their complementary nodes. Trigger cells are chosen from one row to fit the SRAM layout, as it is hardly possible to route interconnects from nodes in different rows without using an extra metal layer. Placing trigger cells in a single row also helps minimize Trojan-incurred parasitic effects, e.g. interconnect resistance to the v-cell and load capacitance on the trigger cells.
Each Trojan can also contain a trigger node from a non-trigger-cell word line to lower the trigger probability, and to guarantee that the Trojan is not triggered when one of the trigger cells is being written, which causes the trigger pattern to appear temporarily. If the trigger word line is not that of the v-cell (corresponding to the case where the v-cell and trigger cells are in the same row), the Trojan could never be triggered while the v-cell is accessed, making the payload a weak fault that merely tampers with cell hold stability. For realizing a trigger pattern that does not occur during conventional testing, any non-trigger-cell word line can be used to control the extra PT. However, routing from a distant word line is very difficult to implement. Without an additional metal layer or tampering with the original cell, it would only be possible at the edge of an SRAM bank, leveraging the space between banks incurred by control logic such as word line segmentation.

From the perspective of implementation, the shorting or bridging path can contain any number of series-connected PTs, and a larger number of PTs can lead to a lower trigger probability during in-field operation. However, more PTs in series will result in a larger effective on-resistance, degrading the shorting or bridging impact on the v-cell. When the PT number is above a threshold, the Trojan will only cause a weak fault upon trigger that decreases the v-cell SNM without causing other faulty behaviors. In addition, longer Trojan paths will cause larger parasitic effects on the array. Therefore we limit the number of PTs to the range [2,5]. The detailed Trojan payload effect is characterized in section 3.4. Table 3.1 describes the implemented type-1 Trojans.

Table 3.1 Implemented Trojans of Type 1.

Trojan name        Defect type  Defect terminals      # of trigger lines  Trigger lines       Payload effect                              Subject to tests
Ts Q(B) Vss 2      Short        Cell node, Vss        2                   2 other cells       Reduced hold-SNM, RDF, write failure        non-Solid DB
Ts Q(B) Vss 3      Short        Cell node, Vss        3                   3 other cells       Reduced hold-SNM, RDF, write failure        advanced DB
Ts Q(B) Vss 4      Short        Cell node, Vss        4                   4 other cells       Reduced hold-SNM, RDF, write failure        None
Ts Q(B) Vss 4 WL   Short        Cell node, Vss        5                   WL, 4 other cells   Temporarily reduced hold-SNM                None
Ts BL(B) Vss 2     Short        Bit line, Vss         2                   2 cells             Incorrect read                              non-Solid DB
Tb Q(B) Q(B) 2     Bridge       Cell node, Cell node  2                   2 other cells       RDF, write failure, coupling fault          non-Solid DB
Tb Q(B) Q(B) 3     Bridge       Cell node, Cell node  3                   3 other cells       write failure                               advanced DB
Tb Q(B) Q(B) 4     Bridge       Cell node, Cell node  4                   4 other cells       reduced hold/read-SNM                       None
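The type-1 trigger logic and the trade-off between trigger probability and path resistance can be summarized in a short behavioral sketch. It is a first-order illustration only: the series path is modeled as a logical AND of the gate nodes, the trigger probability assumes independent cell values that are 0 or 1 with equal probability, the on-resistance is taken as the sum of identical per-transistor resistances, and all function names and the resistance value are hypothetical.

```python
from typing import List, Optional, Sequence


def gate_values(cell_values: Sequence[int], trigger_pattern: Sequence[int]) -> List[int]:
    """Logic values at the PT gates.

    A cell required to hold 1 drives its PT from the true node (Q); a cell
    required to hold 0 drives its PT from the complementary node (QB).
    """
    return [q if want == 1 else 1 - q for q, want in zip(cell_values, trigger_pattern)]


def type1_triggered(cell_values: Sequence[int], trigger_pattern: Sequence[int],
                    word_line: Optional[int] = None) -> bool:
    """Series nmos PTs conduct only if every gate is at logic 1 (AND function)."""
    gates = gate_values(cell_values, trigger_pattern)
    if word_line is not None:
        gates.append(word_line)  # optional extra PT gated by a word line
    return all(g == 1 for g in gates)


def trigger_probability(num_trigger_cells: int) -> float:
    """Assumption: each trigger cell independently holds 0 or 1 with probability 1/2."""
    return 0.5 ** num_trigger_cells


def series_on_resistance(num_pts: int, r_on_ohms: float) -> float:
    """First-order estimate: identical PTs in series add their on-resistance."""
    return num_pts * r_on_ohms


if __name__ == "__main__":
    pattern = [0, 1, 0, 0]  # one of the 4-cell patterns absent from regular backgrounds
    print(type1_triggered([0, 1, 0, 0], pattern))
    print(trigger_probability(4))
    print(series_on_resistance(4, 10e3))  # 10 kOhm per PT is an arbitrary placeholder
```

Adding a trigger cell halves the trigger probability under the uniform assumption while adding one more on-resistance to the path, which is the tension behind limiting the number of PTs.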

3.3.3 Trojan Type 2: Resistive Open

The design space for mounting Trojans that cause resistive open defects in an SRAM cell, or in word lines/bit lines, is more limited than that for inserting Trojans that cause shorts or bridges. The primary reason is that shorting and bridging paths are extra circuitry placed in the SRAM layout, which does not directly change the original layout but only adds some new connections to existing nodes, while causing an open in an existing path requires inserting circuitry in the original path to control the conditional open. Since SRAM layouts are usually highly optimized and compact, it is generally very difficult to add extra transistors in an existing path without causing design rule violations in the cell. Note that in this work we use active elements, i.e. PTs, to control the states of a Trojan, instead of exploiting passive elements, e.g. creating a weak open defect spot by thinning an interconnect line, because the latter approach does not have any control mechanism, rendering Trojans always on and detectable in march tests covering the specific fault type, e.g. stuck-open fault (SOF) or data retention fault (DRF).

By exploring opportunities for mounting Trojans that introduce resistive open defects, we were able to implement different variations of one Trojan type that, upon trigger, breaks one pull-up path in a cell. Fig. 3.5 demonstrates the circuit level implementation, while omitting the trigger node connections.

Fig. 3.5. Trojans causing v-cell pull-up path broken: (a) controlled by one node; (b) controlled by two nodes.

The simplest form of the Trojan is depicted in Fig. 3.5(a), using one extra pmos PT to control the resistive open defect. As stated previously, it is difficult to insert arbitrary extra pmos transistors in the conventional SRAM layout. In fact, the space between the pull-up transistor and Vdd is the only location where an extra pmos PT can fit, as can be seen later in section 3.3.4. The PT is controlled by a trigger line Trig that turns off the PT with a value of logic 1, causing the open defect to appear. It is desirable to use multiple signals to collaboratively control the Trojan so that the trigger condition can be complex enough to bypass march tests. Complementary to the design of the type-1 Trojan, which employs a series connection of PTs to create an occasionally-on path, the design of the type-2 Trojan needs to use PTs in parallel to establish an occasionally-off path. This is because occasionally-off, which can be viewed as occasional-0, is essentially an OR function. A type-2 Trojan with two trigger control lines is illustrated in Fig. 3.5(b). Unfortunately, this is the maximum number of control lines that can be realized in a type-2 Trojan due to space limitations.
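The parallel arrangement of the type-2 Trojan can be expressed as a small truth table. The sketch below is a purely logical model with hypothetical function names: a pmos PT is taken to conduct when its gate is at logic 0, the parallel PTs keep the pull-up path connected if any one of them conducts, and the open defect appears only when every trigger line is at logic 1.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def pull_up_connected(trigger_lines: Sequence[int]) -> bool:
    """Parallel pmos PTs: the path to Vdd exists if any gate is at logic 0."""
    return any(t == 0 for t in trigger_lines)


def type2_triggered(trigger_lines: Sequence[int]) -> bool:
    """The open defect appears only when all parallel PTs are turned off."""
    return not pull_up_connected(trigger_lines)


def truth_table(num_lines: int) -> Dict[Tuple[int, ...], bool]:
    """Trigger state for every combination of trigger line values."""
    return {lines: type2_triggered(lines) for lines in product((0, 1), repeat=num_lines)}


if __name__ == "__main__":
    for lines, triggered in truth_table(2).items():
        print(lines, "open" if triggered else "connected")
```

In this model the two-line Trojan of Fig. 3.5(b) is triggered by only one of the four trigger line combinations, compared with one of two for the single-line version.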

Fig. 3.6. Implemented Trojans of type 2.

Fig. 3.6 illustrates the Trojans that have been implemented to cause resistive open defects. The implementations are based on a comprehensive exploration of the opportunities for inserting type-2 Trojans in a compact SRAM layout. The Trojan trigger signals are chosen to protect the Trojan against march tests to the greatest extent possible, as well as to ease routing of the Trojan circuitry.

In particular, the Trojan in Fig. 3.6(a) is controlled by a single trigger line, i.e. the word line of the previous row. This implementation ensures that the Trojan will not be triggered while the v-cell is being read or written, since during that time the trigger word line must be low, keeping the v-cell pull-up path on. While the v-cell is not being accessed, a DRF can happen if the v-cell is holding a 0 and the trigger word line remains high for long enough. Note that the v-cell does not have a problem holding value 1 because the Trojan can only cause an open in the inverter generating the complementary output. The time required to cause a DRF depends on the pull-down path leakage of the cell, which in our framework is on the order of 100µs. However, in either SRAM testing or deployment, DRFs can hardly happen for Trojans whose trigger conditions involve a word line: because in each cycle the word line is active for no more than the evaluation phase and remains low during the bit line pre-charge phase, the v-cell node QB can still get refreshed every cycle when it holds a 1. Therefore, this Trojan will not cause a strong fault, and naturally evades all tests. It will temporarily compromise the SNM of the v-cell when holding value 0, making the cell highly vulnerable to noise and soft errors while the trigger word line is high.

The other three Trojans are designed to be controlled simultaneously by two trigger lines. The one shown in Fig. 3.6(b) is modified from the one in Fig. 3.6(a) by introducing an additional trigger control from another cell node (Q). This lowers the occurrence probability of the soft error by 50% because soft errors can only happen when the trigger cell holds a 1.

Fig. 3.6(c) demonstrates a Trojan using the true and complementary bit lines (BLt, BLBt) of another column (t) to establish the trigger condition. The Trojan is turned on when both BLt and BLBt are high. The payload effect is twofold. If the trigger condition is satisfied, e.g. after a read operation on a cell in column t, when BLt and BLBt can both have a floating value close to logic 1, the v-cell exhibits a read(0)-after-write(0) fault in which the value flips, possibly only after multiple read operations. While not accessed, in theory the v-cell can have a DRF if BLt and BLBt remain high for a long time; however, one of the bit line voltages will always have a slight degradation during the evaluation phase, creating a weakly conducting path that prevents the DRF from occurring. In our framework with 45nm low-power CMOS technology, the v-cell exhibits (w0)(r0)^3 when the Trojan is on. Thus the Trojan will be detected by test algorithms that cover read-after-write dynamic faults while maintaining BLt and BLBt high when operating on the v-cell, e.g. March RAW [21]. However, algorithms like March SL [22], which keep one of BLt and BLBt low with a write as the last operation in the march element while accessing the v-cell, cannot detect this Trojan. Note that although March RAW is only intended to cover (w0)(r0)^2 faults, the compromised value after two read-0 operations cannot be recovered by the weakly-on bit line, and will eventually flip. Actual simulation results will be provided in section 3.4.

Finally, the Trojan given in Fig. 3.6(d) exploits a 2-cell pattern as the trigger condition, which is satisfied when the two trigger cells, which are on one side of the v-cell, hold different values. As has been discussed in sections 3.3.1 and 3.3.2, such a Trojan can evade detection in march tests with a solid background; with other backgrounds, i.e. checkerboard and column stripe in this case, march tests covering read-after-write dynamic faults or DRFs can detect the Trojan. However, the advanced march tests are generally not applied with complex data backgrounds because pattern sensitive dynamic faults or DRFs are not realistic fault models. Therefore there is a high chance that the Trojan can pass industrial standard tests. The four implemented Trojans are listed in Table 3.2.

Table 3.2 Implemented Trojans of Type 2.

Trojan name           Defect spot    # of trigger lines  Trigger lines    Payload effect           Subject to tests
To QB Vdd WL          cell pull-up   1                   WLi−1            Temporary neg. hold-SNM  None
To QB Vdd WL Q(B)     cell pull-up   2                   WLi−1, Q(B)1     Temporary neg. hold-SNM  None
To QB Vdd BL BLB      cell pull-up   2                   BLt, BLBt        (w0)(r0)^3               March RAW
To QB Vdd Q(B) Q(B)   cell pull-up   2                   Q(B)1, Q(B)2     (w0)(r0)^2, DRF          March RAW/SL/DRF tests w/ checkerboard/col. stripe DB

3.3.4 Feasibility Verification

We have implemented the proposed Trojans in a compact SRAM layout at the 45nm CMOS technology node, in which the object dimensions and spacing are minimized while not violating the design rules. The layout follows the traditional 6T layout style [49]. The same Trojan design methodology can apply to the new lithographically friendly 6T layout [49]; however, the actual feasibility of each implementation may vary due to the different layout topology.

Fig. 3.7, 3.8 and 3.9 illustrate the implementation of each type of Trojan with two representative examples. It can be seen that the Trojans are inserted by exploiting the free space in the original layout, which does not change the cell footprint or incur extra silicon area. Trojans that are not mounted inside a cell, i.e. Trojans of type 1, make no modification to the original cell layout, but only add extra connections to certain existing nodes.

Fig. 3.7. Layout of Trojans causing short defects in a compact SRAM array: (a) Ts BL Vss 2 WL; (b) Ts QB Vss 2 WL.

Fig. 3.8. Layout of Trojans causing bridge defects in a compact SRAM array: (a) Tb BLB BL 2 WL; (b) Tb QB QB 3 WL.

Fig. 3.9. Layout of Trojans causing open defects in a compact SRAM array: (a) To Q(B) Vdd WL; (b) To Q(B) Vdd WL QB.

Inserting Trojans in an SRAM layout is possible fundamentally because, in a regular array structure, the spacings between objects cannot all meet their lower limits simultaneously; there will be free space that can be exploited to fit extra circuitry. In addition, the amount of free space can surge in large SRAMs with multiple banks and complex architectures like hierarchical word lines that create extra spacing inside or between sub-arrays. This also indirectly indicates the existence of significantly more free space in digital logic circuits because of the inefficiency of automatic layout and the irregular structure, which provides an attacker opportunities to insert more complicated Trojans, e.g. sequential Trojans. Moreover, although the locations and structure of SRAM cells are to be preserved, certain objects have some flexibility in their locations. For example, since the goal of VDD-to-N-Well contacts is to ensure a uniform potential of the N-well, their locations can be slightly shifted while maintaining the necessary density, as was done in Fig. 3.7(b), 3.8(b) and 3.9(b). Such flexibility allows an attacker to re-optimize the layout to favor Trojan insertion. In digital logic the flexibility will be larger, including re-optimizing/re-sizing certain small parts of the logic.

An optimized Trojan design should share the source/drain of Trojan transistors with those of existing transistors in the SRAM as much as possible, thus avoiding inefficient contacts and metals that limit the space available for active elements. This philosophy is manifested in Fig. 3.7(a), 3.8(a) and 3.9(a), and is usually a good strategy for implementing local Trojan circuitry with a small number of transistors. For Trojans containing more transistors or creating bridging defects between two distant cells, contacts are required between active regions and interconnects. As can be seen in the layouts, metal 1 and 2 are extensively used in the SRAM array horizontally and vertically, so it is difficult to use these metal layers for Trojan interconnection. Hence we use polysilicon to build interconnects between Trojan transistors, resulting in higher interconnect resistance compared to metal. The actual resistance impact will be elaborated in section 3.4. This work is based on the conservative assumption that no other metal layers, e.g. metal 3, are available for Trojan interconnects, as that would require extra masks. In reality, SRAMs in an SoC are fabricated together with other on-chip logic, where additional metal layers would be available. In this scenario, the Trojan interconnects can be more efficient in terms of series resistance, but may incur larger load capacitance.
It is worth noting that although the original SRAM elements are guaranteed to have no design rule violations, we allow a certain extent of design rule errors in the Trojan circuitry due to space limitations. These errors can lead to broken Trojan circuitry in some ICs due to process variations, and eventually result in benign Trojans, i.e. Trojans that can never be triggered, without causing other side-effects. An example is insufficient active overlap around active contacts in Trojan type 1, which can cause broken contacts so that the Trojan path never turns on.

3.4 Simulation Results

HSPICE simulations were performed with a 45nm low-power PTM CMOS model [95]. The payload effects of each implemented Trojan are analyzed on a 32x64 SRAM array, considering the v-cell in standby mode and during read/write operations. The impact of each Trojan on other cells in the array, while it is not triggered, is also characterized. The minimal influence incurred by untriggered Trojans demonstrates the capability of the Trojans to evade conventional testing.

3.4.1 Trojan Type 1: Short

Trojan Ts QB Vss x (x∈{2,3,4}) introduces a shorting path between the complementary node of a cell and Vss, the state of which is controlled by the data pattern of 2, 3 or 4 other cells. The Trojan payload is mainly to compromise the v-cell's capability of holding and operating with value 0. The faulty behaviors of the v-cell with an activated Trojan are elaborated first below, followed by the impact of an untriggered Trojan.

Hold-0: While not operated, the v-cell can still hold value 0; however, the static noise margin (SNM) is severely compromised due to the Trojan-induced pull-down path. Fig. 3.10 illustrates the distorted DC transfer characteristics of the shorted inverter. The black and blue curves represent the butterfly curve of the v-cell when the Trojan is untriggered, which is the same as that of a Trojan-free cell. The curve is rotated by 45° to ease the measurement of SNM. The red curves correspond to the transfer characteristics of the v-cell QB transistor with an on-state Trojan. A straightforward observation is that an on-state Trojan leads to a dramatically reduced SNM for holding 0, while the SNM for holding a 1 is improved. This means the cell is extremely sensitive to noise when storing 0, which can be caused by supply voltage or temperature fluctuation, coupling from neighboring cells, and radiation. Hence, the soft error rate for the v-cell is significantly increased. It can also be seen that the impact on SNM varies with the number of trigger cells. The reason is that a smaller number of trigger cells leads to a shorting path with fewer concatenated PTs, hence lower series resistance. This aggravates the discharging of QB and makes it more difficult for the v-cell to hold a 0. The actual SNM values are provided in Table 3.3.

Read-0: In a similar way, the Trojans also reduce the v-cell read-SNM when triggered. Since read-SNM is generally much lower than hold-SNM for a cell, it turns out that the read-SNM of a v-cell with an on-state Trojan is negative for read-0. Fig. 3.11 shows that even the Trojan triggered by a 4-cell pattern, which has a relatively weak shorting path compared with the other Trojan variations, leads to a non-positive read-0 SNM. This implies that the v-cell cannot perform a read-0 correctly even in the absence of noise, as confirmed by Fig. 3.12, which shows the occurrence of RDFs when read-0 is performed on the v-cells. As a comparison, the blue curves display the correct read-0 operation of Trojan-free cells. Degraded logic-1 voltages due to Trojans can also be seen in the figure, confirming the reduced SNMs from another perspective.

Write-0: Fig. 3.13 shows that an on-state Trojan slightly shifts the write-0 trip-point of the v-cell, because the shorting path results in a stronger pull-down path in the QB transistor. The Trojans also degrade logic-1 voltages at QB because the shorting path renders the QB transistor ratioed. More importantly, transient analysis demonstrates a remarkably increased write access time due to triggered Trojans, as shown in Table 3.3. It is worth noting that the read/write access times reported here and below are measured from the beginning of the evaluation phase, excluding the bit line conditioning and address decoder propagation delay, where Trojans do not make a difference. The increased write access time causes write-0 failures in Trojan-compromised v-cells at high operating frequencies. Fig. 3.14 demonstrates write-0 failures in v-cells at a frequency at which a Trojan-free array operates normally. Similar to SNM, the impact on write access time also worsens with a reduced number of trigger cells because of stronger shorting paths. Considering a pre-charge phase of 1ns, a Ts QB Vss 2 causes a performance degradation from 543 MHz (period of 1.84ns) to 408 MHz (period of 2.45ns).
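The SNM measurement on a butterfly curve rotated by 45° can be sketched numerically as follows. This is only an illustrative sketch: the tanh-shaped inverter curve, the function names and the parameter values are assumptions made for the example and are not the HSPICE data behind Table 3.3. The two transfer curves are mapped to rotated axes, and the largest gap between them in each lobe is the diagonal of the largest inscribed square, so dividing by √2 gives the noise margin of that lobe.

```python
import numpy as np

def inverter_vtc(vin, vdd=1.0, vm=0.5, gain=200.0):
    """Idealised inverter transfer curve (hypothetical tanh model, not simulation data)."""
    return 0.5 * vdd * (1.0 - np.tanh(gain * (vin - vm)))

def butterfly_snm(f1, f2, vdd=1.0, n=20001):
    """Return the SNM of the upper-left and lower-right lobes of the butterfly
    plot formed by y = f1(x) and x = f2(y), using the 45-degree rotation method."""
    t = np.linspace(0.0, vdd, n)
    # Curve 1 is the set of points (x, f1(x)); curve 2 is the set (f2(y), y).
    x1, y1 = t, f1(t)
    x2, y2 = f2(t), t
    # Rotated axes: u runs across the lobes, v runs along the square diagonal.
    root2 = np.sqrt(2.0)
    u1, v1 = (x1 - y1) / root2, (x1 + y1) / root2
    u2, v2 = (x2 - y2) / root2, (x2 + y2) / root2
    u = np.linspace(-vdd / root2, vdd / root2, n)
    v1_u = np.interp(u, u1, v1)              # u1 increases with t
    v2_u = np.interp(u, u2[::-1], v2[::-1])  # u2 decreases with t, so reverse it
    gap = v1_u - v2_u                        # signed diagonal of the inscribed square
    upper_left = gap[u < 0].max() / root2
    lower_right = (-gap[u > 0]).max() / root2
    return upper_left, lower_right

# Symmetric cell: both lobes give the same margin.
sym = butterfly_snm(inverter_vtc, inverter_vtc)
# Skewed cell: one inverter switches early (assumed vm = 0.3), which shrinks one lobe.
skew = butterfly_snm(lambda v: inverter_vtc(v, vm=0.3), inverter_vtc)
print("symmetric lobes:", sym, "cell SNM:", min(sym))
print("skewed lobes:   ", skew, "cell SNM:", min(skew))
```

In this toy model, skewing one inverter reduces the margin of one lobe while leaving the other unchanged, which mirrors qualitatively the asymmetric effect of an on-state shorting path on the hold-0 and hold-1 margins.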

Fig. 3.10. Hold-SNM of the v-cell while Trojan is on.

Untriggered Trojan: Table 3.3 shows that an untriggered Trojan Ts QB Vss x has virtually no impact on v-cell SNM and read access time, and incurs minimal (<0.03ns) degradation of write access time due to capacitive loading. Considering a 1ns pre-charge phase, the performance degradation is below 1.6%, which is negligible for deep submicron technologies with large process variations. Moreover, the impact is characterized with respect to a tiny SRAM array of size 32x64. In realistic SRAM arrays, the access time will be much longer due to the large word line/bit line capacitance; the relative performance overhead caused by an untriggered Trojan will therefore be trivial, as it does not scale up with the array size.

The parametric impact of two other type-1 (short) Trojans is given in Table 3.4. Regarding the payload, an on-state Trojan Ts BL Vss 2 causes incorrect read-1 in the v-cell because the shorting path prevents BL from pre-charging to as high a level as BLB, which has a significant effect on the latch-based sense amplifier that only senses the bit line pair voltage difference for a short time. Trojan Ts QB Vss 4 WL incurs merely temporary hold-SNM degradation because the trigger word line (of a different row from the v-cell) never stays high for more than half a clock cycle, nor is it asserted while the v-cell is accessed.
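The frequency figures quoted for Trojan Ts QB Vss x follow from adding the 1ns pre-charge phase to the write access times of Table 3.3. The short sketch below reproduces that arithmetic; the function name is an illustrative choice and the clock period is assumed to be exactly the sum of the two phases.

```python
def max_frequency_mhz(write_access_ns, precharge_ns=1.0):
    """Maximum clock frequency when one period = pre-charge phase + write access time."""
    period_ns = precharge_ns + write_access_ns
    return 1e3 / period_ns

golden = max_frequency_mhz(0.84)       # Trojan-free array, period 1.84 ns
triggered = max_frequency_mhz(1.45)    # Ts QB Vss 2 triggered, period 2.45 ns
untriggered = max_frequency_mhz(0.87)  # worst untriggered case in Table 3.3 (x=4)

print(f"golden:      {golden:.0f} MHz")     # 543 MHz
print(f"triggered:   {triggered:.0f} MHz")  # 408 MHz
print(f"untriggered: relative degradation {100 * (1 - untriggered / golden):.2f} %")
```

The untriggered case evaluates to roughly 1.6%, whereas the triggered x=2 Trojan removes about a quarter of the achievable clock frequency.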

Fig. 3.11. Read-SNM of the v-cell while Trojan is on.

Fig. 3.12. Trojans cause read-destructive fault (RDF) in the v-cell during read-0 operation.

Table 3.3 Impact of Trojan Ts QB Vss x (x∈{2,3,4}) on a 32x64 SRAM array.

Parameters               Golden    Trojan untriggered          Trojan triggered
                                   x=2      x=3      x=4       x=2     x=3     x=4
SNM-hold (V)             0.42      0.42     0.42     0.42      0.04    0.12    0.16
SNM-read (V)             0.24      0.24     0.24     0.24      <0      <0      <0
Read access time (ns)    0.26      0.26     0.26     0.26      -       -       -
Write access time (ns)   0.84      0.85     0.86     0.87      1.45    1.06    0.99
Standby power (nW)       1.43      1.43     1.43     1.43      -       -       -
Read Energy (fJ)         118.29    118.29   118.29   118.29    -       -       -
Write Energy (fJ)        110.95    111.03   111.13   111.22    -       -       -

3.4.2 Trojan Type 1: Bridge

Trojans causing bridge defects between SRAM circuit nodes have been implemented, including Trojan Tb QB QB x (x∈{2,3,4}), which incurs bridges between the complementary nodes of two cells.

Fig. 3.13. Trojans cause shifted write-0 trip point and degraded logic-1 voltage at QB.

Fig. 3.14. Trojans cause write-0 failure (RDF) in the v-cell.

Table 3.4 Impact of two other type-1(Short) Trojans on a 32x64 SRAM array.

Parameters               Golden    Ts QB Vss 4 WL               Ts BL Vss 2
                                   Troj. untrig.  Troj. trig.   Troj. untrig.  Troj. trig.
SNM-hold (V)             0.42      0.42           0.20          0.42           0.42
SNM-read (V)             0.24      0.24           -             0.24           0.24
Read access time (ns)    0.26      0.26           -             0.26           0.27
Write access time (ns)   0.84      0.86           -             0.84           0.85
Standby power (nW)       1.43      1.43           -             1.43           -
Read Energy (fJ)         118.29    118.29         -             118.30         -
Write Energy (fJ)        110.95    111.22         -             111.38         -

Table 3.5 provides the v-cell faulty behavior incurred by Trojan Tb QB QB x (x∈{2,3,4}) in their on-states. It can be seen that as the number of PTs in the bridging path increases, the Trojan impact is reduced; eventually Tb QB QB 4 causes merely a weak fault that compromises the v-cell SNMs. On the contrary, Tb QB QB 2 leads to faulty behavior in each operation that involves different values at the two bridged nodes. In particular, read destructive faults occur when one of the v-cells is being read if the other one holds a different value. When both v-cell nodes hold value 0, the pulling-down effect of the bridging path prevents a successful write-1 operation on either of them. However, while both victim nodes hold value 1, a write-0 performed on one of them will pull down both v-nodes. This difference arises mainly because the bridging path, which consists of nMOS transistors, is strong at conducting 0 but weak at conducting 1. Therefore, in the latter case, value 1 on the standby v-node is overpowered by the 0 being written onto the other v-node. Depending on the clock frequency, the actual faulty behavior can be an instantaneous idempotent coupling fault or an RDF in the standby v-cell following the write, as shown in Fig. 3.15. Table 3.6 demonstrates the negligible impact of the untriggered Trojans on SRAM performance, power and SNM.

Table 3.5 Payload of Trojan Tb QB QB x (x∈{2,3,4}).

Operation                    x=2                   x=3                   x=4
QB1=0, QB2=1, read QB2       RDF                   Normal                Normal
QB1=1, QB2=0, read QB2       RDF                   Normal                Normal
QB1=0, QB2=0, write-1 QB2    Cell2 write failure   Cell2 write failure   Normal
QB1=1, QB2=1, write-0 QB2    CFid/RDF in Cell1     Normal                Normal
QB1=1, QB2=0, hold QB2       Normal                Normal                Normal

Table 3.6 Impact of untriggered Trojan Tb QB QB x (x∈{2,3,4}) on a 32x64 SRAM array.

Parameters               Golden    Trojan untriggered
                                   x=2      x=3      x=4
SNM-hold (V)             0.42      0.42     0.42     0.42
SNM-read (V)             0.24      0.24     0.24     0.24
Read access time (ns)    0.26      0.26     0.26     0.26
Write access time (ns)   0.84      0.85     0.86     0.86
Standby power (nW)       1.43      1.45     1.45     1.45
Read Energy (fJ)         118.29    118.35   118.35   118.35
Write Energy (fJ)        110.95    110.99   111.07   111.19

Fig. 3.15. Trojan Tb QB QB 2 causes coupling faults in the v-cells: (a) with 2ns clock period; (b) with 3ns clock period.

3.4.3 Trojan Type 2: Open

Four Trojans of type 2 were implemented as described in section 3.3.3. In particular, the two Trojans whose trigger conditions involve a word line cause only weak faults upon trigger, by temporarily reducing the hold-SNM of the v-cell to a negative value when the control word line is asserted, as shown in Fig. 3.17. Since a word line is only on for a short time during a clock cycle, this is not adequate to cause a data flip in the v-cell. However, instantaneous noise due to supply fluctuation, neighborhood signal coupling or radiation can easily tamper with the stored value due to the absence of a positive noise margin. On the other hand, the two implemented Trojans that use a data pattern or a bit line pair as the trigger condition cause explicit faulty behaviors, as the Trojan can be on during both the standby and access modes of the v-cell. As illustrated in Fig. 3.16, the on-state Trojan incurs a data retention fault with 175 µs in a standby v-cell; and the Trojans triggered by data pattern and bit line pair lead to (w0)(r0)^2 and (w0)(r0)^3 dynamic faults, respectively. The difference arises because one line of the bit line pair is pulled down slightly during the evaluation phase, causing a weakly conducting path at the open defect spot. If the word line is asserted for a long time during a clock cycle, the bit line swing could turn the Trojan pMOS PT on and eliminate the explicit payload effect. Table 3.7 demonstrates the minimal impact of the untriggered Trojans.

Fig. 3.16. Trojan To Q VDD 2 causes (a) data retention fault; (b) read-after-write dynamic faults.

Fig. 3.17. Type-2 Trojans with WL as part of the trigger condition cause temporary negative SNM in the v-cell.

Table 3.7 Impact of untriggered Trojan To QB Vdd on a 32x64 SRAM array.

Parameters               Golden    Trojan trigger lines
                                   WL       WL, Q1      Q1, Q2      BL1, BLB1
SNM-hold (V)             0.42      0.42     0.41/0.42   0.41/0.42   0.41/0.42
SNM-read (V)             0.24      0.24     0.23        0.23        0.23
Read access time (ns)    0.26      0.26     0.26        0.26        0.26
Write access time (ns)   0.84      0.85     0.85/0.86   0.85/0.86   0.85/0.86
Standby power (nW)       1.43      1.43     1.43        1.44        1.42
Read Energy (fJ)         118.29    118.35   118.30      118.35      118.37
Write Energy (fJ)        110.95    110.71   110.69      110.69      109.39

3.5 Discussion

The design space of Trojans in SRAMs is not limited to the core array cells; the attack can be mounted in row/column circuitry and other peripheral logic, e.g. word line segmentation logic, as well. In fact, the more complicated the SRAM architecture is, considering both the cell design and peripheral structure, the more opportunities an attacker will have to insert hardware Trojans. For example, modern SRAMs usually contain logic for SRAM partitioning, divided word lines and hierarchical decoders for high performance; such complicated SRAM architectures lead to an overall less compact layout, introducing more free space. An attacker can count, to some extent, on these optimizations being unavoidable as SRAM sizes increase. This eases the attacker's task, which would otherwise become more difficult as more advanced lithography and layout techniques lead to more compact SRAM designs.

The concept and overall design methodology of the SRAM Trojan can be applied to other forms of memory, e.g. DRAM or flash memory. The general model of Trojan attack in memory will remain valid, and the philosophy of designing rare trigger conditions to bypass standard tests can be reused. However, depending on the memory technology as well as the testing methods, various rare trigger conditions and payloads can be implemented. For example, in the case of NAND flash, one can design Trojans to affect the internal voltage of the NAND stack when a Trojan is triggered.

3.6 Summary

In this chapter, we have designed and implemented hardware Trojans in SRAMs that can evade industry-standard post-manufacturing testing while causing malfunction in deployment to tamper with the data integrity of embedded cache memories. The Trojans cause short, bridge or open defects in the SRAM core array with well-designed trigger mechanisms. Different variations of each type of Trojan are implemented and analyzed, exploring the design space of trigger/payload spots and trigger patterns. HSPICE simulation is performed at the 45nm CMOS technology node to demonstrate the faulty behavior caused by each implemented Trojan, as well as the minimal impact on SRAM performance, power and stability while the Trojans are not activated. The proposed Trojans are implemented in a compact layout following the traditional 6T SRAM style to demonstrate that they do not alter the SRAM cell footprint or incur silicon area overhead. Through this work, we demonstrate the general feasibility of inserting hardware Trojans in foundries by manipulating design layouts. Future work will include hardware validation through test chip fabrication and measurements, and extension of the threat model to other memory technologies.

4. TEMPORAL SELF-REFERENCING (TESR) FOR SEQUENTIAL TROJANS DETECTION

4.1 Introduction

Hardware Trojans, i.e. malicious modifications in an IC, are generally cleverly designed to evade post-manufacturing testing and various detection approaches. Conventional structural and functional testing fails to reliably detect these Trojans due to their stealthy nature and the inordinately large number of instances an adversary can exploit [57]. Hardware Trojan circuits can be either combinational or sequential [2] in nature. A combinational Trojan depends on the occurrence of rare logic values at one or more internal circuit nodes to trigger, while a sequential Trojan acts as a time-bomb, exhibiting its malicious effect after a sequence of rare events during a long period of field operation. Fig. 4.1(a) shows a generic model for a sequential Trojan. Examples of sequential Trojan circuits are a k-bit synchronous counter, as shown in Fig. 4.1(b), and a Finite State Machine (FSM) triggered by rare events in the internal nodes of the original circuit, as shown in Fig. 4.1(c). The Trojan activation condition is referred to as the trigger condition, while the node that is affected when the Trojan is triggered is referred to as the payload. The individual state transition conditions are referred to as partial trigger conditions (PTC). Another kind of sequential Trojan [3], with a passive payload, consists of a Linear Feedback Shift Register (LFSR) which is used to leak the secret key in cryptographic hardware by aiding side-channel attacks, as shown in Fig. 4.1(d). Various hardware Trojan detection techniques have been presented earlier, each of them having their own merits and limitations. In fact, most of the techniques act as complementary detection mechanisms providing their unique coverage for particular Trojan models.
For example, leakage current based Trojan detection schemes can be extremely powerful for large sequential Trojans that contribute significantly to the leakage power traces, whereas logic testing-based schemes are suited for activating and identifying the presence of small combinational Trojans which can easily be missed in the process noise. Generally speaking, side-channel based approaches are advantageous in detecting sequential Trojans compared to logic testing as they do not require functionally triggering the Trojan. However, previous side-channel approaches suffer from reduced sensitivity due to process variations, and rely on a golden IC instance which is usually hard to obtain. In [8] we proposed a novel side-channel analysis approach, referred to as Temporal Self-Referencing or TeSR, for efficient detection of these Trojans. The proposed approach can eliminate the effects of both die-to-die and within-die process variations, as well as local noise induced by other design marginalities. It also avoids the requirement of a reference or golden IC to isolate Trojan effects by comparing a chip's transient current signature with itself, but at a different time window.

Fig. 4.1. (a) Sequential Trojan model and examples: (b) Synchronous Counter, (c) Rarely-triggered Finite State Machine (FSM), (d) MOLES Trojan [3].

In this chapter, we extended our work in [8] in the following aspects: 1) We developed an algorithm to systematically generate TeSR test sets to guarantee the effectiveness of the approach and improve Trojan detection coverage; 2) We elaborated the analysis of TeSR with respect to sequential Trojan types, and presented a Design-for-Security (DfS) technique to facilitate TeSR in testing against very hard-to-detect sequential Trojans; 3) In addition to validating TeSR on three large sequential IP cores, the effectiveness of the approach is demonstrated by comparing its Trojan detection capability with an intra-die process calibration based approach; 4) Experimental validation was conducted on an FPGA platform with a Xilinx Virtex-II XC2V500 device.

TeSR focuses on identifying sequential Trojans, which typically represent a greater threat than their combinational counterparts, since an intelligent attacker can take advantage of a few state elements to create a complex Trojan with rare trigger conditions. The main insight on which the work is based is that when a Trojan-free circuit is made to undergo the same set of state transitions multiple times, the transient current "signature" should remain constant over different time windows. However, in a Trojan-infected circuit, the overall current signature varies over multiple time windows for the same set of state transitions of the original circuit, due to uncorrelated state transitions in the Trojan. TeSR can be employed without process calibration of golden chips, as it is performed independently for each IC.
To the best of our knowledge, this is the first side-channel analysis approach for Trojan detection that 1) completely mitigates the effect of die-to-die and within-die process noise (both random and systematic); 2) cancels the effect of other design marginalities; and 3) avoids the need for a golden reference chip.

The rest of the chapter is organized as follows. In Section 4.2, we present the related work in the field of Trojan detection and the scope of the proposed approach in comparison to other complementary approaches for hardware Trojan detection. In Section 4.3, the temporal self-referencing concept is illustrated with a motivational example which indicates how TeSR overcomes the challenge of process variations and other design marginalities. The main methodology is described in Section 4.4, along with details of test generation and circuit characterization. Section 4.5 contains the simulation results and experimental validation of the proposed approach. We conclude in Section 4.6.

Fig. 4.2. Comparison of challenges and scope of different Trojan detection approaches.

4.2 Background and Scope

4.2.1 Related Work

Traditionally, two types of hardware Trojan detection techniques have been proposed in the literature: (a) logic-testing based techniques and (b) side-channel analysis-based techniques. Sequential Trojans can be extremely hard to detect using logic testing approaches [57] because the sequence of rare events required to cause all the state transitions in the Trojan, finally leading to the activation of its payload, is highly unlikely to be satisfied during test-time. Logic testing approaches depend on comparing the functional behavior of a circuit under test (CUT) with that of a golden or reference circuit. These approaches are usually more effective for detecting combinational Trojans activated by rare values at the internal circuit nodes [10]. Furthermore, if the Trojan trigger mechanism is independent of the circuit operations (e.g. Fig. 4.1(b)), logic testing techniques become completely ineffective.

On the other hand, since side-channel analysis is based on noting the Trojan effect on physical side-channel parameters such as current or delay, it can be very effective in detecting large sequential Trojans. These approaches do not require the Trojan to be completely activated and its malicious effect propagated to primary outputs in order to be detected. However, traditional side-channel analysis approaches suffer from reduced sensitivity with ever-increasing inter-die and within-die process variation effects [61]. Though the Trojan circuit's activity is reflected in the supply current, the effect can be easily masked by process noise, leading to false positive/negative decisions [29]. Hence, existing approaches tend to use process calibration techniques with a known set of golden ICs in order to obtain the golden trend. Any deviation from the trend (beyond a pre-defined threshold) signifies the presence of a Trojan circuit.

In [63], the authors use current measurement from multiple ports along with calibration techniques and statistical analysis to alleviate the effect of process and environmental variations. In [31], correlations between multiple side-channel parameters such as transient supply current and maximum operating frequency are used to identify a golden trend line which minimizes the effect of process noise. Further experimental analysis to study the effect of inter-die and intra-die variations on Trojan detection sensitivity is presented in [32, 33]. In [34], a formal extension of this method to combine multiple modalities for Trojan detection is discussed. Region-based test vector generation [35,37,38,62] has been shown to increase Trojan detection sensitivity for large circuits.
Other methods include path-delay fingerprint calibration [11], ring-oscillator based delay calibration [39, 78] and gate-level characterization [42–44] of leakage and delay parameters for all gates in the original design under process variations for identifying the presence of extra gates post-fabrication.

Nevertheless, existing side-channel methods cannot completely mitigate the influence of process variations in hardware Trojan detection since they depend on measurements made from multiple ICs to compare and make the decision. In addition, all of the earlier approaches rely on the availability of a set of golden ICs (usually obtained by destructive testing of a sample of untrusted ICs) or complete characterization of the golden design, which can be of exponential complexity for large designs. Complementary to these approaches are run-time functional validation approaches [45, 46], which require high design overhead, but provide a last line of defense for identifying the presence of Trojans in mission-critical systems.

4.2.2 Scope of the proposed Trojan detection approach

The temporal self-referencing approach is effective for generic sequential Trojans, which are modeled as in Fig. 4.1(a). The state transition conditions (Cti) are derived from combinations of rare internal node values (T1, T2, ..., Tn). The Trojan causes a malfunction at its payload in state StT after it goes through the state transition sequence St0, St1, ..., StN. We can assume without loss of generality that the Trojan FSM is often confined to the states before StT during test-time; otherwise it would cause a malicious effect at its payload, and be detected by functional testing approaches [57]. This accounts for detection of the combinational Trojans and small sequential Trojans (very few states). We can also detect large, distributed, sequential Trojans, which are very likely to cause a sufficient change in a side-channel parameter such as leakage current beyond the process noise [29, 31, 63]. The various challenges for non-invasive side-channel approaches are presented in Fig. 4.2 along with their relative scope.

The attack model considered in this chapter assumes a trusted RTL design but considers Trojan insertion possible at any stage of IC development after RTL design and verification, given the various verification techniques and phases in front-end IC design, which can prevent Trojan insertion by insiders to a large extent. It is generally accepted that there is no silver-bullet solution which can detect Trojans of all possible sizes and types. While the proposed TeSR scheme is suitable for sequential Trojan attacks of various forms and sizes, it provides a distinct advantage for small sequential Trojans, which easily evade logic testing and existing side-channel approaches due to process variations. Statistical logic testing approaches [57] can be used as a complementary technique to TeSR since they have high coverage for ultra-small combinational Trojans which do not produce a significant effect on a side-channel parameter but can get triggered easily. The only known limitation of the TeSR approach is due to temporal variations induced by measurement noise, and solutions to reduce their effect are also discussed here.

4.3 Motivational Examples

As a motivational example of self-referencing based sequential Trojan detection, we simulated a 32-bit DLX processor circuit (with ∼20,000 logic gates) in HSPICE using the 70nm Predictive Technology Model (PTM) [64]. Test vector sets are designed to fill the pipeline with repeated "NOP" or "ADD" instructions, causing controlled activity in one pipeline stage at a time. Multiple instances of the processor were considered, non-infected and infected, at different process corners, to demonstrate the existence of a time-invariant (but process-dependent) signature in each non-infected IC. The Trojan circuit is a free-running synchronous 8-bit binary counter (see Fig. 4.1(b)), which causes malfunction when it reaches the maximum count value (i.e. after 64 cycles of continuous operation), which is considered, for illustrative purposes, to be beyond the test application time. The measured side-channel parameter is the average transient supply current in each clock cycle.

Due to process variations, device parameters shift from their nominal values. Fig. 4.3(a) shows the effect of process variations on the transistor threshold voltage (VT), which can vary due to inter-die as well as intra-die variations, and these can have both random and systematic components [61]. The effect of process variations was simulated using Monte Carlo simulations in HSPICE with ±20% variations in inter-die VT and intra-die variations having a standard deviation (σ) of 10%. These variations can mask the effect of an inserted Trojan circuit, as evident from the overlap in the simulated average current distribution of the golden and Trojan-containing circuits in Fig. 4.3(b). The overlap is prominent for large circuits with small Trojans, which makes it difficult to choose a single threshold value to distinguish between infected and non-infected ICs, causing large mis-classification errors.
For side-channel analysis based Trojan detection techniques to be dependable, the effect of process variations and design marginalities must be eliminated. We note that if we compare the current signature for the same IC, when it is subjected to the same test stimulus under the same experimental setup but in different time windows, we can isolate the temporal variations in Trojan current (if present). This is because the recorded current trace over different time windows consists of two components: (a) a correlated component because of identical state transitions of the circuit, and (b) an uncorrelated component due to the switching in the Trojan circuit. Fig. 4.4(a) shows the cycle-by-cycle average transient current trace of the DLX circuit for two windows, where it was repeatedly brought to the same state and made to go through the same set of state transitions. The current trace corresponding to the state transitions can be clearly distinguished by its repetitive nature, and forms a "current signature" for this state transition sequence, as seen in the bottom plot of Fig. 4.4(a), where the current signatures from the two windows are superimposed on each other. In Fig. 4.4(b), the current signatures for the same two windows are plotted for a non-infected die at a different inter-die process corner. Process variations cause considerable change in the golden signature from chip to chip, but the signature for the same IC instance remains time-invariant. The same effect holds true even under intra-die process variations, as seen in Fig. 4.4(c). Now, let us consider the current signatures for a Trojan-infected DLX circuit in Fig. 4.4(d). Since the Trojan state machine undergoes a set of state transitions uncorrelated to the original circuit, the current traces in the two time windows differ substantially. This example motivates the use of temporal self-referencing as a high-sensitivity Trojan detection scheme.

Fig. 4.3. (a) Circuit-level parameter variations can be due to inter-die or intra-die variations in device parameters. (b) The effect of process variations on the average transient current can mask the effect of a Trojan circuit.

Fig. 4.4. Effectiveness of temporal self-referencing in detecting Trojans even amidst process variations: (a) Golden Nominal; (b) Golden with Inter-die Process Variations; (c) Golden with Intra-die Process Variations; (d) Trojan Nominal.
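The argument of this section can be illustrated with a toy numerical model. The sketch below is not the HSPICE experiment: the per-cycle activity values, the multiplicative process scale, the current per toggling flip-flop (`i_toggle`) and the counter start values are all assumptions chosen for illustration. It models the correlated component as a fixed per-cycle activity scaled by a process factor, and the uncorrelated component as the number of flip-flops that toggle in a free-running binary counter.

```python
import numpy as np

def counter_toggles(start, cycles, bits=8):
    """Flip-flops toggling on each increment of a free-running binary counter."""
    mask = (1 << bits) - 1
    return np.array(
        [bin((n & mask) ^ ((n + 1) & mask)).count("1") for n in range(start, start + cycles)],
        dtype=float,
    )

def window_signature(base_activity, process_scale, trojan_start=None, i_toggle=0.05):
    """Per-cycle average current of one time window (arbitrary units)."""
    current = process_scale * np.asarray(base_activity, dtype=float)
    if trojan_start is not None:
        # The counter state differs from window to window, so this term is uncorrelated.
        toggles = counter_toggles(trojan_start, len(base_activity))
        current = current + process_scale * i_toggle * toggles
    return current

def self_reference_distance(base_activity, process_scale, starts=None):
    """Euclidean distance between the signatures of two windows of the same chip."""
    first, second = (None, None) if starts is None else starts
    s1 = window_signature(base_activity, process_scale, first)
    s2 = window_signature(base_activity, process_scale, second)
    return float(np.linalg.norm(s1 - s2))

base = np.random.default_rng(0).uniform(1.0, 2.0, size=8)  # hypothetical activity profile
for scale in (0.8, 1.0, 1.2):  # hypothetical inter-die process corners
    golden = self_reference_distance(base, scale)
    infected = self_reference_distance(base, scale, starts=(0, 13))
    print(f"process scale {scale}: golden {golden:.3f}, infected {infected:.3f}")
```

In this model the golden distance is zero at every process corner, because the process factor multiplies both windows identically, while the infected distance stays non-zero because the counter starts each window in a different state.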

4.4 TeSR Methodology

Fig. 4.5 illustrates the basic concept of the proposed Temporal Self-Referencing (TeSR) hardware Trojan detection approach. First, the FSM in the original circuit is excited to go through a sequence of states SD1, SD2 ... SDn for the purpose of Trojan detection, which is called Test trial #1. These state transitions are triggered by specifically derived test patterns ({Vtest}) that can maximize Trojan circuitry activity. Therefore, certain Trojan FSM transitions can be expected to take place. In this example, a single transition of the Trojan FSM is assumed for the sake of simplicity.

When the original FSM reaches state SDn, another set of test patterns is applied to bring the FSM back to state SD1, during which the Trojan FSM can have zero or more transitions. Without loss of generality, we use ST3 to denote the Trojan FSM state after re-initialization, where ST3 is not necessarily different from ST2 but is expected to be different from ST1. At this point, {Vtest} is applied again to excite the original FSM to traverse the same sequence of states SD1, SD2, ..., SDn, which is denoted as Test trial #2. During this process, the Trojan FSM can undergo state transitions starting from ST3, which differ from its transitions in Test trial #1 because ST3 is not the same as ST1. The transient current IDDT is characterized for each clock cycle during Test trial #1 and Test trial #2, and the two trials are then compared. The original circuit exhibits exactly the same switching activity in the two trials, given the same initial state and the same set of test patterns. However, since the Trojan FSM starts from a different state in each trial, its state transitions and combinational logic switching differ, leading to a different IDDT. Therefore, if there is any difference beyond measurement noise between the measured IDDT of the circuit under test during the two test trials, one can infer that it is caused by a sequential Trojan. The underlying assumption is that Trojan FSMs will have a state transition diagram uncorrelated with that of the original circuit FSM; otherwise the Trojan can be detected easily with logic testing.

Fig. 4.5. Basic concept of TeSR.

The major steps of the TeSR methodology are shown in Fig. 4.6. It involves both test generation and current measurement-based circuit characterization. Starting with the set of generated test vectors, input vectors are applied to take the circuit to each of the starting states Sinit of the pre-determined test trials. Once the circuit is in a desired starting state Sinit, a corresponding set of test vectors {Vtest} is applied, which takes the circuit through a fixed set of state transitions in order to produce the characteristic current trace. The current signature is computed by taking the average of the transient current waveform for each cycle. The difference metric for comparing the current signatures of two windows is taken as the Euclidean distance between the two current signatures.
If one or more of the current traces differ from the average current trace over multiple windows by more than a pre-defined, design-independent threshold that accounts for temporal measurement noise, the IC is inferred to contain a Trojan.

This technique is repeated for various test sets starting from different states Sinit in order to ensure detection coverage of different kinds of possible Trojan instances.
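The two-trial idea can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the 2-bit original FSM (`fsm_next`), the 3-bit free-running Trojan counter, the vector values and the use of flip-flop toggle counts as a stand-in for per-cycle IDDT are all hypothetical and are not taken from the experiments in this chapter.

```python
def toggles(a, b):
    """Number of flip-flops that change value between two state encodings."""
    return bin(a ^ b).count("1")


def fsm_next(state, vector):
    """Hypothetical 2-bit original FSM: the input vector is added modulo 4."""
    return (state + vector) % 4


def run(state, trojan, vectors, trojan_bits=3):
    """Apply vectors and return (per-cycle activity, final FSM state, final Trojan state).

    trojan is None for a golden circuit; otherwise it is the count of a
    hypothetical free-running Trojan counter that advances every clock cycle.
    """
    trace = []
    for v in vectors:
        nxt = fsm_next(state, v)
        activity = toggles(state, nxt)          # original circuit switching
        if trojan is not None:
            nxt_trojan = (trojan + 1) % (1 << trojan_bits)
            activity += toggles(trojan, nxt_trojan)  # extra Trojan switching
            trojan = nxt_trojan
        trace.append(activity)
        state = nxt
    return trace, state, trojan


def two_trials(infected):
    """Test trial #1, re-initialization, then Test trial #2 with the same vectors."""
    v_test = [1, 1, 1]    # drives the FSM 0 -> 1 -> 2 -> 3
    v_reinit = [2, 3]     # drives the FSM 3 -> 1 -> 0, back to the first state
    trojan = 0 if infected else None
    trial1, state, trojan = run(0, trojan, v_test)
    _, state, trojan = run(state, trojan, v_reinit)
    assert state == 0     # original FSM is back in its starting state
    trial2, _, _ = run(state, trojan, v_test)
    return trial1, trial2


golden_trials = two_trials(infected=False)
infected_trials = two_trials(infected=True)
print("golden  :", golden_trials)
print("infected:", infected_trials)
```

In this toy model the golden circuit produces identical traces in both trials, while the counter has advanced between trials and changes the second trace, which is the effect TeSR relies on.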

Fig. 4.6. The major steps of the TeSR for Trojan detection.

The recorded current traces might have small, random temporal variations due to transient measurement noise, supply voltage fluctuations or temperature variations. The effect of these noise components can be minimized by maintaining stable experimental conditions during testing. Also, random measurement noise can be largely eliminated by averaging the transient current waveform over multiple measurement runs, each of which could start from power-on reset in order to ensure that the Trojan circuit is also re-initialized. The pre-defined decision threshold also determines, to some extent, the limit of the detection efficiency of the proposed approach. For ultra-small Trojans, the variation in current signature might fall beneath the noise floor. However, the detection sensitivity can be increased by using previously proposed approaches, such as region-based test generation [35] and the use of measurements from multiple supply pins [63], which decrease the background current.

4.4.1 Test Generation

The test generation procedure is divided into two parts. First, a statistical test pattern generation approach, MERO [57], is applied to generate sets of test patterns (corresponding to the test trials) that maximize Trojan circuit activity. Second, reachability analysis is performed on the FSM of the original circuit to identify transition paths which bring the FSM back to the first state of each test trial.

The detailed steps of the test generation procedure are provided in Algorithm 1. First, reachability analysis is performed on the original FSM to identify states which can seldom be reached (in terms of number of paths) from other states (steps 1, 2 and 3). These states are not suitable to be used as the first state of a test trial. In particular, step 1 performs a reachability analysis on the original circuit FSM to identify all paths between every state pair, forming a database of elements p = {Ss, Se, v}. Here v represents the complete information of the path starting from state Ss and ending at state Se, which contains all intermediate states on the path and the input vectors that trigger each intermediate state transition. In step 2, the identified paths are grouped according to the destination states Se. These destination states are to be used as the first state for MERO test trials. The groups are then sorted in descending order of size (step 3), where a larger size indicates an Se that can be reached through more paths. Only the first several destination states are selected as candidates from which to apply MERO sets, in order to guarantee a good chance of re-initializing the FSM. These states form the MERO initial state set {Sinit}.

Next, in step 4, MERO vectors are generated and applied from each element S0 in the initial state set {Sinit}, and the corresponding end states SF are recorded. Note that the test generation procedure considers only the logic activity of the circuit; the transient current signature is captured later, in the signature characterization phase. Next, a search is performed within the reachability database for elements p = {SF, S0, vi}, where vi represents a generic path. If no such path exists, the current initial state S0 is not reachable from the end state of the current set of MERO vectors, and the current MERO set is abandoned for this S0. However, this does not mean the MERO set cannot be applied for other elements in {Sinit}. If a single path p = {SF, S0, vi} exists in the reachability database, the end state of the MERO vectors can be brought back to S0 through p, and the entire TeSR test set can be expressed as {VM, v, VM}, where v stands for the test vector sequence that realizes transition path p. If multiple such paths exist, the MERO set is re-applied after each of them, as listed in Algorithm 1.

In fact, the procedure of re-initializing and re-applying MERO sets can be repeated multiple times to increase the chance of capturing Trojan circuit activity during the repeated MERO tests. As stated before, the basic requirement for TeSR to be effective is that the Trojan FSM starts from different initial states when the MERO test sets are applied repeatedly. When this requirement is satisfied, the circuit IDDT signatures vary among the test trials, and the difference can contain combinational and sequential switching components. In particular, if Trojan FSM state transitions only occur during the re-initialization procedure and not in the test trials, the captured IDDT discrepancy is due only to different switching of the combinational next-state logic, and the amount and distribution of the discrepancy depend on how strongly and how frequently the MERO sets trigger Trojan activity. On the other hand, if the Trojan FSM has state transitions within the test trials, the IDDT difference is partially contributed by sequential switching and can be much more significant. Repeating the MERO sets along with the re-initialization test vectors multiple times improves the probability of Trojan FSM state transitions during both procedures by statistically increasing Trojan circuit activity. This leads to a larger IDDT difference and hence improves the chance of Trojan detection.

Algorithm 1 TeSR test pattern generation

Step 1: Perform FSM state traversal/reachability analysis and form a database of p = {Ss, Se, v}
Step 2: Group p according to Se
Step 3: Sort the groups in descending order of size and select the first N Se as the initial states of MERO vectors ({Sinit})
Step 4: ∀ S0 ∈ {Sinit}
    Apply MERO vector set VM from S0, which ends at state SF
    PBB ← number of paths p = {SF, S0, vi} in the database
    if PBB = 0 then
        abandon this VM
    else if PBB = 1 then
        TeSR set = {VM, v, VM}
    else
        TeSR set = {VM, v1, VM, v2, ..., VM, vPBB, VM}
    end if

Algorithm 2 elaborates the reachability analysis, which is based on breadth-first traversal. S0 is the root state under consideration. G is the FSM state transition graph (STG) in adjacency-list representation, in which each edge (corresponding to one state transition) has an associated property v(S1, S2) indicating the set of input vectors that can trigger this transition. The adjacency-list representation is used instead of an adjacency matrix because most FSM STGs are sparse graphs, and the adjacency list also simplifies the image computation of each state.

Reached stands for the set of states reachable from S0, which is the goal of the entire calculation. Frontier represents the current frontier states as the breadth-first traversal proceeds. Function Img(Si, G) calculates the states that are reachable from Si in one step, and is defined as follows, where S is the set of states in G, and E ⊆ S × S is the set of edges in G:

$$\mathrm{Img}(S_i, G) = \{\, S' \in S \mid (S_i, S') \in E \,\} \tag{4.1}$$

In fact, the image computation can be easily realized by looking into the adjacency list of the root state, as all the directly reachable states are stored in the same list. As implied by the name, breadth-first traversal expands the search uniformly across the frontier [intro to Algo]. During the traversal, the input vector set given by the transition property v(S1, S2) is appended to that of the previous path, and the resulting sequence of input vector sets is associated with each newly identified reachable state Sj as property Sj·I. The iterative process continues until no new states beyond Reached are encountered, namely until Frontier is empty.
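The reachability analysis and the test-set assembly of Algorithm 1 can be sketched together in a few lines. This is a minimal illustration, not the tool used in this work. The four-state STG, the single-character input vectors and all function names are hypothetical. As a simplification, the sketch keeps only one (shortest) path per state pair, so PBB is either 0 or 1, and it uses a FIFO queue instead of explicit Frontier sets, which visits states in the same breadth-first order.

```python
from collections import defaultdict, deque


def bfs_paths(stg, root):
    """Breadth-first traversal from root.

    stg maps each state to a list of (next_state, input_vector) edges.
    Returns {reachable_state: input-vector sequence that leads to it}.
    """
    paths = {}
    frontier = deque()
    for nxt, vec in stg[root]:              # Img(root, G)
        if nxt not in paths:
            paths[nxt] = [vec]
            frontier.append(nxt)
    while frontier:
        si = frontier.popleft()
        for sk, vec in stg[si]:             # Img(si, G)
            if sk not in paths:             # newly reached state
                paths[sk] = paths[si] + [vec]
                frontier.append(sk)
    return paths


def build_database(stg):
    """Step 1: database of (Ss, Se) -> path, one shortest path per pair."""
    return {(ss, se): v for ss in stg for se, v in bfs_paths(stg, ss).items()}


def select_initial_states(db, n):
    """Steps 2-3: group by destination Se, sort by group size, keep the first n."""
    group_size = defaultdict(int)
    for _ss, se in db:
        group_size[se] += 1
    return sorted(group_size, key=lambda s: (-group_size[s], s))[:n]


def run(stg, state, vectors):
    """Logic-only simulation of the original FSM under a vector sequence."""
    for vec in vectors:
        state = next(nxt for nxt, v in stg[state] if v == vec)
    return state


def tesr_set(stg, db, s0, vm):
    """Step 4: {VM, v, VM} if the end state SF can be brought back to S0."""
    sf = run(stg, s0, vm)
    back = db.get((sf, s0))
    if back is None:
        return None                         # abandon this VM for this S0
    return list(vm) + back + list(vm)


# Hypothetical STG: state D cannot be reached from A, B or C.
STG = {
    "A": [("B", "a"), ("A", "b")],
    "B": [("C", "a"), ("A", "b")],
    "C": [("C", "a"), ("A", "b")],
    "D": [("A", "a"), ("D", "b")],
}
db = build_database(STG)
s_init = select_initial_states(db, 2)
test_set = tesr_set(STG, db, "A", ["a", "a"])
print(s_init, test_set)
```

Here `["a", "a"]` plays the role of a MERO set VM applied from state A. Handling several return paths per state pair, as in the last branch of Algorithm 1, would require enumerating more than one path per pair in `build_database`.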

4.4.2 Circuit Characterization

Algorithm 2 FSM reachability analysis

Reached = Frontier = Img(S0, G)
for each Sj ∈ Frontier do
    Sj·I = v(S0, Sj)
end for
while Frontier ≠ ∅ do
    Frontier1 = ∅
    for each Si ∈ Frontier do
        Temp = Img(Si, G)
        for each Sk ∈ Temp do
            if Sk ∈ Reached then
                Temp = Temp − Sk
            else
                Sk·I = ⟨Si·I, v(Si, Sk)⟩
            end if
        end for
        Frontier1 = {Frontier1, Temp}
    end for
    Frontier = Frontier1
    Reached = {Reached, Frontier1}
end while

From the generated test vector set, a sequence of test vectors is applied which takes the circuit to the state Sinit, followed by the set {Vtest}, which makes the circuit go through a fixed set of state transitions in order to produce the characteristic current trace. The current signature is computed by taking the average of the transient current waveform for each cycle. The difference metric for comparing the current signatures of two windows is taken as the point-wise Euclidean distance between the two current signatures. If one or more of the current traces differ from the average current trace over multiple windows by more than a pre-defined noise threshold, the IC is inferred to contain a Trojan. This noise threshold can be obtained by taking multiple current measurements with constant activity (reset state) to characterize the background noise in the measurement setup. Unlike other side-channel Trojan detection approaches, we do not require one or more golden ICs to determine the threshold or to calibrate process or measurement noise.
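The comparison step can be sketched as follows, taking each signature as a list of per-cycle average currents. The way the threshold is derived from the reset-state measurements (largest observed deviation times a safety margin), the `margin` value and all numbers are assumptions made for illustration; the text only states that reset-state measurements are used to characterize the noise.

```python
import math


def euclidean(sig_a, sig_b):
    """Euclidean distance between two per-cycle current signatures."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


def mean_signature(signatures):
    """Cycle-by-cycle average over several windows."""
    n = len(signatures)
    return [sum(column) / n for column in zip(*signatures)]


def deviations(signatures):
    """Distance of each window's signature from the average signature."""
    reference = mean_signature(signatures)
    return [euclidean(sig, reference) for sig in signatures]


def noise_threshold(reset_signatures, margin=1.5):
    """Assumed rule: largest deviation seen with constant activity, times a margin."""
    return margin * max(deviations(reset_signatures))


def contains_trojan(window_signatures, threshold):
    """Flag the IC if any window deviates from the average by more than the threshold."""
    return any(d > threshold for d in deviations(window_signatures))


# Hypothetical per-cycle averages in microamperes.
reset_runs = [[10.0, 10.0, 10.0], [10.2, 9.8, 10.0], [9.8, 10.2, 10.0]]
clean_windows = [[12.0, 15.0, 11.0], [12.1, 14.9, 11.0]]
suspect_windows = [[12.0, 15.0, 11.0], [14.0, 13.0, 12.0]]

threshold = noise_threshold(reset_runs)
print(contains_trojan(clean_windows, threshold))
print(contains_trojan(suspect_windows, threshold))
```

All signatures come from the same IC, so no golden chip appears anywhere in the sketch.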

4.4.3 Trojan Detection Sensitivity

The sensitivity of a simple side-channel approach based on comparison of a measured physical parameter I can be defined in terms of various noise effects and different calibration techniques. For example, in a simple side-channel approach, considering the ideal situation with no noise, any golden circuit is expected to have the measured parameter value Iorig. The deviation introduced by an extra Trojan circuit causes the measured value for an infected chip to be IT = Iorig + ∆IT. The sensitivity, in the absence of noise, is proportional to ∆IT and inversely proportional to Iorig. Now, in the presence of measurement noise Inmeas and process noise Inproc, the measured values of the non-infected circuits can vary from Iorig by Inmeas + Inproc. The process noise Inproc is a time-invariant constant which affects different ICs differently. It can further be decomposed into inter- and intra-die components, with the intra-die component having systematic and random sub-components. The measurement noise Inmeas has a temporal variation (due to temperature and other factors) for the same IC and a dc offset due to the measurement circuitry. Considering the simple side-channel analysis approach, sensitivity can be defined as:

$$\mathrm{Sens} = \frac{I_T - I_{orig}}{(I_{orig} + I_{n_1}) - (I_{orig} + I_{n_2})} = \frac{\Delta I_T}{\Delta I_{n_{meas}} + \Delta I_{n_{proc}}}$$

Existing side-channel approaches tend to perform process calibration by using normalization (or process-corner estimation) and measurement noise calibration by averaging over multiple measurements to get rid of random noise. In order to get rid of inter-die variations and calibrate systematic intra-die variations, region-based approaches are used, where measurements from multiple power pins, corresponding to activation of distinct regions, help to compare the measured parameter from the same IC under different circumstances (self-referencing). By using a region-based approach, one can also increase the sensitivity, since the Iorig value is reduced and any noise which is proportional to the measured value (e.g. process noise) is reduced as well. However, by using a temporal self-referencing approach like TeSR, we can completely eliminate process noise, since we are comparing measurements for the same input vectors for the same IC under different time windows. Hence,

$$\mathrm{Sens}_{TeSR} = \frac{\Delta I_T}{\Delta I_{n_{meas}}}$$

The ∆IT in this case is the difference in activity within the Trojan circuit at different time windows, since the original circuit will have the same value of the measured parameter for the same set of vectors. The only factor which limits the sensitivity of this approach is the time-varying component of measurement noise. This can be reduced by performing the measurements in a temperature-controlled test environment with high-quality test equipment, as done in standard semiconductor testing facilities in the industry. Moreover, by averaging measurements over multiple cycles, we can further increase the sensitivity.
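As a rough worked illustration of the gain from averaging, assume (this is not stated in the text) that the time-varying measurement noise is zero-mean, independent between repetitions, and has standard deviation $\sigma_{meas}$. Averaging $N$ repeated measurements of the same window then gives

$$\sigma_{avg} = \frac{\sigma_{meas}}{\sqrt{N}}, \qquad \mathrm{Sens}_{TeSR}(N) \approx \sqrt{N}\,\mathrm{Sens}_{TeSR}(1)$$

Under these assumptions, $N = 16$ repetitions would improve the sensitivity by about a factor of 4. Correlated disturbances, such as slow temperature drift, would reduce this gain.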

4.4.4 Role of Scan Chain

In order to improve testability of sequential circuits, test engineers typically use various “Design for Test” (DfT) measures such as scan-chain insertion. If the sequential elements (flip-flops) in the design are implemented as scan flip-flops and connected in a chain, any value can be loaded into them in the testing phase, thereby reducing the test generation problem to that for a combinational circuit, which is much more tractable computationally. The trade-off between the degree of testability and the associated design overhead leads circuit designers to adopt partial-scan approaches, where only a few selected flip-flops are part of the scan chain. If the design is equipped with full scan, it is easy to initialize the entire circuit to any particular state from which the current signature is to be measured. For the state diagram shown in Fig. 4.7(a), it is possible to take the circuit to the desired state S10 to start the test application procedure for Trojan detection. However, it must be noted that the easily identifiable standard test control (TC) signal can be used by the attacker to disable the Trojan or to synchronize the Trojan state machine with the test application phase. This would defeat the purpose of temporal self-referencing as, in this case, the Trojan current signature would be invariant for each application of the same test sequence. In order to avoid this, we need to perform side-channel current signature measurement in the normal functional mode of the circuit and not in any easily identifiable test mode.

However, full-scan designs can still be used to aid the test application process for temporal self-referencing. As shown in Fig. 4.7(a), one of the desired test sequences to be applied is {V1, V2, V3}, starting from the initial state S10. For each window, we need to compare the current signature obtained during application of this test pattern for Trojan detection using temporal self-referencing. For this, we need to ensure that the Trojan is not at the same state each time we take the circuit to the state S10. We note that by using the full-scan capability of the design, we can initialize the circuit to different initial states S0, S4 or S13, which are close to the desired state S10, and then use input vector sequences obtained from reachability analysis to direct the circuit to the desired state along different paths. In other words, the scan chain allows one to set the original FSM to a state from which it can easily be re-initialized to S10. This can reduce the test time and increase diversity in the re-initialization paths, thus improving the chance of causing Trojan switching activity. There is another constraint on the lengths of the scan-facilitated initialization paths (e.g. from S0, S4, or S13 to S10).
Suppose the Trojan is of the free-running synchronous counter type, and it gets reset with the TC signal (in the full-scan case) or the reset signal (in the no-scan case). In this case, if the lengths of all scan-facilitated initialization paths are equal, the Trojan would be at the same state irrespective of the actual path taken to arrive at Sinit. The effect of the Trojan on the overall current signature would then be identical for every run, and temporal self-referencing would be unable to detect the Trojan. Hence, the lengths of the scan-facilitated initialization paths should all be different, which would ensure that the Trojan state machine is at a different state for each of the runs. Ideally, the lengths of the paths should be mutually prime, which would eliminate the possibility of Trojan FSM state coincidence at Sinit in different test trials. Hence, the overall set of patterns required to record a characterization dataset is given by V = {(Sscan, {Cp}, {Vtest})}, where the set of vectors {Cp} takes the circuit from state Sscan (set via scan chain) to Sinit, and |Cp| is mutually prime for different paths p for the same Sinit.

Fig. 4.7. Test application strategy considering the state transition diagrams for (a) full-scan and (b) no-scan designs. The example test signature consists of the average current for vectors I1, I2 and I3 applied when the circuit is in state S10. Different paths are used to arrive at state S10 to get the same current signature for the golden circuit but different signatures if a Trojan is present and shows some activity for the particular test set under consideration.
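The path-length constraint can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, and the Trojan is assumed to be a free-running counter with a known period that is reset when the scan load ends, so its state at Sinit is simply the path length modulo the period.

```python
from itertools import combinations
from math import gcd


def pairwise_coprime(lengths):
    """True if every pair of initialization-path lengths is mutually prime."""
    return all(gcd(a, b) == 1 for a, b in combinations(lengths, 2))


def counter_states(lengths, period):
    """State of an assumed free-running counter Trojan on arrival at Sinit."""
    return [n % period for n in lengths]


def distinct_states(lengths, period):
    """True if the assumed counter is in a different state for every path."""
    states = counter_states(lengths, period)
    return len(set(states)) == len(states)


paths = [3, 4, 5]                      # hypothetical |Cp| values for one Sinit
print(pairwise_coprime(paths))         # lengths satisfy the coprimality rule
print(counter_states(paths, 8))        # residues for an assumed 3-bit counter
print(distinct_states([3, 11], 8))     # coprime lengths that differ by the period
```

For a counter of period P, what matters in this model is that the lengths differ modulo P, so a residue check of this kind can complement the coprimality rule when candidate Trojan periods are being considered.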

4.4.5 DfS for Detecting Transition-Proof Trojans

The effectiveness of TeSR requires the Trojan FSM to start from a different state in each test trial. To achieve this, the MERO test generation algorithm is applied to maximize the switching activity of the Trojan circuitry and thus the frequency of Trojan FSM state transitions. However, it is particularly difficult to make certain types of sequential hardware Trojans exhibit capturable transitions, as they tend to remain stuck in certain state(s). The STG of one such Trojan is provided in Fig. 4.8. Each state transition towards the final state requires an input vector from a pre-defined set; upon any other input vector, the FSM goes back to the initial state. Since the difficulty of satisfying a rare event sequence grows exponentially with the length of the sequence, the FSM of this type of Trojan stays in the initial state most of the time. Examples of such Trojans are the ones monitoring a particular input or internal variable sequence, which trigger the payload only when the expected sequence is satisfied in consecutive clock cycles and otherwise return to the initial state. We name this type of sequential Trojan a Transition-Proof Trojan (TP Trojan).

Fig. 4.8. STG of transition-proof Trojan.

Fig. 4.9. (a) Flip-flops in original circuit FSM; (b) DfS-enhanced flip-flops.

It is difficult to capture TP Trojans in states other than the initial state. Therefore, in this case, TeSR loses its power because the Trojan FSM, like the original circuit FSM, starts from the same (initial) state in each test trial. To solve this problem, we propose a Design-for-Security (DfS) technique which can freeze the original circuit FSM in any state (provided a full scan chain is available) and test the circuit IDDT signature on a per-cycle basis. The DfS-enhanced design is illustrated in Fig. 4.9. The original state elements of the circuit take inputs from the next-state logic and produce outputs to the next-state logic. With the DfS enhancement, the flip-flops work as usual in normal mode, but retain their values in the Trojan detection mode (En = 1) while the next-state logic still switches due to the test vectors applied at the primary inputs. Therefore, in the Trojan detection mode, the switching activity of the original circuit depends purely on the primary inputs and consists of only combinational switching. Any switching current uncorrelated with the input vectors must be caused by the Trojan state elements. This means that if we apply the same input vectors multiple times and observe different IDDT, we can claim the existence of sequential Trojans. It is worth noting that the attack model assumes trusted RTL, which means Trojan insertion can only happen in back-end design (in IC layout) or in foundries. Since our DfS technique is implemented in RTL, as explained in more detail later, a Trojan FSM will not have the DfS feature to freeze its states.
Therefore, it can still cause different IDDT signatures during multiple test trials due to its sequential property.

In particular, we expand each MERO test set to an enhanced set VMEROe as follows:

$$V_{MERO} = \{v_1, v_2, \ldots, v_m\} \;\Rightarrow \tag{4.2}$$

$$V_{MEROe} = \{v_1, v_1, v_1, v_2, v_2, v_2, \ldots, v_m, v_m, v_m\} \tag{4.3}$$

By tripling each test vector in VMERO, we can zoom in our observation by comparing the circuit IDDT cycle by cycle. Of the three cycles in which vi is applied, IDDT is measured in the second and third. Since vi is applied in the first cycle and the computation of the next-state logic outputs is completed there, no switching activity should be measured when vi is applied again in the second and third cycles. However, if input vector vi triggers a Trojan state transition, we should be able to observe non-zero (and different) switching current, either due to another Trojan state transition (e.g. a return to the initial state) or due to different Trojan combinational logic switching (because of different FF values serving as Trojan next-state logic inputs). Such a “zoom-in” test allows us to identify TP Trojans. Considering that Trojan state transitions may be triggered only under certain original FSM states, the above test needs to be performed for various original FSM states. The initialization can be realized by shifting in the desired state values through the full scan chain, or by running the circuit in normal mode for a deterministic amount of time and then asserting the Trojan detection enable signal.

One drawback of directly inserting multiplexers (MUXes) in the design netlist is that the array of MUXes renders the enable signal (En) easily visible to attackers who explore the design layout or netlist. To avoid such exposure, two measures are taken when implementing the DfS technique. First, the FSM state freezing is realized during the RTL design phase by modifying the STG to include an arc from each state to itself. For states that do not have a path to themselves, such a path is added with the transition condition “En = 1”. For states that already have a path to themselves, the transition condition is altered to “Cond ∥ En = 1” (the original condition OR-ed with En), where “Cond = 1” is the original transition condition. Second, a separate primary input En may be identifiable, especially when it appears repeatedly near the flip-flops.
Therefore, we implement a small FSM to activate En with a particular sequence of input vectors:

$$\mathrm{seq}(V_{en1}, V_{en2}, \ldots, V_{enN}) \rightarrow En = 1 \tag{4.4}$$

where N is the number of input patterns in the sequence. These input vectors lie outside the functional vector set, to make sure En is not asserted unintentionally during normal operation of the chip. En is deactivated by the global reset signal. The entire design modification is done in RTL; hence the enable signal and the MUX logic are merged into the next-state logic and then mapped to the designated technology library. No DfS signals are left exposed, and the DfS feature is difficult to detect without thorough reverse engineering of the design.
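The zoom-in test with tripled vectors described earlier in this section can be sketched with a toy model. Everything concrete here is a hypothetical stand-in: the 4-bit next-state logic, the two-vector trigger sequence of the TP Trojan, and the use of toggle counts as a proxy for per-cycle IDDT. The original state is held frozen, as in the Trojan detection mode, while the Trojan state is not.

```python
def toggles(a, b):
    """Number of bits that differ between two values (switching proxy)."""
    return bin(a ^ b).count("1")


def triple(v_mero):
    """Expand {v1, ..., vm} to {v1, v1, v1, ..., vm, vm, vm}."""
    return [v for v in v_mero for _ in range(3)]


def next_state_logic(state, vector):
    """Hypothetical combinational next-state logic of the original circuit."""
    return (state ^ vector) & 0xF


def trojan_step(t_state, vector, trigger_seq):
    """TP Trojan: advance on the expected vector, otherwise fall back to state 0."""
    if vector == trigger_seq[t_state]:
        return (t_state + 1) % len(trigger_seq)
    return 0


def zoom_in(v_mero, frozen_state, trigger_seq=None):
    """Return (cycle 2, cycle 3) activity for each vector of the MERO set."""
    logic_out = next_state_logic(frozen_state, 0)  # inputs assumed idle at 0 beforehand
    t_state = 0
    per_cycle = []
    for v in triple(v_mero):
        new_out = next_state_logic(frozen_state, v)   # state flip-flops stay frozen
        activity = toggles(logic_out, new_out)
        logic_out = new_out
        if trigger_seq is not None:                   # Trojan flip-flops are not frozen
            new_t = trojan_step(t_state, v, trigger_seq)
            activity += toggles(t_state, new_t)
            t_state = new_t
        per_cycle.append(activity)
    return [(per_cycle[i + 1], per_cycle[i + 2]) for i in range(0, len(per_cycle), 3)]


v_mero = [3, 5, 7]
golden_zoom = zoom_in(v_mero, frozen_state=0b1010)
trojan_zoom = zoom_in(v_mero, frozen_state=0b1010, trigger_seq=[5, 9])
print(golden_zoom)
print(trojan_zoom)
```

In this model the golden circuit shows no activity in the second and third cycles of any vector, whereas the vector that matches the first trigger value makes the Trojan leave and re-enter its initial state, which shows up as non-zero activity in those cycles.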

4.4.6 Summary of Test Considerations

In this work, we target the detection of Trojan instances which affect the transient current signature of a circuit. In particular, the effectiveness of our approach is due to the following features:

• Nature of the inserted Trojan: An inserted Trojan instance is sequential in nature, either running independently (e.g. the binary counter of Fig. 4.1(b)) or triggered by rare events at the internal nodes of the circuit (e.g. the asynchronous counter of Fig. 4.1(c)). A DfS technique can be adopted to facilitate TeSR in testing against Transition-Proof Trojans. The proposed approach needs to be augmented with a functional testing approach, which is effective for detecting ultra-small combinational Trojans that do not have a time-varying signature.

• Test application: The circuit can be brought to the same state multiple times by state transitions along different paths. Starting from this state, the circuit can then be made to traverse a pre-determined set of state transitions to produce current signatures that can be used for comparison.

• Variation of Trojan current signature: The effect of an inserted Trojan on the current signature varies with the change in state of the Trojan circuit. Circuit-level design techniques have been proposed earlier to equalize the switching currents for all state transitions in CMOS circuits in the context of securing cryptographic circuits against power-analysis attacks [65]. However, such circuits are known to cause over 2X increase in area/power, which can make them easily detectable by simple current analysis. Besides, current balancing techniques suffer from reduced effectiveness under process variations. Process-variation-tolerant current balancing circuits require asynchronous design techniques, which increase the overhead further [52].

• Elimination of process variation effects and design marginalities: The effects of both intra-die and inter-die process variations, as well as the effects of design marginalities, on the transient current signature depend solely on the IC instance under test and the set of state transitions for which the dataset is recorded. Hence, self-referencing eliminates all effects of process-induced variations and design marginalities on the current signature by comparing multiple datasets for the same set of state transitions of the same IC instance.

4.5 Results

In this section we evaluate the effectiveness of temporal self-referencing for the detection of sequential Trojans. We present both simulation results and measurement results obtained from FPGA experiments.

4.5.1 Test Setup

We used three test circuits to validate the proposed Trojan detection approach: 1) an AES cipher circuit with an equivalent area of 22,386 two-input NAND gates (i.e. ∼10^5 transistors) and about 30% of the total area contributed by memory elements, 2) a 32-bit pipelined Integer Execution Unit (IEU) with 20,775 two-input gates, and 3) a 32-bit DLX processor with a 5-stage pipeline and about 19,338 two-input gates (mentioned in Section 4.3). We introduced three types of sequential Trojan circuits in each of the designs to investigate the scalability of the approach. The first Trojan is a k-bit synchronous counter as shown in Fig. 4.1(b), where k was varied from 1 to 10 bits. The second Trojan is a synchronous finite state machine (see Fig. 4.1(c)) with 6 flip-flops. The partial trigger condition is a 9-bit value derived from the rare-valued internal nodes of the original circuit. The third Trojan, as shown in Fig. 4.1(d), is a Linear Feedback Shift Register (LFSR) with 20 flip-flops, modeled on the MOLES Trojan described in [3], which leaks the secret key inside the AES circuit by modulating a pseudo-random number generator to assist in side-channel attacks.

All circuits were designed (or obtained from [53]) in Verilog and synthesized using Synopsys Design Compiler and a LEDA library. Circuit simulations were carried out for the 70nm Predictive Technology Model (PTM) [64] using the HSPICE simulator. We used Monte Carlo simulations to model the effect of inter- and intra-die variations in VT. The measurement noise from recorded current waveforms (as explained later in Section 4.5.3) was characterized to generate random Gaussian noise in MATLAB, which was used in our simulations to model the effect of temporal variations. The test vectors were generated based on the algorithm described in Section 4.4.

Fig. 4.10. IEU with an 8-bit counter.

4.5.2 Simulation Results

Fig. 4.10 shows the plot of average current over each clock cycle for a 32-bit Integer Execution Unit (IEU) as the original design, with and without an 8-bit counter as the Trojan. The average current trace (blue for non-infected and red for Trojan) shows repetition between the two windows corresponding to the signature, which are highlighted using the black rectangles. The current signatures for the two windows are superimposed, and the difference between the two signatures is also plotted. It can be clearly observed that there is a significant difference in the signatures for the two windows due to the presence of the Trojan. Similar waveforms are plotted for other non-infected and Trojan-infected circuit combinations in Fig. 4.11 and Fig. 4.12. Note that, unlike existing process calibration approaches, we do not need to compare the current signature between non-infected and infected ICs to detect the presence of a Trojan.

Fig. 4.11. AES with a MOLES Trojan LFSR (Linear Feedback Shift Register).

Fig. 4.12. DLX with a FSM Trojan.

The slight difference in current signatures for the original circuit is due to measurement noise, which is superimposed on the supply current waveforms. The noise threshold is obtained from the measurement data, and any difference larger than that is attributed to the presence of a Trojan. The difference metric values for the different circuits with different Trojan instances are shown in Table 4.1. The difference for a non-infected IC is also shown for comparison, which falls within the noise threshold. Table 4.1 also lists the test length obtained using our test vector generation tool, which causes each rare node to go to its rare value N = 20 times in order to activate arbitrary combinations of rare nodes as possible Trojan state transition conditions. Note that, for ICs containing Counter- or LFSR-type Trojans, the entire test set need not be applied for detecting their presence.

Table 4.1 Difference metric and test length for three designs with three types of Trojan instances.

Design  Test Length  Difference Metric (µA)
                     No Trojan  Counter  FSM     LFSR
IEU     752          2.68       47.26    214.30  89.88
AES     1161         3.11       87.09    215.30  78.28
DLX     605          2.96       4.10     33.90   33.63
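To make the spread in Table 4.1 easier to read, the values can be expressed relative to the no-Trojan difference of the same design. The ratio used below is only an illustrative figure of merit computed from the table entries; it is not the decision rule used in this work, and the names in the snippet are hypothetical.

```python
# Difference metric values (in microamperes) copied from Table 4.1.
TABLE_4_1 = {
    "IEU": {"No Trojan": 2.68, "Counter": 47.26, "FSM": 214.30, "LFSR": 89.88},
    "AES": {"No Trojan": 3.11, "Counter": 87.09, "FSM": 215.30, "LFSR": 78.28},
    "DLX": {"No Trojan": 2.96, "Counter": 4.10, "FSM": 33.90, "LFSR": 33.63},
}


def margins(table):
    """Ratio of each Trojan's difference metric to the no-Trojan value of that design."""
    return {
        (design, trojan): value / row["No Trojan"]
        for design, row in table.items()
        for trojan, value in row.items()
        if trojan != "No Trojan"
    }


ratio = margins(TABLE_4_1)
weakest = min(ratio, key=ratio.get)   # smallest separation from the baseline
for key in sorted(ratio, key=ratio.get):
    print(key, round(ratio[key], 2))
```

Sorted this way, the counter Trojan in DLX has the smallest separation from its baseline, while the FSM Trojans in IEU and AES have the largest.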

Fig. 4.13. Difference metric for varying size of a sequential Trojan inserted in the 32-bit IEU circuit, using TeSR and other process-calibration approaches.

(a) Experimental Setup (b) Oscilloscope (c) Recorded Waveforms (d) Measurement Noise

Fig. 4.14. Experimental setup using FPGA-based board and measured current waveforms for validating the TeSR approach.

Next, we insert Trojan counters of different sizes in the IEU circuit to estimate the sensitivity of the TeSR approach and compare with existing approaches which perform process calibration. We varied the Trojan size from 20 bits to 2 bits and the corresponding values of the difference metric are plotted in Fig. 4.13. The process calibration technique is modeled using normalization of measured current to estimate the process corner and reduce it to the nominal value. The uncalibrated process noise is 1.6mA, which is reduced to 84.43µA after calibration. Hence, counters of size greater than 14 flip-flops or equivalent Trojans can be detected by using process calibration techniques. To further increase sensitivity, we use the TeSR approach which can detect Trojans having more than 2 flip-flops and is limited only by measurement noise of 2.76µA. Any smaller Trojan will activate its malicious payload in less than 4 cycles and be detected using logic testing approaches. Note that, since TeSR compares the difference in Trojan activity over multiple time windows, the difference metric values are less than the normalized IDDT metric used in the process-calibration approach.
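The window comparison behind these numbers can be sketched as follows. This is only an illustration, not the tool used in this work: the function names, the RMS form of the difference metric, and the sample waveforms are assumptions made for clarity. The only value taken from the text is the 2.76µA measurement noise, used here as the decision threshold.

```python
import math

def difference_metric(window_a, window_b):
    """RMS difference between two current signatures of the same chip.

    Assumption: an RMS difference stands in for the difference metric;
    the exact metric definition is not reproduced here.
    """
    if len(window_a) != len(window_b):
        raise ValueError("windows must have the same number of samples")
    squared = sum((a - b) ** 2 for a, b in zip(window_a, window_b))
    return math.sqrt(squared / len(window_a))

def is_suspect(window_a, window_b, noise_threshold):
    """Flag the chip if the two time windows differ by more than the noise."""
    return difference_metric(window_a, window_b) > noise_threshold

# Hypothetical current samples in microamps for the same input pattern
# applied in two different time windows.
window_1 = [100.0, 250.0, 180.0, 90.0]
clean_window_2 = [101.0, 249.0, 181.0, 89.0]     # measurement noise only
trojan_window_2 = [110.0, 262.0, 188.0, 99.0]    # extra Trojan state activity

NOISE_THRESHOLD_UA = 2.76  # measurement noise quoted in the text

print(is_suspect(window_1, clean_window_2, NOISE_THRESHOLD_UA))
print(is_suspect(window_1, trojan_window_2, NOISE_THRESHOLD_UA))
```

Because both windows come from the same die, process variation affects them equally and cancels in the subtraction, which is why only measurement noise sets the threshold in this sketch.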

4.5.3 Experimental Validation

Hardware validation of the proposed side-channel approach was performed using an FPGA-platform where FPGA chips were used to emulate the ASIC scenario. We wanted to observe the effectiveness of the proposed approach to isolate the Trojan effect in presence of process variations, when a golden design and its variant with Trojan are mapped to the FPGA devices. Such an FPGA-based test setup provides a convenient platform for hardware validation using different Trojan types, sizes and even different designs. The selected FPGA device was Xilinx Virtex-II XC2V500 fabricated in 120nm CMOS technology. In order to measure IDDT, we measured the voltage drop across a sense resistor (0.5Ω), using high-side current sensing strategy. To increase the accuracy of measurements amidst measurement noise, a custom test board was designed with the sense resistor connected between the core VDD pins and the on-board bypass capacitors. A differential probe was used to measure the voltage waveforms, which were recorded using an Agilent mixed-signal oscilloscope (100MHz, 2GSa/s). The waveforms were synchronized with a 10 MHz clock input from the oscilloscope and are recorded over 16 cycles corresponding to a pattern of 16 input vectors. A “SYNC” signal is used to correspond to the first input vector in the set, so that the current can be measured for the same vectors in all cases. The test setup is shown in Fig. 4.14.

To observe the effect of measurement noise and other temporal variations in our simulation results, we used the characteristics of the noise obtained in real measurements (see Fig. 4.14(d)) to generate random Gaussian noise in MATLAB, which was used in our simulations. Here, we performed measurements for the DLX processor mapped to different FPGA chips. The varying current signature for three of the 10 chips at different process corners is shown in Fig. 4.15. One of the chips contains an 8-bit counter Trojan. It can be clearly observed that the Trojan-containing instance has a difference metric which falls above the noise threshold.

Fig. 4.15. Measurement results for DLX with 8-bit counter Trojan.
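The two numerical steps in this setup, converting the sensed voltage drop to current and injecting Gaussian noise into simulated waveforms, can be sketched as below. Python stands in for the MATLAB step here, and the function names, sample voltages and noise sigma are hypothetical; only the 0.5Ω sense resistor value comes from the text.

```python
import random

SENSE_RESISTOR_OHMS = 0.5  # sense resistor value quoted in the text

def iddt_from_voltage_drop(v_drop_volts, r_sense=SENSE_RESISTOR_OHMS):
    """Ohm's law per sample: I = V / R across the high-side sense resistor."""
    return [v / r_sense for v in v_drop_volts]

def add_gaussian_noise(current_amps, sigma_amps, seed=None):
    """Superimpose zero-mean Gaussian noise on a simulated current waveform.

    Assumption: the noise is characterized by a single standard deviation
    extracted from real measurements.
    """
    rng = random.Random(seed)
    return [i + rng.gauss(0.0, sigma_amps) for i in current_amps]

# Hypothetical voltage samples (volts) read through the differential probe.
v_drop = [0.010, 0.025, 0.015]
iddt = iddt_from_voltage_drop(v_drop)          # amps

# Hypothetical sigma; a fixed seed keeps the noisy waveform reproducible.
noisy_iddt = add_gaussian_noise(iddt, sigma_amps=2.76e-6, seed=1)
print(iddt)
```

Fixing the seed is a convenience for repeatable simulation runs; a real noise model would be fitted to the recorded noise waveform.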

4.6 Summary

In this chapter, we have presented TeSR, a hardware Trojan detection approach aimed at detecting sequential hardware Trojans, a class of Trojans that is more difficult to isolate and more capable of performing various malicious functions than its combinational counterpart. The proposed approach provides higher detection sensitivity under large process noise, and hence is suitable for nanoscale process technologies. It facilitates detection of small, rarely-activated sequential Trojans, which can be extremely difficult to detect using existing logic testing or side-channel approaches. The approach leverages the uncorrelated temporal variations in the transient current signature of sequential hardware Trojans to isolate their effect from process and measurement noise. By comparing the current signature of a chip for the same input pattern at different time windows, it can completely eliminate the effects of both die-to-die and local within-die parameter variations, as well as various design marginalities, which can cause local deviations in current signature leading to a large number of false positives/negatives. The proposed approach also eliminates the need for golden or reference ICs, which are difficult and highly expensive to obtain. The simulation and experimental validation results verify that the proposed method can be very effective in isolating chips with hard-to-detect sequential Trojans of varying forms and sizes, which can easily evade logic testing and other side-channel approaches.

5. SIDE-CHANNEL ANALYSIS BASED REVERSE ENGINEERING (SCARE) FOR POST-SILICON VALIDATION

5.1 Introduction

Since the Cold War era, reverse engineering (RE) has been considered a powerful tool to analyze electronic hardware for gaining competitive intelligence or for commercial piracy. Although commonly believed to be illegal, RE is allowed in most countries around the globe for analysis, evaluation or teaching purposes [54]. In military and many mission-critical applications, RE can provide enabling technology for post-silicon validation of integrity and reliability of complex chips, which are designed and fabricated in untrusted environments [55]. For the semiconductor industry, RE has become an attractive (and often, the only) option for claiming hardware Intellectual Property (IP) rights in a court of law. This requirement has led to the formation of a number of industrial entities, e.g. ChipWorks [56], dedicated to reverse engineering and the analysis of microchips and electronic systems.

In recent years, IC trust has emerged as a critical concern in the semiconductor industry. Dictated by economic reasons, the modern semiconductor design and fabrication flow involves third party IP cores, outsourced design and test services, as well as CAD tools supplied by third-party vendors. Lack of control over the design and fabrication steps greatly increases the vulnerability to hardware Trojan attacks. An attacker can mount these Trojan attacks to cause malfunction during field operation or leak secret information from inside a chip. Both side-channel and logic-testing based non-invasive approaches have been proposed earlier in the context of Trojan detection [58] when golden chip instances are available. However, because the fabrication facility is untrusted in most cases, golden chips, which are needed to benchmark and detect compromised chips, are hard to obtain and demand reverse engineering.

Image recognition based structural extraction involving de-packaging and de-layering an IC has been conventionally used as a reverse engineering approach [56]. Such a method is highly expensive, time-consuming, and destroys the chip. Since the chip “validated” in this way can no longer function, it cannot be used as the benchmark for detecting other potentially compromised chips. On the other hand, some functional RE approaches have been investigated in recent years, e.g. [59] and [60]. Yet the complexity of logic testing approaches increases dramatically with the circuit size, especially in absence of full-scan testability in the design. More importantly, logic testing based approaches rely on random test vector generation, which can fail to detect extraneous undesired functions reliably if the functions are activated and observed only under rare conditions [57]. Moreover, logic testing approaches aim to identify only the Boolean functions and treat the actual structural connectivity as transparent, which means they can overlook Trojans that violate design parameters without changing the logic function.

In this chapter, we propose a top-down, hierarchical unified side-channel and logic testing approach that can extract both structural and functional information from a manufactured IC. The method assumes the availability of a golden design (not golden chip instance), and can be extended to scenarios without the golden design [41].

Fig. 5.1. Untrusted stages of the IC manufacturing flow. Steps of the proposed methodology to perform non-invasive RE and trust validation.

Fig. 5.1 illustrates the proposed top-down approach. This approach is valuable in two contexts: (1) For validating a golden chip instance as a Trojan detection benchmark, it is a significantly lower-cost, more time-efficient and more reliable choice than image recognition and logic testing based reverse-engineering approaches. (2) When the method is considered directly as a Trojan detection approach, it is applicable to a comprehensive range of Trojan types with no need for a golden chip, because it provides circuit structural information with the resolution of a single gate. Also, by using temporal and spatial self-referencing, this approach is robust to significant process noise. Comparatively, conventional logic testing and side-channel based approaches are limited by their effective Trojan ranges, lower resolution, vulnerability to environmental noise, and need for a golden chip. When extended to no-golden-design scenarios, the proposed approach can depend only on the datasheet specifications to detect malicious hardware inserted in any stage of the design and fabrication flow. The hierarchical approach is scalable to large designs.

5.2 Background

Malicious insertions are usually cleverly designed so as to be rarely triggered during normal operation. The reasons for the failure of logic testing based approach to detect hardware Trojans are as follows:

1. Exhaustive enumeration is impractical for large designs, especially for sequential designs with/without scan-chains, creating chances of omitting rare events which trigger hardware Trojans.

2. Trigger of sequential malicious insertions requires a sequence of unknown rare events, which can hardly be achieved even with exhaustive testing, and cannot be triggered during one-time exhaustive enumeration. Moreover, state-elements in such sequential Trojans could use rare switching activity of internal circuit nodes as their clock signal, which again lowers Trojan trigger possibility, rendering them almost transparent in logic-based circuit extraction.

On the other hand, side-channel analysis based approaches [58] using transient current (IDDT), quiescent current or path delay fingerprint [79] have been proposed for Trojan detection in untrusted ICs. The main deterrent to such approaches is the large amount of process-induced parameter variations [61] which can mask the effect of malicious circuitry on the measured side-channel parameter. To overcome this drawback, various statistical techniques have been proposed to make process-invariant self-similarities in the design get reflected in the measured side-channel parameter such as transient supply current [62]. While logic values at the primary output reflect only the Boolean function with respect to the present state and primary inputs, the current waveform contains information about relative timing of different paths in the form of glitches, which can reveal significant information about internal structure, such as number of switching gates for particular vector pairs and their connectivity. Similarly, quiescent leakage current [63] contains information about all the gates in an IC, but it is difficult to observe the effect of small Trojans on the total leakage current, hence such methods have decreasing sensitivity for large designs. Therefore we choose an IDDT based side-channel approach.

5.3 Methodology

Transient current (IDDT) signature of an IC in response to input transitions contains structural information of an IC including connectivity and dependency among blocks. However, to identify structural blocks of an IC from its current signature, two major challenges have to be addressed: (1) avoid the aliasing effect due to simultaneous switching of multiple blocks; (2) eliminate the effect of process variations and measurement noise. We adopt a novel side-channel analysis approach, referred to as self-referencing, which compares an IC with itself, either spatially between two or more regions or temporally between two time instances. The idea of spatial self-referencing can be explained using Fig. 5.2, which shows that the self-similarity of circuit blocks can be exploited hierarchically to identify constituent logic sub-blocks in structured logic. Similarly, temporal self-similarities in current signature are used to build a transient current signature library containing process and technology independent current signatures for each datapath block. The overall flow of the automated reverse engineering approach is illustrated in Fig. 5.3. Next, we describe key steps in detail with specific examples.

Fig. 5.2. Spatial self-referencing for identifying hierarchical functional blocks.

Fig. 5.3. Main steps of the proposed approach for IC reverse engineering.

Fig. 5.4. Current signatures of RCA and CSA adders for 45nm and 65nm technology nodes used for self-referencing based reverse engineering.

From the golden structural block diagram, functional blocks are defined along with their input/output dependencies. Next, functional vector sets are generated targeting activation of specific blocks [62]. In circuits with pipeline stages, temporal self-referencing can be used to restrict the switching activity to one stage by appropriate choice of vectors. Spatial self-referencing can also be used to identify parallel structural blocks and homogeneous array structures such as memory.

The next step is to isolate the sequential and combinational parts of switching current by using the correlations between the switching at the positive and negative edges of the clock. By using a slow clock, all the combinational switching can be confined to the positive half-cycle and the switching in the negative half-cycle corresponds only to the master stages of the flip-flops and clock coupling current. After the sequential current and the switching current during memory access have been subtracted from the stage current, the switching current caused by combinational circuit activity can be isolated.
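The subtraction step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the waveforms are already sampled and time-aligned, the clock has a 50% duty cycle with each cycle starting at the rising edge, and all function names and sample values are hypothetical.

```python
def split_half_cycles(cycle_samples):
    """Split one clock cycle's samples into (positive half, negative half).

    Assumption: 50% duty cycle, cycle begins at the rising clock edge.
    """
    mid = len(cycle_samples) // 2
    return cycle_samples[:mid], cycle_samples[mid:]

def isolate_combinational(stage, sequential, memory):
    """Subtract sequential and memory-access current from the stage current."""
    if not (len(stage) == len(sequential) == len(memory)):
        raise ValueError("waveforms must be sampled on the same time grid")
    return [s - q - m for s, q, m in zip(stage, sequential, memory)]

# Hypothetical current samples in mA over the positive half-cycle.
stage_current      = [5.0, 9.0, 4.0, 1.0]
sequential_current = [2.0, 1.0, 0.5, 0.5]   # flip-flop and clock contribution
memory_current     = [0.0, 3.0, 1.0, 0.0]   # memory access contribution

combinational = isolate_combinational(
    stage_current, sequential_current, memory_current
)
print(combinational)  # [3.0, 5.0, 2.5, 0.5]

# With a slow clock, the negative half of a cycle would hold only the
# master-stage and clock-coupling current.
positive_half, negative_half = split_half_cycles([5.0, 9.0, 4.0, 1.0, 0.6, 0.4, 0.3, 0.2])
```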
Due to their regular structures, standard datapath elements exhibit technology and process independent transient current features in response to specific input test patterns, which can be exploited to identify their specific types and implementation. A signature library based on relative features of transient current shapes, e.g. waveform correlation and number of observable ripples, is built after comprehensive characterization of different datapath elements and their standard implementations. One can match the measured signature with macro-elements from the library to confirm the implementation specified in the golden netlist. Signature characterization is performed with the following perspectives:

(1) Architecture-specific signature information: One can sensitize different paths in a circuit which relate to some particular functional behaviors, and manifest information of structural features. For example, overall topology (e.g. flattened structure or blocked structure) of an adder can be revealed by transitions involving carry propagation.

(2) Temporal self-referencing: Transient current signatures can be obtained by comparing switching current for different transitions that trigger the same part of the circuit.

(3) Spatial self-referencing: Structural symmetry causes similar transient current for different transitions, helping in detection of repeated structures at high level (e.g. parallel structures) and low level (e.g. repeated full-adders in multi-bit adder).

Adder: Fig. 5.4 provides an instance demonstrating all the above three perspectives. Current waveforms for two test sets containing 3 vector pairs each are obtained.

(1) Set S1 contains vectors to perform single bit addition without carry propagation. In particular, three vector pairs i, ii, and iii are used to perform single-bit addition at bit0, bit1 and bit8 and the current waveforms for two types of adders are shown in Fig. 5.4(a) and (b) for two technology nodes. Test vectors used on the Ripple Carry Adder (RCA) give closely matching current waveforms for all three vectors, implying that RCA contains a repeated bit-wise structure. In the case of Carry Save Adder (CSA), the shape of switching current for different operations depends on the relative bit position inside its block (4 bits are grouped as a block). This can be observed in Fig. 5.4(a), where current waveforms match for addition in the same relative positions inside each block. Besides, from Fig. 5.4(a) and (b) we can see the invariance in shape in terms of relative features across different technology nodes.

(2) Set S2 consists of vectors to activate carry propagation paths of different lengths to explore self-similarity inside the adder architectures, by propagating the carry from the carry-in bit, bit3 and bit7. In the top sub-figure of Fig. 5.4(c) we can clearly see the rippling effect in supply current which indicates the carry propagating to the Most Significant Bit (MSB) for RCA. The overlapping current for the 3 vectors confirms the ripple propagation of the most significant 8 bits. For the blocked CSA, if the carry is at the input of a block, the triggered blocks have the same switching activity, forming the block propagation signature (red and blue traces in the center sub-figure of Fig. 5.4(c)). However, if generated inside a block, the block propagation waveform will only appear when the carry propagates to the next block (the green curve). Similar signatures can be derived at another technology node (65nm), as shown in Fig. 5.4(d), again confirming the technology and process independent nature of the signatures.

Quantitatively, cross-correlation is performed between pairs of shapes to measure the similarity. Then the correlation values in response to different vector pairs are digitized and multiplied together to obtain the overall correlation with respect to one test set. Finally, signatures from all test sets collaboratively define the actual signature of a datapath element implementation.

Multiplier: Current signatures exploring structural self-similarity of multipliers can be obtained in a similar way. For 8-bit Array Multiplier, the following transitions are applied: T1: (0x02, 0x00)→(0x02, 0x01); T2: (0x04, 0x00)→(0x04, 0x01); T3: (0x08, 0x00)→(0x08, 0x01); T4: (0x10, 0x00)→(0x10, 0x01); T5: (0x20, 0x00)→(0x20, 0x01); T6: (0x40, 0x00)→(0x40, 0x01); T7: (0x80, 0x00)→(0x80, 0x01). The corresponding switching current is shown in Fig. 5.5. In each transition, only one partial product is made to be 1 and propagate to one primary output through a series of full adders. Regularly increasing number of ripples in the switching current indicates an array structure.

Fig. 5.5. Self-referencing current signatures of 8-bit Array Multiplier.

On the other hand, the structure of a Wallace Tree Multiplier (WTM) is relatively irregular. Test vector pairs T1, T2 and T3 are applied for triggering current signatures. In particular, T1 sensitizes the longest path with no carry propagation (Fig. 5.6(a) red curve), indicating a shorter path than that of an 8-bit array multiplier (Fig. 5.6(a) blue curve), thus implying WTM. T1 and T2 sensitize two different paths with exactly the same structure, which is specific to WTM. The identical waveforms form a signature verifying this self-similarity. T1: (0x20, 0x00)→(0x20, 0x08); T2: (0x80, 0x00)→(0x80, 0x01); T3: (0x00, 0xff)→(0x80, 0xff). Another feature of WTM is that the switching activities are more focused on the former levels compared to other types of multipliers to reduce the critical path delay, which is explored by T3. We first pre-process the current waveform by filtering out the high frequency components, then use a “normalized slope” of the rising part to represent the signature metric.

Metric 1: The ratio of the peak current value (Ipeak) over that of the middle time point of the rising part of the switching current (Imid). (WTM (Fig. 5.6(c)) > 2.3, Array multiplier (Fig. 5.6(d)) < 2)

Metric 2: The ratio of a normalized switching current amplitude over a normalized switching current duration. The former one is defined as (Ipeak - Iend)/I0, whereas the latter one is Ttran/T0. Iend is the current value of the last time point in the post-filtering waveform, Ttran is the switching duration of the real switching waveform, while I0 and T0 are the peak current and switching duration of a 1-bit full adder, which can be obtained from both multipliers by applying certain test vectors. (WTM > 3.2, Array multiplier < 2.3)
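The two metrics can be computed as in the sketch below. The function names and sample values are hypothetical, and two simplifications are assumed: the filtered waveform starts at the beginning of the rising part, and Imid is read at the sample halfway (in index) between the start and the peak. The decision thresholds are the ones quoted above.

```python
def metric1(filtered):
    """Ipeak / Imid on a low-pass filtered switching-current waveform.

    Assumptions: index 0 is the start of the rising part, and Imid is the
    sample halfway between index 0 and the peak (must be nonzero).
    """
    peak_idx = max(range(len(filtered)), key=filtered.__getitem__)
    i_mid = filtered[peak_idx // 2]
    return filtered[peak_idx] / i_mid

def metric2(i_peak, i_end, t_tran, i0, t0):
    """Normalized amplitude (Ipeak - Iend)/I0 over normalized duration Ttran/T0."""
    return ((i_peak - i_end) / i0) / (t_tran / t0)

def classify_multiplier(m1, m2):
    """Apply the thresholds quoted in the text; anything else is left open."""
    if m1 > 2.3 and m2 > 3.2:
        return "WTM"
    if m1 < 2.0 and m2 < 2.3:
        return "array"
    return "undetermined"

# Hypothetical filtered waveform (arbitrary current units).
waveform = [1.0, 2.0, 4.0, 7.0, 10.0, 6.0, 2.0]
m1 = metric1(waveform)                                   # 10.0 / 4.0 = 2.5
m2 = metric2(i_peak=10.0, i_end=2.0, t_tran=4.0, i0=1.0, t0=2.0)  # 8.0 / 2.0 = 4.0
print(classify_multiplier(m1, m2))  # WTM
```

Values that fall between the two threshold bands are reported as undetermined rather than forced into either class, since the text gives no rule for that region.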

Fig. 5.6. Self-referencing current signatures of an 8-bit Wallace Tree Multiplier and the corresponding current of Array Multiplier for comparison.

After obtaining datapath element structures, the remaining combinational logic is grouped as random logic with no pre-determined current signature. By applying test vectors to trigger each small group of gates, e.g. triggering certain paths while setting other inputs to non-controlling values, different gate-level transient current signatures can be obtained and compared with a pre-characterized signature library.

Scenarios Without A Golden Design: Unavailability of a golden design makes reverse engineering the gate-level implementation of random logic a remarkably difficult task. Because there is no golden netlist to verify against, test vector generation cannot be directed at a known structure. In this case, we adopt the approach described in the flow chart in Fig. 5.7. First the logic expression obtained from logic testing [60] is synthesized to a gate-level netlist. Then iterative side-channel based verification is performed based on this initial guess, during which the predicted netlist is updated with the confirmation or modification of each predicted gate. For each logic level, test vectors are intelligently generated to focus the switching activity on a small number of gates. For the function F = A & B', the two dual implementations using a reduced library are shown in Fig. 5.8(a) and (b). Considering Fig. 5.8(a) as the initial guess, to verify this gate, input B is set to the controlling value ‘1’ while switching A. If the prediction is correct, switching of a single inverter should appear; while for the case of (b), no switching activity is expected. Repeating the test with A kept constant helps confirm that A and B are direct inputs of the gate. A case where B indirectly limits the switching caused by A for function F is given in Fig. 5.8(c). Here, both 0→1 and 1→0 at B would cause significant switching even if A is set to its controlling value.

Fig. 5.7. Steps of random logic structure identification.

However, in this step two exceptions might be encountered. First, if neither of the dual implementations can be confirmed, it implies mis-prediction of the existence of a gate; hence a different set of nodes has to be tried as the inputs. The other exception occurs when the switching activity cannot be limited to one NAND/NOR gate according to the predicted netlist, which could happen because of a shared input logic cone that leads to loss of independent controllability of different gates. In this case test vectors are generated targeting multiple gates as a group, followed by a current signature comparison step.

The hierarchical top-down reverse engineering process, as described above, is very amenable to automation. The side-channel analysis steps at different levels of hierarchy can also be fully automated. The only step that requires manual intervention (and hence can only be partially automated) is the high-level test generation based on functional specifications. This needs to be based on the functional block-diagram for a chip and can vary widely from design to design. The final result of the RE process is the complete gate-level implementation, along with hierarchical functional and structural description. Any undesired gate or function can then be identified as a malicious modification or Trojan circuit.

5.4 Case Study: DLX Processor

We perform the automated reverse engineering procedures on a 32-bit DLX processor to demonstrate their effectiveness. All simulation results are obtained by performing HSPICE simulations in Predictive Technology Model (PTM) 70nm [64] technology.

1. Partitioning sequential space using Temporal Self-Referencing: By filling the processor pipeline with the same instruction, we can ensure that only one pipeline stage has switching activity in each clock cycle. Special instructions such as NOPs and JUMPs are used to characterize the background switching current of program counters and state transition of the pipeline stage control FFs. Once all the background current information is obtained, it is subtracted from the total current to focus on the individual pipeline stages such as Instruction Decode (ID), Execute (EX), Memory (MEM) and Write Back (WB). As shown in Fig. 5.9, the current signature for each stage corresponding to an ADD instruction is different from that for NOP, and the current for each stage has a unique signature in terms of peak current, delay and other transient current shape information.

Fig. 5.8. An example of the verification unit: (a),(b) Dual implementations of function F = A & B'. (c) Here, B indirectly limits the switching caused by A.

Fig. 5.9. Temporal self-referencing helps to identify the pipeline stage currents of a DLX processor.

2. Identifying and isolating sequential current component: As shown in Fig. 5.10, for structured sequential circuits such as shift registers and counters, there are process-independent current signatures which are clearly identifiable and can be detected and eliminated. Similarly, memory access instructions such as LOAD/STORE can be used to find current specific to memory access circuitry. By careful selection of instructions, we can estimate width of memory, structure of address decoders and other peripheral logic, and timing of memory access relative to other operations.

Fig. 5.10. Extraction of combinational logic current by subtracting sequential current component: (a) 3-bit binary counter shows the FF switching pattern of 1-2-1-3 which can be easily identified from the current at the positive or negative edge of CLK. (b) Extracting combinational current.

Fig. 5.11. Transient current signatures corresponding to specific vectors used to identify random logic structure isolated from the MEM stage of the DLX processor, with dependence on (a) a0, a1 and a3; and (b) a2, a4 and a5.

Fig. 5.12. Random logic structure of WB stage of the DLX processor.

3. Identifying datapath elements by Spatial/Temporal Self-Referencing: By exploring self-similarity of datapath elements using temporal/spatial self-referencing, we reverse engineer the implementation of the structured datapath elements. For example, we identified a CSA and a WTM in EX stage by applying vectors as described in Section 5.3.

4. Identifying random logic and datapath sub-structure by combining side-channel analysis with logic testing: In this step, we successfully reverse engineer random logic in MEM and WB stages of the DLX processor after subtracting out background current due to other stages, the sequential current, and memory current. In MEM stage, two output logic cone structures, DRDEN and DWREN, are derived along with their functions, where DRDEN is the data read enable signal and DWREN is the data write enable signal for memory access. The Boolean functions obtained from the logic testing approach are:

DRDEN = op5 & op4' & op3 & (op1' | op2' & op0)

DWREN = op5 & op4 & op3' & op2' & (op1' | op0)

Specifically, by switching different input bits of the MEM stage, we first determine that the outputs are functionally dependent on input bits op5, op4, op3, op2, op1, op0. Then the Boolean equations are derived by applying exhaustive test vectors at these six inputs. Based on this, we obtained the predicted netlists using a synthesis tool. After applying the verification procedure, the actual circuits are found as illustrated in Fig. 5.11, in which some transient current waveforms are also shown to demonstrate the netlist verification process. Similarly, in WB stage, we reverse engineer the structure of logic for MUX select signal SEL and write-back enable signal WE. The schematics for the actual logic are illustrated in Fig. 5.12.
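The logic-testing part of this step (find which inputs an output depends on, then enumerate all 64 vectors on the six opcode bits) can be sketched as below. The chip is replaced by software stand-ins for the two enable signals. Those stand-ins read each complemented literal in the equations as a logical NOT and give & precedence over |; both readings, like all function names, are assumptions of this sketch.

```python
from itertools import product

def drden(op5, op4, op3, op2, op1, op0):
    """Stand-in for the observed data read enable output."""
    return bool(op5 and not op4 and op3 and (not op1 or (not op2 and op0)))

def dwren(op5, op4, op3, op2, op1, op0):
    """Stand-in for the observed data write enable output."""
    return bool(op5 and op4 and not op3 and not op2 and (not op1 or op0))

def truth_table(fn, n_inputs=6):
    """Apply every input vector and record the observed output."""
    return {bits: fn(*bits) for bits in product((0, 1), repeat=n_inputs)}

def support(fn, n_inputs=6):
    """Indices of inputs whose toggling changes the output for some vector."""
    table = truth_table(fn, n_inputs)
    deps = []
    for i in range(n_inputs):
        for bits, out in table.items():
            flipped = bits[:i] + (1 - bits[i],) + bits[i + 1:]
            if table[flipped] != out:
                deps.append(i)
                break
    return deps

def minterms(fn, n_inputs=6):
    """Input vectors (op5..op0) for which the output is asserted."""
    return [bits for bits, out in truth_table(fn, n_inputs).items() if out]

print(support(drden))        # index 0 is op5, index 5 is op0
print(len(minterms(drden)), len(minterms(dwren)))
```

The recovered minterm list is what a synthesis tool would take as input to produce the predicted netlist that the side-channel verification then confirms or corrects.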

5.5 Summary

We have presented a novel reverse engineering based IC trust validation process which combines transient current based side-channel analysis with logic testing based function extraction. We have shown that RE can be used for trust validation in two scenarios: 1) when a golden design is available; 2) without a golden design (i.e. with functional specification only). Although we focus on using RE for trust validation, the process can also be adapted to improve the effectiveness of conventional manufacturing test. The validation steps can be automated to minimize the cost and time of trust validation. Since the technique works at multiple levels of hierarchy, it is scalable to large designs. The approach can work without scan, although presence of scan can be leveraged to improve the logic function extraction process. The proposed RE based trust validation can be used in conjunction with other existing protection approaches. For example, low-cost hardware Trojan detection approaches using static/transient current signature can be used for fast security screening of manufactured ICs, while the proposed approach can be used to increase the level of trust significantly. Future investigation will focus on developing an automation framework and validation with measurement results from commercial ICs.

6. DESIGN FOR SOC SECURITY

6.1 Introduction

Security is becoming an increasingly important parameter in modern System-on-Chip (SoC) design. The primary reason is that, given the various forms of existing and emerging security attacks at the hardware level, post-manufacturing detection alone cannot provide adequate protection. Fig. 6.1(a) displays diverse hardware security issues at different stages of the SoC development and deployment cycle. To effectively protect SoC hardware against these threats, design-time considerations are becoming mandatory as a mechanism to prevent an attack or facilitate detection (or recovery) in the event of an attack. Major threats that are currently being considered in academia and industry include hardware Trojan attacks in the form of malicious modifications in a design; hardware intellectual property theft, like illegal sale or use of soft IP cores/ICs; and physical attacks on cryptographic systems during deployment, e.g. side-channel attack, fault-based attack, and scan-based attack. Different techniques have been proposed to thwart these attacks. Considering the hardware Trojan threat, rare-event removal [80] and ring-oscillator (RO) network [78] have been proposed as design-for-security (DfS) approaches to facilitate Trojan detection or partially prevent Trojans from functioning. For hardware IP protection, techniques like Physical Unclonable Function (PUF) [47], IC metering [71] and hardware obfuscation [77] have been proposed to achieve passive or active protection. For cryptographic power attacks, various countermeasures have been reported which require design modification at architecture [48] or circuit level [65]. Similarly, several secure scan architectures have been presented in [68] [76] [69] [70] to prevent scan-based attack. These solutions collectively show that DfS approaches can provide highly effective countermeasures against different threats.

(a)

(b)

Fig. 6.1. (a) Security threats at different stages of IC development and deployment cycle; (b) Proposed infrastructure IP for security (IIPS) interfaces with constituent cores of a SoC and provides a flexible, convenient means of addressing various security concerns.

However, with the emerging trend of IP-based SoC design, incorporating DfS fea- tures in SoC faces the following major challenges: 1) It requires design of each IP to be modified at design time, thus considerably increasing design effort and time-to- market; 2) separate DfS features need to be incorporated separately to protect against multiple attacks, which can impose conflicting design as well as test requirements; 3) IP-specific modification may suffer from large hardware overhead; and 4) the hetero- geneous architecture of many modern SoC, makes core-level DfS difficult to access at SoC level. The situation is aggravated by the prevalent use of third party IP cores during SoC design, which prohibits liberal design modifications. Moreover, many of 109 these techniques incur large silicon area and power overhead, and may cause con- siderable performance degradation. For example [65] increases the area and power consumption of a crypto core by more than 200%, and the one in [66] incurs 38% performance overhead. The DfS scheme presented in [76] causes significant perfor- mance degradation and suffer from scalability issue. Therefore, it is desirable to have an easy-to-integrate scalable IP core that serves as a centralized security module to achieve SoC design equppied with comprehensive security protection at low design overhead and effort. Such an IP resembles the role of an infrastructure IP for SoC verification or testing [81] [84], but is dedicated for accomplishing security against various forms attacks, as depicted in Fig. 6.1(a). In this chapter, we propose an infrastructure IP for SoC security, referred to as IIPS, as illustrated in Fig. 6.1(b). The IIPS module can efficiently interface with con- stituent cores in an SoC through the use of IEEE 1500 Standard Embedded Core Test (SECT). It can provide comprehensive protection against diverse security attacks by either preventing an attack or facilitating SoC trust validation during manufacturing test. 
This centralized module acts as a plug-and-play core for an SoC designer and can be automatically interfaced with other cores in the SoC. We present the general design of IIPS and its interfacing with other cores. Furthermore, we consider a specific case study with respect to three example attack modes, as described next. IIPS provides an authentication mechanism for activating functional IP scan chains to prevent information leakage through illegal access to scan chains. It contains a Physical Unclonable Function (PUF) primitive for device authentication or cryptographic key generation, which makes use of a prevalent on-chip structure, namely the scan chain Design-for-Testability (DfT) structure in the cores. It also includes an infrastructure to perform trust validation against hardware Trojans. This infrastructure can be shared to execute (faster-than) at-speed delay fault testing during post-manufacturing or in-field SoC test. IIPS requires only minimal modifications to functional cores for interfacing, thus facilitating system-level integration. Since IIPS resides outside the functional cores and is activated only when performing testing or security tasks, it does not incur functional performance or power overhead. The die overhead is very modest for realistic multi-core SoCs, and is further minimized by sharing the same infrastructure among multiple security primitives.

In particular, the major contributions of the chapter are as follows:

(1) The chapter proposes, for the first time to our knowledge, an on-chip infrastructure IP, namely IIPS, which can provide protection against multiple security threats for an SoC.

(2) It presents an efficient, general interface through which IIPS communicates with functional IP cores using the standard SoC boundary scan architecture [91]. We have explored appropriate low-overhead design modifications in the IP cores for interfacing with IIPS.

(3) For three common threat models, we have studied the design requirements for IIPS. In particular, we have considered protection against (i) scan-based attacks that recover secret keys from a crypto core, (ii) IC piracy and counterfeiting, through the use of physical unclonable functions, and (iii) hardware Trojan attacks. We have analyzed the design requirements and optimized the IIPS implementation to minimize the hardware overhead.

(4) We have provided simulation as well as measurement results to validate the IIPS module functionally, evaluate the hardware overhead, and verify its capability in achieving security against the specific attacks considered.

The remainder of the chapter is organized as follows. Section 6.2 provides background on infrastructure IP and the embedded core test standard. Section 6.3 provides an overview of IIPS, followed by the design details in Section 6.4. Section 6.5 presents the SoC-level test protocol. Simulation and experimental results are provided in Section 6.6. Section 6.7 discusses the functional flexibility and scalability of IIPS. Finally, we conclude in Section 6.8.

6.2 Background of IIP and Embedded Core Test

6.2.1 Infrastructure IP

Infrastructure IPs (IIPs) are IPs dedicated to facilitating SoC functional verification, testing, or yield improvement. Synopsys Inc. provides a set of verification IPs that can be instantiated in an SoC to verify certain bus protocols [81]. These IPs are targeted at front-end SoC designers and do not exist on the manufactured SoCs. There are also IIPs that are fabricated as on-chip modules in order to facilitate post-manufacturing test or debug, or to improve yield. Examples include the structures proposed in [84] to help in-field SoC test, and structures for embedded timing characterization [85], transient-error tolerance [86], and mixed-signal yield improvement [87]. Moreover, several vendors provide test-IP products that enhance test capability for board-level designs [82]. The demand for dedicated on-chip infrastructure logic to facilitate testing arises from SoC defect and transient error rates that increase with aggressive technology scaling, for which external test equipment provides inadequate coverage or greatly increases test cost and time [83]. The proposed infrastructure IP for security shares the same motivation and design principles as IIPs for SoC test or error tolerance. It, however, aims at enhancing SoC security and trust.

6.2.2 IEEE 1500 Standard

The general heterogeneous architecture of SoCs limits the controllability and observability of internal functional IP cores, which raises the problem of test cost and coverage. This requires incorporating certain infrastructure logic at design time to allow effective testing of IP cores to be performed at the chip level. On the other hand, test reuse is becoming an indispensable part of IP reuse in SoC development [88]. This creates a need for a standard test protocol associated with the test infrastructure hardware, so that different IP cores can be tested within a unified test framework, and core-level test sets created by the core developer can be easily expanded to the SoC level at the core users' end. IEEE Std. 1500 is the test standard developed to serve these purposes.

IEEE Std. 1500 [91] contains two parts: (1) a core test wrapper architecture for test access to embedded cores; and (2) a core test language (CTL) [89] for core test knowledge transfer. To comply with the standard, each IP core in the SoC should have a test wrapper. The wrapper provides one boundary register cell for each functional I/O port of the core; these cells are collectively referred to as the Wrapper Boundary Register (WBR). It also contains a Wrapper Instruction Register (WIR) and a Wrapper Bypass Register (WBY). Fig. 6.2 displays a high-level core wrapper interface and the mandatory components of a core wrapper. By configuring the WIRs of the SoC functional cores, all WBRs can be concatenated in different ways to allow four SoC operation modes: normal mode, inward facing mode for core test, outward facing mode for interconnect test, and bypass mode.

Fig. 6.2. IEEE 1500 Standard: (a) core wrapper interface terminals; (b) mandatory components of the wrapper [88].

Test access to the embedded cores is enabled by the core wrapper and an on-chip Test Access Mechanism (TAM). Fig. 6.2(a) shows two ways of accessing the cores: the mandatory Wrapper Serial Port (WSP) and the optional Wrapper Parallel Port (WPP). WSP provides a single-bit test data port (WSI/WSO) along with a test clock (WRCK) and control signals (WSC), while WPP offers a larger bit-width for more efficient test execution. However, IEEE Std. 1500 does not prescribe the design of the WPP, which is therefore the responsibility of the core provider. For this reason, in this work IIPS is designed to exploit only the WSP. We are also not concerned with the various choices in selecting a parallel TAM architecture. If a particular parallel TAM architecture is given, the expansion of the security test set from core level to SoC level can follow the same methodology as conventional core test expansion with parallel test access.

Fig. 6.3. Block diagram of the IIPS module showing interconnection with other IP cores in an SoC using SoC boundary scan architecture.

6.3 Overview of IIPS

The block diagram of IIPS is provided in Fig. 6.3. It consists of a Master Finite State Machine (M-FSM) that controls the working mode of IIPS, a Scan Chain Enabling FSM (SE-FSM) that provides individual control over activation of scan chains in the SoC, and a clock control module that generates the clock and control signals needed for ScanPUF authentication and path delay based hardware Trojan detection. The state transition diagrams of the M-FSM and SE-FSM are illustrated in Fig. 6.4. When enabled, IIPS takes the standard SoC test inputs from the Wrapper Serial Port (WSP) (the WSO output pin is not shown). As outputs, IIPS sends one scan chain enable signal to each functional IP core, and replaces the original test clock WRCK with CLK, which is generated inside IIPS, to support ScanPUF authentication and Trojan detection. In addition, the test control signals ShiftWR and CaptureWR applied to the cores are replaced with ShiftWR_TD and CaptureWR_TD during Trojan detection for automatic capture of test responses. In short, IIPS serves as a test control line translator: SoC test inputs arrive at IIPS, and IIPS generates the actual test clock and control signals that are applied to the cores. During normal testing, the test signals are passed through IIPS unchanged. A single functional clock is assumed in this work; multiple clock domains can be accommodated with a slightly altered design of the clock control module.

Fig. 6.4. State transition diagrams of the IIPS master FSM and Scan enable control FSM embedded inside it.

After IIPS is turned on, M-FSM starts from the IDLE state. A specific vector sequence applied at WSP brings M-FSM to the SC_EN state to allow scan chain activation. After scan chain activation is done, M-FSM can be set to the ScanPUF or TrojDet state to perform ScanPUF authentication or hardware Trojan detection, respectively.

The SC_EN state is designed to precede the ScanPUF and TrojDet states, because both ScanPUF and Trojan detection require at least one active scan chain. Specific sequences of vectors are required for M-FSM to switch among different security functions; these sequences both serve as an authentication mechanism and prevent unintentional triggering of a function. While an IIPS function is being performed, M-FSM stays in the corresponding state; when the task is completed, M-FSM can be reset to the IDLE state with the IIPS reset signal IIPS_RSTN. The SC_LOCK and TD_LOCK states are two locking states designed to support scan chain based SoC testing and delay fault testing, respectively, as described below in detail. When IIPS is deactivated or in the locking states, intact test signals are propagated to the functional cores; during IIPS security functions, modified (in ScanPUF and TrojDet) or deasserted (in SC_EN) signals are applied to the functional cores.

IEEE 1500 compliant SoCs have a standard active-low test reset signal WRSTN. Since the test inputs WSP usually reuse chip functional inputs, WRSTN is necessary to differentiate the test mode from the functional mode by interpreting the inputs as test signals, and consequently configuring the test infrastructure as well as propagating test data. Similarly, one extra input pin IIPS_RSTN is needed on the SoC to provide an active-low reset signal for IIPS. Asserting IIPS_RSTN (driving it low) disables IIPS by setting M-FSM to IDLE and turning off all of its output signals. IIPS_RSTN and WRSTN together define four operating modes of the SoC: (i) {IIPS_RSTN, WRSTN} = "00", functional mode; (ii) {IIPS_RSTN, WRSTN} = "01", basic SoC test mode; (iii) {IIPS_RSTN, WRSTN} = "11", advanced SoC test mode requiring active core scan chains or (faster-than) at-speed delay fault test; (iv) {IIPS_RSTN, WRSTN} = "10", IIPS security task mode. In modes (i) and (ii), IIPS is not turned on.
Hence mode (ii) can only perform basic SoC testing where core scan chains are disabled, and delay fault testing is not supported. Mode (iii) enables IIPS to allow scan-chain based SoC testing and delay fault testing. One requirement in this procedure is that IIPS should not make unnecessary state transitions, in order to avoid interfering with the normal testing procedure. For example, in scan-based testing, IIPS should enable the scan chains as requested and then stay idle during the rest of the test. If M-FSM happens to enter the ScanPUF or TrojDet state, the test clock WRCK and the scan shift signal ShiftWR will be tampered with, causing wrong test responses. Similarly, when delay fault testing is being performed, IIPS should be configured in the TrojDet state to provide the appropriate clock signal, without further state transitions. To achieve such "functional locking", the states SC_LOCK and TD_LOCK are added to the M-FSM state diagram for scan-based and delay fault testing, respectively. In the locking states, the proper output signals are retained, but any state transition is disallowed except for resetting the entire IIPS to IDLE with IIPS_RSTN. In particular, in the SC_LOCK state IIPS maintains the status of the scan chain enable signals determined in the SC_EN state. Similarly, the clock and test control signals of the TrojDet state remain available in the TD_LOCK state for (faster-than) at-speed SoC delay fault testing.
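The mode encoding and the locking behaviour described above can be illustrated with a small behavioural model. The following Python sketch is not the actual IIPS logic: the mode table follows the text, but the transition arcs into SC_LOCK and TD_LOCK and the command names (`unlock`, `puf`, `trojdet`, `lock`) are assumptions, used here as stand-ins for the authentication vector sequences and for the arcs shown in Fig. 6.4.

```python
from enum import Enum


class SocMode(Enum):
    FUNCTIONAL = "functional mode"
    BASIC_TEST = "basic SoC test mode"
    ADVANCED_TEST = "advanced SoC test mode"
    SECURITY_TASK = "IIPS security task mode"


def decode_mode(iips_rstn: int, wrstn: int) -> SocMode:
    """Map the two active-low resets {IIPS_RSTN, WRSTN} to an operating mode."""
    table = {
        (0, 0): SocMode.FUNCTIONAL,
        (0, 1): SocMode.BASIC_TEST,
        (1, 1): SocMode.ADVANCED_TEST,
        (1, 0): SocMode.SECURITY_TASK,
    }
    return table[(iips_rstn, wrstn)]


# Assumed transition arcs. Each command is a hypothetical placeholder for a
# specific vector sequence applied at WSP; the real arcs are those of Fig. 6.4.
ARCS = {
    ("IDLE", "unlock"): "SC_EN",
    ("SC_EN", "puf"): "ScanPUF",
    ("SC_EN", "trojdet"): "TrojDet",
    ("SC_EN", "lock"): "SC_LOCK",
    ("TrojDet", "lock"): "TD_LOCK",
}


class MFsm:
    """Toy master FSM: locking states have no outgoing arcs except reset."""

    def __init__(self) -> None:
        self.state = "IDLE"

    def step(self, iips_rstn: int, command: str = "") -> str:
        if iips_rstn == 0:
            # Active-low reset always returns the FSM to IDLE.
            self.state = "IDLE"
        else:
            # Unknown (state, command) pairs leave the state unchanged,
            # which is what keeps SC_LOCK and TD_LOCK locked.
            self.state = ARCS.get((self.state, command), self.state)
        return self.state
```

In this model, once the FSM reaches SC_LOCK or TD_LOCK, every command is ignored until `iips_rstn` is driven to 0, which mirrors the "functional locking" requirement.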

6.4 Design of IIPS Security Functions

6.4.1 Attack Models and Mitigation Strategies

Scan-based Attack in Cryptographic Systems

The scan chain is the most prevalent DfT infrastructure in modern ICs. It serves as an essential vehicle for post-manufacturing and in-field IC testing because it provides high test coverage with a simple structure. The scan chain also facilitates development and maintenance of software running on a chip by connecting to the JTAG interface to provide on-chip debug capability [96]. However, the scan chain turns out to be a double-edged sword when it comes to security hardware such as cryptographic processors or ASICs. The extra controllability and observability that scan chains offer can reveal secret information internal to the chip (e.g. the cryptographic key), or intermediate data that helps attackers derive the secret information. Previous studies [69] have demonstrated that by operating AES in functional and scan modes alternately, the intermediate computation of each clock cycle can be observed through the scan chain. With a chosen-plaintext attack, the secret key can then be discovered easily.

Several secure scan architectures have been proposed to prevent access to the scan chain by illegal users. The common philosophy of most techniques is to implement an authentication mechanism that thwarts any critical information leak caused by illegal scan access. In particular, the scan structure in [68] masks the scan output with pseudo-random numbers when users fail to embed the correct test key into their test vectors. The approach in [76] scrambles the scan elements when the secure test mode is not successfully reached. A low-overhead authentication-based scan protection was proposed in [70] that gates the scan output unless the authentication is passed. Different from the previous approaches, [69] proposed a method to isolate the important data registers (e.g. those holding the keys) when the chip is running in an insecure mode. In IIPS, we adopt VIm-Scan proposed in [70], with a slight modification to improve its security effectiveness.
The choice is made primarily because [70] does not require structural changes to the scan chains except for a trivial interface adaptation, which minimizes the design modification in the SoC functional cores and eases the design integration process. [70] also has the advantage of ultra-low overhead and good scalability compared to other approaches.

Hardware Intellectual Property Piracy

Outsourcing fabrication introduces substantial uncontrollable factors in IC manufacturing, creating a platform for counterfeiting, cloning, and overproduction of electronic systems [74]. In 2011, information and analytics provider IHS reported a potential risk of $169 billion in annual revenue from the five most prevalent types of semiconductors reported as counterfeits, including microprocessor and memory ICs [75]. These ICs are widely used in both consumer and military applications, and the incidence of counterfeit ICs is still increasing rapidly. Often made of salvaged waste ICs or cheap substitutes, counterfeit ICs seldom meet the quality requirements and usually cause severe functional problems in deployment. This could lead to functional failures of consumer electronics, catastrophic consequences in military services, and tarnishing of legitimate IC vendors' reputations. Device authentication has been considered an efficient method to prevent counterfeit ICs from entering the deployment field. By assigning each legitimately fabricated IC a unique device ID and registering it in a vendor-built database, consumers can authenticate the IC with the vendor, so that only legal ICs are activated [72]. In this way, overproduced and counterfeit ICs can be identified before user deployment. However, a digitally stored device ID is not secure or tamper-proof. Invasive or non-invasive methods can be applied to reveal the stored ID [73], and a cloned device can subsequently be programmed with the same ID. On the other hand, unclonable IC fingerprints such as PUFs have been investigated to serve the authentication purpose. A PUF forms an important security primitive by exploiting hardware process variations, such as random path delay characteristics or transistor threshold mismatch in SRAM cells, to generate unique device signatures.
Since these signatures are derived from the IC circuitry itself, and are caused by random, uncontrollable factors during manufacturing, they cannot be duplicated by an attacker. In addition, good PUFs exhibit high uniqueness and robustness, and are thus able to differentiate each individual IC reliably.

To integrate a PUF primitive into the IIPS infrastructure, the PUF must incur low silicon overhead, considering both the PUF circuitry and the control logic for signature generation. It is also highly desirable that the signature generation and extraction procedure complies with the standard SoC testing flow, as one of the design goals of IIPS is to have a test flow compatible with the standard test protocol, so that the security tasks can be integrated with normal testing and be delivered uniformly via the Core Test Language (CTL). Given these requirements, we employ the ScanPUF architecture presented in [97], which is an ultra-low overhead PUF with high uniqueness and robustness, as the PUF circuitry is merely the scan chains that already exist in the functional cores.

Hardware Trojan Attack

Hardware Trojan attacks studied in this thesis are enabled by untrusted elements in the IC development cycle. The concept has revolutionized the traditional notion of security, which is primarily concerned with software and networking. It poses a potentially more devastating threat, because malicious hardware can easily bypass traditional software-implemented defense techniques, as it sits a layer below the entire software stack. The enabling power of hardware Trojans is their stealthy nature, which renders them extremely difficult to detect during conventional IC testing. This is because the design of the Trojan trigger condition is at the attacker's discretion. The trigger condition can be made arbitrarily difficult to meet by functional or random tests within limited test time, given the large space of resources (e.g. the enormous number of signals inside the SoC) that can be exploited to make up the trigger condition.

Various hardware Trojan detection approaches have been proposed so far, including logic testing and side-channel analysis based methods. The context of IIPS rules out logic testing for two reasons. First, it is generally believed that cleverly designed hardware Trojans (e.g. sequential hardware Trojans requiring a long sequence of rare events to trigger) are extremely difficult to detect in logic testing, as the Trojan has to be completely activated to be detected, which is difficult due to the complex Trojan trigger mechanism. Without certain knowledge of the trigger model, Trojan detection cannot be guaranteed even with intelligently generated test vectors. Second, on-chip generation of intelligent test vectors is overly expensive; on the other hand, logic testing with externally supplied vectors can be readily executed with the test infrastructure for normal testing (e.g. the IEEE 1500 architecture). Among the various side-channel analysis based detection approaches, combinational path delay based detection [79] exhibits particular advantages.
First, it does not require triggering the Trojan payload to detect the Trojan. In fact, it does not necessarily require switching activity in the Trojan circuitry; the extra delay due to capacitive loading of the Trojan circuitry can be enough for the Trojan to be detected. In this sense it is superior to transient current based methods, which require switching activity from the Trojan circuitry. Second, high-resolution path delay measurement can be done through on-chip infrastructure at very low cost, and this infrastructure can in addition be shared to perform delay fault testing. In this work, we exploit the clock sweeping technique proposed in [98] for characterizing path delays. In particular, IIPS provides the infrastructure required for high-resolution path delay characterization, the response of which can be further processed using statistical analysis to form a profile for Trojan detection.

Fig. 6.5. Functional core scan chain protection with SC_EN.

6.4.2 Security Primitive Design

Authentication-based Scan Chain Activation

The state diagram of SE-FSM is illustrated in the circle in Fig. 6.4. IIPS generates an external scan chain enable signal SC_EN_i (i ∈ [1,N], where N is the number of functional cores) for each functional core. Scan chain locking is achieved by gating both the scan input and the scan output when SC_EN_i is inactive. Fig. 6.5 demonstrates the scan-in and scan-out gating of an IEEE 1500 compliant core. Gating both the input and the output of a scan chain prevents illegal users from gaining controllability and observability via the scan chain. [70] chose to gate only the scan output, which still leaves illegal users the opportunity to scan in an arbitrary state from which to operate the chip, and then observe the functional primary outputs, which eases extraction of the crypto key. An alternative way of controlling access to the scan chain is to gate the core-internal scan enable SE. However, this may compromise the timing of SE, which is usually one of the critical paths of an SoC.

By default, SC_EN_i is set to logic 0, hence all scan chains are disabled and the cores can only work in functional mode. Even if the scan chain is concatenated with the wrapper boundary register (WBR) during testing, only logic 0's will be scanned in and shifted out from the scan chain. In order to perform scan-based testing or IIPS security tasks, a test initiation phase is required to enable the necessary core scan chains prior to the testing procedure. This can only be done when M-FSM is in the SC_EN state. To enable scan chain access of a functional core, a sequence of input vectors specific to that core has to be provided at the test interface WSP to bring SE-FSM to the particular scan activation state, e.g. Core1_SC_EN for enabling the scan chain of Core1. The predefined authentication input sequences are distinct for different scan chains, and hence can be considered the scan activation key of each core. After each activation, M-FSM will return to SC_INIT, waiting for the next scan chain activation request. SC_EN_i of an enabled scan chain is latched and can only be disabled by an active IIPS_RSTN.
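The activation and gating behaviour described above can be sketched as a behavioural model. The Python below is only an illustration under stated assumptions: the class and function names are hypothetical, the key vectors are made-up values, and a whole key sequence is matched in one call rather than vector by vector as a hardware FSM would do.

```python
class ScanEnableFsm:
    """Toy SE-FSM: one latched SC_EN_i per core, set by a core-specific key."""

    def __init__(self, keys):
        # keys: core name -> tuple of WSP input vectors (hypothetical values)
        self.keys = {core: tuple(seq) for core, seq in keys.items()}
        self.sc_en = {core: 0 for core in self.keys}
        self.state = "SC_INIT"

    def apply_sequence(self, vectors):
        """Apply one candidate key sequence at WSP while M-FSM is in SC_EN."""
        for core, key in self.keys.items():
            if tuple(vectors) == key:
                self.sc_en[core] = 1  # latched until reset
        # Return to the SC_INIT waiting state for the next request.
        self.state = "SC_INIT"

    def reset(self):
        """Model an active IIPS_RSTN: all scan chains are disabled again."""
        self.sc_en = {core: 0 for core in self.keys}
        self.state = "SC_INIT"


def shift_gated(chain, scan_in_bits, sc_en):
    """Shift bits through a scan chain with scan-in and scan-out both gated.

    chain[0] is the flop next to scan-in, chain[-1] the flop next to scan-out.
    Returns (observed scan-out bits, final chain contents).
    """
    chain = list(chain)
    observed = []
    for bit in scan_in_bits:
        observed.append(chain[-1] & sc_en)     # gated scan output
        chain = [bit & sc_en] + chain[:-1]     # gated scan input
    return observed, chain
```

With `sc_en = 0` the observer sees only 0's and the chain fills with 0's, which corresponds to the default locked behaviour; with `sc_en = 1` the chain behaves as an ordinary shift register.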

ScanPUF Primitive

Fig. 6.6(a) illustrates the ScanPUF concept. The basic idea is to exploit random delay variations in scan paths, namely the paths between adjacent flip-flops when they are connected into a scan chain. A scan chain is essentially a shift register. When a vector is being shifted in, to guarantee that each flop can successfully latch the value from the previous flop, the new value has to be ready a setup window ($t_{setup}$) ahead of the rising edge of the clock. This means the shift clock period should be no less than $t_{clk2q} + t_{pd} + t_{setup}$, where $t_{clk2q}$ is the clock-to-Q delay of the flop and $t_{pd}$ is the combinational propagation delay from the previous flop, which can include interconnect and buffer delay. A smaller clock period leads to a setup violation, causing an uncertain value to be latched into the flop. Generally, scan paths within a functional core can be partitioned into groups, where the paths in each group have closely matching delays. In each group, the path delays can be modeled as a single nominal delay ($t_{pd0}$) plus a random intra-die variation ($\delta_i$, where $i$ is the index of the scan path) with a Gaussian distribution. With a shift clock of period $t_{clk2q} + t_{pd0} + t_{setup}$ (denoted the nominal capture clock period $t_0$), half of the scan flip-flops are expected to have a setup failure, and the positions of the failing flops in the scan chain are chip specific, depending on the random intra-die delay variations formed during IC manufacturing. This pattern constitutes the PUF signature, i.e. the response to a specific capture clock period. To further explore the fine-grained delay variations among different scan paths, the capture clock period can be varied around $t_0$ in small steps to obtain more signatures. Signature generation can be executed on each group individually.

Fig. 6.6. ScanPUF, PUF realized in the scan chains of IPs by the IIPS module: (a) principle of signature generation [97]; (b) clock generator and timing of relevant signals.

In particular, ScanPUF signature generation is performed as follows:

(1) Scan chain initialization: In test mode (scan-shifting mode), scan in a sequence of alternating 0's and 1's under the normal test clock. After this step, each adjacent flop pair will hold different values.

(2) Signature generation: Launch a clock rising edge $t_{sig}$ after the last rising edge in scan chain initialization. Since every flop is supposed to have a transition, those that end up holding their old values indicate setup violations due to a relatively longer delay of the preceding scan path.

(3) Signature propagation: Shift out the response bits through the scan-out port with the normal test clock.

The ScanPUF test procedure merely requires scan shift operation with a dedicated generated clock signal. Therefore, ShiftWR should be asserted and applied to the functional cores throughout the signature generation process to enable scan shift. To generate CLK with a proper signature capture cycle, the test inputs need to notify IIPS of the completion of scan initialization. Since the SoC test input CaptureWR (one of the Wrapper Serial Control lines) is not used during SoC configuration or scan shift, it can be leveraged as the acknowledge signal. The actual CaptureWR sent to the embedded cores is deasserted, because the capture operation is not needed during ScanPUF. The relevant signal timing in the signature capturing process is demonstrated in Fig. 6.6(b).

The clock generation logic shown in Fig. 6.6(b) works in the following manner. At the beginning of the ScanPUF process, RSTN is asserted to set SEL to 0. The normal test clock WRCK is used for scan shifting (CLK = WRCK), while ShiftWR is active during the entire ScanPUF process and applied to all the functional cores. CaptureWR is asserted before the last shift in scan initialization and held for one cycle, indicating the completion of scan initialization. In this way, with the clock rising edge of the last shift, CaptureWR is latched by a flop, turning SEL to 1 so that CLK is switched to WRCK_d, which is a phase-shifted version of WRCK. The amount of the phase shift is tuned to be the capture period $t_{sig}$. Buffers are added before the clock port of the flop to introduce a slight positive skew, which ensures that the clock pulse in the capture cycle has a reasonable width, because an overly narrow clock pulse may cause unsuccessful data latching or suffer from severe distortion when passing through the clock tree. WRCK_d is used thereafter until the PUF response is completely shifted out through WSO.

Fig. 6.7. Design of the programmable delay line [93].

Fig. 6.8. (a) Concept of clock sweeping technique for hardware Trojan detection [98]; (b) Trojan detection through monitoring of delay shift by observing the latched value under clock sweep in two possible schemes.

As described above, the capture clock is generated by switching between two clocks with a phase difference. The phase shift is realized by adopting the programmable delay line (PDL) proposed in [93], as shown in Fig. 6.7. It has a coarse delay control block and a fine delay control block, differentiated by the delay of a unit buffer. The total delay can be configured by serially loading the control flip-flops to set one bit in each block to logic 1. The coarse control and fine control together allow fine-grained delay tuning over a wide range at low hardware cost. The implementation details are provided in Section 6.6.2.
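The ScanPUF signature generation steps can be illustrated with a simple timing model. The following Python sketch makes several assumptions that are not taken from the design itself: the delay values and their units are arbitrary, the function names are hypothetical, and a flop with a setup violation is assumed to keep its old value (in silicon the latched value is uncertain, so this is a simplification).

```python
import random


def make_scan_path_delays(n, t_pd0, sigma, seed):
    """Nominal scan path delay t_pd0 plus a Gaussian intra-die variation."""
    rng = random.Random(seed)  # the seed stands in for one physical chip
    return [t_pd0 + rng.gauss(0.0, sigma) for _ in range(n)]


def scanpuf_signature(delays, t_sig, t_clk2q=0.10, t_setup=0.05):
    """Return one bit per scan flop: 1 marks a setup failure at period t_sig."""
    n = len(delays)

    # Step (1): scan chain initialization with alternating 0's and 1's.
    old = [i % 2 for i in range(n)]
    scan_in = 1 - old[0]  # keeps the alternation going at the chain input

    # Step (2): one capture edge t_sig after the last initialization edge.
    new = []
    for i, t_pd in enumerate(delays):
        source = scan_in if i == 0 else old[i - 1]
        meets_setup = t_clk2q + t_pd + t_setup <= t_sig
        new.append(source if meets_setup else old[i])

    # Step (3): the shifted-out response; a flop that still holds its old
    # value did not see the expected transition.
    return [1 if new[i] == old[i] else 0 for i in range(n)]
```

Calling `scanpuf_signature` with `t_sig` close to $t_{clk2q} + t_{pd0} + t_{setup}$ yields a chip-specific mix of 0's and 1's, and stepping `t_sig` around that value produces additional signatures, as described in the text.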

Path Delay Based Hardware Trojan Detection

Fig. 6.8(a) illustrates the basic concept of the clock sweeping technique. Considering one combinational path, by sensitizing it from its source (i.e. a primary input or flip-flop) and capturing the output at the destination (i.e. a primary output or flip-flop) with a clock of a certain frequency, a judgement can be made on whether the path propagation delay is shorter or longer than the clock period. By sweeping the clock frequency, path propagation delays in a circuit can be categorized into different bins, where each bin stands for a small range of periods, determined by the clock sweeping resolution. Efficient test pattern generation to sensitize the maximum number of paths can be achieved by leveraging existing CAD algorithms and tools for transition and path delay fault test generation [98].

We have implemented clock generation logic that generates the appropriate clock and test control signals inside the SoC to support both Launch-On-Capture (LOC) and Launch-On-Shift (LOS) test schemes. In clock sweeping based Trojan detection, Trojan impacts are modeled as path delay faults. The test responses over the entire circuit are collected for statistical analysis to identify possible Trojans. We make the capture clock frequency tunable to support fine-grained clock frequency sweeping over a wide range, in order to cover both critical and non-critical paths, even those with very short delays.

Path delay characterization is a two-pattern test, which involves one vector to initialize the inputs and a second vector to excite the transition on the target paths. LOC and LOS refer to two ways of generating the second test vector. In particular, LOC takes the functional result of the first test vector ($V_0$) as the second vector ($V_1$), namely

$$ V_0 = \{PI_0,\ PPI_0\} \qquad (6.1) $$

$$ V_1 = \{PI_0,\ f_{PPI}(V_0)\} \qquad (6.2) $$

where $PI$ represents the primary inputs, $PPI$ stands for the pseudo primary inputs, i.e. internal state elements, and $f_{PPI}$ is the Boolean dependency of $PPI$ on $\{PI, PPI\}$. In this case, the circuit should operate in functional mode during both the $V_1$ generating cycle and the capture cycle, hence the scan enable (Shift) can be deasserted before the $V_1$ generating cycle and re-asserted after the capture cycle, as displayed in Fig. 6.8(b). This way the scan enable timing merely needs to meet the test clock frequency, and is not constrained by the capture cycle. In contrast, the Launch-On-Shift (LOS) scheme shifts the first test vector by one bit to create the second vector, which poses a stringent requirement on the scan enable timing, as shown in Fig. 6.8(b): the turnaround time of the scan enable is limited by the capture clock period.

Fig. 6.9 demonstrates the clock generator circuitry as well as the timing of the relevant signals. The basic clock generation scheme is similar to that of ScanPUF clock generation. However, in this case both ShiftWR and CaptureWR are used to control IIPS, and the actual test control lines ShiftWR_TD and CaptureWR_TD that go to the cores are generated by IIPS. Specifically, ShiftWR is used to acknowledge the completion of test initialization by holding low for one cycle after the last shift for $V_0$. CaptureWR is used to differentiate LOC (CaptureWR=1) from LOS (CaptureWR=0). During clock generation, the rising of SEL indicates the occurrence of the launching clock edge, and the rising of REC_SE indicates the capture clock edge. For both LOC and LOS, the scan enable ShiftWR_TD returns to 1 with REC_SE after the capture. The two schemes differ before the capture: in LOC, ShiftWR_TD is deasserted together with ShiftWR to allow the core to enter functional mode in order to generate the launching vector, whereas in LOS, ShiftWR_TD stays high until SEL rises with the launching clock edge. CaptureWR_TD is not shown in the figure because it is the inverse of ShiftWR_TD in both cases, as capture of functional responses is only needed when the core is working in functional mode while ShiftWR_TD is low.
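The difference between the two launch schemes can be made concrete with a small example. In the Python sketch below, the LOC function follows Eqs. (6.1) and (6.2), while the LOS function models the one-bit shift of the scan state. The next-state function `toy_next_state`, the vector widths, and the assumption that the primary inputs are held at $PI_0$ in the LOS case are all illustrative choices, not part of the design.

```python
def loc_second_vector(pi0, ppi0, f_ppi):
    """Launch-On-Capture: V1 = {PI0, f_PPI(V0)}, as in Eq. (6.2)."""
    return list(pi0), list(f_ppi(pi0, ppi0))


def los_second_vector(pi0, ppi0, scan_in_bit):
    """Launch-On-Shift: the scan state of V0 shifted by one bit.

    ppi0[0] is the flop next to scan-in. Holding the primary inputs at PI0
    is an assumption made for this sketch.
    """
    return list(pi0), [scan_in_bit] + list(ppi0[:-1])


def toy_next_state(pi, ppi):
    """Hypothetical next-state logic of a 3-flop core with 2 primary inputs."""
    return [ppi[0] ^ pi[0], ppi[1] & ppi[2], ppi[2] | pi[1]]


pi0, ppi0 = [1, 0], [0, 1, 1]          # first vector V0 = {PI0, PPI0}
v1_loc = loc_second_vector(pi0, ppi0, toy_next_state)
v1_los = los_second_vector(pi0, ppi0, scan_in_bit=0)
```

For this toy core, LOC launches from the state computed by the functional logic, whereas LOS launches from a state that depends only on the scan order, which is why LOS needs the scan enable to switch within the capture clock period.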

Fig. 6.9. Clock sweeping based hardware Trojan detection: (a) clock generator; (b) clock generation signal timing.
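The difference between the two launch schemes can be illustrated with a minimal Python sketch. It is not part of the IIPS design: the 3-bit next-state function and the vector values are hypothetical, and primary inputs are omitted for brevity. It only shows how the second (launch) vector is derived from the first one under LOS and under LOC.

```python
from typing import Callable, List

Bits = List[int]


def launch_on_shift(v1: Bits, scan_in_bit: int) -> Bits:
    """LOS: the second vector is V1 moved one position down the scan chain."""
    return [scan_in_bit] + v1[:-1]


def launch_on_capture(v1: Bits, next_state: Callable[[Bits], Bits]) -> Bits:
    """LOC: the second vector is the functional response of the circuit to V1."""
    return next_state(v1)


def toy_next_state(state: Bits) -> Bits:
    # Hypothetical 3-bit next-state logic, used for illustration only.
    a, b, c = state
    return [b ^ c, a & c, a | b]


v1 = [1, 0, 1]
v2_los = launch_on_shift(v1, scan_in_bit=0)
v2_loc = launch_on_capture(v1, toy_next_state)
print("LOS launch vector:", v2_los)
print("LOC launch vector:", v2_loc)
```

Because the LOS vector needs one more shift, the scan enable has to stay high up to the launching edge, which is why its turnaround time is bounded by the capture clock period.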

6.5 Test Protocol under IEEE Std. 1500

6.5.1 Wrapper Operation Modes

The IEEE 1500 standard is the chip-level analogue of the IEEE 1149.1 JTAG (Joint Test Action Group) boundary scan test standard for PCBs. The main difference is that the JTAG standard includes a test control FSM to prescribe the test procedure, while IEEE Std. 1500 allows testing of an SoC in various configurations via collaborative control of the Wrapper Instruction Register (WIR) and the Wrapper Serial Control (WSC) lines. Similar to JTAG, IEEE Std. 1500 can concatenate the core boundary registers and scan chains of embedded cores into a test path, thus enabling test access to each functional core. Fig. 6.10(a) demonstrates the concatenated Wrapper Boundary Registers (WBRs) of SoC cores. By loading the WIR with different instructions, the test path can include or exclude the scan chain, and the decision can be core specific. As shown in Fig. 6.10(a), Core 2 is incorporated into the test path with its internal scan chain, allowing testing of the core in scan mode. IEEE Std. 1500 supports core-based testing: when a core is under test, irrelevant cores can be set into BYPASS mode, with only the 1-bit WBY register included in the test path, so as to save test time. This is done by loading the WIRs of the irrelevant cores with the BYPASS instruction.


Fig. 6.10. (a) Wrapper boundary registers concatenated for SoC test or IIPS functions; (b) wrapper boundary cell and its configurations.

Different core operation modes include functional mode, inward facing mode, outward facing mode, and bypass mode. Functional mode corresponds to the scenario where the SoC is performing its normal function, while the other three modes are for testing. In particular, inward facing mode allows testing the internal logic of a core; outward facing mode supports validating the functionality of interconnects among different cores; and bypass mode indicates use of the WBY register to facilitate testing of other cores. These operation modes are realized by configurations of the Wrapper Boundary Cell (WBC), i.e. the element of the WBR.

Fig. 6.10(b) displays the structure of a minimal WBC and its configurations to realize different operations. The operation a WBC performs, controlled by the values of Scan Enable and Hold Enable, depends both on the core configuration (i.e. the instruction in the WIR) and on the test control lines WSC. The IEEE 1500 serial test interface contains a test data input terminal WSI and output terminal WSO, a test clock WRCK, a test reset WRSTN, and test control lines WSC. Mandatory WSC signals include SelectWIR, which, when active, connects WSI with the WIR for test mode configuration; ShiftWR, which controls the shift operation of the WBR; and CaptureWR, which indicates the capture operation of the wrapper cells. The actual dependency of Scan Enable and Hold Enable on WSC is mode dependent. An example implementation of WBC control is shown in Table 6.1 for inward facing and outward facing modes [104].

Table 6.1 Control values for wrapper boundary cell.

                 Input Cell                     Output Cell
Instruction      Scan Enable    Hold Enable     Scan Enable    Hold Enable
WS_INTEST        ShiftWR        1               ShiftWR        ∼CaptureWR
WS_EXTEST        ShiftWR        ∼CaptureWR      ShiftWR        1
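The control values of Table 6.1 can be written as a small lookup, shown here as a Python sketch. The function name and the "input"/"output" labels are chosen for illustration; the entries themselves are taken from the table.

```python
from typing import Tuple


def wbc_controls(instruction: str, cell: str, shift_wr: int, capture_wr: int) -> Tuple[int, int]:
    """Return (scan_enable, hold_enable) of a wrapper boundary cell per Table 6.1."""
    not_capture = 1 - capture_wr
    table = {
        ("WS_INTEST", "input"): (shift_wr, 1),
        ("WS_INTEST", "output"): (shift_wr, not_capture),
        ("WS_EXTEST", "input"): (shift_wr, not_capture),
        ("WS_EXTEST", "output"): (shift_wr, 1),
    }
    return table[(instruction, cell)]


# During a capture cycle (ShiftWR=0, CaptureWR=1) only the cells whose
# Hold Enable follows ~CaptureWR have it driven to 0.
for instr in ("WS_INTEST", "WS_EXTEST"):
    for cell in ("input", "output"):
        print(instr, cell, wbc_controls(instr, cell, shift_wr=0, capture_wr=1))
```

The lookup makes the asymmetry explicit: in WS_EXTEST the output cell's Hold Enable is tied to 1 and never follows CaptureWR, which is consistent with the later remark that minimally configured output wrappers do not capture in EXTEST mode.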

6.5.2 SoC-Level IIPS Test Protocol

Scan Chain Authentication

The control of the IIPS M-FSM and SE-FSM is at SoC level. Therefore, for overall IIPS control and scan chain activation, creating the test protocols/patterns is the system integrator's responsibility. Tests can be directly provided in terms of SoC primary inputs and delivered in the form of standard Core Test Language (CTL) [89]. In particular, the enabling process of each security mode (i.e. SC_EN, ScanPUF or TrojDet) can be modeled as a test macro.

Fig. 6.11. Configuration of scan chains of different IPs during signature generation for chip authentication using scanPUF structures.

ScanPUF

On the other hand, it is desirable for core designers to generate the test sets for ScanPUF and Trojan detection at core level, because ScanPUF test data lengths and Trojan detection test patterns are core dependent. The test sets can then be expanded to SoC level by the system integrator. For test sets in conventional testing, expansion from core to SoC level involves translating the core terminals to the corresponding SoC pins, as well as searching for paths to and from the CUT for test stimuli and response propagation when using the parallel test interface [94]. Expansion of ScanPUF and Trojan detection test sets can reuse the same methodology, with the main distinction that an acknowledge signal should be raised for one clock cycle at the completion of test initialization.

For both ScanPUF and Trojan detection, the entire test procedure can be divided into four parts: (1) core test mode configuration; (2) test initialization; (3) signature generation; and (4) signature propagation. Considering ScanPUF, although functional cores in an SoC can be concatenated into a single scan chain to generate the ScanPUF signature, scan paths in different cores may have remarkably different delays and require different tsig. A more efficient way to save test time is to perform the ScanPUF test on different cores individually by setting the Core Under Test (CUT) to INTEST_SCAN mode (i.e. connecting both the WBR and the core internal scan chain into the test path) while configuring other cores to BYPASS mode, as indicated in Fig. 6.11. This procedure is referred to as core test mode configuration, which is exactly the same as in normal SoC testing.

ScanPUF test initialization refers to initializing the entire scan chain with alternating 0's and 1's. Besides test data input and output translation, the initialization data length is expanded to account for the BYPASS registers of irrelevant cores on the test path. At the completion of test initialization, the SoC input CaptureWR is asserted for one clock cycle to launch the signature generation and propagation process. What is needed afterwards is simply maintaining the test lines and pausing the clock until the signature is shifted out. Note that the actual clock and WSC signals applied to the cores for test control are generated automatically by IIPS; the only responsibility at the SoC boundary is to raise the acknowledge signal when test initialization ends.
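The expansion of the ScanPUF initialization data can be sketched in Python as follows. This is an illustration under assumptions that are not stated in the text: the test path is modeled as an ideal shift register from WSI to WSO, the padding value for bypassed cores is taken as 0 (a don't-care), and the segment lengths are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[str, int, bool]  # (name, number of shift stages, is core under test)


def build_shift_stream(path: List[Segment]) -> List[int]:
    """Serial data that leaves alternating 0/1 in the CUT and padding elsewhere."""
    desired = []  # desired final contents, index 0 is the stage nearest WSI
    phase = 0
    for _name, length, is_cut in path:
        for _ in range(length):
            if is_cut:
                desired.append(phase)
                phase ^= 1
            else:
                desired.append(0)  # BYPASS register: value is a don't-care
    # The first bit shifted in travels farthest, so the stream is sent in reverse.
    return desired[::-1]


def shift_in(stream: List[int], total_length: int) -> List[int]:
    """Reference model of the serial test path, used to check the stream."""
    chain = [0] * total_length
    for bit in stream:
        chain = [bit] + chain[:-1]
    return chain


path = [("core1", 1, False), ("core2", 6, True), ("core3", 1, False)]
stream = build_shift_stream(path)
final_state = shift_in(stream, total_length=len(stream))
print("shift stream:", stream)
print("CUT contents:", final_state[1:7])
```

The stream is two bits longer than the CUT scan path, one bit for each bypassed core, which is the data length expansion described above.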

Hardware Trojan Detection

For Trojan detection, test set creation can be considered in two scenarios. To detect Trojans mounted inside the functional cores, test sets should be developed by the core designers and then expanded to SoC level. Trojans could also be inserted to tamper with the interconnection among cores, e.g. on system buses. To test against such attacks, test sets should be created at SoC level. Fig. 6.12 illustrates the core configurations in the two scenarios.

In the first case, to detect Trojans inside a core, the CUT is set to INTEST_SCAN mode to allow inward facing test, while other cores are set to BYPASS mode. Here test initialization means scanning in test vectors for the CUT primary and pseudo primary inputs (scan flip-flops) before launching the signature generation. This step also involves translating the test terminals from core to SoC boundary and expanding the test data to account for the BYPASS registers in other cores, which is trivial because the test is executed through the serial interface and thus no data propagation path search is needed.

In the second case, where the integrity of system interconnections is concerned, test patterns can be directly created at SoC level and no test data expansion is needed. Multiple test sets can be generated, each focusing on Trojan detection in a particular region of the interconnects. When one region of interconnect is being tested, the corresponding driver and receiver cores should be configured to EXTEST mode while the other cores are set to BYPASS mode. In this way, the output wrapper registers of the driver cores can provide test stimuli, and test responses can be captured at the receiver core input wrapper registers.

The test scheme choices also differ for the two cases. When considering Trojans inside a core, both LOC and LOS can be used to capture the test response. However, for Trojans on system buses, only the LOS scheme can be applied with a minimally configured IEEE 1500 architecture. This is because minimally configured output wrappers do not support the capture operation in EXTEST mode, and thus cannot generate the excitation pattern in functional mode with LOC. However, with a higher 1500 configuration where output wrappers are implemented with enhanced scan cells, or with an extended 1500 infrastructure that supports output wrapper capture in EXTEST mode [92], the LOC scheme can also be used for detecting Trojans on system interconnects. A simplified example of the SoC-level test protocol for Trojan detection using LOC is given in Algorithm 3. It is worth noting that WSC may carry patterns that are invalid for normal testing, as the actual test control lines are generated by IIPS.
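The two test scenarios and the resulting scheme choices can be summarized in a short Python sketch. The function and argument names are hypothetical; the rules encoded are the ones stated above (CUT in WS_INTEST_SCAN or driver/receiver cores in WS_EXTEST, all other cores in WS_BYPASS, and LOC on interconnects only when output wrappers can capture in EXTEST mode).

```python
from typing import Dict, Iterable, Optional, Set


def allowed_schemes(target: str, output_wrapper_captures_in_extest: bool = False) -> Set[str]:
    """Launch schemes usable for Trojans inside a core or on interconnects."""
    if target == "core":
        return {"LOC", "LOS"}
    if target == "interconnect":
        if output_wrapper_captures_in_extest:
            return {"LOC", "LOS"}
        return {"LOS"}  # minimal IEEE 1500 wrapper
    raise ValueError(f"unknown target: {target}")


def core_instructions(
    cores: Iterable[str],
    target: str,
    cut: Optional[str] = None,
    drivers: Iterable[str] = (),
    receivers: Iterable[str] = (),
) -> Dict[str, str]:
    """WIR instruction per core for the two scenarios of Fig. 6.12."""
    if target == "core":
        active, instruction = {cut}, "WS_INTEST_SCAN"
    elif target == "interconnect":
        active, instruction = set(drivers) | set(receivers), "WS_EXTEST"
    else:
        raise ValueError(f"unknown target: {target}")
    return {c: (instruction if c in active else "WS_BYPASS") for c in cores}


soc = ["core1", "core2", "core3"]
print(core_instructions(soc, "core", cut="core2"), allowed_schemes("core"))
print(core_instructions(soc, "interconnect", drivers=["core1"], receivers=["core3"]),
      allowed_schemes("interconnect"))
```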

6.6 Results

To verify the functionality and security effectiveness of IIPS, we performed HSPICE simulation on IIPS with 45nm CMOS model [95], and implemented IIPS in a representative SoC on Altera DE0 FPGA platform. We have also characterized the incurred hardware overhead in both ASIC and FPGA contexts.

Fig. 6.12. Example SoC configuration when performing Trojan detection for: (a) Trojans inside a core; (b) Trojans in SoC system bus.

Fig. 6.13. IIPS timing diagram: (a) M-FSM and SE-FSM; (b) ScanPUF clock generation; (c) Trojan detection clock generation.

6.6.1 IIPS Functional Validation

Fig. 6.13 displays the timing diagram, which indicates the IIPS functional state transitions in response to SoC test inputs, as well as the functionality of the clock generator under ScanPUF and Trojan detection mode, respectively. In Fig. 6.13(a), WSP is a 6-bit input including the test reset signal WRSTN, the test data input WSI, and four wrapper serial control signals ShiftWR, CaptureWR, UpdateWR, and SelectWIR. Apart from WRSTN, the remaining signals are considered as a 5-bit bus input to control IIPS functional state transitions. As shown in the waveform, after the correct input sequence is received, the IIPS M-FSM enters state SC_EN (“0100”) to allow scan chain activation. At this time, the WSP input is interpreted by the SE-FSM to determine whether to enable a scan chain. In this example, the authentication input sequence is applied to enable the scan chain of core 1. In the next cycle, scan_en_core1_reg is asserted, allowing access to the core 1 scan chain. Next, an input sequence is provided to make IIPS enter the ScanPUF state for performing ScanPUF on core 1. During the entire ScanPUF procedure scan_en_core1_reg remains high, until IIPS_RSTN becomes active after the ScanPUF operation to reset the entire IIPS. Fig. 6.13(b) and (c) illustrate the timing of the clock generation relevant signals in ScanPUF and Trojan detection mode, respectively.

Algorithm 3 SoC-level test protocol for Trojan detection

    MacroDefs Core_TrojDet_macros {
      // core test mode configuration
      core_config {
        // If the core is CUT, configure it into WS_INTEST_SCAN
        // Otherwise WS_BYPASS (Omitted here)
        Purpose Instruction;
        W core_config_timing;
        C { WRSTN=0; WSI=0; WSO=x; WSC=0000; }
        V { WRSTN=1; SelectWIR=1; }     // to load WIR
        Shift {
          V { WSI=010; }                // instruction WS_INTEST_SCAN
        }
        V { WSC=0001; }                 // update WIR
      }

      // Trojan Detection
      TrojDet {
        // Test initialization
        W TrojDet_timing;
        C { WRSTN=1; WSC=0110; }        // enable scan and set capture scheme as LOC
        Shift {
          V { WSI='wsi'; WSO=#; }       // test pattern defined separately
        }
        // Signature generation acknowledge
        V { ShiftWR=0; }
        // Signature propagation
        F { ShiftWR=1; }
        Shift {
          V { WSI=0; WSO='wso'; }       // check response
        }
      }
    }

Fig. 6.14. Inter-die HD distribution for scan-based PUF in case of 500 chips under tsig = 0.19ns
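The two-stage unlock behavior described in this subsection can be mimicked with a behavioral Python sketch. It is only a toy model: the key words, the key lengths, and the policy of discarding progress on a wrong word are all hypothetical, and the one-cycle register delay of scan_en_core1_reg is omitted.

```python
from typing import List, Sequence


class SequenceLock:
    """Unlocks only after the expected 5-bit words arrive in order."""

    def __init__(self, key: Sequence[int]) -> None:
        self.key = list(key)
        self.matched = 0

    @property
    def unlocked(self) -> bool:
        return self.matched == len(self.key)

    def clock(self, word: int) -> None:
        if self.unlocked:
            return
        # A wrong word discards the progress made so far (assumed policy).
        self.matched = self.matched + 1 if word == self.key[self.matched] else 0

    def reset(self) -> None:
        self.matched = 0


M_KEY = [0b10101, 0b00111, 0b11000]   # hypothetical sequence leading the M-FSM to SC_EN
SE_KEY_CORE1 = [0b01010, 0b10001]     # hypothetical SE-FSM sequence for core 1

m_fsm = SequenceLock(M_KEY)
se_fsm = SequenceLock(SE_KEY_CORE1)


def step(word: int) -> bool:
    """Apply one 5-bit input word; return the modeled scan enable of core 1."""
    if not m_fsm.unlocked:
        m_fsm.clock(word)       # M-FSM consumes inputs until SC_EN is reached
    else:
        se_fsm.clock(word)      # afterwards the SE-FSM interprets the inputs
    return se_fsm.unlocked


def iips_reset() -> None:
    """Models an active IIPS_RSTN: both FSMs return to their initial state."""
    m_fsm.reset()
    se_fsm.reset()


trace: List[bool] = [step(w) for w in M_KEY + SE_KEY_CORE1]
print("scan enable of core 1 per cycle:", trace)
```

The scan enable stays asserted in this model until iips_reset() is called, matching the behavior in which the scan chain remains accessible until the reset after the ScanPUF operation.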

6.6.2 SoC Authentication and Hardware Trojan Detection

The clock generator, including the programmable delay line (PDL) block, is implemented using the 45nm CMOS Predictive Technology Model (PTM) [95]. The PDL contains a 6-level coarse delay control and an 11-level fine delay control block, with resolution of 136ps and 12ps respectively. The tunable clock has a period range of 300ps ∼ 1.15ns, namely 870MHz ∼ 3.33GHz in frequency. Monte Carlo HSPICE simulations were performed to verify the effectiveness of IIPS in supporting ScanPUF and delay-based Trojan detection.

PUF uniqueness requires the signatures of different SoCs to differ widely from one another. We employ the inter-die Hamming distance (HD) to evaluate the difference between two signatures. Fig. 6.14 shows the inter-die HD histogram of signatures (128 bits each) in 500 chips when the signature-generation period is 0.19 ns. Most of the HDs are around 50%, which means that nearly 64 bits of a signature differ from those of another chip. Hence, the ScanPUF in IIPS can provide excellent performance in SoC authentication, with a highly unique signature in each SoC. Reference [97] also shows that under temporal variations (temperature or supply voltage fluctuation), the ScanPUF signature has good robustness, with only a small number of flipped bits.

For Trojan detection, Monte Carlo HSPICE simulations were performed on combinational paths starting from and ending at flip-flops. Each circuit node on the path is loaded with extra inverters to imitate the loading effect in an actual circuit. A test path example is provided in Fig. 6.15. Since the biggest challenge in side-channel analysis based Trojan detection is process variations, which can significantly mask the Trojan effect, it is necessary to validate the IIPS Trojan detection capability under inter-die and intra-die process variations. The minimal Trojan that can be reliably detected, in terms of its incurred extra delay, is used to indicate the IIPS Trojan sensitivity.

Fig. 6.15. Example combinational path used as a model of Trojan attack.

In particular, two combinational paths of delay 393ps and 1013ps were chosen, representing short and long paths in real circuits, respectively. Inter-die variations of transistor threshold voltage Vth with standard deviation σth,inter = 5%, 10%, and 15% were considered, with intra-die variation of standard deviation σth,intra = 5%. At each process corner, a test is first performed on one Trojan-free path to find the threshold capture frequency, which is then used to differentiate Trojan-infected paths from Trojan-free ones. Multiple Trojans incurring different delays were mounted at each process corner to determine the Trojan sensitivity. In this context, we do not concern ourselves with the types of Trojans or the manner in which the Trojan affects the path delay. The Trojan size is denoted directly by the extra delay it causes. Reliable detection here means that the miss rate, including false positives and false negatives, is no greater than 5%. For each process corner and each path, 100 Trojan-free and 100 Trojan-infected copies of the path with Vth in Gaussian distribution were simulated. Fig. 6.16 demonstrates the IIPS Trojan sensitivity at each process corner. It can be seen that the Trojan detection resolution decreases slightly with the increase of inter-die process variations, and is lower for the long path than for the short path. This is because the accumulated process variation effect on the long path is greater than that on the short path, mainly due to more logic levels. With σ = 5% intra-die and σ ≤ 15% inter-die process variations, a Trojan causing delay above 44ps can be reliably detected, equivalent to the delay caused by an extra level of XOR gate.

It is worth noting that Trojan detection approaches generally rely on statistical methods, e.g. principal component analysis [79] or multidimensional scaling [98], to isolate the Trojan-infected ICs. A Trojan usually has a distributed impact on circuit delays at various paths, and statistical tools can comprehensively analyze the Trojan effect all over the circuit and thus make judgements with high precision. In this chapter, our focus is to develop a framework that provides delay-based Trojan detection capability, and statistical analysis is out of scope. However, statistical analysis can remarkably facilitate Trojan detection and is an indispensable step of the Trojan detection procedure. Therefore the actual Trojan sensitivity of the proposed framework is not limited by the reported results, as isolating a Trojan does not require identifying its impact on all paths it tampers with.
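The threshold-based decision and the miss rate defined above can be sketched in Python. This is a simplified illustration, not the HSPICE flow: the path delays are hypothetical numbers, the sweep step reuses the 12ps fine resolution and 300ps minimum period of the PDL, and a chip is assumed to be flagged when its path fails to capture at the threshold period obtained from one Trojan-free reference.

```python
import math
from typing import List


def threshold_period(reference_delay_ps: float, start_ps: float, step_ps: float) -> float:
    """Shortest swept clock period at which the Trojan-free reference still captures."""
    steps = max(0, math.ceil((reference_delay_ps - start_ps) / step_ps))
    return start_ps + steps * step_ps


def flagged(path_delay_ps: float, period_ps: float) -> bool:
    """A path is flagged as infected if it is too slow for the threshold period."""
    return path_delay_ps > period_ps


def miss_rate(clean: List[float], infected: List[float], period_ps: float) -> float:
    """Fraction of wrong decisions, counting false positives and false negatives."""
    false_pos = sum(flagged(d, period_ps) for d in clean)
    false_neg = sum(not flagged(d, period_ps) for d in infected)
    return (false_pos + false_neg) / (len(clean) + len(infected))


period = threshold_period(reference_delay_ps=393, start_ps=300, step_ps=12)
clean_delays = [380.0, 390.0, 395.0, 401.0]          # hypothetical Trojan-free copies
infected_delays = [d + 44.0 for d in clean_delays]   # same copies with a 44 ps Trojan
rate = miss_rate(clean_delays, infected_delays, period)
print(f"threshold period: {period} ps, miss rate: {rate:.3f}")
```

In this toy data set the only error is a slow Trojan-free copy flagged as infected, which illustrates how process variation, rather than the Trojan delay itself, limits the sensitivity.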

Fig. 6.16. Minimum delay of Trojan detectable with clock sweeping technique implemented in IIPS for σ = 5% intra-die process variation.
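For the uniqueness metric reported in this section (Fig. 6.14), the inter-die Hamming distance between every pair of signatures can be computed as in the following Python sketch. The three 8-bit signatures are made-up values used only to keep the example short; the signatures in the experiments are 128 bits long.

```python
from itertools import combinations
from typing import List


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def inter_die_hd_percent(signatures: List[int], n_bits: int) -> List[float]:
    """Pairwise inter-die HD, expressed as a percentage of the signature length."""
    return [100.0 * hamming_distance(a, b) / n_bits
            for a, b in combinations(signatures, 2)]


toy_signatures = [0b10110010, 0b01100111, 0b11010001]  # hypothetical 8-bit signatures
hd = inter_die_hd_percent(toy_signatures, n_bits=8)
print("pairwise inter-die HD (%):", hd)
print("average (%):", sum(hd) / len(hd))
```

An ideal PUF yields a distribution centered at 50%, i.e. about 64 differing bits for a 128-bit signature.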

6.6.3 Hardware Overhead

We use two representative SoC benchmarks as the baseline design to characterize the silicon overhead introduced by IIPS. SoC benchmark 1 contains three ISCAS89 sequential circuits, while SoC benchmark 2 tries to imitate realistic designs by incorporating a 32-bit DLX processor, an AES with 128-bit key, and a 128-bit pipelined FFT module. Both designs have full-scan infrastructure. The numbers of scan chains are estimated by assuming an average scan chain length of 250, which is obtained from the first ten benchmark circuits in the ITC02 SoC benchmark [100]. Note that although the SE-FSM scales up with an increasing number of scan chains, the number of state elements in the SE-FSM increases only at the rate of log2(C · p), i.e. log2(p) + log2(C), where p is the rate of increase in the number of scan chains and C is the authentication key length. Table 6.2 demonstrates that the percentage area overhead induced by IIPS decreases remarkably with the size of the benchmark circuits. For the realistic design (benchmark SoC 2) the overhead is less than 1%. In addition, since IIPS resides outside the functional modules and is used only when performing off-line testing or security tasks, it does not incur functional performance degradation or power overhead. The clock tree routing can also be preserved by keeping the multiplexers between the IIPS clock and the functional clock inside IIPS.

Table 6.2 Hardware overhead of IIPS w.r.t. two example SoCs.

Benchmark SoC 1      s1423    s5378    s9234    SoC       IIPS
Area (µm2)           1151     2659     3344     7154      546 (+7.6%)
# of SFFs            74       179      211      464       -
# of scan chains     1        1        1        3         -

Benchmark SoC 2      DLX      AES      FFT      SoC       IIPS
Area (µm2)           30845    83049    500386   614280    4978 (+0.8%)
# of SFFs            521      2469     33203    36193     -
# of scan chains     3        10       133      146       -
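The entries of Table 6.2 can be cross-checked with a few lines of Python. The rounding rule (one scan chain per started group of 250 scan flip-flops) is an assumption; it is not stated in the text, but it reproduces the scan chain counts in the table. The values of C and p in the last lines are arbitrary and only illustrate the logarithm identity.

```python
import math

AVG_CHAIN_LENGTH = 250  # average scan chain length assumed in the text


def scan_chains(num_sffs: int) -> int:
    """Estimated number of scan chains (assumed rounding: ceiling)."""
    return math.ceil(num_sffs / AVG_CHAIN_LENGTH)


def overhead_percent(iips_area: float, soc_area: float) -> float:
    return 100.0 * iips_area / soc_area


sffs = {"s1423": 74, "s5378": 179, "s9234": 211, "DLX": 521, "AES": 2469, "FFT": 33203}
chains = {core: scan_chains(n) for core, n in sffs.items()}
print(chains)

print(round(overhead_percent(546, 7154), 1))     # benchmark SoC 1
print(round(overhead_percent(4978, 614280), 1))  # benchmark SoC 2

# SE-FSM state bits grow as log2(C * p) = log2(C) + log2(p); C and p are arbitrary here.
C, p = 64, 8
print(math.isclose(math.log2(C * p), math.log2(C) + math.log2(p)))
```

Since the IIPS area is nearly independent of the SoC size while the SE-FSM grows only logarithmically, the relative overhead shrinks as the design gets larger.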

6.6.4 Experimental Validation

Hardware validation of IIPS was performed on an FPGA platform where Altera DE0 FPGA development boards were used to emulate the ASIC scenario. IIPS was implemented on the 65nm Cyclone III FPGA devices contained on the DE0 boards. Due to the lack of precise control over the placement and routing of the mapped design, the PDL cannot be implemented as in ASICs to produce accurately tunable delays. We therefore employ the on-chip Phase Locked Loop (PLL) module in the FPGA to generate the phase-shifted clock. The on-chip PLL can provide phase shift at a resolution of 97ps in our experiments, which is equivalently the clock frequency sweeping resolution.

We have implemented a minimally configured IEEE 1500 infrastructure in SoC benchmark 1, and validated the functionality of IIPS by interfacing it with the benchmark circuit. The hardware overhead incurred by IIPS is provided in Table 6.3. Part of the extra resource utilization is contributed by the control logic for the PLL, which does not come into the picture in ASIC scenarios.

Table 6.3 IIPS overhead in FPGA platform.

                   SoC 1    SoC 1 w/ IIPS    Overhead
# of LUTs          1194     1238             +3.7%
# of Registers     533      577              +8.3%

To test the effectiveness of the clock generation logic, we performed Trojan detection with IIPS implemented in SoC benchmark 1. In particular, Trojans were mounted on 8 different paths of the s9234 core. The Trojan was implemented as a single inverter inserted at different locations of the paths to imitate the minimal delay introduced by an extra logic level. The underlying assumption is that the payload of a generic digital Trojan will introduce at least one extra logic level on a certain path of the circuit. We preserved the post-fitting netlist of the Trojan-free design by using incremental compilation to insert the Trojan, in order to assure that the extra delay is caused by Trojan logic rather than by changes in global layout topology. Target circuit paths were chosen to have nominal delay ranging from 3.7ns to 12.5ns. 16 FPGA boards were used, with the Trojan-free design and the Trojan-infected design mapped once on each board for 32 copies of the design. The detection miss rate for each path is provided in Table 6.4. It can be seen that the minimal 1-inverter Trojan with average delay of 0.35ns can be reliably detected with an average miss rate of 6.3%.

Table 6.4 Hardware Trojan detection results on FPGA.

Path index                            1      2      3      4      5      6       7      8       Average
Path delay (ns)                       3.7    4.5    5.8    6.1    7.2    8.1     10.3   13.1    -
Trojan induced delay overhead (ns)    0.37   0.26   0.46   0.30   0.26   0.33    0.59   0.19    0.35
Miss rate                             0      9.4%   3.1%   9.4%   3.1%   12.5%   0      12.5%   6.3%

The miss rate is expected to exhibit a monotonically increasing trend with increasing path length if the Trojan-induced delay is identical for each path. However, the measured miss rate varies irregularly due to slight variations in Trojan placement and routing that are out of our control. The actual Trojan coverage will be much higher thanks to statistical analysis, as explained in Section 6.6.2.
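The averages in Table 6.4 can be reproduced with the short Python sketch below. The per-path miss counts are an inference, not reported data: they assume that each miss rate is a count of wrong decisions out of the 32 mapped copies of the design, which is consistent with every percentage in the table.

```python
import math

COPIES = 32  # 16 boards, each with one Trojan-free and one Trojan-infected mapping

trojan_delay_ns = [0.37, 0.26, 0.46, 0.30, 0.26, 0.33, 0.59, 0.19]
miss_counts = [0, 3, 1, 3, 1, 4, 0, 4]  # inferred from the percentages, see lead-in

miss_rate_percent = [round(100.0 * m / COPIES, 1) for m in miss_counts]
average_miss_percent = 100.0 * sum(miss_counts) / (COPIES * len(miss_counts))
average_delay_ns = sum(trojan_delay_ns) / len(trojan_delay_ns)

print("per-path miss rate (%):", miss_rate_percent)
print("average miss rate (%):", average_miss_percent)   # reported as 6.3% in the table
print("average Trojan delay (ns):", average_delay_ns)    # reported as 0.35 in the table
```

Under this reading, a single wrong decision on a path already corresponds to about 3.1%, so the per-path figures are quantized in steps of 1/32.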

6.7 Discussion

6.7.1 Flexibility

The IIPS test protocol described in this chapter is based on a minimal implementation of the IEEE 1500 test infrastructure. However, IIPS security functions are flexible in interfacing with enhanced configurations of the IEEE 1500 architecture. In fact, IIPS test efficiency and effectiveness can be improved with advanced features of IEEE 1500. Both ScanPUF and Trojan detection test protocols can be adapted to use the parallel interface for test vector propagation to save test time, provided a parallel TAM architecture is available. For the minimally configured IEEE 1500 infrastructure, Trojan detection on system buses can only be performed with the LOS scheme, because core output terminal wrapper cells do not support a capture operation in EXTEST mode. However, if the wrapper is implemented using WBCs with enhanced scan functions, or the infrastructure is extended to support delay-fault test [92], a broadside capture can be provided at the system bus inputs for the LOC test scheme. Improved Trojan detection as well as delay-fault coverage can thus be expected.

6.7.2 Scalability

IIPS also scales well functionally: it can integrate more security functions or test infrastructures at minimal extra overhead. It can be adapted to provide protection against other attack models, e.g. side-channel attacks on crypto systems. For example, IIPS can be equipped with a noise injector [99] to mask the transient current that leaks encryption key related information. In fact, existing logic in IIPS can be leveraged for noise generation, given proper control, to minimize the hardware overhead. Besides, IIPS can incorporate multiple approaches aimed at a certain attack model to enhance the defense capability. For instance, a current monitor can be integrated to detect the behavior of hardware Trojans at run-time [101]. IIPS can also include an assertion-based security checker to detect malicious inclusions in on-chip processors [102]. Moreover, IIPS can be shared to enhance in-field SoC testing capability, e.g. by integrating with an on-chip test controller for SoC built-in self-test (BIST) [84], [103]. Integrating security functions and test support logic in IIPS can significantly reduce the design and hardware overhead because the centralized control logic is shared. In addition, the ease of integration of IIPS is maintained in large, complex SoCs with an increasing number of IPs, because no extra design effort is required for integrating IIPS in a larger SoC to achieve the same level of security. Although scan chain gating and test control signal multiplexing need to be done for each core, this process can be automated and can be performed after the integration of functional IPs and timing/power closure, making IIPS a scalable plug-n-play IP suitable for SoCs of various sizes.

6.7.3 Configurability

IIPS can provide design-time and run-time configurability by enabling different subsets of its security functions to fit different applications or SoC operation modes. Design-time configurability can be provided by using compiler directives in HDL code to allow a system integrator to enable only the security measures desirable for his/her own purposes, and exclude the rest to reduce overhead. IIPS can also provide run-time configurability, where activation of different security measures is based on sensing the run-time behavior of the SoC. For instance, in low-power mode power attacks are more likely, hence IIPS may trigger more aggressive countermeasures against power analysis attacks. During test/debug mode, security measures can be adapted to enable enhanced testing features.

6.8 Summary

We have presented a novel paradigm of secure SoC design using an infrastructure IP for security. The IP, referred to as IIPS, is a centralized on-chip module that interfaces with other IPs in the SoC and provides multiple hardware security measures. IIPS features ultra-low overhead, ease of integration, compliance with the standard SoC test protocol, functional flexibility, scalability and configurability. It can also facilitate conventional SoC testing. We have presented the design of the IIPS module and its efficient interfacing with other IP cores. We have shown the effectiveness of an IIPS module that incorporates a low-overhead authentication mechanism for preventing scan-based attacks; a Physical Unclonable Function (PUF) primitive to protect against piracy and counterfeit chips; and a test infrastructure for trust validation against hardware Trojan attacks. Monte Carlo HSPICE simulation results and experimental measurements on an FPGA platform validate the functionality and security features of IIPS. The results show high uniqueness of the ScanPUF primitive with close to 50% average inter-die HD. The Trojan detection primitive can reliably detect the impact of a single XOR gate in a long delay path with above 95% precision. Both ASIC and FPGA implementations of IIPS show that it incurs ultra-low hardware overhead. IIPS can flexibly interface with SoCs equipped with various configurations of IEEE 1500 infrastructure, and can benefit from higher configurations in test efficiency. Along with its plug-n-play advantage, IIPS can be expanded to provide enhanced security functions for protecting an SoC against other security attacks, or against a certain attack in multiple ways, as well as to integrate with test infrastructures to minimize the design cost and overhead. Future work will include effective integration of IIPS with other infrastructure IPs for test and debug as well as expansion of its capability to protect against other security issues.

7. CONCLUSION AND FUTURE WORK

Hardware Trojan attack is an emerging security attack in electronic hardware (integrated circuits and components) that poses severe threats to operational reliability and integrity, as observed through security analysis as well as recently reported attacks. This thesis has performed a thorough analysis of the attack by exploring the hardware Trojan design space and the feasibility of hard-to-detect Trojan insertion, and by developing defense mechanisms to thwart the attack through design-for-security as well as efficient test solutions. The proposed approaches have been shown, through both simulations and experimental validation, to be effective in protecting against Trojan attacks of diverse forms and sizes at low hardware overhead. A possible next step of countermeasures against hardware Trojans would be integrating multiple defense mechanisms to provide comprehensive threat prevention and detection against various forms of attacks.

In particular, future investigations will extend both the threat analysis, by exploring novel hardware Trojan design techniques, and the low-cost defense mechanisms. The hardware/software interface in processors will be reviewed to explore opportunities of inserting hardware Trojans to facilitate well known software attacks, e.g. buffer overflow. The design of Trojans in SRAMs will be extended by exploiting the array peripheral circuitry like word-line buffers, which typically introduce more irregular logic as well as free space in the layout, to enable the design of more sophisticated Trojans, and possibly sequential Trojans. The model and design methodology of SRAM Trojans can be adapted to other types of memories, e.g. register files, DRAM and flash memory.

Test generation for TeSR will be further improved for generating compact test patterns while maintaining the Trojan coverage. The methodology can also be validated against designs containing multiple FSMs that make the issue of state explosion more challenging. Future investigation on SCARE would focus on developing an automation framework and validation with measurement results from commercial ICs. The design of IIPS can be expanded to incorporate design-time programmability, so that a system integrator can have flexibility in enabling only the security functions desirable for his/her own purposes, and excluding the rest to reduce overhead. The IIPS module can also be made configurable at run time to adapt to the changing security needs across the operating modes of an IC. Security functions of IIPS will also be enriched to protect an SoC against other forms of hardware attacks. For example, a noise injector circuit can be included to defend an on-chip crypto core against side-channel attacks. For SoCs that apply hardware obfuscation techniques to prevent IP piracy or reverse-engineering attacks, the obfuscation control logic can be integrated into IIPS to minimize design overhead.

REFERENCES

[1] DARPA, “TRUST in Integrated Circuits (TIC),” 2007. [Online]. Available: http://www.darpa.mil/MTO/solicitations/baa07-24.
[2] R. S. Chakraborty, S. Narasimhan, and S. Bhunia, “Hardware Trojan: Threats and emerging solutions,” High-Level Design Verification and Test Workshop, 2009.
[3] L. Lin, W. Burleson, and C. Paar, “MOLES: malicious off-chip leakage enabled by side-channels,” Intl. Conf. on Computer-Aided Design, 2009.
[4] Y. Jin and Y. Makris, “Hardware Trojans in Wireless Cryptographic ICs,” IEEE Design & Test of Computers, vol. 27, no. 1, pp. 26-35, 2010.
[5] Cyber Security Awareness Week ESC. [Online]. Available: http://www.poly.edu/csaw-embedded.
[6] X. Wang, S. Narasimhan, A. Krishna, T. Mal-Sarkar, and S. Bhunia, “Sequential hardware Trojan: Side-channel aware design and placement,” IEEE 29th International Conference on Computer Design (ICCD), 2011.
[7] A. Maiti, J. Casarona, L. McHale, and P. Schaumont, “A Large Scale Characterization of RO-PUF,” Proc. IEEE Intl. Workshop on Hardware-Oriented Security and Trust (HOST), 2010.
[8] S. Narasimhan, X. Wang, D. Du, R. S. Chakraborty, and S. Bhunia, “TeSR: A Robust Temporal Self-Referencing Approach for Hardware Trojan Detection,” Proc. IEEE Intl. Workshop on Hardware-Oriented Security and Trust (HOST), 2011.
[9] X. Wang, T. Mal-Sarkar, A. Krishna, S. Narasimhan, and S. Bhunia, “Software exploitable hardware Trojans in embedded processor,” IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2012.
[10] M. Tehranipoor and F. Koushanfar, “A survey of hardware Trojan taxonomy and detection,” IEEE Design and Test of Computers, vol. 27, no. 1, pp. 10-25, 2010.
[11] Y. Jin and Y. Makris, “Hardware Trojan detection using path delay fingerprint,” HOST, 2008.
[12] S. T. King et al., “Designing and Implementing Malicious Hardware,” USENIX Workshop on LEET, 2008.
[13] R. L. Rivest, “The RC5 Encryption Algorithm,” FSE, 1994.
[14] R. Karri, J. Rajendran, K. Rosenfeld, and M. Tehranipoor, “Toward trusted hardware: Identifying and classifying hardware Trojans,” IEEE Computer Magazine, 2010.
[15] A. R. Alameldeen and D. A. Wood, “Adaptive Cache Compression for High-Performance Processors,” ISCA, 2004.
[16] H. Asadi et al., “Reliability Tradeoffs in Design of Cache Memories,” Workshop on Architectural Reliability, 2005.
[17] A. J. van de Goor, “Using March Tests to Test SRAMs,” IEEE Design & Test of Computers, March 1993.
[18] S. Hamdioui and A. J. van de Goor, “An Experimental Analysis of Spot Defects in SRAMs: Realistic Fault Models and Tests,” ATS, 2000.
[19] S. Hamdioui et al., “Memory Test Experiment: Industrial Results and Data,” IEE Proceedings, 2006.
[20] K.-L. Cheng et al., “Neighborhood Pattern-Sensitive Fault Testing and Diagnostics for Random-Access Memories,” IEEE Trans. on CAD of Integr. Circuits and Syst., vol. 21, no. 11, November 2002.
[21] S. Hamdioui et al., “Testing Static and Dynamic Faults in Random Access Memories,” VTS, 2002.
[22] S. Hamdioui et al., “Linked Faults in Random-Access-Memories: Concepts, Fault Models, Test Algorithms and Industrial Results,” IEEE Trans. on CAD of Integr. Circuits and Syst., 2004, 23, (5), pp. 282-289.
[23] W. Maly, “Modeling of Lithography Related Yield Losses for CAD of VLSI Circuits,” IEEE Trans. CAD, 1985, 4, (3), pp. 166-177.
[24] J. P. Shen, “Inductive Fault Analysis of CMOS Integrated Circuits,” IEEE Des. Test Comput., 1985, 2, (6), pp. 13-26.
[25] M. A. Breuer, “Diagnosis and reliable design of digital systems,” Computer Science Press, 1976.
[26] A. J. van de Goor, “Testing Semiconductor Memories, Theory and Practice,” John Wiley & Sons, Chichester, UK, 1991.
[27] S. Borri et al., “Defect-Oriented Dynamic Fault Models for Embedded-SRAMs,” Test Workshop, 2003.
[28] A. J. van de Goor and I. B. S. Tlili, “March Tests for Word-Oriented Memories,” Design, Automation and Test in Europe (DATE), 1998.
[29] D. Agrawal, S. Baktir, D. Karakoyunlu, P. Rohatgi, and B. Sunar, “Trojan detection using IC fingerprinting,” IEEE Symp. on Security and Privacy, 2007.
[30] R. Rad, J. Plusquellic, and M. Tehranipoor, “A sensitivity analysis of power signal methods for detecting hardware Trojans under real process and environmental conditions,” IEEE Trans. on Very Large Scale Integration (VLSI), 2010.

[31] S. Narasimhan et al, “Multiple-parameter side-channel analysis: A non-invasive hardware Trojan detection approach,” HOST, 2010. [32] J. Aarestad, D. Acharyya, R. Rad and J. Plusquellic, “Detecting Trojans though leakage current analysis using multiple supply pad IDDQs,” IEEE Tran. on In- formation Forensics and Security, 2010. [33] C. Lamech, R.M. Rad, M. Tehranipoor, J. Plusquellic, “An experimental analysis of power and delay signal-to-noise requirements for detecting Trojans and methods for achieving the required detection sensitivities,” IEEE Tran. on Information Forensics and Security, Sept. 2011. [34] F. Koushanfar and A. Mirhoseini, “A unified framework for multimodal submod- ular Integrated Circuits Trojan detection,” IEEE Tran. on Information Forensics and Security, vol. 6, no. 1, Mar. 2011. [35] M. Banga and M. Hsiao, “A region based approach for the identification of Hardware Trojans,” HOST, 2008. [36] D. Du, S. Narasimhan, R.S. Chakraborty and S. Bhunia, “Self-referencing: A scalable side-channel approach for hardware Trojan detection,” CHES Workshop, 2010. [37] H. Salmani, M. Tehranipoor, and J. Plusquellic, “A layout-aware approach for improving localized switching to detect hardware Trojans in Integrated Circuits,” IEEE Intl. Workshop on Information Forensics and Security (WIFS), 2010. [38] M. Banga and M.S. Hsiao, “A novel sustained vector technique for the detection of hardware Trojans,” VLSI Design Conference, 2009. [39] J. Rajendran, V. Jyothi, O. Sinanoglu, and R. Karri, “Design and analysis of ring oscillator based Design-for-Trust technique,” VTS, 2011. [40] C. Lavin et al., “Using Hard Macros to Reduce FPGA Compilation Time,” FPL, 2010. [41] X. Wang, S. Narasimhan, A. Krishna and S. Bhunia, “SCARE: Side-Channel Analysis Based Reverse Engineering for Post-Silicon Validation,” 25th Interna- tional Conference on VLSI Design (VLSID), 2012. [42] M. Potkonjak, A. Nahapetian, M. Nelson and T. 
Massey, “Hardware Trojan horse detection using gate-level characterization,” DAC, 2009. [43] Y. Alkabani and F. Koushanfar, “Consistency-based characterization for IC Trojan detection,” Intl. Conf. on Computer-Aided Design, 2009. [44] S. Wei, and M. Potkonjak, “Scalable hardware Trojan diagnosis,” IEEE Tran. on Very Large Scale Integration (VLSI), 2011. [45] M. Abramovici and P. Bradley, “Integrated Circuit security - new threats and solutions,” CSIIR Workshop, pp. 1-3, 2009.. [46] G. Bloom et al, “Providing secure execution environments with a last line of defense against Trojan circuit attacks,” Computers and Security, 2009. 150

[47] G. E. Suh et al., “Physical Unclonable Functions for Device Authentication and Secret Key Generation,” DAC, 2007. [48] S. Mangard et al., “Power Analysis Attacks- Revealing the Secrets of Smart Cards,” Springer 2007. [49] N. H. E. Weste and D. M. Harris, “CMOS VLSI Design: A Circuits and Systems Perspective,” Addison Wesley 2011. [50] S. Kundu, S.T. Zachariah, Yi-Shing Chang and C. Tirumurti, “On modeling crosstalk faults,” IEEE Tran. Computer-Aided Design, 2005. [51] K. Tiri and I. Verbauwhede, “A logic level design methodology for a secure DPA resistant ASIC or FPGA implementation,” DATE, 2004. [52] K.J. Kulikowski, V. Venkataraman, Z. Wang and A. Taubin, “Power balanced gates insensitive to routing capacitance mismatch,” DATE, 2008. [53] [Online]. Available: www.opencores.org. [54] D. James. “Reverse engineering delivers product knowledge, aids tech- nology spread”. Electronic Design, 2006. [Online]. Available: http:// electronicdesign.com/Articles/Index.cfm?AD=1&ArticleID=11966. [55] DARPA, “Integrity and Reliability of Integrated Circuits (IRIS)”, 2010. [Online]. Available: https://www.fbo.gov/index?id= 342ac5ed191ae7b8b03357fead590c4e. [56] Chipworks, Inc., “Semiconductor manufacturing - reverse engineering of semi- conductor components, parts and process”. [Online]. Available: http://www. chipworks.com. [57] R.S. Chakraborty, F. Wolff, S. Paul, C. Papachristou and S. Bhunia, “MERO: A statistical approach for hardware Trojan detection”, CHES Workshop, 2009. [58] M. Tehranipoor and F. Koushanfar. “A survey of hardware Trojan taxonomy and detection”. IEEE Design and Test of Computers, 2010. [59] M.C. Hansen, H. Yalcin, and J.P. Hayes. “Unveiling the ISCAS-85 Benchmarks: A case study in reverse engineering”. IEEE Design and Test of Computers, vol. 16, no. 3, pp. 72-80, 1999. [60] D.G. Saab, V. Nagabudi, F. Kocan, and J. Abraham. “Extraction based verifi- cation method for off the shelf Integrated Circuits”. ASQED, 2009. [61] S. 
Borkar et al, “Parameter variations and impact on circuits and micro- architecture”, DAC, 2003. [62] D. Du, S. Narasimhan, R.S. Chakraborty and S. Bhunia, “Self-referencing: A scalable side-channel approach for hardware Trojan detection”, CHES, 2010. [63] R. Rad, J. Plusquellic and M. Tehranipoor, “A sensitivity analysis of power signal methods for detecting hardware Trojans under real process and environmental conditions”, IEEE TVLSI, 2010. [64] Predictive Technology Model, [Online] http://www.eas.asu.edu/∼ptm/. 151

[65] K. Tiri et al., “A Logic Level Design Methodology for a Secure DPA Resistant ASIC or FPGA Implementation,” DATE, 2004. [66] Y. Lu et al., “FPGA Implementation and Analysis of Random Delay Insertion Countermeasure against DPA,” Proceedings of the International Conference on ICECE Technology (FTP08), 2008. [67] A. R. Krishna et al., “MECCA: A Robust Low-Overhead PUF using Embedded Memory Array,” CHES, 2011. [68] J. Lee et al., “Securing Scan Design Using Lock & Key Technique,” DFT, 2005. [69] B. Yang et al., “Secure Scan: A Design-for-Test Architecture for Crypto Chips,” DAC, 2005. [70] S. Paul et al., “VIm-Scan: A Low Overhead Scan Design Approach for Protection of Secret Key in Scan-Based Secure Chips,” VTS, 2007. [71] F. Koushanfar, “Integrated Circuits Metering for Piracy Protection and Digital Rights Management: an Overview,” GLSVLSI, 2011. [72] V. V. D. Leest et al., “Anti-counterfeiting with hardware intrinsic security,” DATE, 2013. [73] B. Gassend et al., “Delay-Based Circuit Authentication and Applications,” SAC, 2003. [74] Quiddicard: http://www.intrinsic-id.com/products/quiddicard-/. [75] IHS Electronics & Media: http://www.isuppli.com/Pages/Market-Research- Products.aspx. [76] D. Hely et al., “Scan Design and Secure Chip,” IEEE Intl. On-Line Testing Symposium, 2004. [77] R. S. Chakraborty et al., “Security Against Hardware Trojan Attacks Using Key-based Design Obfuscation,” Journal of Electronic Testing: Theory and Ap- plications, vol. 27. no. 6, pp. 767-785, Dec 2011. [78] X. Zhang et al., “RON: An on-chip ring oscillator network for hardware Trojan detection,” DATE, 2011. [79] Y. Jin et al., “Hardware Trojan detection using path delay fingerprint,” HOST, 2008. [80] H. Salmani et al., “A Novel Technique for Improving Hardware Trojan Detection and Reducing Trojan Activation Time,” IEEE Transactions on Very Large Scale Integration Systems, Vol. 20, No. 1, Jan 2012. 
[81] Synopsys Verification IP: http://www.synopsys.com/Tools/Verification/ - FunctionalVerification/VerificationIP/Pages/default.aspx. [82] Intellitech Test-IP Product Family: http://www.intellitech.com/- prod- ucts/boundary scan test.asp. 152

[83] Y. Zorian, “Guest Editor’s Introduction: What is Infrastructure IP?” IEEE Design & Test of Computers, 2002. [84] P. Bernardi et al., “Exploiting an I-IP for in-field SoC test,” DFT, 2004. [85] S. Tabatabaei et al., “Embedded Timing Analysis: A SoC Infrastructure,” IEEE Design & Test of Computers, 2002. [86] E. Dupont et al., “Embedded Robustness IPs for Transient-Error-Free ICs,” IEEE Design & Test of Computers, 2002. [87] J. Bordelon et al., “A Strategy for Mixed-Signal Yield Improvement,” IEEE Design & Test of Computers, 2002. [88] F. DaSilva et al., “Overview of the IEEE P1500 Standard,” ITC, 2003. [89] IEEE 1450.6 Core Test Language (CTL): http://grouper.ieee.org/groups/ctl/. [90] International Technology Roadmap for Semiconductors: http://www.itrs.net/. [91] IEEE 1500 Embedded Core Test: http://grouper.ieee.org/groups/1500/. [92] Q. Xu et al., “Delay Fault Testing of Core-Based Systems-on-a-Chip” DATE, 2003. [93] R. Tayade et al., “On-chip Programmable Capture for Accurate Path Delay Test and Characterization,” ITC, 2008. [94] E. J. Marinissen et al., “The Role of Test Protocols in Automated Test Gen- eration for Embedded-Core-Based System ICs,” Journal of Electronic Testing: Theory and Applications, Vol. 18 Issue 4-5, Aug.-Oct., 2002. [95] Available Online: http://ptm.asu.edu/. [96] D. D. Josephson et al., “Debug Methodology for the McKinley Processor,” ITC, 2001. [97] Y. Zheng et al., “ScanPUF: Robust Ultralow Overhead PUF Using Scan Chain,” ASP-DAC, 2013. [98] K. Xiao et al., “A Clock Sweeping Technique for Detecting Hardware Trojans Impacting Circuits Delay,” IEEE Design & Test of Computers, Issue 99, 2012. [99] X. Wang et al., “Role of Power Grid in Side Channel Attack and Power-Grid- Aware Secure Design,” DAC, 2013. [100] ITC’02 SoC Test Benchmarks: http://itc02socbenchm.pratt.duke.edu/. [101] S. 
Narasimhan et al., “Improving IC Security against Trojan Attacks through Integration of Security Monitors,” IEEE Design & Test of Computers Special Issue on Smart Silicon, 2012. [102] M. Bilzor et al., “Security Checkers: Detecting processor malicious inclusions at runtime,” HOST, 2011. [103] P. Bernardi et al., “A P1500-compatible programmable BIST approach for the test of Embedded Flash Memories,” DATE, 2003. [104] S. Francisco et al., “The Core Test Wrapper Handbook,” Springer 2006.