INFRASTRUCTURE AND PRIMITIVES FOR HARDWARE SECURITY IN INTEGRATED CIRCUITS

by ABHISHEK BASAK

Submitted in partial fulfillment for the degree of Doctor of Philosophy

in Electrical Engineering and Computer Science

CASE WESTERN RESERVE UNIVERSITY

May 2016

CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES

We hereby approve the dissertation of ABHISHEK BASAK, candidate for the degree of Doctor of Philosophy

Committee Chair Swarup Bhunia

Committee Member Frank Merat

Committee Member Soumyajit Mandal

Committee Member Ming-Chun Huang

Committee Member Sandip Ray

Date of Defense 03/15/2016

We also certify that written approval has been obtained for any proprietary material contained therein.

To my Family and Friends

Contents

List of Tables

List of Figures

Abbreviations

Acknowledgements

Abstract

1 Introduction
  1.1 What are Counterfeit ICs?
  1.2 Related Work on Countermeasures against Counterfeit ICs
  1.3 Major Contributions of Research (Part I)
  1.4 System-on-Chip (SoC) Security
    1.4.1 Background on SoC Security Policies
    1.4.2 Issues with Current SoC Design Trends
    1.4.3 Related Work
  1.5 Major Contributions of Research (Part II)
  1.6 Organization of Thesis

2 Antifuse based Active Protection against Counterfeit ICs
  2.1 C-Lock Methodology
    2.1.1 Business Model
    2.1.2 Pin Lock Structure
    2.1.3 Lock Insertion in I/O Port Circuitry
    2.1.4 Programming the Key
    2.1.5 Design Circuitry for Chip Unlocking
      2.1.5.1 Lock/Unlock Controller State Transitions
  2.2 Security and Overhead Analysis of C-Lock
    2.2.1 Security Analysis
      2.2.1.1 Resistance against Side Channel Attacks
      2.2.1.2 Why not FSM based Unlocking?
    2.2.2 Overhead Analysis
    2.2.3 Comparison with PUF and Aging Sensors
  2.3 Discussion
  2.4 P-Val Methodology
  2.5 P-Val Implementation
    2.5.1 Important AF Properties
    2.5.2 P-Val Component Selection
      2.5.2.1 Effect of AF/TF on Normal Pin Operation
      2.5.2.2 Antifuse (AF) Selection
      2.5.2.3 Test Fuse (TF) Selection
      2.5.2.4 Package Level Fabrication
  2.6 Pin Locking and IC Authentication in P-Val
    2.6.1 Pin Locking
    2.6.2 IC Authentication Methodology
    2.6.3 Signature Generation
  2.7 Security Analysis
    2.7.1 P-Val Security against Recycled Chips
    2.7.2 Security of P-Val against Cloned Chips
      2.7.2.1 Precision Resistance Insertion
      2.7.2.2 AF Integration in Cloned ICs
      2.7.2.3 Protection against Overproduced ICs
    2.7.3 Uniqueness and Robustness of Signature
      2.7.3.1 Simulation Setup & Metrics
      2.7.3.2 Results
    2.7.4 Sample Cloning and Overhead Values
  2.8 Conclusion

3 Nearly Free of Cost Protection against Cloned ICs
  3.1 PiRA Methodology
  3.2 Implementation of PiRA
    3.2.1 Sources of Entropy
    3.2.2 Measurement Scheme
    3.2.3 Signature Generation
  3.3 Security Analysis
    3.3.1 PiRA Security
    3.3.2 Uniqueness and Robustness of Signature
    3.3.3 Discussion
  3.4 Conclusion

4 A Flexible Architecture for Systematic Implementation of SoC Security Policies
  4.1 Architecture
    4.1.1 IP Security Wrappers
    4.1.2 Security Wrapper Implementation
    4.1.3 Security Policy Controller
    4.1.4 Secure Authenticated Policy Upgrades
    4.1.5 Policy Implementation in SoC Integration
    4.1.6 Alleviation of Issues
  4.2 Use Case Scenarios
    4.2.1 Use Case I: Secure Crypto
    4.2.2 Use Case II: Access Control
  4.3 Overhead Analysis
  4.4 Conclusion

5 Exploiting Design-for-Debug in SoC Security Policy Architecture
  5.1 On-Chip Debug Infrastructure
  5.2 Methodology
  5.3 DfD-Based Security Architecture
    5.3.1 Debug-Aware IP Security Wrapper
    5.3.2 SPC-Debug Infrastructure Interface
    5.3.3 Design Methodology
  5.4 Use Case Analysis
    5.4.1 An Illustrative Policy Implementation
    5.4.2 On-Field Policy Implementation/Patch
  5.5 Experimental Results
  5.6 Related Work
  5.7 Hardware Patch in SoCs
  5.8 Conclusion

6 Security Assurance in SoC in presence of Untrusted IP Blocks
  6.1 Problem of Untrustworthy IPs
  6.2 Background and Related Work
  6.3 System-level Security Issues Caused by Untrusted IPs
  6.4 SoC Security Architecture Resilient to Untrusted IP
    6.4.1 Assumptions
    6.4.2 Untrustworthy Security Wrappers
      6.4.2.1 Solution Methodology
      6.4.2.2 Implementation Details
    6.4.3 Untrustworthy IP Cores
      6.4.3.1 IP-Trust Aware Security Monitors
      6.4.3.2 IP-Trust Aware Interface Triggers
      6.4.3.3 IP-Trust Aware Security Policies
  6.5 Use Case Analysis
  6.6 Overhead Analysis
    6.6.1 Security Monitor Implementations
    6.6.2 Results
  6.7 Conclusion

7 Conclusion and Future Work

Bibliography

List of Tables

2.1 Major Electrical Properties of the Antifuse based Lock [1]
2.2 Security & Area Overhead of proposed Locking at 45 nm
2.3 Qualitative Comparison with Alternative Approaches
2.4 Area Overhead Comparison at 45 nm Process Technology
2.5 Major Properties of the P-Val MIM Antifuse
2.6 Security & Estimated Package Area Overhead of P-Val

4.1 Representative set of security critical events according to IP type
4.2 Policies for Usage Case Analysis
4.3 Area & Power Overhead of IP Security Wrapper (at 32 nm)
4.4 Area & Power of Central Security Controller (at 32 nm)
4.5 Die Area Overhead of Central Controller (at 32 nm)

5.1 Typical Security Critical Events detected by DfD Trace Cell in Processor Core
5.2 Example DfD Instrumentation Features by IP Type in SoC Model
5.3 Area (µm²), Power (µW) of DAP (SoC Area ∼ 1.42×10⁶ µm²; SoC Power > 30 mW)
5.4 Area (µm²), Power (µW) Overhead of DfD Trace Macrocells in SoC
5.5 Area (µm²) Savings of IP Security Wrapper
5.6 Power (mW) Analysis in SoC on implementation of Debug Reuse

6.1 Current trends in Trojan Research and Scope of this Work
6.2 Assumptions Regarding Trustworthiness of Associated Components in Solution Methodology with respect to an Untrusted IP
6.3 Categorization of MCE and Policies by IP Types
6.4 Representative Interface Triggers for an Untrustworthy Processor
6.5 Different Scenarios of Trojan (represented by payload) Coverage by Insertion of Security Monitors in three IP Cores of our framework
6.6 Area & Power Overhead of Security Monitors in Processor IP (Orig. Area and Power with 1 KB inst., data memory at 32 nm: 352405 µm², 12.56 mW)
6.7 Area & Power Overhead of Security Monitors in Memory Controller (MC) IP and SPI Controller IP (Orig. Area and Power of MC and SPI with wrappers at 32 nm: 629433 µm², 13.81 mW; 5456 µm², 0.298 mW)


6.8 Die Area Overhead (OVH) of Security Monitors (SMs) with maximum Trojan coverage with respect to our SoC framework (Area: 13.1×10⁶), Apple A5 APL2498 (Area: 69.6×10⁶), Intel Atom Z2520 (Area: 40.2×10⁶), all at 32 nm process technology

List of Figures

1.1 Different security threats in the modern electronic system design process, addressed by approaches proposed in this dissertation.
1.2 (a) Present semiconductor business model; (b) possible sneak paths for adversaries to insert counterfeit ICs into the supply chain.
1.3 (a) Percentage of reported counterfeit incidences by IC type in 2011 [2]; (b) counterfeit ICs sold by VisionTech for different critical applications, under the name of various semiconductor vendors.
1.4 Classification of existing anti-counterfeiting protection schemes.
1.5 Some typical current application/usage scenarios where SoCs are utilized for implementing the corresponding electronic systems.
1.6 Schematic of a typical representative SoC architecture with the proposed framework for security policies.
1.7 Stages of a typical SoC front end (till fabrication) design process where system level security policies may be defined, refined or modified.

2.1 Major stages of programming a Metal-Insulator-Metal antifuse with associated parameter values.
2.2 Schematic of the implementation of the proposed on-die locking mechanism in an IC.
2.3 (a) Incorporation of the security mechanism in the current IC design cycle and (b) the semiconductor business model to protect against diverse counterfeiting attacks.
2.4 Implementation of MIM antifuse in a 2 metal process.
2.5 Insertion of the lock unit in a general purpose input-output (GPIO) port of a state of the art microcontroller [3].
2.6 (a) Modified OTP antifuse (AF) based 128-bit ROM architecture on-chip for storing authentication key [4]; (b) 2 possible OTP ROM bit structures based on AF [4], [5].
2.7 Additional design circuitry for comparison of key for counterfeit chip authentication.
2.8 (a) Typical state transition diagram of the controller in the comparator circuitry; (b) example of XOR gate in a power balance type logic [6].
2.9 (a) Overview of the proposed security mechanism in an IC; (b) unified protection against recycling and cloning.
2.10 (a) A general representation of the I/O port circuitry found in different chips (µC, FPGA, µP etc.); (b) minimal loading effect due to proposed scheme during the critical output mode operation of each candidate pin.
2.11 (a) Discrete AF-TF integration in through hole and surface mount packages like QFN, QFP, PLCC, SSOP [7]; (b) P-Val implementation in flip-chip bonded BGA based CSPs [8].
2.12 Life-cycle of a legitimate IC with P-Val implementation.
2.13 (a) Possible sources of intrinsic variation of programmed AF resistances (Ron); (b) variation of Ron of MIM AF at 20 mA program current (Ipp); (c) greater variation of AF Ron at lower Ipp at similar program voltages, duration and pulse patterns.
2.14 Security provided by P-Val against different attempts by an adversary to bypass the used/recycled chip detection scheme.
2.15 (a) Example set diagram representation of match probability of cloned IC signatures; (b) CDF and derived PDF at Ipp of 10 mA used for simulation studies; (c) variation of calculated cloning probability with number of authentication pins (1 million legitimate ICs).
2.16 (a) Distribution of fractional inter-Hamming distance for 190 bit signatures in 1000 chips at Ipp = 10 mA; (b) probability of signature bits being 1; (c) fractional intra-Hamming distance distributions for 190 bit signatures in 1000 chips with coarse measurement resolution between 0.5-1 Ω and 10 measurement instances (compared with reference of 0.05 Ω resolution).

3.1 Chip-specific signature creation from the intrinsic variations of pin resistances across ICs, measured by DC input/output current variations for particular voltages at pins.
3.2 (a) Incorporation of the signature generation step in the IC design cycle; (b) seamless integration of PiRA with the current semiconductor business model for enhanced security.
3.3 Measured variation of pin input current in (a) normal operational mode (logic high input voltage of 5.5 V) and (b) forward-biased ESD diode (6 V input for Vdd = 5.5 V) in 28 PIC micro-controller chips.
3.4 (a) Schematic of typical path from IC pin to die core logic; (b) representative on-die I/O logic as well as package level components.
3.5 (a) Typical measurement scheme of input leakage currents; extended schemes measuring (b) output drive; (c) forward-biased diode current that can be utilized in PiRA to create signatures.
3.6 Measured pin leakage currents at logic low and high input voltages for 3 different pins across 3 chips [3].
3.7 (a) Fractional inter-Hamming and (b) fractional intra-Hamming distance (5 repetitions) for 82 bit signatures across 28 PIC µC ICs; (c) probability of 1 of signature bits.
3.8 (a) Fractional inter-Hamming and (b) fractional intra-Hamming distance (5 repetitions) for 80 bit signatures across 22 OP-AMP ICs; (c) probability of 1 of signature bits.
3.9 (a) Forward-biased diode voltage selection; (b) fractional inter-Hamming and (c) fractional intra-Hamming distance (5 repetitions) for 91 bit signatures considering both Vdd and Vss diodes in 7 I/O ports across 28 PIC µC ICs; (d) probability of 1 of each signature bit.
3.10 (a) Fractional inter-Hamming distance for 135 bit signatures in 25 SRAM ICs (good uniqueness); (b) probability of 1 of each signature bit.

4.1 Schematic of a proposed architecture framework with the major components, for systematic implementation of SoC security policies.
4.2 Architecture of a generic IP security wrapper.
4.3 (a) Fields of a typical event frame; (b) an example communication protocol between wrapper and security engine.
4.4 Representative centralized E-IIPS top level architecture.
4.5 Flow/message diagram representation of implementation of Use Case I.
4.6 Flow diagram representation of implementation of Use Case II.
4.7 A representative high level architecture of our functional toy SoC model in RTL.

5.1 Simplified SoC DfD architecture based on CoreSight™.
5.2 Additional hardware resources for interfacing DfD with IP security wrapper.
5.3 Interfacing SPC with on-chip debug.
5.4 Use case scenario of security policy implementation exploiting the local DfD instrumentation.
5.5 Block diagram schematic of SoC model with on-chip debug infrastructure.

6.1 Typical representative SoC front end (until fabrication ready) design flow.
6.2 (a) Example IP level Trojans in a representative state machine Verilog RTL (soft IP); (b) logic level representation of a sample Trojan model.
6.3 Message diagram level representation of untrustworthy IP being a (a) passive reader and modifier, (b) diverter and masquerader, along with the associated threats.
6.4 (a) Cross-verification based proposed methodology to detect untrusted wrappers; (b) modifications required for re-purposing DfD for security policies in SoC; (c) zoomed view of the additions in IP security wrapper and corresponding DfD.
6.5 (a) Typical sequence of events in detecting malicious action of IP in proposed solution; (b) block diagram representation of architecture of enhanced IP-Trust aware security wrapper.
6.6 Potential sites in the IP design for insertion of IP-Trust aware security monitors in (a) MIPS processor core, (b) representative memory controller and (c) NoC router.
6.7 Input tags associated with input data/control streams to untrustworthy IP, depending on the security criticality of the interacting IP.
6.8 Operation flow of the proposed solution for providing system level protection in use case scenario of Trojan in security wrapper and core of main memory controller.
6.9 (a) Block diagram schematic of the SoC framework; the internal sub-units of the (b) DLX processor, (c) representative memory controller and (d) SPI controller.

Abbreviations

H/W HardWare
S/W SoftWare
F/W FirmWare
SoC System On Chip
DfS Design For Security
IC Integrated Circuit
IP Intellectual Property
AF AntiFuse
OTP One Time Programmable
NVM Non Volatile Memory
OTR One Time Readable
DC Direct Current
I/O Input/Output
SPC Security Policy Controller
DfD Design for Debug
ROM Read Only Memory
JTAG Joint Test Action Group
IIPS Infrastructure Intellectual Property for Security
E-IIPS Extended Infrastructure Intellectual Property for Security
PUF Physical Unclonable Function
TM Trace Macrocell
IM Instruction Memory
DM Data Memory
SPI Serial Peripheral Interface
IF Instruction Fetch
ID Instruction Decode
EX EXecute
MEM Memory Access
WB WriteBack
MCE Micro-Architecturally Correlated Events
SM Security Monitor
ASIC Application Specific Integrated Circuit
AES Advanced Encryption Standard
MC Memory Controller
IT Interface Trigger
TOCTOU Time Of Check to Time Of Use

Acknowledgements

First and foremost, I would like to express my sincere gratitude to my advisor Prof. Swarup Bhunia for providing me the opportunity to pursue a PhD degree under his supervision. Along with inspiring me with undying optimism and confidence to explore novel, impactful research at all times, Prof. Bhunia has been an ideal supervisor, encouraging me to hone my academic abilities in terms of teaching, presentation and mentoring. He has always encouraged out of the box, independent thinking in interpreting and tackling tough research problems. Under his guidance, I have learnt to appreciate the virtues of positive thinking, patience and persistence in all aspects of life.

Besides my advisor, I would like to thank my dissertation committee members, Prof. Frank Merat, Prof. Soumyajit Mandal, Prof. Ming-Chun Huang and Dr. Sandip Ray for providing their insightful thoughts and comments as well as encouraging my research and academic activities. I am especially indebted to Dr. Sandip Ray from Intel Corporation, who amidst his busy schedule, has always made time and helped me in defining and analyzing many of the problems which have formed the basis of my thesis. His mentorship, especially during the latter stages of my graduate studies, has been truly invaluable.

I have also been fortunate to collaborate with Prof. Cenk Cavusoglu, Prof. Chris Papachristou, Prof. Soumya Ray, Prof. Philip Feng, Prof. Prabhat Mishra, Dr. Thomas Tkacik, Dr. Gary Morrison, Dr. Amir Khatib Zadeh and Dr. Srivaths Ravi on various research topics and thank them for their help, encouragement and guidance.

At Case Western, I have been able to gain valuable knowledge and practical experience by attending courses taught by Prof. Kenneth Loparo, Prof. Steve Garverick, Prof. Swarup Bhunia, Prof. Frank Merat, Prof. David Wilson, Prof. Wojbor Woyczynski, Prof. Pedram Mohseni and Prof. Chris Zorman. I have been able to apply the garnered knowledge in the analysis of different research problems over the course of my graduate studies.

My PhD at Case would not have been possible without the help and support of my labmates at the Nanoscape Research Lab. I am particularly indebted to Dr. Seetharam Narasimhan for his mentorship, continued guidance and support over the early part of my PhD studies. I would also like to specially thank Dr. Xinmu Wang, Vaishnavi Ranganathan, Maryam Hashemian, Wenchao Qian, Fengchao Zhang, Yu Zheng, Robert Karam, Dr. Somnath Paul, Lei Wang and Luke Gould for making this journey a fun filled, interactive and collaborative one over the course of the last five years. Besides, my heartfelt gratitude goes out to Dr. Bhanu Pratap Singh who was a friend, classmate and above all, a mentor to guide me through some of the tough phases of PhD life.

I would like to extend my gratitude to all my teammates and mentors at Intel Corporation who helped me redefine my self-confidence and made me realize my potential. Special thanks in this regard go out to my intern manager David Durham and direct mentor Siddhartha Chhabra for their encouragement and for keeping their trust in me.

The drive and determination to overcome the numerous obstacles along the way and continue towards the goal was held intact in large part due to the love and support of many friends, who have made my 5+ years of stay in Cleveland a warm and memorable one. I would like to especially express my gratitude to my best friend Soumili Chatterjee, whose constant encouragement and support in all aspects of life kept the fuel inside burning. Adriel Jebin, Sanchita Basu, Arijit Ghosh, Sanyukta Ghosh, Ayan Maity, Sharoon Hanook, Santanu Panda, Sushabhan Sadhukhan, Subhadip Senapati - your contributions towards my life during my stay here at Cleveland are extremely valuable.

Last but not the least, I would like to thank my family for their unconditional love and support at all times. My parents are my backbone and the sole reason why I am here today, getting a chance to write a PhD dissertation. Hopefully I have been able to make you proud. I value everything you have taught me and love you immensely, even if I have not said it enough. A special thanks to my caring little sister Abantika whose love and innocent enthusiasm was often a reminder of how lucky I am to be blessed with a wonderful family. I conclude by offering my sincere and earnest prayers to the Almighty for providing me the opportunity to explore and experience a different part of the world and keeping me along the right path in life.

Infrastructure and Primitives for Hardware Security in Integrated Circuits

Abstract

by ABHISHEK BASAK

To group and logically correlate similar approaches, this thesis is divided into two parts. Part I proposes three light-weight, proactive IC integrity validation approaches as countermeasures against the two major forms of counterfeit ICs, namely recycled and cloned chips. The security threats considered here thus revolve around the legitimacy of components procured from the vast, ever-expanding global supply chain and used to design electronic systems. The first approach is a low overhead, on-die protection mechanism against both types of above-mentioned counterfeit digital ICs, based on one-time programmable antifuses inserted in the I/O port logic and a key stored in secure non-volatile memory. The second is an antifuse based IC package level solution against both counterfeit types that requires no design modification or on-die resources and hence can be applied to legacy (production-ready) designs, which comprise a significant portion of the semiconductor market. The last is an intrinsic pin resistance based IC authentication approach against cloned ICs, which requires no overhead (die or package) or changes in the design cycle and is applicable to legacy ICs. In addition to digital ICs, the latter two techniques also work efficiently for analog and mixed-signal designs. The protection against recycling offered by the first two methods involves active defense rather than only detection, i.e. the ICs are non-functional (hence of no value) until the antifuses are programmed. Overall, as compared to existing Design-for-Security (DfS) techniques, utilization of one or more of these proposed approaches would incur minimal to virtually zero design modification and associated hardware overhead, offers easy integrability into existing chips, and is potentially applicable to legacy designs and ICs of all types at comparable security.

Part II of the thesis revolves around efficient protection against threats arising due to the integration characteristics and interactions between different hardware and/or software/firmware components on a platform required to perform system level functions. It particularly focuses on the System-on-Chip (SoC), which constitutes the primary IC type in mobile and embedded electronic systems today and is essentially an entire platform on a single chip. We have proposed a novel architecture framework that provides a methodical, formal approach to implement system level security policies in these SoCs. SoCs incorporate different types of hardware/firmware/software based Intellectual Property (IP) cores including general purpose processors, graphics cores, accelerators, memory subsystems, device controllers etc. Security policies protect access to the various on-chip security assets sprinkled around these IP blocks, like device keys, passwords, configuration register settings, programmable fuses and private user data. They typically involve subtle interactions between different IP components, and their specification as well as implementation often gets modified over the design cycle involving various stakeholders. As a result, these policies are presently implemented in a rather ad hoc fashion in SoCs. This creates significant issues in post-Si SoC validation, in-field testing as well as patches/upgrades in response to bugs or changing security requirements in the field. To address this issue, the thesis proposes a light-weight infrastructure framework for systematic, methodical implementation of diverse SoC security policies. The architecture is centered around smart security wrappers, which extract security critical event information from the IPs, and a centralized, firmware-upgradable, microcontrolled policy controller, which analyzes the SoC security state at all phases and enforces the appropriate security controls via the wrappers. Furthermore, to reduce the security wrapper overheads as well as provide greater flexibility to adapt to new security requirements in-field, an interface is provided between the security architecture and the existing on-chip debug infrastructure to permit reuse of its Design-for-Debug (DfD) components for security policy implementation. The thesis concludes with an analysis of the threat due to malicious modifications and/or covert backdoors in untrustworthy 3rd party IPs in use today for designing SoCs. In the absence of foolproof static trust analysis methods, potent run-time solutions have been proposed in the architectural framework as a last line of defense to ensure SoC security in the presence of untrustworthy IPs.

Chapter 1

Introduction

Modern electronic system design is an expansive process involving many different stages across the world. An illustration of such a typical process is provided in Fig. 1.1, starting from the design flow of individual ICs. For these ICs, which could be of different types ranging from processors and graphics cores to memory controllers and various input/output device controllers, the functional/parametric specifications govern the internal design implementations. As shown in Fig. 1.1, they are subsequently fabricated and wafer-tested at the foundries, followed by assembly into suitable packages and final testing before release into the electronic supply chain. For the design of a platform, a system designer selects the suitable ICs from the supply chain and integrates them, along with incorporation of the necessary system level software and firmware from operating system vendors (OSV) and independent software vendors (ISV), to enable them to perform system level functions in computing platforms like smart phones, laptops, automotive and industrial control systems, etc. Here, the IC design stage is further broken down into the typical corresponding sub-processes for a system-on-chip (SoC), which is the major IC type in the mobile and embedded system domain these days and constitutes a platform by itself on a chip. Importantly, SoC design is analogous to a small scale system design itself and depends on procurement of intellectual property (IP) design cores of different types from both in-house design teams as well as third party vendors.

Traditionally, the security of computing systems has mostly referred to vulnerabilities and attacks only at the upper layers of the system stack, namely application, system software (S/W) and firmware. Usually, the underlying hardware (H/W) is considered trustworthy and secure and is hence often utilized as the root of trust


for protection approaches. However, in the present semiconductor ecosystem, this assumption does not always hold true [9], [10], [11]. The electronic supply chain for Integrated Circuits (ICs) and systems has expanded globally over the years, involving many levels of stakeholders and incorporating countries with different regulation practices, licensing rules etc. Besides, due to strict time-to-market demands for products coupled with factors affecting sustenance of adequate profit margins, the different phases of design, fabrication and testing of ICs and computing systems are typically outsourced to 3rd party vendors located all around the globe. Hence, along with the software and firmware stacks, the trustworthiness of the H/W layer cannot be guaranteed with just the current practices of direct/indirect validation for functional and parametric correctness and some burn-in tests for reliability. For a typical electronic system design process, the different stages exposed to various security threats are also illustrated in Fig. 1.1. These include untrustworthy off-shore design houses or fabrication facilities intentionally inserting malicious conditionally triggered H/W logic or leakage channels (also called “Trojans”), or the more common scenario of unintentional security vulnerabilities due to a less strict regulatory environment for design and testing. Many of these rogue design and manufacturing facilities are the sources of illegitimate/unauthorized IC designs or fake, counterfeit chips [12], which are inserted into the vast global supply chain taking advantage of the presence of multiple chains of untrustworthy parties. Besides, with increasing size and complexity of platform designs, including system-on-chips, over the years, the integration of multiple potentially untrustworthy ICs and IPs as well as highly vulnerable system software, firmware and application software during system design and deployment makes the prevalent computing systems extremely susceptible to many security attacks. This dissertation aims to propose defense approaches against some of these specific security threats, as highlighted in Fig. 1.1.

Figure 1.1: Different security threats in the modern electronic system design process, addressed by approaches proposed in this dissertation.

This thesis is divided into two closely knit parts, both addressing the high level problem of ensuring security of modern electronic systems. This is done mostly to integrate similar approaches and methods and to ensure better logical flow and correlation. Part I of the thesis proposes Design-for-Security (DfS) solutions to address the growing problem of counterfeit ICs in the semiconductor ecosystem. Part II of the thesis deals with architecture frameworks that allow for a systematic, methodical approach to efficiently implement system level security policies, which are primarily utilized to ensure the security of the underlying H/W operations and interactions associated with the numerous security assets in System-on-Chips (SoCs). Hence, by analogy, Part I deals with security threats associated with procurement of the ICs used to design these systems, and Part II deals with vulnerabilities associated with the architecture or integration characteristics of different components on platforms, with the focus mainly on SoCs. Next, we give a background on the major threat that we try to address in Part I, namely counterfeit ICs.

1.1 What are Counterfeit ICs?

A counterfeit IC is an electronic component with a discrepancy in material, performance or characteristics, but sold as a genuine chip. It can be anything from an unauthorized copy, a remarked/recycled die (e.g. a used chip sold as new), a cloned design obtained through reverse engineering or piracy, or an overproduced chip, to a failed real part. The rising incidence of counterfeit integrated circuits (ICs) in the semiconductor supply chain has emerged as a great concern to the electronics industry [13]. Counterfeit ICs may suffer from altered functionality, poor performance and/or degraded reliability of operation. They pose a significant threat to chip manufacturers, system integrators as well as end-users in diverse industrial sectors like consumer electronics, automobile, health-care, networking and even defense and other mission critical systems [13], [14]. The two major categories of counterfeit ICs are 1) remarked/recycled and 2) cloned chips. The former includes the selling of aged, used chips as new in the open market, possibly after carefully scraping ICs off motherboards and other PCBs and subsequent remarking or some repackaging of the die. According to one report, over 80% of counterfeit chips are reported to be either recycled or remarked [15]. On the other hand, cloned ICs include unauthorized production of an IC without legal rights, which is usually performed through Intellectual Property (IP) piracy, over-production at foundries or IC reverse engineering. These IPs, utilized in modern SoC and other IC designs, could be anything from a processor core or memory sub-system controller to a component of the communication infrastructure or system I/O interface. Presence of untrustworthy entities in the vast semiconductor ecosystem leads to IPs being pirated and copied at all levels of hardware design abstraction including Register Transfer Level (RTL), gate/netlist or layout levels. Overproduction occurs when an untrusted foundry (which gains access to the original design through legal/illegal means) manufactures ICs outside the designer's contract and sells them in the supply chain in parallel without the design house's knowledge. Depending on the business model, these chips have a high probability of not being tested under the proper operating/stress conditions (adequate tests perhaps known only to the design house) and introduce functionality and reliability concerns. As counterfeiters continuously increase their levels of sophistication, often backed by illicit networks and marketplaces, chip designs are also reverse-engineered by these adversaries depending on the potential cost/benefit ratio.

Figure 1.2: (a) Present semiconductor business model; (b) possible sneak paths for adversaries to insert counterfeit ICs into the supply chain.

As alluded to before, the increasingly complex global semiconductor supply chain, spanning different countries and their legal systems, provides ample opportunities for adversaries to insert these counterfeit chips into the supply chain. Prior to actual deployment, an IC is often bought and resold many times, involving many levels of untrustworthy traders, brokers and retailers [2], [14]. The current semiconductor business model, as illustrated in Fig. 1.2(a), offers various sneak channels that can be exploited by an adversary to insert cloned and recycled ICs (illustrated in Fig. 1.2(b)). The standard chip, package and system level tests, selected mostly to maximize fault coverage, are largely inadequate in detecting counterfeit ICs. Furthermore, the existing Design-for-Security (DfS) approaches, proposed mostly at the academic research level, are often not attractive due to the requirement of significant design modifications, test workload, and/or hardware overhead [14], [13]. This also renders them inapplicable to legacy designs, i.e. designs already finalized for production over the years. Many of these techniques also suffer from low robustness of operation and inadequate coverage of counterfeit ICs. Moreover, a majority of them are digital-only approaches and are ineffective for analog and mixed-signal chips [2], [13], which have comprised a significant portion of the ICs being counterfeited in recent years, as shown as part of a study in Fig. 1.3(a).

Figure 1.3: (a) Percentage of reported counterfeit incidences by IC type in 2011 [2]; (b) counterfeit ICs sold by VisionTech for different critical applications, under the name of various semiconductor vendors.

As a result of these shortcomings, many actual counterfeit incidences have not even been reported in recent times. According to a recent study, only ∼ 3% of total counterfeit incidences were reported in the year 2008. Millions of these fake ICs are floating around in the supply chain and have probably been employed in various equipment in use at present without us even knowing it. As an example, to comprehend the gravity of the situation, Fig. 1.3(b) illustrates the number of cloned ICs used in different mission critical systems, sold by a company, VisionTech, under the name of different legitimate vendors [15]. So, apart from an enhanced probability of system failures and degraded operational reliability, the use of these fake ICs also has a significant negative impact on legitimate vendor brand reputation, leading to huge profit losses, hampering research and development efforts and fostering the growth of organized illicit networks [12]. The cost of counterfeiting and piracy had been estimated to rise to a staggering 1.2 to 1.7 trillion dollars by 2015 [13]. To address this growing concern affecting the semiconductor ecosystem, there is a critical need for robust and lightweight protection against counterfeit ICs.

1.2 Related Work on Countermeasures against Counterfeit ICs

As the semiconductor supply chain is global and widespread across countries, typically involving no central control and incorporating multiple independent entities, attempts at implementing regulatory laws and rules to control the flow are usually not effective. Most of the existing countermeasures against counterfeiting attacks at the industry level are reactive, i.e. they aim at detecting an attack through standard test/validation [13]. In the case of remarked ICs, the detection approaches mainly involve physical tests such as external package inspection, decapsulation verification and material analysis for signs of wear and tear and previous usage. However, most IC packages are carefully refurbished such that they can easily evade these tests [12]. To weed out counterfeit ICs (cloned and recycled), chips are tested by different stakeholders for design specifications by functional tests, DC/AC parametric tests and burn-in tests for reliability verification. However, these are expensive, time-intensive and difficult to perform exhaustively for all ICs, given the practical constraints of time-to-market and cost-effectiveness. As a result, they are inefficient for the purpose of detecting the different forms of aforementioned fake chips. There have even been attempts to tag each chip with a unique ID, e.g. an RFID tag, but present reverse-engineering tools have become very advanced, allowing an attacker to read them [14].

Due to the inadequacy of the existing industry level reactive countermeasures, several proactive DfS approaches have emerged, still mostly confined to the research community. They include on-chip aging sensors [16], [17], chip tracking schemes based on watermarking [18], IC fingerprinting [19] as well as Physical Unclonable Functions (PUFs) [20], [21], [22]. These existing anti-counterfeiting defense approaches are shown in Fig. 1.4. Aging sensors can however work only for isolating recycled/used chips. Moreover, the proposed designs typically incur design and test efforts as well as significant area/power overhead, especially in small-scale chips [14]. Due to the requirement of design modifications, aging sensors are typically inapplicable to legacy designs. The on-chip ring-oscillator (RO) based aging sensors [17], besides the general requirements of design changes and H/W overhead, may only succeed in detecting recycled ICs that have been used for a particular minimum usage time, due to their low sensitivity under manufacturing process variations.

Figure 1.4: Classification of existing anti-counterfeiting protection schemes.

For cloned chips, watermarking and fingerprint based approaches typically incur significant design modifications and H/W overhead. The same shortcomings apply to H/W obfuscation, which typically utilizes key based logic scrambling to render IP piracy and IC reverse-engineering much more difficult for the adversaries [23]. In other words, although these techniques could potentially provide the required security, the cost of design modifications affecting existing design parameters like timing, power and test/validation patterns, along with the incurred time, complexity and resources, overrides the benefits. Over the last 10 years, Physical Unclonable Functions or PUFs, originating from the mathematical approach of Physical One Way Functions [24], have emerged as a widely researched topic in this area. Silicon based PUFs exploit intrinsic random variations in the semiconductor manufacturing process (both within and across dies) to generate a unique identifier for each chip. This beneficial use of the typically unfavorable nanometer process variations removes the need for storing a hard coded key or secret as a unique signature in on-chip storage and enhances security. However, PUFs are not applicable for detecting recycled chips without additional, complex design protocols involving multiple stakeholders. At the same time, most PUFs, like the arbiter or ring-oscillator based ones [20], [21], incur considerable design effort, hardware overhead and test workload and cannot be applied to legacy designs. Most proposed silicon circuit based PUFs are digital-only and are inapplicable to analog and mixed-signal chips. Moreover, for authentication, some PUF implementations, like the SRAM power-on state based designs [22], exhibit robustness issues due to temperature and voltage fluctuations as well as across time. As a result, they require additional Error Correcting Code (ECC) based schemes on chip to generate stable keys or signatures, significantly increasing the design effort. Hardware metering [25] can provide active defense against cloning attacks. However, it also requires the presence of a separate on-chip random number block to generate a key for unlocking, resulting in considerable die area overhead. Hence, we find that there is a critical need in the semiconductor ecosystem for light-weight, easily implementable and efficient DfS methods against IC counterfeiting.
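Chip-specific signatures of this kind are conventionally evaluated with two metrics that also appear in the signature analyses later in this thesis: fractional inter-chip Hamming distance (uniqueness, ideally near 0.5) and fractional intra-chip Hamming distance over repeated reads of the same chip (robustness, ideally near 0). The short Python sketch below computes both; the function names and the toy 8-bit signatures are illustrative assumptions, not data from any experiment reported here.

```python
from itertools import combinations

def frac_hamming(a, b):
    """Fraction of positions in which two equal-length bit strings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)

def inter_chip_distances(signatures):
    """Uniqueness: pairwise fractional Hamming distance across different chips."""
    return [frac_hamming(a, b) for a, b in combinations(signatures, 2)]

def intra_chip_distances(repeated_reads):
    """Robustness: distance between repeated reads of one chip's signature."""
    reference = repeated_reads[0]
    return [frac_hamming(reference, r) for r in repeated_reads[1:]]

# Toy usage with made-up 8-bit signatures from three chips
chips = ["10110010", "01101101", "11010001"]
rereads_chip0 = ["10110010", "10110011", "10100010"]  # noisy re-measurements
print(inter_chip_distances(chips))     # ideally clustered around 0.5
print(intra_chip_distances(rereads_chip0))  # ideally close to 0
```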

1.3 Major Contributions of Research (Part I)

The research in Part I of this thesis revolves mainly around novel, proactive, light-weight DfS approaches against both recycled and cloned ICs. Three defense measures have been proposed in this regard, namely "C-Lock" (for Chip Locking), "P-Val" (for Package level IC Integrity Validation) and "PiRA" (for Pin Resistance based IC Authentication). Similar to the purpose of tamper evident seals/caps on medicine bottles [26], C-Lock and P-Val are novel one-time-programmable (OTP) Antifuse (AF) based pin locking approaches that can protect ICs against both recycling and cloning attacks. The AFs, which behave as a normally open switch (up to ∼ 1 GΩ) until programmed [1], disable the corresponding pins and render the IC non-functional and thereby of no value. C-Lock is a die level approach [27], incorporating these AF(s) in the input/output port logic of one or more candidate IC pins. On the other hand, P-Val is a package level method where AFs are added inside the IC packages, leaving the die completely untouched [28]. Once these OTP AFs are programmed by a trusted party, e.g. the system designer, the IC is functional and remains unlocked throughout its life cycle. Hence, the programmed AFs serve as witnesses of previous usage, handling or even tampering, thereby automatically weeding out recycled/remarked chips. An adversary needs to replace these AF(s) to avoid detection of recycled/remarked/repackaged ICs. Due to package level AF insertion, such replacement may be theoretically possible only in P-Val (and is not practical from a cost-benefit standpoint). Hence, C-Lock offers impenetrable security against used ICs at the cost of small, yet non-zero, design modifications, hardware overhead and test workload. On the contrary, P-Val incurs no die level modifications or overhead and can be applied to legacy designs (designs finalized for production) and analog/mixed signal ICs. In C-Lock, a hard-to-clone device/family specific key is stored in an OTP, one-time-readable (OTR) non-volatile memory (NVM) on chip by the IC designer after post-manufacturing tests/validation. The AF(s) are only programmed on input of this unique key, thus offering security against cloned ICs of different kinds. In P-Val, we exploit intrinsic, random variations in the programmed resistance of AF devices connected to the unlocked IC pins, which enable us to create unique chip-specific signatures for authentication. Because these fingerprints are inherent and unique to the IC and easily computable, P-Val offers higher security against cloning as compared to the stored key based approach in C-Lock. For any IC in P-Val, along with past usage verification, an interested party would compute the chip signature using P-Val methods and compare against legitimate IC fingerprints stored in a trusted design house database to detect copied/fake chips. A point to note is that for all these approaches, we consider protecting the integrity of the ICs in the supply chain till the point of system integration (as illustrated in the business model in Fig. 1.2(b)), as this threat model covers all the attacks discussed previously. Hence, on-field authentication and the consequent aging analysis for robustness is omitted in the study. Moreover, these techniques can be applied to all ICs irrespective of their type and complexity, ranging from simple IP based custom ASIC designs to modern system-on-chips comprising complete platforms by themselves.
Besides, in all of these proposed approaches, the IC designer or manufacturer and the system integrator are considered trustworthy, whereas the IC fabrication houses (where wafer tests are typically performed as well) are potentially untrusted, which reflects typical usage scenarios today. Adversaries may also reside in the design houses of the IPs used for chip designs, where they may play a significant role in assisting piracy of the IPs at different design levels.
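As a purely behavioral illustration of the C-Lock flow described above, the following Python sketch models an IC whose pins stay disabled until the antifuses are programmed, which in turn requires presenting the device/family-specific key held in OTP, one-time-readable NVM. The class, method names and the single-attempt key-readability policy are assumptions made only for this sketch; the actual mechanism is an on-die hardware comparator and lock/unlock controller, detailed in Chapter 2.

```python
class CLockModel:
    """Behavioral model of antifuse-based pin locking (illustration only)."""

    def __init__(self, stored_key: bytes):
        self._stored_key = stored_key      # written into OTP/OTR NVM after post-manufacturing tests
        self._key_readable = True          # one-time-readable by the on-die comparator (assumed policy)
        self.antifuses_programmed = False  # locked pins keep the IC non-functional

    def try_unlock(self, presented_key: bytes) -> bool:
        """Program the pin antifuses only if the presented key matches the stored one."""
        if self.antifuses_programmed:
            return True                    # already unlocked; AFs stay programmed for the IC's lifetime
        if self._key_readable and presented_key == self._stored_key:
            self.antifuses_programmed = True
        self._key_readable = False         # assumed: key consumed after the first comparison
        return self.antifuses_programmed

chip = CLockModel(stored_key=b"\x5a\xa5\x0f\xf0")
print(chip.try_unlock(b"\x00\x00\x00\x00"))  # False: wrong key, pins stay locked
print(chip.try_unlock(b"\x5a\xa5\x0f\xf0"))  # False: key no longer readable under this assumed policy
```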

The third of the proposed approaches, "PiRA", requires no additional devices like AFs and instead exploits the intrinsic, uncontrolled, random variations in the pin resistances (PiR) within and across different ICs to create unique chip signatures for authentication (A) [29]. Hence it is reactive and protects against all the different types of cloning attacks. Pin resistance is usually defined as the electrical resistance calculated while looking into the corresponding pin under operating conditions, similar to the concept utilized to measure input resistance/impedance at circuit nodes. Intrinsic random manufacturing variations affect the on-die I/O logic circuits. PiRA is based on extracting these variations by appropriate external (from the pin side) voltage/current measurements. The advantage is that PiRA incurs virtually zero design (die/package level) effort and hardware overhead. It only requires some minimal additional test workload to generate the IC signatures during the post-manufacturing validation phase. The operational model to verify IC integrity is the same as for P-Val, i.e. comparing fingerprints against a trusted design house database. PiRA can be easily applied to legacy designs, which comprise a major portion of the market. Its usage also extends to chips of all types including analog/mixed-signal ICs, in which existing all-digital security primitives are difficult to implement.
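To make the signature-based authentication flow more concrete, the sketch below derives a binary fingerprint from external per-pin measurements by pairwise comparison of pin leakage currents and checks it against an enrolled value, in the spirit of PiRA. The pin set, the measurement values and the pairwise bit-generation rule are illustrative assumptions; the actual measurement schemes and signature generation are described in Chapter 3.

```python
from itertools import combinations

def pira_signature(pin_currents):
    """Derive a bit string from per-pin current measurements (in amperes).

    One bit per pin pair: 1 if the first pin draws more current than the
    second, else 0.  Intrinsic manufacturing variation makes the resulting
    pattern chip-specific.
    """
    return "".join(
        "1" if pin_currents[i] > pin_currents[j] else "0"
        for i, j in combinations(range(len(pin_currents)), 2)
    )

# Hypothetical leakage currents (nA scale) measured at a fixed input voltage
chip_a = [31.2e-9, 29.8e-9, 33.1e-9, 30.4e-9]
chip_b = [30.1e-9, 32.6e-9, 29.9e-9, 31.7e-9]
print(pira_signature(chip_a))  # '101001' for these example values
print(pira_signature(chip_b))  # a different chip yields a different pattern

# Authentication: compare the measured signature against the fingerprint
# recorded in the trusted design-house database during post-manufacturing test.
enrolled = {"chip_a": pira_signature(chip_a)}
print(pira_signature(chip_a) == enrolled["chip_a"])
```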

Hence, the antifuses in the I/O pins/ports of ICs provide protection against recycled chips, whereas the IC signature based authentication approaches ensure the necessary security against different types of cloning attacks. The entire gamut of these three proposed DfS based countermeasures provides a chip manufacturer with the opportunity to select one or a combination of these approaches, with complementary benefits, according to one's requirements. For example, an IC designer may implement package level AF based active protection against recycling as in P-Val and intrinsic pin resistance based authentication as in PiRA to protect against different forms of cloning attacks at the cost of minimal design effort and H/W overhead. Die level AF based port locking may be chosen for a new design to provide the highest assurance against used ICs, based on the threat model considered. As compared to existing approaches, a particular one or a combination of the three proposed approaches would provide the following advantages:

• It can be utilized to provide unified protection against both major forms of counterfeiting, namely recycled ICs and cloned chips of different types. Defense against used ICs is active rather than only detection based, i.e. the IC is non-functional without breaking/bypassing the defense.

• The implementation may incur anywhere from low die level or minor package level to virtually zero design modifications and hardware overhead. The approaches are easily implementable with minimal test workload.

• Two of the three proposed techniques are applicable to legacy designs as well as ICs of all types including analog/mixed signal chips.

1.4 System-on-Chip (SoC) Security

Recent years have seen rapid proliferation of embedded and mobile computing devices. Such devices come in a variety of form factors, including smartphones, tablets, automotive controls, wearables, medical and fashionable implants, and smart sensors. Most of these devices are architected around one or more System-on-Chip (SoC) designs; Fig. 1.5 shows a few typical modern day application domains of SoCs. A SoC architecture involves coordination and communication of a number of pre-designed hardware blocks or hardware-software modules of well-defined functionality, referred to as "Intellectual Properties" or "IPs". These IPs could constitute cores of different types and functionalities like general purpose processor cores, graphics cores, memory sub-systems and corresponding controllers, along with different device controllers with interfaces to interact with the system, as illustrated in the SoC block diagram schematic in Fig. 1.6. Besides, intra-SoC communication fabrics such as the bus, cross-bar or network-on-chip (NoC), consisting of bus controllers, routers, switches etc., are often available as "Infrastructure IPs (IIPs)". There are IIPs for test and on-chip debug in the SoC as well. Hence, riding on the benefits of the increasing integration density afforded by Moore's Law [30] along with heterogeneous manufacturing trends, a modern day SoC is a platform in itself and is the major IC component used to design embedded/mobile computing systems. Although not fully analogous, for better comprehension, one can think of a SoC as yesteryear's motherboard with its assembled components all shrunk inside a chip.

Figure 1.5: Some typical current application/usage scenarios where SoCs are utilized for implementing the corresponding electronic systems.

Figure 1.6: Schematic of a typical representative SoC architecture with the proposed framework for security policies.

Given the diversity and personalization of these devices/systems, security has emerged as a critical concern for them. Most of these devices contain confidential assets, which must be protected against unauthorized access. Examples of secure or sensitive assets present in virtually all modern computing systems include cryptographic and DRM keys, premium content, firmware, programmable fuses, de-featuring bits and personal end-user information. Unauthorized or malicious access to these assets can result in leakage of company trade secrets for device manufacturers or content providers, identity theft for end users, and even destruction of human life. Consequently, it is vital to ensure that secure assets in computing devices are adequately protected. In SoCs, their access restrictions are typically defined by multiple system level security policies [31], [32], [33], [34].

1.4.1 Background on SoC Security Policies

As mentioned, modern SoC designs include a large number of critical assets, which must be protected against unauthorized access. At a high level, such access control can be defined by confidentiality, integrity, and availability requirements [35], the famous "CIA" triad of information security. Confidentiality refers to the property of protecting information from disclosure to unauthorized parties. Integrity is the property of securing information from tampering/modification by an unauthorized entity. The third pillar, i.e. Availability, ensures that authorized parties are able to access the information when necessary. To add to this context, authorization or privilege levels may vary temporally, i.e. a party may be authorized to modify information at t = t1 and only view/read it at t = t2. Security policies map such CIA requirements to "actionable" design constraints that can be used by IP implementers or SoC integrators to define, analyze and implement protection mechanisms. Following are two representative examples for a typical SoC.

• Example 1: During boot, data transmitted by the crypto engine cannot be observed by any IP in the SoC other than its intended target.

• Example 2: A secure key container can be updated for silicon validation but not after production.

Example 1 is a confidentiality requirement while Example 2 is an integrity constraint; the policies provide definitions of (computable) conditions to be satisfied by the design for accessing a security asset. Furthermore, we observe that access to an asset may vary depending on the state of execution (e.g., boot time, normal execution, etc.), or position in the development life-cycle (e.g., manufacturing, production, etc.). Below we summarize some standard policy classes. It is beyond the scope of this thesis to provide a comprehensive compendium of different policies, or even to discuss any of them in detail; the description below merely provides a flavor of some existing policies in the context of SoC operation.

Access Control [36–38]: This is the most common class of policies, and specifies how different agents in a SoC can access a security asset at different points of execution. Here an "agent" can be a hardware, firmware or software component in any IP. Examples 1 and 2 above are instances of such policies. Furthermore, access control forms the basis of many other policies, including information flow, control flow integrity, and secure boot.
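As a deliberately simplified illustration of how an access-control policy can be turned into a computable, "actionable" constraint, the Python sketch below encodes permissions in a table keyed by agent, asset and execution phase, with a default-deny lookup. The specific agents, assets and phases are hypothetical placeholders echoing Examples 1 and 2 above, not an actual policy set from any SoC.

```python
# Policy table: (agent, asset, phase) -> set of allowed operations.
# All agent/asset/phase names below are illustrative placeholders only.
POLICY = {
    ("crypto_target_ip",  "crypto_output",  "boot"):               {"read"},
    ("dma_engine",        "crypto_output",  "boot"):               set(),             # Example 1
    ("validation_host",   "key_container",  "silicon_validation"): {"read", "write"},
    ("validation_host",   "key_container",  "production"):         {"read"},          # Example 2
}

def is_allowed(agent: str, asset: str, phase: str, op: str) -> bool:
    """Default-deny check that an enforcement point could evaluate per access."""
    return op in POLICY.get((agent, asset, phase), set())

print(is_allowed("dma_engine", "crypto_output", "boot", "read"))                      # False
print(is_allowed("validation_host", "key_container", "production", "write"))          # False
print(is_allowed("validation_host", "key_container", "silicon_validation", "write"))  # True
```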

Information Flow [39, 40]: Values of secure assets can sometimes be inferred without direct access, through indirect observation or “snooping” of intermediate computation or communications of IPs. Information flow policies restrict such indirect inference. Following is an example:

• Key Obliviousness: A low-security IP cannot infer cryptographic keys by snooping only the data from crypto engine on a low-security NoC.

Information flow policies are difficult to analyze. They often require highly sophisticated protection mechanisms and advanced mathematical arguments for correctness, typically involving hardness or complexity results from information security. Consequently they are employed only on critical assets with very high confidentiality requirements.

Liveness [41]: These policies ensure that the system performs its functionality without "stagnation" throughout its execution. A typical liveness policy is that a request for a resource by an IP is followed by an eventual response or grant. Deviation from such a policy can result in system deadlock or livelock, consequently compromising system availability requirements.

Time-of-Check vs. Time of Use (TOCTOU) [33, 42]: This refers to the requirement that any agent accessing a resource requiring authorization is indeed the agent that has been authorized. A critical example of TOCTOU is in firmware update, where the policy requires that the firmware eventually installed on an update/upgrade is the same one that has been authenticated.
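To make the TOCTOU requirement concrete, the sketch below shows one conventional way a firmware-update flow can bind the image that was authenticated to the image that is installed, by re-checking a cryptographic digest immediately before use. The flow, names and digest choice are illustrative assumptions rather than the enforcement mechanism proposed in this thesis.

```python
import hashlib

def authenticate(image: bytes) -> str:
    """Time of check: verify the image and record its digest (signature check elided)."""
    # ... signature verification against the vendor's public key would happen here ...
    return hashlib.sha256(image).hexdigest()

def install(image: bytes, authorized_digest: str) -> bool:
    """Time of use: refuse to install if the image no longer matches what was checked."""
    if hashlib.sha256(image).hexdigest() != authorized_digest:
        return False   # image was swapped between check and use: a TOCTOU violation
    # ... write the image to flash ...
    return True

update = b"firmware v2.1"
digest = authenticate(update)
print(install(update, digest))                # True: same image that was authenticated
print(install(b"tampered firmware", digest))  # False: violates the TOCTOU policy
```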

Secure Boot: Booting a system entails communication of significant security assets between corresponding IPs, e.g. fuse configurations, access control priorities, cryptographic keys, firmware updates, debug and post-silicon observability information. Consequently, boot imposes more stringent security requirements on IP internals and communications than normal execution. Individual policies during boot can be access control, information flow, and TOCTOU requirements; however, it is often convenient to coalesce them into a unified set of boot policies.

We observe that the above policies relate to the integration characteristics of SoC designs, not each individual IP block; i.e., the underlying threat model includes attacks through untrustworthy software in the IPs or a vulnerable SoC-to-system interface, but not malicious hardware introduced in the IPs. Our threat model is reasonable for SoC designs involving primarily in-house rather than third-party IPs. The last part of the thesis is devoted to extending the analysis for security in the presence of malicious modifications or covert backdoors in the hardware of potentially untrustworthy IPs in the SoC.

1.4.2 Issues with Current SoC Design Trends

Security assets in SoC designs spread across multiple IP blocks. Their access restrictions often involve subtle and complex interactions between hardware, firmware and/or software associated with these design modules. Hence, security policies controlling operations involving these security assets are often complex and ambiguous, which makes it difficult for the SoC designers to correctly implement them. Typically, policies are defined by system architects as well as different IP design and SoC integration teams, and are often refined or modified during the course of system development. Stages of a generic SoC front end design process (considered till fabrication) where the policies might be defined, updated or modified by different stakeholders are shown in Fig. 1.7. Of course, they are tweaked after post-Si validation as well. To exacerbate the issue, security policies are rarely specified in any formal, analyzable form. Some policies are described (in natural language) in different architecture documents, and many remain undocumented. As a result, the final implementation of system level security policies in modern day SoCs, involving multiple parties/stakeholders, is rather ad hoc, i.e., it does not follow a systematic, disciplined approach.

Figure 1.7: Stages of a typical SoC front end (till fabrication) design process where system level security policies may be defined, refined or modified.

Along with the increased complexity arising during design, potentially incurring greater resources and design time (affecting time-to-market), this current practice leads to the following major issues:

• It becomes extremely difficult to validate the SoC for adherence to the system security requirements during post-Si validation and/or in-field testing. In the absence of a formal, methodical approach, bugs or errors detected during security validation are becoming increasingly complicated to trace back to their sources and thus correct. This potentially requires more validation time and resources, thereby leading to increased time-to-market. Indirectly, the probability of vulnerabilities remaining in a SoC design after design or test/patch also increases.

• It is very complicated to patch/upgrade the SoC security policies, which might be necessary in response to bugs found in-field as well as dynamically changing security requirements due to varying product/system usage scenarios, including its adoption in different geographic market segments around the world.

• The approach of systematic design reuse, on which modern SoCs are based, is hampered (in the context of secure SoC design) due to the ad-hoc nature of implementation of security policies. This too often multiplies design effort and complexity and leads to increased time-to-market.

Hence in light of these pressing practical issues, there is a critical need for devising a systematic approach towards design and implementation of system level SoC security policies.

1.4.3 Related Work

The notions of high-level security requirements in a computing system were developed in the 1990s as part of research on information security [35]. Early research on security policies focused primarily on software based systems and developed analysis frameworks for access control and information flow policies [39, 43]. More recently, researchers have tried to develop languages for formal representation of hardware (H/W) security policies [32]. This is just one small, complementary (to our work) step (of the many needed) that would aid the overall process of systematic, methodical implementation of H/W security policies. Besides, at the platform level, there has been considerable work on providing a Trusted Execution Environment (TEE) to services or applications running on the system to provide end-to-end security and thereby protect them from different software based threats such as malware, rootkits, etc. These TEE implementations mostly provide a secure mode in the processor core, often incorporating a protected region of memory [44], [45] and trusted input/output based on isolation/cryptographic methods, which enable services to protect the necessary security critical assets from leakage or tamper. Secure elements (SE) like Trusted Platform Modules (TPMs) can be used alongside these TEEs to ensure secure storage and thereby protect against hardware based tampering [46]. However, all these hardware based security primitives, including different TEEs from multiple independent platform developers and SEs, serve only for implementation of specific access control policies governing the confidentiality and integrity of assets. Most of them revolve around protection only inside the processor of the platform during execution. They have not been analyzed for other core types or constituents of the system.

On the other hand, with the increasing prominence of SoC based platforms, there has been significant research interest in SoC security. However, most recent research in this area has focused on hardware security, specifically protection of the system against malicious hardware modifications (Trojans) [47], various forms of counterfeiting attacks [48], and attacks to leak secret information through side channels or on-chip resources such as scan or debug infrastructure. A recent work has also targeted developing a centralized and scalable framework, referred to as an infrastructure IP for security (IIPS), for efficiently realizing countermeasures against these attacks [49]. Infrastructure IPs refer to a range of IPs that are dedicated to facilitating SoC functional verification, testing or yield improvement [50]. There are also a few works that report efficient protocols [33] involving functional IP blocks, crypto IPs, communication fabric and associated architecture-level custom modifications for only specific access control policies [34]. They are not generic or flexible enough to apply to a diverse set of security policies, which are typically required to be implemented in a modern SoC usage scenario.

1.5 Major Contributions of Research (Part II)

Part II of this thesis proposes a generic, flexible architecture framework for systematic exploration, analysis and implementation of diverse system-level security policies for modern SoC designs [51]. The cornerstone of our architecture is a dedicated, plug-and-play, centralized IP block, referred to as E-IIPS (Extended Infrastructure IP for Security) as illustrated in Fig. 1.6. E-IIPS builds on the high level concepts of the recently reported infrastructure IP for SoC security, IIPS [49], and is hence referred to as Extended-IIPS. IIPS relieves SoC designers from separately addressing security issues through design modifications in multiple IP cores, and provides ease of integration and functional scalability. However, as mentioned previously, it was limited to protection against a few low-level, IP-localized hardware security vulnerabilities, e.g. IP piracy, hardware Trojans, and scan-based information leakage. E-IIPS extends IIPS for implementing a diverse set of security policies in SoC integration and is much broader and more scalable in scope, allowing for different threat models and the designed/chosen countermeasures in various system execution contexts; for example, flexible system level SoC policies can be set by a designer to prevent scan chain usage by unauthorized parties during test/debug, or to disable any possibility of a Trojan payload in a potentially untrustworthy 3rd party IP propagating to and affecting the system. Most importantly, this architecture framework provides a methodical, disciplined approach for SoC designers to implement these system level SoC security policies. This systematic methodology would significantly alleviate complexities in verifying SoC adherence to security requirements during post-Si validation and on-field tests, as well as in performing upgrades/patches in response to cases like bugs/exploits found during validation, changing security requirements on-field, etc. All these may lead to reduced time-to-market, which nowadays is often considered the single most important factor behind sustaining adequate profit margins for a semiconductor company.
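As a rough, purely illustrative sketch of how such a firmware-programmed policy might look (the wrapper API, event fields and IP names below are assumptions for illustration; the actual SPC and security wrapper micro-architecture are detailed in Chapter 4), the central controller examines wrapper-reported events and asserts the corresponding controls:

# Minimal sketch of an SPC-style policy loop, assuming a hypothetical wrapper
# interface that reports security-relevant events and accepts control commands.

class SecurityWrapper:
    """Stand-in for an IP security wrapper (event source + control sink)."""
    def __init__(self, ip_name):
        self.ip_name = ip_name
        self.pending_events = []            # filled by the wrapped IP's monitors

    def read_event(self):
        return self.pending_events.pop(0) if self.pending_events else None

    def assert_control(self, control, value):
        print(f"[{self.ip_name}] control {control} <= {value}")

def scan_access_policy(event, wrapper, authenticated):
    # Policy: the scan chain may be used only by an authenticated debug agent.
    if event["type"] == "scan_access_request" and not authenticated:
        wrapper.assert_control("scan_enable", 0)

def untrusted_ip_policy(event, wrapper):
    # Policy: outputs of a potentially untrustworthy 3rd-party IP are gated when
    # it writes an address outside the range it was configured to use.
    if event["type"] == "mem_write" and event["addr"] not in event["allowed_ranges"]:
        wrapper.assert_control("output_gate", 1)

# Example run with two fabricated events.
w = SecurityWrapper("third_party_dsp")
w.pending_events = [
    {"type": "scan_access_request"},
    {"type": "mem_write", "addr": 0x9000_0000, "allowed_ranges": range(0x4000_0000, 0x5000_0000)},
]
while (ev := w.read_event()) is not None:
    scan_access_policy(ev, w, authenticated=False)
    untrusted_ip_policy(ev, w)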

E-IIPS is a micro-controlled, firmware-upgradable module that realizes/executes system-level security policies of various forms and types using firmware code following existing security policy languages, such as SAPPER [32]. Hence, in the ensuing text and chapters of Part II, the central E-IIPS control module is often referred to interchangeably as the SPC (Security Policy Controller). SoC designers can program security policies in E-IIPS as firmware modules that are stored in a secure ROM or flash memory. The E-IIPS or SPC module interfaces with the constituent IP blocks in a SoC using “security wrappers” integrated with the IPs, to obtain the necessary security related information. Functionally, the SPC utilizes this event information to determine the current security state of the system and asserts/disables the required IP and system level controls via the wrappers. In other words, the E-IIPS module intervenes when a policy violation is detected. Conceptually, the security wrappers extend the existing test (e.g. IEEE 1500 boundary scan based wrapper [52]) and debug wrappers (e.g. ARM’s Coresight IP interface [53]) of an IP to provide a standardized way for E-IIPS to obtain local security information and communicate appropriate controls while abstracting the details of the internal implementation of individual IPs. The wrapper and E-IIPS architectures are flexible and agnostic to the SoC design functionality or security policy requirements. They can be applied in a scalable manner to existing SoC designs with varying numbers of IP blocks of different types. Besides, to reduce the hardware overhead of the security wrappers, which could potentially be considerable for complex, security critical IPs, the proposed security architecture interfaces with the on-chip debug infrastructure to re-purpose it to obtain security critical event information. The on-chip design-for-debug (DfD) architecture typically offers significant observability and controllability into each SoC component, required for post-Si validation and SoC upgrades/patches. Keeping the debug use cases completely transparent to the re-purposing of DfD for security, one can utilize the local (to IPs) DfD observability modules to extract the security critical information required for policies. Alongside, the DfD-to-security interface would also make the proposed policy framework significantly more flexible to requirements of on-field upgrades/patches, which otherwise might not have been possible without an impractical re-spin in certain scenarios. We also enhance the proposed security architecture with appropriate run-time support to detect potentially undependable or malicious behavior of constituent 3rd party IPs using fine grained IP-Trust aware security policies. This prevents system level effects of any rogue IP action, thereby ensuring security and reliability of SoC operations. Together with existing static IP-Trust verification methods, the proposed architecture level run time support aids in defending against system level effects of “IP level Trojans”, a rising problem in the security community. The research in Part II of this thesis provides all the relevant implementation details and demonstrates how to use the proposed architecture to facilitate implementation and validation of system-level SoC security policies through several case studies involving common security policies of various types. In short, this work makes four major contributions:

• We develop, for the first time to our knowledge, an on-chip flexible architecture using a configurable centralized controller IP for implementing, exploring, and analyzing diverse SoC security policies. We present a general interface of security policy enforcement with functional IPs that extends the existing test/debug wrappers.

• Developing representative small-scale SoC models, we incorporate the architecture framework into them and demonstrate examples of generic security policy implementation in different use case scenarios. A cost analysis of such an infrastructure is also provided by estimating the hardware overheads (area, power) of the wrappers and the centralized SPC.

• To potentially reduce the H/W overhead of the security wrappers, as well as to provide flexibility to update/upgrade security policies in the field when doing so requires extracting additional local events from the IPs, we propose a light-weight interface between the security architecture framework and the existing on-chip debug infrastructure to reuse the dormant (during normal execution) local design-for-debug (DfD) modules for security policy implementation.

• Our final contribution is extending the architecture framework to provide support for fine-grained IP-Trust aware security policies, for run-time security and reliability of the SoC in the presence of inherently untrustworthy IP blocks, which could originate from third party IP vendors.

1.6 Organization of Thesis

As mentioned before, for better logical flow and correlation, the thesis is divided into two parts. Of the ensuing content, Chapters 2 and 3 belong to Part I, whereas Chapters 4, 5 and 6 are included in Part II. Chapter 2 analyzes in detail and provides the methods and simulation/experimental results for proactive antifuse based IC security against recycled and cloned chips. This incorporates both the die level “C-Lock” and the package level solution “P-Val”, with complementary benefits. Description and experimental verification of the intrinsic pin resistance based authentication approach against cloned ICs, namely “PiRA”, is provided in Chapter 3. Chapter 4, starting Part II of the thesis, analyzes the micro-architecture details of the proposed infrastructure framework for systematic implementation of diverse SoC security policies. Use case policy implementation scenarios along with hardware overhead estimations using representative SoC models are presented as part of the analysis in Chapter 4. Chapter 5 includes the interface of the security architecture with the on-chip debug infrastructure as highlighted before. The overhead analysis of the enhanced framework is presented as well. Chapter 6 introduces the rising problem of potentially untrustworthy IP blocks in the SoC ecosystem and describes how our policy architecture may be extended to provide run-time security of the SoC operations in the presence of these malicious IPs. Finally, Chapter 7 concludes with a summary of the thesis and potential directions for future research.

Chapter 2

Antifuse based Active Protection against Counterfeit ICs

AFs are one time programmable (OTP) devices, behaving as a normally open switch (resistance ∼ 100 MΩ − 1 GΩ) [1]. Once the desired programming voltage is applied across the terminals (mostly independent of polarity), irreversible changes occur in the structure of the AF. Applying a programming current then leads to the formation of a conductive filament due to high Joule heating and a chemical reaction between the electrode and insulator material, so that the device behaves as a closed switch (resistance ∼ 10 Ω − 100 Ω) [54], as shown in Fig. 2.1. AFs have been employed in the design of secure, reliable programmable read-only memory (PROM) and military grade FPGAs [54] due to their one time programmability. AFs are primarily of two different structures: a) PolySi-ONO-Diffusion and b) Metal-Insulator-Metal (MIM). The latter provides greater ease of implementation in present micro-scale devices at the desired electrical properties.

Using the antifuse device as the core component, we propose two defense mechanisms against both recycled and cloned chips with complementary benefits, namely a die level approach named “C-Lock” [27] and a package level solution “P-Val” [28]. In the first part of this chapter we analyze the principle, implementation and operation methodology of C-Lock.


Figure 2.1: Major stages of programming a Metal-Insulator-Metal antifuse with associated parameter values.

2.1 C-Lock Methodology

C-Lock, short for “chip locking”, is a novel design approach based on a die level pin locking mechanism, for active defense against various forms of counterfeiting attacks. The key idea is to lock or disable an IC pin by placing an antifuse (AF) in the corresponding input/output (I/O) port circuitry on die. As an antifuse behaves as a normally open switch until programmed [1], it disables the pin connection. Depending on the degree of desired locking, the IC design house may insert AFs in one or many pins. With at least one or a fraction of the pins disabled, the IC is not fully functional and hence of no use or value. The IC remains non-operational until unlocked by a proprietary programming device (designed by a trusted source such as the IC design house itself) that inputs a secret unlocking key to trigger programming of the AFs. This could be a 64/128 bit key, unique to an IC family/type from a designer, that is stored on-chip in a one-time-programmable (OTP), one-time-readable (OTR) non-volatile memory (NVM). Corresponding light-weight electronic circuitry/logic is also present on die to compare a user (system designer or last level retailer) supplied input sequence with the on-chip stored key and to proceed to the AF unlock phase in case of a match. The additional design is inserted into an IP at the Register Transfer Level (RTL). Fig. 2.2 illustrates the implementation of the locking mechanism in a regular IC. Once unlocked, a chip remains unlocked throughout its life cycle as AFs are one time programmable (OTP). Hence the programmed AFs serve as indication or proof of previous usage/tamper, which is usually enough information for a trusted party not to buy/use it, and thus protects against recycled/remarked ICs.

Figure 2.2: Schematic of the implementation of the proposed on-die locking mechanism in an IC.

Only after post-manufacturing testing and validation is the key programmed in the NVM, under control of the designer. Hence key leakage via an untrusted foundry or test facility is not possible. The access paths to the memory are then disabled by blowing the OTP e-fuses [55], inserted during fabrication, similar to scan chain disabling after testing. This makes the NVM OTP and OTR, hence preventing stealing of the signature. The modifications required in an IC design cycle for incorporation of the locking mechanism are shown in Fig. 2.3(a). Hence the securely stored, hard-to-clone on-die key based authentication provides the protection against cloned ICs of different types, including overproduced ICs, which do not come back to the legitimate designer in the typical flow and hence are not programmed with the right key. The detection of counterfeit ICs using C-Lock in the present business model is summarized in the illustration in Fig. 2.3(b). Unlocking of AFs based on the match with the 64/128 bit key makes functional reverse-engineering practically infeasible. Techniques have been proposed to make the design resilient against side channel attacks (SCA). Besides, the high robustness of AFs in the on and off states under different environmental variations makes this scheme highly stable and reliable. Unlike the existing approaches, “C-Lock” provides the following key advantages: (1) it provides an active defense against counterfeiting attacks, i.e. the IC cannot even be used without bypassing the protection mechanism; (2) it protects against both reselling/remarking as well as cloning attacks - two major forms of counterfeit chips in a supply chain; and (3) it incurs significantly fewer design modifications and lower hardware overhead relative to present DfS techniques.

Figure 2.3: a) Incorporation of the security mechanism in the current IC design cycle and b) the semiconductor business model to protect against diverse counterfeiting attacks.

2.1.1 Business Model

All anti-counterfeiting schemes must seamlessly fit in the current semiconductor business model. Our proposed design for chip lock/unlock is inserted at the RTL level and synthesized in the desired library. With layout masks containing the antifuse (AF) and e-fuse information, the ICs are fabricated. Post manufacturing, the dies are tested and packaged, and the key is then programmed in the OTP, OTR NVM through the external port interface. Finally, they are released into the supply chain. The proprietary programming device for the corresponding IC family is designed by a trusted source (e.g. the designer himself). A system designer (like a computer manufacturer) or the last level retailer (like Digikey) would obtain both the locked ICs from the global supply chain and the programming device (from the trusted party), unlock the chips, and utilize them in their systems or supply them to the customer, respectively. Hence, security against counterfeiting is maintained throughout all levels of the model.

Figure 2.4: Implementation of MIM antifuse in a 2 metal process.

Table 2.1: Major Electrical Properties of the Antifuse based Lock [1]

Parameter               Value
Programming voltage     4-6 V
Programming current     5-15 mA
Programming duration    0.1-5 ms
OFF state resistance    50 MΩ - 1 GΩ
ON state resistance     20 Ω - 80 Ω

2.1.2 Pin Lock Structure

The lock component, which renders the pin(s) non-operational, is the antifuse (AF). Among possible AF structures, we select Metal-Insulator-Metal (MIM) ones, due to their inherent advantages of low on-state resistance and capacitance, possible use of existing process metal layers in their manufacturing, and lower breakdown voltages [1]. Although thin gate oxide MOS capacitors are utilized as AFs in OTP Read Only Memories with practically no modifications in the CMOS manufacturing process, their on-state resistances are on the order of a few kilo-ohms, which could lead to signal interfacing difficulties and loading issues, especially on the output ports [56]. The corresponding programmed values for MIM AFs are as low as 20 − 30 Ω. The electrical properties of the chosen MIM AF structure are presented in Table 2.1. The cross-sectional area of an AF is below 0.1 µm² (determined mainly by contact sizes) at advanced process nodes, and hence incurs minimal area overhead. In an on-die implementation as shown in Fig. 2.4, Metal-Insulator-Metal AFs incorporating tungsten and aluminum electrodes and a plasma nitride/silicon dioxide [54] insulator are deposited over the silicon substrate. Subsequently, plasma and DRIE etching are performed for the metal electrode contacts.

Figure 2.5: Insertion of the lock unit in a general purpose input-output (GPIO) port of a state of the art microcontroller [3].

2.1.3 Lock Insertion in I/O Port Circuitry

To render a desired pin non-operational, an AF device would be inserted in the input/output (I/O) circuitry between the pad and the corresponding port data line. The placement of an AF in the input path of a general purpose input/output (GPIO) pin of a state of the art micro-controller is shown in Fig. 2.5. The possible locations for lock insertion in the output path of the port are shown as well. The illustrated GPIO structure is a good representation of the I/O circuitry commonly found in processors, FPGAs and micro-controllers.

When the port acts as an input, the output driver is in the high impedance state and the data bit is read through the input latch and buffer. In the output mode, the data in the latch is transferred to the pad. The extra locking signals to the port circuit are the program high (VH) and low (VL) voltages to the AF and the program enable (PE) to the buffers. For all locked pins, a test path comprising an OTP e-fuse [55] is placed in parallel to the AF (as shown in Fig. 2.5) for use during chip testing. After validation, the test path is disabled by blowing the e-fuse. Hence, an adversary cannot utilize test paths for causing malfunction or functional reverse-engineering. After one-time programming of the lock AF(s) by key input, the AF program buffers are always in the high impedance state, and hence no interference occurs in the normal input-output operational modes.

Figure 2.6: a) Modified OTP antifuse (AF) based 128-bit ROM architecture on-chip for storing the authentication key [4]; b) two possible OTP ROM bit structures based on AF [4], [5].

The AF is placed after the buffer and not directly near the pad to avoid the possibility of alterations of the AF state due to environmental interference (voltage, EMI, stray signals, etc.) through the pads.

2.1.4 Programming the Key

Post fabrication, the authentication key is programmed into each IC. One Time Programmable (OTP) Read Only Memories (ROM), based on Metal Oxide Semiconductor (MOS) gate capacitor Antifuses (AF) [56], [4], are chosen because they incur no extra process steps and offer higher reliability at advanced process nodes compared to flash and PROM structures. The structure of such an AF based NVM array with minor design additions for our application is shown in Fig. 2.6(a). Here, we select only one key bit operation per cycle for programming and comparison, to minimize hardware overhead. The internal controller logic maintains the transitions between different states. For key programming, the write signal is enabled.

The address bits of the column decoder are controlled by a counter. Successful implementation of the locking method requires the prevention of external key read out in the field by an adversary. After programming, based on a test signal (T/R) enabled by the controller to a de-multiplexer, the key bits are read out once through the external read line (DR) for validation. Subsequently, DR is disabled by programming the OTP e-fuse. During IC authentication, the T/R signal is disabled by the controller, passing the individual key bit values through DT to the lock/unlock circuitry for comparison. DT does not have any external access. To prohibit any possibility of field programming of the unprogrammed AFs (cells storing 0), the write signal is disabled by a similar protection scheme. When the legitimate IC reaches a system designer, he/she uses the proprietary device (obtained from the manufacturer either directly or through a trusted supply chain) to input the key and unlock the AF in the pin. All NVM signal lines are derived from multiplexed original ports to avoid the addition of chip ports. Two OTP AF based cells, one a 3T structure and the other a 1T structure, are illustrated in Fig. 2.6(b) [4], [5]. Due to significant area advantages, we use the latter AF structure for storage of the key.

2.1.5 Design Circuitry for Chip Unlocking

The input sequence for chip unlocking is stored in the proprietary programming device, for each IC family. It is uploaded into the chip via one or a few select multiplexed input pins, depending on the design implementation. The comparison circuitry consists of one or more XOR gates and a central controller, transitioning between different lock-unlock states. The key is stored in the OTP NVM on chip with disabled external access paths and the internal read connected to the lock-unlock circuitry. An illustration of a simple single bit design implementation (which offers low area overhead) is shown in Fig. 2.7. Apart from the AF programming units, all circuit components work at normal logic operational voltages, derived from the on-die voltage regulators. The AF cell programming voltages (Vpp ∼ 5 V) in the NVM and lock units [1], [4] are derived directly from the primary supply input. The clock inputs would be derived from the on-chip PLL(s).

Figure 2.7: Additional design circuitry for comparison of the key for counterfeit chip authentication.

2.1.5.1 Lock/Unlock Controller State Transitions

The default controller state during field operations is Idle. On input of the first sequence bit(s), the start conversion (SC) signal is enabled by the programming device. This also enables the key bit read from the NVM (for multi-bit comparison per cycle, parallel sense amplifiers, buffers and column decoders are necessary). With this, the state machine transitions to the Comp state for bitwise key matching, using the XOR gate(s). Depending on the output(s) and the number of bits compared (by the NVM counter), the controller decides the next state, i.e. if the bits match, the next higher order bits are compared in the Comp state, whereas the comparison operation is halted on an unequal bit value (Idle state restored). If all the bits are equal, the Program Enable (PE) signal is made high (Prog state), which enables the antifuse (AF) electrode voltage lines and the program buffers for unlocking of the pins. After a time count, determined by the AF programming duration, the PE signal is made low, and the controller returns to the default Idle state. This duration is counted by the same NVM counter (its previous count having finished), via the PE signal feedback to the NVM. The controller state transition diagram for a 128 bit key is shown in Fig. 2.8(a), for a simple single bit comparison per cycle. Apart from SC, the state input signals are the XOR gate output CE, the count of the number of bits compared CO, and the count of the programming duration PT (assumed 5 bits here).
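A minimal Python sketch of this Idle/Comp/Prog behavior for the serial, one-bit-per-cycle case is given below; the signal and state names follow the description above, while details such as the fixed programming-duration count are illustrative assumptions.

def unlock_controller(stored_key, input_sequence, prog_cycles=32):
    """Serial one-bit-per-cycle lock/unlock controller: Idle -> Comp -> Prog."""
    state, bit_index, pe_count = "Idle", 0, 0
    pe = False                                   # Program Enable (PE) output
    while True:
        if state == "Idle":
            # SC (start conversion) is asserted when the first sequence bit arrives
            if input_sequence:
                state = "Comp"
            else:
                return False
        elif state == "Comp":
            # CE: XOR of key bit and supplied bit; CO: count of bits compared
            if stored_key[bit_index] != input_sequence[bit_index]:
                return False                     # mismatch -> back to Idle
            bit_index += 1
            if bit_index == len(stored_key):
                state, pe = "Prog", True         # all bits equal -> program the AFs
        elif state == "Prog":
            pe_count += 1                        # PT: programming-duration count
            if pe_count == prog_cycles:
                pe = False
                return True                      # AFs programmed, return to Idle

# Example with a toy 8-bit key.
key = [1, 0, 1, 1, 0, 0, 1, 0]
print(unlock_controller(key, key))          # True  -> pins unlocked
print(unlock_controller(key, [0] * 8))      # False -> chip remains locked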

Figure 2.8: (a) Typical state transition diagram of the controller in the comparator circuitry; (b) Example of a XOR gate in a power-balanced logic style [6].

2.2 Security and Overhead Analysis of C-Lock

2.2.1 Security Analysis

Aged, recycled chips in the supply chain can easily be identified from the broken locks, or in other words the programmed AFs. In such scenarios, the chip gives functional outputs corresponding to input sequences, signifying already programmed locks and hence a used IC. Besides functional validation, pin level parametric failure tests such as input high and low (IIH and IIL) leakage and continuity tests can be utilized to distinguish between broken, absent or incorrectly copied locks and the legitimate port locking scheme.

For cloning ICs, an adversary may steal the modified IP with the lock design at the RTL, gate or foundry level. Hardware obfuscation based approaches have played a role in preventing the first two with varying degrees of success [57]. In our proposed scheme, the post-manufacturing (after manufacturing tests as well as packaging) programming of the key in the AF based non-volatile memory (NVM), with disabled external read and program paths, prevents cloning of the IC with only the stolen IPs at different design levels. In this context, overproduced ICs also constitute stolen designs at the foundry level. To successfully clone a design, an attacker has to exactly decipher the key to enable programming of the copied IC with the proprietary device, and hence evade detection. Assuming an attacker supplies completely random binary patterns, the probability of each bit in the input sequence being 0 or 1 is 1/2. Hence for an N-bit key, the probability of a match for a particular independent input trial is 1/2^N [58]. Deciphering a 64 or 128-bit key would therefore require 2^64 or 2^128 functional trials in the worst case, which is practically infeasible. Side channel based attacks can also be resisted successfully, as discussed in the next section. So die level destructive IC reverse-engineering, which is extremely complex and expensive [59], would be the only way for an adversary to decipher the design.
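To put these trial counts in perspective, a back-of-the-envelope calculation follows; the assumed rate of one million trials per second is an arbitrary illustrative figure, not a measured attack throughput.

# Worst-case exhaustive-search effort for an N-bit key at an assumed trial rate.
trials_per_second = 1e6                      # assumed attacker throughput
for n in (32, 64, 128):
    trials = 2 ** n
    years = trials / trials_per_second / (3600 * 24 * 365)
    print(f"{n:3d}-bit key: {trials:.3e} trials, about {years:.3e} years")
# 32-bit key:  ~4.3e9 trials,  about 1.4e-4 years (roughly an hour)
# 64-bit key:  ~1.8e19 trials, about 5.8e5 years
# 128-bit key: ~3.4e38 trials, about 1.1e25 years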

2.2.1.1 Resistance against Side Channel Attacks

As in encryption algorithms, although the proposed methodology is resistant to mathematical reverse-engineering, side channel attacks (SCA) can in some scenarios reveal the secret stored key by exploiting the actual implementation of the lock-unlock logic circuitry in the design. SCA techniques like differential power analysis (DPA) can utilize the input data dependent logic transitions, and hence power consumption, to decipher part or the whole of the key [6]. Similarly, timing information may be exploited to reconstruct part or the whole of the key. To protect against such attacks, we propose to employ the following steps in C-Lock.

• Full Key Comparison: To prevent any input-sequence-dependent (as supplied by a potential adversary) timing, current and hence power signatures, in all input scenarios the controller compares the entire length of the key and sequence before transitioning to either the Program or Idle state. A separate signal Mismatch is enabled on the first bit inequality. On comparison of all the bits, if Mismatch is low, pin unlocking is enabled; otherwise, the Idle state is restored. This avoids timing based information leakage (a minimal software analogue is sketched after this list).

• Power-balanced Logic: To prevent input data dependent power consumption patterns, the XOR gate(s) for key bit comparison and the controller combinational circuitry are implemented in a power-balanced logic style. An implementation of a XOR gate in power-balanced logic is shown in Fig. 2.8(b). Like dynamic differential logic (DDL), this leads to equivalent switching capacitance every cycle irrespective of input transitions [6].
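The following minimal Python analogue of the full-key comparison in the first bullet shows the intent: every key bit is processed regardless of where the first mismatch occurs, so the amount of work (and hence timing) does not depend on the supplied sequence. The hardware version accumulates the Mismatch flag in the same spirit rather than stopping early.

def full_key_compare(stored_key, supplied_sequence):
    # OR-accumulate a mismatch flag over all bits instead of halting at the
    # first unequal bit, mirroring the Mismatch signal in the controller.
    mismatch = 0
    for k, s in zip(stored_key, supplied_sequence):
        mismatch |= k ^ s                 # XOR per bit, as in the comparison logic
    return mismatch == 0                  # enable pin unlocking only on a full match

# Both calls perform the same number of operations, leaking no timing information.
key = [1, 0, 1, 1, 0, 0, 1, 0]
print(full_key_compare(key, key))                  # True: transition to Prog
print(full_key_compare(key, [0, *key[1:]]))        # False: return to Idle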

2.2.1.2 Why not FSM based Unlocking?

Instead of a stored key, a finite state machine (FSM) based approach can also be employed to achieve equivalent security against functional reverse-engineering.

Table 2.2: Security & Area Overhead of proposed Locking at 45 nm

                      32-bit key      64-bit key      128-bit key
No. of trials         2^32            2^64            2^128
Lock Area (µm²)       215             245             290
Core-i7 (%)           8.1 × 10^-5     9.4 × 10^-5     1.09 × 10^-4
OMAP3630 (%)          1.20 × 10^-4    1.38 × 10^-4    1.59 × 10^-4
Spartan 6-LX (%)      2.17 × 10^-4    2.47 × 10^-4    2.88 × 10^-4

Here, the security arises from the fact that programming is enabled only on the application of a pre-defined sequence of inputs. If the program state is reached on N particular transitions for an M-bit input stream, the required number of trials is 2^(M*N), similar to the scan protection approach in [58]. For an 8-bit input and 16 state transitions, the complexity is 2^128. However, a drawback of using an FSM is that an adversary in the foundry may clone the design with the FSM itself (as a black box); he/she does not even have to know the internals of the FSM design and the sequences. The proposed post-fabrication programming of the key in secure NVM at the last stage of the IC design cycle (before release into the supply chain) makes the pin locking approach resistant to such foundry level attacks.

2.2.2 Overhead Analysis

The key is stored on-die in the OTP AF based NVM, the unlocking sequence stored in the programming device is input through multiplexed input(s), and finally a XOR gate and controller are used to compare the different bits (simplest serial comparison). This design (with the ROM simulated functionally as register based memory) has been functionally simulated using ModelSim, for a 16-bit key with input sequences equal and unequal to the key. The controller transitions through the 3 states as described, with a 5-bit program duration variable. The same design has been scaled up to a realistic scenario of a 128-bit key and synthesized at 45 nm with the low power NCSU PDK45nm technology library to obtain an estimate of the area overhead of the lock mechanism. The overhead values for three key lengths, in three ICs, are tabulated in Table 2.2. The 3 ICs considered are the Intel Core-i7 general purpose processor, the TI OMAP3630 embedded processor and the Spartan 6-LX field programmable gate array, all fabricated at the 45 nm process technology. The overhead due to a 32/64/128 bit AF based OTP ROM with peripherals, for storing the key, is incorporated into the calculations [4].

Table 2.3: Qualitative Comparison with Alternative Approaches

Property                        PUF [60]     Aging Sensor [61]    Proposed Method
Implementation level            On-Die       On-Die               On-Die
Signature specificity           Per IC       N/A                  Per IC family
Used/Remarked IC detection      No           Yes                  Yes
Cloned IC detection             Yes          No                   Yes
Die area overhead               High         High                 Low
Test workload (designer)        High         Medium               Low
Difficulty of cloning sig.      Very High    N/A                  High
Robustness                      Medium       Medium               Very High

Table 2.4: Area Overhead Comparison at 45 nm Process Technology

                 Proposed (128-bit key)    RO-PUF [60]    Aging Sensor [61]
Area Overhead    295 µm²                   3122 µm²       4190 µm²

Besides, the area in the I/O circuitry due to the AF and program buffers in 30% of the total pins is added to the overhead value. The cross sectional areas of MIM AFs are dominated by the contact sizes (2λ × 2λ) and are hence minimal.

From Table 2.2, it is observed that the locking mechanism incurs negligible area overhead (below 0.001% in state of the art chip designs). On doubling the signature length, the reverse-engineering complexity increases exponentially, whereas the lock area increases approximately linearly, up to a maximum overhead value of about 0.0003% for a 128-bit key. For fast NVM programming and key comparison, we can easily utilize parallelism (sense amplifiers, decoders, XORs) with insignificant area overhead. Besides, the programmed AF resistances in MIM structures are small (20-80 Ω) [1], thereby not causing any loading effects at the chip pins. Hence the proposed security mechanism provides extremely high protection against various forms of counterfeiting attacks at negligible overhead.

2.2.3 Comparison with PUF and Aging Sensors

Two other well-known design techniques for protecting against counterfeit chips are PUFs (e.g. the RO-PUF [13] or SRAM based MECCA PUF [60]) and the aging sensor, typically realized with a differential oscillator [61]. Although theoretically the PUF signature is extremely difficult to reverse engineer, it has certain disadvantages with respect to protection against counterfeiting attacks. Aging sensors, on the other hand, incur significant design overhead. Table 2.3 provides a qualitative comparison between our proposed locking mechanism, PUFs and aging sensors. Estimated area overhead values of a common RO-PUF and an aging sensor are compared with the 128-bit key based locking method. As seen from Table 2.4, our proposed method incurs much lower area overhead (< 10% of the other two). Besides, although the BFSM of the active hardware metering approach incurs comparably low overhead [25], a PUF structure or random number generator is needed for random ID generation, and hence its overhead would be significant.

2.3 Discussion

In this work, we have presented “C-Lock”, a novel design approach for active protection against counterfeiting attacks through locking select pins of an IC by antifuse devices (AFs). A system designer can unlock a chip before using it in a system by a hard-to-clone, key-based programming of the AFs with a proprietary programming device. Unlike existing approaches based on aging sensors and PUFs, it simultaneously protects against two major forms of counterfeiting attacks, namely reselling and cloning. Use of the AF as the locking mechanism ensures one-time unlocking - i.e. once unlocked, an IC remains unlocked through its life cycle. Hence, the proposed approach can readily prevent reselling of aged/scavenged chips. We have shown that the unlocking signature with a 64/128 bit securely stored key is practically infeasible to functionally reverse engineer. Hence, it also protects against different forms of cloning attacks. We have presented detailed analysis on the on-die implementation of this locking methodology. The key matching circuit is designed to be robust against both invasive and side-channel attacks. The effectiveness and overhead of the approach are compared with alternative anti-counterfeiting approaches. Relative to existing DfS techniques, C-Lock incurs significantly fewer design modifications and lower hardware resource overhead. Future work would include hardware prototyping of the locking methodology in a realistic IC.

However, in spite of these advantages, C-Lock has some limitations, which motivate us to continue research into defense techniques with complementary benefits, like the method “P-Val” presented in the subsequent sections. C-Lock’s limitations can be enumerated as:

• Although low compared to existing methods, C-Lock does require some design modification and incurs die level overhead. Hence it is not applicable to legacy designs or designs finalized for production, which constitute a significant fraction of the semiconductor market.

• The defense against cloning is based on a key stored in a secure NVM on-die, unique across IC families or types from a designer (as chosen by the design house). First, the absence of a unique key or signature per device/IC exposes C-Lock to “break-one-break-all” type attacks and reduces security against different forms of cloning attacks. Second, however secure the storage might be, a physically stored key is always more vulnerable to reverse-engineering than a signature based on intrinsic IC characteristics, like that in PUFs.

• C-Lock requires design modifications catered towards digital chips. Hence, like almost all other existing DfS techniques, C-Lock is a digital-only technique and is not applicable to analog and mixed signal ICs, which are also among the chips being counterfeited by adversaries these days.

Next we describe the methodology, security analysis and associated results of the proposed antifuse (AF) based package level defense approach “P-Val”.

2.4 P-Val Methodology

P-Val is a novel, unified package-level IC integrity validation approach that protects against both recycled and cloned ICs. Similar to the “C-Lock” methodology, protection against used, aged chips is achieved through a unique active defense approach, which involves locking or disabling an IC function fully/partially by insertion of OTP antifuse (AF) devices at one or a few select pins of the IC. The contrast with C-Lock is that the AFs are integrated at the package level in P-Val, as illustrated in the schematic of Fig. 2.9(a). Parallel test fuses (TFs), also integrated at these locked pins, are used for final chip testing and are blown before deployment.

Figure 2.9: (a) Overview of the proposed security mechanism in an IC; b) unified protection against recycling and cloning at the package level.

Hence the significant advantage of P-Val is that it requires no design modifications or workload at the chip level and saves the associated hardware overhead, making it applicable to legacy designs (designs ready for production). To clarify, for ICs based on these legacy designs that are already employed in systems in the market, P-Val based protection does not apply, because it requires minimal yet non-zero package modifications and generation of a signature before deployment. Besides, P-Val is also suitable for protection of analog and mixed signal chips. The pins selected for locking are usually from the general purpose input/output (GPIO) or output only pin set of the chip, as programming them only requires setting a fixed output voltage (e.g. 0 V) at the die pad end and externally applying a program voltage at the pin with consequent passing of the program current. Insertion of AFs at one or a few select pins during packaging incurs minimal area overhead [54]. Following the same principles as before, a locked IC in the supply chain remains non-operational until unlocked by application of the required programming parameters by a trusted party, e.g. a system integrator. As AFs are OTP, the chip is functional for its entire life cycle once unlocked. Hence, a programmed AF serves as proof of past usage/handling and automatically protects against reselling of aged chips of all types, including legacy designs and analog ICs.

The P-Val defense against cloning is significantly different from and stronger than that of “C-Lock”. For this, we exploit intrinsic variations in the programmed resistance of AF devices connected to IC pins, which enable us to create unique chip-specific signatures for authentication (Fig. 2.9(b), lower). These AFs are integrated into a few/all of the GPIO or output pins from the set of pins not used for chip locking, as illustrated in Fig. 2.9(a). They are programmed after post-package testing. The chip signature is evaluated and stored in the designer’s database just before the chip is deployed in the supply chain. It is well-known that inherent randomness in the AF fabrication and programming process leads to variations in the insulator thickness, electrode and insulator composition, heat distribution during programming, as well as the stoichiometric composition of the final filament [62]. As a result, the programmed resistance of AFs exhibits intrinsic random variations around the nominal values for given manufacturing and program parameters [63], [64], which are utilized to create the chip-specific signatures. As lower program currents (Ipp) of the chosen AF structure lead to higher variations (better uniqueness of signatures), and currents greater than Ipp lead to irreversible programmed resistance variations, test fuses (TFs) are also implemented in parallel to the authentication pin AFs for final chip test (full test coverage) before they are blown. Subsequently, the authentication AFs are programmed.
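As a rough illustration of how resistance-based signatures could be formed and later checked, the Python sketch below quantizes measured AF resistances into coarse bins and compares against an enrolled value; the nominal resistance, spread, bin width and mismatch tolerance are all illustrative assumptions, and the actual P-Val measurement and signature generation procedure is described in Section 2.6.

import random

def measure_af_resistances(n_pins, nominal=20.0, sigma=3.0, seed=None):
    """Stand-in for measuring programmed AF resistances (ohms) at the
    authentication pins; the spread models intrinsic programming variation."""
    rng = random.Random(seed)
    return [rng.gauss(nominal, sigma) for _ in range(n_pins)]

def signature(resistances, step=1.0):
    """Quantize each measured resistance into a coarse bin to tolerate
    measurement noise; the bin vector serves as the chip signature."""
    return tuple(round(r / step) for r in resistances)

def authenticate(measured, enrolled_signature, max_mismatches=1):
    """Accept the chip if its re-measured signature matches the enrolled one
    in all but a small number of positions."""
    mismatches = sum(a != b for a, b in zip(signature(measured), enrolled_signature))
    return mismatches <= max_mismatches

# Enrollment (designer database) and a later in-field check of the same chip.
chip_a = measure_af_resistances(8, seed=1)
db_sig = signature(chip_a)
remeasured = [r + random.uniform(-0.2, 0.2) for r in chip_a]   # small read noise
print(authenticate(remeasured, db_sig))                         # expected: True
clone = measure_af_resistances(8, seed=2)                       # a different chip
print(authenticate(clone, db_sig))                              # very likely False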

P-Val only requires commensurate modifications at the package and testing phases. The proposed security approach seamlessly integrates with the current semiconductor business cycle, like C-Lock. An optional programming and verification device (PD) may be securely exchanged between the chip designer and a trusted party like a system integrator to facilitate AF programming and signature verification. Finally, like C-Lock, P-Val is transparent to the end-user, i.e. it imposes no constraints with regard to usage and performance. The candidate pins and the AF/TF properties are chosen such that any loading effects at the corresponding pins are minimal during normal chip operations. In particular, P-Val offers the following major advantages:

• To the best of our knowledge, P-Val based unified protection against recycling and cloning is the first package-level anti-counterfeit solution proposed. As compared to the OTP stored key based approach in C-Lock, which is susceptible to break-one-break-all type threats, the P-Val security against cloning attacks, based on the intrinsic random variations of AF program resistances, is higher.

• P-Val incurs no design modifications and no hardware overhead at the chip level, making it suitable for application to legacy designs. Compared to existing on-die approaches, the test workload is also lower.

• P-Val, based on AF integration and programming via the pins in the IC package, is applicable to analog and mixed signal ICs.

2.5 P-Val Implementation

Both of the complementary security schemes of P-Val involve integration of an antifuse (AF) and a test fuse (TF) at the corresponding pins at the package level. For protection against recycling, the AFs are left un-programmed, whereas for defense against different cloning attacks, the desired set of AFs is programmed before deployment. For all the candidate pins, post-package testing is performed through the test fuses (TFs), implemented in parallel to the AFs. TFs functionally represent the inverse of AFs, i.e. a normally closed switch (∼ 20 Ω − 40 Ω) [55] that is blown open after programming. In P-Val, TFs are programmed just before chip deployment. Next, we enlist some important properties of the antifuse (AF) that we have utilized in the P-Val scheme.

2.5.1 Important AF Properties

Property 1: The ON state resistance Ron is approximately inversely proportional to the program current Ipp [62]. The median value of the MIM AF program resistance [63], [62], for programming currents within the maximum pin operating currents of different ICs (between 15 − 40 mA [3], [65], [66]), is given by [62]:

Ron = ρs,on/(π rc) + ρc,on/(π rc) + (ρc,on d)/(π rc²)      (1)

where rc = (Ipp Vf,p)/(4 keff (Tco − Ta))

In the above formula, derived from electro-thermal models, ρs,on and ρc,on are the electrical spreading and core resistivity under operation, d is the insulator thickness, rc is the core/filament radius, Ipp is the program current, Vf,p is the programming fuse voltage, keff is the equivalent AF thermal conductivity, Tco is the core equilibrium reaction temperature and Ta is the ambient temperature. Usually, equation (1) is simplified by incorporating the AF electrical and thermal conductivities and the core reaction temperature into a term Vf, called the characteristic fuse voltage [62], which depends only on the type of AF (Poly-Si or MIM). From the formula, it is evident that lower program currents lead to higher AF resistance and vice versa.
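A small numerical sketch of equation (1), illustrating Property 1, is given below; all material and thermal parameter values are placeholders chosen only to expose the trend, not values taken from the cited AF literature.

import math

def af_on_resistance(i_pp, v_fp=1.0, k_eff=1.0, t_co=900.0, t_a=300.0,
                     rho_s=1e-6, rho_c=2e-6, d=20e-9):
    """Evaluate Ron from equation (1); all parameter values are illustrative only."""
    r_c = (i_pp * v_fp) / (4 * k_eff * (t_co - t_a))          # filament radius
    return (rho_s / (math.pi * r_c)
            + rho_c / (math.pi * r_c)
            + (rho_c * d) / (math.pi * r_c ** 2))

for i_pp in (5e-3, 15e-3, 40e-3):                             # program current (A)
    print(f"Ipp = {i_pp*1e3:4.0f} mA -> Ron = {af_on_resistance(i_pp):8.4f} ohm")
# Lower program current -> smaller filament radius rc -> higher Ron; the
# d-dependent third term grows fastest as Ipp decreases (see Property 2).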

Property 2: Lower program currents lead to greater variation in the AF program resistance and vice versa [63]. At lower AF program currents, the effect of the insulator thickness and hence the conducting channel (instead of only a spherical core) is prominent (third term of Ron in equation (1)).

Property 3: For programmed AFs, a read/operating current (Iread) below the program current Ipp leads to no AF structural alterations and hence consistent resistance values [63], [62]. However, if Iread exceeds Ipp, the resistance decreases (to a new value corresponding to Iread) due to greater Joule heating and hence increased filament/core size [62].

2.5.2 P-Val Component Selection

2.5.2.1 Effect of AF/TF on Normal Pin Operation

AF/TFs are incorporated into all/some of the general purpose input output (GPIO) or output only pins of the IC. This allows programming of both components (AF-TF) by setting/writing a particular output voltage at the pad (e.g. 0 V), applying an external pin voltage equivalent to the program voltage, and consequently passing the AF/TF program current. A general representation of a GPIO port logic, found in slightly varying implementations in different types of chips such as micro-controllers, FPGAs, processors [3], [65], [66], [67], etc., is illustrated in the schematic in Fig. 2.10(a). Particular registers control the GPIO port configuration, i.e. input or output (based on custom logic), as well as the port value/content. The electrical parameters of both the AFs and TFs at the candidate pins have to be set to minimize any loading effects at the pin. The capacitances of both AFs and TFs are of the order of fF/µm² [63], [64], [68] and hence minimal for typical dimensions, as compared to the usual total existing I/O capacitances in the range of ∼ 5 − 10 pF [65], [66]. Moreover, the GPIO input resistance is of the order of MΩs and above, and hence a programmed AF during normal operation or an un-programmed TF during test would not cause any loading effects during input operations. The un-programmed AF and programmed TF resistances are of the order of hundreds of MΩs and thereby do not hamper any electrical operations during post-package test (through the TF) or in the field (through the AF), respectively.

Figure 2.10: a) A general representation of the I/O port circuitry found in different chips (µC, FPGA, µP etc.); b) minimal loading effect due to proposed scheme during the critical output mode operation of each candidate pin.

Hence the loading effects, in terms of logic propagation delay or load drive strength (due to the programmed AF / un-programmed TF), should be analyzed in the critical pin output mode during normal/test operations.

In the output mode during normal operation, the port driver circuit reduces to an equivalent pull up/down transistor with a programmed AF in series (the programmed TF is in the MΩ range and hence considered open), as shown in Fig. 2.10(b). With the I/O supply voltages typically varying between 1.8-5 V and the maximum pin source/sink current between 10-40 mA across different grades and types of chips [3], [65], [66], [67] (the referenced chips span 180 nm to 32 nm process technology nodes), the ON resistance of the equivalent driver transistor is mostly >= 200 Ω. The resistances of the equivalent transistors comprising the port enable logic are usually in the same range or higher compared to the drivers. Hence, the AF programmed resistance should be set to a suitably low value to minimize the loading during the output mode. Minimization of loading effects also involves selecting the IC pins to be utilized for locking and authentication, as described next.
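A quick numerical check of this loading argument follows; the driver resistance is the lower bound quoted above, the AF values span the programmed range in Table 2.5, and interpreting the resistance ratio as an RC-delay increase assumes a fixed load capacitance.

# Relative increase in output-path resistance (and, for a fixed load capacitance,
# in RC delay) caused by a programmed AF in series with the pin driver.
driver_r_on = 200.0                      # ohms, lower bound quoted in the text
for af_r_on in (10.0, 20.0, 30.0):       # ohms, programmed AF range from Table 2.5
    increase = af_r_on / driver_r_on
    print(f"AF = {af_r_on:4.1f} ohm -> series resistance (and delay) up by {increase:.1%}")
# Worst case here is 30/200 = 15%, at the minimum quoted driver resistance.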

Table 2.5: Major Properties of the P-Val MIM Antifuse

Parameter               Value
Program voltage         4-6 V
Program current         10-40 mA (∼ max. IC pin current)
Program duration        0.1-5 ms
OFF state resistance    50 MΩ - 1 GΩ
ON state resistance     10 Ω - 30 Ω
Size / dimensions       < 1 - 2 µm²
Insulator thickness     ∼ 20 nm

2.5.2.2 Antifuse (AF) Selection

The metal electrodes of the MIM AFs may be composed of Al, Cu, TiW, etc. [62], [69], whereas the choice of insulators includes SixNy, SiO2, amorphous Si, C, etc. [69], [63]. Although there exist CMOS compatible, transistor gate oxide breakdown based AF structures [70], we do not choose them, as IC packages do not typically involve incorporation of silicon. For a chosen lower program voltage in the range of ∼ 4 − 6 V, the AF insulator thickness would be ∼ 20 nm [68], [71]. The program voltage would be applied externally at the pin (with the pad at e.g. 0 V) to initiate a rupture or weak spot within the AF insulator link. The AF size is determined mostly by the electrode contact dimensions [72] and is within a maximum of 1-2 µm² [1], [64]. Possible pin choices for locking/authentication include most of the general purpose input-output (GPIO) or output only pins, due to easier programmability.

With respect to loading effects, critical pins such as high frequency chip clock(s) (hundreds of MHz to GHz range), GPIO pins multiplexed with clocks, oscillator inputs/outputs, etc. are not selected, to prevent any frequency degradation. Power pins are not chosen, as stable functioning supplies are required for the TF and AF programming during chip unlock. Any pins with current limitations are also not considered. AFs in GPIO input mode (or input only pins) can also be carefully programmed in the input mode through forward biased protection diodes, with proper choice of AF, diode, pull up/down resistance (if any) and pulse timing values such that the maximum operational ratings at the die pads are not violated. These are considered only in rare scenarios with too few candidate pins for P-Val. To minimize loading effects during normal operation, the AF ON state resistance (Ron) is chosen between 10 − 30 Ω [63], [71], [69]. To enable this, according to the previously described MIM AF Property 1, the final program current is chosen around the maximum pin current, ranging typically from 10-40 mA across different IC types. In AF based FPGAs from Actel and Quicklogic, logic paths with even more than 5-10 MIM AFs in sequence can achieve a maximum frequency of the order of several MHz. As we are avoiding frequency critical pins in the order of hundreds of MHz to GHz, the chosen Ron values are guaranteed to minimize any loading. Multiple small program pulses could be used to further reduce AF program resistances [68]. The AF parameters are listed in Table 2.5.

2.5.2.3 Test Fuse (TF) Selection

Programmed e-fuse resistances are of the order of 10-100 MΩ. The common e-fuse structure, incorporating a thin strip of Poly-Si covered with a thin silicide layer (e.g. CoSi2, WSi2), would be implemented [55], [73]. Programming involves electromigration of the metal atoms of the silicide due to localized heat generated from the passage of the programming current through the fuse link. Like the AF, the size of the e-fuse is limited to within 1 − 2 µm² [55]. The e-fuse properties are chosen such that the ON resistances are within ∼ 30 Ω [73] to minimize any loading effects during test. Their program properties should be selected according to the maximum operational ratings of the corresponding pins, thereby providing full IC test coverage at both wafer and assembly levels. The desired fuse maximum current and associated voltage ratings may be set by selecting appropriate e-fuse silicide materials, geometry, electrodes, etc.

2.5.2.4 Package Level Fabrication

The AF-TF structures would be implemented at the package level, leaving the die untouched. P-Val can be implemented on all packaging types, including current state-of-the-art Ball Grid Array (BGA) based chip-scale packages (CSP). Based on the chip mounting method, density of pins and package dimensions, IC packages are mostly categorized into three main types: 1) through hole; 2) surface mount; and 3) chip-scale packages [8], [74]. AF-TF structures would be integrated between the corresponding die pads and pins. Based on the package size with respect to the bare die real estate and the type of existing connections between die pad and external pins, these units (AF-TF) would be implemented in discrete form on the packaging substrate or grown on it selectively. With respect to these considerations, one can classify IC packaging technologies into two major categories:

Figure 2.11: (a) Discrete AF-TF integration in through hole and surface mount packages like QFN, QFP, PLCC, SSOP [7]; (b) P-Val implementation in flip-chip bonded BGA based CSPs [8].

1) through hole and most surface mount types (e.g. SSOP, QFP, QFN, PLCC) [74], where pads are connected to pin leads (at the package substrate periphery) through normal wirebonds, afforded by the much greater package size compared to the die (> 4-5X); 2) chip-scale packages, where the package size is only ∼ 1.2X that of the die and pads are connected to external solder bumps by short wirebonds and substrate inter-layer traces (wire-bonded BGA) or through flip-chip bonds (flip-chip BGA) [8].

In the former scenario, due to the much greater pad-pin spacings, the wirebonds are between 3-5 mm in total length [75]. This allows enough package substrate area to integrate individual discrete AFs and TFs, of maximum sizes in the order of a few µm² (defined mostly by the sizes of the 2λ X 2λ contacts [72]), between the die pad and the corresponding pin. The individual AF-TFs would be held in place by a material similar to the epoxy-based die attach used for mounting ICs on the packaging substrate [7]. As the sole additional packaging effort, wirebonds would be attached from the die pad to one of the AF and TF electrodes and from the other electrode contacts to the corresponding metal pin or leads, as illustrated in Fig. 2.11(a). In these cases, P-Val incurs only a greater number of wirebonds per candidate pin and usually no additional package area. Hence the extra cost is minimal. This scenario covers P-Val implementation in most IC types, especially low- and mid-range chips. On the other hand, the special chip-scale packages (CSP) for high-end processors, system-on-chips, DSPs, etc. would involve controlled fabrication of the AFs and TFs on the packaging substrate, mostly by reactive sputtering techniques [76], [77]. The metal layers from the package substrate conductive traces (e.g. Cu, Al) can be utilized to serve as the electrodes, whereas the AF insulator layer of, for example, SixNy and the Poly-Si and silicide (e.g. WSi2) of the e-fuse can be deposited selectively using masks. A schematic of the P-Val implementation in a flip-chip bonded CSP is illustrated in Fig. 2.11(b). Here, only the AFs are depicted in the package cross-section view. The TF layers would be fabricated and integrated in a similar manner. Fig. 2.11(b) depicts only one layer of interposer and conductive trace in the package. Many CSPs contain multiple such layers between die pad and external solder bumps [8]. In these scenarios, the same method of AF-TF fabrication can be performed on the top stack layer. Either of the two procedures (discrete AF/TF or deposition) may be implemented in wire-bonded BGA based CSPs.

2.6 Pin Locking and IC Authentication in P-Val

The life-cycle flow (design and business cycle) of a legitimate chip with P-Val implementation is depicted in Fig. 2.12. P-Val unifies two novel, complementary security approaches:

2.6.1 Pin Locking

Antifuses (AFs) inserted in a few chip pins (from the GPIO or output-only port set) disable the pin functions, hence the term “pin locking”. All ICs from a family would have the same pins locked. The number of locked pins could be one, especially in small-scale chips, but usually a few pins would be selected to significantly affect the IC functionality until the AFs are programmed.

At the level of the system designer/end retailer (trusted entities in the model), a programming device (PD) could be securely exchanged by the chip manufacturer to accelerate the verification, unlock and authentication process, at the expense of a slight increase in cost. This device would integrate the voltage pulser, current regulator, the locked pin locations for the IC family and verification logic for the pin lock/unlock condition (along with signature calculation). It is assumed trustworthy, tamper-proof and exchanged securely in the P-Val implementation. A PD is not a necessity for the P-Val scheme, but helps in easier adoption of the proposed approach by system designers with respect to test effort and time-to-market constraints. Besides, a PD would help in standardizing the AF resistance measuring process for authentication and avoid any site-to-site variation in measurement accuracy.

Figure 2.12: Life-cycle of a legitimate IC with P-Val implementation.

With a PD or their own setup, the system designer verifies the lock/unlock condition by measuring the order/range of lock pin resistances in the port output modes (by measuring current under an applied pin voltage) and/or checking the chip functional outputs, and proceeds with programming the AFs in case of a new IC.

2.6.2 IC Authentication Methodology

The random intrinsic variation of AF programmed resistances around the nominal value (for the same program parameters) can be utilized to create chip-specific signatures for authentication. The manufacturing and programming process induces variation in device parameters such as insulator thickness, electrode surface roughness, core radius, etc. To the best of our knowledge, AF program resistance variation has only been studied [63], [64], [69], [71] to ensure that the values are constrained within a threshold (e.g. 50-100 Ω for MIM AFs) to satisfy critical path delays in FPGA applications. Here, we aim to utilize the inherent variations to our benefit for authentication, similar to PUFs using random process variations in device/circuit parameters [20].

The sources of these variations could be manifold, as inferred from studies regarding conducting filament characteristics, core temperature boundary, link material analysis, etc. [62], [68], [63]. Possible sources such as insulator thickness, side-wall geometry, AF electrical and thermal properties, filament stoichiometry (not all independent), etc. are illustrated in a representative programmed AF cross-section schematic in Fig. 2.13(a).

Figure 2.13: (a) Possible sources of intrinsic variation of programmed AF resistances (Ron); (b) Variation of Ron of MIM AF at 20 mA program current (Ipp); (c) Greater variation of AF Ron at lower Ipp at similar program voltages, duration and pulse patterns.

The variation of AF program resistances (Ron) between 20 and 30 Ω, for a program current (Ipp) of 20 mA, is illustrated in the probability density function (pdf) in Fig. 2.13(b) (obtained by differentiating the empirical cumulative distribution in [63]).

Corresponding to AF properties 1 and 2 mentioned in the previous section, the higher variation and median value of Ron at lower Ipp are illustrated in Fig. 2.13(c) for fabricated MIM AFs [63]. The greater variation can be utilized for enhanced uniqueness of the signature space. For a MIM AF program voltage around 4-6 V, the minimum Ipp utilized for forming the conductive filament is ∼ 5 mA [63], [68].

In this case, Ron would range from 50 Ω to ∼ 80 Ω [63]. Utilizing this large, typically random variation, the individual programmed authentication AF resistances would be measured after chip testing and the IC signature generated according to the protocol described in the ensuing section. AF property 3 prohibits an AF operating current beyond Ipp, for invariant AF program resistances and thus robust signatures. For this purpose, test e-fuses (TFs) of low resistance and program current around the maximum pin current would also be placed in parallel to the authentication AFs for full test coverage. After test, the TFs are blown and the AFs programmed.

After authentication, along with programming the lock AFs at Imax, the system designer passes Imax through the authentication AFs as well (shown in Fig. 2.12), where Imax is ∼ 15-40 mA across types and families of ICs. According to AF property 3, this would reduce the Ron of all AFs to between 10-30 Ω [63], [71], [69], which is guaranteed to minimize any loading effects during normal field operation. Hence, in P-Val, MIM AF properties are suitably utilized to create unique signatures for authentication as well as to minimize any loading effects at the pins in the field. All the legitimate IC signatures are stored in the manufacturer’s database. When an IC reaches a trusted entity, he/she generates the chip signature after measurement. It is then compared with the signatures stored in the database to verify a match, in which case the chip is considered a legitimate IC. During signature creation and verification, AF Ron is measured by setting the corresponding pad at a fixed voltage (e.g. 0 V) and applying a measuring potential at the pin to pass a set Iread. The ratio of the applied voltage to Iread gives the AF Ron.

2.6.3 Signature Generation

The steps of the signature generation protocol are depicted below:

Signature Generation (same Ipp for all AFs)

Input: [Ri], the resistance vector of the chip, for all i ∈ (1, .., M); M ← no. of authentication pins
Initialization: C ← 0
for all i ∈ (1, .., M − 1)
    for all j ∈ (i + 1, .., M)
        C ← C + 1
        Comparison: if Ri >= Rj (normalized due to same Ipp)
            Ps[C] = 1
        else
            Ps[C] = 0
    end
end
Output: Sig = [Ps[k]] for all k ∈ (1, .., MC2)

The inputs to the chip signature generation algorithm are the measured program resistances of the authentication pin AFs. As the AFs in a chip are typically programmed at the same Ipp (same mean value of the distribution), a simple comparison-based scheme is utilized to create robust signatures (the values are already normalized). Any two authentication AF Ron values, chosen in a pre-determined sequence for all chips, are compared to create a 1 or 0 signature bit. This method also utilizes the full entropy of the signature space. Another significant advantage of the comparison-based scheme is the robustness to any common-mode variations of AF Ron with temperature, which might occur before/during measurement. Although much smaller than for metal interconnects [68], [63], programmed MIM AF resistances exhibit a non-negligible positive linear temperature coefficient in the order of (4 ∗ 10−4)−(8 ∗ 10−4) K−1 at 298 K, in the P-Val current ranges. For example, for Ipp = 10 mA, the MIM AF Ron increases by a maximum of 4% from 25◦C to 80◦C. Our comparison-based signature generation automatically makes P-Val robust against the effect of these typically linear temperature variations.
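To make this robustness argument explicit (a short derivation under the stated linear temperature model, assuming all AFs on a chip share approximately the same temperature coefficient α):

\[ R_i(T) \approx R_i(T_0)\,(1 + \alpha \Delta T), \quad \alpha > 0 \ \Rightarrow\ R_i(T) \ge R_j(T) \iff R_i(T_0) \ge R_j(T_0) \]

Since the scaling factor (1 + αΔT) is positive and common to both resistances, every comparison-based signature bit is unchanged by such a common-mode drift.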

As compared to global normalization and digitization based algorithms, where all chips are compared amongst each other, normalized with respect to a globally chosen mean, minimum, etc. and possibly digitized, the proposed inter-pin (same chip) comparison scheme provides the following advantages: 1) robustness to temperature-based Ron changes and AF fabrication related common biases, if any; 2) utilization of the full entropy of the signature space; 3) no requirement for measurement or storage of any global pin value, such as the mean or median of Ron, by the chip manufacturer. With the proposed signature scheme, even 20 AF pins would lead to 20C2 = 190 candidate signature bits. This allows proper selection of bits to create a robust 128-bit signature and also minimizes any bit correlations during comparison.
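A minimal sketch of the comparison-based signature generation in Python (the thesis gives only the pseudocode above; the pin count and resistance values below are hypothetical):

from itertools import combinations

def pval_signature(resistances):
    """Comparison-based signature: one bit per ordered pin pair (i < j).

    resistances -- list of programmed authentication-AF resistances (ohms),
                   measured in a fixed, pre-determined pin order.
    Returns a list of M-choose-2 bits (1 if R_i >= R_j, else 0).
    """
    return [1 if r_i >= r_j else 0
            for r_i, r_j in combinations(resistances, 2)]

# Hypothetical example: 5 authentication pins -> 10 signature bits.
print(pval_signature([24.1, 22.7, 25.3, 23.0, 21.8]))

With 20 authentication pins, the same routine yields the 190 candidate bits quoted above, from which a robust 128-bit subset can be selected.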

2.7 Security Analysis

As the counterfeiting of ICs is such a lucrative malpractice in terms of economic benefits, adversaries would try to do everything possible to defeat any protection scheme. In the next subsections, P-Val is analyzed from different angles with respect to how an attacker may attempt to bypass its security features while at the same time maintaining an economic benefit, which is the main purpose of counterfeiting.

Figure 2.14: Security provided by P-Val against different attempts by an adversary to bypass the used/recycled chip detection scheme.

2.7.1 P-Val Security against Recycled Chips

The different recycled-IC based attack channels on the proposed scheme and the corresponding P-Val defenses are illustrated in Fig. 2.14. As the AFs are one-time-programmable (OTP), the sole viable way an attacker can attempt to bypass P-Val for a used chip is to de-package the IC, replace the programmed AFs (previously locked) with new ones and then re-package it and insert it into the supply chain. In P-Val, the positions of the locked and authentication pins are transparent. With state-of-the-art tools, an adversary can de-package, replace components and re-package without detection by physical and microscopic analysis tools. However, from the economic perspective, compared to first-time integration, replacing AF-TFs without leaving any functional/electrical detection traces is prohibitively more expensive. To elaborate: by far the most common practice followed by attackers for recycling is to scrape the used ICs off the system-level boards, fix any wear and tear in the pin(s) and physical defects on the external package, remark the surface if required and insert them back into the supply chain. This incurs minimal cost. Minimal-pitch surface mount and chip-scale packages, where the AF-TFs are fabricated on the packaging stack, make their replacement significantly more difficult due to the complexity in handling narrow traces, solder balls, micro-vias, interposer layers, etc. To replace a programmed package-level lock AF, an attacker would require access to state-of-the-art micro-electronic tools/resources. More importantly, dealing invasively with the package increases the chances of enhanced contact capacitance, incorrect alignments, micro-via alterations, pad wearing, etc. These may be detected by the system designer via IC parametric tests such as maximum frequency, drive current, burn-in and high-speed functional tests. Hence, the economic gains from recycling reduce significantly when one has to expend significant time and resources to evade detection.

Even if we assume that an attacker successfully replaces the lock pin AFs, the IC would fail at the authentication stage of P-Val, as the programmed ON-state resistances of all the authentication AFs had been reduced irreversibly (from the values used to create the signature) prior to in-field usage (by programming at the maximum pin current), and hence there would be no resulting signature match. Even in the extreme scenario where the adversary replaces all pin AFs and programs the authentication set after reverse engineering, differences in fabrication and program parameters coupled with intrinsic random variations would lead to unique IC signatures. As these will not be stored in the IC manufacturer’s database, the corresponding chips would be detected as cloned ICs. Detection of any type of counterfeit IC (e.g. a recycled chip flagged as a clone, or vice versa) suffices to prevent it from further usage in electronic systems, which is the main goal of the security scheme.

2.7.2 Security of P-Val against Cloned chips

In this subsection, through theoretical analysis, mathematical formulation and simulation-based results, we analyze the security provided by P-Val against different forms of cloning attacks. Cloning threats include piracy of IP, overproduction of ICs as well as reverse engineering after fabrication. With P-Val implementation, along with performing these attacks, an IC signature needs to match one in the database to successfully pass the authentication phase. Due to the complexity of an implementation involving the different entities in the semiconductor business model, the verification of a particular signature in the IC manufacturer’s database does not lead to any database update (to note a one-time verification) in the P-Val scheme. Hence, any chip having a signature matching a legitimate, already-verified IC would also be authenticated in P-Val. As a result, an adversary attempts to reproduce the signatures of legitimate chips in his manufactured cloned ICs. Hard-coding the signature bits in a non-volatile memory is not possible, as P-Val requires resistance measurements and off-line generation of signatures. If programming devices are used, we assume their secure exchange and tamper protection.

scheme, an attacker, assuming the role of a system designer, can buy a few legiti- mate chips and generate their signatures. Consequently, he/she would attempt to copy these signatures with highest probability.

Referring to the signature generation algorithm, it is sufficient for an adversary to copy the relative values of the AF resistances (rather than their actual magnitudes) of a legitimate chip. Similar to the principles of PUFs, the random intrinsic variations in the programmed AF resistances arise from different inherent structural, chemical and programming characteristics, which are not controllable [62], [63], [64]. As a result, for better controllability and thereby an enhanced match probability, an adversary might attempt to replicate the signatures of legitimate chip(s) by inserting chip-scale precision resistances (tight distribution) of corresponding relative values in cloned ICs.

2.7.2.1 Precision Resistance Insertion

To obtain a set of resistance values with high probability, an adversary would wish to use a resistance type with a much tighter distribution than AFs, along with a miniature form factor to be integrated into the IC package. Flat, chip-scale thin-film precision resistors with a tolerance of ∼ 0.1% are used in some electronic applications [78], [79]. In a cloning scenario, the adversary integrates precision resistors of the measured relative magnitudes.

This attempt can be defeated if we can detect whether the measured authentication resistance originates from a programmed AF or from any other material such as precision resistors, test fuses, etc. Some unique property characterizing programmed AFs needs to be used as an additional defense layer; e.g. the aforementioned property 3 of the MIM AFs can be utilized, i.e. if Iread > Ipp, the AF ON-state resistance (Ron) reduces due to an enlarged filament size. In P-Val, after authentication, the AFs are subsequently programmed at Imax to minimize Ron and hence any loading effects in-field. Normal chip-scale precision resistors or test fuses do not exhibit this unique property. Hence, the reduction of resistance on the final programming, roughly according to the empirical equations, proves that the measured resistance is due to a programmed MIM AF and thus detects an adversary attack of the type mentioned. Other distinct AF properties could be used for verification as well. These would be verified only in the scenario of a signature match.
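A minimal sketch of how such a check could be automated in a verification flow (the thresholds and values below are illustrative assumptions, not values from the thesis):

def looks_like_programmed_af(r_before, r_after, expected_range=(10.0, 30.0)):
    """Heuristic check that an authentication resistance comes from a MIM AF.

    r_before -- resistance measured at the authentication read current (ohms)
    r_after  -- resistance re-measured after programming the pin at Imax (ohms)
    A programmed MIM AF should drop irreversibly into the expected ON range;
    a precision resistor or test fuse would not show this reduction.
    """
    lo, hi = expected_range
    return r_after < r_before and lo <= r_after <= hi

# Hypothetical example: a 62-ohm authentication AF drops to 21 ohms after Imax.
print(looks_like_programmed_af(62.0, 21.0))   # True  -> consistent with an AF
print(looks_like_programmed_af(62.0, 61.8))   # False -> likely not a programmed AF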

Figure 2.15: (a) Example set diagram representation of match probability of cloned IC signatures. (b) CDF and derived PDF at Ipp of 10 mA used for simulation studies; (c) Variation of calculated cloning probability with number of authentication pins (1 million legitimate ICs).

2.7.2.2 AF Integration in Cloned ICs

For cloning, an attacker could integrate AFs at the respective pin positions of the chip and program them. As AFs of different possible materials, geometries and program properties all suffer from intrinsic manufacturing/program-related variations, the success of an adversary, i.e. a cloned IC matching any original chip signature, is probabilistic. The probability of cloning a small-scale or low-end chip (e.g. some µCs, analog ICs) with fewer pins is higher, as the signature space (NC2 bits, i.e. up to 2^(NC2) possible values) grows rapidly with the number of pins. Different chosen AF distributions (different structure, chemical composition, program voltage, etc.) might lead to varied adversary success rates. In the next section, we calculate the minimum number of authentication pins to statistically minimize this cloning probability. For ease of calculation, we favorably assume (from the attacker’s point of view) that the legitimate and cloned IC AF distributions are the same, which the attacker could achieve by using the same fabrication facility, etc.

Minimum number of Authentication Pins: If there are “N” authentication pins and each AF program resistance is represented by Xi, then with “M” total existing legitimate ICs, the probability Pr of a cloned IC matching any legitimate chip signature can be theoretically formulated as:

\[ P_r = P\left( (X_1, X_2, \ldots, X_{N-1}, X_N) \in (O_1 \cup O_2 \cup \cdots \cup O_K) \right) \]

where O1, O2, ..., OK constitute the regions in the total signature space (a maximum of 2^(NC2) possible signatures) occupied by the “M” legitimate ICs. This is also conceptually depicted in the representative set diagram illustration in Fig. 2.15(a). As estimating this probability from the theoretical formulation considering all possible scenarios is extremely complex, we statistically estimate the probability through simulation studies considering the cumulative distribution function (cdf) of AF Ron in [63] (Fig. 2.15(b), upper). Although the AF composition in P-Val is slightly different from that in [63], the simulation results serve only to estimate representative values for the cloning probability. We perform the analysis for Ipp = 10 mA (the lowest empirically analyzed value in [63]).

Simulation Setup & Results: The probability density function (pdf) of programmed AF resistances, derived by differentiating the cdf in [63], is shown in Fig. 2.15(b) (lower). The number of authentication pins is varied from 8 to 16. We considered 1 million legitimate chips. For each chip, assuming the same Ipp for all AFs, programmed resistance (Ron) values are chosen randomly according to the pdf and the signature calculated following the protocol. The resolution of the measurement determines the total number of possible resistance values of a programmed AF. It is derived from the specification of a simple, low-cost multi-meter as ∼ 0.05 Ω or 50 mΩ. For a representative cloned IC, AF Ron values corresponding to the number of authentication pins are randomly selected from the pdf and the signature calculated. For 10000 such iterations (limited by simulation time constraints), the calculated signature is compared with all the legitimate signatures for a match. One or more matches increments a counter by 1 for that iteration. The ratio of the final counter value to 10000 is an estimate of the cloning probability of a legitimate IC. All simulations are performed in Matlab.
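The thesis simulations were done in Matlab; below is a minimal Python sketch of the same Monte Carlo estimate. The stand-in resistance distribution (a normal distribution and its parameters) is an assumption here for illustration only; the actual study sampled the empirical pdf from [63].

import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
RESOLUTION = 0.05          # ohms, measurement resolution assumed in the study

def draw_resistances(n_pins):
    # Stand-in for the empirical pdf of programmed AF resistances at Ipp = 10 mA
    # (assumed normal around 65 ohms; the thesis used the measured cdf from [63]).
    r = rng.normal(65.0, 6.0, n_pins)
    return np.round(r / RESOLUTION) * RESOLUTION   # quantize to the resolution

def signature(resistances):
    return tuple(1 if a >= b else 0 for a, b in combinations(resistances, 2))

def cloning_probability(n_pins, n_legit=100_000, n_trials=10_000):
    legit = {signature(draw_resistances(n_pins)) for _ in range(n_legit)}
    hits = sum(signature(draw_resistances(n_pins)) in legit for _ in range(n_trials))
    return hits / n_trials

# Smaller legitimate-chip population than the 1 million in the thesis, for runtime.
for pins in (8, 10, 12):
    print(pins, cloning_probability(pins))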

The variation of the calculated cloning probability with the number of authentication pins is illustrated in Fig. 2.15(c). As observed, considering 1 million legitimate ICs of a certain type, there will always be a match with up to 8 authentication pins. For 500,000 original chips, the corresponding probability reduces to 0.91. However, due to the exponentially increasing signature space, this probability reduces drastically to 0.19 for 10 pins and 0.003 for 12 pins. This suggests that, on average, with 10 authentication pins an attacker has to fabricate 100 chips for roughly 19 to pass P-Val, which is not economically viable. No match was found with more than 12 pins. Hence a chip with P-Val and a minimum of ∼ 10 authentication pins would typically be secure against cloning attacks. This covers the majority of the IC types/families in the market.

2.7.2.3 Protection against Overproduced ICs

One prevalent form of counterfeiting is the overproduction of chips by a malicious foundry beyond the contract with the IC designer. As the AF-TF fabrication process is the same for legitimate and overproduced chips in this case, there is a higher probability of a complete resistance distribution match, although AF programming happens under the control of the designer. Even then, as described previously, due to intrinsic random variations, IC signatures with 10 or more authentication AFs would be either impossible to clone or copying would mostly not be an economically viable option. This includes scenarios of attackers modifying the packaging of the overproduced lot to enhance the probability of signature overlap with legitimate chips, as discussed in the last section with chip-scale precision resistors. In P-Val, the signatures are generated at the end of the IC design cycle, just before deployment. As overproduced chips follow an alternate, parallel route into the supply chain, the corresponding unique signatures would never be stored in the original manufacturer’s database. Hence they would be easily detected by a trusted party.

2.7.3 Uniqueness and Robustness of Signature

Uniqueness and robustness of signatures are important for reliably authenticating each chip from the IC manufacturer.

Figure 2.16: (a) Distribution of fractional Inter-Hamming distance for 190-bit signatures in 1000 chips at Ipp = 10 mA; (b) Probability of signature bits being 1; (c) Fractional Intra-Hamming distance distributions for 190-bit signatures in 1000 chips with coarse measurement resolution between 0.5-1 Ω and 10 measurement instances (compared with a reference of 0.05 Ω resolution).

2.7.3.1 Simulation Setup & Metrics

Signatures are analyzed for 1000 legitimate chips and 20 authentication pins per chip. The MIM AF program resistances (Ron) are taken from the measurements in [63], as in the previous analysis. With 20 pins, 190 (20C2) bit IC signatures are generated. The metric quantifying the signature uniqueness [21] is the fractional Inter-Hamming distance distribution, with average (HDinter) given by:

\[ HD_{inter} = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} HD_{ij} \]

Here HDij is the fractional Inter-Hamming distance between chips i and j, and N is the number of chips. HDinter is desired to be close to 0.5. As discussed, P-Val is robust to temperature variations. During measurement, random instrumentation noise can be nullified by averaging over multiple iterations (as done in [63]) and hence is not considered here. The robustness of P-Val is tested in an extreme scenario of resistances being measured only at coarse resolutions between 0.5-1 Ω, perhaps due to limitations of the measurement setup or high noise levels. In this case, close resistance values could lead to bit flips during comparison. Signature robustness has been quantified by the fractional Intra-Hamming distance distribution [21], with average value (HDintra) given by:

\[ HD_{intra} = \frac{2}{N\,Z(Z-1)} \sum_{i=1}^{N} \sum_{j=1}^{Z-1} \sum_{k=j+1}^{Z} HDI_{ijk} \]

Here HDI_ijk is the fractional Intra-Hamming distance for chip i between the j-th and k-th measurements, and Z = 10, i.e. 10 measurements. HDintra should ideally be zero.
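A minimal sketch of how these two metrics can be computed from signature arrays (the array shapes and random data below are placeholders for the measured signatures):

import numpy as np

def hd_inter(sigs):
    """Average fractional inter-chip Hamming distance.

    sigs -- array of shape (N_chips, L_bits), one reference signature per chip.
    """
    n = len(sigs)
    pairs = [np.mean(sigs[i] != sigs[j]) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(pairs))

def hd_intra(repeated_sigs):
    """Average fractional intra-chip Hamming distance.

    repeated_sigs -- array of shape (N_chips, Z_measurements, L_bits).
    """
    dists = []
    for chip in repeated_sigs:
        z = len(chip)
        dists += [np.mean(chip[j] != chip[k]) for j in range(z) for k in range(j + 1, z)]
    return float(np.mean(dists))

# Placeholder data: 100 chips, 190-bit signatures, 10 repeated measurements each.
rng = np.random.default_rng(1)
sigs = rng.integers(0, 2, size=(100, 190))
print(hd_inter(sigs))                      # ~0.5 for ideally unique signatures
reps = np.repeat(sigs[:, None, :], 10, axis=1)
print(hd_intra(reps))                      # 0.0 for perfectly repeatable signatures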

2.7.3.2 Results

Fig. 2.16(a) illustrates the fractional Inter-Hamming distance distribution over the 1000 chips. The majority of the values lie between 0.4 and 0.6, with the mean HDinter = 0.506, signifying high uniqueness of signatures. The probability of each of the bits being 1 varies between 0.47 and 0.53 (shown in Fig. 2.16(b)), signifying the absence of any 1/0 bias.

For the comparison of the coarse measurement resolution condition (0.5-1 Ω, with 10 values drawn uniformly in that range) against the reference, the corresponding metric is illustrated in Fig. 2.16(c). Even in this scenario, most values lie within 0.05, with a mean value of 0.0242, which is acceptable for security primitives like PUFs [20], [21]. Hence P-Val leads to the generation of unique, robust chip signatures for authentication.

2.7.4 Sample Cloning and Overhead Values

The approximate number of candidate authentication pins in different IC types and the resulting cloning probability according to the simulation results are given in Table 2.6. We have considered high-end ICs such as an Intel Core-i7, a Xilinx Spartan-6 and a TMS320C620 DSP, and mid/low-range chips, namely a µC (ATmega32L) and an analog multiplexer (ADG509F). For high-end chips, with supply/GND comprising a large fraction of the pins, only 10% of the total pins are considered here for authentication AFs. As evident from the simulation results, high-end chips with P-Val are impossible to clone. For the µC (40 pins), selecting even 20 out of the 32 GPIO ports would practically reduce the probability to zero. For the low-end chip ADG509F (16 pins), all 10 I/O pins (select and input ports) would be considered for P-Val.

Table 2.6: Security & Estimated Package Area Overhead of P-Val

               No. of Authen-   Approx. prob. of   Package Area
               tication AFs     cloning 1 chip     Overhead (%)
Core-i7        135              ∼ 0                4.2 ∗ 10−4
Spartan 6-LX   118              ∼ 0                5 ∗ 10−4
TMS320C620     35               ∼ 0                1.1 ∗ 10−4
ATmega32(L)    20               ∼ 0                4.4 ∗ 10−2
ADG509F        10               ∼ 0.2              1.6 ∗ 10−2

With a resulting cloning probability of ∼ 0.2, the economic shift in the cost/benefit ratio would deter any cloning attempts.

With the area of the MIM AF and TF structures being dominated by the size of the electrode contacts (2λ X 2λ), the package area overhead is negligible (< 0.05%) for the different ICs (Table 2.6), even under the assumption that each AF-TF pair requires area beyond the existing package real estate. P-Val incurs zero design modifications and die area overhead. All AFs are programmed at maximum pin currents post-verification, typically reducing Ron to 10-30 Ω. Together with capacitances in the fF range, loading effects on the pins are minimal. The only P-Val overhead would be a minor rise in packaging cost.

2.8 Conclusion

In this chapter, we have presented two novel antifuse (AF) based active defense approaches, “C-Lock” and “P-Val”, for unified protection against both recycling and cloning of ICs, the two major forms of counterfeiting. C-Lock is a die-level technique with AFs in the I/O port logic, thereby requiring some design modifications. The hardware overhead and test effort are shown to be much less than for existing DfS techniques. The on-chip OTP AFs lead to C-Lock providing very high security against recycled/remarked chips. P-Val is a package-level approach based on the same high-level method of active defense against recycling. Here the AFs are integrated on the package between the die pad and the external pin. Hence P-Val requires no design changes and no hardware overhead on the die, resulting in its applicability to legacy ICs, which is mostly not possible with existing techniques. P-Val is also efficient for analog and mixed-signal ICs, unlike today’s digital-only methods. In P-Val, IC authentication is based on intrinsic, random variations in the program resistance of AFs and hence, like PUFs, the security against different forms of cloning is very high. Based on the security requirements of the system, one can combine the die-level AFs of C-Lock (could be just 1-2 ports) with the package-level programmed AF based authentication of P-Val to obtain the best of both worlds. Future work would involve experimental validation of these two approaches with integrated AFs in the IC design flow.

Chapter 3

Nearly Free of Cost Protection against Cloned ICs

The antifuse-based active defense techniques presented in the last chapter, namely “C-Lock” and “P-Val”, both require some degree of modification either at the die level or at the IC package level. In the current semiconductor ecosystem, strict time-to-market constraints often lead to the widespread practice of design reuse from previous IPs/ICs/systems. As a result, there is an inherent reluctance among IC design houses to incorporate any sort of modification into the constituent components during design. The reasoning is that additional design changes would require extra effort and resources from both the design and testing viewpoints, thereby increasing cost and time-to-market. Besides, functional and parametric specifications may be affected as well in certain scenarios. In this chapter, we present a complementary anti-counterfeiting technique, “PiRA” (standing for Pin Resistance based Authentication) [29], which requires no modifications at either the die or the package level. Hence, no hardware overhead is incurred either. It is based on extracting intrinsic, random variations in the resistances of IC pins (due to process variations) by simple measurements and generating a chip-specific signature for authentication. Consequently, PiRA is applicable only for protecting against cloning attacks, though of all types.


Figure 3.1: Chip-specific signature creation from the intrinsic variations of pin resistances across ICs, measured by DC input/output current variations for particular voltages at pins.

3.1 PiRA Methodology

PiRA is a simple, robust IC authentication approach, used to validate the integrity of ICs in the presence of cloning attacks. It exploits the intrinsic, uncontrolled, random variations in the pin resistances (PiR) within and across different ICs to create unique chip signatures for authentication (A), as illustrated in Fig. 3.1. Pin resistance is usually defined as the electrical resistance seen looking into the corresponding pin under operating conditions, similar to the concept utilized to measure input resistance/impedance at circuit nodes. It is calculated by measuring the input current for a particular external DC voltage input at the pin (within specifications) in a powered chip. Powering is required to set stable working states for the active electronic components in the I/O logic. In PiRA, we extend the concept of pin resistance to incorporate the current through the port protection diodes, by appropriate forward biasing in the input modes, as well as the drive current during output-mode source/sink (pin output resistance). Under normal operating input modes, the measured current is the same as the input high (IIH) and low (IIL) leakages at the logic high and low voltages, commonly measured at the digital pins to detect any defects/failures during chip testing [80], [81], [82]. It is referred to as input leakage due to the extremely high input resistance at typical digital I/O ports, arising mainly from the input digital buffers/Schmitt triggers as well as the off-state output components. For analog chips such as operational amplifiers, the corresponding measurement is of the small input bias currents at the pins. Application of input voltages just outside the supply range (within absolute ratings) would lead to current measurements solely through the forward-biased diodes, and likewise for load currents through the source/sink driver transistors in output ON modes. We measure these different currents (inversely proportional to the encountered pin resistance) and use their variations across chips to create unique signatures.

During conventional IC testing, the different leakage/bias/output currents are tested simply to check if they fall within limits or meet design specifications. We propose to utilize their chip-specific variations for IC authentication in PiRA. Sub- or super-linear resistances add to the overall signature space entropy and may be calculated from measurements at different DC current-voltage (I-V) points to generate multiple bits per pin. Impedance estimation through AC measurements may yield more information, but this work is limited to DC tests. The candidate pins for authentication include all types and functions, encompassing general-purpose input-output (GPIO), input (i/p) as well as output (o/p) pins. Intrinsic random process variations affect the on-die I/O logic circuits. PiRA is based on extracting these variations by appropriate external current measurements. These are normalized and used for the purpose of chip authentication. The presence of high entropy per pin enables application of the approach to small-scale, low-pin-count ICs of all types. Besides, through PiRA one can measure variations across individual discrete components of the I/O port logic, resulting in higher extracted entropy compared to most on-chip Physical Unclonable Function (PUF) based schemes, where the random variations are averaged over multiple elements to obtain the final measured parameter (e.g. the delay over inverters in an RO-PUF).

PiRA only incurs some additional test effort at the end of the IC design cycle as illustrated in Fig. 3.2(a). The signature generation happens during this phase under the control of the IC manufacturer. The unique legitimate fingerprints are stored in the designer’s secure database. The pin measurement and signature creation protocol is public. On receiving an IC from the supply chain, a system designer/integrator would generate its signature after appropriate measurements.

Figure 3.2: (a) Incorporation of the signature generation step in the IC design cycle; (b) seamless integration of PiRA with the current semiconductor business model for enhanced security.

If the signature matches any of the current entries in the database, then it is confirmed as a legitimate chip (Fig. 3.2(b)). A secure verification device (VD), although not a necessity for the PiRA implementation, can be passed from the designer to the system integrator for IC authentication. This provides the advantage of standardized measurements and avoids the effect of site-to-site variations on signature generation. Besides, a VD performing automated authentication may lead to easier adoption of PiRA by minimizing time-to-market.

The pin resistance variations within and across chips due to process noise are uncontrolled and virtually impossible to clone for each pin of an IC. Hence, for sufficient extracted entropy and thereby a large enough signature length (e.g. ≥ 80 bits), all chips including cloned ICs would practically possess a unique signature. As chips cloned through IP piracy, IC reverse engineering and overproduction typically follow alternate routes into the supply chain, their fingerprints are not stored in the IC manufacturer’s database. Hence, they would be easily detected by a trusted party such as the system designer, just as in the P-Val scenario in the previous chapter. Moreover, if the IC designer maintains separate databases for different grades of a chip family, a low-grade IC being sold as a higher grade would be detected as well using PiRA. The chip signatures are generated at the end of the design cycle. Hence, PiRA is resistant to any information leakage and tamper-based attacks. As compared to existing design-for-security techniques, PiRA provides two major advantages: 1) it incurs virtually zero design effort and hardware overhead at protection levels similar to other approaches; 2) it can be applied to legacy designs, which comprise a major portion of the market. Its usage extends to chips of all types, including analog/mixed-signal ICs, in which existing all-digital security primitives are difficult to implement.

Figure 3.3: Measured variation of pin input current in (a) normal operational mode (logic high i/p voltage of 5.5 V) and (b) forward-biased ESD diode (6 V i/p for Vdd = 5.5 V) in 28 PIC micro-controller chips.

3.2 Implementation of PiRA

PiRA is based on entropy extraction from inherent, random variations in the chip I/O port components to create unique signatures for authentication. The variations are captured through external pin resistance (current) measurements under different modes and varying bias conditions. The variations in the measured input pin current at a logic high voltage, as well as through the corresponding Vdd forward-biased protection diode (at an input 0.5 V greater than Vdd), of a PIC micro-controller port [3] are illustrated in Fig. 3.3. Next, we describe the implementation of PiRA in detail.

3.2.1 Sources of Entropy

A general schematic of the electrical path from the external digital IC pins to the core circuitry, through the packaging layers, contact pad and Input/Output (I/O) logic, is illustrated in Fig. 3.4(a). A detailed diagram of representative I/O logic components on the die, as well as the assembly/package constituents along the pin electrical path, is shown in Fig. 3.4(b). Package-level components, including wire bonds, solder balls, die pads, vias, etc., contribute minimally towards the pin resistances. In the I/O port logic of most representative digital IC families, the input buffer (often a Schmitt trigger), the output pull-up/down driver transistors, the protection diodes to both supplies, as well as optional pull-up/down resistor networks (implemented usually with transistors) are the major contributors to the measured pin resistances (currents) under different bias conditions, applied voltages, register settings, etc.

Figure 3.4: (a) Schematic of typical path from IC pin to die core logic; (b) Representative on-die I/O logic as well as package-level components.

For analog ICs, the contributors could include bipolar or CMOS transistors and different bias resistor networks.

Process variations during IC fabrication cause intrinsic, random differences in these I/O components within and across chips, specifically in the geometry, dopant concentration, etc. of the constituent transistors. These lead to varying electrical properties and hence resistances of the discrete I/O components (remaining within design specifications), which are captured by appropriate DC current measurements in PiRA. Independent random variations across different I/O components lead to a higher entropy of the signature space. Through PiRA, one can measure the intrinsic variations in individual discrete components such as a diode or a PMOS/NMOS driver transistor, which preserves the underlying entropy as compared to current or delay based PUFs, which average over multiple components.

3.2.2 Measurement Scheme

For PiRA pin resistance measurements during normal input operations, an external voltage in the allowable range is applied at the pin and the input leakage/bias current is measured with a high-resolution ammeter whose range extends to µA/nA [83]. For general-purpose input-output (GPIO) pins, the output path is placed in the high-impedance (Z) mode before the measurement, for example by writing the corresponding value in the data direction register. The lower the measured current, the higher the pin resistance, and vice versa. For common digital ICs (e.g. micro-controllers, FPGAs, processors), input voltages in the logic high (LH) and low (LL) ranges lead to activation of different pull-down and pull-up networks in the input path logic. In CMOS technology, the PMOS-based pull-up and NMOS-based pull-down have different nominal doping types, concentrations, geometries, etc., and hence the variations in the resistance for LH and LL are mostly independent (uncorrelated) of each other. This increases the signature space entropy. This is also reflected in the fact that both input high (IIH) and low (IIL) leakage tests are performed on all pins as part of IC parametric/defect tests. Empirical tests show that in digital ICs, intermediate input voltages (between LH and LL) lead to floating values and hence varying currents across trials. Hence, only two input voltages, in the logic high and low ranges, are considered for pin leakage tests in digital ICs. Multiple candidate pins can be selected from the same port, as the process variations affecting each I/O path within the chip are mostly random. As different pins of a port (e.g. the 8 bits of an I/O port) do not affect each other in terms of I/O operations, other pins may be left unconnected while measuring a particular pin. If output resistance and hence drive current variations are included for signature creation, the corresponding pins are placed in the output mode. The output current is measured for a particular written (through an internal register) high/low port voltage and a fixed load across different chips. In this way, the pull-up/pull-down driver transistor variations may be extracted. To further incorporate protection diode variations, input voltages greater or less than Vdd and Vss by 0.3-0.7 V (less than the absolute ratings) would be applied to forward-bias either diode in the input mode and the resulting current measured. The different measurement schemes for digital I, I/O, or O pins are shown in Fig. 3.5.

The measured leakage currents at both logic low and high input voltages for three pins are illustrated in Fig. 3.6 for three PIC micro-controller ICs [3]. Of the three pins, two are from the same port (pins 3 and 7 of port A). The measurements are performed with a high-precision semiconductor analyzer instrument [83]. According to specifications, the power pins are connected to 5.5 V and 0 V, respectively. The negative values for the 0 V input signify the reverse direction of current compared to the 5.5 V input (source/sink).

Figure 3.5: (a) Typical measurement scheme of input leakage currents; extended schemes measuring (b) output drive; (c) forward-biased diode current that can be utilized in PiRA to create signatures.

On careful observation, it is seen that for the 1st chip, at 0 V, the current for pin 3 of port A is slightly higher than that of pin 7. However, the reverse is observed at the 5.5 V input. A similar trend (reversed comparative values at the low/high input) is seen for chip 2. On the contrary, for chip 3, pin 3 of port A has higher current values for both inputs as compared to pin 7, but the magnitude by which the current is higher at 5.5 V (compared to nominal) is much greater than that at 0 V. A similar analysis may be performed with the port B pin. Both the general comparative trend and the percentage deviation from the inter-die nominal value are important for the final signature entropy. Hence, the variations considering same/different port pins and logic low/high input voltages add to the net chip entropy for authentication. For pure analog circuits such as op-amps, depending on the existing linearity/correlation between pins (e.g. often if the inverting input shows a higher-than-nominal bias current, then so does the non-inverting input), multiple intermediate input voltage points between the maximum and minimum recommended values may be analyzed for inclusion in the signature space.

3.2.3 Signature Generation

The signature generation scheme is chosen to utilize the extracted entropy and achieve high uniqueness and robustness of the legitimate chip signatures. For enhanced robustness, comparison of inter-pin (pins of the same chip) normalized variations is preferred over other schemes such as digitization of individual pin values over the entire range. This is analogous to the comparisons between ring oscillator frequencies in RO-PUF implementations [20].

Figure 3.6: Measured pin leakage currents at logic low and high input voltages for 3 different pins across 3 chips [3].

Small changes in pin currents across different iterations due to common-mode effects (temperature/input voltage change) or other sources generally do not affect the signature bits in this scheme. However, as compared to delay-based PUFs, where all the delay paths are designed for the same nominal frequency, the individual pins (even from the same port) usually have different nominal resistances, especially for input leakage current measurements. This is because pins may be multiplexed with different functions to save pin count in chips like micro-controllers, FPGAs, etc. Moreover, I/O logic is designed to meet the specification limits rather than to have the same nominal electrical values. As a result, to compare between pin values of an IC to generate a signature, a normalization scheme has to be chosen. The scheme should distribute the pin variations around the chosen nominal parameters in an unbiased manner. Another advantage of the comparison-based scheme is the rapid (quadratic, NC2 for N pins) growth of the signature space with the number of candidate pins, which is especially beneficial in small-scale ICs. This allows one to create large signatures with greater uniqueness as well as to perform bit selection for enhanced robustness.

After empirical analysis, normalization of individual pin values around the corresponding global pin mean (µ) and standard deviation (σ) is chosen. Here, global values refer to the µ and σ of the measured current distribution of the corresponding pin across all chips. The IC lot used to calculate the global values would ideally represent the spectrum of possible variations for the chip pins. The µ and σ of the distribution after normalization are 0 and 1, respectively. Due to such normalization, intrinsic variations in pin input leakages, output drive currents, forward-biased diode currents, etc. from different pins can all be incorporated into the total IC signature space, leading to utilization of the high available entropy. For example, in a digital IC where only two input voltages are selected per pin and 10 candidate pins are considered, there are 20 normalized values for each IC. For some chips, not all pins (e.g. reset, offset) would exhibit uncorrelated variation characteristics for the two voltage inputs. In these cases, one voltage point may be considered for such a pin, whereas multiple input voltages are used for non-linear analog inputs. Two values from the entire normalized set are chosen and compared with each other to produce a 1/0 signature bit. Hence, the general formula for the total number of possible signature bits L is

L = SC2, where S is the total number of normalized parameters (independent readings) per chip. Often, in digital ICs, S = M ∗ N, so L = (M ∗ N)C2, where M is the average number of independent readings per pin and N is the number of pins.

The maximum number of parameters per digital I/O pin is usually 6 (2 input, 2 output, 2 diode currents). Hence even just 3-4 such pins could provide a large signature data set. For purely analog I/O, the amount of data per pin varies with the chip type. The large signature space allows for removal of non-robust and biased bits while maintaining a unique signature of more than, for example, ∼ 80 bits. It is empirically observed that non-robust bits are caused by comparisons between two close (almost equal) normalized values. Hence, across the sample set of chips, non-robust and/or biased comparisons are removed by automated methods. Through testing, it is often seen that particular individual pins cause non-robust signature bits and should be discarded from the analysis. The chosen normalization parameters, comparison pairs and the particular order for signature creation are applied to all manufactured ICs. To account for possibly widely varying IC pin leakage currents due to different foundries, etc., the normalization parameters may be updated following different protocols, but this is not pursued here. The IC signature generation steps are enumerated below:

Signature Generation Steps

Input: [Ij], the parameter vector of the chip, for j ∈ (1, .., N); N ← no. of parameters (pins, voltages)
       [Mj] & [SDj], the mean and standard deviation for each j
       C ⊆ S; C ← selected comparison pairs for the IC
Normalization:
for all i ∈ (1, .., N)
    Ti = (Ii − Mi)/SDi
end
Comparison: co ← 0
for all i ∈ (S), for all j ∈ (S), j ≠ i, (i, j) ∈ C
    co ← co + 1
    if Ti >= Tj
        Ps[co] = 1
    else
        Ps[co] = 0
end
Output: Sig = [Ps[k]], k ∈ (1, .., L), L ← |C|
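A minimal Python sketch of these steps (the global means/standard deviations, comparison pairs and measured currents below are placeholders; in practice they come from the IC designer's characterization of the lot):

def pira_signature(readings, means, std_devs, comparison_pairs):
    """Normalize per-parameter readings and compare selected pairs.

    readings         -- measured currents for one chip, one per parameter
    means, std_devs  -- global per-parameter statistics over the chip lot
    comparison_pairs -- pre-selected (i, j) index pairs, identical for all chips
    Returns the list of signature bits (1 if T_i >= T_j, else 0).
    """
    t = [(x - m) / s for x, m, s in zip(readings, means, std_devs)]
    return [1 if t[i] >= t[j] else 0 for i, j in comparison_pairs]

# Hypothetical example: 4 parameters (2 pins x 2 input voltages), 3 selected pairs.
readings = [12.3e-9, 9.8e-9, 15.1e-9, 11.0e-9]   # amperes
means    = [11.0e-9, 10.5e-9, 14.0e-9, 11.5e-9]
stds     = [1.2e-9, 0.9e-9, 1.5e-9, 1.1e-9]
pairs    = [(0, 1), (0, 2), (2, 3)]
print(pira_signature(readings, means, stds, pairs))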

3.3 Security Analysis

In this section, we analyze the security of the proposed approach from the aspect of the probability of an adversary copying any legitimate signature. Through experimental measurements, two commonly used chips are verified for high uniqueness and robustness of signatures.

3.3.1 PiRA Security

With PiRA, along with cloning the design, an adversary needs to copy a legitimate chip signature to pass authentication. For most chips, the PiRA implementation reduces the probability of cloning signatures to virtually zero.

1) PiRA is based on intrinsic, uncontrollable variations in pin resistances within and across chips. The variations are mainly due to intra- and inter-die random process variations in the I/O logic. The entropy of the signature space is increased by choosing variations across multiple individual I/O components at different independent voltages per pin. 8-10 candidate pins with just the logic high and low input voltages can easily allow a minimum 80-bit IC signature. The resistance looking into the IC pins follows different distributions (different µ and σ) for different pins. IC design only constrains them within specified limits rather than targeting the same nominal values. Hence, the varying distributions render it virtually impossible for attackers to replicate them for each pin of an IC. The inherent randomness of the sampled values due to the manufacturing process would lead to a unique combination of leakage currents for all ICs, including malicious chips. With a maximum of 1-10 million chips (∼ 2^23) of a kind produced, a > 70-80 bit signature renders futile all attempts by an attacker to copy any legitimate IC signature while maintaining any economic benefit.

2) In the case of a small signature space (e.g. 6 pins in a voltage regulator IC), the comparison-based signature generation may suffer from a security weakness, in that an attacker does not need to copy the individual legitimate pin resistance distributions for cloning. As normalization removes all effects of the nominal parameter values, an attacker can copy an IC signature even with entirely different distributions. ICs with enough randomly varying candidate parameters would automatically reject any such attacker attempts. PiRA can incorporate an additional step in signature verification to overcome any such weaknesses for very small-scale chips. During authentication, the pin currents would be measured and, during normalization, compared with the IC designer's chosen nominal values. If any value lies outside the legitimate distributions, then the IC is rejected even if the signatures match. This additional step further strengthens the defense of PiRA for small chips with a minimal number of authentication parameters.

3.3.2 Uniqueness and Robustness of Signature

We experimentally measured pin currents and generated signatures for 28 PIC 16F722A micro-controller (µC) chips [3] and 22 LM741 op-amp ICs [84]. Although a larger sample set for both would have led to a better representation of the signature space, the empirical measurements serve only to verify the feasibility and efficiency of the proposed approach.

Figure 3.7: (a) Fractional Inter-Hamming and (b) Fractional Intra-Hamming distance (5 repetitions) for 82-bit signatures across 28 PIC µC ICs; (c) Probability of 1 of the signature bits.

For the 28-pin µC chips, 8 pins were considered with two voltage inputs (logic high of 5.5 V and low of 0 V) per pin. There are more candidate pins, but 16 (8 X 2) considerably independent IC pin parameters already lead to 120 (16C2) possible signature bits. The 8 pins include 3 pins of port A (IC pins 1, 3 and 5), 3 pins of port B (IC pins 14, 15 and 16) and 2 pins of port C (IC pins 24, 28). All µC pins are multiplexed with other functionalities (e.g. clock, ADC input, reset) and hence have different distributions. With Vdd and Vss at 5.5 V and 0 V, the leakage currents are measured in normal input mode. A high-end, state-of-the-art characterization system with sub-pA current resolution [83] is used for the measurements. In a suitably controlled environment (ambient temperature of 25◦C and minimal disturbances), each chip is placed on the socket and readings taken. The same socket, probe wires and cables are used for all chips to avoid any effect of external variations. Each measurement is repeated 8 times around the same instant and the average taken to remove random measurement noise. The measurements for each chip are also repeated on 5 different days with ∼ 5-10◦C ambient temperature differences to test for robustness at different temporal instants, using the same measuring instrument. A verification device (VD), exchanged between the IC manufacturer and the system designer (as mentioned earlier), can be used for these standardized measurements across different sites. The metric for signature uniqueness [20] is the fractional Inter-Hamming distance, with average given by:

Figure 3.8: (a) Fractional Inter-Hamming and (b) Fractional Intra-Hamming distance (5 repetitions) for 80-bit signatures across 22 OP-AMP ICs; (c) Probability of 1 of the signature bits.

N−1 N X X HDinter = (2/(N ∗ (N − 1))) ∗ HDij i=1 j=i+1

HDij is the fractional Inter-Hamming distance between chips i and j and N is the number of chips. HDinter is desired to be ∼ 0.5. Signature robustness has been quantified by the fractional Intra-Hamming distance distribution [20] with average value (HDintra)- N Z−1 Z X X X HDintra = (2/(N ∗ Z ∗ (Z − 1)))) ∗ HDIijk i=1 j=1 k=j+1

Here HDIijk is the fractional Intra-Hamming distance for chip i between the j-th and k-th measurements, with Z = 5 in our case. HDintra should ideally be around zero. Besides, a third quantity providing an estimate of the randomness of individual signature bits (and hence the presence of any bias) is the probability of 1/0 of each bit over all ICs.
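The sketch below illustrates how these three quantities can be computed from stored bit-vector signatures; it is a straightforward rendering of the definitions above, not the measurement software used in the thesis.

```python
# Uniqueness/robustness metrics for PiRA-style signatures.
# sigs[i] is the enrolled signature of chip i (list of 0/1 bits);
# reps[i][j] is the j-th repeated measurement of chip i (Z repetitions total).
from itertools import combinations

def frac_hd(a, b):
    """Fractional Hamming distance between two equal-length bit vectors."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def hd_inter(sigs):
    """Average pairwise fractional Inter-Hamming distance (ideally ~0.5)."""
    pairs = list(combinations(sigs, 2))
    return sum(frac_hd(a, b) for a, b in pairs) / len(pairs)

def hd_intra(reps):
    """Average fractional Intra-Hamming distance over repeated reads (ideally ~0)."""
    dists = [frac_hd(a, b) for chip in reps for a, b in combinations(chip, 2)]
    return sum(dists) / len(dists)

def bit_bias(sigs):
    """Probability of 1 for each signature bit across all ICs (ideally ~0.5)."""
    n = len(sigs)
    return [sum(sig[k] for sig in sigs) / n for k in range(len(sigs[0]))]
```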

After normalization, the parameter values were observed to vary by a maximum of 2-2.5σ around the mean. For the PIC 16F722A, considering all 120 signature bits, robustness was an issue, with the average HDintra equal to ∼9%.

The Inter-Hamming distribution was centered around an average of ∼0.5, but the distribution was very wide. A few bits were biased towards 1 or 0 (probability greater than 0.75 across chips). After thorough post-measurement analysis, the above-mentioned degradations could not be attributed solely to a particular pin or set of pins; rather, different comparisons between normalized values contributed to them. These non-robust and/or biased bits are removed by setting appropriate thresholds. The outliers in the Inter-Hamming distribution were analyzed for reduction as well. The final fractional Inter-Hamming distance distribution for the selected 82-bit signature is illustrated in Fig. 3.7(a), with an average of 0.516. The corresponding Intra-Hamming distribution is shown in Fig. 3.7(b), with an acceptable average of 0.044. Fig. 3.7(c) shows that none of the bits are biased, with probabilities of 1 lying between 0.4 and 0.63.
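A minimal sketch of this bit-pruning step is given below, with assumed bias and instability thresholds (the exact thresholds used for the 82-bit selection are not reproduced here).

```python
# Drop signature bits that are strongly biased across chips, or that flip across
# repeated reads of the same chip, before fixing the final signature.

def select_bits(sigs, reps, bias_lo=0.25, bias_hi=0.75, flip_max=0.1):
    n_bits = len(sigs[0])
    keep = []
    for k in range(n_bits):
        p1 = sum(sig[k] for sig in sigs) / len(sigs)            # bias across ICs
        flips = [
            sum(r[k] != chip[0][k] for r in chip[1:]) / (len(chip) - 1)
            for chip in reps                                     # instability per IC
        ]
        if bias_lo <= p1 <= bias_hi and max(flips) <= flip_max:
            keep.append(k)
    return keep   # indices of the retained (e.g. 82 of 120) signature bits
```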

For the LM741 op-amp, measurements were conducted with positive and negative supply voltages of 15 V and −15 V. Only 5 of the 8 pins are candidate pins for PiRA: the two inputs, the two offset pins and the output pin. Although the supply values allow a wide input voltage range at the pins, only 4 voltage points are considered for the 2 input pins, due to significant linearity (correlation) in pin currents at multiple voltage points for a pin. Similarly, only 2 measurement voltages are considered for the two offset pins and 3 for the output pin. These points were chosen from an analysis of the increasing percentage of non-correlated variation around the nominal (different linear slopes for different ICs) across chips; inclusion of more voltage inputs does not really increase the net entropy. Additionally, a 1 kΩ resistor is connected in series with both offset pins for all chips to reduce the high current range to measurable values. Overall, 15 nominal parameters were calculated per IC, for a total of 105 signature bits. After an analysis similar to that for the µC, bits were removed for increased uniqueness/robustness. The corresponding op-amp metrics (Fig. 3.8(a), (b) and (c)) for the 80-bit signature are within acceptable limits for authentication.

Apart from normal-mode pin resistance measurements, signatures of the 28 µC chips have also been created from the forward-biased protection diodes at the pin inputs. Both Vdd and Vss diodes have been considered for analysis in 7 pins, namely 3 of port A and 2 each of ports B and C. More pins can easily be considered to extract higher entropy. The forward-biased diode resistance has been measured at an input voltage 0.5 V outside the supply rails (Fig. 3.9(a)), still within the absolute maximum ratings.

Figure 3.9: (a) Forward-biased diode voltage selection; (b) Fractional Inter-Hamming and (c) Fractional Intra-Hamming distance (5 repetitions) for 91 bit signatures considering both Vdd and Vss diodes in 7 I/O ports across 28 PIC µC ICs; (d) Probability of 1 of each signature bit.

The currents are in the range of 50 µA. For these 14 normalized parameters (2 diodes per IC pin) per chip, 91-bit signatures have been created without any particularly biased or non-robust comparisons. As in the previous scenario, measurements were repeated on 5 separate days for robustness analysis. The metrics for overall signature quality, considering only the IC port diodes, are illustrated in Fig. 3.9(b), (c) and (d) and are well suited for IC authentication.

Removal of 4-5 comparisons can bring HDintra down to 4.2%. Although not performed here, load current measurements for the driver source/sink transistors could similarly be taken with a fixed load resistor in the port output modes to extract further entropy. Hence the efficiency of PiRA has been verified experimentally for sample ICs.

3.3.3 Discussion

Apart from the micro-controller and the op-amp, we analyzed the feasibility of PiRA for a discrete SRAM memory chip as well [85]. Due to time constraints, we analyzed the signatures of 25 SRAM ICs with leakage measurements on 10 input pins, namely 8 address pins and 2 control pins (output enable and chip enable), at logic high and low voltages. 135-bit signatures were generated for each IC after removal of some biased bits and analyzed for uniqueness. The results are shown in Fig. 3.10(a) and (b). Multiple iterations of the SRAM measurements would be conducted in the future to test for robustness.

Figure 3.10: (a) Fractional Inter-Hamming distance for 135 bit signatures in 25 SRAM ICs (good uniqueness); (b) Probability of 1 of each signature bit.

The calculated uniqueness metric values are a major step towards verifying the implementation of PiRA in SRAM memory chips.

A better-controlled, automated, precise measurement setup, as available in industry, would lead to better quality signatures. Moreover, the ICs considered in this work are manufactured at much older process nodes; the efficiency of PiRA would increase with the higher process variations of ICs fabricated at recent technology nodes. For PiRA, authentication extends up to the system designer level, which is sufficient for protection against most cloning attacks. Future research would analyze signature robustness with aging and hence the potential for in-field authentication.

3.4 Conclusion

We have presented a simple, novel IC authentication scheme, PiRA, to protect the integrity of chips against all forms of cloning attacks. PiRA exploits the random intrinsic variations of pin resistances within and across ICs to create chip-specific signatures for authentication. For example, pin resistances in the normal input mode can be obtained by measuring the input leakage/bias currents at pins, analogous to the input-high and input-low leakage based defect tests in ICs. Signatures are generated in PiRA by incorporating multiple I/O logic components per pin to increase the overall entropy. Compared to existing design-for-security techniques, PiRA has the major advantage of incurring virtually zero design effort and hardware overhead. Furthermore, it can be applied to ICs of all types, including analog/mixed-signal ICs, and is suitable for legacy design chips as well.

This chapter discusses the possible sources of variation of pin resistance, the measurement scheme for PiRA, as well as the signature generation for authentication. Security has been analyzed against all possible cloning attack modes. Finally, experimental measurements for sets of commonly used digital and analog ICs demonstrate the effectiveness of the scheme. Future research would include extension of PiRA to in-field authentication and to different chip types.

Chapter 4

A Flexible Architecture for Systematic Implementation of SoC Security Policies

As discussed in the introduction chapter, in System-on-Chips (SoCs), system-level security policies protect the security assets (e.g. keys, fuse configurations, private user data) sprinkled around multiple constituent IP blocks. These policies are of different types, such as access control, information flow, time-of-check-time-of-use (TOCTOU) and liveness, and offer system-level protection to the SoC against different threats. However, in the current complex SoC design process involving different design and integration teams, these policies are typically implemented in an ad-hoc, non-systematic manner. This creates significant problems in verifying SoC adherence to security requirements during post-Si validation and on-field tests, as well as in patching or upgrading the policies when bugs are found or security requirements change. Besides, the principle of design reuse, which is followed extensively in SoC design, is also hampered. In this chapter, we present a flexible, scalable security architecture framework that provides SoC designers with a systematic, methodical and disciplined approach to analyze, verify and upgrade SoC security policies.


4.1 Architecture

Fig. 4.1 illustrates our proposed architecture. It includes two main components: (1) a centralized security policy controller IP (referred to as E-IIPS or extended IIPS in the rest of the chapter) that executes the SoC security policies, and (2) security wrappers around individual IPs to facilitate communication with E-IIPS. To facilitate configurability across different products and use cases, E-IIPS is defined as a microcontrolled soft IP. SoC designers can program security policies as firmware modules that are stored in a secure ROM or flash memory within the E-IIPS boundary. Secure policy update is supported through an authenticated firmware update mechanism. E-IIPS communicates with other IPs via the corresponding security wrappers as follows. For enforcing different security policies, E-IIPS may need different local IP-specific collaterals. For instance, suppose a policy prohibits access to internal registers of IP A by IP B when A is in the middle of a specific security-critical computation. To enforce the policy, E-IIPS must “know” when B attempts to access the local registers of A as well as the security state of the computation being performed by A. The security wrappers provide a standardized way for E-IIPS to obtain such collateral while abstracting the details of the internal implementation of individual IPs. In particular, the wrappers implement a standard frame or packet based protocol to communicate with E-IIPS during execution. Based on the policies implemented, E-IIPS can configure the wrapper of an IP at boot time to provide internal event information under specific conditions (e.g. security status of internal computation, read requests to specific IPs, etc.); the security wrappers monitor for the configured conditions and provide the requested notification to E-IIPS. E-IIPS verifies the event in the context of the current security state of the system and asserts appropriate security controls if policy violations are detected. The IP development teams are responsible for augmenting individual IPs with the security wrapper, by extracting security-critical information (see below).

Design Choices. A key design choice for E-IIPS is its centralized firmware-upgradable architecture, i.e., it is implemented as a single re-usable IP block in the SoC. This choice of central control is governed by the need to provide a single place for understanding, exploration, upgrade, and validation of system-level security policies. Indeed, the current complexity in security policy analysis and modification stems precisely from the fact that the policies are “sprinkled” across the different IPs in the SoC in an ad-hoc fashion. Our centralized architecture is specifically intended to alleviate this complexity.

Figure 4.1: Schematic of the proposed architecture framework with the major components, for systematic implementation of SoC security policies.

On the other hand, this choice implies that the interaction of IPs with E-IIPS is a bottleneck for communication bandwidth and hence system performance. We address this issue by making the security wrappers “smart” or “intelligent” so that only security-relevant information is communicated to E-IIPS, possibly under the latter's directive. Finally, the choice of a microcontrolled rather than hardware implementation stems from the need to update security policies on-field, either due to customer requirements for their products or in response to a known exploit or design bug. On the other hand, this makes E-IIPS itself vulnerable to attacks through rogue firmware updates. In Section 4.1.4, we discuss authentication mechanisms to address this issue. Next, we describe the two major components of the proposed security architecture in more detail.

Table 4.1: Representative set of security critical events according to IP type

Type of IP | Example IPs | Type of Events | Associated Metadata
Memory IP | Memory/cache controller, DMA engine | read/write request to specific address, DMA access, request scheduling policy change | page size, burst size (DMA), ECC type, low power clk freq.
Processor Core | CPU, GPU, ethernet controller, video controller | start/end of critical system threads, interrupt/exception, firmware upgrade req. | stored operation logs, flag/register settings, process duration
Communication Core | Bus Controller, Bridge, NoC Router, USB control., PCI Express IP | data transfer request, source/destination address, peripheral transfer req., idle modes | transfer packet size, serial frequency, router scheduling policy, bus clk rate
Hard logic Custom IP | AES, SHA engines, FFT, DWT block | firmware integrity check start/end, secure key access, FFT req. by video card | duration of operation, mode, local clk domain

4.1.1 IP Security Wrappers

Security wrappers extract security-critical events from the operating states of the underlying IP for communication with E-IIPS. We note that the naive approach of simply extracting all data, control, and status signals from IPs to E-IIPS would incur prohibitive communication and routing overhead. To address this problem, we develop security wrappers for IPs that incorporate “smartness” to detect security-critical events of interest in the IP, while providing both a standard communication interface between the IP and E-IIPS and a standard template-based design that can be easily integrated on top of the IP implementation.

The question is how the wrapper can identify the necessary security-critical events while still providing a standardized, template-based design. The key observation is that IPs in a SoC can be divided into a small collection of broad categories. Table 4.1 shows some of these broad IP categories together with some of the typical security-critical information relevant to each. For instance, “Memory IPs” here include all IPs controlling access to the different memory hierarchies, e.g., memory controllers, Direct Memory Access (DMA) modules, cache controllers, flash control logic, etc., and processor cores include general purpose CPUs and GPUs, as well as cores controlled by microcode/firmware, e.g., ethernet and UART controllers, audio and video cards. The security-critical events of interest also standardize substantially within each IP category. For instance, events in a Memory IP include read and write requests to specific address ranges by particular IPs (including DMA), change of request scheduling policy, as well as functional/standby modes. On the other hand, events in a processor core include the start and end of critical system or application processes and threads, system memory read/write requests, computations generating exceptions, interrupts by system controllers (often utilized by adversaries to jump to critical system addresses), etc.

Figure 4.2: Architecture of a generic IP security wrapper.

Finally, any event is associated with metadata that provides sufficient information about the event; e.g., DMA access details can be analyzed from the page size, DMA burst size, and address range. This metadata is communicated by the security wrapper to E-IIPS, often at the request of E-IIPS (see below). Of course, in addition to standardized events, there are some IP-specific requirements for each IP, depending on the IP's role in the SoC architecture. Our framework allows the SoC integrator to request additional security-critical events from a specific IP, which can then be mapped into its wrapper.

4.1.2 Security Wrapper Implementation

Our security wrapper design is frame-based, with a standard format for security-critical event definitions, which can be instantiated into corresponding events for specific IPs. A typical IP security wrapper architecture is shown in Fig. 4.2. Fig. 4.3(a) illustrates an event frame with its particular fields. The wrapper typically consists of activity monitor logic (to identify whether the IP is active),

an event type detector, and a buffer to store the event metadata. An event ID is locally generated inside the wrapper and sent as part of the frame to enable E-IIPS to correlate multiple frames from the same event. Some events also require corresponding local clock domains. The wrapper also incorporates registers that are configured by E-IIPS at boot time to specify the particular events which need notification. The frame-based interface provides a standardized communication mechanism with E-IIPS. In general, E-IIPS provides two types of control signals to a security wrapper: (1) disable, to block IP functions (at varying granularity depending on the policy) in response to a suspected security compromise or policy violation, and (2) request, to request more data or controls regarding an event. A specific implementation involving the disable and request signal bits inside the wrapper is shown in Fig. 4.2 (right). Besides, an example communication protocol between an IP security wrapper and the E-IIPS module, with the wrapper sending detected security-critical events and E-IIPS analyzing and requesting more event metadata, is illustrated in Fig. 4.3(b). We exploit the existing boundary scan interface of the IP to transmit data in parallel shift/access mode under high bandwidth demands for certain functional security validation.
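The sketch below models the frame-based wrapper-to-controller protocol in software for illustration; the field names, widths and chunking are assumptions, not the exact frame format of Fig. 4.3(a).

```python
# Software model of an event frame and a "smart" security wrapper that only
# notifies E-IIPS about events it has been configured to watch.
from dataclasses import dataclass
from typing import List

@dataclass
class EventFrame:
    ip_id: int          # which IP's wrapper produced the frame
    event_id: int       # locally generated ID, lets E-IIPS correlate frames
    event_code: int     # configured security-critical event type
    seq: int            # frame index when metadata spans multiple frames
    metadata: bytes     # e.g. address range, burst size, mode bits

class SecurityWrapper:
    def __init__(self, ip_id):
        self.ip_id = ip_id
        self.watch = set()      # event codes configured by E-IIPS at boot
        self._next_event = 0

    def configure(self, event_codes):
        """Boot-time configuration from E-IIPS: which events need notification."""
        self.watch = set(event_codes)

    def notify(self, event_code, metadata=b"") -> List[EventFrame]:
        """Emit frames for a detected event if E-IIPS asked to be notified."""
        if event_code not in self.watch:
            return []
        self._next_event += 1
        chunks = [metadata[i:i + 4] for i in range(0, len(metadata), 4)] or [b""]
        return [EventFrame(self.ip_id, self._next_event, event_code, s, c)
                for s, c in enumerate(chunks)]
```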

4.1.3 Security Policy Controller

E-IIPS acts as the “security brain” of the SoC, providing a programmable interface for different security policies. Its key functionality is to analyze the events communicated by the security wrappers, determine the security state of the system, and communicate IP-specific request and disable signals. Fig. 4.4 shows the top-level architecture of E-IIPS. The architecture includes two major components: (1) a Security Buffer that provides access to the IP-specific event logs from the security wrappers, and (2) the Policy Enforcer that forms the analysis component or execution engine of E-IIPS.

Security Buffer. The security buffer interfaces with the Policy Enforcer through a buffer controller that defines how the buffer frames are analyzed by the Policy Engine. We implement the buffer storage through a standard static segmentation scheme, permitting variable-length segments based on the volume of metadata. The event logs can be read by the controller through ports on the buffer (controlled by the buffer controller). The IP-wrapper-to-buffer control logic maintains synchronization and coherence of the security wrapper and Control Engine with data frames from IPs with different read and write speeds, segment sizes, and event frequencies.

Figure 4.3: (a) Fields of a typical event frame; (b) An example communication protocol between wrapper and security engine.


Policy Enforcer. We implement the Policy Enforcer as a microcontrolled engine, which can be realized on a standard processor core. Functionally, the enforcer is a microcontrolled state machine that asserts or deasserts the required disable or request signals for different IPs, based on the current security state of the system. In addition to the microcontrolled engine, it also includes a standard instruction memory (for storing the microcode or firmware implementing the policies) and a small amount of data memory for intermediate computation. The nature of the computation involved in security policy enforcement typically requires some custom modifications of existing commercial cores. We summarize a few of the illustrative necessary modifications below; a sketch of the enforcer's event loop follows the list.

• Direct Register Writes. Modifications to the register file update logic are made to allow direct register updates from IP security wrappers in the case of security-critical events, thus avoiding extra cycles for instruction and operand fetch from memory. This is necessary for time-sensitive policies, including TOCTOU and some access control policies.

Figure 4.4: Representative centralized E-IIPS top level architecture.


• Fused Datapaths and Secure Mode. Often the event metadata width of an IP permits fusing the datapath (e.g. two 32-bit registers into one 64-bit register), which facilitates concurrent analysis of multiple frames.

• Branch Prediction Buffers. Branch prediction buffer design is a critical requirement for achieving low power and minimal performance overhead, since security policy implementations use conditional branches and jumps with a much higher frequency than traditional application and system programs.
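The following is a minimal, firmware-level sketch of the Policy Enforcer loop described above, not the thesis RTL; the buffer, wrapper and policy interfaces are assumed names used only to make the control flow concrete.

```python
# Policy Enforcer event loop: read frames from the Security Buffer, update the
# system security state, and assert request/disable controls per active policy.

def policy_enforcer(security_buffer, wrappers, policies, state):
    while True:
        frame = security_buffer.pop()        # next logged event frame, if any
        if frame is None:
            continue
        state.record(frame)                  # update the current security state
        for policy in policies:              # firmware modules loaded at boot
            action = policy.evaluate(frame, state)
            if action is None:
                continue
            wrapper = wrappers[action.ip_id]
            if action.kind == "disable":     # block (part of) the IP's function
                wrapper.disable(action.granularity)
            elif action.kind == "request":   # ask the wrapper for more metadata
                wrapper.request(action.what)
```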

Finally, the E-IIPS module includes configuration registers to permit the SoC designer to activate only a subset of the implemented policies for a specific application or use case. The register is configured at design time through a combination of fuses/antifuses and multiplexers. This also aids in extending E-IIPS as a plug-and-play standalone IP with a generic architecture. Next, we provide some details on secure policy upgrades to ensure the trustworthiness of E-IIPS itself.

4.1.4 Secure Authenticated Policy Upgrades

In this work, the threat model permits attacks on the SoC through malicious firmware or software that subvert the protection of system assets. Since E-IIPS itself

is microcontrolled, it is also vulnerable to such attacks. Unfortunately, it is not possible to protect E-IIPS by merely disabling updates to its firmware: since a key reason for a microcontrolled design is to permit policy upgrades on-field, it is critical to permit such upgrades through firmware updates. To address this problem, we implement an authentication mechanism based on on-chip challenge-response keys. Keys are generated at power-on using a standard technique based on Physically Unclonable Functions (PUFs), which exploits intrinsic process variations in silicon to ensure robustness. Since keys are generated at power-on, we avoid attacks on on-chip key storage through software, firmware or the system interface. Finally, we avoid TOCTOU attacks during firmware updates by requiring single-threaded firmware updates, i.e. a firmware update cannot be interrupted by an overlapping update request.
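A minimal sketch of this authenticated, single-threaded update flow is given below. The PUF-derived key and the HMAC construction are illustrative stand-ins for whatever on-chip challenge-response scheme is actually used.

```python
# Authenticated policy (firmware) update with serialized, non-interruptible updates.
import hmac, hashlib, threading

class PolicyUpdater:
    def __init__(self, puf_key: bytes):
        self._key = puf_key                    # regenerated from the PUF at power-on
        self._lock = threading.Lock()          # serializes updates (anti-TOCTOU)
        self.firmware = b""

    def apply_update(self, image: bytes, tag: bytes) -> bool:
        expected = hmac.new(self._key, image, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return False                       # reject rogue firmware
        # Non-blocking acquire: an overlapping request cannot interrupt an
        # update already in progress; it is simply refused.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            self.firmware = image              # commit the verified policy image
            return True
        finally:
            self._lock.release()
```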

4.1.5 Policy Implementation in SoC Integration

Security policy implementation in the SoC through our framework requires collaboration between the IP developer and the SoC system integrator.

IP Provider: The IP provider identifies the key standard as well as custom-specified security-critical events in the IP (along with the required metadata), based on the IP type (Table 4.1); this content is incorporated into frames via the frame generation logic, along with registers for configuration, to build the security wrapper.

SoC Integrator: The SoC integration team implements the security policies chosen for the application through the Policy Enforcer firmware of E-IIPS, and configures the E-IIPS module according to the features required for the usage scenario. Furthermore, since E-IIPS is centralized and IPs are delivered with security wrappers integrated, the SoC integration validation team is responsible for verification of the policies against system-level use cases through simulation, hardware acceleration, FPGA prototyping, or post-silicon validation.

4.1.6 Alleviation of Issues

The proposed architecture alleviates, to a major extent, the previously mentioned problems arising from the current ad-hoc, non-systematic implementation of SoC security policies involving multiple parties and spanning the entire design cycle. The security policy controller, evaluating the system security state and implementing the required policies, serves as the central IP-based control engine to explore, analyze, validate as well as upgrade the system security requirements. This narrows down the typically intensive tasks of SoC security policy implementation from control logic sprinkled around the design to mostly a single infrastructure IP block, i.e. the SPC. During design of the security wrappers, the standard template-based framework acting as the baseline (depending on the IP category) serves to significantly reduce the design effort and complexity of extracting security-critical events from IPs and implementing the necessary controls inside the SoC. System-level use cases can be studied more efficiently and methodically to detect potential security violations, as well as any associated degradation of system function, power or performance, during post-Si validation. For debugging and patching these errors, the corresponding IP security wrappers and the firmware-based policies in the SPC can be modified or refined to extract additional or different events where possible, or to make alternative decisions for asserting or disabling controls. Hence, with readily available knowledge of what and where to focus for security validation and upgrades, afforded by the proposed disciplined policy implementation approach, many of the current typical issues are alleviated.

Table 4.2: Policies for Usage Case Analysis

Security Policy | Description | Associated IPs
Secure Crypto Verification | Verify functionality of AES engine at power-on | E-IIPS, memory controller, crypto-core, test-access control
Access Control | Prevent DMA access to system level addresses | E-IIPS, memory controller, DMA engine

4.2 Use Case Scenarios

In this section, we present two use case scenarios of how generic security policies in the domains of secure crypto and access control are mapped onto our proposed architecture. The policy type, its function and the IPs involved in our implementation are shown in Table 4.2.

4.2.1 Use Case I: Secure Crypto

Policy: Crypto-processor data paths including the encryption engine need to be functionally validated at power-on before any execution.

The above policy ensures the trustworthiness of the system at boot time. The validation stipulated by the policy includes checks for correct operation of the encryption (e.g., AES) and hash (e.g., SHA-1) engines as well as ensuring the stochasticity (randomness) of the bits output by the True Random Number Generator (TRNG) inside a typical cryptographic core. Often, an undetected functional failure of the crypto data or control path results in a compromise of system security, mostly in terms of availability of resources (leading to denial-of-service attacks, etc.). In this study, we provide a sample implementation of how a SoC designer maps the AES engine verification to the proposed platform. The flow of operations/messages between E-IIPS and the corresponding IP security wrappers through the standard interfaces is illustrated in Fig. 4.5.

In our implementation, E-IIPS waits for the system boot process to finish (including power-on self tests, firmware integrity check, and system software load to memory) before proceeding with the AES verification. This ensures the full trustworthiness of other system components during the check. The “boot mode finish” is indicated by a particular value of a representative system mode register, mapped to memory. In response, E-IIPS disables the external system interfaces of peripheral cores, JTAG and other test/debug ports, to eliminate possible attack surfaces. It also blocks the transition to system execution mode. E-IIPS configures a set of known plaintext and key inputs in a buffer in the crypto-processor wrapper at boot time, potentially using the serial/parallel boundary scan interface for high-bandwidth communication (not shown in Fig. 4.5). The appropriate crypto test access port settings are asserted by the JTAG/TAM controller, in response to the E-IIPS configuration request during boot. The desired cipher outputs are stored inside the E-IIPS boundary. E-IIPS sends a particular plaintext/key buffer index to the crypto wrapper for execution. The computed cipher text is communicated through frames (or boundary scan) to E-IIPS for verification. If it matches the desired output, the proactive holds are lifted and the system proceeds to the normal execution phase. The exact sequence, in terms of event detection and message communication, is summarized below:

Figure 4.5: Flow/message diagram representation of implementation of Use Case I.

• The Memory Control Wrapper (MCW) detects the event “system boot finish” and transfers a frame to E-IIPS.

• E-IIPS reads the event from the security buffer and asserts the disable interface to peripheral cores and test/debug ports. It blocks system execution by a write to a memory-mapped register through the MCW request interface.

• On receiving confirmation through frames that these actions have been performed by the IP wrappers, E-IIPS sends the plaintext and key buffer index through the crypto processor wrapper (CPW) request interface.

• The cipher text is computed and 5 frames are generated inside the CPW, the first one containing the event “Encryption Complete” and metadata indicating that the next 4 frames (of 32 bits each) constitute the 128-bit cipher output.

• The CPW sends frame 1. E-IIPS sets the appropriate CPW request signals for the next 4 frames.

• E-IIPS verifies the computed cipher text.

This also shows a use case where the P1500 boundary scan infrastructure can be suitably used for high-bandwidth data/control communication in our framework, thereby reducing routing complexity and overhead. A sketch of this verification flow is given below.
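The sketch below strings the steps above together, assuming hypothetical wrapper objects (mcw, cpw, peripheral wrappers) that expose the interfaces described; the golden plaintext/key/ciphertext vectors are whatever the SoC designer chooses at design time.

```python
# Use Case I: power-on AES functional check orchestrated by E-IIPS.

def secure_crypto_check(eiips, mcw, cpw, peripheral_wrappers, golden):
    # golden: list of (buffer_index, expected_128bit_ciphertext) pairs held in E-IIPS.
    mcw.wait_for_event("system_boot_finish")          # boot completed

    for w in peripheral_wrappers:                     # shrink the attack surface
        w.disable("external_interface")
    mcw.write_register("exec_mode_block", 1)          # hold off system execution

    for index, expected in golden:                    # run the known vectors
        cpw.request("encrypt", buffer_index=index)
        frames = cpw.collect_frames(count=5)          # event frame + 4 data frames
        cipher = b"".join(f.metadata for f in frames[1:])
        if cipher != expected:
            eiips.flag_violation("AES functional check failed")
            return False

    mcw.write_register("exec_mode_block", 0)          # lift the proactive holds
    for w in peripheral_wrappers:
        w.enable("external_interface")
    return True
```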

4.2.2 Use Case II: Access Control

Policy: Direct Memory Access (DMA) is prohibited to system-specific (ring 0/1 in a 4-ring system) addresses of different IPs in the SoC memory space.

Most current SoCs support DMA to system memory through one or more dedicated DMA controllers/engines to reduce the workload on the processor cores. DMA through I/O peripherals (in memory-mapped I/O schemes) is often utilized by attackers to snoop on assets and modify system-level code. Policies like the one above protect against these security threats.

As illustrated in Fig. 4.6, E-IIPS configures the IP-specific system-level address ranges at boot time in the memory controller through its security wrapper (MCW). When an access from the DMA controller is detected by the MCW, the requested address is checked against the system-specific ranges inside the wrapper logic. If there is no violation, the system memory bus is granted for DMA. In case of a system address overlap, the request is blocked, the violation is logged as an event along with the corresponding DMA channel number (device), and it is communicated to E-IIPS through frames by the MCW. E-IIPS maintains a buffer of recent DMA violations for the different I/O channels. If the number exceeds a threshold within a set time (configured by the SoC designer) for a particular channel, memory access requests from that device are disabled. The specific event/message flows and interface signals, as shown in Fig. 4.6, are summarized below, followed by a short sketch of the violation-counting policy:

Figure 4.6: Flow diagram representation of implementation of Use Case II

• E-IIPS configures the system address ranges in the memory controller registers through the MCW request interface at boot.

• When a DMA request is detected, the MCW checks the corresponding address. If a violation is detected, the request is blocked. The event is sent as a frame along with the channel number (representing the specific device in this case) to E-IIPS.

• E-IIPS updates the count of DMA violations for the specific I/O channel and compares it with the threshold. If the number exceeds the limit within the set time window, memory accesses from the corresponding device are disabled through its disable interface.
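The following is a minimal sketch of the E-IIPS side of this policy, with an assumed per-channel threshold and sliding time window; the names and units are illustrative.

```python
# Use Case II: count DMA violations per I/O channel inside a sliding window and
# disable the offending device once the designer-configured threshold is crossed.
from collections import defaultdict, deque

class DmaViolationPolicy:
    def __init__(self, threshold=3, window=1_000_000):   # window in clock cycles
        self.threshold = threshold
        self.window = window
        self.history = defaultdict(deque)                 # channel -> violation times

    def on_violation_frame(self, channel, timestamp, mcw):
        log = self.history[channel]
        log.append(timestamp)
        # Drop violations that fall outside the sliding window.
        while log and timestamp - log[0] > self.window:
            log.popleft()
        if len(log) >= self.threshold:
            # Assert the disable control for this device's memory accesses.
            mcw.disable_channel(channel)
```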

Figure 4.7: A representative high-level architecture of our functional toy SoC model in Verilog RTL.

4.3 Overhead Analysis

Given the dearth of appropriate open-source SoC design models against which to perform experiments and validate proposed architecture-level modifications, we have developed a simple SoC design to assist in the current research. Our toy model has IPs of different functionalities interacting with each other to perform specific system-level functions. The IPs are obtained from OpenCores [86] as Verilog RTL models. An early version of this model with the major architecture-level components is illustrated in Fig. 4.7. In this model, all memory blocks are functionally implemented as register files for ease of synthesis. IP address spaces are mapped to the memory; hence these IPs can access the memory directly, analogous to DMA. At present, all IP-IP communications are point-to-point. The model has been functionally validated in ModelSim.

All IPs are wrapped with security abstraction layers. Events detected by these wrappers include a major representative subset of those listed in Table 4.1 according to the IP type, e.g., read/write requests to memory, duration of processes, specific conditional jumps in the µP core, transfer start/end of the SPI module, or encryption start/finish in the AES core. The 32-bit frame formation logic and metadata buffers are present. IPs contain configuration registers which are configured by E-IIPS at boot time (master core system reset asserted). E-IIPS is implemented with a single DLX 32-bit RISC core (5-stage pipeline). The firmware (security policies) is stored in a local 4 KB instruction memory. 2-bit disable and request signals are output to each IP from the controller.

Table 4.3: Area & Power Overhead of IP Security Wrapper (at 32nm)

IP | Orig. Area (µm2) | Area Ovhd. (%) | Dyn. Pw. Ovhd. (%) | Leak. Pw. Ovhd. (%)
AES (128 bit) | 101620 | 2.1 | − | −
SPI Controller | 3947 | 9.2 | 11 | 9.7
DLX µP core (w. 4KB I/D mem.) | 290496 | 6.8 | − | 5.4
FFT (128 point) | 1810 | 10.2 | − | 16.1
(− : negligible)

Table 4.4: Area & Power of Central Security Controller(at 32 nm)

Die Area (µm2) | Dynamic Power (mW) | Leakage Power (mW)
2831860 | 13.67 | 34.13

To obtain representative overhead values, the IPs were synthesized with a 32nm predictive technology library. The calculated area, dynamic (at a 1 GHz clock) and leakage power overheads are provided in Table 4.3. The overheads are mostly minimal; in some cases, the power reduces after re-synthesizing with the wrappers due to internal (heuristic) optimizations and is hence reported as negligible. E-IIPS was synthesized at 32nm and the resulting area and power (at a 1 GHz clock) values are provided in Table 4.4. Finally, the area overhead of the control engine was estimated with respect to our toy model and commercial SoCs from Apple and Intel at 32nm (∼1 GHz clock) and is provided in Table 4.5. The 32 KB system memory (the major area component) was estimated with established SRAM models. As our toy SoC is rather small, with only a handful of IPs, the overhead is comparatively higher; for realistic SoCs, the overhead of the controller is minimal. For a generic SoC design, we can therefore conclude that the hardware overheads due to the proposed architecture would be minimal. Routing complexity and transfer power/energy with NoC fabrics of different types would be evaluated as part of future work.

Table 4.5: Die Area Overhead of Central Controller(at 32 nm)

SoC | Die Area (µm2) | Overhead of controller (%)
Our Toy Model | 13.1×10^6 | 21.7
Apple A5 (APL2498) | 69.6×10^6 | 4.06
Intel Atom Z2520 | ∼40×10^6 | 7.1

4.4 Conclusion

In this chapter, we have presented a novel architectural framework for implementing diverse security policies in a system-on-chip. It enables systematic, methodical implementation of security policies and hence can greatly facilitate the process of secure SoC design involving various security assets. It also enables effective validation, debug and update/patch of security policies during post-silicon validation, which often imposes major roadblocks in the SoC production cycle. The architecture, consisting of a centralized security policy controller, referred to as E-IIPS, and a generic security wrapper per IP block, is easily scalable to a large number of IPs and flexible enough to accommodate IP blocks of varying function and structural properties. The E-IIPS module, when combined with the capability of implementing conventional hardware security functions and primitives, can serve as an even more powerful security infrastructure IP. We have analyzed the architecture-level details, verified the functional correctness of the architecture through extensive simulations, and evaluated the hardware overhead, which is expected to be minimal for realistic SoCs.

Chapter 5

Exploiting Design-for-Debug in SoC Security Policy Architecture

In the last chapter, we observed that systematic, methodical implementation of System-on-Chip (SoC) security policies typically involves smart wrappers extracting local security-critical events of interest from Intellectual Property (IP) blocks, together with a central, flexible control engine that communicates with the wrappers to analyze the events for policy adherence. We also noted that, for certain complex, security-critical constituent IP modules, considerable hardware (H/W) overhead may be incurred in designing these wrappers according to the policy requirements. Although wrappers typically follow a standard template-based design according to the type of IP, some custom modifications are required to adapt them to the particular SoC usage scenario and the role of the IP in the SoC architecture. Along with resource overhead, this may also increase design complexity and the associated time-to-market. In this chapter, we address this problem by exploiting the extensive design-for-debug (DfD) instrumentation already available on-chip. Modern SoC designs contain a significant amount of DfD features to enable observability and control of the design execution during post-silicon debug and validation, and to provide means to “patch” the design in response to errors or vulnerabilities found on-field. Hence, the DfD modules typically already detect the information necessary for most of the events required for security policies. By re-purposing the debug infrastructure for security, the DfD trace macrocells local to the IPs can be configured to extract these security-critical events during normal SoC execution. In addition to reducing the overall hardware overhead, the proposed approach also adds flexibility to the security architecture itself, e.g. permitting use of on-field DfD instrumentation, survivability and control hooks to patch the security policy implementation in response to bugs and attacks found during post-silicon validation or to changing security requirements on-field. In this chapter, we demonstrate how to design a scalable interface between the security and debug architectures that provides the benefits of flexibility to security policy implementation without interfering with existing debug and survivability use cases, and at minimal additional cost in hardware resources, energy and design complexity. Below we provide a brief background on the typical debug infrastructures implemented in modern SoCs.

5.1 On-Chip Debug Infrastructure

The supported functionality, integration density and complexity of modern-day System-on-Chips (SoCs) have increased manifold over the years. At the same time, the number of different, heterogeneous H/W-S/W based IP blocks inside a SoC has grown. Together with extremely aggressive time-to-market schedules in the SoC ecosystem involving various stakeholders, these factors have made post-Si validation and debug of SoC designs the most complex, tedious and difficult part of the design process. Design-for-Debug (DfD) refers to on-chip hardware for facilitating post-silicon validation [87]. A key requirement for post-silicon validation is observability and controllability of internal signals during silicon execution. DfD in modern SoC designs includes facilities to trace critical hardware signals, dump the contents of registers and memory arrays, patch microcode and firmware, create user-defined triggers and interrupts, etc. As an estimate, on-chip debug infrastructure typically comprises ∼20-30% of the total silicon die area of a modern-day SoC [87]. Furthermore, the DfD architecture is becoming standardized to enable third-party EDA vendors to create software APIs for accessing and controlling the hardware instrumentation through the debug access ports, for system-level debug use cases. As an example, the ARM Coresight™ architecture [53] provides facilities for tracing, synchronization and time-stamping of hardware and software events, trigger logic, and facilities for standardized DfD access and trace transport.

A representative block diagram schematic of a debug infrastructure in a SoC, along the lines of ARM Coresight™, is illustrated in Fig. 5.1. As shown, it typically comprises the following major components: 1) Access Port; 2) Local Trace Sources with trigger logic; 3) Trace Sinks/Hub; 4) Debug Communication Fabric.

Figure 5.1: Simplified SoC DfD Architecture Based on Coresight™.

The access port provides access to an external debugger or an on-chip memory debugger to configure the individual local debug component triggers for H/W-S/W tracing. The external connection can be through a standard JTAG or Serial Wire connection. Internally, access to the trace source configuration registers (programmer visible) is memory mapped, for example through a debug configuration bus controlled at the access port. JTAG based scan, if present, is also controlled from the corresponding access port. The local debug logic, sprinkled around the SoC and instrumenting individual IP activity/events/traces, comprises the trace sources, e.g. the Embedded Trace Macrocell (ETM) and System Trace Macrocell (STM) in Coresight. These comprise hardware or micro-controlled logic such as address and data comparators, performance counters, event sequencers, logical event combinations, etc., which can be used to detect a particular configured event and thereby trigger collection of H/W-S/W traces around the event (analogous to breakpoints and watchpoints in simulation). A simple example for a µP core is detection of the program counter traversing specific system memory address ranges and tracing all executions around it. In this scenario, the event and the corresponding start and end address ranges would be configured by the debugger accordingly. The trigger for tracing might be local to the IP block or global, arising from other IP debug sources (cross-trigger). Finally, the traces are transported over the trace bus according to standard protocols (e.g. MIPI-STP [88]) and communicated externally through the trace port or stored in trace buffers. The trace sink logic controls the trace collection with source IP encoding and time stamping for synchronization.

To interface with the local debug instrumentation, IP blocks are augmented with standard debug and test wrappers (by the IP providers), which extract the critical nets, IP traces, registers and other important design features, while abstracting the internal implementation details of the IP. Besides providing observability into the design, on-chip debug often also provides controllability of the IP functionality to enable patches and upgrades on-field and thereby support survivability. Typically, the full range of on-chip debug features is not required after the design passes full post-Si validation. Nevertheless, they are mostly kept in the final production-ready designs, both due to probable changes in critical path delays, routing complexity, power profiles, etc. that may occur after any design modification, and to ensure support for debug of critical issues potentially arising on-field. The different debug components are usually power gated during normal on-field operation.

5.2 Methodology

The key insight of this chapter is that we can implement a security policy control framework without incurring significant additional architecture and design overhead, by exploiting infrastructure already available on-chip. In particular, modern SoC designs contain a significant amount of Design-for-Debug (DfD) features to enable observability and control of the design execution during post-silicon debug and validation, and provide means to “patch” the design in response to errors or vulnerabilities found on-field. On the other hand, usage of this instrumentation post-production, i.e., for on-field debug and error mitigation, is sporadic and rare. Consequently, computing systems have a significant amount of mature hardware infrastructure for control and observability of internal events that is typically available and unused during normal system usage.

The main contribution of this chapter is a flexible security architecture that exploits on-chip DfD features for implementing SoC security policies in a systematic fashion. This refines the architecture framework of the previous chapter with an interface between the security and on-chip debug infrastructures, making it more lightweight and flexible. According to the requirements of the security policy architecture, IP security wrappers must be smart, i.e. they detect a standard set of security-critical events of interest depending on the IP type as well as some custom ones based on the particular SoC usage scenario.

Table 5.1: Typical Security Critical Events detected by DfD Trace Cell in Processor Core

Trigger Event | Ex. Security Context
Program counter at specific address, page, address range | Prevent malicious programs trying to gain elevated privileges
System mode traps for specific interrupts, I/O, file handling, return from interrupt | Verify limited special register access by other IPs in kernel mode
High conditional branch or jump instruction frequency | Highly branched code often signs of malware
Invalid instruction, frequent division-by-0 exceptions | Un-trusted program source; apply strict access control
Read/Write request to specific data memory page(s) | Protect confidentiality, integrity of security asset
# of clock cycles between 2 events = threshold | Satisfy resource availability & avoid deadlock
More than one inter-communicating thread | Verify TOCTOU policy in authenticated firmware load in µC

The events necessary depend on the policy as well as the IP involved; e.g., for a CPU we typically need to detect attempts at privilege escalation and monitor the control flow of programs to detect the probable presence of malware in different stacks, as well as prevent fine-grained timing-based masquerading attacks. Developing IP security wrappers thus requires custom hardware logic for the identification of these events. For a large, complex, security-critical IP, the resource overhead of the wrappers may be considerable. However, the on-chip DfD modules already detect the information necessary for most of these events. For example, Table 5.1 illustrates a few representative security-critical events that can be detected through a Coresight™ macrocell for a processor core. Here, the macrocell is assumed to implement standard instruction and data value/address/range comparators, condition code/status flag match checks, performance counters, event sequencers, logical event combinations, etc. Correspondingly, local DfD for NoC fabric routers can detect the bulk of the critical events required for addressing threats such as malicious packet redirection, IP masquerade, etc. We show how to build efficient, low-overhead security wrappers by re-purposing the debug infrastructure, while remaining transparent to debug and validation usages. We illustrate some of the design trade-offs involved between complexity, transparency needs, and energy efficiency.

Figure 5.2: Additional hardware resources for interfacing DfD with the IP security wrapper.

5.3 DfD-Based Security Architecture

Our architecture is built on top of the E-IIPS framework developed in the previous chapter [89]. In particular, we exploit DfD to implement smart security wrappers for IPs that communicate with the centralized policy controller (SPC or E-IIPS), which implements the security policies. Furthermore, the SPC can also program DfD to implement security controls using the de-feature and control logic available for on-field patching and upgrades.

5.3.1 Debug-Aware IP Security Wrapper

We architect smart security wrappers for each IP by exploiting DfD to identify relevant security-critical events; on detection of such an event, DfD communicates the information to the IP security wrapper, which in turn communicates it to the centralized SPC. Direct communication between DfD and E-IIPS would require the trace communication fabric to be active, along with an appropriate interface between the trace sinks and the E-IIPS block. As heavyweight trace fabrics are usually power gated during normal system operation, such direct DfD-SPC communication may lead to significant power overhead as well as additional hardware requirements. Hence the DfD-detected events are sent to E-IIPS through the corresponding IP security wrapper. To enable this functionality without hampering debug usages of the DfD, we need local (IP-level) modification of the DfD logic and appropriate adaptation of the security wrapper. Fig. 5.2 illustrates a block-diagram-level schematic of the additional hardware requirements. In particular, noninterference with debug usage requires transmission of the security data to the SPC via a separate port (instead of re-purposing the debug trace port), which requires additional trigger logic.

The events of interest for the IP are programmed by the SPC via the configuration register interface of the corresponding DfD module. Since a DfD module can be configured to detect a number of security events (related or disparate) at runtime, the SPC must correctly identify the corresponding event from the communication frame sent by the security wrapper. We standardize this interface across all local DfD/security-wrapper pairs by tagging the event information with the corresponding configuration register address. We note that this standardization comes at the cost of additional register overhead in IPs where only one or a few events are detected via the debug logic. Trace packet generation controls can be disabled during SPC access (and similarly, the security-debug interface can be disabled during system debug) to save leakage power when the debug (respectively, security) architecture is not in use. Besides security-critical event triggers and observability of the associated information, the local DfD control hooks can also be re-purposed (by appropriate SPC configuration) to enforce security controls in the IP via the existing debug wrapper, during both the design and on-field patch/upgrade phases. A sketch of the event-tagging scheme is given below.
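The following sketch models the tagging convention in software for illustration; the register addresses, the write_cfg call and the frame fields are assumptions, not the actual DfD programming interface.

```python
# The SPC keeps a table of the DfD configuration registers it programmed; a
# wrapper frame tagged with that register address is resolved to its event.

class SpcEventTable:
    def __init__(self):
        # cfg register address -> (value programmed, meaning of the event)
        self.programmed = {}

    def program(self, dfd, addr, value, meaning):
        dfd.write_cfg(addr, value)            # via the debug configuration bus
        self.programmed[addr] = (value, meaning)

    def identify(self, frame):
        """Resolve a wrapper frame (tagged with a cfg register address) to an event."""
        entry = self.programmed.get(frame["cfg_addr"])
        if entry is None:
            return "unknown-event"            # not something the SPC configured
        _, meaning = entry
        return meaning
```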

5.3.2 SPC-Debug Infrastructure Interface

The Security Policy Controller (SPC) must be able to configure the individual local (to the IPs) on-chip debug logic to detect relevant security-critical events and assert appropriate controls. Fig. 5.3 illustrates the communication between the SPC and the debug interface, along with the additional modifications required for this interface.

Figure 5.3: Interfacing the SPC with on-chip debug.

As during system debug, the existing configuration bus (address and data) is used by the SPC for trace cell programming. To enable SPC access to the configuration bus, small enhancements are necessary in the debug access port (DAP): an SPC access link is added and the control logic for DfD configuration source selection (represented here as a simple multiplexer) is potentially scaled up. This incurs minimal hardware overhead in comparison to the typical debug infrastructure resources found in SoCs.

As there are usually enough configuration registers and associated logic in the local DfD components to monitor all possible security-critical events, the SPC can configure the trace cells with the appropriate values at boot; the configuration fabric can therefore be turned off most of the time to save leakage power. In some rare scenarios, the SPC cannot configure DfD for detection of all the necessary events at boot; for these cases, the SPC interfaces with the power management module to turn on the Debug Access Port (DAP) and configuration bus at runtime. Apart from incorporating the instructions in the SPC firmware memory to load the different configuration register values at boot (and potentially at run time), the programmed register addresses (or other identifiers) and associated data values are stored at run time in the SPC data memory (shown as SPC modifications in Fig. 5.3). These are used to uniquely identify DfD-detected events from the frames sent through the corresponding IP security wrappers during system execution. As an example, if an event frame contains the configuration register address associated with the DfD-detected event, the SPC can match it against the stored database and uniquely identify the event.


5.3.3 Design Methodology

A SoC design flow involves the integration of a collection of (often pre-designed) IPs through a composition of NoC fabrics. An architectural modification involving communication among different IPs typically disrupts the SoC integration flow. We now outline the changes to SoC integration necessary to adapt to our proposed DfD-security architecture.

IP Designer: The IP provider needs to map the required security-critical events to the DfD instrumentation for the IP. The respective configuration register values are derived from the debug programming model. Finally, the security wrapper is augmented with custom logic for events not detected by DfD, and the standardized event frames and interface for communicating with the SPC are created, along with the wrapper-to-DfD interface.

SoC Integrator: The SoC integrator augments the event detection logic of the local DfD instrumentation in IPs with appropriate triggers to the DfD/security-wrapper interface for transmission of event occurrence information, modifies the debug access port with the required hardware resources to incorporate SPC access to debug, and adds the necessary security and debug access control requirements to ensure debug transparency in the presence of security requirements. For the latter use case, i.e. ensuring security during debug and validation, where the DfD may not be re-purposed to detect all security-critical events, the SoC integrator may proceed with more proactive, stricter security controls for the constituent IPs during SoC operation. All the necessary configuration register addresses/values are stored in the additional data memory/buffer in the SPC to uniquely identify DfD-detected security-critical events. The SPC is also augmented with the firmware instructions to configure these debug registers, mostly during boot.

5.4 Use Case Analysis

5.4.1 An Illustrative Policy Implementation

To illustrate the use of our framework, we consider the implementation of the following illustrative policy:

I/O Noninterference: When the CPU is executing in high-security mode, I/O devices cannot access protected data memory.

The policy, albeit hypothetical, is illustrative of the typical security requirements involved in protecting assets from malicious I/O device drivers. Fig. 5.4 illustrates the flow of events involved in the implementation of this policy by the SPC through the DfD/security-wrapper of the associated processor. Here, the DfD configuration is through a debug access port, the CPU has an associated Embedded Trace Macrocell as the local DfD instrumentation, and I/O device requests are assumed to be based on Direct Memory Access (DMA). The key steps are the following; a short sketch of the SPC-side logic follows the list.

1. During boot, the SPC configures the required DfD instruments. This includes “program counter within secure code range” and “write access to protected data memory” event triggers in the ETM.

2. Along with the DfD, the SPC configures the IP security wrappers for the subset of events to detect, the frame protocol to follow, scan chain access, etc. The DMA engines are also configured by the boot firmware for device-channel and channel-to-memory mapping.

3. When a secure program (assumed to be in the protected code range) is loaded, the ETM detects the event and triggers the security wrapper, which communicates with the SPC. The SPC updates the security state.

4. A DMA interrupt is detected by the corresponding security wrapper and transmitted to the SPC. Any write request from the high-privilege driver to the protected memory is detected by the ETM and transmitted to the SPC via the wrapper. The SPC identifies the policy violation in the context of the current security state and enforces the necessary controls.
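The sketch below captures the SPC-side state machine for this policy; the event names and the spc.disable/spc.log_violation hooks are illustrative assumptions layered on the framework described earlier.

```python
# I/O Noninterference: block I/O (DMA) writes to protected memory while the CPU
# is executing in high-security mode, based on ETM-detected events.

class IoNoninterferencePolicy:
    def __init__(self):
        self.secure_mode = False        # tracks ETM "PC in secure range" events

    def on_event(self, event, spc):
        if event.name == "secure_code_entered":
            self.secure_mode = True
        elif event.name == "secure_code_exited":
            self.secure_mode = False
        elif event.name == "protected_mem_write" and event.source == "dma":
            if self.secure_mode:
                # Policy violation: block the offending I/O channel.
                spc.disable("dma_engine", channel=event.channel)
                spc.log_violation(event)
```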

Figure 5.4: Use case scenario of security policy implementation exploiting the local DfD instrumentation.

5.4.2 On-Field Policy Implementation/Patch

Figure 5.5: Block diagram schematic of the SoC model with on-chip debug infrastructure.

Given changing security requirements, e.g., adoption of the system in different market segments or the need to address bugs or attacks detected on-field, new policies may need to be implemented or existing ones upgraded or patched. These may require new events to be detected (beyond what was considered in the design phase), extraction of more event information, and/or control of the IP functionality in response to a policy. The interface of the security architecture with the on-chip debug infrastructure allows the possibility of on-field system reconfiguration and upgrade, something virtually impossible to perform with security wrappers customized for specific security policies. Achieving this requires selection of the local DfD instrumentation at the corresponding IPs to identify whether the relevant events can be detected; if so, the corresponding register address and value are added to SPC memory to be sent through the DAP at boot/execution time for configuration. With a standardized DfD interface to the IP security and debug wrappers, the corresponding events can be uniquely detected and control signals asserted in the IP if applicable.
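As a rough illustration of such an on-field patch, the sketch below assumes that an authenticated update simply appends new (DfD register address, value) pairs to the configuration table held in SPC memory, so that a previously unmonitored event can be detected after the next boot. The descriptor layout, authentication routine and size limits are all hypothetical.

```c
/* Illustrative sketch (assumptions noted inline) of an on-field policy patch:
 * a signed update adds new DfD configuration entries, so that a previously
 * unmonitored event can be detected without a silicon respin. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    uint32_t dfd_reg_addr;   /* register address from the debug programming model */
    uint32_t value;          /* new trigger/comparator value                       */
} dfd_cfg_entry_t;

typedef struct {
    uint32_t        num_entries;
    dfd_cfg_entry_t entries[8];   /* assumed fixed-size patch payload        */
    uint8_t         signature[64];/* verified against an on-chip key         */
} policy_patch_t;

/* Hypothetical authentication routine backed by on-chip keys. */
extern bool spc_authenticate(const void *msg, uint32_t len, const uint8_t sig[64]);

#define MAX_CFG_ENTRIES 64
static dfd_cfg_entry_t cfg_table[MAX_CFG_ENTRIES];  /* sent through the DAP at boot */
static uint32_t        cfg_count;

bool spc_apply_policy_patch(const policy_patch_t *p)
{
    if (!spc_authenticate(p, offsetof(policy_patch_t, signature), p->signature))
        return false;                                 /* reject unauthenticated patches */
    if (p->num_entries > 8 || cfg_count + p->num_entries > MAX_CFG_ENTRIES)
        return false;
    memcpy(&cfg_table[cfg_count], p->entries,
           p->num_entries * sizeof(dfd_cfg_entry_t)); /* entries take effect at next boot */
    cfg_count += p->num_entries;
    return true;
}
```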

5.5 Experimental Results

As mentioned in the previous chapter, due to the lack of standard open-source models for studying SoC architecture, we have been developing our own SoC design model. Although simpler than an industry design, our model is substantial and can be used for implementing realistic security requirements. Fig. 5.5 shows the model updated from the version of the last chapter (it now includes a representative debug infrastructure with an interface to the security framework). It includes a DLX microprocessor core (DLX) with code memory, a 128b FFT engine (FFT), a 128b AES crypto core (AES), and an SPI controller. The IPs are augmented with security wrappers according to a standard security-critical event set [89]. The SPC incorporates the DLX microprocessor core as the execution engine, with policies stored in its instruction memory.

Table 5.2: Example DfD Instrumentation Features by IP Type in SoC Model

DLX TM
  Ex. DfD inputs: program counter (PC), instruction opcode, data RD/WR address (DA), special register values, condition codes
  Ex. trigger events: PC in desired range/page/address, jump, branch taken/not-taken, particular exception or interrupt, DA in specific page
  Ex. trace content: past/future 'N' instruction addresses/values, status register values, next branch instruction

AES TM
  Ex. DfD inputs: plaintext (PT), key, cipher output (CT), mode, status, intermediate round key
  Ex. trigger events: encrypt/decrypt start/stop, specific round reached, key = desired value
  Ex. trace content: current 16B PT, key, all future round keys, CT

SPI TM
  Ex. DfD inputs: parameters, status, input packet, cycle counter
  Ex. trigger events: start/stop of operation, configuration register = desired value, acknowledgement error
  Ex. trace content: source IP configuration register, past 'N' input packets

Memory Controller TM
  Ex. DfD inputs: address, data, RD/WR, burst size, word/byte granularity, ECC
  Ex. trigger events: address in specific range/bank, DMA request, change in row buffer
  Ex. trace content: in/out data word/byte, future 'N' data addresses

Table 5.3: Area (µm^2) and Power (µW) of DAP (SoC area ∼1.42×10^6 µm^2; SoC power > 30 mW)

DAP Area (Orig.): 380.2    New DAP Area: 527.67
DAP Power (Orig.): 12.63   New DAP Power: 19.82

Table 5.4: Area (µm^2) and Power (µW) Overhead of DfD Trace Macrocells in SoC

DfD TM    Die Area (Orig.)   Area Ovrhd. (%)   Power (Orig.)   Power Ovrhd. (%)
DLX TM    15617              6.07              512             6.7
AES TM    5918               8.5               165             10.9
FFT TM    2070               18.8              60.6            20.2
SPI TM    2054.6             17.08             57.75           15.8
Mem. TM   4623               7.9               163.3           1.65

We implemented a representative debug infrastructure based on a simplified version of ARM Coresight features. It is functionally validated using ModelSim [90] for typical use cases. Necessary interfaces/logic, as described in Section 5.3, are added to the model to support DfD reuse for security policies. The DAP controls memory-mapped accesses to the DfD instrumentation via the configuration bus. It also contains logic to manage simultaneous debug and security requirements. Local DfD modules, similar to Coresight ETM/ITM, are added for the functional IPs, with their features listed in Table 5.2. Each has 16 32-bit configuration registers (a 64B address space) and supports interfaces with the corresponding IP security wrapper. On security event detection, the configuration register based unique identifier (10 bits) is sent to the wrapper to be communicated to the SPC.

Table 5.3 summarizes the area and power overhead of the debug access port (DAP) required to incorporate SPC access (which entails modification of the DAP). The area estimation is obtained from synthesis at a 32nm predictive technology node. Note that, with respect to the base DAP design, the relative area and power increase appears high because of the simple DAP logic in the model; however, the additional system overhead induced by the modification is negligible, since the DAP contribution to the overall SoC die area and power is minimal. The area and power overheads of the DfD Trace Macrocells (TM) with respect to the original TMs (without the security wrapper interface) are enumerated in Table 5.4. The overheads are typically within 10%, but can be higher for some small IPs (e.g., FFT, SPI).

Table 5.5: Area (µm^2) Savings of IP Security Wrapper

Wrapper (corres. IP)   Orig. Area   New Area   Area Savings (%)
DLX µP                 3437         2326       32.32
SPI cntrl.             1055         842        20.2
AES crypto             1661         702        57.7

Table 5.5 measures the decrease in wrapper hardware area overhead obtained by re-purposing DfD to support the security wrappers. The measurement is done by comparing the current implementation with the earlier reference implementation, in which the security wrapper is responsible for detecting all necessary security-critical events with no dependence on DfD. We note that the savings can be substantial, ranging from 20% to close to 60%. This is because a comprehensive DfD framework typically captures a majority of the security-critical events, since they are also likely to be relevant for functional validation and debug.

Finally, Table 5.6 measures the power overhead. This experiment is interesting since DfD reuse has two opposing effects: power consumption may increase since trace macrocells remain active to collect security-critical events even when debug is not active; on the other hand, the decreased hardware overhead of the wrappers contributes to reduced power consumption. For DLX, the two directions essentially cancel out (0.551 mW of TM power against 0.52 mW of wrapper savings), while for SPI and AES there is a net power overhead. We note that the overhead is minimal with respect to the overall power consumption of the entire SoC.

We end this section with an observation on the interpretation of results. We note that the numbers provided are based on the security and DfD infrastructures in our SoC model; while reflective of and inspired by industrial systems and policies, our implementations are much simpler. Nevertheless, we believe that the overhead measurements substantially carry over to realistic usage scenarios. Perhaps more importantly, our experiments provide guidance on the parameters to analyze when considering re-purposing DfD for security implementation vis-a-vis standalone security wrappers.

Table 5.6: Power (mW) Analysis in SoC on Implementation of Debug Reuse

IP           IP Power Consumption   Wrapper Savings   Corres. TM Power
DLX µP       6.54                   0.52              0.551
SPI cntrl.   0.321                  0.024             0.062
AES crypto   5.53                   0.03              0.173

5.6 Related Work

Early research on security policies focused primarily on software systems [39]. More recently, with the increasing prominence of SoC designs, there has been significant interest in SoC security. There has been research on exploiting DfD for protection against software (S/W) attacks; e.g., Backer et al. [91] analyze the use of an enhanced DfD infrastructure to confirm adherence of S/W execution to a trusted model, and Lee et al. [92] study low-bandwidth communication of external hardware with the processor via the core debug interface to monitor information flow. Methods have also been proposed for securing the SoC during debug/test through appropriate changes in the DfD/test infrastructure [93, 94]. In contrast, this work presents a flexible and lightweight architectural framework for generic security policy implementation that re-purposes DfD for security-critical event extraction. Hence it is complementary to existing work in the domain.

5.7 Hardware Patch in SoCs

With the ever-rising functional capabilities and associated complexity of a modern-day SoC, the number of different types of the aforementioned security policies governing access to on-chip security-critical assets is also increasing. With better standardized heterogeneous integration methodologies and optimized on-chip power management techniques, along with still-shrinking transistor sizes, this trend of enhanced SoC resources and design complexity is expected to continue at least in the near future. Policy implementation typically involves subtle interactions between the H/W logic, S/W and/or F/W of the underlying IP blocks and other SoC components. Many policies also get refined or modified along the design cycle by various stakeholders. Besides, compared to the significant fraction of such policies implemented at the operating system or system S/W level in typical processor-based computing platforms, the majority of SoC-level security policies involve direct interaction with the hardware logic of the constituent IP blocks.

As mentioned before, a key challenge arises in the upgrade or patch of these policies in response to design bugs found during post-Si validation, the need to satisfy system-level performance/power profiles, and/or changing security requirements in the field. The latter could arise due to vulnerabilities detected on-field or adoption of the final SoC-based system or product in different global market segments. Hence, in SoCs, a key requirement during the upgrade is that of a patch in the underlying hardware logic rather than just a S/W patch/upgrade. The H/W patch may involve the addition of new security policies or the modification of existing system-level policies in the SoC. These could indirectly require extraction of more security-critical events from various IP blocks at a finer time-space granularity and/or setting of additional proactive controls beyond those considered at design time. On the other hand, it could also mean configuring the system to detect fewer events and/or apply a subset of the controls originally designed in, so as to relax the security constraints.

The proposed centralized architecture framework (incorporating the interface to the debug instrumentation), which provides a systematic, methodical approach for security policy implementation in SoCs, is flexible and adaptable enough to be applicable to this hardware patch requirement. The DfD trace macrocell local to the IP block of interest can be configured accordingly by the centralized security policy controller (SPC) to extract additional internal security-critical events (beyond what was considered during design of the security wrappers) and/or set or disable appropriate controls governing its functionality, depending on the system state. As mentioned in this chapter, the debug infrastructure in a modern-day SoC consists of a significant amount of resources for adequate observability/controllability, thereby ensuring full coverage during post-Si validation and/or on-field. Hence the DfD blocks are capable of detecting a majority or all of the security-critical events, as well as controlling part of the IP functionality, required for policy implementation. The necessary configuration register values can be found from the debug infrastructure programming model and stored in the SPC to be loaded at boot time. Without the DfD interface, one would not be able to implement such modified security policies (requiring extra hardware event logic/controls) without an additional fabrication cycle. Presently, in the majority of these cases where a respin is not a feasible option, the course of action is often to implement pessimistic (safety-first), proactive security policies which may hamper performance/power, or vice versa, depending on what is deemed critical. Within the existing hardware resources of the security wrapper of an IP, the event detection and trigger logic, metadata extraction, control logic modification and frame generation parameters may be configured by the SPC (at boot) to adapt to changing (stricter or more relaxed) security requirements. This provision of the framework aids in H/W-level system security patching as well. Finally, the centralized SPC module, which at a high level is basically a state machine outputting different security controls after analysis of the system state, can be easily upgraded to implement new or modified policies utilizing these altered/additional extracted events.

As discussed in this and the previous chapter, a typical approach would be a microcontroller-based implementation of the SPC, where the security policies are stored as firmware in a non-volatile instruction memory. They would be upgraded based on secure authentication via on-chip keys. An alternative approach would be to use a Field-Programmable Gate Array (FPGA) [95, 96] to implement the SPC module. An FPGA, which consists of reconfigurable logic and an interconnect fabric, would also meet the SPC requirement of upgradeability. The modified configuration bitstream would be uploaded to the internal FPGA non-volatile memory during an upgrade of the security policies. Besides, for application scenarios like this centralized SPC module, which would not typically require frequent program/reprogram iterations, an FPGA might be better suited, especially with regard to the implementation of time-sensitive policies such as time-of-check-time-of-use (TOCTOU) and liveness policies, due to the typically higher speed of operation of an FPGA compared to a microcontroller. Moreover, as the underlying constituent hardware logic and its interconnections are changed during reconfiguration of an FPGA, it is more resistant to design reverse-engineering based attacks, which could otherwise be used by an adversary to gain knowledge of the policies (proprietary to the SoC design house). This change in the constituent H/W logic of an FPGA may also aid in hardware-level patching during an upgrade of security policies in a SoC.

5.8 Conclusion

In this chapter we have developed a SoC security architecture that exploits on-chip DfD to implement security policies. It provides the advantage of flexibility and on-field update of security requirements, while being transparent to debug use cases. This flexibility of hardware patching helps in scenarios such as resolving bugs/vulnerabilities found on-field or changing security requirements. Our experiments suggest that the approach can provide significant hardware area savings in the IP security wrappers with no substantial energy overhead.

Chapter 6

Security Assurance in SoC in the Presence of Untrusted IP Blocks

6.1 Problem of Untrustworthy IPs

The increasing complexity of SoC design and validation, coupled with strict time-to-market demands, has typically led SoC designers to utilize pre-qualified 3rd party IP blocks to increase design productivity [97], [98], [99], [100], as illustrated in the generic flow diagram in Fig. 6.1. These IPs could be of different types, such as processor cores, graphics cores, memory subsystems and corresponding controllers, device controllers, as well as communication fabric components. Test and debug frameworks are also integrated as infrastructure IPs. These IPs constitute one of the largest vulnerabilities in the present SoC design cycle. In particular, a modern SoC design typically comprises several hundred IPs, most of which are procured by the SoC integration house from third-party vendors. With these third-party vendors located in different parts of the world with varying control over rules and regulations, these IPs could be potentially untrustworthy [101], [102], [103], [104], [105]. This includes the possibility of a malicious vendor, or even a rogue employee in one of the vendor businesses, inserting stealthy malicious logic or covert backdoor channels into the design [100], [106], [101]. Commonly referred to as hardware or firmware Trojans, these can make the system fail during critical system operation or leak sensitive information from the system to an unauthorized agent.

Figure 6.1: Typical Representative SoC Front End (Until Fabrication Ready) Design Flow

Like the malicious insertions possible in an IC at an untrustworthy foundry [107], these IP-level Trojans could be implemented so that they function or perform a malicious activity outside the design specification boundary (in addition to meeting functional and parametric specifications) and/or utilize rare system or internal IP events to trigger the rogue operation within specifications, making them extremely difficult to detect using generic functional and structural testing [97], [98]. Moreover, H/W Trojans may also be activated by S/W-based triggers, or their payload controlled by S/W, which gives an attacker the flexibility to dynamically change the untrustworthy Trojan behavior [103], [108].

Some examples of representative hardware or even firmware (often supplied by the IP designer) level IP "Trojans" in this context, in terms of their effect, include: a processor core conditionally sending security-critical register values to I/O memory or a debug port, either via an unused instruction or as a malicious addition to an existing instruction's function; a memory controller generating an additional bank read/write request for a load/store to a specific address range; or even a crypto-processor sending on-chip keys out to the memory/system interface in a concurrent shadow mode during encryption. Similarly, as an example of such a threat in the communication fabric, a router in a network-on-chip may conditionally (e.g., depending on source and destination addresses and packet content) send a data packet both to the destination address and to a chosen address, creating a path for potential leakage. At the system level, a rogue action by an untrustworthy IP core may affect, influence or trick other SoC components, directly or indirectly, into leaking security-critical information or functioning in a way that causes system malfunctions. Apart from an intent of malice, due to a lack of strict design/verification rules and regulations, or cost limitations in certain licensing scenarios, unintentional vulnerabilities may also percolate in with the 3rd party IP block. The numerous instances of bugs found in designs in the field, some of which could be security-critical, attest to the high probability of such inadvertent loopholes [101] escaping test. Critical scenarios that have been traced back to such loopholes include easy access into a secure-grade FPGA design through an unprotected test port [109], and intermediate round-key values in encryption being reflected indirectly on the system bus. Overall, whether a vulnerability is malicious or unintentional, and whatever the attack outcome under a particular business model, it can potentially be exploited by adversaries of different capabilities to cause erroneous system functions or leak security-critical data. This could be extremely serious in mission-critical applications, e.g., defense and automotive, as well as for financial institutions, which could be targeted by adversaries of the highest capability in terms of available resources, technical know-how and financial backing. We note that, as compared to previous studies analyzing Trojan effects in particular standalone ASICs, IP cores, processors, etc., and methods for verifying these designs for trustworthiness [110], [107], our focus in this chapter is on the analysis of system- or platform-level security vulnerabilities in SoC designs created by the integration of these untrusted IPs.

Figure 6.2: a) Example IP level Trojans in a representative state machine Verilog RTL (soft IP); b) Logic level representation of a sample Trojan Model.

6.2 Background and Related Work

Malicious logic in an IP design is difficult to identify by standard functional validation [97, 98]. In particular, Trojans are typically designed to be exercised by rare events under very specific execution corner-cases that are difficult to excite in a functional validation environment. We illustrate examples of Trojans inside a simple, representative finite state machine (FSM) in Verilog RTL in Fig. 6.2(a). State machine based controllers are components of practically all IP cores, such as processors, memory controllers, bus control modules, etc. Here, the addition of a new, potentially stealthy state, as well as modifications to an existing state of the variable "X", taking advantage of some example rare conditions, are highlighted. For a processor, this might represent usage of reserved instruction set opcodes for rogue functions, or perhaps additional operations for an existing instruction that can be used to indirectly leak data. For a memory controller, such a Trojan might lead to modification of memory request scheduling policies and thus potentially starvation of resources for particular IPs. Fig. 6.2(b) is a lower-level (logic/circuit level) representation of a typical Trojan model in a design, highlighting the notion of "triggers" and "payloads". Researchers have looked at developing testing methods aimed at static trust verification of IPs [106], [111], [112], [100]. Detecting Trojans in these 3rd party IP blocks is extremely challenging, as there is typically no golden (Trojan-free) RTL for the IP, rather only the functional specifications. As a result, the number of possible ways to express a Trojan in the circuit is often unbounded and grows exponentially with design size. Although techniques like those in [111], [112], [113], based on detection of probabilistically suspicious nets, nodes, regions or unused circuitry and generation of optimized targeted test sets, have been shown to be effective in particular instances, they suffer from significant false negatives/positives depending on the chosen test set, threshold parameters, design type, etc., and hence do not provide complete IP trust assurance. Even with assumed combinational and sequential Trojan models, the Trojan coverage for relatively small (compared to today's designs) ISCAS benchmarks is on average ∼80-85% with suitably chosen parameters [47]. The efficiency of these techniques, with respect to test time as well as Trojan coverage, reduces with the increasing size of modern designs. Along similar lines, although formal verification of designs for direct/indirect effects of potentially untrustworthy sources on security assets is effective for small designs [106], [114], it is simply not scalable to most industry-grade IP sizes used today. In modern-day SoCs composed of hundreds of IPs of different sizes and types, the net untrustworthiness after static trust verification may involve a cumulative effect of the trust issues left uncovered (by tests) in the individual IPs. Hence, along with these static methods, execution-time dynamic checks are required that serve as the last line of defense, i.e., the operations of IPs with varying degrees of trust, and their communications with other SoC components, should be closely monitored at run time for potential malicious or undependable behavior to ensure secure SoC execution. Here, security also includes reliability of operation of the SoC, as Trojans may affect inter-IP control signals (which may not formally be considered security assets), which in turn can cause system failures.

A few run-time monitoring techniques have also been proposed for detecting H/W Trojans, but most have been studied and analyzed in the context of processor (µP) cores [103], [102]. They primarily focus on standalone IP validation and are difficult to adapt or scale to Trojans affecting system-level behavior and influencing other IPs in the SoC design. Furthermore, these approaches do not address online mitigation of detected threats in a platform. We believe that, in order to reduce the significant resource overhead incurred by the continuous monitoring support suggested in some solutions [103], an untrustworthy IP action should only be verified if it propagates to and attempts to modify the system state, or affects other system components and potentially security-critical assets along the flow of operation. This is in accordance with the well-accepted security principle of isolating malicious behavior locally within a rogue IP core or module, similar to the concept of enclaves and containers in S/W security [115]. Furthermore, scrambling of inputs to untrusted units according to a coverage test set may require interaction with random number generators and encryption engines at the input and output interfaces of the IP [101]. Apart from the H/W overhead, this might cause significant performance degradation of the system as well. Hence, the existing dynamic techniques are limited in their applicability at the SoC level.

In previous chapters, a flexible and configurable architecture, "E-IIPS", was proposed to implement system-level security policies to protect against attacks via rogue software stacks as well as threats at the SoC-to-system external interface. The architecture includes a programmable central security policy controller (SPC) that keeps track of the system security state and enforces the restrictions imposed by the policies, together with smart security wrappers for individual IP blocks that detect security-critical events from IP operations. However, the architecture did not account for malicious IPs (rogue hardware or firmware coming from the IP vendor). In this chapter, we exploit and extend the centralized policy implementation architecture in a systematic, disciplined manner to provide system-level security assurance in the presence of untrusted IPs. In particular, the proposed framework implements fine-grained IP-Trust aware security policies for run-time protection and mitigation of system-level vulnerabilities in a SoC in the presence of malicious logic, both in the core IP functionality and in the corresponding security wrapper implementations. Below, we describe with examples what we refer to as system-level Trojans in the context of a typical modern-day SoC.

Table 6.1: Current Trends in Trojan Research and Scope of this Work

(1) Existing research on untrusted IP: Analyze feasibility and effect of Trojans in ASICs and processor cores. Scope of the proposed work: Analyze effect of IP-level Trojans at the SoC or system level.
(2) Existing research on untrusted IP: Explore static IP trust verification techniques for Trojan detection. Scope of the proposed work: Run-time detection of potentially suspicious IP activity affecting the system.
(3) Existing research on untrusted IP: Run-time methods that do not address error correction or recovery. Scope of the proposed work: IP-Trust aware security policies analyze and assert appropriate security controls.

6.3 System-level Security Issues Caused by Untrusted IPs

We start with an overview of system-level security issues caused by untrusted IPs to highlight the distinction between our work and related work on IP-level hardware Trojans. Table 6.1 lists the key distinctions between general trends in Trojan research and the scope of this work. There have been previous studies on the feasibility and potency of malicious modifications in some ASIC designs and, in some instances, processor cores [107, 108, 116] in isolation. These include inverting internal logic nodes via an XOR gate and rarely activated combinational trigger logic, as well as sequential Trojans or "time bombs" which can reset or disable the IP operation based on a counter reaching a chosen threshold (starting from reset) or after a particular sequence of input values. However, these studies have not involved analysis of untrustworthy IPs in the context of a system or platform such as a SoC, where multiple H/W, S/W or F/W controlled IP modules perform dedicated roles and communicate with each other to perform system-level functions. In a SoC design, Trojans or malicious logic in an IP often visibly affect the overall system function rather than the IP core itself; with respect to a Trojan, we are therefore concerned with the effects on the overall system function rather than on each IP core in isolation. For example, a malicious IP may send spurious communications to other IPs, resulting in leakage of sensitive information, data corruption, or denial-of-service of the entire system. A critical problem with such system-level Trojans in an IP A is that their effect can only be observed in an overall system context, typically as a direct malicious effect through another IP B, and may remain undiscovered in standalone functional or IP-trust validation of A. On the other hand, for scalability reasons, system-level validation (of both functionality and security assurance) with real use-cases can be exercised only on executions using fabricated silicon [117, 118]. However, aggressive time-to-market requirements imply a limited window for post-silicon validation before the product goes on-field, resulting in potential escapes to shipped systems. Hence, potential system-level Trojan attacks can easily slip through even in the presence of efficient static IP trust verification techniques. System- or platform-level attacks are generally defined here as attacks in which the rogue IP core utilizes other IPs or system components to trigger and propagate the effect of its internal malicious activity to the system level, thereby compromising SoC security in some way. The objective of this work is to provide architectural support for on-field detection (and mitigation) of system-level Trojan threats.

System-level Trojans vs. Trojans in a malicious IP:

While there is no standard, universally agreed taxonomy, most industrial SoC integration teams have developed a categorization based on analysis of system-usage scenarios and protection requirements at different phases of system execution. In particular, an untrustworthy IP can typically affect system-level behavior through message communications, resource sharing, and control of operation and data flow. We can classify malicious behavior either in terms of the kind of threat introduced, or in terms of the system-level impact of the Trojan. The rogue IP threats fall into four categories [119]: (1) Interception; (2) Interruption; (3) Modification; and (4) Fabrication. Correspondingly, in terms of behavior, malicious IPs can be characterized with the following taxonomy.

• Passive Reader: An IP that illegally reads/collects secret information meant for other IPs.

Figure 6.3: Message diagram level representation of untrustworthy IP being a (a) passive reader and modifier, (b) diverter and masquerader, along with the associated threats.

• Modifier: An IP that maliciously changes communication/message content between two IPs.

• Diverter: An IP that diverts a message/information between two IPs to a third IP.

• Masquerader: An IP that poses or disguises itself as some other component, in order to request service from or control the operation of other IPs.

Unlike IP-level Trojans, the taxonomies above characterize system-level Trojans by impact rather than by their design, implementation, or triggering characteristics. Like the threat classification, the four behavioral categories above all capture maliciousness as a "black box": they do not care exactly what kind of malicious logic/circuit has been inserted, but simply look at the effect in the context of other IPs in the SoC. An upshot is that the taxonomy does not directly translate into a scheme for statically checking a design for system-level Trojans, and one must resort to validation of the run-time behavior of the system through either dynamic or formal analysis. Also, the categories are not mutually exclusive; e.g., a passive reader may also act as a masquerader, and a diverter may read information before diversion and hence also be a passive reader. Furthermore, as illustrated in Fig. 6.3, any adversary behavior can result in multiple threats; e.g., a masquerader can cause any of interception, interruption, or modification.

Below we provide some examples of system-level Trojans arising from a sample of potentially malicious IPs. While the descriptions themselves are simplified for pedagogical reasons, they are inspired by realistic protection/mitigation strategies developed by security architects in typical industrial SoC design flows. We note however that they are only meant to be an illustrative sample; an exhaustive overview of the spectrum of vulnerabilities caused by untrusted IPs is beyond the scope of this work.

1. Untrusted Processor: In a processor, the underlying control and/or datapath may be slightly modified so that an existing instruction or opcode performs something extra (potentially malicious), in addition to executing its normal functions as expected per the specifications. These changes may be in the hardware logic itself, at the microcode or firmware level, or a combination of both. Along similar lines, the Trojan might also take the form of a new instruction utilizing reserved opcodes (these may also bypass detection, provided that the typical security policy concerning the processor does not check fetches against a golden set of instruction opcodes). To cause an intra-processor malicious activity to propagate outside and affect the system, an attacker may zone in on processor sites/events involved in system interaction. These would typically include the memory subsystem (including DMA-mapped I/O devices) as well as test/debug interfaces. As an example of such additional functionality, a Trojan inside a processor might cause a single memory access instruction to result in two memory data access requests (apart from the TLB) only when the address is within a desired range (which can potentially be reconfigured from time to time to avoid detection). The additional request generated may be termed a shadow memory operation [102]. In other words, for a simple RISC Load (LD)/Store (SW) architecture, a particular LD/SW instruction would lead to two LD/SW requests if the address is within a particular range (a simple firmware-level detection sketch for such shadow accesses is given after this list). The underlying modification could be such that, irrespective of the adversary's software privilege level, the shadow-mode memory access always occurs in kernel mode (or the highest in the system) with the system mode bit(s) set in hardware, giving the shadow operation visibility into the complete system memory available to the processor (and potentially other IPs) as well as allowing it to evade detection by generic privilege-based access control security policies. This may be utilized to bring into the program window, or tamper with, security-critical assets stored in the protected memory space. In the former scenario (a load), the resultant secret may be written back to DMA memory space (and the DMA controller signaled for action) for the attacker's I/O device of choice, and thus leaked. Such an attack (finally leading to exposing a secret to an attacker) potentially involves multiple IP blocks (memory controller, DMA engines, device controller, etc.). With some knowledge of the final system address ranges in the SoC, the shadow load address arguments may be passed on accordingly (directly or indirectly) with the upper S/W stacks during execution. In terms of the taxonomy, the untrusted processor core could here be classified as a passive reader, a modifier (for a store), or even a masquerader. Another example of such an additional modification, as part of an existing or new instruction, might be added malicious H/W logic to change the control flow of programs on demand, i.e., instead of using buffer-overflow and other S/W attacks to change the control flow, one may think of this as a hardware-based control flow attack. The modification may be designed in a stealthy manner such that the instruction pointer (visible to the programmer/debugger) still points towards legitimate instructions in sequence, whereas, in something analogous to a shadow mode, the underlying H/W is executing code from a malicious source of the attacker's choice whose initial instruction could be a jump/branch (to avoid detectability of any kind). This added indirection would make it extremely difficult to detect the rogue action. A potential trigger for this attack may be a security-critical program trying to access a secure key from on-chip secure flash that is required for decryption/authentication and consequently for control operations of a critical system. The adversary's malicious program may interrupt this flow, thereby leading to denial-of-service (failure of a mission-critical system), or leak the secure key (stored in a processor register) to the attacker's device of choice via DMA or the test/debug port (the attacker may have on-line access to the debugger). With the key, an unauthorized agent can gain entry into the mission-critical system and control/tamper with it. Hence, different IP cores may be utilized by a small malicious modification in an untrustworthy processor to conduct a system-level attack. Here the untrusted processor is a passive reader or modifier and/or even a diverter (potentially diverting keys from flash to an I/O device). We note that, for the processor and the other untrusted IP types considered below, we do not delve into possible implementation details of the particular backdoor or modification at the micro-architecture or logic level, but rather analyze it from a high-level, architecture-based behavioral abstraction.

2. Untrusted Memory Controller: Like the processor, the controller for the system memory, which typically comes as a separate IP, may be untrustworthy as well. It is a critical component, as it governs accesses to the memory, which is shared between the different IPs and system components in a SoC. With memory-mapped input/output and DMA, the memory controller governs access to external devices as well. An example scenario may be a Trojan in the memory controller which tampers with the stored values of only a particular IP under certain trigger conditions. This specific IP could be, for example, a processor in a multi-processor SoC which controls the different sensor functionalities in a mobile system (such as accelerometer, camera, microphone, temperature, keypad pressure and health sensors). Tampering may be in the form of modification of data values in the request buffer before storage into the memory bank. This may cause malfunctions in the control of these sensors and thereby hamper usability/availability, or in extreme cases, such as temperature feedback control, can cause the entire system to overheat and fail. The tampering might also take the form of an extra shadow copy/write of an input key or signature (input via camera, microphone or keypad) to the attacker's program space in memory, so that a program of the adversary's choice running on a different processor can later extract it. Hence the memory controller could behave as a passive reader or modifier. Besides tampering with access requests, hardware logic or, often, firmware modifications may be intentionally inserted to cause the memory access scheduling policy in the controller to ignore a particular IP's requests on a certain trigger based on system context. As a result, that IP may be starved of the resources to perform an operation, which can lead to a system failure in the form of erroneous outputs or denial-of-service. Here the controller acts as a modifier, altering the scheduling policy.

3. Untrusted Network-on-Chip: Another shared system resource, heavily used by the different IPs to communicate with each other to perform system-wide functions, is the Network-on-Chip (NoC). NoCs are composed of multiple levels of routers or switches which direct packets of information from the source to the intended destination. Typically, the full communication fabric or NoC is designed as an infrastructure IP. We can imagine different scenarios of malicious modifications in the hardware logic and/or firmware of the routers that can cause widespread system-level effects. One particular case may be that, for only pre-selected secure data packets or assets going from the memory controller to a crypto-IP at system power-up or power-down, a particular malicious router creates a copy and routes it as a packet to a device controller with external system access (along with the original intended destination, i.e., the crypto-IP). This may expose the system, in a certain security-critical usage scenario, to replay attacks, unauthorized access breaches, etc. Here the router can be classified as a diverter and/or a passive reader. In a particular system context with specific real-time constraints, a Trojan in a router may be activated to increase the packet latency by altering the route scheduling policy (e.g., from shortest time to least link bandwidth consumption). This delay, as the Trojan payload, may result in a failed system request or function in the real-time system. The router acts as a modifier in such a scenario.

4. Untrusted Device Controller: Finally, a controller for devices like USB, Bluetooth, Ethernet, modem and other I/O components may be untrustworthy as well. As a simple malicious modification, a device controller (e.g., a display controller), through insertions in the H/W logic or firmware, may modify bits of the device data input under certain trigger events, such as a particular authentication request (for log-in) in a program on the processor (which involves display controller operation for communicating the request, and thus acts as the trigger) that requires a user to input credentials. This intentional tampering would lead to denial-of-service attacks. Similarly, as device controllers are the link between the SoC and the external system, there could be many other possible denial-of-service type attacks and/or fabricated device requests from the controller itself (not originating from any IP or program on the processor) that may result in information leakage from other security-critical components of the system (outside the SoC, e.g., system flash/ROM). Hence, in the examples presented, a device controller can be classified as a modifier, a passive reader and/or a masquerader.
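As referenced in the untrusted-processor example above, the following is an illustrative sketch of one possible IP-Trust aware check: the SPC correlates the load/store activity reported by the processor-side wrapper/DfD with the requests observed at the memory controller wrapper over a monitoring window, flagging a surplus request as a possible shadow memory operation. The counters, window mechanism and control hook are assumptions, not the framework's exact policy.

```c
/* Hedged sketch of a check motivated by the untrusted processor example: a
 * surplus memory request, relative to the load/store retirements reported by
 * the processor's wrapper/DfD, suggests a shadow memory operation. */
#include <stdint.h>

static uint32_t cpu_mem_ops_reported;   /* loads/stores reported by the CPU wrapper   */
static uint32_t memctrl_reqs_observed;  /* requests seen by the memory ctrl wrapper   */

/* Hypothetical control hook: raise an alert and restrict the offending IP. */
extern void spc_raise_alert_and_restrict(const char *reason);

void spc_on_cpu_mem_op(void)      { cpu_mem_ops_reported++;  }
void spc_on_memctrl_request(void) { memctrl_reqs_observed++; }

/* Called at the end of each monitoring window (e.g., a fixed cycle count). */
void spc_check_shadow_access(void)
{
    if (memctrl_reqs_observed > cpu_mem_ops_reported)
        spc_raise_alert_and_restrict("possible shadow memory access from CPU");
    cpu_mem_ops_reported  = 0;
    memctrl_reqs_observed = 0;
}
```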

6.4 SoC Security Architecture Resilient to Untrusted IP

In this work, we propose a security architecture that implements fine-grained, IP-Trust aware security policies at run time, which ensure reliable, secure operation of the SoC even in the presence of untrustworthy IP activity. The proposed framework detects potentially undependable or suspicious behavior of untrustworthy IPs during execution and asserts appropriate security controls as defined by the policies. The architecture is an extension of the proposed security infrastructure "E-IIPS" [89], which presents a systematic, methodical approach towards implementation of diverse types of system-level security policies in a SoC, as described in Chapter 4. The generic security policies considered in [89] deal with the typical threats of attacks via the SoC-to-system interface and malicious S/W stacks executing on different IP cores. This work significantly enhances the E-IIPS architecture with the support necessary to provide a disciplined, systematic and scalable approach for addressing the threat of IP-level Trojans in SoCs, irrespective of the type of constituent IPs.

6.4.1 Assumptions

As mentioned earlier, in the E-IIPS architecture, the malicious and/or unintentional H/W logic or F/W code may reside in the IP core or in the security wrappers, which are typically provided by the IP providers, similar to P1500 test wrappers for boundary scan [120]. In the IP, we assume that, apart from the standard, commonly used and highly validated test and debug wrappers, a Trojan might be inserted anywhere within the internal control logic, datapath and/or temporary storage locations. For the "smart" security wrappers capable of detecting different security-critical IP events/data, apart from the frame generation logic and the wrapper interfaces with the SPC [89] and local DfD [121], which are relatively standard (in terms of high-level specification, behavior and micro-architecture) across all IP types, the Trojan or unintentional vulnerability might be present undetected anywhere in the wrapper sub-components. As touched upon earlier, although two IPs or system components inside the SoC may both be untrustworthy, the probability of them being malicious in the same system context is extremely low. Independence between different IP vendors and the complexity of conducting coordinated attacks in the scenario of multiple IPs from the same vendor form the rationale behind this assumption. Hence, with respect to analyzing a potential defense against an untrustworthy IP (core + wrapper), the other interacting components in the E-IIPS architecture, such as the DfD or debug trace macrocell, interacting IPs/system components, and the communication fabric between the wrapper and the SPC, are considered secure and trustworthy, i.e., the IP cannot collude cohesively with other components to conduct a potential system-level attack. The architecture of the debug framework is standardized for adoption in different SoC design scenarios. It is often an infrastructure IP belonging to the design house itself and hence trustworthy. These assumptions are listed in Table 6.2. We first present the solution for untrustworthy security wrappers in isolation and subsequently deal with the problem of Trojans in the IP core. At the end, we integrate the two proposed approaches to enhance SoC robustness against the scenario of Trojans in any part of the IPs.

Table 6.2: Assumptions Regarding Trustworthiness of Associated Components in Solution Methodology with respect to an Untrusted IP

Component in SoC                              Trustworthiness Assumption
IP Core (incl. datapath, control, storage)    Except standard test (P1500) and debug wrappers, all untrustworthy
Security Wrapper                              Except standard wrapper-to-SPC and local DfD interfaces and frame generation logic, all untrustworthy
Local Design-for-Debug (DfD)                  Trustworthy
DfD to wrapper link                           Trustworthy
SPC to DfD config. link                       Trustworthy
Interacting IP wrapper, communication link    Trustworthy
Wrapper to SPC communication link/fabric      Trustworthy
SPC                                           Trustworthy

6.4.2 Untrustworthy Security Wrappers

Among the typical wrapper sub-components, the event monitoring/extraction logic, including the associated metadata storage, may be highly susceptible to a malicious exploit. This is because it varies according to IP type and particular implementation. It also serves a critical role in the E-IIPS architecture by lending "intelligence" to the wrappers for detecting the necessary, critical IP events, and can be directly used to launch an attack on the system. A malicious or unintentional vulnerability in the event detection H/W logic may cause the security wrapper to not report certain security-critical events as and when they occur inside the IP (a false negative), possibly under some stealthy condition-based triggers. As a result, irrespective of the trustworthiness of the underlying H/W of the IP core, the SPC would be oblivious to any behavior or activity (rogue or trusted) originating inside the IP and potentially propagating to the system level. Hence, the system security state maintained by the SPC would not be updated and would not reflect the actual system context. This would result in incorrect security controls being applied, which can potentially lead to system-level confidentiality/integrity as well as availability attacks. Example scenarios might include:

1. A thread executing on a processor core requests a conditional jump to a SoC-level system address (where security-critical assets are typically stored during SoC execution), and the wrapper fails to report it to the SPC only when the address lies in a certain range, which could be, for example, the storage space of DRM keys or the process scheduling block in kernel memory.

2. A program on a processor requests the memory controller to output sensitive data to a particular device channel via a DMA engine on the platform, and the memory controller/DMA wrapper (depending on the policy implementation) fails to report it to the SPC only in the case of a particular channel (specific device) of interest.

In the former case, suppose the memory controller security wrapper is not designed to verify accesses by this processor (perhaps the SoC designer implements the corresponding check only in the processor according to the specification); then the malicious wrapper can lead to confidentiality/integrity/availability attacks. Similarly, in the latter case, for a particular design of the processor, DMA engine and memory controller wrappers, such exploits may present significant opportunities for leakage or tampering of security-critical assets. On the other hand, the reverse (a false positive) may also occur with respect to malicious wrapper behavior, i.e., the wrapper fabricates and reports particular security-critical events to the SPC, possibly under some trigger conditions, when they do not actually occur at all in the IP operation/execution. Such a scenario could be equally stealthy in terms of potential effects at the system level. An example, which could potentially lead to availability attacks, is the following:

1. A bus controller or a router in a network-on-chip falsely reports to the SPC that IP "A" has frequently requested access to IP "B"'s configuration registers or its local memory data under a particular trigger condition (perhaps when IP "A" starts a security-critical operation, as detected by a particular communication request pattern of IP "A" passing through this router). The SPC, according to the implemented policy, disables all IP "A" communication requests for the current execution run, leading to an availability attack and potentially system failure.

It is true that any malicious insertion or modification, or for that matter a bug, in an IP security wrapper would be comparatively easier to detect by validation than those in IP core designs, due to the relatively simpler design specification of the security wrapper (in contrast to the main IP functionality). Besides, an adversary during IP design typically does not know the actual SoC-level security policies governing the IP. It may so happen that the inserted Trojan is inadvertently disabled during wrapper configuration by the SPC, or the policy is implemented in such a way that the event that the Trojan targets is not directly used to modify any security controls. In such scenarios, the Trojan trigger or payload may not be realized at all. At the same time, to an attacker, a Trojan in the security wrapper is easier to design and insert than IP-core-level Trojans, and it offers the adversary a potentially easier and more direct way to conduct an attack at the system level. The attacker's job of propagating the effect of the Trojan to the IP interfaces, and consequently to the system, is much simpler in this scenario. Hence, malicious logic in a wrapper is definitely a security vulnerability in the scenario of incomplete trust coverage via functional/structural tests. We propose a solution to detect untrustworthy security wrapper actions at run time and thereby protect the security and reliability of SoC operations.

6.4.2.1 Solution Methodology

Figure 6.4: (a) Cross-verification based proposed methodology to detect untrusted wrappers; (b) Modifications required for re-purposing DfD for security policies in SoC; (c) Zoomed view of the additions in the IP security wrapper and corresponding DfD.

The proposed solution is simple. It is based on detecting untrustworthy security wrapper actions by comparing and verifying them against a trusted monitor at run time. The trusted on-die monitor in this case is the DfD module or trace macrocell local to the IP, as illustrated in Fig. 6.4(a). As required for observability and some controllability during post-Si validation, on-field tests as well as patches/upgrades, a typical macrocell [122] for a processor (as an example) in a SoC implements standard instruction and data value/address/range comparators, condition code/status flag match checks, performance counters, event sequencers, logical event combinations, etc. Along similar lines, for different IP types, the corresponding DfD modules incorporate adequate resources which can be utilized to detect almost all of the security-critical IP events required for policy implementation [121]. Moreover, these local trace macrocells would potentially already be used to verify the wrappers during post-Si validation. To re-purpose the debug instruments for security, the central engine, i.e., the SPC, configures the DfD module local to the untrustworthy IP of interest at power-up to detect part or all of the security-critical events at run time. The number of wrapper events to be verified is chosen by the SoC designer depending on the degree of untrustworthiness of the IP, the system execution contexts, etc., and is mapped to security policy functions in the SPC. For example, during the security-sensitive boot process, the DfD might be configured to detect all expected wrapper events, whereas during normal mode only a subset might need to be monitored. For a security-critical event communicated by either the security wrapper or the trace macrocell, the SPC compares the other's response (which may be the same event, a wrong event, or no event frame at all) and asserts appropriate security controls as determined by the policy. An important point to note is that the previous chapter proposed the use of existing local DfDs (after configuration) to share the load of the wrapper by detecting some of the critical IP events at run time, thereby reducing the H/W resources of the wrapper and making it lightweight. However, that requires custom design of the wrappers on the IP provider's part to adjust to different debug capabilities and design choices in a particular SoC implementation, which may increase design complexity and time-to-market. In the method proposed here, we assume that the wrappers are not touched; each wrapper only includes the corresponding debug interfaces [121] as specified by the SoC designer (for the proposed verification), as shown in Fig. 6.4. During execution, due to power/energy constraints in particular use cases, the monitoring may be switched off as well.
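To make the comparison step concrete, the following is a minimal sketch, under stated assumptions, of the cross-verification logic in the SPC policy enforcer; the event encodings, mismatch threshold and control hooks are illustrative, not the exact implementation.

```c
/* Sketch of the cross-verification step in the SPC policy enforcer: for each
 * security-critical event context, the report from the (untrusted) security
 * wrapper is compared against the report derived from the trusted local DfD
 * trace macrocell. */
#include <stdint.h>
#include <stdbool.h>

#define EV_NONE            0u   /* no event frame received from that source     */
#define MISMATCH_THRESHOLD 3u   /* assumed: escalate after repeated mismatches  */

typedef struct {
    uint16_t wrapper_event;  /* event id reported by the wrapper (EV_NONE if absent) */
    uint16_t dfd_event;      /* event id derived from the DfD config-register value  */
} event_pair_t;

static uint32_t mismatch_count;
static bool     wrapper_trusted = true;  /* current SPC view of this IP's wrapper */

extern void spc_apply_stricter_controls(void);  /* hypothetical policy hook */

bool spc_wrapper_trusted(void) { return wrapper_trusted; }

void spc_cross_verify(const event_pair_t *p)
{
    /* A mismatch covers both a wrong event and a missing report from one side. */
    if (p->wrapper_event == p->dfd_event)
        return;

    mismatch_count++;
    wrapper_trusted = false;              /* accept only DfD events for this context */
    if (mismatch_count >= MISMATCH_THRESHOLD)
        spc_apply_stricter_controls();    /* e.g., route all subsequent events via DfD */
}
```

In practice, the escalation steps would be driven by the implemented policies rather than a fixed threshold.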

The verification methodology is derived from N-version programming in software [107], which generates and executes multiple functionally equivalent versions of the same program to achieve high reliability in the presence of software faults. Here only two independent sources are involved, one trusted and the other untrustworthy. Potentially, the security wrapper of an interacting IP may serve as another independent source for event verification of the untrusted wrapper of interest, as the same event context or sequence may be designed to be detected by both wrappers, depending on the particular implementation. For example, a particular request from an IP (e.g., a processor) to the memory may not be reported to the SPC by its own wrapper, but the memory controller's wrapper reports it when it receives the request along the flow of operation. Hence, a malicious IP wrapper action may be detected by the SPC in scenarios where the actual event sequence extends beyond the IP and the interacting IP's wrapper can detect it as well. Due to its rather limited scope, we do not consider this as part of the solution in this work. Note, however, that re-purposing DfD for security requires addressing the following trade-offs.

Transparency to Debug Use Cases. Post-silicon debug and validation are themselves critical activities performed under highly aggressive schedules. It is therefore critical that re-purposing the DfD does not interfere or “compete” with debug usages of the same hardware. Abbreviations 131

Maintaining Power-Performance Profile. On-chip instrumentation, and in particular the design of the on-chip debug communication fabric, is optimized for energy and performance in usages related to debug. For example, since debug traffic is typically "bursty", it is possible to incur a low power penalty even with a high-bandwidth fabric by power-gating the fabric components during normal execution; while re-purposing the same infrastructure, one must ensure that the power/energy profile is not significantly disrupted by the new usages.

Acceptable Overhead for Interfacing Hardware. The required hardware resources for interfacing with the DfD and configuring it for security needs should be minimal.

6.4.2.2 Implementation Details

To reuse DfD modules for detecting security critical events and verifying the corresponding security wrapper, the SPC needs an interface to the debug infrastructure through which it can configure the DfD module with the corresponding values. As required in [121], the necessary modification to a typical debug access port [53], illustrated in Fig. 6.4(b), involves the addition of an SPC access port as well as possible modifications to the logic controlling accesses to the configuration bus. These typically constitute minimal hardware overhead [121]. With the DOT of the different IPs as well as the system execution context defining the wrapper events to be verified, the corresponding configuration register values of the associated trace macrocell are stored as policy arguments inside the SPC, as seen in Fig. 6.4(b). A hardware or software based event comparison is present as part of the policy enforcer. To prevent the IP provider from learning which specific security critical events would be monitored in the SoC (and potentially inserting Trojans accordingly), the SoC designer (rather than the IP provider) derives the corresponding DfD configuration register values from the debug programming model [53]. This practice incurs a small additional design effort. Standard trace macrocells typically incorporate logic resources to detect all of the security critical IP events required for policy implementation. Hence most of them can be configured at power up/boot time to be detected by the DfD block during execution. A few may need to be programmed by the SPC only at run time, based on the policy requirements. Hence, apart from power up, the configuration bus is mostly off (power gated) and so saves leakage power/energy. In rare scenarios where there is insufficient configuration register space to program the DfD to detect all required events simultaneously, a few new registers may be added to the debug configuration address space to trigger concurrent event monitoring.
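As an illustrative sketch only (not part of the implemented framework), the following C fragment shows how SPC firmware might program a local trace macrocell through the added SPC access port at boot, using the configuration register values stored as policy arguments. The register map (DFD_CFG_BASE, comparator offsets) and the dfd_policy_arg_t layout are hypothetical placeholders, not the actual debug programming model of any particular DfD implementation.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical memory-mapped base of the DfD configuration space exposed
 * to the SPC through the added access port (Fig. 6.4(b)). */
#define DFD_CFG_BASE      0x40010000u
#define DFD_COMP_VALUE(n) (DFD_CFG_BASE + 0x00u + 8u * (n)) /* comparator match value   */
#define DFD_COMP_CTRL(n)  (DFD_CFG_BASE + 0x04u + 8u * (n)) /* comparator enable/type   */
#define DFD_EVENT_ROUTE   (DFD_CFG_BASE + 0x80u)            /* route matches to SPC link */

/* One policy argument stored in the SPC by the SoC designer: which comparator
 * to use and what value/type to program (illustrative layout). */
typedef struct {
    uint8_t  comparator;   /* index of the address/data comparator in the macrocell */
    uint32_t match_value;  /* e.g. start of a protected address range               */
    uint32_t ctrl_bits;    /* enable, address vs. data compare, range mode, ...     */
} dfd_policy_arg_t;

static inline void mmio_write32(uintptr_t addr, uint32_t val)
{
    *(volatile uint32_t *)addr = val;
}

/* Program the DfD of one untrusted IP with the wrapper events that must be
 * cross-verified in the current execution context (boot vs. normal mode). */
void spc_configure_dfd(const dfd_policy_arg_t *args, size_t n_args)
{
    for (size_t i = 0; i < n_args; i++) {
        mmio_write32(DFD_COMP_VALUE(args[i].comparator), args[i].match_value);
        mmio_write32(DFD_COMP_CTRL(args[i].comparator),  args[i].ctrl_bits);
    }
    /* Direct detected events to the SPC port instead of the trace port so
     * that debug usages of the trace fabric are left untouched. */
    mmio_write32(DFD_EVENT_ROUTE, 0x1u);
}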

Noninterference with debug usage requires transmission of security data from the DfD to the SPC via a separate port/link (instead of re-purposing the debug trace port and the trace communication fabric, which might incur significant power/energy overhead at run time). This port/link is triggered accordingly by the event detection logic in the DfD. As in the usage scenario of [121], we interface the DfD with the corresponding IP security wrapper and send the DfD detected event/s through the existing active frame generation logic. Although the security wrapper is untrustworthy, as mentioned in Table 6.2, the DfD interface in the wrapper and the frame generation logic, which are mostly standard across wrappers of all IP types, are assumed trusted. With the wrapper to SPC communication link also considered trusted, the reuse of trusted wrapper resources to transport DfD detected event frames to the SPC does not give rise to any new security concerns and, at the same time, helps meet the debug re-purposing constraints. A close-up illustration of a representative security wrapper, the DfD module and the additional resources required for their interfacing is provided in Fig. 6.4(c). In a particular implementation, for the SPC to uniquely identify the DfD detected events from frames, the current value of the corresponding configuration register is sent as part of the DfD frame. The implementation details also govern whether wrapper frames and DfD frames are sent to the SPC simultaneously (as part of a bigger frame) or consecutively one after the other. Irrespective of this, a mismatch between the two values (including no event being reported by one source) in the policy enforcer triggers the SPC to apply the necessary security controls as governed by the policies. These may include noting the event at which the mismatch happened and accepting only DfD events for that context in the future, continuing dual verification to monitor the degree of untrustworthiness of the IP, and/or retrieving all subsequent possible events from the DfD when mismatches are frequent. These security controls can simply configure the frame generation logic inside the wrapper to select the source/s of information. The incurred power/energy overhead of a few active trace macrocells, as required by our method, is minimal compared to a full active debug operation. During the debug phase, where a particular trace macrocell may be required for other monitoring/controlling purposes, proactive security policies (e.g. disabling untrusted IP interaction with a particular IP until validation completes) would be implemented in the SPC to avoid requiring most of the fine grained IP security critical events for making a decision. This also increases the probability that malicious logic in wrappers is never triggered and/or never realizes its payload.
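A minimal sketch of the cross-verification step described above, assuming a hypothetical event frame format, is given below in C; the actual comparison may be realized in hardware or software inside the policy enforcer.

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event frame as received by the SPC.  source identifies the
 * reporter, event_id encodes the detected security critical event, and
 * cfg_reg carries the DfD configuration register value used to identify
 * which comparator fired (present only in DfD frames). */
typedef struct {
    enum { SRC_WRAPPER, SRC_DFD, SRC_NONE } source;
    uint32_t event_id;
    uint32_t cfg_reg;
} event_frame_t;

typedef enum { WRAPPER_OK, WRAPPER_SUSPECT } wrapper_verdict_t;

/* Cross-verify one monitored event: the DfD report is trusted, the wrapper
 * report is not.  A missing or differing wrapper report marks the wrapper
 * as suspect, so that the policy enforcer can switch to DfD-only reporting
 * or tighten controls, as dictated by the active policy. */
wrapper_verdict_t spc_cross_verify(const event_frame_t *wrapper_f,
                                   const event_frame_t *dfd_f)
{
    bool wrapper_missing = (wrapper_f == NULL) || (wrapper_f->source == SRC_NONE);
    bool dfd_seen        = (dfd_f != NULL) && (dfd_f->source == SRC_DFD);

    if (dfd_seen && (wrapper_missing || wrapper_f->event_id != dfd_f->event_id))
        return WRAPPER_SUSPECT;   /* no event, or a different event, was reported */

    return WRAPPER_OK;
}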

6.4.3 Untrustworthy IP Cores

As seen in previous works on static trust verification of designs, untrustworthy IP cores constitute an extremely challenging problem in terms of finding an adequate solution with high security coverage. We note here that our proposed run time SoC protection mechanism against potential system level effects of untrustworthy IPs is complementary to these existing static IP trust validation techniques, which attempt mostly to detect malicious modifications and/or covert backdoor channels in designs via targeted test vectors or formal analysis. In contrast, the aim of our dynamic protection is to monitor and detect system (SoC) level direct/indirect effects of IP level Trojans (and bugs) at run time and apply the necessary security controls, according to the requirements of corresponding fine-grained IP trust-aware security policies. Although we do not claim complete coverage against the gamut of all possible untrustworthy IP core scenarios, the intention is to show that, just as security policies defend a SoC built on trusted IP hardware against threats originating mainly from malicious software stacks and the SoC to system interface, the SoC designer can also implement policies to detect untrusted, undependable IP actions arising from Trojans in the design and prevent any system level compromise. At the same time, one can do so in a systematic, methodical fashion with some enhancements to the E-IIPS architecture. Rather than an exact set of rules and regulations, the solution provides guidelines to SoC designers/integrators on an efficient approach towards addressing the untrustworthy IP issue in SoCs. We note here once again that we have assumed there is no malicious collusion between IP cores to execute system level attacks, i.e., we treat each IP as an independent entity from the viewpoint of untrustworthiness.

As mentioned earlier, for such 3rd party IPs there is no golden RTL implementation or associated model available as a template to the SoC designer, apart from the high level IP functional/architecture specifications (trusted, as the SoC designer/architect would typically provide them) and the SoC architecture (implying that an IP's interface with other SoC components and IPs is known). Even if the architecture is not explicitly specified for the IP by the SoC designer, high level features such as the number of pipeline stages, their overall functions, the number of cache levels and the presence or absence of virtual memory for processors, and similar features for other IPs, are mostly available and easy to validate by the SoC design house. The key observation here is that one can utilize only these high level specifications and IP interface level information, along with generic architecture level rationale, to verify correlations between specific, abstracted, temporal events across different micro-architecture level sub-components of an IP and thereby detect potentially untrustworthy behavior that might affect SoC operation. In a trusted IP core, a functionally relevant operation, meaningful and visible to SoC components external to the IP, typically incorporates specific correlated internal (to the IP) events occurring temporally across multiple micro-architecture level IP sub-units, i.e., these sub-units interact in a specific, rational, meaningful manner with each other to perform an activity or operation relevant at the SoC level [102]. The corresponding events are referred to here as “Micro-architecturally Correlated Events” (MCE). Typically, IP level Trojans disrupt or affect in some way the correlation between these spatio-temporal events, i.e., in the presence of an activated Trojan, internal events bearing little or no correlation with recent ones may appear. This is explained with the following examples.

Example 1: For a typical in-order processor core with 5 pipeline stages, a memory sub-system request (simply a LD/SW in a RISC core) involves a typical MCE sequence: the decode stage deciphers the instruction to be a LD/SW, the execution unit calculates the address and, consequently, the memory access stage generates the appropriate memory sub-system request. An example processor level Trojan is one inside the memory access logic that conditionally generates a shadow LD/SW in addition to the normal one. Here the correlation disruption takes the form of a single active memory instruction (after decode, in the instruction window) generating two data memory requests, which does not satisfy common architecture rationale for a simple RISC processor (a simple sketch of a corresponding check appears after Example 4).

Example 2: A hardware Trojan in the instruction fetch stage of a processor, triggering a branch or jump (potentially to a malicious source) without any previous active branch/jump instruction or corresponding activity in the program counter (PC) select logic (apart from PC +4/8), is another example of a correlation violation.

Example 3: In a typical memory controller architecture, the earliest active request not being served under a FIFO based scheduling policy (for an example Trojan inside the request buffer), or a random change to a row buffer based policy (for a Trojan in the arbiter/scheduler, triggered by just a normal request with no condition flags etc.), are scenarios of uncorrelated spatio-temporal events.

Figure 6.5: a) Typical sequence of events in detecting malicious action of an IP in the proposed solution; b) Block diagram representation of the architecture of the enhanced IP-Trust aware security wrapper.

Example 4: A rogue router conditionally generating a different or additional destination address and sending the packet there, instead of or in addition to the one in its active buffer, also fails to satisfy the MCE typical for a router from the viewpoint of its high-level specification and common architecture level rationale.
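To make the MCE idea concrete, the following C sketch shows one possible correlation check for Example 1, operating on an abstracted log of recent events such as the one later maintained by the security monitors; the event encoding is purely illustrative, not the actual monitor format.

#include <stddef.h>
#include <stdbool.h>

/* Abstracted event types of Example 1: decode of a LD/SW and a data memory
 * request generated by the memory access stage (illustrative encoding). */
typedef enum { EV_DECODE_LDST, EV_DMEM_REQ, EV_OTHER } mce_event_t;

/* Correlation rule for a simple in-order RISC pipeline: every data memory
 * request in the recent event log must be matched by a preceding decoded
 * LD/SW; a surplus request indicates a shadow load/store. */
bool check_ldst_correlation(const mce_event_t *log, size_t n)
{
    int pending_ldst = 0;       /* decoded LD/SW not yet seen at the memory stage */

    for (size_t i = 0; i < n; i++) {
        if (log[i] == EV_DECODE_LDST) {
            pending_ldst++;
        } else if (log[i] == EV_DMEM_REQ) {
            if (pending_ldst == 0)
                return false;   /* request with no corresponding decode */
            pending_ldst--;
        }
    }
    return true;
}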

The major components of the proposed run-time framework that must be added to the “E-IIPS” architecture of chapters 4 and 5, to provide the above-mentioned SoC security against untrustworthy IP designs, are described below. Utilizing these, a sequence of typical operations (across time) involved in detecting an untrustworthy IP action that attempts to propagate to the system level is illustrated in Fig. 6.5(a).

6.4.3.1 IP-Trust Aware Security Monitors

To mount confidentiality (C) and integrity (I) attacks at the system level, the payload of the Trojan in the IP would be designed by an adversary to propagate the malicious action out of the IP. This invariably involves output transactions with interacting IPs or SoC components. Detecting availability attacks is more challenging and is dealt with later. For C/I attacks through these untrusted IPs, we monitor the high-level temporal events across the IP sub-units, or MCE, that directly/indirectly affect or lead up to these output interactions. This is done by inserting “security monitors” inside the test-security wrapper to monitor and store a recent history of high-level events from strategic locations internal to the IP. An illustration of the architecture of a typical wrapper and its associated security monitors is provided in Fig. 6.5(b). At a high level, an IP core consists of control logic to control the operations, data-path logic to perform the operations, and local stores (e.g. register file, reservation tables in a processor) to temporarily hold context. Typically, a Trojan would reside mostly in the control logic (including the logic controlling the stores) and/or, to some extent, as modifications in the data path hardware. Altering the control flow or adding new control states gives the attacker the ability to conditionally trigger the designed Trojan and thereby ensure its stealthiness. As highlighted before, in terms of their expression/payload, IP level Trojans can be classified into two major categories as follows:

1. Performing covert functions or operations (usually conditionally triggered) in addition to what is specified in the functional specification or operation manual, or normally expected from common architecture rationale, e.g. additional shadow loads/stores in a processor; a router that, along with routing a packet from the intended source to destination, also changes its scheduling policy on a conditional trigger from the source and packet data (affecting later packets); or a memory controller sending a particular request to two banks. These are extremely difficult to detect as all intended functionalities are met.

2. Utilizing rare event or event sequence conditions associated with the IP to modify existing or legitimate IP functions/state (rather than adding new ones), e.g. a load/store to a different address than the one intended in a processor, an addition in the ALU instead of a shift/subtraction, a packet in a router misdirected to a different destination, or a memory controller following a row buffer based policy instead of the specified priority based scheduling, all in response to rare event triggers.

The former might require some additional hardware/logic insertion compared to the latter in some situations, but it is more stealthy, as both the rareness of the trigger conditions and the fact that the operations are in addition to the intended specification are involved. On the other hand, if the attacker's static/design time profiling of rare events misjudges the actual run time traffic, or the events occurring in the IP on interaction with its environment, the latter category is exposed to a higher chance of detection.

Table 6.3: Categorization of MCE and Policies by IP Types

Processor Core. Example IPs: GPP, GPU, µC based device and system controllers (USB, video, power management etc.). Example MCE: instruction fetch, decoded value, memory request generation, cache access, interrupts/exceptions, program counter calculation logic. Example associated policies: decode preceded by corresponding fetch; fetch preceded by corresponding program counter (PC) calculation; memory system access preceded by corresponding decode; main memory access only on cache miss; PC jump only on corresponding instruction or trap.

Storage Controller. Example IPs: main memory controller, cache controller, flash/NVM control logic, DMA engines. Example MCE: new/next access in request buffer, change in scheduler policy or priority of I/O channel (DMA), bank/col/row calculation by the interface. Example associated policies: entry in request buffer corresponds to a valid memory request; scheduled request corresponds to a previous valid buffer entry; request scheduled according to policy by the scheduler; DMA request to memory preceded by corresponding µP grant.

Communication Fabric Unit. Example IPs: router, switch, bus controller, bus bridge. Example MCE: new/next request in buffer, next router along which a packet is sent, scheduling policy change, frequency scaling in bus bridge. Example associated policies: entry in buffer corresponds to a valid communication request; scheduled packet corresponds to a previous valid buffer entry; next destination chosen from the current routing policy; next bus transaction selected according to the current priority.

Typically, for most adversarial privilege levels, an IP level Trojan of the former type performs an additional operation or function that has no correlation, or only minimal correlation, with the IP level context and recent IP operations/functions viewed at a high level of abstraction. For example, the additional shadow load/store, originating from the decode stage or memory access sub-unit of a processor core, is not preceded by a corresponding instruction fetch and/or decode and occurs randomly in the IP context (with no correlation to recent past events). Similarly, for a particular memory controller context, a controller changing its scheduling policy in response to a condition flag set inside the scheduler hardware (e.g. for logging service efficiency), while executing a simultaneous covert load/store without any active request, or without an active request with that address/data pair, is another example of events uncorrelated with the current context (a schedule policy change) of the memory controller. To potentially detect these malicious activities at run time, the proposed security monitors need to log events from the instruction fetch, decode and memory access stages of the processor, and from the scheduler arbiter, request buffer and controller-memory interface of the memory controller, respectively. The above approach also works for the latter category of IP level Trojans. Depending on policy requirements, the security monitors may need to operate at a finer information granularity, and so do the policies checking the correlation. For example, where the malice expresses itself as a miscalculated load address or an addition in place of an intended subtraction, instead of only checking whether a memory access or execution activity corresponds to a recent decode action, the arguments also need to be taken into consideration, i.e., the load address, the type of ALU operation etc.
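Where arguments matter, the coarse check sketched after Example 4 can be refined as shown below; the event structure and field names are illustrative assumptions, not the actual monitor format.

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Event log entries carrying arguments, for the finer-grained checks
 * described above (illustrative encoding). */
typedef struct {
    enum { EV2_DECODE_LD, EV2_DECODE_SW, EV2_MEM_ACCESS } kind;
    uint32_t addr;      /* effective address computed/used         */
    uint8_t  is_store;  /* for EV2_MEM_ACCESS: 1 = store, 0 = load */
} mce_arg_event_t;

/* A memory access is accepted only if a matching decode (same direction and
 * same effective address) appears earlier in the recent log; otherwise the
 * address or operation was altered inside the IP. */
bool check_mem_access_args(const mce_arg_event_t *log, size_t n,
                           const mce_arg_event_t *access)
{
    for (size_t i = 0; i < n; i++) {
        bool ld_match = !access->is_store && log[i].kind == EV2_DECODE_LD;
        bool sw_match =  access->is_store && log[i].kind == EV2_DECODE_SW;
        if ((ld_match || sw_match) && log[i].addr == access->addr)
            return true;
    }
    return false;
}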

Figure 6.6: Potential sites in the IP design for insertion of IP-Trust aware security monitors in a) MIPS processor core, b) representative memory controller and c) NoC router

Just as chapter 4 categorized IP types and standardized (to a large extent) their security critical events, we broadly divide the IP types and illustrate a similar cluster of high level IP-Trust aware security critical events, i.e., the required MCE, and associated policies for each type. These are enumerated in Table 6.3. Of course, depending on the degree of untrustworthiness of the IP and the extent of IP and system level validation performed on it, events may be added to or removed from this MCE list, or may be monitored selectively in security critical contexts (such as boot or debug) via configuration by the SPC; Table 6.3 simply conveys the flavor and nature of the different events and policies to be analyzed, according to IP type, to verify trustworthiness at run time. For example, for models of a hardware Trojan in the instruction fetch stage of a simple in-order pipelined processor, a typical high level correlation check, applicable for instance on jumps to system memory, is to verify the instruction/s fetched against the program counter calculated in the previous cycle, or against the presence of recent asynchronous interrupts/exceptions according to the status registers. If the program counter logic itself is untrusted, the check may extend one cycle further back, to the correlation with the past instruction fetched/decoded. Our assumption is that, for the sake of stealthiness and low detectability, an attacker would not insert Trojans in multiple sub-units internal to the IP. Hence, with most of the sub-components actually legitimate (though not necessarily trustworthy to the SoC designer), such correlation checks do not have to recurse many cycles back, and the malicious trigger can be detected at or near its origin in space and time. For the example above, if the program counter logic is trusted based on extensive functional/structural validation, the undependable activity may be caught in a single round of checking. Similarly, for a model of a Trojan fabricating a memory request at the controller-to-memory interface of an untrusted memory controller during reads/writes to a shared protected memory range, a correlation check according to the IP-Trust policy “a particular memory access with address or address/data pair must be preceded by its corresponding presence in the request buffer” would detect any such trigger. As before, such a check may lead to recursive event verification if the request buffer control logic is itself untrusted. In the business model, similar to the wrapper specifications described earlier, the SoC designer would additionally request the IP provider to extract the MCE requirements for the IP as hardware logic monitors within the wrappers. In many scenarios, the required events may already be detected by the existing standard wrapper event type and metadata logic, which reduces the additional design modification and hardware overhead of the new monitors. In cases where the SoC designer wishes to maintain greater control (and thereby trustworthiness), at the cost of added design effort, the IP providers are requested only to bring the necessary signals out to the IP interface, and the SoC designer inserts these monitors during SoC integration.
An illustration of potential sites, regions or sub-units of representative IP cores of three different types, from where IP-Trust aware security critical events (intended MCE) may be extracted in a typical scenario is provided in Fig. 6.6(a), (b) and (c).

When the IP providers supply the MCE monitors in the wrapper, their trustworthiness is again of utmost importance before proceeding with these run time security checks. Similar to the techniques described in the untrusted security wrapper section, the integrity of these monitors can be verified at run time using appropriately configured local debug trace macrocells (which can detect these MCE in different cases, as they are also required for validation/debug). Based on the outcome, the DfD trace macrocell may subsequently be utilized to serve as the MCE monitors. Alternatively, taking advantage of the high degree of standardization of DfD components (and hence their lower potential for untrustworthiness), these trace macrocells may be used by the SoC designer from the very beginning to detect the critical IP-Trust aware events at run time. With the appropriate interfaces present/inserted in the trace macrocells, as described in the malicious wrapper solution, the SoC designer would have to derive the corresponding trace cell configuration register values from the debug programming model. An illustration of the architecture of a typical wrapper and its associated security monitors, for the typical case of the wrapper extracting the required MCE, is provided in Fig. 6.5(b); the standard wrapper components are not shown there. As seen in Fig. 6.5(b), the inputs to the security monitors (SM) come from the control logic and/or datapath sub-units of the IP core. The buffer inside each SM stores the recent log of temporal events for the corresponding logic/stage. The SMs are triggered by the “Interface Trigger” logic to send their event logs to the wrapper frame generation logic (for communication to the SPC) on specific interface events of interest (described below). Based on validation/test coverage, the degree of untrustworthiness of the IP, and the security requirements during different execution phases (coarse grained, such as boot, execution, power down, debug etc.), the applicable security monitors are appropriately configured by the SPC at boot time.
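The following C fragment is a behavioural model, for illustration only (the actual monitors are RTL logic inside the wrapper), of the per-monitor event log: a small circular buffer whose depth of 8 entries matches the design point used later in Section 6.6.

#include <stdint.h>
#include <stddef.h>

#define SM_LOG_DEPTH 8   /* 8 temporal entries per monitor, as in Section 6.6 */

/* Behavioural model of one security monitor's log: a circular buffer of the
 * most recent events observed on the monitored control/datapath sub-unit. */
typedef struct {
    uint32_t entries[SM_LOG_DEPTH];
    size_t   head;       /* next slot to overwrite             */
    size_t   count;      /* number of valid entries (<= depth) */
} sm_log_t;

void sm_log_push(sm_log_t *log, uint32_t event)
{
    log->entries[log->head] = event;
    log->head = (log->head + 1) % SM_LOG_DEPTH;
    if (log->count < SM_LOG_DEPTH)
        log->count++;
}

/* Called when the interface trigger fires: copy the recent history, oldest
 * first, into an output buffer destined for the frame generation logic. */
size_t sm_log_drain(const sm_log_t *log, uint32_t *out, size_t out_len)
{
    size_t n = (log->count < out_len) ? log->count : out_len;
    size_t start = (log->head + SM_LOG_DEPTH - log->count) % SM_LOG_DEPTH;
    for (size_t i = 0; i < n; i++)
        out[i] = log->entries[(start + i) % SM_LOG_DEPTH];
    return n;
}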

6.4.3.2 IP-Trust Aware Interface Triggers

In this work, our goal is not to verify all events of interest of the untrustworthy IP at all times against the intended high level specification, i.e., IP functional validation for potential faults or bugs is not the aim. Our proposed solution is based on the principle of isolating undependable or malicious activities, arising from Trojans or unintended vulnerabilities, within the IP core itself. As a result, interacting SoC components are not affected directly or indirectly by the propagation of the malicious action, and hence there is no compromise in system level security, which is the high level goal here. Hence, one would prefer to perform this event correlation verification, according to IP-Trust aware high level policies, only when the corresponding IP attempts to communicate with other SoC components through its output interface. As mentioned before, these attempts are detected by the “Interface Trigger” logic inserted as part of the security wrapper, as shown in Fig. 6.5(b). On detection, the appropriate security monitors are triggered to send their recent security critical event logs to the SPC for verification. However, in most cases, communication bandwidth constraints, SPC execution resource constraints and power/energy profiles, along with system level performance requirements, demand an optimized, selective scheme governing which events are subject to verification, i.e., triggering only on certain interface actions/events that can be configured inside the Interface Trigger logic. As mentioned in [102], the input interface of any general hardware module or IP core can be categorized into 4 types of signals:

1. Control- Controls the operation or function of the module e.g. Valids, Status, Commands etc.

2. Data- Provides the data or information on which the operations or functions are performed.

3. Test- Constitutes signals required for putting the core in Test mode as well as for testing the module e.g. scan chain inputs

4. Global- Constitutes signals like clock, reset etc.

Analogous to the effects of Trojans inside the control logic and datapaths of untrustworthy IP cores, from the point of view of affecting system level security, the control and data inputs at the interfaces of the interacting IPs are the ones mainly influenced by potential rogue activity of the corresponding untrusted IP level Trojan. Test and global signals are generally limited in number for a module, and any Trojan influence on them has a much higher detection probability during validation; hence it is assumed that attackers typically do not utilize them to propagate a payload to the system level. For each IP or component interacting with the untrustworthy IP in the SoC, the SoC designer determines the types of inter-IP communicating input signals (data, control etc.) and analyzes the security criticality of the control and data inputs, if applicable, from system and subsystem level simulation. The corresponding interface trigger conditions or events, such as “all activity on these specific control and data signal inputs at boot” or “specific events on these particular control signals at execution time”, are formulated and configured by the SPC in the interface trigger logic at power up. For example, in a particular usage scenario of an untrustworthy processor communicating with the memory system, all control (read/write) and data (address, data) inputs may be critical at boot, whereas during normal execution, along with control, only addresses in the secure protected range are critical with respect to a potential system level security compromise. Similarly, for the same processor controlling a crypto-IP, both control (mode, enable, round status etc.) and data (e.g. keys) are typically critical at boot (as keys are transferred to the crypto-IP secure storage at that time), while only control inputs are security relevant during normal execution. A typical interface trigger logic has outputs to all the security monitors, as shown in Fig. 6.5(b). On detection of the corresponding interface level event, the monitors are all triggered to communicate their recent event logs to the frame generation logic, which are consequently sent to the SPC for proper correlation validation.
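One possible encoding of such trigger conditions, evaluated per outgoing transaction, is sketched in C below; the rule fields (boot phase, security tag, protected address range) mirror the examples above, while the structure layout itself is a hypothetical illustration rather than the implemented trigger logic.

#include <stdint.h>
#include <stdbool.h>

/* One interface trigger rule configured by the SPC in the wrapper's
 * trigger logic at power-up (field layout is illustrative). */
typedef struct {
    bool     boot_all;          /* during boot, trigger on all control/data     */
    bool     trigger_on_tag;    /* trigger whenever the security tag is set     */
    uint32_t addr_lo, addr_hi;  /* protected range for normal-mode triggering   */
} if_trigger_rule_t;

typedef struct {
    bool     is_boot_phase;
    bool     sec_tag;           /* tag propagated with the outgoing transaction */
    uint32_t addr;              /* address field of the output transaction      */
} out_txn_t;

/* Decide whether an outgoing transaction should trigger the security
 * monitors to dump their event logs for verification by the SPC. */
bool if_trigger_fires(const if_trigger_rule_t *rule, const out_txn_t *txn)
{
    if (rule->boot_all && txn->is_boot_phase)
        return true;                                 /* verify everything at boot      */
    if (rule->trigger_on_tag && txn->sec_tag)
        return true;                                 /* tagged security critical data  */
    return txn->addr >= rule->addr_lo && txn->addr <= rule->addr_hi;
}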

Figure 6.7: Input tags associated with input data/control streams to the untrustworthy IP, depending on the security criticality of the interacting IP.

With regard to resource constraints and performance requirements, the frame based interface may be made wider than before to allow a higher throughput of these events. This depends on the maximum number of events to be sent, any real time requirements etc., and hence is implementation dependent. Depending on whether any undependable or rogue activity has been detected, the interface triggers may also be reconfigured by the SPC at run time to tighten or relax the conditions. Along with the trigger event selection, the corresponding interface logic also contains proactive controls that are programmed/controlled by the SPC during boot/run time according to the requirements of the IP-Trust aware policies (see below). For example, for an untrusted IP1 interacting with IP2, the corresponding security policies may require that the communication is proactively disabled until the critical events are verified to satisfy correlation. In another case, the policies may dictate that the inter-IP communication and the verification of events by the SPC occur in parallel, with any state changes rolled back if the correlation checks fail and rollback is applicable. The selection of IP-Trust aware security policies is described below.

For the untrustworthy IP of interest, apart from output interface activity acting as a trigger for verification, the input control/data streams originating from the interacting IPs may also play a significant role in determining the trigger conditions and/or which events to monitor for validation. In other words, for a security critical IP whose control and/or data signals feed into the untrustworthy IP for some operations/computations, it is typically important to take the source of these control/data inputs into account when determining the granularity of event monitoring as well as the interface triggers. For example, in a particular system context, when a crypto-core (a separate IP that is typically security critical as it may store security assets) communicates data (e.g. keys) to the untrusted processor for computation, any output control/data activity directly or indirectly related to the crypto data should serve as a trigger condition for validation before it exits the untrusted IP. So a tag, which could be as simple as a single bit marking data as security critical or not, should be associated with the inputs from communicating IPs at the interface, as shown schematically in Fig. 6.7. This tag bit/s propagate through the IP to the security monitors and interface trigger for all interactions inside the IP directly/indirectly associated with the critical data/control. Hence the tag can be used to select which events to trigger on, which events to validate, and what particular security controls should be applied at the interface trigger logic by the SPC during verification. A typical list of output interface trigger conditions (to verify IP-Trust aware events) for an untrustworthy processor in a particular scenario, interfacing with a main memory controller, DMA engines, a flash controller as well as a crypto-IP and DSP accelerators, is provided in Table 6.4. Here, along with the event configuration in the “Interface Triggers”, we assume that some system level parameters, such as the secure range of memory and specific security critical devices/channels, are also programmed there by the SPC at power up (having been stored in the SPC by the SoC designer). From this representative scenario, we observe that the particular interface trigger events typically depend on various factors such as the security tag of the outgoing transaction, the boot or normal mode/phase of the system, and the security criticality of the interacting IP.

Availability attacks via denial-of-service do not always require propagation to the system level and thereby interaction with other SoC components; e.g. an IP1 such as a memory controller not responding to a request from IP2 under certain conditional triggers may lead to system failure or compromise, depending on the effects of the action on IP2. Hence these are more difficult to handle than the system level confidentiality/integrity attacks that have been the major focus of this work. We present some directions or ideas on which a SoC designer might focus to provide security against such denial-of-service threats. Similar to the tags used to identify security critical IP inputs and any associated direct/indirect operations on them, a metadata scheme may also be added at the input interface to denote, for example, “Action Required” or “Some Response Required” for specific IP requests, according to the requirements of corresponding liveness policies programmed in the SPC. If the response (which could be a request grant or denial) is not observed at the output interface within an estimated time or system context, the security monitors are triggered to send their event logs to the SPC via the frame generation logic for appropriate verification and action.
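A possible realization of this liveness idea, in which tagged requests are tracked against an estimated deadline before the monitors are triggered, is sketched below in C; the table size, identifiers and deadline mechanism are illustrative assumptions rather than a defined part of the architecture.

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 16

/* A request tagged "response required" at the untrusted IP's input
 * interface, tracked until a response is observed at its output. */
typedef struct {
    bool     in_use;
    uint32_t req_id;
    uint64_t deadline_cycles;   /* estimated worst-case service time */
} tracked_req_t;

static tracked_req_t table[MAX_TRACKED];

void liveness_track(uint32_t req_id, uint64_t now, uint64_t budget)
{
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (!table[i].in_use) {
            table[i] = (tracked_req_t){ true, req_id, now + budget };
            return;
        }
    }
}

/* Called when a grant/denial for the request is seen at the output interface. */
void liveness_retire(uint32_t req_id)
{
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (table[i].in_use && table[i].req_id == req_id)
            table[i].in_use = false;
}

/* Returns true if some tagged request has overrun its deadline, in which
 * case the monitors are triggered to ship their logs to the SPC. */
bool liveness_violation(uint64_t now)
{
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (table[i].in_use && now > table[i].deadline_cycles)
            return true;
    return false;
}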

Table 6.4: Representative Interface Triggers for an Untrustworthy Processor

Memory Controller (I/O): partially critical (RD/WR from/to secure memory); example o/p interface trigger: if secure tag or boot phase, all control/data, else only RD/WR requests to secure memory.

DMA Engine (I/O): partially critical (only I/O from/to specific devices); example o/p interface trigger: if secure tag or boot phase, all control/data, else only I/O requests to specific devices.

Flash Controller (I): fully critical (flash contains bootcode); example o/p interface trigger: NA (authenticated upgrade (o/p) only in special debug mode).

Crypto-IP (I/O): fully critical; example o/p interface trigger: all control and data at all times.

DSP Accelerator (I/O): not critical; example o/p interface trigger: control/data only at boot.

6.4.3.3 IP-Trust Aware Security Policies

IP-Trust aware security policies in the SPC dictate which micro-architectural events the correlation checks should be performed between, as well as the security controls applicable at the interfaces of the untrustworthy IP/s before and after the verification of IP behavior. Similar to the generic policies, they are programmed by the SoC designer as firmware in the instruction memory (flash, ROM etc.) of the policy enforcer inside the SPC. Examples of typical correlation checks that capture the high level (abstracted) intended/golden behavior or operation of untrustworthy IP cores of different types are enumerated in Table 6.3. These intended high level behaviors, involving interactions between different IP sub-units, are stored in the SPC. As mentioned above, apart from the verification of event correlation, these policies also govern the security controls in the interface trigger logic of the corresponding untrustworthy IP wrappers. The values to apply to these controls at run time, before any verification in the current context, are influenced by a host of factors including the degree of untrustworthiness (DOT) of the IP core, the security criticality of the interacting IP in the current system, any real time availability or performance requirements, as well as the past history of events from that IP. Depending on security criticality and/or resource availability/performance constraints, two scenarios typically arise for the untrustworthy IP:

• Disable all interface actions/activities until the recent micro-architecture level events have been verified by the SPC for the intended correlation. This is a safety-first approach which could be applicable for highly security critical interacting IPs, during the system boot phase where many security assets are shared between IPs, and/or where the untrustworthy 3rd party IP has a high DOT.

• Allow the interface activities to take place, thereby allowing the IP action to propagate to other SoC components, while, in parallel, the events sent from the security monitors are verified for correlation. Apart from the scenarios mentioned in the former case, this is the method typically followed during normal execution. If undependability is detected, the system may be rolled back by the SPC if applicable and possible; this might require the SPC to maintain some checkpoints.

Moreover, there are potential choices as to when the SPC analyzes the sent events. These could be critical for performance reasons in scenario 1 (described above) or for preventing any security compromise in the system in scenario 2. The straightforward solution is for the SPC to perform the correlation checks immediately, according to the programmed policies. Depending on the analysis or execution requirements estimated at design time, a separate analysis engine could be added to the generic policy enforcer (which analyzes normal security threats from software stacks, the system interface etc.) to perform the IP-Trust aware verification separately; this could simply be an additional micro-controller core inside the SPC and is highly implementation dependent. The second option for when the SPC should analyze events is to add support in the SPC to track the effects of an untrustworthy IP action as it propagates through the system (using hardware based tag bits) [123], and to analyze the events only when the effects are found to influence statically defined security critical assets/signals in other IPs or SoC components (such as configuration registers, stored keys, specific control signals etc.). On one hand, this is useful in scenarios where the potentially untrusted IP action is handled/controlled by existing generic security policies in other SoC components/IPs and/or does not hamper functionality (e.g. a dropped packet in the NoC, or only a latency increase); an example of the former is an existing security policy leading to a memory controller rejecting an additional shadow load to secure memory generated by a malicious processor. This saves execution resources of the SPC policy enforcer in analyzing the events. On the other hand, similar to hardware tag support for software control flow integrity and information flow checks [124], hardware support is required in the SPC for tracking the potential IP action through the system, and resources are expended in determining the flow of the action's effects. Hence, due to these two opposing constraints, the appropriate method depends on the implementation choices of the SoC designer. As mentioned before, the goal of this work is to provide general guidelines to the security community on possible methods or approaches to tackle the untrustworthy IP problem in SoCs. Finally, the policies also store the conditions or events that need to be configured in the interface trigger logic of the untrustworthy IP wrapper. These detect the events related to the IP's attempts to communicate with the different interacting IP/s, on which the security monitors are triggered to send their event logs for checking. The SPC mainly configures these in the untrustworthy IP wrapper during boot. Sometimes, according to policy requirements for scenarios such as detected rogue or vulnerable activity, these triggers may need to be dynamically reconfigured by the SPC to add or remove events.
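As a hedged illustration of how such a decision might be encoded in SPC firmware, the following C sketch combines the factors listed above (DOT, criticality of the interacting IP, execution phase, real-time constraints and mismatch history) into the choice between the two handling modes; the fields and thresholds are hypothetical, not values defined by the architecture.

#include <stdint.h>
#include <stdbool.h>

typedef enum { MODE_BLOCK_UNTIL_VERIFIED, MODE_VERIFY_IN_PARALLEL } policy_mode_t;

/* Factors influencing the control values applied before verification
 * (all fields and thresholds are illustrative). */
typedef struct {
    uint8_t dot;                 /* degree of untrustworthiness, 0..255   */
    bool    peer_sec_critical;   /* interacting IP holds security assets  */
    bool    boot_phase;
    bool    realtime_required;   /* hard latency budget on this interface */
    uint8_t recent_mismatches;   /* history of failed correlation checks  */
} policy_ctx_t;

policy_mode_t select_policy_mode(const policy_ctx_t *c)
{
    /* Safety first: block the interface until events are verified when the
     * IP is highly untrusted, the peer is critical, the system is booting,
     * or the IP has recently misbehaved, unless a real-time constraint
     * forces parallel verification with rollback. */
    bool high_risk = (c->dot > 128) || c->peer_sec_critical ||
                     c->boot_phase || (c->recent_mismatches > 0);

    if (high_risk && !c->realtime_required)
        return MODE_BLOCK_UNTIL_VERIFIED;
    return MODE_VERIFY_IN_PARALLEL;
}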

The two solutions above, for the untrusted wrapper and the untrusted IP core, may be combined to provide integrated security in scenarios where Trojans or vulnerabilities are possible in any IP sub-unit or module (as described in the use case example below).

6.5 Use Case Analysis

In this section, we elaborate, in a particular usage scenario, on how our proposed run-time solution can detect malicious IP activity originating from both security wrapper and IP core level Trojans and prevent any compromise of system security. In this use case, we assume for illustrative purposes that the adversary inserts Trojans in both the wrapper and the core of the main memory controller and designs the same rare event based trigger logic for both. The hypothetical attack model is as follows:

Information Leakage via Malicious Shadow Store to Memory of Specific Device: For a particular load/store request to a protected region of memory during execution of a specific security critical program (the trigger), in the scenario of a cache miss and the associated eviction of a dirty cache block to memory, the Trojan in the controller-memory interface of the memory controller (MC) is triggered and delivers its payload: an extra shadow store (along with the store of the evicted block) to the memory mapped region of the device of interest to the attacker. The attacker, through a controlled program on the processor, can then request an output to the corresponding device, which is performed through DMA.

Figure 6.8: Operation flow of the proposed solution for providing system level protection in the use case scenario of a Trojan in the security wrapper and core of the main memory controller.

Hence, critical assets (e.g. keys, configuration register values etc.) used by the original secure program/service may potentially be extracted. The trigger also activates malicious logic in the MC security wrapper that suppresses the communication of event logs to the SPC (for correlation analysis) when triggered from the interface.

Assumptions: In this use case, we assume that the attacker (possibly the memory controller IP provider) can utilize the secure memory range configured by the SoC designer at boot in the memory controller (MC) security wrapper as the trigger for the inserted Trojans in both the MC and its wrapper. If this is not known at IP design time, the adversary can potentially utilize firmware (F/W) updates to the MC scheduler to program the triggers (a functional model similar to a software based trigger), assuming F/W is involved in a typical MC design. Regarding the solution, as seen in the flow diagram of the associated operations in Fig. 6.8, the processor core, the memory controller IP, the trace macrocell of the MC, the DMA engine on the platform and the centralized SPC are involved in this system level usage scenario. Although other IP cores could be independently malicious/vulnerable, in the context of the attack model, the memory controller is the only IP considered significantly untrustworthy here. Due to the high DOT of the MC as well as the security criticality of the shared memory, all accesses associated with the secure memory region (including evictions) are disabled by the SPC according to the security policies until the micro-architecture events are verified for the necessary correlations. The MC DfD trace macrocell is utilized by the SPC for validating the corresponding untrusted security wrapper. The memory controller is assumed to consist of three major sub-components, i.e., the request buffer, the access scheduler and the controller-memory interface logic (which also computes the appropriate bank/rank/row/column). Below is the sequence of steps performed along the system operation (as shown in Fig. 6.8) in detecting the Trojan/s and ensuring system security.

Flow of Operation:

1. At the boot phase, the SPC configures the security monitors (if necessary) and the interface trigger logic of the MC security wrapper to program for example “all accesses associated with secure portion of the memory should be disabled until the preceding MC actions are verified for trustworthiness”. The DfD trace macrocell associated with the MC is configured for part/all of the corresponding security wrapper events for verification. Besides the processor core wrapper is configured with generic security critical events to monitor by SPC and the DMA engine by appropriate boot F/W.

2. During normal system level execution, for a security critical program executing on the processor core, a cache miss occurs for a LD/SW to the secure memory region, with an associated cache block eviction. The processor wrapper checks whether the memory access is even allowed according to the privilege level of the program, as required by the relevant security policies. The access passes the check, and the accesses (along with the evicted store) propagate to the memory controller, where they are added to the request buffer.

3. Consequently, along with the cache-missed LD/SW, the evicted SW is scheduled by the arbiter (inside the scheduler). This triggers the Trojan in the controller-to-memory interface to generate an additional shadow SW to the device memory of the attacker's interest (the payload). Hence both of these requests would access the physical memory next. However, the interface trigger controls of the MC wrapper, configured by SPC policy, have disabled the controller interface until the past MC events are verified for trustworthiness. On this memory access attempt, the interface logic triggers the security wrapper to send the event logs to the SPC for checking. But a triggered Trojan (same trigger, delayed) in the security wrapper monitor prevents the SPC from being notified of these events. At the same time, the DfD trace macrocell detects the corresponding events of 2 pending stores to physical memory and informs the SPC through the debug-wrapper interface and the frame generation logic. Receiving nothing from the security monitors, the SPC detects a malicious or undependable MC wrapper. The SPC, through the wrapper-SPC interface, commands the MC security monitors to send the event logs for verification.

4. Assuming the event logs are legitimate (they can again be verified by the DfD, though this is not shown here), the SPC analyzes them for satisfying correlation according to the MCE of the memory controller IP core. Consequently, according to the rule “all RD/WR accesses at the controller-memory interface must be preceded by their presence in the scheduler sub-unit as well as the request buffer”, the correlation disruption is caught and hence the Trojan action is detected (a sketch of this check is given after the list of steps). According to the policy requirements, the disable at the MC-to-memory interface is upheld and the corresponding accesses are rejected/dropped. The next access from the request buffer is served by the MC.

5. In due time (shortly afterwards, as the attacker does not want the corresponding device memory to be modified), the attacker, by running a program remotely or physically on the processor core, requests a data output from the corresponding memory (as dictated by the shadow SW address/es) to the device of interest (which may be a network adapter, Bluetooth, WiFi etc.). However, with the attack thwarted by the proposed solution, the adversary fails in the attempt to extract potentially security critical information.
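A simple C sketch of the correlation rule applied by the SPC in step 4 is given below; the log structures are illustrative, but the check itself mirrors the stated policy: the fabricated shadow store appears neither in the request buffer log nor in the scheduler log and is therefore flagged.

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { uint32_t addr; uint8_t is_write; } mem_op_t;

/* Recent logs from the MC security monitors: entries that appeared in the
 * request buffer and operations granted by the scheduler (illustrative). */
typedef struct {
    const mem_op_t *buffer_log;   size_t n_buffer;
    const mem_op_t *sched_log;    size_t n_sched;
} mc_event_logs_t;

static bool contains(const mem_op_t *log, size_t n, const mem_op_t *op)
{
    for (size_t i = 0; i < n; i++)
        if (log[i].addr == op->addr && log[i].is_write == op->is_write)
            return true;
    return false;
}

/* Policy: all RD/WR accesses at the controller-memory interface must be
 * preceded by their presence in the scheduler as well as the request buffer. */
bool mc_access_correlated(const mc_event_logs_t *logs, const mem_op_t *access)
{
    return contains(logs->buffer_log, logs->n_buffer, access) &&
           contains(logs->sched_log,  logs->n_sched,  access);
}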

Figure 6.9: a) Block Diagram Schematic of the SoC framework; the internal sub-units of the b) DLX processor, c) Representative memory controller and d) SPI controller

Table 6.5: Different Scenarios of Trojan (represented by payload) Coverage by Insertion of Security Monitors in three IP Cores of our framework

Scenario I.
   Example processor Trojans (payloads) covered: additional shadow (or hidden) LD/SW by mem/ALU stage, wrong PC select logic, erroneous decode, wrong WB op. calc./propagate.
   Example memory controller Trojans covered: drop/fabricate mem. req. by scheduler, wrong next req. select from policy (e.g. FIFO).
   Example SPI controller Trojans covered: error in serial clk gen. logic, wrong counter op. for wrong external bit Tx/Rx.

Scenario II.
   Example processor Trojans covered: all previous + PC jump/branch calc. w/o corres. instr., mem./WB stage perturbs ALU res./WB, ALU performs wrong calc.
   Example memory controller Trojans covered: all previous + drop/fabricate req. by buffer, error in bank/row/col calc. at interface.
   Example SPI controller Trojans covered: all previous + data tamper at Tx/Rx shift, data latch val. at incorrect times.

Scenario III.
   Example processor Trojans covered: all previous + cache sub-system altering mem. req. data/addr., wrong prog. branch T/NT.
   Example memory controller Trojans covered: NA.
   Example SPI controller Trojans covered: NA.

6.6 Overhead Analysis

In this section, we consider the hardware overheads incurred by the IP-Trust aware security monitors, inserted in the wrappers of different IP types, for varying Trojan coverage scenarios. We extend our SoC framework (used in chapters 4 and 5), developed in Verilog RTL, by inserting these security monitors. The base SoC model schematic, showing the major IP components, is illustrated in Fig. 6.9(a).

6.6.1 Security Monitor Implementations

To incorporate the proposed security monitors, we considered three IP blocks of different types, namely the DLX processor core, the memory controller and the SPI controller. Their major sub-units are shown schematically in Fig. 6.9(b), (c) and (d). Following the methodology described in the architecture section for untrusted IP cores, security monitors are inserted in their wrappers at varying spatio-temporal granularities to verify different Trojan coverage scenarios. Table 6.5 lists the IP-level Trojans (in terms of their effect/payload) against which protection is ensured by these monitors and the necessary IP-Trust aware security policies in the SPC. Interface trigger logic was incorporated within the wrappers as well. For event communication to the SPC, a separate, additional frame-based interface has been incorporated in the IP wrappers to avoid contention between the two traffic types (generic events and the proposed MCE logs). We note that it is infeasible for the SPC to perform complete emulation of the high-level, intended IP operation; hence monitors must be placed at intermediate strategic locations in the IP, as discussed in the architecture section, to enhance the efficiency of Trojan coverage. In scenario I of DLX Trojan coverage, monitors are placed at the input of the fetch stage, the output of the decode stage, and the output of the memory access stage, whereas in scenario II, monitors are additionally inserted at the ALU outputs and write-back outputs. Malicious logic triggering a branch/jump for a particular unsatisfied condition (which implies there should be no branch/jump under normal operation) is detected using the ALU monitor in scenario II. Similarly, for the memory controller (MC), monitors at the outputs of the request buffer and the scheduler can detect any malicious activity in the scheduler hardware or firmware in scenario I. Additionally, storing the memory requests arriving at the input interface of the MC allows Trojans in the buffer to be detected as well in scenario II. The framework has been functionally validated using ModelSim for typical use cases. All area/power analyses have been performed using a 32nm technology library.

6.6.2 Results

The estimated area and power overheads for the inserted security monitors and interface triggers in the three scenarios of Trojan coverage for the processor core are shown in Table 6.6. We note that the overhead is calculated here with respect to the base design of the DLX core with its standard security wrapper. The overhead varies between a relatively small 6% and 11% across the coverage scenarios.

Table 6.6: Area & Power Overhead of Security Monitors in Processor IP (Orig. Area and Power with 1 KB inst., data memory at 32 nm - 352405 µm2 , 12.56 mW)

Case I (32 b o/p): die area overhead 6.68%, power (active + leakage) overhead 6.92%
Case I (256 b o/p): die area overhead 7.17%, power overhead 7.32%
Case II: die area overhead 10.44%, power overhead 10.82%
Case III: die area overhead 11.68%, power overhead 11.62%

Table 6.7: Area & Power Overhead of Security Monitors in Memory Controller (MC) IP and SPI Controller IP (Orig. Area and Power with wrappers at 32 nm: MC - 629433 µm2, 13.81 mW; SPI - 5456 µm2, 0.298 mW)

Case I: Memory Controller area overhead 10.77%, active+leakage power overhead 14.04%; SPI Controller area overhead 29.08%, power overhead 19.12%
Case II: Memory Controller area overhead 11.16%, active+leakage power overhead 18.53%; SPI Controller area overhead 101.88%, power overhead 66.77%

Moreover, increasing the additional frame interface width from 32 bits to 256 bits, so as to transfer simultaneously the 8 temporal event logs stored in each monitor (8 being chosen according to the design details), incurs minimal additional overhead. Another point worth highlighting is that for the significantly increased Trojan coverage from scenario II to III, the corresponding increase in hardware overhead is minimal. This signifies that one could potentially gain high run time security against an untrusted IP at minimal hardware cost.

Similarly, the incurred hardware overheads for the memory controller and the SPI controller are listed in Table 6.7. For the memory controller, a 4 KB register based functional memory is also added to the base memory controller when calculating the security monitor overheads. Although for small IP designs the area/power overhead can be significant with respect to the base (e.g., in scenario II of the SPI controller), the overhead with respect to the full SoC die is insignificant for all IPs, as shown in Table 6.8. Along with our representative toy SoC, two commercial SoCs manufactured at 32 nm are also taken into consideration to calculate these approximate overheads.

Table 6.8: Die Area Overhead (OVH) of Security Monitors (SMs) with maximum Trojan coverage w.r.t. our SoC framework (Area - 13.1x10^6), Apple A5 APL2498 (Area - 69.6x10^6), Intel Atom Z2520 (Area - 40.2x10^6), all at 32 nm process technology

Processor: 0.31% OVH in our model, 0.059% in A5, 0.1% in Atom
Mem. Controller: 0.543% OVH in our model, 0.103% in A5, 0.175% in Atom
SPI Controller: 0.043% OVH in our model, 0.008% in A5, 0.014% in Atom

Note that the increase in the SPC overhead with respect to its base value, due to the incorporation of additional IP interfaces for event logs, control signals, and all the required IP-Trust aware policies as firmware in its instruction memory, has not been taken into account in this work. However, from the sample values in Table 6.8, we believe that even after incorporating the SPC overheads, the hardware overhead of the proposed architecture would remain minimal with respect to the full SoC die. Perhaps more generally, our experiments show the design parameters and trade-offs that a security architect must analyze to deploy the framework in an industrial SoC design environment.

6.7 Conclusion

In this chapter, we have presented, for the first time to our knowledge, a comprehensive analysis of trust issues at the SoC level caused by untrusted IP blocks. We have also presented a novel architecture-level solution to achieve trusted SoC operation with untrusted IPs. With the growing reliance on third-party IP blocks during the SoC design process, untrusted IPs are rapidly becoming a major security concern for SoC manufacturers. Design-time IP trust verification approaches proposed in the literature to date fail to provide high confidence in identifying Trojans of all forms and types, as well as possible exploits of apparently benign design artifacts such as the design-for-test infrastructure. The proposed architecture provides a relatively low-cost and robust defense against untrusted IPs. It employs fine-grained IP-Trust aware security policies to detect and prevent malicious operation of an untrusted IP at the system level. It builds system trust by relying on the trustworthiness of a minimal set of standard components (e.g. the design-for-debug structure and a security policy checker), which are amenable to comprehensive trust verification. The proposed architecture is evaluated for diverse use-cases, demonstrating its effectiveness for representative SoC designs. It is applicable, in general, to various types of SoC designs and is scalable to large, complex SoCs with an arbitrary number of IPs.

Chapter 7

Conclusion and Future Work

The thesis concludes with a highlight of the major contributions of the work and the future directions along which this research is being continued.

The overall high-level theme of the research presented in this dissertation is ensuring the security and trustworthiness of the underlying hardware layer of modern-day electronic systems. Part I of this thesis proposes three low-cost design/validation approaches for protection against recycled and cloned ICs, the two major forms of counterfeit ICs prevalent in the semiconductor supply chain at present. Security against recycled chips is achieved through an active defense technique which inserts one-time-programmable antifuses into the I/O port logic or the pad-to-pin interface, at the die or package level respectively. These normally open AFs, inserted at one or a few IC pins, disable the corresponding pin functions and prevent usage of these ICs until they are unlocked by a trusted party, e.g., the system designer. Once programmed, these AFs serve as a signature of previous usage, handling or tamper, sufficient to detect recycled/remarked ICs. Security against cloned ICs of different types, on the other hand, is achieved through an authentication approach utilizing securely stored OTP keys on die, or unique chip signatures generated from intrinsic, random variations in the program resistances of AFs designed with the same structure and programming parameters. The entropy for the IC-specific signature can also be extracted from inherent, typically random process/manufacturing related variations in input/output pin resistances across ICs, which incurs virtually zero design effort and die/package-level H/W overhead. Part I of the thesis elaborates on the methodologies and implementation details of these anti-counterfeiting techniques. Detailed security analysis, highlighting the strengths and weaknesses of each method, has been performed. Along with verification of effectiveness, the H/W overhead, wherever applicable, has been estimated through appropriate simulation studies, and hardware measurements have been conducted wherever required. Compared to existing DfS approaches, which are mostly confined to the research community, one or a combination of our proposed techniques offers the advantages of 1) dual protection against both recycled and cloned ICs, 2) minimal to virtually zero design effort and H/W overhead, 3) applicability to legacy designs, and 4) applicability to both digital and analog ICs.
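To make the signature-generation step concrete, below is a minimal illustrative sketch in C of how a binary chip signature could be derived from a set of measured resistances (AF program resistances or I/O pin resistances). It uses a generic pairwise-comparison quantization with a guard band against measurement noise; the pairing order, the GUARD_BAND_OHMS threshold and the function name are illustrative assumptions, not the exact signature-generation procedure described in Part I.

/*
 * Illustrative sketch only: derives a binary signature from measured
 * pin/antifuse resistances by pairwise comparison. This is a generic
 * quantization scheme; GUARD_BAND_OHMS and the pairing order are
 * assumptions for illustration.
 */
#include <stdint.h>
#include <stddef.h>
#include <math.h>

#define GUARD_BAND_OHMS 0.5  /* skip pairs too close to resolve reliably */

/* r[] holds n resistance measurements (in ohms) for one chip.
 * Returns the number of signature bits written into sig_bits. */
size_t generate_signature(const double *r, size_t n, uint8_t *sig_bits)
{
    size_t bit = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        double diff = r[i] - r[i + 1];
        if (fabs(diff) < GUARD_BAND_OHMS)
            continue;                          /* unstable pair, ignore  */
        sig_bits[bit++] = (diff > 0) ? 1 : 0;  /* one bit per stable pair */
    }
    return bit;
}

In such a scheme, a chip whose regenerated signature deviates from its enrolled golden signature by more than a small tolerance would typically be flagged as suspect, e.g., as a potential clone.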

Part II of this dissertation focuses on a flexible architectural framework for a systematic, methodical approach towards the implementation of system-level security policies in a modern SoC. These policies typically govern access to security-critical assets distributed across multiple IPs of the SoC, thereby protecting them from confidentiality, integrity and availability attacks originating from the software stack, firmware, as well as the external interfaces between the SoC and the system. The proposed framework revolves around a centralized, micro-controlled security policy controller (SPC), which executes the system-level policies stored in its instruction memory (e.g., flash, ROM) and asserts the necessary security controls. The SPC serves as the single location for analysis, validation and upgrade of policies, thereby allowing a disciplined, formal approach towards policy implementation. As a result, the existing issues with post-Si validation and on-field upgrades/patches of SoC policies can be substantially alleviated. The SPC communicates with the constituent IP blocks through augmented smart security wrappers, which extract the security-critical internal events of interest from the IP and communicate them to the SPC. These wrappers provide a standard interface to extract security-related information and activity while abstracting out the internal implementation details of the IP. They can be provided by the IP vendors and can be configured by the SPC at boot time to detect a subset of events according to the particular usage scenario. We have provided details of the implementation of the SPC and the IP security wrappers, focusing on the clustering of standard security-critical events according to a few broad categories of IPs. Using a representative SoC model in Verilog RTL, the architectural framework has been implemented and functionally verified for a set of typical use cases. The H/W overhead of the IP security wrappers and the SPC with respect to a typical modern-day SoC has been experimentally estimated to be minimal.
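As a concrete illustration of how such a policy might be expressed as SPC firmware, the sketch below (in C) checks a single access-control rule on an event reported by an IP security wrapper. The event layout, the identifiers and the helper routines (spc_in_secure_debug_mode, spc_block_and_log) are hypothetical placeholders used only for illustration; they are not the actual interfaces of the implemented SPC.

/*
 * Illustrative sketch of one access-control policy expressed as SPC
 * firmware. All names below are hypothetical placeholders, not the
 * thesis's real event encoding or SPC services.
 */
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint8_t  source_ip;   /* which IP's security wrapper raised the event */
    uint8_t  event_type;  /* e.g., register access, debug access          */
    uint32_t address;     /* security-relevant address reported by wrapper */
} wrapper_event_t;

/* Hypothetical SPC services (placeholders). */
bool spc_in_secure_debug_mode(void);
void spc_block_and_log(const wrapper_event_t *ev);

enum { CRYPTO_IP = 3, DEBUG_READ = 0x10,
       KEY_REG_BASE = 0x4000, KEY_REG_END = 0x40FF };

/* Policy: outside the secure-debug state, any debug read of the crypto
 * IP's key registers is disallowed; the SPC asserts the blocking control
 * and records the violation. */
void policy_protect_crypto_keys(const wrapper_event_t *ev)
{
    if (ev->source_ip == CRYPTO_IP &&
        ev->event_type == DEBUG_READ &&
        ev->address >= KEY_REG_BASE && ev->address <= KEY_REG_END &&
        !spc_in_secure_debug_mode()) {
        spc_block_and_log(ev);
    }
}

Because such checks live in the SPC's instruction memory, validating or patching a policy amounts to analyzing or updating firmware of this form rather than modifying IP hardware.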

To reduce the design effort and H/W overhead involved in security wrapper implementation, as well as to enable much greater flexibility in on-field upgrades/patches in response to bugs or changing security requirements encountered on-field or during post-Si validation, we have proposed re-purposing the typically resource-rich on-chip debug instrumentation to detect these security-critical events, as well as to extract new ones for policy upgrades/patches. Taking advantage of the high degree of observability and controllability of IP designs enabled by the local debug trace macrocells (required for post-Si validation and on-field tests), the SPC configures this local DfD at boot time to detect and send the necessary IP events of interest. A debug framework similar to the ARM CoreSight architecture has been incorporated into our SoC model, and its functionality has been fully validated in common usage scenarios. The potential savings in wrapper area overhead for different IPs due to debug re-purposing have also been experimentally estimated to be significant. Finally, the security policy framework has been enhanced with the required run-time architecture support to implement IP-Trust aware security policies, in order to ensure security and reliability of SoC operation in the presence of inherently untrustworthy constituent third-party IP blocks. Hence, along with the earlier attack models involving S/W and/or SoC-system interfaces, the threat model also incorporates Trojan attacks via malicious/covert hardware or firmware logic, and/or typical unintentional vulnerabilities in these third-party IPs. The methodology is based on isolating the rogue activity/payload of the Trojan within the corresponding IP itself. Rogue action is detected by verifying the correlation between typical high-level IP temporal events via appropriate IP-Trust aware security policies, and any malicious security wrapper can be detected by cross-checking against the local DfD associated with the IP. This added architecture support was incorporated into our SoC framework, and a set of Trojan use cases was analyzed. The associated H/W overhead of this run-time defense against IP-level Trojans was calculated to be low with respect to modern SoC values.
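As a rough illustration of the correlation checks underlying such IP-Trust aware policies, the sketch below (in C) accepts a bus write observed from an untrusted IP only if it matches an earlier, wrapper- or DfD-reported write request to the same address within a bounded cycle window; an uncorrelated write would be treated as potential Trojan payload for the SPC to isolate. The window size, record fields and function name are illustrative assumptions rather than the thesis's exact implementation.

/*
 * Illustrative sketch of an IP-Trust aware correlation check. The event
 * record layout and MAX_CORRELATION_WINDOW are assumptions for
 * illustration only.
 */
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define MAX_CORRELATION_WINDOW 64   /* cycles within which events must match */

typedef struct {
    uint32_t address;    /* target address reported for the event  */
    uint64_t timestamp;  /* cycle count when the event was observed */
} ip_event_t;

/* Returns true if the observed bus write matches some earlier request from
 * the same IP within the correlation window; false marks it as potential
 * rogue (Trojan) activity to be isolated by the SPC. */
bool write_is_correlated(const ip_event_t *bus_write,
                         const ip_event_t *requests, size_t n_requests)
{
    for (size_t i = 0; i < n_requests; i++) {
        if (requests[i].address == bus_write->address &&
            bus_write->timestamp >= requests[i].timestamp &&
            bus_write->timestamp - requests[i].timestamp <= MAX_CORRELATION_WINDOW)
            return true;
    }
    return false;
}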

Future directions for Part I of this thesis include experimentally verifying the efficiency of the two proposed antifuse-based DfS methods against both recycled and cloned ICs. This involves fabrication and/or integration of the metal-insulator-metal antifuses and test fuses at the ports/pins of representative functional dies or packages. Experimentally verifying the uniqueness and robustness of the IC signatures, by measuring the program resistances of these antifuses, would establish their efficacy in practice. Furthermore, the highly promising pin-resistance based authentication needs to be tested on more digital ICs of different types, as well as on larger analog/mixed-signal chips.

For the proposed security policy architecture framework, future work includes analyzing the communication bandwidth and the associated power/energy profiles for communication fabrics of different types between the IP security wrappers and the SPC in typical usage scenarios, including crossbar, bus and network-on-chip fabrics. The effect of these communication patterns on system performance can be analyzed simultaneously. The developed SoC model also needs to be extended with more IPs of different types, and the inter-IP interactions enriched, to closely represent a large-scale realistic scenario.
