University of Connecticut OpenCommons@UConn
Doctoral Dissertations University of Connecticut Graduate School
8-18-2017 Computer Aided Design Solutions for Testability and Security Qihang Shi University of Connecticut - Storrs, [email protected]
Recommended Citation: Shi, Qihang, "Computer Aided Design Solutions for Testability and Security" (2017). Doctoral Dissertations. 1640. https://opencommons.uconn.edu/dissertations/1640

Computer Aided Design Solutions for
Testability and Security
Qihang Shi, Ph.D.
University of Connecticut, 2017
As technology down-scaling continues, new technical challenges emerge for the Integrated Circuits (IC) industry. One direct impact of down-scaling in feature sizes is elevated process variation, which complicates timing closure and requires classification of fabricated ICs according to their maximum performance. To address this challenge, speed-binning based on on-chip delay sensor measurements has been proposed as an alternative to current speed-binning methods. This practice requires advanced data analysis techniques for the binning result to be accurate. Down-scaling has also increased transistor count, which puts an increased burden on IC testing. In particular, the growth in area and capacity of embedded memories has led to overhead in test time and loss of test coverage, especially in System-on-Chip (SOC) designs. Indeed, the expected increase in logic area between logic and memory cores will likely further undermine the current solution to the problem, the hierarchical test architecture. Further, the widening use of information technology has widened security concerns. In today's threat environment, both hardware Intellectual Property (IP) and security-sensitive software information can become targets of attacks, malicious tampering, and unauthorized access. Therefore, existing design flows must be properly improved to address these new challenges.
Among possible options, mathematical optimization and novel architectural designs have time and again proved to be the most promising approaches. In this work, we focus on developing new tools for this purpose by augmenting the existing design flow. Implementation results on benchmarks prove the proposed solutions to be effective and efficient.

Computer Aided Design Solutions for
Testability and Security
Qihang Shi
B.E., Tsinghua University, Beijing, 2001
M.S., Lund University, Lund, 2009
A Dissertation
Submitted in Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
at the
University of Connecticut
2017
Copyright by
Qihang Shi
2017
APPROVAL PAGE
Doctor of Philosophy Dissertation
Computer Aided Design Solutions for
Testability and Security
Presented by
Qihang Shi, M.S.
Major Advisor
Mark M. Tehranipoor
Associate Advisor
Domenic Forte
Associate Advisor
Lei Wang
Associate Advisor
Rajeev Bansal
Associate Advisor
John Chandy
University of Connecticut
2017
To My Loving Family
ACKNOWLEDGEMENTS
I am deeply grateful to my advisor, Dr. Mark M. Tehranipoor, for his support and guidance during my Ph.D. study at the University of Connecticut. Pursuing a Ph.D. under his advisement was an opportunity that I treasured greatly, and I hope in the end I have delivered enough to match the confidence invested in me. At times, it seemed questionable whether I would ever think fast enough to keep up with his analytical and critical prowess; yet as I pushed myself over the edge, I eventually saw myself performing at a rate I had not considered possible for me. Dr. Tehranipoor will forever remain a shining example of excellence as a researcher, mentor, and entrepreneur in my heart.
I would like to also express my gratitude to Dr. Domenic Forte for his inspiring and patient guidance over the past two years. Many of my small achievements would not have been possible without his insightful comments and advice. I shall admire and try my best to emulate his academic vision and diligent work ethic in my future career. I would like to offer my sincere thanks to Dr. John Chandy, Dr. Rajeev Bansal, and Dr. Lei Wang for joining my advisory committee and providing constructive feedback. Dr. Chandy and Dr. Bansal both kindly offered help when I most needed it, for which I shall remain forever grateful.
I would like to further convey my thanks to all the industrial coordinators who have supervised and advised my work, Mr. LeRoy Winemberg, Dr. Ramesh Tekumalla, and Dr. Lisa McIlrath, who extended generous aid during my training to become a competent researcher.
Special thanks to all my labmates: Shuo Wang, Hassan Salmani, Xuehui Zhang, Wei Zhao, Jifeng Chen, Fang Bao, Niranjan Kayam, Kan Xiao, Ujjwal Guin, Gustavo Contreras, Kun Yang, Md. Tauhidur Rahman, Adib Nahiyan, Miao He, Zimu Guo, Mehdi Sadi, Fahim Rahman, and Halit Dogan for their help and cooperation. The discussions and exchanges of knowledge enriched my skills and experience. I am also deeply indebted to the teachers, mentors, classmates and friends in my life. There are so many of them that I cannot list all their names.
Words cannot delineate my appreciation to my family for their love and support. My parents, Yexin Shi and Yueping Zhou, whose confidence and support carried me through my darkest years into a brighter future: my work would have been impossible without your support through the years. As I dedicate this humble work to you, I hope it furnishes a little consolation for all the paths we opted not to tread.
TABLE OF CONTENTS
1. INTRODUCTION ...... 1
1.1 Related Work ...... 4
1.1.1 Speed-binning with Ring Oscillators and Modeling of Process Variations ...... 4
1.1.2 Rising Time Cost in System-on-Chip (SoC) Memory Tests ...... 6
1.1.3 Threat and Countermeasures of Circuit Microprobing Attacks ...... 8
1.1.4 Threat of Untrusted Foundries and Existing Countermeasures ...... 10
1.2 Motivations and Contributions ...... 22
2. OPTIMIZED SENSOR SELECTION WITH GENETIC ALGORITHM 27
2.1 Optimization Algorithm ...... 29
2.2 Simulated Binning using SPICE ...... 32
2.2.1 Objective and Method ...... 32
2.2.2 Model Assumptions used in this Simulation ...... 33
2.2.3 Method of Implementation ...... 34
2.2.4 Simulation Results ...... 36
2.3 Experimental Result using Silicon Data ...... 40
2.4 Sensor Design ...... 40
2.5 Measurements ...... 42
2.6 Optimization Results and Analyses on Silicon Data ...... 44
2.7 Summary ...... 46
3. CONCURRENT APPLICATION OF MEMORY AND LOGIC TESTS 48
3.1 Ensuring Concurrency ...... 51
3.2 Testing Memory Access Paths ...... 57
3.3 Contribution to Test Scheduling ...... 65
3.4 Experimental Results ...... 66
3.5 Discussion on Results ...... 74
3.6 Summary ...... 76
4. A LAYOUT-DRIVEN FRAMEWORK TO EVALUATE VULNERABILITIES TO MICROPROBING ATTACKS ...... 77
4.1 Modeling Bypass Attack ...... 81
4.2 Algorithm to Find Exposed Area ...... 85
4.3 Evaluation Results ...... 89
4.3.1 Evaluation of algorithm ...... 89
4.3.2 Performance evaluation of active shield ...... 91
4.3.3 Future research directions ...... 92
4.4 Summary ...... 93
5. OBFUSCATED BUILT-IN SELF-AUTHENTICATION (OBISA) .... 94
5.1 Combining BISA with Split Manufacturing Techniques ...... 98
5.2 Obfuscated Built-In Self-Authentication via Wire Lifting ...... 102
5.2.1 Cost of Wire Lifting ...... 105
5.2.2 Fast Wire Lifting ...... 108
5.2.3 Implementation Flow ...... 117
5.3 Experimental Evaluation ...... 127
5.3.1 Performance of BP-based Wire Lifting Algorithm ...... 129
5.3.2 Pin-based vs. Cell-based Definition of Edges ...... 132
5.3.3 Security Evaluation with Known Layout-based Metrics ...... 132
5.3.4 Implementation and Overhead on Large Designs ...... 136
5.4 Summary ...... 142
6. CONCLUSION ...... 143
6.1 Future Work ...... 144
6.2 Impact of the Dissertation Work ...... 146
Bibliography 147
LIST OF FIGURES
1.1 Typical cross-sections of microprocessors (MPU) and Application-Specific
Integrated Circuits (ASIC) [1] ...... 9
1.2 Typical split manufacturing arrangement (assuming split made between
Metal 2 and Metal 1 layers)...... 11
1.3 Structure of (a) BISA, (b) four-stage LFSR, and (c) four-stage MISR. . . . 14
1.4 Principle of k-security applied to a full adder circuit as an example. . . . . 26
2.1 Block diagram of an RO sensor...... 27
2.2 Block diagram of a Path-RO sensor...... 28
2.3 Flowchart of the proposed optimization algorithm...... 30
2.4 Accuracy of SVM classification without optimizing sensor selection. . . . . 37
2.5 Evolution of SVM classification accuracy under GA-based optimization of
sensor selection...... 39
2.6 On-chip sensor block layout (left) and their locations (right) in the fabricated SoC die. In the figure, RO sensors are denoted as "IR sensor" and Path-RO sensors as "PUT". ...... 40
2.7 Per-die average of normalized measured sensor delay, organized by sensor
modules, compared against FMAX...... 42
2.8 Evolution of an exemplary process of proposed optimization algorithm. . . 45
2.9 Frequency of each sensor appearing in best accuracy sensor selection list,
weighted by classification accuracy...... 45
3.1 Timing scheme of sequential application of logic and memory test...... 48
3.2 A typical memory access structure...... 49
3.3 Proposed architecture: Test Scheme 1 - Bypass to leave MLFFs and MCFFs
outside of memory test path: MUX inserted at MLFF Q-pins (solid red,
solid line); Test Scheme 2 - Include MLFFs and MCFFs in memory test
path: MUX inserted at MLFF SI-pins (slashed red, dashed line). . . . . 52
3.4 Comparison of test time between (a) Scheme 1 (Q-pin bypass) and (b)
Scheme 2 (SI-pin bypass)...... 54
3.5 Comparison of test time between (a) Scheme 1 (Q-pin bypass) and (b)
Scheme 2 (SI-pin bypass)...... 56
3.6 Pattern rearranging algorithm...... 58
3.7 Test generation approaches: (a) Unaltered MLFF scan chain (black) with
additional circuitry to reconfigure it into decompressor for determinis-
tic test approach (solid red/solid line) and circuitry to reconfigure it into
linear feedback register (LFSR) for pseudo random pattern approach
(slashed red/dotted line); (b) MCFF scan chain (black) reconfigured
into response compactor (empty red/solid line) for both approaches. . . 62
3.8 Comparison of test time between proposed TPG approach and scan test. . . 74
4.1 Diagram of known microprobing techniques for assessment of design vul-
nerability...... 80
4.2 Assumptions on FIB-based milling undertaken in this study...... 82
4.3 Geometric calculations for non-perpendicular milling scenario...... 84
4.4 Finding milling-exclusion area. (a) Milling-exclusion area on sides of in-
tersecting wire; (b) Milling-exclusion area on ends of intersecting wire. 86
4.5 Exemplary results produced by proposed algorithm. (a) Exemplary tar-
geted wire (highlighted) in layout; (b) Mill-exclusion area (black) pro-
jected on canvas of same wire...... 89
5.1 A sample split manufacturing arrangement. It assumes the split is made between Metal-2 and Metal-1 layers...... 96
5.2 Example: Due to OBISA insertion, wire lifting optimization on the same
full adder can achieve higher k-security rating...... 105
5.3 Edge types and vertex types: How to constrain the wire lifting problem to
make it easier to solve...... 109
5.4 Implementation flow of proposed OBISA technique on a reasonably-sized
layout...... 119
5.5 Circuit432 layout, with and without wire lifting optimization...... 121
5.6 Hierarchical wire lifting: Apply OBISA flow to each logic module, then
integrate into final GDSII...... 122
5.7 Layout partitioning for simplified wire lifting...... 124
5.8 Flow of partitioned wire lifting...... 126
5.9 Neighborhood connectedness (C(R)) curve as radii (R) increases, on Cir-
cuit432 layouts...... 135
LIST OF TABLES
2.1 Parametric assumptions used in mathematical model for path and sensor
delay simulations...... 35
2.2 Comparison between optimized classification and SVM of same total training size...... 39
2.3 Design of 8 RO sensor modules in each sensor block...... 41
2.4 Design of 8 Path-RO sensor modules in each sensor block...... 41
2.5 Classification accuracy using linear SVM...... 43
3.1 Area overhead of both bypass schemes...... 67
3.2 Components for test time calculation...... 69
3.3 Components for test time calculation...... 70
3.4 Fault coverage of three test generator approaches...... 73
3.5 Overall area overhead and test time reduction in percentage...... 75
4.1 Performance against known microprobing techniques of published designs . 81
4.2 Maximum achievable reduction of deff by milling at an angle...... 85
4.3 Evaluation results on long nets and nets on low layers ...... 90
4.4 Evaluation of active shield performance ...... 91
5.1 Comparison of Binary Programming (BP) and greedy algorithm-based wire
lifting in terms of ne kept and time consumption...... 129
5.2 Comparison of security and ne between cell-based and pin-based definition
of edges...... 132
5.3 Success rate of proximity attacks...... 134
5.4 Standard cell composition bias of key_sel and DES toplevel...... 136
5.5 Power, timing, and area overheads of wire-lifted DES modules...... 138
5.6 Power, timing, and area overheads of wire-lifted AES modules...... 138
Chapter 1
INTRODUCTION
Advances in IC fabrication technology have introduced a number of benefits to the industry. For example, integration of exponentially more devices is made possible with each newer generation of systems, and the decreasing cost of cryptographic hardware has supported the informatization of everyday life.
On the other hand, these advances did not come without cost.
Process variations on devices manufactured at lower feature sizes have increased.

Conventionally, speed-binning has been done with functional tests, while more recent techniques use structural tests [2, 3]. However, both techniques have limitations. Due to these limitations, techniques based on post-silicon measurements with on-chip delay sensors have been developed [4–11]. In particular, Ring Oscillators (RO) and path-based ROs have been used with reported success. However, for FMAX estimation, which is closely related to the speed-binning result, using data from all sensors might not be the best strategy; an optimized subset of sensors could yield faster and more accurate speed estimation.
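To make the idea of optimized sensor selection concrete, the following sketch evolves a sensor subset with a small genetic algorithm. Everything here is an assumption for illustration: the synthetic per-chip sensor readings, the two speed bins, and a nearest-centroid classifier standing in for the SVM classification developed in Chapter 2.

```python
import random

random.seed(0)

N_SENSORS = 12   # hypothetical on-chip delay sensors
N_CHIPS = 60

def make_chip():
    """Synthetic die: a true speed bin plus one reading per sensor.
    Sensors 0-7 track the bin; sensors 8-11 are pure-noise outliers."""
    speed_bin = random.randint(0, 1)          # 0 = slow, 1 = fast
    readings = [1.0 - 0.2 * speed_bin + random.gauss(0, 0.05)
                for _ in range(8)]
    readings += [random.gauss(1.0, 0.3) for _ in range(4)]
    return readings, speed_bin

chips = [make_chip() for _ in range(N_CHIPS)]
train, test = chips[:40], chips[40:]

def accuracy(mask):
    """Nearest-centroid binning accuracy using only the sensors in mask."""
    sel = [i for i in range(N_SENSORS) if mask[i]]
    if not sel:
        return 0.0
    cent, cnt = {0: [0.0] * len(sel), 1: [0.0] * len(sel)}, {0: 0, 1: 0}
    for r, b in train:
        cnt[b] += 1
        for j, i in enumerate(sel):
            cent[b][j] += r[i]
    for b in (0, 1):
        cent[b] = [v / cnt[b] for v in cent[b]]
    ok = 0
    for r, b in test:
        dist = {c: sum((r[i] - cent[c][j]) ** 2 for j, i in enumerate(sel))
                for c in (0, 1)}
        ok += min(dist, key=dist.get) == b
    return ok / len(test)

def evolve(pop_size=20, gens=15):
    """Elitist GA over sensor-selection bitmasks."""
    pop = [[random.randint(0, 1) for _ in range(N_SENSORS)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=accuracy, reverse=True)
        elite = pop[:pop_size // 2]
        while len(elite) < pop_size:
            a, b = random.sample(pop[:pop_size // 2], 2)
            cut = random.randrange(1, N_SENSORS)
            child = a[:cut] + b[cut:]           # one-point crossover
            if random.random() < 0.3:           # point mutation
                child[random.randrange(N_SENSORS)] ^= 1
            elite.append(child)
        pop = elite
    return max(pop, key=accuracy)

best = evolve()
print("selected sensors:", [i for i, bit in enumerate(best) if bit])
print("binning accuracy: %.2f" % accuracy(best))
```

Because the fitness function penalizes masks that include the noise-only sensors, the evolved subset tends to classify held-out dies more accurately than an arbitrary selection, which mirrors the motivation above.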
Increase in transistor count has led to increase in area and capacity of embedded memories in modern system-on-chips (SOCs). In 2013, the International Technology Roadmap for Semiconductors (ITRS) [12] predicted that memory and uncore logic in microprocessors and SOC designs will continue to grow in area with each technology node advance, while the percentage of core logic area remains almost the same. In the test section of its 2013 report, ITRS reported [13] that 40% of respondents considered test cost one of their major concerns, with 85% expecting it to become the biggest concern in the future. The trend of aggressive growth in embedded memories necessitates improvement in testing methodologies and further reduction of test cost. Research to improve the test of embedded memories is one such direction.
Storing security-sensitive information in portable devices has made access to information convenient, but it has also opened up possibilities of physical attacks. Microprobing is one kind of physical attack that directly probes signal wires in order to extract sensitive information [14]. In a successful microprobing attack, plaintexts such as personal data, code-format intellectual property (IP), or even encryption keys can be compromised [15]. Most security-critical ICs are reinforced against microprobing attacks with an active shield that zeroizes sensitive information once a breach has been detected. However, major problems exist with this approach. Active shields are designed to cover the entirety of the die; in some designs more than one metal routing layer is required. This puts a prohibitively high cost on the design, and leaves ICs fabricated with technologies offering a smaller number of available routing layers dangerously exposed to probing attacks. Furthermore, research has shown that using active shields in the top metal layer of an IC can be very ineffective against microprobing attacks [16].
Finally, advanced fabrication technology has increased in cost, forcing outsourcing of fabrication and casting doubt on IP security in a globalized IC supply chain. Most semiconductor companies now outsource fabrication to off-shore state-of-the-art foundries, which leads to gradual loss of control over the whole supply chain. Split manufacturing has been proposed recently as an approach to enable use of state-of-the-art semiconductor foundries while minimizing the risks to IP [17, 18]. This method divides a design into Front End of Line (FEOL) and Back End of Line (BEOL) parts for fabrication at different foundries. An untrusted foundry then only has access to FEOL information, which is insufficient for it to constitute a threat to the IP security of the design. Naturally, the security of split manufacturing depends on this assessment being true. However, it is currently difficult to verify this property. Attacks and protections have been proposed to clarify this very issue, but as of yet no definite answer exists for the security of split manufacturing.
In light of these challenges, the current design flow needs to adapt to properly address them. Research already exists that seeks to address these issues, as detailed in Section 1.1.
1.1 Related Work
Considerable research interest has been invested in all four of these challenges to modern IC design. Existing literature is summarized in the following sections according to the design challenges it seeks to address.
1.1.1 Speed-binning with Ring Oscillators and Modeling of Process Variations
In 2004, Wang et al. proposed using ring oscillator (RO) based sensors to perform binning of wafers [4]. This approach uses a simple inverter chain, which might not fit well with the task of predicting the speed of an IC, which is determined by its slowest timing paths. Therefore, the authors in [5] proposed to improve it with a novel path-based RO sensor called the Path-RO sensor. Instead of using a simple inverter chain, the Path-RO sensor is based on actual critical paths of the functional design, and is believed to yield more accurate path delay information than traditional inverter ROs. Unfortunately, modern designs contain thousands of paths that might become critical due to process variation, far more than a few sensors can represent. To address this, in [10] the technique is further improved by introducing methods to design paths whose delay is representative of a number of potential critical paths instead of only of itself. However, a representative sensor may still become less than representative due to intra-die variation. Therefore, in addition to improved sensor designs, a widely used technique is to insert multiple sensors at a number of locations.
Some techniques, such as the one reported by Liu et al., use a statistical model to perform learning from measurements collected with on-chip RO sensors so that circuit delay can be predicted [19].
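A minimal version of such prediction might look like the sketch below, where ordinary least squares stands in for the statistical model of [19]; the die data, the 1/delay relation, and the bin cutoffs are all illustrative assumptions.

```python
import random

random.seed(1)

# Synthetic silicon data: (mean RO delay in ns, measured FMAX in MHz) per die.
# Assumed ground truth: FMAX is inversely proportional to RO delay, plus noise.
dies = []
for _ in range(50):
    ro = random.uniform(0.9, 1.1)
    dies.append((ro, 2000.0 / ro + random.gauss(0, 20)))

# Fit FMAX ~ a + b * (1 / ro) by ordinary least squares on x = 1 / ro.
xs = [1.0 / ro for ro, _ in dies]
ys = [fmax for _, fmax in dies]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
     / sum((x - xbar) ** 2 for x in xs))
a = ybar - b * xbar

def predict_fmax(ro_delay):
    """Predict a die's FMAX (MHz) from its mean RO delay (ns)."""
    return a + b / ro_delay

def speed_bin(ro_delay, cutoffs=(1950.0, 2050.0)):
    """Bin a die from its predicted FMAX (illustrative cutoffs)."""
    f = predict_fmax(ro_delay)
    return "slow" if f < cutoffs[0] else "typical" if f < cutoffs[1] else "fast"

print("fit: FMAX = %.1f + %.1f / ro" % (a, b))
print(speed_bin(1.08), speed_bin(1.00), speed_bin(0.92))
```

The point of the sketch is the workflow, not the model: measure sensors post-silicon, learn a delay predictor, then bin each die from its predicted FMAX.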
The first problem with current application of process variation models and RO-based on-chip speed sensors stems from difficulties in mathematical modeling of process variation. To begin with, many assumptions commonly made in these models lack substantiation in reality. These include assumptions that are "typically true," such as the assumption that all sources of influence are additive, and assumptions made simply for the sake of simplicity, for example that systematic variation follows a spherical spatial correlation. Indeed, it has been suggested that within-die and die-to-die variations might not be statistically independent [20], which makes even the "typically true" assumptions shaky. To ensure validity, many assumptions taken in these models are corroborated with silicon data, which depend heavily on the particular technology used to fabricate the test chips from which the data are gathered. For example, systematic and random variations are assumed to be equal in [20] based on empirical data in 32-nm technology [21]. For another design fabricated with another technology, these assumptions might not hold. In other cases, details about the technology are withheld due to limited intellectual property authorization, which likewise restricts precise modeling. Therefore, methods based purely on mathematical modeling of process variation, such as statistical static timing analysis (SSTA), do not account for all application scenarios or satisfy all practical requirements. Moreover, on-chip speed sensors extract post-silicon data but are designed pre-silicon. In practice, these sensors are typically placed close to functional circuits for the purpose of improving correlation in systematic variation [5] with functional circuits in their vicinity, or placed at regular grid locations to better represent the overall picture on the whole die [22].
Both practices rest on an assumption often used in mathematical models of process variation: that systematic variation is correlated with distance alone, i.e., that it is homogeneous and isometric. However, this might not be realistic considering the nature of variation sources. For example, variation in effective channel lengths could be caused by accumulative drift during photolithography, which obviously will be neither homogeneous nor isometric. When the real correlation does not match the model assumption, sensors could be placed at on-chip locations that poorly represent the process variation information they are designed to capture. Therefore, in reality it is unavoidable for speed sensors thus placed to include outlier sensors that negatively impact binning. These problems might be solvable with more precise mathematical models of process variation; however, a newer technology might invalidate all information gained on the old one. Further, more precise modeling might require more CPU time to solve, which is exactly the problem these works set out to alleviate by using simplified models in the first place.
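For reference, the "spherical spatial correlation" assumption criticized above is, in geostatistics, a distance-only kernel; the short sketch below (with an assumed range value) shows why such a model is blind to direction:

```python
# Spherical correlation model: full correlation at distance 0, decaying to 0
# at the range r:  rho(d) = 1 - 1.5*(d/r) + 0.5*(d/r)**3  for d < r, else 0.
def spherical_corr(d, r):
    if d >= r:
        return 0.0
    x = d / r
    return 1.0 - 1.5 * x + 0.5 * x ** 3

# Isotropy means only distance matters: a sensor 2 mm east of a path is
# assigned the same correlation as one 2 mm north -- exactly the property
# that a directional drift (e.g., a lithographic scan-direction error) breaks.
print(spherical_corr(0.0, 5.0))   # 1.0
print(spherical_corr(2.0, 5.0))
print(spherical_corr(6.0, 5.0))   # 0.0
```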
1.1.2 Rising Time Cost in System-on-Chip (SoC) Memory Tests
One approach to reduce memory test time is to run logic and memory tests at the same time [23–25]. These methods are mainly concerned with loading of MBIST controller vectors, but pay very little attention to whether test coverage is lost by the bypass on the memory interface.
Another more recent development that also allows concurrent test of memory and logic is test scheduling for individually wrapped design blocks [26–28]. Each wrapped design block, referred to as a "core," is tested with boundary registers whose values are shifted in through a test control line (e.g., one that complies with the IEEE 1500-2005 standard [28]), which is in turn connected to test processors and external automatic test equipment (ATE). The test processor regulates test control signals on the test control line and can be configured to allow multiple cores to be tested at the same time. Test scheduling algorithms have been developed to optimize concurrent test among all SOC cores [29, 30], resulting in significant reduction in test time compared to testing the blocks in sequence [29]. This approach is tailored for large SOC designs and has seen wide industry adoption [31, 32] and academic interest [33–35].
However, at a lower level the test is not conducted on paths that start and end in different cores. Usually memory cores are tested individually without involvement of any logic core that may access them, which could mean that the longest path to memory cells during functional mode is left untested. As the area of embedded memories increases in modern SOC designs, the test of memory access paths also increases in importance. Due to the increased area of both uncore logic and memory, interconnect load on memory access paths increases, resulting in aggravated transition and path delay faults (TDFs and PDFs) and creating possible critical or speed paths. Since memory access remains one of the most critical paths, delay tests on these paths have also become critical to speed-binning of the complete design [36]. Recent literature presents several methods to address this problem [37–39]. The authors in [37] showed a memory test optimization algorithm for better fault coverage in memory and peripheral logic. Another work proposed various improvements to memory interface tests, preventing X-propagation, allowing detection of memory input faults, and improving test time through scan architecture modification [38]. Despite the improvement in test time, this method remains a scan-based test and does little to ensure concurrent application of logic and memory tests. Another recent approach proposed to test the memory access paths by scanning in test patterns using the fan-in and fan-out registers [39]. This method solves PDF test for memory access paths; however, it does not allow concurrent application of logic and memory tests.
1.1.3 Threat and Countermeasures of Circuit Microprobing Attacks
Circuit microprobing refers to techniques that allow an attacker to directly observe partial or full sensitive information, e.g., plaintexts or encryption keys. ICs designed for security-critical applications such as smartcards, microcontrollers in mobile devices, and security tokens [16, 40, 41] are among the most common victims of this kind of attack. Unfortunately, many of these applications also have exploitable security weaknesses [41], probably due to tight budget margins. Wires of targeted nets that the attacker wishes to reach are likely buried under multiple passivation, metal, and dielectric layers (shown in Figure 1.1).

Fig. 1.1: Typical cross-sections of microprocessors (MPU) and Application-Specific Integrated Circuits (ASIC) [1].

The attacker either has to mill through these obstacles, or mill through from the silicon substrate in so-called back-side attacks [42]. The latter are facilitated by utilizing either the phenomenon of Photon Emission (PE) or Laser Voltage Techniques (LVX) [43]. However, both methods require observation of photon emissions, limiting their effectiveness as feature size scales down. Back-side attacks can also be prevented by fabricating a back-to-back 3D IC to avoid leaving the back side exposed [44]. Therefore, it remains worthwhile for antiprobing designs to assume that the attacker mills through metal wires from the front side.
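The difficulty of front-side milling can be quantified with a little geometry. The sketch below uses the common FIB rule of thumb that a mill of depth d and aspect ratio R leaves a hole of diameter d_eff = d / R; the dimensions are illustrative assumptions, not values from this chapter:

```python
# Illustrative FIB geometry: a focused ion beam with aspect ratio R milling
# vertically to depth d leaves a hole of diameter d_eff = d / R at the target
# layer. A hole whose center falls within (d_eff + w_shield) / 2 of a shield
# wire's centerline severs that wire and trips the shield, so the area
# available for an undetected mill shrinks as d_eff grows.

def effective_diameter(depth_um, aspect_ratio):
    """Hole diameter (um) at the bottom of a vertical FIB mill."""
    return depth_um / aspect_ratio

def exclusion_halfwidth(depth_um, aspect_ratio, shield_wire_um):
    """Minimum center-to-centerline clearance from a shield wire for a mill
    that must not cut the wire."""
    return (effective_diameter(depth_um, aspect_ratio) + shield_wire_um) / 2.0

def exposed_span(pitch_um, depth_um, aspect_ratio, shield_wire_um):
    """Width of the gap between two adjacent shield wires (center-to-center
    pitch) still millable without touching either wire; <= 0 means the
    shield fully covers the gap at this depth."""
    return pitch_um - 2.0 * exclusion_halfwidth(depth_um, aspect_ratio,
                                                shield_wire_um)

# Example: 5 um to the target net, FIB aspect ratio 10, 0.2 um shield wires
# at 1.0 um pitch (all assumed numbers).
print("d_eff = %.2f um" % effective_diameter(5.0, 10.0))
print("exposed span = %.2f um" % exposed_span(1.0, 5.0, 10.0, 0.2))
```

This is the intuition behind exposed-area analysis: given a milling depth and FIB capability, each shield or functional wire projects an exclusion zone, and what remains is the area open to attack.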
Existing techniques designed to stop microprobing usually perform their duty by detecting tampering and then zeroizing sensitive information. The most widely studied and attempted approach is to detect hardware tampering by building a mesh of trigger wires to cover the design [44–48]. This is called an active shield, because the trigger wires are constantly monitored in order to detect an attack. Some shield designs are analog: for example, the authors in [45] use capacitance measurement to detect damage done to the shield, and thereby detect tampering. The problem with analog shield designs is that analog sensors rely on parametric measurement, which has been shown to be weaker [40]. Therefore, digital active shields have been proposed [44, 47, 48]. These methods send digital random vectors through the trigger wires and check whether the received vectors are altered. A milling through the mesh is reliably detected when it cuts off at least one of the trigger wires.

The digital active shield is the design most commonly implemented in industrial designs, so much so that published attacks have proposed exploits to defeat it [40, 41]. Conversely, this has also encouraged antiprobing designers to devise dedicated protections that invalidate these exploits. The authors in [44] investigated the problem of a possible prediction attack, where an attacker could predict the next random vector to be sent if the random vector generation is not secure enough. The authors then presented a design where block ciphers in Cipher Block Chaining (CBC) mode are used to generate secure random vectors. Another work proposed to obfuscate the layout routing of the active shield so that the attacker would not be able to figure out how to perform a successful rerouting attack [48].
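A toy bit-level model shows the detection principle shared by these digital shields; the 16-wire mesh, the LFSR pattern source, and the stuck-at-0 cut model are all illustrative assumptions (real designs use cryptographically secure generators, as discussed above):

```python
def lfsr16(state):
    """16-bit Fibonacci LFSR, taps 16,14,13,11 (a maximal-length polynomial)."""
    bit = ((state >> 15) ^ (state >> 13) ^ (state >> 12) ^ (state >> 10)) & 1
    return ((state << 1) | bit) & 0xFFFF

def run_shield(cut_wire=None, cycles=64):
    """Drive pseudo-random 16-bit vectors across 16 shield wires and compare
    at the receiving end; a milled wire is modeled as stuck-at-0."""
    state = 0xACE1                       # arbitrary nonzero seed
    for _ in range(cycles):
        state = lfsr16(state)
        sent = state
        received = sent if cut_wire is None else sent & ~(1 << cut_wire)
        if received != sent:
            return "tamper detected"
    return "ok"

print(run_shield())             # intact mesh
print(run_shield(cut_wire=5))   # wire 5 milled through
```

A cut wire goes unnoticed only while the expected vector happens to carry a 0 on that wire, so detection latency is a few cycles; the prediction attack mentioned above works precisely by forging `received` to match the next expected vector.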
1.1.4 Threat of Untrusted Foundries and Existing Countermeasures
Due to the globalization of the IC supply chain, IC fabrication has been off-shored to overseas sites, raising concerns about whether trust between the IP owner and the overseas IC fabricator can be established [49]. A foundry with malicious intent could conduct a number of attacks such as IP piracy [50], IC cloning and overproduction [51, 52], and hardware Trojan insertion [53]. Such a threat to IP security is often referred to as an untrusted foundry.
Fig. 1.2: Typical split manufacturing arrangement (assuming split made between Metal 2 and Metal 1 layers).
Research exists to address the threat of all possible attacks by untrusted foundries.

Split manufacturing, also known as split fabrication, is a novel technique proposed to address the threat to IP security posed by untrustworthy foundries [17]. The technique proposes to split an IC layout ready for fabrication into multiple parts and fabricate each part at a separate foundry. One example of split manufacturing, as shown in Figure 1.2, has the untrusted foundry manufacture the front-end-of-line (FEOL) part of the IC, which is then shipped to a trusted foundry that deposits the back-end-of-line part onto it. By denying the untrusted foundry complete layout information, split manufacturing prevents it from stealing IP information and deters attacks that require reverse engineering of the design. Meanwhile, techniques against hardware Trojan insertion fall into two categories characterized by how they address the issue. The first category focuses on detecting Trojans, either by functional verification, side-channel signal analysis, or new front-end design techniques such as design-for-trust [54–62]. Techniques in this category detect the existence of hardware Trojans by generating a signature of the circuit under test (CUT), then classifying the CUT with this signature. To perform classification, they require a golden model, i.e., the signature of a copy of the same circuit that is known to be free from hardware Trojans. Unfortunately, it remains doubtful whether golden models can be acquired for real-world applications. On the other hand, techniques focused on preventing hardware Trojans from being inserted do not have this requirement.
Built-in self-authentication (BISA) is the first technique proposed to prevent hardware Trojan insertion in the circuit layout and mask. By occupying all available spaces for Trojan insertion and detecting malicious removal through built-in self-test, BISA is able to deter hardware Trojan insertion without requiring golden models.
However, problems remain. To begin with, techniques against IP piracy do not usually consider the threat of hardware Trojan insertion, and neither do techniques against hardware Trojan insertion consider IP theft, even though both attacks could be committed by their intended adversary.
An untrusted foundry is characterized by two attributes. First, the service of the untrusted foundry is imperative; otherwise the design could be secured simply by replacing it with a trusted foundry. Second, an untrusted foundry cannot be trusted with the security of the intellectual property (IP), hence additional measures need to be taken to prevent it from jeopardizing that security. Since both attributes need to be present for split manufacturing to be necessary, we may assume all untrusted foundries in the adversarial model possess both.

A natural inference from this is that, to ensure the security of fabrication with split manufacturing, we must secure the design against all possible attacks the untrusted foundry is capable of. Due to the first attribute, the untrusted foundry is likely aware of its own importance, so there is nothing to deter it from trying every attack within its technical capability. At the same time, as a result of the second attribute, the IP owner has no reason to trust the untrusted foundry not to. In all likelihood, untrusted foundries will try all attacks in their arsenal, and IP owners will desire an overall solution that secures their designs against all of them. Therefore, a complete security solution to address the threat of untrusted foundries needs to consider all possible attacks, and in this regard both split manufacturing and BISA are limited.
Built-In Self-Authentication (BISA)
Built-In Self-Authentication (BISA) prevents hardware Trojan insertion by exhausting one resource essential to it: white space. Normally, during the placement step of back-end design, gates in the circuit are placed at optimized locations based on density and routability [63]. This leaves spaces in the layout that are not filled with standard cells. For design purposes other than security against hardware Trojan insertion, these white spaces can be filled with filler cells to serve as decoupling capacitors and/or extensions of power tracks [64]. However, unsupervised filler cells are prone to malicious removal by Trojan inserters in order to make room for hardware Trojans. If they are replaced by Trojan gates, no performance loss significant enough to raise suspicion is likely to be incurred, since Trojan gates are rarely triggered.
Fig. 1.3: Structure of (a) BISA, (b) four-stage LFSR, and (c) four-stage MISR.
BISA architecture BISA prevents hardware Trojan insertion by occupying white spaces with testable standard cells instead. Then, all inserted BISA cells are connected into tree-like structures to form a built-in self test (BIST) circuitry, so that they could be tested to verify no BISA cell has been removed. Removal of its member cells will lead to a BIST failure, so that no attempt to make room for hardware Trojans will evade detection. As shown in Figure 1.3a, BISA consists of three parts: the BISA circuit under test, the test pattern generator (TPG), and the output response analyzer (ORA).
The BISA circuit under test is composed of all BISA cells that have been inserted into unused spaces during layout design. In order to increase its stuck-at fault test coverage, the BISA circuit is divided into a number of smaller combinational logic blocks, called
BISA blocks, shown in Figure 1.3a. Each BISA block can be considered an independent combinational logic block. The TPG generates test vectors that are shared by all BISA blocks. The ORA processes the outputs of all BISA blocks and generates a signature. In prior work [65], the TPG has been implemented with a Linear Feedback Shift Register (LFSR), while the ORA has been implemented with a Multiple Input Signature Register (MISR). Examples of a four-stage LFSR and a four-stage MISR are shown in Figures 1.3b and 1.3c. They are used to generate random vectors and to compress responses into a signature. SFF in the figures represents a scan flip-flop. Other types of TPG and ORA can also be applied [66].
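As an illustration, the pattern-generation and compaction scheme can be sketched in Python. The 4-stage LFSR/MISR below assume the feedback polynomial x^4 + x^3 + 1 (a maximal-length choice picked for illustration); the actual polynomial, register widths, and circuit in [65] may differ.

```python
def lfsr_step(state, taps=(3, 2)):
    """Advance a 4-stage Fibonacci LFSR by one cycle.

    `state` is a 4-bit integer; `taps` are the assumed feedback bit
    positions, corresponding to the polynomial x^4 + x^3 + 1.
    """
    fb = ((state >> taps[0]) ^ (state >> taps[1])) & 1
    return ((state << 1) | fb) & 0xF

def misr_step(state, response, taps=(3, 2)):
    """Fold one 4-bit test response into a 4-stage MISR signature."""
    fb = ((state >> taps[0]) ^ (state >> taps[1])) & 1
    return (((state << 1) | fb) ^ response) & 0xF

def bist_signature(circuit, n_vectors=15, seed=0b1001):
    """Apply LFSR-generated patterns to `circuit` and compact the
    responses into a MISR signature, as a BIST controller would."""
    lfsr, misr = seed, 0
    for _ in range(n_vectors):
        misr = misr_step(misr, circuit(lfsr))
        lfsr = lfsr_step(lfsr)
    return misr
```

If an attacker removes or alters a BISA cell, `circuit` computes a different response, so the compacted signature deviates from the golden signature computed at design time.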
The main advantage of BISA over other techniques with similar objectives is that it has no golden chip requirement. Most other research on addressing the issue of Trojan insertion has focused on the development of:
1. Hardware Trojan detection techniques using functional verification and side-channel signal analysis (applied post-silicon), or
2. New design techniques to improve detection capabilities (applied to front-end design) [67].
Most detection approaches need golden chips, either fabricated by a trusted foundry or verified to be Trojan-free through reverse engineering, both of which are prohibitively expensive, if not impossible, in many scenarios. Since BISA relies on logic testing, process variation is not a factor either, in contrast to Trojan detection techniques based on side-channel analysis. As an additional advantage, the impact of BISA on the original design in terms of area and power is negligible.
Attacks on BISA The attack most likely to succeed against BISA is the so-called redesign attack. This attack replaces original circuitry with smaller, functionally equivalent circuitry to make room for Trojan insertion. Both BISA and the original circuitry can be targeted. Redesigning the original circuit will result in significant changes in electrical parameters, such as power and path delay, which could be detected by delay-based and power-based techniques [68–74]. Therefore, the attack is more likely to succeed against BISA cells. It is possible to further secure BISA cells by performing this optimization on the BISA design to prevent this particular attack.
However, the attacker can also choose to design a custom cell functionally equivalent to several BISA cells in order to make room for Trojan insertion. Preventing such an attack would require anticipating all possible custom cell designs that are functionally equivalent to any combination of BISA cells, which is not likely feasible except for very small BISA circuitry. Therefore, we believe this attack remains a possible threat to BISA security.
Further, BISA also suffers a few limitations on its performance. For example, due to the existence of the resizing attack, all BISA cells have to be the smallest-area variant among standard cells of the same function, which might make it easier for the attacker to identify them.
Split Manufacturing Security and Limitations
Prior works on split manufacturing security The prior work in this field [75–77] is motivated by one major objective: to establish a sound metric of security for designs fabricated using various split manufacturing methods. Studies focused on this problem generally agree that the key to split manufacturing security is how difficult it is for an untrusted foundry to recover the hidden BEOL information. Depending on the source of information perceived crucial for the untrusted foundry to obtain BEOL information, existing studies on the problem of security metrics can be classified into two categories: metrics that evaluate the layout, and metrics that evaluate the graph connectivity of the FEOL part of the circuit left observable.
Most researchers attack the problem from a layout point of view. Publications in this category often examine irregularities in the layout (also known as “hints” or “clues”) and theorize how they can be used by a hypothetical attacker. Authors in [75] proposed the proximity attack, which simulates an attacker who makes educated guesses on BEOL connections of open input/output pins in FEOL. [78] proposed to also consider load capacitance limitations as well as the direction of dangling wires, while [79] discussed more definitions of proximity based on known router behaviors. Another study [77] also uses layout information, but instead of performing hypothetical attacks, it seeks to define objective measures of the layout that might become useful to exploit.
This leads to the following metrics being proposed:
1. Neighborhood Connectedness (NC) is defined as the number of connections per neighbor cell within a range R, i.e. C(R) = (connections to neighbors in R) / (neighbors in R);
2. Standard-cell composition bias is designed to reflect imbalance in number of
instances of specific cells considered representative of design function∗;
3. Cell-level obfuscation measures the percentage of standard cells that have functionality hidden with obfuscation;
4. Entropy measures disorder in FEOL information, and is given by E = −∑_{i=1}^{N} p_i log(p_i), where N is the total number of cell types and p_i is the percentage of each cell type among all cells; and
5. Overheads in area and delay are used to measure the cost of the obfuscation
technique.
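For concreteness, the entropy metric above (item 4) can be computed directly from a list of FEOL cell instances. This is a minimal sketch; it assumes a base-2 logarithm, which the definition does not fix, and the function name is ours.

```python
import math
from collections import Counter

def feol_entropy(cell_types):
    """Entropy metric over standard-cell types visible in FEOL.

    cell_types: list of cell model names, one entry per cell instance.
    Returns E = -sum(p_i * log2(p_i)) over the N distinct cell types,
    where p_i is the fraction of instances belonging to type i.
    """
    counts = Counter(cell_types)
    total = len(cell_types)
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())
```

A layout that uses a single cell model has entropy 0, while a uniform mix over N models maximizes entropy at log2(N), i.e. maximal disorder for the attacker.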
Security metrics based on layout information have the advantage of being readily available to designers through layout editor tools, and have seen adoption in a number of recently published techniques intended to strengthen security for split manufactured designs [80, 81].
∗e.g. XOR gates for their correlation with cryptographic circuits, flip-flops for their correlation with state machines, adders for their correlation with arithmetic cores, etc.
Unfortunately, problems remain to plague this approach. First, since no attack has been reported to have successfully reconstructed BEOL connections from FEOL clues, all hypothetical attacks and objective metrics remain only as convincing as one another. Further, the proposed metrics themselves do not scale linearly with the difficulty of any conceivable attack, which makes it difficult for designers to estimate how much protection is enough. For example, in a proximity attack, the percentage of correct guesses means nothing without knowledge of which connections crucial to design functionality were correctly guessed. And finally, when multiple metric values are measured, it is very difficult to estimate the relative importance among them. Without this, the designer has no option other than to ensure arbitrarily high (again, it is very hard to know how high) security on all of them. Hence, trading off between security metrics is virtually impossible.
In contrast, another study [76] evaluates split manufacturing security based on graph connectivity of the FEOL layout and seeks to define difficulty by solving for the dimensions of the solution space from which the attacker must pick the one correct solution. The proposed metric operates on
Directed Acyclic Graphs (DAGs) abstracted from both the complete layout (G) and the FEOL layout (H) (see Figure 1.4a for an example). In the resulting DAG, gates are abstracted into vertices (represented with colored circles in Figure 1.4a, whose colors represent the models of each gate), and nets are abstracted into sets of directed edges, each of which corresponds to a driving gate/vertex and driven gate/vertex pair (represented with arrows in Figure 1.4a). It then computes the number of legal mappings that map each gate u_i in complete layout graph G to a distinct gate v_j in FEOL layout graph H. This number k is defined as the security of that gate, and the security of the complete layout is defined as the lowest k over all gates. In the example shown in Figure 1.4, the XOR gates have security k = 2 but all other gates have k = 1; therefore the overall security of the circuit remains k = 1. This security metric is often referred to by its letter of choice k as k-security. A greedy algorithm is then presented to find a minimal subset of wires in the layout to lift to BEOL while satisfying a minimum security k, an optimization of split manufacturing security also known as wire lifting.
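A brute-force sketch of the legal-mapping count may clarify the definition. The representation (vertex dictionaries, edge sets) and the helper `k_security` are ours, not from [76]; since checking k-security involves subgraph isomorphism, exhaustive permutation as done here only works for toy graphs.

```python
from itertools import permutations

def k_security(G_edges, G_types, H_edges):
    """Per-gate security of an FEOL graph H relative to complete graph G.

    G_types maps each vertex to its gate model; H keeps the same
    vertices and types but only a subset of G's directed edges
    (the lifted wires are removed). A legal mapping is a
    type-preserving bijection f on the vertices such that every FEOL
    edge (u, v) in H maps to an edge (f(u), f(v)) in G. Returns, for
    each vertex, the number of distinct images over all legal
    mappings; the circuit's k is the minimum of these values.
    """
    verts = sorted(G_types)
    images = {v: set() for v in verts}
    for perm in permutations(verts):
        f = dict(zip(verts, perm))
        if any(G_types[v] != G_types[f[v]] for v in verts):
            continue  # mapping must preserve gate models
        if all((f[u], f[v]) in G_edges for (u, v) in H_edges):
            for v in verts:
                images[v].add(f[v])
    return {v: len(images[v]) for v in images}
```

For instance, two gates of the same model whose distinguishing wires have all been lifted become mutually interchangeable, each reaching security 2.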
Unlike layout based approaches, k-security is defined based on the number of possible mappings between cells in the layout and cells in the netlist. This is because, by definition, a specific attack against split manufacturing aims at recovering the BEOL connections; any attack that does not have this ultimate goal is not an attack specifically against split manufacturing, and any attack that succeeds in this goal will have completely compromised split manufacturing. This definition has two advantages. One, it is quantifiable, meaning an optimization algorithm can be designed with k-security as its objective function to improve the absolute security of the BEOL connections, instead of simply preventing every known loophole. Two, its definition does not depend on a specific layout, which makes it compatible with most layout based approaches, and secure even when the netlist of the design is compromised [76].
It is also effective against a wide spectrum of threats, owing to the fact that few attacks are possible without making sense of what function the gates in the layout serve.
However, it is not without weaknesses. Its first disadvantage is its difficulty to compute. The k-security definition checks whether a given FEOL netlist has a security level of k by checking a property called subgraph isomorphism, which makes computation of the security level of any wire lifting solution NP-hard [76]. The proposed wire lifting algorithm in [76] functions by procedurally checking whether lifting another wire lowers the security level, which makes it exponentially more complex. This makes it extremely computationally costly, and even less scalable than typical NP-hard algorithms.
In addition, the fact that k-security is defined using graph connectivity also results in a lack of consideration of any leakage of information from the FEOL layout. Indeed, the authors acknowledged this weakness, and proposed to remove it by performing layout design after generation of the FEOL netlist. This understandably deteriorated the performance of the design in every possible way, as it relies on the layout not being optimized for performance to keep it from leaking security information. Another example of its restrictions on performance optimization is its requirement to restrict the number of standard cell models. Netlist synthesis, when otherwise unrestricted, often instantiates many less common standard cell models, and this long-tail distribution seriously restricts the highest k any wire lifting can ever hope to reach. Therefore, k-security based wire lifting has to restrict the number of standard cell models the synthesizer is allowed to use, which in turn restricts a number of design performance parameters. To summarize, k-security and the wire lifting technique based on it offer a better defined metric, but are too restrictive and too slow for industrial-level application.
Limitation of split manufacturing
Split manufacturing prevents all attacks that require complete knowledge of the whole layout, which also includes attacks against BISA such as identification of BISA cells.
However, not all attacks require complete knowledge of the whole layout. One example is untargeted Trojan insertion [82], a threat discussed in detail in [83]. Untargeted hardware Trojans do not target specific functions of the original circuitry, and therefore cannot commit attacks that require knowledge of such functions. Nevertheless, untargeted hardware Trojans are still capable of degrading the performance and/or reliability of manufactured ICs, or triggering a denial-of-service (DoS) in critical control systems [84]. This constitutes a threat that still needs to be addressed.
1.2 Motivations and Contributions
To improve upon existing proposals and better address the challenges in modern IC design, we believe computer aided design techniques could contribute the following improvements to current practices:
• To better address the impact of elevated process variation upon speed binning:
As was discussed in Section 1.1.1, problems with current methods for speed binning, and process variation modeling in general, exist primarily in two aspects: process variation and design complexity cause RO-based speed sensors to deviate from IC speed (i.e., FMAX), and this deviation is difficult for current models of process variation to accurately capture. Considering these implications, we propose to improve on-chip speed sensor based binning through optimized sensor selection with a Genetic Algorithm [85], a method that is insensitive to parameter changes and does not require very powerful computing hardware. By finding and removing from consideration sensors whose information proves detrimental to speed binning with RO-based on-chip speed sensor information, this method could potentially improve binning precision. In this work we claim the following contributions:
1. A genetic algorithm to find an optimized set of sensors that will yield better
estimation accuracy;
2. Evaluation of the proposed algorithm using path-delay data from SPICE-based timing simulations;
3. Evaluation of the proposed algorithm using silicon measurements on fabricated commercial ICs.
• To improve embedded memory test application and facilitate test coverage for uncore logic: We propose a novel design-for-test architecture to provide the capability for both concurrent testing of logic and memory blocks and delay fault testing on functional memory access paths.
1. We develop a gate-level test architecture to enable concurrent application of memory and logic tests by bypassing memory test access at the fan-in registers of the memory core, instead of the conventional bypass at the memory interface.
2. We also configure the proposed test architecture to allow detection of delay faults
in functional memory access paths, by reconfiguring registers on target paths into
test pattern generators/response compactors.
3. We also present the use of at-speed memory tests to target delay faults on these paths.
• To address the problem of microprobing attacks and improve antiprobing designs: We propose a framework to examine protection designs in terms of their performance against known exploits, and to evaluate them based on how much trouble it would take the attacker to circumvent their defenses. Specifically, our work in this field provides the following contributions:
1. A layout-driven framework to assess designs against microprobing attacks, considering known attacks and exploits.
2. A thorough mathematical analysis on bypassing attack at any angle.
3. A verification algorithm based on a mainstream layout editor (Synopsys IC Compiler) to quantitatively evaluate a post-place-and-route design in terms of the exposed area of security-critical nets vulnerable to microprobing; to our knowledge, the proposed algorithm is the first to provide this capability.
• To address the problem of the untrusted foundry: We propose a new method based on the existing built-in self-authentication (BISA) technique, enhanced by split manufacturing with back-end-of-line (BEOL) wire lifting optimization, so that the new technique, known as obfuscated BISA (OBISA), is secure against both hardware Trojan insertion and IP theft, without suffering from the vulnerabilities of either BISA or split manufacturing. Our contributions to this problem include the following:
1. An OBISA technique that is secure against untargeted hardware Trojan insertion and at the same time secure against IP piracy and IC cloning, which are the primary strengths of BISA and split manufacturing, respectively.
2. In addition to inheriting strengths from both parent techniques, the combined OBISA technique also uses each technique to strengthen the other. This includes making the resulting OBISA technique secure against the redesign attack, making OBISA more independent from golden models by further reducing reliance on detection-based anti-Trojan techniques, utilizing additional OBISA cells to improve wire-lifting security while relaxing restrictions on the original circuitry, and reducing the threat of proximity attacks.
3. In addition to producing a secure technique, the proposed OBISA technique also improves wire lifting by reducing design overheads, hardware costs, and computational costs.
[Figure omitted: netlist, FEOL graph H, and complete graph G of a full adder]
(a) A split manufactured full adder, its FEOL graph H, and its complete graph G.
[Figure omitted: legal mappings #0 and #1 of the vertices]
(b) Legal mappings of vertices in FEOL graph H to vertices in complete graph G.
Fig. 1.4: Principle of k-security applied to a full adder circuit as an example.

Chapter 2
OPTIMIZED SENSOR SELECTION WITH GENETIC
ALGORITHM
In this chapter, we are concerned with the accuracy of speed binning of manufactured ICs using measured results from on-chip speed sensors. We assume that a limited number of sensor blocks are placed at different locations on the layout. In each sensor block, we consider two kinds of delay sensors, namely RO sensors (Figure 2.1) and Path-RO sensors (Figure 2.2). RO sensors are ring oscillators whose rings consist of inverter chains only. When measured, RO sensors provide the number of oscillation cycles in a given period of time. Since ring oscillators use the same type of gate (inverters) and the same fan-out values among those gates, their measured values are highly homogeneous, and can serve as comparison standards. The other kind of sensor we use is
Fig. 2.1: Block diagram of an RO sensor.
Fig. 2.2: Block diagram of a Path-RO sensor.
the Path-RO sensor. It is a path delay sensor proposed and extensively described by Wang et al. [86]. The purpose of using actual delay paths to construct sensors is to better represent the impact of process variation upon path delay. Since different gate models react differently to the same process variation, RO sensors consisting of only inverters can only represent inverter gate delay. Therefore, it is only possible to study the impact of process variation upon path delay with sensors that measure path delay itself.
This chapter first presents the proposed optimization algorithm, which finds a set of sensors that yields better estimation accuracy, in Section 2.1. Then, in Section 2.2, path-delay data from SPICE-based timing simulation is presented to evaluate the performance of the proposed algorithm. In Section 2.6, results using silicon measurements are presented and analyzed. Finally, the chapter is concluded in Section 2.7.
2.1 Optimization Algorithm
In this section we present an optimization algorithm to find a set of sensors that could yield improved speed binning results. The speed binning in this study is done with Support Vector Machine (SVM) classification [87]. SVM is a machine learning based classification technique whose efficiency is independent of the particular technology, design, or prior knowledge. In reality, the sensor-based classification problem is likely not linearly separable; however, finding the most suitable kernel function for SVM is beyond the scope of this work. To simplify the issue, the SVM classification method is implemented with a simple linear kernel function for binary classification, assuming that only two bins are required. This blind classification technique is chosen because it performs equally well whether or not statistical information about the process variation is available. We have performed experiments in both of these situations in
Sections 2.2 and 2.6.
For the speed binning in this study, we assume a total of J dies are to be binned. Of those, J′ dies are used to train the prediction model by supplying the solver with their sensor measurements as vectors and their FMAX as classifications. After the learning is completed, sensor measurements from the other J − J′ dies are supplied for classification. This classification result is then compared against their given FMAX classification to evaluate classification accuracy.
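A minimal sketch of this train/classify flow, assuming scikit-learn's linear SVM as the solver (the dissertation does not prescribe a specific implementation) and entirely synthetic sensor data in place of real silicon or SPICE measurements:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical data: J dies, L sensor measurements each; bin labels
# derived from a surrogate for FMAX (0 = slow bin, 1 = fast bin).
J, L = 200, 160
speed = rng.normal(size=J)                          # surrogate for FMAX
X = speed[:, None] + 0.3 * rng.normal(size=(J, L))  # noisy sensor reads
y = (speed > 0).astype(int)                         # two-bin "true" binning

J_train = 20                                        # the J' training dies
clf = SVC(kernel="linear").fit(X[:J_train], y[:J_train])
accuracy = clf.score(X[J_train:], y[J_train:])
print(f"binning accuracy on {J - J_train} test dies: {accuracy:.2f}")
```

The linear kernel keeps the model simple, matching the two-bin assumption above; swapping in another kernel only changes the `SVC` constructor argument.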
Fig. 2.3: Flowchart of the proposed optimization algorithm.

The proposed genetic algorithm is shown in Figure 2.3. This algorithm is suitable for our purpose for two reasons: first, a genetic algorithm does not require any knowledge of the underlying mechanism, which makes it universal across processes; in addition, its improvement is guaranteed. The algorithm is designed to favor high-suitability individuals to expedite evolution, with the drawback covered by additional random individuals. As shown in the figure, the algorithm starts with a random population, evaluates survival rates, randomly chooses the surviving population, performs mutation and reproduction, and repeats until a desired survival rate is reached (“abort_limit” in the flowchart). Since the algorithm optimizes sensor selection, we use a total of S one-dimensional binary vectors to represent the individuals of the population. Here, each individual represents a possible solution, and S is the size of the gene pool. Each individual’s binary vector consists of L binary values, each representing whether the measurement from a corresponding sensor should be used or not. Here, L is the total number of sensors on each die. The target function evaluates the accuracy of the speed-binning technique of choice, e.g. linear Support Vector Machine (SVM), by supplying it with training sequences and then comparing its classifications with reliable FMAX. This entails a number of G dies dedicated to this purpose in addition to the training dies. For each individual, the target function returns the prediction accuracy of the technique assuming the choice of sensors specified by the individual’s binary vector is used. In terms of SVM classification, the prediction accuracy is the test accuracy of the classification. Since the prediction accuracy is a real number a_s ∈ [0, 1], s ∈ {1, . . . , S}, it is used as the survival rate of each individual in the following phase, where individuals are selected for survival. In this survival stage, S uniformly distributed random numbers P ∈ [0, 1] are compared against the survival rate of every individual to find those that survive this generation. The surviving individuals are randomly subjected to mutation: each of their binary values is subjected to logic inversion with probability P = 1/L. Then, from the resulting individuals, a roulette selection is performed to find the parents of the next generation: a uniformly distributed random test R ∈ [0, ∑_{s=1}^{S} a_s] is taken once for each individual, and each random result chooses one individual by falling into its “slot”, whose size equals the individual’s survival rate. To avoid falling into an evolutionary dead end, I randomly generated “invading” individuals are introduced into each next generation. The other S − I individuals of the next generation are produced by randomly crossing S − I pairs of parents drawn with roulette selection from the parent generation. This concludes processing for the current generation and creates the population of the next generation. The optimization continues until a limit on the number of generations or a desirable survival rate is reached. Upon conclusion, the algorithm selects the best performing individual to be used for classification.
2.2 Simulated Binning using SPICE
To verify the validity and usefulness of the proposed optimization, we first implement a binning based on SPICE-based simulation data. In this section, we introduce the method and purpose of this simulation, models and assumptions made to facilitate this simulation, then demonstrate the effect of speed binning using the proposed optimiza- tion algorithm.
2.2.1 Objective and Method
This simulation is intended to study the effect of the proposed algorithm under varying process variation and sensor design scenarios that might appear in different technologies and designs, which would be difficult to study with silicon data. For this purpose, IC speed and sensor data are produced with simulation. A “true” (i.e., assumed to be accurate) speed binning is then performed using IC speed with a simple two-bin speed binning scheme. A number of ICs are randomly selected to form a training batch for the speed-sensor-based binning method using SVM. The trained model is then used to perform binning on the rest of the ICs. This process serves as an exemplary speed-sensor-based binning with unoptimized sensor selection. Then an optimization is performed using the proposed algorithm to find an optimized sensor selection during the training phase, whose trained model is used to perform a second binning on the rest of the ICs. This binning result, called binning with optimized sensor selection, is then compared with the previous result to evaluate its improvement.
In this study we obtain IC speed from simulated path delays of the longest paths of the design. In other words, IC speed in this study is defined as the delay of the longest path on the die under the influence of process variation. We cover the total standard deviation of the process variation in detail in the next section.
2.2.2 Model Assumptions used in this Simulation
The model of process variation used in this evaluation is taken from the VARIUS model [20]. The VARIUS model considers three sources of variation: the die-to-die variation, assumed to be independent, and the random and systematic parts of within-die variation. This simulation also follows the assumptions made in the VARIUS model that σ_random = σ_systematic = (1/√2)·σ_WID, and that the systematic variation in effective channel length is dependent on the systematic variation in threshold voltage (x_systematic,Leff = (1/2)·x_systematic,Vth). Since the VARIUS model focuses on modeling within-die variation, we add the assumption that σ_D2D = √2·σ_random = √2·σ_systematic = σ_WID, an assumption also made in some earlier research [88]. Further, we use correlated random variables to generate systematic variation [89], whose correlation values are randomly generated with a uniform distribution on [−1, 1].
In order to limit the number of paths necessary to cover and avoid unrealistically long simulation times, we have restricted our simulated paths to the top 20% of paths, and accordingly adjusted the standard deviation of process variations to σ_D2D = σ_WID = (1/(3√2))·20% ≈ 4.71%.
In summary, parameters and assumptions used in the mathematical model are shown in Table 2.1.
2.2.3 Method of Implementation
Most of the variation figures used in this study were generated with MATLAB [90] and the Statistics and Machine Learning Toolbox Release 2012b [90]. In particular, generation of systematic variation in threshold voltage is done with the mvnrnd tool, which generates correlated Gaussian distributed random samples from specified correlations, which are in turn generated as uniformly distributed random numbers in [−1, 1] with a published method [89]. The only exception is the random variation, which is implemented using the local variation block provided by Synopsys [91] HSPICE version K-2015.06 [92, 93], since it would otherwise be impossible to append variation parameters to the generic subckt definitions of standard gates.
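A NumPy analog of this mvnrnd flow might look as follows. One simplification to flag: raw correlations drawn uniformly on [−1, 1] do not by themselves guarantee a positive semi-definite correlation matrix (the published method [89] handles that), so this sketch instead builds a valid matrix by normalizing a random Gram matrix. The variable names and path/die counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_paths = 10            # paths whose systematic variation is modeled
sigma_sys = 0.20 / 6    # ≈ 3.33%, per Table 2.1

# Build a valid (positive semi-definite) correlation matrix by
# normalizing a random Gram matrix F @ F.T -- a stand-in for the
# uniform-correlation construction of [89].
F = rng.normal(size=(n_paths, n_paths))
C = F @ F.T
d = np.sqrt(np.diag(C))
corr = C / np.outer(d, d)

# mvnrnd analog: correlated systematic Vth variation for 200 dies.
cov = (sigma_sys ** 2) * corr
x_sys_vth = rng.multivariate_normal(np.zeros(n_paths), cov, size=200)

# Systematic Leff variation is half the Vth variation (Table 2.1).
x_sys_leff = 0.5 * x_sys_vth
```

Each row of `x_sys_vth` then plays the role of one die's per-path systematic variation when the SPICE decks are parameterized.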
The benchmark design that provides the paths used in this study is a single OpenSPARC [94] T1 core [95]. Static timing analysis with Synopsys' PrimeTime timing analyzer yields 438 paths that need to be covered, which are exported to SPICE decks for 200 dies, each using a different set of variation parameters generated with MATLAB [90] and the Statistics and Machine Learning Toolbox Release 2012b [90].

Table 2.1: Parametric assumptions used in the mathematical model for path and sensor delay simulations.

x_[Leff|Vth],i,j, the total relative variation in effective channel length or threshold voltage at one device of path i on die j: x_D2D,[Leff|Vth],j + x_random,[Leff|Vth] + x_systematic,[Leff|Vth],i,j.
x_D2D,[Leff|Vth], the die-to-die variation in effective channel length or threshold voltage at all devices on die j: N(0, σ_D2D,[Leff|Vth],j).
σ_D2D,[Leff|Vth],j, the standard deviation of the die-to-die variation in effective channel length or threshold voltage at all devices on die j: (1/(3√2))×20% ≈ 4.71%.
x_random,[Leff|Vth], the random component of within-die variation in effective channel length or threshold voltage at a device: N(0, σ_random,[Leff|Vth]).
σ_random,[Leff|Vth], the standard deviation of the random component of within-die variation in effective channel length or threshold voltage at a device: (1/6)×20% ≈ 3.33%.
x_systematic,Vth,i,j, the systematic component of within-die variation in threshold voltage at all devices of path i on die j: N(0, σ_systematic,Vth,i,j).
σ_systematic,Vth,i,j, the standard deviation of the systematic component of within-die variation in threshold voltage at all devices of path i on die j: (1/6)×20% ≈ 3.33%.
x_systematic,Leff,i,j, the systematic component of within-die variation in effective channel length at all devices of path i on die j: (1/2)·x_systematic,Vth,i,j.
ρ_systematic,Vth,i:i′, the correlation between systematic variations in threshold voltage at all devices of path i and path i′ on all dies: U(−1, 1).
The Synopsys 32/28nm Generic Library for Teaching IC Design [96], supplied by Synopsys [91], was used to synthesize the design and provide physical SPICE models.
Sensor delays are also collected using SPICE simulation. 10 sensor blocks are assumed to be placed in the vicinity of the 10 longest static timing paths, giving them 90% correlation in systematic variation with said paths. This is a likely scenario in a real on-chip speed sensor implementation: a real implementation is unlikely to accommodate a very large number of speed sensors due to area overhead, and despite designers' efforts to place sensors as close to functional paths as possible, the sensors still would not experience exactly the same variations as the paths they try to mimic. In each sensor block, 8 kinds of RO sensors and 8 kinds of Path-RO sensors are inserted, totaling 160 sensor values per simulated die. We constructed RO sensors 31, 51, 71, and 91 inverters long, and placed two instances of each in every sensor block. For Path-RO sensors, we replicated the ten longest paths reported by STA, and randomly chose 70 other paths from the top 20% of paths.
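The variation model summarized in the table above (one die-to-die component, one random per-device component, and one systematic per-path component, with the Leff systematic component equal to half the Vth one) can be sketched in plain Python. The original study generated these parameters in MATLAB; this stand-alone sketch uses illustrative function names of our own and is only a rough analogue:

```python
import random

# Standard deviations from the variation model (fractions of nominal):
SIGMA_D2D = 0.20 / (3 * 2 ** 0.5)   # ~4.71%, die-to-die component
SIGMA_RAND = 0.20 / 6               # ~3.33%, random within-die component
SIGMA_SYS = 0.20 / 6                # ~3.33%, systematic within-die component

def generate_die(num_paths, rng=random):
    """Generate one die's relative variation parameters.

    Returns one die-to-die offset plus per-path systematic components;
    the Leff systematic component is half the Vth one, as in the model.
    """
    d2d = rng.gauss(0.0, SIGMA_D2D)
    sys_vth = [rng.gauss(0.0, SIGMA_SYS) for _ in range(num_paths)]
    sys_leff = [0.5 * x for x in sys_vth]
    return {"d2d": d2d, "sys_vth": sys_vth, "sys_leff": sys_leff}

def device_variation(die, path_idx, rng=random):
    """Total relative variation at one device of a given path:
    die-to-die + random + systematic, per the model."""
    return die["d2d"] + rng.gauss(0.0, SIGMA_RAND) + die["sys_vth"][path_idx]

# 200 dies, each covering the 438 timing paths, as in the simulation setup.
dies = [generate_die(438) for _ in range(200)]
```

Each SPICE deck in the experiment would then be parameterized with one such set of per-die values.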
2.2.4 Simulation Results
Before our proposed optimization algorithm is tested, we first present results from
SVM-only classification. As previously explained, this classification makes use of a simple linear kernel function and only two bins.

Fig. 2.4: Accuracy of SVM classification without optimizing sensor selection.

Obviously, the number and choice of training dies affect classification accuracy. In this experiment, we randomly selected training dies and repeated the classification 200 times to show the distribution of classification accuracy. Training-die fractions from 5% to 50% are investigated, and the results are shown in Figure 2.4. It can be gathered from the figure that although accuracy
can be quite high at times, it can also be quite low on other occasions. The spread at each training-die count is most likely due to correlation among die-to-die variations, which can make some dies better training dies than others. However, since inter-die correlation is not the focus of this work, optimization of training-die selection is not investigated. Another observation worth mentioning is that classification accuracy does not increase much as more training dies are used, hinting at the possibility of information saturation. Given how the SVM method works, this could be an indication of outlier sensors preventing better classification.
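The effect of random training-die selection on two-bin classification accuracy can be illustrated with a small synthetic experiment. In this sketch a nearest-centroid rule stands in for the linear-kernel SVM (both are linear classifiers); the data, class separation, and noise level are all invented for illustration:

```python
import random

random.seed(1)

# Synthetic dies: "fast" dies have lower normalized sensor delay on average.
def make_die(fast):
    base = 0.95 if fast else 1.05
    return ([base + random.gauss(0, 0.03) for _ in range(16)], fast)

fast_dies = [make_die(True) for _ in range(30)]
slow_dies = [make_die(False) for _ in range(30)]

def centroid(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def classify(train, test):
    """Two-bin nearest-centroid classification; returns accuracy on test."""
    fast_c = centroid([v for v, f in train if f])
    slow_c = centroid([v for v, f in train if not f])
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    correct = sum((dist(v, fast_c) < dist(v, slow_c)) == f for v, f in test)
    return correct / len(test)

# Repeat with random training subsets to observe the accuracy spread.
accs = []
for _ in range(50):
    random.shuffle(fast_dies)
    random.shuffle(slow_dies)
    train = fast_dies[:5] + slow_dies[:5]
    test = fast_dies[5:] + slow_dies[5:]
    accs.append(classify(train, test))
print(min(accs), max(accs))
```

The min/max gap across repetitions mirrors the accuracy distribution observed in Figure 2.4, though the magnitudes here are arbitrary.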
We applied the proposed GA optimization to an SVM classification trained on 10 dies. Figure 2.5 shows how the SVM classification accuracy evolves as sensor selection is optimized
by the proposed optimization algorithm. The x-axis shows the generation each data point belongs to, and the y-axis shows the classification accuracy of each data point.
The blue trace represents the best classification accuracy in each generation, the red trace the average, and the green trace the worst. In this example, a population size S = 100 is used, and the optimization runs until 300 generations or 100% accuracy is reached. I = 1 invading individual is introduced in each generation. When the optimization starts, the SVM uses all sensors and achieves a
90.53% classification accuracy; by the time the optimization ends, the best individual achieves 94.74% accuracy. Although the improvement is quite evident, this result proved inferior to our experiment with silicon data (to be presented in the next section). This could have a couple of explanations. For one, we assumed a strict 90% correlation between paths and sensors, which might not hold for all sensors, as discussed above. Another possible reason is that the small number of IC samples we were able to access probably made the silicon result more optimistic, causing a "rounding up" effect on the measured accuracy.
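The GA loop described above (population of sensor-selection bitmasks, elitist survival, crossover, mutation, and I invading individuals per generation) can be sketched as follows. The fitness function here is a stand-in that rewards agreement with a hidden "useful sensor" mask rather than an actual SVM accuracy evaluation, and the parameter values are scaled down from the experiment's S = 100 and 300 generations:

```python
import random

random.seed(7)
NUM_SENSORS = 160
S = 20            # population size (the experiment used S = 100)
I = 1             # invading individuals per generation
GENERATIONS = 40  # the experiment ran for up to 300 generations

# Stand-in fitness: fraction of positions agreeing with a hidden target
# mask, mimicking "accuracy improves as outlier sensors are dropped".
TARGET = [random.random() < 0.7 for _ in range(NUM_SENSORS)]

def fitness(mask):
    return sum(a == b for a, b in zip(mask, TARGET)) / NUM_SENSORS

def random_mask():
    return [random.random() < 0.5 for _ in range(NUM_SENSORS)]

def crossover(a, b):
    cut = random.randrange(1, NUM_SENSORS)  # single-point crossover
    return a[:cut] + b[cut:]

def mutate(mask, rate=1.0 / NUM_SENSORS):
    return [(not g) if random.random() < rate else g for g in mask]

population = [random_mask() for _ in range(S)]
best_history = []
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    best_history.append(fitness(population[0]))
    survivors = population[: S // 2]              # elitist selection
    children = [
        mutate(crossover(random.choice(survivors), random.choice(survivors)))
        for _ in range(S - len(survivors) - I)
    ]
    invaders = [random_mask() for _ in range(I)]  # fresh random individuals
    population = survivors + children + invaders
```

Because survivors are carried over unmodified, the best-of-generation trace is non-decreasing, matching the shape of the blue trace in Figure 2.5.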
Now we evaluate the capability of the proposed method to improve binning. We start from the same 10 training dies and randomly choose additional dies for the proposed optimization process. The optimization results are shown in Table 2.2. In the table, GA-optimized results are compared against SVM results, assuming the same dies used for GA optimization were appended to the training dies for SVM. The results substantiated
Fig. 2.5: Evolution of SVM classification accuracy under GA-based optimization of sensor selection.
Table 2.2: Comparison between optimized classification and SVM of same total train- ing size.
Number of Training Dies | Number of Dies for Optimization | SVM Accuracy | Optimized Accuracy
10 | 10 | 92.22% | 87.22%
10 | 20 | 90.59% | 91.18%
10 | 30 | 52.50% | 92.67%
10 | 40 | 71.33% | 92.17%
10 | 50 | 91.43% | 95.00%
10 | 60 | 93.08% | 93.08%
the improvement of the proposed method over using all sensors. Although it shows that die-to-die process variation does have an impact on the SVM result through the choice of training dies, the results also show that with GA-optimized sensor selection the SVM becomes more robust toward outlier dies as well.
Fig. 2.6: On-chip sensor block layout (left) and their locations (right) in the fabricated SoC die. In the figure, RO sensors are denoted as "IR sensor" and Path-RO sensors as "PUT".
2.3 Experimental Results using Silicon Data
2.4 Sensor Design
The silicon data used in this section come from fabricated ICs produced as part of the research work reported in [86]. The sensors were fabricated on a commercial processor IC used in automotive applications. 34 dies were measured, while FMAX information is available for 6 of them. Each measured die carries J = 10 sensor blocks at various locations, as shown in Figure 2.6. In each sensor block, KRO = 8 different modules of
RO sensors and KPath-RO = 8 different modules of Path-RO sensors (henceforth referred to as modules) are inserted, whose respective designs are shown in Table 2.3 and Table
2.4. In total, L = J(KRO + KPath-RO) = 160 sensors are available on each measured die. Data from these sensors are reported as delay figures normalized within their individual modules, with measurement noise removed by averaging multiple measurements.
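The sensor count and the per-module normalization just described can be expressed directly. The data layout assumed here (a per-die matrix of raw delays indexed by block and module) and the function names are ours, chosen for illustration:

```python
J = 10          # sensor blocks per die
K_RO = 8        # RO sensor modules per block
K_PATH_RO = 8   # Path-RO sensor modules per block
L = J * (K_RO + K_PATH_RO)   # total sensors per measured die
assert L == 160

def normalize_by_module(raw):
    """raw[block][module] -> delays divided by each module's mean across
    blocks, so delays from different module designs become comparable."""
    num_modules = len(raw[0])
    module_means = [
        sum(block[m] for block in raw) / len(raw) for m in range(num_modules)
    ]
    return [
        [block[m] / module_means[m] for m in range(num_modules)]
        for block in raw
    ]

def average_repeats(measurements):
    """Suppress measurement noise by averaging repeated measurements."""
    return sum(measurements) / len(measurements)
```

Normalizing within each module keeps a slow 300MHz chain from dominating a fast 400MHz one simply because its absolute delay is larger.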
Table 2.3: Design of 8 RO sensor modules in each sensor block.
Number | Design Description
1 | 300MHz Inverter Chain
2 | 300MHz Inverter Chain
3 | 300MHz Inverter Chain
4 | 300MHz Inverter Chain
5 | 350MHz Inverter Chain
6 | 350MHz Inverter Chain
7 | 400MHz Inverter Chain
8 | 400MHz Inverter Chain
Table 2.4: Design of 8 Path-RO sensor modules in each sensor block.
Number | Design Description
1 | Path constructed with popular cells from functional design
2 | Clock buffer chain with long Metal 2 wires
3 | Clock buffer chain with long Metal 3 wires
4 | Clock buffer chain with long Metal 5 wires
5 | Clock buffer chain with wide and long Metal 4 wires
6 | Clock buffer chain with wide and long Metal 5 wires
7 | Small high-Vt inverter chain with alternate long and short wires
8 | Large standard-Vt inverter chain with alternate long and short wires
Fig. 2.7: Per-die average of normalized measured sensor delay, organized by sensor modules, compared against FMAX.
2.5 Measurements
We first present a simple comparison of measurements from RO and Path-RO sensors against the provided FMAX classifications of the dies. As previously stated, each measured die carries LRO = JKRO = 80 RO sensors and LPath-RO = JKPath-RO = 80 Path-RO sensors. Naturally, a question arises as to how best to represent each die with these data.
A naive approach is to average all sensors on each die and represent the die with this average. Figure 2.7 shows two exemplary analyses of this approach. In the left figure, the average is taken over measurements from sensors of the same module across all sensor blocks, while in the right figure the average is taken over measurements from each sensor within a sensor block. In both figures, the averages are compared with the given FMAX values.
Table 2.5: Classification accuracy using linear SVM.
n | Using All Sensors | Using only RO | Using only Path-RO
2 | 55% | 45% | 45%
3 | 83.33% | 73.33% | 73.33%
4 | 80% | 80% | 80%
5 | 80% | 80% | 80%
Apparently this method does not offer a reliable prediction on FMAX. However,
upon careful observation it is clear that some sensors offer better results than others. This naturally invites investigation into whether this can be exploited to yield better prediction accuracy with less sensor information.
We now put the data through SVM classification as was done in Section 2.2. In
this analysis, a simple linear kernel function is used. Of the 6 dies whose FMAX values are known, n dies are used to train the prediction model by supplying the solver with their sensor measurements as vectors and their FMAX bins as classification labels. The solver then finds a hyperplane that best separates vectors of one classification from those of the other. After learning is completed, sensor measurements from the other 6 − n dies are supplied for classification. Since only two speed bins were given for FMAX, a binary classification is sufficient for speed binning in this case. This classification result is then compared against their FMAX to