Reliability Analysis and Improvement of Nanoscale CMOS Digital Circuits

By: M.Sc. AndrésFelipe Gómez Chacón

A Dissertation submitted in partial fulﬁlment of the requirements for the degree of

Doctor in Electronics Sciences

At the National Institute for Astrophysics, Optics and Electronics

Thesis Advisor: Ph.D. V´ıctorHugo Champac Vilela, INAOE

December 2017 Tonantzintla, Puebla, Mexico

@INAOE 2017 Copyright The author hereby grants to INAOE permission to reproduce and to distribute copies of this thesis document in whole or in part ii Abstract

Reliability degradation due to transistor aging is a major challenge in the design of digital integrated circuits for current and future technology nodes. Bias Temper- ature Instability (BTI) is the dominant aging mechanisms in digital circuits. BTI gradually increases the device’s threshold voltage (Vth) over the lifetime, which in turn degrades circuit speed, and ultimately, it may cause a faulty operation. Circuit performance degradation due to BTI is highly dependent on the operating conditions such as the executed workloads by the circuit and the operating temperature. Moreover, variability in devices parameter due to process variations aggravate the reliability issues. These eﬀects need to be properly accounted during circuit design phase to obtain electronic products with improved lifetime. Conventional worst-case one-time guardbans to deal with aging due to BTI are becoming too large with technology scaling, which leads to conservative designs with reduced performance. This thesis addresses BTI aging issues from two diﬀerent perspectives:

1. Aging Monitoring:

Aging sensors are introduced in the circuit critical paths (CP) in order to detect when they are close to violate timing speciﬁcations due to BTI. Then, failure-prevention actions (i.e. increase system clock period) can take place.

iii selection are the uncertainty caused by process parameters variations, the uncertainty on spatial correlation between gates since gate placement in the layout is unknown at early design phase, and the uncertainty of the operating workload of the circuit, which is unknown in advance at the design phase.

2. Reliable Circuit Design:

Among the gates that belong to the critical paths of a circuit, some of them are more efficient to be re-sized to compensate aging degradation. This thesis proposes a circuit design methodology to reduce the guardband required for a circuit by smart selection an sizing of the circuit’s critical gates. Gate selection metrics are proposed to guide the sizing process of the circuits. One of the most important parameters taken into account by the metrics is the statistical path delay sensitivity to gate sizing, which depends on process variations, spatial correlation between gates, and the BTI degradation of the devices in the path. Classic worst-case aging assumptions are avoided by analyzing the circuit performance for various possible operating workloads and identifying a more realistic maximum degradation of the paths. In such way, area overhead due to worst-case aging assumptions is further mitigated. The proposed approach provides efficient trade-offs between gate sizing and guarband to compensate BTI aging under process variations effects.

The aforementioned proposals of this thesis have been implemented in an Aging-Aware Statistical Timing Analysis Tool written in C++ code. The benefit of the proposed methods were validated on different ISCAS circuits. The contributions of this thesis provide a better understanding of how the aging mechanisms affect digital circuits. Moreover, new aging-aware design techniques are available to help designers to approach the complex problem of the design of reliable high-performance integrated circuits.

iv Acknowledgement

To the Consejo Nacional de Ciencia y Tecnolog´ıa(CONACYT), for the economic support granted through the scholarship for doctorate studies No. 420129/ 264560

To the National Institute for Astrophysics, Optics and Electronics (INAOE), for giving me the opportunity to do the doctorate studies.

To my advisor, Dr. V´ıctorHugo Champac Vilela, for his guidance and support in the course of my doctoral research.

To the professors: Dr. Alejandro D´ıazSánchez Dr. Gordana Jovanovic Dolecek Dr. Lu´ısHernándezMart´ınez Dr. HéctorLu´ısVillacorta Minaya Dr. Roberto GómezFuentes for their in-depth review and comments of this thesis.

To the staﬀ of INAOE, for the support and help that they give to the students.

v vi A mis padres, perseverancia es una virtud que aprend´ıde ustedes, y sin ella no habr´ıaculminado este trabajo.

A mi esposa, motivaci´onpara siempre seguir adelante.

vii viii Contents

Abstract iii

Acknowledgementv

List of ﬁgures xvii

List of tables xix

List of Acronyms xxi

1 Introduction1 1.1 Semiconductor technology scaling...... 2 1.2 Process Parameters Variations...... 3 1.2.1 Systematic Variations...... 3 1.2.2 Non-Systematic Variations...... 4 1.3 Transistor aging...... 8 1.3.1 Hot Carriers Injection (HCI)...... 8 1.3.2 Time Dependent Dielectric Breakdown (TDDB)...... 9 1.3.3 Bias Temperature Instability...... 9 1.4 Reliability of digital integrated circuits...... 10 1.4.1 Reliability and Yield...... 11 1.4.2 Bathtub curve...... 12 1.4.3 Technology Scaling Impact on Reliability...... 13

ix CONTENTS

1.5 Impact of aging and process variations on performance of digital integrated circuits...... 15 1.6 Justiﬁcation of this thesis...... 18 1.7 Thesis organization...... 19

2 BTI Mechanism and its Impact on Digital Integrated Circuits 21 2.1 Physical Mechanisms for Bias Temperature Instability...... 21 2.1.1 Reaction-Diﬀusion Model...... 22 2.1.2 Trapping-Detrapping Model...... 24 2.2 Long-Term BTI Predictive Model...... 26 2.3 BTI Impact on Digital Circuits...... 29 2.4 Aging-aware design techniques...... 32 2.4.1 Design-time techniques...... 32 2.4.2 Run-time techniques...... 34

3 Aging-Aware Statistical Timing Analysis 37 3.1 Background...... 37 3.1.1 Basic deﬁnitions for timing analysis...... 38 3.1.2 Timing Analysis Methods...... 39 3.2 Modeling of BTI Degradation...... 44 3.2.1 Process Variations-Aware BTI Model...... 44 3.2.2 Workload Impact on BTI...... 46 3.3 BTI-aware Statistical Timing Analysis Procedure...... 49 3.3.1 Gate Delay Model...... 50 3.3.2 Cell Library Characterization...... 53 3.3.3 Topological Paths Search...... 55 3.3.4 Workload and Degradation Analysis...... 56 3.3.5 Path Statistical Delay Computation...... 59 3.3.6 Inter-Path Covariance...... 61

x CONTENTS

4 Early Selection of Aging Critical Paths for Delay Degradation Monitoring 63 4.1 Aging Monitoring Technique...... 64 4.2 Related work...... 66 4.3 Challenges for Critical Path Selection...... 67 4.3.1 Critical Path Deﬁnition...... 67 4.3.2 Critical Paths dependence on process variations...... 68 4.3.3 Critical Paths dependence on operating workload..... 69 4.4 Proposed Critical Paths Selection Condition...... 71 4.4.1 Handling Spatial Correlation Uncertainty...... 72 4.4.2 Handling Workload Uncertainty...... 80 4.5 Proposed methodology for critical path selection...... 83 4.5.1 Fast Path Pre-Filter...... 84 4.5.2 Potential Critical Paths Under Worst-Case Aging..... 85 4.5.3 Critical Path Selection Algorithm...... 86 4.6 Experimental Results on ISCAS Circuits...... 89 4.6.1 Path Selection Results...... 90 4.6.2 Validation of Spatial Correlation Approximation...... 93 4.6.3 Comparison with Other Works...... 100 4.7 Conclusions...... 101

5 Guardband Reduction by Eﬃcient Selection and Sizing of Critical Gates 103 5.1 Related Work...... 103 5.2 Considerations for Gate Sizing Metrics...... 106 5.2.1 Sizing Impact on Gate Delay...... 106 5.2.2 Sizing Impact on Path Delay...... 107 5.3 Proposed Gate Sizing Metrics...... 108 5.3.1 Statistical Path Delay Sensitivity...... 108

xi CONTENTS

5.3.2 Proposed Sizing Metrics...... 113 5.4 Methodology for Efficient Selection and Sizing of Critical Gates. 118 5.4.1 PCP Identification Under Worst-Case Aging...... 119 5.4.2 Multiple Workloads Aging Analysis...... 119 5.4.3 Selection and Sizing of Critical Gates...... 122 5.5 Results from Application to ISCAS Benchmark Circuits...... 125 5.5.1 Sensitivity Approximation...... 125 5.5.2 Optimization Results...... 126 5.5.3 Benefit of the Multiple Workload-Aware Aging Analysis. 128 5.5.4 Gate Sizing Optimization Comparison...... 132 5.6 Conclusions...... 134

6 Conclusions 137

xii List of Figures

1.1 Technology scaling following the behavior predicted by Moore’s law. The number of transistors per processor and the chip’s operating frequency increase...... 2

1.2 Optical Proximity Correction. Taken from [1]...... 4

1.3 Classiﬁcation of Spatially Correlated Process Variations...... 5

1.4 Spatial correlation behavior as function of distance [2]...... 6

1.5 Number of dopant for various technology nodes [3]...... 7

1.6 Hot Carriers phenomenon...... 9

1.7 Oxide breakdown due to conducting path through the gate oxide to substrate [4]...... 10

1.8 Physical mechanisms for BTI...... 11

1.9 Consumer Electronics and Safety Critical Applications. Reliability is of great interest in Safety Critical Applications (Pictures from google images)...... 12

1.10 Failure rate represented by the bathtub curve...... 13

1.11 Gate Electric ﬁeld increase vs. technology nodes [5]...... 14

1.12 Power consumption increase with technology scaling [6]...... 14

1.13 Total parameter variation vs. technology year [7]...... 15

1.14 Impact of device parameters shift at diﬀerent levels of abstraction. 16

xiii LIST OF FIGURES

1.15 Impact of aging and process variations on the delay of the logic gate. Top: Transient behavior. Bottom: Histogram of the delay samples...... 16 1.16 Digital System and clock waveform...... 17

2.1 Reaction-Diffusion BTI Model...... 23 2.2 Trapping-Detrapping BTI Model...... 25 2.3 Long-term predictive model...... 26 2.4 Threshold voltage degradation dependence with different parameters. 28 2.5 Delay degradation of a path due to BTI...... 30 2.6 Circuit delay degradation due to BTI for different ISCAS Circuits. 31 2.7 Critical path reordering due to different aging rates [8]...... 32 2.8 Aging effects may cause a functional failure...... 32

3.1 Basic definitions for timing analysis...... 38 3.2 Arrival times propagation in block-based timing analysis approach. 40 3.3 Propagation of gate delays through two paths...... 40 3.4 Deterministic Static Timing Analysis. Gates take a fixed delay.. 41 3.5 Process Corners for STA...... 42 3.6 Computed delay of a 20 inverter chain path for each Monte Carlo trial. The obtained histogram is also shown...... 43 3.7 Statistical Static Timing Analysis. Gates delays become a random variable...... 44 3.8 Threshold voltage shift due to BTI as function of operational time for different initial V th due to process variations...... 45 3.9 Histogram of the threshold voltage distribution for a fresh and aged device...... 46 3.10 Workload applied to a circuit...... 47 3.11 Workload impact on delay...... 48 3.12 Workload as a Signal Probability...... 49

xiv LIST OF FIGURES

3.13 Aging-aware path-based timing analysis procedure...... 50 3.14 Behavior of the delay of an inverter gate as a function of important parameters. a) Gate size, b) Load capacitance, c)Input slew rate, d) Temperature...... 52 3.15 Gate delay as a function threshold voltage degradation...... 53 3.16 Characterization process of an inverter gate...... 54 3.17 Example of a digital circuit and its graph representation for searching the topological paths...... 55 3.18 Signal probability propagation rules for some basic gates...... 57 3.19 Devices duty-cycle computation based on signal probabilities at gates inputs...... 58 3.20 Electric model for temperature estimation [9]...... 59 3.21 Two paths with structural correlation, which increases inter-path covariance...... 62

4.1 Aging sensor operation for error prediction...... 65 4.2 Aging Monitoring Framework...... 65 4.3 Critical Paths Definition Over time...... 68 4.4 Process variations impact on defining critical paths...... 69 4.5 Histogram of the delay of a PCP for various workload profiles (IS- CAS c2670)...... 70 4.6 Statistical Delay Difference Between two Paths...... 71

4.7 Impact of σDD on the selection probability of a path (P1 < P2).. 75 4.8 Relationship between sum of path variances and inter-path covariances for some ISCAS circuits...... 77

4.9 Analysis of the global correlation value (ρg) that maximizes the standard deviation of the delay diﬀerence between two paths.... 78

4.10 Ratio of σDD estimated with ρg = 0 to that estimated with ρg = 1 79

xv LIST OF FIGURES

4.11 Analysis of the degraded delay of two paths i and j in ISCAS c2670 for 1000 random-generated operating workloads...... 82 4.12 Behavior over the time of the delay distribution of two paths (i and j)...... 83 4.13 Diagram of the proposed methodology...... 84 4.14 Fast Path pre-filtering...... 85 4.15 Selected critical paths vs. evaluated workload profiles...... 92 4.16 Impact of probability threshold on the CP set size and CPU time (WL = 60)...... 94 4.17 Approximated standard deviation of the delay difference between all the paths in the circuit and the LCP of the circuit Vs. the standard deviation of the delay difference obtained using spatial correlation extracted from circuit layout...... 95 4.18 Histograms of the ratio of of the estimated standard deviation of the delay difference using correlation approaches and the one obtained using spatial correlation information extracted from circuits layout. 97 4.19 Overestimation of critical paths selected using correlation bounds with respect to the critical paths selected using the proposed correlation approximation...... 98 4.20 Histogram of the probability to become critical of the missed paths of all ISCAS circuits...... 99

5.1 Nominal delay, Delay degradation, and Standard Deviation of the delay of an inverter gate as function of its size...... 107 5.2 Impact of sizing a gate on the adjacent gates...... 108 5.3 Example of the magnitude of the components of the statistical path delay sensitivity to gate sizing (Equation 5.4)...... 110 5.4 A path example to illustrate the impact of sizing a gate on its neighboring gates in the path...... 111

xvi LIST OF FIGURES

5.5 Layout of an inverter gate...... 114 5.6 Areas of some basic digital gates as function of gate size..... 115 5.7 Circuit example to illustrate gate criticality...... 116 5.8 Flow of the proposed gate sizing optimization methodology.... 118 5.9 Maximum delay degradation of some paths as function of the number of tested workload proﬁles...... 121 5.10 Fast and Slow PCP sets...... 123 5.11 Statistical delay sensitivity of a Path with respect to sizing each gate in the path. Longest path of C1908 circuit...... 126 5.12 Histograms of the µ + 3σ corner of the circuit aged delay obtained for an exhaustive number (20000) of multiple signal probability groups at main circuit inputs...... 130 5.13 Number of iterations to achieve target guardband...... 134

xvii LIST OF FIGURES

xviii List of Tables

2.1 Aging-Aware Design Techniques...... 33

4.1 Main characteristics of the considered ISCAS circuits...... 90 4.2 Selected paths using the proposed methodology (ε = 0.25%)... 92 4.3 Comparison of CPs selected using spatial correlation against those selected using spatial correlation extracted from circuits layout (ε = 0.25%)...... 98 4.4 Selected nodes to be monitored...... 100 4.5 Methodology Comparison...... 101

5.1 Results using multiple workloads for aging analysis...... 127 5.2 Results using a single workload for aging analysis...... 129 5.3 Results using worst-case aging analysis...... 130 5.4 Percentage of WLs for which 10% of guardband may be violated. 131 5.5 Percentage of area overhead for target guardbands of 10% and 20% of using three selection and sizing methods...... 133

xix LIST OF TABLES

xx List of Acronyms

NBTI Negative Bias Temperature Instability

PBTI Positive Bias Temperature Instability

HCI Hot Carriers Injection

TDDBD Time Dependent Dielectric Break-down

PDF Probability Density Function

LTDM Long-Term Degradation Model

RDF Random Dopant Fluctuation

LCP Longet Critical Path

PCP Potential Critical Paths

SSTA Statistical Static Timing Analysis

DSTA Deterministic Static Timing Analysis

RD Reaction-Diﬀusion

TD Trapping/De-Trapping

DOE Design Of Experiments

SP Signal Probability

xxi LIST OF TABLES

xxii Chapter 1

Introduction

The relentless scaling of semiconductor technology has provided a steady increase in the performance of digital integrated circuits. However, reliability degradation caused by aging-induced and process-induced device’s parameters shifts have become a major concern in deeply scaled technologies, affecting the reliability of modern circuits. This chapter reviews the main reliability issues concerned to semiconductor technology scaling. The chapter is organized as follows: Section 1.1 presents the semiconductor technology scaling trends to achieve higher performance. Section 1.2 reviews the principal process variations issues presented in current and future technology nodes. Section 1.3 reviews the principal transistor aging mechanisms. Section 1.4 highlights the importance of chip reliability analysis and verification in safety-critical applications. Section 1.5 describes how these technology issues affect the reliability and performance of the digital integrated circuits. Section 1.6 provides a brief justification of the research work presented in this thesis. Finally, Section 1.7 gives a brief description of the organization of this thesis book.

1 1.1. SEMICONDUCTOR TECHNOLOGY SCALING

1.1 Semiconductor technology scaling

During the past years, the development of complex electronics products, each time with better performance than the previous generation, has relied on device’s dimensions scaling. Smaller devices are capable to switch faster, which allow digital circuits to perform logic operations in a reduced amount of time. Moreover, higher integration density and lower power consumption are obtained [10]. As illustrated in Figure 1.1, both the number of transistors per processor and the chip’s operating frequency have increased over the years with technology scaling. Until now, the nanometric and sub-nanometric technology regime has been reached, as predicted by Moore’s Law [11]. An important implication of technology scaling is that the cost associated with the integrated circuit’s production is reduced, improving the semiconductor industry proﬁt.

101010 2007 109 (Hz) 1997 10108

10107 1989

k frequency 1978 10106 cloc 105 1971 10 10 10 10 10 10 10 10 10 10 103 104 105 106 107 108 109 1010 transistors per10 processor transistors per processor Figure 1.1: Technology scaling following the behavior predicted by Moore’s law. The number of transistors per processor and the chip’s operating frequency increase.

Unfortunately, as the technological and physical limits of the devices are reached, new design challenges have emerged. In deeply scaled technologies, device’s parameters may shift from the intended values at design due to fabrication process-induced variations and device aging [12,13]. Even these small variations on the intended device’s parameters can result in a signiﬁcant change in circuit per-

2 1.2. PROCESS PARAMETERS VARIATIONS formance (i.e., delay) [14], which can make a circuit to violate specifications. Even more, circuits that meet design specifications right after manufacturing may eventually violate them due to aging effects. Assurance of lifetime reliability of modern digital integrated circuits under the combined effect of process variations and aging has become a major concern for designers. Therefore, an in-depth analysis of the impact of these mechanisms on circuit performance and reliability is mandatory in modern and future technology nodes. In such way, new design techniques can be developed to assure correct functionality of modern high-performance and complex circuits.

1.2 Process Parameters Variations

Process parameters variations are caused by the randomness of manufacturing process due to imprecise process control, as well as the intrinsic variability of some fabrication steps (i.e., channel doping) [3,15]. Process variations are of static nature, which means that induced device parameters shifts remain constant over the lifetime. Process variations are classiﬁed into systematic and non-systematic variations.

1.2.1 Systematic Variations

Systematic variations are design-dependent variations (i.e., depend on layout characteristics), which are mainly presented in interconnections [16]. Systematic variations make the width, length, and shape of fabricated interconnections to diﬀer from those intended at the design phase. A principal source of systematic variation is the inability to scale the wavelength of the light source for lithography, which has led to optical proximity eﬀects [17]. Figure 1.2 (top) shows an example of the design target for a given interconnection, its corresponding mask used for lithography and the printed shape on the wafer. Detailed analysis of the layout

3 1.2. PROCESS PARAMETERS VARIATIONS can be performed to account systematic variations at pre-manufacturing [18]. A resolution-enhancement technique called Optical Proximity Correction (OPC) is based on pre-distorting mask patterns to make printed shapes on wafer appear closer to the targeted shapes, as shown in Figure 1.2 (bottom) [1].

Lithography

Figure 1.2: Optical Proximity Correction. Taken from [1].

1.2.2 Non-Systematic Variations

Non-systematic variations, also known as random variations, correspond to the truly uncertain component of physical parameter variations. Non-systematic variations are modeled as individual random variables for each device in a circuit. These random variables are usually assumed to be normal-distributed [18, 19]. Non-systematic variations themselves can be classiﬁed into spatially correlated variations (Die-to-Die and Within-Die) and independent (Pure Random) variations.

Spatially Correlated Variations

Spatially correlated variations are those parameter variations that tend to shift similarly between devices. These variations mainly affect the channel length (L), the channel width (W) and oxide thickness (Tox) of the fabricated devices [19,20]. Spatially correlated variations arise at different fabrications steps, which define the spatial scale of the variation [21], as shown in Figure 1.3. Discrepancies

4 1.2. PROCESS PARAMETERS VARIATIONS in the alignment of the wafers when processing one lot of wafers to the next may lead to Lot-To-Lot variations, which affect all the devices within the same wafer lot similarly. When processing a given wafer, discrepancies between the different masks used during photo-lithography may lead to Wafer-to-Wafer variations. Furthermore, small variations in the laser intensity and exposure time while a par- ticular mask is scanned may lead to Reticle-to-Reticle Variations. At a smaller scale, variations across-mask may lead to within-die variations, which means that devices in the same die may have different critical dimensions [18]. As technology scales down, within-die variations have become a major contributor to the total parameters variations [7,19].

Between Wafers

Die to Die Variations Within Wafer

Between lots

Figure 1.3: Classiﬁcation of Spatially Correlated Process Variations.

Spatially correlated variations cause that the parameter’s shifts of a device in a certain location will change with some degree of correlation with the parameter’s shifts of another device located in a diﬀerent position. The amount of correlation between two devices has been shown to be a strong decaying function of the distance between them [2,22]. Figure 1.4 illustrates this phenomenon, where two devices closely located will be highly correlated, and two devices that are far away from each other will be poorly correlated. It is important to highlight that even two devices placed far away within a die will exhibit some degree of correlation due to variations from Reticle-to-Reticle to Fab-to-Fab scale, which impact nearly equal to all the devices within a die. A model for robust extraction of spatial correlation between two devices i and j from the circuit layout was proposed in [2],

5 1.2. PROCESS PARAMETERS VARIATIONS

Medium Correlation

High Correlation

Low Correlation

Figure 1.4: Spatial correlation behavior as function of distance [2].

ρij = exp(−dij/CD) (1.1)

where ρij is the degree of spatial correlation between the gates, dij is the distance in the layout between gates i and j, and CD is the correlation distance, which is a constant that modulates the decaying rate of the correlation, and it has been shown to be roughly half of the chip width [22].

Independent Variations

Independent variations are those non-exhibiting any degree of correlation between devices. These variations lead to devices with diﬀerent performance even when placed near to each other in the layout. A major source of independent variations is the Random Dopant Fluctuation (RDF) of the atom impurities doping the channel region of a transistor. Small variations in the implant dose, energy, or angle of the ion beam lead to ﬂuctuations in the discrete location of dopant atoms in the channel and source/drain regions.

These variations mainly impact on the threshold voltage (Vth) of the devices [7]. Since the number of dopant atoms reduces as technology scales down (See Figure

6 1.2. PROCESS PARAMETERS VARIATIONS

1.5), RDF has become of great concern in modern technology nodes [3].

Figure 1.5: Number of dopant for various technology nodes [3].

Threshold voltage variation due to RDF (σV th,RDF ) has been shown to be inversely proportional to the transistor area [23],

s AVT σV th,RDF = σVth0 · (1.2) Weff Leff

where AVT is a technology constant dependent on the critical dimensions, Weff and Leff are the eﬀective channel width and length, respectively, and σVth0 is the threshold voltage variation of the technology.

Modeling of Process Parameter Variations

Due to the process variations, the physical and electrical parameters of each device have to be modeled as a random variable. Variations in channel length, channel width, oxide thickness and threshold voltage, have been shown to be the dominant contributors to device performance variations [19,20]. Due to the diﬀerent sources of variations at Die-to-Die, Within-Die, and Independent scale, each parameter P can be represented as the sum of each contribution to total parameter variation:

7 1.3. TRANSISTOR AGING

P = Pnom + ∆PD2D + ∆PWID + ∆PR (1.3)

where Pnom is the nominal parameter value at design, and ∆PD2D, ∆PWID and ∆PR are random variables representing the parameter variations due to Die- to-Die, Within-Die, and Independent process variations sources. These random variables are assumed Gaussian-distributed [18].

1.3 Transistor aging

Together with process variations, transistor’s aging has become a major concern in deeply scaled semiconductor technologies [24]. Unlike process variations that are static in nature, transistors aging mechanisms cause a gradual deterioration of the electrical parameters of the devices over the lifetime, which in turn degrades the performance of the circuit [25–27]. Aging mechanisms reduce the proﬁt of semiconductor industry as they strongly impact on the lifetime reliability of the integrated circuits. The most important transistor aging mechanisms are described below.

1.3.1 Hot Carriers Injection (HCI)

HCI mechanism affects both NMOS and PMOS transistors. HCI occurs when the electric field at the drain-to-channel depletion region of a transistor is too high given kinetic energy to free electrons. Carriers are accelerated until they have enough energy to overcome the potential barrier of the Si/SiO2 interface and leave the channel. As is shown on Figure 1.6 some of those hot carriers damage the gate oxide and the interface or get trapped in the oxide and form space charges. Both mechanisms lead to a degradation of the transistor characteristics [28]. The rest of the carriers contributes to the gate or bulk current. The end result of HCI is a degradation of transistor’s parameters such as effective mobility and

8 1.3. TRANSISTOR AGING threshold voltage [16]. It is important to note that device’s damage due to HCI is not recoverable, hence it results in monotone transistor aging [29].

Figure 1.6: Hot Carriers phenomenon.

1.3.2 Time Dependent Dielectric Breakdown (TDDB)

TDDB, also known as gate oxide breakdown, is a failure mechanism caused by the long stress time of the Si − O2 at high electric ﬁelds. As illustrated in Figure 1.7, the oxide breakdown occurs when a conducting path through the gate oxide to substrate if formed due to electron tunneling current [4]. In addition to thin gate oxides, the saturating trend in supply voltage scaling results in a large electric ﬁeld applied to the gate oxide, which increases the probability of dielectric breakdown.

1.3.3 Bias Temperature Instability

Bias Temperature Instability (BTI) is the dominant aging mechanism in digital circuits, affecting both NMOS and PMOS transistors biased at specific stress conditions: positive gate bias for NMOS (PBTI) and negative gate bias for PMOS (NBTI) transistors [30]. In previous technology nodes, the PBTI effect on NMOS transistors was negligible in comparison to the effect of NBTI. However, since the

9 1.4. RELIABILITY OF DIGITAL INTEGRATED CIRCUITS

Figure 1.7: Oxide breakdown due to conducting path through the gate oxide to substrate [4]. introduction of high-k metal-gate technologies, PBTI eﬀect has become a critical aging concern for NMOS devices [31]. At stress conditions, devices threshold voltage (Vth) is increased due to BTI. Two physical mechanisms have been identiﬁed as contributors to BTI degradation (See Figure 1.8):

Interface Traps: Traps are generated at the Si − SiO2 interface due to the breaking of weak Si − H bonds under the presence of a high vertical electric ﬁeld. The generated traps at the silicon-oxide interface increase the device’s threshold voltage with a time power law relation (tn)[32][33][34].

Bulk oxide Traps: There exist a number of oxide defect states. Due to tunneling through the pre-existing defects within the oxide, a defect (trap) can capture a charge carrier from the channel, which increases the threshold voltage [30,34].

1.4 Reliability of digital integrated circuits

Reliability is deﬁned as the ability of a circuit to perform its required function under stated constraints (i.e., operating temperature, voltage, etc.) for a speciﬁed

10 1.4. RELIABILITY OF DIGITAL INTEGRATED CIRCUITS

g g g g s s d d s s defects defectsd d X X X X X X X X X X X n+ Interfacen+ trapsInterfacen+ traps n+ n+ Chargen+CarriersCharge n+Carriers n+

(a) Interface Traps Generation (b) Oxide Bulk Traps

Figure 1.8: Physical mechanisms for BTI.

lifetime [35]. The required degree of product reliability is application dependent. Figure 1.9 shows two different kinds of electronic applications. In the top: Consumer Electronics, which are electronic equipment intended for everyday use, most often in entertainment, communications and office productivity. Reliability of those electronic products is not very critical because their use is determined by a short time range and they can fail without causing major losses. In the bottom: Safety-Critical Applications, which are those applications whose failure could result in loss of life, significant property damage, or damage to the environment. Reliability is a great concern for safety-critical applications. There are many well-known examples in application areas such as medical devices, aircraft flight control, weapons, and nuclear systems. Furthermore, many modern information systems are becoming safety-critical in a general sense because their failure may result in a large financial loss or even loss of life [36].

1.4.1 Reliability and Yield

Reliability and Yield are two important parameters for the development of a new technology node. Yield can be deﬁned as the probability of failure of a recently manufactured device (t = 0), while reliability can be seen as the probability of a functional failure over an operating lifetime (t > 0) of the device. A fabrication

11 1.4. RELIABILITY OF DIGITAL INTEGRATED CIRCUITS

Figure 1.9: Consumer Electronics and Safety Critical Applications. Reliability is of great interest in Safety Critical Applications (Pictures from google images). process with low yield is unacceptable to start large-scale production, but even a process with high yield and low reliability is not economically sustainable at the long-term because many chips will fail to intended speciﬁcation before the expected lifetime ends. Moreover, for safety-critical applications, circuits with a low degree of reliability lead to catastrophic results.

1.4.2 Bathtub curve

The Bathtub Curve (See Figure 1.10) illustrates a measurement of the number of failures per unit of time (failure rate) vs the operating time [37]. The failures that can occur in a digital circuit are classiﬁed according to the point in the lifetime that these failures occur. Early Failures, also known as Infant Failures correspond to those devices that quickly fails during operational lifetime. These failures are presented due to manufacturing defects, consequently, their occurrence reduces as the manufacturing process reaches maturity. In addition, most of these defects can be detected during the post-manufacture test. Therefore, most of the circuits with early failure defects do not reach the customer. Random Failures correspond to those failures presented during normal lifetime and are attributed to random external events or early aging. The failure rate during a normal lifetime is usually very small, leading to correct circuit functionality. Aging Failures are failures caused by the physical deterioration, or aging, of the devices. The time at

12 1.4. RELIABILITY OF DIGITAL INTEGRATED CIRCUITS which these failures become dominant depends on technology parameters and the circuit’s operating conditions such as supply voltage, temperature, and workload.

Early Failures Random Failures Wear-Out Failures

Observed Failures Failure Rate

Lifetime

Figure 1.10: Failure rate represented by the bathtub curve.

1.4.3 Technology Scaling Impact on Reliability

The lifetime reliability of electronic products reduces with technology scaling. because transistor aging becomes more significant. One reason behind the increased impact of aging mechanisms is that constant field scaling [38] has not been achieved because it requires that threshold voltage scales proportionally to the feature size reduction. However, threshold voltage scaling is limited by the off-state current of the devices. Therefore, in order to achieve high-performance specification, supply voltage has not been scaled as aggressively as the devices feature size [39]. Another reason why supply voltage is not further reduced is to keep noise immunity under conservative levels. Hence, area scaling without proper supply voltage scaling results in higher electric fields across the transistor’s thin gate oxide, as illustrated in Figure 1.11. Higher power density is also a major contributor to accelerated circuit aging in scaled technologies. As illustrated in Figure 1.12, the power consumption has been increasing with technology scaling in spite of the reduced supply voltage. This is due to the larger device density on a chip. However, high power densities

13 1.4. RELIABILITY OF DIGITAL INTEGRATED CIRCUITS

6 10

4 8

2 10006 700 500 350 250 180 130 90 65 45 Technology node (nm) 4 Figure 1.11: Gate Electric ﬁeld increase vs. technology nodes [5]. 2 manifest in elevated1000 operating700 500 temperatures350 250 along180 the130 chip,90 which65 further45 triggers Technology node (nm) the aging mechanisms [14].

1200

1000 Leakage Dynamic 800

600

400

200

0 250 180 130 90 65 45 Technology node (nm)

Figure 1.12: Power consumption increase with technology scaling [6].

A paralleling trend with technology scaling is that process parameters variations increase due imperfections in the manufacturing process of the ICs [40]. Figure 1.13 illustrates the total variations of some transistor parameters for different technology years. As can be seen, the relative percent of parameters unpre-

14 1.5. IMPACT OF AGING AND PROCESS VARIATIONS ON PERFORMANCE OF DIGITAL INTEGRATED CIRCUITS dictability is becoming signiﬁcant with technology scaling. Furthermore, process variations impact with aging mechanisms and aggravate the lifetime reliability issue [41–43].

45 T V 40 L ox th 35 30 25 20 15 10 5 0 1997 1999 2002 2005 2006 Year

Figure 1.13: Total parameter variation vs. technology year [7].

1.5 Impact of aging and process variations on performance of digital integrated circuits

Figure 1.14 illustrates the impact of device’s parameters shifts on circuit performance for different levels of abstraction. Variations on device level parameters, which are caused by process variations and aging effects, result in a significant impact on electrical properties of devices (i.e., On-state current), which in turn affect circuit level parameters such as delay. Due to the joint effect of aging and process variations, the delay of logic gates become highly uncertain. Figure 1.15 illustrates the transient behavior for a high to low output transition of a gate cell under both process variations and aging effects. As can be seen, the time at which the output signal crosses 0.5V (≈

VDD/2) changes signiﬁcantly due to process variations. Moreover, as the devices

15 1.5. IMPACT OF AGING AND PROCESS VARIATIONS ON PERFORMANCE OF DIGITAL INTEGRATED CIRCUITS

High Circuit Parameters e.g.

Electrical Parameters e.g. , l of abstractionl of Device Parameter e.g. , , , ℎ

Leve Low

Figure 1.14: Impact of device parameters shift at diﬀerent levels of abstraction. deteriorate due to aging mechanisms (i.e., BTI), the time at which the transition occurs increases. As can be seen in the histograms, both the mean value and the shape (variance) of the delay distribution change due to aging [43,44].

1.2 1 0.8 0.6 0.4 , 0.2 , 0 0 20 40 60 80 100 120 140 time (ps)

80 70 60 fresh aged 50 40 Wear-out 30 20 10 0 20 25 30 35 40 45 Delay (ps)

Figure 1.15: Impact of aging and process variations on the delay of the logic gate. Top: Transient behavior. Bottom: Histogram of the delay samples.

A digital system is based on the structure shown in Figure 1.16(a). A combinational logic block receives various input signals, which are processed and prop-

16 1.5. IMPACT OF AGING AND PROCESS VARIATIONS ON PERFORMANCE OF DIGITAL INTEGRATED CIRCUITS agated through the logic gates to the outputs, where the resulting data is stored in memory elements (Flip-Flops). This task is performed in a synchronous way based on the clock signal. The period of the clock signal deﬁnes the available time to process the input logic data and propagate logic signals to the memory elements. The long-term eﬀect of aging mechanisms and with process variations may result in the incapability to process the logic signals within the clock period. This occurs when the summation of all gate delays along a given path is larger than the clock period. Therefore, an incorrect logic data is stored in the Flip-Flops.

Combinational Logic

D Q D Q D Q Critical Path D Q CKD Q CKD Q CK D Q CKD Q CK CK CK CK

Clock Signal (a) Digital System

Clock

Clock Signal Nominal Guardband

Clock+GB

Clock Signal with Guardband (b) Clock Signal

Figure 1.16: Digital System and clock waveform

Guardband for reliable operation As illustrated in Figure 1.16(b), a simple approach to assure safe circuit’s operation is to add a time guardband to the clock period. This guardband provides extra time for signal propagation through the critical signal paths having large delays [45,46]. The guardband is used not only to cover variations caused by aging and process, but also variations in the supply voltage and operating temperature

17 1.6. JUSTIFICATION OF THIS THESIS of the system, among others. The size of the guardband depends on the impact of each variation source on circuit timing [47], and it is usually estimated based on worst-case assumptions (i.e., worst process, voltage, temperature, and aging) to guarantee circuit’s correct functionality [48]. However, this means that maximum circuit’s performance is reduced. Unfortunately, since aging and process variations are aggravated with technology scaling, the required guardbands are becoming too large, limiting the beneﬁts of technology scaling.

1.6 Justiﬁcation of this thesis

Operational failures that may arise because of aging and process variations effects are unacceptable for safety-critical applications. Hence, the development of reliability-aware circuit design techniques is mandatory for current and future technology nodes. This thesis addresses aging issues from two diﬀerent perspectives:

1. Aging Monitoring:

This thesis proposes a methodology to eﬃciently identify, at an early design phase, the critical paths of the circuit that should be monitored for reliable on-line aging tracking. The major challenges addressed for critical path selection are the uncertainty caused by process parameters variations, the uncertainty on spatial correlation between gates since gate placement in the layout is unknown at early design phase, and the uncertainty of the operating workload of the circuit, which is unknown in advance at the design phase.

2. Reliable Circuit Design:

18 1.7. THESIS ORGANIZATION

Among the gates that belong to the critical paths of a circuit, some of them are more eﬃcient to be re-sized to compensate aging degradation.

This thesis proposes a circuit design methodology to reduce the guardband required for a circuit by smart selection an sizing of the circuit’s critical gates. Gate selection metrics are proposed to guide the sizing process of the circuits. One of the most important parameter taken into account by the metrics is the statistical path delay sensitivity to gate sizing, which depends on process variations, spatial correlation between gates and the BTI degradation of the devices in the path. Classic worst-case aging assumptions are avoided by analyzing the circuit performance for various possible operating workloads and identifying a more realistic maximum degradation of the paths. In such way, area overhead due to worst-case aging assumptions is further mitigated. The proposed approach provides efficient trade-offs between gate sizing and guarband to compensate BTI aging under process variations effects.

The contributions of this thesis will provide a better understanding of how the aging mechanisms aﬀect digital circuits. Moreover, new aging-aware design techniques will be available to help designers to approach the complex problem of the design of reliable circuits with high quality and yield.

1.7 Thesis organization

This thesis document is organized as follows: Chapter1 has presented a brief introduction of the reliability issues in nanoscale technologies and the justiﬁcation of this thesis. Chapter2 describes the Bias Temperature Instability mechanism and its modeling for long-term reliability analysis of integrated circuits. The impact of BTI at circuit level and state-of-art aging-aware techniques are also analyzed. Chapter3 presents the framework for BTI-Aware Statistical Static

19 1.7. THESIS ORGANIZATION

Timing Analysis, which is used to simulate and analyze digital circuits under the joint eﬀect of aging and process variations. Chapter4 presents the proposed methodology to identify, at an early design stage, the critical paths due to BTI and process variations of a digital circuit. The results from ISCAS circuits are also given. Chapter5 presents the proposed methodology for reliable circuit design using gate selection and sizing metrics. Finally, Chapter6 summarizes the general conclusions of this work.

20 Chapter 2

BTI Mechanism and its Impact on Digital Integrated Circuits

Among aging mechanisms, Bias Temperature Instability (BTI) is well known to be a major lifetime reliability limiting factor. This Chapter presents a detailed description and analysis of the BTI physical mechanism and its impact on digital circuits performance. Furthermore, a review of the state-of-art circuit design techniques to deal with transistor reliability issues is given. This chapter is organized as follows: Section 2.1 explains the physical mechanisms behind BTI degradation. Section 2.2 presents a widely used long-term model to predict the BTI-induced Vth shift of a device within a circuit. Section 2.3 describes the impact that BTI aging has on circuit level performance of digital circuits. Section 2.4 describes the principal BTI-aware circuit design techniques.

2.1 Physical Mechanisms for Bias Temperature Instability

Bias Temperature Instability is an aging mechanism that aﬀects both PMOS and NMOS transistors. The BTI eﬀect on PMOS, called Negative-BTI (NBTI), occurs

21 2.1. PHYSICAL MECHANISMS FOR BIAS TEMPERATURE INSTABILITY when a Negative gate to source voltage is applied to the device. NBTI is considered the major reliability issue before the 45nm technology node. However, the BTI eﬀect on NMOS device, called Positive-BTI (PBTI), has become important since the introduction of the high-k metal gate dielectric in sub-45nm technologies [49]. PBTI occurs when a positive gate to source voltage is applied to the device. At stress condition, trap generation at silicon-oxide interface and carriers trapping at oxide defects are two widely accepted BTI mechanisms responsible for the observed device’s threshold voltage (Vth) increase over the lifetime, resulting in a reduction of current drive capability [50]. An accurate estimation of the Vth degradation that devices experience is critical in the design of reliable digital circuits. The BTI mechanism has been explained using two physics-based models: The Reaction-Diﬀusion Model and The Trapping-Detrapping Model [30].

2.1.1 Reaction-Diﬀusion Model

The reaction-diﬀusion (RD) model is one of the earliest approaches to explain BTI mechanism. It explains that traps are generated at the silicon-oxide interface due to the dissociation of weak Si −H bonds under the high vertical electric ﬁelds and elevated temperatures [33][51].

Figure 2.1 illustrates the RD mechanism for BTI. In the silicon wafer, each Si atom is bonded with four other Si atoms, forming a tetrahedral structure, where all the states of the valence band of each Si atom are ﬁlled. At the silicon-oxide interface, most of the Si atoms are bonded with oxygen. However, due to structural imperfections, some Si atoms may remain unbounded. Those unsatisﬁed

(dangling) bonds of the Si atom are called interface traps [52]. The interface traps become charged when occupied by holes or electrons, hence contributing to threshold voltage shift. In order to mitigate the number of interface traps, the

Si − SiO2 interface is passivated with hydrogen atoms during fabrication hence some weak Si − H bonds are formed at the interface, as shown in Figure 2.1.

22 2.1. PHYSICAL MECHANISMS FOR BIAS TEMPERATURE INSTABILITY

Poly Gate Oxide (SO) Dangling Bond d s

+ + + + + P+ + + + + + + + P+

Carriers Channel Semiconductor

Figure 2.1: Reaction-Diﬀusion BTI Model.

However, when a gate voltage is applied (BTI stress), some weak Si − H bonds may break (Reaction) due to the high vertical electric ﬁeld. The dissociation of

Si − H bonds increases at elevated temperatures [33]. The released H atoms then diffuse into the gate oxide (Diffusion) in the form of different H − species (H or

H2). Then, the unsatisfied dangling bond at the interface can capture a charge carrier from the channel via field-assisted tunneling (See Figure 2.1). This reduces the total carriers that flow through the channel, which can be seen as an increase in the threshold voltage of the device. When the stress is removed (Vgs = 0), the device experience a partial recovery, when hydrogen species diffuse back to the interface and passivize some of the dangling bonds. The recovery phase has an important impact on the estimation of BTI degradation in the AC operation of digital circuits, where devices are continuously switching between stress and recovery BTI phases. However, this recovery is partial, and the overall BTI effect tends to gradually Vth.

The overall rate of interface traps generation in the RD model can be described by the following equation [33]:

23 2.1. PHYSICAL MECHANISMS FOR BIAS TEMPERATURE INSTABILITY

∂Nit = kF · (No − Nit) − kRNH (0)Nit (2.1) dt | {z } | {z } generation annealing

where No is the initial number of Si − H bonds, Nit is the number of interface traps, kF is the breaking rate constant of Si − H bonds, and it depends on the operating temperature, kR is the recovery rate constant when H species diﬀuse back to the interface and it also depends on the operating temperature, NH (0) is the number of hydrogen atoms at the Si − SiO2 interface.

2.1.2 Trapping-Detrapping Model

The Traping-Detrapping (TD) model explains BTI issue due to the capturing of individual charge carriers at the gate oxide defects, which in turn changes the effective threshold voltage of the devices [53], [54]. This mechanism is also used to explain random-telegraph noise (RTN) effects. Figure 2.2 illustrates the TD mechanism for BTI. It is assumed that some pre-existing traps (called oxide vacancies) are located in the dielectric of the device. These defects are presented due to the amorphous nature of the SiO2 [52]. When BTI stress is presented, trap energy (relative to the Fermi energy level) is modulated. If the trap gains sufficient energy, it may capture a charge carrier from the channel (via tunneling). A charge trapping event changes the number of carriers available to conduct current and also affects the mobility of channel charge carriers [34,55]. This leads to discrete steps in the threshold voltage of the device and the current flowing through the channel. When stress is removed, the trapped charge may be emitted after sufficient recovery time. Trapping and detrapping events have been shown to be stochastic in nature [56]. The probability of trapping and de-trapping a charge depends on capture and emission time constants, which are strongly dependent on stress voltage and temperature. The probabilities of defect occupancy in case of capture (PC ) and

24 2.1. PHYSICAL MECHANISMS FOR BIAS TEMPERATURE INSTABILITY

Poly

Gate Oxide (SO) Oxide Vacancy d + s

+ + + + + + + + + + + P+ P+ Carriers Channel

Semiconductor

Figure 2.2: Trapping-Detrapping BTI Model.

emission (PE) are deﬁned by:

τe 1 1 PC (tstress) = 1 − exp − + · tstress (2.2a) τc + τe τe τc τc 1 1 PE(trelax) = 1 − exp − + · trelax (2.2b) τc + τe τe τc

where τc, the capture time constant, represents the stress time at which the capture of a charge carrier become highly probable, τe, the emission time constant, represent the amount of time at which the emission of the charge carrier from the defect is highly probable, tstess and trelax are the stress and relaxation times. The traps that contribute to transistor deterioration due to BTI are those whose capture time constant is much larger than the emission time constant. Thus, the rate at which charge carriers are captured abruptly becomes larger than the rate at which carriers are emitted, and the number of trapped charge increases over the lifetime [34].

The eﬀectiveness of the reaction-diﬀusion model and the trapping-detrapping model for prediction of the performance degradation of logic gates were investigated in [57]. It was concluded that both models match well to obtain the BTI

25 2.2. LONG-TERM BTI PREDICTIVE MODEL degradation signature. The TD model was shown to be superior to capture fast degradation changes due to non-periodicity of the stress pattern. On the other hand, the RD model is more suitable for long-term modeling as it speed-up computational time [57].

2.2 Long-Term BTI Predictive Model

The previous section presented two physical mechanisms to explain the BTI issue in semiconductor devices. However, detailed physical simulation of BTI is not practical for long-term reliability analysis of very large scale integrated circuits, where millions of devices are presented. Furthermore, each device may experience a diﬀerent operating temperature and stress-recovery patterns (stress duty cycle), depending on the workload applied to the circuit, which further complicates physical simulations. Long-term predictive models aim to address the aforementioned issue by estimating the degradation of a device given the expected operating conditions. As illustrated in Figure 2.3, instead of performing detailed cycle-to-cycle stress and recovery phase simulation, a long-term model provides a tight upper bound for the degradation on the transistor.

Long term model h Vt Δ Stress and Relaxation phases

Figure 2.3: Long-term predictive model.

The ﬁrsts long-term predictive models [58, 59] were based on the Reaction- Diﬀusion physics. These models express BTI degradation as a power-law function

26 2.2. LONG-TERM BTI PREDICTIVE MODEL

n of time (∆Vth ∼ t ). Recently, long-term BTI models based on the Trapping- Detrapping physics have been investigated [34,60], where BTI degradation is expressed as a logarithmic function of time (∆Vth ∼ log(1 + C · t). Power-law models, based on the reaction-diﬀusion mechanism are still widely accepted in the literature [61–64]. The following closed form equation to calculate

BTI-induced Vth degradation was used [64],

p n n ∆V thBTI ≈ K·tox· Cox · (VGS − VTH0)·exp(Eox/E0)·exp(−Ea/kT )·α ·t (2.3)

where n is the time exponent, tox is the gate oxide thickness, Eox is the vertical electric ﬁeld, T is the temperature, k is the Boltzmann constant, Cox is the oxide capacitance per unit of area, VTH0 is the initial (fresh) threshold voltage value,

Ea and E0 are constants. The parameter α is the duty cycle, which refers to the percentage of time the device is in stress condition. The duty cycle of a device depends on the executed workload by the circuit. The duty-cycle solution of the long-term model avoids detailed cycle-to-cycle simulations and allows a fast estimation of device’s degradation. The constant parameter K is a technology- dependent ﬁtted constant, which may take a diﬀerent value for for NBTI and PBTI.

The closed form of Equation 2.3 determines the threshold voltage degradation ∆V th for a given time while taking into account the impact of important parameters. Figure 2.4a-d shows the behavior of both NMOS and PMOS threshold voltage degradation with diﬀerent parameters obtained with the long-term predictive model (Equation 2.3) for a 32nm CMOS technology [65]. Figure 2.4(a) shows the dependence with stress time. As can be seen, at the beginning of the lifetime operation, BTI causes a sharp increase in the threshold voltage. This occurs because there is a high density of weak Si − H bonds that can be broken. As more H atoms are removed from the interface over the lifetime, the increase

27 120 120 100 100 80 80 60 60 NMOS 40 NMOS 40 PMOS PMOS 20 20 0 0 2.2. LONG-TERM0 BTI2 PREDICTIVE4 6 8 MODEL10 60 80 100 120 time (years) temperature (°C)

120 120 120 120 100 100 100 110 120 120 80 80 80 100 100 90 60 60 80 6080 NMOS NMOS NMOS 40 40 40 60 PMOS 60 NMOSPMOS 70 PMOS 20 20 PMOSNMOS 40 NMOS 2040 0 PMOS 0 PMOS 50 20 200 0 2 4 6 8 10 0 60 0.2 0.480 0.6 1000.8 1 120 -100 -50 0 50 100 | V | 0 time (years) 0 stresstemperature probability (°C) th0 0 2 4 6 8 10 20 40 60 80 100 120 120(a) Dependencetime (years) with stress time (b) Dependence120 temperature with stress (°C) probability 100 110 120 120 120 120 80 100 100 100 90 60 80 80 80 NMOS 100 NMOS 40 70 60 60 60 PMOS PMOS 20 NMOS 40 NMOS 40 40 80 NMOS NMOSPMOS 50 PMOS 0 -100 PMOS-50 0 50 100 20 20 2060 80 100PMOS 120 | V | 0 0 0 temperature (°C) 60 th0 0 2 4 6 8 10 0 20 0.2 40 0.4 60 0.6 80 0.8 100 1 120 -100 -50 0 50 100 | V | time (years) stresstemperature probability (°C) th0 (c) Dependence with temperature (d) Dependence with the initial Vth 120 120 110 100 Figure 2.4: Threshold voltage degradation dependence with different parameters. 80 90 60 NMOS 40 NMOS rate of70 Vth reduces.PMOS Note that this power-law behavior (∆V th ∝ tn) is reported 20 PMOS 0 in the50 literature for the reaction-diffusion model [32]. Figure 2.4(b) shows the 0 0.2 0.4 0.6 0.8 1 -100 -50 0 50 100 | V | stress probability impact of the stress probabilityth0 (duty cycle). Stress probability is defined as the percentage of time the device is under stress condition [66, 67]. A large stress probability means that most of the time, the transistor is at BTI stress, with little recovery time, which results in a larger degradation. On the other hand, a low stress probability indicates that the device experience large relaxation pe- riods, which mitigates the total Vth deterioration. It is important to note that the specific stress probability experienced by the devices in a circuit depends on the input pattern applied to the circuit (i.e., the workload), which defines the signal probabilities (SP) at main circuit inputs. Depending on the circuit usage, these signal probabilities are translated into a stress probability for the devices at internal gates. Figure 2.4(c) shows that temperature has a significant impact

28 2.3. BTI IMPACT ON DIGITAL CIRCUITS

on BTI degradation. This is because Si − H bonds break easier at elevated temperatures, and H atoms diﬀuse at larger speed [68], [69]. Figure 2.4(d) shows the dependence of BTI degradation with respect to the initial (fresh) value of device’s threshold voltage. As mentioned in Chapter I, process-related parameters such as Vth can also change due to variations in the fabrication process. It can be seen a closely linear relation between the shift on Vth caused by process variation and the shift on Vth caused by BTI aging. When the magnitude of the initial threshold voltage is smaller (represented by a negative shift of V th0), the obtained Vth shift due to BTI is larger. Therefore, process variations and BTI interact with each other [43]. Note from Figures 2.4a-d that we considered that the degradation caused by PBTI mechanism on NMOS devices is slightly larger than the degradation caused by NBTI mechanisms on PMOS devices, as reported from silicon measurements in [31].

2.3 BTI Impact on Digital Circuits

The increment on Vth reduces the logic gates current capability to charge or discharge capacitive loads. A reduction in the current capability increases the delay of the gates to propagate the logic functions in a digital circuit. The aged delay of a gate (Dage) can be computed once the Vth shift of each of its transistors is predicted. The aged delay of a gate (Dage) is composed of the sum of the nominal gate delay (Dn) and the gate delay degradation caused by BTI (∆DBTI ) aging, as shown in Equation 2.4.

Dage = Dn + ∆DBTI (t) (2.4)

As shown in Figure 2.5, each gate in a path may have a diﬀerent degradation depending on its operating conditions such as temperature and the duty cycle of the devices in it. The total path delay degradation is given by the delay

29 2.3. BTI IMPACT ON DIGITAL CIRCUITS degradation contribution of each gate. As circuit ages, the gate delays increase, and consequently, it is possible that a path fails timing specifications, although the specifications were fulfilled right after manufacturing. Therefore, BTI aging limits the lifetime of digital circuits. The paths that may cause such faulty behavior are usually referred as Critical Paths.

Delay Delay Delay

Delay time time time Critical Path time Delay

Spec.

time

Figure 2.5: Delay degradation of a path due to BTI.

Figure 2.6 shows the delay degradation over the lifetime of two digital circuits. The degradation of each circuit may be quite different depending on the set of stress probabilities taken by the devices (represented α1, α2, and α3). If the input pattern applied to the circuit causes a large stress probability on critical devices, the delay degradation rate of the circuit increases. This behavior is of great concern in hardware security due to potential aging attacks [70]. Also, delay degradation of the circuits is different because of circuit’s topology characteristics, such as logic depth, and node arrangement of the gates, which impact on the stress probability of the devices [71, 72]. Several works in the literature show that the percentage of delay degradation caused by BTI can be around 10% to 30%, and its impact is aggravated in further scaled technologies [30]. The BTI impact on delay can be more pronounced in some gates than others because the specific stress probability that the devices experience during their operational time is non-uniform. Moreover, some gates may be more sensitive to the Vth shift due to BTI. Consequently, the delay degradation rate of two signal paths can also be quite different. This effect can lead to eventually reordering

30 2.3. BTI IMPACT ON DIGITAL CIRCUITS

200 175 : 880 150 125 100 75 50 : 1908 25 0 0 1 2 3 4 5 6 7 8 9 10 time (years)

Figure 2.6: Circuit delay degradation due to BTI for different ISCAS Circuits. of the circuit critical paths as time progresses [8]. Figure 2.7(b) shows the path reordering behavior for the small circuit shown in Figure 2.7(a). Although the delay at output 9 is initially the slowest of the circuit, the delay at output 10 eventually become the slowest one after some aging time. This is different to classic timing analysis with no aging, where the paths of a circuit have a fixed timing order over time. However, under BTI effects, the original critical path may become non-critical and vice versa. This makes complex the critical path definition under aging effects. Furthermore, this effect may become more pronounced under process variations.

As shown in Figure 2.8, if BTI-induced delay degradation is significant, a circuit may fail to time specification before the end of its expected lifetime, even if the specification was fulfilled right after manufacturing [73]. Due to BTI, severely aged paths may perform signal transitions after the clock edged (delay specification), where the results of logic functions are stored in flip-flops. Consequently, an incorrect logic value is stored as the result of the logic functions. Such failures are unacceptable for safety-critical applications. Therefore, lifetime reliability analysis and improvement is required for the design of reliable, high-performance electronic systems.

31 2.4. AGING-AWARE DESIGN TECHNIQUES Background and Justification BTI effect is translated into an increase of delay. Device level Gate Level

Δℎ = ⋅ ⋅ () = + Δ() (a) Circuit c17 (b) Delay degradation for two paths

Figure 2.7:Impact Critical pathon reorderingLifetime due toReliability diﬀerent aging rates [8].

Circuit Delay

Spec. A circuit may fail its timing (Tck) Fail to Spec. specification even if it was met right after manufacturing time

Obtained Expected Lifetime Lifetime 5

Figure 2.8: Aging eﬀects may cause a functional failure.

2.4 Aging-aware design techniques

Design techniques have been explored in the past to deal with circuit delay degradation due to BTI aging [74,75]. Table 2.1 shows an overview of the main aging- aware design techniques. They can be categorized in Design-Time techniques and Run-Time Techniques.

2.4.1 Design-time techniques

Design-time techniques are those that take place only once during circuit design phase. A simple design-time technique to account aging and process variations

32 2.4. AGING-AWARE DESIGN TECHNIQUES

Table 2.1: Aging-Aware Design Techniques Category Technique References

Guardbanding [45][76][46][77] Synthesis/Library Optimization [78][79][80][81][82][75] Design-Time Logic Restructuring/Pin Re-Ordering [71][72][83]

Dual VDD/VthAssignement [84][85][86] Gate Sizing [87][88][89][90][91][92][93][42] Input Vector Control [94][95][96][97] Internal Node Control [98][99][100][101][102] Power Gating [103][104][105] Run-Time Task Distribution [106]

Dynamic Voltage/Frequency/Vth Scaling [107][108][109][110][111][112][113] Error Prediction/Detection [114][115][116][117] is guardbanding. As explained in Chapter 1 (See Figure 1.16(b)), a large extra time is usually added to the clock period to provide enough time for correct propagation of digital signals through the paths experiencing severe aging and process variations. However, this approach significantly penalizes performance as the impact of both BTI and process are increasing with technology scaling. An alternative guardbanding approach is to use greater than nominal supply voltage [77]. However, this increases both the power consumption and the degradation rate of the circuit. High-level aging-aware synthesis perform circuit optimization to balance the post-aged delay of the paths instead of balancing the paths delays at design time. This is shown to be beneficial to extend the circuit lifetime under a given guarband [81]. Aging affected gates are replaced by a robust counterpart obtained from an aging-aware cell library. However, the design of that gate library can be very complex since gates must be designed to address different workloads. Logic restructuring and node reordering techniques manipulate the stress probabilities of some influential devices to the deterioration of the critical paths of the circuit. To reduce the stress probability of critical devices, logic restructuring modifies the arrangement of super gates. A super gate is a group of connected

33 2.4. AGING-AWARE DESIGN TECHNIQUES gates that is logically equivalent to a big AND/OR gate. By swapping some of the gates, the delay deterioration of the super gate is mitigated due to reduced stress probability of the critical devices [71]. Node reordering technique reduces the stress probability of the devices within a gate by using the stacking effect of the transistors: The stress probability of the transistor in the top of the stack depends on the stress probability of the transistors in the bottom of the stack. Therefore, the stress probability of the overall stack can be reduced by properly chosen the input signal that drives each device in the stack. Although these techniques are of low cost in terms of area and power, their effectiveness may not be sufficient to compensate aging.

Dual VDD/Vth assignment aims the co-optimization of power consumption and aging. A high (low) VDD (Vth) is assigned to the gates (transistors) in the critical paths. On the other hand, for devices (gates) in non-critical paths, a low VDD

(Vth) is assigned to save power. A drawback of using multiple supply voltages is the need of voltage level converters to connect gates in diﬀerent VDD regime. Gate sizing is one of the most eﬀective approaches to compensate aging and process variations. The current drive capability of the gates increases for wider transistors at the cost of area and power overhead. Most of the gate sizing approaches are based either on formulating an optimization problem and solving it using an exact optimization method, such as Lagrange Method. Other approaches use heuristics to guide the optimization problem, leading to close to optimum results with lower computational time than exact optimization methods.

2.4.2 Run-time techniques

Run-time techniques are those techniques that are dynamically applied to the circuit over the lifetime operation. Input Vector Control (IVC) and Internal Node Control (INC) are techniques that take place only during circuit standby mode. When the circuit is under

34 2.4. AGING-AWARE DESIGN TECHNIQUES normal operation conditions, these techniques are not applied. At standby, the input vector that minimizes the aging and static current consumption is applied to the circuit. The input vector that minimizes aging is the one that puts out of BTI stress those devices in the critical paths. Since the control-ability of internal nodes by main inputs is limited, the internal node control technique adds some logic at internal nodes to be able to manipulate the stress state of some critical transistors. Recently, a rejuvenation approach that applies the best stimuli that sets the critical devices at recovery state has been proposed in [97]. These rejuvenation stimuli are applied during small time intervals (not standby). One of the advantages of IVC and INC techniques is that the circuit can change between standby and normal operation modes quickly. However, when the duration of the standby mode is large enough, Power Gating technique, which disconnects the logic gates from the VDD/GND rails, may be more effective to minimize power and aging. In multiprocessors systems, it is possible to perform task distribution to every single core dynamically. The work in [106] proposed an algorithm that assigns the task requiring more resources, hence more critical for performance, to those cores that remains with low aging while easy tasks are assigned to the most aging affected cores. The effect is to balance the degradation rate of the entire system. Dynamic Voltage Scaling (DVS), Dynamic Frequency Scaling (DFS) and Adap- tive Body Bias (ABB), are three widely known run-time techniques to address aging issues. DVS uses a low supply voltage level when the circuit is fresh, so that power consumption is reduced. As the circuit ages, the supply voltage level is increased to keep reliable operation. Similarly, DFS uses a higher frequency when the circuit is fresh, and it gradually reduces the operating frequency as circuit ages. In such way, close to best performance is achieved along the lifetime. ABB technique uses the body effect to control the threshold voltage of the device. As the devices age due to BTI, the body bias voltage is increased to compensate the performance deterioration. However, ABB technique is no longer efficient in

35 2.4. AGING-AWARE DESIGN TECHNIQUES deeply scaled technologies due to the increased impact of variation in the threshold voltage. Moreover, in emerging technologies using FinFET devices, the body terminal is removed. Hence ABB is no longer applicable in those technologies. Error prediction and error detection are interesting techniques for monitoring the aging condition of the circuit. Aging sensors are introduced at the end-points of the circuit critical paths to keep under surveillance their delay degradation. When the circuit delay degradation is larger than a given threshold, the sensors detect this condition and generate a warning signal. In the error prediction technique, the warning signal is generated before a faulty operation really occurs. Thus the delay of the critical paths is still lower than the clock period, and correct data is stored in the memory elements (i.e., Flip-Flops). In the Error detection scheme, the warning signal is generated when a faulty operation has occurred (the logic signal was not propagated to the memory elements under the given clock period). Therefore, error recovery strategies are needed. Error prediction/detection are usually combined with other run-time approaches such as task distribution and dynamic frequency-voltage scaling [110], so that each time an error prediction/detection occurs, self-healing mechanisms such as increasing supply voltage or reducing operating frequency take place. In order to reduce the complexity and overhead of delay monitoring, an accurate selection of those paths that really require being monitored is needed.

36 Chapter 3

Aging-Aware Statistical Timing Analysis

As discussed in previous chapters, temporal transistor degradation caused by reliability mechanisms such as NBTI/PBTI has a great impact on circuit timing performance. This chapter provides a detailed explanation of the process of computing circuit delay considering aging effects. This chapter is organized as follows: Section 3.1 presents a background of the principal concepts for timing analysis. Section 3.2 presents the modeling of BTI degradation considering process variations effects and the impact of circuit workload. Finally, Section 3.3 provides a detailed explanation of the procedure for computing circuit delay using Statistical Static Timing Analysis and having into account the effects of aging due to BTI.

3.1 Background

Timing analysis is a very important step during circuit design. It is used mainly for timing veriﬁcation to make sure that the circuit design meets timing requirements. Timing analysis is also important for circuit optimization, since it allows to identify circuit segments requiring timing improvement.

37 3.1. BACKGROUND

3.1.1 Basic deﬁnitions for timing analysis

The timing performance of a digital gate can be measured in terms of the delay that takes to propagate a digital signal from its inputs to its output. Figure 3.1(a) shows an example of a simple NAND gate experiencing a signal transition at one of its input nodes (node X). Note that signal propagation through the gate occurs only if the other input (node Y) is at the logic value that sensitizes the gate (logic 1 for a NAND gate). Figure 3.1(b) illustrates the waveforms at gate input X and gate output Z. Some basic deﬁnitions are as follows:

D is the delay of the gate.

SRI is the slew-rate of the input signal.

SRO is the slew-rate of the output signal.

The delay of the gate (D) is measured between the 50% transition point of the switching input and output waveforms. The slew-rate of the input (SRI) and output (SRO) signals measures how quickly the signal switches from one logic value to the other. Both D and SRO are important timing metrics of a gate and have been widely used for timing analysis of an entire circuit [118].

“1” “1” (a) NAND gate with a transition at an (b) Transient waveforms for input (X) input. and output (Z).

Figure 3.1: Basic deﬁnitions for timing analysis.

38 3.1. BACKGROUND

3.1.2 Timing Analysis Methods

Timing analysis methods estimate the delay of a circuit in a faster approach than performing circuit level electrical simulations based on SPICE engine. Timing Analysis can be classiﬁed into two diﬀerent categories depending on the circuit traversal algorithm used: Block Based Timing Analysis and Path-Based Timing Analysis. Furthermore, depending on how the delay information is evaluated, timing analysis can be based on Deterministic Timing Analysis, Monte Carlo- Based or Statistical Timing Analysis

Timing Analysis Categories

Block-Based Timing Analysis The goal of block-based timing analysis is to compute the maximum delay that takes to propagate a signal from circuit primary inputs to every gate output node. This delay is called the Arrival Time (AT) of a node. In this method, the gates of the circuit are divided into blocks depending on their logic level and the arrival times are propagated in a level-based order, from primary inputs to primary outputs [119]. Figure 3.2 illustrates the propagation of arrival times for a two-input NAND gate using the MAX operator. As shown, the arrival time at the output of a gate corresponds to the maximum of the inputs arrival times, plus the delay through the gate [120]. In order to compute the delay of the subsequent gates, the input transition slews are also propagated in a similar way. One of the main advantages of block-based approaches is that computational time is linear with respect to the circuit size [121]. This is because arrival times are propagated only once for each gate. However, the use of MAX operator makes pessimistic the estimation of timing quantities, which may lead to over-design.

Path-Based Timing Analysis The goal of path-based timing analysis is to compute the speciﬁc delay of each signal path in the circuit. A signal path is any

39 3.1. BACKGROUND

− +

= ( + , + )

Figure 3.2: Arrival times propagation in block-based timing analysis approach. possible trajectory that a signal may traverse from a primary input to a primary output. Figure 3.3 illustrates the computation of the delay of two paths: Path 1 is composed of gates G0, G1, and G2, and path 2 is composed of gates G1 and G2. In this case, the path delay is computed using the operator SUM, which adds the delay contributions of each gate in the path trajectory. The delay of gates with multiple inputs is evaluated assuming that the non-switching inputs (also called path side-inputs) are tied at the logic value that sensitizes the gate.

= + + G0 G1 G2

= +

Figure 3.3: Propagation of gate delays through two paths.

The main advantage of path-based timing analysis is its high accuracy. An- other advantage is that it makes easier the identiﬁcation of paths having large delays since the topology and delay information of every path is available. Path- based timing analysis is more computationally costly than Block-Based timing analysis. This is because the delay of a gate is re-computed for each path passing through the gate [7]. For reliability analysis and improvement, a high accuracy on computed delays is desired since an over-estimation of the circuit delay may lead to signiﬁcant

40 3.1. BACKGROUND over-design. Therefore, the path-based approach is used in this thesis for timing analysis.

Timing Evaluation Methods

The main methods to evaluate the delay of a circuit are: Deterministic Static Timing Analysis, Monte Carlo Analysis, and Statistical Static Timing Analysis.

Deterministic Static Timing Analysis Deterministic Static Timing Anal- ysis (DSTA) computes circuit delay assuming that devices process parameters receive a ﬁxed value [18], so that gate delays are deterministic (See Figure 3.4).

+ + =

G0 ℎ G1 G2

Figure 3.4: Deterministic Static Timing Analysis. Gates take a ﬁxed delay.

In order to account for the impact of process variations on circuit timing using STA, a Corner-based analysis+ is usually adopted.+ The corner-based= analysis relies on the assumption that if the circuit operates correctly under the most pessimistic G0 ℎ conditions, then it will work correctlyG1 for all otherG2 process conditions [7]. Figure 3.5 illustrates the feasible corners of a fabrication process. The process corners are deﬁned based on the current capability that devices take. Typically, Slow- Slow (SS), Typical-Typical (TT) and Fast-Fast (FF) process corners are of great concern. In the TT process corner, the parameters are assumed to take their nominal values. In the SS corner, the parameters shifts of all the devices in the circuit are chosen to minimize the device’s current capability while current capability is maximized in the FF corner. In order to compute delays, the gate

41 3.1. BACKGROUND library needs to be characterized for the corresponding process conditions on each corner.

Feasible Process

= { ↑, ↓, ↓, ↓ }

= { ↓, ↑, ↑, ↑}

Figure 3.5: Process Corners for STA

The corner-based method is computational inexpensive since only one timing computation run is performed for each corner. However, designing circuits based on process corners leads to signiﬁcant over-design since the process corners are too pessimistic because the statistical behavior of correlated and independent variations is not considered.

Monte Carlo Analysis Monte Carlo analysis method is a brute-force approach to compute the statistical delay of a circuit. It consists on an exhaustive computation of the circuit delay for several circuit’s samples resulting from assigning a value to the process parameters of each device, which are modeled as a normal-distributed random-variable:

2 2 2 2 W = N(µW , σW ) L = N(µL, σL) T ox = N(µT ox, σT ox) V th = N(µV th, σV th) (3.1) At each sample, each device process parameter receives a diﬀerent value, which is generated according to the corresponding probability distribution function.

42 3.1. BACKGROUND

Then, DSTA is run to obtain the delay of the paths. This process is repeated several times to obtain a large enough number of samples for the desired accuracy of the statistical results. Figure 3.6 shows an example of the computed delay for a 20 inverter gate chain at each Monte Carlo trial. The mean value and standard deviation of the path delay are obtained from the histogram distribution for all the computed delays. The larger the number of trials that are performed, the higher the accuracy of the results [7].

1400

1200

1000

800

600 0 500 1000 1500 2000 trial

Figure 3.6: Computed delay of a 20 inverter chain path for each Monte Carlo trial. The obtained histogram is also shown.

Monte Carlo method provides a very accurate prediction of circuit yield and reliability. However, the major drawback is its computational ineﬃciency since a large number DSTA runs are required for high accuracy.

Statistical Static Timing Analysis Statistical Static Timing Analysis (SSTA) mathematically considers the probability distributions of the process parameters during delay computation, which in turn makes the gate delays to become a random variable, and gives a distribution of possible circuit’s delay outcomes rather than a single outcome. In such way, the statistical delay information of the paths (mean value and standard deviation) is obtained on a single delay computation run.

43 3.2. MODELING OF BTI DEGRADATION As shown in Figure 3.7+, in SSTA method,+ the statistical= sum of the gate delay distributions is performed. The mathematical treatment of the statistical sum of G0 ℎ the random variables representingG1 each gate isG2 usually based on simpliﬁed gate delay models (i.e., linear or second-order models) accounting for the impact of the process parameters.

+ + =

G0 ℎ G1 G2

Figure 3.7: Statistical Static Timing Analysis. Gates delays become a random variable.

Statistical Static Timing Analysis is less computationally costly than Monte Carlo method, while it keeps the statistical behavior of the process parameters into account. Therefore, SSTA is convenient for delay computation of large circuits.

3.2 Modeling of BTI Degradation

3.2.1 Process Variations-Aware BTI Model

Process variations have a strong inﬂuence on the degradation due to BTI that devices experience [43]. As can be observed in Equation 2.3, ∆V thBTI depends on the initial value of Vth (Vth0) of the device, which is impacted by process variations.

Figure 3.8 illustrates the shift in threshold voltage due to BTI (V thBTI ) as a function of lifetime for diﬀerent process variations in the initial V th (∆V thPV ). Data for this plot has been obtained using the long-term prediction model of

Equation 2.3. As can be observed, positive (negative) values on ∆V thPV produce a lower (higher) degradation rate. Thus, BTI-induced Vth degradation must be modeled as a random variable.

44 3.2. MODELING OF BTI DEGRADATION

Figure 3.8: Threshold voltage shift due to BTI as function of operational time for diﬀerent initial V th due to process variations

The impact of process variations in the long-term degradation of Vth can be accounted by a ﬁrst-order Taylor approximation of Equation 2.3 with respect to variations [122],

n n ∆V thBTI = (1 + Sv · ∆V thPV ) · A · α · t (3.2)

where ∆V thPV is the initial shift in Vth due to process variations, and A, and

Sv are ﬁtted constants that depend on technology node, temperature, and electric ﬁeld, and α is the stress duty-cycle of the transistor.

The total variation in Vth of a transistor m is obtained by adding the shifts contributions due to BTI and Process variations (∆V thBTI + ∆V thPV ).

∆V thm = ∆V thBT I,m + ∆V thP V,m

n n = (1 + SV,m · ∆V thP V,m) · Am · αm · t + ∆V thP V,m (3.3)

n n n n = Am · αm · t + (1 + SV,m · Am · αm · t ) ·∆V thP V,m | {z } AF

By analyzing Equation 3.3, it can be observed that the ﬁrst term gives the BTI

45 3.2. MODELING OF BTI DEGRADATION impact on the mean value of the threshold voltage of the device, and the second term shows that BTI interacts with process variations through the Aging Factor (AF ). The aging factor deﬁnes the BTI impact on threshold voltage variability. At t = 0 the total variation in V th is due to process variation only. However, as time increases aging eﬀects cause changes in both the mean and standard deviation of Vth [122]. This behavior can be appreciated in Figure 3.9, where a histogram of the threshold voltage distribution for a fresh and aged device is shown.

0.20

≈ ≈ ) 0.15 ≈ ≈

malize 10 years 0.10 Aging Nor ( unts 0.05 Co

0 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 V (V) th

Figure 3.9: Histogram of the threshold voltage distribution for a fresh and aged device.

3.2.2 Workload Impact on BTI

A workload corresponds to the set of input vectors (0’s and 1’s) applied to the main circuit inputs while executing a given task or program, as illustrated in Figure 3.10. Each input vector (set of bits at main inputs) is usually evaluated in one clock period. When a logic vector is evaluated, the internal nodes of the circuit (n1 and n2) switch to set a logic value. The operating workload of a circuit has a strong impact on the BTI-induced threshold voltage degradation of the devices since it inﬂuences the eﬀective time

46 : 0.42 0.54 0.98 0.77 0.02 0.22 : 0.68 0.63 0.83 0.57 0.17 0.35 ...... ...... ......

: 0.94 0.55 0.35 0.48 0.65 0.18 3.2. MODELING OF BTI DEGRADATION

Workload Circuit Under Analysis

0…..0100101111100

0….. 0111101110011

1…..0011101010110

Figure 3.10: Workload applied to a circuit. the devices experience BTI stress and the operating temperature [123, 124], an consequently it impacts on the circuit delay degradation (See Figure 3.11). De- pending on the input vector being evaluated, a set of devices enter in the stress condition of BTI for the current clock period. As the input vector alternates, the devices also alternate between the stress and the relaxation phase due to BTI. Therefore, the sequence of input logic vectors (workload) impacts on the stress time that each device experiences, and consequently, it deﬁnes the eﬀective stress duty-cycle of each device. The operating temperature depends on the power dis- sipation of the circuit, which is mainly due to dynamic power consumption and leakage power consumption of the digital gates. A logic gate whose output node is constantly switching from one logic state to the other (due to the input vectors being applied) consumes a higher dynamic power. The leakage power of a gate depends on the logic state of its input nodes.

Logic Simulation is an approach for estimating the impact of the workload on the aging degradation that a circuit will experience. A large number of input vectors are applied to the circuit and evaluated at the logic level. Then, it is counted how many times each transistor was settled at stress condition, i.e., Posi- tive Gate-to-Source bias for NMOS and Negative Gate-to-Source bias for PMOS. The duty-cycle of a device is approximated as the ratio of the number of times the device was at BTI stress condition to the total number of vectors. The operating temperature can also be estimated based on the average power consumption over

47 3.2. MODELING OF BTI DEGRADATION

Workload

Stress Operating Duty-Cycle Temperature

BTI-Induced Threshold Voltage Degradation of Devices

Delay Degradation

Figure 3.11: Workload impact on delay. the applied input vectors. However, performing such analysis for large-scale digital circuits is unpractical [66] since the exact sequence of input vectors applied to a circuit over the lifetime is very large.

A more eﬃcient approach is to represent the workload as a set of Signal Prob- abilities applied at main circuit inputs:

WL = {SPn1,SPn2, ...SPni} (3.4)

where SPni corresponds to the signal probability at input node i. As shown in Figure 3.12, the signal probability is an estimation of the average time, in percentage, an input node is at logic 1 for a given workload (WL) being executed over the lifetime. One of the advantages of using signal probability representation of the workload is that the exact sequence pattern of the logic inputs is not needed.

In order to estimate the degradation of the devices, the signal probabilities are propagated to internal nodes using probability rules. Then, the duty-cycle and the operating temperature of the devices are obtained from the signal probabilities at internal nodes. Once duty-cycle and operating temperature are obtained, the BTI-

48 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE

induced Vth shift of each device can be computed using the long-term predictive model of Equation 3.3.

LOGIC 0 LOGIC 1

NODE A = 80%

NODE B = 50%

CLOCK

Figure 3.12: Workload as a Signal Probability.

Degradation analysis is usually performed assuming a representative workload of the circuit, which can be estimated if information of a typical bit pattern trace is available [89]. However, the speciﬁc workload that a circuit will experience over the lifetime is rarely known in advance at the design phase, which complicates reliability analysis and causes signiﬁcant over-design due to worst-case assumptions. This issue will be addressed in the proposed design techniques in this thesis.

3.3 BTI-aware Statistical Timing Analysis Pro- cedure

Traditional Timing Analysis does not consider the temporal degradation caused by aging mechanisms such as NBTI and PBTI. However, as technology continues to shrink, aging analysis has become a mandatory topic during IC design and veriﬁcation to guarantee that timing constraints are met not only right after manufacturing but also over the circuit’s lifetime. This section provides a detailed explanation of the procedure of computing the delay of a circuit while accounting the eﬀects of aging and process variations. As a result, the statistical timing behavior over the lifetime of the signal paths in a circuit is obtained.

49 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE

An overview of the Aging-Aware timing analysis flow used in this work is presented in Figure 3.13. The process consists of four main steps. First, a pre- computing step, where the gate library is characterized via accurate SPICE simulations, takes place. Then, polynomials fitting the extracted data are obtained to allow a fast evaluation of gates delays using a gate delay model. Second, a signal path search step is performed to find all the topological paths that are in the circuit. Third, a workload and degradation analysis step is carried out. In this step, devices duty-cycles and operating temperature are estimated. With this information, Vth degradation of devices is computed. Finally, the statistical delay of the paths is computed. As a result, the circuit timing performance over time is obtained.

Cells Library Circuit Netlist Circuit Workload

HSPICE Simulations Path Search Algorithm Signal Probability Propagation

Polynomial Fitting Topological Paths Duty Cycle Temperature Library Characterization Signal Paths Search Computation Computation Paths Statistical Delay Gate Delay Model Vth Degradation Prediction Computation Workload and Degradation Analysis Statistics of paths delay (, )

Figure 3.13: Aging-aware path-based timing analysis procedure.

A more detailed explanation of each of the steps in the Aging-Aware SSTA ﬂow is provided next.

3.3.1 Gate Delay Model

In order to estimate the delay of a circuit, timing analysis methods require a gate delay model capturing the dependence of gate delay with respect to important parameters such as the size of the gate (K), loading capacitance (CL), input slew-rate (SRI), and operating temperature (T ). Furthermore, the model must account for the impact of parameters variations caused by process variations and

50 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE aging degradation (∆P ). The delay of a gate can be represented as a function of those parameters:

D = f(K, CL, SRI, T, ∆P ) (3.5)

Behavior of the delay of a gate

The behavior of the gate delay with respect to important parameters can be observed in Figures 3.14(a)-3.14(d). The data for these Figures were obtained for an inverter gate using SPICE electrical simulations. Figure 3.14(a) shows that delay reduces when gate size increases. This is because the current capability of the gate increases as transistors are made bigger (Id ∼ W ). It should be noted that the amount of delay reduction for larger gate sizes tend to reduce because the delay does not only depends on the gate current capability but also on its loading capacitance, which increases with gate size due to self-loading parasitics. Figure 3.14(b) shows that delay is closely a linear function of the gate loading capacitance. The behavior of gate delay with respect to the slew-rate at the switching input is shown in Figure 3.14(c). When the signal transition from a logic value to another takes a longer time, the delay of the gate also becomes longer. Figure 3.14(d) shows the impact of the operating temperature on gate delay. As can be observed, gate delay slightly increases when higher operating temperatures are presented.

Statistical Linear Gate Delay Model

For Aging-aware SSTA, the delay of each gate in the library is modeled as a linear combination of the random variables representing parameters shifts due to both process variations and aging eﬀects [18,122],

M D D D X D D = Dn + SW ∆W + SL ∆L + Stox∆tox + SV th,m∆V thm (3.6) m

51 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE

25 150150 25 150 125125 20 25 150 125 100100 20 125 10015 207575 100 15 75 5050 10 15 75 50 2525 10 50 255 1000 1x 2x 3x 4x 5x 6x6x 7x7x 8x8x 9x9x 10x10x 11 22 33 44 55 66 77 8 8 9 9 1010 25 5 0 Size 5 LoadLoad Capacitance Capacitance (fF) (fF) 0 1x 2x 3x 4x 5x 6x 7x 8x 9x 10x 1 2 3 4 5 6 7 8 9 10 1x 2x 3x 4x 5x 6x 7x 8x 9x 10x 1 2 3 4 5 6 7 8 9 10 Size 27 Load Capacitance(a) (fF) 2828 (b)Size Load Capacitance (fF)

2626 27 26.528 27 28 26 2424 26 26.5 26 26.5 24 2222 24 26 25.5 2026 22 20 22 25.5 25 25.51818 20 2025 10 20 30 40 50 60 70 80 90 100 25 50 7575 100100 125125 10 20 30 40 50 60 70 80 90 100 Temperature (°C) 25 SRISRI (ps) (ps) 18 25 18 Temperature (°C) 10 20 30 40 50 60 70 80 90 100 25 50 75 100 125 10 20 30 40 50 60 70 80 90 100 25 50 75 100 125 SRI (ps) Temperature (°C) SRI (ps) Temperature (°C) (c) (d)

Figure 3.14: Behavior of the delay of an inverter gate as a function of important parameters. a) Gate size, b) Load capacitance, c)Input slew rate, d) Temperature

D D D D where Dn is the nominal gate delay. SW , SL , Stox and SV th are the gate delay sensitivities with respect to deviations in W, L, tox, and Vth, respectively. M is the number of transistors in the gate. ∆V th is composed of two deviation components, one due to aging eﬀects and the other due to process variations (See Equation 3.3). As commented before, the delay of the gate is function of important parameters such as the size of the gate, loading capacitance, input slew-rate , and operating temperature (See Equation 3.5)

Equation 3.6 allows obtaining the statistical delay of a path by performing the statistical SUM operation using addition rules of linear combinations of random variables. The SUM operation will be explained later. The linear model is adequate for small variations as computational complexity remains low and the error due to discarded higher order terms can be neglected [18]. This is generally the case of intra-chip variations where the process parameter variations are relatively

52 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE small in comparison with the nominal values, and they can be approximated as normally distributed random variables [125]. The sensitivity coeﬃcient of the gate delay to variations in a parameter P is deﬁned as the partial derivative of the gate delay with respect to changes in the parameter P (See Equation 3.7). This derivative is evaluated at the nominal value of the parameter.

∂D(K, CL, SRI, T ) SD = (3.7) P ∂P

The behavior of the delay of an inverter gate as a function of shifts in Vth is plotted in Figure 3.15, where it is evident a closely linear dependence with respect to Vth. Therefore, the linear model is well suited to account for such variations.

50 = ℎ 45 40 0 20 40 60 80 100 120 Vth (mV) BTI

Figure 3.15: Gate delay as a function threshold voltage degradation.

3.3.2 Cell Library Characterization

In order to quickly and accurately estimate the statistical delay of a gate during timing analysis, every single gate in the technology library is pre-characterized with SPICE electrical simulations. Figure 3.16 illustrates the characterization process of an inverter gate having a low to high transition at its input. SPICE

53 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE simulations are run to measure the nominal delay and the delay sensitivities to device parameters (i.e. channel width, channel length, oxide thickness and threshold voltage) at various operating conditions given by combinations of input transition time (SRIN), the gate size (K), load capacitance (CL), and the operating Tem- perature (T e). The parameters combinations are obtained using full-factorial Design of Experiments (DOE), where a number of levels are assigned to each parameter. The larger the number of levels, the greater the resolution of the characterization process. In this work, 9, 16, 10, 5 levels were used for SRIN, K, CL, and T e, respectively. This process is repeated for each input transition type (low-to-high and high-to-low) and for each input node of each gate in the library. After SPICE simulations, tables with timing information for each gate are obtained, as shown in Figure 3.16. Then, polynomials that ﬁt the extracted data are obtained as a function of SRIN, K, CL and T e. Thus a mathematical expression if obtained to model the delay of each gate. The generated polynomials capture the timing behavior of the gates and are used for fast estimation of the statistical gate delays (using Equation 3.6) with close-to-spice accuracy.

, , ""

1 0.2 20 25° 10 12 50 , , "" 2 1 0.40.2202050°25°15101612 6050 = (, , , ) 4 2 0.60.43020100°50°20151916 7060 = (, , , ) 6 4 0.80.6503025°100°25202319 8070 6 0.8 50 25° 25 23 80 = (, , , ) … … … … … … …

Figure 3.16: Characterization process of an inverter gate.

The pre-characterization process is done for all the basic static cells (Inverter, 2-4 input NAND and 2-4 input NOR). However, it can be easily extended to more complex cells. It must be noted that although the pre-characterization process may be time-consuming, it is performed only once for the entire gates library and the given technology node. Therefore, once the gate delay polynomials have been obtained, they can be used to evaluate the statistical delay of a gate in any general combinational circuit.

54 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE

3.3.3 Topological Paths Search

A topological path represents a possible trajectory that a logic signal can take from a main circuit input to a main circuit output. All the possible paths that a logic signal may traverse need to be identified to verify their correct timing performance. Given the gate-level netlist, a path search algorithm is used to find the topological paths of the circuit. The first step is to identify the preceding and subsequent gates of every gate in the circuit. In such way, a graph for the circuit structure is obtained. Figures 3.17(a) and 3.17(b) show an example of a digital circuit and its corresponding graph for searching the topological paths. Each vertex in the graph represents a node of the gate-level netlist, and the edges correspond to the logic gates that propagates a signal from one node to another. S and T nodes are the source and the sink terminals, respectively. All the primary inputs are connected to the source vertex while all the primary outputs are connected to the sink vertex.

G0 G4 G0 G2 G4 G2 G1 G5 G1 G5 G3 G3 (a) Digital Circuit (b) Graph Representation

Figure 3.17: Example of a digital circuit and its graph representation for searching the topological paths

A depth-ﬁrst search algorithm is used to traverse the graph from a principal output to a principal input. The path search is performed as follows:

1. Initialize and start the path structure with an output node.

55 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE

2. Select and visit a non-marked adjacent vertex. Add the edge (gate) and vertex (node) to the path structure.

3. Repeat step 2 until a main input is reached. Mark the latest edge.

4. If all the edges arriving to a vertex are marked: a) Mark the vertex and its latest traversed output edge; b) Unmark the arriving edges to the vertex.

5. Repeat steps 1-5 until all the nodes connected to the output node (vertex) are marked.

6. Select other output node and repeat steps 1-6.

The result of the path search algorithm is a structure with information of the gates and nodes that compose each path in the circuit

3.3.4 Workload and Degradation Analysis

The objective of this step is to estimate the stress duty-cycle and operating temperature that the devices experience given an operating workload proﬁle for the circuit. In such way, an accurate prediction of the devices degradation can be made.

Signal Probability Propagation

In order to compute devices duty-cycle, the signal probabilities at main circuit inputs (representing the operating workload of the circuit) are propagated to the internal nodes ﬁrst. Signal probability propagation is performed using basic probability rules as illustrated in Figure 3.18. For an inverter gate, the signal probability at its output node is equal to the probability of its input node to be at logic zero. The formula to propagate the signal probabilities for other gates can be easily derived based on its truth table (See Figure 3.18). For simplicity

56 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE purposes, the signal correlation that arises due to re-convergent fan-out is neglected in this thesis. However, signal probability propagation methods taking into account signal correlation are available in the literature [66].

= 1 − = 1 − ( ⋅ ) = (1 − )(1 − )

A B Y A B Y A Y 0 0 1 0 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 1 1 0 1 1 0

Figure 3.18: Signal probability propagation rules for some basic gates.

Devices Duty-cycle Computation duty-cycle (α) of each transistor (See Equation 2.3) represents the eﬀective percentage of the lifetime that the device is likely to be at BTI stress condition, i.e., when the absolute value of the gate-source voltage of the device is equal to the bias voltage (VDD)[8]:

αm = P (|Vgs,m| = VDD) (3.8)

Once signal probabilities at every internal node of the circuit are obtained, the duty-cycle of each transistor can be computed. Figure 3.19 shows simple equations to compute the duty-cycle (α) of each transistor in some basic gates as a function of the signal probabilities at gate inputs. It is important to note that stacking eﬀect due to transistors in series has been considered [59].

57 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE

α = (1 − ) α = (1 − ) SP α = (1 − SP) α = (1 − ) SP SP α = SP ⋅ α = (1 − SP)(1 − SP) SP

SP α = SP α = SP α = SP α = SP

Figure 3.19: Devices duty-cycle computation based on signal probabilities at gates inputs.

Temperature Estimation

Temperature strongly influences BTI mechanism. The temperature profile of a circuit is obtained from the power consumption profile using the electric model show in Figure 3.20, which gives the following expression for the operating temperature of a gate:

Ti = RJ,i · Pi + RI−A · Ptotal + TA (3.9)

where Pi is the power consumption (Static and Dynamic) of the gate i, RJ,i is the junction to internal air heat resistance, Ptotal is the total circuit power consumption, RI−A is the heat resistance from internal air to ambient, and TA is the ambient temperature. This model assumes that devices affect one another through their influence on the internal chip temperature and allows a fast estimation of the chip thermal profile [9].

Vth Degradation Prediction

Once duty-cycle and operating temperature are obtained, the BTI-induced Vth shift of each device can be computed using the long-term predictive model of Equation 3.3.

58 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE

…

Gate 1 Gate 2 Gate i

Figure 3.20: Electric model for temperature estimation [9].

3.3.5 Path Statistical Delay Computation

Once computed the degradation of each transistor in the circuit, the delay of each path can be obtained. The path delay is calculated for rising edge and falling edge separately. The process consists of applying a rise (or fall) edge at the path start point (i.e., main input of the circuit) and then statistically add the delay contribution of each gate in the path. For a path p having N gates, the statistical add operation of N random variables (Di) representing the delay of each gate in the path is performed:

Dp = D1 + D2 + ... + DN = N(µp, σp) (3.10)

It should be noted that the statistical sum of random variables in SSTA is not as trivial as the simple sum of deterministic delays in DSTA. For SSTA, the add operation of gate delays requires computing the convolution of two probability distribution functions, which is computationally expensive [19]. In order to improve the speed of SSTA methods, the random variables representing each process parameter are usually assumed Gaussian-distributed [7]. Since gate delays are modeled as a linear combination of the transistors parameters variations, the path delay resulting from the addition of gates delays is also Gaussian distributed [126].

Mean Value of Path Delay The mean value of the path delay µp is the

59 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE simple sum of the mean value of the delay of each gate in the path.

N X µp = µD,i (3.11) i=1 Using the linear gate delay model of Equation 3.6, and excluding the terms not related with process parameters variations, it can be easily obtained that the mean value of the path delay is:

N N M,i X X X D,i n n µp = Dn,i + SV th,m · Am,i · αm,i · t (3.12) i=1 i=1 m=1 where the first summation adds the nominal delay of the gates in the path and the second summation adds the impact on the delay of the aging-induced V th degradation of the transistors in the gates. When the circuit is young (t = 0), the mean value of the path delay is defined by the nominal delay of the gates only. However, as operational time progresses, the mean value of the path delay increases due to the impact of devices threshold voltage degradation on the delay of each gate. The degradation rate of the path depends on the gates delay sensitivity coefficients to V th shift. Variance of Path Delay The computation of path delay variance requires to compute the covariance between each pair of gates in the path:

N N 2 X X σp = COV (Di,Dj) (3.13) i=1 j=1 In order to compute the covariance between the delay of two gates, the following theorem that applies to linear combinations of random variables is used:

Theorem 1 Given two linear combinations of random variables A and B:

A = a0 · X0 + a1 · X1 + ... + ak · Xk (3.14)

B = b0 · X0 + b1 · X1 + ... + bk · Xk

60 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE

where coeﬃcients ai and bi represent the sensitivity of the function A and B th to the i random variables Xi and Xi. The covariance between A and B is,

k k X X COV (A, B) = ai · bj · ρij · σX,i · σX,j (3.15) i=1 j=1

where σX,i and σX,i are the standard deviation of the random variables Xi and

Xj, respectively, and ρij is the correlation between them. By applying the the aforementioned theorem to compute the covariance between the delay of two gates modeled with the linear model of Equation 3.6 and assuming that there only exist correlation (due to spatial proximity) between variations on the same parameter type (i.e., W of gate i with W of gate j), the covariance between the delay of two gates can be expressed as,

D,i D,j COV (Di,Dj) = SW · SW · ρij · σW,i · σW,j

D,i D,j +SL · SL · ρij · σL,i · σL,j (3.16) D,i D,j +Stox · Stox · ρij · σT ox,i · σT ox,j M,i M,j X X D,i D,j + SV th,m · SV th,k · ρij · AFm · σV th,i · AFk · σV th,j m k

The spatial correlation between the two gates (ρij) can be extracted from the circuit layout using the model of [2], which models spatial correlation as an exponentially decreasing function of the distance between two gates (See Equation 1.1).

3.3.6 Inter-Path Covariance

The inter-path covariance measures how the statistical delays of a pair of paths jointly change due to process variations. It is useful to determine whether a path can take delays larger than other.

61 3.3. BTI-AWARE STATISTICAL TIMING ANALYSIS PROCEDURE

Let’s consider two paths, a path A formed by K gates, and a path B formed by L gates. The inter-path covariance can be computed as:

K L X X COV (DP,i,DP,j) = COV (Di,k,Dj,l) (3.17) i=1 j=1

th where COV (Di,k,Dj,l) represents the covariance between the delay of the k gate in path A and the delay of the lth gate in path B. Inter-path covariance depends on how far are placed in the circuit layout the gates of one path from the gates of the other path. A special case to be considered is when the two paths share the same gate, as illustrated in Figure 3.21. In this case, the spatial correlation between the shared gates in the paths when computing

COV (Di,k,Dj,l) becomes maximum (the unit). The case shown in Figure 3.21 is also known as structural (topological) correlation. Therefore, the structural correlation between paths increases the inter-path covariance.

Figure 3.21: Two paths with structural correlation, which increases inter-path covariance.

62 Chapter 4

Early Selection of Aging Critical Paths for Delay Degradation Monitoring

Delay degradation monitoring is a technique to deal with transistor aging by keep- ing under surveillance the delay degradation of the circuit’s critical paths. In such way, prevention actions can be performed when the circuit has aged. This chapter presents a methodology to identify the critical paths of a circuit for reliable aging monitoring. The proposed methodology uses a statistical selection criterion to de- cide whether a path should be monitored. The proposed methodology takes into account the fact that different paths may become critical for different operating workloads. In order to identify the critical paths early in the design stage, where no layout information is already available, a spatial correlation approximation has been proposed. This allows identifying the correct placement of the aging monitors before the final circuit layout has been done.

63 4.1. AGING MONITORING TECHNIQUE

4.1 Aging Monitoring Technique

Aging monitoring is a smart way to deal with delay uncertainty caused by aging effects. Instead of adding worst-case margins to the clock period, the actual degradation of the circuit is monitored along the lifetime, and the system can react if degradation becomes significant. In order to measure the delay degradation of a circuit, in-situ circuit failure prediction sensors have been extensively investigated in the literature [115, 127–132]. As its name indicates, failure prediction sensors detect if a path is close to violating the clock period due to aging effects. Figure 4.1(a) illustrates the basic structure of an aging sensor and its behavior can be understood from the timing diagram shown in Figure 4.1(b). When the circuit is young (CP fresh), the CP transition occurs before the rising clock edge and also before a small timing window settled by the delay element (DE), where hazardous transitions are detected. Thus, the delayed signal (CP delayed) also makes its transition before clock edge and both the main flip-flop and the shadow flip-flop in the aging sensor store the same value. When CP wears due to BTI (CP aged), it performs a transition inside the detection window, so that each flip-flop stores a different value (note that the correct value is stored in the main flip-flop) and the sensor output signal is activated. The output of the prediction sensor can be used as control signal to perform countermeasures such as adaptive body biasing, supply voltage scaling, and frequency scaling, to guarantee reliable operation. Other structures for error prediction sensors have been suggested in [115,127–132].

A basic framework for aging monitoring is shown in Figure 4.2. At design phase, the critical paths of the circuit are identiﬁed. The critical paths are those likely to become the longest path of the circuit and violate timing speciﬁcations. Then, aging sensors are placed at the selected paths in order to be able to keep under track their delay degradation. The clock period is initially settled to leave a small timing guardband to assure reliable operation until the time when the circuit aging is tested. During lifetime operation, delay-test is periodically performed. A

64 4.1. AGING MONITORING TECHNIQUE

Main FF Critical Q D Q Path

CLK

Delay XOR2 Sensor Element Shadow FF Output D.E D Q

CLK CLK

AGING SENSOR (a) Error Prediction Architecture (b) Waveform

Figure 4.1: Aging sensor operation for error prediction. test vector is applied to the circuit activating each of the critical paths, and if the sensor output is activated, which indicates that a path is close to violate timing speciﬁcations, preventive actions are taken, such as increasing the clock period. In such way, close-to-best performance is achieved over the lifetime. )

Design Circuit for Desired Performance = (

se se Select Critical Paths to be Monitored

Aging Sensor Insertion on Critical Paths Design Design Pha

Periodical On-line Delay Test ) >

( Error e NO Predicted? YES Lifetim Preventive Actions (i.e. Decrease Frequency)

Figure 4.2: Aging Monitoring Framework

A major concern for aging monitoring is the identification of the critical paths to be monitored at design phase. A cost-efficient methodology for critical paths selection is needed to guarantee that aging sensors are effectively introduced only at output nodes of those paths strictly requiring to be monitored.

65 4.2. RELATED WORK

4.2 Related work

Different approaches for aging analysis and critical path selection have been explored in the past. Aging-Aware Deterministic Static Timing Analysis (DSTA) is used in [133], [134], to identify the critical paths of a circuit. Those paths that under aging effects exceed a threshold delay value are considered as critical. How- ever, circuit aging strongly depends on process variations (PV), and Aging-Aware DSTA cannot capture such combined effect of PV and BTI.

Aging-Aware Statistical Static Timing Analysis was used in [135] for circuit lifetime optimization. The critical paths to be optimized were those paths whose µ + 3σ of the aged delay was larger than the clock period (delay constraint). However, it should be noted that under a frequency scaling framework, the clock period of the circuit is not unique, as it is periodically adjusted to the actual longest path of the circuit. Therefore, although some paths may exceed the initial clock period, they would not trigger any logic fault if their delays are always shorter than the delays of the monitored paths.

In [115], [136] the joint effect of aging and process variations are considered. The selection of critical paths was improved by computing the probability of a degraded path under analysis (t = 10 years) to take a delay longer than the Longest Critical Path (LCP) of the circuit with no aging (t = 0 years). There are two main issues in these approaches: 1) Worst-case duty-cycle assumptions were made (i.e. the path under analysis is assumed under maximum aging whereas the LCP is assumed fresh), which leads to a large number of unnecessarily selected critical paths; 2) These approaches require spatial correlation information to perform Aging-Aware SSTA, but correlation information can be extracted only after final circuit layout has been made. However, critical paths need to be identified at early stages of the design flow in order to introduce test circuitry. Furthermore, it should be noted that some paths that are not covered by monitoring the LCP only may be covered by monitoring another path.

66 4.3. CHALLENGES FOR CRITICAL PATH SELECTION

The main contributions of this work are:

1. Critical paths are selected statistically using a greedy algorithm and a spatial correlation approximation that allows performing SSTA early in the design ﬂow.

2. Critical paths are selected for various realistic operating workload proﬁles. Thus, diﬀerent aging conditions of the paths are taken into account during path selection and classic worst-case aging assumptions are avoided.

4.3 Challenges for Critical Path Selection

Selecting more paths than required leads to additional area overhead introduced by non-required aging sensors, and extra online test effort. On the other hand, monitoring fewer paths than required may lead to lower reliability since potential errors may not be predicted, and consequently, the clock frequency may not scale as demanded by the circuit. The identification of the critical paths that should be monitored is a complex problem because of the following challenges: 1) The critical Paths depend on process variations and need to be selected statistically; 2) The critical paths depend on the operating workload of the circuit, which is usually unknown in advance at the design phase. These challenges are described in more detail later. Let’s first define the concept of critical path.

4.3.1 Critical Path Deﬁnition

A path is said to be a Critical Path (CP) if it can become the slowest path of the circuit over the lifetime. Due to the combined eﬀect of aging and process variations, there is a set of paths satisfying the above condition. Periodical online-test should be done over the critical paths, as they are the ones likely to fail ﬁrst due to aging. Figure

67 4.3. CHALLENGES FOR CRITICAL PATH SELECTION

4.3 illustrates a possible temporal behavior of some paths in a circuit and how clock period should adapt accordingly. Path P1 is initially the longest path of the circuit hence the clock period is settled slightly larger than P1 delay. Each time the sensors detect when P1 is close to exceeding the clock period due to BTI degradation, the clock period is increased to keep correct operation. However, path P2, that is not a critical at t = 0 can also become the longest path of the circuit after a given operational. Therefore, path P2 also requires being monitored. Note that the delay of path P3 is always shorter than P1 and P2 over the entire lifetime. Thus monitoring P1 and P2 (the critical paths of the circuit) is suﬃcient to assure reliable operation.

, , ,

P1 defines P2 defines clock frequency clock frequency

Figure 4.3: Critical Paths Deﬁnition Over time

An eﬃcient critical path selection should identify only those paths that may become the slowest path of the circuit over the lifetime as those paths “fail-ﬁrst” than the others. However, this is not a trivial problem due to the following challenges.

4.3.2 Critical Paths dependence on process variations

Process variations play an important role in determining whether a path is likely to become critical and should be monitored. Figure 4.4 illustrates the delay distributions of various paths in a circuit. The path P1 is taken as reference and we want to know if the other paths can become slower than path P1. As can be

68 4.3. CHALLENGES FOR CRITICAL PATH SELECTION observed, identiﬁcation of those paths that can become slower than P1 do not only depends on the mean value of the paths delay. Due to process variations, the probability of the delays of a path to be longer than the delays of other path depends on the variance of each path and also on the covariance between them. As explained in previous Chapters, spatial correlation further impact on these parameters and consequently on the probability of a path to become critical.

Paths

Delay

Figure 4.4: Process variations impact on deﬁning critical paths

The critical paths should be selected statistically by means of Statistical Static Timing Analysis (SSTA). However, conventional SSTA approaches require detailed information on gate placement in the corresponding layout to account for spatial correlation between devices, but this information is not usually available in an early design phase [137–139], where the critical paths should be identiﬁed to introduce aging sensors and other test circuitry.

4.3.3 Critical Paths dependence on operating workload

The aging rate of the circuit is highly sensitive to the executed workload since it defines the average time that devices are at stress condition (duty-cycle). Fig- ure 4.5 shows a histogram of the aged delay of a path in ISCAS c880 circuit for 1000 different workload profiles. As shown, overall path delay degradation has a Gaussian-like distribution, as was also found in [140,141]. The delay degradation

69 4.3. CHALLENGES FOR CRITICAL PATH SELECTION that the path experiences significantly changes depending on the executed workload. For this example, up to 50ps of different delay degradation can occur for different workload profiles.

450 . 400 350 Fresh 300 250 Worst-Case 200 150 100 50 0 380 400 420 440 460 480 500 Path Aged Delay (ps)

Figure 4.5: Histogram of the delay of a PCP for various workload proﬁles (ISCAS c2670)

A major issue to consider the workload impact on BTI and circuit delay degradation is that the exact operating workload that a circuit will experience over the lifetime is uncertain at the design phase. If the workload of a circuit is known in advance, the circuit can be optimally designed. Otherwise, if the workload is not (exactly) known, conservative assumptions have to be made to assure that the circuit does not fail during its specified lifetime. State-of-art approaches solve workload uncertainty issue by assuming a specific value for the SP at each input of the circuit (usually SP = 0.5 is assumed) [71,81,89]. However, this assumption may not be reliable enough if the actual workload applied to the circuit differs from that assumed during design. Other approaches assume worst-case degradation conditions, settling the devices under near-static stress (α ≈ 1) and elevated temperatures [142, 143]. As shown in Figure 4.5, the aged delay estimated for a path using worst-case assumptions is much longer than the actual delay that the path can take for various realistic operating workloads. Although circuit design under this assumption guarantees reliable operation, significant over-design (area and power overhead) may be obtained.

70 4.4. PROPOSED CRITICAL PATHS SELECTION CONDITION

4.4 Proposed Critical Paths Selection Condition

A path should be monitored if it can become the longest path of the circuit over the lifetime. This can be expressed as follows:

Selection Condition: A path i should be monitored if for a given operating workload its probability over the lifetime to take delays longer than every other path j is greater than a probability threshold (ε).

The above selection condition can be expressed as follows:

P (Di > D1) ≥ ε & P (Di > D2) ≥ ε & ... & P (Di > Dj) ≥ ε (4.1)

Equation 4.1 is based on comparing the distributions between a pair of paths. If for all the feasible workload conditions a path i does not satisfy Equation 4.1 for at least one path j, the path i does not require to be monitored as the delays of other paths are always longer, which means that the other path will fail first. The probability threshold (ε) defines the maximum allowed probability of a non-monitored path to becoming the longest path of the circuit. If this happens, a faulty operation may occur. Therefore, the probability threshold (ε) establishes a trade-off between the desired degree of circuit reliability and the corresponding number of paths to be monitored.

> > > >

: −

Delay Delay Difference

Figure 4.6: Statistical Delay Diﬀerence Between two Paths

For a given pair of paths i and j, the probability of path i to be greater than

71 4.4. PROPOSED CRITICAL PATHS SELECTION CONDITION path j can be computed from the bivariate joint probability density function. However, the two-dimensional problem is transformed to a uni-dimensional problem using the delay diﬀerence (DDij = Di − Dj) random variable, and evaluating its probability of being greater than zero, as shown in Figure 4.6,

P (Di > Dj) ≥ ε ⇒ P (DDij > 0) ≥ ε (4.2)

The delay diﬀerence between two paths is normally distributed with a mean

2 value (µDD) and a variance (σDD) given by:

µDD,ij = µDi − µDj (4.3a)

2 2 2 σDD,ij = σDi + σDj − 2COV (Di,Dj) (4.3b)

where µDi and µDj are the mean delays of the paths under analysis, σDi and

σDj are the standard deviations of the delays of the paths under analysis, and

COV (Di,Dj) is the inter-path covariance between Di and Dj. Once computed 2 µDD,ij and σDD,ij between two paths, the probability of Equation 4.2 is obtained by integrating the PDF of DDij in the region where it takes values greater than zero. The integral is approximated by Gaussian quadrature method with 15 points [144].

Note that both, µDD,ij and σDD,ij depend on the operational time and the operating workload, which is uncertain during design. Furthermore, σDD,ij also depends on the spatial correlation of the gates, which depends on the placement coordinates of the gates in the ﬁnal layout, which are unknown at early design phases. The following sections explain how to handle uncertainty due to unknown spatial correlation and operating workload.

4.4.1 Handling Spatial Correlation Uncertainty

The spatial correlation that arises due to devices proximity in the circuit layout has a signiﬁcant impact on the selection probability of a path. As commented in

72 4.4. PROPOSED CRITICAL PATHS SELECTION CONDITION

Chapter 3, spatial correlation can be extracted from the ﬁnal circuit layout using the model of [2]. However, critical path selection and aging sensor insertion need to be performed at an early design stage, where the coordinates of the transistor locations in the layout are unknown. The following lines explain the spatial correlation approach used to perform SSTA at an early design stage, without spatial correlation information from the circuit layout.

The standard deviation of the delay diﬀerence between two paths i and j can be expressed as:

2 2 2 σDD,ij = σDi + σDj − 2COV (Di,Dj)

N N M M N M X X X X X X = ρxy · σx · σy + ρxy · σx · σy − 2 · ρxy · σx · σy x∈i y∈i x∈j y∈j x∈i y∈j (4.4)

where N and M are the numbers of gates in path i and j, σx and σy are the standard deviation of the delays of gates x and y in the corresponding path, and

ρxy is the spatial correlation between the gates. ρxy in the ﬁrst and the second summation relates two gates in the same path while ρij in the third summation relates one gate in path i and one gate in path j when computing inter-path covariance.

The values of spatial correlation required to compute the standard deviation of the delay diﬀerence can be represented by a correlation matrix as shown in

Equation 4.5. The term ρmi,nj represents the spatial correlation between gates n and m in paths i and j, respectively. Matrix indexes highlighted in green correspond to the spatial correlation between gates in the same path. The diagonal of the matrix highlighted in red corresponds to the self-correlation of a gate, which is the unit, no matter where the gate is placed. Those values are used to compute each path variance (First two summations of Equation 4.4). Matrix

73 4.4. PROPOSED CRITICAL PATHS SELECTION CONDITION indexes highlighted in blue correspond to the spatial correlation between a gate in one path and a gate in the other path. Those values are used to compute the inter-path covariance (Third summation of Equation 4.4). Note that in case of structural correlation, some of the spatial correlation values relating gates in the diﬀerent paths become the unit.

  ρ ρ ... ρ ρ ρ ... ρ  1i1i 1i2i 1iNi 1i1j 1i2j 1iMj       ρ2 1 ρ2 2 ... ρ2 N ρ2 1 ρ2 2 ... ρ2 M   i i i i i i i j i j i j     ......   ......         ρNi1i ρNi2i ... ρNiNi ρNi1j ρNi2j ... ρNiMj    ρij =   (4.5)    ρ1j1i ρ1j2i ... ρ1jNi ρ1j1j ρ1j2j ... ρ1jMj         ρ2j1i ρ2j2i ... ρ2jNi ρ2j1j ρ2j2j ... ρ2jMj       ......   ......      ρMj1i ρMj2i ... ρMjNi ρMj1j ρMj2j ... ρMjMj

In order to select critical paths under conservative assumptions due to spatial

2 correlation uncertainty, σDD should be approximated in such way that its value is maximum, maximizing the probability of the Delay Diﬀerence variable to take values greater than zero, as illustrated in Figures 4.7(a) and 4.7(b).

2 The approximation of σDD can be made assuming spatial correlation bounds, using a similar idea to that in [137–139]. Analyzing Equation 4.4, it can be

2 observed that an upper bound for σDD can be obtained as follows:

Assume that the spatial correlation for computing the variances of the delay of paths i and j is the unit (gates are assumed fully correlated). Thus

74 4.4. PROPOSED CRITICAL PATHS SELECTION CONDITION

3 3

(a) Lower σDD (b) Larger σDD

Figure 4.7: Impact of σDD on the selection probability of a path (P1 < P2).

estimated variances of the paths delays are maximum.

Assume that the spatial correlation for computing the inter-path covariance between paths i and j is zero (except if a structural correlation exists). Thus estimated inter-path covariance is minimum.

2 2 In such way, a maximum value for the sum of paths variances (σDi +σDj) is obtained, and a minimum value is obtained for inter-path covariance (COV (Di,Dj)).

Thus overall σDD is maximized. However, this approximation would result in a signiﬁcant overestimation of the σDD values that are obtained after gate placement in the layout, and consequently, in the number of paths selected for delay monitoring [114,145].

In order to develop a more efficient method to estimate σDD without spatial correlation, an analysis of the behavior of the terms in Equation 4.4 was done using spatial correlation information extracted from the final layout of some circuits. For this experiment, the layouts of the circuits were done using Mentor Graphics EDA tool. The gates were placed in the layout using the auto-place option. Then, Statistical Static Timing Analysis was performed using spatial correlation information extracted with the model of [2]. Then, the standard deviation of the delay difference (Equation 4.4) between each path i in the circuit and a reference path j was computed. The path with the longest mean delay was used as reference

75 4.4. PROPOSED CRITICAL PATHS SELECTION CONDITION path (j) for this experiment. Figures 4.8(a) and 4.8(b) plot the terms of Equation 4.4 obtained for this experiment for two ISCAS circuits. Each dot in the figures represents a different path in the circuit. As can be observed, there exists a relationship between the sum of path variances and the corresponding inter-path covariance. For those paths with larger variances, which are usually the paths with the larger mean delay too, the inter-path covariance with the reference path tends to be greater. Those paths whose delay is larger than the 90% of the maximum delay of the circuit are highlighted in green in the Figure. Those paths with largest delays are the ones with larger probabilities to become critical. As can be seen, both the sum of path variances and the corresponding inter-path covariance tend to be larger for those paths. This trend was also observed for other ISCAS circuits. These results demonstrate that the correlation bounding approach does not capture such dependency. Based on this observation, a first approximation is made:

Correlation Approximation I: A global correlation (ρg) value is assumed to estimate the standard deviation of the delay diﬀerence between two paths.

Correlation Approximation I means that all ρmi,nj in Equation 4.5 are replaced by a single ρg value, except when gates m in path i is equal to the gate n in path j, where ρg becomes the unit. Some paths in a given circuit could be out of the trend observed in Figures 4.8(a) and 4.8(b), and consequently, the global correlation approximation may be inaccurate for them, leading either to an overestimation or an underestimation of the σDD. The impact of this approximation will be analyzed later in the results part.

A second issue is to determine the best ρg value to be used to estimate σDD in such way that most of the critical paths are correctly selected. Therefore, selected

ρg should maximize σDD. The experiment illustrated in Figure 4.9(a) is used to analyze which global correlation value (between 0 and 1) maximizes σDD between two paths. The standard deviation of the delay difference between two paths i and j of different depths is computed using different global correlation values.

76 4.4. PROPOSED CRITICAL PATHS SELECTION CONDITION

(a) c432 (b) c880

Figure 4.8: Relationship between sum of path variances and inter-path covariances for some ISCAS circuits.

Path j is used as a reference path and has a ﬁxed depth of 20 inverters, while path i depth is changed from 1 to 20 inverters. Figure 4.9(b) shows how σDD changes as function of the depth of path i for various ρg. As can be observed, for

ρg = 0, σDD is low for short paths, and it increases linearly as the depth of path i (N) gets closer to the depth of the reference path j (M = 20). On the other hand, for ρg = 1, σDD is large for short paths, but it reduces as the depth of path i (N) gets closer to the depth of the reference path j (M = 20). Interestingly, for long paths σDD approximated using ρg = 0 is larger than σDD approximated using ρg = 1. As shown in Figure 4.9(b), for path depths larger than N ≈ 13, using ρg = 0 maximizes σDD. This behavior can be deduced analytically with the following simple equations, assuming that the variance of each inverter gate is the same [19]:

77 4.4. PROPOSED CRITICAL PATHS SELECTION CONDITION

Path j 20 3 =0 =0.2 =0.4 =0.6 =0.8 =1.0 g g g g g g 2.5 Path depth: fixed to 20 Inverters 15 Long Paths 2

Path i 10 1.5

1 5 0.5 Path depth: Changed from 1 to 20 Inverters 0 0 0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20 Path Depth (N) Path Depth (N)

(a) Example of Two paths (b) σDD as function of depth of path i (N)

20 3 =0 =0.2 =0.4 =0.6 =0.8 =1.0 g g g g g g 2.5 15 Long Paths 2

10 1.5

1 5 0.5

0 0 0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20 Path Depth (N) Path Depth (N)

Figure 4.9: Analysis of the global correlation value (ρg) that maximizes the standard deviation of the delay diﬀerence between two paths.

σ2 = [N + ρ (N 2 − N)] · σ2 Di g inv

σ2 = [M + ρ (M 2 − M)] · σ2 (4.6) Dj g inv

2 COV (Di, Dj) = N · M · ρg · σinv

Using Equations 4.6, σDD using ρg = 0 and ρg = 1 are given by:

2 2 σDD,ρg=0 = (N + M) · σinv − 0 (4.7a)

2 2 2 2 2 σDD,ρg=1 = (N + M ) · σinv − 2 · N · M · σinv (4.7b)

From Equation 4.7a, it is evident that the ﬁrst term, corresponding to the sum

78 4.4. PROPOSED CRITICAL PATHS SELECTION CONDITION of path variances always increases as function of N while nothing is subtracted since covariance becomes zero. From Equation 4.7b, the ﬁrst term (sum of variances) increases as function of N 2, while subtracted term (covariance) increases as function of N ·M, and since M > N, the covariance term increases faster than the sum of path variances as N increases, and therefore, σDD,ρg=1 reduces, as shown in Figure 4.9(b). The ratio of σDD computed using ρg = 0 to that computed using

ρg = 1 as function of the depth of path i (N) is plotted in Figure 4.9(c). As can be observed, for long paths (path with a large number of gates), the σDD value computed using ρg = 0 is greater than the computed using ρg = 1. Long paths are those with larger delays in the circuit. Hence, the global correlation value should be set to zero (ρg = 0) to approximate σDD at early design stages. The behavior of Figure 4.9(b) was observed on ISCAS circuits. For example, Figure

4.10, shows σDD,ρg=0/σDD,ρg=1 obtained for every path in ISCAS circuit C1908.

As can be seen, for those paths with larger delays, ρg = 0 provides a larger σDD.

Figure 4.10: Ratio of σDD estimated with ρg = 0 to that estimated with ρg = 1

Based on the previous analysis the second approximation is proposed:

Correlation Approximation 2: A global correlation value of ρg = 0 is assumed to approximate σDD between two paths. By using the proposed correlation approximations, the delay diﬀerence be-

79 4.4. PROPOSED CRITICAL PATHS SELECTION CONDITION tween two paths can be estimated at an early design phase without spatial correlation information. Moreover, this approach is more realistic than classic pessimistic correlation bounds.

4.4.2 Handling Workload Uncertainty

Operating workload affects the selection probability of a path. As mentioned before, the specific workload that the circuit will experience over the lifetime is hardly known in advance at the design phase. State-of-art approaches usually assume pessimistic workload scenarios to assure that the critical paths that can be presented for any operational workload are identified. In [136,146], the longest critical path at t = 0 is used as reference path (j) for critical path selection. The critical paths are selected evaluating Equation 4.1 assuming that the reference path (j) remains unaffected by aging, while the path under analysis (i) experiences worst-case aging as all the transistors in the path are assumed at near-static stress (α ≈ 1). In such way, the mean value of the delay difference is maximized. However, this approach is unrealistic since such unbalanced degradation may not occur during normal circuit operation. Furthermore, all the paths experience some degradation under realistic operating conditions. In [145], the degradation of the paths was estimated for various feasible workload scenarios. A maximum/minimum degradation of each path was identified. Then, Equation 4.1 was evaluated assuming that the reference path (j) experiences minimum aging while the path under evaluation was assumed under maximum aging. This im- proves the workload conditions assumed for path selection, but such unbalanced degradation is still pessimistic. Figures 4.11(a) and 4.11(b) show scatter plots of the mean value of the aged delay (after 10 years of aging) of two paths in ISCAS c2670 for several different workload profiles. For these figures, the reference path (j) is the path with the largest nominal delay in the circuit (LCP), and two cases of paths under analysis

80 4.4. PROPOSED CRITICAL PATHS SELECTION CONDITION

(i) are shown: a) The delay degradation of path i has a low correlation with the delay degradation of path j. b) The delay degradation of path i has a high correlation with the delay degradation of path j.

As can be observed in Figures 4.11(a) and 4.11(b), even diﬀerent operating workloads may cause similar degradation in the paths. This aging correlation between paths may arise due to nodes signal probability correlation, paths topology (i.e., similar logic gates are presented in the paths) and the existence of shared gates between paths. This realistic behavior of delay degradation is not considered in classic approaches. Figure 4.11(c) shows a histogram of the mean value of the delay diﬀerence (See Equation 4.3a) for the cases in 4.11(a) and 4.11(b).

As shown, the maximum realistic µDD that can exist between the two paths is much smaller than that computed using the max/min aging approach in [145]. Therefore, the probability of the path i to take delays greater than path j is much smaller than the estimated using such assumption. Based on the above observations, it is proposed to perform path selection for various realistic random-generated workload profiles. By analyzing different workload profiles, a more efficient selection of critical paths can be made. It is important to trade-off the computational effort for analyzing various workloads and the degree of circuit reliability. It must be noted that for a given operating workload, the probability of a path to take longer delays than other path changes over the lifetime depending on the aging rate of each path. This behavior is captured by the temporal dependence of the paths delay difference. Therefore, paths that are not critical at the beginning of the lifetime may become critical at the end of the lifetime [134]. Taking into account this reordering behavior is complex as exhaustive evaluation of crit-

81 4.4. PROPOSED CRITICAL PATHS SELECTION CONDITION

540 540 540 540

535 535 540 530540 530 ) ) ) ) 530 530 s s ps ps p p 535 520 520 ( ( ( ( 530 525 525 , , , , ) ) 530 s 510 510 ps p 520 μ μ μ μ ( 520( 520

525 , , 500 500 515 515 510 μ 520 μ 510 510 490 490 515500 500510 510520 520530 530540 540550 550 500500 500510 510520 520530 530540 540550 550

μ μD (ps)D (ps) μ Dμ (psD) (ps) 510 j j 490 j j 500 510 520 530 540 550 500 510 520 530 540 550 (a) Low Delay Degradation300 300 Correlation (b) High Delay Degradation Correlation μ D (ps) Low CorrelationLow Correlation μ D (ps) j using using j 250 250High CorrelationHigh Correlation max/minmax/min 300 approachapproach Low Correlation 200 200 using 250 High Correlation max/min 150 150 MaximumMaximumapproach 200 realisticrealistic 100 100 150 Maximum 50 50 realistic 100 0 0 -2050 -20-10 -100 010 1020 2030 3040 40 μ DDμ(ps) (ps) 0 DD -20 -10 0 10 20 30 40

μDD(ps) (c) Histograms of the delay diﬀerence

Figure 4.11: Analysis of the degraded delay of two paths i and j in ISCAS c2670 for 1000 random-generated operating workloads.

ical path selection condition (Equation 4.1) as a function of operational time is computationally costly.

The computational cost of computing the reordering probability over the time between a pair of paths can be lowered by evaluating Equation 4.2 only at the point in the time where the reordering probability is maximum. Figure 4.12 shows the behavior over the time of the delay distribution of two paths (i and j) for two degradation cases: a) The aging rate of path i is greater than the aging rate of path j (Figure 4.12(a)), and b) The aging rate of path i is lower than the aging rate of path j (Figure 4.12(b)). As can be seen, due to monotonically increase of paths delays as a function of aging time exponent (n)[147], the maximum reordering probability between the paths occurs either at the beginning or at the end of

82 4.5. PROPOSED METHODOLOGY FOR CRITICAL PATH SELECTION Case I: ΔD > ΔD Case I: ΔD > ΔD ΔD < ΔD CaseCase II:II: ΔD < ΔD

LargestLargest >> atat LargestLargest >> atat time=10time=10 years years time=0time=0 years years

(a) ∆Di < ∆Dj (b) ∆Di > ∆Dj

Figure 4.12: Behavior over the time of the delay distribution of two paths (i and j). the operational lifetime. Based on this observation, it is suﬃcient to evaluate Equation 4.2 at both the beginning and the end of the lifetime to determine if the delay of a path i can become longer than the delay of path j.

4.5 Proposed methodology for critical path selection

The flow of the proposed methodology is shown in Figure 4.13. For timing analysis, the linear gate delay model of Equation 3.6 was used to capture the effect of process variations and BTI-induced Vth shift on gate delay. The linear model parameters are obtained during gate library pre-characterization, as explained in Chapter3. The methodology starts from the gate-level circuit netlist. A path search algorithm (See section 3.3.3) is used to find all the topological paths of the circuit. Since the number of topological paths may be too large in modern digital integrated circuits, a fast path pre-filtering step based on Static Timing analysis with process corners is done. This reduces the number of paths that are analyzed statistically. The filtered paths are next analyzed statistically using the proposed

83 4.5. PROPOSED METHODOLOGY FOR CRITICAL PATH SELECTION early spatial correlation approach. First, a coarse path section is made to identify those paths that can become critical due to process variations and worst-case aging conditions. These paths are called Potential Critical Paths (PCP). Then, a fine statistical critical path selection process is made using realistic aging conditions (i.e., duty-cycle and temperature) for a set of different random-generated workload profiles. A path selection algorithm calculates the probability of a path to become the longest path of the circuit for each one of the workload profiles. The selected paths are those that can become critical due to aging and process variations for any executed workload. In such way, adequate online error prediction can be performed.

Gate Level Netlist

Topological Paths Search

Fast Path Pre-filter Spatial Early-SSTA Correlation Potential Critical Paths Approach under worst-case aging

Random Critical Path Selection Workloads Algorithm

Critical Paths to be Monitored

Figure 4.13: Diagram of the proposed methodology

The following subsections explain in detail each of the steps of the proposed methodology.

4.5.1 Fast Path Pre-Filter

The fast path pre-ﬁlter step allows identifying those paths having some probability to become critical under excessive aging and process corner variations. Timing de-

84 4.5. PROPOSED METHODOLOGY FOR CRITICAL PATH SELECTION lay information of all topological paths is obtained by conventional Static Timing Analysis based on process corners.

Corner analysis FF SS

P1 (LCP) P2 p%

P3 p%

P4 p% Delay Threshold Delay Value

Figure 4.14: Fast Path pre-ﬁltering.

Figure 4.14 illustrates the pre-filtering step. Those paths taking delays at the Slow-Slow process corner plus some pessimistic percentage (p%) of aging larger than the delay of the fresh Longest Critical Path (LCP) at Fast-Fast corner are included in the pre-filtered path set. The value of p% represents the highest delay degradation due to BTI that a circuit may suffer, which has been reported to be around 15 − 25% for 10 years [25,148]. In this work, we select a value of p = 25%. By using this overestimated value, it is assured that no path with possibilities to become critical for any process and aging condition will be discarded.

4.5.2 Potential Critical Paths Under Worst-Case Aging

The set of pre-ﬁltered paths is reduced even more by identifying the statistical Potential Critical Paths (PCP). A PCP is a path that under worst-case aging conditions may exceed the initial (fresh) delay of the Longest Critical Path (LCP) of the circuit, which is the path with the largest mean delay at t = 0. This means that if the clock period is adjusted according to the LCP only, the PCPs may trigger a faulty operation over the lifetime. This condition is expressed as follows: PCP Condition: A path i is said to be a PCP if the probability of its aged

85 4.5. PROPOSED METHODOLOGY FOR CRITICAL PATH SELECTION delay to be longer than the fresh delay of the LCP is larger than a user-deﬁned probability threshold ε:

P (Di,t=10 > DLCP,t=0) ≥ ε (4.8)

where Di,t=10 is the degraded delay of path i after 10 years of aging and

DLCP,t=0 is the delay of the LCP with no aging. Here, the delay degradation of path i is evaluated under worst case conditions given by a near-static duty-cycle (α ≈ 1) and elevated operating temperature (T = 110◦). Paths that do not meet this condition do not require to be monitored since they would never activate an error prediction sensor because the delay of the LCP is always longer for the entire lifetime (the LCP will always fail ﬁrst), even when the paths suﬀer extreme aging and the LCP remains fresh.

4.5.3 Critical Path Selection Algorithm

The ﬁnal selection of critical paths takes place over the reduced set of PCPs. The selection algorithm takes into account that:

Different paths may become critical depending on the workload profile at main circuit inputs. Therefore, various workload profiles are applied to the circuit to identify the different critical paths that may exist.

Some of the paths not covered by the LCP could be covered by another path. Therefore, a path is also compared against other previously selected paths to determine if it requires to be monitored. In such way, the ﬁnal set of CPs is reduced.

Algorithm1 describes the procedure for ﬁnal critical path selection. The set of PCPs is sorted in a decreasing order according to the nominal delay (no-BTI and no-PV) of the paths. The set of critical paths is initialized with the LCP of the circuit since this path has the largest chance to cover most of the other

86 4.5. PROPOSED METHODOLOGY FOR CRITICAL PATH SELECTION

ALGORITHM 1: Fine Critical Path Selection Input: Potential Critical Paths Output: Critical Paths to be monitored Sort PCPs paths from largest to shortest delay. Initialize CP set: Path[1].Critical=YES, Num.CPs=1 for WL = 1 to WL = MaxW L do generate propagate SP () compute temperature() compute duty cycle() compute A and SV () Run SSTA on PCPs for i = 2 to i = Num.P CP s do if PCP[i].Critical==YES then The PCP i is already selected. Continue with next PCP. end P ath P rob cum = 1 for j = 1 to j < i do if PCP[j].Critical==YES then P ath P rob=max (P (DDij,t=0 > 0),P (DDij,t=10yr > 0)) P ath P rob cum = min (P ath P rob, P ath P rob cum) if P ath P rob cum < ε then Monitored PCP j covers PCP i PCP[i].Critical=NO break end if P ath P rob cum ≥ ε then Monitored PCP j does not cover PCP i Try with next monitored PCP j end end end if P ath P rob cum ≥ ε then PCP i can be greater than all current monitored PCPs j PCP[i].Critical=YES Num.CPs++ end end end paths. Critical path selection is performed for a set of MaxW L random-generated workload proﬁles (WL = {WL1,WL2, ..., W LN }), which are intended to represent realistic operating scenarios for the circuit. Each workload proﬁle is represented by the Signal Probabilities (SP) at main circuit inputs,

WLi = {SP1,i,SP2,i, ...SPNI,i} (4.9)

where WLi is the workload proﬁle i, and SPNI,i is the signal probability as-

87 4.5. PROPOSED METHODOLOGY FOR CRITICAL PATH SELECTION signed to the input NI for the workload proﬁle i. A uniform random generator between 0 and 1 is used to obtain the signal probability assigned to each input. For each workload proﬁle, the signal probabilities are propagated to internal nodes using basic probability rules. Then, the stress duty-cycle (α) and operating temperature of each transistor in the circuit are computed. Once duty-cycle and operating temperature are obtained, the device’s parameters Am and SV,m of the process-variation aware BTI model (Equation 3.3) are updated. Then, aging- aware SSTA is run over the PCP set to update the paths timing information according to the analyzed workload. The selection condition (Equation 4.1) of each non-monitored PCP i to take delays longer than each already selected CP j (initially the LCP only) is computed. Since the exhaustive computation of Equation 4.2 between each pair of PCPs is computationally expensive, Algorithm1 uses a greedy heuristic based on the PCPs ranking to reduce the number of paths comparisons. The heuristic takes into account the following aspects to speed-up computational time:

1. If a path is identiﬁed as critical (i.e. none of the actual CP covers the path delays), the path is removed from the PCP set and it is included in the CP set. Then, its selection condition is not evaluated for the rest of the workloads.

2. Each PCP is only compared against the CPs with a larger mean delay.

3. Usually, the ﬁrst slowest CPs are capable to cover most of the paths. There- fore, each PCP is usually compared against few CPs.

In order to determine if a path needs to be monitored, the following procedure is carried out (See Algorithm1): For a PCP i, its selection probability (P ath P rob) is initialized at the unit. This value is going to be reduced by evaluating the probability of PCP i to take delays larger than other CP j being monitored. Equation 4.2 is computed only with respect to those paths j already

88 4.6. EXPERIMENTAL RESULTS ON ISCAS CIRCUITS in the CP set (being monitored) and having a larger nominal delay (j < i). Note that the maximum selection probability of the one obtained at t = 0 or t = 10 years of aging is stored in P ath P rob variable and compared against the probability threshold (ε). If the computed probability for PCP i is lower than ε, then PCP i does not require to be monitored, and its selection probability is no longer computed against the other remaining critical paths. This approach (See highlighted line in blue in the Algorithm) further reduces computational time. In case the computed probability for a PCP i is larger than ε, which means that monitoring the CP j does not cover the delays of the PCP i, the selection probability process continues with the next CP j being monitored. Note that the variable P ath P rob stores the minimum selection probability that PCP i took with respect to the CPs with it was compared. If none of the current selected CPs j can cover the delays of the PCP i, the PCP i is included in the critical path set which need to be monitored. After applying Algorithm1, those paths that are likely to take delays greater than all the other paths in the circuit due to BTI-induced delay degradation and process variations are eﬀectively selected to be under constant surveillance by aging sensors. Hence, reliable circuit tuning can be performed.

4.6 Experimental Results on ISCAS Circuits

The proposed methodology has been applied to some ISCAS 85/89 benchmark circuits implemented in a 32nm technology library [65]. The algorithm has been written in C++ code. In addition, the layout of the ISCAS circuits has been obtained to compare the results obtained using the global correlation approximation with those obtained using layout information. The experiments were performed on an Intel(R) Xeon(R) CPU E5-2609 V2 @ 2.50GHz. Table 4.1 shows the main characteristics of the analyzed circuits. Second and third columns give the number of gates and topological paths of the circuits. IS-

89 4.6. EXPERIMENTAL RESULTS ON ISCAS CIRCUITS

CAS circuits of different sizes and complexity have been considered to evaluate the scalability of the proposed methodology. Columns fourth and fifth show the nominal delay (no-BTI and no-PV) and the Design-Time guardband required to cover BTI aging (10 years) and process variations under a conventional design (non-adaptive). This guardband was computed under worst-case aging assumptions. On average, a 51.20% of guardband over the nominal delay is required. This imposes a significant performance loss.

Table 4.1: Main characteristics of the considered ISCAS circuits Design-Time Circuit Num. Gates Num. Paths Dnom (ps) Guardband (ps) C432 168 82364 600.25 312.32 C880 226 4935 351.50 194.19 C1908 244 15638 329.21 164.84 C2670 393 3490 492.12 235.56 C3540 748 787401 731.75 352.53 C5315 1139 24666 433.09 213.25 C7552 1365 43614 426.43 218.3 S838 279 1714 350.47 210.53 S1423 657 44726 1427.29 628.13 S5378 1174 11728 339.77 176.68 S9234 1817 29071 542.51 273.91 S13207 2403 585368 748.13 371.39 S35932 10902 26146 221.73 123.282

4.6.1 Path Selection Results

Table 4.2 shows detailed results of the selected paths and CPU time at the PCP selection and final path selection steps. For critical path selection, a probability threshold of ε = 0.25% has been used to provide a 99.75% degree of circuit reliability. The second column of Table 4.2 gives the number of identified PCPs. The percentage of PCPs out of the total topological paths is also given in the parentheses. As can be seen, the percentage of PCPs may change significantly, depending on the circuit topology, how balanced are the delays of the paths and how sensitive are the paths to BTI-aging. These PCPs were selected using worst-

90 4.6. EXPERIMENTAL RESULTS ON ISCAS CIRCUITS case aging conditions as in some state-of-art works [115,136]. However, monitoring such number of paths could be expensive due to online test effort and area overhead of the inserted sensors. The computational time of the PCPs selection is proportional to the number of paths in the circuit since each path is compared against the LCP of the circuit. Final selection of critical paths has been applied for different cases of workloads. The number of selected critical paths (CPs) and the computational time (CPU) are given for four cases (1, 20, 40, and 60) of the number of random-generated workload profiles. For workload = 1, a reduced set of paths is identified as critical. This large reduction occurs because the proposed approach takes into account that the monitored paths degrade over the lifetime, which means that the clock period is gradually adjusted according to these paths, and consequently, various paths with lower aging cannot trigger any fault. For most of the circuits, the number of selected critical paths increases when more workloads profiles are analyzed during the fine selection step. This is because the new workloads induce more aging in some of the paths that were not selected first. For some circuits, a limited number of new critical paths (CPs) are identified as the number of workloads increases (i.e., circuits s1423, s838 and c2670). However, for other circuits, the number of new critical paths increases more significantly as the number of workloads increases (i.e., s13207). The CPU time increases as more workload are used. The total CPU time of the proposed methodology is the addition of both the CPU time for the coarse selection and the CPU time for the fine selection. An interesting observation is that only a few new critical paths are found when the number of tested workload profiles increases from 40 to 60. Figure 4.15 support this observation by showing the behavior of the number of critical paths (normalized with respect to the number of paths selected with one single workload) with respect to the number of tested workload profiles for some circuits. These results suggest that for some circuits, selecting critical paths for a moderate number of workload profiles may be enough to identify most of the paths that may become critical for any given workload operating condition. However, some

91 4.6. EXPERIMENTAL RESULTS ON ISCAS CIRCUITS circuits may be more sensitive to the used workloads than others, and they may require a greater number of workload to assure high reliability. Designers should trade-oﬀ the degree of circuit reliability, which increases by detecting paths that may become critical under certain workload conditions, with the CPU cost to be paid.

Table 4.2: Selected paths using the proposed methodology (ε = 0.25%) Coarse Selection Fine Selection ISCAS Workloads=1 Workloads=20 Workloads=40 Workloads=60 PCPs CPU (sec) CPs CPU (sec) CPs CPU (sec) CPs CPU (sec) CPs CPU (sec) c432 44319 (53.8%) 66.32 1484 136.76 1632 1064.39 1636 1783.42 1653 2982.18 c880 2516 (50.98%) 2.40 327 4.17 348 45.25 357 85.89 360 130.29 c1908 11154 (71.32%) 8.67 2429 122.98 2576 339.39 2598 553.35 2599 776.38 c2670 752 (21.54%) 1.80 90 1.65 98 22.98 98 45.33 109 67.1 c3540 100167 (12.72%) 629.33 400 121.77 448 2269.19 614 4421.11 616 6791.88 c5315 9903 (40.14%) 13.10 738 26.13 766 181.7 796 336.75 804 507.22 c7552 17650 (40.46%) 21.42 646 20.52 732 245.48 754 481.52 763 728.66 s838 372 (21.7%) 0.48 36 0.22 36 3.17 36 7.12 36 11.85 s1423 4896 (10.94%) 151.27 9 36.02 11 720.33 13 1440.06 16 2159.71 s5378 2016 (17.18%) 3.11 220 1.78 241 21.66 264 44.78 268 66.16 s9234 2840 (9.75%) 13.45 58 2.87 66 54.75 69 109.22 69 163.61 s13207 170913 (29.19%) 567.60 630 256.32 1008 4896.43 1024 9500.36 1029 14107.7 s35932 13907 (53.18%) 6.06 224 5.67 345 107.53 353 182.66 354 277.61

1.8 1.7 c5315 c432 s13207 s35932 1.6 1.5 1.4 1.3 1.2 1.1 1 0 20 40 60 80 100 workloads

Figure 4.15: Selected critical paths vs. evaluated workload proﬁles

Figure 4.16 shows the impact of increasing the probability threshold (ε) for critical path selection. Figure 4.16(a) shows the reduction in the percentage of the number of critical paths that need to be monitored for ε values of 1% and 2%.

92 4.6. EXPERIMENTAL RESULTS ON ISCAS CIRCUITS

The percentage is computed with respect to the results using ε = 0.25% (Table 4.2). When larger ε is allowed, fewer paths need to be monitored. The reduction in the CP set is because those paths with a lower probability to become critical are discarded. On average, the critical path set is reduced in 28% and 41% for ε = 1% and ε = 2%, respectively. When ε is increased, CPU time is reduced due to two reasons: 1) The set of PCP selected under worst-case aging conditions is reduced, which means that the number of paths analyzed in the fine selection process is lower; and 2) In the fine selection algorithm, a PCP is more likely to be covered by the first selected Critical Paths. Therefore, the number of pair of paths comparisons reduces. Figure 4.16(b) shows the reduction in CPU time obtained for the analyzed circuits. On average, CPU time is reduced by 13.7% and 19.49% for ε = 1% and ε = 2%, respectively. Small values for the probability threshold are recommended for higher degrees of reliability.

4.6.2 Validation of Spatial Correlation Approximation

Due to the spatial correlation approach used to perform SSTA at an early design state, the estimated standard deviation of the delay difference between paths may differ from the exact value that is obtained using spatial correlation information extracted from final circuit layout with the model of [2]. Figure 4.17 shows a comparison between the standard deviation of the delay difference computed with the proposed correlation approximation and the one computed using spatial correlation extracted from circuit layout. To obtain the

ﬁgures, σDD,ij was computed between the Longest Critical Path (LCP) and every other path in the circuits for a given workload proﬁle. Each dot represents a path, and the dots highlighted in orange correspond to those paths whose probability to become slower than the LCP was greater than the probability threshold (ε). As can be observed in Figures 4.17(a) and 4.17(b), for the critical paths, the estimated σDD,ij with the proposed correlation approach is slightly larger than

93 Results 4.6. EXPERIMENTAL RESULTSResults ON ISCAS CIRCUITS Impact of increasing the probability threshold Impact80 of increasing the probability threshold 8070 =1%= 1% =2% 7060 =1%== 1%2% =2% 6050 = 2% 5040 CPs reduce in 28% and 4030 41%CPs for reduce ε = 1% in and 28% ε and= 2% 3020 41% for(on ε = average) 1% and ε = 2% 2010 0 (on average) 100 c432c432 c880c880 c1908c1908 c2670c2670 c3540c3540 c5315c5315 c7552c7552 s838s838 s1423s1423 s5378s5378 s9234s9234 s13207s13207s35932 s35932 0 Circuit c432c432 c880c880 c1908c1908 c2670c2670 c3540c3540 c5315c5315 c7552c7552 s838s838 s1423s1423 s5378s5378 s9234s9234 s13207s13207s35932 s35932 (a) Reduction on selected critical paths. 35 Circuit

3530 = 1%=1% =2% CPU time reduces in 13.7% = 2% 3025 = 1%=1% =2% andCPU19time.49%reducesfor ε =in1%13and.7% = 2% 2520 εand= 219%.(on49%average)for ε = 1% and

2015 ε = 2% (on average) 1510 1) Smaller initial PCP set 105 1)2) FewerSmallerpathsinitialcomparisonsPCP set 50 c432c432 c880c880 c1908c1908 c2670c2670 c3540c3540 c5315c5315 c7552c7552 s838s838 s1423s1423 s5378s5378 s9234s9234 s13207s13207s35932 s359322) Fewer paths comparisons 0 Circuit 31 c432c432 c880c880 c1908c1908 c2670c2670 c3540c3540 c5315c5315 c7552c7552 s838s838 s1423s1423 s5378s5378 s9234s9234 s13207s13207s35932 s35932 Circuit 31 (b) Reduction on CPU time.

Figure 4.16: Impact of probability threshold on the CP set size and CPU time (WL = 60).

the one computed using spatial correlation information. This imply that the approximated selection probability of the paths is larger than the one obtained using layout information, and therefore, all the critical paths are selected with the proposed approach. However, some paths may be out of this trend, as shown in

Figure 4.17(c) obtained from circuit C7552. As can be seen, the estimated σDD,ij is lower than the one obtained using layout information for some paths. This may occur if the gates within a path are close to each other but the two paths are far from each other. However, it should be highlighted that those paths out of this trend may still be selected/covered due to the following reasons:

94 4.6. EXPERIMENTAL RESULTS ON ISCAS CIRCUITS

The mean value of the delay diﬀerence also inﬂuences the estimated probability for a path.

If the estimated probability using the correlation approximation is still larger than the threshold probability, the path is selected.

The path may be covered by a path other than the LCP.

(a) c880 (b) c1908

Figure 4.17: Approximated standard deviation of the delay diﬀerence between all the paths in the circuit and the LCP of the circuit Vs. the standard deviation of the delay diﬀerence obtained using spatial correlation extracted from circuit layout.

The proposed spatial correlation approximation is more realistic than approaches using correlation bounds for computing the standard deviation of the delay diﬀerence between paths. As commented before, using correlation bounds

95 4.6. EXPERIMENTAL RESULTS ON ISCAS CIRCUITS results in a significant overestimation of the standard deviation of the delay difference, and consequently, on the selection probability of a path. Figure 4.18 shows histograms of the ratio of σDD,ij computed using correlation estimation approaches at early design phase to the one computing using spatial correlation from circuit layout. Histograms for both the ratio corresponding to the σDD,ij obtained with the proposed correlation approximation and with correlation bounds are shown. As can be observed, when the proposed correlation approximation is used, the ratio of σDD,Early/σDD,Layout is closer to the unit, which means that the proposed correlation approximation provides a good estimation of the σDD,ij between two paths. It is also evident that using correlation bounds is over-pessimistic, which may result in a significant number of paths being unnecessarily monitored. Figure 4.19 shows, for some ISCAS, the overestimation of the number of paths selected using correlation bounds to estimate σDD early in the design flow with respect to the case when the proposed correlation approximation is used. The selected paths are overestimated in up to 70X times, which may represent a larger area overhead due to extra aging sensors and online delay test effort overhead. Table 4.3 shows a detailed comparison of the critical path set obtained with the proposed methodology against the critical path set obtained using circuit layout information. For both cases, path were selected under 60 different random- generated workload profiles. Spatial correlation has been extracted from the final circuit layout using the model of [2]. Table 4.3 shows that there are some wrong and missed selected paths. A path is said to be a wrong path (missed path) if it was selected (unselected) using the spatial correlation approach and unselected (selected) using spatial correlation extracted from circuit layout. As mentioned before, the spatial correlation approximation seeks to maximize σDD to increase critical paths coverage. Although the proposed approach may select more paths than those needed, it should be noted that it is much more realistic than over-pessimistic approaches using correlation bounds and/or worst-case aging assumptions. As shown in Table 4.3,

96 4.6. EXPERIMENTAL RESULTS ON ISCAS CIRCUITS

100100 300300 250250 ProposalProposal 8010080 ProposalProposal 300 BoundsBounds 200200 BoundsBounds 60 60 250 80 Proposal Proposal 150150 Bounds 40 40 Bounds 200 60 100100 150 204020 5050 100 0 0 020 500 0 02 24 46 68 810 1012 1214 1416 161818 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 1010 0 / / 0 / / 0 2 4 DD,early6DD,early8 DD,layout10DD,layout12 14 16 18 0 1 2 3 DD,earlyDD,early4 5 DD,layout6 DD,layout7 8 9 10 / / DD,early(a) c880DD,layout DD,early(b) c1908DD,layout 120120 100120100 ProposalProposal Bounds 10080 80 ProposalBounds 608060 Bounds 406040 204020 20 0 0 0 1 2 3 4 5 6 7 8 9 10 00 1 2 3 4 5 6 7 8 9 10 / 0 1 2 3 DD,early4 5/ 6DD,layout7 8 9 10 DD,early/ DD,layout DD,early DD,layout (c) c7552

Figure 4.18: Histograms of the ratio of of the estimated standard deviation of the delay diﬀerence using correlation approaches and the one obtained using spatial correlation information extracted from circuits layout.

there were selected some wrong paths, which may increase periodical test eﬀort and circuitry area. However, few missed paths that remain without surveillance are obtained for some circuits. Missed paths are obtained because the selection probability of a path computed using the spatial correlation approach underes- timates the selection probability of the path computed using spatial correlation extracted from the Layout. Figure 4.20 shows a histogram of the selection probability computed with spatial correlation of layout for each of the missed paths of all the ISCAS circuits. Most of the missed paths have low probabilities to become critical (less than 2%), and they may not represent a major reliability concern for the circuit. Only few missed paths got larger probabilities than 2%. Column 6 of Table 4.3 shows the maximum selection probability that a path took

97 4.6. EXPERIMENTAL RESULTS ON ISCAS CIRCUITS

0 c432 c880 c1908 c2670 c5315 c7552 s838 s5378 s9234 s35932 Circuit

Figure 4.19: Overestimation of critical paths selected using correlation bounds with respect to the critical paths selected using the proposed correlation approximation.

Table 4.3: Comparison of CPs selected using spatial correlation against those selected using spatial correlation extracted from circuits layout (ε = 0.25%) CPs Correlation CPs Correlation Wrong Missed Missed Penalty ISCAS Approach from Layout Paths Paths Probab. (%) GB (ps) c432 1653 1100 553 0 0 0 c880 360 216 144 0 0 0 c1908 2599 1520 1079 0 0 0 c2670 109 83 26 0 0 0 c3540 616 190 428 2 0.32 3.70 (1.05%) c5315 806 370 436 0 0 0 c7552 763 464 329 30 0.94 7.60 (3.48%) s838 36 27 9 0 0 0 s1423 16 8 8 0 0 0 s5378 268 97 171 0 0 0 s9234 69 59 34 24 6.81 58.04 (21.2%) s13207 1029 198 831 0 0 0 s35932 354 187 191 24 0.72 10.93 (8.85%) among the missed paths for each circuit. For circuits C3540, C7552 and s35932, this probability is below 1%. Only for circuit s9234, the selection probability of a missed path was up to 6.26%. However, this corresponded to only 1 specific path in the circuit. In order to guarantee reliable operation of the missed paths, minor modifications could be done to the final circuit design. For example, gate sizing could be applied to the small set of missed paths. Another approach is to

98 4.6. EXPERIMENTAL RESULTS ON ISCAS CIRCUITS add a small guardband penalty to the clock period. Column 7 of Table 4.3 shows the guardband penalty that needs to be added to the clock period for the circuits with missed paths. This guardband penalty is significantly smaller than the nominal guardband to cover aging and process variations of the conventional designs. The percentage of required guardband penalty compared to nominal guardband in conventional designs are shown in parenthesis. In the worst case, the guardband penalty is 21.2% of the nominal guardband for circuit s9234. For the other circuits having missed paths, the guardband penalty is smaller. These results demonstrate that the proposed methodology provides efficient critical path selection at early stages of the design flow, which allows reliable online delay monitoring and clock frequency scaling.

0 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 Probability

Figure 4.20: Histogram of the probability to become critical of the missed paths of all ISCAS circuits.

Once the critical paths to be monitored are identiﬁed, aging sensors (See Fig- ure 4.1(a)) are inserted at the output nodes to keep under surveillance circuit delay degradation. Only one single aging sensor per group of those selected paths converging at the same output node is required. This minimizes the total number of aging sensors required per number of critical paths. Table 4.4 shows a comparison of the number of nodes where aging sensors should be placed according to the

99 4.6. EXPERIMENTAL RESULTS ON ISCAS CIRCUITS selected paths using the proposed correlation approximation and using correlation obtained from circuit layout. The total number of outputs of each ISCAS circuit is also given. Only few conventional FF require to be modified to introduce the aging sensor. However, the number of nodes to be monitored may increase for large industrial circuits, where several paths having balanced delays may become critical. As can be observed, the number of aging sensors match well for most of the circuits. There are only three circuits where a different number of sensors was obtained. In circuit C7552, 18 and 17 nodes were selected using the proposed correlation and the correlation information extracted from circuit layout, respectively. There were two wrong nodes (due to the wrong paths) and only one missed node (due to the missed paths). In circuit s5378 there was only one wrong node. The largest difference on the selected nodes is in circuit s35932, where 21 missed nodes were obtained. This large difference in the number of selected nodes is because each missed path is isolated with respect to the other paths in the circuit. It should be noted that circuit s35932 is the one with the largest number of output nodes. As can be seen, the missed nodes in circuit s35932 closely correspond to the number of missed paths. As commented before, the missed paths have in fact low probability to become critical and can be covered with little modifications to the final circuit.

Table 4.4: Selected nodes to be monitored ISCAS c432 c880 c1908 c2670 c3540 c5315 c7552 s838 s1423 s5378 s9234 s13207 s35932 Output Nodes 7 26 25 50 22 123 108 33 79 203 234 714 1760 Correlation 1 3 7 1 2 4 18 4 1 8 1 1 32 Approach Correlation 1 3 7 1 1 4 17 4 1 7 1 1 53 from Layout

4.6.3 Comparison with Other Works

A comparison of the main features of the proposed methodology with other state- of-art methods is shown in Table 4.5. The approach in [134] does not consider

100 4.7. CONCLUSIONS the impact of process variations. Furthermore, it only considers a unique workload profile, assigning a signal probability of 0.5 to all the main inputs of the circuit. The approach in [135] selects a path based on its probability to exceed a given delay constraint. This approach is pessimistic because those paths that may exceed the initial delay constraint but that are always faster than another monitored path do not require to be monitored. Approaches in [115] and [136] select a path based on its probability to exceed the delay of the LCP. However, they assume worst-case aging conditions, where the path under analysis is under maximum aging, while the LCP remains fresh. Furthermore, they do not consider that there could exist some paths that can cover other paths that the LCP can not. The proposed methodology presents various improvements with respect to those in the state-of-art. The selection algorithm considers that a path can cover other paths that the LCP can not. Furthermore, worst-case workload assumptions are not made. Instead, critical paths are selected for various realistic workload profiles considering the effect of temperature. Furthermore, it is worth to highlight that the proposed approach can take place at an early design stage as it does not require layout information to approximate the spatial correlation between gates.

Table 4.5: Methodology Comparison Feature [134] [135] [115] [136] Proposal Process Variations No Yes Yes Yes Yes

Selection Criteria Di > TCK − Tm P (Di > TCK ) P (Di > DLCP ) P (Di > DLCP ) P (Di,t > Dj,t) Workload Proﬁle Single Workload Worst Worst Worst Multiple Workloads Temperature Eﬀect No No No No Yes Spatial Correlation N/A Layout Layout Layout No Layout

4.7 Conclusions

A methodology to select the critical paths of a circuit for reliable Dynamic Fre- quency Scaling under BTI-aging and process variations eﬀects have been proposed.

101 4.7. CONCLUSIONS

The statistical selection condition assures that those paths likely to fail first over the lifetime are included in the critical path set for reliable application of DFS. The proposed methodology addresses three major challenges for path selection: 1) Critical Path statistical behavior due to process variations, 2) Critical Path dependence on workload profile, which is rarely known in advance at the design phase, and 3) Critical Path reordering probability over the time due to aging. The proposed methodology can be applied at an early design phase as it uses a spatial correlation approach to performing SSTA. Hence no detailed layout information is required to extract spatial correlation between gates. It has been shown that a trade-off can be established between the computational time and the critical paths set identified for various operating workloads scenarios. Furthermore, it was shown that most of the critical paths are identified by analyzing a moderate number of workload profiles. A spatial correlation approach was used to perform aging-aware SSTA early in the design flow, where the critical paths need to be identified. Few missed paths are obtained compared to an approach using spatial correlation information extracted from circuit layout. However, those few missed paths, which are unlikely to become critical, could be covered with little effort using design gate sizing or adding a small guardband penalty to the clock period. Additionally, it has been shown that the number of critical paths selected with the proposed approach is significantly lower than that selected using worst-case aging scenarios. Hence, the proposed approach further mitigates online test effort.

102 Chapter 5

Guardband Reduction by Eﬃcient Selection and Sizing of Critical Gates

Guardbands added to the clock period to assure reliable circuit operation are becoming unacceptably large as devices continue to shrink, due to the increasing impact of aging and process variations effects. This chapter presents a methodology to reduce the guardband of a circuit towards a desired target guardband by efficient selection and sizing of critical gates considering BTI aging and PV effects. The proposed approach uses metrics to identify those gates providing efficient guardband reduction with as small as possible area overhead.

5.1 Related Work

Various aging-aware design techniques already exist in the literature. Gate-level size optimization is a widely used approach to address aging and process variations issues. This method tries to size the gates to achieve optimal trade-oﬀs between delay, area, and reliability. This requires a method to ﬁnd and re-size

103 5.1. RELATED WORK the most beneficial gates. In [91], the design optimization of a full-adder circuit based on extensive SPICE simulations was presented. However, SPICE-based reliability analysis is computationally unfeasible for large-scale integrated circuits. The works [78,82], built gate libraries robust to aging by sizing the transistors in a gate according to the duty cycle that the devices experience. However, these approaches require more detailed guidance to determine where to place the robust gates within a circuit as the gates having more delay degradation may not be the most influential to overall circuit timing. In [149], it is proposed to increase the size of the gates in the critical paths of the circuit and having a delay degradation larger than a given threshold (i.e., 5%). The proposed approach takes into account the maximal load capacitance that a gate of a given size can drive. However, not all the gates in the critical paths have the same impact on circuit delay degradation. Therefore, they should be treated differently. In [89, 92] an optimization problem minimizing circuit area for a given delay constraint under aging is for- mulated and solved using Lagrangian relaxation. However, these methods may become complex for large circuits, especially if process parameters variations are considered.

The concept of gate criticality metrics under aging effects was introduced in [142]. Gate criticality metrics provide a fast estimation of how efficiently the delay degradation of the circuit can be improved at a given area cost. Thus, designers can optimize the circuit by increasing (decreasing) the size of the gates with higher (lower) metric score. Different gate metrics have been proposed in [44, 133, 135, 142]. These metrics combine parameters representing the nominal delay (with no-PV and no-BTI) sensitivity with respect to gate sizing, the number of critical paths impacted by the gate, and the delay degradation of a gate. Gates are selected based on the metric score ranking, and design actions take place on the selected gates. In [133,142], the selected gates are replaced by their aging-robust counterparts from an aging-aware gate library (such as that in [82]). In [44], [135], the size of the gates with the highest metric score is iteratively increased until the

104 5.1. RELATED WORK desired timing constraint is met. However, area saving by size-decreasing in gates with little impact on delay is not considered. Also, the used metrics do not consider the impact of sizing a gate on both the degradation and the standard deviation of the paths delay (under PV), which may limit the efficiency of the gate selection and sizing process. BTI-induced delay degradation strongly depends on the workload executed by the circuit as it defines the duty cycle of each device in the circuit. Unfor- tunately, the exact operating workload executed by a circuit over the lifetime is unpredictable and hardly known in advance at the design phase. Therefore, a major limitation of the aforementioned aging-aware optimization approaches is that they either assume worst-case duty cycle or a specific duty cycle profile at main circuit inputs for aging estimation. While the first approach leads to conservative designs with excessive area overhead, the second approach may not be reliable enough if the actual signal probabilities that the circuit experiences during normal operation differ from those used at the circuit design phase. Recently, a sizing approach considering the distribution of paths delay degradation for various different duty cycle profiles has been proposed in [141]. However, degraded path delays are optimized for the mean value of the expected degradation over the set of feasible operating workloads, which do not guarantee reliable operation. Furthermore, the effect of process variations is not considered. The contributions of this thesis for reliable circuit design are:

1. New statistical gate sizing metrics are proposed. The metrics include the impact of gate sizing on BTI delay degradation and standard deviation of delay reduction. A fast approximation of the statistical delay sensitivity of a path with respect to sizing of a gate is proposed. The sizing process considers gate sizing-up to improve delay and gate sizing-down to mitigate area overhead.

2. A multiple workloads aware sizing algorithm is proposed. The paths delays

105 5.2. CONSIDERATIONS FOR GATE SIZING METRICS

are estimated for various diﬀerent signal probabilities scenarios at main inputs. Then, the paths are optimized for the signal probability scenario that causes largest delay degradation. In such way, a more accurate estimation of the maximal paths delay degradation is made.

5.2 Considerations for Gate Sizing Metrics

5.2.1 Sizing Impact on Gate Delay

When the size of a gate is changed, its timing behavior is modiﬁed. The statistical delay of a gate has three components: the nominal mean delay, the mean delay degradation due to BTI and the standard deviation of the delay due to joint eﬀects of BTI and process variations (See Equation 5.1). These delay components are obtained from the linear gate delay model 3.6.

Di = N(µ, σ) = N(µDn + µ∆D, σD) (5.1)

Figure 5.1 shows the initial (nominal) delay, the gate delay degradation due to BTI after 10 years of aging, and the standard deviation of the gate delay of an inverter gate as a function of the gate size. Nominal delay reduces because transistors current capability to charge or discharge capacitive loads increases, while gate delay degradation reduces because gate delay becomes less sensitive to Vth instability. Similarly, standard deviation reduces as the gate delay is less sensitive to process parameters variations. An important parameter measuring the impact of sizing a gate on its timing behavior is the statistical delay sensitivity, which measures the change in the gate delay per unit of gate size increase. From Figure 5.1, it is concluded that statistical

D delay sensitivity to gate sizing (SK ) is composed of (See Equation 5.2):

Dn 1. A nominal delay sensitivity component (SK ).

106 5.2. CONSIDERATIONS FOR GATE SIZING METRICS

∆D 2. A delay degradation sensitivity component (SK ).

σD 3. A standard deviation sensitivity component (SK ).

D Dn ∆D σD SK = SK + SK + SK (5.2)

100 20

D D 80 μn μ BTI D 15

60 ( ps ) 10 ) ps ( 40

5 20

0 0 0 2X 4X 6X 8X 10X Gate Size

Figure 5.1: Nominal delay, Delay degradation, and Standard Deviation of the delay of an inverter gate as function of its size.

It should be highlighted that conventional sizing approaches [44,133,135,142] do not consider the two last sensitivity components regarding the impact of sizing a gate on the delay degradation and on the standard deviation of the delay. Including these eﬀects in the statistical delay sensitivity allows to select more eﬃciently the best gates to be sized.

5.2.2 Sizing Impact on Path Delay

When the size of a gate is increased, it does not only impact on the gate timing itself, but also on the timing of the adjacent gates [92]. As illustrated in Figure 5.2, when the size of a gate within a path is increased, its input capacitance increases, thus impacting of the delay of its preceding gate in the path. Similarly, the output transition time of the gate reduces due to its

107 5.3. PROPOSED GATE SIZING METRICS

path segment of gate i 0

K Non-sensitive -2 Slightly Sensitive -4 i-1 i i+1 / K D i / K -6 D i Segment of gate i -8 i-3 i-2 i-1 i i+1 i+2 i+3 Gate Figure 5.2: Impact of sizing a gate on the adjacent gates higher current capability, thus impacting on the subsequent gate. Therefore, the selection of the optimal gates to be sized to improve delay has to consider the impact on adjacent gates and the optimization should be global (i.e., at the path level) rather than local (i.e., at the gate level).

5.3 Proposed Gate Sizing Metrics

Gate selection metrics are proposed to guide the optimization process. The gate selection metrics are intended to identify the best critical gates to be sized in the critical paths to improve circuit guardband eﬃciently.

5.3.1 Statistical Path Delay Sensitivity

The statistical path delay sensitivity to gate sizing measures how much impact has sizing a gate on the path delay. Therefore, it provides information of the most eﬃcient gate to be sized to improve the statistical delay of a path. Knowing the mean and standard deviation for a given aging time for all the gates in a path using Equation 3.6, the path delay PDF (Dp = N(µDp, σDp)) is

108 5.3. PROPOSED GATE SIZING METRICS obtained by:

N X µD,p = µDi = µDn,p + µ∆D,p (5.3a) i=1 v u N N uX X σD,p = t ρij · σDi · σDj (5.3b) i=1 j=1

where µDi is the mean delay of the gate i for the given aging time, σDi and σDj are the standard deviation of the gate delay i and j, respectively. The parameter

ρij corresponds to the correlation between gate delays. Note that the mean value of path delay has a nominal component and a component due to aging effects. Also, the standard deviation of the paths depends on aging effects, as the threshold voltage variability changes due to aging, as was shown in Equation 3.3. The statistical path delay sensitivity is defined as the derivative of the µ + 3σ of the path delay distribution to sizing a gate i in the path:

SDp = ∂µDp + 3 σDp Ki ∂Ki ∂Ki (5.4) h i = ∂µDn,p + ∂µ∆D,p + 3 · ∂σDp ∂Ki ∂Ki ∂Ki

where Ki is the size of the gate i in the path, µDp and σDp are the mean value and the standard deviation of the aged path delay obtained from Equations

5.3a and 5.3b, respectively. µDn,p and µ∆D,p correspond to the mean value of the nominal (fresh) path delay and the mean value of the delay degradation of the path. As can be seen from Equation 5.4, three components inﬂuence the statistical path delay sensitivity to sizing a gate. The component related to the nominal delay (no aging and no PV), the component related due to aging eﬀects and the component related to process variations. Figure 5.3 shows the magnitude of the components of the statistical path delay sensitivity to gate sizing for the path

109 5.3. PROPOSED GATE SIZING METRICS

i-1 i i+1

Figure 5.3: Example of the magnitude of the components of the statistical path delay sensitivity to gate sizing (Equation 5.4). example shown in the inset Figure. As can be observed, the component related to the nominal path delay is the largest. However, the components due to the impact of aging on the mean delay and the impact of process variations are also signiﬁcant. It is worth to mention that the aging component depends on the degradation of the gate. A gate whose devices has a larger aging also exhibit a larger ∂µ∆D,p . It is also important to note that spatial correlation plays an ∂Ki important role in the magnitude of the derivative of the standard deviation of the path delay with respect to the size of a gate in the path. Figure 5.3 shows two cases: when all the gates in the path are placed far away, and their spatial correlation is almost zero (ρ = 0), and the case when all the gates are placed very close to each other, having a full spatial correlation (ρ = 1). Therefore, it can be concluded that those gates that have a higher correlation with the other gates in the path may be preferable to optimize.

Approximation of the Statistical Delay Sensitivity of a Path

The computation of Equation 5.4 may be computational costly as it implies to evaluate the statistical distribution of the aged path delay for both the current size of the gate and when the size of the gate is changed by a small perturbation

110 5.3. PROPOSED GATE SIZING METRICS

(this is done to compute the derivatives). Therefore, for a path with N gates, the statistical path delay would have to be computed N + 1 times to compute the statistical delay sensitivity with respect to the size of each gate, which is unpractical. Therefore, we propose some simpliﬁcations to evaluate Equation 5.4 eﬃciently.

2 2 path segment of gate i 0 path segment of gate i 0 K Non-sensitive -2 K Non-sensitive Slightly -2 SensitiveSlightly -4 Sensitive i-1 i i+1 / K -4 D i i-1 i i+1 / K -6 /D K i -6 D / Ki D i Segment of gate i -8 Segment of gate i -8 i-3 i-2 i-1 i i+1 i+2 i+3 i-3 i-2 i-1 i i+1 i+2 i+3 Gate Gate (a) Path Example (b) Sensitivity of the mean and standard deviation of the delay of each gate

Figure 5.4: A path example to illustrate the impact of sizing a gate on its neighboring gates in the path.

Figure 5.4(a) shows a path where the size of the gate i is increased in an small step. Figure 5.4(b) shows the change in the mean value and standard deviation of the delay of each gate in the path due to the small increase in the size of the gate i. As can be seen, only the timing response of the gates i − 1, i, and i + 1 are signiﬁcantly aﬀected. Both the mean and standard deviation of the gate i − 1 increases due to the larger input capacitance of the sized gate. On the other hand, the mean and standard deviation of the gate i + 1 reduces because of its input signal switches faster as gate i becomes stronger. Obviously, the mean value and the standard deviation of the delay of the sized gate are the most reduced when the size of this gate is increased. It should be noted that the change in the standard deviation of the delay of a gate is much smaller than the change in the mean value, as was observed before in Figure 5.3. Based on above observations the following approximations are made:

111 5.3. PROPOSED GATE SIZING METRICS

Sensitivity of the mean of the path delay to gate sizing

The change in the mean delay of a path is mainly due to the change in the mean delay of the gates in the segment of the gate i. Therefore, we approximate the sensitivity of the mean delay of a path to gate sizing (First term of Equation 5.4) as the sum of the sensitivities of the gates in the segment of the gate:

∂µDp ≈ ∂µD,i−1 + ∂µD,i + ∂µD,i+1 ∂Ki ∂Ki ∂Ki ∂Ki (5.5)

∂µ ∂µ ∂µ ≈ D,i−1 · ∂Cin,i + D,i + D,i+1 · ∂SROi ∂CLi−1 ∂Ki ∂Ki ∂SRIi+1 ∂Ki

where µD,i−1, µD,i and µD,i+1 are the aged delays of the gates i − 1, i and i + 1 in the path segment of the gate being analyzed, CLi−1 is the load capacitance of the gate i − 1, Cin,i is the input capacitance of gate i, SRIi+1 is the signal transition time at input of gate i + 1 and SROi is the signal transition time at output of gate i, which is equal to SRIi+1.

Note that by using this approximation only the mean delay of the gate segment needs to be recomputed rather than the mean delay of the entire path.

Sensitivity of the standard deviation of the path delay to gate sizing

The change in the standard deviation of the delay of a path due to sizing a gate i is mainly due to the change of the standard deviation of the delay of the gate i and its impact on the covariance with the other path gates. Therefore, we approximate the sensitivity of the standard deviation of the path delay to gate sizing (second term of Equation 5.4) as follows:

112 5.3. PROPOSED GATE SIZING METRICS

∂ PN PN ρ ·σ ·σ ∂σD,p = √1 · i=1 j=1 ij Di Dj ∂Ki 2 ∂Ki 2 σD,p

∂σ2 1 Di PN ∂σDi (5.6) ≈ · + 2 · ρij · σDj 2σD,p ∂Ki j6=i ∂Ki

1 ∂σDi PN ≈ · ρij · σDj σD,p ∂Ki j=1 As can be observed, the sensitivity of the standard deviation of the path delay depends on the spatial correlation that the sized gate i has with each other of the gates in the path. As shown before, when a gate is highly correlated with the others, its eﬃciency for reduction of the standard deviation of the path delay increases. Note that Equation 5.6 only depends on the derivative of the standard deviation of the size of the gate itself. Therefore, to evaluate Equation 5.6 only the standard deviation of the gate of interest i needs to be recomputed.

5.3.2 Proposed Sizing Metrics

Equations 5.5 and 5.6 measure the gate-sizing induced change on the mean and the standard deviation of the delay at a path level. However, the best gates to be sized need to be selected at the circuit level. For this reason, the statistical path sensitivity is combined with other circuit parameters related to the impact of sizing a gate at the circuit level.

Area Impact

Circuit area is a very important constraint in the design of digital circuits because it is proportional to power consumption and it aﬀects the cost of a product. It is desirable to design circuits meeting delay and lifetime constraints at the cost of minimum area overhead. Therefore, it is important to consider the area cost of sizing a gate, in the metrics formulation

113 5.3. PROPOSED GATE SIZING METRICS

, )

+ ( ⋅

Figure 5.5: Layout of an inverter gate.

We can deﬁne the Area Impact of sizing a gate as the gate area increase per unit of the size of the gate. The area impact of a gate depends on the cell layout. For instance, consider the layout of an inverter gate shown in Figure 5.5. The area of the cell is the product of the height and the width (Ai) of the cell,

Areagate = K · (WN + WP ) · (Ai) (5.7)

The height of the cell depends on the channel width assigned to NMOS and PMOS devices for minimum size symmetrical cell and the gate size, which proportionally scales the transistors dimensions. While the height of the cell is design- dependent, the width of the cell depends on the layout-rules only, which are given by the technology. As shown in Figure 5.5, the design rules aﬀecting the height of a cell are the Active area extension (AEX), the contacts width (CW), the Poly to Diﬀusion Contact distance (PDC),and the channel length.

When the size of a gate (K) is increased, the area increases linearly, as shown in Figure 5.6. The change in circuit area is given by the slope of the lines for each gate, which depends on the minimum channel width assigned to PMOS and NMOS and the width of the cells. As can be observed, increasing the size of an

114 5.3. PROPOSED GATE SIZING METRICS

2 NOR2 1.5 NAND2 INV

0.5

0 0 2 4 6 8 10 Gate Size

Figure 5.6: Areas of some basic digital gates as function of gate size inverter gate has a lower impact on the area, while increasing the size of NOR gates has the highest area cost. The area impact of sizing a gate should be taken into account in the metrics. Based on this observation, in the proposed metric the statistical delay sensitivity is normalized with respect to the gate area impact (Ai), which is a layout dependent parameter but can be estimated for a pre-designed gate library. By

D normalizing gate delays sensitivities (SK /Ai), the metric considers the gate delay reduction per unit increase of area.

Gate Criticality

Gate criticality can be considered as the number of critical paths passing through a gate [142]. Critical paths of a circuit, are those paths that do not meet the timing constraint. Let us consider the circuit with two critical paths P1 and P2 shown in Figure 5.7. If we resize a non-shared gate in one path (e.g., gate G1), it would be required to resize a gate in the other path. In this case, two gates would be resized to meet timing constraint. However, if a shared gate (e.g., G4 or G5) is resized, only one gate impacts the area cost. Shared gates between paths present a higher criticality and are topologically preferred for size optimization.

115 5.3. PROPOSED GATE SIZING METRICS

Figure 5.7: Circuit example to illustrate gate criticality

Path Slack

The aged delay of a critical path depends on its number of gates and their rate of degradation [8]. The amount of path delay degradation and its number of gates determine the severity of a path to violate the timing constraint. Therefore, we use aged path slack time to take into account these effects. The slack time of a path p is defined as the difference between the path delay and the delay constraint:

slackp = Dcons − (µD,p + 3 · σD,p) (5.8)

where Dcons is the delay constraint, µD,p is the mean path delay and σD,p) is the standard deviation of the path delay. Then, the path slack is positive if the path violates the timing constraint and negative if it meets the desired constraint.

Metrics Formulation

The statistical delay sensitivity of a path, reveals which gate in the path has a larger impact on the µ + 3σ delay of the path. This parameter is combined with other important information of the gates to form the proposed sizing metrics. Two gate sizing metrics are proposed to guide the optimization process: One that measures the beneﬁt of sizing-up a gate, and other that measures the beneﬁt of sizing-down a gate. For each gate i, the two following sizing metrics (See Equation 5.9) are evaluated:

116 5.3. PROPOSED GATE SIZING METRICS

SD ·|Slack+ |·N Slack− ·∆A Ki,AV G i,AV G i i,AV G i (5.9) MSU,i = MSD,i = D ∆Ai SKi,AV G·Ni

where MSU,i and MSD,i are the sizing-up and sizing-down metrics, respectively. D SKi,AV G is the average delay sensitivity of the Ni paths passing through the gate to change in the gate size (Ki), Slacki,AV G is the average slack of these paths, and ∆Ai is the area impact of sizing the gate, which depends on the cell layout. Each metric is evaluated for a diﬀerent path set. For sizing-up, the metric is evaluated within the paths with positive slack (violating the delay constraint).

Thus, Slacki,AV G takes a positive value. On the other hand, Slacki,AV G takes a negative value for the sizing-down metric. The value of Ni is also diﬀerent depending on the metric that is being evaluated.

The parameters used in the metric provide topological, timing and area-related information. The metric score determines the delay-area trade-off of sizing a gate. The sizing-up metric score increases for gates influencing many paths since they allow to improve various paths at a time. The sizing-up metric score also increases for those gates in slow paths with large negative slacks as those paths should be optimized with higher priority. A large delay sensitivity also increases the sizing- up metric score as a large delay reduction can be obtained by increasing the gate size. Finally, the sizing-up metric score reduces for gates with a high area impact because increasing the size of those gates is area costly. A similar interpretation of the parameters is made for the sizing-down metric. In this case, the size- down metric score increases for those gates affecting few paths and with low delay sensitivity to gate sizing (low impact on delay) and large positive slack. Also, gates with a large area impact are preferred due to potential area savings when down-sizing the gate.

117 5.4. METHODOLOGY FOR EFFICIENT SELECTION AND SIZING OF CRITICAL GATES

5.4 Methodology for Eﬃcient Selection and Siz- ing of Critical Gates

The proposed framework for efficient critical gate selection and sizing consists of the three principal steps shown in Figure 5.8. In the first step, the Potential Criti- cal Paths (PCPs) of the circuit are identified using Worst-case aging assumptions. A PCP is a path whose delay may exceed the nominal delay of the circuit due to the combined effect of aging and process variations. These paths impose some guardband for reliable circuit operation. The second-step performs a multiple workload-aware aging analysis of the PCP set. In this step, a realistic maximum delay degradation of the PCPs is estimated for a finite number of Signal Proba- bility combinations at main inputs (representing a possible executed workload). In such way, uncertainty about the specific signal probability profile of the circuit is addressed. In the third step, the delay of the PCPs is improved using a metric-guided statistical size optimization technique. The result of this flow is an optimized design that meets a given target guardband with low area overhead.

Circuit Gate Level Netlist Multiple Workloads-Aware Aging Analysis

PCP Identification under Set of Signal Probability Worst-Case Aging Profiles at main inputs

PCP Maximum Aging Duty Cycle & Evaluation Temperature Evaluation

Selection and Sizing of Critical Gates Guardband Heuristic Sizing Computation

Ranked Candidate Improved max < Design Gates

Identify Fast and Slow Evaluate Sizing Metrics PCPs

Figure 5.8: Flow of the proposed gate sizing optimization methodology.

118 5.4. METHODOLOGY FOR EFFICIENT SELECTION AND SIZING OF CRITICAL GATES

5.4.1 PCP Identiﬁcation Under Worst-Case Aging

Aging-Aware Statistical Static Timing Analysis (SSTA) is run assuming worst- case aging conditions. The devices in the circuit are assumed under near-static stress (α ≈ 1) and operating under a high temperature (T = 120◦C). Those paths whose µ + 3σ corner of the aged delay distribution is greater than the nominal (without aging and PV) delay of the circuit are identified as PCP. These paths impose some guardband for reliable operation. The identification of PCPs under worst-case aging conditions allows focusing our optimization flow in only a restricted path set rather than in the entire circuit, allowing to minimize computational effort.

5.4.2 Multiple Workloads Aging Analysis

One major challenge to be addressed for aging-aware circuit design is that the exact workload that the circuit will experience over the lifetime is hardly known in advance at circuit design phase. The workload impacts on the stress duty cycle (α) of each device and on their operating temperature [140], [124], which in turn influence BTI degradation, complicating circuit reliability analysis and optimization. To address unpredictability of the circuit workload at the design phase, we refine the aging conditions at which the delay of each PCP is evaluated during design optimization by performing a Multiple Workload Aware Maximum Aging Analysis. The main idea of this step is to determine a realistic maximum upper bound for the delay degradation of each PCP. This is done by evaluating the delay degradation of the previously selected PCPs for various workload profiles. In order to speed-up the computational time, the following actions are made:

1. Only the mean value of the aged delay of each PCP is computed for each workload.

119 5.4. METHODOLOGY FOR EFFICIENT SELECTION AND SIZING OF CRITICAL GATES

2. If the cumulative maximum mean aged delay of a path does not increase for a given number of consecutive workloads, the actual path delay is taken as maximum and the path is no longer updated for the next workloads.

Algorithm2 describes the proposed approach. For a user-defined number of workloads profiles, a set of signal probabilities at main circuit inputs are generated and propagated to internal nodes. A uniform random number generator between 0 and 1 is used to obtain the signal probability assigned to each input. Then, the stress duty cycle (α) of each transistor in the circuit is computed. The operating temperature of each cell is also computed as it strongly influences BTI mechanism. For each path, only the mean value of the delay degradation is computed. This is done to avoid the excessive computational cost of updating the PCPs timing degradation statistically for each workload profile. If the obtained delay degradation of a PCP j (µ∆Dj ,W L) is greater than the maximum obtained with previous workload profiles (µ∆Dj ,MAX ), the maximum aged delay of the path is updated and the conditions of duty cycle and temperature of path’s devices causing such maximum degradation for the path are stored. If the maximum delay degradation of a path does not increase within a consecutive user-defined number (N) of workload profiles, a flag signal (PCP [j].MAX) is activated, indicating that the currently stored delay degradation for the path is a good enough estimation for the maximal path delay degradation. Then, this path is no longer evaluated for the subsequent workload profiles. This is done to mitigate the computational cost associated with this step. It is important to note that the signal probability profile that causes maximum path delay degradation can be different for each path. Once the conditions that cause maximum degradation for each path have been identified, the standard deviation of the paths delay is computed, and the set of PCPs is reduced by discarding those paths whose maximum aged at the µ + 3σ corner do not exceed the nominal circuit delay. This process mitigates the computational effort required for design optimization.

120 5.4. METHODOLOGY FOR EFFICIENT SELECTION AND SIZING OF CRITICAL GATES

ALGORITHM 2: Multiple Workload-Aware Maximum Aging Analysis Input: Potential Critical Paths, Max WL Output: α and T of PCP gates causing worst aging for WL = 1 to WL = Max WL do generate propagate SP () compute duty cycle() compute temperature() compute ∆V thBTI () for j = 1 to j = Num.P CP s do if PCP [j].MAX == 0 then

Compute µ∆Dj ,W L

if µ∆Dj ,W L > µ∆Dj ,MAX then µ∆Dj ,MAX = µ∆Dj ,W L Save α and T of each device in PCP j else

if µ∆Dj ,MAX has not increased in the last N proﬁles then PCP[j].MAX=1 PCP j is not evaluated for next WLs end end end end end end

50 0 100 200 300 400 500 Workload Profiles

Figure 5.9: Maximum delay degradation of some paths as function of the number of tested workload proﬁles.

Figure 5.9 shows the behavior of the cumulative maximum delay degradation obtained for some paths of circuit C1908 as a function of the number of tested (sampled) workload proﬁles. As can be seen, after some workload proﬁles are tested, the maximum delay degradation obtained for all the paths tend to saturate.

121 5.4. METHODOLOGY FOR EFFICIENT SELECTION AND SIZING OF CRITICAL GATES

This behavior suggests that only a moderated number of workload proﬁles need to be analyzed to get a good estimation of the maximum aged delay that a path can take. Once identiﬁed the realistic conditions for maximal delay degradation of each path, the circuit is optimized to guarantee reliable operation of the PCP set even under such conditions.

5.4.3 Selection and Sizing of Critical Gates

Iterative selection and sizing of critical gates is done to optimize the circuit design for a reduced guardband constraint. In such way, lifetime reliability can be assured with low area overhead.

Guardband Computation

The ﬁrst step in the gate sizing procedure (See Figure 5.8) is to compute the actual guardband required for reliable circuit operation. The time-guardband that each

PCP impose over the nominal circuit delay (GBp) is given by,

GBp = (µDp + 3σD,p) − Dnom (5.10)

where µDp and σD,p are the mean value and the standard deviation of maximum aged delay of PCP p, and Dnom is the nominal circuit delay without aging. Due to the combined impact of aging and process variations, these guardbands may be too large, and if one path exceeds the target guardband GBt, it may perform a faulty operation before the lifetime ends.

Identiﬁcation of Fast and Slow PCPs

The PCPs are then separated in two diﬀerent subsets depending on the corresponding guardband imposed by each path, as illustrated in Figure 5.10: a)

Slow-PCPs subset, which has negative slack (GBt − GBp < 0); and b) Fast-PCPs

122 5.4. METHODOLOGY FOR EFFICIENT SELECTION AND SIZING OF CRITICAL GATES

initial guardband (GB) nominal aging + process target GB ()

P2 P3 P4 P5 Fast-PCPs* Slow-PCPs* () () Non-Critical Path

Size-Down Size-Up Gates Gates

Figure 5.10: Fast and Slow PCP sets

subset, which has some positive slack (GBt − GBp > 0). This classiﬁcation is done to exploit the fact that diﬀerent design actions can be taken over each PCP subset: Some gates are sized-up in the Slow-PCPs to improve their delay, while some gates are sized-down in the Fast-PCPs to take advantage of their slack to mitigate area overhead.

Gate Selection and Sizing Heuristic

The gate selection metrics are evaluated to identify the best critical gates to be sized in each critical path subset. The sizing-up metric is evaluated within the Slow-PCPs to identify the most beneficial gates to improve their timing efficiently. The sizing-down metric, which identifies those gates with little impact on delay and belonging to paths with enough slack, is evaluated within the Fast-PCPs to mitigate area overhead with little negative impact on delay. Algorithm3 summarizes the sizing procedure using the following heuristics to select the best gates to size up/down using the sizing-metrics scores.

Sizing-Up Selection: The obtained sizing-up metric score MSU,i reflects the benefit of Slow-PCPs delay reduction vs. area trade-off established by each gate.

Thus, the N gates with the highest MSU,i are picked and size-up proportionally to their respective score: ∆K = step · MSU,i. Where N is a user-deﬁned number of gates that are sized at each iteration and step is the maximum size change that a

123 5.4. METHODOLOGY FOR EFFICIENT SELECTION AND SIZING OF CRITICAL GATES gate can take at an iteration.

Sizing-Down Selection: The obtained sizing-down metric score MSD,i re- ﬂects a trade-oﬀ between Fast-PCPs delay increase to area reduction. However, the interdependence between Fast-PCPs and Slow-PCPs must be considered for sizing-down gates. This is because sizing-down a gate having a high MSD,i may negatively impact on Slow-PCPs if the gate also has a high MSU,i score. Therefore, the two following conditions are applied to pick the gates to size-down:

Gates sized-up is not allowed to be sized-down in the same iteration.

Gates that have a sizing-up metric score (MSU,i) larger than a constraint

(CMSU ) are not allowed to be sized-down.

The constraint in MSU,i is used to limit the negative impact on the slow-PCPs delay of sizing-down gates. The value of the constraint is dynamically changed along the sizing process. It is initially set to 1 (maximum) to maximize area savings as any gate is allowed to be sized-down, but it is gradually reduced if the delay of the Slow-PCPs is not being improved in a given iteration, so that delay converges towards the desired target delay. The N gates with the highest

MSD,i score fulﬁlling the aforementioned conditions are sized-down according to the following rule: ∆K = −step · (1 − MSU,i) · MSD,i. Thus, the amount of size reduction of a selected gate reduces (increases) if the gate has a high (low) MSU,i

(MSD,i) score. In general, the size-down procedure is useful when the initial design has over- sized gates due to a non-optimal design. Also, it becomes beneficial when a gate in a Fast-PCP is driven by a gate in a Slow-PCP. This may occur if the gate in the Fast-PCP was sized-up at the beginning of the optimization procedure (i.e., the gate was critical first), but then its influence to the remaining Slow-PCPs decreases. Once the selected gates are sized, the PCPs timing information is updated (See Algorithm3). The delay of each PCP is evaluated under the conditions

124 5.5. RESULTS FROM APPLICATION TO ISCAS BENCHMARK CIRCUITS of temperature and duty cycle of the devices that caused maximum aged path delay. The gate selection and sizing process is iteratively repeated until the target guardband is achieved. It is worth to mention that the parameters N and step impact on the speed of the sizing process and the accuracy of the results.

ALGORITHM 3: Sizing Heuristic

Input: Gates Metrics Scores (MSU,i and MSD,i) Output: Selected Gates with Updated Size

Set CMSU = 1 Rank gates in Slow-PCPs according to MSD,i Rank gates in Fast-PCPs according to MSD,i // Size-up gates in Slow-PCPs: for i = 1 to N do Ki = Ki + step · MSU,i end // Size-Down gates in Fast-PCPs: for i = 1 to N do

if MSU,i < CMSU and i was not sized-up then Ki = Ki − step · (1 − MSU,i) · MSD,i end end Update PCPs Timing Information if max(GBp) is not reduced then

CMSU =CMSU − 0.1 end

5.5 Results from Application to ISCAS Bench- mark Circuits

The proposed gate sizing optimization technique for guardband reduction has been implemented in C ++ code and applied to some ISCAS benchmark circuits designed using a 32nm Synopsys Generic Technology [65]. The original design of each circuit was of minimum area, where all the gates are of minimum size.

5.5.1 Sensitivity Approximation

Let us ﬁrst analyze the accuracy of the proposed path statistical sensitivity approximation. For this analysis, the impact of sizing each gate in the µ + 3σ delay

125 0

-10

-20

Exact -30 Aproximation

5.5. RESULTS FROM0 APPLICATION5 10 15 TO20 ISCAS25 BENCHMARK30 35 CIRCUITS Gate Index

-10

-20

-30

-40 Exact Aproximation -50 0 5 10 15 20 Gate Index

Figure 5.11: Statistical delay sensitivity of a Path with respect to sizing each gate in the path. Longest path of C1908 circuit. of the slowest path of ISCAS circuit C1908 was computed. Figure 5.11 shows the statistical path delay sensitivity with respect to the size of each gate in the path. Data is shown for both the sensitivity obtained with the proposed derivative approximation and the exact derivative calculation, where the statistical delay of the path is completely re-computed when the size of each gate is perturbed. As can be observed, the proposed approximation follows well the derivative obtained with the exact computation. Therefore, the proposed approximation measures well the impact that sizing a gate has on the delay of a path.

5.5.2 Optimization Results

Table 5.1 shows detailed results obtained from the application of the proposed methodology to ISCAS 85/89 circuits. The second and third columns give the total number of paths and gates in the circuits. Circuits of diﬀerent size and complexity were considered. Columns 4-7 show results related to the multiple workload-aware degradation analysis step. The column labeled as PCPs correspond to the number of Potential Critical Paths, which are those paths whose µ + 3σ delay can become greater than the nominal delay of the circuit. These paths are the ones considered during selection and sizing of the gates. As can

126 5.5. RESULTS FROM APPLICATION TO ISCAS BENCHMARK CIRCUITS

Table 5.1: Results using multiple workloads for aging analysis Workloads Analysis Sizing: GB = 20% Sizing: GB = 10% Circuit Paths Gates t t PCPs CGs GB(%) CPU (sec) Slow-PCP Area (%) CPU (sec) Slow-PCP Area % CPU (sec) c432 82363 175 43161 149 40.47 1817.66 7990 14.32 1817.66 28711 40.34 4102.46 c880 4935 254 1607 149 45.79 86.84 480 15.15 65.09 971 52.03 146.07 c1908 15638 253 8523 198 40.77 342.02 1727 9.76 271.73 4969 35.97 1386.80 c2670 3490 419 650 110 36.43 49.93 156 4.09 38.74 402 16.01 85.31 c3540 787401 748 106760 435 36.57 11422.47 4426 1.06 3516.21 31668 3.15 6305.02 c5315 24666 1224 4785 403 35.77 302.50 677 2.38 204.81 2140 8.94 519.201 c7552 43613 1450 8070 935 38.25 442.08 781 0.32 143.53 3401 1.22 257.65 s298 231 166 79 36 40.44 2.57 23 6.26 1.17 46 16.34 2.52 s838 1714 279 262 102 38.09 13.73 73 4.92 3.066 150 11.01 6.62 s1423 44726 991 4323 339 30.75 1285.01 183 1.25 608.43 1180 4.91 1295.85 s5378 11728 1297 757 147 42.47 44.02 105 1.13 15.85 422 4.72 43.51 s9234 29071 1817 2234 256 39.66 200.55 233 2.36 115.21 976 8.92 269.42 s13207 585368 2403 125051 493 37.69 9024.80 1595 0.94 5331.54 24180 3.10 10752.82 s35932 26146 10902 7044 4805 46.99 311.92 1024 7.17 1288.83 2050 22.91 536.39 be observed, the number of PCPs does not depend on the total number of paths (i.e., the number of PCPs in c7552 and s1423 is very different, but these circuits have a similar number of paths). The number of PCPs changes depending on the susceptibility of each circuit to aging and the circuit topology. Column 5 gives the number of gates that compose the selected set of PCPs. These gates are called Critical Gates. The proposed heuristic identifies which critical gates are more beneficial to be sized accordingly to the obtained metrics score. Column 6 shows the initial guardband that would have to be added to the nominal delay to assure reliable operation of the non-optimized circuit under the combined effect of aging and process variations. As can be seen, the percentage of guardband needed can be up to 45% of the nominal delay, which may be unacceptably large for high-performance state-of-art designs. Column 7 shows the CPU time spent in the multiple workload-aware degradation analysis. This corresponds to the time for evaluating the PCP set for multiple workload profiles. However, it should be noted that the number of times each path is evaluated may be different depending on when it is detected that the maximal delay obtained for a path does not further increase as more workload profiles are analyzed.

Columns 8 to 13 of Table 5.1 show the results obtained from applying the proposed methodology for selection and sizing of critical gates to reduce the initial

127 5.5. RESULTS FROM APPLICATION TO ISCAS BENCHMARK CIRCUITS guardband to more acceptable sizes of 20% and 10%. The number of PCPs in the initial design that violates the corresponding target guardband (Slow-PCPs), the area overhead, and the CPU time for design improvement are given. When the guardband constraint is of 20% the area overhead for most of the circuits remains low because only some slow-PCPs out of the whole PCP set need to be improved. However, when the guardband constraint becomes stringent, the number of slow-PCPs significantly increases for most of the circuits. This increase depends on how balanced are the delays of the PCPs. The area overhead and the corresponding CPU time also increases for stringent guardband constraints as further optimization is needed to achieve the target guardband. It is worth to mention that the area overhead for gate sizing depends on the circuit topology as it significantly changes from one circuit to another. Therefore, designers should trade-off the desired target guardbands and the corresponding area budget for optimization.

5.5.3 Beneﬁt of the Multiple Workload-Aware Aging Anal- ysis

Tables 5.2 and 5.3 show the results for the cases when worst-case conditions and when only one single workload condition are assumed for aging analysis, respectively. When only a representative single workload profile is used, the number of PCPs, the number of Critical Gates and the estimated guardband for the circuit are smaller than those obtained with our proposed multiple workload degradation analysis approach. This is because, in our approach, at least one of the tested workload profiles caused more aging in the PCPs than the single workload profile. Consequently, the area overhead when designing circuits using the single duty cycle assumption is lower than the area overhead obtained with our proposal. Also, the CPU time for design optimization is slightly lower. However, it should be noted that the exact workload profile that a circuit will experience over the

128 5.5. RESULTS FROM APPLICATION TO ISCAS BENCHMARK CIRCUITS lifetime is not well known in advance at the design phase. If the workload profile that the optimized circuit experiences over the lifetime are different than the used during design, some of the paths may degrade enough to cause a failure to time specifications. When worst-case aging is assumed, the number of PCPs, Critical Gates and estimated GB for the circuit significantly increase, which results in significant area overhead and extra CPU time since the PCPs receive more sizing than needed. For instance, consider circuit c2670, where 12.68% of the area is saved with respect to worst-case design for a 20% of target guardband when our approach is used. The area saved increases to 49.19% for tighter 10% guardband constraint. A similar comparison can be made for the other circuits.

Table 5.2: Results using a single workload for aging analysis GB = 20% GB = 10% Circuit PCPs CGs GB(%) t t slow-PCP Area (%) CPU (sec) slow-PCP Area (%) CPU (sec)

c432 41859 149 39.72 6773 12.49 1651.70 25905 36.52 3477.28 c880 1399 137 41.99 359 10.91 47.78 791 35.77 96.993 c1908 8102 196 39.42 1370 7.14 228.39 4322 30.85 919.49 c2670 630 106 35.41 143 3.47 35.16 372 14.55 90.93 c3540 98300 432 35.58 3476 0.85 2918.81 27636 2.97 5890.34 c5315 4641 403 35.51 637 2.16 187.42 2066 8.63 617.04 c7552 7752 927 36.61 622 0.27 120.83 3047 1.07 247.36 s298 73 36 39.8 22 5.70 1.02 45 15.39 2.49 s838 258 102 37.18 67 4.23 2.59 139 10.18 6.152 s1423 4144 339 30.56 166 1.00 522.07 1110 4.90 1252.60 s5378 735 140 42.03 88 0.99 12.59 376 4.25 41.19 s9234 2119 250 38.98 196 2.27 115.29 887 8.32 239.76 s13207 118393 492 37.43 1337 0.84 4602.98 21395 2.91 9764.68 s35932 5827 4706 45.85 978 6.00 114.42 1711 18.31 410.02

An interesting observation from Table 5.3 is that the computational for sizing the circuit assuming under worst-case assumptions can be significantly longer than the computational time for sizing the circuit using the proposed approach because more sizing iterations are required to reduce the overestimated delay degradation. In this case, it is evident that the extra computational time used to evaluate the timing response of the paths for various workload profiles is justified. The robustness of the optimized designs for 20000 random generated workload

129 5.5. RESULTS FROM APPLICATION TO ISCAS BENCHMARK CIRCUITS

Table 5.3: Results using worst-case aging analysis GB = 20% GB = 10% Circuit PCPs CGs GB(%) t t slow-PCP Area (%) CPU (sec) slow-PCP Area (%) CPU

c432 53800 152 50.97 26176 31.47 3740.57 41112 95.71 7968.74 c880 2360 158 59.96 1004 54.33 210.47 1582 330.11 1526.04 c1908 11276 204 52.75 5409 38.12 4793.69 8610 125.90 4871.58 c2670 837 131 51.52 461 16.67 99.286 659 65.20 256.52 c3540 204056 459 47.17 26432 2.69 10172.12 88146 7.10 17373.13 c5315 9331 551 53.22 3007 11.43 788.71 5656 44.57 2250.963 c7552 12866 1070 53.87 3615 1.30 449.40 7635 3.96 765.07 s298 88 38 55.18 54 21.95 3.28 76 127.44 26.72 s838 501 135 68.67 251 27.62 23.96 375 92.09 76.55 s1423 10885 408 48.13 2054 6.39 2973.07 5568 16.04 5162.69 s5378 1287 264 55.96 512 6.88 82.18 809 56.85 673.32 s9234 4795 335 49.53 861 6.00 376.15 1837 49.93 4390.52 s13207 200000 510 49.71 21606 2.78 15752.84 109607 9.14 33316.94 s35942 11121 5021 54.61 1388 11.94 726.19 4361 66.45 2422.31

1200 1.1 1.1 1200 Multiple WL ISCAS Multiple SP 1000 MultipleMultiple WL SP ISCAS 1000 c880 SingleSingle WLSP SingleSingle WLSP s298 800 800 600 Chips that may 600 Chips that may violate GB violate GB 400 400 200 200 0 0 420 430 440 450 460 470 480 215 217.5 220 222.5 225 227.5 230 +3 (ps) +3 (ps) D D D D (a) c880 (b) s298

Figure 5.12: Histograms of the µ + 3σ corner of the circuit aged delay obtained for an exhaustive number (20000) of multiple signal probability groups at main circuit inputs. profiles was analyzed. For each workload profile, the corresponding duty-cycle and operating temperature of the devices were computed, and SSTA was performed to obtain the corresponding µ + 3σ delay of all the PCPs. Then, the maximum µ+3σ delay among all the PCPs was identified, since this value corresponds to the maximum delay that the circuit can take for the given signal probability profile. Figures 5.12(a) and 5.12(b) show histograms of the µ + 3σ delay of circuits c880 and s298 for both the optimized design using our proposes multiple workload-

130 5.5. RESULTS FROM APPLICATION TO ISCAS BENCHMARK CIRCUITS

Table 5.4: Percentage of WLs for which 10% of guardband may be violated. Circuit Multiple WLs Single WL

c432 0.40 70.50 c880 4.79 55.11 c1908 0.11 32.24 c2670 3.52 53.62 c3540 3.05 7.62 c5315 4.17 29.7 c7552 0.05 24.77 s298 3.21 18.08 s838 0.33 9.71 s1423 0.29 0.48 s5378 4.57 51.98 s9234 1.58 31.88 s13207 0.24 15.10 s35932 2.04 79.05

Avg. 2.02 34.79

aware aging analysis and the optimized design using only one single workload profile for aging analysis. As can be seen, there are some workloads for which the µ + 3σ delay of the circuit may violate the allowed 10% of guardband. How- ever, it is clear that the optimized design with the proposed approach may violate the guardband for a significantly lower number of workloads, which demonstrates the benefit of the proposed approach. Table 5.4 shows the percentage of signal probability profiles for which the µ + 3σ delay of the circuits violated the specified guardband constraint of 10%. For most of the circuits, the robustness of the optimized design using the multiple workload-aware degradation analysis is significantly better than the optimized designs using only one single workload profile. Therefore, the obtained designs with the proposed approach are more reliable.

In the case that the coverage of possible workloads want to be improved, designers can trade-oﬀ the degree of circuit reliability and the computational cost of performing a more exhaustive duty-cycle-aware aging analysis step.

131 5.5. RESULTS FROM APPLICATION TO ISCAS BENCHMARK CIRCUITS

5.5.4 Gate Sizing Optimization Comparison

The eﬃciency of the proposed gate sizing optimization metrics and the heuristic was compared against other aging-aware metrics proposed in [142] and [135], which are given in Equation 5.11

Ni·∆Di Di PN Mi,[142] = + δ Mi,[135] = S · ∆Di (5.11) max(Ni·∆Di) Ki p

where Mi,[142] is the metric proposed in [142], Ni is the number of paths (PCPs) passing through the gate i, ∆Di is the delay degradation of the gate, and δ is a parameter that takes the value of 1 if the gate is in the slowest path of the circuit

Di (the path with the largest negative slack). Mi,[135] is the metric in [135], SKi is the delay sensitivity of the gate to changes in its size, N is the number of paths

Di passing through the gate and SKi is the delay degradation of the gate. Note that the metrics chosen for comparison are based on different characteristics of the gates. The metric in [142] focuses on identifying the gates suffering largest degradation and affecting many paths. A similar metric has been proposed in [133] to improve the aged performance of critical paths in an ALU. The metric in [135] considers not only the gate delay degradation and the number of paths affected by the gate but also the delay sensitivity on gate sizing. This metric was shown to perform better than that proposed in [44]. The metrics in Equation 5.11 were used in the proposed metric-guided design flow. However, only the sizing-up heuristic was applied since the approaches in [142], and [135] do not consider a metric for sizing-down gates. Table 5.5 shows the results for 20% and 10% of guardband constraint. For comparison purposes, the area overhead of the proposed approach is also given. It can be observed that our proposal gives designs with lower area overhead than those obtained using the metrics of [142] and [135]. This is because the proposed metrics includes important parameters not taken into account in the others such as the area impact and the paths slack. Furthermore, the proposed metric uses

132 5.5. RESULTS FROM APPLICATION TO ISCAS BENCHMARK CIRCUITS a statistical sensitivity that takes into account the impact of sizing a gate on the nominal delay, delay deterioration and variability due to process variations. Among the other metrics, the one in [142] is less efficient for gate sizing. This is because this metric only considers the delay degradation and the number of paths impacting by the gate. However, it does not consider the path delay sensitivity to gate sizing. Therefore, it does not measure the potential delay improvement of sizing a gate. Although the metric in [135] includes the delay sensitivity parameters, this sensitivity does not consider aging or process variations effects. Therefore, it may fail to indicate the gates more beneficial for delay improvement.

Table 5.5: Percentage of area overhead for target guardbands of 10% and 20% of using three selection and sizing methods GB = 20% GB = 10% Circuit t t Our [142][135] Our [142][135]

c432 12.49 29.40 25.28 36.52 96.17 75.64 c880 15.15 21.85 21.52 52.03 102.15 67.45 c1908 9.76 17.84 14.16 35.97 58.93 54.61 c2670 4.09 10.63 6.31 16.01 173.9 25.47 c3540 0.85 2.76 1.27 2.97 7.24 4.84 c5315 2.38 5.07 3.21 8.94 16.13 14.69 c7552 0.33 1.38 0.46 1.22 3.36 1.86 s298 6.26 10.61 11.21 16.34 40.01 24.30 s838 4.92 8.33 6.90 11.01 15.90 16.42 s1423 1.25 4.12 1.03 4.92 10.91 5.54 s5378 1.14 2.67 2.22 4.72 11.50 8.66 s9234 2.27 3.74 3.19 8.32 14.93 12.12 s13207 0.84 2.22 1.55 2.91 5.85 5.77 s35932 6.00 8.61 9.11 18.31 33.74 29.26

Avg. 4.83 9.23 7.67 15.72 42.19 24.75

Figure 5.13 shows the number of iterations performed when using each of the metrics for gate sizing. An iteration corresponds to the process of performing SSTA over all the PCPs to determine the current guarband required for the circuit and the Slow- and Fast- PCP subsets, the evaluation of the sizing metric for each

133 5.6. CONCLUSIONS gate in the PCPs, and the application of the sizing heuristic. It can be observed that the proposed metrics imply a larger number of iterations. This is because the proposed metrics select the gates giving an eﬃcient delay-area trade-oﬀ, which are not necessarily the ones improving quicker the circuit delay. On the other hand, the metric in [142] gives a higher priority to those gates in the longest PCP of the circuit, which results in a quick delay reduction but with increased area overhead.

3530 Our [142] [135]

3025 Our [137] [146]

2520

2015

1510

105

50 c432 c880 c1908 c2670 c3540 c5315 c7552 s298 s838 s1423 s5378 s9234 s13207 s35932 0 Circuit c432 c880 c1908 c2670 c3540 c5315 c7552 s298 s838 s1423 s5378 s9234 s13207 s35932 100 (a) GBt = 20%Circuit of nominal delay 90 Our [137] [146] 100 80 90 Our [142] [135] 70 80 60 70 50 60 40 50 30 40 20 30 10 20 0 10 c432 c880 c1908 c2670 c3540 c5315 c7552 s298 s838 s1423 s5378 s9234 s13207 s35932 0 Circuit c432 c880 c1908 c2670 c3540 c5315 c7552 s298 s838 s1423 s5378 s9234 s13207 s35932 Circuit

(b) GBt = 10% of nominal delay

Figure 5.13: Number of iterations to achieve target guardband.

5.6 Conclusions

A gate sizing optimization methodology for guardband reduction in the presence of aging due to BTI and process variations have been presented. Since the workload proﬁle that a circuit experience over the lifetime if unknown at the design

134 5.6. CONCLUSIONS phase, the proposed methodology calculates the maximum aged delay of the circuit paths for various workload profiles at main inputs, which define the duty cycle of the devices. In such way, traditional worst-case aging or unreliable single workload assumptions have been avoided. It has been shown that a reasonable number of signal probability profiles is sufficient to obtain a good estimation of the maximum degraded delay of the circuit paths. For delay optimization towards the desired guardband, gate metrics and a sizing heuristic have been proposed to select the best gates for both sizing-up to improve delay and sizing-down to mitigate area overhead. An approximation for the statistical sensitivity of a path delay has been proposed to mitigate computational effort of statistical timing analysis and speed-up metrics evaluation. The application of the proposed methodology on IS- CAS benchmark circuits has shown that gate sizing using the proposed approach to estimate the maximum aged delay of the circuit paths results in significant area savings compared to gate sizing under worst-case aging assumptions. Fur- thermore, it has been shown that the obtained designs can operate reliably for a different usage profile than those used during design optimization. The results using the proposed metrics has been compared against the results using other gates metrics in the literature, and it has been shown that the proposed approach provides a better area-delay trade-off.

135 5.6. CONCLUSIONS

136 Chapter 6

Conclusions

Aging eﬀects have an increasing impact on the lifetime reliability and performance of modern digital integrated circuits designed with deeply scaled CMOS technologies. This thesis has focused on reliability analysis and improvement of digital integrated circuits considering aging due to Bias Temperature Instability (BTI), which is the dominant aging mechanism in digital circuits. Performance degradation due to BTI is highly dependent on operating conditions such as the executed workloads by the circuits, and process-induced parameters variations.

An aging-aware statistical static timing analysis tool has been developed in C++ code to analyze the timing performance of large digital circuits under the combined effects of BTI and Process Variations. The impact of the operating workload of the circuit on devices aging has been taken into account as it strongly influences the stress duty-cycle and operating temperature of the devices, and consequently, the overall circuit degradation. It has been shown that conventional clock guardband to assure reliable circuit operation may be too large (more than 30% of nominal delay), resulting in significant performance loss.

This thesis has addressed the design of reliable digital circuits from two diﬀer- ent perspectives: 1) Aging Monitoring of Critical Paths; and 2) Eﬃcient Design for Guardband Reduction.

137 For aging monitoring, it has been proposed a methodology to select a reduced set of critical paths. Those paths should be monitored for reliable on-line prediction of possible circuit failures due to aging. This allows to minimize the initial guardband of the circuit and to dynamically react against aging by gradually increasing the clock period to keep reliable operation. A statistical selection condition has been proposed to identify those paths that may eventually become the slowest path of the circuit for a given process and aging condition. The selection condition allows to trade-off the degree of circuit reliability and the corresponding size of the critical paths set to be monitored. The proposed methodology first reduces the overall set of topological path to a smaller set of Potential Critical Paths from which detailed Critical Path selection is performed. The selection algorithm considers that different paths may become critical depending on the operating workload. Furthermore, it considers that paths that can become slower than the initial LCP of the circuit may be covered by other paths. The proposed methodology is suitable to be applied at early design phases as it uses a spatial correlation approximation, which has been proposed based on previous analysis of the delay difference between paths using spatial correlation extracting from circuits layouts. It has been shown that using the proposed correlation approximation for path selection results in a much more reduced set of critical paths than using conventional correlation bounding approaches. The proposed methodology considers various possible operating workloads for the circuit during path selection process to avoid conventional worst-case aging assumptions. Designers should trade-off the degree of circuit reliability with the computational effort of analyzing a larger number of workloads. However, it has been shown that for most of the circuits considering a moderate number of workloads is sufficient to identify most of the critical paths. Moreover, it was shown that critical path selection using various possible executed workloads results in a much more reduced set of paths to be monitored, compared to classic worst-case workload assumption. The proposed path selection method using the spatial correlation approximation was compared

138 with path selection using spatial correlation extracted from circuit layout, and it has been shown that most of the critical paths are eﬀectively selected. Moreover, it has been shown that those paths not selected with the proposed approach have a low probability to become critical.

For reliability-aware circuit design, it has been proposed a gate size optimization technique based on new statistical gate sizing metrics. Unlike other works that only considers sizing-up gates to compensate aging and process variations, the proposed method also considers sizing-down gates to mitigate area overhead. The proposed metrics consider various parameters to measure the benefit of sizing- up/down a gate in the circuit to reduce the required guardband for reliable operation with low area overhead. The parameters taken into account in the gate metrics are: 1) the statistical path delay sensitivity to gate sizing, which measures how much impact has sizing a gate on the µ + 3σ delay of a path; 2) The area impact of sizing the gate; 3) The number of paths passing through the gate; and 4) The slack time of the paths passing through the gate. This thesis improved the classic concept of delay sensitivity by taking into account a nominal delay sensitivity component, a delay degradation sensitivity component, and a standard deviation sensitivity component. A fast approximation of the statistical delay sensitivity of path delay to gate sizing has been proposed based on an analysis of the impact of sizing a gate on the mean delay and standard deviation of the adjacent gates in the path. A multiple workload profiles analysis has been proposed to estimate, in a more realistic way, the maximum delay degradation of the paths. This allows avoiding pessimistic worst-case aging assumptions. Furthermore, this allows reducing the set of paths where size-optimization is focused. It has been shown that a moderate number of workload profiles is sufficient to obtain a good estimation of the maximum degraded delay of the circuit paths. The proposed methodology provides an efficient trade-off between the desired guardband for the circuit and the area overhead. The results have shown that the proposed approach significantly reduces the area overhead and CPU time for size-optimization

139 compared to circuit design using worst-case aging assumptions. Furthermore, by considering multiple operating workload conditions during circuit design, a more reliable design is obtained with lower area overhead, compared to circuit design considering a single workload trace. The results have also shown that by using the proposed metrics, timing constraints can be met with lower area overhead than that obtained using other gate metrics available in the literature. The contributions of this thesis provide new aging-aware design techniques to help designers to approach the complex problem of the design of reliable digital integrated circuits under the combined eﬀects of aging due to BTI and process variations.

140 Published Papers

1. A. F. Gomez and V. Champac, “Critical path selection under NBTI/PBTI aging for adaptive frequency tuning,” 2016 IEEE East-West Design & Test Sym- posium (EWDTS), Yerevan, 2016, pp. 1-4.

2. A.F. Gomez, F. Lavratti, G. Medeiros, M. Sartori, L. Bolzani Poehls, V. Champac, F. Vargas, “Eﬀectiveness of a hardware-based approach to detect resistive-open defects in SRAM cells under process variations”, In Microelectron- ics Reliability, Volume 67, 2016, Pages 150-158, ISSN 0026-2714,

3. Forero, Freddy, Andres F. Gomez and Vctor H. Champac. “Improvement of Negative Bias Temperature Instability Circuit Reliability and Power Consump- tion Using Dual Supply Voltage.” J. Low Power Electronics 12 (2016): 395-402.

4. A. Gomez and V. Champac, “A Metric-Guided Circuit Design Method- ology for Aging Guardband Compensation”, (PhD Forum) IFIP/IEEE Interna- tional Conference on Very Large Scale Integration (VLSI-SoC), Tallinn, Septem- ber, 2016.

5. F. Forero, A. Gomez and V. Champac, “A methodology for NBTI circuit reliability at reduced power consumption using dual supply voltage,” 2016 17th Latin-American Test Symposium (LATS), Foz do Iguacu, 2016, pp. 81-86.

141 6. A. F. Gomez and V. Champac, “Early Selection of Critical Paths for Reliable NBTI Aging-Delay Monitoring,” in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24, no. 7, pp. 2438-2448, July 2016.

7. A. Gomez and V. Champac, “A new sizing approach for lifetime improvement of nanoscale digital circuits due to BTI aging,” 2015 IFIP/IEEE Interna- tional Conference on Very Large Scale Integration (VLSI-SoC), Daejeon, 2015, pp. 297-302.

8. V. Champac, A. N. h. Reyes and A. F. Gomez, “Circuit performance optimization for local intra-die process variations using a gate selection metric,” 2015 IFIP/IEEE International Conference on Very Large Scale Integration (VLSI- SoC), Daejeon, 2015, pp. 165-170.

9. A. Gomez, L. Poehls, F. Vargas and V. Champac, “An early prediction methodology for aging sensor insertion to assure safe circuit operation due to NBTI aging,” 2015 IEEE 33rd VLSI Test Symposium (VTS), Napa, CA, 2015, pp. 1-6.

10. A. Gomez and V. Champac, “Eﬀective selection of favorable gates in BTI-critical paths to enhance circuit reliability,” 2015 16th Latin-American Test Symposium (LATS), Puerto Vallarta, 2015,pp1-6.

142 References

[1] Tetsuaki Matsunawa, Bei Yu, and David Z. Pan. Optical proximity correction with hierarchical bayes model. Journal of Micro/Nanolithography, MEMS, and MOEMS, 15(2):021009, 2016.

[2] J. Xiong, V. Zolotov, and L. He. Robust extraction of spatial correlation. IEEE Transac- tions on Computer-Aided Design of Integrated Circuits and Systems, 26(4):619–631, April 2007.

[3] Martin Wirnshofer. Sources of Variation, pages 5–14. Springer Netherlands, Dordrecht, 2013.

[4] H. Nan, L. Li, and K. Choi. Tddb-based performance variation of combinational logic in deeply scaled cmos technology. In Thirteenth International Symposium on Quality Electronic Design (ISQED), pages 328–333, March 2012.

[5] Jeﬀrey Hicks, Daniel Bergstrom, Mike Hattendorf, Jason Jopling, Jose Maiz, Pae Sangwoo, Chetan Prasad, and Jami Wiedemer. 45nm transistor reliability. Intel Technology Journal, 12(2):131 – 144, 2008.

[6] Mesut Meterelliyoz. Process variation and thermal aware circuit design and test for nanoscale technologies. PhD thesis, Purdue University, 2008.

[7] Cristiano Forzan and Davide Pandini. Statistical static timing analysis: A survey. Inte- gration, the {VLSI} Journal, 42(3):409 – 435, 2009. Special Section on {DCIS2006}.

[8] W. Wang, S. Yang, S. Bhardwaj, S. Vrudhula, F. Liu, and Y. Cao. The impact of nbti eﬀect on combinational circuit: Modeling, simulation, and analysis. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18(2):173–183, Feb 2010.

[9] Whithe Paper Freescale. Thermal analysis of semiconductor systems. 2008.

[10] S. Borkar. Design challenges of technology scaling. IEEE Micro, 19(4):23–29, Jul 1999.

[11] G. E. Moore. Cramming more components onto integrated circuits, reprinted from electronics, volume 38, number 8, april 19, 1965, pp.114 ﬀ. IEEE Solid-State Circuits Society Newsletter, 11(5):33–35, Sept 2006.

143 REFERENCES

[12] T. Liu, C. C. Chen, and L. Milor. Comprehensive reliability-aware statistical timing analysis using a uniﬁed gate-delay model for microprocessors. IEEE Transactions on Emerging Topics in Computing, PP(99):1–1, 2017.

[13] M. Yabuuchi, R. Kishida, and K. Kobayashi. Correlation between bti-induced degrada- tions and process variations by measuring frequency of ros. In 2014 IEEE International Meeting for Future of Electron Devices, Kansai (IMFEDK), pages 1–2, June 2014.

[14] J. Srinivasan, S. V. Adve, P. Bose, and J. A. Rivers. The impact of technology scaling on lifetime reliability. In International Conference on Dependable Systems and Networks, 2004, pages 177–186, June 2004.

[15] A. Asenov. Statistical device variability and its impact on design. In 2008 14th IEEE International Symposium on Asynchronous Circuits and Systems, pages xv–xvi, April 2008.

[16] Elie Maricau and Georges Gielen. CMOS Reliability Overview, pages 15–35. Springer New York, New York, NY, 2013.

[17] Sani R. Nassif Duane Boning Michael Orshansky. Lithography Enhancement Techniques, pages 127–154. Springer US, Boston, MA, 2008.

[18] D. Blaauw, K. Chopra, A. Srivastava, and L. Scheﬀer. Statistical timing analysis: From basic principles to state of the art. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(4):589–607, April 2008.

[19] Chirayu S. Amin, Noel Menezes, Kip Killpack, Florentin Dartu, Umakanta Choudhury, Nagib Hakim, and Yehea I. Ismail. Statistical static timing analysis: How simple can we get? In Proceedings of the 42Nd Annual Design Automation Conference, DAC ’05, pages 652–657, New York, NY, USA, 2005. ACM.

[20] Hongliang Chang and Sachin S. Sapatnekar. Statistical timing analysis considering spatial correlations using a single pert-like traversal. In Proceedings of the 2003 IEEE/ACM In- ternational Conference on Computer-aided Design, ICCAD ’03, pages 621–, Washington, DC, USA, 2003. IEEE Computer Society.

[21] V Izumi Nitta V Toshiyuki Shibuya and V Katsumi Homma. Statistical static timing analysis technology. Fujitsu Sci. Tech. J, 43(4):516–523, 2007.

[22] P. Friedberg, Y. Cao, J. Cain, R. Wang, J. Rabaey, and C. Spanos. Modeling within- die spatial correlation eﬀects for process-design co-optimization. In Sixth international symposium on quality electronic design (isqed’05), pages 516–521, March 2005.

[23] S. S. Chung. The variability issues in small scale trigate cmos devices: Random dopant and trap induced ﬂuctuations. In Proceedings of the 20th IEEE International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA), pages 173–176, July 2013.

144 REFERENCES

[24] S. S. Sapatnekar. What happens when circuits grow old: Aging issues in cmos design. In 2013 International Symposium on VLSI Technology, Systems and Application (VLSI- TSA), pages 1–2, April 2013. [25] S. Khan, S. Hamdioui, H. Kukner, P. Raghavan, and F. Catthoor. Bti impact on logical gates in nano-scale cmos technology. In 2012 IEEE 15th International Symposium on Design and Diagnostics of Electronic Circuits Systems (DDECS), pages 348–353, April 2012. [26] C. Nunes, P. F. Butzen, A. I. Reis, and R. P. Ribas. A methodology to evaluate the aging impact on flip-flops performance. In 2013 26th Symposium on Integrated Circuits and Systems Design (SBCCI), pages 1–6, Sept 2013. [27] J. Fang and S. S. Sapatnekar. The impact of bti variations on timing in digital logic circuits. IEEE Transactions on Device and Materials Reliability, 13(1):277–286, March 2013. [28] D. Lorenz, G. Georgakos, and U. Schlichtmann. Aging analysis of circuit timing considering nbti and hci. In 2009 15th IEEE International On-Line Testing Symposium, pages 3–8, June 2009. [29] J. Fang and S. S. Sapatnekar. Incorporating hot-carrier injection effects into timing analysis for large circuits. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 22(12):2738–2751, Dec 2014. [30] Navid Khoshavi, Rizwan A. Ashraf, Ronald F. DeMara, Saman Kiamehr, Fabian Obo- ril, and Mehdi B. Tahoori. Contemporary cmos aging mitigation techniques: Survey, taxonomy, and methods. Integration, the VLSI Journal, 59(Supplement C):10 – 22, 2017. [31] S. Zafar, Y. Kim, V. Narayanan, C. Cabral, V. Paruchuri, B. Doris, J. Stathis, A. Callegari, and M. Chudzik. A comparative study of nbti and pbti (charge trapping) in sio2/hfo2 stacks with fusi, tin, re gates. In 2006 Symposium on VLSI Technology, 2006. Digest of Technical Papers., pages 23–25, 2006. [32] H. Kufluoglu and M. A. Alam. A generalized reaction ndash;diffusion model with explicit

h ndash; hboxH2 dynamics for negative-bias temperature-instability (nbti) degradation. IEEE Transactions on Electron Devices, 54(5):1101–1107, May 2007. [33] Ahmad Ehteshamul Islam, Nilesh Goel, Souvik Mahapatra, and Muhammad Ashraful Alam. Reaction-Diﬀusion Model, pages 181–207. Springer India, New Delhi, 2016. [34] Ketul B. Sutaria, Jyothi B. Velamala, Athul Ramkumar, and Yu Cao. Compact modeling of BTI for circuit reliability analysis, pages 93–119. Springer New York, 1 2015. [35] A.K. Verma, S. Ajit, and D.R. Karanki. Reliability and Safety Engineering. Springer Series in Reliability Engineering. Springer London, 2010. [36] John C. Knight. Safety critical systems: Challenges and directions. In Proceedings of the 24th International Conference on Software Engineering, ICSE ’02, pages 547–550, New York, NY, USA, 2002. ACM.

145 REFERENCES

[37] Alvin W. Strong, Ernest Y. Wu, Rolf-Peter Vollertsen, Jordi Sune, Giuseppe La Rosa, and Timothy D. Sullivan. Reliability Wearout Mechanisms in Advanced CMOS Technologies. John Wiley & Sons, 2006.

[38] R. H. Dennard, F. H. Gaensslen, V. L. Rideout, E. Bassous, and A. R. LeBlanc. Design of ion-implanted mosfet’s with very small physical dimensions. IEEE Journal of Solid-State Circuits, 9(5):256–268, Oct 1974.

[39] Mark White and Yuan Chen. Scaled cmos technology reliability users guide. National Aeronautics and Space Administration,Jet Propulsion Laboratory, 2010.

[40] M. A. Khan and H. G. Kerkhoﬀ. The essence of reliability estimation during operational life for achieving high system dependability. In 2013 Euromicro Conference on Digital System Design, pages 575–581, Sept 2013.

[41] Marvin Onabajo and Jose Silva-Martinez. Process Variation Challenges and Solutions Approaches, pages 9–30. Springer US, Boston, MA, 2012.

[42] Yinghai Lu, Li Shang, Hai Zhou, Hengliang Zhu, Fan Yang, and Xuan Zeng. Statisti- cal reliability analysis under process variation and aging eﬀects. In Design Automation Conference, 2009. DAC ’09. 46th ACM/IEEE, pages 514–519, July 2009.

[43] W. Wang, V. Reddy, Bo Yang, V. Balakrishnan, S. Krishnan, and Yu Cao. Statistical prediction of circuit aging under process variations. In 2008 IEEE Custom Integrated Circuits Conference, pages 13–16, Sept 2008.

[44] Yinghai Lu, Li Shang, Hai Zhou, Hengliang Zhu, Fan Yang, and Xuan Zeng. Statistical reliability analysis under process variation and aging eﬀects. In 2009 46th ACM/IEEE Design Automation Conference, pages 514–519, July 2009.

[45] A. T. Krishnan, F. Cano, C. Chancellor, V. Reddy, Zhangfen Qi, P. Jain, J. Carulli, J. Masin, S. Zuhoski, S. Krishnan, and J. Ondrusek. Product drift from nbti: Guard- banding, circuit and statistical eﬀects. In 2010 International Electron Devices Meeting, pages 4.3.1–4.3.4, Dec 2010.

[46] Victor M. van Santen, Hussam Amrouch, Javier Martin-Martinez, Montserrat Nafria, and J¨orgHenkel. Designing guardbands for instantaneous aging eﬀects. In Proceedings of the 53rd Annual Design Automation Conference, DAC ’16, pages 69:1–69:6, New York, NY, USA, 2016. ACM.

[47] M. S. Gupta, J. A. Rivers, P. Bose, G. Y. Wei, and D. Brooks. Tribeca: Design for pvt variations with local recovery and ﬁne-grained adaptation. In 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 435–446, Dec 2009.

[48] Jason Cong, Henry Duwe, Rakesh Kumar, and Sen Li. Better-than-worst-case design: Progress and opportunities. Journal of Computer Science and Technology, 29(4):656–663, Jul 2014.

146 REFERENCES

[49] S. Zafar, A. Kumar, E. Gusev, and E. Cartier. Threshold voltage instabilities in high- kappa; gate dielectric stacks. IEEE Transactions on Device and Materials Reliability, 5(1):45–64, March 2005.

[50] Chenyue Ma, Lining Zhang, Xinnan Lin, and Mansun Chan. Universal framework for temperature dependence prediction of the negative bias temperature instability based on microscope pictures. Japanese Journal of Applied Physics, 55(4):044201, 2016.

[51] S. Mahapatra, P. B. Kumar, and M. A. Alam. Investigation and modeling of interface and bulk trap generation during negative bias temperature instability of p-mosfets. IEEE Transactions on Electron Devices, 51(9):1371–1379, Sept 2004.

[52] E. M. Vogel. Electrical characterization of defects in high-k gate dielectrics. In 2005 International Semiconductor Device Research Symposium, pages 209–210, Dec 2005.

[53] T. Grasser, B. Kaczer, W. Goes, H. Reisinger, T. Aichinger, P. Hehenberger, P. J. Wagner, F. Schanovsky, J. Franco, M. Toledano Luque, and M. Nelhiebel. The paradigm shift in understanding the bias temperature instability: From reaction -diﬀusion to switching oxide traps. IEEE Transactions on Electron Devices, 58(11):3652–3666, Nov 2011.

[54] M. Toledano-Luque, B. Kaczer, P. J. Roussel, T. Grasser, G. I. Wirth, J. Franco, C. Vrancken, N. Horiguchi, and G. Groeseneken. Response of a single trap to ac negative bias temperature stress. In 2011 International Reliability Physics Symposium, pages 4A.2.1–4A.2.8, April 2011.

[55] G. I. Wirth, R. da Silva, and B. Kaczer. Statistical model for mosfet bias temperature instability component due to charge trapping. IEEE Transactions on Electron Devices, 58(8):2743–2751, Aug 2011.

[56] B. Kaczer, S. Mahato, V. Valduga de Almeida Camargo, M. Toledano-Luque, P. J. Rous- sel, T. Grasser, F. Catthoor, P. Dobrovolny, P. Zuber, G. Wirth, and G. Groeseneken. Atomistic approach to variability of bias-temperature instability in circuit simulations. In 2011 International Reliability Physics Symposium, pages XT.3.1–XT.3.5, April 2011.

[57] H. Kkner, S. Khan, P. Weckx, P. Raghavan, S. Hamdioui, B. Kaczer, F. Catthoor, L. Van der Perre, R. Lauwereins, and G. Groeseneken. Comparison of reaction-diﬀusion and atomistic trap-based bti models for logic gates. IEEE Transactions on Device and Mate- rials Reliability, 14(1):182–193, March 2014.

[58] S. Bhardwaj, W. Wang, R. Vattikonda, Y. Cao, and S. Vrudhula. Predictive modeling of the nbti eﬀect for reliable design. In IEEE Custom Integrated Circuits Conference 2006, pages 189–192, Sept 2006.

[59] Rakesh Vattikonda, Wenping Wang, and Yu Cao. Modeling and minimization of pmos nbti eﬀect for robust nanometer design. In Proceedings of the 43rd Annual Design Automation Conference, DAC ’06, pages 1047–1052, New York, NY, USA, 2006. ACM.

147 REFERENCES

[60] K. B. Sutaria, J. B. Velamala, C. H. Kim, T. Sato, and Y. Cao. Aging statistics based on trapping/detrapping: Compact modeling and silicon validation. IEEE Transactions on Device and Materials Reliability, 14(2):607–615, June 2014.

[61] Bogdan Tudor, Joddy Wang, Zhaoping Chen, Robin Tan, Weidong Liu, and Frank Lee. An accurate mosfet aging model for 28nm integrated circuit simulation. Microelectronics Reliability, 52(8):1565 – 1570, 2012. ICMAT 2011 - Reliability and variability of semiconductor devices and ICs.

[62] S. Kiamehr, F. Firouzi, and M. B. Tahoori. Aging-aware timing analysis considering combined eﬀects of nbti and pbti. In International Symposium on Quality Electronic Design (ISQED), pages 53–59, March 2013.

[63] H. I. Yang, S. C. Yang, W. Hwang, and C. T. Chuang. Impacts of nbti/pbti on timing control circuits and degradation tolerant design in nanoscale cmos sram. IEEE Transactions on Circuits and Systems I: Regular Papers, 58(6):1239–1251, June 2011.

[64] Shreyas Kumar Krishnappa, Harwinder Singh, and Hamid Mahmoodi. Incorporating eﬀects of process, voltage, and temperature variation in bti model for circuit design. In IEEE Latin American Symp. on Circuits and Systems, pages 236 – 239, 2010.

[65] https://www.synopsys.com.

[66] Alexander Stempkovsky, Alexey Glebov, and Sergey Gavrilov. Calculation of stress probability for nbti-aware timing analysis. In Proceedings of the 2009 10th International Sym- posium on Quality of Electronic Design, ISQED ’09, pages 714–718, Washington, DC, USA, 2009. IEEE Computer Society.

[67] Hiroaki KONOURA, Yukio MITSUYAMA, Masanori HASHIMOTO, and Takao ONOYE. Stress probability computation for estimating nbti-induced delay degradation. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E94.A(12):2545–2553, 2011.

[68] Dieter K. Schroder. Negative bias temperature instability: What do we understand? Mi- croelectronics Reliability, 47(6):841 – 852, 2007. Modelling the Negative Bias Temperature Instability.

[69] S. Khan and S. Hamdioui. Temperature dependence of nbti induced delay. In 2010 IEEE 16th International On-Line Testing Symposium, pages 15–20, July 2010.

[70] Naghmeh Karimi, Arun Karthik Kanuparthi, Xueyang Wang, Ozgur Sinanoglu, and Ramesh Karri. Magic: Malicious aging in circuits/cores. ACM Trans. Archit. Code Optim., 12(1):5:1–5:25, April 2015.

[71] Kai-Chiang Wu and Diana Marculescu. Joint logic restructuring and pin reordering against nbti-induced performance degradation. In Design, Automation and Test in Europe, DATE 2009, Nice, France, April 20-24, 2009, pages 75–80, 2009.

148 REFERENCES

[72] Saman Kiamehr, Farshad Firouzi, and Mehdi B. Tahoori. Input and transistor reordering for nbti and hci reduction in complex cmos gates. In Proceedings of the Great Lakes Symposium on VLSI, GLSVLSI ’12, pages 201–206, New York, NY, USA, 2012. ACM.

[73] Dominik Lorenz, Georg Georgakos, and Ulf Schlichtmann. Aging-aware timing analysis of combinatorial circuits on gate level (alterungsanalyse von kombinatorischen schaltungen auf gatterebene). it - Information Technology, 52:181–188, 2010.

[74] X. Chen, Y. Wang, Y. Cao, Y. Xie, and H. Yang. Assessment of circuit optimization techniques under nbti. IEEE Design Test, 30(6):40–49, Dec 2013.

[75] H. Amrouch, B. Khaleghi, A. Gerstlauer, and J. Henkel. Reliability-aware design to suppress aging. In 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC), pages 1–6, June 2016.

[76] Mohammad Saber Golanbari, Saman Kiamehr, Mojtaba Ebrahimi, and Mehdi Baradaran Tahoori. Aging guardband reduction through selective ﬂip-ﬂop optimization. In ETS, 2015.

[77] Saket Gupta and Sachin S. Sapatnekar. Gnomo: Greater-than-nominal vdd operation for bti mitigation. In ASP-DAC, 2012.

[78] S. Kiamehr, F. Firouzi, M. Ebrahimi, and M. B. Tahoori. Aging-aware standard cell library design. In 2014 Design, Automation Test in Europe Conference Exhibition (DATE), pages 1–4, March 2014.

[79] P.F. Butzen, V. Dal Bem, A.I. Reis, and R.P. Ribas. Design of cmos logic gates with enhanced robustness against aging degradation. Microelectronics Reliability, 52(9):1822 – 1826, 2012. SPECIAL ISSUE 23rd EUROPEAN SYMPOSIUM ON THE RELIABILITY OF ELECTRON DEVICES, FAILURE PHYSICS AND ANALYSIS.

[80] S. Kiamehr, M. Ebrahimi, F. Firouzi, and M. B. Tahoori. Extending standard cell library for aging mitigation. IET Computers Digital Techniques, 9(4):206–212, 2015.

[81] Mojtaba Ebrahimi, Fabian Oboril, Saman Kiamehr, and Mehdi B. Tahoori. Aging-aware logic synthesis. In Proceedings of the International Conference on Computer-Aided Design, ICCAD ’13, pages 61–68, Piscataway, NJ, USA, 2013. IEEE Press.

[82] Michitarou Yabuuchi and Kazutoshi Kobayashi. Size optimization technique for logic circuits that considers bti and process variations. IPSJ Transactions on System LSI Design Methodology, 9:72–78, 2016.

[83] M. Ghane and H. R. Zarandi. Gate merging: An nbti mitigation method to eliminate critical internal nodes in digital circuits. In 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), pages 786–791, Feb 2016.

[84] Y. Wang, H. Luo, K. He, R. Luo, H. Z. Yang, and Y. Xie. Nbti-aware dual v-th assignment for leakage reduction and lifetime assurance. Chinese Journal of Electronics, 18(2):225– 230, 2009.

149 REFERENCES

[85] Wen-Pin Tu, Shih-Wei Wu, Shih-Hsu Huang, and Mely Chen Chi. Nbti-aware dual threshold voltage assignment for leakage power reduction. In ISCAS, 2012.

[86] F. Forero, A. Gomez, and V. Champac. A methodology for nbti circuit reliability at reduced power consumption using dual supply voltage. In 2016 17th Latin-American Test Symposium (LATS), pages 81–86, April 2016.

[87] K. Kang, H. Kuﬂuoglu, M. A. Alam, and K. Roy. Eﬃcient transistor-level sizing technique under temporal performance degradation due to nbti. In 2006 International Conference on Computer Design, pages 216–221, Oct 2006.

[88] S. Roy and D. Z. Pan. Reliability aware gate sizing combating nbti and oxide breakdown. In 2014 27th International Conference on VLSI Design and 2014 13th International Con- ference on Embedded Systems, pages 38–43, Jan 2014.

[89] X. Yang and K. Saluja. Combating nbti degradation via gate sizing. In 8th International Symposium on Quality Electronic Design (ISQED’07), pages 47–52, March 2007.

[90] Zhila Amini-sheshdeh and Abdolreza Nabavi. Design of improved-reliability nanocircuits with mixed nbti- and hci-aware gate-sizing formulation. IEEJ Transactions on Electrical and Electronic Engineering, 8(6):587–590, 2013.

[91] Z. Abbas, M. Olivieri, U. Khalid, A. Ripp, and M. Pronath. Optimal nbti degradation and pvt variation resistant device sizing in a full adder cell. In 2015 4th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), pages 1–6, Sept 2015.

[92] S. Khan and S. Hamdioui. Modeling and mitigating nbti in nanoscale circuits. In 2011 IEEE 17th International On-Line Testing Symposium, pages 1–6, July 2011.

[93] Song Jin, Yinhe Han, Huawei Li, and Xiaowei Li. Statistical lifetime reliability optimization considering joint eﬀect of process variation and aging. Integration, the {VLSI} Journal, 44(3):185 – 191, 2011.

[94] M. Yi, X. Liu, Q. Wu, T. Ni, Z. Huang, and H. Liang. Nbti mitigation by m-ivc with input duty cycle and randomness constraints. In 2016 IEEE East-West Design Test Symposium (EWDTS), pages 1–5, Oct 2016.

[95] F. Firouzi, S. Kiamehr, and M. B. Tahoori. Power-aware minimum nbti vector selection using a linear programming approach. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 32(1):100–110, Jan 2013.

[96] Y. Wang, X. Chen, W. Wang, V. Balakrishnan, Y. Cao, Y. Xie, and H. Yang. On the eﬃcacy of input vector control to mitigate nbti eﬀects and leakage power. In 2009 10th International Symposium on Quality Electronic Design, pages 19–26, March 2009.

[97] Maksim Jenihhin, Giovanni Squillero, Thiago Santos Copetti, Valentin Tihhomirov, Sergei Kostin, Marco Gaudesi, Fabian Vargas, Jaan Raik, Matteo Sonza Reorda, Leti- cia Bolzani Poehls, Raimund Ubar, and Guilherme Cardoso Medeiros. Identiﬁcation and

150 REFERENCES

rejuvenation of nbti-critical logic paths in nanoscale circuits. Journal of Electronic Testing, 32(3):273–289, 2016. [98] S. Arasu, M. Nourani, J. M. Carulli, and V. K. Reddy. Controlling aging in timing-critical paths. IEEE Design Test, 33(4):82–91, Aug 2016. [99] David R. Bild, Robert P. Dick, and Gregory E. Bok. Static nbti reduction using internal node control. ACM Trans. Des. Autom. Electron. Syst., 17(4):45:1–45:30, October 2012. [100] Y. Wang, X. Chen, W. Wang, Y. Cao, Y. Xie, and H. Yang. Leakage power and circuit aging cooptimization by gate replacement techniques. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 19(4):615–628, April 2011. [101] S. Bian, M. Shintani, Z. Wang, M. Hiromoto, A. Chattopadhyay, and T. Sato. Runtime nbti mitigation for processor lifespan extension via selective node control. In 2016 IEEE 25th Asian Test Symposium (ATS), pages 234–239, Nov 2016. [102] S. Morita, S. Bian, M. Shintani, M. Hiromoto, and T. Sato. Comparative study of path selection and objective function in replacing nbti mitigation logic. In 2017 18th Interna- tional Symposium on Quality Electronic Design (ISQED), pages 426–431, March 2017. [103] Andrea Calimera, Enrico Macii, and Massimo Poncino. Nbti-aware power gating for concurrent leakage and aging optimization. In Proceedings of the 2009 ACM/IEEE Inter- national Symposium on Low Power Electronics and Design, ISLPED ’09, pages 127–132, New York, NY, USA, 2009. ACM. [104] K. C. Wu, D. Marculescu, M. C. Lee, and S. C. Chang. Analysis and mitigation of nbti- induced performance degradation for power-gated circuits. In IEEE/ACM International Symposium on Low Power Electronics and Design, pages 139–144, Aug 2011. [105] D. Rossi, V. Tenentes, S. Yang, S. Khursheed, and B. M. Al-Hashimi. Reliable power gating with nbti aging beneﬁts. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 24(8):2735–2744, Aug 2016. [106] Jin Sun, Roman L. Lysecky, Karthik Shankar, Avinash Karanth Kodi, Ahmed Louri, and Janet Roveda. Workload assignment considering NBTI degradation in multicore systems. JETC, 10(1):4:1–4:22, 2014. [107] Xiaoming Chen, Yu Wang, Yu Cao, Yuchun Ma, and Huazhong Yang. Variation-aware supply voltage assignment for minimizing circuit degradation and leakage. In Proceedings of the 2009 ACM/IEEE International Symposium on Low Power Electronics and Design, ISLPED ’09, pages 39–44, New York, NY, USA, 2009. ACM. [108] L. Zhang and R. P. Dick. Scheduled voltage scaling for increasing lifetime in the presence of nbti. In 2009 Asia and South Paciﬁc Design Automation Conference, pages 492–497, Jan 2009. [109] S. V. Kumar, C. H. Kim, and S. S. Sapatnekar. Adaptive techniques for overcoming performance degradation due to aging in cmos circuits. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 19(4):603–614, April 2011.

151 REFERENCES

[110] E. Mintarno, J. Skaf, R. Zheng, J. B. Velamala, Y. Cao, S. Boyd, R. W. Dutton, and S. Mitra. Self-tuning for maximized lifetime energy-eﬃciency in the presence of circuit aging. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(5):760–773, May 2011.

[111] S. Narang and A. P. Srivastava. Nbti detection methodology for building tolerance with respect to nbti eﬀects employing adaptive body bias. In 2015 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2015], pages 1–7, March 2015.

[112] R. Faraji and H. R. Naji. Adaptive technique for overcoming performance degradation due to aging on 6t sram cells. IEEE Transactions on Device and Materials Reliability, 14(4):1031–1040, Dec 2014.

[113] H. Mostafa, M. Anis, and M. Elmasry. Nbti and process variations compensation circuits using adaptive body bias. IEEE Transactions on Semiconductor Manufacturing, 25(3):460–467, Aug 2012.

[114] A. Gomez, L. Poehls, F. Vargas, and V. Champac. An early prediction methodology for aging sensor insertion to assure safe circuit operation due to nbti aging. In 2015 IEEE 33rd VLSI Test Symposium (VTS), pages 1–6, April 2015.

[115] J. Vazquez-Hernandez. Error prediction and detection methodologies for reliable circuit operation under nbti. In 2014 International Test Conference, pages 1–10, Oct 2014.

[116] Z. Amini-shehsdeh and A. Nabavi. A novel sensor for prediction of aging failure. In 2011 Third International Conference on Computational Intelligence, Modelling Simula- tion, pages 399–403, Sept 2011.

[117] Masanori Hashimoto. Run-time adaptive performance compensation using on-chip sensors. In Proceedings of the 16th Asia and South Paciﬁc Design Automation Conference, ASPDAC ’11, pages 285–290, Piscataway, NJ, USA, 2011. IEEE Press.

[118] Jan M. Rabaey. Digital Integrated Circuits: A Design Perspective. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1996.

[119] A. Devgan and C. Kashyap. Block-based static timing analysis with uncertainty. In ICCAD-2003. International Conference on Computer Aided Design (IEEE Cat. No.03CH37486), pages 607–614, Nov 2003.

[120] John Lee and Puneet Gupta. Discrete circuit optimization. Foundations and Trends in Electronic Design Automation, 6(1):1–120, 2012.

[121] Bhaghath P J and Ramesh S R. Article: A survey of ssta techniques with focus on accuracy and speed. International Journal of Computer Applications, 89(7):21–25, March 2014. Full text available.

[122] S. Jin, Y. Han, H. Li, and X. Li. Pˆ(2)clraf: An pre- and post-silicon cooperated circuit lifetime reliability analysis framework. In 2010 19th IEEE Asian Test Symposium, pages 117–120, Dec 2010.

152 REFERENCES

[123] E. Mintarno, V. Chandra, D. Pietromonaco, R. Aitken, and R. W. Dutton. Workload dependent nbti and pbti analysis for a sub-45nm commercial microprocessor. In 2013 IEEE International Reliability Physics Symposium (IRPS), pages 3A.1.1–3A.1.6, April 2013.

[124] Behzad Eghbalkhah, Mehdi Kamal, Hassan Afzali-Kusha, Ali Afzali-Kusha, Moham- mad Bagher Ghaznavi-Ghoushchi, and Massoud Pedram. Workload and temperature dependent evaluation of bti-induced lifetime degradation in digital circuits. Microelec- tronics Reliability, 55(8):1152 – 1162, 2015.

[125] A. Kerber. Methodology for determination of process induced bti variability in mg/hk cmos technologies using a novel matrix test structure. IEEE Electron Device Letters, 35(3):294–296, March 2014.

[126] A. Papoulis and S. U. Pillai. Probability, Random Variables, and Stochastic Processes. McGraw-Hill Higher Education, 4 edition, 2002.

[127] S. Mitra. Circuit failure prediction for robust system design. In 2008 IEEE International Integrated Reliability Workshop Final Report, pages 1–51, Oct 2008.

[128] Luming Yan, Huaguo Liang, Zhengfeng Huang, and Yanbin Liu. A fault detection sensor for circuit aging using double-edge-triggered ﬂip-ﬂop. Journal of Electronics (China), 30(1):97–103, Feb 2013.

[129] M. Omaa, D. Rossi, N. Bosio, and C. Metra. Self-checking monitor for nbti due degradation. In 2010 IEEE 16th International Mixed-Signals, Sensors and Systems Test Workshop (IMS3TW), pages 1–6, June 2010.

[130] J. Park and J. A. Abraham. An aging-aware ﬂip-ﬂop design based on accurate, run-time failure prediction. In 2012 IEEE 30th VLSI Test Symposium (VTS), pages 294–299, April 2012.

[131] J. C. Vazquez, V. Champac, A. M. Ziesemer, R. Reis, I. C. Teixeira, M. B. Santos, and J. P. Teixeira. Delay sensing for long-term variations and defects monitoring in safety– critical applications. Analog Integrated Circuits and Signal Processing, 70(2):249–263, Feb 2012.

[132] B. Zandian, W. Dweik, Suk Hun Kang, T. Punihaole, and M. Annavaram. Wearmon: Re- liability monitoring using adaptive critical path testing. In 2010 IEEE/IFIP International Conference on Dependable Systems Networks (DSN), pages 151–160, June 2010.

[133] S. Kostin, J. Raik, R. Ubar, M. Jenihhin, F. Vargas, L. M. B. Poehls, and T. S. Copetti. Hierarchical identiﬁcation of nbti-critical gates in nanoscale logic. In 2014 15th Latin American Test Workshop - LATW, pages 1–6, March 2014.

[134] Jifeng Chen, Shuo Wang, and Mohammad Tehranipoor. Eﬃcient selection and analysis of critical-reliability paths and gates. In Proceedings of the Great Lakes Symposium on VLSI, GLSVLSI ’12, pages 45–50, New York, NY, USA, 2012. ACM.

153 REFERENCES

[135] Song Jin, Yinhe Han, Huawei Li, and Xiaowei Li. Statistical lifetime reliability optimization considering joint eﬀect of process variation and aging. Integration, the {VLSI} Journal, 44(3):185 – 191, 2011.

[136] Dominik Lorenz, Martin Barke, and Ulf Schlichtmann. Monitoring of aging in integrated circuits by identifying possible critical paths. Microelectronics Reliability, 54(67):1075 – 1082, 2014.

[137] Khaled R Heloue and Farid N Najm. Early statistical timing analysis with unknown within-die correlations. In IEEE TAU Workshop, Austin, 2007.

[138] M. Orshansky and A. Bandyopadhyay. Fast statistical timing analysis handling arbitrary delay correlations. In Proceedings. 41st Design Automation Conference, 2004., pages 337– 342, July 2004.

[139] Michael Orshansky and Wei-Shen Wang. Statistical analysis of circuit timing using ma- jorization. Commun. ACM, 52(8):95–100, August 2009.

[140] S. Bian, M. Shintani, S. Morita, H. Awano, M. Hiromoto, and T. Sato. Workload-aware worst path analysis of processor-scale nbti degradation. In 2016 International Great Lakes Symposium on VLSI (GLSVLSI), pages 203–208, May 2016.

[141] S. Duan, B. Halak, and M. Zwolinski. An ageing-aware digital synthesis approach. In 2017 14th International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD), pages 1–4, June 2017.

[142] Shengqi Yang, Wenping Wang, Mark Hagan, Wei Zhang, Pallav Gupta, and Yu Cao. Nbti- aware circuit node criticality computation. J. Emerg. Technol. Comput. Syst., 9(3):23:1– 23:19, October 2013.

[143] Song Jin and Yi Ran Huang. Analysis and evaluation on nbti-induced circuit aging. In Applied Science, Materials Science and Information Technologies in Industry, volume 513 of Applied Mechanics and Materials, pages 3976–3982. Trans Tech Publications, 5 2014.

[144] R.L. Burden and J.D. Faires. Numerical Analysis. Cengage Learning, 2010.

[145] A. F. Gomez and V. Champac. Early selection of critical paths for reliable nbti aging- delay monitoring. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 24(7):2438–2448, July 2016.

[146] J. Vazquez-Hernandez. Error prediction and detection methodologies for reliable circuit operation under nbti. In 2014 International Test Conference, pages 1–10, Oct 2014.

[147] B. C. Paul, Kunhyuk Kang, H. Kuﬂuoglu, M. A. Alam, and K. Roy. Impact of nbti on the temporal performance degradation of digital circuits. IEEE Electron Device Letters, 26(8):560–562, Aug 2005.

[148] M. Yabuuchi and K. Kobayashi. Circuit characteristic analysis considering nbti and pbti- induced delay degradation. In 2012 IEEE International Meeting for Future of Electron Devices, Kansai, pages 1–2, May 2012.

154 REFERENCES

[149] Ing-Chao Lin, Shun-Ming Syu, and Tsung-Yi Ho. Nbti tolerance and leakage reduction using gate sizing. J. Emerg. Technol. Comput. Syst., 11(1):4:1–4:12, October 2014.

155 REFERENCES

156