THERMAL TESTING OF INTEGRATED CIRCUITS

by Josep Altet, Universitat Politecnica de Catalunya, and Antonio Rubio, Universitat Politecnica de Catalunya

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.

A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 978-1-4419-5287-5 ISBN 978-1-4757-3635-9 (eBook) DOI 10.1007/978-1-4757-3635-9

Printed on acid-free paper

All Rights Reserved
© 2002 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 2002
Softcover reprint of the hardcover 1st edition 2002
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Contents

ACKNOWLEDGEMENTS

PREFACE

1. INTRODUCTION TO THE TESTING OF INTEGRATED CIRCUITS
    1. INTRODUCTION
    2. NATURE AND MODELLING OF REALISTIC DEFECTS
        2.1 Classification of defects
        2.2 Realistic defect models
            2.2.1 Defect model for a GOS
            2.2.2 Bridging defect models
            2.2.3 Open defect models
        2.3 Defective behaviours at the electrical level
            2.3.1 Gate oxide short defects
            2.3.2 Bridging defect
            2.3.3 Open defects
            2.3.4 Conclusions
    3. FAULT MODELS AND CONVENTIONAL TESTING STRATEGIES
        3.1 Fault models
        3.2 Conventional test strategies
    4. PRACTICAL ASPECTS OF TESTING INTEGRATED CIRCUITS
        4.1 Test pattern generation
        4.2 Design for testability and test standards
        4.3 Built-in self-testing
        4.4 The cost of testing
    5. FUTURE PERSPECTIVES OF CONVENTIONAL TEST STRATEGIES
        5.1 IDDQ testing
        5.2 Evolution of semiconductor technology and role of IDDQ testing in deep submicron circuits
    6. CONCLUSIONS AND SCOPE OF THIS BOOK
    7. REFERENCES

2. THERMAL TRANSFER AND THERMAL COUPLING IN IC'S
    1. INTRODUCTION: HEAT TRANSFER AND ITS RELATION TO THERMODYNAMICS
    2. MECHANISMS OF HEAT TRANSFER
        2.1 The conduction mechanism
            2.1.1 Thermal resistance
            2.1.2 Contact resistance
        2.2 The convection mechanism
            2.2.1 Natural convection
            2.2.2 Forced convection
        2.3 The radiation mechanism
    3. ENERGY BALANCE IN A MEDIUM: HEAT TRANSFER EQUATION
    4. THERMAL ELEMENTS IN IC'S
        4.1 Heat sources
            4.1.1 Passive components
            4.1.2 Active devices
            4.1.3 Power dissipation due to switching activity
            4.1.4 Peltier Effect
        4.2 IC structure: materials and transfer
    5. EFFECTS OF HEAT TRANSFER IN IC'S
        5.1 Temperature sensitivity of electronic devices
            5.1.1 Temperature effects in MOS
            5.1.2 P-n junction
            5.1.3 BJT devices
        5.2 Ageing mechanisms and circuit degradation
    6. CONCLUSIONS
    7. APPENDIX: UNITS AND CONVERSION FACTORS
    8. REFERENCES

3. THERMAL ANALYSIS IN INTEGRATED CIRCUITS
    1. INTRODUCTION
    2. DEFINITIONS
        2.1 Thermal analysis versus electro-thermal analysis
        2.2 Boundary conditions
            2.2.1 Example 1: Application of boundary conditions for an IC analysis
    3. THERMAL ANALYSIS OF INTEGRATED CIRCUITS
        3.1 Analytical methods
            3.1.1 Example 2: Presentation of the method. Calculation of a static two-dimensional temperature map
            3.1.2 Example 3: Calculation of a three-dimensional time dependent temperature map
            3.1.3 Example 4: Thermal analysis in cylindrical coordinates
            3.1.4 Example 5: AC thermal analysis
            3.1.5 Example 6: Analysis of multi-layer structures
        3.2 Numerical methods
            3.2.1 Finite difference method
                3.2.1.1 Nodal equation extraction
                3.2.1.2 RC modelling of heat transfer
                3.2.1.3 Reduction of the complexity in thermal analysis of IC's
    4. ELECTRO-THERMAL ANALYSIS OF INTEGRATED CIRCUITS
            4.1.1 Example 7: Dynamic electro-thermal procedure
    5. CONCLUSIONS AND SUMMARY
    6. REFERENCES

4. TEMPERATURE AS A TEST OBSERVABLE VARIABLE IN ICS
    1. INTRODUCTION
    2. MODIFICATION OF THE THERMAL PATH BETWEEN THE HEAT SOURCES AND THE HEAT SINK
        2.1 Example 1: thermal testing of the quality of solder joints
        2.2 Example 2: thermal testing of the quality of packages
    3. MODIFICATION OF THE HEAT SOURCES PRESENT IN THE IC
        3.1 Identification of defects as heat sources
            3.1.1 Example 1: Power dissipated in different bridge topologies
            3.1.2 Example 2: Effects of device scaling and degraded logic levels
            3.1.3 Example 3: Power dissipated in CMOS combinational circuits with a GOS defect
            3.1.4 Conclusions
        3.2 Thermal disturbances generated by heat sources
            3.2.1 Dynamic thermal characterisation
            3.2.2 Static thermal characterisation
        3.3 Location of the heat source
            3.3.1 Amplitude measurements
            3.3.2 Phase measurements
            3.3.3 Rise time and delay measurements
    4. SUMMARY
    5. REFERENCES

5. THERMAL MONITORING OF IC'S
    1. INTRODUCTION
    2. OPTICAL METHODS
        2.1 Contact methods
            2.1.1 Liquid crystal thermography
                2.1.1.1 Principle of operation
                2.1.1.2 Technique performance
            2.1.2 Fluorescent microthermography
                2.1.2.1 Principle of operation
                2.1.2.2 Technique performance
        2.2 Non-contact methods
            2.2.1 Infrared emission thermography
                2.2.1.1 Principle of operation
                2.2.1.2 Technique performance
            2.2.2 Thermoreflectometers
            2.2.3 Interferometers
    3. MECHANICAL METHODS
    4. BUILT-IN TEMPERATURE SENSORS
        4.1 Absolute temperature sensors
        4.2 Differential temperature sensors
    5. CONCLUSIONS
    6. REFERENCES

6. FEASIBILITY ANALYSIS AND CONCLUSIONS
    1. INTRODUCTION
    2. FEASIBILITY ASPECTS OF THE THERMAL TESTING OF CIRCUITS
        2.1 Cost estimation
        2.2 Discriminability analysis
            2.2.1 Heat sources in fault-free circuits: generation of thermal disturbances
            2.2.2 Discriminability
            2.2.3 Strategies to improve the feasibility of thermal testing
            2.2.4 Generation of test vectors
    3. GENERAL CONCLUSIONS
    4. REFERENCES

INDEX

Acknowledgements

The authors would like to thank the researchers referenced throughout the book for their valuable previous work. We are especially grateful to Professors Wilfrid Claeys, Stefan Dilhaire, Stephane Grauby and all the research team of the "Centre de Physique Moleculaire Optique et Hertzienne" from the Universite Bordeaux I, France; Sebastian Volz from the Laboratoire d'Etudes Thermiques, Ecole Nationale Superieur de Mechanique et d'Aerotechnique, France; Jean Christophe Batsale, from the Laboratoire d'Energetique et Phenomenes de Transfert - Universite Bordeaux I, France; Hideo Tamamoto from the Department of Information Engineering, Akita University, Japan; Joan Figueras and Rosa Rodriguez, from the Electronic Engineering Department, Universitat Politecnica de Catalunya, Spain; Jaume Segura from the Physics Department, Universitat de les Illes Balears, Spain; Victor Champac from the INAOE, Mexico; and Andre Ivanov, from the Electrical and Computer Engineering Department, The University of British Columbia, Canada; with whom we have worked closely in this field in recent years. We are also grateful to Prof. P.E. Bagnoli, C. Casarosa, M. Ciampi, E. Dallago, V. Szekely, M. Rencz, A. Poppe and B. Courtois for providing figures from their research work.

Preface

Integrated circuits (IC's) have undergone a significant evolution in terms of complexity and performance as a result of the substantial advances made in manufacturing technology. Circuits, in their various mixed formats, can be made up of tens or even hundreds of millions of devices. They work at extremely low voltages and at very high frequencies. Testing of circuits has become an essential process in IC manufacturing, in the effort to ensure that the manufactured components have the appropriate levels of quality. Along with the ongoing trend towards more advanced technology and circuit features, major testing challenges are continuously emerging. The use of procedures that can test both the analogue and digital sections of such complex circuits without interfering with their nominal operation is clearly a critical part of today's technological industries.

Chapter 1 presents the general purposes and basic concepts related to the testing of integrated circuits, discussing the various strategies and their limitations. Readers who are already familiar with the field may opt to skip this chapter.

This book offers a multidisciplinary focus on thermal testing. This is a testing method which is not only suitable for use in combination with other existing techniques, but is also backed by a wealth of knowledge and offers exciting opportunities in the form of as yet unexplored areas of research and innovation for industrial applications. In short, thermal testing is that general category of testing procedures in which the observable magnitude is the temperature of a part or the whole of the system. The technique can be applied either to the packaging of the components, or directly to the components themselves. This book will also deal with the testing of packaging and silicon dies.


In order to achieve a thorough understanding of thermal testing, a knowledge of thermodynamics, specifically of heat propagation mechanisms, and diffusion and heat balance equations, will be necessary. Electrical engineers may refer to Chapter 2 for any necessary background information on thermodynamics, as it consists of an introduction to such basic concepts as transfer mechanisms, contact resistance, radiation, heat balance, heat sources, the sensitivity of electronic devices to temperature, the behaviour of materials and the ageing effects of temperature on the reliability of the components. Chapter 3 features an analysis of thermal propagation, and static and transient cases, introducing analytical and numerical solution techniques that are applied to a set of easy-to-understand exercises. The application of temperature as a test-observable magnitude is presented in Chapter 4 for both packaging and silicon die. In this chapter, the concepts of thermal map deviation, the effect of different types of failures on temperature measurements and the diagnostic capabilities of the technique are discussed. Chapter 5 is an introduction to the various instrumental techniques that may be used to measure the temperature of the surface of the silicon die. Optical, mechanical and built-in sensor methods are compared and contrasted, providing specific details of their principles of operation and the resolution and dynamic characteristics for each technique. Finally, Chapter 6 focuses on the evaluation of the feasibility of thermal testing in VLSI circuits with built-in thermal sensors. We would like to thank the researchers from the fields of physics, mathematics and electronics whose efforts have made this technique possible. A special recognition must be made of all the researchers who have gone before us in the fields of thermal measurement and testing, and have created this new thermal testing method, which has such great potential.

Josep Altet
Antonio Rubio

Chapter 1

Introduction to the testing of integrated circuits

1. INTRODUCTION

Today's electronic technology is based on the design and manufacture of integrated circuits. The concept of the integrated circuit comes from the work of 2000 Nobel Prize winner J.S. Kilby [1]. Its origins can be dated to February 1959. In his patent declaration, Kilby defines the integrated circuit with the following statement: "this invention relates to miniature electronic circuits, and more particularly to unique integrated electronic circuits fabricated from semiconductor material". Following this idea, in modern VLSI technology, integrated circuits are complex mixed-mode circuits (with tens or hundreds of millions of devices) placed together in a single silicon semiconductor crystal. An aggregation of devices, generally Metal Oxide Semiconductor (MOS) [2] devices, composes the analogue and digital sections of CMOS systems [3]. From its invention in 1959 until today, and presumably at least until the end of the first decade of the 21st century, Kilby's concept has been and will continue to be the basis of the modern electronic devices and systems industry. The major advance in this technology has been the continued reduction of size, or scaling-down, of the devices throughout this time. MOS devices have been demonstrated with a channel length of less than 50 nm [4]. The continuous trend towards miniaturisation has allowed the integration of increasingly complex circuits and systems, reaching nearly one hundred million devices in the year 2002. This evolution was stated by G. Moore in 1971 in the declaration known as Moore's Law [5]: in integrated circuit technology the number of devices integrated into a single circuit doubles every 18 months. This law has held accurately throughout the lifetime of integrated circuit technology [6].

The integrated circuit industry is based on a sophisticated manufacturing process. The entire set of devices configuring the circuit is located on the surface of a semiconductor crystal. This technique, called the planar process [7], was introduced in 1959 and has been maintained until today. In planar technology, circuits and devices are described and implemented on a particular surface organised from the interaction of selective P and N-type semiconductor regions (layout) and the deposition of selected regions of insulators (SiO2), polysilicon and metal connections. The metal (aluminium and copper) interconnection system is very complex in modern circuits, being structured as a stacked system with more than 6 independent metallic interconnection levels. The manufacturing process of integrated circuits is based on photolithography [8]. Each sequential sub-process is defined by a sub-set of the whole layout, forming what is called a mask. The manufacturing process can be defined as a sequence of physical and chemical reactions, each of which is defined by a mask and applied following the photolithographic technique. The photolithographic technique is based on the application of a light source of appropriate wavelength through a sophisticated optical system and the masks onto the surface of the circuit, previously coated with a radiation-sensitive substance. Process deviations and manufacturing defects are inherent to this complex and sophisticated manufacturing technique because of the deviation of the process parameters and the natural fluctuations of the reactions. These are caused by the discrete number of elements configuring the system, an effect that is especially important in deep submicron technologies; therefore a quality verification process (the testing process) is required.
As a consequence of the high levels of quality demanded by the semiconductor industry, the testing of manufactured circuits has become an essential part of the whole design-manufacturing-service system. Since the manufacturing defect sources exhibit random behaviour (due to fluctuations of the process as well as the random location of point defects) the test procedure has to be applied to each of the manufactured devices, making testing a significant part of the cost of integrated circuits. The test procedures are based on both the function of the circuit and the process defect behaviour and must satisfy the accuracy/time of application trade-off. Due to the complexity of the problem and the impact on the quality and cost of the products, test technology has emerged as a fundamental pillar of electronic technology and the electronics industry. In Section 2 of this chapter, the typology and modelling of the more significant manufacturing defects are reviewed, and the concepts of fault and fault model are introduced. Major test concepts are reviewed in Section 4. Section 5 discusses the status and future perspectives and challenges of test technology. Finally, Section 6 summarises the chapter and introduces and justifies the scope of this book.

2. NATURE AND MODELLING OF REALISTIC DEFECTS

There are two reasons why a manufactured circuit can fail in the objective or service it was designed and manufactured for: manufacturing defects and performance degradation due to process deviations. Although the latter type of failure is taking on more relevance in modern deep submicron technologies, in this book we will focus our attention on the testing of manufacturing defects. Manufacturing defects (also known as catastrophic defects) are losses of structural integrity of the circuit due to the fabrication process or ageing. In this section the main defect sources and the modelling and behaviour of defective parts are presented.

2.1 Classification of defects

From the point of view of a whole integrated circuit or system, defects can be classified as extrinsic or intrinsic. Extrinsic defects cover failures affecting the casing of the semiconductor. These defects affect the electric bonding connections and the heat dissipation path. Intrinsic defects affect the silicon piece itself. As regards permanent catastrophic defects, intrinsic defects can be classified through the typology of their cause. Defects may appear in both the manufacturing (manufacturing defects) and utilisation (physical failures) phases, but the manifestation of the defects can be put together under the category of realistic defects. The most important type of manufacturing defects can be attributed to the photolithographic process. Damaged masks or dust particles in the clean room may produce alterations in the respective regions of the circuit. They affect a specific layer (diffusions, thin oxide, polysilicon, vias and contacts, metal lines) causing a lack (usually opens) or addition (usually shorts) of material. The following types of defects appear as a result of these causes:

• Gate oxide shorts (GOS). This defect [9] is produced by imperfections in the gate-oxide-substrate region of the MOS device, one of the most critical regions of the circuit due to the thinness of the gate oxide. The effect, an undesired connection between gate and substrate (see Figure 1.1), can be caused by an imperfection in the substrate surface, a defect (hole) in the oxide or an extension in the polysilicon gate entering the oxide region. The result is a violation of the insulated nature of the MOS gate. The extreme thinness of the oxide on the transistor channel makes this a frequent defect in modern circuits [10].

• Bridge or short-circuit. This defect causes the undesired connection (with a given connecting or bridge resistance [11], see Figure 1.2) of adjacent lines [12]. It is caused by the appearance of hot spots in the photolithography and especially affects the interconnecting lines. Complex interconnecting systems make this a frequent defect.

• Open gate and drain defects. Complementary to the previous type, opens in lines may also produce failures. Opens are unintentional electrical discontinuities. Their analysis and classification depend on the type of line opened. The line can be one of the source-drain branches in the CMOS circuit. This defect, called an open drain, causes a permanent lack of conduction in the branch, and is usually considered as a full open circuit. The case is more complex when the open affects a gate line, a defect called an open gate (see Figure 1.3). In this case a voltage can appear in the transistor gate because of the capacitive behaviour of the gate, causing an undesired path of current [13].

Figure 1.1: A GOS failure causes contact between the gate and the channel due to a break in the oxide. On the right, a photograph of an MOS device with a GOS is shown [14].

Figure 1.2: Short-circuit between lines A and B of the interconnection layer caused by a point defect.

Figure 1.3: Photograph and diagram of an open gate failure [14].

2.2 Realistic defect models

Realistic defects such as those introduced in the above section exhibit behaviours which are considerably more complex than simple shorts and opens. The effect of these defects must be analysed through the use of defect models that consider both the characteristics of the device affected by the failure and a set of parameters that model the characteristics of the defect itself. All these models work in the electrical domain.

2.2.1 Defect model for a GOS

Gate oxide short (GOS) failures appear frequently in CMOS technology [11]. In these failures an undesired path of current through the oxide of the gate appears in MOS transistors, thus causing the defective device to manifest a violation of the gate insulation principle of MOS technology. GOS defects are connections between the gate electrode and the channel or substrate through the SiO2 oxide of the device (oxide breakdown). Figure 1.4 shows the section of a gate oxide short defective n-channel transistor. A one-dimensional circuit level defect model was presented in [15]. The defective MOS transistor is modelled (see Figure 1.4) by three components: a contact barrier B and the two smaller transistors into which the GOS defect splits the original transistor. The barrier can be defined by a series resistance that models the GOS contact resistance and the potential barrier characteristics which, depending on the contact materials, can be rectifying or ohmic [15]. The position of the GOS defect is another parameter of the model, which indicates the place where the defect is located and the size of the two transistors into which the device is split. The position of the defect can be characterised by the factor k = x/L, where x is the position of the defect with respect to the source and L the length of the channel. Figure 1.4 shows the ID/VDS characteristics of a defective MOS. The curves are displaced along the vertical axis by a current that depends on the gate voltage, following the previous model. Analysis of the curves leads to the conclusion that, together with the gate current violation, a distortion in the behaviour of the MOS is manifested.

Figure 1.4: GOS defect model for an n-channel MOS and ID/VDS characteristics of a defective device for different values of VG [14].

2.2.2 Bridging defect models

A bridging defect appears when two or more lines are abnormally connected within an integrated circuit. Normally, the bridging defect cannot be modelled by the stuck-at model approach, since a node bridged to another logic node is usually not permanently stuck at one value. The model for a metallic short is simply given by a resistor (RB) connecting the bridged lines. Typically, the resistance value is not zero, but presents values from tens to hundreds of ohms [12]. At any rate, this level of resistance is low in comparison with the on resistance of MOS devices.

2.2.3 Open defect models

The open drain defects are clearly identified by the inability of the respective branch to conduct. These defects are easily modelled by the elimination of the defective device. The case of a gate open defect is very different. In the case of an open in the gate line the MOS gate electrode floats, acquiring a voltage that depends on the parasitic capacitances affecting the node. The failure shown in Figure 1.3 can be modelled at an electrical level as indicated in Figure 1.5 a). The floating gate voltage (VFG) will follow the capacitive voltage divider given by the four parasitic capacitances: gate-source (Cgso), gate-drain (Cgdo), polysilicon-bulk (Cpb) and polysilicon-adjacent metal line (Cmp). Figure 1.5 b) shows the gate voltage levels for different poly-bulk and metal-poly capacitances (voltage in the metal line VM = 5 volts).
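As a rough illustration of the capacitive divider described above, the following minimal sketch estimates the floating gate voltage by charge conservation over four ideal, linear capacitors. It is a first-order approximation only: it ignores the bulk, inversion and oxide charge terms of the more accurate expression in [15], and the function names, default node voltages and capacitance values are assumptions chosen for illustration, not data from the book.

```python
def floating_gate_voltage(c_gs, c_gd, c_pb, c_mp,
                          v_s=0.0, v_d=0.0, v_b=0.0, v_m=5.0):
    """First-order floating gate voltage from an ideal capacitive divider.

    Capacitances may be in any consistent unit (e.g. fF); voltages in volts.
    Assumes linear capacitors and no net charge stored on the floating node.
    """
    c_total = c_gs + c_gd + c_pb + c_mp
    # Each neighbouring node contributes in proportion to its coupling capacitance.
    return (c_gs * v_s + c_gd * v_d + c_pb * v_b + c_mp * v_m) / c_total


def conduction_region(v_fg, v_th=0.9):
    """Classify the floating-gate transistor against a threshold voltage.

    The 0.9 V default matches the threshold quoted for Figure 1.5(b).
    """
    return "over-threshold" if v_fg >= v_th else "sub-threshold"


# Hypothetical capacitances in fF; metal line at 5 V, other nodes grounded.
v_fg = floating_gate_voltage(c_gs=0.5, c_gd=0.5, c_pb=2.0, c_mp=1.0, v_m=5.0)
print(f"V_FG = {v_fg:.2f} V ({conduction_region(v_fg)})")
```

With these illustrative values the metal-poly coupling alone lifts the gate above the threshold, which is the situation the text identifies as over-threshold conduction.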

Figure 1.5: Electrical model for an open gate defect (a) and gate voltage levels in the device versus the capacitances related to the polysilicon line (b) [13].

Observe that the gate may exhibit voltages that are comparable to or higher than the threshold voltage (0.9 volts in the technology of the case presented in Figure 1.5 (b)). Thus, in spite of its floating gate, the transistor may enter over-threshold conduction mode in region 1 (Figure 1.5) or sub-threshold conduction in region 2. An accurate expression for the gate voltage VG is shown in [15]:

(1.1)

where QB is the bulk depleted charge, QI the channel inversion charge and Q0 the effective interface charge in the oxide.

2.3 Defective behaviours at the electrical level

Physical failure may cause undesired behaviours at both the electrical and logical levels in digital devices. Since complementary CMOS technology circuits present very low leakage current for logical quiescent levels, the presence of a realistic defect causes a drain of current detectable by IDD current observation.

2.3.1 Gate oxide short defects

The electrical behaviour of a GOS defect is caused by the undesired injection of current to the channel because of the oxide break, as mentioned above. The change in the characteristic curves (Figure 1.4) causes a change in the static characteristic of a CMOS inverter. Figure 1.6 a) shows the ID/VIN characteristic of an inverter circuit with three different locations of the defect (k = 0.9, 0.5 and 0.1), where ID is the drain current and VIN the voltage at the input of the inverter. Observe the high values of current present in the defective inverter, especially when compared with the ID levels of a few nanoamps expected in a fault-free device.

Figure 1.6: (a) Static ID/VIN characteristic for an inverter with a GOS defect in the n-MOS transistor [14] for three different locations of the GOS defect. (b) Electrical equivalent circuit.

2.3.2 Bridging defect

A bridge in a logic gate may produce two different types of effects: a change in the function performed by the bridged gates and a change (degradation) in the electrical levels of the logic. The change in the logical electric levels of an inverter with its output bridged to VDD is shown in Figure 1.7 for different bridge resistances. For values lower than 5 kΩ, the behaviour of the inverter corresponds to a stuck-at-1 fault. For higher ohmic values the electrical levels show distortion. The last curve corresponds to a fault-free inverter.

Figure 1.7: (a) Static characteristics of an inverter with a bridge connecting the out node with VDD for different bridging resistances [14]. (b) Equivalent electrical circuit.
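The degradation of the output level can be illustrated with a very simple static model. The sketch below assumes, purely for illustration, that the conducting n-MOS transistor behaves as a linear resistor r_on while the input is high, so that the bridge to VDD and the transistor form a resistive divider. The function name and the numerical values are hypothetical; a real device is non-linear and would need a circuit simulator.

```python
def bridged_inverter_low_state(v_dd, r_bridge, r_on):
    """Static output level and dissipation of an inverter whose output is
    bridged to VDD through r_bridge while the n-MOS pulls down through r_on.

    Assumes the n-MOS acts as a linear resistor (a crude approximation).
    Returns (output voltage in V, supply current in A, dissipated power in W).
    """
    current = v_dd / (r_bridge + r_on)      # current through bridge and n-MOS in series
    v_out = v_dd - current * r_bridge       # voltage left at the output node
    power = v_dd * current                  # extra static power drawn from the supply
    return v_out, current, power


# Hypothetical values: 5 V supply, 5 kOhm on-resistance, several bridge resistances.
for r_b in (100.0, 5e3, 50e3):
    v_out, i_dd, p = bridged_inverter_low_state(v_dd=5.0, r_bridge=r_b, r_on=5e3)
    print(f"R_B = {r_b:8.0f} Ohm: V_out = {v_out:.2f} V, "
          f"I_DD = {i_dd * 1e3:.3f} mA, P = {p * 1e3:.3f} mW")
```

Under this assumption a small bridge resistance pins the output near VDD (the stuck-at-1 behaviour), while larger resistances leave an intermediate, degraded level together with a static current that does not exist in the fault-free inverter.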

2.3.3 Open defects

Open drains are always a cause of current reduction. In complementary logic the inability of the defective branch to conduct may cause (for specific input patterns) the output node to float, introducing a memory element; this type of defect is the origin of the name stuck-open faults [20]. However, in analogue circuits and non-complementary types of CMOS logic circuits, the later stages may draw an undesired current because of the potential intermediate voltage of the output node. This is also the case, even for complementary CMOS logic, of an open gate defect; because of the floating voltage of the gate, the defective stage (and potentially the following ones) causes an excess of ID current. In Figure 1.8 the levels of current caused by the gate voltage (Figure 1.5 (b)) are shown. Again, the current levels are higher than those corresponding to a fault-free case.

2.3.4 Conclusions

Failures caused in modern integrated circuits by manufacturing and ageing defects exhibit significant deviations in their electrical characteristics. For the three types of defects, abnormal current levels, degradation of electrical logic levels and potential loss of the logic function are the failure effects that a fault may cause. In order to analyse the faults, electrical level models are used for the three types of defects. The voltage level alterations depend on the topology and size of the circuits, causing a loss in the logic testing coverage. The excess current of the defective stages can be magnified by the current conduction of subsequent logic stages, due to the electrical level distortion at the output node. Consequently, testing integrated circuits by evaluating their power supply current consumption (dynamic and especially static or quiescent) is highly effective at detecting such realistic defects. In all of the cases in which devices leak an undesired current there is also extra power dissipation, and consequently an unexpected heating source in the IC structure.

3. FAULT MODELS AND CONVENTIONAL TESTING STRATEGIES

3.1 Fault models

Manufacturing defects and physical failures are collectively referred to as physical faults [16][17]. These faults may produce an error [16] in the expected service of the circuit to the global systems. The objective of test technology is the effective screening of manufactured circuits to detect the presence of faults. Fault models are abstract descriptions of the effect of a fault on a defective circuit [18]. Fault models are useful to generate appropriate testing techniques. They can be defined at the different levels of circuit description. The defect models introduced in Section 2.3 can be considered as physical and electrical fault models. Consumption alteration models of the circuit based on defects have also been considered (parametric faults). Among them, the quiescent current fault model, dynamic current fault models, power dissipation fault model and heat generation fault models can be mentioned. For digital circuits fault models can be defined at a logic level. This set of fault models is called logic fault models [17]. The simplest logic fault model is applicable to a single logic node (able to take two logic levels, the '1' level and the '0' level). A circuit is said to have a stuck-at-1 fault [19] in node i if this node is fixed to the logic '1' level (unable to attain the '0' level) because of a fault. The same can be applied for a stuck-at-0 fault. When a fault generally affects a digital circuit it is referred to as a logic fault. Open drains may affect the logic of the circuit and may also introduce memory states. These types of faults are referred to as stuck-open faults [20]. It is possible to define a functional fault for medium and complex digital systems.

Figure 1.8: Current levels (IDD) in a CMOS inverter circuit caused by an open defect in its n-MOS transistor, for different values of the Cmp and Cpb capacitances [13].

Consequently, functional fault models [21] can be derived. Further, for digital circuits, it is possible to define the effect of a defect through the change in the temporal response of the digital circuit (propagation time of the circuit), generating the concept of delay faults [22] and delay fault models. The type of defect caused by the parasitic coupling [23] of adjacent lines is also dynamic. These faults, called crosstalk faults [25], are very important in memory systems [24] and also in high speed digital systems built in modern technologies. Due to the vast array of different behaviours, parametric faults [26] and functional faults are considered in analogue circuits.
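The stuck-at model lends itself to a small worked example. The sketch below is an illustrative fault simulation for a hypothetical three-input circuit, y = (a AND b) OR c, with one internal node named n1; the circuit, the node names and the helper functions are assumptions made for this example and do not come from the book. It lists the input vectors whose output differs between the fault-free and the faulty circuit, that is, the vectors that detect a given stuck-at fault.

```python
from itertools import product


def circuit(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c, optionally with one stuck-at fault.

    fault is None (fault-free) or a tuple (node_name, stuck_value),
    where node_name is one of "a", "b", "c", "n1", "y".
    """
    def force(name, value):
        # A stuck-at fault overrides the value the node would normally take.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)
    return force("y", n1 | c)


def detecting_vectors(fault):
    """Return every input vector whose output exposes the given fault."""
    return [v for v in product((0, 1), repeat=3)
            if circuit(*v) != circuit(*v, fault=fault)]


print("n1 stuck-at-0 detected by:", detecting_vectors(("n1", 0)))
print("n1 stuck-at-1 detected by:", detecting_vectors(("n1", 1)))
```

In this small case only the vector (1, 1, 0) detects the stuck-at-0 fault on n1: the inputs must both activate the fault (a AND b = 1) and propagate it to the output (c = 0). Exhaustive enumeration is used here only because the circuit is tiny; it does not scale to realistic designs.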

3.2 Conventional test strategies

A test procedure is applied to each fabricated circuit because of the nature of manufacturing defects. In the semiconductor industry this is done using sophisticated Automatic Test Equipment (ATE). The ATE applies test vectors (a sequential set of input patterns) to the Circuit Under Test (CUT), following the arrangement shown in Figure 1.9. Conventional testing machines are able to check the circuits with logic, functional, delay and/or current testing procedures. One parameter of the test application procedure is the speed or frequency at which the vectors are presented to the CUT inputs. Logic and functional test vectors are applied at nominal speed. This allows a large number of vectors to be applied in a short period of time (application time impacts the overall cost of the circuit) and is also appropriate for detecting delay and coupling defects. The construction of an ATE is very sophisticated because of the high performance required of the transmission lines used to apply the input vectors and read the output vectors.

Figure 1.9: An ATE system applies the test procedure to the circuits under test.

4. PRACTICAL ASPECTS OF TESTING INTEGRATED CIRCUITS

It is possible to derive test procedures for both the manufacturing and application phases from knowledge of the failure sources. Typically, in the factory these test procedures consist of a large set of input vectors (test vectors) applied to the CUT. The ATE system reads the outputs of the CUT in the form of logic values, electrical levels and propagation delay times. The ATE system compares the values read from the CUT with those corresponding to a fault-free circuit, generating a detailed report and a faulty/non-faulty flag that is used to accept or reject the circuit. Although it is not conventional, certain ATE machines (oriented to current testing) can read the value of the IDD current. This method, called IDDQ testing [27], is useful for a complementary and efficient screening of defective devices. For a given circuit, a test vector set can be characterised by two parameters: its length (number of test vectors) and its coverage (proportion of faults detected relative to a reference set). In order to reduce length and maximise coverage, Design for Testability (DFT) rules [28][29] are required at the design level. Complex and highly reliable circuits include part of the testing circuitry (built-in testing) [30] in order to simplify testing and extend the testing procedure to the application field.

4.1 Test pattern generation

The objective of test pattern generation (TPG) is to identify the minimum set of test vectors able to achieve a given fault coverage (theoretically 100%). The effort required depends on the level of the fault model used as a target, and the generation cost is very sensitive to the complexity of the circuit under test. Automatic Test Pattern Generation (ATPG) procedures have been researched in an effort to attain high productivity in test generation. The most investigated cases, with the highest level of efficiency, have been the combinational sections of digital circuits with single stuck-at targets [31][32][33], the extension to sequential systems [36] and the March test procedures for RAM memories [34][35]. Two other domains of great interest are the ATPG techniques for delay tests [37][38] and IDDQ testing [39][40]. There are also ATPG tools applicable in VLSI DFT frameworks.
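The relation between test length and fault coverage can be illustrated with a small sketch. The Python fragment below enumerates all single stuck-at faults of a hypothetical three-input circuit and selects vectors greedily until every detectable fault is covered. The circuit, the node names and the greedy heuristic are assumptions made for illustration; the heuristic does not guarantee a minimum set and is not one of the ATPG algorithms cited above.

```python
from itertools import product

NODES = ("a", "b", "c", "n1", "y")

def circuit(vector, fault=None):
    """y = NAND(AND(a, b), c); fault is None or a (node, stuck_value) pair."""
    def drive(node, value):
        return fault[1] if fault and fault[0] == node else value
    a, b, c = (drive(n, v) for n, v in zip("abc", vector))
    n1 = drive("n1", a & b)
    return drive("y", 1 - (n1 & c))

FAULTS = [(n, s) for n in NODES for s in (0, 1)]   # 10 single stuck-at faults
VECTORS = list(product((0, 1), repeat=3))          # exhaustive input space

def detected_by(vector):
    """Set of faults that change the output for this input vector."""
    good = circuit(vector)
    return {f for f in FAULTS if circuit(vector, f) != good}

def greedy_test_set():
    """Pick vectors one at a time, each covering the most undetected faults."""
    remaining = set(FAULTS)
    tests = []
    while remaining:
        best = max(VECTORS, key=lambda v: len(detected_by(v) & remaining))
        gain = detected_by(best) & remaining
        if not gain:          # whatever is left cannot be detected
            break
        tests.append(best)
        remaining -= gain
    coverage = 1 - len(remaining) / len(FAULTS)
    return tests, coverage

tests, coverage = greedy_test_set()
```

For this toy circuit, four of the eight possible vectors are enough to reach full single stuck-at coverage, which shows how a compact test set shortens application time without losing coverage.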

4.2 Design for testability and test standards

Today's VLSI design would be inconceivable without the participation of designers in the testing procedure. Design for Testability (DFT) is the set of rules and principles that designers follow to make the test of the circuit under design as easy and efficient as possible. These techniques are related to clear and demanding structural methods to attain high controllability (capability to force a given node to a given logic value) and observability (capability to read an internal node through the primary outputs of the circuit under test) of all the nodes of the circuit. Since the concept of testability is relative to the fault model set target, DFT techniques also depend on it. The definition of Testability Standards [41][42], now widely followed by modern designers, has had a major impact on the progress of DFT.

4.3 Built-in self-testing

Built-in self-test (BIST) is the concept of partially or fully incorporating the circuits required to apply the test into the circuit under test. This technique alleviates many of the problems related to conventional testing: it reduces the test application time, allows testing in the application field, eliminates the need for highly costly ATE systems and avoids access and test application problems. The amount of data passing between the ATE and the CUT is reduced drastically, even making possible the use of compacted signatures to carry the information. The problems related to BIST are the increase in complexity of the IC design, the use of additional silicon area, and the depth of memory required to store the test vectors. The latter problem is overcome through the use of random test pattern generators (linear feedback shift registers), which are very efficient in terms of occupied area. The concept of reconfigurable circuits has been introduced to reduce the extra area of silicon. In these circuits the hardware required to apply the test vectors is later configured as part of the nominal function of the circuit. The well-known built-in logic block observer (BILBO) [43] is a versatile multimode test register for implementing such reconfigurable schemes. Some in this field [44] accept up to a 40% area overhead due to BIST in certain circumstances, considering the overall cost of testing. The concept of BIST has recently been extended to mixed signal circuits where the analogue sections are tested by the digital sections through the use of configurable circuits [45][46][47].
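The linear feedback shift register mentioned above can be sketched in a few lines of Python. The register width, the seed and the feedback taps (corresponding to the polynomial x^4 + x^3 + 1) are assumptions chosen for illustration; a real BIST generator would be sized for the circuit under test.

```python
from itertools import islice

def lfsr_patterns(seed=0b0001, taps=(4, 3), width=4):
    """Yield successive states of a Fibonacci LFSR as integers.

    taps are 1-based bit positions XORed together to form the bit that
    is shifted in; the default values are illustrative only.
    """
    if seed == 0:
        raise ValueError("the all-zero state never changes; use a non-zero seed")
    mask = (1 << width) - 1
    state = seed & mask
    while True:
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        state = ((state << 1) | feedback) & mask

# One full period of pseudo-random test patterns for a 4-input block.
patterns = list(islice(lfsr_patterns(), 15))
```

With these taps the four flip-flops and a single XOR gate step through every non-zero 4-bit pattern before repeating, which is why such generators are so economical in area compared with storing the vectors in memory.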

4.4 The cost of testing

The test cost affects the electronic system implementations at all levels [48]. There are two complementary factors that affect test cost:

o The cost of the test design and application. This covers the cost of the ATE machine, which grows exponentially with each generation of circuits (see [63]). For example, the average ATE machine will cost 8 million dollars in 2003. Additional factors include the cost of test generation and test application. The latter is directly related to the application time (length of test vectors).
o The cost of an inappropriate result of the test procedure. This component is composed of the cost of good circuits classified as faulty and the cost of faulty devices not detected by the test (escaping the test procedure and affecting a more expensive later test stage).

The overall effect on cost can be substantial, depending on factors such as complexity, technology and production volume. The cost of testing the IC as a proportion of the overall cost of the IC (design and testing) may reach significant levels, ranging from 5% to 25%, and in many cases higher, even making component production prohibitive. Although the silicon area is increased, BIST techniques involve a major test cost saving [49].

5. FUTURE PERSPECTIVES OF CONVENTIONAL TEST STRATEGIES

Future integrated circuits will exhibit high complexity, mixed Systems-On-Chip [50] (even including RF sections [51]), low voltage and very high performance. The integration of entire complex systems on a single chip is a critical factor for testing because of the increased number of test procedures and the reduction in coverage due to more sophisticated technology [52]. The impact of testing on cost and productivity will be significant. The challenge of testing future circuits can be condensed into two points [50]: first, that over the lifetime of the technology evolution the cost of manufacturing test has not scaled down (in contrast to the progressive reduction in the per-transistor manufacturing cost), and second, that the engineering effort to generate test procedures has been growing geometrically (in parallel with product complexity). It will be necessary to introduce new and efficient testing techniques to overcome these difficulties.

5.1 IDDQ testing

IDDQ testing is based on the measurement of the circuit drain current for a small set of test vectors in its quiescent state. As the quiescent drain current is very low for complementary CMOS digital circuits and the presence of a realistic defect involves an unexpected level of current (see Section 2 of this chapter), IDDQ has been considered an efficient testing technique [53][54]. Figure 1.10 shows an example of IDDQ testing. Section a) shows the CMOS circuit of a NAND logic gate and section c) shows the IDD current waveform for an input change: there is a current transient that returns to a practically negligible quiescent level. In the case of a bridge between the output node and ground (section b), the current also shows a transient peak but the quiescent value remains at a detectable level (waveform d). The quiescent current is given by IDDQ = VDD/(RB + Rpmos), where RB is the resistance of the bridge and Rpmos is the equivalent resistance of the pmos network.

Figure 1.10: Example of bridge detection by IDDQ testing. Case c) corresponds to a fault-free case and waveform d) to a bridge defect in the output node to ground.
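To give a feel for the magnitudes involved, the quiescent current expression IDDQ = VDD/(RB + Rpmos) can be evaluated numerically. All component values and the pass/fail limit in the following Python sketch are assumptions chosen for illustration, not data from the text.

```python
def iddq_bridge(vdd, r_bridge, r_pmos):
    """Quiescent current (A) through a bridge from the output node to ground."""
    return vdd / (r_bridge + r_pmos)

VDD = 3.3             # supply voltage in volts (assumed)
R_PMOS = 5e3          # equivalent pull-up network resistance in ohms (assumed)
R_BRIDGE = 1e3        # bridge resistance in ohms (assumed)
I_FAULT_FREE = 1e-9   # fault-free quiescent current in amperes (assumed)
THRESHOLD = 1e-6      # pass/fail current limit in amperes (assumed)

i_defect = iddq_bridge(VDD, R_BRIDGE, R_PMOS)
is_rejected = (I_FAULT_FREE + i_defect) > THRESHOLD
```

With these assumed values the bridge draws a fraction of a milliampere, several orders of magnitude above the fault-free level, so a simple current threshold separates the two cases.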

There are several reasons why the IDDQ testing technique has attracted significant attention from the semiconductor test industry [53]. The first is that it is extremely cost effective and is based on the root cause of the problem (realistic defects). IDDQ testing also involves a shorter time to market. As it is based on the sum of drain currents, IDDQ testing involves a high observability factor, which drastically reduces the complexity of the test generation and the length of the current test vectors. Another reason is the higher coverage level in comparison with other testing techniques (for example, in the detection of bridging defects compared with logic tests). In conclusion, IDDQ testing has brought increased coverage with less effort and cost, which has led to it being adopted by the semiconductor industry.

However, IDDQ testing has some drawbacks, especially when deep submicron technologies are considered. The first is that in order to make a defect detectable it is necessary to attain an acceptable level of current discriminability. Thus, IDDQ testing is not efficient for very complex circuits (where the quiescent current is high because of the number of devices), high temperature circuits (temperature increases leakage currents), very deep submicron circuits (VT reduction involves an increase in sub-threshold currents) and leaky circuits (as is the case of certain I/O pads). Current testing is used for a fast screening of defective circuits and as a supplemental test to conventional testing (logic testing, functional testing, stuck-at testing, delay testing). Intensive research work has been devoted to design rules for Iddq-ability [55][56], IDDQ test pattern generation [57][58] and circuit partitioning (to increase discriminability) [59]. Another characteristic of IDDQ testing is that the application of test vectors must be slowed down, because the circuit needs to reach a quiescent level before the current is measured. Thus, current testing is applied at a frequency much lower than the nominal speed used in logic and functional testing.

5.2 Evolution of semiconductor technology and role of IDDQ testing in deep submicron circuits

According to the well-known Moore's Law, faster and smaller transistors are being developed at a regular pace. In recent years, feature sizes have shrunk to 0.18 µm and below. The MOS subthreshold domain cannot be neglected for these technologies. Sah's model [60] indicated a null drain current for gate bias lower than VT (threshold voltage). In reality, the drain current does not drop to zero, but rather decreases exponentially as the gate voltage goes below the threshold voltage. Moreover, these leakage currents increase exponentially with temperature and with VT reductions and linearly with shrinking channel length. This involves a significant increase in fault-free quiescent current, which causes a lack of discriminability in current testing. This is aggravated by process fluctuations in current technologies. Process variations can shift the quiescent current by two orders of magnitude, making it impracticable to choose a current threshold level [54]. Along with partitioning [59], substrate biasing [61] and temperature reduction [62] during the test procedure are being used as compensating techniques that extend the use of current testing to deep submicron technologies (see [61] for more techniques).
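The loss of discriminability can be illustrated with a simplified textbook model of the subthreshold current at zero gate bias, I_off = I0 exp(-VT/(n kT/q)). This model, the parameter values (I0, n), the transistor counts and the defect current in the Python sketch below are all assumptions made for illustration; in particular, the model ignores the temperature dependence of VT and I0 and the channel length dependence mentioned above.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C

def off_current(vt, temp_k=300.0, i0=1e-7, n=1.5):
    """Simplified subthreshold current (A) at VGS = 0 for threshold voltage vt.

    i0 (current at VGS = VT) and n (slope factor) are illustrative
    assumptions, not values from the text.
    """
    thermal_voltage = K_BOLTZMANN * temp_k / Q_ELECTRON
    return i0 * math.exp(-vt / (n * thermal_voltage))

def discriminability(i_defect, n_transistors, vt, temp_k=300.0):
    """Ratio of defective to fault-free quiescent current."""
    background = n_transistors * off_current(vt, temp_k)
    return (background + i_defect) / background

# Same hypothetical defect current in an older and a newer technology.
ratio_old = discriminability(1e-4, 1e5, vt=0.7)
ratio_new = discriminability(1e-4, 1e7, vt=0.3)
```

Under these assumptions the defective-to-fault-free current ratio falls from several orders of magnitude to a value close to one when VT is lowered and the transistor count rises, which is the situation in which a fixed current threshold stops being useful.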

For low power supply voltage technologies the requirement to sense current is a limiting factor, because the necessity to place a serial probe (current to voltage converter) affects the circuit performances (in both analogue and digital sections).

6. CONCLUSIONS AND SCOPE OF THIS BOOK

The presence of defects in manufactured integrated circuits is inherent to technological processes. Moreover, in modern technologies the situation is becoming more difficult because of the increased circuit complexity, the reduction of power supply voltage, the increase of functional performance and manufacturing process fluctuations. Because produced components must reach an acceptable quality level, the requirements placed on manufacturing test technology (and, by extension, on testing in the application framework) clearly present a technological challenge. Physical defects are at the root of manufacturing failures. Defect models in the analogue domain have been proposed, showing a degradation of performance and levels, and an abnormal path of currents from VDD to ground as their main consequences. This involves a violation of the CMOS quiescent characteristics and a limit for battery-operated systems. This observation of increased leakage current has been applied as a complementary testing technique to conventional testing techniques (logic testing, delay testing), exhibiting very interesting features, such as reduced generation and application cost as well as an elevated fault coverage. This technique, applied in the quiescent state of the circuit, is called IDDQ testing. Nevertheless, current testing presents several drawbacks when used in complex deep submicron technologies. Thus, the challenge continues. This book introduces a new testing technique environment based on the measurement of the temperature of the surface of the silicon or the packaging systems and the identification of faulty circuits on this basis. The technique is called thermal testing. Thermal testing methods can be applied off-chip after manufacturing and especially on-line if sensors are integrated in the chip. The technique involves a "natural coupling technique", so no change of impedances is made in the circuit under test and the performance of the electrical circuits is not affected.
Chapter 2 introduces the basis of the heat propagation process particularly applied to IC materials. In Chapter 3, analytical and numerical methods of evaluation, which are the basis of the CAD tools for heat and temperature calculations, are discussed. Chapter 4 will deal with the evaluation and possibilities of thermal testing, analysing the heat sources produced by physical failures and the alternatives of sensing thermal parameters. In Chapter 5 the different techniques for measuring the temperature of the surface of the IC are presented, both off-line and on-line. Finally, Chapter 6 summarises the conclusions on the performance and opportunities of thermal testing.

7. REFERENCES

[1] Kilby, J.S., "The integrated circuit's early history", Proceedings IEEE, vol. 88, no. 1, January 2000, pp. 109-111.
[2] Tsividis, Yannis P., "Operation and modelling of the MOS transistor", McGraw Hill, 1989.
[3] Uyemura, J.P., "Fundamentals of MOS digital integrated circuits", Addison-Wesley, 1988.
[4] Pidin, S. et al., "Experimental and simulation study on sub-50nm CMOS design", Proceedings Symposium on VLSI Technology 2001, pp. 35-36.
[5] Bondyopadhyay, P.K., "Moore's law governs the silicon revolution", Proceedings IEEE, vol. 86, no. 1, January 1998, pp. 78-81.
[6] Hamilton, S., "Taking Moore's law into the next century", Computer, vol. 32, no. 1, January 1999, pp. 43-48.
[7] Sze, S.M., "Physics of Semiconductor Devices", New York, Wiley, 1981.
[8] Asano, K. et al., "Patterning sub-30nm MOSFET gate with i-line lithography", IEEE Transactions on Electron Devices, vol. 48, no. 5, May 2001, pp. 1004-1006.
[9] Segura, J.A. and Rubio, A., "A detailed analysis of CMOS SRAMs with gate oxide shorts defects", IEEE Journal Solid State Circuits, vol. 32, no. 10, October 1997, pp. 1543-1550.
[10] Renovell, M. et al., "Boolean and current detection of MOS transistor with gate oxide shorts", Proceedings International Test Conference 2001, pp. 1039-1048.
[11] Semenov, O. and Sachdev, M., "Impact of technology scaling on bridging fault detection in sequential and combinational CMOS circuits", Proceedings International Workshop on Defect Based Testing 2000, pp. 36-42.
[12] Rodriguez-Montañés, R., Bruls, E.M.J.G. and Figueras, J., "Bridging defects resistance measurements in a CMOS process", Proceedings International Test Conference 1992, p. 892.
[13] Champac, V.H., Figueras, J. and Rubio, A., "Electrical model of the floating gate defect in CMOS IC's: implications on Iddq testing", IEEE Transactions on Computer-Aided Design of Circuits and Systems, vol. 13, no. 3, March 1994, pp. 359-369.
[14] Segura, J. et al., "Quiescent current analysis and experimentation of defective CMOS circuits", Journal of Electronic Testing: Theory and Application, 3, 337-348 (1992), Kluwer Academic Publishers.
[15] Segura, J. et al., "A detailed analysis and electrical modeling of gate oxide shorts in MOS transistors", Journal of Electronic Testing: Theory and Application, 8, 229-239 (1996), Kluwer Academic Publishers.
[16] Avizienis, A. and Laprie, J.C., "Dependable computing: from concepts to design diversity", Proceedings of the IEEE, vol. 74, no. 8, May 1986.
[17] Abramovici, M., Breuer, M.A. and Friedman, A.D., "Digital Systems Testing and Reliable Design", Computer Science Press, 1990.
[18] Abraham, J.A. and Fuchs, W.K., "Fault and error models for VLSI", Proceedings of the IEEE, vol. 74, no. 5, May 1986.
[19] Breuer, M.A. and Friedman, A.D., "Diagnosis and reliable design of digital systems", Computer Science Press, Washington DC, 1976.
[20] Wadsack, R.L., "Fault modelling and logic simulation of CMOS and MOS integrated circuits", Bell System Technical Journal, pp. 1449-1474, May-June 1978.
[21] Abraham, J.A. and Parker, K.P., "Practical Microprocessor Testing: Open and closed loops approaches", Proc. COMPCON Spring 1981, pp. 308-311, February 1981.
[22] Sharma, M. and Patel, J.H., "Testing of critical paths for delay faults", Proceedings of the International Test Conference 2001, pp. 634-641.
[23] Tyszer, J., "Testing of three-stage switching networks for coupling faults", IEEE Transactions on Communications, vol. 40, no. 2, Feb. 1992, pp. 413-422.
[24] Kuo-Liang Cheng; Ming-Fu Tsai; Cheng-Wen Wu, "Efficient neighborhood pattern-sensitive fault test algorithms for semiconductor memories", IEEE Proceedings VLSI Test Symposium 2001, pp. 225-230.
[25] Rubio, A.; Itazaki, N.; Xu, X.; Kinoshita, K., "An approach to the analysis and detection of crosstalk faults in digital VLSI circuits", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 13, no. 3, March 1994, pp. 387-395.
[26] Milne, A.; Taylor, D.; Saunders, J.; Talbot, A.D., "Generation of optimised fault lists for simulation of analogue circuits and test programs", IEE Proceedings Circuits, Devices and Systems, vol. 146, no. 6, Dec. 1999, pp. 355-360.
[27] Gulati, R.K. and Hawkins, C.F. editors, "IDDQ testing of VLSI circuits", Kluwer Academic Publishers, 1993.
[28] McCluskey, E.J., "Logic Design Principles: with emphasis on testable semicustom circuits", Prentice Hall International, 1986.
[29] Fujiwara, H., "Logic testing and design for testability", The MIT Press, 1985.
[30] Chujen Lin; Haynes, L.; Mandava, P.; Prasad, P., "Automatic BIST design tool for mixed-signal circuits", AUTOTESTCON '98, IEEE Systems Readiness Technology Conference, 1998, pp. 97-102.
[31] Roth, J.P., "Computer logic, testing and verification", Computer Science Press, Rockville, Maryland, 1980.
[32] Fujiwara, H. and Shimono, T., "On the acceleration of test pattern generation algorithms", IEEE Transactions on Computers, vol. C-32, no. 12, pp. 1137-1144, December 1983.
[33] Schulz, M.H. et al., "Improved deterministic test pattern generation with applications to redundancy identification", IEEE Transactions on Computer-Aided Design, vol. 8, no. 7, pp. 811-816, July 1989.
[34] Franklin, M.; Saluja, K.K.; Kinoshita, K., "Design of a BIST RAM with row/column pattern sensitive fault detection capability", Proceedings International Test Conference, 1989, pp. 327-336.
[35] Mohammad, M.Gh.; Saluja, K.K.; Yap, A., "Testing flash memories", Thirteenth International Conference on VLSI Design, 2000, pp. 406-411.
[36] Das, D.K.; Bhattacharya, B.B.; Ohtake, S.; Fujiwara, H., "Testable design of sequential circuits with improved fault efficiency", Fourteenth International Conference on VLSI Design, 2001, pp. 128-133.
[37] Heragu, K.; Patel, J.H.; Agrawal, V.D., "A test generator for segment delay faults", Proceedings, Twelfth International Conference on VLSI Design, 1999, pp. 484-491.
[38] Wangning Long; Zhongchen Li; Shiyuan Yang; Yinghua Min, "Memory efficient ATPG for path delay faults", Proceedings Test Symposium, 1997, pp. 326-331.
[39] Kondo, H.; Kwang-Ting Cheng, "An efficient compact test generator for IDDQ testing", Proceedings of the Fifth Asian Test Symposium, 1996, pp. 177-182.
[40] Isern, E.; Figueras, J., "Test generation with high coverage for quiescent current test of bridging faults in combinational circuits", Proceedings International Test Conference, 1993, pp. 73-82.
[41] IEEE standard test access port and boundary-scan architecture, IEEE Std 1149.1-2001.
[42] IEEE standard for a mixed-signal test bus, IEEE Std 1149.4-1999, 28 March 2000.
[43] Konemann, P., Mucha, J. and Zwiehoff, G., "Built-in logic block observation technique", Proceedings of the IEEE Test Conference, 1979.
[44] Varma, P., Ambler, A.P. and Baker, K., "An analysis of the economics of self test", Proceedings of the IEEE Test Conference, 1984.
[45] Bernard, S.; Azais, F.; Bertrand, Y.; Renovell, M., "Analog BIST generator for ADC testing", IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2001, pp. 338-346.
[46] Seongwon Kim; Soma, M., "An all-digital built-in self-test for high-speed phase-locked loops", IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 48, no. 2, Feb. 2001, pp. 141-150.
[47] Peralias, E.J.; Huertas, G.; Rueda, A.; Huertas, J.L., "Self-testable pipelined ADC with low hardware overhead", IEEE Proceedings on VLSI Test Symposium, 2001, pp. 272-277.
[48] Turino, J., "Test economics in the 21st century", IEEE Design and Test of Computers, July-September 1997, pp. 41-44.
[49] Ungar, L.Y. and Ambler, T., "Economics of built-in self-test", IEEE Design and Test of Computers, September-October 2001, pp. 70-79.
[50] Zorian, Y., Dey, S. and Rodgers, M.J., "Test of future System-On-Chips", Proceedings of the International Conference on Computer Aided Design, 2000, pp. 392-398.
[51] Lee, T.H. and Wong, S.S., "CMOS RF Integrated Circuits at 5 GHz and Beyond", Proceedings of the IEEE, vol. 88, no. 10, October 2000, pp. 1560-1571.
[52] Pynn, C.T., "Analyzing Manufacturing Test Costs", IEEE Design and Test of Computers, July-September 1997, pp. 36-41.
[53] Rajsuman, R., "Iddq testing for CMOS VLSI", Proceedings of the IEEE, vol. 88, no. 4, April 2000, pp. 544-566.
[54] Kundu, S., "Iddq defect detection in deep sub-micron CMOS ICs", Proceedings of the Asian Test Symposium, 1998, pp. 150-152.
[55] Zarrinfar, F. and Rajsuman, R., "Automated Iddq testing from CAD to manufacturing", Proceedings International Workshop on Iddq testing, 1995, pp. 48-51.
[56] Hawkins, C.F. et al., "Quiescent power supply current measurements for CMOS IC defect detection", IEEE Transactions on Industrial Electronics, May 1989, pp. 211-218.
[57] Cusey, J.P. and Patel, J.H., "BART: A bridging fault test generator for sequential circuits", Int. Test Conference, 1997, pp. 823-832.
[58] Mao, W. et al., "Quietest: A quiescent current testing methodology for detecting leakage faults", Int. Conf. Computer Aided Design, 1990, pp. 280-283.
[59] Rullan, M. et al., "Analysis of Issq/Iddq testing implementation and circuit partitioning in CMOS cell based design", European Design and Test Conference, 1996, pp. 584-588.
[60] Sah, C.T., "Characteristics of the Metal-Oxide-Semiconductors Transistors", IEEE Trans. Electron Devices, ED-11, 324, 1964.
[61] Proc. Int. Technology Roadmap for semiconductor, November 1999.
[62] Szekely, V.; Rencz, M.; Torok, S.; Courtois, B., "Cooling as a possible way to extend the usability of IDDQ testing", Electronics Letters, vol. 33, no. 25, 4 Dec. 1997, pp. 2117-2118.
[63] Singer, G., "The future of test and DFT", IEEE Design and Test, July-September 1997, pp. 11-16.

Chapter 2

Thermal transfer and thermal coupling in IC's

1. INTRODUCTION: HEAT TRANSFER AND ITS RELATION TO THERMODYNAMICS

Prior to the 19th century, heat was envisioned as a liquid that flowed from hotter to colder objects. This imagined substanceless and weightless fluid was called caloric and no distinction was made between heat and temperature until the writings of Joseph Black (1728-1799). It was not until J. P. Joule published a definitive paper in 1847 that the idea of caloric was abandoned. Joule showed that heat is a form of energy. Moreover, after the experimental results of Rumford, Helmholtz, Joule and others, it was demonstrated that any of the various forms of energy can be transformed into another. Thermodynamics is the field of science that studies the connection between heat and work and the conversion of one into the other.

There are two major laws concerning thermodynamics. The First Law of Thermodynamics is the law of the conservation of energy. When heat is transformed into any other form of energy, or when other forms of energy are transformed into heat, the total amount of energy (taking into account all the forms) in the system is constant. The law states that energy cannot be created or destroyed; it is simply converted into one of the possible forms (heat, light, waves, chemical energy, potential energy, kinetic energy, etc.). The Second Law of Thermodynamics states that some heat is lost when heat is converted into mechanical energy. In a thermal converting machine it is unavoidable that part of the heat energy is used just to heat the engine (increasing its temperature). The percentage of heat dedicated to work is called the thermal efficiency of the engine. It was Sadi Carnot (1796-1832) who conducted theoretical studies of the efficiency of heat engines, in an attempt to model the most efficient heat engine possible. His theoretical work provided the basis for the practical improvement of the steam engine and also laid the foundations of thermodynamics. For the most theoretically efficient machine, called the Carnot engine, he showed that the efficiency of the engine is given by:

$$\text{efficiency} = 1 - \frac{T_{cold}}{T_{hot}} \tag{2.1}$$

where Thot and Tcold are the absolute temperatures of the hottest and coldest parts of the machine. There is a second statement derived from the second law: when two bodies at different temperatures are in contact, heat is transferred from the hotter to the colder body. This apparently obvious statement is in fact a result of the concept of spontaneous reaction, a concept typically related to the second law. The correct meaning of heat and temperature and the relationship between them comes from the Molecular Theory and Statistical Physics. The first theory states that matter is made up of molecules. These molecules are in constant motion within the limits of the material. The heat of a material is the total kinetic energy of its molecules while the temperature of the object is a measure of the average energy of the molecules. The total kinetic energy of particles in an object is given by:

$$\text{Total Kinetic Energy} = \sum_i \frac{m v_i^2}{2} \tag{2.2}$$

which indicates that the more energy (heat) a body has, the faster the molecules are moving. The average value of the particles' energy gives the concept of temperature.

$$\text{Average Kinetic Energy} = \frac{\sum_i m v_i^2 / 2}{n} = \frac{3}{2} k T \tag{2.3}$$

where n is the total number of particles, k is Boltzmann's constant (1.38 × 10^-23 J/K) and T is the absolute temperature. Since the molecules of various materials have different weights and sizes, the amount of energy required to speed up the set of molecules depends on the type of material. Moreover, the amount of heat required to cause a change in the average speed of the particles (related to temperature) is proportional to the mass (the amount of particles) of the object. A specific heat coefficient can be defined for each material. It is the amount of heat (energy) required to raise the temperature of 1 gram of substance by 1 degree Celsius (°C) (see Table 2.1 for a list of specific heat coefficients for different elements). Note that this concept is parallel to the capacitance per unit of volume in an electrical conductor. The overall heat capacitance of a body would be the specific heat multiplied by the mass of the body (that is, volume × density).

This chapter is meant to present the reader with the main concepts regarding heat transfer (specifically the conduction mechanisms and the different types of thermal resistance). Additionally, the heat sources in an electrical circuit and the effect of temperature on device performance and reliability characteristics will be analysed.

Table 2.1. Specific heat for different elements.

Element      Specific heat (kJ/(kg K)) at 20 °C
Aluminium    0.0089
Copper       0.0076
Iron         0.0013
Silicon      0.0129
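As a quick numerical check of equations (2.1) and (2.3), the following Python sketch evaluates the Carnot efficiency and the average kinetic energy per particle. The function names and the example temperatures are assumptions chosen for illustration.

```python
K_BOLTZMANN = 1.38e-23   # Boltzmann's constant in J/K, as quoted in the text

def carnot_efficiency(t_cold, t_hot):
    """Equation (2.1); both temperatures are absolute (kelvin)."""
    if not 0 < t_cold <= t_hot:
        raise ValueError("expected 0 < t_cold <= t_hot, in kelvin")
    return 1.0 - t_cold / t_hot

def average_kinetic_energy(temp_k):
    """Equation (2.3): mean kinetic energy per particle, in joules."""
    return 1.5 * K_BOLTZMANN * temp_k

eta = carnot_efficiency(t_cold=300.0, t_hot=600.0)   # hypothetical engine
e_avg = average_kinetic_energy(300.0)                # near room temperature
```

The efficiency depends only on the ratio of the two absolute temperatures, so using Celsius values in equation (2.1) would give a wrong result.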

2. MECHANISMS OF HEAT TRANSFER

Heat transfers between bodies or regions of a body of different temperature. The heat flow always takes the direction from the body or region of higher temperature to that of lower temperature (this is another conclusion of the Second Law of Thermodynamics). The basic mechanisms or modes that model or explain heat transference are conduction, convection and radiation [64].

2.1 The conduction mechanism

The conduction mechanism applies to the transference of heat between different regions of the same body with a given gradient of temperature or between different bodies at different temperatures with a surface of physical contact. A difference of temperature in regions of the same body implies a difference of kinetic energy in the random motion of the molecules attached to the body lattice. Because of this difference in kinetic energy of molecules, there is transference of energy from the region of higher energy to that of lower energy. In the case of two bodies with a common physical surface, the transference is produced through this surface contact. The elementary processes of heat transference in the conduction mechanism are transference through atomic or molecular lattice vibration and transference through the exchange and collision of free carriers. In the case of gases and liquids the main principle is the collision between particles. In the case of semiconductors and insulators, the main mechanism is lattice vibration, and in the case of conductors the predominant mechanism is transference through carriers (electrons). All of these elementary mechanisms are included in the general concept of conduction at the macroscopic level. Let us first focus our attention on a single, homogeneous body, extended in a single dimension (x), limited by two plane surfaces (S2 and S1) at different temperatures (T2 and T1, with T2 > T1), see Figure 2.1. The flux of heat (energy) at a given point, Φ(x), that is, the rate of energy transferred per unit area, is proportional to the temperature gradient dT/dx:

dT

where k is a parameter of the material of the body known as thermal conductivity. Table 2.2 shows values of k for a set of different materials. In the case of a homogeneous body in the stationary state, the principle of energy balance between the different slices of the material implies that the gradient of temperature is constant for all x of the body. In the case of Figure 2.1 this is applicable to the region between the two surfaces S2 and S1, so if A is the transversal area of the section of the body (in the direction perpendicular to the heat flux) the rate of energy Q will be given by:

$$Q = A \cdot k \cdot \frac{T_2 - T_1}{L} \qquad (2.5)$$

Table 2.2. Thermal conductivity for different materials

Material            k (W/m·K)
Air                 0.027
PVC                 0.092
Cork                0.043
Water               0.611
Pyrex glass         1.09
Stainless steel     15.0
Aluminium           204
Copper              386
Silicon             140

[Figure: body of length L between a higher temperature region (T2) and a lower temperature region (T1), with the temperature profile plotted along x.]

Figure 2.1. Conduction of heat in a homogeneous body.

where L is the distance between the two limiting surfaces. The above flux rate principle can be expressed in a more general way when 3-D or non-homogeneous bodies are considered. In this case, the flux equation above is transformed into

$$\phi(x,y,z) = -k \cdot \nabla T \qquad (2.6)$$

which is known as Fourier's Law.
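As a minimal numerical sketch of the 1-D case, the heat rate through a slab can be evaluated directly from Q = A·k·(T2 − T1)/L. The conductivity of silicon is taken from Table 2.2; the die dimensions and temperatures are hypothetical values chosen only for illustration, and the function name is not from the text.

```python
def conduction_heat_rate(k, area, length, t_hot, t_cold):
    """Steady 1-D heat rate Q (W) through a homogeneous slab.

    k      : thermal conductivity, W/(m*K)
    area   : cross-section perpendicular to the heat flux, m^2
    length : distance between the two limiting surfaces, m
    """
    return k * area * (t_hot - t_cold) / length


K_SILICON = 140.0  # W/(m*K), Table 2.2

# Hypothetical silicon slab: 1 cm x 1 cm, 0.5 mm thick, 10 K across it.
q_si = conduction_heat_rate(K_SILICON, area=1e-4, length=0.5e-3,
                            t_hot=310.0, t_cold=300.0)
print(f"Q = {q_si:.1f} W")  # about 280 W
```

The large value for a modest temperature difference reflects the high thermal conductivity of silicon and the short conduction path.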

2.1.1 Thermal resistance

In the analysis of the 1-D homogeneous body shown in Figure 2.1, we have seen that when a difference of temperature is applied to two surfaces of the body there is a flux of energy traversing the body that is proportional to the temperature difference (T2 − T1). Thus, a parallelism can be established with the problem of electrical conduction in a homogeneous body when a voltage difference (V2 − V1) is applied between two limiting faces. In this case, the charge rate (electrical current) is proportional to the voltage difference (Ohm's Law), and the proportional factor between voltage difference and current is called resistance, R, which depends on the material and body geometry. By this parallelism we can define the concept of thermal resistance, Rth, [65] of a body such as the one shown in Figure 2.1:

$$R_{th} = \frac{T_2 - T_1}{Q} = \frac{L}{k \cdot A} \qquad (2.7)$$

The concept is perfectly extendable to 3-D. Note that the expression 1/k, independent of the body geometry, is equivalent to the concept of resistivity (ρ) in electrical circuits. Rth is expressed in Kelvin/watts or equivalent. This concept can be used to evaluate the overall thermal resistance of a multiple body system. Let us consider, as in Figure 2.2, the case of three cascaded bodies, where we apply a temperature difference (T2 − T1) between the two extreme faces. We will assume that the surfaces in contact are perfect (perfectly flat, intimately connected or thermally adapted) in such a way that the entire contact surface is efficient to transfer heat. The case assumes a perfect lattice connection in the junctions, abruptly changing the thermal conductivity in them. If we designate as J12 and J23 the two junctions corresponding to the two interfaces between blocks B1/B2 and B2/B3, we will call the temperatures in these interfaces T12 and T23 respectively. Here, the result of the principle of energy balance (continuity conditions), that is, the energy flux rate (Q) at equilibrium, is the same for any value of x (equivalent to the concept of current continuity in electrical circuits). Therefore, in each of the three blocks, B1, B2, B3, the concept of thermal resistance can be applied, and it follows that:

$$Q = \frac{T_2 - T_{12}}{R_{th1}} = \frac{T_{12} - T_{23}}{R_{th2}} = \frac{T_{23} - T_1}{R_{th3}} \qquad (2.8)$$

with Rth1, Rth2 and Rth3 being the thermal resistances of each of the three blocks. Figure 2.2 shows the temperature diagram. Note the three temperature gradients in each block. If we call the overall equivalent resistance (the general concept for the three cascaded block system) Rtheq we can write:

$$Q = \frac{T_2 - T_1}{R_{theq}}, \qquad R_{theq} = R_{th1} + R_{th2} + R_{th3} \qquad (2.9)$$

showing that the overall resistance of a set of cascaded bodies is the addition of the respective thermal resistances for each body. Figure 2.2 shows the equivalent electrical circuit. Similarly, the same concept can be applied to a system with several side-by-side bodies, in which case the equivalent resistance corresponds to the parallel combination of the respective thermal resistances.

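[Figure: three cascaded blocks (B1, B2, B3), the temperature diagram along the blocks and the equivalent electrical circuit.]

Figure 2.2. Equivalent thermal circuit for three cascaded blocks.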
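The electrical analogy for cascaded and side-by-side bodies can be sketched in a few lines. The function names and the resistance values below are hypothetical; the sketch only assumes ideal contacts between blocks, as in the text.

```python
def slab_resistance(length, k, area):
    """Thermal resistance (K/W) of a homogeneous 1-D block: L / (k*A)."""
    return length / (k * area)


def series(*resistances):
    """Equivalent resistance of cascaded blocks (sum of the resistances)."""
    return sum(resistances)


def parallel(*resistances):
    """Equivalent resistance of side-by-side blocks."""
    return 1.0 / sum(1.0 / r for r in resistances)


def interface_temperatures(t_hot, t_cold, resistances):
    """Heat rate Q and the temperatures at each interface of a cascade.

    The same Q crosses every block, so each block drops Q * R_i.
    """
    q = (t_hot - t_cold) / series(*resistances)
    temps, t = [], t_hot
    for r in resistances[:-1]:
        t -= q * r
        temps.append(t)
    return q, temps


# Hypothetical cascade: Rth1 = 1, Rth2 = 2, Rth3 = 2 K/W between 350 K and 300 K.
q, (t12, t23) = interface_temperatures(350.0, 300.0, [1.0, 2.0, 2.0])
print(q, t12, t23)  # 10 W, with T12 = 340 K and T23 = 320 K
```

Each block drops a temperature proportional to its own resistance, which reproduces the three different gradients of the temperature diagram.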

2.1.2 Contact resistance

Real contact surfaces between bodies are far from satisfying the properties of the ideal surfaces mentioned above. All surfaces (even flat ones) are inherently rough [66]. This implies an additional restriction on heat conduction. The effective surfaces are highly dependent on metallurgic and microscopic characteristics and the treatment that may be applied to them [67]. Figure 2.3 features the diagram of the appearance of real contact surfaces; a contact that is apparently flat at large scale (c) has the appearance indicated in (a). Section (b) of the same figure indicates contact with a round probe (ball contact in a Ball Grid Array, for example). This feature introduces additional thermal resistance in the contact between two bodies. Figure 2.3 shows the overall equivalent thermal resistance circuit. The overall thermal resistance of the contact system has three components: the thermal resistance of each of the bodies plus the contact resistance Rj (d).

In real systems, this contact resistance can be significant and in many cases, a limiting factor for heat evacuation. Thermal contact resistances can be described and evaluated using models that take into account the statistical profile of the surfaces (surface roughness, asperity slope, mean and rms values for each surface), the elastic deformation properties and the thermal constriction of the materials as well as the force and temperature of contact.

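[Figure: a) microscopic appearance of a real contact surface; b) contact with a round probe on Body 2; c) apparently flat contact at large scale; d) equivalent circuit with Rth1, the junction resistance Rj and Rth2.]

Figure 2.3. Real contact surfaces.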

Literature on contact resistance models can be found in [64]. As an example, for the case of the round probe shown in Figure 2.3 b) (corresponding to a contact in a Ball Grid Array package), the contact resistance Rc is given by:

(2.10)

where a is the radius of the round contact (see Figure 2.3) and ks = (2·k1·k2)/(k1 + k2), with k1 and k2 being the respective thermal conductivities of the two objects (probe and body). In order to obtain a more accurate value for the contact resistance we should take into account the effect of air in the gap around the contact and the deformation properties due to contact pressure.
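A rough numerical sketch of a round contact follows. It uses the harmonic mean conductivity ks defined above together with the classical constriction expression for a circular spot between two half-spaces, Rc = 1/(2·ks·a). That expression is an assumption introduced here for illustration and may not be the exact form of equation (2.10); the contact radius is also hypothetical.

```python
def harmonic_conductivity(k1, k2):
    """ks = 2*k1*k2 / (k1 + k2), as defined in the text."""
    return 2.0 * k1 * k2 / (k1 + k2)


def constriction_resistance(radius, k1, k2):
    """Contact resistance (K/W) of a circular spot of the given radius.

    ASSUMPTION: classical constriction formula Rc = 1 / (2 * ks * a),
    ignoring the air gap and the deformation under contact pressure.
    """
    return 1.0 / (2.0 * harmonic_conductivity(k1, k2) * radius)


K_COPPER, K_SILICON = 386.0, 140.0  # W/(m*K), Table 2.2

# Hypothetical 50 micrometre contact radius between copper and silicon.
r_c = constriction_resistance(50e-6, K_COPPER, K_SILICON)
print(f"Rc = {r_c:.1f} K/W")
```

Under this assumed form the resistance scales as 1/a, so halving the contact radius doubles the contact resistance.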

Table 2.3. Examples of contact resistance values

Interface                                                                          Contact resistance (×10⁻⁴ m²·K/W)
Silicon chip with lapped aluminium, with pressure between 27 and 500 kN/m² [65]    0.3 - 0.6
Silicon chip and aluminium with 0.2 mm epoxy [65]                                  0.2 - 0.9
Aluminium and aluminium with metallic coating (Pb)                                 0.01 - 0.1

2.2 The convection mechanism

Convection is a heat transference mechanism caused by the energy exchange and motion of molecules in a fluid. Two types of convection mechanisms are clearly identified: natural convection and forced convection.

2.2.1 Natural convection

Natural convection is the heat transference mechanism caused by the motion of molecules due to different density or composition of different regions of a fluid. Temperature usually has an important role in this difference of densities, but concentration gradients can also be the cause of this density difference. Along with the physical properties of the fluid and the thermodynamic conditions of the system, the shape of the vessel containing the fluid also has an important influence. In heat transference by convection, the pressure, humidity, speed of fluid, the type of surface and its position (vertical or horizontal) and, obviously, temperature influence the mechanism. The heat transfer function in a fluid is complex, but in the case of heat transfer from a body to the fluid (air, water, etc.) surrounding it, heat transfer is practically proportional to the difference in temperature between the surface, Ts, and the free stream fluid, TF [68]. The rate of heat transferred is given by:

$$Q = h_c \cdot A \cdot (T_S - T_F) \qquad (2.11)$$

where A is the area of the surface and the constant of proportionality hc is called the convective heat transfer coefficient, whose units are watts/m²K. This equation is called Newton's Law of Cooling. The coefficient hc depends on the type of surface and fluid, and the geometry and orientation of the bodies. Usually hc is evaluated through an averaged heat transfer scenario. Table 2.4 shows typical hc values for different situations.

Table 2.4. Convective heat transfer coefficient.

Case                            hc (W/m²K)
Natural convection in air       3-25
Natural convection in water     15-100
Forced convection in air        10-200
Forced convection in water      50-10000
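To give a feeling for the ranges in Table 2.4, the sketch below evaluates Newton's Law of Cooling, taken in the form Q = hc·A·(Ts − TF), for the lower and upper coefficient of each air case. The surface area and temperatures are hypothetical.

```python
def convective_heat_rate(h_c, area, t_surface, t_fluid):
    """Newton's Law of Cooling: Q = hc * A * (Ts - TF), in watts."""
    return h_c * area * (t_surface - t_fluid)


# Coefficient ranges from Table 2.4, W/(m^2*K).
H_RANGES = {
    "natural convection in air": (3.0, 25.0),
    "forced convection in air": (10.0, 200.0),
}

# Hypothetical 10 cm x 10 cm surface, 40 K above the free stream air.
AREA, T_S, T_F = 0.01, 338.0, 298.0
for case, (h_min, h_max) in H_RANGES.items():
    q_min = convective_heat_rate(h_min, AREA, T_S, T_F)
    q_max = convective_heat_rate(h_max, AREA, T_S, T_F)
    print(f"{case}: {q_min:.1f} W to {q_max:.1f} W")
```

With these numbers natural convection removes roughly 1 W to 10 W, while forced air spans roughly 4 W to 80 W, which illustrates how strongly the result depends on the value chosen for hc.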

2.2.2 Forced convection

Newton's law can be adapted for forced convection by changing the exponent affecting ΔT. In the case of natural laminar cooling Q varies with ΔT^(5/4) and in the case of turbulent fluid, with ΔT^(4/3). Additionally, the value of hc is affected by conditions and is highly sensitive to the speed of the fluid [69].

2.3 The radiation mechanism

The radiation transference mechanism corresponds to the electromagnetic radiation emitted by a body because of its temperature at the expense of its internal energy (the energy of its particles). Thus, thermal radiation is an electromagnetic radiation of the same nature as light, radio waves and x-rays. Radiation is characterised by its frequency f and wavelength λ, related by the expression λ = c/f, where c is the speed of light. Regarding the wavelength (λ) for different radiations we find:

Thermal radiation    Visible light    Radio waves

This type of heat transfer does not require a medium for transport, being in fact more efficient (in terms of energy loss in transport) in a vacuum [70][71][72]. The flux of radiant energy leaving a surface due to emission and reflection of electromagnetic radiation is its radiosity J (watts/m²). A body that absorbs all the incident radiation is called a blackbody. All the radiation leaving a blackbody, Jb, is emitted by its surface and follows the Stefan-Boltzmann Law:

$$J_b = \sigma \cdot T^4 \qquad (2.12)$$

where σ is the Stefan-Boltzmann constant (5.67·10⁻⁸ W/m²K⁴) and T the absolute temperature. The Jb radiosity for a blackbody is due only to the emitted radiation. Thus, it is called the blackbody emissive power. Real surfaces emit less radiation than the surfaces of a blackbody. In a real body the fraction of the blackbody emissive power is called emissivity, ε. The heat transfer rate due to radiation is expressed as:

$$Q = \varepsilon \cdot A \cdot \sigma \cdot T^4 \qquad (2.13)$$

where A is the surface area. Table 2.6 shows a list of emissivity factors for different surfaces at room temperature.

Table 2.6. Emissivity factors.

Surface                        Emissivity, ε
Pyrex glass                    0.8
Stainless steel                0.3
Lightly oxidised aluminium     0.035 (deoxidised)
Water                          0.9
Polished silver                0.02
Silicon                        0.8
Alumina                        0.78
Polished aluminium foil        0.05
White paper                    0.86

In the case of a body with temperature T1 enclosed by an isothermal surface at temperature T2 the heat exchange between the two surfaces is given by:

$$Q = \varepsilon \cdot A \cdot \sigma \cdot (T_1^4 - T_2^4) \qquad (2.14)$$

which represents the balance of radiation energy, that is, the energy leaving the body and the energy coming to the body from the environment.
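The radiation balance can be illustrated with a short sketch. It assumes the net exchange is the difference between two Stefan-Boltzmann terms of the type in equation (2.13), one at the body temperature and one at the enclosure temperature; the emissivity of silicon comes from Table 2.6, and the area and temperatures are hypothetical.

```python
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W/(m^2*K^4)


def emitted_power(emissivity, area, temperature):
    """Power radiated by a real surface: eps * A * sigma * T^4 (T in kelvin)."""
    return emissivity * area * SIGMA * temperature ** 4


def net_radiative_exchange(emissivity, area, t_body, t_enclosure):
    """Net power leaving a body enclosed by an isothermal surface.

    ASSUMPTION: emitted power at t_body minus absorbed power at t_enclosure,
    both weighted by the same emissivity.
    """
    return emitted_power(emissivity, area, t_body) - emitted_power(
        emissivity, area, t_enclosure)


EPS_SILICON = 0.8  # Table 2.6

# Hypothetical 1 cm^2 silicon surface at 350 K inside a 300 K enclosure.
q_rad = net_radiative_exchange(EPS_SILICON, 1e-4, 350.0, 300.0)
print(f"Q_rad = {q_rad * 1e3:.1f} mW")  # about 31 mW
```

The few tens of milliwatts obtained here are small compared with typical conduction paths in a packaged circuit.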

Example. Analysis of a heat dissipation problem with several heat transfer mechanisms. Let us suppose a heat source wrapped in a layer of material, and this system enclosed in an environment with air, as in Figure 2.4. The heat source is isothermal with a temperature of 50 °C (TA), the thermal resistance (Rth) of the layer is 1 K/W, and the air environment enclosing the system has a temperature of 25 °C (Tenv). If the emissivity ε of the surface of the wrapped system is 0.5 and the convective heat transfer coefficient hc is 5 (assume an exponent of 5/4 for laminar convection), determine the temperature of the surface of the wrapped system, TB.

[Figure: heat source at TA = 50 °C wrapped in a layer (conduction) with outer surface A, which loses heat by convection and radiation to the fluid (air) and walls surrounding the system at 25 °C.]

Figure 2.4. Example system.

The heat source transfers heat to the wrapping layer through conduction and this layer transfers heat to the enclosing environment by convection and radiation. Thus, in equilibrium the heat conducted through the layer has to equal the heat leaving the surface by convection and radiation:

$$Q_{conduction} = Q_{convection} + Q_{radiation}$$
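One way to solve this balance numerically is by bisection on the surface temperature, sketched below. The surface area of the wrapped system is not available in the text as reproduced here, so `area` is an explicit, hypothetical parameter, and the solution it yields depends on the value chosen. Temperatures are converted to kelvin.

```python
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W/(m^2*K^4)


def balance_residual(t_b, t_a, t_env, r_th, h_c, eps, area, exponent=1.25):
    """Q_conduction - (Q_convection + Q_radiation) for surface temperature t_b."""
    q_cond = (t_a - t_b) / r_th
    q_conv = h_c * area * (t_b - t_env) ** exponent
    q_rad = eps * area * SIGMA * (t_b ** 4 - t_env ** 4)
    return q_cond - q_conv - q_rad


def solve_surface_temperature(t_a, t_env, r_th, h_c, eps, area,
                              exponent=1.25, tol=1e-9):
    """Bisection between t_env and t_a; the residual decreases with t_b."""
    lo, hi = t_env, t_a
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if balance_residual(mid, t_a, t_env, r_th, h_c, eps, area, exponent) > 0:
            lo = mid  # conduction still exceeds the losses: surface is hotter
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Data of the example; AREA is an assumed value, not given in the text.
T_A, T_ENV = 50.0 + 273.15, 25.0 + 273.15
AREA = 0.25  # m^2, hypothetical
t_b = solve_surface_temperature(T_A, T_ENV, r_th=1.0, h_c=5.0, eps=0.5, area=AREA)
print(f"T_B = {t_b:.1f} K")
```

Bisection is used because the residual is monotonic between Tenv and TA, so a bracketed root always exists and no derivative of the nonlinear terms is needed.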

The answer is TB = 304.5 K (31.5 °C).

3. ENERGY BALANCE IN A MEDIUM: HEAT TRANSFER EQUATION

In both a whole object and an infinitesimal part of it the First Law of Thermodynamics, which describes the conservation of energy, can be expressed as a continuity equation similar to the cases of charge and mass transfer mechanisms. As in these cases, the equation (a differential equation) relates heat flow, temperature and thermal properties of materials and can be applied to model and solve both static and dynamic problems of heat transfer. In order to introduce the heat transfer law into the equation it is necessary to consider the specific laws for each type of heat transfer mechanism indicated in the above section. The solution for the heat conduction mechanism is usually sufficient for many engineering problems, where convection and radiation are not present or their effect is negligible. The equation for the conductive heat transfer in a continuous medium can be derived by imposing the principle of heat or energy conservation over an arbitrary volume (V) of the medium, which is bounded by a closed surface (S). For convenience, the energy conservation principle is expressed in a temporal rate formula, given by:

$$Q_V = Q_S + Q_a \qquad (2.15)$$

where Qv is the rate of heat increase in V, Qs is the rate of heat conduction into V across S and Qa is the rate of heat generation within V. If u denotes the specific internal energy of the medium we can state:

$$\text{Rate of heat increase in } V = Q_V = \int_V \rho \, \frac{\partial u}{\partial t} \, dV \qquad (2.16)$$

where ρ is the density of the medium and u the internal energy of the body. The specific heat (cp) of the medium can be expressed as:

$$c_p = \frac{du}{dT} \qquad (2.17)$$

where T is the average temperature of the volume V.

By introducing the concept of specific heat (cp) of the medium in the rate of heat increase in V, the equation can be written as:

$$\text{Rate of heat increase in } V = Q_V = \int_V \rho \cdot c_p \, \frac{\partial T}{\partial t} \, dV \qquad (2.18)$$

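In order to analyse the rate of heat conduction into V across S we apply Fourier's Law:

$$\text{Rate of heat conduction into } V \text{ across } S = Q_S = \oint_S -q_s \, dS = \oint_S k \cdot \nabla T \cdot \vec{n} \, dS = \int_V \mathrm{div}(k \nabla T) \, dV \qquad (2.19)$$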

where qs is the heat rate in a differential surface of S, k is the thermal conductivity of the medium and n is a unitary vector perpendicular to the surface S, to which the divergence theorem has been applied. If we define qv as the heat rate per volume produced in V, we can write:

$$\text{Rate of heat generation within } V = Q_a = \int_V q_v \, dV \qquad (2.20)$$

The energy conservation principle can be written overall as:

$$\int_V \left( \rho \cdot c_p \, \frac{\partial T}{\partial t} - \mathrm{div}(k \nabla T) - q_v \right) dV = 0 \qquad (2.21)$$

and since the volume V is arbitrary, it follows that:

$$\rho \cdot c_p \cdot \frac{\partial T}{\partial t} = \mathrm{div}(k \nabla T) + q_v \qquad (2.22)$$

The above equation is a linear equation if the conductivity k and the heat capacity ρ·cp are assumed to be constant and the heat generation qv is independent of the temperature T. Equation (2.22) can be rewritten as:

$$\frac{\partial T}{\partial t} = D \cdot \nabla^2 T + \frac{q_v}{\rho \cdot c_p} \qquad (2.23)$$

where D = k/(ρ·cp) is called the thermal diffusivity of the medium, and ∇² denotes the Laplacian operator.

Equation (2.22) is the thermal continuity equation for linear media and can give two additional well known equations:

• without internal heat generation (qv = 0):

$$\frac{\partial T}{\partial t} = D \cdot \nabla^2 T \qquad (2.24)$$

that is, the diffusion equation, and

• if temperature does not vary with time:

$$\nabla^2 T + \frac{q_v}{k} = 0 \qquad (2.25)$$

that is, the Poisson equation. When qv = 0 it is called the Laplace equation: ∇²T = 0.

There are many practical engineering problems that require the analysis of a heat transfer equation. Exact analytical solutions are only possible for homogeneous media and for very well defined boundary conditions. Moreover, the analytical treatment becomes intractable if expansion of materials, mass transfer or mechanical stress is considered. In these cases, numerical methods make for an ideal approach. The next chapter will deal with the methods used and problems solved when Computer Aided Design tools using numerical methods are applied.
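As a small illustration of the numerical approach, the sketch below integrates the 1-D diffusion equation without internal heat generation using an explicit finite-difference scheme (forward in time, centred in space). The scheme, the grid and the function name are choices made here, not taken from the text; the density and specific heat of silicon are typical approximate values assumed for the example, while k comes from Table 2.2.

```python
def diffuse_1d(t_init, d, dx, dt, steps, t_left, t_right):
    """Explicit integration of dT/dt = D * d2T/dx2 with fixed end temperatures.

    The explicit scheme is only stable when D*dt/dx^2 <= 0.5.
    """
    r = d * dt / dx ** 2
    if r > 0.5:
        raise ValueError("unstable step: D*dt/dx^2 must not exceed 0.5")
    t = list(t_init)
    t[0], t[-1] = t_left, t_right
    for _ in range(steps):
        new = t[:]
        for i in range(1, len(t) - 1):
            new[i] = t[i] + r * (t[i + 1] - 2.0 * t[i] + t[i - 1])
        t = new
    return t


# Silicon: k from Table 2.2; rho and cp are assumed typical values.
K_SI, RHO_SI, CP_SI = 140.0, 2330.0, 700.0
D_SI = K_SI / (RHO_SI * CP_SI)       # thermal diffusivity, m^2/s

DX = 1e-4                            # 11 nodes across a 1 mm bar
DT = 0.4 * DX ** 2 / D_SI            # keeps D*dt/dx^2 = 0.4
profile = diffuse_1d([300.0] * 11, D_SI, DX, DT, steps=2000,
                     t_left=350.0, t_right=300.0)
print([round(v, 2) for v in profile])
```

After enough steps the profile approaches a straight line between the two boundary temperatures, which is the stationary solution with constant gradient discussed for the 1-D homogeneous body.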

4. THERMAL ELEMENTS IN IC'S

The objective of this section is to introduce the heat sources present in an integrated circuit, which are responsible for the power dissipated by the system and the temperature increase of the circuit.

4.1 Heat sources

There are three main heat sources in nature: chemical reactions (reactions that produce heat together with the resulting components, for example, combustion reactions), radiation sources (electromagnetic waves, for example sunlight) and mechanical friction (present in all mechanisms and machines). When an electrical current flows across conductors, real insulators and semiconductors, the collision of carriers with the molecular structure of the conductor produces heat, which is in fact a type of friction. This is called the Joule effect.

4.1.1 Passive components

Resistors are responsible for heat losses and generation in general electrical circuits. Figure 2.5 shows two typical situations in electrical circuits: a resistor R (case a) and a voltage source Vs (case b) with internal resistance Rs are crossed by a current I. In both cases the components generate a heat usually expressed as a rate through power dissipation P (watts). In all electric circuits both conductors and insulators introduce losses into the system because of the Joule effect. In the first case in Figure 2.5, (a) P = R·I² and in the second (b) P = Rs·I². In the case of an AC excitation the current value in the above equations corresponds to the effective current, Irms. Ideal inductance and capacitance do not generate heat when working in a circuit; they are said to be lossless.

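[Figure: a) resistor R crossed by a current I; b) voltage source Vs with internal resistance Rs crossed by a current I.]

Figure 2.5. Lossy circuit elements.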

4.1.2 Active devices

This Joule loss effect due to current conduction can be extended to devices. Both bipolar and MOS devices dissipate heat when working in a circuit. Figure 2.6 shows four devices working in DC conditions: in case a), a diode or p-n junction dissipates a power P = I·Vac, where Vac is the anode-cathode voltage; in case b), a bipolar transistor, the dissipation is due to both base and collector currents, with the power dissipation given by P = IB·VBE + IC·VCE; and finally in case c), the MOS device, the power dissipation is only due to the channel conduction because of the null gate current, P = IDS·VDS. Section d) of Figure 2.6 shows the configuration of an MOS device as active load. The two-terminal (dipole) connection makes the MOS work in the saturation region, so the power dissipated is P = K·V·(V − Vt)²/2, where K is the transconductance factor and Vt the threshold voltage of the MOS.

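[Figure: a) p-n junction with voltage Vac; b) bipolar transistor with currents Ib, Ic and voltages Vbe, Vce; c) MOS transistor with current Ids and voltage Vds; d) MOS device connected as active load.]

Figure 2.6. Power dissipated by devices.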

4.1.3 Power dissipation due to switching activity

In a digital circuit, devices and conductors work with transient currents. In the case of CMOS circuits, the current shape corresponds to the charge (through the PMOS device) and discharge (through the NMOS device) of a capacitance CL (see Figure 2.7) that models the input capacitance of the following stages. In such conditions, both devices (PMOS and NMOS) dissipate the same power, regardless of their size. Power is dissipated in the sharp charge (PMOS) and discharge (NMOS) processes. The heat (energy) dissipated by each device in one transition is equal to the energy stored in the load capacitance: E = CL·VDD²/2, where VDD is the power supply voltage and CL the gate load capacitance.

Figure 2.7. Switching device in a CMOS circuit.
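A short sketch of the switching dissipation follows. The energy per device and transition is CL·VDD²/2, as stated above; the average power expression assumes one charge and one discharge per clock cycle, scaled by an activity factor. The activity factor and all numerical values are hypothetical additions for illustration.

```python
def switching_energy_per_device(c_load, v_dd):
    """Energy (J) dissipated in one device during one transition: C*V^2/2."""
    return 0.5 * c_load * v_dd ** 2


def dynamic_power(c_load, v_dd, frequency, activity=1.0):
    """Average switching power (W) of one gate output.

    ASSUMPTION: each active cycle contains one charge (PMOS) and one
    discharge (NMOS), so the energy per active cycle is C*V^2.
    """
    return activity * 2.0 * switching_energy_per_device(c_load, v_dd) * frequency


# Hypothetical gate: 10 fF load, 1.8 V supply, 100 MHz clock.
C_L, V_DD, F_CLK = 10e-15, 1.8, 100e6
print(switching_energy_per_device(C_L, V_DD))   # about 1.6e-14 J
print(dynamic_power(C_L, V_DD, F_CLK))          # about 3.2e-6 W
```

Multiplying the per-gate figure by the number of switching nodes gives a first estimate of why millions of devices produce a significant heat load.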

This heat dissipation is the most significant - though not the only - mechanism in CMOS digital circuits. Two other mechanisms exist: the switching short-circuit current and the leakage power. The first one appears because of the simultaneous conduction of the series PMOS and NMOS devices of the logic gate due to finite rise and fall times of voltage transitions. This causes an additional power dissipation during logic transitions that depends on the effective value of the current peak and operating frequency. The leakage power is caused by the flow of the saturation currents of the pn junctions as well as by the subthreshold currents in the MOS cut-off devices. In the latter case, the power dissipated does not depend on the operating frequency, being a time-continuous loss.

In a complex circuit, millions of devices are working, switching currents and generating heat. This heat is removed through the heat paths towards the heat sinks, which are one of the more important elements of circuit packaging. To achieve this dissipation, the circuit manifests an increase in temperature over the ambient temperature. This effect is known as self-heating in active circuits. The analysis of this phenomenon and its use as a test observable is the main objective of this book.

4.1.4 Peltier Effect

When an electrical current passes through the junction of two different conductors, heat is either released or absorbed at the junction, depending on the direction of the current. This effect is due to the difference of energy levels of the charge carriers in the two conductors that form the junction. Thus, when two different conductors named a and b are joined, the amount of heat released or absorbed can be written as:

(2.26)

where n is the number of charge carriers per volume, W is the average energy of the charge carriers and v is the mean drift carriers velocity. The electrical current is the same in both conductors, therefore:

(2.27)

where q is the electron charge. Combining the two expressions:

$$Q = I \cdot \frac{E_a - E_b}{q} = I \cdot \Pi_{ab} \qquad (2.28)$$

where Πab is the Peltier coefficient between the two conductors, defined as the amount of heat absorbed or released per unit current. It is interesting to note that the Peltier effect is, unlike the Joule effect, a reversible process. Therefore, if heat is released when current crosses the junction in one direction, the same amount of heat is absorbed if the current is reversed.

4.2 IC structure: materials and transfer

Nowadays electronic technology allows the integration of an entire electronic system on a single piece of silicon [73]. This technology, called integrated circuit technology or VLSI technology, minimises the cost of systems and makes for higher performance of their components. The number of components practically integrated in a single circuit increases geometrically due to manufacturers' tendency to reduce layout resolution (Moore's Law, [74]). In this technology, all of the circuit components are placed on the surface (planar technology), called the active region, of a silicon crystal, called the body or substrate. The bulk material used in both the active region and the substrate is silicon (Si). The interconnection of the great number of devices is achieved through a complex multilayer interconnection system deposited on the surface of the active region. Silicon oxide (SiO2), an electrical insulator, is used for the insulation and infill between layers. The upper part of the circuit is passivated with a layer of SiO2 or a similar material to prevent corrosion. The whole integrated circuit is packaged, usually in plastic or ceramic material. The packaged integrated circuit is connected to others through a printed circuit board (PCB). The active region is considered to be the place where heat is generated because of the dissipation of electronic devices (transistors and diodes), see Figure 2.8. Due to the predominance of heat conduction over convection and the high thermal conductivity of silicon and metal layers [75], heat dissipated by the device is mainly conducted through silicon and metal to the substrate, and from it heat is conducted through the package to PCB heat sinks.

[Figure: stack of layers, from top to bottom: packaging, air, passivation, interconnections, active region, substrate, base of the package, PCB.]

Figure 2.8. Layers in the thermal path of an integrated circuit.

The resulting conducting path is a complex system made up of multiple components. Each path from a device to a heat sink contributes to the final value of the effective thermal resistance. The literature includes analysis of thermal paths and values of thermal resistance for many types of package: DIP (dual in-line package) [76], CCC (ceramic chip carrier) [77], PGA (pin grid array) [78], and PLCC (plastic leadless chip carrier) [79].

5. EFFECTS OF HEAT TRANSFER IN IC'S

5.1 Temperature sensitivity of electronic devices

In this section, the effects of temperature on electronic devices used in integrated circuits will be evaluated. Three device types are considered: MOS transistors, p-n junctions and BJT transistors. Given a parameter X of an electronic circuit, the temperature sensitivity coefficient for X, $S_T^X$, is defined as:

(2.29)

This concept can be applied to any parameter of a circuit. Although R, L and C components have temperature variations, here our attention will only be devoted to active components.

5.1.1 Temperature effects in MOS transistors

There are two parameters of an MOS device that are sensitive to temperature: the carriers' mobility μ and the device threshold voltage Vt. The mobility of carriers in the channel is affected by temperature. A good approximation of this behaviour is given by [80]:

$$\mu(T) = \mu(T_0) \left( \frac{T}{T_0} \right)^{-k_1} \qquad (2.30)$$

where T is the absolute temperature of the device, T0 is a reference absolute temperature (usually room temperature) and k1 is a constant with a value between 1.5 and 2.0 [81]. The device threshold voltage Vt exhibits linear behaviour with temperature [82]:

$$V_t(T) = V_t(T_0) - k_2 \cdot (T - T_0) \qquad (2.31)$$

where k2 is a factor between 0.5 mV/K and 4 mV/K, with the range becoming larger with more heavily doped substrates and thicker oxides. The two parameters can be joined in the saturation region equation:

$$I_D = \frac{\mu(T) \, C_{ox}}{2} \, \frac{W}{L} \left( V_{GS} - V_t(T) \right)^2 \qquad (2.32)$$

from which it can be observed that an increase in temperature causes an increase of the drain current due to the decrease in Vt(T) and a decrease of the drain current due to the decrease of μ(T). Effects on mobility are predominant at high drain currents, as is the decrease of Vt(T) at low currents. A working point for the device can be found at which it becomes practically temperature independent. Applications in which the MOS transistor works in the subthreshold region (VGs

(2.33)

where ID0 is a temperature-dependent parameter, n is a coefficient around 1.5, and tA = kT / q. The main effect of temperature sensitivity is due to the exponential temperature component [80].
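The competition between mobility and threshold voltage in the saturation region can be explored with a small model. The sketch assumes a power-law mobility with exponent k1 and a linear threshold drift with slope k2, in line with the ranges quoted in this section; every device parameter below is hypothetical. The temperature-independent overdrive follows from setting the derivative of the logarithm of the current to zero, which gives VGS − Vt = 2·k2·T/k1 under these assumptions.

```python
T0 = 300.0  # reference temperature, K


def mobility(t, mu0, k1=1.5):
    """Assumed power law: mu(T) = mu(T0) * (T/T0)^(-k1)."""
    return mu0 * (t / T0) ** (-k1)


def threshold(t, vt0, k2=2e-3):
    """Assumed linear drift: Vt(T) = Vt(T0) - k2*(T - T0), k2 in V/K."""
    return vt0 - k2 * (t - T0)


def drain_current(v_gs, t, mu0=0.04, cox=5e-3, w_over_l=10.0, vt0=0.6,
                  k1=1.5, k2=2e-3):
    """Saturation current with temperature-dependent mobility and threshold."""
    v_ov = v_gs - threshold(t, vt0, k2)
    if v_ov <= 0.0:
        return 0.0  # the subthreshold region is not modelled in this sketch
    return 0.5 * mobility(t, mu0, k1) * cox * w_over_l * v_ov ** 2


def ztc_overdrive(t, k1=1.5, k2=2e-3):
    """Overdrive VGS - Vt at which dI/dT = 0 for this model."""
    return 2.0 * k2 * t / k1


for v_gs in (0.9, 3.0):
    i_cold, i_hot = drain_current(v_gs, 300.0), drain_current(v_gs, 350.0)
    trend = "increases" if i_hot > i_cold else "decreases"
    print(f"VGS = {v_gs} V: current {trend} with temperature")
print(f"zero-temperature-coefficient overdrive at 300 K: {ztc_overdrive(300.0):.2f} V")
```

At low overdrive the threshold drift dominates and the current rises with temperature, while at high overdrive the mobility loss dominates and the current falls.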

5.1.2 P-n junction diodes

The significant effect of temperature on the characteristics of a p-n junction makes these and related devices such as bipolar transistors very interesting as thermal sensors. In the case of a p-n junction, the Shockley [83] equation

$$I = I_S \left( e^{qV/kT} - 1 \right) \qquad (2.34)$$

gives us the behaviour of the conducting current I for a given forward biasing voltage V. In the equation, IS is a temperature-dependent parameter called the saturation current that depends on the type of semiconductor and the geometry of the device. In the equation, q is the electron charge, k the Boltzmann constant and T is the absolute temperature of the device.

In spite of the explicit thermal dependence of the components inside the parenthesis, the major thermal sensitivity is due to the IS parameter. This parameter can be written as [84]:

$$I_S = q \cdot A \cdot n_i^2 \left( \frac{D_h}{L_h N_D} + \frac{D_e}{L_e N_A} \right) \qquad (2.35)$$

where A is the section area of the device, ni is the intrinsic concentration of carriers, ND and NA the concentration of donors and acceptors respectively, Dh and De the diffusion coefficients for holes and electrons and Lh and Le the diffusion length for the two types of carriers. The temperature-dependent components are Dh, De, Lh, Le and ni but the overall thermal behaviour is practically only dependent on the ni² factor. This factor is heavily affected by temperature:

$$n_i^2 = K \cdot T^3 \cdot e^{-E_{g0}/kT} \qquad (2.36)$$

where K is a constant and Eg0 is the gap energy of the semiconductor at zero Kelvin. In this expression, the main effect of T comes from the exponential factor. On the basis of this analysis, it can be concluded that the diode current increases practically exponentially with temperature. The thermal sensitivity coefficient for a diode current with constant voltage biasing is given by:

$$\left. \frac{1}{I} \cdot \frac{dI}{dT} \right|_{V \, constant} = \frac{E_{g0} - qV}{kT^2} \qquad (2.37)$$

If we consider the diode voltage thermal variations for a constant biasing current, the voltage shows a practically linear decrease with temperature:

$$\left. \frac{dV}{dT} \right|_{I \, constant} \approx -\frac{E_{g0} - qV}{qT} \qquad (2.38)$$
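The two diode sensitivities can be evaluated numerically as a quick check of their orders of magnitude. The sketch takes the constant-voltage current sensitivity as (Eg0 − qV)/(kT²) and the constant-current voltage drift as −(Eg0 − qV)/(qT). The gap energy of silicon at zero kelvin (about 1.17 eV) and the bias point are assumed values, not taken from the text.

```python
K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # electron charge, C


def current_sensitivity(v, t, e_g0_ev=1.17):
    """(1/I) dI/dT at constant voltage, in 1/K. e_g0_ev is Eg0 in eV."""
    return (e_g0_ev - v) * Q_E / (K_B * t ** 2)


def voltage_drift(v, t, e_g0_ev=1.17):
    """dV/dT at constant current, in V/K (negative: V falls as T rises)."""
    return -(e_g0_ev - v) / t


# Hypothetical bias point: 0.65 V forward voltage at 300 K.
V_F, T_J = 0.65, 300.0
print(f"(1/I) dI/dT = {100 * current_sensitivity(V_F, T_J):.1f} %/K")
print(f"dV/dT       = {1e3 * voltage_drift(V_F, T_J):.2f} mV/K")
```

With these numbers the current changes by several percent per kelvin and the voltage drops by a little under 2 mV/K, which is why the junction voltage at constant current is a convenient thermal sensor.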

5.1.3 BJT devices

The impact of temperature on BJT device behaviour can be analysed with the Ebers-Moll model equations [85]. The effects are strongly interrelated with those examined in the above section for the p-n junctions.

From these equations, the expression of the collector current, IC, can be derived from the knowledge of the base current, IB, and the biasing voltages (VCE, VBE):

$$I_C = \left( \frac{\alpha_F}{1 - \alpha_F} \right) \cdot I_B + \frac{I_{CO}}{1 - \alpha_F} \left( e^{q(V_{CE} - V_{BE})/kT} - 1 \right) \qquad (2.39)$$

where ICO is the saturation current (see the concept of IS in the above section) of the collector junction and αF is the forward collector-emitter current gain. The term αF/(1 − αF) is also written as βF, the forward collector-base current gain factor. The effects of temperature on the device parameters can be summarised as:

• The saturation currents of the two p-n junctions (emitter-base and base-collector) increase with T following the principles indicated in the above section. As a rule of thumb, the ICO current doubles its value with each 10 °C increase.
• For a constant current biasing, the effect of T on the junction voltages will also follow the principles indicated for the p-n junctions. Specifically:

(2.40)

• The IC/IB current gain, β, increases markedly with T. Basically, [86] this is due to the increase in the thermal activity of carriers, which, because of greater thermal agitation, causes a decrease in the recombination factor and so increases the value of β.

As the main self-heating source in the active region is caused by IC·VCE, there is positive feedback (power consumption, temperature, current gain, power consumption) that may cause thermal runaway. This effect requires care in the heat sink design for BJT transistors to avoid the thermal destruction of the device.

5.2 Ageing mechanisms and circuit degradation

Along with the changes of characteristics in the devices analysed in the above section, heat (temperature) may produce irreversible structural degradations, which are usually caused gradually over time (ageing), and that may affect the behaviour of the device and even cause functional failure.

Heat accelerates the failure rate of components. Its study is one of the main objectives of reliability analysis. The failure probability of a device under a given set of working parameters has brought about the use of statistical models for the analysis of components' useful lifetime. The main mechanisms in circuit degradation are:

• Corrosion, the oxidising of conductors and especially contacts caused by the ambient and accelerated by mechanical stress and temperature [87]. Changes in composition (in the case of conductors) and increases in resistance due to surface oxide films (in the case of contacts) degrade the electrical component.
• Hot-carrier-induced (HCI) defects [88]. In a deep submicron MOS device the kinetic energy of channel carriers is high enough to create new energetic states in the interfaces and oxides when the carrier flux is scattered in the drain region. This phenomenon is present at the same time as new interface states are created when the hydrogen bonds Si-H that passivate the dangling bonds at the mismatched Si-SiO2 interface are activated by hot carriers during device switching. This process influences the drain current by i) increasing the local resistance due to trapping of carriers and ii) decreasing the local carrier mobility due to the charge scattering. This mechanism is accelerated by temperature.
• Time-dependent dielectric breakdown. Especially active on the thin oxide on the channel of the device, this effect leads the devices to malfunction due to the loss of insulation of the dielectric barrier [89]. The defect, called Gate-Oxide-Short or Gate-Oxide-Breakdown, degrades the device characteristics (both dynamic and static) and violates the insulated gate concept, which is basic to MOS devices (also called IGFETs, insulated gate field effect transistors). This mechanism is very sensitive to temperature.
• Electromigration. Electromigration is the phenomenon observed in conductors that causes molecular migration of the conductor structure in the direction of the electronic flow [90]. The energy of the carriers in the flow can be high enough for their collisions to displace atoms or molecules of the solid and transport them in space. This effect leads to the creation of voids that actually accelerate the process, first causing the conductor to increase its resistance and subsequently creating an open. This phenomenon is characterised by the type of material and grain compositions, the current density and temperature. In 1969, Black published a semi-empirical theory in which he related the MTTF (mean time to failure) to current density and temperature [91].

In all the previously indicated degradation mechanisms, temperature is a very significant accelerating parameter. It is so significant that high temperature testing techniques are used to detect infant mortality (screening) in electronic device manufacturing. The physics of failures uses the Arrhenius relation to model the effect of the temperature T on the failure rate. Swedish scientist Svante Arrhenius (1859-1927) presented his relation in 1889 to describe the temperature dependence of the reaction rates of chemical constituents reacting with one another. The relation is given by [92]:

$$ r = A \, e^{-\frac{E_a}{kT}} \qquad (2.41) $$

where r is the reaction rate, A is a constant, T the absolute temperature, k is Boltzmann's constant and $E_a$ the activation energy, an energy threshold required for the reaction to occur. In failure physics, Arrhenius type relations are often used to evaluate the lifetime of components with temperature. This is the case for Black's model for electromigration, the Hallberg/Peck model for corrosion and other similar models for the other mechanisms. One of the most important results of the relation is the acceleration of the rate of failure (or reaction) caused by a temperature increase. The accelerating factor a for a temperature T with respect to a given reference temperature $T_0$ is given by:

$$ a = \frac{r(T)}{r(T_0)} = e^{\frac{E_a}{k}\left(\frac{1}{T_0} - \frac{1}{T}\right)} \qquad (2.42) $$

thus showing the exponential accelerating effect of temperature on physical-chemical processes.
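As a quick numerical illustration of the accelerating factor, the following minimal sketch evaluates the ratio r(T)/r(T0) obtained from the Arrhenius relation (2.41). The activation energy (0.7 eV) and the two temperatures are assumed values chosen only for illustration; they are not taken from the text.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann's constant in eV/K


def acceleration_factor(t_kelvin, t_ref_kelvin, ea_ev):
    """Ratio r(T)/r(T0) of two Arrhenius rates r = A*exp(-Ea/(k*T))."""
    return math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_ref_kelvin - 1.0 / t_kelvin))


# Hypothetical example: Ea = 0.7 eV, reference at 55 degC, stress at 125 degC.
t_ref = 55.0 + 273.15
t_stress = 125.0 + 273.15
print(f"accelerating factor = {acceleration_factor(t_stress, t_ref, 0.7):.1f}")
```

The constant A cancels in the ratio, so only the activation energy and the two absolute temperatures are needed.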

6. CONCLUSIONS

Integrated circuits are tending towards very complex electronic systems in which millions of devices are arranged on a single semiconductor crystal. The Joule effect relates the passage and switching of currents across conductors and devices to heat generation. There are three mechanisms of heat transfer from heat sources to heat sinks, but the conduction mechanism is predominant in the case of integrated circuits. The effective thermal resistance of a circuit corresponds to a complex mesh of heat conductors. Packaging modelling deals with the 3-D evaluation of such heat propagation in real IC's. The conduction mechanism follows well-known equations; however, the complexity of a real case forces the use of numerical analysis. The models of heat transfer also allow the evaluation of heat transfer over time. Nowadays, heat management is an integral part of VLSI design. Heat affects the behaviour of electronic devices because of its impact on the thermal agitation of structure and carriers. Heat is also a source of structural (and consequently functional) degradation. This fact affects both conductors and insulators, and is an essential parameter in the reliability analysis of a component.

7. APPENDIX: UNITS AND CONVERSION FACTORS

There are two sets of units used worldwide for energy, heat transfer and temperature. The SI (Système International) uses the kilogram (kg) for mass, the metre (m) for length, the second (s) for time and the newton (N) for force. The English Engineering Units use the pound mass (lbm) for mass, the foot (ft) for length, the second (s) for time and the pound force (lbf) for force.

The conversion factors between the magnitudes are given by:

1 N = 0.22481 lbf
1 lbf = 4.448 N
1 m = 3.2808 ft
1 ft = 0.3048 m
1 kg = 2.2046 lbm
1 lbm = 0.4536 kg

In the SI the unit of energy is the joule (J, 1 joule = 1 newton × 1 metre). In addition to the joule there are special units for energy: the calorie (cal) used for heat, the Btu (British thermal unit) used in the English System and the electron-volt (eV). The conversion factors between them are:

1 J = 1 N·m = 9.4787 × 10⁻⁴ Btu = 6.242 × 10¹⁸ eV = 0.2388 cal
1 Btu = 1055.056 J
1 cal = 4.1868 J

Power is measured in watts (W = J/s, SI) and in Btu/h (Btu per hour, English system):

1 W = 3.4123 Btu/h
1 Btu/h = 0.2931 W
1 Btu/s = 1055.1 W

Temperature is usually measured in degrees Celsius (°C), degrees Fahrenheit (°F) and kelvins (K, absolute temperature). The conversion factors between them are:

T(K) = T(°C) + 273.15 = [T(°F) + 459.67]/1.8
T(°C) = [T(°F) - 32]/1.8
T(°F) = 1.8 T(°C) + 32 = 1.8 [T(K) - 273.15] + 32
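The temperature conversions above translate directly into a few helper functions. This is only a sketch; the function names are our own and are not part of any library.

```python
def celsius_to_kelvin(t_c):
    """T(K) = T(degC) + 273.15"""
    return t_c + 273.15


def fahrenheit_to_kelvin(t_f):
    """T(K) = [T(degF) + 459.67] / 1.8"""
    return (t_f + 459.67) / 1.8


def fahrenheit_to_celsius(t_f):
    """T(degC) = [T(degF) - 32] / 1.8"""
    return (t_f - 32.0) / 1.8


def celsius_to_fahrenheit(t_c):
    """T(degF) = 1.8 * T(degC) + 32"""
    return 1.8 * t_c + 32.0


# Round trip for an arbitrary example temperature of 85 degC.
t_f = celsius_to_fahrenheit(85.0)
print(t_f, fahrenheit_to_celsius(t_f), celsius_to_kelvin(85.0))
```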

The increments of temperature can be expressed in the three units with the following relation:

ΔT: 1 K = 1 °C = 1.8 °F

The heat transfer coefficient can be expressed in the following units, showing the conversion factor:

1 W/(m²·K) = 0.17611 Btu/(h·ft²·°F)

The specific heat can be expressed as:

1 J/(kg·K) = 2.3886 × 10⁻⁴ Btu/(lbm·°F)

in the case of the thermal conductivity:

1 W/(m·K) = 0.57782 Btu/(h·ft·°F)

and finally the thermal resistance can be expressed and converted as follows:

1 K/W = 0.52750 °F·h/Btu

8. REFERENCES

[64] Rohsenow, W., Hartnett, J.P. and Cho, Y.I., "Handbook of Heat Transfer", McGraw-Hill Handbooks, 1998.
[65] Antonetti, V.W. and Yovanovich, M.M., "Enhancement of thermal contact conductances by metallic coating: theory and experiments", Journal of Heat Transfer (107): 513-519, 1985.
[66] Bush, A.W. et al., "The elastic contact of a rough surface", Wear, vol. 35, pp. 87-111, 1975.
[67] Mikic, B.B. and Rohsenow, Thermal contact resistance, Mechanical Eng. Report No. DSR 74542-41, MIT, Cambridge, 1966.
[68] Al-Arabi, M. and El-Riedy, M.K., "Natural convection heat transfer from isothermal horizontal plates of different shapes", Int. J. Heat Mass Transfer (19): 1399-1404, 1976.
[69] Shah, R.K. et al., "Laminar flow forced convection in ducts", Supplement 1 to Advances in Heat Transfer, eds. T.F. Irvine and J.P. Hartnett, Academic Press, New York, 1978.
[70] Siegel, R. and Howell, J.R., Thermal Radiation Heat Transfer, 3rd ed., Hemisphere/Taylor and Francis, Washington, D.C., 1992.
[71] Brewster, M.Q., Thermal radiative transfer and properties, John Wiley and Sons, New York, 1992.
[72] Modest, M.M., Radiative heat transfer, McGraw-Hill, New York, 1993.
[73] Perry, T.S., "For the record: Kilby and the IC", IEEE Spectrum, vol. 25, no. 12, Dec. 1988, pp. 40-41.
[74] Borkar, S., "Obeying Moore's law beyond 0.18 micron", Proceedings of the 13th Annual IEEE International ASIC/SOC Conference, 2000, pp. 26-31.
[75] Bakoglu, H.B., Circuits, interconnections and packaging for VLSI, Addison-Wesley, 1990.
[76] Andrews, J.A. et al., "Thermal characteristics of 16 and 40-pin plastic DIP's", IEEE Transactions on Components, Hybrids and Manufacturing Technology, vol. CHMT-4, no. 4, 1981, pp. 455-461.
[77] Mahalingam, M., "Thermal management on packaging", Proceedings IEEE, vol. 73, no. 9, 1985, pp. 1396-1404.
[78] Mahalingam, M., "Thermal studies on Pin Grid Array Packages for High Density LSI and VLSI logic circuits", IEEE Transactions on Components, Hybrids and Manufacturing Technology, vol. CHMT-6, no. 3, 1983, pp. 246-256.
[79] Krueger, W. and Bar-Cohen, A., "Thermal characterization on a PLCC-expanded Rjc methodology", IEEE Transactions on Components, Hybrids and Manufacturing Technology, vol. 15, no. 5, 1992, pp. 691-698.
[80] Tsividis, Yannis P., "Operation and modelling of the MOS transistor", McGraw Hill, 1989.
[81] Klaasen, F.M., "MOS devices modelling" in Design of VLSI Circuits for Telecommunications, Y. Tsividis and P. Antognetti (editors), Prentice Hall, 1985.
[82] Klaasen, F.M. and Hes, W., "On the temperature coefficient of the MOSFET threshold voltage", Solid State Electronics, vol. 29, pp. 787-789, 1986.
[83] Shockley, W., "The theory of p-n junctions in semiconductors and p-n transistors", Bell System Tech. J., 28, pp. 435-489, July 1949.
[84] Gray, P.E. et al., "Physical electronics and circuit models of transistors", John Wiley, 1970.
[85] Ebers, J.J. and Moll, J.L., "Large-signal behaviour of junction transistors", Proc. IRE, 42, pp. 1761-1772, December 1954.
[86] Gray, P.E. et al., Physical electronics and circuit models for transistors, John Wiley and Sons, 1970.
[87] Eriksson, P. et al., "Design of accelerated corrosion tests for electronic components in automotive applications", IEEE Transactions on Components, Hybrids and Manufacturing Technology, Part A, vol. 24, no. 1, March 2001, pp. 99-107.
[88] Hess, K. et al., "The physics of determining chip reliability", IEEE Circuits and Systems, May 2001, pp. 33-39.
[89] Mizubayashi, W. et al., "Statistical analysis of soft breakdown in ultrathin gate oxides", Symposium on VLSI technology, 2001, pp. 95-96.
[90] Parikh, S. et al., "Defect and electromigration characterization of a two level copper interconnect", Interconnect Technology Conference, 2001, pp. 183-185.
[91] Black, J.R., "Electromigration failure modes in aluminium metallization for semiconductor devices", Proc. IEEE, vol. 57, p. 1587, 1969.
[92] Manca, J.V. et al., "The Arrhenius relation for electronics in extreme temperature conditions", Third Conference on High Temperature Electronics, 1999, pp. 29-32.

Chapter 3

Thermal analysis in integrated circuits

1. INTRODUCTION

The goal of this chapter is to present different techniques to obtain temperature maps or temperature waveforms at certain areas or points of an integrated circuit (IC). The data we need to perform this analysis are: a description of the physical structure of the IC, the thermal properties of the materials from which it is made, the placement of the devices that act as heat sources, a description of their power consumption waveforms and, finally, the thermal conditions of the IC's surroundings, which will determine what we will call the boundary conditions. The relation between all this initial data and temperature is given by the heat transfer equation, which should be solved inside the structure of the IC. In this chapter we will first present techniques to solve this equation, and second, ways to perform thermal and electrical circuit simulations simultaneously. It is not our intention to discuss all of the techniques available to perform thermal analysis, but to present the principles of the main ones. Specifically, we will focus on the techniques that are going to be used in the next chapter to study the use of temperature for test purposes, or that are used in the specialised literature of that field. At the end of this chapter the reader will find a detailed bibliography with full descriptions and application examples for the various techniques available.

This chapter is organised as follows: in Section 2 we present definitions in order to clearly differentiate between thermal and electro-thermal analysis and to present the terminology and meaning of the different boundary conditions. In Section 3 we introduce techniques for thermal analysis of IC's, and in Section 4 two different ways to perform electro-thermal simulations are shown. Finally, chapter conclusions and references are covered in Section 5.

2. DEFINITIONS

2.1 Thermal analysis versus electro-thermal analysis

First, we must differentiate between thermal analysis and electro-thermal analysis. The goal of thermal analysis is to obtain temperature values as a function of the power dissipated in some IC locations when this power function is temperature independent. In IC's, thermal analysis is usually performed to extract figures of merit on the transfer medium, i.e., the IC structure. The most typical figures of merit are the thermal resistance and the thermal coupling resistance. The former relates the temperature of a device to its power dissipation (self heating), whereas the latter relates the temperature at one point to the power dissipated by a device placed at a different point (thermal coupling). The units of both figures of merit are °C/W, and their use assumes linearity, which means that the thermal properties of the materials involved in the analysis are temperature independent.

Electro-thermal analysis is performed when the power function dissipated in some locations of the IC is temperature dependent. As explained in the previous chapter, the electrical behaviour of semiconductor devices is strongly temperature dependent. This implies that if an accurate analysis of an electrical circuit is to be carried out, thermal and electrical analyses have to be coupled as described in Figure 3.1.

[Figure 3.1: block diagram. The electrical analysis of circuits (units: [V], [A]) has electrical input nodes and electrical output nodes and passes the power dissipated by the devices to the thermal analysis of circuits (units: [W], [°C]), which has thermal output nodes and returns the temperature of each device.]

Figure 3.1: Coupling between thermal and electrical analysis in an electro-thermal analysis.

Usually, this analysis is performed in physical design stages. Its goal is to prevent performance degradations due to temperature imbalances between devices or to analyse electro-thermal circuits, i.e., circuits in which the information is supported not only by voltage and current functions, but also by temperature, taking advantage of the thermal coupling that exists between devices placed on the same IC. Examples of electro-thermal circuits can be found in [1] to [5].
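The coupling between the electrical and the thermal analysis can be illustrated with a deliberately simple sketch: the electrical analysis returns the dissipated power for a given device temperature, the thermal analysis returns the device temperature for a given power, and the two are iterated until the temperature stops changing. The device model (a resistor driven by a constant current, with a linear temperature coefficient), the single thermal resistance and all numerical values are assumptions made only for this illustration; they do not correspond to the simulation techniques presented later in this chapter.

```python
def electrical_power(temp_c, current_a=0.1, r0_ohm=100.0, alpha=4e-3, t0_c=25.0):
    """Electrical analysis: power of a hypothetical resistor whose value
    rises linearly with temperature, driven by a constant current."""
    resistance = r0_ohm * (1.0 + alpha * (temp_c - t0_c))
    return current_a ** 2 * resistance


def thermal_temperature(power_w, t_amb_c=25.0, rth_c_per_w=50.0):
    """Thermal analysis: self heating through a single thermal resistance."""
    return t_amb_c + rth_c_per_w * power_w


def electro_thermal_solve(t_amb_c=25.0, tol=1e-9, max_iter=100):
    """Alternate both analyses until the device temperature converges."""
    temp = t_amb_c
    for _ in range(max_iter):
        power = electrical_power(temp)
        new_temp = thermal_temperature(power, t_amb_c)
        if abs(new_temp - temp) < tol:
            return new_temp, power
        temp = new_temp
    raise RuntimeError("electro-thermal iteration did not converge")


temperature, power = electro_thermal_solve()
print(f"T = {temperature:.2f} degC, P = {power:.3f} W")
```

With these assumed values each pass reduces the temperature error by a constant factor, so the loop converges; a stronger temperature dependence of the power could make this plain iteration diverge, which is one reason why dedicated electro-thermal solvers are used in practice.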

2.2 Boundary conditions

Now we will focus on the thermal analysis of IC's. The goal is to solve the heat transfer equation. The question is: in which region of space should we solve this equation? In the previous chapter it was shown that most of the heat generated by the devices is transferred to the heat sink by conduction through the solid part of the IC. To simplify the analysis process, it would be interesting to restrict the analysis to a portion or all of this solid part. Thus, mathematically, the solid part subject to analysis should be uncoupled from the material surrounding it, obviously maintaining the continuity of the thermal transfer mechanisms in the border surface that separates the analysed solid from the surrounding environment. This is achieved by specifying the thermal conditions of that border surface: the boundary conditions. Considering the specific problem of thermal analysis of IC's, the boundary conditions typically used are:

-First kind or Dirichlet boundary condition: the temperature, T, of the boundary surface is specified:

$$ T(\mathbf{r},t)\big|_{\mathbf{r}=\mathbf{r}_B} = f(\mathbf{r}_B,t) \qquad (3.1) $$

where $f$ is the specified temperature function, $\mathbf{r}_B$ are the spatial coordinates of the boundary surface and t is time. A special case of this boundary condition is the isothermal boundary condition:

$$ T(\mathbf{r},t)\big|_{\mathbf{r}=\mathbf{r}_B} = T_B \qquad (3.2) $$

where $T_B$ is a time-independent and space-independent temperature.

-Second kind or Neumann boundary condition: the heat flow through the boundary surface is specified:

$$ -k\,\frac{\partial T(\mathbf{r},t)}{\partial n_B}\bigg|_{\mathbf{r}=\mathbf{r}_B} = g(\mathbf{r}_B,t) \qquad (3.3) $$

where $g$ is the specified heat flow, k is the thermal conductivity and $n_B$ is the direction orthogonal to the boundary surface at the coordinate $\mathbf{r}_B$. A special case of this boundary condition is the adiabatic boundary condition, where the heat flow through the boundary surface is zero:

$$ \frac{\partial T(\mathbf{r},t)}{\partial n_B}\bigg|_{\mathbf{r}=\mathbf{r}_B} = 0 \qquad (3.4) $$

-Third kind, mixed or convective boundary condition: Newton's law of cooling is specified at the border surface:

$$ -k\,\frac{\partial T(\mathbf{r},t)}{\partial n_B}\bigg|_{\mathbf{r}=\mathbf{r}_B} = h_c\,\big[T(\mathbf{r}_B,t) - T_E\big] \qquad (3.5) $$

where $h_c$ is the convective heat transfer coefficient of the boundary surface and $T_E$ is the temperature of the fluid exterior to the analysed solid part (by specifying a constant temperature, this fluid is assumed to be a heat sink).

2.2.1 Example 1: Application of boundary conditions for an IC analysis

Figure 3.2 shows a silicon die of an IC in a rectangular coordinate system. The die extends from the coordinate (0,0,0) to (L,W,H). Its surface is located at the plane z=0 and an MOS transistor dissipating a static power $P_0$ is placed at the coordinate ($x_d$, $y_d$, 0). Let us suppose that the goal of the thermal analysis is to determine the temperature difference between points A and B due to this power consumption.

[Figure 3.2: silicon die with its origin at (0,0,0), dimensions L, W and H, and the MOS transistor placed at ($x_d$, $y_d$) on the top surface.]

Figure 3.2: Representation of the IC described in Example 1.

One approach would be to restrict the thermal analysis to the silicon die with the following boundary conditions:

-Bottom boundary surface isothermal, as it is assumed to be in contact with a good thermal conductor (usually, silicon dies are attached to a metal layer used as a heat spreader with a thin layer of high thermal conductive glue). Since we aim to find a temperature difference, this reference temperature may have an arbitrary value. For the sake of simplicity, we set it to zero.

-Lateral boundary surfaces adiabatic, as heat flowing through these surfaces can be neglected due to their small surface area in comparison with that of the bottom boundary. This lateral area is usually in contact with a low thermal conductive material or still air. If the temperature differences between the solid and the air are small, convection can be neglected in favour of conduction through the silicon die.

-Top boundary surface adiabatic, as low thermal conductive layers, such as silicon dioxide and a passivation layer, are placed over the silicon. Additionally, these layers can be in contact with still air.

With these boundary conditions, the Poisson equation of heat conduction can be solved in the silicon die:

$$ -k\,\nabla^2 T(x,y,z) = P(x,y,z) \qquad (3.6) $$

where P(x,y,z) is the power density function dissipated inside the silicon die [W/m³]. In MOS transistors, power is mainly dissipated in the channel, beneath the gate. Therefore, in this example the power density function is:

$$ P(x,y,z) = \begin{cases} \dfrac{P_0}{w \cdot l \cdot d} & \text{if } (x,y,z) \in \text{channel of the MOS transistor} \\ 0 & \text{if } (x,y,z) \notin \text{channel of the MOS transistor} \end{cases} \qquad (3.7) $$

where d is the MOS channel depth and w, l are, respectively, its channel width and length. As d is very small in comparison with the dimensions of the silicon die and with the distance between the MOS transistor and points A and B, the analysis can be simplified by incorporating this power dissipation into the top boundary surface. The top surface is then adiabatic everywhere except in the area over the device, where an incoming heat flow equal to $P_0/(w \cdot l)$ [W/m²] is applied. With this simplification, no heat is generated inside the silicon die, and the temperature at points A and B can be found by solving the Laplace equation of heat conduction:

$$ \nabla^2 T(x,y,z) = 0 \qquad (3.8) $$

The mathematical expression of the boundary conditions would be:

$$ \begin{aligned} & T\big|_{z=H} = 0 \\ & \frac{\partial T}{\partial x}\bigg|_{x=0} = \frac{\partial T}{\partial x}\bigg|_{x=L} = \frac{\partial T}{\partial y}\bigg|_{y=0} = \frac{\partial T}{\partial y}\bigg|_{y=W} = 0 \\ & -k\,\frac{\partial T}{\partial z}\bigg|_{z=0} = \begin{cases} \dfrac{P_0}{l \cdot w} & (x,y) \in \left[\left(x_d - \tfrac{l}{2},\, y_d - \tfrac{w}{2}\right), \left(x_d + \tfrac{l}{2},\, y_d + \tfrac{w}{2}\right)\right] \\ 0 & \text{otherwise} \end{cases} \end{aligned} \qquad (3.9) $$

It is important to remember that these boundary conditions are simplifications of reality that allow us to uncouple the region of analysis from the rest of the universe. The temperature values obtained with this analysis will agree with the real ones only to the extent that these simplifications are accurate. For instance, [5] shows how the isothermal bottom boundary condition is not accurate enough for the thermal analysis of power IC's. In that reference, the authors analyse a structure formed by two layers: a silicon die over a metal die.

3. THERMAL ANALYSIS OF INTEGRATED CIRCUITS

There are two different approaches to the thermal analysis of IC's: analytical methods and numerical methods. The goal of analytical methods is to obtain a mathematical expression that gives the temperature inside the analysed region as a function of all the variables that may affect it: power dissipated by the heat sources, their location, thermal properties of the materials, boundary conditions, etc. The great advantage of this technique is that it facilitates parametric analysis, that is, analysis of temperature behaviour as a function of one of the variables present in the obtained expression. Its main drawback is that it can only be used when the geometry of the region under analysis can be easily described in one of the three coordinate systems (rectangular, cylindrical or spherical), and when the number of heat sources is small and the dissipated power is either a constant, a step function or a periodic function.

Numerical methods discretise the region under analysis into a mesh of nodes and generate a set of linear equations in which the unknown quantities are the temperatures of the different nodes. The main advantage of this method is that it limits neither the geometry description of the region under analysis, the number of heat sources nor the time description of their power dissipation. An additional advantage of these methods is that they allow the coupling of simulations from different domains: thermal, optical, mechanical, etc. The restrictions of this technique are imposed by the computational resources available to solve the linear equation system.
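To make the idea of a numerical method concrete, the sketch below discretises the simplest possible case: steady one-dimensional conduction through the thickness of a die, with an isothermal bottom surface (T = 0) and a known heat flow entering through the top surface. Each node contributes one linear equation and the resulting tridiagonal system is solved directly. The finite-difference scheme, the function names and the numerical values (thermal conductivity of 150 W/(m·K), heat flow of 10⁶ W/m²) are assumptions made for illustration; real thermal simulators work on 2-D or 3-D meshes.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are not used."""
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def slab_temperatures(thickness_m, k_w_mk, flux_w_m2, n_nodes=41):
    """Node temperatures from the bottom (index 0) to the top (index -1)."""
    h = thickness_m / (n_nodes - 1)
    # Row 0: isothermal bottom, T_0 = 0.
    # Rows 1..n-2: T_(i-1) - 2*T_i + T_(i+1) = 0 (no internal heat generation).
    # Last row: k*(T_N - T_(N-1))/h = flux (known heat flow at the top).
    lower = [0.0] + [1.0] * (n_nodes - 2) + [-1.0]
    diag = [1.0] + [-2.0] * (n_nodes - 2) + [1.0]
    upper = [0.0] + [1.0] * (n_nodes - 2) + [0.0]
    rhs = [0.0] * (n_nodes - 1) + [flux_w_m2 * h / k_w_mk]
    return solve_tridiagonal(lower, diag, upper, rhs)


temps = slab_temperatures(400e-6, 150.0, 1.0e6)
print(f"top surface temperature rise: {temps[-1]:.3f} K")
```

In this one-dimensional case the exact solution is linear in depth, so the top temperature can be checked against the flux times the thickness divided by the conductivity.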

3.1 Analytical methods

In this section we will consider the case of thermal analysis with just one heat source present. If the problem has n heat sources, the final solution would be the superposition of the n individual temperatures obtained with just one heat source. By doing so, we assume linearity, i.e., the thermal properties of the materials are temperature independent. Generally, analytic solutions of the heat conduction equation are classified into three main categories:

-Closed form solutions.
-Fourier series summation (separation of variables).
-Approximated solutions.

Closed form analytic solutions are the most interesting, due to their simplicity and applicability. For instance, if a spherical heat source with radius $r_0$ that dissipates a constant power P is placed inside an infinite and homogeneous medium, a closed form for the temperature can be found for $r > r_0$ from Fourier's law of conduction:

$$ q = \frac{P}{4\pi r^2} = -k\,\frac{dT}{dr} \;\Rightarrow\; dT = -\frac{P}{4\pi k}\cdot\frac{dr}{r^2} \qquad (3.10) $$

integrating:

$$ T(r) = \frac{P}{4\pi k r} + C \quad \text{for } r > r_0 \qquad (3.11) $$

where C is an integration constant. As $T = 0$ for $r \to \infty$, $C = 0$. This equation is very simple and easy to handle; however, it may not be valid when the problem has boundary conditions, such as the case described in Example 1.

When boundary conditions exist, the analytical solution can be found by using the method of separation of variables and series decomposition of the temperature function. This technique is presented in the following examples. In Example 2, we will calculate the temperature inside a silicon die when temperature depends on the spatial variables x and y. In Example 3 we show how the method presented in Example 2 can be extended for the calculation of the temperature map inside the silicon die when it depends on the three spatial coordinates x, y, z and time. In Example 4 we demonstrate how temperature can be calculated in a cylindrical silicon die. Example 5 shows how cylindrical coordinates can be used for the analysis of IC's, comparing AC thermal analysis performed with both cylindrical and rectangular coordinates. Finally, Example 6 shows how thermal maps can be obtained in a multilayer structure.

In the context of IC thermal analysis, some specific problems may have approximated solutions. These are obtained by either simplifying exact solutions or combining closed form solutions. They are not presented in this book, but the reader may find some examples in [6], [7] and [8].
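The closed form obtained for the spherical heat source, with the integration constant set to zero, is simple enough to evaluate in a couple of lines. The sketch below is only illustrative: the power, thermal conductivity and radii are assumed values.

```python
import math


def temperature_rise_sphere(power_w, k_w_mk, r_m, r0_m):
    """T(r) = P / (4*pi*k*r), valid only outside the source (r > r0)."""
    if r_m <= r0_m:
        raise ValueError("the closed form only holds for r > r0")
    return power_w / (4.0 * math.pi * k_w_mk * r_m)


# Hypothetical case: 1 W source of radius 10 um in a medium with k = 150 W/(m.K).
for r in (50e-6, 100e-6, 200e-6):
    print(f"r = {r * 1e6:5.0f} um  ->  T = {temperature_rise_sphere(1.0, 150.0, r, 10e-6):.2f} K")
```

The temperature falls as 1/r, so doubling the distance to the source halves the temperature rise.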

3.1.1 Example 2: Presentation of the method. Calculation of a static two-dimensional temperature map

Figure 3.3 shows a silicon die placed over a metal layer kept at a constant temperature. There is a resistor strip over the silicon die that dissipates a constant power P. The goal of this example is to perform a thermal analysis to find the temperature inside the silicon die. If the strip is very long, in some areas inside the silicon the temperature variations in the z direction can be assumed to be zero. Although it is not very realistic, the reason for this assumption is to reduce the three-dimensional problem to a two-dimensional one for pedagogical reasons. Thus, the aim is to find the temperature function T(x,y) inside the silicon die. As explained in Example 1, the thermal analysis is restricted to the silicon die. The boundary conditions of that problem would be:

i) Bottom surface isothermal. For the sake of convenience, the value T=0 is chosen. Later, the real temperature of the metal has to be added to the expression obtained with this boundary condition.

ii) Lateral surface adiabatic. The heat flow through the lateral surface is neglected with respect to the heat flowing through the bottom surface.

iii) The heat flow through the top surface is known and equal to:

$$ -k\,\frac{\partial T(x,y)}{\partial y}\bigg|_{y=H} = \begin{cases} -\dfrac{P}{x_b - x_a} = -P_d & x_a \le x \le x_b \\ 0 & \text{otherwise} \end{cases} \qquad (3.12) $$

where $x_a$ and $x_b$ are the x coordinates of the edges of the resistor strip, and P is taken per unit length of the strip in the z direction.

The temperature function T(x,y) can be found by solving the Laplace equation inside the silicon die:

$$ \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = 0 \qquad (3.13) $$

The method of separation of variables assumes that the function T(x,y) can be expressed as the product of a function that only depends on x, X(x), and a function that only depends on y, Y(y):

$$ T(x,y) = X(x) \cdot Y(y) \qquad (3.14) $$

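[Figure 3.3: top view and lateral view of the structure. A resistor dissipating a power P lies on the silicon dioxide that covers the silicon die; in the lateral view the die extends from (0,0) to (L,0) along x and up to (0,H) along y, and rests on a metal layer at T = constant.]

Figure 3.3: Structure analysed in Example 2.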

Therefore, the heat transfer equation (3.13) can be written as:

$$ Y\,\frac{d^2X}{dx^2} + X\,\frac{d^2Y}{dy^2} = 0 \qquad (3.15) $$

Dividing both sides of equation (3.15) by T(x,y) and rearranging:

$$ -\frac{1}{X}\frac{d^2X}{dx^2} = \frac{1}{Y}\frac{d^2Y}{dy^2} \qquad (3.16) $$

The left side of equation (3.16) only depends on x, whereas the right side only depends on y. This equality has to be satisfied for all the possible values of x and y. This is only possible if both sides of (3.16) are equal to a constant, which is chosen to be $\alpha^2$. Rearranging, two ordinary differential equations are obtained:

$$ \frac{d^2X}{dx^2} + \alpha^2 X = 0, \qquad \frac{d^2Y}{dy^2} - \alpha^2 Y = 0 \qquad (3.17) $$

The set of values $\alpha$ that satisfy (3.17) and the boundary conditions are the eigenvalues of the problem, whereas the functions X, Y that satisfy both the equations and the boundary conditions are the eigenfunctions of the problem. When $\alpha \neq 0$, the general solution of (3.17) is:

$$ X = A\cos(\alpha x) + B\sin(\alpha x), \qquad Y = C\cosh(\alpha y) + D\sinh(\alpha y) \qquad (3.18) $$

and when $\alpha = 0$:

$$ X = ax + b, \qquad Y = cy + d \qquad (3.19) $$

Focusing on (3.18) and considering the boundary conditions:

$$ \begin{aligned} & \frac{dX}{dx} = -\alpha A\sin(\alpha x) + \alpha B\cos(\alpha x) \\ & \frac{dX}{dx}\bigg|_{x=0} = 0 \Rightarrow B = 0 \\ & \frac{dX}{dx}\bigg|_{x=L} = 0 \Rightarrow \sin(\alpha L) = 0 \Rightarrow \alpha_n = \frac{n\pi}{L}, \quad n = 1, 2, 3, \ldots \\ & Y\big|_{y=0} = 0 \Rightarrow C = 0 \end{aligned} \qquad (3.20) $$

Therefore, the eigenfunctions of the problem are, for $\alpha_n \neq 0$:

$$ T_n(x,y) = C_n\cos(\alpha_n x)\sinh(\alpha_n y), \qquad C_n = A_n \cdot D_n \qquad (3.21) $$

The solution for n = 0 ($\alpha = 0$) is:

$$ \begin{aligned} & T_0(x,y) = (ax + b)(cy + d) = C_0\,y, \qquad C_0 = bc \\ & a = 0 \ \text{ as } \ \frac{dX}{dx}\bigg|_{x=L} = 0 \\ & d = 0 \ \text{ as } \ Y(0) = 0 \end{aligned} \qquad (3.22) $$

As equation (3.17) is a linear differential equation, its general solution is a sum of the various eigenfunctions:

$$ T(x,y) = C_0\,y + \sum_{n=1}^{\infty} C_n\cos(\alpha_n x)\sinh(\alpha_n y) \qquad (3.23) $$

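The value of the constants $C_j$ is derived when the heat flow entering from the top surface is written in terms of an infinite series of cosine functions, or Fourier series:

$$ -k\,\frac{\partial T}{\partial y}\bigg|_{y=H} = g(x) \;\Rightarrow\; \frac{\partial T}{\partial y}\bigg|_{y=H} = -\frac{g(x)}{k} = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos(\alpha_n x) \qquad (3.24) $$

where:

$$ a_0 = 2\,C_0, \qquad a_n = C_n\,\alpha_n\cosh(\alpha_n H) \quad (n = 1, 2, 3, \ldots) \qquad (3.25a) $$

From the Fourier series theory, the value of the coefficients $a_j$ can be found with the following expression:

$$ a_j = -\frac{2}{L}\int_0^L \frac{g(x)}{k}\cos(\alpha_j x)\,dx \quad (j = 0, 1, 2, 3, \ldots) \qquad (3.25b) $$

For the specific problem depicted in Figure 3.3 and the boundary conditions of (3.12) the different terms $a_j$ would be:

$$ a_0 = \frac{2P}{kL}, \qquad a_j = \frac{2P}{kL}\cos\!\left(\alpha_j\,\frac{x_a + x_b}{2}\right)\mathrm{Sinc}\!\left(\alpha_j\,\frac{x_b - x_a}{2}\right) \quad (j = 1, 2, 3, \ldots) \qquad (3.26) $$

where

$$ \mathrm{Sinc}(x) = \frac{\sin(x)}{x} \qquad (3.27) $$

The final expression for the temperature is:

$$ T(x,y) = \frac{P}{Lk}\,y + \sum_{n=1}^{\infty} \frac{a_n}{\alpha_n\cosh(\alpha_n H)}\cos(\alpha_n x)\sinh(\alpha_n y) \qquad (3.28) $$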
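The series solution of this example is easy to evaluate numerically. The sketch below sums a finite number of terms of the cosine series for a strip located between x_a and x_b. It assumes the standard cosine-series normalisation, so that the uniform term is P y/(L k); this follows from requiring that all the power P (per unit length of the strip) leaves through the isothermal bottom surface. The function name, the number of terms and all numerical values are assumptions made for illustration.

```python
import math


def sinc(x):
    """Sinc(x) = sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def strip_temperature(x, y, power_w_per_m, k, length, height, x_a, x_b, n_terms=200):
    """Truncated cosine series for T(x, y); T = 0 on the bottom surface y = 0."""
    x_c = 0.5 * (x_a + x_b)          # centre of the strip
    half_width = 0.5 * (x_b - x_a)   # half of the strip width
    a0_half = power_w_per_m / (k * length)
    temp = a0_half * y
    for n in range(1, n_terms + 1):
        alpha = n * math.pi / length
        a_n = 2.0 * a0_half * math.cos(alpha * x_c) * sinc(alpha * half_width)
        # sinh(alpha*y)/cosh(alpha*H), written with decaying exponentials
        # so that large arguments do not overflow.
        ratio = (math.exp(alpha * (y - height)) * (1.0 - math.exp(-2.0 * alpha * y))
                 / (1.0 + math.exp(-2.0 * alpha * height)))
        temp += a_n / alpha * math.cos(alpha * x) * ratio
    return temp


# Hypothetical die: L = 3 mm, H = 0.4 mm, k = 150 W/(m.K), 100 W/m in a 0.2 mm strip.
L, H, K, P = 3e-3, 0.4e-3, 150.0, 100.0
print(strip_temperature(1.5e-3, H, P, K, L, H, 1.4e-3, 1.6e-3))  # under the strip
print(strip_temperature(0.0, H, P, K, L, H, 1.4e-3, 1.6e-3))     # at the die edge
```

A useful sanity check is the limiting case of a strip that covers the whole top surface: all the cosine terms vanish and the series reduces to the one-dimensional result P y/(L k).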

3.1.2 Example 3: Calculation of a three-dimensional time dependent temperature map

In this example, we intend to indicate how the previous example could be extended to the calculation of a time dependent three-dimensional temperature map. In addition to this, we will discuss the number of terms of the series that should be taken into account during the calculation process. In a three-dimensional case, the equation to solve in rectangular coordinates is:

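$$ \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} = \frac{1}{D}\frac{\partial T}{\partial t} \qquad (3.29) $$

Now, the method of separation of variables assumes that the function T(x,y,z,t) can be expressed as the product of four functions that only depend on x, y, z, and t respectively:

$$ T(x,y,z,t) = X(x) \cdot Y(y) \cdot Z(z) \cdot \Psi(t) \qquad (3.30) $$

There are many ways to deal with the time-dependency of temperature. One approach would be to perform the Laplace transform of expression (3.30). Then, after substituting and rearranging as in the above example we obtain:

$$ \frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + \frac{1}{Z}\frac{d^2Z}{dz^2} - \frac{s}{D} = 0 \qquad (3.31) $$

where D is the thermal diffusivity and s is the transformed variable in the Laplace domain. We have also assumed null initial conditions:

$$ D = \frac{k}{\rho \cdot c}, \qquad L\{\Psi(t)\} = \hat{\Psi}(s), \qquad L\left\{\frac{d\Psi(t)}{dt}\right\} = s \cdot \hat{\Psi}(s) - \Psi(0) \qquad (3.32) $$

where L{} is the operator that performs the Laplace transform. As in the above example, each summand of (3.31) depends on only one variable; therefore, the equality can only hold if each summand is equal to a constant. These constants are chosen to be:

$$ -\frac{1}{X}\frac{d^2X}{dx^2} = \alpha^2, \qquad -\frac{1}{Y}\frac{d^2Y}{dy^2} = \beta^2, \qquad \gamma^2 = \frac{s}{D} + \alpha^2 + \beta^2 \qquad (3.33) $$

And three ordinary differential equations are obtained:

$$ \frac{d^2X}{dx^2} + \alpha^2 X = 0, \qquad \frac{d^2Y}{dy^2} + \beta^2 Y = 0, \qquad \frac{d^2Z}{dz^2} - \gamma^2 Z = 0 \qquad (3.34) $$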

One of the advantages of using the Laplace transform is that the ordinary differential equation with the time variable does not appear. Using the boundary conditions and notation described in Example 1 (Figure 3.2) and the procedure described in Example 2, the general form of the solution is given in (3.35). In this example, we have considered z=0 at the bottom of the silicon die and z=H at the top of the die, where the incoming heat flow is known.

$$ T(x,y,z,s) = \sum_{n=0}^{\infty}\sum_{m=0}^{\infty} C_{nm}(s)\cos(\alpha_n x)\cos(\beta_m y)\sinh(\gamma_{nm} z), \qquad \alpha_n = \frac{n\pi}{L}, \quad \beta_m = \frac{m\pi}{W} \qquad (3.35) $$

The value of the constants $C_{nm}(s)$ is derived when the Laplace transform of the heat flow entering from the top surface of the die is written in terms of a two-variable infinite series of cosine functions, or Fourier series:

$$ \begin{aligned} & -\frac{\partial T}{\partial z}\bigg|_{\text{Top surface}} = \frac{g(x,y,s)}{k} = \sum_{n=0}^{\infty}\sum_{m=0}^{\infty} a_{nm}(s)\,\Lambda_{nm}\cos(\alpha_n x)\cos(\beta_m y) \\ & \text{where: } \Lambda_{nm} = \begin{cases} \tfrac{1}{4} & \text{if } n = m = 0 \\ \tfrac{1}{2} & \text{if } m > 0,\ n = 0 \ \text{ or } \ n > 0,\ m = 0 \\ 1 & \text{if } m > 0,\ n > 0 \end{cases} \\ & a_{nm}(s) = \frac{4}{LW}\int_{x=0}^{L}\int_{y=0}^{W} \frac{g(x,y,s)}{k}\cos(\alpha_n x)\cos(\beta_m y)\,dx\,dy \end{aligned} \qquad (3.36) $$

For real implementations of this procedure, only a finite number of terms of the sum (3.35) or (3.28) can be considered. This will generate an error called the "truncation error":

$$ T_{calc\,N,M}(x,y,z,s) = \sum_{n=0}^{N}\sum_{m=0}^{M} T_{nm}(x,y,z,s) \qquad (3.37) $$

A detailed analysis of the behaviour of $T_{calc\,N,M}(x,y,z)$ as a function of N and M can be found in [9] for a static analysis and is not reproduced here due to space limitations. However, the conclusions of this work are that for the temperature to converge within 1 percent of its final value, the required number of terms to be used in the sum is proportional to the ratio between the size of the silicon die and the size of the heat source, and equal to:

$$ N = 6\,\frac{L}{l}, \qquad M = 6\,\frac{W}{w} \qquad (3.38) $$
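To get a feeling for what the truncation rule implies, the short sketch below reads (3.38) as N = 6 L/l and M = 6 W/w and counts the terms of the double sum (3.37). The rounding up to the next integer is our own choice, and the die and source sizes used in the example are those quoted later for the rectangular case of Example 5.

```python
import math


def terms_for_convergence(die_length, die_width, source_length, source_width):
    """Upper summation limits N and M (same length unit for all arguments)."""
    n_max = math.ceil(6.0 * die_length / source_length)
    m_max = math.ceil(6.0 * die_width / source_width)
    return n_max, m_max


# 3000 um x 3000 um die with a 40 um x 40 um heat source.
n_max, m_max = terms_for_convergence(3000, 3000, 40, 40)
print(n_max, m_max, (n_max + 1) * (m_max + 1))  # limits and number of terms in the double sum
```

The number of terms grows with the square of the ratio between die size and source size, which is the motivation given in the next example for using cylindrical coordinates when the heat source is small.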

3.1.3 Example 4: Thermal analysis in cylindrical coordinates

The use of cylindrical coordinates may be interesting for IC thermal analysis to reduce the complexity of the calculation and to increase convergence when the heat source is small with respect to the substrate. In some papers (for instance [20]), this coordinate system has been used for thermal analysis of IC's with errors of less than 5% with respect to the analysis in rectangular coordinates. Figure 3.4 shows a cylindrical silicon die with a heat source on its top surface. In this example we extract the temperature map T(r,z) in all of the substrate with the following boundary conditions: bottom surface isothermal, lateral side adiabatic and known heat flow in the top surface.

Figure 3.4: Structure analysed in Example 4.

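The mathematical expressions of the boundary conditions are:

$$ T(r,z)\big|_{z=0} = 0, \qquad \frac{\partial T}{\partial r}\bigg|_{r=R_b} = 0, \qquad -k\,\frac{\partial T}{\partial z}\bigg|_{z=H} = g(r) \qquad (3.39) $$

where $R_b$ is the radius of the die, H its thickness and g(r) the known heat flow through the top surface.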

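If the thermal steady state is analysed, the heat transfer equation to solve is:

$$ \frac{\partial^2 T}{\partial r^2} + \frac{1}{r}\frac{\partial T}{\partial r} + \frac{\partial^2 T}{\partial z^2} = 0 \qquad (3.40) $$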

The method of separation of variables assumes that temperature can be written as the product of two functions that only depend on r and z respectively:

T(r, z) = R(r)· Z(z) (3.41)

Substituting in (3.40) and rearranging:

1 d 2 R 1 1 dR 1 d 2Z ---+---=---- (3.42) R dr 2 r R dr Z dz 2

As in equation (3.26), this equality only holds for all the possible values of r and z if both sides are equal to a constant, which is chosen to be $-\alpha^2$. Rearranging, two ordinary differential equations are obtained:

$$\begin{aligned}
r^2\,\frac{d^2R}{dr^2} + r\,\frac{dR}{dr} + r^2\,R\,\alpha^2 &= 0 \\
\frac{d^2Z}{dz^2} - \alpha^2\,Z &= 0
\end{aligned} \tag{3.43}$$

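whose solutions are, for $\alpha \neq 0$:

$$\begin{aligned}
R &= A\,J_0(\alpha r) + B\,Y_0(\alpha r) \\
Z &= C\,\sinh(\alpha z) + D\,\cosh(\alpha z)
\end{aligned} \tag{3.44}$$

and for $\alpha = 0$:

$$\begin{aligned}
R &= a \\
Z &= bz + c
\end{aligned} \tag{3.45}$$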

where a, b, c, A, B, C and D are constants, and $J_0$ and $Y_0$ are the zeroth-order Bessel functions of the first and second kind, respectively. To satisfy the boundary conditions:

$$\begin{aligned}
B &= 0 \quad \text{as } Y_0(0) = -\infty \\
D &= c = 0 \quad \text{as } Z(0) = 0
\end{aligned} \tag{3.46a}$$

$R_b \cdot \alpha$ must be a root of the first-order Bessel function, as:

$$\frac{dR}{dr}\bigg|_{r=R_b} = -A\,\alpha\,J_1(\alpha R_b) = 0 \tag{3.46b}$$

These roots can be approximated to:

for n =1,2,3 .... (3.47)

The general solution of the temperature function inside the cylindrical die is:

$$T(z,r) = C_0\,z + \sum_{n=1}^{\infty} C_n\,\sinh(\alpha_n z)\,J_0(\alpha_n r) \tag{3.48}$$

The value of the different terms $C_n$ is found when the heat flow entering through the top surface of the substrate is written in terms of a first kind Bessel function series:

$$-k\,\frac{\partial T}{\partial z}\bigg|_{z=H} = g(r) = -k\,C_0 + \sum_{n=1}^{\infty} -k\,\alpha_n\,C_n\,\cosh(\alpha_n H)\,J_0(\alpha_n r) = D_0 + \sum_{n=1}^{\infty} D_n\,J_0(\alpha_n r) \tag{3.49}$$

The value of the coefficients Dn can be found from the theory of Bessel function series:

(3.50)
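For reference, the standard orthogonality relation for a zeroth-order Bessel series on $0 \le r \le R_b$, with $\alpha_n$ chosen so that $J_1(\alpha_n R_b) = 0$, leads to coefficients of the following form. This is the textbook result written in the notation of this example, under the assumption that the series is the one in (3.49); it is not a transcription of (3.50).

$$D_0 = \frac{2}{R_b^2}\int_0^{R_b} g(r)\,r\,dr, \qquad D_n = \frac{2}{R_b^2\,J_0^2(\alpha_n R_b)}\int_0^{R_b} g(r)\,J_0(\alpha_n r)\,r\,dr$$

Matching terms in (3.49) then gives the constants of the general solution:

$$C_0 = -\frac{D_0}{k}, \qquad C_n = -\frac{D_n}{k\,\alpha_n\,\cosh(\alpha_n H)}$$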

3.1.4 Example 5: AC thermal analysis

The goal of this example is to compare results from two different thermal analyses: in the first, the geometry is described in rectangular coordinates, whereas in the second it is described in cylindrical coordinates. The purpose of this comparison is to show that the temperature results obtained with the two coordinate systems are very similar. In both cases, we will analyse a silicon die with one heat source. To make the comparison as fair as possible, the area and volume of the silicon die and the area of the heat source are the same in both cases. Following the notation of the previous examples, the dimensions of the silicon die are W×L×H = 3000 µm × 3000 µm × 400 µm in the rectangular case and $R_b$×H = 1692 µm × 400 µm in the cylindrical one. The dimensions of the heat source are w×l = 40 µm × 40 µm in the rectangular case and $r_s$ = 22 µm in the cylindrical one.

In this example we will perform an AC analysis. If the heat source dissipates a harmonic power function, the steady-state temperature at any point on the silicon surface will also be a harmonic function with the same frequency (the system is linear). The attenuation of the temperature amplitude and the phase shift between the temperature and the power waveforms depend on two factors: the frequency of the power function and the distance between the temperature monitoring point and the heat source. The analysis of both the amplitude and the phase shift of this harmonic temperature at one point on the silicon surface, as a function of the activating frequency and of the distance from the heat source, is what we call AC thermal analysis.

To perform AC analysis, the time dependency of the temperature has to be taken into account, as explained in Example 3. The power function dissipated by the heat source (which defines the top boundary condition) is the Dirac delta function $\delta(t)$, whose Laplace transform is equal to 1. Once the temperature function has been obtained, the Laplace variable has to be substituted by $s = j2\pi f$, where j is the imaginary unit and f is the activation frequency of the heat source. Thus, the temperature at one point is a complex number that depends on frequency. The phase shift and amplitude can be obtained with the following expression:

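$$\begin{aligned}
\text{Phase shift} &= \arctan\frac{\operatorname{Im}\{\text{Temperature}\}}{\operatorname{Re}\{\text{Temperature}\}} \\
\text{Amplitude} &= \operatorname{mod}\{\text{Temperature}\}
\end{aligned} \tag{3.51}$$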
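Expression (3.51) can be sketched in a few lines of code. The example below is only illustrative: the single-pole transfer function `first_order_response` is a hypothetical stand-in for the temperature obtained from the thermal analysis, and its parameters `r_th` and `tau` are assumed values. Note that `cmath.phase` uses the four-quadrant arctangent, so it keeps the correct sign when the real part is negative, which the plain arctangent of Im/Re does not.

```python
import cmath
import math

def phase_and_amplitude(temperature):
    """Phase shift (degrees) and amplitude of a complex temperature, as in (3.51)."""
    phase_deg = math.degrees(cmath.phase(temperature))  # four-quadrant arctangent
    amplitude = abs(temperature)                        # modulus of the complex value
    return phase_deg, amplitude

def first_order_response(f_hz, r_th=100.0, tau=1e-3):
    """Hypothetical single-pole thermal transfer function evaluated at s = j*2*pi*f."""
    s = 1j * 2 * math.pi * f_hz
    return r_th / (1 + s * tau)

# At the corner frequency of the stand-in model the phase lag is 45 degrees.
f_corner = 1 / (2 * math.pi * 1e-3)
phase, amp = phase_and_amplitude(first_order_response(f_corner))
print(phase, amp)
```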

In fact, what we have just described is how to obtain the thermal transfer function between the heat source and a point on the silicon surface, called the temperature monitoring point. The same procedure is followed to obtain transfer functions in linear electrical circuits. In this example, we will focus on the phase shift between the temperature and power waveforms, for three reasons:

i) When the temperature measurement techniques are explained, it will become clear that phase measurements do not usually need a calibration stage. Therefore, they are easier to perform than amplitude measurements.

ii) When we analyse the effect of structural defects in circuits on temperature, we will see how the distance between the monitoring point and the defect can be derived by analysing the phase shift between the power and temperature waveforms. This information can be used for defect diagnosis.

iii) If we have a semi-spherical heat source of radius $r_0$, dissipating a harmonic power function of frequency f and located inside a semi-infinite homogeneous medium, the temperature for $r > r_0$ follows the expression:

$$T(r,t) = \frac{C}{r}\; e^{-r\sqrt{\frac{\omega}{2D}}}\; e^{\,j\left(\omega t - r\sqrt{\frac{\omega}{2D}}\right)} \tag{3.52}$$

where C is a constant, D is the thermal diffusion constant of the medium, $\omega/2\pi$ is equal to f and t is time. This expression shows a thermal wave whose phase shift is linear with the distance r. If the magnitude of the slope of this phase shift is drawn as a function of the frequency in a log-log chart, the graph is a straight line with a slope equal to 1/2, due to the square root of $\omega$ in the phase expression. The attenuation of the amplitude increases with distance and frequency. If the frequency is high, the temperature falls to almost zero at distances close to the heat source. In this case, no differences will exist between the semi-infinite case and the bounded case of a heat source on a silicon die, as at such frequencies the silicon die is seen as a semi-infinite medium. The frequency at which the behaviour of the temperature changes from the semi-infinite case to the bounded case can be detected by analysing the phase of the temperature.

Figure 3.5 shows, for the cylindrical case, the phase shift between the temperature and power waveforms. The horizontal axis is the distance between the monitoring point and the centre of the heat source. As can be observed, the phase shift has a linear behaviour whose slope depends on the frequency. Figure 3.6 shows this slope (absolute value) as a function of the frequency for the cylindrical, rectangular and semi-infinite cases. As is shown, the three slopes are the same for frequencies between 1 kHz and 10 kHz. For frequencies lower than 1 kHz, the boundary conditions affect the behaviour of the temperature in the cylindrical and rectangular cases. It is interesting to note that the two bounded cases (cylindrical and rectangular) show similar results throughout the frequency range analysed.
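The slope of 1/2 in the log-log chart can be checked numerically for the semi-infinite case. The sketch below evaluates the phase slope of (3.52), that is, the square root of ω/(2D), at two frequencies. The diffusivity value used for silicon is an assumed room-temperature figure, and the function names are hypothetical.

```python
import math

D_SILICON = 8.8e-5  # m^2/s, assumed thermal diffusivity of silicon

def phase_slope(f_hz, diffusivity=D_SILICON):
    """Magnitude of the phase slope of (3.52) in rad/m: sqrt(omega / (2 D))."""
    omega = 2 * math.pi * f_hz
    return math.sqrt(omega / (2 * diffusivity))

def loglog_slope(f1_hz, f2_hz):
    """Slope of log10(phase slope) versus log10(frequency) between two frequencies."""
    rise = math.log10(phase_slope(f2_hz)) - math.log10(phase_slope(f1_hz))
    run = math.log10(f2_hz) - math.log10(f1_hz)
    return rise / run

print(loglog_slope(1e3, 1e4))        # expected to be very close to 0.5
print(1.0 / phase_slope(1e3))        # distance (m) over which the amplitude drops by 1/e
```

The second printed value is the characteristic decay length of the wave; when it is much smaller than the die dimensions, the die boundaries have little influence, which is consistent with the agreement of the three cases at high frequency.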

[Plot: phase shift versus distance from the heat source (mm), with one curve for each frequency: f = 100 Hz, 1 kHz, 5 kHz and 10 kHz.]

Figure 3.5: Temperature phase shift as a function of the distance from the heat source for different frequencies.

[Plot: curves for the rectangular, cylindrical and spherical cases; horizontal axis: frequency (Hz).]

Figure 3.6: Slope of the phase shift as a function of distance.

3.1.5 Example 6: Analysis of multi-layer structures

The aim of this example is to show how to perform thermal analysis in a multi-layer structure such as the one drawn in Figure 3.7. There, the different layers may be described in either rectangular or cylindrical coordinates. If the final goal of the thermal analysis is to obtain the temperature map at the surface of any of the layers, one interesting approach to analyse this structure is the method proposed by the authors of [12], based on the use of thermal quadrupoles. In this example we will merely show how to extend the analysis presented in the Examples 3 and 4 to a multi-layer structure. The reader will find broader and deeper analysis in [12]. The concept of the thermal quadrupole is as follows: let us suppose that we have a homogeneous wall of thickness H and that the temperature inside this wall only depends on the spatial variable z and time. Thus, the heat transfer equation to solve is:

(3.53)

If we call $\theta(z,s)$ the Laplace transform of T(z,t), equation (3.53) becomes:

(3.54)

We have already seen that the solution of this equation is:

$$\theta(z) = A\,\sinh(\alpha z) + B\,\cosh(\alpha z) \tag{3.55}$$

[Diagram labels: heat flow; adiabatic boundary condition; metal; T = 0.]

Figure 3.7: Multi-layer structure modelling an IC.

The Laplace transform of the heat flow at point z can be obtained by applying the Fourier law of heat conduction:

$$\varphi(z) = -k\,\frac{d\theta}{dz} = -k\,\alpha\,\bigl(A\,\cosh(\alpha z) + B\,\sinh(\alpha z)\bigr) \tag{3.56}$$

Thus, the following relationship can be written:

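$$\begin{bmatrix}\theta(z)\big|_{z=0}\\ \varphi(z)\big|_{z=0}\end{bmatrix} = \begin{bmatrix}\cosh(\alpha H) & \dfrac{1}{k\alpha}\sinh(\alpha H)\\ k\alpha\,\sinh(\alpha H) & \cosh(\alpha H)\end{bmatrix}\cdot\begin{bmatrix}\theta(z)\big|_{z=H}\\ \varphi(z)\big|_{z=H}\end{bmatrix} = \begin{bmatrix}A & B\\ C & D\end{bmatrix}\cdot\begin{bmatrix}\theta(z)\big|_{z=H}\\ \varphi(z)\big|_{z=H}\end{bmatrix} \tag{3.57}$$

This matrix is called the thermal quadrupole of the wall. It is interesting to underscore that this matrix relates the temperature and the heat flow at z=0 with the temperature and heat flow at z=H. Supposing we know $\theta(H)$ and $\varphi(0)$, we can find $\theta(0)$. For the sake of simplicity, let us fix $\theta(H) = 0$:

$$\begin{aligned}
\theta(z)\big|_{z=0} &= B\,\varphi(z)\big|_{z=H} \\
\varphi(z)\big|_{z=0} &= D\,\varphi(z)\big|_{z=H} \;\Rightarrow\; \varphi(z)\big|_{z=H} = \frac{1}{D}\,\varphi(z)\big|_{z=0} \\
\theta(z)\big|_{z=0} &= \frac{B}{D}\,\varphi(z)\big|_{z=0}
\end{aligned} \tag{3.58}$$

The most interesting application of the matrix formulation is that it makes it easier to find the temperature at the surface of a wall when there are N consecutive walls in thermal contact. If thermal continuity can be assumed between them, the temperature and heat flow at the end of a wall are the temperature and heat flow at the beginning of the next one. Therefore, we can write (both $\theta$ and $\varphi$ are functions of z):

$$\begin{bmatrix}\theta\\ \varphi\end{bmatrix}_{z=0} = \prod_{i=1}^{N}\begin{bmatrix}A_i & B_i\\ C_i & D_i\end{bmatrix}\cdot\begin{bmatrix}\theta\\ \varphi\end{bmatrix}_{z=\sum H_i} \tag{3.59}$$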
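The chaining of quadrupoles can be sketched numerically for the one-dimensional case. The example below is a minimal illustration under several assumptions: the parameter α is taken to be the square root of s/D (D being the thermal diffusivity of the layer), the bottom of the stack is held at a transformed temperature of zero so that (3.58) applies to the product matrix, and the material values and function names are hypothetical.

```python
import cmath
import math

def wall_quadrupole(k, diffusivity, thickness, s):
    """Quadrupole [[A, B], [C, D]] of one homogeneous wall, following (3.57)."""
    alpha = cmath.sqrt(s / diffusivity)          # assumed definition of alpha
    a = cmath.cosh(alpha * thickness)
    b = cmath.sinh(alpha * thickness) / (k * alpha)
    c = k * alpha * cmath.sinh(alpha * thickness)
    return ((a, b), (c, a))

def matmul2(m1, m2):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(m1[i][n] * m2[n][j] for n in range(2)) for j in range(2))
        for i in range(2)
    )

def surface_temperature(layers, f_hz, heat_flow=1.0):
    """Transformed temperature at z=0 for a stack whose bottom is held at zero."""
    s = 1j * 2 * math.pi * f_hz                  # f_hz must be greater than zero
    total = ((1.0, 0.0), (0.0, 1.0))
    for k, diffusivity, thickness in layers:     # top layer first, as in (3.59)
        total = matmul2(total, wall_quadrupole(k, diffusivity, thickness, s))
    b, d = total[0][1], total[1][1]
    return b / d * heat_flow                     # relation (3.58)

# Hypothetical layers: (conductivity W/(m K), diffusivity m^2/s, thickness m).
SILICON = (150.0, 8.8e-5, 400e-6)
METAL = (200.0, 8.0e-5, 1000e-6)
theta0 = surface_temperature([SILICON, METAL], f_hz=1e-3)
print(abs(theta0))
```

At very low frequency the result approaches the sum of the static resistances H/k of the layers, which gives a simple sanity check of the matrix product.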

So far, in this example we have introduced the concept of thermal quadrupoles for a one-dimensional thermal transfer problem. How can this procedure be extended to the analysis of a multi-layer three-dimensional problem in rectangular coordinates such as the one described in Example 3, or the two-dimensional problem such as that presented in Example 4? The solution is to perform an integral transform of the heat transfer equations (3.29) or (3.40) in order to transform them into a one-dimensional equation such as that described in (3.54). Following the same notation as in Examples 3 and 4, the transforms to perform are, for the rectangular case, the Laplace transform for the time domain and the cosine transform for the x, y space domain:

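$$\theta(\alpha_n, \beta_m, z, s) = \int_{x=0}^{L}\int_{y=0}^{W}\int_{t=0}^{\infty} T(x,y,z,t)\; e^{-st}\,\cos(\alpha_n x)\,\cos(\beta_m y)\; dx\,dy\,dt, \qquad \alpha_n = n\,\frac{\pi}{L},\quad \beta_m = m\,\frac{\pi}{W},\quad n, m = 0, 1, 2, \dots \tag{3.60}$$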

The reason why the cosine transform has been selected is discussed later in this example. If this transformed temperature is replaced in the three-dimensional heat transfer equation in rectangular coordinates (3.29), we obtain:

(3.61)

This expression is very similar to (3.54). Therefore, for any of the two layers of Figure 3.7 we can write:

where z=0 means the top surface of any of the layers and z=H the bottom surface of any of the two layers. The transformed heat flow can be obtained from the transformed temperature or by transforming the heat flow, as indicated in the following expression:

$$\varphi(\alpha_n, \beta_m, z, s) = -k\,\frac{d\theta_{nm}}{dz} =$$

(3.63)

If the final goal of the thermal analysis is to obtain the temperature at the surface of the structure depicted in Figure 3.7, then we will obtain:

(3.64)

where matrix 1 is the thermal quadrupole of the silicon, the central matrix is the thermal quadrupole of the contact resistance (as explained in Chapter 2, in the region of contact between the silicon and metal layers there is continuity in the heat flow and a discontinuity in the temperature equal to the heat flow multiplied by the contact resistance) and matrix 2 is the thermal quadrupole of the metal. Matrix T is the product of the three previous matrices. Now, the value of the transformed temperature $\theta_{nm}(0)$ can be obtained as described in (3.58). To obtain the anti-transformed temperature:

(3.65)

where:

$$A_{nm} = \begin{cases} 1 & \text{if } n = m = 0 \\ 2 & \text{if } m > 0,\ n = 0 \ \text{ or } \ n > 0,\ m = 0 \\ 4 & \text{if } m > 0,\ n > 0 \end{cases}$$

Comparing (3.65) and (3.60) with (3.36), we can find the justification for using the cosine transform for the x and y spatial variables: by doing so, the final temperature satisfies the boundary condition of adiabatic lateral surfaces. In fact, the method of thermal quadrupoles is just another way of presenting the method of separation of variables, which is very convenient when the structure to analyse has several layers. If the structure of Figure 3.7 were described in cylindrical coordinates, the transform to perform on the temperature would be, following the radial notation of Example 4:

$$\theta(\alpha_n, z, s) = \int_{t=0}^{\infty}\int_{r=0}^{R_b} T(r,z,t)\; e^{-st}\, J_0(\alpha_n r)\; dt\,dr, \qquad n = 0, 1, 2, \dots \tag{3.66}$$

where $\alpha_n$ are the roots of the first order Bessel function as described in (3.47), with $\alpha_0 = 0$. If this transformed temperature is replaced in equation (3.42), we obtain:

(3.67)

Expression (3.67) also has a quadrupole representation as described in (3.62). In this case, however, only one subscript is needed. The transformed heat flow can be obtained either from the transformed temperature or by transforming the heat flow. Finally, the temperature at z=0 can be obtained from the transformed temperature:

(3.68)

3.2 Numerical methods

Numerical methods discretise the region under analysis into a mesh of nodes and generate a set of linear equations in which the unknown quantities are the temperatures of the different nodes. There are three different approaches to obtain this set of linear equations: the Finite Element Method (FEM), the Finite Difference Method (FDM) and the Boundary Element Method (BEM). In this text, we will introduce the Finite Difference Method. Application examples of the other two methods are indicated in [116] and [118]. We will present the FDM from two different points of view. The first approach will be to obtain a set of linear equations that can be solved with any of the known methods. The second approach will be to obtain an equivalent electrical circuit that models heat conduction in the IC. This technique is known as the RC formulation of the thermal transfer problem, and it is a very interesting strategy for those who are used to electrical simulators, as they can use the same tool to perform both electrical and thermal analysis of IC's.

3.2.1 Finite difference method

The finite difference method generates the set of linear equations by approximating the derivatives of temperature with respect to space by finite differences between the temperatures of the nodes that form the mesh:

$$\frac{dT}{dx} = \lim_{\Delta x \to 0}\frac{T(x+\Delta x) - T(x)}{\Delta x} \approx \frac{T(x+\Delta x) - T(x)}{\Delta x} \tag{3.69}$$

To quantify the error, we can expand T at $x+\Delta x$ in terms of T at x using the Taylor theorem:

$$T(x+\Delta x) = T(x) + \Delta x\,\frac{dT}{dx} + \frac{1}{2!}\,\Delta x^2\,\frac{d^2T}{dx^2} + \frac{1}{3!}\,\Delta x^3\,\frac{d^3T}{dx^3} + \dots \tag{3.70}$$

rearranging:

$$\frac{dT}{dx} = \frac{T(x+\Delta x) - T(x)}{\Delta x} + O(\Delta x), \qquad \text{where } O(\Delta x): \text{function of order } \Delta x \tag{3.71}$$

which shows that the error of the approximation (3.69) is of the same order as the node spacing. The approximation (3.69) is called the forward-difference approximation. Other approximations with lower errors exist. The reader may find some of them in [108]. If temperature has time dependency, the same criterion is used for derivatives of temperature with respect to time. Algebraic difference equations can now be obtained from the heat transfer equation if the approximation of equation (3.69) is used recursively to obtain the finite difference approximation of the second order derivative of temperature with respect to space. However, we will show how to obtain the set of linear algebraic equations by applying the energy conservation principle directly to a node of the mesh. This procedure is similar to the one described in Section 2.3 to derive the heat transfer equation, and it makes the RC electrical model of heat transfer easier to understand.
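The first-order behaviour of the error in (3.71) is easy to observe numerically. The sketch below applies the forward-difference approximation (3.69) to a test function whose derivative is known (a sine, chosen only for illustration) and compares the error for two node spacings; the function names are hypothetical.

```python
import math

def forward_difference(func, x, dx):
    """Forward-difference approximation (3.69) of the derivative of func at x."""
    return (func(x + dx) - func(x)) / dx

def approximation_error(dx, x=1.0):
    """Absolute error of (3.69) for T(x) = sin(x), whose exact derivative is cos(x)."""
    return abs(forward_difference(math.sin, x, dx) - math.cos(x))

coarse = approximation_error(1e-2)
fine = approximation_error(5e-3)

# Halving the node spacing roughly halves the error, as O(dx) predicts.
print(coarse, fine, coarse / fine)
```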

3.2.1.1 Nodal equation extraction

Let us analyse a two-dimensional time-independent problem such as that described in Example 2, but considering that heat can be generated inside the body and that the boundary conditions can be any of those described in Section 2.2. This is just to present a more general case. The goal of this example is to find the temperature map T(x, y) inside the body. Figure 3.7 shows a representation of this body with a mesh of nodes. Each node can be identified with two subscripts (n, m) and can be classified into one of two categories: external nodes and internal nodes. External nodes are in direct contact with a boundary surface, whereas internal nodes are always surrounded by other nodes. Focusing on internal nodes, such as the one depicted in Figure 3.8, the principle of conservation of energy states:

(3.72)

where $Q_{x-}$ is the heat flow conducted from node (n-1, m) to node (n, m) due to their difference of temperature. The same is applicable for $Q_{x+}$, $Q_{y-}$ and $Q_{y+}$. $Q_{NM}$ is the heat generated in the volume of the body associated with the node (n, m), which, in this two-dimensional example, is equal to:

$$Q_{NM} = \iint_{\Delta x,\,\Delta y} p(x,y)\; dx\,dy \tag{3.73}$$

where p(x, y) is the power density function. In this two-dimensional problem the units of p(x, y) are [W/m²]. In a three-dimensional example, $Q_{NMK}$ would be the integral over the volume associated with the node (n, m, k) and the units of p(x, y, z) would be [W/m³].

[Diagram: two-dimensional mesh of nodes, with the x axis indexed by n = 0, 1, 2, ... 6 and the y axis indexed by m.]

Figure 3.7: Example of mesh in a two-dimensional thermal problem.

Figure 3.8: Energy balance in an internal node.

Assuming that p(x, y) is a constant, $P_V$, and approximating Fourier's Law of conduction by expression (3.69):

$$\begin{aligned}
Q_{NM} &= P_V\,\Delta x\,\Delta y \\
Q_{x-} &= -k\; dA_x\,\frac{\partial T}{\partial x} \approx -k\,\Delta y\,\frac{T_{n,m} - T_{n-1,m}}{\Delta x}
\end{aligned} \tag{3.74}$$

In this case, the differential area $dA_x$ is equal to $\Delta y$, as the problem is two-dimensional. In a three-dimensional problem it would be $\Delta y \cdot \Delta z$. Substituting in (3.72) and with $\Delta x = \Delta y$:

(3.75)

If there is no internal heat generation:

$$T_{n,m} = \frac{T_{n+1,m} + T_{n-1,m} + T_{n,m+1} + T_{n,m-1}}{4} \tag{3.76}$$

which states that the temperature at each node is the arithmetic average of the temperature at the four nearest neighbouring nodes. Equation (3.72) of conservation of energy is also valid for external nodes. A generic external node is shown in Figure 3.9:

[Diagram: external node $T_{n,m}$ with neighbouring nodes $T_{n-1,m}$, $T_{n+1,m}$ and $T_{n,m+1}$, the heat flows $Q_{x-}$, $Q_{x+}$ and $Q_{nm}$, the exterior temperature beyond the boundary, and the node spacing $\Delta x$.]

Figure 3.9: Conservation of energy in external nodes.

Depending on the boundary conditions, we know: If the boundary conditions are surface isothermal, with a temperature equal to T:

(3.77)

For second or third kind boundary conditions:

$$Q_{Exterior} = \begin{cases} \Delta x \cdot h \cdot (T_{n,m} - T_{Exterior}) & \text{Convective boundary condition} \\ \Delta x \cdot f(x,y) & \text{Known heat flow} \\ 0 & \text{Adiabatic boundary condition} \end{cases} \tag{3.78}$$
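As a rough illustration of how the nodal equations are solved iteratively, the sketch below applies Gauss-Seidel sweeps to a small square mesh without internal heat generation, so that each internal node is repeatedly replaced by the average of its four neighbours. All four boundaries are taken as isothermal for simplicity; the mesh size, the number of sweeps and the function name are assumptions made for the example.

```python
def gauss_seidel_plate(top, bottom, left, right, n=5, sweeps=500):
    """Steady-state temperatures of an n x n block of internal nodes.

    The surrounding ring of nodes is held at the given isothermal values.
    """
    size = n + 2
    temp = [[0.0] * size for _ in range(size)]
    for j in range(size):
        temp[0][j] = top
        temp[size - 1][j] = bottom
    for i in range(1, size - 1):
        temp[i][0] = left
        temp[i][size - 1] = right
    for _ in range(sweeps):
        for i in range(1, size - 1):
            for j in range(1, size - 1):
                # Each update uses the most recent neighbour values (Gauss-Seidel).
                temp[i][j] = 0.25 * (temp[i - 1][j] + temp[i + 1][j]
                                     + temp[i][j - 1] + temp[i][j + 1])
    return temp

plate = gauss_seidel_plate(top=100.0, bottom=0.0, left=0.0, right=0.0)
centre = plate[3][3]
print(centre)
```

With one side at 100 and the other three at 0, symmetry requires the centre node to settle at one quarter of the hot-side temperature, which is a convenient check of the iteration.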

In three-dimensional cases, the differential of area would be $\Delta x \cdot \Delta z$. The set of linear equations generated can be solved with a direct method, such as Gaussian elimination. However, in order to speed up the calculation process, iterative methods such as the Gauss-Seidel method are frequently used. If the temperature inside the body changes with time, equation (3.72) is transformed into:

(3.79)

where QST is the heat flow stored in the mass of the body volume associated with the node (n, m) which is equal to:

(3.80)

where $\rho$ is the density of the material, c its specific heat, $E_{ST}$ is the energy stored in the mass associated with the node (n, m), $Vol_{NM}$ is the volume of the body associated with the node (n, m) and the superscripts t and t+1 denote time. Similar to the derivation for the static case, we can derive the following expression for an internal node:

$$T^{t+1}_{n,m} = \frac{D\,\Delta t}{\Delta x^2}\left(T^{t}_{n+1,m} + T^{t}_{n-1,m} + T^{t}_{n,m+1} + T^{t}_{n,m-1} + \frac{P_V}{k}\,\Delta x^2\right) + \left(1 - 4\,\frac{D\,\Delta t}{\Delta x^2}\right) T^{t}_{n,m} \tag{3.81}$$

Equation (3.81) is called the explicit finite difference formulation. It expresses the temperature of node (n, m) at time t+1 in terms of the temperatures of nodes (n, m), (n+1, m), (n, m+1), (n-1, m) and (n, m-1) at the earlier instant of time t. Initial temperatures (initial conditions) have to be provided to start the calculation process.

Similar equations can be found for external nodes as a function of their specific boundary conditions. References at the end of this chapter (for instance [108] or [107]) show a list of nodal equations for external nodes with different boundary conditions. It is interesting to underscore that the values of $\Delta x$ and $\Delta t$ present in (3.81) have to be selected in such a way that the calculations do not violate the physical requirements represented by the Second Law of Thermodynamics. Otherwise, the resulting solution will have no physical meaning and can become unstable. Let us analyse the particular case of a two-dimensional body without internal heat generation. In this case, the stability criterion for explicit finite difference nodal equations takes the form:

$$\frac{D\,\Delta t}{\Delta x^2} \le \frac{1}{ST} \tag{3.82}$$

where the value of ST depends on the type of node (internal or external) and on the boundary conditions in external nodes. For instance, consider a two-dimensional node whose initial condition is $T^0 = 1$ °C. This node is surrounded by external nodes whose boundary conditions are isothermal with T = 0 °C. Thus, from equation (3.81):

$$T^{t+1}_{n,m} = \left(1 - \frac{4}{ST}\right) T^{t}_{n,m} \tag{3.83}$$

If ST=3 in this example, (3.83) violates the Second Law of Thermodynamics, as heat would flow spontaneously from colder to hotter nodes. If ST=1, the solution becomes unstable. The following table shows the evolution of the temperature at this node for these two values of ST. In the first case, although it converges to the right value, the temperature at this node takes non-physical values for odd values of t. In the second case, the system has become unstable.

Table 3.1: Example of unstable thermal systems.

t    ST=3        ST=1
0    1           1
1    -0.33       -3
2    0.11        9
3    -37.03 m    -27
4    12.34 m     81
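The values in Table 3.1 can be reproduced with a few lines of code. The sketch below applies the explicit formulation (3.81) to a single node at 1 °C whose four neighbours are held at 0 °C, with no internal heat generation and with the ratio D·Δt/Δx² set equal to 1/ST. The function name is hypothetical.

```python
def isolated_node_history(st, steps=4, initial=1.0):
    """Temperature of one node surrounded by 0 degC neighbours, from (3.81).

    The ratio D*dt/dx^2 is set to 1/st and there is no internal heat generation.
    """
    ratio = 1.0 / st
    history = [initial]
    for _ in range(steps):
        neighbours = 0.0 + 0.0 + 0.0 + 0.0       # four isothermal neighbours
        history.append(ratio * neighbours + (1.0 - 4.0 * ratio) * history[-1])
    return history

for st in (3, 1, 4):
    print(st, isolated_node_history(st))
```

With ST=4 the node reaches the neighbour temperature in one step and stays there, whereas ST=3 produces sign alternation and ST=1 produces growing oscillations, in line with the table.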

For this case, the stability criterion is:

$$\Delta t \le \frac{\Delta x^2}{D \cdot ST}, \qquad ST = 4 \tag{3.84}$$

3.2.1.2 RC modelling of heat transfer

The RC modelling of heat transfer comes from the analogy that can be established between the following variables of the thermal and electrical domains:

Table 3.2: Analogies between thermal and electrical domains.

Thermal domain                 Electrical domain
Temperature (°C, K)            Voltage (V)
Energy - Heat (J)              Charge (C)
Heat flow (W)                  Current (A)
Thermal resistance (K/W)       Electrical resistance (Ω = V/A)
Thermal capacitance (J/K)      Electrical capacitance (F = C/V)

Thus, the energy conservation equation (3.79) becomes Kirchhoff's Current Law for the node (n, m) of an electrical circuit:

[Diagram: RC network around a node of the mesh, with node spacings $\Delta x$ and $\Delta y$.]

Figure 3.10: RC model of heat conduction.

(3.85)

where, from (3.74) and (3.80), we can derive the values of the current sources, and capacitance:

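$$\begin{aligned}
I_{NM} &= P_V\,\Delta x\,\Delta y \\
I_{x-} &= -k\,\Delta y\,\frac{V_{n,m} - V_{n-1,m}}{\Delta x} = -\frac{V_{n,m} - V_{n-1,m}}{R_{x-}} \\
I_{ST} &= \rho \cdot c \cdot Vol_{NM} \cdot \frac{\partial V_{n,m}}{\partial t} = C_{NM}\,\frac{V^{t+1}_{n,m} - V^{t}_{n,m}}{\Delta t}
\end{aligned} \tag{3.86}$$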
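The element values implied by (3.86) can be computed directly from the cell geometry. The sketch below is a minimal example for a rectangular cell: the resistance between neighbouring nodes is the spacing divided by the conductivity and the cross-section, and the capacitance is the density times the specific heat times the cell volume. The silicon properties are assumed typical values, and the function name and the unit depth `dz` are hypothetical choices.

```python
def rc_cell(k, rho, c, dx, dy, dz=1.0):
    """Resistances and capacitance of one rectangular cell, following (3.86).

    k in W/(m K), rho in kg/m^3, c in J/(kg K), dimensions in metres.
    """
    r_x = dx / (k * dy * dz)        # resistance to the neighbour along x
    r_y = dy / (k * dx * dz)        # resistance to the neighbour along y
    c_nm = rho * c * dx * dy * dz   # capacitance of the cell volume
    return r_x, r_y, c_nm

# Assumed silicon properties and a 2 um square cell of unit depth.
K_SI, RHO_SI, C_SI = 150.0, 2330.0, 700.0
r_x, r_y, c_nm = rc_cell(K_SI, RHO_SI, C_SI, dx=2e-6, dy=2e-6)
print(r_x, r_y, c_nm, r_x * c_nm)
```

The product of resistance and capacitance equals Δx²/D for a square cell, so the stability limit (3.84) corresponds to a time step of one quarter of this RC time constant.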

The interesting feature of this formulation is that thermal analysis can now be performed with an electrical simulator. The reader should be aware that the RC model rests on the same mathematical principle as the nodal equation formulation and, therefore, it may present the same instability problems as are described in the previous examples. To build the RC model of thermal transfer, the body under analysis is divided into small volumes, and each volume has a node of the mesh associated with it. Usually, the edges of these volumes are parallel to the axes of one of the three spatial coordinate systems: rectangular, cylindrical or spherical. In fact, in the last two examples, we have worked with the rectangular coordinate system. When the geometry of the volumes fits the axes of a spatial coordinate system, the values of the resistances associated with each node of the mesh are tabulated as a function of the location of the node and its size. For instance, references [107] and [108] show these tables and their formulae. Nevertheless, the shape and edges of these volumes can be chosen freely. As a general rule, the resistance that joins two consecutive nodes has to relate their difference of temperature to the heat that flows from the hotter node to the colder one. As we will see in the next section, this freedom can be used to reduce the number of nodes of a system.

3.2.1.3 Reduction of the complexity in thermal analysis of IC's

As we have seen, the error generated by the transformation of spatial derivatives into finite differences is proportional to the distance between the nodes that form the mesh. Thus, high accuracy or high resolution requires a small spacing between them. However, this implies an increase in the number of nodes needed to cover all of the body under analysis and, therefore, an increase in the number of equations to be solved or of nodes in the RC circuit. This leads to longer calculation times and a need for greater computational resources, especially memory. Several strategies have been published to handle this accuracy-complexity trade-off in the context of thermal analysis of IC's. In this text, we will present three of them: variable grid density [123], [126]; the multiport thermal macromodel [125], [122] with asymptotic waveform evaluation (AWE) [119], [124]; and a suitable choice of the shape of the volumes associated with each node [126], [127].

i) Variable grid density: The idea of this strategy is to increase the grid density in areas or volumes where high accuracy is required and to increase node spacing where this accuracy is not needed. An example of this strategy is published in [123], where logarithmic spacing is used in order to increase accuracy near the heat source.

ii) Multiport thermal macromodel - asymptotic waveform evaluation: Usually, current is injected only in some of the nodes that form the 3-D RC mesh that models heat transfer through the IC structure. The location of these nodes coincides with the location of the devices that dissipate an amount of power that significantly influences the temperature increase of the silicon surface. Additionally, we are only interested in the voltage of some nodes, whose location coincides with the location of the temperature-sensitive or critical devices. Figure 3.11 shows a schematic representation of this situation.
In this figure, input and output nodes are joined with an RC passive circuit. Several strategies exist to develop simple macromodels from complex RC circuits. For DC simulations, a matrix of thermal coupling resistances can be extracted:

$$V_{out\,j} = \sum_{i=1}^{N} R_{th\,i\text{-}out\,j} \cdot I_i, \qquad j = 1, 2, 3, \dots, m \tag{3.87}$$

where $R_{th\,i\text{-}out\,j}$ is the thermal coupling resistance between the i-th dissipating device and the output j of Figure 3.11.
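Relation (3.87) is a matrix-vector product, as the short sketch below illustrates. The coupling resistances and dissipated powers are made-up sample values, and the function name is hypothetical; in the electrical analogy the injected currents are the dissipated powers and the output voltages are the temperature increases.

```python
def output_temperatures(r_th, powers):
    """Temperature increase at each output node, following (3.87).

    r_th[i][j] is the coupling resistance (K/W) between device i and output j;
    powers[i] is the power (W) dissipated by device i.
    """
    outputs = len(r_th[0])
    return [
        sum(r_th[i][j] * powers[i] for i in range(len(powers)))
        for j in range(outputs)
    ]

# Sample data (assumed): two dissipating devices, two monitored outputs.
R_TH = [[100.0, 20.0],
        [15.0, 80.0]]
POWERS = [0.01, 0.02]
rises = output_temperatures(R_TH, POWERS)
print(rises)
```

Once the coupling matrix has been extracted from the full RC mesh, each new DC operating point costs only this product rather than a new solution of the whole network.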

[Diagram: dissipating devices 1 to n at the input nodes, connected through the RC circuit mesh (multiport macromodel) to output nodes 1 to m.]

Figure 3.11: Multiport representation of a thermal model.

For transient or AC simulations, one of the procedures used is called asymptotic waveform evaluation (AWE). AWE captures the essential circuit behaviour by finding a few dominant poles and residues using a moment-matching algorithm known as Padé approximation [118]. This algorithm has been successfully applied to distributed line, 3-D interconnect, power bus distribution, substrate noise-coupling and switched-circuit simulations. The goal of this procedure is to obtain, from a complex transfer function, a simplified one with just k poles. For instance, [124] shows that one or two poles are often sufficient to accurately model 3-D network problems.

iii) Choice of the volume's shape: In the next chapter we will need to characterise the temperature waveform at several points of the silicon surface when an MOS transistor acts as a heat source and dissipates a power pulse of magnitude M and duration T. Figure 3.12 shows this case. In this figure, the heat source is an MOS transistor sized w = 10 µm and l = 1.2 µm, and the goal is to find the temperature waveform at several points along the X axis.

Figure 3.12: MOS transistor acting as a heat source.

In MOS devices, power is mainly dissipated in the channel, beneath the gate. Therefore, the heat source can be geometrically described as a parallelepiped of dimensions w × l × 0.4 µm³, where 0.4 µm is the typical channel depth. The silicon die is also a parallelepiped. Thus, it seems that the most logical way to mesh the silicon die is to use a rectangular grid. However, this meshing strategy will give us huge RC nets if high spatial resolution is desired (for instance, extraction of the temperature every 2 microns along the axis).

[Diagram: cross section and top view of the silicon die, showing the heat source and the isotherms or thermal wavefronts.]

Figure 3.13: Isotherms or thermal wavefronts when an MOS transistor (Heat Source) dissipates a power pulse.

Figure 3.13 shows a heat source dissipating power. The isothermal surfaces are indicated. If we extract all the isothermal surfaces that cross the X axis every 2 microns, the silicon die will be divided into volumes, like the layers of an onion. Each one can be associated with a node of an RC net. Figure 3.14 shows an example of the resulting one-dimensional RC network.

[Diagram: one-dimensional RC network spanning the silicon and die attach layers, the metal layer and the package layer.]

Figure 3.14: One-dimensional RC model of heat conduction through the silicon die.

In this particular example (extracted from [126]), a variable grid density has been used. The spatial resolution is 2 µm close to the heat source (the first 300 µm), and is then reduced as the distance from the heat source increases. For instance, in the example in Figure 3.14, the metal and package layers are modelled with just one node. The advantage of using a meshing grid that coincides with the isotherms in homogeneous materials is that one-dimensional RC nets are generated. We would like to underscore the concept of homogeneous material, as we assume that isotherms in static analysis will agree with the thermal "wavefronts" in transient analysis (we use the term "wavefront" to graphically indicate the transient evolution of temperature in response to a step power dissipation; however, this term is not very accurate, as heat conduction is a diffusion mechanism and not a wave transmission mechanism). Due to the small size of the heat source compared with the silicon die, the isotherm surfaces are ellipsoids near the heat source, spheres far from both the heat source and the boundary of the silicon die, and, once again, ellipsoids close to the boundary of the die. For example, Figure 3.15 shows these isotherm surfaces close to the heat source in a top view and a 3-D view. These figures have been extracted with a commercial FDM program. In the 3-D view, the parallelepiped that acts as a heat source is indicated.

Figure 3.15: Thermal map of the silicon surface and isotherm surface. In this analysis, the centre of the heat source is located at the coordinates: X=0, Y=0, Z=300 µm.

When the isotherms are spheres, the value of the resistance and capacitance of the nodes associated with them can be derived with:

1 1

(3.88)

where j is the number of node of the RC net and the number of the isothermal surface, counting from the heat source, k ,c and p are the thermal properties of the silicon and rj is the radius of the isothermal surface j. For the regions in which the isothermal surfaces are ellipsoids, their X, Y and Z radius can be extracted from a static analysis. The values of the resistances and capacitances associated with them can be derived with:

$$ R_{j,j+1} = \frac{T(S_j) - T(S_{j+1})}{P}, \qquad C_j = \mathrm{Vol}_j \cdot c \cdot \rho \tag{3.89} $$

where $T(S_j)$ is the temperature at the midpoint between the surface j and the surface j-1 over the X axis, and $\mathrm{Vol}_j$ is the volume comprised between the surface j and the surface j-1.
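As an illustration of how (3.89) turns static-analysis data into RC elements, the following is a minimal sketch in Python. The function name `shell_rc`, the temperature and volume lists, and the silicon property values are assumptions introduced for this example; they are not data from the analysis described in the text.

```python
def shell_rc(temps, volumes, power, c, rho):
    """Resistances and capacitances of a one-dimensional RC net, after (3.89).

    temps   : static temperature T(S_j) assigned to each isothermal surface (K)
    volumes : volume associated with each node (m^3)
    power   : static power P dissipated by the heat source (W)
    c, rho  : specific heat (J/(kg K)) and density (kg/m^3) of the material
    """
    # R_{j,j+1} = (T(S_j) - T(S_{j+1})) / P
    resistances = [(temps[j] - temps[j + 1]) / power
                   for j in range(len(temps) - 1)]
    # C_j = Vol_j * c * rho
    capacitances = [v * c * rho for v in volumes]
    return resistances, capacitances


# Hypothetical data: three isothermal surfaces and two shell volumes.
temps = [10.0, 6.0, 4.0]         # temperature rise on each surface (K), assumed
volumes = [1.0e-15, 2.0e-15]     # shell volumes (m^3), assumed
resistances, capacitances = shell_rc(
    temps, volumes, power=0.013, c=700.0, rho=2330.0)  # typical silicon values, assumed
```

The resistances come out in K/W and the capacitances in J/K, so their products are the time constants of the individual RC sections.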

[Figure 3.16 plots: resistance (left) and capacitance (right) versus node number (0 to 10), on logarithmic vertical axes, for W = 5 μm, W = 10 μm, W = 20 μm, W = 25.5 μm and the spherical case.]

Figure 3.16: Value of the R and C elements of the RC net for different heat source geometries.

As an example of the results obtained with this procedure, Figure 3.16 shows the value of resistance and capacitance for the first nodes of the RC net for different widths of the MOS transistor, compared with an ideal spherical point heat source. As can be observed, beyond node 10 the values of resistance and capacitance converge to the same value for all geometries.

[Figure 3.17 plot: power dissipated by the heat source (top) and temperature increment ΔT at distances of 4, 24, 44 and 64 μm from the heat source, versus time (0 to 60 μs).]

Figure 3.17: Waveforms of the temperature increases generated by the heat source along the X axis. Four different pulse durations: 2, 8, 16 and 32 μs.

Figure 3.17 shows waveforms of the temperature increases obtained from the thermal analysis when the heat source dissipates 13 mW for 2, 8, 16 and 32 μs.
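Waveforms of this kind result from solving the one-dimensional RC net in the time domain. The sketch below does so with a simple explicit Euler integration. It is only an illustration: the three-node net, its R and C values and the function name `simulate_ladder` are assumptions made here, and only the 13 mW power level and the 8 μs pulse duration are taken from the text.

```python
def simulate_ladder(r, c, power, dt, steps):
    """Explicit-Euler transient of a one-dimensional RC ladder.

    r[j] links node j to node j+1; the last resistor links the last node to
    the heat sink, which is held at zero temperature rise.
    c[j] is the capacitance of node j; power(t) is injected into node 0.
    Returns the temperature rise of every node at every time step.
    """
    n = len(c)
    temp = [0.0] * n
    history = []
    for step in range(steps):
        p = power(step * dt)
        new = list(temp)
        for j in range(n):
            # Heat flowing into node j: the source for node 0, the previous node otherwise.
            inflow = p if j == 0 else (temp[j - 1] - temp[j]) / r[j - 1]
            downstream = temp[j + 1] if j + 1 < n else 0.0
            outflow = (temp[j] - downstream) / r[j]
            new[j] = temp[j] + dt * (inflow - outflow) / c[j]
        temp = new
        history.append(list(temp))
    return history


# Hypothetical three-node net (values assumed for illustration only).
R = [10.0, 10.0, 10.0]          # K/W
C = [1.0e-6, 1.0e-6, 1.0e-6]    # J/K


def pulse(t):
    """13 mW during the first 8 us, zero afterwards."""
    return 13e-3 if t < 8e-6 else 0.0


def constant(t):
    """13 mW applied indefinitely, used to check the static limit."""
    return 13e-3


pulse_history = simulate_ladder(R, C, pulse, dt=1e-6, steps=60)
steady = simulate_ladder(R, C, constant, dt=1e-6, steps=2000)[-1]
```

With a constant power the node temperatures settle to the static values given by the series resistances, which is a convenient sanity check. The explicit scheme is only stable when the time step is small compared with the smallest RC product; an implicit scheme would be the usual choice for stiff nets.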

4. ELECTRO-THERMAL ANALYSIS OF INTEGRATED CIRCUITS

The goal of electro-thermal analysis is to perform a thermal and an electrical analysis of the circuit in parallel, in order to ascertain the temperature of its devices while the circuit is running. These two analyses are coupled, as the power dissipated by the devices (one of the inputs of thermal analysis) is an output of electrical analysis. Additionally, temperature (one of the outputs of thermal analysis) is an input of the electrical one (see Figure 3.1). The electro-thermal simulators reported in the literature follow two different approaches: the direct method and the relaxation procedure.

Electro-thermal simulators based on direct methods simultaneously solve the electrical and the thermal RC circuit. The coupling between the two circuits is done by the devices: they have terminals to be connected to the electrical circuit (the variables of these terminals are voltage and current) and terminals to be connected to the thermal circuit (the variables are temperature and power). The models and equations of these devices take into account this electro-thermal behaviour [121], [122], [125]. These simulators are recommended when there is strong thermal coupling between the devices of an IC. However, they require large computational resources, as the two circuits are solved simultaneously.

The relaxation procedure is as follows: first, an electrical analysis is done with all the components at the same temperature. The power dissipated by all the devices is extracted, and the vector $P_0$ is formed. Each component of this vector is the power dissipated by one device of the circuit. Then, a thermal simulation is performed, using the vector $P_0$ as input data. The temperature of each device is extracted and the vector $T_1$ is formed, where each component of the vector is the temperature of one device. With these new temperature values, a new electrical simulation is performed and a new power vector, $P_1$, is extracted. Here, a convergence criterion must be set in order to decide whether more iterations have to be carried out. Usually, convergence criteria are based on the distance between the vectors $P_j$ and $P_{j+1}$. Both DC and transient simulations can be performed with this procedure. In transient simulations, if the electrical simulator does not allow the temperature of each device to be time-dependent, the transient simulation time is divided into intervals, and the temperature of every component is held constant within each interval. Details of this variation can be found in [99] or [120]. Simulators based on this philosophy do not need as many computational resources (for instance, memory) as direct simulations, since electrical and thermal simulations can be performed sequentially. However, it is reported in the literature that the convergence of these simulators is slow when there is a high thermal coupling between devices [128].
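The relaxation loop for a DC analysis can be sketched as follows. This is a toy illustration under stated assumptions, not the algorithm of any of the cited simulators: the electrical analysis is replaced by a hypothetical function that returns the power of each device for a given temperature vector, and the thermal analysis by a hypothetical matrix of thermal (self and coupling) resistances.

```python
def relaxation(power_of_temperature, thermal_resistance, t_ambient,
               tol=1e-6, max_iter=100):
    """Alternate electrical and thermal analyses until the power vector settles."""
    n = len(thermal_resistance)
    temps = [t_ambient] * n                  # all devices start at the same temperature
    powers = power_of_temperature(temps)     # electrical analysis: vector P_0
    for iteration in range(1, max_iter + 1):
        # Thermal analysis: T_i = T_amb + sum_k Rth[i][k] * P_k
        temps = [t_ambient + sum(thermal_resistance[i][k] * powers[k]
                                 for k in range(n))
                 for i in range(n)]
        new_powers = power_of_temperature(temps)   # electrical analysis: P_{j+1}
        # Convergence criterion: distance between P_j and P_{j+1} (maximum norm).
        distance = max(abs(a - b) for a, b in zip(new_powers, powers))
        powers = new_powers
        if distance < tol:
            return temps, powers, iteration
    raise RuntimeError("relaxation did not converge")


# Hypothetical two-device circuit (all values assumed for illustration).
T_AMB = 300.0                          # K
R_TH = [[50.0, 5.0], [5.0, 50.0]]      # K/W, diagonal = self-heating, off-diagonal = coupling
P_NOM = [0.10, 0.05]                   # W at ambient temperature
ALPHA = 0.002                          # 1/K, assumed linear drop of power with temperature


def toy_electrical_model(temps):
    """Stand-in for an electrical simulation: device power versus device temperature."""
    return [p * (1.0 - ALPHA * (t - T_AMB)) for p, t in zip(P_NOM, temps)]


temps, powers, iterations = relaxation(toy_electrical_model, R_TH, T_AMB)
```

With the weak coupling assumed here the loop settles in a few iterations. Larger off-diagonal terms in `R_TH` or a stronger temperature dependence of the power slow the convergence, in line with the behaviour reported for relaxation-based simulators.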

4.1.1 Example 7: Dynamic electro-thermal procedure

If temperature increases are low and self-heating can be neglected in most of the devices, the relaxation procedure can be simplified to one iteration, and in most cases can be done directly with a simulator such as HSPICE.

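[Figure 3.18 schematic: heat source with probe sources (Ix = 0 A), RC network modelling the substrate, and temperature sensor with output Vout, excited by the controlled sources V1 = M·ΔT1(t) and V2 = M·ΔT2(t).]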

Figure 3.18: Dynamic electro-thermal simulation procedure.

Let us suppose the following example: in a silicon die we have the MOS transistor of Figure 3.12 acting as a heat source, and a temperature sensor placed at a given distance from this heat source. The goal of the electro-thermal analysis is to characterise variations of the output voltage of the sensor as a function of the power dissipated by the heat source. Figure 3.18 shows a way to perform this electro-thermal simulation. The sensor schematic is outlined in Figure 3.18. The explanation of this circuit is given in the chapter devoted to thermal measuring techniques. The key is that its output voltage is proportional to the temperature difference of its bipolar transistors Q1 and Q2. Therefore, the important data to extract in the thermal analysis is the temperature of these transistors. As can be observed, the power dissipated by the heat source is coupled to the one-dimensional RC net that models the heat transfer through the silicon die. This coupling has been done with a current source whose output current is equal to the power dissipated by the transistor. Vx and Ix are dummy sources whose function is to probe both the current that goes through the heat source and the voltage drop across it. The voltage of each node of the RC net is equal to the temperature increase at a different point of the silicon surface. Two nodes correspond to the location of the bipolar transistors. The coupling between this voltage and the sensor has been done with an equivalent electrical excitation of the sensor, as illustrated in the figure: the voltage-controlled voltage sources V1 and V2 have been placed. The parameter M is equal to:

$$ M = \frac{S_t}{g_m}, \qquad S_t = \frac{\partial I_C}{\partial T} \tag{3.90} $$

where $S_t$ is the collector current sensitivity to temperature and $g_m$ is the transistor transconductance (collector current sensitivity to $V_{BE}$). By using this approximation, self-heating has been neglected. There is also a second error source. The collector current of the temperature transducer is a function of temperature, base-emitter voltage and base-collector voltage. The equivalent electrical excitation of the sensor implies the approximation:

The error module is equal to:

(3.90)

In this case, the error is less than 0.5% when ΔT is 1 K.

5. CONCLUSIONS AND SUMMARY

This chapter has explained different techniques published in the literature to extract thermal maps in integrated circuits. We have especially focused on the techniques that are either discussed or used in the following chapters and on the techniques taken up in the specialised literature that are more related to thermal testing of IC's. As a summary, we have presented two different approaches to perform thermal analysis of IC's: analytic approaches and numerical ones. In the first approach, we have explained the technique known as separation of variables and series development of the temperature. The subject has been introduced progressively: from the static two-dimensional analysis of a one-layer structure, to the dynamic three-dimensional analysis of multi-layer structures. In the presentation of numerical methods, we have focused on the finite difference method. Its mathematical principle has been derived and we have described the procedure to extract either a set of linear equations or an equivalent RC circuit that models heat transfer through the silicon. We have also presented strategies to reduce the number of nets of this RC model, and devised procedures by which it can be coupled with electrical simulators to perform electro-thermal analysis.

6. REFERENCES

[93] W.T. Matzen, R.A. Meadows, J.D. Merryman, and S.P. Emmons, "Thermal Techniques as Applied to Functional Electronic Blocs," Proceedings IEEE, Dec. 1964, pp. 1496-1501.
[94] R.P. Gray and D.J. Hamilton, "Analysis of Electrothermal Integrated Circuits," IEEE Journal of Solid State Circuits, Vol. SC-6, no. 1, Feb. 1971, pp. 8-14.
[95] V. Szekely, "Thermal Monitoring of Microelectronic Structures," Microelectronics Journal, Vol. 25, 1994, pp. 157-170.
[96] R.P. Gray, D.J. Hamilton and J.D. Lieux, "Analysis and Design of Temperature Stabilized Substrate Integrated Circuits," IEEE Journal of Solid State Circuits, Vol. SC-9, no. 2, pp. 61-69.
[97] P. Antognetti, G.R. Bisio, F. Curatelli and S. Palara, "Three Dimensional Transient Thermal Simulation: Application to Delayed Short Circuit Protection in Power IC's," IEEE Journal of Solid State Circuits, Vol. SC-15, no. 3, 1980, pp. 227-281.
[98] F.N. Massana, "A Closed Form Solution of Junction to Substrate Thermal Resistance in Semiconductor Chips," IEEE Transactions on Components, Packaging and Manufacturing Technology - Part A, Vol. 19, no. 4, Dec. 1996, pp. 539-545.
[99] W. Van Petegem, B. Geetaerts, W. Sansen and B. Graindourze, "Electrothermal Simulation and Design of Integrated Circuits," IEEE Journal of Solid State Circuits, Vol. 29, no. 2, Feb. 1994, pp. 143-146.
[100] K. Poulton, K.L. Knudsen, J.J. Corcoran, K.C. Wang, R.L. Pierson and R.B. Nubling, "Thermal design and simulation of bipolar integrated circuits," IEEE Journal of Solid State Circuits, Vol. 27, no. 10, Oct. 1992, pp. 1379-1386.
[101] C.C. Lee, A.L. Palisoc and Y.J. Min, "Thermal analysis of integrated circuit devices and packages," IEEE Transactions on Computers, Hybrids, and Manufacturing Technology, Vol. 12, no. 4, Dec. 1989, pp. 701-709.
[102] C.C. Lee, A.L. Palisoc and Y.J. Min, "A general integration algorithm for the inverse Fourier transform of four-layer infinite plate structures," IEEE Transactions on Computers, Hybrids, and Manufacturing Technology, Vol. 12, no. 4, Dec. 1989, pp. 710-716.
[103] H.S. Carslaw and J.C. Jaeger, "Conduction of Heat in Solids," Oxford Science Publications, 1959.
[104] D. Maillet, S. Andre, J.C. Batsale, A. Degiovani and C. Moyne, "Thermal Quadrupoles. Solving the Heat Equation through Integral Transforms," Wiley, 2000.
[105] R.V. Churchill, "Fourier Series and Boundary Value Problems," McGraw-Hill, Second Edition, 1963.
[106] G.P. Tolstov, "Fourier Series," Dover Publications, Inc., 1976.
[107] A.F. Mills, "Heat and Mass Transfer," Irwing Inc., 1995.
[108] L.C. Thomas, "Heat Transfer," Prentice Hall, 1992.
[109] R.D. Lindsted and R.J. Surty, "Steady-State Junction Temperatures of Semiconductor Chips," IEEE Transactions on Electron Devices, Vol. ED-19, no. 1, Jan. 1972, pp. 41-44.
[110] T.S. Fisher, C.T. Averdisian and J.P. Krusius, "Transient Thermal Response Due to Periodic Heating on a Convectively Cooled Substrate," IEEE Transactions on Components, Packaging and Manufacturing Technology - Part B, Vol. 19, no. 1, Feb. 1996, pp. 225-262.
[111] R. Castello and P. Antognetti, "Integrated-Circuit Thermal Modeling," IEEE Journal of Solid-State Circuits, Vol. SC-13, no. 3, June 1978, pp. 363-366.
[112] V. Kadambi and N. Abuaf, "An Analysis of the Thermal Response of Power Chip Packages," IEEE Transactions on Electron Devices, Vol. ED-32, no. 6, June 1985.
[113] D. Chen, E. Li, E. Rosanbaum and S.S. Kang, "Interconnect Thermal Modeling for Accurate Simulation of Circuit Timing and Reliability," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 19, no. 2, Feb. 2000, pp. 197-205.
[114] A. Ammous, S. Ghedira, B. Allard and D. Renault, "Choosing a Thermal Model for Electrothermal Simulation of Power Semiconductor Devices," IEEE Transactions on Power Electronics, Vol. 14, no. 2, March 1999, pp. 300-307.
[115] I. Guven, C.L. Chan and E. Madenci, "Transient Two-Dimensional Thermal Analysis of Electronic Packages by the Boundary Element Method," IEEE Transactions on Advanced Packaging, Vol. 22, no. 3, August 1999, pp. 476-486.
[116] J.T. Hsu and L. Vu-Quoc, "A Rational Formulation of Thermal Circuit Models for Electrothermal Simulation - Part I: Finite Element Method," IEEE Transactions on Circuits and Systems - I, Vol. 43, no. 9, Sep. 1996, pp. 721-732.
[117] J.T. Hsu and L. Vu-Quoc, "A Rational Formulation of Thermal Circuit Models for Electrothermal Simulation - Part II: Model Reduction Techniques," IEEE Transactions on Circuits and Systems - I, Vol. 43, no. 9, Sep. 1996, pp. 733-744.
[118] L.T. Pillage and R.A. Rohrer, "Asymptotic Waveform Evaluation for Timing Analysis," IEEE Transactions on Computer-Aided Design, Vol. 9, no. 4, pp. 352-366.
[119] D. Liu, V. Phanilatha, Q. Zhang and M.S. Nakhla, "Asymptotic Thermal Analysis of Electronic Packages and Printed-Circuit Boards," IEEE Transactions on Components, Packaging, and Manufacturing Technology - Part A, Vol. 18, no. 4, Dec. 1995.
[120] S. Wünsche, C. Clauß, P. Schwarz and F. Winkler, "Electro-Thermal Circuit Simulation Using Simulator Coupling," IEEE Transactions on VLSI Systems, Vol. 5, Sep. 1997, pp. 277-282.
[121] G. Digele, S. Lindenkreuz and E. Kasper, "Fully Coupled Dynamic Electro-Thermal Simulation," IEEE Transactions on VLSI Systems, Vol. 5, no. 3, Sep. 1997, pp. 250-257.
[122] M.N. Sabry, A. Bontemps, V. Aubert and R. Vahrmann, "Realistic and Efficient Simulation of Electro-Thermal Effects in VLSI Circuits," IEEE Transactions on VLSI Systems, Vol. 5, no. 3, Sep. 1997, pp. 283-289.
[123] A.R. Hefner and D.L. Blackburn, "Simulating the Dynamic Electrothermal Behavior of Power Electronic Circuits and Systems," IEEE Transactions on Power Electronics, Vol. 8, no. 4, Oct. 1993, pp. 376-385.
[124] S.S. Lee and D.J. Allstot, "Electrothermal Simulation of Integrated Circuits," IEEE Journal of Solid State Circuits, Vol. 28, no. 12, Dec. 1993, pp. 1283-1293.
[125] V. Szekely, A. Poppe, A. Páhi, A. Csendes, G. Hajas and M. Rencz, "Electro-Thermal and Logi-Thermal Simulation of VLSI Designs," IEEE Transactions on VLSI Systems, Vol. 5, no. 3, Sep. 1997, pp. 258-269.
[126] J. Altet, A. Rubio, E. Shaub, S. Dilhaire and W. Claeys, "Thermal Coupling in Integrated Circuits: Application to Thermal Testing," IEEE Journal of Solid State Circuits, Vol. 36, no. 1, Jan. 2001, pp. 81-91.
[127] K. Fukahori and P.R. Gray, "Computer Simulation of Integrated Circuits in the Presence of Electrothermal Interaction," IEEE Journal of Solid-State Circuits, Vol. SC-11, no. 6, Dec. 1976, pp. 834-848.
[128] V. Szekely, M. Renz and B. Courtois, "Tracing the Thermal Behavior of IC's," IEEE Design and Test of Computers, April-June 1998, pp. 14-21.

Chapter 4

Temperature as a test observable variable in ICs
Thermal testing

1. INTRODUCTION

Temperature is a physical magnitude that has been used as a parametric test observable for IC's in different scenarios. In the 1990's, the strategies that used temperature as a test observable were generically termed thermal testing techniques. Thermal testing can be defined as the use of temperature measuring techniques for the detection of structural defects in an IC. Such defects include structural problems in the topology of the microelectronic circuit, or in the package structure (which comprises the package itself, but also the soldering integrity with the PCB and the cooling mechanisms, such as fans or radiators). In Chapter 1 the first set of defects was termed intrinsic defects, whereas the second set was referred to as extrinsic defects. The goal of this introduction is to classify the nature of the defects that we could detect by thermally testing an IC.

Figure 4.1 shows an electrical model of the typical block diagram for a thermal testing procedure. The current sources model the power dissipated by the devices in this IC. As explained in the previous chapter, heat transfer through the IC can be modelled with an RC network. This block includes all the heat paths from the heat sources to the heat sink. The silicon die, package, pins, radiator and fan all form part of the heat path. The heat sink is usually the air that surrounds the IC, at a point where its temperature is not affected by the amount of power dissipated by the devices. This condition can be found at a given distance from the IC. Due to this power dissipation, the temperature of the IC rises from the ambient temperature and is monitored at different temperature observation points with a temperature sensor system, which transforms the measured magnitude into an electrical output variable (Vout in Figure 4.1).

[Figure 4.1 block diagram: the power dissipated by devices (dissipating devices 1 to n) drives the RC model of heat transfer through the IC (IC structure: chip + coolers, inside a dashed box), which is referenced to the heat sink (Vamb: ambient temperature). Output nodes 1 to m are the temperature monitoring points, read by the temperature sensor system, whose output Vout is a function of the measured temperature.]

Figure 4.1: Block diagram of a thermal testing procedure. Electrical model.

In Figure 4.1 we have used a dashed box to indicate the elements that model the chip structure. In this particular case, the temperature monitoring system is outside the box. However, this is not the only possible configuration, as we will see cases in which the temperature sensor is integrated into the IC under test (built-in temperature sensors). During a testing procedure, the temperature waveforms observed at the monitoring points are processed and certain figures of merit are extracted. These figures are compared with those derived from fault-free cases, and depending on the result of the comparison, the circuit is assumed to be structurally defective or operative. The temperature waveform typical of fault-free cases can be obtained either from thermal measurements of a fault-free circuit, or from thermal analysis of a fault-free IC structure. The specific figure of merit depends on the target fault. Various cases are presented throughout this chapter.

There are three factors that may cause a change in the temperature waveform at the temperature monitoring point: first, a variation of the ambient temperature, as it directly affects the value of the sensed temperature; second, a variation in the magnitude of the power dissipated by the heat sources; and third, changes in the circuit that models heat transfer through the IC chip. The latter category includes topology changes or modifications of the resistance or capacitance values of the components that constitute the circuit model. As for thermal testing, variation of the power magnitude dissipated by the heat sources and changes in the circuit that models heat transfer may indicate structural changes at specific points of the chip. First, changes in the dissipated power are caused by a differing electrical behaviour of the devices that make up the circuit.
Chapter 1 showed how the presence of a structural defect in a circuit can cause modifications of the electrical behaviour of some or all of its devices. Second, changes in the RC circuit that models heat transfer through the silicon may be caused by changes in the structure of the IC package or its assembly. Typical examples of factors that increase the thermal coupling resistance between the heat sources and the temperature monitoring points are faulty thermal contacts between the different layers that constitute the package (due to poor adhesion between them) or the breakdown of the cooling fan. An increase in the thermal coupling resistance causes an increase in the temperature measured for the same amount of power dissipated by the heat sources. These two points make up the main subject matter of this chapter. The effect of ambient temperature variations on temperature measurements will be discussed in Chapter 5.

This chapter is structured as follows: Section 4.2 discusses how the modification of the thermal path between the heat source and the heat sink affects the temperature measurements performed on the IC (extrinsic defects). The effects of structural defects in the circuit topology on the temperature map of the silicon are discussed in Section 4.3 (intrinsic defects). In pursuance of the detection of intrinsic defects, we will analyse how different defects that may appear in devices (listed in Chapter 1) affect the power dissipation of devices and the thermal map of the IC. Finally, we will see how a simple procedure using thermal information can indicate the location of defects in the IC layout.

2. MODIFICATION OF THE THERMAL PATH BETWEEN THE HEAT SOURCES AND THE HEAT SINK

Defects in the chip structure that modify the thermal path from the heat source to the heat sink can be detected and, in some cases, located and identified using temperature measurements. A wide array of defects can be detected with thermal measurements: bad joining between different layers that form the chip structure, incorrect chip soldering to the PCB, faulty chip mounting in a socket, etc. We will present some of the capabilities of thermal testing in this field in two examples. The first example is devoted to thermal testing of the quality of solder joints in surface-mounted technology packages. The second example is more general, and devoted to the thermal testing of package quality. In fact, the entire theory described in the second example can also be used in the first one. For pedagogical reasons, we will first cover a very specific example and later the general theory.

2.1 Example 1: thermal testing of the quality of solder joints [129],[130].

Figure 4.2 is a diagram of a surface-mounted technology (SMT) solder joint: it is seen as a pile-up of different elements: lead, solder, copper, insulator and substrate. Solder joints can undergo severe strain in some environments. A good example can be found in automotive applications, where the presence of the engine and the mobile nature of the car subject the solders to thermal cycling and vibrations, which limit their reliability. One of the most common effects of these strains is the appearance of cracks in the layer interfaces that form the solder joint. These cracks have two consequences: the generation of electrical barriers in the electrical domain, and of thermal barriers in the thermal domain. However, thermal barriers are revealed more rapidly than their electrical counterparts, as electrical connections between two surfaces in contact are more effective than thermal ones.

[Figure 4.2 diagram: cross-section of the solder joint on its substrate, with the location of the optical dilatation reading indicated.]

Figure 4.2: Diagram of a solder joint. ([129]. Image courtesy of the CPMOH Lab. Universite Bordeaux 1).

This means that whereas the electrical resistance of the joint does not significantly change during its life (unless it is close to lethal failure), the thermal behaviour of this joint undergoes marked evolution over time. If an electrical current passes through the joint, heat is generated inside the structure due to the Joule effect. In addition, heat is absorbed or released in the layer interfaces due to the Peltier effect. These effects can be measured with an interferometer. This instrument is discussed in detail in the next chapter. For now, bear in mind that the interferometer can measure dilatations of the whole joint structure due to temperature increases. For example, in this case, it can measure the dilatation at the location indicated in Figure 4.2, taking the base of the substrate as a reference level. By measuring dilatation, we obtain information about the temperature of the whole structure, as the dilatation of inner layers due to their temperature increase generates displacement of the surface layer where measurements are taken. Results published in [129], [130] show the absolute dilatation response of the whole joint structure when a current (1 A) passes through the solder joint for 50 ms. Figure 4.3 shows the waveforms obtained from two responses, measured before and after an accelerated ageing process consisting of 1,000 thermal cycles with temperature excursions ranging from -40 °C to 125 °C, with a transition speed of 4 °C/minute and the temperature kept at the extreme values for 1 hour. Whereas the electrical resistance measured in [129] does not exhibit a significant change (ΔR/R < 5%), the dynamic thermal behaviour shows major differences in both amplitude and rise-fall times. This thermal behaviour indicates that a crack is being produced in the joint structure.

[Figure 4.3 plot: absolute expansion (nm, 0 to 15) versus time (0 to 0.30 s), curves (a) and (b).]

Figure 4.3: Absolute expansion response of a solder joint as a function of time. (a) Before ageing. (b) After ageing. ([129] Image courtesy of the CPMOH-U. Bordeaux 1).

The measured dilatation, Δd, has two independent contributions: one proportional to the current I that flows through the structure, and another proportional to I². The first term is due to the Peltier effect, whereas the second is due to the Joule effect. For example, in Figure 4.2, we see that the current must cross three layers: lead (L), solder (S) and copper (C). Therefore, the measured dilatation can be written as:

$$ \Delta d_1 = (K_1 \Pi_{LS} + K_2 \Pi_{SC}) \cdot I + (K_4 R_L + K_5 R_S + K_6 R_C) \cdot I^2 \;\Rightarrow\; \Delta d_1 = A \cdot I + B \cdot I^2 \tag{4.1} $$

where the constants $K_i$ are dilatation coefficients, $\Pi_{ij}$ is the Peltier coefficient between materials i and j, and R is the electrical resistance of the different materials. A and B are proportionality constants. Since the Peltier effect is reversible, if the current passing through the joint is reversed, the dilatation would be:

$$ \Delta d_2 = -A \cdot I + B \cdot I^2 \tag{4.2} $$

Peltier and Joule contributions can be isolated if Δd1 − Δd2 and Δd1 + Δd2 are obtained: the difference retains only the Peltier term, and the sum only the Joule term. Results published in [130] show that the Peltier effect accurately tracks the cracks that appear in the interface between the layers forming the joint, as it is an effect that only occurs in these interfaces.
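The separation follows directly from (4.1) and (4.2): half the difference of the two measurements gives the Peltier term A·I, and half their sum gives the Joule term B·I². A minimal sketch, assuming the two dilatation waveforms are sampled at the same instants (the function name and the numerical values are hypothetical):

```python
def separate_peltier_joule(d_forward, d_reverse):
    """Split two dilatation waveforms into Peltier and Joule contributions.

    d_forward : samples of the dilatation measured with current +I, eq. (4.1)
    d_reverse : samples of the dilatation measured with current -I, eq. (4.2)
    """
    peltier = [(a - b) / 2.0 for a, b in zip(d_forward, d_reverse)]  # A * I
    joule = [(a + b) / 2.0 for a, b in zip(d_forward, d_reverse)]    # B * I**2
    return peltier, joule


# Hypothetical samples in nm, built from A*I = 0.5 nm and B*I**2 = 10 nm.
d_forward = [0.0, 10.5]
d_reverse = [0.0, 9.5]
peltier, joule = separate_peltier_joule(d_forward, d_reverse)
```

Because the Joule term is usually much larger than the Peltier term, the difference is sensitive to noise and to any mismatch between the two current amplitudes.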

[Figure 4.4 plots: dilatation (nm) versus time (0 to 0.5 s); left panel with a vertical axis from 0 to 20 nm, right panel from 0 to 2.0 nm.]

Figure 4.4: Joule (left) and Peltier (right) expansion of a solder joint. ([130] Image courtesy of the CPMOH-U. Bordeaux 1).

2.2 Example 2: thermal testing of the quality of packages. [131]-[148]

Figure 4.5 features the block diagram of a thermal testing procedure (similar to Figure 4.1) but with just one dissipating device acting as a heat source and one temperature monitoring point. Focusing now on this diagram, the question to answer in this section is whether it is possible to extract information about the IC structure (package and coolers) from thermal measurements. Assuming linearity, we can write a transfer function between the temperature observed at the monitoring point and the power dissipated by the heat source. We will name this transfer function h(t) in the time domain and H(s) in the Laplace domain.

[Figure 4.5 block diagram: the power p(t) dissipated by dissipating device 1 drives the RC model of heat transfer through the IC (IC structure: chip + coolers, inside a dashed box), which is referenced to the heat sink (Vamb: ambient temperature). The temperature temp(t) at output node 1 is read by the temperature sensor system, with outputs Vout, Iout, fout.]

Figure 4.5: Block diagram of a thermal testing procedure with just one heat source and one monitoring point.

$$ H(s) = \text{Laplace transform of } h(t) = \int_{t=0}^{\infty} h(t) \cdot e^{-st} \, dt \tag{4.3} $$

This transfer function is defined as:

$$ H(s) = \frac{T(s)}{P(s)} \tag{4.4} $$

where T(s) is the Laplace transform of the temperature, temp(t), observed at the monitoring point with null initial conditions, and P(s) is the Laplace transform of the power, p(t), dissipated by the heat source. By using the convolution theorem, the temperature waveform at the monitoring point can be derived for any power dissipated by the heat source if the transfer function is known:

$$ \mathrm{temp}(t) = p(t) * h(t) = \int_{-\infty}^{+\infty} p(\xi) \cdot h(t - \xi) \, d\xi \tag{4.5} $$

where * is the convolution operator. In the case of Figure 4.5, this transfer function has a physical meaning. If the temperature monitoring point is at the heat source location, the function H(s) is the thermal impedance of the heat source. The thermal resistance defined in the previous chapter is H(0), the value at the origin of the transformed thermal impedance. Therefore, the thermal impedance is a more general concept, as it describes the temperature dynamics of a component as a function of its dynamic power dissipation. If the location of the temperature monitoring point is different from the location of the heat source, the function H(s) is the thermal coupling impedance between the two locations. The thermal coupling resistance described in the previous chapter is also the value at the origin of the transformed transfer function.

As explained in Section 3.3, the transfer function depends on the thermal properties (thermal conductivity, density, specific heat) and the dimensions of the materials that make up the IC. If the presence of any structural defect in the package modifies any of the thermal properties listed above, the transfer function will change. Therefore, this function is a signature of the internal thermal structure of the chip under test, and a signature of the physical structure of the materials that make it up.

The transfer function can be measured in two ways: by working in the frequency domain or in the time domain. In this section we will explain the second approach, as it is the most widely used by the references listed at the end of this chapter. If the power dissipated by the heat source is the Dirac delta function, δ(t), the temperature measured at the monitoring point coincides with the transfer function of the system:

$$ p(t) = \delta(t) \Rightarrow P(s) = 1, \qquad H(s) = \frac{T(s)}{P(s)} = \left. T(s) \right|_{P(s)=1} \tag{4.6} $$

It is important to emphasise the word "coincides", as the transfer function is not a signal, but rather represents the composition of the system. The measured temperature is a signal, which formally coincides with the transfer function when the power dissipation is the Dirac delta function. Although mathematically clear, it is very difficult to obtain this power dissipation due to its infinite bandwidth. An easier function to obtain is the step function, u(t), defined as:

$$ p(t) = P_0 \cdot u(t), \qquad u(t) = \begin{cases} 0 & t < 0 \\ 1 & t \ge 0 \end{cases} \tag{4.7} $$

In this case, the temperature measured at the monitoring point coincides with the time integral of the transfer function:

$$ p(t) = u(t) \Rightarrow P(s) = \frac{1}{s}, \qquad T(s) = H(s) \cdot P(s) = \frac{H(s)}{s} \Rightarrow \mathrm{temp}(t) = \int_0^t h(q) \, dq \tag{4.8} $$

Therefore:

$$ h(t) = \left. \frac{d(\mathrm{temp}(t))}{dt} \right|_{p(t)=u(t)} \tag{4.9} $$
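Equations (4.5) and (4.9) can be combined numerically: the impulse response is estimated by differentiating a sampled step response, and the response to any other power waveform then follows by discrete convolution. The sketch below is an illustration only; it assumes a hypothetical single time constant system (50 K/W, 1 ms) in place of measured data, and the function names are introduced here.

```python
import math


def impulse_response(step_response, dt):
    """Eq. (4.9): h(t) as the time derivative of the unit-step response (forward differences)."""
    return [(step_response[n + 1] - step_response[n]) / dt
            for n in range(len(step_response) - 1)]


def convolve(power, h, dt):
    """Eq. (4.5) discretised for causal signals: temp[n] = dt * sum_k power[k] * h[n - k]."""
    out = []
    for n in range(len(h)):
        acc = 0.0
        for k in range(n + 1):
            if k < len(power):
                acc += power[k] * h[n - k]
        out.append(acc * dt)
    return out


# Hypothetical unit-step response of a single-pole thermal system.
R_TH, TAU = 50.0, 1.0e-3      # K/W and s, assumed
DT = 1.0e-5                   # sampling interval (s)
step = [R_TH * (1.0 - math.exp(-n * DT / TAU)) for n in range(501)]

h = impulse_response(step, DT)

# Response to a 1 W pulse lasting 1 ms (100 samples), followed by cooling.
pulse = [1.0] * 100 + [0.0] * 400
rise = convolve(pulse, h, DT)
```

Differentiation amplifies measurement noise, so with real data the step response would normally be smoothed or fitted before applying (4.9).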

Thus, the temperature signal obtained when the step function is the power dissipated contains all the information about the thermal system dynamics. Many authors define thermal transient impedance as the normalised temperature function when the step function is the power dissipated:

$$ Z_{th\_tr} = \frac{\mathrm{temp}(t) - T_{amb}}{P_0} \tag{4.10} $$

where temp(t) is the temperature at the monitoring point, $T_{amb}$ is the reference temperature in Figure 4.5 and $P_0$ is the amplitude of the power step. The thermal transient impedance can be written as the infinite sum of exponential terms:

$$Z_{th\_tr} = \sum_{i=1}^{\infty} R_{thi}\cdot\left(1 - e^{-t/\tau_i}\right) \qquad (4.11)$$

where Rthi is the amplitude term associated with the time constant τi. We have included the proof of (4.11) as an exercise. This expression can be reached by solving the heat transfer equation in the IC structure when the power dissipated by the heat source is the step function. References [147], [152] and [153] show how to obtain (4.11). Figure 4.6 shows examples of different Zth_tr functions that could be obtained in different ICs. All these curves, although realistic in shape, have been plotted by hand to give an idea of this function's appearance. Changes in the "charging" slope of a curve are caused by the different time constants of the different regions through which heat flows, which may differ by one or two orders of magnitude or more. The logarithmic time axis in Figure 4.6 is therefore essential. For example, the initial temperature rise is governed by the terms with the smallest time constants. This occurs when heat spreads in the heat source region: due to the small volume involved, the thermal capacitance is small and so is its time constant. The temperature increase during the first microseconds is dominated by the heat source dimensions, and is attributable to its thermal resistance. The larger the capacitance associated with a thermal resistance, the longer the heating interval until its influence becomes visible. Large capacitances are always associated with large volumes, such as the mass of the package. The set of transient thermal impedance curves plotted in Figure 4.6 suggests that the heat source and the silicon die are the same in all the cases analysed, whereas there are differences in the packaging materials, type and/or mounting topology. This statement is based on the fact that the curves only diverge when the terms with larger time constants dominate.

[Plot not reproduced: transient thermal impedance curves (vertical axis ticks from 10 to 70) against time in seconds on a logarithmic axis from 10^-4 to 100.]

Figure 4.6: Examples of transient thermal impedances.

When the temperature is measured at the heat source location, the transient thermal impedance tends towards thermal resistance as time increases:

$$R_{th} = Z_{th\_tr}(t = \infty) = \sum_{i=1}^{\infty} R_{thi} \qquad (4.12)$$

If expression (4.11) is obtained analytically, the values of the different terms of the series, Rthi and τi, are always positive (if the temperature is obtained at the heat source location) and decrease in magnitude as i increases. If the temperature is obtained at a location different from that of the heat source, some terms of the Rthi series may be negative, but their absolute values always decrease as i increases. Therefore, (4.11) can be approximated by simply taking the first N terms of the series into account:

$$Z_{th\_tr} \approx \sum_{i=1}^{N} R_{thi}\cdot\left(1 - e^{-t/\tau_i}\right) \qquad (4.13)$$
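The truncated series (4.13) is straightforward to evaluate numerically. The sketch below does so on a logarithmic time axis for N = 3; the resistances and time constants are hypothetical values chosen only to give well-separated terms, and `zth_foster` is a name introduced here for illustration.

```python
import numpy as np

# Hypothetical three-term parameters (illustrative values only).
R_TH = np.array([5.0, 15.0, 40.0])     # amplitudes R_thi, K/W
TAU = np.array([1e-3, 5e-2, 2.0])      # time constants tau_i, s


def zth_foster(t, r_th=R_TH, tau=TAU):
    """Evaluate (4.13): sum of R_thi * (1 - exp(-t / tau_i)) at each time t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.sum(r_th * (1.0 - np.exp(-t[:, None] / tau)), axis=1)


t = np.logspace(-4, 2, 61)      # 1e-4 s to 100 s, ten points per decade
z = zth_foster(t)
print(f"Z at 1e-4 s: {z[0]:.3f} K/W, Z at 100 s: {z[-1]:.3f} K/W")
```

At long times the result approaches the sum of the Rthi terms, in agreement with (4.12).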

The value of N that should be used in order to describe the thermal transient impedance depends on the IC, mounting topology, etc. The same IC with a different cooling system (for example, natural convection or forced convection) gives different thermal transient impedance curves. In the bibliography, different case studies are described in detail. In most of these cases N ranges from 3 to 7. From a thermal modelling point of view, expression (4.13) is very interesting, as it maps directly onto an electrical circuit model. For example, Figure 4.7 shows the electrical model of (4.13) for N=3.

Figure 4.7: Foster network that models (4.13).

The network plotted in Figure 4.7 is called a Foster network. Therein, each time constant is simply equal to the product of the resistance and capacitance of each cell: τi = Rthi·Ci. Although this circuit has been taken directly from (4.13), it does not have physical meaning if its internal nodes are used to interpret the dynamic thermal behaviour at internal points of the IC. In this electrical circuit, the current flowing through a capacitor during a dynamic regime is the same on both sides of the device, due to the symmetrical variation of the positive and negative electrical charges. In the thermal domain, there is no analogue of negative electric charge. There is only an analogy between the electrical and thermal domains for the current (heat flow) flowing on one side of the capacitor. A complete analogy exists between the thermal and electrical domains if all the capacitors of the network have a terminal connected to ground. This is the case of the circuit in Figure 4.8, known as a Cauer network. There is no direct correspondence between the elements of the Cauer and Foster networks. In the Cauer network, the value of a time constant does not depend on a single RC pair, but rather on all the resistances and capacitances of the circuit. Chapter 3 of [150], among other textbooks, discusses how to synthesise a transfer function into a Cauer or a Foster network.

Figure 4.8: Cauer network that models (4.13).
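The statement that a Cauer time constant depends on every element of the ladder can be checked numerically. The sketch below is a minimal illustration, assuming a ladder in which each node has a capacitor to ground, each resistor links one node to the next and the last resistor ends at ambient; the function names and the unit element values are hypothetical.

```python
import numpy as np


def cauer_state_matrix(r, c):
    """State matrix A of dT/dt = A T + b P for a grounded-capacitor ladder.

    Node m has capacitor c[m] to ground; r[m] links node m to node m + 1,
    and the last resistor links the last node to ambient (ground).
    """
    n = len(r)
    a = np.zeros((n, n))
    for m in range(n):
        if m > 0:
            a[m, m - 1] = 1.0 / (r[m - 1] * c[m])
            a[m, m] -= 1.0 / (r[m - 1] * c[m])
        a[m, m] -= 1.0 / (r[m] * c[m])
        if m < n - 1:
            a[m, m + 1] = 1.0 / (r[m] * c[m])
    return a


def cauer_time_constants(r, c):
    """Time constants of the ladder, sorted from largest to smallest."""
    eigenvalues = np.linalg.eigvals(cauer_state_matrix(r, c))
    return np.sort(-1.0 / eigenvalues.real)[::-1]


r = [1.0, 1.0]    # hypothetical resistances, K/W
c = [1.0, 1.0]    # hypothetical capacitances, J/K
tau = cauer_time_constants(r, c)
print("R*C of each cell:", [ri * ci for ri, ci in zip(r, c)])
print("ladder time constants:", tau)
```

For this two-cell ladder every RC product equals 1 s, yet neither time constant of the network does, which is the behaviour described above.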

If the heat flow inside the IC structure is approximately one-dimensional and the heat losses due to lateral radiation and convection are negligible, we can establish a correspondence between the different nodes of the Cauer network and locations inside the IC structure. It should be underscored that we reached the same conclusion in the previous chapter, when the finite difference method was used to extract RC models of heat transfer through the silicon structure. Therefore, this is an experimental way to generate thermal models of packages. The models generated will continue to be valid as long as the boundary conditions do not change and the main heat flow path of the situation under analysis is the same as that existing when the thermal transient impedance was extracted. So far, we have seen how we can use thermal measurements to extract information on the structure of an IC to develop an electrical model of it. Before analysing how this information can be used to perform thermal testing of packages, let us briefly present how expression (4.13) can be derived from measurements such as that plotted in Figure 4.6. Two different approaches are followed in the relevant literature to obtain the time constant spectrum from a thermal transient impedance. The time constant spectrum is the representation of the function Rthi = f(τi). An example is plotted in Figure 4.9. Note that the horizontal axis is plotted in logarithmic scale. If the values of the Rthi associated with each time constant span several orders of magnitude, the vertical axis is usually also plotted in logarithmic scale.

[Plot not reproduced: discrete time constant spectrum; horizontal axis Time Constant τ (s), logarithmic, from 10^-4 upwards.]

Figure 4.9: Example of time constant spectrum.
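A discrete spectrum of this kind can be estimated from a transient curve by fitting and subtracting one exponential term at a time. The sketch below is a simplified illustration of that idea and not the procedure of any cited reference: it assumes noise-free data, exactly two terms, a known steady-state value and hand-picked fitting windows, and all names and numbers in it are hypothetical.

```python
import numpy as np


def fit_tail_exponential(t, y, t_start, t_stop):
    """Fit y ~ R * exp(-t / tau) on [t_start, t_stop] by a log-linear fit."""
    window = (t >= t_start) & (t <= t_stop)
    slope, intercept = np.polyfit(t[window], np.log(y[window]), 1)
    return np.exp(intercept), -1.0 / slope      # (R, tau)


def peel_two_terms(t, z, r_total, slow_window, fast_window):
    """Extract two exponential terms from a step response z(t)."""
    remainder = r_total - z          # equals the sum of R_i * exp(-t / tau_i)
    r_slow, tau_slow = fit_tail_exponential(t, remainder, *slow_window)
    # Subtract the slow term; only the early part of the curve is left.
    remainder = remainder - r_slow * np.exp(-t / tau_slow)
    r_fast, tau_fast = fit_tail_exponential(t, remainder, *fast_window)
    return (r_slow, tau_slow), (r_fast, tau_fast)


# Synthetic, noise-free curve built from hypothetical parameters.
t = np.linspace(0.0, 6.0, 60001)
z = 2.0 * (1 - np.exp(-t / 0.01)) + 5.0 * (1 - np.exp(-t / 1.0))
slow, fast = peel_two_terms(t, z, r_total=7.0,
                            slow_window=(3.0, 6.0), fast_window=(0.005, 0.03))
print("slow term (R, tau):", slow)
print("fast term (R, tau):", fast)
```

The slow term is fitted first because it is the only one still visible in the tail of the curve; once it is removed, the remaining waveform is confined to much shorter times. With measured data each subtraction lowers the signal to noise ratio, so the windows and the number of terms would have to be chosen with the noise level in mind.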

One approach to obtain a spectrum such as that in Figure 4.9 is to perform numerical analysis. An example can be found in [148], where the reader may find details of the technique. These analyses evaluate the exponential functions one at a time, starting from the largest time constant. Once the parameters of an exponential are known, this function is subtracted from the thermal transient impedance curve. As the exponential with the largest time constant is the one that is expressed latest in the transient thermal impedance curve, the remaining waveform can be shortened. This reduces the complexity of the next iteration, as the number of samples of the remaining waveform to process is reduced. The method is applied iteratively. However, as the amplitude of the successive waveforms decreases, so does the signal to noise ratio. Therefore, the evaluation of the different Rthi, τi parameters must take into account the noise present in the measured thermal transient impedance waveform.

Another approach to extract the time constant spectrum is the use of the deconvolution operator. For example, this strategy is used in [135] to [145]. This approach is based on the hypothesis that the time constant spectrum is a continuous function rather than a discrete one, such as that plotted in Figure 4.9. This hypothesis is based on the fact that Figure 4.7 and Figure 4.8 are lumped representations of distributed circuits. In fact, in the previous chapter, the RC models of heat transfer through the IC were obtained by approximating the derivative operator present in the heat transfer equation by finite differences. Thus, if the time constant spectrum is a continuous function, (4.11) transforms into:

$$Z_{th\_tr}(t) = \int_{-\infty}^{+\infty} R_{th}(\tau)\cdot\left(1 - e^{-t/\tau}\right) d\tau \qquad (4.14)$$

If a logarithmic axis is used for time and time constants:

$$z = \ln t, \qquad \zeta = \ln \tau \qquad (4.15)$$

Then, (4.14) can be written as:

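$$Z_{th\_tr}(z) = \int_{-\infty}^{+\infty} R_{th}(\zeta)\cdot\left(1 - e^{-e^{\,z-\zeta}}\right) d\zeta \qquad (4.16)$$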

By differentiating both sides with respect to z we obtain:

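$$\frac{dZ_{th\_tr}}{dz} = \int_{-\infty}^{+\infty} R_{th}(\zeta)\cdot e^{\,z-\zeta-e^{\,z-\zeta}}\, d\zeta \qquad (4.17)$$

$$W(z) = e^{\,z - e^{z}}$$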

(4.17) can be written using the convolution operator:

$$\frac{dZ_{th\_tr}}{dz} = m(z) = R_{th}(z) * W(z) \qquad (4.18)$$
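The role of W(z) in (4.18) can be checked on the simplest possible spectrum, a single line. In the sketch below, a single exponential term with hypothetical values is differentiated with respect to z = ln t and compared with the weighting function shifted to the position of the line; the names `weight`, `r_th` and `zeta0` are introduced here for illustration only.

```python
import numpy as np


def weight(z):
    """Weighting function W(z) = exp(z - exp(z))."""
    return np.exp(z - np.exp(z))


z = np.linspace(-10.0, 5.0, 3001)
dz = z[1] - z[0]

# Single exponential term with hypothetical values: R = 3 K/W, ln(tau) = -2.
r_th, zeta0 = 3.0, -2.0
z_th = r_th * (1.0 - np.exp(-np.exp(z - zeta0)))    # step response against ln t
m_numeric = np.gradient(z_th, z)                    # left-hand side of (4.18)
m_expected = r_th * weight(z - zeta0)               # spectrum line convolved with W

area = np.sum(weight(z)) * dz
max_error = np.max(np.abs(m_numeric - m_expected))
print(f"area under W: {area:.5f}, largest mismatch: {max_error:.2e}")
```

The area under W(z) is close to one, so the convolution smears each spectral line without changing its total resistance. That smearing is what the deconvolution has to undo.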

Expression (4.18) offers a direct way to determine the continuous time constant spectrum from either measured or calculated thermal transient impedance waveforms. W(z) works as a weighting function and is known beforehand. By the convolution theorem, a convolution in the original domain becomes a product of functions in the Fourier-transformed domain. Therefore, when working in the transformed domain, the transformed time constant spectrum can be found by dividing the transform of the derivative of the thermal transient impedance by the transform of W(z). In order to obtain the Foster network from the continuous time constant spectrum, this function must be discretised. One approach is to generate a set of samples from the continuous function whose values are equal to:

$$R_{th}(\tau_j) = \int_{\tau_a}^{\tau_b} R_{th}(\tau)\, d\tau \qquad (4.19)$$
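A discretisation along the lines of (4.19) can be sketched as a bin-wise integration of a sampled spectrum. The example below is only an illustration: the flat spectrum density, the bin limits and the choice of the geometric mean of each bin as its representative time constant are assumptions made here, not prescriptions from the text.

```python
import numpy as np


def discretise_spectrum(tau, density, edges):
    """Integrate a sampled spectrum density over each bin [tau_a, tau_b].

    Returns one (R_j, tau_j) pair per bin; tau_j is the geometric mean of
    the bin limits, an arbitrary choice made for this sketch.
    """
    terms = []
    for tau_a, tau_b in zip(edges[:-1], edges[1:]):
        inside = (tau >= tau_a) & (tau <= tau_b)
        x, y = tau[inside], density[inside]
        r_j = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))   # trapezoidal rule
        terms.append((r_j, np.sqrt(tau_a * tau_b)))
    return terms


# Hypothetical flat spectrum density of 2 (K/W)/s between 0.5 s and 4.5 s.
tau = np.linspace(0.5, 4.5, 4001)
density = np.full_like(tau, 2.0)
foster_terms = discretise_spectrum(tau, density, edges=[1.0, 2.0, 4.0])
for r_j, tau_j in foster_terms:
    print(f"R = {r_j:.3f} K/W at tau = {tau_j:.3f} s")
```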

The accuracy of this method is limited by the presence of high frequency noise in the function m(z). This problem is thoroughly analysed in [137]. It is rooted in the fact that the function W(z) has small high frequency components. As the transformed value of this function is used as a divisor to obtain the transformed value of Rth(z), these small components amplify the high frequency noise of m(z). Thus, accurate filtering and deconvolution procedures are needed to obtain accurate and reliable results.

As explained at the beginning of this section, all the methods presented are directly applicable to the testing of IC packages. The presence of a defect in a package or its assembly usually involves the appearance of thermal barriers or changes in the material densities. For example, a faulty contact between two layers that form part of the package will increase the thermal contact resistance between them. Material voids reduce the mass density. Defective package mounting usually reduces the paths for the heat to flow from the heat sources to the heat sink. Therefore, an IC with structural defects in its package would exhibit a different transient thermal impedance response and a different time constant spectrum. In addition to the detection of defects, the processing of the thermal information available may help to locate them in the IC structure. This is possible when the heat flow is one-dimensional in the IC structure and the lateral thermal losses in the IC due to convection and radiation are negligible. In that case, as has already been discussed, the different nodes of the Cauer network that models the heat flow through the IC structure correspond to physical locations in the IC structure.
For example, Figure 4.10 shows the correspondence that may exist between the nodes of the Cauer network (N=6) and the physical locations of a three layer structure that models an IC (this method of modelling the IC structure was presented in the previous chapter). This correspondence can be established, for example, if thermal simulations of the IC structure are compared with electrical simulations of the RC net. One way to do so would be to use a static thermal analysis to extract isothermal surfaces whose temperature coincides with the voltage of the different nodes of the electrical net when the magnitude of the power dissipated by the heat source and the current supplied by the current source are the same.

[Diagram not reproduced: a three-layer IC structure (silicon, metal, package) with a heat source, adiabatic lateral surfaces and an isothermal surface, shown beside a Cauer network driven by a current source i(t); node numbers 1 to 6 are mapped to locations in the structure.]

Figure 4.10: Correspondence between the nodes of a Cauer network and locations inside a three-layer structure modelling an IC.

Reference [148] analyses, among others, a structure very similar to the one depicted in Figure 4.10. That work studies the influence that the thermal properties of the materials forming the IC structure have on the values of the resistances and capacitors of the net. As an example, one of its results is that if a defect generates a thermal resistance between the metal and the package layer, the resistor Rth'4 of the electrical net of the faulty IC will increase significantly with respect to that of the fault-free IC. The same occurs with Rth'2 and Rth'3 if the thermal contact resistance between the silicon and the metal layer increases. In this case, the subindex of the component in the electrical model of a faulty IC that disagrees the most with the electrical model of the fault-free circuit gives an idea of the location of the defect.
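The comparison between the fault-free and faulty models can be sketched in a few lines. The resistance values below are invented for illustration and are not taken from [148]; the function name is likewise hypothetical.

```python
import numpy as np


def most_deviating_element(reference, measured):
    """Return the 1-based index of the largest relative deviation and its value."""
    reference = np.asarray(reference, dtype=float)
    measured = np.asarray(measured, dtype=float)
    deviation = np.abs(measured - reference) / reference
    index = int(np.argmax(deviation))
    return index + 1, float(deviation[index])


# Hypothetical resistances (K/W) of a six-cell Cauer network.
r_fault_free = [0.8, 1.1, 0.9, 0.5, 2.0, 6.0]
r_faulty = [0.8, 1.1, 0.9, 1.4, 2.1, 6.0]
node, relative_change = most_deviating_element(r_fault_free, r_faulty)
print(f"largest deviation at resistor {node}: {relative_change:.0%}")
```

In practice the extracted values carry measurement and fitting uncertainty, so a threshold on the relative deviation would be needed before declaring a defect.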

[Chart not reproduced: resistance values at nodes 2 to 5 of the RC net for several mounting topologies; the legend is partly illegible in the source.]

Figure 4.11: Effect of different mounting topologies on the resistance value of a 6 node Cauer network that models heat transfer through the chip structure ([148], image courtesy of Professors P.E. Bagnoli, C. Casarosa, E. Dallago and M. Nardoni).

An interesting approach to using thermal measurements to extract information on the internal structure of an IC is discussed in [135], [140] and [143]. In this case, as in that above, the heat flow through the IC structure must be approximately one-dimensional for this technique to be accurate. If this condition is met, then we can extract the cumulative structure function from the Cauer-equivalent network. In order to properly understand the meaning of this function, we will discuss two additional functions: cumulative thermal capacitance, CΣm, and cumulative thermal resistance, RΣm. They are respectively defined as the cumulative sum of the capacitances and resistances of the Cauer network from the first node (that which is connected to the current source) up to node m. The cumulative structure function gives the values of CΣm as a function of the values of RΣm.

[Diagrams not reproduced. a) Heat source and isothermal surface at opposite ends of a uniform block with adiabatic lateral surfaces. b) The same arrangement with the block made of two materials, A and B. c) Heat source and isothermal surface in a homogeneous material.]

Figure 4.12: Three different cases of one-dimensional heat transfer.

The cumulative structure function is monotonically increasing, and can provide information on the heat flow through the IC structure. For example, let us analyse what would happen in the three cases illustrated in Figure 4.12. Case a) shows a one-dimensional heat flow through a uniform parallelepiped. Case b) shows a one-dimensional heat flow through a parallelepiped made up of two different materials. Case c) can be approximated to a one-dimensional flow through a homogeneous material (it would be a strictly one-dimensional heat flow if it could be described in cylindrical or spherical coordinates). Due to the simplified geometry and composition of these cases, their Cauer-equivalent networks can be easily extracted with the finite difference method presented in the previous chapter, using homogeneous grid spacing. If we do so, we will see that all the resistances that form the model of case a) have the same value. The same would be true for the capacitances. Therefore, the functions CΣm and RΣm of this structure would have a constant slope for any value of m, and the cumulative structure function would also present a constant slope. The same would be true in case b) if we individually consider material A and material B. Nonetheless, the cumulative structure function of the whole structure presents a discontinuity in the slope when the node m reaches the transition between the two materials. In case c), the cumulative structure function will present a second derivative greater than zero, due to the increase in the cross sectional area of the heat flow. This reduces the value of the resistance and increases the value of the capacitance as the node order of the Cauer network increases. From these three cases, we can conclude that a change in the slope of this function represents either a new material or an increase in the cross sectional area of the heat flow or both. Therefore, this function gives us an idea of the internal heat flow path: cross sectional area and material transitions.
For example, if in two thermal measurements of a priori identical IC structures the cumulative structure functions obtained have the same shape but one is shifted to the right with respect to the other, this would indicate an extra thermal resistance in the heat flow path for the same thermal capacitance. A descendant of this function is the differential structure function, defined as the derivative of the cumulative thermal capacitance with respect to the cumulative thermal resistance:

$$\text{differential structure function} = \frac{dC_{\Sigma m}}{dR_{\Sigma m}} \qquad (4.20)$$

If we consider a material such as that illustrated in Figure 4.12a, the capacitance and resistance of a dx-wide slice of this material can be written as:

$$dC_{\Sigma m} = C_{th}\cdot A\cdot dx, \qquad dR_{\Sigma m} = \frac{dx}{k\cdot A} \qquad (4.21)$$

where Cth is the heat capacitance per unit of volume, k is the thermal conductivity of the material and A is the cross sectional area of the heat flow. Thus, (4.20) can be transformed into:

$$\frac{dC_{\Sigma m}}{dR_{\Sigma m}} = C_{th}\cdot k\cdot A^{2} \qquad (4.22)$$

The differential structure function provides information about the structure of an IC as a function of the cumulative thermal resistance. The figures below show two examples of application of this function. Figure 4.13 ([140], with permission from the author) shows how the difference in thermal resistance between two chip mounting configurations can be found using this function. As can be observed, the socket increases the thermal resistance by about 25 °C/W. It is interesting to note that the function tends to infinity. This indicates that the heat flow reaches the heat sink, which by definition has an infinite thermal capacitance.
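The cumulative and differential structure functions can be computed directly from the element values of a Cauer network. The sketch below builds the network of a uniform bar from (4.21), with equal slices, and checks the slope against (4.22). The material constants and dimensions are hypothetical round numbers, and `structure_functions` is a name introduced here for illustration.

```python
import numpy as np


def structure_functions(r, c):
    """Cumulative sums of a Cauer network and the cell-wise slope dC/dR."""
    r = np.asarray(r, dtype=float)
    c = np.asarray(c, dtype=float)
    return np.cumsum(r), np.cumsum(c), c / r


# Hypothetical uniform bar cut into equal slices, as in case a) of Figure 4.12.
k = 150.0        # thermal conductivity, W/(m K)
c_th = 1.6e6     # heat capacity per unit volume, J/(m^3 K)
area = 1e-6      # cross sectional area, m^2
dx = 1e-5        # slice thickness, m
n = 50

r_cells = np.full(n, dx / (k * area))       # dR of (4.21)
c_cells = np.full(n, c_th * area * dx)      # dC of (4.21)
r_sum, c_sum, slope = structure_functions(r_cells, c_cells)
print(f"total resistance: {r_sum[-1]:.3f} K/W")
print(f"slope dC/dR: {slope[0]:.3e}, C_th*k*A^2: {c_th * k * area ** 2:.3e}")
```

Replacing part of the bar with a second material, or widening the cross section along the path, changes the slope from that cell onwards, which is how transitions show up in the structure function.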

[Plot not reproduced: differential structure functions of the two mounting configurations; vertical axis logarithmic, from 0.001 to 100; horizontal axis Rth (K/W), from 0 to 80.]

Figure 4.13: Effect of a chip socket on the differential structure function ([140], image courtesy of Professors V. Szekely, M. Rencz, A. Poppe and B. Courtois).

Peaks in the function indicate locations inside the structure where the thermal capacitance peaks. With these peaks, volumes with high thermal capacitance such as the package or mounting plates can be identified. Figure 4.14 ([143], with permission from the author) compares two differential structure functions. The numbers on the left-hand curve indicate peaks in the function that can be assigned to volumes in the IC structure with high thermal capacitance. For example, peak 3 indicates the thermal capacitance of the mounting plate upon which the chip under test has been mounted. As can be observed, the two curves are very similar in shape, but with a shift between them. This shift indicates that the two circuits have the same structure, but the left-hand one has more thermal resistance than the right-hand one. The point at which the two curves split indicates the location inside the IC where the increase in thermal resistance is located. In this case, this shift indicates an increase in the die attach contact resistance of the left-hand device with respect to the right-hand one, probably caused by a defective attachment.

[Plot not reproduced: differential structure functions of the two chips; vertical axis logarithmic, from 0.01 to 100; horizontal axis Rth (K/W), from 0 to 5.]

Figure 4.14: Differential structure function of two chips. The shift between the two curves indicates the location in the IC structure at which the increase in thermal resistance appears ([143], image courtesy of Professors V. Szekely and M. Rencz).

3. MODIFICATION OF THE HEAT SOURCES PRESENT IN THE IC

The goal of this section is to analyse how temperature measurements can detect structural defects in the electronic components of the circuit. In the model featured in Figure 4.1, we suppose that the heat path from the heat sources to the heat sink does not undergo any modification, as it is free from defects. However, the heat sources change their power dissipation. This means that, for some reason, the devices that make up the IC dissipate a power that is different from that expected. In this book we will focus on digital CMOS circuits and, from among the different defects listed in Chapter 1, on bridges and gate oxide shorts (GOS). However, the methodology used in this section to extract the data that is going to be dealt with can be applied to other defects, circuits or technologies. This section is divided into three subsections. The first analyses how the power dissipated by the devices changes from the expected value due to the presence of a defect. The second discusses how this power dissipation changes the thermal map of the surface of the silicon and, finally, the third explains how the location of the defect can be extracted in some cases by processing the temperature waveforms generated on the surface of the silicon.

3.1 Identification of defects as heat sources

Static CMOS circuits are known for their low current consumption during the quiescent state between two logic transitions. However, as we saw in Chapter 1, if a structural defect is present in the circuit, a path can be created for the current to flow from the power supply to GND. Let us focus on bridge and GOS defects. In bridge defects, we will suppose that the defect is activated if the logic values of the two nodes that it joins are different. In GOS defects, the defect is activated if the two nodes that it joins have different values and these are such that the diode present in the model is in a conducting state (Chapter 1 presented the GOS model). When the defect is activated, an electrical path is opened for the current to flow from the power supply to GND. This path is electrically characterised by the series connection of N+1 resistances: the equivalent resistances, Reqi, of the N MOS transistors that are in this path (i goes from 1 to N) and the resistance that models the defect: Rb for bridges and Rgos for GOS.

[Diagram not reproduced: current path from VDD to GND drawn as transistor equivalent resistances Req in series with the defect resistance Rb.]

Figure 4.15: Electrical current path modelled with resistors.

The equivalent resistance of the MOS i can be written as:

$$R_{eq_i} = \frac{V_{DS_i}}{I_{DS}} \qquad (4.23)$$

where VDSi is the drain to source voltage of transistor i and IDS is the drain to source current. As:

(4.24)

where W and L are the transistor dimensions, μ is the carrier mobility in the MOS channel and Cox is the gate capacitance per unit of area. The equivalent resistance depends on the dimensions of the device, the technology and the transistor type (P or N). When a defect is activated, extra power is dissipated with respect to the fault-free case. This extra power can be expressed as the sum of two terms. The first one, P01, is the power dissipated due to the electrical current that flows from the power supply to GND through the defect itself, and can be estimated as:

$$P_{01} = \frac{V_{DD}^{2}}{\sum_{i=1}^{N} R_{eq_i} + (R_b \text{ or } R_{gos})} \qquad (4.25)$$

where VDD is the supply voltage. The second term, P02, is the power dissipated due to the electrical current flowing from VDD to GND in other branches of the circuit due to degraded logic values in the circuit nodes [163]. The term P01 is the sum of the individual powers dissipated by the (N+1) devices that conduct the electrical current:

$$P_{01} = \sum_{i=1}^{N} P_i + P_{Defect} \qquad (4.26)$$
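Expressions (4.25) and (4.26) can be illustrated with a small calculation that splits the power of the series path among its elements. The sketch assumes that each transistor behaves as a constant resistance, which is a simplification (in a real circuit the equivalent resistance varies with the bias point), and the resistance values and the function name are hypothetical.

```python
def defect_path_power(v_dd, r_eq, r_defect):
    """Power in each element of a series path from VDD to GND.

    r_eq lists the equivalent resistances of the conducting transistors,
    treated here as constants (an assumption of this sketch).
    Returns (transistor powers, defect power, total power).
    """
    r_total = sum(r_eq) + r_defect
    current = v_dd / r_total
    p_devices = [current ** 2 * r for r in r_eq]
    p_defect = current ** 2 * r_defect
    return p_devices, p_defect, v_dd ** 2 / r_total


# Hypothetical path: two transistors of 1.2 kOhm and 400 Ohm, 5 V supply.
for r_b in (10.0, 1.0e3, 1.0e5):
    p_devices, p_defect, p_total = defect_path_power(5.0, [1200.0, 400.0], r_b)
    print(f"Rb = {r_b:8.0f} Ohm  transistors = {sum(p_devices) * 1e3:6.2f} mW  "
          f"defect = {p_defect * 1e3:6.2f} mW  total = {p_total * 1e3:6.2f} mW")
```

Because the same current flows through every element, the element with the largest resistance always dissipates the most power.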

Each device becomes a heat source that dissipates power as long as there is current flowing through it, i.e., as long as the test vector that activates the defect is applied. This dissipated energy changes the thermal surface map of the silicon IC. The device with the highest equivalent resistance is called the main heat source, as its power dissipation is the highest. The objective of the following examples is to show the value of the power dissipated by the different devices of simple circuits as a function of the defect topology and device dimensions. The technology is CMOS 1.2 μm, with a supply voltage of 5 V throughout. The various plots have been obtained with an electrical simulator. The expressions presented in this section can be used to scale the power values for scaled-down technologies.

3.1.1 Example 1: Power dissipated in different bridge topologies [157]

Figure 4.16 shows the three topologies analysed in this example: bridge between the outputs of two NAND gates (topology 1), bridge between the output of a NAND gate and GND (topology 2) and bridge between the output of a NAND gate and the supply line VDD (topology 3). In all cases, the bridge is represented by its resistive model Rb and the NAND gates are taken from the standard cell library of a 1.2 μm CMOS technology.

[Diagrams not reproduced: Topology 1, Topology 2 and Topology 3.]

Figure 4.16: Bridging topologies analysed in this example.

The power dissipated by the different devices that conduct current as a function of the bridging resistance value is shown in Figure 4.17 for topology 1, Figure 4.18 for topology 2 and Figure 4.19 for topology 3. In each plot, the total power Ptotal, defined in (4.25), is also represented. The power dissipation values are also plotted as a function of the input vector that activates the defect, since in topology 1 and topology 2 several input vectors can activate it.

[Plots not reproduced: two panels, Vector = 1101 and Vector = 1100; horizontal axis Rb [Ω], logarithmic, from 1E+00 to 1E+05; vertical axis from 0.0E+00 to 2.0E-02; legend entries include G1: N2, G2: P1, G2: P2, Rb and Ptotal.]

Figure 4.17: Power dissipated by the different devices of topology 1 as a function of the bridge resistance and the input logic vector.

[Plots not reproduced: two panels, Vector = 10 and Vector = 00; horizontal axis Rb [Ω], logarithmic, from 1E+00 to 1E+05; vertical axis from 0.0E+00 to 4.0E-02; legend entries include G3: P1, G3: P2 and Ptotal.]

Figure 4.18: Power dissipated by the different devices in topology 2 as a function of both the bridging resistance and the logic input vector.

The behaviour of the total power dissipated is the same in all cases when Rb has low values: constant and independent of the value of Rb. The magnitude is similar in most of the cases: about 15 mW. In topology 2, when the input vector is 00, this figure doubles, as both PMOS transistors, P1 and P2, feed current to the circuit. In these regions, the heat sources are the MOS transistors that drive the current: P1 (Gate 2) or N2 (Gate 1) in topology 1, either of the PMOS transistors in topology 2 (they have the same equivalent resistance) and N2 (gate G4) in topology 3. In topology 1, the change in Ptot when the input vector changes is small compared with the corresponding change in topology 2. This is due to the presence of two series-connected NMOS transistors with high equivalent resistance that limit the maximum current flowing from VDD to GND. This means that the denominator of (4.24) is almost independent of the input vector. In all the topologies, Ptot is constant and independent of the value of Rb while the denominator of (4.25) verifies:

$$\sum_{i=1}^{N} R_{eq_i} \gg R_b \qquad (4.27)$$

The value of Rb from which (4.27) is no longer valid depends on the topology and on the input vector, and in this example is between 100 Ω and 1 kΩ. From this transition value of Rb onwards, the value of Ptot decreases because the denominator of (4.25) increases. When the bridging resistance is about 1 kΩ, the main heat source is the defect itself in all the topologies. For values of Rb higher than 1 kΩ, the total power gradually decreases until it becomes negligible.
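A rough consistency check of these figures can be made with (4.25). The sketch below assumes that the transistors in the path act as one constant resistance, which is a simplification of the simulated behaviour. It uses the 5 V supply and the 15 mW plateau quoted above to estimate that resistance, and then finds the value of Rb at which the defect dissipates the most power. The variable names are introduced here for illustration.

```python
import numpy as np

V_DD = 5.0            # supply voltage quoted in the text, V
P_PLATEAU = 15e-3     # total power quoted for low Rb, W

# Assumption: the transistors in the path behave as one constant resistance.
r_eq_sum = V_DD ** 2 / P_PLATEAU                 # about 1.67 kOhm

r_b = np.logspace(0, 5, 501)                     # 1 Ohm to 100 kOhm
p_total = V_DD ** 2 / (r_eq_sum + r_b)           # (4.25)
p_defect = p_total * r_b / (r_eq_sum + r_b)      # share dissipated in the defect
r_b_peak = r_b[np.argmax(p_defect)]
print(f"estimated sum of Req: {r_eq_sum:.0f} Ohm")
print(f"defect power peaks near Rb = {r_b_peak:.0f} Ohm")
```

Under this assumption the power in the defect is largest when Rb is comparable to the transistor resistance, of the order of 1 kΩ, and falls off for larger Rb together with the total power.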

[Plot not reproduced: horizontal axis Rb [Ω], logarithmic, from 1E+00 to 1E+05; vertical axis from 0.0E+00 to 2.0E-02; curves for G4: N1, G4: N2, Rb and Ptotal.]

Figure 4.19: Power dissipated by the devices in topology 3 as a function of the bridging resistance.

3.1.2 Example 2: Effects of device scaling and degraded logic levels [160].

If expressions (4.24), (4.25) and (4.26) are combined, the variations of the total dissipated power due to an activated bridge as a function of the transistor dimensions can be obtained:

(4.28)

where Ai is a proportionality constant.

[Chart not reproduced: power in mW dissipated by N1, Rb, P2 and N2 for transistor widths W from 5 to 20 and bridge resistances Rb [Ω] from 1 to 10K; a dash marks dissipation of less than 1 mW.]

Figure 4.20: Power dissipated by the devices of an inverter chain as a function of the transistor's dimensions.

This variation is analysed in Figure 4.20, which shows the power dissipated by the devices belonging to a chain of three inverters when there is a bridge between the output of the first inverter and the supply voltage. The analysis is presented as a function of both the equivalent resistance of the bridge Rb and the transistor gate width W. The value of the gate length is, in all the cases, the minimum allowed by the technology. As can be observed, the behaviour of the power dissipated by the defect and the transistor N1 is qualitatively the same as that of the devices shown in Figure 4.18 when the input vector was 10: the main heat source for low values of Rb is the transistor that drives the defect, whereas when the resistance that models the defect is between 100 Ω and 1 kΩ, the defect itself becomes the main heat source. However, in this case the power dissipation magnitude depends on the dimensions of the transistors. Additionally, Figure 4.20 shows the power dissipated by N2 and P2 due to the degraded logic levels that appear at their gate node. Devices P3 and N3 also become heat sources due to degraded logic levels. Nevertheless, they are not plotted in the chart as they dissipate power only for a very small range of values of Rb.

3.1.3 Example 3: Power dissipated in CMOS combinational circuits with a GOS defect [159].

Using the GOS model presented in Chapter 1, the power dissipated by the devices of the circuit illustrated in Figure 4.21 can be found as a function of both the resistive value of the defect and the defect location. In this figure, the transistor P1 belonging to the gate G3 is the defective one, with a source to gate GOS. Figure 4.22 shows the power dissipated by the different devices as a function of the GOS resistance. As in Example 1, the dimensions of the devices are the same as those of the standard cell library of this technology. The input vector applied is 11101.

[Circuit diagram not reproduced.]

Figure 4.21: Circuit analysed to study the heat sources that appear in a circuit with GOS.

As can be observed, the behaviour of the total power dissipated is the same as in the two examples above. In the region where it is independent of the Rgos value, the main heat sources are the MOS transistors that drive the defect (gate G1, transistors N1 and N2). As was discussed in Chapter 1, the behaviour of a MOS transistor with a GOS defect depends on the defect location in the gate of the transistor. This was modelled in Chapter 1 with the parameter k (k ∈ [0,1]).

[Plot: power dissipated by each device as a function of Rgos [Ω]. Curves: G1:N1, G1:N2, G3:P1, G3:N2 and Ptotal.]

Figure 4.22: Power dissipated by the devices in Figure 4.21. Input vector: 11101.

Figure 4.23 shows the evolution of the total power dissipated by the devices in Figure 4.21 as a function of the GOS resistance for five different defect locations: k = 0, 0.3, 0.5, 0.7 and 1. As can be observed, the power dissipated due to both degraded logic values and the current flowing through the defect is affected by the defect location.

[Plot: total power dissipated as a function of Rgos. Curves: K = 0, K = 0.3, K = 0.5, K = 0.7 and K = 1.]

Figure 4.23: Total power dissipated by the devices in Figure 4.21 as a function of the GOS defect location.

3.1.4 Conclusions

Heat sources appear when defects are activated, due to the current that flows through the defect and due to degraded logic values. Focusing on bridge and GOS defects, the main heat sources are the MOS transistors that drive the defect when the resistance that models the defect has a low value. For the technology analysed, when the modelling resistor has a value around 1 kΩ, the main heat source is the defect itself. For higher values of the modelling resistor, the power dissipated by all the heat sources decreases. The total power dissipated by all the heat sources depends on several variables: supply voltage, transconductance gain of the transistors, transistor size and value of the resistor that models the defect. In the cases analysed, the total power magnitude is around 15 mW for low values of the resistor that models the defect. In this section we have analysed bridge phenomena on a thermal level. Other defects can also be analysed with their electrical model. For example, we can predict what will happen with open faults: in the cases described in Chapter 1, current consumption increases due to the open fault, and new heat sources appear in the defective circuit. In this case, the only heat sources will be the MOS transistors. If the open fault generates a decrease in the switching activity, its thermal effect will be a decrease in the heat generated by this activity.

3.2 Thermal disturbances generated by heat sources

In activated bridge defects, as long as the input vector that activates them is applied to the input of the CUT, the heat sources dissipate power. Due to this dissipated energy, the thermal map of the silicon surface of the IC changes. Based on the linearity hypothesis, all the thermal analysis techniques expounded in the previous chapter can be used to characterise this thermal disturbance. All the data presented in this section has been extracted using an RC model of heat transfer through the IC structure. As the heat sources are small compared with the silicon substrate, one-dimensional heat flow has been assumed. Details of this RC extraction can be found in Section 3.2.1.3. The hypothesis of linearity also facilitates calculation of the overall thermal disturbance generated by the activation of N heat sources as the addition of the N individual thermal disturbances generated by each heat source. Therefore, all the data presented in this section has been extracted by solving the heat transfer equation in an IC structure. In the following chapter, which deals with thermal monitoring of IC's, we will present several temperature measurements obtained with different temperature measuring strategies that agree with the analysis presented in this chapter. Two different kinds of thermal disturbances can be considered: dynamic and static.

3.2.1 Dynamic thermal characterisation

The power dissipated by the heat sources can be modelled as a pulse function whose main parameters are its duration and magnitude. Dynamic thermal characterisation is necessary because, depending on this duration, the thermal steady state may not be reached. In the above section it was shown that the thermal steady state is fully reached only after power has been dissipated for some minutes. However, in normal testing procedures, test vectors are not applied over a period of minutes and, therefore, the thermal steady state is not reached. In such circumstances, when the temperature is monitored at a point on the surface of the silicon die, a power dissipation pulse generates a transient temperature increase and decrease, as depicted in Figure 4.24. According to Figure 4.24, the main parameters of the thermal disturbance are:

1. Peak value of the thermal waveform observed at the monitoring point: ΔT [°C].
2. Delay of the maximum temperature increase of the thermal waveform in relation to the end of the dissipated power pulse: t3 - t2 [μs].

[Diagram: a heat source on the silicon die dissipates a power pulse of duration T = t2 - t1; the temperature waveform at the monitoring point reaches its peak ΔT [°C] at t3, so that Delay = t3 - t2.]

Figure 4.24: Thermal disturbance generated at a point on the surface of the silicon.

These two parameters are a function of:

1. Magnitude of the power dissipated by the heat source: M [mW].
2. Duration of the power pulse dissipated by the heat source: T [μs].
3. Distance between the monitoring point and the heat source: r [μm].

An example of the temperature increases generated as a result of the power dissipated by a MOS transistor and obtained with thermal analysis can be found in Figure 3.17. This figure illustrates temperature increases of about 2°C at 4 μm from the heat source, over 0.2°C at 24 μm, over 0.1°C at 44 μm and under 0.1°C at 64 μm. In this figure, the heat source is a PMOS transistor that dissipates 13 mW over four different time periods. At the first point, the temperature increase is very fast: the steady state is reached in 10 μs. This agrees with the information given in the above section: during the initial moments of power dissipation, heat spreads in the heat source region and, due to the small volume involved, small heat capacitances are associated with this region. Therefore, the associated time constant is low. The longer the pulse duration, the higher the temperature increase. This can be seen in Figure 4.25, where the maximum temperature increase is plotted as a function of the pulse duration for different distances of the monitoring point. The influence of the distance between the monitoring point and the heat source is analysed in Figure 4.26.

All these plots have been extracted with a power dissipation magnitude of 13 mW. Under the assumption of linearity, to obtain the temperature increase generated by a power pulse of magnitude M other than 13 mW, the vertical axes of Figure 4.25 and Figure 4.26 should be scaled by a factor of M/13 mW.
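The scaling rule above can be sketched in a few lines of Python. This is only an illustration of the linearity assumption: the function name and the example reading of 2.0 °C are hypothetical and are not values taken from the figures.

```python
REFERENCE_POWER_MW = 13.0  # pulse magnitude used to extract Figures 4.25 and 4.26


def scale_temperature_increase(delta_t_ref_c, power_mw):
    """Rescale a temperature increase read at 13 mW to a pulse of magnitude power_mw.

    Valid only under the linearity hypothesis stated in the text.
    """
    return delta_t_ref_c * power_mw / REFERENCE_POWER_MW


# Hypothetical reading of 2.0 degC at 13 mW, rescaled to a 4 mW pulse.
print(scale_temperature_increase(2.0, 4.0))
```

The same factor applies to every curve of both figures, since pulse duration and distance are unchanged by the rescaling.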

[Plot: maximum temperature increase as a function of the pulse duration [s]. Curves: r = 10 μm, r = 30 μm, r = 50 μm and r = 80 μm.]

Figure 4.25: Maximum temperature increase as a function of the power pulse duration.

[Plot: maximum temperature increase as a function of the distance [μm], from 0 to 100 μm. Curves: Duration = 2 μs and Duration = 8 μs.]

Figure 4.26: Maximum temperature increase as a function of the distance between the heat source and the monitoring point.

The heat source dimensions affect the peak value of the temperature increase when the monitoring point is near the heat source, as the volumetric dissipated power density increases if the transistor dimensions drop. This effect has already been analysed in this book (Figure 3.13). When isotherms become spheres, the temperature increase is only a function of three variables: the magnitude and duration of the power pulse, and the distance between the monitoring point and the heat source. Furthermore, the thermal disturbance is independent of the heat source geometry. This effect is illustrated in Figure 4.27, where the temperature increase calculated for two different PMOS transistors is plotted. The transistor sizes are: length 1.2 μm, and width 25.5 μm and 5 μm respectively. In both cases, the power pulse magnitude is 4 mW and its duration is 20 μs.

[Plot: temperature increase as a function of the distance [μm], from 0 to 50 μm. Curves: W = 25.5 μm and W = 5 μm.]

Figure 4.27: Temperature increase generated by two heat sources of different size dissipating the same power pulse.

[Plot: delay of the maximum temperature increase as a function of the distance [μm], from 25 to 125 μm. Curves: Duration = 10 μs, Duration = 20 μs and Duration = 30 μs.]

Figure 4.28: Delay of the maximum temperature increase as a function of the distance from the monitoring point.

Finally, Figure 4.28 shows the delay of the maximum temperature increase at the monitoring point. The information given in this figure is plotted as a function of the distance from the temperature monitoring point for three different pulse durations. As we will see later in this section, this information can be used to obtain the distance between the heat source and the monitoring point and, therefore, to identify the active heat source in a layout full of potential heat sources.

3.2.2 Static thermal characterisation

If the thermal steady state is reached, the temperature becomes time-independent and its value can be obtained with static thermal analysis. In addition to this case, if the heat source dissipates a periodic power function, the temperature increase can also, in some cases, be found with static thermal analysis. This is due to the low-pass filter nature of the thermal coupling. In fact, the RC net that models heat transfer through the IC is a low-pass filter circuit. For example, Figure 4.29 shows the magnitude of the transfer function of the thermal coupling obtained with the RC net as a function of the frequency for different temperature monitoring points. If the heat source dissipates a periodic power function, the high-frequency harmonics of this function are strongly attenuated. Depending on the location of the temperature monitoring point and the frequency of the power signal, static thermal analysis of the DC component of the power function may be sufficient to characterise the thermal coupling.
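The low-pass argument can be illustrated with a deliberately simplified sketch. It assumes a single-pole thermal RC stage, which is not the multi-stage RC net used for Figure 4.29, and the thermal resistance and capacitance values are hypothetical. It only shows why harmonics well above the corner frequency contribute little, so that the DC component of a rectangular power pulse dominates.

```python
import math


def lowpass_gain(freq_hz, r_th, c_th):
    """Magnitude of a single-pole RC low-pass, normalised to 1 at DC.

    r_th in K/W and c_th in J/K (both assumed, illustrative values).
    """
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * freq_hz * r_th * c_th) ** 2)


def dc_component(p_peak_w, duty_cycle):
    """Average (DC) power of a rectangular periodic power pulse."""
    return p_peak_w * duty_cycle


R_TH = 50.0   # K/W, hypothetical
C_TH = 1e-4   # J/K, hypothetical
for f in (1.0, 100.0, 10e3):
    print(f, lowpass_gain(f, R_TH, C_TH))
print(dc_component(10e-3, 0.17))
```

With these assumed values the corner frequency is 1/(2π·R_TH·C_TH), roughly 32 Hz, so a 10 kHz harmonic is attenuated by more than two orders of magnitude.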

[Plot: modulus of the thermal transfer function as a function of the frequency. Curves: r = 20 μm, r = 60 μm and r = 80 μm.]

Figure 4.29: Modulus of the thermal transfer function.

For N devices dissipating a static power, the temperature increase that will be generated at a monitoring point can be written as:

$$\Delta T = \sum_{i=1}^{N} R_{th\,i\text{-}P} \cdot P_i \qquad (4.29)$$

where $R_{th\,i\text{-}P}$ is the thermal coupling resistance between device i and the temperature monitoring point and $P_i$ is the power dissipated by device i.
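Expression (4.29) is a plain weighted sum, as the following minimal sketch shows. The coupling resistances and powers used in the example are hypothetical placeholders, not values read from Figure 4.30 or Figure 4.31.

```python
def static_temperature_increase(coupling_resistances_k_per_w, powers_w):
    """Evaluate expression (4.29): sum of R_th(i-P) * P_i over all heat sources."""
    if len(coupling_resistances_k_per_w) != len(powers_w):
        raise ValueError("one coupling resistance is needed per heat source")
    return sum(r * p for r, p in zip(coupling_resistances_k_per_w, powers_w))


# Three hypothetical heat sources seen from one monitoring point.
r_th = [50.0, 45.0, 40.0]        # K/W, assumed
power = [10e-3, 3e-3, 2e-3]      # W, assumed
print(static_temperature_increase(r_th, power))
```

Because the sum is linear in each power, the contribution of one heat source can be isolated by switching the others off, which is the superposition property used throughout this section.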

[Surface plot: coupling thermal resistance as a function of the distance from the heat source along the X axis [μm] and the Y axis [μm].]

Figure 4.30: Thermal coupling resistance as a function of the distance between the heat source and the monitoring point. Data obtained with commercial FDM software.


Figure 4.31: Thermal coupling resistance as a function of the distance from the heat source. Data obtained with commercial FDM software.

For example, Figure 4.30 and Figure 4.31 illustrate values of the thermal coupling resistance obtained with numerical methods. In both cases, the heat source is a MOS transistor measuring 20 μm × 1.2 μm, and the IC has been modelled with a three-layer structure. The dimensions of the silicon die are 1 mm × 1 mm × 300 μm. In the first figure, the heat source is in one corner of the silicon die, at the coordinates (100 μm, 100 μm). In the second figure, the heat source is in the centre of the silicon die. In both cases, the horizontal axis indicates the coordinates of the monitoring point, taking the heat source location as the origin.

3.3 Location of the heat source

The location of the heat source can be found using thermal measurements. This is relevant for diagnostic purposes: if the heat source is a MOS transistor and it can be located, the location in the layout of the gate affected by the defect is known. In addition, information about the characteristics of the defect can be derived through a thermal characterisation. For example, the power dissipated by the heat source can be obtained from temperature measurements, and if there is a resistive bridge, the value of the bridge can be derived.

The most direct way to detect the location of the heat source is to obtain two-dimensional temperature maps of the surface of the silicon. The hottest points then indicate the locations of the heat sources. In the following chapter we will present techniques to obtain direct two-dimensional temperature maps of the silicon die. However, we will also demonstrate that measuring the temperature at a single point on the surface of the silicon is, in most cases, one of the most versatile, accurate and sensitive approaches. This strategy is called punctual measurements.

With punctual measurements, the location of the heat source can be obtained by processing one of the following parameters of the temperature waveform: amplitude, phase, rise time and delay. This is made possible either by the one-dimensional nature of the heat transfer through the silicon in spherical coordinates in a large part of the silicon die, or by knowledge of the thermal coupling resistance (if the location is based on static amplitude measurements). As long as the heat transfer is spherical, its propagation is independent of the size of the heat source and the size of the silicon die. Processing the temperature signal in this way provides the distance between the heat source and the monitoring point. The location of the heat source can then be extracted with multiple point measurements.
In this section we will present temperature data obtained both with thermal analysis and with measurements. Measured data has been obtained with built-in differential temperature sensors and a laser interferometer. The next chapter gives details of these temperature-sensing methods; they have therefore been omitted from this section.

3.3.1 Amplitude measurements

The distance between the heat source and the monitoring point can be derived using measurements of the temperature amplitude (both in transient and static cases) if the magnitude of the power dissipated by the heat source is known. Figure 4.32 shows the static temperature profile from the heat source when it dissipates 10 mW.

[Plot: static temperature increase as a function of the distance (μm), from 50 to 300 μm. Simulated data shown as a line.]

Figure 4.32: Static temperature increase as a function of the distance from the heat source. M= 10 mW. Measurements performed with a built-in differential temperature sensor.

The drawback of this approach is that the magnitude of the power dissipated by the heat source must be known. In addition, the temperature sensor must be properly calibrated, as the accuracy of this method depends on the value of temperature interpreted from the signal provided by the temperature sensor.

3.3.2 Phase measurements

Through the use of phase measurements, the distance between the heat source and the monitoring point can be obtained without knowing the magnitude of the power dissipated. In Chapter 3, Example 5, we saw that if we have a semi-spherical heat source of radius r0, dissipating a harmonic function of frequency f, and located inside a semi-infinite homogeneous medium, the temperature for r > r0 follows the expression (see expression 3.52):

$$T(r,t) = \frac{C}{r}\, e^{-r\sqrt{\frac{\omega}{2D}}}\; e^{\,j\left(\omega t - r\sqrt{\frac{\omega}{2D}}\right)} \qquad (4.30)$$

where ω = 2πf and D is the thermal diffusivity of the medium.

The phase of expression (4.30) exhibits a linear shift with the distance r and has no dependence on the magnitude of the power dissipated by the heat source. As discussed in Example 5 of Chapter 3, as long as the silicon die is seen as a semi-infinite medium, when the isotherms are spherical, the temperature distribution will follow expression (4.30).

[Plot: phase of the thermal waveform as a function of the distance [μm], from 0 to 300 μm. Measured points and model curves for f = 123 Hz, f = 1 kHz and f = 10 kHz.]

Figure 4.33: Measured and simulated phase of the thermal waveform as a function of the distance between the monitoring point and the heat source for three different frequencies. Measurements performed with a laser thermoreflectometer.

Figure 4.33 shows measured values of phase shift as a function of the distance from the heat source for three different frequencies. Table 4.1 compares the measured slope of the phase in Figure 4.33 with the slope predicted by (4.30) with D = 90.36·10⁻⁶ m²/s.

Table 4.1: Slope of the phase of the temperature waveform [°/μm]. Measurements and equation.

Frequency     Equation (4.30)   Measurement   Error [%]
f = 123 Hz    -0.118            -0.143        21.1
f = 1 kHz     -0.337            -0.330        2
f = 10 kHz    -1.068            -1.03         3

As can be observed, there is very good agreement between analytical and experimental data when the heat source is activated with a function of frequency 1 kHz or 10 kHz, meaning that at these frequencies the silicon die is seen as a semi-infinite medium. Figure 4.33 also compares data extracted with the RC net with measured data. Good agreement exists between the two.
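The slope predicted by expression (4.30) can be checked numerically. The sketch below evaluates the phase term of (4.30), whose derivative with respect to r is -sqrt(ω/(2D)), and converts it to degrees per micrometre. The function names are illustrative, and the value of D is the one quoted in the text.

```python
import math

D_SILICON = 90.36e-6  # m^2/s, diffusivity value quoted in the text


def phase_slope_deg_per_um(freq_hz, diffusivity_m2_s=D_SILICON):
    """Slope of the phase of (4.30) versus distance, in degrees per micrometre."""
    omega = 2.0 * math.pi * freq_hz
    slope_rad_per_m = -math.sqrt(omega / (2.0 * diffusivity_m2_s))
    return math.degrees(slope_rad_per_m) * 1e-6


def distance_from_phase_um(phase_deg, freq_hz, diffusivity_m2_s=D_SILICON):
    """Distance to the heat source from a measured phase lag (negative degrees)."""
    return phase_deg / phase_slope_deg_per_um(freq_hz, diffusivity_m2_s)


for f in (123.0, 1e3, 10e3):
    print(f, round(phase_slope_deg_per_um(f), 3))
```

Since the slope does not depend on the dissipated power, a measured phase lag can be inverted into a distance with distance_from_phase_um without calibrating the amplitude of the sensor signal.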


Figure 4.34: Location of heat sources in an IC [165].

This linear behaviour of the phase has been used in [165] to locate the heat source with three phase measurements. The results of this work are illustrated in Figure 4.34. The monitoring points are P1, P2 and P3. In this layout, there are six devices that may act as a heat source (their numbers are circled in the figure). The distance between each monitoring point and the heat source can be found using phase measurements. As indicated in the figure, the intersection of the three distances reveals the active heat source.
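The intersection of three distances can be computed by standard trilateration. The sketch below is not the procedure of [165]; it is one common way of solving the geometry, assuming the three distances have already been obtained, and all coordinates and device names in the example are hypothetical.

```python
import math


def locate_heat_source(points, distances):
    """Intersect three circles centred on the monitoring points.

    Subtracting the first circle equation from the other two gives a
    2x2 linear system in (x, y), solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = points
    d1, d2, d3 = distances
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("monitoring points must not be collinear")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def nearest_candidate(estimate, candidates):
    """Name of the candidate device closest to the estimated position."""
    return min(candidates, key=lambda name: math.dist(estimate, candidates[name]))


# Hypothetical layout, coordinates in micrometres.
monitors = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
true_source = (30.0, 40.0)
measured = [math.dist(true_source, p) for p in monitors]
print(locate_heat_source(monitors, measured))
```

With noisy distances the three circles no longer meet at one point, which is why the estimate is matched to the nearest candidate device rather than used directly.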

3.3.3 Rise time and delay measurements

Both the delay and the rise time of the temperature waveform depend on the distance from the heat source and are independent of the power dissipated by the heat source. Figure 4.35 shows the measured delay of the thermal waveform as a function of the distance from the heat source for different dissipated power magnitudes. The delay is defined as the time elapsed between the dissipated power pulse's crossing of the 50% threshold and the thermal signal's crossing of the 50% threshold. The rise time is shown in Figure 4.36. It is defined as the time between the levels of 30% and 70% of the maximum temperature increase expected at each monitoring point. The data in this figure was extracted in a scenario in which the heat source dissipates a periodic power pulse with magnitude = 10 mW, f = 117 Hz and duty cycle = 17%.
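The two definitions above translate directly into threshold crossings on sampled waveforms. The following sketch assumes uniformly or non-uniformly sampled rising edges and takes the maximum of the sampled temperature as the expected maximum increase, which is a simplification. The sample data is synthetic.

```python
def crossing_time(times, values, level):
    """First time at which values rise through level, by linear interpolation."""
    for i in range(1, len(values)):
        v0, v1 = values[i - 1], values[i]
        if v0 < level <= v1:
            frac = (level - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("level is never crossed on a rising edge")


def fractional_level(values, fraction):
    """Level located at the given fraction between min and max of values."""
    low, high = min(values), max(values)
    return low + fraction * (high - low)


def thermal_delay(times, power, temperature):
    """Time between the 50% crossings of the power and temperature signals."""
    t_power = crossing_time(times, power, fractional_level(power, 0.5))
    t_temp = crossing_time(times, temperature, fractional_level(temperature, 0.5))
    return t_temp - t_power


def rise_time(times, temperature):
    """Time between the 30% and 70% levels of the temperature increase."""
    t30 = crossing_time(times, temperature, fractional_level(temperature, 0.3))
    t70 = crossing_time(times, temperature, fractional_level(temperature, 0.7))
    return t70 - t30


# Synthetic waveforms (arbitrary time units).
t = [0, 1, 2, 3, 4, 5, 6]
p = [0, 1, 1, 1, 1, 1, 1]
temp = [0, 0, 1, 2, 3, 4, 4]
print(thermal_delay(t, p, temp), rise_time(t, temp))
```

Both quantities are ratios of levels within one waveform, so they are unaffected by a gain error in the temperature sensor, in line with their independence from the dissipated power.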

[Plot: thermal delay as a function of the distance (μm), from 0 to 150 μm, for M = 4 mW and M = 24 mW. Inset: the delay is measured between the 50% crossings of the power signal and the thermal signal.]

Figure 4.35: Definition of thermal delay and measurements as a function of the distance from the heat source. Measurements performed with a built-in differential temperature sensor.

[Plot: rise time of the thermal waveform as a function of the distance (μm), from 0 to 150 μm, for M = 24 mW, M = 4 mW and M = 3 mW.]

Figure 4.36: Rise time of the thermal waveform as a function of the distance from the heat source and for different power dissipated magnitudes. Measurements performed with a differential built-in temperature sensor.

In these figures, the discrepancy between measurements as the distance increases is due to noise coupling between the lines that drive the heat source and the built-in differential temperature sensor used to perform the measurements.

4. SUMMARY

In this chapter we have seen how structural defects in the IC can be detected with thermal measurements. We have presented this topic from two perspectives:

1. Detection of extrinsic defects in the package and chip mounting: Large-scale thermal measurements can be used to extract information on the entire IC structure. Specifically, we have seen that if the thermal transient impedance is measured, its waveform contains all the dynamic information about the thermal structure of the chip. Processing of this thermal transient impedance can be used to develop models of packaged IC's, detect structural defects in the package and identify the location of the defect if the heat flow through the IC is mainly one-directional and lateral thermal losses are negligible.
2. Detection of intrinsic defects: The presence of defects in the electronic components of a circuit alters the power dissipated by the devices with respect to the fault-free case. One of the most common effects of defects is the appearance of abnormal heat sources that increase the temperature on the surface of the silicon. In this chapter we have characterised this thermal disturbance and we have shown how processing of the sensed temperature waveforms allows the calculation of the distance between the temperature monitoring point and the heat source. Therefore, the location of the heat source can be ascertained using multiple temperature measurements.

5. REFERENCES

[129] W. Claeys, V. Quintard, S. Dilhaire and D. Lewis, "Testing of the Quality of Solder Joints through the Analysis of their Thermal Behaviour with an Interferometric Laser Probe," Quality and Reliability Eng. International, Vol. 10, No. 3, pp. 237-242, May-Jun. 1994.
[130] W. Claeys, V. Quintard, S. Dilhaire, A. Hijazi, Y. Danto, "Early Detection of Ageing in Solder Joints Through Laser Probe Thermal Analysis of Peltier Effect," Quality and Reliability Eng. International, Vol. 10, No. 4, pp. 289-295, Jul.-Aug. 1994.
[131] D.G. Fardner, J.L. Gardner, G. Lush, W.W. Meinke, "Method for the Analysis of Multicomponent Exponential Decay Curves," Journal of Chemical Physics, Vol. 31, pp. 978-986, 1959.
[132] F. Christianens, E. Beyne, "Transient Thermal Modeling and Characterization of a Hybrid Component," 1996 Electronic Components and Technology Conference, pp. 154-164.
[133] J.W. Sofia, "Analysis of Thermal Transient Data with Synthesized Dynamic Models for Semiconductor Devices," IEEE Transactions on Components, Packaging and Manufacturing Tech. Part A, Vol. 18, No. 1, March 1995, pp. 39-47.
[134] F. Christianes, B. Vandevelde, E. Beyne, R. Mertens, J. Bergmans, "A Generic Methodology for Deriving Compact Dynamic Thermal Models, Applied to the PSGA Package," IEEE Transactions on Components, Packaging, and Manufacturing Tech. Part A, Vol. 21, No. 4, Dec. 1998, pp. 565-575.
[135] V. Szekely, T.V. Bien, "Fine Structure of Heat Flow Path in Semiconductor Devices: A Measurement and Identification Method," Solid State Electronics, Vol. 31, pp. 1363-1368, 1988.
[136] V. Szekely, M. Rencz, "Thermal Dynamics and the Time Constant Domain," IEEE Transactions on Components and Packaging Technologies, Vol. 23, No. 3, Sep. 2000.
[137] V. Szekely, "Identification of RC Networks by Deconvolution: Chances and Limits," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Vol. 45, No. 3, March 1998, pp. 244-258.
[138] V. Szekely, "On the Representation of Infinite-Length Distributed RC One-Ports," IEEE Transactions on Circuits and Systems, Vol. 38, No. 7, July 1991, pp. 711-719.
[139] V. Szekely, C. Marta, M. Rencz, G. Vegh, S. Torok, "A Thermal Benchmark Chip: Design and Applications," IEEE Transactions on Components, Packaging and Manufacturing Technology, Part A, Vol. 21, No. 3, Sept. 1998, pp. 399-405.
[140] V. Szekely, M. Rencz, A. Poppe, B. Courtois, "New way for Thermal Transient Testing," 15th Annual IEEE Semiconductor Thermal Measurement and Management Symposium, SEMITHERM 99, pp. 182-188.
[141] V. Szekely, M. Rencz, A. Poppe, B. Courtois, "New Hardware Tools for the Thermal Transient Testing of Packages," Proc. 3rd Electronics Packaging Technology Conference, Singapore, Dec. 5-7 2000.
[142] V. Szekely, S. Ress, A. Poppe, S. Torok, D. Magyari, Zs. Benedek, K. Torki, B. Courtois, M. Renz, "New Approaches in the Transient Thermal Measurements," Microelectronics Journal 31, 2000, pp. 727-733.
[143] M. Rencz, V. Szekely, "Determining Partial Thermal Resistances in a Heat-Flow Path with the Help of Transient Measurements," Proc. 7th THERMINIC Workshop, Paris 2001, pp. 250-255.
[144] V. Szekely, M. Renz, L. Pohl, "Novelties in the theory and practice of thermal transient measurements," Proc. 7th THERMINIC Workshop, Paris 2001, pp. 239-244.
[145] M. Carmona, S. Marco, J. Palacin, J. Samitier, "A Time-Domain Method for the Analysis of Thermal Impedance Response Preserving the Convolution Form," IEEE Transactions on Components and Packaging Technology, Vol. 22, No. 2, June 1999, pp. 238-244.
[146] T. Hopkins, R. Tiziani, "Transient Thermal Impedance Considerations in Power Semiconductor Applications," Automotive Power Electronics 1989, pp. 89-97.
[147] P.E. Bagnoli, C. Casarosa, M. Ciampi, E. Dallago, "Thermal Resistance Analysis by Induced Transient (TRAIT) Method for Power Electronic Devices Thermal Characterization - Part I: Fundamentals and Theory," IEEE Transactions on Power Electronics, Vol. 13, No. 6, November 1998, pp. 1208-1219.
[148] P.E. Bagnoli, C. Casarosa, M. Ciampi, E. Dallago, "Thermal Resistance Analysis by Induced Transient (TRAIT) Method for Power Electronic Devices Thermal Characterization - Part II: Practice and Experiments," IEEE Transactions on Power Electronics, Vol. 13, No. 6, November 1998, pp. 1220-1228.
[149] F.N. Masana, "A New Approach to the Dynamic Thermal Modelling of Semiconductor Packages," Microelectronics Reliability 41 (2001), pp. 901-912.
[150] G.T. Temes, J.W. LaPatra, "Introduction to Circuit Synthesis and Design," McGraw-Hill, 1977.
[151] R.E. Thomas, A.J. Rosa, "Circuits and Signals: An Introduction to Linear and Interface Circuits," John Wiley & Sons.
[152] P. Antognetti, G.R. Bisio, F. Curatelli, S. Palara, "Three-Dimensional Transient Thermal Simulation: Application to Delayed Short Circuit Protection in Power ICs," IEEE Journal of Solid-State Circuits, Vol. SC-15, No. 3, June 1980, pp. 277-280.
[153] H.S. Carslaw, J.C. Jaeger, "Conduction of Heat in Solids," Oxford Science Publications, 1959.
[154] S. Nishino, K. Ahshima, "A study on fault detection for MSI/LSI Board by Thermography," Proc. of Pacific Rim International Symposium on Fault Tolerant Computing, Melbourne, Australia, pp. 198-202, Dec. 1993.
[155] S. Nishino, K. Ahshima, "VLSI PCB Fault Detection Ability Using Thermography," The Bulletin of the Oyama National College of Technology, No. 27, March 1995.
[156] J. Kölzer, J. Otto, "Electrical Characterization of Megabit DRAMs, Part 2: Internal Testing," IEEE Design & Test of Computers, pp. 39-51, December 1991.
[157] J. Altet, "Thermal coupling in mixed circuits: Application to the Test of Integrated Circuits," Ph.D. Thesis, Electronic Engineering Department, Universitat Politecnica de Catalunya, Barcelona, Spain, 1997. ISBN: 84-7653-726-3.
[158] J. Altet, A. Rubio, "Built-in dynamic thermal testing technique for ICs," Electronics Letters, Vol. 32, No. 21, pp. 1982-1984, 10th October 1996.
[159] J. Altet, A. Rubio, "An Approach to On Line Differential Thermal Testing," Proc. 2nd IEEE International On-Line Testing Workshop IOTW96, 1996, pp. 17-20.
[160] J. Altet, A. Rubio, W. Claeys, S. Dilhaire, E. Schaub, H. Tamamoto, "Differential thermal testing: an approach to its feasibility," Journal of Electronic Testing: Theory and Applications 14, pp. 57-66, 1999.
[161] S. Lopez-Buedo, J. Garrido, E. Boemo, "Thermal testing on Reconfigurable Computers," IEEE Design & Test of Computers, Vol. 17, No. 1, pp. 84-91, January-March 2000.
[162] J. Altet, A. Rubio, E. Schaub, S. Dilhaire, W. Claeys, "Thermal Couplings in Integrated Circuits: Application to Thermal Testing," IEEE Journal of Solid-State Circuits, Vol. 36, No. 1, Jan 2001, pp. 81-91.
[163] A. Rubio, J. Figueras, R. Rodriguez, J. Segura, "Iddq Secondary Components in CMOS Logic Circuits Preceded by Defective Stages Affected by Analogue Type Faults," IEE Electronics Letters, 29th August 1991, Vol. 27, No. 18, pp. 1656-1658.
[164] S. Dilhaire, J. Altet, S. Jorez, E. Schaub, A. Rubio, W. Claeys, "Fault localization in ICs by goniometric laser probing of thermal induced surface waves," Microelectronics Reliability 39, pp. 919-923, 1999.
[165] S. Dilhaire, E. Schaub, W. Claeys, J. Altet, A. Rubio, "Localisation of heat sources in electronic circuits by microthermal laser probing," Int. Journal Thermal Sciences, No. 39, 2000, pp. 544-549.
[166] J. Altet, A. Rubio, S. Dilhaire, E. Schaub, W. Claeys, "Thermal Testing, Application to IC Diagnosis," 18th VLSI Test Symposium, VTS'00, 2000, pp. 189-193.

Chapter 5

Thermal monitoring of IC's

1. INTRODUCTION

The feasibility of the different thermal testing procedures is inherently linked to the existence and performance of temperature monitoring techniques. The purpose of monitoring the temperature of an integrated circuit is to obtain an image of the thermal map (the spatial distribution of temperature) of its surface. This map can cover the entire surface, or merely a region of it. It can be continuous (with a given spatial and thermal resolution) or discrete (temperatures at a finite and specific number of points on the surface). The map can be fully static (DC) or dynamic (AC) with a given bandwidth. The data of this map can be processed later to generate a detailed thermal analysis which, by comparison with reference maps or by use of its dynamic behaviour, can distinguish defective devices from fully operative ones. In this chapter we will classify the main measuring methods into three domains, depending on the way in which the measurement is taken: by optical, mechanical or built-in temperature sensors. In Section 2, which deals with optical methods, we will discuss the techniques based on measurements of characteristics of light, either generated or reflected by the surface of the CUT. In the latter case, properties of the reflected light such as its magnitude and phase may depend on the IC temperature. We will differentiate between methods that require a coating (liquid crystal thermography, fluorescence microthermography) and those which merely take measurements of the sample surface, with no contact requirements (infrared emission, thermoreflectometers, interferometers).

Section 3 deals with techniques based on mechanical contact between sensors and the surface to be measured. Section 4 introduces the different methods of measurement that rely on temperature sensors built into the circuit under analysis. The criteria for use of absolute and differential sensors are presented, and different design strategies will be discussed. Finally, Section 5 summarises the different measurement techniques, comparing their performance in both static and dynamic scenarios. Most of the techniques presented are evaluated using laboratory experiments on real IC's. In these cases, we will evaluate the technique in terms of its ability to perform three different types of thermal measurements:

1) DC measurements, in which the heat source dissipates a DC power and the temperature is measured when the thermal steady state is reached. This technique is appropriate to perform on-line testing of the integrated system as a whole and to carry out failure analysis of quiescent hot spots.
2) AC measurements, in which the heat source dissipates a periodic power function P(t) = PDC + PAC·sin(ωt). The temperature at one point on the silicon surface can be described by T(t) = TDC + TAC·sin(ωt + θ). Both TAC and θ can be measured using a lock-in amplifier. These measurements are used either to characterise thermal coupling impedance values in the frequency domain or to diagnose defective circuits.
3) Transient measurements: the waveform of the temperature at a given point on the surface is obtained when the heat source dissipates a power pulse of magnitude PMAX and duration D, with capabilities that are similar to those mentioned above.

The conclusions of this chapter will compare the different techniques in terms of cost, bandwidth (if dynamic measurements can be performed), complexity of the required equipment and applicability (the ability to take measurements at any point of the IC).
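The principle of lock-in detection used for AC measurements can be sketched digitally. This is an illustration of the general technique, not a description of the instrument used in the experiments: the sampled signal is multiplied by sine and cosine references at the excitation frequency and averaged. The sketch assumes the record spans an integer number of periods, and the signal parameters are synthetic.

```python
import math


def lock_in(samples, sample_rate_hz, freq_hz):
    """Return (amplitude, phase_rad) of the component of samples at freq_hz.

    For T(t) = T_DC + T_AC * sin(w*t + theta), the result is (T_AC, theta),
    provided the record covers an integer number of periods.
    """
    n = len(samples)
    in_phase = 0.0
    quadrature = 0.0
    for k, value in enumerate(samples):
        angle = 2.0 * math.pi * freq_hz * k / sample_rate_hz
        in_phase += value * math.sin(angle)     # averages to T_AC*cos(theta)/2
        quadrature += value * math.cos(angle)   # averages to T_AC*sin(theta)/2
    in_phase *= 2.0 / n
    quadrature *= 2.0 / n
    return math.hypot(in_phase, quadrature), math.atan2(quadrature, in_phase)


# Synthetic temperature record: 10 periods of a 100 Hz component.
FS, F0 = 10_000.0, 100.0
signal = [25.0 + 0.5 * math.sin(2.0 * math.pi * F0 * k / FS - 0.6)
          for k in range(1000)]
print(lock_in(signal, FS, F0))
```

The DC term averages out over whole periods, so only the component synchronous with the power excitation is recovered.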
An IC sample that has been specially designed to characterise thermal couplings is considered in the techniques discussed in Section 2.2.2 and the following sections. In these cases, the power dissipated by MOS transistors behaves as it would in a faulty device. In this IC sample, several individually controllable heat sources (MOS transistors with sizes of 10/1.2, connected in diode configuration) have been positioned. Along with the thermal sources and their control, BiCMOS differential temperature sensors have been built in. The technology features two layers of metal and the sample is coated with oxide and passivation layers. The power supply voltage (Vdd) can be increased up to 5 V. The capacity to carry out point measurements or full thermal maps is analysed for each method.

2. OPTICAL METHODS

First, a distinction must be made between contact methods and non-contact methods. In the first type, a device or film must be applied to the surface to be measured. In the second, this surface treatment is not needed; direct observation of the surface is sufficient.

2.1 Contact methods

Two methods will be examined: liquid crystal thermography and fluorescence microthermography.

2.1.1 Liquid crystal thermography

The development of liquid crystal (LC) thermography over the last 30 years has provided thermal engineering with a relatively inexpensive technique for visualising and measuring surface temperature.

2.1.1.1 Principle of operation

This technique is based on the use of liquid crystals. The term "liquid crystal" denotes a state of aggregation of molecules that is intermediate between the crystalline solid and the amorphous liquid [167]. In this state, a substance is strongly anisotropic in some of its properties, yet at the same time it exhibits a certain degree of fluidity, in many cases comparable to that of an ordinary liquid. The first observations of liquid crystalline or mesomorphic substances were made towards the end of the 19th century by Reinitzer [168] and Lehmann [169]. An essential requirement that substances must fulfil to achieve a mesomorphic state is that their molecules must be highly anisotropic in geometrical shape (i.e. rod-like or disk-like). Depending on the type of substance, liquid crystals may pass through different mesophases before being transformed into the isotropic liquid state. Transitions between these phases can be triggered by thermal processes (thermotropic mesomorphism), the application of electrical fields (electrical mesomorphism) or the influence of solvents (lyotropic mesomorphism). The first principle is used in liquid crystal thermography (LCT), while the second forms the basis for the liquid crystal displays (LCD) used in electronic devices. Following the nomenclature proposed by Friedel [170], liquid crystals are classified into three types: nematic, cholesteric and smectic. The distinguishing factor between these categories is their molecular organisation. The first two types are used regularly in both thermography and displays.

The principle of operation for electrical mesomorphism is shown in Figure 5.1, and that for thermotropic mesomorphism is illustrated in Figure 5.2. In both cases, the key working principle is the transition of regions from the nematic or cholesteric phase (crystal) to the isotropic phase (liquid). A thin (5-15 µm) layer of liquid crystal material sandwiched between two glass plates constitutes the liquid crystal cell. Unpolarised light enters from the top of the structure (Figure 5.1). The beam is polarised after crossing a polariser film applied to the upper front part of the structure. The birefringent nature of the liquid crystal in its nematic or cholesteric phase (Figure 5.1, left) causes the cell to rotate the plane of light polarisation. A second polariser film, applied to the bottom of the cell, has its polarising axis oriented at 90° with respect to the top one. Since the light has been rotated by 90° after crossing the cell, it is aligned with the bottom polariser. Thus, the light crosses the cell and strikes the reflector (a mirror or the integrated circuit surface itself). In this situation (Figure 5.1, left) the light travels upwards and is rotated again, crossing the top polariser and leaving the system. Consequently, the light is reflected by the entire cell system. In the event of electrical polarisation (Figure 5.1, right) or thermal transition (Figure 5.2), the liquid crystal exhibits an isotropic phase that does not rotate the polarisation of the light. In this case (Figure 5.1, right) the light entering the cell from the top is not reflected, and the cell takes on an opaque appearance. Displays are designed with transparent electrodes that trigger isotropic phases in specific regions of the display. In thermography, it is temperature that changes the behaviour to isotropic above a characteristic transition temperature, T_phT [171].
Microscopic observation reveals the presence of points with temperatures that are higher than T_phT. Liquid crystal thermography is based on the capability of these materials to undergo a phase transition upon reaching a given temperature. The transition from the nematic or cholesteric phase to the isotropic phase, which can be observed through microscopic observation of the reflected light (Figures 5.1 and 5.2), occurs at a characteristic temperature T_phT, causing a marked, observable contour between the surface regions that are below and above T_phT. A map of the surface which differentiates between these two types of regions can be obtained from microphotography and subsequent image processing. Images for different temperatures can be obtained by introducing changes in the sample temperature. For example, let us take the sample linked to a thermostat mounted on the microscope stage. If the temperature of the thermostat is T_Th and the phase transition occurs at T_phT, then the contour of the phase transition represents the borders of regions that satisfy [171]:

$$\Delta T = T_{phT} - T_{Th}$$ (5.1)

If the value of T_Th is modified [171][172], the contour lines corresponding to a different temperature increase ΔT are obtained.
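The stepping procedure described above can be sketched in a few lines. This is only an illustration, assuming that the transition contour marks the points whose temperature rise over the thermostat temperature equals T_phT minus T_Th; the function name and the numerical values (a 60 °C transition temperature and three thermostat settings) are hypothetical.

```python
def isotherm_rises(t_transition, thermostat_temps):
    """Temperature rise marked by the transition contour for each
    thermostat setting (same temperature unit for all arguments)."""
    rises = []
    for t_th in thermostat_temps:
        if t_th >= t_transition:
            # The whole film would already be isotropic: no contour.
            raise ValueError("thermostat at or above transition temperature")
        rises.append(t_transition - t_th)
    return rises

# Hypothetical liquid crystal with T_phT = 60 degC, three stage settings.
print(isotherm_rises(60.0, [30.0, 40.0, 50.0]))
```

Raising the thermostat temperature in small steps therefore sweeps the contour outwards from the hot spot, and the stack of contours approximates a thermal map.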

[Figure 5.1 artwork. Labels: input of light, output of light, front polariser, rear polariser, reflector mirror; nematic or cholesteric phase (left), isotropic phase (right).]

Figure 5.1: Operation of a liquid crystal cell. In the left-hand section the liquid crystal is not biased, the medium is in the nematic phase and the light is reflected. In the right-hand section the liquid crystal is biased and the medium is in isotropic phase; as a result, the light is not reflected [173].

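[Figure 5.2 artwork. Labels: liquid crystal layer with an isotropic phase region above the hot spot, surrounded by nematic/cholesteric phase regions; semiconductor with hot spot.]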

Figure 5.2: Thermotropic mesomorphism; a hot spot with a temperature higher than T_phT triggers the transition to the isotropic phase [172].

2.1.1.2 Technique performance

Liquid crystal techniques are used to locate regions of high temperature (hot spots) which are caused by defects in the IC. A liquid crystal material with a phase transition temperature appropriate to the temperature range under study must be selected. The spatial resolution is approximately 1-5 µm. In most cases, the spatial resolution and temperature resolution (approximately 0.5 °C) of this technique are low in comparison with other techniques [178]. The time constant required to heat the liquid crystal cell restricts the dynamic resolution, giving measurement times of approximately 30 seconds for one point. Consequently, the disadvantage of this measuring process is that it is time-consuming, with 10-20 minutes being required to generate a map [174]. This technique is very economical, much more so than any other technique outlined in this chapter, but it is appropriate only for static maps of temperature. The technique has been used in a wide range of applications oriented towards the detection of localised hot spots. It is among the first tests carried out in the diagnosis of detected failures (failure analysis). The image is usually obtained through the use of lenses or microscopes.

2.1.2 Fluorescent microthermography

This technique offers a high-resolution method to obtain a thermal map of a small surface.

2.1.2.1 Principle of operation

The technique consists of spin-coating the sample with a thin film of polymer that is heavily doped with a fluorophor whose fluorescence quantum yield is strongly dependent on temperature [174]. An image of the sample's excited fluorescence is obtained. This image contains both optical contrast and thermal information. The excitation light is near-UV and the image is recorded with a CCD camera (see Figure 5.3). The photographs of the heated sample and the cold sample are then processed: in order to eliminate optical contrast not associated with temperature, the hot image is normalised in relation to the image taken at ambient temperature. In this way it is possible to obtain high-contrast images with a spatial resolution of approximately 15 µm (which can be improved to 0.7 µm through near-diffraction-limited procedures [175]) and a temperature resolution of 0.01 °C.

2.1.2.2 Technique performance

This technique allows a spatial resolution of less than 1 µm and a temperature resolution of 0.01 °C. It takes approximately 500 seconds to obtain a thermal map of a chip in the steady state.

[Figure 5.3 artwork. Labels: UV source.]

Figure 5.3: Layout of fluorescent microscopic equipment. The light is applied to the monochromator, M, and so to the sample. The fluorescence is directed towards the CCD camera, which records the image [176].

2.2 Non-contact methods

This section discusses optical methods that require neither contact nor surface preparation for measurement. The following techniques are presented: infrared emission thermography, thermoreflectometers and interferometers.

2.2.1 Infrared emission thermography

As mentioned in Chapter 2, thermal radiation, mostly within the infrared portion of the electromagnetic spectrum, is generated by all objects above the temperature of absolute zero. Infrared thermography [171][178] is a process of temperature measurement that detects the invisible infrared radiation and converts the energy detected into visible light or an electrical signal. Thermography facilitates visualisation of the differences between images (for example, from fault-free and defective devices) either in grey scale or in colour (Figure 5.4). Thermal imaging is a fully non-contact, non-intrusive technique. Thus, images of objects that are impossible to physically touch can be scanned.

Figure 5.4: Thermographic image of the surface of an integrated circuit [177].

2.2.1.1 Principle of operation

All objects emit electromagnetic radiation. The spectrum of the radiation and its magnitude are dependent on the temperature of the object and its emissivity (see "radiation as a mechanism of energy transference" in Chapter 2). The blackbody radiation laws make up the basic principles that govern infrared thermography. An object with ideal radiation emission (full absorbance and null reflectance) is called a blackbody, in recognition of the fact that it perfectly absorbs all incident radiation. A blackbody can be seen as representing the theoretical maximum value of radiation for a given temperature. Planck's law expresses the spectral distribution of blackbody radiation. This law [179][180] describes the hemispherical emission of thermal radiation within a differential interval at wavelength λ from a unit area of the blackbody at temperature T into a solid angle of 2π steradians, and is expressed as:

$$W(\lambda, T) = \frac{2\pi h c^{2}}{\lambda^{5}}\,\frac{1}{e^{hc/(\lambda k T)} - 1}$$ (5.2)

where W(λ, T) is the blackbody monochromatic emissive power (W/m³), c is the speed of light in vacuum, h is Planck's constant and k is Boltzmann's constant. The total power per unit of area emitted by a blackbody can be calculated by integrating equation (5.2) over all wavelengths:

$$W(T) = \int_{0}^{\infty} W(\lambda, T)\,d\lambda = \sigma T^{4}$$ (5.3)

where σ is the Stefan-Boltzmann constant (5.670 × 10⁻⁸ W m⁻² K⁻⁴). Finally, the total power per unit of area for a real surface is given by:

$$W(T) = \varepsilon\,\sigma T^{4}$$ (5.4)

where ε is the emissivity factor of the surface. Two types of cameras can be used to record an infrared image: infrared (IR) cameras and conventional CCD cameras. IR cameras contain a special lens (suited for IR radiation) and an IR-sensitive detector material. These cameras usually operate in three primary infrared wavebands: near-infrared (or shortwave IR), midwave IR and longwave IR. Near-infrared (NIR) cameras are based on indium gallium arsenide (InGaAs) technology. Indium antimonide (InSb) detectors are used in midwave infrared (MWIR) cameras. For long-wavelength requirements there are uncooled microbolometer and quantum-well infrared photodetector (QWIP) array cameras. These cameras are quite costly. The second type of cameras that can record an infrared image are CCD cameras. These cameras are semiconductor light-sensitive charge-coupled devices. When incident radiation reaches the light-sensitive area (each pixel of the camera has an approximate area of 15 µm x 15 µm) it generates free electrons due to the photoelectric effect. The accumulated charge is linearly proportional to the incident radiation and the exposure time. The wavelength of incident radiation to which the CCD device responds is dependent on the spectral response of silicon. Silicon CCD devices not only respond to visible light but also have a range that covers the near-infrared region above a wavelength of 0.8 µm. Since the cut-off wavelength of silicon is 1.1 µm, conventional (and more economical) CCD cameras can be used to obtain images of infrared radiation between 0.8 and 1.1 µm. Calibration of the measuring system is essential in this technique and is highly sophisticated: it must evaluate the offset levels and gains of the components, and take into account the radiation emitted by the surroundings and reflected by the object, as well as the radiation emitted by the optical system (lens and sensors) itself.
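As a numerical cross-check of the blackbody relations in this section, the following minimal sketch integrates the standard hemispherical form of Planck's law over wavelength and compares the result with the Stefan-Boltzmann law. The integration limits, the number of steps and the function names are choices made for this illustration only; the physical constants are the standard SI values.

```python
import math

H = 6.62607015e-34      # Planck constant, J s
C = 2.99792458e8        # speed of light in vacuum, m/s
K = 1.380649e-23        # Boltzmann constant, J/K
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def planck(lam, temp):
    """Blackbody monochromatic emissive power in W/m^3 (hemispherical)."""
    x = H * C / (lam * K * temp)
    if x > 700.0:  # exp(x) would overflow; the contribution is negligible
        return 0.0
    return 2.0 * math.pi * H * C ** 2 / (lam ** 5 * math.expm1(x))

def total_power(temp, lam_min=1e-7, lam_max=1e-3, steps=20000):
    """Trapezoidal integration of planck() on a logarithmic wavelength grid."""
    log_a, log_b = math.log(lam_min), math.log(lam_max)
    h = (log_b - log_a) / steps
    total = 0.0
    for i in range(steps + 1):
        lam = math.exp(log_a + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * planck(lam, temp) * lam  # d(lam) = lam * d(ln lam)
    return total * h

T = 300.0
print(total_power(T), SIGMA * T ** 4)
```

For a real surface, the integrated value would additionally be scaled by the emissivity factor of that surface.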

2.2.1.2 Technique performance

Infrared thermography (IRT) can be used to obtain a high temperature resolution, usually of the order of 0.1 °C [174]. It is somewhat less accurate due to the unpredictability of the surface thermal-emissivity factor. Coating the chip surface with black paint makes an accuracy of 0.2 °C to 0.3 °C a realistic result. The primary limitation of IR microscopy is its spatial resolution. The maximum resolution is usually one or two times the illuminating wavelength. In IR radiation, resolutions of around 1-2 µm are common [178]. AC measurements are limited by the thermal capacitances, which allow frequencies of around 50 kHz in integrated circuits. The temperature resolution can be of the order of 0.1 °C and the accuracy is moderate.
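The effect of an uncertain emissivity on accuracy can be illustrated with a short calculation. This is a simplified sketch that assumes the camera measures the total emitted power of a grey surface and ignores reflected ambient radiation and the limited spectral band of a real detector; the temperature and emissivity values are hypothetical.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def radiated_power(temp_k, emissivity):
    """Total power per unit area emitted by a grey surface."""
    return emissivity * SIGMA * temp_k ** 4

def apparent_temperature(power, assumed_emissivity):
    """Temperature inferred from the measured power for an assumed emissivity."""
    return (power / (assumed_emissivity * SIGMA)) ** 0.25

# Hypothetical case: the surface is at 350 K with emissivity 0.60,
# but the instrument is configured for an emissivity of 0.70.
true_temp = 350.0
measured = radiated_power(true_temp, 0.60)
estimate = apparent_temperature(measured, 0.70)
print(estimate, estimate - true_temp)
```

Under these assumptions the inferred temperature scales with the fourth root of the emissivity ratio, so an overestimated emissivity produces an underestimated temperature of about 13 K in this example. This is one reason why a coating of known, high emissivity improves the accuracy.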

2.2.2 Thermoreflectometers [181]-[192]

Thermoreflectometry is a non-contact point optical method, which facilitates transient and AC temperature measurements. It has a bandwidth that ranges from almost DC to 150 MHz. The physical principle of thermoreflectometry is based on the thermoreflectance that materials undergo: when a semiconductor or metal registers a temperature change, ΔT, its reflection coefficient (for normal incidence and fixed wavelength), R, also undergoes a change, ΔR, which can be written as:

$$\Delta R = a\,\Delta T + b\,\Delta T^{2} + \dots$$ (5.5)

where a and b are proportionality constants. Assuming linearity, if a laser beam of intensity Φ₀ is focused on a material, and the reflected light intensity, R·Φ₀, is detected with a photodiode, the measured current is proportional to this reflected light and equal to:

(5.6)

The temperature increase can be derived by measuring the current change, ΔI, in relation to the temperature change, ΔT:

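$$\frac{\Delta I}{I} = \frac{\Delta R}{R} = \frac{1}{R}\,\frac{\partial R}{\partial T}\,\Delta T \qquad\Rightarrow\qquad \Delta T = \left(\frac{1}{R}\,\frac{\partial R}{\partial T}\right)^{-1}\frac{\Delta I}{I} = \psi^{-1}\,\frac{\Delta I}{I}$$ (5.7)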

The exact value of the constant ψ depends on the material and the light wavelength. For example, it is 1.13·10⁻⁴ K⁻¹ for pure silicon and 2.5·10⁻⁵ K⁻¹ for aluminium [186], [187] (wavelength = 632.8 nm). When measuring the temperature of the surface of an IC, the laser beam reaches the silicon surface through silicon dioxide and passivation layers. This affects the coefficient ψ by a scaling factor that can be greater or less than unity, depending on the thickness of these layers [183]. If the thickness of the layers is unknown, the exact value of this coefficient will also be unknown. It should be noted that although this uncertainty affects absolute temperature measurements, it does not affect AC phase measurements.
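The conversion from a relative photocurrent change to a temperature change can be sketched as follows. The two coefficients are the values quoted above for a wavelength of 632.8 nm; the function name, the `layer_scale` parameter (standing for the unknown scaling introduced by the oxide and passivation layers) and the photocurrent values are hypothetical.

```python
# Thermoreflectance coefficients quoted in the text (632.8 nm), in 1/K.
PSI_PER_K = {"silicon": 1.13e-4, "aluminium": 2.5e-5}

def temperature_change(current, current_ref, material, layer_scale=1.0):
    """Temperature change from the relative photocurrent change dI/I.

    layer_scale is a hypothetical factor for the effect of the layers
    covering the surface; 1.0 corresponds to a bare surface.
    """
    psi = PSI_PER_K[material] * layer_scale
    relative_change = (current - current_ref) / current_ref
    return relative_change / psi

# Hypothetical reading: the photocurrent rises by 0.01 % on bare silicon.
print(temperature_change(1.0001, 1.0, "silicon"))
```

Because the coefficients are so small, a temperature change of about 1 K corresponds to a relative signal change of only about one part in ten thousand, which explains why lock-in detection is normally used with this technique.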

[Figure 5.5 artwork. Title: thermoreflectance microscope.]

Figure 5.5: Block diagram of a thermoreflectometer. Image courtesy of the CPMOH-Université Bordeaux I.

Results published in the relevant literature [184], [185] show that laser probes are rapid surface thermometers, with excellent lateral resolution (1 µm) and a wide dynamic range (ΔT: 10² to 10⁻³ K). Figure 5.5 shows a block diagram of a basic thermoreflectance microscope set-up. In this case, the laser is of the stabilised HeNe variety. The Faraday isolator prevents the reflected light from re-entering the laser. The prisms P1 and P2 transmit or reflect the light depending on its polarisation. The λ/4 and λ/2 plates rotate the polarisation of the light when it passes through them. In this example, a white light source allows the location aimed at by the laser to be monitored on a video screen. Both the signal generated by the photodiode and the signal from the generator that biases the heat source can be viewed on an oscilloscope screen. The microscope objective focuses the laser light on the silicon surface of an IC with the desired beam diameter. The sample is mounted on a socket that is attached to a motor system that can move it in the plane perpendicular to the laser beam. Thus, measurements of any point on the IC surface can be taken. This block diagram demonstrates that a specially prepared laboratory is needed to perform such measurements. Figure 5.6 shows a detail of the lens that focuses the laser light on the silicon surface of an IC and the socket in which it is mounted. In this example, the laser beam has a diameter of 3 microns on the silicon surface.

Figure 5.6: Photograph of the lens that focuses the laser beam on the silicon substrate surface. Image courtesy of the CPMOH-Universite Bordeaux I.

Figure 5.7: Detail of the IC targeted by the laser in Figure 5.6. The heat source is an MOS transistor in diode configuration, W/L = 10/1.2 [190].

In order to illustrate some measurements, Figure 5.7 shows a detail of the IC targeted by the laser in Figure 5.6. This IC was specifically developed in [190] to measure the thermal coupling generated by digital MOS transistors acting as heat sources. In this IC, heat sources are nMOS transistors (with a size of 10/1.2) connected in a diode configuration, i.e., with the source grounded and the gate connected to the drain. They behave in the same way as the heat sources that appear in defective circuits. In addition, each gate is directly accessible through a pin. Thus, any type of voltage function (and therefore, power function) can be applied. Transient and AC measurements can be taken by applying different voltage functions to the gates of these MOS transistors.

[Figure 5.8 artwork. Horizontal axis: time, 0 to 2.0E-04 s. Curves labelled X = 30, 40, 50, 90, 100, 150 and 200 µm.]

Figure 5.8: Signal obtained with the laser thermoreflectometer along the x axis in Figure 5.7 at various distances from the heat source.

In Figure 5.7, a heat source is located at the origin of a coordinate axis. Following the x axis, there is a region that is free of metal lines, in which a homogeneous thickness of passivation and oxide layers can be assumed. If this assumption is true, when thermal measurements are taken along this axis, the constant ψ will be affected by the same scaling factor in all the measurements. Figure 5.8 shows the waveforms obtained at ten different points along this axis, when the heat source dissipates a power pulse of magnitude 23 mW for 100 µs. As can be predicted by the thermal analyses performed in Chapter 4, as the distance from the heat source increases, the maximum temperature decreases and the instant at which the temperature waveform reaches its maximum is delayed. This figure shows an example of transient measurements.

AC measurements are highly suitable for the validation of electrical models of heat conduction through an IC. Phase AC measurements were shown in Chapter 4, Section 3.3.2 and compared with simulated data from the model. The measurements given in Figure 4.32 were actually taken along the x axis in Figure 5.7. The advantage of phase measurements is that they are not affected by either passivation or oxide layers placed over the silicon. Figure 5.9 compares the trends of the measured and simulated peak-to-peak amplitudes. In this case, as amplitude measurements are affected by passivation and oxide layers, the data obtained from the model has been scaled to achieve a better fit with the measured data.

[Figure 5.9 artwork. Series: measured data at f = 123 Hz and at f = 1 kHz, with the corresponding simulated curves.]

Figure 5.9: Measured and simulated amplitude of the thermal disturbance when the heat source is activated with a harmonic function.

Figure 5.7 features two points which have been marked A and B. These points are equidistant from the heat source (65 µm). However, metal lines cross the space between A and the heat source, whereas there are no metal lines between point B and the heat source. Figure 5.10 shows the amplitude and phase of the thermal disturbance measured at the two points, plotted in polar coordinates. Slight discrepancies can be observed for low-frequency thermal waveforms. On the basis of these measurements it can be concluded that most of the energy dissipated by the heat source flows through the lower surface of the silicon die. Therefore, metal lines have a negligible influence on the surface distribution of the thermal disturbance. The slight differences in the low-frequency measurements can be attributed to the boundary conditions of the silicon die. As has already been discussed, point measurements are possible with this technique. Nevertheless, thermal maps can also be obtained by scanning an area of the IC. For example, Figure 5.11 (right) shows a phase map (60 µm x 60 µm) obtained from AC measurements. The left-hand side of the figure shows a photograph of the layout area scanned. The activated heat source can be easily located by comparing the phase map with the layout of the scanned area.
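Locating the active heat source in a scanned phase map can be sketched as a search for the pixel with the smallest phase lag. This illustration assumes that the phase lag grows monotonically with distance from a single source, so that the least negative phase marks the source; the map below is synthetic and the diffusion length used to generate it is a hypothetical value.

```python
import math

def locate_source(phase_map):
    """Return (row, col) of the pixel with the smallest phase lag,
    i.e. the largest (least negative) phase value."""
    best = None
    for r, row in enumerate(phase_map):
        for c, phase in enumerate(row):
            if best is None or phase > best[0]:
                best = (phase, r, c)
    return best[1], best[2]

def synthetic_map(rows, cols, src, pixel_um, mu_um):
    """Synthetic phase map (radians): phase = -distance / mu."""
    return [[-math.hypot((r - src[0]) * pixel_um,
                         (c - src[1]) * pixel_um) / mu_um
             for c in range(cols)]
            for r in range(rows)]

# Hypothetical 6 x 6 scan, 10 um per pixel, source at pixel (2, 3).
demo = synthetic_map(6, 6, (2, 3), pixel_um=10.0, mu_um=170.0)
print(locate_source(demo))
```

With several active sources, or with measurement noise, a single maximum is no longer sufficient and the comparison with the layout described above becomes necessary.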

[Figure 5.10 artwork. Polar plot of magnitude and phase for point A and point B, at f = 10-300 Hz, 1 kHz, 3 kHz, 5 kHz and 10 kHz.]

Figure 5.10: Measurements of amplitude and phase at two points: A (0, 65 µm), B (65 µm, 0). An offset of 230° has been added to the phase to keep the data from being covered by the vertical axis.

[Figure 5.11 artwork. Layout photograph (left) and phase map (right), with a phase legend in bands from -7 to -21.]

Figure 5.11: Example of a two-dimensional phase map [188].

2.2.3 Interferometers

Figure 5.12 shows the principle of a Michelson interferometer. Interferometry consists of comparing the phases of two light waves which are coherent with each other: one reflected in a mirror and another reflected on the surface to be measured. Since these light waves must be coherent with each other, the laser source has to be the same for both of them. If these two waveforms are superimposed, the intensity of the resulting waveform is:

(5.8)

where I₀ is the intensity emitted by the laser, λ is the light wavelength, L_r, called the reference length, is the length of the optical path from the beam splitter to the mirror, and L_e, called the length of the device, is the length of the optical path from the beam splitter to the measured sample.

[Figure 5.12 artwork. Labels: heat source, integrated circuit, thermal expansion Δx, photodiode.]

Figure 5.12: Principle of the Michelson interferometer.

If there is an active heat source in the measured sample, and the reflecting surface of this sample shifts by Δx due to thermal expansion, the light intensity detected by the photodiode is:

(5.9)

The intensity of the resulting light is a periodic function of Δx. The reference length and the length of the device must be adjusted to provide maximum variation of I for a given variation of Δx. Examples of Michelson interferometers devoted to microelectronic applications can be found in [193] and [184]. The basic block diagram of the interferometers developed in these references is illustrated in Figure 5.13: a beam from the HeNe laser is split into a reference arm and a probe arm, terminated respectively by a piezomirror and the sample under test. The sample is mounted on a computer-controlled translation stage, while the mirror can be moved by a piezoactuator controlled by the feedback loop of the active stabilisation. The two reflected beams that come out of the interferometer interfere with each other on a photodiode. The detected interferometric signal is sent to an active stabilisation system. The principle of active stabilisation consists of controlling the position of the piezomirror in order to compensate for the slow phase shifts between the two interfering beams. With the interferometer discussed in [184], it is possible to record either the interferometric signal induced in the photodiode or the voltage applied to the piezomirror by the output of the feedback loop. Results published in [184] show a sensitivity of up to 10⁻¹⁶ m and a frequency range from almost DC to 125 MHz. The measurement range is from 10⁻¹⁶ m (in some applications) to 10⁻³ m.
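The need to adjust the two arm lengths, and the role of the active stabilisation, can be illustrated with the textbook expression for ideal two-beam interference. The sketch below assumes two beams of equal intensity and perfect coherence, which may differ in prefactors and sign convention from the expressions used in the references; the function names are hypothetical, and the wavelength is that of a HeNe laser.

```python
import math

LAMBDA_HENE = 632.8e-9  # HeNe laser wavelength, m

def intensity(i0, l_ref, l_dev, dx=0.0, lam=LAMBDA_HENE):
    """Ideal two-beam interference. A surface shift dx changes the
    round-trip path difference by 2*dx (sign convention assumed)."""
    phase = 4.0 * math.pi * (l_ref - l_dev + dx) / lam
    return 0.5 * i0 * (1.0 + math.cos(phase))

def sensitivity(i0, l_ref, l_dev, lam=LAMBDA_HENE):
    """Derivative dI/d(dx) at dx = 0 for the model above."""
    phase = 4.0 * math.pi * (l_ref - l_dev) / lam
    return -0.5 * i0 * (4.0 * math.pi / lam) * math.sin(phase)

# Equal arms: bright fringe, but no first-order response to displacement.
print(intensity(1.0, 0.0, 0.0), sensitivity(1.0, 0.0, 0.0))
# Arms differing by lambda/8: half intensity, maximum response.
print(intensity(1.0, LAMBDA_HENE / 8.0, 0.0),
      sensitivity(1.0, LAMBDA_HENE / 8.0, 0.0))
```

In this idealised model the response to small displacements vanishes at a bright or dark fringe and is largest midway between them, which is the operating point that a stabilisation loop would try to hold against slow drifts.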

[Figure 5.13 artwork. Labels: laser, beam splitter, piezomirror, active stabilisation, microscope objective, photodiode, transistor under test.]

Figure 5.13: Interferometric set-up for surface displacement measurements. Image courtesy of the CPMOH-Université Bordeaux I [188].

Following the x axis marked in Figure 5.7, examples of transient and AC measurements are shown in Figure 5.15 and Figure 5.14, respectively [188]. AC measurements are plotted as a function of the distance from the heat source for different activation frequencies. As in the case of the reflectometer, the linear behaviour of the phase can be used to locate the heat source with three measurements. Transient measurements are shown in Figure 5.15, in which different thermal expansion signals are shown for five given locations on the x axis of Figure 5.7 (X = 0, 30, 60, 90 and 120 µm). As in Figure 5.8, the magnitude of the power dissipated by the heat source is 23 mW.
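The linear dependence of the phase on distance can be exploited with a simple straight-line fit. The sketch below assumes the classical thermal-wave model for a homogeneous medium, in which the phase lag equals the distance divided by the thermal diffusion length µ = sqrt(α/(πf)); the function names, the sample positions and the diffusivity (roughly 9·10⁻⁵ m²/s, an illustrative figure for silicon) are assumptions made for this example and are not measured data.

```python
import math

def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def diffusion_length(distances_m, phases_rad):
    """Thermal diffusion length from the slope of phase versus distance."""
    slope, _ = fit_line(distances_m, phases_rad)
    return -1.0 / slope

def diffusivity(mu_m, freq_hz):
    """Thermal diffusivity implied by mu = sqrt(alpha / (pi * f))."""
    return math.pi * freq_hz * mu_m ** 2

# Synthetic phases at three distances for an assumed diffusivity.
ALPHA, FREQ = 9.0e-5, 1000.0
mu_true = math.sqrt(ALPHA / (math.pi * FREQ))
xs = [30e-6, 60e-6, 90e-6]
phases = [-x / mu_true for x in xs]
mu = diffusion_length(xs, phases)
print(mu, diffusivity(mu, FREQ))
```

Once the slope is known, the distance to the source along the scanned axis follows from a measured phase, which is the basis for locating the source from a small number of measurements.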

[Figure 5.14 artwork. Two panels plotted against distance (µm): amplitude (nm, logarithmic scale) and phase, with one curve per activation frequency.]

Figure 5.14: Phase (right) and amplitude (left) interferometric measurements with periodic activation of the heat source. Measurements taken along the x axis in Figure 5.7.

[Figure 5.15 artwork. Axes: normal surface displacement (nm) and excitation (V) versus time (µs). Curves for x = 0, 30, 60, 90 and 120 µm.]

Figure 5.15: Transient interferometric measurements. Measurements taken along the x axis in Figure 5.7.

Differential interferometric measurements can be used to locate the active heat sources. Differential measurements are taken by measuring the difference in expansion between two nearby points. Figure 5.16 shows this approach: the incoming laser beam is separated into two beams with a Wollaston prism. A Wollaston prism is an optical element made up of two orthogonal prisms. If light enters this prism with a polarisation of 45°, it is split into two beams, one with a vertical polarisation and the other with a horizontal polarisation. The two laser beams are focused on the circuit at two points, A and B. The mid-point between A and B is at a distance r₀ from the heat source. When the Wollaston prism rotates, so do points A and B on the silicon surface. For example, when a single heat source is active, the signal provided by the differential measurement is maximal if the heat source and points A and B are aligned. Conversely, the differential measurement is null if points A and B lie on an isothermal line. Therefore, directional detection is possible with differential measurements.
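The directional detection described above can be sketched numerically. The example assumes a single active heat source and a small separation between A and B, so that the differential signal varies as the absolute value of the cosine of the angle between the A-B line and the direction of the source; this first-order model and the function name are assumptions made for illustration.

```python
import math

def source_axis_deg(angles_deg, signal):
    """Estimate the orientation (0 to 180 degrees) of the source axis
    from differential amplitudes sampled uniformly over a full turn.

    Uses the second angular harmonic of the squared signal:
    cos^2(a - a0) = (1 + cos(2a - 2a0)) / 2.
    """
    c = sum(s * s * math.cos(2.0 * math.radians(a))
            for a, s in zip(angles_deg, signal))
    q = sum(s * s * math.sin(2.0 * math.radians(a))
            for a, s in zip(angles_deg, signal))
    return (0.5 * math.degrees(math.atan2(q, c))) % 180.0

# Synthetic rotation scan in 10 degree steps, source axis at 40 degrees.
angles = list(range(0, 360, 10))
scan = [abs(math.cos(math.radians(a - 40.0))) for a in angles]
print(source_axis_deg(angles, scan))
```

The estimate is an axis rather than a direction: the magnitude of the differential signal alone cannot distinguish a source at 40° from one at 220°, so the phase of the signal or a second measurement position is needed to resolve the ambiguity.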

[Figure 5.16 artwork. Top view showing the rotation angle and the Wollaston prism; side view showing points A and B, the surface position, the relaxed surface position and the distance from the heat source.]

Figure 5.16: Differential interferometer measurement principle.

[Figure 5.17 artwork. Two polar plots of displacement (pm) versus orientation angle, with the propagation axes of thermal waves A and B marked.]

Figure 5.17: Response of the differential interferometer as a function of angular position of points A and B. Left: One heat source active. Right: Two heat sources active [194].

Detailed examples are featured in [194] and [184]. Some results are reproduced in Figure 5.17 as an example. The two charts show differential AC measurements as a function of the orientation angle. The dashed lines indicate the direction of the heat source with respect to the point r₀. In the case of the left-hand figure, there is only one active heat source, and the activation frequency is 1 kHz. In the right-hand figure, two heat sources are active (with an activation frequency of 75 kHz). The two dashed lines indicate the direction of the activated heat sources with respect to the point r₀. This directional detection offers useful capabilities for multiple heat source detection. In this section, the interferometry technique has been presented as a point measuring strategy to obtain displacements due to thermal expansions. If a surface of an IC is lit with a laser and the reflected light is processed with a CCD camera, direct two-dimensional maps can be obtained. This procedure can be applied in both interferometry and reflectometry. When applied in interferometry, it is known as electronic speckle pattern interferometry (ESPI). Its working principle is illustrated in Figure 5.18. Details of its performance and examples of its application can be found in [195] and [196]. One interesting feature of this technique is that, in addition to direct two-dimensional maps of thermal expansions of the IC surface, processing of the signal provides access to surface stress values due to thermal expansions, and indicates weak points in the IC structure due to thermal stress.

[Figure 5.18 artwork. Labels: laser, mirror, heat source, integrated circuit, thermal expansion Δx.]

Figure 5.18: Basic operation of the ESPI.

3. MECHANICAL METHODS

This section explains how to obtain thermal images using an atomic force microscope (AFM).

[Figure 5.19 artwork. Labels: photodiode system, electronic control, device under measure, piezoelectric ceramic.]

Figure 5.19: Block diagram of an AFM system.

The basic block diagram of an atomic force microscope is shown in Figure 5.19. It consists of a thermal probe that is in contact with the device being measured. The contact surface has a radius of tens or hundreds of nanometers. The sample is mounted on a piezoelectric ceramic. Either the cantilever that holds the probe or the piezoelectric ceramic (or both) can be moved in the x, y and z directions. For example, if the piezoelectric ceramic is moved upwards, the interatomic forces between the sample and the probe bend the cantilever. This is detected by the photodiodes, which track a laser beam that is reflected on the probe. The feedback control system moves the piezoelectric ceramic or the cantilever in order to maintain either a constant cantilever deflection or a constant force on the sample. Figure 5.20 shows an example of a thermal probe. In this case, the probe is a Wollaston wire with a shaped tip, which has been etched to reveal the platinum core wire. The Pt wire can be used as a thermal resistor: the tip voltage is measured at a constant input current while the surface is scanned. Therefore, the Pt electrical resistance can be estimated at each point on the sample surface. For this specific probe, preliminary calibration relates temperature variations to electrical resistance variations to within 10% accuracy [188]. Other measurements can be taken with a different configuration: if the temperature of the Pt probe is kept constant while the surface is being scanned and the power dissipated by the Pt resistance is monitored, the thermal conductivity of the material in contact with the probe can be obtained.
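The use of the platinum tip as a resistance thermometer can be sketched as follows. The example assumes a linear resistance-temperature relation with a nominal temperature coefficient of 3.85·10⁻³ K⁻¹, a figure commonly quoted for platinum resistance thermometers; the coefficient of a real probe has to come from calibration, and the bias current, reference resistance and function name used here are hypothetical.

```python
PT_TCR_PER_K = 3.85e-3  # assumed nominal temperature coefficient of Pt, 1/K

def probe_temperature_rise(v_tip, i_bias, r_ref, tcr=PT_TCR_PER_K):
    """Temperature rise of the Pt tip relative to the reference state.

    v_tip  : voltage measured across the tip (V)
    i_bias : constant bias current (A)
    r_ref  : tip resistance at the reference temperature (ohm)
    Assumes R = r_ref * (1 + tcr * dT).
    """
    resistance = v_tip / i_bias
    return (resistance - r_ref) / (tcr * r_ref)

# Hypothetical reading: 2.0077 mV at 1 mA bias, 2.0 ohm reference resistance.
print(probe_temperature_rise(2.0077e-3, 1.0e-3, 2.0))
```

The bias current itself heats the tip, so in practice it is kept small in this passive mode; the constant-temperature mode mentioned above deliberately uses that self-heating to probe thermal conductivity.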

[Figure 5.20 labels: Wollaston wire, cantilever carrier, platinum core.]

Figure 5.20: Example of a thermal probe. Image courtesy of the LET-ENSMA [188].

[Figure 5.21 panels: a) topographic image with effect shading; c) magnitude, f = 123 Hz; d) phase, f = 123 Hz. Axes span 0 µm to 50 µm.]

Figure 5.21: Example of topographic and AC measurements taken with a scanning thermal microscope (SThM, Topometrix) [188].

Figure 5.21 shows examples of measurements taken with a specific atomic force microscope. Panel a) reports a processed topographic image. Topographic images can be obtained with these systems if the signal provided by either the piezoelectric ceramic or the photodiode system is recorded. Panel b) shows a photograph of the portion of the layout that has been scanned by the probe. Comparing a) and b) makes it easier to identify and locate the metal lines and devices. In this picture, there is an MOS transistor that acts as a heat source (the same MOS transistors described in Figure 5.7). Panels c) and d) show AC (magnitude and phase) temperature measurements obtained when the heat source is modulated at a frequency of 123 Hz and the thermal signal is filtered at the same reference frequency.

In this measuring system, the oxide and passivation layers placed over the silicon attenuate the temperature signal. For example, in [197] a temperature increase of 0.6 °C is measured when an MOS transistor dissipating 33 mW is coated with a passivation layer consisting of 1 µm of silicon dioxide and 1 µm of silicon nitride. Another element that masks the measurement is the presence of metal lines between the heat source and the measuring probe. For example, Figure 5.21 shows how both the phase and amplitude signals are attenuated at the location of the metal 2 line that runs crosswise, which acts as a heat spreader. As for the spatial resolution of this technique, in [197] it is within the range of 0.5 µm. Recent works, such as [198], show a spatial resolution in the nanometer range for the topographic abilities of the atomic force microscope.

4. BUILT-IN TEMPERATURE SENSORS

All the methods discussed so far require either direct visual access to the silicon die or a specially prepared laboratory to perform the measurements, or both. Incorporating a temperature sensor into the same silicon die as the circuit under test makes the test procedure more flexible and allows both laboratory testing and in-field testing. Neither direct visual access to the silicon die nor a laboratory environment is required. Direct thermal coupling measurements can be taken without being affected by any layer applied to the silicon. This temperature-sensing strategy is based on on-chip or built-in temperature sensors.

There are two major drawbacks to this strategy. First, there is a silicon area overhead, as space is needed for the temperature sensor(s) in addition to that required for the circuit. Second, the temperature monitoring points are restricted to the placement sites of the temperature-sensitive devices.

The output signal of built-in temperature sensors can be proportional either to absolute temperature or to the difference in temperature at two points on the silicon surface. This leads to the categorisation of temperature sensors into two groups: absolute and differential. The electrical output signal of a sensor can be written for the two categories, respectively, as:

$$
\begin{aligned}
Signal_{Out} &= S_A \cdot T + C \\
Signal_{Out} &= S_D \cdot (T_2 - T_1)
\end{aligned}
\tag{5.10}
$$

where SA is the absolute sensitivity of the absolute temperature sensor, SD is the differential sensitivity of the differential temperature sensor, C is an offset constant that may be zero and T, T2 and T1 are temperatures at different points on the silicon surface. Absolute temperature sensors are used to monitor the working temperature of the circuit under test and eventually correct its operating point if it is working beyond its reliability limit. Another application of absolute temperature sensors is the thermal testing of packages. Differential temperature sensors are used to detect alterations of the thermal map of the silicon due to changes in the power dissipated by its devices. They are insensitive to temperature increases that may offset the thermal map of the silicon surface, e.g., ambient temperature changes or leakage current increases in CMOS circuits.
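The practical difference between the two sensor types in (5.10) can be illustrated with a short sketch. The sensitivities and temperatures below are hypothetical values, chosen only to show that a uniform offset of the thermal map changes the absolute reading but leaves the differential reading unchanged.

```python
def absolute_sensor(t, s_a, c=0.0):
    """Output of an absolute sensor: S_A * T + C."""
    return s_a * t + c


def differential_sensor(t2, t1, s_d):
    """Output of a differential sensor: S_D * (T2 - T1)."""
    return s_d * (t2 - t1)


# Hypothetical sensitivities and temperatures, for illustration only.
S_A = 2e-3            # V/K
S_D = 1.5             # V/K
T1, T2 = 330.0, 330.5  # K, temperatures at two points on the die
AMBIENT_SHIFT = 10.0   # K, uniform offset of the whole thermal map

abs_before = absolute_sensor(T2, S_A)
abs_after = absolute_sensor(T2 + AMBIENT_SHIFT, S_A)
diff_before = differential_sensor(T2, T1, S_D)
diff_after = differential_sensor(T2 + AMBIENT_SHIFT, T1 + AMBIENT_SHIFT, S_D)
```

Here `abs_after` differs from `abs_before` by `S_A * AMBIENT_SHIFT`, while `diff_after` equals `diff_before`, which is the insensitivity to ambient changes described above.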

4.1 Absolute temperature sensors

The literature describes temperature sensors implemented in many applications: thermometers, flow meters, temperature compensation circuitry, etc. Recent applications include the thermal management and thermal testing of IC's. The first solid state temperature sensors were linear circuits manufactured with bipolar technology, whose temperature-sensitive device was a forward-biased PN junction: a diode or bipolar transistor. These devices were used because their electrical characteristics are strongly sensitive to temperature, in a highly predictable (meaning that their behaviour is process-independent) and repeatable way.

In [202], temperature sensors based on transistors are classified into three groups, depending on the number of transistors used as a temperature transducer (one, two or three). In single-transistor temperature sensors, the base-emitter voltage drop of a single transistor is used as the temperature-sensitive parameter. In this case, if the collector current that biases a transistor has the following temperature dependence:

(5.11)

where 8 and F are temperature-independent parameters, then in [203] it follows that:

(5.12)

where VBE(0) and r are temperature-independent constants, f(T) is a function that includes the temperature-dependent terms of order two or higher and T is the absolute temperature. The values of the first two parameters depend on the value of 8 in (5.11) and, to a slight degree, on the doping level of the base. Typical values are 1.27 V and 2.2 mV/°C, respectively. The function f(T) also depends on the value of 8, and can even be zero if the correct 8 is chosen.

The sensors that feature two transistors as temperature transducers are also called PTAT: proportional to absolute temperature. In these sensors, the difference between the base-emitter voltages of two bipolar transistors is used as the temperature-dependent parameter. Based on (2.36), if two transistors from the same process are at the same temperature, then:

$$
\Delta V_{BE} = V_{BE1} - V_{BE2} = \frac{k \cdot T}{q} \cdot \ln(p \cdot r)
\tag{5.13}
$$

where k is the Boltzmann constant, q is the electron charge, T is the absolute temperature, Ai is the emitter area of transistor i and ICi is the collector current of transistor i. Here p and r are taken to be the collector current ratio IC1/IC2 and the emitter area ratio A2/A1, respectively. The behaviour of these sensors is highly linear, as the higher order temperature dependence terms of VBE1 and VBE2 tend to cancel each other out.

The temperature sensors that include three transistors as temperature transducers are also called sensors with an intrinsic reference. The output signal of these sensors is proportional to:
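As a worked example of (5.13), the following sketch evaluates the PTAT voltage for a given current ratio and area ratio. The ratio values are hypothetical; the physical constants are the exact SI values.

```python
import math

K_B = 1.380649e-23      # J/K, Boltzmann constant
Q_E = 1.602176634e-19   # C, electron charge


def delta_vbe(temp_k, p, r):
    """PTAT voltage of (5.13): (k*T/q) * ln(p*r)."""
    return K_B * temp_k / Q_E * math.log(p * r)


# Hypothetical ratios, for illustration only: equal currents, 8:1 emitter areas.
P_RATIO = 1.0
R_RATIO = 8.0

dv_300 = delta_vbe(300.0, P_RATIO, R_RATIO)   # about 53.8 mV
dv_400 = delta_vbe(400.0, P_RATIO, R_RATIO)
slope = dv_300 / 300.0                        # V/K, constant because the output is PTAT
```

Because the expression is strictly proportional to T, the ratio `dv_400 / dv_300` is exactly 4/3, and the sensitivity `slope` (about 0.18 mV/K for these ratios) depends only on the chosen ratios and on physical constants.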

(5.14)

where K1 and K2 are multiplicative constants that have been adjusted to cancel the offset of the VPTAT function when the temperature is within the typical IC working temperature range.

[Figure 5.22 labels: current mirror (p : 1); R1; R2; current ΔVBE/R1; Vout = (R2/R1)·ΔVBE.]

Figure 5.22: Principle of a PTAT temperature sensor.

Usually, sensors with an intrinsic reference have extra circuitry to compensate for the non-linear behaviour of transistor 3; this can take the form of built-in digital processing [208] or of different biasing strategies [209].

Temperature sensors for thermal testing and thermal monitoring must be as simple as possible in order to keep the area overhead as low as possible. Examples of the use of p-n junctions to monitor the working temperature of digital microprocessors are featured in [200] and [201]. In [200], a single diode is built into the centre of the silicon die. The anode and cathode of this diode are accessible through two chip pins. Several IC's, such as the ADM1021, MAX1617A or LM84, have been developed that can be connected to the diode terminals to configure a sensor circuit that provides information on the microprocessor working temperature. In [201], a single diode working as a temperature transducer is built in with a microprocessor, together with a reference source and a comparator. When the working temperature of the microprocessor reaches the maximum value allowed, the signal delivered by the diode reaches the value of the reference source and the comparator changes its state. The output signal of this comparator is accessible through a chip pin. In both cases, when the temperature reaches the maximum value allowed, the manufacturer recommends a reduction of the clock rate to reduce power consumption.

CMOS temperature sensors have been gaining popularity due to the lower cost of this technology. CMOS temperature sensors can be classified as linear circuits and temperature-dependent oscillators. Linear circuits use either parasitic bipolar transistors or MOS transistors as temperature-dependent devices. Parasitic bipolar transistors can be obtained in a CMOS process if the diffusion-well layers or the diffusion-well-substrate layers of this process are combined to obtain n-p-n or p-n-p junctions.
Although the electrical performance of these bipolar transistors is inferior to that which can be obtained in a bipolar process (lower beta, higher base resistance, etc.), they have been used to design temperature sensors. Examples can be found in [210] and [211].

MOS transistors have been used as temperature transducers, either biased in the subthreshold region (called weak inversion) or biased in strong inversion as a threshold voltage reference [207]. When an MOS transistor is biased in the subthreshold region, carriers move in the channel due to a diffusion mechanism (when the MOS is biased in strong inversion, carriers move in the channel due to a drift mechanism). This transport mechanism is the same one that occurs in the base of a forward-biased bipolar transistor. Therefore, the two devices have similar mathematical characterisations and temperature dependence (compare expressions (2.30) and (2.36)). This means that all the sensor topologies that have been designed with bipolar transistors can be reused by replacing the bipolar transistors with MOS transistors in weak inversion.

The basic principle of MOS temperature transducers based on threshold voltage references is shown in Figure 5.23. As explained in Chapter 2, the threshold voltage of an MOS transistor has a negative temperature coefficient. In this circuit, if the working temperature of M1 changes, its gate-source voltage changes, as its drain current is constant. This directly affects the voltage drop across R1, changing the drain current ID2.

Figure 5.23: Working principle of threshold voltage reference sensors.

An improved version of this sensor is shown in Figure 5.24. This sensor was proposed in [214] and used to perform thermal monitoring of CMOS IC's and thermal testing of the quality of packages. The gate-source voltages of transistors M1, M2 and M3 are related by:

$$
V_{GS1} = V_{GS2} + V_{GS3} \;\Rightarrow\; (V_{GS1} - V_T) - V_T = (V_{GS2} - V_T) + (V_{GS3} - V_T)
\tag{5.15}
$$

where VT is the threshold voltage of these transistors. The current mirror can be used to relate the drain current of M1 to the drain current of M2 and M3. If these transistors work in saturation, using (2.29), we have:


Figure 5.24: Basic circuit of a threshold voltage reference [214], [215].

(5.16)

Combining the two expressions above yields:

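[Equation (5.17): the gate-source voltage expressed as the threshold voltage VT multiplied by a factor that depends on the Ki/Kj ratios; the expression is not legible in the source.]

(5.17)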

If all the transistors are from the same IC, the quotient Ki/Kj depends only on the size ratio between transistors i and j. As transistors M2 and M4 have the same gate-source voltage, M4 mirrors the drain current of M2. Therefore, the output current can be written as:

$$
I_{OUT} = V_T^{2}(T) \cdot \mu(T) \cdot f(\cdot)
\tag{5.18}
$$

where f(·) collects the temperature-independent factors.

In this expression, the temperature-dependent terms have been explicitly indicated. The advantage of this sensor is that the temperature dependence of both the mobility and the threshold voltage contribute to the temperature dependence of the output current in the same direction:

$$
\frac{dI_{OUT}}{dT} = \frac{2}{V_T(T)} \cdot I_{OUT} \cdot \frac{dV_T(T)}{dT} + \frac{1}{\mu(T)} \cdot I_{OUT} \cdot \frac{d\mu(T)}{dT}
\tag{5.19}
$$
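The claim that both terms of (5.19) act in the same direction can be checked numerically. The sketch below assumes a linear threshold voltage model and a power-law mobility model; both models and all numerical values are illustrative assumptions, not parameters of the sensor in [214].

```python
def vt(temp_k):
    """Hypothetical threshold voltage: 0.8 V at 300 K, falling 2 mV/K."""
    return 0.8 - 2e-3 * (temp_k - 300.0)


def mu(temp_k):
    """Hypothetical mobility following a T^-1.5 power law."""
    return 0.05 * (temp_k / 300.0) ** -1.5


def i_out(temp_k, f_geom=1e-3):
    """Output current shape: V_T^2 * mu * (temperature-independent factor)."""
    return vt(temp_k) ** 2 * mu(temp_k) * f_geom


def relative_tc_analytic(temp_k):
    """(1/I_OUT) * dI_OUT/dT obtained by dividing (5.19) by I_OUT."""
    vt_term = 2.0 / vt(temp_k) * (-2e-3)   # threshold voltage contribution
    mu_term = -1.5 / temp_k                # mobility contribution
    return vt_term + mu_term


def relative_tc_numeric(temp_k, h=1e-3):
    """Central finite difference of i_out, normalised by i_out."""
    return (i_out(temp_k + h) - i_out(temp_k - h)) / (2.0 * h * i_out(temp_k))


tc_analytic = relative_tc_analytic(300.0)
tc_numeric = relative_tc_numeric(300.0)
```

With these assumed models each term contributes -0.5 %/K at 300 K, so the two effects add up to -1 %/K instead of partially cancelling.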

This sensor has been implemented in [214] with a current-to-frequency converter. The figures of merit of this sensor are: an area of 0.02 mm² (implemented in a 1 µm technology), a power consumption of 200 µW and an accuracy of ±1 °C. Other features of this sensor as implemented in [214] are: low sensitivity to power supply variations (±0.25% change in the output frequency for a ±0.25 V change in the supply voltage) and long-term stability, with a drift of ±0.5 °C over 160 days of continuous operation.

The main drawback of linear temperature sensors based on MOS transistors is that their sensitivity depends on process-dependent parameters, such as the threshold voltage, and not on physical constants, as is the case with bipolar transistors. This makes this type of sensor process-dependent.

Usually, digital output is desirable in temperature sensors. A common way to present the temperature in digital format is to give a pulse train whose frequency follows a specific temperature-dependent function. Therefore, several CMOS temperature-dependent oscillators have been designed as direct digital temperature sensors. Examples can be found in [212], [205], [217] and [206].

References [205] and [217] are based on ring oscillators. Figure 5.25 shows the basic topology of this circuit: a chain of an odd number of inverters connected in a feedback loop has no stable logic state at the output node of the sensor, so the circuit oscillates. The working principle of this circuit is as follows: the value of Vref is much higher than the threshold voltage of M1. Thus, if there is a temperature change, the effect of the carrier mobility on the drain current predominates over the effect of the threshold voltage. Transistor M1 behaves like a temperature-controlled tap, regulating the current available in the inverter chain to charge and discharge the gate capacitances. The output signal of this sensor is usually the clock input of a counter that is activated for a fixed time interval.

[Figure 5.25 label: Vref.]

Figure 5.25: Example of a temperature sensor based on ring oscillators.

This sensor was implemented in [205], where the sensitivity of the output frequency was 10 kHz/°C (@ Vdd = 5 V). In [216] and [217] this sensing strategy was used to perform thermal testing of FPGAs, and the circuit shown in Figure 5.25 was implemented without transistor M1. The temperature affects the carrier mobility in all the inverters, the propagation delay increasing as the temperature increases. In this work, this simple circuit can be placed on-line at any point of the FPGA in order to ascertain the working temperature of a specific part of the circuit or to monitor the power dissipated by nearby logic cells, so as to detect the presence of activated defects. The results of these works show a sensor whose output frequency varies by -0.20 %/°C and lies between 21 and 27 MHz, depending on the implementation alternatives, at 25 °C. The main advantages of this sensor topology are its simplicity and linearity. The main drawback is its high sensitivity to supply voltage changes. For example, in [216] this value has been quantified at 3.2 kHz/mV.

A thermo-electric oscillator is designed and implemented in [212], with an accuracy of ±2 °C, a power dissipation of 20 mW and a sensitivity of approximately 520 Hz/°C. The objective of this circuit is to measure the internal thermal diffusion constant of the silicon, whose temperature dependency is -0.57 %/°C. To this end, the frequency of the circuit depends on the thermal coupling impedance that exists between a heat source and a temperature-sensitive device located in the same silicon die. This is an example of a thermo-electric circuit.
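The counter-based readout of a ring-oscillator sensor, and the impact of its supply sensitivity, can be sketched as follows. The -0.20 %/°C variation and the 3.2 kHz/mV supply sensitivity are the figures quoted above; the nominal frequency of 25 MHz (a value inside the quoted 21 to 27 MHz range), the linear frequency model and the gate time are assumptions of this sketch.

```python
F_NOM_HZ = 25e6        # assumed nominal frequency at T_NOM_C (within 21-27 MHz)
T_NOM_C = 25.0         # °C, temperature at which F_NOM_HZ applies
TC_REL = -0.20e-2      # 1/°C, relative frequency variation quoted in the text
SUPPLY_SENS = 3.2e3    # Hz/mV, supply sensitivity quoted in the text


def frequency(temp_c):
    """Assumed linear model of the oscillator frequency versus temperature."""
    return F_NOM_HZ * (1.0 + TC_REL * (temp_c - T_NOM_C))


def counts(temp_c, gate_s):
    """Pulses accumulated by a counter enabled for gate_s seconds."""
    return round(frequency(temp_c) * gate_s)


def temperature_from_counts(n, gate_s):
    """Invert the linear model to recover temperature from a counter value."""
    freq = n / gate_s
    return T_NOM_C + (freq / F_NOM_HZ - 1.0) / TC_REL


def supply_error_c(dv_mv):
    """Apparent temperature error caused by a supply change of dv_mv millivolts."""
    return SUPPLY_SENS * dv_mv / (F_NOM_HZ * TC_REL)


GATE_S = 1e-3                      # hypothetical counter gate time
n_35 = counts(35.0, GATE_S)
t_est = temperature_from_counts(n_35, GATE_S)
err_1mv = supply_error_c(1.0)
```

Under these assumptions a 1 mV supply change is read as a temperature error of about 0.064 °C in magnitude, which illustrates why supply regulation matters for this topology.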

Another example of a temperature-dependent oscillator is featured in [206], in which a combination of oscillators such as the one shown in Figure 5.26 is used to drive two counters. The resistor Rpoly (in this case, made of polysilicon) charges the capacitance. The temperature affects the charging time as it modifies the value of the resistance. Switch SW is controlled by the output signal of the inverter. Based on this principle, and with two oscillators, one based on polysilicon and another based on p-well, a temperature transducer with a resolution of 0.1 °C and an accuracy of 0.5 °C (Vdd regulated to within 1 mV) is implemented. Linearisation and correction for process variations are carried out on chip with an EPROM and digital correction circuits.

Figure 5.26: Principle of a temperature-controlled oscillator based on resistors.

4.2 Differential temperature sensors

Differential temperature sensors are used to measure the temperature increases generated by internal defects acting as heat sources. Figure 5.27 shows a simplified schematic of a BiCMOS implementation of this type of sensor. The circuit has the same structure as a differential voltage amplifier. However, in this case, the two branches of the differential structure are unbalanced by the temperature difference between the two bipolar transistors, rather than by a base-emitter voltage difference. Therefore, the temperature transducers of the circuit are the two bipolar transistors Q1 and Q2, the temperatures of which are T1 and T2, respectively.

[Figure 5.27 labels: Vdd, R1, Vout, Q2 at temperature T2; collector and emitter terminals of the small-signal model.]

Figure 5.27: Simplified schematic of the built-in differential temperature sensor. Small-signal electro-thermal model of the bipolar transistors.

The analysis of this circuit is based on a combination of the classical analysis of differential voltage amplifiers and the calculation of thermal feedback in operational amplifiers [218]. Ideally, the output voltage of the sensor circuit is only sensitive to the temperature difference between the transducers. In general, however, the output voltage of the sensor can be written as:

(5.20)

where SdT is the sensitivity to the temperature difference and ScT is the sensitivity to the common-mode temperature. If we assume that the surface temperature of the silicon is T, then when a heat source is activated the temperature of the bipolar transistors will increase, and will be:

$$
\begin{aligned}
T_1 &= T + \Delta T_1 \\
T_2 &= T + \Delta T_2
\end{aligned}
\tag{5.21}
$$

thus, (5.20) can be rewritten as:

(5.22)

In (5.22), if a differential temperature sensor is used for built-in testing purposes, its common sensitivity ScT must be as low as possible, whereas its differential sensitivity SdT must be as high as possible. Thus, it will only be sensitive to temperature increases generated by the activation of a heat source. The location of the temperature transducers inside the silicon die must provide a non-zero value for the term (ΔT1 - ΔT2) when internal heat sources are activated. The small-signal electro-thermal model of the bipolar transistor shown in Figure 5.27 can be used to obtain analytical values of ScT and SdT. In the figure, the sensitivity parameter ST is defined as:

$$
S_T = \frac{\partial I_C}{\partial T}
\tag{5.23}
$$

where IC is the collector current of the bipolar transistor and T is the absolute temperature. The use of this model assumes that self-heating can be neglected. In [190] numerical values of ST have been obtained using the HSPICE model of the bipolar transistor, with the assumption of VCB = 0:

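$$
I_C\big|_{V_{CB}=0} = \frac{IS \cdot A}{q_b}\left(e^{\frac{V_{BE}}{NF \cdot V_{Th}}} - 1\right) \approx \frac{IS \cdot A}{q_b} \cdot e^{\frac{V_{BE}}{NF \cdot V_{Th}}}
\tag{5.24}
$$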

where A is the emitter area, IS is the inverse saturation current, NF is the forward current emission coefficient, VTh is the thermal voltage and IC and VBE define the transistor bias. Thus:

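$$
S_T = I_C\left[\frac{1}{V_{Th} \cdot T}\left(EG - \frac{V_{BE}}{NF}\right) + \frac{XTI}{T}\right] = g_m \cdot \frac{EG - \left(V_{BE}/NF\right)}{T} + I_C \cdot \frac{XTI}{T}
\tag{5.25}
$$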

where EG, XTI and NF are HSPICE parameters [219] of the bipolar model and gm is the small-signal transconductance of the bipolar transistor. For example, an NPN transistor can be shown to have the following ST parameter by using the HSPICE parameters of a 1.2 µm BiCMOS technology:

(5.26)

[Figure 5.28 vertical axis: ST/IC in K⁻¹, with ticks from 0.05 to 0.1.]

Figure 5.28: ST/IC as a function of the bias and room temperature for an NPN transistor of a 1.2 µm BiCMOS process.
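To give a feel for the magnitude of (5.25), the sketch below evaluates ST/IC from the first form of that expression. The parameter values EG = 1.11 eV, XTI = 3 and NF = 1 are common SPICE defaults and are assumed here; they are not the parameters of the 1.2 µm BiCMOS process used in [190], and the bias point is hypothetical.

```python
K_OVER_Q = 8.617333262e-5   # V/K, Boltzmann constant divided by the electron charge


def st_over_ic(v_be, temp_k, eg=1.11, xti=3.0, nf=1.0):
    """Relative thermal sensitivity S_T / I_C in 1/K, from the first form of (5.25).

    eg, xti and nf default to common SPICE values (an assumption of this sketch).
    """
    v_th = K_OVER_Q * temp_k
    return (eg - v_be / nf) / (v_th * temp_k) + xti / temp_k


# Hypothetical bias point, for illustration only.
ratio_300 = st_over_ic(v_be=0.6, temp_k=300.0)    # about 0.076 1/K
ratio_high_bias = st_over_ic(v_be=0.7, temp_k=300.0)
```

With these assumed parameters the relative sensitivity is a few percent per kelvin and decreases as VBE increases, which is the kind of bias dependence plotted in Figure 5.28.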

To obtain the differential and the common sensitivity of the sensor, several assumptions can be made to simplify the analysis: i) Q1 and Q2 have the same small-signal model, since their operating points are practically the same (meaning that the base current of Qb is neglected). The parameters of their model are denoted by ro, rπ, gm and ST. ii) The parameters of the small-signal model of Qb are denoted by rob, rπb, gmb and STb. iii) The temperatures of the bipolar transistors are:

$$
\begin{aligned}
Q_1 &: T + \Delta T_1 \\
Q_2 &: T + \Delta T_2 \\
Q_b &: T
\end{aligned}
\tag{5.27}
$$

where T is the room temperature and ΔT1 and ΔT2 are the temperature increases at the locations of the temperature transducers caused by the activation of an internal heat source. We are assuming that Qb remains at room temperature during the activation of the heat source.

iv)

$$
g_m + \frac{1}{r_\pi} \approx g_m, \qquad r_{ob} \gg r_o, \qquad r_{\pi b} \gg r_\pi, \qquad g_m \gg \frac{1}{r_o}
\tag{5.28}
$$

If the output resistance of the current source Ie is infinite, by superposition:

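$$
\begin{aligned}
\Delta V_{Out}\big|_{\Delta T_2 = 0,\, T = 0} &= S_T \cdot \frac{r_o \cdot R_2}{2 r_o + R_1 + R_2} \cdot \Delta T_1 \\
\Delta V_{Out}\big|_{\Delta T_1 = 0,\, T = 0} &= -S_T \cdot \frac{r_o \cdot R_2}{2 r_o + R_1 + R_2} \cdot \Delta T_2 \\
\Delta V_{Out}\big|_{\Delta T_1 = 0,\, \Delta T_2 = 0} &= 0
\end{aligned}
\tag{5.29}
$$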

If R = R1 = R2, equation (5.22) becomes:

(5.30)
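The superposition in (5.29) can be checked with a small numerical sketch. It simply adds the two partial responses of (5.29); the component values are hypothetical and are not those of the sensors described in [190].

```python
def dv_out(dt1, dt2, s_t, r_o, r1, r2):
    """Output voltage change obtained by adding the partial responses of (5.29).

    Valid under the stated assumption of an ideal (infinite output
    resistance) current source.
    """
    gain = s_t * r_o * r2 / (2.0 * r_o + r1 + r2)   # V/K for each transducer
    return gain * dt1 - gain * dt2


# Hypothetical component values, for illustration only.
S_T = 1.45e-6   # A/K, thermal sensitivity of each transistor
R_O = 1.0e6     # ohm, transistor output resistance
R = 1.0e5       # ohm, load resistors with R1 = R2 = R

dv_common = dv_out(0.5, 0.5, S_T, R_O, R, R)        # equal heating of Q1 and Q2
dv_diff = dv_out(0.5, 0.0, S_T, R_O, R, R)          # only Q1 heated
expected_gain = S_T * R_O * R / (2.0 * (R_O + R))   # closed form for R1 = R2 = R
```

`dv_common` is zero while `dv_diff` equals `expected_gain * 0.5`, so under these assumptions the response depends only on the temperature difference between the two transducers.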

In this case, this sensor is only sensitive to temperature differences generated by internal heat sources. If the output resistance of the current source Ie has a finite value equal to rt, analytical values for ScT and SdT can be obtained. The most relevant conclusions that can be drawn from these results are: 1. SdT decreases and ScT has a non-zero value. 2. The sensor loses its symmetrical differential behaviour.

Figure 5.29 shows the schematic of two built-in differential temperature sensors. Devices MR1 to MR4 in sensor 1 and MO1 and MO2 in sensor 2 are used to provide a high output impedance and, therefore, a high differential sensitivity. Devices MP1 and MN1 are used to adjust the output voltage and to compensate for the thermal offset which results from device mismatching. Figure 5.30 is a photograph of sensor 2. In this sensor the distance between Q1 and Q2 is 500 µm, which ensures that (ΔT1 - ΔT2) is not zero. Sensor 1 has a similar layout. DC simulations performed using HSPICE have shown a differential sensitivity of 1.4 V/°C for sensor 1 and -2.7 V/°C for sensor 2 (@ Vdd = 5 V, ICQ1 = ICQ2 = 23 µA, ST = 1.45 µA/°C) [190].

Figure 5.29: Schematic of two BiCMOS built-in differential temperature sensors. Left: sensor 1. Right: sensor 2.

[Figure 5.30 labels: left side heat sources, biasing devices, right side heat sources.]

Figure 5.30: Photograph of sensor 2.

Figure 5.30 shows that there is a set of heat sources to the left of transistor Q1, and another set to the right of transistor Q2. These heat sources are identical, and their thermal behaviour has been measured and discussed in Sections 2.2, 2.3 and 3. Each source may be individually activated and its power dissipation ranges from 0 to 25 mW. The objective of this layout is to demonstrate the ability to increase the temperature of one or the other temperature transducer of a sensor by activating the heat sources near either of them.

[Figure 5.31 panels. Left: ΔVOUT versus distance [µm] for sensor 1 and sensor 2, power = 5 mW. Right: ΔVOUT versus power [mW], with curves for heat sources at 17 µm, 78 µm and 137 µm from Q1 and from Q2.]

Figure 5.31: DC temperature measurements. Left: ΔVOUT as a function of the distance between the active heat source and the temperature transducer (Q1 for sensor 1 and Q2 for sensor 2). Right: ΔVOUT in sensor 2 as a function of the power dissipated by the heat source.

[Figure 5.32 legend: curves for Ib = 100 µA, Ib = 520 µA and Ib = 50 µA, each with heat sources at 17 µm and 77 µm.]

Figure 5.32: ΔVOUT (sensor 2) as a function of the power simultaneously dissipated by two heat sources equidistant from the temperature transducers Q1 and Q2 and as a function of the bias current Ib drawn by the current source Ie.

The differential and common sensitivities of these sensors can be obtained with DC temperature measurements. Figure 5.31 shows two examples of differential measurements: the left-hand part of this figure shows the output voltage variation as a function of the distance between the activated heat source and the closest bipolar transistor, when the dissipated power is 5 mW. In order to obtain data for sensor 1, the heat sources close to Q1 were sequentially activated, whereas those close to Q2 were activated to obtain data for sensor 2. The location of the heat sources is shown by the dots in this figure. The lines in the figure indicate the expected behaviour of the temperature amplitude as predicted by the one-dimensional RC electrical model of heat conduction used in Section 2 of this chapter.

The right-hand part of Figure 5.31 shows the linearity of the differential sensitivity. First, the heat sources close to Q1 were sequentially activated, and later the same was done with the heat sources close to Q2. The data contained in this figure shows a differential sensitivity of -215.2 V/W (174.6 V/W) when the activated heat source is located 17 µm from Q1 (Q2), -79.5 V/W (68.77 V/W) when the activated heat source is located 78 µm from Q1 (Q2) and -29.36 V/W (23.57 V/W) when it is located 137 µm from Q1 (Q2).

Measurements of the common sensitivity are featured in Figure 5.32. Two heat sources equidistant from Q1 and Q2 have been simultaneously activated. Results are plotted as a function of the power dissipated by each heat source, its distance from the closest temperature transducer and the bias current Ib drawn by the current source Ie. As predicted by the analysis, the lower the bias current (and therefore, the higher the output impedance of the current source), the lower the common sensitivity.
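One rough way to summarise how quickly the differential sensitivity falls with distance is to compute an equivalent decay length between adjacent measurement points. The sketch below uses the sensitivity magnitudes quoted above; treating the decay between two points as exponential is an assumption of this sketch and is not the RC model used in the chapter.

```python
import math

# Distances and sensitivity magnitudes quoted in the text.
distances_um = [17.0, 78.0, 137.0]
sens_q1_v_per_w = [215.2, 79.5, 29.36]   # heat sources close to Q1
sens_q2_v_per_w = [174.6, 68.77, 23.57]  # heat sources close to Q2


def decay_lengths(x, s):
    """Equivalent exponential decay length between adjacent points, in the units of x."""
    return [
        (x[i + 1] - x[i]) / math.log(s[i] / s[i + 1])
        for i in range(len(x) - 1)
    ]


def output_change(sensitivity_v_per_w, power_w):
    """Expected output voltage change for a given dissipated power."""
    return sensitivity_v_per_w * power_w


lengths_q1 = decay_lengths(distances_um, sens_q1_v_per_w)
lengths_q2 = decay_lengths(distances_um, sens_q2_v_per_w)
dv_5mw = output_change(-215.2, 5e-3)   # heat source 17 µm from Q1 at 5 mW
```

For both sets of data the equivalent decay length comes out in the range of roughly 55 to 65 µm, and a 5 mW source 17 µm from Q1 corresponds to an output change of about -1.08 V.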
The bandwidth of the temperature sensor is limited by the presence of high impedance nodes in the circuit, which introduce dominant poles in the transfer function. In this case, both of the sensors have just one high impedance node: the output node. Therefore, the cut-off frequency is equal to:

$$
f_c = \frac{1}{2 \cdot \pi \cdot R_{Out} \cdot C_{Out}}
\tag{5.31}
$$

where COut and ROut are the capacitance and resistance seen from the output node. Examples of transient measurements are given in Figure 5.33. The left-hand side of this figure shows the dynamic output voltage variation for sensor 2 when a heat source 77 µm from Q1 dissipates a power pulse with a magnitude of 23 mW for 30, 50, 80 and 100 µs. As is shown, the sensor allows real-time monitoring of the temperature difference at the transducer locations. The effect of the sensor bias on the sensor's sensitivity and bandwidth is shown in the right-hand part of the figure.
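The trade-off between sensitivity and bandwidth implied by (5.31) can be illustrated with a short calculation. The resistance and capacitance values below are hypothetical, chosen only to show the order of magnitude.

```python
import math


def cutoff_frequency(r_out, c_out):
    """Cut-off frequency of (5.31) for a single dominant pole at the output node."""
    return 1.0 / (2.0 * math.pi * r_out * c_out)


# Hypothetical output-node values, for illustration only.
C_OUT = 0.2e-12                             # F
fc_1meg = cutoff_frequency(1.0e6, C_OUT)    # about 796 kHz
fc_2meg = cutoff_frequency(2.0e6, C_OUT)    # half of fc_1meg
```

Doubling the output resistance, which raises the differential sensitivity, halves the cut-off frequency; this is the kind of bias-dependent trade-off shown in the right-hand part of Figure 5.33.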

[Figure 5.33 labels: traces for VFC2 = 1.95 V, VFC2 = 1.696 and VFC2 = 1.2 V.]

Figure 5.33: Left: Dynamic output voltage variation (sensor 2) when a heat source 77 µm from Q1 dissipates 23 mW for 30, 50, 80 and 100 µs (X-axis: 50 µs/div; Y-axis: 50 mV/div). Right: Output voltage variation (sensor 2) as a function of the bias (and therefore, different bandwidth and sensitivity).

AC measurements can be used to obtain the thermal coupling impedance between the heat source and the temperature monitoring points. For example, Figure 5.34 shows the amplitude and phase of the first harmonic of the Vout signal (sensor 2) as a function of the distance between the activated heat source and the bipolar transistor Q1 for five different activation frequencies. As with measurements made with non-contact optical methods, the phase exhibits a linear behaviour with respect to the distance.

[Figure 5.34 panels: amplitude and phase versus distance, with curves for 1 Hz, 10 Hz, 123 Hz, 1010 Hz and 4010 Hz.]

Figure 5.34: Phase and amplitude AC measurements with periodic activation of the heat source.
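The linear dependence of phase on distance is what a simple thermal-wave picture predicts. The sketch below assumes a one-dimensional plane thermal wave in a homogeneous medium, for which the phase lag grows as distance divided by the thermal diffusion length; both this model and the silicon diffusivity value are assumptions of the sketch, not results taken from the measurements above.

```python
import math

D_SILICON = 8.8e-5   # m^2/s, assumed room-temperature thermal diffusivity of silicon


def thermal_diffusion_length(diffusivity, freq_hz):
    """Diffusion length of a 1D thermal wave: sqrt(D / (pi * f))."""
    return math.sqrt(diffusivity / (math.pi * freq_hz))


def phase_lag_deg(distance_m, diffusivity, freq_hz):
    """Phase lag of a 1D plane thermal wave, in degrees, linear in distance."""
    length = thermal_diffusion_length(diffusivity, freq_hz)
    return math.degrees(distance_m / length)


length_1010 = thermal_diffusion_length(D_SILICON, 1010.0)   # about 167 µm
lag_50um = phase_lag_deg(50e-6, D_SILICON, 1010.0)
lag_100um = phase_lag_deg(100e-6, D_SILICON, 1010.0)
```

In this idealised model doubling the distance doubles the phase lag, and quadrupling the frequency halves the diffusion length, so the phase-versus-distance slope steepens at higher activation frequencies.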

Figure 5.35 shows the Bode diagram obtained between the power dissipated by the heat source and the output voltage of sensor 2. In this case, the heat source is 80 µm from transistor Q1. This graph shows the effect of two transfer functions connected in series: the transfer function of the thermal coupling between the heat source and the bipolar transistors that act as temperature transducers, and the transfer function of the sensor itself. As is shown, this transfer function exhibits a clear low-pass filter nature. For example, [220] uses these low-pass filters for electronic designs.

[Figure 5.35 plot: amplitude and phase versus frequency (10 to 1000) for a heat source 80 µm from Q1; logarithmic amplitude axis from 0.001 to 0.1, phase axis from -180 to -280.]

Figure 5.35: Bode diagram between the power dissipated by the heat source and the output voltage of sensor 2.

The structure of this IC is suitable for checking electro-thermal simulation procedures. A comparison between simulated and measured values is given in Figure 5.36. Therein, the different heat sources have been sequentially activated with the following voltage function:

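$$
v(t) =
\begin{cases}
2.14 \cdot \left[\cos\left(\dfrac{2\pi}{T} \cdot t\right) - 1\right] & \text{if } 0 \le t \le T \\
0 & \text{if } t > T
\end{cases}
\tag{5.32}
$$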

where t is time and T is a variable parameter (in seconds). With this function, the heat source dissipates power when the voltage applied to its gate surpasses the threshold value of 0.8 V. In this graph, the measured data is compared with the results of the electro-thermal simulation procedure shown in Example 7 of Chapter 3. The results are plotted as a function of parameter T for five different heat sources at different distances from transistor Q2. In this case, the electro-thermal simulation predicts a temperature difference of 0.6 mK between the two temperature transducers when the heat source is 137 µm from Q2 and the parameter T is 2 µs. This result is an indicator of the sensitivity of a built-in differential sensing strategy.

[Figure 5.36 plot: simulated and measured curves on logarithmic axes, with a legend listing heat sources at 17 µm, 47 µm and 78 µm; horizontal axis: parameter T [µs].]

Figure 5.36: Comparison between simulated and measured results.

5. CONCLUSIONS

In order to evaluate the efficiency of an IC thermal testing methodology, several parameters must be taken into account when analysing different temperature-sensing strategies to detect hot spots in IC's. These include sensitivity, applicability (the ability to perform measurements at any location of the IC), cost, bandwidth capability, the complexity of the equipment required to obtain the measurements and the hot spot location ability.

In terms of sensitivity, the differential built-in temperature sensors exhibit the highest sensitivity (0.2 mK) at the sensing point. This technique is based on the placement of a limited number of sensors in the circuit under test, allowing the calculation of temperature (phase and amplitude) at surrounding points from the sensor matrix data and the laws of heat propagation. Laser reflectometry and laser interferometry present a lower sensitivity at the measuring point but have the advantage of directly reaching all the points of interest within the scanning area. The scanning thermal microscope has the lowest sensitivity of all the scanning techniques (0.05 K), which is also partially compensated by its scanning base. The non-scanning techniques (liquid crystal thermography, fluorescent microthermography and infrared emission thermography) are characterised by lower sensitivity: 0.1 °C, 0.01 °C and 0.1 °C respectively.

The three scanning techniques (scanning thermal microscope, laser reflectometry and laser interferometry) offer high applicability and location ability due to their capacity to generate full thermal surface maps of the accessed area from the measuring process. The topographic abilities of the scanning thermal microscope offer easy identification of the heat source. The resolution of the scanning thermal microscope is approximately 50 nm, and that of both laser reflectometry and laser interferometry is approximately 500 nm. In terms of applicability, the differential built-in temperature sensors have the lowest level, because the placement of the sensors is determined during the design phase of the circuit (in accordance with design for thermal testing goals). However, this technique is not affected by metal, oxide or passivation layers (heat propagates mainly through the substrate, which features good thermal coupling with the semiconductor sensors). In surface scanning techniques, these layers always affect the measurements, spreading the hot spot and reducing the sensitivity and accuracy of measurements. The resolution of liquid crystal thermography, fluorescent microthermography and infrared emission thermography is around a few microns.

The cost of the equipment required is directly related to its complexity. The three scanning techniques require a laboratory environment and direct access to the silicon surface (making it necessary to remove the package), and all three involve equipment of high cost and technical complexity, in addition to requiring considerable technical know-how. All three scanning techniques require visual access to the sample. The cost of scanning thermal microscope equipment is much higher than that of the equipment used in laser reflectometry and laser interferometry.

In terms of testing procedure application cost, the most economical option is undoubtedly the differential built-in temperature sensors. This technique has a silicon overhead, and consequently a manufacturing cost. Nevertheless, the equipment required for testing is relatively simple (conventional electronic instruments compatible with remote advanced testing access techniques such as JTAG mechanisms). This technique can be applied in both the manufacturing phase and the in-field maintenance phase with the same requirements, leaving the circuit packaged at all times. Liquid crystal thermography, fluorescent microthermography and infrared emission thermography are much more economical and simpler than the scanning techniques.

Finally, in terms of their capability to track and measure dynamic thermal signals (bandwidth), the fastest techniques are laser interferometry and laser reflectometry, with capacity for up to 125 MHz. The differential built-in temperature sensors have a bandwidth of approximately 1 MHz. The scanning thermal microscope, as its Pt sensor has greater thermal inertia, is slower than the other techniques (10 kHz). Similar dynamics are exhibited by the infrared emission technique, whereas they are much lower for fluorescent thermography and practically static for liquid crystal.

The selection of the technique used for measurement or diagnosis of a defective IC must be made by evaluating the characteristics outlined above and the requirements of each application.
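The selection step described above can be sketched as a simple filter over the figures quoted in this section. This is only an illustrative sketch: the dictionary layout and the function name `shortlist` are assumptions made for the example, and any figure the text does not state is stored as `None` and never guessed.

```python
# Figures quoted in the conclusions above.
# None marks a value that the text does not state explicitly.
TECHNIQUES = {
    "differential built-in sensors": {"sensitivity_K": 0.2e-3, "bandwidth_Hz": 1e6},
    "scanning thermal microscope": {"sensitivity_K": 0.05, "bandwidth_Hz": 10e3},
    "laser reflectometry": {"sensitivity_K": None, "bandwidth_Hz": 125e6},
    "laser interferometry": {"sensitivity_K": None, "bandwidth_Hz": 125e6},
    "liquid crystal thermography": {"sensitivity_K": 0.1, "bandwidth_Hz": None},
    "fluorescent microthermography": {"sensitivity_K": 0.01, "bandwidth_Hz": None},
    "infrared emission thermography": {"sensitivity_K": 0.1, "bandwidth_Hz": None},
}


def shortlist(max_sensitivity_K, min_bandwidth_Hz):
    """Return the techniques whose stated figures meet both requirements.

    Techniques with an unstated figure are left out rather than guessed
    (a hypothetical policy chosen for this sketch).
    """
    chosen = []
    for name, spec in TECHNIQUES.items():
        sensitivity = spec["sensitivity_K"]
        bandwidth = spec["bandwidth_Hz"]
        if sensitivity is None or bandwidth is None:
            continue
        if sensitivity <= max_sensitivity_K and bandwidth >= min_bandwidth_Hz:
            chosen.append(name)
    return chosen


# Example requirement: resolve 1 mK changes at 100 kHz or faster.
result = shortlist(max_sensitivity_K=1e-3, min_bandwidth_Hz=100e3)
print(result)
```

A real selection would also weigh cost, applicability and the need to remove the package, which the sketch leaves out.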

6. REFERENCES

[167] Chandrasekhar, S., "Liquid crystals", Cambridge University Press, 1992.
[168] Reinitzer, F., Monatsh. Chem., 9, 421 (1888).
[169] Lehmann, O., Z. Physical Chem., 4, 462 (1889).
[170] Friedel, G., Ann. Physique, 18, 273 (1922).
[171] Szekely, V. and Rencz, M., "Image processing procedures for the thermal measurements", IEEE Transactions on Components and Packaging Technology, vol. 22, no. 2, June 1999, pp. 259-265.
[172] Hiatt, J., "A method of detecting hot spots on semiconductors using liquid crystals", Proc. Int. Rel. Phys. Sym., Apr. 1981, pp. 130-133.
[173] Bahadur, B., "Liquid crystal displays", Gordon and Breach Science Publishers, Molecular Crystals and Liquid Crystals, vol. 109, no. 1, 1984.
[174] Szekely, V., Rencz, M. and Courtois, B., "Tracing the thermal behavior of ICs", IEEE Design and Test of Computers, April-June 1998, pp. 14-21.
[175] Kolodner, P. and Tyson, J.A., "Remote thermal imaging with 0.7-µm spatial resolution using temperature-dependent fluorescent thin film", Appl. Phys. Lett., 42(1), 1 January 1983, pp. 117-119.
[176] Kolodner, P. and Tyson, J.A., "Microscopic fluorescent imaging of surface temperature profiles with 0.01 °C resolution", Appl. Phys. Lett., 40(9), 1 May 1982, pp. 782-784.
[177] Results of CP94922 Copernicus Project: Therminic. European Union Projects.
[178] Soden, J.M. and Anderson, R.E., "IC failure analysis: techniques and tools for quality and reliability improvement", Proceedings of the IEEE, vol. 81, no. 5, May 1993, pp. 703-716.
[179] Lee, D., "Thermal analysis of integrated circuit chips using thermographic imaging techniques", IEEE Transactions on Instrumentation and Measurements, vol. 43, no. 6, December 1994, pp. 824-829.
[180] Moore, P.J. and Harscoet, F., "Low cost thermal imaging for power systems applications using a conventional CCD camera", Energy Management and Power Delivery, 1998. Proceedings of EMPD '98, 1998, vol. 2, pp. 589-594.
[181] Majumdar, A., Annual Review of Materials Science, 1999, vol. 29, pp. 505-585.
[182] Claeys, W., Dilhaire, S., Quintard, V., Dom, J.P. and Danto, Y., "Thermoreflectance Optical Test Probe for the Measurement of Current-Induced Temperature Changes in Microelectronic Components", Quality and Reliability Engineering International, vol. 9, 1993, pp. 303-308.
[183] Quintard, V., Deboy, G., Dilhaire, S., Lewis, D., Phan, T. and Claeys, W., "Laser beam thermography of circuits in the particular case of passivated semiconductors", Microelectronic Engineering 31 (Elsevier), pp. 291-298, 1996.
[184] Claeys, W., Dilhaire, S., Jorez, S. and Patiño Lopez, L.D., "Laser probes for the thermal and thermomechanical characterization of microelectronic devices", Microelectronics Journal, vol. 32, no. 10-11, Elsevier, pp. 891-898, 2001.
[185] Ju, Y.S., Kading, O.W., Leung, Y.K., Wong, S.S. and Goodson, K.E., "Short-Timescale Thermal Mapping of Semiconductor Devices", IEEE Electron Device Letters, vol. 18, no. 5, May 1997, pp. 169-171.
[186] Weakliem, H.A. and Redfield, D., "Temperature dependence of the optical properties of silicon", J. Appl. Phys., vol. 50, no. 3, p. 1491, 1979.
[187] Dilhaire, S., Jorez, S., Patiño, L.D., Claeys, W. and Schaub, E., "Calibration Procedure of Temperature Measurements by Thermoreflectance upon Microelectronic Devices", 11th International Conference on Photoacoustic and Photothermal Phenomena, Kyoto, 2000.
[188] Altet, J., Dilhaire, S., Volz, S., Rampnoux, J.M., Rubio, A., Grauby, S., Patiño, L.D., Claeys, W. and Saulnier, J.B., "Four Different Approaches for the Measurement of the IC Surface Temperature: Application to Thermal Testing", Proc. 7th Therminic Workshop, 2001, pp. 233-238.
[189] Altet, J., Dilhaire, S., Grauby, S. and Volz, S., "Advanced Techniques for IC Surface Temperature Measurement", Electronics Cooling, vol. 8, no. 1, Feb. 2001, pp. 22-30.
[190] Altet, J., Rubio, A., Schaub, E., Dilhaire, S. and Claeys, W., "Thermal Couplings in Integrated Circuits: Application to Thermal Testing", IEEE Journal of Solid-State Circuits, vol. 36, no. 1, Jan. 2001, pp. 81-91.
[191] Kurabayashi, K. and Goodson, K.E., "Precision Measurement and Mapping of Die-Attach Thermal Resistance", IEEE Transactions on Components, Packaging and Manufacturing Technology - Part A, vol. 21, no. 3, Sep. 1998, pp. 506-514.
[192] Ju, Y.S. and Goodson, K.E., "Thermal Mapping of Interconnects Subjected to Brief Electrical Stresses", IEEE Electron Device Letters, vol. 18, no. 11, November 1997.
[193] Dilhaire, S., "Développement d'un interféromètre laser très haute résolution pour la caractérisation de composants microélectroniques", Ph.D. Thesis no. 1103, 1994, Université Bordeaux I.
[194] Dilhaire, S., Altet, J., Jorez, S., Schaub, E., Rubio, A. and Claeys, W., "Fault localisation in ICs by goniometric laser probing of thermal induced surface waves", Microelectronics Reliability 39 (1999), pp. 919-923.
[195] Jorez, S., "Développement d'instrumentation et de méthodologies pour la caractérisation thermique et thermomécanique de composants électroniques", Ph.D. Thesis no. 2425, 2001, Université Bordeaux I.
[196] Dilhaire, S., Jorez, S., Grauby, S., Patiño, L.D., Rampnoux, J.M. and Claeys, W., "Thermal stress analysis of Thermoelectric Devices Studied by Speckle Interferometry", Proc. Therminic 2001, pp. 24-27.
[197] Lai, J., Chandrachood, M., Majumdar, A. and Carrejo, J.P., "Thermal Detection of Device Failure by Atomic Force Microscopy", IEEE Electron Device Letters, vol. 16, no. 7, July 1995, pp. 312-315.
[198] Lo, J.C., Armitage, W.D. and Johnson III, C.S., "Using Atomic Force Microscopy for Deep-Submicron Failure Analysis", IEEE Design & Test of Computers, Jan.-Feb. 2001, pp. 10-18.
[199] Ohte, A. and Yamagata, M., "A Precision Silicon Transistor Thermometer", IEEE Transactions on Instrumentation and Measurement, vol. IM-26, no. 4, Dec. 1977, pp. 335-341.
[200] -, "Pentium® III Active Thermal Management Techniques", Order # 273405-001, August 2000. http://www.intel.com
[201] -, "Intel® Pentium® 4 Processor in the 423-pin Package. Thermal Design Guidelines", Order # 249203-001, November 2000.
[202] Meijer, G.C.M., "Thermal Sensors Based on Transistors", Sensors and Actuators, vol. A10, 1986, pp. 103-125.
[203] Tsividis, Y.P., "Accurate Analysis of Temperature Effects in IC-VBE Characteristics with Application to Bandgap Reference Sources", IEEE Journal of Solid State Circuits, vol. SC-15, no. 6, Dec. 1980, pp. 1076-1084.
[204] Rasmussen, W. and Ristic, L. (Editor), "Sensor Technology and Devices. Chap. 8: Thermal Sensors", Artech House, 1993.
[205] Boyle, S.R. and Heald, R.A., "A CMOS Circuit for Real-Time Chip Temperature Measurement", Spring COMPCON 94, pp. 286-291, 1994.
[206] Rasmussen, W., Zhu, J., Richard, S. and Cheeke, D., "CMOS Intelligent Temperature Sensor", 33rd Midwest Symp. Circuits and Systems, 1990, pp. 849-852.
[207] Kolling, A., Bak, F., Bergveld, P. and Evert, S., "Design of a CMOS Temperature Sensor with Current Output", Sensors and Actuators, A21-A23, 1990, pp. 645-649.
[208] Krummenacher, P. and Oguey, H., "Smart Temperature Sensor in CMOS Technology", Sensors and Actuators, A21-A23, 1990, pp. 636-638.
[209] Micheda, J. and Kim, S.K., "A Precision CMOS Bandgap Reference", IEEE Journal of Solid State Circuits, vol. SC-19, no. 6, Dec. 1984, pp. 1014-1021.
[210] Bianchi, R.A., Karam, J.M., Courtois, B., Nadal, R., Pressecq, F. and Sifflet, S., "CMOS-compatible temperature sensor with digital output for wide temperature range applications", Microelectronics Journal 31, 2000, pp. 803-810.
[211] Meijer, G.C.M., Wang, G. and Fruett, F., "Temperature Sensors and Voltage References Implemented in CMOS Technology", IEEE Sensors Journal, vol. 1, no. 3, Oct. 2001, pp. 225-234.
[212] Szekely, V., "Thermal monitoring of microelectronic structures", Microelectronics Journal, 25, 1994, pp. 157-170.
[213] Szekely, V., Marta, Cs., Rencz, M., Benedek, Zs. and Courtois, B., "Design for thermal testability (DfTT) and a CMOS realization", Sensors and Actuators A55, 1996, pp. 29-33.
[214] Szekely, V., Marta, Cs., Kohari, Zs. and Rencz, M., "CMOS Sensors for On-Line Thermal Monitoring of VLSI Circuits", IEEE Transactions on VLSI Systems, vol. 5, no. 3, Sep. 1997, pp. 270-276.
[215] Szekely, V., Marta, C., Rencz, M., Vegh, G., Benedek, Z. and Torok, S., "A Thermal Benchmark Chip: Design and Applications", IEEE Trans. on Comp. Pack. and Manuf. Tech. - Part A, vol. 21, no. 3, Sept. 1998, pp. 399-405.
[216] López-Buedo, S., Garrido, J. and Boemo, E., "Thermal Testing on Reconfigurable Computers", IEEE Design and Test of Computers, Jan.-Mar. 2000, pp. 84-91.
[217] López-Buedo, S., Garrido, S. and Boemo, E., "Measurement of FPGA Die Temperature Using Run-Time Configuration", 7th Therminic Workshop, 2001, pp. 168-173.
[218] Solomon, J.E., "The Monolithic Op Amp: A Tutorial Study", IEEE Journal of Solid-State Circuits, vol. SC-9, no. 6, pp. 314-332, December 1974.
[219] HSPICE User's Manual.
[220] Gray, P.R. and Douglas, J.H., "Analysis of Electrothermal Integrated Circuits", IEEE Journal of Solid State Circuits, vol. SC-6, no. 1, Feb. 1971, pp. 8-14.

Chapter 6

Feasibility analysis and conclusions

1. INTRODUCTION

In the foregoing chapters, the fundamentals of heat propagation laws and analysis tools have been introduced, together with the nature of the heat sources in IC's and their relation to the presence of failures. The use of temperature as a test observable, its advantages and disadvantages as a complementary test procedure and the various methods of temperature measurement have also been presented. Thermal testing is being used by industry in many fields of engineering and has a growing range of applications in the VLSI IC manufacturing environment. One of the key roles of IC packages is to act as a mechanism for the removal of heat from the inside of the circuit towards heat sinking elements. Consequently, thermal testing plays a major role in the analysis and testing of packages at various levels within electronic systems. The companies dedicated to thermal testing, together with their testing methods and tools, have already reached a certain degree of maturity. Examples of innovative research projects in the field are the THERMINIC Copernicus project and the DELPHI project, both of which have been carried out under the tutelage of the European Union. Also noteworthy is the role of spin-off companies stemming from the universities that are active in this area, such as the Technical University of Budapest, which is involved in consulting and technical support in the field of thermal testing for the electronics industry.


Thermal analysis is also already well established in the failure analysis laboratories of most leading semiconductor manufacturers, by way of the various techniques of thermal map generation, from simple liquid crystal thermography to the more sophisticated atomic force microscope-based scanning. In Chapter 4, and at the end of Chapter 5 we introduced built-in differential temperature sensors as another thermal testing technique with an appropriate level of sensitivity. As their name implies, they are built-in, and do not affect the electrical or temporal characteristics of the CUT. In terms of their drawbacks, we have discussed silicon overhead and the slowness of the test application. This technique offers attractive diagnostic capabilities. The incorporation of thermal sensors inside the silicon die is being implemented by chip manufacturers for user-defined thermal management purposes (e.g., the Intel Corporation's new families of processors). The use of built-in thermal sensors for testing purposes is currently being researched. Most of the results in the area of differential built-in temperature sensors that have been presented in the book have come from the research results of the UPC group of which the authors form part. This chapter is dedicated to a feasibility analysis and discussion of the differential built-in temperature testing technique. Efforts will be aimed at the identification of research topics that must be thoroughly mastered before the technique can be applied. This chapter identifies several important factors that affect cost (mainly area overhead) and the sensing capacity (number of sensors and sensitivity of the sensor circuits). Various trade-offs between them and the time taken by the test applications are discussed. These can be used by VLSI designers as standards for implementation of the technique in the design of thermal testing procedures. 
The ways in which methods of test pattern generation can increase the efficiency of thermal testing are also evaluated in this chapter, and preliminary ATPG proposals for thermal testing are shown. Finally, the chapter ends with a section devoted to the general conclusions of the book, the present situation and future expectations for thermal testing, its advantages and disadvantages and the possibilities of future research in this field.

2. FEASIBILITY ASPECTS OF THE THERMAL TESTING OF CIRCUITS

2.1 Cost estimation

The aim of this section is to analyse two parameters that affect the cost and feasibility of this testing technique: the number of sensors to include in an IC in order to ensure proper fault coverage and the appropriate test frequency. Before beginning this analysis, four other parameters are defined:

Area monitored by a transducer: Due to the attenuation that a thermal disturbance undergoes as its distance from the heat source increases, a temperature transducer can detect only those heat sources activated within the area of a circle of radius R whose centre is the transducer location. This is represented in Figure 6.1. If a differential sensing strategy is used, the area monitored by a single sensor is equal to the sum of the two areas monitored by the two temperature transducers.

[Diagram labels: silicon die; transducer (Q1 or Q2); heat source monitored by the transducer; heat source not monitored by the temperature transducer.]

Figure 6.1: Graphical representation of the area monitored by a transducer.

Temperature threshold level at the monitoring point: If the measured temperature exceeds a reference level called the temperature threshold level, TTH, while a thermal test is running, this can be used as a criterion to assume that the circuit is faulty.

Testing period T: If a defect is activated by an input vector, the heat sources dissipate power as long as this vector is applied to the input of the circuit. Therefore, the duration of the dissipated power pulse is equal to the testing period, T.

Minimum power dissipated by a heat source: When the heat sources that appear in defective circuits were characterised in Chapter 4, it was shown that the power dissipation magnitude depends on the value of the resistor that models the defect. Therefore, if the aim of a thermal test is to detect a fault that can be modelled with a resistor within the range [RMAX, RMIN], the temperature transducer must be able to detect the temperature increase that is generated when the heat sources dissipate a power magnitude within the range [PMAX, PMIN]. This PMIN value is defined as the minimum power dissipated by a heat source (defect).

The aim of this section is to examine certain trade-offs between two of the parameters defined above, the area monitored by a transducer and the testing period, since they are directly related to the number of thermal sensors to be included in the CUT and the testing frequency. The values of the four parameters are correlated in such a way that when the values of any three of them are established, the value of the fourth can be determined. The relationship between these parameters has already been analysed in Figures 4.24 and 4.25, which show the peak value of the thermal disturbance as a function of the distance between the monitoring point and the heat source, the power pulse duration and the power dissipation magnitude. The curves are the same, but the peak value of the temperature increase is replaced by TTH, the distance by the radius of the monitored area, R, the power pulse duration by the testing period, T, and the power dissipation magnitude by PMIN. Figure 6.2 shows the value of TTH as a function of both the testing period and the radius of the monitored area when PMIN is equal to 8 mW. For other values of PMIN, the vertical axis of Figure 6.2 scales proportionally.

[Plot: TTH versus radius [µm], from 0 to 250 µm, with one curve per testing period (including T = 1 µs and T = 10 µs).]

Figure 6.2: TTH [°C] as a function of both the radius of the area monitored by a transducer and the testing period. PMIN = 8 mW.

For low-cost thermal testing, a high testing frequency and a low area overhead (equivalent to a low number of thermal sensors in the IC) are required. The results shown in Figure 6.2 indicate that this is feasible as long as the threshold thermal level is kept low. Two factors fix the lower bound of TTH: sensor sensitivity and the presence of thermal disturbances generated by the power dissipation during the normal operation of the CUT. For example, if the temperature is monitored with the BiCMOS differential temperature sensors presented in the foregoing chapter, thermal gradients of 2 m°C will change the output voltage of the sensor by 5 mV, using the differential temperature sensitivity obtained with the standard parameter models (1.4 V/°C and 2.7 V/°C respectively). Currently, high-sensitivity thermal sensors constitute an unexplored research topic.

Once TTH has been fixed, a relationship is established between the radius of the area monitored by a transducer and the testing period. For example, Figure 6.3 shows this relationship when TTH has been fixed at 25 mK. Assuming a sensor with a thermal gain equal to 2 V/°C, this temperature threshold level corresponds to a threshold voltage of 50 mV. Figure 6.3 shows one of the trade-offs inherent to this testing technique: when TTH and PMIN are fixed, a reduction in the number of built-in thermal sensors slows down the test process. When the testing period is extended, the temperature increase generated by a defect is higher. Thus, the area monitored by a transducer is increased.
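The conversion from a temperature threshold to a sensor output threshold used in the example above is a single multiplication. The following minimal sketch reproduces that arithmetic; the function names are hypothetical, and the sensor is assumed to be linear, which is an idealisation made for the example.

```python
def threshold_voltage(t_th_kelvin, gain_v_per_kelvin):
    """Sensor output voltage that corresponds to the temperature threshold.

    Assumes a linear sensor: V = gain * delta_T (an idealisation).
    """
    return t_th_kelvin * gain_v_per_kelvin


def flags_fault(v_out, v_threshold):
    """Pass/fail criterion: the circuit is assumed faulty above the threshold."""
    return v_out > v_threshold


# Values taken from the text: TTH = 25 mK and a thermal gain of 2 V/degC.
v_th = threshold_voltage(25e-3, 2.0)
print(f"threshold voltage = {v_th * 1e3:.0f} mV")
```

A temperature difference in kelvin and one in degrees Celsius are numerically equal, so the gain quoted in V/°C can be used directly with a threshold quoted in mK.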

[Plot: radius of the monitored area, up to 200 µm, versus testing period [µs] on a logarithmic axis.]

Figure 6.3: Radius of the area monitored by a transducer as a function of both PMIN and the testing period. ΔTTH = 25 mK.
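The link between the monitored radius and the number of sensors can be illustrated with a rough count. The sketch below is an optimistic lower bound under stated assumptions: the monitored circles are taken to cover the die without overlap or gaps (which circles cannot actually do), a differential sensor is credited with two circles as defined earlier, and the 5 mm x 5 mm die size is a hypothetical value that does not come from the text.

```python
import math


def sensors_needed(die_area_um2, radius_um, differential=True):
    """Optimistic lower bound on the number of sensors for full coverage.

    Each transducer monitors a circle of radius R. A differential sensor
    has two transducers, so it is credited with two circles.
    """
    circles_per_sensor = 2 if differential else 1
    area_per_sensor = circles_per_sensor * math.pi * radius_um ** 2
    return math.ceil(die_area_um2 / area_per_sensor)


die_area = 5000.0 * 5000.0  # hypothetical 5 mm x 5 mm die, in square microns

# A longer testing period gives a larger radius and therefore fewer sensors.
fewer = sensors_needed(die_area, radius_um=200.0)
more = sensors_needed(die_area, radius_um=100.0)
print(fewer, more)
```

Doubling the radius divides the count by roughly four, which is the area-overhead side of the trade-off against test time.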

Similar remarks can be made on static thermal testing. In this case, however, the analysis is much simpler, as there is no need to consider the testing period parameter.

2.2 Discriminability analysis

The thermal disturbance generated by a defect was characterised in Chapter 4. Detection of this temperature increase in the course of a testing procedure classifies the circuit under test (CUT) as faulty. However, other heat sources may appear in the course of the circuit's normal operation. In order for thermal testing to be feasible, the temperature increase generated by a defect must be the main thermal disturbance to be measured at the temperature monitoring point. In other words, the thermal disturbance generated by a fault-free circuit must be ascertained in order to set a threshold temperature increase. The purpose of this section is to characterise the thermal disturbances that are generated by the heat sources which may arise in the course of CMOS digital circuits' normal operation, and to introduce various testing strategies which increase the feasibility of thermal testing.

2.2.1 Heat sources in fault-free circuits: generation of thermal disturbances

The power dissipated by a CMOS fault-free circuit can be expressed as the sum of three terms:

$$P = P_C + P_{SC} + P_Q \tag{6.1}$$

where Pc is the power dissipated by the devices that drive the parasitic capacitors during a logic state transition, Psc is the short-circuit power and PQ is the quiescent power, which is generated by the reverse bias diode current and the subthreshold conduction.

Technological scaling trends are increasing the magnitude of the quiescent power [181]. Two main factors contribute to this increase: first, the increase in integration density (the more devices present in an IC, the higher the overall quiescent power), and second, the drop in the threshold voltage. The latter has become necessary to maintain the current drive capabilities of the MOS transistors in view of the decrease in the supply voltage. However, it increases subthreshold conduction. Although quiescent power is state-dependent [222], a power distribution that is constant over the surface can be assumed for thermal purposes. This assumption implies the addition of a thermal offset to the thermal map of the silicon IC. If a differential sensing strategy is used, this thermal disturbance will be rejected by the CMRR of the sensor. If the power distribution function generates a thermal gradient, it can be compensated by using thermal offset correction circuitry. Therefore, as this power consumption does not affect thermal testing, it will not be discussed further.

The power dissipated during a logic transition temporarily alters the temperature map of the IC. During a rising (or falling) logic transition at the output node of a CMOS static logic gate, the PMOS (NMOS) transistor net of this gate conducts current to charge (or discharge) the parasitic capacitance (CL) associated with the output node. If the voltage at this node swings from 0 to VDD (or VDD to 0), the energy dissipated by the PMOS (NMOS) transistor net during the logic transition is equal to:

$$E = \frac{1}{2} C_L V_{DD}^2 \tag{6.2}$$

This energy is independent of finite rise and fall times at the input of the switching gate. Finite fall and rise times generate short-circuit currents inside the logic gate, as both the PMOS transistors and NMOS transistors are in a conducting state while:

$$V_{tn} < V_{IN} < V_{DD} - |V_{tp}| \tag{6.3}$$

where Vtn and Vtp are the threshold voltages of the NMOS and PMOS transistors respectively, and VIN is the input voltage. Therefore, static CMOS digital circuits dissipate power during logic transitions, Pc+Psc, and during the quiescent time between transitions, PQ.
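The conduction condition for short-circuit current can be checked numerically. The sketch below tests whether both transistor nets conduct for a given input voltage and estimates how long an ideal linear input ramp spends inside that window. The supply voltage and the threshold values are hypothetical and chosen only for illustration, and the linear ramp is a simplifying assumption.

```python
def both_conducting(v_in, vdd, vtn, vtp_abs):
    """True while NMOS and PMOS nets conduct simultaneously.

    Condition: Vtn < Vin < Vdd - |Vtp|.
    """
    return vtn < v_in < vdd - vtp_abs


def overlap_time(t_transition, vdd, vtn, vtp_abs):
    """Time a linear 0-to-Vdd input ramp spends in the conduction window."""
    window = max(0.0, vdd - vtp_abs - vtn)
    return t_transition * window / vdd


# Hypothetical values: 3.3 V supply, 0.8 V thresholds, 0.1 ns input ramp.
VDD, VTN, VTP_ABS = 3.3, 0.8, 0.8
t_overlap = overlap_time(0.1e-9, VDD, VTN, VTP_ABS)
print(both_conducting(1.65, VDD, VTN, VTP_ABS), t_overlap)
```

If the supply voltage falls below the sum of the two thresholds, the window closes and the sketch returns zero overlap.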

[Circuit schematic with input node Vin and output node Vout.]

Figure 6.4: Circuit used to analyse the heat sources that arise during a logic transition.

For example, Figure 6.4 features a circuit that will be used to show the numerical values of the power dissipated during a logic transition and the thermal disturbances that are generated. In this case, the rise (or fall) time of the logic transition at node Vin is equal to 0.1 ns. The four inverters located between this node and node A generate a logic transition at the latter node that is independent of the rise or fall times imposed at node Vin. Figure 6.5 shows the power dissipated by the devices in Figure 6.4 during a falling transition at the input terminal. Eight different cases have been analysed, with varying values assigned to the capacitors CL1 and CL2. This plot has been obtained through electrical simulations.

Case #   1        2        3        4        5        6        7        8
CL1      100 fF   100 fF   100 fF   100 fF   1 pF     1 pF     1 pF     1 pF
CL2      1 pF     500 fF   200 fF   100 fF   1 pF     500 fF   200 fF   100 fF

Figure 6.5: Power dissipated by the devices in Figure 6.4 during a falling transition at the input terminal. Eight different cases have been analysed.

Similar behaviour is observed during a rising logic transition (exchanging the power functions dissipated by the PMOS and NMOS transistors). When the output signal undergoes a rising (or falling) transition, the device labelled PMOS (or NMOS) conducts current to charge (or discharge) capacitor CL2. Over a given time interval, both the NMOS and the PMOS devices conduct current simultaneously and dissipate the short-circuit power. The instantaneous dissipated power is of the same order of magnitude as in the faulty cases analysed in Chapter 4. However, because the switching time of a logic transition is short, if the testing period lasts for a matter of microseconds, the total energy dissipated (the time integral of the instantaneous power) is three orders of magnitude lower than the energy dissipated due to a defect.

Figure 6.6 shows the energy dissipated by the devices labelled PMOS and NMOS when there is a rising logic transition at the input. This graph is plotted as a function of capacitor CL2 for three different values of CL1. The energy that is stored in the output capacitance has also been plotted. As shown, when capacitor CL1 has high values, the energy dissipated by the NMOS transistor is the energy that is stored in the output capacitor, and the energy dissipated due to short-circuit currents can be ignored. When capacitor CL2 has a low value, the energy dissipated by the NMOS transistor becomes independent of the value of the output capacitor. This means that the dissipated energy is mainly attributable to short-circuit current conduction.
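The orders-of-magnitude comparison made above can be checked with a short calculation. The sketch below compares the capacitive switching energy of one transition with the energy delivered by a defect over one testing period. The 1 pF load and the 8 mW defect power are values used elsewhere in this chapter, while the 3.3 V supply and the 1 µs testing period are assumptions chosen for illustration. Short-circuit energy is ignored.

```python
def switching_energy(c_load, vdd):
    """Energy dissipated in one logic transition, E = 0.5 * C * Vdd^2."""
    return 0.5 * c_load * vdd ** 2


def defect_energy(power, testing_period):
    """Energy dissipated by a defect that is active for one testing period."""
    return power * testing_period


e_switch = switching_energy(c_load=1e-12, vdd=3.3)            # 3.3 V is hypothetical
e_defect = defect_energy(power=8e-3, testing_period=1e-6)     # 1 us is hypothetical
ratio = e_defect / e_switch
print(f"switching: {e_switch:.2e} J, defect: {e_defect:.2e} J, ratio: {ratio:.0f}")
```

With these values the defect energy exceeds the switching energy by roughly three orders of magnitude, which is consistent with the statement in the text. A longer testing period widens the gap further.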

[Plot: energy versus CL2 [F] on logarithmic axes. Curves: energy stored in CL2, and energy dissipated by the PMOS and NMOS devices for CL1 = 10 fF, 100 fF and 1 pF.]

Figure 6.6: Energy dissipated by the PMOS and NMOS devices as a function of capacitors CL1 and CL2 when there is a rising logic transition at the input.

The thermal disturbance generated by these heat sources can be obtained by using the RC model of heat conduction through the IC that was presented in Chapter 3 and used in Chapters 4 and 5. Figure 6.7 shows the peak value of the thermal disturbance as a function of the distance between the heat source and the monitoring point. In this case, the heat source is the PMOS transistor, which dissipates four of the eight power functions represented in Figure 6.5. As shown, due to the low-pass filter nature of thermal coupling, the thermal disturbance generated by the switching activity decreases rapidly as the distance from the heat source increases.

[Plot: peak temperature increase versus distance [µm], from 0 to 20 µm, with a logarithmic vertical axis. Curves for CL1 = 100 fF and CL2 = 1 pF, 0.5 pF, 0.2 pF and 0.1 pF.]

Figure 6.7: Peak value of the thermal disturbance generated by the switching activity of a logic gate.

[Plot: delay versus distance [µm], from 0 to 25 µm, with a logarithmic vertical axis. Curves for case 1 and case 2.]

Figure 6.8: Delay of the maximum temperature increase with respect to the beginning of the logic transition.

Finally, the delay of the maximum temperature increase, relative to the beginning of the logic transition, is shown in Figure 6.8. The curve has been plotted as a function of the distance between the heat source and the monitoring point for cases 1 and 2, whose values are listed in the table of Figure 6.5.

2.2.2 Discriminability

In the section on cost estimation of thermal testing, it was mentioned that the threshold temperature increase should be as low as possible to reduce area overhead. Two terms set the lower bound of this parameter:

1. The sensitivity of the sensor
2. The thermal disturbances generated by switching activity

As regards the second point, switching logic gates located near the monitoring point may affect the discriminability of this testing technique. Assuming linearity, the temperature waveform generated by N switching gates will be the sum of the N individual temperature waveforms.

[Plot for Figure 6.9: horizontal axis Radius [µm], 0 to 250; curves labelled T = 10 µs and T = 100 µs; reference levels for the ΔT generated by a switching gate with Cl = 0.5 pF at a distance of 6 µm, and for the ΔT generated by three switching gates with Cl = 5 pF at a distance of 6 µm.]

Figure 6.9: Lower bound of ΔT_TH in the presence of logic gates switching at 6 µm from the temperature monitoring point.

Figure 6.9 repeats the data featured in Figure 6.2 (threshold temperature increase as a function of the radius of the monitored area for three different testing periods) and adds the temperature increase generated at the monitoring point by logic gates switching 6 µm away from it. If the switching gates are instead positioned 16 µm away from the monitoring point, the increase in the lower bound of the temperature threshold is reduced by one order of magnitude.

2.2.3 Strategies to improve the feasibility of thermal testing

The discriminability of a thermal testing technique can be increased by following certain design and test strategies. The set of rules known as Design for Thermal Testability is made up of design strategies aimed at improving the feasibility and reducing the cost of thermal testing. At the time of writing, this topic is still being researched. Some of the rules are as follows:

• Low-power design strategies: These increase the radius of the monitored area.
• Compact layouts: These increase the number of logic gates inside the monitored area.
• Optimum location of the temperature monitoring points: The points are to be positioned far from the gates with high switching activity ratios, and/or from high fan-out gates, so as to maximise the number of logic gates within their monitored area.

Some of the test strategies are:

• Generation of test vectors for thermal testing: This is discussed in the following sub-section.
• Selection of the measurement time: The time it takes for the temperature waveforms to reach the temperature monitoring point depends on its distance from the heat source and on the time interval in which the heat source is active. The temperature disturbance generated by switching activity arrives immediately after a logic transition, whereas the temperature disturbance generated by a defect persists for as long as the input vector is applied.

2.2.4 Generation of test vectors

If the set of test vectors is properly chosen, the feasibility of thermal testing can be improved. This vector set should fulfil the following requirements:

a) Maximise the total power dissipated by the heat sources that arise when a defect is activated. As demonstrated in Chapter 4, the total power dissipated by certain defective topologies depends on the input vector.
b) Minimise the switching activity of the CUT, especially in the gates that are near the monitoring point.
c) If a differential sensing strategy is used, make the thermal disturbances triggered by the switching activity appear in the common mode of the sensor.
d) Minimise the number of test vectors.

An ATPG for the thermal testing of combinational circuits was developed in [225]. It is based on the PODEM algorithm. Further information on this classical algorithm can be found in [226]. In thermal testing, it is not necessary to propagate the generated error to the primary logic outputs; the test vector set only needs to activate the defect.

The ATPG procedure defines a transition controllability cost, a function that quantifies the number of internal nodes that must change in order to set a specific logic state at one node. In [224] a transition controllability cost was defined to minimise heat dissipation during a test procedure. A weighted implementation of this function was carried out in [225], taking into account the parasitic capacitance associated with each node.
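As a rough illustration of the idea, and not the cost function actually used in [224] or [225], the sketch below counts the internal nodes whose logic value must change to reach a required state, and optionally weights each change by a per-node capacitance. The node names, the dictionary representation and the capacitance values are all hypothetical.

```python
from typing import Dict, Optional


def transition_cost(current: Dict[str, int],
                    required: Dict[str, int],
                    capacitance: Optional[Dict[str, float]] = None) -> float:
    """Cost of moving the circuit from `current` to `required` node states.

    Unweighted: the number of nodes whose logic value changes.
    Weighted: each changing node contributes its (assumed) parasitic
    capacitance instead of 1.
    """
    cost = 0.0
    for node, value in required.items():
        if current.get(node) != value:
            cost += 1.0 if capacitance is None else capacitance[node]
    return cost


# Hypothetical internal nodes of a small circuit.
current_state = {"n1": 0, "n2": 1, "n3": 0}
required_state = {"n1": 1, "n2": 1, "n3": 1}
node_capacitance = {"n1": 2.0, "n2": 5.0, "n3": 0.5}  # arbitrary units

plain = transition_cost(current_state, required_state)
weighted = transition_cost(current_state, required_state, node_capacitance)
```

In this toy case two nodes change, so the unweighted cost is 2, while the weighted cost is dominated by the node with the larger capacitance. An ATPG guided by such a cost would prefer the assignment with the lower value.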

[Plot for Figure 6.10: legend entries Another and Previous; axis values not recoverable.]

Figure 6.10: Results obtained with the ATPG program when run on the ISCAS'85 benchmark circuits. [225]

Figure 6.10 shows the results generated by this program when it was applied to all the ISCAS'85 benchmark circuits. Metal-to-metal bridge faults were evaluated in this case. The figure shows the numbers of transitions and vectors needed to achieve the maximum coverage (in all cases, the fault coverage was between 99 and 100%). The data are plotted as a function of various program options. It may not be necessary to set all the primary inputs to a specific logic value in order to activate a given fault. For example, the vector XX101X, in which X may be either a logic 0 or a logic 1, will activate this fault. The program offers three options for assigning these X values:

• Random option: The X values are assigned randomly.
• Another option: The X values are assigned so as to activate a different target fault.
• Previous option: The X values are given the same values as those of the prior test vector that was applied to the circuit.

Since it reduces the distance between two consecutive input vectors, the Previous option is the one that reduces the internal switching activity to the greatest degree.
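The Random and Previous options can be sketched in a few lines of Python. This is an illustrative sketch and not the program of [225]: the function names are hypothetical, vectors are represented as strings over 0, 1 and X, and the Another option is omitted because it needs the fault-activation machinery of the ATPG itself. The Hamming distance to the previous vector is used here as a simple stand-in for the input switching activity.

```python
import random


def fill_random(vector: str, rng: random.Random) -> str:
    """Random option: replace every X with a randomly chosen 0 or 1."""
    return "".join(rng.choice("01") if bit == "X" else bit for bit in vector)


def fill_previous(vector: str, previous: str) -> str:
    """Previous option: replace every X with the bit applied previously."""
    if len(vector) != len(previous):
        raise ValueError("vectors must have the same length")
    return "".join(p if bit == "X" else bit
                   for bit, p in zip(vector, previous))


def hamming_distance(a: str, b: str) -> int:
    """Number of primary inputs that toggle between two fully specified vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


previous_vector = "110010"   # hypothetical vector applied just before
partial_vector = "XX101X"    # three inputs specified, three left free

by_previous = fill_previous(partial_vector, previous_vector)
by_random = fill_random(partial_vector, random.Random(0))
```

Because the Previous option toggles only the inputs that the fault activation forces to change, its distance to the prior vector can never exceed that of any other assignment of the X values.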

3. GENERAL CONCLUSIONS

The main aim of this book has been to present the performance and feasibility of a parametric observable for the testing of integrated circuits: temperature. Thermal testing has been defined as the evaluation of the structural integrity of a circuit under test through temperature measurements at various points. From a thermal point of view, a thermal testing set-up is made up of devices that act as heat sources, thermal coupling mechanisms and temperature observation systems. Depending on the type of measurement (location, time of measurement, signal processing), either the status of the devices that act as heat sources (i.e. the devices of the circuit under test) or the integrity of the heat transfer media (i.e. the entire package structure) can be characterised.

The main advantage of this testing procedure is that there are no electrical interactions between the circuit under test and the thermal monitoring system, as the connection between the two is made through the natural thermal coupling that exists in integrated circuits. This makes this test strategy attractive for high-performance circuits. Additionally, the use of built-in temperature sensors makes it possible to test the circuit either while it is working in a field application or in a laboratory. Finally, special mention must be made of the diagnostic capabilities offered by this technique, which allow defects in circuits to be located and the structure of a package system to be extracted.

We have attempted to take a multidisciplinary approach to this subject. In the physical domain, we have introduced the fundamentals of heat, temperature and heat conduction through the IC structure. In the mathematical domain, we have introduced various techniques that may be used to analytically or numerically obtain thermal-coupling models in an IC. Certain strategies which can be employed to perform electro-thermal analysis of ICs have also been presented.
In the electronic domain, the first chapter provided a background of testing techniques, definitions and challenges that have arisen as a consequence of the evolution of microelectronics. In Chapter 4 we characterised the thermal couplings in defective CMOS digital circuits. Finally, a detailed analysis of built-in temperature sensors that are used for testing has been presented. This can be linked to another discipline: instrumentation. An entire chapter is dedicated to a thorough introduction to instrumentation mechanisms for the thermal monitoring of ICs.

Thermal testing is an important research topic at present. Its main research areas are: first, the design of CMOS temperature sensors (which allow both absolute and differential measurements) for thermal testing; second, the applicability of this technique to specific digital or other types of circuits (especially those in which the electrical observability is low or those whose performance may be altered by an electrical observation, including memories, analogue circuits or MEMS); and finally, design for thermal testability, a field in which additional efforts are required to consolidate this testing strategy.

We now conclude this book, with the hope that the reader has enjoyed exploring the possibilities and potential applications of temperature as a test observable.

4. REFERENCES

[221] International Technology Roadmap for Semiconductors, SemiTech, http://public.itrs.net/
[222] Ferre, A., Figueras, J., "IDDQ Characterisation in Submicron CMOS," Proc. 1997 International Test Conference.
[223] Altet, J., Rubio, A., Tamamoto, H., "Analysis of the Feasibility of Dynamic Thermal Testing in Digital Circuits," 6th Asian Test Symposium ATS, pp. 149-154.
[224] Wang, S., Gupta, S.K., "ATPG for Heat Dissipation Minimization During Test Application," Proc. of ITC'94, pp. 250-258.
[225] Tamamoto, H., Saito, M., Rubio, A., Altet, J., "Test Pattern Generation for Thermal Testing," Proc. European Test Workshop 1998, pp. 101-102.
[226] Abramovici, M., Breuer, M.A., Friedman, A.D., "Digital Systems Testing and Testable Design," Computer Science Press, 1990.
