
SIGNAL AND POWER INTEGRITY OF HIGH-SPEED IC IN CHIP-PACKAGE SYSTEM

By

HYUNHO BAEK

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2013


© 2013 Hyunho Baek


Dedicated to my lovely wife, children and respectful parents

ACKNOWLEDGMENTS

First of all, I would like to thank Jesus Christ who has been leading me during the Ph.D. program and I have been very blessed to receive all the support from many people until the completion of this dissertation. I would like to express my gratitude toward them.

I would like to thank my advisor, Prof. William R. Eisenstadt, for his support and guidance throughout my graduate research. He not only gave me the opportunity to work in his research group, but also led me to reach the destination of my graduate research with untiring energy. I will always be grateful for his valuable advice and insight. Also, I would like to thank my committee members, Prof. Jenshan Lin, Prof. YK Yoon, and Prof. Gloria Wiens. I sincerely appreciate their time and feedback.

Being a member of Prof. Eisenstadt’s research group meant a lot to me. I would like to thank all members of the group for their friendship and fellowship. Special thanks goes to my colleagues, Krishna, Said, Moishe, Byul, Dooyoung, Waco and Jongmin.

Finally, I would like to thank with heartfelt gratitude, my wife, Hyejeong. Thank you for your constant love, dedication and encouragement. Her love is so deep as well for our lovely children, Brianna and Joshua. I would also like to especially thank my parents. They have supported me with their complete love and encouragement in the course of my graduate research.

This whole accomplishment would not have been possible without their inspiration and advice.

My lovely brother and his wife have been a great support as well. I have known that I would eventually make it because my family has been by my side the whole time, praying for me.

Therefore, this dissertation is dedicated to my beloved family.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
    1.1 Trends in Integrated Circuits and Package Technology
    1.2 Classification of Modern Package
        1.2.1 Wire-bonding Package
        1.2.2 Flip-Chip Package
        1.2.3 3D TSV Package
    1.3 High-Speed I/O in Chip-Package System
        1.3.1 Single-Ended I/O Driver
        1.3.2 Differential Pair I/O Driver
    1.4 Challenges in High-Speed IC for SI/PI in Chip-Package System
    1.5 Electrical Modeling in Chip-Package System
    1.6 Future Organization of the Dissertation

2 PRINCIPLES OF SIGNAL AND POWER INTEGRITY CHARACTERIZATION IN CHIP-PACKAGE SYSTEM
    2.1 Signal Integrity Characterization in Chip-Package System
        2.1.1 Edge Rate and Knee Frequency
        2.1.2 Reflection and Effects
        2.1.3 Power/Ground Bounce Noise
        2.1.4 Crosstalk Noise
        2.1.5 High Frequency Transmission Line Loss Effects
        2.1.6 Simultaneous Switching Noise
        2.1.7 Inter-Symbol Interference (ISI) and Data Dependent Jitter (DDJ)
    2.2 Power Integrity Characterization in Chip-Package System
        2.2.1 Power Delivery Network
        2.2.2 IR-Drop (DC Power Integrity)
        2.2.3 AC Power Integrity

3 SINGLE-ENDED I/O IN 3D IC FOR SIGNAL INTEGRITY CHARACTERIZATION IN 3-TIER VERTICAL CONNECTION
    3.1 Introductory Remarks
    3.2 GTL I/O Test IC
    3.3 Characterizing Signal Integrity of High-Speed I/O in 3D ICs
    3.4 Limitation of On-chip Measurement
    3.5 Comparison between On-chip and Off-chip Measurements
        3.5.1 High-Speed Transient Measurement
        3.5.2 Crosstalk
        3.5.3 EM Simulation and Transient Simulation Results
    3.6 Additional On-Board Measurements
        3.6.1 3D IC GTL I/O Response to Varying Data Patterns
        3.6.2 The Edge Timing Comparing Launched from Several 3D IC Tiers Simultaneously Launched to the Same I/O Output
        3.6.3 GTL I/O Performance under Degraded Environmental Conditions in the 3D IC such as Reduced Bias
    3.7 Summary

4 SIGNAL INTEGRITY CHARACTERIZATION OF HIGH-SPEED I/O IN TWO TIER 3D-IC AND PACKAGE SYSTEM
    4.1 Introductory Remarks
    4.2 Characterizing TSV and Interconnect of High-Speed I/O in 3D IC-Package System
    4.3 Simulation and Measurement Results in the 3D Chip-Package System
        4.3.1 High-Speed Transient Measurement
        4.3.2 Transient Measurement with Data Patterns
    4.4 Summary

5 ON-CHIP 20GBPS HIGH-SPEED IC TEST SYSTEM FOR SIGNAL INTEGRITY CHARACTERIZATION IN FLIP-CHIP PACKAGE
    5.1 Introductory Remarks
    5.2 Characterizing Interconnect Signal Integrity of High-Speed I/O in Flip-Chip Package
    5.3 Demonstration of a Proposed On-Chip 20Gbps High-Speed I/O Test IC
        5.3.1 8-modulus 20-GHz Phase-Locked Loop in 90nm CMOS
        5.3.2 A 20-Gbps 2^7-1 PRBS Generator in UMC 90nm CMOS
        5.3.3 4-port 20Gbps Differential CML I/O Logic
        5.3.4 Overall Simulation Results of High-speed I/O Test IC
    5.4 Demonstration a Test Vehicle for Interconnect Signal Integrity Characterization
        5.4.1 Coupled Microstrip Lines Pattern
        5.4.2 CMLs with Multiple Via Pattern
    5.5 Comparison between the Dielectric Substrates in the Test Vehicle
    5.6 Summary

6 COST EFFECTIVE MODELING METHODOLOGIES AND EVALUATING ELECTRICAL INTERACTION IN FCBGA (FLIP-CHIP BALL GRID ARRAY) PACKAGES
    6.1 Introductory Remarks
    6.2 Modeling Methodology for Die-Pkg Connectivity
    6.3 Polynomials using Polynomial Regression
    6.4 Evaluating Electrical Interaction between Die and Package in FCBGA
        6.4.1 Different Pattern between Die and Package Planes
        6.4.2 Electrical Behavior between C4 Bump Pair
        6.4.3 Electrical Behavior between C4 Bump Pair in Complex Model
        6.4.4 Evaluating Electrical Behavior of the Intermediate Bump Pairs using Polynomials
    6.5 Summary

7 ON-DIE POWER SUPPLY NOISE MEASUREMENT SYSTEM IN CHIP-PACKAGE SYSTEM
    7.1 Introductory Remarks
    7.2 Noise Characterization
        7.2.1 Noise Source
        7.2.2 Power Delivery Network
        7.2.3 Simulated Power Supply Noise for the Excitation Models
    7.3 On-chip Power Supply Noise Measurement System - Circuit design
        7.3.1 Design Specification
        7.3.2 Sampling and Detection System
        7.3.3 ADC Wired to On-chip Test System
            7.3.3.1 Switched-capacitor voltage doubler (SCVR)
            7.3.3.2 Voltage-controlled delay lines and oscillators (VCDL/VCOs)
            7.3.3.3 Unity-gain buffer (Two-stage CMOS OP-AMP)
            7.3.3.4 Level shifter
            7.3.3.5 Gated D latch (Conversion window)
            7.3.3.6 7 bit synchronous counter
        7.3.4 Overall Simulation Results of the Measurement System in UMC 90nm
    7.4 Accuracy of the Measurement System in UMC 90nm
        7.4.1 Trigger Signal – Aperture Jitter
        7.4.2 Switch and Holding Capacitor
        7.4.3 Vdroop Rate of the Capacitor
        7.4.4 Leakage Current of Surrounding the Capacitor in PVT Variations
        7.4.5 Resolution of ADC
        7.4.6 Voltage Variation of the Unity Gain Buffer
        7.4.7 Voltage Variation of the SC Voltage Doubler
        7.4.8 On-chip Counter
        7.4.9 Voltage Variation at the Power Supply Voltage
        7.4.10 PVT Variation of the Overall Measurement Circuit
        7.4.11 A Summary of the Accuracy of the Measurement System
    7.5 Ecosystem
        7.5.1 Software – Trigger Signal
        7.5.2 Software – Look-up Table
        7.5.3 Locations of Voltage Measurement Circuit
    7.6 On-die Power Supply Noise Measurement System in 65nm Technologies
        7.6.1 Sampling Period
        7.6.2 VCO frequency and VCO Sensitivity Under PVT Variation
    7.7 Summary

8 CONCLUSION

APPENDIX: POLYNOMIALS USING POLYNOMIAL REGRESSION

LIST OF REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

Table

1-1 3D TSV packaging advantage and disadvantage
1-2 Major signal integrity tools
3-1 Eye diagram measurement results
3-2 Crosstalk measurement results
3-3 Simulated On-chip and Off-chip insertion loss
3-4 Different data patterns measurement results
3-5 Edge timing difference between tiers
3-6 Simultaneously launching to the same I/O output
4-1 Dimension of 3D-IC
4-2 Dimension of each test strategy in 3D-IC
4-3 Off-chip transient measurement results
4-4 Data pattern, “01010100” measurement results
4-5 Data pattern, “01001001” measurement results
5-1 Summary of PLL performance
5-2 All 18 possible characteristic polynomials for a 2^7-1 PRBS
5-3 Eye-diagram simulation results
5-4 Dielectric substrates
6-1 Comparison the SSE, R-square and RMSE (Root Mean Square Error) of N=10
6-2 4th degree polynomials of each curve (N=1, 5, and 10) in Figure 6-7
6-3 Comparison the computational cost between HFSS and Polynomial
6-4 Properties of the material in the die
6-5 Properties of the material in the package
7-1 Elements of CPU PDN noise sources
7-2 Noise sources on the core
7-3 Noise sources on the I/O
7-4 Noise frequency and voltage droop amount of each excitation
7-5 Design metrics of each excitation
7-6 Summary of PDN current Icc(t)
7-7 Design specification of switched-capacitor voltage doubler
7-8 List of the sources of error
7-9 Voltage droop amount at the capacitor in PVT variation
7-10 Leakage current at hold mode in PVT variation
7-11 Output ripple of unity-gain buffer by different capacitor
7-12 Output ripple of voltage doubler by different capacitor
7-13 Simulation results with the different ripple of the voltage doubler at Typical 27°C, Vsample: 1.25V
7-14 Status of the sources of error
7-15 JTAG controller command signals
7-16 Influence of scaling on MOS device characteristics [7.15]
7-17 Comparison Ron and time constant between 90nm and 65nm
7-18 Propagation delay in 90nm and 65nm process
7-19 VCO output frequency in case of typical process @27°C
7-20 Comparison the resolution between 90nm and 65nm in TYP @27°C

LIST OF FIGURES

Figure

1-1 Moore’s law and more
1-2 Interconnect and gate delay
1-3 On-chip local clock and Chip-to-Board frequency
1-4 A trend of the packaging integration density
1-5 Trend of: supply voltage and threshold voltage for various versions
1-6 SOC consumer stationary power consumption trends
1-7 Various package types
1-8 Wire-bond package
1-9 Flip-chip package
1-10 Comparison of conventional 3D packaging and packaging with TSV technology
1-11 Transition of 3D packaging
1-12 Typical high speed serial link block diagram
1-13 Single-ended and differential I/O drivers
1-14 High-speed interface cost trend
1-15 Dramatic increase in PDN current with decreasing design rule and voltage levels and increasing total IC power consumption
2-1 Signal integrity in the [1.4]
2-2 Relation between knee frequency and edge rate in digital signals
2-3 Rise time versus harmonic content for a 1GHz ideal square wave [2.2]
2-4 Lattice diagram of multiple discontinuities, resulting in the superposition of reflections
2-5 Basic mechanism of ground and VDD noise
2-6 Mutual capacitance, inductance and crosstalk induced noise
2-7 Graphical explanation of crosstalk noise
2-8 Simultaneous switching noise mechanism [2.1]
2-9 Effect of ISI on timings [2.1]
2-10 Effect of ISI on signal integrity [2.1]
2-12 Power delivery network
2-13 System-level power loss budget for the power gated PDN [2.3]
3-1 Overall GTL I/O test IC structure
3-2 GTL I/O driver
3-3 GTL I/O test IC structure
3-4 On-chip measurement setup
3-5 Photograph of GTL I/O test IC and wire-bonded PCB
3-6 Off-chip measurement setup
3-7 Eye-diagram of each tier in on-chip measurement
3-8 Eye-diagram of each tier in off-chip measurement
3-9 GTL I/O crosstalk test structure
3-10 On-chip FEXT and On-chip NEXT
3-11 Modeling in HFSS A) on-chip, B) off-chip
3-12 S21 magnitude of On-chip and Off-chip
3-13 Transient simulations with S21 parameter of Tier C
3-14 Eye diagram of each tier in off-chip transient simulation in ADS
3-15 Different data patterns at Tier C
3-16 Edge timing difference of output pulse when turning on several tiers simultaneously @ Vsupply = 1.4V
3-17 Amplified output pulse when turning on several tiers simultaneously @ Vsupply = 1.5V
3-18 Different data pattern, 00110110 with degrading Vsupply at Tier A
4-1 Tezzaron 3D CMOS structure
4-2 Test strategies (TS, TR, BS and BR) in 3D-IC
4-3 Example of test methods in Top tier of 3D-IC
4-4 3D layout of GTL I/O 3D test IC
4-5 On-chip transient simulation results with 1Gbps
4-6 Photograph of GTL I/O test IC and wire-bonded PCB
4-7 Off-chip transient measurement with 1Gbps
4-8 Data pattern, “01010100”
4-9 Data pattern, “01001001”
5-1 Overall block diagram for High-Speed I/O IC test system in Flip-chip packages for signal integrity characterization
5-2 Concept of signal integrity characterization in flip-chip package
5-3 Overall block diagram for On-chip High-Speed I/O test IC
5-4 8-modulus 20GHz integer-N charge-pump PLL block diagram
5-5 8-modulus prescaler block diagram
5-6 VCO output buffer schematic
5-7 True single phase clocking schematic
5-8 Simulated result for 20GHz PLL
5-9 Proposed PRBS generator block diagram
5-10 Proposed pulsed latch F/F with XOR block diagram
5-11 Proposed merged XOR Latch
5-12 20Gbps PRBS 2^7-1 sequence of x^7+x^4+1 characteristic polynomial of 2^7-1
5-13 Eye-diagram 20Gbps PRBS sequence
5-14 CML I/O driver block diagram
5-15 Modified CML latch in pulsed latch F/F
5-16 Tapered CML I/O driver
5-17 Comparison the simulation results at 20Gbps in 90nm process
5-18 Simulated result for 4-port 20Gbps CML I/O logic
5-19 Transient simulation results of On-chip 20Gbps High-Speed I/O test IC
5-20 Test vehicle for interconnect signal integrity characterization
5-21 Dimension of coupled microstrip lines on PCB
5-22 S magnitude of both CMLs_10mm and CMLs_20mm
5-23 Transient simulation results with S-parameter of both CMLs_10mm and CMLs_20mm
5-24 Eye-diagram of both CMLs_10mm
5-25 Eye-diagram of both CMLs_20mm
5-26 Dimension of CMLs with 2 vias on PCB
5-27 S magnitude of both CMLs_via2 and CMLs_via6
5-28 Dimension of CMLs with multiple vias on package
5-29 Simulation results of multiple substrates
6-1 Modeling strategies
6-2 Port definition in HFSS for modeling strategies S-parameter magnitude
6-3 S-parameters between single 3D model and combined planar models
6-4 Proof the transfer function is equal to S21 of 2-port network
6-5 Die-pkg models with different number of N
6-6 Evaluating goodness of the fit to the S21 curve of N=10
6-7 Transfer function curve of each model (N=1, 5, 10)
6-8 Coefficient analysis of 4th degree polynomials
6-9 Comparison S-parameters and calculated polynomials
6-10 Different patterns between the first layer of die and package
6-11 Different patterns between the first layer of die and package in HFSS
6-12 Different patterns between the first layer of die and package planes
6-13 Diff. pattern between the first layer of die and package planes in HFSS
6-14 Simulations of electrical interactions of Figure 6-10 and Figure 6-12
6-15 S11 among C4 bumps pair on die and pkg
6-16 Complex model of the separated die in FCBGA structure
6-17 Port definition of separated die in complex model
6-18 Comparison S-parameters among the C4 bump pairs in the die
6-19 Port definition of separated package in complex model
6-20 Comparison S-parameters among the C4 bump pairs in the package
6-21 Port definition of separated die in complex model
6-22 S-parameters between single 3D model and combined planar models
6-23 Port definition of separated package in complex model
6-24 Comparison S-parameters of intermediate ball bumps and calculated polynomials
7-1 Microprocessor Vcc fluctuations caused by interaction of parasitics with changes in current demand [7.1]
7-2 The impedance profile of core micro-architecture (Nehalem)
7-3 A strategy of on-chip power supply noise measurement system
7-4 Intel sandy bridge microprocessor [7.5]
7-5 Die floor plan
7-6 SPICE model of the die floor plan PDN
7-7 Impedance profile of core-type PDN
7-8 Impedance profile of an example I/O-type PDN
7-9 Power supply noise based on the core type Icc(t)
7-10 Power supply noise based on the I/O type Icc(t)
7-11 Behavior of PDN current Icc(t) and PDN voltage Vcc(t)
7-12 On-die noise measurement system
7-13 Design specification of the measurement system
7-14 Sampling and detection system
7-15 Gated-D Latch schematic and the truth table
7-16 Simulation results of the Gated-D latch
7-17 Bootstrapped switch
7-18 Comparison between the proposed switch and bootstrapped switch
7-19 A/D wired to On-chip test system
7-20 Switched-capacitor voltage doubler
7-21 VCDL/VCOs [7.11]
7-22 Ideal buffer (Two-stage OP-AMP)
7-23 Simulation results of the buffer in UMC 90nm process
7-24 Simulated excitation model and power supply noise
7-25 Simulation results of the measurement circuit for Vsample = 0.75V
7-26 Simulation results of 7bit on-chip counter for Vsample = 0.75V
7-27 Simulation results of the measurement circuit for Vsample = 1.25V
7-28 Simulation results of 7bit on-chip counter for Vsample = 1.25V
7-29 3 logical blocks of the measurement circuit
7-30 Sources of error in the 3 logical blocks of the measurement circuit
7-31 Effects of aperture jitter and sampling clock jitter [7.12]
7-32 Theoretical data converter SNR and ENOB due to jitter vs. fullscale sinewave input frequency [7.12]
7-33 Simplified RC model of the sampling switch and holding capacitor
7-34 Diagram of the sampling period and the 4 RC time constant
7-35 Simulation results of RC time constant and sampling period
7-36 The equation of the leakage current
7-37 Voltage droop rate of the capacitor in PVT variation
7-38 Leakage current of the capacitor in PVT variation
7-39 Simulation results for the voltage variation of the buffer at Vovershoot
7-40 Pulse count difference between various output ripple of the buffer
7-41 Possibility lose pulses due to a glitch pulse at Gated-D Latch
7-42 VCO sensitivity in PVT variations
7-43 Frequency error range in specific voltage in PVT variations @ 1V
7-44 Voltage error range in specific frequency in PVT variations @ 1V
7-45 Simplified diagram between the circuit and the software for a trigger signal
7-46 Simplified diagram between the circuit and the software for a look-up table
7-47 Locations of circuits on the core in Intel sandy-bridge microprocessor
7-48 Locations of circuits on the L3 cache in Intel sandy-bridge microprocessor
7-49 Simulation results of RC time constant and sampling period
7-50 Simulation results of VCO sensitivity

LIST OF ABBREVIATIONS

AGP Accelerated Graphics Port

CML Current Mode Logic

CMLs Coupled Microstrip Lines

DDJ Data Dependent Jitter

DFT Device For Test

EM Electro Migration

ESL Equivalent Series Inductance

ESR Equivalent Series Resistance

FCBGA Flip-Chip Ball Grid Array

FEXT Far-End Crosstalk

FR-4 Flame Retardant 4

GTL Gunning Transceiver Logic

IBIS Input/output Buffer Information Specification

IC Integrated Circuit

ISI Inter-Symbol Interference

LFSR Linear Feedback Shift Register

NEXT Near-End Crosstalk

PCB Printed Circuit Board

PI Power Integrity

PDN Power Delivery Network

PLL Phase Locked Loop

PRBS Pseudo Random Bit Sequence

PVT Process, Voltage and Temperature

SCVR Switched-Capacitor Voltage Doubler

SerDes Serializer-Deserializer

SI Signal Integrity

SIP System In Package

SOC System On Chip

SSN Simultaneous Switching Noise

SSO Simultaneous Switching Output

TAP Test Access Port

TGV Through Glass Via

TSPC True Single Phase Clocking

TSV Through Silicon Via

VCDL Voltage-Controlled Delay Lines

VCO Voltage Controlled Oscillator

VRM Voltage Regulator Module

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

SIGNAL AND POWER INTEGRITY OF HIGH-SPEED IC IN CHIP-PACKAGE SYSTEM

By

Hyunho Baek

December 2013

Chair: William R. Eisenstadt

Major: Electrical and Computer Engineering

Over the past decade, on-chip local clock frequencies and the speed of I/O drivers for chip-to-package data communication have increased, and interconnects have become more complex and more functional as integration technology has advanced. These advances have enabled design engineers to build circuits with considerable improvements: higher I/O densities, more complex interconnects and TSVs in 3D ICs, and high-speed clock signals on the chip and the package.

However, signal integrity is degraded by several factors that affect communication performance: low signaling voltages, high data rates, and high interconnect density. In DFT (Device-For-Test), a further challenge is exporting high-frequency on-chip signals off-chip for analysis and circuit testing without degrading signal quality. In addition, characterizing on-chip power supply noise is a key power integrity issue because the package structure is complex.

The main purpose of this research is to investigate the signal integrity effects of I/O drivers for TSVs and interconnects, to validate the accuracy of package electrical models using efficient test circuitry that can reside on the same die as the device under test (DUT), and to characterize on-chip power supply noise in the chip-package system.

Hence, the author proposes to validate TSV and interconnect performance using a single-ended I/O test IC for signal integrity characterization. The proposed approach differs from the conventional probed characterization approach in that it is a non-invasive test that provides internal test points and can measure the performance of wire-bonded IC packaging. The author also proposes a 20 Gbps high-speed IC test system for interconnect signal integrity characterization in planar ICs and flip-chip packages. At a 20 Gbps data rate, this system enables both investigation of the signal integrity effects of differential I/O in the chip and package and validation of the electrical performance of different dielectric materials in the package. In addition, both accurate modeling of FCBGA packages and analysis of the electrical behavior of the package structures are essential for enhancing signal and power integrity in chip-package systems. Hence, the author proposes a cost-effective modeling methodology and evaluates electrical interaction in FCBGA packages. For power integrity characterization, the author analyzes the power supply noise caused by two representative noise sources, core-type and I/O-type circuit activity, and proposes an on-chip power supply noise measurement system that uses a single supply voltage.

CHAPTER 1 INTRODUCTION

1.1 Trends in Integrated Circuits and Package Technology

Silicon-based CMOS technologies have advanced and are widely used in microprocessors, memories, and logic devices. As described by Moore’s law, the expected pace of periodic improvement in density and performance has been sustained through evolutionary device scaling and/or increases in chip size, together with the downscaling of minimum feature size, which allows an increasing number of transistors to be integrated on a single chip [1.1]. Advanced integration technology enables systems with high signal volume by combining system-on-a-chip (SoC) and system-in-package (SiP) approaches, which is justified by lower manufacturing and assembly costs. Because the signal paths among components can be kept short, the required power is reduced. This trend is shown graphically in Figure 1-1, taken from the ITRS [1.1].

Figure 1-1. Moore’s law and more

For decades, transistor scaling has improved device performance by reducing gate length, dielectric thickness, and junction depth. However, as interconnects scale down, rising resistance becomes a significant issue because the conductor cross-sectional area shrinks [1.8]. Rising capacitance can be a further issue if the metal height is not reduced along with the conductor spacing. Hence, interconnect delay accounts for a growing share of overall chip delay as feature size scales down, as shown in Figure 1-2.

Figure 1-2. Interconnect and gate delay
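The scaling argument above can be illustrated with a rough first-order estimate. The sketch below is not taken from the dissertation: it assumes a lumped wire model with resistance R = rho*L/(W*H), a parallel-plate sidewall capacitance to two neighboring wires, and the standard 0.38*R*C approximation for the 50% delay of a distributed RC line. All dimensions, material constants, and function names are hypothetical values chosen for illustration.

```python
# Illustrative sketch only: first-order wire RC delay versus lateral scaling.
# All dimensions and material constants below are assumed values, not data
# from the dissertation.

RHO_CU = 1.7e-8    # ohm*m, bulk copper resistivity (ignores size effects)
EPS0 = 8.854e-12   # F/m, vacuum permittivity
K_DIEL = 3.9       # relative permittivity of an SiO2-like dielectric


def wire_resistance(length, width, height, rho=RHO_CU):
    """R = rho * L / (W * H): resistance rises as the cross-section shrinks."""
    return rho * length / (width * height)


def sidewall_capacitance(length, height, spacing, k=K_DIEL):
    """Parallel-plate coupling capacitance to the two neighbouring wires."""
    return 2.0 * k * EPS0 * length * height / spacing


def rc_delay(length, width, height, spacing):
    """50% delay of a distributed RC line, approximately 0.38 * R * C."""
    r = wire_resistance(length, width, height)
    c = sidewall_capacitance(length, height, spacing)
    return 0.38 * r * c


def scaled_delay(scale, length=1e-3, width=200e-9, height=400e-9,
                 spacing=200e-9):
    """Delay when width and spacing shrink by `scale`; height and length stay fixed."""
    return rc_delay(length, width * scale, height, spacing * scale)


if __name__ == "__main__":
    base = scaled_delay(1.0)
    for s in (1.0, 0.7, 0.5):
        d = scaled_delay(s)
        print(f"scale={s:.1f}  delay={d * 1e12:.1f} ps  ratio={d / base:.2f}")
```

Under these assumptions, both R and C grow as 1/scale, so the delay of a fixed-length wire grows as 1/scale^2. A real interconnect stack also has capacitance to the layers above and below, and resistivity increases at small dimensions, so the numbers indicate the trend only.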

On-chip local clock frequencies and the speed of I/O drivers for chip-to-package data communication continue to increase [1.1] to meet the high-performance requirements of chip-package systems, as shown in Figure 1-3.

Figure 1-3. On-chip local clock and Chip-to-Board frequency

Even if Moore’s Law scaling continues for transistor count and cost, fundamental developments in interconnect are required to meet high-bandwidth, low-power signaling needs without introducing performance bottlenecks as scaling continues. The development of wafer-level packaging, system-in-package, and the coming 3D revolution will enable scaling advantages in packaging. In the future, 3D IC integration [1.5] will become more attractive as a way to achieve high performance and functionality, enabled by shorter global interconnects, higher packing density, a smaller footprint, and the possibility of mixed-technology integration in 3D ICs, as shown in Figure 1-4.

Figure 1-4. A trend of the packaging integration density

As Moore’s Law has continued with technology scaling, the required power supply voltage has decreased, as shown in Figure 1-5. This reduction [1.1] is driven by several factors: the need to reduce power dissipation, shorter transistor channel lengths, and the reliability of gate dielectrics. However, total power consumption targets remain relatively flat [1.1] despite the lower supply voltage. As shown in Figure 1-6, power consumption is driven by four factors: (1) higher chip operating frequency, (2) higher overall interconnect capacitance and resistance, (3) exponentially increasing gate leakage, and (4) scaled on-chip transistors.

Figure 1-5. Trend of: supply voltage and threshold voltage for various versions

Figure 1-6. SOC consumer stationary power consumption trends
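A short numerical sketch helps show why total power can stay roughly flat even as the supply voltage drops. The example below is not from the dissertation: it assumes the textbook first-order model in which dynamic power is activity * C * Vdd^2 * f and leakage power is Vdd * I_leak. The two operating points and all function and parameter names are hypothetical and chosen only to illustrate the trade-off.

```python
# Illustrative sketch only: first-order CMOS power model.
# The operating points below are hypothetical, not data from the dissertation.


def dynamic_power(activity, capacitance, vdd, freq):
    """Switching power: activity factor * switched capacitance * Vdd^2 * frequency."""
    return activity * capacitance * vdd ** 2 * freq


def leakage_power(vdd, i_leak):
    """Static power drawn by the total leakage current."""
    return vdd * i_leak


def total_power(activity, capacitance, vdd, freq, i_leak):
    """Sum of the dynamic and leakage contributions."""
    return (dynamic_power(activity, capacitance, vdd, freq)
            + leakage_power(vdd, i_leak))


if __name__ == "__main__":
    # Older node: higher supply voltage, lower frequency, less leakage.
    older = total_power(activity=0.1, capacitance=50e-9, vdd=1.2,
                        freq=2.0e9, i_leak=1.0)
    # Newer node: lower supply voltage, but higher frequency, more switched
    # capacitance, and more leakage current.
    newer = total_power(activity=0.1, capacitance=60e-9, vdd=1.0,
                        freq=2.4e9, i_leak=2.0)
    print(f"older node: {older:.1f} W")
    print(f"newer node: {newer:.1f} W")
```

In this hypothetical comparison, the quadratic saving from the lower supply voltage is offset by the higher frequency, the larger switched capacitance, and the increased leakage, which is consistent with the flat power targets described above.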

1.2 Classification of Modern Package

IC packaging technology has developed to meet the high-speed I/O trends in electronic equipment, as shown in Figure 1-7. Increased device complexity has driven an explosion of new packaging technologies, such as miniaturization, to meet tighter electrical and thermal performance requirements. However, miniaturization is technologically limited by increasing leakage current, which generates heat in transistors, and by the signal delay caused by wiring. As an alternative, 3D packaging technology [1.2] is expected to provide a breakthrough, enabling high-density integration that does not depend on miniaturization on a 2D surface. A key 3D packaging technology is electrical interconnection, because the stacked chips must be connected to one another electrically. Conventional 3D packaging technology uses wire bonding with fine metal lead wires. By using TSVs, it is possible to save the space that bonding wires would require and to shorten wiring lengths.

Packages can be classified in many different ways; in this section, they are grouped into three categories.

Figure 1-7. Various package types

1.2.1 Wire-bonding Package

Wire bonding is an electrical interconnection technique that joins thin wires to pad surfaces using a combination of heat, pressure, and/or ultrasonic energy, as shown in Figure 1-8.

Figure 1-8. Wire-bond package

When the bond forms, the wire and pad surfaces share electrons or atoms interdiffuse across the interface. The wire-bonding process applies ultrasonic energy to help smooth out surface roughness, while the bonding force deforms the material and breaks up contamination layers. The most significant benefit of wire bonding in packaging is its low cost. On the other hand, wire bonding tends to be limited in density and performance, and it has many disadvantages, such as:

• Low I/O counts due to technology limitations,

• Large bonding pads that take up extra area,

• Large bonding pitch,

• The requirement for relatively large quantities of gold,

• Slow production rate,

• Relatively poor electrical performance,

• Variations in bond geometry,

• Robustness and reliability problems brought about by environmental conditions.

1.2.2 Flip-Chip Package

A flip-chip package is illustrated in Figure 1-9. In this package, the surface of the die faces down and connects directly to the package or the printed circuit board (PCB). As a result, the contact surfaces can be placed close together, which saves area and keeps the package small.

Figure 1-9. Flip-chip package

In particular, the short interconnection length gives better performance in high-frequency applications than other interconnection methods, because flip-chip technology reduces parasitic inductance and capacitance. In addition, contact pads can be distributed over the entire chip surface rather than being limited to the periphery, which saves silicon area and increases the maximum number of interconnects. The signal interconnections can also be shortened. On the other hand, thermal resistance may increase, and inspecting the solder bonds can be difficult because the die faces down toward the package or PCB. The thermal expansion of the die and the substrate can also be mismatched.

1.2.3 3D TSV Package

TSV technology, one of the advanced 3D packaging technologies, is expected to improve packaging density compared with technologies using wire bonding, as shown in Figure 1-10 [1.6]. With TSV technology, circuits in the stacked chips are interconnected electrically by through-silicon vias (TSVs), so that those chips can be integrated into a stacked system. TSV technology has attracted attention as an alternative to the conventional process scaling of VLSI. Furthermore, TSV technology is expected to enable multifunctional VLSI systems because of its capability to integrate heterogeneous LSI chips and/or MEMS in a single package.

Figure 1-10. Comparison of conventional 3D packaging and packaging with TSV technology

By using TSVs, it is possible to save the space that would be necessary for bonding wires and to make wiring lengths shorter. TSV technology is moving ahead of the other novel 3D packaging technologies mentioned above toward the production of commercially viable LSI components, as shown in Figure 1-11 [1.7, 1.8].

Figure 1-11. Transition of 3D packaging

For reference, the advantages and disadvantages of 3D integration technology with TSVs are listed briefly in Table 1-1.

Table 1-1. 3D TSV packaging advantages and disadvantages

3D Integration with TSV

Advantages                              Disadvantages
1. Short vertical interconnects         1. Architecture
2. Miniaturization                      2. Design to leverage 3D technology
3. Higher bandwidth/lower latency       3. Power delivery
4. New function in small form factor    4. Thermal management
5. Lower power                          5. Industry compatibility & standards
6. Improved performance                 6. Manufacturing equipment, process, assembly & fine-pitch wafer test
7. Lower cost

1.3 High-Speed I/O in Chip-Package System

A typical serial link diagram is shown in Figure 1-12. It consists of a transmitter, a channel, and a receiver. The channel can be modeled as a transmission line that represents traces on a printed circuit board (PCB), traces within packages, sockets, cables, and the connectors that join these parts together.


Figure 1-12. Typical high speed serial link block diagram

The data capacity of a link communicating through a high-speed I/O channel has increased considerably compared with other methods, including on-chip and off-chip transmission. The trend in high-speed serial links is toward Gbit/s I/O speeds, since the underlying multi-Gbps/GHz I/O technology continues to improve. This has led to significant growth in speed and serial I/O port count in computing, networking, and consumer applications to meet the needs of growing markets. High-speed I/O chip-package communication can be categorized by how voltages and currents are observed at the driver and the receiver. It divides into two areas, differential I/O and single-ended I/O, as shown in Figure 1-13.

[Figure 1-13 panel labels: columns Pull-Only and Push-Pull; rows Single-Ended and Differential]

Figure 1-13. Single-ended and differential I/O drivers

1.3.1 Single-Ended I/O Driver

A single-ended driver has one output port and assumes a common ground or power connection as a reference. A single-ended signal has a power reference, a ground reference, or both. The referencing scheme can be identified by the proximity of the signal line to the power or ground domain: at high frequencies, the return path tends to form in the nearby domain, because capacitive coupling creates a low-impedance path between the signal and that domain. Despite its lower speed, single-ended I/O (GTL I/O) is widely used in personal computer applications, especially the front-side bus and the AGP, which is the interface between the processor (CPU or GPU) and the processor chipset [2.1], because of its simpler structure, lower pin count, and simpler buffer circuitry. GTL drivers generate signals with faster edge rates than push-pull drivers since they do not have slow PMOS devices. They can therefore be used to validate single-ended package electrical models at higher frequencies. In this research, a GTL (Gunning Transceiver Logic) driver is presented as a single-ended I/O test IC. This chip is used to examine the signal integrity of single-ended I/O chip-package systems and to validate the package electrical model with single-ended inner traces for bonded 3D ICs.

1.3.2 Differential Pair I/O Driver

A differential driver can be designed in a current mode or a voltage mode configuration.

Differential I/O is normally used for Serializer-Deserializer (SerDes) products such as PCI Express, Serial ATA, and Gigabit Ethernet. The increasing number of ports and signal speed makes it hard to model and characterize the signal path, which consists of package, socket, and PCB traces. Differential signaling can achieve higher data rates and has low susceptibility to noise. In this research, a 20Gbps 4-port differential I/O test IC was designed to examine the signal integrity effects of the differential I/O chip-package system and to validate the package electrical model with differential inner traces.

1.4 Challenges in High-Speed IC for SI/PI in Chip-Package System

High-speed interfaces in chip-package systems raise several important concerns. The Gbps/GHz interface keeps its exponential development pace because of high-volume computing and networking applications. The port count will be limited by chip-level power consumption and by the increasing single-lane Gbps data rates. Large-scale integration presents a challenge not only in designing reliable high-performance interface IP targeted at very noisy SoC environments, but also in testing high-performance interfaces. Also, most multi-Gbps transceivers were designed as high-performance, high-priced, high-margin devices with a low level of integration and relatively low production volume. With high port counts, cost-efficient ATE solutions are essential to test all serial ports because of the increasing cost [1.1], as shown in Figure 1-14. Therefore, reliable DFT features or other low-cost test techniques are needed for large-port-count Serializer-Deserializer (SerDes) devices.

Figure 1-14. High-speed interface cost trend

In particular, in a 3D IC, stacked-die (SiP and TSV) products can present many unique challenges to backend manufacturing flows because these products can contain die from more than one supplier. Chip-to-chip, chip-to-wafer, and wafer-to-wafer bonding, high-density micro-bump technologies (typically copper and solder), and through-silicon vias (TSVs) enable connection of IP from many vendors through to the backside of a chip or wafer. Therefore, for a multi-chip package assembled from heterogeneous die, several test points on different platforms are necessary to investigate the module completely. However, the number of test points may be limited by mechanical damage [1.1]. Hence, a new testing methodology is introduced that uses a high-speed I/O test IC and contacts the top side of the chip for stacked ICs and packaged 3D ICs on the PCB. Techniques that reduce overall test cost are valuable.

Also, because chips are designed by different vendors and integrated by the system integrator, power integrity and IR-drop analysis become essential if power and signal integrity analysis is to be performed with the needed accuracy. Moreover, higher data rates increase power consumption as well. As a result, the PDN current has increased dramatically, as shown in Figure 1-15 [1.3].

Figure 1-15. Dramatic increase in PDN current with decreasing design rule and voltage levels and increasing total IC power consumption

1.5 Electrical Modeling in Chip-Package System

As physical structures become more complex, package modeling and simulation become more critical in package design. Accurate modeling of the die and package, together with analysis of the electrical behavior of the package structures, is important for anticipating higher performance and functionality in terms of power integrity and signal integrity in the chip-to-package system.

Particularly, accurate package electrical models of power/ground supply networks and routed traces are key to success in designing high-speed I/O chip-package systems. Signal integrity parameters such as crosstalk and simultaneous switching noise are directly related to the accuracy of the package electrical model. Common EM modeling tools used in SI analysis [1.4] are listed in Table 1-2.

Table 1-2. Major signal integrity tools

Company                          Tool                                    Function
Ansoft                           SI 2D                                   2D static DC EM simulation; extracts inductance and capacitance
                                 SI 3D                                   3D static DC EM simulation; extracts resistance, inductance and capacitance
                                 PCB/MCM Signal Integrity                PCB/MCM pre- and post-route SI analysis
                                 Turbo Package Analyzer                  Package RLGC extraction
Applied Simulation Technology    ApsimSI                                 Reflection and crosstalk simulation for lossy coupled transmission lines
                                 ApsimDELTA-I                            Delta-I noise simulation
Cadence                          SPECTRAQuest                            SI simulation: transmission line simulation, power plane builder
HP Eesof                         Picosecond Interconnect Modeling Suite  Frequency-domain and time-domain simulation for coupled lines and I/O buffers
Hyperlynx (PADS)                 HyperSuite                              Single/coupled transmission line simulation
INCASES (Zuken)                  SI-WORKBENCH                            Lossy coupled transmission line simulation
Mentor Graphics                  IS_Analyzer                             Delay, crosstalk simulation
Quantic EMC                      BoardSpecialist                         Delay, crosstalk simulation
Sigrity                          SPEED97/SPEED2000                       Power/ground noise simulation with coupled lossy transmission line analysis
Viewlogic Systems (Innoveda)     XTK                                     Coupled lossy transmission line analysis
                                 AC/Grade                                Power/ground modeling

These tools fall into several classes: 2D field solvers extract the RLGC matrices of single and coupled transmission lines; lossy transmission line simulators analyze single and coupled lossy lines; 3D field solvers simulate the performance of wirebonds, vias, and metal planes; and behavioral modeling tools describe drivers and receivers. IBIS (Input/output Buffer Information Specification) is also an emerging standard used to describe the analog behavior of the input/output (I/O) of a digital integrated circuit.

In this research, to evaluate and validate the package electrical performance and model, the package is generalized as a two-port network and treated as a microwave interconnection between the die and the PCB in the chip-to-package system. One port is on the top of the package, where the die circuits are connected, and the other port is on the bottom of the package, where the printed circuit board (PCB) is connected. Also, for the flip-chip package, a modeling methodology is proposed for die-to-package connectivity that shows good agreement with the performance of the power and ground grids and C4 bumps.

1.6 Future Organization of the Dissertation

This section describes the organization of the dissertation. This work concentrates on solutions to the issues arising from the need for signal and power integrity characterization and effective electrical modeling in the chip-package system.

The current chapter (Chapter 1) introduces the reader to the background of the trends of the integrated circuit technology, packaging and the high-speed interfaces in chip-package system.

In Chapter 2, the principles of signal and power integrity characterization in the chip-package system are discussed before a test IC for signal and power integrity characterization is introduced.

Chapter 3 describes in detail signal integrity characterization using a single-ended I/O driver for TSVs and interconnects in a 3D IC and package system. MITLL 150nm FDSOI technology is used to fabricate the test IC. This work contributes test methods for signal integrity characterization in 3D ICs and ball-bonded packages using high-speed interface data.

Chapter 4 describes another 3D IC process, Tezzaron 150nm technology. Signal integrity characterization of TSVs and interconnects is performed using high-speed I/O drivers to examine on-chip and ball-bonded packages. Through the proposed test strategies, this research contributes a way to characterize the 3D IC and package system.

Chapter 5 describes in detail signal integrity characterization using a 20Gbps high-speed IC in a flip-chip package system. The high-speed IC includes a 20GHz PLL, a 20Gbps PRBS, and 20Gbps differential CML I/O to analyze signal integrity in the UMC 90nm process. Also, the chip and PCB are modeled with a flip-chip package in HFSS to characterize signal integrity in the chip-package system at a 20Gbps data rate. This work contributes a high-speed test IC system for signal integrity characterization.

Chapter 6 discusses the evaluation of the electrical interaction between die and package in an FCBGA structure. The author proposes a modeling method that reduces the cost of characterizing signal and power integrity in the FCBGA structure. In addition, a polynomial regression method is used to model the FCBGA structure. This work helps reduce the cost of modeling a die and a package simultaneously and helps identify the electrical interaction between the die and the package at an early phase.

Chapter 7 discusses in detail power integrity characterization through measuring and analyzing the on-chip power supply noise. The author proposes an on-die power supply noise measurement system to measure both Vdroop and Vovershoot under a 1V VDD supply voltage. In addition, two representative noise sources, core-type circuit activity and I/O-type circuit activity, are characterized to give PDN designers a design metric and to give circuit designers the input noise range of the measurement system. The accuracy of the measurement system under PVT variations is also evaluated. The performance of the scaled measurement system is then compared between the 90nm and 65nm process nodes in terms of sampling period and VCO sensitivity to understand the effect of future technology trends. The work enables detection of on-chip noise so that the factors that degrade signal and power integrity can be mitigated. PDN-level designers can also identify design metrics based on the power supply noise in order to suppress PDN stress.

Finally, Chapter 8 presents the conclusions of the dissertation.

CHAPTER 2 PRINCIPLES OF SIGNAL AND POWER INTEGRITY CHARACTERIZATION IN CHIP-PACKAGE SYSTEM

2.1 Signal Integrity Characterization in Chip-Package System

The accurate timing and the reliable quality of the signal are two representative concerns for I/O signal integrity. First, signal timing pertaining to interconnects depends on the delay caused by the electrical length of the interconnect structure where the electromagnetic energy flows from one end to another such as chip-to-chip, chip-to-package and package-to-package.

Hence, the timing depends on the delay caused by the physical length that the signal must propagate as well as the shape of the waveform when the threshold is reached [1.4]. In addition, the fabricated interconnects and packages need to be capable of supporting very fast varying and broadband signals without degrading signal integrity. With current technology, the package interconnection delay dominates the system timing budget and becomes the bottleneck for the high-speed system design. It is generally accepted today that package performance is one of the major limiting factors of the overall system performance when designing these packages (chip carriers, PCBs and 3D stacked IC) and integrating these packages together. [1.4]

Figure 2-1 shows the role of signal integrity (SI) analysis [1.4] in the high-speed design process. The chart shows that SI analysis is applied throughout the design flow and is tightly integrated into each design stage. SI analysis at this stage is therefore also called constraint-driven SI design, because the design guidelines developed are used as constraints for component placement and routing. Hence, to perform the signal integrity analysis in this research, high-speed I/O, both single-ended and differential, is used in both 3D ICs and the package system.


Figure 2-1. Signal integrity in the design flow [1.4]

A test IC was designed to prove on-chip high speed I/O stimulus and test capability and to launch, measure and model Gigabit/sec (Gbps) chip/package/board signal paths in SiPs, flipchips and 3-D stacked ICs through bondwires, ballbonds, stud bumps, AC coupled connections and through chip vias. The data from the measurements will provide verification of physical 3-D EM package modeling simulators. In addition, this work will enable the package and package test engineers to anticipate 3-D package test systems for high speed digital designers prior to receiving prototype die; thus anticipating and solving chip-to-package integration issues in a timely fashion.

2.1.1 Edge Rate and Knee Frequency

The edge rate [2.1], also called the rise/fall time of a signal, can be obtained from a transient simulation or from a measurement using the high-speed I/O in this research. It matters for signal integrity because it sets the frequency content of the signal. The knee frequency below is defined as a rough rule of thumb for the upper limit of a signal's frequency content.

\[ F_{\mathrm{KNEE}} = \frac{0.5}{t_r} \]
where t_r is the rise time of the signal.

The knee frequency is the frequency below which the majority of the spectral content is contained. Figure 2-2 shows the relationship between knee frequency and edge rate in digital signals. As illustrated in Figure 2-3, the signal edge gets shorter as more harmonic content is added. The equation indicates that the frequency content of a signal is determined mainly by its edge rate, which in turn affects signal integrity. Common practice in digital design is to make the signal edge rate 10% of the period, because a square wave remains relatively smooth when the signal is transmitted up to the 5th harmonic.
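As a small numerical check of the rule of thumb, the following sketch computes the knee frequency for a clock whose edge is 10% of the period. The 1 GHz clock and the function names are illustrative assumptions, not values from the test IC.

```python
def knee_frequency(rise_time_s: float) -> float:
    """Knee frequency F_KNEE = 0.5 / t_r, in Hz."""
    if rise_time_s <= 0:
        raise ValueError("rise time must be positive")
    return 0.5 / rise_time_s


def rise_time_from_clock(clock_hz: float, edge_fraction: float = 0.1) -> float:
    """Rise time when the edge occupies a fixed fraction of the clock period."""
    return edge_fraction / clock_hz


clock_hz = 1e9                        # assumed 1 GHz clock (illustrative)
t_r = rise_time_from_clock(clock_hz)  # 10% of a 1 ns period
f_knee = knee_frequency(t_r)

print(f"rise time      = {t_r * 1e12:.0f} ps")
print(f"knee frequency = {f_knee / 1e9:.1f} GHz")
print(f"knee / clock   = {f_knee / clock_hz:.1f}")
```

With an edge equal to 10% of the period, the knee frequency falls at about five times the clock frequency, which agrees with the 5th-harmonic guideline in the text.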

Figure 2-2. Relation between knee frequency and edge rate in digital signals


Figure 2-3. Rise time versus harmonic content for a 1GHz ideal square wave [2.2]

2.1.2 Reflection Noise and Transmission Line Effects

In high-speed systems, reflection noise [2.1] increases time delay and produces overshoot, undershoot and ringing. The root cause of reflection noise is the impedance discontinuity along the signal transmission path. When a signal changes its routing layer and the impedance values are not consistent, a reflection will occur at the discontinuity boundary.

Figure 2-4. Lattice diagram of multiple discontinuities, resulting in the superposition of reflections

When a trace runs over planes with perforations such as degassing holes and via holes, crosses a gap, has branches (stubs), or passes close to another trace, an impedance discontinuity occurs and reflections can be observed. When a signal finally reaches the receiving end of a transmission line, multiple reflections will occur if the load ZL is not matched to the transmission line impedance Z0 [1.4]. The reflected signal causes additional interference; Figure 2-4 illustrates the superposition of reflections for multiple impedance discontinuities. To minimize reflection noise, common design practices include controlling trace characteristic impedance, eliminating stubs, and always using a solid metal plane as the reference plane for return current.

The most effective way to decrease reflection noise is to terminate the transmission line with a termination resistor equal to the characteristic impedance of the line. More recently, high-speed interface systems apply double termination, which includes both source and load terminations.
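The effect of termination can be illustrated with the standard reflection coefficient, (ZL - Z0)/(ZL + Z0), and a simple lattice (bounce) calculation. This is a minimal sketch for a lossless line with resistive source and load; the impedance values and function names are illustrative assumptions.

```python
def reflection_coefficient(z_term: float, z0: float) -> float:
    """Reflection coefficient seen by a wave on a line of impedance z0."""
    return (z_term - z0) / (z_term + z0)


def load_voltage_history(v_source, z_source, z0, z_load, n_arrivals=20):
    """Load voltage after each arrival of the wave at the load.

    Each arrival adds incident * (1 + gamma_load); the wave then reflects
    at the load and again at the source before the next arrival.
    """
    gamma_s = reflection_coefficient(z_source, z0)
    gamma_l = reflection_coefficient(z_load, z0)
    incident = v_source * z0 / (z_source + z0)  # initially launched wave
    v_load = 0.0
    history = []
    for _ in range(n_arrivals):
        v_load += incident * (1.0 + gamma_l)
        history.append(v_load)
        incident *= gamma_l * gamma_s           # one round trip on the line
    return history


# Assumed values: 1 V step, 25 ohm driver, 50 ohm line.
mismatched = load_voltage_history(1.0, 25.0, 50.0, 100.0)
matched = load_voltage_history(1.0, 25.0, 50.0, 50.0)

print("mismatched load:", [round(v, 4) for v in mismatched[:4]])
print("matched load:   ", [round(v, 4) for v in matched[:4]])
```

With the load equal to Z0, the load voltage settles on the first arrival; with a mismatched load, it overshoots and rings toward the DC value set by the resistive divider.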

2.1.3 Power/Ground Bounce Noise

In mixed-signal IC design, power and ground should be a good analog ground. In other words, the impedance associated with the power distribution network should be kept very low.

The problem is that when I/O drivers charge and discharge output capacitive nodes, sudden currents must be supplied by the board-level power supply through the inductive chip package, and these currents generate ground and VDD bounce noise because of the parasitic inductance. Figure 2-5 [2.1] shows the basic mechanism that generates ground and VDD noise. This type of noise is called L dI/dt noise and is proportional to the edge rate and to the parasitic inductance of the power supply network. The trend toward faster ICs has led to an increase in ground and power fluctuation (ground bounce and power droop). These effects can cause logic failure and serious jitter degradation in clock circuits such as phase-locked loops (PLLs) and clock and data recovery (CDR) circuits. This noise also corrupts the bit reference level of a receiver circuit. I/O drivers are designed to draw large currents from the power supply network in order to drive large off-chip capacitive loads at a reasonable speed, which causes large fluctuations on the power supply rails. The problem worsens as the channel length of CMOS transistors shrinks to the deep-submicron level. In addition to the parasitic inductance itself, non-ideal return current paths act like a parasitic inductance and make the power rail fluctuation worse.

Figure 2-5. Basic mechanism of ground and VDD noise

2.1.4 Crosstalk Noise

When transmission structures are placed right next to each other, the electric and magnetic fields of a signal reach the adjacent conductors and interact with them. Crosstalk is the coupling of energy from one line, the aggressor, to another line, the victim, through mutual capacitance and mutual inductance. As systems become more complex and signaling interfaces are routed in parallel, crosstalk becomes a more significant factor in signal integrity and timing, because it changes the propagation characteristics of the interconnects, such as characteristic impedance and propagation velocity.

Figure 2-6. Mutual capacitance, inductance and crosstalk-induced noise. A) Mutual capacitance and mutual inductance, B) Crosstalk-induced noise

Mutual inductance, Lm, induces current from an aggressor line onto a victim line through the magnetic field. In other words, a current is induced on a victim line that is close enough to the aggressor line for the magnetic field to encompass the victim trace. The circuit element in Figure 2-6 that represents this transfer of energy is the mutual inductance, and the magnitude of the resulting noise is given by the familiar equation:

\[ V_{\mathrm{NOISE},L_m} = L_m \frac{dI}{dt} \tag{2.1} \]

Hence, the mutual inductance Lm becomes very significant in high-speed digital applications, since it injects a voltage noise onto the victim proportional to the rate of change of the current on the aggressor line.

Mutual capacitance, Cm, is simply the coupling of two conductors through the electric field. Mutual capacitance injects a current onto the victim line proportional to the rate of change of the voltage on the aggressor line. The coupling due to the electric field is represented in the circuit model by a mutual capacitor, as shown in Figure 2-6:

\[ I_{\mathrm{NOISE}} = C_m \frac{dV}{dt} \tag{2.2} \]

As with mutual inductance, the noise induced by mutual capacitance is proportional to the rate of change on the aggressor line. Therefore, mutual capacitance also becomes very significant in high-speed digital applications. The shape of the crosstalk noise seen at the near and far ends of the victim line can be deduced by looking at Figure 2-7.

Figure 2-7. Graphical explanation of crosstalk noise
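Equations (2.1) and (2.2) can be evaluated with a linear-ramp approximation in which dI/dt and dV/dt are taken as the swing divided by the rise time. The sketch below uses assumed coupling values (1 nH, 0.5 pF) and assumed swings purely to show how the induced noise scales with edge rate.

```python
def inductive_noise_voltage(l_mutual_h, delta_i_a, rise_time_s):
    """Equation (2.1) with dI/dt approximated as delta_I / t_r."""
    return l_mutual_h * delta_i_a / rise_time_s


def capacitive_noise_current(c_mutual_f, delta_v_v, rise_time_s):
    """Equation (2.2) with dV/dt approximated as delta_V / t_r."""
    return c_mutual_f * delta_v_v / rise_time_s


# Assumed illustrative values, not measured data.
L_M = 1e-9        # mutual inductance, H
C_M = 0.5e-12     # mutual capacitance, F
SWING_I = 20e-3   # aggressor current swing, A
SWING_V = 1.0     # aggressor voltage swing, V

for t_r in (1e-9, 100e-12):
    v_noise = inductive_noise_voltage(L_M, SWING_I, t_r)
    i_noise = capacitive_noise_current(C_M, SWING_V, t_r)
    print(f"t_r = {t_r * 1e12:6.0f} ps: "
          f"V_noise = {v_noise * 1e3:6.1f} mV, I_noise = {i_noise * 1e3:5.2f} mA")
```

Shortening the edge by a factor of ten raises both induced quantities by the same factor, which is why coupling grows in importance at higher data rates.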

The far-end crosstalk (FEXT) pulse travels concurrently with the edge of the signal on the driving line. The near-end crosstalk (NEXT) pulse originates at the edge and propagates back toward the near end. Subsequently, when the signal edge reaches the far end of the driving line at time t = TD (where TD is the electrical delay of the transmission line), the driving signal and the far-end crosstalk are terminated by a resistor. The last portion of the near-end crosstalk induced on the victim line just before the signal is terminated, however, does not arrive at the near end until time t = 2TD, because it must propagate the entire length of the line to return. Therefore, for a pair of terminated transmission lines, the near-end crosstalk begins at time t = 0 and has a duration of 2TD, or twice the electrical delay of the line. The far-end crosstalk occurs at time t = TD and has a duration approximately equal to the signal rise or fall time.
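The timing relations above can be collected into a small helper. This is a sketch for a pair of terminated lines; the delay and rise-time values and the names used are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CrosstalkWindow:
    start_s: float      # time at which the pulse appears at that end
    duration_s: float   # approximate pulse width


def crosstalk_windows(line_delay_s: float, rise_time_s: float):
    """Return (near-end, far-end) crosstalk windows for terminated lines.

    NEXT starts at t = 0 and lasts 2*TD; FEXT appears at t = TD and lasts
    approximately one rise (or fall) time.
    """
    near_end = CrosstalkWindow(start_s=0.0, duration_s=2.0 * line_delay_s)
    far_end = CrosstalkWindow(start_s=line_delay_s, duration_s=rise_time_s)
    return near_end, far_end


# Assumed example: 1 ns line delay, 100 ps edge.
next_win, fext_win = crosstalk_windows(line_delay_s=1e-9, rise_time_s=100e-12)
print(f"NEXT: starts {next_win.start_s * 1e9:.1f} ns, lasts {next_win.duration_s * 1e9:.1f} ns")
print(f"FEXT: starts {fext_win.start_s * 1e9:.1f} ns, lasts {fext_win.duration_s * 1e12:.0f} ps")
```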

2.1.5 High Frequency Transmission Line Loss Effects

As digital systems evolve and technologists push for smaller and faster systems, the geometric dimensions of the transmission lines and package components shrink. Smaller dimensions and high-frequency content cause the resistive losses in the transmission line to be exacerbated. Modeling the resistive losses in transmission lines is becoming increasingly important. Resistive losses will affect the performance of a digital system by decreasing the signal amplitude, thus affecting noise margins and slowing edge rates, which in turn affects timing margins. Previously it has been possible to ignore losses on the PCB and in the package because systems operated at slower frequencies. Modern systems, however, require rigorous analysis of losses because they are often a first-order effect that significantly degrades the performance of digital interconnects.

2.1.6 Simultaneous Switching Noise

Simultaneous switching noise (SSN) is inductive noise caused by several outputs switching at the same time, as seen in Figure 2-8. A signal switching by itself may have perfect signal integrity. However, noise generated by the other signals can corrupt the signal quality of the target net when all signals in a bus are switching simultaneously. SSN is typically very difficult to quantify because it depends heavily on the physical geometry of the system. The basic mechanism, however, is the familiar equation:

\[ V_{\mathrm{SSN}} = N L_{\mathrm{TOT}} \frac{di}{dt} \tag{2.3} \]
where VSSN is the simultaneous switching noise, N is the number of drivers switching, Ltot is the equivalent inductance through which the current must pass, and i is the current per driver.

Figure 2-8. Simultaneous switching noise mechanism [2.1]

When a large number of signals switch at the same time, the power supply must deliver enough current to satisfy the sudden demand. Since the current must pass through an inductance, Ltot, a noise of VSSN is introduced onto the power supply, which in turn manifests itself at the driver output.
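A short sketch of equation (2.3) shows how the noise scales with the number of drivers. The per-driver di/dt is approximated as the current swing divided by the rise time; the inductance, current, and edge values are assumptions chosen only for illustration.

```python
def ssn_voltage(n_drivers, l_total_h, delta_i_a, rise_time_s):
    """Equation (2.3): V_SSN = N * L_TOT * di/dt, with di/dt ~ delta_i / t_r."""
    if rise_time_s <= 0:
        raise ValueError("rise time must be positive")
    return n_drivers * l_total_h * delta_i_a / rise_time_s


# Assumed illustrative values.
L_TOT = 1e-9       # equivalent supply-path inductance, H
DELTA_I = 10e-3    # current swing per driver, A
T_RISE = 200e-12   # edge rate, s

for n in (1, 4, 16):
    v_ssn = ssn_voltage(n, L_TOT, DELTA_I, T_RISE)
    print(f"N = {n:2d} drivers -> V_SSN = {v_ssn * 1e3:.0f} mV")
```

Because the relation is linear in N, the noise that is tolerable for a single driver can exceed the noise margin once a wide bus switches together.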

2.1.7 Inter-Symbol Interference (ISI) and Data Dependent Jitter (DDJ)

Inter-symbol interference (ISI) noise is created when a signal traveling on a transmission line is affected by noise already on the bus due to crosstalk, reflections, and similar effects. As a result, the ISI noise degrades signal integrity and reduces the timing margin. ISI is a major concern in any high-speed design, but especially so when the period is smaller than two times the delay of the transmission line. ISI must be analyzed rigorously in system design because it is often a dominant effect on performance. Figure 2-9 shows a graphical example of how ISI can affect timings.


Figure 2-9. Effect of ISI on timings [2.1]

To capture the full effects of ISI, it is important to perform many simulations with long random bit patterns, and the timings should be taken at each transition. The bit patterns should be chosen so that all system resonances are sufficiently excited and the noise is allowed to settle partially prior to the next transition. ISI will also dramatically affect the signal integrity. It is important to investigate different bit patterns to ensure a robust design. Figure 2-10 is an example that demonstrates how dramatically the bit pattern can affect the signal integrity.

Figure 2-10. Effect of ISI on signal integrity [2.1]
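One common way to obtain long pseudo-random bit patterns for ISI simulation is a linear-feedback shift register. The sketch below generates a PRBS7 sequence (polynomial x^7 + x^6 + 1, period 127 bits). The choice of PRBS7 and the function name are assumptions for illustration; this is not a description of the PRBS block on the test IC.

```python
def prbs7(n_bits: int, seed: int = 0x7F) -> list:
    """Generate n_bits of a PRBS7 sequence (x^7 + x^6 + 1) from a 7-bit LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR stays at zero")
    bits = []
    for _ in range(n_bits):
        # Feedback is the XOR of the two most significant register bits.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


pattern = prbs7(127)  # one full period

# Longest run of identical bits: long runs stress low-frequency settling,
# while isolated bits stress the fastest transitions.
longest_run = 1
current_run = 1
for previous, current in zip(pattern, pattern[1:]):
    current_run = current_run + 1 if current == previous else 1
    longest_run = max(longest_run, current_run)

print("ones in one period:", sum(pattern))
print("longest run of identical bits:", longest_run)
```

A pattern of this kind contains both isolated bits and long runs, so a simulation that takes timings at every transition sees a range of line histories before each edge.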

There are several methods to minimize ISI. The first is to minimize reflections on the bus by avoiding impedance discontinuities and minimizing stub lengths and large parasitics. Second, the interconnects between the links need to be kept as short as possible, and coupled traces such as microstrip lines should not be placed too close together. Lastly, crosstalk effects have to be minimized to decrease the ISI noise.

2.2 Power Integrity Characterization in Chip-Package System

In high-speed designs, power integrity becomes a more significant issue as packaging density and complexity increase. Power integrity is a critical issue in chip, package, and PCB design. Modern electronics save battery life by power gating parts of the circuit. Power fluctuation degrades the delay budget in the chip and package, since the noise affects both. Power integrity problems become more severe as the supply voltage is lowered, as the number of package layers grows, and when decoupling capacitors are placed incorrectly on the package. The ultimate goal of power integrity in the chip-package system is to provide an undistorted signal for the on-chip logic blocks and to produce a stable supply voltage at all levels for device operation through a low-impedance power delivery network.

2.2.1 Power Delivery Network

A power delivery network (PDN) is the network that connects the power supply to the power/ground terminals of the IC through the package and board. Ideally, it provides sufficient voltage and current for the ICs the instant that the transistors switch. However, the power delivery network consists of inductive, resistive, and capacitive components, which impede supplying an infinitely large amount of current in an infinitesimally small amount of time. The series inductance in the supply path induces undesired voltage fluctuation, VL, through the mechanism of equation (2.4) so that the current transients on the power delivery network (PDN) cause power supply noise given by:

\[ V_L = L \frac{di}{dt} \tag{2.4} \]
where di/dt is the rate of change of the current.

PDN designers aim to keep the PDN impedance below the target impedance over the whole frequency range in order to limit the power supply noise voltage, since low impedance at the frequencies of interest results in low ripple on the power supply. Low PDN impedance can therefore prevent excessive voltage fluctuations and reduce power supply noise. The target impedance of a PDN that limits the voltage ripple on the power supply is given by (2.5):

\[ Z_{\mathrm{PDN}} = \frac{V_{DD} \times \%\mathrm{ripple}}{I_{\mathrm{MAX}}} \tag{2.5} \]
where VDD is the supply voltage, %ripple is the target percentage of the voltage ripple, and Imax is the maximum current.
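Equation (2.5) is easy to evaluate numerically. The following sketch uses assumed example values (a 1.0 V supply, 5% allowed ripple, 10 A maximum current) that are not taken from the designs in this dissertation.

```python
def target_impedance(vdd_v: float, ripple_percent: float, i_max_a: float) -> float:
    """Equation (2.5): Z_PDN = VDD * (%ripple / 100) / I_MAX, in ohms."""
    if i_max_a <= 0:
        raise ValueError("maximum current must be positive")
    return vdd_v * (ripple_percent / 100.0) / i_max_a


# Assumed illustrative operating point.
z_target = target_impedance(vdd_v=1.0, ripple_percent=5.0, i_max_a=10.0)
print(f"target impedance = {z_target * 1e3:.1f} mOhm")

# Lower supply voltage and higher current both tighten the target.
z_scaled = target_impedance(vdd_v=0.8, ripple_percent=5.0, i_max_a=20.0)
print(f"scaled design    = {z_scaled * 1e3:.1f} mOhm")
```

The second case shows the trend discussed in Chapter 1: as supply voltage drops and current rises, the impedance the PDN must meet shrinks quickly.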

As a result, most PDNs in high-speed systems are composed of multilayer power and ground planes that provide a low-impedance route between the voltage regulator module (VRM) and the IC on the package and the printed circuit board (PCB), as shown in Figure 2-11. The power delivery network model consists of distributed networks of R, L, and C, as shown in Figure 2-12.

Figure 2-11. Main building blocks of the power delivery


Figure 2-12. Power delivery network

2.2.2 IR-Drop (DC Power Integrity)

IR drop represents DC power integrity, which covers the supply voltage and current delivered to the logic blocks on the die, package, and board. The drop, a voltage fluctuation, results from the resistance of the power delivery network combined with factors such as simultaneous switching of several devices, L di/dt, chip-package resonance, and inadequate decoupling capacitance, as shown in Figure 2-13. IR drop can contribute to timing uncertainty and affect the performance of the system.

Figure 2-13. System-level power loss budget for the power gated PDN [2.3]

2.2.3 AC Power Integrity

AC power integrity is determined by the placement of the mounted devices and by the power delivery network components such as the power/ground planes, vias, interconnects, decoupling capacitors, and even the VRM (voltage regulator module). Therefore, AC power integrity simulation requires more computational effort than DC power integrity analysis.

The key task in AC power integrity analysis is to identify regions of high current density in the transmission lines and power/ground planes, especially near high-speed I/O or clock-frequency switching blocks. The inductive impedance of the current path is proportional to frequency, so a high switching current across the impedance of the power delivery network creates voltage fluctuation. The goal of AC power integrity analysis is therefore to make the impedance of the power delivery network as low as possible. For example, switching I/O or core logic draws current from the power delivery network at a specific frequency, and the power delivery network has a resonant frequency at which its impedance is highest over the frequency range. The voltage fluctuations are most severe when the switching frequency of the core or I/O logic block matches the resonant frequency. Minimizing the impedance of the PDN therefore helps to avoid voltage fluctuations under switching.
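To make the resonance argument concrete, the sketch below estimates a resonant frequency under a simplifying assumption that is not stated in the text: the PDN peak is approximated by a single lumped parallel LC tank formed by a package inductance and an on-die capacitance. The component values are hypothetical.

```python
import math


def resonant_frequency(l_pkg, c_die):
    """Resonant frequency of a lumped LC tank: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_pkg * c_die))


def is_near_resonance(f_switch, f_res, tolerance=0.1):
    """True when the switching frequency lies within +/- tolerance of the resonance."""
    return abs(f_switch - f_res) <= tolerance * f_res


# Hypothetical values: 100 pH package inductance, 100 nF on-die capacitance.
f_res = resonant_frequency(l_pkg=100e-12, c_die=100e-9)
print(f"Estimated resonance: {f_res / 1e6:.1f} MHz")
print("50 MHz switching near resonance:", is_near_resonance(50e6, f_res))
print("1 GHz switching near resonance: ", is_near_resonance(1e9, f_res))
```

A real PDN has several resonances from the board, package, and die, so a single-tank estimate like this is only a first check of whether a switching frequency might land near an impedance peak.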

CHAPTER 3 SINGLE-ENDED I/O IN 3D IC FOR SIGNAL INTEGRITY CHARACTERIZATION IN 3-TIER VERTICAL CONNECTION

3.1 Introductory Remarks

On-chip data I/O transmitter circuits have been developed that directly stimulate buried nodes; the resulting transient signals of interest are detected through top-side probe pads. These circuits and a high-speed oscilloscope are used to investigate signal propagation on the various tiers of CMOS 3D test ICs. In this work, the 3D IC test structures from the MIT Lincoln Laboratory 150nm CMOS SOI process were mounted on glass slides, biased and controlled with DC probes, and tested via RF probes and a high-speed oscilloscope [3.1]. In addition, the 3D integrated circuits were ball-bonded onto an FR-4 board designed for matched 50Ω microwave and high-speed transient measurements. This greatly reduced the bias and digital control interference and jitter compared to probed measurements.

The resulting transient signals, crosstalk measurements and eye diagrams are improved strikingly in noise performance compared to the RF and DC probed measurements.

3.2 GTL I/O Test IC

A previously proposed single-ended test IC design [3.2] based on GTL (Gunning Transceiver Logic) I/O is used to analyze the signal integrity of a single-ended I/O chip-package system and to validate the package electrical model. GTL I/O is widely used in personal computer applications, especially the front-side bus and the Accelerated Graphics Port (AGP), which is the interface between the processor (CPU or GPU) and the processor chipset [2.1]. GTL drivers generate signals with faster edge rates than push-pull drivers since they do not have slow PMOS devices, so they can be used to validate single-ended package electrical models at higher frequencies. The GTL I/O test IC includes 2 ports of GTL drivers, GTL clock and data receivers, and a serial-to-parallel converter to fire data patterns of different data depths across the two GTL I/O drivers, as shown in Figure 3-1.

Figure 3-1. Overall GTL I/O test IC structure

In the test IC, a serial-to-parallel converter is used to load the serial data from external test equipment. In this research, a 16-bit serial input buffer was designed that outputs 2 bits at a time and repeats the sequence indefinitely, since there are two GTL I/O driver ports. In addition, a data depth controller is designed so that the periodic bit pattern is controlled by the external depth control signal, which allows the noise to settle partially prior to the next bit and the resonances to be sufficiently excited. However, a single bit transition investigates the package power and ground noise more efficiently since it allows longer-term settling.

The GTL I/O driver is simply an open-drain circuit: the drain of an NMOS transistor, which can be shunted to ground, is connected to one end of a transmission line, and the other end of the line is pulled up to the termination voltage. Turning the NMOS on and shunting the net to ground generates a low signal. The NMOS typically has a very low equivalent resistance. Turning the NMOS device off and letting the termination resistor pull the net high generates a high transition. Figure 3-2 shows a 2-port GTL driver with an off-chip 50Ω load.
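The static output levels of such an open-drain driver follow from a resistive divider between the termination resistor and the NMOS on-resistance. The sketch below is a DC-only illustration; the termination voltage and on-resistance are hypothetical values, and only the 50Ω load comes from the text.

```python
def gtl_output_levels(v_tt, r_term, r_on):
    """Return (v_high, v_low) for an open-drain driver pulled up to v_tt.

    NMOS off: no DC current flows, so the net sits at the termination voltage.
    NMOS on:  r_on and r_term form a divider between v_tt and ground.
    """
    v_high = v_tt
    v_low = v_tt * r_on / (r_on + r_term)
    return v_high, v_low


# Assumed: 1.5 V termination voltage and 5 ohm NMOS on-resistance (both hypothetical).
v_high, v_low = gtl_output_levels(v_tt=1.5, r_term=50.0, r_on=5.0)
print(f"High level: {v_high:.3f} V")
print(f"Low level:  {v_low:.3f} V")
print(f"Swing:      {v_high - v_low:.3f} V")
```

The lower the on-resistance relative to the termination resistor, the closer the low level gets to ground, which is why the text notes that the NMOS typically has a very low equivalent resistance.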


Figure 3-2. GTL I/O driver

3.3 Characterizing Signal Integrity of High-Speed I/O in 3D ICs

In 3D stacked ICs, signal integrity is a bigger issue than in conventional 2D integrated structures due to large substrate-to-substrate parasitic effects. TSVs in 3D structures support higher packing density and efficient interconnection. However, the TSVs can be the site of signal integrity issues such as SSN (simultaneous switching noise) and ISI (inter-symbol interference).

Figure 3-3. GTL I/O test IC structure

In this 3D system, GTL (Gunning Transceiver Logic) test structures were designed for characterizing the performance of the signals traveling through 3D IC TSVs and interconnects.

In order to look at the effects of TSV high-speed transients on data communication, a set of single-ended GTL test structures was placed on the 3 tiers of the 3D stacked IC.

Figure 3-3 [3.2] shows the block diagram of the GTL on-chip test system. The authors sent high-speed signals from the 3D IC vertical tiers (Tier A, Tier B and Tier C) to the probe point of the IC. Through this test structure, the crosstalk between vertical connections can be measured. By observing the eye diagram for different data patterns, applied via a programmed data stream or a 2^7-1 PRBS (Pseudo Random Bit Sequence), the inter-symbol interference (ISI) and data-dependent jitter (DDJ) are characterized as well.
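A 2^7-1 PRBS can be produced with a 7-bit linear feedback shift register. The text does not state which feedback polynomial was used, so the sketch below assumes the commonly used x^7 + x^6 + 1; the function name and seed are illustrative only.

```python
def prbs7(seed=0x7F, n_bits=127):
    """Generate n_bits from a 7-bit LFSR with the assumed polynomial x^7 + x^6 + 1."""
    if not 0 < seed < 128:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = seed
    bits = []
    for _ in range(n_bits):
        # Feedback is the XOR of the two most significant register bits.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        # The most significant bit is taken as the serial output.
        bits.append((state >> 6) & 1)
        # Shift left by one and insert the feedback bit, keeping 7 bits.
        state = ((state << 1) | new_bit) & 0x7F
    return bits


sequence = prbs7(n_bits=254)
first_period, second_period = sequence[:127], sequence[127:]
print("Repeats after 127 bits:", first_period == second_period)
print("Ones in one period:", sum(first_period))
```

Because the sequence contains every run length up to seven bits, it exercises both long runs of identical bits and isolated transitions, which is what makes it useful for exposing ISI and data-dependent jitter in an eye diagram.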

Figure 3-4. On-chip measurement setup

3.4 Limitation of On-chip Measurement

On-chip noise and jitter generated in high-speed I/O measurements using RF and DC probes make it difficult to assess the chip transient performance. The GTL I/O measurements were very noisy and showed a high level of jitter due to interference through the on-chip probe connections, through the bias and low-speed digital control ports, and through nonplanar DC-pin probe connections. The large number of pins in the DC probe made it difficult to feed stable DC bias signals to the chip. There were 20 DC and RF probe connections in total to the chip from four sides, as shown in Figure 3-5 (a).

Figure 3-5. Photograph of GTL I/O test IC and wire-bonded PCB. A) Chip microphotograph, B) Wire-bonded FR-4 PCB

In order to overcome these limitations, the 3D IC was wire-bonded onto a custom FR-4 board that provides matched 50Ω microwave transmission lines to the transient test equipment, as shown in Figure 3-5 (b). The custom FR-4 board consists of 3 ground planes and 1 top-side signal plane with transmission lines. Wire-bonding the IC onto the custom FR-4 board resulted in less interference and jitter from the analog bias ports and the digital control ports. As displayed in Figure 3-7, the on-chip eye diagram of each tier is very noisy compared to the off-chip eye diagram in Figure 3-8.


Figure 3-6. Off-chip measurement setup

Figure 3-7. Eye-diagram of each tier in on-chip measurement. A) Tier A, B) Tier B and C) Tier C (all panels: Horz: 200ps/div, Vert: 100mV/div)

3.5 Comparison between On-chip and Off-chip Measurements

3.5.1 High-Speed Transient Measurement

An eye diagram is one of several time-domain analyses used to evaluate transient behavior in high-speed I/O, and it can characterize jitter and amplitude.

The width and height of the eye correspond to important signal integrity parameters. As shown in Figure 3-8, the eye diagrams of the off-chip measurements are much more open than those of the on-chip measurements.

Figure 3-8. Eye-diagram of each tier in off-chip measurement. A) Tier A, B) Tier B, and C) Tier C (all panels: Horz: 200ps/div, Vert: 100mV/div)

In addition, Tier B has the smallest eye height and width among the tiers for both off-chip and on-chip measurements. This is because the eye width is significantly impacted by jitter due to coupling from nearby tiers. The measured eye parameters are shown in Table 3-1.

Table 3-1. Eye diagram measurement results (1Gbps)

| Tiers | On-chip height | On-chip width | PCB height | PCB width | Jitter |
|---|---|---|---|---|---|
| Tier C | 347.0mV | 899ps | 433.0mV | 819ps | 29.5ps |
| Tier B | 216.0mV | 746ps | 432.0mV | 717ps | 47.2ps |
| Tier A | 300.2mV | 890ps | 443.0mV | 773ps | 37.9ps |
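The values in Table 3-1 can be normalized for easier comparison. The sketch below expresses each eye width as a fraction of the unit interval (1000 ps at 1 Gbps) and computes the ratio of PCB to on-chip eye height. The dictionary layout and function names are illustrative; the numbers are copied from the table.

```python
UI_PS = 1000.0  # one unit interval at 1 Gbps, in picoseconds

# tier: (on-chip height mV, on-chip width ps, PCB height mV, PCB width ps)
EYE_TABLE = {
    "Tier C": (347.0, 899.0, 433.0, 819.0),
    "Tier B": (216.0, 746.0, 432.0, 717.0),
    "Tier A": (300.2, 890.0, 443.0, 773.0),
}


def summarize(eye_table, ui_ps=UI_PS):
    """Normalize eye widths to the unit interval and compute the eye-height ratio."""
    summary = {}
    for tier, (h_on, w_on, h_pcb, w_pcb) in eye_table.items():
        summary[tier] = {
            "on_chip_width_ui": w_on / ui_ps,
            "pcb_width_ui": w_pcb / ui_ps,
            "height_ratio_pcb_to_on_chip": h_pcb / h_on,
        }
    return summary


def smallest_eye_height(eye_table, column):
    """Return the tier with the smallest value in the given column (0 = on-chip, 2 = PCB height)."""
    return min(eye_table, key=lambda tier: eye_table[tier][column])


rows = summarize(EYE_TABLE)
for tier, values in rows.items():
    print(tier, {name: round(value, 3) for name, value in values.items()})
print("Smallest on-chip eye height:", smallest_eye_height(EYE_TABLE, 0))
print("Smallest PCB eye height:    ", smallest_eye_height(EYE_TABLE, 2))
```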

3.5.2 Crosstalk

Crosstalk is also an important factor affecting signal integrity. It is caused by mutual capacitance between interconnects, coupled through electric fields, and by mutual inductance between interconnects, coupled through magnetic fields.

Figure 3-9. GTL I/O crosstalk test structure. A) NEXT, B) FEXT

As shown in Figure 3-9, the 3D IC GTL I/O driver has two output channels, which can serve as an aggressor line and a victim line in each tier. In order to measure NEXT or FEXT, the aggressor line should be switching while the victim line is quiet. In the case of NEXT, only Tier C has to be active, and for FEXT, only Tier A has to be active. Figure 3-10 shows the on-chip and off-chip measurement results of NEXT/FEXT crosstalk through the top-side probe pads after the signals are conducted through the TSVs of the vertical tiers in the 3D IC.

Figure 3-10. On-chip FEXT’s A) aggressor, B) victim, Off-chip FEXT’s C) aggressor, D) victim, On-chip NEXT’s E) aggressor, F) victim, Off-chip NEXT’s G) aggressor, H) victim. Scales: A) Horz: 200ps/div, Vert: 45mV/div; B) Horz: 100ps/div, Vert: 100mV/div; C) Horz: 200ps/div, Vert: 86.5mV/div; D) Horz: 200ps/div, Vert: 58.3mV/div; E) Horz: 100ps/div, Vert: 94.5mV/div; F) Horz: 200ps/div, Vert: 100mV/div; G) Horz: 200ps/div, Vert: 19.2mV/div; H) Horz: 200ps/div, Vert: 77.4mV/div

Measuring crosstalk is essential to check whether the signal integrity is impacted by packaging the 3D IC on FR-4 boards, since the wire-bonds in the signal path increase the series inductance. The pairs of aggressor and victim lines, (a,b), (c,d), (e,f) and (g,h) in Figure 3-10, show that signals on one line can appear on other lines due to mutual capacitance, as a result of the coupling of the electric fields between the two lines. In order to reduce this capacitance, the line spacings on the board were made as wide as possible, and a thin dielectric was used to place the lines close to the ground plane and reduce coupling. The noise of the off-chip crosstalk measurements was much less than that of the on-chip measurements.

Table 3-2. Crosstalk measurement results

| Crosstalk | On-chip NEXT | On-chip FEXT | Off-chip NEXT | Off-chip FEXT |
|---|---|---|---|---|
| Duration | 806ps | 404ps | 595ps | 526ps |

Based on Table 3-2, the NEXT of both the on-chip and off-chip systems had a longer duration than the FEXT. The FEXT result indicates that capacitive crosstalk is dominant between Tier A and Tier C.
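The link between the FEXT polarity and the dominant coupling mechanism can be illustrated with the textbook weak-coupling approximations for uniform coupled lines. These expressions are an assumption brought in for illustration, they may not hold quantitatively for a TSV stack, and the coupling ratios below are hypothetical.

```python
def crosstalk_coefficients(c_mutual, c_self, l_mutual, l_self):
    """Weak-coupling approximations for two uniform coupled lines.

    NEXT coefficient: 1/4 * (Cm/C + Lm/L), where both mechanisms add.
    FEXT polarity term: 1/2 * (Cm/C - Lm/L), where the two mechanisms oppose
    each other, so a positive value means capacitive coupling dominates.
    """
    cap_ratio = c_mutual / c_self
    ind_ratio = l_mutual / l_self
    k_next = 0.25 * (cap_ratio + ind_ratio)
    k_fext_sign = 0.5 * (cap_ratio - ind_ratio)
    return k_next, k_fext_sign


def dominant_coupling(k_fext_sign):
    """Classify the far-end crosstalk by the sign of the polarity term."""
    if k_fext_sign > 0:
        return "capacitive"
    if k_fext_sign < 0:
        return "inductive"
    return "balanced"


# Hypothetical per-unit-length ratios: Cm/C = 0.10 and Lm/L = 0.05.
k_next, k_fext_sign = crosstalk_coefficients(0.10, 1.0, 0.05, 1.0)
print(f"NEXT coefficient:   {k_next:.4f}")
print(f"FEXT polarity term: {k_fext_sign:.4f} ({dominant_coupling(k_fext_sign)})")
```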

3.5.3 EM Simulation and Transient Simulation Results

The authors selected ANSYS’s High Frequency Structure Simulator (HFSS) [3.3] as the electromagnetic (EM) modeling tool to investigate the fabricated IC, the probes and the FR-4 testing boards. S-parameters are simulated to capture the electrical characteristics of the GTL I/O test structure. This is done for the on-chip measurement model, from the output port of each tier to the top-side probe pads, and for the off-chip measurement model, from the output port of each tier to the end of the transmission line on the board. Using HFSS, both the on-chip and off-chip models have been implemented for the same 3D IC model, as shown in Figure 3-11. In order to find the S-parameters, especially S21, which is the insertion loss, the input impedance of the I/O driver was set by Spectre simulation in Cadence, and the output port was terminated with 50Ω.

Figure 3-11. Modeling in HFSS. A) on-chip, B) off-chip

The insertion loss results are obtained sequentially for Tier C, B and A, as shown in Figure 3-12. Based on the results, the insertion loss of Tier A is the largest and that of Tier C is the smallest among the tiers. However, the results are not identical to the experimental results, since the parasitic capacitances between the tiers and metal layers are not fully included in the HFSS EM simulation.

Figure 3-12. S21 magnitude of on-chip and off-chip (Horz: 100MHz/div, Vert: 0.1dB/div)

As shown in Table 3-3, the off-chip insertion loss is bigger than the on-chip insertion loss due to ball-bonding on FR-4 board as well as the transmission line effects.

Table 3-3. Simulated On-chip and Off-chip insertion loss

| S21(dB) | On-chip Out1 (mV) | On-chip Out2 (mV) | Off-chip Out1 (mV) | Off-chip Out2 (mV) |
|---|---|---|---|---|
| Tier C | 0.019 | 0.016 | 1.458 | 1.460 |
| Tier B | 0.024 | 0.025 | 1.463 | 1.464 |
| Tier A | 0.027 | 0.029 | 1.466 | 1.468 |
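To relate an insertion loss to a signal amplitude, a loss in dB can be converted to a voltage ratio. The sketch below assumes that the tabulated values are S21 insertion-loss magnitudes in dB, which is an interpretation since the table header mixes dB and mV, and it ignores any frequency dependence of the loss.

```python
def insertion_loss_to_voltage_ratio(loss_db):
    """Convert an insertion loss in dB (positive number) to an output/input voltage ratio."""
    return 10.0 ** (-loss_db / 20.0)


# Tier C, Out1 values from Table 3-3, interpreted as dB of loss (assumption).
on_chip_ratio = insertion_loss_to_voltage_ratio(0.019)
off_chip_ratio = insertion_loss_to_voltage_ratio(1.458)

print(f"On-chip  output/input ratio: {on_chip_ratio:.4f}")
print(f"Off-chip output/input ratio: {off_chip_ratio:.4f}")
```

Under this interpretation the on-chip path passes the signal almost unchanged, while the off-chip path attenuates the amplitude by roughly 15%.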

The S-parameters are utilized for transient simulations in ADS [3.4] in order to compare measurement results and simulation results of the GTL I/O structure in the 3D IC as shown in Figure 8. Using S-parameters to represent the board line, the time-domain simulation is performed with frequency-domain data. The on-chip and off-chip transient results can be verified by comparing them to the simulation in ADS. In this transient simulation, a 1Gbps (2^7-1) PRBS component is used as the input data pattern. The input data pattern goes through a modeling component (SNP) in ADS, which contains the frequency-domain characteristic from the HFSS modeling. The component represents how much the output signals are degraded by the on-chip and off-chip insertion loss given by the S-parameters.

Figure 3-13. Transient simulations with S21 parameter of Tier C. A) On-chip, B) Off-chip (both panels: Horz: 5.0ns/div, Vert: 100mV/div)

In (a) of Figure 3-13, the input and output pulses differ little in amplitude, which means that the input signal is hardly affected by the S-parameters since the insertion loss values are small. However, in (b) of Figure 3-13, the output pulse, which is 513.2mV, is smaller than the input pulse. There is less than a 10% difference between the simulated and the actual measurement results, which are shown in Table 3-1. In addition, a channel simulator is used in ADS to evaluate the eye diagram of the off-chip system, and the transient simulation results of each tier, as affected by the insertion loss in Table 3-3, are shown in Figure 3-14. The results in Figure 3-14 are consistent with the off-chip system measurement results in Figure 3-8.

Figure 3-14. Eye diagram of each tier in off-chip transient simulation in ADS. A) Tier A, B) Tier B and C) Tier C (all panels: Horz: 2.0ns/div, Vert: 100mV/div)

3.6 Additional On-Board Measurements

3.6.1 3D IC GTL I/O Response to Varying Data Patterns

As shown in Figure 3-15, the rise time and voltage amplitude vary with the data pattern. Based on the results, the data pattern 00110110 has the most delayed rise time at the first edge of the pulses compared to the other data patterns, since this data pattern has more ISI. ISI occurs when the serial stream contains a number of bits of the same value followed by a short run of bits of the opposite value.

Figure 3-15. Different data patterns at Tier C. A) 00011001 (Horz: 2.0ns/div, Vert: 100mV/div), B) 00110110 (Horz: 2.0ns/div, Vert: 97.8mV/div) and C) 01000000 (Horz: 1.0ns/div, Vert: 100mV/div)

Table 3-4 shows the performance of each tier with different data patterns.

Table 3-4. Different data patterns measurement results

| Tier | 00011001 Rise time | 00011001 Vamp(mV) | 00110110 Rise time | 00110110 Vamp(mV) | 01000000 Rise time | 01000000 Vamp(mV) |
|---|---|---|---|---|---|---|
| C | 460ps | 566.0 | 550ps | 577.4 | 366ps | 551.4 |
| B | 450ps | 548.1 | 460ps | 553.0 | 431ps | 530.3 |
| A | 440ps | 528.7 | 550ps | 549.7 | 335ps | 544.9 |

3.6.2 The Edge Timing Comparing Signals Launched from Several 3D IC Tiers Simultaneously Launched to the Same I/O Output

The signals launched simultaneously from several 3D IC tiers to the same I/O output are detected at the surface probe point. The edge timing difference is shown in Figure 3-16.

Figure 3-16. Edge timing difference of output pulse when turning on several tiers simultaneously @ Vsupply = 1.4V. A) Tier A&B (Horz: 399.6ps/div, Vert: 100mV/div), B) Tier A&C, C) Tier B&C and D) Tier A&B&C (B to D: Horz: 500.0ps/div, Vert: 100mV/div)

Based on the measurement results, the delay caused by the TSV parasitic capacitance between tiers can be calculated for Tier A&B, Tier A&C, Tier B&C and Tier A&B&C. Tier B&C has two pulses of the same amplitude, which allows the edge timing difference to be measured directly. However, the pulses of Tier A&C and Tier A&B have different amplitudes due to conductor DC loss, which is caused by the resistive component of the transmission lines, so their edge timing differences are derived from the time differences between the 50% pulse amplitudes. The edge timing differences between tiers in the 3D IC are shown in Table 3-5.

Table 3-5. Edge timing difference between tiers

| Tiers | Edge timing difference |
|---|---|
| Tier A&B | 65ps |
| Tier B&C | 48ps |
| Tier A&C | 100ps |
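The 50% amplitude method described above can be sketched as follows. Each edge is referenced to 50% of its own amplitude, so two pulses of different height can still be compared. The waveforms here are synthetic linear ramps made up for illustration, not measured data, and the function names are hypothetical.

```python
def crossing_time(times, volts, fraction=0.5):
    """Time at which a rising edge first crosses `fraction` of its own amplitude.

    Linear interpolation is used between the two samples that bracket the threshold.
    """
    v_min, v_max = min(volts), max(volts)
    threshold = v_min + fraction * (v_max - v_min)
    for i in range(1, len(volts)):
        v0, v1 = volts[i - 1], volts[i]
        if v0 < threshold <= v1:
            return times[i - 1] + (threshold - v0) * (times[i] - times[i - 1]) / (v1 - v0)
    raise ValueError("no rising crossing found")


def edge_timing_difference(times, volts_a, volts_b, fraction=0.5):
    """Difference between the crossing times of two edges sampled on a shared time axis."""
    return crossing_time(times, volts_b, fraction) - crossing_time(times, volts_a, fraction)


def ramp(t, start, rise, amplitude):
    """Synthetic rising edge: 0 before `start`, linear rise over `rise`, then flat."""
    return amplitude * min(max((t - start) / rise, 0.0), 1.0)


# Synthetic edges on a 10 ps grid: different amplitudes, second edge launched 40 ps later.
times_ps = [10.0 * k for k in range(101)]
edge_a = [ramp(t, start=100.0, rise=200.0, amplitude=0.5) for t in times_ps]
edge_b = [ramp(t, start=140.0, rise=200.0, amplitude=0.4) for t in times_ps]

delta_ps = edge_timing_difference(times_ps, edge_a, edge_b)
print(f"Edge timing difference: {delta_ps:.1f} ps")
```

Using a relative threshold instead of a fixed voltage keeps the amplitude difference caused by DC loss from being misread as a timing difference.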

In order to find the edge timing difference between tiers, the supply voltage has to be adjusted to slightly below 1.5V. If the voltage is at 1.5V, the output pulse shows only one amplified signal rather than two distinct pulses, as shown in Figure 3-17 and Table 3-6.

Figure 3-17. Amplified output pulse when turning on several tiers simultaneously @Vsupply = 1.5V. A) Tier A&B (Horz: 500.0ps/div, Vert: 99.3mV/div), B) Tier A&C, C) Tier B&C and D) Tier A&B&C (B to D: Horz: 500.0ps/div, Vert: 100mV/div)

Table 3-6. Simultaneously launching to the same I/O output

| Output | Tier A&B | Tier A&C | Tier B&C | Tier A&B&C |
|---|---|---|---|---|
| Risetime | 432ps | 302ps | 432ps | 347ps |
| Jitter | 3.3ps | 2.4ps | 2.1ps | 5.1ps |
| Vamp(mV) | 663.52 | 723.32 | 742.78 | 744.40 |

3.6.3 GTL I/O Performance under Degraded Environmental Conditions in the 3D IC such as Reduced Bias

The data patterns are degraded by the parasitic capacitance and ISI (inter-symbol interference) between the tiers when the data patterns are fired from more than two tiers simultaneously to the same I/O output while the supply voltage is varied, as shown in Figure 3-18.

Figure 3-18. Different data pattern, 00110110 with degrading Vsupply at Tier A. A) 1.3V, B) 1.4V and C) 1.5V (all panels: Horz: 2.0ns/div, Vert: 100mV/div)

3.7 Summary

GTL high-speed I/O test structures were fabricated in MIT Lincoln Laboratory (MIT LL) 3D ICs to characterize TSV and interconnect signal integrity. The approach the author proposed is a non-invasive test in which the signal integrity of the TSVs and interconnect, together with the IC packaging (wire-bonding), can be measured using internal test points. The measurements were performed both on-chip with probes and off-chip on a board to determine the performance differences. The board measurements had superior noise characteristics. The fabricated models of the 3D IC interconnect were analyzed using EM simulation in HFSS and transient simulation in ADS. The extensive additional on-board measurement results are used to characterize the TSVs and interconnect in 3D stacked ICs.

CHAPTER 4 SIGNAL INTEGRITY CHARACTERIZATION OF HIGH-SPEED I/O IN TWO TIER 3D-IC AND PACKAGE SYSTEM

4.1 Introductory Remarks

The authors proposed to efficiently validate TSV and interconnect performance using GTL I/O drivers in a new 3D chip-package system. The test ICs were designed and fabricated in the Tezzaron two-tier 3D CMOS 150nm technology. For on-chip measurements, the 3D ICs were mounted on glass slides and tested through RF probes and a high-speed oscilloscope. To provide the anticipated information for the 3D chip-package system, the 3D IC test structures were ball-bonded onto a custom FR-4 board with matched 50Ω lines for high-speed transient measurements. The transient signals from the on-chip measurements suffer interference from the noise of several DC and RF probes; the off-chip measurements, however, have improved noise performance compared to the on-chip measurements.

4.2 Characterizing TSV and Interconnect of High-Speed I/O in 3D IC-Package System

A single-ended I/O driver introduced in [4.1] is designed and used to measure 1Gbps transient signals to characterize the TSV and interconnect in a Tezzaron 3D CMOS 150nm technology. The chip consists of top and bottom tiers with M6-M6 (metal 6 to metal 6) connections between the tiers as shown in Figure 4-1 [4.2].

Two GTL drivers were designed on each tier in order to characterize the TSV performance. The goal was to compare outputs from the drivers that go straight out through the TSVs to the backside metal (BSM) to signals that go to the BSM and then up and down through the tiers twice before coming back to the probing points on the BSM.

Figure 4-1. Tezzaron 3D CMOS structure. A) 3D chip structure, B) Cross-cut

There are four test strategies: 1. TS (I/O driver is in the Top tier with a straight-forward route), 2. TR (I/O driver is in the Top tier with a round-trip route), 3. BS (I/O driver is in the Bottom tier with a straight-forward route) and 4. BR (I/O driver is in the Bottom tier with a round-trip route), as shown in Figure 4-2 and Figure 4-3.

Figure 4-2. Test strategies (TS, TR, BS and BR) in 3D-IC

Figure 4-3. Example of test methods in Top tier of 3D-IC. (a) Straight-forward, (b) Round-trip, (c) TSV location in crosscut

For reference, the dimensions of the 3D-ICs are presented in Table 4-1 and Table 4-2.

Table 4-1. Dimension of 3D-IC

| Item | Thickness(um) |
|---|---|
| WTOP and WBOTTOM | 12 |
| Metal Layers | 6 |
| TSV of WTOP | 6 |

Table 4-2. Dimension of each test strategy in 3D-IC

| Path | Straight-Forward(um) | Round-trip(um) |
|---|---|---|
| WTOP to Pad | 620.7 | 752.6 |
| WBOTTOM to Pad | 460.7 | 633.8 |

Figure 4-4 shows the I/O pad arrangement and the 3D chip layout with 4 different test strategies. The size is 2050um by 1798um.


Figure 4-4. 3D layout of GTL I/O 3D test IC.

4.3 Simulation and Measurement Results in the 3D Chip-Package System

In Figure 4-5, the simulation result shows the output signal switching at 1Gbps before it is launched onto the TSVs and interconnects. Each tier can be enabled separately by tier control signals to generate the output signal, as the simulation shows. However, the on-chip transient measurement performance was impossible to assess since the on-chip noise and jitter generated in high-speed I/O measurements using RF and DC probes corrupted the data.

Figure 4-5. On-chip transient simulation results with 1Gbps

To overcome these limitations, the 3D-IC was ball-bonded to the FR-4 board, which made it possible to measure the transient performance with 1Gbps data.

Figure 4-6. Photograph of GTL I/O test IC and wire-bonded PCB. A) Chip microphotograph, B) Wire-bonded FR-4 PCB

4.3.1 High-Speed Transient Measurement

Using four different strategies, off-chip measurements were performed in order to characterize the TSV and interconnect. The switching data signals are compared between the strategies as shown in Figure 4-7 and in Table 4-3.

Figure 4-7. Off-chip transient measurement with 1Gbps. A) Top tier – straight-forward (Vert: 26.7mV/div), B) Top tier – round-trip (Vert: 26.7mV/div), C) Bottom tier – straight-forward (Vert: 23.0mV/div) and D) Bottom tier – round-trip (Vert: 32.3mV/div) (all panels: Horz: 200ps/div)

Table 4-3. Off-chip transient measurement results

| | TS | TR | BS | BR |
|---|---|---|---|---|
| Risetime | 173ps | 213ps | 228ps | 427ps |
| Jitter | 3.3ps | 3.4ps | 3.9ps | 4.2ps |
| V_amp | 115.44mV | 114.49mV | 111.15mV | 112.63mV |
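One way to read the rise times in Table 4-3 is to convert them to an approximate path bandwidth with the rule of thumb BW ≈ 0.35 / t_r. This rule assumes a 10% to 90% rise time and a single-pole response, neither of which is stated in the text, so the sketch below is only a rough comparison between strategies.

```python
RISE_TIMES_PS = {"TS": 173.0, "TR": 213.0, "BS": 228.0, "BR": 427.0}  # from Table 4-3


def bandwidth_from_rise_time(rise_time_s):
    """Rule-of-thumb bandwidth in Hz, assuming a 10-90% rise time and a single-pole response."""
    return 0.35 / rise_time_s


bandwidths_ghz = {
    strategy: bandwidth_from_rise_time(t_ps * 1e-12) / 1e9
    for strategy, t_ps in RISE_TIMES_PS.items()
}

for strategy, bw in bandwidths_ghz.items():
    print(f"{strategy}: about {bw:.2f} GHz")
```

Under these assumptions the ordering TS, TR, BS, BR from highest to lowest estimated bandwidth simply mirrors the ordering of the measured rise times.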

Based on the rise time and jitter from the test results, the board output data from the GTL I/O driver in the top tier showed better performance than that from the driver in the bottom tier, since output signals from the top tier had less interference than those from the bottom tier, as shown in Table 4-2. This is because the output signal from the bottom tier travels through all metal layers to the top tier (connected through the M6 layer). Also, the output signals from the straight-forward path perform better than those from the round-trip path, since the straight-forward signal route has 1 TSV instead of the 5 TSVs in the round-trip route.

4.3.2 Transient Measurement with Data Patterns

The authors measured the transients of the data pattern “01010100” and then extracted the rise time and voltage amplitudes to analyze the effects of the different signal paths. This is shown in Figure 4-8.

Figure 4-8. Data pattern, “01010100”. A) TS (Vert: 33.4mV/div), B) TR (Vert: 34.8mV/div), C) BS (Vert: 40.7mV/div) and D) BR (Vert: 30.4mV/div) (all panels: Horz: 2.0ns/div)

Based on the results presented in Table 4-4, the BR data pattern has the most delayed rise time for the first edge of the pulses compared to the other strategies, since the bottom-tier round-trip route is long and has more TSVs and interconnect.

Table 4-4. Data pattern, “01010100” measurement results

| | TS | TR | BS | BR |
|---|---|---|---|---|
| Risetime | 170ps | 320ps | 180ps | 370ps |
| V_amp | 120.35mV | 136.14mV | 127.65mV | 105.71mV |

For reference, the author measured another data pattern which is “01001001” to make sure that the results are consistent compared to the previous results.

Figure 4-9. Data pattern, “01001001”. A) TS (Vert: 25.5mV/div), B) TR (Vert: 31.5mV/div), C) BS (Vert: 40.4mV/div) and D) BR (Vert: 25.5mV/div) (all panels: Horz: 2.0ns/div)

The results are consistent with the “01010100” pattern: the output signal from BR has the worst performance, with the most delayed rise time and the smallest voltage amplitude, due to conductor DC loss caused by the resistive component of the transmission lines, as shown in Figure 4-9 and Table 4-5.

Table 4-5. Data pattern, “01001001” measurement results

| | TS | TR | BS | BR |
|---|---|---|---|---|
| Risetime | 200ps | 310ps | 350ps | 420ps |
| V_amp | 136.36mV | 128.06mV | 151.80mV | 125.03mV |


4.4 Summary

In this chapter, GTL high-speed I/O test ICs were designed and fabricated to characterize TSV and interconnect signal integrity in a Tezzaron two-tier 150nm 3D technology, and the 3D ICs were ball-bonded onto FR-4 boards to compare the chip with the chip-package system. On-chip simulations and off-chip measurements were performed to determine the performance differences among 4 different testing strategies. The authors performed high-speed transient measurements with different data patterns and confirmed that the performance is affected by the length of the interconnect and the number of TSVs in the 3D stacked IC.

The contribution of this research is that the signal integrity characterization can be performed using internal test points, which is non-invasive.

CHAPTER 5 ON-CHIP 20GBPS HIGH-SPEED IC TEST SYSTEM FOR SIGNAL INTEGRITY CHARACTERIZATION IN FLIP-CHIP PACKAGE

5.1 Introductory Remarks

A 20Gbps high-speed IC test system design is presented for signal integrity characterization in a chip-package system. In the previous chapters, Chapter 3 and Chapter 4, 1Gbps single-ended I/O test ICs were used for TSV and interconnect signal integrity characterization, and both simulation and measurement results were presented. However, as mentioned in Chapter 1, both on-chip local clock rates and off-chip data rates are increasing to enable higher performance. Hence, for such high-speed applications, investigating the signal integrity effects of I/O drivers and validating the accuracy of package electrical models are increasingly essential when using 20Gbps high-speed I/O. However, it is difficult for high-speed clock and data signals generated by external sources to pass through the on-chip circuits [5.1]. Also, measurement costs are high since CMOS components are replaced by expensive equipment. To overcome such limitations, the authors proposed an on-chip 20Gbps high-speed I/O IC test system for signal integrity characterization in flip-chip packages.

Figure 5-1. Overall block diagram for High-Speed I/O IC test system in Flip-chip packages for signal integrity characterization

As shown in Figure 5-1, the on-chip high-speed I/O test IC is mounted on the die, which is connected to the PCB using flip-chip packaging to achieve high-speed data rates. A test vehicle has two different test patterns: one is the coupled microstrip lines pattern and the other is coupled microstrip lines with multiple patterned vias. Hence, using the test vehicle and the on-chip high-speed I/O test IC, it is possible to perform the signal integrity characterization at 20Gbps.

5.2 Characterizing Interconnect Signal Integrity of High-Speed I/O in Flip-Chip Package

Signal integrity is one of the major issues in circuit and package design as the complexity of the interconnect increases in flip-chip packages. In order to perform a high-speed transient simulation, differential I/O drivers generate differential signals, which travel through the C4 bumps between the die and the package to the probe point on a PCB, as shown in Figure 5-2. In the package, a test vehicle with differential inner traces is modeled in order to analyze signal integrity parameters such as timing, crosstalk, ISI (inter-symbol interference) and transmission line effects at the 20Gbps data rate. The authors selected a flip-chip packaging technology since it provides better impedance matching and reduced interconnection losses in comparison to wire-bonding. Also, differential I/O is implemented instead of single-ended I/O, and a Rogers 4350B PCB is used as the package instead of an FR-4 board due to the high performance requirement. In addition, the difference in electrical performance between dielectrics such as silicon and glass will be discussed in terms of the S-parameters.

Figure 5-2. Concept of signal integrity characterization in flip-chip package

5.3 Demonstration of a Proposed On-Chip 20Gbps High-Speed I/O Test IC

As mentioned in the previous section, the interconnect signal integrity simulation is performed with the proposed on-chip high-speed I/O test IC. The test IC consists of an 8-modulus 20-GHz Phase-Locked Loop (PLL) as a clock generator, a 20-Gbps full-rate 2^7-1 PRBS generator and 4-port 20-Gbps differential CML I/O logic, as shown in Figure 5-3. All components are designed with the proposed techniques and simulated in UMC 90nm technology.

Figure 5-3. Overall block diagram for On-chip High-Speed I/O test IC

First of all, a high-frequency clock signal is provided by the 20GHz PLL, which is integrated on the chip so that only a lower-frequency reference clock needs to be fed through the package or the die. In the PLL, the LC-VCO generates a 20GHz clock that drives the PRBS generator and the CML I/O driver. Also, an on-chip PRBS generator must be implemented in order to provide random data to the CML I/O driver. Lastly, the CML I/O driver fires the data into the interconnect of the die and the package through C4 bumps for interconnect signal integrity characterization.

5.3.1 8-modulus 20-GHz Phase-Locked Loop in 90nm CMOS

A 20GHz integer-N charge-pump PLL is designed with an 8-modulus prescaler, a 3-state phase frequency detector (PFD), a charge pump, a second-order passive loop filter, an LC-tank Voltage Controlled Oscillator (VCO), a modified Cherry-Hooper amplifier, a TSPC (true single phase clocking) divider and an output buffer, as shown in Figure 5-4.

Figure 5-4. 8-modulus 20GHz integer-N charge-pump PLL block diagram

The PFD uses two flip-flops to produce three states: pull-up, pull-down, and high-impedance. To avoid the dead-zone problem that occurs when the phase error is close to zero, delay blocks implemented with buffers and MOS capacitors give a fixed minimum width to the PFD output pulses. The dead-zone-free PFD also reduces VCO noise leakage. A second-order loop filter is designed, and the loop bandwidth is set to 400kHz based on the charge-pump current of 130uA. For loop stability, the loop bandwidth has to be less than 1/10 of the reference frequency. The 8-modulus prescaler [5.2] consists of a divide-by-4/5 synchronous circuit, a divide-by-64 asynchronous circuit, and a modulus control block for adjusting the frequency-divide ratio, as shown in Figure 5-5. The prescaler divide ratios range from 256 to 263, and the prescaler contributes negligible noise relative to the VCO. The fully differential output of the prescaler has a rail-to-rail swing using a modified Cherry-Hooper amplifier. The modulus control block generates a pulsed control signal to adjust the number of divide-by-5 operations of the divide-by-4/5 circuit.


Figure 5-5. 8-modulus prescaler block diagram

The output buffer of the PLL, acting as a clock distributor, has to provide a rail-to-rail signal swing. It consists of a pMOS DC level shifter, an nMOS amplifier and an inductor in series with the load, as shown in Figure 5-6. High-frequency operation is enabled by the pMOS device, which blocks the high-speed signal current so that the pull-down current from the driver flows into the load capacitor and the output voltage transitions sharply.

Figure 5-6. VCO output buffer schematic

Figure 5-7. True single phase clocking schematic

As shown in Figure 5-7, a True Single Phase Clocking (TSPC) circuit acts as a divide-by-2 divider. Hence, the overall divide ratios of the PLL are 512, 514, 516, 518, 520, 522, 524 and 526. The TSPC also provides the divided signal to the PFD, which needs a single-ended input signal instead of the differential signal at the prescaler output.
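The divide ratios above can be checked with a short calculation. The sketch below is only an illustration: it assumes that the 39.06MHz reference is exactly 20GHz/512 (39.0625MHz), and all constant and function names are hypothetical.

```python
# Frequency plan implied by Section 5.3.1 (illustrative sketch).
F_REF_HZ = 39.0625e6               # assumption: "39.06MHz" read as 20 GHz / 512
LOOP_BANDWIDTH_HZ = 400e3          # loop bandwidth stated in the text
TSPC_DIVIDE = 2                    # the TSPC stage acts as a divide-by-2
PRESCALER_RATIOS = range(256, 264) # 8 moduli: 256..263


def overall_ratios():
    """Overall feedback divide ratios (prescaler followed by the TSPC divider)."""
    return [TSPC_DIVIDE * n for n in PRESCALER_RATIOS]


def vco_frequency_hz(ratio, f_ref_hz=F_REF_HZ):
    """Locked VCO frequency of an integer-N PLL: f_vco = N * f_ref."""
    return ratio * f_ref_hz


if __name__ == "__main__":
    for n in overall_ratios():
        print(f"N = {n}: f_vco = {vco_frequency_hz(n) / 1e9:.6f} GHz")
    # Stability rule of thumb from the text: bandwidth below f_ref / 10.
    print("bandwidth rule satisfied:", LOOP_BANDWIDTH_HZ < F_REF_HZ / 10)
```

Under this assumption the ratio of 512 corresponds to 20GHz, and each further modulus raises the locked frequency by 2 x 39.0625MHz = 78.125MHz.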

Figure 5-8. Simulated result for 20GHz PLL

In Figure 5-8, the overall simulation result shows that the PLL locks to the 39.06MHz reference and can provide a 20GHz clock signal to the other components in the I/O system. Table 5-1 summarizes the simulated PLL characteristics.

Table 5-1. Summary of PLL performance

Objects               Value
Locked Frequency      20GHz
Settling time         < 10us
Supply voltage        1.2V
Technology            UMC 90nm logic CMOS
Loop Bandwidth        400kHz
Reference Frequency   39.06MHz
Prescaler ratio       512~526 (8 modulus)

5.3.2 A 20-Gbps 2^7-1 PRBS Generator in UMC 90nm CMOS

The PRBS (Pseudo Random Bit Sequence) generator is composed of a basic linear feedback shift register (LFSR) and an output buffer. The LFSR is composed of six D flip-flops and a merged XOR flip-flop, as shown in Figure 5-9. A reset switch connects Q1 to Vdd to insert a logic ‘1’ when the PRBS falls into the all-zeros state, in which the generator would otherwise be stuck.

Figure 5-9. Proposed PRBS generator block diagram

In this IC design, two techniques are used to achieve the high-speed PRBS generator. One is a merged XOR-gate flip-flop that is proposed in this work, and the other is a pulsed latch flip-flop [5.3] used instead of a master-slave flip-flop (F/F). The pulsed latch F/F was introduced in [5.3] to mitigate latency. In a master-slave F/F, the master keeps the input from racing through while the slave is transparent. However, at 20GHz the latency of the slave itself prevents race-through, which means that the master-slave structure can be replaced. Using this technique, we also developed a merged XOR-gate flip-flop that replaces one of the latches in a master-slave flip-flop with two CML buffers, saving the additional delay of a separate XOR gate and F/F, as shown in Figure 5-10.

Figure 5-10. Proposed pulsed latch F/F with XOR block diagram

The clock in the merged XOR flip-flop only needs to drive one latch, which lowers the power consumption in the 2^7-1 PRBS case. At the same time, the power consumption is reduced because the two current sources required for the slave latch gates can now be omitted. The merged XOR latch schematic is shown in Figure 5-11.

Figure 5-11. Proposed merged XOR Latch

The merged XOR latch consists of four nMOS transistors that implement the XOR logic, together with inductive peaking and negative feedback, followed by a CML buffer with shunt peaking as a second stage. The bandwidth-enhancing inductors and negative feedback circuits are described in [5.4]. Using the proposed merged XOR-gate F/F and the pulsed latch F/F, the 20Gbps 2^7-1 PRBS generator has been designed and simulated in 90nm CMOS in Cadence, and the results are shown in Figure 5-12. The PRBS pattern is verified to match the sequence of the x^7+x^4+1 [5.7] characteristic polynomial of length 2^7-1.
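As a behavioral reference for the 2^7-1 sequence, the LFSR can be sketched in a few lines of software. This is a minimal sketch and not the circuit of this work: it assumes a Fibonacci-style register with feedback taken from stages 7 and 4 (whether this is written as x^7+x^4+1 or as its reciprocal depends on the shift-direction convention), and the function names are hypothetical.

```python
def lfsr_step(state):
    """Advance a 7-bit Fibonacci LFSR by one clock (feedback = stage 7 XOR stage 4)."""
    feedback = ((state >> 6) ^ (state >> 3)) & 1
    return ((state << 1) | feedback) & 0x7F


def prbs7_bits(nbits, seed=0x7F):
    """Return nbits of the sequence, taking the output from stage 7."""
    state = seed & 0x7F
    if state == 0:
        # The all-zeros state never leaves itself, which is why the hardware
        # needs the reset switch that forces a logic '1' into the register.
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(nbits):
        bits.append((state >> 6) & 1)
        state = lfsr_step(state)
    return bits


def lfsr_period(seed=0x7F):
    """Number of clocks until the register returns to its starting state."""
    state = lfsr_step(seed)
    steps = 1
    while state != seed:
        state = lfsr_step(state)
        steps += 1
    return steps


if __name__ == "__main__":
    print("period:", lfsr_period())
    print("first 16 bits:", prbs7_bits(16))
```

A simulated bit stream can be compared against such a reference after aligning the starting phase, since a maximal-length sequence repeats every 127 bits.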

Figure 5-12. 20Gbps 2^7-1 PRBS sequence of the x^7+x^4+1 characteristic polynomial

As shown in Figure 5-13, the eye diagram shows a 50ps pulse width, which verifies data transitions at 20Gbps in simulation.

Figure 5-13. Eye-diagram 20Gbps PRBS sequence

Table 5-2. All 18 possible characteristic polynomials for a 2^7-1 PRBS

5.3.3 4-port 20Gbps Differential CML I/O Logic

The 20Gbps CML I/O logic consists of a data receiver, which is a tapered CML buffer, a modified pulsed latch flip-flop, and a CML I/O driver, which is also a tapered CML buffer, as shown in Figure 5-14.

Figure 5-14. CML I/O driver block diagram

The pulsed latch F/F, which consists of a CML latch and two buffers, was proposed in [5.3] in a 130nm process. Here, the CML latch in the F/F is designed in 90nm technology and is modified so that the current source has two stacked transistors, because short-channel devices show a high slope (low output resistance) in their DC output characteristics, as shown in Figure 5-15. The data coming from the PRBS generator are received by the data receiver with less noise, which makes it easier to recover the signal edges.

Figure 5-15. Modified CML latch in pulsed latch F/F

The authors used a modified pulsed latch F/F instead of master-slave flip-flops to make the high-speed CML I/O logic faster, since the two latches in a conventional master-slave flip-flop create a long switching delay. In addition, the CML latch uses inductive peaking to extend the bandwidth. The inductors delay the current flow so that the drain junction capacitance, the parasitic capacitances of the inductors and the load capacitance are charged quickly and driven hard simultaneously. Inductors combined with negative feedback extend the bandwidth but sacrifice latency. Therefore, the latch applied in this work does not include negative feedback, which might degrade the higher-frequency operation of the CML I/O driver. In addition, pMOS transistors are used as the load in the CML latch since on-chip resistors are not well controlled in a CMOS process. The current sources consist of two stacked nMOS transistors, in which the upper transistor is a low-threshold-voltage device and the bottom one is a regular-threshold-voltage device.

Figure 5-16. Tapered CML I/O driver

As shown in Figure 5-16, a 4-stage tapered buffer [5.5] with a taper factor of 1.6, chosen to amplify the 20Gbps data signal, is implemented at the end of the CML I/O driver. The number of stages and the taper factor were decided by the maximum bit rate to be amplified. The main driver uses shunt inductive peaking to extend the bandwidth and a cascaded current source to counter the high slope in the DC output characteristics of short-channel devices. For reference, Figure 5-17 compares the transient simulation results of the modified pulsed-latch F/F and the master-slave F/F at 20Gbps.

Figure 5-17. Comparison of the simulation results at 20Gbps in 90nm process. A) Modified Pulsed F/F transient simulation, B) Conventional Master-Slave F/F transient simulation

The simulation result of the 4-port CML I/O driver, using a 20GHz clock signal and 20Gbps input data, is shown in Figure 5-18. Hence, the CML I/O driver can launch the 20Gbps 2^7-1 PRBS data into the differential-pair interconnects on a die and a package.


Figure 5-18. Simulated result for 4-port 20Gbps CML I/O logic

5.3.4 Overall Simulation Results of High-speed I/O Test IC

A proposed on-chip high-speed I/O test IC design has been demonstrated in UMC 90nm logic CMOS, and the overall transient simulation result shows that the 20GHz PLL can feed the clock signal to the other logic circuits, namely the 4-port 20-Gbps differential CML I/O logic and the 20-Gbps 2^7-1 PRBS generator, as shown in Figure 5-19.

Figure 5-19. Transient simulation results of On-chip 20Gbps High-Speed I/O test IC

Hence, the proposed on-chip high-speed test IC can provide both a high-speed clock and random data to the chip and to the package for signal integrity characterization.

5.4 Demonstration of a Test Vehicle for Interconnect Signal Integrity Characterization

As shown in Figure 5-20, the authors designed two patterns on the PCB: one is a coupled microstrip line pattern with two different transmission line lengths, and the other is coupled microstrip lines with vias. To characterize the interconnect and transmission lines, the differential-pair signal travels from the test IC to the PCB through C4 bumps in the flip-chip package.

Figure 5-20. Test vehicle for interconnect signal integrity characterization

5.4.1 Coupled Microstrip Lines Pattern

Two sets of coupled microstrip lines (CMLs) are modeled identically except for the length, as shown in Figure 5-21. One of them is 10mm long and the other is 20mm, twice that length.

Figure 5-21. Dimension of coupled microstrip lines on PCB

The S-parameters of each set of CMLs are extracted from the model in HFSS [3.3] in order to find the transmission line parameters R, L, C and G, which determine the propagation constant and characteristic impedance [5.6]. The difference in insertion loss between the 10mm and 20mm CMLs, which depends on the length, is shown in Figure 5-22. For reference, the operating frequency on the PCB is 10GHz because of the 20Gbps data rate.
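The link between R, L, G, C and the propagation constant and characteristic impedance is the standard telegrapher's-equation result, gamma = sqrt((R + jwL)(G + jwC)) and Z0 = sqrt((R + jwL)/(G + jwC)). The sketch below evaluates it at 10GHz. The per-unit-length values are illustrative placeholders, not the values extracted from HFSS in this work, and the function names are hypothetical.

```python
import cmath
import math


def line_parameters(r, l, g, c, freq_hz):
    """Return (gamma, z0) of a uniform line from per-unit-length R, L, G, C.

    r [ohm/m], l [H/m], g [S/m], c [F/m]; gamma = alpha + j*beta in 1/m.
    """
    omega = 2 * math.pi * freq_hz
    z_series = complex(r, omega * l)   # series impedance per unit length
    y_shunt = complex(g, omega * c)    # shunt admittance per unit length
    gamma = cmath.sqrt(z_series * y_shunt)
    z0 = cmath.sqrt(z_series / y_shunt)
    return gamma, z0


def matched_loss_db(gamma, length_m):
    """Attenuation in dB of a matched line section: 20*log10(e) * alpha * length."""
    return 20 * math.log10(math.e) * gamma.real * length_m


if __name__ == "__main__":
    # Placeholder values chosen only to give a line near 50 ohm.
    gamma, z0 = line_parameters(r=20.0, l=300e-9, g=1e-3, c=120e-12, freq_hz=10e9)
    print(f"Z0 = {z0.real:.2f} {z0.imag:+.2f}j ohm")
    for length in (0.010, 0.020):
        print(f"{length * 1e3:.0f} mm: {matched_loss_db(gamma, length):.3f} dB")
```

For a matched line the attenuation in dB scales linearly with length, so in this idealized picture the 20mm pattern has twice the dB loss of the 10mm pattern; mismatch and coupling in the real structure add to this.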

Figure 5-22. S magnitude of both CMLs_10mm and CMLs_20mm

In addition, a high-speed transient simulation is performed with both sets of S-parameters in ADS [3.4]. It shows how much the signals are degraded by the insertion loss of the interconnects from the I/O driver on the chip to the probe point on the PCB, as illustrated in Figure 5-2. The simulation result is presented in Figure 5-23.

Horz: 20ps/div, Vert: 50mV/div Figure 5-23. Transient simulation results with S-parameter of both CMLs_10mm and CMLs_20mm

Also, ISI (inter-symbol interference), which occurs between adjacent pulses of the data, is characterized by creating an eye diagram with 20Gbps PRBS data. The interference depends on the line length, the data rate and the substrate material of the PCB. The results show that CMLs_10mm has less ISI than CMLs_20mm because of its shorter line length, as shown in Figures 5-24 and 5-25 and Table 5-3.

Horz: 10ps/div, Vert: 100mV/div Figure 5-24. Eye-diagram of CMLs_10mm

Horz: 10ps/div, Vert: 100mV/div Figure 5-25. Eye-diagram of CMLs_20mm

Table 5-3. Eye-diagram simulation results

Pattern      Height   Width
CMLs_10mm    530mV    497ps
CMLs_20mm    268mV    497ps

5.4.2 CMLs with Multiple Via Pattern

The via is one of the major discontinuity elements in packages since it presents inductance, resistance and capacitance. Electromagnetic simulation is performed on both CMLs with 2 vias and CMLs with 6 vias to analyze their electrical performance. For reference, the dimensions of the CMLs with 2 vias are illustrated in Figure 5-26, and the CMLs with 6 vias have the same dimensions except for the number of vias. In Figure 5-27, the CMLs with 2 vias show less insertion loss than the CMLs with 6 vias at 10GHz.

Figure 5-26. Dimension of CMLs with 2 vias on PCB

Figure 5-27. S magnitude of both CMLs_via2 and CMLs_via6

5.5 Comparison between the Dielectric Substrates in the Test Vehicle

The electrical properties of the dielectric materials used in packages and PCBs affect the electrical performance of digital interconnects. The electrical properties of a dielectric material are described by both the dielectric constant and the dielectric loss tangent. The dielectric constant describes how much the material increases the capacitance and decreases the speed of light in the material. The loss tangent, also called the dissipation factor, reflects the number of dipoles and their motion, and is a measure of how much the conductivity increases in proportion to the frequency, as expressed in (5-1) [5.8].

$$\tan(\delta) \propto n \, p \, \theta_{\max} \qquad \text{(5-1)}$$

Where:
tan(δ) = the dissipation factor,
n = the number density of dipoles in the dielectric,
p = the dipole moment, a measure of the charge and separation of each dipole,
θmax = how far the dipoles rotate in the applied field

Table 5-4 lists several dielectric materials with their dielectric constant and dissipation factor.

Table 5-4. Dielectric substrates

Substrates           Dielectric Constant   Dielectric Loss Tangent
FR4                  4.7                   0.015
Silicon              11.9                  0.005
Rogers RO4350        3.43~3.53             0.004
PTFE / Woven glass   2.5                   0.002

As in the test patterns introduced earlier, interconnects such as the transmission lines and vias are placed on the Rogers 4350 dielectric material, as shown in Figure 5-26, because this substrate has less impact on the interconnects than FR4: its dielectric constant and loss tangent are both lower than those of FR4, as shown in Table 5-4. Silicon and glass are also substrates that can be used as dielectric materials. To see the difference in electrical performance, EM simulations are performed for the glass, silicon and Rogers 4350 substrates, as shown in Figure 5-28.


Figure 5-28. Dimension of CMLs with multiple vias on package

In Figure 5-29, the electrical performance is demonstrated in terms of insertion loss in both test patterns; one is CMLs with 2 vias and the other is CMLs with 6 vias.

Figure 5-29. Simulation results of multiple substrates. A) CMLs_Via2 with multiple substrates, B) CMLs_Via6 with multiple substrates

At low frequency, up to 5GHz, there is no difference in loss among the dielectric materials. However, as the frequency increases, the losses vary depending on the dielectric constant and dissipation factor. The power dissipation is higher when the conductivity is higher, and the conductivity increases with frequency because of the increasing motion of the dipoles: the dipoles move the same distance, but faster, so the current increases and the conductivity increases. Hence, the test vehicle with silicon has the worst performance among the substrates since both its dielectric constant and dissipation factor are higher than those of the others. Comparing silicon and glass, the TGV (through glass via) shows better performance in terms of insertion loss.
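The frequency dependence described above can be illustrated with the usual low-loss estimate of dielectric attenuation for a TEM wave in a homogeneous medium, alpha_d = pi * f * sqrt(eps_r) * tan(delta) / c. The sketch below applies it to the substrates of Table 5-4. It is a rough estimate under stated assumptions only: it ignores conductor loss, the field fraction in air of a microstrip, and any bulk conductivity of silicon, and it takes the midpoint 3.48 for the Rogers RO4350 dielectric constant. Glass is omitted because its properties are not listed in Table 5-4.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum [m/s]

# (dielectric constant, loss tangent) from Table 5-4; 3.48 is an assumed midpoint.
SUBSTRATES = {
    "FR4": (4.7, 0.015),
    "Silicon": (11.9, 0.005),
    "Rogers RO4350": (3.48, 0.004),
    "PTFE / Woven glass": (2.5, 0.002),
}


def dielectric_loss_db_per_m(eps_r, tan_delta, freq_hz):
    """Low-loss TEM estimate of dielectric attenuation in dB per metre."""
    alpha_np_per_m = math.pi * freq_hz * math.sqrt(eps_r) * tan_delta / C0
    return 20 * math.log10(math.e) * alpha_np_per_m  # nepers to dB


if __name__ == "__main__":
    for name, (eps_r, tan_delta) in SUBSTRATES.items():
        loss_5g = dielectric_loss_db_per_m(eps_r, tan_delta, 5e9)
        loss_10g = dielectric_loss_db_per_m(eps_r, tan_delta, 10e9)
        print(f"{name:20s} {loss_5g:6.2f} dB/m at 5GHz, {loss_10g:6.2f} dB/m at 10GHz")
```

In this estimate the loss grows linearly with frequency and with the product sqrt(eps_r) * tan(delta), which ranks silicon above Rogers RO4350, in line with the trend reported for the simulated substrates.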

5.6 Summary

In this chapter, the designs of a 20Gbps on-chip high-speed I/O test IC and a test vehicle are introduced for interconnect signal integrity characterization in the flip-chip package. The proposed test IC consists of a 20GHz 8-modulus PLL, a 20Gbps 2^7-1 PRBS generator and a 4-port differential CML I/O driver. The test IC was designed in UMC 90nm technology and the test vehicle is modeled on a Rogers 4350B PCB. Using simulations of this test IC and test vehicle, the author demonstrated how to characterize the signal integrity of interconnects in flip-chip packages, and both transient and electromagnetic simulation results are presented. In addition, EM simulation of the test vehicle is used to determine the difference in electrical performance, in terms of insertion loss, among dielectric materials such as silicon, duroid and glass. This research enables both the investigation of signal integrity effects in a multi-port differential I/O chip system and the validation of the accuracy of the package electrical model in flip-chip packages at a 20Gbps data rate.

CHAPTER 6 COST EFFECTIVE MODELING METHODOLOGIES AND EVALUATING ELECTRICAL INTERACTION IN FCBGA (FLIP-CHIP BALL GRID ARRAY) PACKAGES

6.1 Introductory Remarks

As integration technology advances, both package modeling and simulation become more critical in FCBGA package design since the physical structures are more complex. Accurately modeling the die and package and analyzing the electrical behavior of the package structures are both important in order to anticipate higher performance and functionality for power integrity and signal integrity in FCBGA packages. However, system-level considerations have become increasingly important: the ITRS modeling and simulation roadmap identifies, among its crosscut issues [6.1], that 3D modeling and simulation of the entire IC-package structure is a huge challenge [6.2]. The large system model size requires expertise across design disciplines and incurs an unacceptably high computational cost due to the hundreds or thousands of physical links with connectivity [6.3].

A modeling methodology is proposed for die-to-package connectivity that shows good agreement with the power, ground grids and C4 bump performance. Both a single 3D and a combined model will be demonstrated and the S-parameters extracted from both models are compared to verify that the methodology can be used for die-to-package connectivity. In addition, polynomial modeling is proposed to represent an electrical behavior in terms of S- parameters for a huge structure. The evaluation of the electrical behavior of the package physical structure between the first metal layer on the die and package is demonstrated.

6.2 Modeling Methodology for Die-Pkg Connectivity

Although co-simulation of the entire structure gives accurate modeling and simulation results [6.4], the high computational cost is a key challenge. In this work, the authors show a modeling methodology for FCBGA structures that can reduce this cost (as seen in Figure 6-1).

Figure 6-1. Modeling strategies. A) Single 3D model – die + pkg, B) Combined model - separated die & pkg

Figure 6-1 (a) shows that the S-parameters are derived from a single model in which the die and package models are connected. Figure 6-1 (b) shows a system built from a die model and a package model separately, so that the S-parameters are cascaded through the C4 bumps. To verify the die-to-package connectivity, S-parameters are extracted from both the single 3D model in Figure 6-1 (a) and the separated planar models in Figure 6-1 (b). The connectivity is demonstrated if the S-parameters of the single 3D model match the combined S-parameters of the planar substrates, which are connected through the C4 bumps of the separated die and package.

Figure 6-2 illustrates the simplified FCBGA structure and port definitions for the die and package of Figure 6-1. The authors designed the structure in HFSS [3.3]. In particular, the C4 bump in Figure 6-2 (b) is cut in half so that the die and package can be simulated separately and their S-parameters then combined through the C4 bumps. The solution type of this model is set to driven terminal. Lumped ports are used for all ports in Figure 6-2. In particular, the terminal impedance has to be the same for port 3 and port 4 on the C4 bumps, to prevent an impedance mismatch between die and package when the S-parameters are cascaded.

Figure 6-2. Port definition in HFSS for modeling strategies S-parameter magnitude. A) Single 3D model, B) Combined model

The ports on the C4 bumps are also differential pairs, so that there is a symmetrical differential port between the die and the package, following a microwave interconnection concept. To simulate this system as a two-port network, the 4-port network defined above is treated as a two-port network using post-processing in ADS [3.4].
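One common way to reduce a 4-port single-ended network to a differential two-port is the standard mixed-mode conversion, which keeps only the differential-to-differential terms. The sketch below shows that conversion as an illustration; whether the ADS post-processing used here corresponds exactly to it is an assumption, and so is the port ordering (ports 1 and 2 form Diff 1, ports 3 and 4 form Diff 2). The function name is hypothetical.

```python
def mixed_mode_dd(s):
    """Differential-mode 2x2 block of a 4x4 single-ended S-matrix.

    Assumed port order: ports 1(+), 2(-) form Diff 1 and ports 3(+), 4(-)
    form Diff 2. s is a nested sequence with s[i][j] = S_(i+1)(j+1).
    """
    def dd(p, n, q, m):
        # Differential response at pair (p, n) to a differential
        # excitation applied at pair (q, m).
        return 0.5 * (s[p][q] - s[p][m] - s[n][q] + s[n][m])

    return ((dd(0, 1, 0, 1), dd(0, 1, 2, 3)),
            (dd(2, 3, 0, 1), dd(2, 3, 2, 3)))


if __name__ == "__main__":
    # Ideal, uncoupled through connection: port 1 -> 3 and port 2 -> 4.
    thru = [[0, 0, 1, 0],
            [0, 0, 0, 1],
            [1, 0, 0, 0],
            [0, 1, 0, 0]]
    (sdd11, sdd12), (sdd21, sdd22) = mixed_mode_dd(thru)
    print("Sdd11 =", sdd11, " Sdd21 =", sdd21)
```

With this ordering, Sdd21 = (S31 - S32 - S41 + S42) / 2 and Sdd11 = (S11 - S12 - S21 + S22) / 2, so cross-coupling between the two lines of the pair directly reduces the differential transmission.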

Figure 6-3. S-parameters between single 3D model and combined planar models. A) S11 of Single & Combined model, B) S21 of Single & Combined model

Using these modeling strategies, the die and package connectivity can be verified in terms of the S21 and S11 parameters, whose magnitudes almost match, as shown in Figure 6-3. In addition, ANSYS Q3D [6.5] is used to simulate between DC and 10MHz, and its S11 curve matches the S11 of HFSS at 10MHz. Hence, the proposed methodology allows the die and package to be simulated separately.
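The cascading of the separately simulated die and package two-ports can be sketched numerically. The example below converts each S-matrix to an ABCD matrix, multiplies the two, and converts back, using the textbook two-port conversion formulas with a common real reference impedance at the joining ports. It is an illustration of the principle at a single frequency point rather than the ADS flow used in this work, and all function names are hypothetical.

```python
def s_to_abcd(s, z0=50.0):
    """Convert a 2-port S-matrix ((S11, S12), (S21, S22)) to ABCD form."""
    (s11, s12), (s21, s22) = s
    d = 2 * s21
    a = ((1 + s11) * (1 - s22) + s12 * s21) / d
    b = z0 * ((1 + s11) * (1 + s22) - s12 * s21) / d
    c = ((1 - s11) * (1 - s22) - s12 * s21) / (d * z0)
    dd = ((1 - s11) * (1 + s22) + s12 * s21) / d
    return ((a, b), (c, dd))


def abcd_to_s(m, z0=50.0):
    """Convert an ABCD matrix back to a 2-port S-matrix."""
    (a, b), (c, d) = m
    den = a + b / z0 + c * z0 + d
    return (((a + b / z0 - c * z0 - d) / den, 2 * (a * d - b * c) / den),
            (2 / den, (-a + b / z0 - c * z0 + d) / den))


def cascade(s_die, s_pkg, z0=50.0):
    """S-matrix of two 2-ports connected output-to-input (same z0 at the joint)."""
    (a1, b1), (c1, d1) = s_to_abcd(s_die, z0)
    (a2, b2), (c2, d2) = s_to_abcd(s_pkg, z0)
    product = ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
               (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))
    return abcd_to_s(product, z0)


def series_impedance_s(z, z0=50.0):
    """S-matrix of a series impedance z, used here only as a test network."""
    s11 = z / (z + 2 * z0)
    s21 = 2 * z0 / (z + 2 * z0)
    return ((s11, s21), (s21, s11))


if __name__ == "__main__":
    combined = cascade(series_impedance_s(10.0), series_impedance_s(15 + 5j))
    single = series_impedance_s(25 + 5j)
    print("combined S21:", combined[1][0])
    print("single   S21:", single[1][0])
```

The check mirrors the verification in this section on a trivial network: two series impedances cascaded through matching ports give the same S-parameters as the single network containing their sum.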

6.3 Polynomials using Polynomial Regression

Polynomial regression is a form of linear regression in which data are used to find a polynomial representing the transfer function of each model. In this work, the polynomials are derived from the transfer function, which is the gain from the incident voltage, Va, to the output voltage, Vl, of the 2-port network in the preceding figure. As mentioned previously, the 4-port network is treated as a 2-port network. Figure 6-4 shows how to compute Va from the source voltage Vs [6.6]; the transfer function of each model is equal to the S21 parameter, as shown by equations (6.1)-(6.3), and is used to find the polynomials.

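$$tf = \frac{V_l}{V_a} \qquad (6.1)$$

$$tf = \frac{Z_S + Z_S'}{Z_S} \cdot \frac{S_{21}\,(1 + \Gamma_l)(1 - \Gamma_S)}{2\,(1 - S_{22}\Gamma_l)(1 - \Gamma_{in}\Gamma_S)} \qquad (6.2)$$

Where,

$$\Gamma_S = \frac{Z_S - Z_O}{Z_S + Z_O}, \qquad \Gamma_l = \frac{Z_l - Z_O}{Z_l + Z_O}, \qquad \Gamma_{in} = S_{11} + \frac{S_{12} S_{21} \Gamma_l}{1 - S_{22}\Gamma_l}$$

Hence,

$$tf = \frac{V_l}{V_a} = S_{21} \quad \text{where } Z_S = Z_S',\ Z_l = Z_O \qquad (6.3)$$

Figure 6-4. Proof that the transfer function is equal to S21 of the 2-port network. A) 2-port network, B) Incident Voltage (Va) from Source Voltage (Vs)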

As shown in Figure 6-5, three different models with the identical port setup of Figure 6-2 (a) are designed. Each model has a different N, the number of C4 bump pairs between the ports, and the length of the model is proportional to N since N=1 corresponds to a unit length of the model.

Figure 6-5. Die-pkg models with different number of N. A) N=1, B) N=5, and C) N=10

To find a polynomial for each model, the transfer function needs to be derived from the S21 parameters of each model, and the transfer function curve can then be described by a 4th-degree polynomial using polynomial regression. The higher the degree of the polynomial, the more accurately the fitted curve describes the S21 parameter. On the other hand, if the degree is lower, such as linear, quadratic or cubic, the polynomial cannot represent the S21 curve, as shown in Figure 6-6. The author selected the 4th-degree polynomial because its root mean square error with respect to the S21 curve, under 99% prediction bounds, is lower than that of the linear, quadratic and cubic polynomials, while being similar to that of the 5th-degree polynomial. The goodness of fit is evaluated for the S21 curve of N=10 in Table 6-1.

Figure 6-6. Evaluating goodness of the fit to the S21 curve of N=10

Table 6-1. Comparison of the SSE, R-square and RMSE (Root Mean Square Error) of N=10

N=10         SSE (Sum of squares due to error)   R-square   RMSE (Root Mean Square Error)
Cubic        0.0011550                           1          0.0004813
4th degree   0.0001497                           1          0.0001733
5th degree   0.0000500                           1          0.0001002
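The fitting and the goodness-of-fit figures of this kind can be reproduced in principle with a short least-squares routine. The following is a minimal sketch, assuming an ordinary least-squares fit solved through the normal equations and an RMSE defined as sqrt(SSE / (n - m)) for n samples and m fitted coefficients. The sample data are synthetic placeholders, not the S21 data of this work, and the function names are hypothetical.

```python
import math


def polyfit(x, y, degree):
    """Least-squares polynomial fit; returns coefficients, highest power first."""
    scale = max(abs(v) for v in x) or 1.0      # normalize x for conditioning
    u = [v / scale for v in x]
    m = degree + 1
    # Normal equations (A^T A) c = A^T y for the Vandermonde matrix A.
    ata = [[sum(ui ** (i + j) for ui in u) for j in range(m)] for i in range(m)]
    aty = [sum(yi * ui ** i for ui, yi in zip(u, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, m):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, m):
                ata[row][k] -= factor * ata[col][k]
            aty[row] -= factor * aty[col]
    coeffs = [0.0] * m
    for row in reversed(range(m)):
        acc = aty[row] - sum(ata[row][k] * coeffs[k] for k in range(row + 1, m))
        coeffs[row] = acc / ata[row][row]
    # Undo the x scaling and reorder to highest power first.
    return [coeffs[k] / scale ** k for k in reversed(range(m))]


def polyval(coeffs, x):
    """Evaluate a polynomial (highest power first) with Horner's rule."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def goodness_of_fit(x, y, coeffs):
    """Return (SSE, R-square, RMSE) of the fitted polynomial on the data."""
    n = len(x)
    sse = sum((yi - polyval(coeffs, xi)) ** 2 for xi, yi in zip(x, y))
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return sse, 1.0 - sse / sst, math.sqrt(sse / (n - len(coeffs)))


if __name__ == "__main__":
    freq_ghz = [0.1 * i for i in range(51)]                      # DC to 5GHz
    s21 = [polyval([0.001, -0.01, 0.05, -0.1, -0.002], f) for f in freq_ghz]
    for degree in (3, 4, 5):
        sse, r2, rmse = goodness_of_fit(freq_ghz, s21, polyfit(freq_ghz, s21, degree))
        print(f"degree {degree}: SSE={sse:.3e} R-square={r2:.6f} RMSE={rmse:.3e}")
```

Comparing the RMSE across degrees, as in Table 6-1, shows where adding another term stops paying off; on the synthetic quartic data used here the error collapses at degree 4 and does not improve meaningfully at degree 5.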

Hence, the transfer function curves of N=1, 5 and 10 represent the S21 parameter using 4th-degree polynomials, as shown in Figure 6-7, and the polynomials are listed in Table 6-2.

Figure 6-7. Transfer function curve of each model (N=1, 5, 10)

Table 6-2. 4th degree polynomials of each curve (N=1, 5, and10) in Figure 6-7 N 4th degree polynomials of each curve 1 f (x)  4.15e7 x4  4.35e5  x3  0.0002 x2  0.0026 x1  0.0008 5 f (x)  2.46e5x4  4.13e4  x3  0.01407 x2  0.0064 x1  0.0001 10 f (x) 1.27e4 x4 1.63e3  x3  0.0581 x2  0.1224 x1  0.0018

Once the coefficients of each degree are obtained from the polynomial of each curve, the coefficients can be expressed as P1, P2, P3, P4 and P5 using polynomial regression, as shown in Figure 6-8 (a). Hence, the electrical behavior of a complex model can be captured using the polynomials in Figure 6-8 (b) instead of 3D co-simulation with its high computational cost.


Figure 6-8. Coefficient analysis of 4th degree polynomials

$$f_{NEXT}(x) = P_1 x^4 + P_2 x^3 + P_3 x^2 + P_4 x + P_5 \qquad (6.4)$$
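The idea of predicting the polynomial of a larger structure from the coefficient trends can be sketched as follows. The text does not state how the trend of each coefficient versus N is regressed, so the sketch assumes a quadratic through the three available models (N=1, 5, 10). The coefficient values are placeholders rather than those of Table 6-2, and the function names are hypothetical.

```python
def quadratic_through(points):
    """Return a function interpolating three (n, value) points (Lagrange form)."""
    (n0, v0), (n1, v1), (n2, v2) = points

    def value_at(n):
        return (v0 * (n - n1) * (n - n2) / ((n0 - n1) * (n0 - n2))
                + v1 * (n - n0) * (n - n2) / ((n1 - n0) * (n1 - n2))
                + v2 * (n - n0) * (n - n1) / ((n2 - n0) * (n2 - n1)))

    return value_at


def predict_coefficients(fitted, n_next):
    """fitted maps N -> [P1..P5] for three models; returns [P1..P5] at n_next."""
    ns = sorted(fitted)
    return [quadratic_through([(n, fitted[n][k]) for n in ns])(n_next)
            for k in range(5)]


def f_next(coeffs, x):
    """Evaluate P1*x^4 + P2*x^3 + P3*x^2 + P4*x + P5 by Horner's rule."""
    result = 0.0
    for p in coeffs:
        result = result * x + p
    return result


if __name__ == "__main__":
    # Placeholder coefficient sets for N = 1, 5 and 10 (illustrative only).
    fitted = {
        1: [1e-6, -4e-5, 2e-4, -3e-3, -1e-3],
        5: [2e-5, -4e-4, 1e-2, -6e-3, -1e-4],
        10: [1e-4, -2e-3, 6e-2, -1e-1, -2e-3],
    }
    p15 = predict_coefficients(fitted, 15)
    print("predicted P1..P5 at N=15:", p15)
    print("f_NEXT(5.0) =", f_next(p15, 5.0))
```

Because the prediction extrapolates beyond the fitted range of N, it should be checked against at least one full simulation, as is done for N=15 and N=30 in this work.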

In Figure 6-9, complex models are simulated using the polynomials, and the S21 of the polynomial curve matches the simulation result in HFSS. The computational costs of HFSS and the polynomial for N=15 and N=30, on a desktop with an Intel Core i7 quad-core processor, are compared in Table 6-3.

Table 6-3. Comparison of the computational cost between HFSS and Polynomial

Type   N=15 HFSS   N=15 Polynomial    N=30 HFSS   N=30 Polynomial
Time   14hrs       less than a min.   30hrs       less than a min.

Although S-parameters are complex (magnitude and angle), the author is interested in the magnitude only, as it is of most interest for the modeled passive structure; the main concern is how much loss the FCBGA model introduces. If the signal path included active structures such as an amplifier or attenuator in the FCBGA package, the phase would also matter for how the signal is degraded by the package. In this research, however, the magnitude of the S-parameters is the key factor, discussed over the frequency range from DC to 5GHz.

Figure 6-9. Comparison of S-parameters and calculated polynomials. A) N=15, B) N=30

6.4 Evaluating Electrical Interaction between Die and Package in FCBGA

Conventional modeling methods separate the IC and the package during characterization since they inherently assume no significant electrical interaction between the first-layer IC metal and the flip-chip package. However, these simulations are expected to reveal worst-case signal coupling interactions between the IC upper-level metal and the package interconnect.

6.4.1 Different Pattern between Die and Package Planes

In Figures 6-10, 6-12, 6-13 and 6-14, the structures modeled in HFSS are simulated to analyze factors such as coupling, insertion loss and return loss that can be accounted for in early-phase design decisions about the first-level metal signal lines of the die and package. The C4 bump patterns are identical for all models, while the pattern between the die and package metal differs.

Figure 6-10. Different patterns between the first layer of die and package. A) Type ‖(parallel): Package to Die, B) Type ⊥(perpendicular): Package to Die

Figure 6-11. Different patterns between the first layer of die and package in HFSS. A) Type ‖(parallel): Package to Die, B) Type ⊥(perpendicular): Package to Die

Figure 6-12. Different patterns between the first layer of die and package planes. A) Type ‖(parallel) w/ pkg plane, B) Type ⊥(perpendicular) w/ pkg plane

Figure 6-13. Different patterns between the first layer of die and package planes in HFSS. A) Type ‖(parallel) w/ pkg plane, B) Type ⊥(perpendicular) w/ pkg plane

In Figure 6-14 (a), the parallel type has less insertion loss than the perpendicular type since the C4 bumps are connected by another grid in the first layer of the package. Through EM simulation, we found that the structure of Figure 6-11 (a) is more inductive than that of Figure 6-11 (b). In the case of Figure 6-14 (b), both insertion losses are similar because all grids are connected by the package planes.

Figure 6-14. Simulations of electrical interactions of Figure 6-10 and Figure 6-12. A) Type ⊥ & Type ‖ in Figure 6-10, B) Type ⊥ & Type ‖ in Figure 6-12

Hence, the coupling between the die bottom layer and the package top layer takes different values depending on the pattern and the metal layers in the package. For reference, the die and package modeling properties are shown in Tables 6-4 and 6-5.

Table 6-4. Properties of the material in the die

Layer               Thickness (um)   Permittivity   Loss Tangent   Conductivity (S/m)
Passivation(SiO2)   50               4              0              -
Bottom(Pwr/Gnd)     15               -              -              5.91E+07
Silicon             85               11.9           0              -

Table 6-5. Properties of the material in the package

Layer               Thickness (um)   Permittivity   Loss Tangent   Conductivity (S/m)
Solder Mask         5                4              0.015          -
Top(Pwr/Gnd)        15               -              -              5.91E+07
Build-up 1          30               3.5            0.026          -
Gnd_L01             15               -              -              5.91E+07
Build-up 2          30               3.5            0.018          -
Pwr_L02             15               -              -              5.91E+07

6.4.2 Electrical Behavior between C4 Bump Pair

Using the port setup in Figure 6-2, the electrical interaction among C4 bump pairs can be used to determine how the return loss varies with the position of each C4 bump pair in the sequence. This allows evaluating which route among the bumps is efficient. In Figure 6-15 (a) and (c), Diff2 can be switched to other pairs (1st, 2nd, 3rd and 4th) in order to measure the S-parameters between Diff2 and Diff1 sequentially. The simulation results in Figure 6-15 (b) and (d) show that the return loss of S11_4th is the lowest and that of S11_1st is the highest over the whole frequency range. In addition, the die-to-package connectivity can be demonstrated with the separated die and package models through the ports on the C4 bumps.

Figure 6-15. S11 among C4 bumps pair on die and pkg. A) Die metal layer w/ 5 C4 bump pairs, B) Simulation results - die layer w/ 5 pairs, C) Pkg metal layer w/ 5 C4 bump pairs and D) Simulation results – pkg layer w/ 5 pairs

6.4.3 Electrical Behavior between C4 Bump Pair in Complex Model

Now that the electrical behavior between C4 bump pairs has been demonstrated for the simplified die and package model, a complex model is needed to examine this behavior, since coupling can occur among the multiple metal layers in the die and package.

First, the separated die model is implemented as shown in Figure 6-16. The dimensions of the implemented model are 2mm x 1mm, about one-tenth of the dimensions of the Intel Ivy Bridge microprocessor, which are 21mm x 10mm [6.7]. The bump pitch is 150um.


Figure 6-16. Complex model of the separated die in FCBGA structure

As shown in Figure 6-17, there are 6 metal lines on the die with 13 bumps on each metal line. The differential port Diff 1 is defined at the edge of the metal layer on the die, and the other differential port, Diff 2, is defined on the C4 bumps as in the previous port setup in Figure 6-15. Here, Diff 2 is switched to other bump pairs (1st, 7th and 13th) in order to evaluate the performance, including coupling effects among the bumps, in the complex model. Adjacent bump pairs are separated by 150um, which is the unit length M between the 1st and 2nd bump pairs. The 7th bump pair is placed 7 times 150um from the 1st bump pair. Hence, the three bump pairs (1st, 7th and 13th) are denoted M=1, 7 and 13, respectively.

Figure 6-17. Port definition of separated die in complex model

The simulation results for the return loss and insertion loss in Figure 6-18 show that the loss decreases as the Diff 2 port moves farther from the Diff 1 port. This helps determine the efficient route between the die and the C4 bumps.

Figure 6-18. Comparison of S-parameters among the C4 bump pairs in the die. A) Return loss of M=1, 7 and 13, B) Insertion loss of M=1, 7 and 13

As with the separated die model, the separated package model is implemented to evaluate electrical performance, as shown in Figure 6-19. The comparison of the S-parameters is shown in Figure 6-20, and the results correlate with those of the separated die model.

Figure 6-19. Port definition of separated package in complex model

Figure 6-20. Comparison of S-parameters among the C4 bump pairs in the package. A) Return loss of M=1, 7 and 13, B) Insertion loss of M=1, 7 and 13

Now that the ports are defined for both the separated die and the separated package, the die-to-package connectivity in the complex model can be demonstrated using the proposed modeling methodology, as shown in Figure 6-21. Once the S-parameters of the die and package models are available from HFSS simulation, the ports on the C4 bumps are connected in ADS. The 7th bump pairs are used for the connection between the die and the package.

Figure 6-21. Port definition of separated die in complex model. A) Single 3D complex model (die + pkg), B) Combined 3D complex model (separated die & pkg)


Figure 6-22. S-parameters of the single 3D model and the combined planar models. A) S11 of the single and combined models, B) S21 of the single and combined models

The S11 and S21 of the single 3D model and the combined model are shown in Figure 6-22 and match each other over the frequency range from DC to 5 GHz. Hence, the die-to-package connectivity can also be verified for a more complex model by comparing the S-parameters of the single model and the combined model. In addition, the approach saves computational time when evaluating the electrical characteristics of a complex FCBGA structure.
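The port connection that is performed in ADS in this work can be illustrated numerically. The following is a minimal sketch, assuming that each of the die and package blocks is reduced to a two-port network in differential mode with a common reference impedance, and that the blocks are cascaded through transfer (T) parameters. The S-matrix values are hypothetical placeholders at a single frequency, not simulation data from this work.

```python
import numpy as np

def s_to_t(s):
    """Convert a 2x2 S-matrix to a wave-transfer matrix, [b1, a1] = T [a2, b2]."""
    s11, s12, s21, s22 = s[0, 0], s[0, 1], s[1, 0], s[1, 1]
    det = s11 * s22 - s12 * s21
    return np.array([[-det / s21, s11 / s21],
                     [-s22 / s21, 1.0 / s21]], dtype=complex)

def t_to_s(t):
    """Convert a 2x2 wave-transfer matrix back to an S-matrix."""
    t11, t12, t21, t22 = t[0, 0], t[0, 1], t[1, 0], t[1, 1]
    det = t11 * t22 - t12 * t21
    return np.array([[t12 / t22, det / t22],
                     [1.0 / t22, -t21 / t22]], dtype=complex)

def cascade(s_die, s_pkg):
    """Connect port 2 of the die block to port 1 of the package block."""
    return t_to_s(s_to_t(s_die) @ s_to_t(s_pkg))

# Hypothetical single-frequency S-matrices (placeholders, not HFSS results).
s_die = np.array([[0.05 + 0.02j, 0.90 - 0.10j],
                  [0.90 - 0.10j, 0.04 + 0.01j]])
s_pkg = np.array([[0.03 - 0.01j, 0.85 - 0.20j],
                  [0.85 - 0.20j, 0.06 + 0.02j]])

s_combined = cascade(s_die, s_pkg)
print("combined |S11| =", abs(s_combined[0, 0]))
print("combined |S21| =", abs(s_combined[1, 0]))
```

In practice the same cascade would be repeated at every frequency point of the simulated sweep, and the combined S11 and S21 would be compared with those of the single 3D model.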

6.4.4 Evaluating Electrical Behavior of the Intermediate Bump Pairs using Polynomials

The author has demonstrated the modeling methodology using a polynomial solution for the simplified FCBGA models, and has proposed a cost-efficient modeling methodology that cuts the ball bumps in half to obtain a separated die and package, as shown in Figure 6-2. This section demonstrates how to construct a polynomial solution for the intermediate ball bumps in a series on the separated die and separated package, not just the last bump, using the proposed methodologies. As shown in Figure 6-23, the insertion loss is defined between the Diff 1 port and the Diff 2 port, which can be moved to other bump pairs (M=1, 7 and 13).

Figure 6-23. Port definition of separated package in complex model

As shown in Figure 6-4, the S21 parameter between Diff 1 and Diff 2 can be treated as a transfer function. The S21 curve is then described by a 4th-degree polynomial, as shown in Figure 6-6. Next, each coefficient of the 4th-degree polynomial is itself fitted by a polynomial, as shown in Figure 6-7 and Table 6-2. Finally, the S21 parameter for intermediate ball bumps in a series (M=4 and 7) can be captured by the polynomials, as shown in Figure 6-24; the curves generated by the polynomials match the insertion loss for both M=4 and M=7.

Figure 6-24. Comparison of S-parameters of intermediate ball bumps and calculated polynomials. A) M=4, B) M=7
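The two-step fit described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the fit used in this work: the insertion-loss data are generated by a hypothetical function `synthetic_s21_db`, the training bump pairs are M=1, 7 and 13, and the dependence of each coefficient on M is assumed to be quadratic, since the degree of that second polynomial is not restated here.

```python
import numpy as np

freq_ghz = np.linspace(0.0, 5.0, 51)   # DC to 5 GHz, in GHz for conditioning
train_m = np.array([1, 7, 13])         # bump pairs with available S21 data

def synthetic_s21_db(m, f_ghz):
    """Hypothetical stand-in for simulated insertion loss in dB."""
    return -(0.10 + 0.01 * m) * f_ghz - (0.02 + 0.004 * m) * f_ghz ** 2

# Step 1: fit a 4th-degree polynomial in frequency for each training M.
freq_coeffs = np.array([np.polyfit(freq_ghz, synthetic_s21_db(m, freq_ghz), 4)
                        for m in train_m])          # shape (3, 5)

# Step 2: fit each of the 5 coefficients as a polynomial in M
# (degree 2 is an assumption of this sketch).
coeff_models = [np.polyfit(train_m, freq_coeffs[:, k], 2) for k in range(5)]

def predict_s21_db(m, f_ghz):
    """Evaluate the polynomial model for an intermediate bump pair M."""
    coeffs_at_m = [np.polyval(model, m) for model in coeff_models]
    return np.polyval(coeffs_at_m, f_ghz)

max_err_db = np.max(np.abs(predict_s21_db(4, freq_ghz)
                           - synthetic_s21_db(4, freq_ghz)))
print("max error at M=4 (dB):", max_err_db)
```

With real simulation data the residual would not vanish as it does for this synthetic polynomial input, so the fit error at the intermediate bump pairs should be checked against a direct simulation, as in Figure 6-24.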

6.5 Summary

In this chapter, a cost-effective modeling methodology is proposed for the die-to-package connectivity in FCBGA. This modeling allows the die and the package to be simulated separately rather than co-simulated. In addition, building an FCBGA model using polynomials to save computational cost was proposed. Using both methodologies, the insertion losses can be captured for the intermediate ball bumps in a series on the separated die and separated package, not just the last bump. The package system was simulated in order to evaluate the electrical interaction between the top metal layer of a die and the package layers, so that die-level and package-level designers can make early decisions. These techniques can be used for large and complex models of FCBGAs and other applications.

CHAPTER 7 ON-DIE POWER SUPPLY NOISE MEASUREMENT SYSTEM IN CHIP-PACKAGE SYSTEM

7.1 Introductory Remarks

Power integrity is an increasingly significant factor in chip, package and PCB design as integration technology advances. Power fluctuation degrades the performance and functionality of the chip, package and board. Figure 7-1 depicts a typical supply noise waveform measured on the die, caused by rapid, large changes in supply and I/O current and by IR events. Note that the waveform includes four different droops, one from each resonance frequency in the impedance of the power delivery network. In addition, the noise waveform varies with the power delivery system of the product, such as a tablet, smartphone or PC.

Figure 7-1. Microprocessor Vcc fluctuations caused by interaction of parasitics with changes in current demand [7.1]

Figure 7-2 shows the impedance of the Intel microarchitecture (Nehalem) Power Delivery Network (PDN) [7.2]. The supply noise frequency has remained roughly constant in the 100 MHz range over the last few process technology nodes, since chip packages have not changed significantly and the reduction in the parasitic inductance of the power and ground C4 bumps has been matched to first order by an increase in the on-chip decoupling capacitance.


Figure 7-2. The impedance profile of Intel core micro-architecture (Nehalem)

Currently, the overall target of power integrity characterization is to ensure that high-quality supply voltages can be provided to on-chip circuits. Moreover, a Voltage Regulator Module (VRM) fully integrated on die, as demonstrated by universities, is a future option. In other words, power integrity characterization becomes more significant as system complexity increases. Accordingly, all levels of a system, from the microprocessor at the lowest level to the package, printed circuit board and VRM, have to be analyzed [7.3], since device operation is directly affected by the on-chip power supply noise caused by transients of the power delivery network.

Therefore, the significance of this research is that circuit-level designers must anticipate the various power supply noise sources in order to address any risk immediately, and PDN-level designers should recognize the design metrics based on the power supply noise in order to suppress PDN stress. In addition, on-chip power supply noise characterization can reduce package cost and help prevent system failures in commercial electronics in the marketplace.


Figure 7-3. A strategy of on-chip power supply noise measurement system

The strategy of the on-chip power supply noise measurement system is separated into three key segments: Noise Characterization, Circuit Design and Eco-system, as shown in Figure 7-3.

First, on-die power supply noise is created by the combination of the current profile, which is the noise source on die, and the impedance profile of the power delivery network. For example, the on-die current profile can be separated into two representative circuit activities: core-type circuit activity and I/O-type circuit activity. Second, a power supply noise measurement circuit has to be designed on die to detect the power supply noise.

Hence, the author proposes on-die noise measurement circuits demonstrated in both UMC 90nm and STMicroelectronics 65nm technology. In addition to designing the circuit to detect the noise, enhancing the accuracy of the system is important. Therefore, each block of the measurement circuit has been analyzed so that the noise is detected precisely. Lastly, the measurement circuit has to interact with an eco-system that enables the circuit through hardware and software support. The location of the measurement circuit on die is as important as the design of the circuit itself, so the author also describes how to determine the location in the microprocessor.

7.2 Noise Characterization

Power Distribution Network (PDN) designers usually verify the performance of the PDN design using equation (7-3). PDN designers obtain ZPDN from the package and board model, and circuit designers provide the current profile, which is the excitation (IEXCITATION) of the circuit activity, to the PDN designers. VNOISE can then be derived from the impedance profile (ZPDN) and the excitation model (IEXCITATION).

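$$V_{\mathrm{NOISE}}(f) = Z_{\mathrm{PDN}}(f) \times I_{\mathrm{EXCITATION}}(f) \tag{7-1}$$

$$V_{\mathrm{NOISE}}(f) \le V_{\mathrm{TARGET}} \tag{7-2}$$

$$Z_{\mathrm{PDN}}(f) \times I_{\mathrm{EXCITATION}}(f) \le V_{\mathrm{TARGET}} \tag{7-3}$$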

Once VNOISE is obtained from equation (7-1), the PDN designers verify whether VNOISE meets their specification, VTARGET, as in (7-2). VTARGET has to be determined carefully, since a non-optimal VTARGET can result in over- or under-designed PDNs. An over-designed PDN increases cost, whereas an under-designed PDN degrades performance. PDN cost can therefore be reduced if other factors of system performance compensate for VNOISE. Next-generation product decisions can be better informed by quantifying VNOISE sensitivity and evaluating package performance.
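The verification flow above amounts to multiplying the impedance profile by the excitation spectrum and comparing the result with the target at each frequency. The following is a minimal sketch of that check; the frequency points, impedance magnitudes, current magnitudes and target values are hypothetical and chosen only to show one passing and one failing case.

```python
import numpy as np

def noise_voltage(z_pdn, i_excitation):
    """Per-frequency noise magnitude |Z_PDN(f)| * |I_EXCITATION(f)| in volts."""
    return np.abs(np.asarray(z_pdn)) * np.abs(np.asarray(i_excitation))

def meets_target(z_pdn, i_excitation, v_target):
    """True when the noise stays at or below V_TARGET at every frequency point."""
    return bool(np.all(noise_voltage(z_pdn, i_excitation) <= v_target))

# Hypothetical values for illustration only.
freq_hz = np.array([1e6, 100e6, 1e9])
z_pdn_ohm = np.array([0.010, 0.050, 0.020])   # impedance profile magnitude
i_exc_a = np.array([2.0, 1.5, 1.0])           # excitation spectrum magnitude

v_noise = noise_voltage(z_pdn_ohm, i_exc_a)
for f, v in zip(freq_hz, v_noise):
    print(f"{f / 1e6:8.1f} MHz: {v * 1e3:.1f} mV")

print("meets 100 mV target:", meets_target(z_pdn_ohm, i_exc_a, 0.100))
print("meets  50 mV target:", meets_target(z_pdn_ohm, i_exc_a, 0.050))
```

Tightening the target from 100 mV to 50 mV in this example turns a passing PDN into a failing one, which is the over-design versus under-design trade-off discussed above.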

7.2.1 Noise Source

In Figure 7-4, the Intel Sandy Bridge microprocessor [7.5] is composed of processor cores, I/O, L3 cache, a graphics processor and a memory controller. Each block of the microprocessor contains circuits, and the current profile is determined by their activity. Two circuit activities are representative: core-type circuit activity and I/O-type circuit activity. These two are the main sources of the power supply noise that affects both circuit performance and PDN performance.


Figure 7-4. Intel sandy bridge microprocessor [7.5]

The elements of CPU PDN noise sources are categorized by core-type circuits and I/O-type circuits in Table 7-1. Examples of the circuit blocks, physical characteristics and PDN stress are presented in the table.

Table 7-1. Elements of CPU PDN noise sources

| Circuit type | Example | Physical characteristic | PDN stress |
|---|---|---|---|
| Core-type circuits | CPU core; graphics core | Large areas on the die (significant power-ground capacitance); can operate at low voltage supported by process | Clock gating starts/stops the clock tree and logic; power gating powers up/down large Cdie; activity factor (% gate switching) varies quickly, i.e., virus; data patterns |
| I/O-type circuits | Parallel I/O (e.g., DDR); serial I/O (e.g., PCIE); chip-to-chip I/O (on- and off-module) | Generally industry standard (there are proprietary I/O buses); smaller die area, usually along the die periphery; output drivers capable of pF's of load | Data bursts near PDN resonance (i.e., DDR protocol); data patterns; power management that controls the power or activity factor of parts of the I/O interfaces |

Note that the noise sources affect the PDN stress depending on the level of each activity. The excitation models for core-type and I/O-type activity are described in Table 7-2 and Table 7-3.

Table 7-2. Noise sources on the core

| Activity | Description |
|---|---|
| Clock Gating | A control signal can start and stop the clock. This effectively halts all logic switching within that clock domain. Dynamic current is zero while static power (leakage) is in effect. Actual implementation may have multiple clock domains, more than one clock each on different phases. |
| Power Gating | A control signal makes or breaks current flow into the voltage domain, affecting the DC path from power supply to the circuits in the voltage domain. Dynamic and static current is zero. Power gating can be implemented at any level: board level, die level and local circuit block level. |
| Activity Change | This is a change in activity, either increasing activity or decreasing activity. This depends on the instructions being executed at the time and can range from polling (AOAC) to intense mathematical calculations (graphics transcoding) and vice versa. This activity change might be associated with voltage and frequency scaling. |
| Cyclic Activity | This is a repetitive behavior and probably has low likelihood of occurring in actual system use, although design trends such as rush-to-halt might increase the probability of occurrence. |

Table 7-3. Noise sources on the I/O

| Activity | Description |
|---|---|
| Data patterns | Particular sequences of data patterns, especially when executed across many outputs simultaneously, can stress the I/O PDN. MemTest is an example of such code. Intensive 3D graphics can also stress the I/O PDN. Low-level bus control can create patterns that stress the PDN resonance by writing sequences of 1's and 0's. |
| Activity Change | I/O circuit activity can change significantly depending on the task or tasks that the CPU is executing. For example, a CPU can go from a near-sleep state to high activity such as 3D graphics and see a significant change in activity level. I/O activity for the CPU can change significantly as it switches between READ, WRITE and Tri-State modes. |
| Data burst | Data WRITES can occur in bursts with periods of little or no bus activity between data WRITE bursts. The length of the bursts as well as the NOPs is dictated by the bus protocols. The bursts can occur in a way to excite the I/O PDN resonance. |

Hence, the excitation model Icc(t) needs to be defined from the noise sources so that the power supply noise can be generated from each type of excitation and the impedance of the power delivery network. This is also addressed in this research.

7.2.2 Power Delivery Network

As mentioned in Chapter 2, the power delivery network (PDN) of a microprocessor system can be simplified into a lumped RLC network representing the die, package and board, including decoupling capacitors that keep the impedance below the target impedance. Ideally, the PDN provides sufficient voltage and current for the IC bias at the instant the transistors switch. However, simultaneous switching of several devices, L·di/dt, chip-package resonance and inadequate decoupling capacitance can create power supply noise. The inductive elements increase the impedance of the power delivery network.

In order to simulate the power supply noise produced by the combination of the excitation models and the impedance of the power delivery network, the author created a simple die model, shown in Figure 7-5, and a SPICE model of the power delivery network corresponding to the die floor plan, shown in Figure 7-6.

Figure 7-5. Die floor plan


Figure 7-6. SPICE model of the die floor plan PDN

Using the SPICE model of the power delivery network, the core-type and I/O-type impedance profiles are derived as shown in Figure 7-7 and Figure 7-8, so that power supply noise can be generated with the excitation models. The resonant frequencies of the impedance profiles are 101 MHz for the core-type PDN and 105 MHz for the I/O-type PDN.
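A resonance near 100 MHz can be reproduced with a much simpler network than the SPICE model used here. The following sketch assumes a single series R-L branch (package) in parallel with a capacitor (die), with hypothetical element values chosen only so that the analytic resonance 1/(2π√(LC)) falls near 100 MHz. These are not the element values of the model in Figure 7-6.

```python
import numpy as np

def pdn_impedance(freq_hz, r_ohm, l_h, c_f):
    """|Z| of a series R-L branch in parallel with a capacitor C."""
    w = 2.0 * np.pi * freq_hz
    z_rl = r_ohm + 1j * w * l_h
    z_c = 1.0 / (1j * w * c_f)
    return np.abs(z_rl * z_c / (z_rl + z_c))

# Hypothetical lumped values (assumptions of this sketch).
R_OHM = 1e-3      # series resistance
L_H = 25e-12      # package and bump inductance
C_F = 100e-9      # on-die decoupling capacitance

f0_analytic = 1.0 / (2.0 * np.pi * np.sqrt(L_H * C_F))

freq = np.logspace(7, 9, 20001)               # 10 MHz to 1 GHz
z_mag = pdn_impedance(freq, R_OHM, L_H, C_F)
f_peak = freq[np.argmax(z_mag)]

print(f"analytic resonance: {f0_analytic / 1e6:.1f} MHz")
print(f"numerical |Z| peak: {f_peak / 1e6:.1f} MHz")
```

Because the resonance depends on the product LC, a lower bump inductance offset by a higher decoupling capacitance leaves the peak frequency roughly unchanged in this model.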

Figure 7-7. Impedance profile of core-type PDN


Figure 7-8. Impedance profile of I/O-type PDN

7.2.3 Simulated Power Supply Noise for the Excitation Models

Now that the current profile and the impedance are modeled, the power supply noise can be generated for core-type and I/O-type circuit activity, as shown in Figure 7-9 and Figure 7-10.

Figure 7-9. Power supply noise based on the core type Icc(t). A) Power up - Power gating (PFET), B) Power up - noise waveform, C) Data Pattern, D) Data Pattern - noise waveform, E) IccMax - PLL IccMax and F) IccMax - noise waveform


Figure 7-10. Power supply noise based on the I/O type Icc(t). A) Power up - Staggering (Wake-up), B) Power up - noise waveform, C) Data Pattern, D) Data Pattern - noise waveform, E) Protocol (DDR) - Burst-Idle-Burst and F) Protocol (DDR) - noise waveform

Based on the simulated power supply noise, the range of the noise is determined as shown in Table 7-4. The frequency of each noise is determined by how fast the voltage droop recovers to the nominal voltage. PDN designers can then examine the PDN design metrics in Table 7-5 in order to address the PDN stress.

Table 7-4. Noise frequency and voltage droop amount of each excitation

| Excitation | Core-type frequency | Core-type voltage | I/O-type frequency | I/O-type voltage |
|---|---|---|---|---|
| Power up | 1 GHz | 35 mV | 117 MHz | 16.6 mV |
| ICCmax | 106 MHz | 14 mV | | |
| Data Pattern | 102 MHz | 125 mV | 102 MHz | 125 mV |
| Protocol | | | 1.5 GHz | 36 mV |

Table 7-5. Design metrics of each excitation

| Excitation | Core-type | Metrics | I/O-type | Metrics | Comment |
|---|---|---|---|---|---|
| Power up | On-board: PMIC (Power Management IC). On-die: power gate, clock gating | Vdroop | On-board: power gates. On-die: stagger logical blocks (i.e., PCI Express lanes) | Vdroop | Step-like behavior (di/dt high) |
| ICCmax | High activity; high DC current; frequency and voltage dependent | Vmin (DC) | High activity; high DC current; frequency and voltage dependent | Vmin (DC) | Sustained maximum activity; treat as a DC analysis (Power DC) |
| Power down | Reverse of power up | Vmax (EOS) | Typically not a high risk or concern | Vmax (EOS) | Load release (overshoot): the supplied load is heavy, then released so that there is no load. Hard to fail the metric (if nominal supply voltage is low enough, TIS is low enough) |
| Data Pattern / Protocol | Ping-pong data between two logic blocks | Vdroop | BIB (Burst-Idle-Burst); high-frequency pattern close to PDN resonance | Vdroop | Encoding helps: DBI (data burst inversion), 8b10b encoding, PRBS encoding. I/O does not take high current, which reduces the risk of data patterns but not of protocol |
| Random | Mixed activity (real stress programs) | Vdroop | Mixed activity (real stress programs) | Vdroop | Software and OS might affect the repetitive pattern. Non-deterministic. Vdroop causes functionality problems and timing issues |

In addition, the behavior of the PDN current Icc(t) can be defined as shown in Figure 7-11. Voltage droop and voltage overshoot both occur, but the voltage droop is the more critical issue because it degrades the performance and functionality of the chip more than the voltage overshoot does. The voltage overshoot, in which the power supply exceeds its nominal voltage, is normally considered for the electrical overstress (EOS) of the system.

Figure 7-11. Behavior of PDN current Icc(t) and PDN voltage Vcc(t). A) Current step, B) PDN voltage Vcc(t) of current step, C) Current impulse train, D) PDN voltage Vcc(t) of current impulse train, E) Complex impulse train and F) PDN voltage Vcc(t) of complex impulse train

Therefore, the two representative noise sources, core-type and I/O-type circuit activity, can be defined in terms of the behavior of the PDN current Icc(t) and the noise waveforms, as shown in Table 7-6, and the PDN voltage Vcc(t) can be anticipated from the behavior of the excitation model alone. This work contributes to the effective design of PDNs.

Table 7-6. Summary of PDN current Icc(t)

| Icc(t) | Description | Core-type | I/O-type | Comment |
|---|---|---|---|---|
| Current Step | Single step or multiple steps; power-up; power-down; circuit blocks can vary in size; can be caused by power gates, clock gates or activity factor; PMIC | Magnitude varies from leakage (Amps) to ICCMAX (10's Amps) or vice versa; time frame is usec; multiple steps may be separated by significant time | Magnitude change is typically 100's mA; time frame is nsec; multiple steps may be separated by significant time | For shared voltage rails, the power-up or power-down of a large circuit block causes noise on the other circuit blocks that remain powered up. For the circuit powering up, latency can be a key performance metric |
| Current Impulse Train | Repeating; bursts; more realistic behavior per clock cycle; superimposes high-frequency noise on top of PDN resonance behavior | Total ICC swing can be very large depending on circuit block size; GHz clock implies large di/dt | ICC swing per I/O is small (~1V/50Ω), so total ICC depends on N bits | Better model of clock gating and activity factor |
| Complex Impulse Train | Changing magnitudes; changing frequency content | Magnitude can vary with time, x, y | Encoding schemes like DBI (data burst inversion) can reduce the number of simultaneously switching bits; staggering is also effective | Will be used to test the validation design system |

In conclusion, core-type and I/O-type noise are demonstrated and characterized using both excitations and noise waveforms. The PDN current Icc(t) of core-type and I/O-type noise can be characterized in terms of current steps, current “impulse” trains and complex “impulse” trains. The differences in noise characteristics between core-type and I/O-type circuits are demonstrated, and the design metrics of each current profile are analyzed. CPU PDN performance can be enhanced by using noise characterization for core-type and I/O-type circuit activity on die.

7.3 On-chip Power Supply Noise Measurement System - Circuit design

Typically, on-die power supply noise is caused by the interaction of parasitics with changes in current demand, such as core and I/O current and IR events, together with the impedance profile of the power delivery network in the package. To reduce the risk from power supply noise, both PDN designers and circuit designers try to lower it by reducing rapid changes in current demand and by decreasing the PDN impedance. However, both are challenging as CMOS feature sizes scale and packaging density grows. Hence, on-chip noise measurement is becoming increasingly necessary, and several on-chip noise measurement systems have been introduced [7.1~7.4] to capture the power fluctuation under various on-die supply voltages. However, operating a noise measurement system from multiple supply voltages is risky because additional power rails increase coupling: noise couples onto resonant structures on the signal trace through the reference plane, for example as common-mode noise including crosstalk, simultaneous switching output noise and plane-noise-induced signal resonance [2.2]. Hence, the measurement system should preferably be powered from a single supply voltage, the one associated with the logical block being measured (such as a core or I/O block), and be placed right next to that block. The power supply noise can then be characterized by the execution of the current profile of specific circuit activities and the impedance of the power delivery network.

Therefore, the author proposes an on-die power supply noise measurement system that uses a single supply voltage and targets repetitive power supply noise. The proposed measurement system is composed of two main sections: a sampling and detection system, and an ADC that converts the sampled voltage into a number of digital pulses. The proposed system is designed in UMC 90nm CMOS technology, as shown in Figure 7-12.

Figure 7-12. On-die noise measurement system

7.3.1 Design Specification

First, the design specification of the measurement system has to be defined. To this end, the author uses the noise characterization of core-type and I/O-type circuit activity demonstrated above. The author assumes that the power supply noise is caused by a repetitive current profile, so that the noise voltage Vcc(t) is also repetitive. Based on the Nyquist sampling theorem [7.6], the clock frequency of the measurement system has to be faster than the noise frequency in order to detect the power supply noise. The noise frequencies are listed in Table 7-4, and the clock frequency in this research is chosen based on them.

As shown in Figure 7-13, the noise is sampled once every clock cycle. Because the measurement system is designed to measure repetitive noise, the effective sampling clock frequency is determined by the step size between successive clock cycles (1st, 2nd, 3rd and so on). The step size is determined by a trigger signal that is synchronous to the clock, and the sequence of the clock cycles is defined by the trigger signal as well. The trigger signal can be supported by software, and the sampling bandwidth is the period of each cycle.

To reconstruct the power supply noise waveform, the measured voltages are stitched together, and the noise can be analyzed in terms of Vmax and Vmin using a look-up table stored by software.
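The stitching of one sample per repetition into a full waveform can be sketched as an equivalent-time sampling loop. This is an interpretation of the scheme described above, not the implemented trigger logic: the noise is modeled as a hypothetical 100 MHz sinusoidal droop around 1 V, one sample is taken per repetition, and the sampling instant is delayed by one additional step on each repetition.

```python
import numpy as np

F_NOISE_HZ = 100e6   # hypothetical repetition rate of the supply noise

def noise_waveform(t_s):
    """Hypothetical repetitive supply noise: 1 V nominal, 100 mV sinusoid."""
    return 1.0 - 0.1 * np.sin(2.0 * np.pi * F_NOISE_HZ * t_s)

def equivalent_time_sample(waveform, period_s, n_steps):
    """Take one sample per repetition, delayed by k steps on repetition k."""
    step_s = period_s / n_steps
    k = np.arange(n_steps)
    t_real = k * period_s + k * step_s    # actual sampling instants
    t_equiv = k * step_s                  # position inside one noise period
    return t_equiv, waveform(t_real)

t_equiv, v_meas = equivalent_time_sample(noise_waveform, 1.0 / F_NOISE_HZ, 40)

# Stitch the samples together and extract the extremes of the waveform.
v_min, v_max = v_meas.min(), v_meas.max()
print(f"Vmin = {v_min:.3f} V, Vmax = {v_max:.3f} V")
```

The effective time resolution is the step size (250 ps in this example) rather than the real sampling interval, which is why a repetitive waveform can be reconstructed with a sampler that fires only once per repetition.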

Figure 7-13. Design specification of the measurement system

Conventionally, the range of power supply noise is around ±10% of the die supply voltage and the noise frequency is 10 MHz to 100 MHz [7.7]. However, as the complexity of the die and package increases, the voltage droop and voltage overshoot can exceed ±10% of Vcc. For example, the noise characterization demonstrated earlier exceeds ±10% of Vcc (1 V), as shown in Table 7-4.

Therefore, in this research, the measurement system is designed to measure ±25% of Vcc (1 V). A key feature of the proposed design, and an improvement over Intel's current design, is the measurement of overshoot noise, since most of the industry is interested mainly in voltage droop. In addition, the system targets noise frequencies up to 1.5 GHz.

7.3.2 Sampling and Detection System

The sampling and detection system is composed of the gated D latch and the proposed bootstrapped switch, as shown in Figure 7-14.

Figure 7-14. Sampling and detection system

The gated D latch is transparent when C is high, as shown in Figure 7-15. The clock signal is applied to D, and the enabled sampling clock is set by the width of the sampling window, which is the duration of C.

Figure 7-15. Gate-D Latch schematic and the truth table

Then, the enabled clock signal Q is used as the sampling clock of the bootstrapped switch to measure the power supply noise, as shown in Figure 7-16. Since the noise is repetitive, the sampling window is shifted to the next clock cycle for each measurement. The measurement circuit also needs a certain amount of time to convert the sample to digital pulses. Therefore, the sampling switch measures the noise once every clock cycle and is then shifted to the next clock cycle.

Figure 7-16. Simulation results of the Gated-D latch

As shown in Figure 7-9 and Figure 7-10, both voltage droop and voltage overshoot exceeding the supply voltage have to be detected by a sampling switch in a single-supply system. Therefore, the author designed a bootstrapped switch that can sample and hold voltages within the measurement range. The schematic is shown in Figure 7-17.


Figure 7-17. Bootstrapped switch

The switch has a double charge pump [7.8] to provide the boosted gate voltage and to prevent latch-up. In the ON state, when CLK is high, the input voltage is applied to the drain of the PMOS transistor P1. However, if the input voltage is larger than the bulk potential of M4 (VDD), it can forward bias the p-n diode at the junction between the source/drain and the well of M4, which can severely damage the circuit. Hence, a separate charge pump is used to generate a voltage of 2VDD when the main switch is on, so that the bulk of the PMOS transistor (M4) switches between 2VDD and VDD as the sampling switch turns on and off. This charge pump avoids forward biasing the p-n diode for input voltages larger than VDD, so the sampling switch can be reliably used for sampling signals exceeding VDD.

However, the main switch still has a charge-injection error that has to be minimized. Hence, the author adopts a technique [7.9] that cancels the charge-injection error using a dummy switch: if the width of the dummy switch is exactly one-half that of the main switch, and if the clock is fast, the charge of the dummy switch cancels the charge injected by the main switch. Figure 7-18 shows the simulated improvement obtained with the dummy switch technique.


Figure 7-18. Comparison between the proposed switch and bootstrapped switch

7.3.3 ADC Wired to On-chip Test System

Once the measurement system has detected the noise on the analog signal, the measured voltage has to be converted to digital pulses by the on-chip ADC. The ADC consists of a switched-capacitor voltage doubler, voltage-controlled delay lines and oscillators (VCDL/VCO), a unity-gain buffer, a level shifter implemented as an inverter-chain buffer, a gated D latch and a 7-bit on-chip synchronous counter, as shown in Figure 7-19.

Figure 7-19. A/D wired to On-chip test system

The unity-gain buffer in the ADC drives the detected noise voltage, Vsample, so that it can be used as the supply voltage (Vc) of the ring-type VCO. The buffer therefore has to be capable of driving the detected power supply noise even when it exceeds the supply voltage of the system (1 V). In other words, to build a wide-range measurement system that measures ±25% of Vcc, a switched-capacitor voltage doubler is implemented to supply the buffer.

Hence, the supply voltage of the unity-gain buffer is 2Vcc. The VCO then generates digital pulses whose amplitude equals Vc, the voltage used as VDD of the ring VCO. The output frequency of the VCO increases with Vc, since the VCO frequency is proportional to Vc. The level shifter then makes the amplitude of the VCO output equal to the supply voltage (1 V) so that it can be used as the input signal of the gated D latch. Finally, the digital pulses are passed to the counter only during the conversion window determined by the enable signal. The width of the conversion window depends on the resolution of the ADC.

In the following subsections, each block is described in the order of the ADC signal flow.

7.3.3.1 Switched-capacitor voltage doubler (SCVR)

First, the output of the voltage doubler must be larger than Vmax of the sampled voltage, and the voltage doubler must be able to drive the load current of the buffer. Hence, the author designed two opposite-phase SC converters connected in parallel [7.10] in UMC 90nm and verified their operation, as shown in Figure 7-20. The SCVR does not need separate bootstrap gate drivers, since the structure consists of opposite-phase, parallel, cross-coupled converters. Also, the switching frequency at the output capacitor is twice the clock frequency f, which reduces the output ripple, since the ripple decreases as the frequency increases. The design specification is shown in Table 7-7.

Figure 7-20. Switched-capacitor voltage doubler. A) Schematic of SCVR [7.10], B) Simulation results

Table 7-7. Design specification of switched-capacitor voltage doubler

| Objectives | Values |
|---|---|
| Output voltage of the doubler | 1.8 V |
| Vmax of the sampled voltage | 1.25 V |
| Switching frequency | 3 GHz |
| Output capacitor | 20 pF |
| Output ripple | Less than 10 mV |
| Load current | 0.258 mA @ Vsample: 1.25 V; 0.07 mA @ Vsample: 0.7 V |
| Output transient recovery time | 15 ns |
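The ripple entry in Table 7-7 can be cross-checked with a first-order estimate. The sketch below assumes that the output capacitor alone supplies the load between recharge events, so that the ripple is approximately I_load / (f_ripple × C_out); switch resistance, parasitics and the actual charge-transfer waveform are ignored. Since it is not stated whether the 3 GHz figure is the clock frequency or the doubled frequency seen at the output capacitor, both cases are evaluated.

```python
def ripple_estimate(i_load_a, f_ripple_hz, c_out_f):
    """First-order ripple: load charge drawn between recharge events over C_out."""
    return i_load_a / (f_ripple_hz * c_out_f)

# Values taken from Table 7-7.
I_LOAD_A = 0.258e-3      # load current at Vsample = 1.25 V
C_OUT_F = 20e-12         # output capacitor
F_SW_HZ = 3e9            # switching frequency
RIPPLE_SPEC_V = 10e-3    # "less than 10 mV"

ripple_at_f = ripple_estimate(I_LOAD_A, F_SW_HZ, C_OUT_F)
ripple_at_2f = ripple_estimate(I_LOAD_A, 2 * F_SW_HZ, C_OUT_F)

print(f"ripple if recharged at f : {ripple_at_f * 1e3:.2f} mV")
print(f"ripple if recharged at 2f: {ripple_at_2f * 1e3:.2f} mV")
print("within spec:", ripple_at_f < RIPPLE_SPEC_V and ripple_at_2f < RIPPLE_SPEC_V)
```

Under this simplified model both cases stay below the 10 mV specification, and doubling the recharge rate halves the estimated ripple, which is the benefit attributed to the opposite-phase structure.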

7.3.3.2 Voltage-controlled delay lines and oscillators (VCDL/VCOs)

The VCDL/VCO consists of a buffer and a 5-stage ring VCO. As shown in Figure 7-21, the sampled voltage (Vsample) drives a unity-gain buffer, which generates the supply voltage Vc of the ring VCO. The VCO then generates digital pulses whose frequency is proportional to Vc.

Figure 7-21. VCDL/VCOs [7.11]

7.3.3.3 Unity-gain buffer (Two-stage CMOS OP-AMP)

An ideal op-amp configured as a buffer presents a high impedance to the source. The output is fed back to the inverting input, as shown in Figure 7-22 A, so the output adjusts to make the two inputs the same voltage. In other words, the circuit is a voltage follower. The open-loop gain Av has to be high so that the closed-loop gain, Av/(Av+1), is close to 1 and Vout follows Vin. The closed-loop gain is reported as 0.976 for an open-loop gain of 41 dB. The schematic and simulation results are shown in Figure 7-22 B and Figure 7-23, respectively.
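The relation between open-loop gain and follower gain can be checked numerically. The sketch below evaluates Av/(Av+1) in two ways: with the 41 dB open-loop gain converted to a linear ratio, and with 41 used directly as a linear gain. It is offered only as an arithmetic check; which interpretation underlies the quoted 0.976 is not stated in the text.

```python
import math

def db_to_linear(gain_db):
    """Convert a voltage gain in dB to a linear ratio."""
    return 10.0 ** (gain_db / 20.0)

def follower_gain(av_linear):
    """Closed-loop gain of a unity-gain buffer with finite open-loop gain."""
    return av_linear / (av_linear + 1.0)

av_from_db = db_to_linear(41.0)               # 41 dB as a linear ratio
gain_from_db = follower_gain(av_from_db)
gain_if_linear_41 = follower_gain(41.0)       # 41 taken as a linear gain

print(f"Av for 41 dB          : {av_from_db:.1f}")
print(f"follower gain (41 dB) : {gain_from_db:.4f} "
      f"({20.0 * math.log10(gain_from_db):.3f} dB)")
print(f"follower gain (Av=41) : {gain_if_linear_41:.4f}")
```

Either way the gain error is a few percent or less, and it shrinks as the open-loop gain increases, which is why a high open-loop gain is required for Vout to track Vin.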

Figure 7-22. Ideal buffer (Two-stage OP-AMP). A) Diagram of ideal buffer, B) Two-stage OP-AMP schematic

Figure 7-23. Simulation results of the buffer in UMC 90nm process. A) Open loop gain of the buffer: 41dB, B) Transient simulation Vin and Vout

7.3.3.4 Level shifter

The amplitude of the VCO output is the same as Vc, which means that it may be lower or higher than the supply voltage. A two-stage inverter-chain buffer is designed as a level shifter; it converts the varying amplitude of the VCO output to 1VDD so that it can be used as the input signal of the gated D latch. This also contributes to a wide measurement range.

7.3.3.5 Gated D latch (Conversion window)

The conversion window gates the VCO output signal so that it is enabled only for a fixed period. This allows the on-chip counter to count the pulses that occur within the width of the conversion window. The width of the window (Twin) is determined by the resolution of the ADC, as shown in equation (7-2).

$$\Delta f_{out} = K_{vco} \cdot \Delta V_{ctrl}, \qquad T_{win} = \frac{1}{1\,\mathrm{LSB} \cdot K_{vco}} \tag{7-2}$$

Where: Δfout = output frequency difference between the measured voltages
ΔVctrl = measured voltage difference
Twin = the width of the conversion window

In this work, the duration of the conversion window is set to 30ns, which yields an ADC resolution of 11.03mV for 1LSB (least significant bit). If the system requires a finer resolution, the conversion window must be made longer.
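As a rough numerical check of equation (7-2), the sketch below back-computes the VCO gain implied by the 30ns window and the 11.03mV LSB, and then estimates the window needed for a finer LSB. Kvco is not stated directly in the text and is inferred here; the function names and the 5mV target are illustrative assumptions.

```python
# Sketch of equation (7-2): T_win = 1 / (LSB * K_vco).
# K_vco is NOT given in the text; it is inferred from the quoted
# 30 ns window and 11.03 mV LSB (assumption for illustration).

def implied_kvco(t_win_s: float, lsb_v: float) -> float:
    """VCO gain (Hz/V) for which one extra count in t_win equals one LSB."""
    return 1.0 / (t_win_s * lsb_v)


def window_for_lsb(lsb_v: float, kvco_hz_per_v: float) -> float:
    """Conversion window (s) needed to resolve lsb_v with the given VCO gain."""
    return 1.0 / (lsb_v * kvco_hz_per_v)


kvco = implied_kvco(30e-9, 11.03e-3)       # roughly 3 GHz/V
t_win_5mv = window_for_lsb(5e-3, kvco)     # hypothetical 5 mV target

print(f"implied Kvco = {kvco / 1e9:.2f} GHz/V")
print(f"window for a 5 mV LSB = {t_win_5mv * 1e9:.1f} ns")
```

Under these assumptions, halving the LSB doubles the required window, which is the trade-off between resolution and conversion time described above.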

7.3.3.6 7 bit synchronous counter

The digital pulses that are enabled during the conversion window are counted by the 7-bit synchronous counter. The author chose a synchronous on-chip counter because all of its flip-flops are clocked together, which avoids the accumulated propagation delay that creates errors, especially at high clock rates. The number of bits is also determined by the resolution of the ADC, since the number of digital pulses increases as the conversion window becomes wider or the VCO frequency becomes higher. In addition, the number of digital pulses will be calibrated to track the sampled power supply noise accurately, as demonstrated in the next section. For reference, the 7-bit on-chip synchronous counter consists of AND gates, NOR gates and D flip-flops.
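The relation between the expected pulse count and the counter width can be sketched as follows. The helper name is hypothetical; the counts 42 and 87 are the values reported later in this chapter for the Vdroop and Vovershoot simulations.

```python
# Sketch: smallest counter width that can hold a given pulse count.
# counter_bits is an illustrative helper, not part of the design.

def counter_bits(max_count: int) -> int:
    """Number of bits needed to represent the counts 0..max_count."""
    if max_count < 0:
        raise ValueError("max_count must be non-negative")
    return max(1, max_count.bit_length())


for count in (42, 87, 127, 128):
    print(count, "pulses ->", counter_bits(count), "bits")
```

A 7-bit counter covers counts up to 127, so a longer conversion window or a faster VCO that pushes the count beyond 127 would need an eighth bit.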

7.3.4 Overall Simulation Results of the Measurement System in UMC 90nm

Now that all of the blocks have been described, the overall simulation results in UMC 90nm technology are presented. Two simulation results are shown: one for Vdroop, when the system samples the minimum voltage of the measurement range (Vsample = 0.75V), and the other for Vovershoot, when the system samples the maximum voltage of the measurement range (Vsample = 1.25V). The simulations sample the noise generated by the excitation model shown in Figure 7-24 (A) and the impedance of the Core-type PDN shown in Figure 7-7. For reference, the voltages at Vdroop and Vovershoot can be adjusted depending on the supply voltage of the PDN.

Figure 7-24. Simulated excitation model and power supply noise. A) Current step excitation model, Icc(t), B) Simulated power supply noise, Vcc(t)

In addition, the power supply noise Vcc(t) shown in Figure 7-24 (B) is created by the current step excitation model Icc(t). As mentioned earlier, design metrics of the power delivery network can be defined so that PDN designers can address any risk immediately, as shown in Figure 7-11. In this case, the design metric is “voltage droop”, since the first voltage droop is dominant.

Figure 7-25. Simulation results of the measurement circuit for Vsample = 0.75V

As shown in Figure 7-25, the sampling switch samples the power supply noise once every clock cycle, as stated in the design specification of the measurement system. The enable sampling window is 1ns wide, which allows one pulse of the 3.03GHz clock signal to pass through. The 3.03GHz clock frequency is chosen with respect to the noise frequency, which is 230MHz in the power supply noise waveform. The measurement system can therefore detect the noise, because the 3.03GHz clock is more than two times faster than the noise frequency.

Assuming that the sampling window is shifted to the next clock cycle, the system samples at the minimum voltage of the power supply noise. Ideally the sampled voltage would be 0.75V; however, it is 0.736V, which is about 98% of 0.75V, because of the offset left when the bulk capacitor is charged through the sampling switch. The capacitor charges to approximately 98% of its final value (1 − e^−4 ≈ 0.98) within four RC time constants, which the design specification quotes as ~99%.

Then, the unity-gain buffer generates Vc from Vsample, and Vc is used as the VDD of the VCO. There is an offset between the input and output of the buffer, since its gain is not exactly 1. The VCO then generates digital pulses with the same amplitude as Vc, and the output frequency of the VCO is proportional to Vc.

Next, the amplitude of the VCO output is converted to 1VDD so that it can be processed as the input signal of the gated D latch, since the amplitude may be higher or lower than Vcc.

Then, the pulses are enabled during the 30ns conversion window. However, the system needs to wait until Vc has settled. The author names this delay the “unity-gain buffer delay”; it normally takes 15ns. Hence, the unity-gain buffer delay has to be considered when determining the starting point of the conversion window. The pulses to be counted by the on-chip counter are shown in red in Figure 7-25. The 7-bit on-chip counter counts the red pulses, and its output is the 7-bit value 0101010(2), which is equal to 42, as shown in Figure 7-26. The number of bits of the counter is determined by the number of enabled pulses, which depends on the resolution of the conversion process. This demonstration shows that the proposed measurement system can detect the noise, Vsample = 0.75V, and convert it to a number of digital pulses, 42.

Figure 7-26. Simulation results of 7bit on-chip counter for Vsample = 0.75V

The other simulation result, for the Vovershoot case, is shown in Figure 7-27. For this simulation, the noise is regenerated so that Vovershoot is 1.25V. To do so, the transient simulation is performed under a different supply voltage with the Icc(t) and the impedance of the PDN.

Figure 7-27. Simulation results of the measurement circuit for Vsample = 1.25V

Assuming that the sampling window is shifted to the clock cycle where Vovershoot occurs, the sampling switch samples Vovershoot. As in the Vdroop case, Vsample is 1.23V rather than 1.25V, since there is an offset between the sampling switch and the holding capacitor. The sampled voltage of 1.23V is about 98% of 1.25V, which corresponds to charging the capacitor for four RC time constants. The procedure after the sampling switch is the same as in the Vdroop example. Finally, the simulation result of the counter shows that the measurement system detects Vovershoot and converts it to 87 digital pulses, as shown in Figure 7-28.


Figure 7-28. Simulation results of 7bit on-chip counter for Vsample = 1.25V

As the simulation results for both the Vovershoot and Vdroop cases show, the measurement system can detect the power supply noise and convert it to digital pulses properly. For reference, the difference in the number of digital pulses between the Vdroop case (42 pulses) and the Vovershoot case (87 pulses) is 45 pulses, which is ΔCount. The difference in amplitude between Vdroop (0.736V) and Vovershoot (1.23V) is 494mV, which is ΔVamplitude. As mentioned earlier, the resolution of the conversion process, Vresolution, is 11.03mV.

At this point, it is necessary to verify that ΔVamplitude can be reproduced from Vresolution and ΔCount using equation (7-4).

$$\Delta V_{amplitude} = \Delta Count \times V_{resolution} \tag{7-4}$$

Based on equation (7-4), ΔVamplitude is equal to 496mV (45 × 11.03mV), which is close to 494mV.

Hence, it is shown that the difference in the sampled voltage can be verified from the resolution of the conversion process and the difference in the number of digital pulses.
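The arithmetic behind equation (7-4) can be reproduced with the values quoted above. This is only a consistency check of the reported numbers; the variable names are illustrative.

```python
# Check of equation (7-4) with the values quoted in the text.
V_RESOLUTION = 11.03e-3               # V per count (1 LSB)
COUNT_DROOP, COUNT_OVERSHOOT = 42, 87
V_DROOP, V_OVERSHOOT = 0.736, 1.23    # sampled voltages (V)

delta_count = COUNT_OVERSHOOT - COUNT_DROOP
dv_predicted = delta_count * V_RESOLUTION     # equation (7-4)
dv_simulated = V_OVERSHOOT - V_DROOP
mismatch_pct = abs(dv_predicted - dv_simulated) / dv_simulated * 100

print(f"delta count = {delta_count}")
print(f"predicted = {dv_predicted * 1e3:.2f} mV, "
      f"simulated = {dv_simulated * 1e3:.0f} mV")
print(f"mismatch = {mismatch_pct:.2f} %")
```

The mismatch between the predicted and simulated amplitude difference stays well below one LSB over the 45-count span.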

7.4 Accuracy of the Measurement System in UMC 90nm

The power supply noise has to be measured precisely by the proposed measurement system, yet many on-die variations affect its accuracy. Therefore, the author analyzed the sources of error in the proposed measurement system so that the causes of inaccuracy are understood and can be mitigated. First, the measurement circuit is separated into three logical blocks: pass-through or block the voltage, VCO, and counter, as shown in Figure 7-29.

Figure 7-29. 3 logical blocks of the measurement circuit

Then, the author analyzed where error is introduced in each of the three logical blocks of the measurement system, as shown in Figure 7-30; the sources are listed in Table 7-8.


Figure 7-30. Sources of error in the 3 logical blocks of the measurement circuit

Table 7-8. List of the sources of error

| Block | Objective | Target | Issue |
|---|---|---|---|
| Pass-through or block the voltage | Trigger signal (aperture jitter) | The system uses a trigger signal through HW supports (SW); it is then calibrated out. | Jitter and skew cause inaccuracy when the noise needs to be sampled in time. Aperture jitter affects the bandwidth of the system (whether or not it can measure the input noise). |
| | Sample & hold switch | 165ps > 71.2ps, where 165ps = ideal sampling period at 3GHz sampling frequency (71.2ps = 4τ, capacitor 99% charged) | The time needed to charge up to 99% has to be shorter than the sampling period. |
| | Sampling frequency | Min. 3GHz (since fnoise = 1.5GHz) | The sampling frequency of the system is the step size of each clock cycle, not the bootstrapped switch sampling frequency. |
| | Cutoff frequency | Min. 1.5GHz | f3db = 1/(2π*RC) > 1.5GHz, where R = Ron of the switch and C = bulk cap |
| | Vdroop of the capacitor | Check if Vdroop is negligible: 0.1mV/30ns @ 100ºC (TT), 1mV/30ns @ 100ºC (FF) | Vdroop varies depending on the leakage current of the SW and the cap itself. |
| | Leakage current surrounding the cap | Negligible leakage current corresponding to the Vdroop of the capacitor | Leakage occurs at the gate of the switch and affects the accuracy. The leakage also varies with process and temperature. |
| VCO | Resolution of ADC (1LSB) | 12.5mV | The conversion window size has to be chosen with the resolution of the ADC in mind. |
| | Voltage variation [3]: unity-gain buffer | Minimize Vpp; Av: at least ~30dB | There is a ripple at the output of the buffer. This could affect the accuracy of the ADC. |
| | Voltage variation [2]: SC voltage doubler | Less than 50mV | There is a ripple at the output of the voltage doubler. This affects the buffer as well. |
| Counter | On-chip counter | 7-bit synchronous | Prevent errors from a glitch and an asynchronous clock. |
| VDD noise rail | Voltage variation [1] | Make the circuit as immune as possible to the power supply noise | Voltage variation would affect the system. |
| Process/Temperature | PVT variation (process, voltage and temperature) | Calibrate it out | NFET and PFET are affected by the process, and the leakage increases exponentially with temperature; this affects the accuracy. |


All sources of error were analyzed block by block, and a target was set for each so that the accuracy of the measurement circuit can be improved. The author then went through each source one by one to find a solution that mitigates the inaccuracy.

7.4.1 Trigger Signal – Aperture Jitter

The measurement system uses a trigger signal provided through SW. When the trigger signal arrives, the sampling switch samples the noise once every clock cycle. At that moment, the switch is exposed to errors caused by jitter and skew unless the power supply noise is sampled at exactly the intended time.

As shown in Figure 7-31, there is a voltage error related to the aperture jitter, which is the sample-to-sample variation in the instant at which the switch opens. The amplitude of the associated output error depends on the rate of change of the analog input. The external sampling clock also contributes jitter; therefore, the total jitter is the root-sum-square of the external sampling clock jitter and the sampler aperture jitter [7.12].

Figure 7-31. Effects of aperture jitter and sampling clock jitter [7.12]

Based on equation (7-5), the allowed aperture jitter is determined. In other words, the jitter limits the sampling bandwidth of the measurement system.

$$SNR = 20 \log_{10}\left( \frac{1}{2\pi \cdot f_{input} \cdot T_j} \right) \tag{7-5}$$

Where: Tj = Aperture jitter
f_input = Input noise frequency

Equation (7-5) is plotted in Figure 7-32, which shows that the allowed aperture jitter is determined by the SNR (signal-to-noise ratio) and the ENOB (effective number of bits). In the measurement system, the ENOB of the ADC is 7 bits and the analog noise input frequency of interest is 1.5GHz.

Figure 7-32. Theoretical data converter SNR and ENOB due to jitter vs. fullscale sinewave input frequency [7.12]

Hence, when the sampling switch samples the noise at the step-through size given in the design specification, the aperture jitter has to be taken into account in order to sample the power supply noise with the measurement system. In addition, the measurement system can be improved by calibrating out the skew and averaging out the jitter.
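To give a feel for the numbers in equation (7-5), the sketch below inverts it for the 7-bit, 1.5GHz case. The conversion from ENOB to SNR uses the standard ideal-quantizer relation SNR = 6.02·N + 1.76 dB, which is an assumption added here and is not stated in the text; the function names and the example jitter values are illustrative.

```python
# Sketch of equation (7-5) and its inverse.
# The ENOB-to-SNR relation (6.02*N + 1.76 dB) is the standard
# ideal-quantizer formula and is an assumption added for illustration.
import math


def jitter_limited_snr_db(f_input_hz: float, t_jitter_s: float) -> float:
    """Equation (7-5): SNR limit set by jitter at a given input frequency."""
    return 20.0 * math.log10(1.0 / (2.0 * math.pi * f_input_hz * t_jitter_s))


def allowed_jitter_s(f_input_hz: float, enob_bits: float) -> float:
    """Largest rms jitter that still supports the requested ENOB."""
    snr_db = 6.02 * enob_bits + 1.76
    return 1.0 / (2.0 * math.pi * f_input_hz * 10.0 ** (snr_db / 20.0))


def total_jitter_s(clock_jitter_s: float, aperture_jitter_s: float) -> float:
    """Root-sum-square of sampling clock jitter and aperture jitter."""
    return math.hypot(clock_jitter_s, aperture_jitter_s)


tj = allowed_jitter_s(1.5e9, 7)
print(f"allowed total jitter = {tj * 1e12:.2f} ps rms")
print(f"example RSS jitter   = {total_jitter_s(0.3e-12, 0.4e-12) * 1e12:.2f} ps rms")
```

Under these assumptions the jitter budget for 7 effective bits at 1.5GHz is below 1ps rms, and it has to be shared between the external clock and the sampler.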

7.4.2 Switch and Holding Capacitor

The on-resistance Ron of the switch and the capacitance C determine the analog input bandwidth of the measurement system, as shown in Figure 7-33.

Figure 7-33. Simplified RC model of the sampling switch and holding capacitor

Ron is given by equation (7-6), which shows that Ron is affected by many parameters: input voltage, supply voltage, oxide thickness, gate length and gate width. The MOSFET also has to be optimized as an analog switch; Ron is reduced by increasing the width/length ratio of the MOSFET.

$$R_{on} = \frac{L^2}{\mu \cdot Q_{ch}} = \frac{L^2}{\mu \cdot W L \cdot C_{ox}(V_{DD} - V_{th} - V_i)} = \frac{1}{\mu \cdot C_{ox} \frac{W}{L}(V_{DD} - V_{th} - V_i)} \tag{7-6}$$

For reference, the Ron of both the main switch and the dummy switch is derived from parameters such as “beta” and the threshold voltage Vth, taken from the DC operating point of a Spectre simulation in Cadence.

Based on Ron and the holding capacitor, the cutoff frequency of the sampling switch is given by equation (7-7). The resulting frequency is 8.94GHz, which is above the required 3GHz for a maximum noise input frequency of 1.5GHz: the cutoff bandwidth of the switch has to be more than two times the maximum input noise frequency, following the Nyquist theorem [7.6].

$$f_{3dB} = \frac{1}{2\pi \cdot R_{on} \cdot C} \tag{7-7}$$

Next, consider the sample-and-hold switch in terms of the sampling period and the RC time constant. In this simulation, the clock frequency is set to 3GHz, since the fastest input noise frequency is 1.5GHz. The sampling period D, ideally 165ps, therefore has to be longer than four RC time constants, as in equation (7-8), so that the capacitor can charge to ~99% of the sampled voltage. When the condition D > 4τ is satisfied, the sample-and-hold switch achieves an accuracy of ~99% of the sampled voltage, as shown in Figure 7-34.

$$D > 4\tau, \quad f_{3dB} = (2\pi\tau)^{-1}, \quad \tau = R_{on} \cdot C \tag{7-8}$$

Where: D = sampling period
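Equations (7-7) and (7-8) can be evaluated directly. The sketch below uses Ron = 356Ω and C = 50fF, the values listed for this design in Table 7-14; the helper name is illustrative, and the first-order RC model is the simplified one of Figure 7-33.

```python
# Sketch: cutoff frequency and settling check for the sampling switch,
# using the first-order RC model (R_on and C values from Table 7-14).
import math

R_ON = 356.0       # ohm
C_HOLD = 50e-15    # F
D_IDEAL = 165e-12  # s, ideal sampling period quoted in the text

tau = R_ON * C_HOLD                       # RC time constant
f_3db = 1.0 / (2.0 * math.pi * tau)       # equation (7-7)
settled_at_4tau = 1.0 - math.exp(-4.0)    # fraction of final value after 4*tau


def meets_sampling_condition(d_s: float, tau_s: float) -> bool:
    """Equation (7-8): the sampling period must exceed four time constants."""
    return d_s > 4.0 * tau_s


print(f"tau = {tau * 1e12:.1f} ps, 4*tau = {4 * tau * 1e12:.1f} ps")
print(f"f3dB = {f_3db / 1e9:.2f} GHz")
print(f"settled after 4*tau = {settled_at_4tau * 100:.1f} %")
print("D > 4*tau:", meets_sampling_condition(D_IDEAL, tau))
```

With these values the ideal 4τ is 71.2ps, which leaves margin against the 165ps ideal sampling period.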

Figure 7-34. Diagram of the sampling period and the 4 RC time constant

Based on the simulation results shown in Figure 7-35, the actual sampling period D of 203.7ps is longer than the simulated four-RC-time-constant interval of 143.8ps, measured at the point where the capacitor is charged to 990mV, which is 99% of 1V.

Figure 7-35. Simulation results of RC time constant and sampling period. A) ~99% of the cap, 4τ = 143.8ps, B) Sampling period, D = 203.7ps

7.4.3 Vdroop Rate of the Capacitor

Voltage droop occurs at the capacitor when the switch enters hold mode. The droop rate is the rate at which the output voltage changes due to leakage from the hold capacitor. In this measurement system, the Vdroop rate is calculated from how much the voltage drops during 30ns, the width of the conversion window.

Figure 7-36. The equation of the leakage current

The capacitor model is MIMCAPS_20F_MM (50fF) in the UMC 90nm process. A metal-insulator-metal (MIM) capacitor is constructed from a thin insulating film between two metal plates. The author chose the MIM capacitor because it is insensitive to the silicon substrate, since the electric field is largely enclosed between the upper metal layers [7.13].

Analyzing the droop rate is one of the key factors in improving the accuracy of the measurement system, since it is important to know how much the leakage from the hold capacitor affects the measurement. The amount of voltage droop is listed in Table 7-9, and the droop rate under PVT variation at 1.25V is shown in Figure 7-37.

Table 7-9. Voltage droop amount at the capacitor in PVT variation @1.25V

Voltage droop in 30ns (V)

| Degree (ºC) | Typ | SS | FF |
|---|---|---|---|
| -50 | 0.052m | 0.034m | 0.23m |
| 27 | 0.098m | 0.051m | 0.245m |
| 100 | 0.1m | 0.06m | 0.275m |


Figure 7-37. Voltage droop rate of the capacitor in PVT variation

Based on the results, the droop rate of the capacitor increases with temperature and is highest in the FF corner. However, the droop rate is negligible, since it amounts to less than 1% of the voltage variation (voltage droop and voltage overshoot, ±25% of VCC) that defines the measurement range of the system.

7.4.4 Leakage Current of Surrounding the Capacitor in PVT Variations

There are three main leakage paths surrounding the capacitor: the gate and channel leakage of the switch (NMOS), the leakage of the capacitor itself, and the gate leakage of the buffer (NMOS), as shown in Figure 7-36. Although the leakage current increases with temperature and is largest in the FF corner, it is also negligible according to the simulation results. The leakage current is derived from equation (7-9). Given the hold capacitance, the conversion time and the voltage droop amounts in Table 7-9, ILKG can be calculated; the results are shown in Table 7-10 and Figure 7-38.

$$\frac{\Delta V}{\Delta t} = \frac{I_{LKG}}{C_H} \tag{7-9}$$

Where: ILKG = Leakage current, CH = hold capacitance, Δt = Conversion time

Table 7-10. Leakage current at hold mode in PVT variation @1.25V

Leakage current amount in 30ns

| Degree (ºC) | Typ | SS | FF |
|---|---|---|---|
| -50 | 86pA | 56pA | 383pA |
| 27 | 163pA | 85pA | 408pA |
| 100 | 166pA | 100pA | 458pA |
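The leakage values in Table 7-10 follow from the droop values in Table 7-9 through ILKG = CH·ΔV/Δt. The sketch below reproduces that step, assuming that the droop entries are millivolts accumulated over the 30ns conversion window and that CH is the 50fF hold capacitor; the dictionary layout and helper name are illustrative.

```python
# Sketch: leakage current from voltage droop, I_LKG = C_H * dV / dt.
# Assumes the Table 7-9 entries are mV of droop over the 30 ns window.
C_HOLD = 50e-15   # F, hold capacitor
T_WIN = 30e-9     # s, conversion window

DROOP_MV = {      # temperature (deg C) -> corner -> droop in mV per 30 ns
    -50: {"Typ": 0.052, "SS": 0.034, "FF": 0.23},
    27: {"Typ": 0.098, "SS": 0.051, "FF": 0.245},
    100: {"Typ": 0.1, "SS": 0.06, "FF": 0.275},
}


def leakage_pa(droop_mv: float) -> float:
    """Leakage current in pA for a given droop (mV) over T_WIN."""
    return C_HOLD * (droop_mv * 1e-3) / T_WIN * 1e12


for temp_c, corners in DROOP_MV.items():
    row = ", ".join(f"{c}: {leakage_pa(v):.0f} pA" for c, v in corners.items())
    print(f"{temp_c:>4} C  {row}")

worst_droop_mv = max(v for corners in DROOP_MV.values() for v in corners.values())
print(f"worst droop vs 250 mV range: {worst_droop_mv / 250 * 100:.2f} %")
```

The computed currents agree with Table 7-10 to within rounding, and the worst-case droop is a small fraction of the 250mV measurement range.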

Figure 7-38. Leakage current of the capacitor in PVT variation

7.4.5 Resolution of ADC

The affordable resolution of the analog-to-digital conversion process is about 5% of the voltage droop amount, which is 250mV. To achieve the targeted resolution of 12.5mV, the conversion window has to be 30ns in this simulation, since the resolution is determined by equation (7-2). In this measurement circuit, the resolution is 11.03mV for 1LSB (least significant bit). If the circuit needs a finer resolution, the conversion window must be made longer than 30ns.

7.4.6 Voltage Variation of the Unity Gain Buffer

As shown in Figure 7-30, Vc, the output of the unity-gain buffer, can affect the system through voltage variations such as the offset between the input and output of the buffer and the output ripple of the buffer. First, the offset occurs because the buffer does not have exactly unity gain; the closed-loop gain is around 0.976 V/V in this simulation. As shown in Figure 7-39, Vsample, the input of the buffer, is 1.23V, but Vc, the output of the buffer, is 1.223V. Second, a ripple Vpp exists on the output voltage of the buffer.

Figure 7-39. Simulation results for the voltage variation of the buffer at Vovershoot

To remove the concerns arising from these voltage variations, first, the error caused by the non-ideal gain of the unity-gain buffer has to be calibrated out with multiple measurements using a constant Vsample. Second, calibration is also needed to determine how the output frequency of the VCO varies with Vpp. For example, if no capacitor is implemented at the output node (0F), the output ripple of the buffer, Vpp, is largest. The Vpp for both cases is given in Table 7-11.

Table 7-11. Output ripple of unity-gain buffer by different capacitor

| Capacitor | Vpp |
|---|---|
| 500fF | 13.7mV |
| 0F | 25mV |

Based on the two different Vpp values, the output frequency of the VCO is calibrated by comparing the number of pulses, as shown in Figure 7-40. The calibration is performed under the extreme condition, the FF corner and the highest sampled voltage, to observe the maximum variation in simulation. In conclusion, the difference in the pulse count at the VCO output is only 1 pulse, which shows that the concerns caused by the voltage variations of the buffer can be removed through calibration, since the measurement system only has to account for a 1-pulse difference under PVT variations.

Figure 7-40. Pulse count difference between various output ripple of the buffer

7.4.7 Voltage Variation of the SC Voltage Doubler

The SC voltage doubler has a ripple at its output, since the doubler operates by turning its switches on and off. Therefore, it is necessary to determine whether this variation also affects the accuracy of the system. As with the unity-gain buffer, the amplitude of the output ripple of the doubler depends on the capacitance at the output node. Table 7-12 shows the Vpp for different capacitors.

Table 7-12. Output ripple of voltage doubler by different capacitor

| Capacitor | Vpp |
|---|---|
| 20pF | 18mV |
| 10pF | 40mV |

Even though the Vpp of the doubler differs, there is no effect on the output of the buffer or on the output frequency of the VCO in the measurement system, as shown in Table 7-13. Therefore, it is verified that the Vpp of the doubler does not affect the accuracy of the system, as long as the output voltage of the SC voltage doubler is higher than the sampled voltage (Vsample) so that the system works properly.

Table 7-13. Simulation results with the different ripple of the voltage doubler at Typical 27ºC, Vsample: 1.25V

| Vout_ripple of voltage doubler by different capacitor | Vc_ripple | Vc | # of pulse |
|---|---|---|---|
| 18mV @ 20pF | 15.3mV | ~1.23V | 87 |
| 40mV @ 10pF | 16mV | ~1.23V | 87 |

7.4.8 On-chip Counter

Counting the digital pulses is key to measuring the power supply noise in the measurement system. However, the counter can lose one or more pulses because of a glitch pulse and because the VCO output pulse is asynchronous to the enable signal of the conversion window, as shown in Figure 7-41.

Figure 7-41. Possibility of losing pulses due to a glitch pulse at the gated D latch

This is because the D latch is synchronous to the system clock, whereas the VCO is asynchronous to that clock. There is therefore no correlation between the phase of the VCO output pulse and the points where the conversion window starts and stops. However, losing one or two pulses is not a significant issue, since it is about 1% of the total pulses in the 30ns conversion window, and the error can be averaged out through multiple measurements.
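The effect of the uncorrelated VCO phase can be illustrated with a small idealized model: the VCO is treated as a perfectly periodic pulse train whose first edge lands at an arbitrary phase inside the window. The 2.92GHz frequency is a hypothetical value chosen so that the expected count is not an integer; glitches and latch setup effects are not modeled.

```python
# Sketch: single-shot count uncertainty from an asynchronous VCO phase,
# and how averaging over many measurements recovers the fractional count.
# F_VCO is a hypothetical value; only T_WIN comes from the text.
import math


def count_edges(f_vco_hz: float, t_win_s: float, phase: float) -> int:
    """Rising edges at t = (k + phase) / f, k = 0, 1, ..., inside [0, t_win)."""
    return max(0, math.ceil(f_vco_hz * t_win_s - phase))


F_VCO = 2.92e9   # Hz (hypothetical)
T_WIN = 30e-9    # s, conversion window

phases = [i / 1000 for i in range(1000)]   # sweep of start phases in [0, 1)
counts = [count_edges(F_VCO, T_WIN, p) for p in phases]
mean_count = sum(counts) / len(counts)

print("observed counts:", sorted(set(counts)))
print(f"mean count = {mean_count:.2f} (f * T_win = {F_VCO * T_WIN:.2f})")
print(f"one pulse out of {min(counts)} = {100 / min(counts):.2f} %")
```

In this model a single measurement is off by at most one pulse, while the average over many phases converges to f·Twin, which is the averaging argument made above.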

7.4.9 Voltage Variation at the Power Supply Voltage

As shown in Figure 7-30, the measurement circuit is powered by a 1.0V supply voltage that itself contains power supply noise. The author therefore needs to check whether the overall measurement circuit is affected by the voltage variation of this noisy supply. First, the key parameters of the sampling and detection system are the on-resistance (Ron) of the sampling switch and the capacitance of the bulk capacitor, since they determine the RC time constant. However, Ron in equation (7-6) is not sensitive to the power supply noise, and the capacitance is constant, so the time constant does not change much with the variation. The other part of the measurement system, the analog-to-digital conversion, is composed mostly of digital circuits. Hence, it is verified that the accuracy of the measurement circuit is not affected by the noisy power supply rail.

7.4.10 PVT Variation of the Overall Measurement Circuit

For design robustness, the measurement system has to be characterized under PVT (process, voltage and temperature) variations. Through this characterization, the author can determine how much the system is affected by the variations. To see the effect of process variation, the author chose the process corners (SS, FF, TT); various temperatures (-50ºC, 27ºC, 100ºC) are used to see how much the measurement system is affected by temperature in terms of the VCO sensitivity. Operating the measurement system under these extreme conditions reveals the stability of the circuit. The VCO sensitivity obtained from the overall simulation under PVT variation is shown in Figure 7-42.


Figure 7-42. VCO sensitivity in PVT variations

The VCO sensitivity shows that the VCO output frequency increases as the temperature decreases. In addition, the output frequency of the VCO is highest at the FF corner. Given this sensitivity, assume that the sampled voltage (1V) is known but the temperature is completely unknown. In this case, even though the sampled voltage is fixed at 1V, the output frequency of the VCO varies with temperature, as shown in Figure 7-43. The error rate can be calculated for each case using equation (7-10).

$$\text{Error rate} = \frac{\text{Highest frequency at 1V} - \text{Lowest frequency at 1V}}{\text{Frequency at } 27^{\circ}\mathrm{C}\text{, 1V}} \times 100 \tag{7-10}$$

Figure 7-43. Frequency error range in specific voltage in PVT variations @ 1V. A) Error range in SS process, B) Error range in TT process and C) Error range in FF process

Through this calibration, the range of the output frequency is characterized over temperature and process. The calibration shows that the error range of the frequency becomes wider as the process becomes faster.

On the other hand, when the output frequency of the VCO is known for the sampled voltage (1V), the detected voltage has an error range given by equation (7-11), since the frequency also varies with temperature.

$$\text{Error rate} = \frac{\text{Lowest voltage} - \text{Highest voltage}}{\text{1V}} \times 100 \tag{7-11}$$

Figure 7-44. Voltage error range in specific frequency in PVT variations @ 1V. A) Error range in SS process, B) Error range in TT process, and C) Error range in FF process

As shown by the calibration in Figure 7-44, the source of error is the range of voltage that corresponds to a specific VCO output frequency when the temperature is unknown, and it is important to obtain this temperature-induced error range of the measured voltage. In addition, the error range can be narrowed if bounds on the temperature are known from the heat sink and thermal solution, since most systems on chip operate within such a temperature range; the author used 1V as an example operating voltage. Hence, to improve the accuracy of the measurement system, the system has to be calibrated to suppress the error under PVT variations.

7.4.11 A Summary of the Accuracy of the Measurement System

So far, the author has reviewed each objective among the sources of error in Table 7-8 and analyzed how to meet each target in order to suppress the inaccuracy of the measurement system. Table 7-14 summarizes the status of each objective for improving the accuracy of the system.

Table 7-14. Status of the sources of error

| Block | Objective | Target | Status |
|---|---|---|---|
| Pass-through or block the voltage | Trigger signal (aperture jitter) | The system uses a trigger signal through HW supports (SW); it is then calibrated out. | Jitter and skew need to be calibrated out to mitigate the inaccuracy when the switch samples the noise. The allowed jitter value is defined for when the sampling window steps through the noise. |
| | Sample & hold switch | 165ps > 71.2ps, where 165ps = ideal sampling period at 3GHz sampling frequency (71.2ps = 4τ, capacitor 99% charged) | 203.7ps > 143.7ps (D > 4τ, D = sampling duration). τ = 17.8ps where Ron=356Ω, C=50fF. Ideally 4τ = ~71.2ps. Simulated 4τ = 143.7ps. |
| | Sampling frequency | Min. 3GHz (fnoise = 1.5GHz) | Step size: smaller than 330ps. |
| | Cutoff frequency | Min. 1.5GHz, f3db = 1/(2π*RC) | f3db = 8.94GHz, where Ron=356Ω, C=50fF |
| | Vdroop of the capacitor | Check if Vdroop is negligible: 0.1mV/30ns @ 100ºC (TT), 1mV/30ns @ 100ºC (FF) | Vdroop is negligible: 0.1mV/30ns @ 100ºC (TT), 0.275mV/30ns @ 100ºC (FF) |
| | Leakage current surrounding the cap | Negligible leakage current corresponding to the Vdroop of the capacitor | ~1mV/100ns. Less than 1% error relative to the Vdroop (250mV). |
| VCO | Resolution of ADC (1LSB) | 12.5mV | 11.03mV @27ºC, TYP in 30ns conversion window |
| | Voltage variation [3]: unity-gain buffer | Minimize Vpp; Av: at least ~30dB | Vpp: 13.7mV, Av: 41dB. Voltage follower, gain = ~1 (0.97xxx). Negligible. |
| | Voltage variation [2]: SC voltage doubler | Less than 50mV | Vpp: 18mV. Negligible. |
| Counter | On-chip counter | 7-bit synchronous | 7-bit resolution (1LSB = ~12.5mV). The pulse count varies with process and temperature and needs to be averaged out through calibration. |
| VDD noise rail | Voltage variation [1] | Make the circuit as immune as possible to the power supply noise | Only Ron (of the switch) and the cap matter with respect to the power supply noise, but Ron is not sensitive to the noise. Everything else is digital logic that counts the pulses. |
| Process/Temperature | PVT variation (process, voltage and temperature) | Calibrate it out | Check variations: VCO sensitivity and leakage current of the cap. NFET and PFET are affected by the process. The leakage increases exponentially with temperature; this affects the accuracy. |

7.5 Ecosystem

The author designed the on-die noise measurement system in UMC 90nm technology and concentrated on how to measure the on-die power supply noise from a circuit design standpoint. However, understanding how the measurement system is operated is as important as how the system is designed. The ecosystem consists of the interfaces between the measurement circuits and the supporting systems, such as software, together with basic concepts concerning the coverage effectiveness of the number and locations of the voltage measurement circuits. In this section, the author introduces the ecosystem required by the measurement system.

7.5.1 Software – Trigger Signal

A trigger signal makes it possible to set the points at which noise sampling starts and stops, since the trigger is deterministic with respect to, and synchronous to, the on-die clock. When the trigger arrives, the system starts and stops sampling within the sampling window. As mentioned in the design specification, the sampling switch samples once every clock cycle, so the trigger signal determines a step size, and the sampling frequency is defined by the width of that step. The trigger signal is provided by software that interacts with the circuit through the platform and SOC using the JTAG [7.14] interface, as shown in Figure 7-45.

Figure 7-45. Simplified diagram between the circuit and the software for a trigger signal.

7.5.2 Software – Look-up Table

A look-up table is needed to store the number of pulses corresponding to each clock cycle. The process is as follows: when the trigger arrives, the measurement system captures the power supply noise for one clock cycle and counts the pulses generated by the ring VCO during the conversion window. The bits of the counter are then shifted into a register, and the contents of the shift register are stored in the look-up table through a JTAG adapter and TAP (Test Access Port), as shown in Figure 7-46.

Figure 7-46. Simplified diagram between the circuit and the software for a look-up table
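The capture sequence described above can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the author's implementation: the function names (`vco_frequency`, `capture`), the piecewise-linear interpolation between the typical-corner 65nm VCO frequencies of Table 7-19, and the use of the 30ns conversion window from Table 7-20 are all illustrative choices.

```python
# Hypothetical model of the trigger -> count -> shift register -> look-up table flow.
# VCO points: 65nm, typical corner at 27C, taken from Table 7-19 as (volts, hertz).
VCO_POINTS = [(0.75, 2.83e9), (1.00, 4.13e9), (1.25, 5.01e9)]
T_WIN = 30e-9  # conversion window in seconds (assumed, as in Table 7-20)


def vco_frequency(v):
    """Piecewise-linear interpolation of the VCO frequency (an assumption)."""
    pts = VCO_POINTS
    v = min(max(v, pts[0][0]), pts[-1][0])  # clamp to the tabulated range
    for (v0, f0), (v1, f1) in zip(pts, pts[1:]):
        if v <= v1:
            return f0 + (f1 - f0) * (v - v0) / (v1 - v0)
    return pts[-1][1]


def capture(sampled_voltages):
    """Store one entry per trigger: the VCO pulse count in the conversion window."""
    lookup_table = []
    for cycle, v in enumerate(sampled_voltages):  # each trigger covers one clock cycle
        count = int(vco_frequency(v) * T_WIN)     # counter value at the end of the window
        shift_register = count                    # counter bits moved to the shift register
        lookup_table.append((cycle, shift_register))  # value the software reads via the TAP
    return lookup_table


table = capture([1.00, 0.90, 1.10])
for cycle, count in table:
    print(cycle, count)
```

In this model a droop (0.90V) yields a lower count and an overshoot (1.10V) a higher count than the nominal 1.00V sample, which is the relationship the look-up table is meant to record.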

Through the trigger signal, the software knows when the count is ready to be read, since it is notified of the start of each clock cycle. Once the software has finished reading the count through the TAP, the shift register is ready to receive the count of the next clock cycle when another trigger arrives. The TAP provides access to the processor and supports the signals listed in Table 7-15. For reference, the IEEE standard defines the following TAP signals, used for driving the TAP controller (JTAG state machine):

Table 7-15. JTAG controller command signals

Command   Description        Comment
TDI       Test Data In       serial data from debugger to target
TDO       Test Data Out      serial data from target to debugger
TCLK      Test Clock
TMS       Test Mode Select   controls the TAP controller state transitions
TRST      Test Reset         optional, resets the TAP controller
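For orientation, the TAP controller driven by these signals is a 16-state machine whose next state depends only on the value of TMS at each test clock edge. The sketch below encodes the state transitions defined by IEEE Std 1149.1 in Python; the dictionary layout and the helper name `step` are illustrative choices and are not part of the standard or of the author's software.

```python
# TAP controller state transitions from IEEE Std 1149.1.
# Each entry maps a state to (next state if TMS=0, next state if TMS=1).
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms_bits):
    """Advance the TAP state once per test clock for each TMS bit."""
    for tms in tms_bits:
        state = TAP[state][tms]
    return state


# From reset, the TMS sequence 0,1,0,0 reaches Shift-DR, where a data
# register (for example, one holding a count) is shifted out on TDO.
state = step("Test-Logic-Reset", [0, 1, 0, 0])
print(state)
```

Holding TMS high for five test clocks returns the controller to Test-Logic-Reset from any state, which is why the dedicated TRST pin is optional.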

7.5.3 Locations of Voltage Measurement Circuit

The author also evaluated coverage effectiveness of the number and locations of voltage measurement circuits, especially its ability to capture noise droops that would affect functionality or performance.

Figure 7-47. Locations of circuits on the core in Intel Sandy Bridge microprocessor

As shown in Figure 7-47, the voltage measurement circuits are spread evenly across the core. They are placed on a grid of approximately 1000um, and each circuit is located right next to an execution-unit block. In particular, the logic blocks with the highest activity and power have more measurement circuits.

Figure 7-48. Locations of circuits on the L3 cache in Intel Sandy Bridge microprocessor

In the L3 cache, the circuits are located at the edge of each block associated with a core, since whitespace between the control logic is available in those areas, as shown in Figure 7-48. The density is the same as on the core; for reference, the circuits are placed in whitespace and holes in the floor plan.

7.6 On-die Power Supply Noise Measurement System in 65nm Technologies

Now that the measurement system has been demonstrated in the UMC 90nm process, the author designed the measurement system in the 65nm CMP process in order to examine the effect of transistor scaling, for example on accuracy variation. The 65nm process is provided by STMicroelectronics, and its gate length is shrunk by a factor of about √2 relative to the 90nm process. To see the trend of future technology, the influence of scaling on MOS device characteristics must first be understood, since the critical parameters of a device are scaled by a dimensionless factor S, as shown in Table 7-16. The following subsections then examine how this technology trend affects the on-die power supply noise measurement system.

Table 7-16. Influence of scaling on MOS device characteristics [7.15]

Parameter                                  Sensitivity      Constant Field   Lateral
Length: L                                  -                1/S              1/S
Width: W                                   -                1/S              1
Gate oxide thickness: tox                  -                1/S              1
Supply voltage: VDD                        -                1/S              1
Threshold voltage: Vtn, Vtp                -                1/S              1
Substrate doping: NA                       -                S                1
β                                          (W/L)(1/tox)     S                S
Current: Ids                               β(VDD-Vt)^2      1/S              S
Resistance: R                              VDD/Ids          1                1/S
Gate capacitance: C                        WL/tox           1/S              1/S
Gate delay: τ                              RC               1/S              1/S^2
Clock frequency: f                         1/τ              S                S^2
Dynamic power dissipation (per gate): P    CV^2f            1/S^2            S
Chip area: A                               -                1/S^2            1
Power density                              P/A              1                S
Current density                            Ids/A            S                S
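The derived rows of Table 7-16 follow from the scaling of the primary parameters through the sensitivity expressions. As a check, the short Python sketch below tracks each quantity as an exponent of S (so -1 stands for 1/S and 0 for no change). The function name `derive` and the dictionary keys are illustrative, and only rows that have a sensitivity expression in terms of L, W, tox and VDD are derived.

```python
import math


def derive(L, W, tox, V):
    """Exponents of S for derived quantities, given exponents of the primary ones."""
    beta = W - L - tox   # beta ~ (W/L)(1/tox)
    ids = beta + 2 * V   # Ids  ~ beta (VDD - Vt)^2
    r = V - ids          # R    ~ VDD / Ids
    c = W + L - tox      # C    ~ W L / tox
    tau = r + c          # tau  ~ R C
    f = -tau             # f    ~ 1 / tau
    p = c + 2 * V + f    # P    ~ C V^2 f
    return {"beta": beta, "Ids": ids, "R": r, "C": c, "tau": tau, "f": f, "P": p}


# Constant-field scaling: all dimensions and voltages scale by 1/S.
constant_field = derive(L=-1, W=-1, tox=-1, V=-1)
# Lateral scaling: only the gate length scales by 1/S.
lateral = derive(L=-1, W=0, tox=0, V=0)

S = math.sqrt(2)  # approximate 90nm -> 65nm scale factor used in this section
for name in constant_field:
    print(name, round(S ** constant_field[name], 3), round(S ** lateral[name], 3))
```

With S = √2, lateral scaling alone gives an on-resistance factor of about 0.707 and a gate-delay factor of 0.5, which is the mechanism the following subsection relies on.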

7.6.1 Sampling Period

In the smaller technology, the gate length of the CMP 65nm process is reduced from that of the UMC 90nm process by the scale factor S, which is √2. Hence, β increases by √2, which reduces the on-resistance according to equation (7-5). This decreases the RC time constant of the sampling switch and holding capacitor in the 65nm process. Therefore, the sampling period of the switch can be reduced in order to detect faster power supply noise than the measurement system in the UMC 90nm process can.

Figure 7-49. Simulation results of RC time constant and sampling period. A) ~99% of the cap, 4τ = 69.3ps, B) Sampling period, D = 183.7ps

As shown in Figure 7-49, four RC time constants (4τ) amount to 69.3ps, which is shorter than the 143.9ps of the 90nm process in Table 7-17. The Ron of the 65nm process is reduced relative to the Ron of the 90nm process by approximately the factor S, based on equation (7-5).

Table 7-17. Comparison of Ron and time constant between 90nm and 65nm

                                       90nm      65nm
On resistance of the sampling switch   356Ω      267Ω
4τ (4 × RC time constant)              143.8ps   69.3ps
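The figures in Table 7-17 can be cross-checked with a few lines of Python. The sketch assumes a single-pole RC model of the sampling switch and holding capacitor; the effective capacitance it backs out is an inferred quantity rather than a value stated in the text, and the function names are illustrative.

```python
import math


def settled_fraction(n_tau):
    """Fraction of the final value reached after n_tau time constants (single-pole RC)."""
    return 1.0 - math.exp(-n_tau)


def effective_capacitance(four_tau, r_on):
    """Capacitance implied by 4*tau = 4 * Ron * C."""
    return four_tau / (4.0 * r_on)


# Values taken from Table 7-17.
c_90 = effective_capacitance(143.8e-12, 356.0)
c_65 = effective_capacitance(69.3e-12, 267.0)

print(f"settled after 4 tau: {settled_fraction(4):.3f}")
print(f"implied C, 90nm: {c_90 * 1e15:.1f} fF")
print(f"implied C, 65nm: {c_65 * 1e15:.1f} fF")
```

Under this single-pole assumption, the reduction in 4τ (about 0.48×) is larger than the reduction in Ron alone (0.75×), which would imply a smaller effective capacitance in the 65nm design as well.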

The lower on-resistance not only improves the accuracy of the sampling switch but also allows sampling of faster power supply noise, above 1.5GHz. Hence, as the feature size scales down, the noise-measurement performance of the system also follows the technology trend.

7.6.2 VCO frequency and VCO Sensitivity Under PVT Variation

The frequency of the ring oscillator is defined by the equation (7-12).

$$ f_{out} = \frac{1}{2 \times T_p \times N} \tag{7-12} $$

Where:
T_p: Propagation delay
N: Number of stages

Hence, the output frequency is determined by the propagation delay and the number of stages of the ring oscillator. The designed ring oscillator has five inverter stages; its propagation delay is shown in Table 7-18, and the delay decreases as the feature size of the transistor shrinks.

Table 7-18. Propagation delay in 90nm and 65nm process

TYP_27ºC, VDD=1V   90nm     65nm
Delay              68.4ps   23.7ps

The output frequency of the VCO is simulated for the 90nm and 65nm technologies under the typical process corner at 27ºC, as listed in Table 7-19. The results show that the shorter propagation delay leads to a higher VCO output frequency.

Table 7-19. VCO output frequency in case of typical process @27ºC

TYP_27ºC   90nm      65nm
0.75V      1.46GHz   2.83GHz
1.00V      2.30GHz   4.13GHz
1.25V      2.97GHz   5.01GHz

However, to maintain the accuracy of the measurement circuit as the VCO frequency increases, it is necessary to analyze how much the circuit is affected by PVT variation in both technologies. The VCO output frequencies of the 90nm and 65nm designs are therefore compared in Figure 7-50. The figure shows that the variation of the 65nm VCO is larger than that of the 90nm VCO. In addition, the VCO output frequency increases as the temperature decreases.

Figure 7-50. Simulation results of VCO sensitivity. A) 90nm process, B) 65nm process

However, the resolution of the measurement circuit improves according to equation (7-9), since the Δfout of the 65nm process is larger than that of the 90nm process, as shown in Table 7-20. This helps the measurement system detect the power supply noise more accurately, as long as the PVT variation is not dominant.

Table 7-20. Comparison of the resolution between 90nm and 65nm in TYP @27ºC

TYP_27ºC            90nm      65nm
Δfout               1.53GHz   2.18GHz
ΔVctrl              0.5V      0.5V
Twin                30ns      30ns
Resolution (1LSB)   10.8mV    7.64mV
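The resolution figures in Table 7-20 can be reproduced numerically. Equation (7-9) is not restated in this section, so the Python sketch below assumes the form 1 LSB = ΔVctrl / (Δfout × Twin), that is, the control-voltage range divided by the number of additional VCO pulses counted over that range in one conversion window. This form and the function name `lsb` are assumptions; they give values close to the tabulated ones.

```python
def lsb(delta_v_ctrl, delta_f_out, t_win):
    """Voltage per count: control-voltage range / extra pulses counted in the window."""
    return delta_v_ctrl / (delta_f_out * t_win)


# Inputs from Table 7-20 (typical corner, 27C).
lsb_90 = lsb(0.5, 1.53e9, 30e-9)
lsb_65 = lsb(0.5, 2.18e9, 30e-9)

print(f"90nm: {lsb_90 * 1e3:.2f} mV")
print(f"65nm: {lsb_65 * 1e3:.2f} mV")
```

Because the window and the control-voltage range are the same in both designs, the ratio of the two LSB values is simply the inverse ratio of the two Δfout values.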

This comparison of the VCO sensitivity between the 90nm and 65nm technologies shows that the accuracy of the measurement system, in terms of the ADC resolution and the VCO sensitivity under PVT variations, varies with the technology trend.

7.7 Summary

An on-die power supply noise measurement system is proposed for power integrity characterization, to detect the power supply noise on die. First, two representative power supply noises are characterized using the current profiles of core-type and I/O-type circuit activity, together with the impedance profile of a modeled power delivery network including board, package and chip. Using the current profile and the impedance profile, the power supply noise can be simulated to characterize the on-die noise. Second, the author proposed the on-die power supply noise measurement circuit, which consists of the sampling and detection system and the analog-to-digital conversion. In particular, the bootstrapped switch in the sampling system is proposed to measure both the voltage droop and the voltage overshoot that goes beyond the power supply voltage. The system is simulated in 90nm to measure the power supply noise, and its sources of error are analyzed to enhance the accuracy of the measurement circuit. Third, the ecosystem is defined in terms of its interfaces, such as the software and the location of the circuits on die. Lastly, to see the effect of transistor scaling, the measurement circuit is designed in 65nm technology and compared with the 90nm design in terms of the accuracy of the system and the measurement range of the power supply noise.

CHAPTER 8 CONCLUSION

As integration technology advances in both ICs (integrated circuits) and packages, the electrical performance of the system depends heavily on both the on-chip local clock and the off-chip data speed between chip and package. Interconnect density also grows more complex as functionality increases in line with Moore's law. With both the frequency and the complexity of interconnect increasing, signal and power integrity are degraded under high-performance requirements, and they will inevitably become more and more important in chip, package and PCB design as integration technology advances.

In addition, both package modeling and simulation are more critical in FCBGA package design, since the physical structures are more complex. Accurately modeling the die and package, and analyzing the electrical behavior of the package structures, are both important in order to anticipate higher performance and functionality for signal integrity and power integrity in FCBGA packages.

This dissertation primarily focused on developing new schemes to characterize signal and power integrity in chip-package systems. First, for signal integrity characterization, two approaches were used: one characterizes TSV and interconnect signal integrity using a single-ended I/O driver, the GTL High Speed I/O test structure, in 3D chips and wire-bonding packages; the other characterizes interconnect signal integrity using a new 20Gbps High-speed Test IC system, including a differential I/O driver, in planar chips and flip-chip packages. Second, for electrical modeling, new cost-effective modeling methodologies are proposed and the electrical interaction in FCBGA packages is evaluated. Third, for power integrity characterization, a new on-die power supply noise measurement system is proposed and designed, and the power supply noise is characterized using the noise current profile and the impedance profile of the power delivery network. To enhance the accuracy of the measurement circuit, its sources of error are analyzed. In addition, the measurement circuit is designed in a smaller technology, a 65nm process, to examine how the accuracy varies with the trend of future technology.

Based on these objectives, the contribution of this research can be summarized as follows:

• Characterization of TSV and interconnect signal integrity in 3D stacked ICs and wire-bonding package using non-invasive probing 1Gbps GTL High-speed I/O test structure: The proposed test structure uses non-invasive (non-destructive) probing, utilizing internal test points for TSV and interconnect signal integrity characterization both on-chip and off-chip. Two 3D technologies were used to fabricate the test IC: MITLL technology and Tezzaron technology. Test vehicles using wire-bonding packages with the 3D test IC have been implemented to demonstrate the electrical performance between on-chip and off-chip.

• Development of a 20Gbps High-Speed IC test system for interconnect signal integrity characterization in planar IC and Flip-chip packages: A 20Gbps on-chip High-speed I/O Test IC and test vehicle are introduced for interconnect signal integrity characterization in the flip-chip package. The proposed test IC consists of a 20GHz 8-Modulus PLL, a 20Gbps 2^7-1 PRBS generator and a 4-port differential CML I/O driver. This research enables both investigation of the signal integrity effects of a differential I/O chip system and validation of the accuracy of the package electrical model in flip-chips at a 20Gbps data rate. EM simulation of the test vehicle was performed to determine the difference in electrical performance, in terms of insertion loss, among dielectric materials such as silicon, duroid and glass.

• Development of a methodology for FCBGA (Flip-chip Ball Grid Array) structure and investigation of the electrical interaction between the die and package in FCBGA structure: This modeling methodology allows the die and the package to be simulated separately rather than co-simulated. In addition, building an FCBGA model using polynomials to save computational cost was proposed. The package system was simulated in order to evaluate the electrical interaction between the top metal layer of a die and the package layers, so that die-level and package-level designers can make early decisions. These techniques can be used for large and complex modeling in FCBGAs and other applications.

• Development of an on-die power supply noise measurement system in microprocessor: The significance of this research is that circuit-level designers can address risks immediately by means of the detected noise, and PDN-level designers can identify design metrics based on the power supply noise in order to suppress PDN stress. In addition, it makes it possible to measure power supply noise within ±25% of the power supply voltage. A significant contribution of this research is the measurement of the voltage overshoot (Vovershoot) using the proposed system under a 1V VDD supply voltage.

APPENDIX A POLYNOMIALS USING POLYNOMIAL REGRESSION

As shown in Figure 6-2, the transfer function of each model can be converted to a 4th-degree polynomial using Matlab code and the function "Rational curve fit tool." Once the polynomials are derived from the tool as shown in Table 6-2, each coefficient of the 4th-degree polynomial can be defined as shown in Figure 6-8. With the new polynomial defined, the S21 parameter of each model and the polynomial curve can be compared using Matlab code, as shown in Figure 6-9.

• MATLAB Code – Plot both a curve of the polynomial and S21 curve of the model

function rational_function_fit_complicated

clear;
%% To import the data from ADS into Matlab
if nargin==0
    titledlg='FCBGA_Analyzer';
    options.Resize='on';
    options.WindowStyle='normal';
    options.Interpreter='tex';
    promptdlg={'Package file Name (.s2p)'};
    defdlg={'Diff_die_7th.s2p'};
    dlg=inputdlg(promptdlg,titledlg,1,defdlg,options);
    fn_SP=dlg{1};
end

%% To import the data from ADS into Matlab
n_p = 7; %Number of pairs of C4 bump

A_SP=importdata(fn_SP);
Freq=A_SP(:,1);

save('Freq.mat','Freq')
N=length(Freq);
S_Params = zeros(2,2,N);

%% RLCG parameters
u=0;
t=0;
for p=1:0.001:N
    u=u+1;
end

for r=1:N
    %% S_parameter make format
    S11=A_SP(r,2)+j*A_SP(r,3);
    S21=A_SP(r,4)+j*A_SP(r,5);
    S12=A_SP(r,6)+j*A_SP(r,7);
    S22=A_SP(r,8)+j*A_SP(r,9);
    S_parameter=[S11 S12;S21 S22];
    S_Params(:,:,r)=(S_parameter);

    %% RLCG calculation - Treat as 2 port for 4-port network.
    K = (((((S11)^2-(S21)^2+1)^2-(2*S11)^2)/(2*S21)^2)^0.5);
    yield = (((1-(S11)^2+(S21)^2)/(2*S21) + K)^-1);
    t=t+1;
end

%% compute the transfer function from the frequency response data
TrFunc = s2tf(S_Params,100,100,100,1);

%% fit a rational function to the computed data and store the result in an rfmodel object
RationalFunc = rationalfit(Freq,TrFunc,-45);
nPoles = length(RationalFunc.A);
disp(sprintf('The derived rational function contains %d poles.', nPoles));

%% Get the Numerator and Denominator of the Laplace Transform S-Domain Transfer Functions
A = RationalFunc.A;
C = RationalFunc.C;

S21_tf_real = real(TrFunc);
S21_tf_ang = unwrap(angle(TrFunc));
S21_tf_db = db(TrFunc);
save('S21_tf_real.mat','S21_tf_real')
save('S21_tf_ang.mat','S21_tf_ang')
save('S21_tf_db.mat','S21_tf_db')

%% compute the frequency response of the fitted model data
[fresp,freq]=freqresp(RationalFunc,Freq);
num=0;
for dd=0.01:0.001:5
    num=num+1;
end
z = cell(size(num));
y = cell(size(num));
n_n=0;
m=0.01;
for x=0.01:0.001:5
    m=m+1;
    n_n=n_n+1;
    %% Meeting_compute the frequency response of the fitted model data
    p1 = -4E-08*n_p^3 + 4E-07*n_p^2 + 1E-05*n_p + 6E-05;
    p2 = 7E-07*n_p^3 - 4E-05*n_p^2 + 0.0005*n_p - 0.0003;
    p3 = -4E-06*n_p^3 + 0.0002*n_p^2 + 3E-05*n_p - 0.0691;
    p4 = -6E-06*n_p^3 + 6E-05*n_p^2 - 8E-05*n_p - 0.0071;
    p5 = 5E-06*n_p^3 - 8E-05*n_p^2 + 0.0001*n_p - 0.0004;
    z{n_n} = m;
    y{n_n} = p1*x^4+p2*x^3+p3*x^2+p4*x+p5;
end
z = z(1:n_n)';
y = y(1:n_n)';
x_axis = cell2mat(z);
y_axis = cell2mat(y);
save('y.mat','y_axis')
save('z.mat','x_axis')

%% plot the amplitude of the frequency response of the fitted model data and that of the computed data
figure
plot(freq,real(TrFunc),'r',x_axis/1e3,y_axis,'b')
xlabel('Frequency, GHz')
ylabel('db(S21)')
legend('S-parameter Data','Polynomial Data')
grid on
end

LIST OF REFERENCES

1.1. ITRS. (2011). International Technology Roadmap for Semiconductors (ITRS) reports, http://www.itrs.net/reports.html, 2011

1.2. Yole Development. (2007). Yole Development, "3DIC & TSV Report Cost, Technologies & Markets," Nov.2007

1.3. Wang, M. and Hu, J. (2010). Power Delivery Network Optimization for Laptop and Desktop Computer Platforms, DesignCon, 2010

1.4. Raymond, Y. (2011) Signal Integrity chapter by Raymond Y. Chen of Sigrity, Inc

1.5. Knickerbocker, J. (2009) IBM T.J. Watson Research, New York, USA

1.6. YOSHINAGA, T., (2010) “Trends in R&D in TSV Technology for 3D LSI Packaging,” Science & Technology Trends, pp26-38, Oct. 2010

1.7. Toshiba Corp. (2007) Toshiba Corp press release, http://www.toshiba.co.jp/about/press/2007_10/pr0101.htm, October 1, 2007

1.8. Elpida Memory Inc. (2009) press release, August 27, 2009

2.1. Stephen, H. (2000) “High-Speed Digital System Design-A Handbook of Interconnect Theory and Design Practices”, John Wiley & Sons, Inc, 2000

2.2. Bogatin, Eric (2010) “Signal and Power Integrity Simplified 2nd Edition”, Prentice Hall, 2010

2.3. CST-Computer Simulation Technology. (2013) “Power Integrity Simulation for High Speed Board using CST PCBS,” http://www.cst.com, pp1-5, 2013

2.4. Huang, J. (2010) Simulation and Measurement of an On-Die Power-Gated Power Delivery System, DesignCon, 2010

3.1. Eisenstadt, W. (2011) “3D IC High Performance I/O, RF and Multicore Circuits,” 2011 GoMacTech, Orlando, FL, March 21-24, 2011

3.2. Groger, M.; Harb, S.M.; Morris, D.; Eisenstadt, W.R.; Puligundla, S., (2009) "High speed I/O and thermal effect characterization of 3D stacked ICs," 3D System Integration, 2009. 3DIC 2009. IEEE International Conference on , vol., no., pp.1,5, 28-30 Sept. 2009

3.3. High Frequency Structure Simulator (HFSS) v13.0, (2011) ANSYS, Inc

3.4. Advanced Design Systems(ADS) (2008), Agilent Technologies

4.1. Baek, Hyunho; Harb, S.; Eisenstadt, W.R., (2012) "GTL high speed I/O in 3D ICs for TSV and interconnect signal integrity characterization," Electronic Components and Technology Conference (ECTC), 2012 IEEE 62nd , vol., no., pp.844,850, May 29 2012- June 1 2012

4.2. Tezzaron Semiconductor, http://www.tezzaron.com/

5.1. Zlatkovic, V., (2006) "Clocking Challenges in High Speed Source Synchronous Interfaces," Microelectronics, 2006 25th International Conference on , vol., no., pp.622,625, 2006

5.2. Ding, Yanping; O, K.K., (2007) "A 21-GHz 8-Modulus Prescaler and a 20-GHz Phase-Locked Loop Fabricated in 130-nm CMOS," Solid-State Circuits, IEEE Journal of , vol.42, no.6, pp.1240,1249, June 2007

5.3. Kim, Jaeha; Kim, Jeong-Kyoum; Lee, Bong-Joon; Hwang, Moon-Sang; Lee, Hyung-Rok; Lee, Sang-Hyun; Kim, Namhoon; Jeong, Deog-Kyoon; Kim, Wonchan, (2005) "Circuit techniques for a 40Gb/s transmitter in 0.13μm CMOS," Solid-State Circuits Conference, 2005. Digest of Technical Papers. ISSCC. 2005 IEEE International , vol., no., pp.150,589 Vol. 1, 10-10 Feb. 2005

5.4. Mijuskovic, D.; Bayer, M.; Chomicz, T.; Garg, N.; James, F.; McEntarfer, P.; Porter, J., (1994) "Cell-based fully integrated CMOS frequency synthesizers," Solid-State Circuits, IEEE Journal of , vol.29, no.3, pp.271,279, Mar 1994

5.5. Li, N.C.; Haviland, G.L.; Tuszynski, A. A., (1990) "CMOS tapered buffer," Solid-State Circuits, IEEE Journal of , vol.25, no.4, pp.1005,1008, Aug 1990

5.6. Eisenstadt, W.R.; Eo, Y., (1992) "S-parameter-based IC interconnect transmission line characterization," Components, Hybrids, and Manufacturing Technology, IEEE Transactions on , vol.15, no.4, pp.483,490, Aug 1992

5.7. Laskin, E. (2006) “On-Chip Self-test IP Blocks for High Speed Applications,” Dissertation, University of Toronto, pp.9, 2006

6.1. ITRS. (2011) International Technology Roadmap for Modeling and Simulation (ITRS) reports, http://www.itrs.net/reports.html, 2011

6.2. Wane, S.; An-Yu Kuo, (2008) "Chip-package co-design methodology for global co-simulation of re-distribution layers (RDL)," Electrical Performance of Electronic Packaging, 2008 IEEE-EPEP , vol., no., pp.59,62, 27-29 Oct. 2008

6.3. Kowalski, M.E.; Codd, P., (2007) "Co-Simulation of IC, Package and PCB Power Delivery Networks in Ultra-Low Voltage Power Rail Designs," Electronic Components and Technology Conference, 2007. ECTC '07. Proceedings. 57th , vol., no., pp.798,803, May 29 2007-June 1 2007

6.4. Ha, Myunghyun; Srinivasan, K.; Swaminathan, M., (2008) "Chip-package co-simulation with multiscale structures," Electrical Performance of Electronic Packaging, 2008 IEEE-EPEP , vol., no., pp.339,342, 27-29 Oct. 2008

6.5. Q3D Extractor v13.0, ANSYS, Inc.

6.6. Matlab R2009a, The MathWorks

6.7. Ivy Bridge (microarchitecture) (2013) from http://en.wikipedia.org/wiki/Ivy_Bridge_(microarchitecture)

7.1. Muhtaroglu, A.; Taylor, G.; Rahal-Arabi, T., (2004) "On-die droop detector for analog sensing of power supply noise," Solid-State Circuits, IEEE Journal of , vol.39, no.4, pp.651,660, April 2004

7.2. Jiao, Dong; Gu, Jie; Kim, C.H., (2009) "Circuit techniques for enhancing the clock data compensation effect under resonant supply noise," Custom Integrated Circuits Conference, 2009. CICC '09. IEEE , vol., no., pp.29,32, 13-16 Sept. 2009

7.3. Oh, Dan; Lan, Hai; Madden, C.; Chang, Sam; Yang, Ling; Schmitt, R., (2010) "In-situ characterization of 3D package systems with on-chip measurements," Electronic Components and Technology Conference (ECTC), 2010 Proceedings 60th , vol., no., pp.1485,1492, 1-4 June 2010

7.4. Alon, E.; Stojanovic, V.; Horowitz, M.A., "Circuits and techniques for high-resolution measurement of on-chip power supply noise," Solid-State Circuits, IEEE Journal of , vol.40, no.4, pp.820,828, April 2005

7.5. Intel Sandy Bridge Review (2011) from http://www.bit-tech.net/hardware/cpus/2011/01/03/intel-sandy-bridge-review/

7.6. Nyquist-Shannon sampling theorem (2013) from http://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem#cite_note-1

7.7. Chansungsan, C., (2005) "Auto-referenced on-die power supply noise measurement circuit," Custom Integrated Circuits Conference, 2005. Proceedings of the IEEE 2005 , vol., no., pp.39,42, 18-21 Sept. 2005

7.8. Puligundla, S. (2007) “SIGNAL-STRENGTH INDICATORS AND HIGH-SPEED SAMPLERS FOR EMBEDDED TEST OF MIXED-SIGNAL INTEGRATED CIRCUITS,” Dissertation, University of Florida, 2007

7.9. McCreary, J.L.; Gray, P.R., (1975) "All-MOS charge redistribution analog-to-digital conversion techniques. I," Solid-State Circuits, IEEE Journal of , vol.10, no.6, pp.371,379, Dec. 1975

7.10. Maksimovic, D.; Dhar, S., (1999) "Switched-capacitor DC-DC converters for low-power on-chip applications," Power Electronics Specialists Conference, 1999. PESC 99. 30th Annual IEEE , vol.1, no., pp.54,59 vol.1, Aug 1999

7.11. Sidiropoulos, S.; Liu, Dean; Kim, Jaeha; Wei, Guyeon; Horowitz, M., (2000) "Adaptive bandwidth DLLs and PLLs using regulated supply CMOS buffers," VLSI Circuits, 2000. Digest of Technical Papers. 2000 Symposium on , vol., no., pp.124,127, 15-17 June 2000

7.12. Kester, W. (2009) "Aperture Time, Aperture Jitter, Aperture Delay Time Removing the Confusion", Analog Devices pp. 1-8, 2009

7.13. Chee-Hong, I., Fujishima, M (2008) "Design and Modeling of Millimeter-wave CMOS Circuits for Wireless Transceivers", Springer, 2008

7.14. IEEE Std. 1149.1, Standard Test Access Port and Boundary Scan Architecture

7.15. Neil, H., Harris, W., & Harris, D. (2005) “CMOS VLSI DESIGN 3rd edition”, Pearson, 2005

BIOGRAPHICAL SKETCH

Hyunho (Albert) Baek was born in Seoul, S. Korea. He received his B.E. in information and communication engineering from Andong National University, Andong, S. Korea, in 2004. He received his M.S. and Ph.D. in electrical and computer engineering from the University of Florida, Gainesville, FL, USA, in 2010 and 2013, respectively.

During the summer of 2009, he worked as a Global Intern at Samsung Electronics System LSI, Ki-Heung, S. Korea, and during the summers of 2012 and 2013, he worked as a Graduate Intern Technical at Intel Corporation, Folsom, California, USA.

His research interests are in the areas of mixed-signal design and signal and power integrity characterization in chip-package systems.
