QUILT PACKAGING: A NOVEL HIGH SPEED CHIP-TO-CHIP

COMMUNICATION PARADIGM FOR SYSTEM-IN-PACKAGE

A Dissertation

Submitted to the Graduate School

of the University of Notre Dame

in Partial Fulfillment of the Requirements

for the Degree of

Doctor of Philosophy

by

Qing Liu, B.E., M.E., M.S.E.E.

Gary H. Bernstein, Director

Graduate Program in Electrical Engineering

Notre Dame, Indiana

December 2007

© Copyright by

QING LIU

2007

All Rights Reserved

Abstract

As state-of-the-art feature sizes continue to shrink and the incorporation of high-k and low-k isolation dielectric materials and of strained and SiGe layers on silicon becomes common, chip density and performance improve. However, system performance has not kept pace, especially at multi-GHz clock rates. The bottleneck is packaging. Conventional packaging techniques require high driving current and a large in-die area for bonding pads, and they provide limited bandwidth. As a result, several technologies, such as system-on-chip, system-in-package and system-on-package, have been actively pursued to meet the demands of low power, high I/O counts and fast chip-to-chip communication. Here, we present a novel packaging technique, Quilt Packaging (QP), for system-in-package. QP uses microelectromechanical systems (MEMS) inspired fabrication techniques to form contacts along the vertical edge facets of the integrated circuits (ICs) during the back-end-of-line process, enabling the ICs to be interconnected by butting them against each other. A shorter path between chips is established compared with other system-in-package techniques pursued by industry, which leads to shorter delay, lower power consumption and better signal integrity. The contacts are formed by copper nodules embedded inside the silicon substrate. The nodules are made by etching trenches into the silicon substrate by deep reactive ion etching (DRIE); the trenches are then filled by electrolytic copper plating, followed by chemical-mechanical polishing (CMP). Different QP structures are fabricated with a nodule depth of 20 µm and widths from 10 µm to 100 µm. To further improve the transmission performance, tapered nodules, which provide better impedance matching to on-chip interconnects, are designed and fabricated.

QP is a novel packaging technique for ultra-fast and low-power chip-to-chip communications. The fabrication process of QP can be easily integrated into a standard IC process with two extra masks, one for the nodules and the other for the separation of the chips. By implementing QP, system performance can improve dramatically, in step with the improvement trends of the ICs themselves.

to my parents…

CONTENTS

FIGURES
TABLES
ACKNOWLEDGMENTS

CHAPTER 1 INTRODUCTION
1.1 Motivation
1.2 Organization

CHAPTER 2 REVIEW OF PACKAGING TECHNIQUES
2.1 Through-Hole Technology
2.2 Surface-Mount Technology
2.3 Multichip Modules
2.4 Chip Bonding Techniques
2.5 Proximity Communication
2.6 Neo-Stack Technology
2.7 Transfer & Join Technology
2.8 Off-the-Top Chip-to-Chip Interconnection

CHAPTER 3 QUILT-PACKAGING: INTRODUCTION, THEORY AND SIMULATION
3.1 Introduction to Quilt-Packaging
3.2 Signal Integrity Characteristics
3.2.1 Delay
3.2.2 Reflection
3.2.3 Crosstalk
3.2.4 Ground Bounce
3.3 Electrical Modeling of Interconnects
3.3.1 On-Chip Interconnection
3.3.2 Modeling of On-Chip Interconnects at Intermediate Frequency
3.3.3 Transmission Line Effects
3.3.4 Modeling of Conventional Packages
3.3.5 Modeling of Quilt-Packaging

CHAPTER 4 QUILT-PACKAGING: FABRICATION PROCESS
4.1 Calibration of DRIE
4.2 Copper Plating
4.2.1 Electroless Copper Plating
4.2.2 Electrolytic Copper Plating
4.3 Chemical Mechanical Polishing
4.3.1 Theory
4.4 Experiments on Quilt-Packaging Fabrication with Electroless Plating
4.5 Experiments on Quilt-Packaging Fabrication with Electrolytic Plating
4.6 Final Fabrication Process of Quilt-Packaging

CHAPTER 5 MICROWAVE MEASUREMENTS OF QUILT PACKAGING
5.1 Review of de-embedding techniques for on-wafer measurements
5.1.1 Open
5.1.2 Open and Short
5.1.3 Open, Short and Thru
5.1.4 Two-port Network with a Thru
5.1.5 Three-step with Two Shorts, Open and Thru
5.1.6 Four-step with Two Shorts and Two Opens
5.1.7 Two-port Network with One Open and Two Thrus
5.1.8 Two-port Network with One Open and One Thru
5.2 Microwave Measurements of QP

CHAPTER 6 CONCLUSIONS AND FUTURE WORK
6.1 Conclusions
6.2 Future Work

REFERENCES
Chapter 1 References
Chapter 2 References
Chapter 3 References
Chapter 4 References
Chapter 5 References
Chapter 6 References

FIGURES

1.1 A system-on-chip where different function blocks are integrated into one chip, and complexity is decreased by re-using IP (intellectual property) from different design houses
1.2 SiP (a) 2-D assembly and (b) 3-D assembly
1.3 SiP example based on 3-D assembly
1.4 SoP concept for system integration of thin film components
1.5 Conceptual diagram of Quilt-Packaging
2.1 Schematic of a dual-in-line plated through-hole package
2.2 (a) A schematic of the surface-mount technology and (b) a closer look at surface-mount package
2.3 A typical multichip module
2.4 (a) MCM-L, (b) MCM-C, and (c) MCM-D
2.5 Fabrication process of MCM-C
2.6 Cross section of MCM-D with a flip-chip solder bump
2.7 Chip bonding techniques: (a) wire bonding, (b) tape automated bonding, and (c) flip-chip bonding
2.8 Cross section of inter-chip interconnection
2.9 Circuit diagram of transmitter and receiver
2.10 Chip photograph
2.11 A close look of aligned chip
2.12 Neo-stack process sequence
2.13 A flash memory module
2.14 Schematic graph of (a) normal MCM and (b) T & J
2.15 T & J technology for (a) 2-D SiP, (b) 3-D SiP with thermal problem, and (c) 3-D SiP offering thermal dissipation
2.16 Conventional and the new off-the-top interconnection between two chips
2.17 20 Gbps/channel chip-to-chip interconnection
2.18 High bandwidth interconnection using off-the-top
3.1 Conceptual diagram of Quilt-Packaging
3.2 Cartoon representation of: (a) a simple two-chip QP connection and (b) a three-chip QP system
3.3 Quilt-packaging applications in (a) optical communications, (b) RF communications, and (c) high speed digital processors
3.4 Ground bounce setup and voltages
3.5 The trends of smallest transistor gate length and minimum width of interconnects
3.6 A typical chip cross-section
3.7 Cross-section of SEM picture of 90 nm CMOS interconnects and their key design rules
3.8 (a) Lumped model and (b) distributed model
3.9 Repeater insertion to reduce RC delay
3.10 Schematic of uniformly repeated line with initial cascade stage
3.11 Eliminating buffer stage in (a) by resizing logic stages and repeaters
3.12 Delay comparison
3.13 Delay and power comparison
3.14 Simulation models for distributed RC lines: (a) π model, (b) t model, (c) π2 model, (d) t2 model, (e) π3 model and (f) t3 model
3.15 (a) Schematic of transmission line and (b) distributed RLGC model
3.16 Distributed LC model for lossless transmission line
3.17 Schematic of combined skin effect and proximity effect
3.18 Transmission line circuit with load and generator
3.19 A schematic top view of an open classic IC package
3.20 A partial package model for 3 leads
3.21 Main parasitics in package
3.22 Typical ground bond and grounded lead for a floating paddle package
3.23 Side view of a typical two chip interconnection through quilt-packaging
3.24 Cross sections of (a) conventional CPW, (b) conductor-backed CPW, and (c) CPW with finite ground planes
3.25 Schematic of a two-port transmission line
3.26 HFSS simulation model of simple CPW structured QP prototype
3.27 Wave port field distribution when it is (a) too small and (b) too wide
3.28 HFSS simulation model of CPW QP prototype with improved, tapered nodule pattern
3.29 Simulated return loss (a) and insertion loss (b) before de-embedding
3.30 Simulated return loss (a) and insertion loss (b) after de-embedding
3.31 Prototypes in Ansoft HFSS: (a) “simple QP,” (b) “QP improved 1,” (c) “QP improved 2,” (d) on-chip interconnect, and (e) a closer look at “QP improved 2”
3.32 Simulated return loss (a) and insertion loss (b) of 100 µm wide QP structures and on-chip interconnect
3.33 Simulated return loss (a) and insertion loss (b) of 100 µm wide QP structures after de-embedding
3.34 Simulated return loss (a) and insertion loss (b) of 100 µm wide QP structures on both low and high resistivity silicon substrate
3.35 Simulated return loss (a) and insertion loss (b) of 50 µm wide QP structures on both low and high resistivity silicon substrate before de-embedding
3.36 Simulated return loss (a) and insertion loss (b) of 50 µm wide QP structures on low resistivity silicon substrate after de-embedding
3.37 Simulated return loss (a) and insertion loss (b) of 20 µm wide QP structures on both low and high resistivity silicon substrate before de-embedding
3.38 Simulated return loss (a) and insertion loss (b) of 20 µm wide QP structures on low resistivity silicon substrate after de-embedding
3.39 Simulated return loss (a) and insertion loss (b) of 10 µm wide QP structures on both low and high resistivity silicon substrate before de-embedding
3.40 Simulated return loss (a) and insertion loss (b) of 10 µm wide QP structures on low resistivity silicon substrate after de-embedding
3.41 Prototypes in Ansoft HFSS: (a) “simple QP with limited ground”, (b) “QP improved with limited ground”, (c) on-chip interconnect with limited ground, and (d) a closer look at “QP improved with limited ground”
3.42 Simulated return loss (a) and insertion loss (b) of 100 µm wide QP structures with wide and limited ground plane on low resistivity silicon substrate before de-embedding
3.43 Simulated return loss (a) and insertion loss (b) of 100 µm wide QP structures with wide and limited ground plane on high resistivity silicon substrate before de-embedding
3.44 Simulated return loss (a) and insertion loss (b) of 50 µm wide QP structures with wide and limited ground plane on low resistivity silicon substrate before de-embedding
3.45 Simulated return loss (a) and insertion loss (b) of 50 µm wide QP structures with wide and limited ground plane on high resistivity silicon substrate before de-embedding
3.46 Simulated return loss (a) and insertion loss (b) of 20 µm wide QP structures with wide and limited ground plane on low resistivity silicon substrate before de-embedding
3.47 Simulated return loss (a) and insertion loss (b) of 20 µm wide QP structures with wide and limited ground plane on high resistivity silicon substrate before de-embedding
3.48 Simulated return loss (a) and insertion loss (b) of 10 µm wide QP structures with wide and limited ground plane on low resistivity silicon substrate before de-embedding
3.49 Simulated return loss (a) and insertion loss (b) of 10 µm wide QP structures with wide and limited ground plane on high resistivity silicon substrate before de-embedding
4.1 Fabrication process of quilt packaging: (a) Wafer after the devices formed and before the first metal layer placed, (b) deep trenches at the edge of the chips by deep reactive ion etching (DRIE), (c) a closer look of (b), (d) passivation of trenches to form insulation layer, (e) after seed layer deposition and electroless copper plating of the trenches, (f) chemical-mechanical polishing (CMP) of wafer surface, (g) complete interconnection with pads, and (h) separate chips by DRIE and CMP, undercut removal by isotropic etching
4.2 A typical Bosch process
4.3 Cartoon shows the sequential steps of the Bosch process
4.4 SEM picture of the sidewall after the Bosch process, which shows the typical scalloping profile
4.5 SEM pictures of trenches after 3 minutes of Bosch etch: (a), (b): 2 µm wide; (c), (d): 5 µm wide; (e), (f): 10 µm wide; and (g), (h): 20 µm wide
4.6 Etch rate of Bosch process
4.7 1 µm thick copper is electroless plated on the surface of 1.5 µm thick polysilicon
4.8 SEM pictures of polyimide trenches filled with electroless plated copper (a) 0.7 µm wide, 1.9 µm deep and (b) 0.7 µm wide, 2.8 µm deep
4.9 Schematic process flow of dual damascene
4.10 Via plating patterns
4.11 Additives distribution near and within a via
4.12 Time evolution of via fill. The top row is normal plating. The center row is conformal plating. The bottom row is bottom-up plating
4.13 (a) Pulse and (b) pulse-reverse current waveforms
4.14 Configuration of CMP tool
4.15 Schematic of Cu CMP. (a) Before CMP; (b) ideal case after CMP; and (c) real case after CMP
4.16 Two-layer mask set for electroless plating experiment. Different shapes with different dimensions are included for the investigation of etching and plating process. The maximum width of lines and pads is 20 µm, and the minimum width is 2 µm
4.17 (a), (b): patterns covered by AZ 4620; and (c), (d): patterns after stripping AZ 4620. Only Cr/Cu inside the trenches is left and will serve as seed layer in electroless copper plating
4.18 Electroless copper plating after 4 hours in (a), (b): 35 deg. C; (c), (d): 40 deg. C; and (e), (f): 45 deg. C
4.19 Surface profile after 4 hours of plating at 35 deg. C
4.20 Microscope and SEM pictures after CMP for wafers plated at (a), (b): 35 deg. C; (c), (d): 40 deg. C; and (e), (f): 45 deg. C
4.21 After partially separating the chips by DRIE for the wafers plated at (a)-(d): 35 deg. C; (e)-(h): 40 deg. C; and (i)-(l): 45 deg. C
4.22 Simple schematic of evaporation process
4.23 Overview of mask set for test of QP fabrication process using electrolytic copper plating
4.24 Alignment techniques for QP interconnection: (a) corner silicon etch; (b) nodule area silicon etch and (c) keyed nodules
4.25 Current waveform in electroplating
4.26 After 2 hours of electroplating
4.27 Patterns after annealing
4.28 Patterns after CMP
4.29 After 45 minutes of DRIE
4.30 Micrographs of separated chips
4.31 Micrographs of aligned QP prototypes using alignment techniques of (a), (b): corner silicon etch; (c), (d): nodule area silicon etch; and (e), (f): keyed nodules
4.32 SEM pictures of (a), (b): 100 µm wide keyed nodules; (c), (d): closer look of the 100 µm wide keyed nodules; (e), (f): 20 µm wide nodules; and (g), (h): side views of 20 µm wide nodules
4.33 Overview of the three-layer QP mask set
4.34 Final fabrication process of QP structures. (a) Define and etch nodules by DRIE, (b) passivate trenches by PECVD SiO2, (c) sputter Ti/Cu seed layer inside the trenches, (d) plate copper to fill the trenches, (e) planarize the nodules by CMP, (f) continue back-end-of-line process to finish ICs, (g) spin thick photoresist as protection layer, open separation area and remove SiO2 on the surface, (h) use DRIE to remove part of the silicon substrate, (i) dip wafer in BHF to remove SiO2 and Ti on the sidewall of protruded nodules, (j) continue DRIE to separate the chips, (k) electrolessly plate Sn on the nodule sidewall, (l) align and fix chips, then plate Sn to form bridge, and (m) strip photoresist and form a complete two-chip QP structure
4.35 (a), (b): 10 µm wide copper nodules; (c), (d): 20 µm wide copper nodules; (e), (f): 50 µm wide copper nodules; and (g), (h): 100 µm wide copper nodules after CMP
4.36 Patterns after lift off: (a) 10 µm; (b) 20 µm; (c) 50 µm; and (d) 100 µm
4.37 Patterns coated with AZ 4620 after develop and removal of SiO2 on the surface: (a) 10 µm; (b) 20 µm; (c) 50 µm; and (d) 100 µm
4.38 Patterns after 20 minutes of Bosch process
4.39 SEM pictures of separated chips with copper nodule width at (a) – (d): 10 µm; (e) – (h): 20 µm; (i) – (l): 50 µm; and (m) – (p): 100 µm
4.40 Side views of before (a, b) and after (c, d) clean of the precipitants on copper nodules
4.41 Side view of chip after separation by DRIE Bosch process
4.42 Alignment of two chips on Loctite 460: (a) 10 µm; (b) 20 µm; (c) 50 µm; and (d) 100 µm
4.43 QP connections of chips with 50 µm wide nodules (a, b), and 100 µm wide nodules (c, d)
4.44 Overviews of QP structures for (a) 50 µm wide nodule chips and (b) 100 µm wide nodule chips
5.1 Open structure and equivalent circuit
5.2 Open and short
5.3 Open, short and thru
5.4 Two-port network de-embedding with a thru
5.5 Three-step calibration standards and their equivalent circuits
5.6 Equivalent circuit of the RF test fixture
5.7 Four-step calibration standards
5.8 Equivalent circuits of calibration standards
5.9 Equivalent circuit of the test fixture
5.10 Graph representation of Zi, Z1 and α
5.11 Schematic representation of cascade configuration, which includes probe pads, metal interconnect lines and transistor (DUT)
5.12 DUT and its corresponding open, thru1 and thru2 structures
5.13 Cascaded-based de-embedding method. (a) Layouts of DUT, OPEN and THRU. (b) Schematic diagrams
5.14 Fabricated QP structures for microwave measurements: (a) simple QP with 50 µm wide nodules; (b) QP improved 2 with 100 µm wide nodules
5.15 (a) Return loss and (b) insertion loss of both simulated and measured QP structures with 100 µm wide nodules on low resistivity silicon substrate
5.16 (a) Return loss (S11) and (b) insertion loss (S21) of 100 µm QP structures after de-embedding
5.17 (a) Return loss and (b) insertion loss of both simulated and measured QP structures with 100 µm wide nodules on high resistivity silicon substrate
5.18 (a) Return loss (S11) and (b) insertion loss (S21) comparison of 100 µm QP improved 2 on 10 Ω • cm and 8000 Ω • cm silicon substrate
5.19 Maximum available gain (MAG) of QP improved 2 on both low and high resistivity silicon substrate
5.20 (a) Return loss (S11) and (b) insertion loss (S21) of 100 µm QP structures on high resistivity substrate after de-embedding
5.21 (a) Return loss and (b) insertion loss of both simulated and measured QP structures with 50 µm wide nodules on low resistivity silicon substrate
5.22 (a) Return loss (S11) and (b) insertion loss (S21) of 50 µm QP structures on low resistivity substrate after de-embedding
5.23 (a) Return loss and (b) insertion loss of both simulated and measured QP structures with 50 µm wide nodules on high resistivity silicon substrate
6.1 Schematic of the connecting technique with fixture
6.2 (a) 25 µm thick hexagonal chips with dicing line prepared by anisotropic etching, (b) a 25 µm thick chip with rounded corner
6.3 Simple Schematic of QP connection
6.4 IntelliSuite simulation results on QP connecting by copper film with chip spacing at (a) 0.5 µm, (b) 1 µm, (c) 2 µm, (d) 3 µm and (e) 4 µm
6.5 A QP model built on GaAs substrate
6.6 (a) return loss and (b) insertion loss of 65 GHz to 85 GHz targeting automotive radar system

TABLES

2.1 COMPARISON OF MCM FEATURES
3.1 DIMENSIONS OF THE 100 µm QP INTERCONNECTS
3.2 DIMENSIONS OF THE 50 µm QP INTERCONNECTS
3.3 DIMENSIONS OF THE 20 µm QP INTERCONNECTS
3.4 DIMENSIONS OF THE 10 µm QP INTERCONNECTS
4.1 DIMENSION AND SPACING OF LONG LINES
4.2 ADJUSTABLE CMP PARAMETERS

ACKNOWLEDGMENTS

First of all, I would like to gratefully thank my advisor, Professor Gary H. Bernstein, for his enthusiastic guidance and inspiration. My experience at Notre Dame became so rewarding under his mentoring. I am very fortunate to have been his student and to have been exposed to his work ethic, active thinking and caring. I would never have finished my Ph.D. work without his constant encouragement and patience.

I thank my dissertation committee for their assistance. I would like to say a big “thank you” to Dr. Patrick Fay for the technical discussions on simulations and fabrication processes, and for his great help with microwave measurements. I thank Dr. Gregory Snider for his help in fabrication. He inspired me to clean up the copper and obtain better connectivity. I also thank Dr. Jay Brockman for leading me to think about the industry needs for this work, not only the research.

My colleagues deserve many thanks for their support during my study at Notre Dame. I thank Dr. Minjun Yan for his help with SEM pictures, Dr. Zhuowen Sun for the discussions on Ansoft software, Jason Kullick and Wayne Buckhanan for their help on the first successful batch, and Jie Su, Heng Yang, Dr. Wenchuang (Walter) Hu and Dr. Qingling Hang for useful technical discussions along with fun talks.

Many other faculty, staff and colleagues in the Department of Electrical Engineering and outside of the department gave me a lot of help. Thank you.

Finally, I have to say “thank you so much” to my parents. Without their love, I would never have been able to happily reach this moment.

CHAPTER 1

INTRODUCTION

1.1 Motivation

Ever since integrated circuits (ICs) were invented, the density, performance and power consumption have been dramatically improved by incorporating advanced device and circuit design, processing, and packaging techniques.

As the gate length of modern IC technology moves further into the submicron era (45 nm in today’s state-of-the-art CMOS processing) and the trend continues with the aid of strained and SiGe layers on silicon [1], single-chip density and performance are enhanced to provide higher speed and more functions. However, not all of these advantages can be realized at the system level because of the limitations of packaging, especially at multi-GHz frequencies [2]. In addition, the rapidly growing demand for consumer electronics with low power consumption (longer battery life), low cost, and better portability presents challenges to conventional packaging techniques. To alleviate these problems, system-on-chip (SoC), system-in-package (SiP) [3], and system-on-package (SoP) [4][5] have been proposed and actively pursued.

A SoC is a VLSI system that utilizes a large number of transistors to incorporate several kinds of function blocks into a single chip, which offers the most compact and light-weight design. A schematic of SoC is shown in Fig. 1.1. By combining different chips into one, the delay, power consumption and electromagnetic radiation caused by interconnection between chips are largely decreased by the much shorter on-chip interconnects. System performance can be improved several-fold.

Figure 1.1. A system-on-chip where different function blocks are integrated into one chip, and complexity is decreased by re-using IP (intellectual property) from different design houses. (Adopted from [3].)

In the wireless communication field, a CMOS SoC transceiver [6] and a global positioning system receiver [7], each of which integrates the radio frequency (RF)/analog front end and the digital baseband into a single chip, have been successfully demonstrated.

As SoC has been extensively pursued in recent years, several issues have arisen [3]. First, some intellectual property (IP) blocks, such as central processing units (CPUs) and digital signal processors (DSPs), are sometimes protected and cannot be distributed as common IP. Other manufacturers cannot build them into their SoC. Second, as the die area, which includes several function blocks, becomes larger and larger, the yield decreases. Third, SoC usually requires more masks and incurs higher fabrication cost to accommodate different processes. For example, the processes used to fabricate a CPU are very different from the processes used to fabricate DRAM. Fourth, a SoC contains many IP blocks. In order to design a successful SoC, manufacturers must prove that the IP blocks provide full functionality and pass timing tests. Thorough testing of IP blocks is very hard, which makes SoC risky. Fifth, some IP blocks are difficult to share and integrate. Some RF/analog IP blocks are very sensitive to noise. When they are integrated with digital blocks, the high-noise environment caused by the switching of digital circuits can cause serious problems.

SiP provides an alternative way to improve system performance and decrease power consumption, while solving some of the problems faced by SoC [3][8]. Instead of incorporating all the function blocks into one chip, a SiP contains multiple chips combined into a single package, whose dimensions are typically on the same order of magnitude as one chip’s package. The integration can be optimized for the system-level package. When a system is partitioned into separate components, each component can be optimized for manufacturing cost and performance. The fabrication processes can be simplified, the yield increases, the cost drops and the die area decreases. In addition, by partitioning function blocks like memory, logic and RF/analog into different dies, the changes to the interface circuits can engender a more efficient layout. The I/O drivers for the low-capacitance interface are much smaller than the typical high-current off-chip drivers. Additional area and power reduction can be achieved by resizing the I/O pre-driver circuitry. Because of the attributes above, about 30 IC and packaging companies are gearing up to produce SiP-based multichip modules [4].
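The yield argument made above for SoC and SiP can be illustrated with a short numerical sketch. It uses the simple Poisson yield model, Y = exp(-A·D0), which is a standard textbook approximation and not a model taken from this work. The defect density, system area and number of dies are illustrative assumptions, and the sketch further assumes that each SiP die is tested before assembly and that assembly itself causes no yield loss.

```python
import math


def poisson_yield(die_area_cm2: float, defect_density_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    return math.exp(-die_area_cm2 * defect_density_per_cm2)


# Illustrative assumptions only; none of these values come from this work.
DEFECT_DENSITY = 0.5  # defects per cm^2 (assumed)
SYSTEM_AREA = 4.0     # total silicon area of the system in cm^2 (assumed)
NUM_DIES = 4          # the SiP partitions the system into this many equal dies (assumed)

# SoC: the whole system is one large die.
soc_yield = poisson_yield(SYSTEM_AREA, DEFECT_DENSITY)

# SiP: each die carries a fraction of the area and is tested before assembly,
# so a defect scraps only one small die rather than the whole system.
sip_die_yield = poisson_yield(SYSTEM_AREA / NUM_DIES, DEFECT_DENSITY)

# Average silicon area that must be fabricated per working system.
area_per_good_soc = SYSTEM_AREA / soc_yield
area_per_good_sip = NUM_DIES * (SYSTEM_AREA / NUM_DIES) / sip_die_yield

print(f"SoC die yield:            {soc_yield:.3f}")
print(f"SiP per-die yield:        {sip_die_yield:.3f}")
print(f"Area per good SoC (cm^2): {area_per_good_soc:.2f}")
print(f"Area per good SiP (cm^2): {area_per_good_sip:.2f}")
```

Under these assumptions the smaller dies have a higher individual yield, so less silicon is wasted per working system. More refined yield models (for example, ones that account for defect clustering) change the numbers but not the direction of the trend.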

Generally speaking, there are two technologies in SiP, i.e., 2-D assembly and 3-D assembly, shown in Fig. 1.2. Compared with SoC, 2-D SiP can provide almost the same system area, while offering optimized processes for each die and good electrical isolation between dies. 3-D SiP can achieve a smaller system area by stacking different dies, as shown in Fig. 1.3. The devices in 3-D SiP are spaced more closely, so the communication among devices is faster, but heat dissipation is a serious problem. For a 3-D SiP, the heat generated by the dies inside the package is very hard to dissipate, because the heat sink can efficiently dissipate only the heat at the top and bottom of the package. The accumulated heat can damage the dies, for example through electromigration, and ultimately decrease the lifetime of the system.

Figure 1.2. SiP (a) 2-D assembly and (b) 3-D assembly. (Adopted from [3].)

Figure 1.3. SiP example based on 3-D assembly. (Adopted from [9].)

SoP, first introduced at Georgia Tech, is a new concept of SiP that utilizes thin film technology to integrate passives into the package. In [4], the quality factors (Q) of inductors reach 100 – 400 on SoP, compared with the 10 – 25 normally achieved on a silicon substrate.

Also, passives such as decoupling capacitors take up a lot of active area on a die if fabricated on the wafer, or add weight and bulk if fabricated discretely. The thin film technology in SoP helps build better and more compact passives and results in smaller and faster systems. Figure 1.4 shows the SoP concept for a system integration of thin film components.

Figure 1.4. SoP concept for system integration of thin film components. (Adopted from [4].)

In this dissertation, “quilt-packaging” (QP), a new 2-D SiP paradigm [10], is proposed and demonstrated. The essence of our proposed approach is to use MEMS-inspired fabrication techniques to form contacts along the vertical edge facets of integrated circuits, enabling ICs to be interconnected by butting them against each other.

The advantage of this technique is that, by providing much shorter interconnects between chips, great improvements in power reduction, speed and noise reduction can be achieved. The fabrication processes are designed to be compatible with standard CMOS and BJT processes. The microwave performance of the QP structure shows that it is suitable for wide-bandwidth communication systems. A conceptual schematic of QP is shown in Fig. 1.5.

Figure 1.5. Conceptual diagram of Quilt-Packaging, showing three chips (IC1, IC2 and IC3).

1.2 Organization

Chapter 2 begins with a review of traditional packaging techniques. Then, multichip modules (MCM) and some more advanced techniques actively pursued in industry are presented. In Chapter 3, signal integrity issues of IC interconnections are introduced, the advantages of QP are addressed, and full 3-D electromagnetic (EM) simulations of QP are given. Chapter 4 introduces the mask design and fabrication processes of QP. Chapter 5 shows the microwave measurements of different QP structures and presents comparisons with other techniques. Finally, the last chapter contains conclusions and plans for future work.

CHAPTER 2

REVIEW OF PACKAGING TECHNIQUES

The goal of this chapter is to give a brief review of packaging techniques during the last several decades. Traditional and the most advanced packaging techniques will be presented with their advantages and disadvantages.

2.1 Through-Hole Technology

Through-hole technology (THT) was the prevailing packaging technology from the 1960s to the mid-1980s. Chips are mounted by inserting small legs into plated through-holes drilled in printed circuit boards (PCB). A dual-in-line plated through-hole package is shown in Fig. 2.1.

Figure 2.1. Schematic of a dual-in-line plated through-hole package.

THT is low cost and mature, but the number of external connections that a chip package can provide is severely limited by the large PCB through-holes. In addition, the package size of THT is much larger than the chip die area, which leads to larger distances between chips and degraded system performance. The requirement for faster I/O speed and higher system performance forced the packaging evolution from THT to surface-mount technology.

2.2 Surface-Mount Technology

Surface-mount technology (SMT) emerged in the 1970s. Because of the low availability of SMT components and higher cost, SMT became popular only in the mid-1980s. Surface-mount assembly achieves higher density of interconnection by reducing the lead spacing and eliminating the space wasted by drill holes in the PCB. A surface-mount assembly is shown in Fig. 2.2.

Figure 2.2. (a) A schematic of the surface-mount technology and (b) a closer look at a surface-mount package. (Adopted from [1].)

Compared with THT, SMT provides several advantages. First, the smaller package size and higher number of leads result in increased packaging efficiency. Second, the reduction of the peripheral area of the SMT package, as well as of the spacing between SMT packages, results in higher system speed and better signal integrity.

As the demand for faster, smaller and more power-efficient systems became more and more urgent, packaging techniques became the bottleneck. Slight improvements in packaging techniques could no longer keep up with the fast development of integrated systems. A revolution was needed. Multichip modules emerged as a brand new approach to packaging, and a possible path to system-in-package.

2.3 Multichip Modules

Multichip modules (MCM) completely remove the single chip package. In MCM, multiple bare dies are directly mounted and interconnected on a substrate, as shown in Fig. 2.3. Since the substrate has finer conductor lines, thinner dielectric layers, and a denser via grid than the board, MCM can provide much higher packaging efficiency.

Figure 2.3. A typical multichip module. (Adopted from [1].)

MCM can be generally categorized into three major types depending on the material or processing methods of the multilayer substrate [2, 3, 4, 5]:

• MCM-L: organic laminated layers (fine line PCB);

• MCM-C: cofired ceramic layers;

• MCM-D: deposited thin film layers.

The three MCM schemes are shown below in Fig. 2.4 and the comparisons are shown in Table 2.1.

Figure 2.4. (a) MCM-L, (b) MCM-C, and (c) MCM-D. (Adopted from [5])

TABLE 2.1

COMPARISON OF MCM FEATURES. (Adopted from [5])

Design parameter                            MCM-L         MCM-C         MCM-D
Feature size, line/space (µm)               125/125       100/125       20/20 (a)
Via size (µm)                               250           200           20
Critical dimension uniformity, +/- (µm)     12            25 (b)        5
Number of levels                            10            20+           5+
Dielectric constant                         3.5 to 4.5    5.2 to 7.8    2.9
Dielectric thickness (µm)                   112           100           1 to 10
Cutouts/cavities                            Yes           Yes           No
Integrated resistors/capacitors             No            Yes           Yes

(a) Small lines and gaps can be defined, subject to circuit impedance and delay requirements.

(b) Even though line widths are smaller for MCM-C than MCM-L, critical dimension uniformity is greater because of firing shrinkage.

A common approach of MCM-L is chip-on-board (COB), which uses fine line multilayer PCB technology with bare dies directly mounted on the board. The fabrication process of MCM-L is described below.

First, the metal pattern, usually Cu, is etched in a single layer. Second, layers are bonded together, which is called lamination. Third, through-holes are drilled in the sublaminate. Fourth, metal is plated along the through-holes. Finally, the different layers are stacked and bonded together to form the MCM-L.

MCM-L is low cost, allows individual layers to be repaired and reworked easily, has a well-established infrastructure, and can carry components on both sides. However, it also suffers from low performance, low wiring density, poor thermal conductivity of the substrate, mismatches between substrate and die materials, and high crosstalk noise.

MCM-C is highly reliable and well characterized, but it is more expensive. The fabrication process of MCM-C is shown in Fig. 2.5.

First, a liquid slurry is formed from ceramic particles and organic binders and then cast into a solid sheet, which is called green tape. Then, via holes are drilled or punched through the green tape. Next, metal is applied to the green sheet to form conductive patterns, and the via holes may be filled, by screen printing. Screen printing is a thick film process in which a paste or ink is squeezed through the open areas of a screen and transferred to the surface of a green tape sheet to form patterns. Finally, all the green tapes are stacked on top of each other and heated in a furnace. During the heating, the organic binders decompose, the ceramic densifies, and the structure shrinks.

MCM-C has higher wiring capacity, better electrical and thermal conductivity, and superior strength and rigidity, and it can also carry components on both sides. However, it suffers from shrinkage of the substrate during cofiring, a high substrate dielectric constant, and mismatch between substrate and die materials.

Figure 2.5. Fabrication process of MCM-C. (Adopted from [5].)

MCM-D interconnection patterns are made by depositing conductors, typically Cu or Al, and dielectrics, typically polyimide or benzocyclobutene (BCB), on a base substrate made of ceramic, silicon, or metal. MCM-D techniques most closely parallel the processes used in IC fabrication. The thin dielectric film is usually deposited by a spin coating process, which can yield well-controlled uniformity and thickness. Vias can be made by laser ablation, reactive ion etching or wet etching of dielectric layers. The thin conductor layer is deposited by sputtering, and is patterned by photolithography and etching. Furthermore, electroless plating or electroplating can also be done as an additive process. Finally, the dielectric layers need to be heated for curing. The temperature required (in the range of 200 °C to 400 °C) is much lower than that required in MCM-C (typically 800 °C for low temperature co-fired ceramic). A cross section of a typical MCM-D is shown in Fig. 2.6.

Figure 2.6. Cross section of MCM-D with a flip-chip solder bump. (Adopted from [8].)

The advantages of MCM-D technology are high system performance, high interconnection density, a low dielectric constant substrate, and good electrical properties. However, MCM-D is more costly and has a more serious known good die (KGD) problem. KGD means that the bare die has been tested and burned in at its frequency specifications. Since repair and rework are most difficult in MCM-D, the system cost increases dramatically when a faulty die is found after assembly.

2.4 Chip Bonding Techniques

For MCM, bare chips are attached to the substrate in one of three ways: wire bonding, tape automated bonding (TAB), or flip-chip bonding, as shown in Fig. 2.7.

Figure 2.7. Chip bonding techniques: (a) wire bonding, (b) tape automated bonding, and (c) flip-chip bonding. (Adopted from [2].)

For wire bonding, the back side of the die is attached directly to the substrate. The electrical connection is made by attaching wires from the I/O pads on the active side to the corresponding pads on the substrate. The attachment is made by thermal compression.

TAB uses a thin polymer tape containing metallic circuitry. The interconnections are patterned on the multilayer tape, which is placed on top of the bare die and pressed, so that the metal tracks correspond to the bonding sites on the die. Flip-chip bonding, also called face-down bonding or controlled-collapse chip connection (C4), was invented by IBM in the 1960s [6]. It uses small solder balls on the I/O pads on the device side of the chip both to attach the chip physically and to make electrical connections.

With the system performance ever increasing, more advanced packaging techniques are being actively pursued. The following several techniques are among them.

2.5 Proximity Communication [7]

“Proximity communication” is a new packaging technique proposed by Sun Microsystems. The idea is to capacitively couple transmitter pads on one chip to the receiver pads on another chip, so that the inter-chip interconnections can be replaced by much faster and denser on-chip interconnections, as shown in Fig. 2.8.

Figure 2.8. Cross section of inter-chip interconnection. (Adopted from [7].)

Since the transmitter and receiver pads are protected by the top dielectric layer and passivation, electrostatic discharge (ESD) protection becomes unnecessary. Figure 2.9 shows the transmitter and receiver circuitry along with the signal capacitor, Cs, and the parasitic capacitors of the transmitter and receiver pads, Cpt and Cpr.

Figure 2.9. Circuit diagram of transmitter and receiver. (Adopted from [7].)

The transmitter is an inverter driving the transmitter pads. Cpt loads the inverter. In the receiver, Cpr forms a capacitive divider along with Cs. The receiver consists of an inverter and a feedback inverter, which together form a positive feedback loop that holds the receiver state until the next transition occurs. A third inverter may be needed to amplify the signal.

A test chip fabricated in 0.35 µm CMOS technology is shown in Fig. 2.10, and a closer view is shown in Fig. 2.11. Sixteen channels operate simultaneously, each communicating pseudo-random patterns at a rate of 1.35 Gbps, for an aggregate bandwidth of 21.6 Gbps. Each communication channel consumes a static power of 3.6 mW due to receiver bias currents, and a dynamic energy of 3.9 pJ per bit due to switching activity.
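The figures quoted above can be cross-checked with a few lines of arithmetic. The following Python sketch recomputes the aggregate bandwidth and estimates the total power of one channel. The per-channel total is not stated in the text; it is derived here under the assumption that every transmitted bit costs the quoted 3.9 pJ, so it should be read as a rough estimate only.

```python
# Figures quoted in the text for the 0.35 um CMOS test chip.
CHANNELS = 16
RATE_BPS = 1.35e9            # per-channel data rate, bits per second
STATIC_POWER_W = 3.6e-3      # per-channel static power (receiver bias)
ENERGY_PER_BIT_J = 3.9e-12   # per-channel dynamic energy per bit


def aggregate_bandwidth_bps(channels: int, rate_bps: float) -> float:
    """Total throughput when all channels run simultaneously."""
    return channels * rate_bps


def channel_power_w(static_w: float, energy_per_bit_j: float, rate_bps: float) -> float:
    """Static power plus dynamic power (energy per bit times bit rate).

    Assumption: every bit costs the full quoted energy per bit.
    """
    return static_w + energy_per_bit_j * rate_bps


if __name__ == "__main__":
    total_bps = aggregate_bandwidth_bps(CHANNELS, RATE_BPS)
    per_channel_w = channel_power_w(STATIC_POWER_W, ENERGY_PER_BIT_J, RATE_BPS)
    print(f"aggregate bandwidth: {total_bps / 1e9:.1f} Gbps")
    print(f"estimated power per channel: {per_channel_w * 1e3:.2f} mW")
```

Under this assumption, the dynamic contribution at 1.35 Gbps is somewhat larger than the static one, giving roughly 8.9 mW per channel.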

Figure 2.10. Chip photograph. (Adopted from [7])

Figure 2.11. A close-up view of the aligned chips. (Adopted from [7])

For this packaging technique, alignment of the two chips is a critical issue. Misalignment can cause I/O signal errors. Another issue is the capacitive coupling, which requires larger drivers to boost the signal, especially at the receiver end. The technique therefore needs relatively high power, and its speed is limited by the coupling.

2.6 Neo-Stack Technology [8]

“Neo-stack” is a new 3-D packaging technology invented by Irvine Sensors. It starts with complete wafers of chips. A gold metallization process is performed to bring all the signals to the edge, and the wafers are diced. The dice are stacked, and the stack is lapped in the street area, exposing the ends of the reroute metal. Bus metallization is deposited to the side of the stack, which interconnects the dice and a ceramic top cap substrate, allowing signals into and out of the stack. The fabrication process of neo-stack is shown in Fig. 2.12.

A flash memory module, which includes 16 layers of flash memory and the associated control and drive circuitry, is shown in Fig. 2.13.

Figure 2.12. Neo-stack process sequence. (Adopted from [8])

Neo-stack has several advantages: it can start with known good die (KGD); it allows heterogeneous stacking of mixed chip types; 50 or more thin layers are available; die shrink can be accommodated easily; and it has a relatively large silicon heat conduction path. However, it also has some limitations: wafer preparation is complicated; all dice must be the same size, limiting the stack to a single die type; dice are not burned in, which limits the stack height; and frequent die shrinks need substantial retooling.

Figure 2.13. A flash memory module. (Adopted from [8].)

2.7 Transfer & Join Technology [9]

Transfer & join (T & J) is another new packaging technology, developed by IBM. Polyimide is deposited on the wafer and patterned by etching. Metal studs, which act as the I/O pads, are formed at the vias. Interconnections are formed inside a chip that sits on top of a temporary glass wafer. Isostatic pressure bonding is used to bond the device wafer and the interconnection chip together. Then, a laser is used to delaminate the glass wafer. A schematic comparison of T & J with a normal MCM is shown in Fig. 2.14.

Figure 2.14. Schematic graph of (a) normal MCM and (b) T & J. (Adopted from [9].)

T & J can be used to achieve a 2-D system-in-package design with high interconnection density. When the temporary glass wafer is replaced with another device wafer, processed in the same way as the bottom wafer, a 3-D SiP design can be achieved. However, since the device sides of the two wafers face each other, thermal dissipation becomes a problem. To solve this problem, metal-filled through-holes are formed in the top device wafer, and the two device wafers are bonded together with the bare side of the top wafer sitting on the device side of the bottom wafer. The three methodologies are shown in Fig. 2.15.

T & J offers high wiring density. It also allows interconnection of mixed chip types. With proper design, it can achieve optimum system performance, but heat dissipation is a critical issue in this technology. A special polyimide is required that can withstand and help dissipate the heat, act as a good electrical insulator, and be processed easily. For a 3-D SiP using T & J, through-hole etching and filling are needed. Ultra-thin silicon wafers are also needed, which increases the cost.

Figure 2.15. T & J technology for (a) 2-D SiP, (b) 3-D SiP with thermal problem, and (c) 3-D SiP offering thermal dissipation. (Adopted from [9])

2.8 Off-the-Top Chip-to-Chip Interconnection [10]

SiliconPipe holds that the discontinuities introduced between the package and the PCB are the main obstacle to high-speed signal transmission. The off-the-top chip-to-chip interconnection technology is intended to eliminate these discontinuities and provide a straight-through signal path from die to die, enabling transmission speeds up to the full silicon speed.

The schematic of off-the-top interconnection technology is shown in Fig. 2.16. The top diagram is the traditional chip-to-chip interconnection through the PCB, and the bottom diagram is the new technology, which eliminates the parasitics in the channel.


Figure 2.16. Conventional and the new off-the-top interconnection between two chips. (Adopted from [10]).

Figure 2.17. 20 Gbps/channel chip-to-chip interconnection. (Adopted from [10])

SiliconPipe claims that the off-the-top technology can achieve 15 Gbps/channel packaging for 1.3 µm CMOS structures and 20 Gbps/channel packaging for 0.9 µm CMOS structures over distances of up to 15 inches. Figure 2.17 shows the schematic of this technology.

As Fig. 2.17 shows, off-the-top technology uses connectors (blue blocks) to connect the chip with the channel. The interconnection can also cross above the whole chip and bond to the pads on the carrier, as shown in Fig. 2.18.


Figure 2.18. High bandwidth interconnection using off-the-top. (Adopted from [10]).

The advantages of this technology are that it eliminates the discontinuity in the PCB, offers a relatively short path between chips, allows different material systems to work in the same system, and can be integrated relatively easily into available packaging techniques. The disadvantages are that it cannot eliminate the bandwidth limitation imposed by the bond wires inside the package, the distance between the chips is still too long, and the inductance effect is not alleviated.

There are about 30 IC and packaging companies gearing up to produce SiP-based multichip modules [11]; besides the techniques shown above, these efforts include Freescale’s “redistributed chip packaging” [12] and Intel’s “bumpless build up layer” [13, 14]. The evolution of chip packaging techniques for high-speed applications has been remarkable in the last decade [15]. Quilt-packaging, as a novel approach, is unique in its ease of integration and scalability and in providing the shortest possible path between ICs.


CHAPTER 3

QUILT-PACKAGING: INTRODUCTION, THEORY AND SIMULATION

Our proposed new packaging technology, Quilt Packaging (QP), is presented in this chapter. Essential issues related to signal integrity of interconnection and packaging are addressed. Microwave simulations, including transmission performance and crosstalk, of QP are also given.

3.1 Introduction to Quilt-Packaging

Quilt-packaging (QP) is a novel SiP technology for ultra-high-speed data transmission between chips [1][2][3][4][5]. The conceptual diagram is repeated in Fig. 3.1.

Figure 3.1. Conceptual diagram of Quilt-Packaging.

The idea is to form contacts along the edges of IC chips during their fabrication. The chips are then interconnected by butting them against each other. The advantages of the quilt-packaging scheme can be summed up as follows:

• High signal bandwidth (projected > 200 GHz) with excellent and controllable signal integrity;

• Circuit simplification and area reduction (up to 50%) through elimination of ESD protection circuits and input/output (I/O) buffers;

• Reduced power dissipation (up to 50%) through elimination of pin drivers and package capacitances;

• System cost reduction by allowing increased functionality within a single package;

• Heterogeneous integration of ICs and components fabricated using different processes (e.g. bipolar and CMOS) and material systems (e.g. Si and III-Vs);

• Functional partitioning, allowing complex designs to be broken into functional sub-block ICs and integrated into a full system in a single package;

• Increased usable wafer area and die silicon efficiency by elimination of saw streets and bond pads;

• Cost reduction by eliminating packages;

• Decreased weight and volume of chip and board structures;

• Decreased use of passives on the PC board;

• Compatibility with existing bonding strategies.

A basic schematic of two chips communicating through QP is shown in Fig. 3.2 (a). The notch at the forefront is schematic only and reveals the depth of the metal nodules into the substrate. Figure 3.2 (b) shows a three-chip system interconnected by QP. The bonding pads shown in the figure (or, alternatively, solder bumps) suggest that QP can be used in combination with existing packaging techniques. Compared with the techniques currently pursued by industry, QP provides the shortest path (< 50 µm) between the chips and leads to minimum delay, wider bandwidth and lower electrical noise [6]. Additionally, as will be shown in the fabrication process for QP in Chapter 4, nodules are defined by standard photolithography and can achieve very high I/O density.

Figure 3.2. Cartoon representation of: (a) a simple two-chip QP connection and (b) a three-chip QP system.

Examples of the versatility of QP are schematically shown in Fig. 3.3. Figure 3.3 (a) shows an optical communication system. Figure 3.3 (b) is a wireless communication system. Figure 3.3 (c) is an example of a high-speed digital processing unit. Ultra-high-speed communication is achieved between chips in different material systems or fabrication processes in a single package, without requiring complex package wiring substrates and costly multi-cavity pin grid arrays [7]. The possibility of directly interconnecting III-V devices with Si and SiGe VLSI circuits also opens up possibilities for new system concepts, computer architectures and applications.

Figure 3.3. Quilt-packaging applications in (a) optical communications, (b) RF communications, and (c) high speed digital processors.

To analyze the electrical performance of QP, we will begin with signal integrity issues in both on-chip and package interconnections, and the electrical modeling methodologies.

3.2 Signal Integrity Characteristics

Ideally, signal transmission through an interconnection is instantaneous and without distortion. This is almost true at low frequency. However, as the frequency increases, the physical structure of the interconnection becomes increasingly complicated, and this ideal picture no longer holds. Several effects, such as delay, reflection, crosstalk, and ground bounce, regarded as signal integrity issues [8][9], play an increasingly important role in characterizing interconnections.

3.2.1 Delay

Delay, caused by the non-negligible electrical length of interconnection both on-chip and at the package, can be expressed by eq. (3.1) [8]

\[ \tau = \frac{l}{v_p}, \tag{3.1} \]

where $l$ is the physical length and $v_p$ is the phase velocity. $v_p$ is defined by eq. (3.2)

\[ v_p = \frac{\omega}{\beta}, \tag{3.2} \]

where $\omega = 2\pi f$, and $\beta$ is the propagation constant. If $\tau$ is not constant over the signal bandwidth, the group velocity, $v_g = \left(\frac{d\beta}{d\omega}\right)^{-1}$, should be used instead of the phase velocity. If the group velocity is not constant over the frequency band of the signal, the pulse shape of the digital signal will be distorted after propagation, which may lead to inter-symbol interference.
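To give a feel for the magnitudes in eq. (3.1), the following Python sketch evaluates the delay for two path lengths. It assumes a low-loss, non-dispersive line for which the phase velocity can be approximated as the speed of light divided by the square root of an effective permittivity; that approximation, the value 4.0 for the effective permittivity, and the 5 cm board trace are illustrative assumptions, not values from this work. The 50 µm length is the upper bound quoted for a QP path in Section 3.1.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def phase_velocity_m_per_s(eps_eff: float) -> float:
    """Phase velocity, assuming v_p = c / sqrt(eps_eff) (low-loss, non-dispersive line)."""
    return SPEED_OF_LIGHT_M_PER_S / math.sqrt(eps_eff)


def propagation_delay_s(length_m: float, eps_eff: float) -> float:
    """Delay tau = l / v_p, as in eq. (3.1)."""
    return length_m / phase_velocity_m_per_s(eps_eff)


if __name__ == "__main__":
    eps_eff = 4.0  # hypothetical effective permittivity
    for label, length_m in (("50 um chip-to-chip path", 50e-6),
                            ("5 cm board trace (hypothetical)", 5e-2)):
        tau = propagation_delay_s(length_m, eps_eff)
        print(f"{label}: {tau * 1e12:.2f} ps")
```

Because the delay is linear in length, the thousandfold difference in length translates directly into a thousandfold difference in delay under these assumptions.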

3.2.2 Reflection

Reflection is the wave returning to the generator from the end of an interconnection, caused by a mismatch between the generator impedance and the input impedance of the interconnection or its load. In scattering matrix notation, $S_{ii}$ characterizes the reflection coefficient at port $i$ with respect to the reference impedance.

In the frequency domain, reflections may cause resonances at packages or on the on-chip interconnections, which can result in radiation and corruption of the signals on the lines. In the time domain, reflection can degrade the noise margin by corrupting the signal levels.
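As an illustration of how an impedance mismatch translates into a reflection, the following Python sketch uses the standard transmission-line relation $\Gamma = (Z_L - Z_0)/(Z_L + Z_0)$ for a load $Z_L$ seen from a reference impedance $Z_0$, together with the return loss $-20\log_{10}|\Gamma|$. These relations are textbook results rather than formulas given in this chapter, and the load values and the 50 Ω reference are hypothetical.

```python
import math


def reflection_coefficient(z_load: complex, z_ref: complex = 50.0) -> complex:
    """Voltage reflection coefficient of a load seen from a reference impedance."""
    return (z_load - z_ref) / (z_load + z_ref)


def return_loss_db(gamma: complex) -> float:
    """Return loss in dB; infinite for a perfectly matched load."""
    magnitude = abs(gamma)
    if magnitude == 0.0:
        return math.inf
    return -20.0 * math.log10(magnitude)


if __name__ == "__main__":
    # Hypothetical load impedances in ohms.
    for z_load in (50.0, 75.0, 25.0 + 10.0j):
        gamma = reflection_coefficient(z_load)
        print(f"Z_load = {z_load}: |Gamma| = {abs(gamma):.3f}, "
              f"return loss = {return_loss_db(gamma):.1f} dB")
```

For a one-port measurement, the magnitude of $S_{11}$ in dB is the negative of the return loss computed here.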

3.2.3 Crosstalk

Crosstalk is the interference between adjacent interconnections due to capacitive and inductive coupling. Crosstalk can cause spurious (false) signals in the coupled interconnections. A second effect is a change in transition delay due to the noise that crosstalk introduces. F. Moll and M. Roca [8] demonstrated that when the transition in the affecting line is in the same direction as the transition in the affected line, the affecting signal decreases the transition delay, whereas when the two transitions are in opposite directions, the affecting signal increases the transition delay. Crosstalk can also increase energy consumption in two ways: first, the spurious signal causes extra dissipation in the driver resistance of the coupled lines; second, the spurious signal continues to propagate to the following nodes.

3.2.4 Ground Bounce

Ground bounce, also called delta-I noise, is generated by voltage fluctuations on parasitic inductors of power supply and ground connections when drivers switch from one state to the other [10][11].

The definition of ground bounce can be expressed by the simple example shown in Fig. 3.4.

Figure 3.4. Ground bounce setup and voltages. (Adopted from [10].)

The above is a simplified model of driver and receiver that neglects capacitive and inductive coupling and transmission line effects on the interconnections. $L_{power}$ and $L_{ground}$ are the inductances of the power and ground connections, respectively. $L_{line}$ is the inductance of the interconnection to the receiver load. The voltage fluctuation of the ground can be expressed in eq. (3.3) as

\[ V_g = L_{ground} \frac{dI}{dt}. \tag{3.3} \]

The current $I$ is generated during switching, which causes the discharge of the load capacitor, $C_L$, and is formulated in eq. (3.4) as

\[ I = C_L \frac{dV_C}{dt}, \tag{3.4} \]

where $V_C$ is the voltage of the capacitor. The power supply inductance generates the same type of noise.
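A first-order estimate based on eqs. (3.3) and (3.4) can be sketched in Python by replacing the derivatives with finite differences. All component values below (1 pF load, 1.8 V swing, 100 ps edge, 1 nH ground inductance, 50 ps current ramp) are hypothetical and chosen only to show the order of magnitude; the linear-ramp approximation is likewise an assumption.

```python
def switching_current_a(c_load_f: float, delta_v: float, delta_t_s: float) -> float:
    """Average discharge current, I ~ C_L * dV_C/dt, with a linear voltage ramp assumed."""
    return c_load_f * delta_v / delta_t_s


def ground_bounce_v(l_ground_h: float, delta_i_a: float, delta_t_s: float) -> float:
    """Ground voltage fluctuation, V_g ~ L_ground * dI/dt, with a linear current ramp assumed."""
    return l_ground_h * delta_i_a / delta_t_s


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    current = switching_current_a(c_load_f=1e-12, delta_v=1.8, delta_t_s=100e-12)
    bounce = ground_bounce_v(l_ground_h=1e-9, delta_i_a=current, delta_t_s=50e-12)
    print(f"switching current: {current * 1e3:.1f} mA")
    print(f"ground bounce:     {bounce:.2f} V")
```

Even with a single driver, these illustrative numbers give a ground fluctuation that is a sizable fraction of the supply, which is why short, low-inductance ground paths matter.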

3.3 Electrical Modeling of Interconnects

At low frequencies, the interconnections, both on-chip and in the package, do not affect the performance of the functional blocks and can be modeled simply as zero-resistance wires. But as the frequency and the density of on-chip interconnections and I/O ports increase, modeling of the interconnections becomes more and more complicated. Self resistance, capacitance, and inductance, along with coupling capacitance and coupling inductance, all need to be considered.

3.3.1 On-Chip Interconnection

The width of on-chip interconnections scales along with the gate length, as shown in Fig. 3.5 [12]. A typical chip cross-section with multilayer interconnections is shown in Fig. 3.6 [13], and a 90 nm CMOS technology interconnection and its key design rules are shown in Fig. 3.7 [14]. Lower layers are devoted to local routing, interconnecting nearby elements, while the upper layers are devoted to global routing, transmitting signals across the whole chip and distributing voltage references. Therefore, a reverse scaling scheme is applied to the interconnections: the lower layers are scaled down with the rest of the technology, to increase interconnection density, while the upper layers are scaled up to decrease the resistance of the global routing.

Figure 3.5. The trends of smallest transistor gate length and minimum width of interconnects. (Adopted from [12].)

Figure 3.6. A typical chip cross-section. (Adopted from [13])


Figure 3.7. Cross-section of SEM picture of 90 nm CMOS interconnects and their key design rules. (Adopted from [14])

F. Moll and M. Roca [8] showed that, generally, resistance does not matter much for the local wiring, because the total resistance is dominated by the device resistance. But for global wiring, RC effects are non-negligible, because of the combination of the decreased driver resistance needed to drive large loads and the large wiring resistance of long lines. The total capacitance can be dominated by local wires. As we can see in Fig. 3.5, the scaling of gate length always leads that of wire width, so the capacitance of local wires decreases more slowly than that of devices. And, as interconnects become denser, the coupling capacitance can increase the total capacitance to several times its nominal value. At extremely high frequency, inductance plays an important role, especially for long un-buffered wires and for the power/ground lines that carry return current.

3.3.2 Modeling of On-Chip Interconnects at Intermediate Frequency

Since the effects of inductance will appear only at very high frequency, to simplify the modeling, we first neglect the inductance of the interconnects. As a first approach, the interconnect can be modeled by a single total resistor, R, and a single global capacitor, C. This model, called lumped RC model, is pessimistic and inaccurate for long wires. For the long wires, a distributed RC model is more appropriate. Both of the models are shown in Fig. 3.8.

Figure 3.8. (a) Lumped model and (b) distributed model. (Adopted from [15])

If the length of the long wire is $L$, we use a distributed model to separate the whole length into $N$ equal parts. So, if $R = rL$ and $C = cL$,

\[ R_1 = R_2 = \cdots = R_{i-1} = R_i = \cdots = R_N = r\frac{L}{N} \quad \text{and} \quad C_1 = C_2 = \cdots = C_{i-1} = C_i = \cdots = C_N = c\frac{L}{N}. \]

The voltage at node $i$ can be solved using the following differential equation [15]:

\[ c\frac{L}{N}\frac{\partial V_i}{\partial t} = \frac{(V_{i+1} - V_i) + (V_{i-1} - V_i)}{r\frac{L}{N}}. \tag{3.5} \]

When $N \to \infty$, the equation becomes the well-known diffusion equation

\[ rc\frac{\partial V}{\partial t} = \frac{\partial^2 V}{\partial x^2}, \tag{3.6} \]

where $V$ is the voltage at one particular point in the wire, and $x$ is the distance between the point and the input source.
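The node equation (3.5) can be integrated numerically to see the diffusive behavior. The Python sketch below applies a unit step at the input of an N-section RC ladder and finds the time at which the far end reaches half of the step. It is only an illustration: the total resistance and capacitance are normalized to 1 so that time is measured in units of RC, the far end is assumed to be open, and a simple forward-Euler time step is used. The section count and step size are arbitrary choices.

```python
def time_to_half_swing(n_sections: int = 50, dt_fraction: float = 0.25) -> float:
    """Time (in units of total R*C) for the far end of an RC ladder to reach 50%.

    Each section has resistance 1/N in series and capacitance 1/N to ground.
    dt_fraction is the time step as a fraction of the per-section RC product;
    it must stay below 0.5 for the forward-Euler update to be stable.
    """
    r_seg = 1.0 / n_sections
    c_seg = 1.0 / n_sections
    dt = dt_fraction * r_seg * c_seg
    v_in = 1.0                      # unit step applied at t = 0
    v = [0.0] * n_sections          # node voltages; v[0] is nearest the source
    t = 0.0
    while v[-1] < 0.5:
        new_v = v[:]
        for i in range(n_sections):
            left = v_in if i == 0 else v[i - 1]
            current_in = (left - v[i]) / r_seg
            # The last node drives an open circuit, so no current leaves it.
            current_out = (v[i] - v[i + 1]) / r_seg if i < n_sections - 1 else 0.0
            new_v[i] = v[i] + dt * (current_in - current_out) / c_seg
        v = new_v
        t += dt
    return t


if __name__ == "__main__":
    print(f"50% delay of the ladder: {time_to_half_swing():.3f} RC")
```

With enough sections, the result approaches the commonly quoted value of about 0.38 RC for the 50% delay of a distributed RC line.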

To calculate the delay of the cascaded N-stage RC chain, [16] gives the following expression, which is called the Elmore delay:

\[ \tau_N = \sum_{i=1}^{N} R_i \sum_{j=i}^{N} C_j = \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j. \tag{3.7} \]

So, at node 1, the delay is $\tau_1 = C_1 R_1$. At node 2, the delay is $\tau_2 = C_1 R_1 + C_2 (R_1 + R_2)$; and at node $i$, the delay is

\[ \tau_i = C_1 R_1 + C_2 (R_1 + R_2) + \cdots + C_i (R_1 + R_2 + \cdots + R_i). \tag{3.8} \]

The delay at node $N$, when the wire is partitioned equally, is

\[ \tau_N = \left(\frac{L}{N}\right)^2 \left(cr + c(2r) + \cdots + c(Nr)\right) = L^2 cr \frac{N+1}{2N}. \tag{3.9} \]

When $N \to \infty$, the delay becomes

\[ \tau_N = \frac{L^2 cr}{2}. \tag{3.10} \]

But, for the lumped model, the delay is

\[ \tau = L^2 cr, \tag{3.11} \]

which is twice as much as the delay from the distributed model. This shows that the lumped model is not appropriate for modeling long wires.
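The factor of two between the lumped and distributed models can be checked numerically. The Python sketch below evaluates the Elmore sum of eq. (3.7) for a uniformly partitioned wire and compares it with the closed form of eq. (3.9). The wire parameters (0.1 Ω/µm, 0.2 fF/µm, 1 cm length) are hypothetical values chosen for illustration.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC chain: each C_i times the total resistance upstream of it."""
    total = 0.0
    upstream_r = 0.0
    for r_i, c_i in zip(resistances, capacitances):
        upstream_r += r_i
        total += c_i * upstream_r
    return total


def uniform_wire_delay(r_per_m, c_per_m, length_m, n_sections):
    """Elmore delay of a wire split into n equal sections (n = 1 is the lumped model)."""
    seg_r = r_per_m * length_m / n_sections
    seg_c = c_per_m * length_m / n_sections
    return elmore_delay([seg_r] * n_sections, [seg_c] * n_sections)


def closed_form_delay(r_per_m, c_per_m, length_m, n_sections):
    """Closed form of eq. (3.9): L^2 * c * r * (N + 1) / (2 N)."""
    return length_m ** 2 * c_per_m * r_per_m * (n_sections + 1) / (2 * n_sections)


if __name__ == "__main__":
    # Hypothetical wire: 0.1 ohm/um, 0.2 fF/um, 1 cm long.
    r, c, length = 1e5, 2e-10, 1e-2
    for n in (1, 2, 10, 1000):
        tau = uniform_wire_delay(r, c, length, n)
        print(f"N = {n:5d}: Elmore delay = {tau * 1e9:.4f} ns")
```

As N grows, the computed delay falls from the lumped value $L^2 cr$ toward $L^2 cr / 2$.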

To decrease the RC delay in long wires, we can insert inverters, which are also called repeaters, as the intermediate buffers in the interconnects [15]. The idea is shown in Fig. 3.9.

Figure 3.9. Repeater insertion to reduce RC delay.

When a wire is sufficiently long, making each interconnect line $N$ times shorter reduces its propagation delay quadratically, which is enough to offset the delay that the repeaters introduce. The propagation delay $t_p$ can be expressed as below [15]:

\[ t_p = 0.38\, rc \left(\frac{L}{N}\right)^2 N + (N - 1)\, t_{pbuf}, \tag{3.12} \]

where $t_{pbuf}$ is the propagation delay of the buffer, which is determined not only by the size of the buffer, but also by the load capacitances. The optimum number of buffers inserted can be calculated by setting $\frac{\partial t_p}{\partial N} = 0$. The optimum $N$ is

\[ N = L \sqrt{\frac{0.38\, rc}{t_{pbuf}}}. \tag{3.13} \]
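The trade-off in eqs. (3.12) and (3.13) can be explored with a short Python sketch. Since the number of sections must be an integer, the sketch also scans integer values near the analytical optimum. The wire parameters and the 50 ps buffer delay are hypothetical, and the buffer delay is treated as a constant even though, as noted above, it depends on buffer size and load.

```python
import math


def repeated_wire_delay(n, r_per_m, c_per_m, length_m, t_pbuf_s):
    """Delay of a wire split into n sections with (n - 1) repeaters, eq. (3.12)."""
    return 0.38 * r_per_m * c_per_m * (length_m / n) ** 2 * n + (n - 1) * t_pbuf_s


def optimum_sections(r_per_m, c_per_m, length_m, t_pbuf_s):
    """Analytical (non-integer) optimum number of sections, eq. (3.13)."""
    return length_m * math.sqrt(0.38 * r_per_m * c_per_m / t_pbuf_s)


if __name__ == "__main__":
    # Hypothetical wire (0.1 ohm/um, 0.2 fF/um, 1 cm) and buffer delay (50 ps).
    r, c, length, t_pbuf = 1e5, 2e-10, 1e-2, 50e-12
    n_opt = optimum_sections(r, c, length, t_pbuf)
    best_n = min(range(1, 21), key=lambda n: repeated_wire_delay(n, r, c, length, t_pbuf))
    print(f"analytical optimum N: {n_opt:.2f}, best integer N: {best_n}")
    for n in (1, best_n):
        delay = repeated_wire_delay(n, r, c, length, t_pbuf)
        print(f"N = {n}: delay = {delay * 1e12:.0f} ps")
```

For these illustrative values, splitting the wire into four sections reduces the delay from about 760 ps to about 340 ps.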

To incorporate the buffer effects into the delay, [17] and [18] show that for an un-buffered interconnect, the delay can be approximately expressed as

\[ t_{0.5} \approx 0.377 R_{INT} C_{INT} + 0.693 (R_T C_T + R_T C_J + R_T C_{INT} + R_{INT} C_T), \tag{3.14} \]

where $C_{INT}$ is the capacitance of the interconnect, $R_{INT}$ is the resistance of the interconnect, $C_T$ is the gate capacitance of the load, $R_T$ is the drain effective resistance of the driving transistor and $C_J$ is the drain junction capacitance of the driving transistor.

When the interconnect is divided into $N$ sections and a total of $(N - 1)$ buffers are inserted, the total delay can be expressed as [19]:

\[ t_p \approx N \left[ p_1 \frac{R_{INT}}{N} \frac{C_{INT}}{N} + p_2 \left( \frac{R_0}{h} (h C_{J0} + h C_0) + \frac{R_0}{h} \frac{C_{INT}}{N} + \frac{R_{INT}}{N} h C_0 \right) \right], \tag{3.15} \]

where $h$ is the gate size of the inserted buffer, $C_0$ is the gate capacitance of the minimum width transistor, $C_{J0}$ is the drain junction capacitance of the minimum width transistor and $R_0$ denotes the gate effective resistance of the minimum width transistor. The optimum gate size of the buffer and the optimum number of buffers can be derived by differentiating with respect to $h$ and $N$, and setting the derivatives to zero:

\[ \frac{\partial t_p}{\partial h} = 0 \;\rightarrow\; h_{OPT} = \sqrt{\frac{C_{INT} R_0}{R_{INT} C_0}}. \tag{3.16} \]

\[ \frac{\partial t_p}{\partial N} = 0 \;\rightarrow\; N_{OPT} = \sqrt{\frac{p_1 R_{INT} C_{INT}}{p_2 R_0 (C_0 + C_{J0})}}. \tag{3.17} \]

Substituting $h_{OPT}$ and $N_{OPT}$ into eq. (3.15), the optimum delay is expressed as

\[ t_{pOPT} = 2 \left( \sqrt{p_1 p_2} + p_2 \sqrt{\frac{C_0}{C_0 + C_{J0}}} \right) \sqrt{\tau_{INT} \tau_{MOS}}, \tag{3.18} \]

where $\tau_{INT} (= R_{INT} C_{INT})$ is the time constant of the interconnect, and $\tau_{MOS} (= R_0 (C_0 + C_{J0}))$ is the time constant of the buffer, which corresponds to the inverter delay with a fanout of 1. $p_1$ is 0.377 and $p_2$ is 0.693 for the case of the delay from zero to half of the source voltage. So, $t_{pOPT}$ can be expressed as

\[ t_{pOPT} = 2.4 \sqrt{\tau_{INT} \tau_{MOS}} \quad (\text{when } C_{J0} = 0), \tag{3.19} \]

\[ t_{pOPT} = 2.0 \sqrt{\tau_{INT} \tau_{MOS}} \quad (\text{when } C_{J0} = C_0). \tag{3.20} \]

When the interconnect is optimally buffered, the overall capacitance increases due to the insertion of buffers. The total capacitance of the buffers can be expressed as below:

\[ C_{buf} = N_{OPT}\, h_{OPT}\, (C_0 + C_{J0}) = \sqrt{\frac{p_1}{p_2} \cdot \frac{C_0 + C_{J0}}{C_0}}\; C_{INT} \]

\[ = 0.73\, C_{INT} \quad (\text{when } C_{J0} = 0), \tag{3.21} \]

\[ = 1.04\, C_{INT} \quad (\text{when } C_{J0} = C_0). \tag{3.22} \]

The total capacitance is increased by 73~104% compared with the system without buffers. The increased capacitance will increase the power consumption of the system.
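The numerical coefficients in eqs. (3.19) to (3.22) can be reproduced with the Python sketch below. It encodes one reading of eqs. (3.15) to (3.18), obtained by substituting the per-section quantities into eq. (3.14), and checks that the delay evaluated at $(h_{OPT}, N_{OPT})$ matches the closed form. The interconnect and transistor values are hypothetical and serve only to exercise the formulas.

```python
import math

P1, P2 = 0.377, 0.693  # coefficients for the 0-to-50% delay, as in eq. (3.14)


def buffered_delay(n, h, r_int, c_int, r0, c0, cj0):
    """Total delay of an interconnect split into n sections with buffers of size h."""
    per_section = (P1 * (r_int / n) * (c_int / n)
                   + P2 * ((r0 / h) * (h * cj0 + h * c0)
                           + (r0 / h) * (c_int / n)
                           + (r_int / n) * h * c0))
    return n * per_section


def optimum_buffering(r_int, c_int, r0, c0, cj0):
    """Optimum buffer size and section count (treated as continuous variables)."""
    h_opt = math.sqrt(c_int * r0 / (r_int * c0))
    n_opt = math.sqrt(P1 * r_int * c_int / (P2 * r0 * (c0 + cj0)))
    return h_opt, n_opt


def delay_coefficient(c0, cj0):
    """Multiplier of sqrt(tau_INT * tau_MOS) in the optimum delay."""
    return 2.0 * (math.sqrt(P1 * P2) + P2 * math.sqrt(c0 / (c0 + cj0)))


def buffer_cap_ratio(c0, cj0):
    """Total buffer capacitance divided by the interconnect capacitance."""
    return math.sqrt(P1 / P2 * (c0 + cj0) / c0)


if __name__ == "__main__":
    # Hypothetical values: 1 kohm / 2 pF wire, 10 kohm / 1 fF minimum transistor.
    r_int, c_int, r0, c0, cj0 = 1e3, 2e-12, 1e4, 1e-15, 1e-15
    h_opt, n_opt = optimum_buffering(r_int, c_int, r0, c0, cj0)
    t_opt = buffered_delay(n_opt, h_opt, r_int, c_int, r0, c0, cj0)
    print(f"h_opt = {h_opt:.1f}, N_opt = {n_opt:.2f}, t_opt = {t_opt * 1e12:.1f} ps")
    for label, junction_cap in (("C_J0 = 0", 0.0), ("C_J0 = C_0", 1.0)):
        print(f"{label}: delay coefficient = {delay_coefficient(1.0, junction_cap):.2f}, "
              f"C_buf / C_INT = {buffer_cap_ratio(1.0, junction_cap):.2f}")
```

The computed capacitance ratio for $C_{J0} = 0$ comes out slightly above 0.73 (about 0.74 when rounded), so the quoted 0.73 is presumably a truncated value; the other coefficients round to the quoted 2.4, 2.0 and 1.04.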

Inserting uniform inverters has been a popular method for decreasing RC delay. But the initial cascade of buffers leading to the first inverter (see Fig. 3.10) may introduce a significant fraction of the delay [20]. Also, this localized stack of buffers can lead to unwanted power spikes.

Figure 3.10. Schematic of uniformly repeated line with initial cascade stage. (Adopted from [20].)

To solve the problem, S. Srinivansaraghavan and W. Burleson [20] presented a new idea by resizing the logic gates close to long interconnects and the repeaters, which is shown in Fig. 3.11.

Figure 3.11. Eliminating buffer stage in (a) by resizing logic stages and repeaters. (Adopted from [20].)

Figure 3.12 shows the delay with and without the buffered stage. Also, instead of resizing all stages, the sized logic stages can be cascaded while the repeaters remain uniform. Figure 3.13 shows the delay and power consumption compared with the buffered stage. The results are for 0.18 µm technology and a 1.8 V power supply.

Figure 3.12. Delay comparison. (Adopted from [20].)

Figure 3.13. Delay and power comparison. (Adopted from [20].)

To account for the effect of interconnects in system design, we can model them as a lumped RC network that approximates the distributed RC line. Circuit simulation tools, such as HSPICE, can then be used to simulate circuit performance for both devices and interconnects. Several models have been built for interconnects, as shown in Fig. 3.14. The accuracy is determined by the number of stages. J.M. Rabaey [15] states that the error of the π3 model, defined in Fig. 3.14, is less than 3%, which is generally acceptable.

Figure 3.14. Simulation models for distributed RC lines: (a) π model, (b) t model, (c) π2 model, (d) t2 model, (e) π3 model, and (f) t3 model.
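To illustrate how the number of stages controls the accuracy of a lumped model, the sketch below computes the Elmore delay of the simplest possible ladder, in which each section is a series resistor followed by a shunt capacitor. This is an assumption made for illustration only: it is not one of the π or t models of Fig. 3.14, which converge faster, and it does not reproduce the 3% figure quoted above. The function names are hypothetical.

```python
def elmore_delay_ladder(r_total, c_total, n_sections):
    """Elmore delay at the far end of an n-section R-then-C ladder."""
    r_seg = r_total / n_sections
    c_seg = c_total / n_sections
    # Each capacitor is charged through all the resistance upstream of it.
    return sum((i * r_seg) * c_seg for i in range(1, n_sections + 1))


def relative_error(n_sections):
    """Error relative to the distributed-line Elmore delay RC/2 (R = C = 1)."""
    exact = 0.5
    return (elmore_delay_ladder(1.0, 1.0, n_sections) - exact) / exact


for n in (1, 3, 10, 100):
    print(n, relative_error(n))
```

For this simple ladder the Elmore delay is RC(N+1)/(2N), so the error relative to the distributed line falls as 1/N as sections are added.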

3.3.3 Transmission Line Effects

When interconnects become sufficiently long, or the frequency is very high, inductance effects begin to dominate the delay behavior, and transmission line effects become too severe to neglect. The distributed RLC model, also known as the transmission line model, then describes the real behavior more accurately than the distributed RC model. The main difference between the two is that in the transmission line model the signal propagates over the interconnection medium as a wave, while in the distributed RC model the signal diffuses from the source to the load.

A transmission line can be schematically represented as a two-wire line. The distributed transmission line model is represented as an RLGC network [21]. Both are shown in Fig. 3.15. The series resistance per unit length, R , represents the resistance of the conductors; the series inductance per unit length, L , is due to the self-inductance of the two conductors; the shunt conductance per unit length, G , is due to the dielectric loss or the finite resistivity in the substrate; and the shunt capacitance per unit length, C, represents the close proximity of the two conductors.

Figure 3.15. (a) Schematic of transmission line and (b) distributed RLGC model.

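The telegrapher equations, which describe wave propagation along the transmission line in the time domain, are expressed as

$$\frac{\partial v(z,t)}{\partial z} = -R\, i(z,t) - L\,\frac{\partial i(z,t)}{\partial t} , \tag{3.23}$$

and

$$\frac{\partial i(z,t)}{\partial z} = -G\, v(z,t) - C\,\frac{\partial v(z,t)}{\partial t} . \tag{3.24}$$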

For the sinusoidal steady state, eqs. (3.23) and (3.24) can be simplified as

$$\frac{dV(z)}{dz} = -(R + j\omega L)\, I(z) , \tag{3.25}$$

and

$$\frac{dI(z)}{dz} = -(G + j\omega C)\, V(z) . \tag{3.26}$$

Solving eqs. (3.25) and (3.26), the traveling wave solutions are

$$V(z) = V_0^+ e^{-\gamma z} + V_0^- e^{\gamma z} , \tag{3.27}$$

and

$$I(z) = I_0^+ e^{-\gamma z} + I_0^- e^{\gamma z} , \tag{3.28}$$

where

$$\gamma = \alpha + j\beta = \sqrt{(R + j\omega L)(G + j\omega C)} \tag{3.29}$$

is the complex, frequency-dependent propagation constant. The term $e^{-\gamma z}$ represents a wave propagating in the +z direction, while $e^{\gamma z}$ represents a wave propagating in the −z direction.

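Applying eq. (3.25) to eq. (3.27), we get

$$I(z) = \frac{\gamma}{R + j\omega L}\left(V_0^+ e^{-\gamma z} - V_0^- e^{\gamma z}\right) . \tag{3.30}$$

The characteristic impedance, $Z_0$, can be defined as

$$Z_0 = \sqrt{\frac{R + j\omega L}{G + j\omega C}} . \tag{3.31}$$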

In many cases, the loss is very small and can be neglected. The lossless transmission line model can be simplified as a distributed LC network, as shown in Fig. 3.16.

Figure 3.16. Distributed LC model for lossless transmission line.

Then, the propagation constant and characteristic impedance reduce to

$$\gamma = j\beta = j\omega\sqrt{LC} , \tag{3.32}$$

and

$$Z_0 = \sqrt{\frac{L}{C}} . \tag{3.33}$$

When a step input signal is applied to the transmission line, it propagates with a speed, $v$, given as

$$v = \frac{\omega}{\beta} = \frac{1}{\sqrt{LC}} = \frac{1}{\sqrt{\varepsilon\mu}} = \frac{c_0}{\sqrt{\varepsilon_r \mu_r}} , \tag{3.34}$$

where $\varepsilon$ is the dielectric constant of the surrounding medium, $\mu$ is the magnetic permeability of the medium, $\varepsilon_r$ and $\mu_r$ are the values of the dielectric constant and permeability of the medium relative to vacuum, and $c_0$ is the speed of light in vacuum.
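As a sketch of how eqs. (3.29) and (3.31) reduce to the lossless forms of eqs. (3.32) to (3.34), the Python fragment below evaluates the propagation constant and characteristic impedance from per-unit-length R, L, G and C. The numerical values of L and C are hypothetical and were chosen only so that the lossless impedance comes out at a round number; the function name is ours.

```python
import cmath
import math


def line_parameters(r, l, g, c, freq_hz):
    """Return (gamma, z0) from eqs. (3.29) and (3.31) for per-unit-length R, L, G, C."""
    omega = 2.0 * math.pi * freq_hz
    series = complex(r, omega * l)   # R + j*omega*L
    shunt = complex(g, omega * c)    # G + j*omega*C
    gamma = cmath.sqrt(series * shunt)
    z0 = cmath.sqrt(series / shunt)
    return gamma, z0


# Hypothetical per-unit-length values, for illustration only.
L_PER_M = 4.0e-7    # H/m
C_PER_M = 1.6e-10   # F/m

# Lossless case: R = G = 0, so gamma is purely imaginary and Z0 = sqrt(L/C).
gamma, z0 = line_parameters(0.0, L_PER_M, 0.0, C_PER_M, 10e9)
velocity = 1.0 / math.sqrt(L_PER_M * C_PER_M)   # eq. (3.34)
print(gamma, z0, velocity)
```

Setting R or G to non-zero values in the same function gives a complex $Z_0$ and a non-zero attenuation constant, which is the lossy case discussed next.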

However, to analyze today’s dense interconnects with very high switching speeds, the loss of the dielectric and the conductor can no longer be neglected. Dielectric loss, which is due to nonzero conductivity of the medium, is always present in any practical nonmagnetic dielectric. For lossy dielectrics, the complex dielectric constant is defined as [22]:

$$\hat{\varepsilon} = \varepsilon' - j\varepsilon'' , \tag{3.35}$$

where $\varepsilon' = \varepsilon = \varepsilon_r \varepsilon_0$ and $\varepsilon'' = \varepsilon \tan\delta$. $\tan\delta$ is known as the loss tangent, and is defined as

$$\tan\delta = \frac{\omega\varepsilon'' + \sigma}{\omega\varepsilon'} , \tag{3.36}$$

where $\sigma$ is the conductivity of the dielectric. The loss tangent is a frequency-dependent parameter.

In many cases, the loss or attenuation needs to be considered even in good conductors, where the conduction current is much larger than the displacement current. In this case, $\varepsilon''$ is much larger than $\varepsilon'$, and the propagation constant becomes

$$\gamma = \alpha + j\beta = (1 + j)\sqrt{\frac{\omega\mu\sigma}{2}} . \tag{3.37}$$

At higher frequency, the thickness of the conductor region that carries current tends to shrink, which is known as the skin effect. We define the skin depth as the distance at which the magnitude of the fields of a wave traveling in the medium is reduced to 1/e of its value at the medium surface. The skin depth, $\delta_s$, is expressed as

$$\delta_s = \frac{1}{\alpha} = \sqrt{\frac{2}{\omega\mu\sigma}} . \tag{3.38}$$

For Cu, which is used as the conductor in our quilt-packaging structure, the skin depth is 2.09 µm at 1 GHz, 0.93 µm at 5 GHz, and decreases to 0.28 µm at 50 GHz. The exponential falloff of current due to the skin effect results in an AC resistance higher than the DC resistance. At microwave frequencies, making the conductor thicker than several skin depths is sufficient, since virtually no current flows deeper inside the conductor. To calculate the current in a plane conductor, we integrate the current from the surface to the skin depth. To simplify the calculation, when the conductor is much thicker than the skin depth, we can assume that the total current is a uniform current carried in a conductor with a thickness of one skin depth. S. Ramo [23] and W.C. Johnson [24] gave equations for the current density and skin depth of round wires using Bessel functions. Also, because the current is confined to a shallow depth below the surface, any surface roughness of the conductor will increase the effective resistance.
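Equation (3.38) is simple to evaluate numerically. The sketch below assumes a bulk copper conductivity of 5.8 × 10^7 S/m and the permeability of free space; neither value is stated in the text, so both are assumptions, and the function name is ours.

```python
import math

MU_0 = 4.0e-7 * math.pi   # permeability of free space, H/m
SIGMA_CU = 5.8e7          # assumed bulk conductivity of copper, S/m


def skin_depth(freq_hz, sigma=SIGMA_CU, mu=MU_0):
    """Skin depth in metres from eq. (3.38)."""
    omega = 2.0 * math.pi * freq_hz
    return math.sqrt(2.0 / (omega * mu * sigma))


for f_ghz in (1, 5, 50):
    print(f"{f_ghz} GHz: {skin_depth(f_ghz * 1e9) * 1e6:.2f} um")
```

With this assumed conductivity the formula reproduces the 1 GHz and 5 GHz values quoted above, and at 50 GHz it gives a value close to 0.3 µm. The skin depth scales as the inverse square root of frequency, so a conductor a few micrometres thick is already several skin depths thick above a few tens of GHz.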

Besides the skin effect, which causes the current to concentrate at the surface of the conductor, the proximity effect, which causes the current to concentrate where the magnetic flux is greatest, also plays an important role [24][25]. As a result, the current density is not uniform over the conductor surface. Figure 3.17 shows the combined skin effect and proximity effect. Parts (b) and (c) show that for transmission lines above the ground, the current on the backside is much higher than that on the topside. For the upper two wires, shown in (a), one has current flowing into the page and the other has current flowing out of the page.

Figure 3.17. Schematic of combined skin effect and proximity effect. (Adopted from [25].)

After discussing the loss effects in transmission lines, we can realize that the lossless transmission line model is not suitable, especially at high frequency.

Transmission lines are used to connect different components, which by themselves will affect the performance of the interconnection. A transmission line with driving source and termination load is shown in Fig. 3.18.

Figure 3.18. Transmission line circuit with load and generator.

The characteristic impedance of the transmission line is $Z_0$. When the load impedance $Z_l \neq Z_0$, part of the wave arriving at the termination is reflected back to the generator. The reflection coefficient at the termination, $\Gamma_l$, is defined as the ratio of the reflected wave to the incident wave, and can be expressed as

$$\Gamma_l = \frac{V_0^-}{V_0^+} = \frac{Z_l - Z_0}{Z_l + Z_0} . \tag{3.39}$$

In this case, the available power on the transmission line is not fully delivered to the load. The loss is called the “return loss”, and can be defined (in dB) as

$$RL = -20\log|\Gamma_l| \ \text{dB} . \tag{3.40}$$

Another important parameter is the voltage standing wave ratio (VSWR), which is the ratio of the maximum to the minimum voltage measured along the transmission line, and is defined by

$$VSWR = \frac{1 + |\Gamma_l|}{1 - |\Gamma_l|} . \tag{3.41}$$
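As a small worked example of eqs. (3.39) to (3.41), the sketch below evaluates the reflection coefficient, return loss and VSWR for a 75 Ω load on a 50 Ω line. The impedance values are hypothetical and the function names are ours.

```python
import math


def reflection_coefficient(z_load, z0):
    """Load reflection coefficient, eq. (3.39)."""
    return (z_load - z0) / (z_load + z0)


def return_loss_db(gamma_l):
    """Return loss in dB as defined in eq. (3.40)."""
    return -20.0 * math.log10(abs(gamma_l))


def vswr(gamma_l):
    """Voltage standing wave ratio, eq. (3.41)."""
    mag = abs(gamma_l)
    return (1.0 + mag) / (1.0 - mag)


gamma_l = reflection_coefficient(75.0, 50.0)
print(gamma_l, return_loss_db(gamma_l), vswr(gamma_l))
```

For this mismatch the reflection coefficient is 0.2, which corresponds to a return loss of about 14 dB and a VSWR of 1.5. The same functions accept complex impedances, since only the magnitude of the reflection coefficient enters eqs. (3.40) and (3.41).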

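Equations (3.27) and (3.28) can be rewritten as

$$V(z) = V_0^+\left(e^{-\gamma z} + \Gamma_l e^{\gamma z}\right) , \tag{3.42}$$

and

$$I(z) = \frac{V_0^+}{Z_0}\left(e^{-\gamma z} - \Gamma_l e^{\gamma z}\right) , \tag{3.43}$$

where $V_0^+$ is the incident wave amplitude at $z = 0$. The reflection coefficient at a distance $l$ from the load can be expressed as

$$\Gamma_l(l) = \Gamma_l e^{-2j\beta l} e^{-2\alpha l} = \Gamma_l e^{-2\gamma l} . \tag{3.44}$$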

The input impedance $Z_{in}$ at this point is

$$Z_{in} = \frac{V(-l)}{I(-l)} = Z_0\,\frac{Z_l + Z_0\tanh\gamma l}{Z_0 + Z_l\tanh\gamma l} . \tag{3.45}$$

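The power delivered to the transmission line at $z = -l$ is

$$P_{in} = \frac{1}{2}\mathrm{Re}\left[V(-l)\, I^*(-l)\right] = \frac{|V_0^+|^2}{2Z_0}\left[1 - |\Gamma_l(l)|^2\right] e^{2\alpha l} , \tag{3.46}$$

while the power delivered to the load is

$$P_l = \frac{1}{2}\mathrm{Re}\left[V(0)\, I^*(0)\right] = \frac{|V_0^+|^2}{2Z_0}\left(1 - |\Gamma_l|^2\right) . \tag{3.47}$$

The difference between $P_{in}$ and $P_l$ is the power lost in the transmission line:

$$P_{loss} = P_{in} - P_l = \frac{|V_0^+|^2}{2Z_0}\left[\left(e^{2\alpha l} - 1\right) + |\Gamma_l|^2\left(1 - e^{-2\alpha l}\right)\right] . \tag{3.48}$$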

The reflection coefficient at the source, $\Gamma_g$, is defined as

$$\Gamma_g = \frac{Z_g - Z_0}{Z_g + Z_0} . \tag{3.49}$$

The characteristic impedance, $Z_0$, and the complex propagation constant, $\gamma$, can also be related to the two-port s-parameters as [26][27]:

$$Z_0^2 = Z_r^2\,\frac{(1 + S_{11})^2 - S_{21}^2}{(1 - S_{11})^2 - S_{21}^2} , \tag{3.50}$$

and

$$e^{-\gamma l} = \left(\frac{1 - S_{11}^2 + S_{21}^2}{2S_{21}} \pm K\right)^{-1} , \tag{3.51}$$

where

$$K = \left[\frac{\left(S_{11}^2 - S_{21}^2 + 1\right)^2 - (2S_{11})^2}{(2S_{21})^2}\right]^{0.5} . \tag{3.52}$$

$Z_r$ is the reference impedance at both ports of the transmission line ($Z_r = Z_g = Z_l$).

3.3.4 Modeling of Conventional Packages

A schematic top view of a conventional package with N leads is shown in Fig. 3.19.

As frequency rises, to model the package more accurately, more and more lumped and distributed components should be incorporated. Figure 3.20 shows a typical topology for a 3-lead package.

Figure 3.19. A schematic top view of an open classic IC package. (Adopted from [9].)


Figure 3.20. A partial package model for 3 leads. (Adopted from [25].)

The model above is only partial: it shows only the coupling between leads 1 and 2, and between leads 2 and 3. In addition, there is inductive and capacitive coupling from lead 1 to lead 3. The inductors on the left in Fig. 3.20 represent the lead inductances, and the inductors on the right represent the bond-wire inductances. This model neglects the coupling between lead inductance and bond-wire inductance. Although this coupling is much weaker than lead-to-lead and wire-to-wire coupling, it needs to be included in the model when the frequency is sufficiently high. Fig. 3.21 shows some of the main parasitic elements of the package.

Figure 3.21. Main parasitics in package. (Adopted from [25].)

Here, we assume that the PCB has a uniform distribution of inductance and capacitance and that the resistive loss is negligible. In this situation the PCB transmission line can, in theory, transmit any frequency, but the package cannot transmit all frequencies equally well because its inductance and capacitance are not uniformly distributed. The signal bandwidth is therefore limited by the package.

Another important issue is the ground bond wire inductance from the nodes. In standard low cost packages, the paddle is floating unless some leads are wire-bonded to the paddle and grounded externally. In addition, there are some wire bonds from the die to the paddle. Fig. 3.22 shows a typical ground bond wire for a floating paddle package.

Figure 3.22. Typical ground bond and grounded lead for a floating paddle package. (Adopted from [25].)

The package model can be treated as a sub-circuit in a circuit simulator, such as HSPICE, to achieve more accurate results. In the simulation, we should avoid using an ideal ground; instead, when the die ground is connected to the package through bond-wire and lead, we need to add the appropriate parasitics (resistor, inductor and capacitor) to account for the realistic feedback path to other circuit components.

Very complex models are needed to predict the characteristics of conventional packaging systems at relatively high frequencies (multi-GHz) because of the long signal travel length, which also limits the performance at higher frequencies, where inductance, capacitance and resistance become frequency-dependent. For QP, treating the structure as a transmission line in an electromagnetic simulator provides a simple but accurate way to model it at several tens and even hundreds of GHz.

3.3.5 Modeling of Quilt-Packaging

Figure 3.23 shows a two-chip interconnection through quilt-packaging.

Figure 3.23. Side view of a typical two-chip interconnection through quilt-packaging. [Schematic labels: network sections N1 to N5, SiO2, Cu nodule, Si, on-chip interconnect.]

From the figure above, we can see that the network for the two-chip interconnection can be divided into five parts (N1 to N5). N1 and N5 are the on-chip interconnects. N2 and N4 are the parts where the copper nodules are embedded inside the silicon substrate. N3 is the part where the two extended nodules connect and form a single block. N2 is the same as N4, and if the length of N1 is the same as that of N5, the whole network is symmetric, which means $S_{11} = S_{22}$ and $S_{21} = S_{12}$, where $S_{11}$ is the ratio of the reflected wave amplitude at port 1 to the incident wave amplitude at port 1 and $S_{21}$ is the ratio of the transmitted wave amplitude at port 2 to the incident wave amplitude at port 1.

On the chip surface plane, when the signal line is surrounded by ground planes, the structure is called a coplanar waveguide (CPW). Figure 3.24 shows cross sections of three different CPWs: conventional, conductor-backed, and finite-ground-plane CPWs, respectively.

Figure 3.24. Cross sections of (a) conventional CPW, (b) conductor-backed CPW, and (c) CPW with finite ground planes. [Dimension labels in the figure: 2a, 2b, 2c, t, h, ε_r.]

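For the conventional CPW, the effective dielectric constant and characteristic impedance can be expressed as [28]:

$$\varepsilon_{eff} = 1 + \frac{\varepsilon_r - 1}{2}\,\frac{K(k')\,K(k_1)}{K(k)\,K(k_1')} , \tag{3.53}$$

and

$$Z_0 = \frac{30\pi}{\sqrt{\varepsilon_{eff}}}\,\frac{K(k')}{K(k)} , \tag{3.54}$$

respectively, where

$$k = \frac{a}{b} , \tag{3.55}$$

$$k' = \sqrt{1 - k^2} , \tag{3.56}$$

$$k_1 = \frac{\sinh(\pi a / 2h)}{\sinh(\pi b / 2h)} , \tag{3.57}$$

$$k_1' = \sqrt{1 - k_1^2} , \tag{3.58}$$

and $K$ represents the complete elliptic integral of the first kind. The $K(k)/K(k')$ ratio can be approximated by [29]:

$$\frac{K(k)}{K(k')} = \begin{cases} \dfrac{\pi}{\ln\left(2\,\dfrac{1 + \sqrt{k'}}{1 - \sqrt{k'}}\right)} & \text{for } 0 \le k \le 0.707 \\[2ex] \dfrac{1}{\pi}\ln\left(2\,\dfrac{1 + \sqrt{k}}{1 - \sqrt{k}}\right) & \text{for } 0.707 \le k \le 1 \end{cases} . \tag{3.59}$$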
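The closed-form expressions of eqs. (3.53) to (3.59) translate directly into a few lines of code. The sketch below is a minimal illustration, not the procedure used for the simulations in this chapter: it takes a as half the signal strip width and b as a plus the gap, compares the approximation of eq. (3.59) against a reference value computed with the arithmetic-geometric mean (a standard identity, introduced here as a check), and uses hypothetical example dimensions. All function names are ours.

```python
import math


def k_ratio_approx(k):
    """K(k)/K(k') from the piecewise approximation of eq. (3.59), for 0 < k < 1."""
    if k <= 1.0 / math.sqrt(2.0):
        s = math.sqrt(math.sqrt(1.0 - k * k))   # sqrt(k')
        return math.pi / math.log(2.0 * (1.0 + s) / (1.0 - s))
    s = math.sqrt(k)
    return math.log(2.0 * (1.0 + s) / (1.0 - s)) / math.pi


def k_ratio_agm(k):
    """Reference K(k)/K(k') using K(k) = pi / (2 * agm(1, k'))."""
    def agm(x, y):
        for _ in range(40):
            x, y = 0.5 * (x + y), math.sqrt(x * y)
        return x
    kp = math.sqrt(1.0 - k * k)
    return agm(1.0, k) / agm(1.0, kp)


def conventional_cpw(a, b, h, eps_r):
    """Effective dielectric constant and Z0 from eqs. (3.53) to (3.58)."""
    k = a / b
    k1 = math.sinh(math.pi * a / (2.0 * h)) / math.sinh(math.pi * b / (2.0 * h))
    eps_eff = 1.0 + 0.5 * (eps_r - 1.0) * k_ratio_approx(k1) / k_ratio_approx(k)
    z0 = 30.0 * math.pi / (math.sqrt(eps_eff) * k_ratio_approx(k))
    return eps_eff, z0


# Hypothetical example: 20 um strip, 8 um gaps, 100 um silicon substrate.
eps_eff, z0 = conventional_cpw(10e-6, 18e-6, 100e-6, 11.9)
print(eps_eff, z0)
```

For a very thick substrate $k_1$ approaches $k$ and the effective dielectric constant tends to $(\varepsilon_r + 1)/2$, which gives a quick sanity check on any implementation. The sketch ignores conductor thickness, the oxide layer and substrate loss.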

When the thickness of the conductor is considered, the width of the signal strip effectively increases, and the width of the gaps between the central strip and ground planes effectively decreases. The following formulas can be used to find the effective dielectric constant and characteristic impedance [30]:

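$$\varepsilon_{eff}(t) = \varepsilon_{eff} - \frac{0.7\,(\varepsilon_{eff} - 1)\,\dfrac{t}{b - a}}{\dfrac{K(k)}{K(k')} + 0.7\,\dfrac{t}{b - a}} , \tag{3.60}$$

and

$$Z_0 = \frac{30\pi}{\sqrt{\varepsilon_{eff}(t)}}\,\frac{K(k_e')}{K(k_e)} , \tag{3.61}$$

where

$$k_e = \frac{S_e}{S_e + 2W_e} , \tag{3.62}$$

$$k_e' = \sqrt{1 - k_e^2} , \tag{3.63}$$

$$S_e = 2a + \Delta , \tag{3.64}$$

$$W_e = b - a - \Delta , \tag{3.65}$$

$$\Delta = \frac{1.25\,t}{\pi}\left[1 + \ln\left(\frac{8\pi a}{t}\right)\right] . \tag{3.66}$$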

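For the conductor-backed CPW, the effective dielectric constant and characteristic impedance can be obtained as [31]:

$$\varepsilon_{eff} = \frac{1 + \varepsilon_r\,\dfrac{K(k')}{K(k)}\,\dfrac{K(k_1)}{K(k_1')}}{1 + \dfrac{K(k')}{K(k)}\,\dfrac{K(k_1)}{K(k_1')}} , \tag{3.67}$$

and

$$Z_0 = \frac{60\pi}{\sqrt{\varepsilon_{eff}}}\,\frac{1}{\dfrac{K(k)}{K(k')} + \dfrac{K(k_1)}{K(k_1')}} , \tag{3.68}$$

where

$$k = \frac{a}{b} , \tag{3.69}$$

$$k' = \sqrt{1 - k^2} , \tag{3.70}$$

$$k_1 = \frac{\tanh(\pi a / 2h)}{\tanh(\pi b / 2h)} , \tag{3.71}$$

$$k_1' = \sqrt{1 - k_1^2} . \tag{3.72}$$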

For the CPW with finite ground planes, the effective dielectric constant and characteristic impedance can be evaluated by [32]:

$$\varepsilon_{eff} = 1 + \frac{\varepsilon_r - 1}{2}\,\frac{K(k)\,K(k_1')}{K(k')\,K(k_1)} , \tag{3.73}$$

and

$$Z_0 = \frac{30\pi}{\sqrt{\varepsilon_{eff}}}\,\frac{K(k_1')}{K(k_1)} , \tag{3.74}$$

where

$$k = \frac{\sinh(\pi a / 2h)}{\sinh(\pi b / 2h)}\sqrt{\frac{1 - \sinh^2(\pi b / 2h) / \sinh^2(\pi c / 2h)}{1 - \sinh^2(\pi a / 2h) / \sinh^2(\pi c / 2h)}} , \tag{3.75}$$

$$k_1 = \frac{a}{b}\sqrt{\frac{1 - (b/c)^2}{1 - (a/c)^2}} . \tag{3.76}$$

The propagation constant of the CPW can be determined by

$$\gamma = j\omega\sqrt{\mu\,\varepsilon_{eff}\,\varepsilon_0} . \tag{3.77}$$

Equation (3.77) assumes that the conductor and the dielectric are lossless. If the conductor has finite conductivity and the dielectric has a non-zero loss tangent, the loss in the CPW can be characterized by the attenuation constant $\alpha = \alpha_c + \alpha_d$, where $\alpha_c$ and $\alpha_d$ represent the attenuation constants of the conductor and the dielectric, respectively.

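For the conventional CPW with a central strip thickness of $t$, the attenuation due to the conductor loss can be expressed as [33]:

$$\alpha_c = 4.88\times10^{-4}\, R_s\,\varepsilon_{eff}\, Z_0\,\frac{P}{\pi}\,\frac{b + a}{(b - a)^2} \times \left\{ \frac{\dfrac{1.25}{\pi}\ln\left(\dfrac{8\pi a}{t}\right) + 1 + \dfrac{1.25\,t}{2\pi a}}{\left[ 2 + \dfrac{2a}{b - a} - \dfrac{1.25\,t}{\pi (b - a)}\left(1 + \ln\dfrac{8\pi a}{t}\right) \right]^2} \right\} \ (\text{dB/m}) , \tag{3.78}$$

where

$$P = \begin{cases} \dfrac{k}{(1 - k')(k')^{3/2}}\left[\dfrac{K(k)}{K(k')}\right]^2 & \text{for } 0 \le k \le 0.707 \\[2ex] \dfrac{1}{(1 - k)\sqrt{k}} & \text{for } 0.707 \le k \le 1 \end{cases} . \tag{3.79}$$

Here, $R_s = \sqrt{\omega\mu_0 / (2\sigma)}$ is the surface resistivity of the conductor, with $\sigma$ being the conductivity.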

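The expression for the dielectric attenuation constant is

$$\alpha_d = 27.3\,\frac{\varepsilon_r}{\sqrt{\varepsilon_{eff}}}\,\frac{\varepsilon_{eff} - 1}{\varepsilon_r - 1}\,\frac{\tan\delta}{\lambda_0} \ (\text{dB/m}) , \tag{3.80}$$

where $\tan\delta$ is the loss tangent of the dielectric and $\lambda_0$ is the free-space wavelength.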

Equations (3.50) to (3.52) show that the characteristic impedance and propagation constant can be derived from simulated or measured s-parameters. Conversely, we can derive the s-parameters if we know the characteristics of the transmission line. Figure 3.25 shows the schematic of a transmission line. The characteristic impedance of the lossless transmission lines at the source and the load is $Z_l$, the characteristic impedance of the transmission line of interest is $Z_0$, its propagation constant is $\gamma$, and its length is $l$.

Figure 3.25. Schematic of a two-port transmission line.

Since the network is symmetric, we need only to find $S_{11}$ and $S_{21}$:

$$S_{11} = \left.\frac{V_1^-}{V_1^+}\right|_{V_2^+ = 0} , \tag{3.81}$$

$$S_{21} = \left.\frac{V_2^-}{V_1^+}\right|_{V_2^+ = 0} . \tag{3.82}$$

In this case, the load transmission line is terminated with $Z_l$.

The voltage along the transmission line can be written as

$$V(z) = V^+ e^{-\gamma z} + V^- e^{\gamma z} .$$

At $z = -l$,

$$V(-l) = V^+ e^{\gamma l} + V^- e^{-\gamma l} = V_1^+ + V_1^- = V_1^+ (1 + S_{11}) . \tag{3.83}$$

At $z = 0$,

$$V(0) = V^+ + V^- = V_2^+ + V_2^- = V_2^- . \tag{3.84}$$

If we define

$$\Gamma(0) = \frac{V^-}{V^+} = \frac{Z_l - Z_0}{Z_l + Z_0} , \tag{3.85}$$

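then eq. (3.83) can be rewritten as

$$V(-l) = V_1^+ (1 + S_{11}) = V^+ e^{\gamma l} + \Gamma(0)\, V^+ e^{-\gamma l} . \tag{3.86}$$

From the equation above, we get

$$V^+ = \frac{V_1^+ (1 + S_{11})}{e^{\gamma l} + \Gamma(0)\, e^{-\gamma l}} . \tag{3.87}$$

Also, by applying eq. (3.85), eq. (3.84) can be converted to

$$V_2^- = V^+ \left[1 + \Gamma(0)\right] . \tag{3.88}$$

Substituting eq. (3.87) into eq. (3.88), we get

$$V_2^- = V_1^+\,\frac{(1 + S_{11})\left[1 + \Gamma(0)\right]}{e^{\gamma l} + \Gamma(0)\, e^{-\gamma l}} . \tag{3.89}$$

So, $S_{21}$ can be derived as

$$S_{21} = \frac{V_2^-}{V_1^+} = \frac{(1 + S_{11})\left[1 + \Gamma(0)\right]}{e^{\gamma l} + \Gamma(0)\, e^{-\gamma l}} . \tag{3.90}$$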

The problem is reduced to finding $S_{11}$. If we assume that the input impedance at port 1 is $Z_{in}$, then

$$S_{11} = \frac{Z_{in} - Z_l}{Z_{in} + Z_l} . \tag{3.91}$$

Here, $Z_{in}$ can be expressed by

$$Z_{in} = Z_0\,\frac{Z_l + Z_0\tanh(\gamma l)}{Z_0 + Z_l\tanh(\gamma l)} . \tag{3.92}$$
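The two directions of this transformation can be checked against each other numerically. The sketch below first computes $S_{11}$ and $S_{21}$ from a line's $Z_0$, $\gamma$ and length using eqs. (3.85) and (3.90) to (3.92), and then recovers $Z_0^2$ and $e^{-\gamma l}$ from those s-parameters using eqs. (3.50) to (3.52). The line values are hypothetical, the function names are ours, and the sign in eq. (3.51) is resolved by assuming a passive line, so that the magnitude of $e^{-\gamma l}$ does not exceed 1.

```python
import cmath


def s_params(z0, gamma, length, z_ref):
    """S11 and S21 of a line (z0, gamma, length) between ports of impedance z_ref."""
    t = cmath.tanh(gamma * length)
    z_in = z0 * (z_ref + z0 * t) / (z0 + z_ref * t)           # eq. (3.92)
    s11 = (z_in - z_ref) / (z_in + z_ref)                      # eq. (3.91)
    refl0 = (z_ref - z0) / (z_ref + z0)                        # eq. (3.85)
    e_pos = cmath.exp(gamma * length)
    s21 = (1 + s11) * (1 + refl0) / (e_pos + refl0 / e_pos)    # eq. (3.90)
    return s11, s21


def extract_line(s11, s21, z_ref):
    """Recover Z0**2 and exp(-gamma*l) using eqs. (3.50) to (3.52)."""
    z0_sq = z_ref**2 * ((1 + s11)**2 - s21**2) / ((1 - s11)**2 - s21**2)
    k = cmath.sqrt(((s11**2 - s21**2 + 1)**2 - (2 * s11)**2) / (2 * s21)**2)
    base = (1 - s11**2 + s21**2) / (2 * s21)
    candidates = [1 / (base + k), 1 / (base - k)]
    # Assumption: a passive line, so pick the root with the smaller magnitude.
    e_neg = min(candidates, key=abs)
    return z0_sq, e_neg


# Hypothetical lossy line: Z0 = 70 - 5j ohm, gamma = 20 + 800j 1/m, 1 mm long.
Z0, GAMMA, LENGTH, Z_REF = 70 - 5j, 20 + 800j, 1e-3, 50.0
s11, s21 = s_params(Z0, GAMMA, LENGTH, Z_REF)
z0_sq, e_neg = extract_line(s11, s21, Z_REF)
print(s11, s21, cmath.sqrt(z0_sq), e_neg)
```

When $Z_0$ equals the port impedance, the same functions give $S_{11} = 0$ and $S_{21} = e^{-\gamma l}$, as expected for a matched line. For a lossless line both roots of eq. (3.51) have unit magnitude, so the root selection used here would need additional information such as the expected phase.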

The derivation shows the convenience of transforming between the characteristics of transmission lines and s-parameters. The return loss (RL) is the ratio of the power reflected to the generator to the power from the generator, which can be expressed as $RL = 20\log|S_{11}|$ dB. 0 dB means all the power is reflected from the load to the generator. The lower the return loss, the less power is reflected back. The insertion loss (IL) is the ratio of the power at the load to the power from the generator, which can be expressed as $IL = 20\log|S_{21}|$ dB. 0 dB means all the power is delivered to the load. The lower the insertion loss, the less power reaches the load. In our transmission line analysis, the bandwidth is defined as the range from 0 Hz to the first frequency at which the insertion loss is equal to -3 dB.
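The bandwidth definition above can be applied to a sampled insertion loss trace with a short routine. The sketch below is one possible way to do it, assuming the trace starts above -3 dB and using linear interpolation between frequency samples; the function names and the sample data are hypothetical and are not simulation results.

```python
import math


def insertion_loss_db(s21):
    """Insertion loss in dB in the sign convention of the text: 20*log10|S21|."""
    return 20.0 * math.log10(abs(s21))


def bandwidth_3db(freqs_hz, il_db):
    """First frequency at which the insertion loss trace crosses -3 dB."""
    for i in range(1, len(freqs_hz)):
        prev_il, next_il = il_db[i - 1], il_db[i]
        if prev_il > -3.0 >= next_il:
            # Linear interpolation between the two bracketing samples.
            frac = (prev_il + 3.0) / (prev_il - next_il)
            return freqs_hz[i - 1] + frac * (freqs_hz[i] - freqs_hz[i - 1])
    return None  # the trace never reaches -3 dB within the sweep


# Made-up trace for illustration only.
freqs = [0.0, 50e9, 100e9, 150e9]
il = [0.0, -1.0, -2.5, -3.5]
print(bandwidth_3db(freqs, il))
```

A return value of None indicates that the bandwidth exceeds the simulated frequency range.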

We should be aware that the closed-form equations for the CPWs given above are suitable only for simple structures such as those shown in Fig. 3.24. The quilt-packaging structure shown in Fig. 3.23 is too complex to describe by closed-form equations: the frequency-dependent parameters of the CPWs are very hard to quantify, and the discontinuities between the different parts are very complex, especially at very high frequency. So, a 3-D electromagnetic field solver, Ansoft HFSS [34], which uses finite element analysis, is adopted to simulate and analyze quilt-packaging structures.

A CPW-structured QP prototype, shown in Fig. 3.26, was built. The silicon substrate has a relative permittivity of 11.9, a resistivity of 10 Ω·cm, and a thickness of 100 µm. The SiO2 isolation layer has a thickness of 1 µm. The metal (Cu) on-chip interconnect is 1 µm thick. The spacing between the signal line and the ground plane is 8 µm. The on-chip interconnect has a length of 200 µm. The distance between the two chips is 40 µm. The copper nodule is 100 µm long, with 80 µm embedded inside the silicon substrate and 20 µm extending beyond the chip edge. Three cases were simulated, with copper nodule depths of 10, 20 and 50 µm. The purpose is to find which depth provides the best compromise between performance, robustness and ease of fabrication.

By choosing the “wave port” in this simulation, HFSS provides not only the s-parameters but also the port characteristic impedance and propagation constant at the port cross section. It also offers de-embedding for the transmission line: it uses the characteristics calculated at the port to find the s-parameters when the length of the transmission line connected to the port is extended or shortened. The size of the wave port is critical. If it is too small, the fringing field behavior is lost. If it is too large, a TE01 waveguide-type mode distribution may result. The effects are shown in Fig. 3.27. In our simulations, the wave port dimensions are chosen as 144 µm × 500 µm, following the suggestion in [35] that the width be 4 times the signal line width plus the spacing between the signal line and ground plane, and that the length be 5 times the substrate thickness.

Figure 3.26. HFSS simulation model of simple CPW structured QP prototype.

Figure 3.27. Wave port field distribution when it is (a) too small and (b) too wide. (Adopted from [35].)

As mentioned before, the discontinuity between part N1 (on-chip interconnect) and N2 (copper nodule) can diminish the performance (more return loss and less insertion loss). To improve it, a tapered waveguide nodule pattern was designed and built in HFSS, as shown in Fig. 3.28. The goal is to match the characteristic impedance of the on-chip CPW and the nodule CPW at the interface. In CPW theory, the characteristic impedance decreases as the thickness of the signal line or ground plane decreases and increases as the spacing between the two increases. A series of simulations with different spacings was conducted to find the best ones.

Figure 3.28. HFSS simulation model of CPW QP prototype with improved, tapered nodule pattern.

Figure 3.29 shows s-parameters from 1 GHz to 200 GHz for both straight and improved interconnected nodules before de-embedding. The return loss, S11, is shown in Fig. 3.29(a). In all cases, the improved geometries show better impedance matching across the interface, and thicker nodules result in poorer performance. The insertion loss, S21, shown in Fig. 3.29(b), follows the same trend. The return loss and insertion loss after de-embedding are shown in Fig. 3.30. For our 20 µm thick nodules, a return loss better than -4 dB up to 150 GHz is predicted, and in the tapered geometry the return loss improves to -14 dB. The insertion loss improves from -3 dB to -2 dB at 200 GHz in the tapered geometry.

Figure 3.29. Simulated return loss (a) and insertion loss (b) before de-embedding. [Plots of S11 (dB) and S21 (dB) versus frequency, 0 to 200 GHz, for 20 µm wide nodules that are 10, 20 and 50 µm deep, in straight and improved geometries.]

Figure 3.30. Simulated return loss (a) and insertion loss (b) after de-embedding. [Plots of S11 (dB) and S21 (dB) versus frequency, 0 to 200 GHz, for 20 µm wide nodules that are 10, 20 and 50 µm deep, in straight and improved geometries.]

As we can see, the 20 µm thick nodule structure has performance comparable to that of the 10 µm thick nodule structure, and is much better than the 50 µm thick nodule structure. The robustness of the nodule is a major concern for reliability, which favors a thicker nodule. However, as will be shown in the next chapter, a thicker nodule puts more burden on the electrolytic plating process and chemical-mechanical polishing (CMP). From the simulation results above and from the fabrication process point of view, a 20 µm thick nodule gives the better compromise. In the following simulations, we use this nodule thickness only.

Three CPW QP geometries, along with an on-chip interconnect (without nodules), were built, as shown in Fig. 3.31. Figure 3.31 (a) is a “simple QP” prototype. Figure 3.31 (b) is “QP improved 1,” which has a tapered nodule signal line to reduce capacitance and improve impedance matching between the on-chip interconnect and the nodule structure. Figure 3.31 (c) is “QP improved 2,” which has both a tapered signal nodule and a tapered ground plane to further reduce the interface discontinuity. Figure 3.31 (d) is a simple on-chip interconnect built for comparison, to isolate the effects introduced by the inlaid nodules. Figure 3.31 (e) is a closer look at Fig. 3.31 (c). The symbols are the on-chip signal line length ($l_o$), the on-chip signal line width, which is the same as the signal nodule width ($W_o$), the spacing between the on-chip signal line and ground plane ($S_o$), the inlaid nodule length ($l_n$), the nodule bridge length, which is the spacing between the two chips ($S_a$), the spacing between the signal nodule and ground nodule ($S_n$), the nodule width at the interface with the on-chip interconnect ($W_i$), and the recessed ground nodule width ($W_{gr}$).

The silicon substrate has a thickness of 600 µm, which is the same as that used to fabricate QP at Notre Dame. The depth of the nodules for both signal and ground is 20 µm. Four signal nodules with widths of 10, 20, 50 and 100 µm were simulated. The tapered signal lines and ground planes were chosen after optimization. The dimensions of the four groups are summarized in Tables 3.1 to 3.4.

Figure 3.31. Prototypes in Ansoft HFSS: (a) “simple QP,” (b) “QP improved 1,” (c) “QP improved 2,” (d) on-chip interconnect, and (e) a closer look at “QP improved 2”. [Dimension labels in (e): Sa, Wgr, So, lo, Sn, Wi, Wo, ln.]

TABLE 3.1

DIMENSIONS OF THE 100 µm QP INTERCONNECTS (UNIT: µm)

100 µm QP geometries

         I     II    III
lo     200    200    200
Wo     100    100    100
So      35     35     35
ln      80     80     80
Sa      40     40     40
Sn      50     50     50
Wi     100     10     10
Wgr      0      0     95

TABLE 3.2

DIMENSIONS OF THE 50 µm QP INTERCONNECTS (UNIT: µm)

50 µm QP geometries

         I     II    III
lo     200    200    200
Wo     100    100    100
So      50     50     50
ln      80     80     80
Sa      40     40     40
Sn      50     50     50
Wi      50     10     10
Wgr      0      0     50

TABLE 3.3

DIMENSIONS OF THE 20 µm QP INTERCONNECTS (UNIT: µm)

20 µm QP geometries

         I     II    III
lo     200    200    200
Wo      20     20     20
So      10     10     10
ln      80     80     80
Sa      40     40     40
Sn      20     20     20
Wi      20      2      2
Wgr      0      0     25

TABLE 3.4

DIMENSIONS OF THE 10 µm QP INTERCONNECTS (UNIT: µm)

10 µm QP geometries

         I     II    III
lo     200    200    200
Wo      10     10     10
So       6      6      6
ln      80     80     80
Sa      40     40     40
Sn      10     10     10
Wi      10      2      2
Wgr      0      0     70

The dimensions of the QP prototypes were chosen in the following steps. First, Sn is fixed for each simple QP prototype: 50 µm for QP with 100 and 50 µm wide nodules, 20 µm for QP with 20 µm wide nodules, and 10 µm for QP with 10 µm wide nodules. Second, So is adjusted to obtain the best return and insertion loss for each simple QP prototype. Third, Wi is adjusted to further improve the performance. Finally, if step three is not enough, Wgr is increased to obtain the best overall performance for each prototype.

As shown in Fig. 3.31, the “lumped port” is chosen instead of the “wave port” to drive the CPW, because here the substrate thickness is 600 µm, not 100 µm as in Figs. 3.26 and 3.28, to match the actual fabrication conditions. If the wave port were used, its width would be 3000 µm, more than 10 times its length, which would generate questionable results at high frequency according to [35]. HFSS models similar to those in Figs. 3.26 and 3.28 were built using lumped ports, and the simulation results were almost the same as in Fig. 3.29, which indicates that using lumped ports does not compromise the accuracy in this situation.

For the 100 µm and 50 µm wide nodules, the return loss and insertion loss deteriorate much faster at high frequency because of the larger sizes (larger capacitance), so simulations up to 40 GHz were performed for these two. For the 20 µm and 10 µm wide nodules, simulations up to 200 GHz were performed.

Figure 3.32 (a) and (b) show the return loss (S11) and insertion loss (S21) of the QP structures with 100 µm wide nodules and of the on-chip interconnect without nodules, before de-embedding. The return loss and insertion loss of simple QP are almost the same as those of the on-chip interconnect, which shows that the nodules have very little effect on the microwave performance of the whole packaging structure. With the improved nodule structures, the return loss and insertion loss improve and are better than those of the on-chip interconnect of the same length. At 40 GHz, the return loss of simple QP and of the on-chip interconnect is about -12 dB, the return loss of QP improved 1 is about -16 dB, and the return loss of QP improved 2 is about -20 dB. The insertion loss follows the same trend. At 40 GHz, the insertion loss improves from about -2.24 dB for simple QP and the on-chip interconnect to -2.04 dB for QP improved 1 and 2.

Figure 3.32. Simulated return loss (a) and insertion loss (b) of 100 µm wide QP structures and on-chip interconnect. [Plots of S11 (dB) and S21 (dB) before de-embedding versus frequency, 0 to 40 GHz, for simple QP, QP improved 1, QP improved 2 and the on-chip interconnect.]

After de-embedding to remove the effects of the on-chip interconnect (parts N1 and N5 in Fig. 3.23), a return loss better than -23 dB and an insertion loss better than -0.65 dB at 40 GHz are achieved for QP improved 2, as shown in Fig. 3.33. In all cases, the “improved” QP geometries provide better impedance matching and loss characteristics than the “simple” QP structure.

Quilt Packaging offers a highly optimized chip-to-chip interconnect in terms of performance and overall benefits. In a system using QP technology, signals that originate near the edges of the dice need to traverse only a very short distance between them, so latency is extremely low. A worst-case scenario would be signals that originate deep in the interior of one chip and must terminate deep within the receiving chip. Even in such cases, the combination of a low-loss upper metal level interconnect with QP offers a significant advantage over a “standard” signal path through a solder bump to a package substrate and back again, because of the much shorter signal travel length of QP. The advantage of QP is that the connection between ICs is direct, without the need to traverse package structures such as leads, bumps, and package wiring.

[Figure 3.33 plots: (a) S11 (dB) and (b) S21 (dB) after de-embedding versus frequency (0 to 40 GHz); curves: simple QP, QP improved 1, QP improved 2.]

Figure 3.33. Simulated return loss (a) and insertion loss (b) of 100 µm wide QP structures after de-embedding.

From both Fig. 3.32 (b) and 3.33 (b), we see that the insertion loss has a moderate increase at low frequency (< 3 GHz). This low frequency dispersion is attributed to dielectric loss in the low-resistivity substrate, which is 10 Ω • cm in our simulation. To compare, a high resistivity substrate (8000 Ω • cm) was substituted and simulated. The models are almost the same, except that the thickness of the substrate is 380 µm instead of 600 µm. The values chosen here all match the QP prototypes fabricated and measured, described in the later chapters.

Figure 3.34 (a) and (b) show the return loss and insertion loss of the three QP structures on low and high resistivity substrates before de-embedding. Both return loss and insertion loss are improved on the high resistivity silicon substrate. At 40 GHz, a return loss better than -25 dB and an insertion loss better than -0.15 dB are achieved using QP improved 2. Here, the impact of the on-chip interconnect still exists. After de-embedding, the return loss and insertion loss will be better. Since the insertion loss is so low even before de-embedding, over-de-embedding was encountered in our simulation (an insertion loss larger than 0 dB was obtained), so those results are not shown here.

[Figure 3.34 plots: (a) S11 (dB) and (b) S21 (dB) before de-embedding versus frequency (0 to 40 GHz); curves: simple QP, QP improved 1 and QP improved 2, each on 10 ohm-cm and 8000 ohm-cm substrates.]

Figure 3.34. Simulated return loss (a) and insertion loss (b) of 100 µm wide QP structures on both low and high resistivity silicon substrate.

No low frequency dispersion is observed on the high resistivity substrate. It should be noted that the optimization process to get the dimensions in the low resistivity substrate was not repeated for the high resistivity substrate. If optimization were to be performed, better performance is expected.
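As a rough, order-of-magnitude illustration of why the two substrates behave so differently, the sketch below estimates the conduction contribution to the silicon loss tangent, tan δ = σ / (ω ε0 εr), for the two resistivities used in the simulations. This is a textbook bulk estimate, not the HFSS material model; the relative permittivity of 11.7 and the function name are assumptions made for the example.

```python
# Sketch: conduction-related loss tangent of bulk silicon, tan(delta) = sigma / (omega * eps0 * eps_r).
# EPS_R_SI = 11.7 is an assumed textbook value; the HFSS material settings are not given in the text.
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
EPS_R_SI = 11.7           # assumed relative permittivity of silicon


def conduction_loss_tangent(resistivity_ohm_cm: float, freq_hz: float,
                            eps_r: float = EPS_R_SI) -> float:
    """Loss tangent due to free-carrier conduction only."""
    sigma = 1.0 / (resistivity_ohm_cm * 1e-2)   # ohm-cm -> ohm-m, then S/m
    return sigma / (2 * math.pi * freq_hz * EPS0 * eps_r)


for rho in (10.0, 8000.0):                      # substrate resistivities from the text
    for f_ghz in (1.0, 3.0, 40.0):
        tan_d = conduction_loss_tangent(rho, f_ghz * 1e9)
        print(f"rho = {rho:6.0f} ohm-cm, f = {f_ghz:4.1f} GHz: tan(delta) = {tan_d:.3g}")
```

Under this estimate the loss tangent scales inversely with both resistivity and frequency, so the 8000 Ω • cm substrate is 800 times less lossy than the 10 Ω • cm substrate at any given frequency.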

Figure 3.35 (a) and (b) show the simulated return loss (S11) and insertion loss (S21) of the QP structures and on-chip interconnect with 50 µm wide nodules on both low and high resistivity silicon substrates before de-embedding. At 40 GHz, a return loss of about -20 dB was achieved on the low resistivity substrate, and lower than -30 dB on the high resistivity substrate; the insertion loss was approximately -1.8 dB on the low resistivity substrate, and as low as -0.1 dB on the high resistivity substrate. Both return loss and insertion loss of the QP structures with 50 µm wide nodules are better than those with 100 µm wide nodules, which is due to the smaller width of both on-chip interconnect and nodules leading to less capacitance in the whole network. Also, the insertion loss on the high resistivity silicon substrate does not suffer from the low frequency dispersion caused by the dielectric loss.

[Figure 3.35 plots: (a) S11 (dB) and (b) S21 (dB) before de-embedding versus frequency (0 to 40 GHz); curves: simple QP, QP improved 1, QP improved 2 and on-chip interconnect, each on 10 ohm-cm and 8000 ohm-cm substrates.]

Figure 3.35. Simulated return loss (a) and insertion loss (b) of 50 µm wide QP structures on both low and high resistivity silicon substrate before de-embedding.

After de-embedding the effect of the on-chip interconnect in the QP structures on the low resistivity substrate, the return loss is less than -19 dB in the whole frequency range and the insertion loss is better than -0.65 dB at 40 GHz for QP improved 2, as shown in Fig. 3.36. Over-de-embedding was also encountered for the QP structures on the high resistivity substrate, so those graphs are omitted. In the following simulations, over-de-embedding occurred on all the QP structures formed on high resistivity substrate, and remained a concern for accurately describing the microwave performance. However, the signal integrity is very good for the QP structures on the high resistivity substrate even before de-embedding. It is certain that, with the contribution of the on-chip interconnect removed, the performance of the QP nodule structures alone will be better.

[Figure 3.36 plots: (a) S11 (dB) and (b) S21 (dB) after de-embedding versus frequency (0 to 40 GHz); curves on 10 ohm-cm substrate: simple QP, QP improved 1, QP improved 2.]

Figure 3.36. Simulated return loss (a) and insertion loss (b) of 50 µm wide QP structures on low resistivity silicon substrate after de-embedding.

For the QP structures with 10 and 20 µm wide nodules, simulations with frequency up to 200 GHz were conducted.

Figure 3.37 shows the simulated return loss and insertion loss of the QP structures with 20 µm wide nodules and on-chip interconnects on both low resistivity (10 Ω • cm) and high resistivity (8000 Ω • cm) silicon substrates before de-embedding. The “QP improved 2” promises better performance than the “simple QP” and “QP improved 1”. On the high resistivity substrate, it has a return loss around -20 dB at 200 GHz, while the insertion loss is only about -0.5 dB at 200 GHz. Even on low resistivity substrate, the insertion loss of all three QP structures is predicted to be better than -2.5 dB in the whole frequency range, which means the bandwidth of the QP structures is beyond 200 GHz, resulting in ultra-fast chip-to-chip communications. The predicted insertion loss on the low resistivity substrate also shows a moderate increase at low frequency due to dielectric loss, while on high resistivity substrate, the low-frequency dispersion is gone. Figure 3.38 shows the return loss and insertion loss of the three QP structures on low-resistivity substrate after de-embedding. Only the effect of the nodule structures is left. A return loss better than -22 dB and an insertion loss better than -0.7 dB are predicted for “QP improved 2” in the whole frequency range.
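The bandwidth statement above can be made concrete with a small helper that reports where a sampled S21 curve first drops below a chosen threshold. The -3 dB criterion is an assumption (the text only states that the loss stays better than -2.5 dB up to 200 GHz), the function name is hypothetical, and the sample arrays are placeholders rather than simulated data.

```python
# Sketch: estimate the bandwidth of a sampled S21 curve against a loss threshold.
# The -3 dB default is an assumed criterion; sample data below are placeholders only.

def bandwidth_from_s21(freqs_ghz, s21_db, threshold_db=-3.0):
    """Return the first frequency (GHz) at which S21 falls below the threshold.

    The crossing is linearly interpolated between samples. None is returned
    when the curve stays at or above the threshold over the whole sweep.
    """
    if len(freqs_ghz) != len(s21_db):
        raise ValueError("frequency and S21 arrays must have the same length")
    if not freqs_ghz:
        return None
    if s21_db[0] < threshold_db:
        return freqs_ghz[0]
    for i in range(1, len(freqs_ghz)):
        if s21_db[i] < threshold_db:
            f0, f1 = freqs_ghz[i - 1], freqs_ghz[i]
            y0, y1 = s21_db[i - 1], s21_db[i]
            return f0 + (threshold_db - y0) * (f1 - f0) / (y1 - y0)
    return None


# Placeholder sweep that never crosses -3 dB: the bandwidth exceeds the sweep range.
print(bandwidth_from_s21([0, 100, 200], [0.0, -1.0, -2.5]))   # None
# Placeholder sweep that crosses -3 dB between the last two samples.
print(bandwidth_from_s21([0, 100, 200], [0.0, -2.0, -4.0]))   # 150.0
```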

[Figure 3.37 plots: (a) S11 (dB) and (b) S21 (dB) before de-embedding, 20 µm nodules, versus frequency (0 to 200 GHz); curves: simple QP, QP improved 1, QP improved 2 and on-chip interconnect, each on 10 ohm-cm and 8000 ohm-cm substrates.]

Figure 3.37. Simulated return loss (a) and insertion loss (b) of 20 µm wide QP structures on both low and high resistivity silicon substrate before de-embedding.

[Figure 3.38 plots: (a) S11 (dB) and (b) S21 (dB) after de-embedding, 20 µm nodules, versus frequency (0 to 200 GHz); curves on 10 ohm-cm substrate: simple QP, QP improved 1, QP improved 2.]

Figure 3.38. Simulated return loss (a) and insertion loss (b) of 20 µm wide QP structures on low resistivity silicon substrate after de-embedding.

Figure 3.39 shows the simulated return loss and insertion loss of the QP structures with 10 µm wide nodules and on-chip interconnects on both low resistivity and high resistivity silicon substrates before de-embedding. On the high resistivity substrate, “QP improved 2” has a return loss of -10 dB and an insertion loss of -1 dB at 200 GHz, which is slightly inferior to the structures with 20 µm wide nodules. This may be due to the larger resistive loss in both the on-chip interconnect and the nodules. After de-embedding the QP structures on low-resistivity substrate, a return loss better than -12 dB and an insertion loss better than -2.6 dB at 200 GHz are achieved for “QP improved 2”, as shown in Fig. 3.40.

[Figure 3.39 plots: (a) S11 (dB) and (b) S21 (dB) before de-embedding, 10 µm nodules, versus frequency (0 to 200 GHz); curves: simple QP, QP improved 1, QP improved 2 and on-chip interconnect, each on 10 ohm-cm and 8000 ohm-cm substrates.]

Figure 3.39. Simulated return loss (a) and insertion loss (b) of 10 µm wide QP structures on both low and high resistivity silicon substrate before de-embedding.

[Figure 3.40 plots: (a) S11 (dB) and (b) S21 (dB) after de-embedding, 10 µm nodules, versus frequency (0 to 200 GHz); curves on 10 ohm-cm substrate: simple QP, QP improved 1, QP improved 2.]

Figure 3.40. Simulated return loss (a) and insertion loss (b) of 10 µm wide QP structures on low resistivity silicon substrate after de-embedding.

The above simulations show that the QP structures are capable of ultra-fast chip-to-chip communications. One feature of the QP structures built in Ansoft HFSS should be noted: the ground plane is wide compared with the width of the signal line. In a real system, the density of I/Os is very important. To demonstrate the ability of QP structures in dense situations, three structures (two QP structures and one simple on-chip interconnect structure) were built and simulated. Figure 3.41 shows the three structures. Figure 3.41 (a) is the “simple QP with limited ground”, Fig. 3.41 (b) is the “QP improved with limited ground”, Fig. 3.41 (c) is the “on-chip interconnect with limited ground”, and a closer look at the “QP improved with limited ground” is shown in Fig. 3.41 (d). The widths of the ground planes are set equal to the width of the signal line and nodule. The nodule width (Wi), as shown in Fig. 3.31 (e), at the interface between the on-chip interconnect and the nodule is set to 10 µm when the nodule widths are 50 µm or 100 µm, and 2 µm when the nodule widths are 10 µm or 20 µm. Both low and high resistivity silicon substrates are included in the simulations. For 50 µm and 100 µm wide nodules, simulations up to 40 GHz were performed; for 10 µm and 20 µm wide nodules, simulations up to 200 GHz were conducted.

Figure 3.41. Prototypes in Ansoft HFSS: (a) “simple QP with limited ground”, (b) “QP improved with limited ground”, (c) on-chip interconnect with limited ground, and (d) a closer look at “QP improved with limited ground”.

Figures 3.42 and 3.43 show the return loss and insertion loss of both QP structures with 100 µm nodules with wide ground plane and limited ground plane on low and high resistivity silicon substrate. The “QP improved with limited ground” structure was compared with “QP improved 2”. As we can see, the difference between the two insertion losses at 40 GHz is less than 0.05 dB, and the difference between the two return losses is less than 5 dB in the whole frequency range.

[Figure 3.42 plots: (a) S11 (dB) and (b) S21 (dB) before de-embedding, 100 µm nodules, versus frequency (0 to 40 GHz); curves on 10 ohm-cm substrate: simple QP, QP improved 2, on-chip interconnect, and their limited-ground counterparts.]

Figure 3.42. Simulated return loss (a) and insertion loss (b) of 100 µm wide QP structures with wide and limited ground plane on low resistivity silicon substrate before de-embedding.

[Figure 3.43 plots: (a) S11 (dB) and (b) S21 (dB) before de-embedding, 100 µm nodules, versus frequency (0 to 40 GHz); curves on 8000 ohm-cm substrate: simple QP, QP improved 2, on-chip interconnect, and their limited-ground counterparts.]

Figure 3.43. Simulated return loss (a) and insertion loss (b) of 100 µm wide QP structures with wide and limited ground plane on high resistivity silicon substrate before de-embedding.

Figure 3.44 and 3.45 show the return loss and insertion loss of the 50 µm wide QP structures with wide and limited ground plane on both low and high resistivity substrate.

[Figure 3.44 plots: (a) S11 (dB) and (b) S21 (dB) before de-embedding, 50 µm nodules, versus frequency (0 to 40 GHz); curves on 10 ohm-cm substrate: simple QP, QP improved 2, on-chip interconnect, and their limited-ground counterparts.]

Figure 3.44. Simulated return loss (a) and insertion loss (b) of 50 µm wide QP structures with wide and limited ground plane on low resistivity silicon substrate before de-embedding.

[Figure 3.45 plots: (a) S11 (dB) and (b) S21 (dB) before de-embedding, 50 µm nodules, versus frequency (0 to 40 GHz); curves on 8000 ohm-cm substrate: simple QP, QP improved 2, on-chip interconnect, and their limited-ground counterparts.]

Figure 3.45. Simulated return loss (a) and insertion loss (b) of 50 µm wide QP structures with wide and limited ground plane on high resistivity silicon substrate before de-embedding.

Limited difference is observed up to 40 GHz.

For 10 and 20 µm wide QP structures, simulations up to 200 GHz were conducted to compare the performance, shown in Figs. 3.46 to 3.49.

[Figure 3.46 plots: (a) S11 (dB) and (b) S21 (dB) before de-embedding, 20 µm nodules, versus frequency (0 to 200 GHz); curves on 10 ohm-cm substrate: simple QP, QP improved 2, on-chip interconnect, and their limited-ground counterparts.]

Figure 3.46. Simulated return loss (a) and insertion loss (b) of 20 µm wide QP structures with wide and limited ground plane on low resistivity silicon substrate before de-embedding.

[Figure 3.47 plots: (a) S11 (dB) and (b) S21 (dB) before de-embedding, 20 µm nodules, versus frequency (0 to 200 GHz); curves on 8000 ohm-cm substrate: simple QP, QP improved 2, on-chip interconnect, and their limited-ground counterparts.]

Figure 3.47. Simulated return loss (a) and insertion loss (b) of 20 µm wide QP structures with wide and limited ground plane on high resistivity silicon substrate before de-embedding.

[Figure 3.48 plots: (a) S11 (dB) and (b) S21 (dB) before de-embedding, 10 µm nodules, versus frequency (0 to 200 GHz); curves on 10 ohm-cm substrate: simple QP, QP improved 2, on-chip interconnect, and their limited-ground counterparts.]

Figure 3.48. Simulated return loss (a) and insertion loss (b) of 10 µm wide QP structures with wide and limited ground plane on low resistivity silicon substrate before de-embedding.

[Figure 3.49 plots: (a) S11 (dB) and (b) S21 (dB) before de-embedding, 10 µm nodules, versus frequency (0 to 200 GHz); curves on 8000 ohm-cm substrate: simple QP, QP improved 2, on-chip interconnect, and their limited-ground counterparts.]

Figure 3.49. Simulated return loss (a) and insertion loss (b) of 10 µm wide QP structures with wide and limited ground plane on high resistivity silicon substrate before de-embedding.

From the figures above, we can see that the insertion loss and return loss are comparable for QP structures with wide ground planes and QP structures with limited ground. On low resistivity silicon substrate, the performance is similar in the whole frequency range (1 to 200 GHz), while for QP structures with limited ground on high resistivity substrate, a resonance occurs at about 140 GHz, which may limit the bandwidth. Even at 140 GHz, however, QP structures have much better performance than state-of-the-art packaging techniques in industry.

In this chapter, extensive simulations were conducted on different QP structures and compared with the on-chip interconnect. The discontinuity introduced by the nodules shows very limited effect on the system performance due to its small size. This strongly suggests that the QP structures have ultra-wide bandwidth and are suitable for fast chip-to-chip interconnection.


CHAPTER 4

QUILT-PACKAGING: FABRICATION PROCESS

In this chapter, we look closely at each step of the QP fabrication process. Three mask sets were designed to demonstrate the whole QP process. A preliminary fabrication process of QP is outlined in Fig. 4.1.

Figure 4.1. Fabrication process of quilt packaging: (a) wafer after the devices are formed and before the first metal layer is placed, (b) deep trenches at the edge of the chips by deep reactive ion etching (DRIE), (c) a closer look at (b), (d) passivation of trenches to form an insulation layer, (e) after seed layer deposition and electroless copper plating of the trenches, (f) chemical-mechanical polishing (CMP) of the wafer surface, (g) complete interconnection with pads, and (h) separation of chips by DRIE and CMP, with undercut removal by isotropic etching.

4.1 Calibration of DRIE

The first important process is to etch trenches (Fig. 4.1 b and c) with high aspect ratio in a silicon wafer. We use the Alcatel 601E deep plasma etching system running the Bosch process [1]. The Bosch process is a time-multiplexed deep etching (TMDE) process [2] that offers the advantage of high speed and high aspect ratio silicon etching using fluorine chemistries. The Bosch process uses two different gases: SF6 for silicon etching and C4F8 for passivation. The two gases are introduced independently, one at a time, as shown in Fig. 4.2, and the machine alternates between the etching cycle and the passivation cycle.

Figure 4.2. A typical Bosch process. (Adopted from [2].)

During the etching step, a shallow trench with an isotropic profile is formed in the silicon substrate by the SF6 discharge. The typical duration is ≤ 12 s. During the passivation step, a protective fluorocarbon film is deposited on all the surfaces by the C4F8 discharge. The duration is typically ≤ 10 s, and is shorter than the etching cycle. Then, ion bombardment causes the preferential removal of the film from the horizontal surface, and leaves the sidewall still covered with the protective film. A highly anisotropic feature is built, as shown in Fig. 4.3. The Bosch process in our DRIE system has an etching cycle of about 7 seconds and a passivation cycle of about 2 seconds.

Figure 4.3. Cartoon shows the sequential steps of the Bosch process. (Adopted from [2].)

The Bosch process works at about 20 deg. C, and shows a high selectivity towards the standard photoresist masks, reaching 200:1, and towards a large variety of hard masks (SiO2, Si3N4, etc.), exceeding 300:1 [3].

Because of the alternate etching and passivation cycles, and the isotropic nature of etch of fluorinated chemistries, Bosch-fabricated structures show a characteristic scalloped sidewall roughness, as shown in Fig. 4.4.


Figure 4.4. SEM picture of the sidewall after the Bosch process, which shows the typical scalloping profile. (Adopted from [2].)

For our Alcatel 601E, which runs the Bosch process, the etching rate is not uniform for trenches with different aspect ratios, a phenomenon called aspect ratio dependent etching (ARDE). The silicon etching rates for different profiles therefore need to be calibrated for our process. The standard Bosch process has an SF6 flow rate of 300 sccm and a C4F8 flow rate of 130 sccm, with step durations of 7 seconds and 2 seconds, respectively. The pressure is set by the position of the automatic pressure control, at 23%. The plasma is generated by a source power of 1800 W, and the substrate power is 80 W. The silicon substrate temperature is maintained at 20 deg. C during the whole process.
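For bookkeeping, the standard recipe quoted above can be captured in a small record. The sketch below is only an illustrative container for the stated parameters; the class and field names are hypothetical and do not correspond to the Alcatel 601E control software.

```python
# Sketch: a plain record of the standard Bosch recipe parameters quoted in the text.
# Class and field names are hypothetical, not the etcher's own parameter names.
from dataclasses import dataclass


@dataclass(frozen=True)
class BoschRecipe:
    sf6_sccm: float = 300.0             # etch gas flow
    c4f8_sccm: float = 130.0            # passivation gas flow
    etch_step_s: float = 7.0            # SF6 step duration
    passivation_step_s: float = 2.0     # C4F8 step duration
    apc_position_percent: float = 23.0  # automatic pressure control position
    source_power_w: float = 1800.0
    substrate_power_w: float = 80.0
    substrate_temp_c: float = 20.0

    @property
    def cycle_s(self) -> float:
        """Duration of one etch-plus-passivation cycle."""
        return self.etch_step_s + self.passivation_step_s

    @property
    def etch_duty(self) -> float:
        """Fraction of the run time spent in the etch step."""
        return self.etch_step_s / self.cycle_s

    def full_cycles(self, run_time_s: float) -> int:
        """Number of complete cycles that fit into a run of the given length."""
        return int(run_time_s // self.cycle_s)


recipe = BoschRecipe()
for minutes in (3, 5, 10, 20):          # calibration etch times used in this chapter
    print(f"{minutes} min run: {recipe.full_cycles(minutes * 60)} full cycles")
```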

A one-layer test mask set for the DRIE calibration was designed and fabricated at Notre Dame. Long trenches, chosen for easy cleavage [4] so that the cross-section can be inspected with our scanning electron microscope (SEM), were etched using the Bosch process. The dimensions of the long trenches are shown in Table 4.1. The minimum width of the lines is 1 µm, and the maximum width of the lines is 20 µm. Different spacings ranging from 2 µm to 20 µm are used to observe the effects on silicon etching rate.

TABLE 4.1

DIMENSION AND SPACING OF LONG LINES.

Line (width × length)    Spacing    Number    Spacing    Number
1 µm × 0.8 cm            2 µm       20        5 µm       20
2 µm × 0.8 cm            2 µm       20        5 µm       20
5 µm × 0.8 cm            5 µm       20        10 µm      20
10 µm × 0.8 cm           5 µm       20        10 µm      20
20 µm × 0.8 cm           10 µm      20        20 µm      20

The test samples were prepared as follows:

1. RCA clean wafers: a. Dip 4-inch silicon wafers for 10 minutes into RCA 1 solution containing de-ionized (DI) H2O : H2O2 : NH4OH = 5:1:1, which is pre-heated to 70 °C. The purpose is to remove insoluble organic contaminants; b. Rinse wafers in DI water for 1 minute; c. Submerge the wafers for 10 minutes into RCA 2 solution consisting of DI H2O : H2O2 : HCl = 6:1:1, which is also pre-heated to 70 °C. The purpose is to remove ionic and heavy metal atom contaminants; d. Rinse in DI water for 1 minute.

2. HF clean wafers: a. Submerge wafers into HF : H2O = 1:10 solution for 20 seconds to remove the thin layer of native oxide on the silicon wafers; b. Rinse in DI water for 2 minutes; c. Spin or blow dry.

3. Spin photoresist AZ 5214 on the wafers at a speed of 3000 rpm for 30 seconds, which results in a coating about 2 µm thick on the wafers.

4. Perform photolithography using the mask on the 6300 wafer stepper. The dose needed for AZ 5214 is 120 mJ/cm2, while the stepper delivers around 24 mJ/cm2 per second, so the duration of each exposure is 5 seconds.

5. Bake wafers at 110 °C on the hot plate for 1 minute for image reversal.

6. Flood exposure: the overall dose is in the range of 300 ~ 350 mJ.

7. Develop photoresist in AZ 327 for 50 ~ 60 seconds: the development time is longer when the exposed pattern is smaller, in this case about 1 µm. It is very hard to develop 1 µm wide patterns. A much longer time is needed, but this causes over-development of the relatively larger patterns.
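The exposure time in step 4 follows from dividing the required dose by the intensity delivered at the wafer. A minimal sketch of that arithmetic is given below; the function name is illustrative, and the 24 mJ/cm2 figure is interpreted as a dose rate per second (24 mW/cm2), which is what the 5 second result implies.

```python
# Sketch: exposure time = required dose / delivered intensity.
# Interprets the stepper figure as 24 mJ/cm^2 per second (24 mW/cm^2), an assumption
# consistent with the 5 s exposure stated in the procedure.

def exposure_time_s(required_dose_mj_cm2: float, intensity_mw_cm2: float) -> float:
    """Exposure duration in seconds for a given resist dose and lamp intensity."""
    if intensity_mw_cm2 <= 0:
        raise ValueError("intensity must be positive")
    return required_dose_mj_cm2 / intensity_mw_cm2


print(exposure_time_s(120.0, 24.0))  # 5.0 s for AZ 5214 on the stepper
```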

Four wafers were prepared using the same steps from 1 to 7. The wafers were loaded into the Alcatel system and the Bosch process was run to etch the exposed silicon area. The etch times were 3, 5, 10 and 20 minutes, respectively. Figure 4.5 shows the SEM pictures of the etched trenches after 3 minutes of Bosch process.

Figure 4.5. SEM pictures of trenches after 3 minutes of Bosch etch: (a), (b): 2 µm wide; (c), (d): 5 µm wide; (e), (f): 10 µm wide; and (g), (h): 20 µm wide.

From the above pictures, we can see that the spacing does not affect the etching speed, but the width (aspect ratio) of the lines has a huge effect on the etch rate. Figure 4.6 shows the etch rate for different widths.

[Figure 4.6 plot: etch depth (µm) versus etch time (0 to 20 min); curves: 2 µm, 5 µm, 10 µm and 20 µm wide trenches.]

Figure 4.6. Etch rate of Bosch process.

The etch rate is not uniform with time. This is because the Bosch process uses SF6 to etch trenches in silicon. The ions hit the silicon, and some residues are generated. The etched residues inside the trench need to be removed before the next effective ion etching. So, when the trench depth increases, the etch rate decreases. To etch 20 µm wide trenches to a depth of 20 µm, the etch time is around 70 s.
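The calibration point quoted above (20 µm wide trenches etched 20 µm deep in about 70 s) can be turned into an average etch rate, an aspect ratio and an approximate depth per Bosch cycle. The sketch below is only this arithmetic; it assumes the 9 s cycle (7 s etch plus 2 s passivation) of the standard recipe, and the function names are illustrative. Because the rate falls as the trench deepens, these are averages over the run, not instantaneous rates.

```python
# Sketch: average figures derived from the single calibration point in the text.
# cycle_s = 9 s assumes the 7 s etch + 2 s passivation steps of the standard recipe.

def average_etch_rate_um_per_min(depth_um: float, etch_time_s: float) -> float:
    """Average vertical etch rate over the whole run."""
    return depth_um / etch_time_s * 60.0


def aspect_ratio(depth_um: float, width_um: float) -> float:
    """Trench depth divided by trench width."""
    return depth_um / width_um


def depth_per_cycle_um(depth_um: float, etch_time_s: float, cycle_s: float = 9.0) -> float:
    """Average depth advanced per etch-plus-passivation cycle."""
    return depth_um / (etch_time_s / cycle_s)


print(f"average rate: {average_etch_rate_um_per_min(20.0, 70.0):.1f} um/min")
print(f"aspect ratio: {aspect_ratio(20.0, 20.0):.1f}")
print(f"depth per cycle: {depth_per_cycle_um(20.0, 70.0):.2f} um")
```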

From the fabrication process of quilt-packaging shown in Fig. 4.1, we know that after the formation of embedded copper nodules, another etch step is needed to remove the silicon underneath part of the nodules. The Bosch process yields anisotropic etching in the vertical direction because of the alternating etching and passivation cycles; at the same time, it has the ability and flexibility to change the silicon etch profile [5]. By correctly changing Bosch process parameters, such as pressure, flow rate, cycle time and power, we can achieve the reentrant etch to expose the copper nodules while separating the chips (Fig. 4.1 h).

4.2 Copper Plating

The step following the trench etch (Fig. 4.1 d) is to deposit a layer of silicon dioxide using plasma enhanced chemical vapor deposition (PECVD) on the pattern side and the back side of the wafer. The pattern side deposition is for isolation between the silicon substrate and the following metal seed layer and copper plating. Since the silicon substrate has a finite resistivity, the back side deposition is to create a protection layer, which prevents the plating process from affecting the substrate. PECVD is a conformal deposition process, which ensures the thickness of the deposited material is almost the same on the bottom and on the sidewall. It is appropriate for the isolation.

Both electroless and electrolytic copper plating processes were tested to fill the trenches (Fig. 4.1 e).

4.2.1 Electroless Copper Plating

Electroless plating is a selective plating process in which metal ions are reduced to a metallic coating by the reducing agent in the solution. Plating takes place only in suitable catalytic places. Electroless plating offers a number of advantages over electroplating, such as selective deposition, which has been widely used for the damascene process in on-chip interconnect formation. Typical electroless plating formulations contain [6]:

1. A source of metal ions;

2. A reducing agent to help prevent precipitation in bulk solution;

3. A complexant to depress the free metal ion concentration to a value determined by the dissociation constant of the metal complex, and allow the bath to be operated at higher pH values;

4. A buffer to stabilize the pH of the solution;

5. Exaltants to increase the rate to an acceptable level without causing bath instability;

6. Stabilizers to adsorb on the active nuclei and shield them from the reducing agent in the plating solution.

Electroless copper plating is a low temperature process that introduces less thermal stress and is compatible with IC techniques. It can be performed on a catalytic surface due to the anodic oxidation of the reducing agent and cathodic reduction of copper ions. H. Jiang et al. [7][8] and J.-L.A. Yeh [9] presented a method to electrolessly plate copper on silicon to form high-Q on-chip inductors. Since silicon is not thermodynamically favorable for the initiation of electroless copper plating, a catalytic metal (palladium) is used to activate the silicon surface by the mechanism of contact displacement to form a seed layer where the electroless deposition can begin.

The contact displacement deposition is a reaction in which electrons are supplied by the substrate rather than by an oxidation reaction, so the surfaces of silicon dioxide and silicon nitride remain inert. The activation solution also contains diluted HF to remove the native oxide on the silicon surface. Formaldehyde is used as the reducing agent. The main reaction of copper plating can be expressed as [10]:

$$\mathrm{Cu^{2+} + 2\,HCHO + 4\,OH^{-} \rightarrow Cu + H_{2} + 2\,H_{2}O + 2\,HCOO^{-}}. \tag{4.1}$$

This electrochemical mechanism can be decomposed into two partial reactions, which happen simultaneously at the electrodes. First, the copper deposition happens on the palladium surface [11]:

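$$\text{Anode:}\quad \mathrm{HCHO + 3\,OH^{-} \rightarrow HCOO^{-} + 2\,H_{2}O + 2\,e^{-}}. \tag{4.2}$$

$$\text{Cathode:}\quad \mathrm{Cu^{2+} + 2\,e^{-} \rightarrow Cu}. \tag{4.3}$$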

On the surface of deposited copper, the reaction at the cathode remains the same while at the anode, it becomes [12]:

$$\text{Anode:}\quad \mathrm{2\,HCHO + 4\,OH^{-} \rightarrow 2\,HCOO^{-} + 2\,H_{2}O + H_{2} + 2\,e^{-}}. \tag{4.4}$$
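As a consistency check on reactions (4.1) to (4.4), the short sketch below verifies that each one is balanced in both atoms and charge. The species table and helper names are written for this check only and are not taken from the cited references.

```python
# Sketch: atom and charge balance check for reactions (4.1)-(4.4).
# The species table and helper names are illustrative, written for this check only.
from collections import Counter

# name -> (element counts, charge)
SPECIES = {
    "Cu2+":  ({"Cu": 1}, +2),
    "Cu":    ({"Cu": 1}, 0),
    "HCHO":  ({"C": 1, "H": 2, "O": 1}, 0),
    "OH-":   ({"O": 1, "H": 1}, -1),
    "HCOO-": ({"C": 1, "H": 1, "O": 2}, -1),
    "H2O":   ({"H": 2, "O": 1}, 0),
    "H2":    ({"H": 2}, 0),
    "e-":    ({}, -1),
}


def totals(side):
    """Sum element counts and charge over a list of (coefficient, species) terms."""
    atoms, charge = Counter(), 0
    for coeff, name in side:
        composition, q = SPECIES[name]
        for element, n in composition.items():
            atoms[element] += coeff * n
        charge += coeff * q
    return atoms, charge


def is_balanced(lhs, rhs):
    """True when both sides carry the same atoms and the same net charge."""
    return totals(lhs) == totals(rhs)


REACTIONS = {
    "4.1": ([(1, "Cu2+"), (2, "HCHO"), (4, "OH-")],
            [(1, "Cu"), (1, "H2"), (2, "H2O"), (2, "HCOO-")]),
    "4.2": ([(1, "HCHO"), (3, "OH-")],
            [(1, "HCOO-"), (2, "H2O"), (2, "e-")]),
    "4.3": ([(1, "Cu2+"), (2, "e-")],
            [(1, "Cu")]),
    "4.4": ([(2, "HCHO"), (4, "OH-")],
            [(2, "HCOO-"), (2, "H2O"), (1, "H2"), (2, "e-")]),
}

for label, (lhs, rhs) in REACTIONS.items():
    print(f"Eq. ({label}) balanced: {is_balanced(lhs, rhs)}")
```

Adding the cathode reaction (4.3) to the anode reaction (4.4) cancels the two electrons and reproduces the overall reaction (4.1).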

Figure 4.7 shows the encapsulated copper on a polysilicon surface. The electroless copper plating bath used in [9] consists of a cupric salt (5 g/l CuSO4 • H2O), a reducing agent (5 ml/l HCHO), and a complexant [15 g/l ethylenediaminetetraacetic acid (EDTA)]. The buffer used to adjust the bath pH level to 12-13 is potassium hydroxide (KOH). Additives in the bath are the surfactant RE610, the brightener 2,2’-dipyridyl and the surfactant Triton X-100, which enhance bath stability and improve the metallurgical quality of the plated copper.

Figure 4.7. 1 µm thick copper is electroless plated on the surface of 1.5 µm thick polysilicon. (Adopted from [9])

The bath is agitated at about 100-300 rpm, and the temperature is between 55-66 deg. C. The copper resistivity (deposited at 66 deg. C) is 2.1 µΩ⋅cm , and the deposition rate is around 55 nm/min.
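To put these figures in perspective, the short Python sketch below converts the quoted deposition rate and resistivity into a plating time and a sheet resistance for the 1 µm film of Fig. 4.7. It assumes a constant deposition rate and a uniform film, which the source does not state; the function names are illustrative.

```python
def plating_time_min(thickness_nm, rate_nm_per_min):
    """Time to reach a thickness, assuming a constant deposition rate."""
    return thickness_nm / rate_nm_per_min

def sheet_resistance_ohm_per_sq(resistivity_uohm_cm, thickness_um):
    """Sheet resistance R_s = rho / t for a uniform film."""
    rho_ohm_m = resistivity_uohm_cm * 1e-8  # 1 uOhm*cm = 1e-8 Ohm*m
    thickness_m = thickness_um * 1e-6
    return rho_ohm_m / thickness_m

if __name__ == "__main__":
    # Values quoted from [9]: 55 nm/min, 2.1 uOhm*cm, 1 um thick film.
    print(f"time for 1 um: {plating_time_min(1000.0, 55.0):.1f} min")
    print(f"sheet resistance: {sheet_resistance_ohm_per_sq(2.1, 1.0):.3f} ohm/sq")
```

Under these assumptions, a 1 µm film takes roughly 18 minutes to plate and has a sheet resistance of about 0.021 Ω per square.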

R. Jagannathan and M. Krishnan [13] described another method of electroless copper plating. Instead of using a Pd catalytic solution to activate the silicon surface, they evaporated a layer of Cr/Cu (20 nm/50 nm) on the surface as the seed layer. The solutions were formulated with copper sulfate, a tetraaza ligand, and a suitable buffer such as triethanolamine. Amine borane reducing agents are used instead of the conventional formaldehyde, which permits the deposition to operate at a pH level of ≤ 9. Figure 4.8 shows two electroless copper plating examples. The resistivity of the deposited copper reaches 1.83-1.95 µΩ⋅cm.


Figure 4.8. SEM pictures of polyimide trenches filled with electroless plated copper: (a) 0.7 µm wide, 1.9 µm deep and (b) 0.7 µm wide, 2.8 µm deep. (Adopted from [13])

Our experiments on electroless copper plating on a Cr/Cu seed layer are presented in the following sections.

4.2.2 Electrolytic Copper Plating

Electrolytic plating is an electrochemical reaction that takes place under the influence of electric current. During the process, the wafer with the patterned seed layer is connected as the negatively biased cathode. The electroplated metal (Cu) is connected as the positively biased anode. Both cathode and anode are immersed in a conductive plating solution containing cupric ions. When sufficient current passes from anode to cathode, Cu2+ is reduced to Cu at the cathode as

$$\mathrm{Cu^{2+} + 2\,e^- \rightarrow Cu},$$

while at the anode, an oxidation reaction occurs as

$$\mathrm{Cu \rightarrow Cu^{2+} + 2\,e^-}.$$

The current is conducted by the flow of positive Cu2+ ions through the electroplating solution. All the cupric ions that are consumed from the solution at the cathode are replaced by the dissolution from the copper anode, which balances the current flow and maintains charge neutrality in the solution [14].
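Because two electrons are transferred per deposited copper atom, Faraday's law of electrolysis relates the cathode current density to the deposition rate. The sketch below is a rough estimate only: it uses textbook constants for copper (molar mass 63.546 g/mol, bulk density 8.96 g/cm³), assumes by default 100% current efficiency, and the 2 A/dm² operating point in the example is an illustrative value.

```python
FARADAY_C_PER_MOL = 96485.0   # Faraday constant
M_CU_G_PER_MOL = 63.546       # molar mass of copper
RHO_CU_G_PER_CM3 = 8.96       # bulk density of copper (assumed for the deposit)
N_ELECTRONS = 2               # Cu2+ + 2e- -> Cu

def deposition_rate_um_per_min(j_a_per_dm2, efficiency=1.0):
    """Thickness growth rate from Faraday's law for a given current density."""
    j_a_per_cm2 = j_a_per_dm2 / 100.0  # 1 dm^2 = 100 cm^2
    cm_per_s = (efficiency * j_a_per_cm2 * M_CU_G_PER_MOL
                / (N_ELECTRONS * FARADAY_C_PER_MOL * RHO_CU_G_PER_CM3))
    return cm_per_s * 1e4 * 60.0       # cm/s -> um/min

if __name__ == "__main__":
    print(f"{deposition_rate_um_per_min(2.0):.3f} um/min at 2 A/dm^2")
```

The estimate applies to a uniform blanket deposit; local rates inside trenches depend on additives and transport, as discussed later in this section.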

Copper has been widely adopted in on-chip interconnects in state-of-the-art microelectronics for its superior conductivity and electromigration resistance compared with aluminum and copper-aluminum alloy, thus enabling a higher current density and sustaining the trend to narrower line-widths. However, copper cannot be effectively applied using ‘dry’ deposition techniques [15], such as physical vapor deposition (PVD) and chemical vapor deposition (CVD). PVD cannot provide sufficient fill, while CVD requires costly, unstable and hazardous organometallics. Electrolytic plating is chosen over electroless plating in this area because electroless plating is slower, involves more complicated and costly chemistry and control, and requires frequent bath replacement.

An efficient processing sequence called “dual damascene” combines electrolytic copper plating and chemical mechanical polishing (CMP) to metallize both trenches and vias in a single step, which results in lower cost and fewer defects. It is a powerful technique in the silicon IC fabrication process. The schematic of the “dual damascene” process flow is shown in Fig. 4.9.

Figure 4.9. Schematic process flow of dual damascene. Panels: trench etch; via PR; via etch; barrier layer (Ta, TaN, Ti or TiN) and copper seed layer; electrolytic copper plating; after CMP.

There are several key issues in the copper metallization process [15]. The first, and most important, is the requirement of completely filling the vias and the trenches. To achieve this, copper cannot be plated in “normal” mode, in which the top ridges build up first due to the enhanced transport and higher field at regions of small radius of curvature, so that voids occur during the process. Conformal plating, which results from suppressing the deposition process through the use of excess additives, is also not acceptable, because seams can be generated in the center. Bottom-up plating, also called “super-fill”, is the preferred process.

The key is IBM’s discovery that bottom-up fill can be achieved by using a mixture of plating additives that have differential adsorption on the flat surface and within the via. During the process, the copper growth starts from the bottom of the via and rapidly progresses upwards. Figure 4.10 shows the schematics of the three plating techniques.

Figure 4.10. Via plating patterns: (a) normal plating; (b) conformal plating; (c) bottom-up plating. (Adopted from [15])

The typical mixture used in a bottom-up plating solution contains a relatively large, slowly diffusing suppressive additive (e.g. polyethylene glycol) that preferentially adsorbs on the flat surface and the rims of the vias and slows down the plating rate in these areas, combined with another additive (e.g. an organic sulfur compound) that preferentially adsorbs within the vias and speeds up the plating rate (or negates the effect of the suppressive additive) at the bottom. Figure 4.11 shows the distribution of the additives close to and within a via. Variable adsorption leads to variable kinetics and bottom-up fill. Figure 4.12 shows SEM pictures of the three plating techniques.

Figure 4.11. Additives distribution near and within a via. (adopted from [15])

Figure 4.12. Time evolution of via fill (columns: initial stage, midway, final deposit). The top row is normal plating. The center row is conformal plating. The bottom row is bottom-up plating. (Adopted from [15])

In addition to the use of additives in a plating solution, pulse and pulse-reverse current electroplating have also been exploited to improve the filling of high-aspect-ratio features [16][17][18]. Simple schematics of pulse and pulse-reverse current waveforms are shown in Fig. 4.13.

Figure 4.13. (a) Pulse and (b) pulse-reverse current waveforms.

Compared with DC plating, pulse and pulse-reverse plating have several potential advantages, such as improved deposition properties, improved current efficiency, and reduced use of additives. In addition, plating uniformity on the wafer scale is very important, because a more uniform deposit places less burden on the CMP process. This is not a trivial challenge: it is not easy to achieve highly uniform flow across the entire wafer, and differences in feature density (“loading”) across the wafer cause variations in the local current densities and in transport. For pulse and pulse-reverse plating, the time-averaged plating uniformity in the features depends on several factors, such as operating conditions, bath composition, and feature size. Pulse-reverse plating frequently results in greater improvements in throwing power than pulse plating [18], where the throwing power in electroplating is qualitatively defined as the ability of a solution to deposit metal uniformly upon a cathode of irregular shape.

Other than filling ability and uniformity, the seed layer resistance effect, which may lead to thicker deposition near the wafer circumference, is also a concern. Depositing the seed layer in aggressive geometries, such as narrow, high-aspect-ratio features with perpendicular or backward-sloping sidewalls, requires advanced PVD techniques or an alternative seeding process such as electroless plating or CVD. The seed layer is often very thin, especially towards the bottom of the sidewalls. The thin layer may become discontinuous due to copper agglomeration and lead to gap filling failure.

After plating, the copper continuously undergoes recrystallization, in which the relatively small as-plated copper grains (~ 0.1 µm) grow to about 1 µm over a period of time, generally several days at room temperature [15]. Larger grains are preferable because they have higher conductivity and less electromigration.

4.3 Chemical Mechanical Polishing

Chemical mechanical polishing (CMP) processes are widely adopted nowadays in ultra-large-scale-integrated (ULSI) circuits to planarize wafers, remove films and construct damascene interconnects. CMP relies on combined chemical-mechanical action, as its name suggests, and not on mechanical abrasion alone to polish the surface. High areas of the wafer are removed without affecting low areas, which leaves the wafer surface planar. CMP is carried out using slurries, which consist of fine abrasive particles suspended in an aqueous medium containing chemical reagents. In the CMP process, the wafer is affixed to a rotating wafer carrier by back pressure, and pressed face-down onto a rotating platen which holds a polishing pad, as shown in Fig. 4.14 [19].


Figure 4.14. Configuration of CMP tool. (Adopted from [19].)

Slurry held in suspension is dripped onto the platen during polish. The carrier and the platen rotate at different speeds in the same or opposite directions.

The polishing process combines mechanical and chemical actions. The chemical reagents in slurries soften the material and make the mechanical abrasion easier. The polishing speed is dramatically increased and the roughness of the surface is decreased.

Copper has been the choice for interconnect material in ULSI technology due to its low resistivity. Damascene technology using copper CMP has been the best way to pattern copper so far. The idea is to deposit a diffusion-barrier layer (usually Ta- or Ti-based) and a copper seed layer in the etched dielectric trenches. Then, electroplating or electroless plating is performed to fill the trenches, with excess copper extending above the surface. CMP is then performed to remove the excess Cu and the barrier layer without excessive loss of interconnect lines and dielectrics. Figure 4.15 shows a single layer of Cu interconnect before and after CMP.


Figure 4.15. Schematic of Cu CMP. (a) Before CMP; (b) ideal case after CMP; and (c) real case after CMP. (Adopted from [20].)

In the ideal case, the CMP process finishes removing the excess copper and barrier layer simultaneously over the whole wafer area, as shown in Fig. 4.15 (b). However, there is always a lack of uniformity in the real case. The removal rates differ for each layer and depend on the underlying pattern geometry [20]. Because the excess Cu and barrier layer must be removed thoroughly, the dielectric layer is over-polished, resulting in erosion. And, because the Cu is softer than the hard dielectric material, it is polished faster, and the copper line is dished as shown in Fig. 4.15 (c). Erosion and dishing reduce the thickness of both the dielectric layer and the Cu interconnects, and decrease the surface planarity of the Cu, which can affect the performance of chips significantly. Lai et al. [21] and Gbondo-Tugbawa [22] reported that erosion is more significant than dishing in regions of small Cu interconnect linewidth, such as the device-level features, and dishing is more important than erosion in regions of large Cu interconnect linewidth, such as the top layers of a multi-level damascene structure and our proposed quilt-packaging configuration.

Several adjustable parameters control the CMP process, as listed in Table 4.2 [23].

TABLE 4.2 ADJUSTABLE CMP PARAMETERS. (Adopted from [23].)

Parameter                                CMP Effect
---------------------------------------  ------------------------------------------------------
Pressure                                 Removal Rate, Removal Uniformity
Carrier Velocity                         Removal Rate, Removal Uniformity
Pad Velocity                             Removal Rate, Removal Uniformity
Slurry Chemistry                         Removal Rate, Surface Topography
Slurry Abrasive (type, size, percent)    Removal Rate, Surface Topography
Pad Hardness                             Removal Rate, Removal Uniformity, Surface Topography
Pad Porosity                             Slurry Distribution, Removal Uniformity
Carrier Film Hardness                    Removal Uniformity

4.3.1 Theory

Copper CMP can be performed in either acidic or basic media. When the slurry is at a high pH level, copper will react to form a hydrated copper layer [24]:

$$\mathrm{Cu + 2\,OH^- \rightarrow Cu(OH)_2 + 2\,e^-} \tag{4.5}$$

This altered layer is then removed by the mechanical action of the pad and slurry abrasives. Copper can also be removed in low pH media, using a slurry chemistry that rapidly etches copper together with a slurry component, such as benzotriazole (BTA), that passivates the copper surface and blocks the etching process. Copper is etched in low pH solution, as the copper dissolves as Cu2+ ions while oxygen is reduced [24]:

$$\mathrm{Cu \rightarrow Cu^{2+} + 2\,e^-} \tag{4.6}$$

$$\mathrm{O_2 + 2\,H_2O + 4\,e^- \rightarrow 4\,OH^-} \tag{4.7}$$

BTA can protect copper and inhibit the reaction. The abrasive in the slurry then removes the BTA and allows copper to be etched and dissolved in the high regions to achieve planarization.

The polishing rate can be expressed using the Preston equation [25]

$$\frac{dh}{dt} = k P v_R, \tag{4.8}$$

where $k$ is the Preston constant, $P$ is the pressure (force per unit area), and $v_R$ is the relative velocity. From [26], $k$ can be defined as the ratio of the wear coefficient $k_w$ to the hardness $H$ of the polished material. So, the Preston equation can be rewritten as

$$\frac{dh}{dt} = \frac{k_w}{H} P v_R. \tag{4.9}$$

The wear coefficient is dependent on the polishing mechanism and is insensitive to the polished material; $k_w$ remains roughly constant for different coating materials in the CMP process. For copper CMP, the dielectric layer is usually SiO2 if no low-k material is used. If we assume $k_w$ is the same for Cu and SiO2, then the polishing rates for Cu and SiO2 can be expressed as

$$\left(\frac{dh}{dt}\right)_{Cu} = \frac{k_w}{H_{Cu}} P_{Cu} v_R, \tag{4.10}$$

and

$$\left(\frac{dh}{dt}\right)_{SiO_2} = \frac{k_w}{H_{SiO_2}} P_{SiO_2} v_R, \tag{4.11}$$

where $H_{Cu}$ and $H_{SiO_2}$ are the hardness of Cu and SiO2, respectively. If the amount of Cu dishing remains the same during the over-polishing time, the copper removal rate should be uniform and the same as the oxide removal rate:

$$\left(\frac{dh}{dt}\right)_{Cu} = \left(\frac{dh}{dt}\right)_{SiO_2}. \tag{4.12}$$

Substituting eqs. (4.10) and (4.11) into (4.12), and noting that the relative velocity is almost the same for both Cu and SiO2, we find the relation between the pressures on Cu and SiO2 and the hardness of the two materials:

$$\frac{P_{Cu}}{P_{SiO_2}} = \frac{H_{Cu}}{H_{SiO_2}}. \tag{4.13}$$
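For reference, the intermediate algebra behind eq. (4.13) can be written out explicitly. It relies only on the assumptions already stated in the text, namely that $k_w$ and $v_R$ are the same for both materials, so they cancel:

```latex
\frac{k_w}{H_{Cu}}\, P_{Cu}\, v_R = \frac{k_w}{H_{SiO_2}}\, P_{SiO_2}\, v_R
\;\Longrightarrow\;
\frac{P_{Cu}}{H_{Cu}} = \frac{P_{SiO_2}}{H_{SiO_2}}
\;\Longrightarrow\;
\frac{P_{Cu}}{P_{SiO_2}} = \frac{H_{Cu}}{H_{SiO_2}}
```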

Since the pressure does not vary much in different areas of the die, while the hardness differs for different materials, the balance in eq. (4.13) is generally not satisfied and dishing continues to grow during over-polishing. To reduce dishing, we must therefore try to decrease the over-polishing time. Continuous investigation is required to obtain better surface planarity.
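The dependence of dishing on over-polishing time can be illustrated with a minimal Python sketch based on eqs. (4.10) and (4.11). It makes the simplifying assumption that Cu and SiO2 see the same, constant pressure, so the dishing depth grows linearly with over-polish time. All numerical values in the example (wear coefficient, hardness values, relative velocity) are hypothetical placeholders chosen for illustration and are not taken from the experiments.

```python
def preston_rate(k_w, hardness_pa, pressure_pa, v_rel_m_per_s):
    """Removal rate dh/dt = (k_w / H) * P * v_R, in m/s (eq. 4.9)."""
    return k_w / hardness_pa * pressure_pa * v_rel_m_per_s

def dishing_depth_m(k_w, h_cu_pa, h_sio2_pa, pressure_pa,
                    v_rel_m_per_s, overpolish_s):
    """Extra Cu removed relative to SiO2 during over-polish.

    Assumes equal, constant pressure on both materials, a simplification
    that ignores the pressure redistribution implied by eq. (4.13).
    """
    rate_cu = preston_rate(k_w, h_cu_pa, pressure_pa, v_rel_m_per_s)
    rate_ox = preston_rate(k_w, h_sio2_pa, pressure_pa, v_rel_m_per_s)
    return (rate_cu - rate_ox) * overpolish_s

if __name__ == "__main__":
    # Hypothetical placeholder values, for illustration only.
    params = dict(k_w=1e-5, h_cu_pa=1.0e9, h_sio2_pa=8.0e9,
                  pressure_pa=1.4e5, v_rel_m_per_s=0.5)
    for t in (30.0, 60.0, 120.0):
        depth_nm = dishing_depth_m(overpolish_s=t, **params) * 1e9
        print(f"over-polish {t:5.0f} s -> dishing {depth_nm:.1f} nm")
```

In this simplified picture, halving the over-polish time halves the dishing, which matches the qualitative conclusion drawn above.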

4.4 Experiments on Quilt-Packaging Fabrication with Electroless Plating

The first test process for copper nodule formation in quilt packaging used electroless copper plating on a Cr/Cu seed layer inside the silicon trenches [27]. The plating solution used was PC electroless Cu from Transene Inc., which consists of three solutions. Solution A consists of ethylene diamine tetraacetate tetrasodium salt, triethanolamine, diethanolamine, copper sulfate, sodium hydroxide, and aqueous solution. Solution B is formulated of methanol, formaldehyde, aqueous solution and formalin. Solution D consists of palladium chloride, hydrochloric acid and aqueous solution. Solution D is for the activation, and A and B are for the plating. Since the seed layer consists of Cu, which does not need to be activated, only the A and B mixtures were used here.

A new two-layer mask set was designed for this experiment, as shown in Fig. 4.16.


Figure 4.16. Two-layer mask set for electroless plating experiment. Different shapes with different dimensions are included for the investigation of etching and plating process. The maximum width of lines and pads is 20 µm, and the minimum width is 2 µm.

The blue pattern is the first layer, which defines the area to be etched and plated with copper; the red pattern is the second layer, which defines the area for the long etching process using both Bosch and SF6-only etching. The fabrication process is shown in Fig. 4.1. Besides trench definition, electroless copper plating and CMP, there are two other issues that need to be addressed.

The first is the photoresist coverage of the etched trenches. Only the seed layer inside the trenches is to be left, and the seed layer in the other areas is removed. Because the trenches are deep, normal photoresist (e.g. AZ 1813 and AZ 5214) cannot be used here; a photoresist with high viscosity is required. The other issue is the mask for the long etch step that separates the chips. Similarly, normal photoresist cannot withstand such a long etch time. A hard mask such as aluminum may cause short circuits in the chip area, so a thick photoresist layer with relatively high etching selectivity in the Bosch process is needed. We find that photoresist AZ 4620 can meet both requirements. In the test, a one-minute Bosch etch was performed, which resulted in trenches about 5 µm deep. Figure 4.17 shows the trenches covered by AZ 4620 and the features with the seed layer remaining only inside the trenches after stripping the AZ 4620.

Figure 4.17. (a), (b): patterns covered by AZ 4620; and (c), (d): patterns after stripping AZ 4620. Only Cr/Cu inside the trenches is left and will serve as seed layer in electroless copper plating.

For copper nodule formation by electroless plating, five experiments were conducted: (1) room temperature with mild agitation; (2) 30 deg. C without agitation; (3) 35 deg. C without agitation; (4) 40 deg. C without agitation; (5) 45 deg. C without agitation.

The experiments above room temperature were run without agitation because no spin plate with a heating function was available. The plating rate at room temperature, even with mild agitation, is very slow. The patterns with seed layer were not completely covered by plated copper after 5 hours. More than 20 hours are needed for the copper filling in this situation. At 30 deg. C, after more than 7 hours, there are still voids inside the trenches. For the last three conditions, after 4 hours, the trenches are filled with copper as viewed by optical microscope. However, there are many precipitates on the surface of the wafer, as shown in Fig. 4.18. Figure 4.19 shows the surface profile after plating at 35 deg. C, measured using an AlphaStep 500. The surface profiles are similar for plating at 40 and 45 deg. C.

CMP followed to remove the precipitates and check the quality of the electroless plating.

Figure 4.18. Electroless copper plating after 4 hours in (a), (b): 35 deg. C; (c), (d): 40 deg. C; and (e), (f): 45 deg. C.

Figure 4.19. Surface profile after 4 hours of plating at 35 deg. C.

From the profile above, we can see that the copper precipitates crowded at the edge and in the center of the pads. The maximum height of the aggregated precipitates above the surface is more than 37 µm, which will be removed by the CMP process.

The instrument for our CMP process is from Logitech. The slurry is iCue® 4200 from Cabot Microelectronics. The slurry is first fully mixed using an impeller fixed to a multi-speed electrical mixer motor for at least 30 minutes. Then, hydrogen peroxide (H2O2) is added to the slurry at a volume ratio of H2O2 : slurry = 1 : 12, and the two are mixed together for another 15 minutes. Since this instrument has no temperature sensor with which to monitor the removal of copper, constant observation is needed to prevent over-polishing. The three wafers, plated at 35, 40 and 45 deg. C, were each processed in turn. Fig. 4.20 shows the microscope pictures and SEM pictures of the three wafers after CMP.

Figure 4.20. Microscope and SEM pictures after CMP for wafers plated at (a), (b): 35 deg. C; (c), (d): 40 deg. C; and (e), (f): 45 deg. C. The dark areas inside the patterns in the microscope pictures are voids.

The above pictures show that the plating process at 40 deg. C gives the best smoothness of copper after CMP. But even at this condition, there are still some voids in the patterns, which are not shown here. And, the depth of nodules is only 5 µm for all three wafers.

The next step is the separation of the chips by DRIE. For this experiment, the chips are only partially separated. The microscope and SEM pictures are shown in Fig. 4.21.

Figure 4.21. After partially separating the chips by DRIE for the wafers plated at (a)-(d): 35 deg. C; (e)-(h): 40 deg. C; and (i)-(l): 45 deg. C.

The copper nodules extended beyond the edges of the chips after DRIE. This first experiment demonstrated that the process is feasible for quilt-packaging fabrication. However, several issues needed to be addressed. The first is the seed layer deposition. In this experiment, evaporation was used to coat the trenches, and the coverage is poor because evaporation is directional. Generally, the evaporated metal atoms travel straight to the target, and when the trenches are narrow and deep, one side or even the bottom cannot receive the metal atoms, as shown in Fig. 4.22. The second is the electroless copper plating process. Even at 40 deg. C, which is the best of the three conditions, it still took four hours to fill trenches only 5 µm deep. Another plating solution, Printoganth PV from Atotech, was also tried, but did not provide the same quality and plating rate as the solution from Transene. To improve the robustness of the copper nodules by increasing their thickness, a much longer time would be needed. And, because of the conformal plating mechanism during the process, voids inside the trenches would be very hard to avoid. The third is the fine tuning of the CMP process. As mentioned before, constant observation is needed to avoid dishing due to over-polishing. The repeatability of CMP is very hard to control because the condition of the polishing pads changes during the process. The fourth is the separation of the chips by DRIE. The pictures in Fig. 4.21 show the diving-board effect, in which the copper nodules extend too far from the periphery of the chips. Careful design of the mask set and tuning of the parameters of the Bosch process are needed to improve the robustness of the nodules, which serve as chip-to-chip interconnects.

Figure 4.22. Simple schematic of evaporation process.

4.5 Experiments on Quilt-Packaging Fabrication with Electrolytic Plating

Because of the low plating rate and the voids in electroless plating, electrolytic plating was tested to achieve better reliability of the copper nodules in quilt packaging, although this increases the burden on CMP. The seed layer was changed to Ti/Cu, and sputtering was used to obtain conformal coverage inside the deep trenches. The detailed fabrication process along with the final mask set will be addressed in the next section. In this section, another mask set was designed and fabricated before the final simulations. An overview of this mask set is shown in Fig. 4.23. A total of 16 chips are included in the mask set. Three different chip-to-chip alignment techniques were tested, which are shown in Fig. 4.24.

Figure 4.23. Overview of mask set for test of QP fabrication process using electrolytic copper plating.

Figure 4.24. Alignment techniques for QP interconnection: (a) corner silicon etch; (b) nodule area silicon etch and (c) keyed nodules.

The orange patterns are the first layer of the mask, which defines the copper nodules. The blue patterns are the second layer of the mask, which is the metal layer for on-chip interconnects to the nodules and for on-chip calibrations. The green patterns are the third layer, which defines the DRIE area for separation of the chips. The purple patterns are the areas where the first layer and second layer overlap. Since QP is designed for very high speed inter-chip communications, misalignment of two chips will cause mismatch and discontinuities at the interface, so the alignment of two or more QP modules is critical. By trying the three techniques shown in Fig. 4.24, we wanted to find the most suitable one in terms of accuracy, reliability and ease of fabrication.

The electrolytic copper plating solution used here is InterVia Cu 8520 from Rohm and Haas. The solution has excellent micro via hole filling performance, and supports pulse and reverse-pulse current plating. The InterVia Cu 8520 bath consists of three solutions: InterVia Cu 8500, 8520A and 8520C. InterVia Cu 8500 is the basic solution that contains copper sulfate (CuSO4⋅5H2O), which provides copper ions; sulfuric acid, which provides the plating solution with conductivity and a leveling effect; and chloride, which is necessary to promote dissolution at the anode and affects the deposition reaction at the cathode. InterVia Cu 8520A and C are organic additives, which improve the filling performance and the plating properties of the film. The make-up procedure for 1 liter of the bath is: 1. add 984 ml of InterVia Cu 8500; 2. add 1.5 ml of InterVia Cu 8520C and mix well; 3. add 15 ml of InterVia Cu 8520A and mix well; 4. dummy plate at 2-3 A/dm2 for approximately 2 hours to confirm uniform plating appearance. Bottom-up plating can be achieved using this bath.
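For bath volumes other than 1 liter, the make-up quantities above can be scaled proportionally. The following Python sketch assumes simple linear scaling of the per-liter recipe, which the source does not state explicitly; the function name is illustrative. The dictionary preserves the order of addition given in the procedure.

```python
# Per-liter make-up volumes from the procedure above, in order of addition.
RECIPE_ML_PER_LITER = {
    "InterVia Cu 8500": 984.0,
    "InterVia Cu 8520C": 1.5,
    "InterVia Cu 8520A": 15.0,
}

def makeup_volumes_ml(bath_liters):
    """Scale the per-liter recipe linearly to the requested bath volume."""
    if bath_liters <= 0:
        raise ValueError("bath volume must be positive")
    return {name: ml * bath_liters for name, ml in RECIPE_ML_PER_LITER.items()}

if __name__ == "__main__":
    for name, ml in makeup_volumes_ml(2.5).items():
        print(f"add {ml:8.2f} ml of {name}")
```

Note that the three per-liter volumes sum to 1000.5 ml, so the recipe is only nominally 1 liter.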

The copper anode used here is an industry standard phosphorized copper (0.055%) from IMC. During the plating process, the dissolution of copper produces some Cu+, which then disproportionates in solution near the anode surface to produce copper metal and Cu2+ [28], forming black films on the copper anode. The addition of phosphorus to the copper anode significantly reduces the Cu+ concentration in solution near the anode.

The current waveform used in this experiment is shown in Fig. 4.25. The pulse current is 1 A, with on time of 2.5 ms and off time of 0.5 ms. The reverse current is 0.28 A, with on time of 0.3 ms and off time of 0.7 ms. The current is chosen based on the requirements of the plated area, plating bath and the power supply.
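The net plating current implied by this waveform can be estimated with a short Python sketch. It assumes that one period consists of the forward pulse, its off time, the reverse pulse, and its off time in sequence (4.0 ms in total); this ordering is an assumption, since the waveform figure is not reproduced here. The function name is illustrative.

```python
def net_average_current_a(i_fwd_a, t_fwd_on_ms, t_fwd_off_ms,
                          i_rev_a, t_rev_on_ms, t_rev_off_ms):
    """Time-averaged net (forward minus reverse) current over one period."""
    period_ms = t_fwd_on_ms + t_fwd_off_ms + t_rev_on_ms + t_rev_off_ms
    net_charge_a_ms = i_fwd_a * t_fwd_on_ms - i_rev_a * t_rev_on_ms
    return net_charge_a_ms / period_ms

if __name__ == "__main__":
    avg = net_average_current_a(1.0, 2.5, 0.5, 0.28, 0.3, 0.7)
    print(f"net average current: {avg:.3f} A")
```

Under this assumption the net average current is about 0.604 A, or roughly 60% of the forward pulse amplitude.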

Figure 4.25. Current waveform in electroplating (forward pulse amplitude 1 A; reverse pulse amplitude 0.28 A).

The Bosch process was used to define the copper nodules. In this experiment, 3 minutes of DRIE was conducted, which resulted in nodules about 20 µm deep. To fill the trenches, a total of 2 hours of electroplating was performed. Figure 4.26 shows the patterns after the plating. The surface looked rough, and it was hard to tell whether the trenches were filled or not.

Figure 4.26. After 2 hours of electroplating.

Another wafer with the same process was annealed at 400 deg. C in nitrogen for 30 minutes. The patterns became dark after the annealing, as shown in Fig. 4.27. The color difference was dramatic, so we did not add the annealing process to the later processed wafers.


Figure 4.27. Patterns after annealing.

CMP followed the plating. Because the plated copper is thicker than in the previous experiment, and the electrolytic copper is more rigid, a much longer CMP process was needed. A different slurry, iCue® 5001, also from Cabot, was chosen here. The ratio of slurry to H2O2 was 6:1. The flow rate was 80 ml/min. The down-force pressure was 20 psi. The carrier and platen both rotated at 60 rpm, in opposite directions. The temperature of the platen was kept at 30 deg. C throughout. The polish was observed constantly, and a total of 80 minutes of CMP was performed. Micrographs of the patterns after CMP are shown in Fig. 4.28. Compared with the patterns obtained using electroless copper plating, they are much smoother and almost no voids are visible, indicating the excellent micro via filling property of the process.
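As a rough consumables estimate for this polish, the sketch below computes the total volume delivered and its split between slurry and H2O2. It assumes that the quoted 80 ml/min is the flow of the mixed slurry and that it is held constant for the full 80 minutes; neither point is stated in the text, and the function name is illustrative.

```python
def slurry_budget_ml(flow_ml_per_min, polish_min, slurry_parts, h2o2_parts):
    """Return (total, slurry, H2O2) volumes in ml for a constant flow."""
    total = flow_ml_per_min * polish_min
    parts = slurry_parts + h2o2_parts
    return total, total * slurry_parts / parts, total * h2o2_parts / parts

if __name__ == "__main__":
    total, slurry, h2o2 = slurry_budget_ml(80.0, 80.0, 6.0, 1.0)
    print(f"total {total:.0f} ml = slurry {slurry:.0f} ml + H2O2 {h2o2:.0f} ml")
```

With these assumptions, the 80-minute polish consumes about 6.4 liters of mixed slurry, of which one seventh is H2O2.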

Figure 4.28. Patterns after CMP.

After forming the metal layer using the second mask, a thick photoresist (AZ 4620) was spun onto the wafer to serve as the protection mask for the DRIE that separates the chips. After 45 minutes of Bosch etching, the silicon substrate was etched to a depth of around 300 µm. Figure 4.29 shows the microscope pictures after the 45-minute etch. Here, the chips have not been separated yet. The red cover on the chip is the remaining photoresist.

Figure 4.29. After 45 minutes of DRIE.

Since the silicon substrate we used is about 600 µm thick, another 45 minutes of DRIE was performed to completely separate the chips. Photoresist was then stripped in acetone and oxygen plasma. Overviews of separated chips are shown in Fig. 4.30.
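The etch times quoted in this chapter can be cross-checked with a small Python sketch that derives an average Bosch etch rate from each reported depth and estimates the time needed to etch through the wafer. The depths are the approximate values from the text, a constant average rate is assumed, and the function names are illustrative.

```python
# (etch time in minutes, approximate depth in um) as reported in the text.
OBSERVATIONS = [(1.0, 5.0), (3.0, 20.0), (45.0, 300.0)]

def etch_rate_um_per_min(minutes, depth_um):
    """Average etch rate over one run."""
    return depth_um / minutes

def time_to_etch_min(depth_um, rate_um_per_min):
    """Time to reach a depth, assuming a constant average rate."""
    return depth_um / rate_um_per_min

if __name__ == "__main__":
    for minutes, depth in OBSERVATIONS:
        rate = etch_rate_um_per_min(minutes, depth)
        print(f"{minutes:5.0f} min -> {depth:5.0f} um  ({rate:.2f} um/min)")
    through_rate = etch_rate_um_per_min(45.0, 300.0)
    print(f"600 um wafer: {time_to_etch_min(600.0, through_rate):.0f} min")
```

At the rate implied by the first 45-minute etch, a 600 µm wafer needs about 90 minutes in total, which is consistent with the second 45-minute etch used to separate the chips.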

Figure 4.30. Micrographs of separated chips.

The alignment techniques, shown in Fig. 4.24, were tested here. Figure 4.31 shows the micrographs of the aligned quilt-packaging prototypes.

Figure 4.31. Micrographs of aligned QP prototypes using alignment techniques of (a), (b): corner silicon etch; (c), (d): nodule area silicon etch; and (e), (f): keyed nodules.

The pictures above show that the keyed nodules alignment has the best performance of the three. This is because the DRIE process used to separate the chips also etches silicon laterally. The advantage is that it removes the silicon under the extended copper nodules and provides better connection between the chips; the disadvantage is that it is very hard to control the profile of the etched silicon. Unless very careful mask design and very tight DRIE control are achieved, the first two alignment techniques cannot provide enough accuracy.

The SEM pictures of the extended copper nodules in the separated chips are shown in Fig. 4.32. The side views of the nodules are also shown here. The depth of the nodules is around 20 µm. Compared with the nodules whose depth is around 5 µm, shown in Fig. 4.21, they are much thicker and therefore, stronger.

Figure 4.32. SEM pictures of (a), (b): 100 µm wide keyed nodules; (c), (d): closer look of the 100 µm wide keyed nodules; (e), (f): 20 µm wide nodules; and (g), (h): side views of 20 µm wide nodules.

From the pictures above, we can clearly see the reentrant etch of the silicon substrate by the Bosch process, which exposes the nodules. The scallop-like shape, seen in the closer look of the side view, is due to the etch and passivation cycle in the Bosch process.

Several issues arose from this experiment. From Fig. 4.31, the copper surface after separation is dark, which is due to insufficient protection by the photoresist layer. To protect the on-chip copper interconnect, a multi-spin process is adopted, and the result will be shown in the following section. From Fig. 4.32, there are some precipitates accumulated on the sidewalls of the copper nodules, which are not conductive and will affect the connection. At first, we thought they were carbon particles from the C4F8 passivation cycle in the Bosch process. Oxygen plasma and SF6-only plasma were used to try to remove the precipitates; however, neither succeeded. We then suspected that they come from the reaction between copper and fluorine. To remove this film, a cleaning step using a slow copper etchant, 49-1 from Transene, was added after the Bosch process. In the next section, a complete process of forming QP structures is proposed and finished prototypes for microwave measurements are demonstrated.

4.6 Final Fabrication Process of Quilt-Packaging

Having compared electroless and electrolytic copper plating, we chose the latter for the formation of nodules in the QP fabrication process. In Chapter 3, we simulated the QP structures. A whole new three-layer mask set based on the simulation results was designed and fabricated. An overview of this mask set is shown in Fig. 4.33. There are four chips in one mask with four different nodule widths: 10, 20, 50 and 100 µm. The first, orange layer (purple in the picture because of the overlap of the first and second layers) defines the location of the nodules. The second, blue layer is for on-chip interconnects to the nodules and calibration structures for de-embedding in microwave measurements. The third, green layer is where the Bosch process will be performed to separate the chips. Based on the previous experiment, only the keyed nodules alignment is used here. The dimension of each chip is about 1 cm × 1 cm, for ease of handling after separation.

Figure 4.33. Overview of the three-layer QP mask set. The four chips are labeled by nodule width: 20 µm, 50 µm, 100 µm and 10 µm.

Based on the experiments in the previous sections, the final fabrication process flow for QP structures is presented in Figs. 4.34 (a) – (m). After the IC front-end processing on the wafer is completed, nodules are defined along the edges by standard photolithography, and trenches for the nodules are etched by DRIE (Fig. 4.34a). A layer of plasma enhanced chemical vapor deposition (PECVD) SiO2 is deposited to form an isolation layer inside the trenches between the silicon substrate and the metal contacts formed later (Fig. 4.34b). A thin Ti/Cu seed layer is then sputtered to coat the whole front side of the wafer (Fig. 4.34c), and copper electroplating is performed to fill the trenches (Fig. 4.34d). Chemical-mechanical polishing (CMP) is used to planarize the surface and form the nodules (Fig. 4.34e). Subsequent processing to connect the nodules with on-chip interconnects is included as part of the conventional back-end-of-line processing (Fig. 4.34f). After the interconnects are completed, a thick photoresist layer is spun onto the wafer, and the SiO2 in the open areas at the chip edges is etched by buffered HF (BHF) in preparation for chip separation (Fig. 4.34g). DRIE is used again to etch part of the silicon substrate (Fig. 4.34h). The wafer is then dipped in BHF to remove the SiO2 and Ti on the sidewalls of the nodules (Fig. 4.34i). The etch is then continued all the way through the wafer to undercut part of the nodules and separate the chips; the thick photoresist mask is partly consumed in the process (Fig. 4.34j).

Two methods were tried to connect QP structures. In the first, Sn is electrolessly plated on the chips and coats only the copper surface at the protruding nodule sidewalls (Fig. 4.34k). After the photoresist is stripped, two or more chips are soldered together to form the QP structure (“quilt”) (Fig. 4.34m). In the second, two or more chips are fixed in place on a low-viscosity glue spun on a substrate, and Sn is then electrolessly plated to form a bridge between the chips (Fig. 4.34l). After the photoresist is stripped, the QP structure is formed (Fig. 4.34m).

Figure 4.34. Final fabrication process of QP structures. (a) Define and etch nodules by DRIE, (b) passivate trenches by PECVD SiO2, (c) sputter Ti/Cu seed layer inside the trenches, (d) plate copper to fill the trenches, (e) planarize the nodules by CMP, (f) continue back-end-of-line process to finish ICs, (g) spin thick photoresist as protection layer, open separation area and remove SiO2 on the surface, (h) use DRIE to remove part of the silicon substrate, (i) dip wafer in BHF to remove SiO2 and Ti on the sidewall of protruded nodules, (j) continue DRIE to separate the chips, (k) electrolessly plate Sn on the nodule sidewall, (l) align and fix chips, then plate Sn to form bridge, and (m) strip photoresist and form a complete two-chip QP structure.

In our QP fabrication run, a bare silicon wafer was used. The purpose is to demonstrate the feasibility of the process and investigate the microwave performance of the QP interconnects.

Both low resistivity silicon wafer (about 10 Ω • cm) with 600 µm thickness and high resistivity silicon wafer (about 8000 Ω • cm) with 400 µm thickness were used for fabrication. The procedure is as below:

1. RCA clean wafers: a. Dip the 4-inch silicon wafers for 10 minutes in RCA 1 solution containing de-ionized (DI) H2O : H2O2 : NH4OH = 5:1:1, pre-heated to 70 deg. C, to remove insoluble organic contaminants; b. Rinse the wafers in DI water for 1 minute; c. Submerge the wafers for 10 minutes in RCA 2 solution consisting of DI H2O : H2O2 : HCl = 6:1:1, also pre-heated to 70 deg. C, to remove ionic and heavy-metal contaminants; d. Rinse in DI water for 1 minute.

2. HF clean wafers: a. Submerge the wafers in HF : H2O = 1:50 solution for 50 seconds to remove the thin native oxide layer on the silicon; b. Rinse in DI water for 2 minutes; c. Spin or blow dry.

3. Spin HMDS and AZ 5214 photoresist on the wafers at 5000 rpm for 30 seconds, which results in a coating about 1 µm thick.

4. Bake the wafers at 105 deg. C for 35 seconds.

5. Perform photolithography using the first layer mask on the 6300 wafer stepper. The dose needed for 1 µm of AZ 5214 is 60 mJ/cm2, and the stepper delivers about 24 mJ/cm2 per second, so each exposure lasts 2.5 seconds.

6. Bake the wafers at 110 deg. C on a hot plate for 1 minute for image reversal.

7. Perform a flood exposure using the Cobilt. The overall dose should be in the range of 300 to 350 mJ. Generally, an exposure time of 60 seconds is needed.

8. Develop the photoresist in AZ 917 for 19 seconds. Over-development may enlarge or even distort the patterns.

9. Perform the Bosch process in the Alcatel 601E DRIE system for 3 minutes and 18 seconds, which results in trenches about 20 µm deep for the copper nodules.

10. Use O2 plasma in the Drytech for 20 minutes to completely remove the photoresist.

11. Deposit a 1 µm layer of SiO2 on the front side of the wafer and a 100 nm layer of SiO2 on the back side using PECVD. Since PECVD is a conformal process, the bottoms and sidewalls of the trenches are both coated with about 1 µm of SiO2, which serves as an isolation layer between the silicon substrate and the copper nodules formed later. The SiO2 on the back side of the wafer prevents copper precipitates from attaching during the electrolytic plating process.

12. Sputter the seed layer (Ti/Cu at 50 nm/800 nm) on the front side of the wafer using the PE2400. The chamber pressure is kept at 20 mT for conformal coverage.

13. Perform pulse-reverse electrolytic copper plating to fill the trenches. The experimental parameters are the same as in Section 4.5. The plating solution is InterVia Cu 8250 from Rohm and Haas, and the power supply is a DuPR 10-1-3 from Dynatronix. To ensure that the trenches are filled and to improve plating uniformity, plate for 60 minutes, rotate the wafer by 180 degrees, and plate for another 60 minutes.

14. Perform chemical-mechanical polishing to remove the excess copper on the wafer surface. The slurry used is iCue® 5001 from Cabot. First, the slurry is stirred for 30 minutes; it is then mixed with peroxide at a volume ratio of slurry : H2O2 = 12 : 1 and stirred for another 10 minutes. The platen temperature is kept at 30 deg. C during the process. The flow rate is 80 ml/min. The carrier and platen rotate at the same speed, 50 rpm, but in opposite directions. The pressure is kept at 20 psi. The removal time varies because the pad condition changes during the process; about 70 minutes per wafer is used. The wafers must be inspected constantly to prevent over-polishing, because the Logitech CMP instrument has no monitoring system.

15. To ensure that no titanium is left on the surface, use the Ti etchant TFTN from Transene after the CMP. The etch rate is 1 nm/second at 85 deg. C. TFTN does not etch SiO2 and has a copper etch rate of only 0.1 nm/second, so a 10-second etch at 85 deg. C is performed at this step.

Fig. 4.35 shows the 10, 20, 50 and 100 µm wide copper nodules after this step.

Figure 4.35. (a), (b): 10 µm wide copper nodules; (c), (d): 20 µm wide copper nodules; (e), (f): 50 µm wide copper nodules; and (g), (h): 100 µm wide copper nodules after CMP.

16. Spin HMDS and AZ 5214 at 5000 rpm for 30 seconds on the wafers.

17. Bake the wafers at 105 deg. C for 35 seconds.

18. Use the second layer mask for photolithography on the stepper. The exposure time is again 2.5 seconds.

19. Bake the wafers at 110 deg. C for 60 seconds.

20. Flood expose for 60 seconds.

21. Develop in AZ 917 for 19 seconds.

22. Evaporate Ti/Cu at 20 nm/600 nm on the front side of the wafers using the FC1800. This metal layer serves as the on-chip interconnect and also covers the copper nodules.

23. Dip the wafers in acetone to lift off the metal on the photoresist. Sonication is used to help remove the metal pieces completely. Rinse the wafers for 1 minute and blow dry. Figure 4.36 shows the patterns after lift-off.

Figure 4.36. Patterns after lift-off: (a) 10 µm; (b) 20 µm; (c) 50 µm; and (d) 100 µm.

24. Spin HMDS at 2000 rpm for 30 seconds. Triple-spin AZ 4620 at 2000 rpm for 30 seconds. The triple spin increases the thickness of the photoresist protection layer for the later separation of the chips. After the spin, a thickness of about 16 µm is achieved.

25. Bake the wafers at 90 deg. C for 4 minutes.

26. Use the third layer mask to do photolithography. A long exposure of 200 seconds is needed here because of the thickness of the photoresist coating. Since the longest exposure time of the 6300 stepper is 128 seconds, a double exposure is conducted. The alignment must be performed carefully.

27. Develop in AZ 917 for 3 minutes and 30 seconds.

28. Bake the wafers at 90 deg. C for 5 minutes.

29. Dip the wafers in BHF (1:10) for 6 minutes to completely remove the SiO2 in the open areas, so that only silicon is exposed there. Figure 4.37 shows the patterns coated with AZ 4620 after this step. The front ends of the copper nodules are exposed as a result of the develop process.

Figure 4.37. Patterns coated with AZ 4620 after develop and removal of SiO2 on the surface: (a) 10 µm; (b) 20 µm; (c) 50 µm; and (d) 100 µm.

30. Use DRIE to perform the Bosch process for 20 minutes. Then, dip the wafers in BHF again for 6 minutes to remove the SiO2 and Ti at the sidewalls of the trenches. Figure 4.38 shows the patterns after 20 minutes of DRIE of Si.

Figure 4.38. Patterns after 20 minutes of Bosch process.

31. Put a bare wafer underneath the processed wafer, and continue the Bosch process to separate the chips. To ensure a reentrant etch that exposes the copper nodules, the Bosch process parameters are chosen as follows: SF6 flow rate of 300 sccm and C4F8 flow rate of 130 sccm, with cycle durations of 7 seconds and 2 seconds, respectively; automatic pressure control position at 23%; source power of 1800 W for plasma generation and substrate power of 80 W; wafer surface temperature maintained at 20 deg. C during processing.

The DRIE system monitors the pressure on the substrate and stops automatically if low pressure is sensed, which is why a wafer is placed underneath the processed wafer to maintain a detectable pressure.

For the 600 µm thick low resistivity wafer, another 45 minutes are needed; for the 400 µm thick high resistivity wafer, another 30 minutes are required to completely remove the silicon in the open areas. Because of the loading effect of DRIE, different mask designs will need fine tuning of the etch time.

Figure 4.39 shows SEM pictures of the separated chips. Here, the photoresist was removed to reduce contamination and charging and to increase resolution in the SEM.

Figure 4.39. SEM pictures of separated chips with copper nodule width at (a) – (d): 10 µm; (e) – (h): 20 µm; (i) – (l): 50 µm; and (m) – (p): 100 µm.

32. As shown above, small precipitates are found covering the ends of the copper nodules after the separation of the chips. They are considered to come from the reaction of copper and fluorine. This film prevents the copper nodules of two chips from connecting properly with each other. To remove this thin film, a slow copper etchant, 49-1 from Transene, is used. The etch rate is 2.2 nm/sec at 30 deg. C. The separated chips are dipped in this etchant for 15 minutes to completely remove the precipitates. Figure 4.40 shows side views of the copper nodules before and after the removal of the precipitates. Figure 4.41 shows the reentrant etch of the Bosch process. The chip shown in Fig. 4.41 is made from a low resistivity silicon substrate about 600 µm thick. The 60 µm reentrant profile ensures that, when two chips are joined, the copper nodules make contact before the two substrates touch.

Figure 4.40. Side views of the copper nodules before (a, b) and after (c, d) cleaning of the precipitates.

Figure 4.41. Side view of chip after separation by DRIE Bosch process. The labels mark the nodules, the 60 µm reentrant profile and the 600 µm substrate thickness.

33. Spin the low-viscosity glue Loctite 460 on a bare silicon wafer at 6000 rpm for 30 seconds. The resulting thickness is around 2 µm. Put the chips on the glue surface and align them properly. The glue is a great help for manual alignment under the microscope, and it can be removed later with acetone. After the alignment, heat the glue at 60 deg. C for 10 minutes to fix the chips in place. Figure 4.42 shows the aligned chips.

Figure 4.42. Alignment of two chips on Loctite 460: (a) 10 µm; (b) 20 µm; (c) 50 µm; and (d) 100 µm.

34. Electrolessly plate Sn on the chips to form a bridge that connects them. The plating solution is bright electroless tin from Transene. The plating rate on copper is about 1.2 µm/min at 83 deg. C. Photoresist, silicon and silicon dioxide are not coated with Sn in this solution. A total of 15 minutes of plating is conducted.

35. Remove the photoresist with acetone. Figure 4.43 shows the quilt-packaging connections between the two chips.

Figure 4.43. QP connections of chips with 50 µm wide nodules (a, b), and 100 µm wide nodules (c, d).
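Several steps of the procedure above rest on simple rate-times-time arithmetic. The sketch below recomputes those nominal values from the numbers quoted in the steps. The function and variable names are hypothetical, and constant rates are assumed, so the results are estimates rather than measured thicknesses.

```python
# Nominal process arithmetic for the QP fabrication steps.
# Names are hypothetical; all numbers are taken from the procedure text.

def exposure_time_s(dose_mj_cm2, intensity_mw_cm2):
    """Exposure time in seconds: dose (mJ/cm2) divided by intensity (mW/cm2)."""
    return dose_mj_cm2 / intensity_mw_cm2

def thickness_change(rate, duration):
    """Thickness removed or grown at a constant rate (units follow the inputs)."""
    return rate * duration

# Step 5: 60 mJ/cm2 dose at about 24 mJ/cm2 per second.
t_exposure_s = exposure_time_s(60.0, 24.0)

# Step 9: about 20 um of silicon in 3 min 18 s gives the trench etch rate.
trench_rate_um_min = 20.0 / (3.0 + 18.0 / 60.0)

# Step 15: 10 s of TFTN at 1 nm/s (Ti) and 0.1 nm/s (Cu).
ti_etched_nm = thickness_change(1.0, 10.0)
cu_lost_nm = thickness_change(0.1, 10.0)

# Step 32: 15 min of copper etchant 49-1 at 2.2 nm/s.
cu_cleaned_nm = thickness_change(2.2, 15 * 60)

# Step 34: 15 min of electroless Sn at about 1.2 um/min on copper.
sn_plated_um = thickness_change(1.2, 15.0)

# Steps 30-31: average through-wafer etch rate (20 min plus the extra time).
sep_rate_600um = 600.0 / (20.0 + 45.0)
sep_rate_400um = 400.0 / (20.0 + 30.0)

print(f"exposure time: {t_exposure_s:.2f} s")
print(f"trench etch rate: {trench_rate_um_min:.2f} um/min")
print(f"Ti etched: {ti_etched_nm:.0f} nm, Cu lost: {cu_lost_nm:.1f} nm")
print(f"Cu removed in clean: {cu_cleaned_nm:.0f} nm")
print(f"Sn plated: {sn_plated_um:.1f} um")
print(f"separation etch rates: {sep_rate_600um:.2f}, {sep_rate_400um:.2f} um/min")
```

Under these assumptions the copper clean removes roughly 2 µm of copper, and the Sn plating deposits on the order of 18 µm on each exposed copper surface.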

A probe station was used to measure the electrical continuity between the two chips through the QP connections. The chips with 50 and 100 µm wide nodules show good continuity, while the chips with 10 and 20 µm wide nodules are still disconnected after the process above. This is attributed to the alignment of the two chips, which was done by hand. A more precise alignment instrument will be needed for chips with smaller features.

Overviews of the two chips connected by QP are shown in Fig. 4.44. The QP structures were later tested for their microwave performance, which is presented in the next chapter.

Figure 4.44. Overviews of QP structures for (a) 50 µm wide nodule chips and (b) 100 µm wide nodule chips.

In this chapter, a low temperature QP fabrication process is proposed and demonstrated, which can be integrated into a standard IC fabrication process for ultra-fast communication between two or more chips.


CHAPTER 5

MICROWAVE MEASUREMENTS OF QUILT PACKAGING

In Chapter 3, the microwave performance of QP structures was simulated, and in Chapter 4, the fabrication process was presented. Microwave measurements of the QP structures using a network analyzer are introduced in this chapter.

5.1 Review of de-embedding techniques for on-wafer measurements

Because of the high losses and process deviations associated with CMOS technology, accurate calibration techniques are required to provide precise models for the device-under-test (DUT) fabricated on silicon wafers. Currently, a combination of a ceramic impedance-standard-substrate (ISS) with calibration devices fabricated on the same wafer as the DUT gives the most reliable approach [1]. The ISS offers high-accuracy, low-loss standards for two-port calibration procedures such as short-open-load-reciprocal (SOLR), short-open-load-thru (SOLT), thru-reflect-line (TRL), line-reflect-match (LRM), and line-reflect-reflect-match (LRRM). These types of calibration offer a calibrated reference plane close to the probe tips. The calibration devices fabricated on the wafer are used to subtract the parasitic components from the measurement of the DUT and its test structure. This procedure of correcting for the influence of on-wafer parasitics is called de-embedding.

In our case, the microwave performance of the quilt-packaging (QP) structure is measured along with the on-chip interconnects and the pads. To model the QP structure correctly, we need to remove the effects of the test structure. Here, we present eight sets of calibration devices along with their equivalent circuits to de-embed the DUT (QP). The measurements are taken with a vector network analyzer and are expressed as s-parameters.

5.1.1 Open

The open structure is used to de-embed the shunt capacitance due to the pads and interconnects, which represents the coupling through the substrate [2]. Figure 5.1 shows the structure and the equivalent circuit. $y_{p,1}$ and $y_{p,2}$ correspond to the probe pad admittances at port 1 (P1) and port 2 (P2). The series admittance $y_s$ is the contribution of the interconnects.

Figure 5.1. Open structure and equivalent circuit.

First, the measured s-parameters of the open structure are transformed to y-parameters as below [3]:

$$\begin{bmatrix} Y_{11,open} & Y_{12,open} \\ Y_{21,open} & Y_{22,open} \end{bmatrix} = Y_0 \begin{bmatrix} \dfrac{(1-S_{11})(1+S_{22})+S_{12}S_{21}}{\Delta S} & \dfrac{-2S_{12}}{\Delta S} \\ \dfrac{-2S_{21}}{\Delta S} & \dfrac{(1+S_{11})(1-S_{22})+S_{12}S_{21}}{\Delta S} \end{bmatrix} \tag{5.1}$$

where $Y_0 = 1/Z_0$, $Z_0$ is the characteristic impedance at the port, and $\Delta S = (1+S_{11})(1+S_{22}) - S_{12}S_{21}$.

$y_s$, $y_{p,1}$ and $y_{p,2}$ can be found [4]:

$$y_s = -Y_{12,open} \tag{5.2}$$

$$y_{p,1} = Y_{11,open} + Y_{12,open} \tag{5.3}$$

$$y_{p,2} = Y_{22,open} + Y_{12,open} \tag{5.4}$$

Second, transform the s-parameters of the DUT with test structure to y-parameters using equation (5.1). Subtract the parasitic components $y_s$, $y_{p,1}$ and $y_{p,2}$ using:

$$\begin{bmatrix} Y_{11,DUT} & Y_{12,DUT} \\ Y_{21,DUT} & Y_{22,DUT} \end{bmatrix} = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix} - \begin{bmatrix} y_{p,1}+y_s & -y_s \\ -y_s & y_{p,2}+y_s \end{bmatrix} \tag{5.5}$$

If needed, the DUT’s y-parameters can be transformed back to s-parameters:

$$\begin{bmatrix} S_{11,DUT} & S_{12,DUT} \\ S_{21,DUT} & S_{22,DUT} \end{bmatrix} = \begin{bmatrix} \dfrac{(Y_0-Y_{11,DUT})(Y_0+Y_{22,DUT})+Y_{12,DUT}Y_{21,DUT}}{\Delta Y_{DUT}} & \dfrac{-2Y_{12,DUT}Y_0}{\Delta Y_{DUT}} \\ \dfrac{-2Y_{21,DUT}Y_0}{\Delta Y_{DUT}} & \dfrac{(Y_0+Y_{11,DUT})(Y_0-Y_{22,DUT})+Y_{12,DUT}Y_{21,DUT}}{\Delta Y_{DUT}} \end{bmatrix} \tag{5.6}$$

where $\Delta Y_{DUT} = (Y_{11,DUT}+Y_0)(Y_{22,DUT}+Y_0) - Y_{12,DUT}Y_{21,DUT}$.
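The open de-embedding of equations (5.1) to (5.6) can be sketched numerically as follows. This is a minimal illustration for a single frequency point, assuming NumPy, a real reference impedance of 50 Ω, and a pi-network model for the parasitics; the function names `s2y`, `y2s` and `deembed_open` are hypothetical and not from the original work.

```python
import numpy as np

def s2y(S, z0=50.0):
    """S- to Y-parameters of a two-port, following Eq. (5.1)."""
    s11, s12, s21, s22 = S[0, 0], S[0, 1], S[1, 0], S[1, 1]
    d = (1 + s11) * (1 + s22) - s12 * s21
    num = np.array([[(1 - s11) * (1 + s22) + s12 * s21, -2 * s12],
                    [-2 * s21, (1 + s11) * (1 - s22) + s12 * s21]])
    return num / (z0 * d)

def y2s(Y, z0=50.0):
    """Y- to S-parameters of a two-port, following Eq. (5.6)."""
    y0 = 1.0 / z0
    y11, y12, y21, y22 = Y[0, 0], Y[0, 1], Y[1, 0], Y[1, 1]
    d = (y11 + y0) * (y22 + y0) - y12 * y21
    num = np.array([[(y0 - y11) * (y0 + y22) + y12 * y21, -2 * y12 * y0],
                    [-2 * y21 * y0, (y0 + y11) * (y0 - y22) + y12 * y21]])
    return num / d

def deembed_open(S_meas, S_open, z0=50.0):
    """Remove the pad and interconnect admittances found from the open."""
    Yo = s2y(S_open, z0)
    ys = -Yo[0, 1]                 # series admittance of the pi-network
    yp1 = Yo[0, 0] + Yo[0, 1]      # pad admittance at port 1
    yp2 = Yo[1, 1] + Yo[0, 1]      # pad admittance at port 2
    Y_par = np.array([[yp1 + ys, -ys], [-ys, yp2 + ys]])
    return y2s(s2y(S_meas, z0) - Y_par, z0)   # Eq. (5.5), then Eq. (5.6)

# Synthetic check: embed a known DUT in known parasitics and recover it.
w = 2 * np.pi * 10e9
Y_dut = np.array([[0.02 + 0.005j, -0.01], [-0.01, 0.03 - 0.002j]])
ys, yp1, yp2 = 1e-4 + 2e-4j, 1j * w * 50e-15, 1j * w * 60e-15
Y_par = np.array([[yp1 + ys, -ys], [-ys, yp2 + ys]])
S_open, S_meas = y2s(Y_par), y2s(Y_dut + Y_par)
S_dut = deembed_open(S_meas, S_open)
print(np.allclose(S_dut, y2s(Y_dut)))
```

For swept measurements the same functions would be applied at each frequency point.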

5.1.2 Open and Short

By adding a short to the previous method, series parasitics such as the contact impedance between the probe and the pads and the series loss of the interconnects can be de-embedded. Figure 5.2 shows the calibration structures and the equivalent circuit.

Figure 5.2. Open and short.

To find the circuit components in Fig. 5.2, we start the de-embedding procedure from the outside ports and move in toward the DUT. Symmetry is assumed here.

First, the measured s-parameters of the short structure are transformed to z-parameters. Since the structure is symmetrical,

$$z_{i,1} = z_{i,2} = z_{i,short} = Z_0 \frac{(1+S_{11})(1-S_{22})+S_{12}S_{21}}{(1-S_{11})(1-S_{22})-S_{12}S_{21}}. \tag{5.7}$$

Second, measure the s-parameters of the open structure and transform them to z-parameters. $z_{i,open}$ can be calculated using equation (5.7). We can find $y_p$ as

$$y_{p,1} = y_{p,2} = y_p = 1/(z_{i,open} - z_{i,short}). \tag{5.8}$$

Third, measure the s-parameters of the DUT with test structure and transform them to z-parameters:

$$Z = \begin{bmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{bmatrix} = Z_0 \begin{bmatrix} \dfrac{(1+S_{11})(1-S_{22})+S_{12}S_{21}}{(1-S_{11})(1-S_{22})-S_{12}S_{21}} & \dfrac{2S_{12}}{(1-S_{11})(1-S_{22})-S_{12}S_{21}} \\ \dfrac{2S_{21}}{(1-S_{11})(1-S_{22})-S_{12}S_{21}} & \dfrac{(1-S_{11})(1+S_{22})+S_{12}S_{21}}{(1-S_{11})(1-S_{22})-S_{12}S_{21}} \end{bmatrix} \tag{5.9}$$

$z_{i,short}$ is de-embedded as

$$Z' = Z - \begin{bmatrix} z_{i,1} & 0 \\ 0 & z_{i,2} \end{bmatrix} \tag{5.10}$$

Transform $Z'$ into $Y'$, $Y' = Z'^{-1}$. Then de-embed $y_p$ as

$$Y_{DUT} = Y'' = Y' - \begin{bmatrix} y_{p,1} & 0 \\ 0 & y_{p,2} \end{bmatrix} \tag{5.11}$$

The DUT’s s-parameters can be obtained using equation (5.6).
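The open-short procedure of equations (5.7) to (5.11) can be sketched in the same way. The code below is an illustrative single-frequency example that assumes a symmetric fixture (equal series impedance and pad admittance at both ports) and a 50 Ω reference; `s2z`, `z2s` and `deembed_open_short` are hypothetical names, and `z2s` is only a helper for generating synthetic test data.

```python
import numpy as np

def s2z(S, z0=50.0):
    """S- to Z-parameters of a two-port, following Eq. (5.9)."""
    s11, s12, s21, s22 = S[0, 0], S[0, 1], S[1, 0], S[1, 1]
    d = (1 - s11) * (1 - s22) - s12 * s21
    num = np.array([[(1 + s11) * (1 - s22) + s12 * s21, 2 * s12],
                    [2 * s21, (1 - s11) * (1 + s22) + s12 * s21]])
    return z0 * num / d

def z2s(Z, z0=50.0):
    """Z- to S-parameters (matrix form), used here to build synthetic data."""
    I = np.eye(2)
    return (Z - z0 * I) @ np.linalg.inv(Z + z0 * I)

def deembed_open_short(S_meas, S_open, S_short, z0=50.0):
    """Return the DUT y-parameters after open-short de-embedding."""
    zi = s2z(S_short, z0)[0, 0]                 # Eq. (5.7)
    zi_open = s2z(S_open, z0)[0, 0]
    yp = 1.0 / (zi_open - zi)                   # Eq. (5.8)
    Z1 = s2z(S_meas, z0) - zi * np.eye(2)       # Eq. (5.10)
    return np.linalg.inv(Z1) - yp * np.eye(2)   # Eq. (5.11)

# Synthetic check with known series impedance zi and pad admittance yp.
zi_true, yp_true = 2.0 + 3.0j, 3e-3j
Y_dut = np.array([[0.02 + 0.01j, -0.015], [-0.015, 0.03 + 0.0j]])
S_short = z2s(zi_true * np.eye(2))
S_open = z2s((zi_true + 1 / yp_true) * np.eye(2))
S_meas = z2s(zi_true * np.eye(2) + np.linalg.inv(Y_dut + yp_true * np.eye(2)))
Y_rec = deembed_open_short(S_meas, S_open, S_short)
print(np.allclose(Y_rec, Y_dut))
```

The recovered y-parameters can then be converted to s-parameters with equation (5.6).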

5.1.3 Open, Short and Thru

Adding a thru can better de-embed the interconnect lines. The calibration devices and equivalent circuit are shown in Fig. 5.3.

Figure 5.3. Open, short and thru.

$z_i$ and $y_p$ can be de-embedded in the same way as in Section 5.1.2 by measuring the s-parameters of the short and open structures. To find $z_l$, first measure the s-parameters of the thru structure and transform them into z-parameters ($Z_{thru}$) using equation (5.9). De-embed the effects of $z_i$ by:

$$Z'_{thru} = Z_{thru} - \begin{bmatrix} z_i & 0 \\ 0 & z_i \end{bmatrix}. \tag{5.12}$$

Transform $Z'_{thru}$ to y-parameters ($Y'_{thru}$), then de-embed the effects of $y_p$:

$$Y''_{thru} = Y'_{thru} - \begin{bmatrix} y_p & 0 \\ 0 & y_p \end{bmatrix}. \tag{5.13}$$

$z_l$ can be found as

$$z_l = \eta_{DUT} / (2 y''_{11,thru}). \tag{5.14}$$

Here, symmetry is assumed. To find $z_l$ at port 2, $y''_{22,thru}$ can be used instead. The correction factor $\eta_{DUT}$ can be expressed as

$$\eta_{DUT} = (l_{fix} - l_{DUT}) / l_{fix} \tag{5.15}$$

where $l_{fix}$ is the effective length between the signal pads and $l_{DUT}$ is the effective length of the DUT. Both $l_{fix}$ and $l_{DUT}$ are shown in Fig. 5.3.

Now, measure the s-parameters of the DUT with the test fixture and transform them into z-parameters. De-embed $z_i$ using equation (5.12) and $y_p$ using equation (5.13). Transform the y-parameters back to z-parameters, and de-embed $z_l$:

$$Z'''_{DUT} = Z''_{DUT} - \begin{bmatrix} z_l & 0 \\ 0 & z_l \end{bmatrix} \tag{5.16}$$

5.1.4 Two-port Network with a Thru

Instead of using a detailed equivalent circuit, we can use a two-port network to represent the interconnection between probe tips to the DUT. This method only requires one thru calibration device, as shown in Fig. 5.4.

Figure 5.4. Two-port network de-embedding with a thru.

Assuming the input and output interconnects are identical, use the measured s-parameters of the thru standard to find $S_p$:

$$S_{11,p} = S_{22,p} = \frac{S_{11,thru} + S_{22,thru}}{2 + S_{21,thru} + S_{12,thru}} \tag{5.17}$$

$$S_{12,p} = S_{21,p} = \sqrt{\frac{1}{2}\left(S_{12,thru} + S_{21,thru}\right)\left(1 - S_{11,p}^2\right)} \tag{5.18}$$

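Transform $S_p$ into T-parameters:

$$T_p = \begin{bmatrix} \dfrac{1}{S_{21,p}} & -\dfrac{S_{22,p}}{S_{21,p}} \\ \dfrac{S_{11,p}}{S_{21,p}} & S_{12,p} - \dfrac{S_{11,p}S_{22,p}}{S_{21,p}} \end{bmatrix} \tag{5.19}$$

The de-embedding is performed as below:

$$T_{DUT} = T_p^{-1} \cdot T_{meas} \cdot T_p^{-1} \tag{5.20}$$

where $T_{meas}$ is the measured transmission parameters.

The T-parameters can also be transformed back to s-parameters:

$$\begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix} = \begin{bmatrix} \dfrac{T_{21}}{T_{11}} & T_{22} - \dfrac{T_{21}T_{12}}{T_{11}} \\ \dfrac{1}{T_{11}} & -\dfrac{T_{12}}{T_{11}} \end{bmatrix} \tag{5.21}$$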

The advantage of this method is that complicated pads and interconnect layouts are no problem. But if the length of the thru is too long, over-calibration may occur.
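A numerical sketch of this thru-only method, following equations (5.17) to (5.21), is given below. It assumes a symmetric, reciprocal fixture half on each side of the DUT and works on a single frequency point; the names `s2t`, `t2s`, `fixture_from_thru` and `deembed_thru` are hypothetical. The square root in equation (5.18) has a sign ambiguity, but flipping that sign changes $T_p$ to $-T_p$, which leaves the product in equation (5.20) unchanged.

```python
import numpy as np

def s2t(S):
    """S- to T-parameters, following Eq. (5.19)."""
    s11, s12, s21, s22 = S[0, 0], S[0, 1], S[1, 0], S[1, 1]
    return np.array([[1 / s21, -s22 / s21],
                     [s11 / s21, s12 - s11 * s22 / s21]])

def t2s(T):
    """T- to S-parameters, following Eq. (5.21)."""
    t11, t12, t21, t22 = T[0, 0], T[0, 1], T[1, 0], T[1, 1]
    return np.array([[t21 / t11, t22 - t21 * t12 / t11],
                     [1 / t11, -t12 / t11]])

def fixture_from_thru(S_thru):
    """Fixture-half S-parameters from the thru, Eqs. (5.17) and (5.18)."""
    g = (S_thru[0, 0] + S_thru[1, 1]) / (2 + S_thru[1, 0] + S_thru[0, 1])
    t = np.sqrt(0.5 * (S_thru[0, 1] + S_thru[1, 0]) * (1 - g ** 2))
    return np.array([[g, t], [t, g]])

def deembed_thru(S_meas, S_thru):
    """Apply Eq. (5.20) and return the DUT s-parameters."""
    Tp_inv = np.linalg.inv(s2t(fixture_from_thru(S_thru)))
    return t2s(Tp_inv @ s2t(S_meas) @ Tp_inv)

# Synthetic check: a known DUT between two identical fixture halves.
S_p = np.array([[0.1 + 0.05j, 0.9 - 0.2j], [0.9 - 0.2j, 0.1 + 0.05j]])
S_dut = np.array([[0.2 + 0.1j, 0.6 - 0.3j], [0.6 - 0.3j, 0.3j]])
T_p = s2t(S_p)
S_thru = t2s(T_p @ T_p)
S_meas = t2s(T_p @ s2t(S_dut) @ T_p)
S_rec = deembed_thru(S_meas, S_thru)
print(np.allclose(S_rec, S_dut))
```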

5.1.5 Three-step with Two Shorts, Open and Thru

This method, which offers a more complicated de-embedding model for the test fixture, was presented in [5]. An improved three-step method was shown in [6]. The port-to-port parasitics and ground inductance can be de-embedded using this method. The calibration devices (open, short1, short2 and thru) and their equivalent circuits are shown in Fig. 5.5, and the equivalent circuit of the RF test fixture with DUT is shown in Fig. 5.6.

Figure 5.5. Three-step calibration standards and their equivalent circuits. (Adopted from [6])

Figure 5.6. Equivalent circuit of the RF test fixture. (Adopted from [6])

We can find the parasitic admittances $G_1$, $G_2$ and $G_3$, and the parasitic impedances $Z_1$, $Z_2$ and $Z_3$ by measuring the s-parameters of the calibration standards and transforming them into y-parameters.

$$G_1 = y_{11,open} + y_{12,open} \tag{5.22}$$

$$G_2 = y_{22,open} + y_{12,open} \tag{5.23}$$

$$G_3 = \left(-\frac{1}{y_{12,open}} + \frac{1}{y_{12,thru}}\right)^{-1} \tag{5.24}$$

$$Z_1 = \frac{1}{2}\left(-\frac{1}{y_{12,thru}} + \frac{1}{y_{11,short1} - G_1} - \frac{1}{y_{22,short2} - G_2}\right) \tag{5.25}$$

$$Z_2 = \frac{1}{2}\left(-\frac{1}{y_{12,thru}} - \frac{1}{y_{11,short1} - G_1} + \frac{1}{y_{22,short2} - G_2}\right) \tag{5.26}$$

$$Z_3 = \frac{1}{2}\left(\frac{1}{y_{12,thru}} + \frac{1}{y_{11,short1} - G_1} + \frac{1}{y_{22,short2} - G_2}\right) \tag{5.27}$$

To de-embed the parasitics in the RF test fixture, we first convert the measured s-parameters to y-parameters. The resulting y-parameters, $Y_{meas}$, are then de-embedded for the influence of $G_1$ and $G_2$:

$$Y_A = Y_{meas} - \begin{bmatrix} G_1 & 0 \\ 0 & G_2 \end{bmatrix} \tag{5.28}$$

Then, the y-parameter matrix $Y_A$ is converted to the z-parameter matrix $Z_A$. The impedances $Z_1$, $Z_2$ and $Z_3$ can be de-embedded:

$$Z_B = Z_A - \begin{bmatrix} Z_1 + Z_3 & Z_3 \\ Z_3 & Z_2 + Z_3 \end{bmatrix} \tag{5.29}$$

Again, the resulting z-parameter matrix $Z_B$ is converted into the y-parameter matrix $Y_B$. The coupling effect between the two ports, $G_3$, is finally de-embedded and the DUT’s y-parameters are given as

$$Y_{DUT} = Y_B - \begin{bmatrix} G_3 & -G_3 \\ -G_3 & G_3 \end{bmatrix} \tag{5.30}$$
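The three-step procedure of equations (5.22) to (5.30) can be sketched on y-parameter matrices as follows. This is an illustrative single-frequency example; the function names are hypothetical, and the synthetic standards in the check are built from an assumed lumped fixture model (pad admittances $G_1$, $G_2$, series impedances $Z_1$, $Z_2$, common impedance $Z_3$, and port-to-port coupling $G_3$) rather than from measured data.

```python
import numpy as np

def three_step_parasitics(Y_open, Y_thru, Y_short1, Y_short2):
    """Extract G1..G3 and Z1..Z3 from the standards, Eqs. (5.22)-(5.27)."""
    G1 = Y_open[0, 0] + Y_open[0, 1]
    G2 = Y_open[1, 1] + Y_open[0, 1]
    G3 = 1.0 / (-1.0 / Y_open[0, 1] + 1.0 / Y_thru[0, 1])
    a = 1.0 / Y_thru[0, 1]
    b = 1.0 / (Y_short1[0, 0] - G1)
    c = 1.0 / (Y_short2[1, 1] - G2)
    Z1 = 0.5 * (-a + b - c)
    Z2 = 0.5 * (-a - b + c)
    Z3 = 0.5 * (a + b + c)
    return G1, G2, G3, Z1, Z2, Z3

def three_step_deembed(Y_meas, G1, G2, G3, Z1, Z2, Z3):
    """Remove the fixture parasitics step by step, Eqs. (5.28)-(5.30)."""
    Y_A = Y_meas - np.diag([G1, G2])
    Z_B = np.linalg.inv(Y_A) - np.array([[Z1 + Z3, Z3], [Z3, Z2 + Z3]])
    return np.linalg.inv(Z_B) - np.array([[G3, -G3], [-G3, G3]])

# Synthetic standards from assumed parasitic values.
G1, G2, G3 = 1.0e-3j, 1.2e-3j, 0.5e-3j
Z1, Z2, Z3 = 1 + 2j, 1.5 + 2.5j, 0.2 + 0.5j
bridge = np.array([[1, -1], [-1, 1]])
Y_open = np.diag([G1, G2]) + bridge / (Z1 + Z2 + 1 / G3)
Y_thru = np.diag([G1, G2]) + bridge / (Z1 + Z2)
Y_short1 = np.diag([G1 + 1 / (Z1 + Z3), G2])
Y_short2 = np.diag([G1, G2 + 1 / (Z2 + Z3)])

# Embed a known DUT in the fixture model, then de-embed it.
Y_dut = np.array([[0.02 + 0.01j, -0.012], [-0.012, 0.025 - 0.004j]])
Z_T = np.array([[Z1 + Z3, Z3], [Z3, Z2 + Z3]])
Y_meas = np.diag([G1, G2]) + np.linalg.inv(Z_T + np.linalg.inv(Y_dut + G3 * bridge))
params = three_step_parasitics(Y_open, Y_thru, Y_short1, Y_short2)
Y_rec = three_step_deembed(Y_meas, *params)
print(np.allclose(Y_rec, Y_dut))
```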

5.1.6 Four-step with Two Shorts and Two Opens

A four-step de-embedding technique is presented in [7], which includes two short and two open calibration standards, as shown in Fig. 5.7. The equivalent circuit of each standard is shown in Fig. 5.8, and Fig. 5.9 shows the equivalent circuit of the RF test fixture with DUT.

Figure 5.7. Four-step calibration standards. (Adopted from [7])

Figure 5.8. Equivalent circuits of calibration standards. (Adopted from [7])


Figure 5.9. Equivalent circuit of the test fixture. (Adopted from [7])

$Z_c$ represents the contact resistance. $Z_p$ represents the coupling between the signal pad and ground pads, which includes the fringing capacitance between pads and coupling through the semiconducting substrate. $Z_i$ and $Z_1$ denote the impedance from pad to DUT boundary. $Z_2$ represents the impedance of the dangling leg used to connect the DUT and surrounding substrate to ground. $Z_3$ denotes the direct and substrate-carried coupling from ground to device input/output. $Z_f$ denotes the direct and substrate-carried coupling from input to output.

The parasitics can be found as follows:

$$Z_c = \frac{2}{3} Z_{11,ss} \tag{5.31}$$

$$Z_p = Z_{11,so} - Z_{11,ss} \tag{5.32}$$

where “ss” and “so” represent the simple short and simple open standards, respectively.

$$Z_2 = \frac{1}{2}\left(Z_{21,s} + Z_{12,s}\right) \tag{5.33}$$

$$Z_i + Z_1 = \frac{Z_{11,s} - Z_2}{1 + \alpha} \tag{5.34}$$

$$Z_3 = Z_{21,o} + Z_{11,o} - 2Z_2 - (Z_i + Z_1) \tag{5.35}$$

$$Z_f = Z_3\left(\frac{Z_3}{Z_{21,o} - Z_2} - 2\right) \tag{5.36}$$

where “s” and “o” represent the short and open standards, respectively. The parameter $\alpha$ is introduced to account for the nonzero length of the short standard. An easy way to estimate $\alpha$ is to compare the number of squares of the short standard to the number of squares of the two leads $Z_i$ and $Z_1$. Figure 5.10 shows what $Z_i$, $Z_1$ and $\alpha$ stand for.

Figure 5.10. Graph representation of $Z_i$, $Z_1$ and $\alpha$. The leads from pad to DUT boundary have impedance $Z_i + Z_1$, and the short lead has impedance $\alpha(Z_i + Z_1)$.

$\alpha$ can be calculated roughly by comparing the resistance of the short lead to that of the lead from pad to DUT boundary.

After obtaining the values of all the parasitics, we begin the de-embedding procedure. First, measure the s-parameters of the test fixture with DUT, and convert them to z-parameters, as

$$Z = \begin{bmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{bmatrix} \tag{5.37}$$

$Z_c$ is de-embedded as

$$Z_A = Z - Z_c \begin{bmatrix} \frac{3}{2} & 0 \\ 0 & \frac{3}{2} \end{bmatrix} \tag{5.38}$$

Then, convert $Z_A$ to the y-parameter matrix $Y_A = Z_A^{-1}$. $Z_p$ is de-embedded as

$$Y_B = Y_A - \frac{1}{Z_p} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{5.39}$$

Step 3 is the de-embedding of $Z_i + Z_1$ and $Z_2$. First convert $Y_B$ back to the z-parameter matrix $Z_B$, then subtract the effect of the impedances as

$$Z_C = Z_B - \begin{bmatrix} Z_i + Z_1 + Z_2 & Z_2 \\ Z_2 & Z_i + Z_1 + Z_2 \end{bmatrix} \tag{5.40}$$

Before step 4, the z-parameter matrix $Z_C$ is transformed to the y-parameter matrix $Y_C$. The final de-embedding step is

$$Y_{DUT} = Y_C - \begin{bmatrix} \dfrac{1}{Z_3} + \dfrac{1}{Z_f} & -\dfrac{1}{Z_f} \\ -\dfrac{1}{Z_f} & \dfrac{1}{Z_3} + \dfrac{1}{Z_f} \end{bmatrix} \tag{5.41}$$

5.1.7 Two-port Network with One Open and Two Thrus

In this method, the DUT is modeled in cascade configuration, as shown in Fig. 5.11, where $Y_{PAD}$ is the admittance between the signal pad and the ground plane [8].

Figure 5.11. Schematic representation of cascade configuration, which includes probe pads, metal interconnect lines and transistor (DUT). (Adopted from [8])

Figure 5.12 shows the layouts of the DUT measurement and on-wafer calibration structures.

Figure 5.12. DUT and its corresponding open, thru1 and thru2 structures. (Adopted from [8])

When $I_1 = I_2$, only one thru is needed in this method.

The de-embedding procedure is as below:

1) Measure the scattering parameters $[S^{DUT}]$, $[S^{OPEN}]$, $[S^{THRU1}]$ and $[S^{THRU2}]$ of the DUT, “open”, “thru1” and “thru2”. Convert $[S^{OPEN}]$ to $[Y^{OPEN}]$ by using equation (5.1).

2) Calculate $Y_{PAD}$ ($Y_{PAD} = Y_{11}^{OPEN} + Y_{12}^{OPEN}$) and $[A^{PAD}]$:

$$[A^{PAD}] = \begin{bmatrix} 1 & 0 \\ Y_{PAD} & 1 \end{bmatrix} \tag{5.42}$$

3) Calculate $[A^{THRU1}]$ and $[A^{THRU2}]$ from $[S^{THRU1}]$ and $[S^{THRU2}]$.

4) Calculate $[A^{IN}]$ and $[A^{OUT}]$ by:

$$[A^{IN}] = [A^{THRU1}][A^{PAD}]^{-1} \tag{5.43}$$

$$[A^{OUT}] = [A^{PAD}]^{-1}[A^{THRU2}] \tag{5.44}$$

5) Convert $[S^{DUT}]$ to $[A^{DUT}]$ and calculate the ABCD matrix of the DUT, $[A^{TRANS}] = [A^{IN}]^{-1}[A^{DUT}][A^{OUT}]^{-1}$.

If needed, transform the ABCD matrix $[A^{TRANS}]$ to $[S^{TRANS}]$.
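The cascade procedure of equations (5.42) to (5.44) can be sketched with ABCD matrices as below. The S/ABCD conversions are the standard two-port formulas for equal, real reference impedances (50 Ω assumed); the function names are hypothetical, and the synthetic check models each interconnect as a simple series impedance purely for illustration.

```python
import numpy as np

def s2abcd(S, z0=50.0):
    """S- to ABCD-parameters (equal real reference impedance at both ports)."""
    s11, s12, s21, s22 = S[0, 0], S[0, 1], S[1, 0], S[1, 1]
    return np.array([
        [(1 + s11) * (1 - s22) + s12 * s21, z0 * ((1 + s11) * (1 + s22) - s12 * s21)],
        [((1 - s11) * (1 - s22) - s12 * s21) / z0, (1 - s11) * (1 + s22) + s12 * s21],
    ]) / (2 * s21)

def abcd2s(A, z0=50.0):
    """ABCD- to S-parameters (inverse of s2abcd)."""
    a, b, c, d = A[0, 0], A[0, 1] / z0, A[1, 0] * z0, A[1, 1]
    den = a + b + c + d
    return np.array([[a + b - c - d, 2 * (a * d - b * c)],
                     [2, -a + b - c + d]]) / den

def pad_abcd(y_pad):
    """Shunt pad admittance as an ABCD matrix, Eq. (5.42)."""
    return np.array([[1, 0], [y_pad, 1]], dtype=complex)

def deembed_cascade(A_dut_meas, A_thru1, A_thru2, y_pad):
    """Strip input and output fixtures, Eqs. (5.43), (5.44) and step 5."""
    P_inv = np.linalg.inv(pad_abcd(y_pad))
    A_in = A_thru1 @ P_inv
    A_out = P_inv @ A_thru2
    return np.linalg.inv(A_in) @ A_dut_meas @ np.linalg.inv(A_out)

# Synthetic check: pad + interconnect on each side of a known device.
y_pad = 2e-3j
P = pad_abcd(y_pad)
I1 = np.array([[1, 5 + 2j], [0, 1]])      # input interconnect (series z)
I2 = np.array([[1, 3 + 4j], [0, 1]])      # output interconnect (series z)
A_true = np.array([[1.2, 30.0], [0.01j, 0.9]])
A_thru1, A_thru2 = P @ I1 @ P, P @ I2 @ P
A_meas = P @ I1 @ A_true @ I2 @ P
A_rec = deembed_cascade(A_meas, A_thru1, A_thru2, y_pad)
print(np.allclose(A_rec, A_true))
```

In practice `y_pad` would come from the open structure as $Y_{11}^{OPEN} + Y_{12}^{OPEN}$, and the measured s-parameters would be converted with `s2abcd` before de-embedding.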

5.1.8 Two-port Network with One Open and One Thru

In [9], Cho et al. claim that the method in [8] suffers from improper pad compensation for the “THRU” dummy structures. To improve the accuracy, they proposed a new cascade-based de-embedding procedure using only one “OPEN” and one “THRU”, as shown in Fig. 5.13.

Figure 5.13. Cascade-based de-embedding method. (a) Layouts of DUT, OPEN and THRU. (b) Schematic diagrams. (Adopted from [9])

The de-embedding procedure is summarized as follows:

1) Measure the scattering parameters $[S^{DUT}]$, $[S^{OPEN}]$ and $[S^{THRU}]$ of the DUT, “OPEN” and “THRU”.

2) Convert the s-parameters $[S^{OPEN}]$ to y-parameters $[Y^{OPEN}]$, and calculate $Y_{PAD} = Y_{11}^{OPEN} + Y_{12}^{OPEN}$ and $[A^{PAD}]$.

3) Convert the s-parameters $[S^{THRU}]$ to the ABCD matrix $[A^{THRU}]$. De-embed the pad parasitics by $[A^{T}] = [A^{PAD}]^{-1}[A^{THRU}][A^{PAD}]^{-1}$, then transform $[A^{T}]$ back to s-parameters $[S^{T}]$.

4) Calculate the interconnect characteristic impedance $Z_c$ and propagation constant $\gamma$ as follows [10]:

$$Z_c = \pm Z_0 \sqrt{\frac{(1+S_{11})^2 - S_{21}^2}{(1-S_{11})^2 - S_{21}^2}} \tag{5.45}$$

$$\gamma = -\frac{1}{l} \ln\left\{\left[\frac{1 - S_{11}^2 + S_{21}^2}{2S_{21}} \pm K\right]^{-1}\right\} \tag{5.46}$$

where $Z_0$ is the impedance of the microwave measurement system, $l$ is the interconnect length, and

$$K = \left(\frac{(1 - S_{21}^2 + S_{11}^2)^2 - (2S_{11})^2}{(2S_{21})^2}\right)^{1/2} \tag{5.47}$$

The “±” sign in equations (5.45) and (5.46) is used to reject the unreasonable solutions.

5) Create the input and output interconnect ABCD matrices $[A^{INT1}]$ and $[A^{INT2}]$ by substituting the input and output interconnect lengths ($l_1$ and $l_2$) into:

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} \cosh\gamma l & Z_c \sinh\gamma l \\ \dfrac{1}{Z_c}\sinh\gamma l & \cosh\gamma l \end{bmatrix} \tag{5.48}$$

6) Calculate $[A^{IN}]$ and $[A^{OUT}]$ using $[A^{IN}] = [A^{PAD}][A^{INT1}]$ and $[A^{OUT}] = [A^{INT2}][A^{PAD}]$.

7) Convert $[S^{DUT}]$ to $[A^{DUT}]$ and calculate $[A^{D}] = [A^{IN}]^{-1}[A^{DUT}][A^{OUT}]^{-1}$.

8) If needed, convert $[A^{D}]$ to $[S^{D}]$.
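Step 4 of this procedure, the extraction of $Z_c$ and $\gamma$ from the s-parameters of a uniform line via equations (5.45) to (5.47), and the line ABCD matrix of equation (5.48) can be sketched as follows. The names `line_params` and `line_abcd` are hypothetical. The root selection shown here (positive real part of $Z_c$, and $|e^{-\gamma l}| \le 1$ for a passive line) is one assumed way of rejecting the unreasonable solutions, and no phase unwrapping is done, so the sketch is only valid for lines shorter than half a wavelength.

```python
import numpy as np

def line_params(S, length, z0=50.0):
    """Characteristic impedance and propagation constant of a uniform line."""
    S = np.asarray(S, dtype=complex)
    s11, s21 = S[0, 0], S[1, 0]
    # Eq. (5.45): choose the root with a positive real part.
    zc = z0 * np.sqrt(((1 + s11) ** 2 - s21 ** 2) / ((1 - s11) ** 2 - s21 ** 2))
    if zc.real < 0:
        zc = -zc
    # Eq. (5.47)
    K = np.sqrt(((s11 ** 2 - s21 ** 2 + 1) ** 2 - (2 * s11) ** 2) / (2 * s21) ** 2)
    base = (1 - s11 ** 2 + s21 ** 2) / (2 * s21)
    # Eq. (5.46): choose the sign that gives a decaying wave.
    e = 1 / (base + K)
    if abs(e) > 1:
        e = 1 / (base - K)
    gamma = -np.log(e) / length
    return zc, gamma

def line_abcd(zc, gamma, length):
    """ABCD matrix of a line section, Eq. (5.48)."""
    gl = gamma * length
    return np.array([[np.cosh(gl), zc * np.sinh(gl)],
                     [np.sinh(gl) / zc, np.cosh(gl)]])

# Synthetic check: s-parameters of a lossy line with known Zc and gamma.
zc_true, gamma_true, l = 45 - 3j, 20 + 300j, 1e-3
refl = (zc_true - 50.0) / (zc_true + 50.0)
x = np.exp(-gamma_true * l)
s11 = refl * (1 - x ** 2) / (1 - refl ** 2 * x ** 2)
s21 = x * (1 - refl ** 2) / (1 - refl ** 2 * x ** 2)
S_line = np.array([[s11, s21], [s21, s11]])
zc_est, gamma_est = line_params(S_line, l)
print(np.allclose([zc_est, gamma_est], [zc_true, gamma_true]))
```

With $Z_c$ and $\gamma$ known, `line_abcd` can be evaluated at the lengths $l_1$ and $l_2$ to build the interconnect matrices of step 5.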

5.2 Microwave Measurements of QP

In Chapter 4, QP prototypes with 50 and 100 µm wide nodules were fabricated and connected by electroless Sn plating. To obtain the s-parameters of the QP structures, two-port on-wafer measurements were conducted using an HP 8722E vector network analyzer and a Cascade Microtech Summit 9000 analytical probe station with ground-signal-ground (G-S-G) air coplanar probes with a pitch of 150 µm. The test environment was calibrated using a line-reflect-match (LRM) scheme [11]. Each port was terminated in a 50 Ω load for both simulation and measurement. On-wafer s-parameters of the QP structures were measured from 100 MHz to 40 GHz.

The “simple QP”, “QP improved 1” and “QP improved 2” structures referred to in the measurements are the same as in Fig. 3.32. A “simple QP” for 50 µm wide nodules and a “QP improved 2” for 100 µm wide nodules were shown in Fig. 4.42, and are repeated here in Fig. 5.14. Figure 5.15 shows the return loss (S11) and insertion loss (S21) of both simulated and measured QP structures with 100 µm wide nodules on low resistivity silicon substrate (10 Ω • cm) before de-embedding, which includes the effects of the on-chip interconnects on both ends.

Figure 5.14. Fabricated QP structures for microwave measurements: (a) simple QP with 50 µm wide nodules; (b) QP improved 2 with 100 µm wide nodules.

Figure 5.15. (a) Return loss and (b) insertion loss of both simulated and measured QP structures with 100 µm wide nodules on low resistivity silicon substrate. Axes: S11 (dB) and S21 (dB) versus frequency (0 to 40 GHz). Curves: simple QP, QP improved 1 and QP improved 2 (simulation and measurement), and on-chip interconnect (simulation).

Figure 5.15 shows that, in all cases, the improved QP geometries provide better impedance matching than the simple QP and therefore perform better. The measured return loss and insertion loss are both a little better than the simulated results, which may be due to nodule depth variation in the fabrication process. When the real nodule depth is less than the 20 µm assumed in the simulations, the performance may improve because of the reduced parasitic capacitance associated with the nodules embedded inside the silicon substrate. Another reason might be that the substrate resistivity is fixed at 10 Ω·cm in simulation, while in the real process the resistivity varies over a certain range. If the resistivity of the fabricated substrate is higher than 10 Ω·cm, the performance of the QP will be better.

Measured on-wafer open and short test structures were used to de-embed the effects of the on-chip interconnects to leave only the transmission performance of the nodule structures. S11 and S21 of the nodule interconnects are shown in Fig. 5.16. A better than 18 dB return loss and less than 0.2 dB insertion loss at 40 GHz are achieved.
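To put the decibel figures above in perspective, the short sketch below converts a return loss and an insertion loss into reflected and transmitted power fractions and a VSWR. The helper names are hypothetical, and losses are treated as positive dB magnitudes regardless of the sign used when quoting them.

```python
def reflected_power_fraction(return_loss_db):
    """Fraction of incident power reflected, |S11|^2, for a given return loss."""
    return 10 ** (-abs(return_loss_db) / 10)


def transmitted_power_fraction(insertion_loss_db):
    """Fraction of incident power delivered to the load, |S21|^2."""
    return 10 ** (-abs(insertion_loss_db) / 10)


def vswr(return_loss_db):
    """Voltage standing wave ratio corresponding to a given return loss."""
    gamma = 10 ** (-abs(return_loss_db) / 20)  # |S11|
    return (1 + gamma) / (1 - gamma)


# Values quoted in the text for the de-embedded 100 um nodules at 40 GHz
reflected = reflected_power_fraction(18.0)      # about 1.6 % of the power
transmitted = transmitted_power_fraction(0.2)   # about 95.5 % of the power
ratio = vswr(18.0)                              # about 1.29
```

An 18 dB return loss thus corresponds to less than 2 % of the incident power being reflected, and a 0.2 dB insertion loss to roughly 95 % of it reaching the far port.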

[Figure 5.16 plots: (a) S11 (dB) and (b) S21 (dB) versus frequency (0 to 40 GHz) for simple QP, QP improved 1 and QP improved 2; 100 µm nodules on a 10 Ω·cm substrate, after de-embedding.]

Figure 5.16. (a) Return loss (S11) and (b) insertion loss (S21) of 100 µm QP structures after de-embedding.

Quilt Packaging offers a highly optimized chip-to-chip interconnect in terms of performance and overall benefits. In a system using QP technology, signals that originate near the edges of the dice need to traverse only a very short distance between them, so latency would be extremely low. A worst-case scenario would be signals that originate deep in the interior of one chip and must terminate deep within the receiving chip. Even in such cases, the combination of a low-loss upper metal level interconnect with QP will offer a significant advantage over a “standard” signal path through a solder bump to a package substrate and back again. The advantage of QP is that the connection between ICs is direct, without the need to traverse package structures such as leads, bumps, and package wiring.

Here, we compare our measured performance of QP to alternative structures. Devlin et al. [12] showed that a 25.4 µm (1 mil) diameter, 0.3 nH bond wire introduced an insertion loss of 2 dB at 40 GHz. Braunisch et al. [13] demonstrated solder joints with an insertion loss of more than 1 dB at 40 GHz. When the die-package interface with additional pad capacitance (C = 300 fF) was considered, the insertion loss at 40 GHz increased to about 4 dB. Mantysalo and Ristolainen [14] investigated stacked 3-D package interconnects. A single solder-plated polymer ball introduced 0.3 dB insertion loss and 15 dB return loss at 10 GHz, while a ball-via-ball structure introduced more than 0.5 dB insertion loss and about 12 dB return loss at 10 GHz. Pfeiffer and Chandrasekhar [15] showed that a 30 µm high, 90 µm wide gold stud bump for flip-chip interconnects introduced 0.35 dB insertion loss at 30 GHz. Above 30 GHz, the insertion loss was dominated by the onset of a structural resonance. Banerjee and Drayton [16] designed and measured both monolithic (on the same substrate) and hybrid (on different substrates, similar to QP) packages built on high-resistivity silicon substrates (> 2000 Ω·cm) using double aluminum wirebonds. The insertion loss due to the wirebonds was 0.15 dB for the monolithic package at 50 GHz and 0.5 dB for the hybrid package at 40 GHz.
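As a rough plausibility check on the bond wire figure, the wire can be modeled as an ideal series inductor between two 50 Ω ports, for which S21 = 2·Z0 / (2·Z0 + jωL). This lumped model is an assumption made here for illustration (it ignores pad capacitance, resistance and radiation), and the function name is hypothetical.

```python
import math


def series_inductor_insertion_loss_db(l_henry, f_hz, z0=50.0):
    """Insertion loss (positive dB) of an ideal series inductor between two z0 ports."""
    omega = 2 * math.pi * f_hz
    s21 = 2 * z0 / (2 * z0 + 1j * omega * l_henry)
    return -20 * math.log10(abs(s21))


# 0.3 nH bond wire at 40 GHz, the case quoted from Devlin et al. [12]
loss_db = series_inductor_insertion_loss_db(0.3e-9, 40e9)
print(round(loss_db, 2))
```

With these inputs the model gives an insertion loss of about 1.95 dB, in line with the 2 dB reported for the 0.3 nH bond wire.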

Lahiji et al. [17] presented low-loss multiwafer vertical interconnects built on 100 µm thick, high-resistivity silicon (> 2000 Ω·cm) and GaAs substrates. At 20 GHz, the vertical vias in the two silicon designs showed 0.12 and 0.38 dB insertion loss and 12.9 and 17.3 dB return loss, respectively, and in the GaAs design, insertion loss was 0.2 dB and return loss was 13.6 dB. It is not surprising that excellent performance was obtained in 3-dimensional packaging systems, given the short distances involved in these direct chip-to-chip interconnects. Considering the costs and benefits of the myriad options in chip-to-chip interconnects, QP offers extremely high performance as well as many other advantages [18].

From Figs. 5.15 and 5.16, we see that the insertion loss increases moderately at low frequency (<3 GHz), which is attributed to the dielectric loss in the low resistivity silicon substrate. If high resistivity silicon or III-V substrates are used, this low frequency dispersion will disappear. In Chapter 4, QP structures built on high resistivity (8000 Ω·cm) silicon substrates were also fabricated, and the microwave measurements were repeated on them from 100 MHz to 40 GHz.
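A simple way to see why substrate resistivity matters is to estimate the conduction loss tangent of the bulk silicon, tan δ = σ / (ω·ε0·εr). The sketch below is a rough indicator only: it assumes εr = 11.9 for silicon, considers conduction loss alone, and does not model the actual field distribution of the coplanar structures. The function name is hypothetical.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m


def substrate_loss_tangent(resistivity_ohm_cm, f_hz, eps_r=11.9):
    """Conduction loss tangent sigma / (omega * eps) of a bulk substrate."""
    sigma = 1.0 / (resistivity_ohm_cm * 1e-2)  # conductivity in S/m
    return sigma / (2 * math.pi * f_hz * eps_r * EPS0)


# The two substrates used in this work, evaluated at 1 GHz
tan_low_res = substrate_loss_tangent(10.0, 1e9)      # about 15
tan_high_res = substrate_loss_tangent(8000.0, 1e9)   # about 0.019
```

Under these assumptions the loss tangent scales as 1/f and inversely with resistivity, so the 8000 Ω·cm substrate is 800 times less lossy than the 10 Ω·cm substrate at any given frequency, and the difference is most pronounced at the low end of the band.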

Fig. 5.17 shows the return loss and insertion loss of both simulated and measured QP structures with 100 µm wide nodules before de-embedding. The improved QP geometries perform better than the simple QP. Compared with Fig. 5.15, the insertion loss of “QP improved 2” improves from -1.5 dB to -0.3 dB. Also, the moderate drop at low frequency is not present in Fig. 5.17. The measured insertion loss in Fig. 5.17 is a little worse than the simulated data, which is likewise attributed to fabrication process variation and/or substrate resistivity variation.

[Figure 5.17 plots: (a) S11 (dB) and (b) S21 (dB) versus frequency (0 to 40 GHz) for 100 µm nodules on an 8000 Ω·cm substrate. Curves: simple QP, QP improved 1 and QP improved 2 (simulation and measurement), and on-chip interconnect (simulation).]

Figure 5.17. (a) Return loss and (b) insertion loss of both simulated and measured QP structures with 100 µm wide nodules on high resistivity silicon substrates before de-embedding.

To compare the transmission characteristics of QP structures on low and high resistivity silicon substrates in more detail, the return loss and insertion loss of “QP improved 2” were measured on both substrates and are shown in Fig. 5.18.

[Figure 5.18 plots: (a) S11 (dB) and (b) S21 (dB) versus frequency (0 to 40 GHz) for QP improved 2 on 10 Ω·cm and 8000 Ω·cm substrates.]

Figure 5.18. (a) Return loss (S11) and (b) insertion loss (S21) comparison of 100 µm QP improved 2 on low and high resistivity silicon substrates before de-embedding.

The S11 and S21 presented so far were obtained with both ports terminated in 50 Ω. In practice, the transmission lines leading to and from the QP structures do not necessarily have a characteristic impedance of 50 Ω. If the two ports are simultaneously conjugate matched instead of loaded with 50 Ω, the maximum available gain (MAG) is achieved [3].

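The MAG is expressed as

$$G_{MAG} = \frac{|S_{21}|}{|S_{12}|}\left(K - \sqrt{K^{2} - 1}\right) \tag{5.49}$$

where K is the Rollett stability factor, defined as

$$K = \frac{1 + |D|^{2} - |S_{11}|^{2} - |S_{22}|^{2}}{2\,|S_{12} S_{21}|} \tag{5.50}$$

with

$$D = S_{11} S_{22} - S_{12} S_{21} \tag{5.51}$$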

For two ports to be simultaneously conjugate matched, the necessary and sufficient condition is K ≥1.
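Equations (5.49) to (5.51) can be evaluated directly from a set of two-port s-parameters. The sketch below is illustrative only: the function names are hypothetical and the s-parameter values are made up, not taken from the measurements.

```python
import math


def rollett_k(s11, s12, s21, s22):
    """Rollett stability factor K, Eqs. (5.50) and (5.51)."""
    d = s11 * s22 - s12 * s21
    numerator = 1 + abs(d) ** 2 - abs(s11) ** 2 - abs(s22) ** 2
    return numerator / (2 * abs(s12 * s21))


def mag_db(s11, s12, s21, s22):
    """Maximum available gain in dB, Eq. (5.49); requires K >= 1."""
    k = rollett_k(s11, s12, s21, s22)
    if k < 1:
        raise ValueError("K < 1: simultaneous conjugate match is not possible")
    gain = abs(s21) / abs(s12) * (k - math.sqrt(k * k - 1))
    return 10 * math.log10(gain)


# Made-up reciprocal passive two-ports (not measured data)
matched = mag_db(0, 0.9, 0.9, 0)            # already matched to 50 ohm
mismatched = mag_db(0.1j, 0.9, 0.9, 0.1j)   # small reflection at each port
```

For the already matched network the MAG equals |S21|² in dB, about -0.92 dB. For the slightly mismatched network the MAG is higher than the 50 Ω value of 20·log10|S21|, because conjugate matching recovers the power that would otherwise be reflected.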

Figure 5.19 shows the MAG of 100 µm QP improved 2 structures on both low and high resistivity silicon substrates before de-embedding.

[Figure 5.19 plot: MAG (dB) versus frequency (0 to 40 GHz) for QP improved 2 on 10 Ω·cm and 8000 Ω·cm substrates.]

Figure 5.19. Maximum available gain (MAG) of QP improved 2 on both low and high resistivity silicon substrates.

Clearly, on the high resistivity substrate, the MAG of QP is much better and flatter across the whole measured frequency range.

Fig. 5.20 shows the return loss and insertion loss of the QP structures with 100 µm wide nodules on high resistivity substrate after de-embedding. A return loss of -30 dB and an insertion loss of around -0.15 dB are achieved at 40 GHz for QP improved 2.

[Figure 5.20 plots: (a) S11 (dB) and (b) S21 (dB) versus frequency (0 to 40 GHz) for simple QP, QP improved 1 and QP improved 2; 100 µm nodules on an 8000 Ω·cm substrate, after de-embedding.]

Figure 5.20. (a) Return loss (S11) and (b) insertion loss (S21) of 100 µm QP structures on high resistivity substrate after de-embedding.

The QP structures with 50 µm wide nodules on low resistivity substrate were also measured. Figures 5.21 and 5.22 show the return loss and insertion loss before and after de-embedding, respectively. The return loss is better than -25 dB for the improved QP structures. At 40 GHz, the insertion loss of the nodules is better than -0.6 dB for the simple QP and better than -0.4 dB for QP improved 2.

[Figure 5.21 plots: (a) S11 (dB) and (b) S21 (dB) versus frequency (0 to 40 GHz) for 50 µm nodules on a 10 Ω·cm substrate. Curves: simple QP, QP improved 1 and QP improved 2 (simulation and measurement), and on-chip interconnect (simulation).]

Figure 5.21. (a) Return loss and (b) insertion loss of both simulated and measured QP structures with 50 µm wide nodules on low resistivity silicon substrate.

[Figure 5.22 plots: (a) S11 (dB) and (b) S21 (dB) versus frequency (0 to 40 GHz) for simple QP, QP improved 1 and QP improved 2 (de-embedded); 50 µm nodules on a 10 Ω·cm substrate.]

Figure 5.22. (a) Return loss (S11) and (b) insertion loss (S21) of 50 µm QP structures on low resistivity substrate after de-embedding.

Comparing Fig. 5.16 with Fig. 5.22, the insertion loss of the 50 µm QP structures is a little higher than that of the 100 µm QP structures. This may be due to differences in the de-embedding structures (on-wafer open and short) for the 50 µm QP structures, e.g., the SiO2 layer thickness under the metal lines or the dimensions of the metal lines. Another strong candidate is the electrolessly plated Sn that connects the two chips. The contact resistance between the Sn and the two chips has a large impact on the microwave performance. A robust, repeatable connecting process is needed for precise evaluation of the return loss and insertion loss of the QP structures.

For the QP structures with 50 µm wide nodules on high resistivity substrate, simple QP and QP improved 1 were measured. QP improved 2 suffered from connection issues and will not be shown here. Figure 5.23 shows the return loss and insertion loss of both simulated and measured results on the 50 µm QP structures before de-embedding.

[Figure 5.23 plots: (a) S11 (dB) and (b) S21 (dB) versus frequency (0 to 40 GHz) for 50 µm nodules on an 8000 Ω·cm substrate. Curves: simple QP, QP improved 1 and QP improved 2 (simulation), on-chip interconnect (simulation), and simple QP and QP improved 1 (measurement).]

Figure 5.23. (a) Return loss and (b) insertion loss of both simulated and measured QP structures with 50 µm wide nodules on high resistivity silicon substrate.

A return loss better than -20 dB and an insertion loss better than -0.3 dB are observed for the 50 µm QP structures. These results are before de-embedding; after de-embedding, the performance is expected to be better. In our microwave measurements, however, the relatively large process parameter variation within a wafer led to over-de-embedding for these 50 µm QP structures, so the de-embedded results are omitted here because the data set is not valid.

In this chapter, microwave measurements of the fabricated 100 and 50 µm QP structures on both low and high resistivity substrates were performed up to 40 GHz. Compared with the state-of-the-art packaging techniques being pursued by industry, QP offers minimum delay, ultra-high speed and low interference for chip-to-chip communications.

Due to the lack of precise alignment instruments, QP structures with 10 and 20 µm wide nodules were fabricated but could not be connected to form a quilt. Better process control (for DRIE, plating and CMP) and better alignment control will be critical for the implementation of QP in mass production.

CHAPTER 6

CONCLUSIONS AND FUTURE WORK

6.1 Conclusions

A novel packaging technique, Quilt Packaging (QP), was proposed and demonstrated as an advanced system-in-packaging (SiP) method for ultra-high speed chip-to-chip communications. QP uses protruding metal nodules fabricated during the IC fabrication process to form inter-chip contacts.

Extensive 3-D electromagnetic simulations in Ansoft HFSS show the superior performance of QP, with high return loss and very low insertion loss at very high frequency (40 GHz for 50 and 100 µm wide nodules and 200 GHz for 10 and 20 µm wide nodules). The combination of microwave performance, ease of fabrication and robustness of the nodules led us to focus on 20 µm deep nodules embedded inside the silicon substrate. To further improve the QP performance for high speed communications, a tapered nodule signal line and a corresponding tapered nodule ground plane were designed to reduce the discontinuity between the on-chip interconnects and the nodule structures. For QP structures with 100 µm wide nodules on low resistivity (10 Ω·cm) substrate, the improved geometries demonstrated a return loss better than -16 dB and an insertion loss better than -0.6 dB in the frequency range up to 40 GHz, compared with a return loss of about -12 dB and an insertion loss of about -0.78 dB for the simple QP structure after de-embedding. Other QP structures with 10, 20 and 50 µm wide nodules all show improved transmission performance for the tapered geometries.

The best simulated return and insertion loss were achieved on the 20 µm wide QP structures. On low resistivity silicon substrate, a return loss better than -20 dB and an insertion loss better than -0.7 dB are predicted in the frequency range from 1 GHz to 200 GHz after de-embedding. The dielectric loss due to the low resistivity substrate causes a moderate drop in insertion loss at low frequency (<3 GHz). This low frequency dispersion is eliminated on the high resistivity (8000 Ω·cm) silicon substrate, and better insertion loss is achieved. Before de-embedding, the insertion loss of the 100 µm wide QP improved 2 structure, which includes the effect of the on-chip interconnect, is better than -0.15 dB at 40 GHz, compared with -1.9 dB on low resistivity substrate. For the 20 µm wide QP improved 2 structure, the insertion loss at 200 GHz improves from -2 dB on low resistivity substrate to -0.5 dB on high resistivity substrate.

To increase the I/O density of the QP prototypes, the width of the ground planes of both the on-chip interconnect and the nodule structure is decreased to the same width as the signal lines. The QPs with limited ground show comparable performance to the previous QPs with wider ground. For QP structures with 10 and 20 µm wide nodules on high resistivity substrates, resonances occur at about 140 GHz, which limits the bandwidth.

A low temperature fabrication process, which can be integrated into a standard CMOS process during the back-end-of-line procedures, was presented. The critical steps, such as DRIE, electroless/electrolytic Cu plating and CMP, were studied, and many experiments were carried out to find improved combinations of parameters for the fabrication processes.

The etch rate of the Bosch process in DRIE is dependent on the aspect ratio of the etched trench. For the same length, the wider the trench, the faster the etch rate. To precisely control the depth of the etched trenches, which later become the nodules, careful adjustment of the etch duration is required.

Both electroless and electrolytic Cu plating were investigated to find the better solution for the formation of the copper nodules. Electroless Cu plating has better selectivity: Cu grows only inside the trenches, where the seed layer remains, and not elsewhere, where the seed layer has been removed. However, electroless plating is very slow. Plating 20 µm takes a long time, and the quality of the plated Cu is questionable, with many voids appearing even in 5 µm deep test nodules. The robustness and reliability of the electrolessly formed Cu nodules are a concern.

Electrolytic Cu plating was therefore chosen over electroless plating to form more robust Cu nodules much faster. Bottom-up filling from the plating solution, along with pulse-reverse-pulse plating, makes the trench filling much easier.

The electrolytic plating puts a large burden on the subsequent CMP process. To remove the Cu more efficiently, the parameters, including carrier and platen rotation speed, down force pressure, slurry flow rate, and polishing time, all need careful adjustment. Too much down force pressure can break the wafer, since the topography after Cu plating is quite rough over the whole wafer. Since there is no temperature sensor in our CMP instrument to sense the difference between polishing Cu and polishing the dielectric layer, it is hard to prevent over-polishing, and constant observation is needed. In addition, since the condition of the platen changes with time, repeatability of the CMP process is a concern.

The separation of the chips is again achieved by DRIE. The adjustments made to the standard Bosch process result in an undercut beneath the copper nodules and a slope in the silicon substrate, which ensures that when two chips are brought together, the nodules on the front edge of each chip touch first. Since a long etch time is required to separate the chips, a thick mask to block the plasma etch in unwanted areas is critical to keep the QP chips intact.

The microwave measurements on the fabricated QP structures with 100 µm wide nodules on low resistivity silicon substrate showed insertion loss as low as -0.2 dB at 40 GHz after de-embedding. On high resistivity substrate, the insertion loss is better than -0.3 dB before de-embedding, which includes the effects of the on-chip interconnects. After de-embedding, the performance is expected to be even better. The insertion loss measured on the QP structures with 50 µm wide nodules is a little higher, which is attributed to the high contact resistance of the plated Sn bridge on the exposed Cu nodules.

QP proves to be a novel high performance packaging technique, which shows very low insertion loss and ultra-wide bandwidth and can be used for high speed chip-to-chip communications.

6.2 Future Work

Based on the conclusions presented above, several issues need to be considered and improved in future work.

Precise control of the etch rate of the DRIE Bosch process needs extensive experiments. The loading effects, which include macroloading (etch rate is dependent on the etchable area on the wafer scale), microloading (etch rate is dependent on the etchable area on a chip or feature scale), and profile loading (sidewall profile is dependent on the etchable area, which differs between array lines and isolated lines), have been studied [1]. Different mask designs will have different etch rates on the etchable area, and each needs to be calibrated. For defining the nodule trenches in QP, an accurate etch depth will make the subsequent electrolytic Cu plating more efficient and reduce the burden on CMP.

Due to the lack of instruments, an optimized Cu plating process for our nodule formation remains to be explored. A larger output current from the power source and a better setup to accommodate the IR drop across the wafer surface with its conductive seed layer are very important for better micro-filling of trenches and uniformity of plating.

As mentioned in Chapter 5, the microwave measurements on the QP structures with 50 µm wide nodules are missing the “QP improved 2” on high resistivity substrate. This is because of the lack of a proper alignment instrument to bring the two chips very close together, and because the Sn plating was not robust enough, which led to connection problems. The chips were manually aligned, which resulted in a larger than expected gap between the two chips. In the future, instead of connecting the two chips by bridging them with Sn, the Cu nodules can first be coated with plated Sn; then two chips, with the nodules from one side of each chip, are aligned precisely and pushed against each other with a continuous force. A heating instrument, such as a laser or a hot air blower, heats the connecting spot very quickly and melts the Sn. With the force pushing the two chips together, when the Sn from each nodule connects in the liquid state, surface tension will combine them into one and form a steady bond between the two chips. Since the melting point of Sn is only about 230 °C, the low temperature will not damage the fabricated chips. A fixture is under development to hold the chips. The temperature control of the heater is also very critical. With this technique, smaller nodules can be connected effectively, dramatically improving the I/O density and reliability. A schematic of this connecting technique is shown in Fig. 6.1.

[Figure 6.1 labels: fixed end; spring attached to provide pushing force; Y-direction sliding arm.]

Figure 6.1. Schematic of the connecting technique with fixture.

Also, to further improve reliability, which in industry is even more important than performance alone, we can put dielectric filling between the chips to help maintain the position of each chip. More simulations will need to be done to find an optimized design for this structure.

To separate the chips by DRIE, a thick photoresist mask layer was used. This layer is critical for protecting the circuits during the long through-wafer etch. Another way to separate the chips is called “dicing by thinning” [2][3][4], in which the wafers are thinned by back-side grinding and polishing to separate the dice along trenches previously etched on the front side of the wafers. Examples of the thinned chips are shown in Fig. 6.2. The thickness of the silicon substrate is only 25 µm after the thinning.

Figure 6.2. (a) 25 µm thick hexagonal chips with dicing line prepared by anisotropic etching, and (b) a 25 µm thick chip with rounded corner. (Adapted from [4])

By adopting this technique, the stringent requirement on the thick mask will be greatly relaxed. A thinner photoresist layer, less exposure time and a shorter etch time all improve the efficiency and reliability of QP fabrication. The vertical alignment of the nodules depends on tight control of the wafer thickness after grinding, which is estimated at better than 1 µm [5]. If less accurate instruments are used, another way to align is to place the chips face down and, after the connection is made, fill in the dielectric to fix the position of the chips.

The thinning of the substrate will have some impact on the microwave performance, and more simulation and optimization are needed to address it.

The mechanical strength of the nodule connection is very important. A simplified schematic used to describe the QP connection is shown in Fig. 6.3.

[Figure 6.3 labels: 1.2 mm; spacing.]

Figure 6.3. Simple schematic of QP connection.

Here, each chip has dimensions of 5 mm × 5 mm × 600 µm. In QP, many small nodules extend outside the chips and form the connections; these are simplified here as two stripes connecting the two chips. In our mechanical simulations in the commercial software package IntelliSuite [6], the back plane of the left chip is fixed. Since each silicon chip weighs about 0.03492 g, a force of about 3.42e-6 N is applied at the center of the right chip to test the stress in the copper stripes. For copper, the yield stress ranges from 55 MPa to 330 MPa, and the ultimate stress ranges from 230 MPa to 380 MPa. To ensure mechanical strength, the stress in the copper stripe at any point needs to be less than 55 MPa. A series of simulations was conducted, with the spacing between the two chips ranging from 0.5 µm to 4 µm. The thickness of the copper films was varied to find the lowest safe value. The simulation results are shown in Fig. 6.4. At 0.5 µm spacing, a 5 µm thick copper film is enough to hold the right chip without distortion or breakage. At 1 µm, a 6 µm thick copper film is needed. At 2, 3 and 4 µm, a film thickness of 8 µm is sufficient.


Figure 6.4. IntelliSuite simulation results on QP connecting by copper film with chip spacing at (a) 0.5 µm, (b) 1 µm, (c) 2 µm, (d) 3 µm and (e) 4 µm.

In the real QP connections, the copper nodules have a thickness of about 20 µm, which is more than enough to support the other chips even under mild vibration. However, in real QP, the copper does not connect directly from one chip to the other. The plated Sn plays a very large role in the mechanical strength of QP connections. A more complicated model that incorporates the effects of the plated Sn in the simulations awaits further exploration.

Besides the microwave performance and mechanical strength, the thermal behavior of QP is also very important. How heat dissipates through the Cu nodules from one chip to another or to the surrounding environment will need thorough thermal analysis. Spreading heat from the main chip to the other chips can be an advantage, by lowering the temperature of the main chip, or a disadvantage, because the temperature of the peripheral chips might increase. The overall impact is interesting and critical for our QP structures. Thermal limits are what drove the shift from merely increasing single-chip clock frequency to multi-core architectures. QP can serve as the high performance channel for multi-core systems.

In our research on QP so far, we have focused on silicon substrates. For microwave applications, III-V materials still dominate due to their wide bandgap and high substrate resistivity. A simple simulation built on a GaAs substrate using HFSS [7] is shown in Fig. 6.5. No optimization was performed for this model.

Figure 6.5. A QP model built on GaAs substrate.

The return loss and insertion loss are shown in Fig. 6.6.

Figure 6.6. (a) Return loss and (b) insertion loss of the GaAs QP model from 65 GHz to 85 GHz, a band targeting automotive radar systems.

The return loss is less than -20 dB and the insertion loss is better than -0.4 dB, before de-embedding.

To build QP on GaAs, a different DRIE system is required. To take advantage of the tools built for silicon processing, SiGe might be a feasible choice for ultra-high performance microwave chip-to-chip interconnection.

DRIE is a single wafer process that requires temperature control and constant gas flow, and is therefore expensive. Finding a more cost-effective way to define the trenches would add value to QP.

With careful design and process control, high density capacitors can be fabricated at the same time as the copper nodules. The etched deep trenches have a much higher surface area, which can be exploited in high performance analog or power designs that need high density capacitors.

181

Chapter 1 references

[1] R. Chou, B. Doyle, M. Doczy, S. Datta, S. Hareland, B. Jin, J. Kavalieros and M. Metz, “Silicon nano-transistors and breaking the 10 nm physical gate length barrier,” Proc. 61st Device Research Conf., pp. 123-126, Salt Lake City, June, 2003.

[2] International Technology Roadmap for (ITRS), 2005 Edition, SIA.

[3] T. Sakurai, “Superconnect technology,” IEICE Trans. Electron., vol.E84-C, no. 12, 2001, pp. 1709-1716.

[4] R.R. Tummala, “SOP: what is it and why? A new microsystem-integration technology paradigm - Moor’s law for system integration of miniaturized convergent systems of the next decade,” IEEE Trans. Adv. Packag., vol. 27, no. 2, May 2004, pp. 241-249.

[5] R.R. Tummala, M. Swaminathan, M.M. Tentzeris, J. Laskar, G. Chang, S. Sitaraman, D. Keezer, D. Guidotti, Z. Huang, K. Lim, L. Wan, S.K. Bhattacharya, V. Sundaram, F. Liu and P.M. Raj, “The SOP for miniaturized, mixed-signal computing, communication, and consumer systems of the next decade,” IEEE Trans. Adv. Packag., vol. 27, no. 2, May 2004, pp. 250-267.

[6] A. Abidi, A. Rogougaran, G. Chang, J. Rael, J. Chang, M. Rofougaran, and P. Chang, “The future of CMOS wireless transceivers,” IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, vol. 440, San Francisco, CA, 1997, pp. 118-119.

[7] D. Shaeffer, A. Shahani, S. Mohan, H. Samavati, H. Rategh, M. Hershenson, M. Xu, C. Yue, D. Eddleman, and T. Lee, “A 115 mW CMOS GPS receiver,” IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, San Francisco, CA, 1998, pp. 122-123.

[8] M. Goetz, “System on chip design methodology applied to system in package architecture,” IEEE Electronic Components and Technology Conf. 2002, pp. 254- 258.

[9] K. Ohsawa, H. Odaira, M. Ohsawa, S. Hirade, T. Iijima, and S.G. Pierce, “3-D assembly interposer technology for next-generation integrated systems,” ISSCC Digest of Tech. Papers, Feb. 2001, pp. 272-273.

182 [10] G.H. Bernstein, Q. Liu, Z. Sun, and P. Fay, “Quilt-packaging: a new paradigm for inter-chip communication,” Proc. IEEE 7th Electronics Packaging Technology Conference, 2005, pp. 1-6.

183

Chapter 2 references

[1] R. Fillion, “Advanced packaging technology for leading edge microelectronics and flexible electronics,” http:// people.ccmr.cornell.edu/~cober/mse542/page2/files /Fillion%20GE.pdf.

[2] N. Sherwani, Q. Yu, and S. Badida, Introduction to multichip modules, John Wiley & Sons, Inc., 1995.

[3] P.A. Sandborn and H. Moreno, Conceptual Design of Multichip Modules and Systems, Kluwer Academic Publishers, 1994.

[4] S. Donnay, P. Pieters, K. Vaesen, W. Diels, P. Wambacq, W.D. Raedt, E. Beyne, M. Engels, and I. Bolsens, “Chip-package codesign of a low-power 5-GHz RF front end,” Proceedings of the IEEE, vol. 88, no. 10, 2000, pp. 1583-1597.

[5] N.A. Blum, H.K. Charles, A.S. Francomacaro, “Multichip module substrates,” Johns Hopkins APL Technical Digest, vol. 20, no. 1, 1999, pp. 62-69.

[6] E. Davis, W. Harding, R. Schwartz, and J. Coring, “Solid logic technology: Versatile high performance microelectronics,” IBM J. Res. Develop., vol. 8, 1964, pp. 102.

[7] R.J. Drost, R.D. Hopkins, and I.E. Sutherland, “Proximity communication,” Proc. IEEE 2003 Custom Integrated Circuits Conf., 2003, pp. 469-472.

[8] K.D. Gann, “Neo-stacking technology,” http://www.irvine-sensors.com/pdf/Neo-Stacking%20Technology%20HDI-d.pdf.

[9] K.W. Guarini, A.W. Topol, M. Ieong, R. Yu, L. Shi, M.R. Newport, D.J. Frank, D.V. Singh, G.M. Cohen, S.V. Nitta, D.C. Boyd, P.A. O’Neill, S.L. Tempest, H.B. Pogge, S. Purushothaman, and W.E. Haensch, “Electrical integrity of state-of-the-art 0.13 µm SOI CMOS devices and circuits transferred for three-dimensional (3D) integrated circuit (IC) fabrication,” Proc. Int’l. Electron Device Meeting, 2002, pp. 943-945.

[10] J. Fjelstad, “Novel interconnection technology for high speed chip to chip signal transmission,” http://www.sipipe.com/docs/NovelInterconnectionTechnology.pdf.

[11] R.R. Tummala, “SOP: what is it and why? A new microsystem-integration technology paradigm - Moore’s law for system integration of miniaturized convergent systems of the next decade,” IEEE Trans. Adv. Packag., vol. 27, no. 2, May 2004, pp. 241-249.

[12] J. Fjelstad, “Novel interconnection technology for high speed chip to chip signal transmission,” Available: http://www.sipipe.com/docs/NovelInterconnectionTechnologyJ.pdf.

[13] S.N. Towle, H. Braunisch, C. Hu, R.D. Emery, and G.J. Vandentop, “Bumpless build-up layer packaging,” in Proc. ASME Int. Mech. Eng. Congr. Expo. (IMECE), New York, Nov. 11-16, 2001.

[14] H. Braunisch, S.N. Towle, R.D. Emery, C. Hu and G.J. Vandentop, “Electrical performance of bumpless build-up layer packaging,” in Proc. IEEE Electronic Components and Technol. Conf. (ECTC), 2002, pp. 353-358.

[15] E. Klink, B. Garben, A. Huber, D. Kaller, S. Grivet-Talocia, and G.A. Katopis, “Evolution of organic chip packaging technology for high speed applications,” IEEE Trans. Adv. Packag., vol. 27, no. 1, February 2004, pp. 4-9.

Chapter 3 references

[1] G.H. Bernstein, P. Fay, W. Porod, and Q. Liu, “System for inter-chip communication,” US Patent (submitted).

[2] G.H. Bernstein, Q. Liu, Z. Sun, and P. Fay, “Quilt-packaging: a new paradigm for inter-chip communication,” Proc. IEEE 7th Electronics Packaging Technology Conference, 2005, pp. 1-6.

[3] Q. Liu, M. Yan, A. Tong, G. Snider, P. Fay, and G.H. Bernstein, “Fabrication and characterization of quilt packaging: a novel inter-chip paradigm for system-in-package (SiP),” 2nd Intl. Workshop on SOP, SOC, SIP Electronics Technologies, 2006.

[4] G.H. Bernstein, Q. Liu, M. Yan, Z. Sun, W. Porod, G. Snider, and P. Fay, “Quilt packaging: high-density, high-speed interchip communications,” IEEE Trans. Adv. Packag., 2007. (accepted)

[5] Q. Liu, P. Fay, and G.H. Bernstein, “A novel scheme for wide bandwidth chip-to-chip communications,” Journal of Microelectronics and Electronic Packaging. (accepted)

[6] S.W. Song, M. Ismail, G. Moon, and D.Y. Kim, “Accurate model of simultaneous switching noise in low voltage digital VLSI,” Proc. 1999 IEEE Int. Symp. on Circ. and Sys., 1999, pp. 210-213.

[7] S.F. Al-Sarawi, D. Abbott, and P.D. Franzon, “A review of 3-D packaging technology,” IEEE Trans. Components, Packaging, and Manufacturing Technol., pt. B, vol. 21, no. 1, 1998, pp. 2-14.

[8] F. Moll and M. Roca, Interconnection Noise in VLSI Circuits, Kluwer Academic Publishers, 2004.

[9] L. Martens, High-Frequency Characterization of Electronic Packaging, Kluwer Academic Publishers, 1998.

[10] G.A. Katopis, “Delta-I noise specifications for high-performance computing machine,” Proc. IEEE, vol. 73, no. 9, pp. 1405-1415, Sept. 1985.

[11] National Semiconductor, “Understanding and minimizing ground bounce,” Application Note 640, Dec. 1989.

[12] M.J. Kobrinsky, B.A. Block, J. Zheng, B.C. Barnett, E. Mohammed, M. Reshotko, F. Robertson, S. List, I. Young, and K. Cadien, “On-chip optical interconnects,” Intel Technology Journal, vol. 08, issue 02, pp. 129-141, May 2004.

[13] R.H. Havemann and J.A. Hutchby, “High-performance interconnects: an integration overview,” Proc. IEEE, vol. 89, no. 5, pp. 586-601, May 2001.

[14] Y. Takao, S. Nakai, and N. Horiguchi, “Extended 90 nm CMOS technology with high manufacturability for high-performance, low-power, RF/analog applications,” Fujitsu Sci. Tech. J., vol. 39, pp. 32-39, June 2003.

[15] J.M. Rabaey, Digital Integrated Circuits: A Design Perspective, Prentice Hall, 1999.

[16] W.C. Elmore, “The transient response of damped linear networks with particular regard to wideband amplifiers,” Journal of Applied Physics, pp. 55-63, Jan. 1948.

[17] T. Sakurai, “Superconnect technology,” IEICE Trans. Electron., vol. E84-C, no. 12, pp. 1709-1716, 2001.

[18] T. Sakurai, “Interconnection from design perspective,” Advanced Metallization Conference, pp. 53-58, Oct. 2000.

[19] K. Nose and T. Sakurai, “Power-conscious interconnect buffer optimization with improved modeling of driver MOSFET and its implications to bulk and SOI CMOS technology,” International Symposium on Low Power Electronics and Design, Monterey, CA, USA, 1.4s, pp. 24-29, Aug. 2002.

[20] S. Srinivasaraghavan and W. Burleson, “Interconnect effort – a unification of repeater insertion and logical effort,” IEEE Computer Society Symposium on VLSI, Tampa, FL, pp. 55-61, Feb. 2003.

[21] D.M. Pozar, Microwave Engineering, John Wiley & Sons, 1998.

[22] C. Nguyen, Analysis Methods for RF, Microwave and Millimeter-Wave Planar Transmission Line Structures, John Wiley & Sons, 2000.

[23] S. Ramo, Fields and Waves in Communication Electronics, John Wiley & Sons, 1984.

[24] W.C. Johnson, Transmission Lines and Networks, McGraw-Hill, 1950.

[25] D.L. Monthei, Package Electrical Modeling, Thermal Modeling, and Processing for GaAs Wireless Applications, Kluwer Academic Publishers, 1999.

[26] Y.-C. Shih, “Broadband characterization of conductor-backed coplanar waveguide using accurate on-wafer measurement techniques,” Microwave J., vol. 34, no. 4, pp. 95-105, Apr. 1991.

[27] W.R. Eisenstadt and Y. Eo, “S-parameter-based IC interconnect transmission line characterization,” IEEE Trans. Comp., Packag., Manufact. Technol., vol. 15, no. 4, pp. 483-490, Aug. 1992.

[28] G. Ghione and C. Naldi, “Analytical formulas for coplanar lines in hybrid and monolithic MICs,” Electron. Lett., vol. 20, no. 4, pp. 179-181, Feb. 1984.

[29] W. Hilberg, “From approximation to exact relations for characteristic impedance,” IEEE Trans. Microwave Theory Tech., vol. MTT-13, pp. 29-38, Jan. 1965.

[30] K.C. Gupta, R. Garg, and I.J. Bahl, Microstrip Lines and Slotlines, Artech House, Dedham, MA, 1979, pp. 277-280.

[31] G. Ghione and C. Naldi, “Parameters of coplanar waveguides with lower ground plane,” Electron. Lett., vol. 19, no. 18, pp. 734-735, Sept. 1983.

[32] G. Ghione and C. Naldi, “Coplanar waveguides for MMIC applications: effect of upper shielding, conductor backing, finite-extent ground planes, and line-to-line coupling,” IEEE Trans. Microwave Theory Tech., vol. MTT-35, no. 3, pp. 260-267, Mar. 1987.

[33] K.C. Gupta, R. Garg, and R. Chadha, Computer-Aided Design of Microwave Circuits, Artech House, Dedham, MA, 1981.

[34] Ansoft High-Frequency Structure Simulator (HFSS), Ansoft Inc., Pittsburgh, PA.

[35] “Port tutorial series: coplanar waveguide (CPW),” Ansoft HFSS Online Technical Support, www.ansoft.com/OTS.

Chapter 4 references

[1] F. Laermer and A. Schilp, “Method of anisotropically etching silicon,” US-Patent No. 5501893.

[2] A.A. Ayon, R.L. Bayt, and K.S. Breuer, “Deep reactive ion etching: a promising technology for micro- and nanosatellites,” Smart Mater. Struct., vol. 10, pp. 1135-1144, 2001.

[3] F. Laermer, A. Schilp, K. Funk, and M. Offenberg, “Bosch deep silicon etching: improving uniformity and etch rate for advanced MEMS applications,” Proceedings MEMS ’99, pp. 211-216, Orlando, FL, USA.

[4] W. Hu, T. Orlova, and G.H. Bernstein, “Technique for preparation of precise wafer cross sections and applications to electron beam lithography of poly(methylmethacrylate) resist,” J. Vac. Sci. Technol. B, vol. 20, no. 6, pp. 3085-3088, Nov/Dec. 2002.

[5] K.S. Chen, A.A. Ayon, X. Zhang, and S.M. Spearing, “Effect of process parameters on the surface morphology and mechanical performance of silicon structures after deep reactive ion etching (DRIE),” J. Microelectromech. Syst., vol. 11, no. 3, 2002, pp. 264-275.

[6] G.O. Mallory and J.B. Hajdu, Electroless Plating: Fundamentals and Applications, William Andrew Publishing/Noyes, Chapter 12, pp. 289-329, 1990.

[7] H. Jiang, J.-L.A. Yeh, and N.C. Tien, “A new fabrication method for high-Q on-chip spiral inductor,” Proc. SPIE, vol. 3876, pp. 153-159, 1999.

[8] H. Jiang, Y. Wang, J.-L.A. Yeh, and N.C. Tien, “Fabrication of high-performance on-chip suspended spiral inductors by micromachining and electroless copper plating,” IEEE MTT-S Digest, pp. 279-282, 2000.

[9] J.-L.A. Yeh, H. Jiang, H.P. Neves, and N.C. Tien, “Copper-encapsulated silicon micromachined structures,” IEEE J. of Microelectromechanical Systems, vol. 9, no. 3, pp. 281-287, Sept. 2000.

[10] R.M. Lukes, “Chemistry of autocatalytic reduction of copper by alkaline formaldehyde,” Plating, vol. 15, no. 11, pp. 1066-1068, Nov. 1964.

[11] P. Buck and L.R. Griffith, “Voltammetric and chronopotentiometric study of the anodic oxidation of methanol, formaldehyde, and formic acid,” J. Electrochem. Soc., vol. 109, no. 11, pp. 1066-1068, Nov. 1964.

[12] M. Paunovic, “Electrochemical aspects of electroless deposition of metals,” Plating, vol. 55, no. 11, pp. 1161-1167, Nov. 1968.

[13] R. Jagannathan and M. Krishnan, “Electroless plating of copper at a low pH level,” IBM J. Res. Develop., vol. 37, no. 2, pp. 117-123, March 1993.

[14] J. Wu, Inductive links with integrated receiving coils for MEMS and implantable applications, Ph.D. thesis, Univ. of Notre Dame, 2003.

[15] U. Landau, “Copper metallization of semiconductor interconnects – issues and prospects,” Invited Talk, CMP Symposium, Abstract #505, Electrochemical Society Meeting, Phoenix, AZ, October 2000.

[16] N.V. Mandich, “Pulse and pulse-reverse electroplating,” 66th Metal Finishing Guidebook, vol. 95, no. 1A, pp. 375-380, 1998.

[17] N. Tantavichet and M.D. Pritzker, “Low- and high-frequency pulse current and pulse reverse plating of copper,” J. Electrochem. Soc., vol. 150, pp. 665-677, 2003.

[18] J. Lee and A.C. West, “Impact of pulse parameters on current distribution in high aspect ratio vias and through-holes,” J. Electrochem. Soc., vol. 152, pp. 645-651, 2005.

[19] D. Boning, W. Moyne, T. Smith, J. Moyne, R. Telfeyan, A. Hurwitz, S. Shellman, and J. Taylor, “Run by run control of chemical-mechanical polishing,” IEEE Trans. CPMT (C), vol. 19, no. 4, pp. 307-314, Oct. 1996.

[20] K. Noh, N. Saka, and J.-H. Chun, “A mechanical model for erosion in copper chemical-mechanical polishing,” http://web.mit.edu/cmp/publications/papers/noh_final.pdf.

[21] J.-Y. Lai, N. Saka, and J.-H. Chun, “Evolution of copper-oxide damascene structures in chemical mechanical polishing - copper dishing and oxide erosion,” J. Electrochem. Soc., vol. 149, pp. G41-G50, 2002.

[22] T.E. Gbondo-Tugbawa, “Chip-scale modeling of pattern dependencies in copper chemical mechanical polishing process,” Ph.D. thesis, M.I.T., 2002.

[23] C.L. Borst, W.N. Gill, and R.J. Gutmann, Chemical-Mechanical Polishing of Low Dielectric Constant Polymers and Organosilicate Glasses: Fundamental Mechanisms and Application to IC Interconnect Technology, Kluwer Academic Publishers, 2002.

[24] J.M. Steigerwald, S.P. Murarka, R.J. Gutmann, and D.J. Duquette, “Effect of copper ions in the slurry on the chemical-mechanical polish rate of titanium,” J. Electrochem. Soc., vol. 141, pp. 3512-3516, Dec. 1994.

[25] F.W. Preston, “The theory and design of plate glass polishing machines,” J. Soc. Glass Technology, vol. 11, pp. 214-256, 1927.

[26] J.-Y. Lai, “Mechanics, mechanism and the modeling of chemical mechanical polishing process,” Ph.D. thesis, M.I.T., 2001.

[27] J. Wu, “Inductive links with integrated receiving coils for MEMS and implantable applications,” Ph.D. thesis, Univ. of Notre Dame, 2003.

[28] A. Frank and A.J. Bard, “The decomposition of the sulfonate additive sulfopropyl sulfonate in acid copper electroplating chemicals,” J. Electrochem. Soc., vol. 150, pp. C244-C250, 2003.

Chapter 5 references

[1] T.E. Kolding, “On-wafer calibration techniques for Giga-Hertz CMOS measurements,” IEEE International Conference on Microelectronic Test Structures, Vol. 12, 1999, pp. 105-110.

[2] D. Lovelace, “Program de-embeds wafer-probed data,” Microwaves & RF, Vol. 32, No. 6, 1993, pp. 136-138.

[3] D.M. Pozar, Microwave Engineering, John Wiley & Sons, 1998.

[4] J. Hanseler, H. Schinagel, and H. Zapf, “Test structures and measurement techniques for the characterization of the dynamic behavior of CMOS transistors on wafer in the GHz range,” IEEE International Conference on Microelectronic Test Structures, Vol. 5, 1992, pp. 90-93.

[5] H. Cho and D. Burk, “A three step method for the de-embedding of high frequency s-parameter measurements,” IEEE Trans. on Electron Devices, Vol. 38, No. 6, 1991, pp. 1371-1375.

[6] E.P. Vandamme, D. Schreurs, and C. Dinther, “Improved three-step de-embedding method to accurately account for the influence of pad parasitics in silicon on-wafer test structures,” IEEE Trans. on Electron Devices, Vol. 48, No. 4, 2001, pp. 737-742.

[7] T.E. Kolding, “A four-step method for de-embedding Gigahertz on-wafer CMOS measurement,” IEEE Trans. on Electron Devices, Vol. 47, No. 4, 2000, pp. 734-740.

[8] C.H. Chen and M.J. Deen, “A general noise and s-parameter de-embedding procedure for on-wafer high-frequency measurements of MOSFETs,” IEEE Trans. Microwave Theory Tech., vol. 49, pp. 1004-1005, May 2001.

[9] M.H. Cho, G.W. Huang, K.M. Chen, and A.S. Peng, “A novel cascade-based de-embedding method for on-wafer microwave characterization and automatic measurement,” IEEE MTT-S Int. Microwave Symp. Dig., pp. 1237-1240, June 2004.

[10] W.R. Eisenstadt and Y. Eo, “S-parameter-based IC interconnect transmission line characterization,” IEEE Trans. Compon., Hybrids, Manufact. Tech., vol. 15, pp. 483-490, August 1992.

[11] T. Winkel, L. Dutta, and H. Grabinski, “An on-wafer de-embedding procedure for devices under measurement with error-networks containing arbitrary line lengths,” 47th Automatic RF Techniques Group Conference Digest, 1996, pp. 102-111.

[12] L.M. Devlin, G.A. Pearson, A.W. Dearn, and S. Williamson, “28GHz multi-chip modules,” Available: www.plextek.com/papers/mm_paper.pdf.

[13] H. Braunisch, K.P. Hwang, and R.D. Emery, “Compliant die-package interconnects at high frequencies,” Proc. IEEE Electronic Components and Technology Conference, 2004, pp. 1237-1243.

[14] M. Mantysalo and E.O. Ristolainen, “Modeling and analyzing vertical interconnections,” IEEE Trans. Adv. Packag., vol. 29, no. 2, May 2006, pp. 335-342.

[15] U.R. Pfeiffer and A. Chandrasekhar, “Characterization of flip-chip interconnects up to millimeter-wave frequencies based on a nondestructive in situ approach,” IEEE Trans. Adv. Packag., vol. 28, no. 2, 2005, pp. 160-167.

[16] S.R. Banerjee and R.F. Drayton, “Wafer level interconnects for 3D packaging,” Proc. IEEE Electronic Components and Technology Conference, 2004, pp. 1513-1518.

[17] R.R. Lahiji, K.J. Herrick, Y. Lee, A. Margomenos, S. Mohammadi, and L.P.B. Katehi, “Multiwafer vertical interconnects for three-dimensional integrated circuits,” IEEE Trans. Microw. Theory Tech., vol. 54, no. 6, 2006, pp. 2699-2706.

[18] G.H. Bernstein, Q. Liu, Z. Sun, and P. Fay, “Quilt-packaging: a new paradigm for inter-chip communication,” Proc. IEEE 7th Electronics Packaging Technology Conference, 2005, pp. 1-6.

Chapter 6 references

[1] J. Karttunen, J. Kiihamaki, and S. Franssila, “Loading effects in deep silicon etching,” Proceedings of SPIE 2000, vol. 4174, pp. 90-97.

[2] C. Landesberger, G. Klink, G. Schwinn, and R. Aschenbrenner, “New dicing and thinning concept improves mechanical reliability of ultra thin silicon,” Proc. 2001 IEEE Int. Symp. and Exhibition on Advanced Packaging Materials, 2001, pp. 92-97.

[3] G. Klink, M. Feil, F. Ansorge, and R. Aschenbrenner, “Innovative packaging concepts for ultra thin integrated circuits,” Proc. 51st IEEE Electronic Components Conference, 2001, pp. 1034-1039.

[4] M. Feil, C. Adler, G. Klink, M. Konig, C. Landesberger, S. Scherbaum, G. Schwinn, and H. Spohrle, “Ultra thin ICs and MEMS elements: techniques for wafer thinning, stress-free separation, assembly and interconnection,” Microsystem Technologies, vol. 9, 2003, pp. 176-182.

[5] Available: http://www.thatcorp.com/datashts/grinddata.pdf.

[6] Available: http://www.intellisensesoftware.com

[7] D. Nair, Private Communication, Advanced Microwave Packaging, Delphi, Kokomo, IN.
