
NANOELECTRONICS: TECHNOLOGY ASSESSMENT AND PROJECTION AT THE DEVICE, CIRCUIT, AND SYSTEM LEVEL

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

Lan Wei

August 2010

© 2010 by Lan Wei. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/mj834nb2809

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Philip Wong, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Subhasish Mitra

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Krishna Saraswat

Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


ABSTRACT

Over the past thirty years, the industry has made steady progress in improving intrinsic Si-based device performance while reducing device size, simply by scaling down the devices in all three dimensions, as proposed by R. Dennard in the 1970s. However, the improvement of intrinsic device performance based on Dennard scaling is approaching its limit because of the non-scalable supply voltage, leakage currents, and non-negligible parasitic capacitances and resistances. Continued progress in nanoelectronics necessitates a holistic view that crosses the traditional boundaries of device, circuit, and system. The best devices are those that are optimized for the circuits and systems of the target application. Device design and engineering must aim at improvement at the circuit and system level.

In this thesis, the design space is explored for future Si CMOS technology and for the carbon nanotube field effect transistor, a promising technology in the post-Si era. Compact models of transport properties and capacitive components of different device structures have been developed to facilitate circuit-level analysis and system-level optimization. Possible ways of extending the technology roadmap are proposed.

As Dennard scaling becomes less effective, technology boosters are introduced to improve the intrinsic device performance. Currently, the main approaches are enhancing the intrinsic transport properties of the device (e.g., introducing strain and novel channel materials) and strengthening its intrinsic electrostatic integrity (e.g., SOI, FinFET, high-k/metal gate). Conventional technology assessment focuses on the intrinsic device-level characteristics under certain bias conditions, regardless of the details of the operating conditions. Our study shows the importance of comprehending the circuit environment and system application when benchmarking device performance. As a general guideline for future technology development, enhancing the transport property is a key booster for circuits with un-stacked logic gates and high-performance applications, while strengthening the electrostatic integrity is preferred for circuits with stacked logic gates and low-power applications.

As the non-scalable parasitic capacitances and resistances are no longer negligible compared with the intrinsic properties, the device-level parasitic components become critical for circuit-level performance. Parasitic engineering is inevitable for future technology generations. We analyze the geometric and electrical properties of different capacitance components and develop compact models for a variety of conventional and emerging device structures, including bulk devices, fully-depleted silicon-on-insulator (FDSOI) devices, planar double gate (DG) devices, and one-dimensional /nanotube devices. With the models, we carefully examine the impact of different device parasitic capacitances on circuit-level performance. When the physical gate length can no longer be effectively scaled down and traditional boosters (e.g., strain, high-k/metal gate) yield diminishing returns, engineering the parasitic components by “selective device structure scaling” becomes the most fruitful path to extend the technology roadmap. By judiciously engineering the device parasitic resistance and parasitic capacitance, and considering the impact of the interconnect wiring capacitance, selective device structure scaling will enable technology scaling and contacted gate pitch scaling for several generations beyond the currently perceived limits. For a fully custom designed 53-bit multiplier, the selectively scaled devices with reduced footprint achieve 30% smaller layout area, 25% higher speed and 10% efficiency, compared with conventional devices.

Beyond Si CMOS scaling, the carbon nanotube field effect transistor (CNFET) is among the promising candidates to extend the technology. Device-level analyses have suggested significant benefits, but it is not clear what the circuit- and system-level benefits are at the chip level. We take an application-oriented approach, as opposed to the device-level modeling employed in previous works. A fast, non-iterative analytical model for the CNFET has been developed, including both carrier transport and capacitance components, which are the key determinants of performance. The chip-level system (processor plus on-chip cache) performance of CNFET technology has been optimized and compared with Si-based technology. The optimization results show that a CNFET chip can operate 5 times faster than a partially-depleted silicon-on-insulator (PDSOI) chip at the same chip power level, from 0.01 W to 100 W. The optimization methodology and platform can potentially be transplanted to other emerging devices.

Besides full chip-level optimization, simple technology assessment and optimization methodologies are also required, especially at the early development stage of a new technology. Conventionally, aspiring emerging device technologies (e.g., III-V, CNFET, TFET) are often targeted to outperform Si FETs at the same off-state current (Ioff) and supply voltage (Vdd). We present a new device technology assessment methodology based on energy-delay optimization, which treats Ioff and Vdd as “free variables” bounded by constraints due to device variation and circuit noise margin. We show that for each emerging device (III-V, CNFET, TFET), there is a corresponding and different optimal set of Ioff and Vdd, and an optimal energy-delay. Today’s best-available III-V and CNFET can outperform the best Si FET by 1.5-2x and 2-3.5x, respectively. Projected into the 10nm gate length regime, III-V-on-Insulator, CNFET, and TFET are 1.25x, 2-3x, and 5-10x (for FO1 delays of 0.3ns, 0.1ns, and 1ns, respectively) better than the ITRS target at the same gate length.

In conclusion, this thesis addresses the significance of a holistic view across the traditional boundaries of device, circuit, and system in technology assessment and projection. Guided by this concept, we chart a new path for Si CMOS technology scaling for future technology generations. We also accomplish, for the first time, chip-level performance assessment and optimization of new emerging transistors such as CNFETs. This establishes a new benchmark and design methodology for emerging nanoelectronics.


ACKNOWLEDGMENTS

I wish to acknowledge the people who have made my 5-year experience at Stanford a very pleasant and fruitful journey.

This dissertation would not have been possible without the supervision and support of my Principal Advisor, Prof. H.-S. Philip Wong. As a distinguished researcher, he has provided excellent technical guidance at every stage of my Ph.D. research through his vast knowledge and experience. As an outstanding mentor, Prof. Wong always respects his students and treats their long-term growth as a priority. I owe him my deepest gratitude for his support and encouragement in many of my achievements here at Stanford.

I would also like to thank Prof. Krishna Saraswat, Prof. Subhasish Mitra, and Prof. Ada Poon. Through his technical insights, Prof. Saraswat has given me invaluable suggestions regarding both my research and career development. I am fortunate to have been a part of the “carbon nanotube club” led by Prof. Mitra and Prof. Wong. As a successful and energetic junior faculty member, Prof. Mitra is the key person who keeps the club active, supportive, and productive. Prof. Ada Poon is not only the person who guided me into the bioengineering area, but also a trustworthy and caring friend who has given me precious advice on both work and life.

I am heartily thankful to my industrial mentors and other academic advisors. In particular, I have greatly benefited from Dr. David J. Frank at IBM Research and Dr. Frédéric Boeuf at STMicroelectronics, with whom I have collaborated intensively. Discussions with them are always inspiring, and they never hesitate to offer their help or share their thoughts, knowledge, and experience. I am also grateful to Dr. Pranav Kalavade (), Dr. Leland Chang, Dr. Tak Ning, Dr. Wilfried Haensch (IBM), Dr. Thomas Skotnicki (STMicroelectronics), and Dr. Kwok Ng (SRC), who have been very supportive and helpful during my internships and the study that followed. Discussions with Prof. Dimitri Antoniadis at MIT and Prof. Mark Horowitz at Stanford are also greatly appreciated.

My colleagues in the Stanford Nanoelectronics Group and the “carbon nanotube club” deserve special thanks. I would especially like to show my gratitude to Dr. Jie Deng and Prof. Deji Akinwande for their help during my project set-up and job search, and thank Dr. Yuan Zhang and Soogine Chong for their continual kindness and encouragement. I would also like to thank Saeroonter Oh, Jieying Luo, Kokab Parizi, Albert Lin, Nishant Patil, Jie Zhang, and Arash Hazeghi for productive collaborations and useful discussions; and thank many of the other group members for creating the friendly and pleasant atmosphere at and off work, including Jenny Hu, Li-Wen Chang, SangBum Kim, Byoungil Lee, Kyeongran Yoo, Rakesh Jeyasingh, Marissa Caldwell, Crystal Kenney, Jason Parker, Cara Beasley, Jiale Liang, Xiangyu Chen, Yi Wu, Dr. Xinyu Bao, Dr. Dae Sung Lee, Dr. Yang Chai, Dr. Duygu Kuzum, Dr. Gael Close, Dr. Kerem Akarvardar, Gordon Wan, and our administrator, Ms. Fely Barrera.

This work is financially supported by Stanford Graduate Fellowship, the Focus Center Research Program Center for Circuit and System Solutions (C2S2), the member companies of the Stanford Initiative for Nanoscale Materials and Processes (INMP), SRC, NSF/NRI, and PERCS DARPA. All are greatly appreciated.

Besides the academic exploration, my graduate-school life has also been an adventure full of diverse experiences. The journey would not have been such a delightful memory without the support and company of my friends along the way.

Finally and most importantly, I wish to thank my parents, my grandparents, and all other family members for being a wonderful family in every respect and for their unconditional support at all times.


DEDICATION

The dissertation is dedicated to my parents, Yan Gao and Aibing Wei, and to my grandparents, Jie Li, and Xitian Gao, for their endless love, care, and support.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter 1 Introduction
1.1 Moore’s Law and Dennard Scaling
1.2 CMOS Scaling: Trends and Challenges
1.3 Thesis Outline
1.4 Bibliography
Chapter 2 Si MOSFETs: Effective Scaling and Electrostatic Integrity
2.1 Introduction
2.2 Modeling and Simulation Methodology
2.3 Logic Switching Application
2.4 SRAM
2.5 Conclusions
2.6 Bibliography
Chapter 3 Si MOSFETs: Parasitic Engineering & Selective Device Structure Scaling
3.1 Introduction
3.2 Parasitic Capacitance Engineering
3.2.1 Analytical Models for Capacitance Components
3.2.2 Actual Device Delay
3.2.3 Parasitic Capacitance Engineering
3.3 Selective Device Structure Scaling including Parasitic Capacitances and Series Resistances
3.3.1 Background of Contacted Gate Pitch Scaling
3.3.2 Selective Device Structure Scaling
3.3.3 Inverter Delay Improvement
3.3.4 Circuit-Level Improvement
3.3.5 Extending the Technology Roadmap
3.4 Conclusions
3.5 Bibliography
Chapter 4 Carbon Nanotube FETs: Compact Models and Chip-level Optimization
4.1 Introduction
4.2 Electrostatic Capacitances
4.3 Quantum Capacitances and Channel Surface Potential
4.4 Ballistic I-V Model and Source Exhaustion
4.5 System-level Design Optimization
4.6 Conclusions
4.7 Bibliography
Chapter 5 Technology Assessment Methodology for Complementary Logic Applications based on Energy-Delay Optimization
5.1 Introduction
5.2 Methodology
5.3 Energy-Delay Optimization
5.4 Comparison: Si, III-V, CNT, TFET
5.4.1 Today’s Devices
5.4.2 Projection to 10 nm Gate
5.5 Conclusions
5.6 Bibliography
5.7 Appendix
Chapter 6 Conclusions
6.1 Summary of Contributions
6.2 Recommendations for Possible Future Work
6.3 Bibliography
Appendix Modeling and Performance Comparison of 1-D and 2-D Device including Parasitic Gate Capacitance and Screening Effect
A.1 Introduction
A.2 Device Delay Metric
A.3 Capacitance Modeling
A.4 Capacitance Penalty (ηc) and Actual Current Benefit (ηI)
A.5 Design Optimization
A.6 Extended 1D Device Scaling Roadmap
A.7 Conclusions
A.8 Bibliography
Author’s Biography
List of Publications


LIST OF TABLES

Table 2.1 Key parameters for nominal cases at 32nm
Table 2.2 Changes of RSNM and WSNM by reducing DIBL of specific devices from 150mV/V to 50mV/V
Table 3.1 Key parameters for nominal devices following the ITRS 2009.
Table 3.2 The effects of selective scaling scenarios on parasitics per unit gate width.
Table 3.3 The geometric parameters used for the typical case of 65nm technology devices.
Table 3.4 Key parameters for device doping profiles.
Table 3.5 Summary of parameters used for ITRS technology projection
Table 4.1 Device characteristics for Ptot=10W
Table 5.1 Key parameters for analytical model
Table A.1 Device parameters used for 32nm and 11nm technology.
Table A.2 Values for fitting parameters used for 32nm and 11nm.


LIST OF FIGURES

Figure 1.1 Schematic of a typical bulk MOSFET
Figure 1.2 Historical trend of CMOS device scaling.
Figure 2.1 (a) Id-Vgs and (b) Id-Vds curves of the nominal 32nm HP NMOS.
Figure 2.2 The comparison of the experimental data and analytical models on the Ion-Ioff plane.
Figure 2.3 Id-Vds curves of two devices with different DIBL but the same Idsat.
Figure 2.4 Delay improvement contours on DIBL-Idsat improvement plane
Figure 2.5 Switching trajectories.
Figure 2.6 Read operations: (a) circuit diagram, and (b) VTC plot.
Figure 2.7 Write operations: (a) circuit diagram, and (b) VTC plot.
Figure 2.8 (a) The trajectory and (b) VTC plot for read operations.
Figure 2.9 DIBL vs. RSNM.
Figure 2.10 (a) The trajectory and (b) VTC plot for write operations.
Figure 2.11 DIBL vs. WSNM.
Figure 2.12 RSNM vs. WSNM with different ratios of the widths of transistors
Figure 3.1 Device schematics of (a) bulk device, (b) FDSOI, and (c) planar double gate
Figure 3.2 Comparison of analytical model and numerical simulation of Cif
Figure 3.3 Comparison of analytical model and numerical simulation of Cof
Figure 3.4 Comparison of analytical model and numerical simulation of CPCCA
Figure 3.5 (a) Schematic of Ccorner, (b) comparison of analytical model and numerical simulation of Ccorner.
Figure 3.6 Circuit and waveform of a 2-stage inverter chain.
Figure 3.7 FO1 and FO3 delay of the nominal devices based on ITRS 2009
Figure 3.8 (a) Capacitance components normalized to Cgc, and (b) relative contribution of intrinsic capacitance and parasitic capacitances in Ctot of a self-loaded FO1 inverter chain.
Figure 3.9 FO3 delay dependences on (a) gate height, (b) gate-to-contact distance, and (c) average device width.
Figure 3.10 (a) Historical data of the ratios of saturation current (Idsat) of PMOS and NMOS (P/N ratio), and (b) FO3 delay dependence on P/N ratio
Figure 3.11 (a) Schematics of planar device with related parasitics. (b-d) Schematics of the three proposed scaling approaches.
Figure 3.12 Comparison of 3-D predicted capacitance values and 2-D counterparts
Figure 3.13 (a) Gate capacitance, (b) S/D node capacitance, and their breakdown
Figure 3.14 (a) Gate capacitance, (b) S/D node capacitance, and their breakdown
Figure 3.15 (a) On-current, and (b) gate capacitance vs. Lpitch for both planar bulk CMOS and UTBSOI in 65 nm node
Figure 3.16 FO4 delay vs. Lpitch for both planar bulk CMOS and UTBSOI.
Figure 3.17 (a) Rseries, and (b) its components as a function of technology nodes.
Figure 3.18 Contacted gate pitch (Lpitch) and physical gate length (Lgate) vs. technology node down to 11 nm node.
Figure 4.1 Schematic structure of MOSFET-like planar CNFET.
Figure 4.2 Schematic of electrostatic capacitances network.
Figure 4.3 Coordinate transformation in the cross section
Figure 4.4 Comparison of the analytical model and numerical simulation of the channel surface potential (ϕch), with only electrostatic capacitances involved.
Figure 4.5 Illustration of (a) transmitted carriers, and (b) reflected carriers.
Figure 4.6 Illustration of effective carriers.
Figure 4.7 Capacitance network to calculate channel surface potential
Figure 4.8 Multi-faceted Vg-Vd plane with piecewise constant quantum capacitance approximation and corresponding band diagram.
Figure 4.9 Multi-faceted Vg-Vd plane with piecewise constant quantum capacitance approximation.
Figure 4.10 ϕch calculated by the analytical model and the error from the results by numerical simulation
Figure 4.11 Comparison between simulation and our model of surface potential and its derivative over Vg
Figure 4.12 Comparison between simulation and our model of (a) surface potential and (b) its derivative over Vd
Figure 4.13 Id-Vg curve.
Figure 4.14 Id-Vd curve.
Figure 4.15 Band diagrams (a) with no source exhaustion, and (b) with source exhaustion
Figure 4.16 An experimental structure designed to illustrate source exhaustion
Figure 4.17 Input and output waveforms of a single stage inverter
Figure 4.18 Schematic of system level device optimization
Figure 4.19 Optimization results versus power constraint at 11 nm technology node.
Figure 4.20 The percentage partition of Pdyn in Ptot for optimal designs
Figure 4.21 Illustration of the optimal design for CNFET vs. PDSOI
Figure 4.22 Optimal delay at power constraint of 10W.
Figure 4.23 EFSD dependence of optimal delay.
Figure 5.1 Methodology to determine Ion and Ioff given the device I-V characteristics and supply voltage.
Figure 5.2 An example circuit with LD=4.
Figure 5.3 (a) Circuit diagram and voltage transfer curve (VTC) of an inverter. (b) Logic level constraint, and (c) Noise margin constraint
Figure 5.4 Energy-Delay plot for 32nm Si MOSFET
Figure 5.5 Pareto curve of minimum Etot vs. D
Figure 5.6 IdVg of published experimental data and analytical models for (a) 32nm Si MOSFET (b) TFET (c) CNFET, and (d) III-V MOSFET.
Figure 5.7 Minimum Etot vs. D with (a) different LD, and (b) different a.
Figure 5.8 Optimal Ion/Ioff corresponding to selected Pareto curves
Figure 5.9 Optimal Vdd^nom corresponding to selected Pareto curves
Figure 5.10 Optimal Ion and Ioff corresponding to selected Pareto curves.
Figure 5.11 Minimum Etot vs. D for different device structures in the best available today’s experimental results.
Figure 5.12 IdVg of projected devices at Vdd=0.66V in (a) log and (b) linear scale.
Figure 5.13 (a) Minimum Etot vs. D and (b) corresponding Vdd^nom for emerging devices projected for 11nm technology node.
Figure A.1 3D simulation structure.
Figure A.2 Comparison between the model and 3D simulation for Cgc.
Figure A.3 Illustrations of Cof^S
Figure A.4 Comparison between the model and 3D simulation for Cof.
Figure A.5 (a) ηc for 11nm technology node as a function of r and s. (b) The absolute value of capacitance components for r=1.
Figure A.6 High channel density contributes to a small ηc.
Figure A.7 Illustration of normal field and elliptical field components of Cgc and Cof.
Figure A.8 Components of ηc contributed by Cof and Cgtp.
Figure A.9 Narrower devices suffer more from ηc
Figure A.10 The optimal designs for (a) Wgate=10Lg and (b) 3Lg at different channel densities.
Figure A.11 Extended Roadmap by increasing the channel density per generation for a fixed ηI0.
Figure A.12 Extended Roadmap by increasing ηI0 per generation for a fixed channel pitch and radius.


Chapter 1 Introduction

1.1 Moore’s Law and Dennard Scaling

The first bipolar transistor was invented in 1947 [1]; it realized switching and amplifying functions in a smaller, faster, more reliable, and more effective way than vacuum tubes. Ten years later, the integrated circuit (IC) was demonstrated [2], with all components of an electronic circuit integrated onto a single solid-state substrate. After the first metal-oxide-semiconductor field effect transistor (MOSFET) was fabricated in 1960 [3], complementary MOS (CMOS) was invented in 1963 [4], which greatly reduced chip-level power dissipation. The invention of CMOS was a major breakthrough for the semiconductor industry.

Since the 1960s, the semiconductor industry has grown at an unprecedented pace, as predicted by G. Moore [5]. As an economic observation and forecast, Moore observed and predicted that “the complexity for minimum component costs has increased at a rate of roughly a factor of two per year … Certainly over the short term this rate can be expected to continue, if not to increase [5].” In 1975, Moore revised his projection to the widely known “a doubling every two years [6].” Moore’s Law is in principle an economic observation; however, there are solid scientific reasons behind it.

Figure 1.1 shows a schematic of a typical long-channel MOSFET. When a proper voltage is applied to the gate, the surface potential in the channel changes through the electrostatic coupling between the gate and the channel, and inversion carriers appear in the channel area. Then, with a proper voltage applied to the drain, the carriers move and form a current flow. The channel region is the region that realizes the “field effect”: the conductivity of the channel is determined by how the channel potentials and carriers respond to the electric field from the electrostatic coupling between the gate and the channel. The channel region is often referred to as the “intrinsic region.” There are three key dimensions associated with the channel: channel length (Lg), channel width (W), and oxide thickness (tox), which determines the coupling capacitance (Cox). The contacted gate pitch (Lpitch) is defined as the minimum pitch between the source and drain contacts of a transistor (with a gate electrode in between). Lpitch is a commonly used metric for device density.

Figure 1.1 Schematic of a typical bulk MOSFET
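As an illustration of how the oxide thickness sets the coupling capacitance, the short Python sketch below uses the textbook parallel-plate approximation Cox = ε0·εr/tox. This expression, the function names, and the default relative permittivity of 3.9 (the usual value for SiO2) are assumptions made for this example; they are not taken from the models developed in later chapters.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity in F/m


def oxide_capacitance(tox_m, rel_permittivity=3.9):
    """Parallel-plate estimate of Cox per unit area, in F/m^2.

    tox_m is the oxide thickness in meters. The default relative
    permittivity of 3.9 is an assumed value for SiO2.
    """
    if tox_m <= 0:
        raise ValueError("oxide thickness must be positive")
    return EPSILON_0 * rel_permittivity / tox_m


def gate_capacitance(tox_m, lg_m, w_m, rel_permittivity=3.9):
    """Intrinsic gate-to-channel capacitance Cox * Lg * W, in farads."""
    return oxide_capacitance(tox_m, rel_permittivity) * lg_m * w_m


if __name__ == "__main__":
    # Hypothetical dimensions, chosen only to show the trend with tox.
    for tox_nm in (4.0, 2.0, 1.0):
        cox = oxide_capacitance(tox_nm * 1e-9)
        print(f"tox = {tox_nm} nm -> Cox = {cox:.4e} F/m^2")
```

Under this approximation, halving tox doubles Cox, which is why the oxide thickness appears alongside Lg and W as a key dimension of the intrinsic region.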

The cost of silicon devices is strongly correlated with the size of the devices, so there is a constant desire to scale them down. To scale without compromising device reliability, R. Dennard et al. proposed the constant-field scaling theory [7], which is widely referred to as “Dennard scaling”. The key to Dennard scaling is to simultaneously scale all three dimensions, including wiring and depletion layers, and all voltages, including the supply voltage (Vdd) and threshold voltages (Vth), by 1/κ (κ>1). Following [7], the electric field in the gate oxide (E) does not change after scaling. The device switching delay (τ) and power consumption per switch (P) are scaled by 1/κ and 1/κ², respectively. The device density (D) increases by κ² (Equations (1.1)-(1.4)).

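$$E \propto \frac{V_{dd}}{t_{ox}} \propto 1 \qquad (1.1)$$

$$\tau \propto \frac{L_g^2}{V_{dd}} \propto \frac{1}{\kappa} \qquad (1.2)$$

$$P \propto \frac{W V_{dd}^3}{L_g t_{ox}} \propto \frac{1}{\kappa^2} \qquad (1.3)$$

$$D \propto \frac{1}{W L_{pitch}} \propto \kappa^2 \qquad (1.4)$$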
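The proportionalities in Equations (1.1)-(1.4) can be checked numerically. The following Python sketch is only an illustration: the function names and the dictionary layout are hypothetical, and all quantities are treated as unitless proportional values rather than calibrated device parameters.

```python
def dennard_metrics(Lg, W, tox, Lpitch, Vdd):
    """Proportional figures of merit following Equations (1.1)-(1.4)."""
    return {
        "E": Vdd / tox,                  # electric field, Eq. (1.1)
        "tau": Lg ** 2 / Vdd,            # switching delay, Eq. (1.2)
        "P": W * Vdd ** 3 / (Lg * tox),  # power per switch, Eq. (1.3)
        "D": 1.0 / (W * Lpitch),         # device density, Eq. (1.4)
    }


def dennard_scale(params, kappa):
    """Constant-field scaling: every dimension and voltage is multiplied by 1/kappa."""
    return {name: value / kappa for name, value in params.items()}


def scaling_ratios(params, kappa):
    """Ratio of each metric after scaling to its value before scaling."""
    before = dennard_metrics(**params)
    after = dennard_metrics(**dennard_scale(params, kappa))
    return {name: after[name] / before[name] for name in before}


if __name__ == "__main__":
    # Arbitrary starting values; only the ratios are meaningful.
    base = {"Lg": 1.0, "W": 1.0, "tox": 1.0, "Lpitch": 1.0, "Vdd": 1.0}
    for name, ratio in scaling_ratios(base, kappa=1.4).items():
        print(f"{name}: x{ratio:.3f}")
```

For any κ>1, the ratios come out as 1 for E, 1/κ for τ, 1/κ² for P, and κ² for D, independent of the starting values, which is the content of the four equations.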

Dennard scaling has provided a guideline for the development of very-large-scale-integration (VLSI) technology since the 1970s. The remarkable feature of MOSFET devices revealed by Dennard scaling has fueled the rapid growth of the semiconductor industry: the speed increases and the cost decreases as the devices are made smaller [8]. After more than three decades, the feature size of a typical transistor in a processor is below 100nm [9]-[12].

1.2 CMOS Scaling: Trends and Challenges

Because the fabrication cost is closely correlated with device density, contacted gate pitch scaling is the main driver to reduce cost (Equation (1.4)). As shown in Figure 1.2, Lpitch has scaled by 0.7× per technology node (equivalent to κ=1.4 per technology node) consistently throughout many generations from 1 μm to 45 nm [9]-[25]. Channel length directly determines the intrinsic device performance, as in Equations (1.2)-(1.3). In the 0.35 µm and 0.25 µm era, aggressive gate length scaling was used to boost performance [13][14]. Dimensional Dennard scaling was roughly followed until the 65 nm node, with κ=1.4 per technology node. By the 45 nm node, this strategy is no longer effective, and gate length scaling has virtually stopped due to the difficulty of developing device technologies that continue to provide good electrostatic control [26][27]. To continue improving device performance even when Dennard scaling is no longer effective, “effective scaling” schemes are introduced to compensate for the performance gap (Figure 1.2). Such performance boosters include enhancing carrier transport by band structure engineering from the 130 nm node [9]-[12][18]-[25][28] and introducing high-k gate dielectrics to replace SiO2 from the 45 nm node [9]-[12][23]-[25][29].

Figure 1.2 Historical trend of CMOS device scaling. The symbols are published data from leading companies [9]-[25]. The lines are the projections by the International Technology Roadmap for Semiconductors (ITRS) [26]. The dashed line and circles in red correspond to Lpitch. The solid line and stars in blue correspond to Lg.
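The relation between the 0.7× shrink per technology node, the scaling factor κ, and the density gain of Equation (1.4) can be made concrete with a small calculation. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that W and Lpitch shrink by the same factor at every node.

```python
import math


def kappa_from_shrink(shrink):
    """Scaling factor kappa for a linear shrink per node (0.7 gives about 1.43)."""
    return 1.0 / shrink


def density_gain(shrink, nodes=1):
    """Density gain from Eq. (1.4) after `nodes` generations.

    Assumes W and Lpitch both shrink by `shrink` at every node.
    """
    return (1.0 / shrink ** 2) ** nodes


def nodes_between(start, end, shrink):
    """Real-valued number of generations needed to shrink `start` down to `end`."""
    return math.log(end / start) / math.log(shrink)


if __name__ == "__main__":
    print(f"kappa per node: {kappa_from_shrink(0.7):.2f}")
    print(f"density gain per node: {density_gain(0.7):.2f}")
    print(f"generations from 1000 nm to 45 nm: {nodes_between(1000.0, 45.0, 0.7):.1f}")
```

With a 0.7× shrink, the density roughly doubles at each node, and a dimension that starts at 1 μm reaches 45 nm after between eight and nine such generations.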

Moving forward into the nanometer regime, the improvement of intrinsic device performance based on Dennard scaling is approaching its limit because of the non-scalable supply voltage, leakage currents, and non-negligible parasitic capacitances and resistances [28][30]-[33]. With these non-scalable elements, device design and engineering need to take into consideration the device components outside the channel region, such as the source/drain regions, contact plugs, and gate electrodes. The interaction between device design and circuit design also deserves more attention. In the long-channel regime, device design and circuit design are two relatively independent layers. The focus of device design is to shrink all three dimensions of the devices proportionally to improve their intrinsic on-state performance, while satisfying the requirements on yield, reliability, etc. Device libraries are provided to circuit designers with very few variables, mainly channel length and width. Few suggestions and little feedback are returned to device designers. When intrinsic device performance is no longer the dominant factor at the device level or circuit level, interactive design and technology assessment are required.

In general, continued progress in nanoelectronics necessitates a holistic view that crosses the traditional boundaries of device, circuit, and system. The best devices are those that are optimized for the circuits and systems of the target application. Device design and engineering must aim at improvement at circuit and system level.

1.3 Thesis Outline

In this thesis, the design space is explored for future Si CMOS technology and for the carbon nanotube field effect transistor, a promising technology for non-Si nanoelectronics. Compact models of transport properties and capacitive components of different device structures have been developed to facilitate circuit-level analysis and system-level optimization. Possible ways of extending the technology roadmap are proposed. A technology assessment methodology considering circuit-level and system-level performance is introduced.

Currently, intensive effort has been made to strengthen the device intrinsic electrostatic integrity (e.g. SOI, FinFET, high-k/metal gate) at the device-level. In Chapter 2, we examine the impact of enhancing electrostatic controllability on both logic switching applications and static-random-access-memory (SRAM) applications. Device operation trajectories under different circuit environments are analyzed. For logic switching applications, we conclude that strengthening the electrostatic integrity is preferred for circuits with stacked logic gates and low-power applications. For SRAM applications, we propose a way to simultaneously improve both read and write stabilities by improving electrostatic control.

In Chapter 3, parasitics engineering is discussed. We analyze the geometric and electrical property of different capacitance components and develop compact models for a variety of conventional and emerging device structures, including bulk devices, fully-depleted silicon-on-insulator (FDSOI) devices, and planar double gate (DG) devices. With the models, we carefully examine the impact of different device parasitic capacitances on the circuit level, and propose “selective device structure scaling” to extend the technology roadmap.

Chapter 4 focuses on the carbon nanotube field effect transistor (CNFET), a promising candidate to extend semiconductor technology. A fast, non-iterative analytical model for the CNFET has been developed, including both carrier transport and capacitance components, which are the key determinants of performance. The chip-level system (processor plus on-chip cache) performance of CNFET technology has been optimized and compared with Si-based technology.

In Chapter 5, we introduce a technology assessment methodology for complementary logic applications. Instead of device-level characteristics, circuit-level performance metrics such as delay and energy are evaluated while taking variability and noise margin into account. Several emerging devices, including III-V field effect transistors (III-V), tunneling field effect transistors (TFET), and CNFETs, are evaluated and benchmarked against a baseline Si MOSFET at both the 32nm and 11nm technology nodes.

In conclusion, this thesis addresses the significance of a holistic view across the traditional boundaries of device, circuit, and system in technology assessment and projection. Guided by this concept, we chart a new path for Si CMOS technology scaling for future technology generations. We also accomplish for the first time chip- level performance assessment and optimization of new emerging transistors such as CNFETs. This establishes a new benchmark and design methodology for emerging nanoelectronics.

1.4 Bibliography

[1] http://en.wikipedia.org/wiki/Transistor.
[2] http://en.wikipedia.org/wiki/Integrated_circuit.
[3] D. Kahng and M. M. Atalla, “Silicon dioxide field surface devices,” presented at Device Research Conf. IEEE, Pittsburgh, 1960.
[4] F. Wanlass and C. T. Sah, “Nanowatt logic using field-effect metal-oxide-semiconductor triodes,” Technical Digest of the 1963 Int. Solid-State Circuit Conf., IEEE, pp. 32-33, 1963.
[5] G. E. Moore, “Cramming More Components onto Integrated Circuits,” Electronics, pp. 114-117, April 19, 1965.
[6] G. E. Moore, “Progress in digital integrated electronics,” in Electron Devices Meeting, 1975 International, 1975, pp. 11-13.
[7] R. Dennard et al., “Design of Ion-Implanted MOSFET’s with Very Small Physical Dimensions,” IEEE Journal of Solid-State Circuits, vol. SC-9, no. 5, pp. 256-268, Oct 1974.
[8] Y. Taur and T. H. Ning, Fundamentals of Modern VLSI Devices, Cambridge University Press, October 1998.
[9] S. Natarajan, et al., “A 32nm logic technology featuring 2nd-generation high-k + metal-gate transistors, enhanced channel strain and 0.171 µm² SRAM size in a 291Mb array,” Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, pp. 941-943.
[10] F. Arnaud, et al., “32nm general purpose bulk CMOS technology for high performance applications at low voltage,” Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, pp. 633-636.
[11] X. Chen, et al., “A cost effective 32nm high-K/metal gate CMOS technology for low power applications with single-metal/gate-first process,” VLSI Technology, 2008 Symposium on, 2008, pp. 88-89.
[12] C. H. Diaz, et al., “32nm gate-first high-k/metal-gate technology for high performance low power applications,” Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, pp. 1-4.
[13] R. A. Chapman, J. W. Kuehne, P. S-H. Ying, W. F. Richardson, A. R. Peterson, A. P. Lane, I.-C. Chen, L. Velo, C. H. Blanton, M. M. Moslehi, and J. L. Paterson, “High performance sub-half micron CMOS using rapid thermal processing,” Electron Devices Meeting, 1991. IEDM 1991. IEEE International, 1991, pp. 101-104.
[14] M. Rodder, Q. Z. Hong, M. Nandakumar, S. Aur, J. C. Hu, and I. C. Chen, “A sub-0.18μm gate length CMOS technology for high performance (1.5V) and low power (1.0V),” Electron Devices Meeting, 1996. IEDM 1996. IEEE International, 1996, pp. 563-566.
[15] P. Gilbert, et al., “A high performance 1.5V, 0.10μm gate length CMOS technology with scaled copper metalization,” Electron Devices Meeting, 1998. IEDM 1998. IEEE International, 1998, pp. 1013-1016.
[16] K. K. Young, et al., “A 0.13 μm CMOS technology with 193 nm lithography and Cu/low-k for high performance applications,” Electron Devices Meeting, 2000. IEDM 2000. IEEE International, 2000, pp. 563-566.
[17] S. Tyagi, et al., “A 130 nm generation logic technology featuring 70 nm transistors, dual Vt transistors and 6 layers of Cu interconnects,” Electron Devices Meeting, 2000. IEDM 2000. IEEE International, 2000, pp. 567-570.
[18] V. Chan, et al., “High speed 45nm gate length CMOSFETs integrated into a 90nm bulk technology incorporating strain engineering,” Electron Devices Meeting, 2003. IEDM 2003. IEEE International, 2003, pp. 77-80.
[19] K. Mistry, M. Armstrong, C. Auth, S. Cea, T. Coan, T. Ghani, T. Hoffmann, A. Murthy, J. Sandford, R. Shaheed, K. Zawadzki, K. Zhang, S. Thompson, and M. Bohr, “Delaying forever: Uniaxial strained silicon transistors in a 90nm CMOS technology,” VLSI Technology, 2004 Symposium on, pp. 50-51, 2004.
[20] W.-H. Lee, et al., “High performance 65 nm SOI technology with enhanced transistor strain and advanced-low-K BEOL,” Electron Devices Meeting, 2005. IEDM 2005. IEEE International, 2005, pp. 56-59.
[21] S. Tyagi, et al., “An advanced low power, high performance, strained channel 65nm technology,” Electron Devices Meeting, 2005. IEDM 2005. IEEE International, 2005, pp. 1070-1072.
[22] P. Bai, et al., “A 65nm logic technology featuring 35nm gate lengths, enhanced channel strain, 8 Cu interconnect layers, low-k ILD and 0.57 μm² SRAM cell,” Electron Devices Meeting, 2004. IEDM 2004. IEEE International, 2004, pp. 657-660.
[23] C. Auth, et al., “45nm High-k + metal gate strain-enhanced transistors,” in VLSI Technology, 2008 Symposium on, 2008, pp. 128-129.
[24] K. Henson, et al., “Gate length scaling and high drive currents enabled for high performance SOI technology using high-κ/metal gate,” in Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, pp. 645-648.
[25] C. H. Jan, et al., “A 45nm low power system-on-chip technology with dual gate (logic and I/O) high-k/metal gate strained silicon transistors,” in Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, pp. 637-640.
[26] http://public.itrs.net/reports.html.
[27] S. Thompson and S. Parthasarathy, “Moore's law: the future of Si microelectronics,” Materials Today, vol. 9, pp. 20-25, 2006.
[28] S. E. Thompson, et al., “A logic nanotechnology featuring strained-silicon,” Electron Device Letters, IEEE, vol. 25, pp. 191-193, 2004.
[29] R. Chau, et al., “High-κ/metal-gate stack and its MOSFET characteristics,” Electron Device Letters, IEEE, vol. 25, pp. 408-410, 2004.
[30] K. J. Kuhn, “CMOS scaling beyond 32nm: challenges and opportunities,” presented at the Proceedings of the 46th Annual Design Automation Conference, San Francisco, California, 2009.
[31] A. Khakifirooz and D. A. Antoniadis, “MOSFET Performance Scaling—Part I: Historical Trends,” Electron Devices, IEEE Transactions on, vol. 55, pp. 1391-1400, 2008.
[32] A. Khakifirooz and D. A. Antoniadis, “MOSFET Performance Scaling—Part II: Future Directions,” Electron Devices, IEEE Transactions on, vol. 55, no. 6, pp. 1401-1408, June 2008.
[33] D. A. Antoniadis, et al., “Continuous MOSFET performance increase with device scaling: The role of strain and channel material innovations,” IBM Journal of Research and Development, vol. 50, pp. 363-376, 2006.

Chapter 2 Si MOSFETs: Effective Scaling and Electrostatic Integrity

2.1 Introduction

Historically, metal-oxide-semiconductor field effect transistors (MOSFETs) have been scaled down following Dennard scaling [1]. All three dimensions of a MOSFET, namely the channel length (Lg), the channel width (W), and the effective oxide thickness (EOT), are scaled by 1/κ (κ>1) in each generation. The supply voltage (Vdd) is also scaled by 1/κ. As a result, the saturation current per unit width (Idsat) at Vgs=Vds=Vdd is constant from generation to generation, while the intrinsic device delay decreases by 1/κ and the dynamic power dissipation per switching decreases by 1/κ². At the device level, Idsat at a fixed off-state drain current (Ioff) has historically been used as one of the most important figures of merit for MOSFET benchmarking, and is often adopted as a target or specification in technology development for future generations [2]. However, in a CMOS circuit, the devices usually do not operate at the bias point that gives Idsat. Instead, the whole trajectory of the device operating points matters, for both logic switching applications [3]-[6] and SRAM. Furthermore, the device operation trajectories differ between circuit designs and target applications. Thus, it is necessary to take circuit-level details into consideration when engineering and optimizing device structures. In this chapter, we study the importance of different regions of the device I-V characteristics for circuit-level performance. By analyzing the device operation trajectories, we establish the connections between these device-level targets and circuit-level performance. We specifically focus on the impact of device electrostatic control on logic switching delay and SRAM stability. In Section 2.2, the modeling and simulation methodology is introduced. A device behavioral model is fitted to experimental data and a fast circuit simulator is built. In Section 2.3, our study targets logic switching applications. The effectiveness of improving circuit speed by enhancing device electrostatic control is analyzed. In Section 2.4, SRAM stability is discussed. We examine the operation trajectories of the pull-up, pull-down, and pass-gate transistors in a typical 6-transistor (6T) SRAM cell. A way to simultaneously improve both read and write stabilities is revealed.

2.2 Modeling and Simulation Methodology

As devices are scaled down, short-channel devices suffer from poor gate electrostatics and other parasitic effects. Non-uniform doping [7][8], high-k materials [9][10], thin-body structures [11]-[13], and multi-gate structures [14]-[21] have been introduced to improve the electrostatic control. Meanwhile, mobility and velocity boosters, e.g. strain [22][23], have been introduced to keep Idsat high. There are several levels of device modeling methodologies for describing device I-V performance. Numerical simulators, such as Sentaurus [24], compute device performance by the finite element method. These simulators incorporate accurate physical models and detailed information about the device structure and materials, at the cost of computational efficiency. Conventional compact models, such as BSIM models [25][26] and PTM models [27][28], simplify the physics and express the model analytically. These models usually have tens of parameters, some of which are fitting parameters without obvious physical meaning that have to be adjusted for each technology. Conventional compact models are usually used for circuit simulation, given a set of known devices of a developed technology. It is difficult to use compact models as device design tools. In this work, we adopt a new semi-empirical, physical I-V behavioral model [23], which captures device-level characteristics that are directly measurable, such as the subthreshold swing (SS), the drain-induced barrier lowering (DIBL), and the current booster (kvs). These input variables are not completely independent in reality. For example, the doping profile may affect SS and DIBL simultaneously. However, we assume that they can be adjusted independently over a reasonable range. In this way, we can skip the details of device engineering, such as how the doping profile is designed and how strain is applied to the device, because they are not the main purpose of the study. Meanwhile, we are able to set device design targets in a physical way, by assigning reasonable values to the key device variables, all of which have clear physical meanings. Nominal parameters are chosen to fit the device characteristics of a 32nm technology [29], with the fitting parameters listed in Table 2.1. Simple analytical parasitic capacitances are modeled following [30]. The capacitances are assumed to be independent of SS, DIBL, and kvs.
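
As a rough illustration of how SS and DIBL enter a behavioral description of this kind, the sketch below evaluates only the subthreshold branch of a generic exponential I-V relation, using the nominal HP NMOS values of Table 2.1 as defaults. The function name and the functional form are assumptions made for illustration; this is not the model of [23], which also covers the above-threshold region through kvs.

```python
def subthreshold_current(vgs, vds, ioff=100e-9, ss=0.098, dibl=0.130, vdd=1.0):
    """Illustrative subthreshold drain current in A/um (assumed form).

    ioff is defined at Vgs = 0 and Vds = Vdd.
    ss is the subthreshold swing in V/decade, dibl is in V/V.
    DIBL is treated as a threshold shift of dibl * (Vdd - Vds), so the
    current at Vds = Vdd is pinned to ioff regardless of the DIBL value.
    """
    return ioff * 10.0 ** ((vgs - dibl * (vdd - vds)) / ss)


if __name__ == "__main__":
    # Leakage at full drain bias versus low drain bias for the same gate bias.
    print(subthreshold_current(0.0, 1.0))
    print(subthreshold_current(0.0, 0.05))
```

With Ioff pinned at Vds=Vdd, a larger DIBL lowers the current at small Vds, which is the same constraint used later in this section when DIBL is varied at fixed Ioff.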

Both the I-V and C-V models are implemented in the Model for Assessment of cmoS Technologies And Roadmaps (MASTAR) program [33], a fast device and circuit simulator based on analytical models. MASTAR can perform transient simulation of a 4-stage FO4 inverter chain and a 2-input NOR chain for logic switching applications. The average delay from the 50% input level to the 50% output level is extracted. MASTAR can also simulate the static butterfly curves to extract the read static noise margin (RSNM) [34][35] and the write static noise margin (WSNM) [36] of a 6-transistor (6T) SRAM cell.
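
The 50%-to-50% delay extraction can be sketched as follows. This is a minimal illustration that assumes sampled node waveforms are already available from a transient simulation; the helper names `crossing_time` and `mean_stage_delay` are hypothetical and do not correspond to MASTAR routines.

```python
def crossing_time(times, values, level):
    """Time of the first crossing of `level`, by linear interpolation."""
    for i in range(1, len(times)):
        before = values[i - 1] - level
        after = values[i] - level
        if before == 0.0:
            return times[i - 1]
        if before * after < 0.0 or after == 0.0:
            frac = (level - values[i - 1]) / (values[i] - values[i - 1])
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the requested level")


def mean_stage_delay(times, node_waveforms, vdd=1.0):
    """Average 50%-to-50% delay between successive nodes of a gate chain."""
    half = 0.5 * vdd
    t_cross = [crossing_time(times, w, half) for w in node_waveforms]
    stage_delays = [t_out - t_in for t_in, t_out in zip(t_cross, t_cross[1:])]
    return sum(stage_delays) / len(stage_delays)


if __name__ == "__main__":
    # Made-up piecewise-linear waveforms (time in ps), for illustration only.
    t = [float(i) for i in range(21)]
    v0 = [min(1.0, ti / 10.0) for ti in t]                                # rising input
    v1 = [1.0 - min(1.0, max(0.0, (ti - 4.0) / 10.0)) for ti in t]        # falling
    v2 = [min(1.0, max(0.0, (ti - 7.0) / 10.0)) for ti in t]              # rising
    print(mean_stage_delay(t, [v0, v1, v2]))
```

Because successive stages alternate between rising and falling transitions, averaging over consecutive nodes mixes pull-up and pull-down delays in the same way an average chain delay does.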

Table 2.1 Key parameters for nominal cases at 32nm [29][31][32]

Parameter   Physical meaning                                 NMOS (HP/LoP/LstP)   PMOS (HP/LoP/LstP)
Rs          Total series resistance (Ω-μm)                   350                  350
DIBL        Drain-induced barrier lowering (mV/V)            130                  160
SS          Subthreshold swing (mV/dec)                      98                   98
Tinv        Electrical oxide thickness in inversion (nm)     1.2                  1.4
CPP         Contact poly pitch (nm)                          112                  112
Lg          Channel length (nm)                              30                   30
Vdd         Supply voltage (V)                               1                    1
Ion         On-state current (μA/μm)                         1584/1293/756        1257/1034/619
Ioff        Off-state current (nA/μm)                        100/10/0.1           100/10/0.1

[Figure 2.1 plots: drain current Id per unit width versus (a) Vgs and (b) Vds, each swept from 0 to 1 V.]

Figure 2.1 (a) Id-Vgs and (b) Id-Vds curves of the nominal 32nm HP NMOS [29]. The symbols and lines correspond to the experimental data and analytical models, respectively.

[Figure 2.2 plot: Ioff (nA/µm, logarithmic scale from 1 to 1000) versus Ion (µA/µm, from 800 to 1800) for NMOS and PMOS.]

Figure 2.2 The comparison of the experimental data [29] (symbols) and analytical models (lines) on the Ion-Ioff plane. The model is fitted across a wide range of Ioff.

Because static power is not the focus of this study, and the device operates above threshold in most of the critical regions of the operation trajectories, we take DIBL, instead of SS, as the target device-level characteristic for studying device electrostatic control. For a fixed Ioff, a larger DIBL and a larger kvs both result in a larger Idsat. However, the origins of the larger Idsat are quite different, and so are the impacts on the I-V characteristics. Compared with long-channel devices, devices with higher DIBL have a larger drain current at high drain voltage (Vd) and the same current at low Vd, whereas devices with a larger injection-velocity booster kvs have a larger current at both high and low Vd. Different combinations of DIBL and kvs may therefore give the same Idsat but different Id-Vd behaviors, which result in different trajectories during switching, as discussed in Sections 2.3 and 2.4.

In this work, when the device DIBL is varied, the off-state current (Ioff) and the saturation current (Idsat) are kept constant to satisfy the conventional constraints on static power and delay (CV/Idsat), by accordingly adjusting kvs. The purpose of changing kvs is to have identical Idsat for different devices in comparison, because Idsat is widely adopted as a technology specification for delay evaluation purpose. It does not imply that kvs always changes correspondingly to DIBL in real device design.

With identical Ioff and Idsat, a high-DIBL device has a smaller drain current than a low-DIBL device under the same intermediate biasing conditions (Figure 2.3). The difference is most significant in the linear-to-saturation transition region. The model is calibrated to the 32nm technology node as shown in Figure 2.1, Figure 2.2, and Table 2.1.

[Figure 2.3 plot: Id versus Vds from 0 to Vdd for a large-DIBL and a small-DIBL device; both curves reach Idsat at Vds=Vdd.]

Figure 2.3 Id-Vds curves of two devices with different DIBL but the same Idsat.

2.3 Logic Switching Application

We focus our analysis on delay. A 4-stage FO4 inverter chain is taken as a sample circuit. Three different applications are included: high performance (HP) with low threshold voltage (LVT), low operating power (LoP) with standard threshold voltage (SVT), and low stand-by power (LstP) with high threshold voltage (HVT). For all cases in this work, regardless of DIBL or kvs, Ioff is 100nA/µm, 10nA/µm, and 0.1nA/µm for the HP, LoP, and LstP cases, respectively. Figure 2.4 shows the delay contours as a function of DIBL and Idsat. Each point in the plot corresponds to a certain combination of DIBL and Idsat. The y-axis is the ratio of Idsat to the Idsat of the nominal device in Table 2.1. Circuits built from the devices corresponding to points on the same contour line have the same delay in MASTAR simulation. Each stripe in the contour plot represents a 10% delay difference. The dot-dashed line corresponds to a 10% speed-up over the dashed line. The slopes of the contours correspond to the sensitivity of the delay to the parameter indicated by each axis. The vertical and horizontal arrows correspond to two possible ways to improve circuit delay: (1) boosting current by improving mobility, injection velocity, etc., and (2) reducing DIBL by enhancing gate electrostatic control.

For the horizontal arrows in Figure 2.4, although Idsat is kept at the same level in both cases by adjusting kvs, the inverter chain built from devices with higher DIBL switches at a lower speed. This is true for all of the HP, LoP, and LstP cases. However, the sensitivity of delay to DIBL is not identical for the HP, LoP, and LstP cases. The LstP case has the steepest slope, while the HP case has the least steep slope. This means that the LstP case is the most sensitive to DIBL, while HP is the least sensitive (Figure 2.4).

The effective current (Ieff) as defined in [3] represents the current passed by the transistors averaged over the switching trajectory, and is an accurate measure of the switching speed. When DIBL is changed by a fixed amount from the nominal value, the absolute value of Ieff changes by a similar amount in all three cases (Figure 2.5a, b, and c). However, since LstP devices have the smallest nominal Ieff, their percentage change in Ieff is the largest. Thus their change in delay is also the largest, as shown in Figure 2.5c. To achieve a 10% speed-up of the inverter chain at the 32nm node, either kvs should be boosted to obtain a 15% higher Idsat for all three applications, or DIBL should be reduced by 115mV/V, 90mV/V, and 55mV/V for HP, LoP, and LstP, respectively (Figure 2.4). Hence, for LstP applications, improving device DIBL is a relatively easy and efficient way to improve circuit delay. Conversely, for HP applications, boosting kvs is a more practical and efficient way.
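
To make the role of Ieff concrete, the sketch below uses a widely used two-point approximation, Ieff = (IH + IL)/2 with IH = Id(Vgs=Vdd, Vds=Vdd/2) and IL = Id(Vgs=Vdd/2, Vds=Vdd); whether this is exactly the form used in [3] is an assumption here. The I-V function is a deliberately simple toy model with hypothetical parameters (`vt_sat`, `k`, `alpha`, `v_knee`), not the behavioral model of this chapter. It pins the threshold voltage at Vds=Vdd so that devices with different DIBL share the same Idsat.

```python
import math

VDD = 1.0  # supply voltage in V


def drain_current(vgs, vds, dibl, vt_sat=0.3, k=1.0, alpha=1.3, v_knee=0.2):
    """Toy alpha-power-law I-V in arbitrary units (all parameters hypothetical).

    The threshold voltage is vt_sat at Vds = VDD and rises by
    dibl * (VDD - vds) at lower drain bias, so Idsat does not depend on dibl.
    """
    vt = vt_sat + dibl * (VDD - vds)
    overdrive = max(vgs - vt, 0.0)
    return k * overdrive ** alpha * (1.0 - math.exp(-vds / v_knee))


def saturation_current(dibl):
    return drain_current(VDD, VDD, dibl)


def effective_current(dibl):
    """Two-point effective current: average of IH and IL."""
    i_high = drain_current(VDD, VDD / 2.0, dibl)   # Vgs = Vdd,   Vds = Vdd/2
    i_low = drain_current(VDD / 2.0, VDD, dibl)    # Vgs = Vdd/2, Vds = Vdd
    return 0.5 * (i_high + i_low)


if __name__ == "__main__":
    for dibl in (0.05, 0.13, 0.20):
        print(dibl, saturation_current(dibl), effective_current(dibl))
```

In this toy model the loss in Ieff at higher DIBL comes entirely from IH, the point at intermediate drain bias, which mirrors the observation that two devices with identical Idsat can still differ along the switching trajectory.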

Figure 2.4 Delay improvement contours on DIBL-Idsat improvement plane for (a) inverter chain with HP devices, (b) inverter chain with LoP devices, (c) inverter chain with LstP devices, and (d) 2-input NOR chain with HP devices. Idsat0_HP, Idsat0_LoP, and Idsat0_LstP are the saturation current of the nominal device of HP, LoP, and LstP devices at 32nm (Table 2.1).

Besides the target application, the sensitivity also depends on the circuit topology. In a static digital circuit, the logic gates fall mainly into two groups: (1) stacked gates, such as NAND and NOR gates, which have stacked devices in the pull-up branch, the pull-down branch, or both, and (2) un-stacked gates, such as inverters, which have single devices in both the pull-up and pull-down branches. In NAND and NOR circuits, the stacked devices operate in the linear region most of the time [6]. For a balanced design with the same equivalent Idsat as inverters, NAND and NOR gates have larger gate capacitances, so the devices are turned on more slowly and the switching trajectories of the parallel devices in NAND and NOR chains are further from the saturation point than in inverter chains (Figure 2.5d). Thus, the delay is less sensitive to Idsat. For the same balanced NMOS and PMOS, the Ieff of NAND and NOR circuits [6] is much smaller than that of the inverter. So when the absolute values of Ieff change by a similar amount, the percentage change in Ieff is larger for stacked circuits (Figure 2.5d). As a result, improving DIBL is more efficient for delay reduction in stacked circuits than in un-stacked circuits.

[Figure 2.5 plots: Id (µA/µm) versus Vd (V) with the switching trajectories for Case A (nominal DIBL) and Case B (DIBL=0.2V/V).
(a) 32nm HP inverter chain: Case A Ieff=804µA/µm, delay=3.52ps; Case B Ieff=762µA/µm, delay=3.71ps.
(b) 32nm LoP inverter chain: Case A Ieff=598µA/µm, delay=4.70ps; Case B Ieff=556µA/µm, delay=5.04ps.
(c) 32nm LstP inverter chain: Case A Ieff=291µA/µm, delay=9.86ps; Case B Ieff=252µA/µm, delay=11.24ps.
(d) 32nm HP 2-input NOR chain: Case A Ieff=478µA/µm, delay=5.03ps; Case B Ieff=446µA/µm, delay=5.43ps.]

Figure 2.5 Switching trajectories for (a) inverter chain with HP devices, (b) inverter chain with LoP devices, (c) inverter chain with LstP devices, and (d) 2-input NOR chain with HP devices. Idsat is the same in Case A and Case B, while Case B has a degraded DIBL of 200mV/V. The absolute difference in Ieff is similar (~40µA/µm) for inverter chains with (a) HP, (b) LoP and (c) LstP devices.
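
The percentage argument can be checked directly against the Ieff and delay values annotated in Figure 2.5. The short script below only restates those values and computes the absolute and relative changes from Case A to Case B; the dictionary layout and function names are illustrative choices.

```python
# (Ieff in uA/um, delay in ps) as annotated in Figure 2.5.
FIG_2_5 = {
    "HP inverter":    {"A": (804.0, 3.52), "B": (762.0, 3.71)},
    "LoP inverter":   {"A": (598.0, 4.70), "B": (556.0, 5.04)},
    "LstP inverter":  {"A": (291.0, 9.86), "B": (252.0, 11.24)},
    "HP 2-input NOR": {"A": (478.0, 5.03), "B": (446.0, 5.43)},
}


def percent_change(new, old):
    return 100.0 * (new - old) / old


def summarize(data):
    """Return {circuit: (delta Ieff, % change in Ieff, % change in delay)}."""
    rows = {}
    for name, cases in data.items():
        ieff_a, delay_a = cases["A"]
        ieff_b, delay_b = cases["B"]
        rows[name] = (
            ieff_b - ieff_a,
            percent_change(ieff_b, ieff_a),
            percent_change(delay_b, delay_a),
        )
    return rows


if __name__ == "__main__":
    for name, (d_ieff, p_ieff, p_delay) in summarize(FIG_2_5).items():
        print(f"{name:15s} dIeff={d_ieff:6.1f} uA/um  "
              f"Ieff {p_ieff:+.1f}%  delay {p_delay:+.1f}%")
```

The absolute Ieff loss is close to 40µA/µm for the three inverter chains, while the relative loss and the delay penalty grow from HP to LoP to LstP, and the HP NOR chain is penalized more than the HP inverter chain.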

To summarize Section 2.3, enhancing gate electrostatic control can improve circuit-level delay by increasing the current in the linear region, even when the Idsat of the devices is not increased. However, the efficiency of this method depends on the target application and circuit topology, because the device operation trajectories are different. In general, the further the switching trajectory is from Idsat, the more sensitive the delay is to DIBL. Hence improving DIBL is more practical and efficient for LstP applications and stacked circuits than for HP/LoP applications and un-stacked circuits.

2.4 SRAM

In modern VLSI design, the role of memory is more and more important [37]-[39]. The performance of SRAM, one of the dominant types of on-chip memory, directly affects the overall performance of digital systems [37][38]. As the supply voltage scales down below 1V, cell stability has become a major issue in SRAM design [40]. A typical 6T SRAM cell consists of two branches, each formed by a pull-up (PU) transistor, a pull-down (PD) transistor, and a pass-gate (PG) transistor. The voltage of the common terminal of the PU, PD, and PG transistors of the same branch represents the stored logic information. During read operations (Figure 2.6a), the SRAM cell operates as a bistable circuit with two distinct stable states, and the stored information is expected not to flip. During write operations (Figure 2.7a), the bit-line (BL) at low voltage is intended to pull down the voltage of the storage node on the same side of the cell. After decoupling the two branches of the same cell, each branch has one set of read and one set of write static voltage transfer curves (VTCs), with the gate voltage of the PU and PD transistors as the input and the voltage of the storage node as the output, while the BL and word-line (WL) are biased as during the read and write operations, respectively. The VTC plots with the VTCs of both branches, often referred to as “butterfly curves”, are useful tools for stability analysis. The cross points (the “roots”) on the VTC plots are the possible DC states of the SRAM cell. A successful read operation requires 3 distinct roots on the read VTC plot (Figure 2.6b). A successful write operation requires only one root on the write VTC plot (Figure 2.7b). The read static noise margin (RSNM) (Figure 2.6b) and the write static noise margin (WSNM) (Figure 2.7b) are two figures of merit widely adopted to describe the stability of the SRAM cell. The 32nm technology [29] of Figures 2.1 and 2.2 is used as the sample device.

[Figure 2.6 plots: (a) 6T cell schematic with BL, WL, PU, PD, PG, and storage nodes VL and VR; (b) VL (V) versus VR (V), both from 0 to 1 V.]

Figure 2.6 Read operations: (a) circuit diagram, and (b) VTC plot. The circles in (b) are the three distinct roots. RSNM is defined as the side length of the square shown in (b).

[Figure 2.7 plots: (a) 6T cell schematic with one bit-line at Vdd and the other at Gnd; (b) VL (V) versus VR (V), both from 0 to 1 V.]

Figure 2.7 Write operations: (a) circuit diagram, and (b) VTC plot. The circle in (b) is the only root. WSNM is defined as the side length of the square shown in (b).

With the assumption of symmetric branches, RSNM and WSNM equal the side length of the largest square that can fit between the static VTCs of the cell during a read and a write operation, respectively (Figure 2.6b and Figure 2.7b). As the supply voltage scales down, the design space becomes tighter and tighter, because RSNM and WSNM usually compete with each other. For example, increasing the width of the PG transistors improves WSNM at the expense of a decreased RSNM.
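
The largest-square construction for the read case can be sketched numerically. The example below is a minimal illustration under stated assumptions: the two branches are symmetric and share one monotonically decreasing VTC, and that VTC is a hypothetical tanh-shaped curve (`inverter_vtc`) rather than a simulated read VTC of the 6T cell. The square is placed in one lobe of the butterfly plot with its lower-left corner on the mirrored curve, and its side is grown until the upper-right corner touches the other curve.

```python
import math


def inverter_vtc(v_in, vdd=1.0, gain=10.0):
    """Hypothetical smooth VTC; `gain` is |dVout/dVin| at Vin = Vdd/2."""
    return 0.5 * vdd * (1.0 - math.tanh(2.0 * gain * (v_in - 0.5 * vdd) / vdd))


def bisect_largest(feasible, lo, hi, steps=60):
    """Largest x in [lo, hi] with feasible(x) true (feasibility is monotone)."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo


def static_noise_margin(vtc, vdd=1.0, n_grid=400):
    """Side of the largest square inside one lobe of a symmetric butterfly plot.

    Curve 1 is y = vtc(x); the mirrored curve 2 is x = vtc(y).
    """
    # Metastable point on the diagonal, where vtc(v) = v.
    v_meta = bisect_largest(lambda v: vtc(v) >= v, 0.0, vdd)
    best = 0.0
    for i in range(n_grid + 1):
        y0 = v_meta + (vdd - v_meta) * i / n_grid  # corner height above v_meta
        x0 = vtc(y0)                               # lower-left corner on curve 2
        if vtc(x0) < y0:                           # corner lies outside the lobe
            continue
        # Grow the square until its upper-right corner reaches curve 1.
        side = bisect_largest(lambda s: y0 + s <= vtc(x0 + s), 0.0, vdd)
        best = max(best, side)
    return best


if __name__ == "__main__":
    for gain in (0.8, 3.0, 100.0):
        snm = static_noise_margin(lambda v, g=gain: inverter_vtc(v, gain=g))
        print(f"gain={gain:6.1f}  SNM={snm:.3f} V")
```

For an asymmetric cell the same search would be run in both lobes with the two distinct VTCs and the smaller side taken. As the gain grows, the result approaches Vdd/2, and a curve with gain below one has no lobes at all, so the margin collapses to zero.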

[Figure 2.8 plots: (a) Id (mA/µm) versus VL (V), showing the pull-up + pass-gate curves, the pull-down curves, the practical trajectory, and the ideal trajectory; (b) VL (V) versus VR (V).]

Figure 2.8 (a) The trajectory and (b) VTC plot for read operations.

[Figure 2.9 plots: RSNM (V) versus DIBL (V/V) from 0.06 to 0.15. Curves in (a): pull down, pull up, pass gate. Curves in (b): pull down and pass gate, pull down and pull up, all.]

Figure 2.9 DIBL vs. RSNM. DIBL of a single transistor (a) and multiple devices (b) are adjusted. In (a), changing DIBL of the PD transistor is the most effective in improving RSNM (17.6%); and in (b) changing DIBL of the PG transistor is the most effective for WSNM (25.1%). The ratio of the widths of PD, PU, and PG transistors is 1.5:1:1.

To address the impact of the device-level characteristics on the SRAM behavior, we change the DIBL of each transistor individually, as well as of combinations of the pull-down, pull-up, and PG transistors, and evaluate the resultant RSNM and WSNM. Similar to the analysis in Section 2.3, the trajectories of SRAM operations are investigated to analyze SRAM performance. There is a one-to-one relation between the trajectories and the VTC plots of the SRAM. However, the trajectories and the VTC plots focus on different aspects. The VTC plots include the voltage information of both storage nodes, but reveal no current information. The operation trajectories, on the other hand, show the biasing condition of, and the current passed by, each transistor, but the voltage of only one storage node can be read directly from the trajectory plot. Figure 2.8a is the trajectory for read operations of the left branch of the SRAM cell. VL and VR are the voltages of the left and right storage nodes in Figure 2.6a, respectively. The set of square-marked curves is the I-V behavior of the PD transistor. It is the same as the typical NMOS Id-Vds behavior. The points on the same curve correspond to the same VR. The set of dotted curves with cross markers is the sum of the currents of the PU and PG transistors flowing into the left storage node. Because the gate of the PG transistor is always biased at Vdd during the read operation, the current of the PG transistor is independent of VR. The difference between the curves comes only from the PU transistor. The solid lines are the intersections of the two sets of curves at the same VR. This is the actual device operation trajectory during the read operation, corresponding to the solid curves in the VTC plot (Figure 2.8b). If the devices were ideal switches, the trajectory and VTCs would follow the dashed red lines in Figure 2.8a and b. RSNM is maximized at Vdd/2 with ideal switches. The ideal voltage transfer curve should have zero gain for VR>Vdd/2 and for VR<Vdd/2.

The closer the actual trajectory is to the ideal trajectory, the larger the RSNM. In reality, the actual trajectory deviates from the ideal trajectory, especially in the VR>Vdd/2 bias region, in which the trajectory is determined by the saturation region of the PG transistor and the linear-to-saturation transition region of the pull-down transistor. Since a lower DIBL increases the current in the transition region, reducing the DIBL of the pull-down transistor is the most effective way to improve read stability (Figure 2.9). Figure 2.9 shows the RSNM dependence on the DIBL of different devices. The DIBL of single and multiple devices is varied from the nominal case in Figure 2.9a and b, respectively.

WSNM can be analyzed in a similar way, as shown in Figure 2.10. In Figure 2.10a, the set of square-marked curves is the I-V behavior of the PU transistor. The points on the same curve have the same VR. The set of dotted curves is the sum of the currents of the PD and PG transistors, going out of the left storage node; again, the points on the same curve have the same VR. The solid line is the intersection of the two sets of I-V curves at the same VR, and maps one-to-one to the VTC plot in Figure 2.10b. The write trajectory is mainly determined by the transition region of the PG transistor and the saturation region of the pull-up transistor (Figure 2.11). Figure 2.11 shows the WSNM dependence on the DIBL of different devices. The DIBL of single and multiple devices is varied from the nominal case in Figure 2.11a and b, respectively. Lowering the DIBL of the PG transistor is the most effective way to improve WSNM, due to the current enhancement in the transition region (Figure 2.11a).

As illustrated in Table 2.2, the dependence of RSNM and WSNM on DIBL changes with the sizing of the transistors. The WSNM dependence can be very sensitive to the sizing ratios. For example, if the DIBL of the PU transistor is reduced from 150mV/V to 50mV/V, the WSNM decreases by 17.8% when the ratio of the widths of the PD, PU and PG transistors is 1.5:1:1, while the WSNM increases by 0.63% when the ratio is 2:1:1.5.

[Figure 2.10 plots: (b) VR (V) versus VL (V), both from 0 to 1 V, for DIBL=50mV/V and DIBL=150mV/V.]

Figure 2.10 (a) The trajectory and (b) VTC plot for write operations.

[Figure 2.11 plots: WSNM (V) versus DIBL (V/V) from 0.06 to 0.15. Curves in (a): pull down, pull up, pass gate. Curves in (b): pull down and pass gate, pull down and pull up, all.]

Figure 2.11 DIBL vs. WSNM. DIBL of a single transistor (a) and multiple devices (b) are adjusted. The ratio of the widths of PD, PU, and PG transistors is 1.5:1:1.

Generally, there is a competing effect between RSNM and WSNM, e.g., increasing the size of the PG transistor improves WSNM but degrades RSNM. However, because changing the DIBL of the three transistors has different impacts on RSNM and WSNM (Figure 2.9 and Figure 2.11), improving the DIBL of both the PD and the PG transistors can simultaneously enlarge RSNM and WSNM (Table 2.2). The reasons are as follows. For the read operation, reducing the DIBL of the PD transistor improves RSNM, while reducing the DIBL of the PG transistor degrades RSNM. Both transistors are in the on-state. However, the PD transistor has a larger current. Hence, the impact of the PD transistor dominates. Similarly, for the write operation, reducing the DIBL of the PG transistor improves WSNM, while reducing the DIBL of the PD transistor degrades WSNM. However, since the PD transistor is mainly in the off-state during write operations, the impact of the PG transistor dominates. Furthermore, because both the PD and PG transistors are n-type devices, it is practical to improve the DIBL of both transistors with the same processing flow. This suggests that we could improve both RSNM and WSNM by reducing DIBL through device engineering, such as optimizing the doping profile [7][8], using a thinner channel body [11]-[21], and relaxing the channel length [30]-[41]. The improvement depends on the ratio of the widths of the transistors. If the DIBL of the PD and PG transistors is reduced from 150mV/V to 50mV/V when the ratio of the widths of the PD, PU and PG transistors is 1.5:1:1, RSNM and WSNM can be simultaneously improved by 12%. Moreover, improving the DIBL of the PU transistor can increase RSNM and WSNM simultaneously for certain ratios of the widths of the transistors (Table 2.2).

Table 2.2 Changes of RSNM and WSNM when the DIBL of specific devices is reduced from 150mV/V to 50mV/V. Width ratios are given as PD:PU:PG.

Device(s) with      1.5:1:1 ratio         2:1:1.5 ratio
reduced DIBL        RSNM      WSNM        RSNM      WSNM
PD                  +17.6%    -11.4%      +18.2%    -5.31%
PU                  +10.3%    -17.8%      +9.23%    +0.63%
PG                  -5.46%    +25.1%      -4.80%    +8.14%
PD+PG               +12.3%    +12.4%      +13.5%    +2.63%
PU+PD               +28.3%    -30.0%      +28.1%    -6.03%
PU+PD+PG            +22.5%    -16.7%      +23.1%    -7.36%


By varying the DIBL and the ratio of the transistor widths, the RSNM and WSNM can be engineered to cover a wide range. The optimum design lies on the Pareto optimal curve of Figure 2.12. No point on the Pareto curve is better than another in both RSNM and WSNM, so the decision on the device design should be made based on the practical requirement of the cell stability. For example, for a design targeting an RSNM of 0.17V, the maximum WSNM is achieved by choosing the ratio of the widths of the PD, PU, and PG transistors to be 2.5:1:1.5 and the DIBL to be 50, 100, and 100mV/V, respectively.
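As a minimal sketch of how a design point could be picked from such a trade-off, the Python snippet below filters a list of (RSNM, WSNM) pairs down to its Pareto-optimal subset and then selects the point with the largest WSNM that still meets an RSNM target. The function names and the candidate values are hypothetical and chosen for illustration only; they are not data read from Figure 2.12.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (RSNM in V, WSNM in V); larger is better for both


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point beats in both RSNM and WSNM."""
    front = []
    for i, (r_i, w_i) in enumerate(points):
        dominated = any(
            (r_j >= r_i and w_j >= w_i) and (r_j > r_i or w_j > w_i)
            for j, (r_j, w_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((r_i, w_i))
    return sorted(front)


def best_wsnm_for_target(points: List[Point], rsnm_target: float) -> Optional[Point]:
    """Among Pareto points meeting the RSNM target, pick the largest WSNM."""
    feasible = [p for p in pareto_front(points) if p[0] >= rsnm_target]
    return max(feasible, key=lambda p: p[1]) if feasible else None


# Hypothetical candidate designs (illustrative values only).
candidates = [
    (0.12, 0.38), (0.15, 0.33), (0.17, 0.30), (0.17, 0.25),
    (0.20, 0.22), (0.19, 0.20), (0.24, 0.12),
]
front = pareto_front(candidates)
choice = best_wsnm_for_target(candidates, rsnm_target=0.17)
```

In this sketch each candidate would correspond to one combination of width ratio and per-transistor DIBL; the selection step mirrors the "target RSNM, then maximize WSNM" reasoning of the paragraph above.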

[Figure 2.12: WSNM (V) versus RSNM (V) for PD:PU:PG width ratios of 2.5:1:1.1, 2.5:1:1.25, 2.5:1:1.5, and 2:1:1.5, with the Pareto curve and the PD:PU and PU:PG ratio trends annotated.]

Figure 2.12 RSNM vs. WSNM with different ratios of the widths of transistors, with the constraint that the cell area is less than 10 times the area of the pull-up transistor.

2.5 Conclusions

In this chapter, we analyze the impact of device electrostatic control on different circuit blocks. We start from a behavioral I-V model and implement analytical I-V and C-V models in a fast circuit simulator, MASTAR. Both logic switching and SRAM applications are discussed, with DIBL as a representative variable for device electrostatic control. For logic switching, improving DIBL reduces the switching delay at the circuit level. The efficacy depends on the target application (HP, LoP, and LstP) and the circuit topology (stacked and un-stacked circuits). Improving DIBL is most effective for LstP applications and stacked circuits. For SRAM, we focus on the read and write static noise margins. A switching trajectory approach is proposed to analyze the impact of the device I-V characteristics on the SNM of an SRAM cell. Reducing the DIBL of the PD and PG transistors can effectively improve SRAM stability. A 12% simultaneous increase in both RSNM and WSNM is demonstrated by reducing the DIBL of the PD and PG transistors from 150mV/V to 50mV/V at a practical sizing ratio.

2.6 Bibliography

[1] R. Dennard et al., “Design of Ion-Implanted MOSFET’s with Very Small Physical Dimensions,” IEEE Journal of Solid-State Circuits, vol. SC-9, no. 5, pp. 256-268, Oct 1974.

[2] http://public.itrs.net/reports.html.

[3] M. H. Na, et al., "The effective drive current in CMOS inverters," in Electron Devices Meeting, 2002. IEDM '02. Digest. International, 2002, pp. 121-124.

[4] J. Deng and H. S. P. Wong, "Metrics for performance benchmarking of nanoscale Si and carbon nanotube FETs including device nonidealities," Electron Devices, IEEE Transactions on, vol. 53, pp. 1317-1322, 2006.

[5] E. Yoshida, et al., "Performance boost using a new device design methodology based on characteristic current for low-power CMOS," in Electron Devices Meeting, 2006. IEDM '06. International, 2006, pp. 1-4.

[6] K. von Arnim, et al., "An Effective Switching Current Methodology to Predict the Performance of Complex Digital Circuits," in Electron Devices Meeting, 2007. IEDM 2007. IEEE International, 2007, pp. 483-486.

[7] T. Yuan and E. J. Nowak, "CMOS devices below 0.1 µm: how high will performance go?," in Electron Devices Meeting, 1997. IEDM '97. Technical Digest., International, 1997, pp. 215-218.

[8] S. Asai and Y. Wada, "Technology challenges for integration near and below 0.1µm," Proceedings of the IEEE, vol. 85, pp. 505-520, 1997.

[9] R. Chau, et al., "High-κ/metal-gate stack and its MOSFET characteristics," Electron Device Letters, IEEE, vol. 25, pp. 408-410, 2004.

[10] Z. Wenjuan, et al., "Mobility measurement and degradation mechanisms of MOSFETs made with ultrathin high-k," Electron Devices, IEEE Transactions on, vol. 51, pp. 98-105, 2004.

[11] L. Hyung-Kyu and J. G. Fossum, "Threshold voltage of thin-film Silicon-on- insulator (SOI) MOSFET's," Electron Devices, IEEE Transactions on, vol. 30, pp. 1244-1251, 1983.

[12] D. J. Wouters, et al., "Subthreshold slope in thin-film SOI MOSFETs," Electron Devices, IEEE Transactions on, vol. 37, pp. 2022-2033, 1990.

[13] J. P. Colinge, Silicon-on-insulator technology: materials to VLSI, Kluwer Academic Publishers, Boston, MA, 2004.

[14] F. Balestra, et al., "Double-gate silicon-on-insulator transistor with volume inversion: A new device with greatly enhanced performance," Electron Device Letters, IEEE, vol. 8, pp. 410-412, 1987.

[15] K. Suzuki, et al., "Scaling theory for double-gate SOI MOSFET's," Electron Devices, IEEE Transactions on, vol. 40, pp. 2326-2329, 1993.

[16] H. S. P. Wong, et al., "Device design considerations for double-gate, ground- plane, and single-gated ultra-thin SOI MOSFET's at the 25 nm channel length generation," in Electron Devices Meeting, 1998. IEDM '98 Technical Digest., International, 1998, pp. 407-410.

[17] D. Hisamoto, et al., "FinFET-a self-aligned double-gate MOSFET scalable to 20 nm," Electron Devices, IEEE Transactions on, vol. 47, pp. 2320-2325, 2000.

[18] J. P. Colinge, et al., "Silicon-on-insulator `gate-all-around device'," in Electron Devices Meeting, 1990. IEDM '90. Technical Digest., International, 1990, pp. 595-598.

[19] J.-P. Colinge, "Multiple-gate SOI MOSFETs," Solid-State Electronics, vol. 48, pp. 897-905, 2004.

[20] B. Doyle, et al., "Tri-Gate fully-depleted CMOS transistors: fabrication, design and layout," in VLSI Technology, 2003. Digest of Technical Papers. 2003 Symposium on, 2003, pp. 133-134.

[21] B. S. Doyle, et al., "High performance fully-depleted tri-gate CMOS transistors," Electron Device Letters, IEEE, vol. 24, pp. 263-265, 2003.

[22] S. E. Thompson, et al., "A logic nanotechnology featuring strained-silicon," Electron Device Letters, IEEE, vol. 25, pp. 191-193, 2004.

[23] D. A. Antoniadis, et al., "Continuous MOSFET performance increase with device scaling: The role of strain and channel material innovations," IBM Journal of Research and Development, vol. 50, pp. 363-376, 2006.

[24] Sentaurus®, Synopsys.

[25] Y. Cheng and C. Hu, MOSFET Modeling BSIM3 User’s Guide, Kluwer Academic Publishers, MA, 1999.

[26] http://www-device.eecs.berkeley.edu/~bsim3/

[27] Y. Cao, et al., "New paradigm of predictive MOSFET and interconnect modeling for early circuit simulation," in Custom Integrated Circuits Conference, 2000. CICC. Proceedings of the IEEE 2000, 2000, pp. 201-204.

[28] http://ptm.asu.edu/.

[29] S. Natarajan, et al., "A 32nm logic technology featuring 2nd-generation high-k + metal-gate transistors, enhanced channel strain and 0.171 µm2 SRAM cell size in a 291Mb array," Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, pp. 941-943.

[30] L. Wei, et al., "CMOS technology roadmap projection including parasitic effects," in VLSI Technology, Systems, and Applications, 2009 (VLSI-TSA'09). International Symposium on, 2009, pp. 78-79.

[31] F. Arnaud, et al., "32nm general purpose bulk CMOS technology for high performance applications at low voltage," Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, pp. 633-636.

[32] X. Chen, et al., "A cost effective 32nm high-K/ metal gate CMOS technology for low power applications with single-metal/gate-first process," VLSI Technology, 2008 Symposium on, 2008, pp. 88-89.

[33] http://public.itrs.net/models.html.

[34] E. Seevinck, et al., "Static-noise margin analysis of MOS SRAM cells," Solid- State Circuits, IEEE Journal of, vol. 22, pp. 748-754, 1987.

[35] J. Lohstroh, et al., "Worst-case static noise margin criteria for logic circuits and their mathematical equivalence," Solid-State Circuits, IEEE Journal of, vol. 18, pp. 803-807, 1983.

[36] A. Bhavnagarwala, et al., "Fluctuation limits & scaling opportunities for CMOS SRAM cells," in Electron Devices Meeting, 2005. IEDM Technical Digest. IEEE International, 2005, pp. 659-662.

[37] C. Webb, “Memory Architecture,” IEDM Short Course on “Advanced Memory Technology and Architecture,” 2001.

[38] J. D. Warnock, et al., "The circuit and physical design of the POWER4 microprocessor," IBM J. Res. Dev., vol. 46, pp. 27-51, 2002.

[39] K. Changhyun, "Future Memory Technology Trends and Challenges," in Quality Electronic Design, 2006. ISQED '06. 7th International Symposium on, 2006, pp. 513-513.

[40] R. W. Mann, et al., "Limits of bias based assist methods in nano-scale 6T SRAM," in Quality Electronic Design (ISQED), 2010 11th International Symposium on, 2010, pp. 1-8.

[41] F. Boeuf, et al., "An Evaluation of the CMOS Technology Roadmap From the Point of View of Variability, Interconnects, and Power Dissipation," Electron Devices, IEEE Transactions on, vol. 55, pp. 1433-1440, 2008.


Chapter 3 Si MOSFETs: Parasitic Engineering & Selective Device Structure Scaling

3.1 Introduction

Historically, the delay of CMOS devices is benchmarked by the intrinsic delay τint=CgcVdd/Idsat, where Cgc is the intrinsic gate-to-channel capacitance, Vdd is the supply voltage, and Idsat is the saturation current when the device is biased at Vgs=Vds=Vdd. In the long channel regime, parasitic capacitances are negligible compared with Cgc; hence the device-level intrinsic delay is a good indicator of the circuit-level speed for CMOS logic applications. According to Dennard’s scaling theory [1], when the channel length (Lg), channel width (W), effective oxide thickness (EOT), and supply voltage are scaled by 1/κ (κ>1), Cgc/Idsat remains constant. Thus, τint is scaled by 1/κ, which means that a circuit built from scaled devices can run κ times faster. Dennard scaling has been the technology engine for many generations down to the sub-100nm regime. Technology boosters such as strain [2][3] and high-k/metal gate [4][5] have helped continue the historic performance trend down to the 45 nm technology node. As the physical gate length is reduced below 30 nm, gate length scaling becomes less effective, because shorter gate lengths must be traded off against various leakage currents (subthreshold, gate, and band-to-band tunneling (BTBT)) [6]-[11]. Meanwhile, the increasing contribution of parasitic capacitances and series resistances becomes a challenge for future technology. In particular, parasitic capacitances [10]-[12], which are also charged and discharged during device switching, are no longer negligible and significantly affect the circuit-level speed.
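The scaling argument above can be checked numerically. The short Python sketch below evaluates τint=CgcVdd/Idsat before and after ideal Dennard scaling, in which Cgc, Vdd, and Idsat each shrink by 1/κ. The starting values and the helper names are hypothetical, chosen only to illustrate that τint then shrinks by 1/κ; parasitic capacitances are deliberately ignored here.

```python
def intrinsic_delay(c_gc: float, v_dd: float, i_dsat: float) -> float:
    """tau_int = Cgc * Vdd / Idsat (seconds, for SI inputs)."""
    return c_gc * v_dd / i_dsat


def dennard_scale(c_gc: float, v_dd: float, i_dsat: float, kappa: float):
    """Ideal constant-field scaling: Cgc, Vdd, and Idsat all scale by 1/kappa."""
    return c_gc / kappa, v_dd / kappa, i_dsat / kappa


# Hypothetical starting device (illustrative numbers only).
c_gc, v_dd, i_dsat = 1.0e-15, 1.0, 1.0e-3  # 1 fF, 1 V, 1 mA
kappa = 1.4

tau_before = intrinsic_delay(c_gc, v_dd, i_dsat)
tau_after = intrinsic_delay(*dennard_scale(c_gc, v_dd, i_dsat, kappa))
speedup = tau_before / tau_after  # equals kappa for ideal scaling
```

Because Cgc/Idsat is unchanged by the scaling step, the whole improvement in this sketch comes from the reduced Vdd, which is why a non-scalable supply voltage undermines the historical trend.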

In this chapter, we first focus on parasitic capacitances in Section 3.2. We introduce simple analytical capacitance models for bulk CMOS, fully-depleted silicon-on-insulator (FDSOI) devices, and planar double gate (DG) devices. Then, we study which device-level capacitance components affect circuit-level delay performance by analyzing the CMOS logic switching trajectory. An expression for the actual device delay (τ) is developed to estimate circuit-level delay, including both intrinsic and parasitic capacitances. Furthermore, the capacitance models are implemented in a fast device and circuit simulator, the Model for Assessment of cmoS Technologies And Roadmaps (MASTAR) application [13], to estimate the FO1 and FO3 delays of inverter chains. Different device design scenarios are examined. In Section 3.3, we propose new selective scaling scenarios that extend the study to examine the effect of detailed device features in both the lateral and vertical directions (contact sizes, overlay tolerances, gate heights, and plug heights) on both the parasitic resistances and capacitances. In particular, the impact of selective scaling of device structures on the trade-off between the parasitic capacitances (outer-fringe capacitance, gate-to-plug capacitance, plug-to-plug capacitance) and the parasitic resistance is quantified. Circuit-level simulations are performed to verify the benefits of selective device structure scaling. Guidelines for selective scaling are proposed, and a more comprehensive and efficient selective scaling scenario than that described in [14] is developed. A methodology for extending the technology scaling roadmap is introduced to continue the historic performance trend for several generations, even without scaling the gate length or sacrificing the contacted gate pitch (or device density).


3.2 Parasitic Capacitance Engineering

3.2.1 Analytical Models for Capacitance Components

Device capacitances are split into different components according to their physical origins, including gate-to-channel capacitance (Cgc), gate-to-substrate capacitance (Cgb), overlap capacitance (Cov), inner fringe capacitance (Cif), outer-fringe capacitance (Cof), gate-to-plug capacitance (Cpcca), corner capacitance (Ccorner), and junction capacitance (Cj). Simple analytical models of each component are developed for bulk CMOS devices (denoted as “bulk” in the subscripts, Figure 3.1a), FDSOI devices (denoted as “SOI” in the subscripts, Figure 3.1b), and DG devices (denoted as “DG” in the subscripts, Figure 3.1c), as described in the following paragraphs.

Models of Cov, Cif, Cof, Cpcca, Ccorner, and Cj are for one side (source or drain). Lg is the channel length. ΔL is the total gate-to-source/drain overlap length. W is the gate width. Wext is the width of the gate overhang per side. Lpitch is the contacted gate pitch. LPCCA is the distance between the contact plug and the gate edge. tox and tox_el are the physical and electrical gate dielectric thicknesses, respectively. tpoly is the poly gate thickness. tsi is the silicon body thickness in FDSOI and DG devices. tbox is the thickness of the body insulator (buried oxide) layer in FDSOI devices. trSD is the thickness of the raised source/drain in FDSOI devices. xj is the source/drain junction depth in bulk devices. εox, εcap, and εSi are the permittivities of the gate dielectric, capping layer, and silicon body, respectively.

[Figure 3.1: device cross-sections with the dimensions Lcont, Lg, LPCCA, tpoly, trSD, tSi, xj, and tbox labeled.]

Figure 3.1 Device schematics of (a) bulk device, (b) FDSOI device, and (c) planar double gate device

A. Gate-to-channel capacitance (Cgc)

The gate-to-channel capacitance is modeled as a parallel plate capacitance between the gate electrode and the sheet of channel charge. Cgc only appears when the device is turned on and an inversion layer is formed. The electrical gate dielectric thickness (tox_el) is the separation between the gate and the inversion layer, which includes the quantization effect.

$$C_{gc\_bulk} = C_{gc\_SOI} = \frac{\varepsilon_{ox}}{t_{ox\_el}}\,(L_g - \Delta L)\,W,$$

$$C_{gc\_DG} = 2\,\frac{\varepsilon_{ox}}{t_{ox\_el}}\,(L_g - \Delta L)\,W. \tag{3.1}$$
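As an illustration, the Python sketch below evaluates Equation (3.1), read here as a parallel plate of length Lg−ΔL and width W, together with the per-side overlap term of length 0.5ΔL that is introduced for Cov in Equation (3.3) below. The geometry is taken from the 32nm bulk column of Table 3.1; the SiO2-equivalent relative permittivity of 3.9 (appropriate when tox_el is expressed as an equivalent oxide thickness) and the 1 µm gate width are assumptions made for this example, and the function names are hypothetical.

```python
EPS0 = 8.854e-12          # vacuum permittivity, F/m
EPS_OX = 3.9 * EPS0       # assumption: SiO2-equivalent dielectric for an EOT value


def c_gc(l_g, delta_l, w, t_ox_el, double_gate=False):
    """Gate-to-channel capacitance (F): plate of length (Lg - dL) and width W."""
    c = EPS_OX / t_ox_el * (l_g - delta_l) * w
    return 2.0 * c if double_gate else c


def c_ov(delta_l, w, t_ox_el, double_gate=False):
    """Overlap capacitance per side (F): same plate model with length 0.5*dL."""
    c = EPS_OX / t_ox_el * 0.5 * delta_l * w
    return 2.0 * c if double_gate else c


# 32 nm bulk node from Table 3.1 (Lg, dL, EOT_el); W = 1 um is an assumed width.
l_g, delta_l, t_ox_el, w = 27e-9, 3.9e-9, 1.26e-9, 1e-6

cgc = c_gc(l_g, delta_l, w, t_ox_el)
cov = c_ov(delta_l, w, t_ox_el)
full_plate = EPS_OX / t_ox_el * l_g * w  # plate spanning the whole gate length
```

With this reading, Cgc plus the two per-side overlap capacitances adds up to the capacitance of a plate spanning the full gate length, which is a quick consistency check between the two equations.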

B. Gate-to-substrate capacitance (Cgb)

When a channel is present, the substrate is shielded from the gate. When the device is in the subthreshold region, i.e., in the off-state, the gate is electrically coupled to the substrate. The gate-to-substrate capacitance can be modeled as several parallel plate capacitances in series: those of the gate dielectric layer, the depleted Si layer, and, in FDSOI devices, the body insulator layer. For double-gate devices, the middle of the silicon body is considered as a virtual ground. However, no actual charge goes into or out of the undoped silicon body during device switching. Thus, Cgb_DG=0. In Equation (3.2), tdep is the thickness of the depleted Si layer and εbox is the permittivity of the body insulator layer.

$$C_{gb\_bulk} = \left[\left(\frac{\varepsilon_{ox}}{t_{ox}}(L_g - \Delta L)W\right)^{-1} + \left(\frac{\varepsilon_{si}}{t_{dep}}(L_g - \Delta L)W\right)^{-1}\right]^{-1},$$

$$C_{gb\_SOI} = \left[\left(\frac{\varepsilon_{ox}}{t_{ox}}(L_g - \Delta L)W\right)^{-1} + \left(\frac{\varepsilon_{si}}{t_{dep}}(L_g - \Delta L)W\right)^{-1} + \left(\frac{\varepsilon_{box}}{t_{box}}(L_g - \Delta L)W\right)^{-1}\right]^{-1}, \tag{3.2}$$

$$C_{gb\_DG} = 0.$$
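The series combination described for Cgb can be sketched in a few lines of Python. The snippet assumes that each layer is a parallel plate of area (Lg−ΔL)·W, which is our reading of Equation (3.2); the relative permittivities, the depletion depth, the buried-oxide thickness, and the function names are illustrative assumptions rather than values from this work.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def plate_cap(eps_r, thickness, area):
    """Parallel plate capacitance (F)."""
    return eps_r * EPS0 * area / thickness


def series(*caps):
    """Series combination of capacitances."""
    return 1.0 / sum(1.0 / c for c in caps)


def c_gb(structure, l_g, delta_l, w, t_ox, t_dep, t_box=None,
         eps_ox=3.9, eps_si=11.7, eps_box=3.9):
    """Off-state gate-to-substrate capacitance for 'bulk', 'SOI', or 'DG'."""
    if structure == "DG":
        return 0.0  # no charge enters or leaves the undoped body
    area = (l_g - delta_l) * w
    layers = [plate_cap(eps_ox, t_ox, area), plate_cap(eps_si, t_dep, area)]
    if structure == "SOI":
        if t_box is None:
            raise ValueError("t_box is required for an FDSOI device")
        layers.append(plate_cap(eps_box, t_box, area))
    elif structure != "bulk":
        raise ValueError(f"unknown structure: {structure}")
    return series(*layers)


# Hypothetical geometry: t_dep and t_box are assumed values for illustration.
geom = dict(l_g=27e-9, delta_l=3.9e-9, w=1e-6, t_ox=0.95e-9, t_dep=20e-9)
cgb_bulk = c_gb("bulk", **geom)
cgb_soi = c_gb("SOI", t_box=25e-9, **geom)
cgb_dg = c_gb("DG", **geom)
```

Adding the buried-oxide layer in series can only lower the result, so for the same gate stack and depletion depth the FDSOI value is smaller than the bulk value, and both are smaller than the gate dielectric capacitance alone.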

C. Overlap capacitance (Cov)

Similar to Cgc, the overlap capacitance is modeled as Cgc in Equation (3.1), with an equivalent length of 0.5ΔL per side.

$$C_{ov\_bulk} = C_{ov\_SOI} = \frac{\varepsilon_{ox}}{t_{ox\_el}} \cdot 0.5\,\Delta L \cdot W,$$

$$C_{ov\_DG} = 2\,C_{ov\_bulk}. \tag{3.3}$$

D. Inner fringe capacitance (Cif)

To the first order, inner and outer fringe capacitances are determined by the shorter of the two electrodes. An analytical model for the inner fringe capacitance (per side) in long channel devices has previously been developed [15], under the assumption that the channel length is much longer than the junction depth in bulk devices or the silicon thickness in FDSOI/DG devices. However, this assumption can be problematic for advanced technology nodes, where devices are designed well into the short channel regime [16]. The following analytical models are developed so that Cif is determined by the smaller of 0.5Lg and teff. Here, teff is the effective body thickness as defined in Equation (3.4), and aif is a fitting parameter of around 0.05~0.1. Good agreement is achieved between the numerical simulation and our analytical models, as shown in Figure 3.2. The numerical simulation is done with Maxwell 2D [17], a 2D static electromagnetic field solver.

$$t_{eff} = x_j \ \text{(bulk)},\quad t_{si} \ \text{(SOI)},\quad \text{or}\ 0.5\,t_{si} \ \text{(DG)}$$

2 min(0.5Ltt , ) CWsi ln 1goxeff , if_, bulk SOI   atif ox 4 min(0.5Ltt , ) CWsi ln 1 goxeff (3.4) if_ DG   atif ox

[Figure 3.2: Cif (pF/m) versus teff/Lg for EOT=1nm and EOT=0.5nm, with Lg=5, 10, and 15nm.]

Figure 3.2 Comparison of analytical model (lines) and numerical simulation (circles) of Cif

E. Outer-fringe capacitance(Cof)

Outer-fringe capacitance is modeled with the conformal mapping methodology as in [18][19]. The usual assumption is that the length of the source/drain extension region is much larger than the gate height, and therefore the integration along the electric field lines is typically done along the gate electrode in the vertical direction. However, for advanced technology nodes, device channel length scaling, and hence gate height scaling, has slowed down, while device pitch scaling is still about 0.7x per generation and continues to scale down the source/drain extension length. As a result, this assumption is no longer valid. Following the conformal mapping methodology in [18]-[20], but integrating along the electrode of the source/drain extension in the horizontal direction, Cof can be modeled as follows. Figure 3.3 shows the comparison between numerical simulation and the analytical models. aof is fitted to be 0.35.

22 capWWPCCA tox PCCA cap W CCof__ bulk of SOIln  a of ln tt ox ox 22WWPCCA t22 PCCA W Cacaplnox cap ln of_ DG of  (3.5) ttox ox 

[Figure 3.3: Cof (pF/m) versus LPCCA/Lg for tox=1.5nm, 2nm, and 3nm, with Lg=25nm.]

Figure 3.3 Comparison of analytical model (lines) and numerical simulation (symbols) of Cof


F. Gate-to-plug capacitance (Cpcca)

Gate-to-plug capacitance is split into two parts: Cpcca_nr is associated with the normal field, while Cpcca_fr is associated with the fringe field. In bulk devices, Cpcca_nr is described by a single parallel plate capacitance, while in FDSOI devices, where a raised source/drain is present, Cpcca_nr is described by two parallel plate capacitances connected in parallel. Cpcca_fr is modeled following [21]. For the planar DG structure in Figure 3.1c, only the coupling between the top gate electrode and the contact plugs is included. The model agrees with numerical simulation, as shown in Figure 3.4.

CCCCCpcca_____ bulk pcca SOI pcca DG pcca fr pcca nr

0.5capWWtWt  cap poly__ eff  spacer rSD eff  PCCA t 2 PCCA Lg spacer ln  2Lt gbkpolyeff_  2tLpoly_ eff g  exp 2 2 1 bk PCCA  (3.6)

[Figure 3.4: CPCCA (pF/m) versus LPCCA/Lg with Lg=25nm, for tpoly=2Lg or Lg and trSD=0 or Lg.]

Figure 3.4 Comparison of analytical model (lines) and numerical simulation (symbols) of CPCCA


G. Corner capacitance (Ccorner)

The corner capacitance is the capacitance between the gate overhang and the source/drain extension regions. The electric field associated with Ccorner is three-dimensional. Ccorner is empirically modeled by Equation (3.7). Here, d is the distance between the mid-points of the two shaded surfaces in Figure 3.5a (surface 1, LPCCA×Teff, and surface 2, Wext×Tpoly). A is the effective area, which is the geometric mean of the areas of the two shaded surfaces. acorner is fitted to be 5. Figure 3.5b shows the comparison between numerical simulation and the analytical model. The numerical simulation is performed with Maxwell 3D [22], a 3D electromagnetic simulator.

[Figure 3.5: (a) schematic showing surface 1 (LPCCA×Teff), surface 2 (Wext×Tpoly), and the distance d between their mid-points; (b) Ccorner from the analytical model and from numerical simulation for different combinations of tpoly/Lg, Wext, LPCCA/Lg, and tox.]

Figure 3.5 (a) Schematic of Ccorner, (b) comparison of analytical model (crosses) and numerical simulation (circles) of Ccorner

$$d = \sqrt{\left(0.5\,W_{ext}\right)^2 + \left(0.5\,T_{poly} + t_{ox} + 0.5\,T_{eff}\right)^2 + \left(0.5\,L_{PCCA}\right)^2},$$

$$A = \sqrt{W_{ext}\,T_{poly} \cdot L_{PCCA}\,T_{eff}},$$

$$C_{corner\_bulk} = C_{corner\_SOI} = a_{corner}\,\varepsilon_{cap}\,\frac{A}{d}, \tag{3.7}$$

$$C_{corner\_DG} = 2\,a_{corner}\,\varepsilon_{cap}\,\frac{A}{d}.$$

H. Junction Capacitance (Cj)

Source/drain junction capacitance is modeled as a conventional p-n junction capacitance in a bulk device, and as a parallel plate capacitance of the buried oxide between the source/drain and the substrate in an FDSOI device.

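$$C_{j\_bulk} = W\left[x_j + 0.5\left(L_{pitch} - L_g\right)\right]\sqrt{\frac{q\,\varepsilon_{si}\,N_a N_d}{2\,V_{bi}\left(N_a + N_d\right)}}\left(1 + \frac{V}{V_{bi}}\right)^{-0.33},$$

$$C_{j\_SOI} = \frac{\varepsilon_{box}}{t_{box}}\,W \cdot 0.5\left(L_{pitch} - L_g\right), \tag{3.8}$$

$$C_{j\_DG} = 0.$$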

3.2.2 Actual Device Delay

During the switching process in logic applications, an input signal (Vin) is applied at the gate of the driving stage. In CMOS circuits with full-swing signals, the input signal either rises from 0 to Vdd or falls from Vdd to 0. Correspondingly, either the pull-down (NMOS) or the pull-up (PMOS) transistor(s) of the driving stage is turned on, passing transient current to discharge or charge the capacitance at the output node, respectively. The output voltage (Vout) then either falls from Vdd to 0 or rises from 0 to Vdd, respectively. Delay is generally measured as the time from the 50% input level to the 50% output level. Depending on the switching direction, the rise-to-fall delay (τrf) corresponds to the rising input to falling output process, while the fall-to-rise delay (τfr) refers to the falling input to rising output process (Figure 3.6). The actual device delay (τ) is the average of τrf and τfr.
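The 50%-to-50% delay definition can be sketched as follows in Python. The snippet finds the first crossing of 0.5Vdd on sampled input and output waveforms by linear interpolation and averages the two switching directions. The sampled waveforms and the function names are hypothetical; they stand in for simulator output and are not taken from this work.

```python
def crossing_time(times, values, level):
    """First time a sampled waveform crosses `level` (linear interpolation)."""
    samples = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def stage_delay(times, v_in, v_out, v_dd):
    """Delay from the 50% point of the input to the 50% point of the output."""
    return (crossing_time(times, v_out, 0.5 * v_dd)
            - crossing_time(times, v_in, 0.5 * v_dd))


def actual_delay(tau_rf, tau_fr):
    """Actual device delay: average of rise-to-fall and fall-to-rise delays."""
    return 0.5 * (tau_rf + tau_fr)


# Hypothetical sampled waveforms (time in ps, voltage in V, Vdd = 1 V).
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v_in_rise = [0.0, 0.5, 1.0, 1.0, 1.0]
v_out_fall = [1.0, 1.0, 0.75, 0.25, 0.0]
tau_rf = stage_delay(t, v_in_rise, v_out_fall, v_dd=1.0)
```

In practice τfr would be extracted the same way from a falling input and rising output, and the two values passed to `actual_delay`.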

Figure 3.6 Circuit and waveform of a 2-stage inverter chain.

The total capacitance (Ctot) charged/discharged at the output node is the capacitance that determines the circuit-level delay. Ctot includes three parts: the drain capacitance of the driving stage (Cd), the gate capacitance of the load stage (Cg), and the wiring capacitance between the two stages (Cint). We do not include Cint in this work, but it can be included in a straightforward manner because it is a constant independent of voltage. The drain capacitance of the driving stage consists of the drain-to-gate capacitance (Cdg) and the drain-to-substrate junction capacitance (Cj). Because the voltages applied to the two electrodes of Cdg change in opposite directions during switching, a multiplier M (M=2) is included to account for the Miller effect. In bulk CMOS, Cj is Vout-dependent, because the depletion width of the drain-to-substrate junction changes with Vout. For the rise-to-fall delay, Vout falls from Vdd to 0.5Vdd, while for the fall-to-rise delay, Vout rises from 0 to 0.5Vdd. Thus, to the first order, Cj is calculated as the average of Cj(Vout=0) and Cj(Vout=Vdd) to average τrf and τfr. Cg is also Vout-dependent. For MOSFET devices, when the device is off and no channel charge sheet is present, the gate is electrostatically coupled to the source/drain junction through the inner fringe capacitance (Cif) and to the substrate through the gate-to-substrate capacitance (Cgb). When the device is on and a conductive inversion layer is formed as the channel, both Cif and Cgb are shielded by the channel. Instead, the gate is coupled to the source/drain junction by the gate-to-channel capacitance (Cgc).

For the rise-to-fall delay, Vout falls from Vdd to 0.5Vdd. During this process, PMOS of the load stage is switched from the off-state to the on-state, while NMOS of the load stage stays in the on-state (Figure 3.6). Conversely, for the fall-to-rise delay, Vout rises from 0 to 0.5Vdd. PMOS stays in the on-state, while NMOS is switched from the off- state to the on-state (Figure 3.6).

To average the contributions of PMOS and NMOS in both τrf and τfr, one quarter of Cg in the off-state (Cg_off) and three quarters of Cg in the on-state (Cg_on) are included to estimate Ctot, to the first order, as in Equation (3.9). M is the Miller effect coefficient, and FO is the electrical fan-out. FO1 (FO=1) and FO3 (FO=3) delays of inverter chains are widely adopted figures of merit for circuit-level delay benchmarking [4].

CCtot d0.25 C g__ off  0.75 C g on  FO

CMCdg j+ 0.25  C g__ off  0.75  C g on  FO  0.25CCCCCCgboffovifofpccacorner_  2 2 2 2  2 CCCov of pcca C corner  MC j FO 0.75CCC 2 2 2CC2 gc ov of pcca corner

(3.9)
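To make the bookkeeping of Equation (3.9) concrete, the Python sketch below assembles Ctot from the per-side capacitance components, as we read the equation: a Miller-multiplied drain-to-gate term, the junction term, and a fan-out-weighted mix of the off-state and on-state gate capacitances. The container class, the function name, and the component values (in arbitrary units) are hypothetical; the sketch also assumes that the driving and load stages share the same lumped component values.

```python
from dataclasses import dataclass


@dataclass
class Caps:
    """Capacitance components; per-side values except gc and gb_off."""
    gc: float            # gate-to-channel (on-state)
    gb_off: float        # gate-to-substrate (off-state)
    ov: float            # overlap, per side
    inner_fringe: float  # inner fringe, per side (off-state only)
    outer_fringe: float  # outer fringe, per side
    pcca: float          # gate-to-plug, per side
    corner: float        # corner, per side
    j: float             # drain junction


def c_total(c: Caps, fan_out: float, miller: float = 2.0) -> float:
    """Total switched capacitance following the structure of Equation (3.9)."""
    per_side = c.ov + c.outer_fringe + c.pcca + c.corner
    c_dg = per_side                                   # drain side of the driver
    c_g_off = c.gb_off + 2.0 * c.inner_fringe + 2.0 * per_side
    c_g_on = c.gc + 2.0 * per_side
    return miller * c_dg + c.j + fan_out * (0.25 * c_g_off + 0.75 * c_g_on)


# Hypothetical component values in arbitrary units (illustration only).
caps = Caps(gc=1.0, gb_off=0.2, ov=0.1, inner_fringe=0.3,
            outer_fringe=0.2, pcca=0.4, corner=0.1, j=0.5)
ctot_fo1 = c_total(caps, fan_out=1)
ctot_fo3 = c_total(caps, fan_out=3)
intrinsic_share_fo1 = 0.75 * caps.gc / ctot_fo1  # part of Ctot due to Cgc
```

The last line shows how the share of the intrinsic capacitance in Ctot could be extracted for a self-loaded FO1 stage; with realistic component values this is the quantity plotted in Figure 3.8b.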


3.2.3 Parasitic Capacitance Engineering

The Model for Assessment of cmoS Technologies And Roadmaps (MASTAR) application [13] is a simple device and circuit simulator based on analytical models, used for ITRS Roadmap calculation and projection [16]. We build nominal devices according to the ITRS 2009 specifications [16]. The key parameters are listed in Table 3.1. As projected by ITRS 2009, the typical devices for Year 2010 (32nm node), Year 2013 (22nm node), Year 2016 (16nm node), and Year 2019 (11nm node) are bulk devices, FDSOI devices, FDSOI devices, and DG devices, respectively. With these devices, we simulate inverter chains using the device capacitances calculated following Equation (3.9). By implementing Equation (3.9) within the MASTAR program, the delay of an inverter chain is calculated. The FO1 and FO3 delays of the nominal devices, including parasitic capacitances, are illustrated in Figure 3.7.

[Figure 3.7: FO1 and FO3 delay (ps, logarithmic axis) versus year for 2010 (bulk), 2013 (FDSOI), 2016 (FDSOI), and 2019 (DG).]

Figure 3.7 FO1 and FO3 delay of the nominal devices based on ITRS 2009


[Figure 3.8: (a) capacitance components normalized to Cgc; (b) fractions of Ctot contributed by the intrinsic capacitance (Cgc) and by the parasitic capacitances at the 32nm, 22nm, 16nm, and 11nm technology nodes.]

Figure 3.8 (a) Capacitance components normalized to Cgc for nominal devices based on ITRS 2009, and (b) relative contribution of intrinsic capacitance (due to Cgc) and parasitic capacitances in Ctot of a self-loaded FO1 inverter chain. Note that Cov, Cif, Cof, Cpcca, Ccorner, and Cj are capacitances per side.

Figure 3.8a shows the different capacitance components normalized to Cgc for the nominal devices. Several parasitic capacitances, including Ccorner, Cpcca, and Cif, are comparable to Cgc. Cpcca/Cgc increases as devices are scaled to more advanced technology nodes. Taking the Miller effect into consideration, Ctot of an FO1 inverter chain is calculated following Equation (3.9). Figure 3.8b shows the relative contributions of the intrinsic capacitance (contributed by Cgc) and the parasitic capacitances to Ctot for the nominal devices. The intrinsic capacitance accounts for only 30%~40% of the total capacitance charged/discharged during the switching process. The weight of the intrinsic capacitance decreases for more advanced technologies. From Year 2016 to 2019 (16nm technology node to 11nm technology node), however, switching from single-channel device structures (bulk, FDSOI) to the double-gate device structure decreases the weight of the parasitic capacitances in the total capacitance, because the dual-channel structure doubles Cgc, but not all parasitic components are doubled. Different possible design scenarios are examined below.

Table 3.1 Key parameters for nominal devices adapted from the ITRS 2009 [16]. Reasonable assumptions are made for quantities not included in the ITRS.

Parameter                                            Unit     2010    2013    2016    2019
Technology node                                               32nm    22nm    16nm    11nm
Device structure                                              Bulk    FDSOI   FDSOI   DG
Metal 1 pitch (M1)                                   nm       90      64      44      32
Contacted gate pitch (Lpitch)                        nm       120     85      59      43
Channel length (Lg)                                  nm       27      20      15      12
Length of lightly doped source/drain (ΔL)            nm       3.9     2.9     2.1     1.7
Gate-to-contact length (LPCCA)                       nm       31.5    22      14.5    10
Physical effective oxide thickness (EOT)             nm       0.95    0.7     0.57    0.62
Electrical effective oxide thickness (EOT_el)        nm       1.26    1.1     0.97    1.02
Gate height (Tpoly)                                  nm       24      40      30      24
Spacer thickness (Tspacer)                           nm       21      14.7    9.7     6.7
Raised source/drain thickness (TrSD)                 nm       0       20      15      12
Contact size (Lcont)                                 nm       30      21.3    14.7    10.7
Junction/Si body thickness (xj or Tsi)               nm       12      7       5.1     3.9
Minimum NMOS width (Wnmin)                           nm       45      32      22      16
Minimum PMOS width (Wpmin)                           nm       90      64      44      32
Supply voltage (Vdd)                                 V        0.97    0.87    0.78    0.71
Off-state current (Ioff)                             nA/µm    100     100     100     100
NMOS saturation current (Idsat)                      µA/µm    1200    1470    1730    1970


A. Gate height (Tpoly)

The dependence of Ctot on Tpoly is shown in Figure 3.9a. When Tpoly is reduced, Cpcca, Cof, and Ccorner decrease. Thus, devices with a lower gate height suffer less from parasitics. Halving the gate height relative to the nominal case speeds up the devices by 7% for all technologies from 32nm to 11nm (Year 2010 to 2019). This is in agreement with earlier modeling [14] and experimental [23] studies.

B. Gate-to-contact distance (LPCCA)

Figure 3.9b shows the delay dependence on LPCCA at each technology node. At the device level, reducing LPCCA reduces Cof and Ccorner at the cost of increasing Cpcca, which makes the overall parasitic capacitance somewhat independent of LPCCA. At the circuit level, a smaller device pitch enables shorter interconnects and thus a smaller interconnect RC delay [24]. Without taking interconnect into account, the delay of the inverter chain changes by less than 5% when LPCCA is halved or increased to 1.5x.

C. Wide device vs. narrow device

Traditionally, devices can be divided into two main families according to the statistical distribution of the device widths: devices in logic cells with “large” width (~0.5µm) and devices in SRAM cells with small width (<0.1µm). As logic cell size is scaled down, the average “large” width is becoming smaller and is now close to 0.1µm. To the first order, most capacitance components are proportional to the device width, except for Ccorner. As a consequence, Ccorner becomes significant for small device widths. As shown in Figure 3.9c, wide devices, where Ccorner is negligible, are 13-20% faster than those with minimum widths.


[Figure 3.9: FO3 delay (ps, logarithmic axis) versus technology node (32nm, 22nm, 16nm, 11nm) for (a) nominal and 0.5x nominal Tpoly, (b) nominal, 0.5x, and 1.5x nominal LPCCA, and (c) Wn=Wnmin, Wp=Wpmin versus Wn=1µm, Wp=2µm.]

Figure 3.9 FO3 delay dependences on (a) gate height, (b) gate-to-contact distance, and (c) average device width.


D. Enhancing PMOS current driving capability

Conventionally, for static circuit design, the ratio of the PMOS width to the NMOS width is about 2. Intense effort has been made to boost the PMOS driving capability [25]-[27] (Figure 3.10a), which circuit designers can potentially exploit to reduce the PMOS device width and thus the total capacitance. The P/N ratio is defined as the ratio of Idsat of PMOS to that of NMOS. Figure 3.10b shows that a 27% delay reduction is obtained by boosting the PMOS driving capability by 2x (i.e., to the same driving capability as NMOS). Boosting the PMOS driving capability by 1.5x achieves 2/3 of that improvement, i.e., 18%.

[Figure 3.10: (a) Idsat(pmos)/Idsat(nmos) versus technology node (350nm down to 32nm); (b) FO3 delay (ps, logarithmic axis) versus P/N ratio (1/2, 3/4, 1/1) for the 32nm, 22nm, 16nm, and 11nm nodes.]

Figure 3.10 (a) Historical data of the ratios of saturation current (Idsat) of PMOS and NMOS (P/N ratio) [3][5]-[9][28]-[38], and (b) FO3 delay dependence on P/N ratio

52

From the analysis above, lowering the gate height is an effective way to reduce the parasitic capacitances, because both CPCCA and Cof are reduced. Reducing LPCCA does not introduce a large delay penalty, because the increase of CPCCA is offset by the reduction of Cof. When interconnect is taken into consideration, reducing LPCCA can potentially improve the circuit-level speed because of the shorter interconnect. Ccorner must be included for advanced technology nodes, where it is comparable to other capacitance components in narrow devices. Enhancing PMOS current driving capability can effectively reduce the circuit-level delay by reducing the total load capacitance.

3.3 Selective Device Structure Scaling including Parasitic Capacitances and Series Resistances

3.3.1 Background of Contacted Gate Pitch Scaling

Contacted gate pitch (Lpitch) is the main driver for cost and performance. It has scaled along with general lithography from 1 μm through the 45 nm node [3],[5],[6]-[9],[28]-[38]. We introduce the concept of a selectively scaled footprint [21], which is analogous to the aggressive selective scaling of the gate length introduced in the 0.35µm to 0.25µm era. Selective footprint scaling enables us to trade part of the parasitic capacitance (Cpar) against extension series resistance (Rext) and interconnect wiring length. The speed and power efficiency are improved at the circuit level with a reduced wire length and chip area [21]. This section examines how to optimize the selective scaling of device structures in general, in both the horizontal (contact sizes, overlay tolerances) and the vertical (gate heights, contact plug heights) directions. We make the bold, yet plausible, proposal that contact sizes, overlay tolerances and heights of gates and plugs should be aggressively reduced (faster than general lithography) through process innovations. For example, aggressive reduction of contact sizes and overlay tolerance can potentially be achieved by self-assembly patterning techniques augmented by conventional photolithography. Block copolymer [39] (an organic material similar to photoresist) can self-organize into sub-20 nm holes that are self-registered to an existing 40 nm topography (inset of Figure 3.11a) [40][41]. In addition, the reduction of the gate heights and plug heights is enabled by metal gate technology [4][5].

Figure 3.11 (a) Schematics of planar device with related parasitics. Inset is the SEM photo of block copolymer self-assembled contact hole patterns. The holes are self-aligned to the edge of a topography 40 nm deep. Holes are 19 nm ± 1.8 nm with a pitch of 42 nm ± 2.7 nm. Lgc is the offset of the holes from the edge. (b-d) Schematics of the three proposed scaling approaches: (b) Scale contact size (Lcont), (c) Scale contact plug height (Hplug), and (d) Scale gate height (Hgate). Dashed lines denote the structures before scaling. [Schematic labels: Lpoly, Lcont, Lgc, Lsilicide, Hplug, Hgate, tsub, M1 pitch, Coverlap, Cg,intrinsic, Cof, Cgtp, Cptp. Panels: (b) Scenario A: half Lcont; (c) Scenario B: half Hplug; (d) Scenario C: half Hgate.]

3.3.2 Selective Device Structure Scaling

We first focus on the delay merit at the device level. The trade-off between the series resistances and parasitic capacitances determines the device speed. We decompose possible selective scaling scenarios into two categories: (I) reducing the distance from the gate edge to the inner edge of the contact plug (Lgc); (II) reducing the lateral size of the contact hole (Lcont), the contact plug height (Hplug) and the gate height (Hgate). For category (I), the series resistance is reduced because of the shorter source/drain extension region, at the expense of higher parasitic capacitances; for category (II), the parasitic capacitances are effectively reduced, at the expense of higher contact series resistance. Furthermore, both (I) and (II) efficiently reduce the layout pitch, which directly shortens the interconnect length. Figure 3.11(a) shows a schematic of the device structure with the geometric parameters labeled.

(I): Reducing Lgc

We start with a detailed analysis of the total gate capacitance (Cgg) and S/D node capacitance (Csd) by both 3D and 2D simulations. A full 3D simulation [10] accurately captures the 3D fringing capacitance (Figure 3.11) from the gate to contact plugs. The geometric parameters of the nominal case used for simulation are listed in Table 3.3. The parameters are chosen based on 65nm technology. The reason for using a 65nm technology for analysis rather than 45nm technology is explained in Section 3.3.4. Further discussions about more advanced technologies are given in Section 3.3.5 with an extended roadmap. For the total gate capacitance (Cgg) and the S/D node capacitance (Csd), which take into consideration the parasitic components, the 2D slot approximation of the plug matches the 3D results quite well at long Lgc, but overestimates the parasitic capacitances at short Lgc (Figure 3.12). The reason is that for short Lgc (<15nm), the fringing effect is significant enough that the elliptical shape of the E-field cannot be ignored as in the 2D case, as illustrated by the inset of Figure 3.12. Knowing the range in which 3D simulation is necessary helps minimize computational cost while keeping acceptable accuracy. 3D simulations show negligible difference between cylindrical plugs and square pillar plugs for Lgc>15nm, and even for Lgc=5nm, the cylindrical plugs give only 6% less Cgg and 7% less Csd than square pillar plugs.

Figure 3.12 Due to the increased portion of the elliptical shape of the E-field for the shorter Lgc (<20nm), 3-D predicted capacitance values (solid lines) become much lower, compared with 2-D counterparts (dashed lines). [Plot: capacitance (fF/μm) of Cgg, Csd, and Cg,intrinsic versus distance between gate edge and contact edge (0 to 80 nm), with the typical device marked. Dashed lines: 2D; solid lines: 3D (square pillar plugs). Inset: E-field between gate and contact for decreasing Lgc.]

Figure 3.13 shows the components of Cgg and Csd as a function of Lgc. The intrinsic gate capacitance is only ~50% of Cgg, the total gate capacitance. Parasitic capacitances are contributed by the outer-fringe capacitance (Couter-fringe), the gate-to-plug capacitance (Cgate-plug), and the plug-to-plug capacitance (Cplug-plug). Couter-fringe and Cgate-plug are responsible for ~40%. At small Lgc, the rapid increase of capacitance is due to Cgate-plug. This is shown numerically in Figure 3.13 and can also be explained by the analytical model described in [21]. In [21], Cgate-plug is decomposed into normal and fringing parts. The normal part is roughly inversely proportional to Lgc, while the fringing part is inversely proportional to the logarithm of Lgc. Both parts increase when Lgc decreases.
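The 1/Lgc dependence of the normal part can be illustrated with a simple parallel-plate estimate. This is only a sketch and not the analytical model of [21]: it treats the gate sidewall and the plug as two plates of height Hgate separated by Lgc, assumes a spacer dielectric constant of 3.9 (SiO2), and leaves out the fringing part entirely. The function name is hypothetical.

```python
EPS0_F_PER_M = 8.8541878128e-12  # vacuum permittivity in F/m


def normal_cap_ff_per_um(h_gate_nm, l_gc_nm, eps_r=3.9):
    """Parallel-plate estimate of the 'normal' gate-to-plug capacitance per unit width.

    C/W = eps0 * eps_r * H_gate / L_gc. The value eps_r = 3.9 is an assumed
    spacer dielectric constant, not a parameter taken from this work.
    """
    if h_gate_nm <= 0 or l_gc_nm <= 0:
        raise ValueError("dimensions must be positive")
    c_f_per_m = EPS0_F_PER_M * eps_r * h_gate_nm / l_gc_nm  # farad per metre of width
    return c_f_per_m * 1e9  # 1 F/m equals 1e9 fF/um


H_GATE_NM = 70  # gate height of the typical 65nm device (Table 3.3)
for l_gc in (80, 40, 20, 10, 5):
    print(f"Lgc = {l_gc:2d} nm -> C_normal ~ {normal_cap_ff_per_um(H_GATE_NM, l_gc):.3f} fF/um")
```

Halving Lgc doubles this term, which is consistent with the steep rise of Cgate-plug at small Lgc.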

Figure 3.13 (a) Gate capacitance, (b) S/D node capacitance, and their breakdown including Miller effect vs. gate to plug edge distance Lgc. The minimum Lgc is about 0.4Hgate to keep Cgg (Csd) within 8% (15%) of the nominal values for typical devices, respectively. [Plots: capacitance (fF/μm) versus Lgc (5 to 80 nm), with 0.4Hgate marked. Components in (a): Cgate-plug, Couter-fringe, Coverlap, Cg,intrinsic; in (b): Cgate-plug, Cplug-plug, Couter-fringe, Coverlap.]

To first order, the increase in Cgg (Csd) will be less than 8% (15%) of the nominal values for “typical” devices (designed with standard design rules) if Lgc is larger than 0.4×gate height (Hgate). For Lgc<0.4Hgate, there is significant performance loss due to increasing parasitic capacitance.
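As a small worked example of this rule of thumb, the sketch below evaluates the minimum Lgc for the 70 nm gate height of Table 3.3 and for a gate of half that height. The helper names are hypothetical.

```python
def min_lgc_nm(h_gate_nm, factor=0.4):
    """Rule-of-thumb lower bound on Lgc: about 0.4 times the gate height."""
    return factor * h_gate_nm


def lgc_within_rule(l_gc_nm, h_gate_nm):
    """True if Lgc stays at or above the 0.4 * Hgate rule of thumb."""
    return l_gc_nm >= min_lgc_nm(h_gate_nm)


for h_gate in (70, 35):  # typical gate height (Table 3.3) and half of it
    print(f"Hgate = {h_gate} nm -> Lgc should stay above about {min_lgc_nm(h_gate):.0f} nm")
```

A lower gate therefore permits a proportionally smaller Lgc before the capacitance penalty sets in.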

Reducing Lgc reduces the series resistances, mainly due to shortening of the source/drain extension region. With the same supply voltage and channel structures, reducing the series resistances increases the on-current. Ultra-Thin Body Silicon-on-Insulator (UTBSOI) devices and bulk devices are built with Taurus Device [42], following the conventional design parameters (Table 3.3 and Table 3.4). By reducing Lgc gradually without changing the other parameters, the on-current is improved by 10% for UTBSOI and 8% for bulk devices. The on-current improvement saturates for very small Lgc, when the source/drain extension region is too short to contribute significantly to the total series resistance. For a very rough estimation, the resistance of the extension region can be approximated to be proportional to the length of the extension region and inversely proportional to the junction depth. For further studies, Kim et al. [43] have carefully modeled the series resistances.
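The very rough estimate for the extension resistance can be written as a ratio relative to a reference geometry. The sketch below assumes R_ext is proportional to L_ext / X_j and, as a further simplification that is not made in the text, takes the extension length to be equal to Lgc. The function name is hypothetical.

```python
def relative_extension_resistance(l_ext_nm, x_j_nm, l_ext_ref_nm, x_j_ref_nm):
    """R_ext / R_ext_ref under the rough model R_ext proportional to L_ext / X_j."""
    if min(l_ext_nm, x_j_nm, l_ext_ref_nm, x_j_ref_nm) <= 0:
        raise ValueError("all dimensions must be positive")
    return (l_ext_nm / x_j_nm) / (l_ext_ref_nm / x_j_ref_nm)


# Example: Lgc reduced from 80 nm (Table 3.3) to 28 nm (0.4 * Hgate),
# with the 11 nm extension junction depth of the bulk device (Table 3.4).
ratio = relative_extension_resistance(28, 11, 80, 11)
print(f"Extension resistance relative to the typical device: {ratio:.2f}")
```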

(II): Reducing contact size, plug height, and gate height

Selectively scaling the device structure (the contact size, plug height, and gate height) in all three directions can reduce the parasitic capacitance. To illustrate the concept, we studied three scaling scenarios (Figure 3.11 b-d) and their combinations: A) reducing the planar dimensions of S/D contacts by half, B) reducing the S/D contact plug height by half, and C) reducing the gate height by half. The rest of this paragraph gives a first-order qualitative analysis, followed by quantitative results in the next paragraph. Scenario ‘A’ does not reduce either Cgg or Csd, but it potentially reduces the interconnect length by shrinking the circuit area. Scenario ‘A’ may increase the contact resistance sharply at very advanced technology nodes, due to limitations related to the transfer length as discussed later. Scenario ‘B’ reduces Csd by reducing Cplug-plug, but does not reduce Cgg. Reducing Hgate (Scenario ‘C’) is effective in reducing both Cgg and Csd. The effect of any other structure scaling scenario that does not change Lgc can be evaluated as a combination (superposition) of the effects of these three basic scaling scenarios.

3D simulations are performed with Lgc = 10 nm at the 65 nm node, to capture the impact on the capacitances of reducing the contact size, plug height, and gate height to half of the typical values listed in Table 3.3. Figure 3.14 shows the device gate capacitance (Figure 3.14a), S/D node capacitance (Figure 3.14b), and their components with different selective scaling scenarios. The height of each bar indicates the absolute value of the total gate capacitance (Cgg) or the total S/D node capacitance (Csd), while the percentage numbers correspond to the contributions from the different components. The effects of scaling scenarios A, (A+B), (A+C), and (A+B+C) are illustrated in Figure 3.14.

As verified by Figure 3.14, to first order, scenario ‘A’ does not reduce either Cgg or Csd; scenario ‘B’ reduces Csd by reducing Cplug-plug, but does not reduce Cgg; reducing Hgate (Scenario ‘C’) is effective in reducing both Cgg and Csd. As a general rule, the most effective way to reduce the device parasitic capacitance is to reduce the height of the lowest components, e.g. the gate height for a planar bulk device, or the raised S/D and gate height for ultra-thin body SOI (UTBSOI). This is confirmed by the simulation result that (A+C) is more effective than (A+B) in reducing the parasitic capacitances.

Figure 3.14 (a) Gate capacitance, (b) S/D node capacitance, and their breakdown with different selective scaling scenarios. Lgc is 10 nm. Scaling down the gate height is the most effective way of reducing device parasitic capacitance. [Bar charts: capacitance (fF/μm) for the Typical, A, (A+B), (A+C), and (A+B+C) cases, with percentage contributions of each component. Components in (a): Cgate-plug, Couter-fringe, Coverlap, Cg,intrinsic; in (b): Cplug-plug, Cgate-plug, Couter-fringe, Coverlap.]

The impacts of these selective scaling scenarios on the parasitics are summarized in Table 3.2. At the device level, the effectiveness of reducing parasitic capacitances is, in descending order: (B+C)> (A+B+C)> C> (A+C)> B> (A+B)> Not Scaled> A. Since Scenario ‘A’ effectively reduces the chip area, the interconnect length is reduced. Consequently, the reduction of interconnect capacitance on the critical path improves the circuit speed. When interconnect capacitance at the circuit level is also considered, this order becomes: (A+B+C)> (B+C)> (A+C)> C> B> (A+B)> A> Not Scaled, because a smaller device footprint helps reduce the interconnect capacitance. Reducing the contact size has some secondary effects on the capacitance, such as increasing Cgate-plug and Cplug-plug.

Table 3.2 The effects of selective scaling scenarios on parasitics per unit gate width.

             Coverlap   Couter-fringe   Cgate-plug   Cplug-plug   Rc+Rplug   Lpitch
Scenario A   -          ↓               ↑            ↑            ↑          ↓
Scenario B   -          -               ↓            ↓↓           ↓          -
Scenario C   -          ↓               ↓↓↓          ↑            -          -

Figure 3.15 shows the on-current and gate capacitances for selectively scaled structures, normalized to those of the typical devices, for both SOI and bulk devices. A complete device is built using Taurus Device [42], with the geometric parameters as in Table 3.3 and the doping profiles as in Table 3.4. The channel structure remains the same for different Lpitch, so a higher on-current indicates a lower parasitic resistance. Reducing the plug height and the gate height does not change the series resistance to first order. In our simulation for the 65nm technology (Figure 3.15), the contact resistance is not very sensitive to the contact size under the device structure scaling scenarios considered. However, for future technologies, it is quite possible that reducing the contact size can dramatically increase the series resistance. We use the transmission line model [44] to estimate this impact:

$R_{cont} = \sqrt{R_S \rho_c} \, \coth\left( L_{silicide} \sqrt{R_S / \rho_c} \right)$

Here, Rcont is the contact resistance between the diffusion layer and the silicide layer, in units of ohm. RS is the sheet resistance of the underlying heavily doped silicon layer, in units of ohm/□, and ρc is the specific contact resistivity between the metal and the diffusion layer, in units of ohm-cm2. A transfer length $l_t = \sqrt{\rho_c / R_S}$ is defined [44]. The calculated transfer length for our 65nm technology devices is around 60nm and 30nm for NMOS and PMOS, respectively. Most current paths end within a length of lt, which means the contact resistance depends only slightly on the length of the silicide region (Lsilicide) when Lsilicide is greater than lt. However, once Lsilicide is less than lt, a sharp increase of contact resistance is expected if Lsilicide is further reduced. Typically, Lsilicide is proportional to the technology feature size. To overcome this contact resistance issue, Lsilicide has to be relaxed, since Lsilicide is less than lt in the advanced technologies.

Figure 3.15 (a) On-current, and (b) gate capacitance vs. Lpitch for both planar bulk CMOS and UTBSOI in 65 nm node with (I) reducing Lgc only and (I) + (II) selectively scaled device structure. The currents corresponding to (I)+(II) are of the same magnitudes as (I), only with a displacement along the Lpitch axis. [Plots: (a) normalized on-current and (b) normalized gate capacitance versus contacted gate pitch Lpitch (100 to 250 nm) for UTBSOI and planar bulk devices.]

Generally, the device structure should be selectively scaled in the following ways: (1) Reduce the vertical electrode heights, especially the height of the lower electrode, i.e. Hgate. (2) Moderately reduce the overlay tolerance (Lgc). A rule of thumb for Lgc is about 0.4Hgate. (3) Reduce the lateral contact size down to the level of the transfer length. In the following sections, we show examples to illustrate the effectiveness of selective device structure scaling for improving device and circuit level performance.
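The transmission line estimate of the contact resistance discussed above can be sketched numerically. The example below uses the 3e-8 ohm-cm2 contact resistivity listed for the 65 nm node in Table 3.5 and the 60 nm NMOS transfer length quoted in the text. The sheet resistance is not given in this section, so it is back-calculated from those two numbers, which is an assumption. The result is treated as normalized to the device width (ohm-µm), another assumption made so that the units work out. The function names are hypothetical.

```python
import math


def transfer_length_nm(rho_c_ohm_cm2, r_sheet_ohm_sq):
    """Transfer length l_t = sqrt(rho_c / R_S), converted from cm to nm."""
    return math.sqrt(rho_c_ohm_cm2 / r_sheet_ohm_sq) * 1e7


def contact_resistance_ohm_um(l_silicide_nm, rho_c_ohm_cm2, r_sheet_ohm_sq):
    """Width-normalized contact resistance from the transmission line model.

    R = sqrt(R_S * rho_c) * coth(L_silicide / l_t), with coth(x) = 1 / tanh(x).
    """
    x = l_silicide_nm / transfer_length_nm(rho_c_ohm_cm2, r_sheet_ohm_sq)
    r_ohm_cm = math.sqrt(r_sheet_ohm_sq * rho_c_ohm_cm2) / math.tanh(x)
    return r_ohm_cm * 1e4  # ohm*cm -> ohm*um


RHO_C = 3e-8                        # ohm-cm2, Table 3.5 (65 nm node)
L_T_NM = 60.0                       # NMOS transfer length quoted in the text
R_SHEET = RHO_C / (L_T_NM * 1e-7) ** 2  # assumed: back-calculated sheet resistance

for l_sil in (130, 65, 30, 15):
    r = contact_resistance_ohm_um(l_sil, RHO_C, R_SHEET)
    print(f"Lsilicide = {l_sil:3d} nm -> Rcont ~ {r:6.1f} ohm-um")
```

With these inputs the resistance changes little while Lsilicide is well above the transfer length and rises steeply once it drops below it.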

3.3.3 Inverter Delay Improvement

To validate the concept of selective device structure scaling, we use mixed-mode device/circuit simulations (Taurus [42]) to optimize FO4 inverter delay for both planar bulk CMOS and UTBSOI. The nominal devices are built with the key parameters listed in Table 3.3 and Table 3.4. For the group labeled “Reducing Lgc only”, we gradually reduce Lgc with all the other parameters unchanged. For the group labeled “Selectively scaled device structure”, we gradually reduce Lgc, while halving Lcont, Hgate and Hplug.

First, we analyze the behavior of the saturation on-current and the device capacitances as a function of the distance (Lsd) between the gate edge and the S/D contact stud for both planar bulk CMOS and UTBSOI with the scaling scenarios: (I) reducing Lgc only and (I) + (II) 3-D (A+B+C) selectively scaled device structure. We choose 65nm technology rather than the more recent 45nm technology mainly because a reliable compact device model is available to us only up to the 65nm node. Also, 65nm and 45nm technologies are similar in that contact size scaling is not expected to increase the series resistance significantly. Clearly, reducing the distance Lgc results in a trade-off between higher on-current due to reduced series resistance (Figure 3.15a) and higher parasitic capacitance (Figure 3.15b). The maximum on-current improvement of the scaled device over the device with the standard digital circuit design rule (identified as “typical device” in the following) is about 10%. When evaluating dynamic performance (circuit speed), capacitance effects are also significant. 3-D device structure scaling (A+B+C) is able to reduce the gate capacitance by about 10% as compared to that of the typical device (Figure 3.15b). The general trends for bulk CMOS and UTBSOI are similar.

Table 3.3 The geometric parameters used for the typical case of 65nm technology devices.

Lpoly   Lgc    Lcont   Lsilicide   Hgate   Hplug   tox     kox   tsub
35nm    80nm   65nm    130nm       70nm    210nm   1.2nm   3.9   100nm

Table 3.4 Key parameters for device doping profiles.

UTBSOI
body thickness                   15nm
p-channel doping                 5.4x10^18 cm^-3
n-channel doping                 5.25x10^18 cm^-3
poly doping                      5x10^19 cm^-3
S/D peak doping                  5x10^19 cm^-3

bulk
S/D junction depth               35nm
S/D peak doping                  1x10^20 cm^-3
S/D extension junction depth     11nm
S/D extension peak doping        2x10^20 cm^-3
p-channel peak doping            6x10^18 cm^-3
n-channel peak doping            1.8x10^18 cm^-3
poly doping                      1x10^20 cm^-3

A four-stage inverter chain is used to examine the dynamic performance for both planar bulk CMOS and UTBSOI. Circuit simulations are performed using device/circuit mixed-mode numerical simulation in Taurus Device [42]. The device width ratio between pFET and nFET is set to 1.5 to balance the pull-up and the pull-down delay. The inverter delay (τ) is evaluated between the 50% points of the input and output transitions. A trade-off between the on-current and the capacitances exists when sizing the S/D region. Up to 5% higher speed can be obtained simply by reducing the gate-to-silicide distance Lgs (Lgc=Lgs+Δ, where Δ is the distance from the silicide edge to the plug edge) (Figure 3.16). With 3-D selective structure scaling (A+B+C), the inverter is about 15% faster for both the planar bulk device and UTBSOI, compared with devices with standard layout rules (Lsd = 12 λ). At the design point with minimum FO4 delay, the on-current improvement over the typical device is about 7%, and the total device length is about Lsd0 = 6.6 λ, which corresponds to a 45% smaller layout area (for an isolated device) than the typical device.
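The 50%-to-50% delay measurement mentioned above can be sketched as follows. This is a generic post-processing routine for sampled waveforms and not the procedure built into the simulator. The function names and the synthetic waveforms are hypothetical and serve only to show the interpolation.

```python
def crossing_time(times, values, level):
    """Time of the first crossing of `level`, by linear interpolation between samples."""
    for i in range(len(times) - 1):
        v0, v1 = values[i], values[i + 1]
        if v0 == v1:
            continue  # flat segment: no crossing can be located here
        if (v0 - level) * (v1 - level) <= 0:
            frac = (level - v0) / (v1 - v0)
            return times[i] + frac * (times[i + 1] - times[i])
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, vdd):
    """Delay between the 50% point of the input and the 50% point of the output."""
    half = 0.5 * vdd
    return crossing_time(times, v_out, half) - crossing_time(times, v_in, half)


# Synthetic example (hypothetical): input rises from 0 to 1 V over 0..10 ps,
# output falls from 1 V to 0 over 5..15 ps.
t_ps = list(range(21))
v_in = [min(1.0, t / 10) for t in t_ps]
v_out = [1.0 - min(1.0, max(0.0, (t - 5) / 10)) for t in t_ps]
print(f"50%-50% delay = {propagation_delay(t_ps, v_in, v_out, vdd=1.0):.2f} ps")
```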

Figure 3.16 FO4 delay vs. Lpitch for both planar bulk CMOS and UTBSOI with (I) reducing Lgc only and (I) + (II) selectively scaled device structures. [Plot: normalized FO4 delay versus contacted gate pitch Lpitch (100 to 250 nm) for UTBSOI and planar bulk devices. Curves: conventional-layout devices, (I) reducing Lgc only, and (I) + (II) selectively scaled device structure.]

The reduced junction capacitance for the bulk device with selective structure scaling has a trivial impact (<3%) on speed because Cplug-plug, Cgate-plug, Couter-fringe and Coverlap are the largest components of Csd. This also explains the similarity of the trends for bulk CMOS and UTBSOI. Mobility degradation due to the reduced stress of the smaller active area (~50MPa less stress when Lgc is reduced by 65 nm) is smaller than 4.5% [45][46], which corresponds to less than 3% on-current degradation. As a result, even though the stress-dependent mobility is ignored in our analysis, the general trend and the conclusions of this work remain valid. For more advanced technologies, which depend heavily on strain-induced mobility enhancement, the reduction of drive current with reduced stress must be further studied.

3.3.4 Circuit-Level Improvement

The performance improvement at the circuit macro level is larger than the inverter delay improvement discussed above because a smaller device footprint results in a smaller layout area and reduces the interconnect capacitance. We verify this for UTBSOI with a full custom 53-bit multiplier using both the conventional layout and the optimized footprint (Lsd=7λ, the minimum point in Figure 3.16), in a similar way as described in [14]. Compared with the multiplier made with typical devices (Lsd=12λ), the multiplier built with the selectively scaled devices (Lsd=7λ) occupies 30% less layout area, operates at 25% higher speed (~10% comes from the shorter interconnects), and consumes 10% less dynamic power due to the smaller interconnect capacitance of the smaller circuit layout area. The principles of selective structure scaling for bulk devices are the same as for UTBSOI, and the amounts of improvement at the device level and the simple circuit level are very similar for UTBSOI and bulk devices. Since the major difference for a circuit macro-level analysis is the reduction of wiring capacitances due to the tighter pitches, we expect the bulk devices to show a similar improvement at the system level with the Lsd=7λ footprint.

Figure 3.17 (a) Rseries, and (b) its components as a function of technology node. Rc increases quickly with half-size contact scaling. A relaxed contact silicide length (Lsilicide) should be used for future technology nodes in order not to degrade the on-current. [Plots: (a) Rseries (ohm-um) versus technology node (90 nm down to 11 nm) for typical, (I)+(II), and (I)+(II) with relaxed contact size, with the region limited by Rc, where a relaxed silicide length (Lsilicide) is required, marked; (b) components of Rseries (ohm-um), 2Rc and 2Rext, versus technology node, for half-size contact and silicide length and for relaxed silicide length. Solid symbols: typical; empty symbols: (I)+(II).]

3.3.5 Extending the Technology Roadmap

Figure 3.18 shows a scaling scenario in which aggressive Lpitch scaling compensates for Lgate scaling that is slower than 0.7x per node. With selective device structure scaling in both the horizontal (reducing contact size and Lgc) and vertical (reducing gate and plug heights) directions, the technology roadmap can be extended to the 11 nm node with a physical gate length no shorter than 10 nm. The parameters used for the projection are listed in Table 3.5. Lpitch scaling is bounded by parasitic capacitance and contact resistance. Scaling the length of the S/D silicide (Lsilicide) below the current transfer length causes a rapid increase of the contact resistance Rc beyond the 45nm technology node (Figure 3.17). In order not to degrade the on-current for aggressively scaled devices, a relaxed silicide length (Lsilicide) in the source/drain contact regions has to be used once Lsilicide becomes comparable to or less than the transfer length. As illustrated in Figure 3.17(b), the contact resistance with the silicide length reduced by half becomes much larger than that of the typical device beyond the 45nm technology node.

For the extended technology roadmap, we relax Lsilicide such that the total series resistance after selective scaling is no larger than that for the conventional layout, as indicated by the solid line in Figure 3.17(a). The main limiter of the transfer length is the specific contact resistivity, which is difficult to reduce with present techniques. We therefore assume that the transfer length does not scale significantly along with the technology [47][48]. Potentially, the specific contact resistivity can be reduced by further increasing the doping concentration in the diffusion layer or lowering the barrier heights by choosing a different metal or silicide [47]-[49]. Metal source/drain with Fermi-level depinning in the Schottky junctions is a possible candidate to reduce source/drain resistance [50]-[53]. The Lpitch scaling is bounded by the relaxed Lsilicide in more advanced technologies. Figure 3.18 marks the Lpitch boundary within which the device on-current is greater than or equal to the values for “typical” devices with zero or trivial parasitic capacitance penalty. The extended scaling path requires tight pitch patterning, tight overlay tolerances, small contact holes and processes with a short gate height. These are all potential yield limiters. On the other hand, the benefits of selective device structure scaling are significant. Novel nanofabrication techniques are needed to realize the substantial benefits offered by Lpitch scaling and parasitics engineering. Potential candidates for such fabrication techniques include diblock copolymer for small contact sizes and overlay tolerance, and a metal gate process for low gate height.
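The idea of relaxing Lsilicide until a resistance budget is met can be sketched by inverting the transmission line model, assuming the same coth form used earlier for the contact resistance. The budget, the sheet resistance, and the function name below are hypothetical placeholders, and the result is treated as normalized to the device width. The actual projection in this work uses the total series resistance, not only the contact term.

```python
import math


def min_silicide_length_nm(r_budget_ohm_um, rho_c_ohm_cm2, r_sheet_ohm_sq):
    """Shortest silicide length whose contact resistance stays within the budget.

    Inverts R = sqrt(R_S * rho_c) * coth(L / l_t) for L, using
    arccoth(y) = 0.5 * ln((y + 1) / (y - 1)) for y > 1.
    """
    r_long_ohm_um = math.sqrt(r_sheet_ohm_sq * rho_c_ohm_cm2) * 1e4  # long-contact limit
    if r_budget_ohm_um <= r_long_ohm_um:
        raise ValueError("budget is below the long-contact limit and cannot be met")
    y = r_budget_ohm_um / r_long_ohm_um
    l_t_nm = math.sqrt(rho_c_ohm_cm2 / r_sheet_ohm_sq) * 1e7
    return l_t_nm * 0.5 * math.log((y + 1) / (y - 1))


RHO_C = 1e-8      # ohm-cm2, value listed in Table 3.5 for the 22 nm node and beyond
R_SHEET = 500.0   # ohm/sq, hypothetical sheet resistance
BUDGET = 30.0     # ohm-um, hypothetical contact resistance budget

length = min_silicide_length_nm(BUDGET, RHO_C, R_SHEET)
print(f"Lsilicide must be at least about {length:.1f} nm to stay within {BUDGET} ohm-um")
```

A tighter budget pushes the required length up quickly, and a budget below sqrt(RS ρc) cannot be met by lengthening the silicide at all, which is why a lower specific contact resistivity is needed for further scaling.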

Table 3.5 Summary of parameters used for ITRS technology projection [2005][16]. The RC delay is expected to scale according to the ITRS roadmap. The total resistance R is the sum of the channel resistance (Rchan), the extension resistance (Rext) and the contact resistance (Rc). Rchan is assumed to be about 70% of the total resistance in a well-designed device under the typical layout rules. The extension resistance, contact resistance and capacitance are computed based on the parameters listed here.

Parameter                                 Unit      Values
Technology node                           nm        90     65      45      32     22     16     11
Metal 1 half pitch (contacted)            nm        90     65      45      32     22     16     11
Supply voltage (Vdd)                      V         1.2    1       1       0.9    0.8    0.7    0.6
Contact junction depth (Xj)               nm        55     38.5    27.5    19.8   15.4   12.1   11
Silicide thickness (tsilicide)            nm        27.5   19.25   13.75   9.9    7.7    6.05   5.5
Physical gate length (Lgate)              nm        50     35      25      18     14     11     10
Drain extension junction depth (Xjext)    nm        17.5   12.25   8.75    6.3    4.9    3.85   3.5
Contact maximum resistivity (ρc)          ohm-cm2   3e-8   3e-8    3e-8    3e-8   1e-8   1e-8   1e-8
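The slowdown of gate length scaling in Table 3.5 can be read off by computing the per-node scaling ratios. The sketch below does this for the tabulated values and also checks three proportionalities that the tabulated numbers appear to follow (Xj = 1.1 Lgate, tsilicide = Xj/2, Xjext = 0.35 Lgate). These relations are observations from the numbers and are not stated in the text; the variable and function names are hypothetical.

```python
NODES_NM = [90, 65, 45, 32, 22, 16, 11]
L_GATE_NM = [50, 35, 25, 18, 14, 11, 10]
X_J_NM = [55, 38.5, 27.5, 19.8, 15.4, 12.1, 11]
T_SILICIDE_NM = [27.5, 19.25, 13.75, 9.9, 7.7, 6.05, 5.5]
X_J_EXT_NM = [17.5, 12.25, 8.75, 6.3, 4.9, 3.85, 3.5]


def per_node_ratios(values):
    """Ratio of each entry to the previous one (scaling factor per node)."""
    return [b / a for a, b in zip(values, values[1:])]


gate_ratios = per_node_ratios(L_GATE_NM)
node_ratios = per_node_ratios(NODES_NM)

for node, g, n in zip(NODES_NM[1:], gate_ratios, node_ratios):
    print(f"{node:2d} nm node: Lgate x{g:.2f}, node x{n:.2f}")

# Observed (not stated in the text) relations among the tabulated parameters.
TOL = 1e-9
xj_ok = all(abs(xj - 1.1 * lg) < TOL for xj, lg in zip(X_J_NM, L_GATE_NM))
tsil_ok = all(abs(ts - 0.5 * xj) < TOL for ts, xj in zip(T_SILICIDE_NM, X_J_NM))
xjext_ok = all(abs(xe - 0.35 * lg) < TOL for xe, lg in zip(X_J_EXT_NM, L_GATE_NM))
print("Xj = 1.1*Lgate:", xj_ok, "| tsilicide = Xj/2:", tsil_ok, "| Xjext = 0.35*Lgate:", xjext_ok)
```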

Figure 3.18 Contacted gate pitch (Lpitch) and physical gate length (Lgate) vs. technology node down to the 11 nm node. The minimum Lpitch (lines with ‘×’ symbol) is bounded by either parasitic capacitance (Cpar) or contact resistance (Rc) or both. By selective device structure scaling, the historic performance trend can continue for another 2 to 3 generations even without gate length scaling. The suggested technology scaling path is denoted by the dashed curve. [Plot: length (nm, logarithmic scale) versus technology node (350 nm down to 11 nm), showing Lgate and Lpitch data from IEDM and VLSI, the ITRS Lgate, the proposed Lgate and Lpitch scaling roadmaps, a slope = 1 reference, and the Lpitch scaling scenarios (I) smaller Lgc, Hgate=2Lgate; (I)+(II, A+B), Hgate=2Lgate; (I)+(II, A+B+C), Hgate=Lgate. Regions marked: Rc & Cpar limited, Cpar limited.]

3.4 Conclusions

For technology beyond 45nm, the intrinsic delay CgcV/I, which relies mainly on Lgate and does not capture the relevant design parameters necessary for MOSFET optimization, is inadequate for device performance evaluation. Parasitic capacitance has become a severe problem that cannot be ignored in technology evaluation. We develop simple analytical models for the parasitic capacitance components according to their physical origins. Starting from the ITRS 2009 edition, we use the MASTAR model to study the device delay including the parasitic capacitances. By selectively engineering the P/N ratio and the device dimensions, such as gate height, gate-to-contact distance, device width, and gate length, the device performance can be boosted by reducing parasitics and preserving electrostatic integrity. By examining the dependence of parasitic capacitances and series resistances on the device geometric dimensions, we propose a new device scaling scenario for sub-45 nm high-performance CMOS technology. We postulate that even with the gate length remaining essentially the same, selectively scaling the device structure will provide significant circuit-level performance improvement from technology generation to technology generation. The benefit comes from optimizing the trade-off between series resistances and parasitic capacitances and from the reduction of the interconnect capacitances. By shrinking the lateral distance between the gate edge and the source/drain contact edge, the parasitic capacitance increases but both the series resistance and the interconnect capacitance decrease. By vertically lowering the gate heights and plug heights and reducing the contact sizes, the parasitic capacitance and interconnect capacitance are reduced at the cost of higher series resistance. For small benchmark circuits, such as inverter chains, and a fully custom designed 53-bit multiplier, the selectively scaled device with reduced footprint achieves smaller layout area, higher speed and better energy efficiency. The results are verified by 2D and 3D electrostatic simulation [17][22] (for capacitances), Taurus Device [42] and analytical calculation [43] (for series resistances), Taurus mixed-mode simulation [42] (for inverter chains) and complex circuit simulation as in [14] (for the 53-bit multiplier). This work provides a strong incentive to develop innovative technologies, such as small contacts, tight tolerances, low-k spacers [54], low gate and plug height integration schemes [23], and low specific contact resistivity source/drain technology [48][49], such as metal S/D with unpinned Fermi level [50][52][53].

3.5 Bibliography

[1] R. Dennard et al., “Design of Ion-Implanted MOSFET’s with Very Small Physical Dimensions,” IEEE Journal of Solid-State Circuits, vol. SC-9, no. 5, pp. 256-268, Oct 1974.

[2] S. E. Thompson, et al., "A logic nanotechnology featuring strained-silicon," Electron Device Letters, IEEE, vol. 25, pp. 191-193, 2004.

[3] V. Chan, et al., “High speed 45nm gate length CMOSFETs integrated into a 90nm bulk technology incorporating strain engineering,” Electron Devices Meeting, 2003. IEDM 2003. IEEE International, 2003, pp. 77-80, 2003.

[4] R. Chau, et al., "High-κ/metal-gate stack and its MOSFET characteristics," Electron Device Letters, IEEE, vol. 25, pp. 408-410, 2004.

[5] K. Mistry et al., “A 45nm Logic Technology with High-k+Metal Gate Transistors, Strained Silicon, 9 Cu Interconnect Layers, 193nm Dry Patterning, and 100% Pb- free Packaging,” IEEE International Electron Devices Meeting (IEDM), pp. 247- 250, Washington DC, December 10-12, 2007.

[6] S. Natarajan, et al., "A 32nm logic technology featuring 2nd-generation high-k + metal-gate transistors, enhanced channel strain and 0.171 µm2 SRAM cell size in a 291Mb array," Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, pp. 941-943.

[7] F. Arnaud, et al., "32nm general purpose bulk CMOS technology for high performance applications at low voltage," Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, pp. 633-636.

[8] X. Chen, et al., "A cost effective 32nm high-K/ metal gate CMOS technology for low power applications with single-metal/gate-first process," VLSI Technology, 2008 Symposium on, 2008, pp. 88-89.

[9] C. H. Diaz, et al., "32nm gate-first high-k/metal-gate technology for high performance low power applications," Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, pp. 629-632.

[10] S. E. Thompson and S. Parthasarathy, “Moore's law: the future of Si microelectronics,” Materials Today, vol. 9, pp. 20-25, 2006.

[11] J. W. Sleight, I. Lauer, O. Dokumaci, D. M. Fried, D. Guo, B. Haran, S. Narasimha, C. Sheraw, D. Singh, M. Steigerwalt, X. Wang, P. Oldiges, D. Sadana, C.Y. Sung, W. Haensch, M. Khare, “Challenges and Opportunities for High Performance 32 nm CMOS Technology, ” IEEE International Electron Devices Meeting (IEDM), pp. 697-700, San Francisco, CA, December 11-13, 2006.

[12] J. Mueller, R. Thoma, E. Demircan, C. Bermicot, and A. Juge, "Modeling of Mosfet Parasitic Capacitances, and their Impact on Circuit Performance," Solid State Electronics, vol. 51, pp. 1485-1493, Nov-Dec 2007.

[13] MASTAR model, http://public.itrs.net/models.html.

[14] J. Deng, K. Kim, C.-T. Chuang, H. -S. P. Wong, “The Impact of Device Footprint Scaling on High-Performance CMOS Logic Technology,” IEEE Transactions on Electron Devices, pp. 1148-1155, vol. 54, No.5, May 2007.

[15] R. Shrivastava and K. Fitzpatrick, "A simple model for the overlap capacitance of a VLSI MOS device," Electron Devices, IEEE Transactions on, vol. 29, pp. 1870-1875, 1982.

[16] ITRS Roadmap, http://itrs.net/reports.html.

[17] Maxwell 2D®, Ansoft Corporation, PA.

[18] A. Bansal, B. C. Paul, and K. Roy, “Modeling and optimization of fringe capacitance of nanoscale DGMOS devices,” IEEE Transactions on Electron Devices, vol. 52, No. 2, pp. 256–262, Feb. 2005.

[19] W. Wu and M. Chan, “Analysis of Geometry-Dependent Parasitics in Multifin Double-Gate ,” IEEE Transactions on Electron Devices, vol. 52, No. 4, pp. 692- 698, Apr. 2007.

[20] R. Plonsey and R. Collin, Principles and Applications of Electromagnetic Fields, p146-163. New York: McGraw-Hill, 1961.

[21] J. Deng, H.-S. P. Wong, “Modeling and analysis of planar gate electrostatic capacitance for 1-D FET with multiple cylindrical conducting channels,” IEEE Trans. on Electron Devices, vol. 54, No. 9, pp.2377-2385, 2007.

[22] Maxwell 3D®, Ansoft Corporation, PA.

[23] Z. Ren, K. T. Schonenberg, V. Ontalus, I. Lauer, and S. A. Butt, “CMOS Gate Height Scaling,” The 9th International Conference on Solid-State and Integrated-Circuit Technology (ICSICT2008), paper A1.6, Beijing, China, October 20-23, 2008.

[24] L. Wei, et al., "Selective device structure scaling and parasitics engineering: a way to extend the technology roadmap," IEEE Transactions on Electron Devices, vol. 56, pp. 312-320, February 2009.

[25] B. F. Yang, et al., "Stress dependence and poly-pitch scaling characteristics of (110) PMOS drive current," in VLSI Technology, 2007 IEEE Symposium on, 2007, pp. 126-127.

[26] K.-M. Tan, et al., "A High-Stress Liner Comprising Diamond-Like Carbon (DLC) for Strained p-Channel MOSFET," Electron Device Letters, IEEE, vol. 29, pp. 192-194, 2008.

[27] H. C. H. Wang, et al., "High-Performance PMOS Devices on (110)/<111'> Substrate/Channel with Multiple Stressors," in Electron Devices Meeting, 2006. IEDM '06. International, 2006, pp. 1-4.

[28] R. A. Chapman, J. W. Kuehne, P. S-H. Ying, W. F. Richardson, A. R. Peterson, A. P. Lane, I. -C. Chen, L. Velo, C. H. Blanton, M. M. Moslehi, and J. L. Paterson, “High performance sub-half micron CMOS using rapid thermal processing,” Electron Devices Meeting, 1991. IEDM 1991. IEEE International, 1991, pp. 101-104.

[29] M. Rodder, Q. Z. Hong, M. Nandakumar, S. Aur, J. C. Hu, and I. C. Chen, “A sub-0.18μm gate length CMOS technology for high performance (1.5V) and low power (1.0V),” Electron Devices Meeting, 1996. IEDM 1996. IEEE International, 1996, pp. 563-566, 1996.

[30] P. Gilbert, et al., “A high performance l.5V, 0.10μm gate length CMOS technology with scaled copper metalization,” Electron Devices Meeting, 1998. IEDM 1998. IEEE International, 1998, pp. 1013-1016, 1998.

[31] K. K. Young, et al., “A 0.13 μm CMOS technology with 193 nm lithography and Cu/low-k for high performance applications,” Electron Devices Meeting, 2000. IEDM 2000. IEEE International, 2000, pp. 563-566, 2000.

[32] S. Tyagi, et al., “A 130 nm generation logic technology featuring 70 nm transistors, dual Vt transistors and 6 layers of Cu interconnects,” Electron Devices Meeting, 2000. IEDM 2000. IEEE International, 2000, pp. 567-570.

[33] W.-H. Lee, et al., “High performance 65 nm SOI technology with enhanced transistor strain and advanced-low-K BEOL,” Electron Devices Meeting, 2005. IEDM 2005. IEEE International, 2005, pp.56-59, 2005.

[34] S. Tyagi, et al., “An advanced low power, high performance, strained channel 65nm technology,” Electron Devices Meeting, 2005. IEDM 2005. IEEE International, 2005, pp. 1070-1072, 2005.

[35] P. Bai, et al., “A 65nm logic technology featuring 35nm gate lengths, enhanced channel strain, 8 Cu interconnect layers, low-k ILD and 0.57 μm2 SRAM cell,” Electron Devices Meeting, 2004. IEDM 2004. IEEE International, 2004, pp. 657-660, 2004.

[36] C. Auth, et al., "45nm High-k + metal gate strain-enhanced transistors," in VLSI Technology, 2008 Symposium on, 2008, pp. 128-129.

[37] K. Henson, et al., "Gate length scaling and high drive currents enabled for high performance SOI technology using high-κ/metal gate," in Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, pp. 645-648.

[38] C. H. Jan, et al., "A 45nm low power system-on-chip technology with dual gate (logic and I/O) high-k/metal gate strained silicon transistors," in Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, pp. 637-640.

[39] C. T. Black, K. W. Guarini, R. Ruiz, E. M. Sikorski, I. V. Babich, R. L. Sandstrom, Y. Zhang, “Polymer Self Assembly in Semiconductor Microelectronics,” IEEE International Electron Devices Meeting (IEDM), pp. 439-442, San Francisco, CA, December 11-13, 2006.

[40] L.-W. Chang, H.-S. P. Wong, “Diblock Copolymer Directed Self-Assembly for CMOS Device Fabrication,” 31st SPIE International Symposium on Microlithography, in Design and Process Integration for Microelectronic Manufacturing IV., edited by A. K. K Wong, V. K. Singh, Proceedings of the SPIE, Vol. 6150, pp. 329-334, 2006.

[41] J. Bang, S. H. Kim, E. Drockenmuller, M. J. Misner, T. P. Russell, C. J. Hawker, “Defect-Free Nanoporous Thin Films from ABC Triblock Copolymers,” J. Am. Chem. Soc., vol. 128, pp. 7622-7629, 2006.

[42] Taurus-Device®, Version 2005.10, Synopsys Corp., CA.

[43] S.-D. Kim, C.-M. Park, J. C. S. Woo, “Advanced Model and Analysis of Series Resistance for CMOS Scaling Into Nanometer Regime – Part I: Theoretical Derivation,” IEEE Transactions on Electron Devices, pp. 457-466, vol. 49, No. 3, March 2002.

[44] H. H. Berger, “Contact Resistance and Contact Resistivity,” Electro-chem. Soc. J., vol. 119, no.4, pp 507-514, Apr. 1972.

[45] S. Thompson, G. Sun, K. Wu, J. Lim, T. Nishida, “Key Differences for Process-induced Uniaxial vs. Substrate-induced Biaxial Stressed Si and Ge Channel MOSFETs,” IEEE International Electron Devices Meeting (IEDM), pp. 221-224, San Francisco, CA, December 13-15, 2004.

[46] N. Shah, M.S. Thesis, U. Florida, 2005.

[47] R. Shenoy, K. Saraswat, “Optimization of Extrinsic Source/Drain Resistance in Ultrathin Body Double-Gate FETs,” IEEE Transactions on Nanotechnology, pp. 265-270, vol. 2, No. 4, Dec 2003.

[48] A. Yagishita, T-J King, J Bokor, “Schottky Barrier Height Reduction and Drive Current Improvement in Metal Source/Drain MOSFET with Strained-Si Channel,” Japanese Journal of Applied Physics, pp. 1713-1716, vol. 43, No. 4B, 2004.

[49] A. Kinoshita, C. Tanaka, K. Uchida, J. Koga, “High-performance 50-nm-gate-length Schottky-source/drain MOSFETs with Dopant-segregation Junctions,” 2005 Symposium on VLSI Technology, pp. 158-159, Jun 14-16, 2005.

[50] D. Connelly, C. Faulkner, D. Grupp, J. Harris, “A New Route to Zero-Barrier Metal Source/Drain MOSFETs,” IEEE Transactions on Nanotechnology, pp. 98-104, vol. 3, No. 1, March 2004.

[51] T. Takahashi, T. Nishimura, L. Chen, S. Sakata, K. Kita, A. Toriumi, “Proof of Ge-interfacing Concepts for Metal/High-k/Ge CMOS,” IEEE International Electron Devices Meeting (IEDM), pp. 697-700, Washington DC, December 10-12, 2007.

[52] M. Kobayashi, A. Kinoshita, K. Saraswat, H.-S. P. Wong, and Y. Nishi, “Fermi-Level Depinning in Metal/Ge Schottky Junction and Its Application to Metal Source/Drain Ge NMOSFET,” Symp. VLSI Technology, Honolulu, Hawaii, June 17 – 20, 2008.

[53] J. Hu, D. Choi, J.S. Harris, K. Saraswat, H.-S. P. Wong, “Fermi-Level Depinning of GaAs for Ohmic Contacts,” Device Research Conference, Santa Barbara, CA, June 23 – 25, 2008.

[54] J. Park and C. Hu, “Air Spacer MOSFET Technology for 20nm Node and Beyond,” The 9th International Conference on Solid-State and Integrated-Circuit Technology (ICSICT2008), paper A1.10, Beijing, China, October 20-23, 2008.


Chapter 4 Carbon Nanotube FETs: Compact Models and Chip-level Optimization

4.1 Introduction

Carbon nanotube field effect transistors (CNFETs) are considered potential candidates to replace or complement Si CMOS technology beyond the 16nm technology node, because of the good electrical transport properties of carbon nanotubes (CNTs) and the feasibility of using them to build field effect transistors (FETs) whose geometry provides excellent electrostatic control [1][2]. Device-level analyses have suggested significant benefits [3][4], but it is not clear what the circuit- and system-level benefits are at the chip level. We take an application-oriented approach, as opposed to the device-level modeling employed in previous works. To predict, design, and optimize device performance, various groups have developed a hierarchy of CNFET models, which describe CNFET characteristics at different levels. The NEGF approach [5] is physically rigorous but numerically intensive. Others [6][7] are circuit-simulator compatible but require iterations to calculate the surface potential and quantum capacitance. The compact models in [8][9] attempt fast analytical solutions. However, these models all assume that the channel carriers redistribute so as to fill the lower energy levels first. This assumption is not valid when the channel length is comparable to or smaller than the carrier scattering mean free path and inelastic scattering is negligible. Furthermore, second-order effects such as drain-induced barrier lowering (DIBL) and carbon nanotube inter-channel screening have not been included. In this chapter, we develop a model that analytically includes source/drain coupling and accounts for the sources of the channel carriers when calculating the quantum capacitances. Source exhaustion, an important phenomenon for ballistic transport, is also discussed for short channel CNFETs. The model has been implemented in a system-level performance optimizer which enables chip-level design optimization and benchmarking of CNFETs [10]. In Section 4.2, the electrostatic capacitance model is described, which includes the coupling from the source and drain. In Section 4.3, a non-iterative surface potential model is presented, following a detailed analysis of the quantum capacitance components. In Section 4.4, the transport model is described, along with a discussion of source exhaustion, a special phenomenon due to ballistic transport. Circuit simulation results are shown. In Section 4.5, the model is integrated into a system-level design optimization program which evaluates optimal device parameters based on system-level design objectives. The CNFET is projected to achieve a 5x chip-level speed-up over PDSOI at the 11nm technology node for a high-performance four-core processor with 1.5M logic gates and 5MB SRAM cells per core.

4.2 Electrostatic Capacitances

CNFET designs usually consist of multiple CNTs, connected in parallel, each one forming a separate gated conducting channel from source to drain. The planar structure (Figure 4.1) is chosen for better compatibility with the current CMOS technology. Also, a higher density of channels is possible with this structure as compared to the wrap-around gate structure [11]. A high density of channels is required to deliver the necessary current for high performance applications [12][13]. Additionally, the planar structure does not rely on fabrication techniques that are not practiced in today’s technology. For electrostatic purposes, we abstract the CNT channels to be ideal conducting cylinders with radius r and channel pitch s (if there is more than one channel). The gate and the contact plugs are also ideal conductors with the same width and height. The channel length is defined as Lgate and the length of the doped source/drain extension region, between the gate and source/drain contact plug, is defined as Lsd. The total number of channels under the gate is N. According to the number of CNTs under the same gate electrode, we model the electrostatic capacitances of CNFETs in two different groups: (1) CNFETs with only one single channel under the gate (“S” channel), and (2) CNFETs with multiple channels under the gate, in which the channels in the middle of the array are referred to as “M” channels, and the channels at the edge of the array are referred to as “E” channels. “S”, “E”, and “M” channels have 0, 1, and 2 adjacent neighbors, respectively. The more conductive neighbors a channel has, the more screening it experiences. In Equations (4.1)-(4.6), “S”, “E”, and “M” channels are denoted with superscripts “S”, “E”, and “M”, respectively.

Figure 4.1 Schematic structure of MOSFET-like planar CNFET. (a) Bird’s eye view showing multiple conducting 1D channels. (b) Cross section along the gate width.
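The channel grouping described above can be sketched in a few lines of Python. This is only an illustration of the “S”/“E”/“M” labeling by neighbor count; the function name `classify_channels` is hypothetical and not part of the original model code.

```python
def classify_channels(n_channels):
    """Label each CNT under one gate by its number of adjacent neighbors.

    'S': single channel (0 neighbors), 'E': edge channel (1 neighbor),
    'M': middle channel (2 neighbors). Hypothetical helper for illustration.
    """
    if n_channels < 1:
        raise ValueError("a CNFET needs at least one channel")
    if n_channels == 1:
        return ["S"]
    # Two edge channels enclose (n_channels - 2) middle channels.
    return ["E"] + ["M"] * (n_channels - 2) + ["E"]


if __name__ == "__main__":
    for n in (1, 2, 5):
        print(n, classify_channels(n))
```

With such a labeling, the per-channel capacitances of Equations (4.1)-(4.6) can be selected by superscript and summed over the N channels under the gate.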

We assume that the electrostatic capacitances are independent of the channel charge density. In the case of Lgate>>r, the gate-to-channel capacitance (Cgc) can be modeled following Equation (4.1) [14]. Cgc_sr accounts for the screening effect of the adjacent channels.

A more general model for Cgc can be found in [15]. In Equation (4.1), h=tox+r, and kox and ksub are the permittivities of the gate dielectric and the substrate, respectively.

$$C_{gc\_sr} = \frac{4\pi\varepsilon_0 k_{ox} L_g}{\ln\!\left[\dfrac{s^2 + 2(h-r)\left(h+\sqrt{h^2-r^2}\right)}{s^2 + 2(h-r)\left(h-\sqrt{h^2-r^2}\right)}\right] + \dfrac{k_{ox}-k_{sub}}{k_{ox}+k_{sub}}\,\ln\!\left[\dfrac{(h+r)^2+s^2}{9r^2+s^2}\right]\tanh\!\left(\dfrac{h+r}{s-r}\right)}$$

$$C_{gc}^{S} = \frac{2\pi\varepsilon_0 k_{ox} L_g}{\cosh^{-1}\!\left(\dfrac{t_{ox}+r}{r}\right) + \dfrac{k_{ox}-k_{sub}}{k_{ox}+k_{sub}}\,\ln\!\left(\dfrac{h+r}{3r}\right)}$$

$$C_{gc}^{E} = \frac{C_{gc}^{S}\,C_{gc\_sr}}{C_{gc}^{S} + C_{gc\_sr}}$$

$$C_{gc}^{M} = 2C_{gc}^{E} - C_{gc}^{S} \tag{4.1}$$
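As a numerical illustration of Equation (4.1), the following Python sketch evaluates the gate-to-channel capacitance for “S”, “E”, and “M” channels. It assumes the form of (4.1) in which the screening capacitance Cgc_sr acts in series with the single-channel value and each neighbor removes the same amount of capacitance. The function names and the example geometry (r, tox, s, Lgate, kox, ksub) are assumptions chosen for illustration and are not taken from the original work.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def cgc_single(l_gate, r, t_ox, k_ox, k_sub):
    """Gate-to-channel capacitance of an isolated ('S') channel, in F."""
    h = t_ox + r
    lam = (k_ox - k_sub) / (k_ox + k_sub)
    denom = math.acosh((t_ox + r) / r) + lam * math.log((h + r) / (3.0 * r))
    return 2.0 * math.pi * EPS0 * k_ox * l_gate / denom


def cgc_screening(l_gate, r, t_ox, s, k_ox, k_sub):
    """Equivalent screening capacitance Cgc_sr for channel pitch s, in F."""
    h = t_ox + r
    lam = (k_ox - k_sub) / (k_ox + k_sub)
    root = math.sqrt(h * h - r * r)
    term1 = math.log((s * s + 2.0 * (h - r) * (h + root))
                     / (s * s + 2.0 * (h - r) * (h - root)))
    term2 = (lam * math.log(((h + r) ** 2 + s * s) / (9.0 * r * r + s * s))
             * math.tanh((h + r) / (s - r)))
    return 4.0 * math.pi * EPS0 * k_ox * l_gate / (term1 + term2)


def cgc_all(l_gate, r, t_ox, s, k_ox, k_sub):
    """Return (C_S, C_E, C_M): single, edge, and middle channel values."""
    c_s = cgc_single(l_gate, r, t_ox, k_ox, k_sub)
    c_sr = cgc_screening(l_gate, r, t_ox, s, k_ox, k_sub)
    c_e = c_s * c_sr / (c_s + c_sr)   # one neighbor
    c_m = 2.0 * c_e - c_s             # two neighbors
    return c_s, c_e, c_m


if __name__ == "__main__":
    # Assumed example geometry: 1.5 nm diameter tube, 4 nm oxide, 10 nm pitch.
    c_s, c_e, c_m = cgc_all(l_gate=32e-9, r=0.75e-9, t_ox=4e-9,
                            s=10e-9, k_ox=16.0, k_sub=3.9)
    print(f"C_S = {c_s * 1e18:.2f} aF, C_E = {c_e * 1e18:.2f} aF, "
          f"C_M = {c_m * 1e18:.2f} aF")
```

The ordering C_S > C_E > C_M reflects the statement that a channel with more conductive neighbors experiences more screening, and the edge value approaches the single-channel value as the pitch grows.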


In this chapter, we use the symbol ϕ to denote surface potential (in units of V), and the symbol φ to denote the corresponding energy level (in units of eV), i.e., φ=-eϕ. In a short channel device, the electric field penetrates from the source and drain and affects the channel surface potential ϕch and the corresponding energy level φch, which is the energy level at the bottom of the conduction band. We model the source/drain coupling by introducing semi-empirical capacitances Cs_fr, Cs_nr, Cd_fr, and Cd_nr as in Figure 4.2. Cs_nr (Cd_nr) accounts for the electrostatic coupling from a given channel to the source (drain) of the same tube, and Cs_fr (Cd_fr) accounts for the coupling to the source (drain) of the adjacent tube(s). The subthreshold slope (SS) and drain-induced barrier lowering (DIBL) can be calculated with these capacitances.

Figure 4.2 Schematic of electrostatic capacitances network.

In the subthreshold region, there are very few carriers, and the channels, which are undoped, are not conductive. We assume the space between the gate and substrate is filled with a dielectric identical to the gate dielectric material, and the presence of the undoped CNTs does not affect the permittivity of this space. For ballistic transport, the highest barrier (φmax) for carriers along the channel controls the current flow. In the subthreshold region, the highest barrier occurs in the channel, i.e., φmax is equal to φch. φch can be calculated as the result of the voltage divider formed by Cgc, Cs_fr, Cs_nr, Cd_fr, and Cd_nr. In the single channel (“S”) case, the electric field between the source/drain and the surface of the channel is approximated to be elliptical when a small gate voltage Vg is applied, with the source electrode grounded. With a small drain voltage Vd, the highest barrier occurs approximately at the middle of the channel. We transform the rectangular coordinate system into the elliptical coordinate system as shown in Figure 4.3. Following the conformal mapping methodology [16], the middle point of the channel surface, with the rectangular coordinate (x=1, y=0.5Lgate/tox) normalized to tox, has the elliptical coordinate u on the hyperbolic axis as calculated in Equation (4.2). Since the electric field is not perfectly elliptical, a semi-empirical fitting parameter b is introduced here. From (4.2), we can easily obtain the ratio between Cgc and the sum of Cs_nr and Cd_nr as in (4.3). When a large drain voltage is applied, the highest barrier moves toward the source side. This can be empirically modeled by fitting the ratio between Cs_nr and Cd_nr as in (4.4). Since the coupling capacitance is, to first order, inversely proportional to the distance to the electrode, the ratio between Cs/d_fr and Cs/d_nr can be modeled as in (4.5), where c is a fitting parameter. The ½ factor in the “E” case is needed because the edge channel has only one adjacent neighbor, while the middle channel has two. With (4.2)-(4.5), φch can be calculated as in (4.6).

$$a = b\left(\frac{0.5L_g}{t_{ox}}\right)^{2} + 2, \qquad b = 1.3 + 3.6\exp(-L_g/L_0), \qquad L_0 = 2.8\ \mathrm{nm}$$

$$u = \arccos\sqrt{0.5\left(a - \sqrt{a^{2}-4}\right)} \tag{4.2}$$

$$C_{s\_nr}^{S,M,E} + C_{d\_nr}^{S,M,E} = \left(\frac{\pi}{2u} - 1\right) C_{gc}^{S,M,E} \tag{4.3}$$

$$\frac{C_{s\_nr}^{S,M,E}}{C_{d\_nr}^{S,M,E}} = 3.4\ (S),\quad 2.4\ (M),\quad 2.2\ (E) \tag{4.4}$$

$$\frac{C_{s/d\_fr}^{M}}{C_{s/d\_nr}^{M}} = c\,\frac{0.5L_g}{\sqrt{(0.5L_g)^{2}+s^{2}}}, \quad \text{where } c = \frac{2}{3}, \qquad \frac{C_{s/d\_fr}^{E}}{C_{s/d\_nr}^{E}} = \frac{1}{2}\,\frac{C_{s/d\_fr}^{M}}{C_{s/d\_nr}^{M}} \tag{4.5}$$

$$\phi_{ch}^{X} = \frac{V_g C_{gc}^{X} + V_d C_{d}^{X}}{C_{gc}^{X} + C_{s}^{X} + C_{d}^{X} + C_{sub}}, \qquad \varphi_{ch}^{X} = -e\,\phi_{ch}^{X}, \qquad X \in \{S, M, E\},$$

$$\text{where}\quad C_{s}^{S} = C_{s\_nr}^{S},\ C_{d}^{S} = C_{d\_nr}^{S}; \qquad C_{s}^{M} = C_{s\_nr}^{M} + C_{s\_fr}^{M},\ C_{d}^{M} = C_{d\_nr}^{M} + C_{d\_fr}^{M}; \qquad C_{s}^{E} = C_{s\_nr}^{E} + C_{s\_fr}^{E},\ C_{d}^{E} = C_{d\_nr}^{E} + C_{d\_fr}^{E}. \tag{4.6}$$
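The chain from (4.2) to (4.6) can be followed numerically with a short Python sketch. It assumes that a takes the form b(0.5Lg/tox)² + 2, that Csub is zero unless supplied, and that SS and DIBL are read off the capacitive divider in the usual way (SS = ln(10)·(kT/q)·Ctotal/Cgc and DIBL = Cd/Cgc); those two expressions, the function names, and the example numbers are illustrative assumptions rather than formulas quoted from the text.

```python
import math

K_T_OVER_Q = 0.02585                      # thermal voltage at 300 K, in V
NEAR_RATIO = {"S": 3.4, "M": 2.4, "E": 2.2}  # Cs_nr / Cd_nr from (4.4)


def hyperbolic_coordinate(l_gate, t_ox, l0=2.8e-9):
    """Elliptical coordinate u of the mid-channel point, as in (4.2)."""
    b = 1.3 + 3.6 * math.exp(-l_gate / l0)   # semi-empirical fitting parameter
    a = b * (0.5 * l_gate / t_ox) ** 2 + 2.0
    return math.acos(math.sqrt(0.5 * (a - math.sqrt(a * a - 4.0))))


def coupling_capacitances(c_gc, l_gate, t_ox, s, kind):
    """Total source and drain coupling (Cs, Cd) for an 'S', 'M' or 'E' channel."""
    u = hyperbolic_coordinate(l_gate, t_ox)
    near_total = (math.pi / (2.0 * u) - 1.0) * c_gc          # (4.3)
    c_d_nr = near_total / (1.0 + NEAR_RATIO[kind])           # (4.4)
    c_s_nr = near_total - c_d_nr
    if kind == "S":
        far_over_near = 0.0                                  # no neighbors
    else:
        far_over_near = (2.0 / 3.0) * 0.5 * l_gate / math.hypot(0.5 * l_gate, s)
        if kind == "E":
            far_over_near *= 0.5                             # one neighbor only
    return c_s_nr * (1.0 + far_over_near), c_d_nr * (1.0 + far_over_near)


def channel_potential(v_g, v_d, c_gc, c_s, c_d, c_sub=0.0):
    """Channel surface potential from the capacitive divider of (4.6), in V."""
    return (v_g * c_gc + v_d * c_d) / (c_gc + c_s + c_d + c_sub)


def ss_and_dibl(c_gc, c_s, c_d, c_sub=0.0):
    """Subthreshold slope (V/decade) and DIBL (V/V) implied by the divider."""
    c_total = c_gc + c_s + c_d + c_sub
    return math.log(10.0) * K_T_OVER_Q * c_total / c_gc, c_d / c_gc


if __name__ == "__main__":
    c_gc = 3e-18  # assumed gate-to-channel capacitance, F
    c_s, c_d = coupling_capacitances(c_gc, l_gate=10e-9, t_ox=4e-9,
                                     s=10e-9, kind="S")
    print("phi_ch =", channel_potential(-0.3, 0.0, c_gc, c_s, c_d), "V")
    ss, dibl = ss_and_dibl(c_gc, c_s, c_d)
    print(f"SS = {ss * 1e3:.1f} mV/dec, DIBL = {dibl * 1e3:.0f} mV/V")
```

Because every step is a closed-form expression, the electrostatic part of the model needs no iteration, which is what makes it suitable for repeated evaluation inside an optimization loop.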

Figure 4.3 Coordinate transformation in the cross section: from rectangular coordinate (x-y) to elliptical coordinate (u-v).

Figure 4.4 compares the analytical model and the numerical simulation results for a variety of device structures in the range of our interest. Good agreement has been achieved. The analytical models are semi-empirical, with a limited number of fitting parameters to reduce the complexity while retaining the essential physical insight. The 3D simulation is done with Maxwell 3D [17], a 3D electromagnetic solver. The gate, plug, channel and source/drain extensions are set to be ideal conductors, and the channel and source/drain extensions are set to be equipotentials. The electrostatic field is solved for the entire space and the capacitances are extracted from the solution.

Figure 4.4 Comparison of the analytical model and numerical simulation of the channel surface potential (ϕch), with only electrostatic capacitances involved. (a) Vg=-0.3V, Vd=Vs=0, and (b) Vg=-0.3V, Vd=1V, and Vs=0. The x-axis is the combination of oxide thickness (tox) and channel pitch (s).


4.3 Quantum Capacitances and Channel Surface Potential

In this chapter, we take the n-type CNFET as an example. P-type CNFETs can be modeled analogously, due to the electron-hole symmetry of the CNT band structure. The valence band can be found by reflecting the conduction band with respect to the mid-gap of the band structure. When the energy level of the bottom of the conduction band (φch) is pushed down by the electrostatic coupling under a positive gate bias, electrons populate the channel, which tends to raise φch. The impact of channel charges on φch can be regarded as an equivalent quantum capacitance between the bottom of the conduction band of the channel and the Fermi level of the source/drain. The quantum capacitance has a significant effect on low density of states materials, such as CNT and [18].

As a quasi-1-dimensional system, carriers in CNFETs are limited to travel along the axial direction, and the carrier mean free path is relatively long [19][20], on the order of 10-100nm. For devices with channel length comparable to or smaller than the mean free path, it is reasonable to assume that the CNFETs operate at an ideal ballistic limit, where both intra- and inter-subband inelastic scattering are negligible and carriers do not redistribute in energy in the channel.

Unlike electrostatic capacitances, quantum capacitances depend on the channel carrier density, and thus are functions of bias conditions and band structure. Channel carriers (electrons in the case of n-type CNFETs) can be divided into two categories: (1) “transmitted carriers” with a population of QT, which flow directly between source and drain without changing their transport directions in k-space, and (2) “reflected carriers” with a population of QR, which are originally injected from the drain, but are reflected back by the source barrier and contribute a second time to the channel carrier density in k-space. Figure 4.5 shows an example of the transmitted and reflected carriers injected from the drain side. Not all channel carriers contribute to the drain current (Id). For given surface potentials of the channel (ϕch), source (ϕs), and drain (ϕd), the corresponding energy levels are φch, φs, and φd, respectively. The highest barrier for the electrons (φmax) is the highest of these three energy levels, i.e., φmax= Max(φch, φs, φd). In Figure 4.6, φmax=φs. Under the ballistic transport assumption, φmax determines the current, as previously noted. Only the carriers with energy higher than φmax can travel along the channel and contribute to the current. We refer to those carriers as “effective carriers” (Figure 4.6). All transmitted carriers injected from the source are effective carriers. Transmitted carriers injected from the drain with energy above φmax are also effective carriers. The rest of the drain-injected carriers do not contribute to the drain current: those carriers injected from the drain with energy lower than φmax are bounced back by the source barrier and become reflected carriers, so that the incoming and reflected populations together give zero net current. However, both transmitted and reflected carriers affect the quantum capacitance, regardless of whether they are effective carriers or not.

Figure 4.5 Illustration of (a) transmitted carriers, and (b) reflected carriers. Panel annotations: for source transmitted carriers, QTs is a function of φch and µS; for drain transmitted carriers, QTd is a function of φch and µD; for reflected carriers, QR is a function of φch, µS, and µD.
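The carrier bookkeeping described above can be made concrete with a small Python sketch. It is one possible reading of the classification for an n-type device: the handling of states below the local conduction band edge (labeled "absent" here) and the function names are assumptions added for illustration, not definitions from the text.

```python
def highest_barrier(phi_ch, phi_s, phi_d):
    """phi_max: the highest of the channel, source and drain energy levels (eV)."""
    return max(phi_ch, phi_s, phi_d)


def classify_carrier(energy, origin, phi_ch, phi_s, phi_d):
    """Return (kind, is_effective) for an electron of the given energy (eV).

    origin is 'source' or 'drain'. kind is 'transmitted', 'reflected', or
    'absent' (no state available for this electron in the channel).
    """
    phi_max = highest_barrier(phi_ch, phi_s, phi_d)
    if origin == "source":
        # Source-injected electrons must clear both the source and channel edges.
        if energy >= max(phi_s, phi_ch):
            return "transmitted", energy >= phi_max
        return "absent", False
    if origin == "drain":
        if energy < max(phi_d, phi_ch):
            return "absent", False
        if energy >= phi_max:
            return "transmitted", True       # clears the highest barrier
        return "reflected", False            # bounced back, zero net current
    raise ValueError("origin must be 'source' or 'drain'")


if __name__ == "__main__":
    levels = dict(phi_ch=-0.2, phi_s=0.0, phi_d=-0.5)  # case with phi_max = phi_s
    for energy, origin in [(0.1, "source"), (0.1, "drain"), (-0.1, "drain")]:
        print(energy, origin, classify_carrier(energy, origin, **levels))
```

In this reading, reflected carriers never count as effective carriers, yet they remain part of the channel charge, which is why they still enter the quantum capacitance.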

Figure 4.6 Illustration of effective carriers.

Both QT and QR are functions of φch, the source chemical potential µS, and the drain chemical potential µD, in units of eV. QT can be split into QTS and QTD, and QR can be split into QRS and QRD. QTS and QRS are functions of (φch-µS), independent of µD. QTD and QRD are functions of (φch-µD), independent of µS. The charge neutrality equations can be written as in (4.7), where Qes is the channel charge required by the electrostatic coupling. Qg, Qd, and Qs are the charges from the electrostatic coupling from the gate, drain, and source to the channel, respectively.

$$Q_T + Q_R + Q_{es} = 0$$

$$Q_T = Q_{TS}(\varphi_{ch}-\mu_S) + Q_{TD}(\varphi_{ch}-\mu_D)$$

$$Q_R = Q_{RS}(\varphi_{ch}-\mu_S) + Q_{RD}(\varphi_{ch}-\mu_D)$$

$$Q_{es} = Q_g(\varphi_{ch}-\varphi_G) + Q_s(\varphi_{ch}-\varphi_S) + Q_d(\varphi_{ch}-\varphi_D) \tag{4.7}$$

There are four quantum capacitances CQTs, CQTd, CQRs and CQRd associated with QTS, QTD, QRS and QRD, respectively, as defined in (4.8). CQTs, CQTd, CQRs, and CQRd carry the band structure information and are voltage dependent. Note that the signs of the chemical potentials and the surface potentials are opposite to those of the applied voltages, due to the negative charge of electrons.

$$C_{QTs} = -e\frac{\partial Q_T}{\partial \mu_S}, \quad C_{QTd} = -e\frac{\partial Q_T}{\partial \mu_D}, \quad C_{QRs} = -e\frac{\partial Q_R}{\partial \mu_S}, \quad C_{QRd} = -e\frac{\partial Q_R}{\partial \mu_D} \tag{4.8}$$

Assuming ballistic transport and an equilibrium distribution of carriers in the source and drain, the drain current can be evaluated if φmax is known. This requires φch, which, in principle, should be calculated by solving the charge balance equation (Equation (4.7)), or equivalently by including the quantum capacitances (Equation (4.8)), and iterating to arrive at a self-consistent channel surface potential for a given applied bias [6][7]. However, this approach lowers the computational efficiency, and thus is not desirable for computation-intensive applications such as using the model within a system optimization loop [24]. Instead, we develop a fully analytical quantum capacitance model without iteration. As a first approximation, only the first two subbands are considered here. This approximation is valid for the low power supply voltages which are expected to be used in future digital logic.

With both electrostatic capacitances and quantum capacitances, the channel surface potential ϕch and the corresponding φch can be evaluated with a capacitance network as illustrated in Figure 4.7. A metal gate and degenerate source/drain are assumed, while the Fermi level offset and surface potential in the source, drain and gate (not channel) are set as constants EFSD, EFSD and φMS: φG=µG+φMS, φS=µS+EFSD, and φD=µD+EFSD. Vg and Vd are the externally applied voltages on the gate and drain electrodes, respectively, and the source electrode is grounded. RS and RD are the source and drain series resistances, respectively. For an n-type CNFET with n-type source and drain doping, we start from a very negative gate bias Vg0. Because the population of channel carriers is essentially zero under Vg0, φch can be calculated with a purely electrostatic analysis, without the involvement of the voltage-dependent quantum capacitances. We can then calculate φch at a given Vg and Vd following (4.9)-(4.11), which are obtained by first differentiating (4.7) and then using (4.8).

$$\frac{\partial \phi_{ch}}{\partial V_g} = \frac{C_G}{C_G + C_D + C_S + C_{Qtot}} \tag{4.9}$$

$$\frac{\partial \phi_{ch}}{\partial V_d} = \frac{C_D + C_{QTd} + C_{QRd}}{C_G + C_D + C_S + C_{Qtot}} \tag{4.10}$$

$$\text{where}\quad C_{Qtot} = C_{QTd} + C_{QTs} + C_{QRs} + C_{QRd} \tag{4.11}$$
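To illustrate how (4.9) can be integrated without iteration once the quantum capacitance is treated as constant over ranges of the channel potential, the Python sketch below walks from the starting bias Vg0 through a list of segments at fixed Vd. The segment boundaries, the capacitance values, and the function name are hypothetical; in the actual model the boundaries follow from the band alignment and depend on both Vg and Vd.

```python
import math


def phi_ch_vs_vg(v_g, v_g0, phi0, c_g, c_s, c_d, segments):
    """Channel surface potential (V) at gate bias v_g >= v_g0, fixed drain bias.

    phi0 is the purely electrostatic potential at the starting bias v_g0.
    segments is a list of (phi_upper, c_qtot) pairs sorted by phi_upper:
    c_qtot applies while the channel potential is below phi_upper.
    The last entry should use math.inf as its upper bound.
    """
    phi, v = phi0, v_g0
    for phi_upper, c_qtot in segments:
        if phi >= phi_upper:
            continue                                  # segment already passed
        slope = c_g / (c_g + c_s + c_d + c_qtot)      # Equation (4.9)
        v_at_upper = v + (phi_upper - phi) / slope    # bias where segment ends
        if v_g <= v_at_upper:
            return phi + slope * (v_g - v)
        phi, v = phi_upper, v_at_upper
    return phi


if __name__ == "__main__":
    # Normalized, assumed capacitances: no channel charge below 0 V, then a
    # constant total quantum capacitance of 2 units above it.
    segments = [(0.0, 0.0), (math.inf, 2.0)]
    for v_g in (-1.0, -0.5, 0.0, 1.0):
        print(v_g, phi_ch_vs_vg(v_g, -1.0, -0.5, 1.0, 0.5, 0.5, segments))
```

Each segment contributes a straight line, so the resulting potential is piecewise linear in Vg and the cost per bias point is a handful of arithmetic operations.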

Figure 4.7 Capacitance network to calculate φch. Both electrostatic capacitances and quantum capacitances are shown.


Empirically, φch is roughly piecewise linear with Vg and Vd, as will be shown below, so we make a piecewise constant approximation for the quantum capacitances, which set the derivatives of φch through Equations (4.9)-(4.11). We carefully analyze which states are filled and which states are empty for ideal ballistic transport, for both the transmitted and the reflected carriers, under different bias conditions. Basically, CQTs corresponds to the change of QT relative to -µS for a fixed φch; CQTd corresponds to the change of QT relative to -µD for a fixed φch; and CQTs+CQTd corresponds to the change of QT relative to φch for fixed µS and µD. CQTs, CQTd, CQRs, and CQRd are piecewise constant in each segment, in increments of 0.5Cq for carriers from a single side (either source or drain) in a single subband. Cq is fitted to be 400aF/µm for CNTs, close to the ground state metallic CNT quantum capacitance. The factor 0.5 is used because carriers traveling in one direction populate only half of the k-space.

$$C_{QTs} = -e\frac{\partial Q_T}{\partial \mu_S},\quad C_{QTd} = -e\frac{\partial Q_T}{\partial \mu_D},\quad C_{QRs} = -e\frac{\partial Q_R}{\partial \mu_S},\quad C_{QRd} = -e\frac{\partial Q_R}{\partial \mu_D} \;=\; \begin{cases} 0.5C_q, & \text{if positive at 0 K} \\ 0, & \text{if 0 at 0 K} \\ -0.5C_q, & \text{if negative at 0 K} \end{cases} \tag{4.12}$$
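The piecewise constant rule of (4.12) lends itself to a simple lookup. The Python sketch below encodes the rule and, as an example, the facet values that label Figure 4.8(a), given as (CQTs, CQTd, CQRs, CQRd) in units of Cq. The dictionary layout and function names are illustrative assumptions; only the facet values and Cq = 400 aF/µm come from the text and figure.

```python
import math

C_Q = 400e-18 / 1e-6   # 400 aF/um expressed in F/m

# (CQTs, CQTd, CQRs, CQRd) in units of Cq for the facets of Figure 4.8(a).
FACETS = {
    "A": (0.0, 0.0, 0.0, 0.0),
    "B": (0.5, 0.0, 0.0, 0.0),
    "C": (0.5, 0.5, 0.0, 0.0),
    "D": (0.0, 0.5, 0.5, 0.0),
    "E": (0.0, 0.0, 0.5, -0.5),
    "F": (0.0, 0.0, 0.0, 0.0),
}


def piecewise_cq(sign_at_0k, c_q=C_Q):
    """Rule (4.12): map the 0 K sign of the derivative to +0.5Cq, 0 or -0.5Cq."""
    if sign_at_0k > 0:
        return 0.5 * c_q
    if sign_at_0k < 0:
        return -0.5 * c_q
    return 0.0


def facet_capacitances(facet, c_q=C_Q):
    """Quantum capacitances per unit gate length (F/m) in the given facet."""
    return tuple(x * c_q for x in FACETS[facet])


def total_quantum_capacitance(facet, c_q=C_Q):
    """C_Qtot of Equation (4.11) for the given facet, in F/m."""
    return sum(facet_capacitances(facet, c_q))


if __name__ == "__main__":
    for name in sorted(FACETS):
        total = total_quantum_capacitance(name)
        print(name, facet_capacitances(name), f"C_Qtot = {total:.2e} F/m")
```

Facet E shows the case discussed in the text where one component is negative: CQRd = -0.5Cq cancels CQRs in the total, while still entering the numerator of (4.10).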

The bias-dependent quantum capacitances with piecewise constant approximation can be visualized using a multi-facetted Vg-Vd plane. Figure 4.8(a) shows such a facetted plane, associated with Ec1<µs

Cq. They are approximated as constants in each segment according to (4.12). As an illustration, Figure 4.8(b) is the band diagram corresponding to Facet D in Figure 4.8(a), where μS>μD>φS>φch>φD. Under this bias condition, both transmitted carriers

Figure 4.8 (a) Multi-facetted Vg-Vd plane with piecewise constant quantum capacitance approximation. Ec1<µs

Facet labels in panel (a): A:(0,0,0,0), B:(½,0,0,0), C:(½,½,0,0), D:(0,½,½,0), E:(0,0,½,-½), F:(0,0,0,0). Legend of panel (b): carriers injected from the source (all transmitted carriers); transmitted carriers injected from the drain; reflected carriers injected from the drain.

CQTS=0. As φD<φch in Facet D, the drain can supply carriers with proper energy to fill the empty states at the bottom of the channel conduction band. QT increases with (φch-µD). Similarly, QR increases with (φch-µS) but is independent of (φch-µD). Thus, CQTS=0, CQTD=0.5Cq, CQRs=0.5Cq and CQRd=0 per unit gate length, in Facet D. Figure 4.9 shows a more complex facetted Vg-Vd plane, with both the 1st and 2nd subbands involved, with the source and drain doping satisfying Ec2<µs

Figure 4.9 Multi-facetted Vg-Vd plane with piecewise constant quantum capacitance approximation. Ec2<µs

CQRd, in units of Cq, respectively.

Facet labels recoverable from Figure 4.9: A:(0,0,0,0), B:(½,0,0,0), C:(½,½,0,0), D:(1,½,0,0), E:(1,0,0,0), F:(1,1,0,0), G:(0,½,½,0), H:(0,1,1,0), I:(0,0,0,0), K:(0,1,½,½), M:(0,0,1,-1), N:(0,0,½,-½), O:(0,0,0,0).

Now, with this quantum capacitance model, φch(=-eϕch) can be calculated following (4.9)-(4.11), assuming that minority carriers can be neglected for a sufficiently large CNT bandgap when the source and drain are heavily doped. The calculation does not require an iterative method or numerical integration, and hence is computationally efficient. Figure 4.10 shows the comparison between the analytical model and numerical simulation. The numerical simulation uses the universal density of states for the CNTs [21] to calculate the quantum capacitances. The comparison spans a range of biasing conditions relevant for digital logic applications, for a CNFET with a single (19, 0) CNT, 10nm channel length, and 1.5nm effective oxide thickness at 300K. The comparisons with source/drain degeneracies of EFSD=0.1eV and 0.4eV are both illustrated, corresponding to the cases in which the carriers are supplied from (a) only the 1st subband, and (b) both the 1st and 2nd subbands, respectively. The numerical simulation is conducted by solving the charge neutrality equation (Equation (4.7)) self-consistently. Good agreement between our analytical model and numerical simulation is achieved in both cases, validating the piecewise constant quantum capacitance approximation.

Figure 4.10 ϕch calculated by the analytical model and the error from the results by numerical simulation: (a) E <µ

Panels of Figure 4.10: surface plots of ϕch [V] and of error=ϕch (model)- ϕch (simulation) [V] versus Vg [V] and Vd [V], for cases (a) and (b).


Figure 4.11 shows the channel potential for biases along the vertical slices Vd=Vd1 and Vd=Vd2 in Figure 4.8(a). The solid and dashed lines correspond to the numerical simulation and the analytical model, respectively. The piecewise linear property of ϕch is clearly illustrated. The derivative of ϕch over Vg is the result of the capacitive voltage divider. The numerical simulation contains the point-by-point voltage dependent quantum capacitances, while the step shape associated with the analytical model is a result of the piecewise constant quantum capacitance approximation. A negative quantum capacitance is predicted under certain bias conditions, as in Facet E in Figure 4.8(a).

[Figure 4.11: plots of ϕch (V) and dϕch/dVg (V/V) for panels (a) and (b), numerical simulation and analytical model, with axis labels Vg (V) and ϕch (V), facet letters, and the annotations CQtot = 0.5Cq and CQtot = Cq.]

Figure 4.11 Comparison between simulation (solid curves) and our model (dashed curves) of surface potential and its derivative over Vg at (a) Vd1 (=0.05V) and (b) Vd2 (=0.25V) in Figure 4.8a. EFSD=0.1eV. The letters represent the facet index in Figure 4.8a.

[Figure 4.12: (a) ϕch (V) and (b) dϕch/dVd (V/V) versus Vd (V), numerical simulation and analytical model, with facet letters; (c) band diagram from source to drain, annotated "In the ballistic limit, no carriers with suitable energy to fill the empty states."]

Figure 4.12 Comparison between simulation and our model of (a) surface potential and (b) its derivative over Vd at Vg1(=0.9V) in Figure 4.8a. The letters represent the facet index in Figure 4.8a. EFSD=0.1eV. Negative quantum capacitance is predicted in Facet E of Figure 4.8a, with the corresponding band diagram and carrier population illustrated in (c).

Figure 4.12 shows the details of (a) ϕch and (b) d(ϕch)/dVd, with (c) the band diagram corresponding to Facet E in Figure 4.8(a). The negative quantum capacitance in this facet arises because QRD increases when µD decreases, i.e., when the gate-to-drain voltage decreases. As the capacitance is by definition the derivative of the amount of charge held by the capacitor over the voltage applied between the electrodes, this leads to a negative quantum capacitance. This situation only occurs when inelastic scattering is not present. Because the empty states in the channel with energy lower than φD are not filled, lowering the drain surface potential can fill some of the empty states, increasing QRD. There has not been any experimental report on such a phenomenon, mainly due to the inability to fabricate a purely ballistic CNFET with ideal absorbing ohmic contacts.

4.4 Ballistic I-V Model and Source Exhaustion

In ballistic transport, the drain current can be evaluated by (4.13). T is the transmission coefficient, which is set to 1 in the pure ballistic limit. DOS is the density of states, which can be calculated as $\frac{1}{\pi}\left(\frac{\partial E}{\partial k}\right)^{-1}$. v is the injected carrier velocity, which is $\frac{1}{\hbar}\frac{\partial E}{\partial k}$. f is the Fermi-Dirac distribution. F0 is the 0th-order Fermi integral. The first two subbands are included. Again, no iterative method or numerical integration is involved. Figure 4.13 and Figure 4.14 are sample I-V characteristics, for a CNFET with a single (19, 0) CNT, 10nm channel length, and 1.5nm oxide thickness at 300K.

$$
\begin{aligned}
I_d &= 2e\int_{\phi_{max}}^{\infty} T\cdot DOS\cdot v\cdot f_{source}\,dE - 2e\int_{\phi_{max}}^{\infty} T\cdot DOS\cdot v\cdot f_{drain}\,dE \\
&= \frac{4e}{h}\int_{\phi_{max}}^{\infty} T f_{source}\,dE - \frac{4e}{h}\int_{\phi_{max}}^{\infty} T f_{drain}\,dE \\
&= \frac{4e}{h}\,kT\sum_{i=1}^{2}\left[F_0\!\left(\frac{\mu_S-\phi_{max,i}}{kT}\right) - F_0\!\left(\frac{\mu_D-\phi_{max,i}}{kT}\right)\right]
\end{aligned}
\tag{4.13}
$$

The current of a ballistic device is ultimately limited by the exhaustion of carriers in the source [22][23] at high gate bias. Since CNTs have low density of states, small source/drain degeneracy can become a severe problem even under low gate bias.

Source exhaustion happens when the energy level of the bottom of the conduction band of the source (φS) exceeds that of the channel (φch) (Figure 4.15), i.e., φmax=φS. When source exhaustion occurs, all the carriers in the source become effective carriers. Even if φch is electrostatically lowered by further increasing the gate voltage (Vg) or drain voltage (Vd), φch cannot further modulate the effective carrier populations (Qeff) in either the +k or –k direction. All source carriers are exhausted and swept into the channel, and thus Id remains unchanged. As a result, the gate transconductance becomes zero, and the differential output resistance becomes infinite when the drain-injected current is negligible (high Vd). This can happen at a low Vg for a CNFET, as shown in Figure 4.13. In Figure 4.13, the flattening of the Id-Vg curve at high Vg could be misattributed to high series resistance, but the effect is in fact independent of the series resistance. The current under source exhaustion sets an intrinsic limit on the drain current. To fully optimize devices for high current, research on improving the source carrier concentration is needed.
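The saturation described above can be illustrated numerically. The following is a minimal sketch, not the dissertation's implementation: it evaluates the closed form of (4.13) with T=1, assuming F0(η)=ln(1+exp(η)) with a kT prefactor, energies given in eV relative to an arbitrary reference, and φmax taken as the larger of φch and φS. The function names, the `subband_offsets` argument (a hypothetical way to add higher subbands as fixed energy offsets), and the example bias values are our own assumptions.

```python
import math

Q_E = 1.602176634e-19      # elementary charge [C]
H_PLANCK = 6.62607015e-34  # Planck constant [J s]
K_B_EV = 8.617333262e-5    # Boltzmann constant [eV/K]


def fermi_integral_0(eta):
    """0th-order Fermi integral, F0(eta) = ln(1 + exp(eta)), overflow-safe."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def barrier_top(phi_ch, phi_s):
    """Highest barrier seen by source carriers [eV].

    Without source exhaustion this is phi_ch; once phi_ch drops below
    phi_s, the source band edge itself limits injection.
    """
    return max(phi_ch, phi_s)


def ballistic_current(phi_ch, phi_s, mu_s, mu_d,
                      subband_offsets=(0.0,), temp_k=300.0):
    """Ballistic drain current [A] from the Fermi-integral form of (4.13).

    subband_offsets is a hypothetical list of subband energy offsets [eV]
    added to the barrier top; the default keeps only the first subband.
    """
    kt_ev = K_B_EV * temp_k
    total = 0.0
    for offset in subband_offsets:
        phi_max = barrier_top(phi_ch, phi_s) + offset
        total += (fermi_integral_0((mu_s - phi_max) / kt_ev)
                  - fermi_integral_0((mu_d - phi_max) / kt_ev))
    # (4e/h) * kT, with kT converted from eV to joules
    return 4.0 * Q_E / H_PLANCK * (kt_ev * Q_E) * total


# Example: EFSD = mu_s - phi_s = 0.1 eV, Vd = 1 V (mu_d = -1 eV).
for phi_ch in (0.0, -0.05, -0.2, -0.5):
    i_d = ballistic_current(phi_ch, phi_s=-0.1, mu_s=0.0, mu_d=-1.0)
    print(f"phi_ch = {phi_ch:+.2f} eV -> Id = {i_d * 1e6:.2f} uA")
```

Once φch falls below φS the printed current stops changing, which is the zero-transconductance behavior attributed to source exhaustion in the text.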

[Figure 4.13: Id (µA) versus Vg (V) for a single channel; panel (a) EFSD=0.1eV, panel (b) EFSD=0.4eV.]

Figure 4.13 Id-Vg curve. (a) EFSD=0.1eV and (b) EFSD=0.4eV. Vd is from 0 to 1V with a 0.1V step. Source exhaustion occurs at around Vg= 0.4V in (a).

Although CQRd is negative in some regimes, the drain current increases monotonically with Vd (Figure 4.14). For example, with Vg=Vg1 in Figure 4.8a, CQRd is negative around Vd=Vd1 and φch decreases when Vd increases (Figure 4.12a). However, due to source exhaustion (Figure 4.15b), the current is independent of φch but increases with Vd, since the population of the effective carriers injected from the drain decreases due to the lower drain Fermi level at higher Vd.

[Figure 4.14: Id (µA) versus Vd (V) for a single channel; panel (a) EFSD=0.1eV, annotated "Vg≥0.4V, source exhaustion occurs"; panel (b) EFSD=0.4eV.]

Figure 4.14 Id-Vd curve (a) EFSD =0.1eV and (b) EFSD = 0.4eV. Vg is from 0 to 1V with a 0.1V step.

[Figure 4.15: band diagrams from source to drain showing EC, EV, and the highest barrier (ϕmax), with the effective carriers injected from the source (+k) and from the drain (-k). (a) No source exhaustion: the +k Qeff changes with φch. (b) Source exhaustion, reached as Vg increases: the +k Qeff does not change with φch.]

Figure 4.15 Band diagrams for (a) low gate voltages with no source exhaustion (φmax=φch) and (b) high gate voltages with source exhaustion (φmax=φS).

Though source exhaustion in CNFET has never been reported experimentally due to the inability to fabricate a ballistic device with ideal absorbing ohmic contacts, we propose a potential experiment for future exploration, as in Figure 4.16. In this device, the source and drain are very heavily doped in the region next to the contacts, to suppress the limitation from bad contacts. The regions next to the gate are less heavily doped, whose degeneracy limits the current by source exhaustion. We expect to see that the maximum achievable current depends on the degeneracy of the less heavily doped region. The control of the doping density and location is a critical step for the experiment.

[Figure 4.16: device with source, gate, and drain. Very heavily doped regions next to the contacts avoid limiting the population of injected carriers by bad contacts; in the less heavily doped regions next to the gate, the doping level is adjusted and the current is measured.]

Figure 4.16 An experimental structure designed to illustrate source exhaustion

The transport and capacitance models can be used to realize circuit-level simulation and system-level optimization. Figure 4.17 shows the waveform of a CNFET inverter driving a fixed load capacitance of 1fF. The CNFETs have one (19, 0) CNT with Lgate=10nm and EOT=1.5nm at 300K. The gate workfunctions of the CNFETs are set to 0.2eV (nCNFET) and -0.2eV (pCNFET) to obtain enhancement-mode transistors.

Since the analytical model is very computationally efficient, it can be implemented into a system-level optimizer program [10][24], to realize chip-level performance optimization, for design and benchmarking purposes.

[Figure 4.17: Vin and Vout (V) versus time (ns).]

Figure 4.17 Input and output waveforms of a single stage inverter driving a 1fF load capacitance, calculated by the analytical model.

4.5 System-level Design Optimization

D. Frank et al. [24] have developed a system-level design optimization tool for partially-depleted silicon-on-insulator (PDSOI) devices. The optimizer targets a high-performance, 4-core processor with 1.5M logic gates and 5MB SRAM cells per core. The goal of the optimization is to find the values of the selected device technology parameters that will result in the greatest possible processor performance for a given power level. The total power (Ptot) consists of two parts, dynamic power (Pdyn) and static power (Pstat). Dynamic power is consumed by active circuit switching. Static power is dissipated through subthreshold leakage (PsubVt), gate leakage (Pox), band-to-band tunneling leakage (Pb2b), and minority carrier current (Pminority). The equations to calculate the different power components are included in [24]. When the device is scaled according to Dennard scaling, the static power increases. Because Ptot is constrained, Pdyn decreases accordingly, which eventually leads to a loss in overall performance beyond some point in the scaling process. The optimizer solves for the optimal points that balance the dynamic and static power to achieve the maximum performance.

[Figure 4.18: flow chart with the blocks Fixed parameters, Variables, CNFET area model, Wiring statistics, Thermal model, New values, Device structure, Wire capacitance, CNFET IV model, CNFET leakage model, Constraint, Optimizer, Delay, Leakage power, Tolerance adjustment (on both the delay and leakage paths), Adjust for latency of long paths, and Total power.]

Figure 4.18 Schematic of system level device optimization [9]. Most of the design variables can be flexibly chosen to be included into the optimization loop, including Vdd, Lgate, LSD, D, S, EFSD, EOT, φMS.

We have extended the optimization tool to implement the CNFET model described in this chapter. With the CNT models, the optimizer calculates the values of user-defined device design parameters to achieve optimal chip-level performance under certain constraints (e.g., power, area) (Figure 4.18). Instead of a planar structure, we choose a gate-all-around (GAA) structure for better gate controllability. In a GAA structure, the channels are shielded from their neighbors, so no screening effect exists. This corresponds to the single channel case (“S”) in the electrostatic models (Equations (4.1)-(4.6)). The tolerance model includes diameter variation, CNT pitch variation, gate length variation, source/drain doping variation, Vdd variation, and signal coupling noise. The worst case delay and average subthreshold leakage are estimated from the tolerance model as in [24]. A valid design guarantees that an individual NAND gate does not fail due to variations and noise shifts.

Figure 4.19(a) shows the minimized logic delay at different power levels when simultaneously optimizing over Vdd, Lgate, D, s, LSD and φMS, for CNFETs with and without variability included. The results are compared to optimized 11 nm generation PDSOI (including variability). For local device-level fluctuations, we include the variations of Lgate, D, s, φMS, device width (W), power supply and coupling noise. For global process tolerances, which may cause a chip-level offset, we take into consideration the variations of Lgate, D, s, φMS, W, tox, kox and external Vdd tolerance. We assume ideal CNT-metal contacts in this work. The 5x speed improvement (including variability) over PDSOI for a wide range of chip power levels (0.01W-100W) indicates that CNFETs are good candidates for advanced technology nodes.

Lgate in the optimal designs is in the 5-20nm range (Figure 4.19(b)) and increases for lower power designs. For these GAA structures with 2nm thick high-k gate insulators and an optimal diameter of 0.5nm, the gate electrostatic control over the channel overwhelms the coupling from the source and drain for Lgate greater than 15nm, so that no further improvement in short channel effects occurs for increasing Lgate. This explains the Lgate saturation trend toward low power (Figure 4.19b).

[Figure 4.19: (a) average loaded delay (ps) and (b) Lgate (nm) versus chip power (W) for PD SOI (variability), CNFET (variability), and CNFET (nominal); panel (a) is annotated "5x speed up".]

Figure 4.19 Optimization results versus power constraint at 11 nm technology node (a) minimum average loaded delay, (b) optimal Lgate to achieve the minimum delay. The variables Vdd, Lgate, D, s, LSD and φMS are simultaneously optimized. EFSD is fixed at 0.1eV.

Figure 4.20 shows the percentage of Pdyn in Ptot for both PDSOI and CNFETs. Over a wide range of power levels (0.01W-100W), the percentage is relatively constant, regardless of the device structure. Similar conclusions are reported from other system-level analyses of Si devices [25][26]. At the optimal points, Pstat≈PsubVt. Following Equation (4.14), a constant Pdyn/Ptot indicates a constant Ieff/Ioff ratio in the worst case conditions, where Ieff is the effective current, which is the average current passing through the device during a switching event [27]. The worst-case values of the variables with plus signs in the superscripts (such as Ioff^+) are larger than the nominal values. Similarly, the worst-case values of the variables with minus signs in the superscripts (such as Ieff^-) are smaller than the nominal values. The worst-case value for Vdd can be either larger or smaller than the nominal Vdd, depending on how it contributes to the power. The conclusion of a constant Ieff/Ioff ratio differs from the conventional device design guidelines, which usually fix Ioff and Vdd as technology specifications. Here, there is no universal optimal value for Ioff or Vdd.

$$
\begin{aligned}
P_{static} &\approx P_{subVt} \propto I_{off}^{+} V_{dd} \\
P_{dyn} &\propto \frac{C V_{dd}^{2}}{C V_{dd}/I_{eff}^{-}} \\
\Rightarrow \frac{P_{dyn}}{P_{static}} &\propto \frac{I_{eff}^{-} V_{dd}}{I_{off}^{+} V_{dd}}
\end{aligned}
\tag{4.14}
$$

$$
\left(\frac{P_{dyn}}{P_{tot}}\right)_{PDSOI} \approx \left(\frac{P_{dyn}}{P_{tot}}\right)_{CNFET}
\;\Rightarrow\;
\left(\frac{I_{eff}^{-}}{I_{off}^{+}}\right)_{PDSOI} \approx \left(\frac{I_{eff}^{-}}{I_{off}^{+}}\right)_{CNFET}
$$

[Figure 4.20: Pdyn/Ptot (%) versus chip power (W) for PD SOI (variability) and CNFET (variability).]

Figure 4.20 The percentage partition of Pdyn in Ptot for optimal designs
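The link between the nearly constant Pdyn/Ptot in Figure 4.20 and a constant Ieff/Ioff ratio can be spelled out as follows. This is a sketch under two assumptions that are ours rather than the original derivation's: Ptot = Pdyn + Pstatic, and every factor common to the compared designs (for example the activity factor) is lumped into a single constant κ.

$$
\frac{P_{dyn}}{P_{tot}} = \frac{P_{dyn}}{P_{dyn}+P_{static}} = \frac{1}{1 + P_{static}/P_{dyn}},
\qquad
\frac{P_{static}}{P_{dyn}} = \kappa\,\frac{I_{off}^{+}}{I_{eff}^{-}}
$$

Under these assumptions Pdyn/Ptot is a monotonic function of the worst-case Ieff/Ioff ratio alone, so two technologies with equal Pdyn/Ptot share the same ratio to the extent that κ is similar for both.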


Table 4.1 shows the device characteristics for the optimal design at Ptot=10W. Similar Ieff/Ioff ratios are shown for PDSOI and CNFETs, as discussed earlier. GAA CNFETs, with better short channel performance, can achieve an Ieff/Ioff ratio similar to that of PDSOI, using a shorter channel length and a smaller Vdd window (Table 4.1). The ballistic transport properties of CNFETs enable a higher Ieff than PDSOI. The load capacitances in the CNFET circuit are smaller than those in the PDSOI circuit, because both the channel length and the interconnect lengths are shorter (Figure 4.21). As a result, a CNFET chip can run 5x faster than a PDSOI chip at the same power level.

[Figure 4.21: lg(Id) versus Vg for CNFET and PDSOI, with the supply voltage windows Vdd_CNFET and Vdd_PDSOI marked.]

Figure 4.21 Illustration of the optimal design for CNFET vs. PDSOI

Table 4.1 Device characteristics for Ptot=10W

Chip power=10W,      Lgate    Vdd     Ieff       Ieff/Ioff    Cload ave
incl. variability    (nm)     (V)     (mA/µm)                 (fF)
PDSOI (optimal)      21       0.7     0.234      456          1.05
CNFET (optimal)      10       0.4     0.62       334          0.68

Optimization results also indicate that a smaller diameter is preferred (Figure 4.22). This conclusion differs from analysis based on conventional device metrics, for three main reasons: electrostatic scaling allows the gate length to decrease with decreasing diameter; the band-to-band tunneling at the drain junction is suppressed due to the large bandgap in CNFETs with small tubes; and we have chosen not to impose a contact resistance penalty for small CNTs in our model. EFSD is set to 0.1eV in Figure 4.19, while in Figure 4.22, EFSD is also included in the optimization loop, which results in some variation in the optimal EFSD (inset). However, Figure 4.23 shows that the delay has only a weak dependence on EFSD at the 10W power level, decreasing the importance of the precise optimal EFSD for the case we have considered, in which RS and RD do not depend on EFSD.

[Figure 4.22: average loaded delay (ps) versus diameter (nm) at a power level of 10W, including variability; inset: optimal Lgate (nm) and optimal EFSD (eV) versus diameter (nm).]

Figure 4.22 Optimal delay at power constraint of 10W as a function of diameter, including variability. The variables Vdd, Lgate, s, LSD and φMS are simultaneously optimized with the inset showing the optimal Lgate and EFSD at each diameter value.

[Figure 4.23: average loaded delay (ps) versus EFSD (eV) at a power level of 10W and D=0.5 nm, including variability.]

Figure 4.23 EFSD dependence of optimal delay at power constraint of 10W and a fixed diameter of 0.5nm, with variability. The optimized variables include Vdd, Lgate, s, LSD and φMS. The optimal point agrees with the case of D=0.5nm in Figure 4.22.

4.6 Conclusions

We develop a non-iterative analytical model for intrinsic CNFETs with a short channel length operating in the ballistic regime. The model requires no iterative method or numerical simulation. It also includes coupling effects from the source and drain, which cannot be ignored for short channel devices. The quantum capacitances in the ballistic limit are carefully analyzed, with the assumption of negligible scattering. Negative quantum capacitance is predicted in certain regimes. We show that source exhaustion is an important characteristic of devices operating under ballistic transport, which sets a fundamental limit on the device current. Source engineering to supply enough carriers for high current will be very important. The utility of this compact model has been demonstrated by using it in a system-level optimization program that determines optimal device design parameters within a given system design constraint (e.g., power). Our optimization results show that CNFETs have great potential for very advanced technology nodes.

4.7 Bibliography

[1] R. Chau, et al., "Benchmarking nanotechnology for high-performance and low-power logic transistor applications," Nanotechnology, IEEE Transactions on, vol. 4, pp. 153-158, 2005.

[2] P. Avouris, et al., "Carbon-based electronics," Nat Nano, vol. 2, pp. 605-615, 2007.

[3] J. Guo, et al., "Performance projections for ballistic carbon nanotube field-effect transistors," Applied Physics Letters, vol. 80, pp. 3192-3194, 2002.

[4] Y. Ouyang, et al., "Comparison of performance limits for carbon nanoribbon and carbon nanotube transistors," Applied Physics Letters, vol. 89, p. 203107, 2006.

[5] M. Lundstrom, J. Guo, Nanoscale transistors: device physics, modeling and simulation, Springer, 2006.

[6] J. Deng and H. S. P. Wong, "A Compact SPICE Model for Carbon-Nanotube Field-Effect Transistors Including Nonidealities and Its Application--Part II: Full Device Model and Circuit Performance Benchmarking," Electron Devices, IEEE Transactions on, vol. 54, pp. 3195-3205, 2007.

[7] B. C. Paul, et al., "Modeling and analysis of circuit performance of ballistic CNFET," in Proceedings of the 43rd annual Design Automation Conference, San Francisco, CA, USA, 2006, pp. 717-722.

[8] A. Balijepalli, et al., "Compact modeling of carbon nanotube transistor for early stage process-design exploration," in Proceedings of the 2007 international symposium on Low power electronics and design, Portland, OR, USA, 2007, pp. 2-7.

[9] T. J. Kazmierski, et al., "Numerically Efficient Modeling of CNT Transistors With Ballistic and Nonballistic Effects for Circuit Simulation," Nanotechnology, IEEE Transactions on, vol. 9, pp. 99-107, 2010.

[10] L. Wei, D. Frank, L. Chang, H.-S. P. Wong, “A Non-iterative Compact Model for Carbon Nanotube FETs Incorporating Source Exhaustion Effects,” 2009 IEEE International Electron Devices Meeting (IEDM 2009), pp. 917 – 920, Baltimore, USA, December 6 – 9, 2009.

[11] Z. Chen, et al., "Externally Assembled Gate-All-Around Carbon Nanotube Field-Effect Transistor," Electron Device Letters, IEEE, vol. 29, pp. 183-185, 2008.

[12] L. Wei, et al., "1-D and 2-D devices performance comparison including parasitic gate capacitance and screening effect," Electron Devices Meeting (IEDM), 2007 IEEE International, 2007, pp. 741-744.

[13] N. Patil, et al., "Circuit-Level Performance Benchmarking and Scalability Analysis of Carbon Nanotube Transistor Circuits," Nanotechnology, IEEE Transactions on, vol. 8, pp. 37-45, 2009.

[14] J. Deng, H.-S. P. Wong, “Modeling and analysis of planar gate electrostatic capacitance for 1-D FET with multiple cylindrical conducting channels,” IEEE Trans. on Electron Devices, vol. 54, No. 9, pp.2377-2385, 2007.

[15] L. Wei, et al., "Modeling and performance comparison of 1-D and 2-D devices including parasitic gate capacitance and screening effect," IEEE Transactions on Nanotechnology, vol. 7, pp. 720-727, November 2008.

[16] R. Plonsey and R. Collin, Principles and Applications of Electromagnetic Fields, pp. 146-163. New York: McGraw-Hill, 1961.

[17] Maxwell 3D®, Ansoft Corporation, PA.

[18] C. Dekker, "Carbon Nanotubes As Molecular Quantum Wires," Physics Today, vol. 52, p. 22, 1999.

[19] A. Javey, et al., "Ballistic carbon nanotube field-effect transistors," , vol. 424, pp. 654-657, 2003.

[20] C. T. White and T. N. Todorov, "Carbon nanotubes as long ballistic conductors," Nature, vol. 393, p. 240, 1998.

[21] J. W. Mintmire and C. T. White, "Universal Density of States for Carbon Nanotubes," Physical Review Letters, vol. 81, p. 2506, 1998.

[22] J. Guo, et al., "Assessment of silicon MOS and carbon nanotube FET performance limits using a general theory of ballistic transistors," in Electron Devices Meeting, 2002. IEDM '02. Digest. International, 2002, pp. 711-714.

[23] M. V. Fischetti, et al., "Simulation of Electron Transport in High-Mobility MOSFETs: Density of States Bottleneck and Source Starvation," in Electron Devices Meeting, 2007. IEDM 2007. IEEE International, 2007, pp. 109-112.

[24] D. J. Frank, et al., "Optimizing CMOS technology for maximum performance," IBM Journal of Research and Development, vol. 50, pp. 419-431, 2006.

[25] D. Markovic, et al., "Methods for true energy-performance optimization," Solid-State Circuits, IEEE Journal of, vol. 39, pp. 1282-1293, 2004.

[26] C. Piguet, et al., "Leakage Reduction at Architectural Level," in Integrated Circuit Design and Technology, 2006. ICICDT '06. 2006 IEEE International Conference on, 2006, pp. 1-4.

[27] M. H. Na, et al., "The effective drive current in CMOS inverters," in Electron Devices Meeting, 2002. IEDM '02. Digest. International, 2002, pp. 121-124.


Chapter 5 Technology Assessment Methodology for Complementary Logic Applications based on Energy-Delay Optimization

5.1 Introduction

Emerging device technologies, for example, III-V field effect transistors (III-V), carbon nanotube field effect transistors (CNFETs), and tunneling field effect transistors (TFETs), are often targeted to outperform Si MOSFETs. Historically, off-state current (Ioff) and supply voltage (Vdd) are specified as technology targets for Si MOSFETs at each technology node [1] in accordance with Dennard’s scaling [2] and practical application constraints. Logic devices are then optimized to achieve high on-state current (Ion) or effective current (Ieff) [3] under fixed Ioff and Vdd, in order to reduce device switching delay (CV/I). While this view of device design/comparison has served us well, technology development and system design have increasingly focused on power/energy and delay optimization [4-6], where Vdd and Ioff are treated as “free variables” for achieving the best power/energy and delay tradeoff for a given system architecture [7]. Such device optimization has been done at the full-chip level for high performance [8] and low power [7] systems for Si and recently for CNFETs [9]. Such chip-level optimization, however, is complex and not practical for device-level benchmarking of emerging technologies. Attempts to develop a simple device-level benchmarking methodology have been made in [9], without taking into account device variation and noise margin. We present a new device technology assessment methodology based on energy-delay optimization, which treats Ioff and Vdd as “free variables” bounded by constraints due to device variation and circuit noise margin. Architecture and application environment are captured in two key design parameters: activity factor (a) and logic depth (LD). In Section 5.2, we explain the methodology of energy-delay optimization subject to variation and noise margin constraints. In Section 5.3, the optimization results of 32nm Si CMOS are discussed in depth. In Section 5.4, today’s best available Si MOSFETs and several emerging device technologies (III-V, CNFETs, TFETs) are evaluated and compared. A projection to 10nm gate length (Lg) is made.

5.2 Methodology

Device switching delay (D) and total energy per switch (Etot) are two key figures of merit, which, to the first order, are functions of Ion, Ioff, and Vdd. To examine the energy efficiency of a device at a target delay, the first step is to find the possible Ion, Ioff, and Vdd combinations that achieve the target delay. Given a device structure, a set of Id-Vg curves under different Vds can be measured or simulated.

Figure 5.1 illustrates the methodology to determine Ion and Ioff at a selected Vdd. Here we assume Ioff can be freely adjusted over a reasonable range without changing the device structure by engineering the gate workfunction. We first pick the curve whose Vds equals a certain Vdd=Vdd1. On this curve, any two points separated by a Vdd window (vertical dashed lines) of Vdd1 on the Vg-axis give a possible combination of Ioff and Ion under this supply voltage, corresponding to Vg=Vg0 and Vg=Vg0+Vdd1, respectively. By sweeping this window along the Vg-axis, different (Ioff, Ion) combinations can be obtained for the same Vdd. We change Vdd by going to another curve with a different Vds=Vdd2 and sweep with a different window of Vdd2.

Different devices have different sources of device variations. For Si MOSFET, CNFET, and III-V transistors, the major variation sources include random dopant fluctuation and gate line edge roughness [11][12]. CNFETs have additional variation sources such as channel density and CNT chirality variations [9]. For vertical tunneling TFETs, random doping fluctuation can affect the minimum band bending for tunneling to occur. As an illustration of the methodology, a lumped threshold voltage variation (σVt) is employed to capture the dominant effect for field-effect transistors similar to Si MOSFETs. In general, device variations can give rise to variations in both the Id and Vg directions in a complex manner. For example, a variation in CNT diameter results in both a threshold voltage variation and a variation of the transport property of series resistance of the CNFET.* In the worst case condition, the Vdd window is shrunk from dashed lines to dotted lines due to σVt. Equation (5.1) relates the worst case on-state current (Ion^-) and off-state current (Ioff^+) to the supply voltage variation (σVdd) and device variations (σVt). Vdd^nom is the nominal supply voltage. The detailed values of σVt and σVdd (e.g., 1σ, 3σ, or 5σ values) will be chosen based on practical considerations.

VVnom VVV,  nom V dd dd dd dd dd dd I IV V V VV, V on d gs g0 dd t ds dd (5.1)

 IoffIV d gs V g0  VV t, ds V dd

* A lumped σVt representation of device variation is more applicable to Si MOSFET-like devices such as Ge and III-V channel FETs. For other emerging novel devices, the physics of the variation and their impact on the I-V characteristics must be evaluated on an individual basis. This lumped σVt approximation is a subject for further study.

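[Figure 5.1: Log(Id) versus Vg for Vds=Vdd1 and Vds=Vdd2, showing the Vdd1 window between Vg0 ("off", sub-threshold) and Vg0+Vdd1 ("on", above-threshold), the resulting Ioff1 and Ion1, the Vdd2 window, and the σVt shift at each window edge.]

Figure 5.1 Methodology to determine Ion and Ioff given the device I-V characteristics and supply voltage.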
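The window sweep of Figure 5.1 can be sketched in a few lines. This is an illustration under stated assumptions, not the tool used in this work: `toy_id` is a hypothetical smooth, monotonic Id(Vgs, Vds) with made-up parameters standing in for measured or simulated curves, and the worst-case bias points follow our reading of Equation (5.1), in which the on-state is evaluated at Vdd^nom - σVdd with the gate drive reduced by σVt, and the off-state at Vdd^nom + σVdd with the gate raised by σVt.

```python
import math


def toy_id(vgs, vds, vt=0.3, n_phi_t=0.035, i_scale=1e-4, phi_t=0.0259):
    """Hypothetical monotonic Id(Vgs, Vds) in amperes (illustration only).

    Exponential below vt (about 80 mV/decade with these made-up values),
    quadratic above vt, and saturating in vds.
    """
    gate = math.log1p(math.exp((vgs - vt) / (2.0 * n_phi_t))) ** 2
    drain = 1.0 - math.exp(-vds / phi_t)
    return i_scale * gate * drain


def sweep_vdd_window(id_func, vdd_nom, sigma_vdd, sigma_vt, vg0_values):
    """Return (vg0, worst-case Ioff, worst-case Ion) for each window position."""
    vdd_minus = vdd_nom - sigma_vdd   # lowest supply: weakest on-state
    vdd_plus = vdd_nom + sigma_vdd    # highest supply: leakiest off-state
    result = []
    for vg0 in vg0_values:
        i_on_worst = id_func(vg0 + vdd_minus - sigma_vt, vdd_minus)
        i_off_worst = id_func(vg0 + sigma_vt, vdd_plus)
        result.append((vg0, i_off_worst, i_on_worst))
    return result


# Slide a 0.4 V window along the Vg axis in 50 mV steps.
positions = [0.05 * k for k in range(-2, 5)]
for vg0, i_off, i_on in sweep_vdd_window(toy_id, 0.4, 0.02, 0.03, positions):
    print(f"Vg0 = {vg0:+.2f} V  Ioff+ = {i_off:.3e} A  Ion- = {i_on:.3e} A")
```

Moving the window toward higher Vg0 raises both currents, which is the trade-off between leakage and drive that the energy-delay analysis then resolves.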

With Ion, Ioff, and Vdd, the delay D and the total energy consumption per switch Etot can be calculated following Equations (5.2)-(5.4). Worst case conditions are considered by including supply voltage variation (σVdd) and device variation (σVt) (Figure 5.1). In a complementary logic circuit, logic depth (LD) and average activity factor (a) are two key design parameters that are mainly determined by the application and circuit design [7][13]. LD is the number of logic gates a signal needs to travel through in a critical path within one clock cycle. An example circuit with LD=4 is shown in Figure 5.2. a is the average probability that a gate switches during one clock cycle. FO is the fanout. Estat and Edyn are the static energy consumption due to leakage and the dynamic energy consumption for active switching, respectively. Cgc, Cpar, Cwire, and Ctot are the intrinsic capacitance, parasitic capacitance, wiring capacitance, and total load capacitance, respectively.

CFOCCtot gc  par  C wire (5.2)

 CVtot dd CV tot dd DLD  (5.3) Ion ILDon

116

EEtot dyn E stat    aCtot  V dd  V dd  I off  V dd  D (5.4)   2 Iaoff aCtot  V dd 1 _ ILDon Latch Latch LD=4

Delay

Figure 5.2 An example circuit with LD=4.
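The delay and energy expressions can be checked with a short script. The sketch below assumes the forms D = LD·Ctot·Vdd/Ion and Etot = a·Ctot·Vdd² + Ioff·Vdd·D, takes Ctot as a given input, and uses illustrative numbers that are not taken from this work; the function names are ours.

```python
def switching_delay(ld, c_tot, vdd, i_on):
    """Critical-path delay [s]: LD gate delays of C*V/I each."""
    return ld * c_tot * vdd / i_on


def energy_per_switch(a, ld, c_tot, vdd, i_on, i_off):
    """Total energy per switch [J] = dynamic + static over one path delay."""
    delay = switching_delay(ld, c_tot, vdd, i_on)
    e_dyn = a * c_tot * vdd ** 2       # active switching
    e_stat = i_off * vdd * delay       # leakage integrated over the delay
    return e_dyn + e_stat


# Illustrative (made-up) operating point: 1 fF load, 0.4 V supply.
params = dict(a=0.1, ld=10, c_tot=1e-15, vdd=0.4, i_on=1e-5, i_off=1e-9)
d = switching_delay(params["ld"], params["c_tot"], params["vdd"], params["i_on"])
e = energy_per_switch(**params)
print(f"D = {d:.3e} s, Etot = {e:.3e} J")
```

The factored form in the last step of the energy expression makes the design trade-off explicit: leakage matters once Ioff/a becomes comparable to Ion/LD, so low activity factors and deep logic both penalize a high Ioff.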

[Figure 5.3: (a) inverter circuit (Vin, Vout) and its voltage transfer curve with the points A:(0, VH), B:(Vdd, VL), C:(VLi, VHo), D:(VHi, VLo), M:(0.5Vdd, 0.5Vdd), N:(0.25Vdd+0.5VNM0, VNo), N1:(0.25Vdd, 0.75Vdd), N2:(0.5Vdd, Vdd). (b) Currents I1 and I2 at an output level of p∙Vdd: I1≥I2 ↔ VH≥p∙Vdd. (c) Currents I3 and I4: I3≥I4 ↔ N above N1N2 → VNM≥VNM0.]

Figure 5.3 (a) Circuit diagram and voltage transfer curve (VTC) of an inverter. VH and VL are high and low logic levels, respectively. Points C and D are where the slope of VTC is equal to -1. (b) Logic level constraint under the nominal condition: I1≥I2 for the high logic level to be at least p·Vdd (p<1, p=0.9 in this work). (c) Noise margin constraint under the nominal condition: Point N being above Line N1N2 is a sufficient condition for noise margin VNM >VNM0 (VNM0<0.5Vdd), assuming symmetric pull-up and pull-down devices.

To realize complementary logic functions, constraints of high/low logic levels (VH and

VL, respectively) and noise margin must be satisfied. Figure 3.3 (a) shows the circuit diagram and the voltage transfer curve (VTC) of a typical CMOS inverter. VH and VL are defined as the output voltage corresponding to Vin=0 and Vdd, respectively. We make an assumption that the pull-up and pull-down devices are symmetric in this work. To satisfy logic level constraints, VH must be higher than p·Vdd (p<1), translating to the constraint on the current as in Equation (5.5) and Figure 5.3(b). p is set to be 0.9 in this work. Points C and D are where the slope of VTC is equal to -1.

The coordinates of Points C and D on the Vin-Vout planes are (VLi, VHo) and (VHi, VLo), respectively. Noise margin is defined as VNM=VHo-VHi=VLo-VLi, assuming symmetric pull-up and pull-down devices. It requires iterative calculation to accurately calculate the noise margin, given a specific device. To simplify the calculation, we apply a stronger constraint imposed by Equation (5.6) and Figure 5.3(c), which is equivalent to the condition that Point N is above Line N1N2, assuming symmetric pull-up and pull-down devices. This is a sufficient condition for the noise margin constraint of

VNM>VNM0, where VNM0 is the target noise margin and VNM0<0.5Vdd (see proof in Section 5.7). With (5.5) and (5.6) the constraints on logic levels and noise margin are applied without iterative calculation.

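\[
\begin{aligned}
I_1 &= I_d\left(V_{gs}=V_{g0}+V_{dd}^{nom}-\sigma V_{dd}-\sigma V_t,\ V_{ds}=(1-p)V_{dd}^{nom}-\sigma V_{dd}\right)\\
I_2 &= I_d\left(V_{gs}=V_{g0}+\sigma V_t+\sigma V_{dd},\ V_{ds}=pV_{dd}^{nom}+\sigma V_{dd}\right)
\end{aligned}
\tag{5.5}
\]

If \(I_1 \ge I_2\), then \(V_H \ge pV_{dd}^{nom}\) and \(V_L \le (1-p)V_{dd}^{nom}\) \((p<1)\).

\[
\begin{aligned}
I_3 &= I_d\left(V_{gs}=V_{g0}+0.75V_{dd}^{nom}-0.5V_{NM0}-\sigma V_{dd}-\sigma V_t,\ V_{ds}=0.25V_{dd}^{nom}-0.5V_{NM0}-\sigma V_{dd}\right)\\
I_4 &= I_d\left(V_{gs}=V_{g0}+0.25V_{dd}^{nom}+0.5V_{NM0}+\sigma V_t+\sigma V_{dd},\ V_{ds}=0.75V_{dd}^{nom}+0.5V_{NM0}+\sigma V_{dd}\right)
\end{aligned}
\tag{5.6}
\]

If \(I_3 \ge I_4\), then \(V_{NM} \ge V_{NM0}\) \((V_{NM0}<0.5V_{dd}^{nom})\).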
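As a rough illustration, the two current comparisons can be written as simple predicates. The sketch below covers the nominal condition only: the variation terms are omitted, the bias points follow from the coordinates of Point N and Line N1N2 given in Section 5.7, and `toy_id` is a hypothetical subthreshold-style current model (its parameters `i0`, `ss`, and `vt_thermal` are assumptions, not fitted values from this work).

```python
import math

def toy_id(vgs, vds, i0=1e-7, ss=0.1, vt_thermal=0.026):
    """Hypothetical drain current (A/um), monotonic in both Vgs and Vds."""
    return i0 * 10.0 ** (vgs / ss) * (1.0 - math.exp(-vds / vt_thermal))

def logic_level_ok(id_fn, vg0, vdd, p=0.9):
    """Nominal logic-level test: I1 >= I2 with the output sitting at p*Vdd."""
    i1 = id_fn(vg0 + vdd, (1.0 - p) * vdd)   # "on" device holding the output high
    i2 = id_fn(vg0, p * vdd)                  # "off" device leaking at the same output
    return i1 >= i2

def noise_margin_ok(id_fn, vg0, vdd, vnm0):
    """Nominal noise-margin test: I3 >= I4, i.e. Point N lies above Line N1N2."""
    i3 = id_fn(vg0 + 0.75 * vdd - 0.5 * vnm0, 0.25 * vdd - 0.5 * vnm0)
    i4 = id_fn(vg0 + 0.25 * vdd + 0.5 * vnm0, 0.75 * vdd + 0.5 * vnm0)
    return i3 >= i4

if __name__ == "__main__":
    print(logic_level_ok(toy_id, 0.0, 0.4))           # comfortable supply
    print(noise_margin_ok(toy_id, 0.0, 0.4, 0.04))    # VNM0 = 0.1*Vdd
    print(noise_margin_ok(toy_id, 0.0, 0.05, 0.02))   # very low supply
```

Both predicates need only four evaluations of the current model, which is the point of replacing the iterative noise-margin calculation with a sufficient condition.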


A combination of Ion, Ioff, and Vdd corresponds to a point on the Etot-D plane. For each Vdd, an Etot-D curve is generated as the Vdd window is swept along the Vg axis as in Figure 5.1. As shown in Figure 5.4, σVt pushes the curve toward a less energy-efficient direction (upper right), compared with the nominal case. Noise margin sets the lower bound of the achievable delay. The curve terminates at long delays when band-to-band tunneling current (BTBT) dominates. State-of-the-art 32nm Si technology [14] is used as an example device.
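The window sweep can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_id` is a hypothetical monotonic I-V curve, and the delay and energy expressions are simple assumed forms (cycle time of LD gate delays of C·Vdd/Ion, activity-weighted switching energy, leakage integrated over the cycle), not necessarily the exact Equations (5.1)-(5.4). Variation, noise margin, and BTBT are not modeled.

```python
def toy_id(vgs, i0=1e-7, ss=0.1, v_knee=0.3, gm=2e-3):
    """Hypothetical I-V (A/um): exponential up to v_knee, linear above it."""
    if vgs <= v_knee:
        return i0 * 10.0 ** (vgs / ss)
    return i0 * 10.0 ** (v_knee / ss) + gm * (vgs - v_knee)

def sweep_window(vdd, vg0_values, cap=2e-15, ld=10, a=0.1):
    """Slide a Vdd-wide window along the Vg axis; one Etot-D point per position."""
    points = []
    for vg0 in vg0_values:
        ion = toy_id(vg0 + vdd)             # current at the "on" edge of the window
        ioff = toy_id(vg0)                  # current at the "off" edge of the window
        delay = ld * cap * vdd / ion        # assumed form: LD gate delays
        e_dyn = a * cap * vdd ** 2          # assumed form: switching energy
        e_stat = ioff * vdd * delay         # assumed form: leakage energy per cycle
        points.append({"vg0": vg0, "delay": delay, "ion": ion, "ioff": ioff,
                       "e_dyn": e_dyn, "e_stat": e_stat, "e_tot": e_dyn + e_stat})
    return points

if __name__ == "__main__":
    vg0_grid = [-0.3 + 0.05 * k for k in range(13)]   # window positions, in volts
    for pt in sweep_window(0.4, vg0_grid):
        print(f"Vg0={pt['vg0']:+.2f} V  D={pt['delay']:.2e} s  Etot={pt['e_tot']:.2e} J")
```

With these assumed forms, Edyn/Estat reduces to a·Ion/(LD·Ioff), so the ratio of dynamic to static energy tracks Ion/Ioff as the window moves.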

Figure 5.4 Energy-Delay plot for 32nm Si MOSFET [14] at nominal Vdd=0.4V. [Plot: energy per switch (J) vs. delay (sec); LD=10, a=0.1, FO=1. Curves: σVt=0, VNM0=0; σVt=30mV, VNM0=0; σVt=30mV, VNM0=0.15Vdd. Annotations: points that do not satisfy the noise margin constraint; curves terminate when BTBT dominates.]

Figure 5.5 The Pareto curve of minimum Etot vs. D (blue circles) is obtained by joining the minimum Etot and D points for all the possible Vdd, Ion, and Ioff combinations. Points along each Vdd curve are obtained by sweeping the Vdd window along the Vg axis. [Plot: energy per switch (J) vs. delay (sec) for 32nm Si MOSFET at Vdd^nom=0.2V, 0.35V, 0.5V, 0.65V, 0.8V, and 0.95V; LD=10, a=0.1, FO=1, σVt=20mV, VNM0=0.1Vdd^nom.]

For a certain D, the most energy efficient point is the one with minimum Etot while simultaneously satisfying the logic level and noise margin constraints (Figure 5.5). For monotonic I-V curves and a simple voltage-independent input capacitance in Equation (5.2) for a specific device, the optimal combination of Ion, Ioff, and Vdd to achieve minimum Etot is unique for a certain D. The Pareto optimal curve is obtained by joining these optimal points (blue circles in Figure 5.5). Designs on the Pareto curve are all optimal designs with different energy-delay tradeoffs that designers will choose based on application needs. A device that is strictly better should have an Etot-D optimal curve that is toward the origin of the Etot-D plot. The optimal Vdd depends on σVt, VNM0, etc., and varies along the Pareto curve. In Figure 5.5, the Vdd^nom=0.2V curve is above the Vdd^nom=0.35V curve because σVt is roughly Vdd-independent and thus has a greater impact for low Vdd operation.

Using Si MOSFET as a baseline, we compare the state-of-the-art experimental 32nm Si MOSFET [14] with experimental TFET [15], III-V MOSFET [16], and CNFET [17] (Table 5.1 & Figure 5.6).

Figure 5.6 IdVg of published experimental data (symbols) are fitted to analytical models (lines) for each device: (a) 32nm Si MOSFET, (b) TFET, (c) III-V MOSFET, and (d) CNFET. Key parameters and assumptions are listed in Table 5.1. [Plots: Id (A/µm) vs. Vgs (V), one panel per device.]
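Joining the minimum-Etot points into a Pareto curve amounts to discarding every (D, Etot) point that is dominated by another point with both smaller-or-equal delay and smaller-or-equal energy. A minimal sketch, assuming the candidate points from all Vdd, Ion, and Ioff combinations have already been collected and filtered by the logic level and noise margin constraints (the function name and the sample numbers are hypothetical):

```python
def pareto_front(points):
    """Return the non-dominated (delay, energy) pairs, sorted by increasing delay."""
    front = []
    best_energy = float("inf")
    for delay, energy in sorted(points):      # ascending delay, then ascending energy
        if energy < best_energy:              # strictly better than every faster point
            front.append((delay, energy))
            best_energy = energy
    return front

if __name__ == "__main__":
    candidates = [
        (1e-9, 5.0e-16),
        (2e-9, 3.0e-16),
        (2e-9, 4.0e-16),    # same delay as the point above, more energy: dominated
        (4e-9, 3.5e-16),    # slower and more energy than (2e-9, 3.0e-16): dominated
        (8e-9, 1.0e-16),
    ]
    for delay, energy in pareto_front(candidates):
        print(f"D={delay:.1e} s  Etot={energy:.1e} J")
```

Sorting first keeps the filter to a single pass, so the cost is dominated by the sort even when many Vdd curves are pooled.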

Table 5.1 Key parameters for analytical model (Physical equivalent oxide thickness (EOTphy) and source/drain resistances (Rsd) are consistent with ITRS 09 [1]. Note that the ITRS target may not be achievable by practical physical devices.)

Device | Lg (nm) | EOTphy* (nm) | Cgc+Cpar (fF/µm) | Cwire (fF) | Rsd (Ohm-µm) | Other parameters (some parameter definitions available in the references)

Experimental
32nm Si MOSFET [14][18] | 30 | 0.9 | 2.3 | 2 | 230 | SS=98mV/dec, DIBL=130mV/V
TFET [15] | 5 | 3 | 15 | 2 | N/A | Eg=0.56eV, m*=0.06m0, Vtunnel=0.595V [15]
III-V MOSFET [16][19] | 75 | 2.2 | 1.9 | 2 | 330 | SS=95mV/dec, DIBL=125mV/V, In0.7Ga0.3As channel
CNFET [17][20] | 130 | 2 | 0.85 | 2 | 8.8k | SS=70mV/dec, avg. CNT diameter=1.2nm, 3 CNTs/µm

Projection
11nm ITRS target [1] | 10 | 0.57 | 0.98 | 0.5 | 120 | Multi-gate, SS=85mV/dec, DIBL=100mV/V
11nm TFET | 10 | 0.57 | 0.98 | 0.5 | N/A | Eg=0.56eV, m*=0.06m0, Vtunnel=0.595V [15]
11nm III-V-on-Insulator | 10 | 0.57 | 0.85 | 0.5 | 120 | Top-gate IIIV-OI, SS=78mV/dec, DIBL=132mV/V
11nm ballistic CNFET [9] | 10 | 0.57 | 0.9 | 0.5 | 120 | SS=72mV/dec, avg. CNT diameter=1.2nm, 250 CNTs/µm

* Physical effective oxide thickness. Cgc and quantum capacitances are separately calculated.

5.3 Energy-Delay Optimization

We take 32nm Si MOSFET as a sample device in Figure 5.7-5.10. The optimal device design depends on both LD (Figure 5.7a) and a (Figure 5.7b); thus, no universal optimal device design exists. For an optimal design, Etot is minimized for the specified delay D. In Equation (5.4), the 2nd term inside the 2nd parenthesis is usually less than 1

+ 2 (i.e. Estat

Figure 5.7 Minimum Etot vs. D with (a) different LD, and (b) different a. The optimal solutions are LD and a (applications and circuit architecture) dependent. [Plots: minimum energy per switch (J) vs. delay (sec) for 32nm Si MOSFET, σVt=20mV, VNM0=0.1Vdd^nom, FO=1. (a) LD=5, 10, 20 at a=0.1. (b) a=0.01, 0.03, 0.1 at LD=10.]

+ small for low Etot. When Vdd decreases, Ion/Ioff decreases. However, the noise margin sets a minimum requirement for Ion/Ioff. Thus, Ion/Ioff is relatively constant in Figure 5.8. This results in a relatively constant Edyn/Etot, as shown in the inset of Figure 5.8, because Edyn/Estat is proportional to Ion/Ioff. Similar conclusions have been obtained from studies based on Si technology [4][13].

Figure 5.8 Optimal Ion/Ioff corresponding to selected Pareto curves in Figure 5.7. As a result of the noise margin constraint (see text), Ion/Ioff and Edyn/Etot (inset) are roughly constant with D, regardless of LD and a. [Plot: optimal Ion/Ioff vs. delay (sec) for a=0.01, LD=10; a=0.1, LD=10; a=0.1, LD=20; 32nm Si MOSFET, FO=1, σVt=20mV, VNM0=0.1Vdd^nom. Inset: optimal Edyn/Etot vs. delay (sec).]

For a certain delay, Vg0, i.e., the position of the Vdd window, at a fixed Vdd^nom is determined by Ion^- in Equation (5.2). For a smaller delay, this window moves toward the “on” direction in Figure 5.1 to achieve higher Ion^-, with the penalty of reducing Ion/Ioff. Specifically, for MOSFET, this means the Vdd window is required to move from the subthreshold region to the above-threshold region when delay reduces (Figure 5.1). When Ion/Ioff is not enough to satisfy the noise margin constraint, Vdd^nom is forced to increase (Figure 5.9) to achieve the small delay. Because σVt is independent of Vdd^nom, Vdd^nom is bounded by σVt at large delay for devices with a constant subthreshold slope, such as MOSFETs. As revealed by Figure 5.9, Vdd^nom in the optimal cases depends on the target delay, as well as LD and a, and it is not a technology constant as prescribed by the ITRS. The optimal Ion and Ioff also depend on LD and a (Figure 5.10). a has a big impact on Ioff. Because Vdd varies within a small range of < 3x, both Ion and Ioff are linear with D in log-log scale (see Equation (5.3) and Figure 5.10), with a slope of around -1.

Figure 5.9 Optimal Vdd^nom corresponding to selected Pareto curves in Figure 5.7. [Plot: optimal Vdd^nom (V) vs. delay (sec) for a=0.01, LD=10; a=0.1, LD=10; a=0.1, LD=20. Annotations: Vdd is bounded due to σVt; Vdd increases to satisfy the noise margin constraints.]

Figure 5.10 Optimal Ion and Ioff corresponding to selected Pareto curves in Figure 5.7. [Plot: optimal Ion and Ioff (A/µm) vs. delay (sec) for a=0.01, LD=10; a=0.1, LD=10; a=0.1, LD=20; FO=1, σVt=20mV, VNM0=0.1Vdd^nom. Both sets of curves are annotated with slope=-1.]

5.4 Comparison: Si, III-V, CNT, TFET

5.4.1 Today’s Devices

The best-available emerging devices [14]-[18] are evaluated (Figure 5.11) using the methodology described above, assuming the same σVdd and σVt. Today’s III-V MOSFET and CNFET are 1.5-2x and 2-3.5x more energy efficient than the Si device for >1GHz and <1GHz applications, respectively. III-V MOSFET is favored for above-1GHz applications over Si MOSFET, while band-to-band tunneling current starts to limit its use at long delays (>1ns). III-V has no advantage in the very small delay regime. For a certain Vdd, III-V MOSFET can operate in a more “off” mode (i.e., higher Ion/Ioff) in Figure 5.1, while Si MOSFET has to operate in a more “on” mode (i.e., lower Ion/Ioff) to achieve the same Ion. In the >1GHz region, III-V has little advantage if pushed to very high performance (points to the left), because the current degradation at high Vgs is larger for III-V MOSFET than for a Si device due to higher Rsd. The curve terminates on the right due to band-to-band tunneling (BTBT) current.

Figure 5.11 Minimum Etot vs. D for different device structures, based on today’s best available experimental results. [Plot: energy per switch (J) vs. delay (sec) for 32nm Si, TFET, III-V MOSFET, and CNFET; LD=10, a=0.1, FO=1, σVt=20mV, VNM0=0.1Vdd^nom. Annotations: 1.5-2x and 2-3.5x gains marked in the >1GHz and <1GHz regions; IIIV is more energy efficient for lower power.]

Despite the sparse CNT density (~3-5 CNT/µm), the CNFET in [17] still outperforms 32nm Si due to small self-loaded capacitance of sparse channels and good electrostatic control [9]. TFET is less energy efficient than Si MOSFET, mainly due to the large intrinsic capacitance of today’s experimental device with a long gate length.

5.4.2 Projection to 10 nm Gate

Performances of emerging device structures (Figure 5.12) are projected at the 11nm technology node with both the conventional methodology and the new methodology described above (Figure 5.13a, Table 5.1). Good electrostatic gate control is required to operate the device with sub-1V Vdd while satisfying the noise margin constraint. Hence, a III-V-on-Insulator (IIIV-OI) structure is implemented. With the conventional methodology, assuming a fixed Vdd of 0.66V and a fixed Ioff of 100nA/µm, the device delays of IIIV-OI, TFET, and CNFET are 1.18, 100, and 0.68 times that of the ITRS target device. The total energy consumption per switch of IIIV-OI, TFET, and CNFET is 0.91, 1.05, and 0.95 times that of the ITRS target device. With the new methodology, IIIV-OI outperforms the ITRS target by ~1.25x around 3-10GHz. TFET is ~5-10x more energy efficient than the ITRS target when high speed is not required (0.1-1GHz), because the small subthreshold slope enables TFET to operate at very low Vdd (0.1x V) (Figure 5.13b). Thus TFET is a promising candidate for applications where high speed is not required. CNFET can be 2-3x more energy efficient than the ITRS target around 10-100GHz due to good electrostatic control and carrier transport. As a result, the ideal CNFETs with ballistic transport are favored for high speed applications. The optimal power supply depends critically on the device characteristics and is different for the different emerging devices.

Figure 5.12 IdVg of projected devices at Vdd=0.66V in (a) log and (b) linear scale. The threshold voltages are adjusted to achieve Ioff=100nA/µm at Vgs=0. The Ion for CNFET, TFET, and III-V-on-Insulator are 1.4, 0.01, and 0.77 times that of the 11 nm ITRS target (2.0mA/µm) for a fixed Ioff=100nA/µm and Vdd=0.66V comparison. The current of CNFET saturates at high Vgs due to source saturation [9]. [Plots: Id vs. Vgs (V) for 32nm Si, 11nm ITRS, 11nm TFET, 11nm CNFET, and 11nm IIIV-OI; (a) Id in A/µm, (b) Id in mA/µm.]

Figure 5.13 (a) Minimum Etot vs. D and (b) corresponding Vdd^nom for emerging devices projected for the 11nm technology node. A device that is strictly better should have Etot-D optimal curves that are toward the origin (lower left) of the Etot-D plot. The experimental 32nm Si MOSFET [14] results are shown for comparison. [Plots: (a) energy per switch (J) and (b) Vdd^nom (V) vs. delay (sec) for 32nm Si, 11nm ITRS, 11nm TFET, 11nm IIIV-OI, and 11nm CNFET; LD=10, a=0.1, FO=1. Annotations in (a): IIIV-OI 1.25x, CNFET 2-3x, TFET 5-10x.]

5.5 Conclusions

This chapter develops a simple, new universal methodology to fairly evaluate different device structures, taking into consideration the power/energy and delay tradeoff as well as noise margin, circuit design, and application constraints. Ioff and Vdd, which are conventionally fixed as technology specifications, are treated as free variables and optimized to obtain the best energy-delay trade-off. While the conventional fixed-Ioff, fixed-Vdd method based on CV/I shows that III-V and TFET do not meet the 11 nm ITRS target, these emerging devices both provide better energy-delay tradeoff for certain delay target ranges. This highlights the usefulness of the methodology presented in this chapter. According to this methodology, today’s best-available III-V and CNFET can outperform the best Si FET by 1.5-2x and 2-3.5x, respectively. Projected into the 10nm gate length regime, III-V-on-Insulator, CNFET, and TFET are 1.25x, 2-3x, and 5-10x (for FO1 delays of 0.3ns, 0.1ns, and 1ns respectively) better than the ITRS target at the same gate length.

5.6 Bibliography

[1] ITRS http://public.itrs.net/.

[2] R. Dennard et al., “Design of Ion-Implanted MOSFET’s with Very Small Physical Dimensions,” IEEE Journal of Solid-State Circuits, vol. SC-9, no. 5, pp. 256-268, Oct 1974.

[3] M. H. Na, et al., "The effective drive current in CMOS inverters," in Electron Devices Meeting, 2002. IEDM '02. Digest. International, 2002, pp. 121-124.

[4] D. Markovic, et al., "Methods for true energy-performance optimization," Solid- State Circuits, IEEE Journal of, vol. 39, pp. 1282-1293, 2004.

[5] M. Horowitz, et al., "Scaling, power, and the future of CMOS," in Electron Devices Meeting, 2005. IEDM Technical Digest. IEEE International, 2005, pp. 7 -15.

[6] H. Kam, et al., "Circuit-level requirements for MOSFET-replacement devices," in Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, 427.

[7] B. Zhai, et al., "Energy-Efficient Subthreshold Processor Design," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 17, pp. 1127-1137, 2009.

[8] D. J. Frank, et al., "Optimizing CMOS technology for maximum performance," IBM Journal of Research and Development, vol. 50, pp. 419-431, 2006.

[9] L. Wei, D. Frank, L. Chang, H.-S. P. Wong, “A Non-iterative Compact Model for Carbon Nanotube FETs Incorporating Source Exhaustion Effects,” 2009 IEEE International Electron Devices Meeting (IEDM 2009), pp. 917 – 920, Baltimore, USA, December 6 – 9, 2009.


[10] H. Kam, “MOSFET Replacement Devices for Energy-Efficient Digital integrated Circuits,” Ph. D. Thesis, University of California at Berkeley, 2009.

[11] A. Asenov et al., “Simulation of Intrinsic Parameter Fluctuations in Decananometer and Nanometer-Scale MOSFETs," IEEE Trans. Electron Devices, p. 1837, September 2003.

[12] D. Reid, et al., “Statistical enhancement of combined simulations of RDD and LER variability: What can simulation of a 10^5 sample teach us?,” in Electron Devices Meeting (IEDM), 2009 IEEE International, 2009, pp. 703-706.

[13] C. Piguet, et al., "Leakage Reduction at Architectural Level," in Integrated Circuit Design and Technology, 2006. ICICDT '06. 2006 IEEE International Conference on, 2006, pp. 1-4.

[14] S. Natarajan, et al., "A 32nm logic technology featuring 2nd-generation high-k + metal-gate transistors, enhanced channel strain and 0.171 µm² SRAM cell size in a 291Mb array," Electron Devices Meeting, 2008. IEDM 2008. IEEE International, 2008, pp. 941-943.

[15] S. H. Kim, et al., “Germanium-source tunnel field effect transistors with record high ION/IOFF," in VLSI Technology, 2009 Symposium on, 2009, pp. 178-179.

[16] M. Radosavljevic, et al., "Advanced high-K gate dielectric for high-performance short-channel In0.7Ga0.3As quantum well field effect transistors on silicon substrate for low power logic applications," in Electron Devices Meeting (IEDM), 2009 IEEE International, 2009, pp. 319-322.

[17] A. D. Franklin, et al., "Current Scaling in Aligned Carbon Nanotube Array Transistors With Local Bottom Gating," Electron Device Letters, IEEE, vol. 31, pp. 644-646, 2010.

[18] A. Khakifirooz and D. A. Antoniadis, "MOSFET Performance Scaling--Part I: Historical Trends," Electron Devices, IEEE Transactions on, vol. 55, pp. 1391-1400, 2008.

[19] S. Oh and H. S. P. Wong, "A Physics-Based Compact Model of III-V FETs for Digital Logic Applications: Current-Voltage and Capacitance-Voltage Characteristics," Electron Devices, IEEE Transactions on, vol. 56, pp. 2917-2924, 2009.

[20] J. Deng and H. S. P. Wong, "A Compact SPICE Model for Carbon-Nanotube Field-Effect Transistors Including Nonidealities and Its Application--Part II: Full Device Model and Circuit Performance Benchmarking," Electron Devices, IEEE Transactions on, vol. 54, pp. 3195-3205, 2007.

5.7 Appendix

Denote VNM0 as the target noise margin (VNM0<0.5Vdd), and VNM as the circuit noise margin. To satisfy the noise margin constraint, we want VNM>VNM0. Here, we prove the statement that Point N being above Line N1N2 is a sufficient condition for VNM>VNM0 (VNM0<0.5Vdd), assuming symmetric pull-up and pull-down devices.

Figure 5.3 (a) shows the VTC curve for a typical inverter. Points C and D are where the slope of VTC is equal to -1. The coordinates of Point C, D, M, N, N1, and N2 are defined in Figure 5.3 (a). Denote C(VinC, VoutC) and N(VinN, VoutN). We have,

VinN=0.25Vdd+0.5VNM0, and VinC=VLi, VoutC =VHo. (5.7)

Noise margin is defined as VNM=VHo-VHi=VLi-VLo. Assuming symmetric pull-up and pull-down devices, we have VHo+VLo=VHi+VLi=Vdd.

Hence, VNM=VLi+VHo-Vdd. (5.8)

The slope of VTC becomes more and more negative, i.e. the absolute value increases, when Vin increases from 0 to 0.5Vdd (i.e., from Point A to M). The slope at Point C is -1 on the Vin-Vout plane.

(i) If Point N is on the left of Point C, the VTC between N and C has a slope between -1 and 0, so

(VoutN-VoutC)/(VinN-VinC) > -1, VoutN-VoutC>0, and VinN-VinC<0.

Hence, VoutC+VinC > VoutN+VinN.

(ii) If Point N is on the right of Point C (and still to the left of Point M, since VNM0<0.5Vdd), the VTC between C and N has a slope below -1, so

(VoutN-VoutC)/(VinN-VinC) < -1, VoutN-VoutC<0, and VinN-VinC>0.

Hence, VoutC+VinC > VoutN+VinN.

(iii) If Point N and Point C overlap, VoutC+VinC = VoutN+VinN.

In summary, VoutC+VinC ≥ VoutN+VinN. (5.9)

With N1(0.25Vdd, 0.75Vdd), N2(0.5Vdd, Vdd), and N(VinN=0.25Vdd+0.5VNM0, VoutN), Point N being above Line N1N2 is equivalent to

VoutN>0.5Vdd+VinN. (5.10)

Combining (5.7), (5.9), and (5.10), we have

VLi+VHo=VoutC+VinC ≥ VoutN+VinN> Vdd+VNM0. (5.11)

Substitute (5.11) into (5.8), we have, VNM>VNM0.
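The statement can be checked numerically on a toy transfer curve. The sketch below assumes a symmetric tanh-shaped VTC with a hypothetical `gain` parameter (it must exceed 2 so that a slope of -1 exists); this shape is an illustration, not a device model from this work. It locates Point C by bisection, evaluates VNM from (5.8), and tests condition (5.10) at Point N.

```python
import math

def vtc(vin, vdd, gain):
    """Hypothetical symmetric inverter VTC: Vout(Vdd - Vin) = Vdd - Vout(Vin)."""
    return 0.5 * vdd * (1.0 - math.tanh(gain * (vin - 0.5 * vdd) / vdd))

def vtc_slope(vin, vdd, gain):
    """Analytic dVout/dVin of the toy VTC; it grows more negative up to Vdd/2."""
    u = gain * (vin - 0.5 * vdd) / vdd
    return -0.5 * gain / math.cosh(u) ** 2

def noise_margin(vdd, gain):
    """Find Point C (slope = -1, Vin < Vdd/2) by bisection, then apply (5.8)."""
    lo, hi = 0.0, 0.5 * vdd
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if vtc_slope(mid, vdd, gain) > -1.0:
            lo = mid                      # still to the left of Point C
        else:
            hi = mid
    v_li = 0.5 * (lo + hi)
    v_ho = vtc(v_li, vdd, gain)
    return v_li + v_ho - vdd

def point_n_above_line(vdd, gain, vnm0):
    """Condition (5.10): VoutN > 0.5*Vdd + VinN at VinN = 0.25*Vdd + 0.5*VNM0."""
    vin_n = 0.25 * vdd + 0.5 * vnm0
    return vtc(vin_n, vdd, gain) > 0.5 * vdd + vin_n

if __name__ == "__main__":
    vdd, gain = 1.0, 10.0
    print("VNM =", noise_margin(vdd, gain))
    for vnm0 in (0.1, 0.2, 0.3):
        print(vnm0, point_n_above_line(vdd, gain, vnm0))
```

For this toy curve the condition fails at VNM0=0.3Vdd even though the actual noise margin is slightly above 0.3Vdd, which shows that the test is sufficient but not necessary.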


Chapter 6 Conclusions

6.1 Summary of Contributions

In this thesis, the design space is explored for future Si CMOS technologies as well as for carbon nanotube field effect transistors, a promising technology to replace or complement Si technology. Compact models of transport properties and capacitive components of different device structures have been developed to facilitate circuit-level analysis and system-level optimization. Possible ways of extending the semiconductor technology roadmap are proposed. A new benchmark and design methodology is established based on energy-delay optimization for general emerging nanoelectronics.

Si Technology

When Dennard scaling is less effective, technology boosters are introduced to improve the device intrinsic performance. Currently, one of the main approaches is strengthening the device intrinsic electrostatic integrity. Intensive research has been conducted at the device level, such as replacing bulk structure with SOI and FinFET structure, and introducing high-k/metal gate materials. In Chapter 2, starting from a device behavior model, we study the impact of improving device electrostatic integrity on the circuit-level performance by analyzing the device operating conditions in different circuit environments. Both logic switching applications and SRAM applications are discussed. Our study shows the importance of comprehending the circuit environment and system application when benchmarking device performance.


As a general guideline concluded from our study, strengthening the electrostatic integrity is preferred for circuits with stacked logic gates and low-power applications for logic switching applications. For SRAM applications, we demonstrate that strengthening the device electrostatic integrity can potentially improve both read and write stability simultaneously.

As the non-scalable parasitic capacitances and resistances are no longer negligible compared to the intrinsic properties, the device-level parasitic components become critical for circuit-level performance. Parasitics engineering is inevitable for future technology generations. In Chapter 3, we first analyze the geometric and electrical properties of different capacitance components and develop compact models for a variety of conventional and emerging device structures, including bulk devices, fully-depleted silicon-on-insulator (FDSOI) devices, and planar double gate (DG) devices. With these models, we examine the impact of different device parasitic capacitances on circuit-level performance. When physical gate length can no longer be effectively scaled down and traditional boosters (e.g., strain, high-k/metal gate) show diminishing returns, we propose to engineer the parasitic components by “selective device structure scaling” as a fruitful path to extend the technology roadmap. By judiciously engineering the device parasitic resistance and parasitic capacitance, and considering the impact of the interconnect wiring capacitance, selective device structure scaling will enable technology scaling and contacted gate pitch scaling for several generations beyond the currently perceived limits. We show that for a fully custom designed 53-bit multiplier, the selectively scaled devices with reduced footprint achieve 30% smaller layout area, 25% higher speed, and 10% better energy efficiency, compared with conventional devices.


Beyond Si technology

Beyond Si CMOS scaling, the carbon nanotube field effect transistor (CNFET) is among the promising candidates to extend the technology. Previous device-level analyses have suggested significant benefits, but it is not clear what the circuit/system benefits are at the chip level. We take an application-oriented approach, as opposed to the device-level modeling employed in previous works. In Chapter 4, we first develop a fast, non-iterative analytical model for CNFET including both carrier transport and capacitance components, which are the key determinants of performance. Then, the chip-level system (processor plus on-chip cache) performance of CNFET technology is optimized and compared with Si-based technology. Our optimization results show that a CNFET chip can operate 5 times faster than a partially-depleted silicon-on-insulator (PDSOI) chip, for chip power levels from 0.01W to 100W. The optimization methodology and platform can potentially be transferred to other emerging devices.

Technology assessment

Besides full chip-level optimization, simple technology assessment and optimization methodologies are also required, especially at the early development stage of a new technology. Conventionally, aspiring emerging device technologies (e.g. III-V, CNFET, TFET) are often targeted to outperform Si FETs at the same off-state current (Ioff) and supply voltage (Vdd). We present a new device technology assessment methodology based on energy-delay optimization which treats Ioff and Vdd as “free variables”, bounded by constraints due to device variation and circuit noise margin. We show that for each emerging device (III-V, CNFET, TFET), there is a corresponding and different optimal set of Ioff and Vdd, and an optimal energy-delay. Today’s best-available III-V and CNFET can outperform the best Si FET by 1.5-2x and 2-3.5x, respectively. Projected into the 10nm gate length regime, III-V-on-Insulator, CNFET, and TFET are 1.25x, 2-3x, and 5-10x (for FO1 delays of 0.3ns, 0.1ns, and 1ns respectively) better than the ITRS target at the same gate length.

In conclusion, this thesis addresses the significance of a holistic view across the traditional boundaries of device, circuit, and system in technology assessment and projection. Guided by this concept, we chart a new path for Si CMOS technology scaling for future technology generations. We also accomplish for the first time chip- level performance assessment and optimization of new emerging transistors such as CNFETs. This establishes a new benchmark and design methodology for emerging nanoelectronics.

6.2 Recommendations for Possible Future Work

As dimensional scaling of silicon CMOS approaches its fundamental limit, opportunities arise together with challenges for the post-Si era. Nanomaterials have electrical, thermal, mechanical, sensing, and optical properties that are promising and generally superior to those of larger bulk solids. Devices made from nanomaterials are being heavily explored for logic applications, memory, interconnects, etc. For example, carbon-based devices/interconnects and nanoscale non-volatile memories are expected to extend and sustain the historical integrated circuit scaling cadence and reduction of cost/function into future decades. Tunneling field effect transistors and nano-electro-mechanical devices are also among the competitive candidates for low power applications. In addition, organic devices show some attractive properties such as mechanical flexibility. for complex computational/low power applications are also active research topics. Then the research direction naturally emerges: for a target application, what will be the best technologies? A convincing projection will be substantiated by a thorough understanding of the fundamental physics and a system-level analysis taking into account the characteristics of the target application in a global picture:

• Compact Modeling, as a tool for fast circuit simulation and computer-aided system design, is required for a new device to be widely adopted and integrated into future technologies. The first step toward compact modeling is to understand the fundamental device physics; the essential physics are then analytically modeled in a computationally efficient way.

• Application-Oriented Optimization Strategies are necessary for technology benchmarking, as well as for simple but robust design toward large-scale and cost-efficient manufacturing. On the one hand, simple metrics for evaluating technology are useful for prediction and to guide device design when the technology is still under exploration or development. On the other hand, full system-level optimization is indispensable for more accurate benchmarking and more efficient design.

• Energy Efficient Systems are vital in the long term for both environmental and technical reasons. From an environmental impact point of view, the world's data centers consumed 152B kWh in 2005, which was 1% of the world energy consumption that year [1]. This number is expected to keep increasing. From a technology point of view, the performance is limited by power consumption. The cooling capability sets an upper limit for the total power consumption of a system. To realize faster, more complex, and more powerful functions, the system has to operate energy efficiently. A holistic view is required. Currently, logic, storage, and communication are studied separately. Most technology standards, such as supply voltages, are specified to favor the device operations. Devices to build SRAM and interconnect repeaters are realized with the same set of devices customized for logic purposes. This makes the system less efficient. To build more energy efficient systems, the requirements for all the different function blocks must be considered, and the performance and power consumption must be balanced across the entire system. Emerging devices and technology give more dimensions and freedom to build a function block or the entire system. Different devices have different strengths and weaknesses. With a holistic optimization, a hybrid system will allocate to each module or application the types of devices and/or circuits that are most energy-efficient for it. Opportunities and challenges arise from judicious choices of devices, design methodology, and system integration.

6.3 Bibliography

[1] J. G. Koomey, “Estimating total power consumption by servers in the U.S. and the world.” [Online]. Available: http://enterprise.amd.com/Downloads/.


Appendix Modeling and Performance Comparison of 1-D and 2-D Device including Parasitic Gate Capacitance and Screening Effect

A.1 Introduction

Nanowire and nanotube field effect transistors have similar electrostatic structures: they all have quasi-1-dimensional channels. This type of device, denoted as a 1D device in this work, is widely studied and considered to be a potential solution to continue the scaling trends beyond 32nm technology, because of the better electrostatic control of the gate over the channel (gate-all-around structure) and/or the better transport properties (less scattering, higher mobility, etc.) [1-6] than 2D (Ultra Thin Body or “UTBSOI”) devices and 3D (planar bulk) devices. However, the device performance is not solely determined by the current drive. It is also critically dependent on device parasitics, especially the parasitic capacitances. In fact, in the limit of zero gate length and completely ballistic transport, the device performance is determined by the parasitic capacitances [7]. The intrinsic gate capacitance (Cgc) determines the channel charge density and should be maximized. On the other hand, the parasitic capacitance (Cpar) increases the RC charging time and should be minimized. The ratio of the parasitic capacitance (Cpar) over the gate intrinsic capacitance (Cgc) for 1D devices is larger than that for 2D devices [8]. For the case of multiple and dense 1D channels per device, the channel screening effect degrades the current carried per channel to drive capacitive loads. In Section A.2, we explain the delay metric for performance benchmarking. In Section A.3, analytical models for gate capacitances of 1D devices are developed. Section A.4 analyzes the disadvantages and advantages of 1D devices over 2D devices in terms of capacitance and current. With the models developed in Section A.3, the 1D devices are optimized and the overall benefits of 1D devices are evaluated in Section A.5. Possible scaling roadmaps for 1D devices are proposed in Section A.6.

A.2 Device Delay Metric

The device delay is typically benchmarked by the metric τ=CtotV/Ieff [9]. τ can be expanded into four terms: Cgc/Ioff, Ioff/Ieff, (1+Cpar/Cgc) and V as in (A.1),

\[
\tau = \frac{C_{tot}V}{I_{eff}} = \frac{(C_{gc}+C_{par})V}{I_{eff}} = \frac{C_{gc}}{I_{off}}\cdot\frac{I_{off}}{I_{eff}}\cdot\left(1+\frac{C_{par}}{C_{gc}}\right)\cdot V. \tag{A.1}
\]

Here, Ctot=Cgc+Cpar. Ieff and Ioff are the effective current and off-state current, respectively. The effective current benchmarks the driving capability of the device. A widely used expression for Ieff was developed by Na et al. [9]. For 2D and 3D devices, the current can be normalized to the physical device width. For a 1D device with multiple channels, normalizing the current to the physical device width is difficult because of the non-planar geometry, channel density, and screening [10]. To compare the different technologies (1D, 2D and 3D) fairly, we compare devices at the same Ioff/Cgc, because Cgc is proportional to the channel charge in the on-state. For a constant Ioff/Cgc, the gate delay τ is determined by both Ieff/Ioff and (1+Cpar/Cgc). A key comparison between 1D and 2D devices is the tradeoff between the “actual current benefit”

$$\eta_I = \frac{(I_{eff}/I_{off})_{1D}}{(I_{eff}/I_{off})_{2D}}$$

and the “capacitance penalty”

$$\eta_c = \frac{(1+C_{par}/C_{gc})_{1D}}{(1+C_{par}/C_{gc})_{2D}}.$$

Both ηI and ηc depend on the material properties and the device geometry. The ratio Ieff/Ioff is determined by both the degree of gate control over the channel charge and the carrier transport in the channel. Given the same Ioff/Cgc, both a steeper subthreshold slope and a higher carrier mobility/velocity in the saturation region help improve Ieff/Ioff. For FETs made of the same material at a constant Ioff/Cgc, a general conclusion is that (Ieff/Ioff)3D < (Ieff/Ioff)2D < (Ieff/Ioff)1D [11-14]. On the other hand, 1D devices suffer from a larger Cpar/Cgc than 2D devices [8]. The overall performance of 1D devices is the result of the trade-off between the actual current benefit (ηI) and the capacitance penalty (ηc). The performance comparison of 1D and 2D devices, taking into account Cpar and the channel screening effect, is addressed in Section A.4, followed by the device performance optimization in Section A.5. To help with the calculation and analysis, we first introduce the models for the gate capacitances in Section A.3.
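The bookkeeping behind the delay metric can be checked numerically. The sketch below is only an illustration: the `Device` record, its field names and all numbers are assumptions made for the example, not values from this work. It evaluates τ=CtotV/Ieff directly and as the four-factor product of (A.1), and shows that, when two devices share Ioff/Cgc and V, the delay ratio τ2D/τ1D equals ηI/ηc.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Illustrative device record; the field names are assumptions."""
    c_gc: float   # intrinsic gate-to-channel capacitance Cgc (F)
    c_par: float  # parasitic capacitance Cpar (F)
    i_eff: float  # effective drive current Ieff (A)
    i_off: float  # off-state current Ioff (A)
    v: float      # supply voltage V (V)


def delay(d: Device) -> float:
    """Delay metric tau = Ctot * V / Ieff with Ctot = Cgc + Cpar."""
    return (d.c_gc + d.c_par) * d.v / d.i_eff


def delay_factored(d: Device) -> float:
    """The same metric written as the four-factor product in (A.1)."""
    return (d.c_gc / d.i_off) * (d.i_off / d.i_eff) * (1.0 + d.c_par / d.c_gc) * d.v


def eta_i(one_d: Device, two_d: Device) -> float:
    """Actual current benefit: ratio of Ieff/Ioff, 1D over 2D."""
    return (one_d.i_eff / one_d.i_off) / (two_d.i_eff / two_d.i_off)


def eta_c(one_d: Device, two_d: Device) -> float:
    """Capacitance penalty: ratio of (1 + Cpar/Cgc), 1D over 2D."""
    return (1.0 + one_d.c_par / one_d.c_gc) / (1.0 + two_d.c_par / two_d.c_gc)


# Made-up numbers, chosen so that both devices share Ioff/Cgc and V.
dev_2d = Device(c_gc=1.0e-16, c_par=0.5e-16, i_eff=1.0e-5, i_off=1.0e-9, v=1.0)
dev_1d = Device(c_gc=0.5e-16, c_par=0.75e-16, i_eff=1.5e-5, i_off=0.5e-9, v=1.0)

speedup = delay(dev_2d) / delay(dev_1d)
# The two printed numbers agree: tau_2D / tau_1D = eta_I / eta_c.
print(speedup, eta_i(dev_1d, dev_2d) / eta_c(dev_1d, dev_2d))
```

In this made-up case a threefold current benefit is reduced to a 1.8-fold delay improvement by the capacitance penalty, which is the kind of trade-off the following sections quantify.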

A.3 Capacitance Modeling

The planar gate structure (Figure A.1) is chosen here since (a) it is more compatible with the conventional MOSFET process and (b) a high density of parallel channels is possible with this structure. In addition, the planar gate structure does not rely on fabrication techniques that are not practiced in today’s technology. We abstract the channels of a 1D device as ideal conducting cylinders with radius r and channel pitch s (if there is more than one channel). The gate and the contact plugs are also ideal conductors with the same width and height. The total number of channels under the gate is N. The total gate capacitance is decoupled into three components: gate-to-channel capacitance (Cgc), gate outer-fringe capacitance (Cof) and gate-to-plug capacitance (Cgtp). The latter two are the parasitic parts of the gate capacitance (Cpar=Cof+Cgtp). The details of the structures and the capacitance decompositions are shown in Figure A.1. The analytical models for Cgc and Cpar are developed to match 3D numerical simulations. The analytical models are semi-empirical, with a limited number of fitting parameters to reduce complexity while keeping the essential physical insight. The 3D simulation is done with Maxwell 3D [15], a 3D electromagnetic solver. The gate, plug, channel and source/drain extension are set to be ideal conductors, and the channel and source/drain extension are set to the same potential. The electrostatic field is solved for the entire space and the capacitances are extracted from the solution.

Figure A.1 3D simulation structure. (a) Bird’s eye view showing multiple conducting 1D channels. (b) Cross section along the gate width.

A. Gate-to-channel capacitance: Cgc

We start from the case of a single isolated channel under the gate (denoted with superscript “S”). With no screening effect, as the channel is isolated, the gate-to-channel capacitance can be expressed by (A.2).

S 2ox 0 CaLgc 1 g , (A.2) 2 2 hr  1 gc hr2 cosh ox sub ln  rrox sub 3 

Here, the quantum capacitance is assumed to be much larger than the electrostatic capacitances for simplicity. The assumption must be revised for cases where the 1D channel has a small density of states (e.g. carbon nanotubes [16,17]). The gate control over the channel region could be less efficient when quantum capacitance is included, and the performance of 1D devices is then expected to be worse. We define λ as half of the feature size, which historically is about 1/8 of the contacted gate pitch of the technology node [18-22]. When the overhang of the gate beyond the edge of the cylinder array in the device width direction (Figure A.1b) is larger than 1.5 times the feature size (i.e. Wov>3λ), the approximation that most of the fringing electric field lines are shielded by the gate is valid. The loss of the fringing E-field beyond 3λ can be corrected by the βgc r term and the pre-factor a1. Equation (A.2) reduces to the model in [8] for small r. The distribution of the charges on the channel surface is more complicated when the high-k gate dielectric (κox) differs from the substrate dielectric (κsub) [8]; this can be taken into account by adding image charges to satisfy the electrostatic boundary condition. The model includes the impact of the image charge by adding a term $\frac{\kappa_{ox}-\kappa_{sub}}{\kappa_{ox}+\kappa_{sub}}\ln\frac{h+2r}{3r}$ after the usual inverse hyperbolic cosine term.

Following the methodology in [8], the electrostatic capacitances between the gate and one of the N identical channels in parallel can be grouped into two types: the channels in the middle of the array (with superscript ‘M’) and those at the edge of the array (with superscript ‘E’) (Figure A.1b). The channels at the edge have neighbor channels on one side, while the channels in the middle have neighbors on both sides. As a result, the capacitances of the middle channels deviate more from the non-screened (isolated channel) case, due to the screening from both sides. When a voltage V is applied across the gate electrode and the channel array, charges accumulate on the gate and the channels. When the channel is isolated, the coupling capacitance between the two electrodes is denoted by CS. When there are adjacent channels, the voltage drop between the gate electrode and the selected channel is caused not only by the charges on the selected channel but also by the charges on the adjacent channels. The “screening capacitance”, Csr, corresponds to the equivalent capacitance due to the additional voltage drop originating from the adjacent channels. η1 and η2 account for the difference in electrostatic environment, relative to the neighbors, between the two channel types. The two types of capacitances (CE and CM) are expressed by (A.3). The coupling between the gate and the channel is weakened when there are other channels nearby, because of the “screening” from the neighbors.

$$C^{E} = \frac{C^{S} C_{sr}}{\eta_1 C^{S} + C_{sr}}, \qquad C^{M} = \frac{C^{S} C_{sr}}{2\eta_2 C^{S} + C_{sr}}. \tag{A.3}$$

The screening capacitance for the gate-to-channel capacitance can be modeled by (A.4). For an infinitesimally small radius, charges are equally distributed on the surface [8] and (A.4) reduces to the form described in [8]. When the channel radius increases, charges are denser on the side closer to the gate than on the opposite side; κgc r includes the effect caused by this charge redistribution. The effective channel pitch becomes s - κgc r, as used in (A.4).

4 CaL ox 0 , (A.4) gc_2 sr g 2 srthhr 2 22 gc ox   ln 2 22 srthhr 2 gc ox  

With (A.2)-(A.4), the gate-to-channel capacitance when there are multiple channels in parallel can be written as (A.5).

$$C_{gc}^{E} = \frac{C_{gc}^{S} C_{gc\_sr}}{C_{gc}^{S} + C_{gc\_sr}}, \qquad C_{gc}^{M} = \frac{C_{gc}^{S} C_{gc\_sr}}{2C_{gc}^{S} + C_{gc\_sr}}. \tag{A.5}$$

By comparing with full 3D numerical simulation, we obtain values for βgc (=0.15) and κgc (=1). a1 and a2 are fitting parameters of about 0.9 and 1.2, respectively. Figure A.2 shows the comparison between the model and the 3D simulation. The device parameters that are relevant for 32nm to 11nm technology nodes (as shown in Table A.1) are used. For small EOT values, it is necessary to use a high-k gate dielectric to limit gate tunneling leakage. For the high-k case, EOT is defined as the distance from the bottom of the gate to the top of the channel multiplied by κSiO2 / κhighk, where κhighk and κSiO2 are the dielectric constants of the high-k dielectric and SiO2, respectively.

Table A.1 Device parameters used for 32nm and 11nm technology.

Node    λ       EOT      Lg      Lsd     Hgate
32nm    16nm    0.8nm    18nm    46nm    36nm
11nm    6nm     0.5nm    10nm    12nm    20nm
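For use in numerical work, the rows of Table A.1 can be kept in a small lookup structure. The sketch below is only a convenience illustration: the dictionary layout and the helper names are assumptions. The derived quantities use relations stated in the text (contacted gate pitch of about 8λ, gate overhang criterion Wov>3λ) and the pitch-to-density conversion implied by the figure legends (s=20nm corresponds to 50 channels/μm).

```python
# Table A.1, all lengths in nm. The dictionary layout is an illustrative choice.
TABLE_A1 = {
    "32nm": {"lambda": 16.0, "EOT": 0.8, "Lg": 18.0, "Lsd": 46.0, "Hgate": 36.0},
    "11nm": {"lambda": 6.0, "EOT": 0.5, "Lg": 10.0, "Lsd": 12.0, "Hgate": 20.0},
}


def contacted_gate_pitch_nm(node: str) -> float:
    """Approximate contacted gate pitch, using lambda ~ pitch / 8."""
    return 8.0 * TABLE_A1[node]["lambda"]


def min_gate_overhang_nm(node: str) -> float:
    """Overhang above which the shielding approximation holds (Wov > 3*lambda)."""
    return 3.0 * TABLE_A1[node]["lambda"]


def channels_per_micron(pitch_nm: float) -> float:
    """Channel density 1/s expressed in channels per micrometer."""
    return 1000.0 / pitch_nm


for node in TABLE_A1:
    print(node, contacted_gate_pitch_nm(node), min_gate_overhang_nm(node))
print(channels_per_micron(20.0), channels_per_micron(2.5))
```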


Figure A.2 Comparison between the model and 3D simulation for Cgc. There is less than 10% error between model and 3D simulation for device parameters that are relevant for 32nm to 11nm technology nodes (as shown in Table A.1). The fitting parameters are listed in Table A.2.

B. Gate outer-fringe capacitance: Cof

Conformal mapping is a common approach for fringe capacitance modeling [8, 23, 24]. Following a method similar to that in [8], the system of the cylindrical source/drain extension region and the gate is mapped into a system with cylinders in parallel with the gate. For an isolated channel, when the radius is infinitesimally small, the electric field of the system after conformal mapping contains only fringing field. On the other hand, when the radius is infinitely large, the cylinder becomes a flat plate and there is only normal electric field. When the radius is between the two limiting cases, the isolated-channel capacitance $C_{of}^{S}$ consists of the fringing component (Cof_inf_fr) and the normal component (Cof_inf_nr), given by (A.6)-(A.7).

2 22 cap0L sd H eff_ frhLr of0.5 sd  of , Cof_inf_ fr  , (A.6) 1 H eff_ fr cosh  r

220.5Lsd cap0 HtrLeff_ nr ox of  of0.5 sd , Cof_ inf_ nr  , (A.7) Heff_ nr

κcap is the dielectric constant of the dielectric between the gate and the source/drain plug. Cof_inf_fr corresponds to the capacitance supported by the fringing field, while Cof_inf_nr is the part supported by the normal field. In equation (A.6), the fringing component (Cof_inf_fr) can be modeled similarly to that in [8], but with a different Heff_fr (see Figure A.3) to account for the case of a large r. The normal-fringing component (Cof_inf_nr) is modeled by a conformal mapping method as in [8], with an equivalent width of a4·2r and an equivalent Heff_nr. Since Heff_nr accounts for the normal component, there is no corresponding βof r term in Heff_nr in (A.7). The higher the dielectric constant of the gate dielectric, the better the coupling between the gate and the channel and the smaller αof is. For κox=4 and 25, αof is 0.6 and 0.1, respectively. βof and γof are fitted to be 0.6 and 0.5, respectively (Table A.2). The outer fringe capacitance ($C_{of}^{S}$) for an isolated channel is then given by the weighted sum of the two components (Cof_inf_nr and Cof_inf_fr) in (A.8). In equation (A.8), a3(1-2r/Wgate) represents the weight of the sidewall-fringing part. Here, we approximate the charges on the channel as line charges.

$$C_{of}^{S} = a_3\left(1-\frac{2r}{W_{gate}}\right) C_{of\_inf\_fr} + a_4 \cdot 2r \cdot C_{of\_inf\_nr}. \tag{A.8}$$


Figure A.3 Illustrations of $C_{of}^{S}$, composed of (a) the sidewall-fringe part and (b) the normal-fringing part. The figure on the right (c) is the top view of the electric field lines. The insets of (a) and (b) illustrate the physical interpretation of Heff_nr and Heff_fr.

The capacitive coupling from the neighboring channels is described in a way analogous to the Cgc case ((A.4)-(A.5)) and is given by (A.9)-(A.11). η1 and η2 in (A.9) are empirical fitting parameters, describing the logarithmic potential drop with the distance due to the line charges. τ1 and τ2 are characteristic lengths for the rate of potential drop, which are 4.5 and 220nm in our cases. N is the number of channels under the gate. κof r in (A.10) is introduced to include the effect of r on the screening.

(s + κof r) is the effective distance between the line charges of the edge channel to its adjacent neighbor. a3, a4, κof are around 0.9, 0.35 and 0.7~1.7, respectively, for the range of device parameters relevant from 32 nm to 11 nm nodes.

 N 2  2N  N  2       exp    ln 2   1   N  2  s   r  1  1  ,  of  , (A.9)

cap0L sd C of_ sr  , (A.10) 22 2Hsreff of ln  srof  148

$$C_{of}^{E} = \frac{C_{of}^{S} C_{of\_sr}}{C_{of\_sr} + \eta_1 C_{of}^{S}}, \qquad C_{of}^{M} = \frac{C_{of}^{S} C_{of\_sr}}{C_{of\_sr} + 2\eta_2 C_{of}^{S}}. \tag{A.11}$$

Figure A.4 Comparison between the model and 3D simulation for Cof. The model is accurate within 10% error across device parameters that are relevant for 32nm to 11nm technology nodes. The fitting parameters are listed in Table A.2.

C. Gate-to-plug capacitance: Cgtp

Our 3D simulation results confirm that, as expected, a change in channel radius has little effect on Cgtp. Less than 10% difference is observed as r increases from 1nm to 16nm or as (s-2r) increases from 0.5nm to 20nm. Following [8], we model Cgtp as in (A.12)-(A.15). αgtp is fitted to be 0.4~0.5. τbk describes the effective contribution of the gate height to the fringe capacitance. Cgtp_nr accounts for the normal field between the two plate-like electrodes. Cgtp_fr includes the contributions of the fringing field from the top and the sidewall of the gate and plug.

 2H  L     exp2  2 1 gate gate , bk  L   sd  (A.12)

$$C_{gtp\_nr} = \frac{\kappa_{cap}\,\varepsilon_0 H_{gate}}{L_{sd}}, \tag{A.13}$$

 C   cap 0 , (A.14) gtp_ fr gtp  2 LLsd gate ln LH gate bk gate

C  W  C  C gtp gate  gtp _ fr gtp _ nr . (A.15)

Table A.2 Values for fitting parameters used for 32nm and 11nm.

Capacitance   Parameter   Value
Cgc           a1          0.9
Cgc           a2          1.2
Cgc           βgc         0.15
Cgc           κgc         1
Cof           a3          0.9
Cof           a4          0.35
Cof           τ1          4.5
Cof           τ2          220nm
Cof           αof         0.6 (κox=4), 0.1 (κox=25)
Cof           βof         0.6
Cof           γof         0.5
Cof           κof         0.7 (32nm), 1.7 (11nm)
Cgtp          αgtp        0.4 (32nm), 0.5 (11nm)

A.4 Capacitance Penalty (ηc) and Actual Current Benefit (ηI)

A. Capacitance Penalty (ηc)

With these models, we examine the delay degradation due to parasitic capacitances for various channel radii (r) and channel densities (1/s). The capacitance penalty ηc is a function of the geometric size (Figure A.5) and the technology node (Figure A.6). 1D devices exhibit larger performance degradation than 2D devices due to parasitic capacitances, as evidenced by a larger ηc for smaller r and larger s. For 11nm technology, ηc increases from 1 for the 2D case to more than 2.5 for small r and large s (Figure A.5).

Figure A.5 (a) ηc for 11nm technology node as a function of r and s. (b) The absolute value of capacitance components for r=1nm. The percentage numbers are the ratio of Cof or Cgtp over Cgc, including the Miller effect.

[Figure A.6 plot: ηc versus technology node (11, 16, 22 and 32 nm) for r=1nm and Wgate=200nm, with curves for s=20nm (50 channels/μm), s=10nm (100 channels/μm), s=5nm (200 channels/μm), s=2.5nm (400 channels/μm) and s=0 (2D). Annotation: high channel density is preferred for smaller ηc.]

Figure A.6 High channel density (e.g. by more compact patterning) contributes to a small ηc, which is close to that of 2D devices. The case of the very wide gate (Wgate=200 nm) indicates the lower bound of ηc. The degradation as compared to 2D devices becomes larger for more advanced technologies.

For 2D devices, Cgc is supported only by the normal field and Cof is mainly supported by the normal field in the coordinate system after conformal mapping (Figure A.7). The capacitances due to the elliptical field are not as sensitive to the dielectric thickness (roughly an inverse logarithmic relationship) as those due to the normal field (an inverse relationship). The more 1D-like the device is, i.e. the smaller r and the larger s, the larger the portions of Cgc and Cof that are supported by the elliptical field. When the device structure changes from 2D to 1D, both Cgc and Cof decrease. However, in the 1D case a larger portion of Cof than of Cgc is supported by the elliptical field (Figure A.7), because the source/drain region is more open. During the transition from 2D to 1D, Cgc therefore decreases faster than Cof (Figure A.5). Thus, the ratio of Cof over Cgc of a 1D device is larger than that of a 2D device. In addition, the ratio of Cgtp over Cgc increases with smaller r and larger s because Cgtp remains roughly constant while Cgc decreases.

Figure A.7 Illustration of normal field and elliptical field components of Cgc and Cof. (a) Cross-section along device width, (b) top view, and (c) the comparison of a 2D device to a 1D device. For 1D channels, the ratio of elliptical field to normal field is larger for Cof than for Cgc.

1D devices suffer more from the parasitic effect in more advanced technologies (Figure A.6). ηc can be large and increases rapidly with advancing technology nodes, especially for low channel density (large s), e.g. less than 50/μm. The main reason is that Cgtp increases while Cgc decreases for more advanced technologies (Figure A.8) due to the geometric scaling.

[Figure A.8 plot: 1+Cpar/Cgc versus technology node (11, 16, 22 and 32 nm) for r=1nm and Wgate=200nm, split into the parts due to Cgc, Cof and Cgtp, with envelopes marked for 50/μm, 100/μm, 200/μm, 400/μm and 2D.]

Figure A.8 Components of ηc in Figure A.6 contributed by Cof and Cgtp. For decreasing 1D channel densities, the increase in ηc comes from both the Cof/Cgc ratio and the Cgtp/Cgc ratio. Cgtp is the major contributor. The dashed lines envelop the factor (1+Cpar/Cgc) for different channel densities.

The quantitative results shown in Sections A.4, A.5 and A.6 are calculated with κox=4. The use of a high-k gate dielectric is preferred from the ηc point of view because, for the same EOT, Cgc remains at a similar level but Cof is significantly reduced, owing to the larger physical dielectric thickness and the same ambient (e.g. sidewall) dielectric. Generally, a higher gate dielectric constant, larger channel radius, higher channel density, and wider gate are preferred for a smaller capacitance penalty. However, even for extremely large channel density, this parasitic capacitance problem still exists for gates with Wgate < 10Lg (Figure A.9). At the 11nm technology node, even for a 400/μm channel density (2.5 nm pitch), parasitic capacitance causes a degradation of about 20% or more relative to the 2D case.
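The capacitance penalty can be evaluated directly from the three capacitance components. The following is a minimal sketch, assuming Cpar=Cof+Cgtp as defined in Section A.3 and ignoring the Miller effect mentioned in the caption of Figure A.5. The function names and the numerical values are illustrative assumptions, not simulation results.

```python
def capacitance_penalty(c_gc_1d: float, c_of_1d: float, c_gtp_1d: float,
                        c_gc_2d: float, c_of_2d: float, c_gtp_2d: float) -> float:
    """eta_c = (1 + Cpar/Cgc)_1D / (1 + Cpar/Cgc)_2D with Cpar = Cof + Cgtp."""
    factor_1d = 1.0 + (c_of_1d + c_gtp_1d) / c_gc_1d
    factor_2d = 1.0 + (c_of_2d + c_gtp_2d) / c_gc_2d
    return factor_1d / factor_2d


def delay_degradation_percent(eta_c: float) -> float:
    """Extra delay relative to the 2D device at equal current, in percent."""
    return (eta_c - 1.0) * 100.0


# Made-up capacitances in arbitrary units: going from 2D to 1D, Cgc drops the
# most, Cof drops less, and Cgtp stays roughly constant, as described above.
eta = capacitance_penalty(c_gc_1d=0.4, c_of_1d=0.2, c_gtp_1d=0.3,
                          c_gc_2d=1.0, c_of_2d=0.3, c_gtp_2d=0.3)
print(eta, delay_degradation_percent(eta))
```

Read this way, ηc=1.2 corresponds to the roughly 20% degradation quoted for the 400/μm case at the 11nm node.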

B. Current Benefit (ηI) and Screening Effect

The “actual current benefit” is defined as

$$\eta_I = \frac{(I_{eff}/I_{off})_{1D}}{(I_{eff}/I_{off})_{2D}},$$

including screening effects. A useful quantity is the intrinsic current benefit (ηI0), defined as the improvement of the intrinsic current of 1D devices over that of 2D devices, under the same off-state current condition and without any screening effect. ηI0 mainly depends on the material properties (e.g. mobility, velocity), the series resistance, and the subthreshold slope. It can be obtained by experiments or estimated by modeling [25]. This is the quantity usually reported from experiments or modeling, since it corresponds to an isolated channel or an array with low channel density. With multiple channels, ηI becomes less than ηI0 because the screening effect weakens the gate control per channel. For delay estimation, the actual current benefit (ηI) should be used; that is, the actual device performance improvement is degraded from the intrinsic current improvement because of the screening effect. For a fixed Ioff, a larger Cgc gives a better Ieff/Ioff ratio. To the first order, ηI is proportional to (Cgc)1D/(Cgc)2D.

[Figure A.9 plot: ηc versus technology node (11, 16, 22 and 32 nm) for r=1nm and 400 channels/μm. Annotation: larger ηc for smaller Wgate.]

Figure A.9 Narrower devices suffer more from ηc because of more fringing effect and less screening in the source/drain region. At the 11nm technology node, the degradation from the 2D device is greater than 20% for Wgate less than 10Lg.

A.5 Design Optimization

A 1D device suffers, on the one hand, from the screening effect at higher channel densities and, on the other hand, from large parasitics at lower channel densities and smaller r. The two competing effects, represented by ηc and ηI, result in optimal points for a set of channel density (1/s), channel radius (r) and gate width (Wgate). The departures from ηI0 of 1D vs. 2D devices with the optimal designs are illustrated in Figure A.10. The numbers are normalized to ηI0: 60% means that τ2D/τ1D = 0.6×ηI0. Thus, the higher the percentage, the better. For both Wgate=10Lg and 3Lg, the optimal design occurs at r=1nm with 200 channels/μm. This small diameter may be difficult to reach with semiconductor nanowires but is feasible with carbon nanotubes. However, even in this optimal case, the actual delay improvement still falls more than 30% short of the intrinsic improvement ηI0. When the design is not optimized or if ηI0 is not sufficiently large, 1D devices can perform worse than 2D devices.

[Figure A.10 plots: real delay improvement (τ2D/τ1D) normalized to ηI0, in percent, versus technology node (11, 16, 22 and 32 nm), for channel densities from 25/μm to 400/μm, each at its optimal radius. Panels: (a) Wgate=10Lg and (b) Wgate=3Lg.]

Figure A.10 The optimal designs for (a) Wgate=10Lg and (b) 3Lg at different channel densities, and the delay improvement over 2D devices (τ2D/τ1D) under these optimal cases. Both parasitics (ηc) and screening effect (ηI) are included.
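The search for an optimal design can be sketched as an exhaustive scan over candidate radii and pitches that maximizes ηI/ηc. This is only a minimal illustration of the idea: `best_design`, `toy_eta_i` and `toy_eta_c` are hypothetical names, and the two toy models are placeholders with the right qualitative trends (screening lowers ηI at small spacing, parasitics raise ηc at large pitch and small radius). They are not the analytical models of Section A.3.

```python
import math
from itertools import product


def best_design(radii_nm, pitches_nm, eta_i_fn, eta_c_fn):
    """Return (gain, r, s) maximizing eta_I / eta_c, or None if no valid pair."""
    best = None
    for r, s in product(radii_nm, pitches_nm):
        if s <= 2.0 * r:  # neighboring cylinders would touch or overlap
            continue
        gain = eta_i_fn(r, s) / eta_c_fn(r, s)
        if best is None or gain > best[0]:
            best = (gain, r, s)
    return best


def toy_eta_i(r, s, eta_i0=5.0):
    """Placeholder: current benefit shrinks as the spacing s - 2r shrinks."""
    return eta_i0 * (1.0 - math.exp(-(s - 2.0 * r) / 5.0))


def toy_eta_c(r, s):
    """Placeholder: penalty grows with larger pitch and smaller radius."""
    return 1.0 + 0.05 * s / r


print(best_design([1.0, 2.0, 4.0, 8.0], [2.5, 5.0, 10.0, 20.0, 40.0],
                  toy_eta_i, toy_eta_c))
```

Replacing the two placeholder functions with models fitted to simulation would turn the same scan into the kind of optimization summarized in Figure A.10.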

A.6 Extended 1D Device Scaling Roadmap

By comparing the performance of 1D and 2D devices, we show that the parasitic capacitance and the channel-to-channel screening effect play significant roles in device performance for future technologies. To maximize the benefits of 1D devices, design optimization is required for each channel material with a different ηI0. Two extended roadmaps for 1D devices are proposed to enable technology scaling beyond the 32nm technology node: (a) for a fixed ηI0, e.g. a fixed channel material, and (b) for a fixed channel structure (density and geometry).

For a fixed ηI0, the performance per technology node can be improved by gradually increasing the channel density while optimizing the device structure for that channel density. For ηI0=5, which is typical for the carbon nanotube FET [25], a possible way to extend the roadmap down to 11nm with Lg no less than 10nm is proposed in Figure A.11. The channel density is gradually increased, with the device structure optimized for each channel density, to meet the requirement of at least 17% speedup per technology node. The upper bound for the channel density can in principle be reached by channel materials such as carbon nanotubes, owing to their small radius. Parasitic capacitance limits the device performance for a given ηI0. This limit can be approached by the design optimization discussed in Section A.5. Aggressively reducing the gate electrode height beyond simple scaling will remedy the degradation due to Cgtp [26]. More aggressive performance improvement is possible with larger ηI0. As shown in Figure A.11, some pitch/radius combinations perform better at the 16nm node than at the 11nm node. The reason is that the rapidly increasing Cgtp cancels the improvement brought by scaling.

Figure A.12 proposes another scaling path, boosting ηI0 with advancing technology nodes for a fixed channel pitch of 40nm and a radius of 10nm, a channel density and geometry similar to currently available semiconductor nanowire technology. In this scenario, performance improvement over successive technology nodes is accomplished by increasing ηI0 (e.g. by changing the channel material or increasing strain).
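The 17% requirement can be turned into a simple check on a candidate scaling path. The sketch below assumes that “17% speedup per technology node” means the delay drops to at most 0.83 of its value at the previous node; reading it as a division by 1.17 would give a slightly looser target. The function names and the example delay sequences are illustrative assumptions.

```python
NODES_NM = [32, 22, 16, 11]
SPEEDUP_PER_NODE = 0.17  # assumed to mean a 17% delay reduction per node


def target_delays(n_nodes: int = len(NODES_NM),
                  speedup: float = SPEEDUP_PER_NODE) -> list:
    """Delay targets normalized to the first node (32nm)."""
    return [(1.0 - speedup) ** k for k in range(n_nodes)]


def meets_roadmap(delays, speedup: float = SPEEDUP_PER_NODE) -> bool:
    """True if every node is at least `speedup` faster than the previous one."""
    return all(later <= (1.0 - speedup) * earlier
               for earlier, later in zip(delays, delays[1:]))


print(dict(zip(NODES_NM, target_delays())))
# A path whose 11nm delay is worse than its 16nm delay fails the check.
print(meets_roadmap([1.0, 0.8, 0.6, 0.45]), meets_roadmap([1.0, 0.8, 0.6, 0.65]))
```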


[Figure A.11 plot: delay normalized to 32nm CMOS (logarithmic axis) versus technology node (11, 16, 22 and 32 nm) for ηI0=5. Curves: r=1nm, 4/mm (current CNFET); r=1nm, 25/μm; r=2nm, 50/μm; r=2nm, 100/μm; r=1nm, 200/μm (optimal). The expected trend of 17% speedup per technology node and the proposed scaling path are marked.]

Figure A.11 Extended roadmap by increasing the channel density per generation for a fixed ηI0 (ηI0=5).

[Figure A.12 plot: delay normalized to 32nm CMOS (logarithmic axis) versus technology node (11, 16, 22 and 32 nm) for r=10nm and 25 channels/μm. Curves: ηI0=1, ηI0=2, ηI0=5 and ηI0=6. The expected trend of 17% speedup per technology node and the proposed scaling path are marked.]

Figure A.12 Extended roadmap by increasing ηI0 per generation for a fixed channel pitch and radius (r=10nm, s=40nm).


A.7 Conclusions

Parasitic capacitances take away the improvement of 1D devices over 2D devices that comes from the better carrier transport and/or electrostatic control. By analytically modeling the parasitic effects and examining the delay metric, we quantify the actual benefit of the optimized 1D devices. The analytical model presented in this paper paves the way for compact modeling of 1D FETs for circuit analysis [27]. Two extended technology roadmaps are proposed for scaling to 11 nm node using Lg no less than 10nm.

A.8 Bibliography

[1] N. Singh, A. Agarwal, L. Bera, T. Liow, R. Yang, S. Rustagi, C. Tung, R. Kumar, G. Balasubramanian and D.-L. Kwong, “High-Performance Fully Depleted (Diameter≤5nm) Gate-All-Around CMOS Devices,” IEEE Electron Devices Letters, vol. 27, No. 5, pp. 383-386, May 2007.

[2] X. Shao and Z. Yu, “Nanoscale FinFET Simulation: A Quasi-3D Quantum Mechanical Model using NEGF,” Solid-State Electronics, pp. 1435-1445, 49 (2005).

[3] H. Lee, L. Yu, S. Ryu, J. Han, K. Jeon, D. Jang, K. Kim, J. Lee, J. Kim, “Sub-5nm All-Around Gate FinFET for Ultimate Scaling,” Digest of Symposium on VLSI Technology, pp. 58-59, 2006.

[4] R. Tu, L. Zhang, Y. Nishi, H. Dai, “Measuring the Capacitance of Individual Semiconductor Nanowires for Carrier Mobility Assessment,” Nano Lett., 7 (6), pp. 1561-1565, 2007.

[5] S. Jin, M. V. Fischetti and T. Tang, “Modeling of in Gated Silicon Nanowires at Room Temperature: Surface Roughness Scattering, Dielectric Screening, and Band Nonparabolicity,” Journal of Applied Physics 102, 083175 (2007).

[6] Y. Cui, Z. Zhong, D. Wang, W. U. Wang, and C. M. Lieber, “High Performance Silicon Nanowire Field Effect Transistors,” Nano Letters, vol. 3, pp. 149-152, 2003.

[7] P. Solomon and S. Laux, “The Ballistic FET: Design, Capacitance and Speed Limit,” IEEE International Electron Device Meeting, pp.95-98, 2001.

[8] J. Deng, H.-S. P. Wong, “Modeling and analysis of planar gate electrostatic capacitance for 1-D FET with multiple cylindrical conducting channels,” IEEE Trans. on Electron Devices, vol. 54, No. 9, pp.2377-2385, 2007.

[9] M. H. Na, E.J. Nowak, W. Haensch, J. Cai, “The effective drive current in CMOS inverters,” International Electron Devices Meeting, p. 121-124, 2002.

[10] X. Wang, H.-S. P. Wong, P. Oldiges and R.J. Miller, “Electrostatic Analysis of Carbon Nanotube Arrays,” 2003 IEEE International Conference on Simulation of Semiconductor Processes and Devices, pp. 163 – 166, 2003.

[11] J. Sturm, K. Tokunaga, J. Colinge, “Increased Drain Saturation Current in Ultra-Thin Silicon-on-Insulator (SOI) MOS Transistors,” IEEE Electron Device Letters, vol. 9, No. 9, pp. 460-463, Sept 1988.

[12] J. Colinge, “Subthreshold Slope of Thin-Film SOI MOSFET’s,” IEEE Electron Device Letters, vol. EDL-7, No. 4, pp. 244-246, Apr 1986.

[13] J. P. Colinge, M. H. Gao, A. R. Rodriguez, H. Maes, and C. Claeys, “Silicon-on-insulator: Gate-all-around device,” IEEE International Electron Device Meeting, pp. 595-598, 1990.

[14] C. Hu, “Device Challenges and Opportunities,” Digest of Symposium on VLSI Technology, pp. 4-5, 2004.

[15] Maxwell 3D®, Ansoft Corporation, PA.

[16] J. W. Mintmire and C. T. White, “Universal Density of States for Carbon Nanotubes,” Phys. Rev. Lett. 81, pp. 2506 - 2509 (1998).

[17] P. Kim, T. W. Odom, J. Huang, and C. M. Lieber, “Electronic Density of States of Atomically Resolved Single-Walled Carbon Nanotubes: Van Hove Singularities and End States,” Phys. Rev. Lett. 82, pp.1225 - 1228 (1999).

[18] S. Narasimha, et al, “High Performance 45-nm SOI Technology with Enhanced Strain, Porous Low-k BEOL, and Immersion Lithography,” IEEE International Electron Device Meeting, pp. 689-692, 2006.

[19] H. Nii, et al, “A 45nm High Performance Bulk Logic Platform Technology (CMOS6) using Ultra High NA (1.07) Immersion Lithography with Hybrid Dual-Damascene Structure and Porous Low-k BEOL,” IEEE International Electron Device Meeting, pp. 685-688, 2006.

[20] J. Sleight, et al, “Challenges and Opportunities for High Performance 32nm CMOS Technology,” IEEE International Electron Device Meeting, pp. 697-700, 2006.

[21] K. Mistry, et al, “A 45nm Logic Technology with High-k+Metal Gate Transistors, Strained Silicon, 9 Cu Interconnect Layers, 193nm Dry Patterning, and 100% Pb-free Packaging,” IEEE International Electron Device Meeting, pp 247-250, 2007.

[22] T. Miyashita, et al, “High-Performance and Low-Power Bulk Logic Platform Utilizing FET Specific Multiple-Stressors with Highly Enhanced Strain and Full-Porous Low-k Interconnects for 45nm CMOS Technology,” IEEE International Electron Device Meeting, pp. 251-254, 2007.

[23] A. Bansal, B. C. Paul, and K. Roy, “Modeling and optimization of fringe capacitance of nanoscale DGMOS devices,” IEEE Transactions on Electron Devices, vol. 52, No. 2, pp. 256–262, February 2005.

[24] W. Wu and M. Chan, “Analysis of Geometry-Dependent Parasitics in Multifin Double-Gate FinFETs,” IEEE Transactions on Electron Devices, vol. 52, No. 4, pp. 692-698, April 2007.

[25] J. Deng, H.-S. P. Wong, “A Compact SPICE Model for Carbon-Nanotube Field-Effect Transistors Including Nonidealities and Its Application—Part II: Full Device Model and Circuit Performance Benchmarking,” IEEE Transactions on Electron Devices, vol. 54, No. 12, pp. 3195-3205, 2007.

[26] J. Deng, L. Wei, L. Chang, K. Kim, C. Chuang, H.-S. P. Wong, “Extended Technology Roadmap by Selective Device Footprint Scaling and Parasitics Engineering,” International Symposium on VLSI Technology, Systems and Applications, pp. 159-160, 2008.

[27] L. Wei, D. Frank, L. Chang, H.-S. P. Wong, “An Analytical Model for Intrinsic Carbon Nanotube FETs,” the 38th European Solid-State Device Research Conference, (ESSDERC2008), pp. 222-225, September 15 – 19, 2008.


Author’s Biography

Lan Wei was born in Dalian, China. She received the B. S. degrees in Microelectronics and Economics from Peking University in 2005 and the M. S. degree in Electrical Engineering from Stanford University in 2007, and expects the Ph. D. degree in Electrical Engineering from Stanford University in 2010. She has held research intern positions at Intel (2006), IBM Research (2007), STMicroelectronics (2008), and Grenoble Institute of Technology (2008).

Under the supervision of Prof. H.-S. Philip Wong in the Stanford Nanoelectronics Group, her Ph. D. research focuses on technology scaling with a holistic view across the traditional boundaries of device, circuit, and system domains, as well as integrated bio-systems and biomedical devices. She has contributed to the Process Integration, Devices, and Structures (PIDS) Chapter of the International Technology Roadmap for Semiconductors (ITRS) 2009 Edition. Lan Wei was a recipient of a number of awards, including the Stanford Graduate Fellowship (2005-2009) and the Chinese Government Fellowship for Outstanding Self-Financed Students Abroad (2009).

She enjoys exploring different cultures, people, places, and , by traveling and reading.


List of Publications

Refereed Journal Publications

[1] L. Wei, J. Deng, L.-W. Chang, K. Kim, C.-T. Chuang, and H.-S. P. Wong, “Selective Device Structure Scaling and Parasitics Engineering: A Way to Extend the Technology Roadmap,” IEEE Transactions on Electron Devices, vol. 56, No. 2, pp. 312 – 320, February, 2009.

[2] L. Wei, J. Deng, and H.-S. P. Wong, “Modeling and Performance Comparison of 1-D and 2-D Devices Including Parasitic Gate Capacitance and Screening Effect,” IEEE Transactions on Nanotechnology, vol. 7, No. 6, pp. 720 – 727, November, 2008.

[3] L. Wei, D. Frank, L. Chang, and H.-S. P. Wong, “Non-iterative Compact Model for Intrinsic Carbon Nanotube FETs: Quantum Capacitances and Transport,” submitted to IEEE Transactions on Nanotechnology, 2010.

[4] L. Wei, F. Boeuf, T. Skotnicki, and H.-S. P. Wong, “Parasitic Capacitances: Analytical Models and Impact on Circuit-Level Performance,” submitted to IEEE Transactions on Electron Devices, 2010.

[5] L. Wei, F. Boeuf, D. Antoniadis, T. Skotnicki, and H.-S. P. Wong, “Improving Circuit-Level Performance by Device Electrostatic Engineering,” submitted to IEEE Transactions on Electron Devices, 2010.

Refereed Conference Papers

[6] L. Wei, S. Oh, and H.-S. P. Wong, “Performance Benchmarks for Si, III-V, TFET, and carbon nanotube FET – Re-thinking the Technology Assessment Methodology for Complementary Logic Applications,” 2010 IEEE International Electron Devices Meeting (IEDM 2010), paper 16.2, San Francisco, USA, December 6–8, 2010.

[7] J. Luo, L. Wei, F. Boeuf, D. Antoniadis, T. Skotnicki, and H.-S. P. Wong, “Device Engineering for Improving SRAM Static Noise Margin,” 2010 International Conference on Solid State Devices and Materials (SSDM 2010), paper C-4-3, Tokyo, Japan, September 22–24, 2010.

[8] L. Wei, D. Frank, L. Chang, and H.-S. P. Wong, “A Non-iterative Compact Model for Carbon Nanotube FETs Incorporating Source Exhaustion Effects,” 2009 IEEE International Electron Devices Meeting (IEDM 2009), pp. 917–920, Baltimore, USA, December 6–9, 2009.

[9] L. Wei, F. Boeuf, D. Antoniadis, T. Skotnicki, and H.-S. P. Wong, “Exploration of Device Design Space to Meet Circuit Speed Targeting 22nm and Beyond,” 2009 International Conference on Solid State Devices and Materials (SSDM 2009), pp. 808–809, Sendai, Japan, September 23–26, 2009.

[10] L. Wei, F. Boeuf, T. Skotnicki, and H.-S. P. Wong, “CMOS Technology Roadmap Projection Including Parasitic Effects,” 2009 International Symposium on VLSI Technology, Systems and Applications (VLSI-TSA 2009), pp. 78–79, Hsinchu, Taiwan, April 27–29, 2009.

[11] L. Wei, D. Frank, L. Chang, and H.-S. P. Wong, “An Analytical Model for Intrinsic Carbon Nanotube FETs,” the 38th European Solid-State Device Research Conference (ESSDERC 2008), pp. 222–225, Edinburgh, United Kingdom, September 15–19, 2008.

[12] J. Deng, L. Wei, L.-W. Chang, K. Kim, C.-T. Chuang, and H.-S. P. Wong, “Extending Technology Roadmap by Selective Device Footprint Scaling and Parasitics Engineering,” 2008 International Symposium on VLSI Technology, Systems and Applications (VLSI-TSA 2008), pp. 159–160, Hsinchu, Taiwan, April 21–23, 2008.

[13] L. Wei, J. Deng, and H.-S. P. Wong, “1-D and 2-D Devices Performance Comparison including Parasitic Gate Capacitance and Screening Effect,” 2007 IEEE International Electron Devices Meeting (IEDM 2007), pp. 741–744, Washington, D.C., USA, December 10–12, 2007.

Invited Conference Papers and Workshop Presentations

[14] S. Oh, L. Wei, S. Chong, J. Luo, and H.-S. P. Wong, “Device and Circuit Interactive Design and Optimization Beyond the Conventional Scaling Era,” invited paper, 2010 IEEE International Electron Devices Meeting (IEDM 2010), San Francisco, USA, December 6–8, 2010.

[15] L. Wei and H.-S. P. Wong, “Compact Modeling Aided Technology Design and Projection Considering System-Level Performance,” MOS Modeling and Parameter Extraction Working Group MOS-AK/GSA Workshop, Baltimore, USA, December 9, 2009.

[16] H.-S. P. Wong, L. Wei, S. Oh, A. Lin, J. Deng, S. Chong, and K. Akarvardar, “Technology Projection Using Simple Compact Models,” invited plenary paper, 2009 International Conference on Simulation of Semiconductor Processes and Devices (SISPAD 2009), pp. 1–8, San Diego, USA, September 9–11, 2009.

[17] H.-S. P. Wong, L. Wei, and J. Deng, “The Future of CMOS Scaling – Parasitics Engineering and Device Footprint Scaling,” invited paper, 2008 International Conference on Solid State and Integrated Circuit Technology (ICSICT 2008), pp. 21–24, Beijing, China, October 20–23, 2008.


Parts of this dissertation are reprinted, with permission, from the following IEEE publications.

© 2008 IEEE. Reprinted, with permission, from IEEE Transactions on Nanotechnology, “Modeling and Performance Comparison of 1-D and 2-D Devices Including Parasitic Gate Capacitance and Screening Effect,” L. Wei, J. Deng, and H.-S. P. Wong.

© 2009 IEEE. Reprinted, with permission, from IEEE Transactions on Electron Devices, “Selective Device Structure Scaling and Parasitics Engineering: A Way to Extend the Technology Roadmap,” L. Wei, J. Deng, L.-W. Chang, K. Kim, C.-T. Chuang, and H.-S. P. Wong.

© 2009 IEEE. Reprinted, with permission, from Proceedings of 2009 IEEE International Electron Devices Meeting, “A Non-iterative Compact Model for Carbon Nanotube FETs Incorporating Source Exhaustion Effects,” L. Wei, D. Frank, L. Chang, and H.-S. P. Wong.
