2009 ISSCC
The New Era of Scaling in an SoC World
Mark Bohr Intel Senior Fellow Logic Technology Development
1 The End of Scaling is Near?
“Optical lithography will reach its limits in the range of 0.75-0.50 microns”
“Minimum geometries will saturate in the range of 0.3 to 0.5 microns”
“X-ray lithography will be needed below 1 micron”
“Minimum gate oxide thickness is limited to ~2 nm”
“Copper interconnects will never work”
“Scaling will end in ~10 years”
Perceived barriers are meant to be surmounted, circumvented or tunneled through
2 Outline
• Transistor Scaling
• Microprocessor Evolution
• Vision of the Future
3 Scaling Trends
[Figure: CPU transistor count (2x every 2 years) vs. year, 1970-2020; left axis in microns, right axis in transistor count]
Transistor dimensions scale to improve performance, reduce power and reduce cost per transistor
4 Scaling Trends
[Figure: CPU transistor count (2x every 2 years) and feature size in microns (0.7x every 2 years; 65nm, 45nm and 32nm marked) vs. year, 1970-2020]
Transistor dimensions scale to improve performance, reduce power and reduce cost per transistor
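The two trend lines can be related with a little arithmetic. The sketch below assumes that transistor area scales with the square of the linear feature size, which is an idealization and not a statement from the slide; the function names are illustrative. Under that assumption, a 0.7x linear shrink gives about 0.49x the area, or roughly 2x the transistors in the same die area.

```python
def area_factor(linear_factor: float) -> float:
    """Area scaling for a linear shrink, assuming area ~ (feature size)^2."""
    return linear_factor ** 2


def density_gain(linear_factor: float) -> float:
    """Transistors per unit area relative to the previous generation."""
    return 1.0 / area_factor(linear_factor)


def feature_size_after(start_nm: float, generations: int,
                       linear_factor: float = 0.7) -> float:
    """Feature size after a number of generations of constant linear shrink."""
    return start_nm * linear_factor ** generations


if __name__ == "__main__":
    print(f"area factor per generation:  {area_factor(0.7):.2f}")
    print(f"density gain per generation: {density_gain(0.7):.2f}x")
    # Starting from 65 nm, two 0.7x steps land close to the 45 nm and 32 nm nodes.
    for n in range(3):
        print(f"generation {n}: {feature_size_after(65.0, n):.1f} nm")
```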
5 MOSFET Scaling
Device or Circuit Parameter          Scaling Factor
Device dimension tox, L, W           1/κ
Doping concentration Na              κ
Voltage V                            1/κ
Current I                            1/κ
Capacitance εA/t                     1/κ
Delay time/circuit VC/I              1/κ
Power dissipation/circuit VI         1/κ²
Power density VI/A                   1
R. Dennard, IEEE JSSC, 1974
Classical MOSFET scaling was first described in 1974
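As a minimal sketch of how the derived rows of the scaling table follow from the primary ones, the code below takes the dimension, voltage and current factors as inputs and computes capacitance, delay, power and power density from the expressions in the table (εA/t, VC/I, VI, VI/A). The function and key names are illustrative, not from the talk.

```python
def dennard_factors(kappa: float) -> dict:
    """Classical constant-field scaling factors for a scale factor kappa > 1."""
    dimension = 1.0 / kappa            # tox, L, W
    doping = kappa                     # Na
    voltage = 1.0 / kappa              # V
    current = 1.0 / kappa              # I
    area = dimension ** 2              # A = L * W
    capacitance = area / dimension     # eps * A / t, with eps unchanged
    delay = voltage * capacitance / current   # V * C / I
    power = voltage * current                 # V * I
    power_density = power / area              # V * I / A
    return {
        "dimension": dimension,
        "doping": doping,
        "voltage": voltage,
        "current": current,
        "capacitance": capacitance,
        "delay": delay,
        "power": power,
        "power_density": power_density,
    }


if __name__ == "__main__":
    # kappa = 1 / 0.7 corresponds to one 0.7x generation.
    for name, factor in dennard_factors(1.0 / 0.7).items():
        print(f"{name:14s} {factor:.3f}")
```

The point of the table shows up in the last row: power per circuit falls as 1/κ² while area falls by the same factor, so power density stays constant.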
6 30 Years of MOSFET Scaling
                        Dennard 1974    Intel 2005
Gate Length:            1.0 µm          35 nm
Gate Oxide Thickness:   35 nm           1.2 nm
Operating Voltage:      4.0 V           1.2 V
[Images: transistor cross-sections with 1 µm scale bar]
Classical scaling ended in the early 2000s due to gate oxide leakage limits
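One way to read the 1974 and 2005 numbers is to compute how much each parameter shrank. The sketch below does that and also estimates the voltage per nanometer of gate oxide, a crude proxy for oxide field that ignores everything except V/tox. The dictionary and function names are illustrative.

```python
DENNARD_1974 = {"gate_length_nm": 1000.0, "gate_oxide_nm": 35.0, "voltage_v": 4.0}
INTEL_2005 = {"gate_length_nm": 35.0, "gate_oxide_nm": 1.2, "voltage_v": 1.2}


def shrink_ratios(old: dict, new: dict) -> dict:
    """How many times smaller each parameter became (old / new)."""
    return {key: old[key] / new[key] for key in old}


def volts_per_nm_of_oxide(device: dict) -> float:
    """Operating voltage divided by gate oxide thickness (rough field proxy)."""
    return device["voltage_v"] / device["gate_oxide_nm"]


if __name__ == "__main__":
    for name, ratio in shrink_ratios(DENNARD_1974, INTEL_2005).items():
        print(f"{name:16s} shrank {ratio:.1f}x")
    print(f"1974: {volts_per_nm_of_oxide(DENNARD_1974):.2f} V/nm")
    print(f"2005: {volts_per_nm_of_oxide(INTEL_2005):.2f} V/nm")
```

Gate length and oxide thickness both shrank by nearly 30x, while voltage shrank by only a little over 3x, so the constant-field assumption behind the classical table was already being stretched well before the oxide reached its leakage limit.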
7 90 nm Strained Silicon Transistors
NMOS: high stress film (SiN cap layer), tensile channel strain
PMOS: SiGe source-drain, compressive channel strain
Strained silicon provided increased drive currents, making up for lack of gate oxide scaling
8 High-k + Metal Gate Transistors
65 nm transistor: SiO2 dielectric, polysilicon gate electrode
45 nm HK+MG transistor: hafnium-based dielectric, metal gate electrode
High-k + metal gate transistors break through gate oxide scaling barrier
9 Transistor Performance Increase
[Figure: IOFF (nA/um) vs. ION (mA/um) at 1.0 V, 65nm vs. 45nm, NMOS and PMOS panels. NMOS panel annotations: +12%, 5x. PMOS panel annotations: +50%, 100x.]
45 nm HK+MG provides average 30% drive current increase or >5x IOFF leakage reduction
Ref. K. Mistry, IEDM ’07
10 Gate Leakage Reduction
[Figure: normalized gate leakage vs. VGS (V), -1.2 to 1.2 V, PMOS and NMOS, comparing SiON/Poly 65nm with HiK+MG 45nm; annotated reductions of 25x and 1000x]
HK+MG significantly reduces gate leakage
11 Bitcell Leakage Reduction
[Figure: normalized cell leakage at 1.0V, 25C, 65nm vs. 45nm, split into IGATE, IOFF and IJUNCT components; 10x reduction annotated]
SRAM bitcell leakage reduced ~10x
12 VT Variability Reduction
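[Figure: C2 normalized to 180nm, by generation (180nm, 130nm, 90nm, 65nm, 45nm). Annotations: Tox scaling, minimal oxide scale, HiK+MG, less VT variation.]
$$\sigma_{V_{T,ran}} = \frac{\sqrt[4]{4 q^3 \varepsilon_{si} \phi_B}}{2} \cdot \frac{T_{ox}}{\varepsilon_{ox}} \cdot \frac{\sqrt[4]{N}}{\sqrt{L_{eff} \cdot Z_{eff}}} = \frac{1}{\sqrt{2}} \cdot \frac{C_2}{\sqrt{L_{eff} \cdot Z_{eff}}}$$
HK+MG provides oxide scaling needed for variability reduction
Ref. K. Kuhn, IEDM ’07
13 Interconnect Trends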
[Figure: M2 pitch (um, 0.7x per generation) and number of metal layers vs. technology generation (500, 350, 250, 180, 130, 90, 65, 45, 32 nm)]
Added metal layers + material improvements enable interconnect scaling
15 Interconnect Trends
[Figure: M2 pitch (um) and number of metal layers vs. technology generation (500, 350, 250, 180, 130, 90, 65, 45, 32 nm). Dielectric progression: SiO2, SiOF, low-k, lower-k. Metal progression: Al, Cu.]
Added metal layers + material improvements enable interconnect scaling
16 45 nm Interconnects
Layer    Pitch (nm)
M8       810
M7       560
M6       360
M5       280
M4       240
M3       160
M2       160
M1       160
Loose pitch + thick metal on upper layers
• High speed global wires
• Low resistance power grid
Tight pitch on lower layers
• Maximum density for local interconnects
Materials: Cu, low-k
Hierarchical interconnect pitches
17 45 nm Interconnects
[Figure: interconnect stack cross-section showing polymer, 7 µm Cu M9, and M1-8]
Thick M9 for very low resistance on-die power routing
18 45 nm Microprocessor Products
[Die photos: single core, dual core, quad core, 6 core and 8 core products]
45 nm process serves microprocessor applications from low power to high performance
19 32 nm Generation
[Figure: feature size (microns) vs. year, 1970-2020, with the 32nm generation marked]
20 32 nm Logic Technology
• 2nd generation high-k + metal gate transistors
  - High-k EOT scaled from 1.0 nm to 0.9 nm
  - Replacement metal gate process flow
  - 4th generation strained silicon
• 9 copper + low-k interconnect layers
  - Hierarchical interconnect pitches
  - Thick M9 for power routing
• Immersion lithography on critical layers
  - 70% transistor and interconnect pitch scaling
  - 50% SRAM cell area scaling
• Pb-free and halogen-free packages
Higher performance, lower power, lower cost per transistor
21 Contacted Gate Pitch Trend
[Figure: contacted gate pitch (nm) vs. year, 1995-2010, 0.7x every 2 years; 32 nm generation: 112.5 nm pitch]
Transistor gate pitch continues to scale 0.7x every 2 years
22 Transistor Performance
[Figure: NMOS and PMOS drive current (mA/um) at 1.0 V, 100 nA IOFF vs. gate pitch (nm) for the 130nm, 90nm, 65nm, 45nm and 32nm generations]
Drive currents continue to increase while gate pitch scales
23 32 nm Interconnects
M9: 8 um Cu
Layer    Pitch (nm)
M8       566.5
M7       450.1
M6       337.6
M5       225.0
M4       168.8
M3       112.5
M2       112.5
M1       112.5
Hierarchical interconnect pitches
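To see how each layer scaled between generations, the sketch below divides the 32 nm pitches by the 45 nm pitches listed on the earlier 45 nm Interconnects slide. The variable and function names are illustrative, and M9 is left out because the slides give its thickness, not its pitch.

```python
PITCH_45NM = {"M8": 810.0, "M7": 560.0, "M6": 360.0, "M5": 280.0,
              "M4": 240.0, "M3": 160.0, "M2": 160.0, "M1": 160.0}
PITCH_32NM = {"M8": 566.5, "M7": 450.1, "M6": 337.6, "M5": 225.0,
              "M4": 168.8, "M3": 112.5, "M2": 112.5, "M1": 112.5}


def pitch_ratios(old: dict, new: dict) -> dict:
    """Per-layer pitch of the new generation relative to the old one."""
    return {layer: new[layer] / old[layer] for layer in old}


if __name__ == "__main__":
    for layer, ratio in pitch_ratios(PITCH_45NM, PITCH_32NM).items():
        print(f"{layer}: {PITCH_45NM[layer]:6.1f} nm -> "
              f"{PITCH_32NM[layer]:6.1f} nm  ({ratio:.2f}x)")
```

On these numbers the tightest layers (M1 to M4) and M8 scale by about 0.7x, while the middle layers scale less aggressively.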
24 SRAM Cell Size Scaling
[Figure: SRAM cell area (µm2) vs. year, 1995-2010, 0.5x every 2 years; 32 nm generation: 0.171 µm2 cell]
Transistor density continues to double every 2 years
25 SRAM Cell Scaling
65 nm 0.570 µm2
45 nm 0.346 µm2
32 nm 0.171 µm2
Good pattern resolution while scaling feature size and continuing with 193 nm exposure wavelength
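The 0.5x-per-generation claim can be checked directly against the three cell areas above. This is a small sketch with illustrative names; it only computes ratios of the published areas.

```python
SRAM_CELL_AREA_UM2 = [("65 nm", 0.570), ("45 nm", 0.346), ("32 nm", 0.171)]


def generation_ratios(cells: list) -> list:
    """Area of each generation's cell relative to the previous generation."""
    return [
        (f"{prev_name} -> {name}", area / prev_area)
        for (prev_name, prev_area), (name, area) in zip(cells, cells[1:])
    ]


if __name__ == "__main__":
    for step, ratio in generation_ratios(SRAM_CELL_AREA_UM2):
        print(f"{step}: {ratio:.2f}x cell area")
```

The 45 nm to 32 nm step comes out very close to 0.5x, which is also what a 0.7x linear shrink in both directions would give (0.7 squared is 0.49).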
26 32 nm SRAM Test Chip
• 291 Mbit
• 0.171 µm2 cell size
• >1.9 billion transistors
• >3.8 GHz operation
• Functional silicon in Aug ‘07
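A rough consistency check on these figures is sketched below. It rests on two assumptions that are not stated on the slide: that each bitcell is a six-transistor (6T) cell, and that 1 Mbit means 2^20 bits. All names are illustrative.

```python
MBIT = 2 ** 20                 # assumption: binary megabit
TRANSISTORS_PER_CELL = 6       # assumption: 6T SRAM bitcell
CELL_AREA_UM2 = 0.171          # from the slide
CAPACITY_MBIT = 291            # from the slide


def bitcell_transistors(capacity_mbit: int) -> int:
    """Transistors in the bitcell array alone, excluding periphery."""
    return capacity_mbit * MBIT * TRANSISTORS_PER_CELL


def array_area_mm2(capacity_mbit: int, cell_area_um2: float) -> float:
    """Area of the bitcell array alone, in square millimeters."""
    return capacity_mbit * MBIT * cell_area_um2 / 1e6


if __name__ == "__main__":
    print(f"bitcell transistors: {bitcell_transistors(CAPACITY_MBIT) / 1e9:.2f} billion")
    print(f"bitcell array area:  {array_area_mm2(CAPACITY_MBIT, CELL_AREA_UM2):.1f} mm2")
```

Under these assumptions the bitcells alone account for a bit over 1.8 billion transistors, which is in line with the quoted total of more than 1.9 billion once decoders, sense amplifiers and other periphery are counted.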
32 nm SRAM test vehicle included all transistor and interconnect features used on 32 nm microprocessors
Ref. Y. Wang, paper 27.1, ISSCC ’09
27 30 Years of Scaling
[Figure: a contact from 1978 shown against ten 32nm SRAM cells from 2008, with a 1 µm scale bar]
28 The Old Era of Device Scaling
Device or Circuit Parameter          Scaling Factor
Device dimension tox, L, W           1/κ
Doping concentration Na              κ
Voltage V                            1/κ
Current I                            1/κ
Capacitance εA/t                     1/κ
Delay time/circuit VC/I              1/κ
Power dissipation/circuit VI         1/κ²
Power density VI/A                   1
It has served us well for >30 years
29 The New Era of Device Scaling
[Images: copper + low-k interconnects; strained silicon (SiGe); high-k + metal gate]
Modern CMOS scaling is as much about material and structure innovation as dimensional scaling
30 Outline
• Transistor Scaling
• Microprocessor Evolution
• Vision of the Future
31 Microprocessor Evolution
• More transistors
• Higher frequency
• More data bits per cycle
• Instruction parallelism
• Out-of-order issue
• Multi-threading
Many of these innovations have been for improved performance; now the challenge is to innovate for power efficiency
32 45 nm Nehalem CPU
A modern microprocessor is a complex system on a chip with multiple functional units and multiple interfaces
33 45 nm Nehalem CPU
• 23 master DLL circuits
• 11 PLL circuits
• 5 digital thermal sensors
Multiple clocking domains, local control
34 SRAM Dynamic Sleep Transistors
[Figure: SRAM cache sub-block between VDD and VSS with a sleep transistor and sleep control; IREM images showing banks being accessed, comparing normal SRAM sub-block leakage with sleep transistors that shut off leakage in inactive sub-blocks]
5-10x leakage reduction during “retention/standby”
Ref. K. Zhang, VLSI Circuits ‘04
35 Integrated Power Gates
[Figure: Nehalem die with VCC delivered on the thick on-die (M9) interconnect layer through power gates to Core0, Core1, Core2 and Core3; memory system, cache and I/O on VTT]
• Shuts off both switching power and leakage power
• Enables idle cores to go to ~0 power, independent of state of other cores on die
Ref. R. Kumar, paper 3.2, ISSCC ’09
36 Power Gates Enabled with Design+Process Co-optimization
[Figure: cross-section showing M9 and M1-8]
Thick metal 9 layer for low resistance on-die power routing
Ultra-low leakage transistor for high off-resistance power gates
37 Nehalem Turbo Mode
Many threaded workloads
• All cores operating
Lightly threaded workloads - Turbo Mode
• Power gates shut off some cores
• Zero power for inactive cores
• Higher frequency for active cores
[Figure: frequency of Core 0 to Core 3 in each case]
Dynamically delivering optimal performance and energy efficiency
Ref. R. Kumar, paper 3.2, ISSCC ’09
38 Nehalem Power Control Unit
[Figure: power control unit (PCU) connected to Core 0 to Core 3, each with Vcc, frequency (PLL) and sensors, plus uncore/LLC and BCLK]
• Integrated proprietary microcontroller
• Shifts control from hardware to embedded firmware
• Real time sensors for voltage, temperature, current/power
• Flexibility enables sophisticated algorithms, tuned for current operating conditions
Ref. R. Kumar, paper 3.2, ISSCC ’09
39 Adaptive Frequency System
[Figure: supply voltage waveform with higher frequency at voltage peaks and lower frequency at voltage droops]
• Adaptive PLL frequency
  – Higher frequency during voltage peaks
  – Lower frequency during voltage droops
• Up to 5% frequency improvement at same voltage
• Lower power at same frequency
Ref. N. Kurd, VLSI Circuits ’08
40 PC Platform Comparison
[Figure: 1985 platform built from separate chips (Intel386™ processor, Intel387 math co-processor, clock generator, cache control, cache TAG SRAM, cache data SRAM, DRAM control, DRAM) vs. 2008 platform (Nehalem processor, DRAM)]
Modern microprocessors integrate many of the separate system components from past platforms
41 Microprocessor Evolution
                      Intel386™        Nehalem
Transistor Count:     280 thousand     731 million
Frequency:            16 MHz           >3.6 GHz
# Cores:              1                4
Cache Size:           None             8 MB
I/O Peak Bandwidth:   64 MB/sec        50 GB/sec
Adaptive Circuits:    None             Sleep Mode, Turbo Mode, Power Gating, Adaptive Frequency Clocking
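The comparison can be turned into growth factors and an implied doubling period. The sketch below uses the 1985 and 2008 dates from the PC Platform Comparison slide and treats the ">3.6 GHz" frequency as exactly 3.6 GHz, so the frequency ratio is a lower bound. Function names are illustrative.

```python
import math

YEARS = 2008 - 1985  # dates from the PC Platform Comparison slide


def growth(old: float, new: float) -> float:
    """Plain ratio of new to old."""
    return new / old


def doubling_period_years(old: float, new: float, years: float) -> float:
    """Years per doubling implied by growing from old to new over `years`."""
    return years / math.log2(new / old)


if __name__ == "__main__":
    transistors = (280e3, 731e6)
    frequency_hz = (16e6, 3.6e9)      # ">3.6 GHz" taken as 3.6 GHz
    bandwidth_bps = (64e6, 50e9)      # bytes per second
    print(f"transistor count: {growth(*transistors):,.0f}x, doubling every "
          f"{doubling_period_years(*transistors, YEARS):.2f} years")
    print(f"frequency:        {growth(*frequency_hz):,.0f}x")
    print(f"I/O bandwidth:    {growth(*bandwidth_bps):,.0f}x")
```

On these numbers the transistor count doubled roughly every two years between the two processors, consistent with the trend stated in the scaling slides.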
42 45 nm SoC Transistors
[Figure: IOFF (nA/um) vs. ION (mA/um) for NMOS and PMOS, showing high performance (1.0V) and low power (1.1V) transistor types]
Wider range of transistor types provided for SoC: High performance and low power
Ref. C. Jan, IEDM ‘08
43 45 nm SoC I/O Transistors
[Figure: IOFF (nA/um) vs. ION (mA/um) for I/O transistors at 1.8V, 65nm SiO2 vs. 45nm HKMG. NMOS panel annotation: +17%. PMOS panel annotation: +57%.]
Wider range of transistor types provided for SoC: High speed, high voltage I/O
Ref. C. Jan, IEDM ‘08
44 Devices for SoC Analog Circuits
Passive Elements
• Precision resistor
• High Q varactor
• High Q inductor
Active Elements
• RF CMOS
RF + Mixed Signal Circuits
• A to D, D to A converters
• RF transceiver
• LCPLL
• High speed I/O
[Figure: gain (dB) vs. frequency (GHz), H21 and Mason’s U with -20dB/dec. slope, for NMOS 28nm x 0.9um x 100 at VDS = 1.1V, VG = 0.6V; fT = 395 GHz, fMAX = 410 GHz]
Ref. C. Jan, IEDM ‘08
45 The Old Era of Microprocessor Scaling
• Larger Cores
• Higher Frequency
• Higher Power
It has served us well for >30 years
46 The New Era of Microprocessor Scaling
• Multi-Core
• Many-Core
• Multi-Function System on a Chip
Avoiding the power wall requires a systemic approach from process technology through circuit design to micro-architecture to deliver products with power efficient performance
47 Outline
• Transistor Scaling
• Microprocessor Evolution
• Vision of the Future
48 Future Scaling Challenges
• Patterning ever-smaller feature sizes
• Transistor and interconnect technologies that provide higher performance at lower power
• Continued voltage scaling for low power
• Integrating a wider range of device types for system-on-chip or system-in-package products
49 Lithography
[Figure: lithography wavelength in nm (248nm, 193nm, EUV 13.5nm) and feature size in microns (32nm, 22nm and 15nm marked) vs. year, 1980-2020; enhancements: OPC, phase shift, immersion]
193 nm enhancements got us to the 32 nm generation
50 Layout Restrictions
65 nm Layout Style
• Bi-directional features
• Varied gate dimensions
• Varied pitches
32 nm Layout Style
• Uni-directional features
• Uniform gate dimension
• Gridded layout
51 Lithography Options for Beyond 32 nm
Double Patterning
• Pitch doubling
• Improved 2-D features
Computational Lithography
• Pixelated mask
• Existing 193 nm litho tools
[Images: pitch doubling, 2-D features, pixelated mask, printed image]
52 Extreme Ultraviolet Lithography
[Figure: EUV development status: Cymer beta source, Philips beta source, Intel EUV mask, photoresist development (2007, 1H ’08, 2H ’08, target 2H ’08), ASML ADT printed wafer, Nikon EUV1 printed wafer]
Continued progress towards EUV implementation
53 Transistor Options
Substrate Engineering
+ Increased p-channel mobility
? Impact on n-channel mobility
[Figure: hole mobility (cm2/Vs) vs. <110> stress (MPa) for (100) and (110) substrates and <100>, <110>, <111> channel directions; 2x annotated]
Multi-Gate FETs (FinFET, GAA)
+ Improved electrostatics
+ Steeper sub-threshold slope
? Higher parasitic resistance
? Higher parasitic capacitance
54 III-V Transistor Options
[Figure: cut-off frequency fT (GHz) vs. DC power dissipation (µW/µm), two panels, with curves at VDS = 0.5V and 1.1V]
InGaAs NMOS QWFET [LG = 80nm] vs. silicon [LG = 60nm]: peak fT > 400 GHz at Vcc = 0.5V
InSb PMOS QWFET [LG = 40nm] vs. strained Si p-MOSFET [LG = 60nm]: peak fT > 140 GHz at Vcc = -0.5V
III-V materials for improved performance at low voltage
55 3-D Chip Stacking
+ High density chip-chip connections
+ Small form factor
+ Combine dissimilar technologies
? Added cost
? Degraded power delivery, heat sinking
? Area impact on lower chip
[Figure: top chip stacked on bottom chip with TSVs, mounted on a package; example stack of CPU and memory]
3-D chip stacking using through-silicon vias
56 Optical Interconnects
[Figure: optical layer stacked on a chip (CPU, memory, graphics, etc.) with laser, waveguides, modulators and Ge photodetectors]
Nearer term: High bandwidth chip-chip interconnects
Longer term: On-chip interconnects
Ref. I. Young, paper 28.1, ISSCC ’09
57 High Density Memory
• Floating Body Cell
• Phase Change Memory
• Seek and Scan Probe
Dense memory increasingly important
Several novel directions being explored
58 System Integration
[Figure: discrete components (high density memory, high speed memory, high perf. logic, low power logic, power regulator, radio, sensors, photonics) compared with 2-D integration (SoC) and 3-D integration of logic, memory, power regulator, radio, sensors and photonics]
System integration needed for performance, power, form factor
Challenge is to integrate wider range of heterogeneous elements
59 Higher Level System Integration
[Figure: organic and electronic systems compared on computing, sensors, power supply and motion. Organic: reptile. Electronic: autonomous vehicle (Stanford entry, 2007 DARPA challenge).]
We’re trying to emulate nature’s capabilities
60 Evolutionary Comparison
Organic: Complex Molecule, Single-Cell Organism, Multi-Cell Organism, Reptile, Human
Electronic: Transistor, Integrated Circuit, Microprocessor, PC, Autonomous Vehicle, Robot
What did nature have to “invent” to evolve to higher forms?
61 Brain Neuron
[Figure: neuron with input and output; up to 1 meter in length; ~50 um scale]
                      Neuron       Transistor
Charge carrier:       Ions         Electrons
Voltage swing:        100 mV       1.0 V
Threshold voltage:    10-20 mV     ~300 mV
Nature is a master of low power operation
Neuron image from J. Nolte [36]
62 Organic vs. Electronic Circuits
[Figure: neuron compared with an AND/OR logic function]
Neuron: FI, FO ~1000; operates at ~100 Hz
Brain circuits are slow but massively parallel
Neuron image from J. Nolte [36]
63 Organic vs. Electronic Interconnects
Myelinated axon: 25 m/sec (image scale: 1.0 mm)
Cu + low-k + repeaters: >10^7 m/sec (image scale: 0.5 mm)
Myelin coating improves axon signal speed ~10x, but still slow
Axon image from J. Nolte [36]
64 Organic vs. Electronic Systems
                    Organic                            Electronic
# Devices:          10^11 neurons                      >10^8 CPU transistors
                    10^14 synapses                     10^11 system total
Input Devices:      Eyes, ears, taste, touch, smell    Keyboard, radio, USB port
Operating Freq:     100 Hz                             >2 GHz
Power:              20 Watts                           40 Watts
We have a way to go and much to learn
65 Conclusion
• Moore’s Law continues, but the formula for success is changing
  – New materials and device structures are needed to continue scaling
  – Circuit design and micro-architecture innovations focus more on power efficiency
• System level integration is increasingly important
  – Success will be determined by ability to integrate a wider and more heterogeneous set of components
• Organic evolution has given us some clues for effective higher level system integration
  – Low power operation
  – Massive parallelism
  – Integrated sensors