2009 ISSCC

The New Era of Scaling in an SoC World

Mark Bohr Senior Fellow Logic Technology Development

1 The End of Scaling is Near?

“Optical lithography will reach its limits in the range of 0.75-0.50 microns”

“Minimum geometries will saturate in the range of 0.3 to 0.5 microns”

“X-ray lithography will be needed below 1 micron”

“Minimum gate oxide thickness is limited to ~2 nm”

will never work”

“Scaling will end in ~10 years”

Perceived barriers are meant to be surmounted, circumvented or tunneled through

2 Outline

• Transistor Scaling

• Microprocessor Evolution

• Vision of the Future

3 Scaling Trends

[Figure: CPU transistor count versus year, 1970 to 2020, log scale, growing 2x every 2 years; second axis in microns.]

Transistor dimensions scale to improve performance, reduce power and reduce cost per transistor

4 Scaling Trends

[Figure: CPU transistor count (2x every 2 years) and feature size in microns (0.7x every 2 years) versus year, 1970 to 2020, log scale; the 65 nm, 45 nm and 32 nm generations are marked.]

Transistor dimensions scale to improve performance, reduce power and reduce cost per transistor
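The two trend lines on this slide are geometric progressions, so they are easy to reproduce. The sketch below is illustrative only: the function names are made up for this example, and the 65 nm starting point is simply the first node labeled on the chart.

```python
def scale_feature_size(start_nm, generations, factor=0.7):
    """Return feature sizes after each 2-year generation at a fixed linear shrink."""
    sizes = [start_nm]
    for _ in range(generations):
        sizes.append(sizes[-1] * factor)
    return sizes


def transistor_growth(years, doubling_period_years=2.0):
    """Multiplicative growth in transistor count after `years` at 2x per period."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Two 0.7x shrinks from 65 nm land close to the 45 nm and 32 nm node names.
    print(scale_feature_size(65.0, 2))
    # A 0.7x linear shrink is roughly a 0.5x area shrink, hence about 2x density.
    print(0.7 ** 2)
    # Growth factor over ten years of doubling every two years.
    print(transistor_growth(10))
```

The link between the two curves is the square: shrinking linear dimensions by 0.7x shrinks area by about half, which is what allows the transistor count to double at constant die size.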

5 MOSFET Scaling

Device or Circuit Parameter          Scaling Factor
Device dimension tox, L, W           1/κ
Doping concentration Na              κ
Voltage V                            1/κ
Current I                            1/κ
Capacitance εA/t                     1/κ
Delay time/circuit VC/I              1/κ
Power dissipation/circuit VI         1/κ²
Power density VI/A                   1

R. Dennard, IEEE JSSC, 1974

Classical MOSFET scaling was first described in 1974
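The derived rows of the classical scaling table follow from the primary ones (dimension, voltage and current all scaling as 1/κ). The sketch below checks that arithmetic; the function name and dictionary keys are illustrative, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction


def dennard_scaling(kappa):
    """Derive circuit-level scaling factors from the primary device factors."""
    dimension = 1 / kappa            # tox, L, W all scale as 1/kappa
    voltage = 1 / kappa              # V
    current = 1 / kappa              # I
    area = dimension ** 2            # A ~ L * W
    capacitance = area / dimension   # C = eps * A / t
    return {
        "capacitance": capacitance,
        "delay": voltage * capacitance / current,    # V*C/I
        "power": voltage * current,                  # V*I
        "power_density": voltage * current / area,   # V*I/A
    }


if __name__ == "__main__":
    # kappa = 10/7 corresponds to a 0.7x shrink per generation.
    for name, factor in dennard_scaling(Fraction(10, 7)).items():
        print(name, factor)
```

With κ = 10/7 the delay and capacitance come out at 7/10, the power per circuit at 49/100, and the power density at exactly 1, matching the table's 1/κ, 1/κ² and 1 entries.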

6 30 Years of MOSFET Scaling

[Figure: transistor cross-sections, Dennard 1974 and Intel 2005, with a 1 µm scale bar.]

                         Dennard 1974    Intel 2005
Gate Length:             1.0 µm          35 nm
Gate Oxide Thickness:    35 nm           1.2 nm
Operating Voltage:       4.0 V           1.2 V

Classical scaling ended in the early 2000s due to gate oxide leakage limits

7 90 nm Strained Silicon

NMOS PMOS

High Stress Film

SiGe SiGe

SiN cap layer SiGe source-drain Tensile channel strain Compressive channel strain

Strained silicon provided increased drive currents, making up for lack of gate oxide scaling

8 High-k + Metal Gate Transistors

65 nm Transistor 45 nm HK+MG

SiO 2 dielectric Hafnium-based dielectric Polysilicon gate electrode Metal gate electrode

High-k + metal gate transistors break through gate oxide scaling barrier

9 Transistor Performance Increase

[Figure: IOFF (nA/um) versus ION (mA/um) at 1.0 V for 65 nm and 45 nm transistors, NMOS and PMOS panels; annotations on the plots read +12%, +50%, 5x and 100x.]

45 nm HK+MG provides average 30% drive current increase or >5x IOFF leakage reduction

Ref. K. Mistry, IEDM ’07

10 Gate Leakage Reduction

[Figure: normalized gate leakage versus VGS (V), from -1.2 V to 1.2 V, for PMOS and NMOS; 65 nm SiON/Poly compared with 45 nm HiK+MG, with annotated reductions of 25x and 1000x.]

HK+MG significantly reduces gate leakage

11 Bitcell Leakage Reduction

[Figure: normalized SRAM cell leakage at 1.0 V, 25C for 65 nm and 45 nm, broken down into IGATE, IOFF and IJUNCT; annotated 10x reduction.]

SRAM bitcell leakage reduced ~10x

12 VT Variability Reduction

[Figure: C2 normalized to 180nm versus technology generation (180nm, 130nm, 90nm, 65nm, 45nm). Annotations: Tox scaling gives less VT variation; minimal oxide scaling at intermediate generations; HiK+MG at 45nm.]

$$\sigma V_{Tran} = \frac{\sqrt[4]{4 q^{3} \varepsilon_{si} \phi_{B}}}{2} \cdot \frac{T_{ox}}{\varepsilon_{ox}} \cdot \frac{\sqrt[4]{N}}{\sqrt{L_{eff} \cdot Z_{eff}}} = \frac{1}{\sqrt{2}} \cdot \frac{C_{2}}{\sqrt{L_{eff} \cdot Z_{eff}}}$$

HK+MG provides oxide scaling needed for variability reduction

Ref. K. Kuhn, IEDM ’07

13 Interconnect Trends

[Figure: M2 pitch (um) and number of metal layers versus technology generation (500, 350, 250, 180, 130, 90, 65, 45, 32 nm); M2 pitch scales 0.7x per generation.]

Added metal layers + material improvements enable interconnect scaling

15 Interconnect Trends

[Figure: M2 pitch (um) and number of metal layers versus technology generation (500, 350, 250, 180, 130, 90, 65, 45, 32 nm), annotated with material changes: dielectric SiO2, SiOF, Low-k, Lower-k; metal Al, then Cu.]

Added metal layers + material improvements enable interconnect scaling

16 45 nm Interconnects

Layer    Pitch (nm)
M8       810
M7       560
M6       360
M5       280
M4       240
M3       160
M2       160
M1       160

(Cu metal, low-k dielectric)

Loose pitch + thick metal on upper layers
• High speed global wires
• Low resistance power grid

Tight pitch on lower layers
• Maximum density for local interconnects

Hierarchical interconnect pitches

17 45 nm Interconnects

Polymer

M9 7 µm Cu

M1-8

Thick M9 for very low resistance on-die power routing

18 45 nm Microprocessor Products

[Figure: die photos of Single Core, Dual Core, Quad Core, 6 Core and 8 Core products.]

45 nm process serves microprocessor applications from low power to high performance

19 32 nm Generation

[Figure: feature size in microns versus year, 1970 to 2020, log scale, with the 32nm generation marked.]

20 32 nm Logic Technology

• 2nd generation high-k + metal gate transistors
  - High-k EOT scaled from 1.0 nm to 0.9 nm
  - Replacement metal gate process flow
  - 4th generation strained silicon
• 9 copper + low-k interconnect layers
  - Hierarchical interconnect pitches
  - Thick M9 for power routing
• on critical layers
  - 70% transistor and interconnect pitch scaling
  - 50% SRAM cell area scaling
• Pb-free and halogen-free packages

Higher performance, lower power, lower cost per transistor

21 Contacted Gate Pitch Trend

[Figure: contacted gate pitch (nm) versus year, 1995 to 2010, log scale, scaling 0.7x every 2 years; the 32 nm generation has a 112.5 nm pitch.]

Transistor gate pitch continues to scale 0.7x every 2 years

22 Transistor Performance

[Figure: NMOS and PMOS drive current (mA/um) at 1.0 V, 100 nA IOFF versus gate pitch (nm) for the 130nm, 90nm, 65nm, 45nm and 32nm generations.]

Drive currents continue to increase while gate pitch scales

23 32 nm Interconnects

Layer    Pitch (nm)
M9       (8 um Cu)
M8       566.5
M7       450.1
M6       337.6
M5       225.0
M4       168.8
M3       112.5
M2       112.5
M1       112.5

Hierarchical interconnect pitches
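One way to read the 32 nm pitch table is to divide each layer's pitch by the corresponding 45 nm pitch listed on the earlier "45 nm Interconnects" slide. The sketch below does that; the dictionary and function names are illustrative, and the values are copied from the two slides.

```python
# Pitches in nm, copied from the 45 nm and 32 nm interconnect slides.
PITCH_45NM = {"M1": 160, "M2": 160, "M3": 160, "M4": 240,
              "M5": 280, "M6": 360, "M7": 560, "M8": 810}
PITCH_32NM = {"M1": 112.5, "M2": 112.5, "M3": 112.5, "M4": 168.8,
              "M5": 225.0, "M6": 337.6, "M7": 450.1, "M8": 566.5}


def pitch_ratios(old, new):
    """Return the per-layer pitch ratio new/old for layers present in both."""
    return {layer: new[layer] / old[layer] for layer in old if layer in new}


if __name__ == "__main__":
    for layer, ratio in pitch_ratios(PITCH_45NM, PITCH_32NM).items():
        print(f"{layer}: {ratio:.2f}x")
```

On these numbers the tight-pitch lower layers (M1 to M4) and M8 come out close to 0.7x, while M5 to M7 shrink less, so the 70% pitch scaling figure applies most directly to the density-critical layers.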

24 SRAM Cell Size Scaling

[Figure: SRAM cell area (um²) versus year, 1995 to 2010, log scale, scaling 0.5x every 2 years; the 32 nm generation has a 0.171 um² cell.]

Transistor density continues to double every 2 years

25 SRAM Cell Scaling

65 nm 0.570 µm2

45 nm 0.346 µm2

32 nm 0.171 µm2

Good pattern resolution while scaling feature size and continuing with 193 nm exposure wavelength
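The three cell areas above can be compared against the ideal 0.5x-per-generation area trend (which is what a 0.7x linear shrink implies, since 0.7² ≈ 0.49). This is a small arithmetic sketch; the list and function names are illustrative.

```python
# SRAM cell areas in square microns, as listed on the slide.
SRAM_CELL_AREA_UM2 = [("65 nm", 0.570), ("45 nm", 0.346), ("32 nm", 0.171)]


def generation_ratios(cells):
    """Return (label, area ratio) for each consecutive pair of generations."""
    return [
        (f"{prev[0]} -> {curr[0]}", curr[1] / prev[1])
        for prev, curr in zip(cells, cells[1:])
    ]


if __name__ == "__main__":
    for label, ratio in generation_ratios(SRAM_CELL_AREA_UM2):
        print(f"{label}: {ratio:.2f}x area")
    print(f"ideal from 0.7x linear shrink: {0.7 ** 2:.2f}x area")
```

By this arithmetic the 65 nm to 45 nm step is about 0.61x and the 45 nm to 32 nm step about 0.49x, so the most recent step sits right on the 0.5x trend.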

26 32 nm SRAM Test Chip

• 291 Mbit

• 0.171 um² cell size

• >1.9 billion transistors

• >3.8 GHz operation

• Functional silicon in Aug ‘07
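A rough sanity check on the bit count, cell size and transistor count can be done with simple arithmetic. The sketch below rests on two assumptions that the slides do not state: that the cell is a six-transistor (6T) cell, and that "Mbit" means 2^20 bits. The function and constant names are illustrative.

```python
BITS_PER_MBIT = 2 ** 20      # assumption: binary megabits (not stated in the slides)
TRANSISTORS_PER_CELL = 6     # assumption: 6T SRAM cell (not stated in the slides)


def array_transistors(mbits):
    """Transistors in the bit array alone, excluding periphery."""
    return mbits * BITS_PER_MBIT * TRANSISTORS_PER_CELL


def array_area_mm2(mbits, cell_area_um2):
    """Area of the bit array alone, converting square microns to square mm."""
    return mbits * BITS_PER_MBIT * cell_area_um2 / 1e6


if __name__ == "__main__":
    print(f"array transistors: {array_transistors(291) / 1e9:.2f} billion")
    print(f"array area: {array_area_mm2(291, 0.171):.1f} mm^2")
```

Under these assumptions the bit array alone accounts for about 1.83 billion transistors, a little below the >1.9 billion total, which would leave the remainder for decoders, sense amplifiers and other periphery.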

32 nm SRAM test vehicle included all transistor and interconnect features used on 32 nm microprocessors

Ref. Y. Wang, paper 27.1, ISSCC ’09

27 30 Years of Scaling

Contact 1978

Ten 32nm SRAM Cells 2008

1 µm

28 The Old Era of Device Scaling

Device or Circuit Parameter          Scaling Factor
Device dimension tox, L, W           1/κ
Doping concentration Na              κ
Voltage V                            1/κ
Current I                            1/κ
Capacitance εA/t                     1/κ
Delay time/circuit VC/I              1/κ
Power dissipation/circuit VI         1/κ²
Power density VI/A                   1

It has served us well for >30 years

29 The New Era of Device Scaling

SiGe SiGe

Copper + Low-k Strained Silicon High-k + Metal Gate

Modern CMOS scaling is as much about material and structure innovation as dimensional scaling

30 Outline

• Transistor Scaling

• Microprocessor Evolution

• Vision of the Future

31 Microprocessor Evolution

• More transistors
• Higher frequency
• More data bits per cycle
• Instruction parallelism
• Out-of-order issue
• Multi-threading

Many of these innovations have been for improved performance; now the challenge is to innovate for power efficiency

32 45 nm Nehalem CPU

Modern microprocessors are complex systems with multiple functional units and multiple interfaces

33 45 nm Nehalem CPU

• 23 master DLL circuits
• 11 PLL circuits
• 5 digital thermal sensors

Multiple clocking domains, local control

34 SRAM Dynamic Sleep Transistors

[Figure: SRAM cache sub-block between VDD and VSS, with a sleep transistor and sleep control on the VSS side; IREM images showing banks being accessed, comparing normal SRAM sub-block leakage with sleep transistors that shut off leakage in inactive sub-blocks.]

5-10x leakage reduction during “retention/standby”

Ref. K. Zhang, VLSI Circuits ‘04

35 Integrated Power Gates

[Figure: Nehalem power delivery. Labels: VCC, thick on-die interconnect layer (M9), power gates, Core0 to Core3, and memory system, cache, I/O on VTT.]

• Shuts off both switching power and leakage power
• Enables idle cores to go to ~0 power, independent of state of other cores on die

Ref. R. Kumar, paper 3.2, ISSCC ’09

36 Power Gates Enabled with Design+Process Co-optimization

M9

M1-8

Thick metal 9 layer for low resistance on-die power routing

Ultra-low leakage transistor for high off-resistance power gates

37 Nehalem Turbo Mode

[Figure: per-core frequency for Core 0 to Core 3 under many threaded workloads and under lightly threaded workloads in Turbo Mode.]

• All cores operating
• Power gates shut off some cores
• Zero power for inactive cores
• Higher frequency for active cores

Dynamically delivering optimal performance and energy efficiency

Ref. R. Kumar, paper 3.2, ISSCC ’09

38 Nehalem Power Control Unit

[Figure: Power Control Unit (PCU) connected to Core 0 through Core 3 and the uncore/LLC, each with Vcc, frequency (PLL) and sensor connections; BCLK input.]

• Integrated proprietary microcontroller
• Shifts control from hardware to embedded firmware
• Real time sensors for voltage, temperature, current/power
• Flexibility enables sophisticated algorithms, tuned for current operating conditions

Ref. R. Kumar, paper 3.2, ISSCC ’09

39 Adaptive Frequency System

[Figure: digital voltage supply waveform, with higher frequency at the peaks and lower frequency at the droops.]

• Adaptive PLL frequency
  – Higher frequency during voltage peaks
  – Lower frequency during voltage droops
• Up to 5% frequency improvement at same voltage
• Lower power at same frequency

Ref. N. Kurd, VLSI Circuits ’08

40 PC Platform Comparison

[Figure: 1985 platform with separate Intel386™ processor, Intel387 math co-processor, clock generator, cache control, cache tag SRAM, cache data SRAM, DRAM control and DRAM chips, compared with a 2008 platform consisting of a Nehalem processor and DRAM.]

Modern microprocessors integrate many of the separate system components from past platforms

41 Microprocessor Evolution

                       Intel386™        Nehalem
Transistor Count:      280 thousand     731 million
Frequency:             16 MHz           >3.6 GHz
# Cores:               1                4
Cache Size:            None             8 MB
I/O Peak Bandwidth:    64 MB/sec        50 GB/sec
Adaptive Circuits:     None             Sleep Mode, Turbo Mode, Power Gating, Adaptive Frequency Clocking
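The transistor counts in this comparison can be checked against the "2x every 2 years" trend from earlier in the talk. The sketch below takes the 1985 and 2008 dates from the PC Platform Comparison slide; the function name is illustrative.

```python
import math


def doubling_period_years(count_start, count_end, years):
    """Average time per doubling implied by growth from count_start to count_end."""
    doublings = math.log2(count_end / count_start)
    return years / doublings


if __name__ == "__main__":
    # Intel386 (1985, 280 thousand) to Nehalem (2008, 731 million).
    period = doubling_period_years(280e3, 731e6, 2008 - 1985)
    print(f"transistor count doubled about every {period:.2f} years")
    # Lower bounds, since the slide gives the Nehalem frequency as ">3.6 GHz".
    print(f"frequency ratio: at least {3.6e9 / 16e6:.0f}x")
    print(f"I/O bandwidth ratio: about {50e9 / 64e6:.0f}x")
```

The roughly 2600x growth in transistor count over 23 years works out to a doubling period just over two years, in line with the trend chart, while frequency grew by a much smaller factor over the same interval.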

42 45 nm SoC Transistors

[Figure: IOFF (nA/um) versus ION (mA/um) for NMOS and PMOS SoC transistors, showing high performance devices at 1.0 V and low power devices at 1.1 V.]

Wider range of transistor types provided for SoC: High performance and low power

Ref. C. Jan, IEDM ‘08

43 45 nm SoC I/O Transistors

[Figure: IOFF (nA/um) versus ION (mA/um) for NMOS and PMOS I/O transistors at 1.8 V, 65nm SiO2 compared with 45nm HKMG; annotations on the plots read +17% and +57%.]

Wider range of transistor types provided for SoC: High speed, high voltage I/O

Ref. C. Jan, IEDM ‘08

44 Devices for SoC Analog Circuits

Passive Elements
• Precision resistor
• High Q varactor
• High Q inductor

Active Elements
• RF CMOS

RF + Mixed Signal Circuits
• A to D, D to A converters
• RF transceiver
• LCPLL
• High speed I/O

[Figure: gain (dB) versus frequency (GHz) for an NMOS device, 28nm x 0.9um x 100, VDS = 1.1V, VG = 0.6V; H21 and Mason’s U curves with -20dB/dec. slope, fT = 395 GHz, fMAX = 410 GHz.]

Ref. C. Jan, IEDM ‘08

45 The Old Era of Microprocessor Scaling

Larger Cores Higher Frequency Higher Power

It has served us well for >30 years

46 The New Era of Microprocessor Scaling

Many-Core Multi-Core Multi-Function System on a Chip

Avoiding the power wall requires a systemic approach from process technology through circuit design to micro-architecture to deliver products with power efficient performance

47 Outline

• Transistor Scaling

• Microprocessor Evolution

• Vision of the Future

48 Future Scaling Challenges

• Patterning ever-smaller feature sizes

• Transistor and interconnect technologies that provide higher performance at lower power

• Continued voltage scaling for low power

• Integrating a wider range of device types for system-on-chip or system-in-package products

49 Lithography

[Figure: lithography wavelength (nm) and feature size (micron) versus year, 1980 to 2020, log scale. Wavelengths: 248nm, 193nm, EUV 13.5nm. Enhancements: OPC, phase shift, immersion. Feature sizes marked: 32nm, 22nm, 15nm.]

193 nm enhancements got us to the 32 nm generation

50 Layout Restrictions

65 nm Layout Style
• Bi-directional features
• Varied gate dimensions
• Varied pitches

32 nm Layout Style
• Uni-directional features
• Uniform gate dimension
• Gridded layout

51 Lithography Options for Beyond 32 nm

Double Patterning
• Pitch doubling
• Improved 2-D features

[Figure: examples of pitch doubling and 2-D features.]

Computational Lithography
• Pixelated mask
• Existing 193 nm litho tools

[Figure: pixelated mask and the resulting printed image.]

52 Extreme Ultraviolet Lithography

[Figure: EUV progress. Images: Cymer beta source, Philips beta source, Intel EUV mask, photoresist development, ASML ADT printed wafer, Nikon EUV1 printed wafer. Photoresist timeline markers: 2007, 1H ’08, 2H ’08, with a target for 2H ’08.]

Continued progress towards EUV implementation

53 Transistor Options

Substrate Engineering
+ Increased p-channel mobility
? Impact on n-channel mobility

[Figure: hole mobility (cm²/Vs) versus <110> stress (MPa), from -3000 to 0, for (100) and (110) substrates with <110>, <100> and <111> channel directions; annotated 2x.]

Multi-Gate FETs (FinFET, GAA)
+ Improved electrostatics
+ Steeper sub-threshold slope
? Higher parasitic resistance
? Higher parasitic capacitance

54 III-V Transistor Options

[Figure: cut-off frequency fT (GHz) versus DC power dissipation (µW/µm). Left panel: InGaAs QWFET (LG = 80nm) compared with a silicon MOSFET (LG = 60nm). Right panel: InSb p-QWFET (LG = 40nm) compared with a strained Si p-MOSFET (LG = 60nm). Curves are labeled with VDS = 0.5V and 1.1V.]

InGaAs NMOS QWFET: Peak fT > 400 GHz at Vcc = 0.5V

InSb PMOS QWFET: Peak fT > 140 GHz at Vcc = -0.5V

III-V materials for improved performance at low voltage

55 3-D Chip Stacking

+ High density chip-chip connections
+ Small form factor
+ Combine dissimilar technologies

? Added cost
? Degraded power delivery, heat sinking
? Area impact on lower chip

[Figure: two stacking diagrams, one labeled Top Chip, TSV, Bottom Chip, Package and one labeled CPU, TSV, Memory, Package.]

3-D chip stacking using through-silicon vias

56 Optical Interconnects

[Figure: optical interconnects. An optical layer containing a laser, modulators, waveguides and Ge photodetectors sits on a chip (CPU, memory, graphics, etc.) and links chips together.]

Nearer term: High bandwidth chip-chip interconnects

Longer term: On-chip interconnects

Ref. I. Young, paper 28.1, ISSCC ’09

57 High Density Memory

Floating Body Cell Phase Change Memory Seek and Scan Probe

Dense memory increasingly important

Several novel directions being explored

58 System Integration

[Figure: three integration styles. Discrete components; 2-D integration (SoC) combining high density memory, high speed memory, high performance logic, low power logic, power regulator, radio, sensors and photonics; 3-D integration stacking logic, memory, power regulator, radio, sensors and photonics.]

System integration needed for performance, power, form factor

Challenge is to integrate wider range of heterogeneous elements

59 Higher Level System Integration

[Figure: an organic system (reptile) compared with an electronic system (autonomous vehicle, Stanford entry, 2007 DARPA challenge), each labeled with computing, sensors, power supply and motion.]

We’re trying to emulate nature’s capabilities

60 Evolutionary Comparison

Organic:     Complex Molecule | Single-Cell Organism | Multi-Cell Organism | Reptile | Human

Electronic:  Transistor | Integrated Circuit | Microprocessor PC | Autonomous Vehicle | Robot

What did nature have to “invent” to evolve to higher forms?

61 Brain Neuron

[Figure: brain neuron with input and output ends, a cell body of ~50 um, and up to 1 meter in length.]

                      Neuron       Transistor
Charge carrier:       Ions         Electrons
Voltage swing:        100 mV       1.0 V
Threshold voltage:    10-20 mV     ~300 mV

Nature is a master of low power operation

Neuron image from J. Nolte [36]

62 Organic vs. Electronic Circuits

FI, FO ~1000 Operates ~100 Hz

AND/OR Function

Brain circuits are slow but massively parallel

Neuron image from J. Nolte [36]

63 Organic vs. Electronic Interconnects

Myelinated Axon: 25 m/sec

Cu + Low-k + Repeaters: >10^7 m/sec

[Figure: myelinated axon with a 1.0 mm scale bar, compared with Cu wires in low-k dielectric driven by repeaters, with a 0.5 mm scale bar.]

Myelin coating improves axon signal speed ~10x, but still slow

Axon image from J. Nolte [36]

64 Organic vs. Electronic Systems

                   Organic                            Electronic
# Devices:         10^11 Neurons, 10^14 Synapses      >10^8 CPU Transistors, 10^11 System Total
Input Devices:     Eyes, Ears, Taste, Touch, Smell    Keyboard, Radio, USB Port
Operating Freq:    100 Hz                             >2 GHz
Power:             20 Watts                           40 Watts

We have a way to go and much to learn

65 Conclusion

• Moore’s Law continues, but the formula for success is changing
  – New materials and device structures are needed to continue scaling
  – Circuit design and micro-architecture innovations focus more on power efficiency
• System level integration is increasingly important
  – Success will be determined by ability to integrate a wider and more heterogeneous set of components
• Organic evolution has given us some clues for effective higher level system integration
  – Low power operation
  – Massive parallelism
  – Integrated sensors
