Physical Design Challenges and Innovations to Meet Power, Speed, and Area Scaling Trend


LC Lu, TSMC
TSMC Fellow/Senior Director, R&D

Platform Solutions

[Figure: platform solutions. Application platforms: Mobile, High Performance Computing, Automotive, IoT. Enabling technologies: 3D transistors (FinFET, GAA-NWT), 3D stacking (BSI, monolithic MEMS), 3D packaging (WLP, WoW). Functions shown: CPU, voice, camera, video, MP3, storage.]

System integration, 2D to 3D: multiple chips on PCB, SoC, 3D packaging (WLP, WoW)
• System performance/bandwidth
• Better performance
• Better power
• Better form factor
• Application-specific
• Process technology + design solutions

© 2017 ISPD (ISPD 2017)

Application-Optimized Design Platforms

[Figure: area, performance, and power by platform. High-Performance Computing: speed and memory bandwidth. Automotive: functional safety, standard compliance. Mobile. IoT: ultra-low power. Axes: power, speed.]

• Area: process primary dimension scaling slows down
  – Very challenging to continue Moore’s law economically
  – Process and design co-optimization to provide enough area scaling
• Performance: dimension scaling grows metal and via resistance exponentially
  – Fully automatic and smart via pillar design flow to reduce high resistance impact
• Power: ultra low voltage for ultra low power
  – Robust design and variation modeling at low voltages
• Heterogeneous integration
  – Low cost and high system performance 3D packaging
• High physical design complexity
  – Machine learning to be applied to physical design

• Process primary dimensions (metal, gate, and fin pitch) scale less per generation, and the per-generation cadence takes longer
• Process-design co-optimization innovations to keep area scaling
  – Fin depopulation to increase cell density
  – Power plan and cell layout co-optimization to increase cell placement utilization
  – EUV to increase routing density

[Figure: 4-fin, 3-fin, and 2-fin cell layouts]

• Higher fins have the highest speed
• Fewer fins have the highest speed at the same power and the lowest power at the same speed

[Figure: logic density (1.0 to 3.0) and cell utilization (70% to 80%) by node: 16nm (3~4 fins), 10nm (3 fins), 7nm (2 fins)]

• To improve power plan IR drop, the PG via count is increasing over technology generations
  – Shrink PG pitch to increase via count
• However, shrinking PG pitch consumes routing resources, so new and different PG design approaches have evolved

[Figure: PG and cell optimized co-design. Top down: the PG design has to be friendly to the cell architecture (uniform M1 to dual M1), which allows better cell placement freedom. Bottom up (cell re-design): big cells have to be redesigned to be compatible with the dual PG architecture. Annotations: power strap; no stagger pin: 5 access points; fewer cells placed under PG; unused space.]

[Figure: patterning options: inverse lithography, multiple patterning, EUV, and directed self-assembly (DSA). Labels: NXE3300 EUV single patterning, hole-type DSA, line/space DSA, 16 nm half pitch, 12 nm half pitch.]

• More metal resources for logic density improvement
• Two library sets are needed for the two metal offsets

[Figure: ND2D1 cell layouts with M1:PO (poly) pitch = 1:1 and M1:PO pitch = 2:3]

[Figure: cell utilization (70% to 80%, relative scale 1.0 to 2.0) for N16 (11ML), N10 (12ML), and N7 (13ML), with contributions from optimized power plan, cell pin access, and EUV routing]

• Process dimension scaling grows metal and via resistance exponentially
• A fully automatic and smart EDA design flow is needed to effectively solve the high resistance issue for 7nm and beyond

[Figure: Mx resistance (vs. 40nm) and Mx capacitance (vs. 40nm) by node, 40nm, 28nm, 16nm, 7nm, 5nm; annotations: "increase ~3X"]

[Figure: BEOL R delay vs. total delay (0% to 50%) by node, 40nm, 28nm, 16nm, 7nm, 5nm]
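A rough way to see why resistance compounds across generations is the textbook rectangular-wire relation R/L = ρ/(W·T). The sketch below assumes constant resistivity and an equal shrink of width and thickness each generation; the 0.7x shrink and the function names are illustrative assumptions, not data from the slides. Real interconnect fares worse than this model, because barrier layers and electron scattering raise the effective resistivity of narrow wires.

```python
def wire_resistance_per_um(resistivity_ohm_um, width_um, thickness_um):
    """Per-length resistance of a rectangular wire: R/L = rho / (W * T)."""
    return resistivity_ohm_um / (width_um * thickness_um)


def resistance_growth(shrink_per_generation, generations):
    """Per-length resistance ratio after shrinking W and T by the same factor
    each generation, with resistivity held constant (a simplifying assumption)."""
    return (1.0 / shrink_per_generation ** 2) ** generations


# Hypothetical 0.7x linear shrink per generation; not foundry data.
for gen in range(4):
    print(gen, round(resistance_growth(0.7, gen), 2))
```

Because the ratio is multiplied by the same factor every generation, the growth is geometric in the number of generations even under these optimistic assumptions.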

[Figure: single stacking via vs. via pillar across metal layers M2 to M6]

• Large drivers + upper thick metal + via pillar reduce transistor, metal, and via resistance
• Via pillar enablement is complex and requires EDA enablement across all implementation phases

• Implies layer promotion for better wire resistance
• Effectively reduces source via resistance

Std. cell wire (20μm):

Configuration                        Metal R    Source via R   Note
Connection through DPT layer         55.43X     1X             DPT wire resistance dominates
Layer promotion to non-DPT layer     13.23X     3.29X          Source via resistance increases
Via pillar                           13.23X     1X             Best combination of metal and source via resistance
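The via resistance benefit of a pillar can be illustrated with ordinary series and parallel resistor arithmetic: each via level adds in series, while the vias within one level of a pillar act in parallel. This is only a sketch under that lumped assumption; the per-via resistances and the count of four vias per level are hypothetical and are not taken from the table above.

```python
def parallel(resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def stack_resistance(via_r_per_level, vias_per_level):
    """Series stack of via levels, each level built from identical parallel vias."""
    return sum(parallel([r] * vias_per_level) for r in via_r_per_level)


# Hypothetical per-via resistances (ohms) for four via levels; not foundry data.
via_r = [40.0, 40.0, 20.0, 20.0]
single_stack = stack_resistance(via_r, 1)  # one via per level
via_pillar = stack_resistance(via_r, 4)    # four parallel vias per level
print(single_stack, via_pillar)
```

In this lumped model the stack resistance drops by the number of parallel vias per level, which is why a pillar can reach upper metal without paying the source via penalty of a single stacked via.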

[Figure: BEOL R delay vs. total delay (0% to 50%) by node, 40nm, 28nm, 16nm, 7nm, 5nm, showing the reduction with via pillar]

• EM via pillar (EM VP), used to guarantee cell-level EM, has to be 100% inserted
  – One cell master will only have 1 EM VP
• Performance via pillar (Performance VP), which is larger than EM VP, can reduce more resistance
  – One cell master can have multiple performance VPs
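The association rules above (one EM VP per cell master, any number of performance VPs, and 100% EM VP insertion) can be captured in a small data structure. The following is a minimal sketch, not the format used by any design kit; the class, function, cell, and VP names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CellMaster:
    name: str
    em_vp: str                                                # exactly one EM VP per master
    performance_vps: List[str] = field(default_factory=list)  # zero or more

    def __post_init__(self):
        if not self.em_vp:
            raise ValueError(f"{self.name}: a cell master needs one EM VP")


def missing_em_vp(instances: Dict[str, CellMaster],
                  inserted: Dict[str, Set[str]]) -> List[str]:
    """Return the instances whose EM VP has not been inserted."""
    return [inst for inst, master in instances.items()
            if master.em_vp not in inserted.get(inst, set())]


# Hypothetical cell and VP names, for illustration only.
buf = CellMaster("BUFD8", em_vp="VP_EM_A",
                 performance_vps=["VP_PERF_1", "VP_PERF_2"])
instances = {"u1": buf, "u2": buf}
inserted = {"u1": {"VP_EM_A", "VP_PERF_1"}, "u2": {"VP_PERF_2"}}
print(missing_em_vp(instances, inserted))
```

A check like `missing_em_vp` returning an empty list corresponds to the 100% EM VP insertion requirement; performance VPs stay optional and can be chosen per instance.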

[Figure: VP association. Association I: cell master linked to a default VP, an EM VP, and performance VP1, VP2, ... Association II: cell master linked to an EM VP and performance VP1, VP2, ...]

Design Kits
• Via pillar (VP) definition, association

Floorplan

Placement
• VP aware placement
• Automatic NDR

CTS
• VP insertion and verification
• CTS with VP

Routing/Opt.
• VP optimization
• DEF out option

• Low voltage design reduces power effectively
• Low voltage design challenges
  – Functionality robustness
  – Accurate variation modeling

[Figure: low-voltage cell design techniques by supply range (0.9V-0.6V; 0.5V and below). Labels: skew cells, D0 or fine-grained cells, internal half-sizing cells, small high-stack (3 or 4) cells, transmission gates, multi-bit latch/flop, proper stage ratio to maintain internal slew, proper P/N ratio to improve noise margin, new circuit structure to enhance robustness at low Vdd.]

• Improve cell performance by reducing the delay degradation and variation induced by low Vdd

• High sigma design checks are required to ensure robust operation at low operating voltage
• For write operation, the forward path (red) must be stronger than the feedback path (blue)

[Figure: flip-flop schematic with D, SI, Reset, clk, and clk_b signals]

[Figure: log of scaled delay standard deviation vs. VDD. Delay variation increases exponentially as VDD decreases; the distribution is non-Gaussian at low voltage and Gaussian at regular voltage.]

Flop Hold Constraint Behavior

[Figure: output vs. hold time at low VDD and at regular VDD]

• Methodology improvement
  – Timing values are modeled by both a “mean” and “early/late” distributions
  – Implement the new models in timing characterization and STA tools

[Figure: original distribution (non-Gaussian with long right tail), with an “early” distribution to cover the short tail and a “late” distribution to cover the long tail]
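One simple way to picture a mean plus separate early/late distributions is a split-sigma fit: take the sample mean, then compute one spread from the samples below it and another from the samples above it. The sketch below uses that idea on synthetic lognormal delays. It is an illustrative stand-in, not the characterization method used in the talk, and the function name and sample parameters are assumptions.

```python
import math
import random
import statistics


def early_late_model(samples):
    """Return (mean, early_sigma, late_sigma) for a possibly skewed sample set.

    early_sigma is the RMS deviation of samples below the mean (short tail);
    late_sigma is the RMS deviation of samples at or above it (long tail).
    """
    mean = statistics.fmean(samples)
    early = [mean - s for s in samples if s < mean]
    late = [s - mean for s in samples if s >= mean]

    def rms(devs):
        return math.sqrt(sum(d * d for d in devs) / len(devs)) if devs else 0.0

    return mean, rms(early), rms(late)


# Synthetic right-skewed delays standing in for low-VDD Monte Carlo data.
random.seed(0)
delays = [random.lognormvariate(0.0, 0.5) for _ in range(20000)]
mean, early_sigma, late_sigma = early_late_model(delays)
print(round(mean, 3), round(early_sigma, 3), round(late_sigma, 3))
```

For a right-skewed distribution the late sigma comes out larger than the early sigma, so a single symmetric sigma would be pessimistic on the early side or optimistic on the late side.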

• By using the new timing model and an advanced statistical OCV method, STA results match Monte Carlo simulations more accurately

[Figure: slack by path ID for SPICE Monte Carlo, simple OCV, and advanced statistical OCV; optimistic region marked]

• Low cost and high system performance
• InFO and CoWoS integration
• Integrated EDA design flow for package and chips

[Figure: technology portfolio. Labels: TFET, GAA-NWT, III-V, Ge FinFET on Si, Si CMOS FinFET; specialty CMOS technology (RF CMOS, analog, mixed signal, eFlash, BSI CIS, MEMS, HV, BCD power IC, low power); 3D-IC technology (InFO, CoWoS with Si interposer, Cu-TSV vertical stacking, wafer bonding).]

[Figure: system integration options, combining logic, memory, analog, RF, and passives: conventional SiP or MCM (wirebond, flip chip, SMT) vs. 3D wafer-based system integration (CoWoS, InFO-PoP, vertical stacking, wafer bonding)]

[Figure: I/O pin count (0 to 4000) vs. die/package size (0 to 1400 mm²) for WLCSP, InFO PoP, InFO-M, InFO, and CoWoS]

WLCSP: Wafer Level Chip Scale Package. InFO: Integrated Fan-Out. CoWoS: Chip on Wafer on Substrate.

Wafer-Level Packaging Technologies: Integrated Fan-Out (InFO)

• Multi-chip integration
• Smallest form factor
• Cost competitive

[Figure: the same I/O pin count vs. die/package size chart, highlighting InFO-PoP (DRAM over logic, connected with Through-InFO-Via, TIV) and InFO-M (multi-chip: logic and memory)]

[Figure: the same chart, highlighting CoWoS: homogeneous partition, high bandwidth memory integration for high-performance computing, and high-performance heterogeneous partitioning]

Advanced Packaging Co-Design – SoC + InFO + IPD (Integrated Passive Device)

[Figure: FC-PoP vs. InFO-PoP cross-sections with layer stacks, and DRAM eye diagrams for InFO-PoP W/I IPD]

1. Thermal dissipation improved by 12%: Rja 17.2 °C/W (FC-PoP) vs. 15.4 °C/W (InFO-PoP)
2. Thickness reduced by 57%: 590μm vs. 250μm
3. Eye width improved by >12%: eye openings of 0.785 UI and 0.663 UI

[Figure: InFO design flow]

• InFO layout creation: CDL netlist auto-export, InFO package layout with in-design DRC, InFO GDS
• InFO DRC/LVS: InFO package layout checked against the InFO CDL netlist
• InFO RC extraction: RC/RLCK extraction below 4GHz for STA, IR, SEM, and PI analysis; S-parameter extraction from 4GHz to 10GHz for SI/PI and EMI analysis
• Inter-die DRC/LVS: inter-die DRC, inter-die LVS
• InFO EM/IR analysis: static IR analysis, dynamic IR analysis, top die EM/IR analysis
• SI/PI simulation: SI analysis, PI analysis
• Thermal-aware EM/IR analysis: normal analysis (no thermal) vs. thermal-aware analysis

Complex Physical Design Problems

• Place and route machine learning experiment
• Feature extraction and convolutional neural network data mapping
• Clock gating and routing congestion speed improvement

• Eliminate human and fixed-model subjective bias by using statistically important features

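[Figure: traditional EDA approach vs. machine learning approach]

• Traditional EDA approach: go from the placement result to a routing result through routing simulation (with routing overheads) or routing pattern programming. Only heuristics are possible, and the result is easy to bias with no guaranteed quality.
• Machine learning approach: train a model on placement results and detailed routing results (with training overheads), then use the learned model to predict the routing result. This leverages widely used deep learning packages.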
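To make the feature extraction step concrete, a placement can be rasterized into a 2D grid that a convolutional network consumes like an image. The sketch below uses a simplified RUDY-style wire density estimate (each net spreads its half-perimeter wirelength uniformly over its bounding box). This is a commonly used congestion feature swapped in for illustration; the talk does not say which features were used, and the function name and inputs are hypothetical.

```python
def rudy_map(nets, die_w, die_h, nx, ny):
    """Rasterize net bounding boxes into an ny-by-nx wire demand grid.

    nets: list of (x0, y0, x1, y1) bounding boxes in die units.
    Each net contributes density (w + h) / (w * h) over its box, weighted by
    the fraction of each bin that the box covers.
    """
    bin_w, bin_h = die_w / nx, die_h / ny
    grid = [[0.0] * nx for _ in range(ny)]
    for x0, y0, x1, y1 in nets:
        # Clamp degenerate boxes to one bin so the density stays finite.
        w = max(x1 - x0, bin_w)
        h = max(y1 - y0, bin_h)
        density = (w + h) / (w * h)
        for j in range(ny):
            for i in range(nx):
                # Overlap between bin (i, j) and the net bounding box.
                ox = min(x0 + w, (i + 1) * bin_w) - max(x0, i * bin_w)
                oy = min(y0 + h, (j + 1) * bin_h) - max(y0, j * bin_h)
                if ox > 0 and oy > 0:
                    grid[j][i] += density * ox * oy / (bin_w * bin_h)
    return grid


# Two hypothetical nets on a 4x4 die, mapped to a 2x2 grid.
feature = rudy_map([(0, 0, 2, 2), (0, 0, 4, 2)], die_w=4, die_h=4, nx=2, ny=2)
for row in feature:
    print(row)
```

Grids like this, stacked with other channels such as pin density, form the image-like input; the post-route congestion map from a completed run would serve as the training label.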

[Figure: traditional EDA flow vs. flow enabled by machine learning]

• Traditional EDA flow: pre-route design, then the congestion map after routing. Routing results are known only after running routing.
• Flow enabled by machine learning: pre-route design, congestion prediction, then the congestion map after routing. Able to predict routing behavior and improve congestion with new recipes to achieve a 40MHz speed improvement.

ML Design Enablement Platform

Adopt TSMC ML Platform to predict the design improvement opportunity
• Baseline flow: RTL, Synthesis, Place & Route, STA, Signoff, Tapeout
• Flow with the platform: RTL, Synthesis, Place & Route, STA, Signoff, Optimized design
• TSMC ML Design Enablement Platform: APR database, feature extraction, data pre-processing, prediction with TSMC’s proven ML training models, script to perform design optimization

Adopt TSMC ML Platform to train design and build new models
• Baseline flow: RTL, Synthesis, Place & Route, STA, Signoff, Tapeout
• Flow with the platform: RTL, Synthesis, Place & Route, STA, Signoff, Optimized design
• TSMC ML Design Enablement Platform: APR database for new learning, feature extraction, data pre-processing, training, model management (TSMC’s models, customer’s models), prediction, script to perform design optimization

• Capability to predict and set accurate ARM A72 clock gating latency to avoid over-design and achieve 50 to 150MHz better speed
• Post-route speed is improved by 40MHz with detour prediction and early fixes

[Figure: predicted detours vs. real post-route detours]

• Five semiconductor industry trends, issues, and solutions are discussed: area, performance, and power scaling, heterogeneous 3D integration, and machine learning
• Physical design and EDA play even more critical roles in extending Moore’s law and enabling highly integrated, complex 3D SoC chips