Computer Aided Design of Digital Systems I

Professor Massoud Pedram Dept. of EE University of Southern California Spring 2003

Instructor Info

? Massoud Pedram

? Office : EEB 344

? Office Hours: Thursdays 9-12, EEB 344

? Email : [email protected]

?1 TA Info

? Soroush Abbaspour

? Office : EEB 206

? Office hours : Mondays 10-12, EEB 201

? Email : [email protected]

Class Info

? When : Fridays, 9-11:50 am

? Where: OHE 100

? Class Web Page: http://www-classes.usc.edu/engr/ee-s/680

?2 Readings

? Required Textbook:

? We will not be using a textbook for this class, but will instead use handouts of powerpoint slides and published papers from the literature. These will be accessible from the class web site.

? Recommended Readings:

? S. M. Sait and H. Youssef, VLSI Physical Design Automation: Theory and Practice, IEEE Press, Piscataway, NJ, 1995.

? N. Sherwani, Algorithms for VLSI Physical Design Automation, 3rd ed., Kluwer Academic Publishers, Boston, MA, 1995.

? A. B. Kahng and G. Robins, On Optimal Interconnections for VLSI, Kluwer Academic Publishers, Boston, MA, 1995.

? M. Sarrafzadeh and C. K. Wong, An Introduction to VLSI Physical Design, McGraw-Hill, New York, NY, 1996.

? E. G. Friedman, ed., Clock Distribution Networks in VLSI Circuits and Systems, IEEE Press, New York, NY, 1995.

Course Objectives

? EE680 is a CAD course aimed at graduate students who want broad exposure to the algorithms and data structures of modern VLSI design tools

? EE680 is an applied algorithms class, where the application area is VLSI design. We focus on the class of designs called Application Specific Integrated Circuits where the goal is to go from a high-level design down to a mask layout both correctly and quickly

?3 Prerequisites

? EE457

? EE477L or EE577a

? CSCI 455x

? Graduate student standing

Syllabus

? Introduction to Physical Design

? Partitioning

? Placement

? Global Routing

? Detailed Routing

? Static Timing Analysis

? Clock Design

? Supply Network Design

? Parasitic Extraction

?4 Grading

? Homework: 30%

? Late home work will not be accepted

? Midterm Exams: 40%

? Two in-class, open-book exams, each 20%

? Final Project: 30%

? Requires people to work in groups of two

Policies

? No “Incomplete” grades will be given for the course, except under very extreme circumstances.

? You are not permitted to submit extra work in an attempt to raise your grade.

? You are responsible for all assigned readings and information presented in class, including due dates, assignments, exams and so forth. Moreover, you are expected to attend all class meetings.

? Scholastic misconduct will not be tolerated. Scholastic dishonesty includes, but is not limited to: cheating on assignments or exams; plagiarizing; or interfering with another student’s work.

?5 Modern Physical Design: Algorithm Technology Methodology

Based on ICCAD-00 tutorial by Chow/Kahn/Sarrafzadeh with contributions from Keutzer

Outline

? Why CAD

? Technology trends

? What is Physical Design

? Post-layout optimization methodologies

? manufacturability and reliability

? performance

? Custom or custom-on-the-fly methodologies

? Flavors of planning-based methodologies

? Implications for P&R

?6 Micro-electronic Industry

Micro-electronic Industry: US$920 billion personal computers, cellular phones, networking equipment, video games, cameras

Supported by

Semiconductor Industry: $US150 billion microprocessors, micro- controllers, DSPs, integrated chipsets

Semiconductor Industry

US $150 Billion

Analog 15.2% MOS Micro Discrete & Opto. 33.7% 13.2%

0.9% Bipolar Digital 18.3% 14.8% MOS Memory MOS Logic

?7 Semiconductor Manufacturing

Design New New New Methodology Process Material Procedure

System IC specification Design Fabrication Packaging Test $150 billion

Tools: design simulation Equipment Material Equipment Equipment emulation

$3 billion $33 billion $23 billion

Making Sausages

Piglet Pig Pork Sausages Packaging

?8 Semiconductor Industry

? Rapid advancement in technology ? Miniaturization ? Low cost ? Unbounded market needs ? Vertical integration ? Capital intensive ? Customization ? Short product life ? Brain power intensive

Technology

rapid advancement in technology miniaturization, low cost Moore’s Law cheaper, smaller, faster systems

greater market needs

?9 Miniaturization (Production)

? m 1.40 1.20 1.00 0.80 0.60 0.40 0.20 0.00 1990 1992 1994 1996 1998 2000 2002 Thickness of skin is 100 ? m

Diameter of a piece of hair is 50 ? m Finger nails grow by 1 ? m in 10 minutes

Miniaturization

? m 1.40 1.20 1.00 0.80 0.60 0.40 0.20 0.00 1990 1992 1994 1996 1998 2000 2002

WW Far East

?10 Influencing CAD: Moore’s Law

NTRS: CHIP Frequency (GHz)

?11 Low Cost

? A transistor cost $30 in 1960

? 8080 (5,000 tx) cost $150 in 1974: 3 cents/tx

? Pentium-II (7,500,000 tx) cost $225 in 1997: 0.003 cents/tx

? 1,000 fold decrease from 1960 to 1974

? 1,000,000 fold decrease from 1960 to 1997

Vertical Integration

Design Masks Foundry Packaging Test

Lead frame Wafers Chemicals

?12 Capital Intensive (Construction Cost)

Million US$ 2500 Moore’s Law also applies! 256M, 200mm 2000 20,000 Wafer/m

64M, 200mm 256M, 300mm 1500 25,000 Wafer/m 20,000 Wafer/m Fully Automated 4M, 150mm 1000 20,000 Wafer/m 30% Automated 256K, 150mm 16M, 150mm 500 Manual 25,000 Wafer/m 64K, 100mm 50% Automated DRAM 1M, 150mm 0 20,000 Wafer/m 1970 1975 1980 1985 1990 1995 2000

Customization and Short Product Life

Complex Design Work Short Design Cycle

Brain Power Design Power

?13 Tools and Design Methodology

Design Tools Methodology System Level Tools & Design Methodology Are Inextricably Interwoven- RTL like an “Electronic” DNA

Gate Level Underlying Architecture adds yet Another Dimension Transistor Architecture Level

Physical Level

Where does CAD fit in

?14 CAD at every level…

Increased Device and Context Complexity

?15 Deep Sub Micron effects

Heterogeneity on CHIP

?16 Stronger Market Pressure

A Quadruple-Whammy

?17 Evolution of EDA Industry

Design Process

?18 Case Study: Alpha Performance History

100

EV67-700

EV6-575

EV56-500 EV56-600

EV56-400 10

SPECint95. EV5-300

EV45-275

EV4-200

1 1992 1993 1994 1995 1996 1997 1998 1999 Date of Introduction

Courtesy Bill Herrick Director, Alpha Microprocessor Development

Case Study: Complexity Trends

? Process scaling has continued steadily

Process Features ? Planarization has enabled an increase 0.8 10 in the number of interconnect layers 0.7 8 0.6 ? Transistor counts have increased 0.5 6 0.4 dramatically with the L2 cache SRAMs 0.3 4 0.2 ? Additionally, design team size has

2 Metal Layers Dimension (um) 0.1 0 0 increased ~40% per generation EV4 EV5 EV6 EV7 EV8 ? Opportunities to manage complexity Chip Features and productivity 300 450 ? Fundamental understanding and modeling of 400 250 ) 350 2 process and circuit element behaviors 200 300 250 ? High level design methods 150 200 100 150 ? CAD Die Size (mm

Transistors (M) 100 50 ? Design reuse 50 0 0 ? Micro-architecture EV4 EV5 EV6 EV7 EV8

?19 Case Study: Power Dissipation Trends

? Power consumption is increasing Power Dissipation ? Better cooling technology needed 160 3.5 ? Supply current is increasing faster! 140 3 120 2.5 ? On-chip signal integrity will be a 100 2 80 major issue 1.5 60

Power (W) 1 40 Voltage (V) ? Power and current distribution are 20 0.5 0 0 critical EV4 EV5 EV6 EV7 EV8 ? Opportunities to slow power growth Supply Current ? Accelerate Vdd scaling 140 3.5 ? Low ? dielectrics & thinner (Cu) 120 3 interconnect 100 2.5 80 2 ? SOI circuit innovations 60 1.5 ? Clock system design

Current (A) 40 1 Voltage (V) ? micro-architecture 20 0.5 0 0 EV4 EV5 EV6 EV7 EV8

Case Study: Performance Trends

? Performance has increased Clock Speed significantly (7x) faster than frequency 60 1600 1400 ? Performance tracks transistor count 50 1200 40 when L2 cache ignored 1000 30 800 ? Transistor budget has increased more than 20 600 performance when L2 cache is considered 400 10 200 but Frequency (MHz)

Relative Performance 0 0 ? benchmarks did not reflect larger EV4 EV5 EV6 EV7 EV8 applications

Transistor Count ? Opportunities to continue performance 70 300 improvements 60 250 50 200 ? Continued scaling of devices, interconnect 40 150 and dielectrics 30 100 ? Clock distribution 20 10 50 Transistors (M) ? Micro-architecture Relative Performance 0 0 ? System design EV4 EV5 EV6 EV7 EV8

?20 Case Study: Challenging Design Trends

? Micro-architecture and logic design Logic Levels per Cycle are stressed as frequency has 20 1600 increased faster than scaling 1400 15 1200 ? Further reducing the number of gate 1000 10 800 delays per cycle will be difficult 600 5 400 ? Cycles to communicate across chip 200 Frequency (MHz)

Gate Delays per Cycle 0 0 track with frequency EV4 EV5 EV6 EV7 EV8 ? Clock edge rates are not scaling

Cycles Across Chip ? Opportunities to continue 8 1600 7 1400 performance increases 6 1200 ? Chip implementation design 5 1000 4 800 ? Clock system design

Cycles 3 600 ? Micro-architecture 2 400

1 200 Frequency (MHz) 0 0 EV4 EV5 EV6 EV7 EV8

Outline

? Why CAD

? Technology trends

? What is Physical Design

? Post-layout optimization methodologies

? manufacturability and reliability

? performance

? Custom or custom-on-the-fly methodologies

? Flavors of planning-based methodologies

? Implications for P&R

?21 Overall Road Map Technology Characteristics

YEAR OF FIRST PRODUCT SHIPMENT 1997 1999 2002 2005 2008 2011 2014 TECHNOLOGY NODE 250 180 130 100 70 50 35 DENSE LINES (DRAM HALF-PITCH) (nm) ISOLATED LINES (MPU GATES) (nm) 200 140 100 70 50 35 25 Logic (Low-Volume—ASIC)‡ Usable transistors/cm2 (auto layout) 8M 14M 24M 40M 64M 100M 160M Nonrecurring engineering cost 50 25 15 10 5 2.5 1.3 /usable transistor (microcents) Number of Chip I/Os – Maximum Chip-to-package (pads) 1515 1867 2553 3492 4776 6532 8935 (high-performance) Chip-to-package (pads) 758 934 1277 1747 2386 3268 4470 (cost-performance) Number of Package Pins/Balls – Maximum Microprocessor/controller 568 700 957 1309 1791 2449 3350 (cost-performance) ASIC 1136 1400 1915 2619 3581 4898 6700 (high-performance) Package cost (cents/pin) 0.78-2.71 0.70-2.52 0.60-2.16 0.51-1.85 0.44-1.59 0.38-1.36 0.33-1.17 (cost-performance) Power Supply Voltage (V) Minimum logic Vdd (V) 1.8–2.5 1.5–1.8 1.2–1.5 0.9–1.2 0.6–0.9 0.5–0.6 0.37-0.42 Maximum Power High-performance with heat sink (W) 70 90 130 160 170 175 183 Battery (W)—(Hand-held) 1.2 1.4 2 2.4 2.8 3.2 3.7

Overall Road Map Technology Characteristics (Cont’d)

YEAR OF FIRST PRODUCT SHIPMENT 1997 1999 2002 2005 2008 2011 2014 TECHNOLOGY NODE 250 180 130 100 70 50 35 DENSE LINES (DRAM HALF-PITCH) (nm) Chip Frequency (MHz) On-chip local clock 750 1250 2100 3500 6000 10000 16903 (high-performance) On-chip, across-chip clock 375 1200 1600 2000 2500 3000 3674 (high-performance) On-chip, across-chip clock 300 500 700 900 1200 1500 1936 (high-performance ASIC) On-chip, across-chip clock 400 600 800 1100 1400 1800 2303 (cost-performance) Chip-to-board (off-chip) speed (high-performance, reduced-width, 375 1200 1600 2000 2500 3000 3674 multiplexed bus) Chip-to-board (off-chip) speed 250 480 885 1035 1285 1540 1878 (high-performance, peripheral buses) Chip Size (mm2) (@sample/introduction) DRAM 280 400 560 790 1120 1580 2240 Microprocessor 300 340 430 520 620 750 901 ASIC [max litho field area] 480 800 900 1000 1100 1300 1482 Lithographic Field Size (mm2) 22 x 22 25 x 32 25 x 36 25 x 40 25 x 44 25 x 52 25 x 59 484 800 900 1000 1100 1300 1482 Maximum Number Wiring Levels 6 6–7 7 7–8 8–9 9 10

?22 ITRS Acceleration

95 97 99 01 04 07 10 13 16 500

350 1994 250 Pitch (nm) - 1997 180 Technology Node 1998/1999 130

DRAM Half 100 DRAM Half Pitch - 90 Minimum Feature 70 2000 ITRS, MPU/ASIC Gate “Physical” 65 50 45 1999 ITRS MPU/ASIC Gate “In Resist” 35 33 25 23

Technology Node and MPU/ASIC Gate Length Minimum Feature Size (nm) 16 95 97 99 01 04 07 10 13 16 Year of Production

Technology Scaling Trends

? Interconnect

? Impact of scaling on parasitic capacitance

? Impact of scaling on inductance coupling

? Impact of new materials on parasitic capacitance & resistance

? Trends in number of layers, routing pitch

? Device

? Vdd, Vt, sizing ? Circuit trends (multi-threshold CMOS, multiple supply voltages, dynamic CMOS)

? Impact of scaling on power and reliability

?23 Technology Scaling Trends (Cont’d)

Technology Scaling Trends (Cont’d)

? Scaling of x0.7 every three years

? .25u .18u .13u .10u .07u .05u

? 1997 1999 2002 2005 2008 2011

? 5LM 6LM 7LM 7LM 8LM 9LM

? Interconnect delay dominates system performance

? consumes 70% of clock cycle

?24 Technology Scaling Trends (Cont’d)

? Cross coupling capacitance is dominating

? cross capacitance ? 100%, ground capacitance ? 0%

? 90% in 0.18u

? huge signal integrity implications (e.g., guardbands in static analysis approaches)

? Multiple clock cycles required to cross chip

? whether 3 or 15 not as important as fact of “multiple” > 1

Deep-Submicron Interconnect Complexity

18

16

14 Estimated Number of Nets At-Risk 12 Risk Factors: Interconnect Delay 10 Signal Integrity Electromigration 8 Process Variations 6

4

2 0.13 0.18 0.25 0.35 0.5 0 Risk Nets (millions) - At Technology (? )

?25 Scaling of Noise with Process

? Cross coupling noise increases with

? process shrink

? frequency of operation

? Propagated noise increases with decrease in noise margins

? decrease in supply voltage

? more extreme P/N ratios for high speed operation

? IR drop noise increases with

? complexity of chip size

? frequency of chip

? shrinking of metal layers

New Materials Implications

? Lower dielectric

? reduces total capacitance

? doesn’t change cross-coupled / grounded capacitance proportions

? Copper metallization

? reduces RC delay

? avoids electromigration (factor of 4-5 ?)

? thinner deposition reduces cross cap

?26 New Materials Implications (Cont’d)

? Multiple layers of routing

? enabled by planarized processes; 10% extra cost per layer

? reverse-scaled top-level interconnects

? relative routing pitch may increase

? room for shielding

Technical Issues in UDSM Design

? New issues and problems arising in UDSM

? catastrophic yield: critical area, antennas

? parametric yield: density control (filling) for CMP

? parametric yield: subwavelength lithography implications

? optical proximity correction (OPC)

? phase-shifting mask design (PSM)

? signal integrity

? crosstalk and delay uncertainty

? DC electromigration

? AC self-heat

? hot electrons

?27 Technical Issues in UDSM Design (Cont’d)

? Current context: cell-based place-and-route methodology

? placement and routing formulations, basic technologies

? methodology contexts

Technical Issues in UDSM Design (Cont’d)

? Manufacturability (chip can't be built)

? antenna rules

? minimum area rules for stacked vias

? CMP (chemical mechanical polishing) area fill rules

? layout corrections for optical proximity effects in subwavelength lithography; associated verification issues

? Signal integrity (failure to meet timing targets)

? crosstalk induced errors

? timing dependence on crosstalk

? IR drop on power supplies

?28 Technical Issues in UDSM Design (Cont’d)

? Reliability (design failures in the field)

? electromigration on power supplies

? hot electron effects on devices

? wire self heat effects on clocks and signals

Noise Sources

? Analog design concerns are due to physical noise sources

? because of discreteness of electronic charge and stochastic nature of electronic transport processes

? example: thermal noise, flicker noise, shot noise

? Digital circuits due to large, abrupt voltage swings, create deterministic noise which is several orders of magnitude higher than stochastic physical noise

? still digital circuits are prevalent because they are inherently immune to noise

? Technology scaling and performance demands make noisiness of digital circuits a big problem

?29 Why Now

? These effects have always existed, but become worse at UDSM sizes because of:

? finer geometries

? greater wire and via resistance

? higher electric fields if supply voltage not scaled

? more metal layers

? higher ratio of cross coupling to grounded capacitance

? lower supply voltages

? more current for given power

? lower device thresholds

? smaller noise margins

Why Now (Cont’d)

? Focus on interconnect

? susceptible to patterning difficulties

? CMP, optical exposure, resist development/etch, CVD, ...

? susceptible to defects

? critical area, critical volume

?30 Issues in IC Design

SYSTEM SUBSYSTEM CIRCUIT PHYSICAL MANUFACTURE DESIGN DESIGN DESIGN DESIGN INTERFACE

Spec RTL/code Gates/Cells Xtrs Masks Chip

SW design Logic Analog and Detailed Mask Design Partitioning optimization macro placement correction Functional Technology design Detailed Yield mapping mapping Extraction routing optimization Sorting Floorplanning

Analysis Performance Power Power, Power N/A modeling estimation noise distribution Power analysis analysis estimation Signal integrity System Functional Circuit N/A Verification simulation simulation simulation Formal checking LVS/DRC Equivalence checking Static timing verification

Test Test logic Test Pattern Chip test & Test architecture insertion model generation & diagnostics generation merge

New Figure 4 (Draft Rev. B, 3-12-99) Red denotes most challenging activity

Design Productivity Gap

? “Design Productivity Gap” = “failure of Design Technology” ? Number of available transistors grows faster than designer ability to design them well ? Increased design effort, risk, turnaround time (TAT) ? fewer designs are worth trying ? Manufacturing non-recurring engineering (NRE) cost also increasing (mask set) ? fewer designs are worth trying ? “Workarounds” sacrifice quality, value of designs ? even with workarounds, fewer designs worth trying ? This is a semiconductor industry problem, not an EDA problem

?31 The Design Productivity Gap “How many gates can I get for $N?”

Equivalent Added Complexity

68 %/Yr compounded Complexity growth rate $10 $3

21 %/Yr compound$1 Productivity growth rate

3 Yr. Design Year Technology Chip Complexity Frequency Staff Staff Cost* 1997 250 nm 13 M Tr. 400 MHz 210 90 M 1998 250 nm 20 M Tr. 500 270 120 M 1999 180 nm 32 M Tr. 600 360 160 M 2002 130 nm 130 M Tr. 800 800 360 M * @ $ 150 k / Staff Yr. (In 1997 Dollars) Source: SEMATECH

Headcount and Moore’s Law

?32 ASSP: 44% WallTime, 39% Total Effort After First Tape-out

100% Time and Effort Allocation by First Tape-out

First Tape-out 80%

60% Weeks) - 40% (Man 61% 20% 39%

Percent of Total Project Effort 56% 44% 0% 0% 20% 40% 60% 80% 100% Start of Concept Phase Release to Manufacturing Percent of Total Project Duration * Data Source: Collett International Inc.’s Design Productivity Management SystemTM (DPMS) database. ** ASSP (Application Specific Standard Product): Standard “off--the-shelf” IC product that has been designed to implement a specific application function.

ASSP Design Productivity +27% Annually

10X Design Productivity Trend* 9X 8X Key = ASSP*** design 7X 6X 5X 4X 3X

Design Productivity 2X 1X 0 06/94 06/95 06/96 06/97 06/98 Project Start Date * Data Source: Collett International Inc.’s Design Productivity Management SystemTM (DPMS) database. ** Methodology: The design productivity trendline is the ordinary -least-squares (OLS) regression line. 27% is the compound annual growth rate between 06/94 & 06/98. *** ASSP (Application Specific Standard Product): Standard “off--the-shelf” IC product that has been designed to implement a specific application function.

?33 Silicon Complexity and Design Complexity

? Silicon complexity: physical effects cannot be ignored

? fast but weak gates; resistive and cross-coupled interconnects

? subwavelength lithography from 350nm generation onward

? delay, power, signal integrity, manufacturability, reliability all become first-class objectives along with area

? Design complexity: more functionality and customization, in less time

? reuse-based design methodologies for SOC

? Interactions increase complexity

? need robust, top-down, convergent design methodology

Guiding Philosophy in the Back-End

? Many opportunities to leave $$$ on table

? physical effects of process, migratability

? design rules more conservative, design waivers up

? device-level layout optimizations in cell-based methodologies

? Verification cost increases

? Prevention becomes necessary complement to checking

?34 Guiding Philosophy in the Back-End (Cont’d)

? Successive approximation = design convergence

? upstream activities pass intentions, assumptions downstream

? downstream activities must be predictable

? models of analysis/verification = objectives for synthesis

? More “custom” bias in automated methodologies

Implications of Complexity

? UDSM: Silicon complexity + Design complexity

? convergent design: must abstract what’s beneath

? prevention with respect to analysis/verification checks

? many issues to worry about (all are “first-class citizens”

? apply methodology (P/G/clock design, circuit tricks, …) whenever possible

? must concede loss of clean abstractions: need unifications

? synthesis and analysis in tight loop

? logic and layout : chip implementation planning methodologies

? layout and manufacturing : CMP/OPC/PSM, yield, reliability, SI, statistical design, …

?35 Implications of Complexity (Cont’d)

? must hit function/cost points that maximize $/wafer

? reuse-based methodology

? need for differentiating IP ? customization

Outline

? Why CAD

? Technology trends

? What is Physical Design

? Post-layout optimization methodologies

? manufacturability and reliability

? performance

? Custom or custom-on-the-fly methodologies

? Flavors of planning-based methodologies

? Implications for P&R

?36 Implementation

Physical Design

?37 Standard Cell Layout

Gordian Placement Flow

?38 Library, Netlist, and Aspect Ratio7

Setting up Global Optimization

?39 Resulting Layout

Partitioning

?40 Layout after Min-cut

Final Placement

?41 Slicing Tree

Other details – slotting constraints

?42 Generating Final Placement

Standard Cell Layout

?43 Global Layout

Channel Routing

pins top and bottom edges

?44 Two-Dimensional Compaction

Final Placement

?45 Outline

? Why CAD

? Technology trends

? What is Physical Design

? Post-layout optimization methodologies

? manufacturability and reliability

? performance

? Custom or custom-on-the-fly methodologies

? Flavors of planning-based methodologies

? Implications for P&R

Example: Defect-related Yield Loss

? High susceptibility to spot defect-related yield loss, particularly in metallization stages of process

? Most common failure mechanisms: shorts or opens due to extra or missing material between metal tracks

? Design tools fail to realize that values in design manuals are minimum values, not target values

? Spot defect yield loss modeling

? extremely well-studied field

? first-order yield prediction: Poisson yield model

? critical-area model much more successful

? fatal defect types

?46 Defect-related Yield Loss

fatal defect types

Critical Area for Short Circuits

Critical Area for Shorts

?47 Critical Area for Short Circuits

Critical Area for Shorts

Approaches to Spot Defect Yield Loss

? Modify wire placements to minimize critical area

? Router issue

? router understands critical-area analyses, optimizations

? spread, push/shove (gridless, compaction technology)

? layer reassignment, via shifting (standard capabilities)

? related: via doubling when available, etc.

? Post-processing approaches in PV are awkward

? breaks performance verification in layout (if layout has been changed by physical verification)

? no easy loop back to physical design: convergence problems

?48 Example: Antennas

? Charging in semiconductor processing

? many process steps use plasmas, charged particles

? charge collects on conducting poly, metal surfaces

? capacitive coupling: large electrical fields over gate oxides

? stresses cause damage, or complete breakdown

? induced Vt shifts affect device matching (e.g., in analog)

Antennas

? Charging in semiconductor processing (Antenna effects refer to permanent gate oxide damage caused by electrical discharges during manufacturing)

? Standard solution: limit the “antenna ratio”

? antenna ratio = (Apoly + AM1 + ... ) / Agate-ox

? e.g., antenna ratio < 300

? AMx ? metal (x) area electrically connected to node without using metal (x+1), and not connected to an active area

?49 Antennas

? General solution == bridging (break antenna by moving route to higher layer)

? Antennas also solved by protection diodes

? not free (leakage power, area penalties)

? Basically, annoying-but-solved problem

? not clear whether today’s approaches scale into the future

? (today, mostly post-processing approaches)

Macroscopic Process Effects

Dummy Fill controls Variation greatly reduced. several types of process Dummy Area Fill Large topographical variation caused by soft Pattern distortions : pad and mechanical properties

Dense Array CMP, SOG

Contact Overetch causesReduced leakage contact etch variation

RIE

Isolated Transistor Dense Array

Dummy Fill Pattern

CVD

R. Pack, Cadence

?50 Subwavelength Optical Lithography

? WYSIWYG (layout = mask = wafer) failed starting with 350nm generation ? Optical lithography: feature size limited by diffraction ? Available knobs ? aperture: OPC ? phase: PSM

Optical Proximity Correction (OPC)

? Aperture changes to improve process control ? improve yield (process window) ? improve device performance

OPC Corrections No OPC With OPC

Original Layout

?51 OPC Terminology

Phase Shifting Masks (PSM)

conventional mask phase shifting mask glass Chrome Phase shifter 0 E at mask 0

0 E at wafer 0

0 I at wafer 0

?52 Field-Dependent Aberration

? Field-dependent aberrations cause placement errors and distortions

Big Chip Lens Cell A Field-dependent aberrations affect the fidelity and placement (X1 , Y 1) of critical circuit features. Cell A

Towards Lens Wafer Plane (X0 , Y0)

Cell A

Center: Minimal Edge: High Aberrations Aberrations (X2 , Y 2)

CELL_ A(X1,Y1) ? CELL_A(X0,Y0) ? CELL _A(X2,Y2)

R. Pack, Cadence

Design-Manufacturing Interface Changes EDA

? Closely related to foundry capital expenditure

? Unites EDA with much of mask industry, even process development

? Expands scope of physical “verifications”, moves awareness upstream into “syntheses” (logic, layout)

? Very comprehensive changes to data model, infrastructure, flows

? Unified, front-to-back solutions will win

?53 Process Variation Sources

• Design ? (manufacturing variability) ? Value • Intrinsic variations – Systematic: due to predictable sources, can be compensated during design stage – Random: inherently unpredictable fluctuations and cannot be compensated • Dynamic variations – Stem from circuit operation, including supply voltage and temperature fluctuations – Depend on circuit activity and hard to be compensated • Correlations – Tox and Vth0 are correlated due to

| Qdep | Vth0 ? V fb ? 2? B ? ?Tox ? ox – Line width and spacing are anti-correlated by one; ILD and interconnect thickness also anti-correlated

Technology Trend Over Generations

Technology 180nm 130nm 100nm Device nmos pmos nmos pmos nmos pmos Leff (µm) 0.10 ±15% 0.12 ± 15% 0.09 ± 15% 0.09 ± 15% 0.06 ± 15% 0.06 ± 15% Tox (nm) 40 ± 4% 42 ± 4% 33 ± 4% 33 ± 4% 25 ± 4% 25 ± 4% Vth0 (V) 0.40 ± 12.5% -0.42 ± 12.5% 0.27 ± 15.5% -0.35 ± 15.5% 0.26 ± 12.7% -0.30 ± 12.7% Rdsw (O/?) 250 ± 10% 450 ± 10% 200 ± 10% 400 ± 10% 180 ± 10% 300 ± 10% Interconnect local global local global local global e 3.5 ± 3% 3.2 ± 5% 2.8 ± 5% w (µm) 0.28 ± 20% 0.80 ± 20% 0.20 ± 20% 0.60 ± 20% 0.15 ± 20% 0.50 ± 20% s (µm) 0.28 ± 20% 0.80 ± 20% 0.20 ± 20% 0.60 ± 20% 0.15 ± 20% 0.50 ± 20% t (µm) 0.45 ± 10% 1.25 ± 10% 0.45 ± 10% 1.20 ± 10% 0.50 ± 10% 1.20 ± 10% ILDh (µm) 0.65 ± 15% 1.80 ± 15% 0.45 ± 15% 1.60 ± 15% 0.30 ± 15% 1.20 ± 15% Rvia (O) 46 ± 20% 50 ± 20% 54 ± 20% Length (µm) 61.01 1061 45.19 1127 33.90 1247 Wn/Ln (µm) 1.26/0.18 20/0.18 0.91/0.13 15/0.13 0.80/0.10 10/0.10 Dynamic Temp (oC) 25-100 25/100 25/100 Vdd (V) 1.8 ± 10% 1.5 ± 10% 1.2 ± 10% Tr (ps ) 160 95 60

• Values are from ITRS, BPTM, and industry; red is 3s • From ongoing work at UCSD/UCB/Michigan; some values may be incorrect (e.g., Rvia)

?54 Variation Sensitivities: Local Stage

30 25 Leff Leff 25 Vth0 Vth0 20 Rdsw w 20 Vdd eps 15 w t 15 ILDh

Variation (%) Variation (%) 10 Vdd

s 10 s 3 3 Delay Sensitivity for Noise Sensit1vity for 5 5

0 0 180nm 130nm 100nm 180nm 130nm 100nm

• Sensitivity evaluated by the percentage change in performance when there is 3s variation at the parameter • For local stage, device variations have larger impact on line delay and interconnect variations have stronger impact on crosstalk noise

Mapping Design to Value (1)

Across-Wafer Frequency Variation

?55 Mapping Design to Value (2)

AMD Processors

Athlon MP 450 Athlon 4 Mobile 400 Athlon Desktop 350 Duron 300 Duron Mobile

250

200 Price ($) 150

100

50

0 0 200 400 600 800 1000 1200 1400 1600 Clock Speed (MHz)

Goal: combine charts (1) and (2), drive Design optimizations

FO4 INV Delays Per Clock Period

120.00

100.00

386

80.00 486 DX2 DX4 Pentium Pentium MMX 60.00 Pentium Pro Pentium II Celeron 40.00 Pentium III Pentium 4

20.00 Number of FO4 inverter delays

0.00 1982 1987 1993 1998 2004 Year

? FO4 INV = inverter driving 4 identical inverters (no interconnect) ? Half of frequency improvement came from reducing logic stages ? Other extra performance came from slower Vdd scaling, but this costs too much power

?56 Wire Spacing and Layout Methodology

? Routing tools do not always optimize for spacing

? Stand-alone spacing

? layout (GDSII/DEF) -> layout (GDSII/DEF)

? Need tight interface to extraction and timing simulation

? Future: built-in extraction and timing estimates

Courtesy M. Berkens, DAC99

Data Aspects of Post Layout Optimization

? Jogging increases amount of data significantly

? Massive data needs striping

? minor loss of optimality for large stripes

? need work across hierarchy

? fix boundary location, “look” beyond cut- line

? need propagate net information

? Must support multi-processing for reasonable TAT

Courtesy M. Berkens, DAC99

?57 Wire Spacing and Shielding

? Pre routing specification

? convenient, handled by router

? robust but conservative

? may consume big area

? Post routing specification

? area efficient-shield only where needed & have space

? ease task of router

? sufficient shielding is not guaranteed

? Either way: definite interactions w/ fill insertion, possible interactions w/ phase-shifting (M1,M2?)

Courtesy M. Berkens, DAC99

Opportunities for Via Strengthening

? Add cut holes where possible

? wire widening may need larger/more vias

? “non square” via cells

? Increase metal-via overhang

? non uniform overhang

Courtesy M. Berkens, DAC99

?58 Wire spacing example

before spacing after spacing

Courtesy M. Berkens, DAC99

Performance Optimization Methodology

? Tradeoffs: Speed / Power / Area

? Must compromise and choose between often competing criteria

? For given criteria (constraints) on some variables, make best choice for free variables (min cost) => Need to be on boundary of feasible region

?59 Optimization Methods

Reorganize Logic Retime Buffer Size

Space

? Many different kinds of delay/area optimization are possible

? Many optimizations are somewhat independent

? use several different optimizations. Apply whichever ones are applicable

Optimization at Layout Level

? Size Transistors

? Space/size wires

? Add/delete buffers

? Modify circuit locally

?60 Transistor Sizing: Area Delay Curve

Area Feasible X You are here Region Optimal Curve lowest area for You need a given delay to be here

Infeasible Region X Min cost No assignment of sizes can produce a result here

Min delay Required Delay Delay

Transistor sizing: What will it buy me

? Scenario: Lots of capacitance in wires

? will it buy me speed: Yes

? will is save me power: “Yes” (qualified)

Architecture cannot Architecture is Area satisfy application (increase parallelism) an overkill for Delay cannot this application be improved Architecture and at any cost application are well matched Delay can be improved at almost no cost

Delay

?61 Transistor Sizing: Convexity + Dual Goals

5ns 10ns

W1 Circuits of constant delay:15ns (size of Xtr1) Faster circuits inside this curve

Circuits of constant cost W1 +W2 = Cte Optimal point for 10ns W2 (size of Xtr2) Note: Actually circuit delay is Posynomial ~ Convex

Transistor Sizing Methods

? Exact Solutions

? gradient Search

? convex Programming

? Approximate methods (very good solutions)

? iterative improvement on critical path (e.g. TILOS)

?62 Convex Programming Outside Delay Case

? Add more and more bounds

? guess new solution (deep) inside bounds

Bound 1 New guess delay is too slow so W1 New guess add new bound: Tangent to curve of equal delay at new Required guess. New feasible region is to the left (region which contains Delay X

New bound required delay).

Bound 2

W2

Convex Programming Inside Delay Case

? New guess delay is adequate but try and improve cost

Bound 1 W 1 Add a bound to force New guess search into region of lower cost. New bound Required Delay is constant cost curve X New bound passing through new

Bound 3 guess. New feasible Constant cost Bound 2 region is below new bound. W2

?63 Transistor Sizing: Approximate Solutions

Circuit delay affected only by delay of Move maximally in this critical path. Upsize by direction D1 to improve delay W1 small amount transistors on crit path with biggest D1/D2 = Constant Delay improvement/cost. Repeat until timing met

Constant Move minimally in this cost direction D2 to reduce cost

W2

Transistor Sizing: TILOS method

? = speedup of T ?2 = slowdown of T 1 T Effective speedup of T = ? -? = 5 5 1 2

3 Critical Path 4 Effective speedup per unit area

? Increase Xtr on critical path with largest effective speedup per unit area

? effective speedup: T

?64 Short Circuit Power Optimization

? Critical path methods miss short circuit power

Critical path

Short circuit power burned I slow in all of these gates due to slow input rise time. Gates not Slow node on critical path

? Increase Islow until capacitive power increase for driving Islow is more than decrease in S.C. power

? sweep circuit from outputs to inputs

TILOS Optimization Trajectory

Starting Point. Feasible Region Power X downsize Reduce S.C. Note: Min Size != Min Power X Infeasible X fix timing Region X Reduce S. Circuit.

Required delay Delay

?65 Buffer Insertion: Area-delay tradeoffs

With buffer Without buffer Feasible Region Area Is the Union of both feasible regions

Area of Min Size buffer Add buffer at Optimization Trajectory this point Delay

? Optimal curve is envelope of curves

? jump to buffered curve during timing optimization

Local Re-synthesis

? Pass Xtr re-synthesis, logic reorganization

? Gate collapsing

Tp P1 P3 P2

N3 N1 N2 Tn

? Tp conducts <=> N1 conducts. Replace Tp with N1 ? repeat for P2 and Tn for correct NMOS/PMOS

?66 Gate Collapsing: Example

? Trade off drive-capability/logic-levels

? Intrinsic Delay RC Delay

? reduce number of transistors (area )

Outline

? Why CAD

? Technology trends

? What is Physical Design

? Post-layout optimization methodologies

? manufacturability and reliability

? performance

? Custom or custom-on-the-fly methodologies

? Flavors of planning-based methodologies

? Implications for P&R

?67 Custom Methodology in ASIC(?) / COT

? How much is on the table w.r.t. performance?

? 4x speed, 1/3x area, 1/10x power (Alpha vs. StrongArm vs. “ASIC”)

? layout methodology spans RTL synthesis, auto P&R, tiling/generation, manual

? library methodology spans gate array, std cell, rich std cell, liquid lib, …

? Traditional view of cell-based ASIC

? Advantages: high productivity, TTM, portability (soft IP, gates)

? Disadvantages: slower, more power, more area, slow production of std cell library

? Traditional view of Custom

? Advantages: faster, less power, less area, more circuit styles

? Disadvantages: low productivity, longer TTM, limited reuse

Custom Methodology in ASIC(?) / COT (Cont’d)

? With sub-wavelength lithography:

? how much more guardbanding will standard cells need?

? composability is difficult to guarantee at edges of PSM layouts, when PSM layouts are routed, when hard IPs are made with different density targets, etc.

? context-independent composability is the foundation of cell- based methodology!

? With variant process flavors:

? hard layouts (including cells) will be more difficult to reuse

? ? Relative cost of custom decreases

? On the other hand, productivity is always an issue...

?68 Custom Methodology in ASIC(?) / COT (Cont’d)

? Architecture

? heavy pipelining

? fewer logic levels between latches

? Dynamic logic

? used on all critical paths

? Hand-crafted circuit topologies, sizing and layout

? good attention to design reduces guardbands

? The last seems to be the lowest-hanging fruit for ASIC

Custom Methodology in ASIC(?) / COT (Cont’d)

? ASIC market forces (IP differentiation) will define needs for xtor-level analyses and syntheses

? Flexible-hierarchical top-down methodology

? basic strategy: iteratively re-optimize chunks of the design as defined by the layout, i.e., cut out a piece of physical hierarchy, reoptimize it (“peephole optimization”)

? for timing/power/area (e.g., for mismatched input arrival times, slews)

? for auto-layout (e.g., pin access and cell porosity for router)

? for manufacturability (density control, critical area, phase- assignability)

? DOF’s: diffusion sharing, sizing, new mapping / circuit topology sol’s

? chunk size: as large as possible (tradeoff between near-optimality, CPU time)

?69 Custom Methodology in ASIC(?) / COT (Cont’d)

? antecedents: IBM C5M, Motorola CELLERITY, DEC CLEO

? “infinite library”recovers performance and density that a 300-cell library and classic cell- based flow leave on the table

Custom Methodology in ASIC(?) / COT (Cont’d)

? Supporting belief: characterization and verification are increasingly a non-issue

? CPUs get faster; size of layout chunks (O(100-1000) xtors) stay same

? natural instance complexity limits due to hierarchy, layers of interest

? Compactor-based migration tools are an ingredient ?

? migration perspective can infer too many constraints that aren’t there (consequence of compaction mindset)

? little clue about integrated performance analyses

?70 Custom Methodology in ASIC(?) / COT (Cont’d)

? Tuners are an ingredient ? (size, dual-Vt, multi- supply)

? limit DOFs (e.g., repeater insertion and clustering, inverter opts

? cannot handle modern design rules, all-angle geometries

? not intended to do high-quality layout synthesis

? Layout synthesis is an ingredient ?

? requires optimizations based on detailed analyses (routability, signal integrity, manufacturability), transparent links to characterization and verification

Custom Methodology in ASIC(?) / COT (Cont’d)

? “Layout or re-layout on the fly” is an element of performance- and cost-driven ASIC methodology going forward

? “Polygon layout as a DOF in circuit optimization” is a very small step from “polygon layout as a DOF in process migration”

? designers are already reconciled to the latter

?71 Outline

? Why CAD

? Technology trends

? What is Physical Design

? Post-layout optimization methodologies

? manufacturability and reliability

? performance

? Custom or custom-on-the-fly methodologies

? Flavors of planning-based methodologies

? Implications for P&R

Clear Thinking: Basics of Design Convergence

? What must converge ?

? logic, timing, and spatial embedding

? support front-end signoff, provide predictable back- end

? Ways to achieve Convergence through Predictability

? correct by construction (“assume, then enforce”)

? constraints and assumptions passed downstream; not much goes upstream

? ignores concerns via guardbanding

? separates concerns as able (e.g., FE logic/timing vs. BE spatial embedding)

?72 Clear Thinking: Basics of Design Convergence (Cont’d)

? construct by correction (“tight loops”)

? logic-layout unification; synthesis-analysis unification, concurrent optimization

? elimination of concerns

? reduced degrees of freedom, pre-emptive design techniques

? e.g., power distribution, layer assignment / repeater rules, GALS/LIS

What Must A Design Closure Tool Look Like

? Input

? RT-level HDL + technology + constraints

? Output

? “go”: recipe for invocation and composition of “commodity” SP&R

? “no go”: diagnosis of RTL code problems

?73 What Must A Design Closure Tool Look Like (Cont’d)

? Logical and physical hierarchies co-evolve

? spatial: top-down coarse placement ? physical hierarchy

? logic/timing: implementable RTL ? logical hierarchy

? limits of human fanout, organizations ? always have hierarchy

? natural sequence of no-floorplanning, phys -floorplanning, RTL- floorplanning...

? Details (must construct, predict, ignore, eliminate, ...)

? pin optimizations, interconnect planning, hierarchy reconciliations, budgeting mechanisms, compatibility with downstream SP&R, ...

What Must A Design Closure Tool Look Like (Cont’d)

? RTL partitioning

? understand interaction b/w block definition and placement quality

? recognize and cure a physically challenged logic hierarchy

? Global interconnect planning and optimization

? symbolic route representations to support block plan ECOs

? Controllable SP&R back end (including power/clock/scan)

? Incremental / ECO optimizations, and optimizations that are “robust” under partial or imperfect design knowledge

?74 Need RTL Planning Technology

? Better estimators (“initial WLMs”)

? to account for resource, topological heterogeneity

? to account for optimizations (placement, ripup/reroute, timing)

? ? “earliest RTL signoff with detailed P&R knowledge”

Observation: Commoditized SP&R

? RTL-to-GDSII will commoditize SP&R market sectors

? Many solutions are reasonable and will survive in the marketplace ? RTL-down SP&R becomes a “commodity”

? No solution is complete

? Key missing pieces include RTL partitioning; hierarchy and block management; real working RTL diagnosis and signoff

? Individual point technologies (e.g., global placement or detailed routing) become less valuable ? integration is most important

?75 Classic Picture

Combining Logical and Physical

?76 Required Advance in Design System Architecture Yesterday 1000nm Today 180nm Tomorrow 50nm

System System System System System Design Design Model Design Model

Functional Software SPEC Performance Logic Design Hw/Sw Perf. Testability Design Optimization Model Verification

Functional Cockpit Verification RTL SW Auto-Pilot RTL SW Optimize Analyze Opt Hw/Sw Comm. Perf. Synthesis SW Timing Synthesis Logic Hw/Sw Power + Timing Analysis Circuit Data File Noise + Placement Opt Place Model Test EQ check Wire Mfg. Timing Analysis Performance other Repository other File Testability Functional Verification MASKS File Verification Place/Wire Equivalence checking + Timing Analysis Place/Wire + Logic Opt

File File Multiple design files are converged into one efficient Data Model Timing Analysis Disk accesses are eliminated in critical methodology loops MASKS Verification of Function, Performance, Testability and other design Performance criteria all move to earlier, higher levels of abstraction followed by File Verification equivalence checking and assertion driven design optimizations Testability Industry Standard interfaces for data access and control MASKS Verification Incremental modular tools for optimization and analysis New Table 8

Planning / Implementation Methodologies (Cont’d)

? Centered on logic design

? wire-planning methodology with block/cell global placement

? global routing directives passed forward to chip finishing

? constant-delay methodology may be used to guide sizing

? Centered on physical design

? placement-driven or placement-knowledgeable

? Buffer between logic and layout synthesis

? placement, timing, sizing optimization tools

?77 Planning / Implementation Methodologies (Cont’d)

? Centered on SOC, chip-level planning

? interface synthesis between blocks

? communications protocol, protocol implementation decisions guide logic and physical implementation

Performance Optimization Tool Flow

Architectural/ Behavioral Design

RTL Design Floor Planning Logic Synthesis Electrical Analysis Optimization Placement /Estimation

Global Routing Detailed Routing

Extraction EM Power Timing Signal Integrity Clock Analysis Analysis Analysis Analysis Analysis

?78 Performance Optimization Methodology

? Design Optimization

? global restructuring optimization -- logic optimization on layout using actual RC, noise peak values etc.

? localized optimization -- with no structural changes and least layout impact

? repeater/buffer insertion for global wires

? Physical optimization

? high fanout net synthesis (eg. for clock nets); buffer trees to meet delay/skew and fanout requirements

? automatically determine network topology (# levels, #buffers, and type of buffers)

? wire sizing, spacing, shielding etc.

? Fixing timing violations automatically

? fix setup/hold time violations

? fix maximum slew and fanout violations

Ultra Deep Submicron Timing

GL Total RCw Delay=Gi+GL+RCw Gi Gi = Intrinsic Gate Delay 60% GL = Gate Delay from Loading RCw= Delay from Interconnect 25% Loading 20% 20% Critical Path Delay 5% 10% 0% 0% 0% Electrical Optimization

Gi GL RCw Logic Optimization 50K gate Block at 0.18 microns

?79 KEY ISSUE: PREDICTABILITY

? Everything we do is ultimately aimed at a predictable back end (physical implementation after some handoff level of design)

? Predictability == regression models

? Predictability == an enforceable assumption

? constant-delay paradigm (logical effort, DEC, IBM, ...)

? Predictability == fast constructive prediction

? RT-level (Tera), gate-level flat full-chip (SPC)

? Predictability == remove the need for predictability

? GALS, LIS

? “protocol- / communication-based system-level design”

Problems With Physical Hierarchy

? Physical hierarchy = hierarchical organization of the core layout region

? In general, no relation to high-quality (e.g., w.r.t. timing, routability) embedding of logic

? artificial physical hierarchy created by top-down placers

? core region is relatively homogeneous, isotropic: imposing a hierarchy is generally harmful

? Of course, some obvious exceptions

? regular structures (memories, PLAs, datapaths)

? hard IP blocks

? but these don’t fit well in top-down placement anyway

? General trend: non-hierarchical embedding approaches

?80 The Problem With Hierarchies

? Two hierarchies: logical/functional, and physical

? schematic hierarchy also typical in structured-custom

? RTL design = logical/functional hierarchy

? provides valuable clues for physical embedding: datapath structure, timing structure, etc.

? can be incredibly misleading (e.g., all clock buffers in a single hierarchy block)

? Main issues:

? how to leverage logical/functional hierarchy during embedding

? when to deviate from designer’s hierarchy

? methodology for hierarchy reconciliation (buffers, repartitioning / reclustering, etc.)

Interconnect Complexities

? Interconnect effects play a major role in the increasing costs for large hard-block or rectilinear-outline based design styles

? Probabilistic wireload models fail

? Without new capabilities for soft IP design and assembly, interconnect problems will significantly impact performance and cost for emerging IC technologies

Local wires

blocks

global Global wires wires Occurrence Rate (Normalized)

~0.5 wirelength die _ size

?81 Technology Scaling

? Block sizes cannot grow as rapidly as chip sizes since block design becomes increasingly more difficult --- each block is a chip design over multiple configurations

? If the blocks are inflexible, the global wiring problems begin to dominate all aspects of performance quality and system cost Occurrence Rate (Normalized) Larger chip with finer feature sizes ~0.5 wirelength die _ size

Soft Blocks

? With soft, flexible blocks, the system assembly can more thoroughly exploit the available technology

? Interconnect problem is controlled via: soft boundaries for area re- shaping; re-synthesis and re-mapping for timing; smart wires; and top-down specified block synthesis

? Cf. “Amoeba” placement, coloring analysis of “good” placements with respect to original logic hierarchy, etc. Occurrence Rate (Normalized) Superior timing, power and cost ~0.5 wirelength die _ size

?82 Soft-Block Assembly

? Hard rectilinear blocks make prediction of global wires extremely difficult

? Top-down constraint-driven assembly of soft fabrics: ability to significantly restructure circuit level blocks during the assembly process helps reach performance goals

? For example, timing-critical interconnect paths can be completely restructured during assembly without changing any of the system level specification

? Key issue: how to determine the soft blocks in the first place

? non-classical partitioning objectives: area sensitivity, functional and clocking structure, critical timing-path awareness, matching capabilities of block placer

? block placement: largely unsolved issue

? unclear whether packing-centric or connectivity-centric approaches are best

Aristo, DAC-2000

ARISTO TYPICAL DESIGN FLOW RTL Hard Blocks Library Design Design Gate-Level IP Blocks Constraints Netlist Verilog Verilog

Concurrent Block Synthesis

Concurrent Block Partitioning, Clustering & Placement

Block Shaping, Compaction & Early Planning Concurrent Port Placement

Gate-Level Optimization

Design Refinement Gate-Level Place & Route

Chip Top-Level Routing Assembly

RC Extraction

PREDICTABLE HIERARCHICAL DESIGN CONVERGENCE Timing Analysis

?83 Monterey, DAC-2000

RTL

statistical WLM Behavioral / RTL timing library synthesis

Design Signoff Physical Prototyping

Increasing

Timing Route

Place Modeling Detail logic

GDSII

Sequence, DAC-2000

3D PrepareExtraction Database Timing Sign-off TrueDelay-3D CalculationParasitics

Place Timing Sequence Timing RTL Synthesis & Analysis Route Analysis

InterconnectInterconnect DrivenDriven OptimizationOptimization

Driver sizing, topology-based optimization

?84 Cadence, DAC-2000

? RTL, chip constraints Constraints complete and Partitioning & Log/Phys Mapping block RTLs are feasible Block Area/Performance Estimation

Block Placement Ensure interblock delays

Inter-block Routing and Buffering are accounted for

Communication Logic Synthesis

Concurrent Placement, Synthesis No iterations from here down And Route of Cells in Blocks

Finalize Route/Extract/Back Ann.

Avant!, DAC-2000 “shared algs/data = design closure”

Design Closure Needs Consistency & Silicon Accuracy

Design Planning Delay Calculation

VDSM Physical Synthesis Optimization Placement Extraction Analysis Place & Route Routing VDSM Optimization Equivalence Checking Final Extraction Simulation/Analysis

VDSM Database Physical Verification Mask Synthesis

Capability is unique in the Industry

?85 Magma, DAC-2000 “fixed timing”

0.6ns 0.6ns 0.6ns 0.6ns

FF

? Actively managing wire delay:

? Through automatic sizing (sizing-driven placement)

? Through buffer insertion

Magma, DAC-2000 “timing closure dos and don’ts”

? Don’t: try to accurately adapt a model to reality

? The model might be accurate, the data is generally not...

? Instead: Adapt the reality to the model

? Use the simplest appropriate model

? Adapt reality (e.g. cell sizes) to keep model correct.

? Don’t: iterate:

? The loops are slow, and affect tool capacity

? Many parameters are optimized simultaneously

? Unclear when (or whether) it converges.

?86 Magma, DAC-2000 “timing closure dos and don’ts” (Cont’d)

? Instead:

? Pick a methodology that is correct-by-construction

? Don’t: bolt together tools using files or ‘databases’

? Steps do not cooperate and data is often inconsistent.

? Instead: use single data model

? All design and analysis data simultaneously available.

Synopsys Flow Example

Hard IP M Chip Architect: e m Module Compiler: • Budgeting Micro o Datapath • Specialized • Estimation Controller r y datapath • Floorplanning synthesis • IO placement

FlexRoute: Design Compiler: • Detailed top- Analog • RTL synthesis level routing PhysicalPlaced Chip Architect: CompilerGates • Coarse routing Physical Compiler: Gates • Detailed placement • RTL synthesis & placement together

Detailed standard cell routing: Cadence, Avant!, proprietary

?87 What is the Right Methodology for SOC

? Will productivity scale adequately relative to available capacity = design complexity ?

? Consider:

? Emerging networking, telecom ICs: >20M gates, <0.11um

? >80 soft IPs taking more than 65% of IC area

? >5 large hard IPs (CPU, DSP, DRAM)

? >200 small hard IPs (SRAM, FIFO, Analog, etc.)

? >50 clock domains

? Multiple power supplies

? High datapath and BIST content

More Radical Methodology Changes are Required

? Flat cell-based is out of capacity

? Cell abstraction inadequate

? Hierarchical block based is resource-intensive, insufficiently automated

? Block packing algorithms issues

? Difficult to automate as we did with cell-based

? Floorplanning breaks when there are hundreds of blocks

? Lack of “unified” and meaningful abstractions

? Lack of network-processing methods similar to those available in the front end (Verilog)

? Lack of automated solutions for clock, power, test

?88 Future Physical Implementation Platforms

? Where are the cycles ?

? Distributed, heterogeneous, massively parallel platforms

? Extremely cost-effective (Linux farms, idle desktops, …)

? Where is the productivity lever ?

? By definition, not in “commoditized” design tasks (logic optimization, technology mapping, placement, routing, …)

? Require new platforms and methodologies that decompose and distribute the design optimization problem, without loss of solution quality

? Typical issues: decoupling of design subproblems, combination of subsolutions into single solution

Outline

? Why CAD

? Technology trends

? What is Physical Design

? Post-layout optimization methodologies

? manufacturability and reliability

? performance

? Custom or custom-on-the-fly methodologies

? Flavors of planning-based methodologies

? Implications for P&R

?89 Cell-Based P&R: Classic Context

? Architecture design

? golden microarchitecture design, behavioral model, RT-level structural HDL passed to chip planning

? cycle time and cycle-accurate timing boundaries established

? hierarchy correspondences (structural-functional, logical (schematic) and physical) well-established

? Chip planning

? hierarchical floorplan, mixed hard-soft block placement

? block context-sensitivity: no-fly, layer usage, other routing constraints

? route planning of all global nets (control/data signals, clock, P/G)

? induces pin assignments/orderings, hard (partial) pre-routes, etc.

Cell-Based P&R: Classic Context

? Individual block design -- various P&R methodologies

? Chip assembly -- possibly implicit in above steps

? What follows: qualitative review of key goals, purposes

?90 Placement Directions

? Global placement

? engines (analytic, top-down partitioning based, (iterative annealing based) remain the same; all support “anytime” convergent solution

? becomes more hierarchical

? block placement, latch placement before “cell placement”

? support placement of partially/probabilistically specified design

? Detailed placement

? LEQ/EEQ substitution

? shifting, spacing and alignment for routability

? ECOs for timing, signal integrity, reliability

? closely tied to performance analysis backplane (STA/PV)

? support incremental “construct by correction” use model

Function of a UDSM Router

? Ultimately responsible for meeting specs/assumptions

? slew, noise, delay, critical-area, antenna ratio, PSM-amenable …

? Checks performability throughout top-down physical impl.

? actively understands, invokes analysis engines and macromodels

? Many functions

? circuit-level IP generation: clock, power, test, package substrate routing

? pin assignment and track ordering engines

? monolithic topology optimization engines

? owns key DOFs: small re-mapping, incremental placement, device-level layout resynthesis

? is hierarchical, scalable, incremental, controllable, well- characterized (well-modeled), detunable (e.g., coarse/quick routing), ...

?91 Out-of-Box Uses of Routing Results

? Modify floorplan

? floorplan compaction, pin assignments derived from top-level route planning

? Determine synthesis constraints

? budgets for intra-block delay, block input/output boundary conditions

? Modify netlist

? driver sizing, repeater insertion, buffer clustering

? Placement directives for block layout

? over-block route planning affects utilization factors within blocks

? Performance-driven routing directives

? wire tapering/spacing/shielding choices, assumed layer assignments, etc.

Routing Directions

? Cost functions and constraints

? rich vocabulary, powerful mechanisms to capture, translate, enforce

? Degrees of freedom

? wire widths/spacings, shielding/interleaving, driver/repeater sizing

? router empowered to perform small logic resyntheses

? “Methodology”

? carefully delineated scopes of router application

? instance complexities remain tractable due to hierarchy and restrictions (e.g., layer assignment rules) that are part of the methodology

?92 Routing Directions

? Change in search mechanisms

? iterative ripup/reroute replaced by “atomic topology synthesis utilities”: construct entire topologies to satisfy constraints in arbitrary contexts

? Closer alignment with full-/automated-custom view

? “peephole optimizations” of layout are the natural extensions of Motorola CELLERITY, IBM CM5, etc. methodologies

Summary: Silicon Complexity Challenges

? Silicon Complexity = impact of process scaling, new materials, new device/interconnect architectures ? Non-ideal scaling (leakage, power management, circuit/device innovation, current delivery) ? Coupled high-frequency devices and interconnects (signal integrity analysis and management) ? Manufacturing variability (library characterization, analog and digital circuit performance, error-tolerant design, layout reusability, static performance verification methodology/tools) ? Scaling of global interconnect performance (communication, synchronization) ? Decreased reliability (SEU, gate insulator tunneling and breakdown, joule heating and electromigration) ? Complexity of manufacturing handoff (reticle enhancement and mask writing/inspection flow, manufacturing NRE cost)

?93 Summary: System Complexity Challenges

? System Complexity = exponentially increasing transistor counts, with increased diversity (mixed-signal SOC, …) ? Reuse (hierarchical design support, heterogeneous SOC integration, reuse of verification/test/IP) ? Verification and test (specification capture, design for verifiability, verification reuse, system-level and software verification, AMS self- test, noise-delay fault tests, test reuse) ? Cost-driven design optimization (manufacturing cost modeling and analysis, quality metrics, die-package co-optimization, …) ? Embedded software design (platform-based system design methodologies, software verification/analysis, codesign w/HW) ? Reliable implementation platforms (predictable chip implementation onto multiple fabrics, higher-level handoff) ? Design process management (design team size and geographic distribution, data management, collaborative design support, systematic process improvement)

?94