ECE 5745 Complex Digital ASIC Design Topic 5: Automated Design Methodologies Christopher Batten

School of Electrical and Computer Engineering Cornell University

http://www.csl.cornell.edu/courses/ece5745

Standard-Cell SoC/Platform Gate-Array Prog Logic FPGA Comparison

Part 1: ASIC Design Overview

[Figure: course topic map]

• Topic 1: Hardware Description Languages
• Topic 2: CMOS Devices
• Topic 3: CMOS Circuits
• Topic 4: Full-Custom Design Methodology
• Topic 5: Automated Design Methodologies
• Topic 6: Closing the Gap
• Topic 7: Clocking, Power Distribution, Packaging, and I/O
• Topic 8: Testing and Verification

Automated Design Methodologies

Automate transformations from Behavioral to Structural Domain and from Structural to Physical Domain

Adapted from [Ellervee’04]

Design Principles in Automated Methodologies

• Modularity: Use modularity to enable mixing different custom and automated methodologies

• Hierarchy: Use hierarchy to handle the automatic transformation of large designs more efficiently

• Encapsulation: Automated methodologies place a significantly higher emphasis on encapsulation in all domains (behavioral, structural, physical) at all levels of abstraction: architecture-, register-transfer-, -level

• Regularity: Use regularity to create automated tools to generate structures like datapaths and memories

• Extensibility: Automated methodologies enable more highly parameterized and flexible implementations, improving extensibility

Agenda

• Standard-Cell-Based Design
• System-on-Chip Platform-Based Design
• Gate-Array-Based Design
• Programmable Logic-Based Design
• Reprogrammable Logic-Based Design
• Comparison and Hybrids


Standard-Cell-Based Design

Adapted from [Weste’11]

Example Standard Cell

Adapted from [SAED’11,Weste’11]

Standard-Cell-Based Flow CAD Algorithms

Topic 12: Synthesis Algorithms

• RTL to Logic Synthesis
  x = a'bc + a'bc'
  y = b'c' + ab' + ac
• Technology Independent Synthesis
  x = a'b
  y = b'c' + ac
• Technology Dependent Synthesis

Topic 13: Physical Design Automation

• Placement
• Global Routing
• Detailed Routing
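The technology-independent step above simplifies x and y without changing their function. A minimal sketch that checks this by exhaustive truth-table comparison (the function names are illustrative, not part of any synthesis tool):

```python
from itertools import product


def x_rtl(a, b, c):
    # x = a'bc + a'bc'
    return ((not a) and b and c) or ((not a) and b and (not c))


def x_opt(a, b, c):
    # x = a'b
    return (not a) and b


def y_rtl(a, b, c):
    # y = b'c' + ab' + ac
    return ((not b) and (not c)) or (a and (not b)) or (a and c)


def y_opt(a, b, c):
    # y = b'c' + ac  (the ab' term is redundant)
    return ((not b) and (not c)) or (a and c)


def equivalent(f, g, n_inputs=3):
    """Compare two Boolean functions on every input combination."""
    return all(
        bool(f(*bits)) == bool(g(*bits))
        for bits in product([False, True], repeat=n_inputs)
    )


x_ok = equivalent(x_rtl, x_opt)
y_ok = equivalent(y_rtl, y_opt)
print(x_ok, y_ok)
```

Exhaustive comparison only works for small input counts; real formal verification tools use other techniques.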

Standard-Cell-Based Front-End Flow

Adapted from [Weste’11]

Standard-Cell-Based Back-End Flow

Adapted from [Weste’11]

Older Standard-Cell ASICs

Limited metal layers require dedicated routing channels and feedthrough cells

Adapted from [Rabaey’02]

Modern Standard-Cell ASICs

An increasing number of metal layers allows cells to be hidden under interconnect

Adapted from [Weste’11,Rabaey’02]

Standard-Cell Libraries

Adapted from [Weste’11]

Standard-Cell Libraries

[Figure: cell library views and where each is used in the design flow]

• Cell description coding: specification
• Cell logic models, containing cell description, simulation functionality, pins, area, timing, power
  – Used for circuit construction and for timing, power, and area optimizations
  – Flow steps: Logic Synthesis, Formal Verification (RTL vs. gates), Pre-layout STA, then a "Timing OK?" check
• Cell physical model (abstract view): cell size, pin locations, pin directions
  – Used by physical synthesis
  – Flow steps: Floorplanning, Placement & Routing
• Cell physical view: original cell layout
  – Used to build the layout of the finished design
  – Flow step: Formal Verification (layout vs. synthesized)
• Cell descriptions: behavioral and SPICE level
  – Used to simulate the completed design at the required level
  – Flow steps: Post-layout Simulation, then a "Timing OK?" check before the finished design

Adapted from [Melikyan]

Standard-Cell Library Characterization

Adapted from [Melikyan]

Standard-Cell Electrical Characterization

• Characterization computes cell parameters (e.g., delay, output current) as a function of input variables (output load, input slew, etc.)

• Characterization is performed for various combinations of operating conditions: process, voltage, temperature (also called PVT corners)

[Figure: one lookup table per PVT corner, giving output current (Iout) as a function of input slew (0.1, 0.2, 0.5, 0.7) and output capacitance (.023, .047, .065, .078, .091). Corner labels, in the order they appear: Process: Fast, Temp: 125°, Voltage: 1.32 V; Process: Slow, Temp: -40°, Voltage: 1.08 V; Process: Typical, Temp: 25°, Voltage: 1.2 V]
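The characterization sweep described above amounts to one simulation per (corner, input slew, output load) combination. A minimal sketch that enumerates those points, assuming the corner groupings follow the order of the labels in the figure and leaving units unspecified because the slide does not state them:

```python
from itertools import product

# Assumption: each corner groups the process/temperature/voltage labels
# in the order they appear in the figure.
corners = [
    {"process": "fast", "temp_c": 125, "vdd": 1.32},
    {"process": "slow", "temp_c": -40, "vdd": 1.08},
    {"process": "typical", "temp_c": 25, "vdd": 1.20},
]

# Axis values read from the figure; units are not given on the slide.
input_slews = [0.1, 0.2, 0.5, 0.7]
output_caps = [0.023, 0.047, 0.065, 0.078, 0.091]


def characterization_points(corners, slews, caps):
    """Yield one record per simulation the characterizer would run."""
    for corner, slew, cap in product(corners, slews, caps):
        yield {**corner, "input_slew": slew, "output_cap": cap}


points = list(characterization_points(corners, input_slews, output_caps))
print(len(points))  # 3 corners x 4 slews x 5 loads
```

Each record would be handed to a circuit simulator; the measured values fill one entry of that corner's lookup table.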

Adapted from [Melikyan]

ST Microelectronics – 3-Input NAND Gate

Adapted from [Rabaey’02]

Standard-Cell Electrical Characterization (.lib)

/* Characterization for a 3-input NAND gate */
cell ( NAND3X0 ) {

  /* Overall characterization */
  cell_footprint : "nand3x0";
  area : 7.3728;
  cell_leakage_power : 9.151417e+04;

  /* Characterization for input pin IN1 */
  pin ( IN1 ) {
    direction : "input";

    /* Fixed input capacitance */
    capacitance : 2.190745;

    /* Transient capacitance values */
    fall_capacitance : 2.212771;
    rise_capacitance : 2.168719;

Standard-Cell Electrical Characterization (.lib)

cell ( NAND3X0 ) {
  pin ( IN1 ) {

    /* Short-circuit and internal switching power when IN2 and IN3 are zero */
    internal_power () {
      when : "!IN2&!IN3";

      /* 1D lookup tables to calculate power as function of input slew */
      rise_power ( "power_inputs_1" ) {
        index_1( " 0.0160000, 0.0320000, 0.0640000" );
        values( "-1.2575404, -1.2594251, -1.2887053" );
      }
      fall_power ( "power_inputs_1" ) {
        index_1( " 0.0160000, 0.0320000, 0.0640000" );
        values( " 1.9840914, 1.9791286, 2.0696119" );
      }
    }

Standard-Cell Electrical Characterization (.lib)

cell ( NAND3X0 ) {
  pin ( QN ) {
    direction : "output";

    /* Boolean logic eq for QN as function of inputs */
    function : "(IN3*IN2*IN1)'";
    timing () {
      related_pin : "IN1";
      cell_rise ( "del_1_7_7" ) {

        index_1( "0.016, 0.032, 0.064" ); /* Input slew */
        index_2( "0.1, 3.75, 7.5" );      /* Load cap */

        /* Lookup table to calculate delay as non-linear
           function of input signal slew and load cap */
        values( "0.0178632, 0.0275957, 0.0374970", \
                "0.0215562, 0.0316225, 0.0414275", \
                "0.0261721, 0.0387623, 0.0496870" );
      }
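A timing tool reads a delay out of the cell_rise table above by interpolating between the characterized points. A minimal sketch using bilinear interpolation, assuming each row of values corresponds to one index_1 (input slew) entry and each column to one index_2 (load cap) entry; the function names are hypothetical and real tools may interpolate differently:

```python
from bisect import bisect_right

# Table copied from the cell_rise group for related_pin IN1.
SLEW = [0.016, 0.032, 0.064]   # index_1: input slew
LOAD = [0.1, 3.75, 7.5]        # index_2: load cap
CELL_RISE = [
    [0.0178632, 0.0275957, 0.0374970],
    [0.0215562, 0.0316225, 0.0414275],
    [0.0261721, 0.0387623, 0.0496870],
]


def _segment(axis, x):
    """Index i of the axis segment [axis[i], axis[i+1]] used for x.

    Values outside the table fall back to the first or last segment,
    which means the lookup extrapolates linearly there.
    """
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_delay(slew, load):
    """Bilinear interpolation of the rise delay at (slew, load)."""
    i = _segment(SLEW, slew)
    j = _segment(LOAD, load)
    ts = (slew - SLEW[i]) / (SLEW[i + 1] - SLEW[i])
    tl = (load - LOAD[j]) / (LOAD[j + 1] - LOAD[j])
    # Interpolate along the load axis on both bracketing slew rows,
    # then interpolate between the two rows along the slew axis.
    low = CELL_RISE[i][j] + tl * (CELL_RISE[i][j + 1] - CELL_RISE[i][j])
    high = CELL_RISE[i + 1][j] + tl * (CELL_RISE[i + 1][j + 1] - CELL_RISE[i + 1][j])
    return low + ts * (high - low)


print(lookup_delay(0.024, 0.1))
```

At a characterized grid point the lookup returns the table entry itself; between points it blends the four surrounding entries.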

Standard-Cell Physical Characterization

[Figure: NAND_1 cell in layout view and in abstract physical view, each showing the VDD and GND rails and the metal pins A, B, and Y]

Adapted from [Melikyan]

SAED 90 nm Library – NAND Gate Data Sheet

Adapted from [SAED’11]


Hybrid Chip Implementations Abound

‣ Ex: It was standard practice in microprocessors for datapaths to be full-custom and control (instruction decode, pipeline control) to be in standard cells. (Less common recently.)

• Control (“random”) logic is difficult to “regularize” and is a relatively small percentage of die area/power.
• Standard cells for control permit late binding of design changes.
• Extra NAND or NOR gates were often added to the control section, and some wafers left without metallization, to permit late design fixes through metal mask revisions (the gate-array idea).

Lecture 02, Introduction 1 21 CS250, UC Berkeley Fall ‘11

System-on-Chip (SoC)

• Brings together: standard cell blocks, custom analog blocks, processor cores, memory blocks
• Standardized on-chip buses (or hierarchical interconnect) permit “easy” integration of many blocks.
  – Ex: AMBA, Sonics, …
• “IP Block” business model: hard or soft cores available from third-party designers.
• ARM, Inc. is the shining example: hard and “synthesizable” RISC processors.
• ARM and other companies provide Ethernet, USB controllers, analog functions, memory blocks, …

‣ Pre-verified block designs and standard bus interfaces (or adapters) ease integration: lower NREs, shorter TTM.
‣ SIP, SOP, and MCM are interesting alternatives.

Adapted from [Asanovic’11]

SoC Hardware/Software Co-Design

[Figure: SoC hardware/software co-design flow, with hardware IP blocks and software IP modules developed in parallel. Stages shown, from top to bottom:]

• Specify System-on-Chip (electronic system level design), drawing on generic hardware IP blocks and generic software IP
• Partition hardware/software; select architecture platform; select software IP modules
• Integrate application-specific hardware IP blocks into the architecture platform (hardware design flow, functional simulation)
• Integrate application-specific software IP modules and the operating system (software design flow, low-level software simulation)
• Hardware/software co-simulation
• Hardware and low-level software emulation on an FPGA-based emulation platform
• Physical design, in parallel with application software development
• Prototype IC fabrication
• Hardware/software verification and software application test on a prototype or development board
• Volume IC fabrication
• Ship ICs and software to clients

Adapted from [Wikipedia]

SoC Platform-Based Design

“Only the consumer gets freedom of choice; designers need freedom from choice.” – Orfali

• A platform is a restriction on the space of possible implementation choices, providing a well-defined abstraction of the underlying technology for the app developer

• New platforms are defined at the architecture/microarchitecture boundary

• Key to such approaches is the representation of communication in the platform model

Adapted from [ASV’01,Rabaey’02]


Gate-Array-Based Design

• Can cut mask costs by prefabricating arrays of transistors on wafers
• Only customize metal layer for each design
  – Fixed-size unit transistors
  – Metal connections personalize design
• Two kinds:
  – Channeled gate arrays: leave space between rows of transistors for routing
  – Sea-of-gates: route over the top of unused transistors

[Figure: OCEAN Sea-of-Gates base pattern, with rows of NMOS and PMOS transistors between GND and VDD rails]

Adapted from [Terman’02]

Gate-Array Fabrication Process

‣ Store prefabricated wafers of “active” & gate layers & local interconnect, comprising, primarily, rows of transistors. Customize as needed with “back-end” metal processing (contact cuts, vias, metal wires). Could use a different factory.

Adapted from [Asanovic’11]

Gate Array

• Shifts large portion of design and mask NRE to vendor.
• Shorter design and processing times, reduced time to market.
• Highly structured layout with fixed-size transistors leads to large sub-circuits (ex: flip-flops) and higher per-die costs.
• Memory arrays are particularly inefficient, so they are often prefabricated as well.

Also: sea-of-gates, structured ASIC, master-slice.

Gate-Array Personalization

[Figure: two ways of isolating transistors: by a shared GND contact, or with an “off” gate]

Adapted from [Terman’02]

Gate-Array Example – 3-input NAND Gate

[Figure: gate-array layout of a 3-input NAND gate with inputs A, B, C and output Y]

Adapted from [Weste’11]


Array-Based Programmable Logic

Adapted from [Rabaey’02]

Programming a PROM

Adapted from [Rabaey’02]

Various Programmability Options for PLA/PROM NOR Structure

Adapted from [Weste’11]

PLA Implemented with Dynamic Logic

Adapted from [Weste’11]

Modern Complex Programmable Logic Devices

[Figure 1: CoolRunner-II CPLD Architecture. Function Blocks 1 to n each contain a PLA and 16 macrocells (MC1 to MC16); each function block takes 40 inputs from the Advanced Interconnect Matrix (AIM) and connects to I/O blocks and direct inputs; clock and control signals feed the function blocks; a JTAG-controlled BSC and ISP block sits on the BSC path. The BSC and ISP block has the JTAG controller and In-System Programming circuits.]

• Xilinx CoolRunner-II CPLD: 2–32 PLAs integrated on a single chip
• PLAs support 40 inputs, 56 product terms, and 16 output terms
• PLAs chained together through a programmable “Advanced Interconnect Matrix”

Adapted from [Xilinx’08]

Function Block

The CoolRunner-II CPLD FBs contain 16 macrocells, with 40 entry sites for signals to arrive for logic creation and connection. The internal logic engine is a 56 product term PLA. All FBs, regardless of the number contained in the device, are identical. For a high-level view of the FB, see Figure 2.

[Figure 2: CoolRunner-II CPLD Function Block. 40 inputs enter the PLA, which drives macrocells MC1 to MC16; 16 outputs go to the AIM; global set/reset and global clocks are also shown.]

At the high level, the product terms (p-terms) reside in a programmable logic array (PLA). This structure is extremely flexible, and very robust when compared to fixed or cascaded product term FBs.

Classic CPLDs typically have a few product terms available for a high-speed path to a given macrocell. They rely on capturing unused p-terms from neighboring macrocells to expand their product term tally, when needed. The result of this architecture is a variable timing model and the possibility of stranding unusable logic within the FB.

The PLA is different, and better. First, any product term can be attached to any OR gate inside the FB macrocell(s). Second, any logic function can have as many p-terms as needed attached to it within the FB, to an upper limit of 56. Third, product terms can be re-used at multiple macrocell OR functions so that within a FB, a particular logical product need only be created once, but can be re-used up to 16 times within the FB. Naturally, this plays well with the fitting software, which identifies product terms that can be shared.

The software places as many of those functions as it can into FBs, so it happens for free. There is no need to force macrocell functions to be adjacent or any other restriction save residing in the same FB, which is handled by the software. Functions need not share a common clock, common set/reset, or common output enable to take full advantage of the PLA. Also, every product term arrives with the same time delay incurred. There are no cascade time adders for putting more product terms in the FB. When the FB product term budget is reached, there is a small interconnect timing penalty to route signals to another FB to continue creating logic. Xilinx design software handles all this automatically.
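The product-term sharing described above can be sketched as an AND plane evaluated once, followed by an OR plane that may reuse any p-term at any output. This is a behavioral illustration only; the data layout and the names eval_pla, product_terms, and or_plane are hypothetical, and only the 40/56/16 limits come from the text:

```python
# Per-function-block limits quoted in the text above.
MAX_INPUTS, MAX_PTERMS, MAX_OUTPUTS = 40, 56, 16


def eval_pla(inputs, product_terms, or_plane):
    """Evaluate a PLA.

    inputs:        dict of signal name -> bool
    product_terms: list of dicts, each mapping a signal name to the
                   value (True/False) that literal requires
    or_plane:      dict of output name -> list of p-term indices
    """
    if (len(inputs) > MAX_INPUTS or len(product_terms) > MAX_PTERMS
            or len(or_plane) > MAX_OUTPUTS):
        raise ValueError("design does not fit in one function block")
    # AND plane: every p-term is computed exactly once.
    pterms = [
        all(inputs[name] == want for name, want in term.items())
        for term in product_terms
    ]
    # OR plane: any output may reuse any p-term.
    return {out: any(pterms[k] for k in used) for out, used in or_plane.items()}


product_terms = [
    {"a": True, "b": True},    # p0 = a & b
    {"a": False, "c": True},   # p1 = !a & c
    {"b": True, "c": True},    # p2 = b & c
]
or_plane = {"f": [0, 1], "g": [0, 2]}  # p0 is shared by f and g

out_110 = eval_pla({"a": True, "b": True, "c": False}, product_terms, or_plane)
out_001 = eval_pla({"a": False, "b": False, "c": True}, product_terms, or_plane)
print(out_110, out_001)
```

Because p0 is created once and referenced by both outputs, adding g costs one extra p-term rather than two.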

CPLD Macro Cell

Macrocell

The CoolRunner-II CPLD macrocell is extremely efficient and streamlined for logic creation. Users can develop sum of product (SOP) logic expressions that comprise up to 40 inputs and span 56 product terms within a single function block. The macrocell can further combine the SOP expression into an XOR gate with another single p-term expression. The resulting logic expression’s polarity is also selectable. As well, the logic function can be pure combinatorial or registered, with the storage element operating selectably as a D or T flip-flop, or transparent latch. Available at each macrocell are independent selections of global, function block level or local p-term derived clocks, sets, resets, and output enables. Each macrocell flip-flop is configurable for either single edge or DualEDGE clocking, providing either double data rate capability or the ability to distribute a slower clock (thereby saving power). For single edge clocking or latching, either clock polarity can be selected per macrocell. CoolRunner-II CPLD macrocell details are shown in Figure 3. Note that in Figure 4, standard logic symbols are used except the trapezoidal multiplexers have input selection from statically programmed configuration select lines (not shown). Xilinx application note XAPP376 gives a detailed explanation of how logic is created in the CoolRunner-II CPLD family.

[Figure 3: CoolRunner-II CPLD Macrocell. Labels shown include: From AIM (40), 49 P-terms, 4 P-terms, PTA, PTB, PTC, CTC, CTR, CTS, CTE, GSR, GCK0 to GCK2, PLA OR Term, D/T flip-flop or latch, DualEDGE, Direct Input from I/O Block, Feedback to AIM, To I/O Block.]

Adapted from [Xilinx’08]

When configured as a D-type flip-flop, each macrocell has an optional clock enable signal permitting state hold while a clock runs freely. Note that Control Terms (CT) are available to be shared for key functions within the FB, and are generally used whenever the exact same logic function would be repeatedly created at multiple macrocells. The CT product terms are available for FB clocking (CTC), FB asynchronous set (CTS), FB asynchronous reset (CTR), and FB output enable (CTE).

Any macrocell flip-flop can be configured as an input register or latch, which takes in the signal from the macrocell’s I/O pin, and directly drives the AIM. The macrocell combinational functionality is retained for use as a buried logic node if needed. FToggle is the maximum clock frequency to which a T flip-flop can reliably toggle.

Advanced Interconnect Matrix (AIM)

The Advanced Interconnect Matrix is a highly connected low power rapid switch. The AIM is directed by the software to deliver up to a set of 40 signals to each FB for the creation of logic. Results from all FB macrocells, as well as all pin inputs, circulate back through the AIM for additional connection available to all other FBs as dictated by the design


Field-Programmable Gate Arrays

• Two-dimensional array of simple logic and interconnection blocks.
• Typical architecture: LUTs implement any function of n inputs (n = 3 in this case).
• Optional flip-flop with each LUT.

‣ Fuses, EPROM, or static RAM cells are used to store the “configuration”.
‣ Here, it determines the function implemented by the LUT, selection of the flip-flop, and interconnection points.
‣ Many FPGAs include special circuits to accelerate carry chains and many special cores: RAMs, MAC, Enet, PCI, SERDES, ...

Adapted from [Asanovic’11]
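The LUT idea above can be sketched in a few lines: an n-input LUT is a 2^n-entry table of configuration bits, and the inputs simply form the read address. The helper names make_lut and lut_read are illustrative, and the bit ordering (first input as the least significant address bit) is an assumption, not a property of any particular FPGA:

```python
def make_lut(func, n=3):
    """Build the 2**n configuration bits that make a LUT behave like func.

    Entry k stores func's output for the input pattern whose bits encode k,
    with the first input as the least significant address bit.
    """
    config = []
    for k in range(2 ** n):
        bits = [(k >> i) & 1 for i in range(n)]
        config.append(func(*bits) & 1)
    return config


def lut_read(config, *bits):
    """Evaluate the LUT: the inputs form an address into the config bits."""
    addr = sum(bit << i for i, bit in enumerate(bits))
    return config[addr]


def majority(a, b, c):
    return (a & b) | (a & c) | (b & c)


maj_lut = make_lut(majority)
print(maj_lut)
print(lut_read(maj_lut, 1, 1, 0))
```

Loading a different 8-bit configuration turns the same hardware into any other 3-input function, which is why the configuration memory determines the logic.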

Traditional FPGA versus ASIC argument (circa 2000)

[Figure: total cost versus volume for FPGA and ASIC; FPGAs are cost effective at low volume, ASICs at high volume]

• ASIC: High NRE costs ($2M for 0.35 µm chip). Relatively low cost per die.
• FPGAs: Very low NRE costs. Relatively low silicon efficiency ⇒ high cost per part.
• Cross-over volume from cost-effective FPGA design to ASIC is in the 10K range.

Simplified FPGA Floorplan

Adapted from [Weste’11]

Simplified FPGA Block

Adapted from [Weste’11]


Operation Binding Time

Implementation style    Operation bound at
Full Custom             First Mask
Standard Cell           First Mask
SoC Platform            First Mask
Gate Array              Metal Masks
Prog. Logic             Fuse Program
Reprog. Logic           Load Config
μproc/DSP               Load Program

Binding time gets later moving down the table, from "hardware" toward "software".

• The earlier the operation is bound, the less area, delay, and energy required for the implementation

• The later the operation is bound, the more flexible the device

Adapted from [Asanovic’11]

Comparison of Specific Design Methodologies

Design Method   NRE    Unit Cost  Power Disp  Impl Compl  Time to Market  Perf   Flex
Full Custom     VHigh  Low        Low         High        High            VHigh  Low
Standard Cell   High   Low        Low         High        High            High   Low
SoC Platform    High   Low        Low         Med         High            High   Med
Gate Array      Med    Med        Low         Med         Med             Med    Med
Prog Logic      Low    High       Med         Low         Low             Med    Med
Reprog Logic    Low    High       Med         Med         Low             High   High
µProc/DSP       Low    High       High        Low         Low             Low    VHigh

(Power Disp = power dissipation, Impl Compl = implementation complexity, Perf = performance, Flex = flexibility)

ASIC vs. FPGA

• Traditional Argument
  – ASIC: High NRE ($2M for 0.35 µm chip), low marginal cost, best efficiency
  – FPGA: Low NRE, high marginal cost, lower efficiency
  – Cross-over point: around 10,000
• Current Trends
  – ASIC: Increasing NRE ($40M for 90 nm chip) due to design costs, verification costs, mask costs, etc.
  – FPGA: Better able to track Moore’s law, integrating fixed-function blocks
  – Cross-over point: around 100,000

[Figure: total cost versus volume for FPGA and ASIC, one plot for the traditional argument and one for current trends]

Adapted from [Asanovic’11]
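The cross-over argument above follows from a linear cost model: total cost = NRE + unit cost × volume. A minimal sketch, assuming FPGA NRE is negligible (the slide only says "low") and using the NRE and cross-over figures quoted above to back out the per-part cost gap they imply; the function names and the $10/$210 unit costs in the last line are illustrative, not from the slides:

```python
def crossover_volume(asic_nre, fpga_nre, asic_unit, fpga_unit):
    """Volume at which ASIC and FPGA total costs are equal.

    Solves asic_nre + asic_unit * V == fpga_nre + fpga_unit * V for V.
    """
    if fpga_unit <= asic_unit:
        raise ValueError("no cross-over: FPGA unit cost must exceed ASIC unit cost")
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


def implied_unit_cost_gap(asic_nre, fpga_nre, crossover):
    """Per-part cost difference (FPGA minus ASIC) implied by a cross-over."""
    return (asic_nre - fpga_nre) / crossover


# Assumption: FPGA NRE taken as 0.
traditional_gap = implied_unit_cost_gap(2e6, 0.0, 10_000)
current_gap = implied_unit_cost_gap(40e6, 0.0, 100_000)
print(traditional_gap, current_gap)

# Hypothetical unit costs consistent with the traditional cross-over.
print(crossover_volume(2e6, 0.0, asic_unit=10.0, fpga_unit=210.0))
```

Under these assumptions the quoted figures imply an FPGA part costing roughly $200 more than the ASIC part in the traditional case and roughly $400 more in the current-trends case.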

Scale – ASIC with Pre-Placement & SRAMs

T0 – Full Custom w/ Standard Cells

T0 Die Breakdown

[Figure: T0 die photo with labeled regions: Control Logic, MIPS DP, I$, VMP, VP1, VP0, Vector Registers]

• Switched to HP CMOS 26G process late in design
  – used 1.0 µm rules in 0.8 µm process
  – only used 2 out of 3 metal layers
• 16.75 x 16.75 mm²
• 730,701 transistors
• 4 W typical @ 5 V, 40 MHz
• 12 W maximum
• Performance: 320 MMAC/s, 640 MB/s

Adapted from [Terman’02]

Application-Specific IC – Full Custom w/ Standard Cell

Implementation strategies

One could just start drawing transistors, but this quickly becomes hopeless with a chip of any size. Instead we’ll use one of the following strategies:

• Standard cell: predefined gates, meant to be automatically placed and routed. In .5u: 10K fets/mm2
• Full custom: custom “cells” stacked in columns to create an N-bit-wide datapath. Signals between columns routed across cells. In .5u: 25K/mm2
• RAM generator: one cell iterated many times, perhaps surrounded by driver/sensing logic. Basic structure stays the same, only dimensions change. In .5u: 45K/mm2 for multiport regfile

Adapted from [Terman’02]
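The densities quoted above translate directly into area estimates. A minimal sketch, using the .5u figures from the slide and a hypothetical block of 100,000 transistors (the block size and names are illustrative only):

```python
# Transistor densities in a .5u process, as quoted on the slide (fets per mm^2).
DENSITY_FETS_PER_MM2 = {
    "standard_cell": 10_000,
    "full_custom": 25_000,
    "ram_generator": 45_000,  # quoted for a multiport register file
}


def area_mm2(n_transistors, style):
    """Estimated silicon area for a block built in the given style."""
    return n_transistors / DENSITY_FETS_PER_MM2[style]


# Hypothetical block size, chosen only to make the comparison concrete.
areas = {style: area_mm2(100_000, style) for style in DENSITY_FETS_PER_MM2}
for style, area in areas.items():
    print(f"{style}: {area:.2f} mm^2")
```

The same transistor count takes 2.5 times more area in standard cells than in full custom, which is the density cost paid for automation.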

Xilinx Virtex-II Pro – FPGA w/ Hard Processor

Adapted from [Rabaey’02]

Altera HardCopy – FPGA to Gate-Array-Like Tapeout

[Figure: HardCopy/Stratix design flow. RTL code and timing constraints go through synthesis and Stratix placement and routing, producing a Stratix database and a bitstream for system prototyping, software/hardware co-design, and system bring-up. After design handoff, the HardCopy Design Center runs Astro placement and global routing to build the HardCopy database, with formal verification (Cadence Conformal), Synopsys PrimeTime SI, and Synopsys TetraMax, leading to tape-out and HardCopy fab, assembly, and test.]

Adapted from [Mansur’08]

Acknowledgments

• [Asanovic’11] K. Asanović, J. Wawrzynek, and J. Lazzaro, UC Berkeley CS 250 VLSI Systems Design, Lecture Slides, 2011.

• [ASV’01] A.L. Sangiovanni-Vincentelli, “Platform-Based Design,” UC Berkeley White Paper, 2001.

• [Mansur’08] D. Mansur, “Stratix IV FPGA and HardCopy IV ASIC @ 40 nm,” Hot Chips, Aug. 2008.

• [Melikyan] V. Melikyan, “IC Design Introduction,” Lecture Slides, Synopsys Curriculum Materials.

• [Rabaey’02] J. Rabaey et al., Companion Slides for “Digital Integrated Circuits: A Design Perspective,” 2nd ed, Prentice Hall, 2002.

• [SAED’11] “Standard Cell Lib SAED EDK90 CORE Databook,” Synopsys, 2011.

• [Terman’02] C. Terman and K. Asanović, MIT 6.371 Introduction to VLSI Systems, Lecture Slides, 2002.

• [Weste’11] N. Weste and D. Harris, “CMOS VLSI Design: A Circuits and Systems Perspective,” 4th ed, Addison Wesley, 2011.

• [Xilinx’08] “CoolRunner-II CPLD Family,” Xilinx Data Sheet, 2008. http://www.xilinx.com/support/documentation/data_sheets/ds090.pdf