Lecture 30
Perspectives

18nm FinFET

Double-gate structure + raised source/drain

[Figure: FinFET structure with gate, source, drain and silicon fin on BOX ("Si fin - Body!"), beside I-V curves of Id [uA/um] (0 to 400) versus Vd [V] (-1.5 to 0.0), one curve per label from -0.25 V to -1.50 V in 0.25 V steps.]
X. Huang, et al, 1999 IEDM, p.67~70

Digital Integrated Circuits Perspectives © Prentice Hall 2000

Administrivia

• Final on Friday, December 14, 2001, 8 am. Location: 180 Tan Hall
• Topics: everything that was covered in class.
• Review Session: TBA
• Lab and hw scores will be posted on the web; please check that they are correct and that nothing is missing
• Superb Job on Posters!
• FEEDBACK ON COURSE EXTREMELY WELCOME!

Power Density

• With Vdd ~1.2V, these devices are quite fast. FO4 delay is <5ps
• If we continue with today’s architectures, we could run digital circuits at 30GHz
• But we will end up with 20kW/cm2 power density.
• Lower the supply to 0.6V, and we are down to 5kW/cm2.
• Speeds will be a bit lower, too: FO4 = 10ps, lowering the frequencies to ~10GHz [Tang, ISSCC’01], and lowering power
• Assume that a high performance DG or bulk FET can be designed with 1kW/cm2, with FO4 = 10ps [Frank, Proc IEEE, 3/01]
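The supply-scaling numbers above can be checked with a first-order model. The sketch below assumes that power density is dominated by dynamic power, P ~ C * Vdd^2 * f, with unchanged switched capacitance and no leakage term; the function name and parameters are illustrative and do not come from the lecture.

```python
def scale_power_density(p0_kw_cm2, vdd0, vdd1, f0_ghz=1.0, f1_ghz=1.0):
    """Scale a dynamic power density, assuming P ~ C * Vdd^2 * f at fixed C.

    Assumption: leakage and any change in switched capacitance are ignored.
    """
    return p0_kw_cm2 * (vdd1 / vdd0) ** 2 * (f1_ghz / f0_ghz)


# Halving the supply (1.2 V -> 0.6 V) at unchanged frequency: 20 -> 5 kW/cm2
supply_only = scale_power_density(20.0, 1.2, 0.6)

# Also slowing the clock from 30 GHz to 10 GHz under the same model
supply_and_clock = scale_power_density(20.0, 1.2, 0.6, 30.0, 10.0)

print(f"supply only:      {supply_only:.2f} kW/cm2")
print(f"supply and clock: {supply_and_clock:.2f} kW/cm2")
```

Under this model the quadratic dependence on Vdd alone accounts for the drop from 20kW/cm2 to 5kW/cm2. The 1kW/cm2 figure is a stated design assumption in the slide and is not derived by this model.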

Transistor Count

[Figure: transistors per chip (MT, log scale from 0.001 to 10000) versus year, 1970 to 2010. Data points: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, P6; projections: 200M, 425M, 900M, 1.8B. Source: S. Borkar]
200M--1.8B transistors on the Lead

Power will be a problem

[Figure: power (Watts, log scale from 0.1 to 100000) versus year, 1971 to 2008. Data points: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc; projections: 500W, 1.5KW, 5KW, 18KW. Source: S. Borkar]
Power delivery and dissipation will be prohibitive


Power is a Limiting Factor

• If we have 2cm x 2cm in a high-performance microprocessor, we will end up with 4kW power dissipation.
• If our power has to be limited to 180W, we can afford to have only 4.5% of these devices with 0.6V supply on the die, given that nothing else dissipates power.

Microprocessor Design

• Core will be running at 7 - 10GHz
• Requires fast devices, low thresholds with 0.5-0.6V supplies
• Lowest NMOS VTh ~ -0.1V to get swing in CMOS.
• Assume threshold of 0 – 0.1V. The devices will be very leaky; will use a second threshold to control leakage power.
• With second threshold set to have 10x less leakage, 90% of devices off critical paths can be made high-threshold.
• Power limits the size of the µP core to 5-10% of the die (today’s count, just shrunk), 30-50% of total power budget.
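The budget arithmetic above can be reproduced in a few lines. This is a sketch under stated assumptions: the 1kW/cm2 density is the value assumed earlier in the lecture, and the leakage estimate assumes every device leaks equally before the second threshold is applied. The function names are illustrative.

```python
def affordable_fraction(budget_w, density_w_per_cm2, die_w_cm, die_h_cm):
    """Return (fraction of the die the budget can power, full-die power in W)."""
    full_die_w = density_w_per_cm2 * die_w_cm * die_h_cm
    return budget_w / full_die_w, full_die_w


def relative_leakage(high_vt_fraction, leakage_reduction):
    """Total leakage relative to an all-low-threshold design.

    Assumption: all devices contribute equal leakage at the low threshold.
    """
    low_vt_part = 1.0 - high_vt_fraction
    high_vt_part = high_vt_fraction / leakage_reduction
    return low_vt_part + high_vt_part


# 2cm x 2cm die at 1 kW/cm2 with a 180 W budget
fraction, full_die_w = affordable_fraction(180.0, 1000.0, 2.0, 2.0)

# 90% of devices moved to a threshold with 10x less leakage
leak = relative_leakage(0.9, 10.0)

print(f"full die: {full_die_w:.0f} W, affordable fraction: {fraction:.1%}")
print(f"leakage relative to all-low-VT design: {leak:.2f}")
```

With these inputs the full die would dissipate 4kW and the 180W budget covers 4.5% of it, matching the slide. The second-threshold assignment would cut leakage to roughly a fifth under the equal-leakage assumption.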

Possible Scenario

• Example: 0.5% of devices will be of highest performance
• 35% is leakage (assume: 20% drain, 10% gate, 5% drain-to-body)
• 65% is active power; with just 0.5% of these devices, CV2 = 13W, leakage = 7W
• What would the other 99.5% of devices that populate the 2cm x 2cm die look like?
• Will run at 10x lower frequency, at 0.5-0.7 of the VDD (= 0.25 - 0.35V)
• Thresholds for critical paths VTh = 150mV
• Need to control leakage: another threshold or control of VT

Add Dedicated Datapath

• Can execute e.g. DIVX decoder, graphics

[Figure: one logic block at Vdd compared with two parallel logic blocks at Vdd/2]

                 One block at Vdd    Two blocks at Vdd/2
Freq             1                   0.5
Vdd              1                   0.5
Throughput       1                   1
Power            1                   0.25
Area             1                   2
Pwr Den          1                   0.125
Leakage Curr.                        2
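The dedicated-datapath comparison follows from two scaling rules that the figure implies: clock frequency proportional to Vdd, and per-block power proportional to Vdd^2 * f. The sketch below encodes those rules as assumptions; the function and key names are illustrative.

```python
def parallel_datapath(n_blocks, vdd_scale):
    """Relative metrics for n parallel blocks at a scaled supply.

    Assumptions: frequency scales linearly with Vdd, and each block
    dissipates power proportional to Vdd^2 * f (dynamic power only).
    """
    freq = vdd_scale
    power = n_blocks * vdd_scale ** 2 * freq
    area = float(n_blocks)
    return {
        "freq": freq,
        "throughput": n_blocks * freq,
        "power": power,
        "area": area,
        "power_density": power / area,
    }


baseline = parallel_datapath(1, 1.0)
doubled = parallel_datapath(2, 0.5)

for name, value in doubled.items():
    print(f"{name:14s} {baseline[name]:6.3f} -> {value:6.3f}")
```

Doubling the hardware at half the supply keeps throughput at 1 while power falls to 0.25 and power density to 0.125, as in the slide. The price is twice the area, which the slide also counts as twice the leakage current.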

Microprocessors

[Figure: Today → 20nm. Today's die holds a µP core and cache at 2GHz; the 20nm die holds a µP core at 7-10 GHz, dedicated logic and memory.]

180W Gives Us:

[Figure: two charts, Power and Area, each divided among µP core, dedicated datapath and memory.]


Memory Will Not Scale Much Further

• Density is the key requirement
• Will occupy 70-80% of the die
• Low leakage
• Low activity – Inherently low active power, low power density (at least 10x less than logic)
• Need higher VTh ~ 0.5V, and higher supply 0.8-1V (?)

Today’s Design Methodologies

• The Deep Sub-Micron (DSM) Effect (≤ 0.25µ)

“Microscopic Problems” (∝ DSM)
  • Wiring Load Management
  • Noise, Crosstalk
  • Reliability, Manufacturability
  • Complexity: LRC, ERC
  • Accurate Power Prediction
  • Accurate Delay Prediction
  • etc.
Everything Looks a Little Different

“Macroscopic Issues” (∝ 1/DSM)
  • Time-to-Market
  • Millions of Gates
  • High-Level Abstractions
  • Reuse & IP: Portability
  • Predictability
  • etc.
…and There’s a Lot of Them!


Systems-on-a-Chip

[Figure: Today → 20nm. Today: Broadcom set-top box chip. 20nm: labels include "Radio (60GHz (?), CMOS ?)" and "2W".]

Broadcom set-top box:
• 25M transistors, 3MB embedded SRAM
• MIPS core @ 100MHz, DSP @ 144MHz
• 7 PLLs, 12 ADC, DACs, 100 clocks, 1.4W

The Productivity Gap

[Figure: logic transistors per chip (K) (left axis, log scale from 1 to 10,000,000) and productivity in transistors per staff-month (right axis, log scale from 10 to 100,000,000) versus year, with technology labels 2.5µ, .35µ and .10µ.]

• Complexity growth rate: 58%/Yr. compound
• Productivity growth rate: 21%/Yr. compound

Source: SEMATECH
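The two compound rates quoted for the productivity gap (58% per year for complexity, 21% per year for productivity) imply a gap that itself grows geometrically. The sketch below works out that ratio; it uses only the two rates from the chart, and the function names are illustrative.

```python
import math


def gap_growth(years, complexity_rate=0.58, productivity_rate=0.21):
    """Factor by which complexity outgrows productivity after `years` years."""
    per_year = (1.0 + complexity_rate) / (1.0 + productivity_rate)
    return per_year ** years


def gap_doubling_time(complexity_rate=0.58, productivity_rate=0.21):
    """Years needed for the complexity/productivity ratio to double."""
    per_year = (1.0 + complexity_rate) / (1.0 + productivity_rate)
    return math.log(2.0) / math.log(per_year)


print(f"gap after 10 years: x{gap_growth(10):.1f}")
print(f"gap doubles every {gap_doubling_time():.1f} years")
```

At these rates the design effort needed per chip, measured in staff-months, roughly doubles every two and a half years if nothing else changes. That growth is the motivation for the methodology changes discussed in the rest of the lecture.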

Transistor Requirements

• Will need different kinds of transistors:
  » (speed, leakage)
  » Dedicated DSP (power, leakage)
  » Memory (density is main concern)
  » Analog (?)
• Power and leakage determine the size ratios between these blocks
• The number of different transistor types is determined by parameter spread
• Fewer device types could solve the problem, but that needs control of the threshold (4th terminal), with a strong transfer function.

Implementation Methodologies

[Figure: taxonomy of digital circuit implementation approaches]

Digital Circuit Implementation Approaches
  Custom
  Semi-custom
    Cell-Based
      Standard Cells, Compiled Cells
      Macro Cells
    Array-Based
      Pre-diffused (Gate Arrays)
      Pre-wired (FPGA)


Custom Design – Layout Editor

[Figure: Magic Layout Editor (UC Berkeley)]

Sea-of-gates

[Figure: rows of uncommitted cells separated by routing channels. Labels: polysilicon, metal, VDD, GND, possible contact, uncommitted cell, and a committed cell (4-input NOR) with inputs In1, In2, In3, In4 and output Out.]


Standard Cell - Example

[Figure: 3-input NAND cell (from Mississippi State Library), characterized for fanout of 4 and for three different technologies]

Sea-of-gate Primitive Cells

[Figure: primitive cells built from PMOS and NMOS rows, one using oxide-isolation and one using gate-isolation]


Synthesis

1. Describe your circuit in HDL (VHDL, Verilog)
2. Synthesis programs map it into a library. Set the constraints (timing, area)
3. Get a gate level netlist – automatic place and route
4. Insert clock
5. Extract the netlist from layout
6. Does it meet constraints? If not, go back to 1, 2, 3, 4. Called ‘Design closure’ – timing closure, power closure.

Sea-of-gates

[Figure: LSI Logic LEA300K (0.6 µm CMOS) die, showing random logic and memory subsystem]
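The iterate-until-constraints-are-met structure of the synthesis flow (steps 1 to 6) can be sketched as a loop. Everything in this example is hypothetical: `toy_flow` stands in for synthesis, place and route, clock insertion and extraction, and its delay and area formulas are invented only to make the loop runnable. No real EDA tool interface is implied.

```python
from dataclasses import dataclass


@dataclass
class Constraints:
    max_delay_ns: float
    max_area_um2: float


def toy_flow(effort):
    """Hypothetical stand-in for steps 2-5: returns extracted delay and area.

    Assumption: more optimization effort trades area for speed.
    """
    return {
        "delay_ns": 12.0 / (1.0 + 0.5 * effort),
        "area_um2": 1000.0 * (1.0 + 0.1 * effort),
    }


def design_closure(flow, constraints, max_iterations=10):
    """Re-run the flow with increasing effort until constraints are met."""
    for effort in range(max_iterations):
        result = flow(effort)
        meets_timing = result["delay_ns"] <= constraints.max_delay_ns
        meets_area = result["area_um2"] <= constraints.max_area_um2
        if meets_timing and meets_area:
            return effort, result
    return None  # no closure within the iteration budget


closure = design_closure(toy_flow, Constraints(max_delay_ns=5.0, max_area_um2=1500.0))
no_closure = design_closure(toy_flow, Constraints(max_delay_ns=1.0, max_area_um2=1100.0))
print(closure)
print(no_closure)
```

The second call shows the failure mode: when the constraints conflict, the loop does not converge, and the designer has to return to step 1 and change the description itself.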


Prewired Arrays

Categories of prewired arrays (or field-programmable devices):
• Fuse-based (program-once)
• Non-volatile EPROM based
• RAM based

Field-Programmable Gate Arrays: Fuse-based

[Figure: standard-cell like floorplan with rows of logic modules separated by routing channels, vertical routes, I/O buffers on all four sides, and a program/test/diagnostics block]


Programmable Logic Devices

[Figure: PLA, PROM and PAL array structures]

Interconnect

[Figure: programming interconnect using anti-fuses. Labels: cell, horizontal tracks, vertical tracks, antifuse, programmed interconnection, input/output pin.]


EPLD Block Diagram

[Figure: EPLD block diagram with primary inputs and macrocells. Courtesy Altera Corp.]

Field-Programmable Gate Arrays: RAM-based

[Figure: array of CLBs with switching matrices, horizontal and vertical routing channels, and interconnect points]


RAM-based FPGA: Basic Cell (CLB)

[Figure: CLB with a combinational logic section and a storage elements section. Combinational logic: two function generators, F and G, each computing any function of up to 4 variables from inputs A, B/Q1/Q2, C/Q1/Q2 and D. Storage elements: two flip-flops with outputs Q1 and Q2, fed from F, G or Din, with R, CE, E and Clock signals.]
Courtesy of

Architecture ReUse

• Silicon System Platform
  » Flexible architecture for hardware and software
  » Specific (programmable) components
  » Network architecture
  » Software modules
  » Rules and guidelines for design of HW and SW
• Has been successful in PC’s
  » Dominance of a few players who specify and control architecture
• Application-domain specific (difference in constraints)
  » Speed (compute power)
  » Dissipation
  » Costs
  » Real / non-real time data
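The CLB function generators are labelled "any function of up to 4 variables". One common way to get that property is a 16-entry lookup table addressed by the four inputs, which is what "RAM-based" suggests. The sketch below assumes that lookup-table view; the function names are illustrative and the code does not model the Xilinx part.

```python
from itertools import product


def make_lut4(func):
    """Precompute the 16-entry truth table of a 4-input Boolean function."""
    return [int(bool(func(a, b, c, d))) for a, b, c, d in product((0, 1), repeat=4)]


def lut4_eval(table, a, b, c, d):
    """Read the output bit; the inputs form a 4-bit address with a as the MSB."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


def example_logic(a, b, c, d):
    """Arbitrary example function: (a AND b) XOR (c OR d)."""
    return (a & b) ^ (c | d)


table = make_lut4(example_logic)
print(table)
print(lut4_eval(table, 1, 1, 0, 0))
```

Because the table has 16 independent bits, reprogramming its contents selects any of the 2^16 possible 4-input functions without changing the hardware.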


Platform-Based Design

“Only the consumer gets freedom of choice; designers need freedom from choice” (Orfali, et al, 1996, p.522)

• A platform is a restriction on the space of possible implementation choices, providing a well-defined abstraction of the underlying technology for the application developer
• New platforms will be defined at the architecture-micro-architecture boundary
• They will be component-based, and will provide a range of choices from structured-custom to fully programmable implementations
• Key to such approaches is the representation of communication in the platform model

Source: R. Newton

RAM-based FPGA

[Figure: Xilinx XC4025]

Addressing the Design Complexity Issue: Architecture Reuse

Reuse comes in generations

Generation   Reuse element    Status
1st          Standard cells   Well established
2nd          IP blocks        Being introduced
3rd          Architecture     Emerging
4th          IC               Early research

Source: Theo Claasen (Philips) – DAC 00

Design at a crossroad: System-on-a-Chip

[Figure: system-on-a-chip block diagram. Labels: multi-spectral imager, analog, 500 k gates FPGA, RAM, + 1 Gbit DRAM, preprocessing, µC system, 64 SIMD processor array + SRAM, +2 Gbit DRAM, image conditioning, 100 GOPS, recognition.]

• Embedded applications where cost, performance, and energy are the real issues!
• DSP and control intensive
• Mixed-mode
• Combines programmable and application-specific modules
• Software plays crucial role


EE 141 Summary

• Digital CMOS design is alive and kicking
• Some major challenges down the road caused by Deep Sub-micron
  » Super GHz design
  » Power consumption!!!!
  » Reliability – making it work
  » Device variations
  Some new circuit solutions are bound to emerge
• Who can afford design in the years to come? Some major design methodology change is in the making!
