Lecture 30: Perspectives

Digital Integrated Circuits, Perspectives © Prentice Hall 2000

18nm FinFET
• Double-gate structure + raised source/drain
[Figure: FinFET cross-section (gate on both sides of a silicon fin on BOX, source and drain at the fin ends; the fin of thickness d_Si is the body) and measured I_d [µA/µm] vs. V_d [V] for V_g = -0.25 V to -1.50 V in 0.25 V steps. X. Huang et al., 1999 IEDM, pp. 67-70.]

Administrivia
• Final on Friday, December 14, 2001, 8 am; location: 180 Tan Hall
• Topics: everything that was covered in class
• Review session: TBA
• Lab and homework scores will be posted on the web; please check that they are correct and that nothing is missing
• Superb job on the posters!
• FEEDBACK ON THE COURSE IS EXTREMELY WELCOME!

Power Density
• With Vdd ~1.2V, these devices are quite fast: the FO4 (fanout-of-4 inverter) delay is <5ps
• If we continue with today's architectures, we could run digital circuits at 30GHz
• But we will end up with a power density of 20kW/cm²
• Lowering the supply to 0.6V brings us down to 5kW/cm²
• Speeds will be a bit lower, too: FO4 = 10ps, lowering the frequencies to ~10GHz [Tang, ISSCC'01] and lowering power
• Assume that a high-performance double-gate (DG) or bulk FET can be designed with 1kW/cm², with FO4 = 10ps [Frank, Proc. IEEE, 3/01]

Transistor Count
[Figure: transistors per microprocessor (millions, log scale) vs. year, 1970-2010, from the 4004 through the 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6, with projections of 200M, 425M, 900M and 1.8B. Source: S. Borkar.]
• 200M-1.8B transistors on the lead microprocessor

Power will be a problem
[Figure: microprocessor power (W, log scale) vs. year, 1971-2008, from the 4004 through the Pentium® and P6, with projections of 500W, 1.5kW, 5kW and 18kW. Source: S. Borkar.]
• Power delivery and dissipation will be prohibitive

Power is a Limiting Factor
• With a 2cm x 2cm die in a high-performance microprocessor, we will end up with 4kW of power dissipation (4cm² at 1kW/cm²)
• If power has to be limited to 180W, we can afford to have only 4.5% of these devices on the die with a 0.6V supply, assuming nothing else dissipates power
• Power limits the size of the µP core to 5-10% of the die (today's transistor count, just shrunk), consuming 30-50% of the total power budget

Microprocessor Design
• The core datapath will be running at 7-10GHz
• Requires fast devices with low thresholds and 0.5-0.6V power supplies
• Lowest NMOS VTh ~ -0.1V to get swing in CMOS
• Assume a threshold of 0-0.1V: the devices will be very leaky, so a second threshold will be used to control leakage power
• With the second threshold set for 10x less leakage, 90% of the devices (those off the critical paths) can be made high-threshold

Possible Scenario
• Example: 0.5% of the devices will be of the highest performance
• 35% of their power is leakage (assume 20% drain, 10% gate, 5% drain-to-body)
• 65% is active power; if just 0.5% of the devices are of this type, active (CV²f) power = 13W and leakage = 7W
• What would the other 99.5% of the devices that populate the 2cm x 2cm die look like?

Add Dedicated Datapath
• Can execute, e.g., a DivX decoder or graphics
[Figure: one logic block at Vdd = 1, Freq = 1 vs. two parallel logic blocks at Vdd/2 = 0.5, Freq = 0.5]
                   1 block, Vdd = 1, Freq = 1    2 blocks, Vdd = 0.5, Freq = 0.5
Throughput         1                             1
Power              1                             0.25
Area               1                             2
Power density      1                             0.125
Leakage current    -                             2
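The figures on the power slides follow from first-order CMOS arithmetic. The sketch below is a minimal check of them, assuming dynamic power P = n·C·Vdd²·f per set of identical blocks with leakage ignored, and all values normalized to the single-block design; the function names are illustrative, not from the lecture.

```python
"""First-order check of the power arithmetic on the slides above.

Assumptions (names are illustrative, not from the lecture):
  * dynamic power P = n_blocks * C * Vdd**2 * f, leakage ignored;
  * all quantities normalized to the single-block reference design.
"""


def dynamic_power(n_blocks: int, c: float, vdd: float, freq: float) -> float:
    """Switching power of n identical blocks, each with capacitance c."""
    return n_blocks * c * vdd ** 2 * freq


def parallel_datapath(n_blocks: int, vdd: float, freq: float) -> dict:
    """Normalized metrics for n parallel copies of a block run at (vdd, freq)."""
    power = dynamic_power(n_blocks, c=1.0, vdd=vdd, freq=freq)
    area = float(n_blocks)
    return {
        "throughput": n_blocks * freq,  # each copy completes freq operations per unit time
        "power": power,
        "area": area,
        "power_density": power / area,
    }


def affordable_fraction(die_area_cm2: float, density_w_per_cm2: float,
                        budget_w: float) -> float:
    """Fraction of fast devices a budget allows, assuming nothing else dissipates power."""
    return budget_w / (die_area_cm2 * density_w_per_cm2)


def high_perf_split(fraction: float, full_die_power_w: float,
                    leakage_share: float) -> tuple:
    """(active, leakage) watts for the fraction of devices kept at full speed."""
    total = fraction * full_die_power_w
    return total * (1.0 - leakage_share), total * leakage_share


if __name__ == "__main__":
    print("reference:", parallel_datapath(1, vdd=1.0, freq=1.0))
    print("2 blocks at Vdd/2:", parallel_datapath(2, vdd=0.5, freq=0.5))
    # 2cm x 2cm die at 1kW/cm^2 versus a 180W budget
    print(f"affordable fraction: {affordable_fraction(4.0, 1000.0, 180.0):.1%}")
    # 0.5% of a 4kW die, 35% of which is leakage
    active, leak = high_perf_split(0.005, 4000.0, 0.35)
    print(f"active {active:.0f} W, leakage {leak:.0f} W")
```

Halving Vdd and frequency while doubling the hardware keeps throughput at 1 but cuts power to 0.25 and power density to 0.125, matching the table; the same script recovers the 4.5% and 13W/7W figures. The model deliberately leaves out leakage, which the slide shows doubling with the doubled area.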
Possible Scenario (cont.)
• The other 99.5% of the devices will run at 10x lower frequency, at 0.5-0.7 of the processor VDD, i.e. 0.25-0.35V
• Thresholds for critical paths: VTh = 150mV
• Need leakage power management: another threshold or control of VT

Microprocessors
[Figure: die sketches, Today → 20nm: today a 2GHz µP core with cache; at 20nm a 7-10GHz µP core alongside dedicated datapaths and memory.]

180W Gives Us:
[Figure: breakdown of power and of area among the µP core, dedicated logic and memory.]

Today's Design Methodologies
The Deep Sub-Micron (DSM) Effect (≤ 0.25µ)
"Microscopic Problems" (∝ DSM):
• Wiring Load Management
• Noise, Crosstalk
• Reliability, Manufacturability
• Complexity: LRC, ERC
• Accurate Power Prediction
• Accurate Delay Prediction
• etc.
Everything Looks a Little Different?
"Macroscopic Issues" (∝ 1/DSM):
• Time-to-Market
• Millions of Gates
• High-Level Abstractions
• Reuse & IP: Portability
• Predictability
• etc.
...and There's a Lot of Them!

Memory Will Not Scale Much Further
• Density is the key requirement
• Will occupy 70-80% of the die
• Low leakage
• Low activity: inherently low active power and low power density (at least 10x less than logic)
• Need a higher VTh ~0.5V and a higher supply of 0.8-1V (?)

Systems-on-a-Chip
[Figure: SoC floorplans, Today → 20nm; the 20nm die adds a radio (60GHz (?), CMOS ?).]
• Today: Broadcom set-top box chip with 25M transistors and 3MB embedded SRAM; MIPS core @ 100MHz, DSP @ 144MHz; 7 PLLs, 12 ADCs, DACs, 100 clocks; 1.4W

The Productivity Gap
[Figure: logic transistors per chip (thousands) and designer productivity (thousands of transistors per staff-month), log scale, vs. year, across process generations from 2.5µ through .35µ to .10µ. Complexity grows at a 58%/yr compound rate, productivity at only 21%/yr. Source: SEMATECH.]

Transistor Requirements
• Will need different kinds of transistors:
  » Datapaths (speed, leakage)
  » Dedicated DSP (power, leakage)
  » Memory (density is the main concern)
  » Analog (?)
• Power and leakage determine the size ratios between these blocks
• The number of different transistor types is determined by parameter spread
• Fewer device types could solve the problem, but that needs control of the threshold (4th terminal) with a strong transfer function

Implementation Methodologies
Digital Circuit Implementation Approaches:
• Custom
• Semi-custom
  » Cell-Based: Standard Cells, Compiled Cells, Macro Cells
  » Array-Based: Pre-diffused (Gate Arrays), Pre-wired (FPGA)

Custom Design – Layout Editor
[Figure: the Magic Layout Editor (UC Berkeley).]

Gate Array: Sea-of-gates
[Figure: rows of uncommitted cells (polysilicon, metal, VDD and GND rails, possible contacts) separated by routing channels; one uncommitted cell is committed as a 4-input NOR with inputs In1-In4 and output Out.]

Standard Cell - Example
[Figure: 3-input NAND cell (from the Mississippi State library), characterized for a fanout of 4 and for three different technologies.]

Sea-of-gate Primitive Cells
[Figure: PMOS/NMOS primitive cells using oxide isolation vs. using gate isolation.]

Synthesis
1. Describe your circuit in an HDL (VHDL, Verilog)
2. Synthesis programs map it onto a standard-cell library. Set the constraints (timing, area)
3. Get a gate-level netlist; automatic place and route
4. Insert the clock
5. Extract the netlist from the layout
6. Does it meet the constraints? If not, go back to 1, 2, 3 or 4. This is called 'design closure': timing closure, power closure.

Sea-of-gates
[Figure: die photo of an LSI Logic LEA300K (0.6 µm CMOS) showing random logic and a memory subsystem.]

Prewired Arrays
Categories of prewired arrays (or field-programmable devices):
• Fuse-based (program-once)
• Non-volatile EPROM-based
• RAM-based

Field-Programmable Gate Arrays: Fuse-based
[Figure: standard-cell-like floorplan with rows of logic modules separated by routing channels, vertical routes, I/O buffers on all four sides, and program/test/diagnostics circuitry.]

Programmable Logic Devices
[Figure: PAL, PLA and PROM structures.]

Interconnect
[Figure: cells and input/output pins connected through horizontal and vertical tracks; a programmed interconnection is made by an antifuse at a track crossing. Programming interconnect using anti-fuses.]

Field-Programmable Gate Arrays: RAM-based
[Figure: array of CLBs with switching matrices, horizontal and vertical routing channels, and interconnect points.]

EPLD Block Diagram
[Figure: primary inputs feeding an array of macrocells. Courtesy Altera Corp.]
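PAL, PLA and PROM are all two-level sum-of-products structures: an AND plane forms product terms and an OR plane combines them. In a PLA both planes are programmable, a PAL fixes the OR plane, and a PROM fixes the AND plane as a full decoder of all minterms; the macrocells of an EPLD are typically fed by a similar product-term array. The sketch below models a PLA under a hypothetical encoding (each product term is a dict from input index to required bit) and is not a model of any particular Altera device.

```python
"""Two-level (AND-OR) model of a PLA; the term encoding is a hypothetical choice."""
from typing import Dict, List, Sequence

# A product term maps input index -> required bit value. Inputs missing from
# the dict are not connected in the AND plane (i.e. "don't care").
ProductTerm = Dict[int, int]


def and_plane(inputs: Sequence[int], terms: List[ProductTerm]) -> List[int]:
    """Evaluate every programmed product term (an AND of selected literals)."""
    return [int(all(inputs[i] == v for i, v in term.items())) for term in terms]


def or_plane(products: Sequence[int], connections: List[List[int]]) -> List[int]:
    """Each output ORs together the product terms it is connected to."""
    return [int(any(products[p] for p in conn)) for conn in connections]


def pla(inputs: Sequence[int], terms: List[ProductTerm],
        connections: List[List[int]]) -> List[int]:
    """Programmable AND plane followed by a programmable OR plane."""
    return or_plane(and_plane(inputs, terms), connections)


# Example "programming": a full adder, inputs (a, b, cin) -> outputs (sum, cout).
FULL_ADDER_TERMS: List[ProductTerm] = [
    {0: 1, 1: 0, 2: 0},  # a b' cin'
    {0: 0, 1: 1, 2: 0},  # a' b cin'
    {0: 0, 1: 0, 2: 1},  # a' b' cin
    {0: 1, 1: 1, 2: 1},  # a b cin
    {0: 1, 1: 1},        # a b
    {0: 1, 2: 1},        # a cin
    {1: 1, 2: 1},        # b cin
]
FULL_ADDER_OR: List[List[int]] = [
    [0, 1, 2, 3],  # sum  = odd parity of the three inputs
    [4, 5, 6],     # cout = majority(a, b, cin)
]

if __name__ == "__main__":
    for i in range(8):
        a, b, cin = (i >> 2) & 1, (i >> 1) & 1, i & 1
        s, cout = pla((a, b, cin), FULL_ADDER_TERMS, FULL_ADDER_OR)
        print(a, b, cin, "->", s, cout)
```

Freezing `connections` to a fixed pattern would turn this into a PAL; freezing `terms` to all 2^n minterms would turn it into a PROM.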
RAM-based FPGA Basic Cell (CLB)
[Figure: combinational logic with two function generators, F and G, each computing any function of up to 4 variables drawn from inputs A, B/Q1/Q2, C/Q1/Q2 and D; storage elements are two D flip-flops (Q1, Q2) with clock enable (CE), fed from F, G or the data input Din.]

Architecture ReUse
• Silicon System Platform
  » Flexible architecture for hardware and software
  » Specific (programmable) components
  » Network architecture
  » Software modules
  » Rules and guidelines for the design of HW and SW
• Has been successful in PCs
  » Dominance of a few players who specify and control the architecture
• Application-domain specific (difference in constraints)
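The CLB slide states that each function generator can compute any function of up to 4 variables. In RAM-based FPGAs such a generator is commonly built as a 16-entry lookup table (LUT) whose contents are loaded as configuration bits; the slide itself does not show the implementation, so the sketch below assumes it. The class `Lut4` and its methods are hypothetical names.

```python
"""Sketch of a 4-input LUT, assumed here as the CLB's function-generator implementation."""
from typing import Callable, List, Sequence


class Lut4:
    """Hypothetical 4-input LUT: 16 configuration bits, one per input combination."""

    def __init__(self, config_bits: Sequence[int]) -> None:
        if len(config_bits) != 16 or any(b not in (0, 1) for b in config_bits):
            raise ValueError("a 4-input LUT needs exactly 16 configuration bits")
        self.bits: List[int] = list(config_bits)

    @staticmethod
    def _index(a: int, b: int, c: int, d: int) -> int:
        # The four inputs act as the address into the configuration memory.
        return (a << 3) | (b << 2) | (c << 1) | d

    @classmethod
    def from_function(cls, f: Callable[[int, int, int, int], int]) -> "Lut4":
        """'Program' the LUT by tabulating f over all 16 input combinations."""
        bits = [0] * 16
        for a in (0, 1):
            for b in (0, 1):
                for c in (0, 1):
                    for d in (0, 1):
                        bits[cls._index(a, b, c, d)] = int(f(a, b, c, d)) & 1
        return cls(bits)

    def __call__(self, a: int, b: int, c: int, d: int) -> int:
        """Evaluating the function is a single memory read."""
        return self.bits[self._index(a, b, c, d)]


if __name__ == "__main__":
    parity = Lut4.from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
    print(parity.bits)
    print(parity(1, 0, 1, 1))  # 1 ^ 0 ^ 1 ^ 1 -> 1
```

Because every 4-input function is just a different set of 16 bits, reprogramming the device only rewrites this configuration memory. The flip-flops (Q1, Q2) that follow the generators on the slide are not modelled here.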
