Low Power Architectures Lecture #1:Introduction

LowLow powerpower ArchitecturesArchitectures LectureLecture #1:Introduction#1:Introduction Dr. Avi Mendelson [email protected] Contributors: Ronny Ronen, Eli Savransky, Shekhar Borkar, Fred PPollackollack Dr. Avi Mendelson, Technion, EE department Intel WhyWhy PowerPower meter?meter? Dr. Avi Mendelson Page 2 1 MooreMoore’’ss LawLaw For many years technology obeys Moore’s Law Transistors Per Die 108 256M 64M 7 Memory ® 10 Memory 16M Pentium III Microprocessor 6 4M ® 10 1M Pentium II Pentium® Pro 5 256K Pentium® 10 64K i486™ 16K i386™ 104 4K 80286 1K 8086 103 8080 4004 102 101 100 ’70 ’73 ’76 ’79 ’82 ’85 ’88 ’91 ’94 '97 2000 Source: Intel Dr. Avi Mendelson Page 3 In the Last 25 Years Life was Easy(*) Doubling of transistor density every 30 months Increasing die sizes, allowed by – Increasing Wafer Size – Process technology moving from “black art” to “manufacturing science” ⇒ Doubling of transistors every 18 months Tech Old µArch mm (linear) New µArch mm (linear) Ratio 1.0µ i386C 6.5 i486 11.5 Ratio3.1 0.7µ i486C 9.5 Pentium® 17 3.2 0.5µ Pentium® 12.2 Pentium® Pro 17.3 2.1 0.18µ Pentium® III 10.3 Next Gen ? 2--3 Implications: (in the same technology) (*) source Fred Pollack, 1. New µArch ~ 2-3X die area of the last µArch Micro-32 2. Provides 1.5-1.7X integer performance of the last µArch Dr. Avi Mendelson Page 4 2 Suddenly,Suddenly, thethe powerpower monstermonster appearsappears Dr. Avi Mendelson Page 5 The power crisis – power consumption Sourse: cool- chips, Micro 32 Dr. Avi Mendelson Page 6 3 Processor Power Evolution 100 ? Pentium® II Pentium® Pro Pentium® 4 Pentium® III 10 Pentium® Pentium® w/MMX tech. Max Power (Watts) Power Max i486i486 i386i386 1 1.5µ 1µ 0.8µ 0.6µ 0.35µ 0.25µ 0.18µ 0.13µ New generation: always increase power Compactions: higher performance at lower power One size fits all: start with high power segment and shrink it to Mobile Dr. Avi Mendelson Page 7 The power crisis: Power Density A real thread to the Moor law Think of watts/cm2 Power is not distributed evenly over the chip. A failure can happen if a single point reach the max power point. Complex algorithms lead to denser power: – Dense random logic Timing pressure leads to faster/bigger/power- hungrier gates – Designers put together units that communicate with each other. It creates “regions” with high activity factors -> hot spots. Dr. Avi Mendelson Page 8 4 PowerPower DensityDensity 1000 Rocket Nozzle Nuclear Reactor Nozzle 2 100 Pentium® 4 Hot plate Pentium® III Pentium® II Watts/cm 10 Pentium® Pro i386 Pentium® i486 1 1.5µ 1µ 0.7µ 0.5µ 0.35µ 0.25µ 0.18µ 0.13µ 0.1µ 0.07µ * “New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies” – Fred Pollack, Intel Corp. Micro32 conference key note - 1999. Dr. Avi Mendelson Page 9 SomeSome implicationsimplications We can’t build microprocessors with ever increasing power density and die sizes The constraint is power – not manufacturability The design of any future micro-processor should take power into consideration. We need to distinguish between different aspects of power: Power delivery Max power (TJ) Power density - hot spots Energy – static + dynamic Power and Energy aware design should take care of each of these aspects One- size does not fit all anymore Dr. Avi Mendelson Page 10 5 WhyWhy powerpower andand powerpower densitydensity increaseincrease overover timetime Dr. Avi Mendelson Page 11 BasicBasic terminologyterminology Dr. Avi Mendelson Page 12 6 Power and the digital world… Power is consumed: – When capacitance is charged and discharged – A charged cap is a logical ‘1’, a discharged cap is ‘0’ E=CV2 IN OUT 01 10 The capacitance can be the gates of other transistors or wires (buses and long interconnects) Dr. Avi Mendelson Page 13 Power and the digital world (2)… Secondary effects like leakage and short-circuit current are increasing with advanced process technologies IN OUT IN OUT 0 1 1/2 Leakage Short-circuit (sub-threshold) Leakage is growing dramatically – 7% now, expect 20% in next process technology, 50% in next one – … Unless we do something (and we will) Dr. Avi Mendelson Page 14 7 PowerPower && EnergyEnergy Energy “The capacity for doing work” * Important for – Battery life - lower energy per task Î longer battery life – Electric bills - lower energy per task Î lower bills Measured over time Proportional to the overall capacitance and to the voltage squared (CV2) * Merriam-Webster’s Collegiate® Dictionary - http://www.m-w.com/ Dr. Avi Mendelson Page 15 PowerPower && EnergyEnergy Power Work done per time unit – Measured in Watts P = αCV2f (α: activity, C: capacitance, V: voltage, f: frequency) “Measured” at peak time Higher power Î higher current – Cannot exceed platform power delivery constrains Higher power Î higher temperature – Cannot exceed the thermal constrains Dr. Avi Mendelson Page 16 8 Voltage, Power, Frequency Transistor switches faster at higher voltage Î Higher voltage enables higher frequency Maximum frequency grows about linearly with voltage …Within a given voltage range Vmin-Vmax – V < Vmin 1000 Î transistors won’t switch XScale processor freq. & power vs. voltage * 900 – V > Vmax 800 Î the device may burn Fequency(Mhz) Î the device may burn 700 Power (mWatt) “The cube law”: 600 P = kV3 500 (or ~1%V=3%P) 400 300 200 100 0 * Source: Intel Corp. (http://developer.intel.com) 0.5 0.7 0.9 1.1 1.3 1.5 1.7 1.9 Dr. Avi Mendelson Page 17 The scalability theory – process technology Target (Ideally):Eache generation (2-3 years) Reduce gate delay by 30% 50% freq gain 2. Increase density by 2x – 0.7 shrink on a side, 50% area reduction on compaction – Transistor Z and L shrink by 30% – Interconnect pitches shrink by ~30% – Add metal layers to make-up for (1) pitches < 30%, and (2) RC Dr. Avi Mendelson Page 18 9 ScalingScaling theorytheory----11 ofof 22 = = = = = Width W 0.7, Length L 0.7, tox 0.7 ° Lateral and vertical dimensions reduce 30% 0.7 × 0.7 Area Cap = C a = = 0.7, 0.7 Fringing Cap = C f = 0.7, Total Cap ⇒ C = 0.7 ± Capacitance--area and fringing--reduce 30% Die Area = X ×Y = 0.7 × 0.7 = 0.72 ² Die area reduces 50% Dr. Avi Mendelson Page 19 ScalingScaling theorytheory----22 ofof 22 Cap 0.7 = = 0.7 Transistor 1 ³ Capacitance per transistor reduces 30% Cap = 0.7 = 1 Area 0.7 × 0.7 0.7 ´ Capacitance per unit area increases 43% × = = = W − = 0.7 0.7 = Vdd 0.7, Vt 0.7, I (Vdd Vt ) 0.7 tox 0.7 C ×Vdd 0.7 × 0.7 0.7 × 0.72 T = = = 0.7, Power = C ×V 2 × f = = 0.72 I 0.7 0.7 µ Delay reduces 30%, power reduces 50% Dr. Avi Mendelson Page 20 10 Process Technology – the Enabler Every process generation (every 2-3 years), Ideally: – Shorten gate delay by 30% Î ~50% (100/70) frequency gain – Vdd scaled down by ~30% Î Results: 2 » 2/3 reduction in energy/transition (CV2Î 0.7 x 0.72 = 0.34X) 1 » 1/2 reduction in power (CV2fÎ0.7 x 0.72 x 1.5 =0.5X) » Power density unchanged Dr. Avi Mendelson Page 21 IdealIdeal Scenarios...Scenarios... Ideal “Shrink” Ideal New µarch – Same µarch – Same die size – 1X #Xistors – 2X #Xistors – 0.5X size – 1X size – 1.5X frequency – 1.5X frequency – 0.5X power – 1X power – 1X IPC (instr./cycle) – 2X IPC – 1.5X performance – 3X performance – 1X power density – 1X power density Looks good. Isn’t it? Dr. Avi Mendelson Page 22 11 Process Technologies – Reality But in reality: – New designs squeeze frequency to 2X per process – New designs use more transistors (2X-3X to get 1.5X-1.7X perf) So, every new process and architecture generation: – Power goes up about 2X – Power density goes up 30%~80% This is bad, and… Will get worse in future process generations: – Voltage (Vdd) will scale down less – Leakage is going to the roof Not as good as it first looked… Aha? Dr. Avi Mendelson Page 23 In Picture… Silicon Process Technology 1.5µ 1.0µ 0.8µ 0.6µ 0.35µ 0.25µ 0.18µ 0.13µ Intel386™ DX Processor Intel486™ DX Processor Pentium® Processor Pentium® Pro Processor Pentium® II Processor Pentium® III Processor Pentium® 4 Processor Dr. Avi Mendelson Page 24 12 Performance Efficiency of µarchitectures Old mm New mm Area Perf Tech µArch (linear) µArch (linear) Ratio Ratio 1.0µ i386C 6.5 i486 11.5 3.1 0.7µ i486C 9.5 Pentium® proc 17 3.2 1.8 0.5µ Pentium® proc 12.2 Pentium Pro® proc 17.3 2.1 1.5 0.18µ Pentium III® proc 10.3 Pentium® 4 proc 14.7 2 1.4 3.5 Area Implications: (in the same technology) 3 Perf. 1. New µarch ~2-3X die area of the last µarch 2.5 2 2. Provides 1.4-1.8X integer performance of 1.5 the last µarch 1 0.5 0 486=>PP PP=>Ppro PIII=>P4P We are on the Wrong Side of a Square Law Dr. Avi Mendelson Page 25 Power Evolution (Theoretical) 250250 100100 Leakage Power ) ) 2 Active Power 2 200200 Power Density 7575 150150 5050 Watts Watts 100100 2525 5050 Power Density (W/cm Power Density Power Density (W/cm Power Density 00 00 0.25µ0.25µ 0.18µ0.18µ 0.13µ0.13µ 0.1µ0.1µ For a 15mm/side die (225mm2) Assume 2X frequency increase each generation Future process numbers are estimated Dr.

Low Power Architectures Lecture #1:Introduction

Design and Implementation of Pentium-M Based Floswitch for Intracluster Communication Veerappa Chikkagoudar, Dr

Inside Intel® Core™ Microarchitecture Setting New Standards for Energy-Efficient Performance

Pentium II Processor Performance Brief

Intel Architecture Optimization Manual

Intel® Processor Graphics: Architecture & Programming

The Microarchitecture of the Pentium 4 Processor

X86 Assembly Language Syllabus for Subject: Assembly (Machine) Language

Multiprocessing Contents

IXP400 Software's Programmer's Guide

The Intel X86 Microarchitectures Map Version 2.0

Technical Product Summary Classic/PCI I486 Baby-AT Motherboard

Intel Xeon Processor Can Be Identified by the Following Values