PRACTICAL POWER GATING AND DYNAMIC VOLTAGE/FREQUENCY SCALING

Stephen Kosonocky AMD Fellow

1 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky OUTLINE

. Motivation for DVFS and Power Gating . Dynamic Voltage/Frequency Scaling . Power Gating . Conclusion

2 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky MOTIVATION

3 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky SERVER COST TRENDS

. Server compute capacity/$ is increasing steadily – Driving up total power/$ spent on servers – Also driving up electrical power spending . Green initiative and focus on energy independence creates additional pressure to lower energy costs . Increased energy efficiency directly reduces costs

• Source Data: Uptime Institute, 2009

4 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky MOBILE AND DESKTOP

Now: Parallel/Data-Dense

. High degree of integration of 16:9 @ 7 megapixels special functions HD video flipcams, phones, webcams (1GB)

. Diverse workloads 3D Internet apps and HD video online, social networking w/HD files – Data serial, data parallel 3D Blu-ray HD – Streaming Multi-touch, facial/gesture/voice – Compute intensive in bursts recognition + mouse & keyboard

– Long periods of inactivity All day computing (8+ Hours)

. Small form factor Immersive and . Long battery life interactive performance . Instant access of the device

. Autonomous Workloads

- Samuel Naffziger, VLSI Symposium 2011, Kyoto

5 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky POWER/PERFORMANCE TRADE-OFF

Processor Power-Frequency/Voltage 35 12

30 Perf/Watt 10

25 Power Performance/Watt 8 maximized at the 20 lowest voltage operating point 6 15 Watts/Core 4 Frequency (GHz) Frequency 10

Fmax/Voltage 2 5

0 0 0.725 0.775 0.825 0.875 0.925 0.95 1.025 1.075 1.125 1.175 1.225 Voltage Performance/Watt maximized at VMIN

6 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky HIGH PERFORMANCE CORE POWER

Depending on node, leakage ~ 40% total power

7 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky Two major knobs have emerged for controlling power

1. Dynamic Voltage and Frequency Scaling – Optimize performance for the application while it’s running 2. Power Gating – Gate power during idle periods

Each present unique challenges for implementation and optimization

8 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky DYNAMIC VOLTAGE/FREQUENCY SCALING

9 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky LLANO ACCELERATED PROCESSING UNIT (APU)

• Integration Provides Improvement • Eliminate power and latency of extra chip crossing • 3X bandwidth between GPU and Memory • Same sized GPU is substantially more effective • Power efficient, advanced technology for both CPU and GPU • Thermal management optimization between CPU and GPU ~ Power Breakdown

. Key features for Mobile Mark 07 IO Analog PHY – Lower Idle, Active power CPU – DVFS / Clock-gating / Power Gating NB – Robust and Flexible Power Management Control DDR GPU PHY

10 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky APU POWER MANAGEMENT

- Sebastien Nussbaum, IEEE Vail Elements, June 22th, 2009

11 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky

Voltage . Dynamic frequency and voltage scaling (DVFS)

P-states: Core executing at some frequency V . Match the frequency of operation with P-states C1/C2: Core halted and frequency ramped down the workload . Lower voltage to sustain required frequency C3/C4: Core halted, clocks ramped to even lower VAltvid frequency, maintain caches, maintain core Examples:

. P-states for active control of V C5: Core halted, clocks ramped to even lower deeper Altvid frequency, flush caches, maintain core voltage/frequency

. C-states for degrees of idle control

C6: Core off, flush caches, core state saved to 0V DRAM

12 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky DESIGN FOR DVFS

. Optimization of timing across entire voltage range necessary for efficiency DVFS . Required two separate timing corners with equally aggressive cycle time goals – 0.8V & 1.3V . Both gate-dominated timing paths and paths with high RC content have been tuned to provide better scaling than the prior design . Enables the core to thoroughly exploit boost capability

13 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky APU DVFS

- Sebastien Nussbaum, IEEE Vail Computer Elements, June 22th, 2009

14 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky APU THERMAL PROFILE

110.0

105.0

100.0

95.0

90.0

85.0 CPU-centric workload

- Samuel Naffziger, VLSI Symposium 2011, Kyoto

15 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky APU THERMAL PROFILE

110.0

105.0

100.0

95.0

90.0

85.0 GPU-centric data parallel workload

- Samuel Naffziger, VLSI Symposium 2011, Kyoto

16 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky APU THERMAL PROFILE

110.0

105.0

100.0

95.0

90.0

85.0

Balanced workload

- Samuel Naffziger, VLSI Symposium 2011, Kyoto

17 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky WHAT HAPPENS WHEN WE ADD MORE CORES?

CPU 0 CPU 1 Already at 6, 2 high current supplies PLL PLL VDD CPU 2 CPU 3

PLL PLL

VDDIO, VLDT VRM DIMMs VTT PLLs SVI PLLs Graphics

Misc

PLL VDDA

VDDNB

18 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky PACKAGE LIMITS

. Packages add constraints to increasing number of discrete Typical low cost package structure supplies Chip . Typically only use 4 thick layers for high current power distribution . 3-4 thin build-up layers on each side – Used for secondary supply and signal routing to pins

. Difficult to add many more supply rails

19 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky TYPICAL PACKAGE LAYER ROUTING LAYERS VDDR VDDA VDDNB VDD VDDIO

M1 – PWR (top metal layer) M2 – routing layer 1 Package capacitors Routing congestion Space limited for capacitors

20 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky CORE VOLTAGE POWER DELIVERY IN PACKAGE

VDD Max Package Droop VSS Max Package Droop

2.4x -0.025 0.041 higher -0.0255 0.04 -0.026 0.039

0.038 -0.0265 0.037 -0.027 0.036

0.035 -0.0275 1 7 0.034 -0.028 13 0.033 S29 19 0.032 -0.0285 25 S22 S25

S22 -0.029 S15 31 S19 1 S16 4 7

S13 S8 10 S10 S7 13 16 19 S4

22 S1 25 S1 28 31 34

. Package Routing Constraints forces compromises for VDD plane – More difficult with increasing VDD planes . More resources dedicated to VSS – Shows less droop

21 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky WHAT DOES THE ACTUAL VDD LOOK LIKE?

25ns trace of DGEMM benchmark 1.35 1.4

1.3 1.2 ~50mV 1 1.25 0.8 VDD 1.2 0.6

0.4 1.15 Current Normalized 0.2

1.1 0 Time

X86 core Supply simulation w/ package board model

Increased local decoupling cap could help

22 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky PACKAGE CAPS FOR SUPPLY DECOUPLING

Most effective

Marginally effective

Hard to find places for additional components

23 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky LARGE ON-DIE DECOUPLING CAPACITOR FOR REDUCTION OF SUPPLY NOISE Trench Capacitor Shift resonant frequency from 80MHz -> 7MHz

- D. Wendel, et.al. “The Implementation of POWER7: A Highly Parallel and Scalable Multi-Core High-End Server ”, ISSCC 2010 Adds significant process cost 100fF/um2 ~ 25x thick-ox

24 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky METHODS FOR INTEGRATED VOLTAGE REGULATION

. Switched capacitor regulator

. Buck converter

. Linear regulator

. Switched voltages

25 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky INTEGRATED SWITCHED CAPACITOR REGULATOR

. Efficiency dependent on load current 2:1 Conversion – Most efficient at 2:1 voltage conversion ratio – Limits number of in the path – ~90% for 2:1, ~70% for other values . Requires large capacitors – Can use trench capacitor for better area efficiency . Large voltage and current ripple

– Can mitigate with large decap and/or Leland Chang, Robert K. Montoye, Brian L. Ji, Alan J. Weger, Kevin G. Stawiasz, and Robert H. Dennard, “A interleaved converters Fully-Integrated Switched-Capacitor 2:1 Voltage Converter with Regulation Capability and 90% Efficiency at 2.3A/mm2”, VLSI Symposium , 2010

26 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky INTEGRATED BUCK CONVERTER

. Integrated inductor options . ~ 0.33nH possible . High frequency operation can reduce inductance requirements . L proportional 1/Fs . Can reasonably approach 1GHz when fully integrated . Parasitic series resistance is key problem . I2R loss large . Multi-phase or parallel regulator architecture can help Interleaved Outputs – (I/n) 2 R (n=number phases) – Current ripple is averaged by output cap W. Kim, et.al, June, 2007 . Too many phases will reduce efficiency for low current loads

27 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky INTEGRATED BUCK • Embedded inductors needs lower resistance or higher inductance/turn CONVERTER KEY • Multi-phase or parallel can approach LIMITATION: reasonable efficiency only with very high DCR VS. INDUCTANCE number of paths • Reduces current in each inductor CMOS Inductor vs. Small Package • Fully embedded approach really needs a specialized integrated inductor technology size components Peak Efficiency considering inductor resistive loss alone

>70%

Discrete inductors consume valuable package resources

28 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky INTEGRATED LINEAR REGULATION

. Error between divided output and reference voltages is and fed to the pass transistor to Vin correct the error . NMOS type higher performance Vref . PMOS type (Low Drop Out) Vout . Efficiency good only for small ErrAmp (Vin – Vout) I Eff = (Power Out)/(Power In) load = ~ Vout/Vin . LDO design can be more efficient (PMOS output) – Stability & speed of feedback become an issue

29 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky LINEAR REGULATOR (PMOS)

Vin PMOS based linear regulators offer low dropout voltage while operating

within (0-Vin) range

- Required gain-bandwidth product Vref challenging high for processor + loads

Vout  BW ~ 1GHz needed

 Iload Reff Cdecap Hurts efficiency, power wasted in Op-amp

30 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky LINEAR REGULATOR (NMOS)

Charge Vin Source-follower topology Pump  Will require charge pump to provide gate overdrive.

 Vref + Accurate voltage setting. -  Supply rejection – dropout voltage tradeoff

Vout

Iload Reff Cdecap High BW response makes it a good candidate for integration w/o large Capacitors Intrinsically good frequency response Requires charge pump or high Charge-pump (or low current supply) allows Low drop out voltage supply >Vin for good efficiency

Higher Efficiency

 Amplifier is high gain, low bandwidth.

 Bandwidth and efficiency depends on desired slew rate

 0Vth devices offer additional efficiency

31 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky DVFS SYSTEM IMPLEMENTED WITH DUAL VOLTAGES

. Fully integrated167 processor system . Each processor tile contains a core that operates at: – A fully-independent clock frequency . Any frequency below maximum . Halts, restarts, and changes arbitrarily – Dynamically-changeable Dean N. Truong, Wayne H. Cheng, Tinoosh Mohsenin, Zhiyi Yu, Anthony T. supply voltage Jacobson, Gouri Landge, Michael J. Meeuwsen, Christine Watnik, Anh T. Tran, Zhibin Xiao, Eric W. Work, Jeremy W. Webb, Paul V. Mejia, Bevan M. . VddHigh or VddLow Baas, “167-Processor Computational Platform in 65 nm CMOS”, JSSC, Vol. 44, No. 4, April 2009 . Disconnected for leakage reduction . Each power gate comprises 48 individually-controllable parallel transistors

32 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky POWER GATING

33 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky WHAT'S THE PROBLEM? JUST TURN IT OFF WHEN YOUR NOT USING IT!

.Many choices and options .Need to analyze your particular application for the optimal solution .Considerations: – Design complexity – Power savings – Frequency/performance impact – Wake-up time – Area overhead – Verification complexity – Return on investment (ROI)

34 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky POWER GATING MODES (ACTIVE)

During active mode . Natural low gate activity factors inside of logic block allow sharing of power gate device for many logic functions with gated logic block – Reduces resistance requirements of power IMAX gate device RON – Works to improve Ion/Ioff ratio requirements . Design objective – Minimize resistance of cut-off device . Penalties: Logic Block – Increased logic delay – Requires higher external supply voltage for same frequency . Consumes more power during active mode

VLOGIC  Vdd VIRPG

VIRPG  RON  I MAX Header

35 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky POWER GATING MODES (SLEEP)

During sleep mode . Steady state – Want to maximize OFF resistance of cut-off device ISLEEP ROFF – Reduce stand-by power with lower leakage (10x -> 1000x possible) . Transition to cutoff Logic Block

– Creates supply di/dt for small ISLEEP vs. large IL

PSLEEP  Vdd  ISLEEP

dICUTOFF ISLEEP  I L   Header Switch dt tCUTOFF

36 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky POWER GATING MODES (WAKE)

During wake-up mode

Need to control in-rush IWAKE – RWAKE current to charge logic devices to VDD . Logic capacitance is CLogic discharged to nVdd during sleep mode

1 n Vdd IWAKE  RWAKE Header Switch 1st order analysis

37 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky LEAKAGE SAVINGS WITH POWER GATING

>20x leakage savings

Early Llano Measured Results Very effective method for controlling power during idle periods

Ravi Jotwani, Sriram Sundaram, Stephen Kosonocky, Alex Schaefer, Victor F. Andrade, Amy Novak, Samuel Naffziger, “An -64 Core in 32 nm SOI CMOS”, JSSC, January, 2011

38 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky DETERMINATION OF FREQUENCY IMPACT DURING ACTIVE MODE

. Determine tolerable voltage drop for active mode operation – De-rate timing models by fixed maximum value to account for voltage loss in power switch – Increase external voltage to compensate for voltage drop . Use critical path simulations or representative RO data for voltage/frequency trade-off . Design worst-case power gate and grid resistance to meet specification target . Analyze drop at highest operating voltage and highest power consumption . Keep voltage drop due to power gating small to minimize uncertainty at power island boundary crossings . 2nd order effects: – Circuit area growth due to wire blockages caused by power switch – Decreased available quiet decoupling capacitance due to increased resistance to implicit and explicit charge reservoirs on the die

39 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky FREQUENCY IMPACT ESTIMATED WITH RING OSCILLATOR

32 nm FO1 RO Data (RVT, min W) 3.0%

2.5% 1% VDD loss ~ 1% frequency loss

2.0%

0.5% Delta VDD 1.5% 1.0% Delta VDD 1.5% Delta VDD

Frequency Delta Frequency 1.0%

Target IR drop: 0.5% VIRPG~11mV @1.1V

0.0% 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 VDD (V)

40 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky POWER GATE CELL I-V CHARACTERISTICS

7.0E-04 76.50

6.0E-04 IDS 76.00 FET operates in 5.0E-04 the linear region RDS

75.50 4.0E-04

IDS (A) 3.0E-04 Can be 75.00 approximated as RDS (ohms) 2.0E-04 a resistor

74.50 1.0E-04 Target region of operation 0.0E+00 74.00 0.00 0.01 0.02 0.03 0.04 0.05 0.06 VDS (V)

41 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky EMBEDDED ROW-BASED POWER SWITCH PLACEMENT

VirVDD VDD PS PS PS PS PS PS PS PS PS PS PS VSS

Current Flow

VirVDD VDD PS PS PS PS PS PS PS PS PS PS PS VSS Pre-populate logic area with power gate cells at uniform locations prior to place and route

42 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky MAXIMUM POWER GATE SPACING (ROW-BASED PLACEMENT)

Imax

VIRW(0) VIRW(1) VIRW(2) VIRW(3) VIRW(4)

VIRPG

. Assume uniform current VIRPG = 2 * rows * IROW * RPG density

. Can estimate power gate VIR_WC = VIRPG + ∑ VIRW spacing with respect to IR and EM limits

43 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky IR DROP EXAMPLE (UNIFORM POWER DENSITY)

IR Drop Along Virtual VSS Perpendicular to Standard Cell Rows

70.0

60.0 1X Metal Virtual Supply IR drop 50.0 1.3X Metal Virtual Supply IR drop

40.0

30.0 IR Drop (mV) 20.0 Max current density = 1.0A/mm2 10.0 75.57ohm footer 30rows 0.0 0 20 40 60 80 100 120 140 Standard Cell Rows Between Footers

44 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky EM EXAMPLE (UNIFORM POWER DENSITY)

EM Margin Along Virtual VSS Perpendicular to Standard Cell Rows

25%

20% 1X Metal EM Limit 1.3X Metal EM Limit

15% Max current density =1.0A/mm2

EM Limit 10%

5% Thick metal will provide more EM margin 30rows 0% 0 20 40 60 80 100 120 140 Standard Cell Rows Between Footers

45 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky VERTICAL CROSS SECTION

C4 Bump

M11 Full VDD Grid M10 thicker metal M9

M8

M7

M6 Virtual VDD VDD M5 Via Grid Tower M4 M3 EM

M2 VDD PS VVDD Logic EM hazards can occur in high power density regions

46 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky EMBEDDED POWER GATE EXAMPLE STYLE USED IN LLANO GPU AMD GPU Functional Unit Power Gate Tile

Power gate: Checkerboard pattern SRAM

Always-on cells: FeedThru cells, Power control logic, Clamp cells for macro input pins and tile output ports.

Input I/O buffers

Power gate cells must squeeze around arrays Providing power to always-on cells can be problematic IP I/O protection typically needed to isolate individual regions SRAMs need their own custom power gating

47 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky EMBEDDED POWER GATE EXAMPLE

AMD GPU Functional Unit Power Gate Tile

Simulated with Apache Redhawk in vectorless dynamic analysis mode

Power gating increases distribution of IR drop on grid Can cause problems with critical circuits and paths

48 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky EXTERNAL RING STYLE GATING

. Interrupt global supply grid to power-gated IP block . Requires careful chip floorplanning Global to accommodate supply GND Grid interruption VGND Macro/Core . Increased sharing of power gate M1 metal devices can ease power gate sizing Virtual Footer Switch – Large IP power gating can greatly Grid M2 metal Location ease worst-case current analysis . Can be difficult to create always power-on islands with power-gated region

49 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky LLANO CPU CORE RING GATING WITH PACKAGE LAYER ASSIST

Virtual voltage spreads uniformly Bumps near hot spots can exceed max limits

Ravi Jotwani, Sriram Sundaram, Stephen Kosonocky, Alex Schaefer, Victor F. Andrade, Amy Novak, Samuel Naffziger, “An x86-64 Core in 32 nm SOI High R can create noise issues CMOS”, JSSC, January, 2011 VSS

50 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky STRAIGHT EDGE & ZIG/ZAG EDGE PACKAGE PLANE BOUNDARIES

Peak currents flowing into bumps at periphery can be alleviated using zig/zag approach

51 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky MAXIMIZING POWER GATING OPPORTUNITIES

. To maximize power savings, we would like to go into a power gating mode as soon as possible to reduce leakage power . Practical constraints limit opportunity – Energy break-even point – In-rush current induced noise – Time to restore state of power gated region

Power

A A A c c c t Idle t Idle t Idle Application i i i v v v Mode e e e

time

52 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky ENERGY BREAK-EVEN POINT

Power Gate Example:

T0: Logic block starts inactivity T1: Control circuit decides to power gate T2: Power gate signal propagation complete T3: Energy break-even point T4: Full discharge of virtual supply T5: Control circuit decides to re- power logic T6: Power gate signal propagation complete T7: Gated logic is fully charged and ready for activity Z. Hu et al., "Microarchitectural Techniques for Power Gating of Execution Units," ISLPED'04, August 9–11, 2004, Newport Beach, California, USA.

53 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky IN-RUSH CURRENT CONTROL

. During wake-up of the power gated region it’s critical to control the in- rush current . The effective resistance of parallel combination of footers is very small – Can create excessive currents . Transition to cutoff mode can also create similar di/dt events . Hazards – Supply bounce from large di/dt – EM violations – Short circuit currents from imbalanced nodes in power gated region during wake-up

54 | Practical Power Gating and Dynamic Voltage/FrequencyS. Scaling Kosonocky Issues | August 17, 2011 | Stephen Kosonocky 54 UNDERSTANDING SUPPLY BOUNCE

In-rush current can cause supply bounce on neighboring IP

Suhwan Kim, Chang Jun Choi, Deog-Kyoon Jeong, Stephen Kosonocky, Sung Bae Park, “Reducing Ground-Bounce Noise and Stabilizing the Data-Retention Voltage of Power-Gating Structures”, IEEE Transactions on Electron Devices, Vol. 55, NO. 1, January 2008

55 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky 55 IN-RUSH CURRENT SIDE-EFFECTS

. Disturb neighboring circuits – Retention , adjacent IP logic, SRAM . Supply bounce, ground bounce . Over stress gate-oxides due to voltage excursions . Un-even distribution of wake signals during power-up – Creates excessive short circuit currents when some gates are charged and others are not . Large di/dt can cause EM failures

56 | Practical Power Gating and Dynamic Voltage/FrequencyS. Scaling Kosonocky Issues | August 17, 2011 | Stephen Kosonocky 56 PROGRAMMABLE IN-RUSH CURRENT CONTROL

. When exiting power gating, power gates are gradually enabled in groups with progressively larger effective device widths – Controlled by FSM – Fuse programmable strengths and duration for post-silicon tuning – Analytical R-C model w/o inductance demonstrating control

group1 group2 group3 group4 group5

57 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky LLANO CPU CORE IN-RUSH CURRENT ON CC6 EXIT

Apache Redhawk In-rush Current Simulation (Conservative Programming) 3.50 1.4

3.00 1.2

2.50 1

2.00 0.8 IVSSCORE0

VSSCORE0

Current Current (A) 1.50 0.6 VSSCORE (V)

1.00 0.4

0.50 0.2

0.00 0 0 20000 40000 60000 80000 100000 Time (ps)

Fuse programmable in-rush control built-in Sample setting shows a 3A peak during CC6 exit

58 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky HW DATA SHOWING IN-RUSH EFFECT ON OTHER CORES

1 cores in/out CC6 3 cores in/out CC6 1 cores in/out CC6 3 cores in/out CC6 3 core max freq 1 core max freq 3 core max freq 1 core max freq Large in-rush setting Small in-rush setting Transient droop limited by CC6 exit Transient droop limited by # cores running max frequency

59 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky STATE SAVE AND RESTORE

. Regulate virtual supply during sleep mode to low voltage – Moderate leakage reduction possible – Globally applies to all retention elements in gated region – Example: . K. Kumagai et al., "A Novel Powering-down Scheme for Low Vt CMOS Circuits," Symposium on VLSI Circuits, 1998 (Virtual Rail CMOS). . L. Clark, S. Demmons, "Standby Power Management for a 0.18μm ," ISLPED 2002 (Intel XScale processor). . Retention flops with always-on latch cell – Many variants possible – Some with explicit control or automatic restore – Good practice to avoid adding power domain crossing in functional path of flop – Can increase timing uncertainty with virtual supply bounce – Reduces requirements of always-on supply distribution . Write-out critical state to memory using serial or parallel paths – Can utilize scan mechanism to avoid a dedicate – Low area overhead – Can be a large time overhead depending on total size of state restore memory

60 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky CONCLUSION

61 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky CONCLUSION

. DVFS provides a nice capability for optimization of CPU/GPU/APU performance per application – As we add more cores and IP to the SOC, it’s will be difficult to continue to provide unique voltage rails with external regulators – Integrated regulation has it’s own challenges, more details will be covered by the other presenters . Power gating can mitigate the need for additional voltage rails – Maximum frequency impact is minimal – Integration by embedding in IP or surrounding in a ring each have their own issues – Opportunity is limited by energy breakeven, in-rush noise, and state restore overhead – In-rush current control is necessary to prevent frequency impacts

62 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky