Computer Rules
Ø Jim Kardach, retired chief power architect, http://www.youtube.com/watch?v=cZ6akewB0ps

HW1
Ø Has been posted on the online schedule.
Ø Due on March 3rd, 1pm.
Ø Submit in class.
Ø Hard deadline: no homework accepted after deadline.
Ø No collaboration is allowed.

Chenyang Lu CSE 467S The Power Problem

Ø Processors improve performance at the cost of power.
  q Performance/watt remains low.

Ø Solution
  q Hardware offers mechanisms for saving power.
  q Software executes power management policies.

Power vs. Energy
Ø Power: energy consumed per unit time
  q 1 watt = 1 joule/second
Ø Power → heat
Ø Energy → battery life

Why worry about energy? Intel vs. Duracell

[Figure: improvement compared to year 0 (1x to 16x) versus time (0 to 6 years), with curves labeled (MIPS), Hard Disk (capacity), Memory (capacity), and Battery (energy stored).]

Ø No Moore's Law in batteries: 2-3%/year growth.

Trend in Power Density

[Figure: power density (Watts/cm²) versus process generation (1.5µ down to 0.07µ) for Pentium®, Pentium® Pro, Pentium® II, Pentium® III, and Pentium® 4, with reference levels for hot plate, nuclear reactor, rocket nozzle, and Sun's surface.]

Source: New Challenges in the Coming Generations of CMOS Technologies, Fred Pollack, Intel Corp., Micro, 1999.

Trend in Cooling Solution

Power
Ø Hardware support
Ø Power management policy
Ø Power manager
Ø Holistic approach

CMOS Power Consumption
Ø Voltage drops: power consumption ∝ V².
Ø Toggling: more activity → higher power.
Ø Leakage when inactive.

Power-Saving Features

Voltage drops: Reduce power supply voltage.

Toggling: Run at lower clock frequency. Reduce activity. Disable function units when not in use.

Leakage: Disconnect parts from power supply when not in use.


Ø Why voltage scaling?
  q Power ∝ V² → reducing the power supply voltage saves energy.
  q Lower voltage → lower clock frequency.
    • Tradeoff between performance and energy.

Ø Why dynamic?
  q Peak computing demand is much higher than average.

Ø Changing voltage takes time
  q to stabilize power supply and clock
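
To make the V² relation concrete, here is a minimal sketch. It assumes the usual CMOS dynamic-power model P ∝ V²·f, which adds a frequency term to the relation stated above; the function name is hypothetical, and the voltages and frequencies are illustrative inputs rather than measurements.

```python
def relative_dynamic_power(v_new, v_old, f_new=1.0, f_old=1.0):
    """Ratio of dynamic power after scaling, assuming P is proportional to V^2 * f."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Voltage scaling alone: 2.0 V down to 1.4 V (roughly half the power).
print(relative_dynamic_power(1.4, 2.0))

# Voltage and frequency scaled together: 600 MHz down to 200 MHz as well.
print(relative_dynamic_power(1.4, 2.0, 200e6, 600e6))
```

The second call shows why voltage and frequency are usually scaled together: the lower clock rate that a lower voltage forces also reduces power, at the cost of performance.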

Examples

Ø StrongARM SA-1100 takes two supplies
  q VDD is main 3.3V supply.
  q VDDX is 1.5V.
Ø AMD K6-2+
  q 8 frequencies: 200-600 MHz.
  q Voltage: 1.4, 2.0 V.
  q Transition time: 0.4 ms for voltage change.
Ø PowerPC 603
  q Can shut down unused execution units.
  q Cache organized into subarrays to reduce active circuitry.

Intel SpeedStep

Intel Core 2 Duo E6600

Intel Pentium M P states

Linux DVFS Governors
Ø Performance
  q Always set at the max frequency
Ø Powersave
  q Always set at the lowest frequency
Ø Ondemand
  q Automatically adjust the frequency according to CPU usage
Ø Conservative
  q Like ondemand, but in a more conservative way.
Ø Userspace
  q Set at a fixed frequency by the user

Ondemand
Ø Initial implementation in 2.6.9
Ø For all CPUs
  q if (> 80% busy) then P0 (max frequency)
  q if (< 20% busy) then down by 20%
Ø Multiple improvements since 2.6.9
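
The two rules above can be sketched as a single decision step. This is only an illustration of the policy as stated on the slide, not the kernel's implementation, which has changed since 2.6.9. The function name is hypothetical, and rounding "down by 20%" to the lowest available frequency that is not below the 80% target is an assumption.

```python
def ondemand_step(cur_khz, busy_pct, freqs_khz):
    """One sampling step of a simplified ondemand-style policy."""
    max_khz = max(freqs_khz)
    if busy_pct > 80:
        # Busy CPU: jump straight to the maximum frequency (P0).
        return max_khz
    if busy_pct < 20:
        # Mostly idle CPU: aim for 80% of the current frequency, then pick
        # the lowest available frequency that does not fall below that target.
        target = cur_khz * 0.8
        candidates = [f for f in freqs_khz if f >= target]
        return min(candidates, default=max_khz)
    # In between: keep the current frequency.
    return cur_khz


freqs = [2400000, 2133000, 1867000, 1600000]
print(ondemand_step(2400000, 10, freqs))  # steps down one level
print(ondemand_step(1600000, 90, freqs))  # jumps to the maximum
```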

Get & Set CPU Frequency
Ø Get the current frequency:
  q /sys/devices/system/cpu/cpu[X]/cpufreq/scaling_cur_freq
  q Example: 2400000 (2.4GHz)

Ø Frequencies & governors available:
  q /sys/devices/system/cpu/cpu[X]/cpufreq/scaling_available_frequencies
  q Example: 2400000 2133000 1867000 1600000
  q /sys/devices/system/cpu/cpu[X]/cpufreq/scaling_available_governors
  q Example: ondemand userspace performance powersave conservative

Ø Set the frequency:
  q Requires root privilege
  q echo userspace > /sys/devices/system/cpu/cpu[X]/cpufreq/scaling_governor
  q echo 2133000 > /sys/devices/system/cpu/cpu[X]/cpufreq/scaling_setspeed
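
The same sysfs files can be read from a program. The sketch below is one possible way to do it in Python; the helper names and the `root` parameter are hypothetical and exist only so the code can be pointed at a different directory. It assumes a Linux system whose cpufreq driver exposes these files, and reports an error otherwise.

```python
from pathlib import Path

CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_cpufreq(cpu, attr, root=CPUFREQ_ROOT):
    """Return the stripped contents of one cpufreq sysfs attribute."""
    return (Path(root) / f"cpu{cpu}" / "cpufreq" / attr).read_text().strip()


def current_freq_khz(cpu=0, root=CPUFREQ_ROOT):
    """Current frequency in kHz, e.g. 2400000 for 2.4 GHz."""
    return int(read_cpufreq(cpu, "scaling_cur_freq", root))


def available_governors(cpu=0, root=CPUFREQ_ROOT):
    """List of governor names the driver offers for this CPU."""
    return read_cpufreq(cpu, "scaling_available_governors", root).split()


if __name__ == "__main__":
    try:
        print(current_freq_khz(0), "kHz")
        print(available_governors(0))
    except (OSError, ValueError) as exc:
        print("cpufreq sysfs not available:", exc)
```

Writing to `scaling_governor` or `scaling_setspeed` works the same way with `write_text`, but needs root privilege, as noted above.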

Clock Gating

Ø Applicable to clocked digital components
  q Processors, controllers, memories
Ø Stop clock → stop signal propagation in circuits

✔ Short transition time
  q Clock generation is not stopped
  q Only clock distribution is stopped

✘ Relatively high power consumption
  q Clock itself still consumes energy
  q Cannot prevent power leaking

Supply Shutdown
Ø Disconnect parts from power supply when not in use.

✔ General
✔ Save most power

✘ Long transition time

Example: SA-1100
Three power modes:
Ø Run: normal operation.
Ø Idle: stops CPU clock, with I/O logic still powered.
Ø Sleep: shuts off most of chip activity

SA-1100 SLEEP
Ø RUN → SLEEP
  q (30 µs) Flush CPU state (registers) to memory
  q (30 µs) Reset processor and wakeup event
  q (30 µs) Shut down clock

Ø SLEEP → RUN
  q (10 ms) Ramp up power supply
  q (150 ms) Stabilize clock
  q (negligible) CPU boot

Intel Core Duo Processor SV

State  Name                          Vcc (V)  Watt
C0     High Frequency Mode (P0)      1.3      31
C0     Low Frequency Mode (Pn)       1.0
C1     Auto Halt Stop Grant (HFM)             15.8
C1E    Enhanced Halt (LFM)                    4.8
C2     Stop Clock (HFM)                       15.5
C2E    Enhanced Stop Clock (LFM)              4.7
C3     Deep Sleep (HFM)                       10.5
C3E    Enhanced Deep Sleep (LFM)              3.4
C4     Intel Deeper Sleep            0.85     2.2
DC4    Intel Enhanced Deeper Sleep   0.80     1.8

Source: Intel® Core™ Duo Processor 65nm Process Datasheet; Ottawa Linux Symposium, July 19, 2006.

The Mote Revolution: Low Power Wireless Sensor Network Devices, Joseph Polastre, Robert Szewczyk, Cory Sharp, David Culler, Hot Chips 16.

Power Consumption with Wireless NIC

Power
Ø Hardware support
Ø Power management policy
Ø Power manager
Ø Holistic approach

Approaches
Ø Static Power Management
  q Does not depend on activity.
  q Example: user-activated power-down.

Ø Dynamic Power Management
  q Adapt to activity at run time.
  q Example: automatically disabling function units.

Dynamic Power Management

Ø Inherent tradeoff: energy vs. performance
Ø Fundamental premises
  q Non-uniform workload during operation
  q Possible to predict workload with some degree of accuracy

PowerPC 603 Activity
Percentage of time idle for SPEC integer/floating-point:

unit             Specint92  Specfp92
D cache          29%        28%
I cache          29%        17%
load/store       35%        17%
fixed-point      38%        76%
floating-point   99%        30%
system register  89%        97%

Problem Formulations
Ø Minimize energy under performance constraints
  q Real-time applications

Ø Optimize performance under energy/power constraints
  q Battery lifetime (energy)
  q Temperature (power)

Power Down/Up Cost
Ø Going into/out of an inactive mode costs
  q time
  q energy

Ø Must determine if going into an inactive mode is worthwhile.

Ø Model power states with a Power State Machine (PSM)

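SA-1100 Power State Machine

[Figure: power state machine with three states: run ($P_{ON}$ = 400 mW), idle (P = 50 mW), and sleep ($P_{OFF}$ = 0.16 mW). Transition times: run → idle 10 µs, idle → run 10 µs, run → sleep 90 µs, idle → sleep 90 µs, sleep → run 160 ms. $P_{TR} = P_{ON}$.]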

Greedy Policy
Ø Immediately goes to sleep when system becomes idle

Ø Works when transition time is negligible
  q Ex. between IDLE and RUN in SA-1100

Ø Doesn't work when transition time is long!
  q Ex. between SLEEP and RUN/IDLE in SA-1100
  q Need better solutions!

Break-Even Time $T_{BE}$
Ø Minimum idle time required to compensate for the cost of entering an inactive state.

Ø Entering an inactive state is beneficial only if idle time > $T_{BE}$.

Break-Even Time

$P_{TR} \le P_{ON}$

Ø $P_{TR}$: power consumption during transition

Ø $P_{ON}$: power consumption when active

Ø $T_{BE}$ of an inactive state is the total time it takes to enter and leave the state

Ø $T_{BE} = T_{TR} = T_{ON,OFF} + T_{OFF,ON}$

  q $T_{BE}$ = 160 ms + 90 µs for SLEEP in SA-1100


Break-Even Time

$P_{TR} > P_{ON}$

Ø $T_{BE}$ must include additional inactive time to compensate for extra power consumption during transition.

$T_{BE} = T_{TR} + T_{TR} \frac{P_{TR} - P_{ON}}{P_{ON} - P_{OFF}}$

Ø Reduce $T_{BE}$ → save more energy

  q Shorter $T_{TR}$

  q Larger power difference $P_{ON} - P_{OFF}$

  q Lower $P_{TR}$
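
Both break-even cases can be computed with one small function. This is a sketch with a hypothetical function name; times are in seconds and powers in watts. The first call uses the SA-1100 sleep-state numbers from these slides, while the 600 mW transition power in the second call is an invented value used only to exercise the second case.

```python
def break_even_time(t_tr, p_tr, p_on, p_off):
    """Break-even time T_BE of an inactive state.

    t_tr: total transition time (enter + leave)
    p_tr: power during the transition
    p_on: power when active
    p_off: power in the inactive state (must be below p_on)
    """
    if p_tr <= p_on:
        # Transition is no more expensive than running: T_BE = T_TR.
        return t_tr
    # Extra inactive time is needed to pay back the costly transition.
    return t_tr + t_tr * (p_tr - p_on) / (p_on - p_off)


# SA-1100 SLEEP: T_TR = 90 us + 160 ms, P_TR = P_ON = 400 mW, P_OFF = 0.16 mW
t_tr = 90e-6 + 160e-3
print(break_even_time(t_tr, 0.400, 0.400, 0.00016))

# Hypothetical variant: the transition draws 600 mW instead of 400 mW.
print(break_even_time(t_tr, 0.600, 0.400, 0.00016))
```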

Inherent Exploitability
Ø Achievable energy saving depends on workload!
  q Distribution of idle periods

Ø Given an idle period $T_{idle} > T_{BE}$

  q $E_S(T_{idle}) = (T_{idle} - T_{TR})(P_{ON} - P_{OFF}) + T_{TR}(P_{ON} - P_{TR})$

Ø Assumptions
  q No performance penalty.
  q Ideal manager with knowledge of workload in advance.
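
As a worked example of the energy-saving formula, the sketch below evaluates it for a single idle period. The function name is hypothetical, units are seconds, watts, and joules, and the 1-second idle period is an arbitrary illustrative value; the power and transition numbers are the SA-1100 sleep-state figures from these slides.

```python
def energy_saved(t_idle, t_tr, p_tr, p_on, p_off):
    """Energy saved by entering the inactive state for one idle period.

    Valid for t_idle > T_BE under the stated assumptions
    (no performance penalty, ideal manager).
    """
    return (t_idle - t_tr) * (p_on - p_off) + t_tr * (p_on - p_tr)


# SA-1100 SLEEP with a 1 s idle period: about 0.34 J saved.
t_tr = 90e-6 + 160e-3
print(energy_saved(1.0, t_tr, 0.400, 0.400, 0.00016))
```

When the idle period equals the break-even time, the formula evaluates to zero, which is consistent with the definition of $T_{BE}$.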

Inherent Exploitability based on real workload

Time-Power Product
Workload-independent Metric

$C_S = T_{BE} P_{OFF}$

Ø An inactive state with lower $C_S$ may save more energy
Ø Only a crude estimate
  q May not be representative of real power savings
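
For illustration, the metric can be evaluated for the two SA-1100 inactive states. This sketch assumes $P_{TR} = P_{ON}$ for both states, so that $T_{BE} = T_{TR}$, and takes the idle state's break-even time as its 10 µs entry plus 10 µs exit from the power state machine; the function name is hypothetical.

```python
def time_power_product(t_be, p_off):
    """C_S = T_BE * P_OFF, in joules when given seconds and watts."""
    return t_be * p_off


# Idle state: T_BE = 10 us + 10 us, power in state = 50 mW
idle_cs = time_power_product(10e-6 + 10e-6, 0.050)
# Sleep state: T_BE = 90 us + 160 ms, power in state = 0.16 mW
sleep_cs = time_power_product(90e-6 + 160e-3, 0.00016)

print(idle_cs, sleep_cs)
```

Under these assumptions the idle state has the lower $C_S$ even though sleep draws far less power, which illustrates why the metric is only a crude estimate.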

Predictive Techniques

Ø Event of interest: p = {$T_{idle} > T_{BE}$}
  q Predict based on history
Ø Observed event: o
  q Triggers state transition
Ø Objective: predict p based on o

Metrics
Ø Safety: conditional probability Prob(p|o)

  q If an observed event happens → the probability that $T_{idle} > T_{BE}$
  q Ideally, safety = 1.
Ø Efficiency: Prob(o|p)

  q If $T_{idle} > T_{BE}$ → the probability of correctly predicting it.

Ø Overprediction → high performance penalty → poor safety
Ø Underprediction → wastes energy → poor efficiency
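
Both metrics can be estimated from a logged trace. The sketch below assumes each idle period has been recorded as a pair of booleans (o, p): whether the predictor's observed event fired, and whether the idle period really exceeded the break-even time. The function name and the sample trace are hypothetical.

```python
def safety_and_efficiency(samples):
    """Estimate Prob(p|o) and Prob(o|p) from (o, p) boolean pairs."""
    samples = list(samples)
    n_o = sum(1 for o, p in samples if o)
    n_p = sum(1 for o, p in samples if p)
    n_both = sum(1 for o, p in samples if o and p)
    # Report NaN when a conditional probability is undefined.
    safety = n_both / n_o if n_o else float("nan")
    efficiency = n_both / n_p if n_p else float("nan")
    return safety, efficiency


# Invented trace: 3 predictions, 3 long idle periods, 2 of them in common.
trace = [(True, True), (True, False), (False, True),
         (False, False), (True, True)]
print(safety_and_efficiency(trace))
```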

Fixed Timeout Policy

Ø Enter inactive state when system has been idle for $T_{TO}$

  q o: $T_{idle} > T_{TO}$

Ø Wake up in response to activity

Ø Hypothesis: if the system has been idle for $T_{TO}$ → it will continue to be idle for $T_{idle} - T_{TO} > T_{BE}$

$T_{TO}$???

Ø Increasing $T_{TO}$ improves safety, but reduces efficiency.
Ø Highly workload dependent

Ø Karlin's result: $T_{TO} = T_{BE}$ → energy consumption is at most twice the energy consumed under an ideal policy
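
The factor-of-two bound can be checked numerically in a simplified model. The sketch below is not Karlin's proof: it assumes the inactive state draws zero power and that each shutdown costs a fixed energy equal to $P_{ON} \cdot T_{BE}$, which turns the problem into the classic rent-or-buy setting. All names are hypothetical.

```python
def timeout_energy(t_idle, t_to, p_on, e_shutdown):
    """Energy over one idle period under a fixed timeout (P_OFF assumed 0)."""
    if t_idle <= t_to:
        # Timeout never expires: the system stays on for the whole period.
        return p_on * t_idle
    # Stay on until the timeout, then pay the fixed shutdown cost.
    return p_on * t_to + e_shutdown


def ideal_energy(t_idle, p_on, e_shutdown):
    """Energy of an ideal policy that knows t_idle in advance."""
    return min(p_on * t_idle, e_shutdown)


p_on = 0.400              # W
t_be = 90e-6 + 160e-3     # s, SA-1100 SLEEP break-even time
e_shutdown = p_on * t_be  # assumed fixed cost of one shutdown

idle_periods = [0.01, 0.1, t_be, 0.2, 1.0, 10.0]
worst = max(timeout_energy(t, t_be, p_on, e_shutdown)
            / ideal_energy(t, p_on, e_shutdown)
            for t in idle_periods)
print(worst)  # never exceeds 2 in this model
```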

Impact of Timeout Threshold

Impact of Workloads

Critique: Fixed Timeout

Ø How to set timeout threshold?
  q Tradeoff between safety and efficiency
  q Works best when workload traces are available

Ø Fundamental limitations
  q Always waste energy before reaching the timeout threshold
  q Always incur performance penalty for wake up

Possible Improvement
Ø Predictive shutdown
  q shut down immediately when an idle period starts.
  q avoid wasting energy before reaching the timeout threshold.
  q more efficient, less safe.

Ø Predictive wakeup
  q wake up when the predicted idle time expires, even if no new activity has occurred.
  q avoid performance penalty for wakeup.
  q less efficient, safer.

Predictive Shutdown
Threshold-based Policy

Ø Observation: a short active period tends to be followed by a long idle period.
Ø If active period < threshold, the following idle period is predicted to be longer than $T_{BE}$.
Ø What is the right threshold?
  q Workload dependent
  q Requires offline analysis

Threshold-based Predictive Shutdown

Predictive Wakeup
Regression-based Algorithm

Ø Predict the length of an idle period based on
  q preceding active period
  q previous n pairs of idle/active periods
Ø More complicated than fixed timeout
  q Need to maintain history information
Ø Depends on offline analysis and traces to determine the regression function and parameters

Adapt to Workload Changes

Ø Grade n timeout thresholds based on history
  q Use the best one for prediction
  q Use weighted average of n thresholds
Ø Adjust timeout
  q Increase timeout threshold if causing too many shutdowns
  q Decrease timeout threshold if causing too few shutdowns
Ø Stochastic techniques
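
The "adjust timeout" idea can be sketched as a simple feedback rule. Everything specific here is an assumption made for illustration: a shutdown counts as "one too many" when the time actually spent asleep is shorter than the break-even time, an idle period counts as a missed shutdown when it exceeds the break-even time without reaching the timeout, and the threshold is changed by a multiplicative step. The function name, step size, and bounds are hypothetical.

```python
def adapt_timeout(t_to, idle_periods, t_be, step=1.5, t_min=1e-3, t_max=10.0):
    """Return the timeout threshold after each idle period in the trace."""
    history = []
    for t_idle in idle_periods:
        if t_idle > t_to:
            # Timeout expired, so a shutdown happened.
            if t_idle - t_to < t_be:
                # The sleep was too short to pay off: raise the threshold.
                t_to = min(t_to * step, t_max)
        elif t_idle > t_be:
            # No shutdown, although the period was long enough to be
            # worth one: lower the threshold.
            t_to = max(t_to / step, t_min)
        history.append(t_to)
    return history


# Invented trace of idle periods (seconds), starting from a 0.5 s timeout.
print(adapt_timeout(0.5, [0.55, 0.3, 0.3, 2.0, 0.05], t_be=0.16))
```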

Critiques: History-based Predictors

Ø Depend on short-term correlation between past & future
  q Holds in many workloads
  q Fails when the correlation is weak

Ø Workloads in many embedded systems are more predictable than those of PCs
  q Workload (e.g., periodic tasks) known a priori
  q Specialized application

ESSAT
Efficient Sleep Scheduling based on Application Timing

Ø Reduce radio power consumption by exploiting the timing properties of periodic queries in sensor networks
Ø Sleep scheduling incurs low delay penalty

O. Chipara, C. Lu, and G.-C. Roman, Efficient Power Management based on Application Timing Semantics for Wireless Sensor Networks, ICDCS 2005.

Power
Ø Hardware support
Ø Power management policy
Ø Power manager
Ø Holistic approach

Power Manager
Ø Usually implemented in software (OS) for flexibility
Ø Hardware and software co-design
  q Software implements policy
  q Hardware implements power saving mechanisms
Ø Need standard interfaces to deal with hardware diversity
  q Different vendors
  q Different devices: processor, sensor, controller …

ACPI
Advanced Configuration and Power Interface
Open standard for power management services. http://www.acpi.info/

[Figure: ACPI software stack, top to bottom: applications; OS kernel with power management and device drivers; ACPI; BIOS; hardware platform (devices, processor, chipset).]

ACPI System Power States

Used as contract between hardware and OS vendors

ACPI Global Power States
Ø G3: mechanical off, no power consumption
Ø G2: soft off, restore requires full OS reboot
Ø G1: sleeping state
  q S1: low wake-up latency with no loss of context
  q S2: low latency with loss of CPU/cache state
  q S3: low latency with loss of all state except memory
  q S4: lowest-power state with all devices off
Ø G0: working state

Intel Core i7 C States

Intel Pentium M P states

Device Power States
Ø Device power state is invisible to the user.
  q Devices may be inactive when the system is in the working state.
Ø Each device may be controlled by a separate power management policy.

Power
Ø Hardware support
Ø Power management policy
Ø Power manager
Ø Holistic approach

Holistic View of Power Consumption

Ø Instruction execution (CPU)
Ø Cache (instruction, data)
Ø Main memory
Ø Other: non-volatile memory, display, network interface, I/O devices

Mote
• System view when switching from sleep to active

[Figure: wake-up timeline; surviving labels: "2.5 ms" and "1–10 ms typical".]

Source: Joseph Polastre, Robert Szewczyk, Cory Sharp, David Culler. The Mote Revolution: Low Power Wireless Sensor Network Devices. In Hot Chips 16, 2004.

Sources of Energy Consumption

Relative energy per operation (Catthoor):
  q memory transfer: 33
  q external I/O: 10
  q SRAM write: 9
  q SRAM read: 4.4
  q multiply: 3.6
  q add: 1
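
These relative costs make it easy to compare where operands come from. The sketch below totals the relative energy of an invented operation mix: 100 multiplies and 100 adds whose 200 operands are fetched either by memory transfers or by SRAM reads. The function name and the operation counts are hypothetical; only the per-operation costs come from the list above.

```python
# Relative energy per operation, with "add" as the unit.
RELATIVE_ENERGY = {
    "memory transfer": 33,
    "external I/O": 10,
    "SRAM write": 9,
    "SRAM read": 4.4,
    "multiply": 3.6,
    "add": 1,
}


def relative_energy(op_counts):
    """Total relative energy for a mapping of operation name to count."""
    return sum(RELATIVE_ENERGY[op] * n for op, n in op_counts.items())


from_memory = relative_energy(
    {"multiply": 100, "add": 100, "memory transfer": 200})
from_sram = relative_energy(
    {"multiply": 100, "add": 100, "SRAM read": 200})

print(from_memory, from_sram)  # roughly 7060 versus 1340
```

In this mix the arithmetic itself accounts for only 460 units, so the choice of memory level dominates the total.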

Optimize Memory System
Ø Different instructions → different energy consumption

Ø Energy: register << cache (SRAM) << memory (DRAM)

Ø Optimizing memory system → significant energy saving

Cache Behavior

Sweet spot in cache size:
Ø Too small: waste energy on memory accesses;
Ø Too large: cache itself burns too much power.

Impacts of Cache Size

Optimizations

Ø Reduce memory footprint
  q Reduce code size
  q Analyze/test footprint to find right size: stack, heap…
Ø Find correct cache size
  q Analyze cache behavior (size of )
Ø Minimize memory and cache access
  q Use registers efficiently → less cache access
  q Identify and eliminate cache conflicts → less memory access
Ø Better performance → more idle time!

Reading

Ø Textbook 3.7
Ø Required: Sections I, II, III.A, III.B, IV of L. Benini, A. Bogliolo and G. De Micheli, A Survey of Design Techniques for System-Level Dynamic Power Management, IEEE Transactions on VLSI, pp. 299-316, June 2000.
Ø Interesting: Intel Inside…Your Smartphone http://spectrum.ieee.org/semiconductors/processors/intel-insideyour-smartphone
