Automatic Performance Setting for Dynamic Voltage Scaling

Krisztián Flautner [email protected]

Steve Reinhardt Trevor Mudge

Overview

• A mechanism for quantifying the user experience.
  – Metric: response time.
  – Automatic, no user program modifications required.
  – Run-time feedback to the kernel.

• Guiding performance setting of DVS processors.
  – For interactive episodes: slow down the processor to save energy when response times are fast enough.
  – For periodic events: track periodicity, utilization and inter-task communication to establish the necessary performance.

• Simulated and experimental results.

Dynamic Voltage Scaling

Execute only as fast as necessary to meet deadlines. Running fast and idling is not energy efficient.

Power = Capacitance • voltage² • frequency

• Voltage is proportional to the frequency.
• Reduce f and v to match performance demands.
• Reduced frequency implies longer execution time.

Energy ~ voltage²
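The relations above can be sketched numerically. This is a minimal illustration that assumes only dynamic (switching) power and ignores leakage; the function names, the capacitance value, and the cycle count are placeholders chosen for the example.

```python
def dynamic_power(capacitance, voltage, frequency_hz):
    """Dynamic power: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency_hz


def task_energy(capacitance, voltage, frequency_hz, cycles):
    """Energy to execute a fixed number of cycles: power times execution time."""
    execution_time = cycles / frequency_hz
    return dynamic_power(capacitance, voltage, frequency_hz) * execution_time


# Hypothetical values: capacitance and cycle count are placeholders.
C = 1e-9
CYCLES = 100_000_000

# Two illustrative operating points (voltage in volts, frequency in Hz).
fast = task_energy(C, voltage=1.75, frequency_hz=1000e6, cycles=CYCLES)
slow = task_energy(C, voltage=0.75, frequency_hz=150e6, cycles=CYCLES)

# Frequency cancels out of the energy for a fixed amount of work,
# so the ratio depends only on the voltages: (0.75 / 1.75) ** 2.
energy_ratio = slow / fast
```

Because the execution time grows as the frequency drops, the frequency term cancels and the energy for a fixed amount of work scales with voltage² alone, which is why running fast and then idling costs more than stretching the work.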

Why bother?

[Figure: maximum power (Watts, log scale from 1 to 100) versus process generation (1.5µ to 0.13µ) for the 386, 486, Pentium, Pentium MMX, Pentium Pro and Pentium II processors.]

Higher performance = increased power consumption.

Power Density!

[Figure: power density (Watts/cm², log scale from 1 to 1000) versus process generation (1.5µ to 0.1µ), with hot plate, nuclear reactor, rocket nozzle and Sun's surface marked as reference levels. Source: Intel.]

Small performance reduction = big energy savings

[Figure: voltage (V, 0 to 2) and energy factor (0 to 1) versus frequency (0 to 1200 MHz). Graph based on Intel XScale data.]

20% performance reduction = 32% energy reduction
40% performance reduction = 55% energy reduction

Processors supporting DVS

                  lpARM     Intel SA-1100   Transmeta Crusoe 5600   Intel XScale   Intel XScale Demo
Min. frequency    8MHz      59MHz           500MHz                  150MHz         150MHz
Min. voltage      1.1V      0.79V           1.2V                    0.75V          0.75V
Min. power        1.8mW     106mW           ~1W                     40mW           40mW
Max. frequency    100MHz    251MHz          700MHz                  800MHz         1000MHz
Max. voltage      3.3V      1.65V           1.6V                    1.5V           1.75V
Max. power        220mW     964mW           ~2W                     900mW          1.45W
Process (µ)       0.6       0.35            0.18                    0.18           0.18
Max/min energy    9         4.4             1.8                     4              5.4
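The max/min energy row is consistent with energy scaling as voltage². A short check, as a sketch: the voltage pairs are read from the table, the assignment of each pair to a processor is inferred from the column order, and the function name is illustrative.

```python
# (min voltage, max voltage) in volts, as read from the table.
VOLTAGE_RANGES = {
    "lpARM": (1.1, 3.3),
    "Intel SA-1100": (0.79, 1.65),
    "Transmeta Crusoe 5600": (1.2, 1.6),
    "Intel XScale": (0.75, 1.5),
    "Intel XScale Demo": (0.75, 1.75),
}


def max_min_energy_ratio(v_min, v_max):
    """Ratio of energy per operation at max voltage to that at min voltage."""
    return (v_max / v_min) ** 2


# Rounded to one decimal place to match the precision of the table.
ratios = {
    name: round(max_min_energy_ratio(v_min, v_max), 1)
    for name, (v_min, v_max) in VOLTAGE_RANGES.items()
}
```

The rounded ratios come out to 9.0, 4.4, 1.8, 4.0 and 5.4, matching the table row.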

Some recent desktop processors

               Intel Pentium IV   Intel Pentium III   AMD Model 4        MPC 7450
Core           1.4GHz @ 1.7V      500MHz @ 1.35V      650MHz @ 1.75V     533MHz @ 1.8V
                                  733MHz @ 1.65V      1.2GHz @ 1.75V     667MHz @ 1.8V
I/O            400MHz             100MHz, 133MHz      200MHz, 266MHz     133MHz
                                  3.3V                1.6V               1.8V-2.5V
Process (µ)    0.18               0.18                0.18               0.18
Max. power     66.3W              12W                 38W                17W
                                  19.1W               66W                19.1W

Performance setting algorithms

• Programmer specified
  – Works well but requires explicit specification of deadlines.
• Interval based algorithms
  – Use the ratio of idle to busy time to guide DVS.
  – Only work well if processor utilization is regular.
  – No service quality guarantees.
• Ours: episode classification based
  – Find important execution episodes, predict their performance.
  – Works with existing user programs.
  – Works well with irregular workloads.
  – Uses information in kernel to derive deadlines automatically.
  – Impact on response time is automatically quantified.
    • Performance can be adapted to the user’s preference.

Episode classification

• Interactive episodes
  – When the user is waiting for the computer to respond.

• Periodic episodes
  – Producer (e.g. MP3 player).
  – Consumer (e.g. sound daemon).

A utilization trace

Each horizontal quantum is a millisecond, height corresponds to the utilization in that quantum.

Episode classification

Interactive (Acrobat Reader), Producer (MP3 playback), and Consumer (esd sound daemon) episodes.

Mouse movement

X server updates screen every ~10ms. Update takes ~0.25ms.

Interactive episodes

Interactive episodes can include idle time

Waiting for data from the network during a run of . Page rendering starts after 250ms.

Finding interactive episodes

• One way: mouse click indicates start, idle time indicates end.
  – Inaccurate, latency in finding the end of the episode.

• Our approach: track inter-task communication.
  – Start of an interactive episode:
    • X server sends a message to another task.
  – During interactive episode:
    • Keep track of communicating tasks (episode’s task set).
    • Compute desired metrics.
  – Conditions for ending the episode (applied to tasks in task set):
    • No tasks are executing.
    • Data written by the tasks have been consumed.
    • No task was preempted the last time it ran.
    • No tasks are blocked on I/O.
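The four end-of-episode conditions can be read as a single predicate over the episode's task set. The sketch below is an illustration only: `TaskState` and its fields are hypothetical names for state that a kernel module would have to track, not the actual data structures.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical per-task bookkeeping for one member of the task set."""
    executing: bool = False
    unconsumed_bytes: int = 0        # data written by this task and not yet read
    preempted_last_run: bool = False
    blocked_on_io: bool = False


def episode_has_ended(task_set):
    """True when every task in the task set satisfies all four conditions."""
    return all(
        not task.executing
        and task.unconsumed_bytes == 0
        and not task.preempted_last_run
        and not task.blocked_on_io
        for task in task_set
    )
```

A single task that is still blocked on I/O, or whose output has not been consumed, keeps the whole episode open, which is how an episode can span idle time.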

Characteristics of Interactive Episodes

• Faster is not necessarily better.
  – Human perception has finite resolution.
  – Perception threshold is ~50ms.
  – The goal is to run fast enough to meet the perception threshold; there is no point in running any faster.

• Many interactive episodes are already fast enough.
• More will be imperceptible in the near future.
  – Using a 200ms perception threshold today estimates the work that will be done in 50ms 3 years from now.

Slow down the processor!
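The 200ms-to-50ms estimate follows if available performance doubles roughly every 18 months. That doubling period is an assumption made for this sketch, as are the function and parameter names.

```python
def equivalent_threshold_ms(threshold_today_ms, years_ahead, doubling_period_years=1.5):
    """Time needed in the future for work that takes threshold_today_ms now.

    Assumes performance doubles every doubling_period_years (an assumption).
    """
    speedup = 2 ** (years_ahead / doubling_period_years)
    return threshold_today_ms / speedup


future_ms = equivalent_threshold_ms(200, years_ahead=3)
```

Under that assumption, three years give a 4x speedup, so work that fills 200ms today fits in 50ms.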

Time above the perception threshold

[Figure: time above the perception threshold (0% to 100%) versus perception threshold (50ms to 300ms) for Acrobat Reader, FrameMaker, Ghostview, GIMP and Netscape.]

Time above the perception threshold is given as a percentage of time spent in all interactive episodes.

The key: performance-setting algorithm

• Use episode detection and classification.
  – Interactive episodes.
  – Periodic episodes (producer and consumer).
• Performance-setting on a per episode basis.
• Stretch episodes to their deadlines.
  – Interactive episode: perception threshold.
  – Stretch producer to consumer.

No modification of existing programs needed. Works with irregular processor utilization and multiprogramming.

Cumulative interactive episode length distribution

[Figure: FrameMaker. Cumulative number and cumulative time of interactive episodes (0% to 100%) versus episode length (sec, log scale from 1e-05 to 1). Markers at 10ms and 50ms separate the region where the minimum performance level is sufficient from the region that needs maximum performance.]

Performance-setting strategy for interactive episodes

• Predict the performance factor that would be correct most of the time (not for most events).
  – Based on past optimal performance factors.

• Limit worst case impact on response time.
  – Run at full performance after PanicThreshold is reached.

Performance-setting for interactive episodes

At the beginning of the episode
• Wait 5ms before transition to ignore short episodes.
• Switch to predicted performance level.

During the episode
• If episode duration reaches PanicThreshold, switch to maximum performance.

At the end of the episode
• Estimate full performance episode duration.
• Compute optimum performance level for past episode.
• Compute new prediction based on optimum settings.

PanicThreshold = PerceptionThreshold(1 + PerformanceFactor)

Predicted PerformanceFactor is the average of past optimum settings, weighted by the corresponding episode lengths.
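The prediction and the PanicThreshold formula can be sketched as a small bookkeeping class. Several details are assumptions made for illustration: the optimum factor is taken to be the full-performance duration divided by the perception threshold (execution time assumed inversely proportional to the performance factor), the weights are the full-performance durations, the floor of 0.15 stands in for a minimum performance level, and with no history the prediction defaults to full performance. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InteractivePredictor:
    perception_threshold: float      # seconds, e.g. 0.050
    min_factor: float = 0.15         # assumed lowest performance factor
    weighted_sum: float = 0.0        # sum of optimum_factor * episode_length
    total_length: float = 0.0        # sum of episode lengths

    def predicted_factor(self):
        """Length-weighted average of past optimum performance factors."""
        if self.total_length == 0.0:
            return 1.0               # assumption: no history means full speed
        return self.weighted_sum / self.total_length

    def panic_threshold(self):
        """PanicThreshold = PerceptionThreshold * (1 + PerformanceFactor)."""
        return self.perception_threshold * (1.0 + self.predicted_factor())

    def record_episode(self, full_speed_duration):
        """Update the history with one finished episode; return its optimum factor."""
        optimum = full_speed_duration / self.perception_threshold
        optimum = min(1.0, max(self.min_factor, optimum))
        self.weighted_sum += optimum * full_speed_duration
        self.total_length += full_speed_duration
        return optimum


predictor = InteractivePredictor(perception_threshold=0.050)
first = predictor.record_episode(0.025)   # could have run at half speed
second = predictor.record_episode(0.100)  # needed full performance
```

Weighting by episode length makes long episodes dominate the prediction, which matches the goal of being correct most of the time rather than for most events.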

Performance-setting algorithm

Periodic activity detected
• Enter period-sampling mode.
• Switch to maximum performance.
• Establish base performance level.
• Exit period-sampling mode.

Start of interactive episode
• If not in period-sampling mode, apply interactive episode performance-setting policy.

End of interactive episode
• Update interactive episode statistics.
• Switch to base performance level, if there is periodic activity on the machine.
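The three event handlers can be sketched as a small state machine. This is an illustration under assumptions: how the base level is measured during period sampling is not described here, so it is passed in as a parameter, and all class and method names are hypothetical.

```python
class PerformanceSetter:
    """Tracks the current performance level (1.0 = maximum performance)."""

    def __init__(self, predicted_factor=1.0):
        self.level = 1.0
        self.period_sampling = False
        self.base_level = None                  # None: no periodic activity known
        self.predicted_factor = predicted_factor

    def on_periodic_activity_detected(self):
        # Enter period-sampling mode and run at maximum performance.
        self.period_sampling = True
        self.level = 1.0

    def on_base_level_established(self, base_level):
        # Assumption: the sampled base level is supplied by the caller.
        self.base_level = base_level
        self.period_sampling = False

    def on_interactive_start(self):
        # The interactive policy applies only outside period-sampling mode.
        if not self.period_sampling:
            self.level = self.predicted_factor

    def on_interactive_end(self, new_prediction):
        # Update statistics, then fall back to the base level if one exists.
        self.predicted_factor = new_prediction
        if self.base_level is not None:
            self.level = self.base_level
```

Keeping the interactive policy out of period-sampling mode means the sampling run always happens at maximum performance, so the measured base level is not distorted by a slowed-down interactive episode.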

Performance-setting during the Acrobat Reader benchmark (200ms p.t.)

[Figure: performance factor (0 to 1) versus time (0 to 18 sec). Transitions to maximum performance level are due to reaching the PanicThreshold.]

Performance-setting during the Acrobat Reader + MP3 benchmark (200ms p.t.)

[Figure: performance factor (0 to 1) versus time (0 to 20 sec). Annotations: full performance for periodic activity; transitions due to PanicThreshold.]

Hardware assumptions

Minimum performance       150MHz @ 0.75V
Maximum performance       1000MHz @ 1.75V
PLL resynch time          0.02ms (stalls execution)
Voltage transition time   1ms

Assumptions based on Intel XScale.

We assume that the processor switches to sleep mode when it is not executing an episode.

Energy factors (no MP3)

[Figure: energy factor (0% to 100%) versus perception threshold (50ms to 300ms) for Acroread, FrameMaker, Ghostview, GIMP, Netscape and Xemacs.]

Energy factors with MP3 playback

[Figure: energy factor (0% to 100%) versus perception threshold (50ms to 300ms) for Acroread, FrameMaker, Ghostview, GIMP, Netscape and Xemacs.]

Changes in cumulative episode lengths as the result of performance scaling (Xemacs 50ms p.t.)

[Figure: cumulative percentage of time (0% to 100%) versus episode length (sec, log scale from 1e-05 to 1), before and after performance scaling, with markers at 10ms and 50ms.]

Vertigo

• A DVS implementation for the Linux 2.4 kernel.
• Currently runs on Crusoe.
  – Test machine: PictureBook (PCG-C1VN) using TM5600 processor (300MHz-600MHz).

Goals:
• Robust implementation.
• Evaluate our algorithms on computers with DVS.
• Contrast with conventional DVS algorithm (LongRun).

Vertigo implementation

User processes
• Monitored through kernel hooks.
• System calls
• Task switch/create/exit

Vertigod daemon
• User-mode process.
• Implements DVS policy.
• Can specify hints.

Vertigo Module (attached to the kernel through kernel hooks)
• Episode detection & tracking.
• Comm. with policy daemon.
• Event tracing.
• /proc interface.

• Some kernel modification required (~20 lines):
  – Socket, inode, task_struct data structures, task create/exit notification.
• Episode detection done in kernel module.
• System calls dynamically patched through syscall table.

Vertigo implementation issues

• Need millisecond resolution timer interrupts.
  – Linux has 10ms resolution.
  – Generate extra “fake” interrupts from kernel hooks.
• Need constant-rate timestamp counter.
  – Always count at peak frequency rate, even when asleep.
  – does this.
  – Intel XScale does not. Need to query external clock.
• Policy implemented in user-mode process.
  – Flexible, can do floating point arithmetic.
  – Communication cost very platform dependent.
    • Pentium II: ~6,000 cycles, Crusoe: >60,000 cycles.
  – API designed to minimize communication.
  – Move communication off the critical path (to after critical episodes).

Vertigo vs. LongRun

• LongRun: implemented as part of the processor.
  – Interval based algorithm (guided by busy vs. idle time).
  – Min. and max. range is controllable in software.

• Vertigo: implemented in OS kernel.
  – Classification based algorithm.
  – Distinguishes important from unimportant parts of execution.
  – Takes the quality of the user experience into account.

• Qualitative comparison on following graphs.
  – The two runs of the benchmarks are close but not identical.
    • Human repeated the runs of the benchmark.
  – Transitions to sleep are not shown.
  – Same perceived interactive performance.

No user activity

[Figure: two panels, LongRun and Vertigo, each showing performance level versus time (s).]

Frequency range of the TM5600 processor: 100% = 600MHz @ 1.6V, 50% = 300MHz @ 1.3V.

Max. energy savings that should be expected on this processor is ~34%.
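The ~34% figure is consistent with energy scaling as voltage² between the two TM5600 operating points. A quick check, as a sketch that ignores leakage and any non-processor power (the function name is illustrative):

```python
def max_energy_savings(v_min, v_max):
    """Fraction of energy saved per unit of work when running at v_min instead of v_max."""
    return 1.0 - (v_min / v_max) ** 2


# TM5600 operating points: 300MHz @ 1.3V and 600MHz @ 1.6V.
savings = max_energy_savings(1.3, 1.6)
```

The result is just under 0.34, in line with the stated bound.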

Emacs

[Figure: two panels, LongRun and Vertigo, each showing performance level versus time (s).]

Acrobat Reader

[Figure: two panels, LongRun and Vertigo, each showing performance level versus time (s).]

Acrobat Reader with sleep transitions

[Figure: two panels, LongRun and Vertigo, each showing performance level versus time (s). Annotations: frequent transitions to/from sleep mode; longer durations without sleeping.]

Desired improvements

• Processor parameters are good enough.
  – Faster voltage transitions would help a little.
  – As peak performance gets higher, lower minimum performance is desirable.

• More sophisticated prediction algorithms.
  – Distinguish between episode instances, not just episode types.

• Larger performance range for DVS processor.
  – Puts more pressure on performance-setting algorithm.
  – More opportunity for energy savings.

Conclusions

• Many interactive episodes are already fast enough.
  – More will be fast enough in the near future.
  – Use Dynamic Voltage Scaling to save energy.

• Episode classification based on inter-task communication.
  – Fast, accurate, no user program modifications required.

• Performance-setting based on episode classification.
  – Works well with multiprogramming, irregular processor utilization.
  – Ensures high quality interactive performance.
  – Significant energy savings (10%-80%).

Future work

• Evaluate our algorithms on real hardware.
  – Processors are slowly becoming available.
  – Impact on interactive performance.

• An API to specify episodes.
  – Light-weight: specify hints, not complete information.
  – Works in concert with existing detection mechanism.

• Apply episode detection to other problems.
  – Scheduler: can real-time deadlines be detected automatically?

fin.

Response time

The time it takes for the computer to respond to user-initiated events.

• Faster is not always better.
  – Fundamental limit to what is perceptible to humans.
    • Movies: 20-30 frames per second.
    • Perceptual causality: 50ms-100ms.
    • Dragging objects on screen: 200ms.
    • Non-continuous operation: 1-2sec.

The goal is to run fast enough to meet the perception threshold; there is no point in running any faster.

The performance gap

[Figure: available performance and desired performance (log scale from 1 to 100000) versus time (0 to 9 years). Annotations: available performance starts accommodating requirements (A); all performance requirements are met (B); slowest available performance exceeds minimum requirements (C); available performance is higher than required (D).]

Cumulative interactive episode length distribution

[Figure: Xemacs. Cumulative number and cumulative time of interactive episodes (0% to 100%) versus episode length (sec, log scale from 1e-05 to 1). Markers at 10ms and 50ms separate the region where the minimum performance level is sufficient from the region that needs maximum performance.]

Communication between tasks

[Figure: two diagrams of communication between numbered tasks (757, 778, 889, 895, 2088, 2090) scheduled on CPU 0 and CPU 1, with edges labeled R or W.]

Producer and consumer episodes

[Diagram: MP3 player, sound daemon, HW sound device.]

• Example: MP3 playback through esd sound daemon.
• Monitor communications to/from sound daemon.
• Distance between producer and consumer episodes determines necessary performance level.
