Lecture Organization
• Lecture 1: Software-level Power-Aware Computing
  – Introduction to low-power systems
  – Low-power binary encoding
  – Power-aware compiler techniques
• Lectures 2 & 3: Dynamic Voltage Scaling (DVS) techniques
  – OS-level DVS: Inter-task DVS
  – Compiler-level DVS: Intra-task DVS
  – Application-level DVS
• Lecture 4
  – Software power estimation & optimization
  – Low-power techniques for multiprocessor systems
  – Leakage reduction techniques

2 Low Power SW.2 J. Kim/SNU

Voltage, Frequency & Energy
(figure: energy as a function of supply voltage, 0.5~2.5 V)

Basic Idea of DVS
E ∝ Ncycle · VDD², with the deadline at time 25:
(a) No power-down: 12.5×10^8 cycles @ 50 MHz, 5.0 V → 31.25 J (busy for the whole interval)
(b) Power-down: 5×10^8 cycles @ 50 MHz, 5.0 V → 12.5 J (done at time 10, idle until 25)
(c) Dynamic voltage scaling: 5×10^8 cycles @ 20 MHz, 2.0 V → 2.0 J (finishes exactly at the deadline)
→ Slow and steady wins the race!
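The three scenarios follow directly from E ∝ Ncycle · VDD². A quick numeric check; the 1e-9 proportionality constant is an assumption chosen only so the joule figures match the slide:

```c
#include <assert.h>
#include <math.h>

/* Energy model from the slide: E proportional to Ncycle * VDD^2.
 * The 1e-9 J/(cycle*V^2) scale factor is assumed, not from the slide. */
double energy_joules(double ncycles, double vdd)
{
    return 1e-9 * ncycles * vdd * vdd;
}
```

Scenario (c) executes 2.5× fewer cycles than (a) at 2.5× lower voltage, so energy drops by a factor of 2.5 × 2.5² = 15.625, i.e. 31.25 J → 2 J.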

Key Issues for Successful DVS
• Efficient detection of slack/idle intervals
• Efficient voltage scaling policy for slack intervals

Commercial DVS Processors
• Transmeta Crusoe (LongRun)
• AMD K6-2+ (PowerNow! technology)
• Intel SpeedStep
• Intel XScale

(figure: a slack interval on the time line; the two questions are how to detect it and how to scale the voltage during it)


Voltage Scaling Processors
• Commercial
  – Transmeta Crusoe (LongRun): 200~700 MHz, 1.1~1.65 V; scaling time < 300 μs
  – AMD Mobile K6 (PowerNow!): 192~588 MHz, 0.9~2.0 V; scaling time 200 μs
  – Intel PXA250 (XScale): 100~400 MHz, 0.85~1.3 V; scaling time 500 μs
• Academic
  – UC Berkeley (ARM8): 5~80 MHz, 1.2~3.8 V; scaling time 520 μs; scaling power 130 μJ
  – LART (StrongARM): 59~251 MHz, 0.79~1.65 V; 59↔251 MHz in 140 μs; 0.79→1.65 V in 40 μs, 1.65→0.79 V in 5.5 ms

DVS Support in PXA250
• Uses two registers in the PXA250 XScale core:
  – CCCR (Core Clock Configuration Register): specifies the memory clock & core clock
  – CCLKCFG (Core Clock Configuration register, CP14 register 6): set the FCS (Frequency Change Sequence) bit to change the clock speed
    • Bit layout: bits 31..2 reserved, bit 1 FCS, bit 0 TURBO; the clock changes when the FCS bit = 1

CCCR Setting Example
• CCCR is memory-mapped at 0x41300000, with multiplier fields L, M, and N
(table: for L = 1..5 and the available M/N settings, the resulting memory/core clock frequencies range from 99.5 MHz at 0.85 V up to about 398.9 MHz at 1.3 V)

Voltage Scaling Code

```c
#include <stdio.h>
#include <string.h>

/* Writes the CCCR (memory-mapped at 0x41300000) and the CCLKCFG
 * coprocessor register to change the PXA250 clock speed.
 * CP14_WRITE_CCLKCFG is a platform-provided macro wrapping the
 * CP14 register-6 write; direct physical-address accesses assume
 * a bare-metal or kernel context. */
void change_clock_speed(int speed)
{
    int settings[20] = { 0, 0x121, 0x122, 0x123, 0x124, 0x125, 0x1a2,
                         0x141, 0x1a4, 0x142, 0x1a5, 0x143, 0x144, 0x145 };
    int cccr_val = 0x121, clkcfg_val = 2;

    cccr_val = settings[speed];
    switch (speed) {
    case 6:  clkcfg_val = 3; break;   /* FCS + TURBO */
    case 8:  clkcfg_val = 3; break;
    case 10: clkcfg_val = 3; break;
    default: clkcfg_val = 2; break;   /* FCS only */
    }
    memcpy((void *)(0x40000000 + 0x1300000), &cccr_val, 4);  /* CCCR */
    CP14_WRITE_CCLKCFG(clkcfg_val);
}

/* Reads the OS timer counter (memory-mapped at 0x40A00010). */
int get_os_time(void)
{
    int ostime;
    memcpy(&ostime, (void *)(0x40000000 + 0xa00010), 4);
    return ostime;
}

int main(void)
{
    int i, bb, cc, j, k;

    for (k = 1; k < 13; k++) {        /* sweep the speed settings */
        change_clock_speed(k);
        bb = get_os_time();
        for (i = 0; i < 10000; i++) j = 10;   /* fixed busy loop */
        cc = get_os_time();
        printf("%d : %d\n", k, cc - bb);      /* elapsed timer ticks */
    }
    return 0;
}
```


Voltage Scaling in Linux / ARM IEM Demo
(figure: SVS kernel module; the DVS scheduler calls setScaledSpeed(), which wakes a kernel thread; the kernel thread calls setNewVoltage() to adjust the voltage, then sleeps. The voltage value is written through the device driver via ltc1663_i2c_write_data() (2~3 ms), and the LTC1663 DAC regulates the CPU voltage.)

Successful Low Power S/W Techniques
1. Understand workload variations of your target
2. Devise efficient ways to detect them
3. Devise efficient ways to utilize the detected workload variations using available H/W supports

Roadmap
• DVS in Non-Real-Time Systems
• DVS in Real-Time Systems
  – Compiler-level DVS: Intra-task DVS
  – OS-level DVS: Inter-task DVS
  – Application-level DVS: MPEG-decoder implementation
  – Algorithm-level DVS: low-power convolution


Non-Real-Time Jobs
• No timing constraints
• No periodic executions
• Unknown WCET
→ It is hard to predict the future workload!!

DVS for Non-Real-Time Jobs
• Basic approach: predict the workload based on history information
• Usually based on some variation of interval-based scheduling:
  – PAST, FLAT
  – LONG_SHORT, AGED_AVERAGE
  – CYCLE, PATTERN, PEAK

Key Question
How can we predict the future workload?
• Based on long-term history: hard to adapt quickly to a changed workload
• Based on short-term history: too many clock/voltage changes

PAST
• Look at a fixed window into the past
• Assume the next window will be like the previous one
• If the past window was
  – mostly busy ⇒ increase speed
  – mostly idle ⇒ decrease speed
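A minimal sketch of the PAST policy; the 0.7/0.5 busy thresholds and the 0.2 speed step are illustrative assumptions, not values from the slide:

```c
#include <assert.h>

/* PAST: assume the next window will look like the last one.
 * Speeds are normalized to [0,1]; thresholds and step size are
 * illustrative assumptions. */
double past_next_speed(double cur_speed, double busy, double window)
{
    double util = busy / window;            /* utilization of past window */
    if (util > 0.7 && cur_speed < 1.0)      /* mostly busy: speed up */
        cur_speed += 0.2;
    else if (util < 0.5 && cur_speed > 0.2) /* mostly idle: slow down */
        cur_speed -= 0.2;
    return cur_speed > 1.0 ? 1.0 : cur_speed;
}
```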


Example: PAST
  Utilization = busy time / window size
(figure: the past window had low utilization, so PAST assumes the coming window will too and decreases the speed)

FLAT
• Try to smooth the speed to a global average
• Make the utilization of the next window equal to a target value
• Set the speed fast enough to complete the predicted new work being pushed into the coming window

Example: FLAT
(figure: with target utilization 0.7, the speed is increased/decreased so that the next window's utilization becomes 0.7)

LONG-SHORT
• Look at the last 12 windows
  – Short-term past: the 3 most recent windows
  – Long-term past: the remaining 9 windows
• Workload prediction: the utilization of the next window will be a weighted average of these 12 windows' utilizations


Example: LONG-SHORT
  utilization = # cycles of busy interval / window size
  Window utilizations (windows 1..12): 0, .3, .5, 1, 1, 1, .8, .5, .3, .1, 0, 0
  Predicted utilization = (0 + .3 + .5 + 1 + 1 + 1 + .8 + .5 + .3 + 4·(.1 + 0 + 0)) / (9 + 4·3) = 0.276
  fclk = 0.276 × fmax

AGED-AVERAGE
• Employs an exponential-smoothing method
• Workload prediction: the utilization of the next window will be a weighted average of all previous windows' utilizations, with geometrically decreasing weights
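The weighted average above (short-term windows weighted 4×, long-term 1×) can be sketched as:

```c
#include <assert.h>
#include <math.h>

/* LONG-SHORT: predict the next window's utilization as a weighted
 * average of the last 12 windows. u[0..8] are the older (long-term)
 * windows with weight 1; u[9..11] are the 3 most recent (short-term)
 * windows with weight 4, as in the slide's example. */
double long_short_predict(const double u[12])
{
    double num = 0.0, den = 0.0;
    for (int i = 0; i < 12; i++) {
        double w = (i >= 9) ? 4.0 : 1.0;
        num += w * u[i];
        den += w;
    }
    return num / den;
}
```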

Example: AGED_AVERAGE
  utilization = # cycles of busy interval / window size
  Window utilizations (windows 1..12): 0, .3, .5, 1, 1, 1, .8, .5, .3, .1, 0, 0
  average = (1/3)·0 + (2/9)·0 + (4/27)·(0.1) + (8/81)·(0.3) + ···   (most recent window first)
  fclk = average × fmax

CYCLE
• Workload prediction: examine the last 16 windows
  – Does there exist a cycle of length X?
  – If so, predict by extending this cycle
  – Otherwise, use the FLAT algorithm
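Exponential smoothing with the slide's geometric weights (each older window's weight is multiplied by 2/3, giving the 1/3, 2/9, 4/27, ... series) might look like:

```c
#include <assert.h>
#include <math.h>

/* AGED_AVERAGE: weighted average of past windows with geometrically
 * decaying weights w_k = (1/3)*(2/3)^k, k = 0 for the most recent
 * window (u[n-1]). The weights sum to 1 as n grows. */
double aged_average_predict(const double *u, int n)
{
    double avg = 0.0, w = 1.0 / 3.0;
    for (int k = 0; k < n; k++) {
        avg += w * u[n - 1 - k];
        w *= 2.0 / 3.0;
    }
    return avg;
}
```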


Example: CYCLE
  utilization = # cycles of busy interval / window size
  Window utilizations (windows 1..8): 0, .4, .8, .1, .3, .5, .7, .0
  For a candidate cycle of length 4:
    error measure = (|0 − .3| + |.4 − .5| + |.8 − .7| + |.1 − 0|) / 4 = 0.15
  Predict: the next utilization will be .3 (window 9 repeats window 5)
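The cycle check can be sketched as follows; the slide only computes the error measure, so any acceptance threshold on it is an assumption:

```c
#include <assert.h>
#include <math.h>

/* CYCLE: compare the last cycle of length x against the cycle before
 * it; a small mean absolute error suggests a repeating pattern. */
double cycle_error(const double *u, int n, int x)
{
    double err = 0.0;
    for (int i = 0; i < x; i++)
        err += fabs(u[n - 2 * x + i] - u[n - x + i]);
    return err / x;
}

/* Predict by extending the cycle: the next window repeats the
 * window one cycle length back. */
double cycle_predict(const double *u, int n, int x)
{
    return u[n - x];
}
```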

PATTERN
• A generalized version of CYCLE
• Workload prediction:
  – Convert the n most recent windows' utilizations into a pattern over the alphabet {A, B, C, D}
  – Find the same pattern in the past and predict from what followed it

Example: PATTERN
  Letters A~D correspond to utilization levels 0, 0.25, 0.5, 0.75~1.0
  Utilizations 0, .3, .5, 1, 1 (windows 1..5) → pattern ABCDD
  Utilizations .1, .35, .6, .9 (windows 9..12) → pattern ABCD
  The earlier occurrence of ABCD was followed by D ⇒ predict: the next utilization will be D
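A sketch of the PATTERN idea, quantizing utilizations into {A, B, C, D} and searching the history for the most recent earlier match; the quantization boundaries at 0.25/0.5/0.75 are assumptions:

```c
#include <assert.h>
#include <string.h>

/* Quantize a utilization into one of four letters A..D
 * (boundaries are assumed). */
char pat_letter(double u)
{
    if (u < 0.25) return 'A';
    if (u < 0.50) return 'B';
    if (u < 0.75) return 'C';
    return 'D';
}

/* Find the most recent earlier occurrence of the last m letters in the
 * quantized history of n windows (n <= 64) and return the letter that
 * followed it, or 0 if there is no match. */
char pattern_predict(const double *u, int n, int m)
{
    char s[64];
    for (int i = 0; i < n; i++)
        s[i] = pat_letter(u[i]);
    for (int start = n - m - 1; start >= 0; start--)
        if (memcmp(s + start, s + n - m, m) == 0)
            return s[start + m];
    return 0;
}
```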


Perspectives-Based Algorithm
• Per-task basis performance predictions [Flautner, OSDI 2002]:
  WorkEst_new = (k × WorkEst_old + Work_fse) / (k + 1)
  Deadline_new = (k × Deadline_old + Work_fse + Idle) / (k + 1)
  Perf = WorkEst / Deadline

Roadmap
• DVS in Non-Real-Time Systems
• DVS in Real-Time Systems
  – Compiler-level DVS: Intra-task DVS
  – OS-level DVS: Inter-task DVS
  – Application-level DVS: MPEG-decoder implementation
  – Algorithm-level DVS: low-power convolution

Two Types of DVS Algorithms
• Inter-task DVS algorithms: determine the supply voltage and clock speed on a task-by-task basis
• Intra-task DVS algorithms: determine the supply voltage and clock speed within a single task boundary

Inter-task DVS
• Inter-task voltage scheduling for hard real-time systems [Yao95, Hong98, Okuma99, Shin99, Lee99]
• Problem: given a set of tasks, how to assign the proper speed to each task dynamically while guaranteeing all their deadlines
• Task-by-task speed assignment
  – The slack time due to a task is used by the following tasks, not by the current one
• Practical limitations
  – Requires OS modifications
  – Cannot be applied to a single-task environment
  – Can be ineffective in a multi-task environment
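The per-task estimates above are running averages; a sketch of the estimator (the struct and field names are mine, not from the slide):

```c
#include <assert.h>
#include <math.h>

/* Perspectives-based per-task estimate, following the slide's formulas:
 * WorkEst_new  = (k*WorkEst_old + work) / (k+1)
 * Deadline_new = (k*Deadline_old + work + idle) / (k+1)
 * Perf = WorkEst / Deadline */
struct task_perspective {
    double k;         /* averaging window parameter */
    double work_est;  /* estimated work (full-speed equivalent) */
    double deadline;  /* estimated deadline */
};

/* Fold one observed (work, idle) pair into the estimates and return
 * the predicted performance level. */
double perspective_update(struct task_perspective *p,
                          double work, double idle)
{
    p->work_est = (p->k * p->work_est + work) / (p->k + 1.0);
    p->deadline = (p->k * p->deadline + work + idle) / (p->k + 1.0);
    return p->work_est / p->deadline;
}
```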


An Example of Ineffective Inter-Task DVS
  Task   T   D   C
  A      8   4   1
  B      8   4   1
  C      8   8   4   (dominant task)
• The dominant task (C) exploits only the small slack times from the other tasks; it cannot use its own slack

Earlier Version of Intra-task DVS
• Run-time voltage hopping [Lee00]
  – Each task is partitioned into N timeslots
  – Frequency and voltage are determined for each timeslot
  – Voltage scheduling is embedded in the application programs
  – Can be applied to a conventional non-DVS OS
• Limitations
  – No systematic guideline
  – Manual selection of scaling points
  – Too much burden for average programmers

Overview of Intra-task DVS
• Intra-task voltage scheduling framework based on a static timing analysis of RT programs
• The clock speed is adjusted within a task by embedded code
• Fully exploits all slack times coming from execution time variations within a single task
• No OS modification; applicable to a single-task environment
• Provides a systematic methodology for developing DVS-aware programs ⇒ Automatic Voltage Scaler tool

Basic Idea: Inter-task DVS
• Task τ with (WCEC, deadline) = (160 cycles, 2 μs) and 32 different execution paths (p1, ..., p32)
• Under inter-task DVS the task completes its execution at 80 MHz:
  – p1 (WCEP): 2 μs; p2: 1.5 μs; p3: 1 μs; ...; p32: 0.5 μs
  – On a non-WCEP path there is a slack time before the deadline

Basic Idea: Optimal DVS
• If the execution path were known in advance, each path could run at its own constant speed and finish exactly at the deadline:
  – p1: 80 MHz, p2: 60 MHz, p3: 40 MHz, ..., p32: 20 MHz → every path takes 2 μs, no slack time
• But we cannot know the execution path in advance!!

Basic Idea: Intra-task DVS
(figure: CFG with blocks b1(10 cycles), b2(10), b_wh(10), b_if(5), b3(10), b4(10), b5(10), b6(5), b7(10))
• When the execution is not on the WCEP, slack intervals appear → slow the speed down at that point
• To choose the new speed, we need to know the remaining worst-case execution cycles

39 40 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Remaining Execution Cycle Speed Assignment Algorithm CiComputation • Branching edges are CFG candidate for speed program CRWEC(bi) : the remaining b1 b [160] changes : 1 [160] 10 worst case execution cycles 10 S1; • Branches : The (150→30) if (cond1) S2; execution control [150,110,70,30] else b bw [150,110 ,70 ,30] follows the shorter [30] 2 b bw 10 h path at if-then-else [30] 2 while (cond2) { 10 10 h S3; node. => B-type 10 VSEs if (cond3)S) S4; [140,100 ,60] [140,100,60] • Loops : The execution S5; [20] b control exits a loop } if b3 b [20] Maximum # if b3 5 10 after it iterates by the 5 if (cond4)S) S6; of loop 10 [130,90,50] smaller number of S7; iterations = 3 times than the [15] [130,90,50] b6 b4 maximum iteration [15] b6 b4 5 10 number. => L-type 5 10 VSEs

b b CEC(bi) : the # of clock 7 5 b b [[0]10] 10 [120,80 ,40] [10] 7 5 cycles needed to execute b 10 10 10 i [120,80,40] 41 42 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU

The Effect of Intra-Task Scheduling
For the execution path (b1, b2, b_if, b7):
(a) Without intra-task scheduling: the whole path runs at 80 MHz (2.5 V), completes at 0.44 μs, and the processor idles until the 2 μs deadline
(b) With intra-task scheduling: b1 runs at 80 MHz, and each VSE lowers the speed (to 16 MHz (0.7 V), then 10.7 MHz) so that execution completes right at the deadline
→ Energy consumption of (b) / energy consumption of (a) = 0.34

The Change of C_RWEC
(figure: C_RWEC(t) falls from 150 cycles to 0 within 2 μs in both cases; with intra-task scheduling the slope, i.e. the speed, is reduced after each VSE)

B-type VSE
• At an if node, taking the shorter branch drops C_RWEC, e.g. 150 → 30
• Speed update: S(b_j) ← S(b_i) × 30/150 = S(b_i) × 1/5

L-type VSE
• When the actual number of loop iterations measured at run time is 2 (maximum = 3), C_RWEC drops, e.g. 60 → 20
• Speed update: S(b_j) ← S(b_i) × 20/60 = S(b_i) × 1/3

The transition overhead is considered when determining the new speed.
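Both VSE types reduce the clock in proportion to the drop in remaining worst-case cycles; a minimal sketch, ignoring the transition overhead the slide mentions:

```c
#include <assert.h>
#include <math.h>

/* At a VSE the remaining worst-case cycles drop from c_old to c_new
 * (B-type: shorter branch taken; L-type: loop exits early), so the
 * clock can be scaled by the same ratio and the worst case still
 * finishes by the deadline. Transition overhead is ignored here. */
double vse_new_speed(double cur_speed_mhz, double c_old, double c_new)
{
    return cur_speed_mhz * (c_new / c_old);
}
```

With the slide's numbers: the B-type example scales 80 MHz by 30/150 = 1/5 down to 16 MHz, and the L-type example scales by 20/60 = 1/3.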


VSE Selection
• Compute M = MaxSpeed × D and C_RWEC(b_i) for each block
• Find candidate L-type VSEs
• Compute the maximum increase C_inc in the original C_WCEC
• Select B-type VSEs based on their speed update ratios

Code Generation for VSEs
• At a selected B-type VSE (e.g. on the edge b1 → b2), insert voltage scaling code:
    SpeedUpdateRatio = SpeedTable(b1, b2);
    NewSpeed = CurSpeed × SpeedUpdateRatio;
    Change_f_V(NewSpeed);
• At a loop (e.g. b_wh), insert loop-counting code: LoopIterNum(b_wh) = 0 before the loop, and LoopIterNum(b_wh)++ in the loop body

Automatic Voltage Scaler
(figure: tool flow; the C program, user-provided information (e.g. loop bounds), and the deadline feed a modified compiler; a timing analyzer works on the syntax tree and call graph to produce timing information; the program transformer and speed allocator emit the transformed code together with a speed table)

Simulation Results
(figure: normalized energy consumption of the MPEG4 encoder and decoder vs. voltage transition time, 0~200 μs)
• Energy drops to less than 25% (encoder) and 7% (decoder) of the original program
• There is a large difference between the WCET and ACET of the MPEG-4 decoder


Simulation Results
(figure: number of voltage transitions vs. voltage transition time, 0~200 μs, for the MPEG4 encoder and decoder)
• How many times was the voltage scaling code executed? When C_VTO < 30 μs in the MPEG-4 encoder, the number of voltage transitions decreases sharply, and energy consumption does not increase rapidly
• How many copies of voltage scaling code? About 20 VSEs are inserted when C_VTO > 50 μs

A Profile-Based Intra-Task Voltage Scheduling
• IntraVS algorithm based on average-case execution information
• Average-case execution paths (ACEPs) are the most frequently executed paths
• More effective than the original IntraVS algorithm
• The timing constraints of a hard real-time program are still satisfied, even if the ACEPs are used for voltage scaling decisions

General IntraVS Algorithm
1. Select a predicted reference execution path (e.g. the WCEP)
2. Set the initial speed
3. On a prediction miss: if the new execution path is longer than the reference path, the speed is raised (to meet the deadline); otherwise, the speed is lowered (to save energy)

RAEP-based IntraVS
• Motivations
  – To make the common case more energy-efficient
  – If we use one of the hot paths as the reference path for IntraVS, the speed change graph for the hot paths becomes a near-flat curve with few changes in clock speed
  – Even paths that are not hot paths become more energy-efficient, because they can start with a lower clock speed than under RWEP-based IntraVS
  – The RAEP is the best representative of the hot paths

RAEP-based IntraVS
• C_RAEC(b_i): the remaining average-case execution cycles at b_i
(figure: CFG annotated with C_RAEC values, e.g. b1[95], b2[85,55,25], b_if[75,45], b7[65,35]; average number of loop iterations = 2; the frequently executed path is (b1, b2, b_if, b6, b7))
• (a) With RWEP-based IntraVS: the path starts at 80 MHz (2.5 V) and drops to 16 MHz (0.7 V), completing at the 2 μs deadline
• (b) With RAEP-based IntraVS: the path runs at a near-constant 47.5 MHz (1.35 V)
• There are Up-VSEs as well as Down-VSEs in RAEP-based IntraVS
• The RAEP-based IntraVS achieves 55% more energy reduction
• But the deadline can be missed

Reference Path Modification
(figure: example CFG, deadline = 0.5 μs, maximum speed 100 MHz; b1[30] with speed-update ratio r = 0.5, b2/b3[20] with r = 2, b4/b5[10]; after modification, b1[34] with r = 0.41, b2/b3[24] with r = 1.43, b4[14] with r = 0.71)
(figure: speed schedules of RWEP, RAEP, and the modified RAEP (M-RAEP) for path (b1, b3, b5) and path (b1, b3, b4); the modified reference path keeps every path within the deadline)

Experimental Results
(figure: energy results plotted against the slack factor, where Slack Factor = (deadline − WCET) / deadline)

Conclusion
• Presented a novel intra-task DVS algorithm using static timing analysis of RWECs
• Provides a framework for automatic generation of DVS-aware low-power programs
• The RAEP-based IntraVS algorithm exploits the fact that the average-case execution paths are more likely to be followed at run time than the WCEP
• Demonstrated the effectiveness of the approach using MPEG-4 encoder/decoder programs

Experiments on Itsy

Experimental Environment
• Itsy Pocket Computer V2.6
  – CPU: Intel StrongARM SA1110
  – Frequency scaling: 11 levels (59.0 MHz ~ 206.4 MHz)
  – Voltage scaling: 30 levels (1.00 V ~ 2.00 V)
  – Default setting: 1.55 V / 206.4 MHz
  – Linux operating system (ver. 2.0.30)
(figure: measurement setup; multimeter 1 measures Vin into the rest of the Itsy hardware, multimeter 2 measures the voltage across Rbatt; both feed a recording computer)


Experimental Results
(figures: measured results on Itsy)

Roadmap
• DVS in Non-Real-Time Systems
• DVS in Real-Time Systems
  – Compiler-level DVS: Intra-task DVS
  – OS-level DVS: Inter-task DVS
  – Application-level DVS: MPEG-decoder implementation
  – Algorithm-level DVS: low-power convolution

Inter-task DVS Overview
• Inter-task DVS algorithms determine the supply voltage and clock speed on a task-by-task basis
• Inter-task DVS is similar to imprecise computation in conventional real-time systems:
  – Imprecise computation uses the slack time to increase the value of the results, while guaranteeing a feasible schedule of the tasks
  – Dynamic voltage scaling uses the slack time to lower the voltage/clock speed, while guaranteeing a feasible schedule of the tasks

Preliminaries
• Computing model
  – Non-real-time: tasks have no timing constraints
  – Real-time: timing constraints, periodic and/or aperiodic tasks, scheduling policy (EDF, RM, etc.)
• Different DVS algorithms are necessary depending on the computing model

Inter-Task DVS
• "Run-Calculate-Assign-Run" strategy for supply voltage determination:
  1. Run the current task
  2. Calculate the maximum allowable execution time for the next task (its WCET plus the slack time)
  3. Assign the supply voltage for the next task
  4. Run the next task

Generic Inter-DVS Algorithms
• Consist of two parts:
  – Slack estimation: identify as much slack time as possible
    • Static slack times: extra times available for the next task that can be identified statically
    • Dynamic slack times: slack caused by run-time variations of the task executions
  – Slack distribution: adjust the speed so that the resultant speed schedule is as flat as possible

Static and Dynamic Approaches
• Off-line (static) voltage scheduling: the execution times are assumed to be known a priori; there are optimal solutions for EDF, RM, etc.
• On-line (dynamic) voltage scheduling: the execution times are not known in advance; there cannot be an optimal solution

Slack Estimation Methods
  Scaling decision | Voltage scaling methods
  IntraDVS (off-line) | (1) Path-based method; (2) Stochastic method
  InterDVS (on-line)  | (3) Maximum constant speed; (4) Stretching to NTA; (5) Priority-based slack-stealing; (6) Utilization updating


Maximum Constant Speed
• The lowest possible clock speed that guarantees a feasible schedule of a task set
• EDF scheduling
  – If the worst-case processor utilization U of a given task set is lower than 1.0 under the maximum speed fmax, the task set can be scheduled with the new maximum speed
      f_MCS = U · fmax, where U = Σ_{i=1..n} c_i / p_i
• Rate Monotonic scheduling
  – f_MCS = fmax × max_{1≤i≤n} min_{0<t≤d_i} L_i(t), where L_i(t) = ( Σ_{j=1..i} ⌈t / T_j⌉ · C_j ) / t

Example
  Task     WCET  Period
  Task 1     1     2
  Task 2     1     3

  U = 1/2 + 1/3 = 0.833 ⇒ f_MCS = 0.833 · fmax
(figure: under EDF, the task set is schedulable at the maximum constant speed 0.833·fmax; under RM, running at 0.833·fmax causes a deadline miss)
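The EDF case above is a one-line utilization sum; a sketch, assuming c_i and p_i are given in the same time unit:

```c
#include <assert.h>
#include <math.h>

/* Maximum constant speed under EDF: f_MCS = U * fmax, where
 * U = sum of c_i / p_i. The schedule is feasible while U <= 1. */
double edf_mcs(const double *c, const double *p, int n, double fmax)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += c[i] / p[i];
    return u * fmax;
}
```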

Stretching to NTA
• Even when a task set is scheduled at the maximum constant speed, the actual execution times of tasks are usually much less than their WCETs, so the tasks usually have dynamic slack times
• For a task τ scheduled at time t:
  – If the next task arrival (NTA) is later than t + WCET(τ),
  – we can slow down the execution of τ so that it completes exactly at the next task arrival time

Example
(figure: four cases of stretching the current task's execution from the current time toward the NTA)
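The stretching step picks the lowest speed that still finishes the worst case by the NTA; a sketch (names are mine; WCET is expressed as time at fmax):

```c
#include <assert.h>
#include <math.h>

/* Stretching to NTA: if the next task arrival is later than
 * t + WCET(tau) at full speed, run tau at the reduced speed that
 * makes its worst case finish exactly at the NTA. */
double stretch_to_nta(double wcet_at_fmax, double t, double nta,
                      double fmax)
{
    double avail = nta - t;
    if (avail <= wcet_at_fmax)
        return fmax;                    /* no dynamic slack to use */
    return fmax * (wcet_at_fmax / avail);
}
```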


Priority-Based Slack Stealing
• Exploits basic properties of priority-driven scheduling such as EDF and RM
• When a higher-priority task completes its execution earlier than its WCET, the following lower-priority tasks can use the slack time from the completed higher-priority task
• Advantage: most task instances in a hyper-period may have chances to utilize dynamic slack times
  – Because most task executions complete earlier than their WCETs
  – Therefore, many task instances can be scheduled with lowered voltages and speeds

Example
  Task     WCET  Period
  Task 1     1     2
  Task 2     1     3
  Task 3     1     6
(figure: speed schedules over one hyper-period (0~6) showing lower-priority tasks absorbing the slack of early-completing higher-priority tasks)

Utilization Updating
• The actual processor utilization during run time is usually lower than the worst-case processor utilization
• This method estimates the required processor performance at the current scheduling point
  – By recalculating the expected worst-case processor utilization
  – Using the actual execution times of completed task instances

Example
  Task     WCET  Period
  Task 1     1     2
  Task 2     1     3

  Initially: U = 1/2 + 1/3 = 0.833
  Task 1 completes in 0.6 (at t = 0.6): U = 0.6/2 + 1/3 = 0.633
  Task 2 completes in 0.8; at t = 2 a new instance of Task 1 arrives: U = 1/2 + 0.8/3 = 0.766
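The recalculation above can be sketched as follows (array and flag names are mine):

```c
#include <assert.h>
#include <math.h>

/* Utilization updating: at a scheduling point, replace a completed
 * instance's WCET share with its actual execution time when
 * recomputing the expected utilization. */
double updated_util(const double *wcet, const double *actual,
                    const int *done, const double *period, int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += (done[i] ? actual[i] : wcet[i]) / period[i];
    return u;
}
```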

Slack Distribution Method
• Greedy approach: all the slack times are given to the next activated task
  – Most inter-task DVS algorithms have adopted it
  – Clearly not an optimal solution, but widely used because of its simplicity

Existing Inter-Task DVS Algorithms
  Category  | Scheduling policy   | DVS policy   | Method used
  InterDVS  | EDF                 | lppsEDF      | (3)+(4)
            |                     | ccEDF        | (6)
            |                     | laEDF        | (6)*
            |                     | DRA          | (3)+(4)+(5)
            |                     | AGR          | (4)*+(5)
            |                     | lpSHE        | (3)+(4)+(5)*
            | RM                  | lppsRM       | (3)+(4)
            |                     | ccRM         | (4)*
  IntraDVS  | Path-based method   | intraShin    | (1)
            | Stochastic method   | intraGruian  | (2)

SimDVS: A Unified DVS Evaluation Environment
(figure: SimDVS block diagram; a task set generator and the inputs (machine specification, task set specification, executable program, profile information) feed the InterDVS module (off-line slack estimation, slack information), the task execution module, and the energy estimation module; the IntraDVS preprocessing module turns a CFG into a DVS-aware CFG via the voltage scaler, with an intra-task simulator and a stochastic data table; the outputs are energy consumption and speed transition data)

Experimental Results
(figures: inter-task DVS results)

Experimental Results (IntraDVS)
(figure: relative energy consumption (0.5~1.5) of schemes a~d vs. slack ratio (available time / WCET, 1.0~3.0) for the MPEG4 decoder and the MPEG4 encoder)


DEW - DVS Evaluation Workbench

• XScale-based DVS evaluation environment

• Pros
  – Allows monitoring of real DVS system behaviors (performance evaluation of DVS algorithms for hard real-time systems using DEW)
• Cons
  – Slower than software simulation, because DEW runs actual applications
  – Less flexible for experimental studies, because DEW represents a single machine specification
(figure: DEW stack; embedded RT applications on the VELOS DVS API and embedded RTOS/DVS module, running on the PXA250 DVS processor on an Intel DBPXA250 board)

Evaluation Results Using SimDVS and DEW
(figure: VELOS software stack; applications over JAVA, POSIX, Socket, Win32, and DVS APIs; JAVA thread library (KVM), protocol stack, window manager, and energy estimator above a kernel with scheduler, interrupt service, memory allocator, device drivers, DVS module, and boot code; libraries over the PXA250 DVS processor on the Intel DBPXA250 platform)
(figure: normalized energy consumption vs. number of tasks (2~8) for lppsEDF, ccEDF, laEDF, DRA, AGR, and lpSHE, under SimDVS and under DEW)


Sources of Differences
• System overhead
  – Basic: context switching overhead and tick scheduler overhead
  – DVS: slack computation and clock/voltage scaling time
• System timing resolution
  – Simulator: continuous time model
  – Real system: discrete time model
• Memory behavior
  – Changes in cache and memory access behavior
  – Data/instruction fetch latency

Example of System Overheads on a Real Platform
(figure: a first task executes at 50 MHz, followed by a second task at 50 MHz between t and (t+25) ms; context switching delay and tick scheduling occupy part of the interval)

System Overhead
• The system overhead increases very quickly as the task execution frequency increases
• In particular, the DVS parts increase quickly
• DVS H/W: the ratio of time delay caused by the clock/voltage scaling hardware
• DVS S/W: the ratio of time delay caused by the slack computation in a DVS algorithm
• SYS rest: the ratio of the rest of the system overhead, such as context switching and timer service

System Overhead Variations
(figures: system overhead ratio broken into DVS H/W, DVS S/W, and SYS rest for lppsEDF, ccEDF, laEDF, DRA, AGR, and lpSHE with 2~8 tasks; around 0.35% for long- and medium-period task sets, rising to around 3.5% for the short-period task set)


Energy Efficiency Variations
• In DRA, AGR, and lpSHE, the increased system overhead (due to the increased execution frequency) significantly affects the energy efficiency
(figure: normalized energy consumption vs. number of tasks for the six algorithms, for long-, medium-, and short-period task sets)

Changes in Memory System Behaviors (1)
• Under a DVS-enabled RTOS, a task's execution time increases due to the lowered clock speed
  – Desirable for reducing energy consumption
  – But it can introduce negative side effects as well: an increase in the number of task preemptions, which increases the number of memory accesses
• In aggressive algorithms, the number of preemptions increases more rapidly than in the others
(figure: normalized preemption count vs. number of tasks (2~8) for lppsEDF, ccEDF, laEDF, DRA, AGR, and lpSHE)

Changes in Memory System Behaviors (2)
• Measured with the PXA250 Performance Monitoring Unit
  – 32-way set-associative instruction/data caches
  – Each application: 16-KB program code
• The increases in memory accesses can be attributed to two sources:
  – The increase in the number of preemptions
  – The increase in memory accesses from the algorithm itself
(figure: normalized memory access count vs. number of tasks (2~8) for lppsEDF, ccEDF, laEDF, DRA, AGR, and lpSHE)

References
• Transmeta Corporation. Crusoe Processor. http://www.transmeta.com, June 2000.
• AMD Corporation. PowerNow! Technology. http://www.amd.com, December 2000.
• Intel Corporation. Intel XScale Technology. http://developer.intel.com/design/intelxscale/, November 2001.
• I. Hong, G. Qu, M. Potkonjak, and M. B. Srivastava. Synthesis Techniques for Low-Power Hard Real-Time Systems on Variable Voltage Processors. In Proceedings of the IEEE Real-Time Systems Symposium, pages 178-187, December 1998.
• Y. Shin, K. Choi, and T. Sakurai. Power Optimization of Real-Time Embedded Systems on Variable Speed Processors. In Proceedings of the International Conference on Computer-Aided Design, pages 365-368, November 2000.
• H. Aydin, R. Melhem, D. Mosse, and P. M. Alvarez. Dynamic and Aggressive Scheduling Techniques for Power-Aware Real-Time Systems. In Proceedings of the IEEE Real-Time Systems Symposium, December 2001.


References
• P. Pillai and K. G. Shin. Real-Time Dynamic Voltage Scaling for Low-Power Embedded Operating Systems. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP'01), October 2001.
• D. Shin, J. Kim, and S. Lee. Intra-Task Voltage Scheduling for Low-Energy Hard Real-Time Applications. IEEE Design and Test of Computers, 18(2):20-30, March 2001.
• F. Gruian. Hard Real-Time Scheduling Using Stochastic Data and DVS Processors. In Proceedings of the International Symposium on Low Power Electronics and Design, pages 46-51, August 2001.
• W. Kim, J. Kim, and S. L. Min. A Dynamic Voltage Scaling Algorithm for Dynamic-Priority Hard Real-Time Systems Using Slack Time Analysis. To appear in Proceedings of Design, Automation and Test in Europe (DATE'02), March 2002.
• D. Grunwald, P. Levis, and K. I. Farkas. Policies for Dynamic Clock Scheduling. In Proceedings of the 4th Symposium on Operating Systems Design and Implementation, pages 73-86, October 2000.
• S. Lee and T. Sakurai. Run-time Voltage Hopping for Low-Power Real-Time Systems. In Proceedings of the 37th Design Automation Conference, pages 806-809, June 2000.
• T. Burd and R. Brodersen. Design Issues for Dynamic Voltage Scaling. In Proceedings of the International Symposium on Low Power Electronics and Design, pages 9-14, July 2000.
• D. Burger and T. M. Austin. The SimpleScalar Tool Set, Version 2.0. Technical Report 1342, University of Wisconsin-Madison, CS Department, June 1997.
• F. Yao, A. Demers, and S. Shenker. A Scheduling Model for Reduced CPU Energy. In Proceedings of the IEEE Foundations of Computer Science, pages 374-382, 1995.

Image Convolution

DVS-Aware Algorithm Development

• Convolution: each output pixel is the weighted sum of its neighborhood under a small kernel, e.g.

   0  1  1
  -1  0  1
  -1 -1  0

• One of the fundamental operations of image processing.
• DVS unfriendly!! (the direct implementation has a constant per-pixel workload, so no slack arises)
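The direct implementation can be sketched as follows (an illustrative Python sketch, not the lecture's code; the function and variable names are mine):

```python
# Direct 2-D convolution: for a p x p kernel, every output pixel costs
# exactly p*p multiplications and p*p additions -- the workload is
# constant, so there is no per-pixel slack for DVS to exploit.

def convolve_direct(image, kernel):
    n = len(image)          # assume a square image (list of rows)
    p = len(kernel)         # assume a square, odd-sized kernel
    h = p // 2
    out = [[0] * n for _ in range(n)]
    for y in range(h, n - h):           # interior pixels only
        for x in range(h, n - h):
            acc = 0
            for ky in range(p):
                for kx in range(p):
                    acc += kernel[ky][kx] * image[y + ky - h][x + kx - h]
            out[y][x] = acc
    return out
```

With the edge-detection kernel above, every pixel costs the same 9 multiply-accumulates regardless of the image content.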


Direct Implementation: Constant-Workload Algorithm
• p² multiplications and p² additions for each convolved element.
• No workload variations!

Low-Power Implementation: Variable-Workload Algorithm
• Kernel Analysis & Rearrangement produces a decomposed kernel. E.g., for the kernel (-1 -1 -1 / -1 8 -1 / -1 -1 -1): Kernel[0] holds v = -1 with its list of (xoffset, yoffset) positions, Kernel[1] holds v = 8; entries are linked via next pointers and carry PosPtr/NegPtr lists.
• Modified Convolution Computation
• Execution Time Prediction & Voltage/Frequency Setting
→ A variable-workload algorithm based on kernel characteristics.

Low-Power Implementation
Kernel Analysis and Rearrangement

• Property 1. For most kernels, the number of distinct kernel elements is small.
• Property 2. 0, 1 and -1 are used frequently.
• Property 3. Many kernel elements have the same absolute values.

Overall processing step: the input image and kernel pass through Kernel Analysis & Rearrangement, yielding a decomposed kernel; the Modified Convolution Process then produces the output image, while Execution Time Prediction & Voltage/Frequency Setting selects the operating point. Example kernels:

 0  1  1      -1 -2 -1
-1  0  1       0  0  0
-1 -1  0       1  2  1


Modified Convolution Algorithm: SDMK
Direct vs. SDMK

• For 1 or -1, no multiplication.
• For 0, no addition and no multiplication.

• For the same absolute values, a single multiplication.

Sliding-window view: with the original kernel (a b c), the reversed kernel (c b a) slides over the input sequence d1 d2 d3 d4 d5 d6; at step i it is aligned with d1 d2 d3 (the partial products *c, *b, *a are accumulated), at step i+1 with d2 d3 d4, and so on, producing one convolved output per step.

For the example kernel (0 1 1 / -1 0 1 / -1 -1 0):

  Operations/pixel:
    Direct implementation: 9 additions & 9 multiplications
    SDMK implementation:   6 additions & 0 multiplications
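The SDMK rules above can be sketched as follows (a hypothetical Python rendering of the decomposed-kernel idea; the helper names and the exact decomposition data structure are assumptions, not the lecture's implementation):

```python
# SDMK sketch: decompose the kernel once, grouping offsets by coefficient
# value, then convolve without multiplying by 0, 1, or -1.  For the
# example kernel the per-pixel cost drops to 6 additions, 0 multiplications
# (counting one addition per accumulated nonzero element, as the slide does).

def decompose(kernel):
    """Map each nonzero coefficient value to its list of (dy, dx) offsets."""
    groups = {}
    h = len(kernel) // 2
    for ky, row in enumerate(kernel):
        for kx, v in enumerate(row):
            if v != 0:                      # zeros cost nothing at all
                groups.setdefault(v, []).append((ky - h, kx - h))
    return groups

def convolve_sdmk_pixel(image, groups, y, x, ops):
    """Convolve one pixel, tallying additions/multiplications in `ops`."""
    acc = 0
    for v, offsets in groups.items():
        part = 0
        for dy, dx in offsets:              # one addition per nonzero element
            part += image[y + dy][x + dx]
            ops['add'] += 1
        if v == 1:
            acc += part                     # +1 coefficients: add only
        elif v == -1:
            acc -= part                     # -1 coefficients: subtract only
        else:
            acc += v * part                 # one multiplication per distinct value
            ops['mul'] += 1
    return acc
```

For kernels dominated by 0 and ±1 the multiplication count collapses, which is exactly where the variable (data-dependent) workload, and hence the DVS opportunity, comes from.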

Exec Time Prediction & Speed Setting
Experimental Environments

• By a static method
  – Based on the number of required arithmetic operations
• By a dynamic method
  – Based on actual measurements of execution times
• In the direct algorithm, by a pre-constructed speed table

Experimental environment:
• Itsy Pocket Computer V2.6
• CPU: Intel StrongARM 1110
  – Frequency scaling: 11 levels (59.0 MHz ~ 226.4 MHz)
  – Voltage scaling: 30 levels (1.00 V ~ 2.00 V)
• Linux operating system (ver. 2.0.30)
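A minimal sketch of the static speed-setting step on such a processor. Only the endpoint frequencies (59.0 and 226.4 MHz) come from the slide; the interior levels and the predicted cycle counts are made-up placeholders:

```python
# Static speed setting: predict the cycle count from the operation counts
# that kernel analysis yields, then pick the lowest frequency level that
# still meets the deadline.  Interior level values below are illustrative.

FREQ_LEVELS_MHZ = [59.0, 73.7, 88.5, 103.2, 118.0, 132.7,
                   147.5, 162.2, 176.9, 191.7, 226.4]   # 11 levels, ascending

def pick_speed(predicted_cycles, deadline_s):
    """Lowest discrete frequency (MHz) finishing predicted_cycles in time."""
    required_mhz = predicted_cycles / deadline_s / 1e6
    for f in FREQ_LEVELS_MHZ:
        if f >= required_mhz:
            return f
    return FREQ_LEVELS_MHZ[-1]              # saturate at the top level
```

Because the available frequencies are discrete, the chosen level is rounded up from the ideal continuous speed; the supply voltage is then set to the lowest value supporting that frequency.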


Results (Energy Dissipation)
Results (Execution Time)

• There is no performance degradation over the direct approach.
• Average 67.6% energy saving in the core processor, and 62.8% in the whole Itsy system.

Conclusions
• Presented a low-power implementation of the image convolution algorithm for variable-voltage processors.
• The energy efficiency of the proposed implementation comes from (recall E ∝ Ncycle · VDD²):
  – Smaller Ncycle
  – Lower VDD
  – Fewer memory references, i.e., less energy consumed in non-CPU components

MPEG Decoder Implementation
• Buffer-based DVS
  – Makes it possible to exploit the slack time caused by workload variations
  – Consumes additional memory resources:
    one_buffer_size = image_width * image_height * byte_per_pixel

[Figure: decoding timeline over successive periods; frames j, j+1, j+2 with VSTj, AETj, OPj+1, and the deadlines for sj, sj+1, and sj+2.]
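The buffer cost and the per-frame speed choice above can be sketched as follows (an assumed model: the lecture's policy is buffer-based DVS using per-frame WCET, and these helper names are hypothetical):

```python
# Buffer-based DVS sketch for an MPEG decoder: one extra frame buffer
# costs image_width * image_height * byte_per_pixel bytes; in exchange,
# a frame can start early and stretch over the slack accumulated in the
# buffer instead of being squeezed into one fixed period.

def one_buffer_size(image_width, image_height, byte_per_pixel):
    """Memory cost of each additional output frame buffer, in bytes."""
    return image_width * image_height * byte_per_pixel

def frame_speed_hz(wcet_cycles, now_s, deadline_s):
    """Lowest constant speed finishing the frame's WCET by its deadline."""
    return wcet_cycles / (deadline_s - now_s)
```

Decoding ahead into the buffer moves a frame's effective deadline later, so frame_speed_hz returns a lower clock (and thus a lower voltage) whenever earlier frames finished under their WCET.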


Measurement Results
Demo (1)

• Buffer-based DVS scheme (using WCET)
• Bitrate of sample video: 163 Kbps
• Policies compared:
  – Ideal DVS
  – DVS using a moving average
  – Buffer-based DVS
• In the demo, data profiled with a DAQ board is displayed alongside the actual video.

• Energy saving: up to 53%

• The voltage at which the current frame is being decoded is displayed.

Demo (2)

• Total energy consumption is compared across the DVS policies.

Energy-Optimal Off-Line Voltage Scheduling


Off-Line Voltage Scheduling Problem
• Voltage schedule (speed schedule) S(t): the processor speed as a function of time.
• The energy consumption under S(t) is given by
  E(S) = ∫ P(S(t)) dt over the scheduling interval,
  where P is a convex function from speed to power.
• Given N jobs J1, J2, …, JN, where
  – ri: the release time of Ji
  – di: the deadline of Ji
  – ci: the workload (# of execution cycles) of Ji, assumed to be known a priori
  – pi: the priority of Ji
  compute a feasible voltage schedule S(t) that minimizes E(S).
• S(t) is feasible iff S(t) gives each Ji its workload ci between ri and di, for all J1, J2, …, JN.

Existing Works for the Problem
• Note that the system model covers
  – Fixed-priority (RM, DM) periodic/aperiodic task sets
  – EDF periodic/aperiodic task sets (pi < pj iff di < dj)
• For EDF job sets (a special case), the problem can be solved in polynomial time by Yao's algorithm [FOCS'95]
  – Solution space = convex, objective function = convex
• For general job sets, the problem becomes much more difficult
  – Main source of difficulty: the feasibility condition
  – Quan & Hu [TCAD'03]: exhaustive optimal algorithm
  – Yun & Kim [TECS'03]: NP-hardness & FPTAS

References
• Optimal algorithm for EDF job sets
  – F. Yao, A. Demers, and S. Shenker, "A Scheduling Model for Reduced CPU Energy", In Proc. Foundations of Computer Science (FOCS'95), 1995.

• Heuristic for FP job sets
  – G. Quan and X. Hu, "Energy Efficient Scheduling for Real-Time Systems on Variable Voltage Processors", In Proc. Design Automation Conference (DAC'01), 2001.
• Exhaustive optimal algorithm for FP job sets
  – G. Quan and X. Hu, "Minimum Energy Fixed-Priority Scheduling for Variable Voltage Processors", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2003.
• NP-hardness proof & FPTAS for FP job sets
  – H.-S. Yun and J. Kim, "On Energy-Optimal Voltage Scheduling for Fixed-Priority Hard Real-Time Systems", ACM Transactions on Embedded Computing Systems, 2003.
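Yao's algorithm for the EDF special case, referenced above, repeatedly finds the critical interval of maximum intensity (total contained work divided by interval length), runs it at that constant speed, removes its jobs, and collapses the interval. A didactic, unoptimized Python sketch under the job model (ri, di, ci) from the slides:

```python
# Critical-interval sketch of Yao-Demers-Shenker [FOCS'95] for EDF job
# sets.  jobs: list of (release, deadline, cycles).  Returns a list of
# (speed, work) pieces; the speeds come out in non-increasing order.

def yao_schedule(jobs):
    jobs = list(jobs)
    pieces = []
    while jobs:
        best = None
        times = sorted({t for r, d, _ in jobs for t in (r, d)})
        for z1 in times:                    # try every candidate interval
            for z2 in times:
                if z2 <= z1:
                    continue
                # intensity = work fully contained in [z1, z2] / length
                work = sum(c for r, d, c in jobs if z1 <= r and d <= z2)
                if work and (best is None or work / (z2 - z1) > best[0]):
                    best = (work / (z2 - z1), z1, z2, work)
        speed, z1, z2, work = best
        pieces.append((speed, work))
        length = z2 - z1
        rest = []
        for r, d, c in jobs:
            if z1 <= r and d <= z2:
                continue                    # scheduled inside this piece
            # collapse the critical interval for the remaining jobs
            r2 = r - length if r >= z2 else min(r, z1)
            d2 = d - length if d >= z2 else min(d, z1)
            rest.append((r2, d2, c))
        jobs = rest
    return pieces
```

This enumerates O(N²) candidate intervals per round for clarity; the convexity of both the solution space and the objective (noted on the slide) is what makes this greedy construction optimal.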