Software-Level Power-Aware Computing

Lecture 2
Low Power SW.2, J. Kim/SNU

Lecture Organization
• Lecture 1
  – Introduction to low-power systems
  – Low-power binary encoding
  – Power-aware compiler techniques
• Lectures 2 & 3
  – Dynamic voltage scaling (DVS) techniques
    – OS-level DVS: Inter-task DVS
    – Compiler-level DVS: Intra-task DVS
    – Application-level DVS
  – Dynamic power management
• Lecture 4
  – Software power estimation & optimization
  – Low-power techniques for multiprocessor systems
  – Leakage reduction techniques

Voltage, Frequency & Energy
[Figure: plot with axes Energy, Clock speed, and Voltage; the voltage axis runs from 0.5 to 2.5]

Basic Idea of DVS
$E \propto N_{cycle} \cdot V_{DD}^{2}$

Case                          Cycles       Clock    Voltage   Energy
(a) No power-down             12.5×10^8    50 MHz   5.0 V     31.25 J
(b) Power-down                5×10^8       50 MHz   5.0 V     12.5 J
(c) Dynamic voltage scaling   5×10^8       20 MHz   2.0 V     2.0 J

[Figure: power vs. time for each case. The power level is 5.0² in (a) and (b) and 2.0² in (c); the time axis is marked at 10 and at the deadline, 25.]

→ Slow and Steady wins the race!

Key Issues for Successful DVS
• Efficient detection of slack/idle intervals (how to detect a slack interval)
• Efficient voltage scaling policy for slack intervals (how to scale voltage)

Commercial DVS Processors
• Transmeta Crusoe
• AMD K6-2+ (PowerNow Technology)
• Intel SpeedStep
• XScale
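The three cases on the Basic Idea of DVS slide can be checked with a few lines of C. This is a minimal sketch, not part of the lecture: the proportionality constant in E ∝ N_cycle · V_DD² is not given in the slides, so the value 1e-9 is an assumption, chosen only because it reproduces the listed 31.25 J, 12.5 J and 2.0 J. The function names `dvs_energy_joules` and `dvs_exec_time_seconds` are illustrative.

```c
#include <assert.h>

/* Assumed proportionality constant (J per cycle per V^2). It is not given in
 * the slides; 1e-9 is chosen because it reproduces the slide's energy values. */
#define ENERGY_PER_CYCLE_V2 1e-9

/* Energy of a run: E = k * N_cycle * V_DD^2. */
double dvs_energy_joules(double cycles, double vdd)
{
    assert(cycles >= 0.0 && vdd >= 0.0);
    return ENERGY_PER_CYCLE_V2 * cycles * vdd * vdd;
}

/* Execution time in seconds for a given cycle count and clock frequency. */
double dvs_exec_time_seconds(double cycles, double freq_hz)
{
    assert(freq_hz > 0.0);
    return cycles / freq_hz;
}
```

For case (c), `dvs_exec_time_seconds(5e8, 20e6)` gives 25 s, so the task finishes exactly at the deadline while `dvs_energy_joules(5e8, 2.0)` is about 2.0 J. Case (b) runs the same 5×10^8 cycles at 50 MHz and 5.0 V, finishes after 10 s, and costs about 12.5 J.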
Voltage Scaling Processors

Processor                      Type         Clock range    Voltage range   Scaling time             Scaling power
Transmeta Crusoe (LongRun)     Commercial   200~700 MHz    1.1~1.65 V      1.1 ↔ 1.65 V: < 300 μs   ??
AMD Mobile K6 (PowerNow)       Commercial   192~588 MHz    0.9~2.0 V       0.9 ↔ 2.0 V: 200 μs      ??
Intel PXA250                   Commercial   100~400 MHz    0.85~1.3 V      Each step: 500 μs        ??
UC Berkeley (ARM8)             Academic     5~80 MHz       1.2~3.8 V       1.2 ↔ 3.8 V: 520 μs      130 μJ
Ubicom LART (StrongARM)        Academic     59~251 MHz     0.79~1.65 V     see below                ??

Scaling times for LART: 59 ↔ 251 MHz: 140 μs; 0.79 → 1.65 V: 40 μs; 1.65 → 0.79 V: 5.5 ms.

DVS Support in PXA250
• Use two registers in the PXA250 XScale core
• CCCR (Core Clock Configuration Register)
  – Specify memory clock & core clock
• CCLKCFG (Core Clock Configuration) Register
  – Set the FCS (Frequency Change Sequence) bit to change the clock speed

CP14 register 6: CCLKCFG
  bits 31..2: reserved | bit 1: FCS | bit 0: TURBO
  The clock speed changes if the FCS bit = 1.

CCCR Setting Example
0x41300000: CCCR
  bits 31..10: reserved | bits 9..7: N | bits 6..5: M | bits 4..0: L

Core clock values (MHz) for each L, M setting. The N column headings in the original are 2, 3, 4, 6. A voltage label is shown only where the extracted text places one next to a value:

L   M   Clock values (MHz)
1   1   99.5 (0.85 V), 199.1 (1.0 V), 298.6 (1.1 V)
2   1   118.0, 235.9, 353.9
3   1   132.7, 265.4 (1.1 V), 398.1
4   1   147.5, 294.9
5   1   165.9, 331.8 (1.3 V)
1   2   199.1 (1.1 V), 398.1 (1.3 V)
2   2   235.9
3   2   265.4 (1.1 V)
4   2   294.9
5   2   331.8 (1.3 V)

Voltage Scaling Code

```c
#include <stdio.h>
#include <string.h>
#include <machine/pmu.h>
#include <machine/cp14.h>

void
change_clock_speed(int speed)
{
    /* CCCR values (N, M, L fields) indexed by speed level 1..13 */
    int settings[20] = { 0, 0x121, 0x122, 0x123, 0x124, 0x125, 0x1a2,
                         0x141, 0x1a4, 0x142, 0x1a5, 0x143, 0x144, 0x145 };
    int cccr_val = 0x121, clkcfg_val = 2;

    cccr_val = settings[speed];
    switch (speed) {
    case 6 : clkcfg_val = 3; break;    /* FCS + TURBO */
    case 8 : clkcfg_val = 3; break;
    case 10 : clkcfg_val = 3; break;
    default : clkcfg_val = 2; break;   /* FCS only */
    }
    /* Write CCCR at 0x41300000 */
    memcpy((void *)(0x40000000 + 0x1300000), &cccr_val, 4);
    /* Macro name as spelled in the original slide */
    CP14_WRTIE_CCLKCFG(clkcfg_val);
}

int
get_os_time()
{
    int ostime;

    /* Read the OS timer value at 0x40a00010 */
    memcpy(&ostime, (void *)(0x40000000 + 0xa00010), 4);
    return(ostime);
}

void Main(void)
{
    int i, bb, cc, j, k;    /* k was used but not declared in the original */
    int thread_args[3] = {0, 1, 2};

    /* Time the same busy loop at each of the 12 speed levels */
    for ( k = 1 ; k < 13 ; k++ ) {
        change_clock_speed(k);
        bb = get_os_time();
        for ( i = 0 ; i < 10000 ; i++ ) j = 10;
        cc = get_os_time();
        printf("%d : %d \n", k, cc - bb);
    }
}
```

Voltage Scaling in Linux / ARM IEM Demo
[Figures: call-flow diagrams. Labels recoverable from the diagrams: SVS kernel module, DVS scheduler, setNewVoltage(), setScaledSpeed(), kernel thread (voltage control), wake_up / sleep_on, device driver, ltc1663_i2c_write_data(), write voltage value, LTC1663 DAC, regulate CPU voltage. One path is annotated with 2 ~ 3 ms.]

Successful Low Power S/W Techniques
1. Understand workload variations of your target
2. Devise efficient ways to detect them
3. Devise efficient ways to utilize the detected workload variations using available H/W supports

Roadmap
• DVS in Non Real-Time Systems
• DVS in Real-Time Systems
• Compiler-level DVS: Intra-task DVS
• OS-level DVS: Inter-task DVS
• Application-level DVS
  – MPEG-decoder implementation
• Algorithm-level DVS
  – Low-power convolution

Non Real-Time Jobs
• No timing constraints
• No periodic executions
• Unknown WCET
It is hard to predict the future workload!!

DVS for Non Real-Time Jobs
• Basic approach: predict workload based on history information
• Usually based on some variation of an interval scheduler
  – PAST, FLAT
  – LONG_SHORT, AGED_AVERAGE
  – CYCLE, PATTERN, PEAK

Key Question
How can we predict the future workload?

PAST
• Look at a fixed window into the past
• Assume the next window will be like the previous one
• If the past window was
  – mostly busy ⇒ increase speed
  – mostly idle ⇒ decrease speed

Key Question (continued): how much history to use
• Based on long term history: hard to adapt quickly to a changed workload
• Based on short term history: too many clock/voltage changes

Example: PAST
Utilization = busy time / window size
[Figure: timeline divided into PAST and FUTURE windows. The past windows show low utilization, so the speed is decreased for the next window.]

FLAT
• Try to smooth speed to a global average
• Make the utilization of the next window equal to <const>
• Set speed fast enough to complete the predicted new work being pushed into the coming window

Example: FLAT
<Const> = 0.7
[Figure: timeline of window utilizations. Increase/decrease the speed so that the next utilization is 0.7.]

LONG-SHORT
• Look up the last 12 windows
  – Short-term past: 3 most recent windows
  – Long-term past: the remaining windows
• Workload prediction
  – The utilization of the next window will be a weighted average of these 12 windows' utilizations

Example: LONG-SHORT
utilization = # cycles of busy interval / window size

Window        1    2    3    4    5    6    7    8    9    10   11   12
Utilization   0    .3   .5   1    1    1    .8   .5   .3   .1   0    0
(current time is the end of window 12)

$$\text{utilization} = \frac{0 + .3 + .5 + 1 + 1 + 1 + .8 + .5 + .3 + 4(.1 + 0 + 0)}{9 + 4(3)} = 0.276$$
$$f_{clk} = 0.276 \times f_{max}$$

AGED-AVERAGE
• Employs an exponential-smoothing method
• Workload prediction
  – The utilization of the next window will be a weighted average of all previous windows' utilizations
  – Geometrically reduce the weight

Example: AGED_AVERAGE
utilization = # cycles of busy interval / window size

Window        1    2    3    4    5    6    7    8    9    10   11   12
Utilization   0    .3   .5   1    1    1    .8   .5   .3   .1   0    0
(current time is the end of window 12)

$$\text{average} = \frac{1}{3} \cdot 0 + \frac{2}{9} \cdot 0 + \frac{4}{27}(0.1) + \frac{8}{81}(0.3) + \cdots$$
$$f_{clk} = \text{average} \times f_{max}$$

CYCLE
• Workload prediction
  – Examine the last 16 windows
  – Does there exist a cycle of length X?
  – If so, predict by extending this cycle
  – Otherwise, use the FLAT algorithm
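The LONG-SHORT, AGED-AVERAGE and CYCLE predictors described above can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not the lecture's implementation: all function and parameter names are hypothetical, the utilization history is assumed to be an array with the oldest window first, the CYCLE error is assumed to be the mean absolute difference between the last X windows and the X windows before them, and the error threshold below which a cycle is accepted is not given in the slides, so it is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* LONG_SHORT: weighted average of the last n windows, where the n_short
 * most recent windows count short_weight times as much as the others.
 * u[0] is the oldest window, u[n-1] the most recent. */
double long_short_predict(const double *u, size_t n,
                          size_t n_short, double short_weight)
{
    assert(n > 0 && n_short <= n && short_weight > 0.0);
    double sum = 0.0, weight_total = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = (i >= n - n_short) ? short_weight : 1.0;
        sum += w * u[i];
        weight_total += w;
    }
    return sum / weight_total;
}

/* AGED_AVERAGE: exponential smoothing. The most recent window gets weight
 * (1 - decay), the one before it (1 - decay) * decay, and so on. */
double aged_average_predict(const double *u, size_t n, double decay)
{
    assert(decay > 0.0 && decay < 1.0);
    double avg = 0.0, w = 1.0 - decay;
    for (size_t i = n; i-- > 0; ) {   /* walk from newest to oldest */
        avg += w * u[i];
        w *= decay;
    }
    return avg;
}

/* CYCLE: mean absolute difference between the last x windows and the x
 * windows before them; a small value suggests a cycle of length x. */
double cycle_error(const double *u, size_t n, size_t x)
{
    assert(x > 0 && n >= 2 * x);
    double err = 0.0;
    for (size_t i = 0; i < x; i++) {
        double d = u[n - 2 * x + i] - u[n - x + i];
        err += (d < 0.0) ? -d : d;
    }
    return err / (double)x;
}

/* If a cycle of length x is accepted, extend it: the next window is
 * predicted to repeat the window x positions before it. */
double cycle_predict(const double *u, size_t n, size_t x)
{
    assert(x > 0 && n >= x);
    return u[n - x];
}
```

With the 12-window history 0, .3, .5, 1, 1, 1, .8, .5, .3, .1, 0, 0, calling `long_short_predict(u, 12, 3, 4.0)` evaluates 5.8 / 21, the 0.276 of the LONG-SHORT example, and `aged_average_predict(u, 12, 2.0 / 3.0)` uses the weights 1/3, 2/9, 4/27, 8/81, ... of the AGED_AVERAGE example. The predicted utilization is then multiplied by f_max to obtain the clock frequency.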
Example: CYCLE
utilization = # cycles of busy interval / window size

Window        1    2    3    4    5    6    7    8
Utilization   0    .4   .8   .1   .3   .5   .7   .0
(current time is the end of window 8)

$$\text{error measure} = \frac{|0 - .3| + |.4 - .5| + |.8 - .7| + |.1 - 0|}{4} = 0.15$$
Predict: the next utilization will be .3

PATTERN
• A generalized version of CYCLE
• Workload prediction
  – Convert the n most recent windows' utilizations into a pattern in alphabet {A, B, C, D}
  – Find the same pattern in the past

Example: PATTERN
Alphabet: A = 0 to 0.25, B = 0.25 to 0.5, C = 0.5 to 0.75, D = 0.75 to 1.0

Windows 1 to 5 (past):            0, .3, .5, 1, 1     Pattern = ABCDD
Windows ending at 12 (current):   .1, .35, .6, .9     Pattern = ABCD

Predict: the next utilization will be D

Perspectives-based Algorithm
• Per-task basis performance predictions

$$WorkEst_{new} = \frac{k \times WorkEst_{old} + Work_{fse}}{k + 1}$$
$$Deadline_{new} = \frac{k \times Deadline_{old} + Work_{fse} + Idle}{k + 1}$$
$$Perf = \frac{WorkEst}{Deadline}$$

[Flautner, OSDI2002]

Roadmap
• DVS in Non Real-Time Systems
• DVS in Real-Time Systems
• Compiler-level DVS: Intra-task DVS
• OS-level DVS: Inter-task DVS
• Application-level DVS
  – MPEG-decoder implementation
• Algorithm-level DVS
  – Low-power convolution

Two Types of DVS Algorithms
• Inter-task DVS algorithms
  – Determine the supply voltage and clock speed on a task-by-task basis
• Intra-task DVS algorithms
  – Determine the supply voltage and clock speed within a single task boundary

Inter-task DVS
• Inter-task voltage scheduling for hard real-time systems [Yao95, Hong98, Okuma99, Shin99, Lee99]
• Problem: given a set of tasks, how to assign the proper speed to each task dynamically while guaranteeing all their deadlines
• Task-by-task speed assignment
  – The slack time due to a task is used by the following tasks, not by the current one.
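Returning to the perspectives-based algorithm, its three formulas translate directly into a small per-task update routine. The sketch below is an assumption-laden illustration rather than the code from [Flautner, OSDI2002]: the struct, field and function names are hypothetical, and Work_fse is assumed to be the work of the finished episode expressed in the same time units as Idle.

```c
#include <assert.h>

/* Per-task state for the perspectives-based estimate. Field and function
 * names are illustrative; the update rules follow the slide's formulas. */
struct perspective {
    double work_est;   /* WorkEst: aged average of work per episode */
    double deadline;   /* Deadline: aged average of work plus idle time */
};

/* Fold one finished episode into the estimate with aging factor k:
 *   WorkEst_new  = (k * WorkEst_old  + Work_fse)        / (k + 1)
 *   Deadline_new = (k * Deadline_old + Work_fse + Idle) / (k + 1) */
void perspective_update(struct perspective *p, double k,
                        double work_fse, double idle)
{
    assert(k >= 0.0 && work_fse >= 0.0 && idle >= 0.0);
    p->work_est = (k * p->work_est + work_fse) / (k + 1.0);
    p->deadline = (k * p->deadline + work_fse + idle) / (k + 1.0);
}

/* Perf = WorkEst / Deadline: the predicted performance level for the task. */
double perspective_perf(const struct perspective *p)
{
    assert(p->deadline > 0.0);
    return p->work_est / p->deadline;
}
```

For example, starting from a zeroed estimate and folding in one episode with k = 3, Work_fse = 4 and Idle = 4 gives WorkEst = 1, Deadline = 2 and Perf = 0.5. A larger k makes the estimate react more slowly to a single episode.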
