Software-Level Power-Aware Computing

Total Page:16

File Type:pdf, Size:1020Kb

Software-Level Power-Aware Computing Lecture Organizations • Lecture 1: Software-level • Introduction to Low-power systems • Low-power binary enco ding Power-Aware Computing • Power-aware compiler techniques • Lectures 2 & 3 • Dynamic voltage scaling (DVS) techniques Lecture 2 – OS-level DVS: Inter-Task DVS – CilCompiler-llDVS:Itlevel DVS: Intra-TkTask DVS – Application-level DVS • Dynamic power management • Lecture 4 • Software power estimation & optimization • Low-power techniques for multiprocessor systems • Leakage reduction techniques 2 Low Power SW.2 J. Kim/SNU Voltage, Frequency & Energy Basic Idea of DVS 7.000 2 z E ∝ Ncycle · VDD 6.000 Power Deadline 5.000 2 5.0 (a) No • 12.5x108 cycle 4.000 50MHz power-down •5.0V 3.000 • 31.25J 10 25 2.000 5.02 Energy Clock speed (b) Power-down • 5x108 cycle 1.000 50MHz •5.0V 0. 000 • 12. 5J 2.5 2.3 10 25 2.1 1.9 (c) Dynamic • 5x108 cycle 1.7 1.5 voltage •2.0V 131.3 20MHz 1.1 Voltage 2.02 scaling • 2.0J 0.9 0.7 25 Time 0.5 → Slow and Steady wins the race! 3 4 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Key Issues for successful DVS Commercial DVS Processors • Efficient Detection of Slack/Idle Intervals • Transmeta Crusoe • Efficient Voltage Scaling Policy for Slack Intervals • AMD K2 + (PowerNow Technology) • Intel SpeedStep • XScale slack interval How to detect How to scale voltage 5 6 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Voltage Scaling Processors DVS Support in PXA250 • Use Two Registers in PXA250 Xscale Core • CCCR (Core Clock Configuration Register): Commercial Academic – Specify memory clock & core clock Transmeta AMD Intel UC Berkely Ubicom Processors Crusoe Mobile K6 PXA250 (ARM8) LART(StrongARM) (LongRun) (PowerNow) • CCLKCFG (Core Clock Configuration) Register 200~700MHz 192~588MHz 100~400MHz 5~80MHz 59~251MHz SliLScaling Leve l – Set FCS (Frequency Change Sequence) bit to 1.1~1.65V 0.9~2.0V 0.85~1.3V 1.2~3.8V 0.79~1.65V change the clock speed 59↔251MHz : 140μs CP14 register 6 : CCLKCFG 1.1 ↔ 1.65V 0.9 ↔2.0V Each step 1.2 ↔ 3.8V 31 10 SliTiScaling Time 0790.79→1. 65V : 40 μs < 300μs 200μs 500μs 520μs 0.79←1.65V : 5.5ms reserved Scaling Power ?? ?? ?? 130μJ ?? FSC TURBO Change if FCS bit = 1 7 8 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU CCCR Setting Example Voltage Scaling Code 0x41300000 : CCCR 31 9876543210 1 #include <machine/pmu.h> void Main(void) 2 #include <machine/cp14.h> { reserved N M L 3 int i, bb, cc, j; 4 int thread__g[]{,,};args[3] = {0, 1, 2}; 5 for ( k = 1 ; k < 13 ; k++ ) { 6 void change_clock_speed(k); 7 change_clock_speed(int speed) bb = get_os_time(); N 8 { for ( i = 0 ; i < 10000 ; i++ ) j = 10; LM 9 int settings[20]={ 0, 0x121, 0x122, 0x123, 0x124, 0x125, 0x1a2, cc = get_os_time(); 2 3 4 6 10 0x141, 0x1a4, 0x142, 0x1a5, 0x143, 0x144, 0x145 }; printf("%d : %d \n", k, cc - bb); 11 int cccr_val = 0x121, clkcfg_val = 2; } 11 99.5 .85V 199.1 1.0V 298.6 1.1V 12 } 13 cccr_val = settings[speed]; 14 118.0 switch (speed) { 21 235.9 353.9 15 case 6 : clkcfg_val = 3; break; 13V1.3V 16 case 8 : clkcfg_val = 3; break; 31132.7 265.41.1V 398.9 17 case 10 : clkcfg_val = 3; break; 18 default : clkcfg_val = 2; break; 41147.5 294.9 19 } 1.0V 20 memcpy(0x40000000+0x1300000, &cccr_val, 4); 21 CP14_WRTIE_CCLKCFG((g);clkcfg_val); 5 1 165. 9 331. 8 13V1.3V 22 } 23 298. 24 int 12199.1 1.1V 398.1 1.3V 25 get_os_time() 6 26 { 27 int ostime; 2 2 235.9 28 29 memcpy(&ostime, 0x40000000+0xa00010, 4); 32265.4 1.1V 30 return(ostime); 31 } 42294.9 52331.9 1.3V 9 10 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Voltage Scaling in Linux ARM IEM DEMO SVS Kernel module Kernel thread DVS scheduler setNewVoltage() setScaledSpeed() Wake_up 전압 조절 Sleep_ on Wake_up setScaledSpeed() KlhdKernel thread Device driver 2 ~ 3ms ltc1663_i2c_write_data() Wake_up ltc1663_i2c_write_data() Write Driver voltage value Sleep_on LTC1663 DAC Wake_up Write voltage Regulate CPU Vo ltage LTC1663 DAC 11 12 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Successful Low Power S/W Techniques Roadmap 1. Understand workload variations of your target • DVS in Non Real -Time Systems 2. Devise efficient ways to detect them • DVS in Real -Time Systems • Compiler-level DVS: Intra-task DVS 3. Devise efficient ways to utilize the detected workload • OS-level DVS: Inter-task DVS variations using available H/W supports • Application-level DVS – MPEG-decoder implementation • Algorithm-level DVS – Low-power convolution 13 14 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Non Real-Time Jobs DVS for Non Real-Time Jobs • Non Real-Time Jobs • NtiiNo timing cons tittraints • Basic Approach: • No periodic executions • Predict workload based on history information • UkUnknown WCET • Usually based on some variations of interval scheduler – PAST, FLAT – LONG_SHORT , AGED _AVERAGE It is hard to predict the future workload!! – CYCLE, PATTERN, PEAK 15 16 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Key Question PAST • Looking a fixed window into the past • Assume the next window will be like the previous one How can we predict the future workload? • If the past window was • mostly busy ⇒ increase speed • Based on long term history: Hard to adapt quickly for the changed workload • mostly idle ⇒ decrease speed • Based on short term history: Too many clock/voltage changes 17 18 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Example: PAST FLAT • Try to smooth speed to a global average busy time • Make the utilization of next window to be <const> Utilization = window size • Set speed fast enough to complete the predicted new work being pushed into the coming window PAST FUTURE time low utilization low utilization ? Decrease Decrease speed speed 19 20 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Example: FLAT LONG-SHORT • Look up the last 12 windows • Short-term past : 3mostrecentwindows3 most recent windows <Const>=0.7 • Long-term past : the remaining windows ? • Workload Prediction • the utilization of next window will be a weighted time average of these 12 windows’ utilizations I/DthdIncrease/Decrease the speed the next utilization to be 0.7 21 22 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Example: LONG-SHORT AGED-AVERAGE • Employs an exponential-smoothing method utilization = # cycles of busy interval / window size • Workload Prediction 0 .3 .5 1 1 1 .8 .5 .3 .1 0 0 • The utilization of next window will be a weighted average of all previous windows’ utilizations – ggyeometrically reduce the wei ght time 0 12345678 9 10 11 12 0 +.3+.5 +1+1+1+.8 +.5 +.3+ 4(.1+ 0 + 0) current = 0.276 time 9 + 4(3) fclk = 0.276× fmax 23 24 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Example: AGED_AVERAGE CYCLE • Workload Prediction utilization = # cycles of busy interval / window size • Examine the last 16 windows 0 .3 .5 1 1 1 .8 .5 .3 .1 0 0 – Does there exist a cyclic of length X? – If so, predict by extending this cycle time – Otherwise, use the FLAT algorithm 0 12345678 9 10 11 12 1 2 4 8 current average = 0 + 0 + (0.1) + (0.3) + ⋅⋅⋅ time 3 9 27 81 fclk = average× fmax 25 26 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Example: CYCLE utilization = # cycles of busy interval / window size 0 .4 .8 .1 .3 .5 .7 .0 time 0 12345678 current time | 0 − .3 | + | .4 − .5 | + | .8 − .7 | + | .1 − 0 | error measure = = 0.15 4 Predict : The next utilization will be .3 27 28 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU PATTERN Example: PATTERN • A generalized version of CYCLE • Workload Prediction A BCD • Convert the n-most recent windows’ utilizations 0 0250.25 050.5 7257.25 101.0 into a pattern in alphabet {A, B, C, D}. Pattern = ABCDD Pattern = ABCD • Find the same pppattern in the past 0 .3.511 .1.35.6.9 ⋅⋅⋅⋅⋅⋅ time 12345 ⋅⋅⋅⋅⋅⋅ 89 10 11 12 current Predict : The next utilization will be D time 29 30 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Perspectives-based Algorithm Roadmap • Per-Task Basis Performance Predictions • DVS in Non Real -Time Systems • DVS in Real -Time Systems • Compiler-level DVS: Intra-task DVS • OS-level DVS: Inter-task DVS • Application-level DVS – MPEG-decoder implementation k ×WorkEst old + Work fse WorkEst new = • Algorithm-level DVS k + 1 k × Deadline old + Work fse + Idldle – Low-power convolution Deadline new = k + 1 WorkEst Perf = Deadline [Flautner, OSDI2002] 31 32 Low Power SW.2 J. Kim/SNU Low Power SW.2 J. Kim/SNU Two Types of DVS Algorithms Inter-task DVS • Inter-task DVS algorithms • Inter-Task Voltage Scheduling for Hard Real-Time • Determine the supply voltage and clock speed Systems [Yao95, Hong98, Okuma99, Shin99, Lee99]. on task-by-task basis • Problem : Given a set of tasks, how to assign the proper speed to each task dynamically while guaranteeing all their deadlines. • Intra-task DVS algorithms • Task-by-task Speed Assignment • Determine the suppl y volta ge and clock s peed within a single task boundary – The sl ack ti me due to a tas k use d by fo llow ing tasks, not by the current one.
Recommended publications
  • Approaches in Green Computing
    Special Issue - 2015 International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 NSRCL-2015 Conference Proceedings Approaches in Green Computing Reena Thomas Fedrina J Manjaly 3 rd BCA, Department of Computer Science 3 rd BCA , Department of Computer Science Carmel College Carmel College Mala, Thrissur Mala, Thrissur Abstract— In a 2008 article San Murugesan defined green computing as "the study and practice of designing, manufacturing, using, and disposing of computers, servers, and associated subsystems — such as monitors, printers, storage devices, and networking and communications systems — efficiently and effectively with minimal or no impact on the environment."Murugesan lays out four paths along which he believes the environmental effects of computing should be addressed:Green use, green disposal, green design, and green manufacturing. Green computing can also develop solutions that offer benefits by "aligning all IT processes and practices with the core principles of sustainability, which are to reduce, reuse, and recycle; and finding innovative ways to use IT in business processes to deliver sustainability benefits across the Figure 1: Green Computing Migration Framework enterprise and beyond". I. INTRODUCTION II. APPROACHES In 1992, the U.S. Environmental Protection Agency A. Product longevity launched Energy Star, a voluntary labeling program that is Gartner maintains that the PC manufacturing process designed to promote and recognize energy-efficiency in accounts for 70% of the natural resources used in the life monitors, climate control equipment, and other cycle of a PC. More recently, Fujitsu released a Life Cycle technologies. This resulted in the widespread adoption of Assessment (LCA) of a desktop that show that sleep mode among consumer electronics.
    [Show full text]
  • Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems
    Old Dominion University ODU Digital Commons Electrical & Computer Engineering Theses & Dissertations Electrical & Computer Engineering Winter 2007 Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems Hau Trung Ngo Old Dominion University Follow this and additional works at: https://digitalcommons.odu.edu/ece_etds Part of the Electrical and Computer Engineering Commons Recommended Citation Ngo, Hau T.. "Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems" (2007). Doctor of Philosophy (PhD), Dissertation, Electrical & Computer Engineering, Old Dominion University, DOI: 10.25777/j6kw-q685 https://digitalcommons.odu.edu/ece_etds/185 This Dissertation is brought to you for free and open access by the Electrical & Computer Engineering at ODU Digital Commons. It has been accepted for inclusion in Electrical & Computer Engineering Theses & Dissertations by an authorized administrator of ODU Digital Commons. For more information, please contact [email protected]. POWER-AW ARE DESIGN METHODOLOGIES FOR FPGA-BASED IMPLEMENTATION OF VIDEO PROCESSING SYSTEMS By Hau Trung Ngo B. S. May 2001, Old Dominion University M. S. May 2003, Old Dominion University A Dissertation Submitted to the Faculty of Old Dominion University in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY ELECTRICAL AND COMPUTER ENGINEERING OLD DOMINION UNIVERSITY December 2007 Approved by: Vijayan i K. Asari (Direc(Director) Shirshak K. Dhali (Member) Min Song (Member) Ravi Mukkdmala (Member) Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. ABSTRACT POWER-AWARE DESIGN METHODOLOGIES FOR FPGA-BASED IMPLEMENTATION OF VIDEO PROCESSING SYSTEMS Hau Trung Ngo Old Dominion University Director: Dr. Vijayan Asari The increasing capacity and capabilities of FPGA devices in recent years provide an attractive option for performance-hungry applications in the image and video processing domain.
    [Show full text]
  • A Dynamic Voltage Scaling Algorithm for Sporadic Tasks∗
    In: Proceedings of the 24th IEEE Real-Time Systems Symposium, Cancun, Mexico, December 2003, pp. 52-62. A Dynamic Voltage Scaling Algorithm for Sporadic Tasks¤ Ala0 Qadi Steve Goddard Shane Farritor Computer Science & Engineering Mechanical Engineering University of Nebraska—Lincoln University of Nebraska - Lincoln Lincoln, NE 68588-0115 Lincoln, NE 68588-0656 faqadi,[email protected] [email protected] Abstract In CMOS circuits the power consumed by a CMOS gate is proportional to the square of the voltage applied to the Dynamic voltage scaling (DVS) algorithms save energy circuit, as shown by Equation (1) where CL is the gate load by scaling down the processor frequency when the proces- capacitance (output capacitance),VDD is the supply voltage sor is not fully loaded. Many algorithms have been proposed and f is the clock frequency [29]. The circuit delay td is for periodic and aperiodic task models but none support the given by Equation (2) where k is a constant depending on canonical sporadic task model. A DVS algorithm, called the output gate size and the output capacitance and VT is DVSST, is presented that can be used with sporadic tasks the threshold voltage [29]. The clock frequency is inversely in conjunction with preemptive EDF scheduling. The algo- proportional to the circuit delay; it is expressed using td and rithm is proven to guarantee each task meets its deadline the logic depth of a critical path as in Equation (3) where Ld while saving the maximum amount of energy possible with is the depth of the critical path [29]. processor frequency scaling.
    [Show full text]
  • Comptia Fc0-Gr1 Exam Questions & Answers
    COMPTIA FC0-GR1 EXAM QUESTIONS & ANSWERS Number : FC0-GR1 Passing Score : 800 Time Limit : 120 min File Version : 31.4 http://www.gratisexam.com/ COMPTIA FC0-GR1 EXAM QUESTIONS & ANSWERS Exam Name: CompTIA Strata Green IT Exam Visualexams QUESTION 1 A small business currently has a server room with a large cooling system that is appropriate for its size. The location of the server room is the top level of a building. The server room is filled with incandescent lighting that needs to continuously stay on for security purposes. Which of the following would be the MOST cost-effective way for the company to reduce the server rooms energy footprint? A. Replace all incandescent lighting with energy saving neon lighting. B. Set an auto-shutoff policy for all the lights in the room to reduce energy consumption after hours. C. Replace all incandescent lighting with energy saving fluorescent lighting. D. Consolidate server systems into a lower number of racks, centralizing airflow and cooling in the room. Correct Answer: C Section: (none) Explanation Explanation/Reference: QUESTION 2 Which of the following methods effectively removes data from a hard drive prior to disposal? (Select TWO). A. Use the remove hardware OS feature B. Formatting the hard drive C. Physical destruction D. Degauss the drive E. Overwriting data with 1s and 0s by utilizing software Correct Answer: CE Section: (none) Explanation Explanation/Reference: QUESTION 3 Which of the following terms is used when printing data on both the front and the back of paper? A. Scaling B. Copying C. Duplex D. Simplex Correct Answer: C Section: (none) Explanation Explanation/Reference: QUESTION 4 A user reports that their cell phone battery is dead and cannot hold a charge.
    [Show full text]
  • Vertigo: Automatic Performance-Setting for Linux
    USENIX Association Proceedings of the 5th Symposium on Operating Systems Design and Implementation Boston, Massachusetts, USA December 9–11, 2002 THE ADVANCED COMPUTING SYSTEMS ASSOCIATION © 2002 by The USENIX Association All Rights Reserved For more information about the USENIX Association: Phone: 1 510 528 8649 FAX: 1 510 548 5738 Email: [email protected] WWW: http://www.usenix.org Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. Vertigo: Automatic Performance-Setting for Linux Krisztián Flautner Trevor Mudge [email protected] [email protected] ARM Limited The University of Michigan 110 Fulbourn Road 1301 Beal Avenue Cambridge, UK CB1 9NJ Ann Arbor, MI 48109-2122 Abstract player, game machine, camera, GPS, even the wallet— into a single device. This requires processors that are Combining high performance with low power con- capable of high performance and modest power con- sumption is becoming one of the primary objectives of sumption. Moreover, to be power efficient, the proces- processor designs. Instead of relying just on sleep mode sors for the next generation communicator need to take for conserving power, an increasing number of proces- advantage of the highly variable performance require- sors take advantage of the fact that reducing the clock ments of the applications they are likely to run. For frequency and corresponding operating voltage of the example an MPEG video player requires about an order CPU can yield quadratic decrease in energy use.
    [Show full text]
  • Power Reduction Techniques for Microprocessor Systems
    Power Reduction Techniques For Microprocessor Systems VASANTH VENKATACHALAM AND MICHAEL FRANZ University of California, Irvine Power consumption is a major factor that limits the performance of computers. We survey the “state of the art” in techniques that reduce the total power consumed by a microprocessor system over time. These techniques are applied at various levels ranging from circuits to architectures, architectures to system software, and system software to applications. They also include holistic approaches that will become more important over the next decade. We conclude that power management is a multifaceted discipline that is continually expanding with new techniques being developed at every level. These techniques may eventually allow computers to break through the “power wall” and achieve unprecedented levels of performance, versatility, and reliability. Yet it remains too early to tell which techniques will ultimately solve the power problem. Categories and Subject Descriptors: C.5.3 [Computer System Implementation]: Microcomputers—Microprocessors;D.2.10 [Software Engineering]: Design— Methodologies; I.m [Computing Methodologies]: Miscellaneous General Terms: Algorithms, Design, Experimentation, Management, Measurement, Performance Additional Key Words and Phrases: Energy dissipation, power reduction 1. INTRODUCTION of power; so much power, in fact, that their power densities and concomitant Computer scientists have always tried to heat generation are rapidly approaching improve the performance of computers. levels comparable to nuclear reactors But although today’s computers are much (Figure 1). These high power densities faster and far more versatile than their impair chip reliability and life expectancy, predecessors, they also consume a lot increase cooling costs, and, for large Parts of this effort have been sponsored by the National Science Foundation under ITR grant CCR-0205712 and by the Office of Naval Research under grant N00014-01-1-0854.
    [Show full text]
  • Vertigo: Automatic Performance-Setting for Linux
    Vertigo: Automatic Performance-Setting for Linux Krisztián Flautner ARM Limited, Cambridge, UK Trevor Mudge The University of Michigan Presented by Choi Hojung Background(1) • In 2002 – need for low power and high performance processors – from embedded computers to servers – high performance – battery operated Background(2) – Intel SpeedStep • SpeedStep by Intel – No built-in performance-setting policy – A simple approach by the usage model – When on AC power, processor runs at higher speed. – When on battery power , processor runs at a slower speed, thus saving battery power Background(3) - LongRun • LongRun for Crusoe™, by Transmeta – power management that dynamically manages the frequency and voltage levels at runtime – use historical utilization to guide clock rate selection – in processor’s firmware – interval-based algorithm Background(4) - LongRun • Flaws & Questions – in Processor’s firmware – utilization periods can be obscured when all tasks are observed in the aggregate – not have any information about interactive performance in operating system level – a single algorithm perform well under all conditions Background(5) - DVS • Dynamic voltage scaling(DVS) – also called Dynamic Voltage and Frequency Scaling(DVFS) – reduces the power consumed by a processor by lowering its operating voltage – P : power consumption - V : the supply voltage – C : the capacitance - f : operating frequency Proposed • What Vertigo proposed – Implemented in OS kernel level to use a richer set of data for prediction – to reduce the processor’s performance
    [Show full text]
  • ECE 571 – Advanced Microprocessor-Based Design Lecture 28
    ECE 571 { Advanced Microprocessor-Based Design Lecture 28 Vince Weaver http://web.eece.maine.edu/~vweaver [email protected] 9 November 2020 Announcements • HW#9 will be posted, read AMD Zen 3 Article • Remember, no class on Wednesday 1 When can we scale CPU down? • System idle • System memory or I/O bound • Poor multi-threaded code (spinning in spin locks) • Thermal emergency • User preference (want fans to run less) 2 Non-CPU power saving • RAM • GPU • Ethernet / Wireless • Disk • PCI • USB 3 GPU power saving • From Intel lesswatts.org ◦ Framebuffer Compression ◦ Backlight Control ◦ Minimized Vertical Blank Interrupts ◦ Auto Display Brightness • from LWN: http://lwn.net/Articles/318727/ ◦ Clock gating or reclocking ◦ Fewer memory accesses: compression. Simpler background image, lower power 4 ◦ Moving mouse: 15W. Blinking cursor: 2W ◦ Powering off unneeded output port, 0.5W ◦ LVDS (low-voltage digital signaling) interface, lower refresh rate, 0.5W (start getting artifacts) 5 More LCD • When LCD not powered, not twisted, light comes through • Active matrix display, transistor and capacitor at each pixel (which can often have 255 levels of brightness). Needs to be refreshed like memory. One row at a time usually. 6 Ethernet • PHY (transmitter) can take several watts • WOL can draw power when system is turned off • Gigabit draw 2W-4W more than 100Megabit 10 Gigabit 10-20W more than 100Megabit • Takes up to 2 seconds to re-negotiate speeds • Green Ethernet IEEE 802.3az 7 WLAN • power-save poll { go to sleep, have server queue up packets. latency • Auto association { how aggressively it searches for access points • RFKill switch • Unnecessary Bluetooth 8 Disks • SATA Aggressive Link Power Management { shuts down when no I/O for a while, save up to 1.5W • Filesystem atime • Disk power management (spin down) (lifetime of drive) • VM writeback { less power if queue up, but power failure potentially worse 9 Soundcards • Low-power mode 10 USB • autosuspend.
    [Show full text]
  • Dynamic Thermal Management for High-Performance Microprocessors
    Dynamic Thermal Management for High-Performance Microprocessors David Brooks Margaret Martonosi Department of Electrical Engineering Department of Electrical Engineering Princeton University Princeton University dbrooks @ee.princeton.edu mrm @ee.princeton.edu Abstract [23]. The second major source of design complexity in- volves power delivery, specifically the on-chip decoupling With the increasing clock rate and transistor count of capacitances required by the power distribution network. today’s microprocessors, power dissipation is becoming a Unfortunately, these cooling techniques must be de- critical component of system design complexity. Thermal signed to withstand the maximum possible power dissipa- and power-delivery issues are becoming especially critical tion of the microprocessor, even if these cases rarely oc- for high-performance computing systems. cur in typical applications. For example, while the Alpha In this work, we investigate dynamic thermal manage- ment as a technique to control CPUpower dissipation. With 21 264 processor is rated as having a maximum power dis- the increasing usage of clock gating techniques, the average sipation of 95W when running “max-power” benchmarks, power dissipation typically seen by common applications is the average power dissipation was found to be only 72W becoming much less than the chip’s rated maximum power for typical applications [IO]. The increased use of clock dissipation. However; system designers still must design gating and other power management techniques that target thermal heat sinks to withstand the worst-case scenario. average power dissipation will expand this gap even further We define and investigate the major components of any in future processors. This disparity between the maximum dynamic thermal management scheme.
    [Show full text]
  • An Analysis of Efficient Multi-Core Global Power Management Policies
    An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget Canturk Isci†, Alper Buyuktosunoglu†, Chen-Yong Cher†, Pradip Bose† and Margaret Martonosi∗ †IBM T.J. Watson Research Center ∗Department of Electrical Engineering Yorktown Heights Princeton University Abstract causes super-linear growth in non-core area and power in order to meet scalable performance targets. Chip-level power and thermal implications will continue to As such, the power dissipation problem did not disappear rule as one of the primary design constraints and performance in the new multi-core regime. In fact, power and peak tem- limiters. The gap between average and peak power actually perature continue to be the key performance limiters, even as widens with increased levels of core integration. As such, if other constraints like chip I/O bandwidth and intra/inter chip per-core control of power levels (modes) is possible, a global data bandwidth and available area for on-chip cache begin to power manager should be able to dynamically set the modes emerge as new (secondary) limiters. Consequently, power and suitably. This would be done in tune with the workload char- thermal management of multi-core (often organized as CMP) acteristics, in order to always maintain a chip-level power that chips has been one of the primary design constraints and an is below the specified budget. Furthermore, this should be pos- active research area. sible without significant degradation of chip-level throughput Prior research in this area has been primarily limited to performance. We analyze and validate this concept in detail in studying the efficacy of providing dynamic management of this paper.
    [Show full text]
  • ECE 571 – Advanced Microprocessor-Based Design Lecture 19
    ECE 571 { Advanced Microprocessor-Based Design Lecture 19 Vince Weaver http://www.eece.maine.edu/∼vweaver [email protected] 2 April 2013 Project/HW Reminder • Homework #4 was posted, due on Friday. • Hardware issues with x86. Procure machine? 1 Metrics Clarification • Energy delay = E*t (J*s), Energy delay squared = E*t*t (J*s*s), Smaller is better • In related papers it is confusing, as no one shows formula (despite academic papers loving formulas). Often they use the inverse (so larger is better?) which also confuses things • Other papers use MIPS/Watt = insn/J Not cross-platform. You really want to optimize for time, 2 not for instructions which can vary for lots of reasons and doesn't exactly equate to time. 3 DVFS and other CPU Power/Energy Saving Methods • A lot of related work • Will focus on actual implementations rather than academic papers this time 4 CMOS Energy Equation, Again 2 • Etot = [(CtotVddαf) + (NtotIleakageVdd)]t 5 Delay, Again CLVdd • Td = W µCox( L )(Vdd−Vt) 2 (Vdd−Vt) • Simplifies to fMAX ∼ Vdd • If you lower f, you can lower Vdd 6 DVFS • Voltage planes { on CMP might share voltage planes so have to scale multiple processors at a time • DC to DC converter, programmable. • Phase-Locked Loops. Orders of ms to change. Multiplier of some crystal frequency. • Senger et al ISCAS 2006 lists some alternatives. Two phase locked loops? High frequency loop and have programmable divider? 7 • Often takes time, on order of milliseconds, to switch frequency. Switching voltage can be done with less hassle. 8 When can we scale
    [Show full text]
  • Power Savings in Mpsoc
    DEGREE PROJECT, IN , SECOND LEVEL STOCKHOLM, SWEDEN 2009 Power Savings in MPSoC IOANNIS SAVVIDIS KTH ROYAL INSTITUTE OF TECHNOLOGY ICT SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY TRITA TRITA-ICT-EX-2009:158 www.kth.se Power Savings in MPSoC Ioannis T. Savvidis Master of Science Thesis Stockholm, Sweden 2009 TRITA -ICT -EX -2009:158 PPOOWWEERR SSAAVVIINNGGSS IINN MMPPSSOOCC by Ioannis T. Savvidis [email protected] , [email protected] A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE Master of Science Supervisors: Industry: Tume Wihamre KTH: Johnny Öberg Examiner: Prof. Ahmed Hemani Ericsson AB Royal Institute of Technology (KTH) Stockholm, September 2009 This page has been intentionally left blank. ii “For the things we have to learn before we can do, we learn by doing” Aristotle, 384BC (Stageira) - 322BC (Chalcis) Ioannis Savvidis: Power Savings in MPSoC MSc dissertation project, ©September 2009 This thesis report is submitted in partial fulfillment of the requirements for the degree of Master of Science at the Royal Institute of Technology (KTH), Stockholm, Sweden. Content is based on the work carried out in the Department of Digital ASIC, Signal Processing HW at Ericsson AB, Kista, Stockholm, Sweden. Copyright and similar legal issues are regulated by employment contracts established between ERICSSON AB and the author. iii This page has been intentionally left blank. iv Abstract High performance integrated circuits suffer from a permanent increase of the dissipated power per square millimeter of silicon, over the past years. This phenomenon is driven by the miniaturization of CMOS processes, increasing packing density of transistors and increasing clock frequencies of microchips, thus pushing heat removal and power distribution to the forefront of the problems confronting the advance of microelectronics.
    [Show full text]