Dynamic Frequency and Voltage Scaling for a Multiple-Clock-Domain Microprocessor

Multiple clock domains is one solution to the increasing problem of propagating the clock signal across increasingly larger and faster chips. The ability to independently scale frequency and voltage in each domain creates a powerful means of reducing power dissipation.

Grigorios Magklis, Intel
Greg Semeraro, Rochester Institute of Technology
David H. Albonesi, Steven G. Dropsho, Sandhya Dwarkadas, and Michael L. Scott, University of Rochester

Published by the IEEE Computer Society in IEEE Micro (Micro Top Picks), November–December 2003. 0272-1732/03/$17.00 2003 IEEE

Demand for higher processor performance has led to a dramatic increase in clock frequency as well as an increasing number of transistors in the processor core. As chips become faster and larger, designers face significant challenges, including global clock distribution and power dissipation.

A multiple clock domain (MCD) microarchitecture [1], which uses a globally asynchronous, locally synchronous (GALS) clocking style [2, 3], permits future aggressive frequency increases, maintains a synchronous design methodology, and exploits the trend of making functional blocks more autonomous. In MCD, each processor domain is internally synchronous, but domains operate asynchronously with respect to one another. Designers still apply existing synchronous design techniques to each domain, but global clock skew is no longer a constraint. Moreover, domains can have independent voltage and frequency control, enabling dynamic voltage scaling at the domain level.

Global dynamic voltage scaling already appears in many systems and can help reduce power dissipation for rate-based and partially idle workloads. An MCD architecture can save power even during intensive computation by slowing domains that are comparatively unimportant to the application’s current critical path, even when it is impossible to completely gate off those domains. The disadvantage is the need for interdomain synchronization, which, because of buffering, out-of-order execution, and superscalar data paths, has a relatively minor impact on overall performance, less than 2 percent [4].

MCD potentially has a significant energy advantage with only modest performance cost, if the frequencies and voltages of the various domains assume appropriate values at appropriate times [1]. Designers can implement this control function completely online in hardware, making it transparent to the user and system software [4]. Online control is useful in environments where legacy applications must run without modification, or significant user involvement is undesirable. Otherwise, profiling and instrumentation of the application provides a more global view of the program than in a hardware implementation, and has the potential to provide better results, if the behavior observed during the profiling run is consistent with that occurring in production [5]. This article briefly summarizes both of these approaches and compares their performance against a near-optimal offline technique.

MCD microarchitecture

The MCD microarchitecture [1] consists of four different on-chip clock domains, shown in Figure 1, each with independent control of frequency and voltage. In choosing the boundaries among domains, we identified points where

• there already existed a queue structure that decoupled different pipeline functions or
• relatively little interfunction communication occurred.

Main memory is external to the processor, and we can view it as a fifth domain that always runs at full speed.

Figure 1. MCD processor block diagram. (Block labels: front end; L1 instruction cache; fetch unit; ROB, rename, dispatch; integer unit; integer issue queue; integer ALUs and register file; floating-point unit; floating-point issue queue; floating-point ALUs and register file; load/store unit; L1 data cache; L2 cache; external memory; main memory.)

We based our frequency and voltage-scaling model on the Intel XScale processor (as described by L.T. Clark in the short course “Circuit Design of Xscale Microprocessors,” at the 2001 Symp. VLSI Circuits). The XScale continues to execute through the voltage/frequency change. There is, however, a substantial delay before the change becomes fully effective.

Key to MCD’s fine-grained adaptation is efficient, on-chip voltage scaling circuitry, a rapidly emerging technology. New microinductor technologies are paving the way for highly-efficient, on-chip, buck converters [6]. This circuit technology should be mature enough for commercialization within the next few years, and the MCD microarchitecture, including the voltage control algorithms we present, will be ready to take advantage of the technology.

Online control algorithm

Analysis of processor resource utilization reveals a correlation, over an interval of instructions, between the valid entries in the input queue (for each of the integer, floating-point, and load/store domains) and the desired frequency for the domain. This correlation follows from considering the instruction processing core as the domain queue’s sink and the front end as the source. Queue utilization indicates the rate at which instructions flow through the core; if utilization increases, instructions are not flowing fast enough. Queue utilization is thus an appropriate metric for dynamically determining the desired domain frequency (except in the front-end domain, which the online algorithm does not attempt to control).

This correlation between issue queue utilization and desired frequency is not without challenges. Notable among them is that changes in a domain’s frequency might affect the issue queue utilization of that domain and possibly others. This interaction among the domains is a potential source of error that might degrade performance beyond acceptable thresholds or lead to lower-than-expected energy savings. Interactions might lead to instability in domain frequencies, as changes in the other domains influence each particular domain.

The online algorithm consists of two components that act independently but cooperatively. The result is a frequency curve that approximates the envelope of the queue utilization curve, creating a small performance degradation and a significant energy savings. In general, an envelope detection algorithm reacts quickly to sudden changes in the input signal (queue utilization, in this case). In the absence of significant changes, this algorithm slowly decreases the controlling parameter.

Such an approach represents a feedback control system. For a control system, if the plant (the entity under control) and the control point (the parameter being adjusted) are linearly related, then the system will be stable, and the control point will correctly adjust to changes in the plant. Because of the rapid adjustments necessary for significant changes in utilization and the otherwise slow adjustments, we call the approach an attack/decay algorithm [4]. The attack-decay-sustain-release (ADSR) envelope-generating techniques in signal processing and signal synthesis inspired this algorithm [7].

The MCD architecture employs the attack/decay algorithm independently in each back-end domain. The hardware counts the entries in the domain issue queue over a 10,000-instruction interval. Using that number and the corresponding number from the prior interval, the algorithm determines if there has been a significant change (a threshold of 1.75 percent), in which case the algorithm uses the attack mode: The frequency changes (up or down as appropriate) by a modest amount (6 percent). If no significant change occurs or if there is no activity in the domain, the algorithm uses the decay mode: It decreases the domain frequency slightly (0.175 percent).

In all cases, if the overall instructions per cycle (IPC) changes by more than a certain threshold (2.5 percent), the frequency remains unchanged for that interval. This convention identifies natural decreases in performance that are unrelated to the domain frequency and prevents the algorithm from reacting to them. Thresholding tends to reduce the interaction of a domain with adjustments in other domains. The IPC performance counter is the only global information that is available to all domains.

To protect against settling at a local minimum when a global minimum exists, the algorithm forces an attack whenever a domain frequency has been at one extreme or the other for 10 consecutive intervals. This is a common technique to apply when a control system reaches an end point and the plant/control relationship becomes undefined.

Profile-based control algorithm

The profile-based control algorithm has four phases: It

• uses standard performance profiling techniques to identify subroutines and loop nests that run long enough to justify reconfiguration;
• constructs a directed acyclic graph (DAG) that represents dependences among domain operations in these long-running fragments of code, and distributes the slack in the DAG to minimize energy;
• uses per-domain histograms of operating frequencies to identify, for each long-running code fragment, the minimum frequency for each domain that would

Figure 2. Call tree with associated instruction counts. The shaded nodes are candidates for reconfiguration.

After running the binary code and collecting its statistics, we annotate each tree node with the dynamic instances and the total instructions executed, from which we can
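
As an illustration of the online control algorithm described earlier, the attack/decay rule can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' hardware implementation. The names `DomainController`, `rel_change`, and `step` are hypothetical, as are the 250 MHz to 1 GHz frequency range, the precedence of the IPC check over the forced attack, and the direction of the forced attack (away from the extreme). Only the percentages and the 10-interval end-stop count are taken from the text.

```python
from dataclasses import dataclass

# Values quoted in the article text.
DEVIATION_THRESHOLD = 0.0175  # "significant change" in queue utilization
ATTACK_STEP = 0.06            # frequency change in attack mode
DECAY_STEP = 0.00175          # frequency decrease in decay mode
IPC_THRESHOLD = 0.025         # IPC change that freezes the frequency
ENDSTOP_INTERVALS = 10        # intervals at an extreme before a forced attack

# ASSUMPTION: the frequency range is not given in this excerpt.
F_MIN_MHZ = 250.0
F_MAX_MHZ = 1000.0


def rel_change(current: float, previous: float) -> float:
    """Relative change between two interval counters."""
    if previous == 0:
        return 0.0 if current == 0 else float("inf")
    return (current - previous) / previous


@dataclass
class DomainController:
    """Hypothetical per-domain controller, stepped once per interval."""
    freq_mhz: float = F_MAX_MHZ
    prev_util: float = 0.0   # queue-entry count from the prior interval
    prev_ipc: float = 0.0    # global IPC from the prior interval
    endstop_count: int = 0   # consecutive intervals spent at an extreme

    def step(self, util: float, ipc: float) -> float:
        """Consume one interval's counters and return the new frequency."""
        d_util = rel_change(util, self.prev_util)
        d_ipc = rel_change(ipc, self.prev_ipc)
        self.prev_util, self.prev_ipc = util, ipc

        at_min = self.freq_mhz <= F_MIN_MHZ
        at_max = self.freq_mhz >= F_MAX_MHZ
        self.endstop_count = self.endstop_count + 1 if (at_min or at_max) else 0

        if abs(d_ipc) > IPC_THRESHOLD:
            # Large IPC swing: leave the frequency unchanged this interval.
            scale = 1.0
        elif self.endstop_count >= ENDSTOP_INTERVALS:
            # Forced attack; ASSUMPTION: it moves away from the extreme.
            scale = 1 + ATTACK_STEP if at_min else 1 - ATTACK_STEP
            self.endstop_count = 0
        elif util > 0 and abs(d_util) > DEVIATION_THRESHOLD:
            # Attack: follow the direction of the utilization change.
            scale = 1 + ATTACK_STEP if d_util > 0 else 1 - ATTACK_STEP
        else:
            # Decay: no significant change, or no activity in the domain.
            scale = 1 - DECAY_STEP

        self.freq_mhz = min(F_MAX_MHZ, max(F_MIN_MHZ, self.freq_mhz * scale))
        return self.freq_mhz
```

In this sketch each back-end domain would own one controller and call `step` once per 10,000-instruction interval, passing that interval's queue-entry count and the global IPC.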