Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky OUTLINE

Total Page:16

File Type:pdf, Size:1020Kb

Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky OUTLINE PRACTICAL POWER GATING AND DYNAMIC VOLTAGE/FREQUENCY SCALING Stephen Kosonocky AMD Fellow 1 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky OUTLINE . Motivation for DVFS and Power Gating . Dynamic Voltage/Frequency Scaling . Power Gating . Conclusion 2 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky MOTIVATION 3 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky SERVER COST TRENDS . Server compute capacity/$ is increasing steadily – Driving up total power/$ spent on servers – Also driving up electrical power spending . Green initiative and focus on energy independence creates additional pressure to lower energy costs . Increased energy efficiency directly reduces costs • Source Data: Uptime Institute, 2009 4 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky MOBILE AND DESKTOP Now: Parallel/Data-Dense . High degree of integration of 16:9 @ 7 megapixels special functions HD video flipcams, phones, webcams (1GB) . Diverse workloads 3D Internet apps and HD video online, social networking w/HD files – Data serial, data parallel 3D Blu-ray HD – Streaming Multi-touch, facial/gesture/voice – Compute intensive in bursts recognition + mouse & keyboard – Long periods of inactivity All day computing (8+ Hours) . Small form factor Immersive and . Long battery life interactive performance . Instant access of the device . Autonomous power management Workloads - Samuel Naffziger, VLSI Symposium 2011, Kyoto 5 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky POWER/PERFORMANCE TRADE-OFF Processor Power-Frequency/Voltage 35 12 30 Perf/Watt 10 25 Power Performance/Watt 8 maximized at the 20 lowest voltage operating point 6 15 Watts/Core 4 Frequency (GHz) Frequency 10 Fmax/Voltage 2 5 0 0 0.725 0.775 0.825 0.875 0.925 0.95 1.025 1.075 1.125 1.175 1.225 Voltage Performance/Watt maximized at VMIN 6 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky HIGH PERFORMANCE CORE POWER Depending on process node, leakage ~ 40% total power 7 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky Two major knobs have emerged for controlling power 1. Dynamic Voltage and Frequency Scaling – Optimize performance for the application while it’s running 2. Power Gating – Gate power during idle periods Each present unique challenges for implementation and optimization 8 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky DYNAMIC VOLTAGE/FREQUENCY SCALING 9 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky LLANO ACCELERATED PROCESSING UNIT (APU) • Integration Provides Improvement • Eliminate power and latency of extra chip crossing • 3X bandwidth between GPU and Memory • Same sized GPU is substantially more effective • Power efficient, advanced technology for both CPU and GPU • Thermal management optimization between CPU and GPU ~ Power Breakdown . Key features for Mobile Mark 07 IO Analog PHY – Lower Idle, Active power CPU – DVFS / Clock-gating / Power Gating NB – Robust and Flexible Power Management Control DDR GPU PHY 10 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky APU POWER MANAGEMENT - Sebastien Nussbaum, IEEE Vail Computer Elements, June 22th, 2009 11 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky DYNAMIC VOLTAGE SCALING Voltage . Dynamic frequency and voltage scaling (DVFS) P-states: Core executing at some frequency V . Match the frequency of operation with P-states C1/C2: Core halted and frequency ramped down the workload . Lower voltage to sustain required frequency C3/C4: Core halted, clocks ramped to even lower VAltvid frequency, maintain caches, maintain core Examples: . P-states for active control of V C5: Core halted, clocks ramped to even lower deeper Altvid frequency, flush caches, maintain core voltage/frequency . C-states for degrees of idle control C6: Core off, flush caches, core state saved to 0V DRAM 12 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky DESIGN FOR DVFS . Optimization of timing across entire voltage range necessary for efficiency DVFS . Required two separate timing corners with equally aggressive cycle time goals – 0.8V & 1.3V . Both gate-dominated timing paths and paths with high RC content have been tuned to provide better scaling than the prior design . Enables the core to thoroughly exploit boost capability 13 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky APU DVFS - Sebastien Nussbaum, IEEE Vail Computer Elements, June 22th, 2009 14 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky APU THERMAL PROFILE 110.0 105.0 100.0 95.0 90.0 85.0 CPU-centric workload - Samuel Naffziger, VLSI Symposium 2011, Kyoto 15 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky APU THERMAL PROFILE 110.0 105.0 100.0 95.0 90.0 85.0 GPU-centric data parallel workload - Samuel Naffziger, VLSI Symposium 2011, Kyoto 16 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky APU THERMAL PROFILE 110.0 105.0 100.0 95.0 90.0 85.0 Balanced workload - Samuel Naffziger, VLSI Symposium 2011, Kyoto 17 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky WHAT HAPPENS WHEN WE ADD MORE CORES? CPU 0 CPU 1 Already at 6, 2 high current supplies PLL PLL VDD CPU 2 CPU 3 PLL PLL VDDIO, VLDT VRM DIMMs VTT PLLs SVI PLLs Graphics Misc Northbridge PLL VDDA VDDNB 18 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky PACKAGE LIMITS . Packages add constraints to increasing number of discrete Typical low cost package structure supplies Chip . Typically only use 4 thick layers for high current power distribution . 3-4 thin build-up layers on each side – Used for secondary supply and signal routing to pins . Difficult to add many more supply rails 19 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky TYPICAL PACKAGE LAYER ROUTING LAYERS VDDR VDDA VDDNB VDD VDDIO M1 – PWR (top metal layer) M2 – routing layer 1 Package capacitors Routing congestion Space limited for capacitors 20 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky CORE VOLTAGE POWER DELIVERY IN PACKAGE VDD Max Package Droop VSS Max Package Droop 2.4x -0.025 0.041 higher -0.0255 0.04 -0.026 0.039 0.038 -0.0265 0.037 -0.027 0.036 0.035 -0.0275 1 7 0.034 -0.028 13 0.033 S29 19 0.032 -0.0285 25 S22 S25 S22 -0.029 S15 31 S19 1 S16 4 7 S13 S8 10 S10 S7 13 16 19 S4 22 S1 25 S1 28 31 34 . Package Routing Constraints forces compromises for VDD plane – More difficult with increasing VDD planes . More resources dedicated to VSS – Shows less droop 21 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky WHAT DOES THE ACTUAL VDD LOOK LIKE? 25ns trace of DGEMM benchmark 1.35 1.4 1.3 1.2 ~50mV 1 1.25 0.8 VDD 1.2 0.6 0.4 1.15 Current Normalized 0.2 1.1 0 Time X86 core Supply simulation w/ package board model Increased local decoupling cap could help 22 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky PACKAGE CAPS FOR SUPPLY DECOUPLING Most effective Marginally effective Hard to find places for additional components 23 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky LARGE ON-DIE DECOUPLING CAPACITOR FOR REDUCTION OF SUPPLY NOISE Trench Capacitor Shift resonant frequency from 80MHz -> 7MHz - D. Wendel, et.al. “The Implementation of POWER7: A Highly Parallel and Scalable Multi-Core High-End Server Processor”, ISSCC 2010 Adds significant process cost 100fF/um2 ~ 25x thick-ox 24 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky METHODS FOR INTEGRATED VOLTAGE REGULATION . Switched capacitor regulator . Buck converter . Linear regulator . Switched voltages 25 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky INTEGRATED SWITCHED CAPACITOR REGULATOR . Efficiency dependent on load current 2:1 Conversion – Most efficient at 2:1 voltage conversion ratio – Limits number of switches in the path – ~90% for 2:1, ~70% for other values . Requires large capacitors – Can use trench capacitor for better area efficiency . Large voltage and current ripple – Can mitigate with large decap and/or Leland Chang, Robert K. Montoye, Brian L. Ji, Alan J. Weger, Kevin G. Stawiasz, and Robert H. Dennard, “A interleaved converters Fully-Integrated Switched-Capacitor 2:1 Voltage Converter with Regulation Capability and 90% Efficiency at 2.3A/mm2”, VLSI Symposium , 2010 26 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky INTEGRATED BUCK CONVERTER . Integrated inductor options . ~ 0.33nH possible
Recommended publications
  • Instruction-Level Distributed Processing
    COVER FEATURE Instruction-Level Distributed Processing Shifts in hardware and software technology will soon force designers to look at microarchitectures that process instruction streams in a highly distributed fashion. James E. or nearly 20 years, microarchitecture research In short, the current focus on instruction-level Smith has emphasized instruction-level parallelism, parallelism will shift to instruction-level distributed University of which improves performance by increasing the processing (ILDP), emphasizing interinstruction com- Wisconsin- number of instructions per cycle. In striving munication with dynamic optimization and a tight Madison F for such parallelism, researchers have taken interaction between hardware and low-level software. microarchitectures from pipelining to superscalar pro- cessing, pushing toward increasingly parallel proces- TECHNOLOGY SHIFTS sors. They have concentrated on wider instruction fetch, During the next two or three technology generations, higher instruction issue rates, larger instruction win- processor architects will face several major challenges. dows, and increasing use of prediction and speculation. On-chip wire delays are becoming critical, and power In short, researchers have exploited advances in chip considerations will temper the availability of billions of technology to develop complex, hardware-intensive transistors. Many important applications will be object- processors. oriented and multithreaded and will consist of numer- Benefiting from ever-increasing transistor budgets ous separately
    [Show full text]
  • Clock Gating for Power Optimization in ASIC Design Cycle: Theory & Practice
    Clock Gating for Power Optimization in ASIC Design Cycle: Theory & Practice Jairam S, Madhusudan Rao, Jithendra Srinivas, Parimala Vishwanath, Udayakumar H, Jagdish Rao SoC Center of Excellence, Texas Instruments, India (sjairam, bgm-rao, jithendra, pari, uday, j-rao) @ti.com 1 AGENDA • Introduction • Combinational Clock Gating – State of the art – Open problems • Sequential Clock Gating – State of the art – Open problems • Clock Power Analysis and Estimation • Clock Gating In Design Flows JS/BGM – ISLPED08 2 AGENDA • Introduction • Combinational Clock Gating – State of the art – Open problems • Sequential Clock Gating – State of the art – Open problems • Clock Power Analysis and Estimation • Clock Gating In Design Flows JS/BGM – ISLPED08 3 Clock Gating Overview JS/BGM – ISLPED08 4 Clock Gating Overview • System level gating: Turn off entire block disabling all functionality. • Conditions for disabling identified by the designer JS/BGM – ISLPED08 4 Clock Gating Overview • System level gating: Turn off entire block disabling all functionality. • Conditions for disabling identified by the designer • Suspend clocks selectively • No change to functionality • Specific to circuit structure • Possible to automate gating at RTL or gate-level JS/BGM – ISLPED08 4 Clock Network Power JS/BGM – ISLPED08 5 Clock Network Power • Clock network power consists of JS/BGM – ISLPED08 5 Clock Network Power • Clock network power consists of – Clock Tree Buffer Power JS/BGM – ISLPED08 5 Clock Network Power • Clock network power consists of – Clock Tree Buffer
    [Show full text]
  • Saber Eletrônica, Designers Pois Precisamos Comprovar Ao Meio Anunciante Estes Números E, Assim, Carlos C
    editorial Editora Saber Ltda. Digital Freemium Edition Diretor Hélio Fittipaldi Nesta edição comemoramos o fantástico número de 258.395 downloads da edição 460 digital em PDF que tivemos nos primeiros 50 dias de circu- www.sabereletronica.com.br lação. Assim, esperamos atingir meio milhão em twitter.com/editora_saber seis meses. Na fase de teste, no ano passado, com Editor e Diretor Responsável a Edição Digital Gratuita que chamamos de “Digital Hélio Fittipaldi Conselho Editorial Freemium (Free + Premium) Edition”, já atingimos João Antonio Zuffo este marco e até ultrapassamos. Redação Fica aqui nosso agradecimento a todos os que Hélio Fittipaldi Augusto Heiss seguiram nosso apelo, para somente fazerem Revisão Técnica Eutíquio Lopez download das nossas edições através do link do Portal Saber Eletrônica, Designers pois precisamos comprovar ao meio anunciante estes números e, assim, Carlos C. Tartaglioni, Diego M. Gomes obtermos patrocínio para manter a edição digital gratuita. Publicidade Aproveitamos também para avisar aos nossos leitores de Portugal, Caroline Ferreira, cerca de 6.000 pessoas, que infelizmente os custos para enviarmos as Nikole Barros revistas impressas em papel têm sido altos e, por solicitação do nosso Colaboradores Alexandre Capelli, distribuidor, não enviaremos mais os exemplares impressos em papel Bruno Venâncio, para distribuição no mercado português e ex-colônias na África. César Cassiolato, Dante J. S. Conti, Em junho teremos a edição especial deste semestre e o assunto prin- Edriano C. de Araújo, cipal é a eletrônica embutida (embedded electronic), ou como dizem os Eutíquio Lopez, Tsunehiro Yamabe espanhóis e portugueses: electrónica embebida. Como marco teremos também no Centro de Exposições Transamérica em São Paulo, a 2ª edição da ESC Brazil 2012 e a 1ª MD&M, o maior evento de tecnologia para o mercado de design eletrônico que, neste ano, estará sendo promovido PARA ANUNCIAR: (11) 2095-5339 pela UBM junto com o primeiro evento para o setor médico/odontoló- [email protected] gico (a MD&M Brazil).
    [Show full text]
  • Analysis of Body Bias Control Using Overhead Conditions for Real Time Systems: a Practical Approach∗
    IEICE TRANS. INF. & SYST., VOL.E101–D, NO.4 APRIL 2018 1116 PAPER Analysis of Body Bias Control Using Overhead Conditions for Real Time Systems: A Practical Approach∗ Carlos Cesar CORTES TORRES†a), Nonmember, Hayate OKUHARA†, Student Member, Nobuyuki YAMASAKI†, Member, and Hideharu AMANO†, Fellow SUMMARY In the past decade, real-time systems (RTSs), which must in RTSs. These techniques can improve energy efficiency; maintain time constraints to avoid catastrophic consequences, have been however, they often require a large amount of power since widely introduced into various embedded systems and Internet of Things they must control the supply voltages of the systems. (IoTs). The RTSs are required to be energy efficient as they are used in embedded devices in which battery life is important. In this study, we in- Body bias (BB) control is another solution that can im- vestigated the RTS energy efficiency by analyzing the ability of body bias prove RTS energy efficiency as it can manage the tradeoff (BB) in providing a satisfying tradeoff between performance and energy. between power leakage and performance without affecting We propose a practical and realistic model that includes the BB energy and the power supply [4], [5].Itseffect is further endorsed when timing overhead in addition to idle region analysis. This study was con- ducted using accurate parameters extracted from a real chip using silicon systems are enabled with silicon on thin box (SOTB) tech- on thin box (SOTB) technology. By using the BB control based on the nology [6], which is a novel and advanced fully depleted sili- proposed model, about 34% energy reduction was achieved.
    [Show full text]
  • Dynamic Voltage/Frequency Scaling and Power-Gating of Network-On-Chip with Machine Learning
    Dynamic Voltage/Frequency Scaling and Power-Gating of Network-on-Chip with Machine Learning A thesis presented to the faculty of the Russ College of Engineering and Technology of Ohio University In partial fulfillment of the requirements for the degree Master of Science Mark A. Clark May 2019 © 2019 Mark A. Clark. All Rights Reserved. 2 This thesis titled Dynamic Voltage/Frequency Scaling and Power-Gating of Network-on-Chip with Machine Learning by MARK A. CLARK has been approved for the School of Electrical Engineering and Computer Science and the Russ College of Engineering and Technology by Avinash Karanth Professor of Electrical Engineering and Computer Science Dennis Irwin Dean, Russ College of Engineering and Technology 3 Abstract CLARK, MARK A., M.S., May 2019, Electrical Engineering Dynamic Voltage/Frequency Scaling and Power-Gating of Network-on-Chip with Machine Learning (89 pp.) Director of Thesis: Avinash Karanth Network-on-chip (NoC) continues to be the preferred communication fabric in multicore and manycore architectures as the NoC seamlessly blends the resource efficiency of the bus with the parallelization of the crossbar. However, without adaptable power management the NoC suffers from excessive static power consumption at higher core counts. Static power consumption will increase proportionally as the size of the NoC increases to accommodate higher core counts in the future. NoC also suffers from excessive dynamic energy as traffic loads fluctuate throughout the execution of an application. Power- gating (PG) and Dynamic Voltage and Frequency Scaling (DVFS) are two highly effective techniques proposed in literature to reduce static power and dynamic energy in the NoC respectively.
    [Show full text]
  • Register Allocation and VDD-Gating Algorithms for Out-Of-Order
    Register Allocation and VDD-Gating Algorithms for Out-of-Order Architectures Steven J. Battle and Mark Hempstead Drexel University Philadelphia, PA USA Email: [email protected], [email protected] Abstract—Register Files (RF) in modern out-of-order micro- 100 avg Int avg FP processors can account for up to 30% of total power consumed INT → → by the core. The complexity and size of the RF has increased due 80 FP to the transition from ROB-based to MIPSR10K-style physical register renaming. Because physical registers are dynamically 60 allocated, the RF is not fully occupied during every phase of the application. In this paper, we propose a comprehensive power 40 management strategy of the RF through algorithms for register allocation and register-bank power-gating that are informed by % of runtime 20 both microarchitecture details and circuit costs. We investigate algorithms to control where to place registers in the RF, when to 0 disable banks in the RF, and when to re-enable these banks. We 60 80 100 120 140 160 include detailed circuit models to estimate the cost for banking Num. Registers Occupied and power-gating the RF. We are able to save up to 50% of the leakage energy vs. a baseline monolithic RF, and save 11% more Fig. 1. Average Reg File occupancy CDF for SPEC2006 workloads. leakage energy than fine-grained VDD-gating schemes. 1 1 Index Terms—Computer architecture, Gate leakage, Registers, SRAM cells 0.8 0.8 I. INTRODUCTION 0.6 0.6 F.cactus I.astar 0.4 0.4 Out-of-order superscalar processors, historically found only F.gems I.libq in high-performance computing environments, are now used in F.milc I.go 0.2 F.pov 0.2 Imcf a diverse range of energy-constrained applications from smart- F.zeus Iomn phones to data-centers.
    [Show full text]
  • A 65 Nm 2-Billion Transistor Quad-Core Itanium Processor
    18 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 44, NO. 1, JANUARY 2009 A 65 nm 2-Billion Transistor Quad-Core Itanium Processor Blaine Stackhouse, Sal Bhimji, Chris Bostak, Dave Bradley, Brian Cherkauer, Jayen Desai, Erin Francom, Mike Gowan, Paul Gronowski, Dan Krueger, Charles Morganti, and Steve Troyer Abstract—This paper describes an Itanium processor imple- mented in 65 nm process with 8 layers of Cu interconnect. The 21.5 mm by 32.5 mm die has 2.05B transistors. The processor has four dual-threaded cores, 30 MB of cache, and a system interface that operates at 2.4 GHz at 105 C. High speed serial interconnects allow for peak processor-to-processor bandwidth of 96 GB/s and peak memory bandwidth of 34 GB/s. Index Terms—65-nm process technology, circuit design, clock distribution, computer architecture, microprocessor, on-die cache, voltage domains. I. OVERVIEW Fig. 1. Die photo. HE next generation in the Intel Itanium processor family T code named Tukwila is described. The 21.5 mm by 32.5 mm die contains 2.05 billion transistors, making it the first two billion transistor microprocessor ever reported. Tukwila combines four ported Itanium cores with a new system interface and high speed serial interconnects to deliver greater than 2X performance relative to the Montecito and Montvale family of processors [1], [2]. Tukwila is manufactured in a 65 nm process with 8 layers of copper interconnect as shown in the die photo in Fig. 1. The Tukwila die is enclosed in a 66 mm 66 mm FR4 laminate package with 1248 total landed pins as shown in Fig.
    [Show full text]
  • Real-Time Dynamic Voltage Scaling for Low-Power Embedded Operating Systems£
    Real-Time Dynamic Voltage Scaling for Low-Power Embedded Operating Systems£ Padmanabhan Pillai and Kang G. Shin Real-Time Computing Laboratory Department of Electrical Engineering and Computer Science The University of Michigan Ann Arbor, MI 48109-2122, U.S.A. pillai,kgshin @eecs.umich.edu ABSTRACT ful microprocessors running sophisticated, intelligent control soft- In recent years, there has been a rapid and wide spread of non- ware in a vast array of devices including digital camcorders, cellu- traditional computing platforms, especially mobile and portable com- lar phones, and portable medical devices. puting devices. As applications become increasingly sophisticated and processing power increases, the most serious limitation on these Unfortunately, there is an inherent conflict in the design goals be- devices is the available battery life. Dynamic Voltage Scaling (DVS) hind these devices: as mobile systems, they should be designed to has been a key technique in exploiting the hardware characteristics maximize battery life, but as intelligent devices, they need powerful of processors to reduce energy dissipation by lowering the supply processors, which consume more energy than those in simpler de- voltage and operating frequency. The DVS algorithms are shown to vices, thus reducing battery life. In spite of continuous advances in be able to make dramatic energy savings while providing the nec- semiconductor and battery technologies that allow microprocessors essary peak computation power in general-purpose systems. How- to provide much greater computation per unit of energy and longer ever, for a large class of applications in embedded real-time sys- total battery life, the fundamental tradeoff between performance tems like cellular phones and camcorders, the variable operating and battery life remains critically important.
    [Show full text]
  • Learning-Directed Dynamic Voltage and Frequency Scaling Scheme with Adjustable Performance for Single-Core and Multi-Core Embedded and Mobile Systems †
    sensors Article Learning-Directed Dynamic Voltage and Frequency Scaling Scheme with Adjustable Performance for Single-Core and Multi-Core Embedded and Mobile Systems † Yen-Lin Chen 1,* , Ming-Feng Chang 2, Chao-Wei Yu 1 , Xiu-Zhi Chen 1 and Wen-Yew Liang 1 1 Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 10608, Taiwan; [email protected] (C.-W.Y.); [email protected] (X.-Z.C.); [email protected] (W.-Y.L.) 2 MediaTek Inc., Hsinchu 30078, Taiwan; [email protected] * Correspondence: [email protected]; Tel.: +886-2-27712171 (ext. 4239) † This paper is an expanded version of “Learning-Directed Dynamic Volt-age and Frequency Scaling for Computation Time Prediction” published in Proceedings of 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, Changsha, China, 16–18 November 2011. Received: 6 August 2018; Accepted: 8 September 2018; Published: 12 September 2018 Abstract: Dynamic voltage and frequency scaling (DVFS) is a well-known method for saving energy consumption. Several DVFS studies have applied learning-based methods to implement the DVFS prediction model instead of complicated mathematical models. This paper proposes a lightweight learning-directed DVFS method that involves using counter propagation networks to sense and classify the task behavior and predict the best voltage/frequency setting for the system. An intelligent adjustment mechanism for performance is also provided to users under various performance requirements. The comparative experimental results of the proposed algorithms and other competitive techniques are evaluated on the NVIDIA JETSON Tegra K1 multicore platform and Intel PXA270 embedded platforms.
    [Show full text]
  • Energy Proportional Computing in Commercial Fpgas with Adaptive
    Energy proportional computing in Commercial FPGAs with Adaptive Voltage Scaling Jose Nunez-Yanez Department of Electrical and Electronic Engineering University of Bristol, UK +44 117 3315128 [email protected] ABSTRACT an open-loop configuration. However, worst case variability is Voltage and frequency adaptation can be used to create energy rarely the case. For this reason, this paper investigates Adaptive proportional systems in which energy usage adapts to the amount Voltage Scaling (AVS) in which run-time monitoring of of work to be done in the available time. Closed-loop voltage and performance variability in the silicon is used together with system frequency scaling can also take into account process and characterization to influence the voltage and the frequency on the temperature variations in addition to system load and this removes fly in a closed-loop configuration. a significant proportion of the margins used by device The rest of the paper is structured as follows. Section 2 describes manufacturers. This paper explores the capabilities of commercial related work. Section 3 presents the hardware platform used in FPGAs to use closed-loop adaptive voltage scaling to improve this research while section 4 introduces the design flow that their energy and performance profiles beyond nominal. An embeds the AVS capabilities in the user design. Section 5 adaptive power architecture based on a modified design flow is presents the power adaptive architecture based on the novel in- created with in-situ detectors and dynamic reconfiguration of situ detectors. Section 6 presents and discusses the results clock management resources. The results of deploying AVS in focusing on power and energy measurements.
    [Show full text]
  • The Dynamic Voltage and Frequency Scaling Based on the On-Chip Microcontroller System
    Journal of Theoretical and Applied Information Technology 10th May 2013. Vol. 51 No.1 © 2005 - 2013 JATIT & LLS. All rights reserved. ISSN: 1992-8645 www.jatit.org E-ISSN: 1817-3195 THE DYNAMIC VOLTAGE AND FREQUENCY SCALING BASED ON THE ON-CHIP MICROCONTROLLER SYSTEM 1,2TIEFENG LI, 1CAIWEN MA, 1WENHUA LI 1 Xi'an Institute of Optics and Precision Mechanics of Chinese Academy of Sciences, Xi’an, 710119,China 2The Graduate University of Chinese Academy of Sciences ,Beijing, 100049, China E-mail: [email protected] , [email protected] , [email protected] ABSTRACT With the rapid increase of complexity and size of the on-chip microcontroller system (OCMS), the power consumption issue for the OMCS is increasingly becoming critical and needs to be solved quickly. Because the CPU is often the major power consumer in the OCMS, so an important strategy to achieve energy saving is via the dynamic voltage and frequency scaling (DVFS), which can enable a processor to operate at a range of voltages and frequencies. However, it needs to be emphasized that the conventional DVFS is fully executed by the software scheduler in the operating system, and its main drawback is that the scheduler can’t accurately track the performance requirements of CPU when the dormant frequency of CPU is increasing continuously. In this paper, we firstly present a typical hardware DVFS architecture, which can automatically carry out DVFS without the software scheduler involvement. Therefore, it avoids increasing the software's workload and reduces the power consumption. Keywords: Power Consumption, DVFS, Software Scheduler, Performance Requirement 1.
    [Show full text]
  • Optimization of Clock Gating Logic for Low Power LSI Design
    Optimization of Clock Gating Logic for Low Power LSI Design Xin MAN September 2012 Waseda University Doctoral Dissertation Optimization of Clock Gating Logic for Low Power LSI Design Xin MAN Graduate School of Information, Production and Systems Waseda University September 2012 Abstract Power consumption has become a major concern for usability and reliability problems of semiconductor products, especially with the significant spread of portable devices, like smartphone in recent years. Major source of dynamic power consumption is the clock tree which may account for 45% of the system power, and clock gating is a widely used technique to reduce this portion of power dissipation. The basic idea of clock gating is to reduce the dynamic power consumption of registers by switching off unnecessary clock signals to the registers selectively depending on the control signal without violating the functional correctness. Clock gating may lead to a considerable power reduction of overall system with proper control signals. Since the clock gating logic consumes chip area and power, it is imperative to minimize the number of inserted clock gating cells and their switching activity for power optimization. Commercial tools support clock gating as a power optimization feature based on the guard signal described in HDL and the minimum number of registers injecting the clock gating cell specified as the synthesis option (structural method). However, this approach requires manual identification of proper control signals and the proper grouping of registers to be gated. That is hard and designer-intensive work. Automatic clock gating generation and optimization is necessary. In this dissertation, we focus on the optimization of clock gating logic i ii based on switching activity analysis including clock gating control candidate extraction from internal signals in the original design and optimum control signal selection considering sharing of a clock gating cell among multiple registers for power and area optimization.
    [Show full text]