Dynamic Voltage Scaling

EE241 - Spring 2012
Advanced Digital Integrated Circuits
Lecture 18: Dynamic Voltage Scaling

Outline
• Finish multiple supplies
• Dynamic voltage scaling

Supply Voltage Tradeoffs

Multiple Supplies in a Block
[Figure: CVS layout. Usami '98]

Level-Converting Flip-Flop
[Figure: level-converting flip-flop schematic with supplies VH and VL, clock CLK/CK, input D, output Q, and devices M1 and M2]

Three VDD's
[Figure: power reduction ratio as a function of V2 (V) and V3 (V), each from 0.4 to 1.4; V1 = 1.5V, VTH = 0.3V, p(t): lambda. From Kuroda]

Optimum Numbers of Supplies
[Figure: three panels for { V1, V2 }, { V1, V2, V3 }, and { V1, V2, V3, V4 }, each plotting the supply voltage ratios (V2/V1, V3/V1, V4/V1) and the power dissipation ratios (P2/P1, P3/P1, P4/P1) against V1 (V) from 0.5 to 1.5]
• The more VDD's, the lower the power, but the effect saturates.
• The power reduction effect decreases as the VDD's are scaled.
• The optimum V2/V1 is around 0.7.
Hamada, CICC'01

Multiple Supply Voltages
• Two supply voltages per block are optimal.
• The optimal ratio between the supply voltages is 0.7.
• Level conversion is performed on the voltage boundary, using a level-converting flip-flop (LCFF).
• An option is to use an asynchronous (combinatorial) level converter, which is more sensitive to coupling and supply noise.

Dual-Supply-Datapath: Layout Issue
[Figure: placement of VDDH circuits and VDDL circuits with signal flow. (a) Dedicated row (conventional): VDDL row and a VDDH row that is partly empty. (b) Possible layout reduction (conventional): VDDL row and VDDH row with complex interconnections. (c) Shared-well layout]
• A shared-well technique is appropriate for random placement of cells.

Standard-Cell Dual-Supply-Voltage
[Figure: N-well isolation between a VDDH circuit (i1, o1) and a VDDL circuit (i2, o2) sharing VSS. (a) circuit schematic, (b) layout]
• A VDDH circuit is assigned only to a critical path.
• A VDDL circuit is used in a non-critical path and for driving a large capacitive load.

Shared-Well Dual-Supply-Voltage
[Figure: shared N-well holding a VDDH circuit (i1, o1) and a VDDL circuit (i2, o2) sharing VSS. (a) circuit schematic, (b) layout]
• Both circuits can be placed in the same N-well.
• Cell layout becomes complex.
• An intrinsic negative back-biasing of the PMOS degrades speed.
Shimazaki, ISSCC'03

Measured Results: Energy & Delay
[Figure: energy (pJ, 200 to 800) versus TCYCLE (ns, 0.6 to 1.6) at room temperature; curve labels: single-supply, shared well (VDDH = 1.8V); annotated point at 1.16GHz]
• VDDL = 1.4V: energy -25.3%, delay +2.8%
• VDDL = 1.2V: energy -33.3%, delay +8.3%
• The dual-supply technique expands the power-delay optimization space.

Power/Energy Optimization Space
Columns: constant throughput/latency (design time, sleep mode) and variable throughput/latency (run time). Rows: active and leakage energy.
• Active, design time: logic design, scaled VDD, transistor sizing, multi-VDD
• Active, sleep mode: clock gating
• Active, run time: DFS, DVS
• Leakage, design time: stack effects, transistor sizing, scaling VDD, + multi-VTh
• Leakage, sleep mode: sleep transistors, multi-VDD, variable VTh, + input control
• Leakage, run time: + variable VTh

Adaptive Supply Voltages

Processors for Portable Devices
[Figure: performance (MIPS, 1 to 1000) versus processor energy (Watt*sec, 0.1 to 10) for notebook computers, Pocket-PCs, and PDAs, with the region reached by dynamic voltage scaling marked]
• Eliminate the performance-energy trade-off.
Burd ISSCC'00

Typical MPEG IDCT Histogram
[Figure: histogram]

Processor Usage Model
[Figure: desired throughput versus time, bounded by the maximum processor speed, with three kinds of load: compute-intensive and low-latency processes, background and high-latency processes, and system idle]
System optimizations:
• Maximize peak throughput
• Minimize average energy/operation
Burd ISSCC'00

Common Design Approaches (Fixed VDD)
[Figure: delivered throughput versus time for each approach]
• Compute ASAP: always high throughput, which leaves excess throughput.
• Clock frequency reduction: fCLK is reduced, but energy/operation remains unchanged while throughput is scaled down with fCLK.

Scale VDD with Clock Frequency
[Figure: energy/operation (0 to 1) versus throughput (fCLK, 0 to 1); constant supply voltage at 3.3V compared with a scaled supply down to 1.1V, showing ~10x energy reduction]
• Reduce VDD, slow circuits down.
Burd ISSCC'00

CMOS Circuits Track Over VDD
[Figure: normalized max. fCLK (0 to 1.0) versus VDD (VT to 4VT) for inverter, ring oscillator, register file, and SRAM]
• Delay tracks within +/- 10%.
Burd ISSCC'00

Dynamic Voltage Scaling (DVS)
[Figure: delivered throughput versus time. (1) Vary fCLK, VDD. (2) Dynamically adapt]
• Dynamically scale energy/operation with throughput.
• Always minimize speed, which minimizes average energy/operation.
• Extend battery life up to 10x with the exact same hardware!
Burd ISSCC'00

Operating System Sets Processor Speed
• DVS requires a voltage scheduler (VS).
• The VS predicts workload to estimate CPU cycles.
• Applications supply completion deadlines.
[Figure: processor speed (MPEG): FDESIRED (MHz, 0 to 80) versus time (sec, 0 to 1.4), where FDESIRED = CPU cycles / time]

Converter Loop Sets VDD, fCLK
[Figure: converter loop block diagram. Blocks and signals include the ring oscillator, a counter with reset (RST) and 1 MHz reference (f1MHz), a latch producing FMEAS, the FDES register set by the O.S., the error signal FERR, the digital loop filter, and the buck converter supplying IDD and VDD to the processor across CDD]
• The feedback loop sets VDD so that FERR → 0.
• The ring oscillator is delay-matched to the CPU critical paths.
• The custom loop implementation makes it possible to optimize CDD.
Burd ISSCC'00

Design Over Wide Range of Voltages
• Circuit design constraints. (Functional verification)
• Circuit delay variation. (Timing verification)
• Noise margin reduction. (Power grid, coupling)
• Delay sensitivity. (Local power distribution)
Design verification complexity is similar to that of high-performance processor design at fixed VDD.

Delay Variation & Circuit Constraints
[Figure: normalized max. fCLK (0 to 1.0) versus VDD (VT to 4VT) for inverter, ring oscillator, register file, and SRAM]
• Cannot use NMOS pass gates: they fail for VDD < 2VT.
• Functional verification is only needed at one VDD value.
Burd ISSCC'00

Relative Delay Variation
[Figure: percent delay variation relative to the ring oscillator (-20 to +40) versus VDD (VT to 4VT)]
Four extreme cases of critical paths:
• Gate
• Interconnect
• Diffusion
• Series
All vary monotonically with VDD.
• Timing verification is only needed at min. & max. VDD.
Burd ISSCC'00
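
The energy argument behind scaling VDD with the clock frequency can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes that dynamic energy per operation is proportional to VDD^2 and that the maximum clock frequency follows the first-order model f ∝ (VDD - VT)^2 / VDD. Neither model, the function names, nor the threshold VT = 0.8 V come from the slides; they are assumptions chosen for the example. Under the energy assumption alone, going from 3.3V to 1.1V gives (3.3 / 1.1)^2 = 9, in line with the ~10x reduction quoted above.

```python
def energy_per_op_ratio(vdd, vdd_ref):
    """Energy/operation relative to vdd_ref, assuming E ~ C * VDD^2."""
    return (vdd / vdd_ref) ** 2


def max_frequency_ratio(vdd, vdd_ref, vt):
    """Max clock frequency relative to vdd_ref.

    Assumed first-order model (not from the lecture): f ~ (VDD - VT)^2 / VDD.
    """
    def f(v):
        return (v - vt) ** 2 / v
    return f(vdd) / f(vdd_ref)


def min_vdd_for_throughput(target, vdd_ref, vt, tol=1e-6):
    """Lowest VDD whose frequency ratio still meets `target` (0 < target <= 1).

    Uses bisection; valid because the assumed model is monotonic for VDD > VT.
    """
    lo, hi = vt, vdd_ref
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if max_frequency_ratio(mid, vdd_ref, vt) >= target:
            hi = mid  # mid is fast enough, try a lower supply
        else:
            lo = mid  # mid is too slow, raise the supply
    return hi


if __name__ == "__main__":
    VDD_MAX = 3.3  # volts, taken from the energy/throughput plot
    VT = 0.8       # volts, hypothetical threshold voltage
    for target in (1.0, 0.5, 0.25, 0.1):
        vdd = min_vdd_for_throughput(target, VDD_MAX, VT)
        e = energy_per_op_ratio(vdd, VDD_MAX)
        # With a fixed VDD, energy/operation would stay at 1.0 for every target.
        print(f"throughput {target:4.2f}: VDD = {vdd:4.2f} V, energy/op = {e:4.2f}")
```

With a fixed supply, both "compute ASAP" and plain clock frequency reduction leave the energy/operation column at 1.0; only lowering VDD together with fCLK reduces it.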

Delay Sensitivity
$\frac{\Delta Delay}{Delay} = \frac{\partial Delay}{\partial V_{DD}} \cdot \frac{\Delta V_{DD}}{Delay(V_{DD})}$, with $\Delta V_{DD} = I(V_{DD}) \cdot R$
[Figure: normalized ΔDelay / Delay (0 to 1) versus VDD (VT to 4VT)]
• Design of the local power grid (for timing constraints) only needs to consider VDD ≈ 2VT.
Burd ISSCC'00

Multiple Path Tracking
A. Drake, ISSCC'07

Alternative: Error Detection
Bull, ISSCC'2010

Design for Dynamically Varying VDD
• Static CMOS logic.
• Ring oscillator.
• Dynamic logic (& tri-state busses).
• Sense amp (& memory cell).
Max. allowed |dVDD/dt| → Min. CDD = 100nF (0.6µm)
Circuits continue to operate properly as VDD changes.

Static CMOS Logic
[Figure: inverter with Vin = 0 and Vout = VDD, modeled as the PMOS rds driving CL; max. τ = 4ns]
0.6µm CMOS: |dVDD/dt| < 200V/µs
• Static CMOS operates robustly with varying VDD.

Ring Oscillator
[Figure: VDD and fCLK waveforms (0 to 4 volts) versus time (60 to 260 ns), simulated with dVDD/dt = 20V/µs]
• Output fCLK adapts instantaneously to the new VDD.

Dynamic Logic
[Figure: dynamic gate with clk = 1 and waveforms of VDD, clk, Vin, and Vout versus time. Errors: false logic low when ΔVDD > VTP; latch-up when ΔVDD > Vbe]
0.6µm CMOS: |dVDD/dt| < 20V/µs
• Cannot gate the clock in the evaluation state.
• Tri-state busses fail similarly; use a hold circuit.

Measured System Performance & Energy
[Figure: Dhrystone 2.1 MIPS (0 to 100) versus energy (mW/MIPS, 0 to 6) for dynamic VDD and static VDD]
• 85 MIPS @ 5.6 mW/MIPS (3.8V)
• 6 MIPS @ 0.54 mW/MIPS (1.2V)
• Dynamic operation can increase energy efficiency > 10x.
Burd ISSCC'00

VDD-Hopping
[Figure: normalized power (0 to 1) versus # of frequency levels (1, 2, 3, ..., 8) for MPEG-4 encoding, with a timeline of slices #n and #n+1 marking where the n-th slice finished and the next milestone; transition time between f levels = 200µs]
• Application slicing and software feedback guarantee real-time operation.
• Two hopping levels are sufficient.

Power/Energy Optimization Space (repeated)

Clock gating
• Requires careful skew control ...
• Well handled in today's EDA tools

Clock-gating efficiently reduces power
[Figure: MPEG4 decoder power (mW, 0 to 25) by block (DEU, VDE, MIF, DSP/HIF, 896Kb SRAM), with and without clock gating]
• Without clock gating: 30.6mW
• With clock gating: 8.5mW
• 90% of F/F's were clock-gated.
• 70% power reduction by clock-gating alone.
Courtesy M. Ohashi, Matsushita, ISSCC 2002

Local Clock Gating
[Figure: Data-Transition Look-Ahead 'clock on demand' flip-flop schematic with XNOR, pulse generator, signals D, DI, Q, CKI, CKIB, and CP, and device sizes between 0.5 and 2]

Power/Energy Optimization Space (repeated)

Circuit-Level Activity Encoding
• Conditional inversion coding for interconnect

Number Representation

Next Lecture
• Leakage management
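
The 'clock on demand' idea behind the Data-Transition Look-Ahead flip-flop in the Local Clock Gating slide can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the circuit from the slide: it only assumes that the comparison of D and Q decides whether the pulse generator fires the internal clock, and it ignores pulse width, skew, and device sizing. The function name and interface are hypothetical.

```python
def simulate_clock_on_demand(d_stream, q_init=0):
    """Behavioral model of a flip-flop that is clocked only when D differs from Q.

    Returns (q_trace, pulses):
      q_trace - value of Q after each external clock cycle
      pulses  - number of internal clock pulses actually generated
    """
    q = q_init
    q_trace = []
    pulses = 0
    for d in d_stream:
        if d != q:        # look-ahead: the input differs from the stored value
            pulses += 1   # the pulse generator fires the internal clock once
            q = d         # the flip-flop captures the new value
        q_trace.append(q)  # with no pending transition, Q simply holds
    return q_trace, pulses


if __name__ == "__main__":
    data = [0, 0, 1, 1, 1, 0, 0, 0]
    q_trace, pulses = simulate_clock_on_demand(data)
    print("Q trace:", q_trace)
    print(f"internal clock pulses: {pulses} of {len(data)} external cycles")
```

In this model Q follows D exactly as an ungated flip-flop would, while the internal clock toggles only on data transitions, so the saving depends on how rarely the data changes.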