Lecture 30: Perspectives (Administrivia, Transistor Count, 18nm FinFET)

Digital Integrated Circuits, Perspectives. © Prentice Hall 2000

18nm FinFET
• Double-gate structure + raised source/drain
[Figure: FinFET structure with labels Gate, Source, Drain, Silicon Fin, BOX, and "Si fin - Body!"]
[Figure: Id [uA/um], 0 to 400, versus Vd [V], -1.5 to 0.0; curve labels run from -0.25 V to -1.50 V in 0.25 V steps]
X. Huang, et al., 1999 IEDM, pp. 67-70

Administrivia
• Final on Friday, December 14, 2001, 8 am. Location: 180 Tan Hall
• Topics: everything that was covered in class
• Review session: TBA
• Lab and hw scores will be posted on the web; please check that they are correct and that nothing is missing
• Superb job on posters!
• FEEDBACK ON COURSE EXTREMELY WELCOME!

Power Density
• With Vdd ~1.2V, these devices are quite fast: FO4 delay is <5ps
• If we continue with today's architectures, we could run digital circuits at 30GHz
• But we will end up with a power density of 20kW/cm²
• Lowering the supply to 0.6V brings us down to 5kW/cm²
• Speeds will be a bit lower, too: FO4 = 10ps, which lowers the frequencies to ~10GHz [Tang, ISSCC'01] and lowers power
• Assume that a high-performance DG or bulk FET can be designed with 1kW/cm², with FO4 = 10ps [Frank, Proc IEEE, 3/01]

Transistor Count
[Figure: Transistors (MT), log scale from 0.001 to 10000, versus Year, 1970 to 2010. Labeled points: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, P6. Projections: 200M, 425M, 900M, 1.8B. Source: S. Borkar]

Power will be a problem
[Figure: Power (Watts), log scale from 0.1 to 100000, versus Year, 1971 to 2008. Labeled points: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc. Projections: 500W, 1.5KW, 5KW, 18KW. Source: S. Borkar]
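The supply-scaling numbers on the Power Density slide can be checked with a short calculation. This is a minimal sketch that assumes dynamic power dominates and follows P ∝ C·Vdd²·f with the switched capacitance per unit area held fixed; the slides quote only the endpoints, so the model and the function name `scaled_power_density` are assumptions made here.

```python
def scaled_power_density(p0_kw_cm2, vdd0, vdd1, f0_ghz=None, f1_ghz=None):
    """Scale a dynamic power density, assuming P is proportional to C * Vdd^2 * f.

    The switched capacitance per unit area is assumed unchanged.
    If no frequencies are given, only the supply voltage is scaled.
    """
    scale = (vdd1 / vdd0) ** 2
    if f0_ghz is not None and f1_ghz is not None:
        scale *= f1_ghz / f0_ghz
    return p0_kw_cm2 * scale


# Slide values: 20 kW/cm^2 at Vdd ~ 1.2 V and 30 GHz.
voltage_only = scaled_power_density(20.0, vdd0=1.2, vdd1=0.6)
with_frequency = scaled_power_density(20.0, vdd0=1.2, vdd1=0.6,
                                      f0_ghz=30.0, f1_ghz=10.0)

print(f"Vdd 1.2 V -> 0.6 V only:       {voltage_only:.2f} kW/cm^2")
print(f"plus 30 GHz -> 10 GHz as well: {with_frequency:.2f} kW/cm^2")
```

Under this model the voltage change alone reproduces the 5kW/cm² on the slide. Including the drop to ~10GHz lowers the result further, to a value the slides do not state; the slide instead assumes 1kW/cm² for a well-designed device.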
Takeaways from the transistor-count and power plots:
• 200M to 1.8B transistors on the lead microprocessor
• Power delivery and dissipation will be prohibitive

Power is a Limiting Factor
• If we have a 2cm x 2cm die in a high-performance microprocessor, we will end up with 4kW of power dissipation (4cm² at the assumed 1kW/cm²).
• If our power has to be limited to 180W, we can afford to have only 4.5% of these devices with 0.6V supply on the die (180W / 4kW), given that nothing else dissipates power.

Microprocessor Design
• Core datapath will be running at 7-10GHz
• Requires fast devices: low thresholds with 0.5-0.6V supplies
• Lowest NMOS VTh ~ -0.1V to get swing in CMOS
• Assume a threshold of 0 to 0.1V. The devices will be very leaky, so a second threshold will be used to control leakage power.
• With the second threshold set to have 10x less leakage, 90% of devices off critical paths can be made high-threshold.
• Power limits the size of the µP core to 5-10% of the die (today's transistor count, just shrunk) and 30-50% of the total power budget.

Possible Scenario
• Example: 0.5% of devices will be of highest performance
• 35% is leakage (assume: 20% drain, 10% gate, 5% drain-to-body)
• 65% is active power. With just 0.5% of these devices (0.5% of 4kW = 20W): CV² = 13W, leakage = 7W
• How would the other 99.5% of devices that populate the 2cm x 2cm die look?

Add Dedicated Datapath
• Can execute e.g. DIVX decoder, graphics
[Figure: one logic block at Vdd compared with two logic blocks at Vdd/2]

Quantity         One block at Vdd    Two blocks at Vdd/2
Freq             1                   0.5
Vdd              1                   0.5
Throughput       1                   1
Power            1                   0.25
Area             1                   2
Pwr Den          1                   0.125
Leakage Curr.    (not given)         2
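The normalized values in the Add Dedicated Datapath comparison follow from a simple model, sketched below. It assumes dynamic power per block scales as Vdd²·f, that area and throughput scale with the number of blocks, and that leakage current scales with block count; the slide lists only the resulting numbers, so these scaling rules and the name `parallel_datapath` are assumptions made for illustration.

```python
def parallel_datapath(n_blocks, vdd, freq):
    """Normalized metrics for n identical logic blocks.

    Everything is relative to one block at Vdd = 1, Freq = 1.
    Assumptions: dynamic power per block ~ Vdd^2 * f, area ~ block count,
    throughput ~ block count * frequency, leakage current ~ block count.
    """
    power = n_blocks * vdd ** 2 * freq
    area = n_blocks
    return {
        "throughput": n_blocks * freq,
        "power": power,
        "area": area,
        "power_density": power / area,
        "leakage_current": n_blocks,
    }


baseline = parallel_datapath(n_blocks=1, vdd=1.0, freq=1.0)
dedicated = parallel_datapath(n_blocks=2, vdd=0.5, freq=0.5)

for name in baseline:
    print(f"{name:16s} {baseline[name]:>6} {dedicated[name]:>6}")
```

The design point is that doubling the hardware and halving both supply and clock keeps throughput constant while cutting power to a quarter, at the cost of twice the area and more leaking devices.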
Possible Scenario (continued): the other 99.5% of devices
• Will run at 10x lower frequency, at 0.5-0.7 of the processor VDD = 0.25-0.35V
• Thresholds for critical paths: VTh = 150mV
• Need leakage power management: another threshold or control of VT

Microprocessors / 180W Gives Us:
[Figures: block diagrams for both slides. Recoverable labels: Today → 20nm; Power; Area; µP Core; Cache; Memory; Dedicated Logic; Dedicated datapath; 2GHz; 7-10 GHz]

Today's Design Methodologies
• The Deep Sub-Micron (DSM) Effect (≤ 0.25µ)
"Microscopic Problems" (∝ DSM): Everything Looks a Little Different
• Wiring Load Management
• Noise, Crosstalk
• Reliability, Manufacturability
• Complexity: LRC, ERC
• Accurate Power Prediction
• Accurate Delay Prediction
• etc.
"Macroscopic Issues" (∝ 1/DSM): …and There's a Lot of Them!
• Time-to-Market
• Millions of Gates
• High-Level Abstractions
• Reuse & IP: Portability
• Predictability
• etc.

Memory Will Not Scale Much Further
• Density is the key requirement
• Will occupy 70-80% of the die
• Low leakage
• Low activity: inherently low active power, low power density (at least 10x less than logic)
• Need higher VTh ~ 0.5V, and higher supply 0.8-1V (?)

Systems-on-a-Chip
Today → 20nm
[Figure: system-on-a-chip diagram. Recoverable label: Radio (60GHz (?), CMOS ?)]

The Productivity Gap
[Figure: logic transistors per chip (K) and productivity (transistors per staff-month) versus year, on log scales, with technology markers at 2.5µ, .35µ, and .10µ. Complexity growth rate: 58%/yr compound. Productivity growth rate: 21%/yr compound.]
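The two compound growth rates on the Productivity Gap plot imply a gap that itself grows exponentially. The sketch below shows the arithmetic; it assumes both rates stay constant over the period considered, and the function names are chosen here for illustration.

```python
import math

COMPLEXITY_GROWTH = 0.58    # logic transistors per chip, per year (from the plot)
PRODUCTIVITY_GROWTH = 0.21  # transistors per staff-month, per year (from the plot)

# Ratio by which complexity outpaces productivity each year.
YEARLY_GAP_RATIO = (1 + COMPLEXITY_GROWTH) / (1 + PRODUCTIVITY_GROWTH)


def gap_factor(years):
    """Factor by which complexity outgrows productivity after `years`."""
    return YEARLY_GAP_RATIO ** years


def years_until_gap(factor):
    """Years needed for the gap to widen by `factor`, at constant rates."""
    return math.log(factor) / math.log(YEARLY_GAP_RATIO)


print(f"gap widens by {YEARLY_GAP_RATIO:.2f}x per year")
print(f"after 10 years: {gap_factor(10):.1f}x")
print(f"years to a 10x gap: {years_until_gap(10):.1f}")
```

Read this way, the staff effort needed to fill a leading-edge chip grows by roughly 30% per year unless methodology closes the gap, which is the motivation for the reuse and higher-abstraction approaches discussed in the rest of the lecture.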
Source for the Productivity Gap data: SEMATECH

Systems-on-a-Chip example: Broadcom set-top box
• 25M transistors, 3MB embedded SRAM
• MIPS core @ 100MHz, DSP @ 144MHz
• 7 PLLs, 12 ADC, DACs, 100 clocks
• Power figures as printed on the slide: 2W, 1.4W

Transistor Requirements
• Will need different kinds of transistors:
  » Datapaths (speed, leakage)
  » Dedicated DSP (power, leakage)
  » Memory (density is main concern)
  » Analog (?)
• Power and leakage determine the size ratios between these blocks
• The number of different transistor types is determined by parameter spread
• Fewer devices could solve the problem, but that needs control of the threshold (4th terminal), with a strong transfer function.

Implementation Methodologies
Digital Circuit Implementation Approaches
  Custom
  Semi-custom
    Cell-Based
      Standard Cells
      Compiled Cells
      Macro Cells
    Array-Based
      Pre-diffused (Gate Arrays)
      Pre-wired (FPGA)

Custom Design: Layout Editor
[Figure: Magic Layout Editor (UC Berkeley)]

Gate Array: Sea-of-gates
[Figure: rows of uncommitted cells separated by routing channels. Uncommitted cell labels: polysilicon, metal, possible contact, VDD, GND. Committed cell (4-input NOR) labels: In1, In2, In3, In4, Out]

Standard Cell: Example
[Figure: 3-input NAND cell (from Mississippi State Library), characterized for fanout of 4 and for three different technologies]

Sea-of-gate Primitive Cells
[Figure: two primitive cells built from PMOS and NMOS devices, one using oxide-isolation and one using gate-isolation]

Synthesis
1. Describe your circuit in HDL (VHDL, Verilog)
2. Synthesis programs map it into a standard cell library. Set the constraints (timing, area)
3. Get a gate-level netlist, then run automatic place and route
4. Insert clock
5. Extract the netlist from layout
6. Does it meet constraints? If not, go back to 1, 2, 3, 4. This is called 'design closure': timing closure, power closure.

Sea-of-gates
[Figure: LSI Logic LEA300K (0.6 µm CMOS) die with Random Logic and Memory Subsystem regions]

Field-Programmable Gate Arrays: Fuse-based
[Figure: standard-cell-like floorplan with I/O buffers on all four sides, a Program/Test/Diagnostics block, rows of logic modules, routing channels, and vertical routes]

Prewired Arrays
Categories of prewired arrays (or field-programmable devices):
• Fuse-based (program-once)
• Non-volatile EPROM based
• RAM based

Programmable Logic Devices
[Figure: PLA, PROM, PAL]

Interconnect
[Figure: programmed interconnection linking a cell and an input/output pin through horizontal tracks, vertical tracks, and antifuses]
Programming interconnect using anti-fuses

Field-Programmable Gate Arrays: RAM-based
[Figure: array of CLBs with switching matrix, interconnect points, horizontal routing channel, and vertical routing channel]

EPLD Block Diagram
[Figure: primary inputs and macrocell. Courtesy Altera Corp.]
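The six-step Synthesis flow listed earlier is an iterate-until-the-constraints-are-met loop. The sketch below shows only that control structure. The callbacks `run_flow` and `meets_constraints`, the toy delay and area numbers, and the iteration limit are all hypothetical stand-ins; no real synthesis or place-and-route tool is called.

```python
def design_closure(run_flow, meets_constraints, max_iterations=20):
    """Repeat the flow until the extracted result meets the constraints.

    run_flow(iteration) stands in for steps 1-5 (HDL, synthesis,
    place and route, clock insertion, extraction) and returns a result.
    meets_constraints(result) stands in for step 6.
    """
    for iteration in range(1, max_iterations + 1):
        result = run_flow(iteration)
        if meets_constraints(result):
            return iteration, result
    raise RuntimeError("no design closure within max_iterations")


# Hypothetical toy model: each revision trims 300 ps from a 5000 ps
# critical path and costs 50 extra area units.
def toy_flow(iteration):
    return {
        "delay_ps": 5000 - 300 * (iteration - 1),
        "area_units": 1000 + 50 * (iteration - 1),
    }


def toy_check(result):
    return result["delay_ps"] <= 4000 and result["area_units"] <= 1500


iterations, final = design_closure(toy_flow, toy_check)
print(f"closed after {iterations} iterations: {final}")
```

In a real flow each pass may revisit a different step (HDL, constraints, placement, or clocking), which is why the slide lists steps 1 through 4 as possible re-entry points.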
RAM-based FPGA Basic Cell (CLB)
[Figure: CLB split into combinational logic and storage elements. Two function blocks, each labeled "Any function of up to 4 variables", take inputs A, B/Q1/Q2, C/Q1/Q2, and D and produce F and G. Other labels: Din, R, CE, and storage-element outputs Q1 and Q2]

Architecture ReUse
• Silicon System Platform
  » Flexible architecture for hardware and software
  » Specific (programmable) components
  » Network architecture
  » Software modules
  » Rules and guidelines for design of HW and SW
• Has been successful in PCs
  » Dominance of a few players who specify and control architecture
• Application-domain specific (difference in constraints)
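The CLB slide's "any function of up to 4 variables" can be illustrated with a lookup table: a 16-entry memory addressed by the four inputs. This is a minimal sketch that assumes a lookup-table implementation, which is one common way to build such a block in RAM-based FPGAs; the slide does not show the circuit, and the names `make_lut4` and `lut4_read` are chosen here.

```python
from itertools import product


def make_lut4(func):
    """Build the 16-entry truth table of a Boolean function of 4 inputs.

    Entry i holds the output for inputs (a, b, c, d) with
    i = 8*a + 4*b + 2*c + d.
    """
    return [int(bool(func(a, b, c, d)))
            for a, b, c, d in product((0, 1), repeat=4)]


def lut4_read(table, a, b, c, d):
    """Evaluate the function: the inputs form a 4-bit address into the table."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# Two example "configurations" of the same hardware.
parity = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
and_or = make_lut4(lambda a, b, c, d: (a and b) or (c and d))

print("parity table:", parity)
print("and_or(1, 1, 0, 0) =", lut4_read(and_or, 1, 1, 0, 0))
```

Programming the cell amounts to loading 16 bits, so a single block of this kind can realize any of the 2^16 possible functions of four inputs.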