Measurement, Modeling, and Characterization for Energy-Efficient Computing


THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Measurement, Modeling, and Characterization for Energy-Efficient Computing

BHAVISHYA GOEL

Division of Computer Engineering
Department of Computer Science and Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
Göteborg, Sweden 2016

Measurement, Modeling, and Characterization for Energy-Efficient Computing
Bhavishya Goel
ISBN 978-91-7597-411-8
Copyright © Bhavishya Goel, 2016.

Doktorsavhandlingar vid Chalmers tekniska högskola
Ny serie Nr 4092
ISSN 0346-718X
Technical Report 129D

Computer Architecture Research Group
Department of Computer Science and Engineering
Chalmers University of Technology
SE-412 96 Göteborg, Sweden
Phone: +46 (0)31-772 10 00
Author e-mail: [email protected]

Chalmers Reposervice
Göteborg, Sweden 2016

Measurement, Modeling, and Characterization for Energy-Efficient Computing
Bhavishya Goel
Division of Computer Engineering, Chalmers University of Technology

ABSTRACT

The ever-increasing ecological footprint of the Information Technology (IT) sector, coupled with the adverse effects of high power consumption on electronic circuits, has increased the significance of energy-efficient computing over the last decade. Making energy-efficient computing the norm rather than the exception requires that system designers and programmers understand the energy implications of their design and implementation choices. This, in turn, requires a detailed view of the system's energy expenditure and/or power consumption. We explore this aspect of energy-efficient computing in this thesis through power measurement, power modeling, and energy characterization.

First, we present a quantitative comparison of power measurement data collected for computer systems using four techniques: a power meter at the wall outlet, current transducers on the ATX power rails, the CPU voltage regulator's current monitor, and Intel's proprietary RAPL (Running Average Power Limit) interface. We compare these techniques for accuracy, sensitivity, and accessibility.

Second, we present two methodologies for modeling processor power consumption. The first model estimates power consumption at the granularity of individual cores using per-core performance events and temperature sensors. We validate the methodology on six different platforms and show that our model estimates power consumption with consistently high accuracy across all platforms. To understand energy expenditure trends across different frequencies and degrees of parallelism, we need to model power at a much finer granularity. The second power model addresses this need by estimating static and dynamic power consumption separately for the individual cores and the uncore. We validate this model on Intel's Haswell platform for single-threaded and multi-threaded benchmarks. We use this power model to characterize the energy efficiency of frequency scaling on the Haswell microarchitecture and use the insights to implement a low-overhead DVFS scheduler. We also characterize the energy efficiency of thread scaling using the power model and demonstrate how different communication parameters and microarchitectural traits affect an application's energy as it scales.
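Of the four measurement techniques compared above, RAPL is the most readily accessible to programmers. As a minimal illustration (not part of the thesis measurement infrastructure), the sketch below reads the package energy counter exposed by the Linux powercap driver before and after a code region; the sysfs path and domain index are assumptions that differ across systems, and energy_uj wraps around periodically.

/*
 * Minimal sketch: approximate package power by differencing the RAPL
 * energy counter exposed through the Linux powercap interface.
 * The sysfs path below is an assumption; it varies across kernels/systems.
 */
#include <stdio.h>
#include <unistd.h>

static long long read_energy_uj(void)
{
    long long uj = -1;
    FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
    if (f) {
        if (fscanf(f, "%lld", &uj) != 1)
            uj = -1;
        fclose(f);
    }
    return uj;   /* accumulated package energy in microjoules, or -1 on error */
}

int main(void)
{
    long long before = read_energy_uj();
    sleep(1);                              /* stand-in for the workload being measured */
    long long after = read_energy_uj();
    if (before >= 0 && after >= before)
        printf("approx. package power over 1 s: %.2f W\n", (after - before) / 1e6);
    return 0;
}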
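The per-core model described above has the general shape of a weighted sum of per-core activity ratios and temperature. The sketch below is only an illustrative stand-in: the event names and coefficients are hypothetical placeholders, whereas in practice the coefficients come from an offline regression of sampled counter rates and temperature against measured processor power (for example, obtained with one of the measurement techniques compared earlier).

/*
 * Illustrative per-core power estimate in the spirit of a counter-based
 * linear model.  Events and coefficients are hypothetical placeholders.
 */
#include <stdio.h>

typedef struct {
    double ipc;            /* retired instructions per cycle      */
    double l2_misses;      /* L2 cache misses per cycle           */
    double mem_accesses;   /* off-core memory accesses per cycle  */
    double temperature;    /* core temperature, degrees Celsius   */
} core_sample;

/* Hypothetical coefficients from an offline least-squares fit (watts per unit of activity). */
static const double W_BASE = 4.0, W_IPC = 6.5, W_L2 = 11.0, W_MEM = 18.0, W_TEMP = 0.08;

static double estimate_core_power(const core_sample *s)
{
    return W_BASE
         + W_IPC  * s->ipc
         + W_L2   * s->l2_misses
         + W_MEM  * s->mem_accesses
         + W_TEMP * s->temperature;    /* estimated core power in watts */
}

int main(void)
{
    core_sample s = { .ipc = 1.2, .l2_misses = 0.004, .mem_accesses = 0.001, .temperature = 55.0 };
    printf("estimated core power: %.2f W\n", estimate_core_power(&s));
    return 0;
}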
Finally, we perform a detailed performance and energy characterization of Intel's Restricted Transactional Memory (RTM). We use the TinySTM software transactional memory (STM) system to benchmark RTM's performance against competing STM alternatives. We use microbenchmarks and the STAMP benchmark suite to compare RTM and STM performance and energy behavior. We quantify RTM's hardware limitations and identify the conditions required for RTM to outperform STM.

Keywords: power estimation, energy characterization, power-aware scheduling, power management, transactional memory, power measurement
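For readers unfamiliar with RTM, the sketch below shows the canonical usage pattern: a bounded number of hardware transaction attempts with a conventional lock as the non-speculative fallback. The retry budget, abort code, and shared counter are arbitrary choices made for this illustration rather than values taken from the thesis benchmarks; compilation assumes a TSX-capable processor and the -mrtm flag.

/*
 * Minimal sketch of lock elision with Intel RTM (TSX).
 * Compile with -mrtm on a TSX-capable processor.
 */
#include <immintrin.h>   /* _xbegin, _xend, _xabort, _XBEGIN_STARTED */
#include <pthread.h>

static pthread_mutex_t fallback_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile int lock_is_held = 0;    /* lets transactions detect an active fallback holder */
static long shared_counter = 0;

static void locked_update(void)
{
    pthread_mutex_lock(&fallback_lock);
    lock_is_held = 1;
    shared_counter++;                    /* non-speculative critical section */
    lock_is_held = 0;
    pthread_mutex_unlock(&fallback_lock);
}

void elided_update(void)
{
    for (int attempt = 0; attempt < 3; ++attempt) {   /* bounded retries: RTM gives no forward-progress guarantee */
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            if (lock_is_held)                         /* puts the lock in the read set; abort if it is taken */
                _xabort(0xff);
            shared_counter++;                         /* speculative critical section */
            _xend();
            return;
        }
        /* 'status' encodes the abort cause (conflict, capacity, explicit) and could guide a retry policy */
    }
    locked_update();                                  /* fall back after repeated aborts */
}

int main(void)
{
    elided_update();
    return 0;
}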
Preface

Parts of the contributions presented in this thesis have previously been published in the following manuscripts.

Bhavishya Goel, Sally A. McKee, Roberto Gioiosa, Karan Singh, Major Bhadauria, Marco Cesati, "Portable, Scalable, Per-Core Power Estimation for Intelligent Resource Management," in Proceedings of the 1st International Green Computing Conference, Chicago, USA, August 2010, pp. 135-146.

Bhavishya Goel, Sally A. McKee, Magnus Själander, "Techniques to Measure, Model, and Manage Power," in Advances in Computers 87, 2012, pp. 7-54.

Bhavishya Goel, Ruben Titos-Gil, Anurag Negi, Sally A. McKee, Per Stenström, "Performance and Energy Analysis of the Restricted Transactional Memory Implementation on Haswell," in Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium, Phoenix, USA, May 2014, pp. 615-624.

The following manuscript has been accepted but is yet to be published.

Bhavishya Goel, Sally A. McKee, "A Methodology for Modeling Dynamic and Static Power Consumption for Multicore Processors," in Proceedings of the 30th IEEE International Parallel and Distributed Processing Symposium, Chicago, USA, May 2016.

The following manuscripts have been published but are not included in this work.

Bhavishya Goel, Magnus Själander, Sally A. McKee, "RTL Model for Dynamically Resizable L1 Data Cache," in Swedish System-on-Chip Conference, Ystad, Sweden, 2013.

Magnus Själander, Sally A. McKee, Bhavishya Goel, Peter Brauer, David Engdal, Andras Vajda, "Power-Aware Resource Scheduling in Base Stations," in 19th Annual IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, Singapore, July 2012, pp. 462-465.

Acknowledgments

I would like to thank the following people for supporting me during my PhD and contributing to this thesis, directly or indirectly.

Professor Sally A. McKee, for giving me the opportunity to work with her, for giving me a confidence boost during the lows, and for celebrating my highs. She always trusted me to choose my own research path but was there whenever I needed her counsel. She is also an excellent cook, and her brownies and palak paneers have got me through various deadlines.

Magnus Själander, for helping me with my research during the early years of my PhD and for letting me bounce ideas off him. His unabated enthusiasm for research is very contagious.

Madhavan Manivannan, for being a great friend and colleague, and for spending loads of his precious time reviewing my papers and thesis.

Professor Lars Svensson, for helping me with practical issues during his time as division head and for giving me critical feedback to improve the technical and non-technical aspects of this thesis.

Professor Per Larsson-Edefors, for teaching some of the most entertaining courses I have taken at Chalmers and for his valuable feedback from time to time.

Professor Per Stenström, for being an inspiration in the field of computer architecture.

Professor Lieven Eeckhout, for evaluating my licentiate thesis and for his insightful comments on my research.

Anurag Negi, for being an accommodating roommate and a helpful co-author.

Ruben Titos, for helping me get started with the basics of transactional memory, for doing his part in the paper we co-authored, and for cheering me up with his positive vibes.

Karan Singh, Vincent M. Weaver, and Major Bhadauria, for getting me started with the power-modeling infrastructure at Cornell and for answering my naïve questions patiently.

Roberto Gioiosa and Marco Cesati, for helping me run experiments for my first paper.

Jacob Lidman, for being an amazing study partner, for sharing office space with me, and for putting up with my rants and bad jokes.

Alen, Angelos, Bapi, Dmitry, Kasyab, Mafijul, Waliullah, Gabriele, Vinay, Jochen, Fatemeh, Behrooz, Chloe, Petros, Risat, Stavros, Alirad, Ahsen, Viktor, Miquel, Vassilis, Yiannis, and Ivan, for being wonderful colleagues and friends.

Peter, Rune, and Arne, for providing invaluable IT support.

Rolf Snedsböl, for managing my teaching duties.

Eva, Lotta, Tiina, Jonna, and Marianne, for providing excellent administrative support during my time at Chalmers.

All the colleagues and staff at the Computer Science and Engineering department, for creating a healthy work environment.

Anthony Brandon, for giving me valuable support when I was working with the ρVEX processor for the ERA project.

The European Union, for funding my research.

My family, for motivating me to push for my goals and for being there as a backup whenever I needed them.

Bhavishya Goel
Göteborg, May 2016

Contents

Abstract
Preface
Acknowledgments

1 Introduction
  1.1 Power Measurement
  1.2 Power Modeling
  1.3 Energy Characterization
  1.4 Contributions
  1.5 Thesis Organization

2 Power Measurement Techniques
  2.1 Power Measurement Techniques
    2.1.1 At the Wall Outlet
    2.1.2 At the ATX Power Rails
    2.1.3 At the Processor Voltage Regulator
    2.1.4 Experimental Results
  2.2 RAPL Power Estimations
    2.2.1 Overview
    2.2.2 Experimental Results
  2.3 Related Work
  2.4 Conclusions

3 Per-core Power Estimation Model
  3.1 Modeling Approach
  3.2 Methodology
    3.2.1 Counter Selection
    3.2.2 Model Formation
  3.3 Validation
    3.3.1 Computation Overhead
    3.3.2 Estimation Error
  3.4 Management
    3.4.1 Sample Policies
    3.4.2 Experimental Setup
    3.4.3 Results
  3.5 Related Work
  3.6 Conclusions

4 Energy Characterization for Multicore Processors Using Dynamic and Static Power Models
  4.1 Power Modeling Methodology
    4.1.1 Uncore Dynamic Power Model
    4.1.2 Core Dynamic Power Model
    4.1.3 Uncore Static Power Model
    4.1.4 Core Static Power Model
    4.1.5 Total Chip Power Model
  4.2 Validation
  4.3 Energy Characterization of Frequency Scaling
    4.3.1 Energy Effects of DVFS
    4.3.2 DVFS Prediction
  4.4 Energy Characterization of Thread Scaling
    4.4.1 Energy Effects of Serial Fraction
    4.4.2 Energy Effects of Core-to-core Communication Overhead
    4.4.3 Energy Effects of Bandwidth Contention
    4.4.4 Energy Effects of Locking-Mechanism Overhead
  4.5 Related Work
  4.6 Conclusions

5 Characterization of Intel's Restricted Transactional Memory
  5.1 Experimental Setup
  5.2 Microbenchmark Analysis
    5.2.1 Basic RTM Evaluation