Automatic Generation of Models of Microarchitectures

Dissertation for the degree of Doctor of Engineering (Doktor der Ingenieurwissenschaften), Faculty of Mathematics and Computer Science, Saarland University

by Andreas Abel

Saarbrücken, 2020

Date of the colloquium: June 12, 2020
Dean: Prof. Dr. Thomas Schuster
Examination board:
Chair: Prof. Dr. Thorsten Herfet
Reviewers: Prof. Dr. Jan Reineke, Prof. Dr. Wolfgang J. Paul, Dr. Boris Köpf
Academic staff member: Dr. Roland Leißa

Abstract

Detailed microarchitectural models are necessary to predict, explain, or optimize the performance of software running on modern microprocessors. Building such models often requires significant manual effort, as the documentation provided by hardware manufacturers is typically not precise enough. The goal of this thesis is to develop techniques for generating microarchitectural models automatically.

In the first part, we focus on recent x86 microarchitectures. We implement a tool to accurately evaluate small microbenchmarks using hardware performance counters. We then describe techniques to automatically generate microbenchmarks for measuring the performance of individual instructions and for characterizing cache architectures. We apply our implementations to more than a dozen different microarchitectures.

In the second part of the thesis, we study more general techniques to obtain models of hardware components. In particular, we propose the concept of gray-box learning, and we develop a learning algorithm for Mealy machines that exploits prior knowledge about the system to be learned. Finally, we show how this algorithm can be adapted to minimize incompletely specified Mealy machines, a well-known NP-complete problem. Our implementation outperforms existing exact minimization techniques by several orders of magnitude on a number of hard benchmarks; it is even competitive with state-of-the-art heuristic approaches.

Zusammenfassung (Summary)

Predicting, explaining, or optimizing the performance of software on modern microprocessors requires detailed models of the microarchitectures in use. Creating such models often involves considerable effort, as the necessary information is typically not made available by the processor manufacturers. The goal of this thesis is to develop techniques for generating such models automatically.

In the first part, we deal with recent x86 microarchitectures. We first develop a tool that can evaluate small microbenchmarks using performance counters. We then describe techniques for automatically generating microbenchmarks that can be used to measure the performance of individual instructions and to characterize the cache architecture.

In the second part of the thesis, we consider more general techniques for generating hardware models. We propose the concept of "gray-box learning", and we develop a learning algorithm for Mealy machines that takes known information about the system to be learned into account. Finally, we show how this algorithm can be applied to the problem of minimizing incompletely specified Mealy machines. This is a well-known NP-complete problem. On several benchmarks, our implementation is orders of magnitude faster than previous approaches.

Acknowledgements

First and foremost, I would like to thank my advisor, Prof. Jan Reineke. He gave me the freedom to explore my own ideas and was always available for discussions and to provide guidance. I'm looking forward to continuing to work with him!

I would also like to thank Prof. Wolfgang Paul and Dr. Boris Köpf for reviewing my thesis, and Prof. Thorsten Herfet for acting as the chair of the examination board.
Finally, I would like to thank my current and former colleagues at the Real-Time and Embedded Systems Lab and the Compiler Design Lab. In particular, I would like to thank Dr. Roland Leißa for serving as the academic assistant on my examination board.

Contents

1 Introduction
  1.1 Contributions and Structure of This Thesis
  1.2 Publications
2 nanoBench: A Low-Overhead Tool for Running Microbenchmarks on x86 Systems
  2.1 Introduction
  2.2 Background
    2.2.1 Hardware Performance Counters
    2.2.2 Assembler Instructions
  2.3 Features
    2.3.1 Example
    2.3.2 Generated Code
    2.3.3 Running the Generated Code
    2.3.4 Kernel/User Mode
    2.3.5 Interface
    2.3.6 Loops vs. Unrolling
    2.3.7 Accessing Memory
    2.3.8 Warm-Up Runs
    2.3.9 noMem Mode
    2.3.10 Performance Counter Configurations
    2.3.11 Execution Time of nanoBench
    2.3.12 Supported Platforms
  2.4 Implementation
    2.4.1 Accurate Performance Counter Measurements
    2.4.2 Generating Code
    2.4.3 Kernel Module
    2.4.4 Allocating Physically-Contiguous Memory
  2.5 Related Work
  2.6 Conclusions and Future Work
3 uops.info: Characterizing the Latency, Throughput, and Port Usage of Instructions on x86 Microarchitectures
  3.1 Introduction
  3.2 Related Work
    3.2.1 Information Provided by the Manufacturers
    3.2.2 Measurement-Based Approaches
  3.3 Background
    3.3.1 Pipeline of Intel Core CPUs
    3.3.2 Pipeline of AMD Ryzen CPUs
  3.4 Definitions
    3.4.1 Latency
    3.4.2 Throughput
    3.4.3 Port Usage
  3.5 Algorithms
    3.5.1 Port Usage
    3.5.2 Latency
    3.5.3 Throughput
  3.6 Implementation
    3.6.1 Details of the x86 Instruction Set
    3.6.2 Measurements on the Hardware
    3.6.3 Analysis Using IACA
    3.6.4 Machine-Readable Output
  3.7 Evaluation
    3.7.1 Experimental Setup
    3.7.2 Hardware Measurements vs. Documentation
    3.7.3 Hardware Measurements vs. IACA
    3.7.4 Interesting Results
  3.8 Limitations
  3.9 Conclusions and Future Work
4 Characterizing Cache Architectures
  4.1 Introduction
  4.2 Background
    4.2.1 Cache Organization
    4.2.2 Replacement Policies
  4.3 Cache-Characterization Tools
    4.3.1 CacheInfo
    4.3.2 CacheSeq
    4.3.3 Replacement Policies
    4.3.4 Age Graphs
    4.3.5 Test for Adaptive Policies
  4.4 Results
    4.4.1 L1 Data Caches
    4.4.2 L2 Caches
    4.4.3 L3 Caches
    4.4.4 Resetting the Replacement Policy State
    4.4.5 Implementation Costs
  4.5 Related Work
    4.5.1 Microbenchmark-Based Cache Analysis
    4.5.2 Influence of the Replacement Policy on Performance Prediction Accuracy
    4.5.3 Security Aspects of Replacement Policies
  4.6 Conclusions and Future Work
5 Gray-Box Learning of Serial Compositions of Mealy Machines
  5.1 Introduction
  5.2 Problem Statement
    5.2.1 Basic Notions
    5.2.2 The Gray-Box Learning Problem
  5.3 Preliminaries
  5.4 Approach
    5.4.1 Observation Tables
    5.4.2 Inference Algorithm
  5.5 Implementation
    5.5.1 Computing the Partitions
    5.5.2 Reachability of the Error State
    5.5.3 Checking if Two Machines are Right-Equivalent
    5.5.4 Handling Counterexamples
  5.6 Evaluation
  5.7 Related Work
  5.8 Conclusions and Future Work
  5.A Appendix: Proofs for Chapter 5
6 MeMin: SAT-Based Exact Minimization of Incompletely Specified Mealy Machines
  6.1 Introduction
    6.1.1 Outline
  6.2 Definitions
    6.2.1 Basic Definitions
    6.2.2 Problem Statement
    6.2.3 General Approach
  6.3 Related Work
  6.4 Approach
    6.4.1 Incompatibility Matrix
    6.4.2 Encoding as a SAT Problem
    6.4.3 Computing a Partial Solution
  6.5 Implementation
    6.5.1 Dealing with Partially Specified Outputs
    6.5.2 Dealing with Partially Specified Inputs
    6.5.3 Undefined Reset States
  6.6 Evaluation
    6.6.1 Benchmarks
    6.6.2 Evaluation of MeMin
    6.6.3 Other Tools
    6.6.4 Experimental Setup
  6.7 Conclusions and Future Work
  6.A Appendix: Complete Benchmark Results
7 Summary, Conclusions, and Future Work
  7.1 Summary and Conclusions
    7.1.1 Models of Recent Microarchitectures
    7.1.2 General Models
  7.2 Future Work
Bibliography
Index

1 Introduction

Modern microprocessors are among the most complex man-made systems. As a consequence, it is becoming increasingly difficult to predict, explain, or optimize the performance of software running on such microprocessors. As a basis, one needs detailed models of their microarchitectures.