Combining Static and Dynamic Approaches to Model Loop Performance in HPC Vincent Palomares

Total Page:16

File Type:pdf, Size:1020Kb

Combining Static and Dynamic Approaches to Model Loop Performance in HPC Vincent Palomares Combining static and dynamic approaches to model loop performance in HPC Vincent Palomares To cite this version: Vincent Palomares. Combining static and dynamic approaches to model loop performance in HPC. Hardware Architecture [cs.AR]. Université de Versailles-Saint Quentin en Yvelines, 2015. English. NNT : 2015VERS040V. tel-01293040 HAL Id: tel-01293040 https://tel.archives-ouvertes.fr/tel-01293040 Submitted on 24 Mar 2016 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Université de Versailles Saint-Quentin-en-Yvelines École Doctorale “STV” Combiner Approches Statique et Dynamique pour Modéliser la Performance de Boucles HPC Combining Static and Dynamic Approaches to Model Loop Performance in HPC THÈSE présentée et soutenue publiquement le 21 Septembre 2015 pour l’obtention du Doctorat de l’Université de Versailles Saint-Quentin-en-Yvelines (spécialité informatique) par Vincent Palomares Composition du jury Président : François Bodin - Professeur, Université de Rennes Rapporteurs : Denis Barthou - Professeur, Université de Bordeaux Henri-Pierre Charles - Directeur de Recherche, CEA Examinateurs : Alexandre Farcy - CPU Architect, Intel David J. Kuck - Fellow, Intel Directeur de thèse : William Jalby - Professeur, Université de Versailles Thanks I would first like to thank William Jalby, my advisor, for the help and guidance he provided throughout these past few years. His healthy doses of both optimism and skepticism motivated me to keep aiming higher. I would also like to thank David Wong and David Kuck, with whom collab- orating was very pleasant and stimulating. Working on Cape with them was an enriching experience. I am grateful to Denis Barthou and Henri-Pierre Charles, the reporters, for reviewing my work and making suggestions on how to improve my manuscript. Their input was very helpful and contributed to making this document clearer. My thesis work was extremely fun, in no small part thanks to Zakaria Bendi- fallah and José Noudohouenou. We certainly had our share of good laughs in our office. Let’s just hope the quotations we’ve collected along the years1 never get in the wrong hands. I would like to thank all the members of the lab for their dynamism, competence and friendliness. I do not want to make a comprehensive list here (for fear I may forget a name or two2 and cause jealousy between those mentioned and those not). Surely, a blanket statement should be enough to prevent that?! Some of them are however definitely worthy of a special mention. First, Emmanuel Oseret, with whom I often discussed microarchitectural details on Intel CPUs, and who agreed to implement some features in his CQA tool that were specially tailored to my needs. He also proofread my manuscript, helping me rid it of various mistakes. I also want to mention Mathieu Tribalat, who helped me navigate some of MAQAO’s intricacies, and whose work in taking over DECAN made getting application measurements considerably easier. The end-of-thesis rush and the tension that came with it were made easier by chatting (and complaining!) with other students in the same situation. I want to thank Nicolas Triquenaux for finding and sharing information about thesis completion procedures, and Zakaria (once again), whose perpetual struggle with paperwork made it easier to see mine was quite mild after all. On a more personal side, I would like to thank Anna Galusza, my fiancée, for supporting me throughout the writing of this manuscript and reviewing some of its key parts, as well as my parents and the rest of my family for helping me get this far. Finally, I would like to thank you, the reader: you are the reason why I wrote this manuscript3. This is particularly true if you read it until the end, at which point you should feel free to fill the following blank with your name4: THANK YOU, ! 1 Some quotations? What quotations? Move along, there is nothing to see here! 2 Or three... 3 You, and getting my Ph.D., that is. 4 NB: Not all libraries are very fond of this practice, so you might want to be discreet if using a borrowed copy. ii Fluffy Canary I am told I can do anything I want with the Thanks section... Let’s see if it’s true! iii Résumé: La complexité des CPUs s’est accrue considérablement depuis leurs débuts, in- troduisant des mécanismes comme le renommage de registres, l’exécution dans le désordre, la vectorisation, les préfetchers et les environnements multi-coeurs pour améliorer les performances avec chaque nouvelle génération de processeurs. Cepen- dant, la difficulté a suivi la même tendance pour ce qui est a) d’utiliser ces mêmes mécanismes à leur plein potentiel, b) d’évaluer si un programme utilise une machine correctement, ou c) de savoir si le design d’un processeur répond bien aux besoins des utilisateurs. Cette thèse porte sur l’amélioration de l’observabilité des facteurs limitants dans les boucles de calcul intensif, ainsi que leurs interactions au sein de microarchitec- tures modernes. Nous introduirons d’abord un framework combinant CQA et DECAN (des outils d’analyse respectivement statique et dynamique) pour obtenir des métriques détail- lées de performance sur des petits codelets et dans divers scénarios d’exécution. Nous présenterons ensuite PAMDA, une méthodologie d’analyse de performance tirant partie de l’analyse de codelets pour détecter d’éventuels problèmes de perfor- mance dans des applications de calcul à haute performance et en guider la résolution. Un travail permettant au modèle linéaire Cape de couvrir la microarchitecture Sandy Bridge de façon détaillée sera décrit, lui donnant plus de flexibilité pour effectuer du codesign matériel / logiciel. Il sera mis en pratique dans VP3, un outil évaluant les gains de performance atteignables en vectorisant des boucles. Nous décrirons finalement UFS, une approche combinant analyse statique et simulation au cycle près pour permettre l’estimation rapide du temps d’exécution d’une boucle en prenant en compte certaines des limites de l’exécution en désordre dans des microarchitectures modernes. Mots Clés: codelet, analyse de boucle, analyse statique, analyse dynamique, calcul intensif, HPC, optimisation, modélisation rapide, performance, exécution dans le désordre, simulation au cycle près iv Abstract: The complexity of CPUs has increased considerably since their beginnings, intro- ducing mechanisms such as register renaming, out-of-order execution, vectorization, prefetchers and multi-core environments to keep performance rising with each prod- uct generation. However, so has the difficulty in making proper use of all these mechanisms, or even evaluating whether one’s program makes good use of a ma- chine, whether users’ needs match a CPU’s design, or, for CPU architects, knowing how each feature really affects customers. This thesis focuses on increasing the observability of potential bottlenecks in HPC computational loops and how they relate to each other in modern microarchi- tectures. We will first introduce a framework combining CQA and DECAN (respectively static and dynamic analysis tools) to get detailed performance metrics on small codelets in various execution scenarios. We will then present PAMDA, a performance analysis methodology leveraging elements obtained from codelet analysis to detect potential performance problems in HPC applications and help resolve them. A work extending the Cape linear model to better cover Sandy Bridge and give it more flexibility for HW/SW codesign purposes will also be described. It will be directly used in VP3, a tool evaluating the performance gains vectorizing loops could provide. Finally, we will describe UFS, an approach combining static analysis and cycle- accurate simulation to very quickly estimate a loop’s execution time while accounting for out-of-order limitations in modern CPUs. Keywords: codelet, loop analysis, static analysis, dynamic analysis, HPC, optimization, fast modeling, performance, out-of-order, cycle-accurate simulation Contents 1 Introduction1 1.1 High Performance Computing (HPC)..................1 1.2 Objectives and Contributions......................2 1.3 Overview.................................2 2 Background5 2.1 Recent Microarchitectures........................5 2.1.1 Sandy Bridge...........................5 2.1.2 Ivy Bridge............................. 10 2.1.3 Haswell.............................. 10 2.1.4 Silvermont............................. 12 2.2 Performance Analysis........................... 14 2.2.1 Static Analysis.......................... 15 2.2.2 Dynamic Analysis........................ 16 2.2.3 Simulation............................. 18 2.3 Using Codelets.............................. 21 2.3.1 Codelet Presentation....................... 21 2.3.2 Artificial Codelets........................ 22 2.3.3 Extracted Codelets........................ 23 3 Codelet Performance Measurement Framework 25 3.1 Introduction................................ 25 3.2 Target Loops: Numerical Recipes Codelets............... 26 3.2.1 Obtention and Target Properties................ 26 3.2.2 Presentation and Categories................... 27 3.3 Measurement
Recommended publications
  • Microcode Revision Guidance August 31, 2019 MCU Recommendations
    microcode revision guidance August 31, 2019 MCU Recommendations Section 1 – Planned microcode updates • Provides details on Intel microcode updates currently planned or available and corresponding to Intel-SA-00233 published June 18, 2019. • Changes from prior revision(s) will be highlighted in yellow. Section 2 – No planned microcode updates • Products for which Intel does not plan to release microcode updates. This includes products previously identified as such. LEGEND: Production Status: • Planned – Intel is planning on releasing a MCU at a future date. • Beta – Intel has released this production signed MCU under NDA for all customers to validate. • Production – Intel has completed all validation and is authorizing customers to use this MCU in a production environment.
    [Show full text]
  • A Superscalar Out-Of-Order X86 Soft Processor for FPGA
    A Superscalar Out-of-Order x86 Soft Processor for FPGA Henry Wong University of Toronto, Intel [email protected] June 5, 2019 Stanford University EE380 1 Hi! ● CPU architect, Intel Hillsboro ● Ph.D., University of Toronto ● Today: x86 OoO processor for FPGA (Ph.D. work) – Motivation – High-level design and results – Microarchitecture details and some circuits 2 FPGA: Field-Programmable Gate Array ● Is a digital circuit (logic gates and wires) ● Is field-programmable (at power-on, not in the fab) ● Pre-fab everything you’ll ever need – 20x area, 20x delay cost – Circuit building blocks are somewhat bigger than logic gates 6-LUT6-LUT 6-LUT6-LUT 3 6-LUT 6-LUT FPGA: Field-Programmable Gate Array ● Is a digital circuit (logic gates and wires) ● Is field-programmable (at power-on, not in the fab) ● Pre-fab everything you’ll ever need – 20x area, 20x delay cost – Circuit building blocks are somewhat bigger than logic gates 6-LUT 6-LUT 6-LUT 6-LUT 4 6-LUT 6-LUT FPGA Soft Processors ● FPGA systems often have software components – Often running on a soft processor ● Need more performance? – Parallel code and hardware accelerators need effort – Less effort if soft processors got faster 5 FPGA Soft Processors ● FPGA systems often have software components – Often running on a soft processor ● Need more performance? – Parallel code and hardware accelerators need effort – Less effort if soft processors got faster 6 FPGA Soft Processors ● FPGA systems often have software components – Often running on a soft processor ● Need more performance? – Parallel
    [Show full text]
  • The Microarchitecture of the Pentium 4 Processor
    The Microarchitecture of the Pentium 4 Processor Glenn Hinton, Desktop Platforms Group, Intel Corp. Dave Sager, Desktop Platforms Group, Intel Corp. Mike Upton, Desktop Platforms Group, Intel Corp. Darrell Boggs, Desktop Platforms Group, Intel Corp. Doug Carmean, Desktop Platforms Group, Intel Corp. Alan Kyker, Desktop Platforms Group, Intel Corp. Patrice Roussel, Desktop Platforms Group, Intel Corp. Index words: Pentium® 4 processor, NetBurst™ microarchitecture, Trace Cache, double-pumped ALU, deep pipelining provides an in-depth examination of the features and ABSTRACT functions of the Intel NetBurst microarchitecture. This paper describes the Intel® NetBurst™ ® The Pentium 4 processor is designed to deliver microarchitecture of Intel’s new flagship Pentium 4 performance across applications where end users can truly processor. This microarchitecture is the basis of a new appreciate and experience its performance. For example, family of processors from Intel starting with the Pentium it allows a much better user experience in areas such as 4 processor. The Pentium 4 processor provides a Internet audio and streaming video, image processing, substantial performance gain for many key application video content creation, speech recognition, 3D areas where the end user can truly appreciate the applications and games, multi-media, and multi-tasking difference. user environments. The Pentium 4 processor enables real- In this paper we describe the main features and functions time MPEG2 video encoding and near real-time MPEG4 of the NetBurst microarchitecture. We present the front- encoding, allowing efficient video editing and video end of the machine, including its new form of instruction conferencing. It delivers world-class performance on 3D cache called the Execution Trace Cache.
    [Show full text]
  • Multiprocessing Contents
    Multiprocessing Contents 1 Multiprocessing 1 1.1 Pre-history .............................................. 1 1.2 Key topics ............................................... 1 1.2.1 Processor symmetry ...................................... 1 1.2.2 Instruction and data streams ................................. 1 1.2.3 Processor coupling ...................................... 2 1.2.4 Multiprocessor Communication Architecture ......................... 2 1.3 Flynn’s taxonomy ........................................... 2 1.3.1 SISD multiprocessing ..................................... 2 1.3.2 SIMD multiprocessing .................................... 2 1.3.3 MISD multiprocessing .................................... 3 1.3.4 MIMD multiprocessing .................................... 3 1.4 See also ................................................ 3 1.5 References ............................................... 3 2 Computer multitasking 5 2.1 Multiprogramming .......................................... 5 2.2 Cooperative multitasking ....................................... 6 2.3 Preemptive multitasking ....................................... 6 2.4 Real time ............................................... 7 2.5 Multithreading ............................................ 7 2.6 Memory protection .......................................... 7 2.7 Memory swapping .......................................... 7 2.8 Programming ............................................. 7 2.9 See also ................................................ 8 2.10 References .............................................
    [Show full text]
  • Evaluation of the Intel Sandy Bridge-EP Server Processor
    Evaluation of the Intel Sandy Bridge-EP server processor Sverre Jarp, Alfio Lazzaro, Julien Leduc, Andrzej Nowak CERN openlab, March 2012 – version 2.2 Executive Summary In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing an 8-core “Sandy Bridge-EP” processor with Intel’s previous microarchitecture, the “Westmere-EP”. The Intel marketing names for these processors are “Xeon E5-2600 processor series” and “Xeon 5600 processor series”, respectively. Both processors are produced in a 32nm process, and both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores via Simultaneous Multi-Threading (SMT), the cache sizes available, the memory configuration installed, as well as the power configuration if throughput per watt is to be measured. Software must also be kept under control and we show that a change of operating system or compiler can lead to different results, as well. We have tried to do a good job of comparing like with like. In summary, we obtained a performance improvement of 9 – 20% per core and 46 – 60% improvement across all cores available.
    [Show full text]
  • The Intel X86 Microarchitectures Map Version 2.0
    The Intel x86 Microarchitectures Map Version 2.0 P6 (1995, 0.50 to 0.35 μm) 8086 (1978, 3 µm) 80386 (1985, 1.5 to 1 µm) P5 (1993, 0.80 to 0.35 μm) NetBurst (2000 , 180 to 130 nm) Skylake (2015, 14 nm) Alternative Names: i686 Series: Alternative Names: iAPX 386, 386, i386 Alternative Names: Pentium, 80586, 586, i586 Alternative Names: Pentium 4, Pentium IV, P4 Alternative Names: SKL (Desktop and Mobile), SKX (Server) Series: Pentium Pro (used in desktops and servers) • 16-bit data bus: 8086 (iAPX Series: Series: Series: Series: • Variant: Klamath (1997, 0.35 μm) 86) • Desktop/Server: i386DX Desktop/Server: P5, P54C • Desktop: Willamette (180 nm) • Desktop: Desktop 6th Generation Core i5 (Skylake-S and Skylake-H) • Alternative Names: Pentium II, PII • 8-bit data bus: 8088 (iAPX • Desktop lower-performance: i386SX Desktop/Server higher-performance: P54CQS, P54CS • Desktop higher-performance: Northwood Pentium 4 (130 nm), Northwood B Pentium 4 HT (130 nm), • Desktop higher-performance: Desktop 6th Generation Core i7 (Skylake-S and Skylake-H), Desktop 7th Generation Core i7 X (Skylake-X), • Series: Klamath (used in desktops) 88) • Mobile: i386SL, 80376, i386EX, Mobile: P54C, P54LM Northwood C Pentium 4 HT (130 nm), Gallatin (Pentium 4 Extreme Edition 130 nm) Desktop 7th Generation Core i9 X (Skylake-X), Desktop 9th Generation Core i7 X (Skylake-X), Desktop 9th Generation Core i9 X (Skylake-X) • Variant: Deschutes (1998, 0.25 to 0.18 μm) i386CXSA, i386SXSA, i386CXSB Compatibility: Pentium OverDrive • Desktop lower-performance: Willamette-128
    [Show full text]
  • A Performance Analysis Tool for Intel SGX Enclaves
    sgx-perf: A Performance Analysis Tool for Intel SGX Enclaves Nico Weichbrodt Pierre-Louis Aublin Rüdiger Kapitza IBR, TU Braunschweig LSDS, Imperial College London IBR, TU Braunschweig Germany United Kingdom Germany [email protected] [email protected] [email protected] ABSTRACT the provider or need to refrain from offloading their workloads Novel trusted execution technologies such as Intel’s Software Guard to the cloud. With the advent of Intel’s Software Guard Exten- Extensions (SGX) are considered a cure to many security risks in sions (SGX)[14, 28], the situation is about to change as this novel clouds. This is achieved by offering trusted execution contexts, so trusted execution technology enables confidentiality and integrity called enclaves, that enable confidentiality and integrity protection protection of code and data – even from privileged software and of code and data even from privileged software and physical attacks. physical attacks. Accordingly, researchers from academia and in- To utilise this new abstraction, Intel offers a dedicated Software dustry alike recently published research works in rapid succession Development Kit (SDK). While it is already used to build numerous to secure applications in clouds [2, 5, 33], enable secure network- applications, understanding the performance implications of SGX ing [9, 11, 34, 39] and fortify local applications [22, 23, 35]. and the offered programming support is still in its infancy. This Core to all these works is the use of SGX provided enclaves, inevitably leads to time-consuming trial-and-error testing and poses which build small, isolated application compartments designed to the risk of poor performance.
    [Show full text]
  • Intel PSU Cage Replacement Process Support Guide
    Intel® PSU Cage Replacement Process Support Guide A Guide for Technically Qualified Assemblers of Intel® 2U ATX Products Document No.: PSU-01 Revision No.: 002 Disclaimer Information in this document is provided in connection with Intel® products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not designed, intended or authorized for use in any medical, life saving, or life sustaining applications or for any other application in which the failure of the Intel product could create a situation where personal injury or death may occur. Intel may make changes to specifications and product descriptions at any time, without notice. Intel server boards contain a number of high-density VLSI and power delivery components that need adequate airflow for cooling. Intel's own chassis are designed and tested to meet the intended thermal requirements of these components when the fully integrated system is used together. It is the responsibility of the system integrator that chooses not to use Intel developed server building blocks to consult vendor datasheets and operating parameters to determine the amount of airflow required for their specific application and environmental conditions. Intel Corporation can not be held responsible if components fail or the server board does not operate correctly when used outside any of their published operating or non-operating limits.
    [Show full text]
  • M39 Sandy Bridge-PDF
    SANDY BRIDGE SPANS GENERATIONS Intel Focuses on Graphics, Multimedia in New Processor Design By Linley Gwennap {9/27/10-01} ................................................................................................................... Intel’s processor clock has tocked, delivering a next- periods. For notebook computers, these improvements can generation architecture for PCs and servers. At the recent significantly extend battery life by completing tasks more Intel Developer’s Forum (IDF), the company unveiled its quickly and allowing the system to revert to a sleep state. Sandy Bridge processor architecture, the next tock in its tick-tock roadmap. The new CPU is an evolutionary im- Integration Boosts Graphics Performance provement over its predecessor, Nehalem, tweaking the Intel had a false start with integrated graphics: the ill-fated branch predictor, register renaming, and instruction de- Timna project, which was canceled in 2000. More recently, coding. These changes will slightly improve performance Nehalem-class processors known as Arrandale and Clark- on traditional integer applications, but we may be reaching dale “integrated” graphics into the processor, but these the point where the CPU microarchitecture is so efficient, products actually used two chips in one package, as Figure few ways remain to improve performance. 1 shows. By contrast, Sandy Bridge includes the GPU on The big changes in Sandy Bridge target multimedia the processor chip, providing several benefits. The GPU is applications such as 3D graphics, image processing, and now built in the same leading-edge manufacturing process video processing. The chip is Intel’s first to integrate the as the CPU, rather than an older process, as in earlier graphics processing unit (GPU) on the processor itself.
    [Show full text]
  • Intel® Itanium® Architecture Assembly Language Reference Guide
    Intel® Itanium® Architecture Assembly Language Reference Guide Copyright © 2000 - 2003 Intel Corporation. All rights reserved. Order Number: 248801-004 World Wide Web: http://developer.intel.com Intel(R) Itanium(R) Architecture Assembly Lanuage Reference Guide Page 2 Disclaimer and Legal Information Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications. This Intel® Itanium® Architecture Assembly Language Reference Guide as well as the software described in it is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this manual is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
    [Show full text]
  • CPU) MCU / MPU / DSP This Page of Product Is Rohs Compliant
    INTEL Central Processing Units (CPU) MPU /DSP MCU / This page of product is RoHS compliant. CENTRAL PROCESSING UNITS (CPU) Intel Processor families include the most powerful and flexible Central Processing Units (CPUs) available today. Utilizing industry leading 22nm device fabrication techniques, Intel continues to pack greater processing power into smaller spaces than ever before, providing desktop, mobile, and embedded products with maximum performance per watt across a wide range of applications. Atom Celeron Core Pentium Xeon For quantities greater than listed, call for quote. MOUSER Intel Core Cache Data Price Each Package Processor Family Code Freq. Size No. of Bus Width TDP STOCK NO. Part No. Series Name (GHz) (MB) Cores (bit) (Max) (W) 1 10 Desktop Intel 607-DF8064101211300Y DF8064101211300S R0VY FCBGA-559 D2550 Atom™ Cedarview 1.86 1 2 64 10 61.60 59.40 607-CM8063701444901S CM8063701444901S R10K FCLGA-1155 G1610 Celeron® Ivy Bridge 2.6 2 2 64 55 54.93 52.70 607-RK80532RC041128S RK80532RC041128S L6VR PPGA-478 - Celeron® Northwood 2.0 0.0156 1 32 52.8 42.00 40.50 607-CM8062301046804S CM8062301046804S R05J FCLGA-1155 G540 Celeron® Sandy Bridge 2.5 2 2 64 65 54.60 52.65 607-AT80571RG0641MLS AT80571RG0641MLS LGTZ LGA-775 E3400 Celeron® Wolfdale 2.6 1 2 64 65 54.93 52.70 607-HH80557PG0332MS HH80557PG0332MS LA99 LGA-775 E4300 Core™ 2 Conroe 1.8 2 2 64 65 139.44 133.78 607-AT80570PJ0806MS AT80570PJ0806MS LB9J LGA-775 E8400 Core™ 2 Wolfdale 3.0 6 2 64 65 207.04 196.00 607-AT80571PH0723MLS AT80571PH0723MLS LGW3 LGA-775 E7400 Core™ 2 Wolfdale
    [Show full text]
  • Intel® Omni-Path Architecture Overview and Update
    The architecture for Discovery June, 2016 Intel Confidential Caught in the Vortex…? Business Efficiency & Agility DATA: Trust, Privacy, sovereignty Innovation: New Economy Biz Models Macro Economic Effect Growth Enablers/Inhibitors Intel® Solutions Summit 2016 2 Intel® Solutions Summit 2016 3 Intel Confidential 4 Data Center Blocks Reduce Complexity Intel engineering, validation, support Data Center Blocks Speed time to market Begin with a higher level of integration HPC Cloud Enterprise Storage Increase Value VSAN Ready HPC Compute SMB Server Block Reduce TCO, value pricing Block Node Fuel innovation Server blocks for specific segments Focus R&D on value-add and differentiation Intel® Solutions Summit 2016 5 A Holistic Design Solution for All HPC Needs Intel® Scalable System Framework Small Clusters Through Supercomputers Compute Memory/Storage Compute and Data-Centric Computing Fabric Software Standards-Based Programmability On-Premise and Cloud-Based Intel Silicon Photonics Intel® Xeon® Processors Intel® Solutions for Lustre* Intel® Omni-Path Architecture HPC System Software Stack Intel® Xeon Phi™ Processors Intel® SSDs Intel® True Scale Fabric Intel® Software Tools Intel® Xeon Phi™ Coprocessors Intel® Optane™ Technology Intel® Ethernet Intel® Cluster Ready Program Intel® Server Boards and Platforms 3D XPoint™ Technology Intel® Silicon Photonics Intel® Visualization Toolkit Intel Confidential 14 Parallel is the Path Forward Intel® Xeon® and Intel® Xeon Phi™ Product Families are both going parallel How do we attain extremely high compute
    [Show full text]