Intel® Omni-Path Architecture Overview and Update

The architecture for Discovery
June, 2016
Intel Confidential

Caught in the Vortex…?
• Business Efficiency & Agility
• Data: Trust, Privacy, Sovereignty
• Innovation: New Economy Biz Models
• Macro Economic Effect
• Growth Enablers/Inhibitors

Intel® Solutions Summit 2016

Data Center Blocks
• Reduce Complexity: Intel engineering, validation, support
• Speed time to market: Begin with a higher level of integration
• Increase Value: Reduce TCO, value pricing
• Fuel innovation: Focus R&D on value-add and differentiation
• Server blocks for specific segments: HPC, Cloud, Enterprise, Storage
• Block and node offerings shown: VSAN Ready, HPC Compute, SMB Server

A Holistic Design Solution for All HPC Needs
Intel® Scalable System Framework
• Small Clusters Through Supercomputers
• Compute and Data-Centric Computing
• Standards-Based Programmability
• On-Premise and Cloud-Based

Compute: Intel® Xeon® Processors; Intel® Xeon Phi™ Processors; Intel® Xeon Phi™ Coprocessors; Intel® Server Boards and Platforms
Memory/Storage: Intel® Solutions for Lustre*; Intel® SSDs; Intel® Optane™ Technology; 3D XPoint™ Technology
Fabric: Intel® Omni-Path Architecture; Intel® True Scale Fabric; Intel® Ethernet; Intel® Silicon Photonics
Software: HPC System Software Stack; Intel® Software Tools; Intel® Cluster Ready Program; Intel® Visualization Toolkit

Parallel is the Path Forward
Intel® Xeon® and Intel® Xeon Phi™ Product Families are both going parallel.
How do we attain extremely high compute density for parallel workloads AND maintain the robust programming models and tools that developers crave?
More cores, more threads, wider vectors:

Processor                                                  Core(s)   Threads    SIMD width
Intel® Xeon® processor 64-bit                              1         2          128
Intel® Xeon® processor 5100 series                         2         2          128
Intel® Xeon® processor 5500 series                         4         8          128
Intel® Xeon® processor 5600 series                         6         12         128
Intel® Xeon® processor code-named Sandy Bridge EP          8         16         256 (AVX)
Intel® Xeon® processor code-named Ivy Bridge EP            12        24         256 (AVX)
Intel® Xeon® processor code-named Haswell EP               18        36         256 (AVX2)
Intel® Xeon Phi™ coprocessor Knights Corner                57-61     228-244    512
Intel® Xeon Phi™ coprocessor Knights Landing (1)           72        288        2 x 512

*Product specification for launched and shipped products available on ark.intel.com.
(1) Not launched - in planning.

Tick-Tock Development Model
Sustained Microprocessor Leadership

Microarchitecture       Processor       Process   Step   What is new
Nehalem (SSE)           Nehalem         45nm      TOCK   New microarchitecture
Nehalem (SSE)           Westmere        32nm      TICK   New process technology
Sandy Bridge (AVX)      Sandy Bridge    32nm      TOCK   New microarchitecture
Sandy Bridge (AVX)      Ivy Bridge      22nm      TICK   New process technology
Haswell (AVX2)          Haswell         22nm      TOCK   New microarchitecture
Haswell (AVX2)          Broadwell       14nm      TICK   New process technology
SkyLake (AVX512)        SkyLake         14nm      TOCK   New microarchitecture
SkyLake (AVX512)        Future          XXnm      TICK   New process technology

Typically, an increase in transistor density enables new capabilities, higher performance levels, and greater energy efficiency.

Intel® Xeon® processor E5-2600 v4 product family
Grantley-Refresh Overview
• Broadwell microarchitecture
• Built on 14nm process technology
• Socket compatible# replacement for Intel® Xeon® processor E5-2600 v3 on Grantley
• Several new features and capabilities

Feature: Xeon E5-2600 v3 (Haswell-EP) / Xeon E5-2600 v4 (Broadwell-EP)
• Cores per socket: up to 18 / up to 22
• Threads per socket: up to 36 threads / up to 44 threads
• Last-level cache (LLC): up to 45 MB / up to 55 MB
• QPI speed (GT/s): 2x QPI 1.1 channels; 6.4, 8.0, 9.6 GT/s (same for both)
• PCIe* lanes / controllers / speed (GT/s): 40 / 10 / PCIe* 3.0 (2.5, 5, 8 GT/s) (same for both)
• Memory population: 4 channels of up to 3 RDIMMs or 3 LRDIMMs / + 3DS LRDIMM&
• Max memory speed: up to 2133 / up to 2400
• TDP (W): 160 (Workstation only), 145, 135, 120, 105, 90, 85, 65, 55 (same for both)

# Requires BIOS and firmware update.
& Depends on market availability.
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel may make changes to specifications and product descriptions at any time, without notice.

Intel® Xeon Phi™ Product Family
Highly-parallel processing to power your breakthrough innovations
“Meet Knight's Landing: Intel's most powerful chip ever is overflowing with cutting-edge technologies” – PC World 06/2014

Available Today: Knights Corner (Intel® Xeon Phi™ x100 Product Family)
• 22 nm process
• Coprocessor only
• >1 TF DP Peak
• Up to 61 Cores
• Up to 16GB GDDR5

Coming Soon: Knights Landing (Intel® Xeon Phi™ x200 Product Family)
• 14 nm process
• Host Processor & Coprocessor
• >3 TF DP Peak (1)
• Up to 72 Cores
• Up to 16GB HBM
• Up to 384GB DDR4 (2)
• ~500 GB/s STREAM
• Integrated Fabric (2)

Future: Knights Hill (3rd generation)
• 10 nm process
• Integrated Fabric (2nd Generation)
• In Planning…

*Results will vary. This simplified test is the result of the distillation of the more in-depth programming guide found here: https://software.intel.com/sites/default/files/article/383067/is-xeon-phi-right-for-me.pdf
(1) Over 3 Teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency and floating point operations per cycle. FLOPS = cores x clock frequency x floating-point operations per cycle.
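The peak-performance footnote boils down to FLOPS = cores x clock frequency x floating-point operations per cycle. The sketch below works that product through for Knights Landing using the figures the deck does give (up to 72 cores, 2 x 512-bit vector units per core). The deck does not state a clock frequency, so the 1400 MHz value is purely an illustrative assumption, as is counting a fused multiply-add as two operations; the function names are hypothetical.

```python
def dp_flops_per_cycle(vpus_per_core: int, vector_bits: int, fma: bool = True) -> int:
    """Double-precision FLOPs one core can issue per cycle."""
    lanes = vector_bits // 64        # number of 64-bit doubles per vector
    ops_per_lane = 2 if fma else 1   # assumption: a fused multiply-add counts as 2 FLOPs
    return vpus_per_core * lanes * ops_per_lane


def peak_mflops(cores: int, clock_mhz: int, flops_per_cycle: int) -> int:
    """Peak MFLOPS = cores x clock (MHz) x FLOPs per cycle."""
    return cores * clock_mhz * flops_per_cycle


KNL_CORES = 72            # from the deck: "Up to 72 cores"
KNL_VPUS_PER_CORE = 2     # from the deck: "2x 512b VPU per core"
KNL_VECTOR_BITS = 512
ASSUMED_CLOCK_MHZ = 1400  # NOT from the deck; illustrative assumption only

per_cycle = dp_flops_per_cycle(KNL_VPUS_PER_CORE, KNL_VECTOR_BITS)
peak = peak_mflops(KNL_CORES, ASSUMED_CLOCK_MHZ, per_cycle)
print(f"{per_cycle} DP FLOPs/cycle/core, {peak / 1e6:.2f} TFLOPS peak")
```

Under these assumptions the product lands above the ">3 TF DP Peak" figure quoted for Knights Landing; a different clock or a different way of counting operations per cycle would move the result accordingly.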
Footnote 2 (Intel® Xeon Phi™ product family slide): Host processor only.

Three (3) Knights Landing Products

Knights Landing Processor (Groveport Platform)
• “Self-boot” Intel® Xeon Phi™ processor platform
• Solution for future clusters with both Xeon and Xeon Phi

Knights Landing Coprocessor (ingredient in Grantley/Purley Platform)
• Requires Intel® Xeon® processor host
• Solution for general purpose servers and workstations

Points listed across the products:
• Binary-compatible with Intel® Xeon® processor (Skylake)
• Targeted for applications with larger sections of serial work (1)
• Higher performance density for highly parallel applications (2)
• Upgrade path from Knights Corner as PCIe* card
• Reduced system power consumption (2)
• Higher perf/Watt & perf/$$ (3)

For more info, download the Groveport (KNL) Snapshot: https://sharepoint.amr.ith.intel.com/sites/snapshot/Groveport

*Other names and brands may be claimed as the property of others.
(1) Projections based on early product definition and as compared to prior generation Intel® Xeon Phi™ Coprocessors.
(2) Based on Intel internal analysis. Lower power based on power consumption estimates between (2) HCAs compared to 15W additional power for KNL-F. Higher density based on removal of PCIe* slots and associated HCAs populated in those slots.
(3) Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

Knights Landing Architectural Diagram
[Diagram; callouts recovered from the slide:]
• Over 3 TF DP peak
• Full Xeon ISA compatibility through AVX-512
• ~3x single-thread vs. Knights Corner
• Up to 16GB high-bandwidth on-package memory (MCDRAM); exposed as NUMA node; ~500 GB/s sustained BW
• Up to 72 cores; 2D mesh architecture
• Tile: 2 cores, 2 VPUs per core, HUB, 1MB L2
• 2x 512b VPU per core (Vector Processing Units)
• DDR4: 6 channels, up to 384GB
• Wellsburg PCH over DMI; common with Grantley PCH
• PCIe Gen3 x36
• Integrated Fabric: on-package HFI, 2 ports Storm Lake, 50 GB/s bi-directional, Micro-Coax Cable (IFP)
• Core based on Intel® Atom Silvermont processor with many HPC enhancements: deep out-of-order buffers, gather/scatter in hardware, improved branch prediction, 4 threads/core, high cache bandwidth & more

Intel® True Scale Fabric
• ~10% of Verbs-based Instructions
• Network Infrastructure - Optimized Price/Performance interconnect for HPC
• Host Architecture - High MPI message rate & low end-to-end latency
• Scalable Switch Solution - Performance & Latency scales with network

Intel Confidential - Do Not Forward

Intel® Omni-Path Architecture: Changing the Fabric Landscape
[Diagram: CPU-Fabric Integration over Time, optimizing Performance, Density, Power, and Cost. Stages shown:]
• Discrete PCIe HFI: Intel® OPA HFI Card + Intel® Xeon® processor E5-2600 v3
• Multi-chip package integration: Intel® Xeon Phi™ processor (Knights Landing); next Intel® Xeon® processor
• Next Generation: next Intel® Xeon Phi™ processor (Knights Hill); next Intel® Xeon® processor

Intel® Omni-Path Architecture Product Family
• HFI ASIC (Wolf River): Intel OPA Gen1 Host Fabric Interface (HFI) Silicon; 2 x 100 Gbps, 50 GB/sec Fabric Bandwidth
• Switch ASIC (Prairie River): Intel OPA Gen1 Switch Silicon; 48 ports, 9.6Tb/s, 1200 GB/sec Fabric Bandwidth
• Software: Intel® Fabric Suite [based on OFA with Intel® OPA support]
• Cables: Passive Copper & Active Optical Cable (AOC)

Product Line
• Custom Mezz & PCIe Cards: OEM products based on Wolf
• Standard PCIe Board (Chippewa Forest): Low Profile PCIe v3.0 x16; Low Profile PCIe v3.0 x8
• Integrated Xeon® and Xeon Phi™: Knights Landing: 2 x 100 Gbps ports
• Edge Switch (Eldorado Forest): 24 / 48 port individual QSFP28 ports
• Director Class Switches (DCS) (Sawtooth Forest): 768-port (20U); 192-port (7U); 1056-port (in planning); 264-port (in planning); Full Bisection Bandwidth; QSFP-based leaf module
• Custom Switches: OEM products based on Prairie
• Cabling noted on the slide: Passive Cu; Short reach - QSFP28 Cu cables
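The fabric bandwidth figures quoted for the two Intel OPA Gen1 ASICs (2 x 100 Gbps giving 50 GB/sec for the HFI; 48 ports giving 9.6Tb/s and 1200 GB/sec for the switch) are consistent with summing both directions of every port. The sketch below reproduces that arithmetic. The counting convention and the 100 Gbps per switch port rate are assumptions inferred from the numbers themselves, not something the slides state outright, and the function name is hypothetical.

```python
def aggregate_fabric_gbps(ports: int, gbps_per_port: int, bidirectional: bool = True) -> int:
    """Aggregate link rate in Gb/s, optionally counting both directions."""
    directions = 2 if bidirectional else 1  # assumption: quoted figures sum TX and RX
    return ports * gbps_per_port * directions


def gbps_to_gbytes(gbps: int) -> float:
    """Convert Gb/s to GB/s (8 bits per byte, ignoring encoding overhead)."""
    return gbps / 8


hfi_gbps = aggregate_fabric_gbps(ports=2, gbps_per_port=100)      # HFI ASIC: 2 x 100 Gbps
switch_gbps = aggregate_fabric_gbps(ports=48, gbps_per_port=100)  # switch ASIC: 48 ports

print(f"HFI:    {hfi_gbps} Gb/s = {gbps_to_gbytes(hfi_gbps)} GB/s")
print(f"Switch: {switch_gbps / 1000} Tb/s = {gbps_to_gbytes(switch_gbps)} GB/s")
```

Counting a single direction instead (bidirectional=False) halves both results, which is why the same hardware can be described with different headline numbers.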