Intel's Knights Landing

Intel’s Knights Landing – what’s old, what’s new?
April 2nd 2014, Annual Concurrency Forum Meeting
Andrzej Nowak, CERN openlab CTO office

Overview
[Figure: old die shot]

High level view

            KNC (2012)         KNL (2015+)
  Core      61 cores           72 cores
            P54C (Pentium)     Silvermont (Atom)
            1-1.2 GHz          ?
  Memory    16 GB GDDR         16 GB eDRAM (3D)
            0 GB DDR3          Up to 384 GB DDR4
  Package   22 nm              14 nm
            1 TF DP            3 TF DP
            PCIe gen2          PCIe gen3, Socket

Core progression
• Move from a beefed-up P54C core to a Silvermont (modern Atom) core
  - Out of order!
  - Still 4 threads/core
  - 1 MB L2 shared across two cores: no change in cache size
• Mesh fabric interconnect rather than ring bus
• AVX-512 (3.1) implemented: convergence with Xeon
• The promise: 3-4x more GF/Watt than current supercomputers. Will it hold?

Programmability (1)
• Bootability implies widespread OS support
  - Fully native applications made easy (i.e. normal)
  - Kernel patches already inbound
• AVX-512 convergence will improve the quality of software tools and compilers
  - Less maintenance, more performance
  - No guarantee of binary compatibility so far
• Parallelization, vectorization
  - No escape

Programmability (2)
• A new layer of memory
  - High-BW eDRAM on chip, DDR further away
  - Can we afford to remain agnostic?

“Crazy” stuff
• >100 PF system target
• 3D memory on board, 500 GB/s
• Socket option (single)
• Integrated next-gen 100 Gbit interconnect: Storm Lake
• 200 W TDP only?

What does this mean for us?
• No fundamental shift
  - Still need to parallelize, vectorize
• Better software support
  - Better chance to get good performance with GNU and open source stacks
• Faster cores
  - Always better
• Not many more cores
  - Reduces scaling issues
• Socket option and bootability
  - Easier management, finally lots of memory
• Integrated interconnect
  - Fantastic device for data taking and HPC

Thank you
[email protected]
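
The headline figures quoted in the slides can be combined into a few back-of-the-envelope numbers. The sketch below is only illustrative: every input is a pre-release figure taken from the slides (the 200 W TDP carries a question mark there), and the function and constant names are hypothetical, introduced here for the example. The "flops per byte" figure is a simple roofline-style balance derived from the quoted peak and bandwidth, not a number stated in the talk.

```python
# Figures quoted in the slides (2014 pre-release estimates, not measurements).
KNC_PEAK_TF_DP = 1.0        # KNC: 1 TF double precision
KNL_PEAK_TF_DP = 3.0        # KNL: 3 TF double precision
KNL_TDP_W = 200.0           # "200 W TDP only?" is marked as uncertain in the slides
KNL_FAST_MEM_GB = 16.0      # capacity of the on-package 3D memory
KNL_FAST_MEM_GBPS = 500.0   # bandwidth of the on-package 3D memory


def peak_speedup(new_tf, old_tf):
    """Ratio of peak double-precision throughput between two chips."""
    return new_tf / old_tf


def gflops_per_watt(peak_tf, tdp_w):
    """Peak GF/W, assuming the chip draws exactly its TDP at peak."""
    return peak_tf * 1000.0 / tdp_w


def sweep_time_s(capacity_gb, bandwidth_gbps):
    """Seconds needed to stream once through a memory of the given size."""
    return capacity_gb / bandwidth_gbps


def flops_per_byte_to_saturate(peak_tf, bandwidth_gbps):
    """Arithmetic intensity (flop/byte) needed before compute, not memory, limits."""
    return peak_tf * 1000.0 / bandwidth_gbps


if __name__ == "__main__":
    print("KNL/KNC peak ratio:", peak_speedup(KNL_PEAK_TF_DP, KNC_PEAK_TF_DP))
    print("KNL peak GF/W:", gflops_per_watt(KNL_PEAK_TF_DP, KNL_TDP_W))
    print("Time to sweep fast memory (s):",
          sweep_time_s(KNL_FAST_MEM_GB, KNL_FAST_MEM_GBPS))
    print("Flops per byte to saturate:",
          flops_per_byte_to_saturate(KNL_PEAK_TF_DP, KNL_FAST_MEM_GBPS))
```

Under these assumptions a kernel would need several floating-point operations per byte fetched from the fast memory to approach peak, which is one way to read the slide's question about whether software can remain agnostic of the new memory layer.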