Benefits of the AMD Opteron Processor for LS-DYNA


Benefits of the AMD Opteron™ Processor for LS-DYNA
4th European LS-DYNA Users Conference, MPP / Linux Cluster / Hardware I
Dawn Rintala, Sr. Developer Engineer, Computation Products Group, Advanced Micro Devices, Inc.
22 May 2003

Demands on System from Explicit/Implicit Analysis
• Explicit
  – Quick communication between processors
  – Good FPU
• Implicit
  – Low memory latency
  – High memory bandwidth
AMD Opteron™ delivers:
1. Breakthrough 64-bit performance
2. Complete 32-bit compatibility
3. Industry-leading price/performance

Benefits of AMD Opteron™ for LS-DYNA
• Performance
  – Raw performance
  – Price/performance
• Scalability
  – On-chip cache
  – HyperTransport™
• Low memory latency
• Memory addressability (e.g., a 4P, 32GB AMD Opteron™ processor-based system)

AMD Opteron™ SOC Architecture Overview
• First AMD64 processor
• Aggressive out-of-order, 9-issue superscalar processor core
• Integrated DDR memory controller
• Leading performance in integer, floating-point, and multimedia code
  – AMD64, x87, MMX™, 3DNow!™, SSE, SSE2
• Glueless multiprocessing through HyperTransport™ technology
• Expandable I/O through HyperTransport
[Block diagram: processor core with L1 instruction and L1 data caches and an L2 cache, an integrated DDR memory controller, and HyperTransport™ links.]

AMD Opteron™ Processor Technology Overview
• Processor core
  – Support for AMD's 64-bit technology
  – 12-stage integer, 17-stage floating-point pipelines
  – Enhanced TLB structures and a TLB flush filter
  – Enhanced branch prediction
  – 1MB L2 cache
  – ECC protection
• Memory controller
  – Dual-channel DDR memory
  – PC2700, PC2100, or PC1600 DDR memory support
  – Registered or unbuffered DIMMs
  – ECC and Chipkill
  – High bandwidth (up to 6.4GB/s)
• HyperTransport™ technology
  – One, two, or three links
  – 2, 4, 8, 16, or 32 bits wide, full duplex
  – Up to 6.4 GB/s bandwidth per link
  – 19.2 GB/s aggregate external bandwidth (see the worked arithmetic after this list)
[Block diagram: CPU core, system request queue (SRQ), crossbar (XBAR), memory controller (MCT) to DRAM, and three HyperTransport™ (HT) links.]
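The per-link and aggregate figures above follow directly from the link width and transfer rate quoted on the slides. As an illustration only (this program is not part of the presentation, and the variable names are ours), a minimal C sketch reproducing the arithmetic:

    /* Back-of-envelope check of the HyperTransport(TM) bandwidth figures
       quoted above. Illustrative only: the 16-bit width and 1600 MT/s
       rate come from the slides; the rest is plain arithmetic. */
    #include <stdio.h>

    int main(void) {
        double width_bytes = 16.0 / 8.0;           /* 16-bit link = 2 bytes per transfer */
        double rate        = 1600e6;               /* 1600 MT/s signaling rate */
        double per_dir     = width_bytes * rate;   /* 3.2 GB/s per direction */
        double per_link    = 2.0 * per_dir;        /* full duplex: 6.4 GB/s per link */
        double aggregate   = 3.0 * per_link;       /* three links: 19.2 GB/s external */
        printf("per direction: %.1f GB/s\n", per_dir   / 1e9);
        printf("per link:      %.1f GB/s\n", per_link  / 1e9);
        printf("aggregate:     %.1f GB/s\n", aggregate / 1e9);
        return 0;
    }

Two bytes per transfer at 1600 MT/s gives 3.2 GB/s each way, hence 6.4 GB/s per full-duplex link and 19.2 GB/s across the three links.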
AMD Opteron™ Processor-Based 4P Server
[Block diagram: four AMD Opteron™ processors (940 mPGA), each with its own 144-bit registered DDR memory at 200-333MHz, connected to one another by 16x16 coherent HyperTransport™ links at 1600MT/s. 8x8 HyperTransport links attach two AMD-8131™ PCI-X tunnels (hot-plug PCI-X slots, Gbit Ethernet, U320 SCSI, 10/100 Ethernet management LAN) and an AMD-8111™ southbridge (legacy PCI, VGA graphics, FLASH, LPC, SIO, Zircon BMC, USB, AC97, UDMA100).]

The Rewards of Good Plumbing
• High bandwidth
  – A 2P system is designed to achieve 7 GB/s aggregate memory read bandwidth
  – A 4P system is designed to achieve 10 GB/s aggregate memory read bandwidth
• Low latency
  – Average 2P unloaded latency (page hit) is designed to be < 120 ns
  – Average 4P unloaded latency (page hit) is designed to be < 140 ns
  – Latency under load increases slowly because of excess interconnect bandwidth
  – Latency shrinks quickly with increasing CPU clock speed and HyperTransport™ link speed

Integrated Memory Controller Latency
[Chart: read latency (ns) for local memory access (registered PC2100, CAS2) versus CPU frequency from 800MHz to 4000MHz, with curves for page hit (0 hop), page miss (0 hop), probe miss with 1 hop (2-node case), and probe miss with 2 hops (4-node case); an AMD Athlon™ reference latency of 180ns is marked. Local, 0-hop accesses are the fast case, as sketched below.]
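The hop counts in that chart are what make memory locality matter on Opteron: a page hit on the local node is the fast case, while probe misses that cross one or two HyperTransport hops cost more. As an illustration only, since libnuma post-dates this presentation and is not mentioned in it, a minimal Linux sketch of keeping a latency-sensitive working set on the local node (compile with -lnuma; the 64 MB buffer size is arbitrary):

    /* Minimal sketch, not from the presentation: keep an implicit-solver
       working set on the local NUMA node so accesses stay on the 0-hop
       curves of the latency chart above. Assumes Linux with libnuma. */
    #include <numa.h>
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA policy API not available\n");
            return 1;
        }
        size_t bytes = (size_t)64 * 1024 * 1024;   /* arbitrary working-set size */
        double *buf = numa_alloc_local(bytes);     /* allocate on the caller's node */
        if (buf == NULL) {
            fprintf(stderr, "allocation failed\n");
            return 1;
        }
        /* ... run the latency-sensitive solver phase against buf here ... */
        numa_free(buf, bytes);
        return 0;
    }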
Floating-Point Performance
SPECfp®_rate2000 performance (peak, 4P):

  System                     Score   Relative
  Itanium 2 1.0GHz           49.3    244%
  AMD Opteron™ 844           49.2    244%
  AMD Opteron 842            45.0    223%
  AMD Opteron 840            40.7    201%
  Xeon MP 2.0GHz             20.2    100%

SPECfp®_rate2000 scalability (peak, 2P-to-4P scaling):

  System                     2P score   4P score   4P/2P
  Itanium 2 1.0GHz           30.7       49.3       161%
  AMD Opteron 244 / 844      26.7       49.2       184%
  Xeon 2.0GHz / Xeon MP      13.8       20.2       146%

www.amd.com/opteronperformance
© 2003 Advanced Micro Devices. AMD, the AMD Arrow logo, AMD Opteron and any combinations thereof are trademarks of Advanced Micro Devices. Microsoft and Windows are trademarks of Microsoft Corporation. SPEC and the benchmark name SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of April 14, 2003. SPEC benchmark scores for AMD Opteron processor-based systems are under submission to the SPEC organization. For complete configuration information visit www.spec.org.

The Result
AMD Opteron™ delivers:
1. Breakthrough 64-bit performance
2. Complete 32-bit compatibility
3. Industry-leading price/performance

Cautionary Statement / Trademark Attribution
• This presentation contains forward-looking statements, which are made pursuant to the safe harbor provisions of the U.S. Private Securities Litigation Reform Act of 1995. Forward-looking statements are generally preceded by words such as "expects," "plans," "believes," "anticipates," or "intends." Investors are cautioned that all forward-looking statements in this presentation involve risks and uncertainties that could cause actual results to differ materially from current expectations. Forward-looking statements in this presentation involve the risks that AMD Opteron™ processors will not perform pursuant to their design specifications, will not achieve customer and/or market acceptance, and that third-party solution vendors will not provide the infrastructure necessary to support these products. We urge investors to review in detail the risks and uncertainties in the company's U.S. Securities and Exchange Commission filings, including the most recently filed Form 10-K.
• AMD, the AMD Arrow logo, AMD Opteron, AMD Athlon, and combinations thereof, AMD-8111 and AMD-8131 are trademarks of Advanced Micro Devices, Inc. HyperTransport is a licensed trademark of the HyperTransport Technology Consortium. Other product and company names used in this publication are for identification purposes only and may be trademarks of their respective companies.