AMD EPYC™ 7003 Cpus “ZEN 3” Cores 3Rd Generation EPYC

Total Page:16

File Type:pdf, Size:1020Kb

AMD EPYC™ 7003 Cpus “ZEN 3” Cores 3Rd Generation EPYC ❑ Introduction. Pitch on “Why AMD for CPU” ❑ SKU orientation ❑ Architecture ❑ Software Development Environment ❑ SPACK HPC Package Management ❑ Applications and their Characterisation ❑ References © Advanced Micro Devices Inc | All Rights Reserved 3 NASA | JULY 2021 Source: https://openai.com/blog/ai-and-compute/ (Machine Intelligence) and https://www.top500.org/ (High Performance Computing) “DISCOVER” “RED STORM” “FRANKLIN” “RANGER” “HOPPER” “CIELO” US DEPT. OF ENERGY “CORONA” “PERLMUTTER” EUROHPC JU SANDIA NERSC TACC NERSC LLNL EXASCALE PROGRAM FUNDING LLNL NERSC “CSD3” “ ” 2012-PRESENT CAMBRIDGE UNIIVERSITY EL CAPITAN ORNL “JACQUARD” “ROADRUNNER” “ COSMA8” NERSC LLNL DURHAM UNIVERSITY “TITAN” “ LUMI” ORNL UK Met Office “JAGUAR” “KRAKEN” EUROHPC JU ORNL U OF TENNESSE/NISC “ FRONTIER” ORNL 2005 2006 2007 2008 2009 2010 2011 2012 2019 2020 2021 2022 2023 4 NASA | JULY 2021 5 © Advanced Micro Devices Inc | All Rights Reserved 6 NASA | JULY 2021 5nm 7nm In Design Shipping Now 7nm 14nm / 12nm “ZEN 4” “ZEN 3” “ZEN 2” “ZEN” “ZEN+” 2017 2022 7 NASA | JULY 2021 ROADMAPS SUBJECT TO CHANGE AMD CPU GPU Profilers, Tracers, and Debuggers for Developers System management for IT ✓ ✓ Industry leading HPC & ML frameworks for portability ✓ ✓ Standard Math and Communication Libraries ✓ ✓ C/C++, Python, Fortran ✓ ✓ 8 NASA | JULY 2021 Up to cores per socket Memory channels Up to L3 cache See MLN-001 and MLN-016 © Advanced Micro Devices Inc | All Rights Reserved EPYC 7002 & 7003 DIE MCM (8 CCD + 1 IO) 7002 (“Rome”) Z2 L2 L2 Z2 16MB L3 Z2 L2 L2 Z2 Z2 L2 L2 Z2 16MB L3 Z2 L2 L2 Z2 7003 (“Milan”) Z3 L2 L2 Z3 Z3 L2 32 MB L2 Z3 Z3 L2 L3 L2 Z3 Z3 L2 L2 Z3 11 © Advanced Micro Devices Inc | All Rights Reserved The information contained herein is subject to change without notice. (Not Compatible With Naples MB) © Advanced Micro Devices Inc | All Rights Reserved © Advanced Micro Devices Inc | All Rights Reserved © Advanced Micro Devices Inc | All Rights Reserved © Advanced Micro Devices Inc | All Rights Reserved ▪ ▪ ▪ ▪ ▪ ▪ ▪ ▪ “F” PERFORMANCE PER CORE OPTIMIZED *Max boost for AMD EPYC processors is the maximum frequency achievable by any single core on the processor under normal operating conditions for server systems. © Advanced Micro Devices Inc | All Rights Reserved COMMENTS: ▪ More CCDs provide increased memory BW ▪ PCIe and Infinity fabric assigned per NUMA. Check CTX-6 affinity ▪ NPS4 shown; NPS2 and NPS1 also possible ▪ NPS4 provides highest memory bandwidth compared to NPS2 or NPS1 ▪ Data Fabric (DF) Speed has been increased from 1467MHz (Rome) to 1600MHz (Milan) ▪ 1600MHz is ‘coupled’ to 3200MHz, i.e. it is exactly 2 cycles in 3200. Extra memory bw increase derives from both increased DF speed and coupled mode. © Advanced Micro Devices Inc | All Rights Reserved [jason@milan003 ~]$ hwloc-ls | less // Shows a snip from hwloc-ls // Group0 L#2 NUMANode L#2 (P#2 63GB) L3 L#4 (32MB) - L2 L#32 (512KB) + L1d L#32 (32KB) + L1i L#32 (32KB) + Core L#32 + PU L#32 (P#32) L2 L#33 (512KB) + L1d L#33 (32KB) + L1i L#33 (32KB) + Core L#33 + PU L#33 (P#33) - L2 L#34 (512KB) + L1d L#34 (32KB) + L1i L#34 (32KB) + Core L#34 + PU L#34 (P#34) L2 L#35 (512KB) + L1d L#35 (32KB) + L1i L#35 (32KB) + Core L#35 + PU L#35 (P#35) L2 L#36 (512KB) + L1d L#36 (32KB) + L1i L#36 (32KB) + Core L#36 + PU L#36 (P#36) L2 L#37 (512KB) + L1d L#37 (32KB) + L1i L#37 (32KB) + Core L#37 + PU L#37 (P#37) L2 L#38 (512KB) + L1d L#38 (32KB) + L1i L#38 (32KB) + Core L#38 + PU L#38 (P#38) L2 L#39 (512KB) + L1d L#39 (32KB) + L1i L#39 (32KB) + Core L#39 + PU L#39 (P#39) L3 L#5 (32MB) L2 L#40 (512KB) + L1d L#40 (32KB) + L1i L#40 (32KB) + Core L#40 + PU L#40 (P#40) L2 L#41 (512KB) + L1d L#41 (32KB) + L1i L#41 (32KB) + Core L#41 + PU L#41 (P#41) L2 L#42 (512KB) + L1d L#42 (32KB) + L1i L#42 (32KB) + Core L#42 + PU L#42 (P#42) L2 L#43 (512KB) + L1d L#43 (32KB) + L1i L#43 (32KB) + Core L#43 + PU L#43 (P#43) L2 L#44 (512KB) + L1d L#44 (32KB) + L1i L#44 (32KB) + Core L#44 + PU L#44 (P#44) L2 L#45 (512KB) + L1d L#45 (32KB) + L1i L#45 (32KB) + Core L#45 + PU L#45 (P#45) [jason@milanLogin ~]$ seq -s , 0 2 127 L2 L#46 (512KB) + L1d L#46 (32KB) + L1i L#46 (32KB) + Core L#46 + PU L#46 (P#46) 0,2,4,6,8,10,12,14,16,18,20,22,24,26,2 L2 L#47 (512KB) + L1d L#47 (32KB) + L1i L#47 (32KB) + Core L#47 + PU L#47 (P#47) 8,30,32,34,36,38,40,42,44,46,48,50,52, 54,56,58,60,62,64,66,68,70,72,74,76,78 HostBridge ,80,82,84,86,88,90,92,94,96,98,100,102 PCIBridge ,104,106,108,110,112,114,116,118,120,1 22,124,126 PCI 21:00.0 (InfiniBand) Net "ib0" © Advanced Micro DevicesOpenFabrics Inc | All Rights Reserved "mlx5_0" AMD EPYC SOFTWARE DEVELOPMENT ENVIRONMENT PLATFORM PERFORMANCE PERFORMANCE DEVELOPER JAVA COMPILERS COMPILERS LIBRARIES TOOLS AMD Optimizing AMD AOCL CPU Libraries µProf Performance & Optimizing Power Profiling AOCC CPU CPU Inference Tools Compilers ZenDNN Acceleration Package Manager integration (e.g. Spack) GDB GNU debugger Use AMD tools for best performance and code efficiency on EPYC CPUs Compilers → focus on delivering the best out-of-the-box code generation for C, C++, Fortran, Java Libraries → support common kernels for core math, solvers and FFT Profiling tools → enable developers to access the full capabilities of EPYC CPUs All tools available at developer.amd.com © Advanced Micro Devices Inc | All Rights Reserved ◢ [AMD Confidential - Distribution with NDA] AOCC – AMD’s performance compiler AMD Provides AMD Adds AMD Starts With 20 AMD CPU SOFTWARE DEVELOPMENT ENVIRONMENT | 2021 | AMD CONFIDENTIAL – NDA USE ONLY ◢ [AMD Confidential - Distribution with NDA] Introducing the AMD Optimizing CPU Libraries Library type Name Description ◢ AOCL are a set of numerical Basic Linear Optimized high-performance Basic Linear BLIS libraries tuned specifically for the Algebra Algebra Subprograms (BLAS) AMD EPYC™ processor family. libFlame & Optimized dense matrix computations as Linear Algebra ◢ Tuned implementations of ScaLAPACK found in Linear Algebra Package (LAPACK) industry-standard math libraries Fast Fourier Fast C routines for computing Discrete enable fast development of FFTW Transforms Fourier Transform high-performance computing applications. Subset of standard C99 math functions optimized for x86-64; Higher accuracy & Core Math & ◢ Simple interfaces to take LibM performance than compiler’s math functions. other functions AMD- originated advantage of the latest AMD Optimized memcpy, memmove from Glibc processor hardware innovations. for Zen architecture, up-streamed to Glibc. ◢ All libraries contributed to the AOCL-Sparse AMD-Sparse Sparse matrix operations and solvers. open-source communities, AMD- originated including AMD-originated LibM Secure Random Single and Double Precision Secure RNG and Sparse. Number Generator Random Number Generator library 21 AMD CPU SOFTWARE DEVELOPMENT ENVIRONMENT | 2021 | AMD CONFIDENTIAL – NDA USE ONLY SPACK: AOCC-OPTIMISED INSTALLATION - - - https://developer.amd.com/spack/ © Advanced Micro Devices Inc | All Rights Reserved SPACK: AOCC-OPTIMISED INSTALLATION [jason@MilanLogin ~]$ spack list amd ==> 8 packages. amdblis amdfftw amdlibflame amdlibm amdscalapack bamdst llvm-amdgpu namd [jason@MilanLogin ~]$ spack list aocc ==> 1 packages. aocc spack install stream %[email protected] +openmp cflags="-O3 -mcmodel=large -DSTREAM_TYPE=double \ -mavx2 -DSTREAM_ARRAY_SIZE=2500000000 -DNTIMES=10 -ffp-contract=fast -fnt-store" spack load stream © Advanced Micro Devices Inc | All Rights Reserved AMD EPYC™ CPUS: APPLICATIONS The following domains are where AMD typically performs well in HPC. VERY STRONG coverage across all real workloads (Life Sciences investigation starting 21Q2) Sector Problem Set Example Code Set (not exhaustive) Automotive ➢ Computational Fluid Dynamics (CFD) ➢ StarCCM+, OpenFOAM, … ➢ Structural Codes (‘Finite Element Analysis’ - FEA) ➢ LSDYNA, NASTRAN, ABAQUS, … Aerospace Same problem set as Automotive Same code set as Automotive Oil and Gas ➢ Reservoir Often rely on ISV codes. Some have ➢ Seismic their own inhouse solvers Weather Weather Forecast and Climate Modelling WRF, UFS, GFS, Unified Model, … HPC Centers, Wide range of codes: Molecular Dynamics, GROMACS, NAMD, CP2K, LAMMPS, including Academia Quantum Chemistry, weather, CFD, inhouse codes VASP, WRF, OpenFOAM, inhouse, …. Government Labs Security conscious (SME?); all problem sets above Any of the above + inhouse codes may apply © Advanced Micro Devices Inc | All Rights Reserved 26 HPC Centre of Excellence | nasa 2021 Higher is Better CHAR-SUT-01, CHAR-APP-01 water_1536 27 HPC Centre of Excellence | nasa 2021 Higher is Better CHAR-SUT-01, CHAR-APP-01 Compare 7742 v 7713 v 7763 Milan vs Rome 7713 7763 120% WRF 6% 7% 100% openFOAM 7% 7% VASP - 14% 80% CP2K 9% 14% 60% QE 9% 19% 40% NAMD 3% 16% LAMMPS 2% 12% 20% GROMACS 7% 22% 0% HPL -14% 4% HPCG 10% 10% 7742 7713 7763 Relative to 7742 Higher is Better, Rome 7742 = 100% 28 HPC Centre of Excellence | nasa 2021 CHAR-SUT-01, CHAR-APP-01 Compare relative performance between MPI+OpenMP Milan vs Rome nd rd 220% 2 Gen EPYC 7742 and 3 Gen EPYC 7763 with all cores active per CDD but 170% with differing MPI + OMP layout∆ : • 7742: 1 MPI /CCD x 8 OMP * 120% • 7742: 1 MPI / L3 x 4 OMP 70% • 7763: 1 MPI / CCD x 8 OMP • 7763: 2 MPI / CCD x 4 OMP 20% → Milan improves on Rome *Normalise to ‘1’ and compare -30% ∆ OpenFOAM and VASP are MPI only 7742; 1 MPI/CCD + 8 OMP 7742; 1 MPI/L3 + 4 OMP 7763; 1 MPI/CCD + 8 OMP 7763; 2 MPI/CCD + 4 OMP Z2 L2 16MB L2 Z2 Z3 L2 32 L2 Z3 Z2 L2 L3 L2 Z2 Z3 L2 L2 Z3 Z2 L2 16MB L2 Z2 Z3 L2 MB L2 Z3 Z2 L2 L3 L2 Z2 Z3 L2 L3 L2 Z3 29 HPC Centre of Excellence | nasa 2021 Higher is Better CHAR-SUT-01, CHAR-APP-01 2 competing system-level choke-points: LAMMPS, EMA ❑ Bandwidth to main memory ❑ ‘Compute Bound (‘frequency’) These are mutually exclusive to each other Perform roofline analysis to confirm where hot- routine lands (red circle) We have performed this analysis on a number of popular HPC codes across CFD, Weather, Quantum Chemistry, Molecular Dynamics: Codes are memory bound or borderline → HPL (compute bound) is *NOT* a good proxy for scoping job-throughput on realistic workloads.
Recommended publications
  • Effective Virtual CPU Configuration with QEMU and Libvirt
    Effective Virtual CPU Configuration with QEMU and libvirt Kashyap Chamarthy <[email protected]> Open Source Summit Edinburgh, 2018 1 / 38 Timeline of recent CPU flaws, 2018 (a) Jan 03 • Spectre v1: Bounds Check Bypass Jan 03 • Spectre v2: Branch Target Injection Jan 03 • Meltdown: Rogue Data Cache Load May 21 • Spectre-NG: Speculative Store Bypass Jun 21 • TLBleed: Side-channel attack over shared TLBs 2 / 38 Timeline of recent CPU flaws, 2018 (b) Jun 29 • NetSpectre: Side-channel attack over local network Jul 10 • Spectre-NG: Bounds Check Bypass Store Aug 14 • L1TF: "L1 Terminal Fault" ... • ? 3 / 38 Related talks in the ‘References’ section Out of scope: Internals of various side-channel attacks How to exploit Meltdown & Spectre variants Details of performance implications What this talk is not about 4 / 38 Related talks in the ‘References’ section What this talk is not about Out of scope: Internals of various side-channel attacks How to exploit Meltdown & Spectre variants Details of performance implications 4 / 38 What this talk is not about Out of scope: Internals of various side-channel attacks How to exploit Meltdown & Spectre variants Details of performance implications Related talks in the ‘References’ section 4 / 38 OpenStack, et al. libguestfs Virt Driver (guestfish) libvirtd QMP QMP QEMU QEMU VM1 VM2 Custom Disk1 Disk2 Appliance ioctl() KVM-based virtualization components Linux with KVM 5 / 38 OpenStack, et al. libguestfs Virt Driver (guestfish) libvirtd QMP QMP Custom Appliance KVM-based virtualization components QEMU QEMU VM1 VM2 Disk1 Disk2 ioctl() Linux with KVM 5 / 38 OpenStack, et al. libguestfs Virt Driver (guestfish) Custom Appliance KVM-based virtualization components libvirtd QMP QMP QEMU QEMU VM1 VM2 Disk1 Disk2 ioctl() Linux with KVM 5 / 38 libguestfs (guestfish) Custom Appliance KVM-based virtualization components OpenStack, et al.
    [Show full text]
  • AMD EPYC™ 7371 Processors Accelerating HPC Innovation
    AMD EPYC™ 7371 Processors Solution Brief Accelerating HPC Innovation March, 2019 AMD EPYC 7371 processors (16 core, 3.1GHz): Exceptional Memory Bandwidth AMD EPYC server processors deliver 8 channels of memory with support for up The right choice for HPC to 2TB of memory per processor. Designed from the ground up for a new generation of solutions, AMD EPYC™ 7371 processors (16 core, 3.1GHz) implement a philosophy of Standards Based AMD is committed to industry standards, choice without compromise. The AMD EPYC 7371 processor delivers offering you a choice in x86 processors outstanding frequency for applications sensitive to per-core with design innovations that target the performance such as those licensed on a per-core basis. evolving needs of modern datacenters. No Compromise Product Line Compute requirements are increasing, datacenter space is not. AMD EPYC server processors offer up to 32 cores and a consistent feature set across all processor models. Power HPC Workloads Tackle HPC workloads with leading performance and expandability. AMD EPYC 7371 processors are an excellent option when license costs Accelerate your workloads with up to dominate the overall solution cost. In these scenarios the performance- 33% more PCI Express® Gen 3 lanes. per-dollar of the overall solution is usually best with a CPU that can Optimize Productivity provide excellent per-core performance. Increase productivity with tools, resources, and communities to help you “code faster, faster code.” Boost AMD EPYC processors’ innovative architecture translates to tremendous application performance with Software performance. More importantly, the performance you’re paying for can Optimization Guides and Performance be matched to the appropriate to the performance you need.
    [Show full text]
  • Evaluation of AMD EPYC
    Evaluation of AMD EPYC Chris Hollowell <[email protected]> HEPiX Fall 2018, PIC Spain What is EPYC? EPYC is a new line of x86_64 server CPUs from AMD based on their Zen microarchitecture Same microarchitecture used in their Ryzen desktop processors Released June 2017 First new high performance series of server CPUs offered by AMD since 2012 Last were Piledriver-based Opterons Steamroller Opteron products cancelled AMD had focused on low power server CPUs instead x86_64 Jaguar APUs ARM-based Opteron A CPUs Many vendors are now offering EPYC-based servers, including Dell, HP and Supermicro 2 How Does EPYC Differ From Skylake-SP? Intel’s Skylake-SP Xeon x86_64 server CPU line also released in 2017 Both Skylake-SP and EPYC CPU dies manufactured using 14 nm process Skylake-SP introduced AVX512 vector instruction support in Xeon AVX512 not available in EPYC HS06 official GCC compilation options exclude autovectorization Stock SL6/7 GCC doesn’t support AVX512 Support added in GCC 4.9+ Not heavily used (yet) in HEP/NP offline computing Both have models supporting 2666 MHz DDR4 memory Skylake-SP 6 memory channels per processor 3 TB (2-socket system, extended memory models) EPYC 8 memory channels per processor 4 TB (2-socket system) 3 How Does EPYC Differ From Skylake (Cont)? Some Skylake-SP processors include built in Omnipath networking, or FPGA coprocessors Not available in EPYC Both Skylake-SP and EPYC have SMT (HT) support 2 logical cores per physical core (absent in some Xeon Bronze models) Maximum core count (per socket) Skylake-SP – 28 physical / 56 logical (Xeon Platinum 8180M) EPYC – 32 physical / 64 logical (EPYC 7601) Maximum socket count Skylake-SP – 8 (Xeon Platinum) EPYC – 2 Processor Inteconnect Skylake-SP – UltraPath Interconnect (UPI) EYPC – Infinity Fabric (IF) PCIe lanes (2-socket system) Skylake-SP – 96 EPYC – 128 (some used by SoC functionality) Same number available in single socket configuration 4 EPYC: MCM/SoC Design EPYC utilizes an SoC design Many functions normally found in motherboard chipset on the CPU SATA controllers USB controllers etc.
    [Show full text]
  • Amd(Amd.Us)18Q1 点评 2018 年 07 月 30 日
    海外公司报告 | 公司动态研究 证券研究报告 AMD(AMD.US)18Q1 点评 2018 年 07 月 30 日 作者 AMD 7 年最佳,10 年翻身,重申买入,TP 上调至 何翩翩 分析师 23 美元 SAC 执业证书编号:S1110516080002 [email protected] 业绩超预期,7 年来最佳盈利季 雷俊成 分析师 SAC 执业证书编号:S1110518060004 AMD 18Q2 实现 7 年来最佳盈利季度,non-GAAP EPS 0.14 美元,营收 17.6 [email protected] 亿美元同比大涨 53%,均超过华尔街预期的 EPS 0.13 美元和营收 17.2 亿美 马赫 分析师 SAC 执业证书编号:S1110518070001 元。计算与图形业务同比大涨 64%至 10.9 亿美元好于市场预期的 10.6 亿, [email protected] 但受 Q2 区块链相关贡献进一步减弱带来该业务环比跌 3%。挖矿业务本季 董可心 联系人 营收占比从上季的 10%降低为 6%,公司进一步看淡下半年需求。EESC 业务 [email protected] 同比涨 37%至 6.7 亿美元,好于预期的 6.61 亿,EPYC 逐步进入放量阶段, 公司维持到年底会实现中单位数份额的预测。Q2 毛利率提升至 37%,Q3 指引营收 17 亿美元,同比增长 7%,略低于市场预期的 17.6 亿,毛利率提 相关报告 升至约 38%;全年指引营收增速保持 25%,我们认为公司指引基于 17Q3 的 1 《AMD(AMD.US)点评:EPYC“从 高基数较为保守,且区块链影响作为一次性业务逐渐消弭也会进一步减少 零到一”终实现,7nm 产品周期全方位 业绩不确定性,我们看好 EPYC 会在下半年至 Q4 迎来关键放量。 回归“传奇”;TP 上调至 22 美元,重 服务器市场 AMD 与 Intel“荣辱互见” 申买入》2018-06-20 2 《AMD(AMD.US)点评:公布 7nm 服务器市场 AMD 与 Intel“荣辱互见”,EPYC 服务器随着 Cisco、HPE 适配 GPU 加入 AI 计算抢滩战,Ryzen+EPYC 以及超级云计算客户的需求能见度提高,Q2 出货量和营收均环比提高超 50%,目前与 AMD 合作的 5 个云计算巨头成主要推动力。我们认为 AMD “双子星”仍是中流砥柱;TP 上调至 将继续通过单插槽服务器高核心数和低功耗打造性价比优势,下半年加速 18 美元,重申买入》2018-06-08 市场渗透蚕食 Intel 份额,进入明年则等待 7nm 的第二代 EPYC 面市,面对 3 《AMD(AMD.US)18Q1 点评:2018 已将 10nm Cannon Lake 量产时点延后至明年的 Intel,AMD 将终于实现制 开门红,业绩指引均超预期,Ryzen 继 程反超,加速量价齐升。“从零到一”抢占 20 亿美元以上的市场份额。 续扎实闪耀,EPYC 仍待升级放量,重 反观 Intel Q2 数据中心业务收入 55.5 亿美元,虽然在整体行业高景气度下 申买入》2018-04-30 同比增长 27%,但仍低于市场预期的 56.3 亿美元。业绩发布会上 Intel 更为 4 《2017 扭亏为盈业绩迎拐点,2018 明确消费级 10nm 产品会到 19 年下半年节日旺季才推向市场,让市场情绪 厚积待薄发,Ryzen+EPYC 继续双星闪 愈加悲观的同时也给了 AMD 足够的时间窗口。 耀,重申买入》2018-02-01 Ryzen 继续攻城略地,进一步打开笔记本市场 5 《AMD(AMD.US)点评:合作英特
    [Show full text]
  • Software-Based Undervolting Faults in AMD Zen Processors Fehler in AMD Zen Prozessoren Durch Software-Basierte Unterspannung
    Software-based Undervolting Faults in AMD Zen Processors Fehler in AMD Zen Prozessoren durch Software-basierte Unterspannung Bachelorarbeit im Rahmen des Studiengangs IT-Sicherheit der Universität zu Lübeck vorgelegt von Anja Rabich ausgegeben und betreut von Prof. Dr. Thomas Eisenbarth mit Unterstützung von Luca Wilke Lübeck, den 31. August 2020 Abstract Dynamic Voltage and Frequency Scaling (DVFS) is a powerful performance enhance- ment method used by modern processors, allowing them to scale voltage or frequency as needed based on the power requirements of the CPU. This not only saves power, but also prevents processors from overheating. However, the continued integration of soft- ware interfaces giving a user direct access to this functionality has been shown to be a potential security risk, allowing a privileged adversary to indirectly tamper with sensitive computations. This thesis summarizes the results of various papers showing that using DVFS features, unsuitable voltage/frequency values can be set for the processor leading to hardware faults and calculation errors which can be used to undermine the integrity of Trusted Execution Environments (TEE). Results are partially replicated for Intel’s TEE implementation SGX, followed by extending the same methodology to AMD’s Zen Pro- cessors, on which there is currently no information. Results show that undervolting is an unlikely attack vector. iii Zusammenfassung Dynamische Spannungs- und Frequenzskalierung (engl. DVFS) ist ein in modernen Prozessoren vorhandener Leistungs- und Stromverwaltungsmechanismus, womit Span- nung und Frequenz der CPU je nach Bedarf skaliert werden können. Somit wird nicht nur Strom gespart, sondern auch zusätzlich verhindert, dass der Prozessor überhitzt. Die zunehmende Integration von Softwareschnittstellen zu diesen Mechanismen die dem Nutzer Einstellungsmöglichkeiten anbieten, haben sich zunehmend als potenzielle Sicherheitslücke erwiesen.
    [Show full text]
  • Die Meilensteine Der Computer-, Elek
    Das Poster der digitalen Evolution – Die Meilensteine der Computer-, Elektronik- und Telekommunikations-Geschichte bis 1977 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 und ... Von den Anfängen bis zu den Geburtswehen des PCs PC-Geburt Evolution einer neuen Industrie Business-Start PC-Etablierungsphase Benutzerfreundlichkeit wird gross geschrieben Durchbruch in der Geschäftswelt Das Zeitalter der Fensterdarstellung Online-Zeitalter Internet-Hype Wireless-Zeitalter Web 2.0/Start Cloud Computing Start des Tablet-Zeitalters AI (CC, Deep- und Machine-Learning), Internet der Dinge (IoT) und Augmented Reality (AR) Zukunftsvisionen Phasen aber A. Bowyer Cloud Wichtig Zählhilfsmittel der Frühzeit Logarithmische Rechenhilfsmittel Einzelanfertigungen von Rechenmaschinen Start der EDV Die 2. Computergeneration setzte ab 1955 auf die revolutionäre Transistor-Technik Der PC kommt Jobs mel- All-in-One- NAS-Konzept OLPC-Projekt: Dass Computer und Bausteine immer kleiner, det sich Konzepte Start der entwickelt Computing für die AI- schneller, billiger und energieoptimierter werden, Hardware Hände und Finger sind die ersten Wichtige "PC-Vorläufer" finden wir mit dem werden Massenpro- den ersten Akzeptanz: ist bekannt. Bei diesen Visionen geht es um die Symbole für die Mengendarstel- schon sehr früh bei Lernsystemen. iMac und inter- duktion des Open Source Unterstüt- möglichen zukünftigen Anwendungen, die mit 3D-Drucker zung und lung. Ägyptische Illustration des Beispiele sind: Berkley Enterprice mit neuem essant: XO-1-Laptops: neuen Technologien und Konzepte ermöglicht Veriton RepRap nicht Ersatz werden.
    [Show full text]
  • High Performance Linpack Benchmark on AMD EPYC™ Processors
    High Performance Linpack Benchmark on AMD EPYC™ Processors This document details running the High Performance Linpack (HPL) benchmark using the AMD xhpl binary. HPL Implementation: The HPL benchmark presents an opportunity to demonstrate the optimal combination of multithreading (via the OpenMP library) and MPI for scientific and technical high-performance computing on the EPYC architecture. For MPI applications where the per-MPI-rank work can be further parallelized, each L3 cache is an MPI rank running a multi-threaded application. This approach results in fewer MPI ranks than using one rank per core, and results in a corresponding reduction in MPI overhead. The ideal balance is for the number of threads per MPI rank to be less than or equal to the number of CPUs per L3 cache. The exact maximum thread count per MPI rank depends on both the specific EPYC SKU (e.g. 32 core parts have 4 physical cores per L3, 24 core parts have 3 physical cores per L3) and whether SMT is enabled (e.g. for a 32 core part with SMT enabled there are 8 CPUs per L3). HPL performance is primarily determined by DGEMM performance, which is in turn primarily determined by SIMD throughput. The Zen microarchitecture of the EPYC processor implements one SIMD unit per physical core. Since HPL is SIMD limited, when SMT is enabled using a second HPL thread per core will not directly improve HPL performance. However, leaving SMT enabled may indirectly allow slightly higher performance (1% - 2%) since the OS can utilize the SMT siblings as needed without pre-empting the HPL threads.
    [Show full text]
  • AMD's Early Processor Lines, up to the Hammer Family (Families K8
    AMD’s early processor lines, up to the Hammer Family (Families K8 - K10.5h) Dezső Sima October 2018 (Ver. 1.1) Sima Dezső, 2018 AMD’s early processor lines, up to the Hammer Family (Families K8 - K10.5h) • 1. Introduction to AMD’s processor families • 2. AMD’s 32-bit x86 families • 3. Migration of 32-bit ISAs and microarchitectures to 64-bit • 4. Overview of AMD’s K8 – K10.5 (Hammer-based) families • 5. The K8 (Hammer) family • 6. The K10 Barcelona family • 7. The K10.5 Shanghai family • 8. The K10.5 Istambul family • 9. The K10.5-based Magny-Course/Lisbon family • 10. References 1. Introduction to AMD’s processor families 1. Introduction to AMD’s processor families (1) 1. Introduction to AMD’s processor families AMD’s early x86 processor history [1] AMD’s own processors Second sourced processors 1. Introduction to AMD’s processor families (2) Evolution of AMD’s early processors [2] 1. Introduction to AMD’s processor families (3) Historical remarks 1) Beyond x86 processors AMD also designed and marketed two embedded processor families; • the 2900 family of bipolar, 4-bit slice microprocessors (1975-?) used in a number of processors, such as particular DEC 11 family models, and • the 29000 family (29K family) of CMOS, 32-bit embedded microcontrollers (1987-95). In late 1995 AMD cancelled their 29K family development and transferred the related design team to the firm’s K5 effort, in order to focus on x86 processors [3]. 2) Initially, AMD designed the Am386/486 processors that were clones of Intel’s processors.
    [Show full text]
  • AMD Raven Ridge
    DELIVERING A NEW LEVEL OF VISUAL PERFORMANCE IN AN SOC AMD “RAVEN RIDGE” APU Dan Bouvier, Jim Gibney, Alex Branover, Sonu Arora Presented by: Dan Bouvier Corporate VP, Client Products Chief Architect AMD CONFIDENTIAL RAISING THE BAR FOR THE APU VISUAL EXPERIENCE Up to MOBILE APU GENERATIONAL 200% MORE CPU PERFORMANCE PERFORMANCE GAINS Up to 128% MORE GPU PERFORMANCE Up to 58% LESS POWER FIRST “Zen”-based APU CPU Performance GPU Performance Power HIGH-PERFORMANCE AMD Ryzen™ 7 2700U 7th Gen AMD A-Series APU On-die “Vega”-based graphics Scaled GPU Managed Improved Upgraded Increased LONG BATTERY LIFE and CPU up to power delivery memory display package Premium form factors reach target and thermal bandwidth experience performance frame rate dissipation efficiency density 2 | AMD Ryzen™ Processors with Radeon™ Vega Graphics - Hot Chips 30 | * See footnotes for details. “RAVEN RIDGE” APU AMD “ZEN” x86 CPU CORES CPU 0 “ZEN” CPU CPU 1 (4 CORE | 8 THREAD) USB 3.1 NVMe PCIe FULL PCIe GPP ----------- ----------- Discrete SYSTEM 4MB USB 2.0 SATA GFX CONNECTIVITY CPU 2 CPU 3 L3 Cache X64 DDR4 HIGH BANDWIDTH SOC FABRIC System Infinity Fabric & MEMORY Management SYSTEM Unit ACCELERATED Platform Multimedia Security MULTIMEDIA Processor Engines AMD GFX+ 1MB L2 EXPERIENCE X64 DDR4 (11 COMPUTE UNITS) Cache Video Audio Sensor INTEGRATED CU CU CU CU CU CU Display Codec ACP Fusion Controller Next SENSOR Next Hub FUSION HUB CU CU CU CU CU AMD “VEGA” GPU UPGRADED DISPLAY ENGINE 3 | AMD Ryzen™ Processors with Radeon™ Vega Graphics - Hot Chips 30 | SIGNIFICANT DENSITY INCREASE “Raven Ridge” die BGA Package: 25 x 35 x 1.38mm Technology: GLOBALFOUNDRIES 14nm – 11 layer metal Transistor count: 4.94B 59% 16% Die Size: 209.78mm2 more transistors smaller die than prior generation “Bristol Ridge” APU 4 | AMD Ryzen™ Processors with Radeon™ Vega Graphics - Hot Chips 30 | * See footnotes for details.
    [Show full text]
  • AMD EPYC™ 7003 Series Cpus Set New Standard As Highest Performance Server Processor
    March 15, 2021 AMD EPYC™ 7003 Series CPUs Set New Standard as Highest Performance Server Processor New AMD EPYC processor extends per socket performance leadership and best per core performance1** with new “Zen 3” cores and modern security features Partners including AWS, Cisco, Dell Technologies, Google Cloud, HPE, Lenovo, Microsoft Azure, Oracle Cloud Infrastructure, Supermicro, Tencent Cloud and others grow EPYC processor ecosystem to an expected 400 cloud instances and 100 new OEM platforms by end of 2021 SANTA CLARA, Calif., March 15, 2021 (GLOBE NEWSWIRE) -- At a digital event, AMD (NASDAQ: AMD) announced the new AMD EPYC™ 7003 Series CPUs, which includes the AMD EPYC 7763, the world’s highest-performing server processor2*. The new EPYC 7003 series processors help HPC, cloud and enterprise customers do more, faster, by delivering the best performance of any server CPU with up to 19% more instructions per clock3. “With the launch of our 3rd Gen AMD EPYC processors, we are incredibly excited to deliver the fastest server CPU in the world. These processors extend our data center leadership and help customers solve today’s most complex IT challenges, while substantially growing our ecosystem,” said Forrest Norrod, senior vice president and general manager, Data Center and Embedded Solutions Business Group. “We not only double the performance over the competition in HPC, cloud and enterprise workloads with our newest server CPUs, but together with the AMD Instinct GPUs, we are breaking the exascale barrier in supercomputing and helping to tackle problems that have previously been beyond humanity’s reach.” AMD EPYC Processors, Powering the Modern Data Center Available immediately, AMD EPYC 7003 Series Processors have up to 64 “Zen 3” cores per processor and introduce new levels of per-core cache memory, while continuing to offer the PCIe® 4 connectivity and class-leading memory bandwidth4 that defined the EPYC 7002 series CPUs.
    [Show full text]
  • Best Practice Guide Modern Processors
    Best Practice Guide Modern Processors Ole Widar Saastad, University of Oslo, Norway Kristina Kapanova, NCSA, Bulgaria Stoyan Markov, NCSA, Bulgaria Cristian Morales, BSC, Spain Anastasiia Shamakina, HLRS, Germany Nick Johnson, EPCC, United Kingdom Ezhilmathi Krishnasamy, University of Luxembourg, Luxembourg Sebastien Varrette, University of Luxembourg, Luxembourg Hayk Shoukourian (Editor), LRZ, Germany Updated 5-5-2021 1 Best Practice Guide Modern Processors Table of Contents 1. Introduction .............................................................................................................................. 4 2. ARM Processors ....................................................................................................................... 6 2.1. Architecture ................................................................................................................... 6 2.1.1. Kunpeng 920 ....................................................................................................... 6 2.1.2. ThunderX2 .......................................................................................................... 7 2.1.3. NUMA architecture .............................................................................................. 9 2.2. Programming Environment ............................................................................................... 9 2.2.1. Compilers ........................................................................................................... 9 2.2.2. Vendor performance libraries
    [Show full text]
  • AMD Introduces World's Most Powerful 16- Core
    November 7, 2019 AMD Introduces World’s Most Powerful 16- core Consumer Desktop Processor, the AMD Ryzen™ 9 3950X – AMD Ryzen™ 9 3950X rounds out 3rd Gen Ryzen desktop processor series, arriving November 25 – – New AMD Athlon™ 3000G processor to provide everyday users with unmatched performance per dollar, coming November 19 – SANTA CLARA, Calif., Nov. 07, 2019 (GLOBE NEWSWIRE) -- Today, AMD announced the release of the highly anticipated flagship 16-core AMD Ryzen 9 3950X processor, available worldwide November 25, 2019. AMD Ryzen 9 3950X processor brings the ultimate processor for gamers with effortless 1080P gaming in select titles1 and up to 2X more energy efficient processing power compared to the competition2 as the world’s fastest 16- core consumer desktop processor3. In addition, AMD also announced a significant performance uplift4 coming for mainstream desktop users with the new AMD Athlon 3000G, arriving November 19, 2019. “We are excited to bring the AMD Ryzen™ 9 3950X to market later this month, offering enthusiasts the most powerful 16-core desktop processor ever,” said Chris Kilburn, corporate vice president and general manager, client channel, AMD. “We are focused on offering the best solutions at every level of the market, including the AMD Athlon 3000G for everyday PC users that delivers great performance at an incredible price point.” AMD Ryzen 9 3950X: Fastest 16-core Consumer Desktop Processor Offering up to 22% performance increase over previous generations5, the AMD Ryzen 9 3950X offers faster 1080p gaming in select titles1 and content creation6 than the competition. Built on the industry-leading “Zen 2” architecture, the AMD Ryzen 9 3950X also excels in power efficiency3 with a TDP7 of 105W.
    [Show full text]