Optimizing for AMD Ryzen
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
AMD Accelerated Parallel Processing Opencl Programming Guide
AMD Accelerated Parallel Processing OpenCL Programming Guide November 2013 rev2.7 © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Accelerated Parallel Processing, the AMD Accelerated Parallel Processing logo, ATI, the ATI logo, Radeon, FireStream, FirePro, Catalyst, and combinations thereof are trade- marks of Advanced Micro Devices, Inc. Microsoft, Visual Studio, Windows, and Windows Vista are registered trademarks of Microsoft Corporation in the U.S. and/or other jurisdic- tions. Other names are for informational purposes only and may be trademarks of their respective owners. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. The contents of this document are provided in connection with Advanced Micro Devices, Inc. (“AMD”) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. The information contained herein may be of a preliminary or advance nature and is subject to change without notice. No license, whether express, implied, arising by estoppel or other- wise, to any intellectual property rights is granted by this publication. Except as set forth in AMD’s Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right. AMD’s products are not designed, intended, authorized or warranted for use as compo- nents in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD’s product could create a situation where personal injury, death, or severe property or envi- ronmental damage may occur. -
Catalyst™ Software Suite Version 9.5 Release Notes
Catalyst™ Software Suite Version 9.5 Release Notes This release note provides information on the latest posting of AMD’s industry leading software suite, Catalyst™. This particular software suite updates both the AMD Display Driver, and the Catalyst™ Control Center. This unified driver has been further enhanced to provide the highest level of power, performance, and reliability. The AMD Catalyst™ software suite is the ultimate in performance and stability. For exclusive Catalyst™ updates follow Catalyst Maker on Twitter. This release note provides information on the following: z Web Content z AMD Product Support z Operating Systems Supported z New Features z Performance Improvements z Resolved Issues for the Windows Vista Operating System z Resolved Issues for the Windows XP Operating System z Resolved Issues for the Windows 7 Operating System z Known Issues Under the Windows Vista Operating System z Known Issues Under the Windows XP Operating System z Known Issues Under the Windows 7 Operating System z Installing the Catalyst™ Vista Software Driver z Catalyst™ Crew Driver Feedback ATI Catalyst™ Release Note Version 9.5 1 Web Content The Catalyst™ Software Suite 9.5 contains the following: z Radeon™ display driver 8.612 z HydraVision™ for both Windows XP and Vista z HydraVision™ Basic Edition (Windows XP only) z WDM Driver Install Bundle z Southbridge/IXP Driver z Catalyst™ Control Center Version 8.612 Caution: The Catalyst™ software driver and the Catalyst™ Control Center can be downloaded independently of each other. However, for maximum stability and performance AMD recommends that both components be updated from the same Catalyst™ release. -
AMD Ryzen™ PRO & Athlon™ PRO Processors Quick Reference Guide
AMD Ryzen™ PRO & Athlon™ PRO Processors Quick Reference guide AMD Ryzen™ PRO Processors with Radeon™ Graphics for Business Laptops (Socket FP6/FP5) 1 Core/Thread Frequency Boost/Base L2+L3 Cache Graphics Node TDP Intel vPro Core/Thread Frequency Boost*/Base L2+L3 Cache Graphics Node TDP AMD PRO technologies COMPARED TO 4.9/1.1 Radeon™ 6/12 UHD AMD Ryzen™ 7 PRO 8/16 Up to 12MB Graphics 7nm 15W intel Intel Core i7 10810U GHz 13MB 4750U 4.1/1.7 GHz CORE i7 14nm 15W (7 Cores) 10th Gen Intel Core i7 10610U 4/8 4.9/1.8 9MB UHD GHz AMD Ryzen™ 7 PRO Up to Radeon™ intel 4/8 6MB 10 12nm 15W 4.8/1.9 3700U 4.0/2.3 GHz Vega CORE i7 Intel Core i7 8665U 4/8 9MB UHD 14nm 15W 8th Gen GHz Radeon™ AMD Ryzen™ 5 PRO 6/12 Up to 11MB Graphics 7nm 15W intel 4.4/1.7 4650U 4.0/2.1 GHz CORE i5 Intel Core i5 10310U 4/8 7MB UHD 14nm 15W GHz (6 Cores) 10th Gen AMD Ryzen™ 5 PRO 4/8 Up to 6MB Radeon™ 12nm 15W intel 4.1/1.6 3500U 3.7/2.1 GHz Vega8 CORE i5 Intel Core i5 8365U 4/8 7MB UHD 14nm 15W th GHz 8 Gen Radeon™ AMD Ryzen™ 3 PRO 4/8 Up to 6MB Graphics 7nm 15W intel 4.1/2.1 4450U 3.7/2.5 GHz CORE i3 Intel Core i3 10110U 2/4 5MB UHD 14nm 15W (5 Cores) 10th Gen GHz AMD Ryzen™ 3 PRO 4/4 Up to 6MB Radeon™ 12nm 15W intel 3.9/2.1 3300U 3.5/2.1 GHz Vega6 CORE i3 Intel Core i3 8145U 2/4 4.5MB UHD 14nm 15W 8th Gen GHz AMD Athlon™ PRO Processors with Radeon™ Vega Graphics for Business Laptops (Socket FP5) AMD Athlon™ PRO Up to Radeon™ intel Intel Pentium 4415U 2/4 2.3 GHz 2.5MB HD 610 14nm 15W 300U 2/4 3.3/2.4 GHz 5MB Vega3 12nm 15W 1. -
Software-Based Undervolting Faults in AMD Zen Processors Fehler in AMD Zen Prozessoren Durch Software-Basierte Unterspannung
Software-based Undervolting Faults in AMD Zen Processors Fehler in AMD Zen Prozessoren durch Software-basierte Unterspannung Bachelorarbeit im Rahmen des Studiengangs IT-Sicherheit der Universität zu Lübeck vorgelegt von Anja Rabich ausgegeben und betreut von Prof. Dr. Thomas Eisenbarth mit Unterstützung von Luca Wilke Lübeck, den 31. August 2020 Abstract Dynamic Voltage and Frequency Scaling (DVFS) is a powerful performance enhance- ment method used by modern processors, allowing them to scale voltage or frequency as needed based on the power requirements of the CPU. This not only saves power, but also prevents processors from overheating. However, the continued integration of soft- ware interfaces giving a user direct access to this functionality has been shown to be a potential security risk, allowing a privileged adversary to indirectly tamper with sensitive computations. This thesis summarizes the results of various papers showing that using DVFS features, unsuitable voltage/frequency values can be set for the processor leading to hardware faults and calculation errors which can be used to undermine the integrity of Trusted Execution Environments (TEE). Results are partially replicated for Intel’s TEE implementation SGX, followed by extending the same methodology to AMD’s Zen Pro- cessors, on which there is currently no information. Results show that undervolting is an unlikely attack vector. iii Zusammenfassung Dynamische Spannungs- und Frequenzskalierung (engl. DVFS) ist ein in modernen Prozessoren vorhandener Leistungs- und Stromverwaltungsmechanismus, womit Span- nung und Frequenz der CPU je nach Bedarf skaliert werden können. Somit wird nicht nur Strom gespart, sondern auch zusätzlich verhindert, dass der Prozessor überhitzt. Die zunehmende Integration von Softwareschnittstellen zu diesen Mechanismen die dem Nutzer Einstellungsmöglichkeiten anbieten, haben sich zunehmend als potenzielle Sicherheitslücke erwiesen. -
Real-Time Ray-Tracing Techniques for Integration Into Existing Renderers TAKAHIRO HARADA, AMD 3/2018 AGENDA
Real-Time Ray-Tracing Techniques for Integration into Existing Renderers TAKAHIRO HARADA, AMD 3/2018 AGENDA y Radeon ProRender, Radeon Rays update y Unity GPU Lightmapper using Radeon Rays (by Jesper) ‒ Helping the game content creator to make better assets y Radeon ProRender + Universal Scene Description ‒ Real-time preview of assets y Radeon ProRender Real-time Rendering ‒ Hybrid ray tracing is a stepping stone to a fully ray traced future, as the same path was followed with production movie rendering. Our solution provides a was to fully path traced rendering with Radeon pro render 2 | GDC 2018 | 19-23 MARCH 2018 RADEON PRORENDER, RADEON RAYS AMD’s Ray tracing solutions 3 | GDC 2018 | 19-23 MARCH 2018 RADEON PRORENDER, RADEON RAYS AMD’S RAY TRACING SOLUTIONS y Radeon ProRender ‒ A complete renderer (ray casting, shading) ‒ Physically based rendering library ‒ Output - Rendered image ‒ For renderer users, or developers y Radeon Rays ‒ For developers ‒ Ray intersection library ‒ Output - Intersections 4 | GDC 2018 | 19-23 MARCH 2018 RADEON PRORENDER y For developers y For content creators ‒ SDK available today on request ‒ https://pro.radeon.com/en/software/prorender/ ‒ [email protected] y Plugins ‒ Maya, 3DS Max, Blender, Rhino, SolidWorks y C API y Direct integration y OpenCL 1.2, Metal 2 ‒ Cinema4D (Maxon, R19~) ‒ Modo (The Foundry, Beta) y Multi platform solution ‒ OS (Windows, MacOs, Linux) ‒ Vendors (AMD,…) RPR for Blender on MacOs 5 | GDC 2018 | 19-23 MARCH 2018 (BMW from Mike Pan) RADEON PRORENDER FEATURE HIGHLIGHTS y Heterogeneous -
High Performance Linpack Benchmark on AMD EPYC™ Processors
High Performance Linpack Benchmark on AMD EPYC™ Processors This document details running the High Performance Linpack (HPL) benchmark using the AMD xhpl binary. HPL Implementation: The HPL benchmark presents an opportunity to demonstrate the optimal combination of multithreading (via the OpenMP library) and MPI for scientific and technical high-performance computing on the EPYC architecture. For MPI applications where the per-MPI-rank work can be further parallelized, each L3 cache is an MPI rank running a multi-threaded application. This approach results in fewer MPI ranks than using one rank per core, and results in a corresponding reduction in MPI overhead. The ideal balance is for the number of threads per MPI rank to be less than or equal to the number of CPUs per L3 cache. The exact maximum thread count per MPI rank depends on both the specific EPYC SKU (e.g. 32 core parts have 4 physical cores per L3, 24 core parts have 3 physical cores per L3) and whether SMT is enabled (e.g. for a 32 core part with SMT enabled there are 8 CPUs per L3). HPL performance is primarily determined by DGEMM performance, which is in turn primarily determined by SIMD throughput. The Zen microarchitecture of the EPYC processor implements one SIMD unit per physical core. Since HPL is SIMD limited, when SMT is enabled using a second HPL thread per core will not directly improve HPL performance. However, leaving SMT enabled may indirectly allow slightly higher performance (1% - 2%) since the OS can utilize the SMT siblings as needed without pre-empting the HPL threads. -
AMD's Early Processor Lines, up to the Hammer Family (Families K8
AMD’s early processor lines, up to the Hammer Family (Families K8 - K10.5h) Dezső Sima October 2018 (Ver. 1.1) Sima Dezső, 2018 AMD’s early processor lines, up to the Hammer Family (Families K8 - K10.5h) • 1. Introduction to AMD’s processor families • 2. AMD’s 32-bit x86 families • 3. Migration of 32-bit ISAs and microarchitectures to 64-bit • 4. Overview of AMD’s K8 – K10.5 (Hammer-based) families • 5. The K8 (Hammer) family • 6. The K10 Barcelona family • 7. The K10.5 Shanghai family • 8. The K10.5 Istambul family • 9. The K10.5-based Magny-Course/Lisbon family • 10. References 1. Introduction to AMD’s processor families 1. Introduction to AMD’s processor families (1) 1. Introduction to AMD’s processor families AMD’s early x86 processor history [1] AMD’s own processors Second sourced processors 1. Introduction to AMD’s processor families (2) Evolution of AMD’s early processors [2] 1. Introduction to AMD’s processor families (3) Historical remarks 1) Beyond x86 processors AMD also designed and marketed two embedded processor families; • the 2900 family of bipolar, 4-bit slice microprocessors (1975-?) used in a number of processors, such as particular DEC 11 family models, and • the 29000 family (29K family) of CMOS, 32-bit embedded microcontrollers (1987-95). In late 1995 AMD cancelled their 29K family development and transferred the related design team to the firm’s K5 effort, in order to focus on x86 processors [3]. 2) Initially, AMD designed the Am386/486 processors that were clones of Intel’s processors. -
AMD Raven Ridge
DELIVERING A NEW LEVEL OF VISUAL PERFORMANCE IN AN SOC AMD “RAVEN RIDGE” APU Dan Bouvier, Jim Gibney, Alex Branover, Sonu Arora Presented by: Dan Bouvier Corporate VP, Client Products Chief Architect AMD CONFIDENTIAL RAISING THE BAR FOR THE APU VISUAL EXPERIENCE Up to MOBILE APU GENERATIONAL 200% MORE CPU PERFORMANCE PERFORMANCE GAINS Up to 128% MORE GPU PERFORMANCE Up to 58% LESS POWER FIRST “Zen”-based APU CPU Performance GPU Performance Power HIGH-PERFORMANCE AMD Ryzen™ 7 2700U 7th Gen AMD A-Series APU On-die “Vega”-based graphics Scaled GPU Managed Improved Upgraded Increased LONG BATTERY LIFE and CPU up to power delivery memory display package Premium form factors reach target and thermal bandwidth experience performance frame rate dissipation efficiency density 2 | AMD Ryzen™ Processors with Radeon™ Vega Graphics - Hot Chips 30 | * See footnotes for details. “RAVEN RIDGE” APU AMD “ZEN” x86 CPU CORES CPU 0 “ZEN” CPU CPU 1 (4 CORE | 8 THREAD) USB 3.1 NVMe PCIe FULL PCIe GPP ----------- ----------- Discrete SYSTEM 4MB USB 2.0 SATA GFX CONNECTIVITY CPU 2 CPU 3 L3 Cache X64 DDR4 HIGH BANDWIDTH SOC FABRIC System Infinity Fabric & MEMORY Management SYSTEM Unit ACCELERATED Platform Multimedia Security MULTIMEDIA Processor Engines AMD GFX+ 1MB L2 EXPERIENCE X64 DDR4 (11 COMPUTE UNITS) Cache Video Audio Sensor INTEGRATED CU CU CU CU CU CU Display Codec ACP Fusion Controller Next SENSOR Next Hub FUSION HUB CU CU CU CU CU AMD “VEGA” GPU UPGRADED DISPLAY ENGINE 3 | AMD Ryzen™ Processors with Radeon™ Vega Graphics - Hot Chips 30 | SIGNIFICANT DENSITY INCREASE “Raven Ridge” die BGA Package: 25 x 35 x 1.38mm Technology: GLOBALFOUNDRIES 14nm – 11 layer metal Transistor count: 4.94B 59% 16% Die Size: 209.78mm2 more transistors smaller die than prior generation “Bristol Ridge” APU 4 | AMD Ryzen™ Processors with Radeon™ Vega Graphics - Hot Chips 30 | * See footnotes for details. -
SMBIOS Specification
1 2 Document Identifier: DSP0134 3 Date: 2019-10-31 4 Version: 3.4.0a 5 System Management BIOS (SMBIOS) Reference 6 Specification Information for Work-in-Progress version: IMPORTANT: This document is not a standard. It does not necessarily reflect the views of the DMTF or its members. Because this document is a Work in Progress, this document may still change, perhaps profoundly and without notice. This document is available for public review and comment until superseded. Provide any comments through the DMTF Feedback Portal: http://www.dmtf.org/standards/feedback 7 Supersedes: 3.3.0 8 Document Class: Normative 9 Document Status: Work in Progress 10 Document Language: en-US 11 System Management BIOS (SMBIOS) Reference Specification DSP0134 12 Copyright Notice 13 Copyright © 2000, 2002, 2004–2019 DMTF. All rights reserved. 14 DMTF is a not-for-profit association of industry members dedicated to promoting enterprise and systems 15 management and interoperability. Members and non-members may reproduce DMTF specifications and 16 documents, provided that correct attribution is given. As DMTF specifications may be revised from time to 17 time, the particular version and release date should always be noted. 18 Implementation of certain elements of this standard or proposed standard may be subject to third party 19 patent rights, including provisional patent rights (herein "patent rights"). DMTF makes no representations 20 to users of the standard as to the existence of such rights, and is not responsible to recognize, disclose, 21 or identify any or all such third party patent right, owners or claimants, nor for any incomplete or 22 inaccurate identification or disclosure of such rights, owners or claimants. -
CPU Benchmarks - List of Benchmarked Cpus
11.09.2020 PassMark - CPU Benchmarks - List of Benchmarked CPUs CPU Benchmarks CPU Benchmarks Over 1,000,000 CPUs Benchmarked CPU List Below is an alphabetical list of all CPU types that appear in the charts. Clicking on a specific processor name will take you to the chart it appears in and will highlight it for you. Results for Single CPU Systems and Multiple CPU Systems are listed separately. Find CPU Single CPU Systems Multi CPU Systems CPUS High End Single CPU Systems High Mid Range Last updated on the 11th of September 2020 Low Mid Range Low End Column CPU Mark Rank CPU Value Price Best Value CPU Name (higher is (lower is (higher is (USD) (On Market) better) better) better) Best Value XY AMD 3015e 2,678 1285 NA NA Scatter Best Value AMD 3020e 2,721 1272 NA NA (All time) AMD A4 Micro-6400T APU 1,004 2126 NA NA New Desktop AMD A4 PRO-3340B 1,519 1790 NA NA New Laptop AMD A4 PRO-7300B APU 1,421 1839 NA NA Single Thread AMD A4 PRO-7350B 1,024 2108 NA NA Systems with AMD A4-1200 APU 445 2572 NA NA Multiple CPUs Overclocked AMD A4-1250 APU 432 2583 NA NA Power AMD A4-3300 APU 994 2131 76.50 $12.99 Performance CPU Mark by Socket AMD A4-3300M APU 665 2394 22.19 $29.99* Type Cross-Platform CPU AMD A4-3305M APU 807 2273 38.78 $20.81 Performance AMD A4-3310MX APU 844 2239 NA NA CPU Mega List AMD A4-3320M APU 877 2212 23.77 $36.90 Search Model AMD A4-3330MX APU 816 2265 NA NA 0 1,067 2078 Compare AMD A4-3400 APU 53.35 $20.00 https://www.cpubenchmark.net/cpu_list.php AMD A4 3420 APU 8 01 $12 9 * 1/87 11.09.2020 PassMark - CPU Benchmarks - List -
AMD Introduces World's Most Powerful 16- Core
November 7, 2019 AMD Introduces World’s Most Powerful 16- core Consumer Desktop Processor, the AMD Ryzen™ 9 3950X – AMD Ryzen™ 9 3950X rounds out 3rd Gen Ryzen desktop processor series, arriving November 25 – – New AMD Athlon™ 3000G processor to provide everyday users with unmatched performance per dollar, coming November 19 – SANTA CLARA, Calif., Nov. 07, 2019 (GLOBE NEWSWIRE) -- Today, AMD announced the release of the highly anticipated flagship 16-core AMD Ryzen 9 3950X processor, available worldwide November 25, 2019. AMD Ryzen 9 3950X processor brings the ultimate processor for gamers with effortless 1080P gaming in select titles1 and up to 2X more energy efficient processing power compared to the competition2 as the world’s fastest 16- core consumer desktop processor3. In addition, AMD also announced a significant performance uplift4 coming for mainstream desktop users with the new AMD Athlon 3000G, arriving November 19, 2019. “We are excited to bring the AMD Ryzen™ 9 3950X to market later this month, offering enthusiasts the most powerful 16-core desktop processor ever,” said Chris Kilburn, corporate vice president and general manager, client channel, AMD. “We are focused on offering the best solutions at every level of the market, including the AMD Athlon 3000G for everyday PC users that delivers great performance at an incredible price point.” AMD Ryzen 9 3950X: Fastest 16-core Consumer Desktop Processor Offering up to 22% performance increase over previous generations5, the AMD Ryzen 9 3950X offers faster 1080p gaming in select titles1 and content creation6 than the competition. Built on the industry-leading “Zen 2” architecture, the AMD Ryzen 9 3950X also excels in power efficiency3 with a TDP7 of 105W. -
Take a Way: Exploring the Security Implications of AMD's Cache Way
Take A Way: Exploring the Security Implications of AMD’s Cache Way Predictors Moritz Lipp Vedad Hadžić Michael Schwarz Graz University of Technology Graz University of Technology Graz University of Technology Arthur Perais Clémentine Maurice Daniel Gruss Unaffiliated Univ Rennes, CNRS, IRISA Graz University of Technology ABSTRACT 1 INTRODUCTION To optimize the energy consumption and performance of their With caches, out-of-order execution, speculative execution, or si- CPUs, AMD introduced a way predictor for the L1-data (L1D) cache multaneous multithreading (SMT), modern processors are equipped to predict in which cache way a certain address is located. Conse- with numerous features optimizing the system’s throughput and quently, only this way is accessed, significantly reducing the power power consumption. Despite their performance benefits, these op- consumption of the processor. timizations are often not designed with a central focus on security In this paper, we are the first to exploit the cache way predic- properties. Hence, microarchitectural attacks have exploited these tor. We reverse-engineered AMD’s L1D cache way predictor in optimizations to undermine the system’s security. microarchitectures from 2011 to 2019, resulting in two new attack Cache attacks on cryptographic algorithms were the first mi- techniques. With Collide+Probe, an attacker can monitor a vic- croarchitectural attacks [12, 42, 59]. Osvik et al. [58] showed that tim’s memory accesses without knowledge of physical addresses an attacker can observe the cache state at the granularity of a cache or shared memory when time-sharing a logical core. With Load+ set using Prime+Probe. Yarom et al. [82] proposed Flush+Reload, Reload, we exploit the way predictor to obtain highly-accurate a technique that can observe victim activity at a cache-line granu- memory-access traces of victims on the same physical core.