Amd Ryzen™ Processor Software Optimization Presented by Kenneth Mitchell

Total Page:16

File Type:pdf, Size:1020Kb

Amd Ryzen™ Processor Software Optimization Presented by Kenneth Mitchell AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION PRESENTED BY KENNETH MITCHELL LET’S BUILD… 2020 MAY 15, 2020 ABSTRACT Join AMD Game Engineering team members for an introduction to the AMD Ryzen™ family of processors followed by advanced optimization topics. Learn about the high-performance AMD "Zen 2" microarchitecture and profiling tools. Gain insight into code optimization opportunities and lessons learned. Examples may include C/C++, assembly, and hardware performance-monitoring counters. AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 2 SPEAKER BIOGRAPHY Ken Mitchell is a Principal Member of Technical Staff in the Radeon™ Technologies Group/AMD ISV Game Engineering team where he focuses on helping game developers utilize AMD processors efficiently. His previous work includes automating and analyzing PC applications for performance projections of future AMD products as well as developing benchmarks. Ken studied computer science at the University of Texas at Austin. AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 3 AGENDA • Success Stories • “Zen 2” Architecture Processors • AMD uProf Profiler • Optimizations and Lessons Learned • Contacts AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 4 SUCCESS STORIES AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 5 SUCCESS STORIES BORDERLANDS 3 GEARS 5 WORLD WAR Z DirectX® 12 DirectX® 12 Vulkan® AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 6 “ZEN 2” ARCHITECTURE PROCESSORS AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 7 “ZEN 2” PRODUCT EXAMPLES NOTEBOOK DESKTOP HIGH END DESKTOP “Renoir” “Matisse” “Castle Peak” AMD Ryzen™ 7 4800U 8-Core AMD Ryzen™ 9 3950X 16-Core AMD Ryzen™ Threadripper™ Processor Processor 3990X 64-Core Processor AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 8 MICROARCHITECTURE AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 9 ADVANCES IN “ZEN 2” MICROARCHITECTURE 32K ICACHE BRANCH • +15% IPC Improvement from “Zen” to “Zen 2” 8 way PREDICTION • 2x op cache capacity DECODE OP CACHE • Reoptimized L1I cache 4 instructions/cycle 8 ops/cycle rd Micro-op Queue • 3 address generation unit 6 ops dispatched FLOATING • 2x FP data path width INTEGER POINT Integer Rename Floating Point Rename • 2x L3 capacity • Improved branch prediction accuracy Sche Sche Sche Sche Sche Sche Sche Scheduler duler duler duler duler duler duler duler • Hardware optimized Security Mitigations Integer Physical Register File FP Register File • Secure Virtualization with Guest Mode Execute Trap (GMET) ALU ALU ALU ALU AGU AGU AGU MUL ADD MUL ADD • Improved SMT fairness • for ALU and AGU schedulers 512K L2 2 loads + 1 store LOAD/STORE 32K DCACHE (I+D)CACHE • Improved Write Combining Buffer per cycle QUEUES 8 way 8 way AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 10 DATA FLOW AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 11 “RENOIR” 8 CORE PROCESSOR Unified 32B fetch 32K 32B/cycle DRAM 32B/cycle Memory 16B/cycle I-Cache Channel 8-way Controller 512K L2 uclk memclk 32B/cycle 32B/cycle I+D Cache 4M L3 2*32B load 32B/cycle 8-way Data 32B/cycle 32K GFX9 D-Cache I+D Cache Fabric 1*32B store 8-way 16-way 32B/cycle cclk l3clk Media IO Hub 64B/cycle Controller fclk lclk AMD Ryzen™ 7 4800U, 15W TDP, 8 Cores, 16 Threads, 4.2 GHz max boost clock, 1.8 GHz base clock, integrated GPU. * Monolithic Die. Each 4M L3 Cache has its own 32B/cycle link to the data fabric. 64b DDR4 Channel Shown. AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 12 CCD CCD “MATISSE” 16 CORES PROCESSOR IOD Unified 32B fetch 32K 32B/cycle DRAM 32B/cycle Memory 16B/cycle I-Cache Channel 8-way Controller 512K L2 uclk memclk 32B/cycle 16M L3 32B/cycle R I+D Cache Data 2*32B load 32K 32B/cycle 8-way I+D Cache 16B/cycle W Fabric D-Cache 16-way 1*32B store 8-way cclk 64B/cycle IO Hub l3clk Controller fclk lclk AMD Ryzen™ 9 3950X, 105W TDP, 16 Cores, 32 Threads, 4.7 GHz max boost clock, 3.5 GHz base clock. * Two Core Complex Die (CCD). Each CCD has two 16M L3 Cache Complexes. * The L3 Cache Complexes within a CCD share a single link to the Data Fabric. AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 13 0 1 4 5 IOD “CASTLE PEAK” 64 CORE PROCESSOR 2 3 6 7 Unified 32B fetch 32K 32B/cycle DRAM 32B/cycle Memory 16B/cycle I-Cache Channel 8-way Controller 512K L2 uclk memclk 32B/cycle 16M L3 32B/cycle R I+D Cache Data 2*32B load 32K 32B/cycle 8-way I+D Cache Fabric 16B/cycle W D-Cache 16-way Quadrant 1*32B store 8-way cclk 64B/cycle IO Hub l3clk Controller fclk lclk AMD Ryzen™ Threadripper™ 3990X, 280W TDP, 64 Cores, 128 Threads, 4.3 GHz max boost clock, 2.9 GHz base clock. * Two CCDs per Data Fabric Quadrant. * Two Data Fabric Quadrants have Unified Memory Controllers and two have IO Hubs. AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 14 INSTRUCTION SET AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 15 INSTRUCTION SET EVOLUTION AES AVX BMI SHA XOP ADX TBM FMA F16C AVX2 BMI2 CLWB SMEP SSSE3 FMA4 SMAP XSAVE SSE4.1 SSE4.2 RDRND XSAVES XSAVEC CLZERO MOVBE XGETBV RDSEED OSXSAVE FSGSBASE EXAMPLE XSAVEOPT MONITORX WBNOINVD CLFLUSHOPT YEAR FAMILY PRODUCT FAMILY ARCHITECTURE PRODUCT PCLMULQDQ 2019 17h “Matisse” “Zen2” Ryzen™ 9 3950X 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 2017 17h “Summit Ridge”, ”Pinnacle Ridge” “Zen”, “Zen+” Ryzen™ 7 2700X 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 2015 15h “Carrizo”, “Bristol Ridge” “Excavator” A12-9800 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 2014 15h “Kaveri”, “Godavari” “Steamroller” A10-7890K 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 2012 15h “Vishera” “Piledriver” FX-8370 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 2011 15h “Zambezi” “Bulldozer” FX-8150 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 1 0 1 2013 16h “Kabini” “Jaguar” A6-1450 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 2011 14h “Ontario” “Bobcat” E-450 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 2011 12h “Llano” “Husky” A8-3870 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 “Zen 2” added CLWB and the AMD vendor specific instruction WBNOINVD. AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 16 SOFTWARE PREFETCH LEVEL INSTRUCTIONS Prefetch T0|T1|T2|NTA • Loads a cache line from the specified memory address into the data- L1 cache level specified by the locality reference T0, T1, T2, or NTA. Aggressively • If a memory fault is detected, a bus cycle is not initiated and the Evict instruction is treated as an NOP. Prefetch • Prefetch levels T0/T1/T2 are treated identically in “Zen” & “Zen 2” NTA microarchitectures. L2 • The non-temporal cache fill hint, indicated with PREFETCHNTA, reduces cache pollution for data that will only be used once. It is not suitable for cache blocking of small data sets. Lines filled into the L2 cache with PREFETCHNTA are marked for quicker eviction from the L2 and when evicted from the L2 are not inserted into the L3. L3 • The operation of this instruction is implementation-dependent. Prefetch fill & evict policies may differ for other processor vendors or microarchitecture generations. memory AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 17 “MATISSE” CACHE AND MEMORY AMD PUBLIC | LET’S BUILD… 2020 | AMD RYZEN™ PROCESSOR SOFTWARE OPTIMIZATION | MAY 15, 2020 | 18 CACHE LATENCY • Cache line size is 64 Bytes. Count/ Level Capacity Sets Ways Line Size Latency CCD • 2 cpu clock cycles to move a single cache line. • L2 is inclusive of L1. uop 8 4 K uops 64 8 8 uops NA • lines filled into L1 are also filled into L2. • L3 is filled from L2 victims of all 4 cores within its CCX.
Recommended publications
  • Real-Time Finite Element Method (FEM) and Tressfx
    REAL-TIME FEM AND TRESSFX 4 ERIC LARSEN KARL HILLESLAND 1 FEBRUARY 2016 | CONFIDENTIAL FINITE ELEMENT METHOD (FEM) SIMULATION Simulates soft to nearly-rigid objects, with fracture Models object as mesh of tetrahedral elements Each element has material parameters: ‒ Young’s Modulus: How stiff the material is ‒ Poisson’s ratio: Effect of deformation on volume ‒ Yield strength: Deformation limit before permanent shape change ‒ Fracture strength: Stress limit before the material breaks 2 FEBRUARY 2016 | CONFIDENTIAL MOTIVATIONS FOR THIS METHOD Parameters give a lot of design control Can model many real-world materials ‒Rubber, metal, glass, wood, animal tissue Commonly used now for film effects ‒High-quality destruction Successful real-time use in Star Wars: The Force Unleashed 1 & 2 ‒DMM middleware [Parker and O’Brien] 3 FEBRUARY 2016 | CONFIDENTIAL OUR PROJECT New implementation of real-time FEM for games Planned CPU library release ‒Heavy use of multithreading ‒Open-source with GPUOpen license Some highlights ‒Practical method for continuous collision detection (CCD) ‒Mix of CCD and intersection contact constraints ‒Efficient integrals for intersection constraint 4 FEBRUARY 2016 | CONFIDENTIAL STATUS Proof-of-concept prototype First pass at optimization Offering an early look for feedback Several generic components 5 FEBRUARY 2016 | CONFIDENTIAL CCD Find time of impact between moving objects ‒Impulses can prevent intersections [Otaduy et al.] ‒Catches collisions with fast-moving objects Our approach ‒Conservative-advancement based ‒Geometric
    [Show full text]
  • AMD Powerpoint- White Template
    RDNA Architecture Forward-looking statement This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to, the features, functionality, performance, availability, timing, pricing, expectations and expected benefits of AMD’s current and future products, which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this presentation are based on current beliefs, assumptions and expectations, speak only as of the date of this presentation and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and statements. Investors are urged to review in detail the risks and uncertainties in AMD's Securities and Exchange Commission filings, including but not limited to AMD's Quarterly Report on Form 10-Q for the quarter ended March 30, 2019 2 Highlights of the RDNA Workgroup Processor (WGP) ▪ Designed for lower latency and higher
    [Show full text]
  • Amd Filed: February 24, 2009 (Period: December 27, 2008)
    FORM 10-K ADVANCED MICRO DEVICES INC - amd Filed: February 24, 2009 (period: December 27, 2008) Annual report which provides a comprehensive overview of the company for the past year Table of Contents 10-K - FORM 10-K PART I ITEM 1. 1 PART I ITEM 1. BUSINESS ITEM 1A. RISK FACTORS ITEM 1B. UNRESOLVED STAFF COMMENTS ITEM 2. PROPERTIES ITEM 3. LEGAL PROCEEDINGS ITEM 4. SUBMISSION OF MATTERS TO A VOTE OF SECURITY HOLDERS PART II ITEM 5. MARKET FOR REGISTRANT S COMMON EQUITY, RELATED STOCKHOLDER MATTERS AND ISSUER PURCHASES OF EQUITY SECURITIES ITEM 6. SELECTED FINANCIAL DATA ITEM 7. MANAGEMENT S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION AND RESULTS OF OPERATIONS ITEM 7A. QUANTITATIVE AND QUALITATIVE DISCLOSURE ABOUT MARKET RISK ITEM 8. FINANCIAL STATEMENTS AND SUPPLEMENTARY DATA ITEM 9. CHANGES IN AND DISAGREEMENTS WITH ACCOUNTANTS ON ACCOUNTING AND FINANCIAL DISCLOSURE ITEM 9A. CONTROLS AND PROCEDURES ITEM 9B. OTHER INFORMATION PART III ITEM 10. DIRECTORS, EXECUTIVE OFFICERS AND CORPORATE GOVERNANCE ITEM 11. EXECUTIVE COMPENSATION ITEM 12. SECURITY OWNERSHIP OF CERTAIN BENEFICIAL OWNERS AND MANAGEMENT AND RELATED STOCKHOLDER MATTERS ITEM 13. CERTAIN RELATIONSHIPS AND RELATED TRANSACTIONS AND DIRECTOR INDEPENDENCE ITEM 14. PRINCIPAL ACCOUNTANT FEES AND SERVICES PART IV ITEM 15. EXHIBITS, FINANCIAL STATEMENT SCHEDULES SIGNATURES EX-10.5(A) (OUTSIDE DIRECTOR EQUITY COMPENSATION POLICY) EX-10.19 (SEPARATION AGREEMENT AND GENERAL RELEASE) EX-21 (LIST OF AMD SUBSIDIARIES) EX-23.A (CONSENT OF ERNST YOUNG LLP - ADVANCED MICRO DEVICES) EX-23.B
    [Show full text]
  • Memory Centric Characterization and Analysis of SPEC CPU2017 Suite
    Session 11: Performance Analysis and Simulation ICPE ’19, April 7–11, 2019, Mumbai, India Memory Centric Characterization and Analysis of SPEC CPU2017 Suite Sarabjeet Singh Manu Awasthi [email protected] [email protected] Ashoka University Ashoka University ABSTRACT These benchmarks have become the standard for any researcher or In this paper, we provide a comprehensive, memory-centric charac- commercial entity wishing to benchmark their architecture or for terization of the SPEC CPU2017 benchmark suite, using a number of exploring new designs. mechanisms including dynamic binary instrumentation, measure- The latest offering of SPEC CPU suite, SPEC CPU2017, was re- ments on native hardware using hardware performance counters leased in June 2017 [8]. SPEC CPU2017 retains a number of bench- and operating system based tools. marks from previous iterations but has also added many new ones We present a number of results including working set sizes, mem- to reflect the changing nature of applications. Some recent stud- ory capacity consumption and memory bandwidth utilization of ies [21, 24] have already started characterizing the behavior of various workloads. Our experiments reveal that, on the x86_64 ISA, SPEC CPU2017 applications, looking for potential optimizations to SPEC CPU2017 workloads execute a significant number of mem- system architectures. ory related instructions, with approximately 50% of all dynamic In recent years the memory hierarchy, from the caches, all the instructions requiring memory accesses. We also show that there is way to main memory, has become a first class citizen of computer a large variation in the memory footprint and bandwidth utilization system design.
    [Show full text]
  • Overview of the SPEC Benchmarks
    9 Overview of the SPEC Benchmarks Kaivalya M. Dixit IBM Corporation “The reputation of current benchmarketing claims regarding system performance is on par with the promises made by politicians during elections.” Standard Performance Evaluation Corporation (SPEC) was founded in October, 1988, by Apollo, Hewlett-Packard,MIPS Computer Systems and SUN Microsystems in cooperation with E. E. Times. SPEC is a nonprofit consortium of 22 major computer vendors whose common goals are “to provide the industry with a realistic yardstick to measure the performance of advanced computer systems” and to educate consumers about the performance of vendors’ products. SPEC creates, maintains, distributes, and endorses a standardized set of application-oriented programs to be used as benchmarks. 489 490 CHAPTER 9 Overview of the SPEC Benchmarks 9.1 Historical Perspective Traditional benchmarks have failed to characterize the system performance of modern computer systems. Some of those benchmarks measure component-level performance, and some of the measurements are routinely published as system performance. Historically, vendors have characterized the performances of their systems in a variety of confusing metrics. In part, the confusion is due to a lack of credible performance information, agreement, and leadership among competing vendors. Many vendors characterize system performance in millions of instructions per second (MIPS) and millions of floating-point operations per second (MFLOPS). All instructions, however, are not equal. Since CISC machine instructions usually accomplish a lot more than those of RISC machines, comparing the instructions of a CISC machine and a RISC machine is similar to comparing Latin and Greek. 9.1.1 Simple CPU Benchmarks Truth in benchmarking is an oxymoron because vendors use benchmarks for marketing purposes.
    [Show full text]
  • Oracle Corporation: SPARC T7-1
    SPEC CINT2006 Result spec Copyright 2006-2015 Standard Performance Evaluation Corporation Oracle Corporation SPECint_rate2006 = 1200 SPARC T7-1 SPECint_rate_base2006 = 1120 CPU2006 license: 6 Test date: Oct-2015 Test sponsor: Oracle Corporation Hardware Availability: Oct-2015 Tested by: Oracle Corporation Software Availability: Oct-2015 Copies 0 300 600 900 1200 1600 2000 2400 2800 3200 3600 4000 4400 4800 5200 5600 6000 6400 6800 7600 1100 400.perlbench 192 224 1040 675 401.bzip2 256 224 666 875 403.gcc 160 224 720 1380 429.mcf 128 224 1160 1190 445.gobmk 256 224 1120 854 456.hmmer 96 224 813 1020 458.sjeng 192 224 988 7550 462.libquantum 416 224 7330 1190 464.h264ref 256 224 1150 956 471.omnetpp 255 224 885 986 473.astar 416 224 862 1180 483.xalancbmk 256 224 1140 SPECint_rate_base2006 = 1120 SPECint_rate2006 = 1200 Hardware Software CPU Name: SPARC M7 Operating System: Oracle Solaris 11.3 CPU Characteristics: Compiler: C/C++/Fortran: Version 12.4 of Oracle Solaris CPU MHz: 4133 Studio, FPU: Integrated 4/15 Patch Set CPU(s) enabled: 32 cores, 1 chip, 32 cores/chip, 8 threads/core Auto Parallel: No CPU(s) orderable: 1 chip File System: zfs Primary Cache: 16 KB I + 16 KB D on chip per core System State: Default Secondary Cache: 2 MB I on chip per chip (256 KB / 4 cores); Base Pointers: 32-bit 4 MB D on chip per chip (256 KB / 2 cores) Peak Pointers: 32-bit L3 Cache: 64 MB I+D on chip per chip (8 MB / 4 cores) Other Software: None Other Cache: None Memory: 512 GB (16 x 32 GB 4Rx4 PC4-2133P-L) Disk Subsystem: 732 GB, 4 x 400 GB SAS SSD
    [Show full text]
  • The Amd Linux Graphics Stack – 2018 Edition Nicolai Hähnle Fosdem 2018
    THE AMD LINUX GRAPHICS STACK – 2018 EDITION NICOLAI HÄHNLE FOSDEM 2018 1FEBRUARY 2018 | CONFIDENTIAL GRAPHICS STACK: KERNEL / USER-SPACE / X SERVER Mesa OpenGL & Multimedia Vulkan Vulkan radv AMDVLK OpenGL X Server radeonsi Pro/ r600 Workstation radeon amdgpu LLVM SCPC libdrm radeon amdgpu FEBRUARY 2018 | AMD LINUX GRAPHICS STACK 2FEBRUARY 2018 | CONFIDENTIAL GRAPHICS STACK: OPEN-SOURCE / CLOSED-SOURCE Mesa OpenGL & Multimedia Vulkan Vulkan radv AMDVLK OpenGL X Server radeonsi Pro/ r600 Workstation radeon amdgpu LLVM SCPC libdrm radeon amdgpu FEBRUARY 2018 | AMD LINUX GRAPHICS STACK 3FEBRUARY 2018 | CONFIDENTIAL GRAPHICS STACK: SUPPORT FOR GCN / PRE-GCN HARDWARE ROUGHLY: GCN = NEW GPUS OF THE LAST 5 YEARS Mesa OpenGL & Multimedia Vulkan Vulkan radv AMDVLK OpenGL X Server radeonsi Pro/ r600 Workstation radeon amdgpu LLVM(*) SCPC libdrm radeon amdgpu (*) LLVM has pre-GCN support only for compute FEBRUARY 2018 | AMD LINUX GRAPHICS STACK 4FEBRUARY 2018 | CONFIDENTIAL GRAPHICS STACK: PHASING OUT “LEGACY” COMPONENTS Mesa OpenGL & Multimedia Vulkan Vulkan radv AMDVLK OpenGL X Server radeonsi Pro/ r600 Workstation radeon amdgpu LLVM SCPC libdrm radeon amdgpu FEBRUARY 2018 | AMD LINUX GRAPHICS STACK 5FEBRUARY 2018 | CONFIDENTIAL MAJOR MILESTONES OF 2017 . Upstreaming the DC display driver . Open-sourcing the AMDVLK Vulkan driver . Unified driver delivery . OpenGL 4.5 conformance in the open-source Mesa driver . Zero-day open-source support for new hardware FEBRUARY 2018 | AMD LINUX GRAPHICS STACK 6FEBRUARY 2018 | CONFIDENTIAL KERNEL: AMDGPU AND RADEON HARDWARE SUPPORT Pre-GCN radeon GCN 1st gen (Southern Islands, SI, gfx6) GCN 2nd gen (Sea Islands, CI(K), gfx7) GCN 3rd gen (Volcanic Islands, VI, gfx8) amdgpu GCN 4th gen (Polaris, RX 4xx, RX 5xx) GCN 5th gen (RX Vega, Ryzen Mobile, gfx9) FEBRUARY 2018 | AMD LINUX GRAPHICS STACK 7FEBRUARY 2018 | CONFIDENTIAL KERNEL: AMDGPU VS.
    [Show full text]
  • AMD APP SDK Developer Release Notes
    AMD APP SDK v3.0 Beta Developer Release Notes 1 What’s New in AMD APP SDK v3.0 Beta 1.1 New features in AMD APP SDK v3.0 Beta AMD APP SDK v3.0 Beta includes the following new features: OpenCL 2.0: There are 20 samples that demonstrate various features of OpenCL 2.0 such as Shared Virtual Memory, Platform Atomics, Device-side Enqueue, Pipes, New workgroup built-in functions, Program Scope Variables, Generic Address Space, and OpenCL 2.0 image features. For the complete list of the samples, see the AMD APP SDK Samples Release Notes (AMD_APP_SDK_Release_Notes_Samples.pdf) document. Support for Bolt 1.3 library. 6 additional samples that demonstrate various APIs in the Bolt C++ AMP library. One new sample that demonstrates the consumption of SPIR 1.2 binary. Enhancements and bug fixes in several samples. A lightweight installer that supports the following features: Customized online installation Ability to download the full installer for install and distribution 1.2 New features for AMD CodeXL version 1.6 The following new features in AMD CodeXL version 1.6 provide the following improvements to the developer experience: GPU Profiler support for OpenCL 2.0 API-level debugging for OpenCL 2.0 Power Profiling For information about CodeXL and about how to use CodeXL to gather performance data about your OpenCL application, such as application traces and timeline views, see the CodeXL home page. Developer Release Notes 1 of 4 2 Important Notes OpenCL 2.0 runtime support is limited to 64-bit applications running on 64-bit Windows and Linux operating systems only.
    [Show full text]
  • A Review of Gpuopen Effects
    A REVIEW OF GPUOPEN EFFECTS TAKAHIRO HARADA & JASON LACROIX • An initiative designed to help developers make better content by “opening up” the GPU • Contains a variety of software modules across various GPU needs: • Effects and render features • Tools, SDKs, and libraries • Patches and drivers • Software hosted on GitHub with no “black box” implementations or licensing fees • Website provides: • The latest news and information on all GPUOpen software • Tutorials and samples to help you optimise your game • A central location for up-to-date GPU and CPU documentation • Information about upcoming events and previous presentations AMD Public | Let’s build… 2020 | A Review of GPUOpen Effects | May 15, 2020 | 2 LET’S BUILD A NEW GPUOPEN… • Brand new, modern, dynamic website • Easy to find the information you need quickly • Read the latest news and see what’s popular • Learn new tips and techniques from our engineers • Looks good on mobile platforms too! • New social media presence • @GPUOpen AMD Public | Let’s build… 2020 | A Review of GPUOpen Effects | May 15, 2020 | 3 EFFECTS A look at recently released samples AMD Public | Let’s build… 2020 | A Review of GPUOpen Effects | May 15, 2020 | 4 TRESSFX 4.1 • Self-contained solution for hair simulation • Implementation into Radeon® Cauldron framework • DirectX® 12 and Vulkan® with full source • Optimized physics simulation • Faster velocity shock propagation • Simplified local shape constraints • Reorganization of dispatches • StrandUV support • New LOD system • New and improved Autodesk® Maya®
    [Show full text]
  • Catalyst™ Software Suite Version 9.5 Release Notes
    Catalyst™ Software Suite Version 9.5 Release Notes This release note provides information on the latest posting of AMD’s industry leading software suite, Catalyst™. This particular software suite updates both the AMD Display Driver, and the Catalyst™ Control Center. This unified driver has been further enhanced to provide the highest level of power, performance, and reliability. The AMD Catalyst™ software suite is the ultimate in performance and stability. For exclusive Catalyst™ updates follow Catalyst Maker on Twitter. This release note provides information on the following: z Web Content z AMD Product Support z Operating Systems Supported z New Features z Performance Improvements z Resolved Issues for the Windows Vista Operating System z Resolved Issues for the Windows XP Operating System z Resolved Issues for the Windows 7 Operating System z Known Issues Under the Windows Vista Operating System z Known Issues Under the Windows XP Operating System z Known Issues Under the Windows 7 Operating System z Installing the Catalyst™ Vista Software Driver z Catalyst™ Crew Driver Feedback ATI Catalyst™ Release Note Version 9.5 1 Web Content The Catalyst™ Software Suite 9.5 contains the following: z Radeon™ display driver 8.612 z HydraVision™ for both Windows XP and Vista z HydraVision™ Basic Edition (Windows XP only) z WDM Driver Install Bundle z Southbridge/IXP Driver z Catalyst™ Control Center Version 8.612 Caution: The Catalyst™ software driver and the Catalyst™ Control Center can be downloaded independently of each other. However, for maximum stability and performance AMD recommends that both components be updated from the same Catalyst™ release.
    [Show full text]
  • AMD Ryzen™ PRO & Athlon™ PRO Processors Quick Reference Guide
    AMD Ryzen™ PRO & Athlon™ PRO Processors Quick Reference guide AMD Ryzen™ PRO Processors with Radeon™ Graphics for Business Laptops (Socket FP6/FP5) 1 Core/Thread Frequency Boost/Base L2+L3 Cache Graphics Node TDP Intel vPro Core/Thread Frequency Boost*/Base L2+L3 Cache Graphics Node TDP AMD PRO technologies COMPARED TO 4.9/1.1 Radeon™ 6/12 UHD AMD Ryzen™ 7 PRO 8/16 Up to 12MB Graphics 7nm 15W intel Intel Core i7 10810U GHz 13MB 4750U 4.1/1.7 GHz CORE i7 14nm 15W (7 Cores) 10th Gen Intel Core i7 10610U 4/8 4.9/1.8 9MB UHD GHz AMD Ryzen™ 7 PRO Up to Radeon™ intel 4/8 6MB 10 12nm 15W 4.8/1.9 3700U 4.0/2.3 GHz Vega CORE i7 Intel Core i7 8665U 4/8 9MB UHD 14nm 15W 8th Gen GHz Radeon™ AMD Ryzen™ 5 PRO 6/12 Up to 11MB Graphics 7nm 15W intel 4.4/1.7 4650U 4.0/2.1 GHz CORE i5 Intel Core i5 10310U 4/8 7MB UHD 14nm 15W GHz (6 Cores) 10th Gen AMD Ryzen™ 5 PRO 4/8 Up to 6MB Radeon™ 12nm 15W intel 4.1/1.6 3500U 3.7/2.1 GHz Vega8 CORE i5 Intel Core i5 8365U 4/8 7MB UHD 14nm 15W th GHz 8 Gen Radeon™ AMD Ryzen™ 3 PRO 4/8 Up to 6MB Graphics 7nm 15W intel 4.1/2.1 4450U 3.7/2.5 GHz CORE i3 Intel Core i3 10110U 2/4 5MB UHD 14nm 15W (5 Cores) 10th Gen GHz AMD Ryzen™ 3 PRO 4/4 Up to 6MB Radeon™ 12nm 15W intel 3.9/2.1 3300U 3.5/2.1 GHz Vega6 CORE i3 Intel Core i3 8145U 2/4 4.5MB UHD 14nm 15W 8th Gen GHz AMD Athlon™ PRO Processors with Radeon™ Vega Graphics for Business Laptops (Socket FP5) AMD Athlon™ PRO Up to Radeon™ intel Intel Pentium 4415U 2/4 2.3 GHz 2.5MB HD 610 14nm 15W 300U 2/4 3.3/2.4 GHz 5MB Vega3 12nm 15W 1.
    [Show full text]
  • Amd(Amd.Us)18Q1 点评 2018 年 07 月 30 日
    海外公司报告 | 公司动态研究 证券研究报告 AMD(AMD.US)18Q1 点评 2018 年 07 月 30 日 作者 AMD 7 年最佳,10 年翻身,重申买入,TP 上调至 何翩翩 分析师 23 美元 SAC 执业证书编号:S1110516080002 [email protected] 业绩超预期,7 年来最佳盈利季 雷俊成 分析师 SAC 执业证书编号:S1110518060004 AMD 18Q2 实现 7 年来最佳盈利季度,non-GAAP EPS 0.14 美元,营收 17.6 [email protected] 亿美元同比大涨 53%,均超过华尔街预期的 EPS 0.13 美元和营收 17.2 亿美 马赫 分析师 SAC 执业证书编号:S1110518070001 元。计算与图形业务同比大涨 64%至 10.9 亿美元好于市场预期的 10.6 亿, [email protected] 但受 Q2 区块链相关贡献进一步减弱带来该业务环比跌 3%。挖矿业务本季 董可心 联系人 营收占比从上季的 10%降低为 6%,公司进一步看淡下半年需求。EESC 业务 [email protected] 同比涨 37%至 6.7 亿美元,好于预期的 6.61 亿,EPYC 逐步进入放量阶段, 公司维持到年底会实现中单位数份额的预测。Q2 毛利率提升至 37%,Q3 指引营收 17 亿美元,同比增长 7%,略低于市场预期的 17.6 亿,毛利率提 相关报告 升至约 38%;全年指引营收增速保持 25%,我们认为公司指引基于 17Q3 的 1 《AMD(AMD.US)点评:EPYC“从 高基数较为保守,且区块链影响作为一次性业务逐渐消弭也会进一步减少 零到一”终实现,7nm 产品周期全方位 业绩不确定性,我们看好 EPYC 会在下半年至 Q4 迎来关键放量。 回归“传奇”;TP 上调至 22 美元,重 服务器市场 AMD 与 Intel“荣辱互见” 申买入》2018-06-20 2 《AMD(AMD.US)点评:公布 7nm 服务器市场 AMD 与 Intel“荣辱互见”,EPYC 服务器随着 Cisco、HPE 适配 GPU 加入 AI 计算抢滩战,Ryzen+EPYC 以及超级云计算客户的需求能见度提高,Q2 出货量和营收均环比提高超 50%,目前与 AMD 合作的 5 个云计算巨头成主要推动力。我们认为 AMD “双子星”仍是中流砥柱;TP 上调至 将继续通过单插槽服务器高核心数和低功耗打造性价比优势,下半年加速 18 美元,重申买入》2018-06-08 市场渗透蚕食 Intel 份额,进入明年则等待 7nm 的第二代 EPYC 面市,面对 3 《AMD(AMD.US)18Q1 点评:2018 已将 10nm Cannon Lake 量产时点延后至明年的 Intel,AMD 将终于实现制 开门红,业绩指引均超预期,Ryzen 继 程反超,加速量价齐升。“从零到一”抢占 20 亿美元以上的市场份额。 续扎实闪耀,EPYC 仍待升级放量,重 反观 Intel Q2 数据中心业务收入 55.5 亿美元,虽然在整体行业高景气度下 申买入》2018-04-30 同比增长 27%,但仍低于市场预期的 56.3 亿美元。业绩发布会上 Intel 更为 4 《2017 扭亏为盈业绩迎拐点,2018 明确消费级 10nm 产品会到 19 年下半年节日旺季才推向市场,让市场情绪 厚积待薄发,Ryzen+EPYC 继续双星闪 愈加悲观的同时也给了 AMD 足够的时间窗口。 耀,重申买入》2018-02-01 Ryzen 继续攻城略地,进一步打开笔记本市场 5 《AMD(AMD.US)点评:合作英特
    [Show full text]