The LPGPU2 Project: Low-Power Parallel Computing on Gpus

Total Page:16

File Type:pdf, Size:1020Kb

The LPGPU2 Project: Low-Power Parallel Computing on Gpus View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by DepositOnce Juurlink, B.; Lucas, J.; Mammeri, N.; Bliss, M.; Keramidas, G.; Kokkala, C.; Richards, A. The LPGPU2 Project: Low-Power Parallel Computing on GPUs Conference paper | Accepted manuscript (Postprint) This version is available at https://doi.org/10.14279/depositonce-7088 © ACM 2017. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in SCOPES '17 Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems http://dx.doi.org/10.1145/3078659.3078672 Juurlink, B., Lucas, J., Mammeri, N., Bliss, M., Keramidas, G., Kokkala, C., & Richards, A. (2017). The LPGPU2 Project. In Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems - SCOPES ’17. ACM Press. https://doi.org/10.1145/3078659.3078672 Terms of Use Copyright applies. A non-exclusive, non-transferable and limited right to use is granted. This document is intended solely for personal, non-commercial use. The LPGPU2 Project - Low-Power Parallel Computing on GPUs Extended Abstract Ben Juurlink Jan Lucas Nadjib Mammeri TU Berlin TU Berlin TU Berlin Einsteinufer 17 Einsteinufer 17 Einsteinufer 17 Berlin, Germany 10587 Berlin, Germany 10587 Berlin, Germany 10587 [email protected] [email protected] [email protected] Martyn Bliss Georgios Keramidas Chrysa Kokkala Samsung Electronics Think Silicon Think Silicon Communications House Patras Science Park Patras Science Park Staines, UK Rion Achaias, Greece 26504 Rion Achaias, Greece 26504 [email protected] [email protected] [email protected] Andrew Richards Codeplay Argyle House Edinburgh, Scotland [email protected] ABSTRACT 1 INTRODUCTION TheLPGPU2 project is a30-month-project (InnovationAction) Consumers today expect to be able to carry a supercomputer fundedby theEuropean Union.Itsoverall goalis to develop an aroundwiththem,thathasdetailedgraphicaldisplays,is easy to analysisandvisualizationframework thatenablesGPU application useandlastsformorethanadayon asinglebatterycharge.Not developers to improve theperformance andpower consumptionof only that, but now users expect their devices to see, listen and theirapplications.To achieve thisoverall goal,several key objec- understand the world around them. This applies to devices from tives need to be achieved. First,several applications(usecases)need smartphonesthatcanunderstandhumanspokenrequeststo allthe to be developed foror ported to low-power GPUs. Thereafter,these way to cars that can drive themselves. To deliver on this applicationsneed to be optimizedusingthetooling framework. capability requires very power efficient graphics processors In addition,power measurementdevices andpower models need ("GPUs"). to be developed thatare 10xmoreaccuratethanthestateof the Powerconsumptionis importantto processing,smartpower art.Theproject consortiumactivelypromotes open vendor-neutral managementalgorithmssuchas dynamicvoltagefrequencyscaling standards viathe Khronosgroup. Thispaper brieflyreports on the (DVFS)areusedas amitigationforhighpowerconsumptionbut achievements madein thefirsthalfof theproject, andfocuses on thisusuallyresultsin alesscompellinguserexperienceas theCPU theprogress madein applications;in power measurement,esti- andGPUareclockeddownto conservepowerresultingin lessraw mation,andmodelling; andin theanalysisandvisualizationtool processingpower.Thestrictpowerlimitationsmeansthatthese suite. demandscannotbe metthroughhardwareimprovementsalone,the softwaremustbetterexploittheavailableresources.Unfortunately, programmersarehinderedwhencreatinglow-powerGPUsoftware by thequalityof currentperformanceanalysistools.As software becomesmorecomplexit becomesincreasinglyunmanageablefor ,&:803%4 programmersto optimizethesoftwareforlow-powerdevices. (16T MPXQPXFS FNCFEEFE QPXFSNPEFMJOH QFSGPSNBODFDPVO TheLPGPU2consortiumhascometogetherto workas adiverse UFST NJDSPCFODINBSLT WJTVBMJ[BUJPO GSBNFXPSL teamon deliveringlow-powerGPUsfromalltherangeof angles 1 and functionality. These applications will be used as benchmarks for the power and performance analysis tool being developed as part of the project. Figure 1 illustrates the role of the applications in LPGPU 2 . The applications under consideration are important commer- cial applications for individual partners in the area of their core businesses: • Spin Digital develops video codecs for the next generation of ultra high quality media. In LPGPU 2 Spin Digital is de- veloping a media player based on the H.265 video codec. The main contribution has been the development of a high performance and high quality multi-API video rendering engine. When combined together with the H.265 video de- Figure 1: General diagram of the LPGPU 2 framework coder, the new video rendering engine allows the creation of state-of-the art media playback applications in areas such as UHD TV, Virtual Reality (VR), and large screen required. The consortium combines commercial tools, applications, display. platform and GPU designers with academic researchers to analyse • Samsung graphics team in the UK is responsible for android GPU power and performance, define standard interfaces to reliably mobile graphics in their platform. Their applications are measure the power and performance, and create a tool chain to related to Virtual Reality (VR), Augmented Reality (AR) provide clear information and insights to software developers. The and font rendering. Any output from this work will help companies in the consortium are world leaders in power-efficient in improving Samsungs’ mobile graphics platform where GPU design (Think Silicon), GPU and compute tools (Codeplay), their mobile phones could be used for VR using GearVR, graphics standards and applications (Samsung), video codecs and their mobile camera can used for AR and their mobile media players (Spin Digital), and the university in the consortium phone screen could be used for displaying text using font (Technical University of Berlin TUB) has leading experts on parallel rendering. applications and multi-core architectures. • To realize the possibilities of computational photography, This project proposes to aid the programmer in creating software applications must access the hardware at low level. This for low-power GPUs by building on the results of the first LPGPU would allow to control individual ISP blocks, as well as project [ 2] to provide a complete performance analysis process for send and receive detailed information from the hardware, programmers. It will address all aspects of performance analysis, including exposure settings, flash, focus point, timestamps, from hardware power and performance counters, to a tool chain and raw image sensor data. Using the GPU for post-camera that processes and visualizes information from these counters. The processing is a promising direction since the bulk of the main objectives of this project are: memory traffic can be eliminated. Think Silicon has built 1. To help programmers to improve the energy efficiency of three different implementations of a set of ISP algorithms their applications, the LPGPU2 tool chain will provide hints and (in C, in NEMA|gfx, and in Vulkan). An ISP demonstrator suggestions to GPU programmers showing ways to reduce power. based on NEMA|gfx [ 10 ], a proprietary low-level graphics 2. To enable programmers to be able to write their software API of Think Silicon, was developed and presented at in- once and run it on a variety of different low-power GPUs. The dustrial exhibitions with the goal of exploiting commercial LPGPU2 project will work on standardizing power analysis and opportunities of executing ISP and post-camera processing power-efficient programming models. algorithms on embedded GPUs. 3. To increase the productivity in GPU software development, • TensorFlow ™[1], is an artificial intelligence framework we propose an approach in which there are layers of technologies, that can be used for executing machine learning algorithms. that all work together via open standards, or open source software. While a computation expressed using TensorFlow can be 4. To bring technologies to market in a commercializable form, executed across heterogeneous systems, support has so including productizing and commercializing the technologies de- far been limited to NVIDIA ® processors using CUDA ® [6]. veloped in the previous LPGPU project [ 2]. This includes bringing In order to enable developers to access a wider range of the SYCL ™ standard into real-world AI applications and putting processors, we are working to bring support for OpenCL the LPGPU video decoders into commercial video systems. devices to the TensorFlow framework using SYCL. OpenCL is a framework for writing programs that execute across 2 APPLICATIONS heterogeneous platforms, and SYCL is a royalty-free, cross- As part of LPGPU 2 several applications are being developed and platform C++ abstraction layer that builds on the underly- ported to low-power GPU environments. These applications com- ing concepts, portability and efficiency of OpenCL, while bine graphics and compute parts, and require the use of existing adding the ease-of-use and flexibility of modern C++14. Par- and emerging APIs such as OpenGL ES ™ [4], OpenCL ™ [7, 11 ], allelization is also important from a processing and power and Vulkan ™ [13 ], in order to deliver the required performance management perspective.
Recommended publications
  • Real-Time Finite Element Method (FEM) and Tressfx
    REAL-TIME FEM AND TRESSFX 4 ERIC LARSEN KARL HILLESLAND 1 FEBRUARY 2016 | CONFIDENTIAL FINITE ELEMENT METHOD (FEM) SIMULATION Simulates soft to nearly-rigid objects, with fracture Models object as mesh of tetrahedral elements Each element has material parameters: ‒ Young’s Modulus: How stiff the material is ‒ Poisson’s ratio: Effect of deformation on volume ‒ Yield strength: Deformation limit before permanent shape change ‒ Fracture strength: Stress limit before the material breaks 2 FEBRUARY 2016 | CONFIDENTIAL MOTIVATIONS FOR THIS METHOD Parameters give a lot of design control Can model many real-world materials ‒Rubber, metal, glass, wood, animal tissue Commonly used now for film effects ‒High-quality destruction Successful real-time use in Star Wars: The Force Unleashed 1 & 2 ‒DMM middleware [Parker and O’Brien] 3 FEBRUARY 2016 | CONFIDENTIAL OUR PROJECT New implementation of real-time FEM for games Planned CPU library release ‒Heavy use of multithreading ‒Open-source with GPUOpen license Some highlights ‒Practical method for continuous collision detection (CCD) ‒Mix of CCD and intersection contact constraints ‒Efficient integrals for intersection constraint 4 FEBRUARY 2016 | CONFIDENTIAL STATUS Proof-of-concept prototype First pass at optimization Offering an early look for feedback Several generic components 5 FEBRUARY 2016 | CONFIDENTIAL CCD Find time of impact between moving objects ‒Impulses can prevent intersections [Otaduy et al.] ‒Catches collisions with fast-moving objects Our approach ‒Conservative-advancement based ‒Geometric
    [Show full text]
  • AMD Powerpoint- White Template
    RDNA Architecture Forward-looking statement This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to, the features, functionality, performance, availability, timing, pricing, expectations and expected benefits of AMD’s current and future products, which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this presentation are based on current beliefs, assumptions and expectations, speak only as of the date of this presentation and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and statements. Investors are urged to review in detail the risks and uncertainties in AMD's Securities and Exchange Commission filings, including but not limited to AMD's Quarterly Report on Form 10-Q for the quarter ended March 30, 2019 2 Highlights of the RDNA Workgroup Processor (WGP) ▪ Designed for lower latency and higher
    [Show full text]
  • The Importance of Data
    The landscape of Parallel Programing Models Part 2: The importance of Data Michael Wong and Rod Burns Codeplay Software Ltd. Distiguished Engineer, Vice President of Ecosystem IXPUG 2020 2 © 2020 Codeplay Software Ltd. Distinguished Engineer Michael Wong ● Chair of SYCL Heterogeneous Programming Language ● C++ Directions Group ● ISOCPP.org Director, VP http://isocpp.org/wiki/faq/wg21#michael-wong ● [email protected][email protected] Ported ● Head of Delegation for C++ Standard for Canada Build LLVM- TensorFlow to based ● Chair of Programming Languages for Standards open compilers for Council of Canada standards accelerators Chair of WG21 SG19 Machine Learning using SYCL Chair of WG21 SG14 Games Dev/Low Latency/Financial Trading/Embedded Implement Releasing open- ● Editor: C++ SG5 Transactional Memory Technical source, open- OpenCL and Specification standards based AI SYCL for acceleration tools: ● Editor: C++ SG1 Concurrency Technical Specification SYCL-BLAS, SYCL-ML, accelerator ● MISRA C++ and AUTOSAR VisionCpp processors ● Chair of Standards Council Canada TC22/SC32 Electrical and electronic components (SOTIF) ● Chair of UL4600 Object Tracking ● http://wongmichael.com/about We build GPU compilers for semiconductor companies ● C++11 book in Chinese: Now working to make AI/ML heterogeneous acceleration safe for https://www.amazon.cn/dp/B00ETOV2OQ autonomous vehicle 3 © 2020 Codeplay Software Ltd. Acknowledgement and Disclaimer Numerous people internal and external to the original C++/Khronos group, in industry and academia, have made contributions, influenced ideas, written part of this presentations, and offered feedbacks to form part of this talk. But I claim all credit for errors, and stupid mistakes. These are mine, all mine! You can’t have them.
    [Show full text]
  • Delivering Heterogeneous Programming in C++
    Delivering Heterogeneous Programming in C++ Duncan McBain, Codeplay Software Ltd. About me ● Graduated from Edinburgh University 3 years ago ● Postgrad course got me interested in GPU programming ● Worked at Codeplay since graduating ● Research projects, benchmarking, debuggers ● Most recently on C++ library for heterogeneous systems 2 © 2016 Codeplay Software Ltd. Contents • What are heterogeneous systems? • How can we program them? • The future of heterogeneous systems 3 © 2016 Codeplay Software Ltd. What are heterogeneous systems • By this, I mean devices like GPUs, DSPs, FPGAs… • Generally a bit of hardware that is more specialised than, and fundamentally different to, the host CPU • Specialisation can make it very fast • Can also be harder to program because of specialisation 4 © 2016 Codeplay Software Ltd. Some definitions • Host – The CPU/code that runs on the CPU, controls main memory (RAM), might control many devices • Device – A GPU, DSP, or something more exotic • Heterogeneous system – A host, a device and an API tying them together 5 © 2016 Codeplay Software Ltd. Some definitions • Kernel – Code representing the computation to be performed on the device. • Work group – A collection of many work items executing on a device. Has shared local memory and executes same instructions 6 © 2016 Codeplay Software Ltd. Some definitions ● Work item – A single thread or task on a device that executes in parallel ● Parallel for – Some collection of work items, in many work groups, executing a kernel in parallel. In general, cannot return anything, and must be enqueued asynchronously 7 © 2016 Codeplay Software Ltd. Example heterogeneous device ● CPUs today can execute instructions out-of-order, speculative execution, branch prediction ● Complexity hidden from programmer ● Contrast with e.g.
    [Show full text]
  • Exploring Weak Scalability for FEM Calculations on a GPU-Enhanced Cluster
    Exploring weak scalability for FEM calculations on a GPU-enhanced cluster Dominik G¨oddeke a,∗,1, Robert Strzodka b,2, Jamaludin Mohd-Yusof c, Patrick McCormick c,3, Sven H.M. Buijssen a, Matthias Grajewski a and Stefan Turek a aInstitute of Applied Mathematics, University of Dortmund bStanford University, Max Planck Center cComputer, Computational and Statistical Sciences Division, Los Alamos National Laboratory Abstract The first part of this paper surveys co-processor approaches for commodity based clusters in general, not only with respect to raw performance, but also in view of their system integration and power consumption. We then extend previous work on a small GPU cluster by exploring the heterogeneous hardware approach for a large-scale system with up to 160 nodes. Starting with a conventional commodity based cluster we leverage the high bandwidth of graphics processing units (GPUs) to increase the overall system bandwidth that is the decisive performance factor in this scenario. Thus, even the addition of low-end, out of date GPUs leads to improvements in both performance- and power-related metrics. Key words: graphics processors, heterogeneous computing, parallel multigrid solvers, commodity based clusters, Finite Elements PACS: 02.70.-c (Computational Techniques (Mathematics)), 02.70.Dc (Finite Element Analysis), 07.05.Bx (Computer Hardware and Languages), 89.20.Ff (Computer Science and Technology) ∗ Corresponding author. Address: Vogelpothsweg 87, 44227 Dortmund, Germany. Email: [email protected], phone: (+49) 231 755-7218, fax: -5933 1 Supported by the German Science Foundation (DFG), project TU102/22-1 2 Supported by a Max Planck Center for Visual Computing and Communication fellowship 3 Partially supported by the U.S.
    [Show full text]
  • Candidate Features for Future Opengl 5 / Direct3d 12 Hardware and Beyond 3 May 2014, Christophe Riccio
    Candidate features for future OpenGL 5 / Direct3D 12 hardware and beyond 3 May 2014, Christophe Riccio G-Truc Creation Table of contents TABLE OF CONTENTS 2 INTRODUCTION 4 1. DRAW SUBMISSION 6 1.1. GL_ARB_MULTI_DRAW_INDIRECT 6 1.2. GL_ARB_SHADER_DRAW_PARAMETERS 7 1.3. GL_ARB_INDIRECT_PARAMETERS 8 1.4. A SHADER CODE PATH PER DRAW IN A MULTI DRAW 8 1.5. SHADER INDEXED LOSE STATES 9 1.6. GL_NV_BINDLESS_MULTI_DRAW_INDIRECT 10 1.7. GL_AMD_INTERLEAVED_ELEMENTS 10 2. RESOURCES 11 2.1. GL_ARB_BINDLESS_TEXTURE 11 2.2. GL_NV_SHADER_BUFFER_LOAD AND GL_NV_SHADER_BUFFER_STORE 11 2.3. GL_ARB_SPARSE_TEXTURE 12 2.4. GL_AMD_SPARSE_TEXTURE 12 2.5. GL_AMD_SPARSE_TEXTURE_POOL 13 2.6. SEAMLESS TEXTURE STITCHING 13 2.7. 3D MEMORY LAYOUT FOR SPARSE 3D TEXTURES 13 2.8. SPARSE BUFFER 14 2.9. GL_KHR_TEXTURE_COMPRESSION_ASTC 14 2.10. GL_INTEL_MAP_TEXTURE 14 2.11. GL_ARB_SEAMLESS_CUBEMAP_PER_TEXTURE 15 2.12. DMA ENGINES 15 2.13. UNIFIED MEMORY 16 3. SHADER OPERATIONS 17 3.1. GL_ARB_SHADER_GROUP_VOTE 17 3.2. GL_NV_SHADER_THREAD_GROUP 17 3.3. GL_NV_SHADER_THREAD_SHUFFLE 17 3.4. GL_NV_SHADER_ATOMIC_FLOAT 18 3.5. GL_AMD_SHADER_ATOMIC_COUNTER_OPS 18 3.6. GL_ARB_COMPUTE_VARIABLE_GROUP_SIZE 18 3.7. MULTI COMPUTE DISPATCH 19 3.8. GL_NV_GPU_SHADER5 19 3.9. GL_AMD_GPU_SHADER_INT64 20 3.10. GL_AMD_GCN_SHADER 20 3.11. GL_NV_VERTEX_ATTRIB_INTEGER_64BIT 21 3.12. GL_AMD_ SHADER_TRINARY_MINMAX 21 4. FRAMEBUFFER 22 4.1. GL_AMD_SAMPLE_POSITIONS 22 4.2. GL_EXT_FRAMEBUFFER_MULTISAMPLE_BLIT_SCALED 22 4.3. GL_NV_MULTISAMPLE_COVERAGE AND GL_NV_FRAMEBUFFER_MULTISAMPLE_COVERAGE 22 4.4. GL_AMD_DEPTH_CLAMP_SEPARATE 22 5. BLENDING 23 5.1. GL_NV_TEXTURE_BARRIER 23 5.2. GL_EXT_SHADER_FRAMEBUFFER_FETCH (OPENGL ES) 23 5.3. GL_ARM_SHADER_FRAMEBUFFER_FETCH (OPENGL ES) 23 5.4. GL_ARM_SHADER_FRAMEBUFFER_FETCH_DEPTH_STENCIL (OPENGL ES) 23 5.5. GL_EXT_PIXEL_LOCAL_STORAGE (OPENGL ES) 24 5.6. TILE SHADING 25 5.7. GL_INTEL_FRAGMENT_SHADER_ORDERING 26 5.8. GL_KHR_BLEND_EQUATION_ADVANCED 26 5.9.
    [Show full text]
  • Programmers' Tool Chain
    Reduce the complexity of programming multicore ++ Offload™ for PlayStation®3 | Offload™ for Cell Broadband Engine™ | Offload™ for Embedded | Custom C and C++ Compilers | Custom Shader Language Compiler www.codeplay.com It’s a risk to underestimate the complexity of programming multicore applications Software developers are now presented with a rapidly-growing range of different multi-core processors. The common feature of many of these processors is that they are difficult and error-prone to program with existing tools, give very unpredictable performance, and that incompatible, complex programming models are used. Codeplay develop compilers and programming tools with one primary goal - to make it easy for programmers to achieve big performance boosts with multi-core processors, but without needing bigger, specially-trained, expensive development teams to get there. Introducing Codeplay Based in Edinburgh, Scotland, Codeplay Software Limited was founded by veteran games developer Andrew Richards in 2002 with funding from Jez San (the founder of Argonaut Games and ARC International). Codeplay introduced their first product, VectorC, a highly optimizing compiler for x86 PC and PlayStation®2, in 2003. In 2004 Codeplay further developed their business by offering services to processor developers to provide them with compilers and programming tools for their new and unique architectures, using VectorC’s highly retargetable compiler technology. Realising the need for new multicore tools Codeplay started the development of the company’s latest product, the Offload™ C++ Multicore Programming Platform. In October 2009 Offload™: Community Edition was released as a free-to-use tool for PlayStation®3 programmers. Experience and Expertise Codeplay have developed compilers and software optimization technology since 1999.
    [Show full text]
  • AMD Codexl 1.7 GA Release Notes
    AMD CodeXL 1.7 GA Release Notes Thank you for using CodeXL. We appreciate any feedback you have! Please use the CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on the CodeXL Web Page and the latest CodeXL blog at AMD Developer Central - Blogs This version contains: For 64-bit Windows platforms o CodeXL Standalone application o CodeXL Microsoft® Visual Studio® 2010 extension o CodeXL Microsoft® Visual Studio® 2012 extension o CodeXL Microsoft® Visual Studio® 2013 extension o CodeXL Remote Agent For 64-bit Linux platforms o CodeXL Standalone application o CodeXL Remote Agent Note about installing CodeAnalyst after installing CodeXL for Windows AMD CodeAnalyst has reached End-of-Life status and has been replaced by AMD CodeXL. CodeXL installer will refuse to install on a Windows station where AMD CodeAnalyst is already installed. Nevertheless, if you would like to install CodeAnalyst, do not install it on a Windows station already installed with CodeXL. Uninstall CodeXL first, and then install CodeAnalyst. System Requirements CodeXL contains a host of development features with varying system requirements: GPU Profiling and OpenCL Kernel Debugging o An AMD GPU (Radeon HD 5000 series or newer, desktop or mobile version) or APU is required. o The AMD Catalyst Driver must be installed, release 13.11 or later. Catalyst 14.12 (driver 14.501) is the recommended version. See "Getting the latest Catalyst release" section below. For GPU API-Level Debugging, a working OpenCL/OpenGL configuration is required (AMD or other). CPU Profiling o Time-Based Profiling can be performed on any x86 or AMD64 (x86-64) CPU/APU.
    [Show full text]
  • The Amd Linux Graphics Stack – 2018 Edition Nicolai Hähnle Fosdem 2018
    THE AMD LINUX GRAPHICS STACK – 2018 EDITION NICOLAI HÄHNLE FOSDEM 2018 1FEBRUARY 2018 | CONFIDENTIAL GRAPHICS STACK: KERNEL / USER-SPACE / X SERVER Mesa OpenGL & Multimedia Vulkan Vulkan radv AMDVLK OpenGL X Server radeonsi Pro/ r600 Workstation radeon amdgpu LLVM SCPC libdrm radeon amdgpu FEBRUARY 2018 | AMD LINUX GRAPHICS STACK 2FEBRUARY 2018 | CONFIDENTIAL GRAPHICS STACK: OPEN-SOURCE / CLOSED-SOURCE Mesa OpenGL & Multimedia Vulkan Vulkan radv AMDVLK OpenGL X Server radeonsi Pro/ r600 Workstation radeon amdgpu LLVM SCPC libdrm radeon amdgpu FEBRUARY 2018 | AMD LINUX GRAPHICS STACK 3FEBRUARY 2018 | CONFIDENTIAL GRAPHICS STACK: SUPPORT FOR GCN / PRE-GCN HARDWARE ROUGHLY: GCN = NEW GPUS OF THE LAST 5 YEARS Mesa OpenGL & Multimedia Vulkan Vulkan radv AMDVLK OpenGL X Server radeonsi Pro/ r600 Workstation radeon amdgpu LLVM(*) SCPC libdrm radeon amdgpu (*) LLVM has pre-GCN support only for compute FEBRUARY 2018 | AMD LINUX GRAPHICS STACK 4FEBRUARY 2018 | CONFIDENTIAL GRAPHICS STACK: PHASING OUT “LEGACY” COMPONENTS Mesa OpenGL & Multimedia Vulkan Vulkan radv AMDVLK OpenGL X Server radeonsi Pro/ r600 Workstation radeon amdgpu LLVM SCPC libdrm radeon amdgpu FEBRUARY 2018 | AMD LINUX GRAPHICS STACK 5FEBRUARY 2018 | CONFIDENTIAL MAJOR MILESTONES OF 2017 . Upstreaming the DC display driver . Open-sourcing the AMDVLK Vulkan driver . Unified driver delivery . OpenGL 4.5 conformance in the open-source Mesa driver . Zero-day open-source support for new hardware FEBRUARY 2018 | AMD LINUX GRAPHICS STACK 6FEBRUARY 2018 | CONFIDENTIAL KERNEL: AMDGPU AND RADEON HARDWARE SUPPORT Pre-GCN radeon GCN 1st gen (Southern Islands, SI, gfx6) GCN 2nd gen (Sea Islands, CI(K), gfx7) GCN 3rd gen (Volcanic Islands, VI, gfx8) amdgpu GCN 4th gen (Polaris, RX 4xx, RX 5xx) GCN 5th gen (RX Vega, Ryzen Mobile, gfx9) FEBRUARY 2018 | AMD LINUX GRAPHICS STACK 7FEBRUARY 2018 | CONFIDENTIAL KERNEL: AMDGPU VS.
    [Show full text]
  • Conga-TR4 User's Guide
    COM Express™ conga-TR4 COM Express Type 6 Basic module based on 4th Generation AMD Embedded V- and R-Series SoC User’s Guide Revision 1.10 Copyright © 2018 congatec GmbH TR44m110 1/66 Revision History Revision Date (yyyy.mm.dd) Author Changes 0.1 2018.01.15 BEU • Preliminary release 1.0 2018.10.15 BEU • Updated “Electrostatic Sensitive Device” information on page 3 • Corrected single/dual channel MT/s rates for two variants in table 2 • Updated section 2.2 “Supported Operating Systems” • Added values for four variants in section 2.5 "Power Consumption" • Added values in section 2.6 "Supply Voltage Battery Power" • Updated images in section 4 "Cooling Solutions" • Added note about requiring a re-driver on carrier for USB 3.1 Gen 2 in section 5.1.2 "USB" and 7.4 "USB Host Controller" • Added Intel® Ethernet Controller i211 as assembly option in table 4 "Feature Summary" and section 5.1.4 "Ethernet" • Corrected section 7.4 "USB Host Controller" • Added section 9 "System Resources" 1.1 2019.03.19 BEU • Corrected image in section 2.4 "Supply Voltage Standard Power" • Updated section 10.4 "Supported Flash Devices" 1.2 2019.04.02 BEU • Corrected supported memory in table 2, 3, and added information about supported memory in table 4 • Added information about the new industrial variant in table 3 and 7 1.3 2019.07.30 BEU • Updated note in section 4 "Cooling Solutions" • Changed number of supported USB 3.1 Gen 2 interfaces to two throughout the document • Added note regarding USB 3.1 Gen 2 in section 7.4 "USB Host Controller" 1.4 2020.01.07 BEU
    [Show full text]
  • A Review of Gpuopen Effects
    A REVIEW OF GPUOPEN EFFECTS TAKAHIRO HARADA & JASON LACROIX • An initiative designed to help developers make better content by “opening up” the GPU • Contains a variety of software modules across various GPU needs: • Effects and render features • Tools, SDKs, and libraries • Patches and drivers • Software hosted on GitHub with no “black box” implementations or licensing fees • Website provides: • The latest news and information on all GPUOpen software • Tutorials and samples to help you optimise your game • A central location for up-to-date GPU and CPU documentation • Information about upcoming events and previous presentations AMD Public | Let’s build… 2020 | A Review of GPUOpen Effects | May 15, 2020 | 2 LET’S BUILD A NEW GPUOPEN… • Brand new, modern, dynamic website • Easy to find the information you need quickly • Read the latest news and see what’s popular • Learn new tips and techniques from our engineers • Looks good on mobile platforms too! • New social media presence • @GPUOpen AMD Public | Let’s build… 2020 | A Review of GPUOpen Effects | May 15, 2020 | 3 EFFECTS A look at recently released samples AMD Public | Let’s build… 2020 | A Review of GPUOpen Effects | May 15, 2020 | 4 TRESSFX 4.1 • Self-contained solution for hair simulation • Implementation into Radeon® Cauldron framework • DirectX® 12 and Vulkan® with full source • Optimized physics simulation • Faster velocity shock propagation • Simplified local shape constraints • Reorganization of dispatches • StrandUV support • New LOD system • New and improved Autodesk® Maya®
    [Show full text]
  • Porting Source to Linux
    Porting Source to Linux Valve’s Lessons Learned Overview . Who is this talk for? . Why port? . Windows->Linux . Linux Tools . Direct3D->OpenGL Why port? 100% Why port? Nov Dec Jan Feb . Linux is open 10% . Linux (for gaming) is growing, and quickly 1% . Stepping stone to mobile . Performance 0% . Steam for Linux Linux Mac Windows % December January February Windows 94.79 94.56 94.09 Mac 3.71 3.56 3.07 Linux 0.79 1.12 2.01 Why port? – cont’d . GL exposes functionality by hardware capability—not OS. China tends to have equivalent GPUs, but overwhelmingly still runs XP — OpenGL can allow DX10/DX11 (and beyond) features for all of those users Why port? – cont’d . Specifications are public. GL is owned by committee, membership is available to anyone with interest (and some, but not a lot, of $). GL can be extended quickly, starting with a single vendor. GL is extremely powerful Windows->Linux Windowing issues . Consider SDL! . Handles all cross-platform windowing issues, including on mobile OSes. Tight C implementation—everything you need, nothing you don’t. Used for all Valve ports, and Linux Steam http://www.libsdl.org/ Filesystem issues . Linux filesystems are case-sensitive . Windows is not . Not a big issue for deployment (because everyone ships packs of some sort) . But an issue during development, with loose files . Solution 1: Slam all assets to lower case, including directories, then tolower all file lookups (only adjust below root) . Solution 2: Build file cache, look for similarly named files Other issues . Bad Defines — E.g. Assuming that LINUX meant DEDICATED_SERVER .
    [Show full text]