Cooperative CPU-GPU Dynamic Power Management Methodologies for Energy-Efficient Mobile Gaming

Total Page:16

File Type:pdf, Size:1020Kb

Cooperative CPU-GPU Dynamic Power Management Methodologies for Energy-Efficient Mobile Gaming UNIVERSITY OF CALIFORNIA, IRVINE Cooperative CPU-GPU Dynamic Power Management Methodologies for Energy-efficient Mobile Gaming DISSERTATION submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in Computer Science by Jurn-Gyu Park Dissertation Committee: Professor Nikil Dutt, Chair Professor Alex Nicolau Professor Mohammad Al Faruque 2017 Portion of Chapter 2 c 2014 IEEE All other materials c 2017 Jurn-Gyu Park DEDICATION To my God The LORD is my shepherd, I shall not be in want. He makes me lie down in green pastures, he leads me beside quiet waters, he restores my soul. He guides me in paths of righteousness for his name's sake. Even though I walk through the valley of the shadow of death, I will fear no evil, for you are with me; your rod and your staff, they comfort me. You prepare a table before me in the presence of my enemies. You anoint my head with oil; my cup overflows. Surely goodness and love will follow me all the days of my life, and I will dwell in the house of the LORD forever. - Psalms 23 - To my Wife A wife of noble character who can find? She is worth far more than rubies. - Proverbs 31:10 - ii TABLE OF CONTENTS Page LIST OF FIGURES vi LIST OF TABLES viii ACKNOWLEDGMENTS ix CURRICULUM VITAE x ABSTRACT OF THE DISSERTATION xii 1 Introduction 1 1.1 Smartphone Systems . .1 1.2 Motivation and Challenges . .4 1.2.1 Existing Approaches . .5 1.3 Thesis Overview . .6 1.3.1 Graphics Workload Characterization (GWC) . .6 1.3.2 Cooperative Frequency-Capping Governor (Co-Cap) . .7 1.3.3 Hierarchical-FSM based Dynamic Behavior Modeling Governor (HiCAP)8 1.3.4 Machine Learning enhanced Modeling Governor (ML-Gov) . .9 1.4 Thesis Contributions . .9 1.4.1 Design and Implementation of micro-benchmarks . .9 1.4.2 Simple but highly effective CPU-GPU Frequency Capping . 10 1.4.3 Hierarchical FSM-based Dynamic Behavior Modeling . 10 1.4.4 Machine Learning enhanced Simple and Accurate Prediction Models . 11 2 Graphics Workload Characterization for DVFS Design 12 2.1 Introduction . 12 2.1.1 Contributions . 14 2.2 Motivation and Related Work . 15 2.2.1 Motivation . 15 2.2.2 Related Work . 16 2.3 Graphics Workload Characterization . 18 2.3.1 Mobile Graphics Pipeline . 18 2.3.2 Workload Characterization and Micro-benchmarks . 21 2.4 Experimental Setup and Results . 23 iii 2.4.1 Experimental Setup and Methodology . 23 2.4.2 Experimental Results . 25 2.5 Opportunities for DVFS design . 34 2.5.1 GPU DVFS . 35 2.5.2 CPU DVFS . 36 2.5.3 Integrated DVFS . 37 2.6 Summary . 37 3 Cooperative CPU-GPU Frequency Capping 39 3.1 Introduction . 39 3.2 Related Work . 42 3.3 Motivation . 44 3.4 Co-Cap Methodology . 46 3.4.1 Training Phase . 47 3.4.2 Deployment Phase . 57 3.5 Evaluation of Co-Cap . 58 3.5.1 Experimental Setup . 58 3.5.2 Experimental Results . 63 3.5.3 Analysis and Discussion . 66 3.6 Conclusion . 69 4 Hierarchical FSM-based Integrated CPU-GPU Frequency Capping 70 4.1 Introduction . 70 4.2 Related Work . 73 4.3 Approach . 74 4.3.1 Preliminaries . 74 4.3.2 HFSM-based Dynamic Behavior Model . 76 4.3.3 Frequency-Capping . 79 4.4 Experimental Results . 81 4.4.1 Experimental Setup . 81 4.4.2 Automatic Measurement Tool . 82 4.4.3 Results and Analysis . 84 4.5 Conclusion . 88 5 A Machine Learning Enhanced Integrated Governor 89 5.1 Introduction . 89 5.2 Motivation and Related Work . 91 5.2.1 Motivation . 92 5.2.2 Related Work . 94 5.3 ML-Gov Methodology . 96 5.3.1 Learning Phase . 96 5.3.2 Prediction Phase . 104 5.4 Experimental Results . 108 5.4.1 Experimental Setup . 108 5.4.2 Results and Analysis . 111 iv 5.4.3 Discussion . 114 5.5 Conclusion . 115 6 Conclusion and Future Directions 117 6.1 Summary . 117 6.2 Contributions . 118 6.3 Future Directions . 120 Bibliography 121 v LIST OF FIGURES Page 1.1 Trends of Smartphone Applications . .1 1.2 Change of High-performance Mobile Integrated GPUs . .2 1.3 Change of Apple Smartphone Platforms [29] . .2 1.4 Correlation between Performance and Power [7] (The performance metric is Drystone MIPS and power consumption unit is mW) . .3 1.5 Comparison of CPU and GPU power consumption [5]: x-axis: time in second, y-axis: Power (mW) . .3 1.6 State-of-the-art Approaches . .5 1.7 Thesis Overview . .7 2.1 Mobile GPU Power Consumption [5] . 13 2.2 Motivating Example for GPU Workload Characterization. 15 2.3 Abstract Mobile Graphics Pipeline. 18 2.4 Logical Graphics Rendering Pipeline . 20 2.5 Power Measurement and Adreno Profiler Setup . 24 2.6 Results of mb-TexM at Different Frequencies with Workload Variation . 26 2.7 Results of mb-VerM at Different Frequencies with Workload Variation . 29 2.8 Workload Factor result with Enabled CPU DVFS on mb-VerM . 30 2.9 Results of mb-App at Different Frequencies with Workload Variation . 31 2.10 FPS of mb-VerSh and mb-FragSh ........................ 32 2.11 Average Power of mb-VerSh and mb-FragSh .................. 33 2.12 Energy per Frame of mb-VerSh and mb-FragSh ................ 33 3.1 System Comparison . 40 3.2 Mobile Platform Trends. 40 3.3 Different types of CPU/GPU Workload. 41 3.4 Motivating Examples . 45 3.5 Co-Cap Overview. 47 3.6 Co-Cap Training Phase. 47 3.7 A Sample of the Training Set. 49 3.8 Effects of CPU (or GPU) maximum frequency capping on FPS and Power. 49 3.9 The Proposed Methodology for Data Collection and LUT Building. 51 3.10 LUTs after the Building Step . 52 3.11 Fine-grained Refinement Step. 53 3.12 The Methodology of Fine-grained Refinement Step. 53 vi 3.13 Detailed Fine-grained Refinement Step. 54 3.14 Final LUTs after the Refinement Steps (∗frequencies are refined during the refinement step 2) . 56 3.15 The Evaluation of the Fine-grained Refinement Steps . 56 3.16 Co-Cap Deployment Phase. 57 3.17 The Training Sets. 59 3.18 The Deployment Sets. 60 3.19 FPS, Power and EpF Results of Different Types of Graphics Workloads. 62 3.20 The Average Results of the Training Set (High-variation). 64 3.21 The Detailed Results of the Training Set (High-variation). 64 3.22 Average Results of the Deployment Sets . 65 3.23 The results of the Deployment Set (MB). 66 3.24 The results of the Deployment Set (RG). 66 4.1 CPU-GPU Mobile Governors . 71 4.2 HFSMs in Game Design. 71 4.3 Sample Mobile Game (ShootEmDown) . 72 4.4 A set of benchmarks. ..
Recommended publications
  • High End Visualization with Scalable Display System
    HIGH END VISUALIZATION WITH SCALABLE DISPLAY SYSTEM Dinesh M. Sarode*, Bose S.K.*, Dhekne P.S.*, Venkata P.P.K.*, Computer Division, Bhabha Atomic Research Centre, Mumbai, India Abstract display, then the large number of pixels shows the picture Today we can have huge datasets resulting from in greater details and interaction with it enables the computer simulations (CFD, physics, chemistry etc) and greater insight in understanding the data. However, the sensor measurements (medical, seismic and satellite). memory constraints, lack of the rendering power and the There is exponential growth in computational display resolution offered by even the most powerful requirements in scientific research. Modern parallel graphics workstation makes the visualization of this computers and Grid are providing the required magnitude difficult or impossible. computational power for the simulation runs. The rich While the cost-performance ratio for the component visualization is essential in interpreting the large, dynamic based on semiconductor technologies doubling in every data generated from these simulation runs. The 18 months or beyond that for graphics accelerator cards, visualization process maps these datasets onto graphical the display resolution is lagging far behind. The representations and then generates the pixel resolutions of the displays have been increasing at an representation. The large number of pixels shows the annual rate of 5% for the last two decades. The ability to picture in greater details and interaction with it enables scale the components: graphics accelerator and display by the greater insight on the part of user in understanding the combining them is the most cost-effective way to meet data more quickly, picking out small anomalies that could the ever-increasing demands for high resolution.
    [Show full text]
  • The Development and Validation of the Game User Experience Satisfaction Scale (Guess)
    THE DEVELOPMENT AND VALIDATION OF THE GAME USER EXPERIENCE SATISFACTION SCALE (GUESS) A Dissertation by Mikki Hoang Phan Master of Arts, Wichita State University, 2012 Bachelor of Arts, Wichita State University, 2008 Submitted to the Department of Psychology and the faculty of the Graduate School of Wichita State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy May 2015 © Copyright 2015 by Mikki Phan All Rights Reserved THE DEVELOPMENT AND VALIDATION OF THE GAME USER EXPERIENCE SATISFACTION SCALE (GUESS) The following faculty members have examined the final copy of this dissertation for form and content, and recommend that it be accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy with a major in Psychology. _____________________________________ Barbara S. Chaparro, Committee Chair _____________________________________ Joseph Keebler, Committee Member _____________________________________ Jibo He, Committee Member _____________________________________ Darwin Dorr, Committee Member _____________________________________ Jodie Hertzog, Committee Member Accepted for the College of Liberal Arts and Sciences _____________________________________ Ronald Matson, Dean Accepted for the Graduate School _____________________________________ Abu S. Masud, Interim Dean iii DEDICATION To my parents for their love and support, and all that they have sacrificed so that my siblings and I can have a better future iv Video games open worlds. — Jon-Paul Dyson v ACKNOWLEDGEMENTS Althea Gibson once said, “No matter what accomplishments you make, somebody helped you.” Thus, completing this long and winding Ph.D. journey would not have been possible without a village of support and help. While words could not adequately sum up how thankful I am, I would like to start off by thanking my dissertation chair and advisor, Dr.
    [Show full text]
  • Links to the Past User Research Rage 2
    ALL FORMATS LIFTING THE LID ON VIDEO GAMES User Research Links to Game design’s the past best-kept secret? The art of making great Zelda-likes Issue 9 £3 wfmag.cc 09 Rage 2 72000 Playtesting the 16 neon apocalypse 7263 97 Sea Change Rhianna Pratchett rewrites the adventure game in Lost Words Subscribe today 12 weeks for £12* Visit: wfmag.cc/12weeks to order UK Price. 6 issue introductory offer The future of games: subscription-based? ow many subscription services are you upfront, would be devastating for video games. Triple-A shelling out for each month? Spotify and titles still dominate the market in terms of raw sales and Apple Music provide the tunes while we player numbers, so while the largest publishers may H work; perhaps a bit of TV drama on the prosper in a Spotify world, all your favourite indie and lunch break via Now TV or ITV Player; then back home mid-tier developers would no doubt ounder. to watch a movie in the evening, courtesy of etix, MIKE ROSE Put it this way: if Spotify is currently paying artists 1 Amazon Video, Hulu… per 20,000 listens, what sort of terrible deal are game Mike Rose is the The way we consume entertainment has shifted developers working from their bedroom going to get? founder of No More dramatically in the last several years, and it’s becoming Robots, the publishing And before you think to yourself, “This would never increasingly the case that the average person doesn’t label behind titles happen – it already is.
    [Show full text]
  • Why Crowdfund? Motivations for Kickstarter Funding
    Why Crowdfund? Motivations for Kickstarter Funding Author: William LaPorta Advisor: Dr. Michele Naples 13 May 2015 The College Of New Jersey Abstract This paper attempts to answer the question, "What makes entrepreneurs utilize crowdfunding?" To answer this question we analyze the level of Kickstarter funding per capita in each state as a function of various macroeconomic variables. We find that states with higher income, states with higher income inequality, states with a lower concentration of small firms, states with lower unemployment and high social media usage are more likely to have higher levels of Kickstarter funding. These results indicate that Kickstarter and other crowdfunding websites are utilized in more affluent and technologically savvy parts of the country, with high levels of business activity. LaPorta 1 Table of Contents INTRODUCTION ................................................................................................................................................... 2 LITERATURE REVIEW ...................................................................................................................................... 3 HOW KICKSTARTER WORKS ............................................................................................................................................. 3 ECONOMIC LITERATURE ON CROWDFUNDING ........................................................................................................... 4 BASIS OF RESEARCH AND DESIGN .................................................................................................................................
    [Show full text]
  • Order Independent Transparency in Opengl 4.X Christoph Kubisch – [email protected] TRANSPARENT EFFECTS
    Order Independent Transparency In OpenGL 4.x Christoph Kubisch – [email protected] TRANSPARENT EFFECTS . Photorealism: – Glass, transmissive materials – Participating media (smoke...) – Simplification of hair rendering . Scientific Visualization – Reveal obscured objects – Show data in layers 2 THE CHALLENGE . Blending Operator is not commutative . Front to Back . Back to Front – Sorting objects not sufficient – Sorting triangles not sufficient . Very costly, also many state changes . Need to sort „fragments“ 3 RENDERING APPROACHES . OpenGL 4.x allows various one- or two-pass variants . Previous high quality approaches – Stochastic Transparency [Enderton et al.] – Depth Peeling [Everitt] 3 peel layers – Caveat: Multiple scene passes model courtesy of PTC required Peak ~84 layers 4 RECORD & SORT 4 2 3 1 . Render Opaque – Depth-buffer rejects occluded layout (early_fragment_tests) in; fragments 1 2 3 . Render Transparent 4 – Record color + depth uvec2(packUnorm4x8 (color), floatBitsToUint (gl_FragCoord.z) ); . Resolve Transparent 1 2 3 4 – Fullscreen sort & blend per pixel 4 2 3 1 5 RESOLVE . Fullscreen pass uvec2 fragments[K]; // encodes color and depth – Not efficient to globally sort all fragments per pixel n = load (fragments); sort (fragments,n); – Sort K nearest correctly via vec4 color = vec4(0); register array for (i < n) { blend (color, fragments[i]); – Blend fullscreen on top of } framebuffer gl_FragColor = color; 6 TAIL HANDLING . Tail Handling: – Discard Fragments > K – Blend below sorted and hope error is not obvious [Salvi et al.] . Many close low alpha values are problematic . May not be frame- coherent (flicker) if blend is not primitive- ordered K = 4 K = 4 K = 16 Tailblend 7 RECORD TECHNIQUES . Unbounded: – Record all fragments that fit in scratch buffer – Find & Sort K closest later + fast record - slow resolve - out of memory issues 8 HOW TO STORE .
    [Show full text]
  • Best Practice for Mobile
    If this doesn’t look familiar, you’re in the wrong conference 1 This section of the course is about the ways that mobile graphics hardware works, and how to work with this hardware to get the best performance. The focus here is on the Vulkan API because the explicit nature exposes the hardware, but the principles apply to other programming models. There are ways of getting better mobile efficiency by reducing what you’re drawing: running at lower frame rate or resolution, only redrawing what and when you need to, etc.; we’ve addressed those in previous years and you can find some content on the course web site, but this time the focus is on high-performance 3D rendering. Those other techniques still apply, but this talk is about how to keep the graphics moving and assuming you’re already doing what you can to reduce your workload. 2 Mobile graphics processing units are usually fundamentally different from classical desktop designs. * Mobile GPUs mostly (but not exclusively) use tiling, desktop GPUs tend to use immediate-mode renderers. * They use system RAM rather than dedicated local memory and a bus connection to the GPU. * They have multiple cores (but not so much hyperthreading), some of which may be optimised for low power rather than performance. * Vulkan and similar APIs make these features more visible to the developer. Here, we’re mostly using Vulkan for examples – but the architectural behaviour applies to other APIs. 3 Let’s start with the tiled renderer. There are many approaches to tiling, even within a company.
    [Show full text]
  • Bioshock Infinite
    SOUTH AFRICA’S LEADING GAMING, COMPUTER & TECHNOLOGY MAGAZINE VOL 15 ISSUE 10 Reviews Call of Duty: Black Ops II ZombiU Hitman: Absolution PC / PLAYSTATION / XBOX / NINTENDO + MORE The best and wors t of 2012 We give awards to things – not in a traditional way… BioShock Infi nite Loo k! Up in the sky! Editor Michael “RedTide“ James [email protected] Contents Features Assistant editor 24 THE BEST AND WORST OF 2012 Geoff “GeometriX“ Burrows Regulars We like to think we’re totally non-conformist, 8 Ed’s Note maaaaan. Screw the corporations. Maaaaan, etc. So Staff writer 10 Inbox when we do a “Best of [Year X]” list, we like to do it Dane “Barkskin “ Remendes our way. Here are the best, the worst, the weirdest 14 Bytes and, most importantly, the most memorable of all our Contributing editor 41 home_coded gaming experiences in 2012. Here’s to 2013 being an Lauren “Guardi3n “ Das Neves 62 Everything else equally memorable year in gaming! Technical writer Neo “ShockG“ Sibeko Opinion 34 BIOSHOCK INFINITE International correspondent How do you take one of the most infl uential, most Miktar “Miktar” Dracon 14 I, Gamer evocative experiences of this generation and make 16 The Game Stalker it even more so? You take to the skies, of course. Contributors 18 The Indie Investigator Miktar’s played a few hours of Irrational’s BioShock Rodain “Nandrew” Joubert 20 Miktar’s Meanderings Infi nite, and it’s left him breathless – but fi lled with Walt “Shryke” Pretorius 67 Hardwired beautiful, descriptive words. Go read them. Miklós “Mikit0707 “ Szecsei 82 Game Over Pippa
    [Show full text]
  • Power Optimizations for Graphics Processors
    POWER OPTIMIZATIONS FOR GRAPHICS PROCESSORS B.V.N.SILPA DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING INDIAN INSTITUTE OF TECHNOLOGY DELHI March 2011 POWER OPTIMAZTIONS FOR GRAPHICS PROCESSORS by B.V.N.SILPA Department of Computer Science and Engineering Submitted in fulfillment of the requirements of the degree of Doctor of Philosophy to the Indian Institute of Technology Delhi March 2011 Certificate This is to certify that the thesis titled Power optimizations for graphics pro- cessors being submitted by B V N Silpa for the award of Doctor of Philosophy in Computer Science & Engg. is a record of bona fide work carried out by her under my guidance and supervision at the Deptartment of Computer Science & Engineer- ing, Indian Institute of Technology Delhi. The work presented in this thesis has not been submitted elsewhere, either in part or full, for the award of any other degree or diploma. Preeti Ranjan Panda Professor Dept. of Computer Science & Engg. Indian Institute of Technology Delhi Acknowledgment It is with immense gratitude that I acknowledge the support and help of my Professor Preeti Ranjan Panda in guiding me through this thesis. I would like to thank Professors M. Balakrishnan, Anshul Kumar, G.S. Visweswaran and Kolin Paul for their valuable feedback, suggestions and help in all respects. I am indebted to my dear friend G Kr- ishnaiah for being my constant support and an impartial critique. I would like to thank Neeraj Goel, Anant Vishnoi and Aryabartta Sahu for their technical and moral support. I would also like to thank the staff of Philips, FPGA, Intel laboratories and IIT Delhi for their help.
    [Show full text]
  • Godus Beta 204 Cracked 2018 No Survey
    Godus Beta 2.0.4 Cracked 2018 No Survey Godus Beta 2.0.4 Cracked 2018 No Survey 1 / 2 Days Under Custody v1.5 DayZ v0.47 (I will not reupload/update for this game, because it need cracked server, but i dont make server, sorry) D.C. II ~Da Capo II. Let the god games begin: 22cans' Godus beta available on Steam Early ... Your browser does not currently recognize any of the video formats .... Not quite, says Tim McDonald, after a dozen hours with the beta. ... On the one hand: oh dear, it's Peter Molyneux and his trail of broken dreams .... 校园恋物语|Love in School · 樱之杜†净梦者 1 第一部 Sakura no Mori ... Games · Adventure Time: Pirates of the Enchiridion (Update 24/10/2018) .... Submission date: Thu Mar 29 02:43:09 2018. Server IP address: 104.27.166.47. Country ... Blacklisted: No. SSL Certificate details: Available via API only.. banished v1.0.4 beta build 140825 cracked-3dm snow light multi2- ... shadow warrior flt godus 7 days to die 1.1a update nihilumbra legends of aethereus ... latest vlc media player 2.0.4 version free download ride to hell pc ... update trainer (kitchen scramble v.50 - free download, no password and no survey ). Persia.The.Forgotten.Sands.BETA.Crack-CHCK-[tracker.BT no survey no password no download ... Godus beta 2.0.4 Cracked with lucky patcher. Just De Cheat .... The beta for Peter Molyneux's god- sim Godus is now available on Steam. The game, which is now at beta version 2.0, includes a host of all new ..
    [Show full text]
  • Opengl FAQ and Troubleshooting Guide
    OpenGL FAQ and Troubleshooting Guide Table of Contents OpenGL FAQ and Troubleshooting Guide v1.2001.11.01..............................................................................1 1 About the FAQ...............................................................................................................................................13 2 Getting Started ............................................................................................................................................18 3 GLUT..............................................................................................................................................................33 4 GLU.................................................................................................................................................................37 5 Microsoft Windows Specifics........................................................................................................................40 6 Windows, Buffers, and Rendering Contexts...............................................................................................48 7 Interacting with the Window System, Operating System, and Input Devices........................................49 8 Using Viewing and Camera Transforms, and gluLookAt().......................................................................51 9 Transformations.............................................................................................................................................55 10 Clipping, Culling,
    [Show full text]
  • Ray Tracing on Programmable Graphics Hardware
    Ray Tracing on Programmable Graphics Hardware Timothy J. Purcell Ian Buck William R. Mark ∗ Pat Hanrahan Stanford University † Abstract In this paper, we describe an alternative approach to real-time ray tracing that has the potential to out perform CPU-based algorithms Recently a breakthrough has occurred in graphics hardware: fixed without requiring fundamentally new hardware: using commodity function pipelines have been replaced with programmable vertex programmable graphics hardware to implement ray tracing. Graph- and fragment processors. In the near future, the graphics pipeline ics hardware has recently evolved from a fixed-function graph- is likely to evolve into a general programmable stream processor ics pipeline optimized for rendering texture-mapped triangles to a capable of more than simply feed-forward triangle rendering. graphics pipeline with programmable vertex and fragment stages. In this paper, we evaluate these trends in programmability of In the near-term (next year or two) the graphics processor (GPU) the graphics pipeline and explain how ray tracing can be mapped fragment program stage will likely be generalized to include float- to graphics hardware. Using our simulator, we analyze the per- ing point computation and a complete, orthogonal instruction set. formance of a ray casting implementation on next generation pro- These capabilities are being demanded by programmers using the grammable graphics hardware. In addition, we compare the perfor- current hardware. As we will show, these capabilities are also suf- mance difference between non-branching programmable hardware ficient for us to write a complete ray tracer for this hardware. As using a multipass implementation and an architecture that supports the programmable stages become more general, the hardware can branching.
    [Show full text]
  • Interactive Computer Graphics Stanford CS248, Winter 2021 You Are Almost Done!
    Lecture 18: Parallelizing and Optimizing Rasterization Interactive Computer Graphics Stanford CS248, Winter 2021 You are almost done! ▪ Wed night deadline: - Exam redo (no late days) ▪ Thursday night deadline: - Final project writeup - Final project video - It should demonstrate that your algorithms “work” - But hopefully you can get creative and have some fun with it! ▪ Friday: - Relax and party Stanford CS248, Winter 2021 Cyberpunk 2077 Stanford CS248, Winter 2021 Ghost of Tsushima Stanford CS248, Winter 2021 Forza Motorsport 7 Stanford CS248, Winter 2021 NVIDIA V100 GPU: 80 streaming multiprocessors (SMs) L2 Cache (6 MB) 900 GB/sec (4096 bit interface) GPU memory (HBM) (16 GB) Stanford CS248, Winter 2021 V100 GPU parallelism 1.245 GHz clock 1.245 GHz clock 80 SM processor cores per chip 80 SM processor cores per chip 64 parallel multiple-add units per SM 64 parallel multiple-add units per SM 80 x 64 = 5,120 fp32 mul-add ALUs 80 x 64 = 5,120 fp32 mul-add ALUs = 12.7 TFLOPs * = 12.7 TFLOPs * Up to 163,840 fragments being processed at a time on the chip! Up to 163,840 fragments being processed at a time on the chip! L2 Cache (6 MB) 900 GB/sec GPU memory (16 GB) * mul-add counted as 2 !ops: Stanford CS248, Winter 2021 Hardware units RTX 3090 GPU for rasterization Stanford CS248, Winter 2021 RTX 3090 GPU Hardware units for texture mapping Stanford CS248, Winter 2021 RTX 3090 GPU Hardware units for ray tracing Stanford CS248, Winter 2021 For the rest of the lecture, I’m going to focus on mapping rasterization workloads to modern mobile GPUs Stanford CS248, Winter 2021 all Q.
    [Show full text]