Programming of Graphics

Peter Mileff PhD
University of Miskolc, Department of Information Technology
GPU overview, Graphics and Game Engines

Overview of the GPU

GPU Overview
⦿ The Graphics Processing Unit (GPU) is the central unit of the graphics card
⦿ Its objectives:
  ● Performing complex graphical operations
  ● Directly accelerating visualization
  ● Offloading the CPU:
    ○ it takes over high-level visualization tasks from the CPU
    ○ so the CPU is free to do other work
⦿ Why GPUs spread:
  ● Hardware manufacturers quickly recognized the business opportunities in:
    ○ Multimedia applications (e.g. Photoshop)
    ○ Engineering systems (e.g. CAD systems)
    ○ Games

First Achievements
⦿ In 1996, 3dfx released the Voodoo I
⦿ Voodoo I characteristics:
  ● One of the first consumer 3D accelerator cards (4 MB RAM, 50 MHz)
  ● Huge success
  ● Supported 3D visualization only
    ○ It required an additional 2D video card
⦿ The idea:
  ● 2D operations are performed by a fast 2D video card
    ○ e.g. the popular Matrox video cards
  ● 3D transformations are performed by the Voodoo card
    ○ its hardware performed these calculations faster than software rendering could

Other important events
⦿ In the same period, NVIDIA and ATI started their own GPU series:
  ● NVIDIA: NV1, RIVA 128, GeForce 256
  ● ATI: 3D Rage, Rage Pro, Rage 128
⦿ These video cards quickly became very popular
⦿ The reasons:
  ● Reasonable price
  ● The cards could be bought in almost any computer shop
  ● Games and operating systems (mainly Windows) supported them

[Figure: today's main GPU brands]

Architecture of the GPU

CPU vs GPU
⦿ GPU architecture has been very different from CPU architecture from the very beginning
⦿ Reason 1:
  ● GPUs are designed for a specific purpose: typically to speed up graphical calculations
  ● Graphical calculations have different requirements from general CPU workloads
  ● The CPU is a general-purpose processor
⦿ Reason 2:
  ● Graphical calculations and rasterization can be heavily parallelized
⦿ GPU development has moved in this direction

CPU vs GPU
⦿ CPU: implements a single-threaded computing architecture
  ● it runs multiple processes on a single-threaded pipeline
  ● application data is reached through one memory interface
⦿ GPU: the architecture follows the stream processing model
  ● A much more efficient approach to processing large amounts of data
  ● A GPU can contain thousands of stream processors
  ● It avoids the conflicts and waiting found on the CPU
    ○ Stream processors form a pipeline

[Figure: CPU vs GPU]
[Figure: GeForce 8800]
[Figure: GeForce GTX 280]

CPU vs GPU
⦿ CPU: spends a lot of resources on
  ● program control,
  ● switching between instructions and tasks
⦿ GPU: entirely unsuited to this
  ● It contains a large number of arithmetic logic units (ALUs),
    ○ so it can compute an order of magnitude faster
  ● Limitation:
    ○ every processing unit must run the same instruction: data parallelism!
⦿ The CPU also supports data parallelism!
  ● with extended instruction sets (e.g. SSE, SSE2, SSE3, SSE4, AVX, AltiVec, etc.),
  ● with multicore CPUs

The problem of data transfer
⦿ The GPU and the CPU are physically separate
  ● they are connected through the system bus
⦿ The data transfer problem appeared early!
  ● Transferring data from main memory to GPU memory is time-consuming

The problem of data transfer
⦿ For this reason, numerous bus types were developed
  ● Former standards: ISA, MCA, VLB, PCI
  ● In 1997, the AGP (Accelerated Graphics Port) standard was developed
    ○ Very fast data transfer between CPU and GPU
    ○ The AGP standard is still present today
⦿ Today, the dominant solution is the PCI Express (PCIe) standard
  ● a high-speed serial computer expansion bus standard
  ● Throughput per lane:

    PCIe 1.0    PCIe 2.0    PCIe 3.0      PCIe 4.0
    250 MB/s    500 MB/s    984.6 MB/s    1969.2 MB/s

Tendency of evolution
⦿ GPU evolution far outpaces CPU development
⦿ Moore's law (1965):
  ● the observation that the number of transistors in a dense integrated circuit doubles approximately every two years
⦿ Today:
  ● CPU: doubling period of about 18 months
  ● GPU: doubling period down to about 6 months

Tendency of evolution
⦿ Example:
  ● ATI Radeon HD 3800 GPU family:
    ○ 320 stream processors
    ○ 666 million transistors
    ○ Performance > 1 teraFLOPS
  ● Intel Core 2 Quad CPU:
    ○ 582 million transistors
    ○ Performance ~ 9.8 gigaFLOPS

[Figures: Tendency]

Programming the GPU

Programming APIs
⦿ In parallel with the development of video cards, numerous low-level application programming interfaces (APIs) were developed
  ● Under strong influence from hardware vendors
⦿ First well-known API: Glide
  ● Developed by 3dfx for its own Voodoo cards
  ● OpenGL-like interface
  ● Targeted games in terms of performance and functionality
  ● Dominant in the game industry in the second half of the 1990s
  ● In 2000, NVIDIA acquired 3dfx

Direct3D vs OpenGL
⦿ Direct3D
  ● Part of Microsoft's DirectX family of APIs
  ● Available only on Windows platforms
    ○ Desktops, Xbox, Windows Phone
  ● The most popular graphics API among game developers
⦿ Reasons for its popularity:
  ● Its development closely follows the evolution of graphics hardware
  ● It also provides built-in higher-level solutions:
    ○ Optimized math utilities, e.g. matrices, vectors, collision detection, etc.
    ○ Its own 3D model format with bone-based animation, called X
  ● Additional higher-level APIs: DirectDraw, DirectInput, DirectSound, etc.

Direct3D vs OpenGL
⦿ OpenGL (Open Graphics Library):
  ● A specification standard for platform-independent 2D and 3D visualization
  ● Introduced in 1992 by Silicon Graphics Inc.
  ● The ARB (Architecture Review Board) consortium was responsible for its development
    ○ Its members were the major software and hardware manufacturers (ATI, NVIDIA, Intel, Microsoft, etc.)
  ● In 2006 the Khronos Group consortium took over its development
    ○ https://www.khronos.org/
  ● Slower development: evolving the specification is a slow process, which significantly hinders developers of graphics-intensive applications

Direct3D vs OpenGL
⦿ Real competitors in the field of game development
⦿ Both APIs have their own advantages and drawbacks
  ● The differences are mainly structural; the two APIs are almost identical in functionality
⦿ Advantages of OpenGL (the future):
  ● Platform independence: OpenGL can run on almost all devices
  ● OpenGL can also be used on embedded systems and mobile devices
    ○ This version is called OpenGL ES
  ● Popular operating systems use OpenGL
    ○ iOS - OpenGL ES
    ○ Linux, Unix, BSD - OpenGL
    ○ PlayStation - OpenGL
    ○ AmigaOS, MorphOS, Haiku OS
    ○ etc.

Game and Graphics Engines

Game engines
⦿ Objective 1: to provide a toolkit for the development team (developers, designers, testers)
  ● E.g. editors, runtime environment, networking, audio
⦿ Efficient, convenient and fast game development becomes possible
⦿ The engine is a layer between the operating system and the game logic
⦿ It simplifies routine programming tasks:
  ● Otherwise these would have to be implemented again for every game
  ● E.g. creating a window, playing audio and video, loading assets, collision detection, etc.
⦿ Objective 2: delivering appropriate technical quality
  ● in terms of graphics quality and performance

Structure of a Game Engine
⦿ Game development requires complex IT knowledge!
  ● The game engine supports this process, so its functionality is necessarily complex as well
The engine's functions are organized into well-defined subsystems:
⦿ Core subsystem: core functions; controls the modules and other subsystems. Provides platform independence and forwards events to other parts of the engine.
⦿ Graphics subsystem: responsible for visualization. It is typically built upon an API (OpenGL, DirectX)
  ● Displays models, lights, effects, post-processing, particle systems, etc.

Structure of a Game Engine
⦿ Audio and Music subsystem: plays audio effects and music
⦿ Artificial intelligence subsystem
⦿ Network subsystem: support for network connections and data transfer
⦿ Input and Event subsystem: handles input devices and event management
⦿ Scripting subsystem: supports script-based development
⦿ Resource subsystem: functions for accessing resources
⦿ Physics subsystem: makes physics-based simulation possible (e.g. racing games)
⦿ Other subsystems: math calculations, video playback, etc.

Structure of a Game Engine
⦿ Each subsystem should be a replaceable unit
⦿ Sometimes a subsystem is not developed in-house
  ● A company may decide to buy an existing, well-functioning technology
  ● This makes sense if developing a new subsystem would cost more than licensing an existing one
    ○ A typical example is integrating a physics subsystem
⦿ Today's major game engines show this modularity
  ● Main components are written in a low-level language (e.g. C/C++)
    ○ for performance
  ● Game logic is written in a higher-level language
    ○ fewer errors
    ○ cheaper developers

Today's major Engines
⦿ Thanks to advances in technology, graphics and game engines can deliver spectacular visuals
⦿ Games are becoming increasingly complex
  ● They contain ever more cinematic sequences and functionality
⦿ A modern game engine can be very expensive
  ● In return, developers receive many years of experience in the form of implemented algorithms

Today's major Engines
⦿ Unreal Engine 4 - Epic Games
  ● The engine is free, but a 5% royalty must be paid after the first $3,000 of revenue per product per quarter
⦿ id Tech 5 - id Software
⦿ Frostbite 3 - EA Digital Illusions CE
⦿ CryEngine 3 - Crytek
⦿ Source Engine - Valve
⦿ Unity Engine - Unity
⦿ ShiVa 3D - Stonetrip
⦿ C4 Engine - Terathon
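
The point from the "Structure of a Game Engine" slides that a subsystem should be a replaceable unit can be sketched in code. The following is a minimal Python illustration, not taken from the slides: the names `PhysicsSubsystem`, `InHousePhysics`, `vendor_integrate`, `VendorPhysicsAdapter` and `EngineCore` are all hypothetical, and the "licensed" routine is only a stand-in for a real third-party physics library. The idea it tries to show is that the core depends on an interface, so an in-house module and a licensed one can be swapped without touching the core.

```python
from abc import ABC, abstractmethod


class PhysicsSubsystem(ABC):
    """Hypothetical interface: all the engine core knows about physics."""

    @abstractmethod
    def step(self, positions, velocities, dt):
        """Return the positions after advancing the simulation by dt seconds."""


class InHousePhysics(PhysicsSubsystem):
    """Stand-in for a physics module written by the engine team."""

    def step(self, positions, velocities, dt):
        # Explicit Euler integration: p' = p + v * dt
        return [p + v * dt for p, v in zip(positions, velocities)]


def vendor_integrate(bodies, dt):
    """Stand-in for a licensed third-party routine with its own data layout.

    It expects (position, velocity) pairs rather than two separate lists.
    """
    return [(p + v * dt, v) for p, v in bodies]


class VendorPhysicsAdapter(PhysicsSubsystem):
    """Wraps the vendor routine so that it fits the engine's interface."""

    def step(self, positions, velocities, dt):
        bodies = list(zip(positions, velocities))
        return [p for p, _ in vendor_integrate(bodies, dt)]


class EngineCore:
    """Core subsystem: drives whichever physics implementation it was given."""

    def __init__(self, physics):
        self.physics = physics

    def update(self, positions, velocities, dt):
        return self.physics.step(positions, velocities, dt)


if __name__ == "__main__":
    # Swap the physics subsystem without changing EngineCore.
    for physics in (InHousePhysics(), VendorPhysicsAdapter()):
        core = EngineCore(physics)
        print(type(physics).__name__, core.update([0.0, 1.0], [2.0, -1.0], 0.5))
```

In a real engine the interface would more likely be a C++ abstract class and the adapter would wrap a licensed library's actual API; the structure, an interface owned by the engine plus one adapter per implementation, would stay the same.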