Introduction to Modern GPU Hardware


The following content is extracted from the material in the references on the last page. If any citation is wrong or any reference is missing, please contact [email protected]. I will correct the error as soon as possible. This material is for use in this course only; please do NOT redistribute it. Thank you.

Lan-Da Van (范倫達), Ph.D.
Department of Computer Science
National Chiao Tung University
Hsinchu, Taiwan
Fall, 2016

Outline
• GPU Pipeline
• History of GPU Hardware
• GPU Hardware Consideration
• Modern GPU Hardware Architecture
  – NVIDIA GeForce
  – AMD (ATI) Radeon
  – IMG PowerVR
  – ARM Mali
• GPU Applications
• Summary

GPU Fundamentals: Graphics Pipeline
[Figure: simplified graphics pipeline. Application (CPU) → Transform & Light → Assemble Primitives → Rasterize → Shade → Video Memory (Textures), with all stages after the application on the GPU. Data between stages: vertices (3D); transformed, lit vertices (2D); screen-space triangles (2D); fragments (pre-pixels); final pixels (color, depth). Graphics state feeds the stages, and a render-to-texture path returns output to texture memory.]
• A simplified graphics pipeline
  – Note that pipe widths vary
  – Many caches, FIFOs, and so on are not shown

GPU Fundamentals: Modern Graphics Pipeline
[Figure: the same pipeline with Transform & Light replaced by a vertex processor and Shade replaced by a fragment processor.]
• Programmable vertex processor!
• Programmable pixel processor!
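The per-vertex work of the pipeline above can be sketched in software. The following is a minimal Python model, not how a GPU implements it: the function names (`transform_vertex`, `to_screen`), the 640×480 viewport, and the sample translation matrix are all assumptions chosen for illustration. It shows a 4-tuple position being multiplied by a 4×4 matrix (the "transform" step) and then mapped to 2D screen coordinates.

```python
# Illustrative software model of per-vertex work; not a GPU implementation.
# Function names, the viewport size, and the sample matrix are assumptions.

def transform_vertex(matrix, vertex):
    """Multiply a 4x4 row-major matrix by a 4-tuple (x, y, z, w)."""
    return tuple(
        sum(matrix[row][col] * vertex[col] for col in range(4))
        for row in range(4)
    )

def to_screen(clip, width, height):
    """Perspective divide, then map normalized [-1, 1] coordinates to pixels."""
    x, y, _z, w = clip
    ndc_x, ndc_y = x / w, y / w
    # Screen y grows downward, so the y axis is flipped.
    return ((ndc_x + 1.0) * 0.5 * width, (1.0 - ndc_y) * 0.5 * height)

# Translation by (1, 0, 0): about the simplest non-trivial transform.
TRANSLATE_X = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# One triangle: three 3D vertices in homogeneous coordinates.
triangle = [(-1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0), (0.0, -1.0, 0.0, 1.0)]

clip_space = [transform_vertex(TRANSLATE_X, v) for v in triangle]
screen_space = [to_screen(v, 640, 480) for v in clip_space]

for before, after in zip(triangle, screen_space):
    print(before, "->", after)
```

Each vertex is processed independently of the others, which is the property that lets hardware run many vertices in parallel; a programmable vertex processor replaces the fixed matrix multiply with user-supplied code.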
GPU Fundamentals: Modern Graphics Pipeline
[Figure: Application (CPU) → Vertex Processor → Geometry Processor (Assemble Primitives) → Rasterize → Fragment Processor → Video Memory (Textures), with graphics state and a render-to-texture path. Data between stages: vertices (3D); transformed, lit vertices (2D); screen-space triangles (2D); fragments (pre-pixels); final pixels (color, depth).]
• Programmable primitive assembly!
• More flexible memory access!

History of Graphics Hardware (1/3)
• … to mid ’90s
  – SGI mainframes and workstations
  – PC: only 2D graphics hardware
• mid ’90s
  – Consumer 3D graphics hardware (PC): 3dfx, NVIDIA, Matrox, ATI, …
  – Triangle rasterization (only)
  – Cheap: pushed by the game industry
  [Figure: 3dfx Voodoo Graphics, 4 MB, 1997]
• 1999
  – PC card with TnL (Transform and Lighting)
  – NVIDIA GeForce: Graphics Processing Unit (GPU)
  – PC card more powerful than specialized workstations

History of Graphics Hardware (2/3)
https://www.zhihu.com/question/21980949

History of Graphics Hardware (3/3)
• Modern graphics hardware
  – Graphics pipeline partly programmable
  – Leaders: AMD (ATI) and NVIDIA, e.g. “AMD Radeon HD 6990” and “NVIDIA GeForce GTX 590”
  – Game consoles are similar to GPUs (Xbox)

Computational Power (1/2)
• GPUs are fast…
  – 3.0 GHz Intel Core2 Duo (Woodcrest Xeon 5160):
    • Computation: 48 GFLOPS peak
    • Memory bandwidth: 21 GB/s peak
    • Price: $874 (chip)
  – NVIDIA GeForce 8800 GTX:
    • Computation: 330 GFLOPS observed
    • Memory bandwidth: 55.2 GB/s observed
    • Price: $599 (board)
• GPUs are getting faster, faster
  – CPUs: 1.4× annual growth
  – GPUs: 1.7× (pixels) to 2.3× (vertices) annual growth

Computational Power (2/2)
[Figure: GPU vs. CPU comparison. Courtesy Naga Govindaraju.]

Flops Comparison on GPU and CPU

Memory Bandwidths Comparison of CPU and GPU

Motivation
• Why are GPUs getting faster so fast?
  – Arithmetic intensity
    • The specialized nature of GPUs makes it easier to use additional transistors for computation
  – Economics
    • The multi-billion dollar video game market is a pressure cooker that drives innovation to exploit this property

Flexible and Precise
• Modern GPUs are deeply programmable
  – Programmable pixel, vertex, and geometry engines
  – Solid high-level language support
• Modern GPUs support “real” precision
  – 32-bit/64-bit floating point throughout the pipeline
    • High enough for many applications
  – DX10-class GPUs add 32-bit integers

Graphics Hardware Consideration (1/2)
• GPU = Graphics Processing Unit
  – Vector processor
  – Operates on 4-tuples
    • Position (x, y, z, w)
    • Color (red, green, blue, alpha)
    • Texture coordinates (s, t, r, q)
  – One 4-tuple operation per clock cycle
    • SIMD (Single Instruction, Multiple Data)
  – ADD, MUL, SUB, DIV, MADD, …

Graphics Hardware Consideration (2/2)
• Pipelining
  – Number of stages
• Parallelism
  – Number of parallel processes
• Parallelism + pipelining
  – Number of parallel pipelines
[Figure: numbered boxes (1, 2, 3) illustrating a single pipeline, parallel units, and several parallel pipelines.]

Growth of NVIDIA GPU
• Performance metrics
  – Since 2000, the amount of horsepower applied to processing 3D vertices and fragments has been growing at a remarkable rate.
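To get a feel for what the annual growth factors quoted on the Computational Power slide (1.4× for CPUs, 1.7× to 2.3× for GPUs) imply, the short Python sketch below compounds them over several years. The five-year horizon and the helper name `compound` are assumptions made only for this illustration, and constant annual growth is a simplification.

```python
# Illustrative arithmetic only. The growth factors come from the slides;
# the 5-year horizon and the assumption of constant growth are ours.

CPU_GROWTH = 1.4           # annual growth factor quoted for CPUs
GPU_GROWTH_PIXELS = 1.7    # annual growth factor quoted for GPU pixel rate
GPU_GROWTH_VERTICES = 2.3  # annual growth factor quoted for GPU vertex rate

def compound(factor, years):
    """Total speedup after `years` of constant annual growth by `factor`."""
    return factor ** years

YEARS = 5
rates = [
    ("CPU", CPU_GROWTH),
    ("GPU (pixels)", GPU_GROWTH_PIXELS),
    ("GPU (vertices)", GPU_GROWTH_VERTICES),
]
for label, factor in rates:
    print(f"{label}: about {compound(factor, YEARS):.1f}x after {YEARS} years")
```

Under these assumptions the CPU figure compounds to roughly 5×, while the GPU figures compound to roughly 14× and 64×, which is why a modest-looking difference in annual rate produces a large gap within a few hardware generations.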
Growth of NVIDIA GPU
[Figure: NVIDIA GeForce 7900 GTX]

NVIDIA Graphics Card Architecture
• GeForce 8 Series
  – 12,288 concurrent threads, hardware managed
  – 128 thread processor cores at 1.35 GHz = 518 GFLOPS peak
[Figure: block diagram. Host CPU and work distribution at the top; 16 IU blocks, 16 SP blocks, and 16 shared-memory blocks; 8 TF units; 8 TEX L1 caches; 6 L2 blocks; 6 memory blocks.]

NVIDIA FERMI

FERMI: Streaming Multiprocessor (SM)
• Each SM contains
  – 32 cores
  – 16 load/store units
  – 32,768 registers
• Newer FP representation
  – IEEE 754-2008
• Two units
  – Floating point
  – Integer

FERMI: Results

FERMI: Comparison

Kepler: Core Architecture
http://www.weistang.com/article-941-1.html

Titan vs Tesla Comparison

Maxwell: Core Architecture
http://www.weistang.com/article-941-1.html
http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B

Kepler vs Maxwell Comparison
http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B

https://zh.wikipedia.org/wiki/CUDA

NVIDIA ULP-GeForce (Tegra2)
• Ultra low power (ULP) GeForce GPU with 4 pixel shaders + 4 vertex shaders
• 32-bit single-channel memory controller with either LPDDR2-600 or DDR2-667 memory

NVIDIA ULP-GeForce (Tegra3)
• The GPU in Tegra 3 is an evolution of the Tegra 2 GPU, with twice the number of pixel shader units (8 compared to 4) and a higher clock frequency.
• 32-bit single-channel memory controller with either LPDDR2 or DDR3 memory

Tegra Roadmap

Mobile Roadmap
http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2

ATI Radeon X1900 XTX
• Features of ATI Radeon X1900 XTX
  – Core speed 650 MHz
  – 48 pixel shader processors
  – 8 vertex shader processors
  – 51 GB/s memory bandwidth
  – 512 MB memory
http://product.pcpop.com/000024721/Index.html

ATI Radeon X1900 XTX
• High memory bandwidth
[Figure: block diagram of a graphics card (GPU at 650 MHz running parallel processes, ½ GB graphics memory on a 51 GB/s high-bandwidth link, output) beside the host (processor chip with a 3 GHz CPU and ½ MB cache, 1 GB main memory, AGP bus, ½ GB AGP memory). Other bandwidth labels in the figure: 2 GB/s, 3 GB/s, 77 GB/s.]

ATI Radeon 9700
• Parallelism + pipelining: ATI Radeon 9700
  – 4 vertex pipelines
  – 8 pixel pipelines

Radeon Comparison
http://www.pcdiy.com.tw/detail/4275

IMG PowerVR Series5XT (SGXMP)
• Shader-driven Tile-Based Deferred Rendering (TBDR) architecture
• Fully programmable GPU using the unique USSE architecture
• All SGX cores support OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0, and DirectX 9/10.1

IMG PowerVR Series6 (Rogue)
• Supports OpenGL ES 3.0, OpenGL ES 2.0, OpenGL 3.x/4.x, OpenCL 1.x, and DirectX 10, with certain family members extending their capabilities to full WHQL-compliant DirectX 11.1 functionality

IMG PowerVR 7XT Plus
http://imgtec.eetrend.com/article/713045

IMG PowerVR 7XT Plus
http://imgtec.eetrend.com/article/713046

Features of ARM Mali

ARM Mali-200

ARM Mali-300

ARM Mali-400MP

ARM Mali-450MP

ARM Mali-T604
• GPGPU (supports OpenCL 1.1)
• Tri-pipe architecture
• The first GPU based on the Midgard architecture
• True IEEE double-precision floating-point math in hardware for Full Profile
• The Job Manager within Mali-T600 Series GPUs offloads task management from the CPU to the GPU
• 5x performance improvement over previous Mali graphics processors

ARM Mali-T624

ARM Mali-T678
• 50% performance improvement compared to the Mali-T658

ARM Mali-T760

ARM Mali-T880

ARM Mali Comparison
https://zh.wikipedia.org/wiki/Mali_(GPU)

Applications (1/7)
• Includes lots of applications
  – Ray tracer
  – Image segmentation
  – FFT/linear algebra
http://f.fwallpapers.com/images/3d-bunny.jpg
http://graphics.stanford.edu/data/3Dscanrep/stanford-bunny-cebal-ssh.jpg

Applications (2/7)
http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2

Applications (3/7)
http://5pit.tw/tech/computer/tid_12880

Applications (4/7)
http://wechatinchina.com/thread-461154-1-1.html

Applications (5/7)
https://read01.com/Pnd3D.html

Applications (6/7)
• AR and VR applications
http://wechatinchina.com/thread-461154-1-1.html

Applications (7/7)
http://www.naipo.com/Portals/1/web_tw/Knowledge_Center/Industry_Economy/publish-482.htm

Summary
• Understand the GPU pipeline in depth
• Understand the motivation of GPU hardware
• Understand modern GPU hardware architecture and specifications
• Understand GPU/GPGPU applications

Reference
• GPU Architecture & CG, Mark Colbert, 2006
• Introduction to Graphics Hardware and GPUs, Yannick Francken, Tom Mertens
• GPU Tutorial, Yiyunjin, 2007
• Evolution of GPU and Graphics Pipelining, Weijun Xiao
• Commercial product websites (NVIDIA, ATI, IMG, ARM)