GPU Programming: Introduction to CUDA


GPU programming: Introduction to CUDA
Dániel Berényi
22 October 2019

Introduction

Dániel Berényi
• E-mail: [email protected]
• Materials related to the course: http://u235axe.web.elte.hu/GPUCourse/

The birth of GPGPUs

What was there before: graphics APIs with vertex / fragment shaders.
• The GPUs had dedicated hardware parts for executing the vertex and the fragment shaders
• Limited amount of data, and limited data types and data structures
• Limited number of instructions in shaders

People nevertheless tried to create simulations and other non-graphical computations with these.

The GeForce 7 series

[Figure: the G71 chip design]

The GeForce 8 series

In 2006 NVIDIA introduced a new card (the GeForce 8800) with more advanced, unified processors and gave generic access to it via the CUDA API. (more reading)

[Figure: the GeForce 8800 unified pipeline architecture — host, data assembler, setup/raster/z-cull, vertex/geometry/pixel thread issue units, streaming processors (SP), texture filter units (TF), L1/L2 caches, and framebuffer (FB) partitions]

CUDA

NVIDIA gave access to the new card via a new API, CUDA: "Compute Unified Device Architecture".
It is not specialized for graphics tasks; it has a general-purpose programming model.
It was based on the C language, extended with specific elements.

CUDA vs OpenCL

Let's compare CUDA with OpenCL!

                    CUDA                                OpenCL
What it is          HW architecture, language,          API and language specification
                    API, SDK, tools, etc.
Type                Proprietary technology              Royalty-free standard
Maintained by       Nvidia                              Khronos, multiple vendors
Target hardware     Nvidia hw (mostly GPUs)             Wide range of devices
                                                        (CPUs, GPUs, FPGAs, DSPs, …)
Form                Single source (host and device      Separate source (host and device
                    code in the same source file)       code in separate sources)
Language            Extension of C / C++                Host only handles API calls and
                                                        can be in any language; device
                                                        code is an extension of C
Compiled by         nvcc                                Host code is compiled by the host
                                                        language's compiler; device code
                                                        is compiled by the vendor runtime
Intermediate        nvcc compiles device code           Conforming implementations
representation      into PTX                            compile to SPIR or SPIR-V
Can compile to      yes                                 yes
intermediate
offline and load
at runtime
Graphics            OpenGL / DirectX / Vulkan           OpenGL / DirectX
interoperability
Initialization      Implicit                            Explicit platform, device,
                                                        context and queue creation
Data management     Explicit device memory              Explicit buffer allocation; some
                    allocation, copy to device,         buffer movement is handled
                    copy back                           implicitly; copy back
Kernel launch       Just invoke it with arguments       Load source, create program,
                                                        build, bind arguments, enqueue
                                                        on a queue
Queue management    Implicit                            Explicit

Terminology:

CUDA                OpenCL
Grid                NDRange
Thread block        Work group
Thread              Work item
Thread ID           Global ID
Block index         Block ID
Thread index        Local ID
Shared memory       Local memory
Registers           Private memory

The CUDA computational grid is specified as:
• the number of threads inside a block,
• the number of blocks inside the grid.

The OpenCL computational grid is specified as:
• the number of threads inside a workgroup,
• the number of threads inside the whole grid (and the workgroup size must evenly divide the grid size!).
A sample code in detail
• Squaring an array of integers

The CUDA kernel

    __global__ void sq(int* dst, int* src)
    {
        int x = src[threadIdx.x];
        dst[threadIdx.x] = x * x;
    }

• The __global__ qualifier signals device function entry points.
• All entry points must return void.
• There are no specific types or qualifiers for buffers; we just expect pointers to arrays.
• threadIdx is a built-in object that stores the current thread's indices in the computational grid.

The CUDA host code

    #include <vector>
    #include <algorithm>
    #include <iostream>

    int main()
    {
        std::vector<int> A{1, 2, 3, 4, 5, 6, 7, 8, 9};
        std::vector<int> B(A.size());
        size_t sz = A.size();

        int* src = nullptr;
        int* dst = nullptr;

        cudaError_t err = cudaSuccess;
        err = cudaMalloc( (void**)&src, sz*sizeof(int) );
        if( err != cudaSuccess ){ … }
        err = cudaMalloc( (void**)&dst, sz*sizeof(int) );
        if( err != cudaSuccess ){ … }

• cudaMalloc allocates memory on the device.
• CUDA API functions return error codes to signal success or failure.

Copying sz*sizeof(int) bytes of data from the host pointer A.data() to the device pointer src:

        err = cudaMemcpy( src, A.data(), sz*sizeof(int), cudaMemcpyHostToDevice );
        if( err != cudaSuccess ){ … }

Launching the kernel:

        dim3 dimGrid( 1, 1 );
        dim3 dimBlock( sz, 1 );

        sq<<<dimGrid, dimBlock>>>(dst, src);

        err = cudaGetLastError();
        if( err != cudaSuccess ){ … }

• dim3 is a built-in type for 3D sizes. Here we create only one block, with sz threads.
• <<< … >>> marks a kernel invocation: the function name (sq), then the computation grid and other options between the angle brackets, then the function arguments in parentheses.
• Errors during the kernel invocation are reported through the cudaGetLastError() function.

Finally, data must be copied back from the device to the host:

        err = cudaMemcpy( B.data(), dst, sz*sizeof(int), cudaMemcpyDeviceToHost );
        if( err != cudaSuccess ){ … }

What is happening in the background?

In CUDA there is an implicitly selected stream (queue) that receives all the issued commands. These are automatically sequenced after each other, and cudaMemcpy commands block until the data from the previous kernel call is available. So in the simplest examples, like this one, we do not need to handle host synchronizations. When we have to, we can use events, callbacks, or the cudaDeviceSynchronize function.
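The sample kernel indexes only with threadIdx.x, so it only works with a single block. A common generalization (a sketch, not part of the original example; the extra n parameter and the sq_grid name are assumptions introduced here) combines the block and thread indices into a global index and guards against threads that fall past the end of the array:

```cuda
// Sketch of a multi-block version of the sq kernel. The `n` parameter
// carries the array length so that out-of-range threads do nothing.
__global__ void sq_grid(int* dst, const int* src, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // global thread index
    if (i < n)
    {
        int x = src[i];
        dst[i] = x * x;
    }
}

// Launched with enough blocks to cover n elements:
//   int blockSize = 256;
//   int numBlocks = (n + blockSize - 1) / blockSize;
//   sq_grid<<<numBlocks, blockSize>>>(dst, src, n);
```

This pattern is what makes the same kernel work for arrays much larger than the maximum block size.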
Compiling CUDA codes

CUDA codes are compiled by the nvcc compiler supplied by Nvidia. It extracts the device-side code (marked by <<< >>> and __global__) from the source, then compiles the host code to binary and the device code to PTX.

Compiling under Windows (note the installation resources here):
• A Visual Studio installation or the Visual Studio Build Tools is needed. See here.
• You need to run the Developer Command Prompt (documentation here and here).
• You need to be aware of which VS version is compatible with which CUDA version.
• Once you've set up the developer prompt, you can invoke nvcc on your CUDA sources.

Sample setup script for initializing the command line on Windows:

    "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat" amd64 -vcvars_ver=14.16

This is the generic setup script for initializing the command prompt for development; the arguments select the architecture (amd64) and the Visual Studio toolset version to build with (here the 2017 toolset, 14.16).

Compiling under Linux/Unix and/or Mac OS:

If you've installed CUDA properly, nvcc should simply be available from the command line, just like gcc. Installation resources: Mac OS, Linux, nvcc, more information on nvcc.

Once your system / command prompt is ready, nvcc acts like gcc:

    nvcc mycode.cu -o myexecutable

Some arguments we will use:
• -O3 to enable optimizations
• -std=c++11 or -std=c++14 to select the C++ standard version
• --expt-extended-lambda for generic lambda functions

The goal of GPU computing

We would like to write code that runs fast! Faster than on the CPU. But the GPU result must be correct too.
It is highly recommended that we:
• measure CPU performance,
• measure GPU performance,
• compare the results to check that they did the same thing (assuming the CPU implementation was correct).

Measuring performance on the CPU

Since C++11 (see on cppreference):

    #include <chrono>

    auto t0 = std::chrono::high_resolution_clock::now();

    // computation to measure

    auto t1 = std::chrono::high_resolution_clock::now();

    auto dt = std::chrono::duration_cast<std::chrono::microseconds>(t1-t0).count();

Available duration types: std::chrono::hours, std::chrono::minutes, std::chrono::seconds, std::chrono::milliseconds, std::chrono::microseconds, std::chrono::nanoseconds.

Measuring performance on the GPU

Simply using the CPU timers to measure GPU performance may give incorrect results, as the GPU operates asynchronously. It would also require CPU-GPU synchronizations that can reduce performance in large programs. Much more reliable timings can be obtained with events that are placed into the stream to record time marks.
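A minimal sketch of such event-based timing (assuming the sq kernel, dimGrid and dimBlock from the earlier example are in scope; error checks omitted for brevity):

```cuda
// Sketch: timing a kernel with CUDA events.
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);              // mark the start in the stream
sq<<<dimGrid, dimBlock>>>(dst, src); // the work being measured
cudaEventRecord(stop);               // mark the end in the stream

cudaEventSynchronize(stop);          // wait until the stop event completes

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop); // elapsed time in milliseconds

cudaEventDestroy(start);
cudaEventDestroy(stop);
```

Because both events are recorded into the same stream as the kernel, the measured interval covers exactly the GPU work between them, without forcing the CPU to synchronize around every command.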