
NVIDIA CUDA Compute Unified Device Architecture
Programming Guide
Version 0.8.2
4/24/2007

Table of Contents

Chapter 1. Introduction to CUDA
    1.1 The Graphics Processor Unit as a Data-Parallel Computing Device
    1.2 CUDA: A New Architecture for Computing on the GPU
    1.3 Document’s Structure

Chapter 2. Programming Model
    2.1 A Highly Multithreaded Coprocessor
    2.2 Thread Batching
        2.2.1 Thread Block
        2.2.2 Grid of Thread Blocks
    2.3 Memory Model

Chapter 3. Hardware Implementation
    3.1 A Set of SIMD Multiprocessors with On-Chip Shared Memory
    3.2 Execution Model

Chapter 4. Application Programming Interface
    4.1 An Extension to the C Programming Language
    4.2 Language Extensions
        4.2.1 Function Type Qualifiers
        4.2.2 Variable Type Qualifiers
        4.2.3 Execution Configuration
        4.2.4 Built-in Variables
        4.2.5 Compilation with NVCC
    4.3 Common Runtime Component
        4.3.1 Built-in Vector Types
        4.3.2 Mathematical Functions
        4.3.3 Time Function
        4.3.4 Texture Type
    4.4 Device Runtime Component
        4.4.1 Mathematical Functions
        4.4.2 Synchronization Function
        4.4.3 Type Casting Functions
        4.4.4 Texture Functions
    4.5 Host Runtime Component
        4.5.1 Common Concepts
        4.5.2 Runtime API
        4.5.3 Driver API

Chapter 5. GeForce 8800 Series and Quadro FX 5600/4600 Technical Specification
    5.1 General Specification
    5.2 Floating-Point Standard

Chapter 6. Performance Guidelines
    6.1 Instruction Performance
        6.1.1 Instruction Throughput
        6.1.2 Memory Bandwidth
    6.2 Number of Threads per Block
    6.3 Data Transfer between Host and Device

Chapter 7. Example of Matrix Multiplication
    7.1 Overview
    7.2 Source Code Listing
    7.3 Source Code Walkthrough
        7.3.1 Mul()
        7.3.2 Muld()

Appendix A. Mathematics Functions

Appendix B. Runtime API Reference
    B.1 Device Management
        B.1.1 cudaGetDeviceCount()
        B.1.2 cudaGetDeviceProperties()
        B.1.3 cudaChooseDevice()
        B.1.4 cudaSetDevice()
        B.1.5 cudaGetDevice()
    B.2 Memory Management
        B.2.1 cudaMalloc()
        B.2.2 cudaMalloc2D()
        B.2.3 cudaFree()
        B.2.4 cudaMallocArray()
        B.2.5 cudaFreeArray()
        B.2.6 cudaMemset()
        B.2.7 cudaMemset2D()
        B.2.8 cudaMemcpy()
        B.2.9 cudaMemcpy2D()
        B.2.10 cudaMemcpyToArray()
        B.2.11 cudaMemcpy2DToArray()
        B.2.12 cudaMemcpyFromArray()
        B.2.13 cudaMemcpy2DFromArray()
        B.2.14 cudaMemcpyArrayToArray()
        B.2.15 cudaMemcpy2DArrayToArray()
        B.2.16 cudaMemcpyToSymbol()
        B.2.17 cudaMemcpyFromSymbol()
        B.2.18 cudaGetSymbolAddress()
        B.2.19 cudaGetSymbolSize()
    B.3 Texture Reference Management
        B.3.1 Low-Level API
            B.3.1.1 cudaCreateChannelDesc()
            B.3.1.2 cudaGetChannelDesc()
            B.3.1.3 cudaGetTextureReference()
            B.3.1.4 cudaBindTexture()
            B.3.1.5 cudaUnbindTexture()
        B.3.2 High-Level API
            B.3.2.1 cudaBindTexture()
            B.3.2.2 cudaUnbindTexture()
    B.4 Execution Control
        B.4.1 cudaConfigureCall()
        B.4.2 cudaLaunch()
        B.4.3 cudaSetupArgument()