NVIDIA CUDA Programming Guide

NVIDIA CUDA Compute Unified Device Architecture Programming Guide
Version 0.8.2
4/24/2007

Table of Contents

Chapter 1. Introduction to CUDA
  1.1 The Graphics Processor Unit as a Data-Parallel Computing Device
  1.2 CUDA: A New Architecture for Computing on the GPU
  1.3 Document’s Structure
Chapter 2. Programming Model
  2.1 A Highly Multithreaded Coprocessor
  2.2 Thread Batching
    2.2.1 Thread Block
    2.2.2 Grid of Thread Blocks
  2.3 Memory Model
Chapter 3. Hardware Implementation
  3.1 A Set of SIMD Multiprocessors with On-Chip Shared Memory
  3.2 Execution Model
Chapter 4. Application Programming Interface
  4.1 An Extension to the C Programming Language
  4.2 Language Extensions
    4.2.1 Function Type Qualifiers
    4.2.2 Variable Type Qualifiers
    4.2.3 Execution Configuration
    4.2.4 Built-in Variables
    4.2.5 Compilation with NVCC
  4.3 Common Runtime Component
    4.3.1 Built-in Vector Types
    4.3.2 Mathematical Functions
    4.3.3 Time Function
    4.3.4 Texture Type
  4.4 Device Runtime Component
    4.4.1 Mathematical Functions
    4.4.2 Synchronization Function
    4.4.3 Type Casting Functions
    4.4.4 Texture Functions
  4.5 Host Runtime Component
    4.5.1 Common Concepts
    4.5.2 Runtime API
    4.5.3 Driver API
Chapter 5. GeForce 8800 Series and Quadro FX 5600/4600 Technical Specification
  5.1 General Specification
  5.2 Floating-Point Standard
Chapter 6. Performance Guidelines
  6.1 Instruction Performance
    6.1.1 Instruction Throughput
    6.1.2 Memory Bandwidth
  6.2 Number of Threads per Block
  6.3 Data Transfer between Host and Device
Chapter 7. Example of Matrix Multiplication
  7.1 Overview
  7.2 Source Code Listing
  7.3 Source Code Walkthrough
    7.3.1 Mul()
    7.3.2 Muld()
Appendix A. Mathematics Functions
Appendix B. Runtime API Reference
  B.1 Device Management
    B.1.1 cudaGetDeviceCount()
    B.1.2 cudaGetDeviceProperties()
    B.1.3 cudaChooseDevice()
    B.1.4 cudaSetDevice()
    B.1.5 cudaGetDevice()
  B.2 Memory Management
    B.2.1 cudaMalloc()
    B.2.2 cudaMalloc2D()
    B.2.3 cudaFree()
    B.2.4 cudaMallocArray()
    B.2.5 cudaFreeArray()
    B.2.6 cudaMemset()
    B.2.7 cudaMemset2D()
    B.2.8 cudaMemcpy()
    B.2.9 cudaMemcpy2D()
    B.2.10 cudaMemcpyToArray()
    B.2.11 cudaMemcpy2DToArray()
    B.2.12 cudaMemcpyFromArray()
    B.2.13 cudaMemcpy2DFromArray()
    B.2.14 cudaMemcpyArrayToArray()
    B.2.15 cudaMemcpy2DArrayToArray()
    B.2.16 cudaMemcpyToSymbol()
    B.2.17 cudaMemcpyFromSymbol()
    B.2.18 cudaGetSymbolAddress()
    B.2.19 cudaGetSymbolSize()
  B.3 Texture Reference Management
    B.3.1 Low-Level API
      B.3.1.1 cudaCreateChannelDesc()
      B.3.1.2 cudaGetChannelDesc()
      B.3.1.3 cudaGetTextureReference()
      B.3.1.4 cudaBindTexture()
      B.3.1.5 cudaUnbindTexture()
    B.3.2 High-Level API
      B.3.2.1 cudaBindTexture()
      B.3.2.2 cudaUnbindTexture()
  B.4 Execution Control
    B.4.1 cudaConfigureCall()
    B.4.2 cudaLaunch()
    B.4.3 cudaSetupArgument()

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    105 pages
