CUDA Samples

CUDA Samples Reference Manual
TRM-06704-001_v11.4 | September 2021

Table of Contents

Chapter 1. Release Notes
    1.1. CUDA 11.4
    1.2. CUDA 11.3
    1.3. CUDA 11.2
    1.4. CUDA 11.1
    1.5. CUDA 11.0
    1.6. CUDA 10.2
    1.7. CUDA 10.1 Update 2
    1.8. CUDA 10.1 Update 1
    1.9. CUDA 10.1
    1.10. CUDA 10.0
    1.11. CUDA 9.2
    1.12. CUDA 9.0
    1.13. CUDA 8.0
    1.14. CUDA 7.5
    1.15. CUDA 7.0
    1.16. CUDA 6.5
    1.17. CUDA 6.0
    1.18. CUDA 5.5
    1.19. CUDA 5.0
    1.20. CUDA 4.2
    1.21. CUDA 4.1

Chapter 2. Getting Started
    2.1. Getting CUDA Samples
        Windows
        Linux
    2.2. Building Samples
        Windows
        Linux
    2.3. CUDA Cross-Platform Samples
        TARGET_ARCH
        TARGET_OS
        TARGET_FS
        Cross Compiling to Embedded ARM architectures
        Copying Libraries
    2.4. Using CUDA Samples to Create Your Own CUDA Projects
        2.4.1. Creating CUDA Projects for Windows
        2.4.2. Creating CUDA Projects for Linux

Chapter 3. Samples Reference
    3.1. Simple Reference
        asyncAPI
        bf16TensorCoreGemm - bfloat16 Tensor Core GEMM
        binaryPartitionCG - Binary Partition Cooperative Groups
        cdpSimplePrint - Simple Print (CUDA Dynamic Parallelism)
        cdpSimpleQuicksort - Simple Quicksort (CUDA Dynamic Parallelism)
        clock - Clock
        clock_nvrtc - Clock libNVRTC
        cppIntegration - C++ Integration
        cppOverload
        cudaNvSci - CUDA NvSciBuf/NvSciSync Interop
        cudaOpenMP
        cudaTensorCoreGemm - CUDA Tensor Core GEMM
        dmmaTensorCoreGemm - Double Precision Tensor Core GEMM
        fp16ScalarProduct - FP16 Scalar Product
        globalToShmemAsyncCopy - Global Memory to Shared Memory Async Copy
        immaTensorCoreGemm - Tensor Core GEMM Integer MMA
        inlinePTX - Using Inline PTX
        inlinePTX_nvrtc - Using Inline PTX with libNVRTC
        matrixMul - Matrix Multiplication (CUDA Runtime API Version)
        matrixMul_nvrtc - Matrix Multiplication with libNVRTC
        matrixMulCUBLAS - Matrix Multiplication (CUBLAS)
        matrixMulDrv - Matrix Multiplication (CUDA Driver API Version)
        memMapIPCDrv - Memmap IPC Driver API
        simpleAssert
        simpleAssert_nvrtc - simpleAssert with libNVRTC
        simpleAtomicIntrinsics - Simple Atomic Intrinsics
        simpleAtomicIntrinsics_nvrtc - Simple Atomic Intrinsics with libNVRTC
        simpleAttributes
        simpleAWBarrier - Simple Arrive Wait Barrier
        simpleCallback - Simple CUDA Callbacks
        simpleCooperativeGroups - Simple Cooperative Groups
        simpleCubemapTexture - Simple Cubemap Texture
        simpleCudaGraphs - Simple CUDA Graphs

