CUDA C Programming Guide


CUDA C PROGRAMMING GUIDE
PG-02829-001_v7.5 | September 2015
Design Guide
www.nvidia.com

CHANGES FROM VERSION 7.0

‣ Updated C/C++ Language Support to:
    ‣ Added new section C++11 Language Features,
    ‣ Clarified that values of const-qualified variables with builtin floating-point types cannot be used directly in device code when the Microsoft compiler is used as the host compiler,
    ‣ Documented the extended lambda feature,
    ‣ Documented that typeid, std::type_info, and dynamic_cast are only supported in host code,
    ‣ Documented the restrictions on trigraphs and digraphs,
    ‣ Clarified the conditions under which layout mismatch can occur on Windows.
‣ Updated Table 12 to mention support of half-precision floating-point operations on devices of compute capabilities 5.3.
‣ Updated Table 2 with throughput for half-precision floating-point instructions.
‣ Added compute capability 5.3 to Table 13.
‣ Added the maximum number of resident grids per device to Table 13.
‣ Clarified the definition of __threadfence() in Memory Fence Functions.
‣ Mentioned in Atomic Functions that atomic functions do not act as memory fences.

TABLE OF CONTENTS

Chapter 1. Introduction
    1.1. From Graphics Processing to General Purpose Parallel Computing
    1.2. CUDA®: A General-Purpose Parallel Computing Platform and Programming Model
    1.3. A Scalable Programming Model
    1.4. Document Structure
Chapter 2. Programming Model
    2.1. Kernels
    2.2. Thread Hierarchy
    2.3. Memory Hierarchy
    2.4. Heterogeneous Programming
    2.5. Compute Capability
Chapter 3. Programming Interface
    3.1. Compilation with NVCC
        3.1.1. Compilation Workflow
            3.1.1.1. Offline Compilation
            3.1.1.2. Just-in-Time Compilation
        3.1.2. Binary Compatibility
        3.1.3. PTX Compatibility
        3.1.4. Application Compatibility
        3.1.5. C/C++ Compatibility
        3.1.6. 64-Bit Compatibility
    3.2. CUDA C Runtime
        3.2.1. Initialization
        3.2.2. Device Memory
        3.2.3. Shared Memory
        3.2.4. Page-Locked Host Memory
            3.2.4.1. Portable Memory
            3.2.4.2. Write-Combining Memory
            3.2.4.3. Mapped Memory
        3.2.5. Asynchronous Concurrent Execution
            3.2.5.1. Concurrent Execution between Host and Device
            3.2.5.2. Concurrent Kernel Execution
            3.2.5.3. Overlap of Data Transfer and Kernel Execution
            3.2.5.4. Concurrent Data Transfers
            3.2.5.5. Streams
            3.2.5.6. Events
            3.2.5.7. Synchronous Calls
        3.2.6. Multi-Device System
            3.2.6.1. Device Enumeration
            3.2.6.2. Device Selection
            3.2.6.3. Stream and Event Behavior
            3.2.6.4. Peer-to-Peer Memory Access
            3.2.6.5. Peer-to-Peer Memory Copy
        3.2.7. Unified Virtual Address Space
        3.2.8. Interprocess Communication
        3.2.9. Error Checking
        3.2.10. Call Stack
        3.2.11. Texture and Surface Memory
            3.2.11.1. Texture Memory
            3.2.11.2. Surface Memory
            3.2.11.3. CUDA Arrays
            3.2.11.4. Read/Write Coherency
        3.2.12. Graphics Interoperability
            3.2.12.1. OpenGL Interoperability
            3.2.12.2. Direct3D Interoperability
            3.2.12.3. SLI Interoperability
    3.3. Versioning and Compatibility
    3.4. Compute Modes
    3.5. Mode Switches
    3.6. Tesla Compute Cluster Mode for Windows
Chapter 4. Hardware Implementation
    4.1. SIMT Architecture
    4.2. Hardware Multithreading
Chapter 5. Performance Guidelines
    5.1. Overall Performance Optimization Strategies
    5.2. Maximize Utilization
        5.2.1. Application Level
        5.2.2. Device Level
        5.2.3. Multiprocessor Level
            5.2.3.1. Occupancy Calculator
    5.3. Maximize Memory Throughput
        5.3.1. Data Transfer between Host and Device
        5.3.2. Device Memory Accesses
    5.4. Maximize Instruction Throughput
        5.4.1. Arithmetic Instructions
        5.4.2. Control Flow Instructions
        5.4.3. Synchronization Instruction
Appendix A. CUDA-Enabled GPUs