CUDA Driver API

Total Page:16

File Type:pdf, Size:1020Kb

CUDA Driver API CUDA Driver API API Reference Manual TRM-06703-001 _v11.4.2 | September 2021 Table of Contents Chapter 1. Difference between the driver and runtime APIs..............................................1 Chapter 2. API synchronization behavior.............................................................................3 Chapter 3. Stream synchronization behavior...................................................................... 5 Chapter 4. Graph object thread safety.................................................................................7 Chapter 5. Rules for version mixing....................................................................................8 Chapter 6. Modules.............................................................................................................. 9 6.1. Data types used by CUDA driver............................................................................................10 CUaccessPolicyWindow_v1........................................................................................................ 11 CUarrayMapInfo_v1.....................................................................................................................11 CUDA_ARRAY3D_DESCRIPTOR_v2............................................................................................11 CUDA_ARRAY_DESCRIPTOR_v2................................................................................................ 11 CUDA_ARRAY_SPARSE_PROPERTIES_v1.................................................................................11 CUDA_EXT_SEM_SIGNAL_NODE_PARAMS_v1........................................................................ 11 CUDA_EXT_SEM_WAIT_NODE_PARAMS_v1.............................................................................11 CUDA_EXTERNAL_MEMORY_BUFFER_DESC_v1.....................................................................11 CUDA_EXTERNAL_MEMORY_HANDLE_DESC_v1.................................................................... 11 CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC_v1................................................ 11 CUDA_EXTERNAL_SEMAPHORE_HANDLE_DESC_v1............................................................. 11 CUDA_EXTERNAL_SEMAPHORE_SIGNAL_PARAMS_v1..........................................................11 CUDA_EXTERNAL_SEMAPHORE_WAIT_PARAMS_v1.............................................................. 11 CUDA_HOST_NODE_PARAMS_v1..............................................................................................12 CUDA_KERNEL_NODE_PARAMS_v1.........................................................................................12 CUDA_LAUNCH_PARAMS_v1.................................................................................................... 12 CUDA_MEM_ALLOC_NODE_PARAMS.......................................................................................12 CUDA_MEMCPY2D_v2................................................................................................................ 12 CUDA_MEMCPY3D_PEER_v1.....................................................................................................12 CUDA_MEMCPY3D_v2................................................................................................................ 12 CUDA_MEMSET_NODE_PARAMS_v1........................................................................................ 12 CUDA_POINTER_ATTRIBUTE_P2P_TOKENS_v1...................................................................... 12 CUDA_RESOURCE_DESC_v1..................................................................................................... 12 CUDA_RESOURCE_VIEW_DESC_v1...........................................................................................12 CUDA_TEXTURE_DESC_v1.........................................................................................................12 CUdevprop_v1............................................................................................................................. 12 CUeglFrame_v1...........................................................................................................................12 CUDA Driver API TRM-06703-001 _v11.4.2 | ii CUexecAffinityParam_v1.............................................................................................................12 CUexecAffinitySmCount_v1........................................................................................................ 12 CUipcEventHandle_v1.................................................................................................................12 CUipcMemHandle_v1..................................................................................................................13 CUkernelNodeAttrValue_v1........................................................................................................13 CUmemAccessDesc_v1.............................................................................................................. 13 CUmemAllocationProp_v1..........................................................................................................13 CUmemLocation_v1....................................................................................................................13 CUmemPoolProps_v1.................................................................................................................13 CUmemPoolPtrExportData_v1...................................................................................................13 CUstreamAttrValue_v1............................................................................................................... 13 CUstreamBatchMemOpParams_v1........................................................................................... 13 CUaccessProperty...................................................................................................................... 13 CUaddress_mode........................................................................................................................13 CUarray_cubemap_face............................................................................................................. 14 CUarray_format.......................................................................................................................... 14 CUarraySparseSubresourceType...............................................................................................15 CUcomputemode........................................................................................................................ 15 CUctx_flags................................................................................................................................. 15 CUDA_POINTER_ATTRIBUTE_ACCESS_FLAGS........................................................................16 CUdevice_attribute......................................................................................................................16 CUdevice_P2PAttribute...............................................................................................................22 CUdriverProcAddress_flags....................................................................................................... 23 CUeglColorFormat......................................................................................................................23 CUeglFrameType........................................................................................................................ 30 CUeglResourceLocationFlags....................................................................................................30 CUevent_flags............................................................................................................................. 31 CUevent_record_flags................................................................................................................ 31 CUevent_wait_flags.................................................................................................................... 31 CUexecAffinityType......................................................................................................................32 CUexternalMemoryHandleType................................................................................................. 32 CUexternalSemaphoreHandleType............................................................................................32 CUfilter_mode............................................................................................................................. 33 CUflushGPUDirectRDMAWritesOptions.....................................................................................33 CUflushGPUDirectRDMAWritesScope....................................................................................... 34 CUflushGPUDirectRDMAWritesTarget...................................................................................... 34 CUfunc_cache............................................................................................................................. 34 CUfunction_attribute...................................................................................................................34 CUDA Driver API TRM-06703-001 _v11.4.2 | iii CUGPUDirectRDMAWritesOrdering...........................................................................................35 CUgraphDebugDot_flags............................................................................................................36 CUgraphicsMapResourceFlags..................................................................................................36
Recommended publications
  • How to Download 382.33 Nvidia Driver Geforce Game Ready Driver
    how to download 382.33 nvidia driver GeForce Game Ready Driver. As part of the NVIDIA Notebook Driver Program, this is a reference driver that can be installed on supported NVIDIA notebook GPUs. However, please note that your notebook original equipment manufacturer (OEM) provides certified drivers for your specific notebook on their website. NVIDIA recommends that you check with your notebook OEM about recommended software updates for your notebook. OEMs may not provide technical support for issues that arise from the use of this driver. Before downloading this driver: It is recommended that you backup your current system configuration. Click here for instructions. Game Ready Drivers provide the best possible gaming experience for all major new releases, including Virtual Reality games. Prior to a new title launching, our driver team is working up until the last minute to ensure every performance tweak and bug fix is included for the best gameplay on day-1. Game Ready Provides the optimal gaming experience for Tekken 7 and Star Trek Bridge Crew. Notebooks supporting Hybrid Power technology are not supported (NVIDIA Optimus technology is supported). The following Sony VAIO notebooks are included in the Verde notebook program: Sony VAIO F Series with NVIDIA GeForce 310M, GeForce GT 330M, GeForce GT 425M, GeForce GT 520M or GeForce GT 540M. Other Sony VAIO notebooks are not included (please contact Sony for driver support). Fujitsu notebooks are not included (Fujitsu Siemens notebooks are included). GeForce GTX 1080, GeForce GTX 1070, GeForce GTX 1060, GeForce GTX 1050 Ti, GeForce GTX 1050. GeForce 900M Series (Notebooks): GeForce GTX 980, GeForce GTX 980M, GeForce GTX 970M, GeForce GTX 965M, GeForce GTX 960M, GeForce GTX 950M, GeForce 945M, GeForce 940MX, GeForce 930MX, GeForce 920MX, GeForce 940M, GeForce 930M, GeForce 920M, GeForce 910M.
    [Show full text]
  • GPU Technology Conference 2010 Sessions on 2095
    GPU Technology Conference 2010 Sessions on Video Processing (subject to change) IMPORTANT: Visit www.nvidia.com/gtc for the most up-to-date schedule and to enroll into sessions to ensure your spot in the most popular courses. 2095 - Building High Density Real-Time Video Processing Systems Learn how GPU Direct can be used to effectively build real time, high performance, cost effective video processing products. We will focus especially on how to optimize bus throughput while keeping CPU load and latency minimal. Speaker: Ronny Dewaele, Barco Topics: Video Processing, Imaging Time: Thursday, September, 23rd, 16:00 - 16:50 2029 - Computer Vision Algorithms for Automating HD Post- Production Discover how post-production tasks can be accelerated by taking advantage of GPU-based algorithms. In this talk we present computer vision algorithms for corner detection, feature point tracking, image warping and image inpainting, and their efficient implementation on GPUs using CUDA. We also show how to use these algorithms to do real-time stabilization and temporal re-sampling (re-timing) of high definition video sequences, both common tasks in post-production. Benchmarking of the GPU implementations against optimized CPU algorithms demonstrates a speedup of approximately an order of magnitude. Speaker: Hannes Fassold, JOANNEUM RESEARCH Topics: Computer Vision, Video Processing Time: Wednesday, September, 22nd, 15:00 - 15:50 2125 - Developing GPU Enabled Visual Effects For Film And Video 1 The arrival of fully programable GPUs is now changing the visual effects industry, which traditionally relied on CPU computation to create their spectacular imagery. Implementing the complex image processing algorithms used by VFX is a challenge, but the payoffs in terms of interactivity and throughput can be enormous.
    [Show full text]
  • Der Optimale PC
    ct.2509.001 16.11.2009 12:04 Uhr Seite 1 Mit Stellenmarkt www.ct.de e 3,50 magazin für Österreich e 3,70 magazin für Schweiz CHF 6,90 • Benelux e 4,20 ItalienItalien e 4,60 • Spanien e 4,60 25/2009 c computercomputer 2525 techniktechnik 23. 11. 2009 E-Books und Zeitungen im Online-Zugriff Das universelle Buch Kindle & Co. im Test • Lesestoff selbst aufbereiten Multifunktionsdrucker Videoschnittprogramme Fernseher mit LED-Backlight Aktuelle Spielkonsolen Günstige Android-Handys i5-PCs,i5-PCs, i7-Notebooksi7-Notebooks Quad-Core-Power Von XP auf Windows 7 E-Books • Der optimale PC Core i5 & i7, Strom sparen im Netz Wissenschaftler im Web Strom sparen im Netz Linux: Initskripte verwalten Notebook oder Desktop Heise h DerDer optimaleoptimale PCPC Komplettsysteme vs. superleise Selbstbaurechner ct.0108.999.anzeige.EP 09.06.2008 16:00 Uhr Seite 2 © Copyright by Heise Zeitschriften Verlag GmbH & Co. KG. Veröffentlichung und Vervielfältigung nur mit Genehmigung des Heise Zeitschriften Verlags. ct.2509.003 17.11.2009 15:22 Uhr Seite 3 c Nicht nachmachen "Wow, das Spiel ist wirklich geil!!!1111" Bei solch fetten Gewinnen kann die Firma So oder ähnlich könnte man die Bewertungen gelassen in Kauf nehmen, wenn in den kommenden etlicher Spielemagazine zusammen fassen, die Wochen Jugendschützer auf die Barrikaden das viel diskutierte 3D-Ballerspiel Call of klettern. Außerdem könnte Activisions Erfolgs - Duty: Modern Warfare 2 vom US-Hersteller rezept als Vorlage für andere Hersteller dienen, Activision in den Himmel loben. Doch selten um vermehrt mit altbekannten Spielideen war die Diskrepanz zwischen dem Hype der kräftig Kasse zu machen.
    [Show full text]
  • The Road to the Mainline Zynqmp VCU Driver
    The Road to the Mainline ZynqMP VCU Driver FOSDEM ’21 Michael Tretter – [email protected] https://www.pengutronix.de Agenda Xilinx Zynq® UltraScale+™ MPSoC H.264/H.265 Video Codec Unit Video Encoders in Mainline Linux VCU Mainline Driver: Allegro A Glimpse into the Future 2/47 Xilinx Zynq® UltraScale+™ MPSoC 3/47 ZynqMP Platform Overview Luca Ceresoli: ARM64 + FPGA and more: Linux on the Xilinx ZynqMP https://archive.fosdem.org/ 2018/schedule/event/arm6 4_and_fpga 4/47 ZynqMP Mainline Status Mainline Linux just works, e.g., on ZCU104 Evaluation Kit U-Boot, Barebox, FSBL Sometimes more reliable with Xilinx downstream Xilinx is actively mainlining their drivers 5/47 Make Sure that Your ZynqMP has a VCU ZU # E V ZU: Zynq Ultrascale+ #: Value Index C/E: Processor System Identifier G/V: Engine Type 6/47 Focus on Video Encoding VCU supports video decoding, as well Linux mainline driver only supports encoding Decoding might be focus in a future talk 7/47 Basic Video Encoding Knowledge Expected Paul Kocialkowski: Supporting Hardware-Accelerated Video Encoding with Mainline https://www.youtube.com/watch?v=S5wCdZfGFew 8/47 H.264/H.265 Video Codec Unit 9/47 VCU: Documentation Hardware configuration Software usage Available on the Xilinx Website 10/47 VCU: Features The encoder engine is designed to process video streams using the HEVC (ISO/IEC 23008-2 high-efficiency Video Coding) and AVC (ISO/IEC 14496-10 Advanced Video Coding) standards. It provides complete support for these standards, including support for 8-bit and 10-bit color, Y- only (monochrome), 4:2:0 and 4:2:2 Chroma formats, up to 4K UHD at 60 Hz performance.
    [Show full text]
  • AMD Linux Driver 2021.10 Release Notes
    [AMD Official Use Only - Internal Distribution Only] AMD Linux Driver 2021.10 Release Notes 1. Overview AMD’s Linux® Driver’s includes open source graphics driver for AMD’s embedded platforms and other peripheral devices on selected development platforms. New features supported in this release: 1. New LTS kernel 5.10.5. 2. Bug fixes and driver updates. 2. Linux® kernel Support 1. 5.10.5 LTS 3. Linux Distribution Support 1. Ubuntu 20.04.1 4. Component Versions The following table shows git commit details of the sources and binaries used in the package. The patches present in patches folder of this release package has to be applied on top of the git commit mentioned in the below table to get the full sources corresponding to this driver release. The sources directory in this package contains patches pre-applied to these commit ids. 2021.10 Linux Driver Release Notes 1 [AMD Official Use Only - Internal Distribution Only] Component Version Commit ID Source Link for git clone Name Kernel 5.10.5 f5247949c0a9304ae43a895f29216a9d876f https://git.kernel.org/pub/scm/linux/ker 3919 nel/git/stable/linux.git Libdrm 2.4.103 5dea8f56ee620e9a3ace34a99ebf0175efb5 https://github.com/freedesktop/mesa- 7b11 drm.git Mesa 21.1.0-dev 38f012e0238f145f4c83bf7abf59afceee333 https://github.com/mesa3d/mesa.git 397 Ddx 19.1.0 6234a1b2652f469071c0c9b0d8b0f4a8079e https://github.com/freedesktop/xorg- fe74 xf86-video-amdgpu.git Gstomx 1.0.0.1 5c4bff4a433dff1c5d005edfceaf727b6214b git://people.freedesktop.org/~leoliu/gsto b74 mx Wayland 1.15.0 ea09c2fde7fcfc7e24a19ae5c5977981e9bef
    [Show full text]
  • Storage Device Information 1
    Storage Device Information 1. MSI suggests to ask our local service center for a storage device’s approval list before your upgrade in order to avoid any compatibility issues. 2. For having the maximum performance of the SSD, see the suggested Stripe Size shown in the RAID 0 column to do the set up. 3. 2.5 inch vs. mSATA vs. M.2 SSD Which M.2 SSD Drive do I need? 1. Connector & Socket: Make sure you have the right combination of the SSD connector and the socket on your notebook. There are three types of M.2 SSD edge connectors, B Key, M Key, and B+M Key. – B key edge connector can use on Socket for 'B' Key (aka. Socket 2) (SATA) – M key edge connector can use on Socket for 'M' Key (aka. Socket 3) (PCIe x 4) – B+M Key edge connector can use on both sockets * To know the supported connector and the socket of your notebook, see the ‘M.2 slot(s)’ column. 2. Physical size: M.2 SSD allows for a variety of physical sizes, like 2242, 2260, 2280, and MSI Notebooks supports Type 2280. 3. Logical interface: M.2 drives can connect through either the SATA controller or through the PCI-E bus in either x2 or x4 mode. Catalogue 1. MSI Gaming Notebooks – NVIDIA GeForce 10 Series Graphics And Intel HM175/CM238 Chipset – NVIDIA GeForce 10 Series Graphics And Intel HM170/CM236 Chipset – NVIDIA GeForce 900M Series Graphics And Intel HM175 Chipset – NVIDIA GeForce 900M Series Graphics And Intel HM170/CM236 Chipset – NVIDIA GeForce 900M Series Graphics And Intel HM87/HM86 Chipset – NVIDIA GeForce 800M Series Graphics – NVIDIA GeForce 700M Series Graphics – Gaming Dock – MSI Vortex – MSI VR ONE 2.
    [Show full text]
  • NVIDIA's Opengl Functionality
    NVIDIANVIDIA ’’ss OpenGLOpenGL FunctionalityFunctionality Session 2127 | Room A5 | Monday, September, 20th, 16:00 - 17:20 San Jose Convention Center, San Jose, California Mark J. Kilgard • Principal System Software Engineer – OpenGL driver – Cg (“C for graphics”) shading language • OpenGL Utility Toolkit (GLUT) implementer • Author of OpenGL for the X Window System • Co-author of Cg Tutorial Outline • OpenGL’s importance to NVIDIA • OpenGL 3.3 and 4.0 • OpenGL 4.1 • Loose ends: deprecation, Cg, further extensions OpenGL Leverage Cg Parallel Nsight SceniX CompleX OptiX Example of Hybrid Rendering with OptiX OpenGL (Rasterization) OptiX (Ray tracing) Parallel Nsight Provides OpenGL Profiling Configure Application Trace Settings Parallel Nsight Provides OpenGL Profiling Magnified trace options shows specific OpenGL (and Cg) tracing options Parallel Nsight Provides OpenGL Profiling Parallel Nsight Provides OpenGL Profiling Trace of mix of OpenGL and CUDA shows glFinish & OpenGL draw calls OpenGL In Every NVIDIA Business OpenGL on Quadro – World class OpenGL 4 drivers – 18 years of uninterrupted API compatibility – Workstation application certifications – Workstation application profiles – Display list optimizations – Fast antialiased lines – Largest memory configurations: 6 gigabytes – GPU affinity – Enhanced interop with CUDA and multi-GPU OpenGL – Advanced multi-GPU rendering – Overlays – Genlock – Unified Back Buffer for less framebuffer memory usage – Cross-platform • Windows XP, Vista, Win7, Linux, Mac, FreeBSD, Solaris – SLI Mosaic –
    [Show full text]
  • Gtx 950M Driver Download
    gtx 950m driver download NVIDIA GTX 950M 4GB DRIVERS WINDOWS 7 (2020) Nvidia lists the gtx 950 as being available in 2gb and 4gb variants, but only the former is currently available a slight concern when memory is one of the big graphics battlegrounds right now. Complete cuda toolkit for improved battery life you the latest releases. Access your gpu nvidia gpu cloud. You can be found in march is aligned with nvidia update. Sbx profile editor for creative ae-5, ae-7, and ae-9 sound cards. Good to igp supported resolutions may be found in sound cards. Modify images and texts of user-defined profiles in sound blaster connect and command applications. 1 - optimus may not be available on all manufacturer s notebooks. 965m, and geforce gtx 950m gpu. Geforce gtx 950m delivers great gaming performance at 1080p with inspired gameworks technologies for fluid, life-like visuals. The nvidia geforce gtx 950m is an upper mid-range, directx 11-compatible graphics card for laptops unveiled in march is based on nvidia's maxwell architecture. Update your graphics card drivers today. Driver Download. Details for use of this nvidia software can be found in the nvidia end user license agreement. Is the first whql-certified and latest recommended driver for all pre-release windows 10. Prior to a new title launching, our driver team is working up until the last minute to ensure every performance tweak and bug fix is included for the best gameplay on day-1. Download drivers for nvidia products including geforce graphics cards, nforce motherboards, quadro workstations, and more.
    [Show full text]
  • A Multifrequency MAC Specially Designed for Wireless Sensor
    U.S. Department of Energy, Office of Science High Performance Computing Facility Operational Assessment, CY 2011 Oak Ridge Leadership Computing Facility February 2012 Prepared by Arthur S. Bland James J. Hack Ann E. Baker Ashley D. Barker Kathlyn J. Boudwin Doug Hudson Ricky A. Kendall Bronson Messer James H. Rogers Galen M. Shipman Jack C. Wells Julia C. White U.S. Department of Energy, Office of Science HIGH PERFORMANCE COMPUTING FACILITY OPERATIONAL ASSESSMENT, FY11 OAK RIDGE LEADERSHIP COMPUTING FACILITY Arthur S. Bland Ricky A. Kendall James J. Hack Bronson Messer Ann E. Baker James H. Rogers Ashley D. Barker Galen M. Shipman Kathlyn J. Boudwin Jack C. Wells Doug Hudson Julia C. White February 2012 Prepared by OAK RIDGE NATIONAL LABORATORY Oak Ridge, Tennessee 37831-6283 managed by UT-BATTELLE, LLC for the U.S. DEPARTMENT OF ENERGY under contract DE-AC05-00OR22725 This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.
    [Show full text]
  • HP Z800 Workstation Overview
    QuickSpecs HP Z800 Workstation Overview HP recommends Windows Vista® Business 1. 3 External 5.25" Bays 2. Power Button 3. Front I/O: 3 USB 2.0, 1 IEEE 1394a, 1 Headphone out, 1 Microphone in DA - 13278 North America — Version 2 — April 13, 2009 Page 1 QuickSpecs HP Z800 Workstation Overview 4. Choice of 850W, 85% or 1110W, 89% Power Supplies 9. Rear I/O: 1 IEEE 1394a, 6 USB 2.0, 1 serial, PS/2 keyboard/mouse 5. 12 DIMM Slots for DDR3 ECC Memory 2 RJ-45 to Integrated Gigabit LAN 1 Audio Line In, 1 Audio Line Out, 1 Microphone In 6. 3 External 5.25” Bays 10. 2 PCIe x16 Gen2 Slots 7. 4 Internal 3.5” Bays 11.. 2 PCIe x8 Gen2, 1 PCIe x4 Gen2, 1 PCIe x4 Gen1, 1 PCI Slot 8. 2 Quad Core Intel 5500 Series Processors 12 3 Internal USB 2.0 ports Form Factor Rackable Minitower Compatible Operating Genuine Windows Vista® Business 32-bit* Systems Genuine Windows Vista® Business 64-bit* Genuine Windows Vista® Business 32-bit with downgrade to Windows® XP Professional 32-bit custom installed** Genuine Windows Vista® Business 64-bit with downgrade to Windows® XP Professional x64 custom installed** HP Linux Installer Kit for Linux (includes drivers for both 32-bit & 64-bit OS versions of Red Hat Enterprise Linux WS4 and WS5 - see: http://www.hp.com/workstations/software/linux) For detailed OS/hardware support information for Linux, see: http://www.hp.com/support/linux_hardware_matrix *Certain Windows Vista product features require advanced or additional hardware.
    [Show full text]
  • Generated by Doxygen 1.8.11
    VDPAU Generated by Doxygen 1.8.11 Contents 1 Video Decode and Presentation API for Unix1 1.1 Introduction .............................................. 1 1.2 API Partitioning ............................................ 1 1.3 Object Types.............................................. 1 1.3.1 Device Type.......................................... 2 1.3.2 Surface Types......................................... 2 1.3.3 Transfer Types ........................................ 2 1.4 Data Flow............................................... 2 1.5 Entry Point Retrieval.......................................... 3 1.5.1 Philosophy .......................................... 3 1.6 Multi-threading............................................. 3 1.7 Surface Endianness.......................................... 4 1.8 Video Decoder Usage......................................... 5 1.8.1 MPEG-1 and MPEG-2 .................................... 5 1.8.2 H.264............................................. 5 1.8.3 VC-1 Simple and Main Profile ................................ 6 1.8.4 VC-1 Advanced Profile.................................... 6 1.8.5 MPEG-4 Part 2 and DivX................................... 6 1.8.6 H.265/HEVC - High Efficiency Video Codec......................... 6 1.9 Video Mixer Usage .......................................... 7 1.9.1 VdpVideoSurface Content .................................. 7 1.9.2 VdpVideoMixer Surface List ................................. 7 1.9.3 Weave De-interlacing..................................... 8 1.9.4 Bob De-interlacing
    [Show full text]
  • Introduction to Cuda Programming
    INTRODUCTION TO CUDA PROGRAMMING BHUPENDER THAKUR Outline Outline of the talk • GPU architecture • CUDA programming model • CUDA tools and applicaons • Benchmarks Growth in GPU compung • Kepler is the current release. • SuperMike II has two Fermi 2090 GPU’s on the gpu nodes • Queenbee replacement is expected to have Nvidia Kepler GPUs hp://blogs.nvidia.com/blog/2014/03/25/gpu-roadmap-pascal/ Large theore?cal GFLOPs count hp://docs.nvidia.com/cuda/cuda-c-programming-guide/ High Bandwidth hp://docs.nvidia.com/cuda/cuda-c-programming-guide/ Notable Data Center products Tesla Data Center ProDucts GPU Compute Capability Tesla K20 3.5 Tesla K10 3.0 Tesla M2050/M2070/M2075/M2090 2.0 Tesla S1070 1.3 Tesla M1060 1.3 Tesla S870 1.0 Compute capabilies • The GPU’s were originally designed primarily for 3D game rendering. • SM – Streaming mul>-processors with mul>ple processing cores • Streaming mul>-processors with mul>ple processing cores • On Fermi, each SM contains 32 processing cores, Kepler SMX has 192 • Execute in a Single Instruc>on Mul>ple Thread (SIMT) fashion • Fermi has up to 16 SMs on a card for a maximum of 512 compute cores hp://www.theregister.co.uk/Print/2012/05/18/inside_nvidia_kepler2_gk110_gpu_tesla/ Outline • The GPU’s were originally designed primarily for 3D game rendering. • SM – Streaming mul>-processors with mul>ple processing cores • Streaming mul>-processors with mul>ple processing cores • On Fermi, each SM contains 32 processing cores, Kepler SMX has 192 • Execute in a Single Instruc>on Mul>ple Thread (SIMT) fashion • Fermi
    [Show full text]