CSE 591: GPU Programming Basics on Architecture and Programming

Total Page:16

File Type:pdf, Size:1020Kb

CSE 591: GPU Programming Basics on Architecture and Programming Recommended Literature CSE 591: GPU Programming Basics on Architecture and Programming text book reference book Klaus Mueller programming guides available from nvidia.com Computer Science Department Stony Brook University more general books on parallel programing Course Topic Tag Cloud Course Topic Tag Cloud Architecture Architecture Limits of parallel programming Limits of parallel programming Host Performance tuning Host Performance tuning Kernels Kernels Debugging Thread management Debugging Thread management OpenCL OpenCL Algorithms Memory Algorithms Memory CUDA Device control CUDA Device control Example applications Example applications Parallel programming Parallel programming Speedup Curves Speedup Curves but wait, there is more to this….. Amdahl’s Law Amdahl’s Law Governs theoretical speedup Governs theoretical speedup 1 1 1 1 S = = S = = P P P P (1− P) + (1− P) + (1− P) + (1− P) + S parallel N S parallel N P: parallelizable portion of the program P: parallelizable portion of the program S: speedup S: speedup N: number of parallel processors N: number of parallel processors P determines theoretically achievable speedup • example (assuming infinite N): P=90% Æ S=10 P=99% Æ S=100 Amdahl’s Law Focus Efforts on Most Beneficial How many processors to use Optimize program portion with most ‘bang for the buck’ • when P is small Æ a small number of processors will do • look at each program component • when P is large (embarrassingly parallel) Æ high N is useful • don’t be ambitious in the wrong place Focus Efforts on Most Beneficial Beyond Theory.... Optimize program portion with most ‘bang for the buck’ Limits from mismatch of parallel program and parallel platform • look at each program component • man-made ‘laws’ subject to change with new architectures • don’t be ambitious in the wrong place Example: • program with 2 independent parts: A, B (execution time shown) A B Original program B sped up 5× A sped up 2× • sometimes one gains more with less Beyond Theory.... Beyond Theory.... Limits from mismatch of parallel program and parallel platform Limits from mismatch of parallel program and parallel platform • man-made ‘laws’ subject to change with new architectures • man-made ‘laws’ subject to change with new architectures Memory access patterns Memory access patterns • data access locality and strides vs. memory banks • data access locality and strides vs. memory banks Memory access efficiency • arithmetic intensity vs. cache sizes and hierarchies Beyond Theory.... Beyond Theory.... Limits from mismatch of parallel program and parallel platform Limits from mismatch of parallel program and parallel platform • man-made ‘laws’ subject to change with new architectures • man-made ‘laws’ subject to change with new architectures Memory access patterns Memory access patterns • data access locality and strides vs. memory banks • data access locality and strides vs. memory banks Memory access efficiency Memory access efficiency • arithmetic intensity vs. cache sizes and hierarchies • arithmetic intensity vs. cache sizes and hierarchies Enabled granularity of program parallelism Enabled granularity of program parallelism • MIMD vs. SIMD • MIMD vs. SIMD Hardware support for specific tasks Æ on-chip ASICS Beyond Theory.... Device Transfer Costs Limits from mismatch of parallel program and parallel platform Transferring the data to the device is also important • man-made ‘laws’ subject to change with new architectures • computational benefit of a transfer plays a large role • transfer costs are (or can be ) significant Memory access patterns • data access locality and strides vs. memory banks Memory access efficiency • arithmetic intensity vs. cache sizes and hierarchies Enabled granularity of program parallelism • MIMD vs. SIMD Hardware support for specific tasks Æ on-chip ASICS Support for hardware access Æ drivers, APIs Device Transfer Costs Device Transfer Costs Transferring the data to the device is also important Transferring the data to the device is also important • computational benefit of a transfer plays a large role • computational benefit of a transfer plays a large role • transfer costs are (or can be ) significant • transfer costs are (or can be ) significant Adding two (N×N) matrices: Adding two (N×N) matrices: • transfer back and from device: 3 N2 elements • transfer back and from device: 3 N2 elements • number of additions: N2 • number of additions: N2 Æ operations-transfer ratio = 1/3 or O(1) Æ operations-transfer ratio = 1/3 or O(1) Multiplying two (N×N) matrices: • transfer back and from device: 3 N2 elements • number of multiplications and additions: N3 Æ operations-transfer ratio = O(N) grows with N Conlcusions: Programming Strategy Course Topic Tag Cloud Use GPU to complement CPU execution • recognize parallel program segments and only parallelize these Architecture Limits of parallel programming • leave the sequential (serial) portions on the CPU Host Performance tuning parallel portions (enjoy) Kernels sequential portions (do not bite) Debugging Thread management OpenCL Algorithms Memory CUDA Device control PPP (Peach of Parallel Programming – Kirk/Hwu) Example applications Parallel programming Overall GPU Architecture (G80) GPU Architecture Specifics Memory bandwith: 86.4 GB/s (GPU) Streaming Additional hardware 4GB/s BW (GPU ↔ CPU, PCI Express) multi-processor SM • each SP has a multiply-add (MAD) and one extra multiply unit Stream Host • special floating-point function units (SQRT, TRIG, ..) SM block processor SP Input Assembler Massive multi-thread support Thread Execution Manager • CPUs typically run 2 or 4 threads/core • G80 can run up to 768 threads/SM Æ 12,000 threads/chip • G200 can run 1024 threads/SM Æ 30,000 threads/ship G80 (2008) Parallel Data Parallel Data Parallel Data Parallel Data Parallel Data Parallel Data Parallel Data Parallel Data Cache Cache Cache Cache Cache Cache Cache Cache • GeForce 8-series (8800 GTX, etc) TextureTexture Texture Texture Texture Texture Texture Texture Texture • 128 SP (16 SM × 8 SM) • 500 Gflops (768 MB DRAM) NVIDIA Quadro: G200 (2009) professional version of consumer Load/store Load/store Load/store Load/store Load/store Load/store GeForce series • GeForce GTX 280, etc Global Memory • 240 SP 1 Tflops (1 GB DRAM) 768 MB Off-chip (GDDR) DRAM (on-board) • NVIDIA Fermi Architecture NVIDIA Fermi New GeForce 400 series • GTX 480, etc • up to 512 SP (16 × 32) but typically < 500 (GTX 480 has 496 SP) CUDA Core SM (Streaming • 1.3 Tflops (1.5GB DRAM) Multiprocessor) memory interface (64 bit) Important features: • C++, support for C, Fortran, Java, Python, OpenCL, DirectCompute • ECC (Error Correcting Code) memory (Tesla only) • 512 CUDA Cores™ with new IEEE 754-2008 floating-point standard • 8× peak double precision arithmetic performance over last-gen GPUs On chip: SMs: 16 • NVIDIA Parallel DataCache™ cache hierarchy CUDA cores: 16/SM, 512/chip • NVIDIA GigaThread™ for concurrent kernel execution memory interfaces: 6 (BW 384 bits) NVIDIA Fermi Course Topic Tag Cloud Architecture full cross-bar interface Limits of parallel programming Host Performance tuning Kernels two16-wide cores Debugging Thread management OpenCL Algorithms Memory CUDA Device control 4 special function units (math, etc) Example applications Parallel programming Mapping the Architecture to Parallel Programs Mapping the Architecture to Parallel Programs Parallelism is exposed as threads Thread communication • all threads run the same code • threads within a block cooperate via shared memory, atomic • a thread runs on one SP operations, barrier synchronization • SPMD (Single Process, Multiple Data) • threads in different blocks cannot cooperate The threads divide into blocks • threads execute together in a block • each block has a unique ID within a Thread Block 0 Thread Block 1 Thread Block N - 1 grid Æ block ID threadID 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 • each thread has a unique ID within a block Æ thread ID … … … block ID and thread ID can be used to float x = float x = float x = • input[threadID]; input[threadID]; input[threadID]; compute a global ID float y = func(x); float y = func(x); … float y = func(x); output[threadID] = y; output[threadID] = y; output[threadID] = y; … … … The blocks aggregate into grid cells • each such cell is a SM Mapping the Architecture to Parallel Programs Mapping the Architecture to Parallel Programs Threads within a block are organized into SPMD warps Mapping • execute the same instruction • depends on device simultaneously with different data hardware A warp is 32 threads Thread management Æ a 16-core block takes 2 clock cycles • very lightweight to compute a warp thread creation, scheduling One SM can maintain 48 warps • in contrast, on the simultaneously CPU thread • keep one warp active while 47 wait for management is very memory Æ latency hiding heavy • 32 threads × 48 warps ×16 SMs Æ 24,576 threads ! Qualifiers Qualifiers Function qualifiers Variable qualifiers • specify whether a function executes on the host or on the device • specify the memory location on the device of a variable • __global__ defines a kernel function (must return void) • __shared__ and __constant__ are optionally used together • __device__ and __host__ can be used together with __device__ Function Exe on Call from Variable Memory Scope Lifetime __device__ GPU GPU __shared__ Shared Block Block __global__ GPU CPU __device__ Global Grid Application __host__ CPU CPU __constant__ Constant Grid Application CPU=host and GPU=device For function executed on the device • no recursion
Recommended publications
  • Gs-35F-4677G
    March 2013 NCS Technologies, Inc. Information Technology (IT) Schedule Contract Number: GS-35F-4677G FEDERAL ACQUISTIION SERVICE INFORMATION TECHNOLOGY SCHEDULE PRICELIST GENERAL PURPOSE COMMERCIAL INFORMATION TECHNOLOGY EQUIPMENT Special Item No. 132-8 Purchase of Hardware 132-8 PURCHASE OF EQUIPMENT FSC CLASS 7010 – SYSTEM CONFIGURATION 1. End User Computer / Desktop 2. Professional Workstation 3. Server 4. Laptop / Portable / Notebook FSC CLASS 7-25 – INPUT/OUTPUT AND STORAGE DEVICES 1. Display 2. Network Equipment 3. Storage Devices including Magnetic Storage, Magnetic Tape and Optical Disk NCS TECHNOLOGIES, INC. 7669 Limestone Drive Gainesville, VA 20155-4038 Tel: (703) 621-1700 Fax: (703) 621-1701 Website: www.ncst.com Contract Number: GS-35F-4677G – Option Year 3 Period Covered by Contract: May 15, 1997 through May 14, 2017 GENERAL SERVICE ADMINISTRATION FEDERAL ACQUISTIION SERVICE Products and ordering information in this Authorized FAS IT Schedule Price List is also available on the GSA Advantage! System. Agencies can browse GSA Advantage! By accessing GSA’s Home Page via Internet at www.gsa.gov. TABLE OF CONTENTS INFORMATION FOR ORDERING OFFICES ............................................................................................................................................................................................................................... TC-1 SPECIAL NOTICE TO AGENCIES – SMALL BUSINESS PARTICIPATION 1. Geographical Scope of Contract .............................................................................................................................................................................................................................
    [Show full text]
  • Download Gtx 970 Driver Download Gtx 970 Driver
    download gtx 970 driver Download gtx 970 driver. Completing the CAPTCHA proves you are a human and gives you temporary access to the web property. What can I do to prevent this in the future? If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware. If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices. Another way to prevent getting this page in the future is to use Privacy Pass. You may need to download version 2.0 now from the Chrome Web Store. Cloudflare Ray ID: 67a229f54fd4c3c5 • Your IP : 188.246.226.140 • Performance & security by Cloudflare. GeForce Windows 10 Driver. NVIDIA has been working closely with Microsoft on the development of Windows 10 and DirectX 12. Coinciding with the arrival of Windows 10, this Game Ready driver includes the latest tweaks, bug fixes, and optimizations to ensure you have the best possible gaming experience. Game Ready Best gaming experience for Windows 10. GeForce GTX TITAN X, GeForce GTX TITAN, GeForce GTX TITAN Black, GeForce GTX TITAN Z. GeForce 900 Series: GeForce GTX 980 Ti, GeForce GTX 980, GeForce GTX 970, GeForce GTX 960. GeForce 700 Series: GeForce GTX 780 Ti, GeForce GTX 780, GeForce GTX 770, GeForce GTX 760, GeForce GTX 760 Ti (OEM), GeForce GTX 750 Ti, GeForce GTX 750, GeForce GTX 745, GeForce GT 740, GeForce GT 730, GeForce GT 720, GeForce GT 710, GeForce GT 705.
    [Show full text]
  • MSI Afterburner V4.6.4
    MSI Afterburner v4.6.4 MSI Afterburner is ultimate graphics card utility, co-developed by MSI and RivaTuner teams. Please visit https://msi.com/page/afterburner to get more information about the product and download new versions SYSTEM REQUIREMENTS: ...................................................................................................................................... 3 FEATURES: ............................................................................................................................................................. 3 KNOWN LIMITATIONS:........................................................................................................................................... 4 REVISION HISTORY: ................................................................................................................................................ 5 VERSION 4.6.4 .............................................................................................................................................................. 5 VERSION 4.6.3 (PUBLISHED ON 03.03.2021) .................................................................................................................... 5 VERSION 4.6.2 (PUBLISHED ON 29.10.2019) .................................................................................................................... 6 VERSION 4.6.1 (PUBLISHED ON 21.04.2019) .................................................................................................................... 7 VERSION 4.6.0 (PUBLISHED ON
    [Show full text]
  • Manycore GPU Architectures and Programming, Part 1
    Lecture 19: Manycore GPU Architectures and Programming, Part 1 Concurrent and Mul=core Programming CSE 436/536, [email protected] www.secs.oakland.edu/~yan 1 Topics (Part 2) • Parallel architectures and hardware – Parallel computer architectures – Memory hierarchy and cache coherency • Manycore GPU architectures and programming – GPUs architectures – CUDA programming – Introduc?on to offloading model in OpenMP and OpenACC • Programming on large scale systems (Chapter 6) – MPI (point to point and collec=ves) – Introduc?on to PGAS languages, UPC and Chapel • Parallel algorithms (Chapter 8,9 &10) – Dense matrix, and sorng 2 Manycore GPU Architectures and Programming: Outline • Introduc?on – GPU architectures, GPGPUs, and CUDA • GPU Execuon model • CUDA Programming model • Working with Memory in CUDA – Global memory, shared and constant memory • Streams and concurrency • CUDA instruc?on intrinsic and library • Performance, profiling, debugging, and error handling • Direc?ve-based high-level programming model – OpenACC and OpenMP 3 Computer Graphics GPU: Graphics Processing Unit 4 Graphics Processing Unit (GPU) Image: h[p://www.ntu.edu.sg/home/ehchua/programming/opengl/CG_BasicsTheory.html 5 Graphics Processing Unit (GPU) • Enriching user visual experience • Delivering energy-efficient compung • Unlocking poten?als of complex apps • Enabling Deeper scien?fic discovery 6 What is GPU Today? • It is a processor op?mized for 2D/3D graphics, video, visual compu?ng, and display. • It is highly parallel, highly multhreaded mulprocessor op?mized for visual
    [Show full text]
  • Geforce 500 Series Graphics Cards, Including Without Limitation the Geforce GTX
    c) GeForce 500 series graphics cards, including without limitation the GeForce GTX 580, GeForce GTX 570, GeForce GTX 560 Ti, GeForce GTX 560, GeForce GTX 550 Ti, and GeForce GT 520; d) GeForce 400 series graphics cards, including without limitation the GcForcc GTX 480, GcForcc GTX 470, GeForce GTX 460, GeForce GTS 450, GeForce GTS 440, and GcForcc GT 430; and c) GcForcc 200 series graphics cards, including without limitation the GeForce GT 240, GcForcc GT 220, and GeForce 210. 134. On information and belief, the Accused ’734 Biostar Products contain at least one of the following Accused NVIDIA ’734 Products that infringe at least claims 1-3, 7-9, 12-15, 17, and 19 of the ’734 patent, literally or under the doctrine of equivalents: the GFIOO, GFl04, GF108, GF110, GF114, GF116, GF119, GK104, GKIO6, GK107, GK208, GM107, GT2l5, GT2l6, and GT2l8, which are infringing NVIDIA GPUs. (See Section VI.D.1 and claim charts attached as Exhibit 28.) 3. ECS 135. On information and belief, ECS imports, sells for importation, and/or sells after importation into the United States, Accused Products that incorporate Accused NVIDIA ’734 Products (the “Accused PCS ’734 Products”) and therefore infringe the ’734 patent, literally or under the doctrine of equivalents. 136. On information and belief, the Accused ECS "/34 Products include at least the following products that directly infringe at least claims 1-3, 7-9, 12-15, 17, and 19 of the ’734 patent, literally or under the doctrine of equivalents: a) GeForce 600 series graphics cards, including without limitation
    [Show full text]
  • New NVIDIA Geforce GTX 480 GPU Cranks up PC Gaming to New Heights
    New NVIDIA GeForce GTX 480 GPU Cranks Up PC Gaming to New Heights Latest Consumer GPU Delivers Unprecedented DirectX 11 Performance, 3D Vision Surround, PhysX, and Interactive Ray-Tracing SANTA CLARA, CA -- Hot off the heels of PAX East, the consumer gaming show held this past weekend in Boston, NVIDIA today officially launched its new flagship graphics processors, the NVIDIA® GeForce® GTX 480 and GeForce GTX 470. The top-of-the line in a new family of enthusiast-class GPUs, the GeForce GTX 480 was designed from the ground up to deliver the industry's most potent tessellation performance, which is the key component of Microsoft's DirectX 11 development platform for PC games. Tessellation allows game developers to take advantage of the GeForce GTX 480 GPU's ability to increase the geometric complexity of models and characters to deliver far more realistic and visually compelling gaming environments. The GeForce GTX 480 is joined by the GeForce GTX 470 as the first products in NVIDIA's Fermi line of consumer products. They will be available in mid-April, from the world's leading add-in card partners and PC system builders. The remainder of the GeForce 400-series lineup will be announced in the coming months, filling out additional performance and price segments. The GeForce GTX 480 and GTX 470 GPUs bring a host of new gaming features never before offered for the PC -- including support for real-time ray tracing and NVIDIA 3D Vision™ Surround for truly immersive widescreen, stereoscopic 3D gaming. Facts about NVIDIA GeForce GTX 480: • Tear it up.
    [Show full text]
  • Lecture: Manycore GPU Architectures and Programming, Part 1
    Lecture: Manycore GPU Architectures and Programming, Part 1 CSCE 569 Parallel Computing Department of Computer Science and Engineering Yonghong Yan [email protected] https://passlab.github.io/CSCE569/ 1 Manycore GPU Architectures and Programming: Outline • Introduction – GPU architectures, GPGPUs, and CUDA • GPU Execution model • CUDA Programming model • Working with Memory in CUDA – Global memory, shared and constant memory • Streams and concurrency • CUDA instruction intrinsic and library • Performance, profiling, debugging, and error handling • Directive-based high-level programming model – OpenACC and OpenMP 2 Computer Graphics GPU: Graphics Processing Unit 3 Graphics Processing Unit (GPU) Image: http://www.ntu.edu.sg/home/ehchua/programming/opengl/CG_BasicsTheory.html 4 Graphics Processing Unit (GPU) • Enriching user visual experience • Delivering energy-efficient computing • Unlocking potentials of complex apps • Enabling Deeper scientific discovery 5 What is GPU Today? • It is a processor optimized for 2D/3D graphics, video, visual computing, and display. • It is highly parallel, highly multithreaded multiprocessor optimized for visual computing. • It provide real-time visual interaction with computed objects via graphics images, and video. • It serves as both a programmable graphics processor and a scalable parallel computing platform. – Heterogeneous systems: combine a GPU with a CPU • It is called as Many-core 6 Graphics Processing Units (GPUs): Brief History GPU Computing General-purpose computing on graphics processing units (GPGPUs) GPUs with programmable shading Nvidia GeForce GE 3 (2001) with programmable shading DirectX graphics API OpenGL graphics API Hardware-accelerated 3D graphics S3 graphics cards- single chip 2D accelerator Atari 8-bit computer IBM PC Professional Playstation text/graphics chip Graphics Controller card 1970 1980 1990 2000 2010 Source of information http://en.wikipedia.org/wiki/Graphics_Processing_Unit 7 NVIDIA Products • NVIDIA Corp.
    [Show full text]
  • 750Ti Driver Download Geforce 335.23 Driver
    750ti driver download GeForce 335.23 Driver. This 335.23 Game Ready WHQL driver ensures you’ll have the best possible gaming experience for Titanfall. Performance Enhanced GPU clock offset options for GeForce GTX 750Ti / GTX 750 Diablo III – updated DX9 profile Bound by Flame – updated profile DOTA 2 – updated profile Need for Speed Rivals – updated DX11 profile Watch Dogs – updated profile Gaming Technology Supports GeForce ShadowPlay™ technology Supports GeForce ShadowPlay™ Twitch Streaming Supports NVIDIA GameStream™ technology Titanfall – rated “Good” Thief – rating now “Good” Call of Duty: Ghosts – in-depth laser sight added. GeForce GTX TITAN, GeForce GTX TITAN Black. GeForce 700 Series: GeForce GTX 780 Ti, GeForce GTX 780, GeForce GTX 770, GeForce GTX 760, GeForce GTX 760 Ti (OEM), GeForce GTX 750 Ti, GeForce GTX 750, GeForce GTX 745. GeForce 600 Series: GeForce GTX 690, GeForce GTX 680, GeForce GTX 670, GeForce GTX 660 Ti, GeForce GTX 660, GeForce GTX 650 Ti BOOST, GeForce GTX 650 Ti, GeForce GTX 650, GeForce GTX 645, GeForce GT 645, GeForce GT 640, GeForce GT 630, GeForce GT 620, GeForce GT 610, GeForce 605. GeForce 500 Series: GeForce GTX 590, GeForce GTX 580, GeForce GTX 570, GeForce GTX 560 Ti, GeForce GTX 560 SE, GeForce GTX 560, GeForce GTX 555, GeForce GTX 550 Ti, GeForce GT 545, GeForce GT 530, GeForce GT 520, GeForce 510. GeForce 400 Series: GeForce GTX 480, GeForce GTX 470, GeForce GTX 465, GeForce GTX 460 SE v2, GeForce GTX 460 SE, GeForce GTX 460, GeForce GTS 450, GeForce GT 440, GeForce GT 430, GeForce GT 420, GeForce 405.
    [Show full text]
  • Geforce Gt 610 Driver Download Geforce Gt 610 Driver Download
    geforce gt 610 driver download Geforce gt 610 driver download. Completing the CAPTCHA proves you are a human and gives you temporary access to the web property. What can I do to prevent this in the future? If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware. If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices. Another way to prevent getting this page in the future is to use Privacy Pass. You may need to download version 2.0 now from the Chrome Web Store. Cloudflare Ray ID: 67b128b3cd0fc3f7 • Your IP : 188.246.226.140 • Performance & security by Cloudflare. GeForce Game Ready Driver. Game Ready Drivers provide the best possible gaming experience for all major new releases, including Virtual Reality games. Prior to a new title launching, our driver team is working up until the last minute to ensure every performance tweak and bug fix is included for the best gameplay on day-1. Game Ready Provides the optimal gaming experience for Tom Clancy’s Ghost Recon Wildlands. Gaming Technology Includes DirectX 12 optimizations which provide additional performance increases for a variety of titles. NVIDIA TITAN X (Pascal), GeForce GTX TITAN X, GeForce GTX TITAN, GeForce GTX TITAN Black, GeForce GTX TITAN Z. GeForce 10 Series: GeForce GTX 1080 Ti, GeForce GTX 1080, GeForce GTX 1070, GeForce GTX 1060, GeForce GTX 1050 Ti, GeForce GTX 1050.
    [Show full text]
  • Several Tips on How to Choose a Suitable Computer
    Several tips on how to choose a suitable computer This document provides more specific information on how to choose a computer that will be suitable for scanning and post- processing of your data with Artec Studio. On our web-site you can find general requirements, in this document we’ll try to cover all of them and provide more detailed explanations. It also contains some tested configurations and some tips and tricks to make your hardware perform as fast as possible. 1 Table of Contents General requirements for hardware Processor 1.1. Recommended processors and microarchitecture 1.2. What about Xeon? 1.3. What about AMD processors? RAM USB 3.1. General recommendations 3.2. USB 3.0 3.3. Connecting several scanners/sensors to 1 computer Videocard 4.1. What is supported and what is definitely not supported? 4.2. What about Quadro cards? 4.3. What about SLI? 4.4. Do you support laptops with NVIDIA Optimus technology? 4.5. Stereo support 4.6. Real-time fusion requirements 4.7. GCTest OS 5.1. What is recommended and what is not supported? 5.2. Do you have a version for MacOS? Several tips to increase performance 6.1. Desktop machines 6.2. Laptops 6.3. Quadro cards Frequently asked questions 7.1. What laptops does Artec use? Can you recommend any particular model? 7.2. Can I test your software? 2 General requirements for hardware Processor: I5 or I7 recommended Memory: 8-12 Gb for Artec Eva / 12-16 Gb for Artec Spider USB: 1 USB 2.0 port for a regular scanner.
    [Show full text]
  • HP Z400 Workstation Overview
    QuickSpecs HP Z400 Workstation Overview HP recommends Windows Vista® Business 1. 3 External 5.25" Bays 2. Power Button 3. Front I/O: 2 USB 2.0, 1 IEEE 1394a (optional card required), Headphone, Microphone DA - 13276 North America — Version 4 — April 17, 2009 Page 1 QuickSpecs HP Z400 Workstation Overview 4. 3 External 5.25” Bays 9. Rear I/O: 6 USB 2.0, PS/2 keyboard/mouse 1 RJ-45 to Integrated Gigabit LAN 5. 4 DIMM Slots for DDR3 ECC Memory 1 Audio Line In, 1 Audio Line Out, 1 Microphone In 6. 2 Internal 3.5” Bays 10. 2 PCIe x16 Gen2 Slots 7. 475W, 85% efficient Power Supply 11.. 1 PCIe x4 Gen2, 1 PCIe x4 Gen1, 2 PCI Slots 8. Dual/Quad Core Intel 3500 Series Processors 12 4 Internal USB 2.0 ports Form Factor Convertible Minitower Compatible Operating Genuine Windows Vista® Business 32-bit* Systems Genuine Windows Vista® Business 64-bit* Genuine Windows Vista® Business 32-bit with downgrade to Windows® XP Professional 32-bit custom installed** (expected available until August 2009) Genuine Windows Vista® Business 64-bit with downgrade to Windows® XP Professional x64 custom installed** (expected available until August 2009) HP Linux Installer Kit for Linux (includes drivers for both 32-bit & 64-bit OS versions of Red Hat Enterprise Linux WS4 and WS5 - see: http://www.hp.com/workstations/software/linux) Novell Suse SLED 11 (expected availability May 2009) *Certain Windows Vista product features require advanced or additional hardware. See http://www.microsoft.com/windowsvista/getready/hardwarereqs.mspx and http://www.microsoft.com/windowsvista/getready/capable.mspx for details.
    [Show full text]
  • Parallel Nsight 1.5 + CUDA Toolkit 3.2 + Ecosytem Update As of Sept 2010
    High Performance Computing with CUDA™ Supercomputing 2010 Tutorial Cyril Zeller, NVIDIA Corporation Welcome! Goal: A detailed introduction to high performance computing with CUDA CUDA = NVIDIA’s architecture for GPU computing Outline: Motivation and introduction GPU programming basics Code analysis and optimization Lessons learned from production codes GPUs are Fast! 8x Higher Linpack Performance Performance / $ Performance / watt Gflops Gflops / $K Gflops / kwatt 750 656.1 70 60 800 600 60 656 50 600 450 40 400 300 30 20 150 11 200 146 80.1 10 0 0 0 CPU Server GPU-CPU CPU Server GPU-CPU CPU Server GPU-CPU Server Server Server CPU 1U Server: 2x Intel Xeon X5550 (Nehalem) 2.66 GHz, 48 GB memory, $7K, 0.55 kw GPU-CPU 1U Server: 2x Tesla C2050 + 2x Intel Xeon X5550, 48 GB memory, $11K, 1.0 kw Increasing Number of CUDA Applications 2010 Already Available Q2 Q3 Q4 CULA CUDA C/C++, Nsight Visual Allinea TotalView PGI Accelerator Tools & LAPACK Library PGI Fortran Studio IDE Debugger Debugger Enhancements Libraries Thrust: C++ Jacket: NPP Performance Platform Cluster Bright Cluster CAPS HMPP Mathworks Mathematica Template Lib MATLAB Plugin Primitives (NPP) Management Management Enhancements MATLAB Seismic Analysis: Seismic Analysis: Seismic City, Seismic Reservoir Reservoir Oil & Gas ffA, HeadWave Geostar Acceleware Interpretation Simulation 1 Simulation 2 Bio- AMBER, GROMACS, GROMOS, BigDFT, ABINIT, Quantum Chem Other Popular Quantum Chemistry HOOMD, LAMMPS, NAMD, VMD TeraChem Code 1 MD code Chem Code 2 Bio- Hex Protein CUDA-BLASTP, GPU-HMMER,
    [Show full text]