Nvidia's GPU Microarchitectures
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
CUDA by Example
CUDA by Example: AN INTRODUCTION TO GENERAL-PURPOSE GPU PROGRAMMING. JASON SANDERS, EDWARD KANDROT. Upper Saddle River, NJ • Boston • Indianapolis • San Francisco • New York • Toronto • Montreal • London • Munich • Paris • Madrid • Capetown • Sydney • Tokyo • Singapore • Mexico City. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. NVIDIA makes no warranty or representation that the techniques described herein are free from any Intellectual Property claims. The reader assumes all risk of any such claims based on his or her use of these techniques. The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact: U.S. Corporate and Government Sales (800) 382-3419 [email protected] For sales outside the United States, please contact: International Sales [email protected] Visit us on the Web: informit.com/aw Library of Congress Cataloging-in-Publication Data Sanders, Jason.
A Concurrent PASCAL Compiler for Minicomputers
Appendix A: DIFFERENCES BETWEEN UCSD'S PASCAL AND STANDARD PASCAL. The PASCAL language used in this book contains most of the features described by K. Jensen and N. Wirth in PASCAL User Manual and Report, Springer Verlag, 1975. We refer to the PASCAL defined by Jensen and Wirth as "Standard" PASCAL, because of its widespread acceptance even though no international standard for the language has yet been established. The PASCAL used in this book has been implemented at University of California San Diego (UCSD) in a complete software system for use on a variety of small stand-alone microcomputers. This will be referred to as "UCSD PASCAL", which differs from the standard by a small number of omissions, a very small number of alterations, and several extensions. This appendix provides a very brief summary of these differences. Only the PASCAL constructs used within this book will be mentioned herein. Documents are available from the author's group at UCSD describing UCSD PASCAL in detail. 1. CASE Statements: Jensen & Wirth state that if there is no label equal to the value of the case statement selector, then the result of the case statement is undefined. UCSD PASCAL treats this situation by leaving the case statement normally with no action being taken. 2. Comments: In UCSD PASCAL, a comment appears between the delimiting symbols "(*" and "*)". If the opening delimiter is followed immediately by a dollar sign, as in "(*$", then the remainder of the comment is treated as a directive to the compiler. The only compiler directive mentioned in this book is (*$G+*), which tells the compiler to allow the use of GOTO statements.
The Intro to GPGPU CPU Vs
The Intro to GPGPU. Dr. Chokchai (Box) Leangsuksun, PhD, Louisiana Tech University, Ruston, LA. 12/12/11
CPU vs. GPU
• CPU – Fast caches – Branching adaptability – High performance
• GPU – Multiple ALUs – Fast onboard memory – High throughput on parallel tasks
• Executes program on each fragment/vertex
• CPUs are great for task parallelism
• GPUs are great for data parallelism
(Supercomputing 2008 Education Program)
CPU vs. GPU – Hardware
• More transistors devoted to data processing (CUDA programming guide 3.1)
CPU vs. GPU – Computation Power [figure, from CUDA programming guide 3.1]
CPU vs. GPU – Memory Bandwidth [figure, from CUDA programming guide 3.1]
What is GPGPU?
• General Purpose computation using GPU in applications other than 3D graphics – GPU accelerates critical path of application
• Data parallel algorithms leverage GPU attributes – Large data arrays, streaming throughput – Fine-grain SIMD parallelism – Low-latency floating point (FP) computation
(© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, ECE 498AL, University of Illinois, Urbana-Champaign)
Why GPGPU?
• Large number of cores – 100-1000 cores in a single card
• Low cost – less than $100-$1500
• Green computing – Low power consumption – 135 watts/card – 135 W vs 30000 W (300 watts * 100)
• 1 card can perform > 100 desktops – $750 vs $50000 ($500 * 100)
Two major players
Parallel Computing on a GPU
• NVIDIA GPU Computing Architecture – Via a HW device interface – In laptops, desktops, workstations, servers
• Tesla T10 1070 from 1-4 TFLOPS
• AMD/ATI 5970 x2 3200 cores [figure label: ATI 4850]
• NVIDIA Tegra is an all-in-one (system-on-a-chip) processor architecture derived from the ARM family
• GPU parallelism is better than Moore's law, more doubling every year
• GPGPU is a GPU that allows user to process both graphics and non-graphics applications.
Units and Magnitudes (Lecture Notes)
physics 8.701 topic 2 Frank Wilczek Units and Magnitudes (lecture notes) This lecture has two parts. The first part is mainly a practical guide to the measurement units that dominate the particle physics literature, and culture. The second part is a quasi-philosophical discussion of deep issues around unit systems, including a comparison of atomic, particle ("strong") and Planck units. For a more extended, profound treatment of the second part issues, see arxiv.org/pdf/0708.4361v1.pdf . Because special relativity and quantum mechanics permeate modern particle physics, it is useful to employ units so that c = ħ = 1. In other words, we report velocities as multiples of the speed of light c, and actions (or equivalently angular momenta) as multiples of the rationalized Planck's constant ħ, which is the original Planck constant h divided by 2π. 27 August 2013 physics 8.701 topic 2 Frank Wilczek In classical physics one usually keeps separate units for mass, length and time. I invite you to think about why! (I'll give you my take on it later.) To bring out the "dimensional" features of particle physics units without excess baggage, it is helpful to keep track of powers of mass M, length L, and time T without regard to magnitudes, in the form $[c] = L/T$, $[\hbar] = ML^2/T$. When these are both set equal to 1, the M, L, T system collapses to just one independent dimension. So we can - and usually do - consider everything as having the units of some power of mass. Thus for energy we have $[E] = ML^2/T^2 \to M$, while for momentum $[p] = ML/T \to M$, and for length $[l] = L \to M^{-1}$, so that energy and momentum have the units of mass, while length has the units of inverse mass.
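The statement that length carries units of inverse mass can be made concrete with a small numerical sketch. The code below is an illustration only, not part of the lecture notes: it assumes the standard conversion constant ħc ≈ 197.327 MeV·fm, and the function names are hypothetical.

```python
# Sketch: converting between lengths and inverse energies when c = hbar = 1.
# Assumption (not from the excerpt): hbar*c = 197.3269804 MeV*fm.
HBAR_C_GEV_FM = 197.3269804 / 1000.0  # hbar*c expressed in GeV*fm


def length_fm_to_inverse_gev(length_fm):
    """Express a length given in femtometres as an inverse energy in GeV^-1."""
    return length_fm / HBAR_C_GEV_FM


def energy_gev_to_length_fm(energy_gev):
    """Length scale (in fm) associated with an energy given in GeV."""
    return HBAR_C_GEV_FM / energy_gev


one_fm_in_inverse_gev = length_fm_to_inverse_gev(1.0)
one_gev_as_length_fm = energy_gev_to_length_fm(1.0)
print(f"1 fm is about {one_fm_in_inverse_gev:.3f} GeV^-1")
print(f"1 GeV corresponds to about {one_gev_as_length_fm:.4f} fm")
```

The two helpers are inverses of each other up to the single constant ħc, which is the only conversion factor left once c and ħ are both set to 1.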
GPU Architecture • Display Controller • Designing for Safety • Vision Processing
i.MX 8 GRAPHICS ARCHITECTURE. FTF-DES-N1940. RAFAL MALEWSKI, HEAD OF GRAPHICS TECH ENGINEERING CENTER. MAY 18, 2016. PUBLIC USE #NXPFTF
AGENDA • i.MX 8 Series Scalability • GPU Architecture • Display Controller • Designing for Safety • Vision Processing
i.MX is…
SoC Scalability = Investment Re-Use: replace the chip, increase capability (pin, software, IP compatibility among parts).
i.MX 8 Series (ARM cores: Cortex-A53 | Cortex-A72; pin compatibility and software compatibility across parts)
• 8QuadMax: Dual Core GPU; 16 Vec4 Shaders; up to 128 GFLOPS; 64 execution units; Tessellation/Geometry Shaders; up to 4 displays; accelerated vision
• 8QuadPlus: Dual Core GPU; 8 Vec4 Shaders; up to 64 GFLOPS; 32 execution units; Tessellation/Geometry Shaders; up to 4 displays; accelerated vision
• 8Quad: Dual Core GPU; 8 Vec4 Shaders; up to 64 GFLOPS; 32 execution units; Tessellation/Geometry Shaders; up to 4 displays; accelerated vision
• 8Dual / 8Solo: Single Core GPU; 8 Vec4 Shaders; up to 64 GFLOPS; 32 execution units; Tessellation/Geometry Shaders; up to 2 displays (2x 1080p total pixels); accelerated vision
• 8DualLite / 8Solo: Single Core GPU; 4 Vec4 Shaders; up to 32 GFLOPS; 16 execution units; Tessellation/Geometry Shaders; up to 2 displays (2x 1080p total pixels); accelerated vision
i.MX 8 Series – The Doubles • Dual Core GPU; 16 Vec4 Shaders; up to 4 displays; accelerated vision
NVIDIA Quadro Technical Specifications
NVIDIA Quadro Technical Specifications
The NVIDIA Quadro® family of professional solutions for workstations delivers the fastest application performance and the highest quality graphics. Raw performance and quality are only the beginning. The NVIDIA
In addition to a full line up of 2D and 3D workstation graphics solutions, the NVIDIA Quadro professional products include a set of specialty solutions that have been architected to meet the needs of a wide range of industry professionals. These specialty
NVIDIA Quadro Workstation GPU
• Full 128-bit floating point precision pipeline
• 12-bit subpixel precision
• Hardware-accelerated antialiased points and lines
• Hardware OpenGL overlay planes
• Hardware-accelerated two-sided lighting
• Hardware-accelerated clipping planes
• Third-generation occlusion culling
• 16 textures per pixel
• OpenGL quad-buffered stereo (3-pin sync connector)
High-resolution Antialiasing
• Up to 16x full-scene antialiasing (FSAA), at resolutions up to 1920 x 1200
• 12-bit subpixel sampling precision enhances AA quality
• Rotated-grid FSAA significantly increases color accuracy and visual quality for edges, while maintaining performance (footnote 3)
Memory
• High-speed memory (up to 512MB GDDR3)
• Advanced lossless compression
Applications
° Dassault CATIA ° ESRI ArcGIS ° ICEM Surf ° MSC.Nastran, MSC.Patran ° PTC Pro/ENGINEER Wildfire, 3Dpaint, CDRS ° SolidWorks ° UDS NX Series, I-deas, SolidEdge, Unigraphics, SDRC and many more…
• Digital Content Creation (DCC) ° Alias Maya, MOTIONBUILDER ° NewTek Lightwave 3D ° Autodesk Media and Entertainment
Guide for the Use of the International System of Units (SI)
Guide for the Use of the International System of Units (SI). NIST Special Publication 811, 2008 Edition. Ambler Thompson, Technology Services, and Barry N. Taylor, Physics Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899. (Supersedes NIST Special Publication 811, 1995 Edition, April 1995.) March 2008. U.S. Department of Commerce, Carlos M. Gutierrez, Secretary. National Institute of Standards and Technology, James M. Turner, Acting Director. Natl. Inst. Stand. Technol. Spec. Publ. 811, 2008 Ed., 85 pages (March 2008; 2nd printing November 2008) CODEN: NSPUE3. Note on 2nd printing: This 2nd printing dated November 2008 of NIST SP811 corrects a number of minor typographical errors present in the 1st printing dated March 2008. Preface: The International System of Units, universally abbreviated SI (from the French Le Système International d’Unités), is the modern metric system of measurement. Long the dominant measurement system used in science, the SI is becoming the dominant measurement system used in international commerce. The Omnibus Trade and Competitiveness Act of August 1988 [Public Law (PL) 100-418] changed the name of the National Bureau of Standards (NBS) to the National Institute of Standards and Technology (NIST) and gave to NIST the added task of helping U.S.
Directx and GPU (Nvidia-Centric) History Why
XNA and Programmable Shaders. Prof. Hsien-Hsin Sean Lee, School of Electrical and Computer Engineering, Georgia Institute of Technology. 10/12/09
DirectX and GPU (Nvidia-centric) History [timeline figure, adapted from David Kirk's slide; years marked: 1996, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2006, 2008, 2009]
• DirectX 2
• DirectX 5: Riva 128
• DirectX 6 (Multitexturing): Riva TNT (NV4)
• DirectX 7 (T&L): GeForce 256 (NV10)
• DirectX 8 (SM 1.x): GeForce3 (NV20)
• DirectX 9 (SM 2.0): GeForceFX (NV30), Cg
• DirectX 9.0c (SM 3.0): GeForce 6 (NV40)
• DirectX 10 (SM 4.0): GeForce 8 (G80), NVidia's Unified Shader Model
• GeForce 9; GTX200 / GT 200 (dual x 1.4 billion transistors, ~1.2GHz); GT 300
• 3dfx: first Voodoo chip; 3dfx's response to Voodoo2; 3dfx demise
Why Programmable Shaders
• Hardwired pipeline – Produces limited effects – Effects look the same – Gamers want unique look-n-feel – Multi-texturing somewhat alleviates this, but not enough – Less interoperable, less portable
• Programmable Shaders – Vertex Shader – Pixel or Fragment Shader – Starting from DX 8.0 (assembly)
Evolution of Graphics Processing Units
• Pre-GPU – Video controller – Dumb frame buffer
• First generation GPU – PCI bus – Rasterization done on GPU – ATI Rage, Nvidia TNT2, 3dfx Voodoo3 (‘96)
• Second generation GPU – AGP – Include T&L into GPU (D3D 7.0) – Nvidia GeForce 256 (NV10), ATI Radeon 7500, S3 Savage3D (’98)
• Third generation GPU – Programmable vertex and fragment shaders (D3D 8.0, SM1.0) – Nvidia GeForce3, ATI Radeon 8500, Microsoft Xbox (’01)
• Fourth generation GPU – Programmable vertex and fragment shaders
Multidisciplinary Design Project Engineering Dictionary Version 0.0.2
Multidisciplinary Design Project Engineering Dictionary Version 0.0.2 February 15, 2006. DRAFT. Cambridge-MIT Institute Multidisciplinary Design Project. This Dictionary/Glossary of Engineering terms has been compiled to complement the work developed as part of the Multi-disciplinary Design Project (MDP), which is a programme to develop teaching material and kits to aid the running of mechatronics projects in Universities and Schools. The project is being carried out with support from the Cambridge-MIT Institute undergraduate teaching programme. For more information about the project please visit the MDP website at http://www-mdp.eng.cam.ac.uk or contact:
Dr. Peter Long, Cambridge University Engineering Department, Trumpington Street, Cambridge. CB2 1PZ. e-mail: [email protected] tel: +44 (0) 1223 332779
Prof. Alex Slocum, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge MA 02139-4307, USA. e-mail: [email protected] tel: +1 617 253 0012
For information about the CMI initiative please see the Cambridge-MIT Institute website: http://www.cambridge-mit.org
CMI, University of Cambridge, 10 Miller’s Yard, Mill Lane, Cambridge. CB2 1RQ. tel: +44 (0) 1223 327207 fax: +44 (0) 1223 765891
CMI, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge MA 02139-4307, USA. tel. +1 617 253 7732 fax. +1 617 258 8539
CMI-MDP Programme. 1 Introduction. This dictionary/glossary has not been developed as a definitive work but as a useful reference book for engineering students to search when looking for the meaning of a word/phrase. It has been compiled from a number of existing glossaries together with a number of local additions.
MSI Afterburner V4.6.4
MSI Afterburner v4.6.4. MSI Afterburner is the ultimate graphics card utility, co-developed by MSI and RivaTuner teams. Please visit https://msi.com/page/afterburner to get more information about the product and download new versions.
Contents:
SYSTEM REQUIREMENTS
FEATURES
KNOWN LIMITATIONS
REVISION HISTORY
VERSION 4.6.4
VERSION 4.6.3 (PUBLISHED ON 03.03.2021)
VERSION 4.6.2 (PUBLISHED ON 29.10.2019)
VERSION 4.6.1 (PUBLISHED ON 21.04.2019)
VERSION 4.6.0 (PUBLISHED ON
Application Note to the Field Pumping Non-Newtonian Fluids with Liquiflo Gear Pumps
Application Note to the Field: Pumping Non-Newtonian Fluids with Liquiflo Gear Pumps. Application Note Number: 0104-2. Date: April 10, 2001; Revised Jan. 2016. Newtonian vs. non-Newtonian Fluids: Fluids fall into one of two categories: Newtonian or non-Newtonian. A Newtonian fluid has a constant viscosity at a particular temperature and pressure, and that viscosity is independent of shear rate. A non-Newtonian fluid has viscosity that varies with shear rate. The apparent viscosity is a measure of the resistance to flow of a non-Newtonian fluid at a given temperature, pressure and shear rate. Newton’s Law states that shear stress ($\tau$) is equal to the dynamic viscosity ($\mu$) multiplied by the shear rate ($\dot{\gamma}$): $\tau = \mu \dot{\gamma}$. A fluid which obeys this relationship, where $\mu$ is constant, is called a Newtonian fluid. Therefore, for a Newtonian fluid, shear stress is directly proportional to shear rate. If, however, $\mu$ varies as a function of shear rate, the fluid is non-Newtonian. In the SI system, the unit of shear stress is pascals (Pa = N/m$^2$), the unit of shear rate is hertz or reciprocal seconds (Hz = 1/s), and the unit of dynamic viscosity is pascal-seconds (Pa-s). In the cgs system, the unit of shear stress is dynes per square centimeter (dyn/cm$^2$), the unit of shear rate is again hertz or reciprocal seconds, and the unit of dynamic viscosity is poises (P = dyn-s/cm$^2$). To convert the viscosity units from one system to another, the following relationship is used: 1 cP = 1 mPa-s. Pump shaft speed is normally measured in RPM (rev/min).
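The relation between shear stress, viscosity and shear rate, and the cP to Pa-s conversion quoted above, can be sketched in a few lines. This is a minimal illustration: the function names are hypothetical, the 1 cP example viscosity is an assumed round number, and the shear-thinning data points are invented purely to show how an apparent viscosity is computed.

```python
# Sketch: Newton's law of viscosity, stress = viscosity * shear rate, in SI units.

def centipoise_to_pa_s(viscosity_cp):
    """Convert centipoise to pascal-seconds, using 1 cP = 1 mPa-s."""
    return viscosity_cp * 1e-3


def shear_stress_pa(viscosity_pa_s, shear_rate_per_s):
    """Shear stress (Pa) of a Newtonian fluid at the given shear rate (1/s)."""
    return viscosity_pa_s * shear_rate_per_s


def apparent_viscosity_pa_s(stress_pa, shear_rate_per_s):
    """Apparent viscosity (Pa-s): measured stress divided by shear rate."""
    return stress_pa / shear_rate_per_s


# Newtonian case: an assumed 1 cP fluid sheared at 100 1/s.
newtonian_stress = shear_stress_pa(centipoise_to_pa_s(1.0), 100.0)
print(f"Newtonian stress: {newtonian_stress:.3f} Pa")

# Hypothetical (made-up) readings as (shear rate in 1/s, stress in Pa) pairs.
# The ratio falls as the rate rises, so the fluid is non-Newtonian.
readings = [(10.0, 5.0), (100.0, 20.0)]
apparent = [apparent_viscosity_pa_s(stress, rate) for rate, stress in readings]
print("Apparent viscosities (Pa-s):", apparent)
```

For a Newtonian fluid the same calculation would return one constant value at every shear rate, which is the practical test the definition implies.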
The Decibel Scale R.C
The Decibel Scale. R.C. Maher, Fall 2014. It is often convenient to compare two quantities in an audio system using a proportionality ratio. For example, if a linear amplifier produces 2 volts (V) output amplitude when its input amplitude is 100 millivolts (mV), the voltage gain is expressed as the ratio of output/input: 2V/100mV = 20. As long as the two quantities being compared have the same units (volts in this case), the proportionality ratio is dimensionless. If the proportionality ratios of interest end up being very large or very small, such as $2\times10^{5}$ and $2.5\times10^{-4}$, manipulating and interpreting the results can become rather unwieldy. In this situation it can be helpful to compress the numerical range by taking the logarithm of the ratio. It is customary to use a base-10 logarithm for this purpose. For example, $\log_{10}(2\times10^{5}) = 5 + \log_{10}(2) = 5.301$ and $\log_{10}(2.5\times10^{-4}) = -4 + \log_{10}(2.5) = -3.602$. If the quantities in the proportionality ratio both have the units of power (e.g., watts), or intensity (watts/m$^2$), then the base-10 logarithm $\log_{10}(\mathrm{power}_1/\mathrm{power}_0)$ is expressed with the unit bel (symbol: B), in honor of Alexander Graham Bell (1847-1922). The decibel is a unit representing one tenth (deci-) of a bel. Therefore, a figure reported in decibels is ten times the value reported in bels. The expression for a proportionality ratio expressed in decibel units (symbol dB) is: $\mathrm{dB} \equiv 10 \cdot \log_{10}(\mathrm{power}_1/\mathrm{power}_0)$. Common Usage: The power dissipated in a resistance R ohms can be expressed as $V^2/R$, where V is the voltage across the resistor.
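The logarithm examples and the decibel definition above can be checked numerically. The sketch below is illustrative only: the function name is hypothetical, and the last step assumes the amplifier's input and output voltages appear across equal resistances, so that the power ratio is the square of the voltage ratio (from power = V²/R).

```python
import math


def power_ratio_to_db(power_1, power_0):
    """Decibel value of a power ratio: 10 * log10(power_1 / power_0)."""
    return 10.0 * math.log10(power_1 / power_0)


# The two worked logarithms from the text.
log_large = math.log10(2e5)
log_small = math.log10(2.5e-4)
print(f"log10(2e5) = {log_large:.3f}, log10(2.5e-4) = {log_small:.3f}")

# A value in decibels is ten times the value in bels.
db_large = power_ratio_to_db(2e5, 1.0)
print(f"Power ratio 2e5 = {db_large:.2f} dB")

# Voltage gain of 20 (2 V out for 100 mV in). Assuming equal resistances,
# power scales as V**2, so the power ratio is 20**2 = 400.
voltage_gain = 2.0 / 0.1
db_gain = power_ratio_to_db(voltage_gain ** 2, 1.0)
print(f"Voltage gain {voltage_gain:.0f} = {db_gain:.2f} dB")
```

Squaring the voltage ratio before taking the logarithm is equivalent to multiplying the logarithm of the voltage ratio by 20 instead of 10, which holds only under the equal-resistance assumption stated above.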