Processeurs Graphiques  Partie 2 : Fonctionnement Interne Du GPU

Total Page:16

File Type:pdf, Size:1020Kb

Processeurs Graphiques  Partie 2 : Fonctionnement Interne Du GPU École thématique ARCHI 07 Plan 19-23 mars 2007 Partie 1 : Introduction Processeurs Graphiques Partie 2 : Fonctionnement interne du GPU Partie 3 : Utilisation du GPU David Defour DALI, Université de Perpignan École thématique ARCHI07 mars 2007 2/105 Partie 1 : Introduction Puissance de calcul flottant 32 bits Le calcul ne coûte pas chère Type d’application Place du GPU dans le système Présentation du pipeline graphique École École thématique thématique ARCHI07 ARCHI07 mars 2007 mars 2007 Source : “A Survey of General-Purpose Computation on Graphics Hardware”, J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn, and T. J. Purcell 3/105 4/105 Pentium III, 256Ko cache, 0.18 µµµm, 28.1 M Aujourd’hui le calcul ne coûte pas cher… transistors, 90 mm² 2.5 MM² Utilisation du silicium : Communication ( Bus ) Contrôle ( Exécution d’instruction, ooo ) Mémoire ( Cache, registres ) Calcul ( Unités de calcul ) École École thématique Si l’on ramène tout ça à la taille d’une puce … thématique ARCHI07 ARCHI07 mars 2007 mars 2007 Source : www.sandpile.org 5/105 6/105 Pentium M, 2Mo cache, 65nm, 151.6 M transitors, Concrètement … 91 mm² Le calcul ne coûte pas cher 0.9mm FPU du Pentium III : ~2% Puce en 65nm FPU de l’Intel Core 2 : 64-bit FPU <1% La mémoire augmente mais la latence aussi Pentium III, cache L2 : 256Ko / 4 cycles Intel Core 2, cache L2 : 2Mo / 14 cycles Le transfert de donnée devient très coûteux École École 0.18 µm : 12.1 ps/mm thématique thématique 65 nm : ~30-60 ps/mm ARCHI07 ARCHI07 Source : www.sandpile.org mars 2007 mars 2007 10mm 7/105 8/105 Introduction Comment calculer plus ? Utiliser les transistors disponibles pour calculer plus et plus vite 2 approches : 1. Mécanismes pour exploiter le parallélisme de données Pourquoi plus de calcul ? • Jeux d’instructions SIMD (MMX, SEE, SEE2) Cryptographie • Opérations entières (XOR, <<, …) • Opérations sur les grands entiers 2. Architectures dédiées • Physics Processing Unit (Ageia) Calcul scientifique • Processeurs réseaux (AMD, Infineon, Motorola, …) • Multiplication matricielle • Simulation • Cell (IBM, Sony, Toshiba) • Processeurs graphiques (ATI/AMD, Nvidia) Multimédia École • Traitement de l’image École thématique • Traitement du son thématique ARCHI07 • Traitement de l’information ARCHI07 mars 2007 mars 2007 9/105 10/105 Utilisation de la puissance de calcul : Films Applications des GPU: Jeux École École thématique thématique ARCHI07 ARCHI07 mars 2007 mars 2007 11/105 12/105 Applications des GPU : Visualisation, CAD Applications des GPU : Réalité virtuelle École École thématique thématique ARCHI07 ARCHI07 mars 2007 mars 2007 13/105 14/105 Applications des GPU : Calcul scientifique GPU comme coprocesseur Exemples de speed-up par rapport au CPU : FFT : x 4 « High Performance GPU-based FFT Library », Govindaraiu et al. Décomposition LU : x3 - 5 « LU-GPU: Algorithms for dense Linear Systems on Graphics Hardware », Galoppo et al. Finite Difference Time Domain(FDTD, maxwell) : x30 - 50 http://www.emphotonics.com/fastfdtd.html Recherche, tri dans base de données : x4 « GPUTeraSort: High Performance Graphics Coprocessor Sorting for Large Database Management », Govindaraiu et al. École École Pliage de protéine : x 40 thématique thématique ARCHI07 ARCHI07 http://folding.stanford.edu/ mars 2007 mars 2007 15/105 16/105 GPU dans le système Bus : les autoroutes de la carte mère PCI Carte Vidéo PCI cadencé à 33 MHz en 32 bit : 125 Mo/s maximum Mémoire PCI cadencé à 33 MHz en 64 bit : 250 Mo/s maximum vidéo PCI cadencé à 66 MHz en 32 bit : 250 Mo/s maximum PCI cadencé à 66 MHz en 64 bit : 500 Mo/s maximum GPU AGP AGP 1x (250 Mo par seconde), AGP 2x (500 Mo par seconde, deux transferts par cycle au lieu d'un) Northbridge AGP 4x (1 Go par seconde), + Bus Système AGP 8x (2 Go/s maximum) (PCI,AGP,PCIe…) PCI Express PCI Express 1X (250 Mo/s) Carte Mère PCI Express 32X (8 Go/s ) École CPU École thématique thématique HyperTransport ARCHI07 ARCHI07 HyperTransport 2.0 (22,4 Go/s, 1,6 GHz) Mémoire système HyperTransport 3.0 (41,6 Go/s, 2,6 Ghz) mars 2007 mars 2007 17/105 18/105 Communications entre CPU et GPU Le pipeline graphique Comment passer d’ici à là ? Application Commandes Géométrie Rasterization École École Pixel thématique thématique ARCHI07 ARCHI07 mars 2007 mars 2007 Écran 19/105 20/105 Le pipeline graphique Le pipeline graphique Application Application Gears of war : Définition du jeu Commandes Commandes Bus CPU Réseaux Système GPU Interface avec l’utilisateur Géométrie Géométrie Physiques IA Rasterization Rasterization École Pixel École Pixel thématique thématique ARCHI07 ARCHI07 mars 2007 Écran mars 2007 Écran 21/105 22/105 Le pipeline graphique Le pipeline graphique Application Application Gears Of War : Commandes Commandes Envoi des commandes DirectX Géométrie Géométrie Drivers DirectX : GPU : Traitement des commandes Transformation des Sommets Rasterization DirectX Rasterization Calcul de la lumière Envoi des instructions au GPU Elimination des objets en dehors de École Pixel École Pixel la scène thématique thématique Assemblage des primitives ARCHI07 ARCHI07 mars 2007 Écran mars 2007 Écran 23/105 24/105 Le pipeline graphique Le pipeline graphique Application Application Commandes Commandes Géométrie Géométrie Rasterization GPU : Rasterization Conversion des triangles en Pixel GPU : École Pixel Interpolation des coordonnées de École Pixel Application de la texture thématique texture thématique ARCHI07 ARCHI07 Test de profondeur Interpolation de couleur mars 2007 Écran mars 2007 Écran Mélange des couleurs 25/105 26/105 Le pipeline graphique Pipeline graphique traditionnel Application Application Fonctions pré-définies Configurable en utilisant les états OpenGL / DirectX Commandes Vertex Processor Fonctionnalités limitées Géométrie Rasterization Rasterization Pixel École Pixel École Processor thématique thématique ARCHI07 ARCHI07 mars 2007 Écran mars 2007 Écran 27/105 28/105 Nouveau Pipeline graphique Vertex Shaders Application Application Vertex Vertex Processor Processor Programmable Les vertex shaders permettent d’animer/modifier les caractéristiques des sommets Rasterization Rasterization Calcul de la lumière Pixel Pixel Processor Processor École Programmable École thématique thématique Ils sont flexibles et rapides ARCHI07 ARCHI07 mars 2007 Écran Écran mars 2007 29/105 30/105 Pixel Shaders Chaque pixel est calculé individuellement PARTIE 2 : Fonctionnement interne du GPU École Les pixels shaders ont une vue partielle des pixels voisins. École thématique thématique ARCHI07 ARCHI07 mars 2007 mars 2007 31/105 32/105 Partie 2 : Fonctionnement interne du GPU Bloc diagramme d’un GPU Commandes Command & data fetch 1. Données Vertex Shader Géométrie 2. Tesselation Cull/Clip/Setup 3. Géométrie, lumière et vertex shader Rasterization Z-Cull Rasterization 4. Clipping, culling Shared 5. Tramage L2 Pixel Pixel Shader texture 6. Pixel Shader Cache Fragment pixel crossbar 7. Test alpha 8. Brouillard Z-compare & Blend École École thématique thématique Memory Memory Memory Memory ARCHI07 ARCHI07 partition partition partition partition mars 2007 mars 2007 Écran 128 Mo 128 Mo 128 Mo 128 Mo GDDR 3 GDDR GDDR 3 GDDR 3 GDDR 33/105 3 GDDR 34/105 Vertex / Vertices Données primitives Permettent de créer des formes géométriques Formes géométriques simples Vertex {x, y, z, w} Point (effet de particule) Coordonnées finales : {x/w, y/w, z/w} W est le caractère homogène généralement = 1 Ligne + d’autres informations Couleur diffuse : {r,g,b,a} Triangle Couleur du vertex lorsque celui-ci est éclairé par une lumière blanche Couleur spéculaire : Couleur réfléchie par le vertex Triangle fan Normale à la surface : Permet le calcul d’éclairage Triangle strip École Coordonnées de textures (u, v) École thématique thématique ARCHI07 Taille mémoire croissante ~ jusqu’à 256 octets ARCHI07 Tampons d’indexation mars 2007 {x,y,z,w} : 16, {r,g,b,a} : 4, normale : 12 … mars 2007 Réutilisation des vertex en utilisant des indices 35/105 36/105 Format de représentation des données 2. Tesselation Virgule fixe 123.45678 Principe : Augmenter le nombre de vertices pour augmenter la finesse de 8, 16, 32 la surface. Signé, non signé Compression de plusieurs données dans un seul nombre Interpolation pour arriver à une résolution donnée (combinaison de 5,6,5 ; 1,5,5,5 ; …) Dépend de la position de l’objet (près, distant) Virgule flottante 1.12345678 e 2 Calcul à partir de la Position du vertex + normale fp16, fp24, fp32 Support partiel de la norme IEEE-754 École École thématique thématique ARCHI07 ARCHI07 mars 2007 mars 2007 Tesselation point-normal d’ATI Tesselation d’un modèle complexe 37/105 38/105 3. Géométrie, lumière et vertex shader Translation Objectif : Transformations géométriques + Lumière Rotation, translation Transformation de système de coordonnées Changer la forme d’un objet Changer la position d’un objet dans une scène Dupliquer des objets Calcul de projection pour les caméras virtuelles Animations Solutions : Vertex processor fixe École Vertex processor programmable École thématique thématique ARCHI07 ARCHI07 mars 2007 mars 2007 39/105 40/105 Rotation Mise à l’échelle Uniforme Déformante École École thématique thématique ARCHI07 ARCHI07 mars 2007 mars 2007 41/105 42/105 Lumière : Diffusion Lumière : Réflexion (miroir) Loi de Lambert : L’intensité lumineuse réfléchie est proportionnelle au cosinus de Miroir parfait : l’angle d’incidence de la lumière θθθl = θθθr Calcul : Model d’illumination
Recommended publications
  • HELLAS: a Specialized Architecture for Interactive Deformable Object Modeling
    HELLAS: A Specialized Architecture for Interactive Deformable Object Modeling Shrirang Yardi Benjamin Bishop Thomas Kelliher Department of Electrical and Department of Computing Department of Mathematics Computer Engineering Sciences and Computer Science Virginia Tech University of Scranton Goucher College Blacksburg, VA Scranton, PA Baltimore, MD [email protected] [email protected] [email protected] ABSTRACT more realistic models, it is likely that the computational Applications involving interactive modeling of deformable resources required by these simulations may continue to ex- objects require highly iterative, floating-point intensive nu- ceed those provided by the incremental advances in general- merical simulations. As the complexity of these models in- purpose systems [12]. This has created an interest in the creases, the computational power required for their simula- use of specialized hardware that provides high performance tion quickly grows beyond the capabilities of current gen- computational engines to accelerate such floating-point in- eral purpose systems. In this paper, we present the design tensive applications [6, 7]. of a low–cost, high–performance, specialized architecture to Our research focuses on the design of such specialized sys- accelerate these simulations. Our aim is to use such special- tems. However, our approach has several important differ- ized hardware to allow complex interactive physical model- ences as compared to previous work in this area: (i) we ing even on consumer-grade PCs. In this paper, we present specifically target algorithms that are numerically stable, details of the target algorithms, the HELLAS architecture, but are much more computationally intensive than those simulation results and lessons learned from our implemen- used by existing hardware implementations [3,6]; (ii) we aim tation.
    [Show full text]
  • Multiprocessing Contents
    Multiprocessing Contents 1 Multiprocessing 1 1.1 Pre-history .............................................. 1 1.2 Key topics ............................................... 1 1.2.1 Processor symmetry ...................................... 1 1.2.2 Instruction and data streams ................................. 1 1.2.3 Processor coupling ...................................... 2 1.2.4 Multiprocessor Communication Architecture ......................... 2 1.3 Flynn’s taxonomy ........................................... 2 1.3.1 SISD multiprocessing ..................................... 2 1.3.2 SIMD multiprocessing .................................... 2 1.3.3 MISD multiprocessing .................................... 3 1.3.4 MIMD multiprocessing .................................... 3 1.4 See also ................................................ 3 1.5 References ............................................... 3 2 Computer multitasking 5 2.1 Multiprogramming .......................................... 5 2.2 Cooperative multitasking ....................................... 6 2.3 Preemptive multitasking ....................................... 6 2.4 Real time ............................................... 7 2.5 Multithreading ............................................ 7 2.6 Memory protection .......................................... 7 2.7 Memory swapping .......................................... 7 2.8 Programming ............................................. 7 2.9 See also ................................................ 8 2.10 References .............................................
    [Show full text]
  • Graphics Hardware
    Department of Computer Engineering Graphics Hardware Ulf Assarsson Graphics hardware – why? l About 100x faster! l Another reason: about 100x faster! l Simple to pipeline and parallelize l Current hardware based on triangle rasterization with programmable shading (e.g., OpenGL acceleration) l Ray tracing: there are research architetures, and few commercial products – Renderdrive, RPU, (Gelato), NVIDIA OptiX 2 3 Perspective-correct texturing l How is texture coordinates interpolated over a triangle? l Linearly? Linear interpolation Perspective-correct interpolation l Perspective-correct interpolation gives foreshortening effect! l Hardware does this for you, but you need to understand this anyway! 4 5 Recall the following l Before projection, v, and after p (p=Mv) l After projection pw is not 1! l Homogenization: (px /pw , py /pw , pz /pw , 1) l Gives (px´, py ´ , pz´ , 1) 6 Texture coordinate interpolation l Linear interpolation does not work l Rational linear interpolation does: – u(x)=(ax+b) / (cx+d) (along a scanline where y=constant) – a,b,c,d are computed from triangle’s vertices (x,y,z,w,u,v) l Not really efficient l Smarter: – Compute (u/w,v/w,1/w) per vertex – These quantities can be linearly interpolated! – Then at each pixel, compute 1/(1/w)=w – And obtain: (w*u/w,w*v/w)=(u,v) – The (u,v) are perspectively-correct interpolated l Need to interpolate shading this way too – Though, not as annoying as textures l Since linear interpolation now is OK, compute, e.g., Δ(u/w)/ Δx, and use this to update u/w when stepping in the x- direction (similarly for other parameters) 7 Put differently: l Linear interpolation in screen space does not work for u,v l Solution: – We have applied a non-linear transform to each vertex (x/w, y/w, z/w).
    [Show full text]
  • Corporate Backgrounder
    AGEIA PhysX FAQs for System and Board Partners For use in partner marketing materials KEY QUESTIONS 1: What exactly is the AGEIA PhysX Processor? 2: What is the Gaming Power Triangle? 3: What is physics for gaming? 4: What is a physics processor? 5: What makes the AGEIA PhysX hardware special? 6: Why do I need physics in my game? 7: Isn’t physics just juiced up graphics? 8: Can I get PhysX enabled games today? 9: Why do I need dedicated hardware for physics? 10: How do I get the new AGEIA PhysX processor into my PC? 11: How much does AGEIA PhysX cost? 12: How do games get PhysX-optimized? 13: Do I have to buy new hardware every year just to get new physics features? AGEIA Confidential March 2006 . 1 What exactly is the AGEIA PhysX Processor? The AGEIA PhysX processor is the first dedicated hardware accelerator for PC games that enables dynamic motion and interaction for a totally new gaming experience. Exciting immersive physical gaming feature that you can expect to see in next generation cutting edge titles will include groundbreaking physics features such as: * Explosions that cause dust and collateral debris * Characters with complex, jointed geometries for more life-like motion and interaction * Spectacular new weapons with unpredictable effects * Cloth that drapes and tears the way you expect it to * Lush foliage that sways naturally when brushed against * Dense smoke & fog that billow around objects in motion What is the Gaming Power Triangle The gaming triangle represents the three critical PC processors that enable the best possible gaming experience.
    [Show full text]
  • Introduction
    A D V A N C E D C O M P U T E R G R A P H I C S Introduction (Thanks to Professions Andries van Dam and John Hughes) CMSC 635 January 15, 2013 Introduction ‹#›/16 A D V A N C E D C O M P U T E R G R A P H I C S What is Computer Graphics? • Computer graphics generally means creation, storage and manipulation of models and images • Such models come from diverse and expanding set of fields including physical, mathematical, artistic, biological, and even conceptual (abstract) structures Frame from animation by William Latham, shown at SIGGRAPH 1992. Latham uses rules that govern patterns of natural forms to create his artwork. CMSC 635 January 15, 2013 Introduction ‹#›/16 A D V A N C E D C O M P U T E R G R A P H I C S Modeling vs. Rendering • Modeling – Create models – Apply materials to models – Place models around scene – Place lights in scene – Place the camera • Rendering – Take “picture” with camera • Both can be done by modern commercial software: – Autodesk MayaTM ,3D Studio MaxTM, BlenderTM, etc. Point Light Spot Light Directional Light Ambient Light CS128 lighting assignment by Patrick Doran, Spring 2009 CMSC 635 January 15, 2013 Introduction ‹#›/16 A D V A N C E D C O M P U T E R G R A P H I C S Solved problem: Rendering Avatar 2009 CMSC 635 January 15, 2013 Introduction ‹#›/16 A D V A N C E D C O M P U T E R G R A P H I C S What is Interactive Computer Graphics? (1/2) • User controls contents, structure, and appearance of objects and their displayed images via rapid visual feedback • Basic components of an interactive graphics system – input (e.g., mouse, tablet and stylus, multi-touch…) – processing (and storage) – display/output (e.g., screen, paper-based printer, video recorder…) • First truly interactive graphics system, Sketchpad, pioneered at MIT by Ivan Sutherland for his 1963 Ph.D.
    [Show full text]
  • Multicore: Why Is It Happening Now? Eller Hur Mår Moore’S Lag?
    Multicore: Why is it happening now? eller Hur Mår Moore’s Lag? Erik Hagersten Uppsala Universitet Are we hitting the wall now? Pop: Can the transistors be made even smaller and faster? Performance Possible path, [log] but requires a paradigm shift 1000 ? Business as 100 ” w usual… La s re o Well uhm … 10 o M “ The transistors can p: Po be made smaller and PDC 1 Summer faster, but there are School other problems / 2010 Institutionen för informationsteknologi | www.it.uu.seNowMC 2 ©Erik Hagersten| user.it.uu.se/~ehYear Darling, I shrunk the computer Mainframes Super Minis: Microprocessor: Mem Multicore: Many CPUs on a Mem PDC Summer chip! School 2010 Institutionen för informationsteknologi | www.it.uu.se MC 3 ©Erik Hagersten| user.it.uu.se/~eh Multi-core CPUs: Ageia PhysX, a multi-core physics processing unit. Ambric Am2045, a 336-core Massively Parallel Processor Array (MPPA) AMD Athlon 64, Athlon 64 FX and Athlon 64 X2 family, dual-core desktop processors. Opteron, dual- and quad-core server/workstation processors. Phenom, triple- and quad-core desktop processors. Sempron X2, dual-core entry level processors. Turion 64 X2, dual-core laptop processors. Radeon and FireStream multi-core GPU/GPGPU (10 cores, 16 5-issue wide superscalar stream processors per core) ARM MPCore is a fully synthesizable multicore container for ARM9 and ARM11 processor cores, intended for high-performance embedded and entertainment applications. Azul Systems Vega 2, a 48-core processor. Broadcom SiByte SB1250, SB1255 and SB1455. Cradle Technologies CT3400 and CT3600, both multi-core DSPs. Cavium Networks Octeon, a 16-core MIPS MPU.
    [Show full text]
  • Gputerasort: High Performance Graphics Co-Processor Sorting For
    GPUTeraSort: High Performance Graphics Co­processor Sorting for Large Database Management Naga K. Govindaraju ∗ Jim Gray † Ritesh Kumar ∗ Dinesh Manocha ∗ naga,ritesh,dm @cs.unc.edu, [email protected] { } http://gamma.cs.unc.edu/GPUTERASORT ABSTRACT advanced memory units, and multi-processors to improve sorting performance. The current Indy PennySort record We present a new algorithm, GPUTeraSort, to sort billion- 1 record wide-key databases using a graphics processing unit benchmark , sorts a 40 GB database in 1541 seconds on a (GPU) Our algorithm uses the data and task parallelism $614 Linux/AMD system. on the GPU to perform memory-intensive and compute- However, current external memory sort performance is intensive tasks while the CPU is used to perform I/O and limited by the traditional Von Neumann style architecture resource management. We therefore exploit both the high- of the CPU. Computer architects use data caches to amelio- bandwidth GPU memory interface and the lower-bandwidth rate the CPU and the main memory bottleneck; but, CPU- CPU main memory interface and achieve higher memory based sorting algorithms incur significant cache misses on bandwidth than purely CPU-based algorithms. GPUTera- large datasets. Sort is a two-phase task pipeline: (1) read disk, build keys, This article shows how to use a commodity graphics pro- sort using the GPU, generate runs, write disk, and (2) read, cessing unit (GPU) as a co-processor to sort large datasets. merge, write. It also pipelines disk transfers and achieves GPUs are programmable parallel architectures designed for near-peak I/O performance. We have tested the perfor- real-time rasterization of geometric primitives - but they mance of GPUTeraSort on billion-record files using the stan- are also highly parallel vector co-processors.
    [Show full text]
  • The Hardware Accelerated Physics Engine with Operating Parameters Controlling the Numerical Error Tolerance 1
    International Journal of Advanced Science and Technology Vol.119 (2018), pp.145-152 http://dx.doi.org/10.14257/ijast.2018.119.13 The Hardware Accelerated Physics Engine with Operating Parameters Controlling the Numerical Error Tolerance 1 Youngsik Kim Department of Game and Multimedia Engineering, Korea Polytechnic University [email protected] Abstract This paper designed the dedicated hardware pipeline for interactive 3D entertainment system considering parallelization characteristics of rigid body physics simulation. This paper also analyzed the large parallelization characteristics of independent rigid group units connected to joints in hierarchical collision detection and design a hardware structure suitable for each parallel processing stage by extracting small data parallelization characteristics of each constraint row unit of LCP equation. Also, this paper performed control such as the number of loop repetitions by performing hardware trade-off in which numerical error is reduced within visual error tolerance by setting error attenuation parameter (ERP) and constraint force mixing (CFM) as driving parameters. Keywords: Hardware Accelerator, Physics Engine, Open Dynamics Engine, Numerical Error Tolerance 1. Introduction The importance of physics engine for simulating the motion of an object about physical phenomena (mass, speed, friction, collision) of nature in the interactive 3D game, virtual reality, 3D animation and robotics industry is becoming more and more important [1-4]. In a car racing game, it is better to use a physics engine simulation than a keyframe animation created by an artist, as it slips by centrifugal force when cornering and collides head-on with other cars. It would be much more natural to model an automobile as a rigid body with mass, a moment of inertia, velocity, etc., and simulate the kinematic motion of the body due to collision or friction.
    [Show full text]
  • F06 Physics Processor
    Nicholas Bannister, Vinay Chaudhary, Aaron Hoy, Charles Norman, Jeremy Weagley Team VerchuCom PPU + Demo Final Report What We Built Game Description Overall Implementation Our original plan of action was to implement the game of Worms that utilized a customized Physics Processing Unit (PPU) built on the FPGA. We had planned to run the game on the Workstation, send objects back and forth to a shared bank of memory on the board, and update the objects’ state using the PPU. However, we came to realize that Worms would not fully show off our main feature of the project, which is the PPU. Therefore, we decided to create a “demo” which basically consisted of a bunch of squares interacting with each other, wind resistance and gravity within a confined area. We then took it to the next level and implemented a simple 2-player shooter game. The object of the game is for each player to destroy all of their squares, either blue or square, before their opponent. Each person has unlimited regular ammo along with 2 “nuclear bombs,” which destroy all of the bombs within a certain radius. Physics Processing Unit (PPU) We designed a specialized CPU (A PPU) that operates on structs representing 2 dimensional objects (position, velocity, acceleration, orientation, etc.). It modeled various particle effects, such as • Model collisions between particles and elastic/inelastic collisions • Effects of Gravity • Wind Resistance The accelerator sits in a basic iteration loop that applies each of the various effects on all the particles in memory, updating their positions, velocities, etc. based on physics.
    [Show full text]
  • PC Hardware Contents
    PC Hardware Contents 1 Computer hardware 1 1.1 Von Neumann architecture ...................................... 1 1.2 Sales .................................................. 1 1.3 Different systems ........................................... 2 1.3.1 Personal computer ...................................... 2 1.3.2 Mainframe computer ..................................... 3 1.3.3 Departmental computing ................................... 4 1.3.4 Supercomputer ........................................ 4 1.4 See also ................................................ 4 1.5 References ............................................... 4 1.6 External links ............................................. 4 2 Central processing unit 5 2.1 History ................................................. 5 2.1.1 Transistor and integrated circuit CPUs ............................ 6 2.1.2 Microprocessors ....................................... 7 2.2 Operation ............................................... 8 2.2.1 Fetch ............................................. 8 2.2.2 Decode ............................................ 8 2.2.3 Execute ............................................ 9 2.3 Design and implementation ...................................... 9 2.3.1 Control unit .......................................... 9 2.3.2 Arithmetic logic unit ..................................... 9 2.3.3 Integer range ......................................... 10 2.3.4 Clock rate ........................................... 10 2.3.5 Parallelism .........................................
    [Show full text]
  • A Comprehensive Survey of Various Processor Types & Latest
    International Journal of Research and Scientific Innovation (IJRSI) | Volume V, Issue IV, April 2018 | ISSN 2321–2705 A Comprehensive Survey of Various Processor types & Latest Architectures Harisha M.S1, Dr. D. Jayadevappa2 1Research Scholar, Jain University, Bangalore, Karnataka, India 2Professor, Electronics Instrumentation, Dept., JSSATE, Bangalore, Karnataka, India Abstract- This technology survey paper covers application based Processors- ASIP, Processor based on Flynn’s classification which includes SISD, SIMD, MISD, MIMD, Special processors like Graphics processing unit (GPU), Physics processing unit (PPU), Digital signal processor (DSP), network processor, front end processor, co-processor and processor based on number of cores which includes- single core, multi core, multi- processor, hyper threading and multi core with shared cache processors. Keywords- ASIP, SISD, SIMD, MISD, MIMD, GPU, PPU, DSP, Network processor, coprocessor. I. INTRODUCTION A. Types of processors- Processor based on application Application-specific instruction-set processor (ASIP) Application-specific instruction set processor is a component Fig.2 Flynn’s classification used in system-on-a-chip design. The instruction set of an SISD & SIMD ASIP is designed to benefit a specific application [1]. This specialization of the core provides a tradeoff between the flexibility of a general purpose CPU and the performance of an ASIC. Fig.3 SISD and SIMD SISD SISD is a computer architecture in which a single uni-core processor executes a single instruction . Fig.1 ASIP stream, to operate on data stored in a single memory. A sequential computer which exploits no parallelism B. Processor types based on Flynn’s classification in either the instruction or data streams. In Flynn's taxonomy, the processors are classified based on Examples of SISD architecture are the traditional the number of concurrent instruction and data streams uniprocessor machines like PC or old mainframes available in the architecture: Single Instruction Single Data [3].
    [Show full text]
  • Revisiting the Future ☺
    Revisiting the Future ☺ Erik Hagersten Uppsala University Sweden DARK2 in a nutshell 1. Memory Systems (caches, VM, DRAM, microbenchmarks, …) 2. Multiprocessors (TLP, coherence, interconnects, scalability, clusters, …) 3. CPUs (pipelines, ILP, scheduling, Superscalars, VLIWs, embedded, …) 4. Future: (physical limitations, TLP+ILP DARK in the CPU,…) 2009 Dept of Information Technology| www.it.uu.se 2 ©Erik Hagersten| user.it.uu.se/~eh How do we get good performance? Creating and exploring: 1) Locality a) Spatial locality b) Temporal locality c) Geographical locality 2) Parallelism a) Instruction level (ILP) b) Thread level (TLP) DARK c) Memory level (MLP) 2009 Dept of Information Technology| www.it.uu.se 3 ©Erik Hagersten| user.it.uu.se/~eh Ray Kurzweil pictures www.KurzweilAI.net/pps/WorldHealthCongress/ DARK 2009 Dept of Information Technology| www.it.uu.se 4 ©Erik Hagersten| user.it.uu.se/~eh Ray Kurzweil pictures www.KurzweilAI.net/pps/WorldHealthCongress/ DARK 2009 Dept of Information Technology| www.it.uu.se 5 ©Erik Hagersten| user.it.uu.se/~eh Ray Kurzweil pictures www.KurzweilAI.net/pps/WorldHealthCongress/ DARK 2009 Dept of Information Technology| www.it.uu.se 6 ©Erik Hagersten| user.it.uu.se/~eh Ray Kurzweil pictures www.KurzweilAI.net/pps/WorldHealthCongress/ DARK 2009 Dept of Information Technology| www.it.uu.se 7 ©Erik Hagersten| user.it.uu.se/~eh Doubling (or Halving) times Dynamic RAM Memory (bits per dollar) 1.5 years Average Transistor Price 1.6 years Microprocessor Cost per Transistor Cycle 1.1 years Total Bits Shipped
    [Show full text]