The Intro to GPGPU: CPU vs. GPU

Total pages: 16
File type: PDF, size: 1020 KB
The Intro to GPGPU
Dr. Chokchai (Box) Leangsuksun, PhD
Louisiana Tech University, Ruston, LA
12/12/11

CPU vs. GPU
• CPU
  – Fast caches
  – Branching adaptability
  – High performance
• GPU
  – Multiple ALUs
  – Fast onboard memory
  – High throughput on parallel tasks
  – Executes a program on each fragment/vertex
• CPUs are great for task parallelism
• GPUs are great for data parallelism
Supercomputing 2008 Education Program

CPU vs. GPU – Hardware
• More transistors devoted to data processing
(CUDA programming guide 3.1)

CPU vs. GPU – Computation Power
(CUDA programming guide 3.1)

CPU vs. GPU – Memory Bandwidth
(CUDA programming guide 3.1)

What is GPGPU?
• General-purpose computation using a GPU in applications other than 3D graphics
  – The GPU accelerates the critical path of the application
• Data-parallel algorithms leverage GPU attributes
  – Large data arrays, streaming throughput
  – Fine-grain SIMD parallelism
  – Low-latency floating-point (FP) computation
© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007
ECE 498AL, University of Illinois, Urbana-Champaign

Why GPGPU?
• Large number of cores
  – 100-1000 cores in a single card
• Low cost
  – Less than $100-$1500
• Green computing
  – Low power consumption: 135 watts/card
  – 135 W vs. 30,000 W (300 watts × 100)
• 1 card can outperform 100 desktops
  – $750 vs. $50,000 ($500 × 100)

Two Major Players
• NVIDIA and AMD/ATI

Parallel Computing on a GPU
• NVIDIA GPU computing architecture
  – Via a HW device interface
  – In laptops, desktops, workstations, servers
• Tesla T10 1070: from 1-4 TFLOPS
• AMD/ATI 5970 ×2: 3200 cores
• NVIDIA Tegra is an all-in-one (system-on-a-chip) processor architecture derived from the ARM family
• GPU parallelism is growing faster than Moore's law, roughly doubling every year
• A GPGPU is a GPU that allows the user to process both graphics and non-graphics applications
(ATI 4850, GeForce 8800)
© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007
ECE 498AL, University of Illinois, Urbana-Champaign
Requirements of a GPU System
• A GPGPU-capable video card
• Power supply
• Cooling
• PCI-Express 16x slot
(Tesla D870, GeForce 8800)
© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007
ECE 498AL, University of Illinois, Urbana-Champaign

Examples of GPU Devices

NVIDIA GeForce 8800 (G80)
• The eighth generation of NVIDIA's GeForce graphics cards
• High-performance CUDA-enabled GPGPU
• 128 cores
• Memory: 256-768 MB, or 1.5 GB in Tesla
• High-speed memory bandwidth (86.4 GB/s)
• Supports Scalable Link Interface (SLI)

NVIDIA GeForce 295 (GT200)
• The tenth generation of NVIDIA's GeForce graphics cards
• The second generation of the CUDA architecture
• Dual-GPU card
• 480 cores (240 per GPU)
• 1242 MHz processor clock speed
• Memory: 1792 MB (896 MB per GPU)
• 223.8 GB/s memory bandwidth (2 memory interfaces)
• Supports Quad SLI

NVIDIA GeForce 480 (Fermi)
• The eleventh generation of NVIDIA's GeForce graphics cards
• The third generation of the CUDA architecture
• 480 cores
• 1401 MHz processor clock speed
• Memory: 1536 MB
• 177.4 GB/s memory bandwidth
• Supports 2-way/3-way SLI

NVIDIA Tesla
• Features
  – GPU computing for HPC
  – No display ports; dedicated to computation
  – For massively multi-threaded computing
  – Supercomputing performance
  – Large memory capacity, up to 6 GB in the Tesla M2070
• Tesla 10:
  – C-Series (card) = 1 GPU with 1.5 GB
  – D-Series (deskside unit) = 2 GPUs
  – S-Series (1U server) = 4 GPUs
• Tesla 20 (Fermi architecture) = 1 GPU with 3 GB or 6 GB
• Note: 1 G80 GPU = 128 cores = ~500 GFLOPS; 1 T10 = 240 cores = 1 TFLOPS
NVIDIA Fermi (Tesla Series 20)
"I believe history will record Fermi as a significant milestone." – Dave Patterson
• 512 cores (16 SMs × 32 cores)
• 8× faster peak double-precision (DP) floating-point calculation
• 520-630 GFLOPS DP
• 3 GB GDDR5 for the Tesla 2050; 6 GB GDDR5 for the Tesla 2070
• ECC
• L1 and L2 caches
• Concurrent kernel execution (up to 16 kernels)
• IEEE 754-2008 and FMA (fused multiply-add)
(NVIDIA's Fermi white paper)

3rd-Generation SM Architecture
• 32 cores, 16 load/store units, and 4 Special Function Units
• Configurable 64 KB memory: 16 KB shared memory and 48 KB L1 cache, or 48 KB shared memory and 16 KB L1 cache
• Dual warp scheduler
• Each CUDA core contains one floating-point unit and one integer ALU, with DP support
• 8× faster in double-precision operations than GT200
(NVIDIA's Fermi white paper)

Memory Hierarchy
Each thread in a block can access the shared memory and the L1 cache; each block has access to the L2 cache and the global memory.
(NVIDIA's Fermi white paper)

Dual Warp Scheduler; Concurrent Kernel Execution
(NVIDIA's Fermi white paper)

Fermi Products (NVIDIA.com)

                  GTX460          GTX465    GTX470    GTX480    Tesla C2050   Tesla C2070
Cores             336             352       448       480       448           448
Clock speed       1350 MHz        1215 MHz  1215 MHz  1401 MHz  SP: 1.05 TFLOPS, DP: 515 GFLOPS
Memory            768 MB or 1 GB  1 GB      1280 MB   1.5 GB    3 GB          6 GB
Bandwidth (GB/s)  86.4 or 115.2   102.6     133.9     177.4     148           148
Power             160 W           200 W     215 W     250 W     225 W         225 W
Price             $199-$249       $299      $349      $499      $2,499        $3,999

CUDA Architecture Generations
(NVIDIA's Fermi white paper)

Fermi vs. GT200
Each SM in the Fermi architecture can perform 16 FMA (fused multiply-add) double-precision operations per clock cycle.
(NVIDIA's Fermi white paper)

(This slide is from the NVIDIA CUDA tutorial.)
© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007
ECE 498AL, University of Illinois, Urbana-Champaign
ATI Stream (1)

ATI 4870

ATI 4870 X2

ATI Radeon HD 5870 (AMD.com)
• Transistors: 2.15 billion (40 nm)
• Stream cores: 1600
• Clock speed: 850 MHz
• SP compute power: 2.72 TFLOPS
• DP compute power: 544 GFLOPS
• Memory type: GDDR5, 4.8 Gbps
• Memory capacity: 1 GB
• Memory bandwidth: 153.6 GB/s
• Board power: 188 W max / 27 W idle

ATI Radeon HD 5970 (AMD.com)
• Transistors: 4.3 billion (40 nm)
• Stream cores: 3200 (2 GPUs)
• Clock speed: 725 MHz
• SP compute power: 4.64 TFLOPS
• DP compute power: 928 GFLOPS
• Memory type: GDDR5, 4.0 Gbps
• Memory capacity: 2-4 GB
• Memory bandwidth: 256.0 GB/s
• Board power: 294 W max / 51 W idle

Architecture of the ATI Radeon 4000 Series
(These slides are from an ATI presentation.)

What About Intel?

Intel Larrabee
• A hybrid between a multi-core CPU and a GPU
• Its coherent cache hierarchy and x86 architecture compatibility are CPU-like
• Its wide SIMD vector units and texture sampling hardware are GPU-like

Months after ISC'09, Intel canceled the Larrabee project. At ISC'10 they announced a new project, code-named "Knights Ferry", using a MIC architecture similar to Larrabee.

Intel Knights Ferry (MIC Architecture)
• 22 nm technology
• 32 cores at 1.2 GHz (MIC scales up to 50 cores)
• 128 threads, at 4 threads/core
• 8 MB shared coherent cache
• 1-2 GB GDDR5
• Intel HPC tools
(This slide's information is from the ISC'10 Skaugen keynote.)

MIC Architecture (Many Integrated Core)
(This slide's information is from the ISC'10 Skaugen keynote.)

Intel Knights Ferry vs. NVIDIA Fermi

                                Intel MIC       NVIDIA Fermi
MIMD parallelism                32              32 (28)
SIMD parallelism                16              16
Instruction-level parallelism   2               1
Thread granularity              coarse          fine
Multithreading                  4               24
Clock                           1.2 GHz         1.1 GHz
L1 cache/processor              32 KB           64 KB
L2 cache/processor              256 KB          24 KB
Programming model               POSIX threads   CUDA kernels
Virtual memory                  yes             no
Memory shared with host         no              no
Hardware parallelism support    no              yes
Mature tools                    yes             yes

This information is from the article "Compilers and More: Knights Ferry versus Fermi" by Michael Wolfe, The Portland Group, Inc.

Introduction to OpenCL
Toward a new approach in computing
• OpenCL stands for Open Computing Language.
• It grew out of a consortium effort including Apple, NVIDIA, AMD, and others.
• It is managed by the Khronos Group, which was also responsible for OpenGL.
• It took 6 months to come up with the specification.

OpenCL
1. Royalty-free.
2. Supports both task- and data-parallel programming modes.
3. Works for vendor-agnostic GPGPUs,
4. including multi-core CPUs.
5. Works on Cell processors.
6. Supports handhelds and mobile devices.
7. Based on the C language (C99).

OpenCL Platform Model

Basic OpenCL Program Structure
1. OpenCL kernel
2. Host program containing:
   a. Device context
   b. Command queue
   c. Memory objects
   d. OpenCL program
   e. Kernel memory arguments

CPU+GPU Platforms

Performance of GPGPU
Note: a cluster of 30 dual-Xeon 2.8 GHz nodes has a peak performance of ~336 GFLOPS.

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007
ECE 498AL, University of Illinois, Urbana-Champaign
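The host-program structure listed above (context, command queue, memory objects, program, kernel arguments) maps one-to-one onto the OpenCL C API. A pseudocode sketch of the usual call sequence, with error checking and variable declarations omitted; the API names are standard OpenCL 1.x, while the kernel name, buffer size, and work size are illustrative placeholders:

```
// 1. Context for the chosen platform/device
clGetPlatformIDs(...);  clGetDeviceIDs(...);
ctx   = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);

// 2. Command queue on that device
queue = clCreateCommandQueue(ctx, dev, 0, &err);

// 3. Memory objects (device buffers)
buf   = clCreateBuffer(ctx, CL_MEM_READ_WRITE, nbytes, NULL, &err);

// 4. Program: the kernel source is compiled at run time
prog  = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
kern  = clCreateKernel(prog, "my_kernel", &err);

// 5. Kernel arguments, then launch over an N-item index space
clSetKernelArg(kern, 0, sizeof(cl_mem), &buf);
clEnqueueNDRangeKernel(queue, kern, 1, NULL, &global_size, NULL, 0, NULL, NULL);
clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, nbytes, host_ptr, 0, NULL, NULL);
```

Run-time compilation (step 4) is what lets the same host binary target NVIDIA GPUs, AMD GPUs, multi-core CPUs, or Cell, as the vendor-agnostic point above promises.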
CUDA
• "Compute Unified Device Architecture"
• General-purpose programming model
  – The user kicks off batches of threads on the GPU
  – GPU = dedicated super-threaded, massively data-parallel co-processor
• Targeted software stack
  – Compute-oriented drivers, language, and tools
• Driver for loading computation programs onto the GPU
  – Standalone driver, optimized for computation
  – Interface designed for compute: graphics-free API
  – Data sharing with OpenGL buffer objects
  – Guaranteed maximum download and readback speeds
  – Explicit GPU memory management
© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007
ECE 498AL, University of Illinois, Urbana-Champaign