Linear Algebra Package on the NVIDIA G80 Processor

Robert Liao and Tracy Wang
Computer Science 252 Project
University of California, Berkeley
Berkeley, CA 94720
liao_r [at] berkeley.edu, tracyx [at] berkeley.edu

Abstract

The race for performance improvement previously depended on running one application or sequence of code really quickly. Now, the processor industry and universities focus on running many things really quickly at the same time, an idea collectively known as parallelism. Traditional microprocessors achieve this by building more cores onto the architecture. A single processor may have as many as 2 to 8 cores that can execute independently of one another. However, many of these processor systems include an underutilized graphics processing unit. NVIDIA's G80 Processor represents the first step toward improving the performance of an application through the use of the inherent parallel structure of a graphics processing unit.

This paper explores the performance of a subset of the Linear Algebra Package running on the NVIDIA G80 Processor. Results from the exploration show that, if utilized properly, the performance of linear algebra operations improves by a factor of 70 with a suitable input size. Additionally, the paper discusses the issues involved with running general programs on the GPU, a relatively new ability provided by the G80 Processor. Finally, the paper discusses limitations in terms of the use of a GPU.

1 Introduction

The effort to improve performance on microprocessors for much of the 1990s focused primarily on running instructions faster. This brought forth the frequency race between microprocessor manufacturers. Programmers could write code, wait 6 months, and their code would suddenly run faster. Additionally, many techniques for increasing instruction-level parallelism were introduced to improve performance. Today, the next performance obstacle is not clear, but the general consensus is that the next big thing includes putting many processing cores on a processor. This will enable applications to take advantage of some higher form of parallelism beyond instruction parallelism.

During all of this time, Graphics Processing Units (GPUs) have been doing many operations in parallel due to the relatively independent nature of their computations. Graphics scenes can be decomposed into objects, which can be decomposed into various rendering steps that are independent from one another. However, this led to a very specialized processor that was optimized for rendering.

Performing computations on a GPU is a relatively new field full of opportunities thanks to the NVIDIA G80 GPU. This processor represents one of the first to expose as many as 128 computation cores to the programmer. As a result, a comparison against the CPU is more than appropriate to determine whether this is the right direction for parallelism.

Though any program can be loaded onto the GPU for this exploration, we decided to benchmark linear algebra operations as a starting point to evaluate the GPU for other research areas like the Berkeley View. The Berkeley View has compiled many kernels, called dwarves, that they think should perform well on parallel platforms. These include dense and sparse matrix operations. Many programs can be reduced purely to these dwarves. As a starting point to examining the efficacy of using the GPU in this capacity, we benchmark the performance of general linear algebra operations.

This paper is organized as follows. Section 2 provides a general introduction to traditional GPUs and how they worked prior to the NVIDIA G80 processor. Section 3 discusses details about the NVIDIA G80 processor and outlines its capabilities. Section 4 discusses the benchmarks used to profile the performance of the G80 with respect to the two CPUs used in this paper. Section 5 provides results, discussion, and speculation on the performance runs on the GPU. Section 6 brings forth the issues associated with GPU computing, along with a discussion of issues in running applications on the G80 platform. Finally, the paper concludes in Section 7 with a summary of the results, future directions for research, and related work on GPUs.

2 Traditional GPUs

Background

Many modern personal computing and workstation architectures include a GPU to off-load the task of rendering graphical objects from the Central Processing Unit (CPU). The relationship between the CPU and GPU is closely intertwined, as most of the output from a computer is received through video. In the past, architects optimized the CPU and GPU bus routes because of the high demand to display video on an output device like a monitor or LCD.

GPU manufacturers have kept up with the demand by offering more advanced capabilities in their GPUs beyond putting text and windows on a screen. GPUs today are typically capable of taking geometric information in the form of polygons from an application like a game and performing many different transformations to provide some sort of realistic or artistic output.

This video processing is embarrassingly parallel. The representation of a pixel on the screen often can be rendered independently of other pixels. As a result, GPU manufacturers have provided many superscalar features in their processors to take advantage of this parallelism. This push for parallelism has come to a point where a GPU is basically a specialized vector processor.

Motivation for Change

A fixed pipeline characterizes the traditional GPU. Many have a fixed number of special shaders, such as vertex shaders and pixel shaders. NVIDIA noticed that during certain rendering scenarios, many of the specialized shaders remain dormant. For instance, a scene with many geometric features will use many vertex shaders, but not very many pixel shaders. As a result, NVIDIA began to look for a reconfigurable solution.
Figure 1 shows a typical high-altitude pipeline of a GPU. Data flows forward from the CPU through the GPU and ultimately on to the display. GPUs typically contain many of these pipelines to process scenes in parallel. Additionally, the pipeline is designed to flow forward; as a result, certain stages of the pipeline have features like write-only registers to avoid hazards such as the read-after-write hazards found in typical CPU pipelines.

Figure 1: The Traditional GPU Pipeline

Additionally, the vector processor right next to the CPU is quiet during heavy computations performed on the CPU. Most developers do not send parallelizable computations to the GPU because the APIs make it too difficult to do so. The typical interfaces like OpenGL and DirectX are designed for graphics, not computation. As a result, the programmer cannot tap into the GPU's vast vector resources.

3 The NVIDIA G80 GPU

The G80 GPU is found in NVIDIA's GeForce 8 Series graphics cards as well as the NVIDIA Quadro FX 4600 and 5600. The NVIDIA Quadro FX 5600 is the card used in this exploration.

Architecture

The G80 GPU is NVIDIA's answer to many of the aforementioned concerns and issues. It represents a large departure from traditional GPU architectures. A block diagram of the architecture is shown in Figure 2. The GPU contains 8 blocks of 16 stream processors, for a total of 128 stream processors. Each stream processor can execute floating-point instructions. As the block diagram shows, each group of 16 shares an L1 cache, and each block has access to 6 L2 caches. This arrangement also allows one processor to directly feed results into another processor for continued stream processing. Each processor can be configured to be part of some shader unit in the traditional GPU sense. This reconfigurability also means that the processors can be dedicated to performing general computations. This capability is exposed in NVIDIA's Compute Unified Device Architecture.

Figure 2: The NVIDIA G80 Graphics Processor Architecture

Each processor also has local memory as well as memory shared with other processors. According to the NVIDIA guide, accessing local and shared memory on-chip is as fast as accessing registers.

Compute Unified Device Architecture

The Compute Unified Device Architecture (CUDA) is NVIDIA's API for exposing the processing features of the G80 GPU. This C-language API provides services ranging from common GPU operations in the CUDA Library to traditional C memory management semantics in the CUDA runtime and device driver layers. Additionally, NVIDIA provides a specialized C compiler to build programs targeted for the GPU.

Code compiled for the GPU is executed on the GPU. Likewise, memory allocated on the GPU resides on the GPU. This introduces complications in interfacing programs running in CPU space with programs running in GPU space. The programmer must keep track of the pointers used in each processor. Many programs, including the ones used to benchmark the GPU in this paper, reasonably assume that all pointers and execution code reside in one memory space on one execution unit. Porting this style of programming to a separate format is a non-trivial task.

A Note on Scarce Specifications

Due to the secretive nature of the industry, NVIDIA has not released much information about the G80 processor beyond a high-level overview. As a result, we can only speculate on specifics like L1 and L2 cache sizes in this benchmark.

4 Benchmarking

… with respect to matrix size in both the CPU and GPU.

Figure 3: Organization of the modules.

BLAS and CUBLAS

The LAPACK tools rely on the Basic Linear Algebra Subprograms (BLAS) library. These subprograms are a set of primitive operations that operate on matrices. The original BLAS can be run on the CPU. NVIDIA provides its own version called CUBLAS (Compute Unified BLAS). CUBLAS is designed to run on the G80 GPU and abstracts much of the CUDA programming API in a succinct mathematical package. The only major change is the inclusion of allocation and freeing functions to deal with the separation of CPU and GPU memory.