GPU Clusters In
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
New CSC Computing Resources
New CSC computing resources Atte Sillanpää, Nino Runeberg CSC – IT Center for Science Ltd. Outline CSC at a glance New Kajaani Data Centre Finland’s new supercomputers – Sisu (Cray XC30) – Taito (HP cluster) CSC resources available for researchers CSC presentation 2 CSC’s Services Funet Services Computing Services Universities Application Services Polytechnics Ministries Data Services for Science and Culture Public sector Information Research centers Management Services Companies FUNET FUNET and Data services – Connections to all higher education institutions in Finland and for 37 state research institutes and other organizations – Network Services and Light paths – Network Security – Funet CERT – eduroam – wireless network roaming – Haka-identity Management – Campus Support – The NORDUnet network Data services – Digital Preservation and Data for Research Data for Research (TTA), National Digital Library (KDK) International collaboration via EU projects (EUDAT, APARSEN, ODE, SIM4RDM) – Database and information services Paituli: GIS service Nic.funet.fi – freely distributable files with FTP since 1990 CSC Stream Database administration services – Memory organizations (Finnish university and polytechnics libraries, Finnish National Audiovisual Archive, Finnish National Archives, Finnish National Gallery) 4 Current HPC System Environment Name Louhi Vuori Type Cray XT4/5 HP Cluster DOB 2007 2010 Nodes 1864 304 CPU Cores 10864 3648 Performance ~110 TFlop/s 34 TF Total memory ~11 TB 5 TB Interconnect Cray QDR IB SeaStar Fat tree 3D Torus CSC -
HP/NVIDIA Solutions for HPC Compute and Visualization
HP Update Bill Mannel VP/GM HPC & Big Data Business Unit Apollo Servers © Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The most exciting shifts of our time are underway Security Mobility Cloud Big Data Time to revenue is critical Decisions Making IT critical Business needs must be rapid to business success happen anywhere Change is constant …for billion By 2020 billion 30 devices 8 people trillion GB million 40 data 10 mobile apps 2 © Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP’s compute portfolio for better IT service delivery Software-defined and cloud-ready API HP OneView HP Helion OpenStack RESTful APIs WorkloadWorkload-optimized Optimized Mission-critical Virtualized &cloud Core business Big Data, HPC, SP environments workloads applications & web scalability Workloads HP HP HP Apollo HP HP ProLiant HP Integrity HP Integrity HP BladeSystem HP HP HP ProLiant SL Moonshot Family Cloudline scale-up blades & Superdome NonStop MicroServer ProLiant ML ProLiant DL Density and efficiency Lowest Cost Convergence Intelligence Availability to scale rapidly built to Scale for continuous business to accelerate IT service delivery to increase productivity Converged Converged network Converged management Converged storage Common modular HP Networking HP OneView HP StoreVirtual VSA architecture Global support and services | Best-in-class partnerships | ConvergedSystem 3 © Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Hyperscale compute grows 10X faster than the total market HPC is a strategic growth market for HP Traditional Private Cloud HPC Service Provider $50 $40 36% $30 50+% $B $20 $10 $0 2013 2014 2015 2016 2017 4 © Copyright 2015 Hewlett-Packard Development Company, L.P. -
Parallel Rendering on Hybrid Multi-GPU Clusters
Eurographics Symposium on Parallel Graphics and Visualization (2012) H. Childs and T. Kuhlen (Editors) Parallel Rendering on Hybrid Multi-GPU Clusters S. Eilemann†1, A. Bilgili1, M. Abdellah1, J. Hernando2, M. Makhinya3, R. Pajarola3, F. Schürmann1 1Blue Brain Project, EPFL; 2CeSViMa, UPM; 3Visualization and MultiMedia Lab, University of Zürich Abstract Achieving efficient scalable parallel rendering for interactive visualization applications on medium-sized graphics clusters remains a challenging problem. Framerates of up to 60hz require a carefully designed and fine-tuned parallel rendering implementation that fits all required operations into the 16ms time budget available for each rendered frame. Furthermore, modern commodity hardware embraces more and more a NUMA architecture, where multiple processor sockets each have their locally attached memory and where auxiliary devices such as GPUs and network interfaces are directly attached to one of the processors. Such so called fat NUMA processing and graphics nodes are increasingly used to build cost-effective hybrid shared/distributed memory visualization clusters. In this paper we present a thorough analysis of the asynchronous parallelization of the rendering stages and we derive and implement important optimizations to achieve highly interactive framerates on such hybrid multi-GPU clusters. We use both a benchmark program and a real-world scientific application used to visualize, navigate and interact with simulations of cortical neuron circuit models. Categories and Subject Descriptors (according to ACM CCS): I.3.2 [Computer Graphics]: Graphics Systems— Distributed Graphics; I.3.m [Computer Graphics]: Miscellaneous—Parallel Rendering 1. Introduction Hybrid GPU clusters are often build from so called fat render nodes using a NUMA architecture with multiple The decomposition of parallel rendering systems across mul- CPUs and GPUs per machine. -
CRAY-2 Design Allows Many Types of Users to Solve Problems That Cannot Be Solved with Any Other Computers
Cray Research's mission is to lead in the development and marketingof high-performancesystems that make a unique contribution to the markets they serve. For close toa decade, Cray Research has been the industry leader in large-scale computer systems. Today, the majority of supercomputers installed worldwide are Cray systems. These systems are used In advanced research laboratories around the world and have gained strong acceptance in diverse industrial environments. No other manufacturer has Cray Research's breadth of success and experience in supercomputer development. The company's initial product, the GRAY-1 Computer System, was first installed in 1978. The CRAY-1 quickly established itself as the standard of value for large-scale computers and was soon recognized as the first commercially successful vector processor. For some time previously, the potential advantagee af vector processing had been understood, but effective Dractical imolementation had eluded com~uterarchitects. The CRAY-1 broke that barrier, and today vect~rization'techni~uesare used commonly by scientists and engineers in a widevariety of disciplines. With its significant innovations in architecture and technology, the GRAY-2 Computer System sets the standard for the next generation of supercomputers. The CRAY-2 design allows many types of users to solve problems that cannot be solved with any other computers. The GRAY-2 provides an order of magnitude increase in performanceaver the CRAY-1 at an attractive price/performance ratio. Introducing the CRAY-2 Computer System The CRAY-2 Computer System sets the standard for the next generation of supercomputers. It is characterized by a large Common Memory (256 million 64-bit words), four Background Processors, a clock cycle of 4.1 nanoseconds (4.1 billionths of a second) and liquid immersion cooling. -
Revolution: Greatest Hits
REVOLUTION: GREATEST HITS SHORT ON TIME? VISIT THESE OBJECTS INSIDE THE REVOLUTION EXHIBITION FOR A QUICK OVERVIEW REVOLUTION: PUNCHED CARDS REVOLUTION: BIRTH OF THE COMPUTER REVOLUTION: BIRTH OF THE COMPUTER REVOLUTION: REAL-TIME COMPUTING Hollerith Electric ENIAC, 1946 ENIGMA, ca. 1935 Raytheon Apollo Guidance Tabulating System, 1890 Used in World War II to calculate Few technologies were as Computer, 1966 This device helped the US gun trajectories, only a few piec- decisive in World War II as the This 70 lb. box, built using the government complete the 1890 es remain of this groundbreaking top-secret German encryption new technology of Integrated census in record time and, in American computing system. machine known as ENIGMA. Circuits (ICs), guided Apollo 11 the process, launched the use of ENIAC used 18,000 vacuum Breaking its code was a full-time astronauts to the Moon and back in July of 1969, the first manned punched cards in business. IBM tubes and took several days task for Allied code breakers who moon landing in human history. used the Hollerith system as the to program. invented remarkable comput- The AGC was a lifeline for the basis of its business machines. ing machines to help solve the astronauts throughout the eight ENIGMA riddle. day mission. REVOLUTION: MEMORY & STORAGE REVOLUTION: SUPERCOMPUTERS REVOLUTION: MINICOMPUTERS REVOLUTION: AI & ROBOTICS IBM RAMAC Disk Drive, 1956 Cray-1 Supercomputer, 1976 Kitchen Computer, 1969 SRI Shakey the Robot, 1969 Seeking a faster method of The stunning Cray-1 was the Made by Honeywell and sold by Shakey was the first robot that processing data than using fastest computer in the world. -
Supercomputing
DepartmentDepartment ofof DefenseDefense HighHigh PerformancePerformance ComputingComputing ModernizationModernization ProgramProgram Supercomputing:Supercomputing: CrayCray Henry,Henry, DirectorDirector HPCMP,HPCMP, PerformancePerformance44 May MayMeasuresMeasures 20042004 andand OpportunitiesOpportunities CrayCray J.J. HenryHenry AugustAugust 20042004 http://http://www.hpcmo.hpc.milwww.hpcmo.hpc.mil 20042004 HPECHPEC ConferenceConference PresentationPresentation OutlineOutline zz WhatWhat’sWhat’’ss NewNew inin thethe HPCMPHPCMP 00NewNew hardwarehardware 00HPCHPC SoftwareSoftware ApplicationApplication InstitutesInstitutes 00CapabilityCapability AllocationsAllocations 00OpenOpen ResearchResearch SystemsSystems 00OnOn-demand-demand ComputingComputing zz PerformancePerformance MeasuresMeasures -- HPCMPHPCMPHPCMP zz PerformancePerformance MeasuresMeasures –– ChallengesChallengesChallenges && OpportunitiesOpportunities HPCMPHPCMP CentersCenters 19931993 20042004 Legend MSRCs ADCs and DDCs TotalTotal HPCMHPCMPP EndEnd-of-Year-of-Year ComputationalComputational CapabilitiesCapabilities 80 MSRCs ADCs 120,000 70 13.1 MSRCs DCs 100,000 60 23,327 80,000 Over 400X Growth 50 s F Us 60,000 40 eak G HAB 12.1 P 21,759 30 59.3 40,000 77,676 5,86 0 20,000 4,393 30,770 20 2.6 21,946 1, 2 76 3,171 26.6 18 9 3 6 0 688 1, 16 8 2,280 8,03212 , 0 14 2.7 18 1 47 10 0 1,944 3,477 10 15.7 0 50 400 1,200 10.6 3 4 5 6 7 8 9 0 1 2 3 4 0 199 199 199 199 199 199 199 200 200 200 200 200 FY 01 FY 02 FY 03 FY 04 Year Fiscal Year (TI-XX) HPCMPHPCMP SystemsSystems (MSRCs)(MSRCs)20042004 -
Parallel Texture-Based Vector Field Visualization on Curved Surfaces Using GPU Cluster Computers
Eurographics Symposium on Parallel Graphics and Visualization (2006) Alan Heirich, Bruno Raffin, and Luis Paulo dos Santos (Editors) Parallel Texture-Based Vector Field Visualization on Curved Surfaces Using GPU Cluster Computers S. Bachthaler1, M. Strengert1, D. Weiskopf2, and T. Ertl1 1Institute of Visualization and Interactive Systems, University of Stuttgart, Germany 2School of Computing Science, Simon Fraser University, Canada Abstract We adopt a technique for texture-based visualization of flow fields on curved surfaces for parallel computation on a GPU cluster. The underlying LIC method relies on image-space calculations and allows the user to visualize a full 3D vector field on arbitrary and changing hypersurfaces. By using parallelization, both the visualization speed and the maximum data set size are scaled with the number of cluster nodes. A sort-first strategy with image-space decomposition is employed to distribute the workload for the LIC computation, while a sort-last approach with an object-space partitioning of the vector field is used to increase the total amount of available GPU memory. We specifically address issues for parallel GPU-based vector field visualization, such as reduced locality of memory accesses caused by particle tracing, dynamic load balancing for changing camera parameters, and the combination of image-space and object-space decomposition in a hybrid approach. Performance measurements document the behavior of our implementation on a GPU cluster with AMD Opteron CPUs, NVIDIA GeForce 6800 Ultra GPUs, and Infiniband network connection. Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Viewing algorithms I.3.3 [Three-Dimensional Graphics and Realism]: Color, shading, shadowing, and texture 1. -
Evolution of the Graphical Processing Unit
University of Nevada Reno Evolution of the Graphical Processing Unit A professional paper submitted in partial fulfillment of the requirements for the degree of Master of Science with a major in Computer Science by Thomas Scott Crow Dr. Frederick C. Harris, Jr., Advisor December 2004 Dedication To my wife Windee, thank you for all of your patience, intelligence and love. i Acknowledgements I would like to thank my advisor Dr. Harris for his patience and the help he has provided me. The field of Computer Science needs more individuals like Dr. Harris. I would like to thank Dr. Mensing for unknowingly giving me an excellent model of what a Man can be and for his confidence in my work. I am very grateful to Dr. Egbert and Dr. Mensing for agreeing to be committee members and for their valuable time. Thank you jeffs. ii Abstract In this paper we discuss some major contributions to the field of computer graphics that have led to the implementation of the modern graphical processing unit. We also compare the performance of matrix‐matrix multiplication on the GPU to the same computation on the CPU. Although the CPU performs better in this comparison, modern GPUs have a lot of potential since their rate of growth far exceeds that of the CPU. The history of the rate of growth of the GPU shows that the transistor count doubles every 6 months where that of the CPU is only every 18 months. There is currently much research going on regarding general purpose computing on GPUs and although there has been moderate success, there are several issues that keep the commodity GPU from expanding out from pure graphics computing with limited cache bandwidth being one. -
NQE Release Overview RO–5237 3.3
NQE Release Overview RO–5237 3.3 Document Number 007–3795–001 Copyright © 1998 Silicon Graphics, Inc. and Cray Research, Inc. All Rights Reserved. This manual or parts thereof may not be reproduced in any form unless permitted by contract or by written permission of Silicon Graphics, Inc. or Cray Research, Inc. RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure of the technical data contained in this document by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 52.227-7013 and/or in similar or successor clauses in the FAR, or in the DOD or NASA FAR Supplement. Unpublished rights reserved under the Copyright Laws of the United States. Contractor/manufacturer is Silicon Graphics, Inc., 2011 N. Shoreline Blvd., Mountain View, CA 94043-1389. Autotasking, CF77, CRAY, Cray Ada, CraySoft, CRAY Y-MP, CRAY-1, CRInform, CRI/TurboKiva, HSX, LibSci, MPP Apprentice, SSD, SUPERCLUSTER, UNICOS, and X-MP EA are federally registered trademarks and Because no workstation is an island, CCI, CCMT, CF90, CFT, CFT2, CFT77, ConCurrent Maintenance Tools, COS, Cray Animation Theater, CRAY APP, CRAY C90, CRAY C90D, Cray C++ Compiling System, CrayDoc, CRAY EL, CRAY J90, CRAY J90se, CrayLink, Cray NQS, Cray/REELlibrarian, CRAY S-MP, CRAY SSD-T90, CRAY T90, CRAY T3D, CRAY T3E, CrayTutor, CRAY X-MP, CRAY XMS, CRAY-2, CSIM, CVT, Delivering the power . ., DGauss, Docview, EMDS, GigaRing, HEXAR, IOS, ND Series Network Disk Array, Network Queuing Environment, Network Queuing Tools, OLNET, RQS, SEGLDR, SMARTE, SUPERLINK, System Maintenance and Remote Testing Environment, Trusted UNICOS, UNICOS MAX, and UNICOS/mk are trademarks of Cray Research, Inc. -
Literature Review and Implementation Overview: High Performance
LITERATURE REVIEW AND IMPLEMENTATION OVERVIEW: HIGH PERFORMANCE COMPUTING WITH GRAPHICS PROCESSING UNITS FOR CLASSROOM AND RESEARCH USE APREPRINT Nathan George Department of Data Sciences Regis University Denver, CO 80221 [email protected] May 18, 2020 ABSTRACT In this report, I discuss the history and current state of GPU HPC systems. Although high-power GPUs have only existed a short time, they have found rapid adoption in deep learning applications. I also discuss an implementation of a commodity-hardware NVIDIA GPU HPC cluster for deep learning research and academic teaching use. Keywords GPU · Neural Networks · Deep Learning · HPC 1 Introduction, Background, and GPU HPC History High performance computing (HPC) is typically characterized by large amounts of memory and processing power. HPC, sometimes also called supercomputing, has been around since the 1960s with the introduction of the CDC STAR-100, and continues to push the limits of computing power and capabilities for large-scale problems [1, 2]. However, use of graphics processing unit (GPU) in HPC supercomputers has only started in the mid to late 2000s [3, 4]. Although graphics processing chips have been around since the 1970s, GPUs were not widely used for computations until the 2000s. During the early 2000s, GPU clusters began to appear for HPC applications. Most of these clusters were designed to run large calculations requiring vast computing power, and many clusters are still designed for that purpose [5]. GPUs have been increasingly used for computations due to their commodification, following Moore’s Law (demonstrated arXiv:2005.07598v1 [cs.DC] 13 May 2020 in Figure 1), and usage in specific applications like neural networks. -
CXML Reference Guide Is the Complete Reference Manual for CXML
Compaq Extended Math Library Reference Guide January 2001 This document describes the Compaq® Extended Math Library (CXML). CXML is a set of high-performance mathematical routines designed for use in numerically intensive scientific and engineering applications. This document is a guide to using CXML and provides reference information about CXML routines. Revision/Update Information: This document has been revised for this release of CXML. Compaq Computer Corporation Houston, Texas © 2001 Compaq Computer Corporation Compaq, the COMPAQ logo, DEC, DIGITAL, VAX, and VMS are registered in the U.S. Patent and Trademark Office. Alpha, Tru64, DEC Fortran, OpenVMS, and VAX FORTRAN are trademarks of Compaq Information Technologies, L.P. in the United States and other countries. Adobe, Adobe Acrobat, and POSTSCRIPT are registered trademarks of Adobe Systems Incorporated. CRAY is a registered trademark of Cray Research, Incorporated. IBM is a registered trademark of International Business Machines Corporation. IEEE is a registered trademark of the Institute of Electrical and Electronics Engineers Inc. IMSL and Visual Numerics are registered trademarks of Visual Numerics, Inc. Intel and Pentium are trademarks of Intel Corporation. KAP is a registered trademark of Kuck and Associates, Inc. Linux is a registered trademark of Linus Torvalds. Microsoft, Windows, and Windows NT are either trademarks or registered trademarks of Microsoft Corporation in the United States and other countries. OpenMP and the OpenMP logo are trademarks of OpenMP Architecture Review Board. SUN, SUN Microsystems, and Java are registered trademarks of Sun Microsystems, Inc. UNIX, Motif, OSF, OSF/1, OSF/Motif, and The Open Group are trademarks of The Open Group. All other trademarks and registered trademarks are the property of their respective holders. -
GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications
University of Central Florida STARS HIM 1990-2015 2014 GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications Adam Phillips University of Central Florida Part of the Mathematics Commons Find similar works at: https://stars.library.ucf.edu/honorstheses1990-2015 University of Central Florida Libraries http://library.ucf.edu This Open Access is brought to you for free and open access by STARS. It has been accepted for inclusion in HIM 1990-2015 by an authorized administrator of STARS. For more information, please contact [email protected]. Recommended Citation Phillips, Adam, "GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications" (2014). HIM 1990-2015. 1613. https://stars.library.ucf.edu/honorstheses1990-2015/1613 GPU ACCELERATED APPROACH TO NUMERICAL LINEAR ALGEBRA AND MATRIX ANALYSIS WITH CFD APPLICATIONS by ADAM D. PHILLIPS A thesis submitted in partial fulfilment of the requirements for the Honors in the Major Program in Mathematics in the College of Sciences and in The Burnett Honors College at the University of Central Florida Orlando, Florida Spring Term 2014 Thesis Chair: Dr. Bhimsen Shivamoggi c 2014 Adam D. Phillips All Rights Reserved ii ABSTRACT A GPU accelerated approach to numerical linear algebra and matrix analysis with CFD applica- tions is presented. The works objectives are to (1) develop stable and efficient algorithms utilizing multiple NVIDIA GPUs with CUDA to accelerate common matrix computations, (2) optimize these algorithms through CPU/GPU memory allocation, GPU kernel development, CPU/GPU communication, data transfer and bandwidth control to (3) develop parallel CFD applications for Navier Stokes and Lattice Boltzmann analysis methods.