Ada Departmental Supercomputer Shared Memory GPU Cluster

The Ada Departmental Supercomputer is designed to provide near top 500 class supercomputing capabilities at your office or lab.

Ada is a hybrid supercomputer consisting of a large memory head node and 2 to 5 compute nodes, each with eight AMD Radeon Instinct MI50 GPUs. With 5 compute nodes Ada contains 448 AMD EPYC processor cores, 40 MI50 GPUs and 2 or 4 TB of globally shared memory. The compute nodes are connected to the head node with 200 Gb/s Mellanox InfiniBand. The Ada departmental supercomputer can be configured to deliver 1060 TFLOPS of FP16, 532 TFLOPS of FP32 and 264 TFLOPS of FP64 GPU floating point performance capable of operating on large computational models.

Ada is a true symmetric multi-processing (SMP) computer with a large shared memory and a single operating system user interface based on CentOS 8 Linux. It provides a 1TB globally shared fast file system, and a large disk storage array. The performance of Ada is equivalent to supercomputers costing millions of dollars. Its large shared memory and multiple GPUs can support the training of highly complex AI and machine learning models. It can handle some of the largest engineering simulations utilizing fluid dynamics, finite element analysis and coupled models. It is an excellent resource for molecular dynamics, bio-informatics and drug discovery. With Ada you can execute your most demanding computationally intensive tasks in your office or lab, without waiting for slower, less powerful cloud resources.

System Specifications

Processors:
    Head Node: 2 AMD EPYC 7702 Processors (64 core-2.0/3.3 GHz)
    Compute Nodes: 1 AMD EPYC 7702P Processor (64 core-2.2/3.2 GHz), 8 AMD Radeon Instinct MI50 GPUs
Global Memory: 2TB or 4TB 3200 MHz DDR4
Compute Node Memory: 128 GB 3200 MHz DDR4 (each)
Storage:
    1TB on-board M.2 OS SSD
    12x 3.5" SATA/SAS hot-swap SSD/HDD bays (head node)
    Additional 8x 2.5" SSD hot-swap bays on each compute node
Interconnect: ConnectX-6 VPI 200 Gb/s InfiniBand Dual Port PCIe Gen 4 Host Bus Adapters (No InfiniBand switch is needed)
I/O:
    2x 1 Gb/s LAN ports
    4x USB 3.0
    1x VGA
    1x Dedicated management LAN port
Environment:
    2x 2200 W redundant PSUs (head node)
    2x 2200 W redundant PSUs (compute nodes)
AC Input:
    110/208 V / 15 A, 50-60Hz (head node)
    110/208 V / 15 A, 50-60Hz (compute node)
Dimensions: 6U-12U Standard 19 inch Rack Mountable

Features and Benefits

Powerful Dedicated GPU Supercomputing: Faster projects. Dedicated power when your project needs it.
Large Single Shared Memory: Ideal for large memory applications.
Single Software Image: Simple and scalable SMP multi-threaded programming. No complicated cluster tailoring.

Software Specifications

Linux OS (CentOS 8)
DSMP™ Distributed Symmetric Multi-Processing™
OpenMP, Pthreads, POSIX, SysV IPC

DSMP™ enables Symmetric Multi-Processing on the Ada: a single system image with 2 or 4TB single shared memory across 6 server nodes with 448 AMD EPYC™ cores and 40 AMD Radeon Instinct™ MI50 GPUs.

Symmetric Computing Inc.
Venture Development Center | University of Massachusetts | 100 Morrissey Boulevard | Boston, MA 02125
www.SymmetricComputing.com • Phone +1.978.662.8783

Information contained in this document is subject to change without notice and is presented without express or implied warranty. Distributed Symmetric Multi-Processing, DSMP, Ada, Departmental Supercomputer are trademarks of Symmetric Computing. All other trademarks are the property of their respective owners. Copyright 2018 Symmetric Computing Company. All rights reserved.
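As a cross-check on the configuration figures quoted in this datasheet, the short Python sketch below rebuilds the core, GPU and peak TFLOPS totals from the per-node specifications. It is only an illustration under stated assumptions: the function name `ada_totals` is hypothetical, the per-GPU peak rates are not given in the datasheet and are inferred here by dividing the quoted 5-node totals by 40 GPUs, and peak throughput is assumed to scale linearly with GPU count.

```python
# Per-node figures taken from the datasheet.
HEAD_NODE_CORES = 2 * 64        # two 64-core EPYC 7702 processors
COMPUTE_NODE_CORES = 64         # one 64-core EPYC 7702P processor
GPUS_PER_COMPUTE_NODE = 8       # eight Radeon Instinct MI50 GPUs

# Assumption: per-GPU peak rates are inferred by dividing the quoted
# 5-node totals by 40 GPUs; the datasheet does not state them per GPU.
QUOTED_TOTALS_TFLOPS = {"FP16": 1060.0, "FP32": 532.0, "FP64": 264.0}
MAX_GPUS = 5 * GPUS_PER_COMPUTE_NODE
PER_GPU_TFLOPS = {k: v / MAX_GPUS for k, v in QUOTED_TOTALS_TFLOPS.items()}


def ada_totals(compute_nodes):
    """Return aggregate cores, GPUs and peak GPU TFLOPS for a configuration."""
    if not 2 <= compute_nodes <= 5:
        raise ValueError("the datasheet lists 2 to 5 compute nodes")
    gpus = compute_nodes * GPUS_PER_COMPUTE_NODE
    return {
        "cores": HEAD_NODE_CORES + compute_nodes * COMPUTE_NODE_CORES,
        "gpus": gpus,
        # Assumes peak throughput scales linearly with the number of GPUs.
        "tflops": {k: round(v * gpus, 1) for k, v in PER_GPU_TFLOPS.items()},
    }


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, ada_totals(n))
```

For the 5-node configuration this reproduces the 448 cores and 40 GPUs quoted in the datasheet; figures for smaller configurations are estimates that follow from the linear-scaling assumption, not vendor-stated values.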