State-Of-The-Art Parallel Computing, Molecular Dynamics on The

Total Page:16

File Type:pdf, Size:1020Kb

State-Of-The-Art Parallel Computing, Molecular Dynamics on The he availability of State-of-the-Art massively parallel Tcomputers presents computational scientists Parallel Computing with an exciting challenge. The architects of these machines claim that they are scalable—that a machine molecular dynamics on the connection machine containing 100 processors arranged in a parallel archi- tecture can be made 10 times more powerful by employing the same basic Peter S. Lomdahl and David M. Beazley design but using 1000 processors instead of 100. Such massively parallel computers are available, and one of largest—a CM-5 Connection Machine con- taining 1024 processors—is here at Los Alamos National Laboratory. The challenge is to realize the promise of those machines—to demon- strate that large scientific This “snapshot” from a molecular dynamics simulation shows a thin plate problems can be computed undergoing fracture. The plate contains 38 million particles. This state-of- with the advertised speed the-art simulation was performed using a scalable algorithm, called SPaSM, and cost-effectiveness. specifically designed to run on the massively parallel CM-5 supercomputer at The challenge is signifi- Los Alamos National Laboratory. The velocities of the particles are indicated by color—red and yellow particles have higher velocities than blue and gray cant because the architec- particles. ture of parallel computers differs dramatically from that of conventional vector supercomputers, which have been the standard tool for scientific computing for the last two decades. 44 Los Alamos Science Number 22 1994 Number 22 1994 Los Alamos Science 45 State-of-the-Art Parallel Computing The massively parallel architecture perimental measurement contains over approach programmers use explicit in- requires an entirely new approach to ten billion atoms. Therefore, the larger structions to initiate communication programming. A computation must the number of atoms included in the among the processors of a parallel sys- somehow be divided equally among the calculation, the more interesting the re- tem, and the processors accomplish the hundreds of individual processors and sults for materials science. Scientists communication by sending messages to communication between processors would very much like to perform simu- each other. must be rapid and kept to a minimum. lations that predict the motions of bil- Message-passing programming is ad- The incentive to work out these issues lions of atoms during a dynamical vantageous not only in achieving high was heightened by Gordon Bell, an in- process such as machining or fracture performance, but also in increasing dependent consultant to the computer and compare the predictions with ex- “portability,” because it is easily adapt- industry and well-known skeptic of periment. Unfortunately, the realization ed for use on a number of massively massively parallel machines. Six years of that prospect is not possible today. parallel machines. We were able to run ago he initiated a formal competition However, if the machines can be scaled our message-passing algorithm on an- for achievements in large-scale scientif- up in both memory capacity and speed, other parallel computer, the Cray T3D, ic computation on vector and parallel as the computer manufacturers have with minimal modification. machines. The annual prizes are promised, then simulations can be Following brief descriptions of MD awarded through and recognized by the scaled up proportionally so that billion- and the architecture and properties of Computer Society of the Institute of atom simulations may become a reality. the CM-5, we present the method by Electrical and Electronic Engineers Furthermore, as we show here, the cur- which we have mapped a molecular-dy- (IEEE). The work reported here was rent generation of massively parallel namics simulation to the architecture of awarded one of the 1993 IEEE Gordon machines has made it possible to simu- the CM-5. Our example illustrates Bell prizes. We were able to demon- late the motions of tens of millions of message passing and other techniques strate very high performance as mea- atoms. Such simulations are large that are widely applicable to effective sured by speed: Our simulation, called enough to perform meaningful studies programming of massively parallel SPaSM (scalable parallel short-range of the bulk properties of matter and to computers. molecular dynamics), ran on the mas- improve the approximate models of sively parallel CM-5 at a rate of 50 gi- those properties used in standard con- gaflops (50 billion floating-point opera- tinuum descriptions of macroscopic sys- Molecular Dynamics tions per second). That rate is nearly tems. Molecular-dynamics simulations 40% of the theoretical maximum; typi- on massively parallel computers thus Molecular dynamics is a computa- cal programs achieve only about 20% represent a new opportunity to study tional method used to track the posi- of the theoretical maximum. Also, the the dynamical properties of materials tions and velocities of individual parti- algorithm we developed has the proper- from first principles. cles (atoms or groups of atoms) in a ty of scalability. When the number of Massively parallel machines have material as the particles interact with processors used was doubled, the time only recently become popular, so the each other and respond to external in- required to run the simulation was cut vendors have not yet developed the fluences. The motion of each particle in half. software—the compilers—needed to is calculated by Newton’s equation of Our achievement was the adaptation optimize a given program for the par- motion, force 5 mass 3 acceleration, of the molecular-dynamics method to ticular architecture of their machines. where the force on a given particle de- the architecture of parallel machines. Consequently, to develop a scalable pends on the interactions of the particle Molecular dynamics (MD) is a first- parallel algorithm for molecular dynam- with other particles. The mutual inter- principles approach to the study of ma- ics that would yield high performance, action of many particles simultaneously terials. MD starts with the fundamental we needed to understand the inner is called a “many-body problem” and building blocks—the individual workings of the CM-5 and incorporate has no analytical, or exact, solution be- atoms—that make up a material and that knowledge into the details of our cause the force on a particle is always follows the motions of the atoms as program. Much of our work focused changing in a complex way as each they interact with each other through on implementing a particular technique particle in the system moves under the interatomic forces. Even the smallest for interprocessor communication influence of the others. piece of material that is useful for ex- known as “message passing.” In this The MD algorithm provides an ap- 46 Los Alamos Science Number 22 1994 State-of-the-Art Parallel Computing molecular1.adb 7/26/94 1.0 The procedure is repeated for each suc- ceeding timestep, and the result is a se- ries of “snapshots” of the positions and "Repulsive" region velocities of the particles at times t50, 0.5 ) Dt, 2Dt, … , nDt. ε In MD simulations particles are usu- ally treated as point particles (that is, 0.0 they have no extent), and the force be- tween any two particles is usually ap- proximated as the gradient of a poten- Potential (in units of tial function V that depends only on the -0.5 "Attractive" region distance r between the two particles. The force on particle i from particle j can be written -1.0 0 1 2 3 4 dV Distance between particles (in units of σ) F 5 2} , where r 5 r 2 r , ij dr ij i j rij Figure 1. The Lennard-Jones Interaction Potential The Lennard-Jones interaction potential V between two particles is plotted as a func- and that force is directed along the line tion of the distance r between the particles. The potential is in units of e, and the dis- connecting the two particles. If the tance between the particles is in units of s. The parameter s is equal to the value of r force is negative, particles i and j at- at which the potential function crosses zero. The potential has a positive slope at tract one another; if the force is posi- longer distances, corresponding to an attractive force, and a steeply negative slope at tive, the particles repel each other. The short distances, corresponding to a strong repulsive force. Separating the two regions net force exerted on particle i by all is the local minimum of the potential—the deepest point in the potential “well”—which other particles j is approximated by has a depth equal to e and is located at r 5 21/6s. At the potential minimum the force summing the forces calculated from all between the particles is zero. Therefore, in the absence of effects from any other parti- pair-wise interactions between particle i cles, the separation between two particles will tend to oscillate around r 5 21/6s. and each particle j. The standard pair-wise potential proximate solution to the many-body used in MD simulations is the Lennard- 1 ′ } 2 problem by treating time as a succes- ri 5 ri 1 viDt 1 2 ai(Dt) Jones potential, which is known empiri- sion of discrete timesteps and treating cally to provide a reasonably good “fit” the force on (and therefore the accelera- and to experimental data for atom-atom in- ′ tion of) each particle as a constant for vi 5 vi 1 aiDt. teractions in many solids and liquids. the duration of a single timestep. At As shown in Figure 1, this potential has each timestep the position and velocity Sometimes, these equations are re- both an attractive part and a repulsive of each particle is updated on basis of placed by more sophisticated approxi- part: the constant acceleration it experiences mations for the new positions and ve- during that timestep.
Recommended publications
  • 2.5 Classification of Parallel Computers
    52 // Architectures 2.5 Classification of Parallel Computers 2.5 Classification of Parallel Computers 2.5.1 Granularity In parallel computing, granularity means the amount of computation in relation to communication or synchronisation Periods of computation are typically separated from periods of communication by synchronization events. • fine level (same operations with different data) ◦ vector processors ◦ instruction level parallelism ◦ fine-grain parallelism: – Relatively small amounts of computational work are done between communication events – Low computation to communication ratio – Facilitates load balancing 53 // Architectures 2.5 Classification of Parallel Computers – Implies high communication overhead and less opportunity for per- formance enhancement – If granularity is too fine it is possible that the overhead required for communications and synchronization between tasks takes longer than the computation. • operation level (different operations simultaneously) • problem level (independent subtasks) ◦ coarse-grain parallelism: – Relatively large amounts of computational work are done between communication/synchronization events – High computation to communication ratio – Implies more opportunity for performance increase – Harder to load balance efficiently 54 // Architectures 2.5 Classification of Parallel Computers 2.5.2 Hardware: Pipelining (was used in supercomputers, e.g. Cray-1) In N elements in pipeline and for 8 element L clock cycles =) for calculation it would take L + N cycles; without pipeline L ∗ N cycles Example of good code for pipelineing: §doi =1 ,k ¤ z ( i ) =x ( i ) +y ( i ) end do ¦ 55 // Architectures 2.5 Classification of Parallel Computers Vector processors, fast vector operations (operations on arrays). Previous example good also for vector processor (vector addition) , but, e.g. recursion – hard to optimise for vector processors Example: IntelMMX – simple vector processor.
    [Show full text]
  • A Massively-Parallel Mixed-Mode Computer Designed to Support
    This paper appeared in th International Parallel Processing Symposium Proc of nd Work shop on Heterogeneous Processing pages NewportBeach CA April Triton A MassivelyParallel MixedMo de Computer Designed to Supp ort High Level Languages Christian G Herter Thomas M Warschko Walter F Tichy and Michael Philippsen University of Karlsruhe Dept of Informatics Postfach D Karlsruhe Germany Mo dula Abstract Mo dula pronounced Mo dulastar is a small ex We present the architectureofTriton a scalable tension of Mo dula for massively parallel program mixedmode SIMDMIMD paral lel computer The ming The programming mo del of Mo dula incor novel features of Triton are p orates b oth data and control parallelism and allows hronous and asynchronous execution mixed sync Support for highlevel machineindependent pro Mo dula is problemorientedinthesensethatthe gramming languages programmer can cho ose the degree of parallelism and mix the control mo de SIMD or MIMDlike as need Fast SIMDMIMD mode switching ed bytheintended algorithm Parallelism maybe nested to arbitrary depth Pro cedures may b e called Special hardware for barrier synchronization of from sequential or parallel contexts and can them multiple process groups selves generate parallel activity without any restric tions Most Mo dula programs can b e translated into ecient co de for b oth SIMD and MIMD archi A selfrouting deadlockfreeperfect shue inter tectures connect with latency hiding Overview of language extensions The architecture is the outcomeofanintegrated de Mo dula extends Mo dula
    [Show full text]
  • Simulating Physics with Computers
    International Journal of Theoretical Physics, VoL 21, Nos. 6/7, 1982 Simulating Physics with Computers Richard P. Feynman Department of Physics, California Institute of Technology, Pasadena, California 91107 Received May 7, 1981 1. INTRODUCTION On the program it says this is a keynote speech--and I don't know what a keynote speech is. I do not intend in any way to suggest what should be in this meeting as a keynote of the subjects or anything like that. I have my own things to say and to talk about and there's no implication that anybody needs to talk about the same thing or anything like it. So what I want to talk about is what Mike Dertouzos suggested that nobody would talk about. I want to talk about the problem of simulating physics with computers and I mean that in a specific way which I am going to explain. The reason for doing this is something that I learned about from Ed Fredkin, and my entire interest in the subject has been inspired by him. It has to do with learning something about the possibilities of computers, and also something about possibilities in physics. If we suppose that we know all the physical laws perfectly, of course we don't have to pay any attention to computers. It's interesting anyway to entertain oneself with the idea that we've got something to learn about physical laws; and if I take a relaxed view here (after all I'm here and not at home) I'll admit that we don't understand everything.
    [Show full text]
  • Think in G Machines Corporation Connection
    THINK ING MACHINES CORPORATION CONNECTION MACHINE TECHNICAL SUMMARY The Connection Machine System Connection Machine Model CM-2 Technical Summary ................................................................. Version 6.0 November 1990 Thinking Machines Corporation Cambridge, Massachusetts First printing, November 1990 The information in this document is subject to change without notice and should not be construed as a commitment by Thinking Machines Corporation. Thinking Machines Corporation reserves the right to make changes to any products described herein to improve functioning or design. Although the information in this document has been reviewed and is believed to be reliable, Thinking Machines Corporation does not assume responsibility or liability for any errors that may appear in this document. Thinking Machines Corporation does not assume any liability arising from the application or use of any information or product described herein. Connection Machine® is a registered trademark of Thinking Machines Corporation. CM-1, CM-2, CM-2a, CM, and DataVault are trademarks of Thinking Machines Corporation. C*®is a registered trademark of Thinking Machines Corporation. Paris, *Lisp, and CM Fortran are trademarks of Thinking Machines Corporation. C/Paris, Lisp/Paris, and Fortran/Paris are trademarks of Thinking Machines Corporation. VAX, ULTRIX, and VAXBI are trademarks of Digital Equipment Corporation. Symbolics, Symbolics 3600, and Genera are trademarks of Symbolics, Inc. Sun, Sun-4, SunOS, and Sun Workstation are registered trademarks of Sun Microsystems, Inc. UNIX is a registered trademark of AT&T Bell Laboratories. The X Window System is a trademark of the Massachusetts Institute of Technology. StorageTek is a registered trademark of Storage Technology Corporation. Trinitron is a registered trademark of Sony Corporation.
    [Show full text]
  • Massively Parallel Computing with CUDA
    Massively Parallel Computing with CUDA Antonino Tumeo Politecnico di Milano 1 GPUs have evolved to the point where many real world applications are easily implemented on them and run significantly faster than on multi-core systems. Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs. Jack Dongarra Professor, University of Tennessee; Author of “Linpack” Why Use the GPU? • The GPU has evolved into a very flexible and powerful processor: • It’s programmable using high-level languages • It supports 32-bit and 64-bit floating point IEEE-754 precision • It offers lots of GFLOPS: • GPU in every PC and workstation What is behind such an Evolution? • The GPU is specialized for compute-intensive, highly parallel computation (exactly what graphics rendering is about) • So, more transistors can be devoted to data processing rather than data caching and flow control ALU ALU Control ALU ALU Cache DRAM DRAM CPU GPU • The fast-growing video game industry exerts strong economic pressure that forces constant innovation GPUs • Each NVIDIA GPU has 240 parallel cores NVIDIA GPU • Within each core 1.4 Billion Transistors • Floating point unit • Logic unit (add, sub, mul, madd) • Move, compare unit • Branch unit • Cores managed by thread manager • Thread manager can spawn and manage 12,000+ threads per core 1 Teraflop of processing power • Zero overhead thread switching Heterogeneous Computing Domains Graphics Massive Data GPU Parallelism (Parallel Computing) Instruction CPU Level (Sequential
    [Show full text]
  • CS 677: Parallel Programming for Many-Core Processors Lecture 1
    1 CS 677: Parallel Programming for Many-core Processors Lecture 1 Instructor: Philippos Mordohai Webpage: mordohai.github.io E-mail: [email protected] Objectives • Learn how to program massively parallel processors and achieve – High performance – Functionality and maintainability – Scalability across future generations • Acquire technical knowledge required to achieve the above goals – Principles and patterns of parallel programming – Processor architecture features and constraints – Programming API, tools and techniques 2 Important Points • This is an elective course. You chose to be here. • Expect to work and to be challenged. • If your programming background is weak, you will probably suffer. • This course will evolve to follow the rapid pace of progress in GPU programming. It is bound to always be a little behind… 3 Important Points II • At any point ask me WHY? • You can ask me anything about the course in class, during a break, in my office, by email. – If you think a homework is taking too long or is wrong. – If you can’t decide on a project. 4 Logistics • Class webpage: http://mordohai.github.io/classes/cs677_s20.html • Office hours: Tuesdays 5-6pm and by email • Evaluation: – Homework assignments (40%) – Quizzes (10%) – Midterm (15%) – Final project (35%) 5 Project • Pick topic BEFORE middle of the semester • I will suggest ideas and datasets, if you can’t decide • Deliverables: – Project proposal – Presentation in class – Poster in CS department event – Final report (around 8 pages) 6 Project Examples • k-means • Perceptron • Boosting – General – Face detector (group of 2) • Mean Shift • Normal estimation for 3D point clouds 7 More Ideas • Look for parallelizable problems in: – Image processing – Cryptanalysis – Graphics • GPU Gems – Nearest neighbor search 8 Even More… • Particle simulations • Financial analysis • MCMC • Games/puzzles 9 Resources • Textbook – Kirk & Hwu.
    [Show full text]
  • Core Processors
    UNIVERSITY OF CALIFORNIA Los Angeles Parallel Algorithms for Medical Informatics on Data-Parallel Many-Core Processors A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Computer Science by Maryam Moazeni 2013 © Copyright by Maryam Moazeni 2013 ABSTRACT OF THE DISSERTATION Parallel Algorithms for Medical Informatics on Data-Parallel Many-Core Processors by Maryam Moazeni Doctor of Philosophy in Computer Science University of California, Los Angeles, 2013 Professor Majid Sarrafzadeh, Chair The extensive use of medical monitoring devices has resulted in the generation of tremendous amounts of data. Storage, retrieval, and analysis of such data require platforms that can scale with data growth and adapt to the various behavior of the analysis and processing algorithms. In recent years, many-core processors and more specifically many-core Graphical Processing Units (GPUs) have become one of the most promising platforms for high performance processing of data, due to the massive parallel processing power they offer. However, many of the algorithms and data structures used in medical and bioinformatics systems do not follow a data-parallel programming paradigm, and hence cannot fully benefit from the parallel processing power of ii data-parallel many-core architectures. In this dissertation, we present three techniques to adapt several non-data parallel applications in different dwarfs to modern many-core GPUs. First, we present a load balancing technique to maximize parallelism in non-serial polyadic Dynamic Programming (DP), which is a family of dynamic programming algorithms with more non-uniform data access pattern. We show that a bottom-up approach to solving the DP problem exploits more parallelism and therefore yields higher performance.
    [Show full text]
  • Massively Parallel Computers: Why Not Prirallel Computers for the Masses?
    Maslsively Parallel Computers: Why Not Prwallel Computers for the Masses? Gordon Bell Abstract In 1989 I described the situation in high performance computers including several parallel architectures that During the 1980s the computers engineering and could deliver teraflop power by 1995, but with no price science research community generally ignored parallel constraint. I felt SIMDs and multicomputers could processing. With a focus on high performance computing achieve this goal. A shared memory multiprocessor embodied in the massive 1990s High Performance looked infeasible then. Traditional, multiple vector Computing and Communications (HPCC) program that processor supercomputers such as Crays would simply not has the short-term, teraflop peak performanc~:goal using a evolve to a teraflop until 2000. Here's what happened. network of thousands of computers, everyone with even passing interest in parallelism is involted with the 1. During the first half of 1992, NEC's four processor massive parallelism "gold rush". Funding-wise the SX3 is the fastest computer, delivering 90% of its situation is bright; applications-wise massit e parallelism peak 22 Glops for the Linpeak benchmark, and Cray's is microscopic. While there are several programming 16 processor YMP C90 has the greatest throughput. models, the mainline is data parallel Fortran. However, new algorithms are required, negating th~: decades of 2. The SIMD hardware approach of Thinking Machines progress in algorithms. Thus, utility will no doubt be the was abandoned because it was only suitable for a few, Achilles Heal of massive parallelism. very large scale problems, barely multiprogrammed, and uneconomical for workloads. It's unclear whether The Teraflop: 1992 large SIMDs are "generation" scalable, and they are clearly not "size" scalable.
    [Show full text]
  • Introduction to Parallel Computing
    INTRODUCTION TO PARALLEL COMPUTING Plamen Krastev Office: 38 Oxford, Room 117 Email: [email protected] FAS Research Computing Harvard University OBJECTIVES: To introduce you to the basic concepts and ideas in parallel computing To familiarize you with the major programming models in parallel computing To provide you with with guidance for designing efficient parallel programs 2 OUTLINE: Introduction to Parallel Computing / High Performance Computing (HPC) Concepts and terminology Parallel programming models Parallelizing your programs Parallel examples 3 What is High Performance Computing? Pravetz 82 and 8M, Bulgarian Apple clones Image credit: flickr 4 What is High Performance Computing? Pravetz 82 and 8M, Bulgarian Apple clones Image credit: flickr 4 What is High Performance Computing? Odyssey supercomputer is the major computational resource of FAS RC: • 2,140 nodes / 60,000 cores • 14 petabytes of storage 5 What is High Performance Computing? Odyssey supercomputer is the major computational resource of FAS RC: • 2,140 nodes / 60,000 cores • 14 petabytes of storage Using the world’s fastest and largest computers to solve large and complex problems. 5 Serial Computation: Traditionally software has been written for serial computations: To be run on a single computer having a single Central Processing Unit (CPU) A problem is broken into a discrete set of instructions Instructions are executed one after another Only one instruction can be executed at any moment in time 6 Parallel Computing: In the simplest sense, parallel
    [Show full text]
  • Massively Parallel Processor Architectures for Resource-Aware Computing
    Massively Parallel Processor Architectures for Resource-aware Computing Vahid Lari, Alexandru Tanase, Frank Hannig, and J¨urgen Teich Hardware/Software Co-Design, Department of Computer Science Friedrich-Alexander University Erlangen-N¨urnberg (FAU), Germany {vahid.lari, alexandru-petru.tanase, hannig, teich}@cs.fau.de Abstract—We present a class of massively parallel processor severe, when considering massively parallel architectures, with architectures called invasive tightly coupled processor arrays hundreds to thousands of resources, all working with real-time (TCPAs). The presented processor class is a highly parame- requirements. terizable template, which can be tailored before runtime to fulfill costumers’ requirements such as performance, area cost, As a remedy, we present a domain-specific class of mas- and energy efficiency. These programmable accelerators are well sively parallel processor architectures called invasive tightly suited for domain-specific computing from the areas of signal, coupled processor arrays (TCPA), which offer built-in and image, and video processing as well as other streaming processing scalable resource management. The term “invasive” stems from applications. To overcome future scaling issues (e. g., power con- sumption, reliability, resource management, as well as application a novel paradigm called invasive computing [3], for designing parallelization and mapping), TCPAs are inherently designed and programming future massively parallel computing systems in a way to support self-adaptivity and resource awareness at (e.g., heterogeneous MPSoCs). The main idea and novelty of hardware level. Here, we follow a recently introduced resource- invasive computing is to introduce resource-aware program- aware parallel computing paradigm called invasive computing ming support in the sense that a given application gets the where an application can dynamically claim, execute, and release ability to explore and dynamically spread its computations resources.
    [Show full text]
  • Real-Time Network Traffic Simulation Methodology with a Massively Parallel Computing Architecture
    TRANSPORTATION RESEA RCH RECORD 1358 13 Advanced Traffic Management System: Real-Time Network Traffic Simulation Methodology with a Massively Parallel Computing Architecture THANAVAT }UNCHAYA, GANG-LEN CHANG, AND ALBERTO SANTIAGO The advent of parallel computing architectures presents an op­ of these ATMS developments would ensure optimal net­ portunity fo r lra nspon a tion professionals 1 imulate a large- ca le workwise performance. tra ffi c network with sufficienily fa t response time for real-time The anticipated benefits of these control methods depend opcrarion. Howeve r, ii neccssira tc ·. a fundament al change in the on the complex interactions among principal traffic system modeling algorithm LO tuke full ad va ntage of parallel computing. • uch a methodology t imulare tra ffic n twork with the Con­ components. These systems include driver behavior, level of nection Machine, a massively parallel computer, is described . The congestion, dynamic nature of traffic patterns, and the net­ basic parallel computing architectures are introdu ed, along with work's geometric configuration. It is crucial to the design of a list of commercially available parall el comput ers. This is fol ­ these strategies that a comprehensive understanding of the lowed by an in-depth presentation of the proposed simulation complex interrelations between these key system components meth odology with a massively parallel computer. The propo ed traffic simulation model ha. an inherent path-proces ing capa­ be established. Because it is often difficult for theoretical bilit y to represent drivers ' roure choice behavior at the individual­ formulations to take all such complexities into account, traffic vehicle level.
    [Show full text]
  • Massively Parallel Message Passing: Time to Start from Scratch?
    Massively Parallel Message Passing: Time to Start from Scratch? Anthony Skjellum, PhD University of Alabama at Birmingham Department of Computer and Information Sciences College of Arts and Sciences Birmingham, AL, USA [email protected] Abstract primitive abstractions do not do enough to abstract how operations are performed compared to what is to be performed. Extensions Yes: New requirements have invalidated the use of MPI and to and attempts to standardize with distributed shared memory (cf, other similar library-type message passing systems at full scale on MPI-2 one-sided primitives [10]) have been included, but are systems of 100's of millions of cores. Uses on substantial subsets currently somewhat incompatible with PGAS languages and sys- of these systems should remain viable, subject to limitations. tems [15], and furthermore MPI isn't easy to mutually convert No: If the right set of implementation and application deci- with BSP-style notations or semantics either [16]. Connection sions are made, and runtime strategies put in place, existing MPI with publish-subscribe models [17] used in military / space-based notations can used within exascale systems at least to a degree. fault-tolerant message passing systems are absent (see also [18]). Maybe: Implications of instantiating “MPI worlds” as islands The hypothesis of this paper is that new requirements arising inside exascale systems cannot damage overall scalability too in exascale computation have at least partly invalidated the use of much, or they will inhibit exascalability, and thus fail. the parallel model expressed and implied by MPI and other simi- lar library-type message passing systems at full scale on systems Categories and Subject Descriptors F.1.2.
    [Show full text]