
State-of-the-Art Parallel Computing

Peter S. Lomdahl and David M. Beazley

The availability of state-of-the-art computers presents computational scientists with an exciting challenge. The architects of these machines claim that they are scalable—that a machine containing 100 processors arranged in a parallel architecture can be made 10 times more powerful by employing the same basic design but using 1000 processors instead of 100. Such massively parallel computers are available, and one of the largest—a CM-5 Connection Machine containing 1024 processors—is here at Los Alamos National Laboratory. The challenge is to realize the promise of those machines—to demonstrate that large scientific problems can be computed with the advertised speed and cost-effectiveness. The challenge is significant because the architecture of parallel computers differs dramatically from that of conventional vector supercomputers, which have been the standard tool for scientific computing for the last two decades.

Opening illustration: This "snapshot" from a molecular-dynamics simulation shows a thin plate undergoing fracture. The plate contains 38 million particles. This state-of-the-art simulation was performed using a scalable algorithm, called SPaSM, specifically designed to run on the massively parallel CM-5 at Los Alamos National Laboratory. The velocities of the particles are indicated by color—red and yellow particles have higher velocities than blue and gray particles.



The massively parallel architecture requires an entirely new approach to programming. A computation must somehow be divided equally among the hundreds of individual processors, and communication between processors must be rapid and kept to a minimum. The incentive to work out these issues was heightened by Gordon Bell, an independent consultant to the computer industry and well-known skeptic of massively parallel machines. Six years ago he initiated a formal competition for achievements in large-scale scientific computation on vector and parallel machines. The annual prizes are awarded through and recognized by the Computer Society of the Institute of Electrical and Electronic Engineers (IEEE). The work reported here was awarded one of the 1993 IEEE Gordon Bell prizes. We were able to demonstrate very high performance as measured by speed: Our simulation, called SPaSM (scalable parallel short-range molecular dynamics), ran on the massively parallel CM-5 at a rate of 50 gigaflops (50 billion floating-point operations per second). That rate is nearly 40% of the theoretical maximum; typical programs achieve only about 20% of the theoretical maximum. Also, the algorithm we developed has the property of scalability: when the number of processors used was doubled, the time required to run the simulation was cut in half.

Our achievement was the adaptation of the molecular-dynamics method to the architecture of parallel machines. Molecular dynamics (MD) is a first-principles approach to the study of materials. MD starts with the fundamental building blocks—the individual atoms—that make up a material and follows the motions of the atoms as they interact with each other through interatomic forces. Even the smallest piece of material that is useful for experimental measurement contains over ten billion atoms. Therefore, the larger the number of atoms included in the calculation, the more interesting the results for materials science. Scientists would very much like to perform simulations that predict the motions of billions of atoms during a dynamical process such as machining or fracture and compare the predictions with experiment. Unfortunately, the realization of that prospect is not possible today. However, if the machines can be scaled up in both memory capacity and speed, as the computer manufacturers have promised, then simulations can be scaled up proportionally so that billion-atom simulations may become a reality. Furthermore, as we show here, the current generation of massively parallel machines has made it possible to simulate the motions of tens of millions of atoms. Such simulations are large enough to perform meaningful studies of the bulk properties of matter and to improve the approximate models of those properties used in standard continuum descriptions of macroscopic systems. Molecular-dynamics simulations on massively parallel computers thus represent a new opportunity to study the dynamical properties of materials from first principles.

Massively parallel machines have only recently become popular, so the vendors have not yet developed the software—the compilers—needed to optimize a given program for the particular architecture of their machines. Consequently, to develop a scalable algorithm for molecular dynamics that would achieve high performance, we needed to understand the inner workings of the CM-5 and incorporate that knowledge into the details of our program. Much of our work focused on implementing a particular technique for interprocessor communication known as "message passing." In this approach programmers use explicit instructions to initiate communication among the processors of a parallel system, and the processors accomplish the communication by sending messages to each other.

Message-passing programming is advantageous not only in achieving high performance, but also in increasing "portability," because it is easily adapted for use on a number of massively parallel machines. We were able to run our message-passing algorithm on another parallel computer, the Cray T3D, with minimal modification.

Following brief descriptions of MD and the architecture and properties of the CM-5, we present the method by which we have mapped a molecular-dynamics simulation to the architecture of the CM-5. Our example illustrates message passing and other techniques that are widely applicable to effective programming of massively parallel computers.

Molecular Dynamics

Molecular dynamics is a computational method used to track the positions and velocities of individual particles (atoms or groups of atoms) in a material as the particles interact with each other and respond to external influences. The motion of each particle is calculated by Newton's equation of motion, force = mass × acceleration, where the force on a given particle depends on the interactions of the particle with other particles. The mutual interaction of many particles simultaneously is called a "many-body problem" and has no analytical, or exact, solution because the force on a particle is always changing in a complex way as each particle in the system moves under the influence of the others.
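To make the many-body bookkeeping concrete, the sketch below shows the brute-force version of a single force evaluation in C: every particle's acceleration is rebuilt from its interactions with every other particle, which is why the forces must be recomputed at every timestep. The names are ours and the pair force is left abstract here; the specific Lennard-Jones form and the far more efficient parallel organization used in SPaSM are described in the sections that follow.

/* Brute-force evaluation of the many-body forces for one timestep, in two
   dimensions.  pair_force() returns the force on particle i due to particle
   j; its form (the interaction potential) is discussed in the next section. */
void pair_force(double dx, double dy, double *fx, double *fy);

void compute_accelerations(const double *x, const double *y,
                           double *ax, double *ay, int n, double mass)
{
    for (int i = 0; i < n; i++) {
        ax[i] = 0.0;
        ay[i] = 0.0;
        for (int j = 0; j < n; j++) {      /* every other particle contributes */
            if (j == i)
                continue;
            double fx, fy;
            pair_force(x[i] - x[j], y[i] - y[j], &fx, &fy);
            ax[i] += fx / mass;            /* Newton: acceleration = force/mass */
            ay[i] += fy / mass;
        }
    }
}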



The MD algorithm provides an approximate solution to the many-body problem by treating time as a succession of discrete timesteps and treating the force on (and therefore the acceleration of) each particle as a constant for the duration of a single timestep. At each timestep the position and velocity of each particle are updated on the basis of the constant acceleration it experiences during that timestep. For example, at time t, at the beginning of a timestep of length Δt, particle i has position r_i and velocity v_i. The force on particle i from the other particles is calculated; the force determines the acceleration a_i of particle i for the duration of this timestep. From elementary physics, the position r_i' and velocity v_i' at the later time t + Δt are given by

\[ r_i' = r_i + v_i\,\Delta t + \tfrac{1}{2}\,a_i\,(\Delta t)^2 \quad\text{and}\quad v_i' = v_i + a_i\,\Delta t. \]

Sometimes these equations are replaced by more sophisticated approximations for the new positions and velocities; the approximations involve interpolations from positions and velocities at time t and the earlier time t − Δt. Such schemes conserve energy and momentum more effectively than the simple update of position and velocity given above.

Whatever the scheme, the equations yield a new position and velocity for each particle at the new time t + Δt. The procedure is repeated for each succeeding timestep, and the result is a series of "snapshots" of the positions and velocities of the particles at times t = 0, Δt, 2Δt, ..., nΔt.

In MD simulations particles are usually treated as point particles (that is, they have no extent), and the force between any two particles is usually approximated as the gradient of a potential function V that depends only on the distance r between the two particles. The force on particle i from particle j can be written

\[ F_{ij} = -\left.\frac{dV}{dr}\right|_{r_{ij}}, \quad\text{where}\quad r_{ij} = r_i - r_j, \]

and that force is directed along the line connecting the two particles. If the force is negative, particles i and j attract one another; if the force is positive, the particles repel each other. The net force exerted on particle i by all other particles j is approximated by summing the forces calculated from all pair-wise interactions between particle i and each particle j.

The standard pair-wise potential used in MD simulations is the Lennard-Jones potential, which is known empirically to provide a reasonably good "fit" to experimental data for atom-atom interactions in many solids and liquids. As shown in Figure 1, this potential has both an attractive part and a repulsive part:

\[ V(r) = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right], \]

where r is the distance between the two particles and σ and ε are adjustable parameters. When the distance between two particles is at the minimum of the potential function, the force between the two particles is zero and the particles are in their equilibrium positions.

Figure 1. The Lennard-Jones Interaction Potential. The Lennard-Jones interaction potential V between two particles is plotted as a function of the distance r between the particles. The potential is in units of ε, and the distance between the particles is in units of σ. The parameter σ is equal to the value of r at which the potential function crosses zero. The potential has a positive slope at longer distances, corresponding to an attractive force, and a steeply negative slope at short distances, corresponding to a strong repulsive force. Separating the two regions is the local minimum of the potential—the deepest point in the potential "well"—which has a depth equal to ε and is located at r = 2^(1/6)σ. At the potential minimum the force between the particles is zero. Therefore, in the absence of effects from any other particles, the separation between two particles will tend to oscillate around r = 2^(1/6)σ.
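As a concrete (and much simplified) illustration of the formulas above, here is how the Lennard-Jones pair force and the single-timestep update might look in C. This is our own sketch in reduced units (σ = ε = m = 1) and two dimensions, not the SPaSM code, and the function names are ours.

/* Lennard-Jones pair force on particle i due to particle j, in reduced
   units (sigma = epsilon = 1) and two dimensions.  dx and dy are the
   components of r_ij = r_i - r_j; fx and fy receive the force components.
   Differentiating V(r) gives F(r) = -dV/dr = 24*(2/r^13 - 1/r^7), directed
   along the line connecting the two particles. */
void lj_pair_force(double dx, double dy, double *fx, double *fy)
{
    double r2  = dx*dx + dy*dy;       /* r^2 */
    double ir2 = 1.0 / r2;
    double ir6 = ir2 * ir2 * ir2;     /* 1/r^6 */
    /* F(r)/r, so that multiplying by dx and dy projects the force onto
       the line connecting the two particles. */
    double f_over_r = 24.0 * (2.0 * ir6 * ir6 - ir6) * ir2;
    *fx = f_over_r * dx;
    *fy = f_over_r * dy;
}

/* Advance one particle through a single timestep dt with the constant-
   acceleration update quoted in the text:
       r' = r + v*dt + (1/2)*a*(dt)^2,    v' = v + a*dt,
   where a = F/m and the mass m is taken to be 1. */
void advance_particle(double *x, double *y, double *vx, double *vy,
                      double fx, double fy, double dt)
{
    *x  += (*vx) * dt + 0.5 * fx * dt * dt;
    *y  += (*vy) * dt + 0.5 * fy * dt * dt;
    *vx += fx * dt;
    *vy += fy * dt;
}

Note that the sign convention matches the text: at large separations f_over_r is negative and the force pulls the particles together; at short separations it is positive and strongly repulsive.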


In the absence of other particles nearby, the separation of two particles will tend to oscillate around this distance. Physically, the average nearest-neighbor distance of atoms in many solids and fluids corresponds very closely to the position of the minimum in the atom-atom potential.

For our molecular-dynamics simulations, we simplify the force calculation greatly by using a truncated form of the Lennard-Jones potential. That is, we "cut off" the long, flat tail of the potential and assume that the force between particles is exactly zero for particle separations greater than some chosen rmax, the interaction cut-off distance. In other words, when calculating the forces for particle i, we need consider only those particles j that are within a distance rmax of i. The time required to calculate the pair interactions with all the surrounding particles is reduced considerably since a given particle "feels" forces from perhaps only a hundred neighbors instead of the millions or billions of particles included in the simulation. The truncated potential provides an excellent approximation for solid materials because in solids the effects of distant atoms are "screened" by nearer atoms. The fact that we can use a short-range potential is the most significant difference between our problem and that described in "A Fast Tree Code for Many-Body Problems," in which the force is long-range and therefore each particle always interacts with all other particles in the system.

Mapping Molecular Dynamics onto the Architecture of the CM-5

The molecular-dynamics method has been implemented on electronic computers since 1957. The first such calculations were performed on the UNIVAC, the first commercially available computer. Initially only 32 particles were used, and only about 300 interactions per hour could be calculated. The rapid advance of computers has made it possible for us—using the massively parallel CM-5—to include tens of millions of atoms in our simulation of materials behavior. Since materials contain large numbers of repeating units (units that can be treated in parallel), calculations of materials behavior are ideally suited for implementation on massively parallel architectures.

Figure 2 illustrates the architecture of the CM-5, which was built by Thinking Machines Corporation. The computer was constructed from large numbers of relatively inexpensive, high-volume microprocessor and memory components. Such a scheme allows the hardware designers to build a high-performance supercomputer without a major effort in the development of new microprocessor and memory circuits. The Laboratory's CM-5 has 1024 processing nodes, but partitions as small as 32 nodes may be used.

The processing nodes in the CM-5 are connected by two types of high-speed communication networks: a control network and a data network. The control network performs synchronization functions, broadcasts, and reductions. The synchronization functions are used to coordinate the work of the individual processors. Broadcasts are messages sent to all processors simultaneously, such as a set of instructions to be executed by each processor. Reductions are operations in which data are collected from all processors for the calculation of some overall value such as the total energy of a system.

The data network carries simultaneous point-to-point communication between multiple processing nodes at a rate of over 5 megabytes per second per node. Both networks have a "fat-tree" topology, a scalable structure in which the total bandwidth (capacity for data transmission) of the network increases in proportion to the number of processors. When new processing nodes are added, the networks are expanded so that the effective bandwidth between any two processors does not decrease. Maintaining a high bandwidth is critical to the scalability of the CM-5. A machine having more processing nodes will have a greater total amount of network "traffic." Therefore the larger computer must also have a greater network capacity in order to realize the expected gain in performance.

Our MD algorithm uses the data network extensively—large amounts of particle data are transmitted between processors. The control network is also very important to our algorithm. A single program is broadcast over the control network to the processors, each of which executes that program for the appropriate subset of the particles in the simulation. The processing nodes are allowed to operate independently during most of the calculations. However, there are steps in our algorithm that require synchronization of the processors. For example, the calculation of new particle positions and velocities cannot proceed until the total force acting on each particle has been determined. A special synchronization function ensures that the latter calculation is complete before the former is initiated.

Access to the communication networks is provided by a message-passing library supplied by Thinking Machines. The library allows two types of communication—synchronous and asynchronous. We use both communication styles in our MD algorithm.

All programs on the CM-5 use some form of message passing between processing nodes. Data-parallel languages such as C* and FORTRAN-90 perform message passing transparently; that is, the underlying communication is hidden from the user.
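The control-network reductions described above (operations that combine a value from every processor into a single total, such as the total energy of the system) can be sketched as follows. The article does not list the Thinking Machines library calls, so this fragment uses an MPI collective purely as a stand-in, with hypothetical variable names; it assumes each node stores the velocities of only the particles it owns.

#include <mpi.h>

/* Sketch of a reduction: each node sums the kinetic energy of the particles
   it owns, and a single collective operation combines the partial sums into
   the total for the whole system on every node.  MPI_Allreduce is used here
   only as a stand-in for the CM-5 control-network reduction; vx, vy, and
   nlocal are hypothetical per-node arrays and counts (particle mass = 1). */
double total_kinetic_energy(const double *vx, const double *vy, int nlocal)
{
    double local = 0.0;
    double total = 0.0;

    for (int i = 0; i < nlocal; i++)
        local += 0.5 * (vx[i]*vx[i] + vy[i]*vy[i]);

    MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return total;            /* identical on every processing node */
}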



[Figure 2 diagram: (a) a 16-node CM-5, with processing nodes (PN) connected in parallel by a fat-tree data network; (b) a 32-node CM-5; (c) a processing node, consisting of a RISC microprocessor with data-network and control-network interfaces, four 64-bit vector units, and four 8-megabyte memory banks.]

Figure 2. The Architecture of the CM-5 Connection Machine. The diagram shows the general layout of the CM-5, one of the largest massively parallel computers. Also shown in the diagram is the manner in which this parallel architecture can be scaled up in size. (a) Sixteen processing nodes are connected in parallel by a network that carries data between the processors. A control network (not shown), similar in structure to the data network, carries instructions to the nodes. The networks have a "fat tree" topology—the upper branches (which connect a large number of nodes and therefore must carry more data) have a higher bandwidth (capacity for data transmission) so that the overall point-to-point bandwidth is at least 5 megabytes per second for any pair of nodes. (b) The addition of more processing nodes to the system (shown in gray) requires that the networks be expanded to a "higher" level in order to maintain the point-to-point bandwidth. (c) Each processing node consists of a 33-megahertz SPARC RISC (Reduced Instruction Set Computer) microprocessor, 32 megabytes of memory, and four vector-processing units that perform 64-bit floating-point and integer arithmetic at a maximum combined rate of 128 million floating-point operations per second. The 1024-node CM-5 at the Laboratory also has 400 gigabytes (400 billion bytes) of disk space distributed over a number of disk drives (not shown), which are connected in parallel to the networks by input/output processors. The parallel arrangement of the disk drives allows data to be transferred rapidly to disk; however, the data of a single write operation will be spread over all disks. For more details of computer elements and architectures see "How Computers Work: An Introduction to Serial, Vector, and Parallel Computers."
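The per-node figures in this caption also account for the performance numbers quoted in the introduction; the arithmetic below is ours, using only values stated in the article.

\[ 1024 \ \text{nodes} \times 128 \ \text{Mflops per node} \approx 131 \ \text{Gflops (theoretical peak)}, \qquad \frac{50 \ \text{Gflops}}{131 \ \text{Gflops}} \approx 0.38, \]

that is, the 50-gigaflop SPaSM run achieved nearly 40% of the machine's theoretical maximum.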


However, to achieve higher performance we wrote explicit message-passing instructions into our program. It was designed to be executed on all of the processing nodes simultaneously. Once the program is broadcast to the processing nodes, each node works independently on a small part of the problem and manages its own data and interprocessor communications.

Data structures of the MD algorithm. The particles in a molecular-dynamics simulation occupy a simulated region of space called a "computational box," shown in Figure 3a (for simplicity, the algorithm is illustrated in two dimensions). The box is divided into domains of equal size, one for each processor (Figure 3b). The assignment of particles to processors is made according to the domain in which each particle is located. The data associated with each particle include its velocity and its coordinates within the computational box.

Because any simulation is finite in size, we often need a method to effectively eliminate the boundaries of the simulation. Therefore, we apply periodic boundary conditions to the computational box. That is, the entire computational box with all of its particles is treated as just one unit of an infinite lattice of identical units. Thus a particle near a boundary of the computational box will be treated as if it is interacting with particles in an adjacent box containing an identical number of particles with identical positions and velocities. One result of such boundary conditions is that when a particle moves out through one side of the computational box, an identical particle with the same velocity enters through the boundary at the opposite side of the box. Therefore, both linear momentum and the number of particles in the computational box are conserved.

Typically the volume of the computational box is large enough that the processor domains have dimensions significantly larger than the interaction cut-off distance, rmax. In such cases each processor will have been assigned a significant number of particles that do not interact with each other. Computing the forces between all pairs of particles within each domain is unnecessary because the result for each pair of particles separated by more than rmax would be zero. Therefore, the domain of each processor is subdivided into an identical number of cells, each having dimensions slightly larger than rmax (see Figure 3c). The number of such cells depends only on the size of the domain and rmax, not on the number of available processors. The domains may be subdivided into thousands of cells for simulations with large spatial dimensions. In the example in Figure 3, there are 4 processor domains and each domain has 16 cells. The cell structure forms the foundation of our algorithm.

Each particle is represented in the computer by a C-programming-language data structure consisting of the particle parameters, including position, velocity, force, and type. Associated with each cell is a small block of memory containing a sequential list of the particle data for that cell. Storing particle data in this manner facilitates the communication steps of the force calculation. The entire contents of a cell can be communicated to other processors simply by sending a small block of memory.

MD force calculation. The calculation of forces acting on the particles is the most time-consuming part of the MD method. To optimize the calculation, particles that are neighbors physically should also be near one another in the memory of the computer. However, such an arrangement is difficult to maintain because each particle is free to move anywhere in space—particles that are initially neighbors may separate during the course of an MD simulation, and particles that are initially far apart may become neighbors. The cell structure just described was designed to keep the particles organized so that the force calculation proceeds efficiently.

The force on a given particle includes contributions from all the other particles that are closer than rmax. Because the cell size has been chosen to be slightly larger than rmax, all the particles that must be considered are located within the cell containing the given particle or within adjacent cells.

Each processor follows an identical, predetermined sequence to calculate the forces on the particles within its assigned domain. For each cell in its domain, the processor computes forces exerted on particles in the cell by other particles in the same cell as well as by particles located in adjacent cells. The forces due to particles in adjacent cells are calculated in the order given by the interaction path shown in Figure 4a. Use of an identical path for each cell ensures that all contributions to the total force on every particle are computed without performing redundant calculations.

Typical calculations have hundreds or thousands of cells per processor domain, so for most of the cells of a given domain all neighboring cells are also assigned to the same domain. Therefore, the interaction calculation can be performed for most cells without any communication between processors. However, for cells along the edge of each processor domain, some of the adjacent cells are assigned to other processors (as in Figure 4b), and the message-passing features of the CM-5 must be utilized so that particle data for those adjacent cells can be sent to and received from the neighboring processors.



Because each processor has been assigned an identical cell structure and follows the same sequence of operations through that structure, all processors will be required to transmit particle data for the same internal cell at approximately the same time. At such times the processors synchronize and participate in send-and-receive communication. Processors assigned to domains with fewer particles will proceed through the interaction calculations faster. Therefore, when working on cells at the edges of the domain, those processors may be required to wait for others before they can synchronize. The details of synchronized message passing are shown in Figure 4c.

At every message-passing step, each processor sends the particle data for an entire cell to the appropriate processors and receives corresponding information from other processors. The force calculation then proceeds, now that each processor has available in its local memory the particle data needed to implement the next step along the interaction path. Note that synchronization occurs only during message passing. At all other times, the processors are running asynchronously.

Redistribution of particles. When the force calculation is complete, the processors compute the new positions and velocities of all particles in the system. Since our algorithm is based on the cell structure of the processor domains, the possibility of changes in particle positions requires that the computer data structures for each cell are updated regularly. A special redistribution function checks the coordinates of each particle after every timestep. If the new coordinates of a particle indicate that it has moved to a new cell, the particle parameters must be transferred to the particle list of the new cell (see Figure 5a).

Figure 3. The Data Structure of the MD Algorithm. This figure illustrates the mapping of a molecular-dynamics problem onto a parallel processor with 4 nodes. For simplicity, the algorithm is illustrated in two dimensions. (a) The computational box. A "computational box" contains all the particles in the simulation. A single coordinate system is used for the entire computational box so that each particle has a unique set of coordinates. (b) The processor domains. The computational box is divided into 4 equal domains, each of which is assigned to a separate processor as denoted by the numbers 1, 2, 3, and 4. (c) The cell structures of each processor domain. The domain assigned to each processor is subdivided into cells having dimensions just larger than rmax, the interaction cut-off distance. In this example the domain of each processor has been divided into 16 cells. The circle of radius rmax is centered on a particle and indicates the area containing all the other particles that are close enough to affect the motion of the center particle. Note that all the particles contained in this circle are located either within the same cell as the center particle or within adjacent cells. Associated with each cell is a small block of memory containing a sequential list of the particle data for that cell.
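A minimal sketch of the data structures just described might look like the following C fragment. The field and type names are ours; the article specifies only that each particle carries position, velocity, force, and type, and that each cell owns a small block of memory holding a sequential list of its particles. Two dimensions are used, as in Figure 3.

#define MAX_CELL_PARTICLES 64      /* hypothetical capacity of a cell's block */

/* One particle: position, velocity, accumulated force, and a type tag
   (the parameters named in the text). */
typedef struct {
    double x, y;                   /* coordinates in the computational box */
    double vx, vy;                 /* velocity */
    double fx, fy;                 /* force accumulated during this timestep */
    int    type;                   /* particle type */
} Particle;

/* One cell: a small contiguous block of memory holding a sequential list
   of the particles currently inside the cell.  Because the list is
   contiguous, the entire contents of a cell can be sent to another
   processor as a single block of memory. */
typedef struct {
    int      n;                          /* particles currently in the cell */
    Particle p[MAX_CELL_PARTICLES];      /* sequential particle list */
} Cell;

/* Each processor's domain is an ncx-by-ncy array of such cells, each cell
   slightly larger than the interaction cut-off distance rmax on a side. */

The fixed-size array is a simplification for illustration; a production code would more likely size or grow each cell's block dynamically.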



Figure 4. MD Force Calculation. This figure shows the steps that are performed for each cell in order to calculate and sum up the forces acting on all particles. Each processor works on its cells in the order 1 through 16 and follows an identical interaction pathway for each cell.

(a) Interaction path for cell 7. The processors are performing the interaction calculations for cell 7, having already done so for cells 1 through 6. First, all forces between particles within cell 7 are computed and the results are sent to an accumulator that adds all the pair-wise contributions for each particle. Then the block of memory containing the list of particle data for cell 8 is called up and used to calculate the forces exerted on particles in cell 7 by particles in cell 8. The results are sent to the accumulator for cell 7. By Newton's third law, the forces exerted on particles in cell 8 by particles in cell 7 are equal and opposite to those exerted on particles in cell 7 by particles in cell 8. Consequently, the results are also sent to the accumulator for cell 8. The process is repeated three times more for cell 7, calculating forces involving particles in cell 7 with those in cells 12, 11, and 10. The forces exerted on particles in cell 7 by particles in cells 2, 3, 4, and 6 were sent to the accumulator for cell 7 when the interaction pathways were followed for cells 2, 3, 4, and 6. Because all the particles that must be considered for cell 7 are within the domain of a single processor, no message passing is required.

(b) Interaction path for cell 8 (synchronous message passing steps are red). The processors are calculating forces for cell 8. Because cell 8 is located at the edge of a processor domain, some steps of the interaction path (shown by red arrows) cross the boundary between processor domains. To carry out the entire force calculation for cell 8 (and for all other cells along the edge of processor domains) the data for the particles in cell 8 must be made available to neighboring processors. Synchronous message passing is used to transfer the particle data for cell 8 to and from neighboring processors. Note that because the computational box has periodic boundary conditions, interaction paths leave the box at one edge while equivalent interaction paths enter the box at the opposite edge. Because each processor has the same cell structure and proceeds through the cells in the same order, all the processors will need to send and/or receive particle data for a given edge cell at approximately the same time. Those processors that are assigned to neighboring domains in the computational box will exchange data with one another during the force calculation. Therefore, synchronization of message passing between the appropriate processors is straightforward.

(c) Synchronous message passing between processing nodes (PN). One processor is sending data to another via synchronous message passing. The source processor stops computing, issues a "Ready to send" signal, and waits for a "Ready to receive" signal from the destination processor. After the "Ready to receive" signal is issued, the data is transferred, and then both processors resume calculations.

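The interaction path of Figure 4a translates into a short double loop over cells. The sketch below is our own rendering, not the SPaSM source: for each cell it computes the same-cell pairs and then pairs with the four neighbors reached by the path (east, northeast, north, northwest), accumulating equal and opposite forces so that no pair is visited twice. It assumes the Particle and Cell declarations and the lj_pair_force routine from the earlier sketches. For edge cells, the neighboring cell's particle list would first have to be obtained from another processor by the synchronous message passing of Figures 4b and 4c; that communication step is omitted here.

/* Offsets along the interaction path of Figure 4a: east, northeast, north,
   and northwest neighbors.  Together with the same-cell pairs, sweeping
   every cell with this fixed path visits each nearby pair of cells exactly
   once.  (A production code would also skip pairs farther apart than rmax.) */
static const int path_dx[4] = { 1, 1, 0, -1 };
static const int path_dy[4] = { 0, 1, 1,  1 };

/* Accumulate forces between every particle in cell a and every particle in
   cell b (a and b may be the same cell).  By Newton's third law, equal and
   opposite contributions are added to the accumulators of both cells. */
static void cell_pair_forces(Cell *a, Cell *b)
{
    for (int i = 0; i < a->n; i++) {
        int jstart = (a == b) ? i + 1 : 0;    /* avoid counting a pair twice */
        for (int j = jstart; j < b->n; j++) {
            double fx, fy;
            lj_pair_force(a->p[i].x - b->p[j].x,
                          a->p[i].y - b->p[j].y, &fx, &fy);
            a->p[i].fx += fx;   a->p[i].fy += fy;
            b->p[j].fx -= fx;   b->p[j].fy -= fy;
        }
    }
}

/* Sweep all cells of this processor's domain (ncx by ncy cells) in a fixed
   order.  Neighbors that fall outside the domain belong to another
   processor; their particle lists would first be obtained by synchronous
   message passing, which is omitted in this sketch. */
static void compute_forces(Cell *domain, int ncx, int ncy)
{
    for (int cy = 0; cy < ncy; cy++) {
        for (int cx = 0; cx < ncx; cx++) {
            Cell *c = &domain[cy * ncx + cx];
            cell_pair_forces(c, c);                      /* same-cell pairs */
            for (int k = 0; k < 4; k++) {                /* path neighbors  */
                int nx = cx + path_dx[k];
                int ny = cy + path_dy[k];
                if (nx >= 0 && nx < ncx && ny >= 0 && ny < ncy)
                    cell_pair_forces(c, &domain[ny * ncx + nx]);
            }
        }
    }
}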


Figure 5. Redistribution of Particles. After the force calculation is completed for a given timestep, new positions and velocities are calculated for all the particles.

(a) Moving particles to proper cells (asynchronous message passing steps are red). The new coordinates of each particle are checked by a special redistribution function. For each case in which the new coordinates of a particle correspond to a location in a different cell, the redistribution function also transfers the data for that particle to the sequential list of particle data for the new cell. Most of the transfers are between cells within the same processor domain. Those transfers that cross a boundary between processor domains are executed by means of message passing between processing nodes and are indicated by red arrows in the figure.

(b) Asynchronous message passing between processing nodes (PN). Since there is no way to know beforehand which processors will be involved in the transfer of particles, the transfer is accomplished by an asynchronous mode of message passing. During asynchronous message passing, the source processor issues a "Ready to send" signal and immediately resumes calculation (contrast with synchronous message passing in Figure 4c). All processors periodically check the network and retrieve any waiting messages. The only significant interruption to the calculations of either node is the time used for the transfer of the message.
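The protocol in panel (b) can be sketched roughly as follows. The Thinking Machines library calls used on the CM-5 are not listed in the article, so MPI's nonblocking send and message probe are used here purely as stand-ins for the "Ready to send" and periodic-network-check behavior; the Particle type is the one from the earlier data-structure sketch, and TAG_PARTICLE and the cell bookkeeping are hypothetical.

#include <mpi.h>

#define TAG_PARTICLE 1

/* Send one departing particle to the processor that owns its new cell,
   without waiting for the receiver.  The nonblocking send plays the role of
   "issue 'Ready to send' and immediately resume calculation."  The caller
   must keep *p unchanged until the request completes (e.g., via MPI_Wait). */
void send_particle_async(const Particle *p, int new_owner, MPI_Request *req)
{
    MPI_Isend((void *)p, sizeof(Particle), MPI_BYTE,
              new_owner, TAG_PARTICLE, MPI_COMM_WORLD, req);
}

/* Called periodically during normal computation: drain any particle
   messages that have arrived and hand each one to the local cell lists. */
void poll_network(void)
{
    int pending;
    MPI_Status st;

    MPI_Iprobe(MPI_ANY_SOURCE, TAG_PARTICLE, MPI_COMM_WORLD, &pending, &st);
    while (pending) {
        Particle p;
        MPI_Recv(&p, sizeof(Particle), MPI_BYTE, st.MPI_SOURCE,
                 TAG_PARTICLE, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* ... append p to the sequential list of the cell that now
           contains (p.x, p.y) on this processor ... */
        MPI_Iprobe(MPI_ANY_SOURCE, TAG_PARTICLE, MPI_COMM_WORLD, &pending, &st);
    }
}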




There are two types of particle transfers to consider. One involves the simple transfer of a particle from one cell to another within a single processor domain. In this case, deleting particle data from one cell and adding it to a new cell requires only copying of memory contents within a single processing node. The second type of transfer requires moving particle data to a cell list in another domain, so the particle data is deleted from the original cell list and sent to the new processor by message passing. The redistribution process cannot use synchronized message passing since it is not known how many particles will leave each processor, how many will be received by each processor, and from what processors particles will arrive. Therefore all processors are put into an asynchronous message-passing mode. Asynchronous message passing occurs in the background while the particle coordinates are checked (see Figure 5b). The messages are addressed so that they go to the appropriate processor, and each node periodically checks the network and receives any messages that are waiting. Messages may be sent or received at any time, and, in general, all processors operate independently. The redistribution of particle data is complete when all particle coordinates have been checked and all messages have been retrieved from the data network.

Our experience indicates that the redistribution of particles is efficient and accounts for a small portion of the overall iteration time. Generally, the number of particles changing cells after any given timestep is small compared with the total number of particles in the system. In addition, since most of the particles that change cells simply move to another cell on the same processor, the inter-processor communication required for the redistribution procedure is minimal.

Table 1. Update Times per Timestep in Scaling Simulations. Shown in the table are the timing results for a set of test simulations (rmax = 2.5σ). Each entry in the table is the CPU time (in seconds) required to compute a single timestep for the given combination of numbers of particles and processors.

                              Processors
  Particles           32     64    128    256    512   1024
  1,024,000         8.90   4.51   2.32   1.26   0.72   0.44
  2,048,000            —   8.96   4.44   2.46   1.36   0.74
  4,096,000            —      —   8.79   4.81   2.67   1.36
  8,192,000            —      —  16.83   8.81   4.80   2.47
  16,384,000           —      —      —  16.95   8.74   4.49
  32,768,000           —      —      —      —  16.90   8.54
  65,536,000           —      —      —      —      —  16.55
  131,072,000          —      —      —      —      —  34.26

Figure 6. The Scalability of the MD Algorithm. The figure is a plot of the CPU time required to simulate a single timestep versus the reciprocal of the number of processors used for the calculation. The dots represent the results obtained from a 4-million-particle simulation (rmax = 5σ) in which particles rearranged from a simple cubic lattice to a face-centered cubic lattice. The line represents perfect scaling from the 32-node data point, that is, an update time that is inversely proportional to the number of processing nodes used in the simulation. The results of this series of test calculations indicate that the performance of our algorithm, at least for this type of simulation, scales very well with the number of available processors. [The plot shows update time in seconds versus 1/(number of processors); the measured points are labeled by the number of processors used (32 through 1024) and are compared with the line for 100% scaling efficiency extrapolated from the 32-node result.]
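Parallel efficiency can be read directly from these results. For a fixed number of particles, doubling the processor count ideally halves the update time, so the ratio of the ideal time to the measured time gives the efficiency. The arithmetic below is ours, using the 32,768,000-particle row of Table 1 and the 32-to-1024-processor speedup quoted in the accompanying text for the rmax = 5σ run.

\[ \text{efficiency (512 to 1024 nodes)} = \frac{T_{512}}{2\,T_{1024}} = \frac{16.90\ \text{s}}{2 \times 8.54\ \text{s}} \approx 0.99, \]

\[ \text{efficiency (32 to 1024 nodes, } r_{\mathrm{max}} = 5\sigma) = \frac{T_{32}/T_{1024}}{1024/32} \approx \frac{30.4}{32} \approx 0.95 . \]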


The architecture and capabilities of the CM-5 are well suited for molecular-dynamics applications. The high-speed communication networks allow us to use message passing to efficiently transfer data among the processors. An equivalent domain was assigned to each processor in order to divide the workload among the processors, and the cell structure and interaction path ensure that the interaction calculations are completed in a minimum amount of time.

The Scalability of our MD Algorithm

In ideal circumstances, the speed of a well-designed parallel algorithm will scale linearly with the number of processors used in the parallel computer. To test the scalability of our algorithm we repeatedly performed a test simulation using various numbers of processors and various numbers of particles ranging from 1 million to 131 million. The starting condition for the tests consisted of identical particles arranged in a simple cubic lattice. The simple cubic lattice—known to be unstable for an interaction potential that depends only on r and to undergo a phase change in which the particles rearrange to a face-centered cubic configuration—was chosen to guarantee movement of particles between domains during the calculation.

Table 1 shows the results of the scaling tests in which rmax was set equal to 2.5σ. Each doubling of the number of processors used for a given number of particles approximately halved the update times (the CPU time required to compute all particle motions during a single timestep). For a 4-million-particle simulation in which rmax was equal to 5σ, the increase in speed in going from 32 processors to 1024 processors was more than a factor of 30, corresponding to 95% parallel efficiency. The latter scaling results are shown graphically in Figure 6, where the update times for the 4-million-particle simulation are plotted as a function of the reciprocal of the number of processors used. The scalability of our MD algorithm was also demonstrated by comparing the update times for varying numbers of particles (see Table 1). For a given number of processors, the update times increased linearly with the number of particles in the simulation.

Applications

As a demonstration of the practicality of our algorithm, we performed a simulation of a 1-million-particle projectile colliding with a 10-million-particle plate. Figure 7 is a series of images taken from the simulation; each image is a "snapshot" showing the positions and velocities of the particles at the end of a timestep. The velocities are indicated by color—red and yellow particles are moving faster than blue and gray particles. The top panel in Figure 7 shows the system at timestep 1000, just after the projectile has made contact with the plate. Note that the particles in the lower third of the projectile have higher kinetic energies (higher velocities) than those in the remainder of the projectile. Some of the particles in the plate also have increased kinetic energies. At timestep 2000 (middle panel of Figure 7) the projectile has nearly penetrated the plate and part of the projectile has begun to break up. A shock wave has propagated to the edges of the plate. By timestep 2900 (bottom panel of Figure 7) part of the projectile has been absorbed into the plate, the remainder of the projectile has disintegrated into free particles, and the shape of the plate has become significantly distorted.

The impact simulation shows the inherently unstructured nature of MD simulations. Tracking the motions of individual particles often requires significant amounts of communication and data management. We have been able to achieve high performance by carefully mapping the physics of the problem to the architecture of the CM-5. The fast iteration times we have attained allow the calculation of larger MD simulations than have previously been possible. For example, the illustration at the beginning of this article was taken from one of the largest MD simulations performed to date—our 38-million-particle simulation of a plate undergoing fracture. Because our algorithm is scalable, the size of the physical system that can be modeled is expected to increase proportionally as each new generation of massively parallel machines is developed.

We are now working to incorporate new types of interaction potentials into our MD algorithm. The particular atomic interactions will be described by materials-specific interaction potentials like the "embedded-atom method" potential. Such potential functions are created by adding a "many-body" term to the pair-interaction potential presented in this paper. The many-body term varies according to the local density surrounding a given atom and thereby results in a more physically realistic interaction potential.

We are using our MD algorithm to study the physics of fracture. We are particularly interested in understanding the brittle and ductile behavior of materials. In ductile fracture the propagation of cracks is accompanied by the formation of a significant number of dislocations (discontinuities in the crystalline structure of a material) at the crack tip.


Figure 7. An 11-Million-Particle Impact Simulation. The three images show the progress of an MD simulation in which a 1-million-particle projectile impacts a 10-million-particle plate. The particles of both the projectile and the plate were initially at rest in equilibrium positions in face-centered cubic lattices. The interactions between particles were approximated by using a Lennard-Jones potential. The colors in the figure represent the velocities of the particles; red and yellow particles have higher velocities than blue and gray particles. The simulation of 2900 timesteps required 30 hours of CPU time on a 512-processor CM-5. The top, middle, and bottom panels correspond to timesteps 1000, 2000, and 2900, respectively.


These dislocations are responsible for permanent deformations in the material. When the fracture is brittle, the crack propagates with little or no permanent deformation. Our goal is to understand and ultimately control parameters that influence these phenomena because such knowledge and ability will allow us to design materials having specific responses and behavior. Large-scale molecular dynamics is an ideal tool for the investigation of fracture phenomena because MD allows us to perform simulations of near-macroscopic samples using very few simplifications or approximations.

Acknowledgements

It is a pleasure to acknowledge the contributions of Niels Grønbech-Jensen, Pablo Tamayo, and Brad L. Holian to parts of the work described here. We also acknowledge generous support from the staff of the Advanced Computing Laboratory, who provided both machine access and assistance. In particular, David O. Rich helped with the use of the CM-5, and Michael F. Krogh provided invaluable help with visualization.

David M. Beazley (left) is a graduate student in the Center for Nonlinear Studies and the Condensed Matter and Statistical Physics Group. His research interests include massively parallel supercomputing, high-performance computer architecture, and scientific computing. Beazley began working at the Laboratory while an undergraduate in 1990. He received a B.A. in mathematics from Fort Lewis College in 1991, and is currently working on a Ph.D. in computational science at the University of Utah.

Peter S. Lomdahl is a staff member in the Condensed Matter and Statistical Physics Group in the Theoretical Division, where he has worked on computational condensed-matter and materials-science research since 1985. From 1982 to 1985 he was a postdoctoral fellow with the Center for Nonlinear Studies. Lomdahl received his M.S. in electrical engineering and his Ph.D. in mathematical physics from the Technical University of Denmark in 1979 and 1982. His research interests include parallel computing and nonlinear phenomena in condensed-matter physics and materials science.

Further Reading

M. P. Allen and D. J. Tildesley. 1987. Computer Simulation of Liquids. Clarendon Press.

D. M. Beazley and P. S. Lomdahl. 1994. Message-passing multi-cell molecular dynamics on the Connection Machine 5. Parallel Computing 20: 173–195.

D. M. Beazley, P. S. Lomdahl, P. Tamayo, and N. Grønbech-Jensen. 1994. A high performance communications and memory caching scheme for molecular dynamics on the CM-5. In Proceedings of the Eighth International Parallel Processing Symposium. IEEE Computer Society.

R. C. Giles and P. Tamayo. 1992. A parallel scalable approach to short-range molecular dynamics on the CM-5. In Proceedings of Scalable High Performance Computing Conference 1992. IEEE Computer Society.

P. S. Lomdahl, P. Tamayo, N. Grønbech-Jensen, and D. M. Beazley. 1993. 50 GFlops molecular dynamics on the Connection Machine. In Proceedings of Supercomputing 1993. IEEE Computer Society.

S. Plimpton. 1993. Fast parallel algorithms for short-range molecular dynamics. Sandia National Laboratory Report SAND91-1144, UC-705.

P. Tamayo, J. P. Mesirov, and B. M. Boghosian. 1991. Parallel approaches to short range molecular dynamics simulations. In Proceedings of Supercomputing 1991. IEEE Computer Society.
