
PARALLEL AND DISTRIBUTED SUPERCOMPUTING AT CALTECH

Paul Messina
Caltech Concurrent Supercomputing Facilities
Caltech, Mail Code 158-79
Pasadena, California 91125 U.S.A.

Abstract

Caltech uses parallel computers for a variety of large-scale scientific applications. Earlier in the decade, locally designed and built machines were used. More recently we have acquired commercial parallel computers, some of which have performance that rivals or exceeds that of conventional, vector-oriented supercomputers. A new project has been started that builds on our experience with concurrent computers and attempts to apply our methods to the simultaneous use of parallel and vector supercomputers at four institutions that will be connected by an 800 Mbits/sec wide-area network. Distributed supercomputing experiments will be carried out on this testbed.

Concurrent Supercomputing

Researchers at the California Institute of Technology (Caltech) have been using concurrent computers for a wide variety of scientific and engineering applications for nearly a decade. The hypercube architecture developed and first implemented at Caltech in the early 1980's provided the stimulus and opportunity for such early production use of parallel computers. People at Caltech invested their time and effort in learning how to program parallel computers for the usual two reasons: the promise of better price-performance than sequential machines and the ability of parallel computers to scale to much greater performance levels and memory sizes than traditional supercomputers. The first of these has of course been realized by many commercial parallel computers. The second of these advantages, which at Caltech is the more important, has only recently been demonstrated, for example, by the Connection Machine model CM-2 in its largest configuration.

In pursuit of the advantages of parallel computers, in 1988 Caltech established the Caltech Concurrent Supercomputing Facilities (CCSF). CCSF has acquired and operates a variety of concurrent computers, most of which are fast enough that they achieve speeds comparable to one or more CRAY Y-MP processors for real applications. Figure 1 shows the current complement of machines in CCSF. The first-generation NCUBE, the JPL/Caltech Mark IIIfp, the Intel iPSC/860, the Symult S2010, and the CM-2 are all heavily used. Each of these systems has several heavy users at any one point in time. Typically, demand for computational resources is greater than availability, so time on the machines has to be rationed among the users.

Our decision to do large-scale computations on parallel computers has paid off reasonably well. Many groups have used CCSF systems to do computations of a size that would have required prohibitively large time allocations on traditional supercomputers. Perhaps because almost all of our application programs are moderate in size (less than 10,000 lines) and because many of them were developed specifically for a parallel computer, getting efficient parallel implementations has not been the most difficult problem. General system instability, system software immaturity, and bottlenecks in the configurations have been major impediments. In other words, it has been the newness (and therefore immaturity) of the computers rather than their parallel architectures that has created the most problems. For example, the Fortran and C compilers for most of the machines produce rather inefficient sequential code, so that it is frequently necessary to do a little assembler programming to get a reasonable fraction of the hardware's peak speed.

Applications are programmed in Fortran or C with message-passing extensions. Two message-passing systems are widely used on our Multiple Instruction Multiple Data (MIMD) systems: Express, which is a commercial product based on early work at Caltech, and Cosmic Environment/Reactive Kernel, which is the product of Charles Seitz's research group in the Computer Science Department. Both run on a variety of computers, including network-connected workstations, and thus provide portability for parallel programs at the source level.
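
To make the programming model concrete, here is a minimal sketch in C of the single-program style these message-passing systems support: each node owns a slice of a decomposed 1-D grid, relaxes it locally, and trades boundary values with its neighbors every iteration. The my_node, num_nodes, send_msg, and recv_msg routines are hypothetical stand-ins for whichever library is in use (Express or CE/RK); they are stubbed here so the sketch compiles and runs as a single node, and none of this is the actual interface of either system.

/*
 * Sketch of the message-passing style described above (hypothetical API).
 * Each node relaxes its slice of a 1-D grid and swaps boundary ("ghost")
 * values with its left and right neighbors once per iteration.
 */
#include <stdio.h>

#define LOCAL_N 1000          /* grid points owned by this node (assumed size) */

static int my_node(void)   { return 0; }  /* stub: node id from the library   */
static int num_nodes(void) { return 1; }  /* stub: number of nodes            */
static void send_msg(int dest, const double *buf, int n) {
    (void)dest; (void)buf; (void)n;       /* stub: library send primitive     */
}
static void recv_msg(int src, double *buf, int n) {
    (void)src;                            /* stub: library receive primitive  */
    for (int i = 0; i < n; i++) buf[i] = 0.0;
}

int main(void) {
    int me = my_node(), p = num_nodes();
    /* u[0] and u[LOCAL_N+1] are ghost cells holding neighbors' boundary values */
    double u[LOCAL_N + 2], unew[LOCAL_N + 2];
    for (int i = 0; i <= LOCAL_N + 1; i++) u[i] = (double)(me * LOCAL_N + i);

    for (int iter = 0; iter < 100; iter++) {
        /* exchange boundary values with left and right neighbors */
        if (me > 0)     { send_msg(me - 1, &u[1],       1); recv_msg(me - 1, &u[0],           1); }
        if (me < p - 1) { send_msg(me + 1, &u[LOCAL_N], 1); recv_msg(me + 1, &u[LOCAL_N + 1], 1); }

        /* purely local relaxation step on this node's slice */
        for (int i = 1; i <= LOCAL_N; i++) unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
        for (int i = 1; i <= LOCAL_N; i++) u[i] = unew[i];
    }
    printf("node %d of %d finished; u[1] = %g\n", me, p, u[1]);
    return 0;
}

On a real CCSF machine the same source would be loaded onto every node, and only the stubbed routines would change; this is what makes the programs portable at the source level.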

[Figure 1 is a diagram of the CCSF machines and their interconnections. It shows hypercubes and mesh-connected multicomputers with scalar and vector nodes built from custom processors, MC 68020/68882 chips with Weitek XL floating-point units, Intel i860 processors, T800 transputers, and custom 1-bit SIMD processors; node memories of 1 to 16 MB; a 7.2 GB concurrent file system, a parallel disk farm, and parallel graphics hardware; the message-passing software each system runs (Express, NX, CE/RK); and network connections to CITNET, CERFNET, Los Nettos, NSFNET, ESNET, BITNET, HEPNET, and JPL.]

Figure 1. Major Systems in the Caltech Concurrent Supercomputing Facilities.

Caltech is by no means the only institution that is working actively in large-scale computing on parallel computers. Many research laboratories and other universities have also acquired and used parallel computers. In the fall of 1990, thirteen organizations joined Caltech in forming the Concurrent Supercomputing Consortium (CSC). The CSC was formed to acquire computers larger than any single member can currently afford and to share information and expertise on large-scale scientific computing on massively parallel computers. Caltech will acquire and operate the Intel Touchstone Delta System on behalf of the CSC. The Delta will be delivered in the spring of 1991. It is a distributed-memory MIMD system whose nodes are connected in a two-dimensional mesh by mesh-routing chips developed by Seitz at Caltech. With a peak speed of 32 gigaflops and over 8 gigabytes of memory, it will provide a powerful new tool to computational scientists at CSC sites.

Distributed Supercomputing

Using concurrent machines to carry out supercomputer-level computations has proved to be feasible and, with systems like the Intel Delta and Thinking Machines CM-2, provides a path to greater computing power than traditional approaches. However, even those systems are not appropriate or adequate for certain tasks. As scientists study ever more complex phenomena through computer simulation, they often require multidisciplinary approaches, information stored in varied databases, and computing resources exceeding any single supercomputer. The necessary resources are geographically dispersed. All supercomputers cannot be put in one place. Scientists with expertise on various aspects of a problem reside at several universities and research laboratories. Huge, frequently-updated databases are maintained at only one site. Furthermore, the design details of the most advanced supercomputers make some better suited for certain computations than others. In a typical large-scale simulation there are several computational steps; some phases of the computation may be most efficient on a Single Instruction Multiple Data (SIMD) parallel machine, while other phases may run better on a MIMD computer.

Distributing large computations among several supercomputers provides the opportunity both to bring to bear greater computing power than is available in any single machine and to use the most suitable machine for each step of the task. By correctly decomposing an application, nonlinear speedups might be achieved: because each phase can run on the architecture best suited to it, the execution time can be decreased by a factor greater than the sum of the effective speeds of the individual CPUs. In addition, by creating a high-performance distributed environment, a further paradigm shift in science methodology can occur. Computational models can be extended into real-time interactive simulations that integrate data from experiments, satellites, and databases. The field of interactive simulation promises to be an extremely powerful tool for science.

Computer networks can provide the needed connection between the people, the machines, and the data, but today's networks are far too slow to support most large-scale applications across wide areas. This mismatch of speeds precludes effective distribution of work among dispersed supercomputers; the exchange of data among the collaborating computers would be very slow compared to the calculation rate. The computers would spend much of the time waiting for intermediate results to be communicated by their fellow workers so that the next phase of the computation can be tackled. Networks with much greater bandwidth can be built, but the greater speed is not sufficient to solve the problem. Because thousands of miles must be traversed, the time to deliver a message from one location to another will still be significant due to propagation delays.

To address these new areas, Caltech has joined with the Jet Propulsion Laboratory, Los Alamos National Laboratory, and the San Diego Supercomputer Center in the CASA project. CASA has support from the Corporation for National Research Initiatives (CNRI), which was awarded a grant by the National Science Foundation and the Defense Advanced Research Projects Agency for such activities. CASA will create a network testbed that will connect all four sites and operate at gigabit/second speeds. The goal of the CASA testbed is to demonstrate that high-speed networks can be used to provide the necessary computational resources for leading-edge scientific problems, regardless of the geographical location of these resources. Three important scientific problems will be studied by harnessing the computer and data resources at the four institutions.

A challenge is to devise ways to use multiple supercomputers and high-bandwidth channels with large latencies to solve important problems. While high latencies can be minimized by the correct protocol and amortized by transmitting large amounts of data, the choice of algorithms and application decomposition methods will also be investigated for optimally driving the "meta-computer" that will be formed by linking high-performance computers with the network.
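
To see why large transfers matter, consider the simple model time = propagation latency + (bits sent) / bandwidth. The short C program below evaluates it for three block sizes on an 800 Mbit/sec link; the 25 millisecond one-way delay is an assumed, illustrative figure for a path of a few thousand miles, not a CASA measurement.

/* Illustrative latency-amortization estimate (assumed numbers, not CASA data). */
#include <stdio.h>

int main(void) {
    const double latency_s     = 0.025;     /* assumed one-way propagation delay (s) */
    const double bandwidth_bps = 800.0e6;   /* 800 Mbit/sec wide-area link           */
    const double sizes_bytes[] = { 1e3, 1e6, 1e9 };   /* 1 KB, 1 MB, 1 GB blocks     */

    for (int i = 0; i < 3; i++) {
        double wire_s     = (sizes_bytes[i] * 8.0) / bandwidth_bps;
        double transfer_s = latency_s + wire_s;
        printf("%10.0f bytes: %8.4f s total, %5.1f%% of the time spent moving data\n",
               sizes_bytes[i], transfer_s, 100.0 * wire_s / transfer_s);
    }
    return 0;
}

Under these assumptions a kilobyte message spends almost all of its time in flight, while a gigabyte block keeps the link busy more than 99 percent of the time, which is the amortization the preceding paragraph describes.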

The three applications that the CASA testbed will adapt to run in a distributed fashion are from the areas of chemistry, geophysics, and climate modeling. Chemical reaction dynamics computations will be carried out to study the reaction of fluorine and hydrogen, which is relevant to powerful chemical lasers. These computations involve operations on very large matrices and require frequent transfers of large blocks of data between the computers that participate in the calculation. The second application will develop an interactive program for geological applications that takes input from Landsat, seismic, and topographic data. Among the benefits of such analysis will be much clearer identification of fault zones, plate thrusts, and surface erosion effects, and an improved ability to predict earthquake magnitude.
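
Purely as an illustration of that communication pattern, and not as the actual reaction-dynamics code, the sketch below distributes a matrix-vector product by block rows: each participating computer owns a band of rows and must obtain the pieces of the input vector held by the others, so every step moves large contiguous blocks between machines. The loop over owner is a single-process stand-in for work that would run on separate computers.

/* Illustrative block-row distribution of y = A*x across several "computers".
 * Hypothetical sketch: the owners are simulated by a loop in one process;
 * in a distributed run each owner would receive the other blocks of x
 * as large contiguous transfers over the network. */
#include <stdio.h>

#define N      8     /* matrix dimension (toy size)      */
#define OWNERS 2     /* number of participating machines */
#define ROWS   (N / OWNERS)

int main(void) {
    static double A[N][N], x[N], y[N];

    /* toy data: A is the identity with ones added on row 0, x = 1,2,...,N */
    for (int i = 0; i < N; i++) {
        x[i] = i + 1;
        for (int j = 0; j < N; j++) A[i][j] = (i == j) ? 1.0 : 0.0;
        A[0][i] = 1.0;
    }

    for (int owner = 0; owner < OWNERS; owner++) {
        /* In a distributed run, 'owner' would first gather the blocks of x
         * held by the other machines: (OWNERS - 1) transfers of ROWS doubles. */
        for (int i = owner * ROWS; i < (owner + 1) * ROWS; i++) {
            y[i] = 0.0;
            for (int j = 0; j < N; j++) y[i] += A[i][j] * x[j];
        }
    }

    for (int i = 0; i < N; i++) printf("y[%d] = %g\n", i, y[i]);
    return 0;
}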

The climate modeling application will combine ocean and atmospheric models simultaneously running in separate computers and continually exchanging data across the CASA network. The resulting concurrent dynamic model will be much more realistic than existing models that use static data.
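
The coupling pattern can be pictured with the schematic loop below. The function names and the toy physics are invented for illustration; the paper does not describe the real models or their exchange protocol. Each model advances one time step on its own computer, and the fields each needs from the other are then exchanged across the network before the next step; here the exchange is reduced to two assignments in one process.

/* Schematic of two coupled models exchanging fields once per time step.
 * Invented, single-process stand-in: step_ocean/step_atmosphere and the
 * exchange are placeholders for the real models and the network transfer. */
#include <stdio.h>

static double step_ocean(double sea_surface_temp, double wind_stress) {
    /* toy update: sea surface temperature relaxes toward a value set by the wind */
    return sea_surface_temp + 0.1 * (wind_stress - sea_surface_temp);
}

static double step_atmosphere(double wind_stress, double sea_surface_temp) {
    /* toy update: wind stress responds weakly to the underlying ocean */
    return wind_stress + 0.05 * (sea_surface_temp - wind_stress);
}

int main(void) {
    double sst  = 15.0;    /* ocean state held on the ocean computer      */
    double wind = 5.0;     /* atmosphere state held on the other computer */

    for (int step = 0; step < 10; step++) {
        /* each model advances independently on its own machine */
        double new_sst  = step_ocean(sst, wind);
        double new_wind = step_atmosphere(wind, sst);

        /* in the distributed version, these assignments are replaced by
         * sending the updated fields across the wide-area network */
        sst  = new_sst;
        wind = new_wind;

        printf("step %2d: sst = %6.3f  wind = %6.3f\n", step, sst, wind);
    }
    return 0;
}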
