PARALLEL AND DISTRIBUTED SUPERCOMPUTING AT CALTECH

Paul Messina
Caltech Concurrent Supercomputing Facilities
Caltech, Mail Code 158-79
Pasadena, California 91125 U.S.A.

Abstract

Caltech uses parallel computers for a variety of large-scale scientific applications. Earlier in the decade locally designed and built machines were used. More recently we have acquired commercial parallel computers, some of which have performance that rivals or exceeds that of conventional, vector-oriented supercomputers. A new project has been started that builds on our experience with concurrent computers and attempts to apply our methods to the simultaneous use of parallel and vector supercomputers at four institutions that will be connected by an 800 Mbits/sec. wide-area computer network. Distributed supercomputing experiments will be carried out on this testbed.

Concurrent Supercomputing

Researchers at the California Institute of Technology (Caltech) have been using concurrent computers for a wide variety of scientific and engineering applications for nearly a decade. The hypercube computer architecture developed and first implemented at Caltech in the early 1980's provided the stimulus and opportunity for such early production use of parallel computers. People at Caltech invested their time and effort in learning how to program parallel computers for the usual two reasons: the promise of better price-performance than sequential machines and the ability of parallel computers to scale to much greater performance levels and memory sizes than traditional supercomputers. The first of these has of course been realized by many commercial parallel computers. The second of these advantages, which at Caltech is the more important, has only recently been demonstrated, for example, by the Connection Machine model CM-2 in its largest configuration.

In pursuit of the advantages of parallel computers, in 1988 Caltech established the Caltech Concurrent Supercomputing Facilities (CCSF). CCSF has acquired and operates a variety of concurrent computers, most of which are fast enough that they achieve speeds comparable to one or more CRAY Y-MP processors for real applications. Figure 1 shows the current complement of machines in CCSF. The first-generation NCUBE, the JPL/Caltech Mark IIIfp, the Intel iPSC/860, the Symult S2010, and the CM-2 are all heavily used. Each of these systems has several heavy users at any one point in time. Typically, demand for computational resources is greater than availability, so time on the machines has to be rationed among the users.

Our decision to do large-scale computations on parallel computers has paid off reasonably well. Many groups have used CCSF systems to do computations of a size that would have required prohibitively large time allocations on traditional supercomputers. Perhaps because almost all of our application programs are moderate in size (less than 10,000 lines) and because many of them were developed specifically for a parallel computer, getting efficient parallel implementations has not been the most difficult problem. General system instability, system software immaturity, and bottlenecks in the machine configurations have been major impediments. In other words, it has been the newness (and therefore immaturity) of the computers rather than their parallel architectures that has created the most problems. For example, the Fortran and C compilers for most of the machines produce rather inefficient sequential code, so that it is frequently necessary to do a little assembler programming to get a reasonable fraction of the hardware's peak speed.

Applications are programmed in Fortran or C with message passing extensions. Two message-passing systems are widely used on our Multiple Instruction Multiple Data (MIMD) systems: Express, which is a commercial product based on early work at Caltech, and Cosmic Environment/Reactive Kernel, which is the product of Charles Seitz's research group in the Computer Science Department. Both run on a variety of computers, including network-connected workstations, and thus provide portability for parallel programs at the source level.
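To make the programming model concrete, here is a minimal sketch of the kind of node program this style implies: every node runs the same C code, computes on its own slice of the data, and exchanges explicit messages with numbered peers. The calls msg_send, msg_recv, node_id, and num_nodes are hypothetical stand-ins for the corresponding primitives of a message-passing library such as Express or Cosmic Environment/Reactive Kernel, not the actual API of either system.

/* Schematic MIMD node program: each node sums its own slice of
 * the data, then the partial sums are accumulated by passing them
 * around a ring of nodes. msg_send/msg_recv/node_id/num_nodes are
 * hypothetical stand-ins for a message-passing library's calls,
 * not the real Express or CE/RK interface. */
#include <stdio.h>

extern void msg_send(int dest, void *buf, int nbytes); /* hypothetical */
extern void msg_recv(int src, void *buf, int nbytes);  /* hypothetical */
extern int  node_id(void);                             /* hypothetical */
extern int  num_nodes(void);                           /* hypothetical */

#define N_LOCAL 1000

int main(void)
{
    int me = node_id();
    int nproc = num_nodes();
    double local[N_LOCAL], sum = 0.0, partial;
    int i;

    for (i = 0; i < N_LOCAL; i++)   /* each node fills and sums   */
        local[i] = 1.0;             /* its own slice of the data  */
    for (i = 0; i < N_LOCAL; i++)
        sum += local[i];

    /* Ring accumulation: node k waits for the running total from
     * node k-1, adds its own contribution, and forwards the result
     * to node k+1; the grand total arrives back at node 0. */
    if (me != 0) {
        msg_recv(me - 1, &partial, sizeof partial);
        sum += partial;
    }
    msg_send((me + 1) % nproc, &sum, sizeof sum);
    if (me == 0) {
        msg_recv(nproc - 1, &partial, sizeof partial);
        printf("global sum = %g\n", partial);
    }
    return 0;
}

Because nothing in the program depends on the physical interconnect, the same source can run on a hypercube, a mesh, or a collection of networked workstations once the library maps logical node numbers onto the hardware; that is the source-level portability described above.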
Caltech is by no means the only institution that is working actively in large-scale computing on parallel computers. Many research laboratories and other universities have also acquired and used parallel computers. In the fall of 1990, thirteen organizations joined Caltech in forming the Concurrent Supercomputing Consortium (CSC). The CSC was formed to acquire computers larger than any single member can currently afford and to share information and expertise on large-scale scientific computing on massively parallel computers. Caltech will acquire and operate the Intel Touchstone Delta System on behalf of the CSC. The Delta will be delivered in the spring of 1991. It is a distributed memory MIMD system whose nodes are connected in a two-dimensional mesh by mesh-routing chips developed by Seitz at Caltech. With a peak speed of 32 gigaflops and over 8 gigabytes of memory it will provide a powerful new tool to computational scientists at CSC sites.

[Figure 1. Major Systems in the Caltech Concurrent Supercomputing Facilities: a block diagram of the CCSF machines named above (the NCUBE, JPL/Caltech Mark IIIfp, and Intel iPSC/860 hypercubes, the mesh-connected Symult S2010, a T800 transputer system, and the SIMD CM-2) with per-node processor and memory details, the software each runs (Express, NX, CE/RK), and the facility's network connections (CITNET, NSFNET, ESNET, CERFNET, Los Nettos, BITNET, HEPNET, JPL, Defense Research Internet).]

Distributed Supercomputing

Using concurrent machines to carry out supercomputer-level computations has proved to be feasible and, with systems like the Intel Delta and Thinking Machines CM-2, provides a path to greater computing power than traditional approaches. However, even those systems are not appropriate or adequate for certain tasks. As scientists study ever more complex phenomena through computer simulation, they often require multidisciplinary approaches, information stored in varied databases, and computing resources exceeding any single supercomputer. The necessary resources are geographically dispersed. All supercomputers cannot be put in one place. Scientists with expertise on various aspects of a problem reside at several universities and research laboratories. Huge, frequently-updated databases are maintained at only one site. Furthermore, the design details of the most advanced supercomputers make some better suited for certain computations than others. In today's wide-area networks, however, it is not practical to couple supercomputers; the exchange of data among the collaborating computers would be very slow compared to the calculation rate. The computers would spend much of the time waiting for intermediate results to be communicated by their fellow workers so that the next phase of the computation can be tackled. Networks with much greater bandwidth can be built, but the greater speed is not sufficient to solve the problem. Because thousands of miles must be traversed, the time to deliver a message from one location to another will still be significant due to propagation delays.
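The two obstacles, bandwidth and propagation delay, are easy to put on a quantitative footing. The sketch below works through illustrative numbers: an assumed 2000 km path between sites, a signal speed of roughly two-thirds the speed of light in optical fiber, the 800 Mbit/s link rate and the Delta's 32 gigaflop peak quoted earlier, and an assumed 8 MB intermediate result. The distance, signal speed, and message size are assumptions for the sake of the estimate, not CASA figures.

/* Back-of-the-envelope costs of coupling supercomputers over a
 * wide-area network. All inputs marked "assumed" are illustrative
 * values, not CASA project measurements. */
#include <stdio.h>

int main(void)
{
    const double distance_m   = 2.0e6;    /* assumed 2000 km path             */
    const double signal_speed = 2.0e8;    /* ~2/3 c in optical fiber, m/s     */
    const double link_bps     = 800.0e6;  /* 800 Mbit/s link (from the text)  */
    const double peak_flops   = 32.0e9;   /* Intel Delta peak (from the text) */
    const double msg_bytes    = 8.0e6;    /* assumed 8 MB intermediate result */

    double t_prop = distance_m / signal_speed;   /* one-way propagation time */
    double t_xfer = msg_bytes * 8.0 / link_bps;  /* serialization time       */

    printf("one-way propagation delay:  %.1f ms\n", t_prop * 1e3);
    printf("8 MB transfer at 800 Mb/s:  %.1f ms\n", t_xfer * 1e3);
    printf("peak operations lost while waiting on one delay: %.0f million\n",
           t_prop * peak_flops / 1.0e6);
    return 0;
}

With these numbers the link needs 80 ms to move the data, and physics adds roughly 10 ms more in each direction; during that 10 ms alone a 32 gigaflop machine could have executed some 320 million operations at peak. Raising the bandwidth shrinks the first number but not the second, which is exactly the argument made above.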
To address these new areas, Caltech has joined with the Jet Propulsion Laboratory, Los Alamos National Laboratory, and the San Diego Supercomputer Center in the CASA project. CASA has support from the Corporation for National Research Initiatives (CNRI), which was awarded a grant by the National Science Foundation and the Defense Advanced Research Projects Agency for such activities. CASA will create a network testbed that will connect all four sites and operate at gigabit/second speeds. The goal of the CASA testbed is to demonstrate that high-speed networks can be used to provide the necessary computational resources for leading-edge scientific problems, regardless of the geographical location of these resources. Three important scientific problems will be studied by harnessing the computer and data resources at the four institutions.

A key challenge is to devise ways to use multiple supercomputers and high bandwidth channels with large