High Performance Distributed Computing
Geoffrey C. Fox
[email protected]
http://www.npac.syr.edu
Northeast Parallel Architectures Center
111 College Place
Syracuse University
Syracuse, New York 13244-4100

Abstract

High Performance Distributed Computing (HPDC) is driven by the rapid advance of two related technologies: those underlying computing and communications, respectively. These technology pushes are linked to application pulls, which vary from the use of a cluster of some 20 workstations simulating fluid flow around an aircraft, to the complex linkage of several hundred million advanced PCs around the globe to deliver and receive multimedia information. The review of base technologies and exemplar applications is followed by a brief discussion of software models for HPDC, which are illustrated by two extremes: PVM and the conjectured future World Wide Web based WebWork concept. The narrative is supplemented by a glossary describing the diverse concepts used in HPDC.

1 Motivation and Overview

Advances in computing are driven by VLSI, or very large scale integration, the technology that has created the personal computer, workstation, and parallel computing markets over the last decade. In 1980, the Intel 8086 used 50,000 transistors, while today's (1995) "hot" chips have some five million transistors, a factor of 100 increase. The dramatic improvement in chip density comes together with an increase in clock speed and improved design, so that today's workstations and PCs (depending on the function) have a factor of 50 to 1,000 better performance than the early 8086 based PCs. This performance increase enables new applications. In particular, it allows real-time multimedia decoding and display; this capability will be exploited in the next generation of video game controllers and set-top boxes, and will be key to implementing digital video delivery to the home.

The increasing density of transistors on a chip follows directly from a decreasing feature size, which in 1995 is 0.5 micron for the latest Intel Pentium. Feature size will continue to decrease, and by the year 2000, chips with 50,000,000 transistors are expected to be available.

Communications advances have been driven by a set of physical transport technologies that can carry much larger volumes of data to a much wider range of places. Central is the use of optical fiber, which is now competitive in price with the staple twisted pair and coaxial cable used by the telephone and cable industries, respectively. The widespread deployment of optical fiber also builds on laser technology to generate the light, as well as on the same VLSI advances that drive computing.
The latter are critical to the high-performance digital switches needed to route signals between arbitrary sources and destinations. Optics is not the only physical medium of importance: continuing and critical communications advances can be assumed for satellites and for the wireless links used by mobile (cellular) phones or, more generally, by the future personal digital assistant (PDA).

One way of exploiting these technologies is seen in parallel processing. VLSI is reducing the size of computers, and this directly increases performance, because reduced size corresponds to increased clock speed, which is approximately proportional to 1/λ for feature size λ. Crudely, the cost of a given number of transistors is proportional to the silicon area used, or λ^2, and so the cost performance improves as (1/λ)^3; a short numerical sketch at the end of this section makes the arithmetic concrete. This allows personal computers and workstations to deliver today, for a few thousand dollars, the same performance that required a supercomputer costing several million dollars just ten years ago.

However, we can exploit the technology advances in a different way, by increasing performance instead of (just) decreasing cost. Here, as illustrated in Figure 1, we build computers consisting of several of the basic VLSI building blocks. Integrated parallel computers require high-speed links between the individual processors, called nodes. In Figure 1, these links are etched on a printed circuit board, but in larger systems one would also use cable or perhaps optical fiber to interconnect nodes. As shown in Figure 2, parallel computers have become the dominant supercomputer technology, with current high-end systems capable of performance of up to 100 GigaFLOPS, or 10^11 floating point operations per second. One expects to routinely install parallel machines capable of TeraFLOPS sustained performance by the year 2000.

Figure 1: The nCUBE-2 Node and Its Integration into a Board. Up to 128 of these boards can be combined into a single supercomputer.

Figure 2: Performance of Parallel and Sequential Supercomputers. (The original chart plots performance, from MegaFLOPS through GigaFLOPS to TeraFLOPS, against year from 1940 to 2000, for sequential supercomputers, modestly parallel supercomputers, and massively parallel machines with microprocessor nodes.)

Figure 3: Concurrent Construction of a Wall Using N = 8 Bricklayers

Often, one compares such highly coupled parallel machines with the human brain, which achieves its remarkable capabilities by the linkage of some 10^12 nodes (neurons, in the case of the brain) to solve individual complex problems, such as reasoning and pattern recognition. These nodes have individually mediocre capabilities and a slow cycle time (about 0.001 seconds), but together they exhibit remarkable capabilities. Parallel (silicon) computers use fewer, faster processors (current MPPs have at most a few thousand microprocessor nodes), but the principle is the same. However, society exhibits another form of collective computing, whereby it joins several brains together, with perhaps 100,000 people linked in the design and production of a major new aircraft system.
This collective computer is "loosely coupled": the individual components (people) are often separated by large distances and connected by modest performance "links" (voice, memos, etc.).

Figure 3 shows HPDC at work in society, with a group of masons building a wall. This example is explained in detail in [Fox:88a], while Figure 4 shows the parallel neural computer that makes up each node of the HPDC system of Figure 3. We have the same two choices in the computer world. Parallel processing is obtained with a tightly coupled set of nodes, as in Figure 1 or Figure 4. High Performance Distributed Computing, analogously to Figure 3, is obtained from a geographically distributed set of computers linked together with "longer wires," but still coordinated (a software issue discussed later) to solve a "single problem" (see the application discussion later); a minimal code sketch at the end of this section illustrates this style of coordination. HPDC is also known as metacomputing, NOWs (Networks of Workstations), and COWs (Clusters of Workstations), where each acronym has a slightly different focus in the broad HPDC area.

Figure 4: Three Parallel Computing Strategies Found in the Brain (of a Rat). Each panel depicts brain activity corresponding to various functions: (A) continuous map of tactile inputs in somatosensory cortex, (B) patchy map of tactile inputs to cerebellar cortex, and (C) scattered mapping of olfactory cortex as represented by the unstructured pattern of 2DG uptake in a single section of this cortex [Nelson:90b].

Notice that the network (the so-called "longer wires" above) which links the individual nodes of an HPDC metacomputer can be of various forms. A few of the many choices are a local area network (LAN), such as Ethernet or FDDI; a high-performance (supercomputer) interconnect, such as HIPPI; or a wide area network (WAN) with ATM technology. The physical connectivity can be copper, optical fiber, and/or satellite. The geographic distribution can be a single room; the set of computers in the four NSF supercomputer centers linked by the vBNS; or, most grandiosely, the several hundred million computers linked by the Global Information Infrastructure (GII) in the year 2010.

HPDC is a very broad field: one can view client-server enterprise computing and parallel processing as special cases of it. As we describe later, it exploits and generalizes the software built for these other systems. By definition, HPDC has no precise architecture or implementation at the hardware level; one can use whatever set of networks and computers is available for the problem at hand. Thus, in the remainder of the article, we focus first on applications and then on some of the software models.
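To make the feature-size scaling argument of this section concrete, here is a minimal C sketch, assuming a simple halving of the feature size λ; the specific numbers are illustrative only and are not taken from the article.

/* Feature-size scaling: halving lambda roughly doubles clock speed,
 * quarters the silicon cost of a fixed transistor count, and so
 * improves cost performance by a factor of eight.
 */
#include <stdio.h>

int main(void)
{
    double shrink    = 2.0;                      /* lambda_old / lambda_new               */
    double speedup   = shrink;                   /* clock speed scales as 1/lambda        */
    double cost      = 1.0 / (shrink * shrink);  /* silicon cost scales as lambda^2       */
    double cost_perf = speedup / cost;           /* performance per unit cost, 1/lambda^3 */

    printf("halving lambda: %.0fx speed, %.2fx cost, %.0fx cost performance\n",
           speedup, cost, cost_perf);
    return 0;
}

Running this prints a speed factor of 2, a cost factor of 0.25, and a cost performance factor of 8, which is the (1/λ)^3 improvement quoted above.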
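As a preview of the software models treated later, the following sketch gives the flavor of coordinating distributed nodes on a "single problem" with a message passing library in the style of PVM. It is an assumption-laden illustration, not code from the article: the worker executable name, the message tag, and the trivial summation task are invented for the example, and all error handling is omitted.

/* Hypothetical PVM master: spawn worker tasks across whatever hosts
 * make up the virtual machine and sum one integer result from each.
 */
#include <stdio.h>
#include "pvm3.h"

#define NWORKERS   4
#define TAG_RESULT 1

int main(void)
{
    int tids[NWORKERS];
    int i, value, spawned, total = 0;

    pvm_mytid();                                  /* enroll this task in the virtual machine */
    spawned = pvm_spawn("worker", (char **)0,     /* "worker" is an assumed executable name  */
                        PvmTaskDefault, "", NWORKERS, tids);

    for (i = 0; i < spawned; i++) {
        pvm_recv(-1, TAG_RESULT);                 /* block for a result from any worker      */
        pvm_upkint(&value, 1, 1);                 /* unpack one integer partial result       */
        total += value;
    }
    printf("sum of %d partial results = %d\n", spawned, total);

    pvm_exit();                                   /* leave the virtual machine               */
    return 0;
}

Each worker would correspondingly pack its partial result with pvm_initsend(PvmDataDefault) and pvm_pkint, send it to pvm_parent() with pvm_send, and then call pvm_exit(); whether the nodes are workstations on a LAN or machines spread across a WAN is hidden behind the same interface.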