DOCUMENT RESUME

ED 344 869 IR 015 479

TITLE           Federal High Performance Computing and Communications Program. The Department of Energy Component.
INSTITUTION     Department of Energy, Washington, D.C. Office of Energy Research.
REPORT NO       DOE/ER-0489P
PUB DATE        Jun 91
NOTE            51p.; Color photographs will not reproduce well.
AVAILABLE FROM  National Technical Information Service, U.S. Department of Commerce, 5285 Port Royal Rd., Springfield, VA 22161.
PUB TYPE        Information Analyses (070) -- Reports - Descriptive (141)

EDRS PRICE      MF01/PC03 Plus Postage.
DESCRIPTORS     Agency Cooperation; *Computer Networks; Computer Science; Computer Software; *Computer System Design; Databases; Federal Programs; *Research and Development; Technological Advancement
IDENTIFIERS     *High Performance Computing; *Supercomputers

ABSTRACT This report, profusely illustrated with color photographs and other graphics, elaborates on the Department of Energy (DOE) research program in High Performance Computing and Communications (HPCC). The DOE is one of seven agency programs within the Federal Research and Development Program working on HPCC. The DOE HPCC program emphasizes research in four areas: (1) HPCC Systems--evaluate advanced architectures for large-scale scientific and engineering applications; (2) Advanced Software Technology and Algorithms--research computational methods, algorithms, and tools; develop large-scale data handling techniques; establish HPCC research centers; and exploit extensive teaming among scientists and engineers, applied mathematicians, and computer scientists; (3) National Research and Education Network--research and develop high-speed computer networking technologies through a multiagency cooperative effort; and (4) Basic Research and Human Resources--encourage research partnerships between national laboratories, industry, and universities; support computational mathematics and computer science research; establish research and educational programs in computational science; and enhance K through 12 education. The stated goals of the DOE HPCC program are to support the economic competitiveness and productivity of the United States, accelerate the application of HPCC technology to the solution of scientific and engineering problems, and enhance U.S. leadership in research, development, and deployment of HPCC technologies. A number of examples of DOE HPCC operations are provided, including the MatVu matrix visualization computer software package; Global Ocean Model; Molecular Dynamics Simulations; Pion Propagator; Three-Dimensional Tokamak Modeling; MediaView, a system for multimedia communication; and the CASA testbed, a wide-area, very high speed communication network that enables a number of collaborating agencies to work with geographically dispersed supercomputing resources. (DB)


The DOE Program Component of the

FEDERAL HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS PROGRAM

U.S. DEPARTMENT OF ENERGY
Office of Energy Research
Scientific Computing Staff
Washington, D.C. 20585

June 1991

ACRONYMS DEFINING SUPPORT AGENCIES

AFOSR  Air Force Office of Scientific Research
ARO    Army Research Office
AT&T   American Telephone and Telegraph
DARPA  Defense Advanced Research Projects Agency
DNA    Defense Nuclear Agency
DOE    Department of Energy
EPA    Environmental Protection Agency
IBM    International Business Machines
NASA   National Aeronautics and Space Administration
NSF    National Science Foundation

This report has been reproduced directly from the best available copy.

Available to DOE and DOE contractors from the Office of Scientific and Technical Information, P.O. Box 62, Oak Ridge, TN 37831; prices available from (615) 576-8401, FTS 626-8401.

Available to the public from the National Technical Information Service, U.S. Department of Commerce, 5285 Port Royal Rd., Springfield, VA 22161.

EXECUTIVE SUMMARY

The Department of Energy (DOE) High Performance Computing and Communications (HPCC) Program is one of seven agency programs within the Federal Research and Development Program in HPCC that was proposed with the President's Fiscal Year (FY) 1992 Budget and described in the supplemental report to the budget entitled Grand Challenges: High Performance Computing and Communications.(1) The overall Federal HPCC Program is coordinated by the Federal Coordinating Council on Science, Engineering, and Technology (FCCSET) Committee on Physical, Mathematical, and Engineering Sciences (PMES) through its subcommittee on High Performance Computing, Communications, and Information Technologies (HPCCIT). The Grand Challenges report(1) describes the programs and interrelationships of the Federal agencies that are participating in the HPCC Program. The purpose of this document is to elaborate on the DOE research program in HPCC.

DOE HPCC GOALS

To support United States economic competitiveness and productivity through interdisciplinary research and human resource development.

To accelerate the application of HPCC technology to the solution of scientific and engineering problems of significant departmental interest.

To enhance United States leadership in research, development, and deployment of HPCC technologies.

STRATEGY

Support computational advances through increased research and development efforts in areas of traditional strength in the department.

Promote the use of department and department-supported facilities as a market for HPCC prototypes and commercial products.

Support the underlying research, network, and computational infrastructures on which HPCC technology is based.

Develop the human resource base to meet the growing needs of industry, academia, and government in the area of HPCC.

PROGRAM DESCRIPTION

The DOE HPCC program will emphasize research in each of the following four key areas. Selected computational challenges, which have a significant effect on national leadership in science and technology, will be used as focal points for these efforts.

HPCC Systems--evaluate advanced architectures for large-scale scientific and engineering applications.

Advanced Software Technology and Algorithms--research computational methods, algorithms, and tools; develop large-scale data-handling techniques; establish HPCC research centers; exploit extensive teaming among scientists and engineers, applied mathematicians, and computer scientists.

National Research and Education Network--research and develop high-speed networking technologies through a multiagency effort.

Basic Research and Human Resources--encourage research partnerships between national laboratories, industry, and universities; support computational mathematics and computer science research; establish research and educational programs in computational science; enhance K through 12 education.

On February 5, 1991, Dr. Allan Bromley announced the FY 1992 U.S. Research and Development Program for high performance computing and communications to Congress. This HPCC Program, described in the supplement(1) to the President's FY 1992 budget, is the culmination of several years of effort on the part of senior scientists and managers in government, academia, and industry examining the state of U.S. high performance computer and network technology. The program recommends increased federal spending by the Department of Commerce, the Defense Advanced Research Projects Agency (DARPA), the DOE, the Environmental Protection Agency (EPA), the Department of Health and Human Services, the National Aeronautics and Space Administration (NASA), and the National Science Foundation (NSF) for research in advanced computer technologies to develop dramatically more capable supercomputers, more powerful software capabilities, and high-speed computer networks. This DOE document describes and discusses only the potential DOE initiatives in response to the program plan and funding recommendations.

DOE HPCC PROGRAM HISTORY

Because of its mission and the computationally intensive nature of energy-related applications and problems, the DOE mission depends on advancements in computational techniques and computer and networking technologies. As a result, DOE has a long history of computational research and development, with strong industrial and university cooperation. The current DOE Applied Mathematical Sciences Program came about as the result of a suggestion by John von Neumann to enhance understanding of the use of digital computers in nuclear applications. Consequently, DOE has been prominent in maintaining the leadership in HPCC, in encouraging--and even providing--innovation in HPCC technologies, and in supporting U.S. competitiveness and productivity through its extensive use of HPCC technologies. The table below outlines a history of DOE supercomputing hardware involvement in which the DOE national laboratory system has worked with U.S. vendors to bring promising and innovative computing technologies to bear on departmental applications.

Year  System            Site / Milestone
1945  Eniac             U. of Pennsylvania; AEC(DOE)/LANL first user
1952  Maniac            AEC(DOE)/LANL, first operational fabrication of von Neumann design
1953  IBM 701           AEC(DOE)/LANL, first customer installation
1956  IBM 704           AEC(DOE)/LANL, first customer installation
1959  LARC              AEC(DOE)/LLNL, prototype system
1961  IBM 7030 Stretch  AEC(DOE)/LANL, prototype; AEC(DOE)/LLNL, Serial 1
1963  CDC 3600          AEC(DOE)/LLNL, Serial 1
1964  CDC 6600          AEC(DOE)/LLNL, Serial 1
1969  CDC 7600          AEC(DOE)/LLNL, Serial 1
1974  CDC Star 100      AEC(DOE)/LLNL, Serial 1
1976  MFENET            ERDA(DOE)/NMFECC at LLNL, first nationwide supercomputer network
1976  CRAY 1            ERDA(DOE)/LANL, Serial 1
1981  IBM 3081          DOE/SLAC, first customer installation
1982  CDC Cyber 205     DOE/KAPL, Serial 3, first U.S. installation
1984  CRAY 2/1          DOE/NMFECC at LLNL, Serial 1 quadrant
1985  CRAY 2/4          DOE/NMFECC at LLNL, Serial 1 four processor
1987  ETA-10E/4         DOE-sponsored SCRI at FSU, Serial 1
1988  IBM               DOE/LANL, first HIPPI prototype
1989  ETA-10G/4         DOE/FSU, first and only 7-ns machine
1989  CM-2              DOE/LANL, first 64K floating point machine
1990  CRAY 2/8          DOE/NERSC, first 8-processor CRAY 2
1990  Intel i860/128    DOE/ORNL, Serial 1 Touchstone Gamma

Eniac computer

DOE laboratories are among the pioneers of the network-based shared-resource architecture of the supercomputer environment and its supporting supercomputer system software. Examples include the first timesharing operating system for supercomputers, three generations of mass storage servers, a supercomputer Fortran environment including optimizing compilers, a portable Fortran mathematical subroutine library, interactive debugging, vector extensions, and a portable operating system interface library. The laboratories have also pioneered computational science and the supporting mathematical techniques and libraries.

The exploitation of state-of-the-art, innovative supercomputer technology has become more and more challenging. The complex, massively parallel computer architectures needed to provide the requisite computer power to solve forefront energy research problems in the next decade will present difficult computational research problems. These will require a rethinking of disciplinary and organizational boundaries and will demand interdisciplinary teamwork on a scale that is unprecedented in recent memory.

John von Neumann, circa 1950

As an example, consider the computational challenges posed by the various energy systems and materials of the automobile. The DOE mission to provide and to implement a national energy strategy must include major efforts to reduce automotive gasoline consumption. There are many energy-related problems--all of which are computational challenges--involved in developing more fuel efficient, environmentally sound, and safer automobiles. These include the modeling of the combustion systems, the materials and structure, the aerodynamics, the control of materials processing and components manufacture, and the use of alternative energy sources such as chemical batteries or solar technologies. Any one of these individual energy applications saturates supercomputer systems today. In order to develop and computationally experiment with the complex, integrated models required to advance the understanding of these problems, researchers will need to use the teraflops computer systems proposed for development in the Federal HPCC Program. It will also be necessary to bring together teams of scientists and engineers, applied mathematicians, and computer scientists to develop the software technology and algorithms for these difficult problems. High Performance Computing Research Centers (HPCRCs) will serve as focal points for these teams. In addition to addressing grand challenge applications problems, the activities of individual HPCRCs may include using early versions of advanced computers for software development, providing feedback to system developers, and providing network access to the broader research community.

The DOE has long recognized the need for computer networks to support remote access to its supercomputer systems and to support distributed research collaborations. In addition to supercomputers, the DOE is responsible for highly sophisticated facilities such as the Superconducting Super Collider, synchrotron light sources, neutron sources, and electron microscopes to probe the atomic and molecular structures for materials, medical, chemical, biological, and pharmacological research. High-speed communications access to such facilities is necessary to enhance the quality and scope of research collaborations. Over 15 years ago, the DOE implemented national computer networking access to supercomputers. And more than 25 years ago, DOE laboratories pioneered high-performance local network access and the client-server model of shared network services such as mass storage, I/O, authentication, graphics, and terminal access. Because of this development and systems integration experience and because of its mission to operate these forefront facilities, DOE is in a unique position to contribute greatly to the advancement of U.S. science through the networking component of the HPCC Program.

The Federal HPCC Program includes a multiagency, multigigabit network research initiative. The DOE national laboratories have traditionally researched very high-speed networking technologies in cooperation with U.S. vendors to provide high-speed data transfer for supercomputers and for experiment control systems applications. Most recently, DOE has worked in collaboration with industrial partners to develop gigabit networks and interface technology that increase communications bandwidth by two orders of magnitude. Therefore, DOE also expects to contribute to the HPCC Program gigabit testbed projects--especially with regard to gigabit applications, protocols, and systems software.

In recent years, traditional theoretical and experimental techniques in science and engineering have been augmented by a powerful new technique: computational science. The term "computational science" is used to describe those intellectual activities in science, engineering, mathematics, and computer science that develop or exploit HPCC as an essential tool. While theory and experimental methods will not be replaced by computational science, they are being complemented by it in important ways. Currently, computational science educational programs are emerging in a small number of American universities. Having recognized the need for long-term investment in basic research and education in computational science, the DOE national laboratories have long been leaders in developing computational science techniques and in training limited numbers of postdoctoral researchers. DOE intends to contribute further in this vital area at all educational levels with a varied program, including teacher workshops in computational science, assistance with curriculum development, and access to high-performance computers for educational programs.

The DOE also recognizes that, in order to bring the full benefits of the HPCC Program to the national industrial complex and other entities concerned with proprietary or privacy information, proper security measures must be included in the program from the start. The DOE national laboratories have decades of experience in dealing with complex computing environments that process many levels of sensitive information.

The following sections provide detail on the research initiatives proposed by DOE as part of the Federal HPCC Program.

HIGH-PERFORMANCE COMPUTING SYSTEMS

MASSIVELY PARALLEL COMPUTERS

The Federal HPCC Technology Development Program in High Performance Computing Systems will be primarily performed and coordinated by DARPA, as described in the Grand Challenges report. The DOE will participate in technology development, primarily with regard to evaluating advanced architectures for large-scale scientific and engineering applications.

The DOE has long played an influential role in pushing the development of high performance computers. DOE applications are computationally and memory intensive. They also push the state of the technology for hardware interconnects to supporting communications, storage, and output devices. DOE has a natural interest in development of the foundations for the future generations of computing that will lead to advances in raw speed and capacity support for trillions of operations per second (teraflop) computing requirements. DOE also has an interest in encouraging the development of the next generation of machines so that architecture advances can be integrated into effective, usable computing environments. An important long-term goal that will guide DOE research in this area is to foster development of high performance computational facilities that are architecturally balanced in a way to make them usable by general DOE applications.

DOE is specifically interested in research that will lead to understanding the architectural limitations that will shape future machines, understanding how basic parallel software support is impacted as the architecture scales to higher performance, and providing resources for research in developing basic components of future computing environments.

Future generations of machines are expected to rely heavily on architectural parallelism. Identifying and understanding the limiting factors in parallel computing equipment is essential in establishing realistic upper bounds on the requirements being placed on machine design and supporting technologies. Computational models that require true teraflop solution speeds will saturate the underlying hardware in many areas. We need to understand where these saturation points occur and seek to establish the means to overcome them. The DOE will conduct research to determine the effects of increasing parallelism in both hardware and software. Among the many issues to be addressed are the impact of code size on computational requirements, the effect of various languages and models of computation on the amount of usable parallelism made available, the impact on reliability from increasing hardware complexity, the need for operating system features that optimize the use of the hardware, and the increased demands placed on supporting technologies.

Massively parallel computers use an approach to achieving speed that is radically different from that used in today's production supercomputers. In massively parallel computers, many hundreds or thousands of simpler and less expensive processors are coupled with a large amount of memory. These processors team up to carry out computations in parallel by breaking the problem into many subtasks that can be carried out simultaneously by the individual processors. The processors exchange information as needed to complete the computation. In this way computations can be carried out many times faster on the massively parallel computer than on one of its individual processors.
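To make the subtask idea concrete, the sketch below splits a single numerical job into independent slices that run simultaneously on separate worker processes and are then combined. It is a minimal stand-in, written in Python (which the report itself does not use), for the processor teams described above; the integrand, slice count, and pool size are illustrative choices, not details of any DOE system.

```python
from multiprocessing import Pool

def partial_integral(bounds, steps=200_000):
    """Midpoint-rule integral of 4/(1+x^2) over [a, b); summing all slices
    over [0, 1) approximates pi."""
    a, b = bounds
    h = (b - a) / steps
    return sum(4.0 / (1.0 + (a + (i + 0.5) * h) ** 2) for i in range(steps)) * h

if __name__ == "__main__":
    n_tasks = 8                                    # one subtask per "processor"
    edges = [i / n_tasks for i in range(n_tasks + 1)]
    slices = list(zip(edges[:-1], edges[1:]))      # break the domain into pieces
    with Pool(processes=n_tasks) as pool:          # workers run simultaneously
        parts = pool.map(partial_integral, slices)
    print("pi estimate:", sum(parts))              # combine the partial results
```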
There are many different ways to link up processors and memory in massively parallel computers. This adds to the flexibility of the computer but also complicates the job of computer design; system architects are faced with many alternatives, and choosing the best ones can be a difficult task and may be strongly application dependent.

The current generation of massively parallel computers has been successfully used to solve selected applications problems as fast as or faster than the best vector supercomputers. Because of improvements in manufacturing technology, it is forecast that in two years these machines will calculate ten times faster than they do now; in five years they may be hundreds of times faster.

To see why this speed is needed, consider an example of three-dimensional flow simulation used for global climate or reservoir modeling. Current models, in which the flows are calculated by repeatedly evaluating equations millions of times, may not be sufficiently accurate. Nevertheless, these calculations can take hundreds of hours on a conventional supercomputer. Accurate climate models could require solutions of equations hundreds of millions of times at each time step of the simulation. Completing such calculations would require years of supercomputer time. For weather forecasting, we need the data in hours. Massively parallel computers currently being designed could do these demanding calculations in days; massively parallel computers proposed to be developed under the HPCC program could meet this need.

NETWORKING

To be effective, HPCC must be available to a large user population, not only in the national laboratories but in universities and industry as well. A goal of the HPCC Program is to disseminate this computing capability throughout American science, technology, commerce, and industry. This will require local and national networks with capabilities matched to the computing resources being accessed. Further, within a computing center, the networks that tie machines together with one another and with storage devices will need awesome bandwidths--billions of bytes of information per second or more. The technology to carry out this networking function is part and parcel of the same technology in the computers that generate the data: fast switches, high-speed interconnects, networking protocols and services, and increasingly sophisticated network-server computers.

GRAPHICS AND VISUALIZATION

The volume of data produced by such powerful computing machines will tax storage media, input/output devices, and the networks that carry the data to the user. The only reasonable way to review the data is visually. For example, accurate three-dimensional flowfield simulations might produce as many as ten billion potentially important bytes of data at each timestep. A typical simulation would involve thousands of timesteps. Thus, there might be as much as ten trillion bytes of data to be processed from one problem. If this document had that much information in it, it would be a few billion pages long! Of course, only a very small fraction of that data could be used in any human endeavor. Visualization enables us to represent information in a form that effectively uses human perceptive capabilities.

WORKSTATIONS AND PERSONAL COMPUTERS

The impact of massively parallel supercomputers will be complemented by new generations of personal workstations with unprecedented power. This power will spill over into the personal computer (PC) market. The cost of low-end workstations is now comparable to that of well-equipped PCs. PCs are now acquiring workstation-like functions. This trend will continue until there is little to distinguish the two classes. It should not be surprising that workstations and PCs will share in the power revolution: many of today's massively parallel computers use processors that grew out of PC and workstation processor technology. Today's personal workstations provide the power of yesterday's mini-supercomputer for a few thousand dollars--a major achievement of American industry.

Through the use of MatVu, a matrix visualization package, researchers at the Center for Supercomputing Research and Development have been able to classify the convergence patterns of numerous eigenvalue and singular value spectra by assigning a logarithmically scaled color table to the magnitudes of the off-diagonal elements in each sweep of the particular Jacobi algorithm. Parallel Jacobi methods (two- and one-sided) on the Alliant FX/8 and CRAY X-MP have been quite effective for computing the eigenvalues and eigenvectors of rectangular matrices. The color table used in the images ramps from black (matrix element magnitudes less than or equal to the 64-bit machine precision) through blue, green, yellow, and orange to red (matrix elements of largest magnitude). The diagonal elements (yellow, orange, and red) of each of the matrices represent improved approximations to the exact singular values, with the last of these yielding the most accurate approximation.

Work supported by DOE, NSF, AFOSR, AT&T, and IBM.
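The caption above refers to the off-diagonal magnitudes that a Jacobi sweep drives toward zero. The sketch below is a minimal serial illustration of that idea for a symmetric test matrix: one cyclic two-sided Jacobi sweep, followed by the kind of logarithmically scaled off-diagonal magnitudes that MatVu maps onto a color table. The matrix, sweep count, and scaling here are assumptions for illustration in Python, not the CSRD codes or data.

```python
import numpy as np

def jacobi_sweep(A):
    """One cyclic sweep of the two-sided Jacobi eigenvalue iteration for a
    symmetric matrix A; each rotation zeroes one off-diagonal pair."""
    n = A.shape[0]
    for p in range(n - 1):
        for q in range(p + 1, n):
            if A[p, q] == 0.0:
                continue
            theta = 0.5 * np.arctan2(2.0 * A[p, q], A[q, q] - A[p, p])
            c, s = np.cos(theta), np.sin(theta)
            J = np.eye(n)
            J[p, p] = J[q, q] = c
            J[p, q], J[q, p] = s, -s
            A[:] = J.T @ A @ J

def log_offdiag(A, eps=np.finfo(float).eps):
    """Log10 magnitudes of off-diagonal entries, floored at machine precision
    (the quantity the color table in the images encodes)."""
    off = np.abs(A - np.diag(np.diag(A)))
    return np.log10(np.maximum(off, eps))

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2.0                        # symmetric test matrix
for sweep in range(1, 5):
    jacobi_sweep(A)
    print(f"sweep {sweep}: largest off-diagonal ~ 10^{log_offdiag(A).max():.1f}")
print("eigenvalue estimates:", np.round(np.sort(np.diag(A)), 6))
```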

In two years workstations may have the power of a CRAY Y-MP processor on a variety of applications. Much of the work currently suitable only for supercomputers will be feasible on inexpensive workstations.

COMPUTING ENVIRONMENTS

An exciting aspect of the HPCC Program is its promise--at relatively modest investment--to revolutionize our approach to science, commerce, and industry. The combination of massively parallel supercomputers, data management and visualization computers, ultra-fast networks, and very powerful personal workstations will greatly increase our knowledge base and speed the rate of scientific progress. Similarly, the design process in industry will be enhanced: rapid prototyping will be a reality, the cost of complex designs should decrease, and product quality should benefit greatly. Accessing immense commercial databases will become an integral step in business planning. To the extent that information and knowledge are the currency of tomorrow, the HPCC Program will provide a differentiating technology for America. DOE's challenge is to help foster continued development of the diverse pieces of the HPCC Program, but especially to provide the expertise to glue the pieces into a seamless whole.

HIGH PERFORMANCE DATA STORAGE SYSTEMS

High performance data storage systems need to be developed that are able to effectively serve an environment of massively parallel computers, general-purpose supercomputers, scientific workstations, and visualization. One of the very important and very difficult challenges of this decade will be to satisfy the data storage and data access requirements in the HPCC environment. Current large problems that run on supercomputers generate from one to ten gigabytes of data; this data must be saved and then be available for quick access. With the current success in moving

Figure: clusters of eight processors, each with a synchronization processor (SP), connect through two stages of 8 x 8 switches to global memory modules.

Cedar, the experimental shared-memory multiprocessor system of the Center for Supercomputing Research and Development, has a scalable hierarchical structure. Clusters, each composed of eight processors, are connected to a global memory by means of a very high bandwidth shuffle-exchange global network. The shuffle-exchange network is architecturally scalable because it has a fixed number of lines per I/O node, regardless of network size. In addition, the system exhibits scalability as a result of the power of the network and the hierarchical memory provided by the clusters. Cedar exhibits stable performance over a wide range of applications because of this two-level parallelism, within and between clusters, for fine and coarse grain parallelism, respectively.

Work supported by DOE, NSF, AFOSR, and IBM.
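As a small illustration of the interconnect pattern named in the caption, the sketch below computes the shuffle and exchange neighbors of each port of an 8-port stage: the shuffle link rotates the binary address left by one bit and the exchange link flips the low bit, so every node keeps the same number of links no matter how large the network grows. This is a generic description of the shuffle-exchange topology in Python, not Cedar's switch hardware or software.

```python
def shuffle(node: int, k: int) -> int:
    """Perfect-shuffle neighbor: rotate the k-bit address left by one bit."""
    mask = (1 << k) - 1
    return ((node << 1) | (node >> (k - 1))) & mask

def exchange(node: int) -> int:
    """Exchange neighbor: flip the least significant address bit."""
    return node ^ 1

k = 3                                    # 2**3 = 8 ports, like an 8 x 8 switch stage
for node in range(1 << k):
    print(f"{node:03b} -> shuffle {shuffle(node, k):03b}, "
          f"exchange {exchange(node):03b}")
```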

The Advanced Computing Research Facility (ACRF) at Argonne National Laboratory is committed to research on computers with innovative designs, with parallel architectures as the principal focus of attention. The ACRF comprises a wide variety of advanced computers, ranging from a four-processor graphics supercomputer to a 16,000-processor massively parallel Connection Machine.

This diversity of machines enables researchers to conduct experiments on software portability among different computer architectures and to evaluate the suitability of different configurations, including shared memory and distributed memory, for specific applications. The machines are used in a wide range of research projects, including design of parallel algorithms, analysis of languages for programming multiprocessor systems, and development of parallel programming methodologies for writing parallel programs.

Established in 1984, the ACRF has become recognized as one of the world's leading centers for parallel computing research. Approximately three hundred researchers from industry, universities, and other government laboratories use the ACRF machines each month for studies in software and algorithm development.

To encourage use of the facility, the ACRF sponsors classes, workshops, and institutes on parallel computing. More than 500 scientists have participated in these activities, and many of these researchers continue to use the ACRF machines remotely via national networks. The ACRF also has established two affiliates programs to promote the transfer of research results to industry and academia. Currently, 15 industrial affiliates and 27 university affiliates are using the ACRF advanced computers to conduct computational research at the leading edge of technology.

Work supported by DOE and NSF.

large problems to massively parallel computers such as the Connection Machine, the data storage and data access requirements have dramatically increased. A large problem on the Connection Machine will generate from tens of gigabytes up to a terabyte. To fully support the current state-of-the-art, large memory, massively parallel supercomputer, it would be necessary to achieve data transfer rates of at least 50 megabytes/second and a storage capacity of at least a hundred terabytes of data. As the massively parallel machines become more powerful, the data handling requirements will likewise increase.

To meet these data storage and data access requirements, it will be necessary to develop very high performance storage systems with advanced data management capabilities. Such storage systems must be scalable to meet the increasing requirements and to maintain a balanced overall HPCC environment.

ADDITIONAL DOE ROLES IN HIGH PERFORMANCE COMPUTING

DOE continues, through its leading role in the computational sciences, to have a strong interest in shaping the development of future HPCC environments. An important goal for DOE is to foster architecturally balanced approaches usable by a broad class of applications. These applications are the foundation for the basic research needs of many U.S. industries.

Specific DOE interests include (1) issues associated with the scalability of architectures and their impact on software, (2) research aimed at providing the resources needed for components of future systems, (3) reliability, and (4) integration of the components of this distributed computing environment.

The DOE laboratories have a great deal of expertise in simulating advanced computer architectures. This capability will be important to investigations of the scalability of multiprocessor architectures. It can also enable accurate prediction of the performance of important applications on new designs before they reach hardware production. Thus, design flaws can be remedied early in the product development cycle before changes become prohibitively expensive.
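A quick back-of-the-envelope check of the storage figures quoted above (up to a terabyte from one Connection Machine problem, transfer rates of at least 50 megabytes/second, and roughly a hundred terabytes of capacity) is sketched below. The numbers come from the text; the Python arithmetic is only illustrative.

```python
# Data-handling figures quoted in the text; the arithmetic is illustrative only.
dataset_bytes = 1e12          # ~1 terabyte from one large Connection Machine problem
transfer_rate = 50e6          # 50 megabytes/second minimum target
archive_bytes = 100e12        # ~100 terabytes of total storage capacity

transfer_hours = dataset_bytes / transfer_rate / 3600
print(f"one terabyte at 50 MB/s: {transfer_hours:.1f} hours")      # ~5.6 hours
print(f"archive holds about {archive_bytes / dataset_bytes:.0f} such datasets")
```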


ADVANCED SOFTWARE TECHNOLOGY AND ALGORITHMS

Software and algorithms are the keys to making HPCC systems both useful and economically successful. Furthermore, for earlier generations of computers, there is clear evidence that improvements in software and algorithms lead to larger performance gains than did improvements in machine architecture. However, the computing systems of the past had relatively simple architectures, making it possible to largely decouple applications algorithm and software development from the details of machine architecture. Today, the simple architectures have been thoroughly exploited. The complex architectures necessary to carry HPCC through the present decade require extensive development of new software technology, numerical methods, and applications algorithms in order that their potential be realized.

The Center for Computational Engineering (CCE) at Sandia National Laboratories, Livermore was established to develop and apply computational technology to problems of national interest. Sponsored by the Department of Energy and private industry, the CCE is tasked with bringing together researchers and computer scientists from government and private industry to conduct mutually beneficial scientific research.

Central to the CCE is the new generation of massively parallel supercomputers. To be able to use these massively parallel supercomputers, existing software must be rewritten in parallel form and new numerical methods must be developed. Initial work will be with sponsors in four separate modules: global climate change; macromolecular design and environmental health; software engineering; and field data management.

Through the CCE's modules, Sandia will help corporate sponsors apply the new supercomputers to problems in their own areas of expertise. In other words, the modular structure will facilitate technology collaboration. By pooling the talents of scientists from industry, universities, and national laboratories, the CCE will automatically transfer computational technology, thereby helping sponsors reduce the concept-to-design cycle and speed product introduction. The sponsors' industrial expertise will help focus the CCE's research on real-world applications.

Figure: the CCE modules (Global Climate Change; Macromolecular Design and Environmental Health; Software Engineering; Mathematical Analysis Support; Education) linked to national facilities, universities, and parallel computers.

The CCE links engineering "modules" to evolving parallel processing technology.

Work supported by DOE.

Addressing this challenge requires a rethinking of disciplinary and organizational boundaries and an extraordinary teaming effort. Integrated approaches, reaching all the way from computer hardware design to grand challenge applications, must be sought. In order to create the software and algorithms for parallel, distributed, and hierarchical computing, computer scientists and applied mathematicians will need to better communicate with scientists in these important applications areas.

Most of the Federal HPCC generic software technology development will be funded by DARPA. Because of the critical need for interagency coordination in this area, all participating agencies will coordinate their advanced software technology and algorithms (ASTA) programs through the HPCCIT working group.

SUPPORT FOR GRAND CHALLENGES

The DOE, both as a natural consequence of its mission and specifically by virtue of its need to solve many of the grand challenge problems, has at its disposal a broad range of important applications on which to base a strong and aggressive software technology and algorithms support effort. To be successful, this effort will need to operate in close collaboration with grand challenge researchers so that a detailed understanding of applications will be incorporated into the earliest stages of software and algorithm design and so that application code design decisions will be made in light of full understanding of feasible software and algorithm options. Grand challenge research teams will be specifically called on to test emerging software and algorithms and to recommend modifications and improvements. Early transfer of developed technology to the private sector will be vigorously pursued to provide both financial leverage and additional feedback on the effectiveness of the DOE-supported development efforts.

Pioneering research at DOE's Sandia National Laboratories in parallel methods, applications, and performance evaluation on a 1024-processor NCUBE/ten hypercube won the Karp Challenge and the inaugural Gordon Bell Award in 1988, as well as an R&D 100 Award in 1989. This research demonstrated 1000-fold parallel speedups for three full-scale scientific applications on a commercial parallel processing system and introduced three new concepts of performance modelling: scaled speedup, fixed-time speedup, and operation efficiency. Sandia was given a second R&D 100 Award in 1989 for the Linda Project, its innovative research in parallel distributed computing using a large number of networked heterogeneous computers. (Accompanying figure: speedup versus number of processors, 16 to 1024.)

Work supported by DOE.
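A hedged sketch of two of the speedup measures named above--fixed-size (Amdahl-style) speedup and scaled speedup--is given below for comparison. The serial fraction and processor counts are illustrative assumptions; these are the textbook formulas, not Sandia's benchmark methodology or measured data.

```python
def fixed_size_speedup(p: int, s: float) -> float:
    """Amdahl-style speedup: the problem size stays fixed as processors are added."""
    return 1.0 / (s + (1.0 - s) / p)

def scaled_speedup(p: int, s: float) -> float:
    """Scaled (Gustafson-style) speedup: the problem grows with p so that each
    processor keeps a constant amount of parallel work."""
    return s + (1.0 - s) * p

serial_fraction = 0.004                  # illustrative value, not a measured one
for p in (16, 64, 256, 1024):
    print(p,
          round(fixed_size_speedup(p, serial_fraction), 1),
          round(scaled_speedup(p, serial_fraction), 1))
```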

Energy Conservation and Fossil Fuel Combustion

For the foreseeable future, 90% of the energy needs of the United States will be met by the combustion of fossil fuels. Fossil fuels are burned in stationary combustion chambers, for example, for electrical power generation, and in mobile forms such as in automobiles. Automobile engines are most efficient when run at high temperatures (Carnot effect), but increased temperature leads to increased nitrogen oxide emissions.

The burning of alternative fuels such as methanol is complicated by the emission of formaldehyde, a known carcinogen, into the atmosphere. Pollutants are affected by local geology and climatic conditions, making it necessary when seeking solutions to take into account the total system of fuel, engine, and atmosphere. Our environment is too delicate to be used as a testbed; therefore, we must use supercomputers to simulate the atmospheric effects before experimenting.

An important example of the application of DOE supercomputer methodology is in the development of an engine design code that is fully three dimensional and includes models of chemical reactions, fluids, gases, and particulate formation. The code is in a constant state of evolution and is designed to handle the most complex engines, such as the stratified charge engine and the two-stroke engine.

A detailed understanding of fuel burning inside an internal combustion engine requires the comprehension of a number of complex processes. One example is the Direct-Injection Stratified Charge (DISC) engine, which requires high performance computing for its optimum design. DISC is an experimental engine design. Its goal is to run leaner and cooler than conventional engines, thereby reducing emissions while increasing the compression ratio and control of ignition for greater efficiency and fuel economy. The burning of the fuel-air mixture in such an engine brings into play a few hundred different chemical reactions between numerous short- and long-lived chemical species. The rates at which these reactions occur depend on the temperature and concentrations of the species. For instance, as combustion proceeds, various pollutants may be formed, such as nitrogen oxides (NOx) and unburned hydrocarbons. These hydrocarbons may form solid particles known as soot. Also, the fuel may autoignite in some region before the flame arrives, giving rise to the small explosion known as engine knock. Because engine operation is so complex, designing an optimum engine by experimentation (varying all design parameters independently) is prohibitively expensive. A computer model of the engine allows us to effect design changes numerically on a scale of hours at a fraction of the cost.

The engine components depicted in the illustration are a cupped piston, fuel injector, and spark plug. The piston may travel at speeds of up to 90 miles per hour. It draws in fresh air and expels exhaust gases as the combustion cycle progresses. The air and exhaust gases flow at approximately the piston velocity. At an appropriate point in the cycle (upper right in the figure), liquid fuel is injected into the engine cylinder. The fuel jet breaks up into small droplets, which interact with the air. The droplets alternately elongate and flatten; some of them break up into two new droplets, others collide and coalesce to form larger drops. The spray is caught up by the air flow and swirls around in the cylinder. It simultaneously evaporates to form a gaseous fuel/air mixture.

Shown on the right is a computer simulation depicting four different times during the combustion cycle of a DISC engine. The multicolored surface indicates the fuel's location. When the spray has evaporated sufficiently, a spark ignites the mixture, and combustion begins as a flame propagating through the chamber. As the engine proceeds through its cycle (indicated by the increase in crank angle from -23 to +17 degrees), the fuel burns, resulting in high temperatures (see color code).

Each of the models within the code represents an approximation to the facts, sometimes because the facts are not well understood but often because the capabilities of existing computers do not allow such information to be included. For example, the over 400 chemical rate processes involving hydrocarbon and nitrogen chemistry are treated globally by fewer than ten such reactions in order that the code run in a few hours on a large supercomputer. Since, however, the 400 reactions are known from first principles, the problem is addressable; what are required are better algorithms running on a machine 1,000 times more powerful, a teraflop machine.

This computational design technology is used by many private industrial engine design firms as well as universities and government laboratories.
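Each global reaction step in such a code is typically evaluated as a temperature-dependent Arrhenius rate multiplied by species concentrations. The sketch below shows only that general form; the rate constants, reaction orders, and species are placeholder values for illustration and are not taken from the DOE engine code discussed above.

```python
import math

R_GAS = 8.314          # gas constant, J/(mol K)

def arrhenius(A, b, Ea, T):
    """Modified Arrhenius rate constant k(T) = A * T**b * exp(-Ea / (R T))."""
    return A * T**b * math.exp(-Ea / (R_GAS * T))

def global_step_rate(T, fuel, oxidizer, A=1.0e9, b=0.0, Ea=1.5e5):
    """Rate of a single lumped fuel-oxidation step, first order in each reactant
    (a stand-in for the ~10 global reactions mentioned in the text)."""
    return arrhenius(A, b, Ea, T) * fuel * oxidizer

for T in (800.0, 1200.0, 1600.0, 2000.0):          # temperatures in kelvin
    print(T, global_step_rate(T, fuel=0.05, oxidizer=0.21))
```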

Present engine models are deficient in one way or another because of limitations in computing resources. Computer codes, such as KIVA, can provide information on large scale fluid flows, but are limited in their ability to describe the small scale flow structures in turbulent flow. Furthermore, fuel chemistry models are necessarily crude. Current chemical kinetics calculations approach the detail necessary for an adequate description of the chemistry, but even these models require serious improvements. Such calculations are capable of only the most rudimentary fluid mechanical effects. The envisioned increase in computing power is critical to the effective development of clean, efficient engines.

Work supported by DOE.


The global ocean model running on the Los Alamos CM-2 is the first realistic general circulation model to be implemented on a massively parallel machine. The code was rewritten for the CM-2 based on the Cray version of the Semtner-Chervin global ocean model, which is highly parallelized to run on the multiprocessor CRAY X-MP or Y-MP. For large problems, the CM-2 code runs with speeds comparable to a full 8-processor Y-MP. New algorithms were implemented to substantially improve the code's performance. The model is a finite difference code with realistic surface and ocean-bottom topography.

To fully resolve the complicated currents and circulation in the ocean requires grids with spacing substantially less than 50 kilometers. The largest calculations that have been run to date with the Cray version used grids with 55-kilometer resolution. With the global ocean model running on the current CM-2, it is possible to solve problems with a 28-kilometer resolution. The ultimate aim in global climate studies is to develop an advanced climate model capable of describing the fully coupled ocean/atmosphere system, which will be necessary for solving important problems, such as predicting the effect of greenhouse gases on global warming.

The image shows the observed temperature (at 160 meters depth), which is used in model calculations. The warmest areas, in the equatorial region, appear red; the coolest areas, near the poles, appear orange and yellow; and the areas in between are shown in blue and green.

Work supported by DOE.
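The jump from 55-kilometer to 28-kilometer grids quoted above roughly doubles the resolution in each horizontal direction, and the cost grows faster than the grid-point count because finer grids also force smaller time steps. The sketch below is illustrative arithmetic only, under the common assumption that work scales as the square of the resolution for the grid and linearly with it for the time step; it is not the Semtner-Chervin code's actual cost model.

```python
earth_surface_km2 = 5.1e8       # approximate global surface area

def relative_cost(dx_km: float, reference_dx_km: float = 55.0) -> float:
    """Work relative to a run at the reference spacing, assuming cost scales as
    (1/dx)**2 for the horizontal grid and (1/dx) for the time step (CFL-like)."""
    return (reference_dx_km / dx_km) ** 3

for dx in (55.0, 28.0, 10.0):
    columns = earth_surface_km2 / dx**2
    print(f"{dx:5.1f} km grid: ~{columns:.2e} columns, "
          f"cost x{relative_cost(dx):.1f} relative to 55 km")
```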



Global Climate Modeling

It is paramount that we improve our understanding of global climate and its potential impact upon human activities and the environment. Generation of carbon dioxide, methane, and other greenhouse gases as a by-product of energy production and other activities must be better quantified, and our capability to predict their influence on climate must be greatly enhanced if we are to formulate rational energy strategies and policies. Numerical computer models provide the only means for projecting the impact of greenhouse gases on future climate. Present-day global climate models are able to simulate satisfactorily some aspects of the current climate, but comparison of future climates predicted by various models reveals significant and disconcerting disagreements.

A new generation of models that have finer spatial resolution and more realistic treatment of the critical physical and chemical processes that control our climate is needed. In addition, longer computations spanning many decades of simulated time (200 years or more) will be needed. Current models, however, are constrained by the resources of even the largest supercomputers now available. CHAMMP--the Computer Hardware, Advanced Mathematics, and Model Physics Climate Modeling Program--is a DOE program specifically intended to develop within 10 years advanced climate models with the above-mentioned improvements and also capable of much longer simulations. The advanced computer systems and software technologies being developed by the DOE HPCC Program will provide capabilities critical to the success of CHAMMP and to solving the nation's environmental and energy-related problems.

Biosciences

Biological research plays a major role in improving the quality of our lives. Computer use is, currently, a major activity of structural biology research and will become increasingly important. There are two categories of use: (1) data acquisition and analysis and (2) modeling and theory. Also noteworthy are the efforts being made to map and sequence the human genome--the full set of instructions for a human being. Such research is crucial if we are to determine the genetic basis of many human diseases.

Currently one of the principal uses of computers is for data acquisition and analysis. Efficient use of high-intensity photon and neutron beams provided by existing facilities demands online computer control of beams, spectrometers, and detectors. Data acquisition rates from diffraction and nuclear magnetic resonance (NMR) experiments are high, and analysis of data using established algorithms, an essential component of most structural biology research sponsored by the DOE Office of Health and Environmental Research, is very computer intensive. Current computer resources are not adequate to support the efficient use of existing facilities. Modeling of macromolecular dynamics for crystallographic data refinement and electrostatic field calculations for determination of structures in aqueous solution are extremely demanding of computer power and time. These types of simulations are important in that they can provide details of processes that cannot be easily probed by experimental techniques, such as the relaxation of water molecules in the vicinity of biological molecules.

In the future, efficient use of high flux beams at the High Flux Beam Reactor and National Synchrotron Light Source and the advanced capabilities of the Manuel Lujan Jr. Neutron Scattering Center, the Advanced Light Source, the Advanced Photon Source, and the Advanced Neutron Source will require application of advanced and increasingly computer-intensive data acquisition and analysis protocols. Requirements for computer-intensive molecular dynamics and electrostatic calculations will increase as more structural biology data is acquired and analyzed. Development of parallel processing and computer graphics will be required to support online structural analysis and computer-based search strategies for sequence/structure or structure/function correlations.

The human genome initiative requires algorithms for comparing the "fingerprints" of pieces of DNA that are known to be derived from a larger piece of DNA but whose degree of overlap and spatial ordering relative to one another is not known. Determining these relationships is highly computationally intensive and stretches the art of combinatorics. Moreover, the requirements for this work are far from static: the quantity of sequence data has been doubling every year or two. If the human genome initiative is successful, the quantity of data 15 years from now will be 100 to 1000 times the current amount. The sequencing effort will require computational power, unprecedented in the realm of molecular biology, for improved data management and analysis, and communications links between those generating, managing, and accessing the experimental and interpretive data.

Materials Science

Many of the problems associated with materials research require computation of the subtle many-body effects that manifest themselves in high-temperature superconducting materials and the properties of complex polymers. Challenges in the realm of computational physics, chemistry, and engineering include (1) elucidating the mechanisms of chemical catalysts, the design of new catalysts, and the design of new materials for separations science; (2) understanding the electronic properties of novel materials, such as high-temperature superconductors, polymers, and synthetic metals; (3) extending the empirical molecular mechanics approaches currently employed in the pharmaceutical industry for the design of drugs; and (4) designing new materials and alloys with novel properties based on an understanding of the interaction among the material constituents at the molecular level. Initial small applications of these challenges tax the capability of today's most powerful computing technologies and will help to influence their future directions.
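The fragment-ordering problem described under Biosciences above starts from pairwise comparisons of fingerprints. The toy sketch below scores candidate overlaps between hypothetical clones by the fraction of restriction-fragment lengths they share; the clone names, lengths, and scoring rule are invented for illustration, and real map-assembly algorithms use statistical match criteria on far larger data sets.

```python
from itertools import combinations

fingerprints = {                      # hypothetical fragment-length sets
    "cloneA": {310, 540, 820, 1150, 2200},
    "cloneB": {310, 540, 1150, 1900, 2600},
    "cloneC": {450, 700, 1900, 2600, 3100},
}

def overlap_score(fp1, fp2):
    """Fraction of the smaller fingerprint that is shared with the other."""
    return len(fp1 & fp2) / min(len(fp1), len(fp2))

pairs = sorted(
    ((a, b, overlap_score(fingerprints[a], fingerprints[b]))
     for a, b in combinations(fingerprints, 2)),
    key=lambda t: -t[2])

for a, b, score in pairs:             # most plausible overlaps first
    print(f"{a} - {b}: overlap score {score:.2f}")
```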



While experimental techniques such as two-dimensional NMR spectroscopy can provide detailed information about the relative distances between atoms in biological molecules, the translation of this information into three-dimensional structural information can be enhanced considerably by the use of molecular simulations. A recent study of DNA interactions in drug-DNA complexes, performed at the Los Alamos National Laboratory, was carried out to construct a set of models consistent with two-dimensional NMR data. The particular case involved a complex interaction, shown in the figure, between a specific anticancer drug (Distamycin-2) and a segment of DNA. The atoms of the anticancer drug are represented by the small cyan objects. The surface of the DNA is indicated by larger atoms, where different colors represent different bases: blue is guanine, green is cytosine, yellow is adenine, purple is thymine, red represents the charged phosphate groups, and white represents sugar.

A molecular dynamics simulation followed by energy minimization showed that there are actually three distinct structures consistent with the NMR data and that in two of these structures the DNA molecule is bent significantly. In a nonlinear mode of motion shown by the drug-DNA complex, each of the three distinct minima is sampled every 5 ps (10^-12 s). The combination of two-dimensional NMR and computer simulations gives a much clearer interpretation of the structure and dynamics of drug-DNA complexes than would be obtained by simply attempting to fit the experimental data. Calculations of this type can also provide guidance for future studies of protein-DNA interactions at the atomic level.

Work supported by DOE.


A key aspect of designing improved materials is the understanding of their response to various kinds of dynamic loading, including fracture properties (such as the yield stress and the mode of fracture). Given a model for the interatomic forces, the fracture process can be simulated using molecular dynamics (MD) techniques, provided that the trajectories for a macroscopically large sample of atoms can be followed for a sufficiently long time on a computer. Simulations of roughly 10^12 atoms (i.e., a block 1 μm on edge) are required to properly model the effects of grain boundaries, plastic deformation, and crack propagation. Until very recently, the largest MD simulations have involved far smaller numbers of atoms, so that no realistic calculations of fracture in three-dimensional solids have yet been attempted.

An exceptionally fast MD computer code is being developed to exploit the massively parallel architecture of the Connection Machine (CM-2). It will perform simulations on more than 10^6 atoms, corresponding to a two-dimensional square of material approximately 0.25 μm on edge. Many of the features of three-dimensional fracture are retained in two dimensions, so that these simulations will make the first direct connections between the atomic interactions and the macroscopic fracture properties.

Shown here is a sequence of snapshots from a sample run of a spall process, using about 30 K atoms interacting with a many-body potential appropriate for a metal. The locally averaged horizontal velocity is represented as a continuous color rainbow. In the top frame, the flyer plate (shown at the left in blue) is moving to the right and is about to collide with the sample (shown in greenish-orange moving to the left). The relative velocity of collision is 25% of the metal sound speed. The resulting shock waves move through the material, which ultimately fractures at the point where two rarefaction waves meet. Significant plastic work results in a damaged spall fragment exiting to the right.

Work supported by DOE.
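For readers unfamiliar with the method, the sketch below shows the basic MD loop the caption refers to--velocity-Verlet time stepping under pairwise forces--for a tiny two-dimensional cluster. It uses a simple Lennard-Jones potential in reduced units as a stand-in; the production code described above uses a many-body metal potential and vastly more atoms.

```python
import numpy as np

def lj_forces(pos):
    """Pairwise Lennard-Jones forces (epsilon = sigma = 1, no cutoff)."""
    forces = np.zeros_like(pos)
    n = len(pos)
    for i in range(n - 1):
        for j in range(i + 1, n):
            rij = pos[i] - pos[j]
            r2 = rij @ rij
            inv6 = 1.0 / r2 ** 3
            fmag = 24.0 * (2.0 * inv6 ** 2 - inv6) / r2   # equals -(1/r) dU/dr
            forces[i] += fmag * rij
            forces[j] -= fmag * rij
    return forces

# 16 atoms on a slightly perturbed square lattice near the potential minimum
rng = np.random.default_rng(1)
pos = np.array([[i, j] for i in range(4) for j in range(4)], dtype=float) * 1.12
pos += 0.02 * rng.standard_normal(pos.shape)
vel = np.zeros_like(pos)
dt = 1.0e-3

f = lj_forces(pos)
for step in range(2000):                 # velocity-Verlet integration loop
    pos += vel * dt + 0.5 * f * dt ** 2
    f_new = lj_forces(pos)
    vel += 0.5 * (f + f_new) * dt
    f = f_new

print("mean kinetic energy per atom:",
      round(0.5 * float(np.mean(np.sum(vel ** 2, axis=1))), 6))
```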


Since the 1986 discovery of a new class of superconducting ceramic oxides, scientists around the world have been involved in intensive research efforts to understand and fabricate practical high-temperature superconductors. These efforts to model the structural, vibrational, and electronic properties of matter can be greatly assisted by powerful computers, especially since parallel supercomputers and sophisticated algorithms can now be combined to provide sufficient processing power.

Physicists and computer scientists at Oak Ridge National Laboratory (ORNL) have collaborated to develop a multipurpose parallel computer code to calculate the electronic structure of real materials from first principles based on the Korringa, Kohn, and Rostoker coherent potential approximation (KKR-CPA) theory of magnetism and alloys. The model allows multiple atoms per unit cell and is ideally suited for situations in which substitutional disorder plays an important role, such as in high-temperature superconductors. The code executes efficiently on serial computers, on shared-memory multiprocessors such as the CRAY Y-MP, on distributed-memory multiprocessors such as the Intel iPSC/860, and on a distributed network of computers.

Initial computational experiments on the high-temperature perovskite superconductors Ba1-xKxBiO3 and BaPb1-xBixO3--performed on the 8-processor CRAY Y-MP at the Ohio Supercomputer Center and the 128-processor Intel iPSC/860 at ORNL--have revealed several interesting details related to alloy softening of the density of states and Fermi surface nesting. Moreover, these results revealed that experiments on more complex superconductors such as (La2-xSrx)CuO4 will be able to execute at a rate of over 2.5 gigaflops on the Intel iPSC/860.

Work supported by DOE.

Plasma Physics Research

The successful development of magnetic or inertial fusion as an alternative energy source requires a deep and detailed knowledge of plasma phenomena. Numerical simulation has made many contributions to the basic understanding of plasma phenomena. One can point to the discovery of nonlinear effects in space plasma physics, inertial confinement physics, and magnetically confined plasmas that came about through numerical simulation. The problems are typically complex, with many competing effects acting simultaneously on different time and space scales, and often inaccessible to experimental or analytical approaches. Typically, simulations have given insights into fundamentally new physical processes that are striking in the ways that nonlinearity can bring order out of complexity. Understanding plasma phenomena depends on the ability to do simulations in three dimensions with realistic geometries and long time scales. The single most important component for continuing progress is the attainment and successful application of greater computing power. There is a present and urgent need to provide realistic plasma simulations in three physical dimensions involving complex bounded systems.

Another area with potentially important applications involves the interdisciplinary coupling of molecular physics with plasma science. For example, to be able to model the synthesis of novel materials using chemical vapor deposition (CVD), one would need to combine the computational techniques of molecular physics with numerical plasma simulations.

Shown above is a visualization of the pion propagator based on a QCD lattice generated on the Connection Machine. The propagator is a function of three spatial dimensions and one time dimension, so the data have been averaged over the third spatial dimension and displayed as a function of x and y (the short axes) and time (the long axis). The event represented is the creation of a pion near the center of the volume and its propagation in space both forward and backward in time. The magnitude of the propagator determines the size of the "bubbles" (shown in green) in this visualization, and a representative surface of constant amplitude is displayed in white. From the rate at which the amplitude dies out as a function of time, the pion mass can be calculated. The generation of the lattice requires approximately 300 hours on a 16K CM-2; many such lattices are required to give a statistical estimate of the pion mass by averaging results from the propagator calculation based on each lattice.

Work supported by DOE.

Fundamental Physics and QCD

Field theories such as quantum electrodynamics and quantum chromodynamics (QCD) are believed to describe the entire range of physical phenomena on the atomic, nuclear, and subnuclear scales. QCD is the theory of strongly interacting particles such as nucleons and mesons. The highly nonlinear regimes of these theories cannot be solved with traditional techniques of quantum mechanics. However, an approach in which space-time is approximated by a lattice makes it possible to predict the characteristics of strongly interacting particles. The present calculations are forced to make severe approximations. With the continuing development of new computational techniques as applied to novel computer architectures it is possible to treat these more exactly. The goal of these calculations is to learn how quarks and gluons form composite
propagator is a fimaion of three spatial dimensions and one time dimension, so the Fundamental Physics and OCD data have been averaged over the third Field theories such as quantum electrodynam- spatial dimension and displayed as a function les and quantum chromodynamics KAM arr of x and y (the short axes) and time (the long helieved to describe the entire range of axis). physical phenomena on the atomic. nuclear The event represented is the creation of a and subnuclear scales. QCD is the theory of pion near the center of the volume and its strongly interacting panicles such as nucleons propagation in space both forward and and mesons. The highly nonlinear regimes of backward in time. The magnitude of the these theories cannot te solved with traditional propagator determines the size of the teetmiques of quantum mechanics. llowever. "bubbles" (shown in green) in this visualiza- an approach in which space-time is approxi- tion, and a representative surface of constant inated by a lattice makes it possible to predict amplitude is displayed in white. From the the characteristics of strongly interacting rate at which the amplitude dies out as a panicles. The present calculations are forced to function of rime, the pion mass can be make severe approximations. With the calculated. The generation of the lattice .-ontinuing development of new computational requires approximately 300 hours on a I6K techniques as applied to novel computer CM-2; many meth lattices are required to architectures it is possible to treat these more give a statistical estimate of the pion mass by exactly. The goal of these calculations is to averaging results front the propagator learn how quarks and gluons form composite calculation based on each lattice. Work supported by DOE.
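The final step mentioned in the caption, extracting the pion mass from the decay of the propagator, can be illustrated schematically. The sketch below is an illustration rather than the actual lattice analysis code: it assumes the zero-momentum correlator falls off as exp(-m*t) at large times, so an effective mass is formed from the ratio of neighboring time slices; the array size and layout are assumptions.

    /* Effective-mass sketch: C(t) ~ exp(-m*t) at large t, so
     * m_eff(t) = ln(C(t)/C(t+1)) plateaus at the particle mass. */
    #include <math.h>
    #include <stdio.h>

    #define NT 32                      /* time slices (illustrative) */

    void effective_mass(const double c[NT])
    {
        for (int t = 0; t < NT - 1; t++) {
            if (c[t] > 0.0 && c[t + 1] > 0.0) {
                double m_eff = log(c[t] / c[t + 1]);  /* plateau gives the mass */
                printf("t = %2d  m_eff = %f\n", t, m_eff);
            }
        }
    }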




One of the more complex plasma problems to be solved is the modeling of plasma confinement by magnetic fields in a Tokamak. This modeling requires a full three-dimensional simulation of the diffusion of a multicomponent plasma in the twisted toroidal fields. The accompanying picture shows output from a three-dimensional finite difference adaptive mesh simulation. Shown is an interior surface and a cross section of the Tokamak, with the color of the surface proportional to the strength of the magnetic field. The use of color allows the display of the field strength on multiple surfaces through partial transparency of one of them. The magnetic field decreases from the inner edge of the torus (yellow) to the outer edge (blue). Color shading enables one to display multiple complex features in three-dimensional computer models.

Within the Tokamak is depicted the trajectory of a single charged particle (indicated by the white line). The orbit is calculated from the magnetic field data stored on the three-dimensional mesh. The orbit generally follows a magnetic field line, while precessing slowly due to gradients in the field. This calculation is only a start on this exceedingly complex problem. Diffusion is a result of the drifting motion in this magnetic field and the self-consistent electric fields generated inside the plasma. A realistic simulation of a Tokamak will require a computation mesh with a million zones and several million particles. Edge effects, plasma heating from beams, and wave heating, as well as the internal diffusion process, must be taken into account.

Work supported by DOE.
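The particle-orbit calculation described above can be sketched in outline. The fragment below is an illustrative implementation of the standard Boris scheme for advancing a charged particle in prescribed electric and magnetic fields; it omits the interpolation of the field from the three-dimensional mesh used in the actual Tokamak code, and the field values, charge-to-mass ratio, and time step are assumptions.

    /* Boris push sketch: half electric kick, magnetic rotation,
     * half electric kick, then drift of the position. */
    typedef struct { double x, y, z; } vec3;

    static vec3 cross(vec3 a, vec3 b)
    {
        vec3 c = { a.y * b.z - a.z * b.y,
                   a.z * b.x - a.x * b.z,
                   a.x * b.y - a.y * b.x };
        return c;
    }

    void boris_push(vec3 *pos, vec3 *v, vec3 E, vec3 B, double qm, double dt)
    {
        /* v_minus: half acceleration by the electric field */
        vec3 vm = { v->x + 0.5 * qm * dt * E.x,
                    v->y + 0.5 * qm * dt * E.y,
                    v->z + 0.5 * qm * dt * E.z };

        /* rotation vectors t and s for the magnetic field */
        vec3 t = { 0.5 * qm * dt * B.x, 0.5 * qm * dt * B.y, 0.5 * qm * dt * B.z };
        double t2 = t.x * t.x + t.y * t.y + t.z * t.z;
        vec3 s = { 2.0 * t.x / (1.0 + t2),
                   2.0 * t.y / (1.0 + t2),
                   2.0 * t.z / (1.0 + t2) };

        vec3 vprime = cross(vm, t);
        vprime.x += vm.x;  vprime.y += vm.y;  vprime.z += vm.z;

        vec3 vplus = cross(vprime, s);
        vplus.x += vm.x;  vplus.y += vm.y;  vplus.z += vm.z;

        /* second half electric kick, then move the particle */
        v->x = vplus.x + 0.5 * qm * dt * E.x;
        v->y = vplus.y + 0.5 * qm * dt * E.y;
        v->z = vplus.z + 0.5 * qm * dt * E.z;
        pos->x += v->x * dt;  pos->y += v->y * dt;  pos->z += v->z * dt;
    }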

QCD is among the most computer-intensive problems known. One reason is that full QCD is a very complicated field theory, with many degrees of freedom. Today's best calculations are being done with 16x16x16x32 lattices. It is thought that definitive results will not be obtained until we can calculate with 256x256x256x256 space-time sites, and each site requires nearly 100 floating point words to describe the fields. That is around 400 billion words of computer memory required just to hold the description of the lattice. Furthermore, nearly all arithmetic operations in QCD involve complex arithmetic, so the number of operations per word of the lattice is high. Extracting these results will require computers thousands of times more powerful than today's fastest supercomputers.

Environmental Modeling and Remediation
A comprehensive mathematical modeling and computational research program is needed in the area of restoring the earth's surface and subsurface environment after contamination. Problems in this area are of immense importance to DOE and are found in every region of our country. Examples include leaky radioactive or chemical storage containers underground (gas stations, nuclear waste sites, etc.) and chemical spills on the surface (oil spills in the ocean, chemical processing plant spills, etc.). Computer models can help provide much needed understanding of the complex physical, chemical, and biological processes involved, and detailed simulations are effective substitutes for experimental laboratories that can be used to test understanding of complex phenomena and supplement physical intuition. In many cases computer simulations are the only feasible method of studying the phenomena, due to the long time scales involved in the radioactive and chemical processes.

Using the heat generated by an electric field to melt a region of contaminated soil is the first step of the "in-situ vitrification" process currently under study for removing hazardous waste from DOE sites. The soil resolidifies as a glass, effectively trapping the contaminants and preventing their migration into the groundwater supply. An experiment testing the procedure has been conducted at Oak Ridge National Laboratory. The glassified soil can be seen between the four electrodes in the test configuration shown in the photograph.

Computational modeling is essential to better understand and control this process. DOE researchers are developing algorithms for modeling the many complex processes involved in the vitrification process: heat generation and transfer, natural convection, liquid/solid phase change, chemical interactions, and the creation and movement of gas bubbles. As the process takes place underground, computational models must also be used to interpret the data obtained from necessarily indirect experimental measuring techniques.

Work supported by DOE.
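One building block of the vitrification models, heat conduction with a local heating source, can be sketched as follows. This is a deliberately simplified illustration, not the DOE research code: it performs a single explicit finite-difference step of two-dimensional heat conduction, with the grid size, diffusivity, and source term chosen arbitrarily, and it ignores convection, phase change, and chemistry.

    /* One explicit step of 2-D heat conduction with a source term
     * (e.g., Joule heating between electrodes). Boundary values are
     * held fixed; stability requires a sufficiently small dt. */
    #define NX 64
    #define NY 64

    void heat_step(double T[NX][NY], const double source[NX][NY],
                   double alpha, double dx, double dt)
    {
        static double Tnew[NX][NY];
        for (int i = 1; i < NX - 1; i++) {
            for (int j = 1; j < NY - 1; j++) {
                double lap = (T[i+1][j] + T[i-1][j] + T[i][j+1] + T[i][j-1]
                              - 4.0 * T[i][j]) / (dx * dx);
                Tnew[i][j] = T[i][j] + dt * (alpha * lap + source[i][j]);
            }
        }
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)
                T[i][j] = Tnew[i][j];
    }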


Computer studies are essential for the development of new technologies for identifying, monitoring, and reclaiming hazardous waste sites and spills. Many such technologies are emerging, such as in situ vitrification, bioremediation, organic volatilization, and surface and subsurface hydrology. Some, no doubt, will fail to live up to their potential, but others may provide solutions that significantly improve the process. Even a small improvement can yield tremendous savings given the magnitude of the problem facing our environment today. Computer simulations will help to determine which technologies should be vigorously pursued and under what conditions they should be utilized. Current models incorporating relatively few chemical, biological, and physical interactions run at 20 to 40 Mflops on a single CRAY Y-MP processor, and three-year simulations take on the order of 10 hours of CPU time and over a day of actual wall clock time. When additional interactions, such as the inclusion of more important chemicals in the reactions, cracks in subsurface rock formations, and the potential exponential growth of biological species, are incorporated along with the resolution required to resolve the chemical reactions at the fronts, computational requirements quickly surpass the capabilities of today's supercomputers, even for small three-year simulations. Because of the time scales involved, simulations of ten to a few hundred years are required, which further enlarges the computational power shortage.

SOFTWARE COMPONENTS AND TOOLS

In addition to a need for software tailored specifically for grand challenges, one of the HPCC Program objectives is to advance generic software technology that will have broad national impact. DOE plans to form collaborative groups among the national laboratories, universities, and industry to develop and share in this software technology effort. This area includes a broad range of activities, a selection of which are described below.

Massively Parallel Software Systems
Massively parallel computers will not displace traditional supercomputers for the majority of users until reliable, multiuser operating systems are developed. Such operating systems are needed to ensure efficient machine utilization and should be developed along with the hardware to ensure adequate hardware support for operating system functions. The operating systems that are currently available for parallel computers are still primitive compared to UNIX and MACH. Although a multitasking, timesharing system might eventually be desirable, the initial operating systems may assign each user a subset of the total number of nodes on a massively parallel machine. Using this approach would avoid problems associated with load balancing while timesharing occurs on some fraction of the nodes. The operating system must also ensure the integrity and security of each application being executed on the computer and allow operators to restart all of the applications after an inadvertent shutdown. A very important issue is the ability of an operating system to allow input/output to occur in parallel. Without such a capability, an input/output bottleneck can form, limiting the utilization of the machine. It is important to capitalize on recent advances in a universal parallel, distributed-memory operating system such as parallel MACH, which will have a small kernel and a large library of servers containing most of the UNIX functionality. This will allow streamlining the operating system, providing only those services on individual nodes needed by a given job. The operating system must provide functionality tying kernels on individual processors together. This should include parallel file systems that can be smoothly created and accessed from several cooperating processors. The advantages to both manufacturers and users of a universal operating system are obvious: leverage of resources and uniformity of environments.

Portable and Scalable Libraries
Mathematical software has played a vital role in making uniprocessors effectively usable by scientists and engineers. As more elaborate computer architectures arise, the need for more sophisticated mathematical software becomes acute. Ideally, such mathematical libraries should be portable across a variety of architectures, and scalable libraries for a single architecture will be the first step in this evolution. To ensure portability across machines, one may adopt a language for expressing vector and parallel constructs, such as an extended version of Fortran. It is clear, however, that such a common language alone cannot guarantee "performance portability." Until compilers for parallel machines with vector or RISC processors mature, avoiding substantial degradation across architectures will still rely on (1) tuning code by inserting compiler directives, and (2) using basic primitives that are written in assembler for maximum exploitation of architectural features and that are a part of the mathematical library of a given machine. To ensure performance portability (and avoid degradation of performance), one needs numerical libraries in which the algorithms make effective use of the primitives that ideally match the architectural features for maximum performance. For example, scalability of the algorithms used in these libraries can be enhanced by scalable primitives. This has been demonstrated, to a certain extent, in libraries that deal with dense matrix computations. However, much remains to be done in other areas, such as sparse matrix computations.

Programming Languages/Compilers
Effective use of high performance computers depends heavily on both the programmer's ability to express algorithms in a programming language clearly and succinctly and the associated compiler's ability to exploit the target machine's architecture. There is extensive scientific debate on the most cost-effective language/compiler strategy for parallel architectures. The principal options are (1) extend existing languages with parallelism primitives, (2) advance compiler optimization technology to find parallelism in existing languages, and (3) develop functional, object-oriented languages to replace the old styles. The DOE laboratories will collaborate with interested university and industrial partners to use their extensive expertise to push the technology of all three options. They will also establish fair comparisons among the options to determine their relative merit.

Load balance is a critical issue in porting applications to massively parallel computers. At DOE's Sandia National Laboratories, a dynamic synchronous load balance method based on binary decomposition was used to balance one million particles in six seconds on a 1024-processor NCUBE 2, starting from a random assignment of particles to processors; subsequent load balance updates required less than 0.1 second each. Using heterogeneous programming in a radar simulation application, six user programs (a host program, dynamic scheduler, ray-tracer, radar simulator, image collector, and graphics program) execute simultaneously and cooperatively on the 1024-processor NCUBE 2. The dynamic scheduler performed asynchronous load balancing based on a processor hierarchy; the resulting code ran about 50 times faster on the NCUBE than on a CRAY Y-MP processor. This scheme has recently been applied to SDI tracking problems and other complex simulations. Work supported by DOE.

Development of the Basic Linear Algebra Subroutines (BLAS3) was based on the idea that the addressing locality of a program can be greatly enhanced by properly indexing a set of nested loops. For example, indexing through whole rows and columns can cause many more cache misses than repeatedly indexing through small rectangular blocks that fit in cache memory. In the case of the Alliant FX-8, cache memory misses caused performance to fall far short of expectation. With BLAS3, performance increased to almost peak achievable system performance, which is sustained for problems that are not cache contained. Performance rates of approximately 50 megaflops have been observed for the rank-k update BLAS3 primitive C = C + A*B. The performance of the block methods that use these primitives is also quite high. Rates between 40 and 45 megaflops have been observed for a block LU decomposition algorithm.

Top: The performance of an LU factorization implemented in Fortran, using optimized BLAS2 primitives and BLAS3 primitives. Bottom: The results of the BLAS3-based code for reducing a matrix to upper Hessenberg form and the Eispack Fortran routine ORTHES after automatic optimization. (The accompanying chart, titled "Reduction to Hessenberg Form," plots megaflop rate, 0 to 45, against matrix order, 256 to 1024, for the BLAS3 and Fortran versions.)
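The cache-blocking idea behind the BLAS3 primitives can be shown in a few lines. The sketch below is illustrative C, not the vendor-supplied BLAS: the update C = C + A*B is organized over small square blocks that fit in cache, so each operand block is reused many times before being displaced; the matrix order and block size are assumptions.

    /* Blocked matrix update C = C + A*B (illustrative, not the BLAS). */
    #define N  512
    #define BS 32

    void blocked_update(double C[N][N], const double A[N][N], const double B[N][N])
    {
        for (int ii = 0; ii < N; ii += BS)
            for (int kk = 0; kk < N; kk += BS)
                for (int jj = 0; jj < N; jj += BS)
                    /* multiply one BS x BS block; operands stay cache-resident */
                    for (int i = ii; i < ii + BS; i++)
                        for (int k = kk; k < kk + BS; k++) {
                            double a = A[i][k];
                            for (int j = jj; j < jj + BS; j++)
                                C[i][j] += a * B[k][j];
                        }
    }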

Work supported by DOE, NSF, AFOSR, and IBM.


Scientists at Argonne National Laboratory played a leading part in the design of Strand, a parallel programming system distributed by Strand Software Technologies of Beaverton, Oregon. Recently, members of Argonne's Mathematics and Computer Science Division helped the Aerospace Corporation of El Segundo, California apply Strand to engineering problems. They also helped the corporation develop a set of tools for understanding the performance of parallel programs.

The parallel processing group at the Aerospace Corporation originally contacted Argonne because they needed parallel programming tools to support parallel versions of codes used in their engineering programs. They were also interested in extending tools that they had developed for understanding the performance of sequential programs to work in a parallel programming system.

Argonne scientists helped Aerospace Corporation develop their tools into a form suitable for parallel performance evaluation. The result of this work is a powerful data collection and analysis system called Gauge. Gauge can be used on a wide variety of parallel computers, including hypercubes and shared-memory machines. Engineers at Aerospace Corporation are now using Strand and Gauge to develop parallel versions of target tracking and rocket design codes. At Argonne, the Gauge tools have proved

invaluable in projects developing parallel codes for climate modeling and computational biology.

The group at Argonne also advised Strand Software Technologies on how to incorporate the Gauge tool into the Strand programming system. Strand Software Technologies agreed to distribute Gauge free of charge with the Strand software system, providing wide distribution of the tools developed by the collaboration.

The different colors in the image represent the amount of work being performed in different parts of the parallel program. Black represents the most work, whereas red, blue, and green represent less work, in that order (see color bar).

Work supported by DOE.


The accompanying diagram shows Fortran 77 source passing through a restructuring compiler to produce parallel Fortran.

Researchers at the University of Illinois' Center for Supercomputing Research and Development have pioneered compiler techniques to detect implicit parallelism in sequential languages, such as Fortran 77. Today, these compiler techniques are used routinely in most supercomputers. For algorithm kernels, the parallel code produced by restructuring compilers is comparable to that generated manually. The goal of much current research is a similar achievement for large applications. Work supported by DOE.

Computational Science Environments
Much remains to be accomplished in order to create effective work environments for computational scientists. The issues range from programming tools to facilitate error-free code generation, to performance analysis and enhancement, from pre- and postprocessing of data, to effective visualization of multidimensional time-dependent phenomena, and from communication of scientific results to the creation of problem-solving environments oriented toward particular classes of applications and using the specialized idioms and language of those applications areas. The objective is to make HPCC and the computational science approach to problem solving more useful for and readily available to large numbers of engineers and scientists. This is particularly relevant to the transfer of computational science technology to industry. Tools that merit early consideration include common, multimedia user interfaces; graphic job monitoring and management aids; parallel Fortran and C extensions; parallel multinode debuggers; performance analyzers; shared data cache systems; transparent hierarchical file storage systems; checkpoint/restart software; and automatic memory coherency managers.

Streams and Iterations in a Single Assignment Language (Sisal) is a general-purpose functional language for high-performance computing. The language is intended to run on both conventional and novel multiprocessor systems. The language's development is a collaborative effort by Lawrence Livermore National Laboratory and Colorado State University. Currently, researchers at more than twenty academic and research institutions worldwide are using Sisal.

Functional languages provide a clean and easy-to-use parallel programming model that facilitates parallel program development and simplifies compilation. The semantics of functional languages isolate the user from the complexities of parallel programming. The compiler and operating system, not the user, are responsible for the synchronization, communications, and scheduling of concurrent tasks. Functional languages free the user to concentrate on the nature of the algorithm and not its execution. Until now, functional semantics carried a high performance cost, but recent advances in compiler and operating system technology have brought Sisal on par with Fortran on both scalar and vector shared-memory multiprocessors. Given the expressive and easy-to-use parallel programming model it provides, Sisal is an attractive alternative to conventional programming languages on shared-memory multiprocessors.

This table, a comparison of Sisal and Fortran execution times, shows the execution times (in seconds) of four scientific programs on the Alliant FX/80. Clearly, Sisal is as fast as or faster than Fortran.

    Program (lines)        1 Processor           5 Processors
                           Sisal     Fortran     Sisal     Fortran
    Gauss Jordan (114)       2.6       1.7         0.9       1.0
    RICARD (301)             7.3       7.7         1.8       1.8
    SIMPLE (1150)          128.7     128.8        32.9      99.6
    Weather Code (2718)     55.5      75.1        13.6      54.4

Work supported by DOE and ARO.

Los Alamos National Laboratory has developed MediaView, a generic framework for communication via multimedia documents. These documents can include not only text, line art, and still images, but sound, video sequences, and computer-produced animations. Also, when cast in digital form, the mathematical content of a document can be symbolically and numerically manipulated. Thus, one can experiment with the mathematics, derive new results, and simulate different situations with different parameters. Finally, an animated scientific visualization can be incorporated in a MediaView document and be examined in situ or electronically mailed to a colleague for independent study. Thus, MediaView is a communication tool that offers new and dramatically different ways of interacting with others.

MediaView is easy to use and understand. It is based on a word processor metaphor, something familiar to any computer user. In addition to text, that metaphor is extended to include several media components. And like text, these additional components are subject to the same cut, copy, and paste paradigm, making them as simple to manipulate as words. As a result, powerful and complex MediaView documents can be constructed by nonspecialists.

MediaView can be of enormous benefit to workers in a distributed but networked HPC "center." One of many possible examples of use is exchanging scientific visualizations and mathematical analyses. Unlike a conventional document, these components appearing in MediaView are live and interactive.

The figure is an interactive animator for a sequence of images that were produced on a supercomputer. The colors represent the density of the fluid being simulated here. The color bar, from cyan to yellow, indicates less dense to more dense. Different colors delineate structures that are formed in the fluid simulation. The observer is viewing a cut-away of a Mach-6 intergalactic jet simulation, with the green lines showing the boundary of the data, had a "slice" not been cut away.

Work supported by DOE.




Research and development at Oak Ridge National Laboratory in the area of performance characterization for parallel supercomputers has produced two software tools for monitoring and visualizing the behavior of parallel algorithms. The Portable Instrumented Communication Library (PICL) is a subroutine library that implements a generic message-passing interface on a variety of multiprocessors. Programs that use PICL routines for interprocessor communication are portable in the sense that the same source code can be compiled and executed on many different parallel architectures. Optionally, PICL also provides time-stamped trace data on interprocessor communication, processor busy/idle times, and user-defined events. These data serve as input to a second tool, called ParaGraph, which is a graphical display system for visualizing the behavior of parallel algorithms on message-passing multiprocessor architectures. ParaGraph uses both color and motion to provide a dynamic visual depiction of the behavior of the parallel program. ParaGraph provides several distinct visual perspectives from which to view the same performance data, in an attempt to gain insights that might be missed by any single view.

Three of the approximately twenty views in ParaGraph are shown in the accompanying figure. The circular image (two shades of red) in the upper-right portion of the diagram shows how efficiently the program is running on the processors and the load balance between processors. The larger, lighter-shaded polygon indicates the maximum efficiency of the program thus far for each processor, whereas the smaller, darker polygon shows the current efficiency and load balance. The larger the polygon, the more efficient the algorithm, and the more circular, the better the load is balanced. Ideally, one would want the darker polygon to always be large and circular and about the same size as the lighter polygon.

Work supported by DOE.
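The utilization quantity behind ParaGraph's efficiency display can be computed directly from busy/idle trace data of the kind PICL records. The following sketch is illustrative only and does not use the PICL or ParaGraph interfaces; it assumes an array of per-processor busy times over a measured interval and reports each processor's efficiency and the overall imbalance.

    /* Per-processor efficiency and load imbalance from trace data. */
    #include <stdio.h>

    void report_efficiency(const double busy[], int nproc, double elapsed)
    {
        double min_eff = 1.0, max_eff = 0.0, sum = 0.0;
        for (int p = 0; p < nproc; p++) {
            double eff = busy[p] / elapsed;   /* fraction of time doing useful work */
            if (eff < min_eff) min_eff = eff;
            if (eff > max_eff) max_eff = eff;
            sum += eff;
            printf("processor %d: efficiency %.2f\n", p, eff);
        }
        printf("mean efficiency %.2f, imbalance (max - min) %.2f\n",
               sum / nproc, max_eff - min_eff);
    }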


Distributed Computing
Computational science is a diverse and geographically distributed activity. As previously mentioned and as discussed further below, one of the goals of the HPCC Program is to establish HPC Research Centers that will be used by many researchers via high-speed networks. Many researchers will also have significant local resources in the form of powerful workstations and local shared fast microprocessor systems. Effective utilization of these distributed DOE resources in attacking grand challenge problems will require software to support distributed systems. Distributed systems are systems in which two or more computers cooperate on the solution of the problem. For example, one may want to distribute computations in order to match algorithms for specific subproblems with the most appropriate computer architecture. The goal of distributed systems research is to create a uniform user and programming environment across heterogeneous systems; in other words, to make it as easy to use a collection of computers on a given problem as it is to use a single computer. Important areas of research and development include distributed programming environments to make it easy to create, debug, and run distributed scientific applications; distributed operating system techniques for control of distributed resources; distributed file system management and archival support tools to facilitate utilization of high performance networks; and mechanisms for effective distributed software control and update.

At the Georgia Institute of Technology, a research group led by physicist Uzi Landman has been a major user of the Department's supercomputer facilities for a number of years. Landman's specialty is the computerized simulation of the atomic world, particularly as it is reflected in the properties of materials and in the physical and technological consequences of their microscopic interactions. By incorporating both classical and quantum mechanical principles into his supercomputer programs, Landman has been able to accurately predict how various materials behave, at the molecular and atomic level, under various physical conditions, and to explain how they respond when brought into contact with each other under pressure. Visualization of these simulations is providing scientists with fundamental new knowledge of the atomic mechanisms that underlie the behavior of materials. Landman, shown in the photo, illustrates on a monitor how an electrically unbalanced cluster of salt (NaCl) molecules becomes stabilized. To make up for the missing chlorine atom, a free electron (indicated by the cloud of white dots) attaches itself to a sodium atom (shown in red) on the surface of the cluster and begins to pull it away, leaving a balanced cluster of 13 sodium and 13 chlorine atoms. Photo courtesy Ray Stanyard. Work supported by DOE.

Supercomputers do not operate as stand-alone devices. Typically they are embedded in a heterogeneous shared resource environment consisting of many services, for example: mass storage, input/output, authentication, graphics engines, terminals, and workstations. During the 1970s Lawrence Livermore National Laboratory (LLNL) played a leading role in developing the shared resource server model of a supercomputer environment. During the 1980s LLNL developed a distributed operating system architecture for a supercomputer environment. The goal of this research and development was to create the programming and user-at-a-terminal model of a single shared and nonshared memory multiprocessor system providing transparent resource access. The architecture was implemented as the native client-server, shared multiprocessing operating system on CRAY X-MP and Y-MP systems and has been in production for over two years. Several network services were implemented within the architecture. For example, the storage part of the architecture was implemented on UNIX systems as a distributable hierarchical mass storage system. LLNL is collaborating with General Atomics to make this storage system available to industry and other supercomputer centers. The architecture also provided the basis for the IEEE Distributed Mass Storage Reference Model. Work supported by DOE.

Visualization and Imaging
The purpose of HPCC technology is to augment the work of scientists and engineers. Their interface to these systems is of central importance if they are going to be most effective. Therefore, it is important to provide the hardware and software technology to allow for input of a variety of image data and for output that supports the use of their senses, particularly visual, for gaining maximum problem insight. Research and development is needed in input, storage, and analysis of image data and in the output of data in appropriate visual and other sensory forms. The output techniques needed include effective management of three-dimensional data and display; browsing; connection of output images to quantitative data; and comparison of current and past or alternative approaches.

Visualization and imaging are dependent on high performance networks, effective data-compression techniques, scientific databases, and high-definition recording and display technology.

Very Large Scientific Databases
Many of the grand challenge applications of interest to the DOE, such as global climate modeling, understanding the structure of matter, and the human genome, potentially involve very large scientific databases containing hundreds or perhaps thousands of terabytes of data. These databases will require hierarchies of storage media, which in turn will require new approaches to database structuring, storage, and searching. In addition, improved interfaces are needed to facilitate scientists' querying, browsing, studying, and transforming the databases to enable better understanding of the physical phenomena portrayed in them. Standards need to be developed in this area to aid collaboration among scientists and the aggregation of logically single databases from distributed databases being generated at many experimental locations.

The development of a comprehensive and general information system for molecular biologists, the Chromosome Information System (CIS), is a major part of the computing effort at the Lawrence Berkeley Laboratory (LBL) Human Genome Center. In this system, one consistent graphical user interface will transparently interact with multiple underlying databases. The separation of user actions on the biological data objects from the data storage enhances the system's flexibility and functionality. The description of the data, and the operations on the data, are specified in a high-level language that captures the biological concepts while permitting the use of commercial data management systems at the lower levels. Through integration of specialized databases (e.g., an image database, a sequence database, and a map database), biological information may be accessed at many levels of abstraction. This work is based on concepts developed by the Data Management research group at LBL. The CIS has been implemented by a collaboration of the Data Management group and biologists from the LBL Human Genome Center. Work supported by DOE.

COMPUTATIONAL TECHNIQUES

The principal objective in this research area will be to improve the performance of numerical simulations on parallel computing systems. Research in this area should not only encompass algorithm kernels and subroutines, but should also include full-scale scientific and engineering applications of contemporary interest.

Computational Methods and Algorithms
DOE will support a comprehensive research program in advanced computational techniques. The principal goal of this effort will be to provide methodologies and algorithms to enable the effective solution of grand challenge problems and to promote U.S. industrial competitiveness. The focus will be on methods that allow exploitation of the prototypical architectures deployed at the HPC Research Centers. Creation of the high performance algorithms, to be incorporated in the methodology, will require, in many instances, new numerical techniques that exploit greater parallelism than can be simply extracted from parallelized versions of existing sequential algorithms.

Applications Algorithm Design Systems
In addition to developing new methods and algorithms, it is important to package these techniques and make them easily usable. An attractive area for research is how to create parallel design methodologies for large-scale scientific applications that include modular and well-integrated software tools, user interfaces, and standards and protocols. Such design methodologies will expose the similarities and important distinctions between the hardware architectures and provide mechanisms to promote portability. New fundamental mathematical algorithms will also be posed in such frameworks.

HIGH PERFORMANCE COMPUTING RESEARCH CENTERS

DOE will contribute to the support of HPC Research Centers. The main purpose of these centers will be to enable computational scientists to explore the effectiveness of new full-scale high performance architectures in solving grand challenge problems. By using full-scale architectures, code developers can carry out experimental code design, with the expectation of adequate compute power to solve the problem. In order for researchers at remote locations to make effective use of these facilities, each center will become a major node on a National Research and Education Network backbone. Collaborative consortia of DOE-supported HPC Research Centers, industry (user industries and computing and communications industries), and universities will be very highly encouraged.

The sponsorship of the HPC Research Centers is an area in which DOE has a long history of excellence and leadership. DOE national laboratories have recognized the importance of computational experiments and analysis to complement expensive physical experiments. Many of the DOE national laboratories continue to operate major production-oriented supercomputing centers, as well as a number of parallel computing research centers. There are many types of parallel computer architectures that will need evaluation. Because several already have displayed high potential at outperforming contemporary vector multiprocessors, it is reasonable to expect that there will be additional such architectures identified in the High Performance Computing Systems component of the HPCC Program. Among the concepts to be thoroughly evaluated are SIMD, MIMD, and hybrid processors and shared, distributed, and hierarchical memories. Each hardware architecture type will be evaluated with respect to competing software architectures for the various grand challenge applications.

Efficient modeling of three-dimensional, time-dependent fluid flows plays a critical role in many grand challenge research problems. Adequate resolution of these flows using conventional spatial discretizations would require computational effort far in excess of the most optimistic projections for near-term growth in computational capabilities. Scientists at Lawrence Livermore National Laboratory, the Courant Institute, and Los Alamos National Laboratory are developing adaptive methods that focus computational effort where it is most needed, resulting in a dramatic increase in computational efficiency and enabling the solution of realistic 3D flow problems. One type of adaptive technique is local adaptive mesh refinement, in which the computational mesh dynamically changes as a function of space, time, and data to maintain a fixed level of accuracy in the calculation. The figure shows computational results obtained using this method to model the collapse of an ellipsoidal cloud of Freon when it is hit by a shock wave, where yellow indicates the location and concentration of the Freon. The use of adaptive refinement for this problem reduced computational costs by more than an order of magnitude, resulting in a savings of hundreds of Cray hours. Work supported by DOE, DARPA, NSF, AFOSR, and DNA.
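The flagging step at the heart of local adaptive mesh refinement can be sketched in one dimension. The fragment below is a simplified illustration, not the laboratories' AMR codes: cells whose solution gradient exceeds a tolerance are marked for subdivision on the next, finer level, so that resolution is concentrated at features such as the shock front; the array, tolerance, and refinement policy are assumptions.

    /* Mark cells for refinement where the solution gradient is large. */
    #include <math.h>

    void flag_cells(const double u[], int n, double dx, double tol, int refine[])
    {
        for (int i = 1; i < n - 1; i++) {
            double grad = fabs(u[i + 1] - u[i - 1]) / (2.0 * dx);
            refine[i] = (grad > tol);   /* 1 = subdivide this cell on the finer level */
        }
        refine[0] = 0;                  /* boundaries handled separately */
        refine[n - 1] = 0;
    }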

The activities at each center will be designed to support interdisciplinary and inter-institutional collaborations. A key ingredient will be a critical mass of in-house research conducted by focused teams of applications scientists, computational mathematicians, and computer scientists. These collaborative teams will direct their efforts toward the solution of grand challenge problems of interest to DOE. They will be responsible for maintaining a dialogue with universities, industry, and other laboratories and centers in order to maximize the dissemination of information and avoid unnecessary duplication of effort.

The HPC Research Centers will also play an important role in computational science education. This aspect of their mission is expected to go well beyond the usual training courses in computing and to embrace such issues as computational science curriculum design at the university level and introductory and motivational programs and material for high school and elementary students and their teachers. Clearly, accomplishing these activities will require strong collaboration with the education community.

During FY 1985, the U.S. Congress mandated a study of computer networking requirements of the U.S. research community. The preparation of such a far-reaching study, together with the growing widespread remote access to supercomputers through federal supercomputer programs, such as those in the DOE and NSF, focused the attention of the U.S. research community on the need for more capable, high-speed computer networks.

Within the DOE, computer networks had already been used extensively for specific applications and programs, primarily supporting Fusion and High Energy Physics, but these networks were mostly incompatible and lacking in capacity. Because of this, and because of a recognized and significant increase in networking requirements, the Energy Research (ER) community endorsed a proposal to create the Energy Sciences Network (ESNet). The ESNet has been developed to be compatible with existing network requirements while providing connectivity and interoperability to other federal research networks, in addition to a nondisruptive transition path to emerging international network standards.

The ESNet is the vehicle through which the ER community has become a full partner in the Internet community of computer networks and through which the ER community will become an integral part of the proposed National Research and Education Network (NREN). In fact, the ESNet currently incorporates access by


ESNet is currently a T1-based (1.5 Mbit/s) data communications network supporting more than twenty major Office of Energy Research (OER) sites directly connected to the backbone. It is a multiprotocol network supporting two major families of data communications protocols, DECnet and the Internet protocols. The five major OER programs supported are Basic Energy Sciences, Health and Environmental Research, High Energy and Nuclear Physics, Fusion Energy, and the Superconducting Super Collider.

ESNet is engineered, installed, and operated by the networking staff of the National Energy Research Supercomputer Center (NERSC), located at Lawrence Livermore National Laboratory. Initial deployment of the T1 circuits began in late 1989, and the network became fully operational with both protocol families by early 1990. Since becoming operational, total traffic on the network has doubled about every six months. In addition to interconnecting the major OER-supported sites (shown on the map), ESNet provides access to several regional networks, has international connections to Japan and Europe, and currently connects to two Federal Inter-agency eXchange (FIX) points providing interconnects to other national backbone networks, including the NASA Science Network and the National Science Foundation Network (NSFnet).
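The growth figure quoted above compounds quickly, as the following small projection shows; the starting utilization is an arbitrary assumption, and the point is only that traffic doubling every six months multiplies load roughly 64-fold over three years.

    /* Projection of relative traffic under a six-month doubling time. */
    #include <stdio.h>

    int main(void)
    {
        double traffic = 1.0;                  /* relative traffic level today */
        for (int period = 1; period <= 6; period++) {
            traffic *= 2.0;                    /* one doubling per six months */
            printf("after %2d months: %4.0fx today's traffic\n", period * 6, traffic);
        }
        return 0;                              /* 64x after three years */
    }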

many ER collaborators through the NSFnet and through regional networks, as was proposed in the ESNet Program Plan of June 1987. These networks, along with ESNet and others, are generally considered to be the precursor to the NREN.

The ESNet has been proposed by the DOE within the framework of the HPCC Program, of which NREN is a major component. The HPCC Program is a balanced computing research proposal that includes research in HPCC systems, software technology, and computer networks, as well as funding for human resources and the NREN. The NREN, as proposed, will be a computer communications network that interconnects educational institutions; national laboratories; non-profit institutions; government facilities; commercial organizations engaged in government supported research; and unique national scientific and scholarly resources such as supercomputer centers, libraries, etc.

The NREN will provide high-speed communications access to over 1300 institutions across the U.S. within the initial planning period, and is proposed to offer sufficient capacity, performance, and functionality so that the physical distance between institutions is no longer a barrier to effective collaboration. The NREN will support access to HPCC facilities


and services such as full-motion video, rapid transfer of high-resolution images, real-time display of time-dependent graphics, remote operation of experiments, and advanced information sharing and exchange. The NREN is also intended to incorporate advanced security and a uniform network interface to domestic users as well as a standard interface to international research networks.

The NREN proposes to achieve economies of scale by serving many federal agencies, industrial R&D centers, and university campuses. Although the federal government will provide a substantial direct investment in the NREN, it is important to recognize that a large indirect investment will be made by academic and industrial institutions and other networks that will connect to the NREN. The DOE NREN budget is for the direct support of DOE laboratory and other DOE-funded research facility connections to the NREN, per the existing federal budget process. This DOE program will complement other NREN deployment, funded and coordinated by the NSF, to the broader research and education community.

The ESNet is user requirement driven within the framework of the DOE mission. The DOE HPCC Program proposal for the NREN is applications and user oriented, both with regard to the operations and the gigabit research areas. The DOE will participate in the NREN management to ensure a requirements-driven approach with strong user involvement and site coordination. Because of the common mission orientation and common requirements of the ER community, the consensus of this community is that the ESNet would continue to be viewed as a large, logical community of interest network within the NREN context, even if in the longer term ESNet were no longer a separate but compatible physical network.

Several of the most demanding functional requirements for DOE networking capabilities are highlighted below.

The ER community will have a continuing need for multiprotocol support throughout the initial planning period. Although there is a constant transition to UNIX-like systems and TCP/IP network protocol usage, a wholesale conversion to these systems is impractical and expensive and cannot be accomplished within three years. During that time, OSI standard protocols will have been incorporated into the ESNet, where they will co-exist with TCP/IP, X.25, and DECnet.

Videoconferencing has been initiated within the ESNet in a test mode. In recent years, the cost of video technology, the use of compression techniques for video transmission, and the use of packetized video have spurred great interest in using video for research collaboration support. The ESNet video pilot project has received very positive evaluations from all involved. The benefits of video are even more evident as one examines the logistics of the many international collaborations in which ER programs are involved. Video enables these collaborators and negotiators to meet on a weekly and even daily basis, which could not be done otherwise.

The use of windowing technologies and other software tools has enabled more facile and productive use of remote computing and control facilities. These tools "extend" the capability of a principal investigator's local workstation to these remote facilities in such a fashion that they can be used more readily in the investigation of complex problems. The operation of these tools in a distributed environment, however, is a very taxing network requirement.

In addition to windowing tools, there is a growing requirement for distributed programming and debugging tools. Remote procedure call functionality has recently been implemented on the supercomputer systems at the National Energy Research Supercomputer Center, and as this functionality achieves more widespread use, it is anticipated that the distributed processes will place a heavy demand on ESNet bandwidths. It will be very important to have the ability to remotely analyze system performance with regard to parallelization, individual system node processes, throughput, etc., especially when evaluating prototype 100-gigaflops and teraflops systems as part of the HPCC Program. This will place even more demands on network bandwidth (possibly another order of magnitude) in the FY 1993-95 timeframe.

Proposals for the next large experiments within the Fusion Program for the International Thermonuclear Experimental Reactor and within the High-Energy Physics Program for the Superconducting Super Collider include the requirement for distributed control, that is, remote operation of the experiment. Presently, such requirements for experiment control require the most capable local area network implementations. These requirements for the wide area ESNet, in the FY 1995-96 timeframe, will require bandwidths in the gigabit range.

The ESNet was developed after careful definition and documentation of user requirements from all of the Energy Research program areas. This user requirements orientation led to an ESNet implementation that departed from other Internet implementations. Most notably, the ESNet was developed to be a multiprotocol network to preserve Energy Research user applications investments in existing network implementations while allowing full interoperability with other Internet implementations, such as the NASA Science Internet and the NSFnet. Recently, the Internet Advisory Board has adopted the concept of incorporating a multiprotocol approach to allow a nondisruptive transition to international standards.

DOE NETWORKING RESEARCH PLAN

The DOE HPCC networking plan must encompass at least the following technical research issues to meet anticipated near-term needs for using ESNet and NREN as they evolve.

Performance
Virtually all the characterization to date of the technical parameters of NREN has centered around bandwidth. However, additional factors such as response time and availability can be equally, or even more, important for many applications and must be given corresponding consideration. In addition, support will be provided for research in network performance evaluation.

Policy-Based Routing
Policy-based routing is the term used to denote the routing of traffic, taking into account the usage policies of both the using entity and the service provider. This issue is still in an early research phase. Topics of study include determining mechanisms and parameters to allow independent domains to protect themselves from unwanted traffic, and mechanisms to determine paths through the system meeting client needs.

Open Systems Interconnect (OSI) Integration
The OSI networking standards need to be incorporated into the NREN architecture and capabilities.

Multiprotocol Support
A variety of protocol suites are currently used by various user communities. For DOE, these include DOD IP, DECnet (Phase 4), SNA, X.25, and others. User-level requirements based on these protocols will continue for some time. The network must allow for support of these protocols if all or most users are to be supported in their research. Design of gateway and transition mechanisms between protocol suites will require research and development. New higher performance protocols will also be investigated.

Problem Resolution Tools
User-level problems in networking are becoming increasingly sophisticated and subtle, while at the same time the complexity of the network environment is rapidly increasing. Resolution of these problems will require a sophisticated arsenal of tools that will help to confirm, analyze, and pinpoint problems at all levels of network services.

PRIVACY AND SECURITY

Protecting the NREN and its attached resources against attack is an important requirement. Protection and monitoring mechanisms must be part of its design.

The above is a partial list of considerations that must be addressed by the NREN, in addition to the major charter of providing state-of-the-art performance to the HPCC Program. An interagency NREN will evolve over time, adding these capabilities as they can be reliably provided. It is unlikely that these divergent requirements can be met with other than a hierarchical architecture, incorporating multiple networking entities and multiple levels of networking components, many of which may be in place already.

GIGABITS RESEARCH AND DEVELOPMENT

The leadership for gigabit network research for the Federal HPCC Program will be provided by DARPA. DOE will support a modest effort in gigabit network research: to support grand challenge applications, to provide connectivity within and among its HPC Centers, to contribute the high performance networking expertise of the DOE laboratories to the HPCC Program generally, and to ensure that it maintains the expertise to upgrade and integrate its ESNet with the emerging gigabit NREN. The gigabit network research program below meets these needs. The previous section outlined key research issues needed for the interagency interim NREN effort; they apply equally to the full NREN effort. Below are listed additional gigabit research and development efforts that DOE will pursue.

Network Management and Operations
One of the most important factors in providing gigabit network service across a large community will be the ability to establish a hierarchical network management and operations enterprise that will be able to effectively deal with network management issues in a very diverse environment.

Local Gigabit Networks
For a national gigabit backbone to be fully effective, local gigabit networks are required to interconnect the local resources of the HPC Centers. Two main approaches to local gigabit interconnection currently exist: the circuit switch approach and the bus approach. For each, there are many unanswered questions

The CASA wide-area testbed is one of five very high speed communication network projects led by the Corporation for National Research Initiatives (CNRI). CASA is truly an interagency collaboration involving the California Institute of Technology, DOE's Los Alamos National Laboratory, NASA's Jet Propulsion Laboratory, and NSF's San Diego Supercomputer Center. MCI, Pacific Bell, and U.S. West are also collaborating with the CASA testbed.

The primary research goal of this collaboration is the effective simulation of grand challenge problems on geographically dispersed supercomputing resources connected via very high speed networks. The communications environment envisioned for CASA relies on the Los Alamos-proposed High Performance Parallel Interface (HIPPI) ANSI standard and will leverage research at Los Alamos on HIPPI-based crossbar switches, networking software, and protocol processors. The CASA project will develop several prototype distributed supercomputing applications in chemistry, geophysics, and global climate modeling. The present supercomputing resources available in the collaboration are Thinking Machines Corporation's CM-2, Cray Research Incorporated's Y-MPs, and several hypercube architectures. As with the other testbeds, the CASA effort will contribute to the research and development base required for the National Research and Education Network.

Work supported by NSF and DARPA.

about the best design. The circuit switch approach looks particularly attractive because of its excellent bandwidth scaling, security, and connection setup overlap characteristics. DOE, using existing gigabit circuit switch expertise and equipment within its national laboratories, will investigate circuit switch architectural questions, such as how the number of connections can be economically scaled, where functionality is optimally placed, and what effect various host and network interface I/O architectures have on actual throughput.

Gigabit Applications
Prototype local and long-haul gigabit network applications need to be developed that demonstrate the utility of these networks for enhancing scientists' productivity, achieving computational power through distributed resource sharing, or enabling other, possibly unforeseen, significant scientific or economic advantages for the grand challenge applications. Developing distributed applications, using resources of several HPC Centers cooperatively on a problem, may require


The accompanying diagram shows the MCN testbed linking the Advanced Computing Laboratory, the Central Computing Facility, and the Laboratory Data and Communications Center, with a connection to the wide-area CASA testbed.

The Multiple Crossbar Network (MCN) testbed is a gigabit per second testbed network at Los Alamos National Laboratory that interconnects computing resources in the Advanced Computing Laboratory, Central Computing Facility, and Laboratory Data and Communications Center. It will be extended in approximately a year to include the wide-area CASA gigabit per second testbed. CASA sites include Caltech, the Jet Propulsion Laboratory, and the San Diego Supercomputer Center.

The purpose of the MCN testbed is to experiment with gigabit per second networking technology, to understand how to apply this technology to leading-edge scientific problems, and to promote closer interaction with industry. The local distribution system for the MCN testbed is based on work done in the Laboratory's Network Engineering Group on the High-Performance Parallel Interface (HIPPI) and HIPPI-based switching systems. A Network Systems Corporation crossbar switch with a 6.4 gigabit per second aggregate throughput and less than 1 microsecond switching time is being coupled to Los Alamos-designed intelligent network interfaces to assemble a high-performance HIPPI switching system called CP*. Several CP* systems will be interconnected with fiber-optic links to assemble a multiple crossbar network.

This network will interconnect supercomputers such as CRAY X-MPs and Y-MPs, the Thinking Machines CM-2, workstations from Sun and IBM, framebuffers, and high-performance disk systems. This extensive collection of systems with HIPPI interfaces provides a unique facility to do performance and interoperability testing of their HIPPI implementations. The applications for this testbed include visualization of high-resolution (1024 x 1024 x 24) images at video rates (24 frames/second), distributed supercomputing using software tools such as Express and ISIS, and data motion. This testbed will provide facilities to explore the applications and networking requirements for high-performance computing systems of the 1990s. Work supported by DOE.
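The visualization application quoted above gives a sense of why gigabit-class local networks are needed. The arithmetic below, using no assumptions beyond the figures in the text, computes the sustained rate implied by uncompressed 1024 x 1024 images with 24 bits per pixel delivered at 24 frames per second.

    /* Bandwidth required for the high-resolution visualization application. */
    #include <stdio.h>

    int main(void)
    {
        double bits_per_frame = 1024.0 * 1024.0 * 24.0;   /* ~25 Mbit per frame     */
        double bits_per_sec   = bits_per_frame * 24.0;     /* ~604 Mbit/s sustained  */
        printf("required rate: %.0f Mbit/s\n", bits_per_sec / 1e6);
        return 0;   /* well within a gigabit-class HIPPI channel, far beyond T1 */
    }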


new mathematical algorithms, new application structuring, and new interprocess communication techniques and protocols.

Protocols and System Software
With high performance computers and gigabit local area networks in place, research will be performed to measure communication performance and to identify bottlenecks in applications, operating systems, and communication protocol architectures and implementations. These studies will lead to experiments with new approaches to operating system and protocol design and implementation. Extensions of parallel computing tools to enable applications programs to run on multiple hosts linked with gigabit wide-area networks will be investigated.

Gigabit Testbeds
Because it is important to experiment with alternative gigabit networking approaches, local gigabit testbeds will be set up associated with the HPC Centers. DARPA and NSF have already initiated a significant Gigabit Testbed Research Program. DOE-supported sites participate in these national or regional area testbeds. DOE will provide additional testbed resources between sites for networking application experimentation that does not interfere with production networks.

BASIC RESEARCH AND HUMAN RESOURCES

This component of DOE's HPCC Program will build on the existing DOE applied mathematical sciences base program and recognizes that a portion of the HPCC funding must go toward long-range investment in the future. While the other three components are designed for a near- or intermediate-term impact, basic research funding must increase for the long-term health of the field. Likewise, to develop the human resource potential of the future, investment in education must accelerate, beginning at the high school level or below and continuing through life-long training. In both basic research and education, DOE has a long history of contributions and experience from which to draw. Our plan for the HPCC Program uses this base and builds upon it.

BASIC RESEARCH

Basic research supports all other aspects of the HPCC Program. The topics below address fundamental aspects of computing and its applications.

New Algorithms
Part of the vitality of HPCC depends on a steady influx of new algorithms and mathematical models. Indeed, the current generation of practical, reliable algorithms, which are the tools for attacking the grand challenges, were once exotic new techniques designed and tested on model problems. DOE has long provided a natural and effective platform for research into new algorithms. Techniques that have grown out of DOE's basic research program now serve as general tools for hundreds of different applications. The continued renewal of creative algorithms and mathematical models is one of DOE's primary strengths, and the HPCC Program will exploit this strength.

Parallel Numerical Algorithms
Much of HPCC is scientific numerical modeling. As the sophistication of the physics models increases, the complexity of numerical problems expands. Parallel computers magnify the issue. We need continuous development of the best parallel numerical methods to ensure that we can truly apply the power of our most advanced computers to model the needed physics correctly.

To improve network congestion control, the Network Research group at Lawrence Berkeley Laboratory developed algorithms for slow-start, dynamic window allocation, and improved round-trip estimators. These improvements enhanced performance over the (loaded) Internet by factors of 2 to 100, and the technology was adopted by virtually all major computer manufacturers. The group also developed a header compression algorithm to enable practical implementations of the serial line IP protocol, which permits computers to act as network hosts over modems, thereby providing simpler, more reliable, and more secure data connections. This technology, too, has been transferred to industry. Other research includes development of a TCP header prediction algorithm to increase throughput on high-speed networks, investigation of policy-based routing, and development of a new algorithm that allows network addresses to have arbitrarily complex structure yet permits routing look-ups to be completed in constant time. This algorithm, the only one known to handle ISO addresses efficiently, will be included in 4.4 BSD UNIX.

Work supported by DOE.
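The Lawrence Berkeley Laboratory congestion-control work just described rests on two ideas that can be sketched compactly: a congestion window that opens exponentially during slow-start and then grows linearly, and a smoothed round-trip-time estimator used to set retransmission timeouts. The Python fragment below is an illustrative sketch only; the class, constants, and update coefficients are assumptions for the example, not the laboratory's code, and a real TCP implementation contains many refinements omitted here.

    # Illustrative sketch of slow-start window growth and a smoothed
    # round-trip-time (RTT) estimator for retransmission timeouts.

    class CongestionControlSketch:
        def __init__(self, ssthresh=64):
            self.cwnd = 1              # congestion window, in segments
            self.ssthresh = ssthresh   # slow-start threshold (assumed value)
            self.srtt = None           # smoothed RTT estimate, in seconds
            self.rttvar = None         # RTT variation estimate, in seconds

        def on_ack(self):
            """Open the window: exponentially during slow-start, then linearly."""
            if self.cwnd < self.ssthresh:
                self.cwnd += 1                  # slow-start: one segment per ACK
            else:
                self.cwnd += 1.0 / self.cwnd    # congestion avoidance

        def on_rtt_sample(self, rtt):
            """Fold a new RTT measurement into the smoothed estimate and
            return a retransmission timeout derived from it."""
            if self.srtt is None:
                self.srtt, self.rttvar = rtt, rtt / 2
            else:
                self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
                self.srtt = 0.875 * self.srtt + 0.125 * rtt
            return self.srtt + 4 * self.rttvar

    # Example: ten acknowledged segments and one 100-millisecond RTT sample.
    cc = CongestionControlSketch()
    for _ in range(10):
        cc.on_ack()
    print(round(cc.cwnd, 2), round(cc.on_rtt_sample(0.1), 3))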

Matching Algorithms to Architectures
Research on HPCC has produced a diverse set of multiprocessor designs. Each design seems to favor certain types of algorithms and perform poorly on others. The relationship between good matches is often not obvious at the outset. We need to have a better understanding of how to best select and use a particular machine for solving a spectrum of different computational problems.

Performance Measurement and Analysis
When a parallel program runs poorly on a specific machine, the cause of the slowness can be baffling. It could be the algorithm, the translator, the partitioning of work, communication bottlenecks, system overheads, hardware failures, or any number of other causes. Moreover, many current analysis techniques alter the performance as they try to measure it. Mechanisms for accurate and timely identification of the cause of poor performance must be improved. Successful use of large numbers of processors will depend on progress here.

Novel Applications
Many important applications of HPCC remain to be discovered and explored. Those who best understand the potential of HPCC must ensure that a process for continually exploring new applications is developed and implemented. When feasibility has been demonstrated and documented for a novel application, responsibility for its continued exploitation will transfer to the affected applications discipline.

Theory of Parallel Computing Complexity
Traditional complexity work provides a foundation for identifying efficient sequential algorithms, but it does not directly apply to parallel machines. Parallel complexity models often make unrealistic simplifications, such as an infinite number of processors or an infinite supply of memory, an assumption that leads to many impractical algorithms. We need a solid foundation for parallel computing algorithms that will accurately represent their performance on large, but finite, numbers of processors and various types of parallel computers.

In recent years, DOE researchers have developed and analyzed advanced numerical methods for solving the large-scale nonlinear differential and algebraic systems that arise during the numerical solution of ordinary and partial differential equations. This research has applications in a diverse range of problem areas, including circuit simulation, computational electromagnetics, computer-aided design of mechanical systems, modeling of chemically reacting flow, power systems, chemical vapor deposition, and optimal control. The development of effective solution techniques has required a new understanding of the mathematical structure of differential equation systems, as well as a rethinking of algorithms. For extremely large systems, new techniques for large-scale linear and nonlinear systems have been developed.

New algorithms and methodologies have been transferred to industry in part through the development of high-quality mathematical software. Software developed by DOE for the solution of these problems now has hundreds of users in industry, government, and academia; variants of this software have been incorporated into several commercial codes. The analysis and software have directly impacted solution techniques for the dynamic analysis of mechanical systems, trajectory control, and a wide variety of chemical engineering applications. Second-generation algorithms and software are designed to attack extremely large systems. The effectiveness of these techniques was recently demonstrated on a 2D laser oscillator model at Lawrence Livermore National Laboratory with 38,000 unknowns. Work is now under way to exploit the excellent potential for parallelization of these techniques.

Work supported by DOE, ARO, AFOSR, NSF, Amoco Corp., and Exxon.
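The central idea behind implicit solvers for stiff differential and differential-algebraic systems is that every time step requires the solution of a nonlinear algebraic system. The short Python sketch below illustrates this with a backward Euler step and a Newton iteration using a finite-difference Jacobian. It is a toy illustration under those assumptions, not the DOE production software, which uses higher-order formulas and far more robust linear algebra.

    import numpy as np

    def backward_euler(f, y0, t0, t1, steps):
        """Minimal sketch of an implicit (backward Euler) integrator for a stiff
        system y' = f(t, y).  Each step solves the nonlinear algebraic system
        y_new - y_old - h*f(t_new, y_new) = 0 by Newton's method."""
        h = (t1 - t0) / steps
        y = np.array(y0, dtype=float)
        t = t0
        n = y.size
        for _ in range(steps):
            t_new = t + h
            y_new = y.copy()                      # predictor: previous value
            for _ in range(20):                   # Newton iteration
                r = y_new - y - h * f(t_new, y_new)
                if np.linalg.norm(r) < 1e-10:
                    break
                # Finite-difference Jacobian of the residual, one column at a time
                J = np.empty((n, n))
                eps = 1e-7
                for j in range(n):
                    yp = y_new.copy()
                    yp[j] += eps
                    J[:, j] = ((yp - y - h * f(t_new, yp)) - r) / eps
                y_new -= np.linalg.solve(J, r)
            y, t = y_new, t_new
        return y

    # Example: a small stiff linear test system integrated to t = 1.
    def f(t, y):
        return np.array([-1000.0 * y[0] + y[1], -y[1]])

    print(backward_euler(f, [1.0, 1.0], 0.0, 1.0, 50))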

Software Productivity and Reliability
The cost of software for conventional computers remains very high. The software cost for parallel systems will be substantially greater. Likewise, software reliability is often questionable. We need to develop a practical methodology for building high-quality software that can


[Figure: Megaflops delivered by several supercomputers on the Perfect Benchmark codes, plotted on a logarithmic scale and showing each machine's theoretical peak rate, mean rate, and individual code performance.]

Performance variations in modern supercomputers can be visualized by plotting the megaflops delivered on the Perfect Benchmarks. Each dot in the figure represents the megaflops rate of the indicated machine on one of the Perfect codes. The gaps between theoretical peak rates and actual rates delivered on real applications demonstrate the need for comprehensive performance studies of the relationships between supercomputer architectures and applications and systems software.

Work supported by DOE, NSF, and NASA.

Most computer benchmarks are simple, fixed-size problems that are timed on various machines. Ames Laboratory researchers have recently completed the design of the first computer benchmark based on fixed-time comparison principles. Called SLALOM, for Scalable, Language-independent, Ames Laboratory One-minute Measurement, the benchmark adjusts the number of unknown variables in a complete radiation transfer problem such that the input, setup, solution, and output times take a total of one minute. It answers the question, "How big a problem can one run with this computer?" It is the first benchmark to allow fair comparison of the whole spectrum of computer speeds and types, from the smallest personal computers to the fastest vector and parallel supercomputers. SLALOM is rapidly gaining acceptance from computer manufacturers as a just and realistic performance metric that has been needed for some time.

Work supported by DOE.
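The fixed-time principle behind SLALOM can be sketched as a search for the largest problem that completes within the time budget. The Python fragment below is an illustrative sketch only; the callable run_workload and the doubling-plus-bisection search strategy are assumptions made for the example, not the actual SLALOM code.

    import time

    def fixed_time_problem_size(run_workload, budget_seconds=60.0):
        """Find the largest problem size n whose complete run (input, setup,
        solution, and output) finishes within the time budget: the fixed-time
        idea behind SLALOM.  `run_workload(n)` is an assumed callable that
        performs the whole job for n unknowns."""
        # Grow n geometrically until a run exceeds the budget.
        n = 1
        while True:
            start = time.perf_counter()
            run_workload(n)
            if time.perf_counter() - start > budget_seconds:
                break
            n *= 2
        # Bisect between the last size that fit and the first that did not.
        lo, hi = n // 2, n
        while hi - lo > 1:
            mid = (lo + hi) // 2
            start = time.perf_counter()
            run_workload(mid)
            if time.perf_counter() - start <= budget_seconds:
                lo = mid
            else:
                hi = mid
        return lo

    # Example with a toy workload and a 0.1-second budget.
    size = fixed_time_problem_size(lambda n: sum(i * i for i in range(n)),
                                   budget_seconds=0.1)
    print("Largest problem size within budget:", size)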

Research on software tools at Argonne National Laboratory has been directed at the problem of aiding users who are adapting existing scientific and engineering computer programs to execute efficiently on parallel machines. The research is motivated by the enormous investments in expertise and computing resources that such programs represent. Because the work of a community of users may depend on the effective application of a program, its adaptation to parallel execution has significant scientific and economic consequences. A collection of prototype tools that emerged from this research has been further developed into a commercial software product named VecPar_77 by the Numerical Algorithms Group, Inc. (NAG), which will market the product worldwide.

VecPar_77 is a pre-compiler tool that analyzes and transforms Fortran programs. Its function is illustrated by its application to the parallelization of a computer program, developed by scientists at Argonne, NASA, and IBM, that uses multigrid methods to model fluid flow in two dimensions. In the existing serial Fortran code, four types of operations are carried out by sweeping sequentially over rectangular subdomains called patches. Parallelizing the code becomes a matter of carrying out these operations concurrently over the patches. VecPar_77 extracted information about dependencies that inhibit parallel execution of the key loops that carry out these operations. Armed with this information, the program developers made changes that enabled these loops to be executed in parallel. The result is a significant performance improvement on machines that permit such parallel operation, e.g., the Sequent Symmetry.

The commercial development of VecPar_77 was a cooperative endeavor in which one of the Argonne scientists who had participated in the research took leave to act as a consultant to NAG. His efforts were partially supported by a technology commercialization grant from the State of Illinois, administered by the Technology Transfer Center at Argonne.

Work supported by DOE and the State of Illinois.
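To make the patch-based transformation concrete, the toy Python sketch below shows the before-and-after shape of such a change: a sequential sweep over independent patches replaced by a concurrent one. This is only an illustration of the idea; VecPar_77 itself operates on Fortran source, and the patch layout and the relax_patch operation here are hypothetical stand-ins for the four patch operations described above.

    from concurrent.futures import ProcessPoolExecutor

    def relax_patch(patch):
        """A hypothetical pointwise operation on one rectangular patch
        (a list of rows); it depends only on data inside the patch."""
        return [[0.25 * v for v in row] for row in patch]

    def relax_serial(patches):
        # Original style: sweep sequentially over the patches.
        return [relax_patch(p) for p in patches]

    def relax_parallel(patches):
        # Transformed style: the same operation applied concurrently, which is
        # safe only because dependency analysis shows the patches are independent.
        with ProcessPoolExecutor() as pool:
            return list(pool.map(relax_patch, patches))

    if __name__ == "__main__":
        patches = [[[1.0] * 64 for _ in range(64)] for _ in range(8)]
        assert relax_serial(patches) == relax_parallel(patches)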


Scientists at Argonne National Laboratory are determining the feasibility of using logic programming on high-performance computers to study two- and three-dimensional structure in molecular biology.

A group at Argonne has devised a computer program that employs logic programming techniques to match up to 500 sequences of DNA/RNA (basic genetic building blocks of the organism). It is the first such automated code to handle more than a modest number (about ten) of sequences. The program, which is based on a combination of the logic programming language Prolog and a lower-level language, C, runs efficiently on the Sequent Symmetry machine and a network of Sun workstations.

The Argonne group, in collaboration with scientists at the University of Illinois in Urbana, has also developed a computer-based procedure that accurately computes the structure of ribosomal RNA. Ribosomal RNA forms the core of the translation apparatus, the mechanism that reads genetic information into protein sequences, and thus lies at the heart of the cell's function and replication.

Argonne's automated procedure has already proved successful in several ways. Previously undetected secondary structure has been identified at positions 63 and 104 of E. coli 16S ribosomal RNA (shown on the diagram in red). Moreover, new tertiary structure has been discovered in 7S RNA. Since the chemical activity of ribosomes is involved in a number of processes, such as building and repairing tissues, digestion, and the immune system's response to disease, understanding the structure of ribosomal RNA is of considerable scientific and medical interest.

Work supported by DOE.

[Figure: Secondary structure diagram of Escherichia coli 16S ribosomal RNA.]

be used and expanded by others and with capabilities that are clearly and accurately delimited as an integral part of the code.

Parallel Reasoning Systems
HPCC offers tremendous potential for nonnumerical applications. We anticipate that future-generation supercomputers will actually function as scientific assistants, performing routine tasks with great efficiency and speed and making informed decisions on the basis of accumulated experience. We need to explore the introduction of parallel programming paradigms into automated reasoning and logic programming.

HUMAN RESOURCES

The long-term viability of U.S. competitiveness requires a workforce highly educated in engineering and scientific disciplines. In recent years traditional theoretical and experimental techniques in science and engineering have been augmented by a powerful new technique: computational science. Scientists and engineers now use computers for basic research in complex physical, biological, and chemical phenomena, as well as in the design and manufacturing of complex engineering systems including automobiles, airplanes, nuclear reactors, and computers.

While theory and experimental methods will not be replaced by computational science, they are being augmented by it in important ways. Computer simulations bridge theory and experiment by simultaneously considering sophisticated models and controlled study of complex results. Computing also glues scientific disciplines together. Sophisticated tools are required to solve the basic equations on a computer, independent of the particular application. For example, the equations used to approximate wing flow also describe gas flow in automobile engines, blood flow in the heart, waves in the ocean, hydraulic machinery, fluids in computer disk drives, flow of polymers in material science, motion of oil in the earth, spread of pollutants in the atmosphere, microchip fabrication, and the formation of metal alloys, as well as many other complex phenomena. A large body of expertise is common to these seemingly different applications.

Currently, computational science programs are emerging in a small number of U.S. universities. DOE laboratories have long been leaders in developing computational science techniques and in training limited numbers of postdoctoral researchers. However, the number of postdoctoral visitors at the national laboratories in mathematics, computational science, and any form of engineering is very low relative to other areas.

In the future, high school science and mathematics courses will introduce computational science techniques, students will use them during college, and they will continue to develop them throughout their professional lives. DOE will play a substantial role in supporting our educational system and in helping to provide the highly specialized training beyond that which is possible within traditional degree programs. The following are ways that DOE plans to contribute:

High School Programs to Attract Talented Students
DOE will significantly expand its current summer program for inviting high school students and high school teachers to visit national laboratories. In addition, it will establish close ties with high school teachers to assist in appropriate curriculum development, to provide guest lecturers, and to enhance their use of computers in the classroom, including access to supercomputers. This type of early exposure to computing should draw more students into computational science. In addition, it will generate a U.S. workforce that better understands how computers can improve their productivity, regardless of their ultimate career choice.

Undergraduate Programs to Augment Educational Training
DOE laboratories currently have limited summer and co-op programs for college students that provide an opportunity for students to see how their formal training applies in practice. These programs also provide valuable feedback and support to schools on the status of their programs. DOE plans to enlarge these programs and extend them to top students from a greater number of colleges and universities. The emerging university graduate programs in computational science will need support in the next decade to evolve undergraduate programs. This support will include access to advanced computers for classroom teaching, summer jobs in computational science for undergraduates, and development of course materials that cut across traditional departments.

Graduate Programs to Focus Students on Research Opportunities
At the graduate level, we need to interest the best students in solving tough, important problems. DOE can contribute to this process in several ways, including jointly funded research projects; participation of DOE researchers on Ph.D. thesis committees; sponsorship of research and curriculum development workshops; creation of pre- and postdoctoral fellowships for computational science in universities; and establishment of sabbatical opportunities for faculty and their students. Many of these activities take place now; additional DOE funding through the HPCC Program will permit them to take place on a much larger scale.

Post-Graduate Programs to Advance Research
New Ph.D.s have the skill and knowledge to tackle the enabling technology needs and grand challenges of the HPCC Program. DOE has started a postdoctoral program at the laboratories to strengthen research efforts in key areas. For the HPCC Program, it is expected that the number of DOE-funded postdoctoral positions will increase substantially at the national laboratories and at universities.

University/Industry Exchanges to Expand Technology Transfer
Technology transfer addresses those people already in the workforce. DOE must provide a wide range of training to expand its staff's skills in the rapidly changing area of computational science. DOE can also use its existing expertise by encouraging temporary personnel exchanges. DOE staff could help build new programs in computational science within high schools and universities while exposing faculty to the nature and needs of the national laboratories. Likewise, exchanges with industry could quickly accelerate the process of taking technology developed at the laboratories and putting it to use in industry. In the past, political, social, and legislative obstacles have severely limited this type of activity; with the HPCC Program, DOE hopes to change this situation.

The Computational Science Graduate Fellowship Program, sponsored by the DOE's Office of Energy Research, is designed to support highly capable science and engineering students interested in pursuing doctoral study in an applied science or an engineering discipline with applications in high performance computing. The program provides a stipend ($18,000 the first year, $19,200 the second year, $20,400 the third year, and $21,600 the fourth year), tuition and fees, an institutional allowance ($1,000 a year), some travel expenses, and funding for a workstation.

The program is open to U.S. citizens who have a bachelor's degree in life or physical sciences, engineering, or mathematics. Eligible applicants may be entering graduate school or may already have extensive graduate study. Students who have received department (faculty) approval of a Ph.D. thesis topic are not eligible for the program.

Universities must apply and be approved for participation in the program. Acceptance will be based on material submitted in the university application. These materials include a description of the curriculum, enrollment data, previous and ongoing research, postgraduate employment record, and faculty resumes.

Student applications consist of undergraduate and graduate transcripts (GRE scores will be required after the 1991-1992 cycle), faculty references, an academic and career goal statement, and a list of work experiences and publications.

All fellows are required to work at a DOE or a DOE-approved facility in a research assignment related to ongoing high performance computing activities.

For application forms and more information, contact:

Computational Science Graduate Fellowship Program
Science/Engineering Education Division
Oak Ridge Associated Universities
P.O. Box 117, Oak Ridge, TN 37830-0117
(615) 576-0128; Telefax (615) 576-0202

Work supported by DOE.


CONCLUSION

This document describes the DOE program component of the Federal High Performance Computing and Communications (HPCC) Program. The HPCC Program features increased cooperation between industry, academia, and government and defines a strategy for supporting advances through directed R&D efforts, reduced uncertainties to industry for development and use of new technologies, support for underlying research, network, and computational infrastructures, and support for the U.S. human resource base to meet the needs of industry, academia, and government.

The HPCC Program consists of four complementary components in key areas of HPCC. These areas are HPCC systems, advanced software technology and algorithms, the National Research and Education Network, and basic research and human resources. The Program is driven by the need for unprecedented computational power to investigate and understand a wide range of scientific and engineering problems that are referred to as grand challenges and that are the fundamental problems of investigation for the mission of DOE and the other participating agencies.

DOE has a long history of reliance on and excellence in high performance computing and computational science. Consequently, DOE fully embraces the goals of the HPCC Program and intends to mobilize its scientific community to help accomplish them.

Several of the themes that will pervade DOE's HPCC Program are already clear; others will emerge as the program is implemented and evolves. The program will be applications driven but highly interdisciplinary, involving

extensive teaming of scientists and engineers with mathematicians and computer scientists. Collaboration with industry and universities will be strongly encouraged. HPC Research Centers will provide a focus for HPCC Program intellectual activities. Computational science education will be emphasized to ensure future human resources for the program.

DOE will make use of the field work proposal system to fund HPCC activities at the national laboratories. Unsolicited proposals submitted under the Special Research Grant Program (see DOE/ER-0249, October 1985) will be used to fund HPCC Program activities at other organizations.

REFERENCES

1. Grand Challenges: High Performance Computing and Communications. To obtain a copy of this document, send a request to the Committee on Physical, Mathematical, and Engineering Sciences, c/o National Science Foundation, Computer and Information Science and Engineering, 1800 G Street NW, Washington, DC 20550.

2. Building an Advanced Climate Model: Program Plan for the CHAMMP Climate Modeling Program. U.S. Department of Energy report DOE/ER-0479T, December 1990.

Prepared for the Department of Energy's Office of Energy Research, Scientific Computing Staff.

Most informational publications from the Department of Energy are printed in one color. However, in order to retain technical content, four colors have been used on a few figures in this publication.