
University of Nebraska - Lincoln, DigitalCommons@University of Nebraska - Lincoln
NASA Publications, National Aeronautics and Space Administration, 1991

Bailey, D. H.; Barszcz, E.; Barton, J. T.; Browning, D. S.; Carter, R. L.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Lasinski, T. A.; Schreiber, R. S.; Simon, H. D.; Venkatakrishnan, V.; and Weeratunga, S. K., "THE NAS PARALLEL BENCHMARKS" (1991). NASA Publications. 150. http://digitalcommons.unl.edu/nasapub/150

THE NAS PARALLEL BENCHMARKS

D. H. Bailey,1 E. Barszcz,1 J. T. Barton,1 D. S. Browning,2 R. L. Carter,1 L. Dagum,2 R. A. Fatoohi,2 P. O. Frederickson,3 T. A. Lasinski,1 R. S. Schreiber,3 H. D. Simon,2 V. Venkatakrishnan,2 and S. K. Weeratunga2

NAS APPLIED RESEARCH BRANCH
NASA AMES RESEARCH CENTER
MOFFETT FIELD, CALIFORNIA 94035

1 This author is an employee of NASA Ames Research Center.
2 This author is an employee of Computer Sciences Corporation. This work is supported through NASA contract NAS 2-12961.
3 This author is an employee of the Research Institute for Advanced Computer Science (RIACS). This work is supported by the NAS Systems Division via Cooperative Agreement NCC 2-387 between NASA and the Universities Space Research Association.

Summary

A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. These consist of five "parallel kernel" benchmarks and three "simulated application" benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their "pencil and paper" specification: all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

Introduction

The Numerical Aerodynamic Simulation (NAS) Program, which is based at NASA Ames Research Center, is a large-scale effort to advance the state of computational aerodynamics. Specifically, the NAS organization aims "to provide the Nation's aerospace research and development community by the year 2000 a high-performance, operational computing system capable of simulating an entire aerospace vehicle system within a computing time of one to several hours" (NAS Systems Division, 1988, p. 3). The successful solution of this "grand challenge" problem will require the development of computer systems that can perform the required complex scientific computations at a sustained rate nearly 1,000 times greater than current generation supercomputers can achieve. The architecture of computer systems able to achieve this level of performance will likely be dissimilar to the shared memory multiprocessing supercomputers of today. While no consensus yet exists on what the design will be, it is likely that the system will consist of at least 1,000 processors computing in parallel.

Highly parallel systems with computing power roughly equivalent to that of traditional shared memory multiprocessors exist today. Unfortunately, for various reasons, the performance evaluation of these systems on comparable types of scientific computations is very difficult. Relevant data for the performance of algorithms of interest to the computational aerophysics community on many currently available parallel systems are limited. Benchmarking and performance evaluation of such systems have not kept pace with advances in hardware, software, and algorithms. In particular, there is as yet no generally accepted benchmark program or even a benchmark strategy for these systems.

The popular "kernel" benchmarks that have been used for traditional vector supercomputers, such as the Livermore Loops (McMahon, 1986), the LINPACK benchmark (Dongarra, 1988a, 1988b), and the original NAS Kernels (Bailey and Barton, 1985), are clearly inappropriate for the performance evaluation of highly parallel machines. First of all, the tuning restrictions of these benchmarks rule out many widely used parallel extensions. More importantly, the computation and memory requirements of these programs do not do justice to the vastly increased capabilities of the new parallel machines, particularly those systems that will be available by the mid-1990s.

On the other hand, a full-scale scientific application is similarly unsuitable. Porting a large program to a new parallel computer architecture requires a major effort, and it is usually difficult to justify a major research task simply to obtain a benchmark number. For that reason we believe that the otherwise very successful PERFECT Club benchmark (Berry et al., 1989) is not suitable for highly parallel systems. This is demonstrated by the sparse performance results for parallel machines in recent reports (Pointer, 1989, 1990; Cybenko et al., 1990).

Alternatively, an application benchmark could assume the availability of automatic software tools for transforming "dusty deck" source into efficient parallel code on a variety of systems. However, such tools do not exist today, and many scientists doubt that they will ever exist across a wide range of architectures.

Some other considerations for the development of a meaningful benchmark for a highly parallel supercomputer are the following:

- Advanced parallel systems frequently require new algorithmic and software approaches, and these new methods are often quite different from the conventional methods implemented in source code for a sequential or vector machine.
- Benchmarks must be "generic" and should not favor any particular parallel architecture. This requirement precludes the usage of any architecture-specific code, such as message-passing code.
- The correctness of results and performance figures must be easily verifiable. This requirement implies that both input and output data sets must be kept very small. It also implies that the nature of the computation and the expected results must be specified in great detail.
- The memory size and run-time requirements must be easily adjustable to accommodate new systems with increased power.
- The benchmark must be readily distributable.

In our view, the only approach that satisfies all of these constraints is a "paper-and-pencil" benchmark. The idea is to specify a set of problems only algorithmically.
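To make the notion of a purely algorithmic specification concrete: one way a paper-and-pencil benchmark can define its input data is through a stated pseudorandom recurrence that any implementer can reproduce exactly on any machine. The sketch below uses the linear congruential scheme associated with the NAS Parallel Benchmark specification (multiplier a = 5^13, modulus 2^46, seed 271828183); the constants and the helper name `nas_random` are illustrative here, not a normative restatement of the benchmark rules.

```python
# Illustrative sketch of an algorithmically specified input stream:
# x_{k+1} = a * x_k mod 2^46, normalized to (0, 1) by dividing by 2^46.
# Constants follow the NPB-style scheme but are shown only as an example.

A = 5 ** 13          # multiplier a = 5^13
M = 2 ** 46          # modulus 2^46
SEED = 271828183     # stated starting value

def nas_random(n, seed=SEED):
    """Return n pseudorandom values in (0, 1) from the stated recurrence."""
    x = seed
    out = []
    for _ in range(n):
        x = (A * x) % M      # exact integer arithmetic, so fully portable
        out.append(x / M)    # normalize to the unit interval
    return out

# Any implementer, on any system, regenerates the identical input data:
stream = nas_random(5)
assert nas_random(5) == stream                 # deterministic and reproducible
assert all(0.0 < r < 1.0 for r in stream)      # values lie strictly in (0, 1)
```

Because the recurrence is defined in exact integer arithmetic, the generated input is bit-for-bit identical across architectures, which is precisely what lets correctness be certified from a brief published description rather than from a distributed data file.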
Even the input data must be specified only on paper. Naturally, the problem has to be specified in sufficient detail that a unique solution exists, and the required output has to be brief yet detailed enough to certify that the problem has been solved correctly. The person or persons implementing the benchmarks on a given system are expected to solve the various problems in the most appropriate way for the specific system. The choices of data structures, algorithms, processor allocation, and memory usage are all (to the extent allowed by the specification) left to the discretion of the implementer. Some extension of Fortran or C is required, and reasonable limits are placed on the usage of assembly code and the like, but otherwise programmers are free to utilize language constructs that give the best performance possible on the particular system being studied.

"Our benchmark set consists of two major components: five parallel kernel benchmarks and three simulated application benchmarks. The simulated application benchmarks combine several computations in a manner that resembles the actual order of execution in certain important CFD application codes."

To this end, we have devised a number of relatively simple "kernels," which are specified completely in Bailey et al. (1991). However, kernels alone are insufficient to completely assess the performance potential of a parallel machine on real scientific applications. The chief difficulty is that a certain data structure may be very efficient on a certain system for one of the isolated kernels, yet inappropriate if incorporated into a larger application. In other words, the performance of a real computational fluid dynamics (CFD) application on a parallel system is critically dependent on data motion between computational kernels. Thus, we consider the complete reproduction