NCAR/TN-316+STR Experiences on a CRAY-2 Running UNICOS
NCAR/TN-316+STR
NCAR TECHNICAL NOTE
August 1988

Experiences on a CRAY-2 Running UNICOS

M. Pernice

SCIENTIFIC COMPUTING DIVISION
NATIONAL CENTER FOR ATMOSPHERIC RESEARCH
BOULDER, COLORADO

Trademark Information

* CRAY X-MP, CRAY-2, COS, UNICOS, CFT, CFT77, and CFT2 are trademarks of Cray Research, Inc.
* UNIX and System V are trademarks of AT&T Bell Laboratories.
* Amdahl and UTS are registered trademarks of Amdahl Corporation.
* VAX and VMS are registered trademarks of Digital Equipment Corporation.
* IBM is a registered trademark of International Business Machines Corporation.
* Memorex is a registered trademark of Memorex Corporation.
* Macintosh is a trademark of Apple Computer, Inc.

Table of Contents

Preface
1. Performance of the NCAR Benchmark Suite
1.1 Description of the Suite
1.2 Description of the Hardware
1.3 Description of the Software
1.4 Timing Results
1.4.1 Results for Loop Kernels
1.4.2 Results for Subroutine Kernels
1.4.3 Results for a Climate Model
1.5 Experiences with the Compilers
1.6 Summary and Conclusions
2. The NAS Computing Environment
2.1 Documentation
2.2 Front-ends
2.3 Mass Storage System
2.4 Mass Storage System Interface
2.5 Network Interfaces
2.6 UNICOS on the CRAY-2s
2.7 Conclusions
Appendix 1
Appendix 2
Appendix 3

Preface

The CRAY X-MP/48 at the National Center for Atmospheric Research (NCAR) was installed and became operational in the fourth quarter of 1986. By the end of the second quarter of 1988, the X-MP was saturated. It is expected that NCAR will replace the X-MP with the next generation supercomputer sometime in 1990-1992. Manufacturers of general-purpose supercomputers (notably Cray Research, Inc. and ETA Systems) are investing heavily in operating systems based on AT&T System V UNIX, so the replacement for the X-MP will likely have a UNIX operating system.

The users of NCAR's computing facilities have been using the batch-oriented Cray Operating System (COS) since the installation of NCAR's first CRAY-1 in 1977. Concerns have been expressed about the impact that a transition to UNIX will have on our user community. Further questions have been raised about the impact of UNIX on supercomputer performance, particularly on the performance of applications that require substantial amounts of input and output (I/O) operations.

To address these concerns, the Performance Analysis Project, which is part of the Computational Support Section of the Scientific Computing Division, obtained access to the computing systems maintained by the Numerical Aerodynamics Simulation (NAS) Program at NASA's Ames Research Center at Moffett Field, California. These systems include a pair of CRAY-2 mainframes that run the UNICOS operating system. A set of benchmark codes was run to evaluate the performance of these mainframes and of the uniform UNIX computing environment at NASA-Ames.
This Technical Note presents the results of the experience gained on these systems. The Scientific Computing Division gratefully acknowledges NAS for providing access to its computing facilities and for the support that was provided during the course of these investigations.

1. Performance of the NCAR Benchmark Suite

1.1 Description of the Suite

The NCAR Benchmark Suite consists of the following 8 programs:

BNMK01: tests basic arithmetic capabilities. The following vector operations are timed for values of the vector length ranging from 5 to 1000:

    V <- V+V
    S <- S + V*V (dot product)
    V <- V*V
    V <- S*V + V*V
    V <- V/V
    V <- S*V + V*V + S
    V <- V*(V+V)

BNMK02: timing and accuracy test of intrinsic functions cos, acos, single and double precision exp and log.

BNMK03: timing and accuracy test of real forward- and inverse-FFT routines from FFTPACK.

BNMK04: timing and accuracy test of separable elliptic equation solver SEPELI from FISHPAK.

BNMK05: timing and accuracy test of linear equation solver SGEFA-SGESL from LINPACK.

BNMK06: timing and accuracy test of eigenvalue routines TRED1 and TQLRAT from EISPACK.

shaln: timing and accuracy test of a shallow water equation model on an nxn doubly-periodic grid. The values n=64 and n=256 are used.

CCM: version CCM0B of the NCAR community climate model.

1.2 Description of the Hardware

There are 2 CRAY-2s managed by the Numerical Aerodynamics Simulation (NAS) Project at NASA-Ames, named navier and stokes. Both are 4-CPU CRAY-2s, each having a 256 million word Common Memory, 16,384 words of Local Memory for each processor, and a 4.1 nanosecond (nsec) cycle time. Each CRAY-2 has a single Foreground Processor that coordinates these components. The two machines differ in their memory: stokes is a dynamic memory CRAY-2, having a memory cycle time of 120 nsec, while navier is a static memory model with a shorter memory cycle time of 80 nsec.
A bank conflict on navier causes a maximum memory delay of 45 cycles (184.5 nsec), while a bank conflict on stokes causes a maximum memory delay of 57 cycles (233.7 nsec). Each CRAY-2 is equipped with DD-49 disk drives. Some interesting early history of NAS's experiences with a CRAY-2 can be found in the article "Early Experiences with the NAS CRAY-2" by John T. Barton, which appears in the Spring 1986 issue of the Cray User Group (CUG) Proceedings.

1.3 Description of the Software

Both navier and stokes ran UNICOS version 3.0 for most of the test period. On June 20, 1988, the operating system on navier was upgraded to a pre-release version of UNICOS 4.0. On July 19, stokes was also upgraded to UNICOS 4.0. There are two FORTRAN compilers available on navier and stokes: version 4.0b of cft2, and version 3.0 of cft77. cft2 was upgraded from version 3.1b with the UNICOS upgrade. cft77 was upgraded from version 2.0 in early May. These software changes had little impact on the performance of the benchmark suite, except for the double precision functions: the measured times for dexp and dlog under UNICOS 4.0 were about a third of the measured times for these functions under UNICOS 3.0, while the computed error roughly doubled. All measurements on the CRAY-2s that appear in this report were made under UNICOS 4.0.

1.4 Timing Results

For each program in the benchmark suite, timing runs were made under three different conditions. Data labeled 'Fully Loaded' were obtained when running under a typical daytime load. Data labeled 'Late Night' were obtained from runs made between 1 a.m. and 2 a.m., to determine if performance was sensitive to the interactive load. Data labeled 'Dedicated' were obtained when no other program was executing. Except where noted, all of the execution times were obtained by calls to the SECOND function, which reports the CPU time that has elapsed since the start of the job.
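The way a CPU-time function such as SECOND is used to time a loop kernel can be illustrated with a minimal Python sketch. It is not the benchmark code: it assumes that Python's `time.process_time` is an acceptable stand-in for SECOND (both report CPU time rather than wall-clock time), and the names `second`, `time_kernel`, `vector_add`, and `mflops` are hypothetical helpers introduced only for this illustration.

```python
import time


def second():
    """CPU seconds used by this process so far (stand-in for SECOND)."""
    return time.process_time()


def vector_add(a, b):
    """The V <- V+V kernel: one floating-point add per element."""
    return [x + y for x, y in zip(a, b)]


def time_kernel(kernel, n, repeats=1000):
    """Return the CPU seconds spent in `repeats` calls of kernel on length-n vectors."""
    a = [1.0] * n
    b = [2.0] * n
    start = second()
    for _ in range(repeats):
        kernel(a, b)
    return second() - start


def mflops(flop_count, cpu_seconds):
    """Convert an operation count and a CPU time into millions of operations per second."""
    if cpu_seconds <= 0.0:
        raise ValueError("CPU time must be positive; increase the repeat count")
    return flop_count / cpu_seconds / 1.0e6


if __name__ == "__main__":
    repeats = 2000
    # Sample vector lengths between 5 and 1000 (an assumption for this sketch).
    for n in (5, 10, 20, 100, 500, 1000):
        elapsed = time_kernel(vector_add, n, repeats)
        if elapsed > 0.0:
            print(n, round(mflops(n * repeats, elapsed), 2))
        else:
            print(n, "too fast to time at this repeat count")
```

Because the timer counts only CPU time charged to the process, a run on a loaded machine and a run on a dedicated machine would ideally report similar values; comparing the two is one way to see how much the load affects the measured rate.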
1.4.1 Results for Loop Kernels

Figures 1-7 compare the performance of BNMK01 on navier and on the CRAY X-MP at NCAR, which has a cycle time of 8.5 nsec. The X-MP timings are represented by the shaded bars, and were obtained using CFT 1.16 on a dedicated system running COS 1.16. These figures show that an X-MP easily outperforms a CRAY-2 on these kernels, despite having more than twice the clock period. This is consistent with other reports on the computational speed of a CRAY-2.

Each figure plots Mflops against vector length (5, 10, 20, 100, 500, 1000).

Figure 1: Performance of V+V navier vs. X-MP
Figure 2: Performance of V*V navier vs. X-MP
Figure 3: Performance of V/V navier vs. X-MP
Figure 4: Performance of V*(V+V) navier vs. X-MP
Figure 5: Performance of dot product navier vs. X-MP
Figure 6: Performance of S*V + V*V navier vs. X-MP
Figure 7: Performance of S*V + V*V + S navier vs. X-MP

Figures 8-14 compare the performance of BNMK01 on stokes and on the CRAY X-MP at NCAR and appear in Appendix 1. The data represented by Figures 1-7 and Figures 8-14 can each be reduced to a single number that can be interpreted as a performance ratio between the CRAY-2 and the CRAY X-MP on the set of loop kernels in BNMK01. Several approaches are possible. The method used here takes the weighted harmonic mean of all of the rates reported for a single operation, resulting in a mean execution rate for each loop kernel in BNMK01.
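A weighted harmonic mean of rates can be sketched as follows. The weights used in this report are not stated in the passage above, so the weights and rates in this Python example are purely hypothetical, and the function name `weighted_harmonic_mean` is introduced only for illustration.

```python
def weighted_harmonic_mean(rates, weights):
    """Return sum(w) / sum(w / r) over paired weights w and rates r."""
    if len(rates) != len(weights) or not rates:
        raise ValueError("rates and weights must be non-empty and of equal length")
    if any(r <= 0.0 for r in rates):
        raise ValueError("rates must be positive")
    if any(w < 0.0 for w in weights) or sum(weights) <= 0.0:
        raise ValueError("weights must be non-negative with a positive sum")
    return sum(weights) / sum(w / r for w, r in zip(weights, rates))


if __name__ == "__main__":
    # Hypothetical Mflops rates for one kernel at several vector lengths,
    # with hypothetical equal weights; these are NOT measured values.
    rates = [10.0, 20.0, 40.0]
    weights = [1.0, 1.0, 1.0]
    print(weighted_harmonic_mean(rates, weights))
```

The harmonic mean is the natural average for rates: if each weight is the amount of work done at the corresponding rate, the result equals total work divided by total time, so slow cases pull the mean down more strongly than they would in an arithmetic mean.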