
NPAC Technical Report SCCS-661

Exploiting High Performance Fortran for Computational Fluid Dynamics

K. A. Hawick and G. C. Fox

Northeast Parallel Architectures Center,
Syracuse University, 111 College Place,
Syracuse, NY 13244-4100, USA.
Tel. 315-443-3933, Fax. 315-443-1973,
Email: [email protected]

30 November 1994

Submitted to HPCN 1995.

Abstract. We discuss the High Performance Fortran data parallel programming language as an aid to software engineering and as a tool for exploiting High Performance Computing systems for computational fluid dynamics applications. We discuss the use of intrinsic functions, data distribution directives and explicitly parallel constructs to optimize performance by minimizing communications requirements in a portable manner. In particular we use an implicit method such as the ADI algorithm to illustrate the major issues. We focus on regular mesh problems, since these can be efficiently represented by the existing HPF definition, but also discuss issues arising from the use of irregular meshes that are influencing a revised definition for HPF-2. Some of the codes discussed are available on the World Wide Web at http://www.npac.syr.edu/hpfa/ along with other educational and discussion material related to applications in HPF.

1 Introduction

Successful implementations of future computational fluid dynamics (CFD) codes for large-scale aerospace systems require High Performance Computing and Networking (HPCN) technology (known as High Performance Computing and Communications, HPCC, in the USA) to provide faster computation speeds and larger memory. Although parallel and distributed machines have shown the promise of fulfilling this objective, the potential of these machines for running large-scale production-oriented CFD codes has not been fully exploited yet. It is expected that such computations will be carried out over a broad range of hardware platforms, thus necessitating the use of a programming language that provides portability and ease of maintenance for codes as well as computational efficiency. Until recently, the unavailability of such a language has hindered any comprehensive move toward porting codes from mainframes and traditional vector computers to parallel and distributed computing systems.

High Performance Fortran (HPF) [7] is a language definition agreed upon in 1993, and being widely adopted by systems suppliers as a mechanism for users to exploit parallel computation through the data-parallel programming model. HPF evolved from the experimental Fortran-D system [2] as a collection of extensions to the Fortran 90 language standard [10]. Many ideas were also absorbed from the Vienna Fortran system [3]. We do not discuss the details of the HPF language here as they are well documented elsewhere [8]. HPF language constructs and embedded compiler directives allow the programmer to express to the compiler additional information about how to produce code that maps well to the available parallel or distributed architecture and thus runs fast and can make full use of the larger (distributed) memory. We have already conducted a study of the general suitability of the HPF language for CFD [1] using experimental HPF compilation systems developed at Syracuse and Rice, and with the growing availability of HPF compilers on platforms such as Digital's Alphafarm and IBM's SP2 we are now able to describe specific coding issues.

We employ an ADI algorithm applied to a steady flow problem for illustrative purposes. There are some conflicts between the optimal data decomposition and the computational structure of the algorithm. We show how the data DISTRIBUTE and data ALIGN directives of HPF can be used to resolve this conflict. Full matrix algorithms can also be implemented in HPF, although at the time of writing, the optimal algorithms for full matrix solution by LU factorisation, for example, are only available as message-passing based library software such as ScaLAPACK [4]. We are currently investigating how scalable algorithms such as these can either be expressed as HPF code directly, or as HPF-invocable run-time libraries. Sparse matrix methods such as the conjugate gradient method are not trivial to implement efficiently in HPF at present. The difficulty is an algorithmic one rather than a weakness of the HPF language itself, however [6].

For CFD simulations, it is generally of prime importance to achieve a given level of numerical accuracy for a given size of system in the shortest possible time. Consequently, there is a tradeoff between rapidly converging numerical algorithms that are inefficient to implement on parallel and distributed systems, and more slowly converging and perhaps less numerically "interesting" algorithms that can be implemented very efficiently [5].

2 CFD using HPF

There are many formulations for fluid dynamics equations that give rise to a computationally manageable form. The example system we use here involves the Poisson and vorticity transport equations that arise from the unsteady Navier-Stokes equations for incompressible flows, where the velocity has components $(u, v)$, $\psi$ is the stream-function and $\zeta$ the vorticity [11]. These equations can be expressed as Poisson equations in spatial terms with the time dependence separated:

$$\frac{\partial \zeta}{\partial t} = \frac{1}{Re}\nabla^{2}\zeta + \frac{\partial\psi}{\partial y}\frac{\partial\zeta}{\partial x} - \frac{\partial\psi}{\partial x}\frac{\partial\zeta}{\partial y} \qquad (1)$$

A common approach to this problem is to use a split operator technique such as the Alternating Direction Implicit (ADI) method [9]. This technique assumes that the five-star stencil operator $L$ for the finite difference scheme can be split into two line operators:

$$L_x \psi = -\psi_{i+1,j} + 2\psi_{i,j} - \psi_{i-1,j} \qquad (2)$$

$$L_y \psi = -\psi_{i,j+1} + 2\psi_{i,j} - \psi_{i,j-1} \qquad (3)$$

for two dimensions (or three operators in 3d). The computation for the ADI scheme requires the solution of two (in 2d) or three (in 3d) matrix equations for one (two) intermediate finite difference step(s). Labeling the iterations by $n$, this can be expressed as:

$$\frac{\psi^{n+1} - \psi^{n}}{\Delta t / 2} + \frac{L_x \psi^{n+1} + L_y \psi^{n}}{h^{2}} = \zeta \qquad (4)$$

$$\frac{\psi^{n+2} - \psi^{n+1}}{\Delta t / 2} + \frac{L_x \psi^{n+1} + L_y \psi^{n+2}}{h^{2}} = \zeta \qquad (5)$$

where $h$ and $\Delta t$ are the finite differences in space and time respectively.

Note that we have introduced an artificial time variable to construct an iterative algorithm using $\Delta\psi/\Delta t = \nabla^{2}\psi + \zeta = 0$. In this formulation, we expect $\Delta\psi/\Delta t$ to tend to zero (within some convergence criterion) after a certain number of iterations. This is a somewhat artificial construct, but we are only using this formulation to illustrate the computational structure and issues of implementation. Expressing this in matrix notation, multiplying through by $h^{2}$ and collecting each unknown time level on the left-hand side:

$$\left(L_x + \frac{2h^{2}}{\Delta t}I\right)\psi^{n+1} = \left(\frac{2h^{2}}{\Delta t}I - L_y\right)\psi^{n} + h^{2}\zeta \qquad (6)$$

$$\left(L_y + \frac{2h^{2}}{\Delta t}I\right)\psi^{n+2} = \left(\frac{2h^{2}}{\Delta t}I - L_x\right)\psi^{n+1} + h^{2}\zeta \qquad (7)$$

where $I$ is the identity matrix. We solve equation 6 for $\psi^{n+1}$, given $\psi^{n}$, and equation 7 for $\psi^{n+2}$. This cycle of solving both matrix equations constitutes a complete iteration of the ADI algorithm. For this scheme, the matrices on the left hand sides of equations 6 and 7 are tridiagonal and we can represent these by three vectors $A, B, C$. Note that although we visualize $\psi$ as a scalar field in 2d, for a 2d problem, and thus we imagine indexing its finite difference representation by $(i, j)$ for the $(x, y)$ co-ordinates, from the point of view of matrix equations 6 and 7, $\psi$ is just a long vector that happens to be partitioned by a nested pair of indices $(i, j)$. We can therefore write a Fortran 77 code to illustrate the tridiagonal solver for matrix equation 6 as shown in figure 1.

      REAL PSI(M1,M2), RHS(M1,M2), TMP(M1,M2)
      DO 30 J = 1, M2
         BETA = B
         PSI(1,J) = RHS(1,J) / BETA
         DO 10 I = 2, M1
            TMP(I,J) = C / BETA
            BETA = B - A * TMP(I,J)
            PSI(I,J) = ( RHS(I,J) - A * PSI(I-1,J) ) / BETA
 10      CONTINUE
         DO 20 I = M1-1, 1, -1
            PSI(I,J) = PSI(I,J) - TMP(I+1,J) * PSI(I+1,J)
 20      CONTINUE
 30   CONTINUE

Fig. 1. Tridiagonal solver for ADI Poisson equation in Fortran 77.

This has two loops in $I$, one for decomposition and forward substitution, and one for backsubstitution. These loops contain a dependency on the previous values of $I$, but are independent of $J$. An identical code fragment can be used for equation 7 but with the $I$ and $J$ DO loop variables interchanged. Note that the three vectors $A, B, C$ of the tridiagonal matrix can be written here as constants for the simple linear case shown. They have the following values for the ADI solution to the Poisson equation: $A = C = -1$, $B = 2 + \frac{2h^{2}}{\Delta t}$. Now consider how this may be decomposed under the data parallel programming model, so that the data array can be distributed across the memory of a number of processors in a parallel or distributed computing system.

[Figure: the PSI array of size M1 x M2 shown under the two decompositions over processors P1-P4; DECOMPOSITION 1 blocks the J (column) index across the processors, DECOMPOSITION 2 blocks the I (row) index.]

Fig. 2. Optimal Data Decompositions for Matrix equations in ADI Solver.

In figure 2 the two data decompositions are shown for equations 6 and 7, where for simplicity $M1 = M2 = 16$ is the number of finite difference points in the mesh for $\psi$ and $P = 4$ is the number of processors in the parallel or distributed architecture. These would normally be set by Fortran PARAMETER statements. The optimal decomposition for one of the equations is to give each processor a "strip" of the data. This can be accomplished using the fragment of HPF code in figure 3.

!HPF$ PROCESSORS PROC(P)
!HPF$ TEMPLATE TEMPLATE(M1,M2)
!HPF$ DISTRIBUTE TEMPLATE(*,BLOCK) ONTO PROC
      REAL PSI(M1,M2), RHS(M1,M2), TMP(M1,M2)
!HPF$ ALIGN WITH TEMPLATE :: PSI, RHS, TMP
      FORALL (J = 1:M2)
         BETA = B
         PSI(1,J) = RHS(1,J) / BETA
         DO I = 2, M1
            TMP(I,J) = C / BETA
            BETA = B - A * TMP(I,J)
            PSI(I,J) = ( RHS(I,J) - A * PSI(I-1,J) ) / BETA
         END DO
         DO I = M1-1, 1, -1
            PSI(I,J) = PSI(I,J) - TMP(I+1,J) * PSI(I+1,J)
         END DO
      END FORALL

Fig. 3. Tridiagonal solver for ADI Poisson equation in HPF.

This illustrates how the programmer may impart extra information about his target system to the compiler, to obtain an efficient executable code for that system. These directives are implemented using the Fortran 90 comment symbol (an exclamation mark) so that a non-HPF compiler would ignore the directives as comments. The PROCESSORS directive is used here to hint to the compiler that the processors in the system should be treated as connected in a one dimensional vector. Furthermore, a TEMPLATE mapping directive sets up the data distribution onto the memory of those processors. In this case an asterisk denotes ordinary Fortran serial order for dimension one, and a BLOCK distribution in dimension 2, as indicated in decomposition 1 of figure 2. Aside from using END DO statements instead of labeled CONTINUE statements, the DO loops over I are unchanged from the Fortran 77 case. The J loop is now a FORALL statement, which is an indication to the compiler that the code for each J value may be done in any order as far as the computation is concerned, and that the compiler should decide to use the owning processors to carry out the calculations associated with their own range of J values. Specifically, as shown, processor 1 will carry out calculations for J = 1, 2, ..., 4, processor 2 will compute for J = 5, 6, ..., 8, etc.
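For the simple case shown, where P divides M2 evenly, the owning processor of a given column J under this BLOCK distribution can be computed as in the short sketch below (the function name and interface are ours, purely for illustration):

      INTEGER FUNCTION OWNER( J, M2, P )
      INTEGER J, M2, P
!     Each processor holds a contiguous block of M2/P columns,
!     so column J belongs to processor (J-1)/(M2/P) + 1.
      OWNER = ( J - 1 ) / ( M2 / P ) + 1
      END FUNCTION OWNER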

An alternative implementation is to leave the Fortran 77 code unchanged entirely, but use the INDEPENDENT directive of HPF applied to the outermost loop as an assertion to the compiler that "iterations" over that loop may be executed in parallel. See [7, 8].
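As a sketch of this alternative, assuming the same declarations and distribution directives as in figure 3, the directive is simply placed before the outer J loop of figure 1; the NEW clause marks BETA and the inner loop index as private to each J iteration:

!HPF$ INDEPENDENT, NEW(I, BETA)
      DO 30 J = 1, M2
         BETA = B
         PSI(1,J) = RHS(1,J) / BETA
         DO 10 I = 2, M1
            TMP(I,J) = C / BETA
            BETA = B - A * TMP(I,J)
            PSI(I,J) = ( RHS(I,J) - A * PSI(I-1,J) ) / BETA
 10      CONTINUE
         DO 20 I = M1-1, 1, -1
            PSI(I,J) = PSI(I,J) - TMP(I+1,J) * PSI(I+1,J)
 20      CONTINUE
 30   CONTINUE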

Our example unfortunately illustrates that a different decomposition is optimal for the two matrix equations. One solution to this problem is to apply a data TRANSPOSE after each equation is solved. This will leave the data in the optimal decomposition for the next equation. The TRANSPOSE function for a 2d array is provided as an INTRINSIC function in Fortran 90 and HPF. This at least abstracts the problem to one for the system suppliers - namely to provide an optimized TRANSPOSE function which can make use of decomposition information known at compile time. An alternative to the TRANSPOSE intrinsic is to invoke a data REDISTRIBUTE directive after each matrix equation has been solved. This also allows the code to be trivially extended to three dimensions, despite the fact that the TRANSPOSE intrinsic is for 2d matrices only. The REDISTRIBUTE invocation for our 2d code is shown in figure 4.

!HPF$ PROCESSORS PROC(P)
!HPF$ TEMPLATE TEMPLATE(M1,M2)
!HPF$ DYNAMIC, DISTRIBUTE(*,BLOCK) ONTO PROC :: TEMPLATE
      REAL PSI(M1,M2), RHS(M1,M2), TMP(M1,M2)
!HPF$ DYNAMIC, ALIGN WITH TEMPLATE :: PSI, RHS, TMP
      DO ITER = 1, NITER
!        ... solve 1st matrix eqn. using the code above ...
!        ...
!HPF$ REDISTRIBUTE TEMPLATE(BLOCK,*) ONTO PROC
!        ... solve 2nd matrix eqn. with the I and J loops interchanged ...
!        ...
!HPF$ REDISTRIBUTE TEMPLATE(*,BLOCK) ONTO PROC
         IF( convergence test ) EXIT
      END DO

Fig. 4. HPF code structure for full ADI Poisson solver in 2d.

This code also illustrates the Fortran 90 EXIT statement, which may be used to exit the DO loop once some convergence criterion has been met. Note that the DYNAMIC attribute is necessary for the data items which may be redistributed.
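For comparison, the TRANSPOSE-based alternative mentioned above might be structured as in the following sketch, where PSIT is an assumed work array holding the transposed data and the solver code itself is elided as in figure 4:

      REAL PSI(M1,M2), PSIT(M2,M1)
!HPF$ DISTRIBUTE PSI(*,BLOCK)
!HPF$ DISTRIBUTE PSIT(*,BLOCK)
      DO ITER = 1, NITER
!        ... solve 1st matrix eqn. on PSI as in figure 3 ...
         PSIT = TRANSPOSE(PSI)
!        ... solve 2nd matrix eqn. on PSIT, with I and J exchanged ...
         PSI = TRANSPOSE(PSIT)
         IF( convergence test ) EXIT
      END DO

Here each TRANSPOSE assignment moves the data into the layout in which the next set of tridiagonal solves is purely local to each processor.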

A disadvantage of a parallel code is that additional data storage is required. Although we have written the Fortran 77 code in a similar manner to our HPF code, using full matrices to store RHS and TMP, these could be reduced to scalars for a serial code. It is one of the tradeoffs in programming parallel and distributed computing systems that individual processors need their own private workspace to allow separate parts of the computation to proceed in parallel. Fortunately, such systems generally have access to much larger amounts of memory than do serial systems. Furthermore, it is possible to reuse storage space between parts of a full application code using the dynamic memory ALLOCATE and DEALLOCATE statements in Fortran 90 and HPF. Dynamically allocatable data can also be DISTRIBUTEd and REDISTRIBUTEd.
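A minimal sketch of such a distributed allocatable work array (the array name is ours, for illustration only; the mapping takes effect when the array is allocated):

      REAL, ALLOCATABLE :: WORK(:,:)
!HPF$ DISTRIBUTE WORK(*,BLOCK)
      ALLOCATE( WORK(M1,M2) )
!     ... use WORK as temporary workspace for one phase of the code ...
      DEALLOCATE( WORK )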

In this description we have shown how HPF allows the programmer to control the data distribution amongst processors and the parallel constructs that are thus enabled. It should be noted that the TRANSPOSE and REDISTRIBUTE constructs are somewhat expensive in communications requirements and therefore make considerable demands on the underlying message-passing system of the HPF run-time library. Nevertheless we feel this abstraction of the algorithm's communications into the systems code makes a substantial contribution to application code maintainability.

3 Conclusions

We have illustrated some of the issues arising from the use of HPF for expressing algorithms in CFD applications. The advantages are the potential for faster computation on parallel and distributed computers, and additional code portability and ease of maintenance by comparison with message-passing implementations. Disadvantages (in common with any parallel implementation) over serial implementations are the additional temporary data-storage requirements of parallel algorithms.

References

1. Bogucz, E.A., Fox, G.C., Haupt, T., Hawick, K.A., Ranka, S., "Preliminary Evaluation of High-Performance Fortran as a Language for Computational Fluid Dynamics," Paper AIAA-94-2262, presented at the 25th AIAA Fluid Dynamics Conference, Colorado Springs, CO, 20-23 June 1994.

2. Bozkus, Z., Choudhary, A., Fox, G., Haupt, T., and Ranka, S., "Fortran 90D/HPF compiler for distributed-memory MIMD computers: design, implementation, and performance results," Proceedings of Supercomputing '93, Portland, OR, 1993, p. 351.

3. Chapman, B., Mehrotra, P., Moritsch, H., and Zima, H., "Dynamic data distributions in Vienna Fortran," Proceedings of Supercomputing '93, Portland, OR, 1993, p. 284.

4. Choi, J., Dongarra, J.J., Pozo, R., and Walker, D.W., "ScaLAPACK: A Scalable Linear Algebra Library for Distributed Memory Concurrent Computers," in Proceedings of the Fourth Symposium on the Frontiers of Massively Parallel Computation, pp. 120-127, IEEE Computer Society Press, 1992.

5. Hawick, K.A., and Wallace, D.J., "High Performance Computing for Numerical Applications," Keynote address, Proceedings of Workshop on Computational Mechanics in UK, Association for Computational Mechanics in Engineering, Swansea, January 1993.

6. Hawick, K.A., Dincer, K., Choudhary, A., Fox, G.C., "Conjugate Gradient Algorithms Implemented in High Performance Fortran," NPAC Technical Report SCCS-639, October 1994.

7. High Performance Fortran Forum (HPFF), "High Performance Fortran Language Specification," Scientific Programming, vol. 2, no. 1, July 1993. Also available by anonymous ftp from ftp.npac.syr.edu (cd /HPFF).

8. Koelbel, C.H., Loveman, D.B., Schreiber, R.S., Steele, G.L., Zosel, M.E., "The High Performance Fortran Handbook," MIT Press, 1994.

9. Leca, P., Mane, L., "A 3-D Algorithm on Distributed Memory Multiprocessors," in "Parallel Computational Fluid Dynamics," Simon, Horst D. (Editor), MIT Press, 1992, pp. 149-165.

10. Metcalf, M., Reid, J., "Fortran 90 Explained," Oxford, 1990.

11. Tritton, D.J., "Physical Fluid Dynamics," Oxford Science Publications, 1987.