Parallel Computation of Finite Element Navier-Stokes Codes Using MUMPS Solver
Mandhapati P. Raju
Mechanical Engineering, Case Western Reserve University, Cleveland, Ohio 44106
[email protected]

Abstract
This study deals with the parallelization of 2D and 3D finite element based Navier-Stokes codes using direct solvers. The development of sparse direct solvers based on multifrontal techniques has significantly reduced the computational time of direct solution methods. Although limited by their stringent memory requirements, multifrontal solvers can be computationally efficient. First, the performance of the MUltifrontal Massively Parallel Solver (MUMPS) is evaluated for both 2D and 3D codes in terms of memory requirements and CPU times. The scalability of both the Newton and the modified Newton algorithms is then tested.
Key words: finite element, MUMPS solver, distributed computing, Newton method.

1. Introduction

Discretization of the Navier-Stokes equations produces a large set of non-linear equations, requiring high computing power in terms of speed and memory. The resulting set of weak-form (algebraic) equations may be solved using either a direct solver or an iterative solver. Direct solvers are known for their generality and robustness. In finite element applications, direct solution methods have generally relied on frontal algorithms [1]. The advent of multifrontal solvers [2] has greatly increased the efficiency of direct solvers for sparse systems. They make full use of modern computer architectures by invoking the level 3 Basic Linear Algebra Subprograms (BLAS) library, so the memory requirement is greatly reduced and the computing speed greatly enhanced. Multifrontal solvers have been used successfully in the context of finite volume problems [3-5], finite element problems [6], and power system simulations [7-9]. The disadvantage of direct solvers is that the memory size increases much more rapidly than the problem size itself [6]. To circumvent this problem, out-of-core multifrontal solvers [10] have been developed, which can store the factors on disk during factorization. Another viable alternative is to use direct solvers in a distributed computing environment.

The system of non-linear equations obtained from the discretization of the Navier-Stokes equations is usually solved using a Newton or a Picard algorithm. Newton algorithms are known for their quadratic convergence behavior: when the initial guess is close to the final solution, Newton's method converges quadratically. In this paper, only the Newton algorithm is used. With direct solvers, factorization of the left-hand-side matrix is the most time-consuming step. To avoid a factorization in every iteration, a modified Newton method is used in which the factorization is performed only during the first iteration. The left-hand-side matrix evaluated in the first iteration is retained and is not changed during the subsequent iterations; only the right-hand-side vector is updated at each iteration and is appropriately modified to give the final converged solution. Since the factorization is done only during the first iteration, the subsequent iterations are extremely cheap, although more iterations are usually needed to reach overall convergence. There is therefore a tradeoff between the computational time per iteration and the number of iterations required for convergence. Although the convergence rate is lower than that of the full Newton iteration, the savings in computational time per iteration are often large enough to more than compensate for the reduced convergence rate.

MUMPS [11-13] and SuperLU [14] are amongst the fastest parallel general sparse direct solvers available in the public domain. A detailed description of the various features and algorithms employed in these packages can be found in [15]. MUMPS is found to be much faster than SuperLU, although its scalability is lower. In this paper, parallelization is achieved using the MUltifrontal Massively Parallel Solver (MUMPS) in a distributed environment using MPI. The linear system of equations is evaluated on different processors, each working on the local grid assigned to it. The right-hand-side vector is assembled on the host processor and is input to the MUMPS solver. On exit from the MUMPS solver, the solution is assembled centrally on the host processor and is then broadcast to all the processors. In the context of the modified Newton algorithm, the LU factors evaluated during the first iteration are reused, and the linear system with the new right-hand-side vector is solved. The performance of the solver in terms of scalability and memory, for both two-dimensional and three-dimensional problems, is discussed in detail.
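To make the tradeoff concrete, the following is a minimal, self-contained sketch (not the paper's code) contrasting the full Newton update with the modified Newton variant on a toy two-equation system. The functions residual(), jacobian(), and the explicit 2x2 solve are illustrative stand-ins for the assembled finite element residual, the Jacobian assembly, and the sparse LU factorization and back-substitution.

```c
/*
 * Illustrative sketch: full Newton vs. modified Newton on a toy
 * 2x2 nonlinear system R(X) = 0.  The modified variant keeps the
 * Jacobian from the first iteration, so only the right-hand side
 * is rebuilt at each step (mimicking "factor once, solve many").
 */
#include <stdio.h>
#include <math.h>

/* Residual of the toy system: x^2 + y^2 - 4 = 0,  x*y - 1 = 0. */
static void residual(const double x[2], double r[2]) {
    r[0] = x[0] * x[0] + x[1] * x[1] - 4.0;
    r[1] = x[0] * x[1] - 1.0;
}

/* Analytic Jacobian dR/dX of the toy system. */
static void jacobian(const double x[2], double j[2][2]) {
    j[0][0] = 2.0 * x[0];  j[0][1] = 2.0 * x[1];
    j[1][0] = x[1];        j[1][1] = x[0];
}

/* Solve J*d = -r for the 2x2 case (stand-in for the sparse LU solve). */
static void solve2(const double j[2][2], const double r[2], double d[2]) {
    double det = j[0][0] * j[1][1] - j[0][1] * j[1][0];
    d[0] = (-r[0] * j[1][1] + r[1] * j[0][1]) / det;
    d[1] = (-r[1] * j[0][0] + r[0] * j[1][0]) / det;
}

/* modified = 0: rebuild J every iteration; modified = 1: freeze first J. */
static int newton(double x[2], int modified, double alpha, double tol) {
    double j[2][2], r[2], d[2];
    for (int k = 0; k < 200; ++k) {
        residual(x, r);
        if (k == 0 || !modified)
            jacobian(x, j);         /* in a real code: re-assemble and refactor */
        solve2(j, r, d);            /* cheap re-solve when J is frozen */
        x[0] += alpha * d[0];
        x[1] += alpha * d[1];
        if (fmax(fabs(d[0]), fabs(d[1])) < tol)   /* infinity-norm test */
            return k + 1;
    }
    return -1;                      /* did not converge */
}

int main(void) {
    double x1[2] = {2.0, 1.0}, x2[2] = {2.0, 1.0};
    printf("full Newton:     %d iterations\n", newton(x1, 0, 1.0, 1e-6));
    printf("modified Newton: %d iterations\n", newton(x2, 1, 1.0, 1e-6));
    printf("solution: (%.6f, %.6f)\n", x1[0], x1[1]);
    return 0;
}
```

As the paper argues, the frozen-Jacobian loop typically needs more iterations, but each one avoids the expensive assembly and factorization step.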
2. Mathematical Formulation

The governing equations for laminar flow through a two-dimensional rectangular duct are presented below in non-dimensional form:
\[
\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0, \tag{1}
\]
\[
\frac{\partial}{\partial x}\left(u^{2}\right) + \frac{\partial}{\partial y}\left(uv\right)
= -\frac{\partial p}{\partial x}
+ \frac{\partial}{\partial x}\!\left(\frac{2}{Re}\frac{\partial u}{\partial x}\right)
+ \frac{\partial}{\partial y}\!\left[\frac{1}{Re}\left(\frac{\partial u}{\partial y}+\frac{\partial v}{\partial x}\right)\right], \tag{2}
\]
and
\[
\frac{\partial}{\partial x}\left(uv\right) + \frac{\partial}{\partial y}\left(v^{2}\right)
= -\frac{\partial p}{\partial y}
+ \frac{\partial}{\partial x}\!\left[\frac{1}{Re}\left(\frac{\partial u}{\partial y}+\frac{\partial v}{\partial x}\right)\right]
+ \frac{\partial}{\partial y}\!\left(\frac{2}{Re}\frac{\partial v}{\partial y}\right), \tag{3}
\]
where \(u, v\) are the \(x\) and \(y\) components of velocity and \(p\) is the pressure. The bulk flow Reynolds number is \(Re = \rho U_0 D/\mu\), where \(U_0\) is the inlet velocity, \(\rho\) the density, \(L\) the channel length, and \(\mu\) the dynamic viscosity. Velocities are non-dimensionalized with respect to \(U_0\), and pressure with respect to \(\rho U_0^{2}\).

The boundary conditions are prescribed as follows:
(1) Along the channel inlet:
\[
u = 1; \quad v = 0. \tag{4}
\]
(2) Along the channel exit:
\[
\frac{\partial u}{\partial x} = 0; \quad \frac{\partial v}{\partial x} = 0; \quad p = 0. \tag{5}
\]
(3) Along the walls:
\[
u = 0; \quad v = 0. \tag{6}
\]

The governing equations for laminar flow through a three-dimensional rectangular duct are presented below in non-dimensional form. In three-dimensional calculations, instead of the primitive \(u, v, w, p\) formulation, a penalty approach is used to reduce the memory requirements:
\[
\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z} = 0, \tag{7}
\]
\[
\frac{\partial}{\partial x}\left(u^{2}\right) + \frac{\partial}{\partial y}\left(uv\right) + \frac{\partial}{\partial z}\left(uw\right)
= \lambda\,\frac{\partial}{\partial x}\!\left(\frac{\partial u}{\partial x}+\frac{\partial v}{\partial y}+\frac{\partial w}{\partial z}\right)
+ \frac{\partial}{\partial x}\!\left(\frac{2}{Re}\frac{\partial u}{\partial x}\right)
+ \frac{\partial}{\partial y}\!\left[\frac{1}{Re}\left(\frac{\partial u}{\partial y}+\frac{\partial v}{\partial x}\right)\right]
+ \frac{\partial}{\partial z}\!\left[\frac{1}{Re}\left(\frac{\partial u}{\partial z}+\frac{\partial w}{\partial x}\right)\right], \tag{8}
\]
\[
\frac{\partial}{\partial x}\left(uv\right) + \frac{\partial}{\partial y}\left(v^{2}\right) + \frac{\partial}{\partial z}\left(vw\right)
= \lambda\,\frac{\partial}{\partial y}\!\left(\frac{\partial u}{\partial x}+\frac{\partial v}{\partial y}+\frac{\partial w}{\partial z}\right)
+ \frac{\partial}{\partial x}\!\left[\frac{1}{Re}\left(\frac{\partial u}{\partial y}+\frac{\partial v}{\partial x}\right)\right]
+ \frac{\partial}{\partial y}\!\left(\frac{2}{Re}\frac{\partial v}{\partial y}\right)
+ \frac{\partial}{\partial z}\!\left[\frac{1}{Re}\left(\frac{\partial v}{\partial z}+\frac{\partial w}{\partial y}\right)\right], \tag{9}
\]
and
\[
\frac{\partial}{\partial x}\left(uw\right) + \frac{\partial}{\partial y}\left(vw\right) + \frac{\partial}{\partial z}\left(w^{2}\right)
= \lambda\,\frac{\partial}{\partial z}\!\left(\frac{\partial u}{\partial x}+\frac{\partial v}{\partial y}+\frac{\partial w}{\partial z}\right)
+ \frac{\partial}{\partial x}\!\left[\frac{1}{Re}\left(\frac{\partial u}{\partial z}+\frac{\partial w}{\partial x}\right)\right]
+ \frac{\partial}{\partial y}\!\left[\frac{1}{Re}\left(\frac{\partial v}{\partial z}+\frac{\partial w}{\partial y}\right)\right]
+ \frac{\partial}{\partial z}\!\left(\frac{2}{Re}\frac{\partial w}{\partial z}\right), \tag{10}
\]
where \(u, v, w\) are the \(x\), \(y\) and \(z\) components of velocity. The bulk flow Reynolds number is \(Re = \rho U_0 D/\mu\), where \(U_0\) is the inlet velocity, \(\rho\) the density, \(L\) the channel length, \(\mu\) the dynamic viscosity, and \(\lambda\) the penalty parameter. Velocities are non-dimensionalized with respect to \(U_0\).

The boundary conditions are prescribed as follows:
(1) Along the channel inlet:
\[
u = 1; \quad v = 0; \quad w = 0. \tag{11}
\]
(2) Along the channel exit:
\[
\frac{\partial u}{\partial x} = 0; \quad \frac{\partial v}{\partial x} = 0; \quad \frac{\partial w}{\partial x} = 0. \tag{12}
\]
(3) Along the walls:
\[
u = 0; \quad v = 0; \quad w = 0. \tag{13}
\]
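The text above does not spell out how the pressure is treated in the penalty formulation. Assuming the conventional penalty treatment (a sketch, not quoted from the paper), the pressure is dropped as a primitive unknown and recovered from the velocity dilatation, which is what produces the \(\lambda\) terms on the right-hand sides of Eqs. (8)-(10); \(\lambda\) is taken large so that Eq. (7) is enforced approximately:

```latex
% Conventional penalty relation (assumed, not stated explicitly above):
% eliminating p through the dilatation reproduces the \lambda terms of
% Eqs. (8)-(10), since -\partial p/\partial x = \lambda\,\partial(\nabla\cdot\mathbf{u})/\partial x.
p \;\approx\; -\lambda\left(
      \frac{\partial u}{\partial x}
    + \frac{\partial v}{\partial y}
    + \frac{\partial w}{\partial z}\right),
\qquad
-\frac{\partial p}{\partial x}
  \;=\; \lambda\,\frac{\partial}{\partial x}\left(
      \frac{\partial u}{\partial x}
    + \frac{\partial v}{\partial y}
    + \frac{\partial w}{\partial z}\right).
```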
4. Newton's Algorithm

The set of non-linear equations obtained from the Galerkin finite element discretization is solved using Newton's iterative algorithm. Let \(X^{(k)}\) be the available vector of field unknowns at the \(k\)-th iteration. The update for the \((k+1)\)-th iteration is obtained as
\[
X^{(k+1)} = X^{(k)} + \alpha\,\delta X^{(k)}, \tag{14}
\]
where \(\alpha\) is an under-relaxation factor and \(\delta X^{(k)}\) is the correction vector obtained by solving the linearized system
\[
[J]\,\{\delta X^{(k)}\} = -\{R(X^{(k)})\}. \tag{15}
\]
Here \([J]\) is the Jacobian matrix,
\[
[J] = \frac{\partial R(X^{(k)})}{\partial X^{(k)}}, \tag{16}
\]
and \(\{R(X^{(k)})\}\) is the residual vector. Newton's iteration is continued until the infinity norm of the correction vector \(\delta X^{(k)}\) falls below a prescribed tolerance of \(10^{-6}\).

The modified Newton algorithm reuses the Jacobian calculated during the first iteration for all successive iterations; the Jacobian is not updated. This reduces the rate of convergence of the Newton algorithm, but since the Jacobian is not updated, the factorization can be skipped during all subsequent iterations.

5. MUMPS Solver

The most time-consuming part is the solution of the set of linear equations. To solve these linear systems, we use the robust parallel solver MUMPS. It employs multifrontal techniques to solve sparse linear systems on parallel computers in a distributed environment using MPI. It proceeds in three main steps: an analysis (ordering and symbolic factorization) step, a numerical factorization step, and a solution step. All these steps can be called separately or in combination with each other. This can be exploited to save computational effort during subsequent iterations in the solution of a set of non-linear equations. For example, if the structure of the matrix does not change from one iteration to the next, the analysis step can be skipped after being performed once. Similarly, if the left-hand-side matrix does not change, both the analysis and the factorization steps can be skipped.

6. Parallel Implementation

The MUMPS solver is implemented using the MPI library, which makes the code very portable and usable on both shared and distributed memory parallel computers. The parallelization is done internally in the solver; the calling program must also run in a parallel environment to call it. In the present formulation, each element is assigned to a particular processor such that the elements are equally (or almost equally) distributed amongst all the processors. The computation of the matrix coefficients and the right-hand-side vector is done in parallel over the set of local elements. Evaluating the Jacobian matrix and the right-hand-side vector in a parallel environment is crucial for problems in which the evaluation of the matrix coefficients consumes a lot of time.

During the progress of the overall iterations, the different linear systems obtained at each iteration are solved by successive calls to MUMPS. For the modified Newton algorithm, the left-hand-side matrix remains the same (numerically).
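As an illustration of this call pattern, the following is a minimal sketch assuming the double-precision C interface of MUMPS (dmumps_c) with centralized assembled input on the host. The toy 2x2 matrix, the reliance on default control parameters, and the specific JOB sequence shown (analysis plus factorization once, then repeated solves with new right-hand sides, as in the modified Newton algorithm) are illustrative choices, not the paper's actual code; consult the MUMPS user manual for the authoritative interface.

```c
/*
 * Sketch of the MUMPS call pattern discussed above (illustrative only).
 * Compile and link against MUMPS and an MPI implementation.
 */
#include <stdio.h>
#include <mpi.h>
#include "dmumps_c.h"

#define JOB_INIT            -1
#define JOB_END             -2
#define JOB_ANALYSIS_FACTOR  4      /* analysis + numerical factorization */
#define JOB_SOLVE            3      /* forward/back substitution only     */
#define USE_COMM_WORLD  -987654     /* MUMPS convention for MPI_COMM_WORLD */

int main(int argc, char *argv[]) {
    DMUMPS_STRUC_C id;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Toy 2x2 sparse matrix standing in for the assembled Jacobian;
       irn/jcn use 1-based (Fortran-style) indices. */
    MUMPS_INT n = 2, irn[] = {1, 1, 2}, jcn[] = {1, 2, 2};
    double    a[] = {4.0, 1.0, 3.0};
    double    rhs[2];

    id.job = JOB_INIT;              /* initialize a MUMPS instance      */
    id.par = 1;                     /* host also participates in work   */
    id.sym = 0;                     /* unsymmetric matrix               */
    id.comm_fortran = USE_COMM_WORLD;
    dmumps_c(&id);

    if (rank == 0) {                /* centralized input on the host    */
        id.n = n;  id.nz = 3;       /* id.nnz in recent MUMPS releases  */
        id.irn = irn;  id.jcn = jcn;  id.a = a;
    }

    /* Analysis + factorization: done once, as in the modified Newton loop. */
    id.job = JOB_ANALYSIS_FACTOR;
    dmumps_c(&id);

    /* Repeated solves reuse the LU factors; only the RHS changes. */
    for (int k = 0; k < 3; ++k) {
        if (rank == 0) { rhs[0] = 1.0 + k; rhs[1] = 2.0; id.rhs = rhs; }
        id.job = JOB_SOLVE;
        dmumps_c(&id);              /* solution overwrites rhs on the host */
        if (rank == 0)
            printf("solve %d: x = (%g, %g)\n", k, rhs[0], rhs[1]);
    }

    id.job = JOB_END;               /* release MUMPS internal data */
    dmumps_c(&id);
    MPI_Finalize();
    return 0;
}
```

The same factor-once, solve-many sequence applies when the matrix entries are supplied in distributed form, which is how the element-wise assembly over local grids described above would feed the solver.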