Solving Incompressible Navier-Stokes Equations on Heterogeneous Parallel Architectures Yushan Wang
Total Page:16
File Type:pdf, Size:1020Kb
Solving incompressible Navier-Stokes equations on heterogeneous parallel architectures Yushan Wang To cite this version: Yushan Wang. Solving incompressible Navier-Stokes equations on heterogeneous parallel architectures. Distributed, Parallel, and Cluster Computing [cs.DC]. Université Paris Sud - Paris XI, 2015. English. NNT : 2015PA112047. tel-01152623 HAL Id: tel-01152623 https://tel.archives-ouvertes.fr/tel-01152623 Submitted on 18 May 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Universit´eParis-Sud ECOLE´ DOCTORALE 427 : INFORMATIQUE PARIS SUD Laboratoire de Recherche en Informatique THESE` DE DOCTORAT INFORMATIQUE par Yushan WANG Solving Incompressible Navier-Stokes Equations on Heterogeneous Parallel Architectures Date de soutenance: 09/04/2015 Composition du jury: Directeur de th`ese : Marc BABOULIN Professeur (Universit´eParis-Sud, Orsay) Co-directeur de th`ese : Olivier LE MAˆITRE Directeur de Recherche (LIMSI/CNRS, Orsay) Rapporteurs : Fabienne JEZ´ EQUEL´ Maˆıtrede Conf´erences (LIP6, Paris) Masha SOSONKINA Professeur (Old Dominion University, USA) Examinateurs : Abdel LISSER Professeur (Universit´eParis-Sud, Orsay) Michel KERN Charg´ede Recherche (INRIA, Le Chesnay) Membre invit´e: Yann FRAIGNEAU Ing´enieur de Recherche (LIMSI/CNRS, Orsay) ii R´esum´e Dans cette th`ese,nous pr´esentons notre travail de recherche dans le domaine du calcul haute performance en m´ecaniquedes fluides. Avec la demande croissante de simula- tions `ahaute r´esolution,il est devenu important de d´evelopper des solveurs num´eriques pouvant tirer parti des architectures r´ecentes comprenant des processeurs multi-cœurs et des acc´el´erateurs. Nous nous proposons dans cette th`esede d´evelopper un solveur efficace pour la r´esolutionsur architectures h´et´erog`enesCPU/GPU des ´equationsde Navier-Stokes (NS) relatives aux ´ecoulements 3D de fluides incompressibles. Tout d'abord nous pr´esentons un aper¸cude la m´ecaniquedes fluides avec les ´equationsde NS pour fluides incompressibles et nous pr´esentons les m´ethodes num´eriquesexistantes. Nous d´ecrivons ensuite le mod`elemath´ematique,et la m´ethode num´eriquechoisie qui repose sur une technique de pr´ediction-projection incr´ementale. Nous obtenons une distribution ´equilibr´eede la charge de calcul en utilisant une m´ethode de d´ecomposition de domaines. Une parall´elisation`adeux niveaux combin´eeavec de la vectorisation SIMD est utilis´eedans notre impl´ementation pour exploiter au mieux les capacit´esdes machines multi-cœurs. Des exp´erimentations num´eriquessur diff´erentes architectures parall`elesmontrent que notre solveur NS obtient des performances satis- faisantes et un bon passage `al'´echelle. Pour am´eliorerencore la performance de notre solveur NS, nous int´egronsle calcul sur GPU pour acc´el´ererles t^aches les plus co^uteuses en temps de calcul. Le solveur qui en r´esultepeut ^etreconfigur´eet ex´ecut´esur diverses architectures h´et´erog`enes en sp´ecifiant le nombre de processus MPI, de threads, et de GPUs. Nous incluons ´egalement dans ce manuscrit des r´esultats de simulations num´eriquespour des benchmarks con¸cus `apartir de cas tests physiques r´eels.Les r´esultatsobtenus par notre solveur sont compar´esavec des r´esultatsde r´ef´erence. Notre solveur a vocation `a^etreint´egr´edans une future biblioth`equede m´ecaniquedes fluides pour le calcul sur architectures parall`eles CPU/GPU. Mots cl´es: ´equations de Navier-Stokes, m´ethode de pr´ediction-projection, calcul haute performance, parall´elisationmulti-niveaux, calcul sur GPU. iv Abstract In this PhD thesis, we present our research in the domain of high performance software for computational fluid dynamics (CFD). With the increasing demand of high-resolution simulations, there is a need of numerical solvers that can fully take advantage of current manycore accelerated parallel architectures. In this thesis we focus more specifically on developing an efficient parallel solver for 3D incompressible Navier-Stokes (NS) equations on heterogeneous CPU/GPU architectures. We first present an overview of the CFD domain along with the NS equations for in- compressible fluid flows and existing numerical methods. We describe the mathematical model and the numerical method that we chose, based on an incremental prediction- projection method. A balanced distribution of the computational workload is obtained by using a domain decomposition method. A two-level parallelization combined with SIMD vectorization is used in our implementation to take advantage of the current distributed multicore machines. Numerical experiments on various parallel architectures show that this solver provides satisfying performance and good scalability. In order to further improve the performance of the NS solver, we integrate GPU comput- ing to accelerate the most time-consuming tasks. The resulting solver can be configured for running on various heterogeneous architectures by specifying explicitly the numbers of MPI processes, threads and GPUs. This thesis manuscript also includes simulation results for two benchmarks designed from real physical cases. The computed solutions are compared with existing reference results. The code developed in this work will be the base for a future CFD library for parallel CPU/GPU computations. Keywords: Navier-Stokes equations, prediction-projection method, Helmholtz solver, Poisson solver, high performance computing, multi-level parallelization, GPU comput- ing. vi Acknowledgements First of all, I would like to thank Marc Baboulin and Olivier Le Ma^ıtre for acting as advisors for my thesis. I appreciate very much their help in this multidisciplinary collaborative research work and their availability. I would also like to thank Yann Fraigneau, with whom I had detailed and fruitful dis- cussions about the code SUNFLUIDH. I also thank the referees of my thesis, Fabienne J´ez´equeland Masha Sosonkina, for constructive comments on the manuscript. I express my gratitude to Franck Cappello (Joint-Laboratory on Extreme Scale Com- puting) for funding my visit to Argonne National Laboratory, USA, in August 2013. I want to thank Jack Dongarra and Stanimire Tomov for giving me access to their com- puting resources at Innovative Computing Laboratory (University of Tennessee, USA). I acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the research results reported within this thesis. Finally, I want to express my gratitude to Jo¨elFalcou, Karl Rupp and the colleagues at LRI, who enabled me to progress in my thesis. vii Contents R´esum´e iii Abstract v Acknowledgements vii Contents viii List of Figures xi Introduction 1 1 Navier-Stokes equations 5 1.1 Computational fluid dynamics and Navier-Stokes equations . 5 1.1.1 Fluid mechanics ............................ 6 1.1.2 Equations of fluid dynamics ...................... 8 1.1.3 The dimensionless Navier-Stokes equations . 13 1.2 Numerical methods for the incompressible Navier-Stokes equations . 14 1.3 Incremental prediction-projection method . 15 1.4 Solution of the Helmholtz system ....................... 17 1.5 Solution of the Poisson equation ....................... 19 1.6 Spatial discretization .............................. 21 1.6.1 Staggered mesh ............................. 21 1.6.2 Spatial discretization for the Navier-Stokes equation . 24 1.7 Conclusion of Chapter 1 ............................ 27 2 Parallel algorithms for solving Navier-Stokes equations 29 2.1 Domain decomposition approach ....................... 30 2.2 Multi-level parallelism ............................. 34 2.2.1 Shared memory architecture ...................... 34 2.2.2 Distributed memory architecture ................... 35 2.2.3 Combining shared and distributed memory systems . 35 2.3 General structure of the solver ........................ 38 2.4 Accelerating the solution of the tridiagonal systems . 39 2.5 Performance results for Navier-Stokes computations . 44 2.5.1 Shared memory with pure MPI programming model . 44 ix Contents x 2.5.2 Performance using MPI + OpenMP . 48 2.5.3 Performance comparison with an iterative method . 49 2.6 Conclusion of Chapter 2 ............................ 51 3 Taking advantage of GPU in Navier-Stokes equations 53 3.1 Introduction to GPU computing ....................... 54 3.2 Using GPU for solving Navier-Stokes equations . 57 3.2.1 A GPU Helmholtz-like solver ..................... 57 3.2.2 A GPU Poisson solver ......................... 62 3.2.3 General structure of the GPU solver . 65 3.3 Experimental results .............................. 66 3.3.1 Overview of computational resources . 66 3.3.2 Performance of the Helmholtz solver . 67 3.3.3 Performance of the Poisson solver ................... 68 3.3.4 Performance of the hybrid CPU/GPU Navier-Stokes solver . 69 3.4 Conclusion of Chapter 3 ............................ 71 4 Simulations of Physical Problems 73 4.1 Three dimensional Taylor-Green vortices ................... 74 4.1.1 Benchmark settings .......................... 74 4.1.2 Validation