The Pennsylvania State University The Graduate School College of Engineering

PARALLEL PARTICLE-IN-CELL

PERFORMANCE OPTIMIZATION: A CASE STUDY OF

ELECTROSPRAY SIMULATION

A Thesis in Computer Science and Engineering by Ramachandran Kodanganallur Narayanan

© 2016 Ramachandran Kodanganallur Narayanan

Submitted in Partial Fulfillment of the Requirements for the Degree of

Master of Science

May 2016

The thesis of Ramachandran Kodanganallur Narayanan was reviewed and approved∗ by the following:

Kamesh Madduri
Assistant Professor in the Department of Computer Science and Engineering
Thesis Advisor

Mahmut Taylan Kandemir
Professor in the Department of Computer Science and Engineering
Director of Graduate Studies

John Hannan
Associate Professor in the Department of Computer Science and Engineering
Interim Associate Department Head

∗Signatures are on file in the Graduate School.

Abstract

The particle-in-cell (PIC) numerical technique is frequently used in physics and engineering simulations. In this work, we describe ES-PICbench, a new shared-memory parallel implementation of the PIC technique for electrospray simulations. Electrospray simulations are used in aerospace applications, and the goal of an electrospray simulation is to understand the behavior of an electrospray thruster, or colloid thruster. We discuss performance optimizations for the various steps of a PIC-based electrospray simulation. One of the main steps in this simulation is solving the Poisson partial differential equation, and this step can in turn be converted to solving a system of linear equations. We develop a parallel implementation of the Multigrid method for this step. We demonstrate that ES-PICbench is significantly faster than other parallel PIC electrospray simulation implementations on Intel Xeon multicore platforms. Further, ES-PICbench can serve as a real-world scientific computing benchmark for analyzing parallel system performance.

Table of Contents

List of Figures

List of Tables

List of Symbols

Acknowledgments

Chapter 1  Introduction

Chapter 2  Problem Formulation
    2.1 Problem Parameters
    2.2 Methodology
    2.3 Prior Work

Chapter 3  PIC Implementation
    3.1 Creation of Grid
    3.2 Calculation of Release Rate
    3.3 Preallocation and Storage
    3.4 Releasing Particles
    3.5 Particles leaving the domain
    3.6 Weighting charges to nodes
    3.7 Resort Particles
    3.8 Resetting RHS Vector
    3.9 Moving Particles
    3.10 Parallelization
        3.10.1 Releasing Particles
        3.10.2 Particles leaving the domain
        3.10.3 Update Charge Fractions
        3.10.4 Moving Particles
        3.10.5 Performance
    3.11 Results-Particle Distribution

Chapter 4  Poisson Solve
    4.1 Initial Serial Solver
        4.1.1 Matrix-free approach
        4.1.2 Exploration of Solver Options with PETSc
    4.2 Results

Chapter 5  Multigrid Method
    5.1 Methodology
        5.1.1 Pre/Post Smoother
        5.1.2 Restriction
        5.1.3 Recurse/Direct Solve
        5.1.4 Prolongation
        5.1.5 Matrix-Free Approach
    5.2 Implementation
    5.3 Validation
        5.3.1 Dirichlet conditions
        5.3.2 Mixed Neumann
        5.3.3 Small domain length issue
        5.3.4 Troubleshooting Multigrid
    5.4 Comparison with PETSc
    5.5 Parallelization
        5.5.1 Smoother
        5.5.2 Direct Solve
        5.5.3 Remaining steps
        5.5.4 Results
        5.5.5 SubComponent Timings
            5.5.5.1 Smoother
            5.5.5.2 Restriction and Prolongation
        5.5.6 Smoother Performance
    5.6 Code Results
        5.6.1 Potential Distribution
        5.6.2 Effect of Particles on Potential

Chapter 6  Conclusions and Future Work
    6.1 Future work

Bibliography

List of Figures

2.1  Side View With Boundary Conditions
2.2  View from the Capillary with Boundary Conditions
2.3  View from the Extractor with Boundary Conditions
3.1  Outline of Particle class
3.2  Insertion of new particles into gaps
3.3  Swap particles at end to cover leftover gaps
3.4  Charges in a sample 2D grid. Arrows show the contribution towards a particular node
3.5  Area weighted contribution of charge
3.6  Laplace Particle Routine Scalability
3.7  Particle Distribution at Timestep 50
3.8  Particle Distribution at Timestep 100
3.9  Particle Distribution at Timestep 150
3.10 Particle Distribution at Timestep 250
3.11 Front view of distribution at 250 timesteps
4.1  Form of Matrix A with coefficients in case of Laplace equation
4.2  7-pt stencil with Axes convention
4.3  Sparsity Pattern of matrix A
4.4  Evaluating Ax = b
4.5  Graph indicating scaling for Poisson Run
5.1  Allocation of hierarchy of grids for a 2D face
5.2  Form of the V-Cycle
5.3  Iterations taken for varying Problem Size
5.4  Iterations taken for varying Problem Size
5.5  Red-Black coloring
5.6  Multigrid Dirichlet (9 levels, 3 coarse points) Speedup graph
5.7  Scalability plot of Smoother from the Subcomponent Timings
5.8  Plot of Restriction and Prolongation Scalability
5.9  Smoother Scalability with 7 Multigrid levels
5.10 Potential Distribution at Timestep 50
5.11 Electric Field at Timestep 50
5.12 Potential Contours at Timestep 50
5.13 Potential Contours at Timestep 250

List of Tables

2.1  Domain Lengths and Voltage Values used
3.1  Scalability study for the particle routines
3.2  Speedup values for the Moving Particles routine and for the Total Time
4.1  PETSc-3.4.3 Solver and Preconditioner combinations explored
4.2  PETSc-3.6.2 Solver and Preconditioner combinations explored
4.3  Details of Performance Setup on a Single Node on Ganga Cluster
4.4  Scaling results for the Poisson Run with 500 Timesteps
5.1  Validation of Multigrid Solver with Dirichlet conditions
5.2  Validation of the Multigrid solver with Neumann Boundary Conditions
5.3  Multigrid Dirichlet with 9 levels and 3 points on coarsest level
5.4  Timings for the individual subcomponent steps in the Multigrid Dirichlet case
5.5  Scalability study of Smoother from the Subcomponent Timings
5.6  Tabulation of Restriction and Prolongation Scalability
5.7  Details of the Memory Hierarchy for LionXG
5.8  Cachegrind Performance Results
5.9  Standalone Smoother Scalability Setup with 7 Multigrid levels and 3 coarse grid points

List of Symbols

∇          Gradient operator

Φ          Electric potential

ρq         Density of space charge

ε₀         Free-space permittivity

E          Electric field

NMD        Release rate of particles in the Molecular Dynamics code

∆tMD       Molecular Dynamics code timestep

Np,rel     Release rate of particles in a PIC timestep

tPIC       PIC timestep

Ip,rel     Integer part of Np,rel

Dp,rel     Decimal part of Np,rel

ryz,MD     Magnitude of the position vector of a released particle from the capillary center

vyz,MD     Magnitude of the velocity of a released particle from the capillary center

θ, η       Angles used while randomizing particle attributes

yPIC, zPIC         New YZ positions after rotating the position vector through the angles

vy,PIC, vz,PIC     New YZ velocities after rotating the velocity vector through the angles

q          Charge on a particle

F⃗          Force acting on a particle of charge q

mp         Mass of a particle

v⃗ᵗ         Velocity of a particle at time t

Acknowledgments

This research is supported in part by the US National Science Foundation award #1253881. Firstly, I would like to thank Dr. Kamesh Madduri for his support throughout this research project and his guidance on tackling various issues at different stages of the project. I learnt valuable lessons about prototyping an implementation and the aspects to look out for while aiming for performance. The ability to step back and look at the larger problem as well as dive into the details was particularly instructive. I would also like to thank the Computing Facilities at Penn State ICS for the use of the LionX clusters. Being able to run and use the high performance clusters with minimal downtime was extremely valuable, and the large variety of software available on the clusters was very helpful in debugging and working with my own research codes. Similarly, the computing facilities at the Department of Computer Science and Engineering were always available for running my codes. I would like to thank my labmates and friends for their support at various stages of my stay here at Penn State. The journey was a lot easier thanks to them, and it was good to know I could rely on them at all times. I would especially like to thank my former batchmate Rajesh Gandham for helping me understand the multigrid algorithm and for troubleshooting my implementation at certain important stages. Finally, I would like to thank my parents and relatives for affording me the opportunity to come to Penn State and pursue my graduate studies. Moving away from home to pursue other opportunities has always been difficult, but their patience and support have been invaluable at various stages and have allowed me to work towards what I have wanted.

Chapter 1 | Introduction

Colloid thrusters, or electrostatic thrusters, are used in small spacecraft systems and work on the principle of electric propulsion. Electric propulsion is a method to generate a propulsive force by accelerating charged particles using a high electric field. The charged particles come from the propellant used by the colloid thruster. In order to characterize the efficiency of colloid thrusters, numerical simulations are performed. Electrostatic thrusters are used in scenarios where very low thrust is required [1], typically on the order of micronewtons. They also have a relatively high efficiency and are suited for careful manoeuvring or for steady acceleration over a long period of time. The particle-in-cell (PIC) numerical technique can be used to simulate electrospray, or charged particle emissions, in a colloid thruster [2]. The main approach in such a technique involves tracking the particle positions in continuous space and mapping them to "cells", or discretizations of the domain. This allows us to calculate physical attributes such as the charge density on a stationary mesh. With a larger number of particles per cell, the PIC method can also be used to calculate statistically meaningful quantities based on the velocities of the particles in each cell. We solve the Poisson equation for the potential and use a finite difference based approach (among the choices of finite difference, finite volume and finite element) due to the ease of implementing a finite difference grid. Wang et al. [2] had a finite difference Cartesian grid (rectangular grid) based implementation and used PETSc [3–5] to solve the Poisson equation with a Multigrid solver and MPI-based domain decomposition. The scaling performance of that code was seen to be limited by the scaling of the Multigrid solver. Korkut et al. [6] used

an Adaptive Mesh Refinement (AMR) approach to selectively refine cells in the area of interest, but that work was ultimately solving a flow problem outside our domain of interest. Based on this, we also built an AMR-based code similar to [2] and implemented it using Deal.II [7,8] for our particular domain. However, the main bottlenecks seemed to be the search routine that finds the cell a given particle belongs to, and the creation of library-specific data structures for the finite element method. We felt that the overall structure of the problem was simple enough to warrant a standalone code that does not rely on external libraries. Also, we noticed that the use of libraries often impedes the use of convenient multi-threading options like OpenMP, so a standalone code can be parallelized without facing constraints that a library might impose. We also focused on keeping the data structures simple by using C arrays and simple structs, and on minimizing the number of files, so that the compiler could better optimize the code. The overall guiding principle was to keep the implementation simple and straightforward. Finally, we also gain the advantage of specializing the implementation to the problem in ES-PICbench, and we can avoid many corner cases that a more generalized code would have to handle.

Chapter 2 | Problem Formulation

The aim is to move charged particles (generated from a source) in a domain. The domain consists of a Capillary (represented by a small circle), from where we release charged particles at each time step, and an Extractor (an annular ring), which extracts the generated ions by being set to a lower voltage. Based on the voltages set on the domain boundaries, we calculate the potential distribution by solving Poisson's equation, given in Equation (2.1).

∇²Φ = −ρq / ε₀        (2.1)

where Φ is the potential, ρq is the density of space charge and ε₀ is the free-space permittivity. We use the finite difference method to discretize the domain due to its ease of implementation and because it suits the regular Cartesian grid that we are going to use. As an initial approximation we solve the Laplace equation, obtained by setting ρq = 0 and given in Equation (2.2):

∇²Φ = 0        (2.2)

From the potential distribution, we evaluate the Electric field using Equation (2.3)

E = −∇Φ (2.3)

which we then use to move the particles at each time step.

2.1 Problem Parameters

An outline of the problem along with the boundary conditions is shown in Figure 2.1:

Figure 2.1. Side View With Boundary Conditions.

where LV and HV stand for Low Voltage and High Voltage respectively and represent the potentials to which the relevant boundaries are set. The left side represents the Capillary held at Low Voltage and the right side represents the annular Extractor held at High Voltage. The relevant voltage values and lengths are outlined in Table 2.1:

Parameter                  Value
Domain Length              0.3 mm
Capillary Radius           13 µm
Capillary Voltage (LV)     0 V
Inner Extractor Radius     0.1 mm
Outer Extractor Radius     0.14 mm
Extractor Voltage (HV)     −1350 V

Table 2.1. Domain Lengths and Voltage Values used.

Further, the view from the "back" or the Capillary side and the "front" or the Extractor side are shown in Figure 2.2 and Figure 2.3 respectively.


Figure 2.2. View from the Capillary with Boundary Conditions.


Figure 2.3. View from the Extractor with Boundary Conditions.

2.2 Methodology

The overall steps in modelling the simulation are:

1. Read in parameter file that specifies problem parameters.

2. Create grid and read in relevant Molecular Dynamics (MD) dataset files. These files are used to randomly select a set of particles to release from the capillary, at each timestep. As such these datasets represent the physics that occurs inside the capillary in an abstracted manner as outlined in [9].

3. Calculate the release rate so that the number of particles to release at each time step is known. Refer Section 3.2.

4. Solve the Laplace equation first (Equation (2.2)) and simulate the number of timesteps it takes for a single particle to exit the domain. This is useful for preallocating the global Particles list. The Laplace equation acts as a good first approximation for this step. Refer Section 3.3.

5. At each time step:

   • Release particles as calculated before, taking into account particles which have left the domain. Refer Section 3.4.

   • Swap particles at the end of the domain into leftover gaps, if any.

   • Map particles to cells based on the relevant weighting schemes, to calculate the charge density ρq in Equation (2.1).

   • The matrix A represents the coefficients of the equations obtained after discretizing Equation (2.1), b represents the charge density at each node in that equation, and x represents the solution vector which we need to obtain. We now solve the linear system Ax = b, where the vector b has changed, to obtain the potential. We do this in a Matrix-Free manner, as explained in Chapter 5.

   • Update the Electric Field, using Equation (2.3).

   • Move particles in the updated Electric field using the half-timestep based scheme. Keep track of the indices of particles which leave the domain, so that new particles can be introduced at those locations. Refer Section 3.5.

6. Output data files and results if necessary.

The implementation in ES-PICbench follows the overall schematic shown in Algorithm 1.

Algorithm 1 Overview of the ES-PICbench PIC Schematic

function ESPIC-Main
    PreprocessData()
    for i = 1 → nsteps do                       ▷ nsteps is the number of timesteps
        ReleaseParticles()
        SwapGapsWithEndParticles()
        UpdateChargeFractions()
        Solve(A, x, b)
        CalculateElectricField()
        MoveParticlesInField()
    end for
    PostprocessData()
end function

Our approach in ES-PICbench differs from traditional PIC codes due to the nature of the problem, which has features that distinguish it from other typical use cases. These features are:

1. Constant inflow of particles into the domain at each timestep. Most PIC codes might have to deal with a transient flow of particles that can change based on the elapsed time.

2. The boundary conditions involve well known Dirichlet and Neumann con- ditions. More general PIC codes might have to handle mixed boundary conditions or conditions that are more apt for higher particle numbers and interaction with magnetic fields, whereas in our case the number of particles in the domain is lower and the physical modelling is simpler.

3. The variation of the Particles Per Cell along the generated direction can be non-uniform, compared to a more uniform case in other PIC codes.

2.3 Prior Work

For a colloid thruster, the work by Wang et al. [2] demonstrated that the number density (number of particles per volume) changes by several orders of magnitude from the Capillary to the Extractor. This in turn affects a physical quantity called the Debye length, which is a measure of charge separation. Resolving the Debye length with a Cartesian grid, however, is quite costly, and runtimes often went up to many hours or a day. So one would like to selectively have a finer grid only where it matters and a coarser grid in places where this physical quantity need not be resolved as closely. This necessitates a scale-dependent approach, and Korkut et al. [6] used Adaptive Mesh Refinement (AMR) grids to selectively refine a mesh. In [6], however, the cells were refined locally without affecting the rest of the grid, and the problem was more applicable to a physical domain that is outside the domain of our interest. The main aim was to solve Poisson's equation on an unstructured mesh (like an AMR grid). However, this introduces more constraints. Once we start refining parts of an AMR grid, we need methods to maintain a "balance constraint" [10] (to eliminate inconsistencies in the system of equations in the form of "hanging nodes") and to handle the more complicated grid and neighbor connectivity. In this area, the finite element library Deal.II [7,8] is very useful as it handles the grid management and the solution of the equations by interfacing to a solver like PETSc (among others). Deal.II can generate and selectively refine AMR grids based on any criteria, encourages useful programming practices, and provides support through extensive tutorials and documentation. The Deal.II code took around 1.5 hours for a run involving 8000 timesteps that generates around 28,000 particles, on a single node of the LionXG cluster. For the same problem setup, our approach takes around 10 seconds. It is possible that Deal.II, with its sophisticated data structures and various interfaces to packages such as PETSc and p4est, may be inappropriate from a performance standpoint for this particular Capillary-Extractor problem. Our aim in this work has been to develop a lightweight solver devoid of libraries and using only C constructs, in an attempt to develop a benchmark (in the form of ES-PICbench) for these kinds of problems.

Chapter 3 | PIC Implementation

We develop methods for storing and managing the particles generated in the simulation. We denote a general particle by a simple Plain-Old-Data struct (shown in Figure 3.1) which stores the relevant position, velocity, charge and mass information. We decided to keep the data structures simple and use raw pointers for arrays, as we believed this would reduce indirection and allow better optimization by the compiler.

Figure 3.1. Outline of Particle class

The preallocated Particle array simply holds all released particles in the domain. The ordering within this array has no bearing on the physical location in the domain. This allows us to update particle positions immediately with no dependency between array locations. When solving the Poisson equation however, we need more steps for the complete integration, which are outlined in Section 3.6 and Chapter4.
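Figure 3.1 is reproduced only as a caption here. A minimal sketch of such a plain-old-data record, with field names chosen purely for illustration (they may differ from the actual ES-PICbench code), could look like:

/* Sketch of the plain-old-data particle record described above.
 * Field names are illustrative assumptions, not the exact ones in ES-PICbench. */
typedef struct {
    double x, y, z;    /* position components */
    double vx, vy, vz; /* velocity components */
    double charge;     /* charge carried by the particle */
    double mass;       /* particle mass */
} Particle;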

3.1 Creation of Grid

The multigrid approach is used for the solution of the Poisson equation. We performed numerical experiments with PETSc and found that the Multigrid method is well suited for elliptic equations such as the Poisson equation (refer Section 4.1.2). We create this grid at runtime by passing in the number of grid points at the coarse level and the desired number of multigrid levels. Using this, a hierarchy of grids is created; the details are explained in Section 5.2. These grids are stored as one-dimensional C arrays. A point with coordinates (i, j, k) corresponds to the flat index (N²i + Nj + k), where N is the number of points on one side of the cube. This has implications for strided access in memory, and an investigation of the cache performance is presented in Section 5.5.6.
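As a small illustration of this flattened layout (a sketch; the helper name is ours), the index mapping can be written as:

#include <stddef.h>

/* Map a 3D grid coordinate (i, j, k) to the flat index N*N*i + N*j + k,
 * where N is the number of points on one side of the cube. */
static inline size_t grid_index(size_t N, size_t i, size_t j, size_t k)
{
    return (N * N) * i + N * j + k;
}

With this layout, varying k gives unit-stride access, while varying i strides by N² elements, which is why loop ordering matters for cache behaviour.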

3.2 Calculation of Release Rate

We denote the time step in the PIC code as tPIC, the time step in the Molecular Dynamics data as ∆tMD, the number of particles released per PIC timestep as Np,rel, and the number of particles in the MD dataset as NMD. Each MD dataset corresponds to a particular mass flow rate condition and contains the attributes for a set of particles (NMD) released from the capillary during some time interval (∆tMD); the details of this work are in [9]. We also modify the procedure from [2] to account for the fractional part of the release rate, as outlined in Section 3.4.

For each MD dataset, the corresponding ∆tMD is tabulated in [2]. tPIC is taken as 230 ps (picoseconds, or 10⁻¹² seconds), from the tables in the same work. The number of particles released at each time step in the PIC code must match the particle "flow rate" from the MD dataset. That is,

Np,rel / tPIC = NMD / ∆tMD

Therefore

Np,rel = (NMD × tPIC) / ∆tMD

This will typically be a floating-point number, and only an integer number of particles can be released per tPIC. Hence we store the integer and fractional parts separately.

3.3 Preallocation and Storage

The simulation will output a distribution of particles spatially. Hence the Particles are simply stored in a preallocated Particle array. In order to preallocate, it is important to estimate the number of particles that might be in the domain. We do this by simulating the movement of a single particle (choosing a particle with least positive X-Velocity) in a field obtained by solving the Laplace equation and estimating the number of timesteps taken to leave the domain (say Test).

From Section 3.2, if we release Ip,rel particles at each timestep, then the expected number of particles would be (Ip,rel × Test) at "steady state". In practice, we observed a significantly larger number of particles, around double the calculated value. We attribute this to the test particle not being representative enough: there could be many particles which take longer to leave the domain. To account for these variations, we simply preallocate the particle array to double the calculated value, i.e. 2 Ip,rel Test, which was seen to be sufficient for various runs.

Np,rel = Ip,rel + Dp,rel

Ip,rel = Int(Np,rel)

Dp,rel = Frac(Np,rel)

3.4 Releasing Particles

For releasing the particles, we follow a different approach than [2]. Instead of generating random numbers to decide when to release more particles (to account for Dp,rel), we account for the fractional part by:

• Releasing Ip,rel particles at each time step.

• Tracking the running fractional part with another variable: at each time step, we add Dp,rel to this running fractional part.

• Whenever the running fractional part accumulates an integer part of one or more, we add that integer to the number of particles to release in that time step and remove it from the running fraction. With this, the fractional part is accounted for over time (a sketch of this bookkeeping is shown below).
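A C sketch of this bookkeeping (the variable and function names are ours, not the ones in the actual code) is:

/* Sketch of the fractional release-rate bookkeeping described above.
 * np_rel is the floating-point release rate N_p,rel per PIC timestep.
 * Called once per timestep, from a single thread. */
static double frac_accum = 0.0;             /* running fractional part */

int particles_to_release(double np_rel)
{
    int    ip_rel = (int)np_rel;            /* integer part I_p,rel    */
    double dp_rel = np_rel - ip_rel;        /* fractional part D_p,rel */

    frac_accum += dp_rel;                   /* accumulate the fraction */
    int extra = (int)frac_accum;            /* whole particles earned  */
    frac_accum -= extra;                    /* keep only the remainder */

    return ip_rel + extra;                  /* release count this step */
}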

Although the MD data provides a sample size of particles to release, if we release all the selected particles as such, we would only have particles with initial positions and velocities as given by the dataset. This may not be a good distribution as particles with other velocities must be present too. For this purpose, whenever a particle is released from the MD dataset, we rotate its Y and Z attributes (position and velocity). This ensures a better “spread” of particles with differing velocities. The X components are preserved, but we rotate the Y and Z attributes in the same manner as [2] and the outline of the approach is shown in Algorithm2.

Algorithm 2 Randomize an input particle's Y and Z Attributes

function randomizeParticleAttribs(p)                ▷ Input particle p
    θrnd ← randNum                                  ▷ Between 0 and 2π
    ryz ← √(p.y² + p.z²)                            ▷ Adjusting y and z about the radial center
    p.y ← ryz cos θrnd
    p.z ← ryz sin θrnd
    ▷ Adjust the velocity components
    vyz ← √(p.Vy² + p.Vz²)
    p.Vy ← vyz cos θrnd
    p.Vz ← vyz sin θrnd
    return p
end function
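A C rendering of this rotation (a sketch: it assumes the Particle struct sketched earlier and the per-thread rand_r seeding discussed in Section 3.10.1; names are ours) could be:

#include <math.h>
#include <stdlib.h>

/* Rotate a particle's Y/Z position and velocity through a random angle,
 * preserving the radial magnitudes and the X components. */
void randomize_particle_attribs(Particle *p, unsigned int *seed)
{
    double theta = 2.0 * M_PI * ((double)rand_r(seed) / (double)RAND_MAX);
    double ryz   = sqrt(p->y * p->y + p->z * p->z);
    double vyz   = sqrt(p->vy * p->vy + p->vz * p->vz);

    p->y  = ryz * cos(theta);   /* same random angle is applied to both */
    p->z  = ryz * sin(theta);   /* the position and the velocity vector */
    p->vy = vyz * cos(theta);
    p->vz = vyz * sin(theta);
}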

Once a particle's attributes are randomized, we release the particles into the domain with the approach in Algorithm 3.

Algorithm 3 Release particles into the domain

procedure releaseParticles(np)                      ▷ np is the number to release, from Section 3.2
    for i ∈ np do
        p ← randomizeParticleAttribs(random pick from MD data)
        if gaps are left then
            Insert into gap
            gapCount ← gapCount − 1
        else
            Insert at last index of domain array of particles
            domainCount ← domainCount + 1
        end if
    end for
end procedure

If any gaps are left, we introduce the new particle into existing “gaps” as outlined in Section 3.5.

3.5 Particles leaving the domain

In the course of a simulation, there can be many particles which leave the domain. Whenever particles are moved, we use the final position to check whether the particle has left the domain. If so, we store the index (of the preallocated Particle array) where this happens in another array and this index position is denoted as a “gap”. When we introduce new particles, we simply insert into the gaps as shown in Figure 3.2.

Figure 3.2. Insertion of new particles into gaps

13 If the gaps become full, we insert newly released particles at the end of the array. If any gaps are still left, we simply copy over particles from the end of the array into the gaps (as shown in Figure 3.3) and adjust the counts accordingly. The overall pseudocode for the swapping procedure is shown in Algorithm4.

Algorithm 4 Fill up any remaining gaps after inserting released particles into earlier gaps.

function swapGapsWithEndParticles(domainParticles, count, lostParticlesList)
                                                    ▷ count is the bound for the domainParticles array
    for index in lostParticlesList do
        domainParticles[index] ← domainParticles[end]   ▷ end refers to the end of domainParticles
        count ← count − 1                               ▷ adjust the bound
    end for
    return count
end function

Figure 3.3. Swap particles at end to cover leftover gaps.

Using this, we ensure that the simulation works with only particles that are inside the domain.

3.6 Weighting charges to nodes

To do a Poisson solve at each time step, we need to calculate the RHS of Equation (2.1), which represents the charge density. In the finite difference approach, we calculate the charge density at a node more accurately by considering the contribution of the particle to each of the nodes.

14 Although we consider charges here, this can be extended to other charge specific quantities (say, like the Electric Field). A typical step in a simulation would have charges distributed throughout the grid. A 2D sample (2D grid with some particles) can be as shown in Figure 3.4.

Figure 3.4. Charges in a sample 2D grid. Arrows show the contribution towards a particular node.

The charge density of the center node in the figure is the sum of all fractional contributions from each of the charges. For a given charge in a cell, we base the contribution off the highlighted “areas” in Figure 3.5. This weighting scheme was taken from [11] and is easier to show in 2D.

Figure 3.5. Area weighted contribution of charge.

Define hx = x/L, hy = y/L. Then the weighting factors for nodes 1, 2, 3, 4 are:

15 w1 = (1 − hx)(1 − hy)

w2 = (hx)(1 − hy)

w3 = (hx)(hy)

w4 = (1 − hx)(hy)

Similarly, although it is harder to show graphically for 3D, the relevant weight factors (with a suitable node numbering) are, with hx = x/L, hy = y/L, hz = z/L:

w1 = (1 − hx)(1 − hy)(1 − hz)

w2 = (hx)(1 − hy)(1 − hz)

w3 = (1 − hx)(hy)(1 − hz)

w4 = (hx)(hy)(1 − hz)

w5 = (1 − hx)(1 − hy)(hz)

w6 = (hx)(1 − hy)(hz)

w7 = (1 − hx)(hy)(hz)

w8 = (hx)(hy)(hz)

The weight factors can be used to account for the contribution of the particle, either for the calculation of its charge or for the re-weighting of the Electric field back onto the charge. The calculated Electric field will be available at the nodes. We use these values to interpolate the Electric field onto the position of the charge, rather than, say, taking the value of the field from the node closest to the particle. Since getting the cell weights is a common operation, we abstract it out as shown in Algorithm 5.

Algorithm 5 Calculate the cell weights given a particle's position.

procedure getCellWeights(pos)
    weights[ ] ← [0]
    indices[ ] ← [0]
    i ← (int)(x/∆L)                                 ▷ ∆L is the cell length
    j ← (int)(y/∆L)
    k ← (int)(z/∆L)
    hx ← (x − i∆L)/∆L
    hy ← (y − j∆L)/∆L
    hz ← (z − k∆L)/∆L
    Fill weights[ ] array
    return weights, indices
end procedure
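A C sketch of this weight computation, using the 3D trilinear factors listed above (the array layout, node ordering and function name are our assumptions):

#include <stddef.h>

/* Compute the eight trilinear weights and flat node indices for the cell
 * containing position (x, y, z); dL is the cell length, N points per side.
 * The node ordering differs from the w1..w8 labels above but covers the
 * same eight corners of the cell. */
void get_cell_weights(double x, double y, double z, double dL, size_t N,
                      double w[8], size_t idx[8])
{
    size_t i = (size_t)(x / dL), j = (size_t)(y / dL), k = (size_t)(z / dL);
    double hx = (x - i * dL) / dL;
    double hy = (y - j * dL) / dL;
    double hz = (z - k * dL) / dL;

    for (int n = 0; n < 8; n++) {
        int di = (n >> 2) & 1, dj = (n >> 1) & 1, dk = n & 1;   /* corner offsets */
        w[n]   = (di ? hx : 1.0 - hx) *
                 (dj ? hy : 1.0 - hy) *
                 (dk ? hz : 1.0 - hz);
        idx[n] = (N * N) * (i + di) + N * (j + dj) + (k + dk);  /* flat node index */
    }
}

The eight weights always sum to one, so the particle's full charge is distributed over the cell's corner nodes.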

Using the above definition, we calculate the RHS with a given list of particles, using Algorithm 6.

Algorithm 6 Update the charge fractions contribution to calculate the charge density.

1: procedure updateChargeFractions(pList, rhsArr)
2:     for p ∈ pList do
3:         weights[ ], indices[ ] ← getCellWeights(p)
4:         for ind ∈ indices[ ] do
5:             ni is the nodal index
6:             rhsArr[ind] += weights[ni] ∗ (p.charge) ∗ multFactor    ▷ multFactor adjusts for cell volume and constants
7:         end for
8:     end for
9: end procedure

3.7 Resort Particles

After a sufficient number of timesteps, the ordering of particles within the array need not correspond to any geometrical distribution. From a cache point of view, however, it would be very beneficial for the particles to be ordered within the array by their axial distance from the capillary. In Algorithm 6, given a particle p in the list, we obtain the weights and indices for the corresponding cell. If the array is ordered by axial distance, then whenever we fetch a particular particle, the other particles in the same cache block are more likely to be cache hits on subsequent accesses. For this purpose, after a set number of Poisson timesteps, the particle array is sorted with the axial distance as the key, using the stdlib qsort function. It is important to note that quicksort-based qsort implementations can degrade to their O(n²) worst case, which a naive pivot choice hits on sorted or nearly-sorted input. So if we call this at every timestep, we might in fact worsen performance. Hence we do the particle sorting only every 20 or 30 iterations. We have not experimentally justified this number but only chose it as a suitable starting point; only with timing routines would we know if this interval degrades performance.
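A sketch of the sort call, assuming the axial direction is the x coordinate and the Particle struct sketched earlier (both assumptions on our part):

#include <stdlib.h>

/* Order particles by axial distance from the capillary (the x coordinate). */
static int cmp_axial(const void *a, const void *b)
{
    const Particle *pa = (const Particle *)a;
    const Particle *pb = (const Particle *)b;
    return (pa->x > pb->x) - (pa->x < pb->x);
}

/* Called only every 20-30 Poisson timesteps, on the live part of the array. */
void resort_particles(Particle *particles, size_t count)
{
    qsort(particles, count, sizeof(Particle), cmp_axial);
}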

3.8 Resetting RHS Vector

After each Poisson timestep, we need to fill the RHS vector again with the charge density contributions. So the vector is reset to zero, but only on the interior points, so that we don't affect the boundary nodes. This operation is straightforward to implement and has the form shown in Algorithm 7.

Algorithm 7 Reset interior nodes in RHS to zero.

procedure resetRHSInteriorPoints(rhsVec)
    for p ∈ interior points of rhsVec do
        p.val ← 0
    end for
end procedure

3.9 Moving Particles

We move the particles in the same manner as [2]. We evaluate the Electric Field from the potential using Equation (2.3) and calculate the force using

F⃗ = qE⃗

where q is the charge of the particle. We weight the Electric field to the particle’s location in a manner outlined in Section 3.6. We move the particle using

mp (v⃗ᵗ⁺¹ − v⃗ᵗ) / dt = F⃗ᵗ

(x⃗ᵗ⁺¹ − x⃗ᵗ) / dt = (1/2)(v⃗ᵗ + v⃗ᵗ⁺¹)

where the superscripts t and t + 1 denote particle quantities at the old and new time levels respectively. The overall steps are outlined in Algorithm 8:

Algorithm 8 Moving Particles in an Electric Field.

procedure moveParticlesInField(pList, EField)
    lostParticles ← 0
    for p ∈ pList do
        weights[ ], indices[ ] ← getCellWeights(p.pos)
        for ind ∈ indices[ ] do
            efᵢ += weights[ind] × EFieldᵢ[ind]      ▷ i refers to the i-th field component
        end for
        Move particle based on the weighted efield
        if isParticleOutsideDomain(p) then
            lostParticles ← lostParticles + 1
        end if
    end for
    return lostParticles
end procedure
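For a single particle, given the field (ex, ey, ez) already interpolated to its position, the half-timestep update above corresponds to the following sketch (names are ours):

/* Advance one particle by dt in the interpolated field (ex, ey, ez),
 * following m_p (v^{t+1} - v^t)/dt = qE^t and
 * (x^{t+1} - x^t)/dt = (v^t + v^{t+1})/2. */
void move_particle(Particle *p, double ex, double ey, double ez, double dt)
{
    double ax = p->charge * ex / p->mass;   /* acceleration components */
    double ay = p->charge * ey / p->mass;
    double az = p->charge * ez / p->mass;

    double vx_new = p->vx + ax * dt;        /* v^{t+1} = v^t + a dt */
    double vy_new = p->vy + ay * dt;
    double vz_new = p->vz + az * dt;

    p->x += 0.5 * (p->vx + vx_new) * dt;    /* average of old and new velocity */
    p->y += 0.5 * (p->vy + vy_new) * dt;
    p->z += 0.5 * (p->vz + vz_new) * dt;

    p->vx = vx_new;  p->vy = vy_new;  p->vz = vz_new;
}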

3.10 Parallelization

To use the particle routines in a parallelized manner, we need more modifications. In most loop based routines, the parallelization is straightforward with OpenMP, but other issues can arise, based on the implementation.

19 3.10.1 Releasing Particles

Although the outer loop can be parallelized with OpenMP, the generation of random numbers in a multi-threaded setup needs a bit more care. The usual call to rand() is not applicable, since the function keeps internal state between invocations. So race conditions can exist here, since multiple threads can overwrite or read an inconsistent state. Although this may not cause a runtime crash, the better approach is to let each thread have its own seed state and call rand_r(). Further, Algorithm 3 and Algorithm 4 are serial in nature, with dependencies on the previous iteration, so we need to parallelize those aspects as well. We do this by letting each thread calculate the index where it would need to insert a new particle. Since we know the number of gaps beforehand, we can use that to figure out whether we need to insert within a gap or at the end of the domain. The resulting modifications to Algorithm 3 give Algorithm 9.
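A sketch of the per-thread seeding with OpenMP (rand_r is the POSIX re-entrant variant of rand; the seeding formula and names here are only illustrative):

#include <omp.h>
#include <stdlib.h>

/* Each thread owns its seed, so rand_r() never shares state across threads. */
void release_particles_parallel(int np, int n_md /* size of the MD sample */)
{
    #pragma omp parallel
    {
        /* Per-thread seed; the exact seeding formula is illustrative. */
        unsigned int seed = 1234u + 17u * (unsigned int)omp_get_thread_num();

        #pragma omp for
        for (int i = 0; i < np; i++) {
            int pick = rand_r(&seed) % n_md;  /* which MD particle to clone */
            (void)pick;
            /* ... randomize its attributes and insert it (see Algorithm 9) ... */
        }
    }
}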

Algorithm 9 Release particles into the domain - Parallel

procedure releaseParticles(np, gapCount)
    parfor i ∈ np do
        p ← randomizeParticleAttribs(random pick from MD data)
        val ← (gapCount − i)                        ▷ The index where the thread can check now
        if val ≥ 0 then
            Insert into gap
        else
            Insert at domainCount − (1 + val)
        end if
    end parfor
    Adjust the bounds here as needed
end procedure

3.10.2 Particles leaving the domain

We can parallelize an outer loop with OpenMP, as long as the subsequent operations can occur independently. Again, we let each thread calculate the indices from where the particles are going to be swapped. With the relevant modifications to Algorithm 4, we now have Algorithm 10.

Algorithm 10 Fill up any remaining gaps - Parallel

function swapGapsWithEndParticles(domainParticles, domainCount, lostParticles, lostCount)
    parfor i ∈ len(lostParticles) do
        j ← (lostCount − i)
        domainParticles[lostParticles[i]] ← domainParticles[domainCount − 1 − j]
    end parfor
    Adjust the bounds here as needed
end function

Our focus now is on the parallelization of the charge fraction calculation. Parallelizing the reset of the RHS interior points was straightforward, simply using a parfor, and so it is not explicitly outlined here. Since we use qsort, the resorting is not amenable to parallelization, so we let only one thread do the particle resorting.

3.10.3 Update Charge Fractions

When we try to parallelize the update of the charge fractions, we face a race condition in the update of the charge density belonging to a node. This happens when multiple threads work on particles in the cells that share a particular node (as in Figure 3.4). We decided to handle the charge fraction update using omp atomic. For scalability this may be an issue, as thread-local arrays are usually preferable, but the cost of later merging them into the global array with a single thread might outweigh that benefit. The modified algorithm has the form shown in Algorithm 11.

Algorithm 11 Update the charge fractions contribution to calculate the charge density.

1: procedure updateChargeFractions(pList, rhsArr)
2:     parfor p ∈ pList do
3:         weights[ ], indices[ ] ← getCellWeights(p)
4:         for ind ∈ indices[ ] do
5:             ni is the nodal index
6:             omp atomic
7:             rhsArr[ind] += weights[ni] ∗ (p.charge) ∗ multFactor    ▷ multFactor adjusts for cell volume and constants
8:         end for
9:     end parfor
10: end procedure
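In C with OpenMP, the guarded update corresponds to something like the following sketch, reusing the get_cell_weights sketch from Section 3.6 (names and signatures are ours):

/* Scatter each particle's charge to its 8 surrounding nodes.
 * The atomic guard serializes only the single += on a shared node. */
void update_charge_fractions(const Particle *plist, long n_particles,
                             double *rhs, double dL, size_t N, double mult_factor)
{
    #pragma omp parallel for
    for (long p = 0; p < n_particles; p++) {
        double w[8];
        size_t idx[8];
        get_cell_weights(plist[p].x, plist[p].y, plist[p].z, dL, N, w, idx);

        for (int n = 0; n < 8; n++) {
            #pragma omp atomic
            rhs[idx[n]] += w[n] * plist[p].charge * mult_factor;
        }
    }
}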

3.10.4 Moving Particles

Once the Electric field is known, we can move each particle independently of the other. This was straightforward to implement with OpenMP and so we don’t show the parallelization explicitly here. However, each thread now returns its own version of the number of lost particles. So we have a per-thread array that stores the number of particles lost as seen by each thread. Then we let one single thread do the cumulative count of all lost particles. Using this, we can let all threads update the global array of particles, since the cumulative counts have been pre-calculated.

3.10.5 Performance

We carry out the performance and scaling runs for the particle routines on the LionXG system, whose nodes have two sockets of eight-core 2.6 GHz Intel CPUs. Each core supports one thread. The current scaling investigation was carried out for up to 8 threads, since we did not observe job requests with 16 threads being granted in sufficient time. We solve the Laplace equation and carry out 8000 timesteps, with the results shown in Table 3.1.

Threads             1         2        4        8
Release Particles   0.0689    0.0389   0.0406   0.0391
Swap Gaps           0.0013    0.0032   0.0091   0.0172
Move Particles      10.3219   5.1609   2.6290   1.3598
Total               10.3921   5.2030   2.6787   1.4161

Table 3.1. Scalability study for the particle routines (times in seconds).

Moving the particles takes the majority of the time and scales reasonably. Interestingly, the times for the other parallel steps seem to increase with more threads. This can possibly be explained by false sharing: when multiple threads modify data on the same cache line, they invalidate each other's writes and the operation can effectively become serialized. We may need to carry out a more careful investigation, or profiling with Intel VTune, to verify if this is the cause. The speedup values for the particle moving routine and the total time, as well as the combined graph, are shown in Table 3.2 and Figure 3.6.

Threads   Moving Particles   Total Time
1         1                  1
2         2                  1.99
4         3.92               3.87
8         7.59               7.3386

Table 3.2. Speedup values for the Moving Particles routine and for the Total Time.

Figure 3.6. Laplace Particle Routine Scalability (speedup vs. number of threads for the Move Particles routine and the total time).

3.11 Results-Particle Distribution

Running the code for 300 Poisson timesteps and using 4 threads, we obtained the following distributions:

Figure 3.7. Particle Distribution at Timestep 50.
Figure 3.8. Particle Distribution at Timestep 100.
Figure 3.9. Particle Distribution at Timestep 150.
Figure 3.10. Particle Distribution at Timestep 250.

The front view of the distribution at the end of the run is shown in Figure 3.11.

25 Figure 3.11. Front view of distribution at 250 timesteps

We see that the "beam" of particles is highly collimated along the axial direction. This is expected, since the Electric field has a large magnitude along the axial direction. With different capillary and extractor radii, we can expect the particle distribution to change.

Chapter 4 | Poisson Solve

For the problem at hand, we solve the linear system Ax = b, where A represents the discretization of the PDE, x refers to the solution vector (here, the array of potential values for each node in the grid) and b refers to the RHS in Equation (2.1) (charge density divided by the permittivity) at each node in the grid. As explained in Chapter 2, we use the finite difference method for the grid discretization. The 3D Laplace stencil is used for the interior points, and the Neumann boundary condition is represented using a one-sided second-order approximation. The matrix A roughly has the form

\[
A \;=\;
\begin{bmatrix}
1 & & & & & \\
& \ddots & & & & \\
\cdots\; 1\ 1\ 1 & -6 & 1\ 1\ 1 \;\cdots & & & \\
& & & \ddots & & \\
& & \cdots & -1 & -3 & 4 \\
& & & & & \ddots
\end{bmatrix}
\]

Figure 4.1. Form of Matrix A with coefficients in case of Laplace equation.

where the Dirichlet nodes are represented with a row of [1, 0, . . . , 0] and the Neumann nodes are represented with [. . . , 3, −4, 1, . . . ]. Interior points have the usual 7-point stencil with the coefficient row [. . . , 1, . . . , 1, 1, −6, 1, 1, . . . , 1, . . . ], as shown in Figure 4.2.

27 Figure 4.2. 7-pt stencil with Axes convention.

4.1 Initial Serial Solver

The RHS vector b is mostly zeros, since we are solving the Laplace equation for the initial approximation. As a first iteration towards a solver, the initial implementation was a Successive-Over-Relaxation (SOR) method, due to its ease of implementation. Given A as the matrix shown earlier, we define D, −L and −U as the diagonal, strictly lower-triangular and strictly upper-triangular parts respectively, i.e. A = D − L − U. Then the matrix representation of the SOR form is (as taken from [12])

x⁽ᵏ⁾ = (D − ωL)⁻¹ [ωU + (1 − ω)D] x⁽ᵏ⁻¹⁾ + ω (D − ωL)⁻¹ b

where ω is the extrapolation (relaxation) factor. Typically 0 < ω < 2, and at an optimum value the convergence is accelerated the most.

4.1.1 Matrix-free approach

The matrix A is quite sparse as we can see from Figure 4.1. The corresponding sparsity pattern is as shown in Figure 4.3.

28 Figure 4.3. Sparsity Pattern of matrix A.

So instead of storing the entire matrix, a common approach is to just store the non-zeros using any of the standard Compressed formats. Even with a sparse row format, a lot of space would be wasted storing redundant coefficients. When it comes to the Laplace problem in particular, we see that most of the row coefficients are approximately the same. The bulk of the calculations occur for the interior points which represent the Laplace stencil. Hence we implemented the SOR method (and subsequent solver implementations) using a Matrix-Free approach. Consider evaluating the product Ax as before with b = [0]:

\[
A x \;=\;
\begin{bmatrix}
1 & & & & & \\
& \ddots & & & & \\
\cdots\; 1\ 1\ 1 & -6 & 1\ 1\ 1 \;\cdots & & & \\
& & & \ddots & & \\
& & \cdots & -1 & -3 & 4 \\
& & & & & \ddots
\end{bmatrix}
\begin{bmatrix}
x_1 \\ \vdots \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \\ \vdots
\end{bmatrix}
=
\begin{bmatrix}
0 \\ \vdots \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \vdots
\end{bmatrix}
\]

Figure 4.4. Evaluating Ax = b.

Since memory access is typically the bottleneck when A is stored explicitly, we can represent the previous matrix-vector operation in a "matrix-free" way as:

x₁ ← 0
for i ∈ [4, 5, 6, 7] do
    xᵢ ← (1/6)(xᵢ₋₃ + xᵢ₋₂ + xᵢ₋₁ + xᵢ₊₁ + xᵢ₊₂ + xᵢ₊₃)
end for
x₇ ← (1/3)(4x₈ − x₆)

By using the above approach, it is clear that the matrix coefficients are implicitly fixed as part of the implementation, which does not have to fetch redundant coefficients from memory. The SOR implementation used this matrix-free method, and one loop of the solver is shown in Algorithm 12.

Algorithm 12 Successive Over Relaxation - Solver Loop

procedure SORSolve(x, b)
    for p ∈ interiorPoints do
        xₚ ← (1/6)(xₚ₋₃ + xₚ₋₂ + xₚ₋₁ + xₚ₊₁ + xₚ₊₂ + xₚ₊₃)
    end for
    for p ∈ neumannPoints do
        Enforce the second-order Neumann boundary condition
    end for
end procedure
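In C, one matrix-free sweep over the interior of the N³ grid looks roughly like the sketch below (a Gauss-Seidel/SOR update with ω = 1; boundary handling is omitted, and whether the h² factor is folded into b depends on how the RHS is scaled, e.g. via the multFactor of Algorithm 6):

#include <stddef.h>

/* One matrix-free sweep (omega = 1) over the interior of an N x N x N grid
 * stored as a flat array with index = N*N*i + N*j + k. For the 7-point
 * Laplace stencil, each interior value becomes the average of its six
 * neighbours minus the scaled right-hand side term. */
void gs_sweep(double *x, const double *b, size_t N, double h)
{
    const size_t s_i = N * N, s_j = N, s_k = 1;   /* strides per direction */

    for (size_t i = 1; i < N - 1; i++)
        for (size_t j = 1; j < N - 1; j++)
            for (size_t k = 1; k < N - 1; k++) {
                size_t c = s_i * i + s_j * j + s_k * k;
                x[c] = ( x[c - s_i] + x[c + s_i]
                       + x[c - s_j] + x[c + s_j]
                       + x[c - s_k] + x[c + s_k]
                       - h * h * b[c] ) / 6.0;
            }
}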

But for solving the Poisson equation, we would need to solve at each timestep. Just for the Laplace equation alone, the SOR solver was seen to take too long, at around 287 seconds for a (101 × 101 × 101) grid. For repeated solves, the SOR solver was too slow and we needed a better solver. For this purpose, we set up the same problem and solved it using PETSc, since it allows us to quickly explore combinations of Solvers and Preconditioners. For this PETSc code, however, we used a matrix-based approach and the A matrix was built in the Compressed Sparse Row (CSR) format. We did this since many of the examples and tutorials were based on building the A matrix. Although PETSc has a matrix-free solver, we felt it was specialized for a different PDE.

4.1.2 Exploration of Solver Options with PETSc

We tried different combinations of Solvers and Preconditioners for a 101 × 101 × 101 grid. With PETSc-3.4.3, we obtained the following times in Table 4.1.

Solver   Preconditioner     Time Taken (sec)
bcgs     SOR                38
bcgs     Hypre              DIVERGED
bcgs     ILU                32
bcgs     Multigrid          56
bcgs     Additive-Schwarz   38
bcgsl    Multigrid          64
gmres    SOR                248

Table 4.1. PETSc-3.4.3 Solver and Preconditioner combinations explored.

With the option of Biconjugate Gradient narrowed down, we tried the next set of options for the Preconditioner with PETSc-3.6.2, since we felt the more up-to-date version might have the latest optimizations and would be a better benchmark to aim for. The times taken for the different cases are outlined in Table 4.2.

Solver   Preconditioner              Time Taken (sec)
bcgs     Multigrid                   23
bcgs     Hypre                       21
bcgs     MG + Hypre (coarse grid)    17

Table 4.2. PETSc-3.6.2 Solver and Preconditioner combinations explored.

After trying out various combinations, we see that the Biconjugate Gradient solver with the Multigrid preconditioner is an effective combination, with a time of 23 seconds. Although the Hypre preconditioner attempts give a lower running time, they rely on Hypre's BoomerAMG preconditioner, which is an Algebraic Multigrid method. Due to its complexity of implementation and the various flavors that the Hypre preconditioners support, we decided to focus on the simpler Geometric Multigrid method. From here on, our aim was to develop an independently written solver with support for multi-threading and use it as a Poisson solver, to be called at each timestep. In terms of ease of implementation and potential for parallelism, we started work on using the Multigrid method as a Solver, with the eventual aim of using it as a preconditioner for the Biconjugate Gradient or Conjugate Gradient (which needs the matrix A to be symmetric) method (refer Chapter 6). The current work still uses Multigrid as a Solver, as described in Chapter 5.

4.2 Results

We carry out the performance investigation on the Ganga system at the Department of Computer Science and Engineering at Penn State. The configuration is as shown in Table 4.3.

CPU                  Intel Xeon E5-2620
Sockets              2
Cores (per socket)   6
Threads per core     2
Logical CPUs         24
Compiler             GCC 4.4.7

Table 4.3. Details of Performance Setup on a Single Node on Ganga Cluster.

We generated a grid with a coarse size of 3 and 8 multigrid levels. This corresponds to 257 grid points on one side. We carry out 500 Poisson timesteps and time the subcomponent functions as well.

32 The timings for the subroutines and the total time as well as the calculated speedups are shown in Table 4.4.

Threads            1            2           4           8           16
Release Particles  0.0188       0.0104      0.0097      0.0103      0.0119
SwapGaps           0.0004       0.0006      0.0011      0.0029      0.0038
Resort             0.0422       0.0369      0.0420      0.0560      0.1447
ResetRHS           9.9935       5.2079      4.4904      4.1384      4.2552
UpdateChargeFrns   1.2772       0.6478      0.4329      0.2637      0.2319
Solve              22360.6440   8211.8040   4222.3612   2191.7874   2173.6223
CalcElecField      86.4144      36.1461     20.4654     16.0485     19.0116
MoveParticles      0.8549       0.3531      0.1991      0.1040      0.0745
Total              22459.2454   8254.2068   4248.0018   2212.4112   2197.3559
Speedup            1            2.7209      5.2870      10.1514     10.2210

Table 4.4. Scaling results for the Poisson Run with 500 Timesteps. Times are in seconds.

To get an overall idea of the scaling, we plot the speedup in Figure 4.5.

Figure 4.5. Graph indicating scaling for the Poisson Run (speedup vs. number of threads).

Interestingly, we notice scaling that is better than expected, which then seems to saturate by 16 threads. The better-than-expected scaling can probably be explained by improved cache locality for the spawned threads if they get mapped to different cores and see less interference from other threads writing to shared cache lines. The overall scaling does not seem to increase beyond 8 threads, and the timings do not change much for any routine between 8 and 16 threads. Since the Solver contributes the largest share of the time here, we describe the Solver implementation and its performance next.

Chapter 5 | Multigrid Method

We used the Multigrid method as the Solver here, due to its ease of implementation and its parallelizable steps (Section 5.5). Multigrid methods are well adapted to elliptic problems such as Equation (2.1), as outlined in [13]. The method has different "cycles", i.e. orders of visiting the fine and coarser grids in the hierarchy, known as V-Cycles, W-Cycles and F-Cycles. We chose to implement the V-Cycle, since its implementation is straightforward. The Multigrid method uses a hierarchy of grids, with the number of "cells" on one side usually halved on each successively coarser level. Broadly speaking, this hierarchy of grids is useful in spreading information across the grid quickly. More rigorous mathematical analysis shows that the Multigrid method obtains its efficiency by estimating the error on different grids and carrying out corrections with this estimate.

5.1 Methodology

The basic steps of a V-Cycle Multigrid are well known [13] and we use a two-level grid approach whose steps are:

1. Pre-Smoother: Carry out a few iterations of a smoother (typically a choice among damped Jacobi, Gauss-Seidel and Chebyshev iterations) to smoothen out the higher frequency errors.

2. Restrict Residual: Calculate the residual on some grid level i and transfer this to the coarser grid. This operation is typically called a Restriction.

35 3. Recurse/Direct Solve: Redo the same procedure on the coarser grid. The low frequency errors in the fine grid now become high frequency errors on the coarse grid and are effectively damped. If the coarse grid is the last level grid, then do a Direct Solve (with the LU method, say).

4. Prolongation: The coarse grid solution is now the correction for the finer level grid. Interpolate these corrections back to the fine grid and correct the fine grid solution. This operation is typically called a Prolongation.

5. Post-smoother: Run the iterations of the smoother to spread the information and damp out high frequency errors again.

We further explain each of the steps in greater detail.

5.1.1 Pre/Post Smoother

The role of the smoother is typically to smoothen out the high frequency errors. We chose to use the Red-Black Gauss Seidel smoother since it is amenable to parallelization. If the Multigrid method is used as a preconditioner for a method like Conjugate Gradient, we need to satisfy additional restrictions on the pre and post-smoother as outlined in [14] to ensure that the Multigrid method is effectively symmetric.
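A sketch of one red-black sweep with OpenMP follows (the colour of a node is taken as the parity of i + j + k, and the scaling of b is as in the earlier Gauss-Seidel sketch; this is our rendering, not the exact ES-PICbench routine):

#include <stddef.h>
#include <omp.h>

/* One red-black Gauss-Seidel sweep: nodes whose (i + j + k) parity equals
 * `colour` are updated in parallel, since same-colour nodes never neighbour
 * each other under the 7-point stencil. Call with colour 0, then colour 1. */
void rb_gs_sweep(double *x, const double *b, size_t N, double h, int colour)
{
    const size_t s_i = N * N, s_j = N;

    #pragma omp parallel for
    for (size_t i = 1; i < N - 1; i++)
        for (size_t j = 1; j < N - 1; j++) {
            /* choose the first k in {1,2} so that (i + j + k) % 2 == colour */
            size_t k0 = 1 + ((i + j + (size_t)colour + 1) % 2);
            for (size_t k = k0; k < N - 1; k += 2) {
                size_t c = s_i * i + s_j * j + k;
                x[c] = ( x[c - s_i] + x[c + s_i]
                       + x[c - s_j] + x[c + s_j]
                       + x[c - 1]   + x[c + 1]
                       - h * h * b[c] ) / 6.0;
            }
        }
}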

5.1.2 Restriction

The residual (denoted by r = b − Ax) is calculated at each of the nodal points. We then "restrict" or transfer these residuals to the coarser grid, based on some form of weighted averaging. We used the Restriction matrix (R) from [15] but implemented it in a matrix-free manner. This is equivalent to carrying out a filter-like operation on all interior points of the grid.

5.1.3 Recurse/Direct Solve

In the two-level grid approach, each level recurses to the coarser level. If the final level is the last-level grid, then the system of equations is solved exactly using a Direct Solve. In the current code, we implemented an LU-based method for solving directly. The matrix A (for the coarsest level) is converted and stored in its LU form, using the same memory as the original matrix. Our method is still matrix-free, since we store the A matrix in its LU form for the coarsest level only. Once the coarse-level solution is known, the next step is to prolongate.

5.1.4 Prolongation

The coarse level solution represents the “error” correction for the finer level grid. These are then interpolated onto the finer grid and we followed the same procedure of representing the Prolongation matrix (P ) from [15] in a matrix free manner. An important condition is that the Prolongation operator is the transpose of the Restriction operator [13], i.e.

P = c (R)ᵀ        (5.1)

This is also one of the conditions to ensure that the Multigrid method remains symmetric if used as a preconditioner. In practice, we observed that convergence is accelerated when this condition is enforced as well. Using the “error” from the coarser grid, we prolongate the error at each point, based on the coarser grid solution and add it to the fine grid solution value.

5.1.5 Matrix-Free Approach

We implemented the multigrid solver in a matrix-free manner as well for the same advantages outlined in Section 4.1.1. All steps of the Multigrid method are amenable to this as well.

1. Pre/Post-smoother: We implement the Gauss-Seidel approach using the 7-pt stencil as shown in Figure 4.2 for the interior nodes. Since the interior stencil coefficients are constant, we represent the matrix-vector operation directly in terms of loops. Both Neumann and Dirichlet conditions are enforced as part of the Smoother.

2. Restriction: The R matrix can be implemented in a matrix free manner. When Neumann conditions are involved, there is normally a need for Restriction on the boundaries. But due to the matrix-free smoother implementation, the residual on the boundaries was enforced to be zero, based on the order of visiting nodes. (Refer Section 5.3.2).

37 3. Recurse/Direct Solve: We construct the matrix for the coarsest level alone. We pass in the number of points (on one side of the cube) as a parameter (usually 3 or 5 points) and construct the coarse matrix appropriately and convert and store in its LU form (done only once).

4. Prolongation: Again we implement this in a matrix-free manner, which required separate checks for whether the fine grid node lies in the interior, on a face, or on an edge of the coarse grid cell.

We prototyped the initial version of the Multigrid solver in MATLAB® and built it up from a simple 1D version to a 3D version, with and without a matrix-based approach. The matrix-free approach was consistently quicker (especially in 3D), so its benefit in terms of performance was clear. There are some caveats to such an approach, though. When the A matrix is explicitly stored (a matrix-based approach), the smoother, residual calculation and coarse-level matrix construction can be specified in terms of matrix operations alone, so any change in A automatically carries through to these operations. With a matrix-free approach, however, the smoother, residual calculation and coarse matrix construction have to be updated individually, since a change in the matrix coefficients means an explicit change in the way these operations are implemented. This was a bit error-prone initially, but we did not see a better way to automatically handle all changes in the matrix. With all the stages of the multigrid method defined, the overall structure of the V-Cycle is shown in Algorithm 13.

Algorithm 13 Steps in the V-Cycle Multigrid Method

1: procedure VCycle(u, b, r, l, LU)
2:     if l < maxLevel then
3:         u[l] ← 0                                 ▷ Reset lower level solution to 0.
4:     end if
5:     if l == 0 then
6:         LUSolve(u[0], b[0], LU)
7:         return u[0]
8:     end if
9:     Pre-Smoother(u[l], b[l])
10:    GetResidual(u[l], b[l], r[l])
11:    Restrict(r[l], r[l − 1])
12:    VCycle(u, b, r, l − 1, LU)
13:    Prolongate(u[l − 1], u[l])
14:    Post-Smoother(u[l], b[l])
15:    return u[l]
16: end procedure
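A condensed C rendering of Algorithm 13 is sketched below. The helper functions correspond to the steps named above; their names and exact signatures are our assumptions, not the ones used in ES-PICbench.

/* Hypothetical helpers corresponding to the steps of Algorithm 13. */
void zero_grid(double *v, int level);                /* set a level's values to 0 */
void lu_solve(double *u0, const double *b0, const double *LU);
void smooth(double *u, const double *b, int level, int sweeps);
void get_residual(const double *u, const double *b, double *r, int level);
void restrict_residual(const double *r_fine, double *rhs_coarse, int fine_level);
void prolongate_and_correct(const double *u_coarse, double *u_fine, int fine_level);

/* One V-cycle over the grid hierarchies u[], b[], r[] (arrays of arrays),
 * recursing from level l down to a direct solve on level 0. */
void vcycle(double **u, double **b, double **r, int l, const double *LU,
            int max_level, int nsmooth)
{
    if (l < max_level)
        zero_grid(u[l], l);          /* reset the coarse-level correction */

    if (l == 0) {                    /* coarsest grid: solve exactly */
        lu_solve(u[0], b[0], LU);
        return;
    }

    smooth(u[l], b[l], l, nsmooth);            /* pre-smoother                    */
    get_residual(u[l], b[l], r[l], l);         /* r = b - A u (matrix-free)       */
    restrict_residual(r[l], b[l - 1], l);      /* restricted residual becomes the */
                                               /* right-hand side one level down  */
    vcycle(u, b, r, l - 1, LU, max_level, nsmooth);

    prolongate_and_correct(u[l - 1], u[l], l); /* coarse solution corrects fine   */
    smooth(u[l], b[l], l, nsmooth);            /* post-smoother                   */
}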

5.2 Implementation

We represent the hierarchy of grids as a double** in the code, i.e. an array of arrays. We generate and store only the coarsest-level A matrix, in its LU form. We store a hierarchy of grids for the quantities u, b, r, corresponding to the solution (x previously), the RHS quantities and the residual respectively. Although it is not necessary to store the hierarchy for r, the residual vector, the alternative would have been to malloc a vector at each level and free it later. Since repeated malloc calls can be costly (for heap memory allocation and consolidation), we preferred to preallocate the residual vectors. So if we instantiate the solver with a coarse grid of 3 points and 3 levels, the layout is as shown in Figure 5.1 (for simplicity, showing a 2D case):

Figure 5.1. Allocation of hierarchy of grids for a 2D face.

The number of points at levels 0, 1, 2 would be (3, 5, 9) respectively. The corresponding V-Cycle is shown in Figure 5.2.
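In general, with n0 points per side on the coarsest grid, level l has

n_l = (n_0 − 1) × 2^l + 1

points per side, which for n0 = 3 gives (3, 5, 9) at levels 0, 1, 2.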

Figure 5.2. Form of the V-Cycle.

5.3 Validation

5.3.1 Dirichlet conditions

We validate the Multigrid solver with Dirichlet conditions by solving the Laplace equation for a specific analytic function. If we choose an analytic function that satisfies the Laplace equation and enforce it on the boundaries, then the same analytic function must be reproduced at all interior points. We can easily verify this by assuming two different solutions Φ1(x, y, z) and Φ2(x, y, z) for the same boundary conditions, i.e.

∇²Φ1 = 0, with Φ1 = f1(x, y, z) on the boundaries
∇²Φ2 = 0, with Φ2 = f1(x, y, z) on the boundaries

Now consider the difference between the two solutions; it satisfies the Laplace equation with zero values on the boundaries:

∇²(Φ1 − Φ2) = 0, with (Φ1 − Φ2) = 0 on the boundaries

The Laplace equation makes each point the average of its neighboring points, so no interior point can exceed the values of its neighbors (the maximum principle). Starting from the nodes closest to the boundary, we can see that the final solution must be zero at all interior points (since all points on the boundaries are zero), i.e.

Φ1(x, y, z) − Φ2(x, y, z) = 0   ⇒   Φ1(x, y, z) = Φ2(x, y, z)

So the solution is unique and must match the analytic function itself. We use the function f1(x, y, z) = x² − 2y² + z² in this case, since it satisfies the Laplace equation. We run the solver until the initial residual is reduced by a factor of 10⁻⁸ (relative tolerance). We then compare the final solution against the analytic one and calculate the error norm. The results are outlined in Table 5.1 and the iterations taken are plotted in Figure 5.3.

Points on one side   Iterations Taken   Error Norm    Time Taken (sec)   Residual Reduction
5                    3                  3.04e-10      9.8e-05            0.0054
9                    7                  7.657e-08     2.57e-04           0.0706
17                   8                  5.273e-07     0.0014             0.1064
33                   8                  3.2042e-06    0.0102             0.1100
65                   8                  1.145e-05     0.1897             0.1090
129                  8                  3.4980e-05    0.7083             0.1060
257                  8                  1.0100e-04    5.4074             0.1018
Table 5.1. Validation of Multigrid Solver with Dirichlet conditions.
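For concreteness, a minimal sketch of this validation check is shown below: the converged solution is compared point-by-point against the analytic function f1 and the 2-norm of the difference is reported. The grid layout (index i mapped to x, j to y, k to z with spacing h) and whether the thesis reports a raw or scaled norm are assumptions here.

#include <math.h>
#include <stddef.h>

/* Analytic test function: satisfies the Laplace equation. */
static double f1(double x, double y, double z)
{
    return x * x - 2.0 * y * y + z * z;
}

/* 2-norm of the difference between the computed solution u (n points per
 * side, spacing h, flat storage) and the analytic function. */
static double error_norm(const double *u, int n, double h)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++) {
                double diff = u[((size_t)i * n + j) * n + k]
                            - f1(i * h, j * h, k * h);
                sum += diff * diff;
            }
    return sqrt(sum);
}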

Figure 5.3. Iterations taken for varying problem size (x-axis: number of points on one side of the domain; y-axis: iterations taken for convergence).

The plot exhibits the desired features of a Multigrid method. It is well known that the Multigrid method achieves convergence in the same number of iterations (for a given relative tolerance) regardless of problem size. The plot demonstrates this, and the Residual Reduction Ratio stays close to 0.1. This is also consistent with the roughly 8 iterations needed to reduce the residual by a factor of 10⁻⁸, since 0.1⁸ = 10⁻⁸.

5.3.2 Mixed Neumann

In order to simulate the actual problem, we need to incorporate the Neumann boundary conditions. Since the problem has at least a few nodes with Dirichlet conditions (the Capillary and Extractor, from Figure 2.1), there is no null space for the solver to account for [16]. To validate the solver in this case, we used a problem setup of a 3D cube with the X = 0 face at the capillary voltage (0 V), the X = END face at the Extractor voltage (−1350 V), and homogeneous Neumann conditions on all other faces. First-order Neumann boundary conditions were used for ease of implementation. The analytic solution in this case is a voltage that varies linearly from the X = 0 face to the X = END face. We used the same residual tolerance as earlier and calculated the error norm with respect to the analytical solution. The results are shown in Table 5.2 and Figure 5.4.

Points on one side   Iterations Taken   Error Norm   Time Taken (sec)   Residual Reduction
5                    17                 4.207e-05    2.748e-04          0.3326
9                    17                 2.267e-04    6.3e-04            0.3475
17                   17                 0.002083     0.002988           0.3796
33                   17                 0.0293295    0.020375           0.4330
65                   18                 0.178236     0.4610             0.4914
129                  19                 0.9275       2.0724             0.5414
257                  19                 6.8018       14.675             0.5806
Table 5.2. Validation of the Multigrid solver with Neumann Boundary Conditions.

Figure 5.4. Iterations taken for varying problem size (x-axis: number of points on one side of the domain; y-axis: iterations taken for convergence).

We see that the error norm is noticeably larger, and so is the Residual Reduction. It is possible that the larger error norm is due to the first-order approximation, although by manual inspection of grid point values we consistently observed very close agreement with the analytical solution. The residual reduction ratio has also increased, but we still see nearly constant convergence behavior. The increase in residual reduction is possibly a known feature of the Multigrid method when Neumann boundary conditions are introduced; extra effort must be made to maintain convergence with Neumann boundary conditions, and according to [17], seemingly straightforward attempts to extend a Dirichlet-based code to Neumann conditions do not perform as well. While developing the Multigrid code (using MATLAB), we sometimes also observed poor convergence with Neumann conditions when using a matrix-based approach. This was because the residuals were non-zero on the boundaries of the finer grid and these quantities were not restricted to the coarser grid. By applying a 2D Restriction operator on the boundary faces, we obtained much better convergence. In the matrix-free version, however, by choosing the order of visiting the nodes, we can guarantee that the residuals stay zero at the boundary nodes. In this case, once an interior point value is updated, we copy the value to an adjacent Neumann node if it exists. By using this approach, we ensure that residuals on the boundaries stay at zero and no separate restriction operation is necessary there. A sketch of this boundary treatment is shown below.
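A minimal sketch of the boundary treatment, for one face only. In the actual code the copy happens node-by-node inside the sweep; here it is written as a separate pass over the Y = 0 face for brevity, and the indexing macro and array name are illustrative.

#include <stddef.h>
#define IDX(i, j, k, n) (((size_t)(i) * (n) + (j)) * (n) + (k))

/* First-order homogeneous Neumann condition on the Y = 0 face:
 * copying the adjacent interior value to the boundary node enforces
 * dPhi/dn = 0 and keeps the boundary residual at zero. */
static void copy_to_neumann_y0(double *u, int n)
{
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            u[IDX(i, 0, k, n)] = u[IDX(i, 1, k, n)];
}

The other Neumann faces follow the same pattern.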

5.3.3 Small domain length issue

For the initial Laplace solve, we observed that beyond a certain point the residuals never decreased and the Residual Reduction Ratio stayed at 1. This was tracked down to the domain spacing h being so small that residual calculations of the form rᵢ = bᵢ − (uᵢ₋₁ + uᵢ₊₁ + ...)/h² produced spurious values due to the 1/h² term. A current workaround for this issue is to iterate only up to a MAX_ITER value, at which point the solution appears converged. Subsequent Poisson iterations did not trigger the MAX_ITER condition, so we encounter this situation only once.
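A sketch of the resulting outer loop is shown below: V-cycles are applied until the relative tolerance is met or MAX_ITER cycles have run. The helper names vcycle and residual_norm, the MAX_ITER value, and the argument lists are hypothetical.

#define MAX_ITER 50          /* illustrative cap; the actual value may differ */
#define REL_TOL  1e-8

/* Hypothetical helpers standing in for the V-cycle and residual routines. */
void   vcycle(double **u, double **b, double **r, int level, void *lu);
double residual_norm(const double *u, const double *b, int level);

void mg_solve(double **u, double **b, double **r, int levels, void *lu)
{
    int top = levels - 1;
    double r0 = residual_norm(u[top], b[top], top);
    for (int it = 0; it < MAX_ITER; it++) {
        vcycle(u, b, r, top, lu);
        if (residual_norm(u[top], b[top], top) <= REL_TOL * r0)
            break;           /* converged to the relative tolerance */
    }
}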

5.3.4 Troubleshooting Multigrid

Debugging Multigrid methods is particularly challenging, as the fine-grid error correction can mask issues and produce correct results even when there are bugs [17]. One of the first clues that our initial multigrid implementation was incorrect was the non-constant convergence, i.e., an increasing number of iterations with respect to problem size. Multigrid properties are well studied, so the lack of constant convergence is an important marker. In general, we noticed that the bug-prone areas tend to be the Smoother, the Residual Calculation, and the coarse matrix construction, partly due to the matrix-free nature of the implementation. The Restriction and Prolongation in particular are very important, and convergence improved significantly when Equation (5.1) was imposed. The Residual Reduction Ratio is also an important clue; typical values range from about 0.09–0.1 up to roughly 0.6, and much larger residual reduction values indicate a misbehaving step of the process. We found that the most useful way to develop the code was to build it up from a one-dimensional case and write test cases for the Smoother, Restriction, and Prolongation steps to tease apart the effect of each component as we add layers of complexity. Since the Multigrid method is very robust, it can still converge with an incorrect implementation.

5.4 Comparison with PETSc

We ran the matrix-free version of the code on a single thread on a problem setup similar to Section 5.3.2 and compared it against PETSc 3.6.2 using the options -ksp_type richardson (to use multigrid as a matrix-based solver), -pc_type mg, and -da_refine 6 to use 6 levels of multigrid, which corresponds to 65 nodes on one side. The standalone code took 0.46 seconds, whereas the PETSc code took around 10 seconds. We attribute this difference to the matrix-free approach in our code versus the matrix-based approach in PETSc, so this is not an entirely fair comparison. Although PETSc has a matrix-free example, its governing equation is different and we did not have time to adapt it to the current problem. With increasing multigrid levels, the difference in time between our code and PETSc became more prominent, which is expected. We feel that this comparison validates the advantages of a matrix-free implementation tuned to the problem of interest.

5.5 Parallelization

Almost all of the steps of the Multigrid method are parallelizable. Here we outline the steps taken to parallelize the various subcomponents of the method.

5.5.1 Smoother

The Red-Black Gauss-Seidel smoother is typically a good choice for parallel smoothing, with the coloring on alternate cells as shown in Figure 5.5.

Figure 5.5. Red-Black coloring.

The main barrier to parallelization in a typical Gauss-Seidel smoothing scheme is the dependency between neighboring points: the previous value has to be evaluated and updated before moving on to the next one. In a Red-Black scheme, either a Red or a Black sweep (where a sweep is an iteration over the nodes of that particular color) is carried out first. Since a nodal point has no neighbors of the same color, we avoid this dependency, which facilitates straightforward parallelization. A sweep over the nodes of the other color is then carried out, as shown in Algorithm 14.

Algorithm 14 Red-Black Gauss-Seidel Outline.
procedure RedBlackGS(x, b)
    parfor p ∈ RedNodes do                 ▷ Red sweep
        p.val = Smoothen(p, neighbors(p))
    end parfor
    Synch()                                ▷ Synchronize threads here
    parfor p ∈ BlackNodes do               ▷ Black sweep
        p.val = Smoothen(p, neighbors(p))
    end parfor
end procedure
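A minimal OpenMP sketch of this scheme is shown below, coloring node (i, j, k) red when i + j + k is even and black when it is odd, so that all updates inside one parallel loop are independent. As before, the array names, grid size n, spacing h, and stencil sign convention are assumptions rather than the actual ES-PICbench code.

#include <stddef.h>
#define IDX(i, j, k, n) (((size_t)(i) * (n) + (j)) * (n) + (k))

/* One Red-Black Gauss-Seidel sweep: color 0 (red) first, then color 1 (black).
 * The implicit barrier at the end of each parallel-for separates the sweeps. */
static void red_black_sweep(double *u, const double *b, int n, double h)
{
    const double h2 = h * h;
    for (int color = 0; color < 2; color++) {
        #pragma omp parallel for collapse(2) schedule(static)
        for (int i = 1; i < n - 1; i++)
            for (int j = 1; j < n - 1; j++) {
                int k0 = 1 + ((i + j + 1 + color) % 2);   /* first k of this color */
                for (int k = k0; k < n - 1; k += 2) {
                    double nbrs = u[IDX(i - 1, j, k, n)] + u[IDX(i + 1, j, k, n)]
                                + u[IDX(i, j - 1, k, n)] + u[IDX(i, j + 1, k, n)]
                                + u[IDX(i, j, k - 1, n)] + u[IDX(i, j, k + 1, n)];
                    u[IDX(i, j, k, n)] = (nbrs - h2 * b[IDX(i, j, k, n)]) / 6.0;
                }
            }
    }
}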

5.5.2 Direct Solve

We use the LU method for the direct solve, and this step is not parallelized, since the forward and backward substitutions need the values of previously computed variables. We use the direct solve only on the coarsest level, so the constructed matrix is not very large and the time taken for a single direct solve is very small as well. We therefore let a single thread carry out the direct solve.

5.5.3 Remaining steps

The remaining steps, such as the calculation of the residual, Restriction, and Prolongation, are all straightforwardly parallelizable with OpenMP. The Restriction and Prolongation in particular are like filter operations on an image and can be expected to scale well. Since the parallelization is straightforward, we do not explicitly outline it here; a brief sketch of a parallel restriction pass is shown below.
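As an example of how simply these passes parallelize, the sketch below shows a restriction pass in which each coarse node is computed independently. Plain injection is used here for brevity, whereas the actual operator uses the weighted restriction discussed earlier; the names and layout are illustrative.

#include <stddef.h>
#define IDX(i, j, k, n) (((size_t)(i) * (n) + (j)) * (n) + (k))

/* Restrict a fine grid (nf = 2*nc - 1 points per side) to a coarse grid with
 * nc points per side.  Every coarse node is independent, so a parallel-for
 * over the coarse nodes is sufficient. */
static void restrict_injection(const double *fine, double *coarse, int nc)
{
    int nf = 2 * nc - 1;
    #pragma omp parallel for collapse(2) schedule(static)
    for (int I = 0; I < nc; I++)
        for (int J = 0; J < nc; J++)
            for (int K = 0; K < nc; K++)
                coarse[IDX(I, J, K, nc)] = fine[IDX(2 * I, 2 * J, 2 * K, nf)];
}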

5.5.4 Results

We show the total timing runs for a Multigrid Dirichlet problem similar to Section 5.3.1 in Table 5.3. We used 9 multigrid levels and a coarse grid of 3 points on each side, which means we have (3 − 1) × 2⁸ + 1 = 513 nodal points on one side at the finest level. We chose a higher number of levels to encourage more work per thread, and varied the thread count from 1 to 8. Although 16 threads can be supported on the LionXG cluster (2 sockets, 8 cores per socket, 1 thread per core), we found that such job requests were never granted, so we present results for up to 8 threads (within one socket). The corresponding speedup plot is shown in Figure 5.6.

Number of Threads   Time Taken (sec)   Speedup
1                   44.9766            1
2                   24.3285            1.8487
4                   16.4777            2.7295
8                   18.5025            2.4308
Table 5.3. Multigrid Dirichlet with 9 levels and 3 points on coarsest level.

Figure 5.6. Multigrid Dirichlet (9 levels, 3 coarse points) speedup graph (x-axis: number of threads; y-axis: speedup for MG Dirichlet setup).

We see that the scaling performance is not linear. Since the Multigrid method consists of many substeps, we need to look at the timings of each component and check its scaling behavior as well. The timing results for this case are investigated next.

5.5.5 SubComponent Timings

We further added timing routines (reporting seconds) to each of the steps. The variation of the time taken by each component of the runs in Table 5.3 with the number of threads is shown in Table 5.4.

Threads          1         2        4        8
Pre-Smoother     12.3647   6.7035   5.2014   6.8667
CalcResidual     2.7084    1.5016   1.0157   1.1127
Restriction      2.3214    1.1772   0.5994   0.3203
Recurse          5.6219    3.1719   2.0034   1.4651
Prolongation     7.2641    3.7406   1.8728   0.9530
Post-Smoother    12.3879   6.8248   5.0875   6.8901
CalcResidual     2.3078    1.2080   0.6960   0.8903
Table 5.4. Timings for the individual subcomponent steps in the Multigrid Dirichlet case.

From a preliminary analysis, we see that the Smoother does not scale as well as we would expect, and neither does the residual calculation. Since the residual calculation contributes a smaller percentage of the overall time, we do not currently consider its scaling.

5.5.5.1 Smoother

The smoother operations take the most time and so have the biggest impact on overall scalability. The pre- and post-smoother exhibit the same behavior, and the speedup results (calculated from Table 5.4) are shown in Table 5.5 and plotted in Figure 5.7.

Number of Threads   Speedup
1                   1
2                   1.8445
4                   2.3772
8                   1.8007
Table 5.5. Scalability study of Smoother from the Subcomponent Timings.

Figure 5.7. Scalability plot of Smoother from the Subcomponent Timings (x-axis: number of threads; y-axis: speedup of Smoother).

The speedup of the smoother closely follows that of the total time, so the smoother performance is not optimal and needs further investigation to understand why. The scaling of the Poisson simulation is thus limited by the scaling of the Solver.

5.5.5.2 Restriction and Prolongation

The Restriction and Prolongation are expected to scale well since they are similar to filter operations. The speedup values for each of the operations are calculated from Table 5.4 and outlined in Table 5.6 and plotted in Figure 5.8.

Number of Threads   Restriction   Prolongation
1                   1             1
2                   1.9719        1.9420
4                   3.8729        3.8788
8                   7.2477        7.6227
Table 5.6. Tabulation of Restriction and Prolongation Scalability.

Figure 5.8. Plot of Restriction and Prolongation Scalability (left panel: Restriction speedup; right panel: Prolongation speedup; x-axis: number of threads).

5.5.6 Smoother Performance

From Section 5.5.5.1, it is clear that we need to analyze the smoother further. We wanted to investigate whether cache performance was the issue, and used Cachegrind [18] to instrument the run. To interpret the profile results, we need to be aware of the memory hierarchy of the LionXG cluster, which is shown in Table 5.7.

L1D cache   32K
L1I cache   32K
L2 cache    256K
L3 cache    20480K
Table 5.7. Details of the Memory Hierarchy for LionXG.

If 8 multigrid levels are allocated with 3 coarse grid points per side, then the finest level has (2⁸ + 1) = 257 nodal points on one side. A 3D array of this size (each entry a double) corresponds to roughly 129 MB, which far exceeds the 20 MB last-level cache shown above. It is therefore possible that performance degrades due to cache fetches. To verify this, we ran Cachegrind on a sample problem that used only the Red-Black Gauss-Seidel smoother to solve the Dirichlet problem, similar to Section 5.3.1. Although the run exceeded the job length limit of around 4 hours, we still obtained cache statistics, which are highlighted in Table 5.8.
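As a quick back-of-the-envelope check (interpreting the 129 MB figure above in binary units):

257³ × 8 bytes ≈ 1.36 × 10⁸ bytes ≈ 129 MiB,

which is more than six times the 20 MiB (20480K) L3 cache.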

Quantity        Cumulative %   Component 1            Component 2
D1 miss rate    11.2           12.3% (read misses)    0% (write misses)
LLd miss rate   5.6            6.1% (read misses)     0% (write misses)
LL miss rate    1.5            1.5% (read misses)     0% (write misses)
Mispred rate    1.1            1.1% (cond branches)   4.1% (indirect)
Table 5.8. Cachegrind Performance Results.

From the above table, it is clear that cache performance is not the issue in the current setup. In order to obtain timing statistics, we set up another smoother scalability test with 7 multigrid levels (corresponding to 129 points per side), since the test takes too long for a grid with 8 levels (257 grid points). This ran to completion, and the timings are shown in Table 5.9 and plotted in Figure 5.9.

Number of Threads   Time Taken (sec)   Speedup
1                   129.2950           1.0000
2                   60.1077            2.1511
4                   29.4238            4.3942
8                   15.5470            8.3164
Table 5.9. Standalone Smoother Scalability Setup with 7 Multigrid levels and 3 coarse grid points.

Figure 5.9. Smoother Scalability with 7 Multigrid levels (x-axis: number of threads; y-axis: speedup of standalone smoother).

We can see that the speedup is good up to 8 threads, so we still do not know why the smoother performance degrades when it is used as part of the Multigrid method. It is possible that factors such as a busy node play a role during a timing run, but multiple timing runs of the Smoother within Multigrid reproduced the degraded performance. We need to investigate further why the smoother does not perform as well when used as part of the Multigrid method.

5.6 Code Results

So far we have only discussed the implementation and performance details. In this section, we present the actual outputs from the code. The output files are written in VTK format and post-processed using VisIt [19]; a sketch of the output routine is given below.
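A minimal sketch of such an output routine is shown below, writing the nodal potential as a legacy-format VTK structured-points file that VisIt can open. The file path, the field name "potential", and the assumption that the solution array is stored with x varying fastest (VTK's ordering) are illustrative rather than the actual ES-PICbench output code.

#include <stdio.h>
#include <stddef.h>

/* Write an n x n x n nodal field with spacing h to a legacy VTK file. */
static int write_vtk(const char *path, const double *u, int n, double h)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    size_t total = (size_t)n * n * n;
    fprintf(f, "# vtk DataFile Version 3.0\npotential\nASCII\n");
    fprintf(f, "DATASET STRUCTURED_POINTS\n");
    fprintf(f, "DIMENSIONS %d %d %d\n", n, n, n);
    fprintf(f, "ORIGIN 0 0 0\nSPACING %g %g %g\n", h, h, h);
    fprintf(f, "POINT_DATA %zu\nSCALARS potential double 1\nLOOKUP_TABLE default\n",
            total);
    for (size_t idx = 0; idx < total; idx++)
        fprintf(f, "%g\n", u[idx]);   /* values in VTK's x-fastest order */
    fclose(f);
    return 0;
}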

5.6.1 Potential Distribution

The potential distribution and electric field are shown in Figures 5.10 and 5.11.

Figure 5.10. Potential Distribution at Timestep 50.
Figure 5.11. Electric Field at Timestep 50.

We can clearly see in Figure 5.11 that the electric field lines form “streamlines” that start from the capillary and move towards the Extractor, qualitatively indicating that the field is correct. However, the vectors in the plot are not scaled by magnitude; if they were, the only noticeable vectors would be near the capillary and the rest would be negligible.

5.6.2 Effect of Particles on Potential

The potential contours at timesteps 50 and 250 are shown:

Figure 5.12. Potential Contours at Timestep 50.
Figure 5.13. Potential Contours at Timestep 250.

Although it is not clearly visible in the images, the contours shift slightly to the right with a minor “bump”, which indicates the effect of the charges. By solving the Poisson equation on the grid, the charge-to-charge interaction is also captured, and this causes a slight expansion of the potential region.

Chapter 6 | Conclusions and Future Work

We developed a standalone C code, ES-PICbench, for solving a Particle-In-Cell based problem simulating a colloid thruster. We parallelized the code with OpenMP and implemented a Multigrid solver as well as routines to move particles in an electric field. Our standalone parallel code performs up to 540× faster than the previous solution based on Deal.II. We also demonstrate that our simpler implementation can outperform heavier-duty versions built on libraries, which may have their own dependencies and constraints on parallelization. The scaling performance of our code was limited significantly by the scaling of the Multigrid solver, possibly due to the scaling behavior of the Red-Black Gauss-Seidel smoother. The code for ES-PICbench is available online and hosted at [20] for evaluation and usage.

6.1 Future work

There are several aspects to explore in future work. First, the Multigrid method is often used as a preconditioner for a solver such as Conjugate Gradient, which is known to perform very well for symmetric, positive definite systems [14]. This in turn may require changes in the smoother ordering, to ensure a correctly reversed order of visiting nodes so that the preconditioner remains symmetric and positive definite. If we cannot preserve the symmetric nature, we could instead use the Biconjugate Gradient method as the solver, since that was one of the options in Section 4.1.2. We would also need to analyze the scaling issues with the Solver and resolve them by identifying bottlenecks; currently, the reason for the non-optimal scaling of the Solver is not clear. In a related issue, it is also preferable to avoid atomics and barriers, so thread-local storage arrays should probably be used in Algorithm 11.

Bibliography

[1] Provost, S., C. Theroude, and B. Pezet (2004) “Insight into EADS Astrium Modeling Packages for Electric Propulsion/Spacecraft Interactions,” in Proc. 4th International Spacecraft Propulsion Conference, pp. 1–6.

[2] Wang, P., A. Borner, B. Korkut, Z. Li, and D. Levin (2013) “Simulations of Electrospray in a Colloid Thruster with High Resolution Particle-in-Cell Method,” in Proc. 44th AIAA Plasmadynamics and Lasers Conference, pp. 1–37.

[3] Balay, S., S. Abhyankar, M. F. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, K. Rupp, B. F. Smith, S. Zampini, and H. Zhang (2015) “PETSc Web page,” http://www.mcs.anl.gov/petsc.

[4] ——— (2015) PETSc Users Manual, Tech. Rep. ANL-95/11 - Revision 3.6, Argonne National Laboratory. URL http://www.mcs.anl.gov/petsc

[5] Balay, S., W. D. Gropp, L. C. McInnes, and B. F. Smith (1997) “Efficient Management of Parallelism in Object Oriented Numerical Software Libraries,” in Modern Software Tools in Scientific Computing (E. Arge, A. M. Bruaset, and H. P. Langtangen, eds.), Birkhäuser Press, pp. 163–202.

[6] Korkut, B., P. Wang, Z. Li, and D. Levin (2013) “Three Dimensional Simulation of Ion Thruster Plumes with AMR and Parallelization Strategies,” in Proc. 49th AIAA/ASME/SAE/ASEE Joint Propulsion Conference, pp. 1–23.

[7] Bangerth, W., R. Hartmann, and G. Kanschat (2007) “deal.II – a General Purpose Object Oriented Finite Element Library,” ACM Trans. Math. Softw., 33(4), pp. 24/1–24/27.

[8] Bangerth, W., T. Heister, L. Heltai, G. Kanschat, M. Kronbichler, M. Maier, B. Turcksin, and T. D. Young (2013) “The deal.II Library, Version 8.1,” arXiv preprint 1312.2266v4.

[9] Borner, A. and D. Levin (2013) “Coupled Molecular Dynamics - Three Dimensional Poisson Simulations of Ionic Liquid Electrospray Thrusters,” in Proc. 33rd International Electric Propulsion Conference, pp. 1–17.

[10] Khokhlov, A. M. (1998) “Fully Threaded Tree Algorithms for Adaptive Refinement Fluid Dynamics Simulations,” Journal of Computational Physics, 143(2), pp. 519–543.

[11] “Particle In Cell,” https://mpnl.seas.gwu.edu/index.php/research/modeling-and-simulation/9-particle-in-cell, last accessed March 2016.

[12] “Successive Overrelaxation Method – Wolfram MathWorld,” http://mathworld.wolfram.com/SuccessiveOverrelaxationMethod.html, last accessed March 2016.

[13] Briggs, W. L., “A Multigrid Tutorial,” slides at https://www.math.ust.hk/~mawang/teaching/math532/mgtut.pdf, last accessed March 2016.

[14] McAdams, A., E. Sifakis, and J. Teran (2010) “A parallel multigrid Poisson solver for fluids simulation on large grids,” in Proc. ACM SIGGRAPH Symposium on Computer Animation, pp. 1–10.

[15] Demmel, J., D. Vasileska, and M. Saraniti, “Multigrid Overview,” slides at https://nanohub.org/resources/9130/download/Multigrid_method.pdf, last accessed March 2016.

[16] “Solving PDEs – Firedrake documentation,” http://www.firedrakeproject.org/solving-interface.html#id17, last accessed March 2016.

[17] Brandt, A. and O. E. Livne (2011) Multigrid Techniques, 1984 Guide with Applications to Fluid Dynamics - Revised Edition, SIAM.

[18] Nethercote, N. and J. Seward (2007) “Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation,” in Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 1–12.

[19] Childs, H., E. Brugger, B. Whitlock, J. Meredith, S. Ahern, D. Pugmire, K. Biagas, M. Miller, C. Harrison, G. H. Weber, H. Krishnan, T. Fogal, A. Sanderson, C. Garth, E. W. Bethel, D. Camp, O. Rübel, M. Durant, J. M. Favre, and P. Navrátil (2012) “VisIt: An End-User Tool For Visualizing and Analyzing Very Large Data,” in High Performance Visualization–Enabling Extreme-Scale Scientific Insight, CRC Press, pp. 357–372.

[20] Narayanan, R. K., “Compressed Archive of ES-PICbench code,” https://psu.box.com/s/9821pzzg31f1yptn4g7b5lwpms8itngi, last accessed March 2016.
