High-Performance High-Order Simulation of Wave and Plasma Phenomena
by Andreas Klöckner
Dipl.-Math. techn., Universität Karlsruhe (TH); Karlsruhe, Germany, 2005
M.S., Brown University; Providence, RI, 2006

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Division of Applied Mathematics at Brown University

PROVIDENCE, RHODE ISLAND
May 2010

© Copyright 2010 by Andreas Klöckner

This dissertation by Andreas Klöckner is accepted in its present form by The Division of Applied Mathematics as satisfying the dissertation requirement for the degree of Doctor of Philosophy.

Date ________ Jan Sickmann Hesthaven, Ph.D., Advisor

Recommended to the Graduate Council
Date ________ Johnny Guzmán, Ph.D., Reader
Date ________ Chi-Wang Shu, Ph.D., Reader

Approved by the Graduate Council
Date ________ Sheila Bonde, Dean of the Graduate School

Vitae

Biographical Information
Birth: August 5th, 1977, Konstanz, Germany

Education
2005 – 2010: Ph.D. in Applied Mathematics (in progress), Division of Applied Mathematics, Brown University, Providence, RI. Advisor: Jan Hesthaven
2005: Diplom degree in Applied Mathematics (Technomathematik), Institut für Angewandte Mathematik, Universität Karlsruhe, Germany. Advisor: Willy Dörfler
2001 – 2002: Exchange Student, Department of Mathematics, University of North Carolina at Charlotte, Charlotte, NC
2000: Vordiplom in Computer Science, Universität Karlsruhe, Germany

Experience
6/2006 – 9/2006: J. Wallace Givens Research Associate, Mathematics and Computer Science Div., Argonne Nat’l Laboratory, Illinois. Worked on high-order unstructured electromagnetic simulation of particle accelerators (with Paul Fischer, Misun Min, and colleagues at ANL’s Advanced Photon Source).
2/2005 – 7/2005: Research Associate (Wissenschaftlicher Mitarbeiter), Institut für Angewandte Mathematik, Universität Karlsruhe, Germany. Worked on various extensions of my thesis research (with Willy Dörfler).
5/2002 – 11/2002: Research Intern, DaimlerChrysler Research & Technology, Palo Alto, CA. Worked on driver stress detection, precision GPS, and software infrastructure (with Stefan Schrödl).

Publications
2010: Viscous Shock Capturing with an Explicitly Time-Stepped Discontinuous Galerkin Method. AK, T. Warburton, J.S. Hesthaven. In preparation.
2009: PyCUDA: GPU Run-Time Code Generation for High-Performance Computing. AK, N. Pinto, Y. Lee, B. Catanzaro, P. Ivanov, and A. Fasih. Submitted, available at http://arxiv.org/abs/0911.3456.
2009: Nodal Discontinuous Galerkin Methods on Graphics Processors. AK, T. Warburton, J. Bridge, J.S. Hesthaven. Journal of Computational Physics, Volume 228, Issue 21, 20 November 2009.
2009: Deterministic Numerical Schemes for the Boltzmann Equation. A. Narayan, AK. Brown University Scientific Computing Technical Report 2009-39.
2005: On the Computation of Maximally Localized Wannier Functions. Diplom Thesis, Universität Karlsruhe, Germany.

Acknowledgments

First and foremost, the support of my advisor Jan Hesthaven has been the cornerstone of my working life in the past five years. He was a source of questions, of answers, of inspiration, he encouraged me to be bold in the scientific questions I pursue, all while giving me great freedom in following my interests. He has also patiently put up with the things that turned out not to be so smart in hindsight. Beyond science, he has been a role model and a tremendous influence on my life as a whole. I consider myself lucky to have had him as a mentor.

Over the years, I have worked very closely with Tim Warburton at Rice University on many of the topics that this thesis discusses. His generosity, help, and insight have benefited me in many ways.
Both of the above, along with Chi-Wang Shu and Johnny Guzmán have graciously agreed to serve on my PhD committee, spent time thinking about my work, and provided invaluable feedback.

Throughout my graduate studies, I have had the honor of working on various projects with a large and diverse group of people. Their insights, commentary and encouragement, shared in many conversations, were and continue to be a great asset to my scientific life.

The graduate student and postdoc community at Brown’s Division of Applied Mathematics is a great crowd in which to grow up academically. Many of you have become my friends, and I hope we will be able to stay close even as life scatters us across the globe.

Nvidia Corporation have been very generous with equipment and travel support and were instrumental in initiating, furthering and publicizing the GPU-based part of my research.

Many contributors around the world have created the open-source software and tools on which my work has crucially depended. This notably includes the communities that have formed around my various projects.

Parts of this thesis are based on two publications, [Klöckner et al., 2009b] and [Klöckner et al., 2009a]. My coauthors have contributed considerably to both articles through their ideas, suggestions, and feedback.

Last, but by no means least, my parents Bina and Heinrich Klöckner have, throughout my entire life, given me their unconditional support, advice, and love.

Thank you, all of you.

Some of the computational meshes used in this work were generated using Triangle [Shewchuk, 1996] and TetGen [Si and Gaertner, 2005]. The surface mesh for Figure 5.10 originates in the FlightGear flight simulator and was processed using Blender and MeshLab, a tool developed with the support of the Epoch European Network of Excellence.
Abstract of “High-Performance High-Order Simulation of Wave and Plasma Phenomena” by Andreas Klöckner, Ph.D., Brown University, May 2010

This thesis presents results aiming to enhance and broaden the applicability of the discontinuous Galerkin (“DG”) method in a variety of ways. DG was chosen as a foundation for this work because it yields high-order finite element discretizations with very favorable numerical properties for the treatment of hyperbolic conservation laws.

In a first part, I examine progress that can be made on implementation aspects of DG. In adapting the method to mass-market massively parallel computation hardware in the form of graphics processors (“GPUs”), I obtain an increase in computation performance per unit of cost by more than an order of magnitude over conventional processor architectures. Key to this advance is a recipe that adapts DG to a variety of hardware through automated self-tuning. I discuss new parallel programming tools supporting GPU run-time code generation which are instrumental in the DG self-tuning process and contribute to its reaching application floating point throughput greater than 200 GFlops/s on a single GPU and greater than 3 TFlops/s on a 16-GPU cluster in simulations of electromagnetics problems in three dimensions. I further briefly discuss the solver infrastructure that makes this possible.

In the second part of the thesis, I introduce a number of new numerical methods whose motivation is partly rooted in the opportunity created by GPU-DG: First, I construct and examine a novel GPU-capable shock detector, which, when used to control an artificial viscosity, helps stabilize DG computations in gas dynamics and a number of other fields. Second, I describe my pursuit of a method that allows the simulation of rarefied plasmas using a DG discretization of the electromagnetic field.
Finally, I introduce new explicit multi-rate time integrators for ordinary differential equations with multiple time scales, with a focus on applicability to DG discretizations of time-dependent problems.

Contents

Vitae
Acknowledgments
1 Introduction
  1.1 About this Thesis
  1.2 The Scientific Method and the Computational Experiment
  1.3 An Argument for Hybrid Codes
  1.4 Assembling a Set of Tools
  1.5 Reproducibility for Results in this Thesis
2 Preliminaries
  2.1 The Discontinuous Galerkin Method
    2.1.1 Implementing DG
  2.2 GPU Hardware: A Brief Introduction
    2.2.1 Specifics of Nvidia hardware
3 A Code-Generating Discontinuous Galerkin Solver
  3.1 On the Design of a Discontinuous Galerkin PDE Solver
  3.2 A Language for Discontinuous Galerkin Methods
    3.2.1 Fluxes and Flux-Local Binding
    3.2.2 Common Subexpression Elimination
    3.2.3 An Example
    3.2.4 Discussion
  3.3 The Processing Pipeline
    3.3.1 Type Inference and Operator Specialization
    3.3.2 Optimizations
    3.3.3 Target-Specific Rewriting
  3.4 The Virtual Machine
    3.4.1 The Compilation Step
    3.4.2 The Execution Model
  3.5 Conclusions
4 Code Generation on Graphics Processors
  4.1 Introduction
  4.2 GPU Software Creation
  4.3 Problems Solved by GPU Run-Time Code Generation
    4.3.1 Automated Tuning
    4.3.2 The Cost of Flexibility
    4.3.3 High-Performance Abstractions
    4.3.4 GPUs and the Need for Flexibility
  4.4 PyCUDA: A Scripting-Based Approach to GPU RTCG
    4.4.1 Abstractions in PyCUDA
    4.4.2 Code Generation with PyCUDA
    4.4.3 PyOpenCL: OpenCL and GPU RTCG
  4.5 Successful Applications
  4.6 Conclusions
5 Discontinuous Galerkin Methods on Graphics Processors
  5.1 Introduction
  5.2 DG on the GPU: Design
  5.3 DG on the GPU: Implementation
    5.3.1 How to read this Section
    5.3.2 Flux Lifting
    5.3.3 Flux Extraction
    5.3.4 Element-Local Differentiation
  5.4 Experimental Results
    5.4.1 Further Results: Double Precision, Distributed Computation
  5.5 Conclusions
6 Viscous Shock Capturing in a Time-Explicit Discontinuous Galerkin Method
  6.1 Introduction
  6.2 Basic Design Considerations
  6.3 Applications and Equations
    6.3.1 Advection Equation
    6.3.2 Second-Order Wave Equation
    6.3.3 Burgers’ Equation
    6.3.4 Euler’s Equations of Gas Dynamics