Fast Eigenpairs Computation with Operator Adapted Wavelets and Hierarchical Subspace Correction | SIAM Journal on Numerical Anal
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
Multigrid Methods
Multigrid Methods. Volker John, Winter Semester 2013/14. Contents: 1 Literature; 2 Model Problems; 3 Detailed Investigation of Classical Iterative Schemes (3.1 General Aspects of Classical Iterative Schemes; 3.2 The Jacobi and Damped Jacobi Method; 3.3 The Gauss–Seidel Method and the SOR Method; 3.4 Summary); 4 Grid Transfer (4.1 Algorithms with Coarse Grid Systems, the Residual Equation; 4.2 Prolongation or Interpolation; 4.3 Restriction); 5 The Two-Level Method (5.1 The Coarse Grid Problem; 5.2 General Approach for Proving the Convergence of the Two-Level Method; 5.3 The Smoothing Property of the Damped Jacobi Iteration; 5.4 The Approximation Property; 5.5 Summary); 6 The Multigrid Method (6.1 Multigrid Cycles; 6.2 Convergence of the W-cycle; 6.3 Computational Work of the Multigrid γ-Cycle); 7 Algebraic Multigrid Methods (7.1 Components of an AMG and Definitions; 7.2 Algebraic Smoothness; 7.3 Coarsening; 7.4 Prolongation; 7.5 Concluding Remarks); 8 Outlook. Chapter 1 Literature. Remark 1.1 Literature. There are several textbooks about multigrid methods, e.g., • Briggs et al. (2000), easy to read introduction, • Hackbusch (1985), the classical book, sometimes rather hard to read, • Shaidurov (1995), • Wesseling (1992), an introductory book, • Trottenberg et al. (2001). Chapter 2 Model Problems. Remark 2.1 Motivation. The basic ideas and properties of multigrid methods will be explained in this course on two model problems. -
Accelerating the LOBPCG Method on GPUs Using a Blocked Sparse Matrix Vector Product
Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product. Hartwig Anzt and Stanimire Tomov and Jack Dongarra, Innovative Computing Lab, University of Tennessee, Knoxville, USA. Abstract: This paper presents a heterogeneous CPU-GPU algorithm design and optimized implementation for an entire sparse iterative eigensolver – the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) – starting from low-level GPU data structures and kernels to the higher-level algorithmic choices and overall heterogeneous design. Most notably, the eigensolver leverages the high performance of a new GPU kernel developed for the simultaneous multiplication of a sparse matrix and a set of vectors (SpMM). This is a building block that serves as a backbone for not only block-Krylov, but also for other methods relying on blocking for acceleration in general. The heterogeneous LOBPCG developed here reveals the potential of this type of eigensolver by highly optimizing all of its components, and can be viewed as a benchmark for other SpMM-dependent applications. [Introduction, excerpt:] the computing power of today’s supercomputers, often accelerated by coprocessors like graphics processing units (GPUs), becomes challenging. While there exist numerous efforts to adapt iterative linear solvers like Krylov subspace methods to coprocessor technology, sparse eigensolvers have so far remained outside the main focus. A possible explanation is that many of those combine sparse and dense linear algebra routines, which makes porting them to accelerators more difficult. Aside from the power method, algorithms based on the Krylov subspace idea are among the most commonly used general eigensolvers [1]. When targeting symmetric positive definite eigenvalue problems, the recently developed Locally Optimal -
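The SpMM operation named in the abstract above (one sparse matrix multiplied by a whole block of vectors at once) can be illustrated with a minimal plain-Python sketch. This is not the GPU kernel from the paper: the function name `csr_spmm`, the CSR array names, and the tiny test matrix are all assumptions chosen for illustration. The point it shows is that each stored nonzero is read once and reused for every vector in the block, which is what distinguishes SpMM from repeated SpMV calls.

```python
def csr_spmm(indptr, indices, data, X):
    """Multiply a sparse matrix in CSR form by a dense block X.

    indptr, indices, data: standard CSR arrays (hypothetical names).
    X: list of rows, each row holding one entry per vector in the block.
    Returns Y = A @ X as a list of rows.
    """
    n_rows = len(indptr) - 1
    k = len(X[0])  # number of vectors in the block
    Y = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        row_out = Y[i]
        for p in range(indptr[i], indptr[i + 1]):
            a = data[p]            # one nonzero, loaded once ...
            x_row = X[indices[p]]
            for j in range(k):     # ... and reused for all k vectors
                row_out[j] += a * x_row[j]
    return Y


# Illustrative 3x3 matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form.
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]

# Block of two vectors stored column-wise: (1, 0, 1) and (0, 1, 1).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = csr_spmm(indptr, indices, data, X)
print(Y)
```

Storing the block row by row keeps the innermost loop over contiguous entries; whether that layout matches the paper's kernel is not stated in the excerpt.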
Hybrid Particle-Grid Water Simulation Using Multigrid Pressure Solver Per Karlsson
LiU-ITN-TEK-G--14/006-SE. Hybrid Particle-Grid Water Simulation using Multigrid Pressure Solver. Per Karlsson, 2014-03-13. Department of Science and Technology (Institutionen för teknik och naturvetenskap), Linköping University (Linköpings universitet), SE-601 74 Norrköping, Sweden. Degree project carried out in Media Technology at the Institute of Technology, Linköping University. Supervisor: George Baravdish. Examiner: Camilla Forsell. Norrköping, 2014-03-13. Copyright (translated from Swedish): This document is kept available on the Internet, or its future replacement, for a considerable time from the date of publication, provided that no exceptional circumstances arise. Access to the document implies permission for anyone to read, download, and print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. A later transfer of copyright cannot revoke this permission. All other use of the document requires the author's consent. Technical and administrative measures are in place to guarantee authenticity, security, and accessibility. The author's moral rights include the right to be named as the author to the extent required by good practice when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or distinctive character. For further information about Linköping University Electronic Press, see the publisher's home page http://www.ep.liu.se/ Copyright: The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances. -
Multigrid Algorithms for High Order Discontinuous Galerkin Methods
Multigrid algorithms for high order discontinuous Galerkin methods. Paola F. Antonietti, Marco Sarti, and Marco Verani. Abstract: In this paper we study the performance of h- and p-multigrid algorithms for high order Discontinuous Galerkin discretizations of elliptic problems. We test the performance of the multigrid schemes employing a wide class of smoothers and considering both two- and three-dimensional test cases. 1 Introduction. In the framework of multigrid solvers for Discontinuous Galerkin (DG) schemes, the first contributions are due to Gopalakrishnan and Kanschat [16] and Brenner and Zhao [10]. In [16] a V-cycle preconditioner for a Symmetric Interior Penalty (SIP) discretization of an elliptic problem is analyzed. They prove that the condition number of the preconditioned system is uniformly bounded with respect to the mesh size and the number of levels. In [10] V-cycle, F-cycle and W-cycle multigrid schemes for SIP discretizations are presented and analyzed, employing the additive theory developed in [8, 9]. A uniform bound for the error propagation operator is shown provided the number of smoothing steps is large enough. All the previously cited works focus on low order, i.e., linear, DG approximations. With regard to high order DG discretizations, h- and p-multigrid schemes are successfully employed for the numerical solution of many different kinds of problems, see e.g. [14, 20, 22, 21, 24, 6], even if only few theoretical results are available that show the convergence properties of the underlying algorithms. In the MOX, Dipartimento di Matematica, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano ITALY. -
Hybrid Multigrid/Schwarz Algorithms for the Spectral Element Method
Hybrid Multigrid/Schwarz Algorithms for the Spectral Element Method. James W. Lottes∗ and Paul F. Fischer†, February 4, 2004. Abstract: We study the performance of the multigrid method applied to spectral element (SE) discretizations of the Poisson and Helmholtz equations. Smoothers based on finite element (FE) discretizations, overlapping Schwarz methods, and point-Jacobi are considered in conjunction with conjugate gradient and GMRES acceleration techniques. It is found that Schwarz methods based on restrictions of the originating SE matrices converge faster than FE-based methods and that weighting the Schwarz matrices by the inverse of the diagonal counting matrix is essential to effective Schwarz smoothing. Several of the methods considered achieve convergence rates comparable to those attained by classic multigrid on regular grids. 1 Introduction. The availability of fast elliptic solvers is essential to many areas of scientific computing. For unstructured discretizations in three dimensions, iterative solvers are generally optimal from both work and storage standpoints. Ideally, one would like to have computational complexity that scales as $O(n)$ for an n-point grid problem in $\mathbb{R}^d$, implying that the iteration count should be bounded as the mesh is refined. Modern iterative methods such as multigrid and Schwarz-based domain decomposition achieve bounded iteration counts through the introduction of multiple representations of the solution (or the residual) that allow efficient elimination of the error at each scale. The theory for these methods is well established for classical finite difference (FD) and finite element (FE) discretizations, and order-independent convergence rates are often attained in practice. For spectral element (SE) methods, there has been significant work on the development of Schwarz-based methods that employ a combination of local subdomain solves and sparse global solves to precondition conjugate gradient iteration. -
Multigrid and Saddle-Point Preconditioners for Unfitted Finite
Multigrid and saddle-point preconditioners for unfitted finite element modelling of inclusions. Hardik Kothari∗ and Rolf Krause†, Institute of Computational Science, Università della Svizzera italiana, Lugano, Switzerland. February 17, 2021. Abstract: In this work, we consider the modeling of inclusions in the material using an unfitted finite element method. In the unfitted methods, structured background meshes are used and only the underlying finite element space is modified to incorporate the discontinuities, such as inclusions. Hence, the unfitted methods provide a more flexible framework for modeling the materials with multiple inclusions. We employ the method of Lagrange multipliers for enforcing the interface conditions between the inclusions and matrix; this gives rise to a linear system of equations of saddle point type. We utilize the Uzawa method for solving the saddle point system and propose preconditioning strategies for primal and dual systems. For the dual systems, we review and compare the preconditioning strategies that are developed for FETI and SIMPLE methods, while for the primal system, we employ a tailored multigrid method specifically developed for the unfitted meshes. Lastly, the comparison between the proposed preconditioners is made through several numerical experiments. Keywords: Unfitted finite element method, multigrid method, saddle-point problem. 1 Introduction. In the modeling of many real-world engineering problems, we encounter material discontinuities, such as inclusions. The inclusions are found naturally in the materials, or can be artificially introduced to produce desired mechanical behavior. The finite element (FE) modeling of such inclusions requires generating meshes that can resolve the interface between the matrix and inclusions, which can be a computationally cumbersome and expensive task. -
Solving Symmetric Semi-Definite (Ill-Conditioned)
Solving Symmetric Semi-definite (ill-conditioned) Generalized Eigenvalue Problems. Zhaojun Bai, University of California, Davis. Berkeley, August 19, 2016. Symmetric definite generalized eigenvalue problem • Symmetric definite generalized eigenvalue problem: $Ax = \lambda Bx$, where $A^T = A$ and $B^T = B > 0$ • Eigen-decomposition: $AX = BX\Lambda$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, $X = (x_1, x_2, \ldots, x_n)$, and $X^T B X = I$ • Assume $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. LAPACK solvers • LAPACK routines xSYGV, xSYGVD, xSYGVX are based on the following algorithm (Wilkinson'65): 1. compute the Cholesky factorization $B = GG^T$ 2. compute $C = G^{-1} A G^{-T}$ 3. compute the symmetric eigen-decomposition $Q^T C Q = \Lambda$ 4. set $X = G^{-T} Q$ • xSYGV[D,X] could be numerically unstable if B is ill-conditioned: $|\hat{\lambda}_i - \lambda_i| \lesssim p(n)\,\bigl(\|B^{-1}\|_2 \|A\|_2 + \mathrm{cond}(B)\,|\hat{\lambda}_i|\bigr) \cdot \epsilon$ and $\theta(\hat{x}_i, x_i) \lesssim p(n)\,\dfrac{\|B^{-1}\|_2 \|A\|_2 \,(\mathrm{cond}(B))^{1/2} + \mathrm{cond}(B)\,|\hat{\lambda}_i|}{\mathrm{specgap}_i} \cdot \epsilon$ • User's choice between the inversion of ill-conditioned Cholesky decomposition and the QZ algorithm that destroys symmetry. Algorithms to address the ill-conditioning: 1. Fix-Heiberger'72 (Parlett'80): explicit reduction 2. Chang-Chung Chang'74: SQZ method (QZ by Moler and Stewart'73) 3. Bunse-Gerstner'84: MDR method 4. Chandrasekaran'00: "proper pivoting scheme" 5. Davies-Higham-Tisseur'01: Cholesky+Jacobi 6. Working notes by Kahan'11 and Moler'14. This talk. Three approaches: 1. A LAPACK-style implementation of Fix-Heiberger algorithm 2. An algebraic reformulation 3. Locally accelerated block preconditioned steepest descent (LABPSD). This talk. Three approaches: 1. A LAPACK-style implementation of Fix-Heiberger algorithm. Status: beta-release 2. -
On the Performance and Energy Efficiency of Sparse Linear Algebra on GPUs
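The four-step Cholesky-based reduction listed in the slides above (factor B, form C, solve the standard symmetric problem, back-transform) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the LAPACK xSYGV implementation itself: the function name `sygv_via_cholesky` and the small test matrices are hypothetical, and triangular solves are done with the general `np.linalg.solve` for brevity. As the slides warn, this route can lose accuracy when B is ill-conditioned.

```python
import numpy as np


def sygv_via_cholesky(A, B):
    """Solve A x = lambda B x for symmetric A and symmetric positive definite B.

    Follows the four steps quoted in the slides; the function name is an
    assumption made for this sketch.
    """
    # Step 1: Cholesky factorization B = G G^T (G lower triangular).
    G = np.linalg.cholesky(B)
    # Step 2: C = G^{-1} A G^{-T}, formed with two solves against G.
    GinvA = np.linalg.solve(G, A)          # G^{-1} A
    C = np.linalg.solve(G, GinvA.T).T      # (G^{-1} (G^{-1} A)^T)^T = G^{-1} A G^{-T}
    C = 0.5 * (C + C.T)                    # remove rounding asymmetry
    # Step 3: symmetric eigen-decomposition Q^T C Q = Lambda (ascending order).
    w, Q = np.linalg.eigh(C)
    # Step 4: back-transform, X = G^{-T} Q.
    X = np.linalg.solve(G.T, Q)
    return w, X


# Small well-conditioned example (values chosen only for illustration).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
B = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

w, X = sygv_via_cholesky(A, B)
print("residual norm:", np.linalg.norm(A @ X - B @ X @ np.diag(w)))
print("B-orthonormality error:", np.linalg.norm(X.T @ B @ X - np.eye(3)))
```

Both printed quantities should be near machine precision for this well-conditioned B; repeating the experiment with a nearly singular B is a simple way to observe the instability the slides describe.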
Original Article. The International Journal of High Performance Computing Applications, 1–16. © The Author(s) 2016. DOI: 10.1177/1094342016672081. On the performance and energy efficiency of sparse linear algebra on GPUs. Hartwig Anzt, Stanimire Tomov and Jack Dongarra. Abstract: In this paper we unveil some performance and energy efficiency frontiers for sparse computations on GPU-based supercomputers. We compare the resource efficiency of different sparse matrix–vector products (SpMV) taken from libraries such as cuSPARSE and MAGMA for GPU and Intel’s MKL for multicore CPUs, and develop a GPU sparse matrix–matrix product (SpMM) implementation that handles the simultaneous multiplication of a sparse matrix with a set of vectors in block-wise fashion. While a typical sparse computation such as the SpMV reaches only a fraction of the peak of current GPUs, we show that the SpMM succeeds in exceeding the memory-bound limitations of the SpMV. We integrate this kernel into a GPU-accelerated Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) eigensolver. LOBPCG is chosen as a benchmark algorithm for this study as it combines an interesting mix of sparse and dense linear algebra operations that is typical for complex simulation applications, and allows for hardware-aware optimizations. In a detailed analysis we compare the performance and energy efficiency against a multi-threaded CPU counterpart. The reported performance and energy efficiency results are indicative of sparse computations on supercomputers. Keywords: sparse eigensolver, LOBPCG, GPU supercomputer, energy efficiency, blocked sparse matrix–vector product. 1 Introduction GPU-based supercomputers. -
Multigrid Method Based on a Space-Time Approach with Standard Coarsening for Parabolic Problems
Applied Mathematics and Computation 317 (2018) 25–34. Multigrid method based on a space-time approach with standard coarsening for parabolic problems. Sebastião Romero Franco a,b, Francisco José Gaspar c, Marcio Augusto Villela Pinto d, Carmen Rodrigo e,∗. a Department of Mathematics, State University of Centro-Oeste, Irati, PR, Brazil; b Graduate Program in Numerical Methods in Engineering, Federal University of Paraná, Curitiba, PR, Brazil; c Centrum Wiskunde & Informatica (CWI), Science Park 123, P.O. Box 94079, Amsterdam 1090, The Netherlands; d Department of Mechanical Engineering, Federal University of Paraná, Curitiba, PR, Brazil; e IUMA and Applied Mathematics Department, University of Zaragoza, María de Luna, 3, Zaragoza 50018, Spain. Article info: MSC: 00-01, 99-00. Keywords: Space-time multigrid, Local Fourier analysis. Abstract: In this work, a space-time multigrid method which uses standard coarsening in both temporal and spatial domains and combines the use of different smoothers is proposed for the solution of the heat equation in one and two space dimensions. In particular, an adaptive smoothing strategy, based on the degree of anisotropy of the discrete operator on each grid-level, is the basis of the proposed multigrid algorithm. Local Fourier analysis is used for the selection of the crucial parameter defining such an adaptive smoothing approach. -
Electrostatic PIC with Adaptive Cartesian Mesh
Electrostatic PIC with adaptive Cartesian mesh. Vladimir Kolobov 1,2 and Robert Arslanbekov 1. 1 CFD Research Corporation, Huntsville, AL, USA; 2 The University of Alabama in Huntsville, Huntsville, AL 35899, USA. Abstract. We describe an initial implementation of an electrostatic Particle-in-Cell (ES-PIC) module with adaptive Cartesian mesh in our Unified Flow Solver framework. Challenges of the PIC method with cell-based adaptive mesh refinement (AMR) are related to a decrease of the particle-per-cell number in the refined cells with a corresponding increase of the numerical noise. The developed ES-PIC solver is validated for capacitively coupled plasma; its AMR capabilities are demonstrated for simulations of streamer development during high-pressure gas breakdown. It is shown that cell-based AMR provides a convenient particle management algorithm for exponential multiplications of electrons and ions in the ionization events. 1. Introduction. Particle-in-Cell (PIC) is a mature technology widely used for plasma simulations. There are three versions of PIC: an electrostatic (ES) PIC uses the Poisson equation for calculating the electric field, an electromagnetic (EM) PIC employs a full-wave Maxwell solver for calculations of electric and magnetic fields, and a quasi-static (QS) PIC uses a quasi-static version of the Maxwell equations (such as the Darwin model) for calculation of the EM fields. Surprisingly, PIC solvers with adaptive mesh refinement (AMR) are relatively rare. It is worth mentioning the WARP code for accelerator applications [1], and an explicit EM-PIC code of Fujimoto [2] for astrophysics applications, both using block-structured AMR. In this paper, we describe an initial implementation of an ES-PIC module in our Unified Flow Solver (UFS) framework [3]. -
Hp-Multigrid As Smoother Algorithm for Higher Order Discontinuous Galerkin Discretizations of Advection Dominated Flows Part I
HP-MULTIGRID AS SMOOTHER ALGORITHM FOR HIGHER ORDER DISCONTINUOUS GALERKIN DISCRETIZATIONS OF ADVECTION DOMINATED FLOWS. PART I. MULTILEVEL ANALYSIS. J.J.W. VAN DER VEGT∗ AND S. RHEBERGEN†. AMS subject classifications. Primary 65M55; Secondary 65M60, 76M10. Abstract. The hp-Multigrid as Smoother algorithm (hp-MGS) for the solution of higher order accurate space-(time) discontinuous Galerkin discretizations of advection dominated flows is presented. This algorithm combines p-multigrid with h-multigrid at all p-levels, where the h-multigrid acts as smoother in the p-multigrid. The performance of the hp-MGS algorithm is further improved using semi-coarsening in combination with a new semi-implicit Runge-Kutta method as smoother. A detailed multilevel analysis of the hp-MGS algorithm is presented to obtain more insight into the theoretical performance of the algorithm. As model problem a fourth order accurate space-time discontinuous Galerkin discretization of the advection-diffusion equation is considered. The multilevel analysis shows that the hp-MGS algorithm has excellent convergence rates, both for low and high cell Reynolds numbers and on highly stretched meshes. Key words. multigrid algorithms, discontinuous Galerkin methods, higher order accurate discretizations, space-time methods, Runge-Kutta methods, Fourier analysis, multilevel analysis. 1. Introduction. Discontinuous Galerkin finite element methods are well suited to obtain higher order accurate discretizations on unstructured meshes. The use of basis functions which are only weakly coupled to neighboring elements results in a local discretization which allows, in combination with hp-mesh adaptation, the efficient capturing of detailed structures in the solution, and is also beneficial for parallel computing. -
High Efficiency Spectral Analysis and BLAS-3
High Efficiency Spectral Analysis and BLAS-3 Randomized QRCP with Low-Rank Approximations, by Jed Alma Duersch. A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Applied Mathematics in the Graduate Division of the University of California, Berkeley. Committee in charge: Professor Ming Gu, Chair; Assistant Professor Lin Lin; Professor Kameshwar Poolla. Fall 2015. Copyright 2015 by Jed Alma Duersch. Contents: 1 A Robust and Efficient Implementation of LOBPCG (1.1 Contributions and results; 1.2 Introduction to LOBPCG; 1.3 The basic LOBPCG algorithm; 1.4 Numerical stability and basis selection; 1.5 Stability improvements; 1.6 Efficiency improvements; 1.7 Analysis; 1.8 Numerical examples); 2 Spectral Target Residual Descent (2.1 Contributions and results; 2.2 Introduction to interior eigenvalue targeting; 2.3 Spectral targeting analysis; 2.4 Notes on generalized spectral targeting; 2.5 Numerical experiments); 3 True BLAS-3 Performance QRCP using Random Sampling (3.1 Contributions and results; 3.2 Introduction to QRCP; 3.3 Sample QRCP; 3.4 Sample updates; 3.5 Full randomized QRCP; 3.6 Parallel implementation notes; 3.7 Truncated approximate Singular Value Decomposition; 3.8 Experimental performance); Bibliography. Acknowledgments: I gratefully thank my coauthors, Dr. Meiyue Shao and Dr. Chao Yang, for their time and expertise. I would also like to thank Assistant Professor Lin Lin, Dr.