XAMG: a Library for Solving Linear Systems with Multiple Right-Hand
Total Page:16
File Type:pdf, Size:1020Kb
Load more
										Recommended publications
									
								- 
											Arxiv:1911.09220V2 [Cs.MS] 13 Jul 2020MFEM: A MODULAR FINITE ELEMENT METHODS LIBRARY ROBERT ANDERSON, JULIAN ANDREJ, ANDREW BARKER, JAMIE BRAMWELL, JEAN- SYLVAIN CAMIER, JAKUB CERVENY, VESELIN DOBREV, YOHANN DUDOUIT, AARON FISHER, TZANIO KOLEV, WILL PAZNER, MARK STOWELL, VLADIMIR TOMOV Lawrence Livermore National Laboratory, Livermore, USA IDO AKKERMAN Delft University of Technology, Netherlands JOHANN DAHM IBM Research { Almaden, Almaden, USA DAVID MEDINA Occalytics, LLC, Houston, USA STEFANO ZAMPINI King Abdullah University of Science and Technology, Thuwal, Saudi Arabia Abstract. MFEM is an open-source, lightweight, flexible and scalable C++ library for modular finite element methods that features arbitrary high-order finite element meshes and spaces, support for a wide variety of dis- cretization approaches and emphasis on usability, portability, and high-performance computing efficiency. MFEM's goal is to provide application scientists with access to cutting-edge algorithms for high-order finite element mesh- ing, discretizations and linear solvers, while enabling researchers to quickly and easily develop and test new algorithms in very general, fully unstructured, high-order, parallel and GPU-accelerated settings. In this paper we describe the underlying algorithms and finite element abstractions provided by MFEM, discuss the software implementation, and illustrate various applications of the library. arXiv:1911.09220v2 [cs.MS] 13 Jul 2020 1. Introduction The Finite Element Method (FEM) is a powerful discretization technique that uses general unstructured grids to approximate the solutions of many partial differential equations (PDEs). It has been exhaustively studied, both theoretically and in practice, in the past several decades [1, 2, 3, 4, 5, 6, 7, 8]. MFEM is an open-source, lightweight, modular and scalable software library for finite elements, featuring arbitrary high-order finite element meshes and spaces, support for a wide variety of discretization approaches and emphasis on usability, portability, and high-performance computing (HPC) efficiency [9].
- 
												  Accelerating the LOBPCG Method on Gpus Using a Blocked Sparse Matrix Vector ProductAccelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product Hartwig Anzt and Stanimire Tomov and Jack Dongarra Innovative Computing Lab University of Tennessee Knoxville, USA Email: [email protected], [email protected], [email protected] Abstract— the computing power of today’s supercomputers, often accel- erated by coprocessors like graphics processing units (GPUs), This paper presents a heterogeneous CPU-GPU algorithm design and optimized implementation for an entire sparse iter- becomes challenging. ative eigensolver – the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) – starting from low-level GPU While there exist numerous efforts to adapt iterative lin- data structures and kernels to the higher-level algorithmic choices ear solvers like Krylov subspace methods to coprocessor and overall heterogeneous design. Most notably, the eigensolver technology, sparse eigensolvers have so far remained out- leverages the high-performance of a new GPU kernel developed side the main focus. A possible explanation is that many for the simultaneous multiplication of a sparse matrix and a of those combine sparse and dense linear algebra routines, set of vectors (SpMM). This is a building block that serves which makes porting them to accelerators more difficult. Aside as a backbone for not only block-Krylov, but also for other from the power method, algorithms based on the Krylov methods relying on blocking for acceleration in general. The subspace idea are among the most commonly used general heterogeneous LOBPCG developed here reveals the potential of eigensolvers [1]. When targeting symmetric positive definite this type of eigensolver by highly optimizing all of its components, eigenvalue problems, the recently developed Locally Optimal and can be viewed as a benchmark for other SpMM-dependent applications.
- 
												  MFEM: a Modular Finite Element Methods LibraryMFEM: A Modular Finite Element Methods Library Robert Anderson1, Andrew Barker1, Jamie Bramwell1, Jakub Cerveny2, Johann Dahm3, Veselin Dobrev1,YohannDudouit1, Aaron Fisher1,TzanioKolev1,MarkStowell1,and Vladimir Tomov1 1Lawrence Livermore National Laboratory 2University of West Bohemia 3IBM Research July 2, 2018 Abstract MFEM is a free, lightweight, flexible and scalable C++ library for modular finite element methods that features arbitrary high-order finite element meshes and spaces, support for a wide variety of discretization approaches and emphasis on usability, portability, and high-performance computing efficiency. Its mission is to provide application scientists with access to cutting-edge algorithms for high-order finite element meshing, discretizations and linear solvers. MFEM also enables researchers to quickly and easily develop and test new algorithms in very general, fully unstructured, high-order, parallel settings. In this paper we describe the underlying algorithms and finite element abstractions provided by MFEM, discuss the software implementation, and illustrate various applications of the library. Contents 1 Introduction 3 2 Overview of the Finite Element Method 4 3Meshes 9 3.1 Conforming Meshes . 10 3.2 Non-Conforming Meshes . 11 3.3 NURBS Meshes . 12 3.4 Parallel Meshes . 12 3.5 Supported Input and Output Formats . 13 1 4 Finite Element Spaces 13 4.1 FiniteElements....................................... 14 4.2 DiscretedeRhamComplex ................................ 16 4.3 High-OrderSpaces ..................................... 17 4.4 Visualization . 18 5 Finite Element Operators 18 5.1 DiscretizationMethods................................... 18 5.2 FiniteElementLinearSystems . 19 5.3 Operator Decomposition . 23 5.4 High-Order Partial Assembly . 25 6 High-Performance Computing 27 6.1 Parallel Meshes, Spaces, and Operators . 27 6.2 Scalable Linear Solvers .
- 
												  Slepc Users Manual Scalable Library for Eigenvalue Problem ComputationsDepartamento de Sistemas Inform´aticos y Computaci´on Technical Report DSIC-II/24/02 SLEPc Users Manual Scalable Library for Eigenvalue Problem Computations https://slepc.upv.es Jose E. Roman Carmen Campos Lisandro Dalcin Eloy Romero Andr´es Tom´as To be used with slepc 3.15 March, 2021 Abstract This document describes slepc, the Scalable Library for Eigenvalue Problem Computations, a software package for the solution of large sparse eigenproblems on parallel computers. It can be used for the solution of various types of eigenvalue problems, including linear and nonlinear, as well as other related problems such as the singular value decomposition (see a summary of supported problem classes on page iii). slepc is a general library in the sense that it covers both Hermitian and non-Hermitian problems, with either real or complex arithmetic. The emphasis of the software is on methods and techniques appropriate for problems in which the associated matrices are large and sparse, for example, those arising after the discretization of partial differential equations. Thus, most of the methods offered by the library are projection methods, including different variants of Krylov and Davidson iterations. In addition to its own solvers, slepc provides transparent access to some external software packages such as arpack. These packages are optional and their installation is not required to use slepc, see x8.7 for details. Apart from the solvers, slepc also provides built-in support for some operations commonly used in the context of eigenvalue computations, such as preconditioning or the shift- and-invert spectral transformation. slepc is built on top of petsc, the Portable, Extensible Toolkit for Scientific Computation [Balay et al., 2021].
- 
												  Parallel Mathematical Libraries STIMULATE Training Workshop 2018Parallel Mathematical Libraries STIMULATE Training Workshop 2018 December 2018 I.Gutheil JSC Outline A Short History Sequential Libraries Parallel Libraries and Application Systems: Threaded Libraries MPI parallel Libraries Libraries for GPU usage Usage of ScaLAPACK, Elemental, and FFTW Slide 1 A Short History Libraries for Dense Linear Algebra Starting in the early 1970th, written in FORTRAN LINPACK, LINear algebra PACKage for digital computers, supercomputers of the 1970th and early 1980th Dense linear system solvers, factorization and solution, real and complex, single and double precision EISPACK, EIgenSolver PACKage, written around 1972–1973 eigensolvers for real and complex, symmetric and non-symmetric matrices, solvers for generalized eigenvalue problem and singular value decomposition LAPACK, initial release 1992, still in use, now Fortran 90, also C and C++ interfaces, tuned for single-core performance on modern supercomputers, threaded versions from vendors available, e.g. MKL A Short History Slide 2 A Short History, continued BLAS (Basic Linear Algebra Subprograms for Fortran Usage) First step 1979: vector operations, now called BLAS 1, widely used kernels of linear algebra routines, idea: tuning in assembly on each machine leading to portable performance Second step 1988: matrix-vector operations, now called BLAS 2, optimization on the matrix-vector level necessary for vector computers, building blocks for LINPACK and EISPACK Third step 1990: matrix-matrix operations, now called BLAS 3, memory acces became slower than CPU,
- 
												  Geeng Started with TrilinosGeng Started with Trilinos Michael A. Heroux Photos placed in horizontal position Sandia National with even amount of white space Laboratories between photos and header USA Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. 2011-XXXXP Outline - Some (Very) Basics - Overview of Trilinos: What it can do for you. - Trilinos Software Organization. - Overview of Packages. - Package Management. - Documentation, Building, Using Trilinos. 1 Online Resource § Trilinos Website: https://trilinos.org § Portal to Trilinos resources. § Documentation. § Mail lists. § Downloads. § WebTrilinos § Build & run Trilinos examples in your web browser! § Need username & password (will give these out later) § https://code.google.com/p/trilinos/wiki/TrilinosHandsOnTutorial § Example codes: https://code.google.com/p/trilinos/w/list 2 WHY USE MATHEMATICAL LIBRARIES? 3 § A farmer had chickens and pigs. There was a total of 60 heads and 200 feet. How many chickens and how many pigs did the farmer have? § Let x be the number of chickens, y be the number of pigs. § Then: x + y = 60 2x + 4y = 200 § From first equaon x = 60 – y, so replace x in second equaon: 2(60 – y) + 4y = 200 § Solve for y: 120 – 2y + 4y = 200 2y = 80 y = 40 § Solve for x: x = 60 – 40 = 20. § The farmer has 20 chickens and 40 pigs. 4 § A restaurant owner purchased one box of frozen chicken and another box of frozen pork for $60. Later the owner purchased 2 boxes of chicken and 4 boxes of pork for $200.
- 
												  PFEAST: a High Performance Sparse Eigenvalue Solver Using Distributed-Memory Linear SolversPFEAST: A High Performance Sparse Eigenvalue Solver Using Distributed-Memory Linear Solvers James Kestyn∗, Vasileios Kalantzisy, Eric Polizzi∗, Yousef Saady ∗Electrical and Computer Engineering Department, University of Massachusetts, Amherst, MA, U.S.A. yComputer Science and Engineering Department, University of Minnesota, Minneapolis, MN, U.S.A. Abstract—The FEAST algorithm and eigensolver for interior computing interior eigenpairs that makes use of a rational eigenvalue problems naturally possesses three distinct levels filter obtained from an approximation of the spectral pro- of parallelism. The solver is then suited to exploit modern jector. FEAST can be applied for solving both standard and computer architectures containing many interconnected proces- sors. This paper highlights a recent development within the generalized forms of Hermitian or non-Hermitian problems, software package that allows the dominant computational task, and belongs to the family of contour integration eigensolvers solving a set of complex linear systems, to be performed with a [32], [33], [3], [14], [15], [4]. Once a given search interval distributed memory solver. The software, written with a reverse- is selected, FEAST’s main computational task consists of communication-interface, can now be interfaced with any generic a numerical quadrature computation that involves solving MPI linear-system solver using a customized data distribution for the eigenvector solutions. This work utilizes two common independent linear systems along a complex contour. The “black-box” distributed memory linear-systems solvers (Cluster- algorithm can exploit natural parallelism at three different MKL-Pardiso and MUMPS), as well as our own application- levels: (i) search intervals can be treated separately (no over- specific domain-decomposition MPI solver, for a collection of 3- lap), (ii) linear systems can be solved independently across dimensional finite-element systems.
- 
												  Block Locally Optimal Preconditioned Eigenvalue Xolvers BLOPEXBlock Locally Optimal Preconditioned Eigenvalue Xolvers I.Lashuk, E.Ovtchinnikov, M.Argentati, A.Knyazev Block Locally Optimal Preconditioned Eigenvalue Xolvers BLOPEX Ilya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker) Department of Mathematical Sciences and Center for Computational Mathematics University of Colorado at Denver and Health Sciences Center Supported by the Lawrence Livermore National Laboratory and the National Science Foundation Center for Computational Mathematics, University of Colorado at Denver Block Locally Optimal Preconditioned Eigenvalue Xolvers I.Lashuk, E.Ovtchinnikov, M.Argentati, A.Knyazev Abstract Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX) is a package, written in C, that at present includes only one eigenxolver, Locally Optimal Block Preconditioned Conjugate Gradient Method (LOBPCG). BLOPEX supports parallel computations through an abstract layer. BLOPEX is incorporated in the HYPRE package from LLNL and is availabe as an external block to the PETSc package from ANL as well as a stand-alone serial library. Center for Computational Mathematics, University of Colorado at Denver Block Locally Optimal Preconditioned Eigenvalue Xolvers I.Lashuk, E.Ovtchinnikov, M.Argentati, A.Knyazev Acknowledgements Supported by the Lawrence Livermore National Laboratory, Center for Applied Scientific Computing (LLNL–CASC) and the National Science Foundation. We thank Rob Falgout, Charles Tong, Panayot Vassilevski, and other members of the Hypre Scalable Linear Solvers project team for their help and support. We thank Jose E. Roman, a member of SLEPc team, for writing the SLEPc interface to our Hypre LOBPCG solver. The PETSc team has been very helpful in adding our BLOPEX code as an external package to PETSc. Center for Computational Mathematics, University of Colorado at Denver Block Locally Optimal Preconditioned Eigenvalue Xolvers I.Lashuk, E.Ovtchinnikov, M.Argentati, A.Knyazev CONTENTS 1.
- 
												  Downloads in FY15)Lawrence Berkeley National Laboratory Recent Work Title Preparing sparse solvers for exascale computing. Permalink https://escholarship.org/uc/item/0r56p10n Journal Philosophical transactions. Series A, Mathematical, physical, and engineering sciences, 378(2166) ISSN 1364-503X Authors Anzt, Hartwig Boman, Erik Falgout, Rob et al. Publication Date 2020-03-01 DOI 10.1098/rsta.2019.0053 Peer reviewed eScholarship.org Powered by the California Digital Library University of California Preparing sparse solvers for exascale computing royalsocietypublishing.org/journal/rsta Hartwig Anzt1,ErikBoman2, Rob Falgout3, Pieter Ghysels4,MichaelHeroux2, Xiaoye Li4, 5 5 Review Lois Curfman McInnes , Richard Tran Mills , Sivasankaran Rajamanickam2, Karl Rupp6, Cite this article: Anzt H et al.2020Preparing sparse solvers for exascale computing. Phil. Barry Smith5, Ichitaro Yamazaki2 and Trans. R. Soc. A 378: 20190053. 3 http://dx.doi.org/10.1098/rsta.2019.0053 Ulrike Meier Yang 1Electrical Engineering and Computer Science, University of Accepted: 5 November 2019 Tennessee, Knoxville, TN 37996, USA 2Sandia National Laboratories, Albuquerque, NM, USA One contribution of 15 to a discussion meeting 3Lawrence Livermore National Laboratory, Livermore, CA, USA issue ‘Numerical algorithms for 4Lawrence Berkeley National Laboratory, Berkeley, CA, USA high-performance computational science’. 5Argonne National Laboratory, Argonne, IL, USA 6 Subject Areas: Vienna University of Technology, Wien, Wien, Austria computational mathematics, computer MH, 0000-0002-5893-0273 modelling and simulation, software Sparse solvers provide essential functionality for Keywords: a wide variety of scientific applications. Highly sparse solvers, mathematical libraries parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi- scale simulations, especially as we target exascale Author for correspondence: platforms.
- 
												  Software Development Practices Apis for Solver and UDFElmer Software Development Practices APIs for Solver and UDF ElmerTeam CSC – IT Center for Science CSC, November.2015 Elmer programming languages Fortran90 (and newer) – ElmerSolver (~240,000 lines of which ~50% in DLLs) C++ – ElmerGUI (~18,000 lines) – ElmerSolver (~10,000 lines) C – ElmerPost (~45,000 lines) – ElmerGrid (~30,000 lines) – MATC (~11,000 lines) Tools for Elmer development Programming languages – Fortran90 (and newer), C, C++ Compilation – Compiler (e.g. gnu), configure, automake, make, (cmake) Editing – emacs, vi, notepad++,… Code hosting (git) – Current: https://github.com/ElmerCSC – Obsolite: www.sf.net/projects/elmerfem Consistency tests Code documentation – Doxygen Theory documentation – Latex Community server – www.elmerfem.org (forum, wiki, etc.) Elmer libraries ElmerSolver – Required: Matc, HutIter, Lapack, Blas, Umfpack (GPL) – Optional: Arpack, Mumps, Hypre, Pardiso, Trilinos, SuperLU, Cholmod, NetCDF, HDF5, … ElmerGUI – Required: Qt, ElmerGrid, Netgen – Optional: Tetgen, OpenCASCADE, VTK, QVT Elmer licenses ElmerSolver library is published under LGPL – Enables linking with all license types – It is possible to make a new solver even under proprietary license – Note: some optional libraries may constrain this freedom due to use of GPL licences Rest of Elmer is published under GPL – Derived work must also be under same license (“copyleft”) Elmer version control at GitHub In 2015 the official version control of Elmer was transferred from svn at sf.net to git hosted at GitHub Git offers more flexibility over svn
- 
												![Arxiv:2104.01196V2 [Math.NA] 24 Apr 2021 Proposed As an Alternative to the Sequential Algorithm Based on a Triangular Solve](https://docslib.b-cdn.net/cover/1930/arxiv-2104-01196v2-math-na-24-apr-2021-proposed-as-an-alternative-to-the-sequential-algorithm-based-on-a-triangular-solve-4941930.webp)  Arxiv:2104.01196V2 [Math.NA] 24 Apr 2021 Proposed As an Alternative to the Sequential Algorithm Based on a Triangular SolveTWO-STAGE GAUSS{SEIDEL PRECONDITIONERS AND SMOOTHERS FOR KRYLOV SOLVERS ON A GPU CLUSTER STEPHEN THOMASy , ICHITARO YAMAZAKI∗, LUC BERGER-VERGIAT∗, BRIAN KELLEY∗, JONATHAN HU∗, PAUL MULLOWNEYy , SIVASANKARAN RAJAMANICKAM∗, AND KATARZYNA SWIRYDOWICZ´ z Abstract. Gauss-Seidel (GS) relaxation is often employed as a preconditioner for a Krylov solver or as a smoother for Algebraic Multigrid (AMG). However, the requisite sparse triangular solve is difficult to parallelize on many-core architectures such as graphics processing units (GPUs). In the present study, the performance of the sequential GS relaxation based on a triangular solve is compared with two-stage variants, replacing the direct triangular solve with a fixed number of inner Jacobi-Richardson (JR) iterations. When a small number of inner iterations is sufficient to maintain the Krylov convergence rate, the two-stage GS (GS2) often outperforms the sequential algorithm on many-core architectures. The GS2 algorithm is also compared with JR. When they perform the same number of flops for SpMV (e.g. three JR sweeps compared to two GS sweeps with one inner JR sweep), the GS2 iterations, and the Krylov solver preconditioned with GS2, may converge faster than the JR iterations. Moreover, for some problems (e.g. elasticity), it was found that JR may diverge with a damping factor of one, whereas two-stage GS may improve the convergence with more inner iterations. Finally, to study the performance of the two-stage smoother and preconditioner for a practical problem, these were applied to incompressible fluid flow simulations on GPUs. 1. Introduction. Solving large sparse linear systems of the form Ax = b is a basic and fundamental component of physics based modeling and simulation.
- 
												  PHAML User's GuideNISTIR 7374 PHAML User's Guide William F. Mitchell U. S. Department of Commerce Technology Administration National Institute of Standards and Technology Information Technology Laboratory Gaithersburg, MD 20899 USA Revised August 28, 2018 for Version 1.20.0 PHAML User's Guide, Version 1.20.0 William F. Mitchell1 Applied and Computational Mathematics Division 100 Bureau Drive Stop 8910 National Institute of Standards and Technology Gaithersburg, MD 20899-8910 email: [email protected] 1Contribution of NIST, not subject to copyright in the United States. The mention of specific products, trademarks, or brand names is for purposes of identification only. Such mention is not to be interpreted in any way as an endorsement or certification of such products or brands by the National Institute of Standards and Technology. All trademarks mentioned herein belong to their respective owners. Abstract PHAML (Parallel Hierarchical Adaptive MultiLevel) is a Fortran module for the solution of elliptic partial differential equations. It uses finite elements, adaptive grid refinement (h, p or hp) and multigrid solution techniques in a message pass- ing parallel program. It has interactive graphics via OpenGL. This document is the user's guide for PHAML. The first part tells how to obtain any needed software, how to build and test the PHAML library, and how to compile and run the example programs. The second part explains the use of PHAML, in- cluding program structure and the various options that are available. The third part is a reference manual which describes the API (application programming interface) of PHAML. The reference manual begins with a 2 page Quick Start section for the impatient.