A Survey of Direct Methods for Sparse Linear Systems


Timothy A. Davis, Sivasankaran Rajamanickam, and Wissam M. Sid-Lakhdar

Technical Report, Department of Computer Science and Engineering, Texas A&M University, April 2016, http://faculty.cse.tamu.edu/davis/publications.html. To appear in Acta Numerica.

Wilkinson defined a sparse matrix as one with enough zeros that it pays to take advantage of them.¹ This informal yet practical definition captures the essence of the goal of direct methods for solving sparse matrix problems. They exploit the sparsity of a matrix to solve problems economically: much faster and using far less memory than if all the entries of a matrix were stored and took part in explicit computations. These methods form the backbone of a wide range of problems in computational science. A glimpse of the breadth of applications relying on sparse solvers can be seen in the origins of matrices in published matrix benchmark collections (Duff and Reid 1979a, Duff, Grimes and Lewis 1989a, Davis and Hu 2011). The goal of this survey article is to impart a working knowledge of the underlying theory and practice of sparse direct methods for solving linear systems and least-squares problems, and to provide an overview of the algorithms, data structures, and software available to solve these problems, so that the reader can both understand the methods and know how best to use them.

¹ Wilkinson actually defined it in the negation: "The matrix may be sparse, either with the non-zero elements concentrated ... or distributed in a less systematic manner. We shall refer to a matrix as dense if the percentage of zero elements or its distribution is such as to make it uneconomic to take advantage of their presence." (Wilkinson and Reinsch 1971), page 191, emphasis in the original.

CONTENTS

1 Introduction
2 Basic algorithms
3 Solving triangular systems
4 Symbolic analysis
5 Cholesky factorization
6 LU factorization
7 QR factorization and least-squares problems
8 Fill-reducing orderings
9 Supernodal methods
10 Frontal methods
11 Multifrontal methods
12 Other topics
13 Available Software
References

1. Introduction

This survey article presents an overview of the fundamentals of direct methods for sparse matrix problems, from theory to algorithms and data structures to working pseudocode. It gives an in-depth presentation of the many algorithms and software available for solving sparse matrix problems, both sequential and parallel. The focus is on direct methods for solving systems of linear equations, including LU, QR, Cholesky, and other factorizations, forward/backsolve, and related matrix operations. Iterative methods, solvers for eigenvalue and singular-value problems, and sparse optimization problems are beyond the scope of this article.

1.1. Outline

• In Section 1.2, below, we first provide a list of books and prior survey articles that the reader may consult to further explore this topic.
• Section 2 presents a few basic data structures and some basic algorithms that operate on them: transpose, matrix-vector and matrix-matrix multiply, and permutations. These are often necessary for the reader to understand, and perhaps implement, in order to provide matrices as input to a package for solving sparse linear systems (a sketch of the standard compressed-column data structure appears at the end of this introduction).
• Section 3 describes the sparse triangular solve, with both sparse and dense right-hand sides. This kernel provides useful background for understanding the nonzero pattern of a sparse matrix factorization, which is discussed in Section 4, and also forms the basis of the basic Cholesky and LU factorizations presented in Sections 5 and 6 (the dense right-hand-side case is sketched at the end of this introduction).
• Section 4 covers the symbolic analysis phase that occurs prior to numeric factorization: finding the elimination tree, the number of nonzeros in each row and column of the factors, and the nonzero patterns of the factors themselves (or their upper bounds in case numerical pivoting changes things later on). The focus is on Cholesky factorization, but this discussion has implications for the other kinds of factorizations as well (LDLᵀ, LU, and QR). A sketch of the classic elimination-tree algorithm appears at the end of this introduction.
• Section 5 presents the many variants of sparse Cholesky factorization for symmetric positive definite matrices, including early methods of historical interest (envelope and skyline methods), and up-looking, left-looking, and right-looking methods. Multifrontal and supernodal methods (for Cholesky, LU, LDLᵀ, and QR) are presented later on.
• Section 6 considers the LU factorization, where numerical pivoting becomes a concern. After describing how the symbolic analysis differs from the Cholesky case, this section covers left-looking and right-looking methods, and how numerical pivoting impacts these algorithms.
• Section 7 presents QR factorization and its symbolic analysis, for both row-oriented (Givens) and column-oriented (Householder) variants. Also considered are alternative methods for solving sparse least-squares problems, based on augmented systems (written out at the end of this introduction) or on LU factorization.
• Section 8 is on the ordering problem, which is to find a good permutation for reducing fill-in, work, or memory usage. This is a difficult problem (it is NP-hard, to be precise). This section presents the many heuristics that have been created to find a decent solution to the problem. These heuristics are typically applied first, prior to the symbolic analysis and numeric factorization, but the topic appears out of order in this article, since understanding matrix factorizations (Sections 4 through 7) is a prerequisite to understanding how best to permute the matrix. A small worked example of how much the ordering matters appears at the end of this introduction.
• Section 9 presents the supernodal method for Cholesky and LU factorization, in which adjacent columns in the factors with identical (or nearly identical) nonzero pattern are "glued" together to form supernodes. Exploiting this common substructure greatly reduces memory traffic and allows computations to be done in dense submatrices, each representing a supernodal column.
• Section 10 discusses the frontal method, which is another way of organizing the factorization in a right-looking method, where a single dense submatrix holds the part of the sparse matrix actively being factorized, and rows and columns come and go during factorization. Historically, this method precedes both the supernodal and multifrontal methods.
• Section 11 covers the many variants of the multifrontal method for Cholesky, LDLᵀ, LU, and QR factorizations. In this method, the matrix is represented not by one frontal matrix but by many of them, all related to one another via the assembly tree (a variant of the elimination tree). As in the supernodal and frontal methods, dense matrix operations can be exploited within each dense frontal matrix.
• Section 12 considers several topics that do not neatly fit into the above outline, yet which are critical to the domain of direct methods for sparse linear systems. Many of these topics are also active areas of research: update/downdate methods, parallel triangular solve, GPU acceleration, and low-rank approximations.
• Reliable peer-reviewed software has long been a hallmark of research in computational science, and in sparse direct methods in particular. Thus, no survey on sparse direct methods would be complete without a discussion of software, which we present in Section 13.

1.2. Resources

Sparse direct methods are a tightly coupled combination of techniques from numerical linear algebra, graph theory, graph algorithms, permutations, and other topics in discrete mathematics. We assume that the reader is familiar with the basics of this background. For further reading, Golub and Van Loan (2012) provide in-depth coverage of numerical linear algebra and matrix computations for dense and structured matrices. Cormen, Leiserson and Rivest (1990) discuss algorithms and data structures and their analysis, including graph algorithms. MATLAB notation is used in this article (see Davis (2011b) for a tutorial).

Books dedicated to the topic of direct methods for sparse linear systems include those by Tewarson (1973), George and Liu (1981), Pissanetsky (1984), Duff, Erisman and Reid (1986), Zlatev (1991), Björck (1996), and Davis (2006). Portions of Sections 2 through 8 of this article are condensed from Davis' (2006) book. Demmel (1997) interleaves a discussion of numerical linear algebra with a description of related software for sparse and dense problems. Chapter 6 of Dongarra, Duff, Sorensen and Van der Vorst (1998) provides an overview of direct methods for sparse linear systems. Several of the early conference proceedings in the 1970s and 1980s on sparse matrix problems and algorithms have been published in book form, including Reid (1971), Rose and Willoughby (1972), Duff (1981e), and Evans (1985).

Survey and overview papers such as this one have appeared in the literature and provide a useful bird's-eye view of the topic and its historical development. The first was a survey by Tewarson (1970), which even early in the formation of the field covers many of the same topics as this survey article: LU factorization, Householder and Givens transformations, Markowitz ordering, minimum degree ordering, and bandwidth/profile reduction and other special forms. It also presents both the Product Form of the Inverse and the Elimination Form; the former arises in a Gauss-Jordan elimination that is no longer used in sparse direct solvers. Reid's survey (1974) focuses on right-looking Gaussian elimination, and in two related papers (1977a, 1977b), also considers graphs, the block triangular form, Cholesky factorization, and least-squares problems. Duff (1977b) gave an extensive survey of sparse matrix methods and their applications with over 600 references. This paper does not attempt to cite the many papers that rely on sparse direct methods, since the count is now surely into the thousands. A Google Scholar search in September 2015 for the term "sparse matrix" lists 1.4 million results, although many of those are unrelated to sparse direct methods.
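To make the outline above concrete, the sketches below illustrate a few of the core ideas in working C code and small examples. They follow the zero-based compressed-column conventions popularized by CSparse (Davis 2006), but every type and function name here (csc, csc_gaxpy, and so on) is illustrative rather than the name used by this survey or any particular package. First, the compressed sparse column (CSC) data structure of Section 2, together with the matrix-vector multiply y = y + Ax, which touches only the stored nonzeros:

    /* A sparse m-by-n matrix in zero-based compressed sparse column (CSC)
       form: column j is held in i[p[j] .. p[j+1]-1] (row indices) and
       x[p[j] .. p[j+1]-1] (numerical values). */
    typedef struct {
        int m, n;      /* number of rows and columns */
        int *p;        /* column pointers, size n+1  */
        int *i;        /* row indices, size p[n]     */
        double *x;     /* numerical values, size p[n] */
    } csc;

    /* y = y + A*x: one pass over the stored entries, O(|A|) work */
    void csc_gaxpy(const csc *A, const double *x, double *y)
    {
        for (int j = 0; j < A->n; j++)
            for (int k = A->p[j]; k < A->p[j+1]; k++)
                y[A->i[k]] += A->x[k] * x[j];
    }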
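Next, the triangular solve of Section 3 in its simplest form: Lx = b with L sparse lower triangular and b dense, overwriting b with the solution. A minimal sketch, assuming the csc structure above and that the diagonal entry is stored first in each column (the convention used by CSparse's cs_lsolve):

    /* solve Lx = b, where x holds the dense b on input and x on output */
    void lower_solve(const csc *L, double *x)
    {
        for (int j = 0; j < L->n; j++) {
            x[j] /= L->x[L->p[j]];                 /* divide by the diagonal L(j,j) */
            for (int k = L->p[j] + 1; k < L->p[j+1]; k++)
                x[L->i[k]] -= L->x[k] * x[j];      /* scatter the column-j update */
        }
    }

When b is itself sparse, the nonzero pattern of x can be predicted symbolically by a graph traversal, so that the solve costs time proportional to the entries it actually touches rather than to n; that refinement is the subject of Section 3.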
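The central object of the symbolic analysis of Section 4 is the elimination tree. The following sketch follows Liu's algorithm with path compression (the same idea as CSparse's cs_etree), again assuming the csc structure above: each node i < j reached through the upper triangular entries of column j is walked toward its root, and any root found along the way is attached beneath j.

    /* elimination tree of a symmetric matrix, computed from the entries
       A(i,j) with i < j in each column; parent[j] == -1 marks a root.
       ancestor[] is an int workspace of size n. */
    void etree(const csc *A, int *parent, int *ancestor)
    {
        for (int j = 0; j < A->n; j++) {
            parent[j] = -1;
            ancestor[j] = -1;
            for (int k = A->p[j]; k < A->p[j+1]; k++) {
                int i = A->i[k];
                while (i != -1 && i < j) {           /* walk from i toward its root */
                    int inext = ancestor[i];
                    ancestor[i] = j;                 /* path compression: shortcut to j */
                    if (inext == -1) parent[i] = j;  /* i was a root: attach it below j */
                    i = inext;
                }
            }
        }
    }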
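For the least-squares methods of Section 7, the augmented-system formulation replaces the problem of minimizing ||b - Ax||₂ with one larger, symmetric indefinite linear system that couples the residual r = b - Ax with x (shown here without the scaling parameter that is often added for numerical stability):

    \begin{pmatrix} I & A \\ A^{\mathsf{T}} & 0 \end{pmatrix}
    \begin{pmatrix} r \\ x \end{pmatrix}
    =
    \begin{pmatrix} b \\ 0 \end{pmatrix}

The first block row states r = b - Ax and the second enforces the normal equations Aᵀr = 0, so one sparse symmetric indefinite (LDLᵀ) factorization of the augmented matrix solves the least-squares problem.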
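Finally, a standard worked example of why the ordering problem of Section 8 matters so much. For an "arrow" matrix, eliminating the dense row and column first fills in the entire factor, while permuting them to the end produces no fill at all:

    A = \begin{pmatrix}
      \times & \times & \times & \times \\
      \times & \times &        &        \\
      \times &        & \times &        \\
      \times &        &        & \times
    \end{pmatrix},
    \qquad
    PAP^{\mathsf{T}} = \begin{pmatrix}
      \times &        &        & \times \\
             & \times &        & \times \\
             &        & \times & \times \\
      \times & \times & \times & \times
    \end{pmatrix}

Cholesky factorization of A in the natural order makes every entry of the factor nonzero, whereas PAPᵀ factors with zero fill-in, since eliminating each of the first three nodes updates only the last diagonal entry. Fill-reducing heuristics such as minimum degree automate exactly this kind of choice.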
Recommended publications
• Recursive Approach in Sparse Matrix LU Factorization. Jack Dongarra, Victor Eijkhout, and Piotr Łuszczek (University of Tennessee).
• 3.1 Ducom-COM3. From: Performance Improvement of Multi-scale Modeling of Concrete Performance Solver Using Hybrid CPU-GPU System, M.Sc. thesis by Lina Felleke, Addis Ababa University, April 2015.
• Sparse Days, September 6th–8th, 2017, CERFACS, Toulouse, France. Workshop programme.
• A Block Sparse Shared-Memory Multifrontal Finite Element Solver for Problems of Structural Mechanics. S. Y. Fialko, Computer Assisted Mechanics and Engineering Sciences 16:117–131, 2009.
• Numerical Analysis Group Progress Report, January 1991 – December 1993. Edited by Iain S. Duff, Rutherford Appleton Laboratory, RAL 94-062, 1994.
• Parallel Codes for Computing the Numerical Rank. Gregorio Quintana-Ortí and Enrique S. Quintana-Ortí, Linear Algebra and its Applications 275–276:451–470, 1998.
• A Fast Implementation of the Minimum Degree Algorithm Using Quotient Graphs. Alan George and Joseph W. H. Liu.
• State-of-the-Art Sparse Direct Solvers. Matthias Bollhöfer, Olaf Schenk, Radim Janalík, Steve Hamm, and Kiran Gullapalli, arXiv:1907.05309.
• On the Performance of Minimum Degree and Minimum Local Fill Heuristics in Circuit Simulation. Gunther Reißig.
• Towards Green Multi-frontal Solver for Adaptive Finite Element Method. H. AbouEisha, M. Moshkov, K. Jopek, P. Gepner, J. Kitowski, and M. Paszyński, Procedia Computer Science 51:984–993, 2015.
• Efficient Scalable Algorithms for Hierarchically Semiseparable Matrices. Shen Wang, Xiaoye S. Li, Jianlin Xia, Yingchong Situ, and Maarten V. de Hoop.
• A Parallel Butterfly Algorithm. Jack Poulson, Laurent Demanet, Nicholas Maxwell, and Lexing Ying, SIAM Journal on Scientific Computing 36(1):C49–C65, 2014.