A Comparison of Parallel Solvers for Diagonally Dominant and General

Total Page:16

File Type:pdf, Size:1020Kb

A Comparison of Parallel Solvers for Diagonally Dominant and General A COMPARISON OF PARALLEL SOLVERS FOR DIAGONALLY DOMINANT AND GENERAL NARROWBANDED LINEAR SYSTEMS y z x PETER ARBENZ ANDREW CLEARY JACK DONGARRA AND MARKUS HEGLAND Abstract We investigate and compare stable parallel algorithms for solving diagonally dominant and general narrowbanded linear systems of equations Narrowbanded means that the bandwidth is very small compared with the matrix order and is typically b etween and The solvers compared are the banded system solvers of ScaLA PACK and those investigated by Arb enz and Hegland For the diagonally dominant case the algorithms are analogs of the wellknown tridiagonal cyclic reduction algorithm while the inspiration for the general case is the lesserknown bidiagonal cyclic reduction which allows a clean parallel implementation of partial pivoting These divideandconquer type algorithms complement negrained algorithms which p erform well only for widebanded ma trices with each family of algorithms having a range of problem sizes for which it is sup erior We present theoretical analyses as well as numerical exp eriments conducted on the Intel Paragon Key words narrowbanded linear systems stable factorization parallel solution cyclic reduction ScaLAPACK Introduction In this pap er we compare implementations of direct parallel metho ds for solving banded systems of linear equations Ax b The nbyn matrix A is assumed to have lower halfbandwidth k and upp er halfbandwidth k l u meaning that k and k are the smallest integers that imply l u a k j i k ij l u We assume that the matrix A has a narrow band such that k k n Linear systems with wide l u band can b e solved eciently by metho ds similar to full system solvers In particular parallel algo rithms using twodimensional mappings such as the toruswrap mapping and Gaussian elimination with partial pivoting have achieved reasonable success The parallelism of these algorithms is the same as that of dense matrix algorithms applied to matrices of size min fk k g l u indep endent of n from which it is obvious that small bandwidths severely limit the usefulness of these algorithms even for large n Parallel algorithms for the solution of banded linear systems with small bandwidth have b een considered by many authors b oth b ecause they serve as a canonical form of recursive equations as well as having direct applications The latter include the solution of eigenvalue problems with inverse iteration spline interpolation and smo othing and the solution of b oundary value problems for ordinary dierential equations using nite dierence or nite element metho ds For these onedimensional applications bandwidths typically vary b etween and The discretisation of partial dierential equations leads to applications with slightly larger bandwidths for example the computation of uid ow in a long narrow pip e In this case the number of grid p oints orthogonal to the ow direction is much smaller than the number of grid p oints along the ow and this results in a matrix with bandwidth relatively small compared to the total size of the problem There is a tradeo for these type of problems b etween band solvers and general sparse techniques in that the band solver assumes that all of the entries within the band are nonzero which they are not and thus p erforms unnecessary computation but its data structures are much simpler and there is no indirect addressing as in general sparse metho ds In the second half of the s numerous pap ers dealt with parallel solvers for the class of non symmetric narrowbanded matrices that can b e factored stably without pivoting such as diagonally y Institute of Scientic Computing Swiss Federal Institute of Technology ETH Zurich Switzerland arbenzinfethzch z Center for Applied Scientic Computing Lawrence Livermore National Lab oratory PO Box L Liver more CA USA aclearyllnlgov x Department of Computer Science University of Tennessee Knoxville TN USA dongarracsutkedu Computer Sciences Lab oratory RSISE Australian National University Canberra ACT Australia MarkusHeglandanuedu au P ARBENZ A CLEARY J DONGARRA AND M HEGLAND dominant matrices or Mmatrices Many have grown out of their tridiagonal counterparts Banded system solvers similar to Wangs tridiagonal solver were presented in for shared memory multiprocessors with a small number of pro cessors The algorithm that we will review in section is related to cyclic reduction CR This algorithm has b een discussed in detail in where the p erformance of implementations of this algorithm on distributed memory multicomputers like the Intel Paragon or the IBM SP is analyzed as well Johnsson considered the same algorithm and its implementation on the Thinking Machine CM which required a dierent mo del for the complexity of the interprocessor communication A similar algorithm has b een presented by Ha jj and Skelbo e for the Intel iPSC hypercub e Cyclic reduction can b e interpreted as Gaussian T elimination applied to a symmetrically p ermuted system of equations P AP P x P b This inter pretation has imp ortant consequences such as it implies that the algorithm is backward stable It can also b e used to show that the p ermutation necessarily causes Gaussian elimination to generate l lin which in turn increases the computational complexity as well as the memory requirements of the algorithm In section we consider algorithms for solving for arbitrary narrow banded matrices A that may require pivoting for stability reasons This algorithm was prop osed and thoroughly discussed in It can b e interpreted as a generalization of the wellknown blo ck tridiagonal cyclic reduction to blo ck bidiagonal matrices and again it is also equivalent to Gaussian elimination applied to T a p ermuted nonsymmetrically for the general case system of equations P AQ Qx P b Blo ck bidiagonal cyclic reduction for the solution of banded linear systems was introduced by Hegland Conroy describ ed an algorithm that p erforms the rst step of our pivoting algorithm but then continues by solving the reduced system sequentially A further stable algorithm based on full pivoting was presented by Wright This algorithm is however dicult to implement and prone to loadimbalance Parallel banded system solvers based on the QR factorization are discussed in These algorithms require ab out twice the amount of work of those considered here In section we compare the ScaLAPACK implementations of the two algorithms ab ove with the implementations by Arb enz and Arb enz and Hegland resp ectively by means of numerical exp eriments conducted on the Intel Paragon Corresp onding results for the IBM SP are presented in ScaLAPACK is a software package with a diverse user community Each subroutine should have an easily intelligible calling sequence interface and work with easily manageable data distributions These constraints may reduce the p erformance of the co de The other two co des are exp erimental They have b een optimized for low communication overhead The number of messages sent among pro cessors and the marshaling pro cess has b een minimized for the task of solving a system of equations The co de do es for instance not split the LU factorization from the forward elimination which prohibits the solution of a sequence of systems of equations with equal system matrix without factoring the system matrix over and over again Our comparisons shall give an answer to the question how much p erformance the ScaLAPACK algorithms may have lost through the constraint to b e user friendly We further continue a discussion started in on the overhead introduced by partial pivoting As the execution time of the pivoting algorithm was only marginally higher than that of the non pivoting algorithm the question arose if it is necessary to have a pivoting as well as a nonpivoting algorithm in ScaLAPACK In LAPACK for instance all dense and banded systems that are not symmetric p ositive denite are solved by the same pivoting algorithm However in contrast to the serial algorithm in the parallel algorithm the doubling of the memory requirement is unavoidable see x We settle this issue in the concluding section Parallel Gaussian elimination for the diagonally dominant case In this section we assume that the matrix A a in is diagonally dominant ie that ij ij n n X ja j i n ja j ij ii j j 6i Then the system of equations can b e solved by Gaussian elimination without pivoting in the following three steps PARALLEL SOLVERS FOR NARROWBANDED LINEAR SYSTEMS Factorization into A LU Solution of Lz y forward elimination Solution of U x z backward substitution The lower and upp er triangular Gauss factors L and U are banded with bandwidths k and k l u resp ectively where k and k are the halfbandwidths of A The number of oating p oint op erations l u for solving the banded system with r righthand sides by Gaussian elimination is see also n eg k k n k k r n O k r k k maxfk k g n u l l u l u For solving in parallel on a p pro cessor multicomputer we partition the matrix A the solution vector x and the righthand hand side b according to U A B x b U L B C C B B C C D B B C C B B C L U B C C B B C D A B x b B C C B B C B C C B B C B C C B B C B C C B B C L U A A A B C D p p p p p L A D x b p p p p P p k k k n n n i i i R and n p k n This block C R x b R where A R i i i i i i i i tridiagonal partition is feasible only if n k This condition restricts the degree of parallelism i ie the maximal number of pro cessors p that
Recommended publications
  • Recursive Approach in Sparse Matrix LU Factorization
    51 Recursive approach in sparse matrix LU factorization Jack Dongarra, Victor Eijkhout and the resulting matrix is often guaranteed to be positive Piotr Łuszczek∗ definite or close to it. However, when the linear sys- University of Tennessee, Department of Computer tem matrix is strongly unsymmetric or indefinite, as Science, Knoxville, TN 37996-3450, USA is the case with matrices originating from systems of Tel.: +865 974 8295; Fax: +865 974 8296 ordinary differential equations or the indefinite matri- ces arising from shift-invert techniques in eigenvalue methods, one has to revert to direct methods which are This paper describes a recursive method for the LU factoriza- the focus of this paper. tion of sparse matrices. The recursive formulation of com- In direct methods, Gaussian elimination with partial mon linear algebra codes has been proven very successful in pivoting is performed to find a solution of Eq. (1). Most dense matrix computations. An extension of the recursive commonly, the factored form of A is given by means technique for sparse matrices is presented. Performance re- L U P Q sults given here show that the recursive approach may per- of matrices , , and such that: form comparable to leading software packages for sparse ma- LU = PAQ, (2) trix factorization in terms of execution time, memory usage, and error estimates of the solution. where: – L is a lower triangular matrix with unitary diago- nal, 1. Introduction – U is an upper triangular matrix with arbitrary di- agonal, Typically, a system of linear equations has the form: – P and Q are row and column permutation matri- Ax = b, (1) ces, respectively (each row and column of these matrices contains single a non-zero entry which is A n n A ∈ n×n x where is by real matrix ( R ), and 1, and the following holds: PPT = QQT = I, b n b, x ∈ n and are -dimensional real vectors ( R ).
    [Show full text]
  • Abstract 1 Introduction
    Implementation in ScaLAPACK of Divide-and-Conquer Algorithms for Banded and Tridiagonal Linear Systems A. Cleary Department of Computer Science University of Tennessee J. Dongarra Department of Computer Science University of Tennessee Mathematical Sciences Section Oak Ridge National Laboratory Abstract Described hereare the design and implementation of a family of algorithms for a variety of classes of narrow ly banded linear systems. The classes of matrices include symmetric and positive de - nite, nonsymmetric but diagonal ly dominant, and general nonsymmetric; and, al l these types are addressed for both general band and tridiagonal matrices. The family of algorithms captures the general avor of existing divide-and-conquer algorithms for banded matrices in that they have three distinct phases, the rst and last of which arecompletely paral lel, and the second of which is the par- al lel bottleneck. The algorithms have been modi ed so that they have the desirable property that they are the same mathematical ly as existing factorizations Cholesky, Gaussian elimination of suitably reordered matrices. This approach represents a departure in the nonsymmetric case from existing methods, but has the practical bene ts of a smal ler and more easily hand led reduced system. All codes implement a block odd-even reduction for the reduced system that al lows the algorithm to scale far better than existing codes that use variants of sequential solution methods for the reduced system. A cross section of results is displayed that supports the predicted performance results for the algo- rithms. Comparison with existing dense-type methods shows that for areas of the problem parameter space with low bandwidth and/or high number of processors, the family of algorithms described here is superior.
    [Show full text]
  • Periodic Staircase Matrices and Generalized Cluster Structures
    PERIODIC STAIRCASE MATRICES AND GENERALIZED CLUSTER STRUCTURES MISHA GEKHTMAN, MICHAEL SHAPIRO, AND ALEK VAINSHTEIN Abstract. As is well-known, cluster transformations in cluster structures of geometric type are often modeled on determinant identities, such as short Pl¨ucker relations, Desnanot–Jacobi identities and their generalizations. We present a construction that plays a similar role in a description of general- ized cluster transformations and discuss its applications to generalized clus- ter structures in GLn compatible with a certain subclass of Belavin–Drinfeld Poisson–Lie brackets, in the Drinfeld double of GLn, and in spaces of periodic difference operators. 1. Introduction Since the discovery of cluster algebras in [4], many important algebraic varieties were shown to support a cluster structure in a sense that the coordinate rings of such variety is isomorphic to a cluster algebra or an upper cluster algebra. Lie theory and representation theory turned out to be a particularly rich source of varieties of this sort including but in no way limited to such examples as Grassmannians [5, 18], double Bruhat cells [1] and strata in flag varieties [15]. In all these examples, cluster transformations that connect distinguished coordinate charts within a ring of regular functions are modeled on three-term relations such as short Pl¨ucker relations, Desnanot–Jacobi identities and their Lie-theoretic generalizations of the kind considered in [3]. This remains true even in the case of exotic cluster structures on GLn considered in [8, 10] where cluster transformations can be obtained by applying Desnanot–Jacobi type identities to certain structured matrices of a size far exceeding n.
    [Show full text]
  • Overview of Iterative Linear System Solver Packages
    Overview of Iterative Linear System Solver Packages Victor Eijkhout July, 1998 Abstract Description and comparison of several packages for the iterative solu- tion of linear systems of equations. 1 1 Intro duction There are several freely available packages for the iterative solution of linear systems of equations, typically derived from partial di erential equation prob- lems. In this rep ort I will give a brief description of a numberofpackages, and giveaninventory of their features and de ning characteristics. The most imp ortant features of the packages are which iterative metho ds and preconditioners supply; the most relevant de ning characteristics are the interface they present to the user's data structures, and their implementation language. 2 2 Discussion Iterative metho ds are sub ject to several design decisions that a ect ease of use of the software and the resulting p erformance. In this section I will give a global discussion of the issues involved, and how certain p oints are addressed in the packages under review. 2.1 Preconditioners A go o d preconditioner is necessary for the convergence of iterative metho ds as the problem to b e solved b ecomes more dicult. Go o d preconditioners are hard to design, and this esp ecially holds true in the case of parallel pro cessing. Here is a short inventory of the various kinds of preconditioners found in the packages reviewed. 2.1.1 Ab out incomplete factorisation preconditioners Incomplete factorisations are among the most successful preconditioners devel- op ed for single-pro cessor computers. Unfortunately, since they are implicit in nature, they cannot immediately b e used on parallel architectures.
    [Show full text]
  • Arxiv:Math/0010243V1
    FOUR SHORT STORIES ABOUT TOEPLITZ MATRIX CALCULATIONS THOMAS STROHMER∗ Abstract. The stories told in this paper are dealing with the solution of finite, infinite, and biinfinite Toeplitz-type systems. A crucial role plays the off-diagonal decay behavior of Toeplitz matrices and their inverses. Classical results of Gelfand et al. on commutative Banach algebras yield a general characterization of this decay behavior. We then derive estimates for the approximate solution of (bi)infinite Toeplitz systems by the finite section method, showing that the approximation rate depends only on the decay of the entries of the Toeplitz matrix and its condition number. Furthermore, we give error estimates for the solution of doubly infinite convolution systems by finite circulant systems. Finally, some quantitative results on the construction of preconditioners via circulant embedding are derived, which allow to provide a theoretical explanation for numerical observations made by some researchers in connection with deconvolution problems. Key words. Toeplitz matrix, Laurent operator, decay of inverse matrix, preconditioner, circu- lant matrix, finite section method. AMS subject classifications. 65T10, 42A10, 65D10, 65F10 0. Introduction. Toeplitz-type equations arise in many applications in mathe- matics, signal processing, communications engineering, and statistics. The excellent surveys [4, 17] describe a number of applications and contain a vast list of references. The stories told in this paper are dealing with the (approximate) solution of biinfinite, infinite, and finite hermitian positive definite Toeplitz-type systems. We pay special attention to Toeplitz-type systems with certain decay properties in the sense that the entries of the matrix enjoy a certain decay rate off the diagonal.
    [Show full text]
  • Accelerating the LOBPCG Method on Gpus Using a Blocked Sparse Matrix Vector Product
    Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product Hartwig Anzt and Stanimire Tomov and Jack Dongarra Innovative Computing Lab University of Tennessee Knoxville, USA Email: [email protected], [email protected], [email protected] Abstract— the computing power of today’s supercomputers, often accel- erated by coprocessors like graphics processing units (GPUs), This paper presents a heterogeneous CPU-GPU algorithm design and optimized implementation for an entire sparse iter- becomes challenging. ative eigensolver – the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) – starting from low-level GPU While there exist numerous efforts to adapt iterative lin- data structures and kernels to the higher-level algorithmic choices ear solvers like Krylov subspace methods to coprocessor and overall heterogeneous design. Most notably, the eigensolver technology, sparse eigensolvers have so far remained out- leverages the high-performance of a new GPU kernel developed side the main focus. A possible explanation is that many for the simultaneous multiplication of a sparse matrix and a of those combine sparse and dense linear algebra routines, set of vectors (SpMM). This is a building block that serves which makes porting them to accelerators more difficult. Aside as a backbone for not only block-Krylov, but also for other from the power method, algorithms based on the Krylov methods relying on blocking for acceleration in general. The subspace idea are among the most commonly used general heterogeneous LOBPCG developed here reveals the potential of eigensolvers [1]. When targeting symmetric positive definite this type of eigensolver by highly optimizing all of its components, eigenvalue problems, the recently developed Locally Optimal and can be viewed as a benchmark for other SpMM-dependent applications.
    [Show full text]
  • THREE STEPS on an OPEN ROAD Gilbert Strang This Note Describes
    Inverse Problems and Imaging doi:10.3934/ipi.2013.7.961 Volume 7, No. 3, 2013, 961{966 THREE STEPS ON AN OPEN ROAD Gilbert Strang Massachusetts Institute of Technology Cambridge, MA 02139, USA Abstract. This note describes three recent factorizations of banded invertible infinite matrices 1. If A has a banded inverse : A=BC with block{diagonal factors B and C. 2. Permutations factor into a shift times N < 2w tridiagonal permutations. 3. A = LP U with lower triangular L, permutation P , upper triangular U. We include examples and references and outlines of proofs. This note describes three small steps in the factorization of banded matrices. It is written to encourage others to go further in this direction (and related directions). At some point the problems will become deeper and more difficult, especially for doubly infinite matrices. Our main point is that matrices need not be Toeplitz or block Toeplitz for progress to be possible. An important theory is already established [2, 9, 10, 13-16] for non-Toeplitz \band-dominated operators". The Fredholm index plays a key role, and the second small step below (taken jointly with Marko Lindner) computes that index in the special case of permutation matrices. Recall that banded Toeplitz matrices lead to Laurent polynomials. If the scalars or matrices a−w; : : : ; a0; : : : ; aw lie along the diagonals, the polynomial is A(z) = P k akz and the bandwidth is w. The required index is in this case a winding number of det A(z). Factorization into A+(z)A−(z) is a classical problem solved by Plemelj [12] and Gohberg [6-7].
    [Show full text]
  • Arxiv:2004.05118V1 [Math.RT]
    GENERALIZED CLUSTER STRUCTURES RELATED TO THE DRINFELD DOUBLE OF GLn MISHA GEKHTMAN, MICHAEL SHAPIRO, AND ALEK VAINSHTEIN Abstract. We prove that the regular generalized cluster structure on the Drinfeld double of GLn constructed in [9] is complete and compatible with the standard Poisson–Lie structure on the double. Moreover, we show that for n = 4 this structure is distinct from a previously known regular generalized cluster structure on the Drinfeld double, even though they have the same compatible Poisson structure and the same collection of frozen variables. Further, we prove that the regular generalized cluster structure on band periodic matrices constructed in [9] possesses similar compatibility and completeness properties. 1. Introduction In [6, 8] we constructed an initial seed Σn for a complete generalized cluster D structure GCn in the ring of regular functions on the Drinfeld double D(GLn) and proved that this structure is compatible with the standard Poisson–Lie structure on D(GLn). In [9, Section 4] we constructed a different seed Σ¯ n for a regular D D generalized cluster structure GCn on D(GLn). In this note we prove that GCn D shares all the properties of GCn : it is complete and compatible with the standard Poisson–Lie structure on D(GLn). Moreover, we prove that the seeds Σ¯ 4(X, Y ) and T T Σ4(Y ,X ) are not mutationally equivalent. In this way we provide an explicit example of two different regular complete generalized cluster structures on the same variety with the same compatible Poisson structure and the same collection of frozen D variables.
    [Show full text]
  • The Spectra of Large Toeplitz Band Matrices with A
    MATHEMATICS OF COMPUTATION Volume 72, Number 243, Pages 1329{1348 S 0025-5718(03)01505-9 Article electronically published on February 3, 2003 THESPECTRAOFLARGETOEPLITZBANDMATRICES WITH A RANDOMLY PERTURBED ENTRY A. BOTTCHER,¨ M. EMBREE, AND V. I. SOKOLOV Abstract. (j;k) This paper is concerned with the union spΩ Tn(a) of all possible spectra that may emerge when perturbing a large n n Toeplitz band matrix × Tn(a)inthe(j; k) site by a number randomly chosen from some set Ω. The main results give descriptive bounds and, in several interesting situations, even (j;k) provide complete identifications of the limit of sp Tn(a)asn .Also Ω !1 discussed are the cases of small and large sets Ω as well as the \discontinuity of (j;k) the infinite volume case", which means that in general spΩ Tn(a)doesnot converge to something close to sp(j;k) T (a)asn ,whereT (a)isthecor- Ω !1 responding infinite Toeplitz matrix. Illustrations are provided for tridiagonal Toeplitz matrices, a notable special case. 1. Introduction and main results For a complex-valued continuous function a on the complex unit circle T,the infinite Toeplitz matrix T (a) and the finite Toeplitz matrices Tn(a) are defined by n T (a)=(aj k)j;k1 =1 and Tn(a)=(aj k)j;k=1; − − where a` is the `th Fourier coefficient of a, 2π 1 iθ i`θ a` = a(e )e− dθ; ` Z: 2π 2 Z0 Here, we restrict our attention to the case where a is a trigonometric polynomial, a , implying that at most a finite number of the Fourier coefficients are nonzero; equivalently,2P T (a) is a banded matrix.
    [Show full text]
  • Over-Scalapack.Pdf
    1 2 The Situation: Parallel scienti c applications are typically Overview of ScaLAPACK written from scratch, or manually adapted from sequen- tial programs Jack Dongarra using simple SPMD mo del in Fortran or C Computer Science Department UniversityofTennessee with explicit message passing primitives and Mathematical Sciences Section Which means Oak Ridge National Lab oratory similar comp onents are co ded over and over again, co des are dicult to develop and maintain http://w ww .ne tl ib. org /ut k/p e opl e/J ackDo ngarra.html debugging particularly unpleasant... dicult to reuse comp onents not really designed for long-term software solution 3 4 What should math software lo ok like? The NetSolve Client Many more p ossibilities than shared memory Virtual Software Library Multiplicityofinterfaces ! Ease of use Possible approaches { Minimum change to status quo Programming interfaces Message passing and ob ject oriented { Reusable templates { requires sophisticated user Cinterface FORTRAN interface { Problem solving environments Integrated systems a la Matlab, Mathematica use with heterogeneous platform Interactiveinterfaces { Computational server, like netlib/xnetlib f2c Non-graphic interfaces Some users want high p erf., access to details {MATLAB interface Others sacri ce sp eed to hide details { UNIX-Shell interface Not enough p enetration of libraries into hardest appli- Graphic interfaces cations on vector machines { TK/TCL interface Need to lo ok at applications and rethink mathematical software { HotJava Browser
    [Show full text]
  • Prospectus for the Next LAPACK and Scalapack Libraries
    Prospectus for the Next LAPACK and ScaLAPACK Libraries James Demmel1, Jack Dongarra23, Beresford Parlett1, William Kahan1, Ming Gu1, David Bindel1, Yozo Hida1, Xiaoye Li1, Osni Marques1, E. Jason Riedy1, Christof Voemel1, Julien Langou2, Piotr Luszczek2, Jakub Kurzak2, Alfredo Buttari2, Julie Langou2, Stanimire Tomov2 1 University of California, Berkeley CA 94720, USA, 2 University of Tennessee, Knoxville TN 37996, USA, 3 Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA, 1 Introduction Dense linear algebra (DLA) forms the core of many scienti¯c computing appli- cations. Consequently, there is continuous interest and demand for the devel- opment of increasingly better algorithms in the ¯eld. Here 'better' has a broad meaning, and includes improved reliability, accuracy, robustness, ease of use, and most importantly new or improved algorithms that would more e±ciently use the available computational resources to speed up the computation. The rapidly evolving high end computing systems and the close dependence of DLA algo- rithms on the computational environment is what makes the ¯eld particularly dynamic. A typical example of the importance and impact of this dependence is the development of LAPACK [1] (and later ScaLAPACK [2]) as a successor to the well known and formerly widely used LINPACK [3] and EISPACK [3] libraries. Both LINPACK and EISPACK were based, and their e±ciency depended, on optimized Level 1 BLAS [4]. Hardware development trends though, and in par- ticular an increasing Processor-to-Memory speed gap of approximately 50% per year, started to increasingly show the ine±ciency of Level 1 BLAS vs Level 2 and 3 BLAS, which prompted e®orts to reorganize DLA algorithms to use block matrix operations in their innermost loops.
    [Show full text]
  • Conversion of Sparse Matrix to Band Matrix Using an Fpga For
    CONVERSION OF SPARSE MATRIX TO BAND MATRIX USING AN FPGA FOR HIGH-PERFORMANCE COMPUTING by Anjani Chaudhary, M.S. A thesis submitted to the Graduate Council of Texas State University in partial fulfillment of the requirements for the degree of Master of Science with a Major in Engineering December 2020 Committee Members: Semih Aslan, Chair Shuying Sun William A Stapleton COPYRIGHT by Anjani Chaudhary 2020 FAIR USE AND AUTHOR’S PERMISSION STATEMENT Fair Use This work is protected by the Copyright Laws of the United States (Public Law 94-553, section 107). Consistent with fair use as defined in the Copyright Laws, brief quotations from this material are allowed with proper acknowledgement. Use of this material for financial gain without the author’s express written permission is not allowed. Duplication Permission As the copyright holder of this work, I, Anjani Chaudhary, authorize duplication of this work, in whole or in part, for educational or scholarly purposes only. ACKNOWLEDGEMENTS Throughout my research and report writing process, I am grateful to receive support and suggestions from the following people. I would like to thank my supervisor and chair, Dr. Semih Aslan, Associate Professor, Ingram School of Engineering, for his time, support, encouragement, and patience, without which I would not have made it. My committee members Dr. Bill Stapleton, Associate Professor, Ingram School of Engineering, and Dr. Shuying Sun, Associate Professor, Department of Mathematics, for their essential guidance throughout the research and my graduate advisor, Dr. Vishu Viswanathan, for providing the opportunity and valuable suggestions which helped me to come one more step closer to my goal.
    [Show full text]