Avoiding Communication in Dense Linear Algebra

UC Berkeley
UC Berkeley Electronic Theses and Dissertations

Title: Avoiding Communication in Dense Linear Algebra
Permalink: https://escholarship.org/uc/item/95n2b7vr
Author: Ballard, Grey
Publication Date: 2013
Peer reviewed | Thesis/dissertation

Avoiding Communication in Dense Linear Algebra
by Grey Malone Ballard

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science and the Designated Emphasis in Computational Science and Engineering in the Graduate Division of the University of California, Berkeley

Committee in charge:
Professor James Demmel, Chair
Professor Ming Gu
Professor Katherine Yelick

Fall 2013

Copyright 2013 by Grey Malone Ballard

Abstract

Dense linear algebra computations are essential to nearly every problem in scientific computing and to countless other fields. Most matrix computations enjoy a high computational intensity (i.e., ratio of computation to data), and therefore the algorithms for the computations have a potential for high efficiency. However, performance for many linear algebra algorithms is limited by the cost of moving data between processors on a parallel computer or throughout the memory hierarchy of a single processor, which we will refer to generally as communication. Technological trends indicate that algorithmic performance will become even more limited by communication in the future.
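As a rough illustration of computational intensity, the sketch below compares classical matrix-matrix multiplication with matrix-vector multiplication using the standard leading-order operation counts. The function names are illustrative and do not come from the thesis, and the word counts assume each operand is touched exactly once.

```python
def matmul_intensity(n: int) -> float:
    """Flops per word for the classical update C = C + A*B with n-by-n matrices."""
    flops = 2 * n**3      # n^3 multiply-add pairs, counted as 2 flops each
    words = 3 * n**2      # entries of A, B, and C
    return flops / words


def matvec_intensity(n: int) -> float:
    """Flops per word for the update y = y + A*x with an n-by-n matrix A."""
    flops = 2 * n**2      # n^2 multiply-add pairs
    words = n**2 + 2 * n  # entries of A, x, and y
    return flops / words


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, matmul_intensity(n), matvec_intensity(n))
```

Under these counts the intensity of matrix multiplication grows linearly with n, while that of matrix-vector multiplication stays below 2, which is why the former has far more room to hide data movement behind arithmetic.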
In this thesis, we consider the fundamental computations within dense linear algebra and address the following question: can we significantly improve the current algorithms for these computations, in terms of the communication they require and their performance in practice? To answer the question, we analyze algorithms on sequential and parallel architectural models that are simple enough to determine coarse communication costs but accurate enough to predict performance of implementations on real hardware. For most of the computations, we prove lower bounds on the communication that any algorithm must perform. If an algorithm exists with communication costs that match the lower bounds (at least in an asymptotic sense), we call the algorithm communication optimal. In many cases, the most commonly used algorithms are not communication optimal, and we can develop new algorithms that require less data movement and attain the communication lower bounds. In this thesis, we develop both new communication lower bounds and new algorithms, tightening (and in many cases closing) the gap between best known lower bound and best known algorithm (or upper bound). We consider both sequential and parallel algorithms, and we assess both classical and fast algorithms (e.g., Strassen's matrix multiplication algorithm).
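The idea of matching a lower bound can be sketched for classical matrix multiplication in a two-level memory model with a fast memory of M words. The code below is a simplified counting model, not the analysis used in the thesis: the function names, the no-reuse assumption for the naive algorithm, and the requirement that the block size divides n are all assumptions made for illustration, and the well-known lower bound of order n^3 / sqrt(M) is shown with its constant factor omitted.

```python
import math


def naive_words_moved(n: int) -> int:
    """Words moved by the three-loop algorithm under a no-reuse assumption.

    Assumption: fast memory is too small to keep a row or column between
    inner products, so each of the n*n entries of C reads n words of A,
    reads n words of B, and writes one word.
    """
    return n * n * (2 * n + 1)


def blocked_words_moved(n: int, b: int, M: int) -> int:
    """Words moved by b-by-b blocked multiplication with M words of fast memory."""
    if 3 * b * b > M:
        raise ValueError("three b-by-b blocks must fit in fast memory")
    if n % b != 0:
        raise ValueError("this sketch assumes b divides n")
    nb = n // b
    ab_reads = 2 * nb**3 * b * b  # one block of A and one of B per block product
    c_traffic = 2 * n * n         # each block of C is read once and written once
    return ab_reads + c_traffic


def lower_bound_scale(n: int, M: int) -> float:
    """n^3 / sqrt(M): the classical lower bound with its constant factor omitted."""
    return n**3 / math.sqrt(M)


if __name__ == "__main__":
    n, b = 1024, 64
    M = 3 * b * b  # smallest fast memory that holds three blocks
    print("naive  :", naive_words_moved(n))
    print("blocked:", blocked_words_moved(n, b, M))
    print("bound  :", lower_bound_scale(n, M))
```

In this model the blocked algorithm moves about 2n^3/b words, so choosing b proportional to sqrt(M) keeps it within a constant factor of the bound, which is the sense in which an algorithm is called communication optimal above.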
In particular, the central contributions of this thesis are
• proving new communication lower bounds for nearly all classical direct linear algebra computations (dense or sparse), including factorizations for solving linear systems, least squares problems, and eigenvalue and singular value problems,
• proving new communication lower bounds for Strassen's and other fast matrix multiplication algorithms,
• proving new parallel communication lower bounds for classical and fast computations that set limits on an algorithm's ability to perfectly strong scale,
• summarizing the state-of-the-art in communication efficiency for both sequential and parallel algorithms for the computations to which the lower bounds apply,
• developing a new communication-optimal algorithm for computing a symmetric-indefinite factorization (observing speedups of up to 2.8× compared to alternative shared-memory parallel algorithms),
• developing new, more communication-efficient algorithms for reducing a symmetric band matrix to tridiagonal form via orthogonal similarity transformations (observing speedups of 2–6× compared to alternative sequential and parallel algorithms), and
• developing a new communication-optimal parallelization of Strassen's matrix multiplication algorithm (observing speedups of up to 2.84× compared to alternative distributed-memory parallel algorithms).

Table of Contents

List of Figures
List of Tables

1 Introduction
1.1 The Role of Scientific Computing
1.2 The Importance of Dense Linear Algebra
1.3 The Rise of Parallelism and the Relative Costs of Communication
1.4 Thesis Goals and Contributions
1.5 Thesis Organization

2 Preliminaries
2.1 Notation and Definitions
2.1.1 Asymptotic Notation
2.1.2 Algorithmic Terminology
2.1.3 Communication Terminology
2.2 Memory Models
2.2.1 Two-Level Sequential Memory Model
2.2.2 Distributed-Memory Parallel Model
2.3 Data Layouts
2.3.1 Matrix Layouts in Slow Memory
2.3.2 Matrix Distributions on Parallel Machines
2.4 Fast Matrix Multiplication Algorithms
2.4.1 Strassen's Algorithm
2.4.2 Strassen-Winograd Algorithm
2.5 Lower Bound Lemmas
2.5.1 Loomis-Whitney Inequality
2.5.2 Expansion Preliminaries
2.5.3 Latency Lower Bounds
2.6 Numerical Stability Lemmas

I Communication Lower Bounds

3 Communication Lower Bounds via Reductions
3.1 Classical Matrix Multiplication
3.2 Reduction Arguments
3.2.1 LU Decomposition
3.2.2 Cholesky Decomposition
3.3 Conclusions

4 Lower Bounds for Classical Linear Algebra
4.1 Lower Bounds for Three-Nested-Loops Computation
4.1.1 Lower Bound Argument
4.1.2 Applications of the Lower Bound
4.2 Lower Bounds for Three-Nested-Loop Computation with Temporary Operands
4.2.1 Lower Bound Argument
4.2.2 Applications of the Lower Bound
4.3 Applying Orthogonal Transformations
4.3.1 First Lower Bound Argument: Applying Theorem 4.10
4.3.2 Second Lower Bound Argument: Bounding Z Values
4.3.3 Generalizing to Eigenvalue and Singular Value Reductions
4.3.4 Applicability of the Lower Bounds
4.4 Attainability

5 Lower Bounds for Strassen's Matrix Multiplication
5.1 Relating Edge Expansion to Communication
5.1.1 Computation Graph
5.1.2 Partition Argument
5.1.3 Edge Expansion and Communication
5.2 Expansion Properties of Strassen's Algorithm
5.2.1 Computation Graph for n-by-n Matrices
5.3 Communication Lower Bounds
5.4 Conclusions

6 Extensions of the Lower Bounds
6.1 Strassen-like Algorithms
6.1.1 Connected Decoding Graph Assumption
6.1.2 Communication Costs of Strassen-like Algorithms
6.1.3 Fast Linear Algebra
6.1.4 Fast Rectangular Matrix Multiplication Algorithms
6.2 Memory-Independent Lower Bounds
6.2.1 Communication Lower Bounds
6.2.2 Limits of Strong Scaling
6.2.3 Extensions of Memory-Independent Bounds
6.3 Other Extensions
6.3.1 k-Nested-Loops Computations
6.3.2 Sparse Matrix-Matrix Multiplication

II Algorithms and Communication Cost Analysis

7 Sequential Algorithms and their Communication Costs
7.1 Classical Linear Algebra
7.1.1 BLAS Computations
7.1.2 Cholesky Decomposition
7.1.3 Symmetric-Indefinite Decompositions
7.1.4 LU Decomposition
7.1.5 QR Decomposition
7.1.6 Symmetric Eigendecomposition and SVD
7.1.7 Nonsymmetric Eigendecomposition
7.2 Fast Linear Algebra
7.3 Conclusions and Future Work

8 Parallel Algorithms and their Communication Costs
8.1 Classical Linear Algebra (with Minimal Memory)
8.1.1 BLAS Computations
8.1.2 Cholesky Decomposition
8.1.3 Symmetric-Indefinite Decompositions
8.1.4 LU Decomposition
8.1.5 QR Decomposition
8.1.6 Symmetric Eigendecomposition and SVD
8.1.7 Nonsymmetric Eigendecomposition
8.2 Classical Linear Algebra (with Extra Memory)
8.2.1 Matrix Multiplication
8.2.2 Other Linear Algebra Computations
8.3 Fast Linear Algebra
8.4 Conclusions and Future Work

9 Communication-Avoiding Symmetric-Indefinite Factorization
9.1 Block-Aasen Algorithm
9.1.1 Correctness
9.1.2 Solving Two-Sided Triangular Linear Systems
9.1.3 Pivoting
9.1.4 Computing W and H
9.1.5 The Second Phase of the Algorithm: Factoring T
9.2 Numerical Stability
9.2.1 Stability of the Two-Sided Triangular Solver
9.2.2 Stability of the Block-Aasen Algorithm
9.2.3 Growth
9.3 Sequential Complexity Analyses
9.3.1 Computational Cost
9.3.2 Communication Costs
9.4 Numerical Experiments
9.5 Conclusions

10 Communication-Avoiding Successive Band Reduction
10.1 Preliminaries
10.1.1 Eigendecomposition of Band Matrices
10.1.2 SBR Notation ...