Vector Models for Data-Parallel Computing

Guy E. Blelloch
The MIT Press, Cambridge, Massachusetts; London, England
© 1990 Massachusetts Institute of Technology

Contents

Preface xiii
Acknowledgments xv

1 Introduction 1
1.1 Parallel Vector Models 3
1.2 Vector Instructions 5
1.3 Implementation 9
1.4 Summary and Roadmap 10

I Models 15

2 Parallel Vector Models 19
2.1 The Vector Random Access Machine 19
2.2 Comparison to P-RAM Models 22
2.3 Comparison to Circuit and Network Models 27
2.4 Comparison to Bit-Vector Models 30
2.5 Selecting Primitives 31
2.6 Other Issues 31
  2.6.1 Serially Optimal Algorithms 31
  2.6.2 Space Complexity 32
  2.6.3 Equal Time Assumption 32
  2.6.4 Do We Need the Scalar Memory? 32
2.7 Conclusion 33

3 The Scan Primitives 35
3.1 Why Scan Primitives? 37
3.2 Notation 39
3.3 Example: Line-of-Sight 40
3.4 Simple Operations 42
  3.4.1 Example: Split Radix Sort 43
3.5 Segments and Segmented Scans 45
  3.5.1 Example: Quicksort 46
  3.5.2 Notes on Segments 47
3.6 Allocating Elements 48
  3.6.1 Example: Line Drawing 50
  3.6.2 Notes on Allocating 51
3.7 Long Vectors and Load Balancing 51
  3.7.1 Load Balancing 53
  3.7.2 Example: Halving Merge 54
  3.7.3 Notes on Simulating Long Vectors 56

4 The Scan Vector Model 59
4.1 The Scan Vector Instruction Set 59
  4.1.1 Scalar Instructions 61
  4.1.2 Elementwise Instructions 61
  4.1.3 Permute Instructions 62
  4.1.4 Scan Instructions 63
  4.1.5 Vector-Scalar Instructions 63
4.2 Simple Operations 64
4.3 Segments and Segmented Instructions 67
4.4 Segmented Operations 68
4.5 Additional Instructions 70
  4.5.1 Merge Instruction 70
  4.5.2 Combine Instructions 71
  4.5.3 Multi-Extract Instruction 72
  4.5.4 Keyed-Scan Instructions 73

II Algorithms 75

5 Data Structures 79
5.1 Graphs 79
  5.1.1 Vector Graph Representations 80
  5.1.2 Neighbor Reducing 83
  5.1.3 Distributing an Excess Across Edges 83
5.2 Trees 84
  5.2.1 Vector Tree Representation 84
  5.2.2 Leaffix and Rootfix Operations 86
  5.2.3 Tree Manipulations 87
5.3 Multidimensional Arrays 91

6 Computational-Geometry Algorithms 93
6.1 Generalized Binary Search 94
6.2 Building a k-D Tree 96
6.3 Closest Pair 99
6.4 Quickhull 101
6.5 √n Merge Hull 103
6.6 Line of Sight 105

7 Graph Algorithms 109
7.1 Minimum Spanning Tree and Connectivity 110
7.2 Maximum Flow 114
7.3 Maximal Independent Set and Biconnectivity 114

8 Numerical Algorithms 117
8.1 Matrix-Vector Multiplication 117
8.2 Linear-Systems Solver 118
8.3 Simplex 122
8.4 Outer Product 123
8.5 Sparse-Matrix Multiplication 123

III Languages and Compilers 127

9 Collection-Oriented Languages 131
9.1 Collections 132
9.2 Collection Operations 135
9.3 Mapping Collections onto Vectors 138

10 Flattening Nested Parallelism 143
10.1 Nested Parallelism and Replicating 144
10.2 The Replicating Theorem 147
10.3 Access-Restricted Code 149
10.4 Access-Fixed Replicating Theorem 150
10.5 Indirect Addressing 152
10.6 Conditional Control 154
  10.6.1 Branch-Packing 154
  10.6.2 Contained Programs 157
  10.6.3 Containment of Functions in Book 160
  10.6.4 Round-Robin Simulation 161

11 A Compiler for Paralation Lisp 163
11.1 Source Code: Paralation Lisp 165
  11.1.1 Data Structures 166
  11.1.2 Operators 167
  11.1.3 Restrictions 169
11.2 Target Code: Scan-Vector Lisp 169
  11.2.1 Data Structures 170
  11.2.2 Operations 170
11.3 Translation 171
  11.3.1 Data Structures 171
  11.3.2 Operations 175

IV Architecture 185

12 Implementing Parallel Vector Models 189
12.1 Implementation on the Connection Machine 189
  12.1.1 The Vector Memory 190
  12.1.2 The Instructions 192
  12.1.3 Optimizations 193
  12.1.4 Running Times 195
12.2 Simulating on P-RAM 195

13 Implementing the Scan Primitives 197
13.1 Unsigned +-Scan and Max-Scan 197
  13.1.1 Tree Scan 198
  13.1.2 Hardware Implementation of Tree Scan 199
  13.1.3 An Example System 202
13.2 Directly Implementing Other Scans 202
  13.2.1 Backward and Segmented Scans 203
  13.2.2 Multidimensional Grid Scans 205
13.3 Floating-Point +-Scan 205

14 Conclusion 209
14.1 Contributions 210
14.2 Directions for Future Research 211
14.3 Implementation Goals 212
14.4 Development of Book Ideas 213

A Glossary 217
B Code 223
B.1 Simple Operations 224
B.2 Segments 226
  B.2.1 Useful Utilities 226
  B.2.2 Segment Descriptor Translations 227
  B.2.3 Segmented Primitives 228
  B.2.4 Segmented Conditionals 229
B.3 Other Routines 229
C Paralation-Lisp Code 231
C.1 Utilities 231
C.2 Line Drawing 233
C.3 Quad-Tree 233
C.4 Convex Hull: Quickhull 234
C.5 Quicksort 235
C.6 Entropy 236
C.7 ID3: Quinlan's Learning Algorithm 237
Bibliography 243
Index 251

Preface

This book is a revised version of my Doctoral Dissertation, which was completed at the Massachusetts Institute of Technology in November, 1988. The main purpose of the work was to explore the power of data-parallel programming; this exploration led to the following conclusions:

1. The advantages gained in terms of simple and clean specifications of applications, algorithms, and languages make a data-parallel programming model desirable for any kind of tightly-coupled parallel or vector machine, including multiple-instruction multiple-data (MIMD) machines.

2. The range of applications and algorithms that can be described using data-parallel programming is extremely broad, much broader than is often expected. Furthermore, in most applications there is significantly more data-parallelism available than control parallelism.

3. The power of data-parallel programming models is only fully realized in models that permit nested parallelism: the ability to call a parallel routine multiple times in parallel—for example, calling a parallel matrix-inversion routine over many different matrices, each possibly of a different size, in parallel. Furthermore, to be useful, it must be possible to map this nested parallelism onto a flat parallel machine.

4. A set of scan primitives are extremely useful for describing data-parallel algorithms, and lead to efficient runtime code. The scan primitives can be found in every algorithm in this book, with uses ranging from load balancing to a line-of-sight algorithm.

The work does not claim that data-parallel programming models are applicable to all problems, but it demonstrates that for a very wide class of problems, data-parallel programming models are not only applicable, but preferable, for programming tightly-coupled machines.

Outline

This book is organized into four parts: models, algorithms, languages and compilers, and architecture.
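The scan (all-prefix-sums) primitives mentioned above are simple to state even outside the book's own notation. As a minimal illustrative sketch in plain Python (the book itself uses a scan-vector instruction set and Paralation Lisp, not Python), an exclusive +-scan and its segmented variant can be written as follows; the input vector and segment flags are just illustration values.

    def plus_scan(values):
        # Exclusive +-scan (prefix sum): result[i] = sum of values[0..i-1].
        result, running = [], 0
        for v in values:
            result.append(running)
            running += v
        return result

    def segmented_plus_scan(values, segment_flags):
        # Exclusive +-scan that restarts at every position whose flag is True.
        result, running = [], 0
        for v, flag in zip(values, segment_flags):
            if flag:              # True marks the start of a new segment
                running = 0
            result.append(running)
            running += v
        return result

    print(plus_scan([5, 1, 3, 4, 3, 9, 2, 6]))
    # -> [0, 5, 6, 9, 13, 16, 25, 27]
    print(segmented_plus_scan([5, 1, 3, 4, 3, 9, 2, 6],
                              [True, False, False, True, False, False, True, False]))
    # -> [0, 5, 6, 0, 4, 7, 0, 2]

A parallel implementation computes the same result in O(log n) steps on a balanced tree, which is the subject of Chapter 13.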
Recommended publications
  • Getting Started Computing at the AI Lab by Christopher C. Stacy
    MASSACHUSETTS INSTITUTE OF TECHNOLOGY, ARTIFICIAL INTELLIGENCE LABORATORY. WORKING PAPER 235, 7 September 1982. Getting Started Computing at the AI Lab, by Christopher C. Stacy. Abstract: This document describes the computing facilities at the M.I.T. Artificial Intelligence Laboratory, and explains how to get started using them. It is intended as an orientation document for newcomers to the lab, and will be updated by the author from time to time. A.I. Laboratory Working Papers are produced for internal circulation, and may contain information that is, for example, too preliminary or too detailed for formal publication. It is not intended that they should be considered papers to which reference can be made in the literature. © Massachusetts Institute of Technology 1982. Table of Contents: 1. Introduction 1; 1.1. Lisp Machines 2; 1.2. Timesharing 3; 1.3. Other Computers 3; 1.3.1. Field Engineering 3; 1.3.2. Vision and Robotics 3; 1.3.3. Music 4; 1.3.4. Altos 4; 1.4. Output Peripherals 4; 1.5. Other Machines 5; 1.6. Terminals 5; 2. Networks 7; 2.1. The ARPAnet 7; 2.2. The Chaosnet 7; 2.3. Services 8; 2.3.1. TELNET/SUPDUP 8; 2.3.2. FTP 8; 2.4. Mail 9; 2.4.1. Processing Mail 9; 2.4.2. Etiquette 9; 2.5. Mailing Lists 10; 2.5.1. BBoards 11; 2.6. Finger/Inquire 11; 2.7. TIPs and TACs 12; 2.7.1. ARPAnet TAC 12; 2.7.2. Chaosnet TIP 13; 3.
  • Triangular Factorization
    Chapter 1: Triangular Factorization. This chapter deals with the factorization of arbitrary matrices into products of triangular matrices. Since the solution of a linear n × n system can be easily obtained once the matrix is factored into the product of triangular matrices, we will concentrate on the factorization of square matrices. Specifically, we will show that an arbitrary n × n matrix A has the factorization PA = LU where P is an n × n permutation matrix, L is an n × n unit lower triangular matrix, and U is an n × n upper triangular matrix. In connection with this factorization we will discuss pivoting, i.e., row interchange, strategies. We will also explore circumstances for which A may be factored in the forms A = LU or A = LL^T. Our results for a square system will be given for a matrix with real elements but can easily be generalized for complex matrices. The corresponding results for a general m × n matrix will be accumulated in Section 1.4. In the general case an arbitrary m × n matrix A has the factorization PA = LU where P is an m × m permutation matrix, L is an m × m unit lower triangular matrix, and U is an m × n matrix having row echelon structure. 1.1 Permutation matrices and Gauss transformations. We begin by defining permutation matrices and examining the effect of premultiplying or postmultiplying a given matrix by such matrices. We then define Gauss transformations and show how they can be used to introduce zeros into a vector. Definition 1.1: An m × m permutation matrix is a matrix whose columns consist of a rearrangement of the m unit vectors e(j), j = 1, ..., m, in R^m, i.e., a rearrangement of the columns (or rows) of the m × m identity matrix.
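As a hedged illustration of the last sentence above (not code from the chapter), the following Python/NumPy sketch builds a Gauss transformation M = I - m e_k^T whose multipliers m zero out the entries of a vector below position k:

    import numpy as np

    def gauss_transform(x, k):
        # Build M = I - m e_k^T with m[i] = x[i]/x[k] for i > k (and 0 otherwise),
        # so that M @ x has zeros below position k. Assumes x[k] != 0.
        n = len(x)
        m = np.zeros(n)
        m[k+1:] = x[k+1:] / x[k]
        M = np.eye(n)
        M[k+1:, k] = -m[k+1:]
        return M, m

    x = np.array([2.0, 4.0, 6.0])
    M, m = gauss_transform(x, 0)
    print(M @ x)   # -> [2., 0., 0.]

Applying a sequence of such transformations, with row interchanges chosen by a pivoting strategy, is exactly how the factorization PA = LU is built up.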
  • Selected Filmography of Digital Culture and New Media Art
    Dejan Grba SELECTED FILMOGRAPHY OF DIGITAL CULTURE AND NEW MEDIA ART This filmography comprises feature films, documentaries, TV shows, series and reports about digital culture and new media art. The selected feature films reflect the informatization of society, economy and politics in various ways, primarily on the conceptual and narrative plan. Feature films that directly thematize the digital paradigm can be found in the Film Lists section. Each entry is referenced with basic filmographic data: director’s name, title and production year, and production details are available online at IMDB, FilmWeb, FindAnyFilm, Metacritic etc. The coloured titles are links. Feature films Fritz Lang, Metropolis, 1926. Fritz Lang, M, 1931. William Cameron Menzies, Things to Come, 1936. Fritz Lang, The Thousand Eyes of Dr. Mabuse, 1960. Sidney Lumet, Fail-Safe, 1964. James B. Harris, The Bedford Incident, 1965. Jean-Luc Godard, Alphaville, 1965. Joseph Sargent, Colossus: The Forbin Project, 1970. Henri Verneuil, Le serpent, 1973. Alan J. Pakula, The Parallax View, 1974. Francis Ford Coppola, The Conversation, 1974. Sidney Pollack, The Three Days of Condor, 1975. George P. Cosmatos, The Cassandra Crossing, 1976. Sidney Lumet, Network, 1976. Robert Aldrich, Twilight's Last Gleaming, 1977. Michael Crichton, Coma, 1978. Brian De Palma, Blow Out, 1981. Steven Lisberger, Tron, 1982. Godfrey Reggio, Koyaanisqatsi, 1983. John Badham, WarGames, 1983. Roger Donaldson, No Way Out, 1987. F. Gary Gray, The Negotiator, 1988. John McTiernan, Die Hard, 1988. Phil Alden Robinson, Sneakers, 1992. Andrew Davis, The Fugitive, 1993. David Fincher, The Game, 1997. David Cronenberg, eXistenZ, 1999. Frank Oz, The Score, 2001. Tony Scott, Spy Game, 2001.
  • An Efficient Implementation of the Thomas-Algorithm for Block Penta-diagonal Systems on Vector Computers
    An Efficient Implementation of the Thomas-Algorithm for Block Penta-diagonal Systems on Vector Computers. Katharina Benkert (1) and Rudolf Fischer (2). (1) High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, 70569 Stuttgart, Germany, [email protected]. (2) NEC High Performance Computing Europe GmbH, Prinzenallee 11, 40549 Duesseldorf, Germany, [email protected]. Abstract. In simulations of supernovae, linear systems of equations with a block penta-diagonal matrix possessing small, dense matrix blocks occur. For an efficient solution, a compact multiplication scheme based on a restructured version of the Thomas algorithm and specifically adapted routines for LU factorization as well as forward and backward substitution are presented. On a NEC SX-8 vector system, runtime could be decreased between 35% and 54% for block sizes varying from 20 to 85 compared to the original code with BLAS and LAPACK routines. Keywords: Thomas algorithm, vector architecture. 1 Introduction. Neutrino transport and neutrino interactions in dense matter play a crucial role in stellar core collapse, supernova explosions and neutron star formation. The multidimensional neutrino radiation hydrodynamics code PROMETHEUS / VERTEX [1] discretizes the angular moment equations of the Boltzmann equation, giving rise to a non-linear algebraic system. It is solved by a Newton-Raphson procedure, which in turn requires the solution of multiple block penta-diagonal linear systems with small, dense matrix blocks in each step. This is achieved by the Thomas algorithm and takes a major part of the overall computing time. Since the code already performs well on vector computers, this kind of architecture has been the focus of the current work.
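For orientation, the classic scalar Thomas algorithm for a tridiagonal system is sketched below in Python; this is not the paper's block penta-diagonal variant, only the familiar recurrence that the authors restructure for blocks:

    import numpy as np

    def thomas_tridiagonal(a, b, c, d):
        # Solve a tridiagonal system with sub-diagonal a, diagonal b,
        # super-diagonal c and right-hand side d (all of length n;
        # a[0] and c[n-1] are unused). Forward sweep, then back substitution.
        n = len(d)
        cp, dp = np.zeros(n), np.zeros(n)
        cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
        for i in range(1, n):
            denom = b[i] - a[i] * cp[i-1]
            cp[i] = c[i] / denom
            dp[i] = (d[i] - a[i] * dp[i-1]) / denom
        x = np.zeros(n)
        x[-1] = dp[-1]
        for i in range(n - 2, -1, -1):
            x[i] = dp[i] - cp[i] * x[i+1]
        return x

In the block variant each a[i], b[i], c[i] is a small dense matrix, so the divisions become LU solves, which is where the paper's adapted factorization and substitution routines come in.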
  • Pivoting for LU Factorization
    Pivoting for LU Factorization. Matthew W. Reid, April 21, 2014, University of Puget Sound. E-mail: [email protected]. Copyright (C) 2014 Matthew W. Reid. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License". 1 Introduction. Pivoting for LU factorization is the process of systematically selecting pivots for Gaussian elimination during the LU factorization of a matrix. The LU factorization is closely related to Gaussian elimination, which is unstable in its pure form. To guarantee the elimination process goes to completion, we must ensure that there is a nonzero pivot at every step of the elimination process. This is the reason we need pivoting when computing LU factorizations. But we can do more with pivoting than just making sure Gaussian elimination completes. We can reduce roundoff errors during computation and make our algorithm backward stable by implementing the right pivoting strategy. Depending on the matrix A, some LU decompositions can become numerically unstable if relatively small pivots are used. Relatively small pivots cause instability because they operate very similarly to zeros during Gaussian elimination. Through the process of pivoting, we can greatly reduce this instability by ensuring that we use relatively large entries as our pivot elements. This prevents large factors from appearing in the computed L and U, which reduces roundoff errors during computation.
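A minimal Python/NumPy sketch of LU factorization with partial pivoting, assuming a square nonsingular matrix, is given below; it is illustrative only, not the note's own code:

    import numpy as np

    def lu_partial_pivoting(A):
        # Returns P, L, U with P @ A = L @ U; at each step the
        # largest-magnitude entry of the current column is chosen as pivot.
        A = A.astype(float).copy()
        n = A.shape[0]
        P, L = np.eye(n), np.eye(n)
        for k in range(n - 1):
            p = k + np.argmax(np.abs(A[k:, k]))   # row of the largest pivot candidate
            A[[k, p], :] = A[[p, k], :]           # swap rows of the working matrix
            P[[k, p], :] = P[[p, k], :]
            L[[k, p], :k] = L[[p, k], :k]         # keep earlier multipliers consistent
            for i in range(k + 1, n):
                L[i, k] = A[i, k] / A[k, k]
                A[i, k:] -= L[i, k] * A[k, k:]
        return P, L, A                             # the reduced A is U

    A = np.array([[0.0, 2.0], [1.0, 3.0]])
    P, L, U = lu_partial_pivoting(A)
    print(np.allclose(P @ A, L @ U))               # -> True

Because the pivot is the largest available entry in magnitude, every multiplier stored in L has absolute value at most 1, which is what keeps large factors out of the computed L and U.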
  • Mergesort / Quicksort Steven Skiena
    Lecture 8: Mergesort / Quicksort. Steven Skiena, Department of Computer Science, State University of New York, Stony Brook, NY 11794–4400, http://www.cs.sunysb.edu/~skiena

    Problem of the Day: Given an array-based heap on n elements and a real number x, efficiently determine whether the kth smallest element in the heap is greater than or equal to x. Your algorithm should be O(k) in the worst case, independent of the size of the heap. Hint: you do not have to find the kth smallest element; you need only determine its relationship to x.

    Solution

    Mergesort: Recursive algorithms are based on reducing large problems into small ones. A nice recursive approach to sorting involves partitioning the elements into two groups, sorting each of the smaller problems recursively, and then interleaving the two sorted lists to totally order the elements.

    Mergesort Implementation:

        mergesort(item_type s[], int low, int high)
        {
            int i;            (* counter *)
            int middle;       (* index of middle element *)

            if (low < high) {
                middle = (low + high) / 2;
                mergesort(s, low, middle);
                mergesort(s, middle + 1, high);
                merge(s, low, middle, high);
            }
        }

    Mergesort Animation: [slide showing the letters of "MERGESORT" repeatedly split in half and then merged back together in sorted order, ending with "E E G M O R R S T"]

    Merging Sorted Lists: The efficiency of mergesort depends upon how efficiently we combine the two sorted halves into a single sorted list.
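The merge step the last sentence refers to can be sketched in a few lines; the Python below is an illustrative stand-in, not Skiena's C code:

    def merge(left, right):
        # Merge two already-sorted lists in O(len(left) + len(right)) time.
        result, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                result.append(left[i]); i += 1
            else:
                result.append(right[j]); j += 1
        result.extend(left[i:])   # at most one of these two tails is non-empty
        result.extend(right[j:])
        return result

    print(merge([5, 13, 17], [2, 8, 21]))   # -> [2, 5, 8, 13, 17, 21]

Because each comparison moves one element to the output, merging the two halves of an n-element array costs at most n - 1 comparisons, which gives mergesort its O(n log n) bound.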
  • 4. Linear Equations with Sparse Matrices 4.1 General Properties of Sparse Matrices
    4. Linear Equations with Sparse Matrices. 4.1 General properties of sparse matrices. Full n × n matrix: storage O(n^2), solution O(n^3) → too costly for most applications, especially for fine discretization (large n). Idea: formulate the given problem in a clever way that leads to a linear system that is sparse: storage O(n), solution O(n)? (or that is structured: storage O(n), solution O(n log n), FFT) (or that is dense, but reduced from e.g. 3D to 2D). Example: tridiagonal matrix, banded matrix, block band matrix.

    Sparse example matrix (n = 5, nnz = 12):

        A = [ 1  0  0  2  0
              3  4  0  5  0
              6  0  7  8  9
              0  0 10 11  0
              0  0  0  0 12 ]

    4.1.1 Storage in Coordinate Form

        values   AA: 12  9  7  5  1  2 11  3  6  4  8 10
        row      JR:  5  3  3  2  1  1  4  2  3  2  3  4
        column   JC:  5  5  3  4  1  4  4  1  1  2  4  3

    To store: n, nnz, 2*nnz integers for the row and column indices in JR and JC, and nnz floats in AA. No sorting included. Redundant information. Code for computing c = A*b:

        for j = 1 : nnz(A)
            c(JR(j)) = c(JR(j)) + AA(j) * b(JC(j));
        end

    Disadvantage: indirect addressing (indexing) in the vectors c and b → jumps in memory. Advantage: no difference between columns and rows (A and A^T), simple.

    4.1.2 Compressed Sparse Row Format (CSR)

        values           AA:  2  1 |  5  3  4 |  6  7  8  9 | 10 11 | 12     (row 1 | row 2 | row 3 | row 4 | row 5)
        column indices   JA:  4  1 |  4  1  2 |  1  3  4  5 |  3  4 |  5
        pointer to row i IA:  1  3  6  10  12  13

    Storage: n, nnz, n+nnz+1 integers, nnz floats.
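As a hedged companion to the excerpt, here is the same 5 × 5 example in 0-based Python/NumPy CSR arrays (sorted by column within each row, unlike the 1-based arrays above), together with the row-wise matrix-vector product that CSR makes contiguous:

    import numpy as np

    # 0-based CSR storage of the example matrix (values sorted within each row).
    AA = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.])
    JA = np.array([0, 3, 0, 1, 3, 0, 2, 3, 4, 2, 3, 4])   # column index of each value
    IA = np.array([0, 2, 5, 9, 11, 12])                    # start of each row in AA/JA

    def csr_matvec(AA, JA, IA, b):
        # c = A*b: one contiguous inner loop per row, no indirect write into c.
        n = len(IA) - 1
        c = np.zeros(n)
        for i in range(n):
            for k in range(IA[i], IA[i+1]):
                c[i] += AA[k] * b[JA[k]]
        return c

    print(csr_matvec(AA, JA, IA, np.ones(5)))   # row sums: [ 3. 12. 30. 21. 12.]

Compared with the coordinate-form loop, the write target c[i] stays fixed for the whole inner loop, so only the reads of b are indirectly addressed.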
  • Creativity in Computer Science, in J. C. Kaufman & J. Baer (Eds.), Creativity Across Domains: Faces of the Muse
    Creativity in Computer Science Daniel Saunders and Paul Thagard University of Waterloo Saunders, D., & Thagard, P. (forthcoming). Creativity in computer science. In J. C. Kaufman & J. Baer (Eds.), Creativity across domains: Faces of the muse. Mahwah, NJ: Lawrence Erlbaum Associates. 1. Introduction Computer science only became established as a field in the 1950s, growing out of theoretical and practical research begun in the previous two decades. The field has exhibited immense creativity, ranging from innovative hardware such as the early mainframes to software breakthroughs such as programming languages and the Internet. Martin Gardner worried that "it would be a sad day if human beings, adjusting to the Computer Revolution, became so intellectually lazy that they lost their power of creative thinking" (Gardner, 1978, p. vi-viii). On the contrary, computers and the theory of computation have provided great opportunities for creative work. This chapter examines several key aspects of creativity in computer science, beginning with the question of how problems arise in computer science. We then discuss the use of analogies in solving key problems in the history of computer science. Our discussion in these sections is based on historical examples, but the following sections discuss the nature of creativity using information from a contemporary source, a set of interviews with practicing computer scientists collected by the Association of Computing Machinery’s on-line student magazine, Crossroads. We then provide a general comparison of creativity in computer science and in the natural sciences. 2. Nature and Origins of Problems in Computer Science Computer science is closely related to both mathematics and engineering.
  • 2012, Dec, Google Introduces Metaweb Searching
    Google Gets A Second Brain, Changing Everything About Search. Wade Roush, 12/12/12. In the 1983 sci-fi/comedy flick The Man with Two Brains, Steve Martin played Michael Hfuhruhurr, a neurosurgeon who marries one of his patients but then falls in love with the disembodied brain of another woman, Anne. Michael and Anne share an entirely telepathic relationship, until Michael’s gold-digging wife is murdered, giving him the opportunity to transplant Anne’s brain into her body. Well, you may not have noticed it yet, but the search engine you use every day—by which I mean Google, of course—is also in the middle of a brain transplant. And, just as Dr. Hfuhruhurr did, you’re probably going to like the new version a lot better. You can think of Google, in its previous incarnation, as a kind of statistics savant. In addition to indexing hundreds of billions of Web pages by keyword, it had grown talented at tricky tasks like recognizing names, parsing phrases, and correcting misspelled words in users’ queries. But this was all mathematical sleight-of-hand, powered mostly by Google’s vast search logs, which give the company a detailed day-to-day picture of the queries people type and the links they click. There was no real understanding underneath; Google’s algorithms didn’t know that “San Francisco” is a city, for instance, while “San Francisco Giants” is a baseball team. Now that’s changing. Today, when you enter a search term into Google, the company kicks off two separate but parallel searches.
  • 7 Gaussian Elimination and LU Factorization
    7 Gaussian Elimination and LU Factorization. In this final section on matrix factorization methods for solving Ax = b we want to take a closer look at Gaussian elimination (probably the best known method for solving systems of linear equations). The basic idea is to use left-multiplication of A ∈ C^{m×m} by (elementary) lower triangular matrices L1, L2, ..., Lm−1 to convert A to upper triangular form, i.e., Lm−1 Lm−2 ··· L2 L1 A = U; writing L̃ = Lm−1 ··· L2 L1, this reads L̃A = U. Note that the product of lower triangular matrices is a lower triangular matrix, and the inverse of a lower triangular matrix is also lower triangular. Therefore, L̃A = U ⇐⇒ A = LU, where L = L̃^{−1}. This approach can be viewed as triangular triangularization. 7.1 Why Would We Want to Do This? Consider the system Ax = b with LU factorization A = LU. Then we have L(Ux) = b; setting y = Ux, we can perform (a now familiar) 2-step solution procedure: 1. Solve the lower triangular system Ly = b for y by forward substitution. 2. Solve the upper triangular system Ux = y for x by back substitution. Moreover, consider the problem AX = B (i.e., many different right-hand sides that are associated with the same system matrix). In this case we need to compute the factorization A = LU only once, and then AX = B ⇐⇒ LUX = B, and we proceed as before: 1. Solve LY = B by many forward substitutions (in parallel). 2. Solve UX = Y by many back substitutions (in parallel). In order to appreciate the usefulness of this approach note that the operations count for the matrix factorization is O((2/3)m^3), while that for forward and back substitution is O(m^2).
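A minimal Python/NumPy sketch of the two-step procedure (illustrative, not the section's own code):

    import numpy as np

    def forward_substitution(L, b):
        # Solve the lower triangular system L y = b.
        n = len(b)
        y = np.zeros(n)
        for i in range(n):
            y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
        return y

    def back_substitution(U, y):
        # Solve the upper triangular system U x = y.
        n = len(y)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

    # Given A = L U (from any LU factorization), solve A x = b via:
    #   x = back_substitution(U, forward_substitution(L, b))

Each substitution touches every stored entry of the triangular matrix once, hence the O(m^2) cost per right-hand side quoted above.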
  • Mixing LU and QR Factorization Algorithms to Design High-Performance Dense Linear Algebra Solvers
    J. Parallel Distrib. Comput. 85 (2015) 32–46. Mixing LU and QR factorization algorithms to design high-performance dense linear algebra solvers. Mathieu Faverge, Julien Herrmann, Julien Langou, Bradley Lowery, Yves Robert, Jack Dongarra (Bordeaux INP, Univ. Bordeaux, Inria, CNRS (UMR 5800), Talence, France; Laboratoire LIP, École Normale Supérieure de Lyon, France; University of Colorado Denver, USA; University of Tennessee Knoxville, USA). Highlights: • New hybrid algorithm combining stability of QR and efficiency of LU factorizations. • Flexible threshold criteria to select LU and QR steps. • Comprehensive experimental bi-criteria study of stability and performance. Abstract: This paper introduces hybrid LU–QR algorithms for solving dense linear systems of the form Ax = b. Throughout a matrix factorization, these algorithms dynamically alternate LU with local pivoting and QR elimination steps based upon some robustness criterion. LU elimination steps can be very efficiently parallelized, and are twice as cheap in terms of floating-point operations as QR steps. However, LU steps are not necessarily stable, while QR steps are always stable. The hybrid algorithms execute a QR step when a robustness criterion detects some risk for instability, and they execute an LU step otherwise. The choice between LU and QR steps must have a small computational overhead and must provide a satisfactory level of stability with as few QR steps as possible. Article history: received 16 December 2014; revised 25 June 2015; accepted 29 June 2015; available online 21 July 2015. Keywords: numerical algorithms.
  • Introduction to the SHOT SIGCIS Session "Examining the Interaction of Speculative Literature and Computing: Toward a Research Agenda"
    Preprint—Not For Circulation This introduction to the SHOT SIGCIS session Examining the Interaction of Speculative Literature and Computing: Toward a Research Agenda of October 2010 is taken from the introduction to Science Fiction and Computing: Essays on Interlinked Domains edited by David L. Ferro and Eric Swedin and forthcoming from McFarland. Introduction by David L. Ferro When I was in fifth grade, around the Christmas holidays, we were randomly paired up with someone else in the class to exchange gifts. I can't remember what I gave, but I recall what I received: a paperback book, its cover torn off. It was Isaac Asimov's The Rest of the Robots, a collection of short stories that opened up for me, not only a whole new literary genre, but a whole new way of looking at the world. I still have it on my shelf. I didn't think much about receiving that book as the rest of my life unfolded and as I made the career choices that led me to writing this introduction. In retrospect, however, it was a defining moment. Today I teach in a computer science department and have a PhD in Science and Technology Studies. Asimov's book both inspired and reflects my life as well as the book you hold before you. The stories that Asimov wrote were thought experiments–using a fictional format–that explored the engineering and scientific development necessary for further advances in computers. They also dealt with the social implications of those advances. Only now, in researching this book, do I realize how apt Asimov was as the progenitor of inspiration.