Vector Models for Data-Parallel Computing

Guy E. Blelloch

The MIT Press Cambridge, Massachusetts London, England

© 1990 Massachusetts Institute of Technology

Contents

Preface
Acknowledgments

1 Introduction
  1.1 Parallel Vector Models
  1.2 Vector Instructions
  1.3 Implementation
  1.4 Summary and Roadmap

I Models

2 Parallel Vector Models
  2.1 The Vector Random Access Machine
  2.2 Comparison to P-RAM Models
  2.3 Comparison to Circuit and Network Models
  2.4 Comparison to Bit-Vector Models
  2.5 Selecting Primitives
  2.6 Other Issues
    2.6.1 Serially Optimal
    2.6.2 Space Complexity
    2.6.3 Equal Time Assumption
    2.6.4 Do We Need the Scalar Memory?
  2.7 Conclusion

3 The Scan Primitives
  3.1 Why Scan Primitives?
  3.2 Notation
  3.3 Example: Line-of-Sight
  3.4 Simple Operations
    3.4.1 Example: Split Radix Sort
  3.5 Segments and Segmented Scans
    3.5.1 Example: Quicksort
    3.5.2 Notes on Segments
  3.6 Allocating Elements
    3.6.1 Example: Line Drawing
    3.6.2 Notes on Allocating
  3.7 Long Vectors and Load Balancing
    3.7.1 Load Balancing
    3.7.2 Example: Halving Merge
    3.7.3 Notes on Simulating Long Vectors

4 The Scan Vector Model
  4.1 The Scan Vector Instruction Set
    4.1.1 Scalar Instructions
    4.1.2 Elementwise Instructions
    4.1.3 Permute Instructions
    4.1.4 Scan Instructions
    4.1.5 Vector-Scalar Instructions
  4.2 Simple Operations
  4.3 Segments and Segmented Instructions
  4.4 Segmented Operations
  4.5 Additional Instructions
    4.5.1 Merge Instruction
    4.5.2 Combine Instructions
    4.5.3 Multi-Extract Instruction
    4.5.4 Keyed-Scan Instructions

II Algorithms

5 Data Structures
  5.1 Graphs
    5.1.1 Vector Graph Representations
    5.1.2 Neighbor Reducing
    5.1.3 Distributing an Excess Across Edges
  5.2 Trees
    5.2.1 Vector Tree Representation
    5.2.2 Leaffix and Rootfix Operations
    5.2.3 Tree Manipulations
  5.3 Multidimensional Arrays

6 Computational-Geometry Algorithms
  6.1 Generalized Binary Search
  6.2 Building a k-D Tree
  6.3 Closest Pair
  6.4 Quickhull
  6.5 √n Merge Hull
  6.6 Line of Sight

7 Graph Algorithms
  7.1 Minimum Spanning Tree and Connectivity
  7.2 Maximum Flow
  7.3 Maximal Independent Set and Biconnectivity

8 Numerical Algorithms
  8.1 Matrix-Vector Multiplication
  8.2 Linear-Systems Solver
  8.3 Simplex
  8.4 Outer Product
  8.5 Sparse-Matrix Multiplication

III Languages and Compilers

9 Collection-Oriented Languages
  9.1 Collections
  9.2 Collection Operations
  9.3 Mapping Collections onto Vectors

10 Flattening Nested Parallelism
  10.1 Nested Parallelism and Replicating
  10.2 The Replicating Theorem
  10.3 Access-Restricted Code
  10.4 Access-Fixed Replicating Theorem
  10.5 Indirect Addressing
  10.6 Conditional Control
    10.6.1 Branch-Packing
    10.6.2 Contained Programs
    10.6.3 Containment of Functions in Book
    10.6.4 Round-Robin Simulation

11 A Compiler for Paralation Lisp
  11.1 Source Code: Paralation Lisp
    11.1.1 Data Structures
    11.1.2 Operators
    11.1.3 Restrictions
  11.2 Target Code: Scan-Vector Lisp
    11.2.1 Data Structures
    11.2.2 Operations
  11.3 Translation
    11.3.1 Data Structures
    11.3.2 Operations

IV Architecture

12 Implementing Parallel Vector Models
  12.1 Implementation on the Connection Machine
    12.1.1 The Vector Memory
    12.1.2 The Instructions
    12.1.3 Optimizations
    12.1.4 Running Times
  12.2 Simulating on a P-RAM

13 Implementing the Scan Primitives
  13.1 Unsigned +-Scan and Max-Scan
    13.1.1 Tree Scan
    13.1.2 Hardware Implementation of Tree Scan
    13.1.3 An Example System
  13.2 Directly Implementing Other Scans
    13.2.1 Backward and Segmented Scans
    13.2.2 Multidimensional Grid Scans
  13.3 Floating-Point +-Scan

14 Conclusion
  14.1 Contributions
  14.2 Directions for Future Research
  14.3 Implementation Goals
  14.4 Development of Book Ideas

A Glossary

B Code
  B.1 Simple Operations
  B.2 Segments
    B.2.1 Useful Utilities
    B.2.2 Segment Descriptor Translations
    B.2.3 Segmented Primitives
    B.2.4 Segmented Conditionals
  B.3 Other Routines

C Paralation-Lisp Code
  C.1 Utilities
  C.2 Line Drawing
  C.3 Quad-Tree
  C.4 Convex Hull: Quickhull
  C.5 Quicksort
  C.6 Entropy
  C.7 ID3: Quinlan's Learning Algorithm

Bibliography

Index

Preface

This book is a revised version of my Doctoral Dissertation, which was completed at the Massachusetts Institute of Technology in November, 1988. The main purpose of the work was to explore the power of data-parallel programming; this exploration led to the following conclusions:

1. The advantages gained in terms of simple and clean specifications of applications, algorithms, and languages make a data-parallel programming model desirable for any kind of tightly-coupled parallel or vector machine, including multiple-instruction multiple-data (MIMD) machines.

2. The range of applications and algorithms that can be described using data-parallel programming is extremely broad, much broader than is often expected. Furthermore, in most applications there is significantly more data-parallelism available than control parallelism.

3. The power of data-parallel programming models is only fully realized in models that permit nested parallelism: the ability to call a parallel routine multiple times in parallel—for example, calling a parallel matrix-inversion routine over many different matrices, each possibly of a different size, in parallel. Furthermore, to be useful, it must be possible to map this nested parallelism onto a flat parallel machine.

4. A set of scan primitives is extremely useful for describing data-parallel algorithms, and leads to efficient runtime code. The scan primitives can be found in every algorithm in this book, with uses ranging from load-balancing to a line-of-sight algorithm.

The work does not claim that data-parallel programming models are applicable to all problems, but it demonstrates that for a very wide class of problems, data-parallel programming models are not only applicable, but preferable, for programming tightly-coupled machines.


Outline

This book is organized into four parts, models, algorithms, languages and architecture, which are summarized as follows:

1. Models: formally defines a class of strictly data-parallel models, the parallel vector models. The definition is based on a machine that can store a vector in each memory location and whose instructions operate on these vectors as a whole—for example, elementwise adding two equal-length vectors. In the model, each vector instruction requires one “program step”. The model also supplies a complexity measure based on the lengths of the vectors.

2. Algorithms: shows how data structures including graphs, grids and trees can be represented with vectors so that many useful operations, such as summing neighbors in a graph, can be executed in a constant number of program steps. It then describes algorithms for a wide variety of problems, ranging from sorting to linear-programming, and from finding the minimum-spanning-tree of a graph to finding the closest-pair in a plane.

3. Languages: describes how a class of very-high-level languages, the collection-oriented languages, can be mapped onto the parallel vector models. This class of languages includes SETL, PARALATION LISP, and APL. This part also describes a working compiler for PARALATION LISP. The compiler is the first compiler for a data-parallel programming language that compiles nested-parallel constructs into completely parallel code.

4. Architectures: describes the implementation of parallel vector models on the Connection Machine. The techniques used are applicable to most tightly-coupled computers, both SIMD and MIMD. This part also shows how various scan instructions can be implemented efficiently in hardware.

The book tries as much as possible to be complete, in the sense that it tries to show all sides of the story. To show that a parallel quicksort can be written in three lines of impeccable code, without also showing that it then runs twice as slowly on 10 processors of an Encore Multimax as a C version of quicksort runs on a single processor, would be incomplete.

Acknowledgments

I would mostly like to thank Charles Leiserson, my advisor, for convincing me to take these ideas and turn them into a thesis. He helped me recognize what was important, and made many contributions to this work. I would also like to thank the rest of my thesis committee, Tom Knight, Jim Little, and Guy Steele, for their invaluable help. Tom Knight introduced me to parallel computing and first got me interested in the Connection Machine. Jim Little helped flesh out many of my ideas, and was a great person to talk to about anything. Guy Steele helped me understand the importance of clean and simple definitions.

I would like to thank Danny Hillis and David Waltz for their guidance. As with Charles, Danny helped convince me that the material in this book would make a good thesis. Dave gave me a lot of advice on general approaches to research.

I spent many hours talking with Gary Sabot and Cliff Lasser about parallel languages. Without Gary's PARALATION LISP, I would have never implemented the compiler I discuss in Chapter 11. Without Cliff's work on *Lisp, Connection Machine programming would have been set back a year.

The work on grid operations and the algorithms based on them is joint work with Ajit Agrawal, Cynthia Phillips and Robert Krawitz. Some of the work on computational geometry algorithms is joint work with James Little. Some of the work on the PARALATION LISP compiler is joint work with Gary Sabot. Some of the work on graph algorithms is joint work with Andrew Goldberg and Charles Leiserson. Some of the work on tree manipulations is joint work with Charles Leiserson. The split-pack sort was thought up by John Rose, Abhiram Ranade and me. The radix-sort described in Section 4.5.4 was thought of independently by Craig Stanfill, Abhiram Ranade and me. The quicksort using segmented scans was thought of by Guy Steele and me. These sorting algorithms are simple enough that I am sure many people have thought of them before. Charles Leiserson suggested the proof that a floating-point +-reduce would never lose more than one bit.

In addition to my thesis committee, I would like to thank Paul Resnick and Gary Sabot for making detailed comments on a draft of the book. These comments greatly improved

the presentation of the book.

I had many helpful conversations with, and suggestions from, Philip Agre, Alan Bawden, Todd Cass, Michael Drumheller, Carl Feynman, Donna Fritzsche, Lennart Johnsson, John Mallery, Stephen Omohundro, Alan Ruttenberg, James Salem, Karl Sims, and Lewis Tucker.

I am grateful to the people I met during my interviews for many useful comments, especially John Canny, Thomas Cheatham, Allan Fisher, Thomas Gross, John Hennessy, Paul Hudak, Richard Karp, H.T. Kung, Dragutin Petkovic, Jorge Sanz, Marc Snir, Jeff Vitter, and Daniel Weise.

I would like to thank Esther Peres for proofreading the book, Cheah Schlueter for helping me with the index, and Jerry Roylance and Chris Lindblad for implementing the tools used to automatically insert the figures into this thesis with absolutely no cutting or pasting. I would also like to thank the School of Computer Science at Carnegie Mellon University for giving me the time to convert my dissertation into this book.

This work was motivated and supported by the excellent environments at Thinking Machines and at the M.I.T. Artificial Intelligence Laboratory. If there were a handful of people I had to name whose ideas most affected this thesis, I would name Kenneth Batcher, W. Daniel Hillis, Kenneth Iverson, Charles Leiserson, Jacob Schwartz, Guy Steele and Uzi Vishkin.

Chapter 1

Introduction

In the past decade there has been an enormous amount of research and development on parallel computing. This work has spanned the three core areas of computer science: theory, languages and architecture. Unfortunately, the work in these three areas has evolved almost independently: the most attractive algorithmic models and the most general languages have not been implemented on the most successful machines, and the algorithms have not been described using the languages.

This book defines a class of data-parallel machine models called parallel vector models and demonstrates that these models are an excellent framework on which to unify these three areas of parallel computing. Parallel vector models can be mapped onto a broad variety of architectures and can serve both as algorithmic models to analyze the complexity of algorithms, and as instruction sets for a virtual machine to which higher level programming languages can be compiled. The goal of this unification is to make it possible to program parallel algorithms in high-level languages, to have the algorithms execute efficiently on a diverse set of real machines, to be able to derive theoretical complexity measures of the algorithms, and to have these complexities be an accurate predictor of actual running times (see Figure 1.1).

This introduction outlines the general architecture of the parallel vector models; introduces two important classes of primitive instructions, the scan and segmented instructions; shows how the model and the instructions can be used to compile part of a parallel quicksort algorithm; and motivates the book.


Quicksort in a Very-High-Level Language (SETL)

    proc quicksort(s);
      if #s < 2 then return s; end;
      x := random s;
      lesser-elts := {y in s | y < x};
      greater-elts := {y in s | y ≥ x};
      return quicksort(lesser-elts) + [x] + quicksort(greater-elts);
    end proc quicksort;

              ↓  Parallel Vector Models  ↓

Parallel Computers                       Algorithmic Models (Expected Time)
  Vector:                                  P-RAM:
    CRAY Y-MP, Convex C240,                  O(n lg n / p + lg² n)
    Hitachi S-820
  Shared Memory:                           Circuit Models:
    Encore Multimax, Sequent Balance         s = O(n lg² n), d = O(lg² n)
  Distributed Memory, MIMD:                Hypercube Model:
    Intel iPSC/2, Intel/CMU iWARP            O(n lg² n / p + lg² n)
  Distributed Memory, SIMD:                Grid Model:
    NASA MPP, Thinking Machines CM-2         O(n lg n / √p + √p lg n)

Figure 1.1: A high level, inherently parallel, description of quicksort in the language SETL [100]. The parameter s is a set of input keys. The form #s returns the size of s, random s returns a randomly selected element of s, and + appends two sequences. Ideally we would like to translate this description into efficient code for a broad variety of architectures, and also to determine the complexity of the algorithm on various theoretical models. This book suggests that the parallel vector models are a good basis on which to merge these goals.

Figure 1.2: The architecture of a V-RAM. The machine is a random access machine (RAM) with the addition of a vector memory, a parallel vector processor, and a vector input/output port. Each location of the vector memory can contain a vector of different length. The parallel vector processor executes operations on whole vectors.

1.1 Parallel Vector Models

As with a random access machine (RAM) model [40] or the Turing machine model [116], the parallel vector models are conveniently defined in terms of a machine architecture, the vector RAM (V-RAM). The V-RAM is a standard serial RAM with the addition of a vector memory and a vector processor (see Figure 1.2). Each memory location in the vector memory can contain an arbitrarily long vector of atomic values; the vector length is associated with the vector, not the memory location. Each instruction of the vector processor operates on a fixed number of vectors from the vector memory and possibly scalars from the scalar memory. A vector instruction might, for example, sum the elements of a vector, rearrange the order of the elements of a vector, or merge the elements of two sorted vectors (see Figure 1.3). A program for a V-RAM is no different from a program for a serial RAM except that it can include these additional vector instructions. A dot-product of two vectors, for example, could be executed with an instruction which elementwise multiplies the elements of two vectors followed by an instruction that sums the elements of a vector. The set of instructions that a V-RAM supplies can have a strong effect on the power of the machine—this is discussed in the next section.

Two time complexity measures are associated with the execution of a program on a V-RAM: the step complexity and the element complexity.

Mv[0] = [3 17 7 13 5 11 1 2]
Ms[0] ← sum Mv[0] = 59

Mv[0] (data vector)  = [c o g n i t i o n]
Mv[1] (index vector) = [2 8 4 1 0 7 6 3 5]
Mv[2] ← permute Mv[0], Mv[1] = [i n c o g n i t o]

Mv[0] = [4 7 11]
Mv[1] = [2 9 12 21]
Mv[2] ← merge Mv[0], Mv[1] = [2 4 7 9 11 12 21]

Mv[0] = [5 1 3 4 3 9 2 6]
Mv[1] = [2 5 3 8 1 3 6 2]
Mv[2] ← p+ Mv[0], Mv[1] = [7 6 6 12 4 12 8 8]

Figure 1.3: Some vector instructions. The sum instruction sums the elements of a vector, the permute instruction rearranges the elements of a vector according to a second vector of indices, the merge instruction merges the elements of two sorted vectors, and the elementwise add instruction (p+) adds corresponding elements of two vectors. Mv[i] is the ith location of the vector memory, and Ms[i] is the ith location of the scalar memory.

The step complexity is the number of steps executed by a program, and the element complexity is the sum, over the steps, of the lengths of the vectors manipulated in each step. The two complexities can be thought of as the parallel and serial complexities, respectively, and are analogous to the depth and size complexities in the boolean circuit models [117, 26, 37, 38]. To guarantee that the primitives run in reasonable times, and to put useful bounds on simulating them on other models, we place two restrictions on the vector primitives. First, we require that all vector instructions can be simulated on a serial RAM in O(n) time on vectors of length n. With this requirement, the element complexity is an asymptotic upper bound on the time complexity of simulating a V-RAM program on a RAM. Second, we require that all primitives can be executed on a boolean circuit of depth O(lg n) (that is, are in NC¹ [37]). This guarantees a reasonable bound on the parallel complexity of the algorithms.
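To make the two measures concrete, the following sketch (my own; all function names are invented for illustration and are not the book's instruction set) simulates a pair of vector instructions on a serial RAM while tallying the step and element complexities as defined above. The dot-product mentioned earlier serves as the example program.

    # Simulate vector instructions on a serial RAM, tracking the two measures.
    steps = 0     # step complexity: one per vector instruction
    elements = 0  # element complexity: total vector length per instruction

    def record(*vectors):
        global steps, elements
        steps += 1
        elements += sum(len(v) for v in vectors)

    def p_mult(a, b):              # elementwise multiply instruction
        record(a, b, a)            # two sources plus an equal-length result
        return [x * y for x, y in zip(a, b)]

    def v_sum(a):                  # sum instruction: vector in, scalar out
        record(a)
        return sum(a)

    # A dot-product: an elementwise multiply followed by a sum.
    dot = v_sum(p_mult([3, 17, 7, 13], [1, 2, 0, 1]))
    print(dot, steps, elements)    # 50, 2 steps, 16 element operations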

We consider the salient features of this machine model, especially as compared to models based on a set of communicating serial processors (multiprocessors), such as the P-RAM models¹, the message-passing models, or the hypercube models.

First, the total work performed on each “program step” of a V-RAM is not constant since each vector can have a different length, while on a multiprocessor, the total work performed on a step is equal to the number of processors. It is for this reason that the element complexity is important in the V-RAM. Second, the model has very strict serial control; it does not even allow certain elements to be turned off, such as is allowed in the SIMD P-RAM models or in the data-parallel model of Hillis and Steele [54]. This book is therefore an interesting demonstration of the power of strict serial control. Third, the V-RAM permits a broader set of primitives—the primitives are no longer limited to local processor computations and various forms of memory references or messages.

Models, such as the parallel vector models, that center around collections of values and operate on these collections as a whole, are henceforth called collection-oriented models. Similarly, models based on a set of communicating serial processors are henceforth called processor-oriented models. The key difference is that the collection-oriented models execute a set of parallel primitives, whereas the processor-oriented models execute a set of serial primitives in parallel—one per processor. For example, a vector model scan primitive is a parallel primitive executed in serial, while a P-RAM shared-memory-read is a serial primitive executed in parallel, once per processor.

1.2 Vector Instructions

The term “vector model” invokes in the minds of many computer scientists thoughts of highly-regular numerical algorithms, perhaps written in FORTRAN and executed on serial vector machines, such as a CRAY computer [95]. Given a sufficiently weak set of vector instructions, the usefulness of the vector models is indeed restricted to a handful of highly-regular algorithms. This book will demonstrate, however, that given a sufficiently powerful set of instructions, the parallel vector models are useful for a surprisingly wide variety of algorithms. This section introduces two classes of vector instructions: the scan² instructions and the segmented instructions. The combination of these instructions can be found in almost all the algorithms described in this book. The section also shows how the instructions can be used to execute parts of the quicksort algorithm shown in Figure 1.1.

¹ Appendix A contains a brief description of the P-RAM models.
² The term scan is taken from APL [61]. In the computer theory community the operation is usually referred to as the all-prefix-sums operation. A brief history of the operation can be found in Appendix A.

The Scan Instruction

A scan instruction for a binary associative operator ⊕ takes a vector of values A and returns to each position of a new equal-length vector the operator sum of all previous positions in A. For example:

A = [1 3 5 7 9 11 13 15]
+-scan(A) = [0 1 4 9 16 25 36 49]

B = [3 2 1 6 5 4 9]
max-scan(B) = [0 3 3 3 6 6 6]
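The two scans above can be written serially in a few lines; the sketch below (mine, not the book's implementation; parallel and hardware implementations are the subject of Chapter 13) emphasizes that these are exclusive scans, with 0 serving as the identity in both examples.

    def plus_scan(a):
        """Exclusive +-scan: out[i] = a[0] + ... + a[i-1]."""
        out, acc = [], 0
        for x in a:
            out.append(acc)
            acc += x
        return out

    def max_scan(a):
        """Exclusive max-scan; 0 works as identity for nonnegative inputs."""
        out, acc = [], 0
        for x in a:
            out.append(acc)
            acc = max(acc, x)
        return out

    print(plus_scan([1, 3, 5, 7, 9, 11, 13, 15]))  # [0, 1, 4, 9, 16, 25, 36, 49]
    print(max_scan([3, 2, 1, 6, 5, 4, 9]))         # [0, 3, 3, 3, 6, 6, 6]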

In the quicksort algorithm (see Figure 1.1), the scan instructions can be used to select the lesser-elts and greater-elts. Assuming the keys s are stored in a vector, the statement {y in s|y < x} can be implemented as shown in Figure 1.4. The implementation first distributes the pivot x across a new vector x; this distribution is itself a type of scan. Then, to select the lesser elements, the implementation elementwise compares the pivot vector with the s vector returning a vector of 1s and 0s. A +-scan on this result returns a unique index to each element less than the pivot. A version of the permute primitive is then used to move the lesser elements into a new smaller vector based on these indices. The greater-elts can be selected similarly.
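The selection step can be made runnable directly from this description. The sketch below follows the same sequence of operations as Figure 1.4 (the Python names are mine, standing in for the vector-model instructions).

    def plus_scan(a):                    # exclusive +-scan
        out, acc = [], 0
        for x in a:
            out.append(acc)
            acc += x
        return out

    def lesser_elts(s, x):
        xs = [x] * len(s)                # distribute the pivot across a vector
        f = [1 if y < p else 0 for y, p in zip(s, xs)]  # elementwise compare
        i = plus_scan(f)                 # a unique index for each flagged element
        out = [None] * sum(f)
        for j in range(len(s)):          # select-permute: move flagged elements
            if f[j]:
                out[i[j]] = s[j]
        return out

    print(lesser_elts([7, 18, 6, 3, 14, 9, 0, 16, 11], 9))  # [7, 6, 3, 0]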

The Segmented Instructions

A segmented vector is a vector partitioned into a set of contiguous segments. The segmented vector can be represented with two vectors, one containing the values and the second containing the length of each segment. For example, the two vectors:

A = [s y l l a b l e s]
L = [3 2 4]

would represent the vector:

A = [s y l] [l a] [b l e s].
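As a concrete sketch of this representation (my own code; the book's segment descriptors are developed in Section 4.3), the length vector can be expanded into per-element head flags, a form that is convenient for segmented operations like the ones shown below.

    values  = list("syllables")        # the value vector [s y l l a b l e s]
    lengths = [3, 2, 4]                # the segment-length vector

    def head_flags(lengths):
        """1 at the first position of each segment, 0 elsewhere."""
        flags, pos = [0] * sum(lengths), 0
        for n in lengths:
            flags[pos] = 1
            pos += n
        return flags

    print(head_flags(lengths))         # [1, 0, 0, 1, 0, 1, 0, 0, 0]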

The segmented instructions are versions of the vector instructions that execute independently over each segment of a segmented vector. For example, consider the following operations:

$ x is the pivot
$ s is the set of values
lesser-elts := {y in s | y < x}
        ↓
l ← length(s);
x ← distribute(x, l);
f ← s p< x;
i ← +-scan(f);
$ select-permute permutes elements of s
$ to index i if flag f is set.
lesser-elts ← select-permute(s, i, f);

SETL to Vector Model Translation

s = [7 18 6 3 14 9 0 16 11]
x = 9
l = 9
x = [9 9 9 9 9 9 9 9 9]
f = [1 0 1 1 0 0 1 0 0]
i = [0 1 1 2 3 3 3 4 4]
lesser-elts = [7 6 3 0]

Example

step complexity = 1 + 1 + 1 + 1 + 1 = O(1)
element complexity = 1 + n + n + n + n = O(n)

Complexity

Figure 1.4: The program transformation needed to convert the SETL code, which selects elements of s less than the pivot x, into instructions for a parallel vector machine ($ is the comment character in SETL). Also, an example of the execution of the vector instructions, and the complexity of the vector model code.

[Figure: the tree of recursive quicksort calls. The root call splits into two parallel calls, those into four, and so on, until the leaves are many small calls (Qs).]

Figure 1.5: The quicksort algorithm. Just using parallelism within each block yields a step complexity proportional to the number of blocks (O(n)). Just using parallelism from running the blocks in parallel yields a step complexity at least proportional to the largest block (O(n)). By using both forms of parallelism the step complexity is proportional to the depth of the tree (expected O(lgn)).

A = [6] [1 3 7 9]
B = [2 4 7] [4 8]

seg-+-scan(A) = [0] [0 1 4 11]
seg-merge(A, B) = [2 4 6 7] [1 3 4 7 8 9]
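A serial sketch of the segmented +-scan (my construction over the head-flag form described earlier; the vector models treat segmented scans as single instructions): the running sum simply restarts at every segment head.

    def seg_plus_scan(values, head_flags):
        out, acc = [], 0
        for v, f in zip(values, head_flags):
            if f:                      # segment boundary: restart the scan
                acc = 0
            out.append(acc)
            acc += v
        return out

    a     = [6, 1, 3, 7, 9]
    flags = [1, 1, 0, 0, 0]            # the segments [6] [1 3 7 9]
    print(seg_plus_scan(a, flags))     # [0, 0, 1, 4, 11]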

To illustrate the importance of segments and segmented instructions, consider the quicksort example again. When applied to a set s, quicksort splits the set into two subsets (the elements less than and greater than the pivot) and calls itself recursively on each set. This recursive invocation generates a tree of calls which would look something like the tree shown in Figure 1.5. If we were to only take advantage of the parallelism within each quicksort to subselect the two sets (the parallelism within each block), we would do well near the root and badly near the leaves. Inversely, if we were to only take advantage of the parallelism available by running the invocations of quicksort in parallel (the parallelism between blocks but not within a block), we would do well at the leaves and badly at the root. In both cases the parallel time complexity is O(n) rather than the ideal O(lg n) (the expected depth of the tree).

Segments and segmented instructions allow a simple technique for using both kinds of parallelism. Each invocation of quicksort at a single level of the recursion tree is placed in a separate segment of a segmented vector, and the segmented instructions are used to split the values independently within each segment. This allows the implementation to execute a whole level of the tree at a time (with a constant number of vector instructions); near the root of the recursion tree the segmented vector contains a few large segments, while at the leaves it contains many small segments. The expected step complexity of the algorithm is O(lg n) (if one is somewhat careful about picking pivots) and the expected element complexity is O(n lg n). Chapter 10 derives a general theorem about how segments can be used to execute nested parallel code, and the compiler discussed in Chapter 11 uses the derived techniques.

1.3 Implementation

This section outlines how a V-RAM can be implemented on two architectures: a shared memory SIMD multiprocessor and a distributed memory MIMD multiprocessor. As examples, it considers two vector instructions: an elementwise add (p+) and a permute. Details on how to implement the scan instructions and the segmented instructions are discussed in Chapters 3 and 13. Details of how to implement a V-RAM on the Connection Machine, a distributed memory SIMD multiprocessor, are discussed in Chapter 12.

Shared Memory SIMD Multiprocessor

The vector memory is simulated by placing each vector in a contiguous region of the shared memory and assigning each processor to an independent block of each vector. When executing a vector instruction on a vector a, each processor loops over the elements of a it is responsible for. The processor only needs to know the offset address of the vector, the number of elements per processor, and its processor number to determine the memory addresses to access. For example, in a p+ instruction, each processor loops over its elements in the two source vectors, adds them together, and stores the result in the destination vector. For a permute instruction, each processor reads a value and index for its elements, and writes the value in the destination vector at the position specified by the index (it only needs to know the offset of the destination vector). For a vector of length n, and for p processors, each primitive will run in O(n/p + 1) time.
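The per-processor loop for p+ might look like the following simulation (a sketch with invented names; on a real machine every processor would run this loop concurrently on its own block).

    def p_add_block(mem, a_off, b_off, dst_off, per_proc, proc):
        """One processor adds its block of two source vectors in shared memory."""
        base = proc * per_proc
        for k in range(per_proc):
            mem[dst_off + base + k] = mem[a_off + base + k] + mem[b_off + base + k]

    shared = [0] * 24
    shared[0:8]  = [5, 1, 3, 4, 3, 9, 2, 6]   # source vector at offset 0
    shared[8:16] = [2, 5, 3, 8, 1, 3, 6, 2]   # source vector at offset 8
    for proc in range(4):                     # 4 processors, 2 elements each
        p_add_block(shared, 0, 8, 16, 2, proc)
    print(shared[16:24])                      # [7, 6, 6, 12, 4, 12, 8, 8]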

Distributed Memory MIMD Multiprocessor

Each vector is evenly divided among the processor memories. The processors run in single program multiple data (SPMD) mode and each processor is responsible for the elements in its own memory. The p+ only requires local operations. The permute instruction requires sending data to other processors—based on the index, each source processor determines the destination processor for each of its elements and sends the elements to those processors (this assumes the machine has some kind of routing facility). Synchronization is not required after the elementwise operations, but a barrier synchronization is required after the permute. If the vectors are large compared to the number of processors, the synchronization overhead becomes relatively small, and the latency of the router becomes less significant (just the throughput matters).
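The communication pattern of the permute can be sketched as follows (my simulation; message passing is modeled with per-processor inboxes, and the barrier appears only as a comment).

    def permute_distributed(value_blocks, index_blocks, per_proc):
        p = len(value_blocks)
        inboxes = [[] for _ in range(p)]
        # Send phase: route each (value, index) pair to the owning processor.
        for src in range(p):
            for v, i in zip(value_blocks[src], index_blocks[src]):
                inboxes[i // per_proc].append((i % per_proc, v))
        # (a barrier synchronization would go here on a real machine)
        # Write phase: each processor fills its local block from its inbox.
        result = [[None] * per_proc for _ in range(p)]
        for dst in range(p):
            for slot, v in inboxes[dst]:
                result[dst][slot] = v
        return result

    vals = [[10, 20], [30, 40], [50, 60], [70, 80]]   # 4 processors, 2 each
    idx  = [[7, 6], [5, 4], [3, 2], [1, 0]]           # a reversing permutation
    print(permute_distributed(vals, idx, 2))
    # [[80, 70], [60, 50], [40, 30], [20, 10]]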

1.4 Summary and Roadmap

With a basic understanding of the vector models and how they compare to other parallel models, we now return to and expand on the central claim of this book, the claim that parallel vector models can be mapped onto a broad variety of architectures and are good both as algorithmic models to analyze the complexity of algorithms, and as instruction sets for a virtual machine to which higher level programming languages can be compiled. We mention each of the core areas—algorithms, languages and architectures—separately and then consider how they fit together.

Algorithms

The parallel vector models can be used to implement a broad variety of algorithms, much broader than might be expected. This book describes algorithms ranging from graph algorithms to numerical algorithms. Table 1.1 lists the algorithms described in this book along with the section in which they appear and their complexities. The parallel vector models allow quite concise description of these parallel algorithms since they hide many low level issues from the algorithm designer, such as synchronization, processor allocation, and simulation of multiple elements on each processor.

The parallel vector models have a theoretical foundation that is no weaker than other parallel algorithmic models, such as the P-RAM or boolean-circuit models. The complexity measures in the parallel vector models can both be mapped onto the complexity measures of other models (see Section 12.2), and are powerful on their own.

Languages

The parallel vector models form a natural virtual machine on which to compile many very high-level languages such as APL [59, 60], APL2 [58], CM-LISP [108], NIAL [77, 99], PARALATION LISP [96], and SETL [100]. These languages are all based on manipulating collections of values as a whole. Chapter 11 describes a compiler that translates a subset of PARALATION LISP onto the instructions of the scan vector model. Appendix C shows many algorithms—including a convex-hull, a quicksort and a learning algorithm—implemented in PARALATION LISP.

Algorithm                          Section   Step       Element

Sorting and Merging (n keys)
  Split-Radix Sort                 3.4.1     O(lg n)    O(n lg n)
  Quicksort                        3.5.1     O(lg n)    O(n lg n)
  Halving Merge                    3.7.2     O(lg n)    O(n)

Computational-Geometry (n points)
  Closest Pair                     6.3       O(lg n)    O(n lg n)
  Quickhull (m hull points)        6.4       O(lg m)    O(n lg m)
  √n Merge Hull                    6.5       O(lg n)    O(n lg n)
  Line of Sight                    6.6       O(1)       O(n)
  Line Drawing                     3.6.1     O(1)       O(n)

Graph (n vertices, m edges)
  Minimum Spanning Tree            7.1       O(lg n)    O(m lg n)
  Maximum Flow                     7.2       O(n²)      O(n²m)
  Maximal Independent Set          7.3       O(lg n)    O(m lg n)
  Biconnected Components           7.3       O(lg n)    O(m lg n)

Numerical (n × m dense matrices)
  Matrix-Vector Multiply           8.1       O(1)       O(nm)
  Linear-Systems Solver (n = m)    8.2       O(n)       O(n²)
  Step of Simplex                  8.3       O(1)       O(nm)

Table 1.1: The algorithms described in this book, along with the sections in which they appear and their asymptotic complexities. The complexities for the quicksort and quickhull are expected complexities. Some of the algorithms are probabilistic. The split-radix sort complexities assume that the keys are O(lg n) bits long.

By compiling high-level languages onto the vector models, much of a compiler becomes machine independent—the machine dependent parts are implemented below the level of the vector models.

Architectures

The parallel vector models can be implemented efficiently on a broad variety of tightly-coupled architectures including serial machines, vector machines, and parallel machines, with both serial and parallel control. The flexibility of the models arises because the models make no assumptions about how data elements are allocated to processors, and because the models have strict serial control. Chapter 12 describes an implementation on the Connection Machine, a highly parallel SIMD computer [53, 113]. The general technique described can be straightforwardly extended to any synchronous parallel machine with a sufficiently powerful communication network. Chapter 13 describes how the scan instructions can be implemented efficiently in hardware.

In Combination

If an algorithm is defined in a high-level language and compiled through the vector models onto a real machine, how efficient is the final code? To test this, a quicksort routine written in PARALATION LISP (Section C.5) was compiled by a compiler (Chapter 11) into the scan vector model (Chapter 4), and then executed on both a Connection Machine (Chapter 12) and a Symbolics 3600, a serial machine. On both machines the compiled code ran within a factor of two of the speed of the sorting routine supplied by the machine vendor as part of the system.³ Although compiling very high-level languages into code that is within a factor of two of highly optimized code is not unusual for serial machines, it is quite unusual for parallel machines. Of course, these results are not enough to “prove” the usefulness of the vector models, but they show the models have promise.

Roadmap

Figure 1.6 shows a roadmap for the book. Part I, models, forms the foundation of the book. It defines the parallel vector models in more detail, describes the implications of including a set of scan operations as primitives of the P-RAM models, and introduces a particular parallel vector model, the scan vector model.

³ The supplied sorting routine on the Connection Machine is a carefully optimized radix sort and executes about as fast as a microcoded bitonic sort [11]. The supplied sorting routine on the Symbolics 3600 is a carefully optimized quicksort.

Figure 1.6: A roadmap to the book. Each road is a chapter of the book. Each fork signifies that the branches are independent and can be read in any order. Each join signifies that the following chapter relies to some extent on all the incoming branches.

The next three parts correspond to the three core areas: algorithms, languages and architectures. These parts are independent and can be read in any order. Part II, algorithms, describes how several important data structures, including trees, graphs and arrays, can be mapped onto the scan vector model so that they can be efficiently manipulated. It then describes many algorithms for the model. Part III, languages, compares a set of high-level languages called the collection-oriented languages. It then describes a technique called replicating and describes a compiler for PARALATION LISP. Part IV, architecture, describes how the parallel vector models are implemented on the Connection Machine and how they can be simulated on a P-RAM. It also describes how the various scan operations can be implemented directly in hardware.

Part I

Models


Introduction: Models

This part defines the models on which the rest of this book is founded. It contains three chapters. Chapter 2, parallel vector models, defines the parallel vector models, and compares them to other parallel algorithmic models, such as the P-RAM and boolean-circuit models. Chapter 3, scan primitives, introduces a set of scan operations, argues that these operations should be considered primitive instructions, and illustrates many examples of how they are used in algorithm design. These two chapters are completely independent: the parallel vector models are introduced without commitment to a particular set of primitives, and the scan primitives are introduced in the framework of the P-RAM models. Although the parallel vector models and scan primitives are introduced independently, the scan primitives fit naturally into and enhance the parallel vector models. Chapter 4, scan vector model, defines a specific parallel vector model that includes the scan primitives in its instruction set, as well as instructions that elementwise operate on vectors and that permute the elements of a vector. Chapter 4 also describes a set of vector operations that can be implemented with the instructions of the scan vector model, a set of segmented versions of the primitive instructions, and a set of other primitive instructions which are not included in the scan vector model, but might be included in other parallel vector models. All the algorithms described in the book are based on the scan vector model, and the compiler described in Chapter 11 compiles into the instructions of the scan vector model.

Chapter 2

Parallel Vector Models

This chapter defines the parallel vector models based on an abstract machine, the V-RAM, compares the models to other parallel algorithmic models including the P-RAM models and the boolean circuit models, and discusses why and when the parallel vector models might be more appropriate than these other models. Some of the definitions outlined in the introduction of this book are repeated in this chapter for the sake of completeness.

2.1 The Vector Random Access Machine

As with a random access machine (RAM) model [40] and the Turing machine model [116], the parallel vector models are conveniently defined in terms of a machine architecture. Figure 2.1 illustrates the general architecture of such a machine, the V-RAM. The V-RAM is a serial random access machine (RAM) with the addition of a vector memory, a parallel vector processor, and vector input and output ports. The vector memory is a sequence of locations each containing a simple vector. A simple vector is a linear-ordered collection of scalar values. The number of values in the collection is called the length. We place no limit on the length, and each vector can be a different length. Allowing each vector to have a different length is an important feature of the vector models and is intricately tied in with the complexity measures.

Allowing each vector to be arbitrarily long makes it impossible to implement a vector memory directly; it must somehow be simulated on a real machine. One technique to simulate the vector memory is for each vector memory location to contain a pointer into a chunk of locations in a real memory where the values of the vector are actually stored. Chapter 12 describes such an implementation.

The parallel vector processor executes primitive instructions on a fixed number of vectors and scalars from the vector and scalar memories.
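A minimal sketch of the pointer technique described above (my own; Chapter 12's Connection Machine implementation is considerably more involved): each vector-memory address maps to an offset and a length in one flat real memory, so vectors of different lengths coexist.

    class VectorMemory:
        """Vector memory simulated on a flat memory via pointers.
        Bump allocation only; a real implementation would reuse space."""
        def __init__(self, size):
            self.heap = [0] * size
            self.free = 0                      # bump-allocation pointer
            self.loc = {}                      # address -> (offset, length)

        def write(self, addr, values):
            off = self.free
            self.heap[off:off + len(values)] = values
            self.free += len(values)
            self.loc[addr] = (off, len(values))

        def read(self, addr):
            off, n = self.loc[addr]
            return self.heap[off:off + n]

    vm = VectorMemory(100)
    vm.write(0, [4, 7, 11])                    # Mv[0]: length 3
    vm.write(1, [2, 9, 12, 21])                # Mv[1]: length 4; lengths differ
    print(vm.read(0), vm.read(1))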


Figure 2.1: The architecture of a V-RAM. The machine is a random access machine (RAM) with the addition of a vector memory, a parallel vector processor, and a vector input/output port. Each location of the vector memory can contain a vector of different length. The parallel vector processor executes operations on whole vectors, such as permuting the elements of a vector or executing a scan operation on a vector.

Mv[0] : [4 7 11]
Mv[1] : [2 9 12 21]

Ms[0] ← sum Mv[0]

Mv[2] ← merge Mv[0], Mv[1]

Ms[0] : 22
Mv[2] : [2 4 7 9 11 12 21]

Figure 2.2: Example of a sum and a merge primitive for a vector model. Mv[i] is the ith location of the vector memory, and Ms[i] is the ith location of the scalar memory.

Figure 2.3: Time complexity measures. The step complexity is the total number of steps (16 in the example). The element complexity is the sum, over the steps, of the length of the vectors manipulated during each step (72 in the example).

A primitive might, for example, read a vector from the vector memory, sum the elements, and write the result in the scalar memory, or it might read two sorted vectors from the vector memory, merge the elements, and write the result back in the vector memory (see Figure 2.2). Particular vector models can differ in the primitive instructions they supply, and these different sets of primitives can lead to different complexities of algorithms. Chapter 4 defines a set of primitive instructions for the scan vector model. Section 2.5 discusses principles for selecting a good set of primitives for a vector model, and Section 4.5 discusses other possible primitives.

The time complexity of an algorithm in the vector models is specified using two measures: the step complexity and the element complexity. The step complexity is the number of calls to the primitive instructions and the element complexity is the vector length per primitive instruction call, summed over the number of calls. Since a primitive might take more than one argument, we say that the vector length of a primitive call is the sum of the lengths of the arguments (including the destination). The step complexity can be thought of as the parallel complexity of the algorithm assuming an infinitely wide parallel vector processor, whereas the element complexity can be thought of as the serial complexity assuming we simulate the parallel vector processor on a serial RAM. The relationship between the step and element complexities is analogous to the relationship between the depth and size complexities in the boolean circuit models. The depth of a boolean circuit is the number of layers, and the size is the sum over the layers of the number of elements in each layer. Section 12.2 shows how the step complexity and element complexity are related to time complexity in the P-RAM model. Section 2.6 introduces analogous space complexity measures and discusses assumptions on which the step and element complexities are based.

P-RAM Models                                    Vector Models

(1) Fixed number of element operations on       (1) Variable number of element operations
    each step.                                      on each step.

(2) Complexity is based on one function of      (2) Complexity is based on two functions,
    two variables: n (the size of the input)        the step and element complexities, of
    and p (the number of processors).               one variable: n.

(3) Atomic values are the only primitive        (3) Supplies a primitive data structure:
    data.                                           the vector.

(4) Parallel or serial control.                 (4) Strictly serial control.

(5) Parallelism comes from serial primitives    (5) Parallelism comes from parallel
    running in parallel.                            primitives.

Table 2.1: Summary of the differences between the P-RAM models and the parallel vector models.

2.2 Comparison to P-RAM Models

This section compares the parallel vector models to the parallel random access machine (P-RAM) models [42, 101, 104, 48, 49]; Section 2.3 compares them to the boolean-circuit models and the fixed-network models such as hypercube or grid models; and Section 2.4 compares them to the bit-vector models. Vector models are compared to these other parallel algorithmic models for two reasons: first, many important algorithms and techniques found in the literature are set in the context of these other models and we would like to transfer these results to the vector models; and second, we would like to explore possible advantages, or disadvantages, of the vector models. Although this section specifically compares the vector models to the P-RAM models, many of the comparisons are valid when comparing any collection-oriented model to any processor-oriented model. This section assumes a basic understanding of P-RAM models.¹

The parallel vector and P-RAM models are both natural extensions of the serial RAM model. In the vector models we add to the RAM a set of parallel primitives and an enhanced memory, whereas in the P-RAM models we include multiple copies of the processor part of a RAM, attach it to a shared memory, and execute the same serial primitives in parallel. The two different extensions of the RAM give rise to important differences in the two classes of models.

¹ Appendix A contains a brief description of the P-RAM models.

                     Vector Models           P-RAM Models
Algorithm            Step       Element      Time
Split Radix Sort     O(lg n)    O(n lg n)    O((n lg n)/p + lg n)
Halving Merge        O(lg n)    O(n)         O(n/p + lg n)
In General           s          e            O(e/p + s)

Table 2.2: Comparison of the description of time complexity measures in the vector models and the P-RAM models. The time complexity in the vector models is specified with two functions of one variable, while in the P-RAM models it is specified with one function of two variables. The two algorithms will be discussed in Chapter 3. In both models we assume the primitives include the scan operations.

These differences are summarized in Table 2.1. The remainder of this section discusses the four primary differences: the complexity, the control, the primitive data, and the primitive operations. To illustrate some of the differences we also discuss the merging problem.

Complexity

A P-RAM executes an equal number of element operations² on each step—exactly p—whereas a V-RAM can execute a varying number of element operations on each step—the number depends on the length of the vectors used on the step. As an example of why this difference is important, consider an algorithm that starts with n elements and halves the number of elements on each iteration. In a vector model, the algorithm only needs to halve the vector length on each iteration, therefore halving the number of element operations. In a P-RAM model, since the number of element operations on each step is constant, the algorithm would either start by simulating multiple elements per processor and halve the number of elements per processor on each iteration, or it would always use n processors and waste most of the processors after the first iteration.

Since the number of element operations in a vector model varies between steps, the number of steps (the step complexity) by itself is inadequate as a measure of time complexity—it does not tell us how many element operations have been executed. The vector models, therefore, include a second measure of time complexity—the sum of the vector lengths over the steps (the element complexity). In the P-RAM models, the element complexity is not necessary—the element complexity is simply p times the step complexity. The step complexity in the P-RAM models, however, must contain the extra variable p, which is not necessary in the vector models.
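The halving example is easy to make concrete (a sketch; the shrinking step stands in for any primitive that discards half of a vector's elements): the element operations form the geometric series n + n/2 + n/4 + ... = O(n) over O(lg n) steps, with no processor count appearing anywhere.

    v = list(range(1024))
    steps = elements = 0
    while len(v) > 1:
        steps += 1
        elements += len(v)       # this step touches len(v) elements
        v = v[::2]               # one vector instruction over the current length
    print(steps, elements)       # 10 steps, 2046 element operations (about 2n)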

² We define the number of element operations of a step as the time complexity of simulating the step on a serial RAM.

Table 2.2 compares the complexity measures for the two classes of models.

What is the relationship between the P-RAM complexity and the vector model complexities? In the two limits p = ∞ and p = 1, the P-RAM complexity reduces to the step complexity and element complexity, respectively (see Table 2.2). Since the step and element complexities are just the limits of the P-RAM complexity, it might appear that the P-RAM complexity holds more information than the other two complexities. As we illustrate in Section 12.2, this is not the case—we show that in general the P-RAM time complexity can be generated from the two limits with the simple relation

    O(e/p + s),                                                        (2.1)

where e is the element complexity and s is the step complexity. I contend that the two limits are a cleaner notation for specifying time complexity even when using a P-RAM model.
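In executable form (a trivial sketch that ignores constant factors), the relation and its two limits look like this:

    import math

    def pram_time(e, s, p):
        """Simulated P-RAM time for element complexity e, step complexity s."""
        return e / p + s

    n = 1 << 20
    e, s = n * math.log2(n), math.log2(n)   # e.g. the split radix sort row above
    print(pram_time(e, s, p=1))             # p = 1: reduces to the element complexity
    print(pram_time(e, s, p=n))             # large p: within a constant of the step complexity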

Primitive Data

The P-RAM models only supply atomic values as primitive data—the memory is completely flat—and data structures are built out of these atomic values. The vector models, in contrast, supply a primitive data structure: the vector. To take advantage of the parallel vector primitives, other data structures, such as trees or graphs, should be mapped onto the primitive vectors. As an example of how this difference is important, consider representing a tree. In a P-RAM model, we might represent a tree using a linked list in the Euler tour order [112]. This linked list might be scattered over an arbitrarily large area of memory with large portions of unused memory interspersed among the elements. This linked list representation is inappropriate for a vector model. Instead, the tree must be mapped onto vectors, and the vectors should be kept as short (dense) as possible: there should be no unused elements interspersed in the vector. Chapter 5 describes such a mapping.

Having a primitive data structure allows simpler and more precise descriptions of algorithms and allows a more straightforward translation into programs that implement the algorithm. The algorithm designer need not worry about allocating data elements to processors, about keeping track of which processors or memory banks contain data, nor about simulating multiple elements on each processor. Vectors make vector models more abstract than P-RAM models and further removed from an implementation on a real machine. In spite of raising the level of abstraction, algorithms written in the models are likely to be more practical on real machines than P-RAM algorithms since the vector models can supply a broader set of primitives (as discussed below), and because the memory references are more predictable.

Control

Control in the vector models is strictly serial whereas control in the P-RAM models can either be serial (single instruction) or parallel (multiple instruction). The notion of serial control in the P-RAM model is not precise, however, and presents some problems. For example, consider a conditional-jump instruction in a serial control P-RAM. Can some processors take the jump and others not? If so, won't the two sets of processors need to execute different instructions? If we force all processors to take the same branch, from which processor do we look at a flag to decide on whether to take a branch or not? These problems can be worked out but with extra explanation and room for confusion.³ In contrast, conditional branches are straightforward in the vector models. As in the standard RAM model, conditional branches are scalar primitives and jump based on a single flag in the scalar memory.

Serial control has important advantages over parallel control. First, it simplifies algorithm design: the designer only needs to worry about a single program. Second, it simplifies the debugging of programs: if a programmer stops a program to check the state, she only needs to worry about the single-instruction stream. Third, it allows an implementation on a broader variety of machines: algorithms with serial control can be implemented on SIMD computers, such as the Connection Machine, and possibly on serial pipelined vector processors such as supplied on the Cray computers and other supercomputers.

On the other hand, serial control is more restrictive than parallel control. How many algorithms really need parallel control? This book hopes to demonstrate that even the strict serial control of the vector models can be applied naturally to a very wide variety of problems—certainly every application and algorithm in this book. Not all problems can be conveniently solved with serial control, but because of its advantages, one should stretch its use as broadly as possible.

Primitive Operations

As mentioned earlier, the primitives of a vector model are parallel—they operate on a collection of values. In contrast, the primitives of a P-RAM model are serial—as with the RAM model, they operate on a fixed number of scalar values—and the parallelism comes from applying the serial primitives in parallel. Supplying parallel primitives has two effects. First, it permits a broader set of primitives. For example, we might include in a vector model a primitive that merges the elements of two vectors. To add a merge primitive to a P-RAM model, we would need to stretch and warp the model—the P-RAM would become more than just a set of RAM processors sharing a common memory. Second, non-primitive parallel operations, such as sorting, are operations on whole collections.

³ Issues involving conditional jumps appear again in Chapter 10.

P-RAM Models Primitives               Vector Models Primitives

Arithmetic and Logical                Elementwise (Section 4.1.2)

Memory Reference
  Exclusive Write                     Permutation (Section 4.1.3)
  Exclusive Read                      Inverse-Permutation (Section 4.2)
  Concurrent Write                    Combine (Section 4.5.2)
  Concurrent Read                     Multi-Extract (Section 4.5.3)
  Fetch-and-Op (Multiprefix)          Keyed-Scan (Section 4.5.4)

Global Primitives
  —                                   Scan (Section 4.1.4)
  —                                   Merge (Section 4.5.1)
  —                                   Reduce (Section 4.2)
  —                                   Pack (Section 4.2)

Table 2.3: A comparison of the primitive operations supplied by the P-RAM models and the vector models. The correspondence is by no means exact. The elementwise primitives, for example, if simulated on a P-RAM require both memory references and arithmetic operations. The vector models can include a much broader variety of primitives.

These operations, therefore, have the same interface as the primitives of a vector model, but a different interface than the primitives of a P-RAM model. Having the same interface allows a more uniform description terminology. By allowing a broader set of primitives, we can more accurately model the running time of algorithms on real machines, while maintaining the high-level description. To more accurately model real machines, we can either add other primitives that are as fast as general permutations⁴—such as the scan primitives—or we can restrict the permutations to a certain class—such as permutations on a grid, or permutations that can be executed deterministically on a single pass of a butterfly network.

An Example

As an example of how the P-RAM models and vector models can lead to different algorithms, we consider the merging problem. Shiloach and Vishkin [44, 104] describe a P-RAM algorithm for merging two sorted sets of values of length n and m (n > m). On p processors this algorithm has a complexity of O(n/p + lg n).

⁴ As discussed in Section 3.2, a permutation of a vector is at least as easy as a memory reference in a P-RAM model.

The basic idea of the algorithm is to extract p evenly spaced elements from each set (we call these fenceposts), to merge the fenceposts using a logarithmic merge such as the bitonic merge [11], and then to use each processor to serially merge the elements between two fenceposts. The logarithmic merge has complexity O(lg p). Since the fenceposts are n/p apart, the serial merging has complexity O(n/p).

Does this algorithm make any sense in a vector model? Why would we pick p evenly spaced elements when we do not even have the concept of p? In a vector model, we might instead think about somehow reducing the size of the vectors by a constant factor. This is the motivation for the halving merge algorithm, which will be described in Section 3.7.2. The basic idea of the halving merge algorithm is to extract the odd-indexed elements from each vector, to recursively merge the extracted elements, and then to use the result of the merge to merge the even-indexed elements. Unlike the P-RAM algorithm, the halving merge algorithm is recursive. With the primitives of the scan vector model (defined in Chapter 4), the algorithm has an element complexity of O(n + n/2 + n/4 + ···) = O(n) and a step complexity of O(lg n). In an EREW P-RAM model with scan primitives, the algorithm has a step complexity of O(n/p + lg n)—this complexity is based on Equation 2.1.

The halving merge algorithm has some advantages over the P-RAM algorithm. First, the halving merge algorithm only needs a single recursive version, whereas the P-RAM algorithm needs a parallel merge to merge the fenceposts and a serial merge to merge between fenceposts. Second, in the halving merge algorithm, the length of the vectors continues to decrease even when the number of elements is less than the number of processors. This has a practical consequence in that it reduces the load on a communication network after we have fewer elements than processors, and probably speeds up each step.

Translating from P-RAM Algorithms

Many very impressive algorithms and techniques found in the literature are set in the context of the P-RAM model, and it would be a serious drawback if these contributions could not be applied in the vector models. Fortunately, much of this research can be translated to the vector models, and, in fact, many of the algorithms described in this book are modifications of P-RAM algorithms. This translation is possible because the algorithms described in the literature typically need only a single stream of control. The work on nested parallelism described in Chapter 10 greatly simplifies the translation.

2.3 Comparison to Circuit and Network Models

In this section we compare the parallel vector models to the boolean-circuit models [117, 26, 37, 38] and to the fixed-network models. Fixed-network models are processor-oriented models in which the processors are restricted to communicating with a fixed set of neighbors. These models include sets of processors connected by butterfly networks [32, 84, 11], shuffle-exchange networks [109], cube-connected cycles [89], multidimensional grids [115], or pyramids [110].

As mentioned earlier, the complexity measures of the vector models are analogous to the complexity measures of the boolean-circuit models. In this analogy, each step in a vector algorithm corresponds to a level in a boolean circuit. The step complexity therefore corresponds to the depth (the number of levels), and the element complexity to the size (the sum over the number of elements in each level). This analogy between a step in a vector algorithm and a level in a boolean circuit can be made slightly deeper. It turns out that vector algorithms without conditionals and without indirect addressing5 can be converted into circuits by replacing each primitive with a circuit that implements the primitive (see Figure 2.4).

In spite of this analogy, the two classes of models differ in important ways. First, although circuit models allow arbitrary logic, the vector models restrict the logic to a fixed set of modules. We can think of algorithm design in a vector model as building a circuit out of a fixed sequence of building blocks. This greatly simplifies programming and description at the cost of being less flexible. Second, each step in a vector algorithm might require many levels in a boolean circuit. For example, a merge primitive that merges n elements requires a circuit module of depth O(lg n). Third, in vector models the element complexity of a step can depend not only on the input size but also on the particular data. For example, in the line-drawing routine discussed in Section 3.6.1, the number of pixels generated depends on the lengths of the lines—this depends not only on the size of the input but on the particular input data. To use boolean circuits to solve this problem, a family of circuits must be defined, and each member must solve the problem for a particular line length. The problem is that we do not know which particular member of the family can be used to solve the problem until we have executed part of the computation (determining the lengths of the lines). This is a conceptually awkward aspect of the boolean-circuit models. A similar problem arises with programs, such as quicksort, whose step complexity can vary depending on the input data. In quicksort, the worst-case step complexity is O(n) while the expected complexity is only O(lg n).

We now briefly compare the vector models to the fixed-network models. Since the fixed-network models are processor oriented, many of the comparisons made in Section 2.2 for the P-RAM models also apply here. The fixed-network models, however, are lower-level models than either the vector models or the P-RAM models: they can be implemented more directly on a real machine. Algorithms for low-level models can have two disadvantages over algorithms for high-level models: they must be described at a more detailed level, and they are more machine dependent. The first disadvantage can be remedied by building high-level operations on top of the low-level models and then basing the algorithms on these operations. The operations only need to be defined once, and many algorithms might make use of them. If algorithms are being defined using these high-level operations, however, why not just use a model based on these operations? Using such a model can simultaneously solve the second problem (machine dependence), since we are no longer committed to implementing the operations on a particular machine. This is perhaps the principal motivation for the parallel vector models: a model based on a set of primitive parallel operations without commitment to a particular architecture. Low-level models might still be useful for describing algorithms when the algorithm cannot be described at a higher level and then mapped into an efficient algorithm at the lower level. From a programmer's point of view, I hope that this does not happen very often.

5Problems with indirect addressing and conditionals are considered again in Chapter 10.

Figure 2.4: Vector algorithms without conditional branches and without indirect addressing can be converted into a circuit by replacing each primitive with a circuit that implements the primitive, and by converting the vector memory locations into bundles of wires. In the above example, we illustrate how the split operation, which will be defined in Section 3.4.1, is mapped onto a circuit. The number of independent bundles of wires crossing between two levels in the circuit equals the number of vector memory locations required between two steps in the vector algorithm.

2.4 Comparison to Bit-Vector Models

Perhaps the closest theoretical model to the V-RAM is the bit-vector machine introduced by Pratt and Stockmeyer [87]. In this model, each register (memory location) can contain an arbitrarily long vector of bits. The vector instructions include (1) elementwise bit operations over the vectors, such as logical-or or logical-not, and (2) logical shifts of the vectors right or left by an arbitrary amount. Pratt and Stockmeyer showed how the model can implement arbitrary-precision integer arithmetic, and how to transpose and multiply boolean matrices in O(lg n) program steps.

The bit-vector model, however, had only a single complexity measure, the step complexity. Since they did not consider an element complexity, they were able to show that P = NP within the model (just based on the number of program steps). This is because they could simulate the nondeterminism on each step by spreading the branches across the length of a vector, in effect executing each branch in "parallel". This, of course, leads to exponentially large vectors, and would lead to an exponential element complexity.

If we did introduce an element complexity to the bit-vector machine, we could simulate the V-RAM on a bit-vector machine in the following way, with at most polylogarithmic differences in complexities. Since an integer can itself be represented as a sequence of bits, each vector of integers can be represented with a vector of bits broken into word-sized blocks. To represent pointers in an n-element vector, each block would require O(lg n) bits; the total bit length would therefore be O(n lg n). Chapter 4 introduces a particular set of vector instructions for the parallel vector models, and we briefly mention here how they can be simulated on the bit-vector machine. The elementwise operations, such as addition, can be simulated using segmented instructions and all require at most O(lg lg n) steps. The permute can be executed by simulating an Omega network using the shifts, and could execute in deterministic O(lg^2 n) steps. The scan operations could be implemented in O(lg n lg lg n) steps using the tree implementation discussed in Chapter 13.

2.5 Selecting Primitives

Parallel vector models, as with P-RAM models, allow some flexibility in selecting a set of primitives. We therefore need principled criteria by which to make this selection; this section briefly considers several such criteria. First, the primitives should be easy to implement and efficient both in parallel and in serial, in theory and in practice. Second, the primitives should be implementable on a broad range of architectures—including existing architectures. Third, the primitives should be useful. Fourth, the set of primitives should be small, so that it is easy to describe them and to port them to different machines. To restrict ourselves to primitives which are efficient, we only consider primitives which obey the following two criteria:

1. Each primitive, when applied to argument vectors of total length n, must require O(n) time on a serial RAM. We call such a primitive a serially linear primitive.

2. Each primitive must be in NC^1: for vectors of length n, the primitive can be calculated on a boolean circuit of O(lg n) depth and of polynomial size. See [37] for more details on the class NC^1.

Although these criteria guarantee that the primitives are efficient in theory in both the serial and parallel limits, the constants might be large. To decide whether a primitive is efficient in practice, we must look at the constants on various architectures. When we introduce the scan vector model in Chapter 4, we consider the various criteria and justify our selection of primitives based on them. We also consider these criteria when introducing other primitives in Section 4.5.

2.6 Other Issues

2.6.1 Serially Optimal Algorithms

What is the relationship between the element complexity of an algorithm on a vector model and the complexity of the algorithm on a serial RAM model? We say that a vector primitive is serially linear if, when applied to argument vectors of total length n, it can be simulated in O(n) time on a serial RAM. We call a model in which all the primitives are serially linear a serially linear model. If a model is serially linear, then the element complexity gives an upper bound on the time complexity of the algorithm on the serial RAM model. As mentioned in Section 2.5, in this book we only consider serially linear primitives. We say that an algorithm with an element complexity asymptotically equal to the running time of the optimal serial algorithm is serially time optimal. This definition of optimality is similar to the definitions of optimal parallel algorithms suggested by Schwartz [101] and Shiloach and Vishkin [104].

2.6.2 Space Complexity

We define two space complexities: the vector-space complexity and the element-space complexity. The vector-space complexity specifies the number of locations used in the vector memory, and the element-space complexity specifies the sum of the vector lengths over the locations used in the vector memory. These space complexities are analogous to the two time complexities.

2.6.3 Equal Time Assumption

The step and element complexities rest on the assumption that all the vector primitives require approximately equal time on equal-length vectors. We call this assumption the equal-time assumption. This assumption is analogous to the equal-time assumption in the P-RAM models (all primitives require equal time). These equal-time assumptions greatly simplify the complexity analysis of algorithms, but on real machines only approximate the truth.

2.6.4 Do We Need the Scalar Memory?

When we introduced the V-RAM in Section 2.1, the machine had two memories and two processors, one for scalars and one for vectors. It turns out that the scalar memory and processor are not necessary: a vector model can be defined with just a vector memory and vector processor. We can remove the scalar memory and processor as follows. To implement the arithmetic and logical scalar operations, we can just use vectors of length one and the vector arithmetic and logical operations. To implement a conditional-jump operation, instead of jumping on the value of a flag in the scalar memory, the machine might jump based on whether a vector is empty (of length 0) or not.

Implementing indirect addressing is awkward without a scalar memory. In indirect addressing, the address argument must be a single integer value. We can use a vector argument and declare it an error if the vector is not of length 1, but this restriction is awkward. As discussed in Section 10.2, indirect addressing is also a problem for other reasons and is not necessary if the model supplies stacks. We might, therefore, consider not including indirect addressing.

By not having a separate scalar memory and scalar processor, we only need one set of arithmetic and logical operations, thereby reducing the number of instructions necessary by almost a factor of two. An early version of this chapter actually defined the vector model with only the vector memory and processor. Adding the scalar memory and processor, however, made the vector models easier to describe and understand.

2.7 Conclusion

In this chapter we introduced a class of models for the design, analysis and implementation of parallel applications and algorithms. I contend that these models, the parallel vector models, better separate the algorithmic issues from the architectural issues of parallel computing than do the other parallel models, including the P-RAM models, the boolean-circuit models, the hypercube models, and the grid models. The parallel vector models conceal from the algorithm designer architecture-dependent issues such as synchronization, processor allocation, and the simulation of multiple elements on each processor.

This separation has some important benefits. First, it makes the models more durable: as architectures change, the models that rely least on the specifics of the architectures will best endure those changes. Second, it makes the models very good pedagogical tools: a student learning about parallel computing can concentrate on the algorithmic issues and the architectural issues separately. Third, it simplifies the description of algorithms, since the algorithms do not require any machine-specific description. Finally, it more cleanly separates the architectural and algorithmic research issues: the architectural issues, such as simulating multiple elements on each processor, can be solved independently of the algorithmic issues.

Chapter 3

The Scan Primitives

This chapter suggests that certain scan operations be included as "unit time" primitives in the parallel random access machine (P-RAM) models. A scan operation takes a binary operator ⊕ with identity i, and an ordered set [a_0, a_1, ..., a_{n−1}] of n elements, and returns the ordered set [i, a_0, (a_0 ⊕ a_1), ..., (a_0 ⊕ a_1 ⊕ ··· ⊕ a_{n−2})].1 In the P-RAM model, each element is placed in a processor, and the scan executes over a fixed order of the processors.2 This chapter argues that certain scan operations can be implemented to execute as fast as memory references to a shared memory, can improve the asymptotic performance of many P-RAM algorithms by an O(lg n) factor, and can simplify the description of many P-RAM algorithms. Table 3.1 summarizes the uses of the scan operations and the example algorithms discussed in this chapter.

As well as introducing the scan operations, this chapter serves as motivation for the parallel vector models, and in its course it will build a parallel vector model on top of the P-RAM model. It first presents a notation based on vectors. These vectors must be of fixed length, since one element is placed on each processor. It later argues that the programming environment should allow vectors of different lengths, including vectors with more elements than processors, since many elements can be placed on each processor and a simulator can automatically loop over these elements without the programmer having to worry about it. This will leave us with the scan vector model, which is formally defined in Chapter 4.

1Appendix A gives a short history of the scan operation.
2The prefix operation on a linked list (sometimes called the data-dependent prefix operation [64]) and the fetch-and-op type instructions [50, 49] are not included.
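To make the definition concrete, the following is a minimal Python sketch of the scan just described; the serial loop and the function names are ours, and in the model the whole operation counts as a single primitive step.

def scan(op, identity, a):
    # Exclusive scan: result[j] = a[0] op a[1] op ... op a[j-1],
    # and result[0] is the identity.
    result, running = [], identity
    for x in a:
        result.append(running)
        running = op(running, x)
    return result

# scan(lambda x, y: x + y, 0, [2, 1, 2, 3]) returns [0, 2, 3, 5].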


Uses of Scan Primitives          Example Algorithms
Previous Maximum (3.3)           Line-of-Sight
Enumerating (3.4)                Splitting, Load Balancing
Copying (3.4)                    Quicksort, Line Drawing
Distributing Sums (3.4)          Quicksort
Splitting (3.4.1)                Split Radix Sort, Quicksort
Segmented Primitives (3.5)       Quicksort, Line Drawing
Allocating (3.6)                 Line Drawing, Halving Merge
Load-Balancing (3.7.1)           Halving Merge

Example Algorithms               Uses of Scan Primitives
Line-of-Sight (3.3)              Previous Maximum
Split Radix Sort (3.4.1)         Splitting
Quicksort (3.5.1)                Splitting, Distributing Sums, Copying, Segmented Primitives
Line Drawing (3.6.1)             Allocating, Copying, Segmented Primitives
Halving Merge (3.7.2)            Allocating, Load Balancing

Table 3.1: A cross reference of the various uses of scans introduced in this chapter with the example algorithms discussed in this chapter. All the uses can be executed in a constant number of steps.

3.1 Why Scan Primitives?

Algorithmic models typically supply a simple abstraction of a computing device and a set of primitive operations assumed to execute in a fixed "unit time". The assumption that primitives operate in unit time allows researchers to greatly simplify the analysis of algorithms, but it is never strictly valid on real machines: primitives often execute in time dependent on machine and algorithm parameters. For example, in the serial random access machine (RAM) model [40], we assume that memory references take unit time, even though they must fan in and therefore take time that increases with the memory size. In spite of this inaccuracy in the model, the unit-time assumption has taken us a long way in understanding serial algorithms.

In the parallel random access machine (P-RAM) models3 [42, 101, 104, 48, 49], memory references are again assumed to take unit time. In these parallel models, this "unit time" is large, since no practical hardware is known that does better than deterministic O(lg^2 n), or probabilistic O(lg n), bit times for an arbitrary memory reference from n processors.4 Since unit time is based on the time taken by a memory reference, one should ask if there are other useful primitives that could execute as fast. We argue that certain scan operations are such primitives and that they should be included as "unit time" operations in the P-RAM models. We call the exclusive-read exclusive-write (EREW) P-RAM model with the scan operations included as primitives the scan model. The justifications for the scan model can be summarized as follows:

Architectural Justification: Both in theory and in practice, certain scan operations can execute in less time than references to a shared memory, and can be implemented with less hardware. Our arguments are summarized in Table 3.3 and are discussed in detail in Chapter 13.

Algorithmic Justification: The scan primitives improve the asymptotic running time of many algorithms by an O(lg n) factor over the EREW model and of some by an O(lg n) factor over the CRCW model. Table 3.2 compares the asymptotic running times of many algorithms. Most of these scan algorithms are described in this chapter and in Chapters 6, 7 and 8.

Linguistic Justification: The scan primitives simplify the descriptions of many algorithms. Since there are no concrete measures of simplicity, we unfortunately cannot present a concise table to prove this point; instead, much of the remainder of this chapter supports it through a series of illustrative examples.

3Appendix A contains a brief description of the P-RAM models.
4The AKS sorting network [5] takes O(lg n) time deterministically, but is not practical.

                                                  Model
Algorithm                              EREW         CRCW           Scan
Graph Algorithms (n vertices, m edges, m processors)
  Minimum Spanning Tree                lg^2 n       lg n           lg n
  Connected Components                 lg^2 n       lg n           lg n
  Maximum Flow                         n^2 lg n     n^2 lg n       n^2
  Maximal Independent Set              lg^2 n       lg^2 n         lg n
  Biconnected Components               lg^2 n       lg n           lg n
Sorting and Merging (n keys, n processors)
  Sorting                              lg n         lg n           lg n
  Merging                              lg n         lg lg n        lg lg n
Computational Geometry (n points, n processors)
  Convex Hull                          lg^2 n       lg n           lg n
  Building a k-D Tree                  lg^2 n       lg^2 n         lg n
  Closest Pair in the Plane            lg^2 n       lg n lg lg n   lg n
  Line of Sight                        lg n         lg n           1
Matrix Manipulation (n × n matrix, n^2 processors)
  Matrix × Matrix                      n            n              n
  Vector × Matrix                      lg n         lg n           1
  Matrix Inversion                     n lg n       n lg n         n

Table 3.2: Algorithmic justification. The scan primitives improve the asymptotic running time of many algorithms by an O(lg n) factor over an EREW model and of some by an O(lg n) factor over the CRCW model.

                                        Memory Reference    Scan Operation
Theoretical (VLSI models)
  Time                                  O(lg n) [67]        O(lg n) [68]
  Area                                  O(n^2/lg n)         O(n)
Theoretical (Circuit models)
  Depth                                 O(lg n) [5]         O(lg n) [41]
  Size                                  O(n lg n)           O(n)
Actual (64K processor CM-2)
  Bit Cycles (Time)                     600                 550
  Percent of Hardware Dedicated
  to Operation                          30                  0

Table 3.3: Architectural justification. Both in theory and in practice, certain scan operations can execute in less time than references to a shared memory, and can be implemented with less hardware.

Since the term unit time is misleading—both memory references and scan operations take many clock cycles on a real machine—this book henceforth uses the term program step or step instead. A step is a call to one of the primitive instructions of the model. The number of program steps taken by an algorithm is the step complexity.

3.2 Notation

Before discussing the uses of the scan primitives, we introduce some conventions that will simplify the descriptions of the algorithms. We will assume that the data used by the algorithms in this chapter are stored in vectors (one-dimensional arrays) in the shared memory and that each processor is assigned to one element of the vector. When executing an operation, the ith processor operates on the ith element of a vector. For example, in the operation:

A = [5 1 3 4 3 9 2 6]
B = [2 5 3 8 1 3 6 2]
C ← A + B = [7 6 6 12 4 12 8 8]

each processor reads its respective value from the vectors A and B, sums the values, and writes the result into the destination vector C. For now, we assume that the P-RAM always has as many processors as vector elements. The scan primitives can be used to scan the elements of a vector. For example:

A = [2 1 2 3 5 8 13 21]
C ← +-scan(A) = [0 2 3 5 8 13 21 34]

In this book we only use five primitive scan operations: or-scan, and-scan, max-scan, min-scan and +-scan. We also use backward versions of each of these scan operations—versions that scan from the last element to the first. To reorder the elements of a vector, we use the permute operation. The permute operation, in the form permute(A, I), permutes the elements of A to the positions specified by the indices of I. All indices of I must be unique. For example:

A (data vector)    = [a0 a1 a2 a3 a4 a5 a6 a7]
I (index vector)   = [2  5  4  3  1  6  0  7]
C ← permute(A, I) = [a6 a4 a0 a3 a2 a1 a5 a7]

To implement the permute operation on an EREW P-RAM, each processor reads its respective value and index, and writes the value into the position of the destination vector given by the index.
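In Python, the semantics of permute can be sketched as follows (the serial loop stands in for the parallel writes):

def permute(a, indices):
    # Element a[j] is written to position indices[j]; the indices must
    # be unique, so no two writes collide.
    result = [None] * len(a)
    for value, i in zip(a, indices):
        result[i] = value
    return result

# permute(['a0', 'a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7'],
#         [2, 5, 4, 3, 1, 6, 0, 7])
# returns ['a6', 'a4', 'a0', 'a3', 'a2', 'a1', 'a5', 'a7'].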

3.3 Example: Line-of-Sight

As an example of the direct use of a scan operation, we consider a simple line-of-sight problem. In the line-of-sight problem, we are given a terrain map in the form of a grid of altitudes and an observation point X on the grid (see Figure 3.1), and we want to find which points are visible along a ray originating at the observation point. A point on the ray is visible only if no other point between it and the observation point has a greater vertical angle. To find whether any previous point has a greater angle, the altitude of each point along the ray is placed in a vector (the altitude vector). These altitudes are then converted to angles and placed in the angle vector (see Figure 3.1). A max-scan is then executed on the angle vector, which returns to each point the maximum previous angle. To test for visibility, each point only needs to compare its angle to the result of the max-scan. Section 6.6 generalizes this problem to finding all visible points on the grid. This solution has a step complexity (number of program steps) of O(1).

Figure 3.1: The line-of-sight problem for a single ray. The X marks the observation point. The visible points are shaded. A point on the ray is visible if no previous point has a greater angle. The angle is calculated as arctan(altitude/distance).
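A minimal Python sketch of this routine follows. It assumes, as in Figure 3.1, that the observation point is the first element of the altitude vector and that point i lies i grid units from X; the serial loop in max_scan stands in for the max-scan primitive.

import math

def max_scan(a):
    # Exclusive max-scan: result[j] is the maximum of a[0..j-1].
    result, running = [], -math.inf
    for x in a:
        result.append(running)
        running = max(running, x)
    return result

def line_of_sight(altitude):
    # Convert altitudes to angles: arctan(altitude difference / distance).
    angle = [-math.inf] + [math.atan2(altitude[i] - altitude[0], i)
                           for i in range(1, len(altitude))]
    previous_max = max_scan(angle)
    # A point is visible if no previous point has a greater angle.
    return [a >= m for a, m in zip(angle, previous_max)]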

Flag            = [T F F T F T T F]
enumerate(Flag) = [0 1 1 1 2 2 3 4]

A       = [5 1 3 4 3 9 2 6]
copy(A) = [5 5 5 5 5 5 5 5]

B               = [1 1 2 1 1 2 1 1]
+-distribute(B) = [10 10 10 10 10 10 10 10]

Figure 3.2: The enumerate, copy, and +-distribute operations. The enumerate numbers the flagged elements of a vector, the copy copies the first element across a vector, and the +-distribute sums the elements of a vector.

3.4 Simple Operations

We now consider three simple operations that are based on the scan primitives: enumerating, copying and distributing sums (see Figure 3.2). These operations are used extensively as part of the algorithms we discuss in this chapter, and all have a step complexity of O(1).

Enumerating: Return the integer i to the ith true element. This operation is implemented by converting the flags to 0 or 1 and executing a +-scan. We call this the enumerate operation.

Copying: Copy the first element over all elements. This operation is implemented by placing the identity element in all but the first element of a vector and executing a scan.5 Since the scan is not inclusive, we must put the first element back after executing the scan. The copy operation is useful for distributing information across a vector and often removes the need for a concurrent-read capability.

Distributing Sums: Return to each element the sum of all the elements. This operation is implemented using a +-scan, a vector add, and a backward copy. We call this a +-distribute. We can likewise define a max-distribute, min-distribute, or-distribute and and-distribute. Distributing sums is useful for collecting information from a vector and, as we will see, often removes the need for a concurrent-write capability.

5One might think of defining a binary associative operator first which returns the first of its two arguments, and use it to execute the copy operation. The problem is that the first operator does not have an identity—a requirement for our definition of a scan.
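All three operations translate directly into Python on top of an exclusive +-scan; this serial sketch uses names of our own choosing, and the shortcut in copy_op stands in for the scan-based construction described above.

def plus_scan(a):
    # Exclusive +-scan.
    result, running = [], 0
    for x in a:
        result.append(running)
        running += x
    return result

def enumerate_op(flags):
    # Convert the flags to 0 or 1 and execute a +-scan.
    return plus_scan([1 if f else 0 for f in flags])

def copy_op(a):
    # Distribute the first element across the vector.  (The model does this
    # with a scan over a vector whose non-first elements hold the identity.)
    return [a[0]] * len(a)

def plus_distribute(a):
    # A +-scan, a vector add (the last position then holds the total),
    # and a backward copy of that total.
    s = plus_scan(a)
    total = s[-1] + a[-1] if a else 0
    return [total] * len(a)

# enumerate_op([True, False, False, True, False, True, True, False])
# returns [0, 1, 1, 1, 2, 2, 3, 4], as in Figure 3.2.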

define split-radix-sort(A, number-of-bits){
    for i from 0 to (number-of-bits − 1)
        A ← split(A, A_i)}

A                  = [5 7 3 1 4 2 7 2]
A_0                = [T T T T F F T F]
A ← split(A, A_0) = [4 2 2 5 7 3 1 7]
A_1                = [F T T F T T F T]
A ← split(A, A_1) = [4 5 1 2 2 7 3 7]
A_2                = [T T F F F T F T]
A ← split(A, A_2) = [1 2 2 3 4 5 7 7]

Figure 3.3: An example of the split radix sort on a vector containing three-bit values. The A_n notation signifies extracting the nth bit of each element of the vector A and converting it to a boolean value (T for 1, F for 0).

3.4.1 Example: Split Radix Sort

To illustrate the use of the scans for enumerating, consider a simple radix sorting algorithm. The algorithm is a parallel version of the standard serial radix sort [63]. The algorithm loops over the bits of the keys, starting at the lowest bit, executing a split operation on each iteration. The split operation packs the keys with a 0 in the corresponding bit to the bottom of a vector, and packs the keys with a 1 in the bit to the top of the same vector; it maintains the order within both groups. The sort works because each split operation sorts the keys with respect to the current bit (0 down, 1 up) and maintains the sorted order of all the lower bits—remember that we iterate from the bottom bit up. Figure 3.3 shows an example of the sort along with code to implement it.

We now consider how the split operation can be implemented in the scan model. The basic idea is to determine a new index for each element and permute the elements to these new indices. To determine the new indices for elements with a 0 (F) in the bit, we enumerate these elements as described in the last section. To determine the new indices of elements with a 1 (T) in the bit, we enumerate the elements starting at the top of the vector and subtract these numbers from the length of the vector. Figure 3.4 shows an example of the split operation along with code to implement it.

The split operation has a step complexity of O(1); so for d-bit keys, the split radix sort has a step complexity of O(d).

define split(A, Flags){
    I-down ← enumerate(not(Flags));
    I-up   ← n − back-enumerate(Flags) − 1;
    Index  ← if Flags then I-up else I-down;
    permute(A, Index)}

A                 = [5 7 3 1 4 2 7 2]
Flags             = [T T T T F F T F]
I-down            = [0 0 0 0 0 1 2 2]
I-up              = [3 4 5 6 6 6 7 7]
Index             = [3 4 5 6 0 1 7 2]
permute(A, Index) = [4 2 2 5 7 3 1 7]

Figure 3.4: The split operation packs the elements with an F in the corresponding flag position to the bottom of a vector, and packs the elements with a T to the top of the same vector.

If we assume that for n keys the keys are O(lg n) bits long, a common assumption in models of computation [111], then the algorithm has a step complexity of O(lg n). Although O(lg n) is the same asymptotic complexity as existing EREW and CRCW algorithms [5, 34], the algorithm is much simpler and has a significantly smaller constant. Note that since integers, characters, and floating-point numbers can all be sorted with a radix sort, a radix sort suffices for almost all sorting of fixed-length keys required in practice.
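The following Python sketch transcribes split and the sort directly; the serial loops stand in for the primitives, and the helper names are ours.

def plus_scan(a):
    # Exclusive +-scan.
    result, running = [], 0
    for x in a:
        result.append(running)
        running += x
    return result

def split(a, flags):
    n = len(a)
    # enumerate(not(Flags)): destinations for the F elements.
    i_down = plus_scan([0 if f else 1 for f in flags])
    # n - back-enumerate(Flags) - 1: destinations for the T elements.
    back = plus_scan([1 if f else 0 for f in reversed(flags)])[::-1]
    index = [n - b - 1 if f else d
             for f, b, d in zip(flags, back, i_down)]
    result = [None] * n              # permute(A, Index)
    for value, i in zip(a, index):
        result[i] = value
    return result

def split_radix_sort(a, number_of_bits):
    for i in range(number_of_bits):
        a = split(a, [(x >> i) & 1 == 1 for x in a])
    return a

# split_radix_sort([5, 7, 3, 1, 4, 2, 7, 2], 3)
# returns [1, 2, 2, 3, 4, 5, 7, 7], as in Figure 3.3.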

The split radix sort is fast in the scan model, but is it fast in practice? After all, our architectural justification claimed that the scan primitives bring the P-RAM models closer to reality. Table 3.4 compares implementations of the split radix sort and Batcher's bitonic sort [11] on the Connection Machine. We chose the bitonic sort for comparison because it is commonly cited as the most practical parallel sorting algorithm. I have also looked into implementing Cole's merge sort [34], which is optimal on the P-RAM models, on the Connection Machine. Based on predicted measures, I determined that it would be at least a factor of 4, and possibly a factor of 10, slower than the other two sorts. The split radix sort is the sort currently supported by the parallel instruction set of the Connection Machine [113].

                                    Split Radix Sort    Bitonic Sort
Theoretical (Bit Serial Circuit)
  Bit Time                          O(d lg n)           O(d + lg^2 n)
Actual (64K processor CM-1)
  Bit cycles (sorting 16 bits)      20,000              19,000

Table 3.4: Comparison of the times taken by the split radix sort and the bitonic sort (n keys each with d bits). The constants in the theoretical times are very small for both algorithms. On the Connection Machine, the bitonic sort is implemented in microcode whereas the split radix sort is implemented in macrocode, giving the bitonic sort an edge.

A                   = [5 1 3 4 3 9 2 6]
Sb                  = [T F T F F F T F]
seg-+-scan(A, Sb)   = [0 5 0 3 7 10 0 2]
seg-max-scan(A, Sb) = [0 5 0 3 4 4 0 2]

Figure 3.5: The segmented scan operations restart at the beginning of each segment. The vector Sb contains flags that mark the beginning of the segments.

3.5 Segments and Segmented Scans

In many algorithms it is useful to break the linear ordering of the processors into segments and have a scan operation start again at the beginning of each segment; we call such scan operations segmented scans. Segmented scans take two arguments: a set of values and a set of segment flags. Each flag in the segment flags specifies the start of a new segment (see Figure 3.5). Segmented scans were first suggested by Schwartz [101]. This book greatly extends the ideas of segments; Section 4.3 discusses how segments are useful for other operations, and Chapter 10 proves a general theorem based on segments.

The segmented scan operations are useful because they allow algorithms to execute the scans independently over the elements of many sets. A graph, for example, can be represented using a segment for each vertex and an element position within a segment for each edge of the vertex. Using this representation, the segmented scans can be used to find the minimum edge of each vertex or to number the edges of each vertex. This graph representation is discussed in more detail in Section 5.1, and algorithms based on it are discussed in Chapter 7.

Key                 = [24.6 48.1 5.8 3.1 37.8 9.5 48.1 5.8]
Segment-Flags       = [T F F F F F F F]
Pivots              = [24.6 24.6 24.6 24.6 24.6 24.6 24.6 24.6]
F                   = [= > < < > < > <]
Key ← split(Key, F) = [5.8 3.1 9.5 5.8 24.6 48.1 37.8 48.1]
Segment-Flags       = [T F F F T T F F]
Pivots              = [5.8 5.8 5.8 5.8 24.6 48.1 48.1 48.1]
F                   = [= < > = = = < =]
Key ← split(Key, F) = [3.1 5.8 5.8 9.5 24.6 37.8 48.1 48.1]
Segment-Flags       = [T T F T T T T F]

Figure 3.6: Parallel quicksort. On each step, within each segment, we distribute the pivot, test whether each element is equal-to, less-than or greater-than the pivot, split into three groups, and generate a new set of segment flags.

It turns out that the segmented scan operations can all be implemented with at most two calls to the unsegmented versions (see Section B.2.3). They can also be implemented directly as described by Schwartz [101] and in more detail in Section 13.2.1.
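As a reference point, here is a direct serial Python sketch of the semantics of a segmented +-scan (not the two-call construction of Section B.2.3):

def seg_plus_scan(a, segment_flags):
    # Exclusive +-scan that restarts at each flagged segment beginning.
    result, running = [], 0
    for x, flag in zip(a, segment_flags):
        if flag:
            running = 0
        result.append(running)
        running += x
    return result

# seg_plus_scan([5, 1, 3, 4, 3, 9, 2, 6],
#               [True, False, True, False, False, False, True, False])
# returns [0, 5, 0, 3, 7, 10, 0, 2], as in Figure 3.5.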

3.5.1 Example: Quicksort

To illustrate the use of segments, we consider an example: a parallel version of Quicksort. Similar to the standard serial version [56], the parallel version picks one of the keys as a pivot value, splits the keys into three sets—keys less than, equal to, and greater than the pivot—and recurses on each set.6 The algorithm has an expected step complexity of O(lg n). The basic intuition of the parallel version is to keep each subset in its own segment, and to pick pivot values and split the keys independently within each segment. The steps required by the sort are outlined as follows:

1. Check if the keys are sorted and exit the routine if they are. Each processor checks to see if the previous processor has a lesser or equal value. We execute an and-distribute (Section 3.4) on the result of the check so that each processor knows whether all other processors are in order.

2. Within each segment, pick a pivot and distribute it to the other elements.

6We do not need to recursively sort the keys equal to the pivot, but the algorithm as described below does. We later discuss how we can remove sets which are already sorted—such as the equal elements.

If we pick the first element as a pivot, we can use a segmented version of the copy operation (Section 3.4), implemented with a segmented max-scan. The algorithm could also pick a random element by generating a random number (less than the segment length) in the first element of each segment and picking out the element with a copy of the number, a test and a backwards max-scan.

3. Within each segment, compare each element with the pivot and split based on the result of the comparison. For the split, we can use a version of the split operation described in Section 3.4.1 that splits into three sets instead of two, and that is segmented. To implement such a segmented split, we can use a segmented version of the enumerate operation (Section 3.4) to number elements relative to the beginning of each segment. We can use a segmented version of the copy operation to copy the offset of the beginning of each segment across the segment. We then add the offset to the numbers relative to the beginning of the segment to generate actual indices to which we permute each element.

4. Within each segment, insert additional segment flags to separate the split values. Each element can determine whether it is at the beginning of a segment by looking at the previous element.

5. Return to step 1.

Each iteration of this sort requires a constant number of calls to the primitives. If we select pivots randomly within each segment, quicksort is expected to complete in O(lg n) iterations, and therefore has an expected step complexity of O(lg n). This version of quicksort has been implemented on the Connection Machine and executes in about twice the time of the split radix sort.

The technique of recursively breaking segments into subsegments and operating independently within each segment can be used for many other divide-and-conquer algorithms. The quickhull algorithm described in Section 6.4 and a binary-search routine described in Section 6.1 both use the technique.

3.5.2 Notes on Segments

Although the quicksort algorithm we described works fine, by building some general-purpose mechanisms the algorithm can be made simpler, cleaner and more model independent. We mention some such mechanisms here, and later chapters will describe them in detail. The issues discussed here are particularly important when designing and implementing algorithms that are more complicated than the quicksort.

define quicksort(A){
    if A is not sorted then
        pivot ← A[0];
        lesser-elements ← quicksort(select A less than pivot);
        equal-elements ← select A equal to pivot;
        greater-elements ← quicksort(select A greater than pivot);
        append(lesser-elements, equal-elements, greater-elements)
    else A}

Figure 3.7: A recursive definition of quicksort. Ideally, we could use this definition and it would compile into the segmented code described earlier.
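The recursive definition transcribes almost line for line into Python. This serial sketch captures only the high-level form; Chapter 9 describes a compiler that turns a definition like it into the segmented version.

def quicksort(a):
    # "if A is not sorted then ... else A"
    if all(x <= y for x, y in zip(a, a[1:])):
        return a
    pivot = a[0]
    lesser = quicksort([x for x in a if x < pivot])
    equal = [x for x in a if x == pivot]
    greater = quicksort([x for x in a if x > pivot])
    return lesser + equal + greater      # append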

We mention four simplifications. First, the programmer's job could be greatly simplified by a translator that took an unsegmented version of an operator and automatically generated the segmented version. Second, the programmer's job could again be simplified if the segment flags could somehow be hidden: manipulating segment flags is difficult and can lead to bugs that are hard to find. Third, the definition of quicksort could be made much simpler and more model independent if quicksort could be defined in a high-level recursive form such as that shown in Figure 3.7 and automatically translated into the iterative form described above, or at least into a form with the same complexity. Finally, the performance of the algorithm could be improved if segments dropped out when they are completed. This optimization is useful when there are more keys than processors.

In Chapter 9 we discuss a compiler for PARALATION LISP [96] which translates a recursive definition of quicksort similar to that shown in Figure 3.7 into a form similar to the iterative version described in the previous section, with the optimization of dropping out sorted segments. All the operations necessary to implement the sort are defined unsegmented, and the programmer never has to see a segment flag.

3.6 Allocating Elements

This section illustrates another use of the scan operations. Consider the following problem: given a set of processors, each with an integer, allocate that integer number of new processors to each initial processor. Such allocation is necessary in the parallel line-drawing routine described in Section 3.6.1: each line calculates the number of pixels in the line and dynamically allocates a processor for each pixel.

V = [v1 v2 v3]
A = [4 1 3]
Hpointers ← +-scan(A) = [0 4 5]

Segment-flag = [1 0 0 0 1 1 0 0]
distribute(V, Hpointers) = [v1 v1 v1 v1 v2 v3 v3 v3]

Figure 3.8: Processor allocation. The vector A specifies how many new elements each position needs. We can allocate a segment to each position by applying a +-scan to A and using the result as pointers to the beginning of each segment. We can then distribute the values of V to the new elements with a permute to the beginning of each segment and a segmented copy across the segment.

Allocating new elements is also useful for the branching part of many branch-and-bound algorithms. Consider, for example, a brute-force chess-playing algorithm that executes a fixed-depth search of possible moves to determine the best next move.7 We can execute the algorithm in parallel by placing each possible move in a separate processor. Since the algorithm dynamically decides how many next moves to generate, depending on the position, we need to dynamically allocate new elements. In Section 3.7.1 we discuss the bounding part of branch-and-bound algorithms.

Defined more formally, allocation is the task of, given a vector of integers A with elements a_i and length l, creating a new vector B of length

L = ∑_{i=0}^{l−1} a_i    (3.1)

with a_i elements of B assigned to each position i of A. By assigned to, we mean that there must be some method for distributing a value at position i of a vector to the a_i elements which are assigned to that position. Since there is a one-to-one correspondence between elements of a vector and processors, the original vector requires l processors and the new vector requires L processors. Typically, an algorithm does not operate on the two vectors at the same time, so the processors can overlap.

7This is how many chess-playing algorithms work [16]. The search is called a minimax search since it alternates moves between the two players, trying to minimize the benefit of one player and maximize the benefit of the other.

Figure 3.9: The pixels generated by a line-drawing routine. In this example the endpoints are (11, 2)–(23, 14), (2, 13)–(13, 8), and (16, 4)–(31, 4). The algorithm allocates 12, 11 and 16 pixels respectively for the three lines.

Allocation can be implemented by assigning a contiguous segment of elements to each position i of A. To allocate segments we execute a +-scan on the vector A, returning a pointer to the start of each segment (see Figure 3.8). We can then generate the appropriate segment flags by permuting a flag to the index specified by the pointer. To distribute values from each position i to its segment, we permute the values to the beginning of the segments and use a segmented copy operation to copy the values across the segment. Allocation and distribution each require a small constant number of calls to the primitives of the scan model. Allocation requires O(lg n) time on an EREW P-RAM and O(lg n / lg lg n) time on a CREW P-RAM based on the prefix-sum routine of Cole and Vishkin [36].
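A serial Python sketch of the whole allocate-and-distribute step follows; the loops stand in for the +-scan, the permutes, and the segmented copy, and the names are ours.

def allocate_and_distribute(v, a):
    # v holds one value per position; a[i] is the number of new elements
    # position i needs.  Figure 3.8 uses v = [v1, v2, v3] and a = [4, 1, 3].
    hpointers, total = [], 0
    for count in a:                      # +-scan(A): segment start pointers
        hpointers.append(total)
        total += count
    result = [None] * total
    for value, start, count in zip(v, hpointers, a):
        if count > 0:
            result[start] = value        # permute values to segment starts
    for i in range(1, total):            # segmented copy across segments
        if result[i] is None:
            result[i] = result[i - 1]
    return result

# allocate_and_distribute(['v1', 'v2', 'v3'], [4, 1, 3])
# returns ['v1', 'v1', 'v1', 'v1', 'v2', 'v3', 'v3', 'v3'].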

3.6.1 Example: Line Drawing

As a concrete example of how allocation is used, consider line drawing. Line drawing is the following problem: given a set of pairs of points (each point an (x, y) pair), generate the locations of all pixels that lie between one of the pairs of points. Figure 3.9 illustrates an example. The routine we discuss returns a vector of (x, y) pairs that specify the position of each pixel along the line. It generates the same set of pixels as generated by the simple digital differential analyzer (DDA) serial technique [79].

The basic idea of the routine is for each line to allocate a processor for each pixel in the line, and then for each allocated pixel to determine in parallel its final position in the grid. To allocate a processor for each pixel, each line must first determine the number of pixels in the line. This number can be calculated by taking the maximum of the x and y differences of the line's endpoints. Each line now allocates a segment of processors for its pixels and distributes its endpoints across the segment, as described earlier. We now have one processor for each pixel and one segment for each line. We can view the position of a processor in its segment as the position of a pixel in its line. Based on the endpoints of the line and the position in the line (determined with a +-scan), each pixel can determine its final (x, y) location in the grid.

Since lines can overlap, some pixels might appear in more than one segment. To actually place the points on a grid, we would need to permute a flag to a position based on the location of the point. In general, this will require the simplest form of concurrent write (one of the values gets written), since a pixel might appear in more than one line.

This routine has a step complexity of O(1) and requires as many processors as pixels in the lines. The routine has been implemented on the Connection Machine, has been extended to render solid objects by Salem, and is part of a rendering package for the Connection Machine [97].

3.6.2 Notes on Allocating

When an algorithm allocates processors, the number of processors required is usually determined dynamically and will depend on the data. To account for this, we must do one of three things: assume an infinite number of processors, put a bound on the number of elements that can be allocated, or start simulating multiple elements on each processor. The first is not practical, and the second is restricting; Section 3.7 discusses the third.

3.7 Long Vectors and Load Balancing

Up to now we have assumed that a P-RAM always has as many processors as elements in our data vectors. In this section we remove this assumption and discuss assigning multiple data elements to each processor. We also discuss how scans can be used for load balancing—keeping the number of elements assigned to each processor balanced.

Simulating multiple elements on each processor is important for two reasons. First, from a practical point of view, real machines have a fixed number of processors but problem sizes vary: we would rather not restrict ourselves to fixed, and perhaps small, problem sizes. Second, from both a practical and a theoretical point of view, by placing multiple elements on each processor we can often greatly reduce the number of processors needed by an algorithm without greatly increasing the step complexity. This makes more efficient use of the processors. Table 3.5 illustrates some examples of such algorithms.

                         Processors    Steps      Processor-Step
Halving Merge            O(n)          O(lg n)    O(n lg n)
                         O(n/lg n)     O(lg n)    O(n)
List Ranking [35]        O(n)          O(lg n)    O(n lg n)
                         O(n/lg n)     O(lg n)    O(n)
Tree Contraction [45]    O(n)          O(lg n)    O(n lg n)
                         O(n/lg n)     O(lg n)    O(n)

Table 3.5: The processor-step complexity of many algorithms can be reduced by using fewer processors and assigning many elements to each processor.

[4 7 1]       [0 5 2]       [6 4 8]       [1 9 5]
processor 0   processor 1   processor 2   processor 3

Sum = [12 7 18 15]
+-scan(Sum) = [0 12 19 37]

[0 4 11]      [12 12 17]    [19 25 29]    [37 38 47]
processor 0   processor 1   processor 2   processor 3

Figure 3.10: When operating on long vectors, each processor is assigned to a contiguous block of elements. To execute a scan, we sum within processors, execute a scan across processors, and use the result as an offset to scan within processors.

Reducing the number of processors is important because processors are likely to have a cost associated with them: we either need a larger machine or need to use processors that someone else could be using. This idea of getting better processor utilization by placing multiple elements in each processor dates back at least to [44].

To operate on vectors with more data elements than P-RAM processors—henceforth called long vectors—each processor is assigned to a contiguous block of elements (see Figure 3.10). To execute an arithmetic operation or the permute operation on a long vector, each processor loops over the element positions for which it is responsible and executes the operation. To execute a scan operation across all the elements, each processor first sums8 the elements it is assigned to (see Figure 3.10).

8Here sum means with respect to the scan operator ⊕.

F = [T F F F T T F T T T T T]
A = [a0 a1 a2  a3 a4 a5  a6 a7 a8  a9 a10 a11]
     processor 0   processor 1   processor 2   processor 3

A = [a0 a4  a5 a7  a8 a9  a10 a11]
     proc 0   proc 1   proc 2   proc 3

Figure 3.11: Load balancing. Certain marked elements are dropped and the remaining elements need to be balanced across the processors. Load balancing can be executed by packing the remaining elements into a smaller vector using an enumerate and a permute, and assigning each processor to a smaller block.

Using the results, we execute a single scan across the processors. Finally, using the result of the single scan as an offset, each processor scans the elements it is assigned to. With p processors, all the vector operations can therefore be executed on vectors of length n in

⌈n/p⌉    (3.2)

steps.
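A Python sketch of this three-phase scan, with the p processors simulated serially:

import math

def blocked_plus_scan(a, p):
    # Phase 1: each of the p processors sums its contiguous block.
    block = math.ceil(len(a) / p)
    blocks = [a[i * block:(i + 1) * block] for i in range(p)]
    sums = [sum(b) for b in blocks]
    # Phase 2: a single exclusive +-scan across the processor sums.
    offsets, running = [], 0
    for s in sums:
        offsets.append(running)
        running += s
    # Phase 3: each processor scans its block, seeded with its offset.
    result = []
    for b, offset in zip(blocks, offsets):
        running = offset
        for x in b:
            result.append(running)
            running += x
    return result

# blocked_plus_scan([4, 7, 1, 0, 5, 2, 6, 4, 8, 1, 9, 5], 4)
# returns [0, 4, 11, 12, 12, 17, 19, 25, 29, 37, 38, 47], as in Figure 3.10.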

3.7.1 Load Balancing

We now turn to the problem of load balancing the number of elements assigned to each processor. Load balancing is required in algorithms in which data elements drop out during the execution of a routine. There are three common reasons why elements might drop out. First, some elements might have completed their desired calculations; for example, in the quicksort algorithm described in Section 3.5.1, segments which are already sorted might drop out. Second, the algorithm might be subselecting elements; subselection is used in the halving merge algorithm discussed in Section 3.7.2. Third, an algorithm might be pruning some sort of search; pruning might be used in the bounding part of branch-and-bound algorithms such as the chess-playing algorithm we mentioned in Section 3.6. In all three cases, when the elements drop out, the number of elements left on each processor might be unbalanced.

When elements drop out, we can reduce the maximum number of elements a processor is responsible for by balancing the remaining elements across the processors; we call such balancing load balancing. For m remaining elements, load balancing can be implemented by enumerating the remaining elements, permuting them into a vector of length m, and assigning each processor to ⌈m/p⌉ elements of the new vector (see Figure 3.11). We call the operation of packing elements into a smaller vector the pack operation. Once a vector is packed, a processor can determine how many and which elements it is responsible for simply by knowing its processor number and m; m can be distributed to all the processors with a +-distribute.
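The pack operation is one enumerate and one permute; a serial Python sketch:

def pack(a, keep):
    # enumerate: destination index for each remaining element.
    dest, m = [], 0
    for flag in keep:
        dest.append(m)
        m += 1 if flag else 0
    result = [None] * m                  # permute into a vector of length m
    for value, flag, d in zip(a, keep, dest):
        if flag:
            result[d] = value
    return result

# With F and A from Figure 3.11, pack(A, F) returns
# [a0, a4, a5, a7, a8, a9, a10, a11].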

3.7.2 Example: Halving Merge

To illustrate the importance of simulating multiple elements on each processor and of load balancing, this section describes an algorithm for merging two sorted vectors. We call the algorithm the halving merge. When applied to vectors of length n and m (n ≥ m) on the scan model with p processors, the halving merge algorithm has a step complexity of O(n/p + lg n). When p < n/lg n, the algorithm is optimal. Although the split radix sort and the quicksort algorithms are variations of well-known algorithms translated to a new model, the merging algorithm described here is original.

The basic idea of the algorithm is to extract the odd-indexed elements from each of the two vectors by packing them into smaller vectors, to recursively merge the half-length vectors, and then to use the result of the halving merge to determine the positions of the even-indexed elements. The number of elements halves on each recursive call, and the recursion completes when one of the merge vectors contains a single element. We call the operation of taking the result of the recursive merge on the odd-indexed elements and using it to determine the position of the even-indexed elements even-insertion. We first analyze the complexity of the halving merge assuming that the even-insertion requires a constant number of scan and permute operations, and then discuss the algorithm in more detail.

The complexity of the algorithm is calculated as follows. Since the number of elements halves at each level, there are at most lg n levels, and at level i, n/2^i elements must be merged. With p processors, if we load balance, the most elements any processor is responsible for is

⌈n/(2^i p)⌉ .    (3.3)

If the even-insertion requires a constant number of calls to the primitives per element, level i has a step complexity of

O(⌈n/(2^i p)⌉) .    (3.4)

A = [1 7 10 13 15 20]
B = [3 4 9 22 23 26]

A = [1 10 15]
B = [3 9 23]

merge(A,B) = [1 3 9 10 15 23]

near-merge = [1 7 3 4 9 22 10 13 15 20 23 26]
result     = [1 3 4 7 9 10 13 15 20 22 23 26]

Figure 3.12: The halving merge involves selecting the odd-indexed elements of each vector to be merged, recursively merging these elements and then using the result to merge the even-indexed elements (even-insertion). To execute the even-insertion, we place the even- indexed elements in the merged odd-indexed elements after their original predecessor. This vector, the near-merge vector, is almost sorted. As shown in the figure, nonoverlapping blocks might need to be rotated: the first element moved to the end.

The total step complexity is therefore

O(∑_{i=0}^{lg n − 1} ⌈n/(2^i p)⌉) = O(∑_{i=0}^{lg n − 1} (n/(2^i p) + 1)) = O(n/p + lg n) .    (3.5)

The merging algorithm of Shiloach and Vishkin for the CRCW P-RAM model [44, 104] has the same complexity but is quite different; their algorithm is not recursive. When p ≤ n/lg n, both algorithms have an asymptotically optimal processor-step complexity of O(n).

We now discuss the algorithm in more detail. Picking every other element before calling the algorithm recursively can be implemented by marking the odd-indexed elements and packing them (load balancing them). After the recursive call returns, the even-insertion is executed as follows. We expand the merged odd-indexed vector by a factor of two by placing each unmerged even-indexed element directly after the element it originally followed (see Figure 3.12). We call this vector the near-merge vector. The near-merge vector has an interesting property: elements can only be out of order by single nonoverlapping rotations. An element might appear before a block of elements it belongs after. We call such an element a block-head. A near-merge vector can be converted into a true merged vector by moving the block-head to the end of the block and sliding the other elements down by one: rotating the block by one. The rotation of the blocks can be implemented with two scans and two arithmetic operations:

define fix-near-merge(near-merge){
    head-copy ← max(max-scan(near-merge), near-merge);
    result ← min(min-backscan(near-merge), head-copy)}

The even-insertion therefore requires a constant number of calls to the vector operations. To place the even-indexed elements following the odd-indexed elements after returning from the recursive call, we must somehow know the original position of each merged odd-indexed element. To specify these positions, the merge routine could, instead of returning the actual merged values, return a vector of flags: each F flag represents an element of A and each T flag represents an element of B. For example:

A = [1 10 15]
B = [3 9 23]

halving-merge(A, B) = [F T T F F T]

which corresponds to the merged values:

[1 3 9 10 15 23]

The vector of flags—henceforth the merge-flag vector—both uniquely specifies how the elements should be merged and specifies in which position each element belongs.
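To make the rotation concrete, here is a Python sketch of fix-near-merge (both scans are exclusive; the helper names are ours):

import math

def max_scan(a):
    # Exclusive max-scan.
    result, running = [], -math.inf
    for x in a:
        result.append(running)
        running = max(running, x)
    return result

def min_backscan(a):
    # Exclusive min-scan running from the last element to the first.
    result, running = [], math.inf
    for x in reversed(a):
        result.append(running)
        running = min(running, x)
    return result[::-1]

def fix_near_merge(near_merge):
    head_copy = [max(s, x)
                 for s, x in zip(max_scan(near_merge), near_merge)]
    return [min(s, h)
            for s, h in zip(min_backscan(near_merge), head_copy)]

# fix_near_merge([1, 7, 3, 4, 9, 22, 10, 13, 15, 20, 23, 26])
# returns [1, 3, 4, 7, 9, 10, 13, 15, 20, 22, 23, 26], as in Figure 3.12.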

3.7.3 Notes on Simulating Long Vectors

Simulating multiple elements on each processor can be a major inconvenience to the programmer. The programmer needs to worry about keeping track of how many elements are assigned to each processor, about reassigning the processors after load balancing, about looping over the elements when executing an arithmetic operation or a permutation, and about executing the scan operations as described in Section 3.7. It would be very convenient if the simulation could be hidden. One way to do this is to always operate on vectors, and have a simulator automatically execute the looping when the vectors are longer than the number of processors. The time complexity of an operation on a vector of length n is simply ⌈n/p⌉.

This simulation can also simplify complexity analysis. We can define two complexities based on the vector operations: the first, which we call s, is the number of operations on a vector; the second, which we call e, is the sum of the vector lengths over the algorithm's execution. For example, for the halving merge algorithm

s = O(lg n)   and   e = O(n + n/2 + n/4 + ···) = O(n) .

In Section 12.2, based on Brent’s scheduling principle [28], we show that given these two complexities, the complexity on a P-RAM is:

O(e/p + s)    (3.6)

giving the desired O(n/p + lg n) for the halving merge. Determining the complexities s and e and using equation 3.6 is easier than doing the analysis given in equations 3.3–3.5. Yes, we have just reinvented the parallel vector models!!

Chapter 4

The Scan Vector Model

This chapter brings together the contributions of the previous two chapters: the parallel vector models and the scan primitives. It formally defines the scan vector model in terms of a set of primitive instructions. The primitive instructions are broken into three classes: scalar, vector and vector-scalar instructions. The vector instructions are further broken into three subclasses: elementwise, scan, and permutation instructions. The scan vector model is introduced both to illustrate a concrete example of a parallel vector model and because its particular primitives are very useful for a wide variety of algorithms and are practical to implement. As well as defining the scan vector model, this chapter describes a set of vector operations that can be implemented with the primitive instructions of the scan vector model, such as a pack operation and an append operation (Section 4.2); a set of segmented versions of the primitive instructions, such as a segmented permute instruction and segmented scan instructions (Section 4.3); and discusses a set of other primitive instructions which are not included in the scan vector model, but might be included in other parallel vector models, such as a merge instruction and a set of combine instructions (Section 4.5).

4.1 The Scan Vector Instruction Set

This section defines a set of vector primitives. These primitives can be thought of as defining the instruction set of an abstract machine—a scan vector machine. The described instruction set is complete and is sufficient to implement all the algorithms discussed in this book. The particular instruction set was chosen for several reasons. It is small, making it easy to define, implement and understand; it can be implemented on a wide variety of architectures; it is useful for a wide variety of algorithms; and its instructions require approximately equal time on equal-length vectors.


Scalar Instructions:
  Arithmetic and Logical Instructions: +, −, and, or, =, <, ...
  Conditional Instruction: cond-jump
  Indirect-Access Instructions: move-scalar, move-vector
Vector Instructions:
  Elementwise Instructions: p+, p−, p-and, p-or, p=, p<, p-select, ...
  Permutation Instructions: permute, select-permute
  Scan Instructions: +-scan, max-scan, min-scan, or-scan, and-scan
Vector-Scalar Instructions:
  insert, extract, distribute, length

Figure 4.1: The scan vector instruction set. Input and output instructions are not included because, for the purposes of the book, we consider a scan vector machine as self-contained.

All the instructions of the scan vector model have been implemented on the Connection Machine (see Chapter 12). Figure 4.1 summarizes the instructions of the scan vector model. The instructions are categorized based on which processor of a V-RAM they use—the scalar or the vector processor. Before describing the instructions, we discuss two general issues about the instructions: types and addressing.

So far this book has mentioned nothing about the types of values that can be placed in each element of a vector. To understand why types are important, consider the following questions. Does the operation p+, which pairwise adds the elements of two equal-length vectors, add integer values or floating-point values? Can it add integer values in some locations and floating-point values in others? In the scan vector model every instruction assumes that the elements of a vector are all of the same type—we say that the vectors are homogeneous—and the instruction set includes two versions of the arithmetic instructions, one for integer values and one for floating-point values.

The book has also mentioned nothing about the types of addressing available to the instructions of a parallel vector machine. In the scan vector model, all but two of the instructions require fixed, absolute addresses. The two exceptions are two move instructions that use indirect addressing based on addresses in the scalar memory to move a vector or a scalar. The instructions that allow indirect addressing are separated out because indirect addressing presents a problem when trying to prove the replicating theorem in Chapter 10. By only allowing two instructions to use indirect addressing, we can localize this problem. In many cases it is possible to replace a subset of the instructions with another set of instructions which are equivalent—we will discuss some interesting cases where this is possible.

4.1.1 Scalar Instructions

The scalar instructions are basically the instructions of a standard serial RAM. They include a set of arithmetic and logical operations, a conditional-jump instruction, and the two indirect-addressing instructions. The conditional-jump instruction, cond-jump, jumps to a new location in the program memory if a flag in the scalar memory is true. The scalar indirect-addressing instruction, move-scalar, copies a scalar value from a source location to a destination location. The vector indirect-addressing primitive, move-vector, copies a vector from a source location to a destination location. The move-vector instruction is considered a scalar instruction because both of its arguments are scalars—the source and destination locations.

4.1.2 Elementwise Instructions

Each elementwise instruction operates on equal-length vectors, producing a result vector of the same length. Element i of the result is an elementary arithmetic or logical primitive—such as +, −, ∗, or, or not—applied to element i of each of the input vectors. For example:

A = [5 1 3 4 3 9 2 6]
B = [2 5 3 8 1 3 6 2]

A p+ B = [7 6 6 12 4 12 8 8]
A p× B = [10 5 9 32 3 27 12 12]

In addition to the standard elementary operations, the elementwise instructions include an operator p-select. Based on a boolean argument, the p-select function returns either the first or second of its other two arguments. For example:

A = [5 1 3 4 3 9 2 6]
B = [2 5 3 8 1 3 6 2]
F = [T F F F T T F T]

p-select(F, A, B) = [5 5 3 8 3 9 6 6]

As we shall see, this instruction is very useful for implementing simple conditional statements.
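The effect of p-select is easy to state in a serial Python sketch (on the vector machine all positions are processed in one step):

def p_select(flags, a, b):
    # elementwise: take the element of a where the flag is T,
    # otherwise the element of b
    return [x if f else y for f, x, y in zip(flags, a, b)]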

4.1.3 Permute Instructions

The permute instruction takes two vector arguments—a data vector and an index vector—and permutes each element in the data vector to the location specified in the index vector. For example:

A (data vector) = [a0 a1 a2 a3 a4 a5 a6 a7]
I (index vector) = [2 5 4 3 1 6 0 7]

C ← permute(A, I) = [a6 a4 a0 a3 a2 a1 a5 a7]

It is an error for more than one element to contain the same index—a permutation is a one-to-one mapping. This restriction is similar to the restriction made in the exclusive-read exclusive-write (EREW) P-RAM model, in which it is an error to write more than one value to a particular memory location at a time. The permute instruction can rearrange elements within a vector of fixed size but cannot rearrange elements into a vector of a different size. The select-permute instruction is included for this purpose. It requires two extra arguments, a default vector and a selection vector. The default vector specifies the length of the destination vector and puts default values in positions that do not receive any value. The selection vector masks out certain elements so they do not get placed anywhere in the destination vector. For example:

A (data vector) = [a0 a1 a2 a3 a4 a5 a6 a7]
D (default vector) = [d0 d1 d2 d3]
S (selection vector) = [T F F F F T F F]
I (index vector) = [1 5 4 6 2 3 7 0]

select-permute(A, I, S, D) = [d0 a0 d2 a5]

The pack operation defined in Section 4.2 along with an analogous unpack operation can be used instead of the select-permute operation as instructions for moving values between vectors of different sizes.
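A serial Python sketch of the two permutation instructions follows; the assertion mirrors the one-to-one restriction, and each loop stands in for a single parallel instruction.

def permute(data, index):
    out = [None] * len(data)
    for x, i in zip(data, index):
        assert out[i] is None, "indices must be unique"
        out[i] = x          # element moves to its specified location
    return out

def select_permute(data, index, selection, default):
    # the default vector fixes the result length and the fallback values
    out = list(default)
    for x, i, s in zip(data, index, selection):
        if s:               # masked-out elements go nowhere
            out[i] = x
    return out

For example, permute(['a', 'b', 'c'], [2, 0, 1]) returns ['b', 'c', 'a'].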

4.1.4 Scan Instructions

A scan instruction executes a scan operation on a vector. As defined in Chapter 3, the scan operation takes a binary associative operator ⊕ with identity 0, and an ordered set [a0,a1,...,an−1] of n elements, and returns the ordered set [0,a0,(a0 ⊕ a1),...,(a0 ⊕ a1 ⊕ ... ⊕ an−2)]. For example:

A = [5 1 3 4 3 9 2 6]
+-scan(A) = [0 5 6 9 13 16 25 27]
max-scan(A) = [0 5 5 5 5 5 9 9]

This book only uses +, maximum, minimum, or, and and as operators for the scan instructions, since these operators are adequate for all the algorithms discussed in the book. We henceforth refer to these scan operations as +-scan, max-scan, min-scan, or-scan and and-scan respectively. Chapter 13 discusses how all these instructions, including the floating-point versions, can be implemented with just an integer +-scan and max-scan.
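A serial Python sketch of the scan, parameterized by the operator and its identity (a scan instruction computes the same result in a single step on the vector machine):

def scan(op, identity, a):
    # exclusive scan: out[i] = a[0] op a[1] op ... op a[i-1]
    out, acc = [], identity
    for x in a:
        out.append(acc)
        acc = op(acc, x)
    return out

# scan(lambda x, y: x + y, 0, [5, 1, 3, 4, 3, 9, 2, 6])
#   returns [0, 5, 6, 9, 13, 16, 25, 27]
# scan(max, 0, [5, 1, 3, 4, 3, 9, 2, 6])
#   returns [0, 5, 5, 5, 5, 5, 9, 9]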

4.1.5 Vector-Scalar Instructions

The scan vector model includes four instructions that take both scalar and vector arguments. The extract instruction extracts a scalar value from a vector based on a scalar index. The insert instruction inserts a scalar value into a vector based on a scalar index. The distribute instruction generates a vector with a scalar copied across the whole vector. The length instruction returns the length of a vector. For example:

A = [a0 a1 a2 a3 a4 a5]
V = v
I = 3
L = 5

insert(A, I, V) = [a0 a1 a2 v a4 a5]
extract(A, I) = a3
distribute(V, L) = [v v v v v]
length(A) = 6

The vector-scalar instructions are important since they are the only instructions that can move a value between a vector and a scalar, or that generate a vector of a new length (the select-permute instruction rearranges elements between vectors of different lengths but cannot generate a vector of a new length). The insert instruction is actually redundant—it can be defined using the other instructions (see Section B.3)—but is included for symmetry.

Operation         Other Names             Reference
------------------------------------------------------------
index             index                   PARALATION LISP [96]
                  iota                    APL [59]
⊕-reduce          reduce                  APL, COMMON LISP [107]
                  vref                    PARALATION LISP
⊕-distribute
append            catenate                APL
                  concatenate             COMMON LISP
pack              pack                    Schwartz [101]
                  compress                APL
                  irregular compression   Batcher [12]
split
flag-merge
inverse-permute
enumerate         enumerate               Christman [31]
max-index

Table 4.1: A summary of the simple operations and some of the places they have appeared in the past.

4.2 Simple Operations

This section lists a set of simple operations that can all be implemented using a constant number of calls to the scan vector instructions. Many of these operations were introduced in Chapter 3, and many have appeared in various languages and other contexts (see Table 4.1). An attempt was made here to select the clearest name for each operation. The operations are put together in this section partially for reference purposes, but also to show the reader some simple uses of the scan vector instruction set. They are used extensively in this book, and Appendix B shows the implementation of each operation using the scan vector instructions. In practice, where constant factors are important, many of these simple operations might be implemented directly at a lower level rather than being built on the scan vector instructions.

index length

The index operation takes a scalar length and returns a vector of that length with sequential indices. For example:

L = 8
index(L) = [0 1 2 3 4 5 6 7]

⊕-reduce values

The reduce operations take a vector of values and combine all the elements of the vector using one of five binary operators: +, maximum, minimum, or, or and. They return a scalar value. For example:

A = [5 1 3 4 3 9 2 6]
+-reduce(A) = 33
max-reduce(A) = 9

⊕-distribute values

The ⊕-distribute operations take a vector of values, combine all the elements of the vector using one of five binary operators: +, maximum, minimum, or, or and, and distribute the result back across the vector. For example:

A = [5 1 3 4 3 9 2 6]
+-distribute(A) = [33 33 33 33 33 33 33 33]
max-distribute(A) = [9 9 9 9 9 9 9 9]

append values1 values2

The append operation takes two vectors and appends them. For example:

A = [a0 a1 a2]
B = [b0 b1]

append(A, B) = [a0 a1 a2 b0 b1]

pack values flags

The pack operation takes a vector of values and a boolean vector of flags, and packs all the elements with a T in their flag into consecutive elements, deleting elements with an F in their flag. For example:

A = [a0 a1 a2 a3 a4 a5 a6 a7]
F = [T F T F F T T T]

pack(A, F) = [a0 a2 a5 a6 a7]

split values flags

The split operation takes a vector of values and a boolean vector of flags, and packs all the elements with an F in their flag to the bottom of the vector and elements with a T in their flag to the top of the vector. For example:

A = [a0 a1 a2 a3 a4 a5 a6 a7]
F = [T F T F F T T T]

split(A, F) = [a1 a3 a4 a0 a2 a5 a6 a7]

flag-merge flags values1 values2

The flag-merge operation takes two vectors of values and a boolean vector of flags, and merges the values according to the flags. Positions with a T in their flag will get values from the values2 vector, and positions with an F in their flag will get values from the values1 vector. The ordering is maintained. For example:

A = [a0 a1 a2]
B = [b0 b1 b2 b3 b4]
F = [T F T F F T T T]

flag-merge(F, A, B) = [b0 a0 b1 a1 a2 b2 b3 b4]

inverse-permute values indices

The inverse-permute operation is similar to the permute instruction, but the indices, instead of specifying the positions to which the values are written, specify the positions from which the values are taken. The values vector must be as long as or longer than the indices vector. As with the permute instruction, all indices must be unique. For example:

A = [a0 a1 a2 a3 a4 a5 a6 a7]
I = [3 0 7 2 6]

inverse-permute(A, I) = [a3 a0 a7 a2 a6]

enumerate flags

The enumerate operation takes a vector of flags and returns unique sequential integers to the elements with a T in their flag. We do not care what is returned to elements with an F in their flag. For example:

F = [T F T F F T T T]
enumerate(F) = [0 1 1 2 2 2 3 4]

max-index values

The max-index operation takes a vector of values and returns the index of the maximum value. If several values are equal, it returns the leftmost index. For example:

A = [2 11 4 7 14 6 9 14]
max-index(A) = 4

The min-index operation uses minimum instead of maximum.
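To show how these operations reduce to the primitives, here is a serial Python sketch of enumerate and of pack built on top of it (Appendix B gives the actual implementations): enumerate is an exclusive +-scan of the flags taken as 0/1 integers, and pack select-permutes each flagged element to its enumerated position.

def enumerate_flags(flags):
    # exclusive +-scan over the flags interpreted as 0/1
    out, count = [], 0
    for f in flags:
        out.append(count)
        count += int(f)
    return out

def pack(values, flags):
    # assumes a nonempty flags vector
    idx = enumerate_flags(flags)
    total = idx[-1] + int(flags[-1])    # number of flagged elements
    out = [None] * total
    for v, f, i in zip(values, flags, idx):
        if f:
            out[i] = v                  # the select-permute step
    return out

For example, pack(['a','b','c','d','e','f','g','h'], [1,0,1,0,0,1,1,1]) returns ['a', 'c', 'f', 'g', 'h'].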

4.3 Segments and Segmented Instructions

This section formalizes the notion of a segmented vector, and introduces segmented versions of most of the scan vector instructions. Segments and the segmented versions of the instructions are used extensively in the algorithms discussed in Part II of this book. They are also used to prove the replicating theorem in Chapter 10.

Definition: A segmented vector is an ordered set S of n segments, and an ordered set A of m atomic values (a vector). Each segment si (0 ≤ i < n) of S is mapped onto a consecutive block, possibly empty, of A. The blocks are all adjacent and are in the same order as the segments to which they are assigned. The number of segments in a segmented vector is called the segment count.

The segments of a segmented vector are defined separately for each vector. For example, the third segment of one vector might be much longer than the third segment of another vector. We call any structure that describes the segmentation of a vector a segment-descriptor. Figure 4.2 shows three possible segment-descriptors: the head-flags mark the beginning of each segment, the lengths specify the length of each segment, and the head-pointers point to the beginning of each segment. The head-flags were introduced in Chapter 3 and are inadequate for describing segments in general because they cannot be used to represent empty segments (empty segments are important for the replicating theorem described in Chapter 10).

S = [[s00 s01 s02] [] [s20 s21] [s30 s31 s32 s33]]

vector elements = [s00 s01 s02 s20 s21 s30 s31 s32 s33]
head-flags = [1 0 0 1 0 1 0 0 0]

lengths = [3 0 2 4]
head-pointers = [0 3 3 5]
head-pointer-flag = [T F T T]

Figure 4.2: A segmented vector and three ways of describing the segmentation. The head-flags mark the beginning of each segment and cannot be used to represent empty segments. The lengths specify the length of each segment. The head-pointers point to the beginning of each segment—the head-pointer-flag marks the nonempty segments.

In this chapter we use the lengths vector as a segment-descriptor. Appendix B shows that any of the segment-descriptors can be generated from any other with a step complexity of O(1).

We now introduce segmented versions of all but three of the instructions of the scan vector model (the cond-jump, move-scalar and move-vector instructions do not have segmented versions). The segmented versions of the instructions execute the original instruction independently within each segment. Figure 4.3 illustrates several examples. Each vector argument in the unsegmented version of an instruction is replaced by a segmented vector in the segmented version, and each scalar argument in the unsegmented version is replaced by a vector in the segmented version. When using a segmented instruction, the segment count of each vector argument must be equal and must equal the vector length of each scalar argument. All the segmented instructions can be implemented with a constant number of calls to the unsegmented instructions (see Appendix B) and therefore have a step complexity of O(1). Likewise, the element complexity of each segmented instruction is only a constant factor greater than the element complexity of the unsegmented version.

4.4 Segmented Operations

Using the segmented versions of the instructions, we can implement segmented versions of all the operations defined in Section 4.2. Here we mention two additional operations which can be implemented with a constant number of calls to the primitives and which are useful when using segments.

C = [3 4 7]
D = [6 2 3]
C + D = [9 6 10]

Scalar Instructions

A = [[5 1] [3 4 3 9] [2 6]]
B = [[1 0] [2 0 3 1] [0 1]]

+-scan(A) = [[0 5] [0 3 7 10] [0 2]]
permute(A, B) = [[1 5] [4 9 3 3] [2 6]]

Vector Instructions

A = [[5 1] [3 4 3 9] [2 6]]
B = [[1 3] [2 0 3 1] [0 1]]
F = [[T T] [T T F T] [T T]]
D = [[6 9 2 3] [3 2 2] [7 1]]

select-permute(A, B, F, D) = [[6 5 2 1] [4 9 3] [2 6]]

Vector Instructions (different size arguments)

A = [[5 1 6] [3 3 9] [2 6]]
L = [4 1 2]
I = [0 2 1]
V = [3 4 7]

distribute(V, L) = [[3 3 3 3] [4] [7 7]]
extract(A, I) = [5 9 6]
insert(A, V, I) = [[3 1 6] [3 3 4] [2 7]]

Vector-Scalar Instructions

Figure 4.3: Examples of the segmented versions of the instructions of the scan vector model.

split-and-segment values flags

The split-and-segment operation is similar to the split operation defined in Section 4.2. It takes a vector of values and a boolean vector of flags, and returns a segmented vector with two segments: one containing the elements with an F in their flag and the other containing the elements with a T in their flag. For example:

A = [5 1 3 4 3 9 2 6]
F = [T F T F F T T T]

split-and-segment(A, F) = [[1 4 3] [5 3 9 2 6]]

rank-split ranks flags

The rank-split operation is similar to the split-and-segment operation except that the ranks argument must be a valid set of indices for the permute instruction. In addition to splitting these indices, the rank-split operation renumbers the indices so they are valid within the new segments but maintain the same order. For example:

A = [5 1 3 4 0 7 2 6]
F = [T F T F F T T T]

split-and-segment(A, F) = [[1 4 0] [5 3 7 2 6]]
rank-split(A, F) = [[1 2 0] [2 1 4 0 3]]

The rank-split operation is used to update pointers when splitting a set of pointers.
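A serial Python sketch of rank-split, assuming the ranks form a permutation of 0..n−1: each side is packed by its flag and then renumbered so the surviving ranks become 0..k−1 while keeping their relative order.

def rank_split(ranks, flags):
    f_side = [r for r, f in zip(ranks, flags) if not f]
    t_side = [r for r, f in zip(ranks, flags) if f]
    def renumber(rs):
        # map the surviving ranks, in sorted order, onto 0..len(rs)-1
        order = {r: i for i, r in enumerate(sorted(rs))}
        return [order[r] for r in rs]
    return renumber(f_side), renumber(t_side)

With the example above, rank_split([5,1,3,4,0,7,2,6], [1,0,1,0,0,1,1,1]) returns ([1, 2, 0], [2, 1, 4, 0, 3]).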

4.5 Additional Instructions

In this section we consider the effects of including other primitive instructions in a parallel vector model. The instructions we consider are a merge-mask instruction, a set of ⊕-combine instructions, a multi-extract instruction, and a set of ⊕-keyed-scan instructions. For each instruction introduced, we briefly discuss how practical the instruction is to implement on a parallel machine, and how the addition of the instruction affects the complexity of various algorithms.

4.5.1 Merge Instruction

The merge-mask instruction takes two sorted vectors of lengths l1 and l2, merges them, and returns a boolean vector of length l1 + l2 with F set in positions in which elements of the first vector belong and T set in positions in which elements of the second vector belong. For example:

A = [3 7 8]
B = [1 4 5 9 11]

C ← merge-mask(A, B) = [T F T T F F T T]
flag-merge(C, A, B) = [1 3 4 5 7 8 9 11]

We suggest the merge-mask instruction instead of a direct merge instruction because a merge operation can be built easily out of the merge-mask instruction by using the flag-merge operation (Section 4.2), but the opposite is not true. From a practical point of view, it is worth having both instructions. The merge-mask instruction can be implemented on a parallel machine using Batcher's bitonic merge [11]. This operation can be executed deterministically using a single pass of a butterfly network; for n values, the merge-mask instruction executes in O(lg n) time—no more than a permute or scan instruction. Algorithms to construct and manipulate the plane-sweep tree data structure [7, 3, 9, 94] are greatly simplified with a merge instruction. The merge instruction is also useful for manipulating sets.

4.5.2 Combine Instructions

The combine instructions are similar to the select-permute instruction but permit many-to-one mappings: the indices need not be unique. Values with equal indices are combined using a binary associative operator. As with the scan operations, we might restrict the combining operators to some simple set such as +, maximum, minimum, or, and and. The combine instructions take a default argument, whose elements get combined and which determines the length of the result vector. For example:

Index = [0 1 2 3 4 5 6 7]
Data = [5 1 3 4 3 9 2 6]
I = [2 5 3 3 1 4 3 5]
D = [0 0 0 0 0 0]

+-combine(Data, I, D) = [0 3 5 9 9 7]
max-combine(Data, I, D) = [0 3 5 4 9 6]

The combine instructions are similar to the concurrent-write (CW) instruction of the P-RAM model. The combine instructions are more powerful, however, since as well as permitting many values to be written to a single location, they allow those values to be combined. For many uses of the combine instructions, the data can be organized so that the reduce operations (based on the scan instructions) can be used instead. This is, for example, the case with the graph data structure described in Section 5.1.

One use of the combine instructions that cannot be replaced with the reduce operations is in histogramming a vector of small integers. In histogramming we are given a vector V of values and we want to determine how many times each value appears in the vector. To implement this, for small integers, we create a vector as long as the largest integer with the integer 0 in each element, create another vector of the same length as V with the integer 1 in each element, and then use the +-combine instruction with the vector V as the index, the 1 vector as the data and the 0 vector as the default. Each element of the result will contain the number of elements of V with that key. Such an implementation of integer histogramming is only efficient if the number of buckets (the largest integer) is on the same order as the length of V. If the number of buckets is significantly larger than the length of V, sorting the values so that equal values are adjacent, and reducing within each segment of values, is more efficient. The hardware implementation of the combine instructions is discussed along with the implementation of the keyed-scan instructions.
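A serial Python sketch of the histogramming use of +-combine just described; the loop stands in for the single combine instruction.

def plus_combine(data, indices, default):
    # many-to-one write in which colliding values are summed into
    # the default vector
    out = list(default)
    for v, i in zip(data, indices):
        out[i] += v
    return out

# histogram a vector V of small integers
V = [2, 0, 2, 3, 1, 2]
hist = plus_combine([1] * len(V), V, [0] * (max(V) + 1))
# hist is [1, 1, 3, 1]: one 0, one 1, three 2s, one 3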

4.5.3 Multi-Extract Instruction

The multi-extract instruction takes two vectors, an index vector and a data vector, of potentially different lengths, and for each index, extracts the corresponding element from the data vector. The indices need not be unique. For example:

Index = [0 1 2 3 4 5]

Data = [5 1 3 4 11 9]
I = [2 5 4 3 1 2 3 5]

multi-extract(Data, I) = [3 9 11 4 1 3 4 9]

The multi-extract instruction is analogous to the concurrent-read (CR) instruction of the P-RAM model. For many uses of the multi-extract instruction, the data structure can be organized so that the distribute operation (based on the scan instructions) can be used instead. This is analogous to the use of the reduce operations instead of the combine instructions mentioned in the previous section. In some applications, such as dictionary lookup using small integer keys, the distribute operation cannot be used to replace the multi-extract instruction. Dictionary lookup is the following problem: given a "dictionary" of words, each with a definition, in a vector D, and a vector V of words without definitions, find the definition for each word in V. Each word may appear multiple times in V, and we assume nothing about the ordering of the words. If each word can be indexed by its position in D, then we can use the multi-extract instruction to extract each definition from D. If the words are long, it might be possible to hash the characters of each word into a small integer and use the hashed value as a key into the dictionary. This requires that we keep multiple dictionary vectors since more than one word might hash to the same location.

The reduce operations cannot be used in histogramming, and the distribute operation cannot be used in dictionary lookup, for similar reasons. In both cases we are dynamically translating some data into pointers. In histogramming we translate integer values into pointers to locations in the histogram vector, and in dictionary lookup, we translate words into pointers to locations in the dictionary vector. Because of this translation, we know nothing about the organization of the pointers and cannot organize them into segments. The implementation of the multi-extract instruction is discussed along with the implementation of the keyed-scan instructions.
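A serial Python sketch of multi-extract, along with the dictionary-lookup use just described (the word-to-position table here is a stand-in for whatever indexing or hashing scheme is actually used):

def multi_extract(data, indices):
    # a concurrent read: the indices need not be unique
    return [data[i] for i in indices]

D = ["def-of-ant", "def-of-bee", "def-of-cat"]     # the dictionary
position = {"ant": 0, "bee": 1, "cat": 2}          # hypothetical index
V = ["bee", "ant", "bee", "cat"]                   # words to look up
defs = multi_extract(D, [position[w] for w in V])
# defs is ["def-of-bee", "def-of-ant", "def-of-bee", "def-of-cat"]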

4.5.4 Keyed-Scan Instructions

The keyed-scan instructions are similar to the combine instructions but return an additional vector: the scan vector. The scan vector contains the results of executing an independent scan operation for the elements with each index (key). We include a keyed-scan operation for the standard binary associative operators: +, max, min, or and and. For example:

Data = [1 1 1 1 1 1 1 1]
Keys = [3 2 3 2 0 3 3 2]
Default = [0 0 0 0]

+-keyed-scan(Data, Keys)
sum = [1 0 3 4]
scan = [0 0 1 1 0 2 3 2]

This is analogous to a stable version of the Fetch-and-Op instruction suggested for the Ultracomputer [49], and to the multi-prefix operation suggested by Ranade [93]. By stable, we mean that the scan for each key is executed in the vector order rather than in an arbitrary order.

The +-keyed-scan instruction can be used in a radix sort [19, 93]. For n keys each with d bits, the algorithm requires O(d/lg n) steps. The algorithm is similar to the split radix sort described in Section 3.4.1 but sorts lg n bits on each step instead of a single bit. As with the split radix sort, we start by sorting the lowest order bits of the key, and work our way up. Because each step is stable, the keys remain sorted with respect to the lower ordered bits. Each step of the sort works as follows:

define radix-sort-step(Subkeys, Keys){
  ones ← distribute(1, length(Keys));
  intrakey-offset, subkey-sum ← +-keyed-scan(ones, Subkeys);
  key-offset ← multi-extract(+-scan(subkey-sum), Subkeys);
  permute(Keys, key-offset + intrakey-offset)}

The Subkeys argument refers to the lg n extracted bits of the keys. The basic idea is for each subkey to find its offset within the elements with an equal subkey; we call this the intrakey-offset. Each subkey also determines how many elements contain a smaller subkey; we call this the key-offset. The position of the subkey in the final ordering is then simply the sum of the key-offset and the intrakey-offset.

The combine instructions, the multi-extract instruction, and the keyed-scan instructions can all be implemented using the routing algorithm suggested by Ranade [92]. This algorithm runs on a butterfly network and, for n values, requires O(lg n) time with very high probability. Each node of the butterfly only requires a constant number of buffers. The implementations of the three types of instructions differ in what is needed in each switch of the network. For the combine and keyed-scan instructions, each switch requires an arithmetic unit to execute the binary associative operator. For the multi-extract and keyed-scan instructions the switches are required to store state from a forward pass of the network, to be used during a backward pass of the network (see [106, 93]).
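A serial Python sketch of one radix-sort step, assuming the subkeys are integers in [0, nbuckets): the keyed scan supplies each element's offset among equal subkeys, an exclusive +-scan of the per-bucket totals supplies each bucket's starting position, and the final loop plays the role of the permute.

def plus_keyed_scan(data, keys, nkeys):
    # stable keyed scan: for each element, the running sum of earlier
    # elements with the same key; also returns the per-key totals
    scan, totals = [], [0] * nkeys
    for v, k in zip(data, keys):
        scan.append(totals[k])
        totals[k] += v
    return scan, totals

def radix_sort_step(subkeys, keys, nbuckets):
    ones = [1] * len(keys)
    intrakey_offset, subkey_sum = plus_keyed_scan(ones, subkeys, nbuckets)
    starts, total = [], 0        # exclusive +-scan of the bucket counts
    for c in subkey_sum:
        starts.append(total)
        total += c
    key_offset = [starts[s] for s in subkeys]    # the multi-extract
    out = [None] * len(keys)
    for k, ko, io in zip(keys, key_offset, intrakey_offset):
        out[ko + io] = k                         # the permute
    return out

For example, radix_sort_step([3,2,3,2,0,3,3,2], [3,2,3,2,0,3,3,2], 4) returns [0, 2, 2, 2, 3, 3, 3, 3], and the step is stable.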

Part II

Algorithms


Introduction: Algorithms

This part describes a broad variety of data structures and algorithms based on the scan vector model. It contains four chapters. Chapter 5, data structures, describes how trees, graphs and multidimensional arrays can be represented using vectors and illustrates how some operations on these data structures can be implemented. Many of these operations have an asymptotic step complexity that is a factor of O(lg n) less than equivalent operations on either a CRCW or EREW P-RAM model. The next three chapters—computational-geometry algorithms, graph algorithms and numerical algorithms—describe a variety of algorithms. These algorithms make extensive use of the data structures and operations defined in Chapter 5.

Chapter 5

Data Structures

This chapter describes data structures for mapping trees, graphs and multidimensional arrays onto a small number of vectors such that the instructions of the scan vector model can be applied to efficiently manipulate the objects. It also describes the implementation of several operations on the data structures, including finding the maximum value of a neighbor in a graph, finding the depth of each vertex in a tree, and summing the rows in a grid. Many of these operations have an asymptotic step complexity that is a factor of O(lg n) less than equivalent operations on either a CRCW or EREW P-RAM.1 The representations and operations described in this chapter are used extensively in Chapters 7 and 8.

5.1 Graphs

This section describes representations of graphs, both directed and undirected, and a set of useful operations that take advantage of the representations. The representations discussed in this section will be called v-graph (vector graph) representations. These representations are used in all the algorithms described in the chapter on graph algorithms (Chapter 7). The v-graph representations are based on segmented vectors in which each segment corresponds to a vertex of the graph, and each element within a segment corresponds to an edge of that vertex. By using segmented operations to operate over the edges of each vertex, the step complexity of many useful operations on graphs can be reduced. For example, having each vertex sum a value from all of its neighbors is reduced from O(lg d) in the P-RAM models to O(1) (d is the greatest degree of any vertex). The general idea of the representation is that by keeping a graph in a particular form, we can minimize the cost of operations on that graph. Section 5.2 will show that by keeping trees in a particular form, we can similarly reduce the step complexity of many tree operations on trees with n vertices by O(lg n). We now consider more precisely how undirected and directed graphs are represented and then illustrate how the representation is used to implement two example operations: neighbor reducing and distributing an excess.

1 Using the O(lg n / lg lg n) prefix sum routine of Cole and Vishkin [36] and our data structures, some of the operations can execute on a CRCW P-RAM with a step complexity which is O(lg n / lg lg n), instead of O(lg n), greater than on the scan vector model.


Graphs: Summing Neighbors, Finding the Maximum of Neighbors, Distributing an Excess, Deleting Edges, Deleting Vertices, Merging Vertices.

Trees: Depth of Each Vertex, Size of Each Subtree, Passing Flag Down Tree, Passing Flag Up Tree, Merging Trees, Deleting Vertices, Splitting Trees.

Multidimensional Arrays: Reducing Across Subdimensions, Expanding Across a Dimension, Extracting a Subplane, Appending Grids.

Table 5.1: Examples of operations on trees, graphs and multidimensional arrays. All these operations have a step complexity of O(1) in the scan vector model.

5.1.1 Vector Graph Representations

Undirected Graphs: To represent an undirected graph, we use a single segmented vector. Each segment corresponds to a vertex and each element within a segment corresponds to one of the edges of that vertex. Since each edge is incident on two vertices, it appears in two segments. The actual values kept in the elements of the segmented vector are pointers to the other end of the edge (see Figure 5.1). To include weights on the edges of the graph, we can use an additional vector that contains the weights of the edges.

Directed Graphs: To represent a directed graph, we use two segmented vectors: one for the incoming edges, and one for the outgoing edges. Each element in the outgoing edges vector is a pointer to a position in the incoming edges vector (see Figure 5.2). As with undirected graphs, we can represent a weighted directed graph with an additional weight vector.

[Figure 5.1 shows an undirected graph on vertices 1–5 with edges w1 = (1,2), w2 = (2,3), w3 = (2,5), w4 = (3,4), w5 = (3,5) and w6 = (4,5).]

Index = [0 1 2 3 4 5 6 7 8 9 10 11]
vertex = [1 2 3 4 5]
segment-descriptor = [1 3 3 2 3]
cross-pointers = [1 0 4 9 2 7 10 5 11 3 6 8]
weights = [w1 w1 w2 w3 w2 w4 w5 w4 w6 w3 w5 w6]

Figure 5.1: An example of the undirected v-graph representation. Each pointer points to the other end of the edge. So, for example, edge w3 in vertex 2 contains a pointer (in this case 9) to its other end in vertex 5. The segment-descriptor specifies the number of edges incident on each vertex. 82 CHAPTER 5. DATA STRUCTURES

Outgoing:
segment-descriptor = [2 1 1 2 0]
cross-pointers = [0 2 4 1 3 5]
weights = [w1 w2 w4 w3 w5 w6]

Incoming:
index = [0 1 2 3 4 5]
segment-descriptor = [0 2 2 1 1]
weights = [w1 w3 w2 w5 w4 w6]

Figure 5.2: An example of the v-graph representation of a directed graph.

The operations on graphs discussed in this section assume the graph is already in the v-graph representation. For the representation to be useful, either there must be an efficient routine to generate the representation from another representation, or it must be possible to do all manipulations on graphs using the v-graph representation. A graph can be converted from most other representations into the v-graph representation by creating two elements per edge (one for each end) and sorting the edges according to their vertex number. The split radix sort (Section 3.4.1) can be used since the vertex numbers are all integers less than n. The sort places all edges that belong to the same vertex in a contiguous segment. We suggest that in the scan vector model graphs always be kept in the v-graph representation. In this representation, most useful manipulations on graphs, such as deleting edges or vertices, or merging vertices (see Section 7.1), can be implemented with a step complexity of O(1).

5.1.2 Neighbor Reducing

We now consider a commonly used operation on graphs: neighbor reducing. In neighbor reducing, each vertex applies a binary associative operator over some variable in all of its neighbors. For example, it can be used to determine if any neighbor has a flag set, or to find the maximum of some variable over all neighbors. Neighbor reducing can be executed by (1) distributing the value from each vertex over its edges using the segmented distribute operation, (2) permuting these values using the cross-pointers, and (3) "summing" the values on the edges back into the vertices using a segmented reduce operation. This routine can be applied to both undirected and directed graphs. For the binary associative operators +, or, and, maximum and minimum, neighbor reducing has a step complexity of O(1). The code needed to implement a +-neighbor-reduce on an undirected graph is:

define +-neighbor-reduce(A, Graph){
  values ← seg-distribute(A, Graph.segment-descriptor);
  neighbors ← permute(values, Graph.cross-pointers);
  seg-+-reduce(neighbors, Graph.segment-descriptor)}

On a P-RAM, in general, neighbor reducing requires O(lg d) time.
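A serial Python sketch of the three steps on the v-graph vectors of Figure 5.1; each loop corresponds to one segmented instruction.

def plus_neighbor_reduce(values, segment_descriptor, cross_pointers):
    # (1) segmented distribute: copy each vertex's value onto its edges
    edge_values = []
    for v, degree in zip(values, segment_descriptor):
        edge_values.extend([v] * degree)
    # (2) permute along the cross-pointers: each edge slot now holds
    #     the value of the vertex at the other end of its edge
    neighbors = [None] * len(edge_values)
    for x, p in zip(edge_values, cross_pointers):
        neighbors[p] = x
    # (3) segmented +-reduce: sum the neighbor values per vertex
    sums, pos = [], 0
    for degree in segment_descriptor:
        sums.append(sum(neighbors[pos:pos + degree]))
        pos += degree
    return sums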

5.1.3 Distributing an Excess Across Edges

As another example of the use of the v-graph representations, consider the problem of taking a value (an excess) at each vertex of a directed graph and spreading it across the outgoing edges of that vertex. Each edge has a maximum capacity it can accept, and as long as no edge gets more than its maximum capacity, the excess can be distributed in any way. For example, if a vertex had an excess of 20, and the edges of that vertex had capacities 7, 11, 6 and 14, a valid distribution is to distribute 7, 11, 2 and 0 across the four edges. The maximum-flow algorithm described in Section 7.2, and in [47], uses such a technique. The distribution of excess can be implemented on the v-graph representation using a segmented +-scan and a segmented distribute as shown below.

define distribute-excess(Excess, Capacity, Graph){
  A ← seg-distribute(Excess, Graph.out-segment-descriptor);
  B ← seg-+-scan(Capacity, Graph.out-segment-descriptor);
  C ← p-maximum(A p− B, 0);
  p-minimum(Capacity, C)}

Here is an example for two vertices of a graph:

Capacity = [13 5 7 11 6 14]
Excess = [15 20]

A = [15 15 20 20 20 20]
B = [0 13 0 7 18 24]
C = [15 2 20 13 2 0]
Result = [13 2 7 11 2 0]
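A serial Python sketch of distribute-excess; running plays the part of the segmented +-scan of the capacities, and the min/max pair mirrors the two elementwise operations.

def distribute_excess(excess, capacity, segment_descriptor):
    result, pos = [], 0
    for e, degree in zip(excess, segment_descriptor):
        running = 0   # exclusive +-scan of capacities in this segment
        for c in capacity[pos:pos + degree]:
            result.append(min(c, max(e - running, 0)))
            running += c
        pos += degree
    return result

With the example above, distribute_excess([15, 20], [13, 5, 7, 11, 6, 14], [2, 4]) returns [13, 2, 7, 11, 2, 0].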

5.2 Trees

This section describes a flexible representation of trees, and a set of useful operations that take advantage of this representation. The representation is based on the Euler-tour order as described by Tarjan and Vishkin [112], but instead of using linked lists, it uses vectors. By using the scan instructions on the vectors, instead of the prefix operations on linked lists, the step complexity for many operations is reduced from O(lg n) to O(1).

5.2.1 Vector Tree Representation

The Euler-tour order is generated by traversing the tree counter-clockwise, starting at the root, and sequentially numbering each edge passed (see Figure 5.3). Each edge is numbered twice, once on the way down and once on the way up. The Euler-tour ordering can be pictured in a parenthetical form as shown in Figure 5.3. Each matching pair of parentheses corresponds to an edge of the tree and the vertex below it.2 Our vector representation of a tree is based on this parenthetical form. We use two vectors: one contains the index of the left parenthesis of each vertex, the left-paren vector, and the other contains the index of the right parenthesis of each vertex, the right-paren vector (see Figure 5.3).

2 Note that we include an edge out of the root.

Index = [0 1 2 3 4 5 6 7 8 9 10 11]
Vertex Number = [0 1 2 2 3 3 4 4 1 5 5 0]
parentheses = ( ( ( ) ( ) ( ) ) ( ) )

Vertex Number = [0 1 2 3 4 5]
left-paren = [0 1 2 4 6 9]
right-paren = [11 8 3 5 7 10]

Figure 5.3: An example of the v-tree representation of a tree. This representation is based on the Euler-tour ordering [112].

The vertices are kept in the preorder numbering. We will refer to this representation of a tree as the v-tree (vector tree) representation. In the remainder of this section, we discuss several operations on trees. We describe how the leaffix and rootfix operations [70] can be implemented efficiently using the v-tree representation. The leaffix and rootfix operations can, in turn, be used to determine many important properties of a tree, such as the depth of each vertex. We also describe how to delete vertices from a tree, how to split a tree, and how to join trees. For trees with n vertices, all these operations have a step complexity of O(1) and an element complexity of O(n).

Before discussing the operations, we briefly discuss how to generate the v-tree representation from other representations of trees. If a tree is given in the v-graph representation (Section 5.1), perhaps as a result of a graph operation, the tree can be converted to the v-tree representation by hooking the vertices into a linked list and executing a +-list-scan on the list.3 The list-scan assigns sequential indices to the elements. These indices are then used to permute the tree into the correct order. In the scan vector model, the list-scan operations have a step complexity of O(lg n) and an element complexity of O(n). Often O(lg n) operations are required on the tree, thereby amortizing the cost. If a tree is given in a representation in which each child has a pointer to its parent, a sort might be required to convert it into the v-tree representation—this is also true to get it into a linked list. The sort is used to place all the children who point to the same parent into a contiguous segment.

5.2.2 Leaffix and Rootfix Operations

The leaffix operation takes a binary associative operator ⊕, and a tree of values A, and returns to each vertex the result of applying the operator ⊕ to all of its descendants. Similarly, the rootfix operation returns to each vertex the result of applying the operator ⊕ to all of its ancestors. These operations were introduced by Leiserson and Maggs [70]. Using the v-tree representation, the leaffix and rootfix operations for the operators +, or and and can be implemented with a step complexity of O(1). The implementation of the + version relies on + having an inverse. The implementations of the or and and versions are based on the + version.4 Since maximum and minimum do not have an inverse, they cannot be implemented with the + version, so the implementation we describe is not applicable. Using the leaffix and rootfix computations with the operators +, or and and, we can execute the following useful operations on a tree:

• Determining the depth of each vertex. This is executed by applying a +-rootfix operation on a tree with the value 1 at every vertex.

• Determining how many descendants each vertex has. This is executed by applying a +-leaffix operation on a tree with the value 1 at every vertex.

• Passing a flag down from a set of vertices to all their descendants. This is executed with an or-rootfix operation on a tree with a flag set to T in vertices that want their descendants marked.

• Passing a flag up from a set of leaves to all their ancestors. This is executed with an or-leaffix operation on a tree with a flag set to T in vertices that want their ancestors marked.

3 Also called a prefix sum on a linked list or list ranking.
4 To implement an or with a +, we convert the boolean values T and F to the numbers 1 and 0, execute a +, and if a result is greater than 0, we convert it to T.

The basic idea of the +-rootfix is to permute each value in A to its left-paren position, permute its inverse to its right-paren position, execute a +-scan, and read the result back from the left-paren positions. Since the inverse of a value cancels out the value, the sum over any subtree is 0. Figure 5.4 illustrates an example. The +-rootfix and +-leaffix operations can be implemented as follows.

define +-rootfix(A, Tr){
  I ← distribute(0, 2 × length(Tr));
  L ← select-permute(A, Tr.left-paren, I);
  R ← select-permute(–A, Tr.right-paren, L);
  S ← +-scan(R);
  inverse-permute(S, Tr.left-paren)}

define +-leaffix(A, Tr){
  I ← distribute(0, 2 × length(Tr));
  L ← select-permute(A, Tr.left-paren, I);
  S ← +-scan(L);
  R-value ← inverse-permute(S, Tr.right-paren);
  L-value ← inverse-permute(S, Tr.left-paren);
  R-value p− L-value}
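A serial Python sketch of +-rootfix on the paren vectors of Figure 5.3; the placement loop stands in for the two select-permutes and the running sum for the +-scan.

def plus_rootfix(values, left_paren, right_paren):
    r = [0] * (2 * len(values))
    # place each value at its left paren and its negation at its
    # right paren, so every completed subtree sums to zero
    for v, lp, rp in zip(values, left_paren, right_paren):
        r[lp] = v
        r[rp] = -v
    s, total = [], 0                     # exclusive +-scan
    for x in r:
        s.append(total)
        total += x
    return [s[lp] for lp in left_paren]  # read back at the left parens

As in Figure 5.4, plus_rootfix([1]*6, [0,1,2,4,6,9], [11,8,3,5,7,10]) returns the depths [0, 1, 2, 2, 2, 1].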

The leaffix and rootfix operations for operators without an inverse, such as maximum, can be implemented with a step complexity of O(lg n) and an element complexity of O(n lg n). The method involves finding the sum (relative to the operator) over the n overlapping intervals by recursively splitting the vector into halves, finding the sum of each half, and having each interval which spans a whole half add this sum in. It turns out that this technique requires no permutations; it only requires O(lg n) scans.

5.2.3 Tree Manipulations

In this section we discuss how to delete vertices from a tree, how to split a tree into a set of trees, and how to merge a set of trees. With a total of n vertices, all these operations have a step complexity of O(1) and an element complexity of O(n).

Deleting Vertices: Given a tree, and a boolean vector which marks certain vertices for deletion, we can remove these vertices from the tree by packing the left-paren and right-paren vectors. We must, however, first renumber the positions. To renumber the positions, we permute the delete flag to both the right and left parenthesis of each vertex. We then enumerate this array, giving each element which is not being deleted a unique index. We permute these indices back to the left-paren and right-paren, and pack them. Figure 5.5 shows an example.

A = [1 1 1 1 1 1]
left-paren = [0 1 2 4 6 9]
right-paren = [11 8 3 5 7 10]

I = [0 0 0 0 0 0 0 0 0 0 0 0]
L = [1 1 1 0 1 0 1 0 0 1 0 0]
R = [1 1 1 –1 1 –1 1 –1 –1 1 –1 –1]
+-scan(R) = [0 1 2 3 2 3 2 3 2 1 2 1]

Result (Depth) = [0 1 2 2 2 1]

Figure 5.4: An example of a +-rootfix. In this example, since the vector A is all ones, the operation calculates the depth of each vertex.

F = [T F T T F T]
left-paren = [0 1 2 4 6 9]
right-paren = [11 8 3 5 7 10]

Fl = [T F T T T T F F F T T T]
enumerate(Fl) = [0 1 1 2 3 4 5 5 5 5 6 7]

new left index = [0 x 1 3 x 5]
new right index = [7 x 2 4 x 6]

new left-paren = [0 1 3 5]
new right-paren = [7 2 4 6]

Figure 5.5: An example of deleting vertices of a tree.

We can use this delete operation to delete all descendants of a set of vertices by marking the descendants with an or-rootfix and then deleting the marked vertices.

Splitting a Tree: Using a method similar to that used for deleting vertices, we can split a tree into a set of trees. Given a tree and a boolean vector which marks the roots of a set of independent subtrees, the tree split returns each branch, and what remains of the tree, as independent trees, each within its own segment. This tree-split operation can be implemented as follows. We distribute the index of the left parenthesis of each root of a branch to all vertices of the branch. Since branches are disjoint, this distribution can be executed with a +-rootfix by placing the index in the root of each branch and 0 everywhere else. Every branch vertex subtracts the left parenthesis index of its root from its left and right parentheses. This returns new indices relative to each vertex's new root rather than the old root. We can reindex the vertices that remain in the original tree using the same method as used for the delete operation. We now use the split-and-segment operation (Section 4.4) to split branch vertices from vertices that remain in the original tree. Since the branches are disjoint, the vertices of each branch will be in a contiguous set of elements of the branch segment. We can, therefore, place each branch in its own segment by determining the boundaries and generating the appropriate segment-descriptor.

Merging Trees: Using almost the inverse of the tree-split operation, we can implement a tree-merge operation. The tree-merge problem is: given a set of trees, each in its own segment, and one tree whose vertices contain pointers (segment indices) to the other trees, merge the trees into a single tree. The tree-merge is similar to the star-merge operation that will be described in Section 7.1. The principal difference is that in addition to merging the child trees into the parent trees (child vertices into the parent vertices in the case of the star-merge), we must reindex the resulting left-paren and right-paren vectors so that they correctly describe the tree. As with the star-merge, each vertex of the parent tree must open enough space to fit its child tree: this is executed with a +-scan. We now place each child tree (both its left-paren and right-paren vectors) in the opened space after its root vertex in the parent tree. To reindex, each vertex of the parent tree determines how many new child vertices will be added below it. This can be calculated with a +-leaffix. Each vertex of the parent tree also determines how many new children vertices appear to the left of it using a +-scan. The new left-paren value of each vertex is the old one plus twice the number of new vertices to the left. The new right-paren value of each vertex is the old one plus twice the number of new vertices to the left, plus twice the number of new children. To reindex the children, we distribute the left parenthesis offset of the root of each child tree across the vertices of the child tree and add this offset to the left-paren and right-paren values of each child vertex. This operation, as the others, has a step complexity of O(1).

C = | 5 1 3 4 |
    | 3 9 2 6 |

C = [5 1 3 4 3 9 2 6]

size = 4, 2

Figure 5.6: An example of the representation of a two-dimensional array. The array is mapped onto the vector in row-major order.

5.3 Multidimensional Arrays

This section describes how multidimensional arrays can be represented with vectors and how several useful array operations, including extracting subdimensions, reducing across subdimensions, and distributing across subdimensions, can be implemented using this representation. These operations have a step complexity of O(1) and an element complexity of O(n), where n is the total number of elements. These array operations are important enough that they might be implemented directly rather than being built on the primitives of the scan model [4]. A problem with the direct implementation is that the replicating theorem (Chapter 10) does not apply; a segmented version of the array scans would be required.

We represent multidimensional arrays in the scan model in the standard way—by placing all the elements of the array in a single vector, and by keeping a set of scalars which specify the size of each dimension. By using a convention on how elements of the matrix are mapped onto the vector, we can easily determine the location in the vector of a matrix element. So, for example, a two-dimensional m1 × m2 array could be mapped onto a vector of length m1·m2 in row-major order (see Figure 5.6). Based on the array ordering, we can define array versions of the scan primitives (see Figure 5.7). These can be implemented with a constant number of calls to the vector versions by simply permuting the elements before and after the scan operation, and using a segmented scan to separate each row, or each column. With the array versions of the scan primitives, we can implement array versions of all the simple operations defined in Section 4.2. The operations which are particularly useful are the reduce, distribute, extract and insert operations. Figure 5.8 shows some examples of these operations.

C = | 5 1 3 4 |
    | 3 9 2 6 |

+-scan-row(C) = | 0 5 6  9 |
                | 0 3 12 14 |

Figure 5.7: Example of a +-scan on a two-dimensional array.
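A serial Python sketch of +-scan-row on a row-major vector, matching Figure 5.7: each row gets an independent exclusive +-scan, which is exactly what a segmented scan with row-length segments computes.

def plus_scan_rows(v, m1, m2):
    out = []
    for i in range(m1):           # one segment per row
        total = 0
        for j in range(m2):
            out.append(total)
            total += v[i * m2 + j]
    return out

# plus_scan_rows([5,1,3,4,3,9,2,6], 2, 4) returns [0,5,6,9,0,3,12,14]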

C = | 5 1 3 4 |
    | 3 9 2 6 |

+-reduce-rows(C) = | 13 |
                   | 20 |

extract-row(C, 1) = [3 9 2 6]

Figure 5.8: Examples of the simple operations +-reduce and extract on a two-dimensional array.

Chapter 6

Computational-Geometry Algorithms

This chapter describes four computational-geometry algorithms: a closest-pair algorithm, two convex-hull algorithms and a line-of-sight algorithm. It first describes two techniques used by these algorithms: a binary-search technique, used in one of the convex-hull algorithms, and a k-D-tree technique, used in the closest-pair algorithm. The algorithms in this chapter are joint work with Jim Little [22]. Table 6.1 summarizes the complexities of the algorithms.

                                        Complexity
(n points)                            Step       Element
Closest Pair                          lg n       n lg n
Quickhull (n points, m hull points)   lg m, m    n, nm
√n Merge Hull                         lg n       n lg n
Line of Sight                         1          n

Table 6.1: The complexities of the algorithms discussed in this chapter. In the case of the quickhull, the two complexities are the best and worst case complexities.


6.1 Generalized Binary Search

In this section we consider the problem of n elements of a set A each executing a binary search on a binary tree T with m vertices. We first consider a simple binary search in which we start with all the elements at the root and on each step each element either goes to the right or left child. We then consider a generalized binary search in which we can insert new elements at the root on each step, and elements are permitted to remain at a vertex of the binary tree at any step. We assume that the tree T is organized in a vector using the standard heap ordering: the root value is stored at T[1] and the two children of a vertex stored at T[i] are stored at T[2i] and T[2i + 1].

We use a simple routine based on recursive splitting to implement the simple binary search on the scan vector model. We start with all the elements of A in a single segment and then split that segment using a split-and-segment based on whether an element is going to the right or to the left child of the root of T. We then recursively split within each of these segments, based on data from the next level of the tree. Since all the elements of A that are accessing the same vertex of T are in a contiguous segment, we can use a segmented distribute operation to distribute the value from each vertex of the tree to the elements that need it.

We now consider a generalization of the simple binary search. In this generalized binary search, we can insert new elements at the root on each step. This capability might be used to pipeline a search. We also permit elements to remain at a vertex of the binary tree on any step. This capability is required for the √n merge-hull algorithm discussed in Section 6.5. Unlike the simple binary search, in this generalized version there might be elements at every level of the tree on any given step. Figure 6.1 illustrates how the elements of A are stored for this generalized search and shows an example of a step of the search. To execute a step, we must somehow append the elements at a vertex v that remain with the elements being passed down from the parent of v. To append the elements, we can use a segmented version of the append operation discussed in Section 4.2. The basic idea is first to separate the elements that remain from those that go to a child into two separate vectors using two segmented pack operations. For the example of Figure 6.1 this returns:

remain = [a0] [] [a4] [] [a5 a6] [] [a7]

not-remain = [a1] [a2] [a3] [] [] [] []

We then split the elements going to a child based on whether they are going to the left or right child using a segmented split operation. This returns:

split(not-remain) = [] [a1] [a2] [] [a3] [] [] [] [] [] [] [] [] []

T = [t0 t1 t2 t3 t4 t5 t6]

A = [a0 a1] [a2] [a3 a4] [] [a5 a6] [] [a7]
F = [x r] [l] [l x] [] [x x] [] [x]

N = [a8 a9]

A = [a8 a9 a0] [] [a1 a4] [a2] [a5 a6] [a3] [a7]

Figure 6.1: An example of a step of the general binary-search technique. We keep a segment in A for each vertex of the tree T such that segment i corresponds to vertex i (remember that the tree is in standard heap order). Each segment contains all elements at the corresponding vertex. The vector F specifies where each element needs to go during a step of the search (r for right, l for left, and x for remain). The vector N contains new elements entering at the root of the search at that step.

We now shift the segments of the split vector right by one and insert the new elements on the left (segments are shifted by shifting the segment descriptor). Because of the heap order of T, this causes each segment to go to its child segment. We also truncate the segments that correspond to children of the leaf vertices. These calculations return:

children = [a8 a9] [] [a1] [a2] [] [a3] []

We now append the shifted vector (children) to the vector of elements that remained (remain) using the append operation, giving the result:

[a8 a9 a0] [] [a1 a4] [a2] [a5 a6] [a3] [a7]

The following routine can be used to execute a step of the binary search. The remain? flag specifies elements that stay at the current vertex, and the right? flag specifies elements that go to the right branch. The vector N contains the new elements to be inserted at the root.

define search-step(A, T, N, remain?, right?){
  remain ← pack(A, remain?);
  not-remain ← pack(A, not(remain?));
  children ← shift-segments-right(N, split(not-remain, right?));
  append(remain, children)}

Binary search illustrates an important difference between the general programming style used for concurrent-read P-RAM models and for vector models. In the P-RAM model, the problem is best thought of as n independent processes each executing its own search on the tree T. In the scan model, we must think of the n elements as a set and break that set into subsets according to which vertex of T each element is accessing. This might just be a philosophical point, but we believe it is important.

6.2 Building a k-D Tree

A k-D tree is a technique for splitting n points in a k-dimensional space into n regions, each with a single point [13]. It starts by splitting the space in two along one of the coordinates using a (k − 1)-dimensional hyperplane. It then recursively splits each of the subspaces in two. Figure 6.2 illustrates an example of a 2-D tree. At each step the algorithm must select which dimension to split within each subspace; the criterion for selection depends on how the tree will be used. A common criterion is to select the dimension along which the spread of points is greatest.

point              = [a b c d e f g h i j k l m n o p]
x-rank             = [0 6 15 10 7 2 4 12 14 8 13 3 1 5 11 9]
y-rank             = [13 7 4 3 15 6 11 0 9 8 14 1 10 2 5 12]
above-split-line?  = [F F T T F F F T T T T F F F T T]
rank-split(x-rank) = [0 6 7 2 4 3 1 5][7 2 4 6 0 5 3 1]
rank-split(y-rank) = [6 3 7 2 5 0 4 1][2 1 0 5 4 7 3 6]

Figure 6.2: An example of a 2-D tree. The top diagram shows the final splitting. The vectors below are generated during the first step—when splitting along the line Lx.

The k-D tree is often used as a step in other algorithms. 3-D trees are used in ray tracing algorithms for rendering solid objects. In such algorithms, objects need only be stored in the regions they penetrate and rays need only examine regions they cross. This can greatly reduce the number of objects each ray needs to examine. k-D trees are also used in many proximity algorithms such as the all-closest-pairs problem [43] or the closest-pair problem (see Section 6.3). k-D trees have also been suggested for use in some machine-learning algorithms [81].

The algorithm we describe here is a parallel version of a standard serial algorithm [88]. For n points, our algorithm has a step complexity of O(k lg n) and an element complexity of O(nk lg n). The algorithm is serially time optimal.

Our algorithm consists of one step per split. Each step has a step complexity of O(k). Before executing any steps, the algorithm sorts the set of points according to each of the k dimensions. The sorting can be executed with the quicksort or split radix sort discussed in Chapter 3 or a version of Cole's sorting algorithm [34]. Instead of keeping the actual values in sorted order for each dimension, we keep the rank of each point along each dimension. The rank of a point is the position the point would be located at if the vector were sorted. We call the vectors that hold these ranks rank-vectors—there is one rank-vector for each dimension. Figure 6.2 illustrates an example for a 2-D tree, the initial rank-vectors, and the result of the first step.

At each step of the algorithm the rank-vectors contain a segment for each subspace, and the ranks within each segment are the correct ranks for that subspace. It suffices to demonstrate that we can execute a split along any dimension and generate new ranks within the two subspaces. The algorithm is then correct by induction.

To split along a given dimension, the algorithm distributes the cut line and determines for each point whether it is above or below the line.¹ The algorithm now uses the rank-split operation defined in Section 4.2 to split each rank-vector based on whether a point is below or above the split line. The rank-split operation as defined correctly generates the rank within each subspace. Each step therefore requires O(k) calls to the instructions: some operations to determine whether each point is below or above the split, and k rank-split operations. Since there are O(lg n) steps, the whole algorithm has a step complexity of O(k lg n). Since the vectors are always of length O(n), the algorithm has an element complexity of O(nk lg n).

¹ As stated earlier, the method for choosing a cut line depends on the particular use of the k-D tree.
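As an illustration of what rank-split computes, the following is a serial Python sketch (ours, not the book's code); it renumbers the ranks on each side by sorting, whereas the scan vector model performs the same renumbering with a constant number of scan and permute operations.

    def rank_split(ranks, flags):
        # Split a rank vector by a boolean flag and renumber the ranks so
        # that each side receives the consecutive ranks 0..k-1.
        def renumber(side):
            order = sorted(range(len(side)), key=lambda i: side[i])
            new = [0] * len(side)
            for r, i in enumerate(order):
                new[i] = r
            return new
        below = [r for r, f in zip(ranks, flags) if not f]
        above = [r for r, f in zip(ranks, flags) if f]
        return renumber(below), renumber(above)

    x_rank = [0, 6, 15, 10, 7, 2, 4, 12, 14, 8, 13, 3, 1, 5, 11, 9]
    above  = [c == 'T' for c in 'FFTTFFFTTTTFFFTT']
    print(rank_split(x_rank, above))
    # -> ([0, 6, 7, 2, 4, 3, 1, 5], [7, 2, 4, 6, 0, 5, 3, 1])  (cf. Figure 6.2)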

6.3 Closest Pair

In a two-dimensional closest-pair problem, we want to find the pair of points in a plane that are closest to each other (in Euclidean distance). The algorithm we describe is a parallel version of an algorithm described by Bentley and Shamos [14]. For n points, it has a step complexity of O(lg n) and an element complexity of O(n lg n). The algorithm has an element-space complexity of O(n lg n) (lg n vectors of length n)² but can be modified to run with a step complexity of O(lg n lg lg n) and element-space complexity of O(n). Atallah and Goodrich have described an O(lg n lg lg n) time, O(n) processor algorithm to solve the closest-pair problem in the concurrent-read exclusive-write (CREW) P-RAM model [8].

Our algorithm consists of building a 2-D tree as defined in Section 6.2,³ and then merging rectangles back to the original region. Given two adjacent rectangles and their closest pairs, a merge step can determine the closest pair of the merged rectangle with a step complexity of O(1). Because of segments, we can merge many pairs of rectangles in parallel. The 2-D splitting was described in Section 6.2; the merging phase is described here.

The merging works on the same principle as described by Bentley and Shamos [14]. We first review the principle and then show how it is implemented on the scan vector model. We will denote the separation of the closest pair in a rectangle R by δR. At each merging step, we know the closest pair within each of a pair of merging rectangles A and B and want to find the closest pair in the rectangle A ∪ B. The closest pair is either the pair in A, the pair in B, or a pair with one point in A and the other in B. In the last case, the two end points must each lie within δmin = min(δA, δB) of the boundary between the two rectangles. We call this region AB′ (see Figure 6.3).

If we look at a point p in AB′, no more than 11 other points in AB′ can be less than δmin away from p. Figure 6.3 shows the tightest possible packing. If the points in AB′ are sorted along the merge line, each point can determine the minimum distance to another point in AB′ by looking at a fixed number of neighbors in the sorted order (at most 11). Once all points in AB′ have determined their closest neighbor in AB′, we take the minimum of these distances to determine δAB′ and then calculate the desired result:

    δA∪B = min(δmin, δAB′)

² These vectors are all boolean vectors.
³ In this algorithm it does not matter in what order we pick the dimensions—in fact, we could always split on the same dimension.

Figure 6.3: Merging two rectangles to determine the closest pair. Only 12 points can fit in the 2δmin × 2δmin dashed box such that no two points in either A or B are closer than δmin.

We now show how this technique is applied in the scan vector model. The merge consists of the following steps (each step has a step complexity of O(1)):

1. Derive the vector of points in A ∪ B sorted along the direction of the split line. To get this vector, we need only keep the appropriate split-flags when executing the 2-D splitting—remember that when building a k-D splitting tree we had the sorted order for all dimensions for all rectangles.

2. Determine δmin by taking the minimum of δA and δB. Distribute δmin to all points in the sorted vector of A ∪ B.

3. Pack the elements which are within δmin of the merge line into a new sorted vector AB′ using the pack operation.

4. Shift this vector to the right by one and calculate the distance from each point to its neighbor. Repeat this six times to get the six neighbors on each side.

5. Determine δAB′ by taking the minimum distance found in the previous step using a min-reduce. Take the minimum of δmin and δAB′ to get δA∪B.

The algorithm runs with a step complexity of O(lg n) because the k-D splitting has a step complexity of O(lg n) (see Section 6.2) and there are lg n merge steps, each with a step complexity of O(1). To execute the merges with a step complexity of O(1), we must store the split-flags when executing the 2-D splitting. Since there are lg n levels, this requires that we store lg n boolean vectors, each of length n. If allocating this space is a problem, we can derive the sorted vector for A ∪ B on the fly by merging the sorted vectors of A and B. This merge can be implemented as described in Section 3.7.2. If we include a merge instruction in the model, as suggested in Section 4.5.1, the closest-pair algorithm will run with a step complexity of O(lg n), an element complexity of O(n lg n), and an element-space complexity of O(n).
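The geometric content of the merge (steps 3 through 5) can be sketched serially in Python as follows. This is our illustration, under the assumption that points_sorted is the vector from step 1 and merge_x is the coordinate of the merge line; in the scan model the inner neighbor loop becomes six shift-and-elementwise-distance passes followed by a min-reduce.

    import math

    def merge_step(points_sorted, delta_min, merge_x):
        # Pack: keep only the points within delta_min of the merge line (AB').
        strip = [p for p in points_sorted if abs(p[0] - merge_x) < delta_min]
        # Each point need only examine a fixed number of neighbors (at most
        # six on each side) in the order sorted along the merge line.
        best = delta_min
        for i, p in enumerate(strip):
            for j in range(i + 1, min(i + 7, len(strip))):
                best = min(best, math.dist(p, strip[j]))
        return best            # = min(delta_min, delta_AB')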

6.4 Quickhull

In this section we describe a parallel version of the quickhull algorithm [88]. This algorithm is used to solve the planar convex hull problem: given n points in the plane, find which of these points lie on the perimeter of the smallest convex region that contains all points. The quickhull algorithm was given its name because of its similarity to the quicksort algorithm. As with quicksort, the quickhull algorithm picks a "pivot" element, splits the data based on the pivot, and is recursively applied to each of the split sets. Also as with quicksort, the pivot element is not guaranteed to split the data into equal-sized sets, and in the worst case the algorithm can require n steps.

Figure 6.4 shows an example of the quickhull algorithm. The algorithm first splits the points into two sets with a line that passes between the two x extrema—let's call these points l and r. In the scan vector model, this is executed with a few reduce and distribute operations, some elementwise arithmetic calculations, and a split operation.

The algorithm now recursively splits each of the two subspaces into two using the following step. It determines for each point p in the subspace the perpendicular distance from the point to the line lr. This can be calculated with a cross product of the lines lr and lp. The algorithm selects the furthest point from the line lr and distributes it to all other elements in the subspace—let's call this point t. It should be clear that t lies on the convex hull. Points within the triangle ltr cannot be on the convex hull and are eliminated with a pack operation. The point t is now used to further split each segment based on which of the two sides of the triangle, lt or rt, each point falls on. The algorithm is then applied to the new segments recursively, and is completed when all segments are empty.

Each step has a step complexity of O(1) and an element complexity of at most O(n); since many points might be deleted on each step, the element complexity could be significantly less. For m hull points, the algorithm runs in O(lg m) steps for well-distributed hull points, and has a worst-case running time of O(m) steps.
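A serial recursive rendering of the algorithm in Python may help; this is our sketch, with the recursion standing in for the segmented parallel splits. The cross product serves as the (scaled) perpendicular distance, so the max picks the furthest point.

    def cross(o, a, b):
        # Twice the signed area of triangle (o,a,b); > 0 if b is left of o->a.
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

    def hull_side(points, l, r):
        # Points strictly to the left of the directed line l->r form one subspace.
        side = [p for p in points if cross(l, r, p) > 0]
        if not side:
            return []
        t = max(side, key=lambda p: cross(l, r, p))   # furthest point from lr
        # points inside triangle l,t,r drop out of both recursive calls (the pack)
        return hull_side(side, l, t) + [t] + hull_side(side, t, r)

    def quickhull(points):
        l, r = min(points), max(points)               # the two x extrema
        return [l] + hull_side(points, l, r) + [r] + hull_side(points, r, l)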

Figure 6.4: An example of the quickhull algorithm. Each vector shows one step of the algorithm. Since A and P are the two x extrema, the line AP is the original split line. J and N are the furthest points in each subspace from AP and are, therefore, used for the next level of splits. The values outside the brackets are hull points that have already been found.

Figure 6.5: An example of the √n merge hull algorithm. The horizontal dashed lines show the division of the points into √n groups of √n elements each. The subhulls within each group are marked with solid lines. The upper chain is the chain ABJOP.

6.5 √n Merge Hull

In this section we describe a variation of a parallel algorithm developed by Aggarwal et al. [3] and independently by Atallah and Goodrich [8] for solving the convex-hull problem. Their algorithm is based on the concurrent-read exclusive-write (CREW) P-RAM model. We cannot use their algorithm directly because the scan vector model does not permit concurrent access to a single value, a necessary part of their algorithm. The variation we describe keeps all elements that require the same data in a contiguous segment so the data can be distributed using a distribute operation. The contribution of our version is showing how the concurrent-read operation can be replaced by the distribute operation; it involves the binary search method described in Section 6.1.

Our variation has a step complexity of O(lg n) and an element complexity of O(n lg n). The algorithm is, therefore, serially time optimal. Miller and Stout have shown an exclusive-read exclusive-write (EREW) convex-hull algorithm with an O(lg n) complexity [76]. Our algorithm, however, is simpler, and the binary search technique it uses is interesting on its own.

We begin by reviewing the CREW algorithm. The algorithm sorts the points according to their x coordinate. It slices this ordering into √n equal-sized sets of points and recursively solves the convex hull for each set. It then merges the √n subhulls (see Figure 6.5).

The sort and the merge both take O(lg n) time.⁴ The running time of the algorithm thus has the recurrence relation T(n) = T(√n) + O(lg n), which yields O(lg n) time.

⁴ The algorithm of Cole [34] can be used for sorting in the CREW model.
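To see why this recurrence solves to O(lg n), note that the logarithm halves at each level of the recursion, since lg √n = (lg n)/2. Writing c for the constant hidden in the O(lg n) term, the per-level costs form a geometric series:

    T(n) = c lg n + (c lg n)/2 + (c lg n)/4 + ··· ≤ 2c lg n = O(lg n).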

Since the elements can be sorted using existing algorithms, we concentrate on the merg- ing step. The merge is executed in two parts: one part finds the upper chain of the convex hull and the other part finds the lower chain. The upper chain is the section of the convex hull that runs across the top between the two x extrema. In the CREW algorithm the merge of each chain works as follows.

The algorithm assigns an element (a processor) for each pair of subhulls. Since there are √n subhulls, O(n) elements are sufficient. Each of these pairs independently finds the upper tangent line-segment⁵ between its two subhulls using a serial method of Overmars [82]. This method executes a binary search alternating between the two subhulls, and requires O(lg n) time. At the kth step of the binary search, an element either goes down the left branch, goes down the right branch, or remains in place.

⁵ An upper tangent line-segment of two sets of points is the line that passes through at least one point from each set so that all other points in the two sets are below the line.

Once the upper tangent lines have been found, the algorithm determines the bridges among the √n subhulls. The bridges are the upper tangent line-segments that belong to the upper chain. To find which of the upper tangent lines are bridges, each subhull finds the highest-sloped line in both directions (to a point on the right and to a point on the left). If the joint formed by these lines is convex, then both lines are bridges. If the joint formed by the lines is concave, neither is a bridge. All edges on a subhull that lie between bridges of that subhull also belong to the convex hull.

This algorithm cannot be implemented directly on the scan vector model since each pair of subhulls independently finds the upper tangent line-segments using the algorithm of Overmars and, therefore, requires concurrent reads: several pairs, while executing the binary search, will require access to the same elements. To avoid the concurrent read, we place each of the sets of √n points that belong to the same subhull in its own segment. We then use the binary-search method described in Section 6.1. This search requires O(lg n) time and involves no concurrent reads.

Our variation of the CREW algorithm runs with the same number of calls to the primitives as the original since, as with the original, the sort runs in O(lg n) time and, as shown above, the merge also runs in O(lg n) time. This variation trades the concurrent-read capability for the scan capability.

Figure 6.6: An example of a line-of-sight problem. The X marks the observation point. The numbers represent the altitude of each contour line. The elements visible from the observation point are shaded.

6.6 Line of Sight

Given a √n-by-√n grid of altitudes and an observation point on or above the surface, a line-of-sight algorithm finds all points on the grid visible from the observation point. Figure 6.6 shows an example. A line-of-sight algorithm can be applied to help determine where to locate potential eyesores. For example, when designing a building, a highway or a city dump, it is often informative to know from where the "eyesore" will be visible. The algorithm is also useful for real-time vision applications.

The algorithm we describe in this section has a step complexity of O(1) and an element complexity of O(n). The basic idea is to allocate a segment in a vector for every ray that propagates in the plane from the observation point, henceforth referred to as X, to a boundary position (see Figure 6.7). Based on some calculations on the points in each ray, we can determine whether each point is visible. The algorithm consists of four basic steps.

Figure 6.7: Example of some rays propagating from the observation point.

1. Each point p in the grid calculates the vertical angle between the horizontal plane that passes through X (the observation point) and the line from p to X. This is executed by distributing the location of X over all points and calculating the arctangent of the vertical difference over the horizontal difference.

2. The algorithm allocates a set of rays—one for each boundary grid point—and distributes the angle from each point p in the grid to all the rays it belongs to. Let us call the segmented vector that contains these rays the ray structure.

3. Following a ray from X to the boundary, a point p is visible if its angle is greater than all the angles that precede it in the ray. This can be determined for all points in all rays with a single segmented max-scan, and a comparison.

4. Visibility information is returned to the grid points. Since a grid point may belong to many rays, the visibility flags are combined using an or.

Since steps 1 and 3 should be clear, and step 4 is basically the reverse of step 2, we only describe step 2. To allocate the ray structure, the algorithm draws a line from the observation point to each boundary element using the routine discussed in Section 3.6.1. Each grid point might belong to several of these rays (points near X belong to more rays than points near the edges). To distribute the angle from a grid point to all the rays it belongs to, the algorithm creates another segmented vector structure—the copy structure. In the copy structure, the algorithm allocates a segment for each grid point p. The size of the segment for a point p is equal to the number of rays p belongs to—this can be determined from the relative positions of p, X and the boundary. Each point p distributes its angle to its segment in the copy structure using the distribute operation.

There is now a 1-to-1 mapping between positions in the copy structure and positions in the ray structure. The algorithm can calculate the permutation indices needed to execute this mapping based on the location of X. Once the angles are permuted to the ray structure, the algorithm executes step 3. To return the information back to the grid structure after step 3, the algorithm uses the same copy structure, but instead of distributing, it reduces using an or-reduce. At completion, all points visible from any ray are marked and returned.

The longest vectors required by the algorithm are the vectors of the copy and ray structures. It is not hard to show that for a √n-by-√n grid, independent of the location of X, these vectors have length 2n.
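Step 3 is the heart of the algorithm, so here is a small Python sketch of it (ours): a segmented vector is modeled as a value vector plus a vector of segment lengths, and the segmented max-scan restarts its running maximum at every segment boundary.

    def seg_max_scan(values, seg_lengths):
        # Exclusive max-scan that restarts at each segment boundary.
        out, i = [], 0
        for n in seg_lengths:
            running = float('-inf')
            for v in values[i:i + n]:
                out.append(running)
                running = max(running, v)
            i += n
        return out

    def visible(angles, seg_lengths):
        # A point on a ray is visible iff its angle exceeds every angle
        # that precedes it on the ray (the max-scan result), as in step 3.
        prior = seg_max_scan(angles, seg_lengths)
        return [a > m for a, m in zip(angles, prior)]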

Chapter 7

Graph Algorithms

This chapter describes four graph algorithms: a minimum-spanning-tree algorithm, a biconnected-components algorithm, a maximum-flow algorithm and a maximal-independent-set algorithm. The minimum-spanning-tree algorithm is new. The maximum-flow algorithm is an algorithm of Goldberg's [46]. The maximal-independent-set algorithm and the biconnected-components algorithm are small changes to parallel algorithms designed for the P-RAM models [73, 112]. All the algorithms use the v-graph representation described in Section 5.1 and make frequent use of the operations described in that section. Although most of the algorithms have an optimal step complexity, they are typically not serially time optimal. Table 7.1 summarizes the complexities of the algorithms.

                                        Complexity
Graph Algorithms                     Step        Element
(n vertices, m edges)
Minimum Spanning Tree                lg n        m lg n
Maximum Flow                         n²          n²m
Maximal Independent Set              lg n        m lg n
Biconnected Components               lg n        m lg n

Table 7.1: The complexities of the algorithms we discuss in this chapter. Some of the algorithms are probabilistic.


7.1 Minimum Spanning Tree and Connectivity

This section describes a probabilistic minimum-spanning-tree (MST) algorithm. For n vertices and m edges, it has a step complexity of O(lg n) and an element complexity of O(m lg n). The best algorithm known for the EREW P-RAM model requires O(lg² n) time [55, 98]. The best algorithm known for the CRCW P-RAM model requires O(lg n) time [10], but this algorithm requires that the generic CRCW P-RAM model be extended so that if several processors write to the same location, either the value from the lowest-numbered processor is written, or the minimum value is written.

All these algorithms are based on the algorithm of Sollin [15], which is similar to the algorithm of Borůvka [27]. The algorithms start with a forest of trees in which each tree is a single vertex. These trees are merged during the algorithm, and the algorithm terminates when a single tree remains. At each step, every tree T finds its minimum-weight edge joining a vertex in T to a vertex of a distinct tree T′. Trees connected by one of these edges merge. To reduce the forest to a single tree, O(lg n) such steps are required.

In the EREW P-RAM algorithm, each step requires Ω(lg n) time because finding the minimum edge in a tree and distributing connectivity information over merging trees might require Ω(lg n) time. In the extended CRCW P-RAM model, each step only requires constant time because each minimum edge can be found with a single write operation. In our algorithm, we keep the graph in the graph representation discussed in Section 5.1 so that we can use the min-reduce operation to find the minimum edge for each tree and the distribute operation to distribute connectivity information among merging trees with a constant number of calls to the primitives.

As with the Shiloach and Vishkin CRCW P-RAM algorithm [105], trees are selected for merging by forming stars. We define a star as a set of vertices within a graph with one of the set marked as the parent, the others marked as children, and an edge that leads from each child vertex to its parent vertex.¹ A graph might contain many stars. The star-merge operation takes a graph with a set of disjoint stars, and returns a graph with each star merged into a single vertex. Figure 7.1 shows an example of a star-merge for a graph with a single star. The minimum-spanning-tree algorithm thus consists of repeatedly finding stars and merging them.

To find stars, each vertex flips a coin to decide whether it is a child or a parent. All children check their minimum edge to see if it is connected to a parent. All these edges which are connected to a parent are marked as star edges. Since, on average, half the trees are children and half of the trees on the other end of the minimum edge of a child are parents, 1/4 of the trees are merged on each star-merge step. This random-mate technique is similar to the method discussed by Miller and Reif [75]. Since on average 1/4 of the trees are deleted on each step, O(lg n) steps are required to reduce the forest to a single tree.

¹ This definition of a star is slightly different from the definition of Shiloach and Vishkin [105].
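A Python sketch of the star-forming coin flip (ours; it works per tree rather than on the v-graph representation, where the same test is a few elementwise operations and a distribute): min_edge[v] names the tree on the other end of tree v's minimum-weight edge.

    import random

    def form_stars(min_edge):
        # One random-mate round: each tree flips a coin; a child whose
        # minimum edge leads to a parent marks that edge as a star edge.
        n = len(min_edge)
        parent = [random.random() < 0.5 for _ in range(n)]
        star_edge = [not parent[v] and parent[min_edge[v]] for v in range(n)]
        return parent, star_edge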

We now describe how a star-merge operation can be implemented in the scan model, such that for m edges, the operation has a step complexity of O(1) and an element complexity of O(m). We define a star edge as an edge that connects a child to its parent. The input to the star-merge operation is a graph in the v-graph representation, with two additional vectors: one contains flags that mark every star edge, and the other contains a flag that marks every parent.

To implement a star-merge in the v-graph, each child segment must be moved into its parent segment. The technique we use can be partitioned into four steps: (1) each parent opens enough space in its segment to fit its children, (2) the children are permuted into this space, (3) the cross-pointers vector is updated to reflect the change in structure of the graph, and (4) edges which point within a segment are deleted, thereby deleting edges that point within a tree. Figure 7.2 shows an example of the various intermediate results during a star-merge; we will refer to the figure in the following discussion.

(1) To open space in the parent segments, each child passes its length (number of edges) across its star edge to its parent, so each parent knows how much space it needs to open up for each of its children. Let us call the vector that contains the needed space of each child the needed-space vector. A 1 is placed in all the nonstar edges of this vector. We can now use a segmented +-reduce on the needed-space vector to determine the new size of each parent and a segmented distribute (Section 4.1.5) to allocate this new space for each parent. We also execute a segmented +-scan on the needed-space vector to determine the offset of each child within its parent segment and the new position of each nonstar edge of the parent. We call this vector the child-offset vector.
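A small Python sketch of step (1)'s arithmetic (ours, with hypothetical inputs): the segmented +-reduce produces each parent's new size and the segmented exclusive +-scan produces each edge's offset within its parent's new segment.

    def open_space(needed, seg_lengths):
        # Segmented exclusive +-scan (offsets) and segmented +-reduce (sizes).
        sizes, offsets, i = [], [], 0
        for n in seg_lengths:
            running = 0
            for v in needed[i:i + n]:
                offsets.append(running)
                running += v
            sizes.append(running)
            i += n
        return sizes, offsets

    # hypothetical parent with three edges: the first is a star edge whose
    # child needs 2 slots; the other two nonstar edges keep 1 slot each
    sizes, offsets = open_space([1, 2, 1, 1], [1, 3])
    # sizes -> [1, 4]; offsets -> [0, 0, 2, 3]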

(2) We now need to permute the children into the parent segments. To determine the new position of the edges in the child vertices, we permute the child-offset back to each child and distribute it across the edges of the child. Each child adds its index to this offset giving each child edge a unique address within the segment of its parent. We now permute all the edges, children and parents, to their new positions.

(3) To update the pointers, we simply pass the new position of each end of an edge to the other end of the edge.

(4) To delete edges that point within a segment, we check whether each edge points within the segment by distributing the ends of the segment, and we pack all elements that point outside each segment, deleting the elements that point within each segment. We must then update the pointers again.

Before Star-Merge

Index              = [0 1 2 3 4 5 6 7 8 9 10 11]
segment-descriptor = [1 3 3 2 3]
weights            = [w1 w1 w2 w3 w2 w4 w5 w4 w6 w3 w5 w6]
Star-Edge          = [F F T F T T F T F F F F]
Parent             = [T F T F T]

After Star-Merge

Index              = [0 1 2 3 4 5 6 7]
segment-descriptor = [1 4 3]
weights            = [w1 w1 w3 w5 w6 w3 w5 w6]
cross-pointers     = [1 0 5 6 7 2 3 4]

Figure 7.1: An example of star merging. In this example, two parents have no children (1 and 5) and the third (3) has two children (2 and 4). The second diagram shows the graph after the star is merged.

Index              = [0 1 2 3 4 5 6 7 8 9 10 11]
segment-descriptor = [1 3 3 2 3]
weights            = [w1 w1 w2 w3 w2 w4 w5 w4 w6 w3 w5 w6]
Star-Edge          = [F F T F T T F T F F F F]
Parent             = [T F T F T]
Child-Length       = [0 0 0 0 3 0 2 0 0 0 0 0]
needed-space       = [1 0 0 0 2 1 1 0 0 1 1 1]
New-Lengths        = [1 x 4 x 3]
child-offset       = [0 x x x 1 4 5 x x 7 8 9]
C                  = [x 1 1 1 x x x 5 5 x x x]
child-position     = [x 1 2 3 x x x 5 6 x x x]

Figure 7.2: The values of the various intermediate results during a star-merge operation.

7.2 Maximum Flow

Goldberg showed [46] that his maximum-flow algorithm could be improved from a step complexity of O(n² lg n) on the EREW and CRCW P-RAM models to a step complexity of O(n²) with the scan primitives. We review the algorithm here.

Goldberg's algorithm is based on a preflow push method [62] that requires no more than O(n²) push steps. At each step, every vertex holds an excess value, which corresponds to how much more flow is coming in than is leaving (at the end this must be 0), and an effective distance to the sink. Each step works as follows. All vertices try to get rid of their excess by passing it off on edges that have residual capacity (the capacity is not used to the maximum) and that are connected to vertices that have a lower effective distance. The vertices then reset their effective distance by finding the minimum effective distance of their neighbors that are reachable through a link with positive residual capacity and adding 1 to this value. The new effective distance is passed to the neighbors. This step is repeated until the network settles—the excess is zero at all vertices.

In the scan vector model, each step requires a constant number of calls to the primitive instructions. The distribution of excess is executed as described in Section 5.1.3. The new effective-distance calculation can use a min-reduce. The passing of the new effective distance requires a distribute and permute. In the EREW P-RAM model, the effective-distance calculation and the distribution of the excess require O(lg n) time. In the CREW P-RAM model, the distribution of the excess requires O(lg n / lg lg n) time.

7.3 Maximal Independent Set and Biconnectivity

Luby's maximal-independent-set (MIS) algorithm [73] can be implemented on the scan model. For n vertices and m edges the algorithm has a probabilistic step complexity of O(lg n) and an element complexity of O(m lg n). His original version for the EREW P-RAM model ran in probabilistic O(lg² n) time.

The algorithm consists of O(lg n) steps. At each step the algorithm adds some vertices to an initially empty set M, and after O(lg n) steps, with high probability, M is the MIS. In each step, every vertex not already in M and not a neighbor of a vertex in M randomly selects itself based on the inverse of its degree. Each such selected vertex then adds itself to M if none of its neighbors of greater degree are also selected. Both determining if a vertex is a neighbor of a vertex in M, and finding if a neighbor of greater degree is selected, can be executed with a neighbor reduce (see Section 5.1.2). Each step thus requires a constant number of calls to the primitive instructions.

Tarjan and Vishkin [112] show how the biconnectivity problem reduces to the spanning-tree (connected-components) problem and some tree manipulations. On the scan model, the tree manipulations can be executed using the techniques discussed in Section 5.2. The connected components can use the minimum-spanning-tree algorithm discussed earlier. The step complexity is, therefore, O(lg n) and the element complexity is O(n lg n).
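A serial Python sketch of one MIS step described above (ours): we use Luby's selection probability 1/(2·degree) as one reading of "based on the inverse of its degree", and we break degree ties by vertex number so two adjacent selected vertices never both enter M; both neighbor checks correspond to the neighbor-reduce operations.

    import random

    def mis_step(adj, in_mis, dead):
        # adj: adjacency lists; a vertex is live if it is neither in M
        # nor adjacent to M. Isolated vertices can enter M immediately.
        n = len(adj)
        deg = [len(a) for a in adj]
        live = [not in_mis[v] and not dead[v] for v in range(n)]
        sel = [live[v] and (deg[v] == 0 or random.random() < 0.5 / deg[v])
               for v in range(n)]
        for v in range(n):
            # join M unless a selected neighbor has greater degree
            if sel[v] and not any(sel[u] and (deg[u], u) > (deg[v], v)
                                  for u in adj[v]):
                in_mis[v] = True
        for v in range(n):
            # neighbors of M drop out (a neighbor reduce with or)
            if live[v] and any(in_mis[u] for u in adj[v]):
                dead[v] = True
        return in_mis, dead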

Chapter 8

Numerical Algorithms

This chapter describes five numerical algorithms: a matrix-vector multiply, a linear-systems solver, a simplex algorithm for linear programming, an outer product, and a sparse-matrix multiply. The first four algorithms assume dense matrices and use the grid versions of the scan vector instructions (see Section 5.3). The linear-systems solver and the simplex algorithms were developed jointly with Ajit Agrawal, Robert Krawitz and Cynthia Phillips [4] and have been run on the Connection Machine, giving very high performance. Table 8.1 summarizes the complexities of the algorithms.

8.1 Matrix-Vector Multiplication

To multiply a matrix by a vector, we distribute the vector across the columns of the matrix using a grid distribute, execute an elementwise multiply, and then sum across the rows using a grid +-reduce (see Figure 8.1). Since both the distribute and +-reduce require a constant number of calls to the scan vector instructions, for an n × m matrix, the step complexity is O(1) and the element complexity is nm. A matrix-vector multiply is therefore serially time optimal.

                                      Complexity
n × m Dense Matrices               Step           Element
Matrix-Vector Multiply             1              nm
Linear-Systems Solver (n = m)      n              n²
Simplex for Linear Programming     1 (per step)   nm

Table 8.1: The complexities of the algorithms discussed in this chapter.


Figure 8.1: An example of a matrix-vector multiply.
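As a concrete (serial) rendering of Figure 8.1's three phases, here is a short Python sketch of ours; each comprehension plays the role of one grid operation.

    def matvec(A, x):
        # grid distribute: one copy of x per row of A
        xs = [list(x) for _ in A]
        # elementwise p-times: multiply each row by its copy of x
        prods = [[a * b for a, b in zip(row, xr)] for row, xr in zip(A, xs)]
        # grid +-reduce: sum across the rows
        return [sum(row) for row in prods]

    print(matvec([[1, 2], [3, 4]], [5, 6]))   # -> [17, 39]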

8.2 Linear-Systems Solver

This section describes an implementation on the scan model of a linear-systems solver that uses LU-decomposition with partial pivoting and back solving. For an n × n matrix, the algorithm has a step complexity of O(n) and an element complexity of O(n³). Before describing the algorithm, we review LU-decomposition.

Starting from the basic formula Ax = b, we compute two matrices L and U such that A = LU, where L is lower triangular and U is upper triangular (hence the name LU-decomposition). Now we have LUx = b, which we rewrite as Ly = b and Ux = y. Solving for y is simple since L is triangular [90]; this step is called forward solution. Likewise, given y, solving for x is simple, as U is also triangular; this step is called back solution. Since for a given system A we may wish to solve for x for multiple b vectors, we split the linear-systems solver into two algorithms: LU-decomposition and solution. As the L and U matrices do not share any non-zero elements, we can conveniently store both matrices in a single matrix, known as the LU matrix.

When performing LU-decomposition by Gaussian elimination, we select a row and column at each step to eliminate. The process of selecting a row is called pivoting. It is needed to improve numerical stability. The partial pivoting technique selects the columns in left-to-right order, and selects the element of the column with the greatest absolute value. Rather than physically exchanging the two rows, which would yield a true lower and upper triangular matrix, we record in a separate vector which row was eliminated in which iteration. This defines a "logical" diagonal to the left and right of which reside the L and U matrices respectively. For example, the following permutation vector corresponds to the following permuted LU-decomposition, where elements labeled L are part of the L matrix, those labeled U are part of the U matrix, and those labeled D are on the diagonal:

        ⎡ 2 ⎤        ⎡ L12  D22  U32  U42 ⎤
        ⎢ 4 ⎥        ⎢ L14  L24  L34  D44 ⎥
    P = ⎢ 1 ⎥,  LU = ⎢ D11  U21  U31  U41 ⎥
        ⎣ 3 ⎦        ⎣ L13  L23  D33  U43 ⎦

The unpermuted LU-decomposition looks like this:

         ⎡ D11  U21  U31  U41 ⎤
         ⎢ L12  D22  U32  U42 ⎥
    LU = ⎢ L13  L23  D33  U43 ⎥
         ⎣ L14  L24  L34  D44 ⎦

The parallel algorithm for LU-decomposition is straightforward. For an m × m matrix, we execute m steps of Gaussian elimination. Moving left to right through the matrix, we extract the columns in sequence. We find the element with the greatest absolute value, and extract its row. We then divide the pivot column by the pivot element itself, and distribute the pivot row and column across the matrix. We replace the part of the pivot column logically below the pivot element, where it will serve as a column of L. Finally, we execute Gaussian elimination on the part of the matrix logically below and to the right of the pivot row and column. Figure 8.2 illustrates the code for the LU-decomposition.

The forward and back solution phase is also straightforward. We first solve Ly = b, divide y and U through by the logical diagonal so that the system has a unit diagonal, and then solve Ux = y. A solution step consists of extracting a column, multiplying it by the diagonal element, and subtracting from b. Note that to find the pivot element we compare the permutation vector against the loop index. Figure 8.3 illustrates the required routine.

define LU-decomposition(A, P){
    ;A is the original matrix
    ;P is the permutation vector
    for i from 1 to column-length(m) do
        selecting the rows and columns of A that have not been pivoted on
            column ← extract-column(A, i);
            pivot-row ← max-index(column);
            row ← extract-row(A, pivot-row);
            pivot-element ← extract(row, i);
            P ← insert(P, pivot-row, i);
            column ← column p÷ pivot-element;
            column-matrix ← distribute-column(column);
            row-matrix ← distribute-row(row);
            A ← insert-column(A, i, column);
        selecting processors in A that are not in the pivot row or column
            A ← A p− row-matrix p× column-matrix;}

Figure 8.2: Code for generating the LU-decomposition of a matrix. The results are returned in the matrix A and the vector P.

define solve(B, LU, P){
    for i from 1 to m do
        column ← extract-column(LU, i);
        pivot ← the element of B such that i == P;
        selecting the elements B[j] and column[j] such that P[j] > i
            B ← B p− column p× copy(pivot);
    send the logical diagonal of LU to the first column of temp;
    diag ← extract-column(temp, 1);
    ;At this point B contains y from Ly = b
    ;Divide U and B by the diagonal.
    B ← B p÷ diag;
    temp ← distribute-column(diag);
    selecting all processors in the logical upper triangle of LU
        LU ← LU p÷ diag;
    for i from m downto 1 do
        column ← extract-column(LU, i);
        pivot ← the element of B such that i == P;
        selecting the elements B[j] and column[j] such that P[j] ≤ i
            B ← B p− column p× copy(pivot);
    unpermute B}

Figure 8.3: Code for solving LUx = B, given LU, B, and the permutation vector P for LU.

8.3 Simplex

We now describe a scan model implementation of a simplex method for solving linear programming problems. The standard form of a linear programming problem is as follows:

    minimize cᵀx  such that  Ax = b and x ≥ 0

where c is an m2-dimensional integer objective function vector, A is an m1 × m2 integer constraint matrix, b is an m1-vector of integers, and x is a real m2-vector of unknowns. Generally we have m1 < m2. A vector x such that Ax = b and x ≥ 0 is called a feasible solution because it satisfies all the constraints.

If a linear program has an optimal solution, we can always find one such that m2 − m1 of the entries in vector x are equal to 0 [83]. Such vectors, called basic feasible solutions, correspond geometrically to corners of the convex (m2 − m1)-dimensional polytope of all feasible solutions. The simplex method for solving linear programs starts at a basic feasible solution and pivots to a new basic feasible solution which improves the objective function. Algebraically, we increase one of the zero-valued nonbasic variables (the entering variable) until one of the non-zero basic variables becomes zero. In the Dantzig method of pivoting, the entering variable is the one that will decrease the objective function by the most (per unit increase in the variable).

All the information necessary to perform the pivoting is kept in a tableau where the objective function and all nonbasic variables are represented in terms of the basic variables. At the start, the tableau is the constraint matrix A augmented by the column vector b and the row vector c. We then use Gaussian elimination to eliminate all columns corresponding to basic variables. We do not represent these columns in the tableau since they always form an identity matrix. Since all the nonbasic variables are zero at the basic feasible solution represented by the tableau, the b vector represents values of the basic variables and objective function at the basic feasible solution, and the objective function vector c represents the unit change in the objective function per unit increase in each nonbasic variable. To form the tableau for which one basic variable is replaced by a nonbasic variable then involves one step of Gaussian elimination.

The tableau representation is used primarily for linear programs for which the constraint matrix A is dense. In practice many linear programs from real applications are sparse. Implementations on sequential computers use special techniques to avoid computing on the whole matrix when only a few elements are non-zero. When the matrix is dense, however, the tableau method (or the revised simplex method, which is more numerically stable) can be practical.

The implementation of simplex with Dantzig's rule is fairly straightforward. We first find the index of the most negative coefficient of the objective function; pivoting on this variable will give us the most rapid improvement in the solution per unit increase in the entering nonbasic variable. If there are no negative coefficients, then we cannot make any improvement, and thus have finished successfully. We then extract the indexed column, and select the processors corresponding to real constraints, i.e., only positive coefficients correspond to basic variables that decrease as the entering variable increases. If there are no positive coefficients in the column, then the system is unbounded; we can increase the value of the variable without limit and never violate a constraint. To find the limiting constraint, we divide the b vector elementwise by the positive elements of the pivot column and find the index of the smallest ratio. The two indices define the pivot element. We then perform a Gaussian elimination step. We must update the pivot row and column separately since we do not represent the full tableau. Figure 8.4 illustrates the necessary code.

define simplex(A, B, C){
    ;tableau A ((m1 + 1) × (m2 + 1))
    ;constraint vector B
    ;objective function vector C
    repeat forever:
        pivot-column-index ← index of element in C with most negative value;
        (if no negative processor, exit simplex successfully)
        pivot-column ← extract-column(A, pivot-column-index);
        selecting processors in pivot-column with positive values
            (if no positive processor, exit simplex unsuccessfully)
            ratio ← B p÷ pivot-column;
            pivot-row-index ← index of element in ratio with smallest value;
        pivot-row ← extract-row(A, pivot-row-index);
        ;update pivot row and column
        pivot-element ← A[pivot-column-index][pivot-row-index];
        pivot-row ← pivot-row p÷ copy(pivot-element);
        row-matrix ← distribute-row(pivot-row);
        column-matrix ← distribute-column(pivot-column);
        ;update the constraint vector and objective function
        ;on their own, even though they get updated later
        value ← A[m, n];
        B ← B p− pivot-column p× copy(value);
        C ← C p− pivot-column p× value;
        ;update the tableau
        A ← insert-row(A, pivot-row, pivot-row-index);
        selecting processors of A that are not part of the pivot row or column
            A ← A p− row-matrix p× column-matrix;}

Figure 8.4: Code for the simplex method for solving linear programming problems.

8.4 Outer Product

For a vector v1 of length m1, a vector v2 of length m2, and a function f of two arguments, the outer product returns a two-dimensional array of size m1 × m2 with f applied to every pairing of the elements of v1 and v2. The outer product is used as a substep in many applications. Some examples include sparse-matrix multiplication (see Section 8.5), and finding all closest pairs in a high-dimensional space. We can implement an outer product trivially with a distribute-row and a distribute-column and then executing the function f (see Figure 8.5). If the function f has a step complexity of O(1), the outer product has a step complexity of O(1) and an element complexity of O(m1 × m2). The necessary code for a ×-outer-product is:

define ×-outer-product(A, B){
    Ad ← distribute-column(A, length(B));
    Bd ← distribute-row(B, length(A));
    Ad p× Bd}

A = [1 2 3]
B = [1 2]

Ad     = 1 2 3
         1 2 3

Bd     = 1 1 1
         2 2 2

Result = 1 2 3
         2 4 6

Figure 8.5: An example of a ×-outer-product.
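In Python, the same three lines might read as follows (our sketch); the nested comprehension plays the roles of the two distributes and the elementwise application of f.

    def outer_product(f, A, B):
        # one row per element of B, one column per element of A
        return [[f(a, b) for a in A] for b in B]

    print(outer_product(lambda a, b: a * b, [1, 2, 3], [1, 2]))
    # -> [[1, 2, 3], [2, 4, 6]], matching the example of Figure 8.5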

8.5 Sparse-Matrix Multiplication

We now consider sparse-matrix multiplication of two matrices A and B. We assume that A is ordered in row-major order such that each row is in its own segment (we use empty segments for empty rows). We assume that B is ordered in column-major order. We now execute an outer product on each row-column pair; so, for example, each element of row 0 of matrix A is matched with each element of column 0 of matrix B.





We use a segmented version of the outer-product routine described in Section 8.4. The algorithm now runs a sort using the row number appended to the column number as the key. This places elements with the same row and column destination next to each other. In the final step, elements with the same row and column destination are added using a +-reduce. Because of the previous sort, the result is in row-major order.

For two n × n matrices with m1 and m2 nonzero elements, respectively, the step complexity of the algorithm is the O(lg n) required for the sorts (we can use the radix sort described in Section 3.4.1). If e elements are produced by the outer-product step, the element complexity is O((e + n) lg n).
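A serial Python sketch of the whole pipeline (ours): matrices are held as (row, column, value) triples; a dictionary stands in for the segment alignment of matching rows and columns, and sorted runs of equal (i, j) keys are summed in place of the +-reduce.

    from itertools import groupby

    def sparse_matmul(A, B):
        # A: nonzeros as (i, k, value); B: nonzeros as (k, j, value)
        b_by_k = {}
        for k, j, v in B:
            b_by_k.setdefault(k, []).append((j, v))
        # the (segmented) outer product of each matching row/column pair
        pairs = [((i, j), va * vb)
                 for i, k, va in A
                 for j, vb in b_by_k.get(k, [])]
        pairs.sort(key=lambda p: p[0])          # row appended to column as key
        return [(ij, sum(v for _, v in run))    # the +-reduce over equal keys
                for ij, run in groupby(pairs, key=lambda p: p[0])]

    A = [(0, 0, 2.0), (0, 1, 1.0), (1, 1, 3.0)]
    B = [(0, 0, 4.0), (1, 0, 1.0), (1, 1, 5.0)]
    print(sparse_matmul(A, B))
    # -> [((0, 0), 9.0), ((0, 1), 5.0), ((1, 0), 3.0), ((1, 1), 15.0)]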

Part III

Languages and Compilers


Introduction: Languages and Compilers

This part demonstrates that some very-high-level languages are naturally mapped onto the parallel vector models. It contains three chapters. Chapter 9, collection-oriented languages, describes a class of high-level languages that map very well onto parallel vector models. These languages include SETL (the Set Theoretic Language), which has been touted as a very high-level programming language that can greatly simplify coding over conventional von Neumann languages. Chapter 10, flattening nested parallelism, describes techniques for taking nested parallel routines and mapping them onto a flat homogeneous machine. These techniques are very useful for compiling the high-level collection-oriented languages. These first two chapters are independent and can be read in any order. Chapter 11, a compiler for Paralation Lisp, describes a compiler for a particular collection-oriented language, PARALATION LISP, that compiles into the scan vector instruction set. The chapter brings together ideas from the previous two chapters. Appendix C, Paralation-Lisp code, illustrates many algorithms and a relatively large application, Quinlan's ID3 learning algorithm [91], written in PARALATION LISP. All of these algorithms use nested parallelism.

Chapter 9

Collection-Oriented Languages

This chapter presents a framework for comparing a class of high-level computer languages we call collection-oriented languages. From a programming standpoint, these languages are excellent languages for cleanly and concisely implementing a broad set of applications. From an implementation standpoint, these languages can be implemented on a broad variety of machines—in particular, as concerns this book, they are naturally implemented on the parallel vector models.

Collection-oriented languages, such as APL [59, 60], APL2 [58], CM-LISP [108], NIAL [77, 99], PARALATION LISP [96], SETL [100], and SQL [39], are centered around data structures which represent collections of elements, and operations for manipulating the collections. Conventional von Neumann languages, such as Pascal and FORTRAN, also support collections, usually in the form of an array data type. Collection-oriented languages, however, differ because their operators focus on manipulating collections as a whole. For example, multiplying all elements of a collection by some constant, sorting the elements of a collection, or summing the elements of a collection are basic collection-oriented operations.

Because the operations of collection-oriented languages operate on whole collections of elements, these languages tend to be much higher level than conventional languages. As argued by the proponents of the collection-oriented languages, the high-level constructs can lead to code that is clearer, easier to write and more concise.¹ Also, the high-level description allows code to be mapped onto a much broader set of architectures—fewer details about the implementation are included in the code, giving more flexibility to a compiler. On the other hand, collection-oriented languages have historically been hard to compile to run as efficiently on serial machines as conventional languages, and have therefore never gained great acceptance.

Collection-oriented languages are interesting in the context of this book because most collections are naturally mapped onto vectors and most of the collection-oriented operations are efficiently implemented with vector operations. This chapter compares the various collection-oriented languages based on the types of collections they support and on the types of operations they provide. Table 9.1 illustrates the framework on which we base this comparison. The chapter also illustrates how many of the collection types can be mapped onto simple homogeneous vectors (homogeneous vectors of atomic values).

¹ Unfortunately, APL has given much of the computer community the impression that collection-oriented languages are difficult to understand. This is not the fault of the semantics of the language, but rather of the cryptic syntax.


Collections
  Types of Elements    Atomic, Structure, Collection
  Homogeneity          Homogeneous, Heterogeneous
  Ordering             Unordered, Linear-Ordered, Grid-Ordered, Key-Ordered

Collection Operations
  Apply-to-Each          Implicit, Explicit
  Collection Operations  Sorting, Summing, Intersecting, ...

Table 9.1: The various dimensions along which to compare the various collection-oriented languages.

9.1 Collections

A collection is a group of elements viewed as a whole.² This section categorizes such collections according to three criteria: the types of elements allowed in a collection, whether the elements of a collection must be of the same type, and the ordering of the elements in a collection. Table 9.2 shows how the collections of various languages fit into the categories.

Types of Elements: We consider three classes of types that can be allowed in a collection: atomic types, structure types, and collection types. If a language allows collection types as elements, we say that the language supports nested collections. The languages SETL, CM-LISP, and PARALATION LISP all support nested collections, and although APL does not, some of its follow-ups, such as APL2 and NIAL, do. Nested collections are very important for applications with structures more complicated than simple vectors or arrays. If a language only allows atomic types as elements, we say it only supports simple collections. APL and SQL both only support simple collections. If a language allows structure types as elements, we say it supports structure collections. A structure type is a structure with a fixed number of slots, and the only operations allowed on it are removing and inserting elements. We separate structure types from collections because the collection operations cannot be applied to structure types. Relational database languages, such as SQL, and the two Lisp-based languages, CM-LISP and PARALATION LISP, all support structure collections.

² The term collection, in this context, is taken from a paper by Trenchard More [78].

                   Relation Among Element Types
Language           heterogeneous   homogeneous
CM-LISP                  √
SETL                     √
PARALATION LISP          √
APL                                      √
APL2                     √
NIAL                     √
SQL                                      √

                   Types of Element
Language           atomic   structure   collection
CM-LISP              √          √           √
SETL                 √                      √
PARALATION LISP      √          √           √
APL                  √
APL2                 √                      √
NIAL                 √                      √
SQL                  √          √

                   Ordering
Language           unordered   linear-ordered   grid-ordered   key-ordered
CM-LISP            xet         xector           —              xapping
SETL               set         tuple            —              map
PARALATION LISP    —           field            —              —
APL                —           —                array          —
APL2               —           —                array          —
NIAL               —           —                array          —
SQL                —           —                —              relation

Table 9.2: The collection types available in various collection-oriented languages. A linear-ordered collection is a vector. In CM-LISP, xets and xectors are considered special cases of xappings. In SETL, maps are considered special cases of sets: a map is a set of two-element tuples.

Homogeneity: A homogeneous collection is a collection whose elements are all of the same type, and a heterogeneous collection is a collection whose elements are of different types. Exactly what constitutes homogeneity depends on how type is defined. For example, if the length of a collection is included in its type, then a collection with subcollections of different sizes is heterogeneous. In this chapter we do not include the length in the type. So, for example, a collection of collections of integers, such as

[[3 4 2] [2 8] [4 5 7 1]],

is considered homogeneous regardless of the size of the subcollections. All the collection-oriented languages mentioned other than APL and SQL support heterogeneous collections. By only allowing homogeneous collections, we can give every collection a concise type name (for example, collection of collection of integers) and therefore greatly simplify a strongly typed language.

Ordering: We consider four classes of orderings of the elements of a collection: unordered, linear-ordered, grid-ordered and key-ordered. Unordered collections are basically sets, although for sets we also need to guarantee that no two elements contain the same value. The linear-ordered collections are vectors. The grid-ordered collections are dense arrays—each element is associated with a tuple of integers, one for each dimension of the array. Such grid-ordered arrays form the heart of APL. The key-ordered collections are ordered by a set of keys. A key-ordered collection can be thought of as a mapping in which the keys are the domain and the values the range. The key-ordered collections are clearly the most general of the orderings since the keys can themselves be consecutive integers (making a linear-ordered collection), or tuples of integers (making a grid-ordered collection). Table 9.2 shows the orderings that various different languages support along with the names that the languages give those orderings.

9.2 Collection Operations

All the collection-oriented languages supply a set of collection-oriented operations that operate on collections as a whole, and an apply-to-each form that applies a function, operation, or body to each element of a collection.

Collection-Oriented Operations: Many of the collection-oriented operations can be associated with a particular ordering. Operations for unordered collections typically include set intersection, set union, and set difference. Operations for linear-ordered collections include sorting and appending. Operations for grid-ordered collections typically include extractions and insertions of subdimensions, and reductions along subdimensions. Operations on key-ordered collections typically include some form of join operation [33]. Certain operations, such as summing the elements of a collection, can be applied to all classes of collections regardless of their ordering. Figure 9.1 shows how two example operations—summing, and removing elements less than 5—are expressed in various collection-oriented languages.

Apply-to-Each: All the collection-oriented languages supply some way of applying a function, operation or body to each element of a collection. Such an apply-to-each form is similar to a loop construct in a conventional von Neumann language, but is not necessarily iterated sequentially. Also, in the conventional von Neumann languages, users must supply a start and an end count for a loop, whereas in the collection apply-to-each form, users need only specify that they want to apply the operation to each element of a collection. Figure 9.2 illustrates how apply-to-each is expressed in various collection-oriented languages.

The operations that can be used by an apply-to-each form will depend on the types of collections a language supports. In languages which only support simple collections, such as APL and SQL, all the operations used by an apply-to-each form must be scalar. In languages that support nested collections, the operations can typically be any operation, including a collection-oriented operation or another apply-to-each. Since these languages can nest the apply-to-each forms, we say that the languages support nested collection operations.³ Figure 9.3 illustrates examples of a collection-oriented operation inside an apply-to-each form. Nested collection operations are very useful for expressing a wide variety of algorithms—they allow a user to apply any existing function over a collection of elements.

Languages that do not support nested collection operations typically use an implicit apply-to-each form—when a scalar operation is placed between two collections, it implicitly implies that the operation should be applied to each position of the vectors (see Figure 9.1 for the languages APL and SQL).

³ Supporting nested collection operations and nested collections go hand in hand—supporting one without the other makes little sense.

A = [4 5 2 11 7]
⇓

APL:             +/A
SETL:            +/A
CM-LISP:         (β+A)
PARALATION LISP: (vref '+ A)
SQL:             select sum(A) from R
⇓

29

Summing the elements of a collection.

A = [4 5 2 11 7]
⇓

APL:  (A ≥ 5)/A
SETL: [A | A ≥ 5]

CM-LISP: (remove-if '(lambda (a) (a < 5)) A)

PARALATION LISP: (<- a :by (choose (elwise ((a A)) (>= a 5))))
                 or (remove-if '(lambda (a) (a < 5)) A)
SQL:             select A from R where A ≥ 5
⇓

[5 11 7]

Removing all numbers less than 5 from a collection.

Figure 9.1: Example operations for various collection-oriented languages. In SQL, R is the relation from which A is taken. 9.2. COLLECTION OPERATIONS 137

A = [4 5 2 11 7]
⇓

APL:             A ← A+2
SETL:            (for a in A) a := a+2; end;
CM-LISP:         (αsetq A (α+ A α2)) or α(setq •A (+ •A 2))
PARALATION LISP: (elwise ((a A)) (setq a (+ a 2)))
SQL:             update R set A = 2+A

⇓
A = [6 7 4 13 9]

Figure 9.2: An example of the apply-to-each form for various collection-oriented languages. In the example, we are adding 2 to each element of a collection A.

A = [[3 4 2] [2 8] [4 5 7 1]]
⇓

SETL:            (for a in A) +/a; end;
CM-LISP:         (αβ+A)
PARALATION LISP: (elwise ((a A)) (vref '+ a))

⇓
[9 10 17]

Figure 9.3: An example of an operation on nested collections. This operation sums the elements of each subcollection of A.

Languages that support nested collection operations, however, must supply an explicit apply-to-each form; otherwise the apply-to-each can be ambiguous. For example, consider the statement:

A = [[3 4 2] [2 8] [4 5 7 1]]
B = [[7 5 3] [1 9] [6 5 8 2]]

A foo B

Does this statement apply foo to each of the three subcollections, or does it apply it to each of the nine integers at the bottom level?

9.3 Mapping Collections onto Vectors

This section describes how the various collection types can be mapped onto the vectors supplied by a parallel vector model (simple homogeneous vectors). It describes how to map nested and structure collections, heterogeneous collections, and the four orderings of collections onto vectors. The PARALATION LISP compiler discussed in Chapter 11 will use many of the representations discussed in this section.

Figure 9.4: Representing a homogeneous vector of structures with a record of simple homogeneous vectors.

Types of Elements: Segments can be used to represent collections with other collections as elements (nested collections). To represent a collection of depth two, we can use a value vector along with its segment descriptor—each segment is one of the subcollections (see Section 4.3). For example, the nested collection

[[2 3] [] [4 0 1]]

can be represented with the two vectors:

[2 3 4 0 1]
[2 0 3]

The first contains the data and the second is the segment descriptor. To represent collections of depth three we can use a segment descriptor to describe the segmentation of the original segment descriptor. For example, the nested collection

[[[2 6] [9 7 1]] [] [[3 8 7 2] [] [5]]]

can be represented with the three vectors:

[2 6 9 7 1 3 8 7 2 5]
[2 3 4 0 1]
[2 0 3]

The first is the actual values, the second is the segment descriptor of the first, and the third is the segment descriptor of the second. Using this technique, a collection of depth d can be represented with d vectors.
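To make the representation concrete, here is a minimal serial sketch in COMMON LISP, using lists in place of vectors; flatten-one-level is a hypothetical helper, not part of the scan vector instruction set. It peels one level of nesting off a collection, returning the concatenated subcollections together with their segment descriptor; applying it d−1 times yields the d vectors described above.

;; Peel one level of nesting: return the flattened data and the
;; segment descriptor (the length of each subcollection).
(defun flatten-one-level (nested)
  (values (apply #'append nested)      ; concatenated subcollections
          (mapcar #'length nested)))   ; segment descriptor

;; (flatten-one-level '((2 3) () (4 0 1)))
;;   => (2 3 4 0 1) and (2 0 3)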

Figure 9.5: Representing a simple heterogeneous vector with a record of simple homogeneous vectors.

A structure of simple vectors can be used to represent structured homogeneous collections. For example, we can represent a collection of structures, each with three slots containing an atomic type, with a single structure with three slots, each containing a vector (see Figure 9.4). We call this manipulation of taking the structures from inside a collection and dragging them out drag-out.
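In modern terms this is an array-of-structures to structure-of-arrays conversion. A hedged COMMON LISP sketch follows; the structure names point and point-vectors, and the use of lists for vectors, are illustrative assumptions:

(defstruct point x y z)              ; each slot holds an atomic value
(defstruct point-vectors xs ys zs)   ; each slot holds a vector of values

;; drag-out: pull the three slots out of a collection of structures,
;; producing one structure of three simple homogeneous vectors.
(defun drag-out (points)
  (make-point-vectors
   :xs (mapcar #'point-x points)
   :ys (mapcar #'point-y points)
   :zs (mapcar #'point-z points)))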

Homogeneity: To represent heterogeneous collections, we can use a value vector for every type that appears in the collection, and a tag vector which specifies which elements are of which types. All the elements of a given type are packed into one of the value vectors. Figure 9.5 illustrates an example. For unordered collections we do not need a tag vector: the tag vector is only used to specify the ordering. For a collection with d different types, this representation requires d + 1 vectors, and when operating on the collection, the run-time code must loop over all the vectors. This is unlikely to be a problem in practice since typically only a few types are placed in a single collection.
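A serial sketch of this representation in COMMON LISP, for a collection containing only integers and booleans; the function names and the keyword-tagged result list are assumptions for illustration:

(defun tag-of (x) (if (integerp x) 'int 'bool))

;; Build the tag vector plus one packed value vector per type.
(defun split-by-type (elements)
  (list :tags  (mapcar #'tag-of elements)
        :ints  (remove-if-not #'integerp elements)
        :bools (remove-if #'integerp elements)))

;; (split-by-type '(7 t 4 nil 9))
;; => (:TAGS (INT BOOL INT BOOL INT) :INTS (7 4 9) :BOOLS (T NIL))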

Ordering: A linear-ordered collection can be mapped directly onto a vector. An unordered collection can also be mapped directly onto a vector if the implementation hides the ordering of the elements. The implementation can take advantage of the unspecified ordering to improve the performance of certain operations. For example, the implementation can keep the elements sorted so that union or intersection operations on sets can be executed with a merge instead of a sort. A grid-ordered collection can be represented by keeping the length along each dimension and having a convention of how grids are laid out on a vector (using the representation described in Section 5.3). A key-ordered collection can be represented using a pair of vectors, one for the keys (the domain) and one for the values (the range). As with the unordered collections, the keys can be kept in sorted order to improve the performance of certain operations.

Figure 9.6: An example of the representation of a nested structure. In this example we have a collection with three elements. Each element is a structure with three slots; the first two contain atomic values, and the third contains a collection.

The mappings for nested and structure collections, for heterogeneous collections, and for the various orderings of collections can be combined. Figure 9.6 illustrates an example.

Chapter 10

Flattening Nested Parallelism

Chapter 3 introduced the notion of segments and illustrated how segments can be used to execute the scan operation over multiple sets of data independently and in parallel. Chapter 4 then defined segmented versions of almost all the scan vector instructions, and the algorithms in Part II made extensive use of these instructions. This chapter extracts the important aspect of segments—the application of an operation (perhaps itself parallel) over multiple sets of data independently and in parallel—and separates this aspect from its implementation using segments.

The chapter introduces the notions of code replicating—translating a parallel routine that executes an operation on a single set of data into another routine that executes the same operation over many sets of data in parallel—and flattening nested parallelism—translating a nested parallel construct into a flat parallel construct. Code replicating (henceforth replicating) is used in flattening nested parallelism. Flattening nested parallelism is important for the implementation of high-level languages since it allows a compiler to translate the high-level description of nested operations into a low-level implementation on a flat real machine.

The chapter also proves an important theorem: the access-restricted replicating theorem. This theorem states that any scan-vector routine that abides by some restrictions on conditional control can be replicated, and the theorem places important bounds on the element and step complexities of the replicated routine. In particular, it states that the step complexity of a replicated routine is within a constant factor of the original routine applied to the slowest of the inputs, and that the element complexity is within a constant factor of the sum of the element complexities of the original routine applied to each input. The proof of the access-restricted replicating theorem is based on segments and is constructive—it actually defines a replicating translator.


Figure 10.1: Replicating translates a parallel routine that executes an operation on one set of data into another parallel routine that executes the same operation on many sets of data independently but in parallel. The replicated routine runs on the same class of machine as the original routine and effectively simulates multiple machines.

This replicating translator generates efficient replicated code and is simple to implement; it forms the heart of the PARALATION LISP compiler discussed in Chapter 11. An important part of the translator is a technique called branch-packing. The chapter also introduces the notion of contained programs.

It is important to realize that although the proof of the access-restricted replicating theorem is based on segments, the theorem itself is independent of segments. The theorem, as well as the notions of replicating and flattening nested parallelism, abstracts the important benefits of segments away from the actual implementation. The results of this chapter are particularly important for models with serial control, since on such models we cannot simply spawn off separate control streams for each subproblem; they are also important for models with parallel control, since the techniques can greatly simplify the scheduling of subproblems.

10.1 Nested Parallelism and Replicating

Replicating is a technique for automatically translating a parallel routine that executes an operation on a set of data into another routine that executes the same operation over many sets of data in parallel. Both the original routine and the replicated routine run on the same machine model.

Application                Outer Parallelism               Inner Parallelism
Sum of Neighbors in Graph  For each vertex of graph        Sum neighbors of vertex
Figure Drawing             For each line of image          Draw pixels of line
Compiling                  For each procedure of program   Compile code of procedure
Text Formatting            For each paragraph of document  Justify lines of paragraph

Table 10.1: Routines with nested parallelism. Both the inner part and the outer part can be executed in parallel.

For example, replicating can translate a parallel sorting algorithm that sorts a vector of keys into a routine that sorts many vectors of keys in parallel—both can run on a V-RAM (see Figure 10.1). The replicated algorithm effectively simulates multiple machines and takes advantage of two sources of parallelism: the sort of each vector runs in parallel, and the sorts on the different vectors run in parallel.

Why is replicating useful? Replicating can be used to help implement any application or algorithm with nested parallelism—code with an inner parallel routine nested inside an outer parallel routine (Table 10.1 lists several examples). To implement a nested parallel application given a parallel routine for the inner parallel part, we can use a replicator to generate another routine that can execute the inner parallel part over multiple sets of data in parallel. We can then use the replicated routine to execute the nested parallelism. For example, to implement a figure-drawing algorithm that is based on drawing multiple lines, a user could define a parallel routine that draws a single line given its endpoints and then apply it to multiple lines:

For each endpoint-pair in endpoint-pairs
    draw-line(endpoint-pair);

The replicator would automatically replicate the draw-line routine so that it could run in parallel over all the lines. Note that the line lengths could vary greatly, so just allocating each line to a separate processor could be horribly inefficient—this would also take advantage of only the outer parallelism. The whole process of taking a nested parallel routine and mapping it onto a flat parallel model so as to take advantage of both the inner and outer parallelism is called flattening nested parallelism. Replicating is the main step of flattening nested parallelism.

Algorithm                    Outer Parallelism                     Inner Parallelism
Quicksort (Section 3.5.1)    For lesser and greater elements       Quicksort
Mergesort                    For first and second half             Mergesort
Closest Pair (Section 6.3)   For each half of space                Closest Pair
Strassen's Matrix Multiply   For each of the 7 submultiplications  Strassen's Matrix Multiply
Fast Fourier Transform       For two sets of interleaved points    Fast Fourier Transform

Table 10.2: Some divide-and-conquer algorithms.

Nested parallelism also appears in almost all divide-and-conquer algorithms (Table 10.2 shows several examples). A divide-and-conquer algorithm breaks the original data into smaller parts, applies the same algorithm on the subparts, and then merges the results. If the subparts can be executed in parallel, as is usually the case, the application of the algorithm to the subparts involves nested parallelism. As an example, consider quicksort (see the sketch below). Quicksort splits a vector into the elements less than, equal to, and greater than the pivot, and forks off a separate quicksort for each set (see Figure 10.2). By flattening the nested parallelism of the quicksort, the quicksort becomes a fully parallel routine. If we only took advantage of parallelism within each quicksort (finding which elements are less than, equal to, and greater than the pivot) and serially looped over the separate invocations of quicksort, we would have a large amount of parallelism at the root but almost none at the leaves of the quicksort. On the other hand, if we only took advantage of parallelism of the separate invocations of quicksort but implemented the internals serially, we would have a large amount of parallelism at the leaves but almost none at the root.

Replicating is interesting from both programming and theoretical standpoints. From a programming standpoint, it permits the separation of the high-level notion of nested parallelism from its low-level implementation (typically on a flat uniform machine), and supplies an automatic translation from the former to the latter. From a theoretical standpoint, the notion of replicating introduces some important questions, such as: Can a parallel routine programmed for some machine always be replicated without a significant increase in complexity? Are any restrictions necessary on the code to allow an efficient replication?

Both the programming and theoretical aspects of replicating are examined in the remainder of this chapter.
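For reference, here is the structure of quicksort written as serial COMMON LISP; this is a sketch only, and the use of lists and the first element as pivot are assumptions. The two recursive calls are independent of one another: under replication they would execute together, giving the outer parallelism, while the element tests within each call give the inner parallelism.

(defun quicksort (v)
  (if (<= (length v) 1)
      v
      (let* ((pivot   (first v))
             (lesser  (remove-if-not (lambda (x) (< x pivot)) v))
             (equals  (remove-if-not (lambda (x) (= x pivot)) v))
             (greater (remove-if-not (lambda (x) (> x pivot)) v)))
        (append (quicksort lesser)    ; these two recursive calls are
                equals                ; independent: the outer parallelism
                (quicksort greater)))))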

Figure 10.2: In quicksort, after packing the elements less than the pivot to the bottom, the elements equal to the pivot to the middle, and the elements greater than the pivot to the top, we need to fork off a quicksort for each set. This involves splitting the original vector into three groups and using a replicated version of quicksort to simulate two separate quicksorts. (The equal elements don't need to be sorted, and the test for completion has been excluded from the diagram.)

The approach will be to state a theorem of what can be replicated along with complexity bounds, and then to build a replicator that constructively proves the theorem. This replicator is very practical, and a variation is used in the PARALATION LISP compiler described in Chapter 11. The chapter will show that restrictions are necessary to replicate algorithms implemented on the scan vector instruction set.

10.2 The Replicating Theorem

Ideally, we would like to prove that any program written in the scan vector model can be replicated without unduly increasing the step and element complexities. Unfortunately, because of difficulties with conditionals and indirect addressing, we cannot demonstrate this for the general case. We will, however, prove that we can go quite far. We shall prove the following theorem:

Theorem 1 (Access-Restricted Replicating Theorem) For any access-restricted routine R with a set of inputs A, there exists an access-restricted routine Rs with n sets of inputs A0, A1, ..., An−1 that executes R independently on each set; and the routine Rs obeys the following relations:

e(Rs, {A0, A1, ..., An−1}) < k1 ∑_{i=0}^{n−1} e(R, Ai)    (10.1)

and

s(Rs, {A0, ..., An−1}) < k2 max(s(R, A0), ..., s(R, An−1))    (10.2)

for some constants k1 and k2.

For a routine R with input A, s(R, A) denotes the step complexity and e(R, A) denotes the element complexity. An access-restricted routine is a routine implemented on the scan vector model with some restrictions on the use of indirect addressing and conditional branching; these restrictions are described in Section 10.3. The access-restricted replicating theorem basically states that for every routine there is a replicated version that can execute the original routine in parallel over many inputs (the number of steps required is within a constant factor of the original routine applied to the slowest of the inputs) without wasting more than a constant factor of computation (the number of scalar operations executed, the element complexity, is at most a constant factor greater than if the original routine serially looped over the inputs).

For example, consider the halving merge described in Section 3.7.2, which for two vectors of length n has an element complexity of O(n) and a step complexity of O(lg n). The access-restricted replicating theorem states that another access-restricted routine exists that can merge m pairs of vectors, each of length ni (0 ≤ i < m), with an element complexity of

O(∑_{i=0}^{m−1} ni)

and a step complexity of

O(max_{i=0}^{m−1} lg ni).

If the vectors were all of equal length m, the complexities would be O(m²) and O(lg m) respectively.

m−1 ( ) , O ∑ ni i=0 andastepcomplexityof m−1 O(maxlgni) . i=0 If the vectors where all of equal length m, the complexities would be O(m2) and O(lgm) respectively. The remainder of this chapter is dedicated to proving this theorem. It first defines access-restricted code and discuss how general it is. It then proves a weaker version of the access-restricted replicating theorem, the access-fixed replicating theorem, which is only useful for straight-line code with no indirect addressing. It finally shows how this can be 10.3. ACCESS-RESTRICTED CODE 149 extended to include the restricted forms of conditional branching and indirect addressing allowed by access-restricted code, thus proving the theorem. The proof is constructive and in practice the constants in equations 10.1 and 10.2 are quite small. Although the proof of the theorem is based on segments, the statement of the theorem says nothing about segments. There might be other methods to prove the theory.

10.3 Access-Restricted Code

This section defines access-restricted code. Every algorithm discussed in this book obeys the restrictions of access-restricted code.

Definition: A scan vector routine (a routine implemented using the scan vector instruction set) R is access restricted if it obeys the following restrictions:

1. All conditional control must be of the fork-and-join type, and each fork must have a constant number of branches. That is to say, all the branches of a conditional fork (where control takes one of a fixed number of paths based on some data) must join at a single point.

2. It is contained. This is defined in Section 10.6.2.

3. The only forms of memory access permitted are (a) absolute, (b) from or to a stack, or (c) absolute relative to a stack.

4. All stack pointers must be at the same position at the join point of all branches of a conditional fork-and-join. This means that (pushes − pops) for each stack must be equal across all branches.

We now consider the applicability of code with these restrictions.

Restriction 1: What sorts of conditionals is the fork-and-join useful for? It can clearly be used for if-then-else statements and for case statements. It can also be used for while and do loops by using a two-branch fork-and-join with a recursive call in one of the branches to a routine that implements the body of the loop, and with the other branch empty (see the sketch below). It cannot, however, be used for a general goto statement.
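As a hedged illustration of the while-loop encoding, here is a self-contained COMMON LISP sketch; the countdown loop body is an assumption chosen only to make the example runnable:

;; A while loop as a two-branch fork-and-join: one branch makes a
;; recursive call that performs the loop body, the other is empty.
(defun while-positive (n)
  (if (plusp n)                 ; the fork: two branches
      (while-positive (- n 1))  ; branch 1: loop body, then recurse
      n))                       ; branch 2: empty; both join on return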

Restriction 2: This restriction will be discussed in detail in Section 10.6.2.

Restrictions 3 and 4: Restricting memory access to absolute addressing and stacks would be a severe drawback for the serial RAM model since it would greatly complicate pointer-based algorithms (the machine could no longer access elements via a pointer). It is not, however, a severe problem on a parallel vector machine because pointer-based operations are executed within a vector with the permute instruction.

The restrictions on access-restricted code are sufficient but not necessary; more lenient and general conditions are possible. The restrictions, however, permit an easy proof and an efficient implementation, and are sufficient for most algorithms. Section 10.6.3 shows that every algorithm in this book is contained. Finding the most lenient set of restrictions is an interesting topic for future research.

10.4 Access-Fixed Replicating Theorem

This section proves a special case of the access-restricted replicating theorem, the access-fixed replicating theorem. This theorem is valid for any code that does not contain conditionals or indirect addressing (any straight-line code with only absolute addressing). The access-fixed replicating theorem is applicable to all the simple operations defined in Section 4.2, and to some of the example algorithms described in Chapter 3, such as the line-drawing routine, the line-of-sight routine and the split radix sort (assuming a fixed number of bits in each key). The proof of the access-fixed replicating theorem is constructive and is a major part of the proof of the access-restricted replicating theorem.

Definition: A scan vector routine R is access fixed if it does not contain any of the three scan vector instructions cond-jump, move-scalar, or move-vector.

Since an access-fixed routine is just straight-line code (contains no conditional jumps), the step complexity of an access-fixed routine is fixed regardless of the input. We can therefore denote the step complexity of an access-fixed routine R as s(R) instead of s(R,A).

Theorem 2 (Access-Fixed Replicating Theorem) For any access-fixed routine R, with a set of inputs A, an access-fixed routine Rs exists with n sets of inputs A0, A1, ..., An−1 that executes R independently on each set; and the routine Rs obeys the following complexity relations:

e(Rs, {A0, A1, ..., An−1}) < k1 ∑_{i=0}^{n−1} e(R, Ai)    (10.3)

and

s(Rs) < k2 s(R)    (10.4)

for some constants k1 and k2.

Proof: The proof is constructive. We will demonstrate how to build a simulating machine out of a V-RAM that can simulate n simulated machines.

Figure 10.3: A simulating machine simulating three simulated machines. The scalar memory is expanded into vectors of length 3. The vector memory is expanded into a vector memory of variable width for the values and a vector memory of width 3 for the segment descriptors.

Figure 10.3 illustrates the simulating machine. We make the following changes to the original routine R to have it act as the simulating machine. We replace all calls to the access-fixed instructions with their segmented versions, replace all vectors with segmented vectors, and replace all scalars with vectors. In our simulating machine the original scalar memory becomes a vector memory, and the original vector memory becomes two vector memories, one for the value vectors and one for the segment descriptors (see Figure 10.3). We must somehow accommodate these extra vector memories in our simulating machine. To do this we triple all of the addresses in the following way. Each vector memory location Mv[j] used in R is replaced by the locations Mv[3j] and Mv[3j+1]—one for the value vector and one for the segment descriptor. Each scalar memory location Ms[j] is replaced by the vector memory location Mv[3j+2] in Rs. This memory address translation relies on the addresses being absolute.

Now to execute Rs over the inputs Ai, we place each input Ai in segment i of a segmented vector and assign segment i of each vector to the ith simulated machine. Since all the segmented instructions work independently within each segment, each input gets operated on independently. Equation 10.4 holds because each segmented instruction requires at most a constant number of calls to the unsegmented instructions. To show that equation 10.3 holds, we need only show that it holds for each step, since the number of steps is within a constant factor of the original. As stated in Section 4.3, the element complexity of a call to a segmented instruction is at most a constant factor greater than the total length of the segmented vector. Since the data for each input Ai is in a contiguous segment, the total length of any segmented vector is the sum of the lengths of the values in each segment. □
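The address-tripling scheme is easy to state as code; this COMMON LISP fragment is a sketch of the translation only, and the function names are illustrative, not instructions of the model:

;; Vector location j in R maps to a pair of locations in Rs
;; (value vector and segment descriptor); scalar location j
;; maps to a third location holding a plain vector.
(defun value-address  (j) (* 3 j))         ; values of Mv[j]
(defun segdes-address (j) (+ (* 3 j) 1))   ; segment descriptor of Mv[j]
(defun scalar-address (j) (+ (* 3 j) 2))   ; vector replacing Ms[j]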

To prove the access-restricted replicating theorem, the above proof must be extended to include the restricted forms of indirect addressing and conditional jumps. This is done in the following two sections.

10.5 Indirect Addressing

This section illustrates how segmented versions of stack operations can be implemented, but it first exposes the problem with general indirect addressing. In a scan vector machine, the indirect addressing primitives (move-scalar and move-vector) are used to access a location of the scalar or vector memory based on a pointer (index) in the scalar memory. The simulation described in the previous section (Section 10.4) relies on the fact that each simulated machine is accessing the same memory location at the same time, while with indirect addressing each simulated machine might access different memory locations at the same time (see Figure 10.4). To simulate the indirect addressing, the simulating machine must therefore serially loop over all the locations that appear in different simulated machines.

Figure 10.4: A simulating machine executing indirect addressing for 3 simulated machines. Each simulated machine is accessing a different vector memory location specified in the scalar memory (the values 1, 3 and 2 are addresses). The segment each machine is accessing is shaded. Accessing the three locations will require at least three steps.

This would clearly ruin our bounds on the step complexity. When indirectly accessing a vector, the simulating machine would also have to pack and merge the appropriate segments from each vector (patch them together). This would ruin our bounds on the element complexity.

Unlike general indirect addressing, with stacks all machines always access the same location, since they push and pop at the same time and therefore contain the same stack pointer. We can include any fixed number of stacks as long as all machines are pushing or popping from the same stack. We can also include absolute access relative to a stack because, again, all machines would access the same location. The one problem with a stack is its interaction with conditionals. If two branches of a conditional fork-and-join include different numbers of pushes to and pops from a stack, then the stack pointers will be different at the join point. This is the motivation for restriction 4 of access-restricted code. So with restrictions 3 and 4 the simulating machine still works with constant slow-down.

10.6 Conditional Control

This section illustrates how a segmented version of the conditional fork-and-join can be implemented, but, again, it first exposes the problem with conditional control in general. The problem with implementing a segmented version of a conditional control instruction is that different simulated machines will want to jump to different instructions, while the simulating machine can only execute a single stream of control. This is the standard problem of implementing functions with conditionals on SIMD computers: even though all processors might start executing the same code, because of data-dependent branches they might all end up taking different paths and executing different code.

This section illustrates how, by placing restrictions 1 and 2 of access-restricted code, we can prove the desired complexity bounds for the access-restricted replicating theorem. The element complexity is bounded by branch-packing: packing all the variables used in a branch of a conditional fork-and-join, so that only the segments of the simulated machines that take that branch remain (other segments are deleted). The step complexity is bounded by only considering "contained programs" (restriction 2). We first discuss branch-packing and then discuss containment.

10.6.1 Branch-Packing

The following discussion is based on the if-then-else statement; all other conditional fork-and-joins can be implemented by nesting the if-then-else statement. The basic idea of branch-packing is that all variables accessed in a branch are packed so they only include values for the simulated machines that take that branch. This means that each branch does no more element computations than it must. The if-then-else statement is implemented with branch-packing as follows (the description refers to the example code shown in Figure 10.5).

• Evaluate the condition, returning a T or F to each simulated machine. We call machines with a T, T-machines, and machines with an F, F-machines. In the example, we have three T-machines and one F-machine.

• If there are any T-machines, go through all the variables used in the then-expression and pack the segments corresponding to T-machines, deleting the other segments. If there are no T-machines, skip this step. If the variable is a scalar, we use the pack operation defined in Section 4.2 for the packing. If the variable is a vector, we need to use a pack that packs whole segments. We call this version a pack-segments operation; its implementation is given in Section B.2.4.

if Flag
then +-reduce(B);
else A − 5;

input:
  Flag = [T T F T]
  A = [3 4 9 6]
  B = [[4 6 1] [2 1] [7 2 8 3] [6]]

then-expression:
  pack-segments(B, Flag) = [[4 6 1] [2 1] [6]]
  +-reduce(B) = [11 3 6]

else-expression:
  pack(A, not(Flag)) = [9]
  A − 5 = [4]

result:
  flag-merge(then, else, Flag) = [11 3 4 6]

Figure 10.5: An example of an if-then-else statement and the values during its execution. The statement is being executed on four simulated machines. Three of the simulated machines execute the then part and one of them executes the else part.

Both have a step complexity of O(1) and an element complexity proportional to the total length of the original vector. In the example, B is packed using the pack-segments operation.

• Execute the then-expression using the packed variables. In the example, this consists of a +-reduce.

• Perform the analogous operations on the else-expression: go through all the variables used in the else-expression and pack the segments corresponding to F-machines, deleting the other segments. In the example, A is packed using the pack operation.

• Execute the else-expression using the packed variables. In the example, this subtracts 5 in the single F-machine.

• Merge the results of the then-expression and else-expression. If a result is a scalar, we use the flag-merge operation defined in Section 4.2. If the result is a vector, we use a version that merges whole segments. We call this version the flag-merge-segments operation; its implementation is given in Section B.2.4. Both have a step complexity of O(1) and an element complexity proportional to the total length of the result vector. In the example, we use the flag-merge operation to merge the results.
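A serial COMMON LISP sketch of the two scalar operations used above, assuming list-based vectors; this is a toy model of the semantics, not the segmented implementation from Section 4.2:

;; pack keeps the values whose flag is T.
(defun pack (values flags)
  (loop for v in values
        for f in flags
        when f collect v))

;; flag-merge weaves the packed then- and else-results back
;; together under the original flags.
(defun flag-merge (then-vals else-vals flags)
  (loop for f in flags
        collect (if f (pop then-vals) (pop else-vals))))

;; From Figure 10.5 (scalar A):
;; (pack '(3 4 9 6) '(nil nil t nil))        => (9)
;; (flag-merge '(11 3 6) '(4) '(t t nil t))  => (11 3 4 6)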

Since the vectors used in both the then-expression and the else-expression are packed, both expressions only execute their body on the simulated machines that follow their branch. The element complexity for each branch therefore only includes the sum of the element complexities for the simulated machines taking that branch, thus enforcing equation 10.3. The only problem might be the cost of the packing itself. We, however, make the following argument that the cost of packing causes at most a constant-factor slow-down. We claim that any vector only needs to be packed a constant number of times. This is because the only way of getting more than constant-depth nesting of conditionals is through recursive calls, and then the packing happens in the recursive call itself when passing in the variables. With a fixed-depth nesting, the number of branches is constant and therefore the number of packs is constant. If we assume that every vector was created at some point, a reasonable assumption, the element complexity of packing is no more than a constant factor greater than the element complexity of creating the vector.

10.6.2 Contained Programs

This section defines a class of programs, contained programs, and for this class proves the second part of the replicating theorem (10.2), the bound on the step complexity. All algorithms in this book are contained and can therefore be replicated efficiently (see Section 10.6.3). To discuss containment, we must first have a notion of the path taken by a program p. For this purpose, we define an evaluation tree.

Definition: An evaluation tree T(p,a) is the ordered tree generated by applying the pro- gram p to the data a. By this we mean that we draw a tree with each vertex being a function and the children being all the functions it calls, in the order it calls them. The leaves of such a tree are primitive operations.

Figure 10.6 shows examples of evaluation trees for three programs, each for two sets of data. For the definition of containment, we only consider deterministic programs. For probabilistic algorithms, any random bits must be passed in as arguments to the programs. Any deterministic program has a many-to-one correspondence from its inputs to evaluation trees—every possible input has exactly one evaluation tree, but one evaluation tree could correspond to many different inputs.

We now return to the notion of containment. If we look at the examples in Figure 10.6, we notice that in the first program, gloop, every evaluation tree must always be either a subtree or a supertree of all other evaluation trees. As the argument n gets larger, the evaluation tree grows and subsumes all evaluation trees for lesser n. This is not true of the second program, hloop, since some trees will have an f function while others will have the g function, depending on the outcome of (p v). It is also not true of the last example, floop, since the evaluation tree of (floop 2 1 v) is neither a subtree nor a supertree of that of (floop 1 2 v). We say that gloop is contained while the other two are not. We now define containment formally, but must first define what it means for one invocation of a program to be contained within another.

Definition: We define p(a) to mean the application of the program p to the data a, with the following relation:

p(a) ⊑ p(b)    (10.5)

if and only if T(p,a) is a subtree of, or the same tree as, T(p,b). The relation ⊑ is called the containment relation, and p(a) ⊑ p(b) can be read: the application of p to a is contained within the application of p to b.

The application p(a) is homomorphic with T(p,a) and is used to abstract away from the trees.

(defun gloop (n v)
  (if (zerop n)
      v
      (gloop (- n 1) (g v))))

(defun hloop (n v)
  (if (zerop n)
      v
      (hloop (- n 1)
             (if (p v)
                 (f v)
                 (g v)))))

(defun floop (k l v)
  (if (zerop k)
      (gloop l v)
      (floop (- k 1) l (f v))))

Figure 10.6: A set of COMMON LISP programs along with their evaluation trees, shown in the original figure for (gloop 2 v), (gloop 3 v), (hloop 2 v1), (hloop 2 v2), (floop 2 1 v), and (floop 1 2 v). The function gloop is contained while the other two functions are not. An evaluation tree can be generated by tracing the program p running on the data a.

Lemma 1 For every program p the relation ⊑ places a partial order on its inputs.

Proof: For a relation over a set to satisfy a partial order, the relation must be reflexive and transitive.

1. (reflexive) p(a) ⊑ p(a) is true since p is deterministic and will generate the same tree for the same data.

2. (transitive) p(a) ⊑ p(b) and p(b) ⊑ p(c) implies p(a) ⊑ p(c) since tree containment is transitive: if tree 1 is a subtree of tree 2, and tree 2 is a subtree of tree 3, then tree 1 must be a subtree of tree 3. □

We now define the containment of a program.

Definition: A program p is contained whenever the relation ⊑ places a total order on its inputs; that is, for any inputs a and b, either p(a) ⊑ p(b) or p(b) ⊑ p(a).

This definition corresponds to the notion of any evaluation tree being either a subtree or a supertree of every other evaluation tree for the program. Returning to Figure 10.6, the program gloop is contained by this definition, but the other two programs in the figure are not. We now show that contained functions satisfy the second part of the replicating theorem (10.2), in particular:

Lemma 2 For any access-restricted routine R, with a set of inputs A, an access-restricted routine Rs exists with n sets of inputs A0, A1, ..., An−1 that executes R independently on each set; and the routine Rs obeys the relation:

s(Rs, {A0, ..., An−1}) < k max(s(R, A0), ..., s(R, An−1))    (10.6)

for some constant k.

Proof: If R is access restricted it is contained (by definition). This means that there is an ordering on the R(Ai) and that for some j and for all i, R(Ai) ⊑ R(Aj). All the evaluation trees are therefore subtrees of T(R,Aj). This means that the replicated version Rs need just execute all of T(R,Aj). Whenever we reach a conditional fork, Rs can turn off any of the R which do not need to go deeper in the tree, and turn them back on at the join. Since the branch-packing technique turns off (deletes the entries of) routines which don't take a branch, the implementation discussed in the last section is sufficient. □

10.6.3 Containment of Functions in Book

It turns out that all algorithms described in this book are contained.1 In this section we go through the algorithms to show that they are contained. The discussion is informal; finding a way to formally prove that functions are contained is an interesting direction for future research. We categorize the algorithms into three classes (see Figure 10.7):

Non Conditional Functions: These are functions that have no conditionals (straight-line code). The evaluation tree for these functions is fixed regardless of the input, so they are clearly contained.

Singly Recursive Functions: These are functions that only have a single call to a recur- sive function in their body, and the single call must also be to a singly recursive function. This call could be to a different function or to the function itself. All singly recursive functions can be converted to be contained, as shown below.

Multiply Recursive Functions with Equal Depth: These are functions that might make calls to multiple recursive functions, but in which it can be proved that the depth of each recursive call increases over the same data.

We first argue that all singly recursive functions can be converted to be contained at a constant cost. Since the depth of nesting is at most constant without a recursive call, at any vertex only one branch of the evaluation tree of a singly recursive function can be more than constant depth (the constant depends on the code, not the data). All branches that are constant depth can be evaluated completely; this imposes a constant overhead. Now the one branch which is not constant depth will be the same for all data, so the one invocation which goes deepest will always contain all other invocations. It turns out that all the constant-depth branches, for the singly recursive functions in this book, do not impose any overhead. This is because these branches do not have any conditionals.

The only functions left are the multiply recursive functions. In all the multiply recursive functions in this book (see Figure 10.7), the branches all have depths that increase over the same data. Here we just consider the √n-hull; the others are straightforward. In the √n-hull, the main procedure makes two recursive calls. The first is to itself, and the second is to the binary-search routine which is used to find the bridges. The binary-search routine is singly recursive and for n points has depth O(lg n), and is therefore increasing with the size of the problem. The recursive call to √n-hull also has depth that is increasing with the size of the problem, so both recursive calls grow in depth with the size of the problem, and therefore increase over the same data.

1This assumes we pass in the random bits as arguments for the probabilistic algorithms, such as the minimum-spanning-tree algorithm.

Non Conditional:
    All the Simple Operations,
    Neighbor-Reduce, Distribute-Excess,
    +-Rootfix, +-Leaffix, Tree-Vertex-Delete,
    Line-Drawing, Line-of-Sight

Singly Recursive:
    Split-Radix-Sort, Quicksort, Halving-Merge,
    KD-Tree, Quickhull, Minimum-Spanning-Tree,
    Maximal-Independent-Set,
    Sparse Matrix-Matrix Multiply (assuming single sort),
    Simplex, Dense Matrix-Matrix Multiply

Doubly Recursive, Equal Depth:
    Binary-Search, √n-Hull, Closest-Pair,
    Maximum-Flow, Biconnected-Components,
    Linear-Systems Solver

Figure 10.7: A list of the algorithms in the book and in which way they are contained.

10.6.4 Round-Robin Simulation

We now consider another technique that could be used to limit the step complexity of a replicated routine, round-robin simulation, and discuss why it is impractical.

The idea of round-robin simulation is that we simulate a MIMD model on the SIMD model. The SIMD machine must have the capability of deactivating and reactivating any processor. The SIMD machine continually loops over all the instructions of the MIMD machine, and each processor only executes an instruction when the right instruction comes around. If the MIMD machine being simulated has k instructions, then this simulation causes a factor of k slow-down. Although in theory the number of instructions k is constant, such a simulation is impractical for most machines since the number of instructions k is usually at least on the order of 100.

One way to minimize the simulation cost is to reduce the number of instructions k to a small number. This can be done by breaking each instruction into even more primitive parts (we could, for example, create a machine with only a logical NAND instruction). The idea of reducing the number of instructions was used by Hudak [57] to implement a lazy functional language on the Connection Machine. He was able to reduce the process of graph reduction to 5 primitive operations. He implemented this reduction machine on the Connection Machine and took several timings. The running times he measured were extremely inefficient, often slower on the Connection Machine than on a serial machine. The problems he found with this technique were that the simulation overhead was large and that breaking the operations into the five primitives added significant costs.

Chapter 11

A Compiler for Paralation Lisp

This chapter discusses a compiler that translates a subset of PARALATION LISP, a collection-oriented language of Sabot's [96], into the scan vector instruction set.1 As mentioned in the introduction to the book, a PARALATION LISP version of quicksort translated by the discussed compiler onto the Connection Machine runs within a factor of two of the fastest sort implemented on the Connection Machine. This result lends practical credence to many of the methods and techniques discussed in this book, and provides concrete support for the general use of high-level collection-oriented languages for programming applications and algorithms, and of powerful compilers for mapping these languages onto a gamut of parallel and serial machines.

The compiler described in this chapter ties together many of the independent contributions of the book. The compiler makes use of the scan vector model as an intermediate language; it makes use of many of the simple operations discussed in Section 4.2 to implement some of the language forms; it makes use of the implementation of the scan vector model on the Connection Machine discussed in Section 12.1 as the back end; it makes use of segments, the replicating theorem, and the packing of nested conditionals discussed in Chapter 10 to implement nested parallel constructs; and it makes use of the representation of collections discussed in Section 9.3 to map the PARALATION LISP collection type, the field, onto vectors. Figure 11.1 illustrates the stages of compiling and how they fit into the book as a whole.

The most original contribution of the described compiler is its ability to flatten nested parallelism. Although compilers have been implemented that compile collection-oriented languages that support nesting onto parallel machines, such as the compiler for CM-LISP [119], these compilers only compile for parallelism at the leaves of the nesting.

1Some of the work described in this chapter is joint work with Gary Sabot and is described in more detail in [24].


PARALATION LISP

⇓ This chapter

SV-LISP with Segmented Instructions

⇓ Sections 4.3 and B.2.3

SV-LISP without Segmented Instructions

⇓ A COMMON LISP compiler

Scan Vector Instruction Set

⇓ Chapter 12

Connection Machine and Host Machine Instructions

Figure 11.1: The stages of compiling a PARALATION LISP program onto a target machine. Scan-Vector Lisp is basically the scan vector instruction set buried in Lisp syntax with some of COMMON LISP's higher-level forms.

For example, a compiled routine that draws a set of lines would draw each line in parallel but would serially loop over the lines. By taking advantage of segments and the replicating theorem, the implementation of nested parallelism by the described compiler is straightforward and can be broken into the following major parts:

• Nested fields (the collection type of PARALATION LISP) are represented based on a pfield data structure. This data structure has two slots: one contains a segment descriptor, and the other contains either a vector of data, if the field is not further nested, or another pfield, if the field is further nested.

• The compilation of two versions of every function: one for use when called at top level, and one for use when called inside a nested form.

• The insertion of stepping-down code at the entry of each elwise form (the apply-to-each form of PARALATION LISP), which at run time strips off the top pfield of each variable passed into the elwise; and the insertion of stepping-up code at the exit of each elwise form, which at run time appends a pfield back onto the value being returned.

• The insertion of code that at run time packs all active segments when entering either branch of an if special-form, so that segments which are not being executed are eliminated. The results of the two branches are joined when the else-expression completes.

The compiler also shows the power of parallel vector models as an intermediate instruction set on which to compile a collection-oriented language. The techniques used by the described compiler can be applied to other collection-oriented languages that support nesting, such as CM-LISP or SETL.

Figure 11.2 illustrates the organization of this chapter. The chapter is separated into three parts: (1) an outline of the source language, the subset of PARALATION LISP; (2) an outline of the target language, SV-LISP; and (3) the important translation techniques used by the compiler. Each of these parts is split into data structures and operations.

11.1 Source Code: Paralation Lisp

This section summarizes the PARALATION LISP language; for more details the reader should see [96]. PARALATION LISP consists of a new data structure, three primitive operators, and a set of other operators built on the primitive operators, all added to COMMON LISP.

Source Language: Paralation Lisp
    Data Structures (Section 11.1.1)        Operations (Section 11.1.2)
            ⇓ (Section 11.3.1)                      ⇓ (Section 11.3.2)
Target Language: Scan-Vector Lisp
    Data Structures (Section 11.2.1)        Operations (Section 11.2.2)

Figure 11.2: Organization of the compiler. To translate PARALATION LISP onto Scan-Vector Lisp, both the data structures and the operations must be translated.

11.1.1 Data Structures

The data objects permitted in PARALATION LISP are all the standard COMMON LISP data objects with one additional object, the field. The field is a linear-ordered collection of elements. A field can be heterogeneous, and the elements can be any PARALATION LISP value—including another field, allowing nested collections. Here are some examples of fields. A homogeneous field:

#F(7 2 11 19 6 12 9)

A nested homogeneous field:

#F(#F(4 8 3) #F(9 1 12 7) #F(2 9))

A heterogeneous field:

#F(7 #F(4 Nil 3) T "horse")

A structure field:

#F( u:u00  u:u01  u:u02
    v:v00  v:v01  v:v02 )

11.1.2 Operators

The operations permitted in PARALATION LISP are all the standard COMMON LISP operations with three additional operations: an iteration operator and two field operators. The iteration operator, elwise, is used to iterate any PARALATION LISP code, including another elwise, over all elements of a field. The two primitive field operators, match and <-, perform communication among the elements of fields: match encapsulates a communication pattern into a mapping, and <- transfers a field according to a mapping. Several other operations are supplied by PARALATION LISP but can be defined in terms of match and <-. All the COMMON LISP sequence functions can also be used on fields.

The operators of PARALATION LISP that are needed for the compilation examples are outlined below. The ideas behind paralations and mappings, which are both important concepts of the language, are not discussed because they are not germane to a discussion of compiler issues.

Elwise: The elwise operator is used to apply a body over each element of a field, or set of fields. The body can include any valid PARALATION LISP form. The form:

(elwise bindings body)

executes the body elementwise over the elements of each field in the bindings. For example:

(elwise ((a #F(4 7 1 3))
         (b #F(6 2 5 8)))
  (+ a b))
⇒ #F(10 9 6 11)

pairwise adds the elements of a and b. Each binding of an elwise must be from the same paralation (of the same length).2
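For intuition only, the serial COMMON LISP analogue of this elwise is an elementwise map; this is a semantic analogy, not how the compiler implements elwise:

(mapcar #'+ '(4 7 1 3) '(6 2 5 8))  ; => (10 9 6 11)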

Match and Move: The match operation takes two key fields as arguments—one from a source paralation and one from a destination paralation—and returns a mapping.A mapping can be thought of as a bundle of one-way arrows that connect certain sites of the source paralation to certain sites of a destination paralation. Two sites are connected if their key field values are equal. A mapping is an encapsulated communication pattern.

2The example is actually not quite correct since the two fields given to the elwise will be from different paralations, but for the sake of simplicity in the examples we assume that the two fields passed to the elwise are from the same paralation if they are the same length. This assumption will be made in some of the other examples.

The <- (move) function accepts a mapping and a field from the source paralation of the mapping as its arguments. <- simply pushes this source data field into the tails of the mapping's arrows, causing a field in the destination paralation to pop out at the other end of the mapping. The elements in this field are calculated based upon what arrived over the arrows. If a single arrow arrives at a destination site, it is clear what that element's value should be. Of course, in many cases, arrows can conflict. Simple rules govern the resolution of the conflicts.

When the mapping indicates that source data is needed in multiple destination sites, because a particular key occurred several times in the destination, a concurrent read automatically takes place, thus giving each destination site the data it needs. On the other hand, if a destination site receives no incoming values, a value is taken from a user-specified default field in the destination paralation.

The most interesting case arises when many source data items arrive at a single destination, because a particular key occurred several times in the source. The multiple incoming values are reduced into a single value by repeatedly applying a user-specified, two-argument combining function. Combining will not be needed for the examples presented in this chapter, but it is an important part of the paralation model.

Composite Operations

Many operations can be defined using the elwise, match and <- operations, and are important enough that they are worth mentioning here.

Vref: The vref operation “sums” the elements of a field according to any binary operator. So, for example:

(vref #F(7 4 1 11 2 6) :with 'max) ⇒ 11
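In serial COMMON LISP terms, vref on a flat field behaves like a reduction; again, this is an analogy for the semantics, not the implementation:

(reduce #'max '(7 4 1 11 2 6))  ; => 11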

Collapse and Collect: The collapse operator takes a set of keys and generates a mapping in which all elements with equal-valued keys are mapped to the same position. The collect operator takes a mapping and a field and appends all the elements which are mapped to the same position into a subfield. This can be implemented using a <- with a combiner of concatenate. The collect operator returns a field of fields. As an example of collect and collapse, consider the following operation:

(let ((A #F(a0 a1 a2 a3 a4 a5))
      (B #F(k0 k1 k0 k2 k1 k1)))
  (collect A :by (collapse B)))

⇒ #F(#F(a0 a2) #F(a1 a4 a5) #F(a3))
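A serial sketch of the combined collapse/collect behavior in COMMON LISP, assuming list-based fields; the helper name collect-by is hypothetical, and the real operators build an explicit mapping first:

;; Group the elements of data by equal keys, preserving the order
;; in which each key first appears.
(defun collect-by (data keys)
  (let ((order (remove-duplicates keys :from-end t)))
    (mapcar (lambda (k)
              (loop for d in data
                    for k2 in keys
                    when (eql k k2) collect d))
            order)))

;; (collect-by '(a0 a1 a2 a3 a4 a5) '(k0 k1 k0 k2 k1 k1))
;; => ((A0 A2) (A1 A4 A5) (A3))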

Expand: The expand operator takes a field of fields and appends all the subfields into a single field. So, for example:

(expand (collect A :by (collapse B)))

⇒ #F(a0 a2 a1 a4 a5 a3)

11.1.3 Restrictions

The compiler implements a small enough subset of PARALATION LISP that the subset is more concisely described by what it does include rather than what it does not. The subset only supports homogeneous fields, and the data type of each element of a field must be either an integer, boolean, field or structure. Since the elements can be fields, the subset supports nested fields. Many other data types, such as floating-point numbers or characters, would be straightforward to include, but were left out for the sake of simplicity.

The subset supports the following operations. It supports the three primitive operations of PARALATION LISP: elwise, match and <-. It, however, only knows how to match integer and boolean keys. The subset supports most of the operations on integer and boolean values inside an elwise. It also supports nested operations on fields. The only conditional the subset supports is the if special form, and it places the restriction that the results returned from both the then-expression and the else-expression must be of the same type. The subset includes the composite operations vref, collect, collapse, and expand. The subset supports the following sequence operations on fields: elt, length, sort, reduce, and concatenate. Many other sequence operators would be straightforward to include, but these were the only ones needed for the test code.

These restrictions allow the application of the access-restricted replicating theorem described in Chapter 10, so that any function written using these restrictions can be used either at top level or nested within an elwise.

11.2 Target Code: Scan-Vector Lisp

This section describes the target code of the compiler, Scan-Vector Lisp (SV-LISP). SV-LISP is a subset of COMMON LISP with the addition of a set of functions that implement the vector and vector-scalar instructions of the scan vector instruction set (the scalar instructions are included in COMMON LISP).

COMMON LISP Data Types Integers, Booleans, Structures

Pvector Data Types Integer-Pvectors, Boolean-Pvectors

Figure 11.3: The data types of SV-LISP.

Translating PARALATION LISP to SV-LISP rather than directly onto a parallel machine, such as the Connection Machine, has some important advantages. First, it separates the novel techniques of compiling collection-oriented languages onto a set of vector instructions from standard compiler techniques. The novel techniques, such as flattening nested parallelism, are implemented in translating from PARALATION LISP into SV-LISP, while the standard techniques, such as memory allocation or compiling recursive routines, are implemented in translating from SV-LISP into actual machine instructions. Second, assuming a COMMON LISP compiler exists for a machine, PARALATION LISP can be ported to that machine simply by implementing a subroutine for each of the pvector instructions and interfacing these subroutines into COMMON LISP. This allows great portability.

11.2.1 Data Structures

SV-LISP has five data types: three from COMMON LISP—integers, booleans and structures—and two additional data types—boolean pvectors and integer pvectors. Pvectors are arbitrarily long linear-ordered collections of atomic values—boolean values for the boolean pvectors and integers for the integer pvectors.3 Every pvector can have a different length, and the only operations that can create or manipulate the pvector data types are the pvector instructions discussed in Section 11.2.2. If one were to implement a complete PARALATION LISP rather than the subset discussed in this chapter, SV-LISP would need to be augmented with some other types, such as floating-point numbers and floating-point pvectors.

11.2.2 Operations

Figure 11.4 lists the operations of SV-LISP. These operations are broken into two classes: operations from COMMON LISP, and the parallel vector instructions discussed in Section 4.1. The COMMON LISP operations are defined in the COMMON LISP reference manual [107].

3 The term pvector is used instead of vector so as not to confuse it with the COMMON LISP vector data type—a linear-ordered collection whose elements can be of any type.

COMMON LISP Operations
  Special Forms and Macros: if, defstruct, defun, let, let*, progn, setq
  Scalar Arithmetic and Logical Operations: +, −, and, or, =, <, ...

Pvector Instructions
  Elementwise Instructions: p+, p−, p-and, p-or, p=, p<, p-select, ...
  Permutation Instructions: permute, select-permute
  Scan Instructions: +-scan, max-scan, min-scan, or-scan, and-scan
  Vector-Scalar Instructions: insert, extract, distribute, length

Figure 11.4: The operations of SV-LISP.

11.3 Translation

This section discusses how PARALATION LISP is translated into SV-LISP. In keeping with the rest of the chapter, it first describes data structures and then describes operations.

11.3.1 Data Structures

In collection-oriented languages, different mappings of the high-level collections onto the target architecture can give rise to orders-of-magnitude differences in the efficiency of code on the architecture. A compiler must therefore pay special attention to how the mappings affect the efficiency of code. This section discusses how the compiler maps the collections of PARALATION LISP, fields, onto the primitive data structures of SV-LISP, pvectors. The mapping discussed allows a particularly efficient manipulation of nested fields by the vector instructions of SV-LISP. The representation of nested fields is based on segments as introduced in Section 4.3 and allows the generated code to operate over all subfields in parallel. All fields are constructed from the pfield structure—a COMMON LISP structure type with two slots (see Figure 11.5). The first slot stores a segment-descriptor, which describes the length or segmentation of the field (see Section 4.3).4

4 In the actual compiler the segment-descriptor is itself a structure which contains several slots, each with one of the segment descriptors defined in Section 4.3. For the purposes of this chapter, we assume the segment-descriptor only contains the lengths segment-descriptor (the length of each segment).

pfield
  segdes: segment-descriptor
  values: boolean pvector, integer pvector, structure, pfield

Figure 11.5: The definition of the pfield structure. Each line of the structure includes the slot name followed by the type of values that can be placed in the slot.

The second slot stores the actual values. This slot contains a pvector if the field contains only atomic values, another pfield if the field is nested, and a user-defined structure if the field is a field of user-defined structures. Each of these cases is discussed below. Since the subset of PARALATION LISP considered only supports homogeneous fields, heterogeneous fields are not considered.

Simple Field: To represent a simple field—a field whose elements are all atomic—we use a single pfield structure. The first slot contains a definition of a single segment—its length. The second slot contains a pvector with the values of the field. For example:

#F(a0 a1 a2 a3 a4)
⇓

pfield
  segdes: [5]
  values: [a0 a1 a2 a3 a4]

We use a pfield structure instead of a pvector directly since it allows us to check whether two equal-length fields belong to the same paralation.5 Using the pfield structure also permits a more homogeneous implementation of the stepping-up and stepping-down manipulations to be discussed in Section 11.3.2.
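To make the representation concrete, the following is a minimal COMMON LISP sketch of the pfield structure, assuming the simplified lengths-only segment-descriptor used in this chapter (in the actual compiler the segment-descriptor is itself a richer structure):

(defstruct pfield
  segdes    ; segment-descriptor; here, simply a list of segment lengths
  values)   ; a pvector, a nested pfield, or a structure of pvectors

;; the simple field #F(a0 a1 a2 a3 a4) from the example above:
(defvar *simple-field*
  (make-pfield :segdes '(5)
               :values #(a0 a1 a2 a3 a4)))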

Nested Field: We represent a nested field—a field whose elements are themselves fields—by nesting the pfield structures and using segments of a single pvector to represent each subfield. For example:

the segment descriptors defined in Section 4.3 For the purposes of this chapter, we assume the segment-descriptor only contains the lengths segment-descriptor (the length of each segment). 5Since the segment-descriptor is actually a structure, we can check if two fields are from the same paralation by seeing if the two segment-descriptors are eql. 11.3. TRANSLATION 173

#F(#F(a00 a01) #F(a10 a11 a12) #F(a20))
⇓

pfield
  segdes: [3]
  values: pfield
            segdes: [2 3 1]
            values: [a00 a01 a10 a11 a12 a20]

In this example, the segdes slot of the inner pfield describes the segmentation of the values slot. This technique can be applied recursively to represent a nesting of any depth. A field nested n deep can be represented with n segment-descriptor structures and n pfield structures. The purpose of representing nested fields with a single value pvector is to get both the parallelism of operations within each bottom-level field and the parallelism over all the bottom-level fields.
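Under the same pfield sketch given earlier, the nested field of this example could be constructed directly:

(defvar *nested-field*
  (make-pfield
    :segdes '(3)                  ; three subfields
    :values (make-pfield
              :segdes '(2 3 1)    ; the lengths of the subfields
              :values #(a00 a01 a10 a11 a12 a20))))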

Structure Field: We represent a structure field—a field whose elements are each a user-defined structure—by pulling the structure out from inside the field. For example:

#F( (u: u00  v: v00)  (u: u01  v: v01)  (u: u02  v: v02) )
⇓

pfield
  segdes: [3]
  values: uv-structure
            u: [u00 u01 u02]
            v: [v00 v01 v02]

In this example, the field of three uv-structures is mapped onto a single uv-structure whose slots each contain a pvector with the values of all three of the original uv-structures. Figure 11.6 illustrates a final example of a field with both nesting and structures. Mappings in PARALATION LISP can be represented in canonical form as a pair of integer vectors: the first vector contains indices into the destination from the source, and the second contains indices into the source from the destination (see [96] for more details).

#F( (x: x0  y: y0  z: #F((u: u00 v: v00) (u: u01 v: v01) (u: u02 v: v02)))
    (x: x1  y: y1  z: #F((u: u10 v: v10))) )
⇓

pfield
  segdes: [2]
  values: xyz-structure
            x: [x0 x1]
            y: [y0 y1]
            z: pfield
                 segdes: [3 1]
                 values: uv-structure
                           u: [u00 u01 u02 u10]
                           v: [v00 v01 v02 v10]

Figure 11.6: An example of how a nested field with structures is represented. In the example, the xyz-structure and the uv-structure are user-defined structures.

(defun plus-times (a b)
  (* a (+ a b)))

Called at top level:

(plus-times 5 2)
⇒ 35

Called within an elwise:

(elwise ((a #F(2 1 6))
         (b #F(3 5 2)))
  (plus-times a b))
⇒ #F(10 6 48)

Figure 11.7: Any routine in PARALATION LISP can be called either at top level or within an elwise.

11.3.2 Operations

This section discusses the manipulations necessary to translate code from the subset of PARALATION LISP into SV-LISP. The discussion is broken into four parts: 1) compiling two versions of all code, one parallel and one serial; 2) compiling the elwise form; 3) compiling conditionals; and 4) implementing the PARALATION LISP collection operations.

Compiling Two Versions

When a function is defined in PARALATION LISP, it must work both if called at top level (not within an elwise) and if called within an elwise. Consider the example of Figure 11.7. In the first case, the compiler uses a serial version of plus-times while in the second case the compiler uses a parallel version of the routine. The serial version uses the standard COMMON LISP + and * operations while the parallel version uses the pvector primitives p-+ and p-*. Figure 11.8 illustrates an example of the translation of a PARALATION LISP routine into the two SV-LISP routines.

The compiler keeps two versions of every user-defined function and every function supplied by PARALATION LISP: the top-level version and the replicated version (the version called inside a nested parallel form). When compiling the replicated version of a new function, the function calls inside the routine are simply replaced with their replicated versions (see Figure 11.8). The special forms, however, cannot in general be replaced with a parallel function. This is because the arguments of special forms in COMMON LISP are not necessarily all evaluated (for example, the if form only evaluates the second argument if the first evaluates to T). Furthermore, COMMON LISP does not permit user definitions of special forms. Special forms can therefore require some extra manipulations. The translation of the if special form is discussed in Section 11.3.2. The let, let* and progn special forms require no manipulations.

(defun plus-times (a b c)
  (+ a (* b c)))
⇓

(defun s-plus-times (a b c)
  (+ a (* b c)))

(defun p-plus-times (a b c)
  (p-+ a (p-* b c)))

Figure 11.8: Compiling both a parallel and a serial version of a routine. The parallel version replaces all function calls with their parallel versions.

Compiling Elwise Forms

The compiler applies several manipulations to translate an elwise form. First, it executes the same manipulations required when creating a parallel form of a function, as discussed in the last section. Second, it inserts code that copies all the free variables—variables that appear in the body but not in the binding list—across the elements of the elwise. Third, it inserts code that steps-down all the values bound in the binding list, and steps-up the result of the body.

We first discuss copying free variables. In PARALATION LISP, if a variable appears in the body of an elwise but not in the binding list, the variable is implicitly copied across the elements of the elwise.6 For example, in the form:

(let ((b 3))
  (elwise ((a #F(4 1 2)))
    (+ a b)))
⇒ #F(7 4 5)

the value of the variable b is implicitly copied across the three elements and added to each. When translating from PARALATION LISP to SV-LISP, the translator inserts code to execute this copy at run time. The particular code inserted depends on the type of value that needs to be copied. If the value is a scalar, the distribute pvector primitive (see Section 11.2.2) is inserted. Figure 11.9 illustrates an example of this manipulation.

6 This is similar to scalar extension in APL [59], but in PARALATION LISP any value will be extended, not just scalars.

(elwise ((a A))
  (+ a b))
⇓

(simp-elwise ((a A)
              (b (distribute b (pfield-segdes A))))
  (+ a b))

Figure 11.9: An example of the code inserted for copying free variables. All free variables are removed by this manipulation. The simp-elwise form is a version of elwise that does not accept free variables.

(simp-elwise ((a A) (b B))
  (+ a b))
⇓

(let ((a (pfield-values A))
      (b (pfield-values B))
      (current-segdes (pfield-segdes A)))
  (make-pfield :segdes current-segdes
               :values (p-+ a b)))

Figure 11.10: An example of the stepping-down and stepping-up manipulations.

If the value is a structure of scalars, a distribute primitive is inserted for each slot of the structure. If the value is a field, a distribute-segment operation is inserted that creates a nested field with the original field in each element. The type of the variable to be copied can often be inferred at compile time so that the correct code can be inserted at compile time (in the above example b must be a scalar since it is being added). If the type cannot be inferred at compile time, the compiler inserts code that executes a type dispatch at run time.

We now discuss stepping-down and stepping-up, which are crucial to the implementation of operations on nested fields. Stepping-down consists of stripping off the top pfield from each value being bound in the elwise bindings, and setting a variable called the current-segdes to this value. So for a nested field, each time the field is passed inside another elwise, another of its pfield structures is stripped off. Stepping-up is the inverse of stepping-down. When leaving an elwise, stepping-up consists of tagging a pfield structure onto the result returned from the body of the elwise, and restoring the value of the current-segdes. Figure 11.10 illustrates the code inserted by these manipulations. To see how stepping-down and stepping-up are used, consider the following code:

(let ((field-of-fields #F(#F(7 4) #F(11) #F(8 1 17))))
  (elwise ((field field-of-fields))
    (elwise ((value field))
      (+ value value))))
⇒ #F(#F(14 8) #F(22) #F(16 2 34))

Based on the representation discussed in Section 11.3.1, the original field is represented as:

field-of-fields =
  pfield
    segdes: [3]
    values: pfield
              segdes: [2 1 3]
              values: [7 4 11 8 1 17]

When entering the outer elwise, the stepping-down code strips off the top pfield leaving:

field =
  pfield
    segdes: [2 1 3]
    values: [7 4 11 8 1 17]

And when entering the inner elwise, the next pfield is stripped off leaving:

field = [7 4 11 8 1 17]

Now when p-+ is applied to field, the result is:

[14 8 22 16 2 34]

When exiting the inner elwise the stepping-up code appends a pfield back on, returning:

pfield
  segdes: [2 1 3]
  values: [14 8 22 16 2 34]

And when exiting the outer elwise another pfield is appended, returning:

pfield
  segdes: [3]
  values: pfield
            segdes: [2 1 3]
            values: [14 8 22 16 2 34]

This is the representation of the desired result:

#F(#F(14 8) #F(22) #F(16 2 34))

In this example, the code that executes the addition runs in parallel over all elements, therefore taking advantage of the parallelism within each subfield and also the parallelism among the subfields. This technique works regardless of the depth of the nesting and regardless of the complexity of the operations executed within the elwise. One way of thinking about what is going on is that the compiler converts an elwise, which is a mapping of a function over many sets of data, into a new "composite" function over one larger set of data—the data sets all appended together. The effect of stripping off a pfield by the translated elwise is to remove a level of dividing boundaries, therefore effectively appending the data sets. The "composite" function (the replicated function) can then be applied to this appended data set. So, in the above example, inside the inner elwise there are no longer any dividing boundaries—all the original values are appended into one long vector—and the parallel vector version of + is then applied over all the elements.
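The whole walkthrough condenses into a few lines of code. Below is a minimal sketch of what the translated nested elwise boils down to, assuming the pfield sketch given earlier; p-+ here is a hypothetical stand-in for the machine's elementwise add:

(defun p-+ (a b) (map 'vector #'+ a b))

(let* ((field-of-fields
         (make-pfield :segdes '(3)
                      :values (make-pfield :segdes '(2 1 3)
                                           :values #(7 4 11 8 1 17))))
       (inner (pfield-values field-of-fields))   ; step down once
       (flat  (pfield-values inner)))            ; step down twice
  ;; one parallel add over all subfields at once, then step back up
  (make-pfield :segdes (pfield-segdes field-of-fields)
               :values (make-pfield :segdes (pfield-segdes inner)
                                    :values (p-+ flat flat))))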

Compiling Conditionals

This section describes the translation used for the if special form. Many of the other COMMON LISP control forms, such as cond, when, and do, can be implemented with the if special form. The general throw, catch and go special forms, however, cannot be implemented with just the if form and are not supported by the subset of PARALATION LISP accepted by the compiler. The translation described in this section is based on the methods discussed in Section 10.2.

The problem with the parallel (replicated) version of a routine with an if form is that some of the segments might take one branch while others might take the other. The parallel version therefore might need to execute both branches. It, however, cannot simply evaluate both branches and select the appropriate result based on the conditional flag. This would render the program incorrect for the following two reasons:

• if there are any side effects (such as a setq) in the then-expression or else-expression, these side effects might be evaluated in segments in which they should not be;

• if there is a recursive call in one of the branches, the recursion would never terminate—the branch would be executed in the recursive call and again in the next recursive call, with no termination condition.

It would also make the program inefficient for the following reason:

• If both branches are executed for all segments, computation is performed on the segments which were not supposed to take a branch and is therefore wasted. This can be particularly bad when conditionals are nested.

The first problem can be solved by guaranteeing that side effects are only executed in segments which should be taking the current branch. The second problem can be solved by only executing a branch if there is at least one segment that needs to execute that branch. The third problem can be solved by packing the active segments in each branch and merging the results as discussed in Section 10.2. The compiler inserts code for all these manipulations.

Figure 11.11 illustrates an example of the translation executed by the compiler for the if special form. Each or-reduce is inserted to check if any segment needs to execute that branch. This code guarantees that at run time a branch will only be executed if there are segments taking the branch. The recursive-pack function packs the segments of a variable so that inactive segments are dropped out. This function is applied to every variable that appears in a branch. Unlike the pack operation described in Section 4.2, this version can pack nested fields: it recursively packs the levels of a nested field and returns when it reaches the leaves. The recursive-flag-merge function merges two segmented values based on the conditional flags. As with recursive-pack, it can be applied to nested fields. To merge in variables that are side-effected in one of the branches (with a setq), a recursive-unpack routine is inserted at the exit of each branch. At run time, this routine replaces the altered segments with the new values but leaves the unaltered segments unchanged.

Since our subset of PARALATION LISP only manipulates homogeneous fields, the results from the two branches of an if special form must be of the same type so that the recursive-flag-merge will return a homogeneous field.

Operations

This section has covered everything except how the PARALATION LISP operations defined in Section 11.1.2 are implemented. The operations considered here are elt, collapse, collect, and expand.

(if flag
    (func1 a)
    (setq a (func2 b)))
⇓

(if (or-reduce flag)
    (if (or-reduce (p-not flag))
        (recursive-flag-merge flag
          (let ((temp-a (recursive-pack a flag)))
            (func1 temp-a))
          (let ((temp-b (recursive-pack b (not flag))))
            (let ((result (func2 temp-b)))
              (setq a (recursive-unpack result a (not flag)))
              result)))
        (func1 a))
    (func2 b))

Figure 11.11: Translating the parallel version of the if special form. If the flag is NIL in all of the segments, only func2 is executed. If the flag is T in all of the segments, only func1 is executed. If some flags are T and others NIL, then the respective segments are packed before execution and merged after execution.

A full description of the implementation of these functions is not within the scope of this chapter, but the code is provided and the basic ideas are outlined. Only one version of each function needs to be written, because the compiler itself can be applied to this library of functions to generate the nested parallel versions automatically.

(defun elt (sequence index)
  (extract sequence index))

(defun collapse (field)
  (make-map
    :pointers (rank field)
    :length (value-count field)
    :segdes (elements-counts field)))

(defun collect (field &key by)
  (make-pfield
    :values (make-pfield
              :values (permute field (map-pointers by))
              :segdes (map-segdes by))
    :segdes (map-length by)))

(defun expand (field)
  (let ((child-field (pfield-values field)))
    (make-pfield
      :values (pfield-values child-field)
      :segdes (+-reduce (pfield-segdes child-field)))))

The collapse routine generates a mapping which can then be used to move data using the collect routine. The mapping is represented with two vectors and a scalar. The pointer vector contains pointers from the original field into a new vector so that equal values in the original field are adjacent. This pointer vector is generated by executing a rank on the original field. For example:

(rank #F(2 5 3 8 2 2 8))
⇒ #F(0 4 3 5 1 2 6)

If the field is a number field, the implementation of rank requires a sort. If the field is a boolean field, however, the rank can be implemented with two scans and a permute (because a single partition phase of quicksort always suffices for a boolean key; a sketch of this appears at the end of the section). The segdes vector of the mapping contains the number of occurrences of each distinct value in the source field. It is generated with the elements-counts routine; for example:

(elements-counts #F(2 5 3 8 2 2 8))
⇒ #F(3 1 1 2)

If the field is a boolean field, this requires two calls to the +-scan primitive. The length slot of the map contains the number of distinct values and is generated with the value-count routine. For example:

(value-count #F(2 5 3 8 2 2 8))
⇒ 4

This is just the length of the segdes vector. The collect routine extracts the pointers from the map structure and permutes its input to those pointers. Collect creates a field of fields from a field, using the length slot as the top-level segment-descriptor and the segdes slot as the next-level segment-descriptor. The expand routine removes a level of nesting. It strips off two levels of segment descriptors, sums the lengths of the subfields, and creates a new segment descriptor with the length specified by this sum.
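As promised above, here is a minimal sketch of the boolean case of rank using two exclusive +-scans (the split of Section 3.4.1); plus-scan below is a serial stand-in for the +-scan primitive, and the flags are assumed to be 0/1 integers:

(defun plus-scan (v)
  ;; serial stand-in for the exclusive +-scan primitive
  (let ((acc 0))
    (map 'vector (lambda (x) (prog1 acc (incf acc x))) v)))

(defun boolean-rank (flags)
  ;; destination position of each element when all 0s are moved,
  ;; stably, in front of all 1s
  (let* ((not-flags (map 'vector (lambda (f) (- 1 f)) flags))
         (zero-pos  (plus-scan not-flags))   ; ranks of the 0s
         (one-pos   (plus-scan flags))       ; ranks of the 1s
         (nzeros    (reduce #'+ not-flags)))
    (map 'vector
         (lambda (f z o) (if (= f 1) (+ nzeros o) z))
         flags zero-pos one-pos)))

;; (boolean-rank #(1 0 1 0)) ⇒ #(2 0 3 1)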

Part IV

Architecture


Introduction: Architecture

This part contains two chapters. Chapter 12, Implementing Parallel Vector Models, describes the implementation of the parallel vector models on the Connection Machine and describes how the models can be simulated on the P-RAM models. Chapter 13, Implementing the Scan Primitives, illustrates how the scan operations can be implemented in hardware. This illustration is important because although there has been considerable theoretical and practical research on how to implement the permutation operations (routing), and there has been theoretical research on how to implement the scan operation [65, 41], there has been little research on how to implement the scan operation on real hardware.

The purpose of this part is to illustrate that the parallel vector models can be implemented on parallel machines. Many of the techniques described in this part can be used for machines other than the Connection Machine.


Implementing Parallel Vector Models

This chapter describes how parallel vector models in general and the scan vector model in particular can be simulated on a real machine, the Connection Machine, and how they can be simulated on another theoretical model, the P-RAM model. The general simulation techniques described should be applicable to a wide range of parallel machines, both with serial control and with parallel control. In particular the chapter shows:

1. All the scan vector instructions can be implemented on the Connection Machine (CM-2). The running times are tabulated in Table 12.1.

2. Any scan vector algorithm with an element complexity of e and a step complexity of s can be simulated on a synchronous EREW P-RAM with scan primitives with asymptotic complexity t = O(e/p + s).

12.1 Implementation on the Connection Machine

In this section we describe an implementation of the scan vector model on the Connection Machine. Much of the description is useful for parallel vector models in general. In Section 12.1.4 we tabulate the running times of the instructions for vectors of various lengths. We start with a brief description of the Connection Machine; more detailed descriptions can be found in Hillis's book on the Connection Machine [53] or in a Thinking Machines Corporation technical summary [114].


Figure 12.1: Block diagram of the Connection Machine.

The CM-2 Connection Machine is a fine-grained data-parallel computer with between 8K and 64K processors (see Figure 12.1). Each processor is a 1-bit serial processor, with 64K bits of local memory, and optional floating-point hardware. Each processor can access its memory with independent addresses. All the processors are controlled by a microcontroller, which is attached to a front-end computer. The processors are organized into a hypercube network with 16 processors at each node (corner) of the hypercube. The wires of the hypercube network are shared by the scan instructions and the permutation instructions.

We now consider how the general architecture of a V-RAM can be mapped onto the Connection Machine. The scalar processor of the V-RAM can be mapped onto the front end of the Connection Machine, and the vector processor can be mapped onto the processor array of the Connection Machine. Since vectors in a V-RAM can be arbitrarily long, we must somehow simulate them on the Connection Machine, which has a fixed number of processors. In the following discussion we first discuss how the vector memory of a V-RAM is mapped onto the Connection Machine and then discuss how the specific instructions of the scan vector model are mapped onto the Connection Machine.

12.1.1 The Vector Memory

To support the vector memory on the Connection Machine, arbitrarily long vectors must be mapped onto the fixed number of processor memories of the Connection Machine.

Figure 12.2: An illustration of how the vector memory is simulated on the Connection Machine. The virtual vector memory is stored on the front-end computer and contains pointers to slices of the actual Connection Machine memory.

This is implemented using a level of indirection. Each location of the virtual vector memory is a structure which contains a pointer to a physical location in the Connection Machine memory where the vector is actually stored (see Figure 12.2). It also contains other information, including the length and the type of the vector. The virtual vector memory is stored on the front-end computer.

Since the vectors can be longer than the number of processors in the Connection Machine, each vector might take up many slices of the Connection Machine memory. A vector of length m on an n-processor machine will require ⌈m/n⌉ slices of memory. Since vectors in general are not a multiple of n long, the left-over n⌈m/n⌉ − m elements are left unused. The elements within a vector are numbered so that contiguous indices are within the same processor. For a vector of length m on an n-processor machine, elements 0, ..., (⌈m/n⌉ − 1) are in processor 0, elements ⌈m/n⌉, ..., (2⌈m/n⌉ − 1) are in processor 1, and so forth.

To implement the vector memory we need some sort of memory management, because it might be necessary to allocate space in the Connection Machine memory when we write a vector into the virtual vector memory. If the vector we write is the same length as the previous vector stored at that location in the virtual vector memory, then we can simply write over the old version. If, however, the vector is longer, the vector might require more slices of the Connection Machine memory and not fit in the position it was previously at. We therefore require some sort of memory management that allocates new space in the Connection Machine memory when writing into a location of the virtual vector memory, and that deallocates the space when another vector is overwritten. As with other memory management systems, this system might have problems with fragmentation of the memory, and might be required to occasionally relocate vectors. If vectors are allocated on a stack within the virtual vector memory, the allocation and deallocation is straightforward, since we can keep an analogous stack in the Connection Machine memory and slices of memory can always be allocated and deleted from the top of the stack.

This general technique of mapping arbitrarily long vectors using a level of indirection can be used for a much broader class of machines than just the Connection Machine.

12.1.2 The Instructions

We now describe how the instructions of the scan vector model are implemented on the Connection Machine. The scalar instructions are simply executed on the front end and therefore need no explanation. All instructions with vector arguments take indices into the virtual vector memory rather than directly into the Connection Machine memory. When executed, each instruction calculates how many slices of memory each vector occupies and loops over these slices. We now describe each class of instruction.

Elementwise Instructions

The elementwise vector instructions take as input some set of vectors, each of the same length. Based on the length of the vectors and the size of the machine, they calculate how many slices of CM memory are taken by each vector. They then loop over the slices; on each slice, every processor loads its values from its local memory, executes the particular elementwise operation, and stores the result back into its local memory. The elementwise instructions never require communication among processors.
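The following is a minimal serial sketch of this slice loop for a hypothetical p+ instruction, using the contiguous element numbering described above (indices p·slices through p·slices + slices − 1 live on processor p); each iteration of the outer loop stands for one fully parallel machine step:

(defun simulate-p+ (a b nprocs)
  (let* ((m (length a))
         (slices (ceiling m nprocs))
         (result (make-array m)))
    (dotimes (s slices result)          ; one machine step per slice
      (dotimes (p nprocs)               ; all processors act at once
        (let ((i (+ (* p slices) s)))   ; element held by processor p
          (when (< i m)
            (setf (aref result i)
                  (+ (aref a i) (aref b i)))))))))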

Permutation Instructions

The permutation instructions are supported by the routing hardware of the Connection Machine. The router uses a packet-switched message routing scheme that directs messages along the hypercube wires to their destinations (see [53] for more details).

The router requires a processor address, and a location within each processor, to deliver each value. Before executing the route, we must therefore translate the destination indices from the index argument of the permutation instruction into two parts: the processor address, and the slice number within that processor. This can be executed by dividing each index by the number of slices, using the result as the processor address, and using the remainder as the slice number within that processor.

It is important for the permutation instruction that each processor has independent addressing: different processors can access different locations simultaneously. This is because when a value arrives at the destination processor, the processor must place it in the correct slice. The values arriving at different processors might belong in different slices. The CM-2 has such independent addressing but the CM-1 does not. When there are many slices the permutation instruction is inefficient on the CM-1.

Another trick used to improve the performance of the permutation instruction is for each processor to randomly select the elements to be sent from its memory rather than to serially loop over them. This prevents the problem of having many elements in one slice going to a single processor and congesting the routing hardware.
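A sketch of the index translation just described; COMMON LISP's floor returns both quotient and remainder, so a single call yields the processor address and the slice number:

(defun index->processor-and-slice (index vector-length nprocs)
  ;; returns two values: processor address, slice number
  (let ((slices (ceiling vector-length nprocs)))
    (floor index slices)))

;; e.g. for a vector of length 10 on 4 processors (3 slices each),
;; (index->processor-and-slice 7 10 4) ⇒ 2, 1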

Scan Instructions

The scan instructions are implemented on the Connection Machine using a mix of the binary tree and hypercube algorithms described in Chapter 13. The implementation uses the same hypercube wires as the router but does not use the routing hardware. The technique described in Section 3.7 is used when there are multiple memory slices.

Vector-Scalar Instructions

The insert and extract instructions are executed by translating the index argument of the instruction into a processor and slice number. This can be executed on the front end. The Connection Machine then selects the correct processor by broadcasting the processor address and having each processor compare itself to the address. Once a processor is selected, the value is read (extract) using the global-or wire, or written (insert) by broadcasting the value to the selected processor. Since the value will belong to a single slice, we need only do this for the appropriate slice. The distribute instruction is implemented by allocating the appropriate number of slices on the front end and then broadcasting the value to be distributed to all the slices.

12.1.3 Optimizations

This section describes some optimizations that can improve the performance of the scan vector primitives on parallel machines. These are straightforward generalizations of optimizations considered by researchers involved with implementing compilers for APL [1, 85, 52, 30].

The optimizations discussed all involve viewing the vector instructions in a wider context and changing the execution of each instruction based on this wider context. Further generalization of this work is an interesting topic for future research.

The first optimization we consider is based on the dragthrough optimization suggested by Abrams [1]. Consider the expression D ← (A + B) × (A − C), in which all the variables are vectors. The standard way to evaluate this in a vector model would be to add the two vectors A and B into a temporary vector, subtract the two vectors A and C into another temporary vector, and then multiply the two temporary vectors. Let us now consider the case that the vectors are long vectors (have more elements than processors): let's say that the vectors are of length m and there are p processors. In this case, when looping over the slices, instead of completely executing the two subexpressions before multiplying them, we could execute the whole expression for each slice—the looping could be moved from inside each vector operation to outside the whole expression. This transformed routine would require 2 temporary slices of the Connection Machine memory instead of the 2⌈m/p⌉ required by the original routine. It might also be possible to optimize the transformed routine by keeping each slice of A in a local register for both the first and second subexpression.
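A minimal sketch of the transformed routine, reusing the slice layout of the earlier simulation sketch; t1 and t2 are per-element temporaries standing in for the two temporary slices:

(defun dragthrough-example (a b c nprocs)
  ;; computes D = (A + B) * (A - C) with the loop over slices
  ;; moved outside the whole expression
  (let* ((m (length a))
         (slices (ceiling m nprocs))
         (d (make-array m)))
    (dotimes (s slices d)
      (dotimes (p nprocs)
        (let ((i (+ (* p slices) s)))
          (when (< i m)
            (let ((t1 (+ (aref a i) (aref b i)))    ; slice temp for A + B
                  (t2 (- (aref a i) (aref c i))))   ; slice temp for A - C
              (setf (aref d i) (* t1 t2)))))))))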

We now consider a variation of the dragthrough optimization. Consider taking the outer product of two vectors and then reducing along the second dimension. If the two original vectors are of length m1 and m2, the final result will be of length m1. The intermediate result of the outer product, however, contains m1 × m2 elements. If this operation is simulated on p processors, and p < m1 × m2, the required temporary memory can be reduced to O(p) by generating p elements of the outer product at a time and reducing this part along the way. This particular case is an example of a broad range of cases in which very large intermediate results need not be generated if the code is viewed in a wider context. This optimization in its full generality is much more complicated to implement than the simple expression-dragthrough optimization.

As a final optimization, consider the select-permute primitive when the default vector, D, is much larger than the data vector. The select-permute primitive is defined so that instead of side-effecting D, it makes a new copy with the inserted elements. To do this, the select-permute primitive might copy all the elements of D even though few of them have been changed. To avoid executing this potentially large copy, an optimizer might notice that the old vector D is never used again, so even though conceptually a new vector is being created, the implementation could just side-effect D. Such an optimization is often used by compilers for functional languages. As with the other optimizations, this optimization requires that a compiler look at the vector instructions in context.

                    Instruction
Slices     p+     permute   +-scan   extract
   1       20       500       500       30
   4       68      2100       620       30
  16      260      8800      1100       30
  64     1030     35000      3000       30

Table 12.1: Running times for a selection of primitives (one from each class). The times are for the CM-2 and are in microseconds. The times are all for 32-bit values.

12.1.4 Running Times

Table 12.1 tabulates the running times of some of the instructions of the scan vector model when executed on the Connection Machine (CM-2). The times are given for various different numbers of slices.

12.2 Simulating on P-RAM

In this section we discuss how the scan vector model can be simulated on a synchronous P-RAM. We show that a scan vector algorithm with a step complexity of s and an element complexity of e can execute on a p-processor EREW P-RAM with scan primitives with a time complexity t = O(e/p + s). We first prove a more general theorem and then prove the specific case. The general theorem is a variation of Brent's scheduling principle [28] for the simulation of circuits on the P-RAM model.

Theorem 3 If all the instructions of a V-RAM V for vectors of length l can be simulated on some P-RAM P with p processors with asymptotic complexity

t = O(⌈l/p⌉) ,    (12.1)

then any algorithm for V with an element complexity of e, and a step complexity of s, can be simulated on P with asymptotic complexity

t = O(e/p + s) . (12.2)

Proof: Let us denote the length of the vectors used on step i of the algorithm as l_i. By definition,

e = ∑_{i=0}^{s−1} l_i .    (12.3)

The total time taken to simulate the algorithm on P is just the sum over the time of each step. Using equations 12.1 and 12.3, this is:

t = O( ∑_{i=0}^{s−1} ⌈l_i/p⌉ )
  = O( ∑_{i=0}^{s−1} (1 + l_i/p) )
  = O( s + (1/p) ∑_{i=0}^{s−1} l_i )
  = O( s + e/p )    (12.4)  □

We now show the specific case.

Theorem 4 Any scan vector algorithm with an element complexity of e and a step complexity of s can be simulated on a synchronous EREW P-RAM with scan primitives with asymptotic complexity

t = O(e/p + s) .    (12.5)

Proof: We only need to show that equation 12.1 is true for simulating all the instructions of the scan vector model on a synchronous EREW P-RAM with scan primitives. In Section 3.7 we already showed this for the elementwise, scan, and permutation instructions. The simulation involved assigning each processor to a block of elements and looping over the elements in the block. We are left with the scalar and vector-scalar instructions. To implement the scalar instructions we dedicate an additional P-RAM processor to act as the scalar processor. The extract and insert vector-scalar instructions only require that the dedicated scalar processor execute a read or a write into the vector. The distribute scalar-vector instruction can be implemented with a copy operation (see Section 3.4). We have therefore covered all the instructions. □

Chapter 13

Implementing the Scan Primitives

This chapter discusses many practical issues involved with implementing the scan primitives on parallel hardware. It defines a circuit, at the logic level, for implementing two primitive scan operations: a +-scan and a max-scan on unsigned integers. It shows that all the other scan operations used in this book can be implemented with these two primitive scan operations. The most interesting of these is the implementation of the floating-point +-scan. The chapter also illustrates how some of the other scan operations, such as the segmented scans, can be implemented directly rather than being simulated. A direct implementation of these scans is likely to execute in half the time of a version based on the two primitives.

13.1 Unsigned +-Scan and Max-Scan

This section introduces a practical circuit for implementing unsigned integer versions of the +-scan and max-scan operations. It starts by reviewing the standard tree implementation of the scan operation [80, 65, 41, 72], then shows specifics of what hardware is needed at each unit of the tree to implement the two primitive scans, and argues that an actual implementation is very practical.

These two scan primitives can be used to implement all eight scan instructions of the scan vector model (the or-scan and and-scan on boolean vectors, and the +-scan, max-scan and min-scan on both integer and floating-point vectors). All but the floating-point +-scan involve simple bit manipulations. An unsigned min-scan can be implemented by inverting the source, executing a max-scan, and inverting the result. The signed versions of all the scans can be implemented by inverting the sign bit, executing the unsigned version, and inverting the sign bit of the result.


Figure 13.1: Parallel scan on a tree using the operator "+". The number in the block is the number being stored in the memory on the up sweep.

The floating-point max-scan and min-scan can be implemented by flipping the exponent and significand if the sign bit is set, executing the signed version, and flipping the exponent and significand of the result back based on the sign bit. The or-scan and and-scan can be implemented with a 1-bit max-scan and min-scan respectively. The implementation of the floating-point +-scan is described in Section 13.3.

13.1.1 Tree Scan

Before describing details of how a circuit is implemented, we illustrate a general technique for implementing the scan operation on a balanced binary tree for any binary associative scan operator ⊕. The technique consists of two sweeps of the tree, an up sweep and a down sweep, and requires 2 lg n steps. Figure 13.1 shows an example. The values to be scanned start at the leaves of the tree. On the up sweep, each unit executes ⊕ on its two children units and passes the sum to its parent. Each unit also keeps a copy of the value from the left child in its memory. On the down sweep, each unit passes to its left child the value from its parent and passes to its right child ⊕ applied to its parent and the value stored in the memory (this value originally came from the left child). After the down sweep, the values at the leaves are the results of a scan.

If the scan operator ⊕ can be executed with a single pass over the bits of its operand, such as integer addition and integer maximum, the tree algorithm can be bit pipelined. Bit pipelining involves passing the operands one bit at a time up the tree so that when the second level is working on bit n the first level works on bit n + 1. Such bit pipelining can greatly reduce the hardware necessary to implement the scan operations since only single-bit logic is required at each unit.

As an example of how bit pipelining works, we consider a bit-pipelined version of +-scan for n m-bit integers. This bit-pipelined scan starts by passing the least significant bit of each value into the leaf units of the tree. Each unit now performs a single-bit addition on its two input bits, stores the carry bit in a flag, propagates the sum bit to its parent in the next layer of units, and stores the bit from the left child in an m-bit memory on the unit. On the second step, the scan passes the second bit of each value into the leaf units of the tree while it propagates the least significant bit of the sums on the first layer to the second layer. In general, on the ith step, at the jth layer (counting from the leaves), the (i − j)th bit of the sum of a unit (counting from the least significant bit) gets propagated to its parent. After m + lg n steps, the up sweep is completed. Using a similar method, the down sweep is also calculated in m + lg n steps. The total number of steps is therefore 2(m + lg n). The down sweep can actually start as soon as the first bit of the up sweep reaches the top, reducing the number of steps to m + 2 lg n.
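The two-sweep technique is easy to express in software. The following is a minimal sketch of the scan computed by this circuit—an exclusive +-scan, in which each leaf ends up with the sum of the values to its left—over an array whose length is assumed to be a power of two; the array positions at stride d play the role of the tree units at each level:

(defun tree-plus-scan (v)
  ;; exclusive +-scan via an up sweep and a down sweep
  (let* ((n (length v))
         (a (copy-seq v)))
    ;; up sweep: each "unit" sums its two children
    (do ((d 1 (* 2 d)))
        ((>= d n))
      (do ((i (1- (* 2 d)) (+ i (* 2 d))))
          ((>= i n))
        (incf (aref a i) (aref a (- i d)))))
    ;; down sweep: pass the parent value left, and
    ;; parent + stored left-child value right
    (setf (aref a (1- n)) 0)
    (do ((d (floor n 2) (floor d 2)))
        ((< d 1))
      (do ((i (1- (* 2 d)) (+ i (* 2 d))))
          ((>= i n))
        (let ((left (aref a (- i d))))
          (setf (aref a (- i d)) (aref a i))
          (incf (aref a i) left))))
    a))

;; (tree-plus-scan #(1 2 3 4)) ⇒ #(0 1 3 6)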

13.1.2 Hardware Implementation of Tree Scan

We now discuss in more detail the hardware needed to implement the bit-pipelined tree scan for the two primitive scan operations +-scan and max-scan. In Section 13.2 we consider what additional hardware is needed to implement some of the other scans directly, such as the segmented scans.

Figure 13.2 shows an implementation of a unit of the binary tree. Each unit consists of two identical state machines, a variable-length shift register and a one-bit register. The control for a unit consists of a clock, a clear signal, and an operation specification, which specifies whether to execute a +-scan or a max-scan. The control signals are identical on all units. The units are connected in a balanced binary tree, as shown in Figure 13.1, with two single-bit unidirectional wires along every edge.

The shift register acts as a first-in first-out buffer (FIFO), with bits entered on one end and removed from the other. One bit is shifted on each clock signal. The length of the register depends on the depth of the unit in the tree. A unit at level i from the top needs a register of length 2i bits. The maximum length is therefore 2 lg n bits. The length of the shift register can either be hardwired into each unit, in which case different levels of the tree would require different units, or could be controlled by some logic, in which case all units could be identical, but the level number would need to be stored on each unit.

The sum state machine consists of three bits of state and a five-input, three-output combinational logic circuit.

Figure 13.2: Diagram of a unit needed to implement the tree algorithm. It consists of a shift register (which acts as a first-in first-out buffer), a one-bit register (a D-type flip-flop), and two identical sum state machines. These units are arranged in a tree as shown in Figure 13.1.

Figure 13.3 shows the sum state machine's basic layout and the required logic. The Op control specifies whether to execute a +-scan or a max-scan. Two bits of state are needed for the max-scan to keep track of whether A is greater, equal or lesser than B (if Q1 is set, A is greater; if Q2 is set, B is greater). The +-scan only uses one bit of state, to store the carry flag (Q1). The third bit of state is used to hold the output value (S) for one clock cycle.

To execute a scan on a tree of such units, we reset the state machines in all units with the clear signal and set the Op signal to execute either a +-scan or max-scan. We must tie the parent input of the root unit low (0). We then simply insert one bit of the operand at the leaves on each clock cycle. In a max-scan the bits are inserted starting at the most significant bit, and in a +-scan the bits are inserted starting at the least significant bit. After 2 lg n steps, the result will start returning at the leaves, one bit on each clock cycle. We do not even need to change anything when going from the up sweep to the down sweep: when the values reach the root, they are automatically reflected back down since the shift register at the root has length 0.

The total hardware needed for scanning n values is n − 1 shift registers and 2(n − 1) sum state machines. The units are simple, so it should be easy to place many on a chip. Perhaps more important than the simplicity of each unit is the fact that the units are organized in a tree. The tree organization has two important practical properties.

In each equation below, the first term (gated by Op) is for max-scan and the second term (gated by ¬Op) is for +-scan:

D1 = Op(Q1 + A·¬B·¬Q2) + ¬Op(A·B + A·Q1 + B·Q1)

D2 = Op(Q2 + ¬A·B·¬Q1)

D3 = Op(A·¬Q2 + B·¬Q1) + ¬Op(A ⊕ B ⊕ Q1)

Figure 13.3: Diagram of the sum state machine. It consists of three D-type flip-flops and some combinational logic. If the signal Op is true, the circuit executes a max-scan. If the signal Op is false, the circuit executes a +-scan. In the logic equations, the symbol ⊕ is an exclusive-or, and the symbol + is an inclusive-or. This state machine fits into a unit as shown in Figure 13.2.

First, only two wires are needed to leave every branch of the tree. So, for example, if there are several processors per chip, only a pair of wires is needed to leave that chip, and if there are many processors on a board, only a pair of wires is needed to leave the board. Second, a tree circuit is much easier to synchronize than other structures such as grids, hypercubes or butterfly networks. This is because the same tree used for the scans can be used for clock distribution. Such a clock distribution gets rid of the clock skew problem1 and makes it relatively easy to run the circuit extremely fast.

13.1.3 An Example System

We now consider an example system to show how the scan circuit might be applied in practice. We consider a 4096-processor parallel computer with 64 processors on each board and 64 boards per machine. To implement the scan primitives on such a machine, we could use a single chip on each board that has 64 inputs and 1 output and acts as 6 levels of the tree. Such a chip would require 126 sum state machines and 63 shift registers—such a chip is quite easy to build with today's technology. We could use one more of these chips to combine the pair of wires from each of the 64 boards.

If the clock period is 100 nanoseconds, a scan on a 32-bit field would require 5 microseconds. This time is considerably faster than the routing time of existing parallel computers such as the BBN Butterfly or the Thinking Machines Connection Machine. With a more aggressive clock, such as the 10-nanosecond clock being looked at by BBN for the Monarch2 [6], this time would be reduced to 0.5 microseconds—twice as fast as the best-case global access time expected on the Monarch.

In most existing and proposed tightly connected parallel computers [53, 86, 6, 102], the cost of the communication network is between 1/3 and 1/2 the cost of the computer. It is unlikely that the suggested scan network will be more than 1% of the cost of a computer.

13.2 Directly Implementing Other Scans

Although all the scans used in this book can be implemented with a small number of calls to the two primitive scans, in practice it might be beneficial to implement some of them directly; a direct implementation can save at least a factor of two in the execution time of the operations. In this section we look at the additional hardware needed to implement the backward scans and the segmented scans directly.

1 When there are many synchronous elements in a system, the small propagation-time differences in different paths when distributing the clock signals can cause significant clock time differences at the elements.

2 Because of the tree structure, it would actually be much easier to run a clock at 10 nanoseconds on a scan network than it is for the communication network of the Monarch.

13.2.1 Backward and Segmented Scans

To implement the backward scans directly, each node only needs the capability of switching its left and right children. This capability requires four 1-bit multiplexers—one for each child input, and one for each child output. A 1-bit multiplexer consists of two and gates and an or gate. To simulate backward scans using only forward scans requires a permute, which might be expensive. Since the simulation might be expensive, backward scans are used frequently, and the hardware required to implement them is minimal, in a real system the backward scans should be implemented directly.

Implementing the segmented scans directly is conceptually slightly more involved but requires little additional hardware. The solution we suggest only involves changing the sum state machine on each unit by adding a few terms to the logic and a single bit of state. In this solution, the segment bit is appended as the first bit of the values to be scanned and therefore requires no extra wiring. We first present a general technique for executing segmented scans and then show how this technique can be applied to our circuit.

A segmented version of a scan for any binary associative operator ⊕ can be implemented using an unsegmented scan on a new operator ⊕s [101]. Each argument to this new operator is a pair of values—the value to be scanned and a segment flag. The new operator is associative but not commutative, and is defined in terms of the original operator as follows:

define ⊕s(a, b) {
    flag  ← flag(a) or flag(b);
    value ← if flag(b) then value(b)
            else value(a) ⊕ value(b); }

The segmented scan is therefore implemented by passing around segment-value pairs and using the modified operator to sum them.

To use the technique on our circuit, we can pass the segment-value pair by appending the segment bit to the front of the value. We also need to append an additional header bit to the front of the segment-value pair. The header bit is always set to 1 and is used to tell the state machine that the next bit should be interpreted as a segment bit. As this header bit travels up the tree—one level on each step—it starts up all the state machines at that level when it reaches them. Implementing the segmented versions of + and max (+s and maxs) involves modifying the sum state machine. The new sum state machine requires an extra bit of state and the combinational logic needs to be slightly more complicated. Figure 13.4 shows the state diagram for the new sum state machine. Generating the logic equations from the state diagram is straightforward.
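The operator transformation can be tried out directly in software. Below is a minimal sketch in which op-s implements ⊕s for ⊕ = + on (value . flag) pairs, and segmented-plus-scan computes an inclusive segmented +-scan with a serial left-to-right sweep (since ⊕s is associative, the same operator could drive the tree scan of Section 13.1.1); flags are T at the start of each segment:

(defun op-s (a b)
  ;; a, b: (value . flag) pairs; the ⊕s of the text with ⊕ = +
  (cons (if (cdr b)
            (car b)
            (+ (car a) (car b)))
        (or (cdr a) (cdr b))))

(defun segmented-plus-scan (values flags)
  (let ((acc nil)
        (out '()))
    (loop for v in values
          for f in flags
          do (let ((pair (cons v f)))
               (setf acc (if acc (op-s acc pair) pair))
               (push (car acc) out)))
    (nreverse out)))

;; (segmented-plus-scan '(1 2 3 4 5) '(t nil t nil nil))
;; ⇒ (1 3 3 7 12)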

Figure 13.4: The state diagram for a segmented max-scan. The state machine stays in state 0 until a 1 comes along on either input. This 1 is the header bit and is used to specify that the next bit is the segment bit. If the next bit of the right child (b) is 1, then the state machine goes into state 2 and henceforth outputs b. If the bit is a 0, then it goes into state 3 and outputs the maximum of a and b. The states 3, 4 and 5 are identical to the states of the unsegmented version. The clear control signal brings the machine back to state 0.

13.2.2 Multidimensional Grid Scans

The grid scans can also be implemented directly. This can significantly increase performance in algorithms based on dense matrices, such as many of those discussed in Chapter 8. We are not going to discuss in detail how they are implemented, but refer the reader to [4]. Because they require a more complex network, grid scans are inherently more expensive, and in general should not be expected to run as fast as one-dimensional scans. They should, however, still run faster than general memory references.

13.3 Floating-Point +-Scan

In this section we discuss how a floating-point +-scan can be simulated with the two primitive scans. As mentioned at the beginning of the chapter, floating-point +-scan is the only one of the eight scan operations that involves more than trivial bit manipulations. One problem with implementing a floating-point +-scan is that the floating-point + operator applied to fixed-precision floating-point numbers is not truly associative. For example, with single-precision IEEE floating-point numbers,

1.0e10 + (−1.0e10 + 1.0) = 0

whereas

(1.0e10 + −1.0e10) + 1.0 = 1.0 .

The goal for an implementation of floating-point +-scan on fixed-precision floating-point numbers is to return results which are as close as possible to the results that would be returned by using infinite-precision floating-point numbers.

In implementing a floating-point +-scan we are mostly concerned with the accuracy of the +-reduce built from the +-scan. This is because the floating-point +-scan is used almost exclusively to implement a +-reduce.3 We will show that the technique we describe returns quite accurate results when used to implement a floating-point +-reduce. It returns a sum that is at most one bit off, in the lowest bit of the significand, from the result given by summing the values into an infinite-precision accumulator. This result is significantly better than the result given by serially summing a set of values into an accumulator which has the same precision as the numbers themselves.

The basic idea of the technique is to denormalize all numbers based on the maximum exponent, to use an unsigned +-scan on these denormalized values, and then to renormalize the results. The following discussion assumes a signed-magnitude representation of

3 The only direct use of a floating-point +-scan I have come across is for block filtering [71].

floating-point numbers. We use the symbol e for the exponent of each number, and the symbol m for the significand.

1. Find the maximum exponent and distribute it to all the elements of the vector.

This can be executed with a max-distribute. We will refer to the result as emax.

2. For each element, denormalize the significand relative to emax. This consists of exposing the hidden bit and shifting the bits of each significand right by emax − e. If the vector is of length n, and the significand is d bits, we denormalize into a bit field of size d + lg n bits. This extra lg n is important. If

(emax − e) > lg n,

some bits are dropped.

3. Invert the negative numbers so that they are in signed integer representation.

4. Apply a signed +-scan to these integers.

5. Invert the negative numbers of the result so that all numbers are again in signed magnitude representation.

6. Renormalize each number, again relative to emax.

To implement a floating-point +-reduce directly, an integer +-reduce can be used in step 4 instead of the integer +-scan.
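As a concrete rendering of steps 1 through 6, the following is a minimal sketch in plain Common Lisp (my illustration, not the book's vector code), assuming a nonempty vector of nonzero finite floats. Exact integers stand in for the d + lg n bit field, integer-decode-float exposes the significand and exponent, and the serial loop plays the role of the max-distribute and the integer +-reduce.

;; A sketch of the denormalize/sum/renormalize +-reduce.  ASH models
;; the fixed-point bit field, so bits shifted below the guard region
;; are dropped, just as in the algorithm above.
(defun fp-plus-reduce (values)
  (let* ((n (length values))
         (pad (integer-length (max 1 (1- n))))     ; the lg n guard bits
         (decoded (map 'list
                       (lambda (x)
                         (multiple-value-list (integer-decode-float x)))
                       values))
         ;; Step 1: the maximum exponent (a max-distribute in the model).
         (emax (reduce #'max decoded :key #'second))
         ;; Steps 2-4: shift each significand right by emax - e (left by
         ;; the guard bits first), apply the sign, and sum the integers.
         (sum (loop for (m e s) in decoded
                    sum (* s (ash m (- pad (- emax e)))))))
    ;; Steps 5-6: renormalize relative to emax; FLOAT rounds the wide
    ;; integer result back to the significand's precision.
    (scale-float (float sum 1.0) (- emax pad))))

For example, (fp-plus-reduce #(1.0 2.0 3.0)) returns 6.0.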

Theorem 5 The result of a floating-point +-reduce implemented as described is accurate to within one bit in the least-significant bit of the significand (relative to emax).

Proof: The inaccuracy of the total sum is at most the difference between the maximum and minimum possible sum of the lost bits—the bits that get shifted off the end when the numbers are denormalized. Since we are adding n values, and the padding region is of length lgn, the sum of the lost bits cannot reach into the significand region (see Figure 13.5). This sum, therefore, cannot affect the significand relative to emax by more than one bit. 
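To spell out the arithmetic behind this (a worked restatement under the assumptions of step 2, taking the smallest kept bit to be worth 2^(emax − d − lg n) for a d-bit significand), each element's lost bits form an error εi with |εi| < 2^(emax − d − lg n), so over the n elements

\[
\Bigl|\,\sum_{i=1}^{n} \varepsilon_i \Bigr| \;<\; n \cdot 2^{\,e_{\max}-d-\lg n} \;=\; 2^{\,e_{\max}-d},
\]

which is one unit in the least-significant bit of the significand relative to emax.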

If there is a balance of negative and positive values, the leftmost nonzero bit might be within the left line. In this case, the loss of accuracy relative to the final exponent (rather than emax) could be more than a bit. This problem with a balance of positive and negative numbers appears in almost all schemes for adding floating-point values (see the example at the beginning of the section).

Figure 13.5: An illustration of the accuracy of a floating-point +-reduce. When we denormalize in step 2, each significand gets shifted to the right by the difference between the maximum exponent and its own exponent. Numbers with an exponent more than lg n less than the maximum exponent will have some of their bits pushed over into the lost-bits region, which, in our algorithm, will be dropped. At the bottom, we show the maximum and minimum possible values of the sum of all the lost bits if we had kept them.

Chapter 14

Conclusion

This book introduced the parallel vector models. These models served both as algorithmic models on which to analyze the complexity of parallel algorithms and as a virtual machine model on which to compile high-level languages.1 From the algorithmic side, the book showed how a broad set of algorithms—including sorting, graph, computational-geometry, and numerical algorithms—can be implemented and analyzed on parallel vector models. From the virtual machine side, the book showed how a high-level language, PARALATION LISP, can be compiled onto the instruction set of a parallel vector virtual machine. The book also showed how a parallel vector model can be implemented on a real machine, the Connection Machine, and how it can be simulated on another algorithmic model, the P-RAM model.

The parallel vector models obliterate the view of a parallel model as a set of independent communicating serial processors, and instead treat parallel operations as inseparable operations over collections of values. Treating operations as inseparable is similar to the way word operations are treated in the serial RAM model. The RAM model does not treat word operations, such as the addition of two words, in terms of a set of single-bit units and the logic operations among those units, but instead treats them as inseparable atomic operations. As the serial RAM model is a higher-level model than models based on manipulating bits, such as a Turing machine, the parallel vector models are higher-level models than parallel models based on a set of communicating serial processors, such as the P-RAM model. I claim that this higher level makes algorithms implemented on the parallel vector models easier to describe, program and analyze; on the other hand, understanding parallel

1When trying to understand the motivation for having a model that naturally serves both purposes, keep in mind that part of the reason for the success of the serial RAM model as an algorithmic model was that it closely modeled the instruction sets of real machines and therefore the target code of real compilers.

vector models involves a new mind-set—a mind-set that takes some effort to accept and only slips into place once the pieces are together.

The remainder of this conclusion contains a section listing the contributions of this book, a section on directions for future research, a section on implementation goals, and a section describing how the ideas in this book developed.

14.1 Contributions

The contributions of this book, by part, can be summarized as follows:

Models

• A formal definition of the parallel vector models, the complexity measures that accompany the models, and an illustration of the power of the models. The models were defined in Chapter 2 and their power was illustrated throughout the book.

• A careful analysis of the consequences of including a set of scan operations as “unit time” primitives in the P-RAM models. Section 3.1 summarized these consequences. Every algorithm in this book used the scan primitives in some way.

• The definition of segmented versions of all the instructions of a vector model, and an illustration of the broad applicability of these segmented instructions. The segmented versions of the instructions of the scan vector model were defined in Section 4.3 and the applicability was illustrated throughout the book.

Algorithms

• The definition of how graphs and trees can be mapped onto vectors so that they can be efficiently manipulated by the primitive vector operations. Also, the definition of a set of operations on these data structures (Chapter 5).

• The definition of many algorithms and algorithmic techniques. The most original algorithms included the halving-merge (Section 3.7.2), the minimum-spanning-tree (Section 7.1), and the binary-search (Section 6.1) algorithms.

Languages

• A careful comparison of the collection-oriented languages by placing them in a unified framework based on the collection types and the collection operations they supply (Chapter 9).

• The definition and proof of the replicating theorem and the formal presentation of the notions of replicating and flattening nested parallelism (Chapter 10). Also, the introduction of the branch-packing technique and of contained programs.

• The implementation of the PARALATION LISP compiler (Chapter 11). The interesting techniques used by the compiler included the simple implementation of nested parallelism with the stepping-up and stepping-down manipulations.

Architecture

• The implementation of the parallel vector models on the Connection Machine and the simulation on the P-RAM (Chapter 12).

• The definition of a logic circuit to implement both the unsegmented and segmented versions of the +-scan and max-scan operations on unsigned integers (Chapter 13).

• The implementation of the segmented versions of the scan primitives with a constant number of calls to the unsegmented versions (Section B.2.3).

• The implementation of the floating-point +-scan with the unsigned version of the +-scan and max-scan primitives (Section 13.3).

14.2 Directions for Future Research

This section outlines some directions for future research on topics related to this book.

Modifying the model to include locality: The assumption in the scan vector model that all permute operations on equal length vectors take equal time, as with the assumption in the serial RAM model that all memory references take equal time, has a problem in practice—the assumption ignores locality. In real machines, certain local communication patterns will always be faster than other non-local patterns. Aggarwal, Chandra and Snir have proposed models for taking account of locality in the serial RAM models [2] and suggested that the models could be extended to the P-RAM models. Ideas from these models might be transferred to the parallel vector models. Leiserson and Maggs have also proposed parallel models that take account of locality [69, 70]. These ideas might likewise be transferred to the parallel vector models.

Analyzing how wide serial control can be stretched: As mentioned in Section 2.2, almost all algorithms found in the literature can be mapped efficiently onto machines or models with serial control. But what about real world problems? What about compilers, text formatters and database applications? I believe that with the replicating theory in hand, many of these problems can be mapped naturally onto serial control. This, however, needs to be backed up with solid evidence. A careful study of a broad range of applications to see if they can be mapped onto serial control would be very useful.

Modifying the model to include parallel control: An interesting machine to consider is a machine with multiple control streams each of which controls a parallel vector machine. What would such a machine be useful for? Would it only allow a fixed number of control streams or could the number of control streams change dynamically? Could it allocate control streams like the elements of a vector in the parallel vector models? How would data from different control streams interact?

Analyzing various language features: Should a collection-oriented language be strongly typed? Should it include dynamic allocation? Should it be functional? Should it allow the user to access lower-level features such as the segmented scans? The fact that these languages manipulate whole collections might favor different choices than for a standard serial language. One important observation is that since the operations operate on large sets of data, the relative cost of dynamic type checking and memory allocation is lower than for serial languages—the checks and allocation only have to be applied once per collection instead of once per scalar.
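To make the amortization concrete, here is a minimal Common Lisp sketch (my illustration, not code from the book) in which the dynamic type dispatch happens once for a whole vector rather than once per element:

;; ETYPECASE dispatches on the vector's type a single time; the
;; specialized MAP then runs with no further per-element checks.
(defun elementwise-add (a b)
  (etypecase a
    ((vector single-float) (map '(vector single-float) #'+ a b))
    (vector                (map 'vector #'+ a b))))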

Analyzing various parallel vector instructions: As discussed in Chapter 4, the parallel vector models permit a broad range of primitives. There are many interesting questions regarding the selection of a set of primitives. How do various primitives affect the complexity of various algorithms? What is the complexity of simulating one set of primitives with another?

Modifying the model for the benefit of small changes to large data sets: In a parallel vector model, if a small number of changes are made distributed over a large data set, the cost (the element complexity) is proportional to the size of the large data set. Is this an accurate measure? Should the element complexity always be proportional to the longest vector when vectors of greatly different length are involved in a vector operation?

14.3 Implementation Goals

This section lists some future implementation goals. The purpose of these goals is both to give stronger support to the claims made in this book and to make available to the research community systems on which to write parallel algorithms and to have those algorithms run on a broad variety of machines.

To implement the scan vector instruction set on more machines: I am currently in the process of implementing it on the Cray X-MP and am looking at possibly implementing it on an iWarp [25]. To get a broad range of machines, the instruction set should also be implemented on some coarse-grained parallel machines, such as the Encore Multimax and Alliant FX, and on some hypercube machines such as the Intel iPSC/2, and optimized versions of the instruction set should be implemented on various serial machines.

To collect more timing comparisons: It would be interesting to get more data on how the run-time of algorithms written in a high-level language and compiled through the scan vector model compares to algorithms optimized for particular machines. This should be tested over a broad range of algorithms and applications so that application-specific deficiencies of the model might be identified.

To implement a complete Paralation-Lisp compiler: A complete and properly documented version of the PARALATION LISP compiler would be a very powerful tool that would allow researchers to quickly implement code to run on parallel machines.

To implement compilers for other languages: A compiler for SETL, for example, would be very interesting. An important issue that arises when implementing a compiler for languages such as SETL is how to extract parallelism that is not directly stated. For example, consider the PARALATION LISP and SETL definitions of quicksort shown in Figure 14.1. In the PARALATION LISP version, the parallel application of the two quicksorts is explicit in the elwise, whereas in the SETL version, a compiler would need to extract the two quicksorts and schedule them to be applied together.

Implementing a broader range of applications: Although this book describes many algorithms, it does not describe any full applications. Paul Resnick is currently implementing a simple text formatter using the scan vector instruction set. It would also be interesting to implement a compiler, some database applications, and other commonly used applications.

14.4 Development of Book Ideas

It is important to outline the history of the ideas in a dissertation so that the casual reader can understand the motivation, the admirer can understand the methodology, and the skeptic can understand where the research went astray. This section presents such a history of the ideas in this book (the book is a slightly modified version of my Ph.D. dissertation). While employed at Thinking Machines during the summer of 1985, Abhiram Ranade

(defun quicksort (keys)
  (if (< (value-count keys) 2)
      keys
      (let* ((pivot (elt keys 1))
             (side (elwise (keys)
                     (if (< keys pivot) 0 (if (= keys pivot) 1 2))))
             (piles (collect keys :by (match (make-paralation 3) side))))
        (expand (elwise ((pile piles)) (quicksort pile))))))

Quicksort in PARALATION LISP

proc quicksort(keys);
  if #keys < 2 then
    return keys
  else
    pivot := keys(1);
    lesser-pile := [keys: keys = t(i) | keys < pivot];
    equal-pile := [keys: keys = t(i) | keys = pivot];
    greater-pile := [keys: keys = t(i) | keys > pivot];
    return quicksort(lesser-pile) + equal-pile + quicksort(greater-pile);
  end;
end proc quicksort;

Quicksort in SETL

Figure 14.1: The code for quicksort in PARALATION LISP and in SETL.

and I implemented the enumerate instruction (see Section 4.2) as part of the original instruction set of the Connection Machine (CM-1). The instruction was seen as useful to implement the cons instruction, used to allocate elements to unused processors (see [31]). While implementing the enumerate instruction, I realized that it would run faster than the send instruction, used to communicate among processors. I also realized that with small modifications of the code used for the enumerate instruction, I could implement general +-scan and max-scan instructions; so in January of 1986 I did. On top of these two instructions I implemented the min-scan, or-scan, and and-scan, and also implemented the segmented versions and the floating-point versions of the scans. These were implemented as described in Chapter 13 and Appendix B. I also implemented the split radix sort described in Section 3.4.1, which is currently the sort supplied by the CM-1 and CM-2 instruction set. With the support of Guy Steele, the scan instructions and the sort were added to the instruction set of the CM-1 before the first CM-1 was delivered during the summer of 1986.

During the summer of 1986, other people and I found a great number of uses for the scan instructions. Often the instructions led to great performance benefits for applications. For example, we sped up a SPICE circuit simulator, a network learning algorithm [23], a rule-based system [17, 18], and several numerical tasks such as a linear-systems solver. During the year the idea of segments and segmented scans also became very popular among Connection Machine programmers, and I implemented a set of utilities on top of *LISP [66] (the most commonly used Connection Machine language) for dealing with segments.

I then made my first attempt at abstracting the idea of a scan primitive away from the Connection Machine to see if it applied to a broader class of machines. I realized that the scan primitive, under almost any practical assumptions, on any reasonable architecture, would always be cheaper to implement and run faster than a permutation primitive. In this first attempt at abstracting this observation, I basically took the P-RAM model and included on top of it a set of “unit-time” scan primitives [19, 20]. Based on this model I showed how the asymptotic running time of many P-RAM algorithms could be improved by an O(lg n) factor [22].

The CM environment at this time supported so-called “virtual processors”. The idea of virtual processors was that the user could specify the number of processors they wanted, and if that many physical processors were not available, the software would simulate them by time slicing each physical processor. The problem with virtual processors was that the number of processors needed to be specified at cold-boot time, and in many applications, the number of processors needed would change dynamically and frequently. To resolve this problem, in January of 1987 I suggested what I now call parallel vector models. Jim Salem and Guy Steele also suggested changes to the environment to resolve the same problem. Cliff Lasser took the various ideas, added more of his own and made a specification for a new model of “virtual processors”, which has now been implemented.
This specification was a compromise among the various ideas and is not exactly the parallel vector model described in this book, but for most practical purposes it is the same. The main difference is that vectors of the same length are grouped into so-called virtual processor sets. After suggesting the parallel vector models, I realized that the scan primitives would fit much better into a parallel vector model than into the P-RAM model. This was mainly because the scans imply a single source of control: all processors (elements) must be executing a scan operation at the same time.

Having spent a lot of time implementing algorithms and applications, I had become quite interested in parallel languages—I wanted to reduce the time required to implement applications and algorithms on the Connection Machine. A language issue that particularly interested me, since it had turned up over and over again, was what I call nested parallelism. Up to this point, the notion of segments had been used to implement nested parallelism on the Connection Machine. Maintaining segments, however, required a lot of work on the programmer's part and often made programs hard to debug. On the other hand, the two languages PARALATION LISP and CM-LISP both allowed the direct expression of nested parallelism, but, unfortunately, neither of their implementations took advantage of it: they would run each subcollection in parallel but serially loop through the subcollections. I wanted to merge the ideas of nested parallelism and segments. During the summer of 1987 I implemented a compiler for a subset of PARALATION LISP which would map nested parallelism onto segments, thus relieving the programmer of having to deal with the segments directly, while still taking full advantage of the expressed nested parallelism. While implementing the compiler, I realized that if I organized things correctly I would only need to implement a single version of all the operators regardless of whether they were used in a nested call or not. This is where the replicating theorem originated.

Many of the ideas in this book were cleaned up over the next year while I was organizing and writing my dissertation. Since the dissertation, the following has been changed for this book: (1) I rewrote the introduction; (2) I added the section on containment in Chapter 10; and (3) I added the preface and the index. These changes were made during my first year at Carnegie Mellon.

Appendix A

Glossary

Access-Fixed Code: Code that uses any of the scan vector instructions except three: the cond-jump, move-scalar and move-vector instructions.

Access-Restricted Code: Code that uses the scan vector instructions but abides by some restrictions on the use of the cond-jump, move-scalar and move-vector instructions.

Backward Scan: A scan operation that starts at the last element of a vector and goes to the first.

Collection: A group of elements viewed as a whole.

Collection Oriented: When used to refer to an algorithmic model or a language, the term signifies that the model or language centers around collections of values, and a set of operations for manipulating the collections as a whole.

Contained Program: A program that satisfies certain rules needed to prove the step com- plexity bound in the replicating theorem.

Element Complexity: A time complexity measure for the parallel vector models. It is the sum, over the steps, of the lengths of the vectors manipulated in each step, and corresponds roughly to the serial complexity.

Element Space Complexity: A space complexity measure for the parallel vector models. It is the sum of the vector lengths over the locations used in the vector memory.

Elementwise: Signifies applying an operation independently to each of a collection of elements.


Enumerate: An operation that takes a vector of boolean flags and returns, to the ith true element, the integer i.

Equal Time Assumption: For the parallel vector models, it is the assumption that all primitives take equal time on equal-length vectors.

Flat Collection: A collection whose elements are all scalars.

Flat Parallel Construct: A scalar operation applied in parallel over a set of elements or processors.

Flattening: When used to refer to a collection or to parallelism, the term signifies taking a nested collection or a nested parallel construct and turning it into a flat collection or flat parallel construct.

Grid-Ordered Collection: A multi-dimensionally ordered collection: for a collection with n elements, each element is associated with a unique vector of d non-negative integers (the index along each dimension) such that the product of one more than the maximum integer in each position over the elements (the product of the range of each dimension) is n. Also called a dense array.

Homogeneous: When used to refer to a vector or collection, the term signifies that all the elements of the vector or collection are of the same type.

Linear-Ordered Collection: A one-dimensionally ordered collection: for a collection with n elements, each element is associated with a unique integer between 0 and n − 1. Also called a vector.

Long Vector: When used to refer to mapping a vector onto a fixed set of processors, the term signifies a vector with more elements than there are processors.

Nested Collection: A collection whose elements are all themselves collections.

Nested Parallel Construct: A parallel operation applied in parallel over a set of elements— each application is itself parallel.

Pack: An operation that takes a vector of n elements and a vector of n flags, m of which are true (m ≤ n). It returns a new vector with m elements such that elements in positions where a flag was false are removed, and the remaining elements maintain their order.

Parallel Random Access Machine (P-RAM): Defined at the end of the glossary.

Parallel Vector Model: Any of a class of algorithmic models based on a set of operations on vectors. 219

Prefix Computation: Another name for a scan operation.

Processor-Oriented: When used to refer to an algorithmic model or a language, the term signifies that the model or language centers around a collection of processors, a set of local operations on each processor, and a set of operations for communicating among the processors.

Processor-Step Complexity: In a model with a fixed number of processors, such as the P-RAM model, it is the number of steps multiplied by the number of processors.

Replicating: A technique for translating a parallel routine that executes an operation on a set of data, into another routine that executes the same operation over many sets of data in parallel.

Scan Operation: Another name for a prefix computation.

Scan Vector Model: A parallel vector model with three classes of primitive instructions— elementwise instructions, scan instructions, and permutation instructions.

Segment Descriptor: A structure used to describe the segmentation of a segmented vec- tor.

Segmented Vector: A vector that is segmented into contiguous blocks.

Serially Linear Model: A model in which all the primitives are serially linear.

Serially Linear Primitive: A primitive of a parallel vector model that when applied to a vector of length n will take O(n) time to simulate on a RAM model.

Serially Time Optimal: A parallel vector algorithm whose asymptotic element complex- ity is equal to the optimal serial algorithm.

Simple Vector: A vector of scalar values.

Step Complexity: The number of steps taken by an algorithm.

Vector: A linear-ordered collection.

Vector Memory: The part of a V-RAM used to store vectors. Also called the virtual vector memory.

Vector Model: A shortened name for parallel vector model.

Vector Random Access Machine (V-RAM): A machine architecture based on parallel vector models.

Vector Space Complexity: A space complexity measure for parallel vector models. It is the number of locations used in the vector memory.

Virtual Vector Memory: The apparent vector memory seen by a user of a V-RAM. This must be mapped onto a physical vector memory on a real machine.

History of the Scan Operations

A parallel circuit to execute a particular scan operation was suggested by Ofman in the early 60s [80] to be used to add binary numbers—the following routine executes addition on two binary numbers with their bits spread across two vectors A and B:

(A ⊕ B) ⊕ seg-or-scan(A ∧ B, A ⊕ B)
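In words: with generate bits g = A ∧ B and propagate bits p = A ⊕ B, the carry into each position obeys the recurrence c[i+1] = g[i] ∨ (p[i] ∧ c[i]), which the segmented or-scan evaluates in parallel, and each sum bit is then p ⊕ c. A serial Common Lisp rendering of that recurrence (a sketch of the idea, not of the circuit) is:

;; Adds two binary numbers given as bit vectors, least-significant bit
;; first; the final overflow carry is dropped.
(defun add-bit-vectors (a b)
  (loop with carry = 0
        for ai across a
        for bi across b
        for p = (logxor ai bi)            ; propagate
        for g = (logand ai bi)            ; generate
        collect (logxor p carry) into sum-bits
        do (setf carry (logior g (logand p carry)))
        finally (return (coerce sum-bits 'vector))))

For example, (add-bit-vectors #(1 1 0) #(1 0 0)) returns #(0 0 1), that is, 3 + 1 = 4.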

A general scan operator was suggested by Iverson in the mid 1960s for the language APL [61].1 A parallel implementation of scans on a perfect shuffle network was suggested by Stone [109] to be used for polynomial evaluation—the following routine evaluates a polynomial with a vector of coefficients A and variable x at the head of another vector X:

A × ×-scan(copy(X))
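That is, a running-product scan of copies of x yields the powers 1, x, x^2, ..., which are combined elementwise with the coefficients and summed. A serial Common Lisp sketch of the same computation (my illustration):

;; Evaluates a0 + a1*x + a2*x^2 + ...; POWER plays the role of the
;; ×-scan over copies of x.
(defun poly-eval (coeffs x)
  (loop with power = 1
        for a across coeffs
        sum (* a power)
        do (setf power (* power x))))

For example, (poly-eval #(1 2 3) 2) returns 17.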

Ladner and Fischer first showed an efficient general-purpose circuit for implementing the scan operations [65]. Wyllie first showed how the scan operation can be executed in parallel on a linked list [120] (this implementation is based on the P-RAM model). Brent and Kung, in the context of binary addition, first showed an efficient VLSI layout for a scan circuit [29]. Schwartz [101] and, independently, Mago [74] first suggested the segmented versions of the scans. More recent work on implementing scan operations in parallel includes the work of Fich [41], which demonstrates a more efficient implementation of the scan operations, and of Lubachevsky and Greenberg [72], which demonstrates the implementation of the scan operation on asynchronous machines. As concerns terminology, scan is the original name given to the operation by Iverson. Ladner and Fischer introduced the term parallel prefix operation. Schwartz used the term all partial sums. I find the original term, scan, more concise and flexible—it, for example, can be used as a verb, as in “the algorithm then scans the vector” or “after scanning twice...”.

1The history of the scan operator in APL is actually quite complex. It did not appear in the original definition [59], but appears in some but not all subsequent definitions.

P-RAM Models

The Parallel-RAM (P-RAM) models [42, 101, 104, 48, 49], also known as shared-memory models, are probably the most used algorithmic models of parallel computation. As with the RAM model and the vector models, the P-RAM models are based on a machine architecture. The general machine architecture consists of a set of standard RAM (random access machine) processors connected to a shared memory. Each processor has its own instruction interpreter, and all processors can read and write to the shared memory. In the synchronous P-RAM models, it is assumed that all the processors are fed by a global clock. There are several variations of the synchronous P-RAM models which differ in how they treat concurrent access to a single memory location. The exclusive-read exclusive-write (EREW) P-RAM model does not permit concurrent reads or writes to a single location in shared memory: it is an error to execute such an operation. The concurrent-read concurrent-write (CRCW) P-RAM models on the other hand do allow concurrent reads and writes. The CRCW P-RAM models can be further divided depending on the result when several values are written to the same location concurrently. Surveys of the variations can be found in [118, 51]. Probably the most powerful P-RAM model suggested in the literature is the fetch-and-op P-RAM model (see [50, 49, 93]).

Appendix B

Code

This appendix gives the code for many of the routines described in this book. It includes the code for (1) the operations defined in Section 4.2, (2) the translations between the different segment-descriptors described in Section 4.3, (3) the simulation of the segmented primitives with the unsegmented primitives, (4) the implementation of the pack and flag-merge needed for implementing the segmented conditionals discussed in Section 10.2, (5) the simulation of the scan instructions on the two scan primitives discussed in Chapter 13, and (6) various other routines described in this book. The code is written in SV-LISP (see Section 11.2) and all the routines in this appendix have been tested. The only operations or forms used in this appendix which are not described in Section 11.2 or in the COMMON LISP manual are append-bits, extract-bits and over-elements. We briefly describe each of these. The append-bits function takes two integer arguments and lexically appends the bit representation of one integer to the bit representation of the other. The result is therefore an integer with twice as many bits. In the implemented version, regular integers use 32 bits, and appended integers use 64 bits. The extract-bits function strips the high bits off of a long integer leaving only the low bits. The over-elements form is used to specify that the code inside the form is to be applied over the elements of a vector. It takes two arguments—a list of bindings and a body. Each binding is a free variable followed by a vector. All the vectors must be of the same length. For example, the form:

(over-elements ((element-a vector-a)
                (element-b vector-b))
  (+ element-a element-b))

will elementwise add the elements of vector-a and vector-b.
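For reference, append-bits and extract-bits could be rendered in portable Common Lisp roughly as follows (an assumed sketch using the 32-bit fields mentioned above, not necessarily the book's implementation):

;; Lexically append two 32-bit integers into one 64-bit integer, and
;; strip the high half back off.
(defun append-bits (high low)
  (logior (ash (ldb (byte 32 0) high) 32)
          (ldb (byte 32 0) low)))

(defun extract-bits (appended)
  (ldb (byte 32 0) appended))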

B.1 Simple Operations

;; Converts a vector of boolean flags to ones and zeros. (defop flag-to-number (flags) (over-elements ((f flags)) (select-value f 1 0)))

;; Numbers the elements with their flag set to T. (defop enumerate (flags) (plus-scan (flag-to-number flags)))

;; Given a length, returns an index vector of that length. (defop index (length) (plus-scan (distribute 1 length)))

;; Sums the elements of a vector. (defop plus-reduce (values) (+ (extract (plus-scan values) (1- (length values))) (extract values (1- (length values)))))

;; Finds the maximum element of a vector. (defop max-reduce (values) (max (extract (max-scan values) (1- (length values))) (extract values (1- (length values)))))

;; Finds the minimum element of a vector. (defop min-reduce (values) (min (extract (min-scan values) (1- (length values))) (extract values (1- (length values)))))

;; Counts the number of T flags in a vector. (defop count (flags) (plus-reduce (flag-to-number flags)))

;; Given two vectors of pointers and two vectors of values, ;; merges the values into the positions specified by the pointers. (defop join (v1 p1 v2 p2) (default-permute v1 p1 (default-permute v2 p2 (distribute nil (+ (length v1) (length v2))))))

;; Appends two vectors. (defop append (v1 v2) (join v1 (index (length v1)) v2 (over-elements ((i (index (length v2))) (l (distribute (length v1) (length v2)))) (+ i l))))

;; Packs the values from positions where the flags are set to T. (defop pack (values flags) (select-permute values (enumerate flags) flags (distribute nil (count flags))))

;; Splits the NIL values to the bottom of a vector and the T values ;; to the top. (defop split (values flags) (let ((not-flags (over-elements ((f flags)) (not f)))) (permute values (over-elements ((h (enumerate flags)) (l (enumerate not-flags)) (f flags) (o (distribute (count not-flags) (length values)))) (select-value f (+ h o) l)))))

;; Merges two vectors according to a set of flags. (defop flag-merge (flags v1 v2) (let ((indices (index (length flags)))) (join v2 (pack indices flags) v1 (pack indices (over-elements ((f flags)) (not f))))))

;; Fetches the values from the positions specified by the indices. (defop i-permute (values indices) (let ((returnp (index (length indices))) (default (distribute nil (length values)))) (select-permute values (default-permute returnp indices default) (default-permute (distribute t (length values)) indices default) indices)))

;; Returns the sum of a vector to all positions of a vector. (defop plus-distribute (values) (distribute (plus-reduce values) (length values)))

;; Returns the maximum element to all positions of a vector. (defop max-distribute (values) (distribute (max-reduce values) (length values)))

;; Returns the minimum element to all positions of a vector. (defop min-distribute (values) (distribute (min-reduce values) (length values)))

;; Returns the index of the maximum element. (defop max-index (values) (min-reduce (over-elements ((max (max-distribute values)) (v values) (l (distribute (length values) (length values))) (i (index (length values)))) (select-value (= max v) i l))))

B.2 Segments

B.2.1 Useful Utilities

;; Returns a boolean vector of length LENGTH with a T in the first element. (defop set-first-element (length) (insert (distribute nil length) t 0))

;; Returns a boolean vector of length LENGTH with a T in the last element. (defop set-last-element (length) (insert (distribute nil length) t (1- length)))

;; Returns previous element of a vector; DEFAULT is placed in the ;; first element. (defop previous (values default) (select-permute values (over-elements ((i (index (length values)))) (+ i 1)) (over-elements ((f (set-last-element (length values)))) (not f)) (distribute default (length values))))

;; Returns next element of a vector; DEFAULT is placed in the ;; last element. (defop next (values default) (select-permute values (over-elements ((i (index (length values)))) (- i 1)) (over-elements ((f (set-first-element (length values)))) (not f))

(distribute default (length values))))

;; Returns a boolean vector set to T in positions where the value changes. (defop boundary (values) (over-elements ((f (set-first-element (length values))) (p (previous values 0)) (v values)) (simple-or f (/= p v))))

;; Like enumerate but it includes the flag in the sum. (defop i-enumerate (flags) (let ((number (flag-to-number flags))) (over-elements ((enum (plus-scan number)) (f number)) (+ f enum))))

;; A backward version of enumerate: it starts at end. (defop b-enumerate (select) (over-elements ((enumerate (enumerate select)) (flag (flag-to-number select)) (total (distribute (count select) (length select)))) (- total (+ flag enumerate))))

B.2.2 Segment Descriptor Translations

(defop head-pointer-from-length (lengths) (plus-scan lengths))

(defop head-pointer-from-head-flag (head-flags) (pack (index (length head-flags)) head-flags))

(defop length-from-head-pointer (head-pointers total-length) (over-elements ((next (next head-pointers total-length)) (this head-pointers)) (- next this)))

(defop length-from-head-flag (head-flags) (length-from-head-pointer (head-pointer-from-head-flag head-flags) (length head-flags)))

(defop head-flag-from-head-pointer (head-pointers head-pointer-flags total-length) (select-permute (distribute t (length head-pointers)) head-pointers head-pointer-flags

(distribute nil total-length)))

(defop head-flag-from-length (lengths) (head-flag-from-head-pointer (head-pointer-from-length lengths) (over-elements ((l lengths)) (> l 0)) (plus-reduce lengths)))

B.2.3 Segmented Primitives

(defop s-max-scan (values segment-head-flags) (over-elements ((r (max-scan (over-elements ((block (i-enumerate segment-head-flags)) (value values)) (append-bits block value)))) (f segment-head-flags)) (select-value f 0 (extract-bits r))))

(defop s-copy (values segment-head-flags) (over-elements ((r (max-scan (over-elements ((i (index (length values))) (f segment-head-flags) (value values)) (append-bits (select-value f (+ i 1) 0) value)))) (f segment-head-flags) (v values)) (select-value f v (extract-bits r))))

(defop s-plus-scan (values segment-head-flags) (let ((total-sums (plus-scan values))) (over-elements ((total-sum total-sums) (head-sum (s-copy total-sums segment-head-flags))) (- total-sum head-sum))))

(defop s-permute (values pointers segment-head-flags) (let* ((offset (s-copy (index (length values)) segment-head-flags)) (real-pointers (over-elements ((p pointers) (o offset)) (+ p o)))) (permute values real-pointers)))

(defop s-extract (values indices segment-head-pointers) (let ((real-indices (over-elements ((i indices) (p segment-head-pointers)) (+ i p)))) (i-permute values real-indices)))

(defop s-insert (pvector values indices segment-head-pointers) (let ((real-indices (over-elements ((i indices) (p segment-head-pointers)) (+ i p)))) (default-permute values real-indices pvector)))

(defop s-distribute (values lengths) (let* ((head-pointers (head-pointer-from-length lengths)) (head-pointer-flags (over-elements ((l lengths)) (/= 0 l))) (total-length (plus-reduce lengths)) (head-flags (head-flag-from-head-pointer head-pointers head-pointer-flags total-length)) (default (distribute 0 total-length))) (s-copy (default-permute values head-pointers default) head-flags)))

B.2.4 Segmented Conditionals

(defop pack-segments (values flags lengths) (pack values (s-distribute flags lengths)))

(defop flag-merge-segments (flags v1 v2 lengths) (flag-merge (s-distribute flags lengths) v1 v2))

B.3 Other Routines

;; This routine fixes a near-merge as part of the halving-merge ;; routine. (defop fix-near-merge (near-merge) (over-elements ((max-previous (max-scan near-merge)) (min-next (back-min-scan near-merge)) (self near-merge)) (min min-next (max max-previous self))))

;; This routine implements the insert instructions based on ;; the other instructions (defop insert (destination index value) (over-elements ((i (index (length destination))) (d destination) (v (distribute value)) (p (distribute index))) (select-value (= i p) v d)))

Appendix C

Paralation-Lisp Code

This appendix contains examples of PARALATION LISP code for many algorithms. Section C.1 defines a set of generally useful functions used by many of the other algorithms. Section C.2 defines the line drawing routine described in Section 3.6.1. Section C.3 defines a quad tree routine based on the algorithm described in Section 6.2. Section C.4 defines the quickhull convex-hull technique as described in Section 6.4. Section C.5 defines a quicksort routine as described in several places in this thesis. Section C.6 defines routines that determine the entropy and the conditional entropy of a sequence, as defined by Shannon [103]. Section C.7 defines a set of routines that implement Quinlan's ID3 learning algorithm [91]. This routine along with a *LISP version written by Donna Fritzsche has been used in practice over many sets of data [21].

C.1 Utilities

;; Gives each element of a field its rank in the sorted order ;; determined by the predicate pred. (defun rank (field pred) (<- (index field) :by (match (index field) (elwise ((elt (sort (elwise ((f field) (i (index field))) (cons f i)) pred :key #'first))) (cdr elt)))))

;; Same as collapse but guarantees that the field will be collapsed ;; in such a way that the collapsed values of the field are in ;; sorted order.


(defun scollapse (field pred &key (key #'identity) (test #'eql)) (match (sort (<- field :with #'arb :by (collapse field :test test)) pred :key key) field))

;; Same as (elt (sort (elwise (field) field) pred :key key) 0)) ;; It returns the first element of the sorted field. (defun limit (field pred &key (key #'identity)) (flet ((op (a b) (if (funcall pred (funcall key a) (funcall key b)) a b))) (vref field :with #'op)))

;; Returns the index of the minimum element with respect to the ;; predicate pred. (defun rank-limit (field pred) (cdr (limit (elwise ((f field) (i (index field))) (cons f i)) pred :key #'car)))

;; Returns the number of different values in a field. (defun value-count (field &key (test #'eql)) (vref (<- (elwise (field) 1) :with #'arb :by (collapse field :test test)) :with #'+ :else 0))

;; Functions for testing the routines in this file. (defvar *test-list* '())

(defmacro deftest (name &body keyword-pairs &key string input output function) (declare (ignore keyword-pairs)) `(defun ,(first (pushnew name *test-list*)) () (let* ((input ,input) (output ,output) (string ,string) (function ,function) (result (funcall function input))) (when (not (equalp output result)) (format t "~%~s test failed,~% Got: ~a ~% Expected: ~a" string result output)))))

(defun test-all () (dolist (test *test-list*) (funcall test)))

C.2 Line Drawing

;; A simple line drawing routine

(defun point-location (endpoints fraction) (let ((e1 (first endpoints)) (e2 (second endpoints))) (list (+ (first e1) (round (* fraction (- (first e2) (first e1))))) (+ (second e1) (round (* fraction (- (second e2) (second e1))))))))

(defun line-length (endpoints) (let ((e1 (first endpoints)) (e2 (second endpoints))) (1+ (max (abs (- (first e1) (first e2))) (abs (- (second e1) (second e2)))))))

;; Determines all the points on a line given the endpoints. (defun line-draw (endpoints) (let ((line-length (line-length endpoints))) (elwise ((point (make-paralation line-length))) (point-location endpoints (/ point (float (1- line-length)))))))

;; Draws multiple lines given a field of endpoint pairs. ;; It also removes all duplicate points. (defun multi-line-draw (endpoint-list) (let ((points (expand (elwise ((endpoint-pair endpoint-list)) (line-draw endpoint-pair))))) (<- points :with #'arb :by (collapse points :test #'equal))))

(deftest line-test :string "Line Drawing Routine" :input #1f(((0 0) (5 5)) ((4 3) (2 7)) ((4 4) (6 4))) :output #1F((0 0) (1 1) (2 2) (3 3) (4 4) (5 5) (4 3) (3 5) (2 6) (2 7) (5 4) (6 4)) :function #'multi-line-draw)

C.3 Quad-Tree

;; a quad tree algorithm

(defun quad-split (points) (if (> (length points) 1) (let* ((half-length (/ (length points) 2)) (match (match (index points) (elwise (points) (second points)))) (back-pointers (<- (index points) :by match))) (elwise ((pts (collect (elwise ((point (<- (elwise (points) (first points)) :by match)) (back-pointer back-pointers)) (list point (mod back-pointer half-length))) :by (scollapse (elwise ((back-pointer back-pointers)) (truncate back-pointer half-length)) #'<)))) (quad-split pts))) (first (vref points))))

(defun sort-init (points) (let* ((y-rank (rank (elwise (points) (second points)) #'<)) (x-rank (rank (elwise (points) (first points)) #'<))) (<- (elwise ((i (index points)) (yp y-rank)) (list i yp)) :by (match (index points) x-rank))))

(defun quad-tree (points) (quad-split (sort-init points)))

(deftest quad-tree-test :string "Quad Tree Algorithm" :input #1f((.32 .91) (.75 .53) (.63 .38) (.21 .49) (.56 .77) (.48 .09) (.24 .87) (.96 .02)) :output #1F(#1F(#1F(3 5) #1F(6 0)) #1F(#1F(2 7) #1F(4 1))) :function #'quad-tree)

C.4 Convex Hull: Quickhull

;; a convex hull algorithm (QuickHull)

(defun cross-product (o p1 p2) (- (* (- (first p1) (first o)) (- (second p2) (second o))) (* (- (second p1) (second o)) (- (first p2) (first o)))))

(defun line-side (e1 e2 point) (plusp (cross-product e1 e2 point)))

(defun triangle-area (p1 p2 p3)

(abs (/ (cross-product p1 p2 p3) 2)))

(defun split-direction (p1 p2 ps point) (let ((l (line-side p1 ps point)) (r (line-side ps p2 point)) (e (equal ps point))) (cond (e 1) (l 0) (r 2) (t nil))))

(defun hull-split (points p1 p2 ps) (let* ((sfield (make-paralation 3)) (match (match sfield (elwise (points) (split-direction p1 p2 ps points))))) (expand (elwise ((new-points (collect points :by match)) (new-p1 (elwise (sfield) (elt (list p1 ps ps) sfield))) (new-p2 (elwise (sfield) (elt (list ps ps p2) sfield)))) (if (> (length new-points) 1) (hull-split new-points new-p1 new-p2 (limit new-points #'> :key #'(lambda (new-points) (triangle-area new-points new-p1 new-p2)))) new-points)))))

(defun convex-hull (points) (let ((min-x (limit points #'< :key #'first)) (max-x (limit points #'> :key #'first))) (hull-split points min-x min-x max-x)))

(deftest convex-hull-test :string "Convex Hull Algorithm (Quickhull)" :input #1f((4 2) (6 5) (11 2) (3 8) (12 1) (6 15) (14 5) (11 17) (13 13) (17 14) (19 9)) :output #1F((6 15) (11 17) (17 14) (19 9) (12 1) (4 2)) :function #'convex-hull)

C.5 Quicksort

;; A quicksort routine

(defun qsort (data pred) (if (> (value-count data) 1) (let* ((pivot (elt data (random (length data)))) (side (elwise (data) (if (funcall pred data pivot) 0 1))) (sets (collect data

:by (match (make-paralation 2) side)))) (expand (elwise ((set sets)) (qsort set pred)))) data))

(deftest qsort-test :string "Quicksort Algorithm" :input #1F(97 68 16 70 55 3 11 47 75 53 1 21 16 15 78 13 72 38 88 24) :output #1F(1 3 11 13 15 16 16 21 24 38 47 53 55 68 70 72 75 78 88 97) :function #'(lambda (data) (qsort data #'<)))

C.6 Entropy

;; This function determines the entropy of the values of the INPUT field. ;; Mathematically this is H(I) = -\sum_i p(i) \lg p(i). ;; The sum is over all possible values and p(i) is the fraction of times ;; that the value i appears (the probability of i).

(defun entropy (input-string) (vref (elwise ((prob (<- (elwise ((is input-string)) (/ 1.0 (length input-string))) :by (collapse input-string) :with #'+))) (* prob (- (log prob 2)))) :with #'+))

(deftest entropy-test :string "Shannon Entropy Routine" :input #1f(0 1 0 1 0 1 0 1 a b c d a b c d) :output 2.5 :function #'entropy)

;; This function takes two fields from the same paralation: ;; an input and an output. ;; It determines the conditional entropy of the output based on the input. ;; Mathematically this is H_I(O) = -\sum_i \sum_o p(i,o) \lg p_i(o). ;; p(i,o) is the fraction of positions in which the input i and the ;; output o appear together. ;; p_i(o) is the fraction of the positions where i is the input in which ;; o is the output (conditional probability of o based on i). ;; This function calculates it in the form: ;; H_I(O) = -\sum_i p(i) \sum_o p_i(o) \lg p_i(o) ;; = \sum_i p(i) H_i(O) = (\sum_i l(i) H_i(O)) / l(I)

(defun conditional-entropy (input output) (let* ((input-sets (collect output :by (collapse input)))) (/ (vref (elwise ((input-set input-sets))

(* (length input-set) (entropy input-set))) :with #'+) (vref (elwise ((input-set input-sets)) (length input-set)) :with #'+))))

(deftest cond-entropy-test :string "Shannon Conditional Entropy Routine" :input #1f((0 a) (0 b) (0 a) (0 b) (1 c) (1 c) (1 c) (1 c)) :output .5 :function #'(lambda (in) (conditional-entropy (elwise (in) (first in)) (elwise (in) (second in)))))

C.7 ID3: Quinlan’s Learning Algorithm

;; This function removes the nth element of a sequence.

(defun remove-elt (sequence index) (remove-if #'(lambda (arg) t) sequence :start index :end (+ index 1)))

;; This function takes a field of sequences (INPUT), a field of values ;; (OUTPUT), ;; and the number of elements in the sequences (INPUT-LENGTH). ;; For each position J up to INPUT-LENGTH of the INPUT sequences, it ;; determines ;; the conditional entropy of OUTPUT based on that position of the INPUT. ;; It returns a field of length INPUT-LENGTH; each element is a ;; conditional entropy value.

(defun parameter-entropies (input output imask input-length) (elwise ((parameter-position (make-paralation input-length))) (if (elt imask parameter-position) (conditional-entropy (elwise (input) (elt input parameter-position)) output) (entropy output))))

;; This structure is used for each node of the decision tree. ;; Par-Value -- contains the input value of the node of the tree. ;; Next-Par-Position -- contains which one of the input positions ;; should be used for branching. ;; Position-Entropies -- Contains the conditional entropies for ;; each position. ;; The minimum of these entropies is used to select ;; Next-Par-Position. ;; Output -- Is the default value for the node (the most common ;; output value). ;; If there is no child that corresponds to the given input value,

;; then this value should be selected. ;; Child field -- This is a field which contains all the children. ;; There can be as many children as possible input values.

(defstruct (qt-node (:print-function ptree)) par-value next-par-position position-entropies output child-field count)

;; These variables and the function assign a cost to creating each ;; node of the decision tree. ;; They are used so that a node will only branch out if more ;; information can be reduced than the cost of creating all the ;; children nodes. ;; Its effect is to prune the tree. ;; This cost is in number of bits.

(defvar *node-cost* 0) (defvar *child-cost* 1.3)

(defun node-cost (children) (+ *node-cost* (* children *child-cost*)))

;; This is the routine that builds the decision tree. ;; It takes an INPUT and OUTPUT field that must both be from the same ;; paralation. ;; The INPUT must be a field of sequences; all sequences should be the ;; same length and INPUT-LENGTH is used to specify this length. ;; The INPUT sequences are used as the input parameter vectors--each element ;; contains the value of the parameter denoted by that position. ;; The parameter names are implicit in the positions. ;; The OUTPUT is a sequence of values - the values of the output parameter. ;; The FP-SEQUENCE argument can be used to specify a fixed sequence of ;; parameters to branch on -- one position per level of the tree. ;; This argument should be a list of length no greater than INPUT-LENGTH. ;; If less than the INPUT-LENGTH, then the function will stop building ;; the tree when it runs out of positions. ;; If an FP-SEQUENCE is specified, the entropy calculations are not ;; carried out. ;; The PAR-VALUE argument can be ignored (it is used for recursion). ;; This function returns a QT-NODE which is the root of a decision tree.

(defun build-q-tree (input output imask input-length &key (par-value :root) (fp-sequence nil)) (let ((tree-node (make-qt-node)))

(setf (qt-node-par-value tree-node) par-value) (setf (qt-node-output tree-node) (select-max output)) (setf (qt-node-count tree-node) (length input)) (when (and (plusp input-length) (< 1 (value-count output))) (let* ((par-entropies (when (not (car fp-sequence)) (parameter-entropies input output imask input-length))) (best-parameter-position (or (car fp-sequence) (rank-limit par-entropies #'<=))) (best-parameter (elwise (input) (elt input best-parameter-position))) (collapse (collapse best-parameter)) (new-io-pairs (collect (elwise (input output) (list input output)) :by collapse)) (input-chars (<- best-parameter :with #'arb :by collapse)) (new-imask (copy-seq imask))) (setf (elt new-imask best-parameter-position) nil) (when (or fp-sequence (> (* (length input) (- (entropy output) (vref par-entropies :with #'min))) (node-cost (length input-chars)))) (setf (qt-node-next-par-position tree-node) best-parameter-position) (setf (qt-node-position-entropies tree-node) par-entropies) (setf (qt-node-child-field tree-node) (elwise ((new-io-pair new-io-pairs) (input-char input-chars)) (let ((new-input (elwise (new-io-pair) (first new-io-pair))) (new-output (elwise (new-io-pair) (second new-io-pair)))) (build-q-tree new-input new-output new-imask input-length :par-value input-char :fp-sequence (cdr fp-sequence)))))))) tree-node))

;; This routine takes a field of input parameter vectors and a tree ;; returned by BUILD-Q-TREE and returns a field of output values - the ;; best guess for each position. ;; The INPUT must be a field of sequences (each sequence is a ;; parameter vector). ;; The input parameter positions must be in the same order as when ;; BUILD-Q-TREE was run. ;; The field of output values returned will be in the same paralation ;; as the INPUT field.

(defun find-outputs (input q-tree) (let ((par-position (qt-node-next-par-position q-tree)) (child-field (qt-node-child-field q-tree)) (default-output (qt-node-output q-tree))) (if (and child-field (plusp (length input))) (let* ((collapse (match (elwise (child-field) (qt-node-par-value child-field)) (elwise (input) (elt input par-position)))) (new-inputs (collect input :by collapse))) (field-merge (elwise (new-inputs child-field) (find-outputs new-inputs child-field)) :by collapse :default (elwise (input) default-output))) (elwise (input) default-output))))

;; This function can be used as an interface to BUILD-Q-TREE. ;; It takes as input, a field of input-output pairs -- each pair is a cons. ;; The CAR of each cons must be a sequence of I-LENGTH and is used ;; as the input parameter vector. ;; The CDR is used as the output value. ;; FP-SEQUENCE can be used as described earlier.

(defun build-tree (io-field i-length &optional fp-sequence) (let ((input (elwise (io-field) (car io-field))) (output (elwise (io-field) (cdr io-field))) (imask (make-array i-length :initial-element t))) (build-q-tree input output imask i-length :fp-sequence fp-sequence)))

;; This function can be used to determine how well the decision tree ;; determines the correct output -- to test a decision tree. ;; It takes as input, a field of input-output pairs in the same ;; format as BUILD-TREE. ;; It finds the predicted outputs using FIND-OUTPUTS and then compares ;; them with the given output. ;; The function returns the fraction that are correct.

(defun check-completion (io-field q-tree) (let ((input (elwise (io-field) (car io-field))) (output (elwise (io-field) (cdr io-field)))) (/ (vref (elwise ((expected-output output) (actual-output (find-outputs input q-tree))) (if (eql expected-output actual-output) 1 0)) :with '+) (float (length input)))))

;; This function runs BUILD-TREE on (1 - TEST-FRACTION) of the io-field ;; and then runs check-completion on the remaining TEST-FRACTION part.

;; It rueturns the fraction that were correct, and the decision tree.

(defun check-generalization (io-field input-length test-fraction &optional fixed-parameter-sequence) (let* ((generate-length (truncate (* (- 1 test-fraction) (length io-field)))) (generate-set (subseq io-field 0 generate-length)) (test-set (subseq io-field generate-length)) (q-tree (build-tree generate-set input-length fixed-parameter-sequence))) (list (check-completion test-set q-tree) q-tree)))

Bibliography

[1] P. Abrams. An APL machine. Technical Report SLAC 114, Stanford University, 1970.

[2] Alok Aggarwal, Ashok K. Chandra, and Marc Snir. Hierarchical memory with block transfer. In Proceedings Symposium on Foundations of Computer Science, pages 204–216, 1987.

[3] Alok Aggarwal, Bernard Chazelle, Leo Guibas, Colm Ó Dúnlaing, and Chee Yap. Parallel computational geometry. In Proceedings Symposium on Foundations of Computer Science, pages 468–477, October 1985.

[4] Ajit Agrawal, Guy Blelloch, Robert Krawitz, and Cynthia Phillips. Four vector-matrix primitives. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, pages 292–302, June 1989.

[5] M. Ajtai, J. Komlós, and E. Szemerédi. An O(n lg n) sorting network. In Proceedings ACM Symposium on Theory of Computing, pages 1–9, April 1983.

[6] D. C. Allen. The BBN multiprocessors: Butterfly and Monarch. In Proceedings Princeton Conference on Supercomputers and Stellar Dynamics, June 1986.

[7] Mikhail J. Atallah, Richard Cole, and Michael T. Goodrich. Cascading divide-and-conquer: A technique for designing parallel algorithms. In Proceedings Symposium on Foundations of Computer Science, pages 151–160, October 1987.

[8] Mikhail J. Atallah and Michael T. Goodrich. Efficient parallel solutions to some geometric problems. Journal of Parallel and Distributed Computing, 3(4):492–507, December 1986.

[9] Mikhail J. Atallah and Michael T. Goodrich. Efficient plane sweeping in parallel. In Proceedings ACM Symposium on Theory of Computing, pages 216–225, 1986.

[10] Baruch Awerbuch and Yossi Shiloach. New connectivity and MSF algorithms for Ultracomputer and PRAM. In Proceedings International Conference on Parallel Processing, pages 175–179, 1983.

[11] Kenneth E. Batcher. Sorting networks and their applications. In AFIPS Spring Joint Computer Conference, pages 307–314, 1968.

[12] Kenneth E. Batcher. The flip network of STARAN. In Proceedings International Conference on Parallel Processing, pages 65–71, 1976.

243 244 BIBLIOGRAPHY

[13] Jon L. Bentley. Multidimensional binary search trees used for associative searching. Commu- nications of the ACM, 18:509–517, 1975. [14] Jon L. Bentley and Michael I. Shamos. Divide-and-conquer in multidimensional space. In Conference Record of the Eighth Annual ACM Symposium on Theory of Computing, pages 220–230, May 1976. [15] C. Berge and A. Ghouila-Houri. Programming, Games, and Transportation Networks. John Wiley, New York, 1965. [16] H. J. Berliner. A chronology of computer chess and its literature. Artificial Intelligence, 10(2), 1978. [17] Guy E. Blelloch. Afl-1: A programming language for massively concurrent computers. Tech- nical Report 918, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, November 1986. [18] Guy E. Blelloch. Cis: A massively concurrent rule based system. In Proceedings National Conference on Artificial Intelligence, pages 735–741, August 1986. [19] Guy E. Blelloch. Parallel prefix vs. concurrent memory access. Technical report, Thinking Machines Corporation, October 1986. [20] Guy E. Blelloch. Scans as primitive parallel operations. In Proceedings International Confer- ence on Parallel Processing, pages 355–362, August 1987. [21] Guy E. Blelloch and Donna Fritzsche. A comparison of the parallel implementations of two learning algorithms. In Proceedings of the AAAI Spring Symposium Series: Parallel Models of Intelligence, pages 59–62, March 1988. [22] Guy E. Blelloch and James J. Little. Parallel solutions to geometric problems on the scan model of computation. In Proceedings International Conference on Parallel Processing, pages Vol 3: 218–222, August 1988. [23] Guy E. Blelloch and Charles R. Rosenberg. Network learning on the connection machine. In Proceedings International Joint Conference on Artificial Intelligence, pages 323–326, August 1987. [24] Guy E. Blelloch and Gary W. Sabot. Compiling collection-oriented languages onto massively parallel computers. Journal of Parallel and Distributed Computing, 8(2):119–134, February 1990. [25] S. Borkar, R. Cohn, G. Cox, S. Gleason, T. Gross, H. T. Kung, M. Lam, B. Moore, C. Pe- terson, J. Pieper, L. Rankin, P. S. Tseng, J. Sutton, J. Urbanski, and J. Webb. iWarp: An integrated solution to high-speed parallel computing. In Proceedings of Supercomputing ’88, IEEE Computer Society and ACM SIGARCH, November 1988. [26] A Borodin. On relating time and space to size and depth. SIAM Journal of Computing, 6:733–744, 1977. [27] Otakar Boruvka.˙ O jistem´ problen´ minimal´ ´ım. Praca´ MoravskeP´ rˇ´ırodovedeckˇ e´ Spolecnostiˇ , III(3):37–58, 1926. (In Czech.). BIBLIOGRAPHY 245

[28] Richard P. Brent. The parallel evaluation of general arithmetic expressions. Journal of the Association for Computing Machinery, 21(2):201–206, April 1974. [29] Richard P. Brent and H. T. Kung. The chip complexity of binary arithmetic. In Proceedings ACM Symposium on Theory of Computing, pages 190–200, 1980.

[30] Timothy A. Budd. An APL compiler for a vector processor. ACM Transactions on Program- ming Languages and Systems, 6(3):297–313, July 1984. [31] David Christman. Programming the Connection Machine. Master’s thesis, Massachussets Institute of Technology, January 1984. [32] C. Clos. A study of nonblocking switching networks. Bell System Technical Journal, 32:406– 424, 1953. [33] E. F. Codd. A relational model of data for large shared data banks. Communications of the ACM, 13(6), June 1970. [34] Richard Cole. Parallel merge sort. In Proceedings Symposium on Foundations of Computer Science, pages 511–516, October 1986. [35] Richard Cole and Uzi Vishkin. Approximate scheduling, exact scheduling, and applications to parallel algorithms. In Proceedings Symposium on Foundations of Computer Science, pages 478–491, 1986. [36] Richard Cole and Uzi Vishkin. Faster optimal parallel prefix sums and list ranking. Technical Report Ultracomputer Note 117, New York University, February 1987. [37] Stephen A. Cook. A taxonomy of problems with fast parallel algorithms. Information and Control, 64:2–22, 1985. [38] Stephen A. Cook and H. James Hoover. A depth-universal circuit. SIAM Journal of Comput- ing, 14(4):833–839, November 1985. [39] C. J. Date. Relational Database: Selected Writings. Addison-Wesley, Reading, MA, 1986. [40] C. C. Elgot and A. Robinson. Random access stored program machines. Journal of the Association for Computing Machinery, 11(4):365–399, 1964. [41] Faith E. Fich. New bounds for parallel prefix circuits. In Proceedings ACM Symposium on Theory of Computing, pages 100–109, April 1983. [42] Steven Fortune and James Wyllie. Parallelism in random access machines. In Conference Record of the Tenth Annual ACM Symposium on Theory of Computing, pages 114–118, May 1978. [43] Jerome H. Friedman, Jon Louis Bentley, and Raphael Ari Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3(3):209–226, 1977. [44] F. Gavril. Merging with parallel processors. Communications of the ACM, 18(10):588–591, 1975. 246 BIBLIOGRAPHY

[45] Hillel Gazit, Gary L. Miller, and Shang-Hua Teng. Optimal tree contraction in the EREW model. In Stuart K. Tewsburg, Bradley W. Dickinson, and Stuart C. Schwartz, editors, Con- current Computations, pages 139–156. Plenum Publishing Corporation, 1988. [46] Andrew V. Goldberg. Efficient graph algorithms for sequential and parallel computers. Tech- nical Report 374, Laboratory for Computer Science, Massachusetts Institute of Technology, February 1987. [47] Andrew V. Goldberg and Robert E. Tarjan. A new approach to the maximum flow problem. In Proceedings ACM Symposium on Theory of Computing, pages 136–146, April 1986. [48] L. M. Goldschlager. A universal interconnection pattern for parallel computers. Journal of the Association for Computing Machinery, 29(3):1073–1086, October 1982. [49] Allan Gottlieb, R. Grishman, Clyde P. Kruskal, Kevin P. McAuliffe, Larry Rudolph, and Marc Snir. The NYU Ultracomputer—designing a MIMD, shared-memory parallel machine. IEEE Transactions on Computers, C–32:175–189, 1983. [50] Allan Gottlieb, B. D. Lubachevsky, and Larry Rudolph. Basic techniques for the efficient coordination of very large numbers of cooperating sequential processors. ACM Transactions on Programming Languages and Systems, 5(2), April 1983. [51] Vince Grolmusz and Prabhakar Ragde. Incomparability in parallel computation. In Proceed- ings Symposium on Foundations of Computer Science, pages 89–98, 1987. [52] Leo J. Guibas and Douglas K. Wyatt. Compilation and delayed evaluation in APL. In Confer- ence Record of the 5th Annual ACM Symposium on Principles of Programming Languages, pages 1–8, Tucson, Ariz, 1978. [53] W. Daniel Hillis. The Connection Machine. MIT Press, Cambridge, MA, 1985. [54] W. Daniel Hillis and Guy L. Steele Jr. Data parallel algorithms. Communications of the ACM, 29(12):1170–1183, December 1986. [55] D. S. Hirschberg, A. K. Chandra, and D. V. Sarwate. Computing connected components on parallel computers. Communications of the ACM, 22(8):461–464, 1979. [56] C.A.R. Hoare. Quicksort. Computer J., 5(1):10–15, 1962. [57] Paul Hudak and Eric Mohr. Graphinators and the duality of SIMD and MIMD. In ACM Conference on Lisp and Functional Programming, pages 224–234, July 1988. [58] IBM. APL2 Programming: Language Reference, first edition, August 1984. Order Number SH20-9227-0. [59] Kenneth E. Iverson. A Programming Language. Wiley, New York, 1962. [60] Kenneth E. Iverson. A dictionary of APL. APL Quote Quad, 18(1):5–40, September 1987. [61] Kenneth E. Iverson and Falkoff. APL 360 reference manual. The APL terminal system: In- structions for operation. IBM, November 1966. [62] A. V. Karzanov. Determining the maximal flow in a network by the method of preflows. Soviet Math. Dokl., (15):434–437, 1974. BIBLIOGRAPHY 247

[63] Donald E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison-Wesley Publishing Company, Reading, MA, 1973. [64] Clyde P. Kruskal, Larry Rudolph, and Marc Snir. The power of parallel prefix. In Proceedings International Conference on Parallel Processing, pages 180–185, August 1985. [65] Richard E. Ladner and Michael J. Fischer. Parallel prefix computation. Journal of the Associ- ation for Computing Machinery, 27(4):831–838, October 1980. [66] Clifford Lasser. The Essential *Lisp Manual. Thinking Machines Corporation, Cambridge, MA, July 1986. [67] Frank Thomson Leighton. Tight bounds on the complexity of parallel sorting. In Proceedings ACM Symposium on Theory of Computing, pages 71–80, May 1984. [68] Charles E. Leiserson. Area-efficient layouts (for VLSI). In Proceedings Symposium on Foun- dations of Computer Science, 1980. [69] Charles E. Leiserson. Fat-Trees: Universal networks for hardware-efficient supercomputing. IEEE Transactions on Computers, C–34(10):892–901, October 1985. [70] Charles E. Leiserson and Bruce M. Maggs. Communication-efficient parallel algorithms for distributed random-access machines. Algorithmica, 3:53–77, 1988. [71] James J. Little, Guy E. Blelloch, and Todd Cass. Parallel algorithms for on the Connection Machine. In Proceedings International Conference on Computer Vision, June 1987. [72] Boris D. Lubachevsky and Albert G. Greenberg. Simple, efficient asynchronous parallel prefix algorithms. In Proceedings International Conference on Parallel Processing, pages 66–69, August 1987. [73] Michael Luby. A simple parallel algorithm for the maximal independent set problem. In Proceedings ACM Symposium on Theory of Computing, pages 1–10, May 1985. [74] G. A. Mago. A network of computers to execute reduction languages. International Journal of Computer and Information Sciences, 1979. [75] Gary L. Miller and John H. Reif. Parallel tree contraction and its application. In Proceedings Symposium on Foundations of Computer Science, pages 478–489, October 1985. [76] Russ Miller and Quentin F. Stout. Efficient parallel convex hull algorithms. IEEE Transactions on Computers, 37(12):1605–1618, December 1988. [77] Trenchard More. The nested rectangular array as a model of data. In APL 79 Conference Proceedings, pages 55–73, Rochester, New York, 1979. [78] Trenchard More. Rectangularly arranged collections of collections. In APL 82 Conference Proceedings, pages 219–228. ACM, 1982. [79] William M. Newman and Robert F. Sproull. Principles of Interactive . McGraw-Hill, New York, 1979. 248 BIBLIOGRAPHY

[80] Yu. Ofman. On the algorithmic complexity of discrete functions. Soviet Physics Doklady, 7(7):589–591, January 1963. [81] Stephen M. Omohundro. Efficient algorithms with neural network behavior. Complex Sys- tems, 1, 1987. [82] Mark H. Overmars and Jan van Leeuwen. Maintenance of configurations in the plane. Journal of Computer and System Sciences, 23(2):166–204, October 1981. [83] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complex- ity. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1982. [84] M. C. Pease. Matrix inversion using parallel processing. Journal of the Association for Com- puting Machinery, 14(4):757–764, October 1967. [85] Alan J. Perlis. Steps toward an APL compiler—updated. Technical Report 24, Computer Science Department, Yale University, March 1975. [86] G. F. Pfister and V. A. Norton. ‘Hot Spot’ contention and combining in multistage intercon- nection networks. In Proceedings International Conference on Parallel Processing, pages 790–797, August 1985. [87] Vaughan R. Pratt and Larry J. Stockmeyer. A characterization of the power of vector machines. Journal of Computer and System Sciences, 12(2):198–221, April 1976. [88] Franco P. Preparata and Michael I. Shamos. Computational Geometry — an Introduction. Springer-Verlag, New York, NY, 1985. [89] Franco P. Preparata and Jean Vuillemin. The cube-connected cycles: A versatile network for parallel computing. Communications of the ACM, 24(5):300–309, May 1981. [90] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes. Cam- bridge University Press, Cambridge, 1986. [91] Ross Quinlan. Induction of decision trees. Machine Learning, 1:81–106, 1986. [92] Abhiram G. Ranade. How to emulate shared memory. In Proceedings Symposium on Foun- dations of Computer Science, pages 185–194, 1987. [93] Abhiram G. Ranade. Fluent Parallel Computation. PhD thesis, Yale University, Department of Computer Science, New Haven, CT, 1989. [94] John H. Reif and Sandeeep Sen. Optimal randomized parallel algorithms for computational geometry. In Proceedings International Conference on Parallel Processing, pages 270–277, August 1987. [95] Richard M. Russell. The CRAY-1 computer system. Communications of the ACM, 21(1):63– 72, January 1978. [96] Gary W. Sabot. The Paralation Model: Architecture-Independent Parallel Programming.MIT Press, Cambridge, Massachusetts, 1988. [97] James B. Salem. *Render: A data parallel approach to polygon rendering. Technical Report VZ88–2, Thinking Machines Corporation, January 1988. BIBLIOGRAPHY 249

[98] Carla Savage and Joseph Ja’Ja’. Fast, efficient parallel algorithms for some graph problems. SIAM Journal of Computing, 10(4):682–691, 1981. [99] Fl. Schmidt and M. A. Jenkins. Array diagrams and the NIAL approach. In APL 82 Confer- ence Proceedings, pages 315–319. ACM, 1982. [100] J. T. Schwartz, R. B. K. Dewar, E. Dubinsky, and E. Schonberg. Programming with Sets: An Introduction to SETL. Springer-Verlag, New York, 1986. [101] Jacob T. Schwartz. Ultracomputers. ACM Transactions on Programming Languages and Systems, 2(4):484–521, October 1980. [102] Charles L. Seitz. The Cosmic Cube. Communications of the ACM, 28(1):22–33, January 1985. [103] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, July, October 1948. [104] Yossi Shiloach and Uzi Vishkin. Finding the maximum, merging and sorting in a parallel computation model. Journal of Algorithms, 2(1):88–102, March 1981. [105] Yossi Shiloach and Uzi Vishkin. An O(logn) parallel connectivity algorithm. Journal of Algorithms, 3(1):57–67, March 1982. [106] Marc Snir and Jon A. Solworth. The Ultraswitch—a VLSI network node for parallel process- ing. Technical Report Ultracomputer Note #39, New York University, January 1984. [107] Guy L. Steele Jr., Scott E. Fahlman, Richard P. Gabriel, David A. Moon, and Daniel L. Wein- reb. Common LISP: The Language. Digital Press, Burlington, MA, 1984. [108] Guy L. Steele Jr. and W. Daniel Hillis. Connection Machine Lisp: Fine-Grained Parallel Symbolic Processing. In Proceedings of the 1986 ACM Conference on Lisp and Functional Programming, pages 279–297, 1986. [109] Harold S. Stone. Parallel processsing with the perfect shuffle. IEEE Transactions on Comput- ers, C–20(2):153–161, 1971. [110] Quentin F. Stout. Sorting, merging, selecting and filtering on tree and pyramid machines. In Proceedings International Conference on Parallel Processing, pages 214–221, 1983. [111] Robert E. Tarjan. Data Structures and Network Algorithms. Society for Industrial and Applied , Philadelphia, Pennsylvania, 1983. [112] Robert E. Tarjan and Uzi Vishkin. An efficient parallel biconnectivity algorithm. SIAM Jour- nal of Computing, 14(4):862–874, 1985. [113] Thinking Machines Corporation. Connection Machine parallel instruction set (PARIS), July 1986. [114] Thinking Machines Corporation. Model CM-2 technical summary. Technical Report HA87-4, Thinking Machines Corporation, Cambridge, Massachusetts, April 1987. [115] C. D. Thompson and H. T. Kung. Sorting on a mesh-connected parallel computer. Communi- cations of the ACM, 20(4):263–271, 1977. 250 BIBLIOGRAPHY

[116] A. M. Turing. On computable numbers, with an application to the entscheidungsproblem. Proceedings London Mathematical Soc. Ser. 2, 42:230–265, 1936. [117] Leslie G. Valiant. Universal circuits (preliminary report). In Proceedings ACM Symposium on Theory of Computing, pages 196–202, 1976. [118] Uzi Vishkin. Synchronous parallel computation: A survey. Technical Report Ultracomputer Note #53, New York University, April 1983. [119] Skef Wholey and Guy L. Steele Jr. Connection Machine Lisp: A dialect of Common Lisp for data parallel programming. In Proceedings Second International Conference on Supercom- puting, May 1987. [120] James C. Wyllie. The complexity of parallel computations. Technical Report TR-79-387, Department of Computer Science, Cornell University, Ithaca, NY, August 1979. Index
