Mathematics 18.337, Computer Science 6.338, SMA 5505 Applied Parallel Computing Spring 2004
Total Page:16
File Type:pdf, Size:1020Kb
Mathematics 18.337, Computer Science 6.338, SMA 5505 Applied Parallel Computing Spring 2004 Lecturer: Alan Edelman1 MIT 1Department of Mathematics and Laboratory for Computer Science. Room 2-388, Massachusetts Institute of Technology, Cambridge, MA 02139, Email: [email protected], http://math.mit.edu/~edelman ii Math 18.337, Computer Science 6.338, SMA 5505, Spring 2004 Contents 1 Introduction 1 1.1 The machines . 1 1.2 The software . 2 1.3 The Reality of High Performance Computing . 3 1.4 Modern Algorithms . 3 1.5 Compilers . 3 1.6 Scientific Algorithms . 4 1.7 History, State-of-Art, and Perspective . 4 1.7.1 Things that are not traditional supercomputers . 4 1.8 Analyzing the top500 List Using Excel . 5 1.8.1 Importing the XML file . 5 1.8.2 Filtering . 7 1.8.3 Pivot Tables . 9 1.9 Parallel Computing: An Example . 14 1.10 Exercises . 16 2 MPI, OpenMP, MATLAB*P 17 2.1 Programming style . 17 2.2 Message Passing . 18 2.2.1 Who am I? . 19 2.2.2 Sending and receiving . 20 2.2.3 Tags and communicators . 22 2.2.4 Performance, and tolerance . 23 2.2.5 Who's got the floor? . 24 2.3 More on Message Passing . 26 2.3.1 Nomenclature . 26 2.3.2 The Development of Message Passing . 26 2.3.3 Machine Characteristics . 27 2.3.4 Active Messages . 27 2.4 OpenMP for Shared Memory Parallel Programming . 27 2.5 STARP . 30 3 Parallel Prefix 33 3.1 Parallel Prefix . 33 3.2 The \Myth" of lg n . 35 3.3 Applications of Parallel Prefix . 35 3.3.1 Segmented Scan . 35 iii iv Math 18.337, Computer Science 6.338, SMA 5505, Spring 2004 3.3.2 Csanky's Matrix Inversion . 36 3.3.3 Babbage and Carry Look-Ahead Addition . 37 3.4 Parallel Prefix in MPI . 38 4 Dense Linear Algebra 39 4.1 Dense Matrices . 39 4.2 Applications . 40 4.2.1 Uncovering the structure from seemingly unstructured problems . 40 4.3 Records . 41 4.4 Algorithms, and mapping matrices to processors . 42 4.5 The memory hierarchy . 44 4.6 Single processor condiderations for dense linear algebra . 45 4.6.1 LAPACK and the BLAS . 45 4.6.2 Reinventing dense linear algebra optimization . 46 4.7 Parallel computing considerations for dense linear algebra . 50 4.8 Better load balancing . 52 4.8.1 Problems . 52 5 Sparse Linear Algebra 55 5.1 Cyclic Reduction for Structured Sparse Linear Systems . 55 5.2 Sparse Direct Methods . 57 5.2.1 LU Decomposition and Gaussian Elimination . 57 5.2.2 Parallel Factorization: the Multifrontal Algorithm . 61 5.3 Basic Iterative Methods . 63 5.3.1 SuperLU-dist . 63 5.3.2 Jacobi Method . 64 5.3.3 Gauss-Seidel Method . 64 5.3.4 Splitting Matrix Method . 64 5.3.5 Weighted Splitting Matrix Method . 65 5.4 Red-Black Ordering for parallel Implementation . 65 5.5 Conjugate Gradient Method . 66 5.5.1 Parallel Conjugate Gradient . 66 5.6 Preconditioning . 67 5.7 Symmetric Supernodes . 69 5.7.1 Unsymmetric Supernodes . 69 5.7.2 The Column Elimination Tree . 70 5.7.3 Relaxed Supernodes . 72 5.7.4 Supernodal Numeric Factorization . 73 5.8 Efficient sparse matrix algorithms . 75 5.8.1 Scalable algorithms . 75 5.8.2 Cholesky factorization . 77 5.8.3 Distributed sparse Cholesky and the model problem . 78 5.8.4 Parallel Block-Oriented Sparse Cholesky Factorization . 79 5.9 Load balance with cyclic mapping . 79 5.9.1 Empirical Load Balance Results . 80 5.10 Heuristic Remapping . 82 5.11 Scheduling Local Computations . 83 Preface v 6 Parallel Machines 85 6.0.1 More on private versus shared addressing . 92 6.0.2 Programming Model . 93 6.0.3 Machine Topology . 93 6.0.4 Homogeneous and heterogeneous machines . 94 6.0.5 Distributed Computing on the Internet and Akamai Network . 95 7 FFT 97 7.1 FFT . 97 7.1.1 Data motion . 99 7.1.2 FFT on parallel machines . 100 7.1.3 Exercises . 101 7.2 Matrix Multiplication . 101 7.3 Basic Data Communication Operations . 102 8 Domain Decomposition 103 8.1 Geometric Issues . 105 8.1.1 Overlapping vs. Non-overlapping regions . 105 8.1.2 Geometric Discretization . 106 8.2 Algorithmic Issues . 108 8.2.1 Classical Iterations and their block equivalents . 109 8.2.2 Schwarz approaches: additive vs. multiplicative . 109 8.2.3 Substructuring Approaches . 112 8.2.4 Accellerants . 114 8.3 Theoretical Issues . 115 8.4 A Domain Decomposition Assignment: Decomposing MIT . 116 9 Particle Methods 119 9.1 Reduce and Broadcast: A function viewpoint . 119 9.2 Particle Methods: An Application . 120 9.3 Outline . 120 9.4 What is N-Body Simulation? . ..