Mathematics 18.337, Computer Science 6.338, SMA 5505 Applied Parallel Computing Spring 2004

Lecturer: Alan Edelman¹, MIT

¹Department of Mathematics and Laboratory for Computer Science, Room 2-388, Massachusetts Institute of Technology, Cambridge, MA 02139, http://math.mit.edu/~edelman

Contents

1 Introduction
  1.1 The machines
  1.2 The software
  1.3 The Reality of High Performance Computing
  1.4 Modern Algorithms
  1.5 Compilers
  1.6 Scientific Algorithms
  1.7 History, State-of-Art, and Perspective
    1.7.1 Things that are not traditional supercomputers
  1.8 Analyzing the top500 List Using Excel
    1.8.1 Importing the XML file
    1.8.2 Filtering
    1.8.3 Pivot Tables
  1.9 Parallel Computing: An Example
  1.10 Exercises
2 MPI, OpenMP, MATLAB*P
  2.1 Programming style
  2.2 Message Passing
    2.2.1 Who am I?
    2.2.2 Sending and receiving
    2.2.3 Tags and communicators
    2.2.4 Performance, and tolerance
    2.2.5 Who's got the floor?
  2.3 More on Message Passing
    2.3.1 Nomenclature
    2.3.2 The Development of Message Passing
    2.3.3 Machine Characteristics
    2.3.4 Active Messages
  2.4 OpenMP for Shared Memory Parallel Programming
  2.5 STARP
3 Parallel Prefix
  3.1 Parallel Prefix
  3.2 The "Myth" of lg n
  3.3 Applications of Parallel Prefix
    3.3.1 Segmented Scan
    3.3.2 Csanky's Matrix Inversion
    3.3.3 Babbage and Carry Look-Ahead Addition
  3.4 Parallel Prefix in MPI
4 Dense Linear Algebra
  4.1 Dense Matrices
  4.2 Applications
    4.2.1 Uncovering the structure from seemingly unstructured problems
  4.3 Records
  4.4 Algorithms, and mapping matrices to processors
  4.5 The memory hierarchy
  4.6 Single processor considerations for dense linear algebra
    4.6.1 LAPACK and the BLAS
    4.6.2 Reinventing dense linear algebra optimization
  4.7 Parallel computing considerations for dense linear algebra
  4.8 Better load balancing
    4.8.1 Problems
5 Sparse Linear Algebra
  5.1 Cyclic Reduction for Structured Sparse Linear Systems
  5.2 Sparse Direct Methods
    5.2.1 LU Decomposition and Gaussian Elimination
    5.2.2 Parallel Factorization: the Multifrontal Algorithm
  5.3 Basic Iterative Methods
    5.3.1 SuperLU-dist
    5.3.2 Jacobi Method
    5.3.3 Gauss-Seidel Method
    5.3.4 Splitting Matrix Method
    5.3.5 Weighted Splitting Matrix Method
  5.4 Red-Black Ordering for parallel Implementation
  5.5 Conjugate Gradient Method
    5.5.1 Parallel Conjugate Gradient
  5.6 Preconditioning
  5.7 Symmetric Supernodes
    5.7.1 Unsymmetric Supernodes
    5.7.2 The Column Elimination Tree
    5.7.3 Relaxed Supernodes
    5.7.4 Supernodal Numeric Factorization
  5.8 Efficient sparse matrix algorithms
    5.8.1 Scalable algorithms
    5.8.2 Cholesky factorization
    5.8.3 Distributed sparse Cholesky and the model problem
    5.8.4 Parallel Block-Oriented Sparse Cholesky Factorization
  5.9 Load balance with cyclic mapping
    5.9.1 Empirical Load Balance Results
  5.10 Heuristic Remapping
  5.11 Scheduling Local Computations
6 Parallel Machines
    6.0.1 More on private versus shared addressing
    6.0.2 Programming Model
    6.0.3 Machine Topology
    6.0.4 Homogeneous and heterogeneous machines
    6.0.5 Distributed Computing on the Internet and Akamai Network
7 FFT
  7.1 FFT
    7.1.1 Data motion
    7.1.2 FFT on parallel machines
    7.1.3 Exercises
  7.2 Matrix Multiplication
  7.3 Basic Data Communication Operations
8 Domain Decomposition
  8.1 Geometric Issues
    8.1.1 Overlapping vs. Non-overlapping regions
    8.1.2 Geometric Discretization
  8.2 Algorithmic Issues
    8.2.1 Classical Iterations and their block equivalents
    8.2.2 Schwarz approaches: additive vs. multiplicative
    8.2.3 Substructuring Approaches
    8.2.4 Accelerants
  8.3 Theoretical Issues
  8.4 A Domain Decomposition Assignment: Decomposing MIT
9 Particle Methods
  9.1 Reduce and Broadcast: A function viewpoint
  9.2 Particle Methods: An Application
  9.3 Outline
  9.4 What is N-Body Simulation?
  ...