
Overview of ScaLAPACK

Jack Dongarra
Computer Science Department
University of Tennessee
and
Mathematical Sciences Section
Oak Ridge National Laboratory

http://www.netlib.org/utk/people/JackDongarra.html

The Situation:

- Parallel scientific applications are typically
  - written from scratch, or manually adapted from sequential programs
  - using a simple SPMD model
  - in Fortran or C
  - with explicit message passing primitives
- Which means
  - similar components are coded over and over again,
  - codes are difficult to develop and maintain,
  - debugging is particularly unpleasant...
  - components are difficult to reuse,
  - not really designed for a long-term software solution.


What should math software look like?

- Many more possibilities than shared memory
- Possible approaches
  - Minimum change to status quo
    - Message passing and object oriented
  - Reusable templates - requires sophisticated user
  - Problem solving environments
    - Integrated systems a la Matlab, Mathematica
  - Computational server, like xnetlib/f2c
- Some users want high performance and access to details
- Others sacrifice speed to hide details
- Not enough penetration of libraries into the hardest applications on vector machines
- Need to look at applications and rethink mathematical software

The NetSolve Client

- Virtual Software Library
- Use with heterogeneous platforms
- Multiplicity of interfaces -> Ease of use
- Programming interfaces
  - C interface
  - FORTRAN interface
- Interactive interfaces
  - Non-graphic interfaces
    - MATLAB interface
    - UNIX-Shell interface
  - Graphic interfaces
    - TK/TCL interface
    - HotJava Browser interface (WWW)


The NetSolve System

- The system is a set of computational servers
- Heterogeneity supported (XDR)
- Dynamic addition of resources
- Fault tolerance
- Each server uses the installed scientific packages. Examples of packages:
  - BLAS
  - LAPACK
  - ScaLAPACK
  - Ellpack
- Connection to other systems
  - NEOS

The NetSolve Agent

- Intelligent Agent -> Optimum use of resources
- The agent is permanently aware of:
  - the configuration of the NetSolve system
  - the characteristics of each NetSolve resource
  - the load of each resource
  - the functionalities of each resource
- Each client request informs the agent about:
  - the type of problem to solve
  - the size of the problem
- The NetSolve agent then chooses the NetSolve resource (load balancing strategy).


LAPACK / ScaLAPACK: A Scalable Library for Distributed Memory Machines

- LAPACK success: from the HP-48G to the CRAY C-90
  - Targets workstations, vector machines, shared memory
  - Dense and banded matrices
  - Linear systems, least squares, eigenproblems
  - Manual available from SIAM; html version; Japanese translation
  - Second release and manual in 9/94
  - 449,015 requests since then (as of 10/95)
  - Used by Cray, IBM, Convex, NAG, IMSL, ...
- Provide useful scientific program libraries for computational scientists working on new parallel architectures
  - Move to new-generation high performance machines: SP-2, T3D, Paragon, clusters of workstations, ...
  - Add functionality (generalized eigenproblem, sparse matrices, ...)
- New challenges
  - No standard software (Fortran 90, HPF, PVM, MPI, but ...)
  - Many flavors of message passing -> need a standard: BLACS
  - Highly varying ratio

        r = computation speed / communication speed

  - Many ways to lay out data
  - Fastest parallel algorithm is sometimes less stable numerically


Needs

- Standards
  - Portable
  - Scalable
  - Vendor-supported environments
- High efficiency
- Protect the programming investment, if possible
- Avoid low-level details
- Scalability

ScaLAPACK - OVERVIEW

Goal: Port LAPACK (Linear Algebra PACKage) to distributed-memory machines.

- Scalability and Performance
  - Optimized compute and communication engines
  - Blocked data access (level 3 BLAS) yields good node performance
  - 2-D block-cyclic data decomposition yields good load balance
- Portability
  - BLAS: optimized compute engine
  - BLACS: communication kernel (PVM, MPI, ...)
- Simple calling interface (call sketch below)
  - Stay as close as possible to LAPACK in calling sequence, storage, etc.
- Ease of Programming / Maintenance
  - Modularity: build a rich set of linear algebra tools
    - BLAS
    - BLACS
    - PBLAS
  - Whenever possible, use reliable LAPACK algorithms.
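A sketch illustrating the "simple calling interface" point above: the LAPACK and ScaLAPACK dense-solver calls side by side. This is only an illustration; it assumes the distributed matrix A, the right-hand sides B, and their descriptors DESCA and DESCB have already been set up as described on the PBLAS slides later.

*     LAPACK on one node: solve A*X = B.
      CALL DGESV( N, NRHS, A, LDA, IPIV, B, LDB, INFO )
*     ScaLAPACK on the process grid: the same operation on distributed
*     data; the leading dimension is replaced by global indices plus
*     an array descriptor.
      CALL PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV,
     $             B, IB, JB, DESCB, INFO )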


Communication Library for Linear Algebra: Basic Linear Algebra Communication Subprograms

Purpose of the BLACS:

- A design tool: they are a conceptual aid in design and coding.
- Associate widely recognized mnemonic names with communication operations; improve program readability and the self-documenting quality of code.
- Promote efficiency by identifying frequently-occurring operations of linear algebra which can be optimized on various computers.
- Program portability is improved through standardization of kernels, without giving up efficiency, since optimized versions will be available.

2-DIMENSIONAL BLOCK CYCLIC DISTRIBUTION

[Figure: global (left) and distributed (right) views of an 8 x 8 block matrix (A11 ... A88) mapped onto a 2 x 3 process grid.]

- Ensures good load balance -> performance and scalability,
- Encompasses a large number (but not all) of data distribution schemes,
- Need redistribution routines to go from one distribution to the other,
- Not intuitive.
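A small sketch of the underlying index mapping (ICYCOWN is a hypothetical helper written for this illustration, not part of ScaLAPACK): blocks of NB consecutive indices are dealt out cyclically over the processes, and applying the same rule to the row and column indices independently gives the 2-D owner of A(i,j).

      INTEGER FUNCTION ICYCOWN( IG, NB, ISRC, NPROCS )
*     Process coordinate owning the 1-based global index IG when
*     blocks of size NB are dealt out cyclically over NPROCS process
*     rows (or columns), starting on process ISRC.
      INTEGER            IG, NB, ISRC, NPROCS
      ICYCOWN = MOD( ( IG - 1 ) / NB + ISRC, NPROCS )
      RETURN
      END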


BLACS - BASICS

- Processes are embedded in a 2-dimensional grid, e.g. a 3 x 4 grid:

            0    1    2    3
       0    0    1    2    3
       1    4    5    6    7
       2    8    9   10   11

- Operations are matrix-based and ID-less.

  [Figure: general (M x N, leading dimension LDA) and trapezoidal (M, N, M-N, N-M) matrix shapes.]

- An operation which involves more than one sender and one receiver is called a "scoped operation". Using a 2D grid, there are 3 natural scopes:

      SCOPE     MEANING
      Row       All processes in a process row participate.
      Column    All processes in a process column participate.
      All       All processes in the process grid participate.

  -> use of a SCOPE parameter.

- 4 types of BLACS routines: point-to-point communication, broadcast, combine operations and support routines.
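A minimal sketch of the usual grid set-up with the BLACS support routines mentioned above (the 2-row grid shape is just an example choice, and the sketch assumes the process count divides evenly):

      PROGRAM GRID
      INTEGER            IAM, NPROCS, ICTXT, NPROW, NPCOL,
     $                   MYROW, MYCOL
*     How many processes are there, and which one am I?
      CALL BLACS_PINFO( IAM, NPROCS )
*     Get the default system context.
      CALL BLACS_GET( -1, 0, ICTXT )
*     Pick a grid shape and create the grid.
      NPROW = 2
      NPCOL = NPROCS / NPROW
      CALL BLACS_GRIDINIT( ICTXT, 'Row', NPROW, NPCOL )
*     Recover this process's coordinates in the grid.
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*     ... BLACS / PBLAS / ScaLAPACK calls using ICTXT go here ...
      CALL BLACS_GRIDEXIT( ICTXT )
      CALL BLACS_EXIT( 0 )
      END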


BLACS - COMMUNICATION ROUTINES

1. Send/Recv

Send a submatrix from one process to another:

      _xxSD2D( ICONTXT, M, N, A, LDA, RDEST, CDEST )
      _xxRV2D( ICONTXT, M, N, A, LDA, RSRC, CSRC )

where "_" is the data type and "xx" the matrix type:

      _  Data type              xx  Matrix type
      I: INTEGER                GE: GEneral rectangular matrix
      S: REAL                   TR: TRapezoidal matrix
      D: DOUBLE PRECISION
      C: COMPLEX
      Z: DOUBLE COMPLEX

Example:

      CALL BLACS_GRIDINFO( ICONTXT, NPROW, NPCOL, MYROW, MYCOL )
      IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
         CALL DGESD2D( ICONTXT, 5, 1, X, 5, 1, 0 )
      ELSE IF( MYROW.EQ.1 .AND. MYCOL.EQ.0 ) THEN
         CALL DGERV2D( ICONTXT, 5, 1, Y, 5, 0, 0 )
      END IF

2. Broadcast

Send a submatrix to all processes, or to a subsection of the processes in SCOPE, using various distribution patterns (TOP):

      _xxBS2D( ICONTXT, SCOPE, TOP, M, N, A, LDA )
      _xxBR2D( ICONTXT, SCOPE, TOP, M, N, A, LDA, RSRC, CSRC )

      SCOPE        TOP
      'Row'        ' ' (default)
      'Column'     'Increasing Ring'
      'All'        '1-tree', ...

Example:

      CALL BLACS_GRIDINFO( ICONTXT, NPROW, NPCOL, MYROW, MYCOL )
      IF( MYROW.EQ.0 ) THEN
         IF( MYCOL.EQ.0 ) THEN
            CALL DGEBS2D( ICONTXT, 'Row', ' ', 2, 2, A, 3 )
         ELSE
            CALL DGEBR2D( ICONTXT, 'Row', ' ', 2, 2, A, 3, 0, 0 )
         END IF
      END IF


BLACS - COMBINE OPERATIONS

3. Global combine operations

Perform element-wise SUM, |MAX|, |MIN| operations on rectangular matrices:

      _GSUM2D( ICONTXT, SCOPE, TOP, M, N, A, LDA, RDEST, CDEST )
      _GAMX2D( ICONTXT, SCOPE, TOP, M, N, A, LDA, RA, CA, RCFLAG,
               RDEST, CDEST )
      _GAMN2D( ICONTXT, SCOPE, TOP, M, N, A, LDA, RA, CA, RCFLAG,
               RDEST, CDEST )

- RDEST = -1 indicates that the result of the operation should be left on all processes selected by SCOPE.
- For |MAX| and |MIN|, when RCFLAG = -1, RA and CA are not referenced; otherwise, these arrays are set on output with the coordinates of the process owning the corresponding maximum (resp. minimum) element in absolute value of A. Both arrays must be at least of size M x N and their leading dimension is specified by RCFLAG.

BLACS - ADVANCED TOPICS

The BLACS context is the BLACS mechanism for partitioning communication space. A defining property of a context is that a message in a context cannot be sent or received in another context. The BLACS context includes the definition of a grid and each process's coordinates in it.

It allows the user to

- create arbitrary groups of processes,
- create multiple overlapping and/or disjoint grids,
- isolate each process grid so that grids do not interfere with each other.

-> the BLACS context is compatible with the MPI communicator.
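A usage sketch for the combine operations above, using the RDEST = -1 convention to leave a row-wide sum on every process of the caller's process row (ROWSUM is a hypothetical helper written for this illustration):

      SUBROUTINE ROWSUM( ICONTXT, S )
*     Sum the local scalar S across this process row and leave the
*     global sum on every process in the row.
      INTEGER            ICONTXT
      DOUBLE PRECISION   S( 1 )
*     SCOPE = 'Row': all processes in my process row participate;
*     TOP = ' ' picks the default topology; RDEST = CDEST = -1
*     leaves the 1 x 1 result on every process in scope.
      CALL DGSUM2D( ICONTXT, 'Row', ' ', 1, 1, S, 1, -1, -1 )
      RETURN
      END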


BLACS - REFERENCES

- BLACS software and documentation can be obtained via:
  - WWW: http://www.netlib.org/blacs
  - anonymous ftp to ftp.netlib.org: cd blacs; get index
  - email to [email protected] with the message: send index from blacs
- Comments and questions can be addressed to [email protected]
- J. Dongarra and R. van de Geijn, Two Dimensional Basic Linear Algebra Communication Subprograms, Technical Report UT CS-91-138, LAPACK Working Note 37, University of Tennessee, 1991.
- R. C. Whaley, Basic Linear Algebra Communication Subprograms: Analysis and Implementation Across Multiple Parallel Architectures, Technical Report UT CS-94-234, LAPACK Working Note 73, University of Tennessee, 1994.
- J. Dongarra and R. C. Whaley, A User's Guide to the BLACS v1.0, Technical Report UT CS-95-281, LAPACK Working Note 94, University of Tennessee, 1995.
- Message Passing Interface Forum, MPI: A Message Passing Interface Standard, International Journal of Supercomputer Applications and High Performance Computing, 8(3-4), 1994.
- A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam, PVM: Parallel Virtual Machine. A Users' Guide and Tutorial for Networked Parallel Computing, The MIT Press, Cambridge, Massachusetts, 1994.

PBLAS - INTRODUCTION

Parallel Basic Linear Algebra Subprograms for distributed memory MIMD computers.

- Similar functionality to the BLAS: distributed vector-vector, matrix-vector and matrix-matrix operations,
- Simplification of the parallelization of dense linear algebra codes, especially when implemented on top of the BLAS,
- Clarity: code is shorter and easier to read,
- Modularity: gives the programmer larger building blocks,
- Program portability: machine dependencies are confined to the BLAS and the BLACS.


PBLAS - STORAGE CONVENTIONS

- An M-by-N matrix is block partitioned, and these MB-by-NB blocks are distributed according to the 2-dimensional block-cyclic scheme
  - load balanced computations, scalability.
- Locally, the scattered columns are stored contiguously ("column-major" FORTRAN)
  - re-use of the BLAS (local leading dimension LLD).

[Figure: a 5 x 5 matrix (a11 ... a55) partitioned in 2 x 2 blocks, shown from the global point of view and from the 2 x 2 process grid point of view.]

Descriptor DESC: 8-integer array describing the matrix layout, containing M, N, MB, NB, RSRC, CSRC, CTXT and LLD, where (RSRC, CSRC) are the coordinates of the process owning the first matrix entry in the grid specified by CTXT.

Ex: M = N = 5, MB = NB = 2, RSRC = CSRC = 0, LLD >= 3 in process row 0, and >= 2 in process row 1.

PBLAS - ARGUMENT CONVENTIONS

- Global view of the matrix operands, allowing global addressing of distributed matrices (hiding complex local indexing).

[Figure: the M-by-N submatrix A( IA:IA+M-1, JA:JA+N-1 ) inside the M_-by-N_ global matrix A.]

- Code reusability: the interface is very close to the sequential BLAS:

      CALL DGEXXX ( M, N, A( IA, JA ), LDA )
      CALL PDGEXXX( M, N, A, IA, JA, DESCA )

      CALL DGEMM ( 'No Transpose', 'No Transpose',
     $             M-J-JB+1, N-J-JB+1, JB, -ONE, A( J+JB, J ),
     $             LDA, A( J, J+JB ), LDA, ONE, A( J+JB, J+JB ),
     $             LDA )

      CALL PDGEMM( 'No Transpose', 'No Transpose',
     $             M-J-JB+1, N-J-JB+1, JB, -ONE, A, J+JB,
     $             J, DESCA, A, J, J+JB, DESCA, ONE, A,
     $             J+JB, J+JB, DESCA )
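Returning to the 8-integer descriptor described on the storage-conventions slide above, here is a sketch of filling it by hand for the 5 x 5 example, assuming the entries are stored in the order listed there. ICONTXT comes from the BLACS grid set-up; LOCR is a hypothetical argument holding this process's local row count (3 in process row 0, 2 in process row 1).

      SUBROUTINE SETDESC( DESCA, ICONTXT, LOCR )
      INTEGER            DESCA( 8 ), ICONTXT, LOCR
*     Global dimensions M and N, block sizes MB and NB.
      DESCA( 1 ) = 5
      DESCA( 2 ) = 5
      DESCA( 3 ) = 2
      DESCA( 4 ) = 2
*     (RSRC, CSRC): process coordinates owning A(1,1).
      DESCA( 5 ) = 0
      DESCA( 6 ) = 0
*     CTXT: BLACS context of the process grid.
      DESCA( 7 ) = ICONTXT
*     LLD: local leading dimension, at least the local row count.
      DESCA( 8 ) = MAX( 1, LOCR )
      RETURN
      END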


PBLAS - SPECIFICATIONS

      DGEMV ( TRANS, M, N, ALPHA, A, LDA, X, INCX,
              BETA, Y, INCY )

      PDGEMV( TRANS, M, N, ALPHA, A, IA, JA, DESCA,
              X, IX, JX, DESCX, INCX,
              BETA, Y, IY, JY, DESCY, INCY )

      BLAS                      PBLAS
      INTEGER LDA               INTEGER IA, JA, DESCA( 8 )
      INTEGER INCX              INTEGER INCX, IX, JX, DESCX( 8 )
      A, LDA                    A, IA, JA, DESCA
      X, INCX                   X, IX, JX, DESCX, INCX

- In the PBLAS, the increment specified for vectors is always global. So far only INCX = 1 and INCX = DESCX(1) are supported.
- PBLAS matrix transposition routine (REAL):

      PDTRAN( M, N, ALPHA, A, IA, JA, DESCA, BETA,
              C, IC, JC, DESCC )

- Level 1 BLAS functions have become PBLAS subroutines. The output scalar is correct in the operand scope.

>> SPMD PROGRAMMING MODEL <<

PBLAS - REFERENCES

- PBLAS software and documentation can be obtained via:
  - WWW: http://www.netlib.org/scalapack
  - anonymous ftp to ftp.netlib.org: cd scalapack; get index
  - email to [email protected] with the message: send index from scalapack
- Comments and questions can be addressed to [email protected]
- J. Choi, J. Dongarra, S. Ostrouchov, A. Petitet, D. Walker and R. C. Whaley, A Proposal for a Set of Parallel Basic Linear Algebra Subprograms, Technical Report UT CS-95-292, LAPACK Working Note 100, University of Tennessee, 1995.


ScaLAPACK IMPLEMENTATION

Goal
- Port sequential libraries for shared memory machines that use the BLAS to distributed memory machines with little effort.
- Reuse the existing software by hiding the message passing in a set of BLAS routines.

[Figure: layered diagram - ScaLAPACK / PBLAS / LAPACK / BLAS / BLACS / SYSTEM.]

- LAPACK -> ScaLAPACK
- BLAS -> PBLAS (BLAS, BLACS)
- Quality maintenance
- Portability (F77, C)
- Efficiency - Reusability (BLAS, BLACS)
- Hide parallelism in the PBLAS

Parallel Level 2 and 3 BLAS: PBLAS

- Parallel implementation of the BLAS that understands how the matrix is laid out and, when called, can perform not only the operation but also the required data transfer.

BLAS: Performance and Portability
- Blocked data access (level 3 BLAS) yields good local performance.

BLACS: Performance and Portability
- Dedicated machine-dependent communication kernel,
- 2D array-based and ID-less communication,
- Code re-entrance, multiple overlapping process grids,
- Safe co-existence with other parallel libraries.


SEQUENTIAL LU FACTORIZATION CODE

      DO 20 J = 1, MIN( M, N ), NB
         JB = MIN( MIN( M, N )-J+1, NB )
*
*        Factor diagonal and subdiagonal blocks and test for exact
*        singularity.
*
         CALL DGETF2( M-J+1, JB, A( J, J ), LDA, IPIV( J ), IINFO )
*
*        Adjust INFO and the pivot indices.
*
         IF( INFO.EQ.0 .AND. IINFO.GT.0 )
     $      INFO = IINFO + J - 1
         DO 10 I = J, MIN( M, J+JB-1 )
            IPIV( I ) = J - 1 + IPIV( I )
   10    CONTINUE
*
*        Apply interchanges to columns 1:J-1.
*
         CALL DLASWP( J-1, A, LDA, J, J+JB-1, IPIV, 1 )
*
         IF( J+JB.LE.N ) THEN
*
*           Apply interchanges to columns J+JB:N.
*
            CALL DLASWP( N-J-JB+1, A( 1, J+JB ), LDA, J, J+JB-1,
     $                   IPIV, 1 )
*
*           Compute block row of U.
*
            CALL DTRSM( 'Left', 'Lower', 'No transpose', 'Unit',
     $                  JB, N-J-JB+1, ONE, A( J, J ), LDA,
     $                  A( J, J+JB ), LDA )
            IF( J+JB.LE.M ) THEN
*
*              Update trailing submatrix.
*
               CALL DGEMM( 'No transpose', 'No transpose',
     $                     M-J-JB+1, N-J-JB+1, JB, -ONE,
     $                     A( J+JB, J ), LDA, A( J, J+JB ), LDA,
     $                     ONE, A( J+JB, J+JB ), LDA )
            END IF
         END IF
   20 CONTINUE

PARALLEL LU FACTORIZATION CODE

      DO 10 J = JA, JA+MIN( M, N )-1, DESCA( 4 )
         JB = MIN( MIN( M, N )-J+JA, DESCA( 4 ) )
         I = IA + J - JA
*
*        Factor diagonal and subdiagonal blocks and test for exact
*        singularity.
*
         CALL PDGETF2( M-J+JA, JB, A, I, J, DESCA, IPIV, IINFO )
*
*        Adjust INFO and the pivot indices.
*
         IF( INFO.EQ.0 .AND. IINFO.GT.0 )
     $      INFO = IINFO + J - JA
*
*        Apply interchanges to columns JA:J-JA.
*
         CALL PDLASWP( 'Forward', 'Rows', J-JA, A, IA, JA, DESCA,
     $                 J, J+JB-1, IPIV )
*
         IF( J-JA+JB+1.LE.N ) THEN
*
*           Apply interchanges to columns J+JB:JA+N-1.
*
            CALL PDLASWP( 'Forward', 'Rows', N-J-JB+JA, A, IA, J+JB,
     $                    DESCA, J, J+JB-1, IPIV )
*
*           Compute block row of U.
*
            CALL PDTRSM( 'Left', 'Lower', 'No transpose', 'Unit',
     $                   JB, N-J-JB+JA, ONE, A, I, J, DESCA,
     $                   A, I, J+JB, DESCA )
            IF( J-JA+JB+1.LE.M ) THEN
*
*              Update trailing submatrix.
*
               CALL PDGEMM( 'No transpose', 'No transpose',
     $                      M-J-JB+JA, N-J-JB+JA, JB, -ONE,
     $                      A, I+JB, J, DESCA, A, I, J+JB, DESCA,
     $                      ONE, A, I+JB, J+JB, DESCA )
            END IF
         END IF
   10 CONTINUE

PERFORMANCE RESULTS

[Figure: LU factorization on the IBM SP-2 - Gflops vs. problem size (2000 to 17000) on 16, 32, and 64 nodes.]


PERFORMANCE RESULTS

[Figure: QR factorization on Intel machines (512 processors) - Gflops vs. problem size (10000 to 34000) on the Delta and the Paragon.]

[Figure: QR factorization on 128 nodes of Intel machines - Gflops vs. problem size (1000 to 18000) on the Paragon and the Delta (i860).]


[Figure: LU factorization on the Intel Paragon - Gflops vs. problem size on 128, 256, 512, and 1024 nodes.]

[Figure: LU factorization on Sparc-5 workstations (Ethernet) - Mflops vs. problem size (1000 to 5000) on 1 to 16 nodes.]

[Figure: LU performance on the Cray T3D (NB = 32) - Gflops vs. matrix size N on a 16 x 16 grid of processors.]

[Figure: LU factorization on the IBM SP-2 - Gflops vs. problem size (1000 to 15000) on 4, 8, and 16 nodes.]


Symmetric Eigenproblem  Ax = λx,  A = A^T

- Two phases:

  Phase 1: Reduce A to tridiagonal T:  A = U T U^T,

      T =  [ a1   b1                      ]
           [ b1   a2   b2                 ]
           [      b2   .    .             ]
           [           .    .    b(n-1)   ]
           [              b(n-1)   an     ]

  Phase 2: Find the eigenvalues and eigenvectors of T:  T = Q Λ Q^T

  So  A = (UQ) Λ (UQ)^T.

- Phase 2 is the bottleneck (roughly 3x Phase 1)

- LAPACK: Cuppen's divide-and-conquer method

- ScaLAPACK:
  - Bisection + inverse iteration with limited reorthogonalization: fast, but may fail to compute orthogonal eigenvectors if the eigenvalues are clustered
  - QR iteration: slower but reliable

Outline of Cuppen's Divide and Conquer

1. Goal: compute T = Q Λ Q^T,  Λ = diag(λ_i),  Q Q^T = I

2. Split T into two halves plus a rank-one correction:

      T =  [ T1   0  ]  +  v v^T
           [ 0    T2 ]

3. Solve T1 = Q1 Λ1 Q1^T and T2 = Q2 Λ2 Q2^T recursively.

4.    T =  [ Q1 Λ1 Q1^T      0       ]  +  v v^T
           [     0       Q2 Λ2 Q2^T  ]

        =  [ Q1  0  ]  ( D + z z^T )  [ Q1  0  ]^T
           [ 0   Q2 ]                 [ 0   Q2 ]

      where  D = diag( Λ1, Λ2 )  and  z = diag( Q1, Q2 )^T v.

5. The eigenvalues λ of D + z z^T are the roots of the "secular equation"

      1 + sum_i  z_i^2 / ( d_ii - λ )  =  0

6. The eigenvector of D + z z^T for λ_j is ( D - λ_j I )^{-1} z.

7. The eigenvectors of T are diag( Q1, Q2 ) V, where V collects the eigenvectors of D + z z^T.
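A small illustrative sketch (not from the slides): evaluating the secular function of step 5, f(λ) = 1 + sum_i z_i^2 / (d_ii - λ). A root finder such as bisection between consecutive poles d_ii would call it repeatedly to locate each eigenvalue.

      DOUBLE PRECISION FUNCTION SECFUN( N, D, Z, LAMBDA )
*     Value of the secular function at LAMBDA for the rank-one
*     modified diagonal matrix D + Z*Z**T (D holds the diagonal).
      INTEGER            N
      DOUBLE PRECISION   D( N ), Z( N ), LAMBDA
      INTEGER            I
      SECFUN = 1.0D0
      DO 10 I = 1, N
         SECFUN = SECFUN + Z( I )**2 / ( D( I ) - LAMBDA )
   10 CONTINUE
      RETURN
      END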


Speedups with Divide-and-Conquer

[Figures: random and geometrically distributed matrices, reduced to tridiagonal form; time relative to DC (divide and conquer) = 1, for QR iteration, BZ (bisection + inverse iteration), and the expert driver (bisection + inverse + QR), matrix sizes up to 1000.]

ScaLAPACK Performance on 64-Processor Intel Gamma

[Figures: predicted vs. measured parallel efficiency as a function of matrix dimension (500 to 2500), and the distribution of time (% of total) among the TRD, EBZ, EIN, QR, MM, and BZ phases.]


WHAT IS AVAILABLE TO YOU?

- Tools
  - BLACS (CMMD, MPL, NX, PVM)
  - PBLAS
  - REDISTRIBUTION
  - PUMMA
- Factorizations
  - LU + solve (call sketch after these slides)
  - Cholesky + solve
  - QR, QL, LQ, RQ, QRP
- Matrix inversions
- Q, QC, Q^T C, CQ, CQ^T
- Reductions
  - to Hessenberg form
  - to symmetric tridiagonal form
  - to bidiagonal form
- Symmetric eigensolver

ScaLAPACK activities

- ARPACK (Arnoldi-Pack) - D. Sorensen
  - Algorithms for symmetric and nonsymmetric linear systems, eigenproblems, generalized eigenproblems
  - User's matrix-vector product, rest serial
  - Based on a clever identity for compressing the partial Arnoldi factorization
- CAPSS (Cartesian Parallel Sparse Solver) - P. Raghavan and M. Heath
  - Parallel sparse Cholesky
  - For finite difference / finite element problems
  - Uses coordinates from the underlying mesh
  - Cartesian nested dissection to lay out data
- Sparse LU with pivoting
  - Exploits data locality using a supernodal approach
  - Up to 120 Mflops on an IBM RS6000/590, depending on the sparsity pattern
  - Fastest code available on many test cases, along with UMFPACK
  - To be released soon
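A call sketch for the "LU + solve" entry in the list above. It is only a sketch; it assumes the distributed matrix A, the right-hand sides B, their descriptors DESCA and DESCB, and the local pivot array IPIV have already been set up.

*     Factor the distributed N-by-N matrix as P*L*U ...
      CALL PDGETRF( N, N, A, IA, JA, DESCA, IPIV, INFO )
*     ... then solve A*X = B for NRHS right-hand sides, overwriting B.
      CALL PDGETRS( 'No transpose', N, NRHS, A, IA, JA, DESCA,
     $              IPIV, B, IB, JB, DESCB, INFO )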


Software Framework

[Figure: layered software hierarchy - ScaLAPACK and the Parallel BLAS (PBLAS) at the global level; the sequential BLAS, the BLACS, and the message passing primitives (MPI, PVM) at the local level.]

PVM calls within the BLACS

- PVM (Parallel Virtual Machine) is a software package that permits a heterogeneous collection of serial, parallel, and vector computers hooked together by a network to appear as one large computer.
- BLACS (Basic Linear Algebra Communication Subroutines) is a software package to perform communication at the same level as the BLAS are for computation.
- The BLACS have been written in terms of calls to PVM.
- Allows a network of workstations to be used with minimal change.
- MPI (Message Passing Interface): standard for message passing.
- BLACS -> MPI underway.


SCALAPACK - ONGOING WORK

- HPF
  - HPF supports the 2D block-cyclic distribution used by ScaLAPACK
  - HPF-like calling sequence (global indexing scheme)
- Portability: MPI implementation of the BLACS
- More flexibility added to the PBLAS
- More testing and timing programs
- Condition estimators
- Iterative refinement of linear system solutions
- SVD
- Linear least squares solvers
- Banded systems
- Nonsymmetric eigensolvers
- ...

LU Results on IBM Cluster

[Figure: LU factorization with solve on a token-ring-connected cluster of IBM RS/6000-320H workstations; speed in Mflops vs. number of processors (1 to 8), for problem sizes from n=1500 on 1 processor up to n=3000 on a 1x8 grid.]

      Proc   Time   Speedup        Proc   Problem   Mflop/s   Speedup
        1    1030      1             1    n=1500       33        -
        2     403      2.6           2    n=2000       64       1.9
        4     155      6.6           4    n=3000      116       3.5
        6     115      8.9           6    n=3000      156       4.7
        8      93     11.0           8    n=3000      194       5.9