INTRODUCTION TO PARALLEL PROGRAMMING WITH MPI AND OPENMP
1–5 March 2021 | Benedikt Steinbusch | Jülich Supercomputing Centre


CONTENTS

I Fundamentals of Parallel Computing
  1 Motivation
  2 Hardware
  3 Software
II First Steps with MPI
  4 What is MPI?
  5 Terminology
  6 Infrastructure
  7 Basic Program Structure
  8 Exercises
III Blocking Point-to-Point Communication
  9 Introduction
  10 Sending
  11 Exercises
  12 Receiving
  13 Exercises
  14 Communication Modes
  15 Semantics
  16 Pitfalls
  17 Exercises
IV Nonblocking Point-to-Point Communication
  18 Introduction
  19 Start
  20 Completion
  21 Remarks
  22 Exercises
V Collective Communication
  23 Introduction
  24 Reductions
  25 Reduction Variants
  26 Exercises
  27 Data Movement
  28 Data Movement Variants
  29 Exercises
  30 In Place Mode
  31 Synchronization
  32 Nonblocking Collective Communication
VI Derived Datatypes
  33 Introduction
  34 Constructors
  35 Exercises
  36 Address Calculation
  37 Padding
  38 Exercises
VII Input/Output
  39 Introduction
  40 File Manipulation
  41 File Views
  42 Data Access
  43 Consistency
  44 Exercises
VIII Tools
  45 MUST
  46 Exercises
IX Communicators
  47 Introduction
  48 Constructors
  49 Accessors
  50 Destructors
  51 Exercises
X Thread Compliance
  52 Introduction
  53 Enabling Thread Support
  54 Matching Probe and Receive
  55 Remarks
XI First Steps with OpenMP
  56 What is OpenMP?
  57 Terminology
  58 Infrastructure
  59 Basic Program Structure
  60 Exercises
XII Low-Level OpenMP Concepts
  61 Introduction
  62 Exercises
  63 Data Environment
  64 Exercises
  65 Thread Synchronization
  66 Exercises
XIII Worksharing
  67 Introduction
  68 The single construct
  69 single Clauses
  70 The loop construct
  71 loop Clauses
  72 Exercises
  73 workshare Construct
  74 Exercises
  75 Combined Constructs
XIV Task Worksharing
  76 Introduction
  77 The task Construct
  78 task Clauses
  79 Task Scheduling
  80 Task Synchronization
  81 Exercises
XV Wrap-up
XVI Tutorial

TIMETABLE

              Day 1             Day 2            Day 3           Day 4          (Day 5)
09:00–10:30   Fundamentals of   Blocking         I/O             First Steps    Tutorial
              Parallel          Collective                       with OpenMP
              Computing         Communication
                                        — Coffee —
11:00–12:30   First Steps       Nonblocking      I/O             Low-Level      Tutorial
              with MPI          Collective                       Constructs
                                Comm.
                                        — Lunch —
13:30–14:30   Blocking P2P      Derived          Tools &         Loop           Tutorial
              Communication     Datatypes        Communicators   Worksharing
                                        — Coffee —
15:00–16:30   Nonblocking P2P   Derived          Thread          Task           Tutorial
              Communication     Datatypes        Compliance      Worksharing

PART I
FUNDAMENTALS OF PARALLEL COMPUTING

1 MOTIVATION

PARALLEL COMPUTING

Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. (Wikipedia¹)

WHY AM I HERE?

The Way Forward
• Frequency scaling has stopped
• Performance increase through more parallel hardware
• Treating scientific problems
  – of larger scale
  – in higher accuracy
  – of a completely new kind

PARALLELISM IN THE TOP 500 LIST

[Figure: Average number of cores of the top 10 systems, 1993–2020 (number of cores, logarithmic scale).]

2 HARDWARE

[Figure: a modern system consists of several nodes, each with a CPU (microchip) and RAM (memory) connected by an internal bus plus an accelerator with its own RAM, linked to the other nodes by an interconnect.]

PARALLEL COMPUTATIONAL UNITS

• Parallel execution of different (parts of) processor instructions
• Happens automatically
• Can only be influenced indirectly by the programmer

Multi-core / Multi-CPU
• Found in commodity hardware today
• Computational units share the same memory

Cluster
• Found in computing centers
• Independent systems linked via a (fast) interconnect
• Each system has its own memory

Accelerators
• Strive to perform certain tasks faster than is possible on a general purpose CPU
• Make different trade-offs
• Often have their own memory
• Often not autonomous

¹Wikipedia. Parallel computing — Wikipedia, The Free Encyclopedia. 2017. URL: https://en.wikipedia.org/w/index.php?title=Parallel_computing&oldid=787466585 (visited on 06/28/2017).

Vector Processors / Vector Units
• Perform the same operation on multiple pieces of data simultaneously
• Making a come-back as SIMD units in commodity CPUs (AVX-512) and GPGPU

MEMORY DOMAINS

Shared Memory
• All memory is directly accessible by the parallel computational units
• Single address space
• Programmer might have to synchronize access

Distributed Memory
• Memory is partitioned into parts which are private to the different computational units
• “Remote” parts of memory are accessed via an interconnect
• Access is usually nonuniform

3 SOFTWARE

PROCESSES & THREADS & TASKS

Abstractions for the independent execution of (part of) a program.

Process
Usually, multiple processes, each with their own associated set of resources (memory, file descriptors, etc.), can coexist

Thread
• Typically “smaller” than processes
• Often, multiple threads per one process
• Threads of the same process can share resources

Task
• Typically “smaller” than threads
• Often, multiple tasks per one thread
• Here: user-level construct

DISTRIBUTED STATE & MESSAGE PASSING

Distributed State
Program state is partitioned into parts which are private to the different processes.

Message Passing
• Parts of the program state are transferred from one process to another for coordination
• Primitive operations are active send and active receive

MPI
• Implements a form of Distributed State and Message Passing
• (But also Shared State and Synchronization)

SHARED STATE & SYNCHRONIZATION

Shared State
The whole program state is directly accessible by the parallel threads.

Synchronization
• Threads can manipulate shared state using common loads and stores
• Establish agreement about progress of execution using synchronization primitives, e.g. barriers, critical sections, …

OpenMP
• Implements Shared State and Synchronization
• (But also higher level constructs)

PART II
FIRST STEPS WITH MPI

4 WHAT IS MPI?

MPI (Message-Passing Interface) is a message-passing interface specification. […] MPI addresses primarily the message-passing parallel programming model, in which data is moved from the address space of one process to that of another process through cooperative operations on each process. (MPI Forum²)

• Industry standard for a message-passing programming model
• Provides specifications (no implementations)
• Implemented as a library with language bindings for C and Fortran
• Portable across different computer architectures

Current version of the standard: 3.1 (June 2015)

BRIEF HISTORY

<1992 several message-passing libraries were developed, PVM, P4, …
1992 At SC92, several developers of message-passing libraries agreed to develop a standard for message-passing
1994 MPI-1.0 standard published
1997 MPI-2.0 standard adds process creation and management, one-sided communication, extended collective communication, external interfaces and parallel I/O
2008 MPI-2.1 combines MPI-1.3 and MPI-2.0
2009 MPI-2.2 corrections and clarifications with minor extensions
2012 MPI-3.0 nonblocking collectives, new one-sided operations, Fortran 2008 bindings
2015 MPI-3.1 nonblocking collective I/O, current version of the standard

COVERAGE

1. Introduction to MPI ✓
2. MPI Terms and Conventions ✓
3. Point-to-Point Communication ✓
4. Datatypes ✓
5. Collective Communication ✓
6. Groups, Contexts, Communicators and Caching (✓)
7. Process Topologies (✓)
8. MPI Environmental Management (✓)
9. The Info Object
10. Process Creation and Management
11. One-Sided Communications
12. External interfaces (✓)
13. I/O ✓
14. Tool Support
15. …

LITERATURE & ACKNOWLEDGEMENTS

Literature
• Message Passing Interface Forum. MPI: A Message-Passing Interface Standard. Version 3.1. June 4, 2015. URL: https://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
• William Gropp, Ewing Lusk, and Anthony Skjellum. Using MPI. Portable Parallel Programming with the Message-Passing Interface. 3rd ed. The MIT Press, Nov. 2014. 336 pp. ISBN: 9780262527392
• William Gropp et al. Using Advanced MPI. Modern Features of the Message-Passing Interface. 1st ed. Nov. 2014. 392 pp. ISBN: 9780262527637
• https://www.mpi-forum.org

Acknowledgements
• Rolf Rabenseifner for his comprehensive course on MPI and OpenMP
• Marc-André Hermanns, Florian Janetzko and Alexander Trautmann for their course material on MPI and OpenMP

²Message Passing Interface Forum. MPI: A Message-Passing Interface Standard. Version 3.1. June 4, 2015. URL: https://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf.

5 TERMINOLOGY

PROCESS ORGANIZATION [MPI-3.1, 6.2]

Terminology: Process
An MPI program consists of autonomous processes, executing their own code, in an MIMD style.

Terminology: Rank
A unique number assigned to each process within a group (starting at 0)

Terminology: Group
An ordered set of process identifiers

Terminology: Context
A property that allows the partitioning of the communication space

Terminology: Communicator
Scope for communication operations within or between groups, combines the concepts of group and context

OBJECTS [MPI-3.1, 2.5.1]

Terminology: Opaque Objects
Most objects such as communicators, groups, etc. are opaque to the user and kept in regions of memory managed by the MPI library. They are created and marked for destruction using dedicated routines. Objects are made accessible to the user via handle values.

Terminology: Handle
Handles are references to MPI objects. They can be checked for referential equality and copied, however copying a handle does not copy the object it refers to. Destroying an object that has operations pending will not disrupt those operations.

Terminology: Predefined Handles
MPI defines several constant handles to certain objects, e.g. MPI_COMM_WORLD, a communicator containing all processes initially partaking in a parallel execution of a program.

6 INFRASTRUCTURE

COMPILING & LINKING [MPI-3.1, 17.1.7]

MPI libraries or system vendors usually ship compiler wrappers that set search paths and required libraries, e.g.:

C Compiler Wrappers

$ # Generic compiler wrapper shipped with e.g. OpenMPI
$ mpicc foo.c -o foo
$ # Vendor specific wrapper for IBM's XL C compiler on BG/Q
$ bgxlc foo.c -o foo

Fortran Compiler Wrappers

$ # Generic compiler wrapper shipped with e.g. OpenMPI
$ mpifort foo.f90 -o foo
$ # Vendor specific wrapper for IBM's XL Fortran compiler on BG/Q
$ bgxlf90 foo.f90 -o foo

However, neither the existence nor the interface of these wrappers is mandated by the standard.

PROCESS STARTUP [MPI-3.1, 8.8]

The MPI standard does not mandate a mechanism for process startup. It suggests that a command mpiexec with the following interface should exist:

Process Startup

$ # startup mechanism suggested by the standard
$ mpiexec -n <number of processes> <executable>
$ # Slurm startup mechanism as found on JSC systems
$ srun -n <number of processes> <executable>

LANGUAGE BINDINGS [MPI-3.1, 17, A]

C Language Bindings

#include <mpi.h>
C

Fortran Language Bindings

Consistent with F08 standard; good type-checking; highly recommended
use mpi_f08
F08

Not consistent with standard; so-so type-checking; not recommended
use mpi
F90

Not consistent with standard; no type-checking; strongly discouraged
include 'mpif.h'
F77

FORTRAN HINTS [MPI-3.1, 17.1.2 -- 17.1.4]

This course uses the Fortran 2008 MPI interface (use mpi_f08) which is not available in all MPI implementations. The Fortran 90 bindings differ from the Fortran 2008 bindings in the following points:
• All derived type arguments are instead integer (some are arrays of integer or have a non-default kind)
• Argument intent is not mandated by the Fortran 90 bindings
• The ierror argument is mandatory instead of optional
• Further details can be found in [MPI-3.1, 17.1]

MPI4PY HINTS

All exercises in the MPI part can be solved using Python with the mpi4py package. The slides do not show Python syntax, so here is a translation guide from the standard bindings to mpi4py.
• Everything lives in the MPI module (from mpi4py import MPI).
• Constants translate to attributes of that module: MPI_COMM_WORLD is MPI.COMM_WORLD.
• Central types translate to Python classes: MPI_Comm is MPI.Comm.
• Functions related to point-to-point and collective communication translate to methods on MPI.Comm: MPI_Send becomes MPI.Comm.Send.
• Functions related to I/O translate to methods on MPI.File: MPI_File_write becomes MPI.File.Write.
• Communication functions come in two flavors:
  – high level, uses pickle to (de)serialize Python objects, method names start with lower case letters, e.g. MPI.Comm.send,
  – low level, uses MPI Datatypes and Python buffers, method names start with upper case letters, e.g. MPI.Comm.Scatter.

See also https://mpi4py.readthedocs.io and the built-in Python help().

OTHER LANGUAGE BINDINGS

Besides the official bindings for C and Fortran mandated by the standard, unofficial bindings for other programming languages exist:

C++      Boost.MPI
MATLAB   Parallel Computing Toolbox
Python   pyMPI, mpi4py, pypar, MYMPI, …
R        Rmpi, pdbMPI
julia    MPI.jl
.NET     MPI.NET
Java     mpiJava, MPJ, MPJExpress

And many others, ask your favorite search engine.

7 BASIC PROGRAM STRUCTURE

WORLD ORDER IN MPI

• Program starts as N distinct processes (p0, p1, p2, …).
• Stream of instructions might be different for each process.
• Each process has access to its own private memory.
• Information is exchanged between processes via messages.
• Processes may consist of multiple threads (see OpenMP part on day 4).

INITIALIZATION [MPI-3.1, 8.7]

Initialize the MPI library, needs to happen before most other MPI functions can be used

int MPI_Init(int *argc, char ***argv)
C

MPI_Init(ierror)
integer, optional, intent(out) :: ierror
F08

Exception (can be used before initialization)

int MPI_Initialized(int* flag)
C

MPI_Initialized(flag, ierror)
logical, intent(out) :: flag
integer, optional, intent(out) :: ierror
F08

flag indicates whether MPI_Init has been called.

FINALIZATION [MPI-3.1, 8.7]

Finalize the MPI library when you are done using its functions

int MPI_Finalize(void)
C

MPI_Finalize(ierror)
integer, optional, intent(out) :: ierror
F08

Exception (can be used after finalization)

int MPI_Finalized(int *flag)
C

MPI_Finalized(flag, ierror)
logical, intent(out) :: flag
integer, optional, intent(out) :: ierror
F08

PREDEFINED COMMUNICATORS [MPI-3.1, 6.4.1]

MPI_COMM_WORLD is a valid handle to a predefined communicator that includes all processes available for communication. Additionally, MPI_COMM_SELF is a communicator that is valid on each process and contains only the process itself.

MPI_Comm MPI_COMM_WORLD;
MPI_Comm MPI_COMM_SELF;
C

type(MPI_Comm) :: MPI_COMM_WORLD
type(MPI_Comm) :: MPI_COMM_SELF
F08

COMMUNICATOR SIZE [MPI-3.1, 6.4.1]

Determine the total number of processes in a communicator

int MPI_Comm_size(MPI_Comm comm, int *size)
C

MPI_Comm_size(comm, size, ierror)
type(MPI_Comm), intent(in) :: comm
integer, intent(out) :: size
integer, optional, intent(out) :: ierror
F08

Examples

int size;
ierror = MPI_Comm_size(MPI_COMM_WORLD, &size);
C

call MPI_Comm_size(MPI_COMM_WORLD, size)
F08

PROCESS RANK [MPI-3.1, 6.4.1]

Determine the rank of the calling process within a communicator

int MPI_Comm_rank(MPI_Comm comm, int *rank)
C

MPI_Comm_rank(comm, rank, ierror)
type(MPI_Comm), intent(in) :: comm
integer, intent(out) :: rank
integer, optional, intent(out) :: ierror
F08

Examples

int rank;
ierror = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
C

call MPI_Comm_rank(MPI_COMM_WORLD, rank)
F08

ERROR HANDLING [MPI-3.1, 8.3, 8.4, 8.5]

• Flexible error handling through error handlers which can be attached to
  – Communicators
  – Files
  – Windows (not part of this course)
• Error handlers can be
  – MPI_ERRORS_ARE_FATAL: Errors encountered in MPI routines abort execution
  – MPI_ERRORS_RETURN: An error code is returned from the routine
  – Custom error handler: A user supplied function is called on encountering an error
• By default
  – Communicators use MPI_ERRORS_ARE_FATAL
  – Files use MPI_ERRORS_RETURN
  – Windows use MPI_ERRORS_ARE_FATAL
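Putting these routines together, a minimal complete program might look like the following sketch (not part of the original slides; it uses only the routines shown above, and the default MPI_ERRORS_ARE_FATAL handler):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int size, rank;

    /* Initialize the MPI library before any other MPI call */
    MPI_Init(&argc, &argv);

    /* Query the number of processes and this process' rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("Hello from rank %d of %d\n", rank, size);

    /* Release all MPI resources; no MPI calls after this point */
    MPI_Finalize();
    return 0;
}
C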

8 EXERCISES

EXERCISE STRATEGIES

Solving
• You do not have to solve all exercises, one per section would be good
• The exercise description tells you what MPI functions/OpenMP directives to use
• Work in pairs on harder exercises
• If you get stuck
  – ask us
  – peek at the solution
• A CMakeLists.txt is included

Solutions
Exercises can be found in exercises/{C|C++|Fortran|Python}/:
• … : Most of the algorithm is there, you add MPI/OpenMP
• hard: Almost empty files, you add the algorithm and MPI/OpenMP
• solutions: Fully solved exercises, if you are completely stuck or for comparison

EXERCISES

Exercise 1 – First Steps 1.1 Output of Ranks Write a program print_rank.{c|cxx|f90|py} that has each process printing its rank.

I am process 0 I am process 1 I am process 2

Use: MPI_Init, MPI_Finalize, MPI_Comm_rank

1.2 Output of ranks and total number of processes

Write a program print_rank_conditional.{c|cxx|f90|py} in such a way that process 0 writes out the total number of processes:

I am process 0 of 3
I am process 1
I am process 2

Use: MPI_Comm_size

PART III
BLOCKING POINT-TO-POINT COMMUNICATION

9 INTRODUCTION

MESSAGE PASSING

• A source process sends a message to a destination process using an MPI send routine
• A destination process needs to post a receive using an MPI receive routine
• The source process and the destination process are specified by their ranks in the communicator
• Every message sent with a point-to-point operation needs to be matched by a receive operation

BLOCKING & NONBLOCKING PROCEDURES

Terminology: Blocking
A procedure is blocking if return from the procedure indicates that the user is allowed to reuse resources specified in the call to the procedure.

Terminology: Nonblocking
If a procedure is nonblocking it will return as soon as possible. However, the user is not allowed to reuse resources specified in the call to the procedure before the communication has been completed using an appropriate completion procedure.

Examples:
• Blocking: Telephone call
• Nonblocking: Email

PROPERTIES

• Communication between two processes within the same communicator
• Caution: A process can send messages to itself.

10 SENDING

SENDING MESSAGES [MPI-3.1, 3.2.1]

[Figure: before the call only the source process holds the value π; after the send and the matching receive, both processes hold it.]

MPI_Send(<buffer>, <destination>)

int MPI_Send(const void* buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
C

MPI_Send(buf, count, datatype, dest, tag, comm, ierror)
type(*), dimension(..), intent(in) :: buf
integer, intent(in) :: count, dest, tag
type(MPI_Datatype), intent(in) :: datatype
type(MPI_Comm), intent(in) :: comm
integer, optional, intent(out) :: ierror
F08

MESSAGES [MPI-3.1, 3.2.2, 3.2.3]

A message consists of two parts:

Terminology: Envelope
• Source process (source)
• Destination process (dest)
• Tag (tag)
• Communicator (comm)

Terminology: Data
Message data is read from/written to buffers specified by:
• Address in memory (buf)
• Number of elements found in the buffer (count)
• Structure of the data (datatype)

DATA TYPES [MPI-3.1, 3.2.2, 3.3, 4.1]

Terminology: Data Type
Describes the structure of a piece of data

Terminology: Basic Data Types
Named by the standard, most correspond to basic data types of C or Fortran:

C type      MPI basic data type
signed int  MPI_INT
float       MPI_FLOAT
char        MPI_CHAR
…           …

Fortran type  MPI basic data type
integer       MPI_INTEGER
real          MPI_REAL
character     MPI_CHARACTER
…             …

Terminology: Derived Data Type
Data types which are not basic datatypes. These can be constructed from other (basic or derived) datatypes.

DATA TYPE MATCHING [MPI-3.1, 3.3]

Terminology: Untyped Communication
• Contents of send and receive buffers are declared as MPI_BYTE.
• Actual contents of buffers can be any type (possibly different).
• Use with care.

Terminology: Typed Communication
• Type of buffer contents must match the MPI data type (e.g. in C, int and MPI_INT).
• Data type on the send must match the data type on the receive operation.
• Allows data conversion when used on heterogeneous systems.

Terminology: Packed data
See [MPI-3.1, 4.2]

11 EXERCISES

Exercise 2 – Point-to-Point Communication

2.1 Send

In the file send_receive.{c|cxx|f90|py} implement the function/subroutine send(msg, dest). It should use MPI_Send to send the integer msg to the process with rank number dest in MPI_COMM_WORLD. For the tag value, use the answer to the ultimate question of life, the universe, and everything (42)³.

Use: MPI_Send

12 RECEIVING

RECEIVING MESSAGES [MPI-3.1, 3.2.4]

MPI_Recv(<buffer>, <source>) -> <message>

int MPI_Recv(void* buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
C

MPI_Recv(buf, count, datatype, source, tag, comm, status, ierror)
type(*), dimension(..) :: buf
integer, intent(in) :: count, source, tag
type(MPI_Datatype), intent(in) :: datatype
type(MPI_Comm), intent(in) :: comm
type(MPI_Status) :: status
integer, optional, intent(out) :: ierror
F08

• count specifies the capacity of the buffer
• Wildcard values are permitted (MPI_ANY_SOURCE & MPI_ANY_TAG)

THE MPI_STATUS TYPE [MPI-3.1, 3.2.5]

Contains information about received messages

MPI_Status status;            type(MPI_Status) :: status
status.MPI_SOURCE             status%MPI_SOURCE
status.MPI_TAG                status%MPI_TAG
status.MPI_ERROR              status%MPI_ERROR
C                             F08

³Douglas Adams. The Hitchhiker's Guide to the Galaxy. Pan Books, Oct. 12, 1979. ISBN: 0-330-25864-8.
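As a small illustration of how the send and receive arguments fit together, the following sketch (not one of the course exercises; run with at least two processes) has rank 0 send a single integer to rank 1 with tag 42:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, msg;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        msg = 17;  /* arbitrary payload */
        MPI_Send(&msg, 1, MPI_INT, 1, 42, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        MPI_Recv(&msg, 1, MPI_INT, 0, 42, MPI_COMM_WORLD, &status);
        printf("received %d from rank %d\n", msg, status.MPI_SOURCE);
    }

    MPI_Finalize();
    return 0;
}
C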

MESSAGE ASSEMBLY

MPI_Send(buffer, 4, MPI_INT, ...)

Buffer   0 1 2 3 4 5 6 7 8 9 ...
Message  0 1 2 3

MPI_Recv(buffer, 4, MPI_INT, ...)

Buffer   0 1 2 3 ? ? ? ? ? ? ...

PROBE [MPI-3.1, 3.8.1]

Returns after a matching message is ready to be received.

MPI_Probe(<source>) -> <status>

int MPI_Probe(int source, int tag, MPI_Comm comm, MPI_Status *status)
C

MPI_Probe(source, tag, comm, status, ierror)
integer, intent(in) :: source, tag
type(MPI_Comm), intent(in) :: comm
type(MPI_Status) :: status
integer, optional, intent(out) :: ierror
F08

• Same rules for message matching as receive routines
• Wildcards permitted for source and tag
• status contains information about the message (e.g. number of elements), which can be queried with MPI_Get_count

int MPI_Get_count(const MPI_Status *status, MPI_Datatype datatype, int *count)
C

MPI_Get_count(status, datatype, count, ierror)
type(MPI_Status), intent(in) :: status
type(MPI_Datatype), intent(in) :: datatype
integer, intent(out) :: count
integer, optional, intent(out) :: ierror
F08

13 EXERCISES

2.2 Receive

In the file send_receive.{c|cxx|f90|py} implement the function/subroutine recv(source). It should use MPI_Recv to receive a single integer from the process with rank number source in MPI_COMM_WORLD. Any tag value should be accepted. Use the received integer as the return value of recv. If you are not interested in the status value, use MPI_STATUS_IGNORE.

Use: MPI_Recv

14 COMMUNICATION MODES [MPI-3.1, 3.4]

SEND MODES

Synchronous send: MPI_Ssend
• Only completes when the receive has started.

Buffered send: MPI_Bsend
• May complete before a matching receive is posted
• Needs a user-supplied buffer (see MPI_Buffer_attach)

Standard send: MPI_Send
• Either synchronous or buffered, leaves the decision to MPI
• If buffered, an internal buffer is used

Ready send: MPI_Rsend
• Asserts that a matching receive has already been posted (otherwise generates an error)
• Might enable more efficient communication

RECEIVE MODES [MPI-3.1, 3.4]

Only one receive routine for all send modes.

Receive: MPI_Recv
• Same routine for all communication modes
• Completes when a message has arrived and the message data has been stored in the buffer

15 SEMANTICS [MPI-3.1, 3.5]

POINT-TO-POINT SEMANTICS

All blocking routines, both send and receive, guarantee that buffers can be reused after control returns.

Order
In single threaded programs, messages are non-overtaking. Between any pair of processes, messages will be received in the order they were sent.

Progress
Out of a pair of matching send and receive operations, at least one is guaranteed to complete.

Fairness
Fairness is not guaranteed by the MPI standard.

Resource limitations
Resource starvation may lead to deadlock, e.g. if progress relies on availability of buffer space for standard mode sends.

16 PITFALLS

DEADLOCK

Structure of program prevents blocking routines from ever completing, e.g.:

Process 0
call MPI_Ssend(..., 1, ...)
call MPI_Recv(..., 1, ...)

Process 1
call MPI_Ssend(..., 0, ...)
call MPI_Recv(..., 0, ...)

Mitigation Strategies
• Changing communication structure (e.g. checkerboard)
• Using MPI_Sendrecv
• Using nonblocking routines
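A minimal sketch of the MPI_Sendrecv mitigation for the exchange above (variable names are illustrative; exactly two processes are assumed):

#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, other, sendval, recvval;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    other   = 1 - rank;   /* partner rank, assuming exactly 2 processes */
    sendval = rank;

    /* Combined send and receive: MPI orders the two halves internally,
     * so this exchange cannot deadlock like the Ssend/Recv pattern above. */
    MPI_Sendrecv(&sendval, 1, MPI_INT, other, 0,
                 &recvval, 1, MPI_INT, other, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
C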

17 EXERCISES

2.3 Global Summation – Ring

In the file global_sum.{c|cxx|f90|py} implement the function/subroutine global_sum_ring(x, y, root, comm). It will be called by all processes on the communicator comm and on each one accepts an integer x. It should compute the global sum of all values of x across all processes and return the result in y only on the process with rank root.

Use the following strategy:
1. The process with rank root starts by sending its value of x to the process with the next higher rank (wrap around to rank 0 on the process with the highest rank).
2. All other processes start by receiving the partial sum from the process with the next lower rank (or from the process with the highest rank on process 0).
3. Next, they add their value of x to the partial result and send it to the next process.
4. The root process eventually receives the global result which it will return in y.

The file contains a small main() function / program that can be used to test whether your implementation works.

Use: MPI_Send, MPI_Recv (and maybe MPI_Sendrecv)

[Example: with root = 1 and x = (3, 1, 7, 4, 9) on five processes, the partial sums 8, 12, 21, 24 travel around the ring and the result y = 24 is returned only on the root process.]

Bonus

In the file global_prefix_sum.{c|cxx|f90|py} implement the function/subroutine global_prefix_sum_ring(x, y, comm). It will be called by all processes on the communicator comm and on each one accepts an integer x. It should compute the global prefix sum of all values of x across all processes and return the results in y, i.e. the y returned on a process is the sum of all the x contributed by processes with lower rank number and its own.

Use the following strategy:
1. Every process except for the one with rank 0 receives a partial result from the process with the next lower rank number
2. Add the local x to the partial result
3. Send the partial result to the process with the next higher rank number (except on the process with the highest rank number)
4. Return the partial result in y

The file contains a small main() function / program that can be used to test whether your implementation works.

Use: MPI_Send, MPI_Recv

[Example: with x = (3, 1, 7, 4, 9) on five processes, the partial sums 4, 11, 15, 24 are passed along the ring and the processes return y = (3, 4, 11, 15, 24).]

PART IV
NONBLOCKING POINT-TO-POINT COMMUNICATION

18 INTRODUCTION

RATIONALE [MPI-3.1, 3.7]

Premise
Communication operations are split into start and completion. The start routine produces a request handle that represents the in-flight operation and is used in the completion routine. The user promises to refrain from accessing the contents of message buffers while the operation is in flight.

Benefit
A single process can have multiple nonblocking operations in flight at the same time. This enables communication patterns that would lead to deadlock if programmed using blocking variants of the same operations. Also, the additional leeway given to the MPI library may be utilized to, e.g.:
• overlap computation and communication
• overlap communication
• pipeline communication
• elide usage of buffers
19 START

INITIATION ROUTINES [MPI-3.1, 3.7.2]

Send Synchronous MPI_Issend Standard MPI_Isend Buffered MPI_Ibsend Ready MPI_Irsend Receive MPI_Irecv

14 Probe NONBLOCKING PROBE [MPI-3.1, 3.8.1] MPI_Iprobe MPI_Iprobe( ) -> ? • “I” is for immediate. * • Signature is similar to blocking counterparts with additional request object. int MPI_Iprobe(int source, int tag, MPI_Comm comm, int *flag, • Initiate operations and relinquish access rights to any buffer involved. ↪ MPI_Status *status) C

NONBLOCKING SEND [MPI-3.1, 3.7.2] MPI_Iprobe(source, tag, comm, flag, status, ierror) integer, intent(in) :: source, tag MPI_Isend( , ) -> type(MPI_Comm), intent(in) :: comm * logical, intent(out) :: flag type(MPI_Status) :: status int MPI_Isend(const void* buf, int count, MPI_Datatype integer, optional, intent(out) :: ierror ↪ datatype, int dest, int tag, MPI_Comm comm, MPI_Request F08 ↪ *request) C • Does not follow start/completion model.

MPI_Isend(buf, count, datatype, dest, tag, comm, request, • Uses true/false flag to indicate availability of a message. ↪ ierror) type(*), dimension(..), intent(in), asynchronous :: buf integer, intent(in) :: count, dest, tag 20 COMPLETION type(MPI_Datatype), intent(in) :: datatype type(MPI_Comm), intent(in) :: comm WAIT [MPI-3.1, 3.7.3] type(MPI_Request), intent(out) :: request MPI_Wait( ) -> integer, optional, intent(out) :: ierror * F08

int MPI_Wait(MPI_Request *request, MPI_Status *status) NONBLOCKING RECEIVE [MPI-3.1, 3.7.2] C MPI_Wait(request, status, ierror) MPI_Irecv( , ) -> * type(MPI_Request), intent(inout) :: request type(MPI_Status) :: status int MPI_Irecv(void* buf, int count, MPI_Datatype datatype, int integer, optional, intent(out) :: ierror F08 ↪ source, int tag, MPI_Comm comm, MPI_Request *request) C • Blocks until operation associated with request is completed MPI_Irecv(buf, count, datatype, source, tag, comm, request, • To wait for the completion of several pending operations ↪ ierror) type(*), dimension(..), asynchronous :: buf MPI_Waitall All events complete integer, intent(in) :: count, source, tag MPI_Waitsome At least one event completes type(MPI_Datatype), intent(in) :: datatype type(MPI_Comm), intent(in) :: comm MPI_Waitany Exactly one event completes type(MPI_Request), intent(out) :: request integer, optional, intent(out) :: ierror TEST [MPI-3.1, 3.7.3] F08

MPI_Test( ) -> ? *

15 FREE CANCEL C * F08 C * F08 C • • • • • • • integer type ierror) MPI_Request_free(request, int ) MPI_Request_free( integer type logical ierror) type status, flag, MPI_Test(request, int int ) MPI_Cancel( MI31 3.7.3] [MPI-3.1, opeincno ecekdfor checked be cannot Completion complete to allowed is Operation handle request the Invalidates deallocation for request the Marks MPI_Testany MPI_Testsome MPI_Testall operations pending several of completion the for Test flag block not Does ↪ MI31 3.8.4] [MPI-3.1, *status) (MPI_Request), *request) MPI_Request_free(MPI_Request (MPI_Status) (MPI_Request), *request, MPI_Test(MPI_Request P_aclMIRqet*request) MPI_Cancel(MPI_Request niae hte h soitdoeainhscompleted has operation associated the whether indicates , , , optional optional intent xcl n vn completes event one Exactly complete events All tlatoeeetcompletes event one least At (out) , , :: intent intent intent intent status :: flag (out) (inout) (out) (inout) :: :: :: :: ierror ierror int request request fa,MPI_Status *flag, 16 VRAPN COMMUNICATION OVERLAPPING operations nonblocking and Bindings Language Fortran The OPERATIONS NONBLOCKING VS. BLOCKING REMARKS 21 F08 • • • • • • • • • • • • integer type ierror) MPI_Cancel(request, eus tl a ob opee via completed be to has still Request cancellation for operation an Marks eea recommendation General computation with Overlap of overlap is benefit Main MPI_ASYNC_PROTECTS_NONBLOCKING bythe correct. be protected being of should point buffers the Communication beyond program your optimize may Fortran buffers as used be not must subscripts vector with Arrays constant time compile (e.g. triplets subscript with Arrays the to equivalent is wait operation matching blocking a by followed immediately operation nonblocking A counterparts blocking the like just mode, any use versa can vice sends and Nonblocking receive nonblocking a with paired be can send blocking A value) status in (indicated succeeds or completely cancelled either is Operation – – – – – – (MPI_Request), ntaino prtossol epae serya possible as early as placed be should operations of Initiation improve overlap significantly to may due performance application present, is support hardware If blocking placed well communication than better significantly perform platforms all Not routines MPI of inside done be only may Progress of readiness Asserted of Synchronization , optional MPI_SUBARRAYS_SUPPORTED , MPI_Issend communication intent intent MPI_Irsend a(1:100:5) (out) (in) MPI_Wait sefre tcmlto wi rtest) or (wait completion at enforced is with :: uthl tsato operation of start at hold must :: is .true. request communication ierror ) can only be reliably used as buffers if the buffers as used reliably be only can ) , MPI_Test asynchronous ) MI31 17.1.16–17.1.20] [MPI-3.1, equals MI31 17.1.13] [MPI-3.1, or .true. MPI_Request_free trbt mk sure (make attribute MI31 17.1.12] [MPI-3.1, – Completion should be placed as late as possible Bonus In the file global_prefix_sum.{c|cxx|f90|py}, implement a function/subroutine 22 EXERCISES global_prefix_sum_tree(x, y, comm) that performs the same operation as global_prefix_sum_ring. Exercise 3 – Nonblocking P2P Communication Use the following strategy: 3.1 Global Summation – Tree 1. On all processes, initialize the partial result for the sum to the local value of x. In the file global_sum.{c|cxx|f90|py}, implement a function/subroutine d global_sum_tree(x, y, root, comm) that performs the same operation as the solution 2. 
Repeat the following steps with distance starting at 1 to exercise 2.3. (a) Send partial results to the process r + d if that process exists (r is rank number) Use the following strategy: (b) Receive partial result from process r - d if that process exists and add it to the local 1. On all processes, initialize the partial result for the sum to the local value of x. partial result d 2. Now proceed in rounds until only a single process remains: (c) If either process exists increase by a factor of two and continue, otherwise return the partial result in y (a) Group processes into pairs – let us call them the left and the right process. Modify the main() function / program so that the new function/subroutine (b) The right process sends its partial result to the left process. global_prefix_sum_tree() is also tested and check your implementation. (c) The left process receives the partial result and adds it to its own one. Use: MPI_Sendrecv (d) The left process continues on into the next round, the right one doesnot. Process 0 1 2 3 4 3. The process that made it to the last round now has the global result which it sends to the x 3 1 7 4 9 process with rank root. + + + + 4. The root process receives the global result and returns it in y. tmp 3 4 8 11 13 Modify the main() function / program so that the new function/subroutine + + + global_sum_tree() is also tested and check your implementation. 3 4 11 15 21 Use: MPI_Irecv, MPI_Wait + Process 0 1 2 3 4 3 4 11 15 24 root 2 2 2 2 2 x 3 1 7 4 9 + + y 3 4 11 15 24 tmp 4 11 9 + 15 9 + 24

y - - 24 - -

PART V
COLLECTIVE COMMUNICATION

23 INTRODUCTION

COLLECTIVE [MPI-3.1, 2.4, 5.1]

Terminology: Collective
A procedure is collective if all processes in a group need to invoke the procedure.

• Collective communication implements certain communication patterns that involve all processes in a group
• Synchronization may or may not occur (except for MPI_Barrier)
• No tags are used
• No MPI_Status values are returned
• Receive buffer size must match the total amount of data sent (c.f. point-to-point communication where receive buffer capacity is allowed to exceed the message size)
• Point-to-point and collective communication do not interfere

CLASSIFICATION [MPI-3.1, 5.2.2]

One-to-all
MPI_Bcast, MPI_Scatter, MPI_Scatterv

All-to-one
MPI_Gather, MPI_Gatherv, MPI_Reduce

All-to-all
MPI_Allgather, MPI_Allgatherv, MPI_Alltoall, MPI_Alltoallv, MPI_Alltoallw, MPI_Allreduce, MPI_Reduce_scatter, MPI_Barrier

Other
MPI_Scan, MPI_Exscan

24 REDUCTIONS

GLOBAL REDUCTION OPERATIONS [MPI-3.1, 5.9]

Associative operations over distributed data

d₀ ⊕ d₁ ⊕ d₂ ⊕ … ⊕ dₙ₋₁, where
• dᵢ, data of process with rank i
• ⊕, associative operation

Examples for ⊕:
• Sum + and product ×
• Maximum max and minimum min
• User-defined operations

Caution: Order of application is not defined, watch out for floating point rounding.

REDUCE [MPI-3.1, 5.9.1]

[Figure: with the element-wise sum as operation, four processes hold (1 2 3), (4 5 6), (7 8 9), (10 11 12) before the call; afterwards the root holds (22 26 30).]

MPI_Reduce(<send buffer>, <recv buffer>, <operation>, <root>)
int MPI_Reduce(const void* sendbuf, void* recvbuf, int count, ↪ MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm) C

MPI_Reduce(sendbuf, recvbuf, count, datatype, op, root, comm, ierror)
type(*), dimension(..), intent(in) :: sendbuf
type(*), dimension(..) :: recvbuf
integer, intent(in) :: count, root
type(MPI_Datatype), intent(in) :: datatype
type(MPI_Op), intent(in) :: op
type(MPI_Comm), intent(in) :: comm
integer, optional, intent(out) :: ierror
F08

EXCLUSIVE SCAN [MPI-3.1, 5.11.2]

[Figure: with the element-wise sum as operation, four processes hold (1 2 3), (4 5 6), (7 8 9), (10 11 12) before the call; afterwards they hold (undefined), (1 2 3), (5 7 9), (12 15 18), i.e. the reduction over all lower ranks.]

MPI_Exscan(<send buffer>, <recv buffer>, <operation>)

int MPI_Exscan(const void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
C

MPI_Exscan(sendbuf, recvbuf, count, datatype, op, comm, ierror)
type(*), dimension(..), intent(in) :: sendbuf
type(*), dimension(..) :: recvbuf
integer, intent(in) :: count
type(MPI_Datatype), intent(in) :: datatype
type(MPI_Op), intent(in) :: op
type(MPI_Comm), intent(in) :: comm
integer, optional, intent(out) :: ierror
F08

PREDEFINED OPERATIONS [MPI-3.1, 5.9.2]

Name        Meaning
MPI_MAX     Maximum
MPI_MIN     Minimum
MPI_SUM     Sum
MPI_PROD    Product
MPI_LAND    Logical and
MPI_BAND    Bitwise and
MPI_LOR     Logical or
MPI_BOR     Bitwise or
MPI_LXOR    Logical exclusive or
MPI_BXOR    Bitwise exclusive or
MPI_MAXLOC  Maximum and the first rank that holds it [MPI-3.1, 5.9.4]
MPI_MINLOC  Minimum and the first rank that holds it [MPI-3.1, 5.9.4]

25 REDUCTION VARIANTS

REDUCTION VARIANTS [MPI-3.1, 5.9 -- 5.11]

Routines with extended or combined functionality:
• MPI_Allreduce: perform a global reduction and replicate the result onto all ranks
• MPI_Reduce_scatter: perform a global reduction, then scatter the result onto all ranks
• MPI_Scan: perform a global prefix reduction, include own data in the result

26 EXERCISES

Exercise 4 – Collective Communication

4.1 Global Summation – Reduce

In the file global_sum.{c|cxx|f90|py} implement the function/subroutine global_sum_reduce(x, y, root, comm) that performs the same operation as the solution to exercise 2.3.

Since global_sum_... is a specialization of MPI_Reduce, it can be implemented by calling MPI_Reduce, passing on the function arguments in the correct way and selecting the correct MPI datatype and reduction operation.

Use: MPI_Reduce

Bonus

calling MPI_Scan, passing on the function arguments in the correct way and selecting the correct before MPI datatype and reduction operation. Use: MPI_Scan

A B C D 27 DATAMOVEMENT A C D B after BROADCAST [MPI-3.1, 5.4]

int MPI_Scatter(const void* sendbuf, int sendcount, 휋 ↪ MPI_Datatype sendtype, void* recvbuf, int recvcount, before ↪ MPI_Datatype recvtype, int root, MPI_Comm comm) C

MPI_Scatter(sendbuf, sendcount, sendtype, recvbuf, recvcount, ↪ recvtype, root, comm, ierror) type(*), dimension(..), intent(in) :: sendbuf 휋 휋 휋 휋 type(*), dimension(..) :: recvbuf after integer, intent(in) :: sendcount, recvcount, root type(MPI_Datatype), intent(in) :: sendtype, recvtype type(MPI_Comm), intent(in) :: comm integer, optional, intent(out) :: ierror int MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, F08 ↪ int root, MPI_Comm comm) C GATHER [MPI-3.1, 5.5] MPI_Bcast(buffer, count, datatype, root, comm, ierror) type(*), dimension(..) :: buffer integer, intent(in) :: count, root B type(MPI_Datatype), intent(in) :: datatype A C D type(MPI_Comm), intent(in) :: comm before integer, optional, intent(out) :: ierror F08

B A C D A B C D after

20 GATHER-TO-ALL F08 C F08 C integer type type integer type recvcount, type recvbuf, sendtype, sendcount, MPI_Allgather(sendbuf, int integer type type integer type recvcount, type recvbuf, sendtype, sendcount, MPI_Gather(sendbuf, int ↪ ↪ ↪ ↪ ↪ ↪ (MPI_Comm), (MPI_Datatype), (*), (*), ierror) comm, recvtype, comm) MPI_Comm recvtype, MPI_Datatype sendtype, MPI_Datatype MPI_Allgather( (MPI_Comm), (MPI_Datatype), (*), ierror) comm, (*), root, recvtype, recvtype, MPI_Datatype sendtype, MPI_Datatype MPI_Gather( A A A , , , , MI31 5.7] [MPI-3.1, B optional intent dimension dimension optional intent dimension dimension C D (in) void const (in) intent intent , , (..) (..), void const (..) (..), A B B intent intent intent :: intent :: B (in) (in) C edon,rccut root recvcount, sendcount, :: edon,recvcount sendcount, :: intent intent (out) (out) sendbuf, * D (in) recvbuf (in) recvbuf :: int void :: void sendbuf, * A C C comm (in) comm (in) :: :: recvbuf, * :: :: comm) MPI_Comm root, recvbuf, * B ierror recvtype sendtype, ierror recvtype sendtype, C :: :: int D sendbuf sendbuf int sendcount, A D D int int sendcount, B recvcount, recvcount, C D

after before 21 L-OALSCATTER/GATHER ALL-TO-ALL AAMVMN SIGNATURES MOVEMENT DATA * F08 C • • • • P_olcie rciebfe> ro or , , MPI_Collective() (MPI_Comm), (MPI_Datatype), (*), (*), ierror) comm, recvtype, comm) MPI_Comm recvtype, MPI_Datatype sendtype, MPI_Datatype MPI_Alltoall( edbuffer send edbuffer send Specify A A A , , B E B optional intent dimension dimension root I C C M D D rcs yrn number rank by process (in) intent and / , void const eev buffer receive (..) (..), B E E intent 푛 eev buffer receive MI31 5.8] [MPI-3.1, intent :: esgs where messages, F F F (in) J G G edon,recvcount sendcount, :: intent (out) N H H (in) recvbuf :: void sendbuf, * C I I comm (in) :: :: recvbuf, * sol ed/witnon written / read only is G J J 푛 are stenme fprocesses of number the is ierror recvtype sendtype, K K K :: address O L L sendbuf int D M M int sendcount, , count H N N recvcount, L O O root P P P , datatype count process after before pcfe the specifies MESSAGE ASSEMBLY Caution: The blocks for different messages in send buffers can overlap. In receive buffers, they must not. Buffer 0 1 2 3 4 5 6 7 8 9 ... MESSAGE ASSEMBLY MPI_Scatter(sendbuffer, 4, MPI_INT, ...) Buffer 0 1 2 3 4 5 6 7 8 9 ... Message 0 1 2 3 MPI_Scatterv(sendbuffer, { 3, 2 }, { 1, 5 }, MPI_INT, ...) Message 4 5 6 7 Message 1 2 3 MPI_Scatter(..., receivebuffer, 4, MPI_INT, ...) Message 5 6 Buffer 0 1 2 3 ? ? ? ? ? ? ... MPI_Scatterv(..., receivebuffer, (3 | 2), MPI_INT, ...) Buffer 4 5 6 7 ? ? ? ? ? ? ... Buffer 1 2 3 ? ? ? ? ? ? ? ... 28 DATA MOVEMENT VARIANTS Buffer 5 6 ? ? ? ? ? ? ? ? ...

DATA MOVEMENT VARIANTS [MPI-3.1, 5.5 -- 5.8] 29 EXERCISES Routines with variable counts (and datatypes): • MPI_Scatterv: scatter into parts of variable length 4.2 Redistribution of Points with Collectives • MPI_Gatherv: gather parts of variable length In the file redistribute.{c|cxx|f90|py} implement the function redistribute which should work as follows: • MPI_Allgatherv: gather parts of variable length onto all processes 1. All processes call the function collectively and pass in an array of random numbers – the MPI_Alltoallv • : exchange parts of variable length between all processes points – from a uniform random distribution on [0, 1). MPI_Alltoallw • : exchange parts of variable length and datatype between all processes 2. Partition [0, 1) among the nranks processes: process 푖 gets partition [푖/푛푟푎푛푘푠, (푖 + 1)/푛푟푎푛푘푠). DATA MOVEMENT SIGNATURES 3. Redistribute the points, so that every process is left with only those points that lie inside its partition and return them from the function. MPI_Collectivev(, , ) Guidelines: * • Use collectives, either MPI_Gather and MPI_Scatter or MPI_Alltoall(v) (see • Same high-level pattern as before below) • Difference: for buffers holding 푛 messages, can specify, for every message • It helps to partition the points so that consecutive blocks can be sent to other processes – An individual count of message elements • MPI_Alltoall can be used to distribute the information that is needed to call – A displacement (in units of message elements) from the beginning of the buffer at MPI_Alltoallv which to start taking elements • Dynamic memory management could be necessary

22 The file contains tests that will check your implementation. If MPI_IN_PLACE is used for recvbuf on the root process, recvcount and recvtype are ignored and the root process does not send data to itself Use: MPI_Alltoall, MPI_Alltoallv

IN PLACE GATHER ALL-TO-ALL WITH VARYING COUNTS

MPI_Alltoallv(const void* sendbuf, const int sendcounts[], ↪ const int sdispls[], MPI_Datatype sendtype, void* recvbuf, A B C D

↪ const int recvcounts[], const int rdispls[], MPI_Datatype before ↪ recvtype, MPI_Comm comm) C

MPI_Alltoallv(sendbuf, sendcounts, sdispls, sendtype, recvbuf, ↪ recvcounts, rdispls, recvtype, comm, ierror) type(*), dimension(..), intent(in) :: sendbuf A A B C D C D

type(*), dimension(..) :: recvbuf after integer, intent(in) :: sendcounts(*), sdispls(*), ↪ recvcounts(*), rdispls(*) type(MPI_Datatype), intent(in) :: sendtype, recvtype MPI_IN_PLACE sendbuf sendcount sendtype type(MPI_Comm), intent(in) :: comm If is used for on the root process, and are integer, optional, intent(out) :: ierror ignored on the root process and the root process will not send data to itself. F08

IN PLACE GATHER-TO-ALL 30 INPLACEMODE

INPLACEMODE A B C D

• Collectives can be used in in place mode with only one buffer to conserve memory before • The special value MPI_IN_PLACE is used in place of either the send or receive buffer address • count and datatype of that buffer are ignored A B C D A B C D A B C D A B C D IN PLACE SCATTER after

A B C D If MPI_IN_PLACE is used for sendbuf on all processes, sendcount and sendtype are

before ignored and the input data is assumed to already be in the correct position in recvbuf.

A A B C D C D after

23 IN PLACE ALL-TO-ALL SCATTER/GATHER IN PLACE EXCLUSIVE SCAN

+ + +

A B C D E F G H I J K L M N O P 1 2 3 4 5 6 7 8 9 10 11 12 before before

A E I M B F J N C G K O D H L P 1 2 3 1 2 3 5 7 9 12 15 18 after after

If MPI_IN_PLACE is used for sendbuf on all processes, sendcount and sendtype are If MPI_IN_PLACE is used for sendbuf on all the processes, the input data is taken from ignored and the input data is assumed to already be in the correct position in recvbuf. recvbuf and replaced by the results.

INPLACEREDUCE 31 SYNCHRONIZATION + + + BARRIER [MPI-3.1, 5.3] 1 2 3 4 5 6 7 8 9 10 11 12

before int MPI_Barrier(MPI_Comm comm) C

MPI_Barrier(comm, ierror) type(MPI_Comm), intent(in) :: comm integer, optional, intent(out) :: ierror F08 1 2 3 4 5 6 22 26 30 10 11 12

after Explicitly synchronizes all processes in the group of a communicator by blocking until all processes have entered the procedure.

If MPI_IN_PLACE is used for sendbuf on the root process, the input data for the root process is 32 NONBLOCKING COLLECTIVE COMMUNICATION taken from recvbuf. PROPERTIES

Properties similar to nonblocking point-to-point communication 1. Initiate communication • Routine names: MPI_I... (I for immediate) • Nonblocking routines return before the operation has completed. • Nonblocking routines have the same arguments as their blocking counterparts plus an extra request argument. 2. User-application proceeds with something else

24 3. Complete operation PART VI • Same completion routines (MPI_Test, MPI_Wait, …) Caution: Nonblocking collective operations cannot be matched with blocking collective DERIVED DATATYPES operations. Nonblocking Barrier 33 INTRODUCTION Barrier is entered through MPI_Ibarrier (which returns immediately). Completion (e.g. MPI_Wait) blocks until all processes have entered. MOTIVATION [MPI-3.1, 4.1]

NONBLOCKING BROADCAST [MPI-3.1, 5.12.2] Reminder: Buffer • Message buffers are defined by a tripleaddress ( , count, datatype). Blocking operation • Basic data types restrict buffers to homogeneous, contiguous sequences of values in int MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, memory. ↪ int root, MPI_Comm comm) C Scenario A Problem: Want to communicate data describing particles that consists of a position (3 double) Nonblocking operation and a particle species (encoded as an int). int MPI_Ibcast(void* buffer, int count, MPI_Datatype datatype, Solution(?): Communicate positions and species in two separate operations. ↪ int root, MPI_Comm comm, MPI_Request* request) C Scenario B Problem: Have an array real :: a(:), want to communicate only every second entry MPI_Bcast(buffer, count, datatype, root, comm, ierror) a(1:n:2). type(*), dimension(..) :: buffer integer, intent(in) :: count, root Solution(?): Copy data to a temporary array. type(MPI_Datatype), intent(in) :: datatype Derived datatypes are a mechanism for describing arrangements of data in buffers. Gives the MPI type(MPI_Comm), intent(in) :: comm library the opportunity to employ the optimal solution. integer, optional, intent(out) :: ierror F08 TYPE & TYPE SIGNATURE [MPI-3.1, 4.1] MPI_Ibcast(buffer, count, datatype, root, comm, request, ↪ ierror) Terminology: Type map type(*), dimension(..), asynchronous :: buffer integer, intent(in) :: count, root A general datatype is described by its type map, a sequence of pairs of basic datatype type(MPI_Datatype), intent(in) :: datatype and displacement: type(MPI_Comm), intent(in) :: comm Typemap = {(type , disp ), … , (type , disp )} type(MPI_Request), intent(out) :: request 0 0 푛−1 푛−1 integer, optional, intent(out) :: ierror F08 Terminology: Type signature A type signature describes the contents of a message read from a buffer with a general datatype:

Typesig = {type0,…, type푛−1}

Type matching is done based on type signatures alone.

25 EXAMPLE 5. MPI_Type_create_struct 푛 blocks of 푚푖 instances of oldtype푖 with displacement 푑푖 for each 푖 = 1, … , 푛 struct heterogeneous { 6. MPI_Type_create_subarray 푛 dimensional subarray out of an array with elements of int i[4]; type oldtype double d[5]; } 7. MPI_Type_create_darray distributed array with elements of type oldtype C

type, bind(C) :: heterogeneous CONTIGUOUS DATA [MPI-3.1, 4.1.2] integer :: i(4) real(real64) :: d(5) int MPI_Type_contiguous(int count, MPI_Datatype oldtype,

end type ↪ F08 MPI_Datatype* newtype) C

Basic Datatype MPI_Type_contiguous(count, oldtype, newtype, ierror) integer, intent(in) :: count 0 MPI_INT MPI_INTEGER type(MPI_Datatype), intent(in) :: oldtype 4 MPI_INT MPI_INTEGER type(MPI_Datatype), intent(out) :: newtype 8 MPI_INT MPI_INTEGER integer, optional, intent(out) :: ierror F08 12 MPI_INT MPI_INTEGER • Simple concatenation of oldtype 16 MPI_DOUBLE MPI_REAL8 24 MPI_DOUBLE MPI_REAL8 • Results in the same access pattern as using oldtype and specifying a buffer with count greater than one. 32 MPI_DOUBLE MPI_REAL8 40 MPI_DOUBLE MPI_REAL8 oldtype 48 MPI_DOUBLE MPI_REAL8

0 4 8 12 16 24 32 40 48 count = 9
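As a usage sketch (not from the original slides), a contiguous type of nine integers can be created, committed, used for a local copy via MPI_Sendrecv on the MPI_COMM_SELF communicator, and freed again; MPI_Type_commit and MPI_Type_free are covered under Commit & Free below:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int src[9], dst[9], i;
    MPI_Datatype ninetype;

    MPI_Init(&argc, &argv);
    for (i = 0; i < 9; ++i) src[i] = i;

    MPI_Type_contiguous(9, MPI_INT, &ninetype);  /* nine consecutive MPI_INT */
    MPI_Type_commit(&ninetype);                  /* commit before first use  */

    /* Copy src to dst on the local process: one element of the derived
     * type on each side describes the same nine integers. */
    MPI_Sendrecv(src, 1, ninetype, 0, 0,
                 dst, 1, ninetype, 0, 0,
                 MPI_COMM_SELF, MPI_STATUS_IGNORE);

    printf("dst[8] = %d\n", dst[8]);

    MPI_Type_free(&ninetype);
    MPI_Finalize();
    return 0;
}
C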

STRUCT DATA [MPI-3.1, 4.1.2]

34 CONSTRUCTORS int MPI_Type_create_struct(int count, const int ↪ array_of_blocklengths[], const MPI_Aint TYPE CONSTRUCTORS [MPI-3.1, 4.1] ↪ array_of_displacements[], const MPI_Datatype ↪ array_of_types[], MPI_Datatype* newtype) C A new derived type is constructed from an existing type oldtype (basic or derived) using type constructors. In order of increasing generality/complexity: MPI_Type_create_struct(count, array_of_blocklengths, ↪ array_of_displacements, array_of_types, newtype, ierror) 1. MPI_Type_contiguous 푛 consecutive instances of oldtype integer, intent(in) :: count, array_of_blocklengths(count) 2. MPI_Type_vector 푛 blocks of 푚 instances of oldtype with stride 푠 integer(kind=MPI_ADDRESS_KIND), intent(in) :: ↪ array_of_displacements(count) 3. MPI_Type_indexed_block 푛 blocks of 푚 instances of oldtype with displacement 푑푖 type(MPI_Datatype), intent(in) :: array_of_types(count) for each 푖 = 1, … , 푛 type(MPI_Datatype), intent(out) :: newtype 4. MPI_Type_indexed 푛 blocks of 푚푖 instances of oldtype with displacement 푑푖 for each integer, optional, intent(out) :: ierror 푖 = 1, … , 푛 F08

26 Caution: Fortran derived data types must be declared sequence or bind(C), see [MPI-3.1, MPI_Type_create_subarray(ndims, array_of_sizes, 17.1.15]. ↪ array_of_subsizes, array_of_starts, order, oldtype, ↪ newtype, ierror) EXAMPLE integer, intent(in) :: ndims, array_of_sizes(ndims), ↪ array_of_subsizes(ndims), array_of_starts(ndims), order struct heterogeneous { type(MPI_Datatype), intent(in) :: oldtype int i[4]; type(MPI_Datatype), intent(out) :: newtype integer, optional, intent(out) :: ierror double d[5]; F08 } EXAMPLE count = 2; array_of_blocklengths[0] = 4; ndims = 2; array_of_displacements[0] = 0; array_of_sizes[] = { 4, 9 }; array_of_types[0] = MPI_INT; array_of_subsizes[] = { 2, 3 }; array_of_blocklengths[1] = 5; array_of_starts[] = { 0, 3 }; array_of_displacements[1] = 16; order = MPI_ORDER_C; array_of_types[1] = MPI_DOUBLE; C oldtype = MPI_INT; C type, bind(C) :: heterogeneous ndims = 2 integer :: i(4) array_of_sizes(:) = (/ 4, 9 /) real(real64) :: d(5) array_of_subsizes(:) = (/ 2, 3 /) end type array_of_starts(:) = (/ 0, 3 /) order = MPI_ORDER_FORTRAN count = 2; oldtype = MPI_INTEGER array_of_blocklengths(1) = 4 F08 array_of_displacements(1) = 0 An array with global size 4 × 9 containing a subarray of size 2 × 3 at offsets 0, 3: array_of_types(1) = MPI_INTEGER array_of_blocklengths(2) = 5 array_of_displacements(2) = 16 array_of_types(2) = MPI_REAL8 F08

0 4 8 12 16 24 32 40 48

COMMIT & FREE [MPI-3.1, 4.1.9] SUBARRAY DATA [MPI-3.1, 4.1.3] Before using a derived datatype in communication it needs to be committed int MPI_Type_create_subarray(int ndims, const int ↪ array_of_sizes[], const int array_of_subsizes[], const int int MPI_Type_commit(MPI_Datatype* datatype) C ↪ array_of_starts[], int order, MPI_Datatype oldtype, ↪ MPI_Datatype* newtype)

C MPI_Type_commit(datatype, ierror) type(MPI_Datatype), intent(inout) :: datatype integer, optional, intent(out) :: ierror F08

27 Marking derived datatypes for deallocation 36 ADDRESS CALCULATION int MPI_Type_free(MPI_Datatype *datatype)

C ALIGNMENT & PADDING

MPI_Type_free(datatype, ierror) struct heterogeneous { type(MPI_Datatype), intent(inout) :: datatype int i[3]; integer, optional, intent(out) :: ierror double d[5]; F08 }

count = 2; 35 EXERCISES array_of_blocklengths[0] = 3; Exercise 5 – Derived Datatypes array_of_displacements[0] = 0; array_of_types[0] = MPI_INT; 5.1 Matrix Access – Diagonal array_of_blocklengths[1] = 5; In the file matrix_access.{c|cxx|f90|py} implement the function/subroutine array_of_displacements[1] = 16; get_diagonal that extracts the elements on the diagonal of an 푁 × 푁 matrix into a vector: array_of_types[1] = MPI_DOUBLE; C vector = matrix , 푖 = 1 … 푁. 푖 푖,푖 type, bind(C) :: heterogeneous Do not access the elements of either the matrix or the vector directly. Rather, use MPI datatypes for integer :: i(3) accessing your data. Assume that the matrix elements are stored in row-major order in C (all real(real64) :: d(5) elements of the first row, followed by all elements of the second row, etc.), column-major order in end type Fortran. count = 2; Hint: MPI_Sendrecv on the MPI_COMM_SELF communicator can be used for copying the array_of_blocklengths(1) = 3 data. array_of_displacements(1) = 0 Use: MPI_Type_vector array_of_types(1) = MPI_INTEGER array_of_blocklengths(2) = 5 5.2 Matrix Access – Upper Triangle array_of_displacements(2) = 16 array_of_types(2) = MPI_REAL8 In the file matrix_access.{c|cxx|f90|py} implement the function/subroutine F08 get_upper that copies all elements on or above the diagonal of an 푁 × 푁 matrix to a second matrix and leaves all other elements untouched. 0 4 8 12 16 24 32 40 48

upper푖,푗 = matrix푖,푗, 푖=1…푁,푗=푖…푁

As in the previous exercise, do not access the matrix elements directly and assume row-major ADDRESS CALCULATION [MPI-3.1, 4.1.5] layout of the matrices in C, column-major order in Fortran. Make sure to un-comment the call to test_get_upper() to have your solution tested. Displacements are calculated as the difference between the addresses at the start of a buffer and Hint: MPI_Sendrecv on the MPI_COMM_SELF communicator can be used for copying the at a particular piece of data in the buffer. The address of a location in memory is found using: data. int MPI_Get_address(const void* location, MPI_Aint* address) Use: MPI_Type_indexed C

MPI_Get_address(location, address, ierror)
    type(*), dimension(..), asynchronous :: location
    integer(kind=MPI_ADDRESS_KIND), intent(out) :: address
    integer, optional, intent(out) :: ierror
F08

ADDRESS ARITHMETIC [MPI-3.1, 4.1.5]

Using the C operator & to determine addresses is discouraged, since it returns a pointer which is not necessarily the same as an address.

Addition

MPI_Aint MPI_Aint_add(MPI_Aint a, MPI_Aint b)
C

integer(kind=MPI_ADDRESS_KIND) MPI_Aint_add(a, b)
    integer(kind=MPI_ADDRESS_KIND), intent(in) :: a, b
F08

Subtraction

MPI_Aint MPI_Aint_diff(MPI_Aint a, MPI_Aint b)
C

integer(kind=MPI_ADDRESS_KIND) MPI_Aint_diff(a, b)
    integer(kind=MPI_ADDRESS_KIND), intent(in) :: a, b
F08

EXAMPLE

struct heterogeneous h;
MPI_Aint base, displ[2];
MPI_Datatype newtype;
MPI_Datatype types[2] = { MPI_INT, MPI_DOUBLE };
int blocklen[2] = { 3, 5 };
MPI_Get_address(&h, &base);
MPI_Get_address(&h.i, &displ[0]);
displ[0] = MPI_Aint_diff(displ[0], base);
MPI_Get_address(&h.d, &displ[1]);
displ[1] = MPI_Aint_diff(displ[1], base);
MPI_Type_create_struct(2, blocklen, displ, types, &newtype);
MPI_Type_commit(&newtype);
C

type(heterogeneous) :: h
integer(kind=MPI_ADDRESS_KIND) :: base, displ(2)
type(MPI_Datatype) :: newtype, types(2)
integer :: blocklen(2)
types = (/ MPI_INTEGER, MPI_REAL8 /)
blocklen = (/ 3, 5 /)
call MPI_Get_address(h, base)
call MPI_Get_address(h%i, displ(1))
displ(1) = MPI_Aint_diff(displ(1), base)
call MPI_Get_address(h%d, displ(2))
displ(2) = MPI_Aint_diff(displ(2), base)
call MPI_Type_create_struct(2, blocklen, displ, types, newtype)
call MPI_Type_commit(newtype)
F08

37 PADDING

EXTENT [MPI-3.1, 4.1]

Terminology: Extent
The extent of a type is determined from its typemap:
lower bound:  lb(Typemap) = min_j disp_j
upper bound:  ub(Typemap) = max_j (disp_j + sizeof(type_j)) + ε
extent:       extent(Typemap) = ub(Typemap) − lb(Typemap)

MESSAGE ASSEMBLY

Successive elements of a type are laid out one extent apart. Let t be a type with type map {(lb, 0), (MPI_CHAR, 1)} (so its extent is 2) and b an array of char b = { 'a', 'b', 'c', 'd', 'e', 'f', … }; then MPI_Send(b, 3, t, ...) will result in a message { 'b', 'd', 'f' } and not { 'b', 'c', 'd' }.

TYPE EXTENT

Extent and true extent of a type can be queried using MPI_Type_get_extent and MPI_Type_get_true_extent. The size of resulting messages can be queried with MPI_Type_size.

RESIZE [MPI-3.1, 4.1.7]

Explicit padding can be added by resizing the type. This creates a new derived datatype with the same type map as oldtype but explicit lower bound lb and explicit upper bound lb + extent.

int MPI_Type_create_resized(MPI_Datatype oldtype, MPI_Aint lb, MPI_Aint extent, MPI_Datatype* newtype)
C

MPI_Type_create_resized(oldtype, lb, extent, newtype, ierror)
    type(MPI_Datatype), intent(in) :: oldtype
    integer(kind=MPI_ADDRESS_KIND), intent(in) :: lb, extent
    type(MPI_Datatype), intent(out) :: newtype
    integer, optional, intent(out) :: ierror
F08

EXAMPLE

MPI_Send(buffer, 4, {(MPI_INT, 0), (ub, 8)}, ...)
MPI_Recv(buffer, 4, {(lb, 0), (MPI_INT, 4)}, ...)

(diagram: buffer and message contents for the two calls — the resized send type picks every other int from the buffer, the resized receive type scatters the four received ints into every other slot of the buffer)

38 EXERCISES

Exercise 5 – Derived Datatypes (continued)

5.3 Structs

Given a definition of a datatype that represents a point in three-dimensional space with additional properties:
• 3 color values (r, g, b ∈ [0, 255])
• 3 coordinates (x, y, z, double precision)
• 1 tag (1 character)
In the file struct.{c|cxx|f90} or struct_.py write a function that returns a committed MPI Datatype point_datatype that describes the data layout of the point structure. Your function will be tested by copying an instance of the point type using the datatype you construct.
Modification: Change the order of the components of the point structure. Does your program still produce correct results?
Use: MPI_Get_address, MPI_Aint_diff, MPI_Type_create_struct, MPI_Type_commit
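Referring back to the RESIZE routine above, a minimal sketch (assumed, not from the slides) of the common pattern of resizing a struct datatype so that its extent equals sizeof(struct heterogeneous). Consecutive array elements then line up with consecutive datatype elements even in the presence of trailing padding; blocklen, displ and types are the arrays built in the address-arithmetic example:

/* hypothetical continuation of the example above */
MPI_Datatype tmp, elemtype;
MPI_Type_create_struct(2, blocklen, displ, types, &tmp);
MPI_Type_create_resized(tmp, 0, (MPI_Aint) sizeof(struct heterogeneous), &elemtype);
MPI_Type_commit(&elemtype);
MPI_Type_free(&tmp);

/* e.g. send an array of 10 structs in one go (hypothetical peer and tag) */
struct heterogeneous items[10];
MPI_Send(items, 10, elemtype, 1, 0, MPI_COMM_WORLD);
C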

PART VII
INPUT/OUTPUT

39 INTRODUCTION

MOTIVATION

I/O on HPC Systems

• “This is not your parents’ I/O subsystem”
• File system is a shared resource
  – Modification of metadata might happen sequentially
  – File system blocks might be shared among processes
• File system access might not be uniform across all processes
• Interoperability of data originating on different platforms

SEQUENTIAL ACCESS TO METADATA

(figure: time for the parallel creation of task-local files with fopen() on Juqueen, IBM Blue Gene/Q, GPFS, filesystem /work; as the number of files grows from 2^15 to 2^21 the measured time rises from roughly 25 to roughly 780 on a logarithmic scale)

MPI I/O

• MPI already defines a language that describes data layout and movement
• Extend this language by I/O capabilities
• More expressive/precise API than POSIX I/O affords better chances for optimization

COMMON I/O STRATEGIES

Funnelled I/O
+ Simple to implement
- I/O bandwidth is limited to the rate of this single process
- Additional communication might be necessary
- Other processes may idle and waste resources during I/O operations

Task-Local Files
+ Simple to implement
+ No explicit coordination between processes needed
+ No false sharing of file system blocks
- Number of files quickly becomes unmanageable
- Files often need to be merged to create a canonical dataset (post-processing)
- File system might introduce implicit coordination (metadata modification)

All or several processes use one file
+ Number of files is independent of number of processes
+ File is in canonical representation (no post-processing)
- Uncoordinated client requests might induce time penalties
- File layout may induce false sharing of file system blocks

40 FILE MANIPULATION

FILE, FILE POINTER & HANDLE [MPI-3.1, 13.1]

Terminology: File
An MPI file is an ordered collection of typed data items.

Terminology: File Pointer
A file pointer is an implicit offset into a file maintained by MPI.

Terminology: File Handle
An opaque MPI object. All operations on an open file reference the file through the file handle.

OPENING A FILE [MPI-3.1, 13.2.1]

int MPI_File_open(MPI_Comm comm, const char* filename, int amode, MPI_Info info, MPI_File* fh)
C

MPI_File_open(comm, filename, amode, info, fh, ierror)
    type(MPI_Comm), intent(in) :: comm
    character(len=*), intent(in) :: filename
    integer, intent(in) :: amode
    type(MPI_Info), intent(in) :: info
    type(MPI_File), intent(out) :: fh
    integer, optional, intent(out) :: ierror
F08

• Collective operation on communicator comm
• Filename must reference the same file on all processes
• Process-local files can be opened using MPI_COMM_SELF
• info object specifies additional information (MPI_INFO_NULL for empty)

ACCESS MODE [MPI-3.1, 13.2.1]

amode denotes the access mode of the file and must be the same on all processes. It must contain exactly one of the following:
MPI_MODE_RDONLY read only access
MPI_MODE_RDWR read and write access
MPI_MODE_WRONLY write only access
and may contain some of the following:
MPI_MODE_CREATE create the file if it does not exist
MPI_MODE_EXCL error if creating file that already exists
MPI_MODE_DELETE_ON_CLOSE delete file on close
MPI_MODE_UNIQUE_OPEN file is not opened elsewhere
MPI_MODE_SEQUENTIAL access to the file is sequential
MPI_MODE_APPEND file pointers are set to the end of the file
Combine using bit-wise or (| operator in C, ior intrinsic in Fortran).

CLOSING A FILE [MPI-3.1, 13.2.2]

int MPI_File_close(MPI_File* fh)
C

MPI_File_close(fh, ierror)
    type(MPI_File), intent(inout) :: fh
    integer, optional, intent(out) :: ierror
F08

• Collective operation
• User must ensure that all outstanding nonblocking and split collective operations associated with the file have completed

DELETING A FILE [MPI-3.1, 13.2.3]

int MPI_File_delete(const char* filename, MPI_Info info)
C

MPI_File_delete(filename, info, ierror)
    character(len=*), intent(in) :: filename
    type(MPI_Info), intent(in) :: info
    integer, optional, intent(out) :: ierror
F08

• Deletes the file identified by filename
• File deletion is a local operation and should be performed by a single process
• If the file does not exist an error is raised
• If the file is opened by any process
  – all further and outstanding access to the file is implementation dependent
  – it is implementation dependent whether the file is deleted; if it is not, an error is raised

FILE PARAMETERS

Setting File Parameters
MPI_File_set_size Set the size of a file [MPI-3.1, 13.2.4]
MPI_File_preallocate Preallocate disk space [MPI-3.1, 13.2.5]
MPI_File_set_info Supply additional information [MPI-3.1, 13.2.8]
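A minimal sketch (assumed file name and access modes) of how the pieces above fit together inside an MPI program — collectively create/open a file for writing and close it again:

#include <mpi.h>

MPI_File fh;
MPI_File_open(MPI_COMM_WORLD, "results.dat",
              MPI_MODE_CREATE | MPI_MODE_WRONLY,
              MPI_INFO_NULL, &fh);
/* ... data access operations on fh ... */
MPI_File_close(&fh);
C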

Inspecting File Parameters
MPI_File_get_size Size of a file [MPI-3.1, 13.2.6]
MPI_File_get_amode Access mode [MPI-3.1, 13.2.7]
MPI_File_get_group Group of processes that opened the file [MPI-3.1, 13.2.7]
MPI_File_get_info Additional information associated with the file [MPI-3.1, 13.2.8]

I/O ERROR HANDLING [MPI-3.1, 8.3, 13.7]

Caution: Communication, by default, aborts the program when an error is encountered. I/O operations, by default, return an error code.

int MPI_File_set_errhandler(MPI_File file, MPI_Errhandler errhandler)
C

MPI_File_set_errhandler(file, errhandler, ierror)
    type(MPI_File), intent(in) :: file
    type(MPI_Errhandler), intent(in) :: errhandler
    integer, optional, intent(out) :: ierror
F08

• The default error handler for files is MPI_ERRORS_RETURN
• Success is indicated by a return value of MPI_SUCCESS
• MPI_ERRORS_ARE_FATAL aborts the program
• Can be set for each file individually or for all files by using MPI_File_set_errhandler on a special file handle, MPI_FILE_NULL

41 FILE VIEWS

FILE VIEW [MPI-3.1, 13.3]

Terminology: File View
A file view determines what part of the contents of a file is visible to a process. It is defined by a displacement (given in bytes) from the beginning of the file, an elementary datatype and a file type. The view into a file can be changed multiple times between opening and closing.

DEFAULT FILE VIEW [MPI-3.1, 13.3]

When newly opened, files are assigned a default view that is the same on all processes:
• Zero displacement
• File contains a contiguous sequence of bytes
• All processes have access to the entire file

(diagram: the file and every process see the same contiguous sequence of bytes 0, 1, 2, 3, …)

ELEMENTARY TYPE [MPI-3.1, 13.3]

Terminology: Elementary Type
An elementary type (or etype) is the unit of data contained in a file. Offsets are expressed in multiples of etypes, file pointers point to the beginning of etypes. Etypes can be basic or derived.

Changing the Elementary Type
E.g. etype = MPI_INT:

(diagram: the file and every process now see the same sequence of ints 0, 1, 2, 3, …)

File Types and Elementary Types are Data Types
• Can be predefined or derived
• The usual constructors can be used to create derived file types and elementary types, e.g.
  – MPI_Type_indexed
  – MPI_Type_create_struct
  – MPI_Type_create_subarray
• Displacements in their typemap must be non-negative and monotonically nondecreasing
• Have to be committed before use
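Since I/O routines return error codes by default (see I/O ERROR HANDLING above), a call can be checked like this — a minimal sketch with an assumed file name:

#include <mpi.h>
#include <stdio.h>

MPI_File fh;
int err = MPI_File_open(MPI_COMM_WORLD, "results.dat",
                        MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
if (err != MPI_SUCCESS) {
    char msg[MPI_MAX_ERROR_STRING];
    int len;
    MPI_Error_string(err, msg, &len);
    fprintf(stderr, "MPI_File_open failed: %s\n", msg);
}
C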

FILE TYPE [MPI-3.1, 13.3]

Terminology: File Type
A file type describes an access pattern. It can contain either instances of the etype or holes with an extent that is divisible by the extent of the etype.

Changing the File Type
E.g. Filetype0 = {(int, 0), (hole, 4), (hole, 8)}, Filetype1 = {(hole, 0), (int, 4), (hole, 8)}, …:

(diagram: with these file types, process 0 sees every third int of the file starting at the first, process 1 sees every third int starting at the second, and so on)

CHANGING THE FILE VIEW [MPI-3.1, 13.3]

int MPI_File_set_view(MPI_File fh, MPI_Offset disp, MPI_Datatype etype, MPI_Datatype filetype, const char* datarep, MPI_Info info)
C

MPI_File_set_view(fh, disp, etype, filetype, datarep, info, ierror)
    type(MPI_File), intent(in) :: fh
    integer(kind=MPI_OFFSET_KIND), intent(in) :: disp
    type(MPI_Datatype), intent(in) :: etype, filetype
    character(len=*), intent(in) :: datarep
    type(MPI_Info), intent(in) :: info
    integer, optional, intent(out) :: ierror
F08

• Collective operation
• datarep and extent of etype must match
• disp, filetype and info can be distinct
• File pointers are reset to zero
• May not overlap with nonblocking or split collective operations
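A minimal sketch (assumed function name and sizes) of setting a simple view: each process's view of a common file starts at its own block of count ints, so rank r can write to or read from the r-th block without computing explicit offsets later:

#include <mpi.h>

void set_block_view(MPI_File fh, int rank, int count)
{
    MPI_Offset disp = (MPI_Offset) rank * count * sizeof(int);
    MPI_File_set_view(fh, disp, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
}
C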

DATA REPRESENTATION [MPI-3.1, 13.5]

• Determines the conversion of data in memory to data on disk
• Influences the interoperability of I/O between heterogeneous parts of a system or different systems

"native"
Data is stored in the file exactly as it is in memory
+ No loss of precision
+ No overhead
- On heterogeneous systems loss of transparent interoperability

"internal"
Data is stored in implementation-specific format
+ Can be used in a homogeneous and heterogeneous environment
+ Implementation will perform conversions if necessary
- Can incur overhead
- Not necessarily compatible between different implementations

"external32"
Data is stored in standardized data representation (big-endian IEEE)
+ Can be read/written also by non-MPI programs
- Precision and I/O performance may be lost due to type conversions between native and external32 representations
- Not available in all implementations

42 DATA ACCESS

Three orthogonal aspects

1. Synchronism
   (a) Blocking
   (b) Nonblocking
   (c) Split collective
2. Coordination
   (a) Noncollective
   (b) Collective
3. Positioning
   (a) Explicit offsets
   (b) Individual file pointers
   (c) Shared file pointers

SYNCHRONISM

Blocking I/O
Blocking I/O routines do not return before the operation is completed.

Nonblocking I/O
• Nonblocking I/O routines do not wait for the operation to finish
• A separate completion routine is necessary [MPI-3.1, 3.7.3, 3.7.5]
• The associated buffers must not be used while the operation is in flight

Split Collective
• “Restricted” form of nonblocking collective
• Buffers must not be used while in flight
• Does not allow other collective accesses to the file while in flight
• begin and end must be used from the same thread

COORDINATION

Noncollective
The completion depends only on the activity of the calling process.

Collective
• Completion may depend on activity of other processes
• Opens opportunities for optimization

POSITIONING [MPI-3.1, 13.4.1 – 13.4.4]

Explicit Offset
• No file pointer is used
• File position for access is given directly as function argument

Individual File Pointers
• Each process has its own file pointer
• After access, pointer is moved to first etype after the last one accessed

Shared File Pointers
• All processes share a single file pointer
• All processes must use the same file view
• Individual accesses appear as if serialized (with an unspecified order)
• Collective accesses are performed in order of ascending rank

POSIX read() and write()
These are blocking, noncollective operations with individual file pointers.

Combine the prefix MPI_File_ with any of the following suffixes:

positioning        synchronism       noncollective             collective
explicit offsets   blocking          read_at, write_at         read_at_all, write_at_all
                   nonblocking       iread_at, iwrite_at       iread_at_all, iwrite_at_all
                   split collective  N/A                       read_at_all_begin, read_at_all_end,
                                                               write_at_all_begin, write_at_all_end
individual file    blocking          read, write               read_all, write_all
pointers           nonblocking       iread, iwrite             iread_all, iwrite_all
                   split collective  N/A                       read_all_begin, read_all_end,
                                                               write_all_begin, write_all_end
shared file        blocking          read_shared,              read_ordered, write_ordered
pointers                             write_shared
                   nonblocking       iread_shared,             N/A
                                     iwrite_shared
                   split collective  N/A                       read_ordered_begin, read_ordered_end,
                                                               write_ordered_begin, write_ordered_end

WRITING
blocking, noncollective, explicit offset [MPI-3.1, 13.4.2]

int MPI_File_write_at(MPI_File fh, MPI_Offset offset, const void* buf, int count, MPI_Datatype datatype, MPI_Status *status)
C

MPI_File_write_at(fh, offset, buf, count, datatype, status, ierror)
    type(MPI_File), intent(in) :: fh
    integer(kind=MPI_OFFSET_KIND), intent(in) :: offset
    type(*), dimension(..), intent(in) :: buf
    integer, intent(in) :: count
    type(MPI_Datatype), intent(in) :: datatype
    type(MPI_Status) :: status
    integer, optional, intent(out) :: ierror
F08

• Starting offset for access is explicitly given
• No file pointer is updated
• Writes count elements of datatype from memory starting at buf
• Typesig(datatype) = Typesig(etype) … Typesig(etype)
• Writing past end of file increases the file size

EXAMPLE
blocking, noncollective, explicit offset [MPI-3.1, 13.4.2]

Process 0 calls MPI_File_write_at(offset = 1, count = 2):

(diagram: in process 0's view of the file, the two elements starting at offset 1 are written)

WRITING
blocking, noncollective, individual [MPI-3.1, 13.4.3]

int MPI_File_write(MPI_File fh, const void* buf, int count, MPI_Datatype datatype, MPI_Status* status)
C

MPI_File_write(fh, buf, count, datatype, status, ierror)
    type(MPI_File), intent(in) :: fh
    type(*), dimension(..), intent(in) :: buf
    integer, intent(in) :: count
    type(MPI_Datatype), intent(in) :: datatype
    type(MPI_Status) :: status
    integer, optional, intent(out) :: ierror
F08

• Starts writing at the current position of the individual file pointer
• Moves the individual file pointer by the count of etypes written

EXAMPLE
blocking, noncollective, individual [MPI-3.1, 13.4.3]

With its file pointer at element 1, process 1 calls MPI_File_write(count = 2):

(diagram: in process 1's view of the file, the two elements starting at its file pointer are written and the pointer advances past them)

WRITING
nonblocking, noncollective, individual [MPI-3.1, 13.4.3]

int MPI_File_iwrite(MPI_File fh, const void* buf, int count, MPI_Datatype datatype, MPI_Request* request)
C
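A minimal sketch (assumed buffer, count and an already opened file handle fh) of the nonblocking variant: start the write at the individual file pointer and complete it later with MPI_Wait; the buffer must not be touched until the request has completed:

#include <mpi.h>

void write_async(MPI_File fh, int data[100])
{
    MPI_Request req;
    MPI_File_iwrite(fh, data, 100, MPI_INT, &req);
    /* ... unrelated computation ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
C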

MPI_File_iwrite(fh, buf, count, datatype, request, ierror)
    type(MPI_File), intent(in) :: fh
    type(*), dimension(..), intent(in) :: buf
    integer, intent(in) :: count
    type(MPI_Datatype), intent(in) :: datatype
    type(MPI_Request), intent(out) :: request
    integer, optional, intent(out) :: ierror
F08

• Starts the same operation as MPI_File_write but does not wait for completion
• Returns a request object that is used to complete the operation

WRITING
blocking, collective, individual [MPI-3.1, 13.4.3]

int MPI_File_write_all(MPI_File fh, const void* buf, int count, MPI_Datatype datatype, MPI_Status* status)
C

MPI_File_write_all(fh, buf, count, datatype, status, ierror)
    type(MPI_File), intent(in) :: fh
    type(*), dimension(..), intent(in) :: buf
    integer, intent(in) :: count
    type(MPI_Datatype), intent(in) :: datatype
    type(MPI_Status) :: status
    integer, optional, intent(out) :: ierror
F08

• Same signature as MPI_File_write, but collective coordination
• Each process uses its individual file pointer
• MPI can use communication between processes to funnel I/O

EXAMPLE
blocking, collective, individual [MPI-3.1, 13.4.3]

With its file pointer at element 1, process 0 calls MPI_File_write_all(count = 1), with its file pointer at element 0, process 1 calls MPI_File_write_all(count = 2), and with its file pointer at element 2, process 2 calls MPI_File_write_all(count = 0):

(diagram: each process writes into its own view at its own file pointer; the calls complete collectively)

WRITING
split collective, individual [MPI-3.1, 13.4.5]

int MPI_File_write_all_begin(MPI_File fh, const void* buf, int count, MPI_Datatype datatype)
C

MPI_File_write_all_begin(fh, buf, count, datatype, ierror)
    type(MPI_File), intent(in) :: fh
    type(*), dimension(..), intent(in) :: buf
    integer, intent(in) :: count
    type(MPI_Datatype), intent(in) :: datatype
    integer, optional, intent(out) :: ierror
F08

int MPI_File_write_all_end(MPI_File fh, const void* buf, MPI_Status* status)
C

MPI_File_write_all_end(fh, buf, status, ierror)
    type(MPI_File), intent(in) :: fh
    type(*), dimension(..), intent(in) :: buf
    type(MPI_Status) :: status
    integer, optional, intent(out) :: ierror
F08

• Same operation as MPI_File_write_all, but split collective
• The buf argument must match the corresponding begin routine
• status is returned by the corresponding end routine

EXAMPLE
blocking, noncollective, shared [MPI-3.1, 13.4.4]

With the shared pointer at element 2,
• process 0 calls MPI_File_write_shared(count = 3),
• process 2 calls MPI_File_write_shared(count = 2):

Scenario 1:

(diagram, Scenario 1: process 0's three elements are written first, at file positions 2–4, followed by process 2's two elements at positions 5–6)

Scenario 2:

(diagram, Scenario 2: process 2's two elements are written first, at positions 2–3, followed by process 0's three elements at positions 4–6)

READING
blocking, noncollective, individual [MPI-3.1, 13.4.3]

int MPI_File_read(MPI_File fh, void* buf, int count, MPI_Datatype datatype, MPI_Status* status)
C

MPI_File_read(fh, buf, count, datatype, status, ierror)
    type(MPI_File), intent(in) :: fh
    type(*), dimension(..) :: buf
    integer, intent(in) :: count
    type(MPI_Datatype), intent(in) :: datatype
    type(MPI_Status) :: status
    integer, optional, intent(out) :: ierror
F08

• Starts reading at the current position of the individual file pointer
• Reads up to count elements of datatype into the memory starting at buf
• status indicates how many elements have been read
• If status indicates less than count elements read, the end of file has been reached
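A minimal sketch (assumed buffer size and an already opened file handle fh) of inspecting the status after a read to see how many elements actually arrived — fewer than requested means end of file was reached:

#include <mpi.h>

int read_some(MPI_File fh, int data[100])
{
    int nread;
    MPI_Status status;
    MPI_File_read(fh, data, 100, MPI_INT, &status);
    MPI_Get_count(&status, MPI_INT, &nread);
    return nread;
}
C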

EXAMPLE
blocking, collective, shared [MPI-3.1, 13.4.4]

With the shared pointer at element 2,
• process 0 calls MPI_File_write_ordered(count = 1),
• process 1 calls MPI_File_write_ordered(count = 2),
• process 2 calls MPI_File_write_ordered(count = 3):

(diagram: the accesses are performed in order of ascending rank — process 0's element lands at file position 2, process 1's two elements at positions 3–4 and process 2's three elements at positions 5–7)

FILE POINTER POSITION [MPI-3.1, 13.4.3]

int MPI_File_get_position(MPI_File fh, MPI_Offset* offset)
C

MPI_File_get_position(fh, offset, ierror)
    type(MPI_File), intent(in) :: fh
    integer(kind=MPI_OFFSET_KIND), intent(out) :: offset
    integer, optional, intent(out) :: ierror
F08

• Returns the current position of the individual file pointer in units of etype
• Value can be used for e.g.
  – return to this position (via seek)
  – calculate a displacement
• MPI_File_get_position_shared queries the position of the shared file pointer

SEEKING TO A FILE POSITION [MPI-3.1, 13.4.3]

int MPI_File_seek(MPI_File fh, MPI_Offset offset, int whence)
C

MPI_File_seek(fh, offset, whence, ierror)
    type(MPI_File), intent(in) :: fh
    integer(kind=MPI_OFFSET_KIND), intent(in) :: offset
    integer, intent(in) :: whence
    integer, optional, intent(out) :: ierror
F08

• whence controls how the file pointer is moved:
  MPI_SEEK_SET sets the file pointer to offset
  MPI_SEEK_CUR offset is relative to the current value of the pointer
  MPI_SEEK_END offset is relative to the end of the file
• offset can be negative but the resulting position may not lie before the beginning of the file
• MPI_File_seek_shared manipulates the shared file pointer

EXAMPLE

Process 0 calls MPI_File_seek(offset = 2, whence = MPI_SEEK_SET):
Process 1 calls MPI_File_seek(offset = -1, whence = MPI_SEEK_CUR):
Process 2 calls MPI_File_seek(offset = -1, whence = MPI_SEEK_END):

(diagram: process 0's file pointer moves to element 2 of its view, process 1's pointer moves back by one element, and process 2's pointer moves to the last element of its view)

CONVERTING OFFSETS [MPI-3.1, 13.4.3]

int MPI_File_get_byte_offset(MPI_File fh, MPI_Offset offset, MPI_Offset* disp)
C

MPI_File_get_byte_offset(fh, offset, disp, ierror)
    type(MPI_File), intent(in) :: fh
    integer(kind=MPI_OFFSET_KIND), intent(in) :: offset
    integer(kind=MPI_OFFSET_KIND), intent(out) :: disp
    integer, optional, intent(out) :: ierror
F08

• Converts a view relative offset (in units of etype) into a displacement in bytes from the beginning of the file

43 CONSISTENCY

CONSISTENCY [MPI-3.1, 13.6.1]

Terminology: Sequential Consistency
If a set of operations is sequentially consistent, they behave as if executed in some serial order. The exact order is unspecified.

• To guarantee sequential consistency, certain requirements must be met
• Requirements depend on access path and file atomicity

Caution: Result of operations that are not sequentially consistent is implementation dependent.

ATOMIC MODE [MPI-3.1, 13.6.1]

Requirements for sequential consistency
Same file handle: always sequentially consistent
File handles from same open: always sequentially consistent
File handles from different open: not influenced by atomicity, see nonatomic mode

• Atomic mode is not the default setting
• Can lead to overhead, because MPI library has to uphold guarantees in general case

int MPI_File_set_atomicity(MPI_File fh, int flag)
C

MPI_File_set_atomicity(fh, flag, ierror)
    type(MPI_File), intent(in) :: fh
    logical, intent(in) :: flag
    integer, optional, intent(out) :: ierror
F08

NONATOMIC MODE [MPI-3.1, 13.6.1]

Requirements for sequential consistency
Same file handle: operations must be either nonconcurrent, nonconflicting, or both
File handles from same open: nonconflicting accesses are sequentially consistent, conflicting accesses have to be protected using MPI_File_sync
File handles from different open: all accesses must be protected using MPI_File_sync

Terminology: Conflicting Accesses
Two accesses are conflicting if they touch overlapping parts of a file and at least one is writing.

int MPI_File_sync(MPI_File fh)
C

MPI_File_sync(fh, ierror)
    type(MPI_File), intent(in) :: fh
    integer, optional, intent(out) :: ierror
F08

• MPI_File_sync is used to delimit sequences of accesses through different file handles
• Sequences that contain a write access may not be concurrent with any other access sequence

The Sync-Barrier-Sync construct

// writing access sequence through one file handle
MPI_File_sync(fh0);
MPI_Barrier(MPI_COMM_WORLD);
MPI_File_sync(fh0);
// ...
C

// ...
MPI_File_sync(fh1);
MPI_Barrier(MPI_COMM_WORLD);
MPI_File_sync(fh1);
// access sequence to the same file through a different file handle
C

44 EXERCISES

Exercise 6 – Data Access

6.1 Writing Data

In the file rank_io.{c|cxx|f90|py} write a function write_rank that takes a communicator as its only argument and does the following:
• Each process writes its own rank in the communicator to a common file rank.dat using "native" data representation.
• The ranks should be in order in the file: 0 … n − 1.
Use: MPI_File_open, MPI_File_set_errhandler, MPI_File_set_view, MPI_File_write_ordered, MPI_File_close

6.2 Reading Data

In the file rank_io.{c|cxx|f90|py} write a function read_rank that takes a communicator as its only argument and does the following:

• The processes read the integers in the file in reverse order, i.e. process 0 reads the last entry, process 1 reads the one before, etc.
• Each process returns the rank number it has read from the function.
Careful: This function might be run on a communicator with a different number of processes. If there are more processes than entries in the file, processes with ranks larger than or equal to the number of file entries should return MPI_PROC_NULL.
Use: MPI_File_seek, MPI_File_get_position, MPI_File_read

6.3 Phone Book

The file phonebook.dat contains several records of the following form:

struct dbentry {
    int key;
    int room_number;
    int phone_number;
    char name[200];
}
C

type :: dbentry
    integer :: key
    integer :: room_number
    integer :: phone_number
    character(len=200) :: name
end type
F08

In the file phonebook.{c|cxx|f90|py} write a function look_up_by_room_number that uses MPI I/O to find an entry by room number. Assume the file was written using "native" data representation. Use MPI_COMM_SELF to open the file. Return a bool or logical to indicate whether an entry has been found and fill an entry via pointer/intent out argument.

PART VIII
TOOLS

45 MUST

MUST
Marmot Umpire Scalable Tool

https://doc.itc.rwth-aachen.de/display/CCP/Project+MUST

MUST checks for correct usage of MPI. It includes checks for the following classes of mistakes:
• Constants and integer values
• Communicator usage
• Datatype usage
• Group usage
• Operation usage
• Request usage
• Leak checks (MPI resources not freed before calling MPI_Finalize)
• Type mismatches
• Overlapping buffers passed to MPI
• Deadlocks resulting from MPI calls
• Basic checks for thread level usage (MPI_Init_thread)

MUST USAGE

Build your application:

$ mpicc -o application.x application.c $ # or $ mpif90 -o application.x application.f90

Replace the MPI starter (e.g. srun) with MUST’s own mustrun:

$ mustrun -n 4 --must:mpiexec srun --must:np -n ./application.x

Different modes of operation (for improved or graceful handling of application crashes) are available via command line switches.
Caution: MUST is not compatible with MPI’s Fortran 2008 interface.

46 EXERCISES

Exercise 7 – MPI Tools

7.1 Must

Have a look at the file must.{c|c++|f90}. It contains a variation of the solution to exercise 2.3 – it should calculate the sum of all ranks and make the result available on all processes.
1. Compile the program and try to run it.
2. Use MUST to discover what is wrong with the program.
3. If any mistakes were found, fix them and go back to 1.
Note: must.f90 uses the MPI Fortran 90 interface.

PART IX
COMMUNICATORS

47 INTRODUCTION

MOTIVATION

Communicators are a scope for communication within or between groups of processes. New communicators with different scope or topological properties can be used to accommodate certain needs.
• Separation of communication spaces: A software library that uses MPI underneath is used in an application that directly uses MPI itself. Communication due to the library should not conflict with communication due to the application.
• Partitioning of process groups: Parts of your software exhibit a collective communication pattern, but only across a subset of processes.
• Exploiting inherent topology: Your application uses a regular cartesian grid to discretize the problem and this translates into certain nearest neighbor communication patterns.

48 CONSTRUCTORS

DUPLICATE [MPI-3.1, 6.4.2]

int MPI_Comm_dup(MPI_Comm comm, MPI_Comm *newcomm) C

MPI_Comm_dup(comm, newcomm, ierror) type(MPI_Comm), intent(in) :: comm type(MPI_Comm), intent(out) :: newcomm integer, optional, intent(out) :: ierror F08

• Duplicates an existing communicator comm • New communicator has the same properties but a new context
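A minimal sketch (hypothetical library code, names made up) of the separation-of-communication-spaces idea: a library duplicates the caller's communicator once during initialization so that its internal messages use a separate context and cannot be matched by application traffic:

#include <mpi.h>

static MPI_Comm lib_comm = MPI_COMM_NULL;

void lib_init(MPI_Comm user_comm)
{
    MPI_Comm_dup(user_comm, &lib_comm);   /* all library traffic uses lib_comm */
}

void lib_finalize(void)
{
    MPI_Comm_free(&lib_comm);
}
C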

SPLIT [MPI-3.1, 6.4.2]

int MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm ↪ *newcomm) C

MPI_Comm_split(comm, color, key, newcomm, ierror)
    type(MPI_Comm), intent(in) :: comm
    integer, intent(in) :: color, key
    type(MPI_Comm), intent(out) :: newcomm
    integer, optional, intent(out) :: ierror
F08

• Splits the processes in a communicator into disjoint subgroups
• Processes are grouped by color value, one new communicator per distinct value
• Processes are ordered by ascending value of key in the new communicator
• The special color value MPI_UNDEFINED does not create a new communicator; MPI_COMM_NULL is returned in newcomm

(diagram: processes carrying color values 0 and 1 and key values 0 and 1 are split into one new communicator per color, with ranks assigned by ascending key)

CARTESIAN TOPOLOGY [MPI-3.1, 7.5.1]

int MPI_Cart_create(MPI_Comm comm_old, int ndims, const int dims[], const int periods[], int reorder, MPI_Comm *comm_cart)
C

MPI_Cart_create(comm_old, ndims, dims, periods, reorder, comm_cart, ierror)
    type(MPI_Comm), intent(in) :: comm_old
    integer, intent(in) :: ndims, dims(ndims)
    logical, intent(in) :: periods(ndims), reorder
    type(MPI_Comm), intent(out) :: comm_cart
    integer, optional, intent(out) :: ierror
F08

• Creates a new communicator with processes arranged on a (possibly periodic) Cartesian grid
• The grid has ndims dimensions and dims[i] points in dimension i
• If reorder is true, MPI is free to assign new ranks to processes

Input: comm_old contains 12 processes (or more), ndims = 2, dims = [ 4, 3 ], periods = [ .false., .false. ], reorder = .false.
Output:
process 0–11: new communicator with topology as shown
process 12–: MPI_COMM_NULL

(diagram: a 4 × 3 grid — ranks 0, 1, 2 at coordinates (0,0), (0,1), (0,2); ranks 3, 4, 5 at (1,0)–(1,2); ranks 6, 7, 8 at (2,0)–(2,2); ranks 9, 10, 11 at (3,0)–(3,2))

49 ACCESSORS

RANK TO COORDINATE [MPI-3.1, 7.5.5]

int MPI_Cart_coords(MPI_Comm comm, int rank, int maxdims, int coords[])
C

MPI_Cart_coords(comm, rank, maxdims, coords, ierror)
    type(MPI_Comm), intent(in) :: comm
    integer, intent(in) :: rank, maxdims
    integer, intent(out) :: coords(maxdims)
    integer, optional, intent(out) :: ierror
F08

Translates the rank of a process into its coordinate on the Cartesian grid.

COORDINATE TO RANK [MPI-3.1, 7.5.5]

int MPI_Cart_rank(MPI_Comm comm, const int coords[], int *rank)
C

MPI_Cart_rank(comm, coords, rank, ierror)
    type(MPI_Comm), intent(in) :: comm
    integer, intent(in) :: coords(*)
    integer, intent(out) :: rank
    integer, optional, intent(out) :: ierror
F08

Translates the coordinate on the Cartesian grid of a process into its rank.

CARTESIAN SHIFT [MPI-3.1, 7.5.6]

int MPI_Cart_shift(MPI_Comm comm, int direction, int disp, int *rank_source, int *rank_dest)
C

MPI_Cart_shift(comm, direction, disp, rank_source, rank_dest, ierror)
    type(MPI_Comm), intent(in) :: comm
    integer, intent(in) :: direction, disp
    integer, intent(out) :: rank_source, rank_dest
    integer, optional, intent(out) :: ierror
F08

• Calculates the ranks of source and destination processes in a shift operation on a Cartesian grid
• direction gives the number of the axis (starting at 0)
• disp gives the displacement

Input: direction = 0, disp = 1, periodic
Output:
process 0: rank_source = 9, rank_dest = 3
…
process 3: rank_source = 0, rank_dest = 6
…
process 9: rank_source = 6, rank_dest = 0
…

Input: direction = 0, disp = 1, not periodic
Output:
process 0: rank_source = MPI_PROC_NULL, rank_dest = 3
…
process 3: rank_source = 0, rank_dest = 6
…
process 9: rank_source = 6, rank_dest = MPI_PROC_NULL
…

Input: direction = 1, disp = 2, not periodic
Output:
process 0: rank_source = MPI_PROC_NULL, rank_dest = 2
process 1: rank_source = MPI_PROC_NULL, rank_dest = MPI_PROC_NULL
process 2: rank_source = 0, rank_dest = MPI_PROC_NULL
…

(diagram: the examples refer to the 4 × 3 grid shown above, with ranks 0–2, 3–5, 6–8 and 9–11 forming its columns)
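Putting the topology routines together, a minimal sketch (assumed 4 × 3 decomposition, so it expects at least 12 processes; extra ranks receiving MPI_COMM_NULL are not handled) of a periodic grid and one exchange with the neighbours along dimension 0:

#include <mpi.h>

int dims[2] = { 4, 3 }, periods[2] = { 1, 1 };
MPI_Comm cart;
int left, right, send = 42, recv;

MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);
MPI_Cart_shift(cart, 0, 1, &left, &right);
MPI_Sendrecv(&send, 1, MPI_INT, right, 0,
             &recv, 1, MPI_INT, left, 0,
             cart, MPI_STATUS_IGNORE);
MPI_Comm_free(&cart);
C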

50 DESTRUCTORS

FREE [MPI-3.1, 6.4.3]

int MPI_Comm_free(MPI_Comm *comm)
C

MPI_Comm_free(comm, ierror)
    type(MPI_Comm), intent(inout) :: comm
    integer, optional, intent(out) :: ierror
F08

Marks a communicator for deallocation.

NULL PROCESSES [MPI-3.1, 3.11]

int MPI_PROC_NULL = /* implementation defined */
C

integer, parameter :: MPI_PROC_NULL = ! implementation defined
F08

• Can be used as source or destination for point-to-point communication
• Communication with MPI_PROC_NULL has no effect
• May simplify code structure (communication with special source/destination instead of branch)
• MPI_Cart_shift returns MPI_PROC_NULL for out of range shifts

51 EXERCISES

Exercise 8 – Communicators

8.1 Cartesian Topology

In global_sum_with_communicators.{c|cxx|f90|py}, redo exercise 2.3 using a Cartesian communicator.
Use: MPI_Cart_create, MPI_Cart_shift, MPI_Comm_free

8.2 Split

In global_sum_with_communicators.{c|cxx|f90|py}, redo exercise 3.1 using a new split communicator per communication round.
Use: MPI_Comm_split

COMPARISON [MPI-3.1, 6.4.1]

int MPI_Comm_compare(MPI_Comm comm1, MPI_Comm comm2, int ↪ *result) C

MPI_Comm_compare(comm1, comm2, result, ierror) type(MPI_Comm), intent(in) :: comm1, comm2 integer, intent(out) :: result integer, optional, intent(out) :: ierror F08

Compares two communicators. The result is one of: MPI_IDENT The two communicators are the same. MPI_CONGRUENT The two communicators consist of the same processes in the same order but communicate in different contexts. MPI_SIMILAR The two communicators consist of the same processes in a different order. MPI_UNEQUAL Otherwise.

PART X
THREAD COMPLIANCE

52 INTRODUCTION

THREAD COMPLIANCE [MPI-3.1, 12.4]

• An MPI library is thread compliant if
  1. Concurrent threads can make use of MPI routines and the result will be as if they were executed in some order.
  2. Blocking routines will only block the executing thread, allowing other threads to make progress.
• MPI libraries are not required to be thread compliant
• Alternative initialization routines to request certain levels of thread compliance
• These functions are always safe to use in a multithreaded setting: MPI_Initialized, MPI_Finalized, MPI_Query_thread, MPI_Is_thread_main, MPI_Get_version, MPI_Get_library_version

53 ENABLING THREAD SUPPORT

THREAD SUPPORT LEVELS [MPI-3.1, 12.4.3]

The following predefined values are used to express all possible levels of thread support:
MPI_THREAD_SINGLE program is single threaded
MPI_THREAD_FUNNELED MPI routines are only used by the main thread
MPI_THREAD_SERIALIZED MPI routines are used by multiple threads, but not concurrently
MPI_THREAD_MULTIPLE MPI is thread compliant, no restrictions

MPI_THREAD_SINGLE < MPI_THREAD_FUNNELED < MPI_THREAD_SERIALIZED < MPI_THREAD_MULTIPLE

INITIALIZATION [MPI-3.1, 12.4.3]

int MPI_Init_thread(int* argc, char*** argv, int required, int* provided)
C

MPI_Init_thread(required, provided, ierror)
    integer, intent(in) :: required
    integer, intent(out) :: provided
    integer, optional, intent(out) :: ierror
F08

• required and provided specify thread support levels
• If possible, provided = required
• Otherwise, if possible, provided > required
• Otherwise, provided < required
• MPI_Init is equivalent to required = MPI_THREAD_SINGLE

INQUIRY FUNCTIONS [MPI-3.1, 12.4.3]

Query level of thread support:

int MPI_Query_thread(int *provided)
C

MPI_Query_thread(provided, ierror)
    integer, intent(out) :: provided
    integer, optional, intent(out) :: ierror
F08

Check whether the calling thread is the main thread:

int MPI_Is_thread_main(int* flag)
C

MPI_Is_thread_main(flag, ierror)
    logical, intent(out) :: flag
    integer, optional, intent(out) :: ierror
F08
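Rounding off the initialization and inquiry routines above, a minimal sketch (the requested level and the warning message are assumptions) of requesting full thread support and checking what the library actually provides:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        fprintf(stderr, "warning: only thread level %d provided\n", provided);
    /* ... */
    MPI_Finalize();
    return 0;
}
C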

54 MATCHING PROBE AND RECEIVE

MATCHING PROBE [MPI-3.1, 3.8.2]

int MPI_Mprobe(int source, int tag, MPI_Comm comm, MPI_Message* message, MPI_Status* status)
C

MPI_Mprobe(source, tag, comm, message, status, ierror)
    integer, intent(in) :: source, tag
    type(MPI_Comm), intent(in) :: comm
    type(MPI_Message), intent(out) :: message
    type(MPI_Status) :: status
    integer, optional, intent(out) :: ierror
F08

• Works like MPI_Probe, except for the returned message value which may be used to receive exactly the probed message
• Nonblocking variant MPI_Improbe exists

MATCHED RECEIVE [MPI-3.1, 3.8.3]

int MPI_Mrecv(void* buf, int count, MPI_Datatype datatype, MPI_Message* message, MPI_Status* status)
C

MPI_Mrecv(buf, count, datatype, message, status, ierror)
    type(*), dimension(..) :: buf
    integer, intent(in) :: count
    type(MPI_Datatype), intent(in) :: datatype
    type(MPI_Message), intent(inout) :: message
    type(MPI_Status) :: status
    integer, optional, intent(out) :: ierror
F08

• Receives the previously probed message
• Sets the message handle to MPI_MESSAGE_NULL
• Nonblocking variant MPI_Imrecv exists

55 REMARKS

CLARIFICATIONS [MPI-3.1, 12.4.2]

Initialization and Finalization
• Initialization and finalization of MPI should occur on the same thread, the main thread

Request Completion
• Multiple threads must not try to complete the same request (e.g. MPI_Wait)

Probe
• In multithreaded settings, MPI_Probe might match a different message than a subsequent MPI_Recv

PART XI
FIRST STEPS WITH OPENMP

56 WHAT IS OPENMP?

OpenMP is a specification for a set of compiler directives, library routines, and environment variables that can be used to specify high-level parallelism in Fortran and C/C++ programs. (OpenMP FAQ⁴)

• Provides specifications (not implementations)
• Initially targeted SMP systems, now also DSPs, accelerators, etc.
• Portable across different platforms

Current version of the specification: 5.1 (November 2020)

BRIEF HISTORY

1997 FORTRAN version 1.0
1998 C/C++ version 1.0
1999 FORTRAN version 1.1
2000 FORTRAN version 2.0
2002 C/C++ version 2.0
2005 First combined version 2.5, memory model, internal control variables, clarifications
2008 Version 3.0, tasks
2011 Version 3.1, extended task facilities
2013 Version 4.0, thread affinity, SIMD, devices, tasks (dependencies, groups, cancellation), improved Fortran 2003 compatibility
2015 Version 4.5, extended SIMD and devices facilities, task priorities
2018 Version 5.0, memory model, base language compatibility, allocators, extended task and devices facilities
2020 Version 5.1, support for newer base languages, loop transformations, compare-and-swap, extended devices facilities

⁴ Matthijs van Waveren et al. OpenMP FAQ. Version 3.0. June 6, 2018. URL: https://www.openmp.org/about/openmp-faq/ (visited on 01/30/2019).
on (visited URL : COVERAGE – Device Information Routines • Directives – Device Memory Routines – Format ✓ – Routines ✓ – Conditional Compilation ✓ – Timing Routines – Variant Directives – Event Routine – Internal Control Variables ✓ – Interoperability Routines – Informational and Utility Directives – Memory Management Routines – parallel Construct ✓ – Tool Control Routine – teams Construct – Environment Display Routine – masked Construct – … – scope Construct • OMPT Interface – Worksharing Constructs ✓ • OMPD Interface – Loop-Related Directives (✓) • Environment Variables (✓) – Tasking Constructs ✓ • … – Memory Management Directives LITERATURE – Device Directives – Interoperability Official Resources – Combined Constructs & Clauses on Combined and Composite Constructs (✓) • OpenMP Architecture Review Board. OpenMP Application Programming Interface. Version 5.1. Nov. 2020. URL: https://www.openmp.org/wp- – if Clause ✓ content/uploads/OpenMP-API-Specification-5.1.pdf – Synchronization Constructs and Clauses ✓ • OpenMP Architecture Review Board. OpenMP Application – Cancellation Constructs Programming Interface. Examples. Version 5.0.0. Nov. 2019. URL: https://www.openmp.org/mp-documents/openmp-examples-5.0.0.pdf – Data Environment ✓ • https://www.openmp.org – Nesting of Regions Recommended by https://www.openmp.org/resources/openmp-books/ • Runtime Library Routines • Timothy G. Mattson, Yun He, and Alice E. Koniges. The OpenMP Common Core. Making – Runtime Library Definitions (✓) OpenMP Simple Again. 1st ed. The MIT Press, Nov. 19, 2019. 320 pp. ISBN: 9780262538862 – Thread Team Routines (✓) • Ruud van der Pas, Eric Stotzer, and Christian Terboven. Using OpenMP—The Next Step. – Thread Affinity Routines Affinity, Accelerators, Tasking, and SIMD. 1st ed. The MIT Press, Oct. 13, 2017. 392 pp. ISBN: 9780262534789 – Teams Region Routines Additional Literature – Tasking Routines • Michael McCool, James Reinders, and Arch Robison. Structured Parallel Programming. – Resource Relinquishing Routines Patterns for Efficient Computation. 1st ed. Morgan Kaufmann, July 31, 2012. 432 pp. ISBN: 9780124159938

48 Older Works (https://www.openmp.org/resources/openmp-books/) Terminology: Directive • Barbara Chapman, Gabriele Jost, and Ruud van der Pas. Using OpenMP. Portable Shared In C/C++, a #pragma, and in Fortran, a comment, that specifies OpenMP program Memory Parallel Programming. 1st ed. Scientific and Engineering Computation. The MIT behavior. Press, Oct. 12, 2007. 384 pp. ISBN: 9780262533027 • Rohit Chandra et al. Parallel Programming in OpenMP. 1st ed. Morgan Kaufmann, Oct. 11, 58 INFRASTRUCTURE 2000. 231 pp. ISBN: 9781558606715 • Michael Quinn. Parallel Programming in C with MPI and OpenMP. 1st ed. McGraw-Hill, June 5, COMPILING & LINKING 2003. 544 pp. ISBN: 9780072822564 Compilers that conform to the OpenMP specification usually accept a command line argument that • Timothy G. Mattson, Beverly A. Sanders, and Berna L. Massingill. Patterns for Parallel turns on OpenMP support, e.g.: Programming. 1st ed. Software Patterns. Sept. 15, 2004. 384 pp. ISBN: 9780321228116 C Compiler OpenMP Command Line Switch 57 TERMINOLOGY $ icc -qopenmp ...

THREADS & TASKS GNU Fortran Compiler OpenMP Command Line Switch

Terminology: Thread $ gfortran -fopenmp ...

An execution entity with a stack and associated static memory, called threadprivate The name of this command line argument is not mandated by the specification and differs from memory. one compiler to another. Terminology: OpenMP Thread Naturally, these arguments are then also accepted by the MPI compiler wrappers: A thread that is managed by the OpenMP . Compiling Programs with Hybrid Parallelization Terminology: Team $ mpicc -qopenmp ... A set of one or more threads participating in the execution of a parallel region. Terminology: Task RUNTIMELIBRARYDEFINITIONS [OpenMP-5.1, 3.1] A specific instance of executable code and its data environment that the OpenMP imlementation can schedule for execution by threads. C/C++ Runtime Library Definitions Runtime library routines and associated types are defined in the omp.h header file. LANGUAGE #include C Terminology: Base Language Fortran Runtime Library Definitions A programming language that serves as the foundation of the OpenMP specification. Runtime library routines and associated types are defined in either a Fortran include file The following base languages are given in [OpenMP-5.1, 1.7]: C90, C99, C++98, Fortran 77, Fortran include "omp_lib.h" 90, Fortran 95, Fortran 2003, and subsets of C11, C++11, C++14, C++17, and Fortran 2008 F77 Terminology: Base Program or a Fortran 90 module A program written in the base language. use omp_lib Terminology: OpenMP Program F08 A program that consists of a base program that is annotated with OpenMP directives or that calls OpenMP API runtime library routines.

49 59 BASICPROGRAMSTRUCTURE • Column 6 is blank for first line of directive, non-blank and non-zero for continuation Free Form Sentinel WORLDORDERINOPENMP sentinel = !$omp F08 • Program starts as one single-threaded process. 푡0 푡1 푡2 … • Forks into teams of multiple threads when • The usual line length, white space and continuation rules apply appropriate. • Stream of instructions might be different for each CONDITIONAL COMPILATION [OpenMP-5.1, 2.2] thread. • Information is exchanged via shared parts of Macro memory. #define _OPENMP yyyymm

• OpenMP threads may be nested inside MPI C processes. yyyy and mm are the year and month the OpenMP specification supported by the compiler was published. Fortran Fixed Form Sentinels

!$ | *$ | c$ F08

• Must start in column 1 C AND C++ DIRECTIVE FORMAT [OpenMP-5.1, 2.1] • Only numbers or white space in columns 3–5 In C and C++, OpenMP directives are written using the #pragma method: • Column 6 marks continuation lines #pragma omp directive-name [clause[[,] clause]...] Fortran Free Form Sentinel C !$ • Directives are case-sensitive F08

• Applies to the next statement which must be a structured block • Must only be preceded by white space Terminology: Structured Block • Can be continued with ampersand An executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. THE PARALLEL CONSTRUCT [OpenMP-5.1, 2.6]

FORTRAN DIRECTIVE FORMAT [OpenMP-5.1, 2.1.1, 2.1.2] #pragma omp parallel [clause[[,] clause]...] structured-block C sentinel directive-name [clause[[,] clause]...] F08 !$omp parallel [clause[[,] clause]...] • Directives are case-insensitive structured-block !$omp end parallel Fixed Form Sentinels F08

sentinel = !$omp | c$omp | *$omp • Creates a team of threads to execute the parallel region F08 • Each thread executes the code contained in the structured block • Must start in column 1 • Inside the region threads are identified by consecutive numbers starting at zero • The usual line length, white space, continuation and column rules apply

50 • Optional clauses (explained later) can be used to modify behavior and data environment of A FIRST OPENMP PROGRAM the parallel region #include THREAD COORDINATES [OpenMP-5.1, 3.2.2, 3.2.4] #include

Team size int main(void){ printf("Hello from your main thread.\n"); int omp_get_num_threads(void); C #pragma omp parallel integer function omp_get_num_threads() printf("Hello from thread %d of %d.\n", F08 ↪ omp_get_thread_num(), omp_get_num_threads()); Returns the number of threads in the current team printf("Hello again from your main thread.\n"); Thread number } C int omp_get_thread_num(void); C Program Output

integer function omp_get_thread_num() $ gcc -fopenmp -o hello_openmp.x hello_openmp.c F08 $ ./hello_openmp.x Returns the number that identifies the calling thread within the current team (between zero and Hello from your main thread. omp_get_num_threads()) Hello from thread 1 of 8. Hello from thread 0 of 8. Hello from thread 3 of 8. Hello from thread 4 of 8. Hello from thread 6 of 8. Hello from thread 7 of 8. Hello from thread 2 of 8. Hello from thread 5 of 8. Hello again from your main thread.

program hello_openmp use omp_lib implicit none

print *, "Hello from your main thread."

!$omp parallel print *, "Hello from thread ", omp_get_thread_num(), " of ", ↪ omp_get_num_threads(), "." !$omp end parallel

print *, "Hello again from your main thread." end program F08

51 60 EXERCISES PART XII

Exercise 9 – Warm Up 9.1 Generalized Vector Addition (axpy) LOW-LEVEL OPENMP CONCEPTS In the file axpy.{c|c++|f90}, fill in the missing body of the function/subroutine axpy_serial(a, x, y, z[, n]) so that it implements the generalized vector addition (in 61 INTRODUCTION serial, without making use of OpenMP): MAGIC 퐳 = 푎퐱 + 퐲. Any sufficiently advanced technology is indistinguishable from magic. Compile the file into a program and run it to test your implementation. (Arthur C. Clarke5) 9.2 Dot Product INTERNAL CONTROL VARIABLES [OpenMP-5.1, 2.4] In the file dot.{c|c++|f90}, fill in the missing body of the function/subroutine dot_serial(x, y[, n]) so that it implements the dot product (in serial, without making Terminology: Internal Control Variable (ICV) use of OpenMP): A conceptual variable that specifies runtime behavior of a set of threads or tasks in an dot(퐱, 퐲) = ∑ 푥푖푦푖. 푖 OpenMP program. Compile the file into a program and run it to test your implementation. • Set to an initial value by the OpenMP implementation • Some can be modified through either environment variables (e.g. OMP_NUM_THREADS) or API routines (e.g. omp_set_num_threads()) • Some can be read through API routines (e.g. omp_get_max_threads()) • Some are inaccessible to the user • Might have different values in different scopes (e.g. data environment, device, global) • Some can be overridden by clauses (e.g. the num_threads() clause) • Export OMP_DISPLAY_ENV=TRUE or call omp_display_env(1) to inspect the value of ICVs that correspond to environment variables [OpenMP-5.1, 3.15, 6.12]

PARALLELISM CLAUSES [OpenMP-5.1, 2.6, 2.18]

if Clause

if([parallel :] scalar-expression) C

if([parallel :] scalar-logical-expression) F08

If false, the region is executed only by the encountering thread(s) and no additional threads are forked. num_threads Clause 5Arthur C. Clarke. Profiles of the future : an inquiry into the limits of the possible. London: Pan Books, 1973. ISBN: 9780330236195.

52 A A ure h C htcnrl h ubro hed ofork. to threads of number the controls that ICV the Queries omp_get_max_threads for fork to threads of num_threads number the controls that ICV the Sets omp_set_num_threads THE CONTROLLING EXAMPLE the (overrides expression the of value the to equal size team a Requests F08 C F08 C F08 parallel C parallel F08 C nee function integer int integer subroutine void parallel end !$omp statement3 statement2 ) 64 statement1 num_threads( parallel !$omp } ) threshold > length if( { parallel omp #pragma num_threads(scalar-integer-expression) num_threads(integer-expression) statement2; statement1; statement0; omp_get_max_threads( omp_set_num_threads( ietv iha with directive an with directive lue nonee subsequently. encountered clause) num_threads omp_set_num_threads(num_threads) nthreads-var P Routine API Routine API num_threads omp_get_max_threads() if lueadascae tutrdboki C: in block structured associated and clause ICV void OeM-.,3.2.3] [OpenMP-5.1, 3.2.1] [OpenMP-5.1, int ); lueadascae tutrdboki Fortran: in block structured associated and clause num_threads); parallel nthreads-var ein (without regions ICV) 53 HEDLMT&DNMCADJUSTMENT DYNAMIC & LIMIT THREAD 01CnrligteNme fThreads of Number the Controlling 10.1 omp_get_thread_limit hed okdfra for forked threads Use Controlling – 10 Exercise EXERCISES 62 a of part as executed being code this Is omp_in_parallel A OF INSIDE threads. of number the of adjustment dynamic disable or Enable program. omp_get_dynamic a in used threads of number the on bound Upper F08 C F08 C F08 C hello_openmp.{c|c++|f90} • • • • nee function integer int oia function logical int logical subroutine function logical void int The The The The if num_threads omp_set_num_threads OMP_NUM_THREADS omp_get_thread_limit( omp_in_parallel( omp_get_dynamic( omp_set_dynamic( PARALLEL clause dynamic parallel omp_set_dynamic(dynamic) P Routine API and parallel REGION? omp_set_dynamic clause P Routine API omp_in_parallel() omp_get_dynamic() omp_get_thread_limit() region: niomn variable environment void void OeM-.,3.2.5] [OpenMP-5.1, int P routine API parallel opa rudwt h aiu ast e h ubrof number the set to ways various the with around play to ); ); dynamic); void OeM-.,3.2.13] [OpenMP-5.1, ); P Routines API region? OeM-.,326 3.2.7] 3.2.6, [OpenMP-5.1, Inspect the number of threads that are actually forked using omp_get_num_threads. DATA-SHARING ATTRIBUTE RULES I [OpenMP-5.1, 2.21.1.1] 10.2 Limits of the OpenMP Implementation The rules that determine the data-sharing attributes of variables referenced from the inside of a Determine the maximum number of threads allowed by the OpenMP implementation you are construct fall into one of the following categories: using and check whether it supports dynamic adjustment of the number of threads. Pre-determined • Variables with automatic storage duration declared inside the construct are private (C and 63 DATAENVIRONMENT C++)

DATA-SHARING ATTRIBUTES [OpenMP-5.1, 2.21.1] • Objects with dynamic storage duration are shared (C and C++) • Variables with static storage duration declared in the construct are shared (C and C++) Terminology: Variable • Static data members are shared (C++) A named data storage block, for which the value can be defined and redefined during the execution of a program. • Loop iteration variables are private (Fortran) Terminology: Private Variable • Implied-do indices and forall indices are private (Fortran) With respect to a given set of task regions that bind to the same parallel region, a • Assumed-size arrays are shared (Fortran) variable for which the name provides access to a different block of storage for each task region. Explicit Terminology: Shared Variable Data-sharing attributes are determined by explicit clauses on the respective constructs. With respect to a given set of task regions that bind to the same parallel region, a Implicit variable for which the name provides access to the same block of storage for each If the data-sharing attributes are neither pre-determined nor explicitly determined, they fall back task region. to the attribute determined by the default clause, or shared if no default clause is present.

CONSTRUCTS & REGIONS DATA-SHARING ATTRIBUTE RULES II [OpenMP-5.1, 2.21.1.2] Terminology: Construct The data-sharing attributes of variables inside regions, not constructs, are governed by simpler An OpenMP executable directive (and for Fortran, the paired end directive, if any) and rules: the associated statement, loop or structured block, if any, not including the code in • Static variables (C and C++) and variables with the save attribute (Fortran) are shared any called routines. That is, the lexical extent of an executable directive. Terminology: Region • File-scope (C and C++) or namespace-scope (C++) variables and common blocks or variables accessed through use or host association (Fortran) are shared All code encountered during a specific instance of the execution of a given construct or of an OpenMP library routine. • Objects with dynamic storage duration are shared (C and C++) Terminology: Executable Directive • Static data members are shared (C++) An OpenMP directive that is not declarative. That is, it may be placed in an executable • Arguments passed by reference have the same data-sharing attributes as the variable they context. are referencing (C++ and Fortran) • Implied-do indices, forall indices are private (Fortran) • Local variables are private

54

THE SHARED CLAUSE [OpenMP-5.1, 2.21.4.2]

shared(list) *

• Declares the listed variables to be shared.
• The programmer must ensure that shared variables are alive while they are shared.
• Shared variables must not be part of another variable (i.e. array or structure elements).

THE PRIVATE CLAUSE [OpenMP-5.1, 2.21.4.3]

private(list) *

• Declares the listed variables to be private.
• All threads have their own new versions of these variables.
• Private variables must not be part of another variable.
• If private variables are of class type, a default constructor must be accessible. (C++)
• The type of a private variable must not be const-qualified, incomplete or a reference to an incomplete type. (C and C++)
• Private variables must either be definable or allocatable. (Fortran)
• Private variables must not appear in namelist statements, variable format expressions or expressions for statement function definitions. (Fortran)
• Private variables must not be pointers with intent(in). (Fortran)

FIRSTPRIVATE CLAUSE [OpenMP-5.1, 2.21.4.4]

firstprivate(list) *

Like private, but the new versions of the variables are initialized to the value the variable has before the construct.
• Non-array variables are initialized by copy assignment (C and C++)
• Arrays are initialized by element-wise assignment (C and C++)
• Copy constructors are invoked if present (C++)
• Non-pointer variables are initialized by assignment or are not associated if the original variable is not associated (Fortran)
• pointer variables are initialized by pointer assignment (Fortran)

DEFAULT CLAUSE [OpenMP-5.1, 2.21.4.1]

default(shared | none) C

default(private | firstprivate | shared | none) F08

Determines the data-sharing attributes for all variables referenced from inside of a region that have neither pre-determined nor explicit data-sharing attributes.

Caution: default(none) forces the programmer to make data-sharing attributes explicit if they are not pre-determined. This can help clarify the programmer's intentions to someone who does not have the implicit data-sharing rules in mind.

REDUCTION CLAUSE [OpenMP-5.1, 2.21.5.4]

reduction(reduction-identifier : list) *

• Listed variables are declared private.
• At the end of the construct, the original variable is updated by combining the private copies using the operation given by reduction-identifier.
• reduction-identifier may be +, -, *, &, |, ^, &&, ||, min or max (C and C++) or an identifier (C) or an id-expression (C++)
• reduction-identifier may be a base language identifier, a user-defined operator, or one of +, -, *, .and., .or., .eqv., .neqv., max, min, iand, ior or ieor (Fortran)
• Private versions of the variable are initialized with appropriate values

64 EXERCISES

Exercise 11 – Data-sharing Attributes
11.1 Generalized Vector Addition (axpy)
In the file axpy.{c|c++|f90} add a new function/subroutine axpy_parallel(a, x, y, z[, n]) that uses multiple threads to perform a generalized vector addition. Modify the main part of the program to have your function/subroutine tested.
Hints:
• Use the parallel construct and the necessary clauses to define an appropriate data environment.
• Use omp_get_thread_num() and omp_get_num_threads() to decompose the work (a minimal sketch of this pattern is shown below).
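A minimal sketch of the decomposition pattern mentioned in the hints, with hypothetical array and function names (it initializes an array rather than solving the exercise): each thread derives its own contiguous index range from its thread number and the team size.

#include <omp.h>

void fill_parallel(double *data, int n, double value)
{
    #pragma omp parallel default(none) shared(data, n) firstprivate(value)
    {
        int nthreads = omp_get_num_threads();
        int tid      = omp_get_thread_num();
        int chunk    = (n + nthreads - 1) / nthreads;   /* ceiling division       */
        int lo       = tid * chunk;
        int hi       = (lo + chunk < n) ? lo + chunk : n;
        for (int i = lo; i < hi; ++i)                   /* i is private (declared */
            data[i] = value;                            /* inside the construct)  */
    }
}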

55

65 THREAD SYNCHRONIZATION

• In MPI, exchange of data between processes implies synchronization through the message metaphor.
• In OpenMP, threads exchange data through shared parts of memory.
• Explicit synchronization is needed to coordinate access to shared memory.

Terminology: Data Race
A data race occurs when
• multiple threads write to the same memory unit without synchronization or
• at least one thread writes to and at least one thread reads from the same memory unit without synchronization.

• Data races result in unspecified program behavior.
• OpenMP offers several synchronization mechanisms which range from high-level/general to low-level/specialized.

THE BARRIER CONSTRUCT [OpenMP-5.1, 2.19.2]

#pragma omp barrier C

!$omp barrier F08

• Threads are only allowed to continue execution of code after the barrier once all threads in the current team have reached the barrier.
• A barrier region must be executed by all threads in the current team or none.

THE CRITICAL CONSTRUCT [OpenMP-5.1, 2.19.1]

#pragma omp critical [(name)]
structured-block C

!$omp critical [(name)]
structured-block
!$omp end critical [(name)] F08

• Execution of critical regions with the same name is restricted to one thread at a time.
• name is a compile time constant.
• In C, names live in their own name space.
• In Fortran, names of critical regions can collide with other identifiers.

LOCK ROUTINES [OpenMP-5.1, 3.9]

void omp_init_lock(omp_lock_t* lock);
void omp_destroy_lock(omp_lock_t* lock);
void omp_set_lock(omp_lock_t* lock);
void omp_unset_lock(omp_lock_t* lock); C

subroutine omp_init_lock(svar)
subroutine omp_destroy_lock(svar)
subroutine omp_set_lock(svar)
subroutine omp_unset_lock(svar)
integer(kind = omp_lock_kind) :: svar F08

• Like critical sections, but identified by runtime value rather than global name
• Locks must be shared between threads
• Initialize a lock before first use
• Destroy a lock when it is no longer needed
• Lock and unlock using the set and unset routines
• set blocks if the lock is already set

THE ATOMIC AND FLUSH CONSTRUCTS [OpenMP-5.1, 2.19.7, 2.19.8]

• barrier, critical, and locks implement synchronization between general blocks of code
• If blocks become very small, synchronization overhead could become an issue
• The atomic and flush constructs implement low-level, fine grained synchronization for certain limited operations on scalar variables:
  – read
  – write
  – update, writing a new value based on the old value
  – capture, like update, and the old or new value is available in the subsequent code
• Correct use requires knowledge of the OpenMP Memory Model [OpenMP-5.1, 1.4]
• See also: C11 and C++11 Memory Models
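As a small illustration (a sketch with made-up names, not the solution to the following exercise): an atomic construct protects the update of a shared counter; a critical section or a lock could protect a larger block of code in the same way.

#include <omp.h>

int count_above(const double *x, int n, double threshold)
{
    int count = 0;                                  /* shared accumulator         */
    #pragma omp parallel
    {
        int nthreads = omp_get_num_threads();
        int tid      = omp_get_thread_num();
        for (int i = tid; i < n; i += nthreads) {
            if (x[i] > threshold) {
                #pragma omp atomic update           /* synchronized scalar update */
                count += 1;
            }
        }
    }                                               /* implicit barrier           */
    return count;
}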

56

66 EXERCISES

Exercise 12 – Thread Synchronization
12.1 Dot Product
In the file dot.{c|c++|f90} add a new function/subroutine dot_parallel(x, y[, n]) that uses multiple threads to perform the dot product. Do not use the reduction clause. Modify the main part of the program to have your function/subroutine tested.
Hint:
• Decomposition of the work load should be similar to the last exercise
• Partial results of different threads should be combined in a shared variable
• Use a suitable synchronization mechanism to coordinate access
Bonus
Use the reduction clause to simplify your program.

PART XIII

WORKSHARING

67 INTRODUCTION

WORKSHARING CONSTRUCTS

• Decompose work for concurrent execution by multiple threads
• Used inside parallel regions
• Available worksharing constructs:
  – single and sections constructs
  – loop construct
  – workshare construct
  – task worksharing

68 THE SINGLE CONSTRUCT

THE SINGLE CONSTRUCT [OpenMP-5.1, 2.10.2]

#pragma omp single [clause[[,] clause]...]
structured-block C

!$omp single [clause[[,] clause]...]
structured-block
!$omp end single [end_clause[[,] end_clause]...] F08

• The structured block is executed by a single thread in the encountering team. • Permissible clauses are firstprivate, private, copyprivate and nowait. • nowait and copyprivate are end_clauses in Fortran.
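A typical use (a sketch with made-up names): one thread of the team performs a step that must not be repeated, such as reading a parameter, and the implicit barrier at the end of single makes the result visible to the whole team.

#include <stdio.h>
#include <omp.h>

void read_and_report(void)
{
    int nsteps = 0;
    #pragma omp parallel shared(nsteps)
    {
        #pragma omp single
        {
            if (scanf("%d", &nsteps) != 1)   /* executed by exactly one thread  */
                nsteps = 0;
        }                                    /* implicit barrier: nsteps is now
                                                visible to all threads          */
        printf("thread %d uses nsteps = %d\n", omp_get_thread_num(), nsteps);
    }
}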

69 SINGLE CLAUSES

IMPLICIT BARRIERS & THE NOWAIT CLAUSE [OpenMP-5.1, 2.10]

• Worksharing constructs (and the parallel construct) contain an implied barrier at their exit.
• The nowait clause can be used on worksharing constructs to disable this implicit barrier.

57

THE COPYPRIVATE CLAUSE [OpenMP-5.1, 2.21.6.2]

copyprivate(list) *

• list contains variables that are private in the enclosing parallel region.
• At the end of the single construct, the values of all list items on the single thread are copied to all other threads.
• E.g. serial initialization
• copyprivate cannot be combined with nowait.

70 THE LOOP CONSTRUCT

WORKSHARING-LOOP CONSTRUCT [OpenMP-5.1, 2.11.4]

#pragma omp for [clause[[,] clause]...]
for-loops C

!$omp do [clause[[,] clause]...]
do-loops
[!$omp end do [nowait]] F08

Declares the iterations of a loop to be suitable for concurrent execution on multiple threads.

Data-environment clauses
• private
• firstprivate
• lastprivate
• reduction

Worksharing-loop-specific clauses
• schedule
• collapse

CANONICAL LOOP FORM [OpenMP-5.1, 2.11.1]

In C and C++ the for-loops must have the following form:

for ([type] var = lb; var relational-op b; incr-expr)
    structured-block C

for (range-decl: range-expr)
    structured-block C++

• var can be an integer, a pointer, or a random access iterator
• incr-expr increments (or decrements) var, e.g. var = var + incr
• The increment incr must not change during execution of the loop
• For nested loops, the bounds of an inner loop (b and lb) may depend at most linearly on the iteration variable of an outer loop, i.e. a0 + a1 * var-outer
• var must not be modified by the loop body
• The beginning of the range has to be a random access iterator
• The number of iterations of the loop must be known beforehand

In Fortran the do-loops must have the following form:

do [label] var = lb, b[, incr] F08

• var must be of integer type
• incr must be invariant with respect to the outermost loop
• The loop bounds b and lb of an inner loop may depend at most linearly on the iteration variable of an outer loop, i.e. a0 + a1 * var-outer
• The number of iterations of the loop must be known beforehand

71 LOOP CLAUSES

THE COLLAPSE CLAUSE [OpenMP-5.1, 2.11.4]

collapse(n) *

• The loop directive applies to the outermost loop of a set of nested loops, by default
• collapse(n) extends the scope of the loop directive to the n outer loops
• All associated loops must be perfectly nested, i.e.:

for (int i = 0; i < N; ++i) {
    for (int j = 0; j < M; ++j) {
        // ...
    }
} C
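For example (a sketch with hypothetical matrix dimensions), collapse(2) lets the worksharing-loop distribute all n*m iterations of a perfectly nested loop pair instead of only the n outer iterations:

void scale_matrix(double *a, int n, int m, double factor)
{
    #pragma omp parallel
    {
        #pragma omp for collapse(2)          /* both loops are in canonical form */
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < m; ++j) {
                a[i * m + j] *= factor;      /* row-major storage assumed        */
            }
        }
    }                                        /* implicit barriers at the end of
                                                for and parallel                 */
}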

58

THE SCHEDULE CLAUSE [OpenMP-5.1, 2.11.4]

schedule(kind[, chunk_size]) *

Determines how the iteration space is divided into chunks and how these chunks are distributed among threads.

static    Divide the iteration space into chunks of chunk_size iterations and distribute them in a round-robin fashion among threads. If chunk_size is not specified, the chunk size is chosen such that each thread gets at most one chunk.
dynamic   Divide into chunks of size chunk_size (defaults to 1). When a thread is done processing a chunk it acquires a new one.
guided    Like dynamic, but the chunk size is adjusted, starting with large sizes for the first chunks and decreasing to chunk_size (default 1).
auto      Let the compiler and runtime decide.
runtime   The schedule is chosen based on the ICV run-sched-var.

If no schedule clause is present, the default schedule is implementation defined.

72 EXERCISES

Exercise 13 – Loop Worksharing
13.1 Generalized Vector Addition (axpy)
In the file axpy.{c|c++|f90} add a new function/subroutine axpy_parallel_for(a, x, y, z[, n]) that uses loop worksharing to perform the generalized vector addition.
13.2 Dot Product
In the file dot.{c|c++|f90} add a new function/subroutine dot_parallel_for(x, y[, n]) that uses loop worksharing to perform the dot product.
Caveat: Make sure to correctly synchronize access to the accumulator variable.

73 WORKSHARE CONSTRUCT

WORKSHARE (FORTRAN ONLY) [OpenMP-5.1, 2.10.3]

!$omp workshare
structured-block
!$omp end workshare [nowait] F08

The structured block may contain:
• array assignments
• scalar assignments
• forall constructs
• where statements and constructs
• atomic, critical and parallel constructs

Where possible, these are decomposed into independent units of work and executed in parallel.

74 EXERCISES

Exercise 14 – workshare Construct
14.1 Generalized Vector Addition (axpy)
In the file axpy.f90 add a new subroutine axpy_parallel_workshare(a, x, y, z) that uses the workshare construct to perform the generalized vector addition.
14.2 Dot Product
In the file dot.f90 add a new function dot_parallel_workshare(x, y) that uses the workshare construct to perform the dot product.

75 COMBINED CONSTRUCTS

COMBINED CONSTRUCTS [OpenMP-5.1, 2.16]

Some constructs that often appear as nested pairs can be combined into one construct, e.g.

#pragma omp parallel
#pragma omp for
for (...; ...; ...) {
    ...
} C

can be turned into

#pragma omp parallel for
for (...; ...; ...) {
    ...
} C

Similarly, parallel and workshare can be combined.

Combined constructs usually accept the clauses of either of the base constructs.
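For example (a sketch assuming a hypothetical expensive_work function whose cost varies per iteration), a combined construct with a dynamic schedule and a reduction:

extern double expensive_work(int i);   /* hypothetical: cost varies with i */

double total_work(int n)
{
    double sum = 0.0;
    /* combined parallel + worksharing-loop; chunks of 16 iterations are
       handed out dynamically to idle threads                             */
    #pragma omp parallel for schedule(dynamic, 16) reduction(+: sum)
    for (int i = 0; i < n; ++i)
        sum += expensive_work(i);
    return sum;
}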

59

PART XIV

TASK WORKSHARING

76 INTRODUCTION

TASK TERMINOLOGY

Terminology: Task
A specific instance of executable code and its data environment, generated when a thread encounters a task, taskloop, parallel, target or teams construct.

Terminology: Child Task
A task is a child task of its generating task region. A child task region is not part of its generating task region.

Terminology: Descendent Task
A task that is the child task of a task region or of one of its descendent task regions.

Terminology: Sibling Task
Tasks that are child tasks of the same task region.

TASK LIFE-CYCLE

• Execution of tasks can be deferred and suspended
• Scheduling is done by the OpenMP runtime system at scheduling points
• Scheduling decisions can be influenced by e.g. task dependencies and task priorities

(Diagram: a task is created, may be deferred, runs, may be suspended, and eventually completes.)

77 THE TASK CONSTRUCT

THE TASK CONSTRUCT [OpenMP-5.1, 2.12.1]

#pragma omp task [clause[[,] clause]...]
structured-block C

!$omp task [clause[[,] clause]...]
structured-block
!$omp end task F08

Creates a task. Execution of the task may commence immediately or be deferred.

Data-environment clauses
• private
• firstprivate
• shared

Task-specific clauses
• if
• final
• untied
• mergeable
• depend
• priority

TASK DATA-ENVIRONMENT [OpenMP-5.1, 2.21.1.1]

The rules for implicitly determined data-sharing attributes of variables referenced in task generating constructs are slightly different from other constructs. If no default clause is present and
• the variable is shared by all implicit tasks in the enclosing context, it is also shared by the generated task,
• otherwise, the variable is firstprivate.
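A common pattern (a sketch assuming a hypothetical node type and process() function): one thread of the team traverses a linked list and generates one task per element; firstprivate captures the current pointer for each task.

typedef struct node { struct node *next; double value; } node_t;

extern void process(node_t *p);              /* hypothetical per-element work  */

void process_list(node_t *head)
{
    #pragma omp parallel
    {
        #pragma omp single                   /* one thread generates the tasks */
        {
            for (node_t *p = head; p != NULL; p = p->next) {
                #pragma omp task firstprivate(p)
                process(p);                  /* may run on any thread, possibly
                                                deferred                       */
            }
        }                                    /* tasks are complete at the next
                                                barrier at the latest          */
    }
}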

60

78 TASK CLAUSES

THE IF CLAUSE [OpenMP-5.1, 2.12.1]

if([task: ] scalar-expression) *

If the scalar expression evaluates to false:
• Execution of the current task
  – is suspended and
  – may only be resumed once the generated task is complete
• Execution of the generated task may commence immediately

Terminology: Undeferred Task
A task for which execution is not deferred with respect to its generating task region. That is, its generating task region is suspended until execution of the undeferred task is completed.

THE FINAL CLAUSE [OpenMP-5.1, 2.12.1]

final(scalar-expression) *

If the scalar expression evaluates to true, all descendent tasks of the generated task are
• undeferred and
• executed immediately.

Terminology: Final Task
A task that forces all of its child tasks to become final and included tasks.

Terminology: Included Task
A task for which execution is sequentially included in the generating task region. That is, an included task is undeferred and executed immediately by the encountering thread.

THE UNTIED CLAUSE [OpenMP-5.1, 2.12.1]

untied *

• The generated task is untied, meaning it can be suspended by one thread and resume execution on another.
• By default, tasks are generated as tied tasks.

Terminology: Untied Task
A task that, when its task region is suspended, can be resumed by any thread in the team. That is, the task is not tied to any thread.

Terminology: Tied Task
A task that, when its task region is suspended, can be resumed only by the same thread that suspended it. That is, the task is tied to that thread.

THE PRIORITY CLAUSE [OpenMP-5.1, 2.12.1]

priority(priority-value) *

• priority-value is a scalar non-negative numerical value
• Priority influences the order of task execution
• Among tasks that are ready for execution, those with a higher priority are more likely to be executed next

THE DEPEND CLAUSE [OpenMP-5.1, 2.19.11]

depend(in: list)
depend(out: list)
depend(inout: list) *

• list contains storage locations
• A task with a dependence depend(in: x) has to wait for completion of previously generated sibling tasks with depend(out: x) or depend(inout: x)
• A task with a dependence depend(out: x) or depend(inout: x) has to wait for completion of previously generated sibling tasks with any kind of dependence on x
• in, out and inout correspond to intended read and/or write operations to the listed variables.

Terminology: Dependent Task
A task that because of a task dependence cannot be executed until its predecessor tasks have completed.
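The depend clause can express a small producer/consumer chain (a sketch with hypothetical stage functions): the reading task may not start before the writing task has completed.

extern void produce(double *x);     /* hypothetical: writes *x */
extern void consume(double x);      /* hypothetical: reads x   */

void pipeline_sketch(void)
{
    double x = 0.0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(x) depend(out: x)
        produce(&x);                /* writer                              */

        #pragma omp task shared(x) depend(in: x)
        consume(x);                 /* sibling task, waits for the writer  */
    }
}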

79 TASK SCHEDULING

TASK SCHEDULING POLICY [OpenMP-5.1, 2.12.6]

The task scheduler of the OpenMP runtime environment becomes active at task scheduling points. It may then
• begin execution of a task or
• resume execution of untied tasks or tasks tied to the current thread.

Task scheduling points
• generation of an explicit task
• task completion
• taskyield regions
• taskwait regions
• the end of taskgroup regions
• implicit and explicit barrier regions

THE TASKYIELD CONSTRUCT [OpenMP-5.1, 2.12.4]

#pragma omp taskyield C

!$omp taskyield F08

• Notifies the scheduler that execution of the current task may be suspended at this point in favor of another task
• Inserts an explicit scheduling point

80 TASK SYNCHRONIZATION

THE TASKWAIT & TASKGROUP CONSTRUCTS [OpenMP-5.1, 2.19.5, 2.19.6]

#pragma omp taskwait C

!$omp taskwait F08

Suspends the current task until all child tasks are completed.

#pragma omp taskgroup
structured-block C

!$omp taskgroup
structured-block
!$omp end taskgroup F08

The current task is suspended at the end of the taskgroup region until all descendent tasks generated within the region are completed.

81 EXERCISES

Exercise 15 – Task worksharing
15.1 Generalized Vector Addition (axpy)
In the file axpy.{c|c++|f90} add a new function/subroutine axpy_parallel_task(a, x, y, z[, n]) that uses task worksharing to perform the generalized vector addition.
15.2 Dot Product
In the file dot.{c|c++|f90} add a new function/subroutine dot_parallel_task(x, y[, n]) that uses task worksharing to perform the dot product.
Caveat: Make sure to correctly synchronize access to the accumulator variable.
15.3 Bitonic Sort
The file bsort.{c|c++|f90} contains a serial implementation of the bitonic sort algorithm. Use OpenMP task worksharing to parallelize it.
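A classic pattern combining the task and taskwait constructs (a sketch, not the bitonic sort solution; a real implementation would add a cutoff, e.g. with the final or if clause): a recursive computation in which the parent combines the results of its two child tasks.

int fib_tasks(int n)
{
    if (n < 2)
        return n;

    int a, b;
    #pragma omp task shared(a)
    a = fib_tasks(n - 1);

    #pragma omp task shared(b)
    b = fib_tasks(n - 2);

    #pragma omp taskwait              /* wait for the two child tasks */
    return a + b;
}

/* typically started from a single thread inside a parallel region:
       #pragma omp parallel
       #pragma omp single
       result = fib_tasks(30);
*/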

62

PART XV

WRAP-UP

ALTERNATIVES

Horizontal Alternatives
Parallel languages: Fortran Coarrays, UPC; Chapel, Fortress, X10
Parallel frameworks: Charm++, HPX, StarPU
Shared memory tasking: …, TBB
Accelerators: CUDA, OpenCL, OpenACC, OmpSs
Platform solutions: PLINQ, GCD, java.util.concurrent

Vertical Alternatives
Applications: Gromacs, CP2K, ANSYS, OpenFOAM
Numerics libraries: PETSc, …, DUNE, FEniCS
Big data frameworks: Hadoop, Spark

JSC COURSE PROGRAMME

• Introduction to the usage and programming of supercomputer resources in Jülich, 18 – 19 May
• Advanced Parallel Programming with MPI and OpenMP, 30 November – 02 December
• GPU Programming with CUDA, 4 – 6 May
• Directive-based GPU programming with OpenACC, 26 – 27 October
• And more, see https://www.fz-juelich.de/ias/jsc/courses

PART XVI

TUTORIAL

N-BODY SIMULATIONS

Dynamics of the N-body problem:

$$\mathbf{a}_{i,j} = \frac{q_i q_j}{\left(\sqrt{(\mathbf{x}_i - \mathbf{x}_j) \cdot (\mathbf{x}_i - \mathbf{x}_j)}\right)^3} \, (\mathbf{x}_i - \mathbf{x}_j)$$

$$\ddot{\mathbf{x}}_i = \mathbf{a}_i = \sum_{j \neq i} \mathbf{a}_{i,j}$$

Velocity Verlet integration:

$$\mathbf{v}^{*}\!\left(t + \frac{\Delta t}{2}\right) = \mathbf{v}(t) + \mathbf{a}(t) \, \frac{\Delta t}{2}$$

$$\mathbf{x}(t + \Delta t) = \mathbf{x}(t) + \mathbf{v}^{*}\!\left(t + \frac{\Delta t}{2}\right) \Delta t$$

$$\mathbf{v}(t + \Delta t) = \mathbf{v}^{*}\!\left(t + \frac{\Delta t}{2}\right) + \mathbf{a}(t + \Delta t) \, \frac{\Delta t}{2}$$

Program structure:

read initial state from file
calculate accelerations
for number of time steps:
    write state to file
    calculate helper velocities v*
    calculate new positions
    calculate new accelerations
    calculate new velocities
write final state to file
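As an illustration of the formulas and the program structure above (a sketch only; the actual nbody program in the tutorial uses its own data layout and file I/O), one serial velocity Verlet step over flat coordinate arrays could look like this:

#include <math.h>

/* Hypothetical layout: x, v, a hold 3*n coordinates (x,y,z per particle), q holds n charges. */
static void accelerations(int n, const double *x, const double *q, double *a)
{
    for (int i = 0; i < n; ++i) {
        a[3*i] = a[3*i+1] = a[3*i+2] = 0.0;
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;
            double d[3] = { x[3*i] - x[3*j], x[3*i+1] - x[3*j+1], x[3*i+2] - x[3*j+2] };
            double r = sqrt(d[0]*d[0] + d[1]*d[1] + d[2]*d[2]);
            double s = q[i] * q[j] / (r * r * r);     /* q_i q_j / |x_i - x_j|^3 */
            a[3*i]   += s * d[0];
            a[3*i+1] += s * d[1];
            a[3*i+2] += s * d[2];
        }
    }
}

static void verlet_step(int n, double dt, double *x, double *v, double *a, const double *q)
{
    for (int k = 0; k < 3*n; ++k) v[k] += 0.5 * dt * a[k];   /* helper velocities v* */
    for (int k = 0; k < 3*n; ++k) x[k] += dt * v[k];         /* new positions        */
    accelerations(n, x, q, a);                               /* new accelerations    */
    for (int k = 0; k < 3*n; ++k) v[k] += 0.5 * dt * a[k];   /* new velocities       */
}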

A SERIAL N-BODY SIMULATION PROGRAM

Compiling nbody

$ cmake -B build
$ cmake --build build
...

Invoking nbody

$ ./build/nbody
Usage: nbody
$ ./build/nbody ../input/kaplan_10000.bin
Working on step 1...

Visualizing the results

$ paraview --state=kaplan.pvsm

Initial conditions based on: A. E. Kaplan, B. Y. Dubetsky, and P. L. Shkolnikov. "Shock Shells in Coulomb Explosions of Nanoclusters." In: Physical Review Letters 91 (14), Oct. 3, 2003, p. 143401. DOI: 10.1103/PhysRevLett.91.143401

63

EXERCISES

Exercise 16 – N-body simulation program
16.1 OpenMP parallel version
Write a version of nbody that is parallelized using OpenMP. Look for suitable parts of the program to annotate with OpenMP directives.
16.2 MPI parallel version
Write a version of nbody that is parallelized using MPI. The distribution of work might be similar to the previous exercise. Ideally, the entire system state is not stored on every process, thus particle data has to be communicated. Communication could be point-to-point or collective. Input and output functions might have to be adapted as well.
16.3 Hybrid parallel version
Write a version of nbody that is parallelized using both MPI and OpenMP. This might just be a combination of the previous two versions.
Bonus
A clever solution is described in: M. Driscoll et al. "A Communication-Optimal N-Body Algorithm for Direct Interactions." In: 2013 IEEE 27th International Symposium on Parallel and Distributed Processing. 2013, pp. 1075–1084. DOI: 10.1109/IPDPS.2013.108. Implement Algorithm 1 from the paper.

SOME SUGGESTIONS

Distribution of work
Look for loops with a number of iterations that scales with the problem size N. If the individual loop iterations are independent, they can be run in parallel. Try to distribute iterations evenly among the threads / processes.

Distribution of data
What data needs to be available to which process at what time? Having the entire problem in the memory of every process will not scale. Think of particles as having two roles: targets (i index) that experience acceleration due to sources (j index). Make every process responsible for a group of either target particles or source particles and communicate the same particles in the other role to other processes. What particle properties are important for targets and sources?

Input / Output
You have heard about different I/O strategies during the MPI I/O part of the course. Possible solutions include:
• Funneled I/O: one process reads then scatters or gathers then writes
• MPI I/O: every process reads or writes the particles it is responsible for

Scalability
Keep an eye on resource consumption. Ideally, the time it takes for your program to finish should be inversely proportional to the number of threads or processes running it, $\mathcal{O}(N^2/p)$. Similarly, the amount of memory consumed by your program should be independent of the number of processes, $\mathcal{O}(N)$.

64 COLOPHON

This document was typeset using

• LuaLaTeX and a host of macro packages,
• Adobe Source Sans Pro for body text and headings,
• Adobe Source Code Pro for listings,
• TeX Gyre Pagella Math for mathematical formulae,
• icons from Font Awesome.

65