
Lecture 3: Parallel Programming Models
Dr. Wilson Rivera
ICOM 6025: High Performance Computing
Electrical and Computer Engineering Department, University of Puerto Rico

Outline
• Goal: Understand parallel programming models
  – Programming Models
  – Application Models
  – Parallel Computing Frameworks
    • OpenMP
    • MPI
    • CUDA
    • MapReduce
  – Parallel Computing Libraries

Parallel Programming Models
• A parallel programming model is an abstraction of the underlying hardware or computer system.
  – Programming models are not specific to a particular type of memory architecture.
  – In fact, any programming model can in theory be implemented on any underlying hardware.

Programming Language Models
• Threads Model
• Data Parallel Model
• Task Parallel Model
• Message Passing Model
• Partitioned Global Address Space Model

Threads Model
• A single process can have multiple concurrent execution paths.
  – The thread-based model requires the programmer to manually manage synchronization, load balancing, and locality, which makes it prone to:
    • Deadlocks
    • Race conditions
• Implementations
  – POSIX Threads (Pthreads)
    • Specified by the IEEE POSIX 1003.1c standard (1995).
    • Library-based, explicit parallelism model.
  – OpenMP
    • Compiler directive-based model that supports parallel programming in C/C++ and Fortran.

Data Parallel Model
• Focus on data partitioning.
• Each task performs the same operation on its partition of the data.
  – This model is suitable for problems with static load balancing (e.g., very regular fluid element analysis, image processing).
• Implementations
  – Fortran 90 (F90), Fortran 95 (F95)
  – High Performance Fortran (HPF)
  – MapReduce, CUDA

Task Parallel Model
• Focus on task execution (workflow).
• Each processor executes a different thread or process.
• Implementations
  – Threading Building Blocks (TBB)
  – Task Parallel Library (TPL)
  – Intel Concurrent Collections (CnC)

Message Passing Model
• A set of tasks use their own local memory during computation.
  – Tasks exchange data by sending and receiving messages.
  – Data transfer usually requires cooperative operations to be performed by each process; for example, a send operation must have a matching receive operation (see the sketch below).
• Implementations
  – MPI
  – Erlang
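The following is a minimal sketch of the cooperative send/receive pattern described above. It assumes a working MPI installation and uses only standard MPI calls (MPI_Init, MPI_Comm_rank, MPI_Send, MPI_Recv, MPI_Finalize); the message value and tag are arbitrary illustration choices.

    /* Rank 0 sends one integer from its local memory to rank 1; the transfer
     * completes only because rank 1 posts the matching receive.
     * Build with an MPI wrapper (e.g., mpicc sendrecv.c) and run with
     * mpirun -np 2 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;  /* data lives in rank 0's local memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }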
Partitioned Global Address Space Model
• Multiple threads share at least a part of a global address space.
  – The idea behind this model is to provide the ability to access local and remote data with the same mechanisms.
• Implementations
  – Co-Array Fortran
  – Unified Parallel C (UPC)
  – Titanium

Integrated View of PP Models
[Figure: the data parallel, task parallel, and message passing models mapped onto SIMD/MIMD execution and shared/distributed memory. © Wilson Rivera]

PP Model Ecosystem
[Figure: parallel programming technologies grouped by model and ordered by increasing abstraction above the runtime system — threads (Pthreads, OpenMP), data parallel (Fortran 90, HPF, UPC, Co-Array Fortran, Ct, CUDA, RapidMind, MapReduce), task parallel (TBB, TPL, CnC), and message passing (MPI, Erlang). © Wilson Rivera]

Application Models
• Bag-of-Tasks
  – The user simply states a set of unordered computation tasks and allows the system to assign them to processors.
• MapReduce
  – A map operator is applied to a set of name-value pairs to generate several intermediate sets.
  – A reduce operator is then applied to summarize the intermediate sets into one or more final sets.
• Bulk Synchronous Parallel
  – Processes run in phases that are separated by a global synchronization step.
  – During each phase, they perform calculations independently and then communicate new results with their data-dependent peers.
  – The execution time of a step is determined by the most heavily loaded processor.
• Directed Acyclic Graph (DAG)
  – A workflow application can be modeled as a directed acyclic graph A = (V, E), where V is the set of nodes representing the tasks in the application and E is the set of edges representing the dependences between tasks.

[Figure: the application models ordered by granularity and synchronization requirements, from Bag of Tasks and MapReduce to Bulk Synchronous Parallel and Directed Acyclic Graph. © Wilson Rivera]

Application Settings
[Figure: application coupling versus latency — high performance computing (clusters) sits at high coupling and low latency, while grid and cloud computing tolerate lower coupling and higher latency. © Wilson Rivera]

Motifs
The Berkeley computational motifs, mapped in the original slide against application domains (embedded, SPEC, databases, games, machine learning, HPC, health, image, speech, music, browser):
1. Finite State Machines
2. Combinational Logic
3. Graph Traversal
4. Structured Grids
5. Dense Matrices
6. Sparse Matrices
7. Spectral Methods (FFT)
8. Dynamic Programming
9. N-Body
10. MapReduce
11. Backtrack / Branch-and-Bound
12. Graphical Models
13. Unstructured Grids
Source: UC Berkeley

High Performance Libraries
• http://www.netlib.org

HPC Libraries
• Basic Linear Algebra Subprograms (BLAS)
• Linear Algebra PACKage (LAPACK)
  – Vendor implementations: IBM (ESSL), AMD (ACML), Intel (MKL), Cray (libsci)
• Basic Linear Algebra Communication Subprograms (BLACS)
• Automatically Tuned Linear Algebra Software (ATLAS)
• Parallel Linear Algebra Software for Multicore Architectures (PLASMA)
• Matrix Algebra on GPU and Multicore Architectures (MAGMA)

Summary
• Goal: Understand parallel programming models
  – Programming Models
  – Application Models
  – Parallel Computing Frameworks
    • OpenMP
    • MPI
    • CUDA
    • MapReduce
  – Parallel Computing Libraries
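As a counterpart to the message passing sketch above, here is a minimal sketch of the compiler directive-based model (OpenMP) listed among the frameworks: each thread applies the same loop body to its partition of the data, while thread creation, scheduling, and the reduction are left to the runtime. The array size and loop contents are arbitrary illustration values; build with an OpenMP-capable compiler (e.g., gcc -fopenmp).

    /* Each thread executes the same operation on its share of the iterations;
     * the reduction clause combines the per-thread partial sums. */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N];
        double sum = 0.0;

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = 0.5 * i;
            b[i] = 2.0 * i;
            sum += a[i] * b[i];
        }

        printf("dot product = %f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }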