Parallel Programming Patterns Overview and Map Pattern

Parallel Computing CIS 410/510 Department of Computer and Information Science

Lecture 5 – Parallel Programming Patterns - Map

Outline

q Parallel programming models
q Dependencies
q Structured programming patterns overview
  ❍ Serial / parallel control flow patterns
  ❍ Serial / parallel data management patterns
q Map pattern
  ❍ Optimizations
    ◆ Sequences of maps
    ◆ Code fusion
    ◆ Cache fusion
  ❍ Related patterns
  ❍ Example: Scaled Vector Addition (SAXPY)

Parallel Models 101

q Sequential models
  ❍ von Neumann (RAM) model

q Parallel model
  ❍ A parallel computer is simply a collection of processors interconnected in some manner to coordinate activities and exchange data
  ❍ Models that can be used as general frameworks for describing and analyzing parallel algorithms
    ◆ Simplicity: description, analysis, architecture independence
    ◆ Implementability: able to be realized, reflect performance
q Three common parallel models
  ❍ Directed acyclic graphs, shared memory, network

Directed Acyclic Graphs (DAG)

q Captures data flow parallelism
q Nodes represent operations to be performed
  ❍ Inputs are nodes with no incoming arcs
  ❍ Outputs are nodes with no outgoing arcs
  ❍ Think of nodes as tasks
q Arcs are paths for the flow of data results
q The DAG represents the operations of the algorithm and implies precedence constraints on their order

    for (i=1; i<100; i++)
        a[i] = a[i-1] + 100;

  (dependence chain: a[0] → a[1] → … → a[99])

Shared Memory Model

q Parallel extension of the RAM model (PRAM)
  ❍ Memory size is infinite
  ❍ Number of processors is unbounded
  ❍ Processors communicate via the shared memory
  ❍ Every processor accesses any memory location in 1 cycle
  ❍ Synchronous
    ◆ All processors execute the same algorithm synchronously
      – READ phase
      – COMPUTE phase
      – WRITE phase
    ◆ Some subset of the processors can stay idle
  ❍ Asynchronous

Network Model

q G = (N, E)
  ❍ N are processing nodes
  ❍ E are bidirectional communication links
q Each processor has its own memory

q No shared memory is available
q Network operation may be synchronous or asynchronous
q Requires communication primitives
  ❍ Send (X, i)
  ❍ Receive (Y, j)
q Captures message passing model for algorithm design

Parallelism

q Ability to execute different parts of a computation concurrently on different machines
q Why do you want parallelism?
  ❍ Shorter running time or handling more work
q What is being parallelized?
  ❍ Task: instruction, statement, procedure, …
  ❍ Data: data flow, size, replication
  ❍ Parallelism granularity
    ◆ Coarse-grained versus fine-grained
q Thinking about parallelism
q Evaluation

Why is parallel programming important?

q Parallel programming has matured

❍ Standard programming models ❍ Common machine architectures ❍ Programmer can focus on computation and use suitable programming model for implementation

q Increase portability between models and architectures

q Reasonable hope of portability across platforms

q Problem

❍ Performance optimization is still platform-dependent ❍ Performance portability is a problem

❍ Parallel programming methods are still evolving

Parallel Algorithm

q Recipe to solve a problem “in parallel” on multiple processing elements q Standard steps for constructing a parallel algorithm

❍ Identify work that can be performed concurrently

❍ Partition the concurrent work on separate processors ❍ Properly manage input, output, and intermediate data ❍ Coordinate data accesses and work to satisfy dependencies q Which are hard to do?

Parallelism Views

q Where can we find parallelism? q Program (task) view ❍ Statement level ◆ Between program statements ◆ Which statements can be executed at the same time? ❍ Block level / Loop level / Routine level / Process level ◆ Larger-grained program statements q Data view ❍ How is data operated on? ❍ Where does data reside? q Resource view

Parallelism, Correctness, and Dependence

q Parallel execution, from any point of view, will be constrained by the sequence of operations needed to be performed for a correct result q Parallel execution must address control, data, and system dependences q A dependency arises when one operation depends on an earlier operation to complete and produce a result before this later operation can be performed q We extend this notion of dependency to resources since some operations may depend on certain resources ❍ For example, due to where data is located

Executing Two Statements in Parallel

q Want to execute two statements in parallel
q On one processor:
    Statement 1;
    Statement 2;
q On two processors:
    Processor 1:      Processor 2:
    Statement 1;      Statement 2;

q Fundamental (concurrent) execution assumption
  ❍ Processors execute independently of each other
  ❍ No assumptions made about the speed of processor execution

Sequential Consistency in Parallel Execution

q Case 1:

    Processor 1:      Processor 2:
    statement 1;
                      statement 2;
    (time runs downward)

q Case 2:
    Processor 1:      Processor 2:
                      statement 2;
    statement 1;

q Sequential consistency

❍ Statement executions do not interfere with each other
❍ Computation results are the same (independent of order)

Independent versus Dependent

q In other words the execution of statement1; statement2; must be equivalent to statement2; statement1;

q Their order of execution must not matter! q If true, the statements are independent of each other q Two statements are dependent when the order of their execution affects the computation outcome

Examples

q Example 1 – Independent
    S1: a = 1;
    S2: b = 1;
q Example 2 – Dependent: true (flow) dependence
    S1: a = 1;
    S2: b = a;
  The second statement depends on the first. Can you remove the dependency?
q Example 3 – Dependent: output dependence
    S1: a = f(x);
    S2: a = b;
  The second statement depends on the first. Can you remove the dependency? How?
q Example 4 – Dependent: anti-dependence
    S1: a = b;
    S2: b = 1;
  The first statement depends on the second (it must read b before S2 overwrites it). Can you remove the dependency? How?

(One way to remove the name dependences of Examples 3 and 4 is sketched below.)
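A minimal C sketch of removing the name dependences in Examples 3 and 4 by renaming; the temporaries a1 and b_old and the function f are illustrative only, not part of the original examples:

#include <stdio.h>

static int f(int v) { return v + 10; }   /* placeholder for Example 3's f(x) */

int main(void) {
    int x = 1, b = 2, a;

    /* Example 3 (output dependence: a = f(x); a = b;)
       Rename the first write so the two writes no longer collide. */
    int a1 = f(x);    /* S1 now writes the temporary a1 instead of a */
    a = b;            /* S2 can execute independently of S1 */

    /* Example 4 (anti-dependence: a = b; b = 1;)
       Copy the value S1 reads so S2 may overwrite b at any time. */
    int b_old = b;    /* capture b before S2 writes it */
    a = b_old;        /* S1 reads the copy */
    b = 1;            /* S2 no longer conflicts with S1's read */

    printf("a=%d a1=%d b=%d\n", a, a1, b);
    return 0;
}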

True Dependence and Anti-Dependence

q Given statements S1 and S2, S1; S2;

q S2 has a true (flow) dependence on S1 (S1 δ S2)
  if and only if S2 reads a value written by S1:
    S1: X = ...
    S2: ... = X

q S2 has an anti-dependence on S1 (S1 δ⁻¹ S2)
  if and only if S2 writes a value read by S1:
    S1: ... = X
    S2: X = ...

Output Dependence

q Given statements S1 and S2, S1; S2;
q S2 has an output dependence on S1 (S1 δ⁰ S2)
  if and only if S2 writes a variable written by S1:
    S1: X = ...
    S2: X = ...

q Anti- and output dependences are “name” dependences
  ❍ Are they “true” dependences?
q How can you get rid of output dependences?
  ❍ Are there cases where you cannot?

Statement Dependency Graphs

q Can use graphs to show dependence relationships
q Example
    S1: a = 1;
    S2: b = a;
    S3: a = b + 1;
    S4: c = a;

q S2 δ S3 : S3 is flow-dependent on S2
q S1 δ⁰ S3 : S3 is output-dependent on S1
q S2 δ⁻¹ S3 : S3 is anti-dependent on S2

When can two statements execute in parallel?

q Statements S1 and S2 can execute in parallel if and only if there are no dependences between S1 and S2

❍ True dependences ❍ Anti-dependences ❍ Output dependences

q Some dependences can be removed by modifying the program
  ❍ Rearranging statements
  ❍ Eliminating statements

How do you compute dependence?

q Data dependence relations can be found by comparing the IN and OUT sets of each node q The IN and OUT sets of a statement S are defined as: ❍ IN(S) : set of memory locations (variables) that may be used in S ❍ OUT(S) : set of memory locations (variables) that may be modified by S q Note that these sets include all memory locations that may be fetched or modified q As such, the sets can be conservatively large

IN / OUT Sets and Computing Dependence

q Assuming that there is a path from S1 to S2 , the following shows how to intersect the IN and OUT sets to test for data dependence

    OUT(S1) ∩ IN(S2)  ≠ ∅   ⇒   S1 δ S2      (flow dependence)
    IN(S1)  ∩ OUT(S2) ≠ ∅   ⇒   S1 δ⁻¹ S2    (anti-dependence)
    OUT(S1) ∩ OUT(S2) ≠ ∅   ⇒   S1 δ⁰ S2     (output dependence)

Loop-Level Parallelism

q Significant parallelism can be identified within loops

for (i=0; i<100; i++)
    S1: a[i] = i;

for (i=0; i<100; i++) {
    S1: a[i] = i;
    S2: b[i] = 2*i;
}

q Dependencies? What about i, the loop index?
q DOALL loop (a.k.a. foreach loop)
  ❍ All iterations are independent of each other
  ❍ All statements can be executed in parallel at the same time
    ◆ Is this really true? (An OpenMP sketch of the parallel version follows.)
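A minimal OpenMP sketch of this DOALL loop, assuming arrays a and b are provided by the caller; every iteration is independent, so iterations can be distributed across threads and the loop index is automatically private:

void doall_example(int a[100], int b[100]) {
    #pragma omp parallel for
    for (int i = 0; i < 100; i++) {
        a[i] = i;         /* S1 */
        b[i] = 2 * i;     /* S2 */
    }
}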

Iteration Space

q Unroll loop into separate statements / iterations q Show dependences between iterations

for (i=0; i<100; i++)
    S1: a[i] = i;

for (i=0; i<100; i++) {
    S1: a[i] = i;
    S2: b[i] = 2*i;
}

Iteration space:
    S1_0  S1_1  …  S1_99        S1_0  S1_1  …  S1_99
                                S2_0  S2_1  …  S2_99

Multi-Loop Parallelism

q Significant parallelism can be identified between loops

for (i=0; i<100; i++)
    a[i] = i;          →   a[0]  a[1]  …  a[99]

for (i=0; i<100; i++)
    b[i] = i;          →   b[0]  b[1]  …  b[99]

q Dependencies?

q How much parallelism is available?

q Given 4 processors, how much parallelism is possible?

q What parallelism is achievable with 50 processors?

Loops with Dependencies

Case 1:                            Case 2:
for (i=1; i<100; i++)              for (i=5; i<100; i++)
    a[i] = a[i-1] + 100;               a[i-5] = a[i] + 100;

Case 1 dependence chain:   a[0] → a[1] → … → a[99]
Case 2 dependence chains:  a[0] a[5] a[10] …;  a[1] a[6] a[11] …;  a[2] a[7] a[12] …;  a[3] a[8] a[13] …;  a[4] a[9] a[14] …

q Dependencies? What type?
q Is the Case 1 loop parallelizable?
q Is the Case 2 loop parallelizable? (One possible transformation is sketched below.)
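Case 2 carries only an anti-dependence: iteration i reads a[i], which iteration i+5 later overwrites. A hedged C/OpenMP sketch of one possible transformation, reading from an unmodified copy (and ignoring the cost of that copy); Case 1's loop-carried flow dependence cannot be removed this way:

#include <string.h>

void case2_parallel(int a[100]) {
    int old[100];
    memcpy(old, a, sizeof old);        /* snapshot removes the anti-dependence */
    #pragma omp parallel for
    for (int i = 5; i < 100; i++)
        a[i - 5] = old[i] + 100;       /* every iteration is now independent */
}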

Another Loop Example

for (i=1; i<100; i++)
    a[i] = f(a[i-1]);

q Dependencies?

❍ What type? q Loop iterations are not parallelizable

❍ Why not?

Loop Dependencies

q A loop-carried dependence is a dependence that is present only if the statements are part of the execution of a loop (i.e., between two statement instances in two different iterations of the loop)
q Otherwise, it is loop-independent, including between two statement instances in the same loop iteration
q Loop-carried dependences can prevent loop iteration parallelization
q The dependence is lexically forward if the source comes before the target, and lexically backward otherwise
  ❍ Unroll the loop to see

Loop Dependence Example

for (i=0; i<100; i++)
    a[i+10] = f(a[i]);

q Dependencies?

❍ Between a[10], a[20], … ❍ Between a[11], a[21], …

q Some parallel execution is possible

❍ How much?

Dependences Between Iterations

for (i=1; i<100; i++) {
    S1: a[i] = …;
    S2: … = a[i-1];
}

q Dependencies?

❍ Between a[i] and a[i-1]

q Is parallelism possible?

❍ Statements can be executed in a “pipelined” manner

Another Loop Dependence Example

for (i=0; i<100; i++)
    for (j=1; j<100; j++)
        a[i][j] = f(a[i][j-1]);

q Dependencies?
  ❍ Loop-independent dependence on i
  ❍ Loop-carried dependence on j
q Which loop can be parallelized?
  ❍ Outer loop parallelizable (see the sketch below)
  ❍ Inner loop cannot be parallelized
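A sketch of the outer-loop parallelization with OpenMP; f is the slide's unspecified function, passed here as a pointer purely for illustration:

void outer_parallel(float a[100][100], float (*f)(float)) {
    #pragma omp parallel for              /* i iterations are independent */
    for (int i = 0; i < 100; i++)
        for (int j = 1; j < 100; j++)     /* j must stay serial: a[i][j] needs a[i][j-1] */
            a[i][j] = f(a[i][j - 1]);
}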

Still Another Loop Dependence Example

for (j=1; j<100; j++)
    for (i=0; i<100; i++)
        a[i][j] = f(a[i][j-1]);

q Dependencies?
  ❍ Loop-independent dependence on i
  ❍ Loop-carried dependence on j
q Which loop can be parallelized?
  ❍ Inner loop parallelizable
  ❍ Outer loop cannot be parallelized
  ❍ Less desirable (why?)

Key Ideas for Dependency Analysis

q To execute in parallel: ❍ Statement order must not matter ❍ Statements must not have dependences

q Some dependences can be removed

q Some dependences may not be obvious

Dependencies and Synchronization

q How is parallelism achieved when there are dependencies?

❍ Think about concurrency ❍ Some parts of the execution are independent ❍ Some parts of the execution are dependent

q Must control ordering of events on different processors (cores)

❍ Dependencies pose constraints on parallel event ordering ❍ Partial ordering of execution action

q Use synchronization mechanisms ❍ Need for concurrent execution too

❍ Maintains partial order

Parallel Patterns

q Parallel Patterns: A recurring combination of task distribution and data access that solves a specific problem in parallel algorithm design.

q Patterns provide us with a “vocabulary” for algorithm design q It can be useful to compare parallel patterns with serial patterns

q Patterns are universal – they can be used in any parallel programming system

Parallel Patterns

q Nesting Pattern q Serial / Parallel Control Patterns

q Serial / Parallel Data Management Patterns

q Other Patterns

q Programming Model Support for Patterns

Nesting Pattern

q Nesting is the ability to hierarchically compose patterns q This pattern appears in both serial and parallel algorithms

q “Pattern diagrams” are used to visually show the pattern idea where each “task block” is a location of general code in an algorithm

q Each “task block” can in turn be another pattern in the nesting pattern

Nesting Pattern

Nesting Pattern: a compositional pattern. Nesting allows other patterns to be composed in a hierarchy so that any task block in the diagram can be replaced with a pattern with the same input/output and dependencies.

Serial Control Patterns

q Structured serial programming is based on these patterns: sequence, selection, iteration, and recursion

q The nesting pattern can also be used to hierarchically compose these four patterns q Though you should be familiar with these, it’s extra important to understand these patterns when parallelizing serial algorithms based on these patterns

Serial Control Patterns: Sequence

q Sequence: Ordered list of tasks that are executed in a specific order q Assumption – program text ordering will be followed (obvious, but this will be important when parallelized)

Serial Control Patterns: Selection

q Selection: condition c is first evaluated. Either task a or b is executed depending on the true or false result of c.

q Assumptions – a and b are never executed before c, and only a or b is executed - never both

Serial Control Patterns: Iteration

q Iteration: a condition c is evaluated. If true, a is evaluated, and then c is evaluated again. This repeats until c is false.

q Complication when parallelizing: potential for dependencies to exist between previous iterations

Serial Control Patterns: Recursion

q Recursion: dynamic form of nesting allowing functions to call themselves q Tail recursion is a special recursion that can be converted into iteration – important for functional languages

Parallel Control Patterns

q Parallel control patterns extend serial control patterns q Each parallel control pattern is related to at least one serial control pattern, but relaxes assumptions of serial control patterns q Parallel control patterns: fork-join, map, stencil, reduction, scan, recurrence

Parallel Control Patterns: Fork-Join

q Fork-join: allows control flow to fork into multiple parallel flows, then rejoin later
q Cilk Plus implements this with spawn and sync
  ❍ The call tree is a parallel call tree and functions are spawned instead of called
  ❍ A function that spawns another function continues to execute
  ❍ The caller syncs with the spawned function to join the two
q A “join” is different from a “barrier”
  ❍ Sync – only one flow continues
  ❍ Barrier – all threads continue
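A minimal fork-join sketch using the Cilk Plus keywords (the classic Fibonacci example, shown only to illustrate spawn/sync): cilk_spawn forks a child, the parent keeps working, and cilk_sync joins.

#include <cilk/cilk.h>

long fib(long n) {
    if (n < 2) return n;
    long x = cilk_spawn fib(n - 1);   /* fork: child may run in parallel */
    long y = fib(n - 2);              /* parent continues with its own work */
    cilk_sync;                        /* join: wait for the spawned child */
    return x + y;
}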

Parallel Control Patterns: Map

q Map: performs a function over every element of a collection q Map replicates a serial iteration pattern where each iteration is independent of the others, the number of iterations is known in advance, and computation only depends on the iteration count and data from the input collection

q The replicated function is referred to as an “elemental function”

Input → Elemental Function → Output

Parallel Control Patterns: Stencil

q Stencil: the elemental function accesses a set of “neighbors”; stencil is a generalization of map
q Often combined with iteration – used with iterative solvers or to evolve a system through time

q Boundary conditions must be handled carefully in the stencil pattern q See stencil lecture…

Parallel Control Patterns: Reduction

q Reduction: Combines every element in a collection using an associative “combiner function”

q Because of the associativity of the combiner function, different orderings of the reduction are possible

q Examples of combiner functions: addition, multiplication, maximum, minimum, and Boolean AND, OR, and XOR
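A sum reduction sketched in C with OpenMP; because + is associative (and commutative), each thread can accumulate a partial sum and the partial sums can be combined in any order:

double sum_reduce(const double *x, int n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i];                  /* per-thread partial sums, combined at the end */
    return sum;
}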

Parallel Control Patterns: Reduction

Serial Reduction vs. Parallel Reduction

Parallel Control Patterns: Scan

q Scan: computes all partial reductions of a collection
q For every output position in the collection, a reduction of the input up to that point is computed
q If the function being used is associative, the scan can be parallelized

q Parallelizing a scan is not obvious at first, because of dependencies to previous iterations in the serial loop

q A parallel scan will require more operations than a serial version
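For contrast, a serial inclusive scan (prefix sum) in C; the loop-carried dependence on running is exactly what a parallel scan must work around, at the cost of extra operations:

void inclusive_scan(const int *in, int *out, int n) {
    int running = 0;
    for (int i = 0; i < n; i++) {
        running += in[i];             /* reduction of in[0..i] */
        out[i] = running;
    }
}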

Parallel Control Patterns: Scan

Serial Scan Parallel Scan

Parallel Control Patterns: Recurrence

q Recurrence: More complex version of map, where the loop iterations can depend on one another q Similar to map, but elements can use outputs of adjacent elements as inputs

q For a recurrence to be computable, there must be a serial ordering of the recurrence elements so that elements can be computed using previously computed outputs

Serial Data Management Patterns

q Serial programs can manage data in many ways q Data management deals with how data is allocated, shared, read, written, and copied q Serial Data Management Patterns: random read and write, stack allocation, heap allocation, objects

Serial Data Management Patterns: Random Read and Write

q Memory locations indexed with addresses q Pointers are typically used to refer to memory addresses q Aliasing (uncertainty of two pointers referring to the same object) can cause problems when serial code is parallelized

Serial Data Management Patterns: Stack Allocation

q Stack allocation is useful for dynamically allocating data in LIFO manner q Efficient – arbitrary amount of data can be allocated in constant time

q Stack allocation also preserves locality

q When parallelized, typically each thread will get its own stack so thread locality is preserved

Serial Data Management Patterns: Heap Allocation

q Heap allocation is useful when data cannot be allocated in a LIFO fashion q But, heap allocation is slower and more complex than stack allocation

q A parallelized heap allocator should be used when dynamically allocating memory in parallel ❍ This type of allocator will keep separate pools for each parallel worker

Serial Data Management Patterns: Objects

q Objects are language constructs to associate data with code to manipulate and manage that data q Objects can have member functions, and they also are considered members of a class of objects

q Parallel programming models will generalize objects in various ways

Parallel Data Management Patterns

q To avoid things like race conditions, it is critically important to know when data is, and isn’t, potentially shared by multiple parallel workers

q Some parallel data management patterns help us with data locality q Parallel data management patterns: pack, pipeline, geometric decomposition, gather, and scatter

Parallel Data Management Patterns: Pack

q Pack is used to eliminate unused space in a collection
q Elements marked false are discarded; the remaining elements are placed in a contiguous sequence in the same order
q Useful when used with map
q Unpack is the inverse and is used to place elements back in their original locations
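A serial C sketch of pack, where keep[] plays the role of the true/false marks; a parallel pack would typically use a scan of keep to compute each surviving element's output position:

int pack(const int *in, const int *keep, int *out, int n) {
    int m = 0;
    for (int i = 0; i < n; i++)
        if (keep[i])
            out[m++] = in[i];         /* surviving elements stay in order */
    return m;                         /* packed length */
}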

Parallel Data Management Patterns: Pipeline

q Pipeline connects tasks in a producer-consumer manner
q A linear pipeline is the basic pattern idea, but a pipeline in a DAG is also possible
q Pipelines are most useful when used with other patterns, as they can multiply the available parallelism

Parallel Data Management Patterns: Geometric Decomposition

q Geometric decomposition – arranges data into subcollections
q Overlapping and non-overlapping decompositions are possible
q This pattern doesn’t necessarily move data; it just gives us another view of it

Parallel Data Management Patterns: Gather

q Gather reads a collection of data given a collection of indices
q Think of a combination of map and random serial reads
q The output collection shares the same type as the input collection, but it shares the same shape as the index collection

Parallel Data Management Patterns: Scatter

q Scatter is the inverse of gather
q A collection of input elements and a collection of indices are required, but each input element is written to the output at the given index instead of being read from the input at that index
q Race conditions can occur when there are two writes to the same location!
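Serial C sketches of gather and scatter (types and names are illustrative); the write in scatter is where duplicate indices turn into a race under parallel execution:

void gather(const float *in, const int *idx, float *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = in[idx[i]];          /* output has the shape of idx */
}

void scatter(const float *in, const int *idx, float *out, int n) {
    for (int i = 0; i < n; i++)
        out[idx[i]] = in[i];          /* unsafe in parallel if idx repeats a value */
}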

Other Parallel Patterns

q Superscalar Sequences: write a sequence of tasks, ordered only by dependencies q Futures: similar to fork-join, but tasks do not need to be nested hierarchically

q Speculative Selection: general version of serial selection where the condition and both outcomes can all run in parallel

q Workpile: general map pattern where each instance of elemental function can generate more instances, adding to the “pile” of work

Other Parallel Patterns

q Search: finds some data in a collection that meets some criteria
q Segmentation: operations on subdivided, non-overlapping, non-uniformly sized partitions of 1D collections
q Expand: a combination of pack and map

q Category Reduction: Given a collection of elements each with a label, find all elements with same label and reduce them

Programming Model Support for Patterns



Map Pattern - Overview

q Map
q Optimizations
  ❍ Sequences of maps
  ❍ Code fusion
  ❍ Cache fusion
q Related patterns
q Example implementation: Scaled Vector Addition (SAXPY)
  ❍ Problem description
  ❍ Various implementations

Mapping

q “Do the same thing many times”
    foreach i in foo:
        do something
q Well-known higher-order function in languages like ML, Haskell, Scala:
    map : ∀ a b. (a → b) → List a → List b
  applies a function to each element in a list and returns a list of results

Example Maps

Add 1 to every item in an array:    [0 4 5 3 1 0]  →  [1 5 6 4 2 1]
Double every item in an array:      [3 7 0 1 4 0]  →  [6 14 0 2 8 0]

Key Point: An operation is a map if it can be applied to each element without knowledge of neighbors.

Key Idea

q Map is a “foreach loop” where each iteration is independent

Embarrassingly Parallel

Independence is a big win. We can run map completely in parallel. Significant speedups! More precisely: T(∞) is O(1) plus implementation overhead that is O(log n), so T(∞) ∈ O(log n).

Sequential Map

for (int n = 0; n < array.length; ++n) {
    process(array[n]);
}

Parallel Map

parallel_for_each (x in array) {
    process(x);
}
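One concrete realization of this pseudocode, sketched in C with OpenMP; process is hypothetical and stands in for the elemental function:

void parallel_map(float *array, int n, float (*process)(float)) {
    #pragma omp parallel for              /* each element handled by exactly one iteration */
    for (int i = 0; i < n; i++)
        array[i] = process(array[i]);
}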

Comparing Maps: Serial Map vs. Parallel Map


The gap between the two timelines is the speedup: with the parallel map, our program finished execution early, while the serial map is still running.

Independence

q The key to (embarrassing) parallelism is independence
    Warning: No shared state!
  The map function should be “pure” (or “pure-ish”) and should not modify shared state
q Modifying shared state breaks perfect independence
q Results of accidentally violating independence:
  ❍ non-determinism
  ❍ data races
  ❍ undefined behavior
  ❍ segfaults

Implementation and API

q OpenMP and Cilk Plus contain a parallel for language construct
q Map is a mode of use of parallel for
q TBB uses higher-order functions with lambda expressions/“functors”

q Some languages (CilkPlus, Matlab, Fortran) provide array notation which makes some maps more concise

Array Notation

A[:] = A[:]*5; is CilkPlus array notation for “multiply every element in A by 5”

Unary Maps

So far we have only dealt with mapping over a single collection…

Map with 1 Input, 1 Output

index    0   1   2   3   4   5   6   7   8   9  10  11
x        3   7   0   1   4   0   0   4   5   3   1   0
result   6  14   0   2   8   0   0   8  10   6   2   0

int oneToOne(int x) {    // elemental function, applied to each element of the input
    return x * 2;
}

N-ary Maps

But sometimes it makes sense to map over multiple collections at once…

Map with 2 Inputs, 1 Output

index    0   1   2   3   4   5   6   7   8   9  10  11
x        3   7   0   1   4   0   0   4   5   3   1   0
y        2   4   2   1   8   3   9   5   5   1   2   1
result   5  11   2   2  12   3   9   9  10   4   3   1

int twoToOne(int x, int y) {    // elemental function over corresponding elements of the two inputs
    return x + y;
}

Introduction to Parallel Computing, University of Oregon, IPCC Lecture 5 – Parallel Programming Patterns - Map 81 To protect the rights of the author(s) and publisher we inform you that this PDF is an uncorrected proof for internal business use only by the author(s), editor(s), reviewer(s), Elsevier and typesetter diacriTech. It is not allowed to publish this proof online or in print. This proof copy is the copyright property of the publisher and is confidential until formal publication. McCool — e9780124159938 — 2012/6/6 — 23:09 — Page 139 — #139



Optimization – Sequences of Maps

(The slide shows §4.4, “Sequence of Maps versus Map of Sequence,” from McCool et al., Structured Parallel Programming.)

q Often several map operations occur in sequence

❍ Vector math consists of many small operations such as additions and multiplications applied as maps

q A naïve implementation may write each intermediate result to memory, wasting memory BW and likely overwhelming the cache

Figure 4.2 (McCool et al.): Code fusion optimization — convert a sequence of maps into a map of sequences, avoiding the need to write intermediate results to memory. This can be done automatically by ArBB and explicitly in other programming models.




Optimization – Code Fusion

q Can sometimes “fuse” together the operations to perform them all at once
q Increases arithmetic intensity, reduces memory/cache usage
q Ideally, operations can be performed using registers alone

(See Figure 4.2 above: code fusion converts a sequence of maps into a map of sequences. A sketch follows.)
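A hedged C/OpenMP sketch of the idea, using an arbitrary pair of elementwise operations (a*x[i] followed by +b, chosen purely for illustration); the fused version keeps the intermediate value in a register instead of a temporary array:

/* Unfused: two maps, with an intermediate array t that costs memory bandwidth. */
void unfused(const float *x, float *t, float *y, int n, float a, float b) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) t[i] = a * x[i];      /* map 1 */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) y[i] = t[i] + b;      /* map 2 */
}

/* Fused: one map whose body is the whole sequence. */
void fused(const float *x, float *y, int n, float a, float b) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        float t = a * x[i];                           /* stays in a register */
        y[i] = t + b;
    }
}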

Optimization – Cache Fusion

q Sometimes it is impractical to fuse together the map operations
q Can instead break the work into blocks (tiles), giving each CPU one block at a time
q Hopefully, operations then use the cache alone

Figure 4.3 (McCool et al.): Cache fusion optimization — process sequences of maps in small tiles sequentially. When code fusion is not possible, a sequence of maps can be broken into small tiles and each tile processed sequentially; this avoids synchronization between the individual maps and, if the tiles are small enough, keeps intermediate data in cache. Code fusion is still preferred when possible, since registers are faster than cache, but cache fusion is useful when there is no access to the code inside the individual maps (for example, precompiled user-defined functions such as image-processing plugins).
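A cache-fusion sketch in the same spirit, for the case where the two map bodies cannot be fused (e.g., they are opaque precompiled functions); TILE, f, and g are illustrative assumptions, not part of the original slide:

#define TILE 1024

void cache_fused(const float *x, float *y, int n,
                 float (*f)(float), float (*g)(float)) {
    #pragma omp parallel for
    for (int start = 0; start < n; start += TILE) {
        int end = (start + TILE < n) ? start + TILE : n;
        float tmp[TILE];                              /* per-tile intermediate, cache resident */
        for (int i = start; i < end; i++) tmp[i - start] = f(x[i]);   /* map 1 on the tile */
        for (int i = start; i < end; i++) y[i] = g(tmp[i - start]);   /* map 2 on the tile */
    }
}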

Related Patterns

Three patterns related to map are discussed here:
  ❍ Stencil
  ❍ Workpile

❍ Divide-and-Conquer

More detail presented in a later lecture

Stencil

q Each instance of the map function accesses neighbors of its input, offset from its usual input
q Common in imaging and PDE solvers

Figure 7.1 (McCool et al.): Stencil pattern — the stencil pattern combines a local, structured gather with a function to combine the results into a single output for each input neighborhood.


Workpile

q Work items can be added to the map while it is in progress, from inside map function instances q Work grows and is consumed by the map

q Workpile pattern terminates when no more work is available


Divide-and-Conquer

q Applies if a problem can be divided into smaller subproblems recursively until a base case is reached that can be solved serially

FIGURE 6.17 Partitioning in 2D. The partition pattern can be extended to multiple dimensions.

Example: Scaled Vector Addition (SAXPY)

q y ← ax + y ❍ Scales vector x by a and adds it to vector y ❍ Result is stored in input vector y

q Comes from the BLAS (Basic Linear Algebra Subprograms) library
q Every element of vector x and vector y can be processed independently

What does y ← ax + y look like?

index        0   1   2   3   4   5   6   7   8   9  10  11
a            4   4   4   4   4   4   4   4   4   4   4   4
x            2   4   2   1   8   3   9   5   5   1   2   1
y            3   7   0   1   4   0   0   4   5   3   1   0

y ← a*x + y
            11  23   8   5  36  12  36  24  25   7   9   4

Visual: y ← ax + y


Twelve processors used → one for each element in the vector

Visual: y ← ax + y


Six processors used → one for every two elements in the vector

Visual: y ← ax + y


Two processors used → one for every six elements in the vector



Serial SAXPY Implementation

void saxpy_serial(
    size_t n,         // the number of elements in the vectors
    float a,          // scale factor
    const float x[],  // the first input vector
    float y[]         // the output vector and second input vector
) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

Listing 4.1: Serial implementation of SAXPY in C.




TBB SAXPY Implementation

void saxpy_tbb(
    int n,       // the number of elements in the vectors
    float a,     // scale factor
    float x[],   // the first input vector
    float y[]    // the output vector and second input vector
) {
    tbb::parallel_for(
        tbb::blocked_range<int>(0, n),
        [&](tbb::blocked_range<int> r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                y[i] = a * x[i] + y[i];
        }
    );
}

Listing 4.2: Tiled implementation of SAXPY in TBB. Tiling not only leads to better spatial locality but also exposes opportunities for vectorization by the host compiler.





Cilk Plus SAXPY Implementation

void saxpy_cilk(
    int n,      // the number of elements in the vectors
    float a,    // scale factor
    float x[],  // the first input vector
    float y[]   // the output vector and second input vector
) {
    cilk_for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

Listing 4.3: SAXPY in Cilk Plus using cilk_for.

void saxpy_array_notation(
    int n,      // the number of elements in the vectors
    float a,    // scale factor
    float x[],  // the input vector
    float y[]   // the output vector and offset
) {
    y[0:n] = a * x[0:n] + y[0:n];
}

Listing 4.4: SAXPY in Cilk Plus using array notation for explicit vectorization. Here x[0:n] and y[0:n] refer to n consecutive elements of each array starting at x[0] and y[0]; note that no cilk_for is needed.


Uniform inputs are handled by scalar promotion: When a scalar and an array are combined with an operator, the scalar is conceptually “promoted” to an array of the same length by replication.


OpenMP SAXPY Implementation

void saxpy_openmp(
    int n,      // the number of elements in the vectors
    float a,    // scale factor
    float x[],  // the first input vector
    float y[]   // the output vector and second input vector
) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

Listing 4.5: SAXPY in OpenMP.

OpenMP SAXPY Performance (vector size = 500,000,000)
