Parallel design patterns

M. Danelutto

Introduction Parallel design Parallel design patterns patterns

Finding concurrency design space Marco Danelutto Algorithm Structure design space Dept. of Computer Science, University of Pisa, Italy Supporting structures design space May 2011, Pisa Implementation mechanisms design space

Conclusions Contents

Parallel design patterns M. Danelutto 1 Introduction

Introduction

Parallel design 2 Parallel design patterns patterns

Finding concurrency 3 Finding concurrency design space design space Algorithm 4 Algorithm Structure design space Structure design space Supporting 5 Supporting structures design space structures design space Implementation 6 Implementation mechanisms design space mechanisms design space Conclusions 7 Conclusions T s(n) = seq (speedup) Tpar (n) T d T (n) = i = seq (efficiency) Tpar (n) n × Tpar (n) 1 s(n) = 1−f (Amdhal law) f + n

Parallel computing

Parallel design patterns

M. Danelutto The problem

Introduction Solve a problem using nw processing resources Parallel design Obtaining a (close to) nw speedup with respect to time patterns spent sequentially Finding concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions

Parallel design patterns

M. Danelutto The problem

Introduction Solve a problem using nw processing resources Parallel design Obtaining a (close to) nw speedup with respect to time patterns spent sequentially Finding concurrency design space Algorithm Tseq Structure s(n) = (speedup) design space Tpar (n) Supporting structures Ti d Tseq design space (n) = = (efficiency) Implementation Tpar (n) n × Tpar (n) mechanisms design space 1 Conclusions s(n) = 1−f (Amdhal law) f + n Parallelism exploitation program activities (threads, processes) program interactions (communications, sycnhronizations) → overhead

The problems

Parallel design patterns

M. Danelutto

Introduction Find potentially concurrent activities Parallel design patterns alternative decompositions Finding with possibly radically differences concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions The problems

Parallel design patterns

M. Danelutto

Introduction Find potentially concurrent activities Parallel design patterns alternative decompositions Finding with possibly radically differences concurrency design space

Algorithm Structure Parallelism exploitation design space program activities (threads, processes) Supporting structures design space program interactions (communications, sycnhronizations) Implementation → overhead mechanisms design space

Conclusions Algorithmic skeletons Cole 1988 → common, parametric, reusable parallelism exploitation pattern languages & libraries since ’90 (P3L, Skil, eSkel, ASSIST, Muesli, SkeTo, Mallba, Muskel, Skipper, BS, ...) high level parallel abstractions (parallel programming community) hiding most of the technicalities related to parallelism exploitation directly exposed to application programmers Parallel design patterns Massingill, Mattson, Sanders 2000 → “Patterns for parallel programming” book (2006) (software engineering community) design patterns `ala Gamma book name, problem, solution, use cases, etc. concurrency, algorithms, implementation, mechanisms

Structured parallel programming

Parallel design patterns

M. Danelutto

Introduction

Parallel design patterns

Finding concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions Parallel design patterns Massingill, Mattson, Sanders 2000 → “Patterns for parallel programming” book (2006) (software engineering community) design patterns `ala Gamma book name, problem, solution, use cases, etc. concurrency, algorithms, implementation, mechanisms

Structured parallel programming

Parallel design Algorithmic skeletons patterns

M. Danelutto Cole 1988 → common, parametric, reusable parallelism exploitation pattern Introduction languages & libraries since ’90 (P3L, Skil, eSkel, ASSIST, Parallel design patterns Muesli, SkeTo, Mallba, Muskel, Skipper, BS, ...)

Finding high level parallel abstractions (parallel programming concurrency community) design space hiding most of the technicalities related to parallelism Algorithm Structure exploitation design space directly exposed to application programmers Supporting structures design space

Implementation mechanisms design space

Conclusions Structured parallel programming

Parallel design Algorithmic skeletons patterns

M. Danelutto Cole 1988 → common, parametric, reusable parallelism exploitation pattern Introduction languages & libraries since ’90 (P3L, Skil, eSkel, ASSIST, Parallel design patterns Muesli, SkeTo, Mallba, Muskel, Skipper, BS, ...)

Finding high level parallel abstractions (parallel programming concurrency community) design space hiding most of the technicalities related to parallelism Algorithm Structure exploitation design space directly exposed to application programmers Supporting structures Parallel design patterns design space

Implementation Massingill, Mattson, Sanders 2000 → “Patterns for parallel mechanisms programming” book (2006) (software engineering community) design space design patterns `ala Gamma book Conclusions name, problem, solution, use cases, etc. concurrency, algorithms, implementation, mechanisms New architectures Heterogeneous in Hw & Sw Multicore NUMA, cache coherent architectures Further non functional concerns security, fault tolerance, power management, ...

Concept evolution

Parallel design patterns M. Danelutto Parallelism

Introduction parallelism exploitation patterns shared among applications Parallel design patterns separation of concerns:

Finding system programmers → efficient implementation of parallel concurrency patterns design space application programmers → application specific details Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions Further non functional concerns security, fault tolerance, power management, ...

Concept evolution

Parallel design patterns M. Danelutto Parallelism

Introduction parallelism exploitation patterns shared among applications Parallel design patterns separation of concerns:

Finding system programmers → efficient implementation of parallel concurrency patterns design space application programmers → application specific details Algorithm Structure design space New architectures Supporting Heterogeneous in Hw & Sw structures design space Multicore NUMA, cache coherent architectures Implementation mechanisms design space

Conclusions Concept evolution

Parallel design patterns M. Danelutto Parallelism

Introduction parallelism exploitation patterns shared among applications Parallel design patterns separation of concerns:

Finding system programmers → efficient implementation of parallel concurrency patterns design space application programmers → application specific details Algorithm Structure design space New architectures Supporting Heterogeneous in Hw & Sw structures design space Multicore NUMA, cache coherent architectures Implementation mechanisms Further non functional concerns design space

Conclusions security, fault tolerance, power management, ... Parallel design patterns

Parallel design patterns Researchers active since beginning of the century M. Danelutto S. MacDonald, J. Anvik, S. Bromling, J. Schaeffer, D. Szafron, and

Introduction K. Tan. 2002. From patterns to frameworks to parallel programs.

Parallel design Parallel Comput. 28, 12 (December 2002), 1663-1683. patterns Berna L. Massingill, Timothy G. Mattson, Beverly A. Sanders: A Finding Pattern Language for Parallel Application Programs (Research Note). concurrency design space Euro-Par 2000, LNCS, pp. 678-681, 2000 Algorithm Berna L. Massingill, Timothy G. Mattson , Beverly A. Sanders, Structure design space Parallel programming with a pattern language, Springer Verlag, Int.

Supporting J. STTT 3:1-18, 2001 structures design space Berna L. Massingill, Timothy G. Mattson , Beverly A. Sanders,

Implementation Patterns for Finding Concurrency for Parallel Application Programs, mechanisms (pre-book) design space Timothy G. Mattson, Beverly A. Sanders, Berna L. Massingill, Conclusions Patterns for parallel programming, Addison Wesley, Pearson Education, 2005 Parallel design patterns

Parallel design patterns

M. Danelutto

Introduction

Parallel design patterns

Finding concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions The pattern framework

Parallel design patterns M. Danelutto Four patter classes: Introduction 1 Finding concurrency Parallel design patterns 2 Algorithm structure Finding concurrency 3 Supporting structure design space

Algorithm 4 Implementation mechanisms Structure design space These are “design spaces” Supporting structures different concerns design space different “kind of programmers” involved Implementation mechanisms upper layers → application programmers design space lower layers → system programmers Conclusions Finding concurrency design space

Parallel design patterns

M. Danelutto Three main blocks Introduction 1 Decomposition Parallel design patterns → Decomposition of problems into pieces that can be Finding concurrency computed concurrently design space 2 Dependency analysis Algorithm Structure design space → support task grouping and dependency analysis Supporting 3 Design evaluation structures design space → aimed at supporting evaluation of alternatives Implementation mechanisms Used in an iterative process: design space design → evaluate → redesign → ... Conclusions Finding concurrency design space

Parallel design patterns

M. Danelutto

Introduction

Parallel design patterns

Finding concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions Decomposition patterns

Parallel design patterns

M. Danelutto Task decomposition How can a problem be decomposed into tasks that can execute Introduction

Parallel design concurrently ? patterns

Finding concurrency Data decomposition design space How can a problem’s date be decomposed into units that can Algorithm Structure be operated on relatively independently? design space

Supporting structures design space Forces: Implementation mechanisms Flexibility design space

Conclusions Efficiency Simplicity Dependency analysis patterns

Parallel design patterns

M. Danelutto Group tasks How can the tasks make up a problem be grouped to simplify Introduction the job of managing dependencies? Parallel design patterns

Finding concurrency Order tasks design space Given a way of decomposing a problem into tasks and a way of Algorithm Structure collecting these tasks into logically related groups, how must design space these groups of tasks be ordered to satisfy constrains among Supporting structures tasks? design space

Implementation mechanisms Data sharing design space

Conclusions Given a data and task decomposition for a problem, how is data shared among the tasks? Design evaluation pattern

Parallel design patterns M. Danelutto Design evaluation pattern Introduction Is the decomposition and dependency analysis so far good to Parallel design patterns move on to the next design space, or should the design be

Finding revisited? concurrency design space Forces: Algorithm Structure Suitability for the target platform (PE available, sharing design space support, coordination of PE activities, overheads) Supporting structures design space Design quality (flexibility, efficiency, simplicity) Implementation Preparation for the next phase of the design (regularity of mechanisms design space the solution, synchronous/asynchronous interactions, task Conclusions grouping) Algorithm structure

Parallel design patterns

M. Danelutto

Introduction Three main blocks Parallel design patterns 1 Organize by task Finding concurrency → when execution by tasks is the best organizing principle design space 2 Algorithm Organize by data decomposition Structure design space → when main source of parallelisms is data Supporting 3 Organize by flow analysis structures design space → flow of data imposing ordering on (groups of) tasks Implementation mechanisms design space

Conclusions Algorithm structure

Parallel design patterns

M. Danelutto

Introduction

Parallel design patterns

Finding concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions Organize by task

Parallel design patterns M. Danelutto Task parallelism Introduction When the problem is best decomposed into a collection of Parallel design tasks that can execute concurrently, how can this concurrency patterns

Finding be exploited efficiently? concurrency design space → dependency analysis, scheduling, ... Algorithm Structure design space Divide & conquer Supporting structures Suppose the problem is formulated using the sequential design space divide&conquer strategy. How can the potential concurrency Implementation mechanisms be exploited? design space Conclusions → dependency analysis, communication costs, ... Organize by data decomposition

Parallel design patterns

M. Danelutto

Introduction Geometric decomposition Parallel design How can an algorithm be organized around a data structure patterns that has been decomposed into concurrently updatable Finding concurrency “chuncks”? design space

Algorithm Structure Recursive data design space

Supporting Suppose the problem involves an operation on a recursive data structures structure (such as a list, tree or graph) that appears to require design space

Implementation sequential processing. How can operations on these data mechanisms design space structures be performed in parallel?

Conclusions Organize by flow of data

Parallel design patterns

M. Danelutto Pipeline Suppose that the overall computation involves performing a Introduction

Parallel design calculation on many sets of data, where the calculation can be patterns viewed in terms of data flowing through a sequence of stages. Finding concurrency How can potential concurrency be exploited ? design space

Algorithm Structure Event based coordination design space Suppose th application can be decomposed into groups of Supporting structures semi-independent tasks interacting in an irregular fashion. The design space interaction is determined by the flow of data between them Implementation mechanisms which implies ordering constraints between the tasks. How can design space these tasks and their interaction be implemented so they can Conclusions execute concurrently? Supporting structures

Parallel design patterns Two main blocks: M. Danelutto 1 Program structures Introduction → approaches for structuring source code Parallel design patterns 2 Data structures Finding concurrency → data dependency management design space

Algorithm Forces: Structure design space Clarity of abstraction

Supporting structures Scalability design space Efficiency Implementation mechanisms Maintainability design space

Conclusions Environmental affinity Sequential equivalence Supporting structures

Parallel design patterns

M. Danelutto

Introduction

Parallel design patterns

Finding concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions Program structures

Parallel design patterns

M. Danelutto SPMD (Single Program Multiple Data) Introduction The interactions between the various UEs cause most of the Parallel design patterns problems when writing correct and efficient parallel programs. Finding How can programmers structure their parallel programs to concurrency design space make these interactions more manageable and easier to Algorithm integrate with the core computations? Structure design space

Supporting structures Master/worker design space How should a program be organized when the design is Implementation mechanisms dominated by the need to dynamically balance the work on a design space set of tasks among the UEs? Conclusions Program structures (2)

Parallel design patterns

M. Danelutto Loop parallelism Introduction Given a serial program whose runtime is dominated by a set of Parallel design patterns computationally intensive loops, how can it be translated into a Finding parallel program? concurrency design space Algorithm Fork/join Structure design space In some programs the number of concurrent tasks varies as the Supporting structures program executes, and the way these tasks are related prevents design space the use of simple control structures such as parallel loops. How Implementation mechanisms can a parallel program be constructed around such complicated design space sets of dynamic tasks? Conclusions Data Structures

Parallel design patterns M. Danelutto Shared data Introduction How toes one explicitly manage shared data inside a set of Parallel design concurrent tasks? patterns

Finding concurrency Shared queue design space

Algorithm How can concurrenty-executing UEs safely share a queue data Structure design space structure?

Supporting structures design space Distributed array Implementation Arrays often need to be partitioned between multiple UEs. How mechanisms design space can we do this so the resulting program is both readable and Conclusions efficient? Implementation mechanisms

Parallel design patterns M. Danelutto Directly related to the target architecture:

Introduction 1 to provide mechanisms suitable to create a set of Parallel design concurrent activities (UE Units of Execution) patterns

Finding → threads, processes (creation, destruction) concurrency design space 2 to support interactions among the UEs Algorithm Structure → locks, mutexes, semaphores, memory fences, barriers, design space monitors, ... Supporting structures 3 to support data exchange among the UEs design space

Implementation → communication channels, queues, shared memory, mechanisms design space collective operations (broadcasts, multicast, barrier,

Conclusions reduce) ... Implementation mechanisms

Parallel design patterns

M. Danelutto

Introduction

Parallel design patterns

Finding concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions Then: Design patterns “recipes” to be implemented in order to get a working program Algorithmic skeletons predefined program constructs (language constructs, classes, library entries) implementing parallel patterns

Design patterns vs. algorithmic skeletons

Parallel design patterns

M. Danelutto Sw engineering vs. HPC community Introduction SwEng Focus on efficiency of the programming process Parallel design patterns HPC Focus on performance Finding concurrency (and programmer productivity) design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions Design patterns vs. algorithmic skeletons

Parallel design patterns

M. Danelutto Sw engineering vs. HPC community Introduction SwEng Focus on efficiency of the programming process Parallel design patterns HPC Focus on performance Finding concurrency (and programmer productivity) design space Then: Algorithm Structure design space Design patterns “recipes” to be implemented in order to

Supporting get a working program structures design space Algorithmic skeletons predefined program constructs Implementation (language constructs, classes, library entries) mechanisms design space implementing parallel patterns Conclusions Typical skeleton based program development

Parallel design patterns

M. Danelutto 1 Figure out which skeleton (composition) models your Introduction problem Parallel design patterns 2 Instantiate skeletons Finding functional parameters (e.g. code, data types) & non concurrency design space functional ones (e.g parallelism degree) Algorithm 3 Fine tune program performance Structure design space parameter sweeping, bottleneck analysis or skeleton Supporting restructuring structures design space → no parallel debugging, correctness guaranteed Implementation mechanisms → no possibility to use parallel patterns not supported by design space the skeleton set Conclusions Typical algorithmic skeleton frameworks

Parallel design patterns

M. Danelutto Muesli (H. Kuchen, Munster Univ. D)

Introduction C++ class library Parallel design stream parallel skeletons patterns Pipeline, Farm, Branch&Bound, Divide&Conquer (Atomic, Finding concurrency Filter, Final, Initial) design space data parallel skeletons Algorithm Structure DistributedXXX (XXX ∈ { Array, Matrix, SparseMatrix}) design space + fold, map, scan, zip Supporting structures design space target architecture: C++ (MPI + OpenMP) Implementation all communication, synchronization, shared access mechanisms design space problems solved in the skeleton implementation Conclusions program declaration separated from execution Parallel design patterns in perspective

Parallel design patterns Principles M. Danelutto Architecting parallel Introduction software with design Parallel design patterns, not just parallel patterns programming languages Finding concurrency Split productivity and design space efficiency layers, not just a Algorithm single general-purpose layer Structure design space Generating code with Supporting search-based autotuners, structures design space not compilers

Implementation Synthesis with sketching mechanisms design space Verification and testing, not Conclusions one or the other ... Parallel design patterns in perspective (2)

Parallel design patterns TBB ( Building Block library by Intel,2005) M. Danelutto C++ library Introduction

Parallel design currently version 3.0 (since late 2010) patterns base building blocks for parallel programming with thread Finding concurrency parallel loop, reduce, pipeline design space tasks Algorithm parallel containers Structure design space mutexes Supporting structures since 3.0 → TBB Design patterns design space Agglomeration, Elementwise, Odd-even communication, Implementation Wavefront, Reduction, Divide&Conquer, GUI thread, mechanisms design space Non-preemptive priorities, Local serializer, Fenced data Conclusions transfer, Lazy initialization, Reference counting, Compare-and-swap loop. Sample TBB pattern: D&C

Parallel design patterns

M. Danelutto

Introduction

Parallel design patterns

Finding concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions Sample TBB pattern: D&C

Parallel design patterns

M. Danelutto

Introduction Example Parallel design patterns

Finding concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions Sample APPL

Parallel design patterns

M. Danelutto

Introduction

Parallel design patterns

Finding concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions Sample APPL

Parallel design patterns

M. Danelutto

Introduction

Parallel design patterns

Finding concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions Sample APPL

Parallel design patterns

M. Danelutto

Introduction

Parallel design patterns

Finding concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions You will need it also to program iPhone applications!

Conclusions

Parallel design patterns

M. Danelutto

Introduction Become an expert in parallel computing Parallel design patterns → by studying and applying parallel design patterns !!! Finding concurrency design space

Algorithm Structure design space

Supporting structures design space

Implementation mechanisms design space

Conclusions Conclusions

Parallel design patterns

M. Danelutto

Introduction Become an expert in parallel computing Parallel design patterns → by studying and applying parallel design patterns !!! Finding concurrency design space

Algorithm Structure design space

Supporting structures You will need it design space also to program iPhone applications! Implementation mechanisms design space

Conclusions