Models for Parallel Computing: Review and Perspectives

Christoph Kessler and Jörg Keller, Journal Article

N.B.: When citing this work, cite the original article. Original Publication: Christoph Kessler and Jörg Keller, Models for Parallel Computing: Review and Perspectives, Mitteilungen - Gesellschaft für Informatik e.V., Parallel-Algorithmen und Rechnerstrukturen, 2007. 24, pp. 13-29.

Postprint available at: Linköping University Electronic Press

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-40734


Models for Parallel Computing

Review and Perspectives

Christoph Kessler and Jörg Keller

PELAB, Dept. of Computer Science (IDA)
Linköping University, Sweden
chrke@ida.liu.se

Dept. of Mathematics and Computer Science
FernUniversität in Hagen, Germany
joerg.keller@FernUni-Hagen.de

Abstract. As parallelism on different levels becomes ubiquitous in today's computers, it seems worthwhile to provide a review of the wealth of models for parallel computation that have evolved over the last decades. We refrain from a comprehensive survey and concentrate on models with some practical relevance, together with a perspective on models with potential future relevance. Besides presenting the models, we also refer to languages, implementations, and tools.

Key words: Parallel Computational Model, Survey, Parallel Programming Language, Parallel Cost Model

Introduction

Recent desktop and high-performance processors provide multiple hardware threads, technically realized by hardware multithreading and multiple processor cores on a single chip. Very soon, programmers will be faced with hundreds of hardware threads per processor chip. As exploitable instruction-level parallelism in applications is limited and the processor clock frequency cannot be increased any further due to power consumption and heat problems, exploiting thread-level parallelism becomes unavoidable if further improvement in processor performance is required, and there is no doubt that our requirements and expectations of machine performance will increase further. This means that parallel programming will actually concern a majority of application and system programmers in the foreseeable future, even in the desktop and embedded domain, and no longer only a few working in the comparably small HPC market niche.

A model of parallel computation consists of a parallel programming model and a corresponding cost model. A parallel programming model describes an abstract parallel machine by its basic operations (such as arithmetic operations, spawning of tasks, reading from and writing to shared memory, or sending and receiving messages), their effects on the state of the computation, the constraints of when and where these can be applied, and how they can be composed. In particular, a parallel programming model also contains, at least for shared memory programming models, a memory model that describes how and when memory accesses can become visible to the different parts of a parallel computer. The memory model sometimes is given implicitly. A parallel cost model associates a cost (which usually describes parallel execution time and resource occupation) with each basic operation, and describes how to predict the accumulated cost of composed operations up to entire parallel programs. A parallel programming model is often associated with one or several parallel programming languages or libraries that realize the model.

Parallel algorithms are usually formulated in terms of a particular parallel programming model. In contrast to sequential programming, where the von Neumann model is the predominant programming model (notable alternatives are e.g. dataflow and declarative programming), there are several competing parallel programming models. This heterogeneity is partly caused by the fact that some models are closer to certain existing hardware architectures than others, and partly because some parallel algorithms are easier to express in one model than in another one.

Programming models abstract to some degree from details of the underlying hardware, which increases portability of parallel algorithms across a wider range of parallel programming languages and systems.

In the following, we give a brief survey of parallel programming models and comment on their merits and perspectives from both a theoretical and practical view. While we focus on the, in our opinion, most relevant models, this paper is not aimed at providing a comprehensive presentation nor a thorough classification of parallel models and languages. Instead, we refer to survey articles and books in the literature, such as those by Bal et al., Skillicorn, Giloi, Maggs et al., Skillicorn and Talia, Hambrusch, Lengauer, and Leopold.

Model Survey

Before going through the various parallel programming models, we first recall two fundamental issues in parallel program execution that occur in implementations of several models.

Parallel Execution Styles

There exist several different parallel execution styles, which describe different ways how the parallel activities (e.g. processes, threads) executing a parallel program are created and terminated from the programmer's point of view. The two most prominent ones are fork-join-style and SPMD-style parallel execution.

Execution in fork-join style spawns parallel activities dynamically at certain points (fork) in the program that mark the beginning of parallel computation, and collects and terminates them at another point (join). At the beginning and the end of program execution, only one activity is executing, but the number of parallel activities can vary considerably during execution and thereby adapt to the currently available parallelism. The mapping of activities to physical processors needs to be done at runtime, by the operating system, by a thread package, or by the language's runtime system. While it is possible to use individual calls for fork and join functionality, as in the pthreads package, most parallel programming languages use scoping of structured programming constructs to implicitly mark fork and join points; for instance, in OpenMP they are given by the entry and exit from an omp parallel region, and in Cilk they are given by the spawn construct that frames a function call.
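As an illustration of implicit fork and join points by scoping, the following minimal C/OpenMP sketch (our own example, not from the article) forks a team of threads at the entry of the parallel region and joins them at its exit:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        printf("sequential part: one activity\n");

        #pragma omp parallel          /* fork: a team of threads is spawned here */
        {
            int id = omp_get_thread_num();
            printf("parallel region executed by thread %d\n", id);
        }                             /* join: implicit barrier, the team is terminated */

        printf("sequential part again: one activity\n");
        return 0;
    }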

Nested parallelism can be achieved with fork-join style execution by statically or dynamically nesting such fork-join sections within each other, such that e.g. an inner fork spawns several parallel activities from each activity spawned by an outer fork, leading to a higher degree of parallelism.

Execution in SPMD style (single program, multiple data) creates a fixed number p (usually known only from program start) of parallel activities (physical or virtual processors) at the beginning of program execution, i.e. at entry to main, and this number will be kept constant throughout program execution, i.e. no new parallel activities can be spawned. In contrast to fork-join style execution, the programmer is responsible for mapping the parallel tasks in the program to the p available processors. Accordingly, the programmer has the responsibility for load balancing, while it is provided automatically by the dynamic scheduling in the fork-join style. While this is more cumbersome at program design time and limits flexibility, it leads to reduced runtime overhead, as dynamic scheduling is no longer needed. SPMD is used e.g. in the MPI message passing standard and in some parallel programming languages such as UPC.
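A minimal SPMD sketch in C with MPI (our own example): all p processes run the same program, and the programmer maps work to processes explicitly via the process rank:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, p;
        MPI_Init(&argc, &argv);                 /* the number of processes is fixed at startup */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* my index in 0..p-1 */
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        /* explicit mapping of work to processes: rank r handles iterations r, r+p, r+2p, ... */
        int n = 1000;
        long local_sum = 0;
        for (int i = rank; i < n; i += p)
            local_sum += i;

        long global_sum = 0;
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum = %ld (computed by %d processes)\n", global_sum, p);

        MPI_Finalize();
        return 0;
    }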

Nested parallelism can be achieved with SPMD style as well, namely if a group of p processors is subdivided into s subgroups of p_i processors each, where sum_{i=1..s} p_i <= p. Each subgroup takes care of one subtask in parallel. After all subgroups are finished with their subtask, they are discarded and the parent group resumes execution. As group splitting can be nested, the group hierarchy forms a tree at any time during program execution, with the leaf groups being the currently active ones.

This schema is analogous to exploiting nested parallelism in fork-join style if one regards the original group G of p processors as one p-threaded process which may spawn s new p_i-threaded processes G_i, 1 <= i <= s, such that the total number of active threads is not increased. The parent process waits until all child processes (subgroups) have terminated and reclaims their threads.

Parallel Random Access Machine

The Parallel Random Access Machine (PRAM) model was proposed by Fortune and Wyllie as a simple extension of the Random Access Machine (RAM) model used in the design and analysis of sequential algorithms. The PRAM assumes a set of processors connected to a shared memory. There is a global clock that feeds both processors and memory, and execution of any instruction (including memory accesses) takes exactly one unit of time, independent of the executing processor and the (possibly accessed) memory address. Also, there is no limit on the number of processors accessing shared memory simultaneously.

The memory model of the PRAM is strict consistency, the strongest consistency model known, which says that a write in clock cycle t becomes globally visible to all processors at the beginning of clock cycle t+1, not earlier and not later.

The PRAM model also determines the effect of multiple processors writing or reading the same memory location in the same clock cycle. An EREW PRAM allows a memory location to be exclusively read or written by at most one processor at a time, the CREW PRAM allows concurrent reading but exclusive writing, and CRCW even allows simultaneous write accesses by several processors to the same memory location in the same clock cycle. The CRCW model specifies in several submodels how such multiple accesses are to be resolved, e.g. by requiring that the same value be written by all (Common CRCW PRAM), by the priority of the writing processors (Priority CRCW PRAM), or by combining all written values according to some global reduction or prefix computation (Combining CRCW PRAM). A somewhat restricted form is the CROW PRAM, where each memory cell may only be written by one processor at all, which is called the cell's owner. In practice, many CREW algorithms really are CROW.
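As a small illustration of PRAM-style cost analysis (our own worked example, not from the article): summing n values by a balanced binary tree of pairwise additions on an EREW PRAM with p = n/2 processors requires no concurrent accesses and takes

    T(n) = \Theta(\log n),    W(n) = \sum_{k=1}^{\log n} n/2^k = n - 1 = \Theta(n),

i.e. the total number of operations W(n) matches the sequential algorithm (the algorithm is work-optimal), whereas the processor-time product p * T(n) = \Theta(n \log n) overestimates the actual work because most processors are idle in the later rounds.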

Practical Relevance. The PRAM model is unique in that it supports deterministic parallel computation, and it can be regarded as one of the most programmer-friendly models available. Numerous algorithms have been developed for the PRAM model, see e.g. the book by JaJa. Also, it can be used as a first model for teaching parallel algorithms, as it allows students to focus on pure parallelism only, rather than having to worry about data locality and communication efficiency already from the beginning.

The PRAM model, especially its cost model for shared memory access, has however been strongly criticized for being unrealistic. In the shadow of this criticism, several architectural approaches demonstrated that a cost-effective realization of PRAMs is nevertheless possible, using hardware techniques such as multithreading and smart combining networks; examples are the NYU Ultracomputer, the SB-PRAM by Wolfgang Paul's group in Saarbrücken, XMT by Vishkin, and ECLIPSE by Forsell. A fair comparison of such approaches with current clusters and cache-based multiprocessors should take into consideration that the latter are good for special-purpose, regular problems with high data locality, while they perform poorly on irregular computations. In contrast, the PRAM is a general-purpose model that is completely insensitive to data locality.

Partly as a reaction to the criticism about practicality, variants of the PRAM model have been proposed within the parallel algorithms theory community. Such models relax one or several of the PRAM's properties. These include asynchronous PRAM variants, the hierarchical PRAM (H-PRAM), the block PRAM, the queuing PRAM (QPRAM), and the distributed PRAM (DRAM), to name only a few. Even the BSP model, which we will discuss later, can be regarded as a relaxed PRAM, and actually was introduced to bridge the gap between idealistic models and actual parallel machines. On the other hand, an even more powerful extension of the PRAM was also proposed in the literature, the BSR (Broadcast with selective reduction).

Implementations. Several PRAM programming languages have been proposed in the literature, such as Fork. For most of them there is, beyond a compiler, a PRAM simulator available, and sometimes additional tools such as trace file visualizers or libraries for central PRAM data structures and algorithms. Also, methods for translating PRAM algorithms automatically to other models, such as BSP or message passing, have been proposed in the literature.

Unrestricted Message Passing

A distributed memory machine, sometimes called a message-passing multicomputer, consists of a number of RAMs that run asynchronously and communicate via messages sent over a communication network. Normally it is assumed that the network performs message routing, so that a processor can send a message to any other processor without consideration of the particular network structure. In the simplest form, a message is assumed to be sent by one processor executing an explicit send command and received by another processor with an explicit receive command (point-to-point communication). Send and receive commands can be either blocking, i.e. the processors get synchronized, or non-blocking, i.e. the sending processor puts the message in a buffer and proceeds with its program, while the message passing subsystem forwards the message to the receiving processor and buffers it there until the receiving processor executes the receive command. There are also more complex forms of communication that involve a group of processors, so-called collective communication operations, such as broadcast, multicast, or reduction operations. Also, one-sided communications allow a processor to perform a communication operation (send or receive) without the processor addressed being actively involved.

The cost model of a message-passing multicomputer consists of two parts. The operations performed locally are treated as in a RAM. Point-to-point non-blocking communications are modelled by the LogP model, named after its four parameters. The latency L specifies the time that a message of one word needs to be transmitted from sender to receiver. The overhead o specifies the time that the sending processor is occupied in executing the send command. The gap g gives the time that must pass between two successive send operations of a processor, and thus models the processor's bandwidth to the communication network. The processor count P gives the number of processors in the machine. Note that by distinguishing between L and o, it is possible to model programs where communication and computation overlap. The LogP model has been extended to the LogGP model by introducing another parameter G that models the bandwidth for longer messages.
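For illustration, a small worked example under the LogP definitions above (our own, not from the article): a single one-word message is completely received at time

    T_msg = o + L + o,

accounting for send overhead, network latency, and receive overhead, and a processor that issues k consecutive one-word sends to distinct destinations is busy for roughly

    T_send(k) = o + (k-1) * max(g, o),

since successive sends must be spaced by at least the gap g; the last of these messages arrives at about T_send(k) + L + o, provided the receivers are not a bottleneck.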

Practical Relevance. Message passing models such as CSP (communicating sequential processes) have been used in the theory of concurrent and distributed systems for many years. As a model for parallel computing, it became popular with the arrival of the first distributed memory parallel computers in the late 1980s. With the definition of vendor-independent message-passing libraries, message passing became the dominant programming style on large parallel computers.

Message passing is the least common denominator of portable parallel programming. Message passing programs can quite easily be executed even on shared memory computers, while the opposite is much harder to perform efficiently. As a low-level programming model, unrestricted message passing gives the programmer the largest degree of control over the machine resources, including scheduling, buffering of message data, overlapping communication with computation, etc. This comes at the cost of code being more error-prone and harder to understand and maintain. Nevertheless, message passing is, as of today, the most successful parallel programming model in practice. As a consequence, numerous tools, e.g. for performance analysis and visualization, have been developed for this model.

Implementations. Early vendor-specific libraries were replaced in the early 1990s by portable message passing libraries such as PVM and MPI. MPI was later extended in the MPI-2 standard by one-sided communication and fork-join style dynamic process creation. MPI interfaces have been defined for Fortran, C, and C++. Today there exist several widely used implementations of MPI, including open-source implementations such as Open MPI.

A certain degree of structuring in MPI programs is provided by the hierarchical group concept for nested parallelism and the communicator concept that allows to create separate communication contexts for parallel software components. A library for managing nested parallel multiprocessor tasks on top of this functionality has been provided by Rauber and Rünger. Furthermore, MPI libraries for specific distributed data structures such as vectors and matrices have been proposed in the literature.

Bulk-Synchronous Parallelism

The bulk-synchronous parallel (BSP) model, proposed by Valiant in 1990 and slightly modified by McColl, enforces a structuring of message passing computations as a (dynamic) sequence of barrier-separated supersteps, where each superstep consists of a computation phase operating on local variables only, followed by a global interprocessor communication phase. The cost model involves only three parameters: the number of processors p, the point-to-point network bandwidth g, and the message latency resp. barrier overhead L, such that the worst-case asymptotic cost for a single superstep can be estimated as w + h·g + L if the maximum local work w per processor and the maximum communication volume h per processor are known. The cost for a program is then simply determined by summing up the costs of all executed supersteps.

An extension of the BSP model for nested parallelism by nestable supersteps was proposed by Skillicorn.

A variant of BSP is the CGM (coarse grained multicomputer) model proposed by Dehne, which has the same programming model as BSP, but an extended cost model that also accounts for aggregated messages in coarse-grained parallel environments.

Practical Relevance. Many algorithms for the BSP model have been developed in the 1990s in the parallel algorithms community. Implementations of BSP on actual parallel machines have been studied extensively. In short, the BSP model, while still simplifying considerably, allows to derive realistic predictions of execution time and can thereby guide algorithmic design decisions and balance trade-offs. Up to now, the existing BSP implementations have been used mainly in academic contexts, see e.g. Bisseling.

Implementations. The BSP model is mainly realized in the form of libraries, such as BSPlib or PUB, for an SPMD execution style.
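The following C sketch of a single superstep assumes the classic BSPlib primitives (bsp_begin, bsp_pid, bsp_push_reg, bsp_put, bsp_sync); it is our own minimal illustration, the surrounding bsp_init call in main and all error handling are omitted:

    #include <bsp.h>    /* BSPlib header */

    void spmd_part(void)
    {
        bsp_begin(bsp_nprocs());          /* start SPMD execution on all available processors */
        int p = bsp_nprocs(), pid = bsp_pid();

        int local = pid * pid;            /* computation phase: local work only */
        int incoming = -1;
        bsp_push_reg(&incoming, sizeof(int));
        bsp_sync();                       /* registration becomes effective in the next superstep */

        /* communication phase: send my value to my right neighbor (h = 1 here) */
        bsp_put((pid + 1) % p, &local, &incoming, 0, sizeof(int));
        bsp_sync();                       /* barrier ends the superstep; cost roughly w + h*g + L */

        bsp_end();
    }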

NestStep is a parallel programming language providing a partitioned global address space and SPMD execution with hierarchical processor groups. It provides deterministic parallel program execution with a PRAM-similar programming interface, where BSP-compliant memory consistency and synchronicity are controlled by the step statement that represents supersteps. NestStep also provides nested supersteps.

Asynchronous Shared Memory and Partitioned Global Address Space

In the shared memory model, several threads of execution have access to a common memory. So far, it resembles the PRAM model. However, the threads of execution run asynchronously, i.e. all potentially conflicting accesses to the shared memory must be resolved by the programmer, possibly with the help of the system underneath. Another notable difference to the PRAM model is the visibility of writes. In the PRAM model, each write to the shared memory was visible to all threads in the next step (strict consistency). In order to simplify efficient implementation, a large number of weaker consistency models have been developed, which however shift the responsibility to guarantee correct execution to the programmer. We will not go into details but refer to the literature.

The cost model of shared memory implementations depends on the realization. If several processors share a physical memory via a bus, the cost model resembles that of a RAM, with the obvious practical modifications due to caching and consideration of synchronization cost. This is called symmetric multiprocessing (SMP) because both the access mechanism and the access time are the same independently of address and accessing processor. Yet, if there is only a shared address space that is realized by a distributed memory architecture, then the cost of a shared memory access strongly depends on how far the accessed memory location is positioned from the requesting processor. This is called non-uniform memory access (NUMA). The differences in access time between placements in local cache, local memory, or remote memory may well be an order of magnitude for each level. In order to avoid remote memory accesses, caching of remote pages in the local memory has been employed and been called cache-coherent NUMA (CC-NUMA). A variant where pages are dynamically placed according to access has been called cache-only memory access (COMA). While NUMA architectures create the illusion of a shared memory, their performance tends to be sensitive to access patterns and artefacts like thrashing, much in the same manner as uniprocessor caches require consideration in performance tuning.

In order to give the programmer more control over the placement of data structures and the locality of accesses, partitioned global address space (PGAS) languages provide a programming model that exposes the underlying distributed memory organization to the programmer. The languages provide constructs for laying out the program's data structures in memory and to access them.

A recent development is transactional memory (see e.g. recent surveys), which adopts the transaction concept known from database programming as a primitive for parallel programming of shared memory systems. A transaction is a sequential code section enclosed in a statement such as atomic that should either fail completely or commit completely to shared memory as an atomic operation, i.e. the intermediate state of an atomic computation is not visible to other processors in the system. Hence, instead of having to hardcode explicit synchronization into the program, e.g. by using mutex locks or semaphores, transactions provide a declarative solution that leaves the implementation of atomicity to the language's runtime system or to the hardware. For instance, an implementation could use speculative parallel execution. It is also possible to nest transactions.

If implemented by a speculative parallel execution approach, atomic transactions can avoid the sequentialization effect of mutual exclusion, with its resulting idle times, as long as conflicts occur only seldom.

A related approach is the design of lock-free parallel data structures such as shared hash tables, FIFO queues, and skip lists. The realization on shared memory platforms typically combines speculative parallel execution of critical sections in update operations on such data structures with hardware-supported atomic instructions such as fetch&add, compare&swap, or load-linked/store-conditional.
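As an illustration of the latter (our own minimal sketch, using the GCC __sync atomic builtins as a stand-in for the hardware instructions named above): a lock-free counter increment via fetch&add, and a compare&swap-based push onto a singly linked stack:

    #include <stdlib.h>

    typedef struct node { struct node *next; int value; } node_t;

    static node_t *top = NULL;       /* shared stack top */
    static long    counter = 0;      /* shared counter   */

    void increment(void)
    {
        __sync_fetch_and_add(&counter, 1);            /* atomic fetch&add, no lock needed */
    }

    void push(int value)
    {
        node_t *n = malloc(sizeof(node_t));
        n->value = value;
        node_t *old;
        do {                                          /* speculatively link, then try to swing top */
            old = top;
            n->next = old;
        } while (!__sync_bool_compare_and_swap(&top, old, n));   /* atomic compare&swap */
    }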

Practical Relevance. Shared memory programming has become the dominant form of programming for small-scale parallel computers, notably SMP systems. As large-scale parallel computers have started to consist of clusters of SMP nodes, shared memory programming on the SMPs also has been combined with message passing concepts. CC-NUMA systems seem like a double-edged sword, because on the one hand they allow to port shared memory applications to larger installations. However, to get sufficient performance, the data layout in memory has to be considered in detail, so that programming still resembles distributed memory programming, only that control over placement is not explicit. PGAS languages that try to combine both worlds currently gain some attraction.

Implementations. POSIX threads (pthreads) were up to now the most widely used parallel programming environment for shared memory systems. However, pthreads are usually realized as libraries extending sequential programming languages such as C and Fortran, whose compilers only know about a single target processor and which thus lack a well-defined memory model. Instead, pthreads programs inherit memory consistency from the underlying hardware, and thus must account for worst-case scenarios, e.g. by inserting flush (memory fence) operations at certain places in the code.

Cilk is a shared-memory parallel language for algorithmic multithreading. By the spawn construct, the programmer specifies independence of dynamically created tasks, which can be executed either locally on the same processor or on another one. Newly created tasks are put on a local task queue. A processor will execute tasks from its own local task queue as long as it is non-empty. Processors becoming idle steal a task from a randomly selected victim processor's local task queue. As tasks are typically lightweight, load balancing is completely taken care of by the underlying runtime system. The overhead of queue access is very low unless task stealing is necessary.
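A minimal sketch in Cilk-5-style syntax (our own example, using the cilk, spawn and sync keywords of the MIT Cilk language described above): each spawn creates a task that an idle processor may steal, and sync waits for all spawned children:

    cilk int fib(int n)
    {
        if (n < 2) return n;
        int x, y;
        x = spawn fib(n - 1);   /* child task, may be stolen by an idle processor */
        y = spawn fib(n - 2);   /* executed locally unless stolen                 */
        sync;                   /* wait for both children before using x and y    */
        return x + y;
    }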

OpenMP is gaining popularity with the arrival of multi-core processors and may eventually replace pthreads completely. OpenMP provides structured parallelism in a combination of SPMD and fork-join styles. Its strengths are in the work-sharing constructs that allow to distribute loop iterations according to one of several predefined scheduling policies. OpenMP has a weak memory consistency model; that is, flush operations may be required to guarantee that stale copies of a shared variable will no longer be used after a certain point in the program. The forthcoming OpenMP 3.0 standard supports a task-based programming model, which makes it more appropriate for fine-grained nested parallelism than its earlier versions. A drawback of OpenMP is that it has no good programming support yet for controlling data layout in memory.
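A short C example of an OpenMP work-sharing construct (ours, not from the article): the loop iterations are distributed over the thread team according to the selected scheduling policy:

    void scale(double *a, const double *b, double s, int n)
    {
        /* fork a team, then share the n iterations among its threads;
           schedule(static) assigns contiguous chunks, while schedule(dynamic,c)
           would hand out chunks of c iterations on demand */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++)
            a[i] = s * b[i];
    }   /* implicit barrier and join at the end of the parallel region */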

In PGAS languages, accesses to remote shared data are either handled by separate access primitives, as in NestStep, or the syntax is the same and the language's runtime system automatically traps to remote memory access if non-local data have been accessed, as in UPC (Unified Parallel C). Loops generally take data affinity into account, such that iterations should preferably be scheduled to processors that have most of the required data stored locally, to reduce communication.
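A minimal UPC sketch (our own example) of such an affinity-aware loop: the fourth clause of upc_forall assigns each iteration to the thread with affinity to the corresponding array element, so with the default cyclic layout the accesses below are local:

    #include <upc.h>

    #define N 1000
    shared double a[N], b[N];   /* shared arrays, elements distributed over the threads */

    void add_one(void)
    {
        int i;
        /* affinity expression &a[i]: iteration i runs on the thread that owns a[i] */
        upc_forall (i = 0; i < N; i++; &a[i])
            a[i] = b[i] + 1.0;
    }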

The Linda system provides a shared memory via the concept of tuple spaces, which is much more abstract than linear addressing and partly resembles access to a relational database.

A programming language for transactional programming called Atomos has been proposed by Carlstrom et al. As an example of a library of lock-free parallel data structures, we refer to NOBLE.

In parallel language design, there is some consensus that extending sequential programming languages by a few new constructs is not the cleanest way to build a sound parallel programming language, even if this promises the straightforward reuse of major parts of available sequential legacy code. Currently, three major new parallel languages are under development for the high-performance computing domain: Chapel, promoted by Cray; X10, promoted by IBM; and Fortress, promoted by Sun. It remains to be seen how these will be adopted by programmers and whether any of these will be strong enough to become the dominant parallel programming language in a long-range perspective. At the moment, it rather appears that OpenMP and perhaps UPC are ready to enter mainstream computing platforms.

Data Parallel Models

Data parallel models include SIMD (Single Instruction Multiple Data) and vector computing, data parallel computing, systolic computing, cellular automata, VLIW computing, and stream data processing.

Data parallel computing involves the element-wise application of the same scalar computation to several elements of one or several operand vectors (which usually are arrays or parts thereof), creating a result vector. All element computations must be independent of each other and may therefore be executed in any order, in parallel, or in a pipelined way. The equivalent of a data parallel operation in a sequential programming language is a loop over the element-wise operation, scanning the operand and result vectors. Most data parallel languages accordingly provide a data parallel loop construct such as forall. Nested forall loops scan a multidimensional iteration space, which maps array accesses in its body to possibly multidimensional slices of the involved operand and result arrays. The strength of data parallel computing is the single state of the program's control, making it easier to analyze and to debug than task-parallel programs, where each thread may follow its own control flow path through the program. On the other hand, data parallel computing alone is quite restrictive; it fits well for most numerical computations, but in many other cases it is too inflexible.

A special case of data parallel computing is SIMD computing or vector computing. Here, the data parallel operations are limited to predefined SIMD/vector operations such as element-wise addition. While any SIMD/vector operation could be rewritten by a data parallel loop construct, the converse is not true: the element-wise operation to be applied in a data parallel computation may be much more complex (e.g. contain several scalar operations and even non-trivial control flow) than what can be expressed by available vector/SIMD instructions. In SIMD/vector programming languages, SIMD/vector operations are usually represented using array notation such as a(1:n) = b(1:n) + c(1:n).
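At the instruction-set level, such element-wise operations appear as SIMD instructions on short vectors. A minimal C sketch using the x86 SSE intrinsics (our own example; it assumes n is a multiple of 4 and the SSE instruction set is available):

    #include <xmmintrin.h>   /* SSE intrinsics */

    /* a(1:n) = b(1:n) + c(1:n), four single-precision elements per instruction */
    void vadd(float *a, const float *b, const float *c, int n)
    {
        for (int i = 0; i < n; i += 4) {
            __m128 vb = _mm_loadu_ps(&b[i]);              /* load 4 floats       */
            __m128 vc = _mm_loadu_ps(&c[i]);
            _mm_storeu_ps(&a[i], _mm_add_ps(vb, vc));     /* add and store 4 results */
        }
    }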

Systolic computing is a pipelining-based parallel computing model involving a synchronously operating processor array (a so-called systolic processor array), where processors have local memory and are connected by a fixed, channel-based interconnection network. Systolic algorithms have been developed mainly in the 1980s, mostly for specific network topologies such as meshes, trees, pyramids, or hypercubes, see e.g. Kung and Leiserson. The main motivation of systolic computation is that the movement of data between processors is typically on a nearest-neighbor basis (so-called shift communications), which has shorter latency and higher bandwidth than arbitrary point-to-point communication on some distributed memory architectures, and that no buffering of intermediate results is necessary, as all processing elements operate synchronously.

A cellular automaton (CA) consists of a collection of finite state automata stepping synchronously, each automaton having as input the current state of one or several other automata. This neighbour relationship is often reflected by physical proximity, such as arranging the CA as a mesh. The CA model is a model for computations with structured communication. Also, systolic computations can be viewed as a kind of CA computation. The restriction of nearest-neighbor access is relaxed in the global cellular automaton (GCA), where the state of each automaton includes an index of the automaton to read from in this step. Thus, the neighbor relationship can be time-dependent and data-dependent. The GCA is closely related to the CROW PRAM, and it can be efficiently implemented in reconfigurable hardware, i.e. field programmable gate arrays (FPGAs).

In a very long instruction word (VLIW) processor, an instruction word contains several assembler instructions. Thus, there is the possibility that the compiler can exploit instruction-level parallelism (ILP) better than a superscalar processor, by having knowledge of the program to be compiled. Also, the hardware to control execution in a VLIW processor is simpler than in a superscalar processor, thus possibly leading to a more efficient execution. The concept of explicitly parallel instruction computing (EPIC) combines VLIW with other architectural concepts, such as predication to avoid conditional branches, known from SIMD computing.

In stream processing, a continuous stream of data is processed, each element undergoing the same operations. In order to save memory bandwidth, several operations are interleaved in a pipelined fashion. As such, stream processing inherits concepts from vector and systolic computing.

Practical Relevance. Vector computing was the paradigm used by the early vector supercomputers in the 1970s and 1980s and is still an essential part of modern high-performance computer architectures. It is a special case of the SIMD computing paradigm, which involves SIMD instructions as basic operations in addition to scalar computation. Most modern high-end processors have vector units extending their instruction set by SIMD/vector operations. Even many other processors nowadays offer SIMD instructions that can apply the same operation to multiple subwords simultaneously if the subword-sized elements of each operand and result are stored contiguously in the same word-sized register. Systolic computing has been popular in the 1980s in the form of multi-unit coprocessors for high-performance computations. CA and GCA have found their niche for implementations in hardware. Also, with the relation to CROW PRAMs, the GCA could become a bridging model between high-level parallel models and efficient configurable hardware implementation. VLIW processors became popular in the 1980s, were pushed aside by the superscalar processors during the 1990s, but have seen a rebirth with Intel's Itanium processor. VLIW is today also a popular concept in high-performance processors for the digital signal processing (DSP) domain. Stream processing has current popularity because of its suitability for real-time processing of digital media.

Implementations. APL is an early SIMD programming language. Other SIMD languages include Vector C and C*. Fortran 90 supports vector computing and even a simple form of data parallel computing. With the HPF extensions, it became a full-fledged data parallel language. Other data parallel languages include ZPL, NESL, Dataparallel C, and Modula-2*.

CA and GCA applications are mostly programmed in hardware description languages. Besides proposals for own languages, the mapping of existing languages like APL onto those automata has been discussed. As the GCA is an active new area of research, there are no complete programming systems yet.

Clustered VLIW DSP processors, such as the TI C6x family, allow parallel execution of instructions, yet apply additional restrictions on the operands, which is a challenge for optimizing compilers.

Early programming support for stream processing was available in the Brook language and the Sh library (Univ. of Waterloo). Based on the latter, RapidMind provides a commercial development kit with a stream extension for C++.

Task-Parallel Models and Task Graphs

Many applications can be considered as a set of tasks, each task solving part of the problem at hand. Tasks may communicate with each other during their existence, or may only accept inputs as a prerequisite to their start and send results to other tasks only when they terminate. Tasks may spawn other tasks in a fork-join style, and this may be done even in a dynamic and data-dependent manner. Such collections of tasks may be represented by a task graph, where nodes represent tasks and arcs represent communication, i.e. data dependencies. The scheduling of a task graph involves ordering of tasks and mapping of tasks to processing units. Goals of the scheduling can be minimization of the application's runtime or maximization of the application's efficiency, i.e. of the utilization of the machine resources. Task graphs can occur at several levels of granularity.
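To make the scheduling step concrete, the following simplified C sketch (our own illustration, not an algorithm from the article) performs greedy list scheduling of a small task graph given in topological order: each task is placed on the processing unit that allows the earliest start, respecting its data dependencies; communication costs and more sophisticated heuristics are ignored here:

    #include <stdio.h>

    #define T 6   /* number of tasks      */
    #define P 2   /* number of processors */

    /* tasks are numbered topologically; pred[i][j] != 0 means task j precedes task i */
    static const int    pred[T][T] = { {0}, {1,0}, {1,0}, {0,1,1,0}, {0,0,1,0,0}, {0,0,0,1,1,0} };
    static const double work[T]    = { 2, 3, 1, 4, 2, 1 };

    int main(void)
    {
        double ready[P] = {0};     /* time at which each processor becomes free */
        double finish[T];          /* computed finish time of each task         */

        for (int i = 0; i < T; i++) {
            double est = 0;                        /* earliest start: all predecessors done */
            for (int j = 0; j < i; j++)
                if (pred[i][j] && finish[j] > est) est = finish[j];

            int best = 0;                          /* pick the processor that can start first */
            for (int q = 1; q < P; q++)
                if (ready[q] < ready[best]) best = q;

            double start = est > ready[best] ? est : ready[best];
            finish[i] = start + work[i];
            ready[best] = finish[i];
            printf("task %d -> processor %d, start %.1f, finish %.1f\n", i, best, start, finish[i]);
        }
        return 0;
    }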

While a superscalar processor must detect data and control flow dependencies from a linear stream of instructions, dataflow computing provides the execution machine with the application's dataflow graph, where nodes represent single instructions or basic instruction blocks, so that the underlying machine can schedule and dispatch instructions as soon as all necessary operands are available, thus enabling better exploitation of parallelism. Dataflow computing is also used in hardware/software co-design, where computationally intensive parts of an application are mapped onto reconfigurable custom hardware, while other parts are executed in software on a standard microprocessor. The mapping is done such that the workloads of both parts are about equal and that the complexity of the communication overhead between the two parts is minimized.

Task graphs also occur in grid computing, where each node may already represent an executable with a runtime of hours or days. The execution units are geographically distributed collections of computers. In order to run a grid application on the grid resources, the task graph is scheduled onto the execution units. This may occur prior to or during execution (static vs. dynamic scheduling). Because of the wide distances between nodes, with correspondingly restricted communication bandwidths, scheduling typically involves clustering, i.e. mapping tasks to nodes such that the communication bandwidth needed between nodes is minimized. As a grid node more and more often is a parallel machine itself, tasks can also carry parallelism, so-called malleable tasks. Scheduling a malleable task graph involves the additional difficulty of determining the amount of execution units allocated to parallel tasks.

Practical Relevance. While dataflow computing in itself has not become a mainstream in programming, it has seriously influenced parallel computing, and its techniques have found their way into many products. Hardware/software co-design has gained some interest by the integration of reconfigurable hardware with microprocessors on single chips. Grid computing has gained considerable attraction in the last years, mainly driven by the enormous computing power needed to solve grand challenge problems in natural and life sciences.

Implementations. A prominent example for parallel dataflow computation was the MIT Alewife machine with the Id functional programming language.

Hardware/software co-design is frequently applied in digital signal processing, where there exist a number of multiprocessor systems-on-chip (DSP-MPSoC). The Mitrion-C programming language from Mitrionics serves to program the SGI RASC appliance that contains FPGAs and is mostly used in the field of bioinformatics.

There are several grid middlewares, most prominently Globus and Unicore, for which a multitude of schedulers exists.

General Parallel Programming Methodologies

In this section, we briefly review the features, advantages, and drawbacks of several widely used approaches to the design of parallel software. Many of these actually start from an existing sequential program for the same problem, which is more restricted but of very high significance for the software industry, which has to port a host of legacy code to parallel platforms in these days, while other approaches encourage a radically different parallel program design from scratch.

Foster's PCAM Method

Foster suggests that the design of a parallel program should start from an existing (possibly sequential) algorithmic solution to a computational problem by partitioning it into many small tasks and identifying dependences between these that may result in communication and synchronization, for which suitable structures should be selected. These first two design phases, partitioning and communication, are for a model that puts no restriction on the number of processors. The task granularity should be as fine as possible in order to not constrain the later design phases artificially. The result is a more or less scalable parallel algorithm in an abstract programming model that is largely independent from a particular parallel target computer. Next, the tasks are agglomerated to macrotasks (processes) to reduce internal communication and synchronization relations within a macrotask to local memory accesses. Finally, the macrotasks are scheduled to physical processors to balance load and further reduce communication. These latter two steps, agglomeration and mapping, are more machine-dependent because they use information about the number of processors available, the network topology, the cost of communication etc. to optimize performance.

Incremental Parallelization

For many scientific programs, almost all of their execution time is spent in a fairly small part of the code. Directive-based parallel programming languages such as HPF and OpenMP, which are designed as a semantically consistent extension to a sequential base language such as Fortran and C, allow to start from sequential source code that can be parallelized incrementally. Usually, the most computationally intensive inner loops are identified (e.g. by profiling) and parallelized first by inserting some directives, e.g. for loop parallelization. If performance is not yet sufficient, more directives need to be inserted, and even rewriting of some of the original code may be necessary to achieve reasonable performance.

Automatic Parallelization

Automatic parallelization of sequential legacy code is of high importance to industry but notoriously difficult. It occurs in two forms: static parallelization by a smart compiler, and run-time parallelization with support by the language's runtime system or the hardware.

Static parallelization of sequential code is, in principle, an undecidable problem, because the dynamic behavior of the program (and thereby the exact data dependences between statement executions) is usually not known at compile time, but solutions for restricted programs and specific domains exist. Parallelizing compilers usually focus on loop parallelization, because loops account for most of the computation time in programs and their dynamic control structure is relatively easy to analyze. Methods for data dependence testing of sequential loops and loop nests often require index expressions that are linear in the loop variables. In cases of doubt, the compiler will conservatively assume dependence, i.e. non-parallelizability, of all loop iterations. Domain-specific methods may look for special code patterns that represent typical computation kernels in a certain domain, such as reductions, prefix computations, data-parallel operations etc., and replace these by equivalent parallel code. These methods are in general less limited by the given control structure of the sequential source program than loop parallelization, but still rely on careful program analysis.
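A tiny C example of what such dependence analysis distinguishes (our own illustration): the first loop has no cross-iteration dependence and can be parallelized, while the second carries a flow dependence from iteration i-1 to i and cannot be run in parallel as written:

    /* independent iterations: a parallelizing compiler may turn this into a parallel loop */
    for (int i = 1; i < n; i++)
        a[i] = b[i] + c[i];

    /* loop-carried flow dependence on a[i-1]: conservatively kept sequential */
    for (int i = 1; i < n; i++)
        a[i] = a[i-1] + b[i];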

Run-time parallelization techniques defer the analysis of data dependences and the determination of the parallel schedule of the computation to run time, when more information about the program (e.g. values of input-dependent variables) is known. The necessary computations, prepared by the compiler, will cause run-time overhead, which is often prohibitively high if it cannot be amortized over several iterations of an outer loop whose dependence structure does not change between its iterations. Methods for run-time parallelization of irregular loops include doacross parallelization, the inspector-executor method for shared and for distributed memory systems, the privatizing DOALL test, and the LRPD test. The latter is a software implementation of speculative loop parallelization.

In speculative parallelization of loops, several iterations are started in parallel, and the memory accesses are monitored such that potential misspeculation can be discovered. If the speculation was wrong, the iterations that were erroneously executed in parallel must be rolled back and re-executed sequentially. Otherwise, their results will be committed to memory.

Thread-level parallel architectures can implement general thread-level speculation, e.g. as an extension to the cache coherence protocol. Potentially parallel tasks can be identified by the compiler or on the fly during program execution. Promising candidates for speculation on data or control independence are the parallel execution of loop iterations, or the parallel execution of a function call together with its continuation (the code following the call). Simulations have shown that thread-level speculation works best for a small number of processors.

Skeleton-Based and Library-Based Parallel Programming

Structured parallel programming, also known as skeleton programming, restricts the many ways of expressing parallelism to compositions of only a few predefined patterns, so-called skeletons. Skeletons are generic, portable, and reusable basic program building blocks for which parallel implementations may be available. They are typically derived from higher-order functions as known from functional programming languages. A skeleton-based parallel programming system, like P3L, SCL, eSkel, MuesLi, or QUAFF, usually provides a relatively small, fixed set of skeletons. Each skeleton represents a unique way of exploiting parallelism in a specifically organized type of computation, such as data parallelism, task farming, parallel divide-and-conquer, or pipelining. By composing these, the programmer can build a structured, high-level specification of parallel programs. The system can exploit this knowledge about the structure of the parallel computation for automatic program transformation, resource scheduling, and mapping. Performance prediction is also enhanced by composing the known performance prediction functions of the skeletons accordingly. The appropriate set of skeletons, their degree of composability, generality, and architecture independence, and the best ways to support them in programming languages have been intensively researched in the 1990s and are still issues of current research.
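A minimal sketch of the idea in C (our own example, not taken from any of the systems named above): a data-parallel "map" skeleton as a generic, reusable building block, here instantiated with OpenMP as one possible parallel implementation hidden behind the generic interface:

    /* generic map skeleton: apply f element-wise; the parallel implementation is hidden here */
    void map(double *out, const double *in, long n, double (*f)(double))
    {
        #pragma omp parallel for
        for (long i = 0; i < n; i++)
            out[i] = f(in[i]);
    }

    static double square(double x) { return x * x; }

    void use(double *y, const double *x, long n)
    {
        map(y, x, n, square);   /* the user composes skeletons, not threads and messages */
    }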

Composition of skeletons may be either non-hierarchical, by sequencing using temporary variables to transport intermediate results, or hierarchical, by conceptually nesting skeleton functions, that is, by building a new, hierarchically composed function by (virtually) inserting the code of one skeleton as a parameter into that of another one. This enables the elegant compositional specification of multiple levels of parallelism. In a declarative programming environment, such as in functional languages or separate skeleton languages, hierarchical composition gives the code generator more freedom of choice for automatic transformations and for efficient resource utilization, such as the decision of how many parallel processors to spend at which level of the compositional hierarchy. Ideally, the cost estimations of the composed function could be composed correspondingly from the cost estimation functions of the basic skeletons. While non-nestable skeletons can be implemented by generic library routines, nestable skeletons require, in principle, a static preprocessing that unfolds the skeleton hierarchy, e.g. by using C++ templates or C preprocessor macros.

The exploitation of nested parallelism specified by such a hierarchical composition is quite straightforward if a fork-join mechanism for recursive spawning of parallel activities is applicable. In that case, each thread executing the outer skeleton spawns a set of new threads that execute the inner skeleton in parallel as well. This may result in very fine-grained parallel execution and shifts the burden of load balancing and scheduling to the runtime system, which may incur tremendous space and time overhead. In an SPMD environment like MPI, UPC, or Fork, nested parallelism can be exploited by suitable group splitting.

Parallel programming with skeletons may be seen in contrast to parallel programming using parallel library routines. Domain-specific parallel subroutine libraries, e.g. for numerical computations on large vectors and matrices, are available for almost any parallel computer platform. Both for skeletons and for library routines, reuse is an important purpose. Nevertheless, the usage of library routines is more restrictive, because they exploit parallelism only at the bottom level of the program's hierarchical structure, that is, they are not compositional, and their computational structure is not transparent for the programmer.

Conclusion

At the end of this review of parallel programming models, we may observe some current trends and speculate a bit about the future of parallel programming models.

As far as we can foresee today, the future of computing is parallel computing, dictated by physical and technical necessity. Parallel computer architectures will be more and more hybrid, combining hardware multithreading, many cores, SIMD units, accelerators, and on-chip communication systems, which requires the programmer and the compiler to solicit parallelism, orchestrate computations, and manage data locality at several levels in order to achieve reasonable performance. A perhaps extreme example for this is the Cell BE processor.

Because of their relative simplicity, purely sequential languages will remain for certain applications that are not performance-critical, such as word processors. Some will have standardized parallel extensions and be slightly revised to provide a well-defined memory model for use in parallel systems, or disappear in favor of new, truly parallel languages. As parallel programming leaves the HPC market niche and goes mainstream, simplicity will be pivotal, especially for novice programmers. We foresee a multi-layer model with a simple, deterministic high-level model that focuses on parallelism and portability, while it includes transparent access to an underlying lower-level model layer with more performance tuning possibilities for experts. New software engineering techniques such as aspect-oriented and view-based programming and model-driven development may help in managing complexity.

Given that programmers were mostly trained in a sequential way of algorithmic thinking for the last decades, migration paths from sequential programming to parallel programming need to be opened. To prepare coming generations of students better, undergraduate teaching should encourage a massively parallel access to computing, e.g. by taking up parallel time, work, and cost metrics in the design and analysis of algorithms early in the curriculum.

From an industry perspective, tools that allow to more or less automatically port sequential legacy software are of very high significance. Deterministic and time-predictable parallel models are useful e.g. in the real-time domain. Compiler and tool technology must keep pace with the introduction of new parallel language features. Even the most advanced parallel programming language is doomed to failure if its compilers are premature at its market introduction and produce poor code, as we could observe in the 1990s for HPF in the high-performance computing domain, where HPC programmers instead switched to the lower-level MPI as their main programming model.

Acknowledgments. Christoph Kessler acknowledges funding by Ceniit at Linköpings universitet, by Vetenskapsrådet (VR), by SSF RISE, by Vinnova (SafeModSim), and by the CUGS graduate school.

References

Ferri Abolhassan, Reinhard Drefenstedt, Jörg Keller, Wolfgang J. Paul, and Dieter Scheerer. On the physical design of PRAMs. The Computer Journal, December.
Ali-Reza Adl-Tabatabai, Christos Kozyrakis, and Bratin Saha. Unlocking concurrency: multicore programming with transactional memory. ACM Queue, Dec./Jan.
Sarita V. Adve and Kourosh Gharachorloo. Shared Memory Consistency Models: a Tutorial. IEEE Computer.
Anant Agarwal, Ricardo Bianchini, David Chaiken, Kirk L. Johnson, David Kranz, John Kubiatowicz, Beng-Hong Lim, David Mackenzie, and Donald Yeung. The MIT Alewife machine: Architecture and performance. In Proc. Int. Symp. Computer Architecture.
A. Aggarwal, A. K. Chandra, and M. Snir. Communication complexity of PRAMs. Theoretical Computer Science.
Albert Alexandrov, Mihai F. Ionescu, Klaus E. Schauser, and Chris Scheiman. LogGP: Incorporating long messages into the LogP model for parallel computation. Journal of Parallel and Distributed Computing.
Eric Allen, David Chase, Joe Hallett, Victor Luchangco, Jan-Willem Maessen, Sukyoung Ryu, Guy L. Steele Jr., and Sam Tobin-Hochstadt. The Fortress language specification, March. http://research.sun.com/projects/plrg/Publications/
Bruno Bacci, Marco Danelutto, Salvatore Orlando, Susanna Pelagatti, and Marco Vanneschi. P3L: A structured high level programming language and its structured support. Concurrency: Practice and Experience.
Bruno Bacci, Marco Danelutto, and Susanna Pelagatti. Resource Optimisation via Structured Parallel Programming. April.
Henri E. Bal, Jennifer G. Steiner, and Andrew S. Tanenbaum. Programming Languages for Distributed Computing Systems. ACM Computing Surveys, September.
R. Bisseling. Parallel Scientific Computation: A Structured Approach using BSP and MPI. Oxford University Press.
Guy Blelloch. Programming Parallel Algorithms. Comm. ACM, March.
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: an efficient multithreaded runtime system. In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming.
Hans-J. Boehm. Threads cannot be implemented as a library. In Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation.
Olaf Bonorden, Ben Juurlink, Ingo von Otte, and Ingo Rieping. The Paderborn University BSP (PUB) Library. Parallel Computing.
Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, and Pat Hanrahan. Brook for GPUs: stream computing on graphics hardware. In SIGGRAPH: ACM SIGGRAPH Papers, New York, NY, USA. ACM Press.

William W. Carlson, Jesse M. Draper, David E. Culler, Kathy Yelick, Eugene Brooks, and Karen Warren. Introduction to UPC and language specification. Technical Report CCS-TR (second printing), IDA Center for Computing Sciences, May.
Brian D. Carlstrom, Austen McDonald, Hassan Chafi, JaeWoong Chung, Chi Cao Minh, Christoforos E. Kozyrakis, and Kunle Olukotun. The Atomos transactional programming language. In Proc. Conf. Prog. Lang. Design and Impl. (PLDI). ACM, June.
Nicholas Carriero and David Gelernter. Linda in context. Commun. ACM.
Bradford L. Chamberlain, David Callahan, and Hans P. Zima. Parallel programmability and the Chapel language. Submitted.
Murray Cole. Bringing skeletons out of the closet: A pragmatic manifesto for skeletal parallel programming. Parallel Computing.
Murray I. Cole. Algorithmic Skeletons: Structured Management of Parallel Computation. Pitman and MIT Press.
Richard Cole and Ofer Zajicek. The APRAM: Incorporating Asynchrony into the PRAM model. In Proc. Annual ACM Symp. Parallel Algorithms and Architectures.
David E. Culler, Richard M. Karp, David A. Patterson, Abhijit Sahay, Klaus E. Schauser, Eunice Santos, Ramesh Subramonian, and Thorsten von Eicken. LogP: Towards a realistic model of parallel computation. In Proc. Principles and Practice of Parallel Programming.
J. Darlington, A. J. Field, P. G. Harrison, P. H. B. Kelly, D. W. N. Sharp, and Q. Wu. Parallel Programming Using Skeleton Functions. In Proc. Conf. Parallel Architectures and Languages Europe. Springer LNCS.
J. Darlington, Y. Guo, H. W. To, and J. Yang. Parallel skeletons for structured composition. In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming. ACM Press, July. SIGPLAN Notices.
K. M. Decker and R. M. Rehmann, editors. Programming Environments for Massively Parallel Distributed Systems. Birkhäuser, Basel, Switzerland. Proc. IFIP WG Working Conf. at Monte Verita, Ascona, Switzerland, April.
F. Dehne, A. Fabri, and A. Rau-Chaplin. Scalable parallel geometric algorithms for coarse grained multicomputers. In Proc. ACM Symp. on Computational Geometry.
Beniamino di Martino and Christoph W. Keßler. Two program comprehension tools for automatic parallelization. IEEE Concurrency, Spring.
Dietmar W. Erwin and David F. Snelling. UNICORE: A grid computing environment. In Proc. Intl. Conference on Parallel Processing (Euro-Par), London, UK. Springer-Verlag.
Vijay Saraswat et al. Report on the experimental language X10, draft. White paper, http://www.research.ibm.com/comm/research_projects.nsf/pages/x10.index.html, February.
Joel Falcou and Jocelyn Serot. Formal semantics applied to the implementation of a skeleton-based parallel programming library. In Proc. ParCo. IOS Press.
Martti Forsell. A scalable high-performance computing solution for networks on chips. IEEE Micro, September.
S. Fortune and J. Wyllie. Parallelism in random access machines. In Proc. Annual ACM Symp. Theory of Computing.

Ian Foster. Designing and Building Parallel Programs. Addison-Wesley.
Ian Foster. Globus Toolkit version 4: Software for service-oriented systems. In Proc. IFIP Intl. Conf. Network and Parallel Computing, LNCS. Springer.
Ian Foster, Carl Kesselman, and Steven Tuecke. The anatomy of the grid: Enabling scalable virtual organizations. Intl. J. High Performance Computing Applications.
Phillip B. Gibbons. A More Practical PRAM Model. In Proc. Annual ACM Symp. Parallel Algorithms and Architectures.
W. K. Giloi. Parallel Programming Models and Their Interdependence with Parallel Architectures. In Proc. Int. Conf. Massively Parallel Programming Models. IEEE Computer Society Press.
Sergei Gorlatch and Susanna Pelagatti. A transformational framework for skeletal programs: Overview and case study. In Jose Rolim et al., editor, IPPS/SPDP Workshops Proceedings, IEEE Int. Parallel Processing Symp. and Symp. Parallel and Distributed Processing. Springer LNCS.
Allan Gottlieb. An overview of the NYU Ultracomputer project. In J. J. Dongarra, editor, Experimental Parallel Computing Architectures. Elsevier Science Publishers.
Susanne E. Hambrusch. Models for Parallel Computation. In Proc. Int. Conf. Parallel Processing, Workshop on Challenges for Parallel Processing.
Philip J. Hatcher and Michael J. Quinn. Data-Parallel Programming on MIMD Computers. MIT Press.
Maurice Herlihy and J. Eliot B. Moss. Transactional memory: Architectural support for lock-free data structures. In Proc. Int. Symp. Computer Architecture.
Heywood and Leopold. Dynamic randomized simulation of hierarchical PRAMs on meshes. In Proc. Aizu International Symposium on Parallel Algorithms/Architecture Synthesis. IEEE Computer Society Press.
Jonathan M. D. Hill, Bill McColl, Dan C. Stefanescu, Mark W. Goudreau, Kevin Lang, Satish B. Rao, Torsten Suel, Thanasis Tsantilas, and Rob Bisseling. BSPlib: the BSP Programming Library. Parallel Computing.
Rolf Hoffmann, Klaus-Peter Völkmann, Stefan Waldschmidt, and Wolfgang Heenes. GCA: Global Cellular Automata, a Flexible Parallel Model. In PaCT: Proc. International Conference on Parallel Computing Technologies, London, UK. Springer-Verlag.
Kenneth E. Iverson. A Programming Language. Wiley, New York.
Joseph JaJa. An Introduction to Parallel Algorithms. Addison-Wesley.
Johannes Jendrsczok, Rolf Hoffmann, Patrick Ediger, and Jörg Keller. Implementing APL-like data parallel functions on a GCA machine. In Proc. Workshop Parallel Algorithms and Computing Systems (PARS).
Jörg Keller, Christoph Kessler, and Jesper Träff. Practical PRAM Programming. Wiley, New York.
Ken Kennedy, Charles Koelbel, and Hans Zima. The rise and fall of High Performance Fortran: an historical object lesson. In Proc. Int. Symposium on the History of Programming Languages (HOPL III), June.
Christoph Kessler. Managing distributed shared arrays in a bulk-synchronous parallel environment. Concurrency: Practice and Experience.
Christoph Kessler. Teaching parallel programming early. In Proc. Workshop on Developing Computer Science Education: How Can It Be Done?, Linköpings universitet, Sweden, March.
Christoph Kessler and Andrzej Bednarski. Optimal integrated code generation for VLIW architectures. Concurrency and Computation: Practice and Experience.
Christoph W. Keßler. NestStep: Nested Parallelism and Virtual Shared Memory for the BSP model. The Journal of Supercomputing.

Christoph W. Kessler. A practical access to the theory of parallel algorithms. In Proc. ACM SIGCSE Symposium on Computer Science Education, March.
Christoph W. Keßler and Helmut Seidl. The Fork Parallel Programming Language: Design, Implementation, Application. Int. J. Parallel Programming, February.
Herbert Kuchen. A skeleton library. In Proc. Euro-Par.
H. T. Kung and C. E. Leiserson. Algorithms for VLSI processor arrays. In C. Mead and L. Conway, editors, Introduction to VLSI Systems. Addison-Wesley.
Christian Lengauer. A personal, historical perspective of parallel programming for high performance. In Günter Hommel, editor, Communication-Based Systems (CBS). Kluwer.
Claudia Leopold. Parallel and Distributed Computing: A survey of models, paradigms and approaches. Wiley, New York.
K.-C. Li and H. Schwetman. Vector C: A Vector Processing Language. J. Parallel and Distrib. Comput.
L. Fava Lindon and S. G. Akl. An optimal implementation of broadcasting with selective reduction. IEEE Trans. Parallel Distrib. Syst.
B. M. Maggs, L. R. Matheson, and R. E. Tarjan. Models of Parallel Computation: a Survey and Synthesis. In Proc. Annual Hawaii Int. Conf. System Sciences, January.
W. F. McColl. General Purpose Parallel Computing. In A. M. Gibbons and P. Spirakis, editors, Lectures on Parallel Computation: Proc. ALCOM Spring School on Parallel Computation. Cambridge University Press.
Wolfgang J. Paul, Peter Bach, Michael Bosch, Jörg Fischer, Cédric Lichtenau, and Jochen Röhrig. Real PRAM programming. In Proc. Int. Euro-Par Conf., August.
Susanna Pelagatti. Structured Development of Parallel Programs. Taylor & Francis.
Michael Philippsen and Walter F. Tichy. Modula-2* and its Compilation. In Proc. Int. Conf. of the Austrian Center for Parallel Computation. Springer LNCS.
Thomas Rauber and Gudula Rünger. Tlib: a library to support programming with hierarchical multiprocessor tasks. J. Parallel and Distrib. Comput., March.

Lawrence Rauchwerger and David Padua. The Privatizing DOALL Test: A Run-Time Technique for DOALL Loop Identification and Array Privatization. In Proc. ACM Int. Conf. Supercomputing. ACM Press, July.
Lawrence Rauchwerger and David Padua. The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization. In Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation. ACM Press, June.
J. Rose and G. Steele. C*: an Extended C Language for Data Parallel Programming. Technical Report PL, Thinking Machines Inc., Cambridge, MA.
D. B. Skillicorn. Models for Practical Parallel Computation. Int. J. Parallel Programming.
D. B. Skillicorn. miniBSP: a BSP Language and Transformation System. Technical report, Dept. of Computing and Information Sciences, Queen's University, Kingston, Canada, Oct. http://www.qucis.queensu.ca/home/skill/
David B. Skillicorn and Domenico Talia, editors. Programming Languages for Parallel Processing. IEEE Computer Society Press.
David B. Skillicorn and Domenico Talia. Models and Languages for Parallel Computation. ACM Computing Surveys, June.
Lawrence Snyder. The design and development of ZPL. In Proc. ACM SIGPLAN Third Symposium on History of Programming Languages (HOPL-III). ACM Press, June.
Håkan Sundell and Philippas Tsigas. NOBLE: A non-blocking inter-process communication library. Technical Report, Dept. of Computer Science, Chalmers University of Technology and Göteborg University, Göteborg, Sweden.
Leslie G. Valiant. A Bridging Model for Parallel Computation. Comm. ACM, August.
Fredrik Warg. Techniques to reduce thread-level speculation overhead. PhD thesis, Dept. of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden.
Xingzhi Wen and Uzi Vishkin. PRAM-on-chip: first commitment to silicon. In SPAA: Proc. Annual ACM Symposium on Parallel Algorithms and Architectures, New York, NY, USA. ACM.
Wayne Wolf. Guest editor's introduction: The embedded systems landscape. Computer.