
Concepts in parallel programming
Warwick RSE, 17/12/2018
["The Angry Penguin", used under Creative Commons licence from Swantje Hess and Jannis Pohlmann.]

Parallel Computing
• Solving multiple problems at once
• Goes back before computers
• Rooms full of people working on problems
• Cryptanalysis
• Calculating tide tables
• We're interested in getting computers to do it. How?
[Image used under CC BY-SA 4.0, attributed to the UK Government under Crown copyright.]

Parallelism that we're not talking about
• Bit-level parallelism
• Processors work on chunks of data at once rather than bit by bit
• Instruction-level parallelism
• Processors can operate on more than one variable at a time
• NOT multicore
• A large part of optimising code is trying to improve instruction-level parallelism

Parallelism that we are talking about
• Task-level parallelism
• Split a "task" into separate pieces that the computer can work on separately

Embarrassing parallelism
• Tasks that are unrelated to each other can simply be handed off to different processors
• Exploring parameters
• "Task farming"

Embarrassing parallelism
• Just run multiple copies of your program(s)
• Don't run more copies than you have physical processors
• Simultaneous multithreading (Hyper-Threading) generally doesn't work well with research computing loads
• Scheduler systems can queue jobs to run whenever a processor is free

Tightly coupled parallelism
• Your problem is split into separate chunks, but each chunk needs information from other chunks
• Possibly a few other chunks, possibly all of them
• You have to make the data available so that every chunk that needs it has access to it

Problems in parallel computing
• To let multiple processors work on a problem, you have to do two things
• Make sure the data is somewhere it can be used by a processor (communication)
• Make sure data transfer is synchronised, so you know the data is ready when you need it
• Different models solve these problems in different ways

Shared Memory (SMP)
[Diagram: several CPUs all connected to a single shared memory.]
• Several processors all have direct access to the same memory
• Each processor has a chunk of work, but the memory it uses to hold the information is the shared memory
• Communication is automatic
• Synchronisation is still a problem

Shared Memory (SMP)
• Synchronisation is a surprisingly nasty problem
• Imagine the code i = i + 1: one CPU reads i = 0 and writes back i = 1
• Now imagine doing it on two processors, each running i = i + 1
• The final result should be 2, but if both CPUs read i = 0 before either writes back, the result is 1

Atomic operations
• "Atomic": Ancient Greek via Latin via French, meaning indivisible
• An atomic operation cannot be interrupted
• Ultimately, shared memory only works because atomic operations exist
• Usually they are hidden away inside a library
• The whole field of non-blocking algorithms uses them directly, though
• The previous example is fixed by an atomic read-modify-write

Shared Memory (SMP)
• The solution is to have each processor work independently only on things that are safe
• Where that is not possible, you enter a critical section, where things happen one after the other
• That's the simplest case; it can get a lot harder
• But for most problems there are only a few bits like this
• Most of the time only one processor is working on each bit of memory, so it's quite easy

OpenMP
• The most common approach in research codes
• Directives that tell the compiler how to parallelise loops in the code
• Automates some synchronisation
• Still has critical sections
• Can't do everything
• Newer releases are much more powerful

Threads
• Explicitly tell the code to run a given function in another thread
• The operating system tries to schedule threads on free processors so that all processors carry the same load
• Common in non-academic codes, less so in academia
• Synchronisation is done through a mutual-exclusion (mutex) system, a specific form of critical section
• There is an explicit function for a thread to acquire the mutex and another to release it
• Only one thread can hold the mutex at a time; the others wait until the first has released it

Distributed Memory
[Diagram: several nodes, each with its own CPUs and RAM, connected by a fabric.]
• Processors all have their own memory
• Data can only flow between processors through the fabric
• Typically you send and receive data explicitly
• Communication is manual
• Synchronisation is tied directly to communication: when a receive operation completes, the data is synchronised

Distributed Memory
• You have to work out the transfer of data between processors manually
• Send explicit messages containing data between processors
• This can be difficult in general, but there are strategies
• The fabric is generally quite slow compared with memory access
• Minimise the amount of data transferred and the number of transfers requested

MPI
• The Message Passing Interface (MPI) is the most popular distributed memory library
• Just a library: no compiler involvement
• Includes routines for
• Sending
• Receiving
• Collective operations (summing over processors, etc.)
• Parallel file I/O
• Others

Why use MPI?
• Performs very nearly as well as shared memory on a single computer
• Harder to program than OpenMP, for the cases where OpenMP is simple
• Comparable in difficulty to writing threaded code
• The library itself works on the largest supercomputers in the world
• Your algorithm might not scale that far, but it will still go as far as it can with MPI

MPI on shared memory hardware
• MPI works fine on shared memory hardware
• At the user level it treats the machine as if it were distributed memory
• There are some shared memory features, covered in the advanced materials
• For algorithms that work well on distributed memory, performance is comparable to OpenMP or pthreads
• Some algorithms map better to shared memory
• You can use hybrid MPI/OpenMP/pthreads code if you want the best of both worlds

Alternatives?
• OpenSHMEM
• Coarray Fortran
• Unified Parallel C
• Chapel
• X10
• None of them has an obvious advantage over MPI at the moment, and many are poorly or patchily supported
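The lost update on the shared-memory slides (two processors each running i = i + 1, yet ending with i == 1) happens because the increment is really three machine steps: read, add, write. The bad interleaving can be reproduced deterministically in plain C by writing the steps out by hand; this is a simulation of the race, not real concurrent code:

```c
#include <stdio.h>

/* Simulate two "CPUs" each executing i = i + 1, where both read the
 * shared value before either writes its result back. */
int main(void)
{
    int i = 0;

    int cpu0 = i;        /* CPU 0 reads i (sees 0) */
    int cpu1 = i;        /* CPU 1 reads i (also sees 0) */

    cpu0 = cpu0 + 1;     /* each CPU adds 1 to its private copy */
    cpu1 = cpu1 + 1;

    i = cpu0;            /* CPU 0 writes back 1 */
    i = cpu1;            /* CPU 1 writes back 1: CPU 0's update is lost */

    printf("%d\n", i);   /* prints 1, not the expected 2 */
    return 0;
}
```

With a different interleaving (CPU 0's write before CPU 1's read) the answer would be 2, which is exactly why the result of a race is unpredictable.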
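An atomic read-modify-write, as on the atomic-operations slide, makes the same increment safe even with real threads. A minimal sketch using C11 `<stdatomic.h>` and POSIX threads (the `worker` function and the iteration count are illustrative, not from the slides):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define N 100000
static atomic_int counter = 0;

/* Each thread increments the shared counter N times. */
static void *worker(void *arg)
{
    (void)arg;
    for (int k = 0; k < N; k++)
        atomic_fetch_add(&counter, 1); /* indivisible read-modify-write */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* Because every increment is atomic, no update is ever lost. */
    printf("%d\n", atomic_load(&counter)); /* always 200000 */
    return 0;
}
```

Replacing `atomic_fetch_add` with a plain `counter = counter + 1` would reintroduce the lost-update race from the previous example.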
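The OpenMP slide describes directives that tell the compiler how to parallelise loops. A minimal sketch of such a directive with a reduction: compiled with `-fopenmp`, the iterations are shared among threads and OpenMP synchronises the partial sums; without the flag the pragma is simply ignored and the loop runs serially with the same result.

```c
#include <stdio.h>

int main(void)
{
    long sum = 0;

    /* The directive asks the compiler to split the iterations among
     * threads; reduction(+:sum) gives each thread a private partial sum
     * and combines them safely at the end (automated synchronisation). */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 1000; i++)
        sum += i;

    printf("%ld\n", sum); /* 500500, serial or parallel */
    return 0;
}
```

This is the sense in which OpenMP "automates some synchronisation": the programmer never writes an explicit lock for `sum`.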
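The mutex mechanism from the Threads slide can be sketched with POSIX threads: one explicit call acquires the mutex, the critical section runs, and another call releases it, so only one thread is ever inside at a time (the `worker` name and counts are illustrative):

```c
#include <pthread.h>
#include <stdio.h>

#define N 100000
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int k = 0; k < N; k++) {
        pthread_mutex_lock(&lock);   /* acquire: others now wait */
        counter = counter + 1;       /* the critical section */
        pthread_mutex_unlock(&lock); /* release: one waiter may enter */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("%ld\n", counter); /* always 200000 with the mutex in place */
    return 0;
}
```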
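The explicit send and receive described on the MPI slides might look like the following sketch: rank 0 sends an array to rank 1, and because `MPI_Recv` blocks until the message has arrived, the data is synchronised the moment it returns. This assumes an MPI implementation is installed; build with `mpicc` and launch with something like `mpirun -np 2 ./a.out`.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double data[4] = {0};
    if (rank == 0) {
        for (int i = 0; i < 4; i++)
            data[i] = i * 1.5;
        /* Explicit, manual communication: send to rank 1, tag 0 */
        MPI_Send(data, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocking receive: when this returns, the data is synchronised */
        MPI_Recv(data, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %.1f %.1f %.1f %.1f\n",
               data[0], data[1], data[2], data[3]);
    }

    MPI_Finalize();
    return 0;
}
```

Note how this also illustrates the "minimise transfers" advice: the four values travel in one message rather than four.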