
Introduction to parallel programming

Alberto Bosio, Associate Professor – UM Microelectronics Department [email protected]

Definitions

- What is parallel programming? It is the simultaneous use of multiple compute resources to solve a computational problem.


Serial vs parallel

- Serial computing: traditionally, software has been written for serial computation:
  - to be run on a single CPU;
  - a problem is broken into a discrete series of instructions;
  - instructions are executed one after another;
  - only one instruction may execute at any moment (see the sketch below).
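As a hypothetical illustration (not from the original slides), here is a small C program that sums an array serially: a single instruction stream processes one element at a time.

/* Serial computation: one CPU, one instruction stream. */
#include <stdio.h>

int main(void) {
    int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int sum = 0;
    for (int i = 0; i < 8; i++) {  /* instructions execute one after another */
        sum += data[i];
    }
    printf("sum = %d\n", sum);     /* prints 36 */
    return 0;
}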

Serial vs parallel

- Parallel computing: the simultaneous use of multiple compute resources to solve a computational problem:
  - to be run using multiple CPUs;
  - a problem is broken into discrete parts that can be solved concurrently;
  - each part is broken down into a series of instructions;
  - instructions from each part execute simultaneously on different CPUs (see the OpenMP sketch below).
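A minimal parallel counterpart of the serial sum, assuming OpenMP is available (compile with gcc -fopenmp): the loop iterations are broken into parts executed simultaneously by several threads, and the reduction clause combines the per-thread partial sums.

/* Parallel computation sketch using OpenMP (assumed available). */
#include <stdio.h>
#include <omp.h>

int main(void) {
    int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int sum = 0;
    /* OpenMP splits the iterations among threads; reduction(+:sum)
       combines the per-thread partial sums at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 8; i++) {
        sum += data[i];
    }
    printf("sum = %d\n", sum);  /* same result, computed concurrently */
    return 0;
}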



Why parallel computing?

- Save time and/or money
- Solve larger problems
- Provide concurrency
- Use non-local resources
- Limits to serial computing, both physical and practical:
  - transmission speeds
  - limits to miniaturization
  - economic limitations


Shared Memory

- Shared memory: all processors access all memory as a global address space.
- CPUs operate independently but share memory resources; changes in a memory location made by one processor are visible to all the others.
- Two main classes, based on memory access times: UMA and NUMA.

Uniform Memory Access

- Uniform Memory Access (UMA):
  - identical processors;
  - equal access and access times to memory;
  - cache coherent: if one processor updates a location in shared memory, all the other processors know about the update.


Non-Uniform Memory Access

- Non-Uniform Memory Access (NUMA):
  - often built by physically linking two or more SMPs;
  - one SMP can directly access the memory of another SMP;
  - not all processors have equal access time to all memories;
  - memory access across the link is slower (see the libnuma sketch below).
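As a sketch of explicit NUMA placement, assuming a Linux system with libnuma installed (compile with -lnuma): memory is allocated on a specific node, so accesses from CPUs on that node stay local, while accesses from other nodes must cross the slower link.

/* NUMA placement sketch using Linux's libnuma (an assumption: not part
   of the original slides; requires a NUMA-capable system). */
#include <stdio.h>
#include <numa.h>

int main(void) {
    if (numa_available() < 0) {
        printf("NUMA is not supported on this system\n");
        return 1;
    }
    /* Allocate 1 MiB directly on node 0: accesses from CPUs on node 0
       are local and fast; accesses from other nodes cross the link. */
    size_t size = 1 << 20;
    void *buf = numa_alloc_onnode(size, 0);
    if (buf == NULL) return 1;
    printf("allocated on node 0 of %d\n", numa_max_node() + 1);
    numa_free(buf, size);
    return 0;
}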

Shared Memory

- Advantages:
  - the global address space is easy to program;
  - data sharing is fast and uniform due to the proximity of memory to CPUs.
- Disadvantages:
  - lack of scalability between memory and CPUs: adding more CPUs increases traffic on the shared memory path;
  - the programmer is responsible for synchronization (see the sketch below).
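The synchronization burden can be shown with a short OpenMP sketch (again an assumption, not from the slides): several threads increment a shared counter, and without the atomic directive the concurrent updates would race and increments would be lost.

/* Shared-memory synchronization sketch (OpenMP, assumed available). */
#include <stdio.h>
#include <omp.h>

int main(void) {
    int counter = 0;
    #pragma omp parallel for
    for (int i = 0; i < 100000; i++) {
        #pragma omp atomic   /* the programmer must add this by hand */
        counter++;
    }
    printf("counter = %d\n", counter);  /* 100000 only with the atomic */
    return 0;
}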


Distributed Memory

- Requires a communication network to connect inter-processor memory: processors have their own local memory.
- There is no concept of a global address space.
- CPUs operate independently: changes to a processor's local memory have no effect on the memory of the other processors.
- Cache coherency does not apply.
- Data communication and synchronization are the programmer's responsibility (see the MPI sketch below).
- The network used for data transfer varies (e.g., Ethernet).
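A minimal distributed-memory sketch, assuming an MPI implementation is installed (compile with mpicc and run with, e.g., mpirun -np 2 ./a.out): process 0 sends a value that process 1 cannot otherwise see, since there is no global address space.

/* Distributed-memory sketch using MPI (assumed available): each process
   has its own memory, so data moves only via explicit messages. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value;
    if (rank == 0) {
        value = 42;  /* exists only in process 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* process 1 cannot read process 0's memory; it must receive */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}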

