Introduction to parallel programming
Alberto Bosio, Associate Professor – UM Microelectronics Department, [email protected]
Definitions
- What is parallel programming?
- Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.
Serial vs parallel
- Serial computing: traditionally, software has been written for serial computation:
  - Run on a single CPU
  - A problem is broken into a discrete series of instructions
  - Instructions are executed one after another
  - Only one instruction may execute at any moment
Serial vs parallel
- Parallel computing: simultaneous use of multiple compute resources to solve a computational problem:
  - Run using multiple CPUs
  - A problem is broken into discrete parts that are solved concurrently
  - Each part is further broken down into a series of instructions
  - Instructions from each part execute simultaneously on different CPUs (see the code sketch below)
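The slides contain no code, so here is a minimal sketch of the contrast in C with OpenMP (my choice of API; the slides do not name one). The first loop is a single instruction stream; the pragma splits the second loop into parts that execute simultaneously on different CPUs.

    /* Serial vs parallel sum; compile with: gcc -fopenmp sketch.c */
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        double sum = 0.0;

        /* Serial: one CPU, instructions executed one after another. */
        for (int i = 0; i < N; i++)
            a[i] = i * 0.5;

        /* Parallel: the loop is broken into discrete parts; each part runs
           simultaneously on a different CPU, and reduction(+:sum) combines
           the per-thread partial sums. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %f\n", sum);
        return 0;
    }

Compiled without -fopenmp the pragma is simply ignored and the same loop runs serially, which makes the two execution models easy to compare on one machine.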
Why parallel computing?
- Save time and/or money
- Solve larger problems
- Provide concurrency
- Use non-local resources
- Limits to serial computing, both physical and practical:
  - Transmission speeds
  - Miniaturization
  - Economic limitations
Shared Memory
- Shared memory: all processors access all memory as a global address space.
- CPUs operate independently but share memory resources: a change made to a memory location by one processor is visible to all other processors (see the sketch below).
- Two main classes, based on memory access times: UMA and NUMA.
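As a concrete illustration of the global address space, a minimal sketch using POSIX threads (pthreads is an assumption; the slides name no threading API): two threads write to the same global array through the same addresses, and the main thread reads every update after joining them.

    /* Shared-memory sketch; compile with: gcc shared.c -lpthread */
    #include <stdio.h>
    #include <pthread.h>

    #define N 8
    int shared_data[N];   /* global address space: every thread sees this array */

    /* Each thread fills its own half of the shared array. */
    void *writer(void *arg) {
        int id = *(int *)arg;
        for (int i = id * N / 2; i < (id + 1) * N / 2; i++)
            shared_data[i] = i * i;
        return NULL;
    }

    int main(void) {
        pthread_t t[2];
        int ids[2] = {0, 1};
        for (int i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, writer, &ids[i]);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);   /* after the joins, all writes are visible here */
        for (int i = 0; i < N; i++)
            printf("%d ", shared_data[i]);
        printf("\n");
        return 0;
    }

No data is copied anywhere: both threads and main dereference the very same memory, which is exactly what "global address space" means here.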
Uniform Memory Access
- Uniform Memory Access (UMA): identical processors
- Equal access and access times to memory
- Cache coherent: if one processor updates a location in shared memory, all the other processors know about the update (see the sketch below)
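A minimal sketch of what coherence means for the programmer, using C11 atomics on top of pthreads (both assumptions; the slides describe only the hardware): the consumer spins until the producer's update propagates to its own view of memory.

    /* Coherence/visibility sketch; compile with: gcc -std=c11 coherent.c -lpthread */
    #include <stdio.h>
    #include <pthread.h>
    #include <stdatomic.h>

    int payload = 0;          /* ordinary shared data                    */
    atomic_int ready = 0;     /* flag published through coherent memory  */

    void *producer(void *arg) {
        (void)arg;
        payload = 42;                                            /* write the data...  */
        atomic_store_explicit(&ready, 1, memory_order_release);  /* ...then publish it */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, producer, NULL);
        /* Spin until the producer's store becomes visible: the coherence
           protocol propagates the new value of `ready` to this CPU's cache. */
        while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
            ;
        printf("payload = %d\n", payload);   /* guaranteed to print 42 */
        pthread_join(t, NULL);
        return 0;
    }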
Non-Uniform Memory Access
- Non-Uniform Memory Access (NUMA):
  - Often built by physically linking two or more SMPs
  - One SMP can directly access the memory of another SMP
  - Not all processors have equal access time to all memories
  - Memory access across the link is slower (see the allocation sketch below)
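To make the local/remote asymmetry concrete, a hedged sketch of NUMA-aware allocation on Linux with libnuma (the library and the node number are my assumptions; the slides only describe the architecture):

    /* NUMA placement sketch; compile with: gcc numa_sketch.c -lnuma */
    #include <stdio.h>
    #include <stdlib.h>
    #include <numa.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "this system is not NUMA-aware\n");
            return EXIT_FAILURE;
        }
        size_t bytes = 1 << 20;   /* 1 MiB buffer */
        /* Place the buffer in the memory of node 0 (a hypothetical choice):
           CPUs on node 0 get fast local access, while CPUs on other nodes
           pay the slower cross-link access described above. */
        double *buf = numa_alloc_onnode(bytes, 0);
        if (buf == NULL)
            return EXIT_FAILURE;
        for (size_t i = 0; i < bytes / sizeof(double); i++)
            buf[i] = 0.0;
        printf("placed %zu bytes on node 0 of %d\n", bytes, numa_max_node() + 1);
        numa_free(buf, bytes);
        return 0;
    }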
Shared Memory
- Advantages:
  - Global address space is easy to program
  - Data sharing is fast and uniform due to the proximity of memory to CPUs
- Disadvantages:
  - Lack of scalability between memory and CPUs
  - The programmer is responsible for synchronization (see the sketch below)
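A minimal sketch of that responsibility with pthreads (again an assumed API): two threads increment a shared counter, and without the mutex the non-atomic read-modify-write loses updates.

    /* Synchronization sketch; compile with: gcc sync.c -lpthread */
    #include <stdio.h>
    #include <pthread.h>

    #define ITERS 1000000
    long counter = 0;   /* shared between both threads */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void *increment(void *arg) {
        (void)arg;
        for (int i = 0; i < ITERS; i++) {
            pthread_mutex_lock(&lock);     /* remove the lock/unlock pair and */
            counter++;                     /* the final count comes up short: */
            pthread_mutex_unlock(&lock);   /* counter++ is not atomic         */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, increment, NULL);
        pthread_create(&t2, NULL, increment, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld (expected %d)\n", counter, 2 * ITERS);
        return 0;
    }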
Distributed Memory
- Processors have their own local memory; a communication network connects the inter-processor memories.
- There is no concept of a global address space.
- CPUs operate independently: changes to one processor's local memory have no effect on the memory of other processors.
- Cache coherency does not apply.
- Data communication and synchronization are the programmer's responsibility (see the sketch below).
- The interconnect used for data transfer varies (e.g., Ethernet).
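A minimal MPI sketch of explicit communication (MPI is an assumption; the slides name no library): rank 1 cannot read rank 0's memory, so the value has to travel in a message.

    /* Distributed-memory sketch; compile with mpicc, run with: mpirun -np 2 ./a.out */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;   /* exists only in rank 0's local memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* No global address space: the data must be received explicitly
               over the interconnect (e.g., Ethernet). */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }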