Introduction
Approaches to Parallel Computing
K. Cooper
Department of Mathematics, Washington State University
2019
Concept
Many hands make light work...
- Simulation: one program, different data, no communication.
- Master-Slave: one program sends small tasks to many subprocesses. Communication only to/from the master process.
- Multiple Instruction Streams: separate programs running on many processors. Communication among processes via messages.
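As an illustration, the master-slave pattern can be sketched with Python's standard multiprocessing module. This is a minimal sketch, not code from the course; the partial-sum job and the name `work` are invented for the example:

```python
from multiprocessing import Pool

def work(task):
    """Slave: compute one small task (here, a partial sum over a range)."""
    lo, hi = task
    return sum(range(lo, hi))

if __name__ == "__main__":
    # Master: split the problem into small tasks and farm them out.
    tasks = [(i, i + 1000) for i in range(0, 10000, 1000)]
    with Pool(4) as pool:
        # Communication happens only to/from the master process.
        partials = pool.map(work, tasks)
    total = sum(partials)
    print(total)   # 49995000, the same as sum(range(10000))
```

The master never talks to the slaves individually; it only scatters tasks and gathers results, which is exactly the communication pattern described above.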
Single Instruction, Multiple Data
- Single CPU with many ALUs
- The instruction-decode (ID) step fills registers for each ALU
- The execute (EX) step does the computation simultaneously on all ALUs
SIMD Pipeline
After instructions are decoded, the same operations can be executed on a vector array of numbers.
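One way to see this vector idea from Python is NumPy's elementwise arithmetic, which hands a single operation to an optimized (often SIMD-capable) inner loop over a whole array. This is an analogy for the concept, not the hardware pipeline itself:

```python
import numpy as np

a = np.arange(8, dtype=np.float64)   # one "vector" of operands: 0, 1, ..., 7
b = np.full(8, 2.0)                  # a second vector of operands

# Conceptually, one decoded instruction applied to all lanes at once:
c = a * b
print(c)   # elementwise products: 0, 2, 4, ..., 14
```

Contrast this with a Python `for` loop over the same data, which decodes and dispatches one scalar operation per element.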
Disadvantages
- Specialized architecture
- Slower to fill ALU registers – a bottleneck
- Many ALUs idle during the EX step
Single Instruction, Multiple Thread
- Many CPUs
- The main program spins up many threads for one instruction
- Examples:
  - Python parallel package – uses several cores of a CPU
  - CUDA computing – uses hundreds or thousands of cores of a GPU
- Conjecture: this is only efficient with very many cores.
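A minimal one-function-many-threads sketch using only the standard library (the kernel function and data are invented for the example). Note that for pure-Python work, CPython's global interpreter lock limits any actual speedup from threads, which is consistent with the conjecture that this model pays off only with very many cores:

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(x):
    """The single 'instruction' that every thread applies to its own datum."""
    return x * x

data = list(range(16))
with ThreadPoolExecutor(max_workers=8) as ex:
    # Many threads, one function, each thread handling different data.
    squares = list(ex.map(kernel, data))
print(squares[:4])   # [0, 1, 4, 9]
```

CUDA follows the same shape at much larger scale: one kernel function launched across thousands of GPU threads.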
Multiple Instruction, Multiple Data
- Many CPUs
- Asynchronous
- Redundant work
- Much more versatile than most SIMD
Shared Memory
- E.g. a quad-core CPU
- Bus-based:
  - Limited bandwidth on the front-side bus (FSB)
  - Scales poorly
- Switch-based:
  - Expensive
  - Still does not scale well – communication bottleneck
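The shared-memory idea can be sketched with Python's `multiprocessing.Array`: several processes see one array and each writes its own slice. The array contents and the half-and-half split are invented for the example:

```python
from multiprocessing import Process, Array

def fill(shared, lo, hi):
    """Each process writes squares into its own slice of the shared array."""
    for i in range(lo, hi):
        shared[i] = i * i

if __name__ == "__main__":
    n = 8
    shared = Array('d', n)   # one array of doubles, visible to every process
    p1 = Process(target=fill, args=(shared, 0, n // 2))
    p2 = Process(target=fill, args=(shared, n // 2, n))
    p1.start(); p2.start()
    p1.join(); p2.join()
    print(list(shared))      # squares 0.0 through 49.0
```

Because both processes touch the same memory, all traffic funnels through one memory system; this is the bandwidth bottleneck that makes shared memory scale poorly.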
Distributed Memory
- Each node adds memory to the system; maybe no single node sees the entire problem.
- E.g. a Beowulf cluster
- Each CPU requires its own dedicated memory
  - Could be separate sectors in a single RAM...
  - Could be separate machines
- Communication becomes a roadblock
Distributed Memory MIMD
[Diagram of a distributed-memory MIMD architecture]
Message Passing
- Typically, each instruction stream starts identically
- Each processor starts with the same code
- Processes perform different tasks based on rank
- I/O to processes is performed through messages
Interconnection Network
- Front-side bus
- InfiniBand
- Ethernet – slow
Single Program, Multiple Data
- You write one (1) program...
- ...and that program runs on every processor
- Instances: processes perform tasks based on conditions and messages
- Processes have different inputs and outputs
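A rank-based SPMD sketch: every process runs the same function, and its rank decides which task it performs. A real cluster would use MPI (e.g. mpi4py) for this; here the standard multiprocessing module stands in so the sketch is self-contained, and the even/odd task split is invented for the example:

```python
from multiprocessing import Process, Queue

def program(rank, queue):
    """Every process runs this same code; the rank selects the task."""
    if rank % 2 == 0:
        queue.put((rank, rank * rank))   # even ranks compute squares
    else:
        queue.put((rank, rank ** 3))     # odd ranks compute cubes

if __name__ == "__main__":
    size = 4
    queue = Queue()
    # One program, launched once per "processor", each with its own rank.
    procs = [Process(target=program, args=(r, queue)) for r in range(size)]
    for p in procs:
        p.start()
    results = dict(queue.get() for _ in range(size))
    for p in procs:
        p.join()
    print(dict(sorted(results.items())))   # {0: 0, 1: 1, 2: 4, 3: 27}
```

With mpi4py the same shape appears as `rank = MPI.COMM_WORLD.Get_rank()` followed by branching on `rank`, with messages replacing the queue.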
Nomenclature
- Node: a computer connected to a head machine by some means
- Interconnect: the means of connecting the nodes
- Core: a single processor on one of the CPUs of a node
- Processor: usually means a core
- Process: a program that runs on a processor; possibly (but not desirably) many processes per processor
Summary
- When CPUs were expensive: pipelines
- As chips became denser: SIMD
- As CPUs became commodities: MIMD
- As GPUs become dense: GPU computing
Goal
- We hope to show that programs can be modified easily to take advantage of modern processors.
- Getting speedups is more problematic.
Resources
- Solitary: two CPUs, four cores each, 8 GB RAM; runs prime1 on 6 cores in 0.038 seconds under OpenMPI
- Cluster: five nodes, one CPU per node, six cores per CPU, 8 GB RAM per node; runs prime1 on 6 cores in 0.041 seconds under MPICH2
- Labs: 20 to 32 nodes, two to eight cores per node