Introduction

Approaches to Parallel Computing

K. Cooper

Department of Mathematics, Washington State University

2019

Paradigms

Concept

Many hands make light work… Set several processors to work on separate aspects of a problem.

- Simulation: one program, different data, no communication.
- Master-Slave: one program sends small tasks to many subprocesses. Communication only to/from the master.
- Multiple Instruction Streams: separate programs running on many processors. Communication among processes via messages.
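The master-slave pattern can be sketched with Python's standard multiprocessing module; the square task and the pool size here are hypothetical stand-ins for real work:

```python
from multiprocessing import Pool

def square(n):
    # A small task the master hands out to a worker subprocess.
    return n * n

if __name__ == "__main__":
    # The master process distributes tasks to a pool of worker
    # subprocesses; all communication is to/from the master.
    with Pool(processes=4) as pool:
        results = pool.map(square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Each worker sees only the data for its own task, matching the "communication only to/from master" restriction above.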

Single Instruction, Multiple Data

- Single CPU with many ALUs
- The instruction-decode (ID) step fills registers for each ALU
- The execute (EX) step does the computation simultaneously on all ALUs

SIMD ...

After instructions are decoded, the same operations can be executed on a vector array of numbers.
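In software, the same idea (one operation applied across a whole vector of numbers) is what an array library such as NumPy exposes. This is a sketch assuming NumPy is installed; whether actual hardware SIMD units are used underneath depends on the build:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# One vector "instruction": the addition is applied to every
# element pair at once, with no explicit Python loop.
c = a + b
print(c)  # [11. 22. 33. 44.]
```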

Disadvantages

- Specialized architecture
- Slower to fill ALU registers – a bottleneck
- Many ALUs idle during the EX step

Single Instruction, Multiple Data

- Many CPUs
- Main program spins many threads for one instruction
- Examples: the Python parallel package – uses several cores of a CPU; CUDA computing – uses hundreds or thousands of cores of a GPU
- Conjecture: this is only efficient with very many cores.
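A minimal stdlib sketch of this style: the main program fans the same function out over many data items across several cores. This uses concurrent.futures rather than the parallel package or CUDA named above, and the work function is hypothetical:

```python
from concurrent.futures import ProcessPoolExecutor

def work(x):
    # The same operation applied to different data on each core.
    return 2 * x + 1

if __name__ == "__main__":
    # The main program spins up worker processes, one operation,
    # many data items.
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(work, range(6)))
    print(results)  # [1, 3, 5, 7, 9, 11]
```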

Multiple Instruction, Multiple Data

- Many CPUs
- Asynchronous
- Redundant work
- Much more versatile than most SIMD

Shared Memory

E.g. a quad-core CPU

- Bus-based: limited bandwidth on the front-side bus (FSB); scales poorly
- Switch-based: expensive, and still does not scale well – communication bottleneck

Distributed Memory

- Each node adds memory to the system.
- Maybe no single node sees the entire problem.
- E.g. each CPU requires its own dedicated memory: could be separate sectors in a single RAM, or could be separate machines.
- Communication becomes a roadblock.

Distributed Memory MIMD

Message Passing

- Typically, each instruction stream starts identically: each starts with the same code.
- Processes perform different tasks based on rank.
- I/O to processes is performed through messages.
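A stdlib sketch of rank-based message passing: every worker runs the same function, picks its slice of work from its rank, and reports back with an explicit message. Here a multiprocessing.Queue stands in for a real message-passing library such as MPI:

```python
from multiprocessing import Process, Queue

def worker(rank, queue):
    # Same code in every process; the rank decides the task.
    partial = sum(range(rank * 10, (rank + 1) * 10))
    queue.put((rank, partial))  # send a message back

if __name__ == "__main__":
    size = 4
    queue = Queue()
    procs = [Process(target=worker, args=(r, queue)) for r in range(1, size)]
    for p in procs:
        p.start()
    # The master collects one message from every other rank.
    messages = dict(queue.get() for _ in range(size - 1))
    for p in procs:
        p.join()
    print(sorted(messages.items()))  # [(1, 145), (2, 245), (3, 345)]
```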

Interconnection Network

- Front-side bus
- InfiniBand
- Ethernet – slow

Single Program, Multiple Data

- You write one (1) program…
- …that program runs on every processor, in many instances.
- Processes perform tasks based on conditions and messages.
- Processes have different inputs and outputs.

Nomenclature

Node – a computer connected to a head machine by some means.
Interconnect – the means of connecting the nodes.
Core – a single processor on one of the CPUs of a node.
Processor – usually means a core.
Process – a program that runs on a processor; possibly (but not desirably) many processes per processor.

Summary

- When CPUs were expensive: pipelines
- As chips became denser: SIMD
- As CPUs became commodities: MIMD
- As GPUs become dense: GPU computing

Goal

We hope to show that programs can be modified easily to take advantage of modern processors. Getting speedups is more problematic.

Resources

Solitary – two CPUs, four cores each, 8 GB RAM; runs prime1 on 6 cores in 0.038 seconds under OpenMPI.
Cluster – five nodes, one CPU per node, six cores per CPU, 8 GB RAM per node; runs prime1 on 6 cores in 0.041 seconds under MPICH2.
Labs – 20 to 32 nodes, two to eight cores per node.