What Is SPMD? Messages

1 2 Outline Motivation for MPI Overview of PVM and MPI The pro cess that pro duced MPI What is di erent ab out MPI? { the \usual" send/receive Jack Dongarra { the MPI send/receive { simple collective op erations Computer Science Department New in MPI: Not in MPI UniversityofTennessee Some simple complete examples, in Fortran and C and Communication mo des, more on collective op erations Implementation status Mathematical Sciences Section Oak Ridge National Lab oratory MPICH - a free, p ortable implementation MPI resources on the Net MPI-2 http://www.netlib.org/utk/p eople/JackDongarra.html 3 4 Messages What is SPMD? 2 Messages are packets of data moving b etween sub-programs. 2 Single Program, Multiple Data 2 The message passing system has to b e told the 2 Same program runs everywhere. following information: 2 Restriction on the general message-passing mo del. { Sending pro cessor 2 Some vendors only supp ort SPMD parallel programs. { Source lo cation { Data typ e 2 General message-passing mo del can b e emulated. { Data length { Receiving pro cessors { Destination lo cation { Destination size 5 6 Access Point-to-Point Communication 2 A sub-program needs to b e connected to a message passing 2 Simplest form of message passing. system. 2 One pro cess sends a message to another 2 A message passing system is similar to: 2 Di erenttyp es of p oint-to p oint communication { Mail b ox { Phone line { fax machine { etc. 7 8 Synchronous Sends Asynchronous Sends Provide information about the completion of the Only know when the message has left. message. ? "Beep" 9 10 Non−Blocking Operations Return straight away and allow the sub−program to king Op erations Blo c continue to perform other work. At some later time the sub−program can TEST or WAIT for the Relate to when the op eration has completed. 2 completion of the non−blocking operation. 2 Only return from the subroutine call when the op eration has completed. 11 12 Barriers Broadcast Synchronise processes. A one−to−many communication. Barrier Barrier Barrier 13 14 Parallelizati on { Getting Started Starting with a large serial application Reduction Operations { Lo ok at the Physics { tly parallel? Combine data from several processes to produce a Is problem inheren Examine lo op structures { single result. { Are any indep endent? Mo derately so? o ols likeForge90 can b e helpful STRIKE T { Lo ok for the core linear algebra routines { Replace with parallelized versions Already b een done. check survey 15 16 Popular Distributed Programming Schemes Parallel Programming Considerations Master / Slave Granularity of tasks Master task starts all slave tasks and co ordinates their work and I/O Key measure is communication/computation ratio of the machine: Num- ber of bytes sent divided bynumb er of ops p erformed. Larger granu- SPMD hostless larity gives higher sp eedups but often lower parallelism. Same program executes on di erent pieces of the problem Numb er of messages Functional Desirable to keep the numb er of messages low but dep ending on the al- Several programs are written; each p erforms a di erent function in the gorithm it can b e more ecient to break large messages up and pip eline application. the data when this increases parallelism. Functional vs. Data parallelism Which b etter suits the application? PVM allows either or b oth to b e used. 17 18 Network Programming Considerations Load Balancing Metho ds Message latency Static load balancing Network latency can b e high. Algorithms should b e designed to account Problem is divided up and tasks are assigned to pro cessors only once. for this f.e. send data b efore it is needed. The numb er or size of tasks maybevaried to account for di erent computational p owers of machines. Di erent Machine Powers Dynamic load balancing by p o ol of tasks Virtual machines may b e comp osed of computers whose p erformance varies over orders of magnitude. Algorithm must b e able to handle this. Typically used with master/slavescheme. The master keeps a queue of tasks and sends them to idle slaves until the queue is empty.Faster Fluctuating machine and network loads machines end up getting more tasks naturally. see xep example in Multiple users and other comp eting PVM tasks cause the machine and PVM distribution network loads to change dynamically. Load balancing is imp ortant. Dynamic load balancing by co ordination Typically used in SPMD scheme. All the tasks synchronize and redis- tribute their work either at xed times or if some condition o ccurs f.e. load imbalance exceeds some limit 19 20 Communication Tips Bag of Tasks Limit size, numb er of outstanding messages Comp onents { Can load imbalance cause to o many outstanding messages? { Job p o ol { Mayhave to send very large data in parts { Worker pool Scheduler Sending { Task State of each job State of each worker Unstarted Pvmd Idle A Running B A Receiving B Busy Task Finished Figure 1: Bag of tasks state machines Complex communication patterns { Network is deadlo ck-free, shouldn't hang Possible improvements { Still have to consider { Adjust size of jobs Correct data distribution To sp eed of workers Bottlenecks To turnaround time granularity { Consider using a library { Start bigger jobs b efore smaller ones ScaLAPACK: LAPACK for distributed-memory machines { Allowworkers to communicate BLACS: Communication primitives more complex scheduling Oriented towards linear algebra Matrix distribution w/ no send-recv Used by ScaLAPACK 21 22 PVM Is Physical and Logical Views of PVM ware package that allows a collection of serial, parallel and PVM is a soft Physical vector computers on a network to b e managed as one large computing resource. IP Network (routers, bridges, ...) Po or man's sup ercomputer { High p erformance from network of workstations { O -hours crunching Metacomputer linking multiple sup ercomputers { Very high p erformance Computing elements adapted to subproblems { Multiprocessor Host host { Visualization Educational to ol Logical { Simple to install Pvmd (host) { Simple to learn { Available Can b e mo di ed { Tasks Console(s) 23 24 Parts of the PVM System Programming in PVM PVM daemon pvmd A simple message-passing environment { One manages each host of virtual machine { Hosts, Tasks, Messages { Mainly a message router, also has kernel-like functions { No enforced top ology { Has message entry points where tasks request service { Virtual machine can b e comp osed of any mix of machine typ es { Inter-host p ointofcontact Pro cess Control { Authentication { Tasks can b e spawned/killed anywhere in the virtual machine { Creates pro cesses Communication { Collects output printed by pro cesses { Any task can communicate with any other { Fault detection of pro cesses, network { Data conversion is handled by PVM { More robust than application comp onents Dynamic Pro cess Groups Interface library libpvm { Tasks can join/leave one or more groups at any time { Linked with each application comp onent { 1. Functions to comp ose, send, receive messages Fault Tolerance { 2. PVM syscal ls that send requests to pvmd { Task can request noti cation of lost/gained resources { Machine-dep endent communication part can b e replaced Underlying op erating system usually Unix is visible { Kept as simple as p ossible Supp orts C, C++ and Fortran PVM Console Can use other languages must b e able to link with C { Interactive control of virtual machine { Kind of likea shel l { Normal PVM task, several can b e attached, to any host 25 26 Hellos World Unique Features of PVM Program hello1.c, the main program: Software is highly p ortable Allows fully heterogeneous virtual machine hosts, network include <stdio.h> include "pvm3.h" Dynamic pro cess, machine con guration main Supp ort for fault tolerant programs { int tid; /* tid of child */ System can b e customized char buf[100]; Large existing user base printf"I'm tx\n", pvm_mytid; Some comparable systems pvm_spawn"hello2", char**0, 0, "", 1, &tid; pvm_recv-1, -1; { Portable message-passing pvm_bufinfocc, int*0, int*0, &tid; pvm_upkstrbuf; MPI printf"Message from tx: s\n", tid, buf; p4 pvm_exit; exit0; Express } PICL { One-of-a-kind Program hello2.c, the slave program: NX include "pvm3.h" CMMD main { Other typ es of communication { AM int ptid; /* tid of parent */ char buf[100]; Linda ptid = pvm_parent; Also DOSs, Languages, ... strcpybuf, "hello, world from "; gethostnamebuf + strlenbuf, 64; pvm_initsendPvmDataDefault; pvm_pkstrbuf; pvm_sendptid, 1; pvm_exit; exit0; } 27 28 Portability How to Get PVM PVM home page URL Oak Ridge is Con gurations include http://www /ep m/ orn l/g ov /pv m/p vm home.html 803/486 BSDI, NetBSD, FreeBSD Alliant FX/8 803/486 Linux BBN Butter y TC2000 PVM source co de, user's guide, examples and related material are pub- DEC AlphaOSF-1, Mips, uVAX Convex C2, CSPP DG Aviion Cray T-3D, YMP, 2, C90 Unicos lished on Netlib, a software rep ository with several sites around the HP 68000, PA-Risc Encore Multimax world. IBM RS-6000, RT Fujitsu 780UXP/M Mips IBM Power-4 { To get started, send email to netlib: NeXT Intel Paragon, iPSC/860, iPSC/2 Silicon Graphics Kendall Square mail netlib@orn l. gov Sun 3, 4x SunOS, Solaris Maspar Subject: send index from pvm3 NEC SX-3 Sequent Stardent Titan A list of les and instructions will b e automatically mailed back Thinking Machines CM-2, CM-5 { Using xnetlib: select directory pvm3 Very p ortable across Unix machines, usually just pick options FTP: host netlib2.cs .u tk. ed u, login anonymous, directory /pvm3 Multipro cessors: URL: http://www.

What Is SPMD? Messages

2.5 Classification of Parallel Computers

CS650 Computer Architecture Lecture 10 Introduction to Multiprocessors

Exploiting Automatic Vectorization to Employ SPMD on SIMD Registers

Polynomial-Time Algorithms for Enforcing Sequential Consistency in SPMD Programs with Arrays

Data Management and Control-Flow Aspects of an SIMD/SPMD Parallel Language/Compiler

Regent: a High-Productivity Programming Language for Implicit Parallelism with Logical Regions

Message Passing Interface Part - I

2.1 Flynn's Taxonomy of Parallel Architectures-Twoslidesxpage

Introduction to Parallel Programming with MPI Lecture #1

Introduction to Matlab Parallel Computing Toolbox

Computer Architecture: SIMD and Gpus (Part III) (And Briefly VLIW, DAE, Systolic Arrays)

Matlab: Parallel Computing Toolbox