Jack Dongarra
Computer Science Department
University of Tennessee
and
Mathematical Sciences Section
Oak Ridge National Laboratory
http://www.netlib.org/utk/people/JackDongarra.html

Outline

- Motivation for MPI
- Overview of PVM and MPI
- The process that produced MPI
- What is different about MPI?
  - the "usual" send/receive
  - the MPI send/receive
  - simple collective operations
- New in MPI: Not in MPI
- Some simple complete examples, in Fortran and C
- Communication modes, more on collective operations
- Implementation status
- MPICH - a free, portable implementation
- MPI resources on the Net
- MPI-2
What is SPMD?

- Single Program, Multiple Data
- Same program runs everywhere.
- Restriction on the general message-passing model.
- Some vendors only support SPMD parallel programs.
- General message-passing model can be emulated.

Messages

- Messages are packets of data moving between sub-programs.
- The message passing system has to be told the following information:
  - Sending processor
  - Source location
  - Data type
  - Data length
  - Receiving processor(s)
  - Destination location
  - Destination size
Access

- A sub-program needs to be connected to a message passing system.
- A message passing system is similar to:
  - mail box
  - phone line
  - fax machine
  - etc.

Point-to-Point Communication

- Simplest form of message passing.
- One process sends a message to another.
- Different types of point-to-point communication.
Synchronous Sends

Provide information about the completion of the message.
(Figure: the sender gets a "beep" acknowledgement when the message is delivered.)

Asynchronous Sends

Only know when the message has left.

Blocking Operations

- Relate to when the operation has completed.
- Only return from the subroutine call when the operation has completed.

Non-Blocking Operations

- Return straight away and allow the sub-program to continue to perform other work. At some later time the sub-program can TEST or WAIT for the completion of the non-blocking operation.
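In PVM, which is used for the examples later in this tutorial, the blocking and non-blocking receives are pvm_recv() and pvm_nrecv(). A minimal C sketch of the test-and-continue pattern (the message tag and the work routine are illustrative only):

    #include <stdio.h>
    #include "pvm3.h"

    static void do_some_work(void) { /* application work would go here */ }

    /* Poll for a message tagged 1 from any sender with the non-blocking
       pvm_nrecv(); keep computing until it arrives, then unpack it. */
    void wait_while_working(void)
    {
        int bufid, value;

        while ((bufid = pvm_nrecv(-1, 1)) == 0)   /* 0 means "nothing yet" */
            do_some_work();                       /* overlap computation   */

        if (bufid > 0) {
            pvm_upkint(&value, 1, 1);
            printf("received %d\n", value);
        }
    }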
Barriers

Synchronise processes.

Broadcast

A one-to-many communication.
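The corresponding PVM group calls, described later in these slides, are pvm_barrier() and pvm_bcast(). A minimal C sketch, assuming every task has already joined a group named "worker" (the group name, tag and value are illustrative):

    #include "pvm3.h"

    /* All members of group "worker" synchronize, then instance 0
       broadcasts an integer to the other members of the group. */
    void sync_and_broadcast(int nmembers, int me)
    {
        int value = 42;

        pvm_barrier("worker", nmembers);          /* wait for everyone      */

        if (me == 0) {                            /* root packs and sends   */
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&value, 1, 1);
            pvm_bcast("worker", 7);               /* to all other members   */
        } else {                                  /* the others receive     */
            pvm_recv(-1, 7);
            pvm_upkint(&value, 1, 1);
        }
    }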
Reduction Operations

Combine data from several processes to produce a single result.
(Figure: data from several processes reduced to the single result "STRIKE".)

Parallelization - Getting Started

- Starting with a large serial application
  - Look at the physics
  - Is the problem inherently parallel?
- Examine loop structures
  - Are any independent? Moderately so?
  - Tools like Forge90 can be helpful
- Look for the core linear algebra routines
  - Replace with parallelized versions
  - Already been done (check survey)
Popular Distributed Programming Schemes

Master / Slave
  Master task starts all slave tasks and coordinates their work and I/O.

SPMD (hostless)
  Same program executes on different pieces of the problem.

Functional
  Several programs are written; each performs a different function in the application.

Parallel Programming Considerations

Granularity of tasks
  Key measure is the communication/computation ratio of the machine: number of bytes sent divided by number of flops performed. Larger granularity gives higher speedups but often lower parallelism.

Number of messages
  Desirable to keep the number of messages low, but depending on the algorithm it can be more efficient to break large messages up and pipeline the data when this increases parallelism.

Functional vs. Data parallelism
  Which better suits the application? PVM allows either or both to be used.
Network Programming Considerations

Message latency
  Network latency can be high. Algorithms should be designed to account for this, e.g. send data before it is needed.

Different machine powers
  Virtual machines may be composed of computers whose performance varies over orders of magnitude. The algorithm must be able to handle this.

Fluctuating machine and network loads
  Multiple users and other competing PVM tasks cause the machine and network loads to change dynamically. Load balancing is important.

Load Balancing Methods

Static load balancing
  The problem is divided up and tasks are assigned to processors only once. The number or size of tasks may be varied to account for the different computational powers of the machines.

Dynamic load balancing by pool of tasks
  Typically used with the master/slave scheme. The master keeps a queue of tasks and sends them to idle slaves until the queue is empty. Faster machines naturally end up getting more tasks. (See the xep example in the PVM distribution.)

Dynamic load balancing by coordination
  Typically used in the SPMD scheme. All the tasks synchronize and redistribute their work either at fixed times or if some condition occurs, e.g. the load imbalance exceeds some limit.
Communication Tips

- Limit size, number of outstanding messages
  - Can load imbalance cause too many outstanding messages?
  - May have to send very large data in parts
- Complex communication patterns
  - Network is deadlock-free, shouldn't hang
  - Still have to consider
    - Correct data distribution
    - Bottlenecks
- Consider using a library
  - ScaLAPACK: LAPACK for distributed-memory machines
  - BLACS: Communication primitives
    - Oriented towards linear algebra
    - Matrix distribution w/ no send-recv
    - Used by ScaLAPACK

Bag of Tasks

- Components
  - Job pool
  - Worker pool
  - Scheduler

Figure 1: Bag of tasks state machines (state of each job: Unstarted, Running, Finished; state of each worker: Idle, Busy)

- Possible improvements (a rough master-side sketch follows)
  - Adjust size of jobs
    - To speed of workers
    - To turnaround time (granularity)
  - Start bigger jobs before smaller ones
  - Allow workers to communicate (more complex scheduling)
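A master implementing this bag-of-tasks scheme with PVM might look roughly like the sketch below. The message tags, the encoding of a job as a single integer, and the idea that a result is one integer are all illustrative assumptions.

    #include "pvm3.h"

    #define JOB_TAG     1
    #define RESULT_TAG  2

    /* Hand out njobs jobs to nworkers slaves from a single queue:
       send one job to each worker, then give a new job to whichever
       worker returns a result, until the queue is empty. */
    void bag_of_tasks(int *workers, int nworkers, int njobs)
    {
        int i, next = 0, done = 0, worker, result, bufid;

        for (i = 0; i < nworkers && next < njobs; i++) {   /* prime the pump */
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&next, 1, 1);
            pvm_send(workers[i], JOB_TAG);
            next++;
        }
        while (done < njobs) {
            bufid = pvm_recv(-1, RESULT_TAG);              /* any idle worker */
            pvm_bufinfo(bufid, (int *)0, (int *)0, &worker);
            pvm_upkint(&result, 1, 1);                     /* record result   */
            done++;
            if (next < njobs) {                            /* refill worker   */
                pvm_initsend(PvmDataDefault);
                pvm_pkint(&next, 1, 1);
                pvm_send(worker, JOB_TAG);
                next++;
            }
        }
    }

Because the master always feeds the worker that just finished, faster machines automatically receive more jobs, which is the dynamic load balancing described above.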
PVM Is

PVM is a software package that allows a collection of serial, parallel and vector computers on a network to be managed as one large computing resource.

- Poor man's supercomputer
  - High performance from network of workstations
  - Off-hours crunching
- Metacomputer linking multiple supercomputers
  - Very high performance
  - Computing elements adapted to subproblems
  - Visualization
- Educational tool
  - Simple to install
  - Simple to learn
  - Available (can be modified)

Physical and Logical Views of PVM

Physical
  Hosts, including multiprocessor hosts, connected by an IP network (routers, bridges, ...).

Logical
  Pvmd (one per host), Tasks, Console(s).
Parts of the PVM System

- PVM daemon (pvmd)
  - One manages each host of virtual machine
  - Mainly a message router, also has kernel-like functions
  - Has message entry points where tasks request service
  - Inter-host point of contact
  - Authentication
  - Creates processes
  - Collects output printed by processes
  - Fault detection of processes, network
  - More robust than application components
- Interface library (libpvm)
  - Linked with each application component
  - 1. Functions to compose, send, receive messages
  - 2. PVM syscalls that send requests to pvmd
  - Machine-dependent communication part can be replaced
  - Kept as simple as possible
- PVM Console
  - Interactive control of virtual machine
  - Kind of like a shell
  - Normal PVM task, several can be attached, to any host

Programming in PVM

- A simple message-passing environment
  - Hosts, Tasks, Messages
  - No enforced topology
  - Virtual machine can be composed of any mix of machine types
- Process Control
  - Tasks can be spawned/killed anywhere in the virtual machine
- Communication
  - Any task can communicate with any other
  - Data conversion is handled by PVM
- Dynamic Process Groups
  - Tasks can join/leave one or more groups at any time
- Fault Tolerance
  - Task can request notification of lost/gained resources
- Underlying operating system (usually Unix) is visible
- Supports C, C++ and Fortran
- Can use other languages (must be able to link with C)
Hello World

Program hello1.c, the main program:

    #include <stdio.h>
    #include "pvm3.h"

    main()
    {
        int cc, tid;                    /* tid of child */
        char buf[100];

        printf("I'm t%x\n", pvm_mytid());

        pvm_spawn("hello2", (char**)0, 0, "", 1, &tid);
        cc = pvm_recv(-1, -1);
        pvm_bufinfo(cc, (int*)0, (int*)0, &tid);
        pvm_upkstr(buf);
        printf("Message from t%x: %s\n", tid, buf);

        pvm_exit();
        exit(0);
    }

Program hello2.c, the slave program:

    #include <string.h>
    #include <unistd.h>
    #include "pvm3.h"

    main()
    {
        int ptid;                       /* tid of parent */
        char buf[100];

        ptid = pvm_parent();

        strcpy(buf, "hello, world from ");
        gethostname(buf + strlen(buf), 64);
        pvm_initsend(PvmDataDefault);
        pvm_pkstr(buf);
        pvm_send(ptid, 1);

        pvm_exit();
        exit(0);
    }

Unique Features of PVM

- Software is highly portable
- Allows fully heterogeneous virtual machine (hosts, network)
- Dynamic process, machine configuration
- Support for fault tolerant programs
- System can be customized
- Large existing user base
- Some comparable systems
  - Portable message-passing: MPI, p4, Express, PICL
  - One-of-a-kind: NX, CMMD
  - Other types of communication: AM, Linda
  - Also DOSs, Languages, ...
Portability

Configurations include:

    80386/486 (BSDI, NetBSD, FreeBSD)    Alliant FX/8
    80386/486 (Linux)                    BBN Butterfly TC2000
    DEC Alpha (OSF-1), Mips, uVAX        Convex C2, CSPP
    DG Aviion                            Cray T-3D, YMP, 2, C90 (Unicos)
    HP 68000, PA-Risc                    Encore Multimax
    IBM RS-6000, RT                      Fujitsu 780 (UXP/M)
    Mips                                 IBM Power-4
    NeXT                                 Intel Paragon, iPSC/860, iPSC/2
    Silicon Graphics                     Kendall Square
    Sun 3, 4x (SunOS, Solaris)           Maspar
                                         NEC SX-3
                                         Sequent
                                         Stardent Titan
                                         Thinking Machines CM-2, CM-5

- Very portable across Unix machines, usually just pick options
- Multiprocessors:
  - Distributed-memory: T-3D, iPSC/860, Paragon, CM-5, SP-2/MPI
  - Shared-memory: Convex/HP, SGI, Alpha, Sun, KSR, Symmetry
  - Source code largely shared with generic (80%)
- PVM is portable to non-Unix machines
  - VMS port has been done
  - OS/2 port has been done
  - Windows/NT port in progress
- PVM differences are almost transparent to programmer
  - Some options may not be supported
  - Program runs in different environment

How to Get PVM

PVM home page (URL) at Oak Ridge is
    http://www.epm.ornl.gov/pvm/pvm_home.html

PVM source code, user's guide, examples and related material are published on Netlib, a software repository with several sites around the world.

- To get started, send email to netlib:
      mail [email protected]
      Subject: send index from pvm3
  A list of files and instructions will be automatically mailed back
- Using xnetlib: select directory pvm3
- FTP: host netlib2.cs.utk.edu, login anonymous, directory /pvm3
- URL: http://www.netlib.org/pvm3/index.html

Bug reports, comments, questions can be mailed to:
    [email protected]

Usenet newsgroup for discussion and support:
    comp.parallel.pvm

Book:
    PVM: Parallel Virtual Machine
    A Users' Guide and Tutorial for Networked Parallel Computing
    MIT Press, 1994.
Installing PVM

- Package requires a few MB of disk + a few MB per architecture
- Don't need root privilege
- Libraries and executables can be shared between users
- PVM chooses machine architecture name for you
  - more than 60 currently defined
- Environment variable PVM_ROOT points to installed path
  - E.g. /usr/local/pvm3.3.4 or $HOME/pvm3
  - If you use csh, add to your .cshrc:
        setenv PVM_ROOT /usr/local/pvm3
  - If you use sh or ksh, add to your .profile:
        PVM_ROOT=/usr/local/pvm3
        PVM_DPATH=$PVM_ROOT/lib/pvmd
        export PVM_ROOT PVM_DPATH
- Important directories below $PVM_ROOT
      include    Header files
      man        Manual pages
      lib        Scripts
      lib/ARCH   System executables
      bin/ARCH   System tasks

Building PVM Package

- Software comes with configurations for most Unix machines
- Installation is easy
- After package is extracted
  - cd $PVM_ROOT
  - make
- Software automatically
  - Determines architecture type
  - Creates necessary subdirectories
  - Builds pvmd, console, libraries, group server and library
  - Installs executables and libraries in lib and bin
Starting PVM

- Three ways to start PVM
  - pvm [-ddebugmask] [-nhostname] [hostfile]
    PVM console starts pvmd, or connects to one already running
  - xpvm
    Graphical console, same as above
  - pvmd [-ddebugmask] [-nhostname] [hostfile]
    Manual start, used mainly for debugging or when necessary to enter passwords
- Some common error messages
  - Can't start pvmd
    Check PVM_ROOT is set, .rhosts correct, no garbage in .cshrc
  - Can't contact local daemon
    PVM crashed previously; socket file left over
  - Version mismatch
    Mixed versions of PVM installed or stale executables
  - No such host
    Can't resolve IP address
  - Duplicate host
    Host already in virtual machine or shared /tmp directory
  - failed to start group server
    Group option not built or ep= not correct
  - shmget: ... No space left on device
    Stale segments left from crash or not enough are configured

XPVM

- Graphical interface for PVM
  - Performs console-like functions
  - Real-time graphical monitor with
    - View of virtual machine configuration, activity
    - Space-time plot of task status
    - Host utilization plot
    - Call level debugger, showing last libpvm call by each task
  - Writes SDDF format trace files
    - Can be used for post-mortem analysis
- Built on top of PVM using
  - Group library
  - Libpvm trace system
  - Output collection system
Programming Interface

About 80 functions:

    Message buffer manipulation    Create, destroy buffers
                                   Pack, unpack data
    Message passing                Send, receive
                                   Multicast
    Process control                Create, destroy tasks
                                   Query task tables
                                   Find own tid, parent tid
    Dynamic process groups         (With optional group library)
                                   Join, leave group
                                   Map group members -> tids
                                   Broadcast
                                   Global reduce
    Machine configuration          Add, remove hosts
                                   Query host status
                                   Start, halt virtual machine
    Miscellaneous                  Get, set options
                                   Request notification
                                   Register special tasks
                                   Get host timeofday clock offsets

Process Control

pvm_spawn(file, argv, flags, where, ntask, tids)
  Start new tasks
  - Placement options
      PvmTaskDefault   Round-robin
      PvmTaskHost      Named host ("." is local)
      PvmTaskArch      Named architecture class
      PvmHostCompl     Complements host set
      PvmMppFront      Start on MPP service node
  - Other flags
      PvmTaskDebug     Enable debugging (dbx)
      PvmTaskTrace     Enable tracing
  - Spawn can return partial success

pvm_mytid()
  Find my task id / enroll as a task

pvm_parent()
  Find parent's task id

pvm_exit()
  Disconnect from PVM

pvm_kill(tid)
  Terminate another PVM task

pvm_pstat(tid)
  Query status of another PVM task
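For example, a C caller might combine these options as in the sketch below; the executable name "worker", the host name "host1" and the architecture class "SUN4" are illustrative only.

    #include <stdio.h>
    #include "pvm3.h"

    /* Spawn 4 copies of "worker" with round-robin placement, check for
       partial success (pvm_spawn returns the number actually started). */
    int start_workers(int tids[4])
    {
        int started = pvm_spawn("worker", (char **)0, PvmTaskDefault,
                                "", 4, tids);
        /* Alternatively, pick a host or architecture class:
           pvm_spawn("worker", (char**)0, PvmTaskHost, "host1", 4, tids);
           pvm_spawn("worker", (char**)0, PvmTaskArch, "SUN4",  4, tids); */
        if (started < 4)
            fprintf(stderr, "only %d of 4 workers started\n", started);
        return started;
    }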
Basic PVM Communication

Three-step send method

- pvm_initsend(encoding)
  Initialize send buffer, clearing current one.
  Encoding can be PvmDataDefault, PvmDataRaw, PvmDataInPlace
- pvm_pk<type>(data, num_items, stride) ...
  Pack buffer with various data
- pvm_send(dest, tag)
  pvm_mcast(dests, count, tag)
  Sends buffer to other tasks, returns when safe to clear buffer

To receive

- pvm_recv(source, tag)
  pvm_nrecv(source, tag)
  Blocking or non-blocking receive
- pvm_upk<type>(data, num_items, stride)
  Unpack message into user variables

Can also pvm_probe(source, tag) for a message

Another receive primitive: pvm_trecv(source, tag, timeout)
  Equivalent to pvm_nrecv if timeout set to zero
  Equivalent to pvm_recv if timeout set to null

Higher Performance Communication

Two matched calls for high-speed, low-latency messages

- pvm_psend(dest, tag, data, num_items, data_type)
- pvm_precv(source, tag, data, num_items, data_type, asource, atag, alength)

Pack and send a contiguous, single-typed data buffer.
As fast as native calls on multiprocessor machines.
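Putting the two styles side by side in a C sketch (the tag values and array length are illustrative): the three-step method packs a buffer explicitly, while pvm_psend()/pvm_precv() move one contiguous, single-typed array in a single call.

    #include "pvm3.h"

    /* Three-step send: initialize, pack, send; then the same transfer
       done with the matched psend/precv pair. */
    void send_vector(int dest, double v[100])
    {
        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(v, 100, 1);
        pvm_send(dest, 3);

        /* one-call equivalent for contiguous, single-typed data */
        pvm_psend(dest, 4, v, 100, PVM_DOUBLE);
    }

    void recv_vector(double v[100])
    {
        int src, tag, len;

        pvm_recv(-1, 3);                  /* blocking receive, any sender */
        pvm_upkdouble(v, 100, 1);

        pvm_precv(-1, 4, v, 100, PVM_DOUBLE, &src, &tag, &len);
    }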
Collective Communication

Collective functions operate across all members of a group.

- pvm_barrier(group, count)
  Synchronize all tasks in a group
- pvm_bcast(group, tag)
  Broadcast message to all tasks in a group
- pvm_scatter(result, data, num_items, data_type, msgtag, rootinst, group)
  pvm_gather(result, data, num_items, data_type, msgtag, rootinst, group)
  Distribute and collect arrays across task groups
- pvm_reduce(*func, data, num_items, data_type, msgtag, group, rootinst)
  Reduce distributed arrays. Predefined functions are
      PvmMax, PvmMin, PvmSum, PvmProduct

Virtual Machine Control

pvm_addhosts(hosts, num_hosts, tids)
  Add hosts to virtual machine

pvm_config(nhosts, narch, hosts)
  Get current VM configuration

pvm_delhosts(hosts, num_hosts, results)
  Remove hosts from virtual machine

pvm_halt()
  Stop all pvmds and tasks (shutdown)

pvm_mstat(host)
  Query status of host

pvm_start_pvmd(argc, argv, block)
  Start new master pvmd
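As a C sketch of a group reduction, assume every task has joined a group named "worker" and holds one local partial sum; only the root instance ends up with the global sum (the group name and tag are illustrative):

    #include "pvm3.h"

    /* Sum one double across all members of group "worker".
       Every member must call this; only instance 0 (the root)
       gets the reduced value back in 'value'. */
    double global_sum(double local, int nmembers)
    {
        double value = local;

        pvm_barrier("worker", nmembers);              /* everyone ready   */
        pvm_reduce(PvmSum, &value, 1, PVM_DOUBLE,     /* in-place reduce  */
                   11, "worker", 0);                  /* tag 11, root = 0 */
        return value;                                 /* valid on root    */
    }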
PVM Examples in Distribution

Examples illustrate usage and serve as templates. Examples include:

    hello, hello_other       Hello world
    master, slave            Master/slave program
    spmd                     SPMD program
    gexample                 Group and collective operations
    timing, timing_slave     Tests communication performance
    hitc, hitc_slave         Dynamic load balance example
    xep, mtile               Interactive X-Window example

Examples come with Makefile.aimk files.
Both C and Fortran versions for some examples.

Compiling Applications

Header files
- C programs should include pvm3.h
  - Specify include directory: cc -I$PVM_ROOT/include ...
- Fortran: INCLUDE '/usr/local/pvm3/include/fpvm3.h'

Compiling and linking
- C programs must be linked with
      libpvm3.a     (always)
      libgpvm3.a    (if using group library functions)
      (possibly other libraries, for socket or XDR functions)
- Fortran programs must additionally be linked with libfpvm3.a
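For example, a C program that uses the group functions might be compiled and linked as follows; the library directory assumes the default installed layout described earlier ($PVM_ROOT/lib/$PVM_ARCH), and some systems additionally need socket libraries such as -lsocket -lnsl:

    cc -I$PVM_ROOT/include myprog.c \
       -L$PVM_ROOT/lib/$PVM_ARCH -lgpvm3 -lpvm3 -o myprog

A Fortran program would add -lfpvm3 ahead of -lpvm3 on the same line.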
Compiling Applications, Cont'd

Aimk
- Shares single makefile between architectures
- Builds for different architectures in separate directories
- Determines PVM architecture
- Runs make, passing it PVM_ARCH
- Does one of three things
  - If $PVM_ARCH/[Mm]akefile exists:
    Runs make in subdirectory, using that makefile
  - Else if Makefile.aimk exists:
    Creates subdirectory, runs make using Makefile.aimk
  - Otherwise:
    Runs make in current directory

Load Balancing

- Important for application performance
- Not done automatically (yet?)
- Static: assignment of work or placement of tasks
  - Must predict algorithm time
  - May have different processor speeds
  - Externally imposed static machine loads
- Dynamic: adapting to changing conditions
  - Make simple scheduler, e.g. Bag of Tasks
    - Simple, often works well
    - Divide work into small jobs
    - Given to processors as they become idle
    - PVM comes with examples: C - xep, Fortran - hitc
  - Can include some fault tolerance
    - Work migration: cancel / forward job
      - Poll for cancel message from master
      - Can interrupt with pvm_sendsig
      - Kill worker (expensive)
    - Task migration: not in PVM yet
- Even with load balancing, expect performance to be variable
Six Examples

- Circular messaging
- Inner product
- Matrix vector multiply (row distribution)
- Matrix vector multiply (column distribution)
- Integration to evaluate pi
- Solve 1-D heat equation

Circular Messaging

A vector circulates among the processors; each processor fills in a part of the vector.

(Figure: processors P1 through P5 passing the vector around a ring.)

Solution: SPMD

Uses the following PVM features:
- spawn
- group
- barrier
- send-recv
- pack-unpack
      program spmd1
      include '/src/icl/pvm/pvm3/include/fpvm3.h'
      PARAMETER( NPROC=4 )
      integer rank, left, right, i, j, ierr
      integer tids(NPROC-1)
      integer data(NPROC)
C Group Creation
      call pvmfjoingroup( 'foo', rank )
      if( rank .eq. 0 ) then
         call pvmfspawn( 'spmd1', PVMDEFAULT, '*', NPROC-1, tids(1), ierr )
      endif
      call pvmfbarrier( 'foo', NPROC, ierr )
C compute the neighbours IDs
      call pvmfgettid( 'foo', MOD(rank+NPROC-1,NPROC), left )
      call pvmfgettid( 'foo', MOD(rank+1,NPROC), right )
      if( rank .eq. 0 ) then
C I am the first process
         do 10 i=1,NPROC
 10         data(i) = 0
         call pvmfinitsend( PVMDEFAULT, ierr )
         call pvmfpack( INTEGER4, data, NPROC, 1, ierr )
         call pvmfsend( right, 1, ierr )
         call pvmfrecv( left, 1, ierr )
         call pvmfunpack( INTEGER4, data, NPROC, 1, ierr )
         write(*,*) ' Results received :'
         write(*,*) (data(j),j=1,NPROC)
      else
C I am an intermediate process
         call pvmfrecv( left, 1, ierr )
         call pvmfunpack( INTEGER4, data, NPROC, 1, ierr )
         data(rank+1) = rank
         call pvmfinitsend( PVMDEFAULT, ierr )
         call pvmfpack( INTEGER4, data, NPROC, 1, ierr )
         call pvmfsend( right, 1, ierr )
      endif
      call pvmflvgroup( 'foo', ierr )
      call pvmfexit( ierr )
      stop
      end

Inner Product

Problem: In parallel compute

    s = x^T y = sum_{i=1..n} x(i) * y(i)

(Figure: X and Y divided into parts; each part contributes a partial ddot.)

Solution: Master - Slave

Uses the following PVM features:
- spawn
- group
- barrier
- send-recv
- pack-unpack

The master sends out data, collects the partial solutions and computes the sum.
The slaves receive data, compute the partial inner product and send the results to the master.
Inner Product - Pseudo code

Master

    Ddot = 0
    for i = 1 to NPROC-1
      send ith part of X to the ith slave
      send ith part of Y to the ith slave
    end for
    Ddot = Ddot + Ddot(remaining part of X and Y)
    for i = 1 to NPROC-1
      receive a partial result
      Ddot = Ddot + partial result
    end for

Slave

    Receive a part of X
    Receive a part of Y
    partial = Ddot(part of X and part of Y)
    send partial to the master

      program inner
      include '/src/icl/pvm/pvm3/include/fpvm3.h'
      PARAMETER( NPROC=7 )
      PARAMETER( N = 100 )
      double precision ddot
      external ddot
      integer remain, nb
      integer rank, i, ierr, bufid
      integer tids(NPROC-1), slave, master
      double precision x(N), y(N)
      double precision result, partial

      remain = MOD(N,NPROC-1)
      nb = (N-remain)/(NPROC-1)

      call pvmfjoingroup( 'foo', rank )
      if( rank .eq. 0 ) then
         call pvmfspawn( 'inner', PVMDEFAULT, '*', NPROC-1, tids, ierr )
      endif
      call pvmfbarrier( 'foo', NPROC, ierr )
      call pvmfgettid( 'foo', 0, master )
C MASTER
      if( rank .eq. 0 ) then
C Set the values
         do 10 i=1,N
            x(i) = 1.0d0
 10         y(i) = 1.0d0
C Send the data
         count = 1
         do 20 i=1,NPROC-1
            call pvmfinitsend( PVMDEFAULT, ierr )
            call pvmfpack( REAL8, x(count), nb, 1, ierr )
            call pvmfpack( REAL8, y(count), nb, 1, ierr )
            call pvmfgettid( 'foo', i, slave )
            call pvmfsend( slave, 1, ierr )
            count = count + nb
 20      continue
         result = 0.d0
C Add the remaining part
         partial = ddot(remain,x(N-remain+1),1,y(N-remain+1),1)
         result = result + partial
C Get the result
         do 30 i =1,NPROC-1
            call pvmfrecv( -1, 1, bufid )
            call pvmfunpack( REAL8, partial, 1, 1, ierr )
            result = result + partial
 30      continue
         print *, ' The ddot = ', result
C SLAVE
      else
C Receive the data
         call pvmfrecv( -1, 1, bufid )
         call pvmfunpack( REAL8, x, nb, 1, ierr )
         call pvmfunpack( REAL8, y, nb, 1, ierr )
C Compute the partial product
         partial = ddot(nb,x(1),1,y(1),1)
C Send back the result
         call pvmfinitsend( PVMDEFAULT, ierr )
         call pvmfpack( REAL8, partial, 1, 1, ierr )
         call pvmfsend( master, 1, ierr )
      endif
      call pvmflvgroup( 'foo', ierr )
      call pvmfexit( ierr )
      stop
      end

Matrix - Vector Product
Row Distribution

Problem: In parallel compute y = y + A x, where y is of length m, x is of length n and A is an m x n matrix.

Solution: Master - Slave

Uses the following PVM features:
- spawn
- group
- barrier
- send-recv
- pack-unpack

(Figure: A partitioned by rows among P1-P4; each slave updates its rows of Y.)
Matrix - Vector Product
Row Distribution - Pseudo Code

Master

    for i = 1 to NPROC
      send X to the ith slave
      send Y to the ith slave
    end for
    for i = 1 to NPROC
      receive a partial result from a slave
      update the corresponding part of Y
    end for

Slave

    Receive X
    Receive Y
    Compute my part of the product and update my part of Y
    Send back my part of Y

      program matvec_row
      include '/src/icl/pvm/pvm3/include/fpvm3.h'
C
C y <--- y + A * x
C
C A : MxN (visible only on the slaves)
C X : N
C Y : M
C
      PARAMETER( NPROC = 4 )
      PARAMETER( M = 9, N = 6 )
      PARAMETER( NBY = INT(M/NPROC)+1 )
      double precision X(N), Y(M)
      integer tids(NPROC)
      integer mytid, rank, i, ierr, from

      call pvmfmytid( mytid )
      call pvmfjoingroup( 'foo', rank )
      if( rank .eq. 0 ) then
         call pvmfspawn( 'matvecslv_row', PVMDEFAULT, '*', NPROC, tids, ierr )
      endif
      call pvmfbarrier( 'foo', NPROC+1, ierr )
C Data initialize for my part of the data
      do 10 i = 1,N
         x(i) = 1.d0
 10   continue
      do 15 i = 1,M
         y(i) = 1.d0
 15   continue
C Send X and Y to the slaves
      call pvmfinitsend( PVMDEFAULT, ierr )
      call pvmfpack( REAL8, X, N, 1, ierr )
      call pvmfpack( REAL8, Y, M, 1, ierr )
      call pvmfbcast( 'foo', 1, ierr )
C I get the results
      do 20 i = 1, NPROC
         call pvmfrecv( -1, 1, ierr )
         call pvmfunpack( INTEGER4, from, 1, 1, ierr )
         if( from .EQ. NPROC ) then
            call pvmfunpack( REAL8, Y((from-1)*NBY+1),
     $           M-NBY*(NPROC-1), 1, ierr )
         else
            call pvmfunpack( REAL8, Y((from-1)*NBY+1),
     $           NBY, 1, ierr )
         endif
 20   continue

      write(*,*) 'Results received'
      do 30 i=1,M
         write(*,*) 'Y(',i,') = ',Y(i)
 30   continue
      call pvmflvgroup( 'foo', ierr )
      call pvmfexit( ierr )
      stop
      end

      program matvecslv_row
      include '/src/icl/pvm/pvm3/include/fpvm3.h'
C
C y <--- y + A * x
C A : MxN (visible only on the slaves)
C X : N
C Y : M
C
      PARAMETER( NPROC = 4 )
      PARAMETER( M = 9, N = 6 )
      PARAMETER( NBY = INT(M/NPROC)+1 )

      double precision A(NBY,N)
      double precision X(N), Y(M)

      integer rank, i, ierr, to

      external dgemv

      call pvmfjoingroup( 'foo', rank )
      call pvmfbarrier( 'foo', NPROC+1, ierr )

C Data initialize for my part of the data
      do 10 j = 1,N
         do 20 i = 1,NBY
            A(i,j) = 1.d0
 20      continue
 10   continue

C I receive X and Y
      call pvmfrecv( -1, 1, ierr )
      call pvmfunpack( REAL8, X, N, 1, ierr )
      call pvmfunpack( REAL8, Y, M, 1, ierr )

C I compute my part
      if( rank .NE. NPROC ) then
         call dgemv( 'N', NBY, N, 1.d0, A, NBY,
     $        X, 1, 1.d0, Y((rank-1)*NBY+1), 1 )
      else
         call dgemv( 'N', M-NBY*(NPROC-1), N, 1.d0, A, NBY,
     $        X, 1, 1.d0, Y((NPROC-1)*NBY+1), 1 )
      endif
C I send back my part of Y
      call pvmfinitsend( PVMDEFAULT, ierr )
      call pvmfpack( INTEGER4, rank, 1, 1, ierr )
      if( rank .NE. NPROC ) then
         call pvmfpack( REAL8, Y((rank-1)*NBY+1), NBY, 1, ierr )
      else
         call pvmfpack( REAL8, Y((rank-1)*NBY+1), M-NBY*(NPROC-1), 1, ierr )
      endif
      call pvmfgettid( 'foo', 0, to )
      call pvmfsend( to, 1, ierr )
C done
      call pvmflvgroup( 'foo', ierr )
      call pvmfexit( ierr )
      stop
      end

Matrix - Vector Product
Column Distribution

Problem: In parallel compute y = y + A x, where y is of length m, x is of length n and A is an m x n matrix.

Solution: Master - Slave

Uses the following PVM features:
- spawn
- group
- barrier
- reduce
- send-recv
- pack-unpack

(Figure: A partitioned by columns among P1-P4; each slave computes a contribution to all of Y.)
Matrix - Vector Product
Column Distribution - Pseudo Code

Master

    for i = 1 to NPROC
      send X to the ith slave
    end for
    Global Sum on Y (root)

Slave

    Receive X
    Compute my contribution to Y
    Global Sum on Y (leaf)

      program matvec_col
      include '/src/icl/pvm/pvm3/include/fpvm3.h'
C
C y <--- y + A * x
C
C A : MxN (visible only on the slaves)
C X : N
C Y : M
C
      PARAMETER( NPROC = 4 )
      PARAMETER( M = 9, N = 6 )
      double precision X(N), Y(M)

      external PVMSUM

      integer tids(NPROC)
      integer mytid, rank, i, ierr

      call pvmfmytid( mytid )
      call pvmfjoingroup( 'foo', rank )
      if( rank .eq. 0 ) then
         call pvmfspawn( 'matvecslv_col', PVMDEFAULT, '*', NPROC, tids, ierr )
      endif
      call pvmfbarrier( 'foo', NPROC+1, ierr )
C Data initialize for my part of the data
      do 10 i = 1,N
         x(i) = 1.d0
 10   continue
      do 15 i = 1,M
         y(i) = 1.d0
 15   continue
C Send X
      call pvmfinitsend( PVMDEFAULT, ierr )
      call pvmfpack( REAL8, X, N, 1, ierr )
      call pvmfbcast( 'foo', 1, ierr )

C I get the results
      call pvmfreduce( PVMSUM, Y, M, REAL8, 1, 'foo', 0, ierr )

      write(*,*) 'Results received'
      do 30 i=1,M
         write(*,*) 'Y(',i,') = ',Y(i)
 30   continue
      call pvmflvgroup( 'foo', ierr )
      call pvmfexit( ierr )
      stop
      end

      program matvecslv_col
      include '/src/icl/pvm/pvm3/include/fpvm3.h'
C
C y <--- y + A * x
C A : MxN (visible only on the slaves)
C X : N
C Y : M
C
      PARAMETER( NPROC = 4 )
      PARAMETER( M = 9, N = 6 )
      PARAMETER( NBX = INT(N/NPROC)+1 )

      external PVMSUM

      double precision A(M,NBX)
      double precision X(N), Y(M)

      integer rank, i, ierr

      external dgemv

      call pvmfjoingroup( 'foo', rank )
      call pvmfbarrier( 'foo', NPROC+1, ierr )

C Data initialize for my part of the data
      do 10 j = 1,NBX
         do 20 i = 1,M
            A(i,j) = 1.d0
 20      continue
 10   continue

C I receive X
      call pvmfrecv( -1, 1, ierr )
      call pvmfunpack( REAL8, X, N, 1, ierr )
C I compute my part
      if( rank .NE. NPROC ) then
         call dgemv( 'N', M, NBX, 1.d0, A, M,
     $        X((rank-1)*NBX+1), 1, 1.d0, Y, 1 )
      else
         call dgemv( 'N', M, N-NBX*(NPROC-1), 1.d0, A, M,
     $        X((NPROC-1)*NBX+1), 1, 1.d0, Y, 1 )
      endif

Integration to Evaluate pi

Compute approximations to pi by using numerical integration.

Know that tan(45 degrees) = 1, which is the same as tan(pi/4) = 1.

So that:

    pi/4 = arctan(1) = integral from 0 to 1 of 1/(1+x^2) dx,
    i.e.  pi = integral from 0 to 1 of 4/(1+x^2) dx.
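A rough SPMD sketch of this computation in C with PVM, using the midpoint rule and the group reduce shown earlier. The group name, message tag, number of intervals and the assumption that the executable is installed under the name "pi" where pvmd can find it are all illustrative; the tutorial's own worked solution is separate.

    #include <stdio.h>
    #include "pvm3.h"

    #define NTASKS 4                 /* total group members     */
    #define N      100000            /* number of sub-intervals */

    /* Instance 0 spawns the other copies; every instance integrates
       4/(1+x*x) over its interleaved strips of [0,1] by the midpoint
       rule, and PvmSum collects the global result on instance 0. */
    int main(void)
    {
        int i, me = pvm_joingroup("pi");
        double h = 1.0 / N, x, sum = 0.0;

        if (me == 0) {
            int tids[NTASKS - 1];
            pvm_spawn("pi", (char **)0, PvmTaskDefault, "",
                      NTASKS - 1, tids);
        }
        pvm_barrier("pi", NTASKS);

        for (i = me; i < N; i += NTASKS) {      /* my strips */
            x = (i + 0.5) * h;
            sum += 4.0 / (1.0 + x * x);
        }
        sum *= h;

        pvm_reduce(PvmSum, &sum, 1, PVM_DOUBLE, 5, "pi", 0);
        if (me == 0)
            printf("pi is approximately %.12f\n", sum);

        pvm_lvgroup("pi");
        pvm_exit();
        return 0;
    }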