Lecture 2

Physical Organization

- (UMA) machines

P M All memory is equally far away from all processors. I P M

Early parallel processors like NYU Ultracomputer Problem: why go across network for instructions? read-only data? what about caches?

- Non-uniform memory access (NUMA) machines: M P I Access to local memory is usually 10-1000 times P faster than access to non-local memory M

Static and dynamic locality of reference are critical for high performance. Compiler support? Architectural support?

Bus-based symmetric multiprocessors (SMP's): combine both aspects ¡ Logical Organization - Model

x M single P address space P I (conceptual picture) P P P P M

- hardware/systems software provide single address space model to applications programmer - some systems: distinguish between local and remote references

- communication between processors: read/write shared memory locations: put get

- Model (Message Passing)

M P P P I (conceptual picture) P P M P

- each processor has its own address space - communication between processors: messages (like e-mail) - basic message-passing commands: send receive Key difference: In SMM, P1 can access remote memory locations w/o prearranged participation of application program on remote processor ¢ Message Passing £ Blocking SEND/RECEIVE : couple data transfer and synchronization

- Sender and receiver rendezvous to exchange data P P

SrcP DestP

...... M F M F x : = .... RECEIVE(y,SrcP) SEND(x, DestP) ...... History: Caltech Cosmic Cube

- SrcP field in RECEIVE command permits DestP to select which processor it wants to receive data from - Implementation: - SrcP sends token saying `ready to send' - DestP returns token saying `me too' - Data transfer takes place directly between application programs w/o buffering in O/S - Motivation: Hardware `channels' between processors in early multicomputers - Problem: - sender cannot push data out and move on - receiver cannot do other work if data is not available yet one possibility: new command TEST(SrcP,flag): is there a message from SrcP?

Overlapping of computation and communication is critical for performance ¤ Non-blocking SEND/RECEIVE : decouple synchronization from data transfer

SrcP DestP P P

...... x : = .... RECEIVE(y,SrcP,tag,flag) M F M F SEND(x, DestP,tag) ......

Network - SrcP can push data out and move on - Many variation: return to application program when - data is out on network? - data has been copied into an O/S buffer? - Tag field on messages permits receiver to receive messages in an order different from order that they were sent by SrcP - RECEIVE does not block - flag is set to true by O/S if data was transfered/false otherwise - Applications program can test flag and take the right action - What if DestP has not done a RECEIVE when data arrives from SrcP? - Data is buffered in O/S buffers at DestP till application program does a RECEIVE Can we eliminate waiting at SrcP ? Can we eliminate buffering of data at DestP ? ¥ Asynchronous SEND/RECEIVE

SrcP DestP P P

...... x : = .... IRECEIVE(y,SrcP,tag,flag2) M F M F ISEND(x, DestP,tag,flag1) ......

Network

- SEND returns as soon as O/S knows about what needs to be sent - `Flag1' set by O/S when data in x has been shipped out - Application program continues, but must test `flag1' before overwriting x - RECEIVE is non-blocking: - returns before data arrives - tells O/S to place data in `y' and set `flag' after data is received - `posting' of information to O/S - `Flag2' is written by O/S and read by application program on DestP - Eliminates buffering of data in DestP O/S area if IRECEIVE is posted before message arrives at DestP ¦ So far, we have looked at point-to-point communication

Collective communication: - patterns of group communication that can be implemented more efficiently than through long sequences of send's and receive's

- important ones: - one-to-all broadcast (eg. A*x implemented by rowwise distribution: all processors need x) - all-to-one reduction (eg. adding a set of numbers distributed across all processors) - all-to-all broadcast every processor sends a piece of data to every other processor - one-to-all personalized communication one processor sends a different piece of data to all other processors - all-to-all personalized communication each processor does a one-to-all communication § Example: One-to-all broadcast (intuition: think `tree')

3

2 0 3 7 1 1 Messages in each phase 6 2 do not compete for links 3 3 5 4 2

3

Assuming message size is small, time to send a message = Ts + h*Th where Ts = overhead at sender/receiver Th = time per hop

Total time for broadcast = Ts + Th*P/2 + Ts + Th*P/4 + ..... = Ts * logP + Th*(P-1)

Reality check: Actually, a k-ary tree makes sense because processor 0 can send many messages by the time processor 4 is ready to participate in broadcast ¨ Other topologies: use the same idea

1 2 2 2 2

2-D Mesh

Step 1: Broadcast within row of originating processor

Step 2: Broadcast within each column in parallel

Time = Ts logP + 2Th*(sqrt(P) -1)

© Example: All-to-one reduction

1

2 0 1 7 1 1 Messages in each phase 6 2 do not compete for links 3 1 5 4 2

1

Purpose: apply a commutative and associative operator (reduction operator) like +,*,AND,OR etc to values contained in each node

Can be viewed as inverse of one-to-all broadcast Same time as one-to-all broadcast

Important use: determine when all processors are finished working (implementation of `barrier')

Example: All-to-all broadcast

6 7 0 7

0 0 6 7 5 7 1 1 1 0 6 2 6 2 3 3 5 2 5 1 5 4 4 4 2 4 3 3

- Intuition: cyclic shift register - Each processor receives a value from one neighbor , stores it away, and sends it to next neighbor in the next phase. - Total of (P-1) phases to complete all-to-all broadcast

Time = (Ts + Th) *(P-1) assuming message size is small - Same idea can be applied to meshes as well: - first phase, all-to-all broadcast within each row - second phase, all-to-all broadcast within each column ¡ A Message-passing Program

¢ MPI: Message-Passing Interface

Goal: Portable Parallel Programming for Distributed Memory Computers

- Lots of vendors of Distributed Memory Computers: IBM,NCube, , CM-5, .....

- Each vendor had its own communication constructs => porting programs required changing parallel programs even to go from one distributed memory platform to another!

- MPI goal: standardize message passing constructs syntax and semantics

- Mid 1994: MPI-1 standard out and several implementations available (SP-2) £ Write an MPI program to perform matrix-vector multiply

A b

- Style of programming: Master-Slave - one master, several slaves - master co-ordinates activities of slaves - Master initially owns all rows of A and vector b

- Master broadcasts vector b to all slaves

- Slaves are self-scheduled - each slave comes to master for work - master sends a row of matrix to slave - slave performs product, returns result and asks for more work

- Very naive algorithm, but it's a start.

¤ Key MPI Routines we will use:

MPI_INIT : Initialize the MPI System MPI_COMM_SIZE: Find out how many processes there are MPI_COMM_RANK: Who am I? MPI_SEND: Send a message

MPI_SEND(address,count,datatype,DestP,tag,comm)

permits entire data structures identifies group to be sent with one command

MPI_RECV: Receive a message (blocking receive) MPI_FINALIZE: Terminate MPI MPI_BCAST: Broadcast

H

?

?



) - - -

!" / /;" " " " !" "

) . . . .

4  4

*5 *5 *5 *5 *5

*

5 2 5

$ $

. ) ' #.



%



"

& &

# #- #- #- #- - :

4 4

% % % % %

2 2

!"

!$ $ A A /

& &

) #-

2

6 

% % % % %

2 2 *

<

"

"

!

------% & ) &

'

6 6 6 6

% %

!

    

- ) ) ) ) & &

4 2

%&

   

- -

B # B , 0 B B

! 

& &

#-

/ /

) ) )

)

& &

' ) *

"

! !

- - .



 1

 B 4

* 8 8 8 8 I

& &

2

7

! )



& = = =

 0

(

B

 

/ - -

:



*5 ?

 B

*5

*

<

!"

 1

D

 #

 1 4   

>? >?

8

* * *

 -

'  F

&

0  

&  * ? ?

 

2 2



#)

/ #

. . 5 -



 B

3

* * 8 8

<

!" $

"



9

 1 B

8

* *

5 5

!"

-



* ,-



*+ 1 

C

7

" "

! )



2 E 

%

C

A$

)

#- -

5

9 2 

2



*G

<

A



#- 1

)

(

 - -

5

*

<

< <

  '

 )



2

1 )

,-  

=  '- #



# )

@

1

2

"

#

 

) B B

5 3

0

8

$

 -

/

5

 -

5

>?  @

8

@

$

#.

1 5

 8 8

?

<

! #

)

4

 3

 



?

 !

/ ) -

 )

1

9 

1  

. # .

4

.

 3 

5

1

!$

)

-



5

 

8

. !



-

,

5





8 8

 /;$



%

 1 4 9 9

 1

 

2



# :

#

 





4



1 #-

8 0

* 2

8

- -

/

/

* 2 - *  

 1



 

@

)



5 2

% 1 1

"

. /;" .

) 4





0 1  



-

5 1 5

4

! $

#

) #

%

1

@

 =

0

2

2 1

2

3

/

 ) !



* 2 -

*

A

! "

# . /

)

1 4

?

>

 / .

D

.

?

2

<

 $ $

- #

4

%

?

A -

 



*

)



1

@ 2

2

<

-

)



2 #

8

<



 

 



2

$

-

% *

B

)

'

5

 @



8

)

( 



8

#

1

2

@

J

*

9



3

- @ %







@

.

4

$

4 

<



5

-







%

8

B

*5

8

!

-

6

 C



-

,

2



@

) "

2

@

-

6

2

.

*

5

L M

H

? ? ? >?



.



N

A

K K 

*

<



'

*

5 5

@

" $ $ !"

. . . - - . .

4 4 4 

2 2 *5

<

=

5 5 5 5 

!

! !

)

5 %

2 *

$ " A

.  . . . .

- -

P # L 4 # # 4 0 #

D  H

4 4

% 

? ? 2 ?

"

"

)



5 5

*

*5 5 *5 *

<

"

A

! A

 

$  $

, 4

%

*

$

A

)

 

M &

-  # B # B

* - * # - *

% ? ?

4 >

2 %

@

<

)

#. . - 



*5

5 * *5 *5

*G

<

. 

= ' = = = = =

$ $ 1

* -

8 8

@

-





B '- :

D D

 F

- ? ? - &

#

 *

" "



1 : 1 1

I ) #

> > D > >

) #.

"

= = # = = 



5 

8



*

@

!" "

 $

. .

5

0

3 

<

$ -

/

1 = 1

 - 



* 5

> >

D %



8

"

.

! )

: 



%

1 9 2



*G

@

<

!

9 . / . #-

) 4

A  * -

1

2 2

#

=

- )

.

3 3

2

<

"

A



-



$ :

A 

%

5



2

"

. .

 

B )

1

2 2

#.



5

# O

' @

2

<

#

1

%

'- * 3

8

(

"

/

) # -

-

 1 4

) 3 *



2

> >

)

. - )

)



D

1

#

)

 3

5



@

 A

-

-

 



/

. )



B

* 5

@

8

.

#)

9

 2



 "

%

 8

2

 4

B -

1 )



2

 *







2

1 0

2

 /;"





2

"

,



B



 -

#

*



$

)





%

8

3

8

) -

 -













2

@

2

 

B





 

B



8





1 



B

"

!







Q



B

2



#-

 )





1

 1

1

*

& *

P S

R

? ? ?

K K K

/

. . )

4

5

- -

*5

" " /

- . )

S #

* 2

5

?

'

#

%

5 5

*5

<

.

. - $

5 4

D % %

2 *5

" $ $

. . .

4

- D #

5

4 - *

% %

*

<

"

$

A

) ' #

5

*5

   ! A

= #

$

5 %

<

A$

!   

# B - B B 1

R

- % -

? 2

2

<

) #

= 1

- -

3

2 >

2 *5 5 2

5

 . -

$

' # 



8 8 8

5 5

 .

- D

' B

- 2 #

0 

-

4 T

"

"

#

/

:

) -  

# ) -

4

=

& #

   5

%

1 8 )

<

$

#

1

0 

2

T

@

<

1

/

%

 " -

2

>

D

8

@ @

< <



# :-

&



1 1 9

2 

5

@ >

. =  = = 1

B )

*



<

"

)

=

5 5



1

2

<

" $ $

. 

1

5 A$

 % > 

8

B 



B



1

2 T

! !

: 1

5 2 > '

 @

8

<

$



- -

 -

8

2 ' 2 2



-

 =

B

*



 )

5 5

 @

)



!



1

8

#

 1 )

B

3

-  B # #

O O

9 8

T

2 

-

 

5

.





5

 8

@

8

> >



O

#  9

 



@

%

 



1

<

1

1 2 *



>

  



@

2

1 1



D

"



#



@



B

8

- -

$

#

2 *



)



5

8

)



1 

2

-

4

B 

)



@

-

*  1

B

)





)



.

4



1

8

5





@

-

U







B #

B

0



/

-

8

B



1







1



8



-

2



5



-

4

)

1  1

V

H

? >? ?



? ?



- -

 K K K

"

5 2 5



#.

- -

4 4

%

" "

 /

2

5 *

%

*

"

'

% %

'



"



-

4

% %

2

"

. . .  .

-

, 4 6 B 6

 % 5 %

*

<

'



5

2

"

"

   

#

%

8 I

2

2

) . .

# #

   

B B B # B

5

B

? %

>

- 

= $

#

?

*5

 

 $



? 2

8 8 8 8

  

- V B B

& H



-



 ? ?



2

=

* 

 

? ? B

<

"







8 8

 

*

=



,- 5



 

B

C

8





 

8

<

O 1



   A$

9 2 >





8

< <

A

1

A  - 

1 1

 

*

A$ .

"

) '



8

<

 ) -

9 '



 

5

# . 

3

'

%  )

1

@

- 

&

B

-

'

1 1

2

2

@

-

%

5 ) 



-

1

*

8

2 1

.

) 1

F

&



>

<

.

)

*

 -

*  %

@

! . )

 

B

4

1



2

@

/;$  )

B

A

5

8

 -

8

9



<





#-

 

8

@

*



B

#

 

@



 0



.

4 

/

 -

8

1







#







@

2



8

"

/

)

# #

3

 



8

. -





$

#

5

 

$

4



B

8



1

 B 

2

B 

#

B 



-

*

B



2

$

)

1  

)

5



1 B



W

"

!

#

@

X

!"

*

1 

% 2

#-

2

!"

.

4

) #-

)

5 1

-

1 2

-

#

"

)

!

-

5

1

2 2

"

#

2

0

/

- -

6 1

"

- ) )

!" / /;" " " " / !"

) . . ) . . . .

4 4 4

*5 *5 *5 *5 *5

* 5 5

$ $

. ) '

5

"

# #- #- #- #- : - :

4

% % % %

*5 2 2

/

 )

!$ $ A A

) # # #-

5 2

6

% % % %

* 2 2

<

$

 #.

"

! B

------)

' # #

6 6 6 6 0 0

% %

  

/ /

- ) ) ) )

- -

4 *5 0

%&

   ! /

B B B 1

! /

#- -

8

/ /

) ) )

/

) #



' )

"

! !

- - .



1 A

B . .

* 8 8 8

7

!

 1

======

 0

(

)

 

T

/ )

:



?

B

*5

*

<

!"

 1 . M

D H

 #

1 4  

> >? >?

8 #

* * *

2

&



' 

0  

&  ? ?



2 2 T

<

$

#) 

/ -

. . 5 -





3

* * 8 8

<

!" $



 %

9

 1 B

8

5 5

-

# 

B

* &



*+ 1  2

2

7

" "

! )

#

2 E

C 2

A$

)

#- -

5 3

9  8

2



"



1

) H

(



5

T

<

< <

  '



2

1

@

,-  

=  '- # ,-



# 

@

1

!

#

 

) B B

5 3

0

8 

"

$

-

/ 0

5

 -

>? 

8

8

$

#.

1 5 9 *

 8 8

?



<

! #

)

1 4

 

 



?

!

/ )

)

1

9 

1  

. -

4

. %

  

1



!$

"

)

5

 

1 T

8

.

.



-

,





8 8





%

 1 4 9 9

2

 1

 

2

'

# :

# #

 

>





1

8 0

* 2

8

- -

/ 1

H

/

* 2 -  

 1 @



  T

@

)

5

% 1 1

"

/

. )

)

5

0 1  



$

.

-

5 1 5

4

! $

)

%

1

@

!

=

-

0

2

2 1

2

! /

)

* 2

*

A

! "

# /

) .

1 4

?

>

2

/ .

D

.

-

?

2

<

 $

- #

%

?

A -





*

1

2

2Y

)

1

@ 2

2

<

-

)



2

8

T

<



 



T

-

% *

B

)

'

*

 @



8

)

2

( 



8

W

"

1

2

@

9



3

%

- @



*





@

,-

$

4 

<









%

8

T

B

*5

8

6

 C



-

,



@

)

@

2

.

*

5

L M

H

? ? ? >?

K K K

*

'

5 5

" $ $ !"

. . . - . - .

4 4 4

2 2 *5

<

5 5 5 5

!

! !

)

5 %

2 *

$ " " A

- . . . . .

- -

P # L 4 # 4 # 0 #

D H

4 4

2

%

? ? 2 ?

"

"

)

5 5 5

*

*5 5 *5 *

<

"

A

! A

 

$ $

, 4

% %

*

$

A

)

 

M &

- - # # B B

4 * - * # - *

% % ? ?

4 >

2 %

@

<

"

)

#. - .

)

*5

5 * *5 *5

*G

<

.  

= ' = = = = =

$ $ 1

5

* -

8 8

@

-

 

B B '- :

D D

F 

- ? ? - &

#

 *

" "

: = 1 1 1

) #

2 > > D > >

) #.

"

= = # #  = =



5 

8 8



*

@

!" "

$

. .

2 5

0 0

3 

<

" $ -

#

/ /

1 = 1

- -  

* 5

> >

D %



8

"

.

! )

:



%

1 1 2 9



*G

@

<

$ !

# . / . #-

) 4

A *  -

1

2 2

#

=

- )

" .

3 3

2

< <

"

A

 

-

$ :

A 

%

2 5



5

2

"

. .

 

B B )

1

2 2

<

5

 # O

' 1 @

2 2

<

#

1

%

 '- * 3

8 8

B

(

/

) # -

-

1 4

) 3 *

 

2

> >

1 . - )

)

> D

1 

8

#

)

  3

5



@



-

-

 



/

. )

 

B



5

@

8 8

 .

B

9 9

  2



%

  8

2

4

 -

1



 8

2

  *

 

 

@

2



1 1

2

 /;"



2



B





-

#

* 2





"

)

#



8 

3

8

)

 -





8

$

#







@

2

 1 

B

2

B 





 B

-

*

B



8

)





)

1 

 

1

B

@

W

"

!



W



B



B

2



#-

)



8

1

1





1

*



8







1 

P S V

R

? ? ? ?

-

K K K K

%

-

-

2

/ " "

. . )

4

2

*

"

'

% %

5

"

- -

-

% %

*5 2

" /

. ) . .

-

, S # 4 6

*

5 % 5 %

?

<

'

#

%

5

2

*5

<

"

.

"

   

. - $ #

5 4

D %

2 *5 2

2

" $ $

. . . ) .

#

   

4

D # B B B B

5 5

- *

% ?

>

*

<

$

-

A = $

' # #

5

?

*5

  ! A

= #

$ 

%

? 2

8 8 8 8

<

A$

!  

# - B B - 1

R & H



- % -



? 2

 ?

2 2

<

) #

1 =

- -

3 * 

 

>

2 *5 5 2 ? ?

5

<

"

 . -

$

'





8 8

 

5 5

*

 . =

- D

' B

- # ,- 5



-

 

4 T B

8

"

:

) 

# ) -

4

 

8

<

= O 1

& #

   5 A$

%

8 ) 9 2 >





8

<

<

A

1

1

0  A 

1 1

2

 

T

@

<

1 A$ .

/

%

 -

2

> '



D

8

@ @

<

# :-

&

  -

1 1 9 '



 

@ >

# . 

. = = = 1

)

3 *

'

 % )

1

@

"

)

- 

&

= B

5 5

-

'

1

2

<

" $ $

.  -

1

5 A$ %

% >  5 )



 -

B

1

*

8

1 2

2 T

! ! .

: )

F

&



5 2 '

@

<

<

$

 .

- - )

-

8

2 ' 2 2

*

- 

 =

B

*  %



)

5 5

@

@

! ) .



! B

4

1



2

1

@

#

/;$ 1 )

3

-  B # # A

O O 5

8

T

2

8

-

 -

 8

9



5

<

. 

 #-

5  

@

8

*

> >



B

#

O

# 9 

 

@

%

  

0



<

.

4 

/

1

 -

8

1 2 * 1



>



 

 

@

#



1 1 

D

@



2

8

@

"

/

)

 # #

B

3

 



- -

. -

2 *



) $

 #

5

8

5

 

) $

4



B

8

-

4



1

  

2

)

B

@

 1 #





-

*

 B

2

$

)

 1  

.

)

4

5

B

8

5







W

"

!

- #

@

Z

*

 1 

#

% 2

B

#-

0



!"

.

4

/ )

-

B

5 1

1



-

2

-

#



1

)

!

-

1

-

2

5 2

"

2

-

4

) -

6 1

H

>? ?

? ?

- -

K

"

5 2 5

#.

-

4 4

%

/

5

%

*

'



4

"

.  .

B 6

*

'

5

%

8 I

.

#

#

B

%

*5

 

$



 

V B B

-

?



B



8 8



C





 

<

- 

*

"

)

8

<

)

9



5



1

2

@





1

1

>

-

*

)



 )

B



8

@











8





B



B





1

W[

!"

2

#-

)

1

"

5

2

#

0

/

- 1

z

g

^

\] w

m

b

_

b

d

m

e

e

j

g

^

f

_

g

`

e j

j m

^

j

_` n

n

l

|

acb

p

d

o

^

d

^m

g j

b

de

e

_

a

n

^

_

b

k

e

d

_

e

{

~

f

d q

j

h

d

_

gih

^m

n

l

`

]

e

d x

h

j

l

h

j

e

n

n

b

d

b

k

m

g

`

h

e

j

b

d

s

` j

n

n

^

|

l

u

d

g

`

^

e

`

]

}

d

f

j

^

r

e

_

d

_

k

_ }

€

d

s

b

h

j

q

`

n j

h

p

_

e

l

^

g

b

l

~‚

l

s

b

d t

d

n

o

^m

e

€

h

W

k

o

ƒ

€

j

`

h

s

^

w

m

dy

_n

r

o

h

u

q

b

s p

b

`

~

^

}

y

}

do

}

d

r

t

j

}

v

h

j

d

m

m

`

n

v

]

g

g

e

e e

j

j

m

k

k

h

h

j

j

`

]

l

l

d

l

_

^

y

jz

~

t

^m

s

k

`

_

`

e

]

`

^ _