Parallel Algorithms Parallel Models U Hypercube U Butterfly U Fully Connected U Other Networks U Shared Memory V.S

Parallel Algorithms Parallel Models u Hypercube u Butterfly u Fully Connected u Other Networks u Shared Memory v.s. Distributed Memory u SIMD v.s. MIMD The PRAM Model u Parallel Random Access Machine u All processors act in lock-step u Number of processors is not limited u All processors have local memory u One global memory accessible to all processors u Processors must read and write global memory A Pram Algorithm u Every Processor knows its own index (usually indicated by variable i) u Vector Sum: Read M[i] Into x; Read M[i+n] Into y; x := x + y; Write x into M[i]; Binary Fan-In Read M[i] into Largest; Write M[i] into M[i+n]; Delta := 1; For k := 1 to élg nù Read M[i+Delta] into x; Largest := Maximum(x,Largest); Write Largest into M[i]; Delta := Delta * 2; End For Parallel Addition Read M[i] into Total; Write 0 into M[i+n]; Delta := 1; For k := 1 to élg nù Read M[i+Delta] into x; Total := x + Total; Write Total into M[i]; Delta := Delta * 2; End For Pointer Jumping Read M[i] Into Total; For k := 1 to élg nù Read Next[i] into Ptr If Ptr ¹ 0 Then Read M[Ptr] Into x; Total := Total + x; Write Total into M[i]; Read Next[Ptr] Into NewPtr Write NewPtr into Next[i] End If End For Initialization of Next[i] If i = n Then Write 0 Into Next[i]; Else Write i+1 Into Next[i]; End If Calculate Node Depth I If there is a Left Child 1 -1 To “1” of Left Child 0 From “-1” of Left Child Calculate Node Depth 2 If there is no left child 1 -1 0 Calculate Node Depth 3 If there is a Right Child 1 -1 From “-1” of Right Child 0 To “1” of Right Child Calculate Node Depth 4 If there is no right child 1 -1 0 Concurrent Reads & Writes u EREW - Exclusive Read, Exclusive Write u CREW - Common Read, Exclusive Write u CRCW - Common Read, Common Write – All common writes must write the same thing – Highest Priority Processor wins contest u CREW is more powerful than EREW u CRCW is more powerful than CREW Finding Max u Square Array of Processors Indexed by i,j Write True into R[i]; Read M[i] into x; Read M[j] into y; If x < y Then Write False Into R[i]; Else If y < x Then Write False Into R[j]; End If CRCW V.S. CREW u CRCW Max runs in constant time u CREW Max runs in lg n time u CRCW cannot be any better than lg p faster than EREW EREW V.S. CREW u Finding Roots by Shortcutting Pointers u CREW Runs in lg lg n Time u EREW Runs in lg n Time Optimal Parallel Algorithms u NC -- The class of algorithms that run in Q(logmn) time using Q(nk) processors u General Boolean Functions Cannot be Computed any Faster than Q(lg n) u Q(lg n) is optimal for computing the sum of n integers Parallel Algorithms Parallel Models u Hypercube u Butterfly u Fully Connected u Other Networks u Shared Memory v.s. Distributed Memory u SIMD v.s. MIMD The PRAM Model u Parallel Random Access Machine u All processors act in lock-step u Number of processors is not limited u All processors have local memory u One global memory accessible to all processors u Processors must read and write global memory A Pram Algorithm u Every Processor knows its own index (usually indicated by variable i) u Vector Sum: Read M[i] Into x; Read M[i+n] Into y; x := x + y; Write x into M[i]; Binary Fan-In Read M[i] into Largest; Write M[i] into M[i+n]; Delta := 1; For k := 1 to élg nù Read M[i+Delta] into x; Largest := Maximum(x,Largest); Write Largest into M[i]; Delta := Delta * 2; End For Parallel Addition Read M[i] into Total; Write 0 into M[i+n]; Delta := 1; For k := 1 to élg nù Read M[i+Delta] into x; Total := x + Total; Write Total into M[i]; Delta := Delta * 2; End For Pointer Jumping Read M[i] Into Total; For k := 1 to élg nù Read Next[i] into Ptr If Ptr ¹ 0 Then Read M[Ptr] Into x; Total := Total + x; Write Total into M[i]; Read Next[Ptr] Into NewPtr Write NewPtr into Next[i] End If End For Initialization of Next[i] If i = n Then Write 0 Into Next[i]; Else Write i+1 Into Next[i]; End If Calculate Node Depth I If there is a Left Child 1 -1 To “1” of Left Child 0 From “-1” of Left Child Calculate Node Depth 2 If there is no left child 1 -1 0 Calculate Node Depth 3 If there is a Right Child 1 -1 From “-1” of Right Child 0 To “1” of Right Child Calculate Node Depth 4 If there is no right child 1 -1 0 Concurrent Reads & Writes u EREW - Exclusive Read, Exclusive Write u CREW - Common Read, Exclusive Write u CRCW - Common Read, Common Write – All common writes must write the same thing – Highest Priority Processor wins contest u CREW is more powerful than EREW u CRCW is more powerful than CREW Finding Max u Square Array of Processors Indexed by i,j Write True into R[i]; Read M[i] into x; Read M[j] into y; If x < y Then Write False Into R[i]; Else If y < x Then Write False Into R[j]; End If CRCW V.S. CREW u CRCW Max runs in constant time u CREW Max runs in lg n time u CRCW cannot be any better than lg p faster than EREW EREW V.S. CREW u Finding Roots by Shortcutting Pointers u CREW Runs in lg lg n Time u EREW Runs in lg n Time Optimal Parallel Algorithms u NC -- The class of algorithms that run in Q(logmn) time using Q(nk) processors u General Boolean Functions Cannot be Computed any Faster than Q(lg n) u Q(lg n) is optimal for computing the sum of n integers.

Parallel Algorithms Parallel Models U Hypercube U Butterfly U Fully Connected U Other Networks U Shared Memory V.S

Vertex Coloring

Easy PRAM-Based High-Performance Parallel Programming with ICE∗

Instructor's Manual

Lawrence Berkeley National Laboratory Recent Work

PRAM Algorithms Parallel Random Access Machine

CSE 4351/5351 Notes 9: PRAM and Other Theoretical Model S

Towards a Spatial Model Checker on GPU⋆

2009-04-21 AMSC Cloud Assembler

Implementing a Tool for Designing Portable Parallel Programs

Models for Parallel Computation in Multi-Core, Heterogeneous, and Ultra Wide-Word Architectures

PRAM Algorithms Parallel Random Access Machine

Introduction to Parallel Processing : Algorithms and Architectures