
Introduction to Unified Parallel C: A PGAS C Tarek El-Ghazawi 1 Tarek El-Ghazawi, GWU UPC Overview 1) UPC in a nutshell 4) Advanced topics in UPC Memory model Dynamic Memory Allocation Execution model UPC Systems Synchronization in UPC UPC Libraries 2) Data Distribution and Pointers Shared vs Private Data Examples of data distribution 5) UPC Productivity UPC pointers Code efficiency 3) Workload Sharing upc_forall 2 Tarek El-Ghazawi, GWU Introduction UPC – Unified Parallel C Set of specs for a parallel C v1.0 completed February of 2001 v1.1.1 in October of 2003 v1.2 in May of 2005 v1.3 in November of 2013 Compiler implementations by vendors and universities Consortium of government, academia, and HPC vendors including IDA CCS, GWU, UCB, MTU, U of Florida, UMCP, ANL, LBNL, LLNL, DoD, DoE, HP, Cray, IBM, UMN, ARSC, Sun, Intrepid, Etnus, … 3 Tarek El-Ghazawi, GWU • http://upc.lbl.gov/publications/upc-spec-1.3.pdf Introduction cont. UPC compilers are now available for most HPC platforms and clusters Some are open source A debugger and a performance analysis tool are available Benchmarks, programming examples, and compiler testing suite(s) are available Visit www.upcworld.org or upc.gwu.edu for more information 4 Tarek El-Ghazawi, GWU UPC Systems UPC Compilers Cray Hewlett-Packard Berkeley Intrepid IBM MTU UPC Tools Totalview PPW from UF TAU 5 Tarek El-Ghazawi, GWU UPC Home Page http://upc.gwu.edu 6 Tarek El-Ghazawi, GWU UPC textbook now available UPC: Distributed Shared Memory Programming Tarek El-Ghazawi William Carlson Thomas Sterling Katherine Yelick Wiley, May, 2005 ISBN: 0-471-22048-5 7 Tarek El-Ghazawi, GWU What is UPC? Unified Parallel C An explicit parallel extension of ISO C PGAS parallel programming language 8 Tarek El-Ghazawi, GWU UPC Execution Model A number of threads working independently in a SPMD fashion MYTHREAD specifies thread index (0..THREADS-1) Number of threads specified at compile-time or run- time Synchronization when needed Barriers Locks Memory consistency control 9 Tarek El-Ghazawi, GWU UPC Memory Model Thread 0 Thread 1 Thread THREADS-1 Shared Partitioned Global address space Private 0 Private 1 Private THREADS-1 Private Spaces A pointer-to-shared can reference all locations in the shared space, but there is data-thread affinity A private pointer may reference addresses in its private space or its local portion of the shared space Static and dynamic memory allocations are supported for both shared and private memory 10 Tarek El-Ghazawi, GWU User’s General View A collection of threads operating in a single global address space, which is logically partitioned among threads. Each thread has affinity with a portion of the shared address space. Each thread has also a private space. 11 Tarek El-Ghazawi, GWU UPC Overview 1) UPC in a nutshell 4) Advanced topics in UPC Memory model Dynamic Memory Allocation Execution model UPC Systems Synchronization in UPC UPC Libraries 2) Data Distribution and Pointers Shared vs Private Data Examples of data distribution 5) UPC Productivity UPC pointers Code efficiency 3) Workload Sharing upc_forall 12 Tarek El-Ghazawi, GWU A First Example: Vector addition Thread 0 Thread 1 //vect_add.c Iteration #: 0 1 #include <upc_relaxed.h> 2 3 #define N 100*THREADS v1[0] v1[1] Shared Shared Space shared int v1[N], v2[N], v1plusv2[N]; v1[2] v1[3] void main() { … int i; v2[0] v2[1] for(i=0; i<N; i++) v2[2] v2[3] if (MYTHREAD==i%THREADS) … v1plusv2[i]=v1[i]+v2[i];v1plusv2[0] v1plusv2[1] } v1plusv2[2]…v1plusv2[3] 13 Tarek El-Ghazawi, GWU 2nd Example: A More Efficient Implementation Thread 0 Thread 1 Iteration #: 0 1 //vect_add.c 2 3 #include <upc_relaxed.h> v1[0] v1[1] #define N 100*THREADS Shared Space v1[2] … v1[3] shared int v1[N], v2[N], v1plusv2[N]; void main() { v2[0] v2[1] int i; v2[2] … v2[3] for(i=MYTHREAD; i<N; i+=THREADS) v1plusv2[i]=v1[i]+v2[i]; v1plusv2[0] v1plusv2[1] } v1plusv2[2]…v1plusv2[3] 14 Tarek El-Ghazawi, GWU 3rd Example: A More Convenient Implementation with upc_forall //vect_add.c Thread 0 Thread 1 #include <upc_relaxed.h> Iteration #: 0 1 #define N 100*THREADS 2 3 v1[0] v1[1] shared int v1[N], v2[N], v1plusv2[N]; Shared Space v1[2] … v1[3] void main() v2[0] v2[1] { v2[2] v2[3] int i; … upc_forall(i=0; i<N; i++; i) v1plusv2[0] v1plusv2[1] v1plusv2[i]=v1[i]+v2[i]; v1plusv2[2] v1plusv2[3] } … 15 Tarek El-Ghazawi, GWU Example: UPC Matrix-Vector Multiplication- Default Distribution // vect_mat_mult.c #include <upc_relaxed.h> shared int a[THREADS][THREADS] ; shared int b[THREADS], c[THREADS] ; void main (void) { int i, j; upc_forall( i = 0 ; i < THREADS ; i++; i){ c[i] = 0; for ( j= 0 ; j THREADS ; j++) c[i] += a[i][j]*b[j]; } } 16 Tarek El-Ghazawi, GWU Data Distribution Thread 0 Thread 2 Thread Th. 0 1 Thread Th. 0 = * Th. 1 Th. 1 Th. 2 Th. 2 C A B 17 Tarek El-Ghazawi, GWU A Better Data Distribution Th. 0 Thread 0 Th. 0 = * Th. 1 Thread 1 Th. 1 Th. 2 Thread 2 Th. 2 C A B 18 Tarek El-Ghazawi, GWU Example: UPC Matrix-Vector Multiplication- The Better Distribution // vect_mat_mult.c #include <upc_relaxed.h> shared [THREADS] int a[THREADS][THREADS]; shared int b[THREADS], c[THREADS]; void main (void) { int i, j; upc_forall( i = 0 ; i < THREADS ; i++; i){ c[i] = 0; for ( j= 0 ; j THREADS ; j++) c[i] += a[i][j]*b[j]; } } 19 Tarek El-Ghazawi, GWU.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages19 Page
-
File Size-