
<p>Determining variable scope</p><p>[§3.4] This step is specific to the shared-memory programming model. For each variable, we need to decide how it is used. There are three possibilities:</p><p> Read-only: the variable is only read, possibly by multiple tasks.</p><p> R/W non-conflicting: the variable is read, written, or both, but by only one task.</p><p> R/W conflicting: the variable is written by one task and may be read by another.</p><p>Intuitively, why are these cases different? </p><p>Example 1</p><pre>
for (i=1; i<=n; i++)
  for (j=1; j<=n; j++) {
S2: a[i][j] = b[i][j] + c[i][j];
S3: b[i][j] = a[i][j-1] * d[i][j];
  }
</pre><p>Let’s assume each iteration of the for i loop is a parallel task. Fill in the tables here.</p><p>Read-only | R/W non-conflicting | R/W conflicting</p><p>Now, let’s assume that each for j iteration is a separate task.</p><p>Read-only | R/W non-conflicting | R/W conflicting</p><p>Do these two decompositions have the same number of tasks? </p><p>Lecture 6 Architecture of Parallel Computers 1</p><p>Example 2</p><p>Let’s assume that each for j iteration is a separate task.</p><pre>
for (i=1; i<=n; i++)
  for (j=1; j<=n; j++) {
S1: a[i][j] = b[i][j] + c[i][j];
S2: b[i][j] = a[i-1][j] * d[i][j];
S3: e[i][j] = a[i][j];
  }
</pre><p>Read-only | R/W non-conflicting | R/W conflicting</p><p>Exercise: Suppose each for i iteration were a separate task …</p><p>Read-only | R/W non-conflicting | R/W conflicting</p><p>Privatization</p><p>Privatization means making private copies of a shared variable.</p><p>What is the advantage of privatization? </p><p>Of the three kinds of variables in the table above, which kind(s) does it make sense to privatize? </p><p>Under what conditions is a variable privatizable? </p><p>When a variable is privatized, one private copy is made for each thread (not each task).</p><p>© 2010 Edward F. 
Gehringer, CSC/ECE 506 Lecture Notes, Spring 2010</p><p>Result of privatization: an R/W conflicting variable becomes R/W non-conflicting.</p><p>Let’s revisit the examples.</p><p>Example 1</p><pre>
for (i=1; i<=n; i++)
  for (j=1; j<=n; j++) {
S2: a[i][j] = b[i][j] + c[i][j];
S3: b[i][j] = a[i][j-1] * d[i][j];
  }
</pre><p>Of the R/W conflicting variables, which are privatizable? </p><p>Which case does each such variable fall into? i is case 2 (its value is known ahead of time); j is case 1 (it is always written by a task before that task reads it).</p><p>We can think of privatized variables as arrays, indexed by process ID: </p><p>Example 2</p><p>Parallel tasks: each for j loop iteration.</p><p>Is the R/W conflicting variable j privatizable? If so, which case does it represent? </p><p>Reduction</p><p>Reduction is another way to remove conflicts. It is based on partial sums.</p><p>Suppose we have a large matrix, and we need to perform some operation on all of its elements (let’s say, a sum of products) to produce a single result.</p><p>We could have a single processor undertake this, but that is slow and does not make good use of the parallel machine.</p><p>So, we divide the matrix into portions, and have one processor work on each portion.</p><p>Then, after the partial sums are complete, they are combined into a global sum. Thus, the array has been “reduced” to a single element.</p><p>Examples: </p><p> addition (+), multiplication (*)</p><p> logical operations (and, or, …)</p><p>The reduction variable is the scalar variable that is the result of a reduction operation.</p><p>Criteria for reducibility:</p><p> The reduction variable is updated by each task, and the order of updates </p><p> Hence, the reduction operation must be </p><p>Goal: compute y = y_init op x1 op x2 op x3 … op xn.</p><p>op is a reduction operator if it is commutative</p><p> u op v = v op u</p><p>and associative</p><p> (u op v) op w = u op (v op w)</p><p>Certain operations can be transformed into reduction operations (see homework).</p><p>© 2010 Edward F. 
Gehringer, CSC/ECE 506 Lecture Notes, Spring 2010</p><p>Summary of scope criteria</p><p>Should be declared private | Should be declared shared | Should be declared reduction | Non-privatizable R/W conflicting</p><p>Example 1, with for i parallel tasks. Fill in the answers here.</p><pre>
for (i=1; i<=n; i++)
  for (j=1; j<=n; j++) {
S2: a[i][j] = b[i][j] + c[i][j];
S3: b[i][j] = a[i][j-1] * d[i][j];
  }
</pre><p>Read-only | R/W non-conflicting | R/W conflicting</p><p>Declare as shared | Declare as private</p><p>Example 2, with for j parallel tasks. Fill in the answers here.</p><pre>
for (i=1; i<=n; i++)
  for (j=1; j<=n; j++) {
S1: a[i][j] = b[i][j] + c[i][j];
S2: b[i][j] = a[i-1][j] * d[i][j];
S3: e[i][j] = a[i][j];
  }
</pre><p>Read-only | R/W non-conflicting | R/W conflicting</p><p>Declare as shared | Declare as private</p><p>Example 3</p><p>Consider matrix multiplication.</p><pre>
for (i=0; i<n; i++)
  for (j=0; j<n; j++) {
    C[i][j] = 0.0;
    for (k=0; k<n; k++) {
      C[i][j] = C[i][j] + A[i][k]*B[k][j];
    }
  }
</pre><p>Exercise: Suppose the parallel tasks are for k iterations. Determine which variables are conflicting, which should be declared as private, and which need to be protected against concurrent access by using a critical section.</p><p>Read-only | R/W non-conflicting | R/W conflicting</p><p>Declare as shared | Declare as private</p><p>Which variables, if any, need to be protected by a critical section? </p><p>Now, suppose the parallel tasks are for i iterations. Determine which variables are conflicting, which should be declared as private, and which need to be protected against concurrent access by using a critical section.</p><p>Read-only | R/W non-conflicting | R/W conflicting</p><p>Declare as shared | Declare as private</p><p>Which variables, if any, need to be protected by a critical section? 
</p><p>Synchronization</p><p>Synchronization is how programmers control the sequence of operations that are performed by parallel threads.</p><p>Three types of synchronization are in widespread use.</p><p> Point-to-point:</p><p>o a pair of post() and wait()</p><p>o a pair of send() and recv() in message passing</p><p> Lock:</p><p>o a pair of lock() and unlock()</p><p>o only one thread is allowed to be in a locked region at a given time</p><p>o ensures mutual exclusion</p><p>o used, for example, to serialize accesses to R/W conflicting variables</p><p> Barrier:</p><p>o a point past which a thread is allowed to proceed only when all threads have reached that point</p><p>Lock</p><p>What problem may arise here?</p><pre>
// inside a parallel region
for (i=start_iter; i<end_iter; i++)
  sum = sum + a[i];
</pre><p>A lock prevents more than one thread from being inside the locked region.</p><pre>
// inside a parallel region
for (i=start_iter; i<end_iter; i++) {
  lock(x);
  sum = sum + a[i];
  unlock(x);
}
</pre><p>Issues:</p><p> What granularity should be locked?</p><p> How can we build a lock that is both correct and fast?</p><p>Barrier: global event synchronization</p><p>A barrier is used when the code that follows requires that all threads have gotten to this point. Example: a simulation that works in terms of timesteps.</p><p>Load balance is important: execution time is dependent on the slowest thread.</p><p>This is one reason for gang scheduling and for avoiding time sharing and context switching.</p><p>Intro to OpenMP: directive format</p><p>Refer to http://www.openmp.org and the OpenMP 2.0 specifications on the course web site for more details.</p><p>#pragma omp directive-name [clause[ [,] clause]...] new-line</p><p>For example, </p><p>#pragma omp for [clause[[,] clause] ... 
] new-line</p><p>for-loop</p><p>The clause is one of</p><p> private(variable-list)</p><p> firstprivate(variable-list)</p><p> lastprivate(variable-list)</p><p> reduction(operator: variable-list)</p><p> ordered</p><p> schedule(kind[, chunk_size])</p><p> nowait</p><p>Parallel for loop</p><pre>
#include <omp.h>
//…
#pragma omp parallel
{
  …
  #pragma omp for private(i)
  for (i=0; i<n; i++)
    A[i] = A[i]*A[i] - 3.0;
}
</pre><p>Parallel sections</p><pre>
#pragma omp parallel shared(A,B) private(i)
{
  #pragma omp sections nowait
  {
    #pragma omp section
    for (i=0; i<n; i++)
      A[i] = A[i]*A[i] - 4.0;
    #pragma omp section
    for (i=0; i<n; i++)
      B[i] = B[i]*B[i] + 9.0;
  } // end omp sections
} // end omp parallel
</pre><p>Types of variables</p><p>shared, private, reduction, firstprivate, lastprivate</p><p>Semi-private data for parallel loops:</p><p> reduction: a variable that is the target of a reduction operation performed by the loop, e.g., sum</p><p> firstprivate: initialize each private copy from the value of the shared variable prior to the parallel section</p><p> lastprivate: upon loop exit, the master thread holds the value seen by the thread assigned the last loop iteration (for parallel loops only)</p><p>Barriers</p><p>Barriers are implicit at the end of each parallel section.</p><p>When a barrier is not needed for correctness, use the nowait clause:</p><pre>
#include <omp.h>
//…
#pragma omp parallel
{
  …
  #pragma omp for nowait private(i)
  for (i=0; i<n; i++)
    A[i] = A[i]*A[i] - 3.0;
}
</pre>