Advanced OpenMP Tutorial
OpenMP for Heterogeneous Computing

Slides designed by / based on: Christian Terboven, Michael Klemm, James C. Beyer, Kelvin Li, Bronis R. de Supinski

Advanced OpenMP Tutorial – Device Constructs, Tim Cramer

Topics
• Heterogeneous device execution model
• Mapping variables to a device
• Accelerated workshare
• Examples

Device Model
• OpenMP supports heterogeneous systems
• Device model:
  – One host device and
  – One or more target devices
[Figure: example systems: heterogeneous SoC, host and co-processors, host and GPUs]

Terminology
• Device: An implementation-defined (logical) execution unit.
• Device data environment: The storage associated with a device.
The execution model is host-centric such that the host device offloads target regions to target devices.

OpenMP Offloading
[Figure: host device and target device; everything runs on the host]

    main(){
        saxpy();
    }

    void saxpy(){
        int n = 10240; float a = 42.0f; float b = 23.0f;
        float *x, *y;
        // Allocate and initialize x, y
        // Run SAXPY
        #pragma omp parallel for
        for (int i = 0; i < n; ++i){
            y[i] = a*x[i] + y[i];
        }
    }

OpenMP Offloading
[Figure: host device and target device; the loop in saxpy() is offloaded to the target device]

    main(){
        saxpy();
    }

    void saxpy(){
        int n = 10240; float a = 42.0f; float b = 23.0f;
        float *x, *y;
        // Allocate and initialize x, y
        // Run SAXPY
        #pragma omp target
        #pragma omp parallel for
        for (int i = 0; i < n; ++i){
            y[i] = a*x[i] + y[i];
        }
    }

OpenMP Offloading
[Figure: host device and target device; the loop in saxpy() is offloaded and x, y are transferred to the target device]

    main(){
        saxpy();
    }

    void saxpy(){
        int n = 10240; float a = 42.0f; float b = 23.0f;
        float *x, *y;
        // Allocate and initialize x, y
        // Run SAXPY
        #pragma omp target map(to:x[0:n]) map(tofrom:y[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; ++i){
            y[i] = a*x[i] + y[i];
        }
    }

OpenMP Device Constructs

Execute code on a target device
• omp target [clause[[,] clause]…]
    structured-block
• omp declare target
    [function-definitions-or-declarations]

Map variables to a target device
• map([[map-type-modifier[,]] map-type:] list)
    map-type := alloc | tofrom | to | from | release | delete
    map-type-modifier := always
• omp target data clause[[[,] clause]…]
    structured-block
• omp target enter data clause[[[,] clause]…]
• omp target exit data clause[[[,] clause]…]
• omp target update clause[[[,] clause]…]
• omp declare target
    [variable-definitions-or-declarations]

Workshare for acceleration
• omp teams [clause[[,] clause]…]
    structured-block
• omp distribute [clause[[,] clause]…]
    for-loops

Device Runtime Support

Runtime routines
• void omp_set_default_device(int dev_num)
• int omp_get_default_device(void)
• int omp_get_num_devices(void)
• int omp_get_num_teams(void)
• int omp_get_team_num(void)
• int omp_is_initial_device(void)
• int omp_get_initial_device(void)

Environment variables
• Control the default device through OMP_DEFAULT_DEVICE
• Control the offloading behaviour through OMP_TARGET_OFFLOAD

Offloading Computation
• Use the target construct to
  – Transfer control from the host to the target device
• Use the map clause to
  – Map variables between the host and target device data environments
• By default, the host thread waits until the offloaded region has completed
  – Use the nowait clause for asynchronous execution

    count = 500;                                     // host
    #pragma omp target map(to:b,c,d) map(from:a)     // target
    {
        #pragma omp parallel for
        for (i=0; i<count; i++) {
            a[i] = b[i] * c + d;
        }
    }
    a0 = a[0];                                       // host

Asynchronous offloading
• A host task is generated that encloses the target region.

    count = 500;                                     // host
    #pragma omp target map(to:b,c,d) map(from:a) nowait   // target
    {
        #pragma omp parallel for
        for (i=0; i<count; i++) {
            a[i] = b[i] * c + d;
        }
    }
    do_some_other_work();                            // host
    a0 = a[0];   // Synchronization missing here!

• The nowait clause indicates that the encountering thread does not wait for the target region to complete.
• The depend clause can be used for synchronization with other tasks.

target task: A mergeable and untied task that is generated by a target, target enter data, target exit data or target update construct.

target Construct
• Transfer control from the host to the target device
• Syntax (C/C++)
    #pragma omp target [clause[[,] clause]…]
    structured-block
• Syntax (Fortran)
    !$omp target [clause[[,] clause]…]
    structured-block
    !$omp end target
• Clauses
    device(scalar-integer-expression)
    map([always[,]] alloc | to | from | tofrom | delete | release: list)
    if([target: ]scalar-expr)
    private(list)
    firstprivate(list)
    is_device_ptr(list)
    defaultmap(tofrom: scalar)
    nowait
    depend(dependence-type: list)

Tasks and Target Example
All aspects of this example will be explained in the following.

    #pragma omp declare target
    #include <stdlib.h>
    #include <omp.h>
    extern void compute(float*, float*, int);
    #pragma omp end declare target

    void vec_mult_async(float* p, float* v1, float* v2, int N) {
        int i;
        #pragma omp target enter data map(alloc: v1[:N], v2[:N])
        #pragma omp target nowait depend(out: v1, v2)
        compute(v1, v2, N);

        #pragma omp task
        other_work();   // execute asynchronously on host device

        #pragma omp target map(from:p[0:N]) nowait depend(in: v1, v2)
        {
            // distribute must be nested in a teams region, hence "teams distribute"
            #pragma omp teams distribute parallel for
            for (i=0; i<N; i++)
                p[i] = v1[i] * v2[i];
        }

        #pragma omp taskwait
        #pragma omp target exit data map(release: v1[:N], v2[:N])
    }

Terminology
• Mapped variable: The corresponding variable in a device data environment to an original variable in a (host) data environment.
• Mappable type: A type that is valid for mapped variables. (Bitwise copyable plus additional restrictions.)

map Clause
• Map a variable or an array section to a device data environment
• Syntax: map([[map-type-modifier[,]] map-type:] list)
• Where map-type is:
  – alloc: allocate storage for the corresponding variable
  – to: alloc and assign the value of the original variable to the corresponding variable on entry
  – from: alloc and assign the value of the corresponding variable to the original variable on exit
  – tofrom: default, both to and from
  – delete: the corresponding variable is removed
  – release: the reference count is decremented
• Where map-type-modifier is:
  – always: the mapping operation is always performed

Device Data Environment
• The map clauses determine how an original variable in a data environment is mapped to a corresponding variable in a device data environment.
[Figure: host and device memories holding pA, with numbered steps 1 alloc(…), 2 to(…), 3 { ... }, 4 from(…)]

    #pragma omp target \
        map(alloc:...) \
        map(to:...) \
        map(from:...)
    { ... }

map Clause
• The array sections for v1, v2, and p are explicitly mapped into the device data environment.

    void vec_mult(float *p, float *v1, float *v2, int N) {
        int i;
        init(v1, v2, N);
        #pragma omp target map(to:v1[0:N],v2[0:N]) map(from:p[0:N])
        {
            #pragma omp parallel for
            for (i=0; i<N; i++)
                p[i] = v1[i] * v2[i];
        }
        output(p, N);   // on the host, after p has been copied back
    }

• The variable N is implicitly firstprivate in the device data environment.

    subroutine vec_mult(p, v1, v2, N)
      real, dimension(*) :: p, v1, v2
      integer :: N, i
      call init(v1, v2, N)
      !$omp target map(to: v1(1:N), v2(1:N)) map(from:p(1:N))
      !$omp parallel do
      do i=1, N
        p(i) = v1(i) * v2(i)
      end do
      !$omp end target
      call output(p, N)
    end subroutine

Note:
In 4.0, a scalar variable referenced inside a target construct is implicitly mapped as map(tofrom:...).
In 4.5, a scalar variable referenced inside a target construct is implicitly firstprivate.

map Clause
The following applies to the vec_mult example above (C and Fortran versions).
• On entry to the target region:
  – Allocate corresponding variables v1, v2, and p in the device data environment.
  – Assign the corresponding variables v1 and v2 the value of their respective original variables.
  – The corresponding variable p is undefined.
• On exit from the target region:
  – Assign the original variable p the value of its corresponding variable.
  – The original variables v1 and v2 are undefined.
  – Remove the corresponding variables v1, v2, and p from the device data environment.

MAP is not necessarily a copy
[Figure: shared memory (processor X and accelerator Y with caches attached to one memory holding A) versus distributed memory (processor X and accelerator Y each with their own memory and copy of A)]
• The corresponding variable in the device data environment may share storage with the original variable.
• Writes to the corresponding variable may change the value of the original variable.

Map variables across multiple target regions
• Optimize sharing data between host and device.
• The target data, target enter data, and target exit data constructs map variables but do not offload code.
• Corresponding variables remain in the device data environment for the extent of the target data region.
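
The last point can be illustrated with a minimal sketch, assuming an OpenMP 4.5 compiler. The function scale_then_mult and the temporary array tmp are hypothetical names chosen for illustration; they do not appear in the slides. One target data region encloses two target regions, so the arrays are mapped once and the intermediate result in tmp is not expected to travel back to the host between the two regions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not from the slides): computes p[i] = (s * v1[i]) * v2[i]
 * in two target regions that share one device data environment.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int scale_then_mult(float *p, const float *v1, const float *v2, float s, int n)
{
    assert(n >= 0);
    if (n == 0)
        return 0;

    float *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    /* Map once for both target regions: v1 and v2 are copied in on entry,
     * tmp only gets device storage, p is copied back on exit. */
    #pragma omp target data map(to: v1[0:n], v2[0:n]) \
                            map(alloc: tmp[0:n]) map(from: p[0:n])
    {
        /* First region: the mapped arrays are already present on the device. */
        #pragma omp target
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            tmp[i] = s * v1[i];

        /* Second region: reuses tmp from the same device data environment. */
        #pragma omp target
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            p[i] = tmp[i] * v2[i];
    }

    free(tmp);
    return 0;
}
```

A call such as scale_then_mult(p, v1, v2, 0.5f, N) behaves the same whether the regions run on a target device or fall back to the host; without OpenMP support the pragmas are ignored and the loops run serially.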
