Hybrid OpenMP + CUDA Programming Model
Podgainy D.V., Streltsova O.I., Zuev M.I.
Heterogeneous Computations team, HybriLIT, Laboratory of Information Technologies, Joint Institute for Nuclear Research, Dubna, Russia
15 February – 7 March 2017

Types of parallel machines

Distributed memory:
• each processor has its own memory address space
• examples: clusters, Blue Gene/L

Shared memory:
• single address space for all processors
• examples: IBM p-series, multi-core PC

Shared-Memory Parallel Computers

[Figure: two shared-memory designs. Uniform Memory Access (UMA): CPU0, CPU1, CPU2 share a single memory over a common bus. Non-Uniform Memory Access (ccNUMA): each CPU (group) has its own local memory (Mem0, Mem1, Mem2), and the memories are joined by an interconnect.]

OpenMP (Open Specifications for Multi-Processing)

OpenMP is an API that supports multi-platform shared-memory multiprocessing programming in Fortran, C and C++. It consists of three components:
• compiler directives
• library routines
• environment variables

OpenMP has been managed by the consortium OpenMP Architecture Review Board (OpenMP ARB) since 1997.
OpenMP website: http://openmp.org/

Version history:
• OpenMP 1.0: Fortran in 1997, C/C++ in 1998
• OpenMP 2.0: Fortran in 2000, C/C++ in 2002
• Version 3.0 was released in May 2008
• Version 4.0 was released in July 2013

What is OpenMP

• OpenMP (Open Specifications for Multi-Processing) is one of the most popular parallel computing technologies for multi-processor/multi-core computers with a shared-memory architecture.
• OpenMP is based on traditional programming languages. The OpenMP standard is developed for the Fortran, C and C++ languages, and all basic constructs are similar across them. There are also known OpenMP implementations for MATLAB and Mathematica.
• An OpenMP-based program contains a number of threads interacting via shared memory. OpenMP provides special compiler directives, library functions and environment variables.
• Compiler directives are used to mark segments of code that can be processed in parallel.
• Using OpenMP constructs (compiler directives, procedures, environment variables), a user can organize parallelism in a serial code.
• "Partial parallelization" is available by adding OpenMP directives step by step: OpenMP offers an incremental approach to parallelism.
• OpenMP directives are ignored by a standard (non-OpenMP) compiler, so the code stays workable on both single- and multi-processor platforms.

OpenMP compilers (full list at http://openmp.org/wp/openmp-compilers/)

• GNU gcc. Flag: -fopenmp. Versions: gcc 4.2 – OpenMP 2.5; gcc 4.4 – OpenMP 3.0; gcc 4.7 – OpenMP 3.1; gcc 4.9 – OpenMP 4.0.
  Example: gcc -fopenmp start_openmp.c -o test1
• Intel C/C++. Flag: -openmp on Linux or Mac OS X, -Qopenmp on Windows. OpenMP 3.1 API and Fortran specification; support for most of the new features in the OpenMP 4.0 API specification.
  Example: icc -openmp start_openmp.c -o test1
• Portland Group compilers and tools. Flag: -mp. Full support for OpenMP 3.1.
  Example: pgcc -mp start_openmp.c -o test1
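The compile lines above all build a file start_openmp.c whose contents the slides do not show. A minimal program in the same spirit might look like the following sketch (the body is illustrative, not the original file):

    /* A minimal sketch of what start_openmp.c could contain; the slides
     * do not show the file, so this body is illustrative.
     * Build: gcc -fopenmp start_openmp.c -o test1                        */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* Fork a team of threads; each prints its id and the team size. */
        #pragma omp parallel
        {
            printf("Hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }

Compiled without -fopenmp, the pragma is ignored and the program simply prints one line, which is exactly the incremental, "stays workable" property described above.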
OpenMP Programming Model

OpenMP is an explicit (not automatic) programming model, offering the programmer full control over parallelization. It follows the fork-join model:
• All OpenMP programs start with just one thread: the master thread (thread 0). The master thread executes sequentially until the first parallel region construct is encountered.
• FORK: the master thread creates a team of parallel threads.
• JOIN: when the team threads complete the statements in the parallel region construct, they synchronize and terminate, and only the master thread continues.

OpenMP: Construct parallel

C/C++, general code structure:

    #include <omp.h>

    int main() {
        // Serial code
        // ...
        // Fork a team of threads:
        #pragma omp parallel
        {
            /* structured block */
        }
        // Resume serial code
        // ...
    }

Fortran, general code structure:

    PROGRAM Start
    ! Serial code
    ! ...
    ! Fork a team of threads:
    !$OMP PARALLEL
    ! structured block
    !$OMP END PARALLEL
    ! Resume serial code
    ! ...
    END

The parallelism has to be expressed explicitly.

Meaning of the parallel construct:
• the entire code block following the parallel directive is executed by all threads concurrently;
• this includes creation of a team of "worker" threads, execution by each thread of a copy of the code within the structured block, barrier synchronization at the end of the block (implicit barrier), and termination of the worker threads.

General form with clauses:

    #pragma omp parallel [clause ...] newline
        if (scalar_expression)
        private (list)
        shared (list)
        default (shared | none)
        firstprivate (list)
        reduction (operator: list)
        copyin (list)
        num_threads (integer-expression)
    {
        /* structured block */
    }

OpenMP: Classes of variables

There are two main classes of variables: shared and private.
• A shared variable always exists in a single instance for the whole program and is available to all threads under the same name.
• Declaring a variable private causes its own instance of the variable to be generated for each thread. A change of the value of a thread's private variable does not influence the value of the same-named variable in other threads.
• There are also "intermediate" types that provide a link between the parallel and sequential sections of the code.
• Thus, if a variable used in the sequential code preceding a parallel section is declared firstprivate, then in the parallel section its value is assigned to the private variable of the same name in each thread.
• Likewise, a lastprivate variable keeps, after termination of the parallel block, the value it had in the sequentially last iteration (or lexically last section) of the construct.

Private variables and shared variables

Shared: data within a parallel region that is shared is visible to and accessible by all threads simultaneously.
Private: data within a parallel region that is private belongs to a single thread; each thread has its own local copy and can use it as a temporary variable.

    int a;      // shared
    int j;
    int k = 3;
    #pragma omp parallel private(j, k)
    {
        int b;      // private automatic variable
        b = j;      // b is not defined: the private copy of j is uninitialized
        foo(j, b, k);
    }

What if we need to initialize a private variable? Use firstprivate: private variables with initial values copied from the master thread's copy.
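These sharing rules are easy to verify with a short self-contained program. Below is a minimal sketch, assuming a hypothetical file demo_private.c and a team of 4 threads (both illustrative, not from the slides):

    /* demo_private.c: a minimal sketch of shared / private / firstprivate
     * behavior (file name and thread count are illustrative).
     * Build: gcc -fopenmp demo_private.c -o demo_private                 */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int a = 0;   /* shared: one instance, visible to all threads        */
        int j = 42;  /* private below: each thread gets an UNINITIALIZED copy */
        int k = 3;   /* firstprivate below: each thread's copy starts at 3  */

        #pragma omp parallel private(j) firstprivate(k) num_threads(4)
        {
            int id = omp_get_thread_num();
            j = id;            /* must assign before use: private j starts undefined */
            k += id;           /* each thread updates its own copy of k              */
            #pragma omp atomic /* updates of the shared a must be synchronized       */
            a += 1;
            printf("thread %d: j = %d, k = %d\n", id, j, k);
        }
        /* The private copies are discarded: j is still 42 and k is still 3,
           while the shared a was incremented once per thread.               */
        printf("after the region: a = %d, j = %d, k = %d\n", a, j, k);
        return 0;
    }

Changing firstprivate(k) to private(k) would make the k += id line read an undefined value, which is exactly the pitfall the b = j line in the slide illustrates.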
Specification of the number of threads

Setting OpenMP environment variables is done the same way you set any other environment variables, and depends upon which shell you use:

    csh/tcsh:  setenv OMP_NUM_THREADS 4
    sh/bash:   export OMP_NUM_THREADS=4

The number of threads can also be set via a runtime function: omp_set_num_threads(4);

Other useful functions for getting information about threads:
• omp_get_num_threads(): returns the number of threads in the parallel region, or 1 if called outside a parallel region;
• omp_get_thread_num(): returns the id of the calling thread in the team, a value in [0, Nthreads-1]; the master thread always has id 0.

Explicit (low-level) parallelism and high-level parallelism: the sections directive

Low-level parallelism: the work is distributed between threads by hand, by means of the functions omp_get_thread_num() (returns the thread number of the calling thread within its team) and omp_get_num_threads() (returns the number of threads in the parallel region). For example:

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 3) {
            /* code for the thread number 3 */
        } else {
            /* code for all other threads */
        }
    }

Example of high-level parallelism (parallel independent sections); each of block1 and block2 in this example will be carried out by one of the parallel threads:

    #pragma omp sections [...parameters...]
    {
        #pragma omp section
        { /* block1 */ }
        #pragma omp section
        { /* block2 */ }
    }

Hardware: CPUs and GPUs on one node

[Figure: Node #1 of the cluster: two CPUs (CPU 1, CPU 2) share the host memory; three GPUs (Tesla K40) are attached to the node via PCIe.]

Using multiple GPUs: deviceInfo_CUDA.cu

    #include <stdio.h>
    #include <cuda.h>
    #include <omp.h>

    int main() {
        int ngpus = 0;   // number of CUDA GPUs
        int device;
        // How many devices?
        cudaGetDeviceCount(&ngpus);
        if (ngpus < 1) {
            printf("no CUDA capable devices were detected\n");
            return 1;
        }
        printf("number of CUDA devices:\t%d\n", ngpus);
        for (device = 0; device < ngpus; device++) {
            cudaDeviceProp dprop;
            cudaGetDeviceProperties(&dprop, device);
            printf(" %d: %s\n", device, dprop.name);
        }
        return 0;
    }

Compiling and running a CUDA+OpenMP program

Compilation:

    $ nvcc -Xcompiler -fopenmp -lgomp -arch=compute_35 --gpu-code=sm_35,sm_37 deviceInfo_CUDA.cu -o cuda_app

Batch script script_multiCUDA:

    #!/bin/sh
    #SBATCH -p tut
    #SBATCH -t 60
    #SBATCH -n 1
    #SBATCH -c 2
    #SBATCH --gres=gpu:2
    export OMP_NUM_THREADS=2
    srun ./cuda_app

Using multiple GPUs

A GPU can be controlled by:
• a single CPU thread,
• multiple CPU threads belonging to the same process, or
• multiple CPU threads belonging to different processes.

All CUDA calls are issued to the current GPU of the calling CPU thread, which is selected with cudaSetDevice() (see the sketch below).
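The extracted deck breaks off here. As a concrete illustration of the second case above (multiple CPU threads of one process, one per GPU), the following is a minimal sketch combining the OpenMP and CUDA calls shown earlier; the kernel fill_kernel, the file name multi_gpu.cu and the buffer size are illustrative assumptions, not code from the slides:

    // multi_gpu.cu: a minimal sketch of the one-OpenMP-thread-per-GPU
    // pattern (kernel name, file name and sizes are illustrative).
    // Build: nvcc -Xcompiler -fopenmp multi_gpu.cu -o multi_gpu
    #include <stdio.h>
    #include <cuda_runtime.h>
    #include <omp.h>

    __global__ void fill_kernel(float *d, int n, float value) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] = value;          // each GPU fills its own buffer
    }

    int main(void) {
        int ngpus = 0;
        cudaGetDeviceCount(&ngpus);
        if (ngpus < 1) {
            printf("no CUDA capable devices were detected\n");
            return 1;
        }

        const int n = 1 << 20;
        // One OpenMP thread per device: after cudaSetDevice(), every CUDA
        // call made by this thread goes to its own GPU.
        #pragma omp parallel num_threads(ngpus)
        {
            int dev = omp_get_thread_num();
            cudaSetDevice(dev);

            float *d_buf = NULL;
            cudaMalloc(&d_buf, n * sizeof(float));
            fill_kernel<<<(n + 255) / 256, 256>>>(d_buf, n, (float)dev);

            float first = -1.0f;  // copy back one element to check the result
            cudaMemcpy(&first, d_buf, sizeof(float), cudaMemcpyDeviceToHost);
            printf("thread %d on GPU %d: first element = %.1f\n", dev, dev, first);
            cudaFree(d_buf);
        }
        return 0;
    }

Submitted with the script_multiCUDA above (OMP_NUM_THREADS=2 and --gres=gpu:2), the two OpenMP threads would drive the two allocated GPUs in parallel.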
