A “Hands-On” Introduction to Openmp*

A “Hands-on” Introduction to OpenMP* Tim Mattson Larry Meadows Principal Engineer Principal Engineer Intel Corporation Intel Corporation [email protected] [email protected] 1 * The name “OpenMP” is the property of the OpenMP Architecture Review Board. Preliminaries: part 1 z Disclosures The views expressed in this tutorial are those of the people delivering the tutorial. – We are not speaking for our employers. – We are not speaking for the OpenMP ARB z This is a new tutorial for us: Help us improve … tell us how you would make this tutorial better. 2 Preliminaries: Part 2 z Our plan for the day .. Active learning! We will mix short lectures with short exercises. You will use your laptop for the exercises … that way you’ll have an OpenMP environment to take home so you can keep learning on your own. z Please follow these simple rules Do the exercises we assign and then change things around and experiment. – Embrace active learning! Don’t cheat: Do Not look at the solutions before you complete an exercise … even if you get really frustrated. 3 Our Plan for the day Topic Exercise concepts I. OMP Intro Install sw, Parallel regions hello_world II. Creating threads Pi_spmd_simple Parallel, default data environment, runtime library calls Break III. Synchronization Pi_spmd_final False sharing, critical, atomic IV. Parallel loops Pi_loop For, reduction V. Odds and ends No exercise Single, master, runtime libraries, environment variables, synchronization, etc. lunch VI. Data Environment Pi_mc Data environment details, modular software, threadprivate VII. Worksharing and Linked list, For, schedules, sections schedule matmul Break VIII. Memory model Producer Point to point synch with flush consumer IX OpenMP 3 and tasks Linked list Tasks and other OpenMP 3 features 4 Outline z Introduction to OpenMP z Creating Threads z Synchronization z Parallel Loops z Synchronize single masters and stuff z Data environment z Schedule your for and sections z Memory model z OpenMP 3.0 and Tasks 5 OpenMP* Overview: C$OMP FLUSH #pragma omp critical C$OMP THREADPRIVATE(/ABC/) CALL OMP_SET_NUM_THREADS(10) OpenMP: An API for Writing Multithreaded C$OMP parallel do shared(a, b, c) call omp_test_lock(jlok) Applications call OMP_INIT_LOCK (ilok) C$OMP MASTER C$OMP ATOMIC C$OMP SINGLEA setPRIVATE(X) of compilersetenv directives OMP_SCHEDULE and library “dynamic” routines for parallel application programmers C$OMP PARALLEL DO ORDERED PRIVATE (A, B, C) C$OMP ORDERED Greatly simplifies writing multi-threaded (MT) C$OMP PARALLELprograms REDUCTION in Fortran, (+: A, C andB) C++ C$OMP SECTIONS #pragma ompStandardizes parallel for lastprivate(A, 20 years B) of !$OMPSMP practice BARRIER C$OMP PARALLEL COPYIN(/blk/) C$OMP DO lastprivate(XX) Nthrds = OMP_GET_NUM_PROCS() omp_set_lock(lck) 6 * The name “OpenMP” is the property of the OpenMP Architecture Review Board. HW System layer Prog. User layer OpenMP BasicDefs:SolutionStack Layer Proc1 Directives, Compiler OS/system support forsharedmemory andthreading Proc Shared Address Space 2 OpenMP Runtime library Application OpenMP library Proc End User 3 ProcN Environment variables 7 OpenMP core syntax z Most of the constructs in OpenMP are compiler directives. #pragma omp construct [clause [clause]…] Example #pragma omp parallel num_threads(4) z Function prototypes and types in the file: #include <omp.h> z Most OpenMP* constructs apply to a “structured block”. Structured block: a block of one or more statements with one point of entry at the top and one point of exit at the bottom. It’s OK to have an exit() within the structured block. 8 Exercise 1, Part A: Hello world Verify that your environment works z Write a program that prints “hello world”. voidvoid main()main() {{ intint ID ID == 0;0; printf(“printf(“ hello(%d) hello(%d) ”,”, ID);ID); printf(“printf(“ world(%d) world(%d) \n”,\n”, ID);ID); }} 9 Exercise 1, Part B: Hello world Verify that your OpenMP environment works z Write a multithreaded program that prints “hello world”. #include “omp.h” voidvoid main()main() {{ #pragma omp parallel Switches for compiling and linking -fopenmp gcc { -mp pgi intint ID ID == 0;0; /Qopenmp intel printf(“printf(“ hello(%d) hello(%d) ”,”, ID);ID); printf(“printf(“ world(%d) world(%d) \n”,\n”, ID);ID); } }} 10 Exercise 1: Solution A multi-threaded “Hello world” program z Write a multithreaded program where each thread prints “hello world”. #include#include “omp.h”“omp.h” OpenMPOpenMP include include file file voidvoid main()main() ParallelParallel region region with with default default Sample Output: {{ numbernumber of of threads threads Sample Output: #pragma#pragma omp omp parallel parallel hello(1)hello(1) hello(0)hello(0) world(1)world(1) { { world(0)world(0) int ID = omp_get_thread_num(); int ID = omp_get_thread_num(); hellohello (3)(3) hello(2)hello(2) world(3)world(3) printf(“printf(“ hello(%d) hello(%d) ”,”, ID);ID); printf(“printf(“ world(%d) world(%d) \n”,\n”, ID);ID); world(2)world(2) } } RuntimeRuntime library library function function to to } EndEnd of of the the Parallel Parallel region region returnreturn a a thread thread ID. ID. } 11 OpenMP Overview: How do threads interact? z OpenMP is a multi-threading, shared address model. – Threads communicate by sharing variables. z Unintended sharing of data causes race conditions: – race condition: when the program’s outcome changes as the threads are scheduled differently. z To control race conditions: – Use synchronization to protect data conflicts. z Synchronization is expensive so: – Change how data is accessed to minimize the need for synchronization. 12 Outline z Introduction to OpenMP z Creating Threads z Synchronization z Parallel Loops z Synchronize single masters and stuff z Data environment z Schedule your for and sections z Memory model z OpenMP 3.0 and Tasks 13 OpenMP Programming Model: Fork-Join Parallelism: Master thread spawns a team of threads as needed. Parallelism added incrementally until performance goals are met: i.e. the sequential program evolves into a parallel program. Parallel Regions AA Nested Nested Master ParallelParallel Thread regionregion in red Sequential Parts 14 Thread Creation: Parallel Regions z You create threads in OpenMP* with the parallel construct. z For example, To create a 4 thread Parallel region: double A[1000]; RuntimeRuntime function function to to EachEach thread thread omp_set_num_threads(4); requestrequest a a certain certain executes a number of threads executes a #pragma omp parallel number of threads copycopy of of the the code within { code within int ID = omp_get_thread_num(); thethe structured pooh(ID,A); structured RuntimeRuntime function function block block } returningreturning a a thread thread ID ID z EachEach threadthread callscalls pooh(ID,A) forfor ID == 0 toto 3 15 * The name “OpenMP” is the property of the OpenMP Architecture Review Board Thread Creation: Parallel Regions z You create threads in OpenMP* with the parallel construct. z For example, To create a 4 thread Parallel region: clauseclause to to request request a a certain certain double A[1000]; numbernumber of of threads threads EachEach thread thread executes a executes a #pragma omp parallel num_threads(4) copycopy of of the the code within { code within int ID = omp_get_thread_num(); thethe structured pooh(ID,A); structured RuntimeRuntime function function block block } returningreturning a a thread thread ID ID z EachEach threadthread callscalls pooh(ID,A) forfor ID == 0 toto 3 16 * The name “OpenMP” is the property of the OpenMP Architecture Review Board Thread Creation: Parallel Regions example double A[1000]; z Each thread executes the omp_set_num_threads(4); same code redundantly. #pragma omp parallel { int ID = omp_get_thread_num(); double A[1000]; pooh(ID, A); } omp_set_num_threads(4) printf(“all done\n”); AA single single copycopy of of A A isis shared shared pooh(0,A) pooh(1,A) pooh(2,A) pooh(3,A) betweenbetween all all threads.threads. printf(“all done\n”); ThreadsThreads wait wait here here for for all all threads threads to to finish before proceeding (i.e. a barrier) finish before proceeding (i.e. a barrier17 ) * The name “OpenMP” is the property of the OpenMP Architecture Review Board Exercises 2 to 4: Numerical Integration Mathematically, we know that: 1 4.0 4.0 ∫ (1+x2) dx = π 0 ) 2 We can approximate the +x integral as a sum of 1 /( 2.0 rectangles: .0 4 N = ) x ( F ∑ F(xi)Δx ≈π i = 0 Where each rectangle has 0.0 1.0 X width Δx and height F(xi) at the middle of interval i. 18 Exercises 2 to 4: Serial PI Program static long num_steps = 100000; double step; void main () { int i; double x, pi, sum = 0.0; step = 1.0/(double) num_steps; for (i=0;i< num_steps; i++){ x = (i+0.5)*step; sum = sum + 4.0/(1.0+x*x); } pi = step * sum; } 19 Exercise 2 z Create a parallel version of the pi program using a parallel construct. z Pay close attention to shared versus private variables. z In addition to a parallel construct, you will need the runtime library routines int omp_get_num_threads(); Number of threads in the team int omp_get_thread_num(); double omp_get_wtime(); Thread ID or rank Time in Seconds since a 20 fixed point in the past Outline z Introduction to OpenMP z Creating Threads z Synchronization z Parallel Loops z Synchronize single masters and stuff z Data environment z Schedule your for and sections z Memory model z OpenMP 3.0 and Tasks 21 Synchronization Synchronization is used to impose order z High level synchronization: constraints and to protect access to – critical shared data – atomic – barrier – ordered

A “Hands-On” Introduction to Openmp*

Heterogeneous Task Scheduling for Accelerated Openmp

Parallel Programming

Openmp API 5.1 Specification

Openmp Made Easy with INTEL® ADVISOR

Introduction to Openmp Paul Edmon ITC Research Computing Associate

Parallel Programming with Openmp

Intel Threading Building Blocks

Unified Parallel C for GPU Clusters: Language Extensions and Compiler Implementation

Lecture 12: Introduction to Openmp (Part 1)

Programming Your GPU with Openmp*

Openmp Tutorial

Parallel Programming in Fortran 95 Using Openmp