OpenMP 3.1 API C/C++ Syntax Quick Reference Card
OpenMP Application Program Interface (API) is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications for platforms ranging from the desktop to the supercomputer. OpenMP supports multi-platform shared-memory parallel programming in C/C++ and Fortran on all architectures, including Unix platforms and Windows NT platforms. A separate OpenMP reference card for Fortran is also available.

[n.n.n] refers to sections in the OpenMP API Specification available at www.openmp.org.

C/C++ Directives

An OpenMP executable directive applies to the succeeding structured block or an OpenMP construct. A structured-block is a single statement or a compound statement with a single entry at the top and a single exit at the bottom.

Parallel [2.4]
The parallel construct forms a team of threads and starts parallel execution.
  #pragma omp parallel [clause[ [, ]clause] ...]
  structured-block
clause:
  if(scalar-expression)
  num_threads(integer-expression)
  default(shared | none)
  private(list)
  firstprivate(list)
  shared(list)
  copyin(list)
  reduction(operator: list)

Loop [2.5.1]
The loop construct specifies that the iterations of loops will be distributed among and executed by the encountering team of threads.
  #pragma omp for [clause[ [, ]clause] ...]
  for-loops
clause:
  private(list)
  firstprivate(list)
  lastprivate(list)
  reduction(operator: list)
  schedule(kind[, chunk_size])
  collapse(n)
  ordered
  nowait
kind:
• static: Iterations are divided into chunks of size chunk_size. Chunks are assigned to threads in the team in round-robin fashion in order of thread number.
• dynamic: Each thread executes a chunk of iterations, then requests another chunk until no chunks remain to be distributed.
• guided: Each thread executes a chunk of iterations, then requests another chunk until no chunks remain to be assigned. The chunk sizes start large and shrink to the indicated chunk_size as chunks are scheduled.
• auto: The decision regarding scheduling is delegated to the compiler and/or runtime system.
• runtime: The schedule and chunk size are taken from the run-sched-var ICV.

Most common form of the for loop:
  for(var = lb; var relational-op b; var += incr)

Sections [2.5.2]
The sections construct contains a set of structured blocks that are to be distributed among and executed by the encountering team of threads.
  #pragma omp sections [clause[ [, ]clause] ...]
  {
    [#pragma omp section]
    structured-block
    [#pragma omp section
    structured-block]
    ...
  }
clause:
  private(list)
  firstprivate(list)
  lastprivate(list)
  reduction(operator: list)
  nowait

Single [2.5.3]
The single construct specifies that the associated structured block is executed by only one of the threads in the team (not necessarily the master thread), in the context of its implicit task.
  #pragma omp single [clause[ [, ]clause] ...]
  structured-block
clause:
  private(list)
  firstprivate(list)
  copyprivate(list)
  nowait

Parallel Loop [2.6.1]
The parallel loop construct is a shortcut for specifying a parallel construct containing one or more associated loops and no other statements.
  #pragma omp parallel for [clause[ [, ]clause] ...]
  for-loop
clause: Any accepted by the parallel or for directives, except the nowait clause, with identical meanings and restrictions.

Simple Parallel Loop Example
The following example demonstrates how to parallelize a simple loop using the parallel loop construct.
  void simple(int n, float *a, float *b)
  {
    int i;
    #pragma omp parallel for
    for (i=1; i<n; i++) /* i is private by default */
      b[i] = (a[i] + a[i-1]) / 2.0;
  }
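As an additional illustration (not part of the original card), the following sketch combines the schedule and reduction clauses listed above on a parallel loop; the function name sum_array and the chunk size of 64 are assumptions chosen for the example.
  float sum_array(int n, float *a)
  {
    int i;
    float total = 0.0f;
    /* each thread accumulates a private copy of total; the copies
       are combined with + when the loop completes */
    #pragma omp parallel for schedule(dynamic, 64) reduction(+: total)
    for (i = 0; i < n; i++)
      total += a[i];
    return total;
  }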
Parallel Sections [2.6.2]
The parallel sections construct is a shortcut for specifying a parallel construct containing one sections construct and no other statements.
  #pragma omp parallel sections [clause[ [, ]clause] ...]
  {
    [#pragma omp section]
    structured-block
    [#pragma omp section
    structured-block]
    ...
  }
clause: Any of the clauses accepted by the parallel or sections directives, except the nowait clause, with identical meanings and restrictions.

Task [2.7.1]
The task construct defines an explicit task. The data environment of the task is created according to the data-sharing attribute clauses on the task construct and any defaults that apply.
  #pragma omp task [clause[ [, ]clause] ...]
  structured-block
clause:
  if(scalar-expression)
  final(scalar-expression)
  untied
  default(shared | none)
  mergeable
  private(list)
  firstprivate(list)
  shared(list)

Taskyield [2.7.2]
The taskyield construct specifies that the current task can be suspended in favor of execution of a different task.
  #pragma omp taskyield

Master [2.8.1]
The master construct specifies a structured block that is executed by the master thread of the team. There is no implied barrier either on entry to, or exit from, the master construct.
  #pragma omp master
  structured-block

Critical [2.8.2]
The critical construct restricts execution of the associated structured block to a single thread at a time.
  #pragma omp critical [(name)]
  structured-block

Barrier [2.8.3]
The barrier construct specifies an explicit barrier at the point at which the construct appears.
  #pragma omp barrier

Taskwait [2.8.4]
The taskwait construct specifies a wait on the completion of child tasks of the current task.
  #pragma omp taskwait

Atomic [2.8.5]
The atomic construct ensures that a specific storage location is updated atomically, rather than exposing it to the possibility of multiple, simultaneous writing threads.
  #pragma omp atomic [read | write | update | capture]
  expression-stmt
  #pragma omp atomic capture
  structured-block
where expression-stmt may be one of the following forms, depending on the clause:
  read:                     v = x;
  write:                    x = expr;
  update or is not present: x++;  x--;  ++x;  --x;  x binop= expr;  x = x binop expr;
  capture:                  v = x++;  v = x--;  v = ++x;  v = --x;  v = x binop= expr;
and structured-block may be one of the following forms:
  {v = x; x binop= expr;}      {x binop= expr; v = x;}
  {v = x; x = x binop expr;}   {x = x binop expr; v = x;}
  {v = x; x++;}   {v = x; ++x;}   {++x; v = x;}   {x++; v = x;}
  {v = x; x--;}   {v = x; --x;}   {--x; v = x;}   {x--; v = x;}

Flush [2.8.6]
The flush construct executes the OpenMP flush operation, which makes a thread's temporary view of memory consistent with memory, and enforces an order on the memory operations of the variables.
  #pragma omp flush [(list)]

Ordered [2.8.7]
The ordered construct specifies a structured block in a loop region that will be executed in the order of the loop iterations. This sequentializes and orders the code within an ordered region while allowing code outside the region to run in parallel.
  #pragma omp ordered
  structured-block
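As a further illustration (not part of the original card), the sketch below uses the single and task constructs described above to create one explicit task per node of a linked list; the node type and the process routine are assumptions made for the example.
  typedef struct node { struct node *next; int data; } node_t;

  void process(node_t *p);   /* assumed to be provided elsewhere */

  void traverse(node_t *head)
  {
    node_t *p;
    #pragma omp parallel
    {
      #pragma omp single                 /* one thread generates the tasks */
      for (p = head; p != NULL; p = p->next) {
        #pragma omp task firstprivate(p) /* each task captures its own p */
        process(p);
      }
    } /* all tasks complete at the implicit barrier ending the parallel region */
  }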
Threadprivate [2.9.2]
The threadprivate directive specifies that variables are replicated, with each thread having its own copy.
  #pragma omp threadprivate(list)
list: A comma-separated list of file-scope, namespace-scope, or static block-scope variables that do not have incomplete types.

Runtime Library Routines

Execution Environment Routines [3.2]
Execution environment routines affect and monitor threads, processors, and the parallel environment.

void omp_set_num_threads(int num_threads);
  Affects the number of threads used for subsequent parallel regions that do not specify a num_threads clause.

int omp_get_num_threads(void);
  Returns the number of threads in the current team.

int omp_get_max_threads(void);
  Returns the maximum number of threads that could be used to form a new team using a parallel construct without a num_threads clause.

int omp_get_thread_num(void);
  Returns the ID of the encountering thread, where ID ranges from zero to the size of the team minus 1.

int omp_in_parallel(void);
  Returns true if the call to the routine is enclosed by an active parallel region; otherwise, it returns false.

void omp_set_dynamic(int dynamic_threads);
  Enables or disables dynamic adjustment of the number of threads available by setting the value of the dyn-var ICV.

int omp_get_dynamic(void);
  Returns the value of the dyn-var ICV, determining whether dynamic adjustment of the number of threads is enabled or disabled.

void omp_set_nested(int nested);
  Enables or disables nested parallelism, by setting the nest-var ICV.

int omp_get_nested(void);
  Returns the value of the nest-var ICV, which determines if nested parallelism is enabled or disabled.

int omp_get_thread_limit(void);
  Returns the value of the thread-limit-var ICV, which is the maximum number of OpenMP threads available to the program.

void omp_set_max_active_levels(int max_levels);
  Limits the number of nested active parallel regions, by setting the max-active-levels-var ICV.

int omp_get_max_active_levels(void);
  Returns the value of the max-active-levels-var ICV, which determines the maximum number of nested active parallel regions.

int omp_get_level(void);
  Returns the number of nested parallel regions enclosing the task that contains the call.

int omp_get_ancestor_thread_num(int level);
  Returns, for a given nested level of the current thread, the thread number of the ancestor of the current thread.

Lock Routines [3.3]
Lock routines support synchronization with OpenMP locks.

void omp_init_lock(omp_lock_t *lock);
void omp_init_nest_lock(omp_nest_lock_t *lock);
  These routines initialize an OpenMP lock.

void omp_destroy_lock(omp_lock_t *lock);
void omp_destroy_nest_lock(omp_nest_lock_t *lock);
  These routines ensure that the OpenMP lock is uninitialized.

void omp_set_lock(omp_lock_t *lock);
void omp_set_nest_lock(omp_nest_lock_t *lock);
  These routines provide a means of setting an OpenMP lock.

void omp_unset_lock(omp_lock_t *lock);
void omp_unset_nest_lock(omp_nest_lock_t *lock);
  These routines unset (release) an OpenMP lock.
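To ground the lock routines above, the following sketch (not part of the original card) initializes, sets, unsets, and destroys a simple OpenMP lock around updates to a shared counter; the counter variable and the printed message are assumptions made for the example.
  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
    omp_lock_t lck;
    int count = 0;

    omp_init_lock(&lck);          /* a lock must be initialized before use */
    #pragma omp parallel
    {
      omp_set_lock(&lck);         /* only one thread at a time passes this point */
      count += 1;
      omp_unset_lock(&lck);       /* release the lock for other threads */
    }
    omp_destroy_lock(&lck);       /* uninitialize the lock when no longer needed */

    printf("threads counted: %d\n", count);
    return 0;
  }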