OpenMP Application Programming Interface Examples

Total Pages: 16
File Type: PDF · Size: 1020 KB

Version 5.0.0 – November 2019

Source codes for the OpenMP 5.0.0 Examples can be downloaded from GitHub.

Copyright © 1997-2019 OpenMP Architecture Review Board. Permission to copy without fee all or part of this material is granted, provided the OpenMP Architecture Review Board copyright notice and the title of this document appear. Notice is given that copying is by permission of the OpenMP Architecture Review Board.

Contents

Foreword
Introduction
Examples

1 Parallel Execution
  1.1 A Simple Parallel Loop
  1.2 The parallel Construct
  1.3 teams Construct on Host
  1.4 Controlling the Number of Threads on Multiple Nesting Levels
  1.5 Interaction Between the num_threads Clause and omp_set_dynamic
  1.6 Fortran Restrictions on the do Construct
  1.7 The nowait Clause
  1.8 The collapse Clause
  1.9 linear Clause in Loop Constructs
  1.10 The parallel sections Construct
  1.11 The firstprivate Clause and the sections Construct
  1.12 The single Construct
  1.13 The workshare Construct
  1.14 The master Construct
  1.15 The loop Construct
  1.16 Parallel Random Access Iterator Loop
  1.17 The omp_set_dynamic and omp_set_num_threads Routines
  1.18 The omp_get_num_threads Routine

2 OpenMP Affinity
  2.1 The proc_bind Clause
    2.1.1 Spread Affinity Policy
    2.1.2 Close Affinity Policy
    2.1.3 Master Affinity Policy
  2.2 Task Affinity
  2.3 Affinity Display
  2.4 Affinity Query Functions

3 Tasking
  3.1 The task and taskwait Constructs
  3.2 Task Priority
  3.3 Task Dependences
    3.3.1 Flow Dependence
    3.3.2 Anti-dependence
    3.3.3 Output Dependence
    3.3.4 Concurrent Execution with Dependences
    3.3.5 Matrix Multiplication
    3.3.6 taskwait with Dependences
    3.3.7 Mutually Exclusive Execution with Dependences
    3.3.8 Multidependences Using Iterators
  3.4 The taskgroup Construct
  3.5 The taskyield Construct
  3.6 The taskloop Construct
  3.7 The parallel master taskloop Construct

4 Devices
  4.1 target Construct
    4.1.1 target Construct on parallel Construct
    4.1.2 target Construct with map Clause
    4.1.3 map Clause with to/from map-types
    4.1.4 map Clause with Array Sections
    4.1.5 target Construct with if Clause
    4.1.6 target Reverse Offload
  4.2 Pointer Mapping
  4.3 Structure Mapping
  4.4 Array Sections in Device Constructs
  4.5 Array Shaping
  4.6 declare mapper Construct
  4.7 target data Construct
    4.7.1 Simple target data Construct
    4.7.2 target data Region Enclosing Multiple target Regions
    4.7.3 target data Construct with Orphaned Call
    4.7.4 target data Construct with if Clause
  4.8 target enter data and target exit data Constructs
  4.9 target update Construct
    4.9.1 Simple target data and target update Constructs
    4.9.2 target update Construct with if Clause
  4.10 declare target Construct
    4.10.1 declare target and end declare target for a Function
    4.10.2 declare target Construct for Class Type
    4.10.3 declare target and end declare target for Variables
    4.10.4 declare target and end declare target with declare simd
    4.10.5 declare target Directive with link Clause
  4.11 teams Constructs
    4.11.1 target and teams Constructs with omp_get_num_teams and omp_get_team_num Routines
    4.11.2 target, teams, and distribute Constructs
    4.11.3 target teams and Distribute Parallel Loop Constructs
    4.11.4 target teams and Distribute Parallel Loop Constructs with Scheduling Clauses
    4.11.5 target teams and distribute simd Constructs
    4.11.6 target teams and Distribute Parallel Loop SIMD Constructs
  4.12 Asynchronous target Execution and Dependences
    4.12.1 Asynchronous target with Tasks
    4.12.2 nowait Clause on target Construct
    4.12.3 Asynchronous target with nowait and depend Clauses
  4.13 Device Routines
    4.13.1 omp_is_initial_device Routine
    4.13.2 omp_get_num_devices Routine
    4.13.3 omp_set_default_device and omp_get_default_device Routines
    4.13.4 Target Memory and Device Pointers Routines

5 SIMD
  5.1 simd and declare simd Constructs
  5.2 inbranch and notinbranch Clauses
  5.3 Loop-Carried Lexical Forward Dependence

6 Synchronization
  6.1 The critical Construct
  6.2 Worksharing Constructs Inside a critical Construct
  6.3 Binding of barrier Regions
  6.4 The atomic Construct
  6.5 Restrictions on the atomic Construct
  6.6 The flush Construct without a List
  6.7 Synchronization Based on Acquire/Release Semantics
  6.8 The ordered Clause and the ordered Construct
  6.9 The depobj Construct
  6.10 Doacross Loop Nest
  6.11 Lock Routines
    6.11.1 The omp_init_lock Routine
    6.11.2 The omp_init_lock_with_hint Routine
    6.11.3 Ownership of Locks
    6.11.4 Simple Lock Routines
    6.11.5 Nestable Lock Routines

7 Data Environment
  7.1 The threadprivate Directive
  7.2 The default(none) Clause
  7.3 The private Clause
  7.4 Fortran Private Loop Iteration Variables
  7.5 Fortran Restrictions on shared and private Clauses with Common Blocks
  7.6 Fortran Restrictions on Storage Association with the private Clause
  7.7 C/C++ Arrays in a firstprivate Clause
  7.8 The lastprivate Clause
  7.9 Reduction
    7.9.1 The reduction Clause
    7.9.2 Task Reduction
    7.9.3 Taskloop Reduction
    7.9.4 User-Defined Reduction
  7.10 The copyin Clause
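The full example sources are in the document itself and its GitHub repository. As a flavor of what Section 1.1 (A Simple Parallel Loop) covers, here is a minimal C sketch of a worksharing loop; it is not reproduced from the document, and the array names and sizes are illustrative only.

    #include <stdio.h>

    /* Minimal sketch of a simple parallel loop: the iterations are
       independent, so OpenMP may divide them among the team's threads. */
    int main(void) {
        enum { N = 1000 };
        static double a[N], b[N];

        for (int i = 0; i < N; i++)
            b[i] = (double)i;

        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * b[i];    /* each iteration writes a distinct a[i] */

        printf("a[42] = %f\n", a[42]);    /* expect 84.000000 */
        return 0;
    }

Built with OpenMP enabled (e.g., gcc -fopenmp), the loop runs in parallel; without it, the pragma is ignored and the program still produces the same result serially.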