IBM XL Fortran for Blue Gene/Q, V14.1
Optimization and Programming Guide
Ve r s i o n 1 4 .1
SC14-7370-00
IBM XL Fortran for Blue Gene/Q, V14.1
Optimization and Programming Guide
Ve r s i o n 1 4 .1
SC14-7370-00 Note Before using this information and the product it supports, read the information in “Notices” on page 311.
First edition This edition applies to IBM XL Fortran for Blue Gene/Q, V14.1 (Program 5799-AH1) and to all subsequent releases and modifications until otherwise indicated in new editions. Make sure you are using the correct edition for the level of the product. © Copyright IBM Corporation 1990, 2012. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents
About this information ...... vii Chapter 3. Advanced optimization Who should read this information...... vii concepts ...... 41 How to use this information ...... vii Aliasing ...... 41 How this information is organized ...... vii Inlining...... 41 Conventions ...... viii Finding the right level of inlining ...... 42 Related information ...... xii IBM XL Fortran information...... xii Chapter 4. Managing code size ....45 Standards and specifications ...... xiii Steps for reducing code size ...... 46 Other IBM information ...... xiv Compiler option influences on code size.....46 Technical support ...... xiv The -qipa compiler option ...... 46 How to send your comments ...... xiv The -qinline inlining option ...... 46 The -qhot compiler option ...... 47 Chapter 1. Optimizing your applications 1 The -qcompact compiler option...... 47 Distinguishing between optimization and tuning . . 1 Other influences on code size ...... 47 Steps in the optimization process ...... 2 High activity areas ...... 47 Basic optimization ...... 2 Computed GOTOs and CASE constructs . . . 47 Optimizing at level 0 ...... 3 Code size with dynamic or static linking . . . 48 Optimizing at level 2 ...... 3 Advanced optimization ...... 4 Chapter 5. Compiler-friendly Optimizing at level 3 ...... 5 programming techniques ...... 51 An intermediate step: adding -qhot suboptions at level 3 ...... 6 General practices ...... 51 Optimizing at level 4 ...... 7 Variables and pointers ...... 51 Optimizing at level 5 ...... 8 Arrays ...... 52 Specialized optimization techniques ...... 8 Choosing appropriate variable sizes ...... 52 High-order transformation (HOT) ...... 9 Interprocedural analysis (IPA) ...... 11 Chapter 6. High performance libraries 55 Vector technology ...... 14 Using the Mathematical Acceleration Subsystem Using compiler reports to diagnose optimization libraries (MASS) ...... 55 opportunities ...... 18 Using the scalar library ...... 56 Debugging optimized code ...... 20 Using the vector libraries ...... 58 Understanding different results in optimized Using the SIMD library ...... 62 programs ...... 21 Compiling and linking a program with MASS . . 64 Debugging in the presence of optimization . . . 21 Using the Basic Linear Algebra Subprograms – Using -qoptdebug to help debug optimized BLAS ...... 65 programs ...... 22 BLAS function syntax ...... 65 Tracing procedures in your code ...... 25 Linking the libxlopt library ...... 67 Getting more performance ...... 29 Beyond performance: effective programming Chapter 7. Parallel programming with techniques ...... 30 XL Fortran ...... 69 Thread-level speculative execution...... 69 Chapter 2. Tuning XL compiler Transactional memory ...... 70 applications ...... 31 Profiler for OpenMP ...... 72 Tuning for your target architecture ...... 31 Compiling your parallelized code ...... 72 Using -qarch ...... 31 The _OPENMP C preprocessor macro and Using -qtune ...... 32 conditional compilation ...... 73 Using -qcache ...... 33 Setting runtime options ...... 74 Before you finish tuning ...... 33 XLSMPOPTS ...... 74 Tuning your code for Blue Gene ...... 33 BG_SMP_FAST_WAKEUP ...... 78 Further option driven tuning ...... 35 BG_SPEC_SCRUB_CYCLE ...... 79 Options for providing application characteristics 36 Environment variables for OpenMP .....79 Options to control optimization transformations 38 Environment variables for thread-level Options to assist with performance analysis . . 39 speculative execution ...... 85 Options that can inhibit performance .....40 Environment variables for transactional memory 86 Optimizing your SMP code ...... 89
© Copyright IBM Corp. 1990, 2012 iii Developing and running SMP applications . . . 89 f_pthread_mutex_t ...... 233 Parallelization directives ...... 90 f_pthread_mutex_trylock(mutex) ...... 233 An introduction to parallelization directives . . 90 f_pthread_mutex_unlock(mutex) ...... 234 Detailed descriptions of parallelization directives 93 f_pthread_mutexattr_destroy(mattr) .....234 Directive clauses ...... 146 f_pthread_mutexattr_getpshared(mattr, pshared) 235 Routines for parallel programming ...... 173 f_pthread_mutexattr_gettype(mattr, type) . . . 236 Routines for OpenMP ...... 173 f_pthread_mutexattr_init(mattr) ...... 237 Routines for thread-level speculative execution 195 f_pthread_mutexattr_setpshared(mattr, pshared) 237 Routines for transactional memory .....197 f_pthread_mutexattr_settype(mattr, type) . . . 238 Pthreads library module...... 201 f_pthread_mutexattr_t ...... 239 Pthreads data structures, functions, and f_pthread_once(once, initr) ...... 239 subroutines ...... 202 f_pthread_once_t ...... 239 f_maketime(delay)...... 204 f_pthread_rwlock_destroy(rwlock) .....240 f_pthread_attr_destroy(attr)...... 205 f_pthread_rwlock_init(rwlock, rwattr) ....240 f_pthread_attr_getdetachstate(attr, detach) . . . 205 f_pthread_rwlock_rdlock(rwlock) ...... 241 f_pthread_attr_getguardsize(attr, guardsize) . . 206 f_pthread_rwlock_t ...... 241 f_pthread_attr_getinheritsched(attr, inherit) . . 206 f_pthread_rwlock_tryrdlock(rwlock) .....242 f_pthread_attr_getschedparam(attr, param) . . 207 f_pthread_rwlock_trywrlock(rwlock) ....242 f_pthread_attr_getschedpolicy(attr, policy) . . . 208 f_pthread_rwlock_unlock(rwlock) .....243 f_pthread_attr_getscope(attr, scope) .....208 f_pthread_rwlock_wrlock(rwlock) .....243 f_pthread_attr_getstack(attr, stackaddr, ssize) 209 f_pthread_rwlockattr_destroy(rwattr) ....244 f_pthread_attr_init(attr) ...... 209 f_pthread_rwlockattr_getpshared(rwattr, f_pthread_attr_setdetachstate(attr, detach) . . . 210 pshared) ...... 244 f_pthread_attr_setguardsize(attr, guardsize) . . 211 f_pthread_rwlockattr_init(rwattr) ...... 245 f_pthread_attr_setinheritsched(attr, inherit) . . 211 f_pthread_rwlockattr_setpshared(rwattr, f_pthread_attr_setschedparam(attr, param) . . 212 pshared) ...... 246 f_pthread_attr_setschedpolicy(attr, policy) . . . 213 f_pthread_rwlockattr_t ...... 246 f_pthread_attr_setscope(attr, scope) .....213 f_pthread_self()...... 247 f_pthread_attr_setstack(attr, stackaddr, ssize) 214 f_pthread_setcancelstate(state, oldstate)....247 f_pthread_attr_t ...... 215 f_pthread_setcanceltype(type, oldtype) ....248 f_pthread_cancel(thread) ...... 215 f_pthread_setconcurrency(new_level) ....249 f_pthread_cleanup_pop(exec) ...... 215 f_pthread_setschedparam(thread, policy, param) 249 f_pthread_cleanup_push(cleanup, flag, arg) . . 216 f_pthread_setspecific(key, arg) ...... 250 f_pthread_cond_broadcast(cond) ...... 217 f_pthread_t ...... 251 f_pthread_cond_destroy(cond)...... 218 f_pthread_testcancel() ...... 251 f_pthread_cond_init(cond, cattr) ...... 218 f_sched_param ...... 251 f_pthread_cond_signal(cond) ...... 219 f_sched_yield() ...... 251 f_pthread_cond_t ...... 219 f_timespec ...... 252 f_pthread_cond_timedwait(cond, mutex, timeout) ...... 220 Chapter 8. Interlanguage calls ....253 f_pthread_cond_wait(cond, mutex) .....220 Conventions for XL Fortran external names . . . 253 f_pthread_condattr_destroy(cattr)...... 221 Mixed-language input and output ...... 254 f_pthread_condattr_getpshared(cattr, pshared) 221 Mixing Fortran and C++ ...... 254 f_pthread_condattr_init(cattr) ...... 222 Making calls to C functions work...... 256 f_pthread_condattr_setpshared(cattr, pshared) 223 Passing data from one language to another . . . 257 f_pthread_condattr_t ...... 223 Passing arguments between languages ....257 f_pthread_create(thread, attr, flag, ent, arg) . . 224 Passing global variables between languages . . 258 f_pthread_detach(thread) ...... 225 Passing character types between languages . . 259 f_pthread_equal(thread1, thread2) .....226 Passing arrays between languages .....260 f_pthread_exit(ret) ...... 226 Passing pointers between languages .....261 f_pthread_getconcurrency() ...... 227 Passing arguments by reference or by value . . 261 f_pthread_getschedparam(thread, policy, param) 227 Passing COMPLEX values to/from gcc ....263 f_pthread_getspecific(key, arg)...... 228 Returning values from Fortran functions . . . 263 f_pthread_join(thread, ret) ...... 229 Arguments with the OPTIONAL attribute . . . 263 f_pthread_key_create(key, dtr) ...... 229 Assembler-level subroutine linkage conventions 264 f_pthread_key_delete(key) ...... 230 The stack ...... 265 f_pthread_key_t ...... 231 The Linkage Area and Minimum Stack Frame 266 f_pthread_kill(thread, sig) ...... 231 The input parameter area ...... 267 f_pthread_mutex_destroy(mutex) ...... 231 The register save area ...... 267 f_pthread_mutex_init(mutex, mattr) .....232 The local stack area ...... 267 f_pthread_mutex_lock(mutex) ...... 233 The output parameter area ...... 267 iv XL Fortran: Optimization and Programming Guide Linkage convention for argument passing ....267 Minimizing rounding errors ...... 292 Argument passing rules (by value) .....268 Minimizing overall rounding ...... 292 Order of arguments in argument list ....269 Delaying rounding until run time .....292 Linkage convention for function calls .....269 Ensuring that the rounding mode is consistent 292 Pointers to functions ...... 270 Duplicating the floating-point results of other Function values ...... 270 systems ...... 293 The stack floor ...... 271 Maximizing floating-point performance ....293 Stack overflow ...... 271 Detecting and trapping floating-point exceptions 294 Prolog and epilog ...... 271 Compiler features for trapping floating-point Traceback...... 272 exceptions ...... 294 Installing an exception handler ...... 295 Chapter 9. Implementation details of Producing a core file ...... 296 XL Fortran Input/Output (I/O).....273 Controlling the floating-point status and control register ...... 296 Implementation details of file formats .....273 xlf_fp_util procedures ...... 297 File names ...... 274 fpgets and fpsets subroutines ...... 297 Preconnected and Implicitly Connected Files . . . 275 Sample programs for exception handling . . . 299 File positioning...... 276 Causing exceptions for particular variables . . 299 I/O redirection ...... 276 Minimizing the performance impact of How XL Fortran I/O interacts with pipes, special floating-point exception trapping ...... 299 files, and links ...... 277 Default record lengths ...... 277 File permissions ...... 277 Chapter 11. Porting programs to XL Selecting error messages and recovery actions . . 278 Fortran ...... 301 Flushing I/O buffers ...... 278 Outline of the porting process ...... 301 Choosing locations and names for Input/Output Portability of directives ...... 301 files ...... 279 Common industry extensions that XL Fortran Naming files that are connected with no explicit supports ...... 302 name ...... 279 Mixing data types in statements ...... 303 Naming scratch files ...... 279 Date and time routines ...... 304 XL Fortran thread-safe I/O library ...... 280 Other libc routines ...... 304 Synchronization of I/O operations .....280 Changing the default sizes of data types . . . 304 Parallel I/O issues...... 281 Name conflicts between your procedures and Use of I/O statements in signal handlers . . . 283 XL Fortran intrinsic procedures ...... 304 Asynchronous thread cancellation ...... 283 Reproducing results from other systems . . . 304
Chapter 10. Implementation details of Chapter 12. Sample Fortran programs 305 XL Fortran floating-point processing . 285 Example1-XLFortran source file ...... 305 IEEE floating-point overview ...... 285 Example 2 - valid C routine source file .....305 Compiling for strict IEEE conformance ....285 Example 3 - valid Fortran SMP source file ....308 IEEE Single- and double-precision values . . . 286 Example 4 - invalid Fortran SMP source file . . . 308 IEEE extended-precision values ...... 286 Programming examples using the Pthreads library Infinities and NaNs ...... 286 module ...... 309 Exception-handling model ...... 287 Hardware-specific floating-point overview....288 Notices ...... 311 Single- and double-precision values .....288 Trademarks and service marks ...... 313 Extended-precision values ...... 289 How XL Fortran rounds floating-point calculations 290 Index ...... 315 Selecting the rounding mode ...... 290
Contents v vi XL Fortran: Optimization and Programming Guide About this information
This information is part of the IBM® XL Fortran for Blue Gene®/Q, V14.1 information suite. It provides both reference information and practical tips for using XL Fortran's optimization and tuning capabilities to maximize application performance, as well as expanding on programming concepts such as I/O and interlanguage calls.
Who should read this information This information is for anyone who wants to exploit the XL Fortran compiler's capabilities for optimizing and tuning Fortran programs. Readers should be familiar with their Blue Gene/Q operating system and have extensive Fortran programming experience with complex applications. However, users new to XL Fortran can still use this information to help them understand how the compiler's features can be used for effective program optimization.
How to use this information This guide focuses on specific programming and compilation techniques that can maximize XL Fortran application performance. It covers optimization and tuning strategies, recommended programming practices and compilation procedures, debugging, and information on using XL Fortran advanced language features. This guide also contains cross-references to relevant topics of other reference guides in the XL Fortran information suite.
Topics not described in this information are available as indicated in the following: v Installation, system requirements, last-minute updates: see the XL Fortran Installation Guide and product README. v Overview of XL Fortran features: see the Getting Started with XL Fortran. v Syntax, semantics, and implementation of the XL Fortran programming language: see the XL Fortran Language Reference. v Compiler setup, compiling and running programs, compiler options, diagnostics: see the XL Fortran Compiler Reference. v Operating system commands related to the use of the compiler: consult your Linux-specific distribution's man page help and information.
How this information is organized This guide includes the following topics: v Chapter 1, “Optimizing your applications,” on page 1 provides an overview of the optimization process. v Chapter 2, “Tuning XL compiler applications,” on page 31 discusses the compiler options available for optimizing and tuning code. v Chapter 3, “Advanced optimization concepts,” on page 41, Chapter 4, “Managing code size,” on page 45, and “Debugging optimized code” on page 20 discuss advanced techniques like optimizing loops and inlining code, and debug considerations for optimized code.
© Copyright IBM Corp. 1990, 2012 vii v The following sections contain information on how to write optimization friendly, portable XL Fortran code, that is interoperable with other languages. Also included is a description of XL Fortran's OpenMP and SMP support with guidelines for writing parallel code. – Chapter 5, “Compiler-friendly programming techniques,” on page 51 – Chapter 6, “High performance libraries,” on page 55 – Chapter 7, “Parallel programming with XL Fortran,” on page 69 – Chapter 8, “Interlanguage calls,” on page 253 v The following sections contain information about XL Fortran and its implementation that can be useful for new and experienced users alike, as well as those who want to move their existing Fortran applications to the XL Fortran compiler: – Chapter 9, “Implementation details of XL Fortran Input/Output (I/O),” on page 273 – Chapter 10, “Implementation details of XL Fortran floating-point processing,” on page 285 – Chapter 11, “Porting programs to XL Fortran,” on page 301
Conventions Typographical conventions
The following table shows the typographical conventions used in the IBM XL Fortran for Blue Gene/Q, V14.1 information. Table 1. Typographical conventions Typeface Indicates Example bold Lowercase commands, executable The compiler provides basic names, compiler options, and invocation commands, bgxlf, along directives. with several other compiler invocation commands to support various Fortran language levels and compilation environments. italics Parameters or variables whose Make sure that you update the size actual names or values are to be parameter if you return more than supplied by the user. Italics are the size requested. also used to introduce new terms. underlining The default setting of a parameter nomaf | maf of a compiler option or directive. monospace Programming keywords and To compile and optimize library functions, compiler builtins, myprogram.f, enter: bgxlf examples of program code, myprogram.f -O3. command strings, or user-defined names. UPPERCASE Fortran programming keywords, The ASSERT directive applies only to bold statements, directives, and intrinsic the DO loop immediately following procedures. Uppercase letters may the directive, and not to any nested also be used to indicate the DO loops. minimum number of characters required to invoke a compiler option/suboption.
viii XL Fortran: Optimization and Programming Guide Qualifying elements (icons and bracket separators)
In descriptions of language elements, this information uses icons and marked bracket separators to delineate the Fortran language standard text as follows: Table 2. Qualifying elements Bracket Icon separator text Meaning
F2008 N/A The text describes an IBM XL Fortran implementation of the Fortran 2008 standard. F2008
Fortran 2003 The text describes an IBM XL Fortran implementation of begins / ends the Fortran 2003 standard, and it applies to all later standards.
IBM extension The text describes a feature that is an IBM XL Fortran begins / ends extension to the standard language specifications.
Note: If the information is marked with a Fortran language standard icon or bracket separators, it applies to this specific Fortran language standard and all later ones. If it is not marked, it applies to all Fortran language standards. Syntax diagrams
Throughout this information, diagrams illustrate XL Fortran syntax. This section will help you to interpret and use those diagrams. v Read the syntax diagrams from left to right, from top to bottom, following the path of the line. The ─── symbol indicates the beginning of a command, directive, or statement. The ─── symbol indicates that the command, directive, or statement syntax is continued on the next line. The ─── symbol indicates that a command, directive, or statement is continued from the previous line. The ─── symbol indicates the end of a command, directive, or statement. Fragments, which are diagrams of syntactical units other than complete commands, directives, or statements, start with the │─── symbol and end with the ───│ symbol. IBM XL Fortran extensions are marked by a number in the syntax diagram with an explanatory note immediately following the diagram. Program units, procedures, constructs, interface blocks and derived-type definitions consist of several individual statements. For such items, a box encloses the syntax representation, and individual syntax diagrams show the required order for the equivalent Fortran statements. v Required items are shown on the horizontal line (the main path):