XL Fortran: Optimization and Programming Guide
IBM XL Fortran for Blue Gene/Q, V14.1
Optimization and Programming Guide
Version 14.1
SC14-7370-00

Note: Before using this information and the product it supports, read the information in “Notices” on page 311.

First edition

This edition applies to IBM XL Fortran for Blue Gene/Q, V14.1 (Program 5799-AH1) and to all subsequent releases and modifications until otherwise indicated in new editions. Make sure you are using the correct edition for the level of the product.

© Copyright IBM Corporation 1990, 2012. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

About this information ..... vii
  Who should read this information ..... vii
  How to use this information ..... vii
  How this information is organized ..... vii
  Conventions ..... viii
  Related information ..... xii
  IBM XL Fortran information ..... xii
  Standards and specifications ..... xiii
  Other IBM information ..... xiv
  Technical support ..... xiv
  How to send your comments ..... xiv

Chapter 1. Optimizing your applications ..... 1
  Distinguishing between optimization and tuning ..... 1
  Steps in the optimization process ..... 2
  Basic optimization ..... 2
  Optimizing at level 0 ..... 3
  Optimizing at level 2 ..... 3
  Advanced optimization ..... 4
  Optimizing at level 3 ..... 5
  An intermediate step: adding -qhot suboptions at level 3 ..... 6
  Optimizing at level 4 ..... 7
  Optimizing at level 5 ..... 8
  Specialized optimization techniques ..... 8
  High-order transformation (HOT) ..... 9
  Interprocedural analysis (IPA) ..... 11
  Vector technology ..... 14
  Using compiler reports to diagnose optimization opportunities ..... 18
  Debugging optimized code ..... 20
  Understanding different results in optimized programs ..... 21
  Debugging in the presence of optimization ..... 21
  Using -qoptdebug to help debug optimized programs ..... 22
  Tracing procedures in your code ..... 25
  Getting more performance ..... 29
  Beyond performance: effective programming techniques ..... 30

Chapter 2. Tuning XL compiler applications ..... 31
  Tuning for your target architecture ..... 31
  Using -qarch ..... 31
  Using -qtune ..... 32
  Using -qcache ..... 33
  Before you finish tuning ..... 33
  Tuning your code for Blue Gene ..... 33
  Further option driven tuning ..... 35
  Options for providing application characteristics ..... 36
  Options to control optimization transformations ..... 38
  Options to assist with performance analysis ..... 39
  Options that can inhibit performance ..... 40

Chapter 3. Advanced optimization concepts ..... 41
  Aliasing ..... 41
  Inlining ..... 41
  Finding the right level of inlining ..... 42

Chapter 4. Managing code size ..... 45
  Steps for reducing code size ..... 46
  Compiler option influences on code size ..... 46
  The -qipa compiler option ..... 46
  The -qinline inlining option ..... 46
  The -qhot compiler option ..... 47
  The -qcompact compiler option ..... 47
  Other influences on code size ..... 47
  High activity areas ..... 47
  Computed GOTOs and CASE constructs ..... 47
  Code size with dynamic or static linking ..... 48

Chapter 5. Compiler-friendly programming techniques ..... 51
  General practices ..... 51
  Variables and pointers ..... 51
  Arrays ..... 52
  Choosing appropriate variable sizes ..... 52

Chapter 6. High performance libraries ..... 55
  Using the Mathematical Acceleration Subsystem libraries (MASS) ..... 55
  Using the scalar library ..... 56
  Using the vector libraries ..... 58
  Using the SIMD library ..... 62
  Compiling and linking a program with MASS ..... 64
  Using the Basic Linear Algebra Subprograms – BLAS ..... 65
  BLAS function syntax ..... 65
  Linking the libxlopt library ..... 67

Chapter 7. Parallel programming with XL Fortran ..... 69
  Thread-level speculative execution ..... 69
  Transactional memory ..... 70
  Profiler for OpenMP ..... 72
  Compiling your parallelized code ..... 72
  The _OPENMP C preprocessor macro and conditional compilation ..... 73
  Setting runtime options ..... 74
  XLSMPOPTS ..... 74
  BG_SMP_FAST_WAKEUP ..... 78
  BG_SPEC_SCRUB_CYCLE ..... 79
  Environment variables for OpenMP ..... 79
  Environment variables for thread-level speculative execution ..... 85
  Environment variables for transactional memory ..... 86
  Optimizing your SMP code ..... 89
  Developing and running SMP applications ..... 89
  Parallelization directives ..... 90
  An introduction to parallelization directives ..... 90
  Detailed descriptions of parallelization directives ..... 93
  Directive clauses ..... 146
  Routines for parallel programming ..... 173
  Routines for OpenMP ..... 173
  Routines for thread-level speculative execution ..... 195
  Routines for transactional memory ..... 197
  Pthreads library module ..... 201
  Pthreads data structures, functions, and subroutines ..... 202
  f_maketime(delay) ..... 204
  f_pthread_attr_destroy(attr) ..... 205
  f_pthread_attr_getdetachstate(attr, detach) ..... 205
  f_pthread_attr_getguardsize(attr, guardsize) ..... 206
  f_pthread_attr_getinheritsched(attr, inherit) ..... 206
  f_pthread_attr_getschedparam(attr, param) ..... 207
  f_pthread_attr_getschedpolicy(attr, policy) ..... 208
  f_pthread_attr_getscope(attr, scope) ..... 208
  f_pthread_attr_getstack(attr, stackaddr, ssize) ..... 209
  f_pthread_attr_init(attr) ..... 209
  f_pthread_attr_setdetachstate(attr, detach) ..... 210
  f_pthread_attr_setguardsize(attr, guardsize) ..... 211
  f_pthread_attr_setinheritsched(attr, inherit) ..... 211
  f_pthread_attr_setschedparam(attr, param) ..... 212
  f_pthread_attr_setschedpolicy(attr, policy) ..... 213
  f_pthread_attr_setscope(attr, scope) ..... 213
  f_pthread_attr_setstack(attr, stackaddr, ssize) ..... 214
  f_pthread_attr_t ..... 215
  f_pthread_cancel(thread) ..... 215
  f_pthread_cleanup_pop(exec) ..... 215
  f_pthread_cleanup_push(cleanup, flag, arg) ..... 216
  f_pthread_cond_broadcast(cond) ..... 217
  f_pthread_cond_destroy(cond) ..... 218
  f_pthread_cond_init(cond, cattr) ..... 218
  f_pthread_cond_signal(cond) ..... 219
  f_pthread_cond_t ..... 219
  f_pthread_cond_timedwait(cond, mutex, timeout) ..... 220
  f_pthread_cond_wait(cond, mutex) ..... 220
  f_pthread_condattr_destroy(cattr) ..... 221
  f_pthread_condattr_getpshared(cattr, pshared) ..... 221
  f_pthread_condattr_init(cattr) ..... 222
  f_pthread_condattr_setpshared(cattr, pshared) ..... 223
  f_pthread_condattr_t ..... 223
  f_pthread_create(thread, attr, flag, ent, arg) ..... 224
  f_pthread_detach(thread) ..... 225
  f_pthread_equal(thread1, thread2) ..... 226
  f_pthread_exit(ret) ..... 226
  f_pthread_getconcurrency() ..... 227
  f_pthread_getschedparam(thread, policy, param) ..... 227
  f_pthread_getspecific(key, arg) ..... 228
  f_pthread_join(thread, ret) ..... 229
  f_pthread_key_create(key, dtr) ..... 229
  f_pthread_key_delete(key) ..... 230
  f_pthread_key_t ..... 231
  f_pthread_kill(thread, sig) ..... 231
  f_pthread_mutex_destroy(mutex) ..... 231
  f_pthread_mutex_init(mutex, mattr) ..... 232
  f_pthread_mutex_lock(mutex)
  f_pthread_mutex_t ..... 233
  f_pthread_mutex_trylock(mutex) ..... 233
  f_pthread_mutex_unlock(mutex) ..... 234
  f_pthread_mutexattr_destroy(mattr) ..... 234
  f_pthread_mutexattr_getpshared(mattr, pshared) ..... 235
  f_pthread_mutexattr_gettype(mattr, type) ..... 236
  f_pthread_mutexattr_init(mattr) ..... 237
  f_pthread_mutexattr_setpshared(mattr, pshared) ..... 237
  f_pthread_mutexattr_settype(mattr, type) ..... 238
  f_pthread_mutexattr_t ..... 239
  f_pthread_once(once, initr) ..... 239
  f_pthread_once_t ..... 239
  f_pthread_rwlock_destroy(rwlock) ..... 240
  f_pthread_rwlock_init(rwlock, rwattr) ..... 240
  f_pthread_rwlock_rdlock(rwlock) ..... 241
  f_pthread_rwlock_t ..... 241
  f_pthread_rwlock_tryrdlock(rwlock) ..... 242
  f_pthread_rwlock_trywrlock(rwlock) ..... 242
  f_pthread_rwlock_unlock(rwlock) ..... 243
  f_pthread_rwlock_wrlock(rwlock) ..... 243
  f_pthread_rwlockattr_destroy(rwattr) ..... 244
  f_pthread_rwlockattr_getpshared(rwattr, pshared) ..... 244
  f_pthread_rwlockattr_init(rwattr) ..... 245
  f_pthread_rwlockattr_setpshared(rwattr, pshared) ..... 246
  f_pthread_rwlockattr_t ..... 246
  f_pthread_self() ..... 247
  f_pthread_setcancelstate(state, oldstate) ..... 247
  f_pthread_setcanceltype(type, oldtype) ..... 248
  f_pthread_setconcurrency(new_level) ..... 249
  f_pthread_setschedparam(thread, policy, param) ..... 249
  f_pthread_setspecific(key, arg) ..... 250
  f_pthread_t ..... 251
  f_pthread_testcancel() ..... 251
  f_sched_param ..... 251
  f_sched_yield() ..... 251
  f_timespec ..... 252

Chapter 8. Interlanguage calls ..... 253
  Conventions for XL Fortran external names ..... 253
  Mixed-language input and output ..... 254
  Mixing Fortran and C++ ..... 254
  Making calls to C functions work ..... 256
  Passing data from one language to another ..... 257
  Passing arguments between languages ..... 257
  Passing global variables between languages ..... 258
  Passing character types between languages ..... 259
  Passing arrays between languages ..... 260
  Passing pointers between languages ..... 261
  Passing arguments by reference or by value ..... 261
  Passing COMPLEX values to/from gcc ..... 263
  Returning values from Fortran functions ..... 263
  Arguments with the OPTIONAL attribute ..... 263
  Assembler-level subroutine linkage conventions ..... 264
  The stack ..... 265
  The Linkage Area and Minimum Stack Frame ..... 266
  The input parameter area ..... 267
  The register save area ..... 267
  The local stack area ..... 267