XL Fortran: Optimization and Programming Guide for Little Endian Distributions F Pthread Rwlock Unlock(Rwlock)
Total Page:16
File Type:pdf, Size:1020Kb
IBM XL Fortran for Linux, V16.1 IBM Optimization and Programming Guide for Little Endian Distributions Version 16.1 SC27-8049-00 IBM XL Fortran for Linux, V16.1 IBM Optimization and Programming Guide for Little Endian Distributions Version 16.1 SC27-8049-00 Note Before using this information and the product it supports, read the information in “Notices” on page 381. First edition This edition applies to IBM XL Fortran for Linux, V16.1 (Program 5765-J10; 5725-C75) and to all subsequent releases and modifications until otherwise indicated in new editions. Make sure you are using the correct edition for the level of the product. © Copyright IBM Corporation 1990, 2018. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents About this document ........ vii Chapter 4. Managing code size .... 47 Who should read this document ....... vii Steps for reducing code size ......... 48 How to use this document ......... vii Compiler option influences on code size..... 48 How this document is organized ....... vii The -qipa compiler option ........ 48 Conventions .............. viii The -qinline inlining option ........ 48 Related information ........... xii The -qhot compiler option ........ 49 Available help information ........ xii The -qcompact compiler option....... 49 Standards and specifications ....... xiv Other influences on code size ........ 49 Other IBM information ......... xv High activity areas ........... 49 Technical support ............ xv Computed GOTOs and CASE constructs ... 49 How to send your comments ........ xv Code size with dynamic or static linking ... 50 Chapter 1. Optimizing your applications 1 Chapter 5. Debugging optimized code 53 Distinguishing between optimization and tuning .. 1 Understanding different results in optimized Steps in the optimization process ....... 2 programs ............... 53 Basic optimization ............ 2 Debugging in the presence of optimization .... 54 Optimizing at level 0 .......... 3 Optimizing at level 2 .......... 3 Chapter 6. Compiler-friendly Advanced optimization ........... 4 programming techniques ....... 57 Optimizing at level 3 .......... 5 General practices ............ 57 An intermediate step: adding -qhot suboptions at Variables and pointers ........... 57 level 3 ............... 6 Arrays ................ 58 Optimizing at level 4 .......... 7 Choosing appropriate variable sizes ...... 58 Optimizing at level 5 .......... 8 Submodules (Fortran 2008) ......... 59 Specialized optimization techniques ...... 8 High-order transformation (HOT) ...... 9 Interprocedural analysis (IPA) ....... 11 Chapter 7. Exploiting POWER9 Profile-directed feedback ......... 15 technology ............. 63 Vector technology ........... 23 POWER9 compiler options ......... 63 Using compiler reports to diagnose optimization High-performance libraries that are tuned for opportunities ............. 27 POWER9 ............... 63 Tracing procedures in your code ....... 29 POWER9 intrinsic procedures ........ 63 Getting more performance ......... 33 Beyond performance: effective programming Chapter 8. High performance libraries 67 techniques ............... 34 Using the Mathematical Acceleration Subsystem (MASS) libraries ............. 67 Chapter 2. Tuning XL compiler Using the scalar library ......... 68 applications ............ 35 Using the vector libraries ......... 70 Tuning for your target architecture ...... 35 Using the SIMD libraries ......... 74 Using -qarch ............. 35 Compiling and linking a program with MASS .. 79 Using -qtune ............. 36 Using the Basic Linear Algebra Subprograms - BLAS 80 Using -qcache ............ 37 BLAS function syntax .......... 81 Before you finish tuning ......... 37 Linking the libxlopt library ........ 83 Further option driven tuning ........ 37 Options for providing application characteristics 38 Chapter 9. Parallel programming with Options to control optimization transformations 40 XL Fortran ............. 85 Options to assist with performance analysis .. 41 Compiling your parallelized code ....... 85 Options that can inhibit performance ..... 42 The _OPENMP C preprocessor macro and conditional compilation ......... 85 Chapter 3. Advanced optimization Setting runtime options .......... 86 concepts .............. 43 XLSMPOPTS ............. 86 Aliasing ............... 43 Environment variables for OpenMP ..... 92 Inlining................ 43 Optimizing your SMP code......... 106 Finding the right level of inlining ...... 44 Developing and running SMP applications .. 107 Parallelization directives.......... 107 © Copyright IBM Corp. 1990, 2018 iii Summary of supported parallelization directives 108 f_pthread_attr_getschedpolicy(attr, policy) ... 265 Detailed descriptions of parallelization directives 111 f_pthread_attr_getscope(attr, scope) ..... 266 Data sharing attribute rules........ 204 f_pthread_attr_getstack(attr, stackaddr, ssize) 266 Directive clauses ........... 206 f_pthread_attr_init(attr) ......... 267 Routines for OpenMP .......... 229 f_pthread_attr_setdetachstate(attr, detach) ... 267 omp_destroy_lock(svar) ......... 231 f_pthread_attr_setguardsize(attr, guardsize) .. 268 omp_destroy_nest_lock(nvar) ....... 232 f_pthread_attr_setinheritsched(attr, inherit) .. 269 omp_get_active_level() ......... 232 f_pthread_attr_setschedparam(attr, param) .. 269 omp_get_ancestor_thread_num(level) .... 232 f_pthread_attr_setschedpolicy(attr, policy) ... 270 omp_get_cancellation() ......... 233 f_pthread_attr_setscope(attr, scope) ..... 271 omp_get_default_device() ........ 233 f_pthread_attr_setstack(attr, stackaddr, ssize) 271 omp_get_dynamic() .......... 234 f_pthread_attr_t ........... 272 omp_get_initial_device()......... 234 f_pthread_cancel(thread) ........ 272 omp_get_level() ........... 235 f_pthread_cleanup_pop(exec) ....... 273 omp_get_max_active_levels() ....... 235 f_pthread_cleanup_push(cleanup, flag, arg) .. 273 omp_get_max_threads() ......... 235 f_pthread_cond_broadcast(cond) ...... 275 omp_get_nested() ........... 236 f_pthread_cond_destroy(cond)....... 275 omp_get_num_devices() ......... 236 f_pthread_cond_init(cond, cattr) ...... 276 omp_get_num_places() ......... 237 f_pthread_cond_signal(cond) ....... 276 omp_get_num_procs() ......... 237 f_pthread_cond_t ........... 277 omp_get_num_teams() ......... 238 f_pthread_cond_timedwait(cond, mutex, omp_get_num_threads() ......... 238 timeout) .............. 277 omp_get_partition_num_places() ...... 239 f_pthread_cond_wait(cond, mutex) ..... 278 omp_get_partition_place_nums(place_nums) 239 f_pthread_condattr_destroy(cattr)...... 278 omp_get_place_num() ......... 240 f_pthread_condattr_getpshared(cattr, pshared) 279 omp_get_place_num_procs(place_num).... 240 f_pthread_condattr_init(cattr) ....... 279 omp_get_place_proc_ids(place_num, ids) ... 241 f_pthread_condattr_setpshared(cattr, pshared) 280 omp_get_proc_bind() .......... 241 f_pthread_condattr_t .......... 281 omp_get_schedule(kind, modifier) ..... 242 f_pthread_create(thread, attr, flag, ent, arg) .. 281 omp_get_team_num() ......... 243 f_pthread_detach(thread) ........ 282 omp_get_team_size(level) ........ 243 f_pthread_equal(thread1, thread2) ..... 283 omp_get_thread_limit() ......... 244 f_pthread_exit(ret) ........... 283 omp_get_thread_num() ......... 244 f_pthread_getconcurrency() ........ 284 omp_get_wtick() ........... 245 f_pthread_getschedparam(thread, policy, param) 285 omp_get_wtime() ........... 246 f_pthread_getspecific(key, arg)....... 285 omp_in_final() ............ 246 f_pthread_join(thread, ret) ........ 286 omp_in_parallel() ........... 247 f_pthread_key_create(key, dtr) ....... 287 omp_init_lock(svar) .......... 248 f_pthread_key_delete(key) ........ 287 omp_init_nest_lock(nvar) ........ 249 f_pthread_key_t ........... 288 omp_is_initial_device() ......... 249 f_pthread_kill(thread, sig) ........ 288 omp_set_default_device(device_num) .... 250 f_pthread_mutex_destroy(mutex) ...... 289 omp_set_dynamic(enable_expr) ...... 250 f_pthread_mutex_init(mutex, mattr) ..... 289 omp_set_lock(svar) .......... 251 f_pthread_mutex_lock(mutex) ....... 290 omp_set_max_active_levels(max_levels) ... 252 f_pthread_mutex_t .......... 290 omp_set_nested(enable_expr) ....... 252 f_pthread_mutex_trylock(mutex) ...... 291 omp_set_nest_lock(nvar) ........ 253 f_pthread_mutex_unlock(mutex) ...... 291 omp_set_num_threads(number_of_threads_expr) 254 f_pthread_mutexattr_destroy(mattr) ..... 292 omp_set_schedule(kind, modifier) ..... 255 f_pthread_mutexattr_getpshared(mattr, pshared) 292 omp_test_lock(svar) .......... 256 f_pthread_mutexattr_gettype(mattr, type) ... 293 omp_test_nest_lock(nvar) ........ 256 f_pthread_mutexattr_init(mattr) ...... 294 omp_unset_lock(svar) ......... 257 f_pthread_mutexattr_setpshared(mattr, pshared) 294 omp_unset_nest_lock(nvar)........ 258 f_pthread_mutexattr_settype(mattr, type) ... 295 Pthreads Library Module ......... 259 f_pthread_mutexattr_t ......... 296 Pthreads data structures, functions, and f_pthread_once(once, initr) ........ 296 subroutines ............. 259 f_pthread_once_t ........... 297 f_maketime(delay)........... 262 f_pthread_rwlock_destroy(rwlock) ..... 297 f_pthread_attr_destroy(attr)........ 262 f_pthread_rwlock_init(rwlock, rwattr) .... 297 f_pthread_attr_getdetachstate(attr, detach) ... 263 f_pthread_rwlock_rdlock(rwlock) ...... 298 f_pthread_attr_getguardsize(attr, guardsize) .. 263 f_pthread_rwlock_t .......... 299 f_pthread_attr_getinheritsched(attr, inherit) .. 264 f_pthread_rwlock_tryrdlock(rwlock) ....