XL Fortran Optimization and Programming Guide

IBM XL Fortran Advanced Edition V10.1 for Linux Optimization and Programming Guide SC09-8022-00 IBM XL Fortran Advanced Edition V10.1 for Linux Optimization and Programming Guide SC09-8022-00 Note! Before using this information and the product it supports, be sure to read the general information under “Notices” on page 253. First Edition (November 2005) This edition applies to Version 10.1 of IBM XL Fortran Advanced Edition V10.1 for Linux (Program 5724-M17) and to all subsequent releases and modifications until otherwise indicated in new editions. Make sure you are using the correct edition for the level of the product. IBM welcomes your comments. You can send your comments electronically to the network ID listed below. Be sure to include your entire network address if you wish a reply. v Internet: [email protected] When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you. © Copyright International Business Machines Corporation 1990, 2005. All rights reserved. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents About this document . vii Using -qtune . .20 Who should read this document . vii Using -qcache . .20 How to use this document . vii Before you finish tuning . .20 How this document is organized . vii Further option driven tuning . .21 Conventions and terminology used in this Options for providing application characteristics 21 document . viii Options to control optimization transformations 23 Typographical conventions . viii Options to assist with performance analysis . .24 How to read syntax diagrams . viii Options that can inhibit performance . .25 How to read syntax statements . .xi Examples . .xi Chapter 4. Advanced optimization Notes on the path names . .xi concepts . .27 Notes on the terminology used . .xi Aliasing . .27 Related information . xii Inlining . .28 IBM XL Fortran documentation . xii Finding the right level of inlining . .28 Additional documentation . xiii Related documentation . xiii Chapter 5. Managing code size . .31 Standards documents . xiii Steps for reducing code size . .32 Technical support . xiv Compiler option influences on code size . .32 How to send your comments . xiv The -qipa compiler option . .32 The -Q inlining option . .32 Chapter 1. Performance concepts . .1 The -qhot compiler option . .32 Optimization explained . .1 The -qcompact compiler option . .33 Tuning explained . .2 Other influences on code size . .33 Beyond optimization and tuning: effective High activity areas . .33 programming techniques . .3 Computed GOTOs and CASE constructs . .33 Linking and code size . .34 Chapter 2. Optimizing XL compiler applications . .5 Chapter 6. Debugging optimized code 37 Why optimization is essential. .5 Different results in optimized programs . .37 Basic command-line optimization . .5 Optimizing at level 0 . .6 Chapter 7. Compiler-friendly Optimizing at level 2 . .6 programming techniques . .39 Advanced command-line optimization . .7 General practices . .39 Optimizing at level 3 . .8 Variables and pointers . .40 An intermediate step: adding -qhot suboptions at Arrays . .40 level 3 . .8 Choosing appropriate variable sizes . .40 Optimization at level 4 . .9 Optimization at level 5 . .10 Chapter 8. High performance libraries 41 Benefits of high-order transformation (HOT) . .10 HOT short vectorization . .11 Using the Mathematical Acceleration Subsystem HOT long vectorization . .11 (MASS) . .41 HOT array size adjustment . .11 Using the scalar library . .41 Benefits of interprocedural analysis (IPA) . .12 Using the vector libraries . .43 Using IPA on the compile step only . .13 Compiling and linking a program with MASS . .46 IPA Levels and other IPA suboptions . .13 Using the Basic Linear Algebra Subprograms (BLAS) 47 Using IPA across the XL compiler family . .14 BLAS function syntax . .48 Benefits of profile-directed feedback (PDF) . .15 Linking the libxlopt library . .50 PDF walkthrough . .15 Getting more performance . .16 Chapter 9. Parallel programming with XL Fortran . .51 Chapter 3. Tuning XL compiler Compiling your SMP code . .51 applications . .17 Setting OMP and SMP run time options . .51 Tuning for your target architecture . .17 The XLSMPOPTS environment variable . .51 Using -qarch . .18 OpenMP environment variables . .56 Optimizing your SMP code . .58 © Copyright IBM Corp. 1990, 2005 iii Developing and running SMP applications . .58 omp_set_nest_lock(nvar) . 134 An introduction to SMP directives . .59 omp_set_num_threads(number_of_threads_expr) 135 Parallel region construct . .59 omp_test_lock(svar) . 136 Work-sharing constructs . .59 omp_test_nest_lock(nvar) . 136 Combined parallel work-sharing constructs . .60 omp_unset_lock(svar) . 137 Synchronization constructs . .60 omp_unset_nest_lock(nvar) . 138 Other OpenMP Directives . .60 Pthreads library module . 139 Non-OpenMP SMP directives . .60 Pthreads data structures, functions, and Detailed descriptions of SMP directives . .60 subroutines . 139 ATOMIC . .60 f_maketime(delay). 141 BARRIER . .63 f_pthread_attr_destroy(attr). 142 CRITICAL / END CRITICAL . .64 f_pthread_attr_getdetachstate(attr, detach) . 142 DO / END DO . .66 f_pthread_attr_getguardsize(attr, guardsize) . 143 DO SERIAL . .69 f_pthread_attr_getinheritsched(attr, inherit) . 143 FLUSH . .70 f_pthread_attr_getschedparam(attr, param) . 144 MASTER / END MASTER . .72 f_pthread_attr_getschedpolicy(attr, policy) . 144 ORDERED / END ORDERED . .74 f_pthread_attr_getscope(attr, scope) . 145 PARALLEL / END PARALLEL . .76 f_pthread_attr_getstack(attr, stackaddr, ssize) 145 PARALLEL DO / END PARALLEL DO . .78 f_pthread_attr_init(attr) . 146 PARALLEL SECTIONS / END PARALLEL f_pthread_attr_setdetachstate(attr, detach) . 146 SECTIONS . .82 f_pthread_attr_setguardsize(attr, guardsize) . 147 PARALLEL WORKSHARE / END PARALLEL f_pthread_attr_setinheritsched(attr, inherit) . 147 WORKSHARE . .85 f_pthread_attr_setschedparam(attr, param) . 148 SCHEDULE . .86 f_pthread_attr_setschedpolicy(attr, policy) . 149 SECTIONS / END SECTIONS . .89 f_pthread_attr_setscope(attr, scope) . 149 SINGLE / END SINGLE . .93 f_pthread_attr_setstack(attr, stackaddr, ssize) 150 THREADLOCAL . .96 f_pthread_attr_t . 150 THREADPRIVATE . .99 f_pthread_cancel(thread) . 151 WORKSHARE . 103 f_pthread_cleanup_pop(exec) . 151 OpenMP directive clauses . 106 f_pthread_cleanup_push(cleanup, flag, arg) . 152 Global rules for directive clauses . 106 f_pthread_cond_broadcast(cond) . 153 COPYIN . 107 f_pthread_cond_destroy(cond) . 153 COPYPRIVATE . 109 f_pthread_cond_init(cond, cattr) . 154 DEFAULT . .110 f_pthread_cond_signal(cond) . 154 IF . 111 f_pthread_cond_t . 155 FIRSTPRIVATE . .112 f_pthread_cond_timedwait(cond, mutex, LASTPRIVATE . .113 timeout) . 155 NUM_THREADS . .115 f_pthread_cond_wait(cond, mutex) . 156 ORDERED . .115 f_pthread_condattr_destroy(cattr) . 156 PRIVATE . .116 f_pthread_condattr_getpshared(cattr, pshared) 156 REDUCTION . .117 f_pthread_condattr_init(cattr) . 157 SCHEDULE . 120 f_pthread_condattr_setpshared(cattr, pshared) 158 SHARED . 122 f_pthread_condattr_t . 158 OpenMP execution environment, lock and timing f_pthread_create(thread, attr, flag, ent, arg) . 159 routines . 124 f_pthread_detach(thread) . 160 omp_destroy_lock(svar) . 125 f_pthread_equal(thread1, thread2) . 160 omp_destroy_nest_lock(nvar) . 125 f_pthread_exit(ret) . 161 omp_get_dynamic() . 126 f_pthread_getconcurrency() . 161 omp_get_max_threads() . 126 f_pthread_getschedparam(thread, policy, param) 162 omp_get_nested() . 127 f_pthread_getspecific(key, arg) . 162 omp_get_num_procs() . 127 f_pthread_join(thread, ret) . 163 omp_get_num_threads() . 127 f_pthread_key_create(key, dtr) . 163 omp_get_thread_num() . 128 f_pthread_key_delete(key) . 164 omp_get_wtick() . 129 f_pthread_key_t . 164 omp_get_wtime() . 130 f_pthread_kill(thread, sig) . 165 omp_in_parallel() . 130 f_pthread_mutex_destroy(mutex) . 165 omp_init_lock(svar) . 131 f_pthread_mutex_init(mutex, mattr) . 166 omp_init_nest_lock(nvar) . 132 f_pthread_mutex_lock(mutex) . 166 omp_set_dynamic(enable_expr) . 132 f_pthread_mutex_t . 167 omp_set_lock(svar) . 133 f_pthread_mutex_trylock(mutex) . 167 omp_set_nested(enable_expr) . 134 f_pthread_mutex_unlock(mutex) . 167 iv XL Fortran Optimization and Programming Guide f_pthread_mutexattr_destroy(mattr) . 168 Linkage convention for function calls . 205 f_pthread_mutexattr_getpshared(mattr, pshared) 168 Pointers to functions . 206 f_pthread_mutexattr_gettype(mattr, type) . 169 Function values . 206 f_pthread_mutexattr_init(mattr) . 170 The Stack floor . 207 f_pthread_mutexattr_setpshared(mattr, pshared) 170 Stack overflow . 207 f_pthread_mutexattr_settype(mattr, type) . 171 Prolog and epilog . 207 f_pthread_mutexattr_t . 172 Traceback . 207 f_pthread_once(once, initr) . 172 THREADLOCAL common blocks and f_pthread_once_t . 172 interlanguage calls with C . 208 f_pthread_rwlock_destroy(rwlock) . 172 Example . 209 f_pthread_rwlock_init(rwlock, rwattr) . 173 f_pthread_rwlock_rdlock(rwlock) . 173 Chapter 11. Implementation details of f_pthread_rwlock_t . 174 XL Fortran Input/Output (I/O) . .211 f_pthread_rwlock_tryrdlock(rwlock) . 174 Implementation details of file formats . .211 f_pthread_rwlock_trywrlock(rwlock) . 175 File names . 212 f_pthread_rwlock_unlock(rwlock) . 175 Preconnected and Implicitly Connected Files . 212 f_pthread_rwlock_wrlock(rwlock) . 176 File positioning . 213 f_pthread_rwlockattr_destroy(rwattr) . 176 I/O Redirection

Load more