XL Fortran: Optimization and Programming Guide for Little Endian Distributions - f_pthread_rwlock_unlock(rwlock)

Total pages: 16

File type: PDF, size: 1020 KB

IBM XL Fortran for Linux, V16.1
Optimization and Programming Guide for Little Endian Distributions
Version 16.1
SC27-8049-00

Note: Before using this information and the product it supports, read the information in "Notices" on page 381.

First edition. This edition applies to IBM XL Fortran for Linux, V16.1 (Program 5765-J10; 5725-C75) and to all subsequent releases and modifications until otherwise indicated in new editions. Make sure you are using the correct edition for the level of the product.

© Copyright IBM Corporation 1990, 2018. US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

About this document: who should read this document; how to use this document; how this document is organized; conventions; related information (available help information; standards and specifications; other IBM information); technical support; how to send your comments.

Chapter 1. Optimizing your applications: distinguishing between optimization and tuning; steps in the optimization process; basic optimization (optimizing at level 0; optimizing at level 2); advanced optimization (optimizing at level 3; an intermediate step: adding -qhot suboptions at level 3; optimizing at level 4; optimizing at level 5); specialized optimization techniques (high-order transformation (HOT); interprocedural analysis (IPA); profile-directed feedback; vector technology; using compiler reports to diagnose optimization opportunities; tracing procedures in your code; getting more performance); beyond performance: effective programming techniques.

Chapter 2. Tuning XL compiler applications: tuning for your target architecture (using -qarch; using -qtune; using -qcache; before you finish tuning); further option driven tuning (options for providing application characteristics; options to control optimization transformations; options to assist with performance analysis; options that can inhibit performance).

Chapter 3. Advanced optimization concepts: aliasing; inlining (finding the right level of inlining).

Chapter 4. Managing code size: steps for reducing code size; compiler option influences on code size (the -qipa compiler option; the -qinline inlining option; the -qhot compiler option; the -qcompact compiler option); other influences on code size (high activity areas; computed GOTOs and CASE constructs; code size with dynamic or static linking).

Chapter 5. Debugging optimized code: understanding different results in optimized programs; debugging in the presence of optimization.

Chapter 6. Compiler-friendly programming techniques: general practices; variables and pointers; arrays; choosing appropriate variable sizes; submodules (Fortran 2008).

Chapter 7. Exploiting POWER9 technology: POWER9 compiler options; high-performance libraries that are tuned for POWER9; POWER9 intrinsic procedures.

Chapter 8. High performance libraries: using the Mathematical Acceleration Subsystem (MASS) libraries (using the scalar library; using the vector libraries; using the SIMD libraries; compiling and linking a program with MASS); using the Basic Linear Algebra Subprograms (BLAS) (BLAS function syntax; linking the libxlopt library).

Chapter 9. Parallel programming with XL Fortran: compiling your parallelized code (the _OPENMP C preprocessor macro and conditional compilation); setting runtime options (XLSMPOPTS; environment variables for OpenMP); optimizing your SMP code (developing and running SMP applications); parallelization directives (summary of supported parallelization directives; detailed descriptions of parallelization directives; data sharing attribute rules; directive clauses); routines for OpenMP (omp_destroy_lock(svar), omp_destroy_nest_lock(nvar), omp_get_active_level(), omp_get_ancestor_thread_num(level), omp_get_cancellation(), omp_get_default_device(), omp_get_dynamic(), omp_get_initial_device(), omp_get_level(), omp_get_max_active_levels(), omp_get_max_threads(), omp_get_nested(), omp_get_num_devices(), omp_get_num_places(), omp_get_num_procs(), omp_get_num_teams(), omp_get_num_threads(), omp_get_partition_num_places(), omp_get_partition_place_nums(place_nums), omp_get_place_num(), omp_get_place_num_procs(place_num), omp_get_place_proc_ids(place_num, ids), omp_get_proc_bind(), omp_get_schedule(kind, modifier), omp_get_team_num(), omp_get_team_size(level), omp_get_thread_limit(), omp_get_thread_num(), omp_get_wtick(), omp_get_wtime(), omp_in_final(), omp_in_parallel(), omp_init_lock(svar), omp_init_nest_lock(nvar), omp_is_initial_device(), omp_set_default_device(device_num), omp_set_dynamic(enable_expr), omp_set_lock(svar), omp_set_max_active_levels(max_levels), omp_set_nested(enable_expr), omp_set_nest_lock(nvar), omp_set_num_threads(number_of_threads_expr), omp_set_schedule(kind, modifier), omp_test_lock(svar), omp_test_nest_lock(nvar), omp_unset_lock(svar), omp_unset_nest_lock(nvar)); Pthreads Library Module (Pthreads data structures, functions, and subroutines; f_maketime(delay); f_pthread_attr_destroy(attr); f_pthread_attr_getdetachstate(attr, detach); f_pthread_attr_getguardsize(attr, guardsize); f_pthread_attr_getinheritsched(attr, inherit); f_pthread_attr_getschedpolicy(attr, policy); f_pthread_attr_getscope(attr, scope); f_pthread_attr_getstack(attr, stackaddr, ssize); f_pthread_attr_init(attr); f_pthread_attr_setdetachstate(attr, detach); f_pthread_attr_setguardsize(attr, guardsize); f_pthread_attr_setinheritsched(attr, inherit); f_pthread_attr_setschedparam(attr, param); f_pthread_attr_setschedpolicy(attr, policy); f_pthread_attr_setscope(attr, scope); f_pthread_attr_setstack(attr, stackaddr, ssize); f_pthread_attr_t; f_pthread_cancel(thread); f_pthread_cleanup_pop(exec); f_pthread_cleanup_push(cleanup, flag, arg); f_pthread_cond_broadcast(cond); f_pthread_cond_destroy(cond); f_pthread_cond_init(cond, cattr); f_pthread_cond_signal(cond); f_pthread_cond_t; f_pthread_cond_timedwait(cond, mutex, timeout); f_pthread_cond_wait(cond, mutex); f_pthread_condattr_destroy(cattr); f_pthread_condattr_getpshared(cattr, pshared); f_pthread_condattr_init(cattr); f_pthread_condattr_setpshared(cattr, pshared); f_pthread_condattr_t; f_pthread_create(thread, attr, flag, ent, arg); f_pthread_detach(thread); f_pthread_equal(thread1, thread2); f_pthread_exit(ret); f_pthread_getconcurrency(); f_pthread_getschedparam(thread, policy, param); f_pthread_getspecific(key, arg); f_pthread_join(thread, ret); f_pthread_key_create(key, dtr); f_pthread_key_delete(key); f_pthread_key_t; f_pthread_kill(thread, sig); f_pthread_mutex_destroy(mutex); f_pthread_mutex_init(mutex, mattr); f_pthread_mutex_lock(mutex); f_pthread_mutex_t; f_pthread_mutex_trylock(mutex); f_pthread_mutex_unlock(mutex); f_pthread_mutexattr_destroy(mattr); f_pthread_mutexattr_getpshared(mattr, pshared); f_pthread_mutexattr_gettype(mattr, type); f_pthread_mutexattr_init(mattr); f_pthread_mutexattr_setpshared(mattr, pshared); f_pthread_mutexattr_settype(mattr, type); f_pthread_mutexattr_t; f_pthread_once(once, initr); f_pthread_once_t; f_pthread_rwlock_destroy(rwlock); f_pthread_rwlock_init(rwlock, rwattr); f_pthread_rwlock_rdlock(rwlock); f_pthread_rwlock_t; f_pthread_rwlock_tryrdlock(rwlock)). The excerpt's contents listing breaks off here.
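This excerpt's landing page documents the reader-writer lock routines of the Pthreads library module (f_pthread_rwlock_init, f_pthread_rwlock_rdlock, f_pthread_rwlock_tryrdlock, f_pthread_rwlock_unlock, f_pthread_rwlock_destroy). To make the semantics behind those wrappers concrete (any number of simultaneous readers, writers exclusive), here is a minimal Python sketch of the same locking discipline. It is an illustration only, not the XL Fortran API: the guide's routines are Fortran procedures returning integer status codes and operating on f_pthread_rwlock_t values, and this toy class ignores writer priority and error reporting.

```python
import threading

class RWLock:
    """Toy reader-writer lock mirroring the shape of the
    f_pthread_rwlock_* routines: many readers may hold the lock
    at once, while a writer requires exclusive access."""

    def __init__(self):            # roughly f_pthread_rwlock_init
        self._cond = threading.Condition()
        self._readers = 0          # count of active readers
        self._writing = False      # is a writer active?

    def rdlock(self):              # roughly f_pthread_rwlock_rdlock
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def tryrdlock(self):           # roughly f_pthread_rwlock_tryrdlock
        with self._cond:
            if self._writing:
                return False       # would block: report failure instead
            self._readers += 1
            return True

    def wrlock(self):              # writer side, exclusive
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True

    def unlock(self):              # roughly f_pthread_rwlock_unlock
        with self._cond:
            if self._writing:
                self._writing = False
            else:
                self._readers -= 1
            self._cond.notify_all()
```

As in the POSIX original, the try-variant reports failure instead of blocking when a writer already holds the lock.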
Recommended publications
  • Program Analysis and Optimization for Embedded Systems
    Program Analysis and Optimization for Embedded Systems. Mike Jochen and Amie Souter, {jochen, souter}@cis.udel.edu, Computer and Information Sciences, University of Delaware, Newark, DE 19716, (302) 831-6339.
    1 Introduction. A proliferation in use of embedded systems is giving cause for a rethinking of traditional methods and effects of program optimization and how these methods and optimizations can be adapted for these highly specialized systems. This paper attempts to raise some of the issues unique to program analysis and optimization for embedded system design and to address some of the solutions that have been proposed to address these specialized issues. This paper is organized as follows: section 2 provides pertinent background information on embedded systems, delineating the special needs and problems inherent with embedded system design; section 3 addresses one such inherent problem, namely limited space for program code; section 4 discusses current approaches and open issues for code minimization.
    Embedded systems must: have low cost; have low power consumption; require as little physical space as possible; and meet rigid time constraints for computation completion. These requirements place constraints on the underlying architecture's design, which together affect compiler design and program optimization techniques. Each of these criteria must be balanced against the others, as each will have an effect on the others (e.g. making a system smaller and faster will typically increase its cost and power consumption). A typical embedded system, as illustrated in figure 1, consists of a processor core, a program ROM (Read Only Memory), RAM (Random Access Memory), and an ASIC (Application Specific Integrated Circuit).
  • Stochastic Program Optimization by Eric Schkufza, Rahul Sharma, and Alex Aiken
    Research highlights. DOI: 10.1145/2863701. Stochastic Program Optimization. By Eric Schkufza, Rahul Sharma, and Alex Aiken.
    Abstract: The optimization of short sequences of loop-free, fixed-point assembly code is an important problem in high-performance computing. However, the competing constraints of transformation correctness and performance improvement often force even special purpose compilers to produce sub-optimal code. We show that by encoding these constraints as terms in a cost function, and using a Markov Chain Monte Carlo sampler to rapidly explore the space of all possible code sequences, we are able to generate aggressively optimized versions of a given target code sequence. Beginning from binaries compiled by llvm -O0, we are able to produce provably correct code sequences that either match or outperform the code produced by gcc -O3, icc -O3, and in some cases expert handwritten assembly.
    Although the technique sacrifices completeness, it produces dramatic increases in the quality of the resulting code. Figure 1 shows two versions of the Montgomery multiplication kernel used by the OpenSSL RSA encryption library. Beginning from code compiled by llvm -O0 (116 lines, not shown), our prototype stochastic optimizer STOKE produces code (right) that is 16 lines shorter and 1.6 times faster than the code produced by gcc -O3 (left), and even slightly faster than the expert handwritten assembly included in the OpenSSL repository.
    2. RELATED WORK. Although techniques that preserve completeness are effective within certain domains, their general applicability remains limited. The shortcomings are best highlighted in the context of the code sequence shown in Figure 1.
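The core idea of the abstract, encode correctness and performance as terms of a single cost function and let an MCMC sampler walk the space of code sequences, fits in a few lines. The sketch below is a hedged toy reconstruction, not STOKE itself: the miniature instruction set, the test-based correctness term, and the cost weights are all invented for illustration.

```python
import math
import random

# Toy "programs": fixed-length lists of (op, constant) instructions
# acting on a single register. The instruction set is invented.
OPS = ["add", "sub", "mul", "nop"]

def run(prog, x):
    for op, k in prog:
        if op == "add":
            x += k
        elif op == "sub":
            x -= k
        elif op == "mul":
            x *= k
    return x

def spec(x):                 # the behaviour we must preserve
    return 4 * x + 2

TESTS = range(-8, 9)         # correctness is judged on test inputs

def cost(prog):
    # Correctness term: distance from the spec on the tests.
    wrong = sum(abs(run(prog, x) - spec(x)) for x in TESTS)
    # Performance term: fewer real instructions is cheaper.
    perf = sum(1 for op, _ in prog if op != "nop")
    return 100 * wrong + perf      # invented weighting

def mutate(prog):
    new = list(prog)
    i = random.randrange(len(new))
    new[i] = (random.choice(OPS), random.randint(-4, 4))
    return new

def mcmc(length=6, steps=50_000, beta=0.5):
    cur = [("nop", 0)] * length
    cur_cost = cost(cur)
    best, best_cost = cur, cur_cost
    for _ in range(steps):
        cand = mutate(cur)
        cand_cost = cost(cand)
        # Metropolis rule: always accept improvements, and accept
        # regressions with probability decaying in the cost increase.
        if (cand_cost <= cur_cost
                or random.random() < math.exp(beta * (cur_cost - cand_cost))):
            cur, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
    return best_cost, best

if __name__ == "__main__":
    print(mcmc())   # often finds e.g. [("mul", 4), ("add", 2), nops...]
```

The Metropolis acceptance rule occasionally takes cost-increasing moves, which is what lets the search escape local minima that defeat greedy rewriting.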
  • The Effect of Code Expanding Optimizations on Instruction Cache Design
    May 1991. UILU-ENG-91-2227, CRHC-91-17. Center for Reliable and High-Performance Computing.
    THE EFFECT OF CODE EXPANDING OPTIMIZATIONS ON INSTRUCTION CACHE DESIGN. William Y. Chen, Pohua P. Chang, Thomas M. Conte, Wen-mei W. Hwu. Coordinated Science Laboratory, College of Engineering, University of Illinois at Urbana-Champaign. Approved for public release; distribution unlimited. Unclassified. Sponsors listed on the report documentation page include NCR, NSF, AMD, and NASA, under contracts N00014-91-J-1283 and NASA NAG 1-613. (The remainder of the excerpt is the standard report documentation form and carries no further content.)
  • ARM Optimizing C/C++ Compiler V20.2.0.LTS User's Guide (Rev. V)
    ARM Optimizing C/C++ Compiler v20.2.0.LTS User's Guide. Literature Number: SPNU151V. January 1998, revised February 2020.
    Contents: Preface; 1 Introduction to the Software Development Tools (1.1 Software Development Tools Overview; 1.2 Compiler Interface; 1.3 ANSI/ISO Standard; 1.4 Output Files; 1.5 Utilities); 2 Using the C/C++ Compiler (2.1 About the Compiler; 2.2 Invoking the C/C++ Compiler; 2.3 Changing the Compiler's Behavior with Options). The excerpt's contents listing breaks off here.
  • Compiler Optimization-Space Exploration
    Journal of Instruction-Level Parallelism 7 (2005) 1-25. Submitted 10/04; published 2/05.
    Compiler Optimization-Space Exploration. Spyridon Triantafyllis ([email protected]), Department of Computer Science, Princeton University, Princeton, NJ 08540 USA. Manish Vachharajani ([email protected]), Department of Electrical and Computer Engineering, University of Colorado at Boulder, Boulder, CO 80309 USA. David I. August ([email protected]), Department of Computer Science, Princeton University, Princeton, NJ 08540 USA.
    Abstract: To meet the performance demands of modern architectures, compilers incorporate an ever-increasing number of aggressive code transformations. Since most of these transformations are not universally beneficial, compilers traditionally control their application through predictive heuristics, which attempt to judge an optimization's effect on final code quality a priori. However, complex target architectures and unpredictable optimization interactions severely limit the accuracy of these judgments, leading to performance degradation because of poor optimization decisions. This performance loss can be avoided through the iterative compilation approach, which advocates exploring many optimization options and selecting the best one a posteriori. However, existing iterative compilation systems suffer from excessive compile times and narrow application domains. By overcoming these limitations, Optimization-Space Exploration (OSE) becomes the first iterative compilation technique suitable for general-purpose production compilers. OSE narrows down the space of optimization options explored through limited use of heuristics. A compiler tuning phase further limits the exploration space. At compile time, OSE prunes the remaining optimization configurations in the search space by exploiting feedback from earlier configurations tried. Finally, rather than measuring actual runtimes, OSE compares optimization outcomes through static performance estimation, further enhancing compilation speed.
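OSE's pipeline, prune the option space with heuristics and compare the survivors with a static performance estimate rather than actual runtimes, can also be sketched compactly. Everything concrete below (the configuration space, the beam-style pruning, the estimator's formula) is an invented placeholder, not the paper's tuned models.

```python
import itertools

# Hypothetical optimization settings an OSE-style driver might explore.
FLAGS = {
    "inline": [0, 1, 2],          # inlining aggressiveness
    "unroll": [1, 4, 8],          # loop unroll factor
    "sched":  ["basic", "aggr"],  # scheduler choice
}

def static_estimate(cfg, metrics):
    """Placeholder static performance model (lower is better). A real
    OSE compiler estimates schedule height, code size, and so on."""
    est = metrics["base_cycles"] / (1.0 + 0.1 * cfg["inline"])
    est /= 1.0 + 0.05 * (cfg["unroll"] - 1)
    if cfg["sched"] == "aggr":
        est *= 0.95
    # Code growth from unrolling pushes on the instruction cache.
    est += 0.01 * cfg["unroll"] * metrics["icache_pressure"]
    return est

def explore(metrics, keep=4):
    # Enumerate the (already heuristically narrowed) configuration space...
    cfgs = [dict(zip(FLAGS, vals))
            for vals in itertools.product(*FLAGS.values())]
    # ...then keep only the most promising few, standing in for OSE's
    # feedback-driven pruning of the search space.
    return sorted(cfgs, key=lambda c: static_estimate(c, metrics))[:keep]

if __name__ == "__main__":
    metrics = {"base_cycles": 1000.0, "icache_pressure": 20.0}
    for cfg in explore(metrics):
        print(round(static_estimate(cfg, metrics), 1), cfg)
```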
  • XL Fortran: Getting Started
    IBM XL Fortran for AIX, V12.1. Getting Started with XL Fortran, Version 12.1, GC23-8898-00.
    Note: Before using this information and the product it supports, read the information in "Notices" on page 29.
    First edition. This edition applies to IBM XL Fortran for AIX, V12.1 (Program number 5724-U82) and to all subsequent releases and modifications until otherwise indicated in new editions. Make sure you are using the correct edition for the level of the product. © Copyright International Business Machines Corporation 1996, 2008. All rights reserved. US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
    Contents: About this document (conventions; related information; IBM XL Fortran information; standards and specifications; other IBM information; technical support; how to send your comments); Chapter 1. Introducing XL Fortran (commonality with other IBM compilers; hardware and operating system support; a highly configurable compiler; language standards compliance; source-code migration and conformance checking; tools and utilities); from a chapter whose heading is cut off in the excerpt: architecture and processor support; performance and optimization; other new or changed compiler options; Chapter 3. Setting up and customizing XL Fortran (using custom compiler configuration files); Chapter 4. Developing applications with XL Fortran (the compiler phases; editing Fortran source files; compiling with XL Fortran; invoking the compiler; compiling parallelized XL Fortran applications). The excerpt's contents listing breaks off here.
  • Machine Learning in Compiler Optimisation Zheng Wang and Michael O’Boyle
    1 Machine Learning in Compiler Optimisation. Zheng Wang and Michael O'Boyle.
    Abstract: In the last decade, machine learning based compilation has moved from an obscure research niche to a mainstream activity. In this article, we describe the relationship between machine learning and compiler optimisation and introduce the main concepts of features, models, training and deployment. We then provide a comprehensive survey and provide a road map for the wide variety of different research areas. We conclude with a discussion on open issues in the area and potential research directions. This paper provides both an accessible introduction to the fast moving area of machine learning based compilation and a detailed bibliography of its main achievements.
    Index Terms: Compiler, Machine Learning, Code Optimisation, Program Tuning.
    ... in the hope of improving performance but could in some instances damage it. Machine learning predicts an outcome for a new data point based on prior data. In its simplest guise it can be considered a form of interpolation. This ability to predict based on prior information can be used to find the data point with the best outcome and is closely tied to the area of optimisation. It is at this overlap of looking at code improvement as an optimisation problem and machine learning as a predictor of the optima where we find machine-learning compilation. Optimisation as an area, machine-learning based or otherwise, has been studied since the 1800s [8], [9]. An interesting question is therefore why the convergence of these two areas has taken so long. There are two fundamental reasons.
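As a hedged toy of the features/models/training/deployment pipeline the article surveys, the following sketch trains the simplest possible model, a 1-nearest-neighbour lookup, to map program features to the best optimization setting observed on similar programs. The features, flag strings, and training data are all invented for illustration.

```python
# Toy features -> best-flags predictor, illustrating the train/deploy
# split in machine-learning compilation. All data here is invented.

TRAINING = [
    # (loop_count, avg_trip_count, mem_ops_ratio) -> best flags observed
    ((1, 4,   0.1), "-O2"),
    ((8, 128, 0.6), "-O3 -funroll-loops"),
    ((2, 16,  0.8), "-O2 -fprefetch-loop-arrays"),
    ((9, 256, 0.2), "-O3 -funroll-loops"),
]

def distance(a, b):
    # Squared Euclidean distance in feature space.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(features):
    """Deployment step: return the flags of the nearest training point."""
    _, flags = min(TRAINING, key=lambda t: distance(t[0], features))
    return flags

if __name__ == "__main__":
    new_program = (7, 200, 0.3)   # features extracted from unseen code
    print(predict(new_program))   # -> "-O3 -funroll-loops"
```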
  • An Optimization-Driven Incremental Inline Substitution Algorithm for Just-In-Time Compilers
    An Optimization-Driven Incremental Inline Substitution Algorithm for Just-in-Time Compilers. Aleksandar Prokopec (Oracle Labs, Switzerland, [email protected]), Gilles Duboscq (Oracle Labs, Switzerland, [email protected]), David Leopoldseder (JKU Linz, Austria, [email protected]), Thomas Würthinger (Oracle Labs, Switzerland, [email protected]).
    Abstract: Inlining is one of the most important compiler optimizations. It reduces call overheads and widens the scope of other optimizations. But inlining is somewhat of a black art of an optimizing compiler, and was characterized as a computationally intractable problem. Intricate heuristics, tuned during countless hours of compiler engineering, are often at the core of an inliner implementation. And despite decades of research, well-established inlining heuristics are still missing. In this paper, we describe a novel inlining algorithm for JIT compilers that incrementally explores a program's call graph, and alternates between inlining and optimizations. We devise three novel heuristics that guide our inliner: adaptive decision thresholds, callsite clustering, and deep inlining trials. We implement the algorithm inside Graal, a dynamic JIT compiler ... optimization prediction by incrementally exploring and specializing the program's call tree, and clusters the callsites before inlining. We report the following new observations:
    • In many programs, there is an impedance between the logical units of code (i.e. subroutines) and the optimizable units of code (groups of subroutines). It is therefore beneficial to inline specific clusters of callsites, instead of a single callsite at a time. We describe how to identify such callsite clusters.
    • When deciding when to stop inlining, using an adaptive threshold function (rather than a fixed threshold value) produces more efficient programs.
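A hedged sketch of the adaptive-threshold idea follows: the benefit-per-cost bar a callsite must clear rises as the compilation unit grows, so exploration of the call tree stops on its own. The call-graph shapes, profile numbers, and threshold curve are invented; the heuristics in Graal itself are far richer and include the clustering and trial mechanisms named in the abstract.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Callsite:
    callee: "Function"
    invocations: int            # profile-estimated call frequency

@dataclass
class Function:
    name: str
    size: int                   # IR node count
    callsites: list = field(default_factory=list)

def adaptive_threshold(unit_size, base=2.0, scale=500.0):
    # The benefit-per-cost ratio a callsite must beat. It rises as the
    # compilation unit grows, so inlining gets stricter over time.
    return base * math.exp(unit_size / scale)

def inline_greedy(root: Function):
    unit_size = root.size
    worklist = list(root.callsites)
    inlined = []
    while worklist:
        # Most promising callsite first (benefit/cost ordering).
        worklist.sort(key=lambda c: c.invocations / c.callee.size,
                      reverse=True)
        site = worklist.pop(0)
        ratio = site.invocations / site.callee.size
        if ratio < adaptive_threshold(unit_size):
            break               # nothing left beats the rising bar
        unit_size += site.callee.size
        inlined.append(site.callee.name)
        worklist.extend(site.callee.callsites)  # explore deeper
    return inlined, unit_size

if __name__ == "__main__":
    leaf = Function("leaf", size=20)
    helper = Function("helper", size=60, callsites=[Callsite(leaf, 900)])
    root = Function("root", size=100,
                    callsites=[Callsite(helper, 1000), Callsite(leaf, 50)])
    print(inline_greedy(root))  # (['helper', 'leaf'], 180)
```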
  • A Unified Approach to Global Program Optimization
    A UNIFIED APPROACH TO GLOBAL PROGRAM OPTIMIZATION. Gary A. Kildall, Computer Science Group, Naval Postgraduate School, Monterey, California.
    Abstract: A technique is presented for global analysis of program structure in order to perform compile time optimization of object code generated for expressions. The global expression optimization presented includes constant propagation, common subexpression elimination, elimination of redundant register load operations, and live expression analysis. A general purpose program flow analysis algorithm is developed which depends upon the existence of an "optimizing function." The algorithm is defined formally using a directed graph model of program flow structure, and is shown to be correct. Several optimizing functions are defined which, when used in conjunction with the flow analysis algorithm, provide the various forms of code optimization. The flow analysis algorithm is sufficiently general that additional functions can easily be defined for other forms of global code optimization.
    1. INTRODUCTION. A number of techniques have evolved for the compile-time analysis of program structure in order to locate redundant computations, perform constant computations, and reduce the number of store-load sequences between memory and high-speed registers. Some of these techniques provide analysis of only straight-line sequences of instructions [5,6,9,14,17,18,19,20,27,29,34,36,38,39,43,45,46], while others take the program branching structure into account [2,3,4,10,11,12,13,15,23,30,32,33,35]. The purpose here is to describe a single program flow analysis algorithm which extends all of these straight-line optimizing techniques to include branching structure. The edges of the graph represent program control flow possibilities between the nodes at execution time. (A flow-graph figure, with an entry node and nodes A through D containing statements such as b := 2, accompanies this passage; only fragments of it survive in the extraction.)
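Kildall's scheme iterates an "optimizing function" over the flow graph, meeting the information pools where paths join, until a fixed point is reached. Below is a hedged Python sketch of the constant-propagation instance on a diamond-shaped graph like the one in the paper's figure; the pool encoding (a dict of known constants) is an illustrative simplification of his lattice formulation.

```python
# Minimal worklist version of Kildall's flow analysis, specialized to
# constant propagation. A "pool" maps variable -> known constant; a
# variable absent from the pool is not known to be constant.

def meet(p1, p2):
    # Keep only facts valid on BOTH inflows: agreeing constants.
    if p1 is None:                       # None = "no path seen yet"
        return dict(p2)
    return {v: c for v, c in p1.items() if p2.get(v) == c}

def transfer(node, pool, assigns):
    # Optimizing function for assignment nodes: v := constant.
    out = dict(pool)
    for var, value in assigns.get(node, []):
        out[var] = value
    return out

def kildall(nodes, edges, assigns, entry):
    pools = {n: None for n in nodes}     # pool at each node's entry
    pools[entry] = {}
    work = [entry]
    while work:
        n = work.pop()
        out = transfer(n, pools[n], assigns)
        for succ in edges.get(n, []):
            merged = meet(pools[succ], out)
            if merged != pools[succ]:    # new information: revisit
                pools[succ] = merged
                work.append(succ)
    return pools

if __name__ == "__main__":
    # A -> B -> D and A -> C -> D; both branches set b = 2, so the
    # constant survives the meet at D.
    nodes = "ABCD"
    edges = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
    assigns = {"A": [("a", 1)], "B": [("b", 2)], "C": [("b", 2)]}
    print(kildall(nodes, edges, assigns, "A"))  # D: {'a': 1, 'b': 2}
```

Had one branch set b to a different value, the meet at D would drop b from the pool, which is exactly the conservatism the paper proves correct.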
  • XL C/C++: Optimization and Programming Guide - About This Document
    IBM XL C/C++ for Linux, V16.1.1. Optimization and Programming Guide, Version 16.1.1, SC27-8046-01.
    Note: Before using this information and the product it supports, read the information in "Notices" on page 121.
    First edition. This edition applies to IBM XL C/C++ for Linux, V16.1.1 (Program 5765-J13, 5725-C73) and to all subsequent releases and modifications until otherwise indicated in new editions. Make sure you are using the correct edition for the level of the product. © Copyright IBM Corporation 1996, 2018. US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
    Contents: About this document (who should read this document; how to use this document; how this document is organized; conventions; related information; available help information; standards and specifications; technical support; how to send your comments); Chapter 1. Using XL C/C++ with Fortran; optimizing at level 4; optimizing at level 5; tuning for your system architecture (getting the most out of target machine options); using high-order loop analysis and transformations (getting the most out of -qhot); using shared-memory parallelism (SMP) (getting the most out of -qsmp); using interprocedural analysis (getting the most from -qipa); using profile-directed feedback (compiling with -qpdf1; training with typical data sets). The excerpt's contents listing breaks off here.
  • Demand-Driven Inlining Heuristics in a Region-Based Optimizing Compiler for ILP Architectures
    Demand-driven Inlining Heuristics in a Region-based Optimizing Compiler for ILP Architectures. Tom Way, Ben Breech, Wei Du and Lori Pollock. Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA. Email: {way, breech, wei, pollock}@cis.udel.edu.
    ABSTRACT: While scientists and engineers have been taking on the challenge of converting their applications into parallel programs, compilers are charged with identifying opportunities and transforming codes to exploit the available instruction-level parallelism (ILP) offered by today's uniprocessors, even within a parallel computing environment. Region-based compilation repartitions a program into more desirable compilation units for optimization and ILP scheduling. With region-based compilation, the compiler can control problem size and complexity by controlling region size and contents, and expose interprocedural scheduling and optimization opportunities without interprocedural analysis or large function bodies.
    ... compilation unit size and content. In addition to exposing opportunities for interprocedural instruction scheduling and optimization, the size of the resulting regions (i.e., compilation units) has been shown to be smaller than functions, and can be limited to any desired size, in order to reduce the impact of the quadratic algorithmic complexity of the applied optimization transformations [14]. The key component of a region-based compiler is the region formation phase, which partitions the program into regions using heuristics, with the intent that the optimizer will be invoked with a scope that is limited to a single region at a time. Thus, the quality of the generated code depends greatly upon the ability of the region formation ... (the excerpt breaks off here)
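As a hedged sketch of the seed-and-grow region formation that region-based compilers of this kind build on (grow a region outward from a hot block along frequent control-flow edges until a size budget is reached), consider the toy below. The control-flow graph, profile weights, and budget are invented, and the paper's own demand-driven heuristics are more involved.

```python
# Toy profile-guided region formation: grow regions from hot seed
# blocks along frequent edges, capped by a size budget.

CFG = {                      # block -> list of (successor, edge_frequency)
    "entry": [("loop", 95), ("cold", 5)],
    "loop":  [("body", 90), ("exit", 10)],
    "body":  [("loop", 90)],
    "cold":  [("exit", 5)],
    "exit":  [],
}
SIZE = {"entry": 10, "loop": 5, "body": 40, "cold": 30, "exit": 5}
FREQ = {"entry": 100, "loop": 100, "body": 90, "cold": 5, "exit": 100}

def form_regions(budget=60, min_edge_freq=50):
    unassigned = set(CFG)
    regions = []
    while unassigned:
        # Seed each region at the hottest block not yet placed.
        seed = max(unassigned, key=lambda b: FREQ[b])
        region, size = {seed}, SIZE[seed]
        frontier = [seed]
        while frontier:
            block = frontier.pop()
            for succ, freq in CFG[block]:
                if (succ in unassigned and succ not in region
                        and freq >= min_edge_freq
                        and size + SIZE[succ] <= budget):
                    region.add(succ)       # frequent edge, fits budget
                    size += SIZE[succ]
                    frontier.append(succ)
        unassigned -= region
        regions.append(sorted(region))
    return regions

if __name__ == "__main__":
    print(form_regions())   # hot loop nest grouped; cold path left out
```

Capping the region size is what keeps the optimizer's quadratic-cost passes tractable, the property the abstract emphasizes.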
  • Analysis and Optimization of Scientific Applications Through Set and Relation Abstractions M
    Louisiana State University, LSU Digital Commons, LSU Doctoral Dissertations, Graduate School, 2017.
    Analysis and Optimization of Scientific Applications through Set and Relation Abstractions. M. Tohid (Rastegar Tohid, Mohammed), Louisiana State University and Agricultural and Mechanical College.
    Follow this and additional works at: https://digitalcommons.lsu.edu/gradschool_dissertations. Part of the Electrical and Computer Engineering Commons.
    Recommended Citation: Tohid (Rastegar Tohid, Mohammed), M., "Analysis and Optimization of Scientific Applications through Set and Relation Abstractions" (2017). LSU Doctoral Dissertations. 4404. https://digitalcommons.lsu.edu/gradschool_dissertations/4404
    This Dissertation is brought to you for free and open access by the Graduate School at LSU Digital Commons. It has been accepted for inclusion in LSU Doctoral Dissertations by an authorized graduate school editor of LSU Digital Commons. For more information, please contact [email protected].
    ANALYSIS AND OPTIMIZATION OF SCIENTIFIC APPLICATIONS THROUGH SET AND RELATION ABSTRACTIONS. A Dissertation Submitted to the Graduate Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The School of Electrical Engineering and Computer Science by Mohammad Rastegar Tohid. M.Sc., Newcastle University, 2008; B.Sc., Azad University (S. Tehran Campus), 2006. May 2017.
    Dedication: Dedicated to my parents, who have dedicated their lives to their children.
    Acknowledgments: I thank my advisor Dr. Ramanujam for valuing independent research and allowing me to work on topics that genuinely intrigued me while never letting me go off the rail. His optimism and words of encouragement always inspired me. It has been a great pleasure to have known and worked with him.