XL C/C++: Optimization and Programming Guide About This Document

IBM XL C/C++ for Linux, V16.1.1
Optimization and Programming Guide
Version 16.1.1
SC27-8046-01

Note
Before using this information and the product it supports, read the information in “Notices” on page 121.

First edition
This edition applies to IBM XL C/C++ for Linux, V16.1.1 (Program 5765-J13, 5725-C73) and to all subsequent releases and modifications until otherwise indicated in new editions. Make sure you are using the correct edition for the level of the product.

© Copyright IBM Corporation 1996, 2018.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

About this document
    Who should read this document
    How to use this document
    How this document is organized
    Conventions
    Related information
    Available help information
    Standards and specifications
    Technical support
    How to send your comments

Chapter 1. Using XL C/C++ with Fortran
    Identifiers
    Corresponding data types
    Character and aggregate data
    Function calls and parameter passing
    Pointers to functions
    Sample program: C/C++ calling Fortran

Chapter 2. Aligning data
    Using alignment modes
    Alignment of aggregates
    Alignment of bit-fields
    Using alignment modifiers

Chapter 3. Handling floating-point operations
    Floating-point formats
    Handling multiply-and-add operations
    Compiling for strict IEEE conformance
    Handling floating-point constant folding and rounding
    Matching compile-time and runtime rounding modes
    Handling floating-point exceptions

Chapter 4. Constructing a library
    Compiling and linking a library
    Compiling a static library
    Compiling a shared library
    Linking a library to an application
    Linking a shared library to another shared library
    Initializing static objects in libraries (C++)

Chapter 5. Optimizing your applications
    Distinguishing between optimization and tuning
    Steps in the optimization process
    Basic optimization
    Optimizing at level 0
    Optimizing at level 2
    Advanced optimization
    Optimizing at level 3
    An intermediate step: adding -qhot suboptions at level 3
    Optimizing at level 4
    Optimizing at level 5
    Tuning for your system architecture
    Getting the most out of target machine options
    Using high-order loop analysis and transformations
    Getting the most out of -qhot
    Using shared-memory parallelism (SMP)
    Getting the most out of -qsmp
    Using interprocedural analysis
    Getting the most from -qipa
    Using profile-directed feedback
    Compiling with -qpdf1
    Training with typical data sets
    Recompiling or linking with -qpdf2
    Marking variables as local or imported
    Getting the most out of -qdatalocal
    Using compiler reports to diagnose optimization opportunities
    Parsing compiler reports with development tools
    Other optimization options

Chapter 6. Debugging optimized code
    Detecting errors in code
    Understanding different results in optimized programs
    Debugging in the presence of optimization

Chapter 7. Coding your application to improve performance
    Finding faster input/output techniques
    Reducing function-call overhead
    Managing memory efficiently (C++ only)
    Optimizing variables
    Manipulating strings efficiently
    Optimizing expressions and program logic
    Optimizing operations in 64-bit mode
    Tracing functions in your code
    The C++ template model
    Using delegating constructors (C++11)
    Using rvalue references (C++11)
    Using visibility attributes (IBM extension)
    Types of visibility attributes
    Rules of visibility attributes
    Propagation rules (C++ only)
    Specifying visibility attributes using the -fvisibility option
    Specifying visibility attributes using pragma preprocessor directives

Chapter 8. Exploiting POWER9 technology
    POWER9 compiler options
    High-performance libraries that are tuned for POWER9
    POWER9 built-in functions

Chapter 9. Using the high performance libraries
    Using the Mathematical Acceleration Subsystem (MASS) libraries
    Using the scalar library
    Using the vector libraries
    Using the SIMD libraries
    Compiling and linking a program with MASS
    Using the Basic Linear Algebra Subprograms - BLAS
    BLAS function syntax
    Linking the libxlopt library

Chapter 10. Using vector technology

Chapter 11. Parallelizing your programs
    Countable loops
    Enabling automatic parallelization
    Data sharing attribute rules
    Using OpenMP directives
    Shared and private variables in a parallel environment
    Reduction operations in parallelized loops

Chapter 12. Offloading computations to the NVIDIA GPUs
    System prerequisites for offloading to the NVIDIA GPUs
    Programming with OpenMP device constructs
    Extensions from OpenMP Technical Reports
    Using IBM XL C/C++ for Linux with NVCC
    Limitations

Chapter 13. Vector element order toggling

Notices
    Trademarks

Index

About this document

This guide discusses advanced topics related to the use of IBM® XL C/C++ for Linux, V16.1.1, with a particular focus on program portability and optimization. The guide provides both reference information and practical tips for getting the most out of the compiler's capabilities through recommended programming practices and compilation procedures.

Who should read this document

This document is addressed to programmers building complex applications, who already have experience compiling with XL C/C++ and would like to take further advantage of the compiler's capabilities for program optimization and tuning, support for advanced programming language features, and add-on tools and utilities.

How to use this document

This document uses a task-oriented approach to present the topics by concentrating on a specific programming or compilation problem in each section. Each topic contains extensive cross-references to the relevant sections of the reference guides in the IBM XL C/C++ for Linux, V16.1.1 documentation set, which provides detailed descriptions of compiler options, pragmas, and specific language extensions.

Topics not described in this information are available as follows:
• An executive overview of new functions: see the What's New for XL C/C++.
• Compiler installation and system requirements: see the XL C/C++ Installation Guide and product README files.
• Migration considerations and guide: see the XL C/C++ Migration Guide.
• Overview of XL C/C++ features: see the Getting Started with XL C/C++.
• Compiler options, pragmas, predefined macros, and built-in functions. These are described in the XL C/C++ Compiler Reference.
• The C or C++ programming language: see the XL C/C++ Language Reference for information about the syntax, semantics, and IBM implementation of the C or C++ IBM extension features. See C/C++ standards for the details of standard features.

How this document is organized

This guide includes the following chapters:
• Chapter 1, “Using XL C/C++ with Fortran,” on page 1 discusses considerations for calling Fortran code from XL C/C++ programs.
• Chapter 2, “Aligning data,” on page 7 discusses options available for controlling the alignment of data in aggregates, such as structures and classes.
• Chapter 3, “Handling floating-point operations,” on page 13 discusses options available for controlling how floating-point operations are handled by the compiler.
• Chapter 4, “Constructing a library,” on page 17 discusses how to compile and link static and shared libraries and how to specify the initialization order of static objects in C++ programs.
• Chapter 5, “Optimizing your applications,” on page 21 discusses various options provided by the compiler for optimizing your programs, and it provides recommendations on how to use these options.
• Chapter 6, “Debugging optimized code,” on page 51 discusses the potential usability problems of optimized programs and the options that can be used to debug optimized code.
• Chapter 7, “Coding your application to improve performance,” on page 55 discusses recommended programming practices and coding techniques to enhance program performance and compatibility with the compiler's optimization capabilities.
• Chapter 9, “Using the high performance libraries,” on page 85 discusses two performance libraries that are shipped with XL C/C++: the
