MIPSpro™ Compiling and Performance Tuning Guide

Document Number 007-2360-006

Contributors

Written by Arthur Evans, Wendy Ferguson, Jed Hartman, Jackie Neider
Edited by Christina Cary
Production by Lorrie Williams
Engineering contributions by Dave Anderson, Zaineb Asaf, Dave Babcock, Greg Boyd, Jack Carter, Ann Mei Chang, Wei-Chau Chang, David Ciemiewicz, Rune Dahl, Jim Dehnert, David Frederick, Sanjoy Ghosh, Jay Gischer, Bob Green, Seema Hiranandani, W. Wilson Ho, Marty Itzkowitz, Bhaskar Janakiraman, Woody Lichtenstein, Dror Maydan, Ajit Mayya, Ray Milkey, Michael Murphy, Bron Nelson, Andy Palay, Ron Price, John Wilkinson

© Copyright 1996 Silicon Graphics, Inc. All Rights Reserved

This document contains proprietary and confidential information of Silicon Graphics, Inc. The contents of this document may not be disclosed to third parties, copied, or duplicated in any form, in whole or in part, without the prior written permission of Silicon Graphics, Inc.

Restricted Rights Legend

Use, duplication, or disclosure of the technical data contained in this document by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 52.227-7013 and/or in similar or successor clauses in the FAR, or in the DOD or NASA FAR Supplement. Unpublished rights reserved under the Copyright Laws of the United States. Contractor/manufacturer is Silicon Graphics, Inc., 2011 N. Shoreline Blvd., Mountain View, CA 94039-7311.

Silicon Graphics, the Silicon Graphics logo, and IRIS are registered trademarks and IRIX, CASEVision, IRIS IM, IRIS Showcase, Impressario, Indigo Magic, Inventor, IRIS-4D, POWER Series, RealityEngine, CHALLENGE, Onyx, Origin2000, and WorkShop are trademarks of Silicon Graphics, Inc. MIPS, R4000, and R8000 are registered trademarks and MIPSpro, R5000, and R10000 are trademarks of MIPS Technologies, Inc. OSF/Motif is a trademark of Open Software Foundation, Inc. PostScript is a registered trademark and Display PostScript is a trademark of Adobe Systems, Inc. UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company, Ltd. X Window System is a trademark of the Massachusetts Institute of Technology.

Contents

Figures
Tables

About This Guide
    What This Guide Contains
    What You Should Know Before Reading This Guide
    Suggestions for Further Reading
    Conventions Used in This Guide

1. About the MIPSpro Compiler System

2. Using the MIPSpro Compiler System
    Selecting a Compiler
        Using a Defaults Specification File
        Using Command-Line Options
        Setting an Environment Variable
    Object File Format and Dynamic Linking
        Executable and Linking Format
        Dynamic Shared Objects
        Position-Independent Code
    Source File Considerations
        Source File Naming Conventions
        Header Files
            Specifying a Header File
            Creating a Header File for Multiple Languages
        Using Precompiled Headers in C and C++
            About Precompiled Headers
            Automatic Precompiled Header Processing
            Other Ways to Control Precompiled Headers
            PCH Performance Issues
    Compiler Drivers
        Default Behavior for Compiler Drivers
        General Options for Compiler Drivers
    Linking
        Invoking the Linker Manually
            Linker Syntax
            Linker Example
        Linking Assembly Language Programs
        Linking Libraries
        Specifying Libraries and DSOs
        Examples of Linking DSOs
        Linking to Previously Built Dynamic Shared Objects
        Linking Multilanguage Programs
        Finding an Unresolved Symbol With ld
    Debugging
    Getting Information About Object Files
        Disassembling Object Files with dis
            dis Syntax
            dis Options
        Listing Parts of DWARF Object Files With dwarfdump
            dwarfdump Syntax
            dwarfdump Options
        Listing Parts of ELF Object Files and Libraries with elfdump
            elfdump Syntax
            elfdump Options
        Determining File Type with file
            file Syntax
            file Example
        Listing Symbol Table Information: nm
            nm Syntax
            nm Symbol Table Options
            nm Example of Obtaining a Symbol Table Listing
        Determining Section Sizes with size
            size Syntax
            size Options
            size Example
        Removing Symbol Table and Relocation Bits with strip
            strip Syntax
    Using the Archiver to Create Libraries
        ar Syntax
        ar Options
        ar Examples

3. Using Dynamic Shared Objects
    Benefits of Using DSOs
    Using DSOs
        DSOs vs. Archive Libraries
        Using QuickStart
        Guidelines for Using Shared Libraries
            Choosing Library Members
            Tuning Shared Library Code
        Taking Advantage of QuickStart
    Building DSOs
        Creating DSOs
        Making DSOs Self-Contained
        Controlling Symbols to Be Exported or Loaded
        Using DSOs With C++
        Using Registry Files
            Registry File Format
            Directive Lines
            Shared Object Specification Lines
    Run-Time Linking
        Searching for DSOs at Run Time
        Run-Time Symbol Resolution
        Compiling with -Bsymbolic
        Converting Libraries to DSOs
    Dynamic Loading Under Program Control
    Versioning of DSOs
        The Versioning Mechanism
        What Is a Version?
        Building a Shared Library Using Versioning
        Example of Versioning

4. Optimizing Program Performance
    Optimization Overview
        Benefits of Optimization
        Optimization and Debugging
    Using the Optimization Options
    Performance Tuning with Interprocedural Analysis
        Inlining
            Benefits of Inlining
            Inlining Options for Routines
            Options To Control Inlining Heuristics
        Common Block Padding
        Alias and Address Taken Analysis
            The -IPA:alias=ON Option
            The -IPA:addressing=ON Option
            The -IPA:opt_alias=ON Option
    Controlling Loop Nest Optimizations
        Running LNO
        LNO Optimizations
            Loop Interchange
            Blocking and Outer Loop Unrolling
            Loop Fusion
            Loop Fission/Distribution
            Prefetching
            Gather-Scatter Optimization
        Compiler Options for LNO
            Controlling LNO Optimization Levels
            Controlling Fission and Fusion
            Controlling Gather-Scatter
            Controlling Cache Parameters
            Controlling Blocking and Permutation Transformations
            Controlling Prefetch
            Dependence Analysis
        Pragmas and Directives for LNO
            Fission/Fusion
            Blocking and Permutation Transformations
            Prefetch
            Dependence Analysis
    Controlling Floating Point Optimization
        -OPT:roundoff=n
        -OPT:IEEE_arithmetic=n
        Other Options to Control Floating Point Behavior
        Debugging Floating-Point Problems
    Controlling Miscellaneous Optimizations With the -OPT Option
        Using the -OPT:space Option
        Using the -OPT:Olimit=n Option
        Using the -OPT:alias Option
        Simplifying Code With the -OPT Option
    The Code Generator
        Overview of the Code Generator
        Code Generator and Optimization Levels
        An Example of Local Optimization for Fortran
        Code Generator and Optimization Levels -O2 and -O3
            If Conversion
            Cross-Iteration Optimizations
            Read-Read Elimination
            Read-Write Elimination
            Write-Write Elimination
            Common Sub-expression Elimination
            Loop Unrolling
            Recurrence Breaking
            Software Pipelining
            Steps Performed By the Code Generator at Levels -O2 and -O3
        Modifying Code Generator Defaults
        Miscellaneous Code Generator Performance Topics
            Prefetch and Load Latency
            Frequency and Feedback
    Controlling the Target Architecture
    Controlling the Target Environment
    Improving Global Optimization
        Overview of the Global Optimizer
        Optimizing C, C++, and Fortran Programs
        Optimizing C and C++ Programs
        Example of Pointer Placement and Aliasing
    Improving Other Optimization
        C, C++, and Fortran Programs
        C and C++ Programs
        C++ Programs Only
    Register Allocation
    Using SpeedShop

5. Coding for 64-Bit Programs
    Coding Assumptions to Avoid
        sizeof(int) == sizeof(void *)
        sizeof(int) == sizeof(long)
        sizeof(long) == 4
        sizeof(void *) == 4
        Implicitly Declared Functions
        Constants With the High-Order Bit Set
        Arithmetic with long Types
    Solving Porting Problems
    Guidelines for Writing Code for 64-Bit Silicon Graphics Platforms

6. Porting Code to N32 and 64-Bit Silicon Graphics Systems
    Compatibility
    N32 Porting Guidelines
        Porting Environment
        Source Code Changes
        Build Procedure
        Runtime Issues
    Porting Code to 64-Bit Silicon Graphics Systems
        Using Data Types
        Using Predefined Types
        Using Typedefs
        Maximum Memory Allocation
        Arrays Larger Than 2 Gigabytes
        Example of Arrays Larger Than 2 Gigabytes
        Using Large Files With XFS

Index
Important Note

Figures

Figure 1-1 Compiler System Flowchart
Figure 2-1 Compilation Control Flow for Multilanguage Programs
Figure 3-1 An Application Linked with DSOs
Figure 4-1 Compilation Process Showing Interprocedural Analysis
Figure 4-2 Compilation Process Showing LNO Transformations
Figure 6-1 Application Support Under Different ABIs
Figure 6-2 Library Locations for Different ABIs

Tables

Table Intro-1 IDO Topics and Manuals
Table 1-1 Compiler System Functional Components
Table 1-2 Compilers and Default Libraries
Table 2-1 The compiler.defaults File Specifications
Table 2-2 Compilation Mode Command-Line Options
Table 2-3 Compilation Mode Environment Variable Specifications
Table 2-4 Driver Input File Suffixes
Table 2-5 General Driver Options
Table 2-6 Linker Options
