PathScale™ EKOPath™ Compiler Suite User Guide

PATHSCALE™ EKOPATH™ COMPILER SUITE USER GUIDE
VERSION 2.4

Copyright © 2004, 2005, 2006 PathScale, Inc. All Rights Reserved. PathScale, EKOPath, the PathScale logo, and Accelerating Cluster Performance are trademarks of PathScale, Inc. All other trademarks belong to their respective owners.

In accordance with the terms of their valid PathScale customer agreements, customers are permitted to make electronic and paper copies of this document for their own exclusive use. All other forms of reproduction, redistribution, or modification are prohibited without the prior express written permission of PathScale, Inc.

Document number: 1-02404-10
Last generated on March 24, 2006

Release version   New features
1.4               New Sections 3.3.3.1, 3.7.4, 10.4, 10.5
                  Added Appendix B: Supported Fortran intrinsics
2.0               New Sections 2.3, 8.9.7, 11.8
                  Added Chapter 8: Using OpenMP in Fortran
                  New Appendix B: Implementation dependent behavior for OpenMP Fortran
                  Expanded and updated Appendix C: Supported Fortran intrinsics
2.1               Added Chapter 9: Using OpenMP in C/C++, Appendix E: eko man page
                  Expanded and updated Appendix B and Appendix C
2.2               New Sections 3.5.1.4, 3.5.2; combined OpenMP chapters
2.3               Added to reference list in Chapter 1, new Section 8.2 on autoparallelization
2.4               Expanded and updated Section 3.4, Section 3.5, and Section 7.9
                  Updated Section C.3, added Section C.4 (Fortran intrinsic extensions)

Contents

1 Introduction
  1.1 Conventions used in this document
  1.2 Documentation suite
2 Compiler Quick Reference
  2.1 What you installed
  2.2 How to invoke the PathScale EKOPath compilers
  2.3 Compiling for different platforms
    2.3.1 Target options for the 2.4 release
    2.3.2 Defaults flag
    2.3.3 Compiling for an alternate platform
    2.3.4 Compiling option tool: pathhow-compiled
  2.4 Input file types
  2.5 Other input files
  2.6 Common compiler options
  2.7 Shared libraries
  2.8 Large file support
  2.9 Large object support
    2.9.1 Support for "large" memory model
  2.10 Debugging
  2.11 Profiling: Locate your program’s hot spots
  2.12 Taskset: Assigning a process to a specific CPU
3 The PathScale EKOPath Fortran compiler
  3.1 Using the Fortran compiler
    3.1.1 Fixed-form and free-form files
  3.2 Modules
  3.3 Extensions
    3.3.1 Promotion of REAL and INTEGER types
    3.3.2 Cray pointers
    3.3.3 Directives
      3.3.3.1 F77 or F90 prefetch directives
      3.3.3.2 Changing optimization using directives
  3.4 Compiler and runtime features
    3.4.1 Preprocessing source files with -cpp
    3.4.2 Preprocessing source files with -ftpp
    3.4.3 Preprocessing source files with -fcoco
      3.4.3.1 Pre-defined macros
    3.4.4 Error numbers: the Explain command
    3.4.5 Fortran 90 dope vector
    3.4.6 Bounds checking
    3.4.7 Pseudo-random numbers
  3.5 Mixed code
    3.5.1 Calls between C and Fortran
      3.5.1.1 Example: Calls between C and Fortran
      3.5.1.2 Example: Accessing common blocks from C
  3.6 Runtime I/O compatibility
    3.6.1 Performing endian conversions
      3.6.1.1 The assign command
      3.6.1.2 Using the wildcard option
      3.6.1.3 Converting data and record headers
      3.6.1.4 The ASSIGN() procedure
      3.6.1.5 I/O compilation flags
    3.6.2 Reserved file units
  3.7 Source code compatibility
    3.7.1 Fortran KINDs
  3.8 Library compatibility
    3.8.1 Name mangling
    3.8.2 ABI compatibility
    3.8.3 Linking with g77-compiled libraries
      3.8.3.1 AMD Core Math Library (ACML)
    3.8.4 List directed I/O and repeat factors
      3.8.4.1 Environment variable
      3.8.4.2 Assign command
  3.9 Porting Fortran code
  3.10 Debugging and troubleshooting Fortran
    3.10.1 Writing to constants can cause crashes
    3.10.2 Runtime errors caused by aliasing among Fortran dummy arguments
    3.10.3 Fortran malloc debugging
  3.11 Fortran compiler stack size
4 The PathScale EKOPath C/C++ compiler
  4.1 Using the C/C++ compilers
  4.2 Compiler and runtime features
    4.2.1 Preprocessing source files
      4.2.1.1 Pre-defined macros
    4.2.2 Pragmas
      4.2.2.1 Pragma pack
      4.2.2.2 Changing optimization using pragmas
      4.2.2.3 Code layout optimization using pragmas
    4.2.3 Mixing code
    4.2.4 Linking
  4.3 Debugging and troubleshooting C/C++
  4.4 GCC extensions not supported
5 Porting and compatibility
  5.1 Getting started
  5.2 GNU compatibility
  5.3 Porting Fortran
    5.3.1 Intrinsics
      5.3.1.1 An example
    5.3.2 Name-mangling
    5.3.3 Static data
  5.4 Porting to x86_64
  5.5 Migrating from other compilers
  5.6 Compatibility
    5.6.1 GCC compatibility wrapper script
6 Tuning Quick Reference
  6.1 Basic optimization
  6.2 IPA
  6.3 Feedback Directed Optimization (FDO)
  6.4 Aggressive optimization
  6.5 Performance analysis
  6.6 Optimize your hardware
7 Tuning options
  7.1 Basic optimizations: The -O flag
  7.2 Syntax for complex optimizations (-CG, -IPA, -LNO, -OPT, -WOPT)
  7.3 Inter-Procedural Analysis (IPA)
    7.3.1 The IPA compilation model
    7.3.2 Inter-procedural analysis and optimization
      7.3.2.1 Analysis
    7.3.3 Optimization
    7.3.4 Controlling IPA
      7.3.4.1 Inlining
    7.3.5 Cloning
    7.3.6 Other IPA tuning options
      7.3.6.1 Disabling options
    7.3.7 Case study on SPEC CPU2000
    7.3.8 Invoking IPA
    7.3.9 Size and correctness limitations to IPA
  7.4 Loop Nest Optimization (LNO)
    7.4.1 Loop fusion and fission
    7.4.2 Cache size specification
    7.4.3 Cache blocking, loop unrolling, interchange transformations
    7.4.4 Prefetch
    7.4.5 Vectorization
  7.5 Code Generation (-CG:)
  7.6 Feedback Directed Optimization (FDO)
  7.7 Aggressive optimizations
    7.7.1 Alias analysis
    7.7.2 Numerically unsafe optimizations
    7.7.3 Fast-math functions
    7.7.4 IEEE 754 compliance
      7.7.4.1 Arithmetic
      7.7.4.2 Roundoff
