Intel® Fortran Compiler for Linux* Systems User's Guide, Volume II

Intel® Fortran Compiler for Linux* Systems User's Guide Volume II: Optimizing Applications Document Number: 253260-002 Disclaimer and Legal Information Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING L IABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications. This User's Guide Volume II as well as the software described in it is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this manual is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The software described in this User's Guide Volume II may contain software defects which may cause the product to deviate from published specifications. Current characterized software defects are available on request. Intel SpeedStep, Intel Thread Checker, Celeron, Dialogic, i386, i486, iCOMP, Intel, Intel logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetStructure, Intel Xeon, Intel XScale, Itanium, MMX, MMX logo, Pentium, Pentium II Xeon, Pentium III Xeon, Pentium M, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. * Other names and brands may be claimed as the property of others. Copyright © Intel Corporation 2003-2004. Portions © Copyright 2001 Hewlett-Packard Development Company, L.P. Table Of Contents Optimizing Applications: Overview .....................................................................9 How to Use This Document .............................................................................11 Notation Conventions...................................................................................11 Programming for High Performance.................................................................13 Programming for High Performance: Overview .............................................13 Programming Guidelines..............................................................................13 Setting Data Type and Alignment..............................................................13 Using Arrays Efficiently.............................................................................20 Improving I/O Performance.......................................................................25 Improving Run-time Efficiency ..................................................................30 Using Intrinsics for Itanium®-based Systems ............................................33 Coding Guidelines for Intel® Architectures................................................34 Analyzing and Timing Your Application.........................................................37 Using Intel Performance Analysis Tools ....................................................37 Timing Your Application............................................................................38 Compiler Optimizations ...................................................................................41 Compiler Optimizations Overview.................................................................41 Optimizing the Compilation Process .............................................................41 Optimizing the Compilation Process Overview ..........................................41 Efficient Compilation.................................................................................41 Little-endian-to-Big-endian Conversion .....................................................45 iii Table Of Contents Default Compiler Optimizations.................................................................48 Using Compilation Options .......................................................................51 Optimizing Different Application Types .........................................................61 Optimizing Different Application Types Overview.......................................61 Setting Optimizations with -On Options.....................................................62 Restricting Optimizations..........................................................................65 Floating-point Arithmetic Optimizations.........................................................66 Options Used for Both IA-32 and Itanium® Architectures...........................66 Floating-point Arithmetic Precision for IA-32 Systems................................69 Floating-point Arithmetic Precision for Itanium®-based Systems................70 Improving/Restricting FP Arithmetic Precision ...........................................71 Optimizing for Specific Processors ...............................................................73 Optimizing for Specific Processors Overview.............................................73 Targeting a Processor, -tpp{n} ..................................................................73 Processor-specific Optimization (IA-32 only) .............................................75 Automatic Processor-specific Optimization (IA-32 only).............................76 Processor-specific Run-time Checks, IA-32 Systems.................................77 Interprocedural Optimizations (IPO)..............................................................79 Overview of Interprocedural Optimizations ................................................79 IPO Compilation Model.............................................................................80 Command Line for Creating an IPO Executable ........................................81 Generating Multiple IPO Object Files ........................................................82 Capturing Intermediate Outputs of IPO .....................................................83 iv Table Of Contents Creating an IPO Executable Using xild......................................................83 Code Layout and Multi-Object IPO............................................................85 Compilation with Real Object Files............................................................86 Creating a Library from IPO Objects..........................................................87 Using -ip with -Qoption Specifiers .............................................................88 Inline Expansion of Functions ...................................................................90 Profile-guided Optimizations.........................................................................93 Profile-guided Optimizations Overview......................................................93 Profile-guided Optimizations Methodology and Usage Model.....................94 Basic PGO Options ..................................................................................97 Advanced PGO Options ...........................................................................98 PGO Environment Variables.....................................................................99 Example of Profile-Guided Optimization ..................................................100 Merging the .dyn Files ............................................................................101 Using profmerge to Relocate the Source Files.........................................102 Code-coverage Tool ...............................................................................103 Test Prioritization Tool ............................................................................111 PGO API: Profile Information Generation Support ...................................118 High-level Language Optimizations (HLO)..................................................121 HLO Overview........................................................................................121 Loop Transformations.............................................................................122 Scalar Replacement (IA-32 Only)............................................................123 Loop Unrolling with -unroll[n]...................................................................123 v Table Of Contents Memory Dependency with IVDEP Directive.............................................124 Prefetching.............................................................................................125 Parallel Programming with Intel® Fortran.......................................................127 Parallelism: an Overview............................................................................127 Parallel Program Development ...............................................................128 Auto-vectorization (IA-32 Only)...................................................................130

Load more