PGI 5 User's Guide
PGI® User’s Guide
Parallel Fortran, C and C++ for Scientists and Engineers

The Portland Group™
STMicroelectronics
9150 SW Pioneer Court, Suite H
Wilsonville, OR 97070

While every precaution has been taken in the preparation of this document, The Portland Group™, a wholly-owned subsidiary of STMicroelectronics, makes no warranty for the use of its products and assumes no responsibility for any errors that may appear, or for damages resulting from the use of the information contained herein. The Portland Group retains the right to make changes to this information at any time, without notice. The software described in this document is distributed under license from STMicroelectronics and may be used or copied only in accordance with the terms of the license agreement. No part of this document may be reproduced or transmitted in any form or by any means, for any purpose other than the purchaser's personal use without the express written permission of The Portland Group.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this manual, The Portland Group was aware of a trademark claim. The designations have been printed in caps or initial caps. Thanks is given to the Parallel Tools Consortium and, in particular, to the High Performance Debugging Forum for their efforts.

PGF95, PGF95, PGC++, Cluster Development Kit, CDK and The Portland Group are trademarks and PGI, PGHPF, PGF77, PGCC, PGPROF, and PGDBG are registered trademarks of STMicroelectronics, Inc. Other brands and names are the property of their respective owners.

The use of STLport, a C++ Library, is licensed separately and license, distribution and copyright notice can be found in the online documentation for a given release of the PGI compilers and tools.

PGI User's Guide
Copyright © 1998 – 2000, The Portland Group, Inc.
Copyright © 2000 – 2005, STMicroelectronics, Inc.
All rights reserved.
Printed in the United States of America

Part Number: 2030-990-888-0603

First Printing: Release 1.7, June 1998
Second Printing: Release 3.0, January 1999
Third Printing: Release 3.1, September 1999
Fourth Printing: Release 3.2, September 2000
Fifth Printing: Release 4.0, May 2002
Sixth Printing: Release 5.0, June 2003
Seventh Printing: Release 5.1, November 2003
Eighth Printing: Release 5.2, June 2004
Ninth Printing: Release 6.0, March 2005

Technical support: [email protected]
Sales: [email protected]
Web: http://www.pgroup.com/

Table of Contents

    1.2.1 Command-line Syntax
    1.2.2 Command-line Options
    1.2.3 Fortran Directives and C/C++ Pragmas
1.3 FILENAME CONVENTIONS
    1.3.1 Input Files
    1.3.2 Output Files
1.4 PARALLEL PROGRAMMING USING THE PGI COMPILERS
    1.4.1 Running SMP Parallel Programs
    1.4.2 Running Data Parallel HPF Programs
1.5 USING THE PGI COMPILERS ON LINUX
    1.5.1 Linux Header Files
    1.5.2 Running Parallel Programs on Linux
1.6 USING THE PGI COMPILERS ON WINDOWS

OPTIMIZATION & PARALLELIZATION
2.1 OVERVIEW OF OPTIMIZATION
2.2 GETTING STARTED WITH OPTIMIZATIONS
2.3 LOCAL AND GLOBAL OPTIMIZATION USING -O
    2.3.1 Scalar SSE Code Generation
2.4 LOOP UNROLLING USING -MUNROLL
2.5 VECTORIZATION USING -MVECT
    2.5.1 Vectorization Sub-options
        2.5.1.1 Assoc Option
        2.5.1.2 Cachesize Option
        2.5.1.3 SSE Option
        2.5.1.4 Prefetch Option
    2.5.2 Vectorization Example Using SSE/SSE2 Instructions
2.6 AUTO-PARALLELIZATION USING -MCONCUR
    2.6.1 Auto-parallelization Sub-options
        2.6.1.1 Altcode Option
        2.6.1.2 Dist Option
        2.6.1.3 Cncall Option
    2.6.2 Auto-parallelization Example
    2.6.3 Loops That Fail to Parallelize
        2.6.3.1 Timing Loops
        2.6.3.2 Scalars
        2.6.3.3 Scalar Last Values
2.7 INTER-PROCEDURAL ANALYSIS AND OPTIMIZATION USING -MIPA
    2.7.1 Building a Program Without IPA - Single Step
    2.7.2 Building a Program Without IPA - Several Steps
    2.7.3 Building a Program Without IPA Using Make
    2.7.4 Building a Program with IPA
    2.7.5 Building a Program with IPA - Single Step
    2.7.6 Building a Program with IPA - Several Steps
    2.7.7 Building a Program with IPA Using Make
    2.7.8 Questions about IPA
2.8 DEFAULT OPTIMIZATION LEVELS
2.9 LOCAL OPTIMIZATION USING DIRECTIVES