PGI Compiler User's Guide

Total Pages: 16

File Type: PDF, Size: 1020 KB

USER'S GUIDE FOR X86-64 CPUS, Version 2020

TABLE OF CONTENTS

Preface
  Audience Description
  Compatibility and Conformance to Standards
  Organization
  Hardware and Software Constraints
  Conventions
  Terms
  Related Publications

Chapter 1. Getting Started
  1.1. Overview
  1.2. Creating an Example
  1.3. Invoking the Command-level PGI Compilers
    1.3.1. Command-line Syntax
    1.3.2. Command-line Options
    1.3.3. Fortran Directives and C/C++ Pragmas
  1.4. Filename Conventions
    1.4.1. Input Files
    1.4.2. Output Files
  1.5. Fortran, C, and C++ Data Types
  1.6. Parallel Programming Using the PGI Compilers
    1.6.1. Run SMP Parallel Programs
  1.7. Platform-specific considerations
    1.7.1. Using the PGI Compilers on Linux
    1.7.2. Using the PGI Compilers on Windows
    1.7.3. PGI on the Windows Desktop
  1.8. Site-Specific Customization of the Compilers
    1.8.1. Use siterc Files
    1.8.2. Using User rc Files
  1.9. Common Development Tasks

Chapter 2. Use Command-line Options
  2.1. Command-line Option Overview
    2.1.1. Command-line Options Syntax
    2.1.2. Command-line Suboptions
    2.1.3. Command-line Conflicting Options
  2.2. Help with Command-line Options
  2.3. Getting Started with Performance
    2.3.1. Using -fast
    2.3.2. Other Performance-Related Options
  2.4. Targeting Multiple Systems—Using the -tp Option
  2.5. Frequently-used Options

Chapter 3. Optimizing and Parallelizing
  3.1. Overview of Optimization
    3.1.1. Local Optimization
    3.1.2. Global Optimization
    3.1.3. Loop Optimization: Unrolling, Vectorization and Parallelization
    3.1.4. Interprocedural Analysis (IPA) and Optimization
    3.1.5. Function Inlining
    3.1.6. Profile-Feedback Optimization (PFO)
  3.2. Getting Started with Optimization
    3.2.1. -help
    3.2.2. -Minfo
    3.2.3. -Mneginfo
    3.2.4. -dryrun
    3.2.5. -v
    3.2.6. PGI Profiler
  3.3. Common Compiler Feedback Format (CCFF)
  3.4. Local and Global Optimization
    3.4.1. -Msafeptr
    3.4.2. -O
  3.5. Loop Unrolling using -Munroll
  3.6. Vectorization using -Mvect
    3.6.1. Vectorization Sub-options
    3.6.2. Vectorization Example Using SIMD Instructions
  3.7. Auto-Parallelization using -Mconcur
    3.7.1. Auto-Parallelization Sub-options
    3.7.2. Loops That Fail to Parallelize
  3.8. Processor-Specific Optimization and the Unified Binary
  3.9. Interprocedural Analysis and Optimization using -Mipa
    3.9.1. Building a Program Without IPA – Single Step
    3.9.2. Building a Program Without IPA – Several Steps
    3.9.3. Building a Program Without IPA Using Make
    3.9.4. Building a Program with IPA
    3.9.5. Building a Program with IPA – Single Step
    3.9.6. Building a Program with IPA – Several Steps
    3.9.7. Building a Program with IPA Using Make
    3.9.8. Questions about IPA
  3.10. Profile-Feedback Optimization using -Mpfi/-Mpfo
  3.11. Default Optimization Levels
  3.12. Local Optimization Using Directives and Pragmas
  3.13. Execution Timing and Instruction Counting
  3.14. Portability of Multi-Threaded Programs on Linux
    3.14.1. libnuma

Chapter 4. Using Function Inlining
  4.1. Automatic function inlining in C/C++
  4.2. Invoking Function Inlining
  4.3. Using an Inline Library
  4.4. Creating an Inline Library
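The getting-started and optimization chapters listed above revolve around invoking the command-level drivers with options such as -fast and -Minfo. As a minimal sketch of that workflow in C (the build lines in the comment are assumed typical PGI invocations, not text from the guide):

    /* hello.c - a small program for exercising the command-level PGI compilers.
     * Assumed typical build commands (see Chapters 1-3 for the real details):
     *   pgcc -fast -Minfo=all hello.c -o hello      # C driver
     *   pgfortran -fast hello.f90 -o hello          # Fortran counterpart
     */
    #include <stdio.h>

    int main(void) {
        printf("hello from a PGI-compiled program\n");
        return 0;
    }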
Recommended publications
  • Bounds Checking on GPU
    Bounds Checking on GPU
    Troels Henriksen

    Abstract: We present a simple compilation strategy for safety-checking array indexing in high-level languages on GPUs. Our technique does not depend on hardware support for abnormal termination, and is designed to be efficient in the non-failing case. We rely on certain properties of array languages, namely the absence of arbitrary cross-thread communication, to ensure well-defined execution in the presence of failures. We have implemented our technique in the compiler for the functional array language Futhark, and an empirical evaluation on 19 benchmarks shows that the geometric mean overhead of checking array indexes is respectively 4% and 6% on two different GPUs.

    Keywords: GPU · functional programming · compilers

    1 Introduction
    Programming languages can be divided roughly into two categories: unsafe languages, where programming errors can lead to unpredictable results at run-time; and safe languages, where all risky operations are guarded by run-time checks. Consider array indexing, where an invalid index will lead an unsafe language to read from an invalid memory address. At best, the operating system will stop the program, but at worst, the program will silently produce invalid results. A safe language will perform bounds checking to verify that the array index is within the bounds of the array, and if not, signal that something is amiss. Some languages perform an abnormal termination of the program and print an error message pointing to the offending program statement. Other languages throw an exception, allowing the problem to be handled by the program itself.
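    The check described above can be pictured in plain C. This is only an illustrative sketch of a bounds-checked access with an abnormal-termination failure path; it is not Futhark's GPU compilation strategy, and the function names are made up for the example.

        #include <stdio.h>
        #include <stdlib.h>

        /* Guarded read: verify the index before touching memory. */
        int checked_read(const int *a, size_t len, size_t i) {
            if (i >= len) {
                fprintf(stderr, "index %zu out of bounds for length %zu\n", i, len);
                exit(EXIT_FAILURE);   /* abnormal termination with a message */
            }
            return a[i];
        }

        int main(void) {
            int data[4] = {1, 2, 3, 4};
            printf("%d\n", checked_read(data, 4, 2));  /* in bounds */
            printf("%d\n", checked_read(data, 4, 7));  /* triggers the check */
            return 0;
        }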
  • A Deep Dive Into the Interprocedural Optimization Infrastructure
    A Deep Dive into the Interprocedural Optimization Infrastructure

    Outline
    ● What is IPO? Why do it?
    ● Introduction of IPO passes in LLVM
    ● Inlining
    ● Attributor

    What is IPO?
    ● Pass kinds in LLVM
      ○ Immutable pass, loop pass, function pass (intraprocedural)
      ○ Call graph SCC pass, module pass (interprocedural)
    ● IPO considers more than one function at a time.

    Call Graph
    ● Node: functions
    ● Edge: from caller to callee
    ● Example: void A() { B(); C(); } void B() { C(); } void C() { ... } gives edges A->B, A->C, B->C.

    Call Graph SCC
    ● SCC stands for "Strongly Connected Component".

    IPO passes in LLVM
    ● Where: almost all IPO passes are under llvm/lib/Transforms/IPO.
    ● Categorization of IPO passes
      ○ Inliner: AlwaysInliner, Inliner, InlineAdvisor, ...
      ○ Propagation between caller and callee: Attributor, IP-SCCP, InferFunctionAttrs, ArgumentPromotion, DeadArgumentElimination, ...
      ○ Linkage and globals: GlobalDCE, GlobalOpt, GlobalSplit, ConstantMerge, ...
      ○ Others: MergeFunction, OpenMPOpt, HotColdSplitting, Devirtualization, ...

    Why IPO?
    ● Inliner
      ○ Specialize the function with call site arguments
      ○ Expose local optimization opportunities
      ○ Save jumps, register stores/loads (calling convention)
      ○ Improve instruction locality
    ● Propagation between caller and callee
      ○ Other passes would benefit from the propagated information
    ● Linkage
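    To make the motivation concrete, here is a small, hypothetical C example of what an interprocedural pass can do once it sees both caller and callee; it illustrates inlining plus constant propagation and is not LLVM pass code.

        /* With the call site visible, scale() can be inlined and the
         * multiplication folded, reducing answer() to "return 42;". */
        static int scale(int x, int factor) {
            return x * factor;
        }

        int answer(void) {
            return scale(21, 2);
        }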
  • Porting a Window Manager from Xlib to XCB
    Porting a Window Manager from Xlib to XCB
    Arnaud Fontaine (08090091), 16 May 2008

    Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

    Contents
    List of figures
    List of listings
    Introduction
    1 Backgrounds and Motivations
    2 X Window System (X11)
      2.1 Introduction
      2.2 History
      2.3 X Window Protocol
        2.3.1 Introduction
        2.3.2 Protocol overview
        2.3.3 Identifiers of resources
        2.3.4 Atoms
        2.3.5 Windows
        2.3.6 Pixmaps
        2.3.7 Events
        2.3.8 Keyboard and pointer
        2.3.9 Extensions
      2.4 X protocol client libraries
        2.4.1 Xlib
          2.4.1.1 Introduction
          2.4.1.2 Data types and functions
          2.4.1.3 Pros
          2.4.1.4 Cons
          2.4.1.5 Example
        2.4.2 XCB
          2.4.2.1 Introduction
          2.4.2.2 Data types and functions
          2.4.2.3 xcb-util library
          2.4.2.4 Pros
          2.4.2.5 Cons
          2.4.2.6 Example
        2.4.3 Xlib/XCB round-trip performance comparison
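    For a flavour of the two client libraries the report compares, the sketch below opens and closes a connection with each. It is an illustration only (link with -lX11 -lxcb), not an excerpt from the report.

        #include <X11/Xlib.h>
        #include <xcb/xcb.h>

        int main(void) {
            /* Xlib: open and close a display connection. */
            Display *dpy = XOpenDisplay(NULL);
            if (dpy != NULL)
                XCloseDisplay(dpy);

            /* XCB: the equivalent connect/disconnect. */
            xcb_connection_t *conn = xcb_connect(NULL, NULL);
            if (!xcb_connection_has_error(conn)) {
                /* requests would be issued here */
            }
            xcb_disconnect(conn);
            return 0;
        }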
  • Using the GNU Compiler Collection (GCC)
    Using the GNU Compiler Collection (GCC)
    by Richard M. Stallman and the GCC Developer Community
    Last updated 23 May 2004 for GCC 3.4.6
    For GCC Version 3.4.6

    Published by GNU Press, a division of the Free Software Foundation, 59 Temple Place Suite 330, Boston, MA 02111-1307 USA
    Website: www.gnupress.org | General: [email protected] | Orders: [email protected] | Tel 617-542-5942 | Fax 617-542-2652
    Last printed October 2003 for GCC 3.3.1. Printed copies are available for $45 each.

    Copyright © 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004 Free Software Foundation, Inc.
    Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being "GNU General Public License" and "Funding Free Software", the Front-Cover texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in the section entitled "GNU Free Documentation License".
    (a) The FSF's Front-Cover Text is: A GNU Manual
    (b) The FSF's Back-Cover Text is: You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development.

    Short Contents
    Introduction
    1 Programming Languages Supported by GCC
    2 Language Standards Supported by GCC
    3 GCC Command Options
  • Also Includes Slides and Contents From
    The Compilation Toolchain: Cross-Compilation for Embedded Systems
    Prof. Andrea Marongiu ([email protected])

    Toolchain
    The toolchain is a set of development tools used in association with source code or binaries generated from the source code.
    - It enables development in a programming language (e.g., C/C++).
    - It is used for many operations, such as: a) compilation, b) preparing libraries, c) reading a binary file (or part of it), d) debugging.
    - The most common toolchain is the GNU toolchain, which is part of the GNU project.
    - Normally it contains:
      a) Compiler: generates object files from source code files
      b) Linker: links object files together to build a binary file
      c) Library archiver: groups a set of object files into a library file
      d) Debugger: debugs the binary file while running
      e) and other tools

    The GNU Toolchain
    GNU (GNU's Not Unix). The GNU toolchain has played a vital role in the development of the Linux kernel, BSD, and software for embedded systems. The GNU project produced a set of programming tools. Parts of the toolchain we will use are:
    - gcc (GNU Compiler Collection): suite of compilers for many programming languages
    - binutils: suite of tools including the linker (ld) and assembler (gas)
    - gdb: code debugging tool
    - libc: subset of the standard C library (assuming a C compiler)
    - bash: free Unix shell (Bourne-again shell); the default shell on GNU/Linux systems and Mac OS X, also ported to Microsoft Windows
    - make: automation tool for compilation and build

    Program development tools
    The process of converting source code to an executable binary image requires several steps, each with its own tool.
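    The stages above can be walked through on a single file; the commands in the comment are an assumed, typical GNU-toolchain sequence rather than part of the original slides.

        /* tiny.c - one translation unit used to illustrate the toolchain stages:
         *   gcc -c tiny.c -o tiny.o        # compiler: source -> object file
         *   ar rcs libtiny.a tiny.o        # archiver: objects -> static library
         *   gcc main.c -L. -ltiny -o app   # linker, driven through the gcc front end
         *   gdb ./app                      # debugger on the resulting binary
         */
        int tiny_add(int a, int b) {
            return a + b;
        }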
  • Bash Shell Scripts
    Writing Bash shell scripts
    Bash shell scripts are text files, most efficiently built with programming editors (emacs or vi). The file must be executable and in the search path:
      chmod 700 my_script
    The PATH environment variable may not include "."!

    An example shell script:
      #!/bin/bash
      # My first script
      echo "Hello World!"

    Compile a Verilog file with vlog:
      #!/bin/bash
      if [ ! -d work ] ; then
          echo "work does not exist, making it"
          vlib work
      fi
      if [ ! -s adder.v ] ; then
          vlog adder.v
      fi
    The work directory must exist before compilation. Get scripts via wget, e.g.:
      wget http://web.engr.oregonstate.edu/~traylor/ece474/script

    File attribute checking:
      #!/bin/bash
      if [ ! -s junk_dir ] ; then
          mkdir junk_dir
      fi
    Spaces around the brackets are needed! File attribute tests:
      -d  exists and is a directory
      -e  a file exists
      -f  exists and is a regular file
      -s  file exists and is not empty

    Compile Verilog, then run a simulation:
      #!/bin/bash
      if [ ! -d "work" ] ; then
          vlib work
      fi
      if [ -s "adder.v" ] ; then
          vlog adder.v
          # runs simulation with a do file and no GUI
          vsim adder -do do.do -quiet -c
      else
          echo "verilog file missing"
      fi

    vsim command and arguments:
      vsim entity_name -do dofile.do -quiet -c
      -quiet         (do not report loading file messages)
      -c             (console mode, no GUI)
      -do            (run vsim from a TCL do file)
      +nowarnTFMPC   (don't warn about mismatched ports, scary)
      +nowarnTSCALE  (don't warn about timing mismatches)
    Try "vsim -help" for command line arguments.

    Writing Bash shell scripts (TCL script): In another text file, we create a TCL script with commands for the simulator.
  • Introduction to Linux by Lars Eklund Based on Work by Marcus Lundberg
    Introduction to Linux
    By Lars Eklund, based on work by Marcus Lundberg

    ● What is Linux
    ● Logging in to UPPMAX
    ● Navigate the file system
    ● "Basic toolkit"

    What is Linux
    ● The Linux operating system is a UNIX-like, UNIX-compatible operating system.
    ● Linux is a kernel on which many different programs can run. The shell (bash, sh, ksh, csh, tcsh and many more) is one such program.
    ● Linux has a multiuser platform at its base, which means permissions and security come easily.
    ● Many flavours.

    Connect to UPPMAX
    ● (Download XQuartz or another X11 server for Mac OS)
    ● Linux and MacOS: start Terminal, then
      $ ssh -X [email protected]

    Connect to UPPMAX for Windows users
    ● Download an X server such as GWSL, Xming or VcXsrv, or another of your choosing
    ● Install WSL and a distribution such as Ubuntu, or an ssh program such as MobaXterm
    ● Connect with
      $ ssh -X [email protected]

    Windows links
    ● https://sourceforge.net/projects/vcxsrv/
    ● https://mobaxterm.mobatek.net/
    ● https://opticos.github.io/gwsl/
    ● https://sourceforge.net/projects/xming/
    ● https://docs.microsoft.com/en-us/windows/wsl/install-win10
    ● Don't forget to update to WSL 2

    X11: forwarding graphics from the command line
    ● Graphics can be sent through the SSH connection you're using to connect: use ssh -Y or ssh -X.
    ● MacOS users will need to install XQuartz.
    ● When starting a graphical program, a new window will open, but your terminal will be "locked". Run with & at the end to run it as a background process, e.g. "gedit &". Alternatively, use ctrl-z to put gedit to sleep and
  • Computer Architecture and Assembly Language
    Computer Architecture and Assembly Language
    Gabriel Laskar, EPITA, 2015

    License
    - Copyright © 2004-2005, ACU, Benoit Perrot
    - Copyright © 2004-2008, Alexandre Becoulet
    - Copyright © 2009-2013, Nicolas Pouillon
    - Copyright © 2014, Joël Porquet
    - Copyright © 2015, Gabriel Laskar
    Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being just "Copying this document", no Front-Cover Texts, and no Back-Cover Texts.

    Part I: Introduction

    Problem definition: What are we trying to learn?

    Computer Architecture: What is in the hardware?
    - A bit of history of computers, current machines
    - Concepts and conventions: processing, memory, communication, optimization
    How does a machine run code?
    - Program execution model
    - Memory mapping, OS support

    Assembly Language: How to "talk" with the machine directly?
    - Mechanisms involved
    - Assembly language structure and usage
    - Low-level assembly language features
    - C inline assembly

    Who do I talk to?
    - System gurus
    - Low-level enthusiasts
    - Programmers
    - Wise managers
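    One of the topics listed, C inline assembly, looks like the following in GCC-style syntax for x86-64; this is a minimal sketch, not material from the course slides.

        /* Add two integers with GCC extended inline assembly (x86-64). */
        static inline long add_asm(long a, long b) {
            long result;
            __asm__("addq %2, %0"
                    : "=r"(result)        /* output: %0 */
                    : "0"(a), "r"(b));    /* inputs: %0 starts as a, %2 is b */
            return result;
        }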
  • Handout – Dataflow Optimizations Assignment
    Massachusetts Institute of Technology
    Department of Electrical Engineering and Computer Science
    6.035, Spring 2013
    Handout – Dataflow Optimizations Assignment
    Tuesday, Mar 19. DUE: Thursday, Apr 4, 9:00 pm

    For this part of the project, you will add dataflow optimizations to your compiler. At the very least, you must implement global common subexpression elimination. The other optimizations listed below are optional. You may also wait until the next project to implement them if you are going to; there is no requirement to implement other dataflow optimizations in this project. We list them here as suggestions since past winners of the compiler derby typically implement each of these optimizations in some form. You are free to implement any other optimizations you wish. Note that you will be implementing register allocation for the next project, so you don't need to concern yourself with it now.

    Global CSE (Common Subexpression Elimination): Identification and elimination of redundant expressions using the algorithm described in lecture (based on available-expression analysis). See §8.3 and §13.1 of the Whale book, §10.6 and §10.7 in the Dragon book, and §17.2 in the Tiger book.

    Global Constant Propagation and Folding: Compile-time interpretation of expressions whose operands are compile-time constants. See the algorithm described in §12.1 of the Whale book.

    Global Copy Propagation: Given a "copy" assignment like x = y, replace uses of x by y when legal (the use must be reached by only this def, and there must be no modification of y on any path from the def to the use).
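    A tiny C illustration of the redundancy global CSE removes; the rewrite is done by hand here, whereas the handout's algorithm derives it from available-expressions analysis.

        /* Before: a*b is evaluated three times; the later two are redundant. */
        int before(int a, int b, int flag) {
            int x = a * b;
            int y = flag ? a * b + 1 : a * b - 1;
            return x + y;
        }

        /* After global CSE: one shared evaluation. */
        int after(int a, int b, int flag) {
            int t = a * b;
            int y = flag ? t + 1 : t - 1;
            return t + y;
        }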
  • Compiler Construction
    Compiler Construction, Chapter 11

    A New Compiler
    • Perhaps a new source language
    • Perhaps a new target for an existing compiler
    • Perhaps both

    Source Language
    • Larger, more complex languages generally require larger, more complex compilers
    • Is the source language expected to evolve? E.g., Java 1.0 → Java 1.1 → ...
      – A brand new language may undergo considerable change early on
      – A small working prototype may be in order
      – Compiler writers must anticipate some amount of change and their design must therefore be flexible
      – Lexer and parser generators (like Lex and Yacc) are therefore better than hand-coding the lexer and parser when change is inevitable

    Target Language
    • The nature of the target language and run-time environment influence compiler construction considerably
    • A new processor and/or its assembler may be buggy; buggy targets make it difficult to debug compilers for that target!
    • A successful source language will persist over several target generations, e.g., 386 → 486 → Pentium → ...
      – Thus the design of the IR is important
      – Modularization of machine-specific details is also important

    Compiler Performance Issues
    • Compiler speed
    • Generated code quality
    • Error diagnostics
    • Portability
    • Maintainability

    Compiler Speed
    • Reduce the number of modules
    • Reduce the number of passes
    • Perhaps generate machine
  • Comparative Studies of Programming Languages; Course Lecture Notes
    Comparative Studies of Programming Languages, COMP6411
    Lecture Notes, Revision 1.9
    Joey Paquet, Serguei A. Mokhov (Eds.)
    August 5, 2010
    arXiv:1007.2123v6 [cs.PL] 4 Aug 2010

    Preface
    Lecture notes for the Comparative Studies of Programming Languages course, COMP6411, taught at the Department of Computer Science and Software Engineering, Faculty of Engineering and Computer Science, Concordia University, Montreal, QC, Canada. These notes include a compiled book of primarily related articles from the Wikipedia, the Free Encyclopedia [24], as well as the Comparative Programming Languages book [7] and other resources, including our own. The original notes were compiled by Dr. Paquet [14].

    Contents
    1 Brief History and Genealogy of Programming Languages
      1.1 Introduction
        1.1.1 Subreferences
      1.2 History
        1.2.1 Pre-computer era
        1.2.2 Subreferences
        1.2.3 Early computer era
        1.2.4 Subreferences
        1.2.5 Modern/Structured programming languages
      1.3 References
    2 Programming Paradigms
      2.1 Introduction
      2.2 History
        2.2.1 Low-level: binary, assembly
        2.2.2 Procedural programming
        2.2.3 Object-oriented programming
        2.2.4 Declarative programming
    3 Program Evaluation
      3.1 Program analysis and translation phases
        3.1.1 Front end
        3.1.2 Back end
      3.2 Compilation vs. interpretation
        3.2.1 Compilation
        3.2.2 Interpretation
        3.2.3 Subreferences
      3.3 Type System
        3.3.1 Type checking
      3.4 Memory management
  • CS 110 Discussion 15 Programming with SIMD Intrinsics
    CS 110 Discussion 15: Programming with SIMD Intrinsics
    Yanjie Song, School of Information Science and Technology, May 7, 2020

    Table of Contents
    1 Introduction on Intrinsics
    2 Compiler and SIMD Intrinsics
    3 Intel(R) SDE
    4 Application: Horizontal sum in vector

    Introduction on Intrinsics
    Definition: In computer software, in compiler theory, an intrinsic function (or builtin function) is a function (subroutine) available for use in a given programming language whose implementation is handled specially by the compiler.

    Intrinsics in C/C++
    Compilers for C and C++, of Microsoft, Intel, and the GNU Compiler Collection (GCC), implement intrinsics that map directly to the x86 single instruction, multiple data (SIMD) instructions (MMX, Streaming SIMD Extensions (SSE), SSE2, SSE3, SSSE3, SSE4).

    x86 SIMD instruction set extensions
    - MMX (1996, 64 bits)
    - 3DNow! (1998)
    - Streaming SIMD Extensions (SSE, 1999, 128 bits)
    - SSE2 (2001)
    - SSE3 (2004)
    - SSSE3 (2006)
    - SSE4 (2006)
    - Advanced Vector eXtensions (AVX, 2008, 256 bits)
    - AVX2 (2013)
    - F16C (2009)
    - XOP (2009)
    - FMA: FMA4 (2011), FMA3 (2012)
    - AVX-512 (2015, 512 bits)

    SIMD extensions in other ISAs
    There are SIMD instructions for other ISAs as well, e.g.
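    The "Horizontal sum in vector" application in the outline can be sketched with SSE intrinsics as follows; this assumes an SSE3-capable x86 target (compile with, e.g., -msse3) and is an illustration, not the discussion's own code.

        #include <immintrin.h>
        #include <stdio.h>

        /* Sum the four float lanes of a 128-bit vector (SSE3). */
        static float hsum_ps(__m128 v) {
            __m128 shuf = _mm_movehdup_ps(v);    /* (v1, v1, v3, v3) */
            __m128 sums = _mm_add_ps(v, shuf);   /* (v0+v1, ., v2+v3, .) */
            shuf = _mm_movehl_ps(shuf, sums);    /* bring v2+v3 to the low lane */
            sums = _mm_add_ss(sums, shuf);       /* (v0+v1) + (v2+v3) */
            return _mm_cvtss_f32(sums);
        }

        int main(void) {
            __m128 v = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);  /* lanes 1,2,3,4 */
            printf("%f\n", hsum_ps(v));                     /* prints 10.000000 */
            return 0;
        }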