PGI Visual Fortran Reference Manual

Parallel Fortran for Scientists and Engineers
Release 2013
The Portland Group

Copyright © 2013 NVIDIA Corporation
All rights reserved.
Printed in the United States of America

First printing: Release 2011, 11.0, December 2010
Second printing: Release 2011, 11.1, January 2011
Third printing: Release 2011, 11.3, March 2011
Fourth printing: Release 2011, 11.4, April 2011
Fifth printing: Release 2011, 11.5, May 2011
Sixth printing: Release 2012, 12.1, January 2012
Seventh printing: Release 2012, 12.6, June 2012
Eighth printing: Release 2012, 12.9, September 2012
Ninth printing: Release 2013, 13.1, January 2013
Tenth printing: Release 2013, 13.2, February 2013
Eleventh printing: Release 2013, 13.3, March 2013
Twelfth printing: Release 2013, 13.8, August 2013

Technical support: [email protected]
Sales: [email protected]
Web: www.pgroup.com
ID: 132831548

Contents
Preface
  Audience Description
  Compatibility and Conformance to Standards
  Organization
  Hardware and Software Constraints
  Conventions
  Related Publications
1. Fortran Data Types
  Fortran Data Types
    Fortran Scalars
    FORTRAN 77 Aggregate Data Type Extensions
    Fortran 90 Aggregate Data Types (Derived Types)
2. Command-Line Options Reference
  PGI Compiler Option Summary
    Build-Related PGI Options
    PGI Debug-Related Compiler Options
    PGI Optimization-Related Compiler Options
    PGI Linking and Runtime-Related Compiler Options
  Generic PGI Compiler Options
  –M Options by Category
    Code Generation Controls
    Environment Controls
    Fortran Language Controls
    Inlining Controls
    Optimization Controls
    Miscellaneous Controls
3. Directives Reference
  PGI Proprietary Fortran Directive Summary
    altcode (noaltcode)
    assoc (noassoc)
    bounds (nobounds)
    cncall (nocncall)
    concur (noconcur)
    depchk (nodepchk)
    eqvchk (noeqvchk)
    invarif (noinvarif)
    ivdep
    lstval (nolstval)
    prefetch
    opt
    safe_lastval
    tp
    unroll (nounroll)
    vector (novector)
    vintr (novintr)
  Prefetch Directives
  IGNORE_TKR Directive
  !DEC$ Directives
    ALIAS Directive
    ATTRIBUTES Directive
    DECORATE Directive
    DISTRIBUTE Directive
4. Run-time Environment
  Win32 Programming Model
    Function Calling Sequence
    Function Return Values
    Argument Passing
  Win64 Programming Model
    Function Calling Sequence
    Function Return Values
    Argument Passing
    Win64 Fortran Supplement
5. PVF Properties
  Property Page Summary
  General Property Page
    Output Directory
    Intermediate Directory
    Extensions to Delete on Clean
Recommended publications
  • Computational Economics and Econometrics
    A. Ronald Gallant, Penn State University
    © 2015 A. Ronald Gallant
    Course website: http://www.aronaldg.org/courses/compecon
    Go to the website and discuss:
      • Preassignment
      • Course plan (briefly now, in more detail a few slides later)
      • Source code
      • libscl
      • Lectures
      • Homework
      • Projects
    Course Objective – Intro
    Introduce modern methods of computation and numerical analysis to enable students to solve computationally intensive problems in economics, econometrics, and finance. The key concepts to be mastered are the following:
      • The object-oriented programming style.
      • The use of standard data structures.
      • Implementation and use of a matrix class.
      • Implementation and use of numerical algorithms.
      • Practical applications.
      • Parallel processing.
    Details follow.
    Object-Oriented Programming
    Object-oriented programming is a style of programming developed to support modern computing projects. Much of the development was in the commercial sector. The features of interest to us are the following:
      • The computer code closely resembles the way we think about a problem.
      • The computer code is compartmentalized into objects that perform clearly specified tasks. Importantly, this allows one to work on one part of the code without having to remember how the other parts work: selective ignorance.
      • One can use inheritance and virtual functions both to describe the project design as interfaces to its objects and to permit polymorphism. Interfaces cause the compiler to enforce our design, relieving us of the chore. Polymorphism allows us to easily swap objects in and out so we can try different models, different algorithms, etc.
      • The structures used to implement objects are much more flexible than the minimalist structures of non-object-oriented languages, such as subroutines, functions, and static common storage.
  • Unit – I Computer Architecture and Operating System – Scs1315
    SCHOOL OF ELECTRICAL AND ELECTRONICS
    DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
    UNIT – I: COMPUTER ARCHITECTURE AND OPERATING SYSTEM – SCS1315
    UNIT 1 INTRODUCTION
    Central Processing Unit: Introduction; General Register Organization; Stack Organization; Basic Computer Organization; Computer Registers; Computer Instructions; Instruction Cycle. Arithmetic, Logic, and Shift Microoperations; Arithmetic Logic Shift Unit. Example Architectures: MIPS, PowerPC, RISC, CISC.
    Central Processing Unit
    The part of the computer that performs the bulk of data-processing operations is called the central processing unit (CPU). The CPU is made up of three major parts, as shown in Fig. 1.
    [Fig. 1: Major components of CPU.]
    The register set stores intermediate data used during the execution of the instructions. The arithmetic logic unit (ALU) performs the required microoperations for executing the instructions. The control unit supervises the transfer of information among the registers and instructs the ALU as to which operation to perform.
    General Register Organization
    When a large number of registers are included in the CPU, it is most efficient to connect them through a common bus system. The registers communicate with each other not only for direct data transfers, but also while performing various microoperations. Hence it is necessary to provide a common unit that can perform all the arithmetic, logic, and shift microoperations in the processor. A bus organization for seven CPU registers is shown in Fig. 2. The output of each register is connected to two multiplexers (MUX) to form the two buses A and B. The selection lines in each multiplexer select one register or the input data for the particular bus. The A and B buses form the inputs to a common arithmetic logic unit (ALU).
  • Fortran Resources
    Fortran Resources¹
    Ian D Chivers, Jane Sleightholme
    May 7, 2021
    ¹ The original basis for this document was Mike Metcalf’s Fortran Information File. The next input came from people on comp-fortran-90. Details of how to subscribe to or browse this list can be found in this document. If you have any corrections, additions, suggestions, etc. to make, please contact us and we will endeavor to include your comments in later versions. Thanks to all the people who have contributed.
    Revision history
    The most recent version can be found at https://www.fortranplus.co.uk/fortran-information/ and in the files section of the comp-fortran-90 list: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=comp-fortran-90
      • May 2021. Major update to the Intel entry. Also changes to the editors and IDE section, the graphics section, and the parallel programming section.
      • October 2020. Added an entry for Nvidia to the compiler section. Nvidia has integrated the PGI compiler suite into their NVIDIA HPC SDK product. Nvidia are also contributing to the LLVM Flang project. Updated the ‘Additional Compiler Information’ entry in the compiler section. The Polyhedron benchmarks discuss automatic parallelisation. The fortranplus entry covers the diagnostic capability of the Cray, gfortran, Intel, Nag, Oracle and Nvidia compilers. Updated one entry and removed three others from the software tools section. Added ‘Fortran Discourse’ to the e-lists section. We have also made changes to the LaTeX style sheet.
      • September 2020. Added a computer arithmetic and IEEE formats section.
      • June 2020. Updated the compiler entry with details of standard conformance.
  • S.D.M COLLEGE of ENGINEERING and TECHNOLOGY Sridhar Y
    VISVESVARAYA TECHNOLOGICAL UNIVERSITY
    S.D.M COLLEGE OF ENGINEERING AND TECHNOLOGY
    A seminar report on CUDA
    Submitted by Sridhar Y (2sd06cs108), 8th semester
    DEPARTMENT OF COMPUTER SCIENCE ENGINEERING, 2009-10
    CERTIFICATE
    Certified that the seminar work entitled “CUDA” is a bona fide work presented by Sridhar Y, bearing USN 2SD06CS108, in partial fulfillment for the award of the degree of Bachelor of Engineering in Computer Science Engineering of the Visvesvaraya Technological University, Belgaum, during the year 2009-10. The seminar report has been approved as it satisfies the academic requirements with respect to seminar work presented for the Bachelor of Engineering Degree.
    Staff in charge, H.O.D CSE
    Name: Sridhar Y, USN: 2SD06CS108
    Contents
      1. Introduction
      2. Evolution of GPU programming and CUDA
      3. CUDA Structure for parallel processing
      4. Programming model of CUDA
      5. Portability and Security of the code
      6. Managing Threads with CUDA
      7. Elimination of Deadlocks in CUDA
      8. Data distribution among the Thread Processes in CUDA
      9. Challenges in CUDA for the Developers
      10. The Pros and Cons of CUDA
      11. Conclusions
      Bibliography
    Abstract
    Parallel processing on multi-core processors is the industry’s biggest software challenge, but the real problem is that there are too many solutions. One of them is Nvidia’s Compute Unified Device Architecture (CUDA), a software platform for massively parallel high-performance computing on powerful Graphics Processing Units (GPUs).
  • Lecture 6: Instruction Set Architecture and the 80x86
    Professor Randy H. Katz
    Computer Science 252, Spring 1996
    Review From Last Time
      • Given sales a function of performance relative to competition, tremendous investment in improving product as reported by performance summary
      • Good products created when have:
        – Good benchmarks
        – Good ways to summarize performance
      • If not good benchmarks and summary, then choice between improving product for real programs vs. improving product to get more sales => sales almost always wins
      • Time is the measure of computer performance!
      • What about cost?
    Review: Integrated Circuits Costs
    $$\text{IC cost} = \frac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}}$$
    $$\text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$$
    $$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer\_diam}/2)^2}{\text{Die\_Area}} - \frac{\pi \times \text{Wafer\_diam}}{\sqrt{2 \times \text{Die\_Area}}} - \text{Test dies}$$
    $$\text{Die yield} = \text{Wafer yield} \times \left\{ 1 + \frac{\text{Defects\_per\_unit\_area} \times \text{Die\_Area}}{\alpha} \right\}^{-\alpha}$$
    Die cost goes roughly with die area^4.
    Review From Last Time: Price vs. Cost
    [Figure: Price vs. Cost for Mini, W/S, and PC, broken down into component costs, direct costs, gross margin, and average discount; one panel on a 0% to 100% scale, the other on a 0 to 5 scale with labeled values 4.7, 3.8, and 1.8.]
    Today: Instruction Set Architecture
      • 1950s to 1960s: Computer Architecture Course = Computer Arithmetic
      • 1970 to mid 1980s: Computer Architecture Course = Instruction Set Design, especially ISA appropriate for compilers
      • 1990s: Computer Architecture Course = Design of CPU, memory system, I/O system, Multiprocessors
    Computer Architecture?
    "... the attributes of a [computing] system as seen by the programmer, i.e.
  • Low-Power Microprocessor Based on Stack Architecture
    Girish Aramanekoppa Subbarao
    Master’s Thesis
    Series of Master’s theses, Department of Electrical and Information Technology, LU/LTH-EIT 2015-464, http://www.eit.lth.se
    Department of Electrical and Information Technology, Faculty of Engineering, LTH, Lund University, September 2015.
    Supervisors: Prof. Joachim Rodrigues, Prof. Anders Ardö
    Lund 2015
    © The Department of Electrical and Information Technology, Lund University, Box 118, S-221 00 LUND, SWEDEN
    This thesis is set in Computer Modern 10pt, with the LaTeX Documentation System.
    © Girish Aramanekoppa Subbarao 2015
    Printed in E-huset, Lund, Sweden, Sep. 2015
    Abstract
    Microprocessors have many embedded applications in which power efficiency is a critical requirement, e.g. wearable or mobile devices in healthcare, space instrumentation, and handheld devices. One way to achieve low-power operation is to simplify the device architecture. RISC/CISC processors consume considerable power because of their complexity, which stems from the multiplexer system connecting the register file to the functional units and from the instruction pipeline. Stack machines, on the other hand, are comparatively simple because they implicitly address the top two registers of the stack and use smaller operation codes. This simplifies the instruction and address decoder circuits by eliminating the multiplexer switches for the read and write ports of the register file.
  • x86-64 Machine-Level Programming
    Randal E. Bryant, David R. O'Hallaron
    September 9, 2005
    Intel’s IA32 instruction set architecture (ISA), colloquially known as “x86”, is the dominant instruction format for the world’s computers. IA32 is the platform of choice for most Windows and Linux machines. The ISA we use today was defined in 1985 with the introduction of the i386 microprocessor, extending the 16-bit instruction set defined by the original 8086 to 32 bits. Even though subsequent processor generations have introduced new instruction types and formats, many compilers, including GCC, have avoided using these features in the interest of maintaining backward compatibility. A shift is underway to a 64-bit version of the Intel instruction set. Originally developed by Advanced Micro Devices (AMD) and named x86-64, it is now supported by high end processors from AMD (who now call it AMD64) and by Intel, who refer to it as EM64T. Most people still refer to it as “x86-64,” and we follow this convention. Newer versions of Linux and GCC support this extension. In making this switch, the developers of GCC saw an opportunity to also make use of some of the instruction-set features that had been added in more recent generations of IA32 processors. This combination of new hardware and revised compiler makes x86-64 code substantially different in form and in performance than IA32 code. In creating the 64-bit extension, the AMD engineers also adopted some of the features found in reduced-instruction set computers (RISC) [7] that made them the favored targets for optimizing compilers.
  • Instruction Set Architecture
    CSE 240A, Dean Tullsen
    The Instruction Set Architecture
    [Figure: layers from Application, Operating System, and Compiler down through the Instruction Set Architecture to Instr. Set Proc. and I/O system, Digital Design, and Circuit Design.]
    “How to talk to computers if you aren’t on Star Trek”
    How to Speak Computer
      High Level Language Program:
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
      ↓ Compiler
      Assembly Language Program:
        lw $15, 0($2)
        lw $16, 4($2)
        sw $16, 0($2)
        sw $15, 4($2)
      ↓ Assembler
      Machine Language Program:
        1000110001100010000000000000000
        1000110011110010000000000000100
        1010110011110010000000000000000
        1010110001100010000000000000100
      ↓ Machine Interpretation
      Control Signal Spec:
        ALUOP[0:3] <= InstReg[9:11] & MASK
    Crafting an ISA
      • Designing an ISA is both an art and a science
      • ISA design involves dealing in an extremely rare resource: instruction bits!
      • Some things we want out of our ISA:
        – completeness
        – orthogonality
        – regularity and simplicity
        – compactness
        – ease of programming
        – ease of implementation
    Where are the instructions?
      • Harvard architecture: separate instruction storage and data storage, each connected to the CPU
      • Von Neumann architecture: one storage for both instructions and data, the “stored-program” computer
      • [Figure: cache hierarchy with an L1 instruction cache and an L1 data cache, an L2 cache, and memory]
    Key ISA decisions
      • operations: how many? which ones?
      • operands: how many? location? types? how to specify?
      • instruction format: size? how many formats?
      Example: y = x + b has a destination operand (y), an operation (+), and source operands (x, b).
      How does the computer know what 0001 0100 1101 1111 means?
  • The Processor's Structure
    Computer Architecture, Lecture 3: The processor’s structure
    Hossam A. H. Fahmy
    Cairo University, Electronics and Communications Engineering
    Structure versus organization
      Architecture: the structure and organization of a computer’s hardware or system software.
      Within the processor: registers, operational units (integer, floating point, special purpose, ...)
      Outside the processor: memory, I/O, ...
      Instruction Set Architecture (ISA): What is the best for the target application? Why?
      Examples: Sun SPARC, MIPS, Intel x86 (IA32), IBM S/390.
      Defines: data (types, storage, and addressing modes), instruction (operation code) set, and instruction formats.
    ISA
      The instruction set is the “language” that allows the hardware and the software to speak. It defines the exact functionality of the various operations. It defines how these operations are expressed by specifying the encoding formats. It does not deal with the speed of performing these operations, the power consumed when these operations are invoked, or the specific details of the circuit implementations.
    There are many languages
      Arabic and Hebrew are Semitic languages. English and Urdu are Indo-European languages. ISAs have their “families” as well.
      • Stack architectures push operands onto a stack and issue operations with no explicit operands. The result is put on top of the same stack.
      • Accumulator architectures use one explicit operand, with the accumulator as the second operand. The result is put in the accumulator.
      • Register-memory architectures have several registers. The operands and the result may be in the registers or the memory.
      • Register-register architectures have several registers as well. The operands and result are restricted to be in the registers only.
  • TotalView: Debugging from Desktop to Supercomputer
    ATPESC 2017
    TotalView: Debugging from Desktop to Supercomputer
    Peter Thompson, Principal Software Support Engineer
    August 8, 2017
    © 2017 Rogue Wave Software, Inc. All Rights Reserved.
    Some thoughts on debugging
      • “As soon as we started programming, we found out to our surprise that it wasn’t as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs.” – Maurice Wilkes
      • “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” – Brian W. Kernighan
      • “Sometimes it pays to stay in bed on Monday, rather than spending the rest of the week debugging Monday’s code.” – Dan Saloman
    Rogue Wave’s Debugging Tool: TotalView for HPC
      • Source code debugger for C/C++/Fortran
        – Visibility into applications
        – Control over applications
      • Scalability
      • Usability
      • Support for HPC platforms and languages
    TotalView Overview
    TotalView Origins
      Mid-1980s: Bolt, Beranek and Newman (BBN) Butterfly Machine, an early ‘massively parallel’ computer
    How do you debug a Butterfly?
      • The TotalView project was developed as a solution for this environment
        – Able to debug multiple processes and threads
        – Point-and-click interface
        – Multiple and mixed language support
      • The core development group has been there from the beginning and has been, and still is, involved in defining MPI interfaces, DWARF, and lately OMPD (the OpenMP debugging interface)
  • IMSL C Function Catalog 2020.0
    IMSL® C Numerical Library Function Catalog, Version 2020.0
    At the heart of the IMSL C Numerical Library is a comprehensive set of pre-built mathematical and statistical analysis functions that developers can embed directly into their applications. Available for a wide range of computing platforms, the robust, scalable, portable and high-performing IMSL analytics allow developers to focus on their domain of expertise and reduce development time.
    COST-EFFECTIVENESS AND VALUE
    The IMSL C Numerical Library significantly shortens application time to market and promotes standardization. Descriptive function names and variable argument lists have been implemented to simplify calling sequences. Using the IMSL C Library reduces costs associated with the design, development, documentation, testing and maintenance of applications. Robust, scalable, portable, and high-performing analytics with IMSL form the foundation for inherently reliable applications.
    A RICH SET OF PREDICTIVE ANALYTICS FUNCTIONS
    The library includes a comprehensive set of functions for machine learning, data mining, prediction, and classification, including: Time series models such as ARIMA, GARCH, and VARMA vector auto- […]
    […] including transforms, convolutions, splines, quadrature, and more.
    EMBEDDABILITY
    Development is made easier because library code readily embeds into application code, with no additional infrastructure such as app/management consoles, servers, or programming environments needed.
    Wrappers complicate development by requiring the developer to access external compilers and pass arrays or user-defined data types to ensure compatibility between the different languages. The IMSL C Library allows developers to write, build, compile and debug code in a single environment, and to easily embed analytic functions in applications and databases.
    RELIABILITY
    100% pure C code maximizes robustness.
  • Design and Implementation of a Multithreaded Associative SIMD Processor
    A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy
    by Kevin Schaffer
    December 2011
    Dissertation written by Kevin Schaffer
    B.S., Kent State University, 2001
    M.S., Kent State University, 2003
    Ph.D., Kent State University, 2011
    Approved by
      Robert A. Walker, Chair, Doctoral Dissertation Committee
      Johnnie W. Baker, Kenneth E. Batcher, Eugene C. Gartland, Members, Doctoral Dissertation Committee
    Accepted by
      John R. D. Stalvey, Administrator, Department of Computer Science
      Timothy Moerland, Dean, College of Arts and Sciences
    TABLE OF CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1 INTRODUCTION
      1.1. Architectural Trends
        1.1.1. Wide-Issue Superscalar Processors
        1.1.2. Chip Multiprocessors (CMPs)
      1.2. An Alternative Approach: SIMD
      1.3. MTASC Processor
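The Computational Economics and Econometrics excerpt describes using inheritance and virtual functions to define interfaces and to swap models or algorithms in and out. The following is a minimal Python sketch of that idea. It assumes that Python's abstract base classes can stand in for the interfaces and virtual functions the excerpt describes. The names `Integrator`, `Trapezoid`, `Simpson`, and `expected_value` are hypothetical and do not come from the course or its libscl library.

```python
from abc import ABC, abstractmethod


class Integrator(ABC):
    """Interface: every quadrature algorithm must implement integrate()."""

    @abstractmethod
    def integrate(self, f, a: float, b: float, n: int) -> float:
        """Approximate the integral of f over [a, b] with n subintervals."""


class Trapezoid(Integrator):
    def integrate(self, f, a, b, n):
        h = (b - a) / n
        interior = sum(f(a + i * h) for i in range(1, n))
        return h * (0.5 * (f(a) + f(b)) + interior)


class Simpson(Integrator):
    def integrate(self, f, a, b, n):
        if n % 2:
            raise ValueError("Simpson's rule needs an even number of subintervals")
        h = (b - a) / n
        odd = sum(f(a + i * h) for i in range(1, n, 2))
        even = sum(f(a + i * h) for i in range(2, n, 2))
        return h * (f(a) + f(b) + 4 * odd + 2 * even) / 3


def expected_value(method: Integrator, density, lo: float, hi: float, n: int = 100) -> float:
    """Client code sees only the Integrator interface, so algorithms can be swapped."""
    return method.integrate(lambda x: x * density(x), lo, hi, n)


def uniform_density(x: float) -> float:
    return 1.0  # density of the uniform distribution on [0, 1]


# The same client code runs unchanged with either algorithm plugged in.
for method in (Trapezoid(), Simpson()):
    print(type(method).__name__, expected_value(method, uniform_density, 0.0, 1.0))
```

Because `expected_value` depends only on the abstract interface, a third quadrature rule could be added without touching the client code. Python also refuses to instantiate `Integrator` itself, which loosely mirrors the "compiler enforces our design" point.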
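The general register organization in the SCS1315 unit (seven registers, two multiplexers driving buses A and B, and one shared ALU) can be sketched as a small Python simulator. Several details here are assumptions for illustration and are not the unit's actual control-word format:
  • the select encoding (0 routes the external input data, 1 to 7 route R1 to R7),
  • a destination of 0 meaning no register is loaded,
  • the 8-bit word size,
  • the operation names.

```python
from dataclasses import dataclass, field

WORD_MASK = 0xFF  # assumption: 8-bit registers, for illustration only

# Hypothetical ALU operation table: each entry combines bus A and bus B.
ALU_OPS = {
    "PASS_A": lambda a, b: a,
    "ADD": lambda a, b: (a + b) & WORD_MASK,
    "SUB": lambda a, b: (a - b) & WORD_MASK,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "SHL": lambda a, b: (a << 1) & WORD_MASK,
    "SHR": lambda a, b: a >> 1,
}


@dataclass
class BusOrganizedCPU:
    # regs[1]..regs[7] model R1..R7; regs[0] is unused so indices match names.
    regs: list = field(default_factory=lambda: [0] * 8)

    def mux(self, select: int, external_input: int) -> int:
        """One bus multiplexer: select 0 routes the external input, 1-7 route R1-R7."""
        if not 0 <= select <= 7:
            raise ValueError("select must be in 0..7")
        return external_input if select == 0 else self.regs[select]

    def microop(self, sel_a: int, sel_b: int, sel_dest: int, op: str,
                external_input: int = 0) -> int:
        """Drive buses A and B, run the ALU, and optionally load a destination register."""
        bus_a = self.mux(sel_a, external_input)
        bus_b = self.mux(sel_b, external_input)
        result = ALU_OPS[op](bus_a, bus_b)
        if sel_dest:  # 0 means "no destination register is loaded"
            self.regs[sel_dest] = result
        return result


cpu = BusOrganizedCPU()
cpu.microop(0, 0, 2, "PASS_A", external_input=5)  # R2 <- input (5)
cpu.microop(0, 0, 3, "PASS_A", external_input=7)  # R3 <- input (7)
cpu.microop(2, 3, 1, "ADD")                       # R1 <- R2 + R3
print(cpu.regs[1])  # 12
```

Each call to `microop` corresponds to one register-transfer step: both multiplexers select their sources, the shared ALU computes, and the result is optionally written back.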
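The integrated-circuit cost formulas in the CS 252 review slide can be turned into a small calculator. This sketch assumes the standard textbook form of the die-yield formula, with the exponent written as a parameter `alpha`. All input values in the example (a 20 cm wafer costing 1000, 0.8 defects per square centimeter, `alpha = 3`) are hypothetical.

```python
import math


def dies_per_wafer(wafer_diam: float, die_area: float, test_dies: int = 0) -> float:
    """Dies on a round wafer, minus partial dies lost at the edge and test dies."""
    wafer_area = math.pi * (wafer_diam / 2) ** 2
    edge_loss = math.pi * wafer_diam / math.sqrt(2 * die_area)
    return wafer_area / die_area - edge_loss - test_dies


def die_yield(defects_per_area: float, die_area: float, alpha: float,
              wafer_yield: float = 1.0) -> float:
    """Fraction of good dies: wafer_yield * (1 + D * A / alpha) ** -alpha."""
    return wafer_yield * (1 + defects_per_area * die_area / alpha) ** -alpha


def die_cost(wafer_cost: float, wafer_diam: float, die_area: float,
             defects_per_area: float, alpha: float, wafer_yield: float = 1.0,
             test_dies: int = 0) -> float:
    """Wafer cost divided by the number of good dies the wafer produces."""
    good_dies = (dies_per_wafer(wafer_diam, die_area, test_dies)
                 * die_yield(defects_per_area, die_area, alpha, wafer_yield))
    return wafer_cost / good_dies


def ic_cost(die: float, testing: float, packaging: float, final_test_yield: float) -> float:
    """Total cost per good packaged part."""
    return (die + testing + packaging) / final_test_yield


# Hypothetical inputs: 20 cm wafer, cost 1000 per wafer, 0.8 defects/cm^2, alpha = 3.
for area_cm2 in (1.0, 2.0):
    print(area_cm2, round(die_cost(1000, 20, area_cm2, 0.8, alpha=3), 2))
```

Doubling the die area in this example raises the die cost by well over a factor of two. Fewer dies fit on the wafer, and each die is also less likely to be defect-free.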
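The stack-architecture thesis abstract credits part of the simplicity of stack machines to implied addressing of the top two stack registers. A toy interpreter makes this concrete: arithmetic instructions carry only an opcode and take both operands from the top of the stack. The instruction names and text format are hypothetical, and a Python list stands in for the hardware stack registers.

```python
import operator

# Zero-address ALU instructions: operands are implied (the top two stack entries).
ALU_OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run_stack_program(program, memory):
    """Execute a list of stack-machine instructions against a memory dict.

    PUSH/POP carry a memory address; ALU instructions carry no operand field.
    """
    stack = []
    for instr in program:
        op, *operand = instr.split()
        if op == "PUSH":
            stack.append(memory[operand[0]])
        elif op == "POP":
            memory[operand[0]] = stack.pop()
        elif op in ALU_OPS:
            tos = stack.pop()  # top of stack
            nos = stack.pop()  # next on stack
            stack.append(ALU_OPS[op](nos, tos))
        else:
            raise ValueError(f"unknown instruction: {instr!r}")
    return memory


# D = (A + B) * C
program = ["PUSH A", "PUSH B", "ADD", "PUSH C", "MUL", "POP D"]
memory = run_stack_program(program, {"A": 2, "B": 3, "C": 4})
print(memory["D"])  # 20
```

Only `PUSH` and `POP` need an address field. This is the property the abstract links to a simpler instruction decoder.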
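The CSE 240A excerpt shows the swap `temp = v[k]; v[k] = v[k+1]; v[k+1] = temp;` compiled into four `lw`/`sw` instructions. The following Python sketch executes those instructions on a byte-addressed memory, to show why offsets 0 and 4 reach `v[k]` and `v[k+1]`. It assumes 4-byte words and that register `$2` already holds the address of `v[k]`. The base address `0x1000` and the array contents are hypothetical.

```python
import re

WORD_BYTES = 4  # assumption: 32-bit words in byte-addressed memory, as on MIPS
INSTR = re.compile(r"(lw|sw)\s+\$(\d+),\s*(-?\d+)\(\$(\d+)\)")


def run(asm: str, regs: list, memory: dict) -> None:
    """Interpret 'lw $rt, offset($base)' and 'sw $rt, offset($base)' lines in place."""
    for line in asm.strip().splitlines():
        m = INSTR.fullmatch(line.strip())
        if m is None:
            raise ValueError(f"unsupported instruction: {line!r}")
        op = m.group(1)
        rt, offset, base = int(m.group(2)), int(m.group(3)), int(m.group(4))
        addr = regs[base] + offset          # effective address = base + offset
        if addr % WORD_BYTES:
            raise ValueError(f"unaligned word address {addr:#x}")
        if op == "lw":
            regs[rt] = memory[addr]         # load word into register rt
        else:
            memory[addr] = regs[rt]         # store register rt into memory


SWAP = """
lw $15, 0($2)
lw $16, 4($2)
sw $16, 0($2)
sw $15, 4($2)
"""

V_BASE = 0x1000                      # hypothetical address of v[0]
v = [10, 20, 30, 40]
memory = {V_BASE + WORD_BYTES * i: x for i, x in enumerate(v)}
regs = [0] * 32
k = 1
regs[2] = V_BASE + WORD_BYTES * k    # assumption: $2 already holds &v[k]
run(SWAP, regs, memory)
print([memory[V_BASE + WORD_BYTES * i] for i in range(len(v))])  # [10, 30, 20, 40]
```

Registers `$15` and `$16` play the role of `temp`. The two loads read both elements before either store overwrites one.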
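The four ISA families in the Cairo University lecture differ mainly in where ALU operands live. A toy interpreter can run the statement `C = A + B` in each style. The opcode names (`ADD_S`, `ADD_A`, `ADD_RM`, `ADD_RR`, and so on) are hypothetical and are chosen only to keep the families apart in one sketch.

```python
def execute(program, memory):
    """Toy interpreter covering four ISA families on one machine model.

    Stack ops use no explicit ALU operands, accumulator ops name one memory
    operand, register-memory ops may read memory directly, and
    register-register ALU ops touch registers only.
    """
    mem = dict(memory)          # work on a copy so the caller's dict is unchanged
    stack, regs, acc = [], {}, 0
    for op, *args in program:
        if op == "PUSH":                      # stack family
            stack.append(mem[args[0]])
        elif op == "POP":
            mem[args[0]] = stack.pop()
        elif op == "ADD_S":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "LOAD_A":                  # accumulator family
            acc = mem[args[0]]
        elif op == "ADD_A":
            acc += mem[args[0]]
        elif op == "STORE_A":
            mem[args[0]] = acc
        elif op == "LOAD":                    # register families: reg <- memory
            regs[args[0]] = mem[args[1]]
        elif op == "STORE":                   # memory <- reg
            mem[args[1]] = regs[args[0]]
        elif op == "ADD_RM":                  # rd <- rs + memory operand
            rd, rs, addr = args
            regs[rd] = regs[rs] + mem[addr]
        elif op == "ADD_RR":                  # rd <- rs1 + rs2, registers only
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        else:
            raise ValueError(f"unknown opcode {op}")
    return mem


PROGRAMS = {
    "stack": [("PUSH", "A"), ("PUSH", "B"), ("ADD_S",), ("POP", "C")],
    "accumulator": [("LOAD_A", "A"), ("ADD_A", "B"), ("STORE_A", "C")],
    "register-memory": [("LOAD", "R1", "A"), ("ADD_RM", "R3", "R1", "B"),
                        ("STORE", "R3", "C")],
    "register-register": [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
                          ("ADD_RR", "R3", "R1", "R2"), ("STORE", "R3", "C")],
}

for family, program in PROGRAMS.items():
    result = execute(program, {"A": 2, "B": 3, "C": 0})
    print(f"{family:18} {len(program)} instructions, C = {result['C']}")
```

The three styles trade instruction count against operand handling:
  • Stack: `ADD_S` needs no explicit operands.
  • Accumulator: the add names one memory operand.
  • Register-register: both operands must be loaded before adding, so it uses more instructions but keeps ALU operand access restricted to registers.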