NVIDIA HPC Compilers Reference Guide

Version 20.7 | PR-09861-001-V20.7 | August 2020

Table of Contents

Preface
    Audience Description
    Compatibility and Conformance to Standards
    Organization
    Hardware and Software Constraints
    Conventions
    Terms
    Related Publications
Chapter 1. Fortran, C, and C++ Data Types
    1.1. Fortran Data Types
        1.1.1. Fortran Scalars
        1.1.2. FORTRAN real(2)
        1.1.3. FORTRAN 77 Aggregate Data Type Extensions
        1.1.4. Fortran 90 Aggregate Data Types (Derived Types)
    1.2. C and C++ Data Types
        1.2.1. C and C++ Scalars
        1.2.2. C and C++ Aggregate Data Types
        1.2.3. Class and Object Data Layout
        1.2.4. Aggregate Alignment
        1.2.5. Bit-field Alignment
        1.2.6. Other Type Keywords in C and C++
Chapter 2. Command-Line Options Reference
    2.1. HPC Compilers Option Summary
        2.1.1. Build-Related NVIDIA Options
        2.1.2. HPC Debug-Related Compiler Options
        2.1.3. HPC Optimization-Related Compiler Options
        2.1.4. NVIDIA Linking and Runtime-Related Compiler Options
    2.2. Generic HPC Compiler Options
        2.2.1. -#
        2.2.2. -###
        2.2.3. The -acc option
        2.2.4. -Bdynamic
        2.2.5. -Bstatic
        2.2.6. -Bstatic_nv
        2.2.7. -byteswapio
        2.2.8. -C
        2.2.9. -c
        2.2.10. -d<arg>
        2.2.11. -D
        2.2.12. -dryrun
        2.2.13. -drystdinc
        2.2.14. -E
        2.2.15. -F
        2.2.16. -fast
        2.2.17. -fastsse
        2.2.18. --flagcheck
        2.2.19. -flags
        2.2.20. -fpic
        2.2.21. -fPIC
        2.2.22. -g
        2.2.23. -gopt
        2.2.24. -g77libs
        2.2.25. -help
        2.2.26. -I
        2.2.27. -i2, -i4, -i8
        2.2.28. -K<flag>
        2.2.29. --keeplnk
        2.2.30. -L
        2.2.31. -l<library>
        2.2.32. -M
        2.2.33. -m
        2.2.34. -m64
        2.2.35. -M<pgflag>
        2.2.36. -mcmodel=medium
        2.2.37. -module <moduledir>
Recommended publications
  • MIPSpro C++ Programmer's Guide
    MIPSpro™ C++ Programmer’s Guide 007–0704–150 CONTRIBUTORS Rewritten in 2002 by Jean Wilson with engineering support from John Wilkinson and editing support from Susan Wilkening. COPYRIGHT Copyright © 1995, 1999, 2002 - 2003 Silicon Graphics, Inc. All rights reserved; provided portions may be copyright in third parties, as indicated elsewhere herein. No permission is granted to copy, distribute, or create derivative works from the contents of this electronic documentation in any manner, in whole or in part, without the prior written permission of Silicon Graphics, Inc. LIMITED RIGHTS LEGEND The electronic (software) version of this document was developed at private expense; if acquired under an agreement with the USA government or any contractor thereto, it is acquired as "commercial computer software" subject to the provisions of its applicable license agreement, as specified in (a) 48 CFR 12.212 of the FAR; or, if acquired for Department of Defense units, (b) 48 CFR 227-7202 of the DoD FAR Supplement; or sections succeeding thereto. Contractor/manufacturer is Silicon Graphics, Inc., 1600 Amphitheatre Pkwy 2E, Mountain View, CA 94043-1351. TRADEMARKS AND ATTRIBUTIONS Silicon Graphics, SGI, the SGI logo, IRIX, O2, Octane, and Origin are registered trademarks and OpenMP and ProDev are trademarks of Silicon Graphics, Inc. in the United States and/or other countries worldwide. MIPS, MIPS I, MIPS II, MIPS III, MIPS IV, R2000, R3000, R4000, R4400, R4600, R5000, and R8000 are registered or unregistered trademarks and MIPSpro, R10000, R12000, R1400 are trademarks of MIPS Technologies, Inc., used under license by Silicon Graphics, Inc. Portions of this publication may have been derived from the OpenMP Language Application Program Interface Specification.
  • CRTE V11.1A Common Runtime Environment
    English FUJITSU Software BS2000 CRTE V11.1A Common Runtime Environment User Guide * Edition December 2019 Comments… Suggestions… Corrections… The User Documentation Department would like to know your opinion on this manual. Your feedback helps us to optimize our documentation to suit your individual needs. Feel free to send us your comments by e-mail to: [email protected]. Certified documentation according to DIN EN ISO 9001:2015 To ensure a consistently high quality standard and user-friendliness, this documentation was created to meet the regulations of a quality management system which complies with the requirements of the standard DIN EN ISO 9001:2015. Copyright and Trademarks Copyright © 2019 Fujitsu Technology Solutions GmbH. All rights reserved. Delivery subject to availability; right of technical modifications reserved. All hardware and software names used are trademarks of their respective manufacturers. Table of Contents: CRTE V11.1; 1 Preface; 1.1 Objectives and target groups of this manual; 1.2 Summary of contents; 1.3 Changes since the last edition of the manual; 1.4 Notational conventions; 2 Selectable unit, installation and shareability of CRTE; 2.1 CRTE V11.1A selectable unit; 2.2 Installing CRTE; 2.2.1 CRTE libraries for installation without version specification; 2.2.2 Standard installation under the user ID “$.”; 2.2.3 Installing with IMON under a non-standard user ID; 2.2.4 Installing header files and POSIX link switches in the default POSIX directory; 2.2.5 Installing header files and POSIX link switches in any POSIX directory
  • CS 110 Discussion 15 Programming with SIMD Intrinsics
    CS 110 Discussion 15 Programming with SIMD Intrinsics Yanjie Song School of Information Science and Technology May 7, 2020 Yanjie Song (S.I.S.T.) CS 110 Discussion 15 2020.05.07 1 / 21 Table of Contents: 1 Introduction on Intrinsics; 2 Compiler and SIMD Intrinsics; 3 Intel(R) SDE; 4 Application: Horizontal sum in vector. Introduction on Intrinsics. Definition: In computer software, in compiler theory, an intrinsic function (or builtin function) is a function (subroutine) available for use in a given programming language whose implementation is handled specially by the compiler. Intrinsics in C/C++: Compilers for C and C++, of Microsoft, Intel, and the GNU Compiler Collection (GCC) implement intrinsics that map directly to the x86 single instruction, multiple data (SIMD) instructions (MMX, Streaming SIMD Extensions (SSE), SSE2, SSE3, SSSE3, SSE4). x86 SIMD instruction set extensions: MMX (1996, 64 bits); 3DNow! (1998); Streaming SIMD Extensions (SSE, 1999, 128 bits); SSE2 (2001); SSE3 (2004); SSSE3 (2006); SSE4 (2006); Advanced Vector eXtensions (AVX, 2008, 256 bits); AVX2 (2013); F16C (2009); XOP (2009); FMA: FMA4 (2011), FMA3 (2012); AVX-512 (2015, 512 bits). SIMD extensions in other ISAs: There are SIMD instructions for other ISAs as well, e.g.
  • Red Hat Developer Toolset 9 User Guide
    Red Hat Developer Toolset 9 User Guide Installing and Using Red Hat Developer Toolset Last Updated: 2020-08-07 Red Hat Developer Toolset 9 User Guide Installing and Using Red Hat Developer Toolset Zuzana Zoubková Red Hat Customer Content Services Olga Tikhomirova Red Hat Customer Content Services [email protected] Supriya Takkhi Red Hat Customer Content Services Jaromír Hradílek Red Hat Customer Content Services Matt Newsome Red Hat Software Engineering Robert Krátký Red Hat Customer Content Services Vladimír Slávik Red Hat Customer Content Services Legal Notice Copyright © 2020 Red Hat, Inc. The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/ . In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version. Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law. Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries. Linux ® is the registered trademark of Linus Torvalds in the United States and other countries. Java ® is a registered trademark of Oracle and/or its affiliates. XFS ® is a trademark of Silicon Graphics International Corp.
  • PGI Compilers
    USER'S GUIDE FOR X86-64 CPUS Version 2019. Table of Contents: Preface (Audience Description; Compatibility and Conformance to Standards; Organization; Hardware and Software Constraints; Conventions; Terms; Related Publications). Chapter 1. Getting Started: 1.1. Overview; 1.2. Creating an Example; 1.3. Invoking the Command-level PGI Compilers; 1.3.1. Command-line Syntax; 1.3.2. Command-line Options
  • Intel Hardware Intrinsics in .NET Core
    Han Lee, Intel Corporation [email protected] Notices and Disclaimers No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. No product or component can be absolutely secure. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm. Intel, the Intel logo, and other Intel product and solution names in this presentation are trademarks of Intel. *Other names and brands may be claimed as the property of others. © Intel Corporation. What Do These Have in Common? Domain: Example. Image processing: Color extraction; High performance computing (HPC): Matrix multiplication; Data processing: Hamming code; Text processing: UTF-8 conversion; Data structures: Bit array; Machine learning: Classification. For performance sensitive code, consider using Intel® hardware intrinsics. Objectives
  • The Complete Guide to Return X;
    The Complete Guide to return x; I also do C++ training! [email protected] Arthur O’Dwyer 2021-05-04
    Outline
    ● The “return slot”; NRVO; C++17 “deferred materialization” [4–23]
    ● C++11 implicit move [24–29]. Question break.
    ● Problems in C++11; solutions in C++20 [30–46]. Question break.
    ● The reference_wrapper saga; pretty tables of vendor divergence [47–55]
    ● Quick sidebar on coroutines and related topics [56–65]. Question break.
    ● P2266 proposed for C++23 [66–79]. Questions!
    Hey look! Slide numbers!
    x86-64 calling convention
        int f() {
            int i = 42;
            return i;
        }
        int test() {
            int j = f();
            return j + 1;
        }
        _Z1fv:
            movl $42, -4(%rsp)
            movl -4(%rsp), %eax
            retq
        _Z4testv:
            callq _Z1fv
            addl $1, %eax
            retq
    On x86-64, the function’s return value usually goes into the %eax register.
    x86-64 calling convention
        int f() {
            int i = 42;
            printf("%p\n", &i);  // prints “0x9ff00020”
            return i;
        }
        int test() {
            int j = f();
            printf("%p\n", &j);  // prints “0x9ff00040”
            return j + 1;
        }
    [Stack segment diagram: frames for f (holding i), test (holding j), and main]
    Since f and test each have their own stack frame, i and j naturally are different variables. j is initialized with a copy of i; C++ loves copy semantics.
    x86-64 calling convention
        struct S { int m; };
        S f() {
            S i = S{42};
            printf("%p\n", &i);  // prints “0x9ff00020”
            return i;
        }
        S test() {
            S j = f();
            printf("%p\n", &j);  // prints “0x9ff00040”
            return j;
        }
    [Stack segment diagram: frames for f (holding i), test (holding j), and main]
    Even for class types, C++ does “return by copy.” The return value is still passed in a machine register when possible.
    x86-64 calling convention
    But what about when S is too big to fit in a register? Then x86-64 says that the caller should pass an extra parameter, pointing to space in the caller’s own stack frame big enough to hold the result.
    [Stack segment diagram: frame for f (holding i), and the return slot inside the frame for test]
        struct S { int m[3]; };
        S f() {
            S i = S{{1,3,5}};
            printf("%p\n", &i);
            return i;
        }
        S test() {
            S j = f();
  • Reference Guide for x86-64 CPUs
    REFERENCE GUIDE FOR X86-64 CPUS Version 2019. Table of Contents: Preface (Audience Description; Compatibility and Conformance to Standards; Organization; Hardware and Software Constraints; Conventions; Terms; Related Publications). Chapter 1. Fortran, C, and C++ Data Types: 1.1. Fortran Data Types; 1.1.1. Fortran Scalars; 1.1.2. FORTRAN real(2); 1.1.3. FORTRAN 77 Aggregate Data Type Extensions; 1.1.4. Fortran 90 Aggregate Data Types (Derived
  • Demystifying Value Categories in C++ iCSC 2020
    Demystifying Value Categories in C++ iCSC 2020 Nis Meinert Rostock University
    Disclaimer
    → This talk is mainly about hounding (unnecessary) copy ctors
    → In case you don’t care: “If you’re not at all interested in performance, shouldn’t you be in the Python room down the hall?” (Scott Meyers)
    Nis Meinert – Rostock University Demystifying Value Categories in C++ 2 / 100
    Table of Contents
    PART I: → Understanding References → Value Categories → Perfect Forwarding → Reading Assembly for Fun and Profit → Implicit Costs of const&
    PART II: → Dangling References → std::move in the wild → What Happens on return? → RVO in Depth → Perfect Backwarding
    PART I: Understanding References
    Q: What is the output of the programs?
        #!/usr/bin/env python3
        class S:
            def __init__(self, x):
                self.x = x
        def swap(a, b):
            b, a = a, b
        if __name__ == '__main__':
            a, b = S(1), S(2)
            swap(a, b)
            print(f'{a.x}{b.x}')
        #include <iostream>
        struct S {
            int x;
        };
        void swap(S& a, S& b) {
            S& tmp = a;
            a = b;
            b = tmp;
        }
        int main() {
            S a{1}; S b{2};
            swap(a, b);
            std::cout << a.x << b.x;
        }
    godbolt.org/z/rE6Ecd
    A: 12 (Python program) A: 22 (C++ program)
  • Optimizing Subroutines in Assembly Language: An Optimization Guide for x86 Platforms
    2. Optimizing subroutines in assembly language: An optimization guide for x86 platforms. By Agner Fog. Copenhagen University College of Engineering. Copyright © 1996 - 2012. Last updated 2012-02-29. Contents: 1 Introduction; 1.1 Reasons for using assembly code; 1.2 Reasons for not using assembly code; 1.3 Microprocessors covered by this manual; 1.4 Operating systems covered by this manual; 2 Before you start; 2.1 Things to decide before you start programming; 2.2 Make a test strategy; 2.3 Common coding pitfalls; 3 The basics of assembly coding; 3.1 Assemblers available; 3.2 Register set
  • C++ Programmer's Guide
    C++ Programmer’s Guide Document Number 007–0704–130 St. Peter’s Basilica image courtesy of ENEL SpA and InfoByte SpA. Disk Thrower image courtesy of Xavier Berenguer, Animatica. Copyright © 1995, 1999 Silicon Graphics, Inc. All Rights Reserved. This document or parts thereof may not be reproduced in any form unless permitted by contract or by written permission of Silicon Graphics, Inc. LIMITED AND RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure by the Government is subject to restrictions as set forth in the Rights in Data clause at FAR 52.227-14 and/or in similar or successor clauses in the FAR, or in the DOD, DOE or NASA FAR Supplements. Unpublished rights reserved under the Copyright Laws of the United States. Contractor/manufacturer is Silicon Graphics, Inc., 1600 Amphitheatre Pkwy., Mountain View, CA 94043-1351. Autotasking, CF77, CRAY, Cray Ada, CraySoft, CRAY Y-MP, CRAY-1, CRInform, CRI/TurboKiva, HSX, LibSci, MPP Apprentice, SSD, SUPERCLUSTER, UNICOS, X-MP EA, and UNICOS/mk are federally registered trademarks and Because no workstation is an island, CCI, CCMT, CF90, CFT, CFT2, CFT77, ConCurrent Maintenance Tools, COS, Cray Animation Theater, CRAY APP, CRAY C90, CRAY C90D, Cray C++ Compiling System, CrayDoc, CRAY EL, CRAY J90, CRAY J90se, CrayLink, Cray NQS, Cray/REELlibrarian, CRAY S-MP, CRAY SSD-T90, CRAY SV1, CRAY T90, CRAY T3D, CRAY T3E, CrayTutor, CRAY X-MP, CRAY XMS, CRAY-2, CSIM, CVT, Delivering the power . ., DGauss, Docview, EMDS, GigaRing, HEXAR, IOS, ND Series Network Disk Array, Network Queuing Environment, Network Queuing Tools, OLNET, RQS, SEGLDR, SMARTE, SUPERLINK, System Maintenance and Remote Testing Environment, Trusted UNICOS, and UNICOS MAX are trademarks of Cray Research, Inc., a wholly owned subsidiary of Silicon Graphics, Inc.
  • Automatic SIMD Vectorization of Fast Fourier Transforms for the Larrabee and AVX Instruction Sets
    Automatic SIMD Vectorization of Fast Fourier Transforms for the Larrabee and AVX Instruction Sets. Daniel S. McFarlin, Volodymyr Arbatov, Franz Franchetti (Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA USA 15213); Markus Püschel (Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland).
    ABSTRACT: The well-known shift to parallelism in CPUs is often associated with multicores. However another trend is equally salient: the increasing parallelism in per-core single-instruction multiple-data (SIMD) vector units. Intel’s SSE and IBM’s VMX (compatible to AltiVec) both offer 4-way (single precision) floating point, but the recent Intel instruction sets AVX and Larrabee (LRB) offer 8-way and 16-way, respectively. Compilation and optimization for vector extensions is hard, and often the achievable speed-up by using vectorizing compilers is small compared to hand-optimization using intrinsic function interfaces. Unfortunately, the complexity of these intrinsics interfaces increases considerably with the vector length, making hand-optimization a nightmare. In this paper, we present a peephole-based vectorization system that takes as input the vector instruction semantics and outputs a library of basic data reorgani-
    General Terms: Performance
    Keywords: Autovectorization, super-optimization, SIMD, program generation, Fourier transform
    1. Introduction: Power and area constraints are increasingly dictating microarchitectural developments in the commodity and high-performance (HPC) CPU space. Consequently, the once dominant approach of dynamically extracting instruction-level parallelism (ILP) through monolithic out-of-order microarchitectures is being supplanted by designs with simpler, replicable architectural features.