MB3 D7.17 - Final report on Arm-optimized Fortran compiler and mathematics libraries Version 1.0

Document Information

Contract Number: 671697
Project Website: www.montblanc-project.eu
Contractual Deadline: PM39
Dissemination Level: PU
Nature: Report
Authors: Chris Goodyer (Arm), Paul Osmialowski (Arm) and Francesco Petrogalli (Arm)
Contributors: Chris Goodyer (Arm), Paul Osmialowski (Arm) and Francesco Petrogalli (Arm)
Reviewers:
Keywords: HPC, Fortran, OpenMP, Performance Libraries

Notices: This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671697.

Mont-Blanc 3 Consortium Partners. All rights reserved.

Change Log

Version Description of Change

v0.1 Initial version of the deliverable

v1.0 Final version

Contents

Executive Summary

1 Arm-optimized Fortran compiler
  1.1 The importance of the Fortran programming language in the modern HPC world
  1.2 The significance of open source Fortran compilers
  1.3 LLVM as an innovative open source infrastructure for developing modern compilers
    1.3.1 LLVM Intermediate Representation
  1.4 PGI Flang project
  1.5 How Flang fits into LLVM
    1.5.1 Fortran 90 Modules
  1.6 The Flang source code
  1.7 libpgmath - the mathematical intrinsics library
  1.8 Licensing
  1.9 Compatibility and performance
    1.9.1 Compatibility issues
  1.10 Vectorization and SVE
  1.11 Flang as a part of Arm Compiler for HPC suite
    1.11.1 Public contribution
  1.12 The future of Flang: F18 project

2 OpenMP
  2.1 Public contribution

3 Current status of maths libraries on AArch64
  3.1 libm functionality
  3.2 Vector math routines
    3.2.1 Limitations
  3.3 Vendor BLAS, LAPACK and FFT libraries
    3.3.1 Performance
  3.4 Additional HPC libraries

A The ofc-test report

B Livermore benchmark report
  B.1 Flang results
  B.2 GFortran results

C Examples of vectorized and non-vectorized LLVM IR code

Acronyms and Abbreviations

Executive Summary

This final report summarises the current state of the art for the following topics:

• The current status and future development of a new Arm-optimized Fortran compiler.

• The current status of the various categories of mathematical libraries in the Armv8 architecture software ecosystem.

Overall it is shown that the range of packages available to users looks very healthy for the deployment of real High Performance Computing (HPC) applications.

The report emphasizes the importance of the Fortran programming language for HPC. It also explains how open source compilers ease the adaptation of the Fortran code base to new hardware and operating systems. The goal of the 'PGI Flang' project (and its successor, the 'F18' project) is to provide an LLVM-compliant open source Fortran compiler. Thanks to a Contributor License Agreement (CLA) between PGI and Arm, we were able to participate actively in the development of the Flang compiler and its successor. Since it was made public, the Flang project has gained a lot of attention and has been tested for conformance with the Fortran standards and for the performance of the generated code. This document summarises our effort in ensuring good standards conformance and compatibility with the AArch64 architecture, including suitability for SVE vectorization.

The second half of the report focuses on the latest developments in the support for the different numerical libraries available on Arm systems. This is split into four sections. In the first we focus on the optimizations that have been upstreamed this year for higher-performing versions of various transcendental functions, and on the improvements these bring to real HPC applications. Second, we discuss the ability to vectorize loops that call vector versions of mathematical functions. This is followed by an update on the Arm Performance Libraries development that has taken place, with a particular focus on FFT performance. Finally, the ongoing work with community codes, especially as provided through OpenHPC, is outlined.

1 Arm-optimized Fortran compiler

1.1 The importance of the Fortran programming language in the modern HPC world

Through the decades of its existence, the Fortran programming language has proven to be an ideal language for numerical and scientific computing [Bra03][Loh10]. The key reasons Fortran is still so prevalent today are:

• A huge number of applications created over decades of investment in scientific software written in Fortran.

• Wide acceptance in scientific subject areas resulting in a large number of available experts.

• Good expressiveness for describing numerical algorithms.

• High efficiency of compiled code.

• Portability across many platforms with little need for conditional compilation.

These features guarantee continued development and further extension of the Fortran code base.

In this report we summarise the current effort to provide a new open source Arm-optimized Fortran compiler with the following features:

• Capability to build optimized code for the AArch64 architecture, including the ability to utilize SIMD units and the SVE extension.

• Compatibility with Fortran 95/2003 as well as with FORTRAN 77; partial compatibility with Fortran 2008.

• Utilization of the LLVM compiler suite’s infrastructure for advanced optimizations and binary code generation (including possible future integration with the LLVM ecosystem).

• Reasonable command line compatibility with GFortran from the GCC compiler suite.

1.2 The significance of open source Fortran compilers

Commercial Fortran compilers tend to be relatively expensive, causing software developers to look for free open source alternatives. The most widely used open source Fortran compiler today is GFortran, which is covered by the GNU Public Licence; this prevents basing a proprietary closed-source product upon it. These factors slow the adaptation of the Fortran code base to new hardware or platforms and inhibit experimentation with new optimization techniques. By contrast, open source compilers with permissive licensing are amenable to modification and to fast prototyping of new ideas. In the rest of this section we focus our discussion on LLVM, the main open source alternative to the GCC compiler suite.

Figure 1: Three-phase design of LLVM compilers. (Diagram: frontends performing lexical analysis for C, C++ and Fortran emit LLVM IR; the middle-end is the common optimizer; backends generate binary code for AArch64, POWER and MIPS64.)

1.3 LLVM as an innovative open source infrastructure for de- veloping modern compilers

LLVM is a relatively young compiler infrastructure, originally started at the University of Illinois in 2000 as a research project of Chris Lattner. Despite being an open source project, it gained the attention of commercial enterprises, which has contributed to its rapid growth and wide recognition. This is mostly due to its permissive BSD/MIT-style license, which does not impose limitations on including open source projects as part of larger proprietary products.

LLVM offers a modern design, with a modular architecture, reusable libraries and well-defined interfaces [LA04]. It implements a three-phase design that decouples parsing, optimization and binary code emission into three independent stages, as shown in Figure 1. This means that aggressive loop and interprocedural optimizations are possible regardless of the language in which the program was written, and independent of the target architecture on which the compiled program will be executed.

1.3.1 LLVM Intermediate Representation

A typical LLVM-compliant compiler frontend is responsible for parsing, validating and diagnosing errors in the input code written in a given programming language. It then translates the parsed code into LLVM IR (Intermediate Representation), typically by building an Abstract Syntax Tree (AST) and then converting the AST to LLVM IR. The IR syntax and structure are independent of the input programming language, and the IR is the only interface between the compiler frontend and the optimizer.

Since LLVM IR has both binary and human-readable textual representations, one can observe how the input code is presented to the optimizer and what transformations were performed on the IR. This is because the optimization passes that transform the code take LLVM IR as input and produce (optimized) LLVM IR as output. Such an approach, combined with the optional ability to obtain the IR before and after each optimization pass, provides good insight into both the effect and the correctness of the optimization steps.

1.4 PGI Flang project

In November 2015 the U.S. Department of Energy's National Nuclear Security Administration (NNSA) and its three national laboratories announced an agreement with NVIDIA's Portland Group, Inc. (PGI), for the creation of an open source Fortran compiler integrated with the LLVM compiler infrastructure1. This enterprise is a part of the joint 'Collaboration of Oak Ridge, Argonne and Lawrence Livermore laboratories' (CORAL).

In May 2017, PGI released Flang2 3, an open source Fortran frontend for LLVM, along with a complementary runtime library. This frontend is capable of creating LLVM IR and is derived from the proprietary PGI Fortran compiler, which has been used in HPC environments for more than 25 years.
PGI is a leading supplier of software compilers and tools for parallel computing; their compiler is known for its excellent level of support for the Fortran 2003 standard4. The GitHub flang-compiler account5 holds all of the source code and documentation and is open for external contributions; it is also used for bug tracking, which proved to be a good communication channel with the developers. To improve communication even further, the flang-compiler channel was opened on the Slack on-line service6. This is an invitation-only service, but anyone can join7.

Thanks to an agreement between PGI and Arm, we were able to test several binary pre-releases preceding the official release of the sources. Later on, Arm was invited to join a group of early source code reviewers of both the compiler and the PGI Fortran runtime library. Our main concern during these reviews was to ensure compatibility with the AArch64 architecture and that the LLVM IR code produced is of high enough quality, including suitability for vectorization.

1.5 How Flang fits into LLVM

Because its source code is based on PGI's commercial Fortran compiler, Flang inherited most of its internal concepts: a two-pass parser, a semantic analyzer, two levels of internal representation (ILM, ILI) and an internal optimizer. Because it is part of the LLVM ecosystem and uses its backend, Flang can target multiple hardware architectures. Support is present for the AArch64, 64-bit POWER and x86_64 architectures, but it should be fairly easy to add any other 64-bit hardware architecture to this list.

The Flang project is split into two major parts. The first part is the frontend driver, which comprises a series of patches applied on top of a fork of Clang's git repository on GitHub8. This part is responsible for parsing Flang- and Fortran-specific command line options and for establishing the whole pipeline. The second part is the frontend itself; its sources are held in a separate git repository on GitHub9. When compiled, it produces two executables: flang1 and flang2. The first reads Fortran code and produces ILM (Flang's first Intermediate Representation); the second reads ILM and internally produces ILI (Flang's second Intermediate Representation, which also undergoes optimization steps), from which the LLVM IR is finally generated.

1 https://www.llnl.gov/news/nnsa-national-labs-team-nvidia-develop-open-source-fortran-compiler-technology
2 http://lists.llvm.org/pipermail/llvm-dev/2017-May/113131.html
3 https://www.phoronix.com/scan.php?page=news_item&px=LLVM-NVIDIA-Fortran-Flang
4 http://fortranwiki.org/fortran/show/Fortran+2003+status
5 https://github.com/flang-compiler
6 https://flang-compiler.slack.com/messages
7 https://join.slack.com/t/flang-compiler/shared_invite/MjExOTEyMzQ3MjIxLTE0OTk4NzQyNzUtODQzZWEyMjkwYw
8 https://github.com/flang-compiler/clang
9 https://github.com/flang-compiler/flang

Figure 2: The Flang pipeline. (Diagram: the Clang driver (clang, opt, llc) parses arguments; flang1 translates Fortran to ILM; flang2 translates ILM to ILI, optimizes the ILI and emits LLVM IR; LLVM optimizes the LLVM IR and generates an AArch64 object file; the system linker (ld) turns object files into an ELF executable.)

Figure 2 presents a broad view of the Fortran program compilation process. A more detailed list of the functional blocks follows:

• Flang1 phases:
  – scanner: turns Fortran code into tokens.
  – parser: turns tokens into an Abstract Syntax Tree (AST) and a symbol table.
  – transformer: turns the AST into a canonical AST.
  – output: turns the canonical AST into ILM.

• Flang2 phases:
  – expander: turns ILM into ILI.
  – optimizer: turns ILI into optimized ILI.
  – the bridge: turns optimized ILI into LLVM IR.

The frontend driver starts the flang1 executable, which reads the Fortran program from an input file. flang1's scanner turns it into tokens. The parser turns these into an Abstract Syntax Tree (AST) and a symbol table (symtab). The AST is then transformed into canonical form, which is used to generate a temporary text file containing ILM code, the first Intermediate Representation used by Flang.

The ILM code is read from the temporary text file by the flang2 executable. During the import stage its text content is fed into data structures held in process memory. This form is turned into ILI, the second Intermediate Representation used by Flang, which is then optimized by the internal optimizer. The optimizations performed on the ILI are: constant folding, evaluation of identities, evaluation of comparisons with zero, and branch optimizations (changing a compare instruction followed by a branch instruction into a single instruction that combines both). The optimized ILI is then used during the LLVM bridge phase, where the final output text file with LLVM IR is generated. The LLVM IR generation is performed solely via explicit operations on text strings; no LLVM API is used for this process.

The LLVM IR output by Flang is further processed by LLVM optimization and code generation passes. The optimizations at that stage are typical of LLVM's middle-end. These include dead code elimination, loop unrolling and auto-vectorization.

1.5.1 Fortran 90 Modules

For each module defined in a Fortran 90 source file, a .mod file is created along with the binary .o file produced for that source file. If a source file defines more than one module, a single .o file is created, accompanied by several .mod files. The role of Fortran 90 modules is similar to that of C/C++ header files (except that no preprocessor is involved and they are imported by Fortran 90 programs with a use statement), and they are typically installed into include directories along with C/C++ header files (while the compiled library binaries containing the actual executable code are held in lib directories). The .mod files are text files containing exported symbol definitions produced by flang1 (and not processed any further by flang2). Due to differences in the file format, modules compiled with one compiler (e.g. Flang) cannot be used by code compiled with another compiler (e.g. GFortran).

1.6 The Flang source code

The Flang compiler was initially written in ANSI C98, although some C++ files have started to appear during the later stages of development. There are also Fortran routines (e.g. ftn_transpose_real, ftn_transpose_cmplx, vmmul_real, vmmul_cmplx) in the runtime library. Various definition files are provided in nroff format (.n files), which are transformed into C headers (.h files). The documentation source files are also provided in nroff format (.n files); various command-line utilities (e.g. groff or a2ps) can be used to generate documentation pages from these source files.

As declared in point 1.3 of the tools/flang2/docs/coding.n document, the established coding convention is as follows:

• ANSI C98 with some K&R code that will eventually be removed.

• The use of the static keyword for limiting the visibility of file globals and file functions.

• Naming convention for files: files that define external symbols have a df suffix, and headers that declare external symbols also have a df suffix, e.g. ilidf.c, flgdf.h.

• Naming convention for code: variable names are lower case; macros and typedefs are typically upper case; enum names begin with a capital letter and are then lower case (no explicit rule is stated for function names, though!).

• The first non-commented line of each .c file is #include "gbldefs.h" (note that there are three gbldefs.h files in the directory tree!).

• All dynamic storage allocation and freeing is done through the macros NEW, NEED and FREE defined in gbldefs.h.

The listed items seem reasonable; however, they do not prevent surprising constructs from appearing in the Flang code. One example is the re-defined assert macro, which not only reuses the usual assert macro name but also changes its interface by introducing additional parameters.

1.7 libpgmath - the mathematical intrinsics library

An important part of the Flang project is the Fortran runtime library. It implements all of the Fortran instructions that could not be expressed directly in the Intermediate Representation. Of all of those instructions, the mathematical functions (also called Fortran intrinsics) gained special attention. A new library called libpgmath was developed, containing all the mathematical routines (implementing the mathematical Fortran functions) extracted from Flang's Fortran runtime library.

Although still part of the Flang project, this library is built separately, before Flang itself (the Flang build process now has a defined dependency on libpgmath). Initially an x86_64-optimized implementation was developed, with a generic implementation added soon after. The generic implementation acts as a wrapper around an underlying mathematical library (usually libm); note that it does not include specialized instructions or architecture-specific extensions of any kind. In July 2018 Cavium contributed an AArch64-optimized implementation, which provides scalar and vectorized versions of the mathematical functions. A patch for LLVM was provided by PGI that enables LLVM's Loop Vectorizer to proceed with loops containing calls to those mathematical functions.

1.8 Licensing

Flang and its runtime library are distributed under the Apache License version 2.010. This is a permissive open source licensing scheme which allows use of this software for developing both commercial and non-commercial projects. The LLVM project is distributed under the University of Illinois Open Source License11, another permissive licensing scheme. There are plans12 to relicense LLVM and distribute it under the Apache License version 2.0.

1.9 Compatibility and performance

Since it was made public, Flang has gained a lot of attention and has been tested for conformance with the Fortran standards and for the performance of the generated code. Figure 3 depicts how the choice of Fortran compiler (between GFortran version 8.2.0 and Flang) affects the performance of compiled workloads.

Appendix B contains the output messages from running the Livermore benchmark compiled by GFortran and by Flang. This benchmark was specifically designed for testing the floating point computation rate (the number of floating point operations per second). Note that Flang performs most of its optimizations within LLVM passes, and this arrangement (a Fortran frontend combined with the LLVM middle-end and backend) is still very new, with room for further improvement.

Appendix A includes a test report from the Open Fortran Compiler test suite13 developed by Codethink. Although the test suite was originally designed for the purpose of testing their own Fortran compiler, we reworked it for the purpose of testing Flang. The suite contains mostly FORTRAN 77 test cases imported from the NIST F77 test suite14, as well as Fortran 90 cases added by Codethink. Test cases were compiled by both Flang and GFortran, then both executables were run and their output compared (the Behaviour column). Some of the differences denoted as failures are caused by slight differences in output formatting between code compiled by the two compilers.

1.9.1 Compatibility issues

Being a new open source compiler, Flang is perceived as an alternative to GFortran, which can lead to the expectation that all software that can be built by GFortran can also be built by Flang, possibly with the same command line options. Unfortunately, doing this can

10 http://www.apache.org/licenses/LICENSE-2.0
11 https://otm.illinois.edu/disclose-protect/illinois-open-source-license
12 http://lists.llvm.org/pipermail/llvm-dev/2018-October/126991.html
13 https://github.com/CodethinkLabs/ofc-tests
14 http://www.fortran-2000.com/ArnaudRecipes/fcvs21_f95.html

(a) Polyhedron benchmarks (b) PolyBench/Fortran

Figure 3: Execution time differences between workloads compiled by Flang and workloads compiled by GFortran. Workloads are (a) the Polyhedron suite of benchmarks (downloaded from http://www.fortran.uk), and (b) the PolyBench/Fortran suite of benchmarks. All tests were executed on a single core of an Armv8 machine equipped with Cortex-A57 cores. Both compilers were invoked with the flags -mcpu=native -O3 -ffp-contract=fast -funroll-loops.


benchmark        gfortran (sec)   flang (sec)
rnflow               96.016          54.668
gas-dyn2            376.470         358.551
ac                   43.041          44.622
aermod               49.455          49.078
air                  16.037          15.606
capacita             50.955          38.262
channel2            204.390         227.479
doduc                61.272          64.467
fatigue2            257.309         403.286
induct2             398.882         388.594
linpk                14.138          13.837
mdbx                 21.530          24.327
mp-prop-design      932.540         537.245
nf                   21.320          23.871
protein              48.605          58.167
test-fpu2           191.847         194.076
tfft2               149.885         152.924

Table 1: Execution times of workloads compiled by Flang and GFortran (raw results in seconds). Workloads are from the Polyhedron suite of benchmarks (downloaded from http://www.fortran.uk), executed on a single core of an Armv8 machine equipped with Cortex-A57 cores. Both compilers were invoked with the flags -mcpu=native -O3 -ffp-contract=fast -funroll-loops.


benchmark        gfortran (sec)   flang (sec)
correlation           3.977           4.005
covariance            3.924           3.939
2mm                  24.205          48.586
3mm                  36.091          71.303
atax                  0.072           0.064
bicg                  0.053           0.045
cholesky              0.544           0.445
doitgen               0.713           1.400
gemm                 16.960          33.149
gemver                0.233           0.367
gesummv               0.062           0.060
mvt                   0.152           0.233
symm                 19.078          15.721
syr2k                 6.717           5.642
syrk                  3.214           2.613
trisolv               0.026           0.022
trmm                  1.638           1.347
durbin                0.587           0.576
dynprog               0.525           0.642
gramschmidt           2.633           2.546
lu                    0.499           0.453
ludcmp                2.064           2.063
floyd-warshall        2.110           2.647
reg-detect            0.018           0.015
adi                   5.801           5.753
fdtd-2d               0.697           0.706
fdtd-ampl             1.828           1.824
jacobi-1d-imper       0.002           0.002
jacobi-2d-imper       0.110           0.129
seidel-2d             0.725           0.783

Table 2: Execution times of workloads compiled by Flang and GFortran (raw results in seconds). Workloads are from the PolyBench/Fortran suite of benchmarks, executed on a single core of an Armv8 machine equipped with Cortex-A57 cores. Both compilers were invoked with the flags -mcpu=native -O3 -ffp-contract=fast -funroll-loops.

result in compilation errors, unexpected runtime behaviour or even performance loss. Some of the options that are typical of GFortran (e.g. -fimplicit-none or -fno-second-underscore) are ignored by Flang; others have different default values. This situation changes dynamically, as the Flang developers are still working on fulfilling community expectations.

A significant source of incompatibilities comes from the open source build configuration systems, GNU Autotools and CMake. The configure script shipped with Autotools-based projects cannot recognize Flang as a distinct Fortran compiler: it is wrongly recognized as PGI's commercial compiler, resulting in the use of invalid link-time flags. To work around this, the libtool script generated by configure needs to be modified before use (see Listing 1 for the sed shell commands that can be used to modify the generated libtool file).

sed -i -e 's#wl=""#wl="-Wl,"#g' libtool
sed -i -e 's#pic_flag=""#pic_flag="-fPIC -DPIC"#g' libtool

Listing 1: Shell commands needed to modify the generated libtool script.

With existing software that contains the configure script, not much can be done (except for instructing users to run the sed commands between the configure and make steps). This problem may be solved in future releases.

There used to be a similar problem with CMake, which also recognized Flang as PGI's compiler and was putting incorrect flags for debug and release-with-debug-info builds into the generated Makefiles (the PGI-specific -gopt instead of just -g). This issue has been fixed in the more recent versions of CMake.

1.10 Vectorization and SVE

One of the most important features of the Flang compiler for HPC-focused applications is the ability to generate Vector Length Agnostic (VLA) code [Pet16], which allows the utilization of the SVE extension of the Armv8 architecture. Vectorization is wholly handled by the LLVM optimization passes in the middle-end. Since Flang does not support explicit vectorization pragmas, generation of binary code utilizing SIMD units or the SVE extension relies on auto-vectorization. In this process, loops are detected and the legality of their possible vectorization is proven. Loops that are proven safe to vectorize have their IR transformed by extending it with additional basic blocks containing instructions that operate on vector data types. These instructions are translated in the backend into binary instructions specific to the given SIMD unit.

Listing 2 presents an example Fortran subroutine containing a do loop that can easily be auto-vectorized. Listing 8 in Appendix C presents the non-vectorized LLVM IR code for that subroutine, consisting of the two characteristic basic blocks of this loop in its canonical form. It also contains a preheader (labeled L.LB1_317.preheader) which ensures that there is only one entry to the loop body (labeled L.LB1_317).

Listing 9 (Appendix C) presents the LLVM IR code vectorized for SIMD units. Compared to the non-vectorized IR, it contains additional basic blocks (labeled min.iters.checked, vector.memcheck, vector.body and middle.block) that prepare for the execution of a number of loop iterations that is a multiple of the vector length. For the remaining iterations, the original loop basic blocks (labeled L.LB1_317.preheader and L.LB1_317) are still used. The main basic block of the vectorized loop (vector.body) contains instructions that operate on vector types. These types are defined with the syntax <n x type>, where n is the number of elements, of the given type, in a vector.
Listing 10 (Appendix C) presents the LLVM IR code vectorized for the SVE extension. Compared to the IR vectorized for SIMD units, it contains only two additional (relative to the non-vectorized

IR) basic blocks, labeled vector.ph and vector.body. Both of these basic blocks contain instructions that operate on scalable vector types. These types are defined with the syntax15 <[n x] m x type>, where n is a symbol indicating that the vector is scalable. For fixed-length vectors the 'n x' part is omitted, which guarantees backward compatibility with the old syntax. The value m denotes the minimum number of elements (of the given type), and the 'n x' indicates that the total number of elements is an unknown multiple of m.

15 http://lists.llvm.org/pipermail/llvm-dev/2017-June/113587.html

subroutine sub1(length, a, b, c, d, e)

  implicit none

  integer :: i
  integer :: length
  real :: x
  real :: a(length)
  real :: b(length)
  real :: c(length)
  real :: d(length)
  real :: e(length)

  do i = 1, length
    x = b(i)**2 - 4.0 * a(i) * c(i)
    d(i) = sqrt(x)
    e(i) = (-d(i) - b(i)) * 0.5 / a(i)
    d(i) = (d(i) - b(i)) * 0.5 / a(i)
  end do
end subroutine sub1

Listing 2: Example Fortran subroutine with a loop that is trivial to auto-vectorize.

Listing 3 presents an example Fortran subroutine containing a do loop that cannot be auto-vectorized for SIMD units but can be auto-vectorized for SVE, as shown in Listing 11 (Appendix C). This subroutine was derived from the previous one (see Listing 2); the only difference is the addition of the if condition. This version leaves the LLVM loop vectorization pass unable to prove the legality of the vectorization:

LV: Checking a loop in "sub2" from /tmp/1a1299.ll
LV: Loop hints: force=? width=0 unroll=0
LV: Found a loop: L.LB1_317
LV: Can't if-convert the loop.
LV: Not vectorizing: Cannot prove legality.

In order to make vectorization of such a loop possible, the LLVM auto-vectorization pass tries to flatten the if statement into a single stream of instructions. This requires that all of the basic blocks of the conditional statement can be predicated. Unfortunately, load/store operations on arrays prevent that. In the case of SVE, masked load/store operations can be utilized (see Listing 4), enabling successful vectorization.

subroutine sub2(length, a, b, c, d, e)

  implicit none

  integer :: i
  integer :: length
  real :: x
  real :: a(length)
  real :: b(length)
  real :: c(length)
  real :: d(length)
  real :: e(length)

  do i = 1, length
    x = b(i)**2 - 4.0 * a(i) * c(i)
    if (x .ge. 0.0) then
      d(i) = sqrt(x)
      e(i) = (-d(i) - b(i)) * 0.5 / a(i)
      d(i) = (d(i) - b(i)) * 0.5 / a(i)
    end if
  end do
end subroutine sub2

Listing 3: Example Fortran subroutine with a loop that can only be auto-vectorized for SVE.

%wide.masked.load43 = call <n x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<n x 4 x float>* %39, i32 4, <n x 4 x i1> %predicate, <n x 4 x float> undef), !alias.scope !5
%40 = fmul <n x 4 x float> %wide.masked.load43, shufflevector (<n x 4 x float> insertelement (<n x 4 x float> undef, float 4.000000e+00, i32 0), <n x 4 x float> undef, <n x 4 x i32> zeroinitializer)
%41 = fmul <n x 4 x float> %wide.masked.load39, %40
%42 = fsub <n x 4 x float> %35, %41
%43 = call <n x 4 x float> @llvm.sqrt.nxv4f32(<n x 4 x float> %42)
%44 = getelementptr float, float* %15, i64 %offset.idx
%45 = bitcast float* %44 to <n x 4 x float>*
call void @llvm.masked.store.nxv4f32.p0nxv4f32(<n x 4 x float> %43, <n x 4 x float>* %45, i32 4, <n x 4 x i1> %predicate), !alias.scope !7, !noalias !9

Listing 4: Example of using masked load/store SVE intrinsics in LLVM IR.

1.11 Flang as a part of the Arm Compiler for HPC suite

The Flang project gained huge attention within Arm, which resulted in both contributions and bug reports. The main concern was to ensure valid execution of compiled Fortran code on AArch64 machines. This attention emerged from a more general strategy of promoting the 64-bit Arm architecture in HPC. The major manifestation of this strategy was the release of the Arm Compiler for HPC suite, targeted at the major Linux distributions. Based on LLVM, it offers Clang as the C/C++ compiler, Flang as the Fortran compiler, and the Arm Performance Libraries.

A single license was donated to the OpenHPC project, which can now use the suite as yet another compiler family, next to GNU, Intel and generic LLVM, along with the Arm Performance Libraries, similarly to Intel's MKL. In effect, the Arm Compiler for HPC suite was successfully installed on the public OBS server16 used for building official OpenHPC releases. OBS users are allowed to use the Arm compiler family for building software packages for AArch64 within both public and private projects.

1.11.1 Public contribution

Arm's effort in improving Flang and making it suitable for Arm Compiler for HPC has already resulted in public contributions to open source and a number of published modifications. To have our modifications accepted, a Contributor License Agreement (CLA) was signed between Arm and PGI. The purpose of this agreement is to ensure that the Flang project has the necessary grants of rights over all contributions to allow distribution under the Apache 2.0 license. The full list of accepted changes (patches) is given below:

• https://github.com/flang-compiler/flang/pull/427 AArch64: fix for failing ieee17 test case: introduce correct values for Rounding Mode selection.
  – This patch fixes a bug causing the ieee17 test case to fail.
  – The valid values for the Rounding Mode control field for AArch64 (bits [23:22] of the FPCR register) are listed in the Processor Technical Reference Manual17.

• https://github.com/flang-compiler/flang/pull/460 Runtime: remove locks around malloc()/free() in mpmalloc.c for glibc-based systems.
  – Having locks around those functions in the Flang runtime library can ruin optimization effort when tcmalloc is preloaded to replace the standard malloc()/free() implementation with one optimized for reducing lock contention.

• https://github.com/flang-compiler/flang/pull/488 AArch64: implement Flush-To-Zero get/set.
  – This change adds a missing feature by filling the empty fenv fesetzerodenorm() and fenv fegetzerodenorm() functions with the required implementation.

• https://github.com/flang-compiler/flang/pull/511 Use zeroinitializer instead of hard to handle stream of zeros.
  – This change amends the way arrays are initialized with zeros in the generated LLVM IR.
  – Instead of emitting a stream of zeros, the zeroinitializer idiom is used.
  – This prevents hard-to-identify errors when the number of emitted zeros does not match the array dimensions.

Apart from the accepted ones, 16 of our changes are still waiting for the project maintainers' approval. These changes were applied in the Flang compiler shipped within the Arm Compiler for HPC suite.

16https://build.openhpc.community
17http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0488d/CIHCACFF.html

1.12 The future of Flang: F18 project

In July 2018 NVIDIA's Portland Group, Inc. (PGI) scheduled two webinars18 to announce and discuss their work on the new Fortran compiler frontend for LLVM. The new open source project, called F1819, has been developed publicly since its beginning. It is intended to replace the existing frontend as soon as it reaches a mature state. Even after that, support for the current Flang project is expected to continue. The new F18 compiler is written entirely in C++ (the latest C++17 dialect, with extensive use of modern design patterns) and should use the LLVM API for generating LLVM IR. Currently the F18 compiler is capable of generating Abstract Syntax Trees for compiled programs and performing source code rewrites. It features a recursive descent parser with localized backtracking20, which is typical of modern compilers. The compiler source code is accompanied by a rich set of unit tests, another manifestation of a modern approach to software development. It is expected that within two years the F18 project will result in a fully operational Fortran compiler of production quality.

18http://clang-developers.42468.n3.nabble.com/Webinar-A-New-Flang-Interactive-Code-Review-and-Design-Discussion-July-11-td4061028.html
19https://github.com/flang-compiler/f18
20https://github.com/flang-compiler/f18/blob/master/documentation/parsing.md

2 OpenMP

OpenMP is a programming language extension21 [WC00] designed for developing parallel applications on shared memory systems. It is available for the C, C++ and Fortran languages and enables the use of shared-memory parallelism by annotating the existing source code with directives describing how individual parts, often loops, should be parallelised.

In order to handle threading and synchronization, all programs utilizing OpenMP directives must be linked against an OpenMP runtime library, which is responsible for providing all necessary means of parallelism. Programs compiled by compilers from the GCC suite usually use the GOMP runtime library, whilst programs compiled by LLVM compilers can only use LLVM's OpenMP runtime library. Programs compiled using GCC compilers (e.g. GFortran) can also use LLVM's OpenMP runtime library as a drop-in replacement for GOMP. Contrary to this, programs compiled using LLVM compilers cannot use GOMP as a replacement for the LLVM OpenMP runtime. This is because both of these runtime libraries extend the official OpenMP runtime API with their own functions. The LLVM OpenMP runtime library offers compatibility with GOMP by implementing a compatibility layer that exports all GOMP functions. However, GOMP does not implement the extended functionality provided by the LLVM OpenMP runtime library. In order to fully utilize the runtime library's capabilities, the binary code generated by the LLVM compilers usually contains calls to functions from the extended API of the LLVM OpenMP runtime library, causing incompatibility with GOMP.

With the growing number of cores available on Armv8 chips, it is important for the OpenMP runtime to scale well with the number of utilized CPU cores.
In order to compare how the two runtimes (GOMP and the LLVM OpenMP runtime) behave when handling the same workload processed by a different number of threads (with each thread bound to a separate processor core via thread affinity), we performed a series of experiments on a 96-core Armv8.1 machine (two 48-core processors on a single board). Figure 4 presents results from running the LULESH benchmark (written in C++) with problem size 80. The experiment was repeated 96 times, each time adding one more thread, with all the threads pinned to separate CPU cores. The following variants were tested:

• gcc gomp – the benchmark was compiled with the system–provided GCC (g++) compiler (version 5.4.0) and the system–provided GOMP runtime was used at execution time.

• gcc libomp – the benchmark was compiled with the system–provided GCC (g++) compiler (version 5.4.0) but executed with the LLVM OpenMP runtime (built from the source code available on the git repository's master branch).

• gitgcc gitgomp – the benchmark was compiled with the GCC (g++) compiler built from the source code available on the git repository’s master branch and the GOMP runtime also built from the source code held on the same git repository.

• gitgcc libomp – the benchmark was compiled with the GCC (g++) compiler built from the source code available on the git repository’s master branch and executed with the LLVM OpenMP runtime (built from the source code available on the git repository’s master branch).

• llvm libomp – the benchmark was compiled with LLVM’s Clang (clang++) compiler built from the source code available on the git repository master branch and executed with the

21http://www.openmp.org

Figure 4: Performance of the LULESH benchmark, problem size 80: execution time improvement with growing number of computation cores involved.

LLVM OpenMP runtime (also built from the source code available on the git repository’s master branch).

We can observe that the LLVM OpenMP runtime offers consistently better performance with a growing number of threads, regardless of the compiler used for building the workload. When using the GOMP runtime we can notice that, beyond a certain number of cores, the performance increase not only stops but actually starts to deteriorate. The glitch at 48 threads is caused by crossing the CPU socket boundary; communication between the two sockets starts to take time at that point. Figure 5 presents results from running the same LULESH benchmark, this time with problem size 20, which is too small to fully benefit from multi-threaded execution. We can observe that GOMP starts to deteriorate much faster than the LLVM OpenMP runtime. Figure 6 presents results from another benchmark, called SNAP (version 1.04), which is written in Fortran 90. The experiment was repeated 96 times, each time one more thread was used, with all the threads pinned to separate CPU cores. The following variants were tested:

• gitgnu gitgomp – the benchmark was compiled with the GFortran compiler built from the source code available on the git repository’s master branch and the GOMP runtime also built from the source code held on the same git repository.

• gitgnu libomp – the benchmark was compiled with the GFortran compiler built from the source code available on the git repository’s master branch and executed with the LLVM OpenMP runtime (built from the source code available on the git repository’s master branch).

• llvm libomp – the benchmark was compiled with LLVM's Flang compiler from its git repository and executed with the LLVM OpenMP runtime (built from the source code

Figure 5: Performance of the LULESH benchmark, problem size 20: execution time improvement with growing number of computation cores involved.

available on the git repository’s master branch).

Similarly to LULESH, in the case of SNAP we can also observe that the LLVM OpenMP runtime offers consistently better performance with a growing number of threads, regardless of the compiler used for building the workload.
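Per-core pinning of the kind used in these experiments can be requested with the standard OpenMP 4.0 affinity environment variables (a sketch; the text does not specify the exact mechanism used for these runs, and the binary name is illustrative):

```shell
# One OpenMP thread per core, spread across both sockets
# (standard OpenMP 4.0 environment variables).
export OMP_NUM_THREADS=96
export OMP_PROC_BIND=spread
export OMP_PLACES=cores
# ./snap ...   (launch the OpenMP binary in this environment)
echo "$OMP_PROC_BIND / $OMP_PLACES"
```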

2.1 Public contribution

As part of the Mont-Blanc 3 work on runtime support, our major concern has been to ensure stable execution of programs utilizing the LLVM OpenMP runtime library on AArch64 hardware. This effort has already resulted in public contributions to open source, as we published our modifications to the LLVM OpenMP runtime project. The full list of changes (patches) is given below:

• https://reviews.llvm.org/D19319 ARM Limited license agreement from the copyright/patent holder.
  – This was a prerequisite for contributions by ARM.

• https://reviews.llvm.org/D19629 Clean up issues around KMP USE FUTEX and kmp lock.h.
  – In the preprocessor condition checking, some locking mechanisms had been accidentally disabled on ARMv8.

• https://reviews.llvm.org/D19628 New hwloc API compatibility.

Figure 6: Performance of the SNAP 1.04 benchmark: execution time improvement with growing number of computation cores involved.

– The latest development snapshots of hwloc (topology discovery helper) started to use a new API for its library; in order to keep up with these changes, some pieces of hwloc-dependent topology discovery code in the LLVM OpenMP runtime had to be upgraded to work with both old and new hwloc.

• https://reviews.llvm.org/D19878 Use C++11 atomics for the implementation of ticket locking.

– This is an important fix for the ticket locking mechanism which did not previously perform well on ARMv8. The most visible outcome of the problem was a race condition in work stealing routines. This posed a major flaw as it effectively made OpenMP 4.x tasking unusable on ARMv8 machines. – This patch replaces use of compiler builtin atomics with C++11 atomics for the ticket locking implementation.

• https://reviews.llvm.org/D19879 Solve “Too many args to microtask” problem.

– This patch solves the “Too many args to microtask” problem which occurred while executing the LULESH benchmark on ARMv8.
– The problem did not occur on the Intel architecture, which has a separate routine, written in x86 assembly language, to eagerly populate the stack with an arbitrary number of arguments for microtask execution. Our contribution adds a similar routine for ARMv8, written in AArch64 assembly language.

• https://reviews.llvm.org/D19880 Fine tuning of TC* macros.

– This patch allows easy replacement of the current non-operational implementation of the TC* macros with alternative implementations (e.g. written in assembly language).
– It also fixes usage of those macros, which turned out to be wrong in some places when an example operational implementation was tested.

• https://reviews.llvm.org/D22365 Make balanced affinity work on AArch64.

– This patch enables balanced affinity (contributed by Intel for better support of their architecture) on ARMv8 machines. Unlike x86_64, ARMv8 cores are not split into hardware threads. Also, certain ARMv8 SoCs (like the AMD Opteron™ A-Series) have cores clustered into packages that share the same L2 cache. The implementation contributed by Intel explicitly refused to work with this kind of core arrangement. It turned out that it is possible to generalize their implementation to work for any arrangement of cores with at least two levels of hierarchy.

• https://reviews.llvm.org/D30644 Add AArch64 support.

– This patch enables AArch64 support for libomptarget library responsible for offloading workloads to hardware targets (e.g. accelerators). – This piece of code allows offloading-to-self on AArch64 machines.

• https://reviews.llvm.org/D31071 GOMP compatibility: add missing OMP4.0 taskdeps handling code.

– This patch adds OpenMP 4.0 task dependencies handling code that is missing in LLVM libomp's GOMP compatibility layer.
– The absence of this code was responsible for the erratic behaviour of the taskdep benchmarks from the Kastors suite whenever they were compiled using GCC and linked against the LLVM OpenMP runtime instead of GOMP.

• https://reviews.llvm.org/D36510 OMP_PROC_BIND: better spread.

– This change improves the way threads are spread across cores when OMP_PROC_BIND=spread is set and no unusual affinity masks are in use.

• https://reviews.llvm.org/D41000 Fix cpuinfo issues.

– This patch fixes an issue with the older /proc/cpuinfo layout on AArch64 Linux systems.
– There are two /proc/cpuinfo layouts in use for AArch64: old and new; the old one has all ’processor : n’ lines in one section.

• https://reviews.llvm.org/D41482 Add required AArch64 specific code for running OMPT test cases.

• https://reviews.llvm.org/D41542 OMPT: Add missing initialization in the nested_lwt.c test case.

– Without the initialization provided by this patch, the OMPT test case nested_lwt.c tends to fail.

• https://reviews.llvm.org/D41854 OMPT: Fix a type mismatch in the omp_control_tool() implementation that made it run incorrectly on 32-bit Arm machines.

3 Current status of maths libraries on AArch64

In this report we summarise the current status of several categories of mathematical libraries, namely:

• libm functionality

• vendor maths libraries for BLAS and LAPACK

• additional libraries commonly used in the solution of HPC workloads

Each of these topics is explored in greater detail over the rest of this section. Clearly there is interaction between the performance of the higher level libraries and that of the underlying libraries on which they are built. For example, a high performing BLAS implementation is important for LAPACK to perform well. This, in turn, helps increase the performance of libraries such as PETSc [BAA+16a, BAA+16b, BGMS97] or Elemental [PMvdG+13]. Similarly, the performance of the compilers used is very important to any libraries not written in hand-coded assembly, such as just about all of the last category. For the work covered in this section the standard GCC open-source compiler has been used. Up-to-date porting results for both HPC libraries and applications are maintained for both GCC and the Arm Compiler. They can be found on the Arm HPC GitLab page, https://gitlab.com/arm-hpc/packages/wikis/home.

Note that none of the libraries discussed in this section have had any development or optimization work done, to date, using funding from the Mont-Blanc 3 project. This section is therefore a summary of the state of the art for Arm functionality.

3.1 libm functionality

On all Linux systems, glibc provides the “libm” functionality that covers the standard mathematical operations included within the “math.h” include file (in C). glibc's libm aims for correctness first, performance second, and size of compiled code third. It already implements complete POSIX 2008 mathematics support (including all of C99 with Annex F [ISO99a]) along with a few extensions, and it also plans to implement TS 18661 [ISO99b]. Detailed documentation is available [gli] that includes information on accuracy (section 19.7) and ‘fenv’ (floating point rounding and exception) behaviour (section 20.5.1). It is mainly written in portable C code, although some assembly variants do exist, and it has separate code for variants like 128-bit vs 64-bit long double. Issues are publicly tracked22 and currently only 2 of the 41 entries are Arm specific, neither of which are on accuracy grounds.

glibc's libm is not designed for optimal performance, since several functions are implemented with correct rounding, which is hard and expensive. It also ensures floating point exceptions are set correctly, which users often do not need, but which requires extra code and branches. For HPC users such behaviour can be very important, especially for debugging and validation.

Over the past twelve months Arm have been working to open source high performing versions of certain commonly used maths functions. These have been released as open source, under a permissive licence, through the Arm Optimized Routines23 GitHub repository. They have also been upstreamed and accepted into glibc. Since many HPC users may not wish to compile such functions from source themselves, and there is a long delay between an update being available

22https://sourceware.org/bugzilla/buglist.cgi?component=math&list_id=31625&product=glibc&resolution=---
23https://github.com/ARM-software/optimized-routines

Code         GCC time (s)   Arm Compiler time (s)   Arm Compiler + libamath (s)
WRF          316.86         286.98                  191.63
Cloverleaf   4918.58        4193.84                 4193.84
OpenMX       -              604.24                  469.00
Branson      622.955        604.24                  505.08

Table 3: Comparison timings for various HPC applications using GCC, the Arm Compiler, and the Arm Compiler with libamath

in glibc and it reaching end-users via their Linux distributions, we have also included them in the Arm Performance Libraries product, discussed below, as a standalone library called libamath. This gives users of Arm's commercial HPC tools a direct route to a precompiled and optimized library that gives the best performance available. The difference it makes to real HPC codes is highlighted in Table 3. Overall, the performance improvement in a code will be very dependent on the number of basic mathematical calls made, but in these cases a typical speed-up of 20% to 35% on real HPC codes is possible. Arm intends to keep improving both the open source and the commercial versions of these functions, and the list of optimized functions is expected to continue to grow over the coming years.

3.2 Vector math routines

The current version of armclang supports the vectorization of loops that invoke math routines from libm. This is a different approach from that used by armflang (initiated by invoking the compiler with the -fveclib option, which is now the default behaviour), where for every scalar Libpgmath math routine a mapping exists to the vectorized version of that routine (the auto-vectorization pass can make use of this mapping and proceed with vectorization of a loop that invokes math routines). Note that currently Libpgmath does not support SVE and does not call SLEEF functions; both features are planned. Any C loop using functions from math.h (or from cmath in the case of C++, see Listings 5 and 6) can be vectorized by invoking the compiler with the new option -fsimdmath, together with the usual options that are needed to activate the auto-vectorizer (-O2 in the example):

$ armclang -fsimdmath -c -O2 source.c
$ armclang -fsimdmath -c -O2 source.cpp

/* C code example: source.c */
#include <math.h>

void do_something(double *a, double *b, unsigned N) {
  for (unsigned i = 0; i < N; ++i) {
    /* some computation */
    a[i] = sin(b[i]);
    /* some computation */
  }
}

Listing 5: Example C code with a loop using a function provided via math.h.


// C++ code example: source.cpp
#include <cmath>

void do_something(float *a, float *b, unsigned N) {
  for (unsigned i = 0; i < N; ++i) {
    // some computation
    a[i] = std::pow(a[i], b[i]);
    // some computation
  }
}

Listing 6: Example C++ code with a loop using a function provided via cmath.

The vector versions of the routines are provided via libsleefgnuabi, a library providing a vector implementation of the routines available in libm. The shared library libsleefgnuabi.so needed to link the code generated with -fsimdmath is shipped with the compiler, and needs to be installed, with LD_LIBRARY_PATH visibility, on the machine where the program is run. The linking flag -lsleefgnuabi is implicit when -fsimdmath is used:

$ armclang -fsimdmath main.c -O2 -o program

The library is built out of SLEEF, an open source vector libm replacement library available at http://sleef.org.

3.2.1 Limitations

This is an experimental feature; this paragraph describes its current limitations. Vector math routines are enabled only for Advanced SIMD vectorization. The feature works with no user intervention only for sin, cos, pow, exp, log, log10, and for sinf, cosf, powf, expf, logf, log10f. Math functions not mentioned in the previous list can be added by redeclaring them in the compilation unit after the math.h (or cmath) inclusion. In particular, double precision functions need to be decorated with #pragma omp declare simd simdlen(2) notinbranch, and single precision functions need to be decorated with #pragma omp declare simd simdlen(4) notinbranch, as in Listing 7.

#pragma omp declare simd simdlen(2) notinbranch
extern double atan2(double x, double y);

#pragma omp declare simd simdlen(4) notinbranch
extern float coshf(float x);

Listing 7: Example that shows how to enable additional functions from libm.

Note that the declaration needs to be specified with extern "C" language linkage (C++ code should pick this up automatically via the standard system header inclusions). Note also that the feature described in the last example has undergone limited testing; any bugs detected when using it should be promptly reported. The amount of performance gain obtainable by using the feature depends on the code surrounding the math routine invocation in the loop body. Loops with a lot of computation are more likely to benefit from the vectorization of the routine.

When debugging the output code, or disassembling the object files generated using -fsimdmath, the symbols with a _ZGVnN prefix are the symbols of the vector math routines provided via SLEEF.24 Notice that the scalar versions of the math routines are still provided via the system libm. Any value discrepancies between the scalar execution and the vector execution are due to the different algorithms used in the two versions. When comparing the values generated with libm with those generated with libsleefgnuabi, anything with more than 1 ULP error should be reported.

3.3 Vendor BLAS, LAPACK and FFT libraries

The BLAS (Basic Linear Algebra Subprograms) [LHKK79, DCHH88b, DCHH88a, DCDH90b, DCDH90a] and LAPACK (Linear Algebra PACKage) [ABB+99] libraries are de facto standards in scientific computing and date back to the first “level 1 BLAS” implementations of the late 1970s. These libraries cover the following sets of functionality:

• BLAS

– Level 1 – Vector operations, e.g. dot products, scaling
– Level 2 – Matrix-vector operations, e.g. matrix-vector multiplication
– Level 3 – Matrix-matrix operations, e.g. matrix-matrix multiplication

• LAPACK

– Solving systems of linear equations
– Eigenvalue and eigenvector problems
– Singular value problems

These routines involve both real and complex variants in single and double precision. Interfaces are typically provided for both C and Fortran, and this is no different on Armv8. The Arm Performance Libraries also include an implementation of a set of FFT (Fast Fourier Transform) routines. FFTs do not have the same standard interface as BLAS and LAPACK; instead, the interface provided by FFTW [FJ05] is supported.

Users of HPC systems and developers of higher level libraries rely on the functionality of the vendor-provided BLAS and LAPACK implementations. They therefore need to be highly performant in order to be used as key building blocks for their own applications. Vendor libraries must also be trusted as giving the highest accuracy possible, in order to ensure that confidence in any results obtained using them is absolute.

For AArch64 the vendor library is the ‘Arm Performance Libraries’. This product was first released in November 2015 and has had updates typically every two or three months since then. Its development is happening alongside the bring-up of the first serious production-level Arm-powered server hardware, with feedback from the early test users helping shape where development work goes. The Arm Performance Libraries are developed by a team in Arm specifically tasked with ensuring that the product has high performance on all microarchitectures. Optimization work is focused on those microarchitectures where Arm HPC machines are being deployed. Since Arm do not, themselves, have any details of partner microarchitectures, in order to provide the best experience for users we incorporate tuned versions that silicon partners have contributed

24The Vector ABI specifications will be published in September. The document explains the relation between the scalar functions decorated with the OpenMP declare simd directive and the associated vector functions.

to open-source projects, such as OpenBLAS. On top of this, Arm add architectural, and especially parallel performance, tuning.

As part of the Arm Compiler for HPC package, tailored builds for the following microarchitectures are currently available:

• Arm Cortex-A72

• Cavium ThunderX2 CN99

• Generic AArch64

with specific builds done against the following Linux operating systems:

• Red Hat Enterprise Linux 7.2 and 7.3

• Ubuntu 16.04

• SUSE 12

These versions have been chosen to ensure complete coverage of all the systems that vendors are currently shipping, in order to maximise the libraries' uptake within the Arm HPC ecosystem. The accuracy of the library is ensured through the use of a test and validation suite purchased from NAG, the Numerical Algorithms Group25. NAG have a worldwide reputation for the high accuracy of their numerical results, and their technology has been used by many of the other vendor libraries over the past few decades. Testing of the correctness of the library, both in terms of numerical accuracy and of the handling of incorrect input parameters, is carried out on every build released. This guarantee is not possible with open source libraries, and it therefore adds extra confidence for anyone using the library that their calculations are correct. Every shipped library provides eight builds, which are combinations of the following variants:

• GCC or Arm Compiler compatibility

• Serial or with OpenMP

• 32-bit or 64-bit integers

in order to give end users the choice of the appropriate version for use in their applications. Functionality and accuracy testing must therefore encompass all these build types, with shared-memory runs tested on increasing thread counts.

3.3.1 Performance

The Arm Performance Libraries in total comprise about 1800 numerical subroutines alongside a handful of utility functions. They are intended to be used on a variety of problem sizes and thread counts, hence the potential optimization work to be done on the library is very broad in scope. Each routine typically has a variety of different computational cases that it needs to consider, with potentially different code paths inside, in order to achieve maximum performance.

Much of the work that has taken place over the past year has focused on increasing performance on the Cavium ThunderX2 systems, such as are provided in the Mont-Blanc Dibona cluster. In particular, a heavy emphasis has been placed on increasing the performance of the FFT routines. In work presented by MB3 partners at SC18, the gulf in performance between the FFTW and Arm Performance Libraries implementations was highlighted. An example of this

Figure 7: Comparison of FFT performance between FFTW and Arm Performance Libraries 18.1

Figure 8: Comparison of FFT performance between FFTW and Arm Performance Libraries 18.2

gap is shown in Figure 7, where the x-axis shows the length of the FFT in question, and the y-axis gives the relative performance between FFTW and the 18.1 release of the Arm Performance Libraries. In this graph, points above the line at y = 1 mean that FFTW was faster than the Arm Performance Libraries. It is clear that the results seen by Barcelona were not an isolated case of poor performance, but a common trend. Note that there are extra coloured markers to denote the FFT lengths seen in testing of various real HPC applications calling FFT functions.

The development work done over the year by the Arm Performance Libraries team has more than remedied this situation. The results for 1-d transforms in the 18.2 release are compared against FFTW in Figure 8, although this time the y-axis is inverted so that above 1 is better for the Arm Performance Libraries. It is clear that for most cases the Arm Performance Libraries implementation was faster, sometimes significantly so. The FFT work has continued through the year, with subsequent releases both increasing performance still further and extending the cases covered from just the complex-to-complex basic interface case of the 18.2 release to now also include real-to-complex and complex-to-real cases, multidimensional cases, and the advanced and guru interfaces. In addition, the 19.0 release also includes the FFTW MPI interface.

Overall, across the microarchitectures we test, the Arm Performance Libraries are exhibiting a strong level of performance which, when coupled with the high accuracy standards demanded of a vendor library, gives users and ecosystem partners confidence in using them for real HPC deployments.

25http://www.nag.co.uk

3.4 Additional HPC libraries

The HPC world is awash with open source libraries, which are almost all written in high level languages (Fortran, C, C++) without any assembly. As such, porting them to AArch64 is normally a straightforward matter of ensuring that configuration files work and that any #ifdef sections do not accidentally fall through to the final #else, for example one assuming the architecture is Windows. Arm and their partners have been working hard to ensure that as many of these packages as possible install and build without problems from the open source downloads. A major initiative has been in providing recipes for many common HPC libraries, tools and applications to users, for both GCC and the Arm Compiler. We have endeavoured to do this in a collaborative fashion, so a ‘Packages Wiki’, https://gitlab.com/arm-hpc/packages/wikis/home, was set up to house these instructions. Whilst Arm has provided the bulk of the recipes, contributions have been received from various organisations across the globe.

Another part of this work has involved becoming ‘Silver members’ of the OpenHPC initiative26. This is a cross-architecture, HPC-specific, community effort to provide a common stack that may be seen as a ‘base’ level for a modern HPC service. The main goal of the OpenHPC project is to ease installation and configuration of a typical HPC working environment, with its open-sourced utilities, scientific libraries, resource managers and cluster provisioning tools. For Arm, compilation of these packages exposes the typical challenges experienced while working with all other open-source software: installation dependencies, configuration peculiarities, maintenance disturbances.
The OpenHPC project is evolving into a mature development environment for providing HPC software packages that are well specified (in terms of the packaging system), properly tested and ready to install on popular Linux distributions built around the RPM package manager: SLES, CentOS and RHEL. Initially, x86_64 was the only supported hardware architecture. During the work on the 1.2 version, Arm joined the effort, which resulted in an AArch64 Tech Preview release. A full list of packages for the current (1.3.1) release is given in Table 4. The only differences from the x86_64 version are that no Intel compiler builds or Intel MPI builds of the performance packages are included.

26http://openhpc.community

Table 4: Packages supported in OpenHPC for Arm

Administrative Tools:
clustershell      http://clustershell.sourceforge.net
conman            http://dun.github.io/conman
docs              https://github.com/openhpc/ohpc
examples          https://github.com/openhpc/ohpc
ganglia           http://ganglia.sourceforge.net
genders           https://github.com/chaos/genders
lmod              https://github.com/TACC/Lmod
losf              https://github.com/hpcsi/losf
mrsh              https://github.com/chaos/genders
nagios            http://www.nagios.org
nagios-plugins    https://www.nagios-plugins.org
ndoutils          http://www.nagios.org/download/addons
nrpe              http://www.nagios.org
pdsh              http://sourceforge.net/projects/pdsh
prun              https://github.com/openhpc/ohpc
test-suite        https://github.com/openhpc/ohpc/tests

Compiler Families:
Gnu Compiler Suite    http://gcc.gnu.org

Development Tools:
autoconf     http://www.gnu.org/software/autoconf
automake     http://www.gnu.org/software/automake
EasyBuild    http://hpcugent.github.com/easybuild
hwloc        http://www.open-mpi.org/projects/hwloc
libtool      http://www.gnu.org/software/libtool
numpy        http://sourceforge.net/projects/numpy
scipy        http://www.scipy.org
R            http://www.r-project.org
spack        https://github.com/LLNL/spack
valgrind     http://www.valgrind.org

Updates of Distro-provided packages:
lua-bit           http://bitop.luajit.org
lua-filesystem    http://keplerproject.github.com/luafilesystem
lua-posix         https://github.com/luaposix/luaposix

IO Libraries:
adios             http://www.olcf.ornl.gov/center-projects/adios
hdf5              http://www.hdfgroup.org/HDF5
netcdf-cxx        http://www.unidata.ucar.edu/software/netcdf
netcdf-fortran    http://www.unidata.ucar.edu/software/netcdf
netcdf            http://www.unidata.ucar.edu/software/netcdf
phdf5             http://www.hdfgroup.org/HDF5

Lustre packages:
lustre-client    https://wiki.hpdd.intel.com
shine            http://lustre-shine.sourceforge.net/

MPI Runtime Families:
mpich       http://www.mpich.org
mvapich2    http://mvapich.cse.ohio-state.edu/overview/mvapich2
openmpi     http://www.open-mpi.org

Parallel Libraries:
boost           http://www.boost.org
fftw            http://www.fftw.org
hypre           http://www.llnl.gov/casc/hypre
mumps           http://mumps.enseeiht.fr
petsc           http://www.mcs.anl.gov/petsc
superlu_dist    http://crd-legacy.lbl.gov/~xiaoye/SuperLU
trilinos        http://trilinos.sandia.gov/index.html

Performance Tools:
imb          https://software.intel.com/en-us/articles/intel-mpi-benchmarks
mpiP         http://mpip.sourceforge.net
papi         http://icl.cs.utk.edu/papi
pdtoolkit    http://www.cs.uoregon.edu/Research/pdt
scalasca     http://www.scalasca.org
scorep       http://www.vi-hps.org/projects/score-p
tau          http://www.cs.uoregon.edu/research/tau/home.php

Provisioning Tools:
warewulf    http://warewulf.lbl.gov

Resource Management:
munge      http://dun.github.io/munge
PBSPro     https://github.com/pbspro/pbspro
slurm      http://slurm.schedmd.com

Runtimes:
ocr            https://xstack.exascale-tech.com/wiki
singularity    http://singularity.lbl.gov

Serial Libraries:
gsl         http://www.gnu.org/software/gsl
metis       http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
openblas    http://www.openblas.net
superlu     http://crd.lbl.gov/~xiaoye/SuperLU

The Arm architecture is now fully supported on all of the supported Linux distributions. On top of this, Arm served on the OpenHPC Technical Steering Committee for 2016–17; Arm's interests for 2017–18 are now represented by Linaro. Ongoing work will introduce Arm Compiler builds akin to the existing Intel compiler builds.

A The ofc-test report


Open Fortran Compiler Test Report

This is the test report for Open Fortran Compiler (OFC).

Branch: master

SHA1: 7d913108a5897afaaa6b2118419e02eba495c7fc

Test run started: Wed Oct 10 20:32:22 2018 UTC

programs/

Columns: Semantic, Standard Behaviour, Reingest, Valgrind, Valgrind (Debug); Valgrind results were not recorded (-) for this run.

Passed all recorded stages (Semantic, Standard Behaviour and Reingest):
ANSI_0052.f, ANSI_0079.f, ANSI_0090.f, ANSI_0096.f, ANSI_0124.f, ANSI_0169.f, ANSI_0327.f, ANSI_0395.f, ANSI_0522.f, ANSI_0771.f, ANSI_0787.f, ANSI_0799.f, ANSI_0899.f, ANSI_1097.f, ANSI_1259.f, ANSI_1569.f, ANSI_ASSIGN.f, ANSI_BYTE_TYPE.f, ANSI_F77_D.f, ANSI_F77_SUBSCRIPT.f, ANSI_NON_STANDARD_ARG.f, ANSI_SLASH_INIT.f, ANSI_VAX_STRUCTURE.f, ERROR_0306.f, FACS46M.FOR, array.f, bang_test.f, empty_string.f, enddo.f, fixed_bang.f, hollerith_ambig.f, idv.f, if_block_format.f, implicit_program.f, int_str.f, label_end_do.f, label_end_if.f, nothing.f, open_rt_status.f, pragma.f, scope_label.f, semicolon.f, substring.f, var_implicit_do.f

Passed Semantic and Reingest but failed Standard Behaviour with FAIL (1):
array_slice.f, array_slice_ns.f, as_array.f, byte.f, dimension.f, do_block.f, ds_array.f, f90_array.f, func_type.f, goto_init.f, hex_range.f, hollerith_constant_test.f, holstr.f, implicit_sub_arg.f, inc.f, init.f, int8.f, intrinsic_func.f, intrinsic_ishft.f, log_str.f, name_keyword.f, non_exec_label.f, parameter.f, record.f, reshape.f, star_in_LHS.f, type_basic.f

Failed Semantic with FAIL (2) (no further stages run):
ANSI_1253.f, ANSI_1568.f, ANSI_IMPLICIT.f, ANSI_TYPE_AS_PRINT.f, automatic.f, char_int_eq.f, freeform132.f90, implicit_static.f, non_exec_goto.f, while_do.f

Total: Semantic 71 / 81, Standard Behaviour 44 / 71, Reingest 71 / 71

programs/negative/

Columns: Semantic only (Standard Behaviour, Reingest and Valgrind not run).

Passed Semantic: ANSI_1323.f, double_program.f, label_invalid.f, select-case-neg-test.f, sfunc.f, split_ident.f, star_array.f

Failed Semantic with FAIL (1): byte_star.f, cast_log_real.f, intrinsic_decl.f, lit8.f, print8.f

Total: Semantic 7 / 12

programs/nist/

Columns: Semantic, Standard Behaviour, Reingest (Valgrind not run).

Tests run (all .FOR): FM001–FM014, FM016–FM026, FM028, FM030–FM045, FM050, FM056, FM060–FM062, FM080, FM097–FM111, FM200–FM205, FM251–FM261, FM300–FM302, FM306–FM308, FM311, FM317, FM328, FM351–FM357, FM359–FM364, FM368–FM379, FM401–FM407, FM411, FM413, FM500, FM503, FM506, FM509, FM514, FM517, FM520, FM700, FM701, FM710, FM711, FM715, FM718, FM719, FM722, FM800–FM834, FM900, FM901, FM903, FM905–FM910, FM912, FM914–FM917, FM919–FM923.

All 192 NIST tests passed the Semantic and Reingest stages. All passed Standard Behaviour except FM905.FOR and FM907.FOR, which failed with FAIL (1).

Total: Semantic 192 / 192, Standard Behaviour 190 / 192, Reingest 192 / 192

programs/sema/

Columns: Semantic and Reingest (Standard Behaviour and Valgrind not run).

Passed Semantic and Reingest:
ANSI_0155.f, ANSI_0157.f, ANSI_F77_SUBSTRING.f, EDIT_DESCRIPTOR_WITHOUT_W.f, decl_star_array.f, if_not_logical.f, illogical.f, implicit_rhs.f, int7.f, keyword_abuse.f, type_data.f

Failed Semantic with FAIL (2): jump_to_format.f, select-case-test.f, star_len.f

Total: Semantic 11 / 14, Reingest 11 / 11

programs/todo/

Columns: Semantic, Standard Behaviour, Reingest.

Passed all recorded stages: int4_str.f, volterra_627.f

Passed Semantic and Reingest but failed Standard Behaviour: ANSI_0376.f (FAIL (2)), ANSI_1160.f (FAIL (2)), str_array.f (FAIL (1)), subarray.f (FAIL (2))

Failed Semantic with FAIL (2): ANSI_0161.f, char_int4_eq.f, char_kind.f, define_test.f, type.f

Total: Semantic 6 / 11, Standard Behaviour 2 / 6, Reingest 6 / 6
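The per-suite "Total" rows above can be aggregated into an overall pass rate for a given stage. The following Python sketch does this for the Semantic stage, using the (passed, total) counts transcribed from the report; the helper name is ours, not part of ofc-test.

```python
# Per-suite (passed, total) counts for the Semantic stage, transcribed
# from the report's "Total" rows.
semantic_totals = {
    "programs/":          (71, 81),
    "programs/negative/": (7, 12),
    "programs/nist/":     (192, 192),
    "programs/sema/":     (11, 14),
    "programs/todo/":     (6, 11),
}

def overall_rate(totals):
    """Aggregate pass rate: passes summed over tests summed, across suites."""
    passed = sum(p for p, _ in totals.values())
    total = sum(t for _, t in totals.values())
    return passed, total, passed / total

passed, total, rate = overall_rate(semantic_totals)
# 287 of the 310 tests pass the Semantic stage, i.e. roughly 93%.
```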

B Livermore benchmark report

This Livermore benchmark27 was executed on a single core of an Armv8 machine equipped with Cortex-A57 cores. Both compilers (Flang and GFortran) were invoked with the flags: -mcpu=native -O3 -ffp-contract=fast -funroll-loops.

B.1 Flang results

*********************************************
 THE LIVERMORE FORTRAN KERNELS "MFLOPS" TEST:
*********************************************

VERIFY: 200  0.1055E-05 = Time Resolution of Cpu-timer
        6800 Repetition Count = MULTI * Loops2 = 50.000

CLOCK CALIBRATION TEST OF INTERNAL CPU-TIMER:
SECOND MONOPROCESS THIS TEST, STANDALONE, NO TIMESHARING.
VERIFY TIMED INTERVALS SHOWN BELOW USING EXTERNAL CLOCK
START YOUR STOPWATCH NOW !

Verify T or DT observe external clock:

    ---------   ---------   --------   -----
    Total T ?   Delta T ?   Mflops ?   Flops
    ---------   ---------   --------   -----
 1      9.89        9.89     987.70    0.97647E+10
 2     19.77        9.89     987.67    0.19529E+11
 3     29.66        9.89     987.69    0.29294E+11
 4     39.54        9.89     987.71    0.39059E+11
    ---------   ---------   --------   -----
END CALIBRATION TEST.
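As a sanity check on the calibration rows above, each Mflops figure is simply the cumulative flop count divided by the cumulative time. A small Python sketch, with the values transcribed from the first row of the Flang run:

```python
# First calibration row from the Flang run: 9.89 s total, 0.97647E+10 flops.
total_time_s = 9.89
total_flops = 0.97647e10

mflops = total_flops / total_time_s / 1e6
# Agrees with the reported 987.70 Mflops to within the rounding of the
# printed time and flop counts.
```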

ESTIMATED TOTAL JOB CPU-TIME:= 41.408 sec. ( Nruns= 7 Trials)

Trial= 1  ChkSum= 914  Pass= 0  Fail= 0
Trial= 2  ChkSum= 914  Pass= 1  Fail= 0
Trial= 3  ChkSum= 914  Pass= 2  Fail= 0
Trial= 4  ChkSum= 914  Pass= 3  Fail= 0
Trial= 5  ChkSum= 914  Pass= 4  Fail= 0
Trial= 6  ChkSum= 914  Pass= 5  Fail= 0
Trial= 7  ChkSum= 914  Pass= 6  Fail= 0

NET CPU TIMING VARIANCE (Terr); A few % is ok:

        AVERAGE   STANDEV   MINIMUM   MAXIMUM
Terr      0.47%     1.39%     0.03%     7.13%

27http://www.netlib.org/benchmark/livermore

NET CPU TIMING VARIANCE (Terr); A few % is ok:

        AVERAGE   STANDEV   MINIMUM   MAXIMUM
Terr      0.65%     2.06%     0.06%    10.44%

NET CPU TIMING VARIANCE (Terr); A few % is ok:

        AVERAGE   STANDEV   MINIMUM   MAXIMUM
Terr      0.59%     0.82%     0.03%     3.52%

********************************************
 THE LIVERMORE FORTRAN KERNELS: * SUMMARY *
********************************************

Computer : Cortex-A57   System : Linux   Compiler : Flang   Date :   Testor :

MFLOPS RANGE:  REPORT ALL RANGE STATISTICS:
Mean DO Span = 167    Code Samples = 72

Maximum Rate   = 2468.2390 Mega-Flops/Sec.
Quartile Q3    =  841.1185 Mega-Flops/Sec.
Average Rate   =  775.8688 Mega-Flops/Sec.
GEOMETRIC MEAN =  620.9705 Mega-Flops/Sec.
Median Q2      =  583.8345 Mega-Flops/Sec.
Harmonic Mean  =  510.8773 Mega-Flops/Sec.
Quartile Q1    =  343.4946 Mega-Flops/Sec.
Minimum Rate   =  185.4776 Mega-Flops/Sec.

Standard Dev.  =  567.4987 Mega-Flops/Sec.
Avg Efficiency =  25.16%  Program & Processor
Mean Precision =  12.69 Decimal Digits
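The summary reports several different means of the 72 per-kernel rates; they are not interchangeable, since for any set of positive rates the harmonic mean is at most the geometric mean, which is at most the arithmetic average (matching their ordering in the listing above). A minimal Python sketch of how these figures are computed (the per-kernel rates themselves are not reproduced in the report, so the sample values here are illustrative only):

```python
import math

def summary_stats(rates):
    """Arithmetic average, geometric mean and harmonic mean of kernel rates."""
    n = len(rates)
    average = sum(rates) / n
    geometric = math.exp(sum(math.log(r) for r in rates) / n)
    harmonic = n / sum(1.0 / r for r in rates)
    return average, geometric, harmonic

# Illustrative sample drawn from three of the reported quantiles.
avg, geo, har = summary_stats([2468.239, 583.8345, 185.4776])
```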

Version: 22/DEC/86  mf528  6094
CHECK FOR CLOCK CALIBRATION ONLY:
Total Job Cpu Time    = 4.14117E+01 Sec.
Total 24 Kernels Time = 1.29659E+00 Sec.
Total 24 Kernels Flops= 8.47454E+08 Flops
Warning: ieee underflow is signaling
FORTRAN STOP

B.2 GFortran results

 *********************************************
  THE LIVERMORE FORTRAN KERNELS "MFLOPS" TEST:
 *********************************************

 VERIFY:   200  0.1060E-05 = Time Resolution of Cpu-timer
          6800  Repetition Count = MULTI * Loops2 = 50.000

 CLOCK CALIBRATION TEST OF INTERNAL CPU-TIMER: SECOND
 MONOPROCESS THIS TEST, STANDALONE, NO TIMESHARING.
 VERIFY TIMED INTERVALS SHOWN BELOW USING EXTERNAL CLOCK
 START YOUR STOPWATCH NOW !

Verify T or DT observe external clock:

          Total T?   Delta T?   Mflops?   Flops
          -------    -------    ------    -----
   1        9.10       9.10    1117.96    0.10171E+11
   2       18.20       9.10    1117.96    0.20343E+11
   3       27.29       9.10    1117.96    0.30514E+11
   4       36.39       9.10    1117.96    0.40685E+11
          -------    -------    ------    -----
 END CALIBRATION TEST.

 ESTIMATED TOTAL JOB CPU-TIME:=  38.100 sec. ( Nruns= 7 Trials)

 Trial= 1  ChkSum= 914  Pass= 0  Fail= 0
 Trial= 2  ChkSum= 914  Pass= 1  Fail= 0
 Trial= 3  ChkSum= 914  Pass= 2  Fail= 0
 Trial= 4  ChkSum= 914  Pass= 3  Fail= 0
 Trial= 5  ChkSum= 914  Pass= 4  Fail= 0
 Trial= 6  ChkSum= 914  Pass= 5  Fail= 0
 Trial= 7  ChkSum= 914  Pass= 6  Fail= 0

 NET CPU TIMING VARIANCE (Terr); A few % is ok:

          AVERAGE   STANDEV   MINIMUM   MAXIMUM
 Terr      0.27%     0.41%     0.00%     2.13%

 NET CPU TIMING VARIANCE (Terr); A few % is ok:

          AVERAGE   STANDEV   MINIMUM   MAXIMUM
 Terr      0.20%     0.18%     0.03%     0.79%

 NET CPU TIMING VARIANCE (Terr); A few % is ok:

          AVERAGE   STANDEV   MINIMUM   MAXIMUM
 Terr      0.18%     0.12%     0.02%     0.43%

 ********************************************
  THE LIVERMORE FORTRAN KERNELS:  * SUMMARY *
 ********************************************

 Computer: Cortex-A57
 System:   Linux
 Compiler: gfortran
 Date:
 Testor:

 MFLOPS RANGE:            REPORT ALL RANGE STATISTICS:
 Mean DO Span   =   167   Code Samples =  72

 Maximum   Rate =  2407.9233 Mega-Flops/Sec.
 Quartile  Q3   =  1016.4227 Mega-Flops/Sec.
 Average   Rate =   833.5390 Mega-Flops/Sec.
 GEOMETRIC MEAN =   671.6541 Mega-Flops/Sec.
 Median    Q2   =   629.6430 Mega-Flops/Sec.
 Harmonic  Mean =   552.5307 Mega-Flops/Sec.
 Quartile  Q1   =   461.9650 Mega-Flops/Sec.
 Minimum   Rate =   219.0730 Mega-Flops/Sec.

 Standard  Dev. =   587.0114 Mega-Flops/Sec.
 Avg Efficiency =  27.89%    Program & Processor
 Mean Precision =  12.69     Decimal Digits

 Version:  22/DEC/86  mf528                 6097

 CHECK FOR CLOCK CALIBRATION ONLY:
 Total Job    Cpu Time =   3.81014E+01 Sec.
 Total 24 Kernels Time =   1.16649E+00 Sec.
 Total 24 Kernels Flops=   8.47454E+08 Flops

C Examples of vectorized and non-vectorized LLVM IR code

define void @sub1(i64* nocapture readonly %length, i64* nocapture readonly %a,
                  i64* nocapture readonly %b, i64* nocapture readonly %c,
                  i64* nocapture %d, i64* nocapture %e) local_unnamed_addr #0 {
L.entry:
  %0 = bitcast i64* %length to i32*
  %1 = load i32, i32* %0, align 4
  %2 = icmp slt i32 %1, 1
  br i1 %2, label %L.LB1_318, label %L.LB1_317.preheader

L.LB1_317.preheader:                              ; preds = %L.entry
  %3 = bitcast i64* %b to i8*
  %4 = getelementptr i8, i8* %3, i64 -4
  %5 = bitcast i8* %4 to float*
  %6 = bitcast i64* %c to i8*
  %7 = getelementptr i8, i8* %6, i64 -4
  %8 = bitcast i8* %7 to float*
  %9 = bitcast i64* %a to i8*
  %10 = getelementptr i8, i8* %9, i64 -4
  %11 = bitcast i8* %10 to float*
  %12 = bitcast i64* %d to i8*
  %13 = getelementptr i8, i8* %12, i64 -4
  %14 = bitcast i8* %13 to float*
  %15 = bitcast i64* %e to i8*
  %16 = getelementptr i8, i8* %15, i64 -4
  %17 = bitcast i8* %16 to float*
  br label %L.LB1_317

L.LB1_317:                                        ; preds = %L.LB1_317.preheader, %L.LB1_317
  %indvars.iv = phi i64 [ 1, %L.LB1_317.preheader ], [ %indvars.iv.next, %L.LB1_317 ]
  %.dY0001_319.0 = phi i32 [ %1, %L.LB1_317.preheader ], [ %43, %L.LB1_317 ]
  %18 = getelementptr float, float* %5, i64 %indvars.iv
  %19 = load float, float* %18, align 4
  %20 = fmul float %19, %19
  %21 = getelementptr float, float* %8, i64 %indvars.iv
  %22 = load float, float* %21, align 4
  %23 = getelementptr float, float* %11, i64 %indvars.iv
  %24 = load float, float* %23, align 4
  %25 = fmul float %24, 4.000000e+00
  %26 = fmul float %22, %25
  %27 = fsub float %20, %26
  %28 = tail call float @llvm.sqrt.f32(float %27)
  %29 = getelementptr float, float* %14, i64 %indvars.iv
  store float %28, float* %29, align 4
  %30 = fsub float -0.000000e+00, %28
  %31 = load float, float* %18, align 4
  %32 = fsub float %30, %31
  %33 = fmul float %32, 5.000000e-01
  %34 = load float, float* %23, align 4
  %35 = fdiv float %33, %34
  %36 = getelementptr float, float* %17, i64 %indvars.iv
  store float %35, float* %36, align 4
  %37 = load float, float* %29, align 4
  %38 = load float, float* %18, align 4
  %39 = fsub float %37, %38
  %40 = fmul float %39, 5.000000e-01
  %41 = load float, float* %23, align 4
  %42 = fdiv float %40, %41
  store float %42, float* %29, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %43 = add nsw i32 %.dY0001_319.0, -1
  %44 = icmp sgt i32 %.dY0001_319.0, 1
  br i1 %44, label %L.LB1_317, label %L.LB1_318

L.LB1_318:                                        ; preds = %L.LB1_317, %L.entry
  ret void
}

Listing 8: Example non-vectorized LLVM IR code for a Fortran subroutine with a loop.

define void @sub1(i64* nocapture readonly %length, i64* nocapture readonly %a,
                  i64* nocapture readonly %b, i64* nocapture readonly %c,
                  i64* nocapture %d, i64* nocapture %e) local_unnamed_addr #0 {
L.entry:
  %0 = bitcast i64* %length to i32*
  %1 = bitcast i64* %d to i8*
  %2 = bitcast i64* %e to i8*
  %3 = bitcast i64* %b to i8*
  %4 = bitcast i64* %c to i8*
  %5 = bitcast i64* %a to i8*
  %6 = load i32, i32* %0, align 4
  %7 = icmp slt i32 %6, 1
  br i1 %7, label %L.LB1_318, label %L.LB1_317.preheader

L.LB1_317.preheader:                              ; preds = %L.entry
  %8 = getelementptr i8, i8* %3, i64 -4
  %9 = bitcast i8* %8 to float*
  %10 = getelementptr i8, i8* %4, i64 -4
  %11 = bitcast i8* %10 to float*
  %12 = getelementptr i8, i8* %5, i64 -4
  %13 = bitcast i8* %12 to float*
  %14 = getelementptr i8, i8* %1, i64 -4
  %15 = bitcast i8* %14 to float*
  %16 = getelementptr i8, i8* %2, i64 -4
  %17 = bitcast i8* %16 to float*
  %18 = xor i32 %6, -1
  %19 = icmp sgt i32 %18, -2
  %smax = select i1 %19, i32 %18, i32 -2
  %20 = add i32 %6, %smax
  %21 = add i32 %20, 1
  %22 = zext i32 %21 to i64
  %23 = add nuw nsw i64 %22, 1
  %min.iters.check = icmp ult i64 %23, 4
  br i1 %min.iters.check, label %L.LB1_317, label %min.iters.checked

min.iters.checked:                                ; preds = %L.LB1_317.preheader
  %24 = add i32 %20, 2
  %25 = and i32 %24, 3
  %n.mod.vf = zext i32 %25 to i64
  %n.vec = sub nsw i64 %23, %n.mod.vf
  %cmp.zero = icmp eq i64 %n.vec, 0
  br i1 %cmp.zero, label %L.LB1_317, label %vector.memcheck

vector.memcheck:                                  ; preds = %min.iters.checked
  %26 = xor i32 %6, -1
  %27 = icmp sgt i32 %26, -2
  %smax2 = select i1 %27, i32 %26, i32 -2
  %28 = add i32 %6, %smax2
  %29 = add i32 %28, 1
  %30 = zext i32 %29 to i64
  %31 = shl nuw nsw i64 %30, 2
  %32 = add nuw nsw i64 %31, 4
  %uglygep = getelementptr i8, i8* %1, i64 %32
  %uglygep3 = getelementptr i8, i8* %2, i64 %32
  %uglygep4 = getelementptr i8, i8* %3, i64 %32
  %uglygep5 = getelementptr i8, i8* %4, i64 %32
  %uglygep6 = getelementptr i8, i8* %5, i64 %32
  %bound0 = icmp ugt i8* %uglygep3, %1
  %bound1 = icmp ugt i8* %uglygep, %2
  %found.conflict = and i1 %bound0, %bound1
  %bound07 = icmp ugt i8* %uglygep4, %1
  %bound18 = icmp ugt i8* %uglygep, %3
  %found.conflict9 = and i1 %bound07, %bound18
  %conflict.rdx = or i1 %found.conflict, %found.conflict9
  %bound010 = icmp ugt i8* %uglygep5, %1
  %bound111 = icmp ugt i8* %uglygep, %4
  %found.conflict12 = and i1 %bound010, %bound111
  %conflict.rdx13 = or i1 %conflict.rdx, %found.conflict12
  %bound014 = icmp ugt i8* %uglygep6, %1
  %bound115 = icmp ugt i8* %uglygep, %5
  %found.conflict16 = and i1 %bound014, %bound115
  %conflict.rdx17 = or i1 %conflict.rdx13, %found.conflict16
  %bound018 = icmp ugt i8* %uglygep4, %2
  %bound119 = icmp ugt i8* %uglygep3, %3
  %found.conflict20 = and i1 %bound018, %bound119
  %conflict.rdx21 = or i1 %conflict.rdx17, %found.conflict20
  %bound022 = icmp ugt i8* %uglygep5, %2
  %bound123 = icmp ugt i8* %uglygep3, %4
  %found.conflict24 = and i1 %bound022, %bound123
  %conflict.rdx25 = or i1 %conflict.rdx21, %found.conflict24
  %bound026 = icmp ugt i8* %uglygep6, %2
  %bound127 = icmp ugt i8* %uglygep3, %5
  %found.conflict28 = and i1 %bound026, %bound127
  %conflict.rdx29 = or i1 %conflict.rdx25, %found.conflict28
  %ind.end = add nsw i64 %n.vec, 1
  %cast.crd = trunc i64 %n.vec to i32
  %ind.end31 = sub i32 %6, %cast.crd
  br i1 %conflict.rdx29, label %L.LB1_317, label %vector.body

vector.body:                                      ; preds = %vector.memcheck, %vector.body
  %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.memcheck ]
  %offset.idx = or i64 %index, 1
  %33 = getelementptr float, float* %9, i64 %offset.idx
  %34 = bitcast float* %33 to <4 x float>*
  %wide.load = load <4 x float>, <4 x float>* %34, align 4, !alias.scope !0
  %35 = fmul <4 x float> %wide.load, %wide.load
  %36 = getelementptr float, float* %11, i64 %offset.idx
  %37 = bitcast float* %36 to <4 x float>*
  %wide.load36 = load <4 x float>, <4 x float>* %37, align 4, !alias.scope !3
  %38 = getelementptr float, float* %13, i64 %offset.idx
  %39 = bitcast float* %38 to <4 x float>*
  %wide.load37 = load <4 x float>, <4 x float>* %39, align 4, !alias.scope !5
  %40 = fmul <4 x float> %wide.load37, <float 4.000000e+00, float 4.000000e+00, float 4.000000e+00, float 4.000000e+00>
  %41 = fmul <4 x float> %wide.load36, %40
  %42 = fsub <4 x float> %35, %41
  %43 = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %42)
  %44 = getelementptr float, float* %15, i64 %offset.idx
  %45 = bitcast float* %44 to <4 x float>*
  store <4 x float> %43, <4 x float>* %45, align 4, !alias.scope !7, !noalias !9
  %46 = fsub <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %43
  %47 = bitcast float* %33 to <4 x float>*
  %wide.load38 = load <4 x float>, <4 x float>* %47, align 4, !alias.scope !0
  %48 = fsub <4 x float> %46, %wide.load38
  %49 = fmul <4 x float> %48, <float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01>
  %50 = bitcast float* %38 to <4 x float>*
  %wide.load39 = load <4 x float>, <4 x float>* %50, align 4, !alias.scope !5
  %51 = fdiv <4 x float> %49, %wide.load39
  %52 = getelementptr float, float* %17, i64 %offset.idx
  %53 = bitcast float* %52 to <4 x float>*
  store <4 x float> %51, <4 x float>* %53, align 4, !alias.scope !11, !noalias !12
  %54 = bitcast float* %44 to <4 x float>*
  %wide.load40 = load <4 x float>, <4 x float>* %54, align 4, !alias.scope !7, !noalias !9
  %55 = bitcast float* %33 to <4 x float>*
  %wide.load41 = load <4 x float>, <4 x float>* %55, align 4, !alias.scope !0
  %56 = fsub <4 x float> %wide.load40, %wide.load41
  %57 = fmul <4 x float> %56, <float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01>
  %58 = bitcast float* %38 to <4 x float>*
  %wide.load42 = load <4 x float>, <4 x float>* %58, align 4, !alias.scope !5
  %59 = fdiv <4 x float> %57, %wide.load42
  %60 = bitcast float* %44 to <4 x float>*
  store <4 x float> %59, <4 x float>* %60, align 4, !alias.scope !7, !noalias !9
  %index.next = add i64 %index, 4
  %61 = icmp eq i64 %index.next, %n.vec
  br i1 %61, label %middle.block, label %vector.body, !llvm.loop !13

middle.block:                                     ; preds = %vector.body
  %cmp.n = icmp eq i32 %25, 0
  br i1 %cmp.n, label %L.LB1_318, label %L.LB1_317

L.LB1_317:                                        ; preds = %L.LB1_317.preheader, %min.iters.checked, %vector.memcheck, %middle.block, %L.LB1_317
  %indvars.iv = phi i64 [ %indvars.iv.next, %L.LB1_317 ], [ 1, %vector.memcheck ], [ 1, %min.iters.checked ], [ 1, %L.LB1_317.preheader ], [ %ind.end, %middle.block ]
  %.dY0001_319.0 = phi i32 [ %87, %L.LB1_317 ], [ %6, %vector.memcheck ], [ %6, %min.iters.checked ], [ %6, %L.LB1_317.preheader ], [ %ind.end31, %middle.block ]
  %62 = getelementptr float, float* %9, i64 %indvars.iv
  %63 = load float, float* %62, align 4
  %64 = fmul float %63, %63
  %65 = getelementptr float, float* %11, i64 %indvars.iv
  %66 = load float, float* %65, align 4
  %67 = getelementptr float, float* %13, i64 %indvars.iv
  %68 = load float, float* %67, align 4
  %69 = fmul float %68, 4.000000e+00
  %70 = fmul float %66, %69
  %71 = fsub float %64, %70
  %72 = tail call float @llvm.sqrt.f32(float %71)
  %73 = getelementptr float, float* %15, i64 %indvars.iv
  store float %72, float* %73, align 4
  %74 = fsub float -0.000000e+00, %72
  %75 = load float, float* %62, align 4
  %76 = fsub float %74, %75
  %77 = fmul float %76, 5.000000e-01
  %78 = load float, float* %67, align 4
  %79 = fdiv float %77, %78
  %80 = getelementptr float, float* %17, i64 %indvars.iv
  store float %79, float* %80, align 4
  %81 = load float, float* %73, align 4
  %82 = load float, float* %62, align 4
  %83 = fsub float %81, %82
  %84 = fmul float %83, 5.000000e-01
  %85 = load float, float* %67, align 4
  %86 = fdiv float %84, %85
  store float %86, float* %73, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %87 = add nsw i32 %.dY0001_319.0, -1
  %88 = icmp sgt i32 %.dY0001_319.0, 1
  br i1 %88, label %L.LB1_317, label %L.LB1_318, !llvm.loop !16

L.LB1_318:                                        ; preds = %L.LB1_317, %middle.block, %L.entry
  ret void
}

Listing 9: Example LLVM IR code for a Fortran subroutine with a loop vectorized for SIMD units.


define void @sub1(i64* nocapture readonly %length, i64* nocapture readonly %a,
                  i64* nocapture readonly %b, i64* nocapture readonly %c,
                  i64* nocapture %d, i64* nocapture %e) local_unnamed_addr #0 {
L.entry:
  %0 = bitcast i64* %length to i32*
  %1 = bitcast i64* %d to i8*
  %2 = bitcast i64* %e to i8*
  %3 = bitcast i64* %b to i8*
  %4 = bitcast i64* %c to i8*
  %5 = bitcast i64* %a to i8*
  %6 = load i32, i32* %0, align 4
  %7 = icmp slt i32 %6, 1
  br i1 %7, label %L.LB1_318, label %L.LB1_317.preheader

L.LB1_317.preheader:                              ; preds = %L.entry
  %8 = getelementptr i8, i8* %3, i64 -4
  %9 = bitcast i8* %8 to float*
  %10 = getelementptr i8, i8* %4, i64 -4
  %11 = bitcast i8* %10 to float*
  %12 = getelementptr i8, i8* %5, i64 -4
  %13 = bitcast i8* %12 to float*
  %14 = getelementptr i8, i8* %1, i64 -4
  %15 = bitcast i8* %14 to float*
  %16 = getelementptr i8, i8* %2, i64 -4
  %17 = bitcast i8* %16 to float*
  %18 = xor i32 %6, -1
  %19 = xor i32 %6, -1
  %20 = icmp sgt i32 %19, -2
  %smax2 = select i1 %20, i32 %19, i32 -2
  %21 = add i32 %6, %smax2
  %22 = add i32 %21, 1
  %23 = zext i32 %22 to i64
  %24 = shl nuw nsw i64 %23, 2
  %25 = add nuw nsw i64 %24, 4
  %uglygep = getelementptr i8, i8* %1, i64 %25
  %uglygep3 = getelementptr i8, i8* %2, i64 %25
  %uglygep4 = getelementptr i8, i8* %3, i64 %25
  %uglygep5 = getelementptr i8, i8* %4, i64 %25
  %uglygep6 = getelementptr i8, i8* %5, i64 %25
  %bound0 = icmp ugt i8* %uglygep3, %1
  %bound1 = icmp ugt i8* %uglygep, %2
  %found.conflict = and i1 %bound0, %bound1
  %bound07 = icmp ugt i8* %uglygep4, %1
  %bound18 = icmp ugt i8* %uglygep, %3
  %found.conflict9 = and i1 %bound07, %bound18
  %conflict.rdx = or i1 %found.conflict, %found.conflict9
  %bound010 = icmp ugt i8* %uglygep5, %1
  %bound111 = icmp ugt i8* %uglygep, %4
  %found.conflict12 = and i1 %bound010, %bound111
  %conflict.rdx13 = or i1 %conflict.rdx, %found.conflict12
  %bound014 = icmp ugt i8* %uglygep6, %1
  %bound115 = icmp ugt i8* %uglygep, %5
  %found.conflict16 = and i1 %bound014, %bound115
  %conflict.rdx17 = or i1 %conflict.rdx13, %found.conflict16
  %bound018 = icmp ugt i8* %uglygep4, %2
  %bound119 = icmp ugt i8* %uglygep3, %3
  %found.conflict20 = and i1 %bound018, %bound119
  %conflict.rdx21 = or i1 %conflict.rdx17, %found.conflict20
  %bound022 = icmp ugt i8* %uglygep5, %2
  %bound123 = icmp ugt i8* %uglygep3, %4
  %found.conflict24 = and i1 %bound022, %bound123
  %conflict.rdx25 = or i1 %conflict.rdx21, %found.conflict24
  %bound026 = icmp ugt i8* %uglygep6, %2
  %bound127 = icmp ugt i8* %uglygep3, %5
  %found.conflict28 = and i1 %bound026, %bound127
  %conflict.rdx29 = or i1 %conflict.rdx25, %found.conflict28
  br i1 %conflict.rdx29, label %L.LB1_317, label %vector.ph

vector.ph:                                        ; preds = %L.LB1_317.preheader
  %26 = icmp sgt i32 %18, -2
  %smax = select i1 %26, i32 %18, i32 -2
  %27 = add i32 %6, %smax
  %28 = add i32 %27, 1
  %29 = zext i32 %28 to i64
  %30 = add nuw nsw i64 %29, 1
  %wide.end.idx.splatinsert = insertelement <vscale x 4 x i64> undef, i64 %30, i32 0
  %wide.end.idx.splat = shufflevector <vscale x 4 x i64> %wide.end.idx.splatinsert, <vscale x 4 x i64> undef, <vscale x 4 x i32> zeroinitializer
  %31 = icmp ugt <vscale x 4 x i64> %wide.end.idx.splat, stepvector
  %predicate.entry = call <vscale x 4 x i1> @llvm.propff.nxv4i1(<vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> undef, i1 true, i32 0), <vscale x 4 x i1> undef, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i1> %31)
  br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %predicate = phi <vscale x 4 x i1> [ %predicate.entry, %vector.ph ], [ %predicate.next, %vector.body ]
  %32 = icmp ult i64 %index, 2147483647
  call void @llvm.assume(i1 %32)
  %offset.idx = or i64 %index, 1
  %33 = getelementptr float, float* %9, i64 %offset.idx
  %34 = bitcast float* %33 to <vscale x 4 x float>*
  %wide.masked.load = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %34, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !0
  %35 = fmul <vscale x 4 x float> %wide.masked.load, %wide.masked.load
  %36 = getelementptr float, float* %11, i64 %offset.idx
  %37 = bitcast float* %36 to <vscale x 4 x float>*
  %wide.masked.load39 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %37, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !3
  %38 = getelementptr float, float* %13, i64 %offset.idx
  %39 = bitcast float* %38 to <vscale x 4 x float>*
  %wide.masked.load43 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %39, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !5
  %40 = fmul <vscale x 4 x float> %wide.masked.load43, shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> undef, float 4.000000e+00, i32 0), <vscale x 4 x float> undef, <vscale x 4 x i32> zeroinitializer)
  %41 = fmul <vscale x 4 x float> %wide.masked.load39, %40
  %42 = fsub <vscale x 4 x float> %35, %41
  %43 = call <vscale x 4 x float> @llvm.sqrt.nxv4f32(<vscale x 4 x float> %42)
  %44 = getelementptr float, float* %15, i64 %offset.idx
  %45 = bitcast float* %44 to <vscale x 4 x float>*
  call void @llvm.masked.store.nxv4f32.p0nxv4f32(<vscale x 4 x float> %43, <vscale x 4 x float>* %45, i32 4, <vscale x 4 x i1> %predicate), !alias.scope !7, !noalias !9
  %46 = fsub <vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> undef, float -0.000000e+00, i32 0), <vscale x 4 x float> undef, <vscale x 4 x i32> zeroinitializer), %43
  %47 = bitcast float* %33 to <vscale x 4 x float>*
  %wide.masked.load47 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %47, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !0
  %48 = fsub <vscale x 4 x float> %46, %wide.masked.load47
  %49 = fmul <vscale x 4 x float> %48, shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> undef, float 5.000000e-01, i32 0), <vscale x 4 x float> undef, <vscale x 4 x i32> zeroinitializer)
  %50 = bitcast float* %38 to <vscale x 4 x float>*
  %wide.masked.load48 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %50, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !5
  %51 = fdiv <vscale x 4 x float> %49, %wide.masked.load48
  %52 = getelementptr float, float* %17, i64 %offset.idx
  %53 = bitcast float* %52 to <vscale x 4 x float>*
  call void @llvm.masked.store.nxv4f32.p0nxv4f32(<vscale x 4 x float> %51, <vscale x 4 x float>* %53, i32 4, <vscale x 4 x i1> %predicate), !alias.scope !11, !noalias !12
  %54 = bitcast float* %44 to <vscale x 4 x float>*
  %wide.masked.load52 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %54, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !7, !noalias !9
  %55 = bitcast float* %33 to <vscale x 4 x float>*
  %wide.masked.load53 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %55, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !0
  %56 = fsub <vscale x 4 x float> %wide.masked.load52, %wide.masked.load53
  %57 = fmul <vscale x 4 x float> %56, shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> undef, float 5.000000e-01, i32 0), <vscale x 4 x float> undef, <vscale x 4 x i32> zeroinitializer)
  %58 = bitcast float* %38 to <vscale x 4 x float>*
  %wide.masked.load54 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %58, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !5
  %59 = fdiv <vscale x 4 x float> %57, %wide.masked.load54
  %60 = bitcast float* %44 to <vscale x 4 x float>*
  call void @llvm.masked.store.nxv4f32.p0nxv4f32(<vscale x 4 x float> %59, <vscale x 4 x float>* %60, i32 4, <vscale x 4 x i1> %predicate), !alias.scope !7, !noalias !9
  %index.next = add nuw nsw i64 %index, mul (i64 vscale, i64 4)
  %61 = add nuw nsw i64 %index, mul (i64 vscale, i64 4)
  %.splatinsert55 = insertelement <vscale x 4 x i64> undef, i64 %61, i32 0
  %.splat56 = shufflevector <vscale x 4 x i64> %.splatinsert55, <vscale x 4 x i64> undef, <vscale x 4 x i32> zeroinitializer
  %62 = add <vscale x 4 x i64> %.splat56, stepvector
  %63 = icmp ult <vscale x 4 x i64> %62, %wide.end.idx.splat
  %predicate.next = call <vscale x 4 x i1> @llvm.propff.nxv4i1(<vscale x 4 x i1> %predicate, <vscale x 4 x i1> %63)
  %64 = extractelement <vscale x 4 x i1> %predicate.next, i64 0
  br i1 %64, label %vector.body, label %L.LB1_318, !llvm.loop !13

L.LB1_317:                                        ; preds = %L.LB1_317.preheader, %L.LB1_317
  %indvars.iv = phi i64 [ %indvars.iv.next, %L.LB1_317 ], [ 1, %L.LB1_317.preheader ]
  %.dY0001_319.0 = phi i32 [ %90, %L.LB1_317 ], [ %6, %L.LB1_317.preheader ]
  %65 = getelementptr float, float* %9, i64 %indvars.iv
  %66 = load float, float* %65, align 4
  %67 = fmul float %66, %66
  %68 = getelementptr float, float* %11, i64 %indvars.iv
  %69 = load float, float* %68, align 4
  %70 = getelementptr float, float* %13, i64 %indvars.iv
  %71 = load float, float* %70, align 4
  %72 = fmul float %71, 4.000000e+00
  %73 = fmul float %69, %72
  %74 = fsub float %67, %73
  %75 = tail call float @llvm.sqrt.f32(float %74)
  %76 = getelementptr float, float* %15, i64 %indvars.iv
  store float %75, float* %76, align 4
  %77 = fsub float -0.000000e+00, %75
  %78 = load float, float* %65, align 4
  %79 = fsub float %77, %78
  %80 = fmul float %79, 5.000000e-01
  %81 = load float, float* %70, align 4
  %82 = fdiv float %80, %81
  %83 = getelementptr float, float* %17, i64 %indvars.iv
  store float %82, float* %83, align 4
  %84 = load float, float* %76, align 4
  %85 = load float, float* %65, align 4
  %86 = fsub float %84, %85
  %87 = fmul float %86, 5.000000e-01
  %88 = load float, float* %70, align 4
  %89 = fdiv float %87, %88
  store float %89, float* %76, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %90 = add nsw i32 %.dY0001_319.0, -1
  %91 = icmp sgt i32 %.dY0001_319.0, 1
  br i1 %91, label %L.LB1_317, label %L.LB1_318, !llvm.loop !16

L.LB1_318:                                        ; preds = %vector.body, %L.LB1_317, %L.entry
  ret void
}

Listing 10: Example LLVM IR code for a Fortran subroutine with a loop vectorized for the SVE extension.

define void @sub2(i64* nocapture readonly %length, i64* nocapture readonly %a,
                  i64* nocapture readonly %b, i64* nocapture readonly %c,
                  i64* nocapture %d, i64* nocapture %e) local_unnamed_addr #0 {
L.entry:
  %0 = bitcast i64* %length to i32*
  %1 = bitcast i64* %d to i8*
  %2 = bitcast i64* %e to i8*
  %3 = bitcast i64* %b to i8*
  %4 = bitcast i64* %c to i8*
  %5 = bitcast i64* %a to i8*
  %6 = load i32, i32* %0, align 4
  %7 = icmp slt i32 %6, 1
  br i1 %7, label %L.LB1_318, label %L.LB1_317.preheader

L.LB1_317.preheader:                              ; preds = %L.entry
  %8 = getelementptr i8, i8* %3, i64 -4
  %9 = bitcast i8* %8 to float*
  %10 = getelementptr i8, i8* %4, i64 -4
  %11 = bitcast i8* %10 to float*
  %12 = getelementptr i8, i8* %5, i64 -4
  %13 = bitcast i8* %12 to float*
  %14 = getelementptr i8, i8* %1, i64 -4
  %15 = bitcast i8* %14 to float*
  %16 = getelementptr i8, i8* %2, i64 -4
  %17 = bitcast i8* %16 to float*
  %18 = xor i32 %6, -1
  %19 = xor i32 %6, -1
  %20 = icmp sgt i32 %19, -2
  %smax4 = select i1 %20, i32 %19, i32 -2
  %21 = add i32 %6, %smax4
  %22 = add i32 %21, 1
  %23 = zext i32 %22 to i64
  %24 = shl nuw nsw i64 %23, 2
  %25 = add nuw nsw i64 %24, 4
  %uglygep = getelementptr i8, i8* %1, i64 %25
  %uglygep5 = getelementptr i8, i8* %2, i64 %25
  %uglygep6 = getelementptr i8, i8* %3, i64 %25
  %uglygep7 = getelementptr i8, i8* %4, i64 %25
  %uglygep8 = getelementptr i8, i8* %5, i64 %25
  %bound0 = icmp ugt i8* %uglygep5, %1
  %bound1 = icmp ugt i8* %uglygep, %2
  %found.conflict = and i1 %bound0, %bound1
  %bound09 = icmp ugt i8* %uglygep6, %1
  %bound110 = icmp ugt i8* %uglygep, %3
  %found.conflict11 = and i1 %bound09, %bound110
  %conflict.rdx = or i1 %found.conflict, %found.conflict11
  %bound012 = icmp ugt i8* %uglygep7, %1
  %bound113 = icmp ugt i8* %uglygep, %4
  %found.conflict14 = and i1 %bound012, %bound113
  %conflict.rdx15 = or i1 %conflict.rdx, %found.conflict14
  %bound016 = icmp ugt i8* %uglygep8, %1
  %bound117 = icmp ugt i8* %uglygep, %5
  %found.conflict18 = and i1 %bound016, %bound117
  %conflict.rdx19 = or i1 %conflict.rdx15, %found.conflict18
  %bound020 = icmp ugt i8* %uglygep6, %2
  %bound121 = icmp ugt i8* %uglygep5, %3
  %found.conflict22 = and i1 %bound020, %bound121
  %conflict.rdx23 = or i1 %conflict.rdx19, %found.conflict22
  %bound024 = icmp ugt i8* %uglygep7, %2
  %bound125 = icmp ugt i8* %uglygep5, %4
  %found.conflict26 = and i1 %bound024, %bound125
  %conflict.rdx27 = or i1 %conflict.rdx23, %found.conflict26
  %bound028 = icmp ugt i8* %uglygep8, %2
  %bound129 = icmp ugt i8* %uglygep5, %5
  %found.conflict30 = and i1 %bound028, %bound129
  %conflict.rdx31 = or i1 %conflict.rdx27, %found.conflict30
  br i1 %conflict.rdx31, label %L.LB1_317, label %vector.ph

vector.ph:                                        ; preds = %L.LB1_317.preheader
  %26 = icmp sgt i32 %18, -2
  %smax = select i1 %26, i32 %18, i32 -2
  %27 = add i32 %6, %smax
  %28 = add i32 %27, 1
  %29 = zext i32 %28 to i64
  %30 = add nuw nsw i64 %29, 1
  %wide.end.idx.splatinsert = insertelement <vscale x 4 x i64> undef, i64 %30, i32 0
  %wide.end.idx.splat = shufflevector <vscale x 4 x i64> %wide.end.idx.splatinsert, <vscale x 4 x i64> undef, <vscale x 4 x i32> zeroinitializer
  %31 = icmp ugt <vscale x 4 x i64> %wide.end.idx.splat, stepvector
  %predicate.entry = call <vscale x 4 x i1> @llvm.propff.nxv4i1(<vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> undef, i1 true, i32 0), <vscale x 4 x i1> undef, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i1> %31)
  br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %predicate = phi <vscale x 4 x i1> [ %predicate.entry, %vector.ph ], [ %predicate.next, %vector.body ]
  %32 = icmp ult i64 %index, 2147483647
  call void @llvm.assume(i1 %32)
  %offset.idx = or i64 %index, 1
  %33 = getelementptr float, float* %9, i64 %offset.idx
  %34 = bitcast float* %33 to <vscale x 4 x float>*
  %wide.masked.load = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %34, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !0
  %35 = fmul <vscale x 4 x float> %wide.masked.load, %wide.masked.load
  %36 = getelementptr float, float* %11, i64 %offset.idx
  %37 = bitcast float* %36 to <vscale x 4 x float>*
  %wide.masked.load41 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %37, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !3
  %38 = getelementptr float, float* %13, i64 %offset.idx
  %39 = bitcast float* %38 to <vscale x 4 x float>*
  %wide.masked.load45 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %39, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !5
  %40 = fmul <vscale x 4 x float> %wide.masked.load45, shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> undef, float 4.000000e+00, i32 0), <vscale x 4 x float> undef, <vscale x 4 x i32> zeroinitializer)
  %41 = fmul <vscale x 4 x float> %wide.masked.load41, %40
  %42 = fsub <vscale x 4 x float> %35, %41
  %43 = fcmp oge <vscale x 4 x float> %42, zeroinitializer
  %44 = call <vscale x 4 x float> @llvm.sqrt.nxv4f32(<vscale x 4 x float> %42)
  %45 = getelementptr float, float* %15, i64 %offset.idx
  %46 = and <vscale x 4 x i1> %43, %predicate
  %47 = bitcast float* %45 to <vscale x 4 x float>*
  call void @llvm.masked.store.nxv4f32.p0nxv4f32(<vscale x 4 x float> %44, <vscale x 4 x float>* %47, i32 4, <vscale x 4 x i1> %46), !alias.scope !7, !noalias !9
  %48 = fsub <vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> undef, float -0.000000e+00, i32 0), <vscale x 4 x float> undef, <vscale x 4 x i32> zeroinitializer), %44
  %49 = bitcast float* %33 to <vscale x 4 x float>*
  %wide.masked.load49 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %49, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !0
  %50 = fsub <vscale x 4 x float> %48, %wide.masked.load49
  %51 = fmul <vscale x 4 x float> %50, shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> undef, float 5.000000e-01, i32 0), <vscale x 4 x float> undef, <vscale x 4 x i32> zeroinitializer)
  %52 = bitcast float* %38 to <vscale x 4 x float>*
  %wide.masked.load50 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %52, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !5
  %53 = fdiv <vscale x 4 x float> %51, %wide.masked.load50
  %54 = getelementptr float, float* %17, i64 %offset.idx
  %55 = and <vscale x 4 x i1> %43, %predicate
  %56 = bitcast float* %54 to <vscale x 4 x float>*
  call void @llvm.masked.store.nxv4f32.p0nxv4f32(<vscale x 4 x float> %53, <vscale x 4 x float>* %56, i32 4, <vscale x 4 x i1> %55), !alias.scope !11, !noalias !12
  %57 = and <vscale x 4 x i1> %43, %predicate
  %58 = bitcast float* %45 to <vscale x 4 x float>*
  %wide.masked.load54 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %58, i32 4, <vscale x 4 x i1> %57, <vscale x 4 x float> undef), !alias.scope !7, !noalias !9
  %59 = bitcast float* %33 to <vscale x 4 x float>*
  %wide.masked.load55 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %59, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !0
  %60 = fsub <vscale x 4 x float> %wide.masked.load54, %wide.masked.load55
  %61 = fmul <vscale x 4 x float> %60, shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> undef, float 5.000000e-01, i32 0), <vscale x 4 x float> undef, <vscale x 4 x i32> zeroinitializer)
  %62 = bitcast float* %38 to <vscale x 4 x float>*
  %wide.masked.load56 = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* %62, i32 4, <vscale x 4 x i1> %predicate, <vscale x 4 x float> undef), !alias.scope !5
  %63 = fdiv <vscale x 4 x float> %61, %wide.masked.load56
  %64 = and <vscale x 4 x i1> %43, %predicate
  %65 = bitcast float* %45 to <vscale x 4 x float>*
  call void @llvm.masked.store.nxv4f32.p0nxv4f32(<vscale x 4 x float> %63, <vscale x 4 x float>* %65, i32 4, <vscale x 4 x i1> %64), !alias.scope !7, !noalias !9
  %index.next = add nuw nsw i64 %index, mul (i64 vscale, i64 4)
  %66 = add nuw nsw i64 %index, mul (i64 vscale, i64 4)
  %.splatinsert57 = insertelement <vscale x 4 x i64> undef, i64 %66, i32 0
  %.splat58 = shufflevector <vscale x 4 x i64> %.splatinsert57, <vscale x 4 x i64> undef, <vscale x 4 x i32> zeroinitializer
  %67 = add <vscale x 4 x i64> %.splat58, stepvector
  %68 = icmp ult <vscale x 4 x i64> %67, %wide.end.idx.splat
  %predicate.next = call <vscale x 4 x i1> @llvm.propff.nxv4i1(<vscale x 4 x i1> %predicate, <vscale x 4 x i1> %68)
  %69 = extractelement <vscale x 4 x i1> %predicate.next, i64 0
  br i1 %69, label %vector.body, label %L.LB1_318, !llvm.loop !13

L.LB1_317:                                        ; preds = %L.LB1_317.preheader, %L.LB1_320
  %indvars.iv = phi i64 [ %indvars.iv.next, %L.LB1_320 ], [ 1, %L.LB1_317.preheader ]
  %.dY0001_319.0 = phi i32 [ %96, %L.LB1_320 ], [ %6, %L.LB1_317.preheader ]
  %70 = getelementptr float, float* %9, i64 %indvars.iv
  %71 = load float, float* %70, align 4
  %72 = fmul float %71, %71
  %73 = getelementptr float, float* %11, i64 %indvars.iv
  %74 = load float, float* %73, align 4
  %75 = getelementptr float, float* %13, i64 %indvars.iv
  %76 = load float, float* %75, align 4
  %77 = fmul float %76, 4.000000e+00
  %78 = fmul float %74, %77
  %79 = fsub float %72, %78
  %80 = fcmp ult float %79, 0.000000e+00
  br i1 %80, label %L.LB1_320, label %L.LB1_344

L.LB1_344:                                        ; preds = %L.LB1_317
  %81 = tail call float @llvm.sqrt.f32(float %79)
  %82 = getelementptr float, float* %15, i64 %indvars.iv
  store float %81, float* %82, align 4
  %83 = fsub float -0.000000e+00, %81
  %84 = load float, float* %70, align 4
  %85 = fsub float %83, %84
  %86 = fmul float %85, 5.000000e-01
  %87 = load float, float* %75, align 4
  %88 = fdiv float %86, %87
  %89 = getelementptr float, float* %17, i64 %indvars.iv
  store float %88, float* %89, align 4
  %90 = load float, float* %82, align 4
  %91 = load float, float* %70, align 4
  %92 = fsub float %90, %91
  %93 = fmul float %92, 5.000000e-01
  %94 = load float, float* %75, align 4
  %95 = fdiv float %93, %94
  store float %95, float* %82, align 4
  br label %L.LB1_320

L.LB1_320:                                        ; preds = %L.LB1_344, %L.LB1_317
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %96 = add nsw i32 %.dY0001_319.0, -1
  %97 = icmp sgt i32 %.dY0001_319.0, 1
  br i1 %97, label %L.LB1_317, label %L.LB1_318, !llvm.loop !16

L.LB1_318:                                        ; preds = %vector.body, %L.LB1_320, %L.entry
  ret void
}

Listing 11: Example LLVM IR code for a Fortran subroutine with a loop that could not be vectorized for SIMD units, yet could be vectorized for the SVE extension.

Acronyms and Abbreviations

• AArch64: the 64-bit execution state of the Armv8 architecture, including its exception model, memory model, programmers' model and instruction set

• ANSI – American National Standards Institute: the administrator and coordinator of the United States private-sector voluntary standardization system

• API – Application Programming Interface: a set of definitions, protocols and tools required for building software applications

• AST – Abstract Syntax Tree: a tree-like representation of the abstract syntactic structure of the source code

• ATLAS – Automatically Tuned Linear Algebra Software: an open source BLAS implementation

• Autotools – An open source infrastructure for managing and configuring the build process of software projects; also known as the GNU Build System

• BLAS – Basic Linear Algebra Subprograms: the standard API for basic matrix and vector operations

• BSD – Berkeley Software Distribution: originator of the BSD license

• CLA – Contributor License Agreement: an agreement which defines the terms under which intellectual property has been contributed to an open source project

• CMake – An open source infrastructure developed by Kitware, Inc. for managing and configuring the build process of software projects

• CORAL – Collaboration of Oak Ridge, Argonne and Lawrence Livermore laboratories

• CPU – Central Processing Unit

• F18 – An open source Fortran compiler, currently at an early stage of development, intended to replace the Flang compiler in the future

• Flang – An open source Fortran compiler built on the LLVM compiler infrastructure, derived from the PGI Fortran compiler

• FFT – Fast Fourier Transform: an efficient algorithm for computing Fourier transforms

• FFTW – the Fastest Fourier Transform in the West: an open source library for performing FFTs

• FLOPS – Floating-point Operations: a measure of how much floating-point work has been done

• FPCR – Floating-point Control Register: a CPU register which controls FPU behavior

• FPU – Floating–point Unit

• FTZ – Flush To Zero: an FPU mode in which denormalized numbers are replaced with zeros

• GCC – GNU Compiler Collection: a widely used open source compiler suite

• GFLOPS – Number of thousand million floating point operations performed

• GFortran – GNU Fortran: an open source Fortran compiler belonging to the GCC compiler suite

• glibc – GNU C library: an open source implementation of the standard C library

• GNU – GNU's Not Unix (recursive acronym): a free-software mass-collaboration project, originator of the GNU Compiler Collection and the GNU General Public License

• GOMP – GNU OpenMP: the OpenMP runtime library shipped with the GNU Compiler Collection

• GPU – Graphics Processing Unit

• HPC – High Performance Computing

• ILI – Intermediate Language Instructions: the second intermediate representation of a source program generated internally by the Flang compiler

• ILM – Intermediate Language Mnemonics: the first intermediate representation of a source program generated internally by the Flang compiler

• IR – Intermediate Representation: a data structure or code used by a compiler or virtual machine to represent source code

• K&R – Kernighan and Ritchie: the book [KR78], known to C programmers as K&R, which served for years as an informal specification of the C language

• LAPACK – Linear Algebra Package: the standard API for solving systems of linear equations and eigenvalue, eigenvector and singular value decomposition problems

• libpgmath – A part of Flang’s Fortran runtime library providing serial and vectorized implementations of mathematical functions

• LLNL – Lawrence Livermore National Laboratory

• LLVM – Low Level Virtual Machine: a collection of modular and reusable compiler and toolchain technologies

• LULESH – Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics

• Makefile – A computer file containing a set of directives describing how to build a given software project from its source code

• MFLOPS – Millions of floating point operations performed

• MIT – Massachusetts Institute of Technology: originator of the MIT license

• MKL – Math Kernel Library: a library of optimized math routines for science, engineering, and financial applications

• MPI – Message Passing Interface: the parallel communications standard for distributed memory computing

• NAS – NASA Advanced Supercomputing: a division in NASA that specializes in enabling advances in high-end computing technologies

• NASA – National Aeronautics and Space Administration

• NIST – National Institute of Standards and Technology: U.S. federal technology agency that works with industry to develop and apply technology, measurements, and standards

• NNSA – National Nuclear Security Administration: the U.S. agency responsible for enhancing national security through the military application of nuclear science

• NPB – NAS Parallel Benchmark: a small set of programs designed to help evaluate the performance of parallel supercomputers

• nroff – "new roff": a part of the Unix documentation system used to format manual pages for print or display

• OBS – Open Build Service: a generic system to build and distribute binary packages from sources in an automatic, consistent and reproducible way

• OpenBLAS – An open source BLAS and LAPACK library with tuned kernels for many architectures and routines

• OpenMP – An Application Programming Interface for multi-platform shared memory multiprocessing programming in C, C++ and Fortran languages

• PGI – Portland Group, Inc.: a company that produced commercially available Fortran, C and C++ compilers for high-performance computing systems; acquired by NVIDIA in July 2013, it currently exists as a brand

• SIMD – Single Instruction Multiple Data: a class of parallelism with multiple processing elements that perform the same operation on multiple portions of data simultaneously

• SNAP – SN Application Proxy: a proxy application to model the performance of a modern discrete ordinates neutral particle transport application

• SoC – System on a Chip: an integrated circuit that integrates all components of a computer, including CPU, GPU and various peripherals

• SPACK – A package management tool designed to support multiple versions and configurations of software on a wide variety of platforms and environments

• SVE – Scalable Vector Extension: an HPC-focused SIMD instruction extension to the AArch64 architecture

• tcmalloc – Thread-Caching Malloc: an alternative implementation of the malloc()/free() functions intended to reduce lock contention in multi-threaded programs; distributed as part of the Google Performance Tools suite

• VLA – Vector Length Agnostic: a programming model that can adapt to the available vector length

References

[ABB+99] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LA- PACK Users’ Guide. Society for Industrial and Applied Mathematics, Philadel- phia, PA, third edition, 1999.

[BAA+16a] Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Matthew G. Knepley, Lois Curfman McInnes, Karl Rupp, Barry F. Smith, Stefano Zampini, Hong Zhang, and Hong Zhang. PETSc Web page. http://www.mcs.anl.gov/petsc, 2016.

[BAA+16b] Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Matthew G. Knepley, Lois Curfman McInnes, Karl Rupp, Barry F. Smith, Stefano Zampini, Hong Zhang, and Hong Zhang. PETSc users manual. Technical Report ANL-95/11 - Revision 3.7, Argonne National Laboratory, 2016.

[BGMS97] Satish Balay, William D. Gropp, Lois Curfman McInnes, and Barry F. Smith. Efficient management of parallelism in object oriented numerical software libraries. In E. Arge, A. M. Bruaset, and H. P. Langtangen, editors, Modern Software Tools in Scientific Computing, pages 163–202. Birkhäuser Press, 1997.

[Bra03] Walt Brainerd. The importance of Fortran in the 21st century. Journal of Modern Applied Statistical Methods, 2, 2003.

[DCDH90a] J. J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling. Algorithm 679: A set of level 3 Basic Linear Algebra Subprograms. ACM Trans. Math. Soft., 16:18–28, 1990.

[DCDH90b] J. J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling. A set of level 3 Basic Linear Algebra Subprograms. ACM Trans. Math. Soft., 16:1–17, 1990.

[DCHH88a] J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson. Algorithm 656: An extended set of FORTRAN Basic Linear Algebra Subprograms. ACM Trans. Math. Soft., 14:18–32, 1988.

[DCHH88b] J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson. An extended set of FORTRAN Basic Linear Algebra Subprograms. ACM Trans. Math. Soft., 14:1–17, 1988.

[FJ05] Matteo Frigo and Steven G. Johnson. The design and implementation of FFTW3. In Proceedings of the IEEE: Special Issue on Program Generation, Optimization, and Platform Adaptation, pages 216–231, 2005.

[gli] glibc libm documentation. https://www.gnu.org/software/libc/manual/html_mono/libc.html#Mathematics. Accessed: 2016-08-23.

[ISO99a] ISO/IEC 9899:1999. Programming languages, their environments, and system software interfaces – Floating-point extensions for C – Part 2: Decimal floating-point arithmetic. Technical report, 1999.

[ISO99b] ISO/IEC TS 18661-1:2014. Programming languages, their environments, and system software interfaces – Floating-point extensions for C – Part 1: Binary floating-point arithmetic. Technical report, 1999.

[KR78] Brian Kernighan and Dennis Ritchie. The C Programming Language. Prentice Hall, 1978.

[LA04] Chris Lattner and Vikram Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of 2004 International Symposium on Code Generation and Optimization, CGO '04, Mar 2004.

[LHKK79] C. L. Lawson, R. J. Hanson, D. Kincaid, and F. T. Krogh. Basic Linear Algebra Subprograms for FORTRAN usage. ACM Trans. Math. Soft., 5:308–323, 1979.

[Loh10] Eugene Loh. The ideal HPC programming language. Maybe it's Fortran. Or maybe it just doesn't matter. ACM Queue, 8, 2010.

[Pet16] Francesco Petrogalli. A sneak peek into SVE and VLA programming. Technical report, Arm Limited, Nov 2016.

[PMvdG+13] Jack Poulson, Bryan Marker, Robert A. van de Geijn, Jeff R. Hammond, and Nichols A. Romero. Elemental: A new framework for distributed memory dense matrix computations. ACM Trans. Math. Softw., 39(2):13:1–13:24, February 2013.

[WC00] Kevin Wadleigh and Isom Crawford. Software Optimization for High Performance Computing. Prentice Hall, New Jersey, 2000.
