Building Clang/LLVM Efficiently

Total Page:16

File Type:pdf, Size:1020Kb

Building Clang/LLVM Efficiently Building Clang/LLVM efficiently Tilmann Scheller LLVM Compiler Engineer [email protected] Samsung Open Source Group Samsung Research UK EuroLLVM 2015 London, United Kingdom, April 13 – 14, 2015 Samsung Open Source Group 1 Overview ● Introduction ● Performance ● Summary Samsung Open Source Group 2 Introduction Samsung Open Source Group 3 Introduction ● Clang/LLVM are large pieces of C++ software ● Long turnaround time harms developer productivity ● Squeeze maximum performance out of the toolchain ● Speed up the build without buying new hardware ● Focus on x86-64 Linux ● Test machine: Fedora 21 on i7-4770K CPU @ 3.50GHz, 16GB RAM, 1TB 7200RPM HDD Samsung Open Source Group 4 Ideas ● Build with Clang instead of GCC ● Only build what you really need (e.g. only build relevant backends) ● Use a faster linker: GNU gold instead of GNU ld ● Use a heavily optimized compiler binary for the build (LTO, PGO, LTO+PGO) ● For incremental debug builds: shared library build instead of static build Samsung Open Source Group 5 Performance Samsung Open Source Group 6 Measurement ● GCC 4.9.2 shipping with Fedora 21 ● Clang trunk snapshot (r234392 from April 8, 2015) ● Standard debug/release builds of Clang (CMake build with default generator) unless otherwise noted ● Using GNU gold 2.24 for linking ● Make invoked with -j8 ● Best of five runs Samsung Open Source Group 7 Clang vs. GCC compile time ● Both Clang and GCC are compiled with GCC 4.9.2 755 Debug build 1.48x faster 509 Clang GCC 4.9.2 726 Release build 538 1.34x faster 0 100 200 300 400 500 600 700 800 seconds Samsung Open Source Group 8 Only build a subset of Clang/LLVM ● Only build relevant backends ● Host Clang built with GCC 4.9.2 755 Debug build 1.27x faster 591 X86 backend only All backends 726 Release build 583 1.24x faster 0 100 200 300 400 500 600 700 800 seconds Samsung Open Source Group 9 Use a faster linker ● Using GNU gold instead of GNU ld ● Building Clang with GCC 4.9.2 873 Debug build 755 1.17x faster 746 GNU gold + split DWARF GNU gold 731 GNU ld Release build 726 0 100 200 300 400 500 600 700 800 900 1000 seconds Samsung Open Source Group 10 Optimize host compiler binary aggressively ● Building Clang with a heavily optimized host Clang build ● Host Clang was compiled with Clang or GCC at the various different optimization levels Release Debug 560 520 547 510 509 512 537 538 540 498 522 500 520 480 500 467 482 s s d d 480 460 n n 450 o o 462 c c e e 460 s s 440 440 420 420 400 400 GCC -O3 GCC LTO GCC LTO+PGO GCC -O3 GCC LTO GCC LTO+PGO Clang -O3 Clang LTO GCC PGO Clang -O3 Clang LTO GCC PGO PGO 1.16x faster PGO 1.13x faster Samsung Open Source Group 11 Speeding up incremental builds ● Shared library build rather than static build ● Debug build of Clang with GCC 4.9.2 ● Incremental build after changing a single file in the ARM backend (ARMISelLowering.cpp) 27 Build time after edit Shared build Static build 5 5.4x faster 0 5 10 15 20 25 30 seconds Samsung Open Source Group 12 Summary Samsung Open Source Group 13 Summary ● Always build with Clang if you care about compile time ● Use GNU gold rather than GNU ld ● Building Clang with GCC and PGO produces fastest binary ● Shared library builds speed up incremental debug builds tremendously ● Overall speedup: 1.58x for release builds and 1.94x for debug builds (Switching from GCC + GNU ld to a PGO build of Clang + GNU gold) Samsung Open Source Group 14 Thank you. Samsung Open Source Group 15.
Recommended publications
  • Red Hat Enterprise Linux 6 Developer Guide
    Red Hat Enterprise Linux 6 Developer Guide An introduction to application development tools in Red Hat Enterprise Linux 6 Dave Brolley William Cohen Roland Grunberg Aldy Hernandez Karsten Hopp Jakub Jelinek Developer Guide Jeff Johnston Benjamin Kosnik Aleksander Kurtakov Chris Moller Phil Muldoon Andrew Overholt Charley Wang Kent Sebastian Red Hat Enterprise Linux 6 Developer Guide An introduction to application development tools in Red Hat Enterprise Linux 6 Edition 0 Author Dave Brolley [email protected] Author William Cohen [email protected] Author Roland Grunberg [email protected] Author Aldy Hernandez [email protected] Author Karsten Hopp [email protected] Author Jakub Jelinek [email protected] Author Jeff Johnston [email protected] Author Benjamin Kosnik [email protected] Author Aleksander Kurtakov [email protected] Author Chris Moller [email protected] Author Phil Muldoon [email protected] Author Andrew Overholt [email protected] Author Charley Wang [email protected] Author Kent Sebastian [email protected] Editor Don Domingo [email protected] Editor Jacquelynn East [email protected] Copyright © 2010 Red Hat, Inc. and others. The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version. Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
    [Show full text]
  • The LLVM Instruction Set and Compilation Strategy
    The LLVM Instruction Set and Compilation Strategy Chris Lattner Vikram Adve University of Illinois at Urbana-Champaign lattner,vadve ¡ @cs.uiuc.edu Abstract This document introduces the LLVM compiler infrastructure and instruction set, a simple approach that enables sophisticated code transformations at link time, runtime, and in the field. It is a pragmatic approach to compilation, interfering with programmers and tools as little as possible, while still retaining extensive high-level information from source-level compilers for later stages of an application’s lifetime. We describe the LLVM instruction set, the design of the LLVM system, and some of its key components. 1 Introduction Modern programming languages and software practices aim to support more reliable, flexible, and powerful software applications, increase programmer productivity, and provide higher level semantic information to the compiler. Un- fortunately, traditional approaches to compilation either fail to extract sufficient performance from the program (by not using interprocedural analysis or profile information) or interfere with the build process substantially (by requiring build scripts to be modified for either profiling or interprocedural optimization). Furthermore, they do not support optimization either at runtime or after an application has been installed at an end-user’s site, when the most relevant information about actual usage patterns would be available. The LLVM Compilation Strategy is designed to enable effective multi-stage optimization (at compile-time, link-time, runtime, and offline) and more effective profile-driven optimization, and to do so without changes to the traditional build process or programmer intervention. LLVM (Low Level Virtual Machine) is a compilation strategy that uses a low-level virtual instruction set with rich type information as a common code representation for all phases of compilation.
    [Show full text]
  • Atmospheric Modelling and HPC
    Atmospheric Modelling and HPC Graziano Giuliani ICTP – ESP Trieste, 1 October 2015 NWP and Climate Codes ● Legacy code from the '60 : FORTRAN language ● Path and Dynamic Libraries ● Communication Libraries ● I/O format Libraries ● Data analysis tools and the data deluge http://clima-dods.ictp.it/Workshops/smr2761 Materials ● This presentation: – atmospheric_models_and_hpc.pdf ● Codes: – codes.tar.gz ● Download both, uncompress codes: – cp codes.tar.gz /scratch – cd /scratch – tar zxvf codes.tar.gz Legacy code ● Numerical Weather Prediction uses mathematical models of the atmosphere and oceans to predict the weather based on current weather conditions. ● MOST of the NWP and climate codes can push their roots back in the 1960-1970 ● The NWP problem was one of the target problem which led to the Computer Era: the ENIAC was used to create the first weather forecasts via computer in 1950. ● Code grows by ACCRETION FORTRAN and FortranXX ● Fortran Programming Language is a general-purpose, imperative programming language that is especially suited to numeric computation and scientific computing. ● Originally developed by IBM in the 1950s for scientific and engineering applications ● Its standard is controlled by an ISO comitee: the most recent standard, ISO/IEC 1539-1:2010, informally known as Fortran 2008, was approved in September 2010. ● Needs a compiler to translate code into executable Fortran Compilers ● http://fortranwiki.org ● Most of the High Performance Compilers are Commercial, and “SuperComputer” vendors usually provide their HIGH OPTIMIZED version of Fortran Compiler. ● Some part of the Standard are left “COMPILER SPECIFIC”, opening for incompatibility among the binary formats produced by different compilers. The Fortran .mod files ● Fortran 90 introduced into the language the modules: module my_mod contains subroutine mysub(a,b) real , intent(in) :: a real , intent(out) :: b end subroutine mysub end module my_mod ● The compiler creates the object file and a .mod file compiling the above code.
    [Show full text]
  • The Interplay of Compile-Time and Run-Time Options for Performance Prediction Luc Lesoil, Mathieu Acher, Xhevahire Tërnava, Arnaud Blouin, Jean-Marc Jézéquel
    The Interplay of Compile-time and Run-time Options for Performance Prediction Luc Lesoil, Mathieu Acher, Xhevahire Tërnava, Arnaud Blouin, Jean-Marc Jézéquel To cite this version: Luc Lesoil, Mathieu Acher, Xhevahire Tërnava, Arnaud Blouin, Jean-Marc Jézéquel. The Interplay of Compile-time and Run-time Options for Performance Prediction. SPLC 2021 - 25th ACM Inter- national Systems and Software Product Line Conference - Volume A, Sep 2021, Leicester, United Kingdom. pp.1-12, 10.1145/3461001.3471149. hal-03286127 HAL Id: hal-03286127 https://hal.archives-ouvertes.fr/hal-03286127 Submitted on 15 Jul 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. The Interplay of Compile-time and Run-time Options for Performance Prediction Luc Lesoil, Mathieu Acher, Xhevahire Tërnava, Arnaud Blouin, Jean-Marc Jézéquel Univ Rennes, INSA Rennes, CNRS, Inria, IRISA Rennes, France [email protected] ABSTRACT Both compile-time and run-time options can be configured to reach Many software projects are configurable through compile-time op- specific functional and performance goals. tions (e.g., using ./configure) and also through run-time options (e.g., Existing studies consider either compile-time or run-time op- command-line parameters, fed to the software at execution time).
    [Show full text]
  • CCPP Technical Documentation Release V3.0.0
    CCPP Technical Documentation Release v3.0.0 J. Schramm, L. Bernardet, L. Carson, G. Firl, D. Heinzeller, L. Pan, and M. Zhang Jun 17, 2019 For referencing this document please use: Schramm, J., L. Bernardet, L. Carson, G. Firl, D. Heinzeller, L. Pan, and M. Zhang, 2019. CCPP Technical Documentation Release v3.0.0. 91pp. Available at https://dtcenter.org/GMTB/v3.0/ccpp_tech_guide.pdf. CONTENTS 1 CCPP Overview 1 1.1 How to Use this Document........................................4 2 CCPP-Compliant Physics Parameterizations7 2.1 General Rules..............................................8 2.2 Input/output Variable (argument) Rules.................................9 2.3 Coding Rules............................................... 10 2.4 Parallel Programming Rules....................................... 11 2.5 Scientific Documentation Rules..................................... 12 2.5.1 Doxygen Comments and Commands.............................. 12 2.5.2 Doxygen Documentation Style................................. 13 2.5.3 Doxygen Configuration..................................... 18 2.5.4 Using Doxygen......................................... 20 3 CCPP Configuration and Build Options 23 4 Constructing Suites 27 4.1 Suite Definition File........................................... 27 4.1.1 Groups............................................. 27 4.1.2 Subcycling........................................... 27 4.1.3 Order of Schemes........................................ 28 4.2 Interstitial Schemes........................................... 28 4.3 SDF Examples.............................................
    [Show full text]
  • Design and Implementation of Generics for the .NET Common Language Runtime
    Design and Implementation of Generics for the .NET Common Language Runtime Andrew Kennedy Don Syme Microsoft Research, Cambridge, U.K. fakeÒÒ¸d×ÝÑeg@ÑicÖÓ×ÓfغcÓÑ Abstract cally through an interface definition language, or IDL) that is nec- essary for language interoperation. The Microsoft .NET Common Language Runtime provides a This paper describes the design and implementation of support shared type system, intermediate language and dynamic execution for parametric polymorphism in the CLR. In its initial release, the environment for the implementation and inter-operation of multiple CLR has no support for polymorphism, an omission shared by the source languages. In this paper we extend it with direct support for JVM. Of course, it is always possible to “compile away” polymor- parametric polymorphism (also known as generics), describing the phism by translation, as has been demonstrated in a number of ex- design through examples written in an extended version of the C# tensions to Java [14, 4, 6, 13, 2, 16] that require no change to the programming language, and explaining aspects of implementation JVM, and in compilers for polymorphic languages that target the by reference to a prototype extension to the runtime. JVM or CLR (MLj [3], Haskell, Eiffel, Mercury). However, such Our design is very expressive, supporting parameterized types, systems inevitably suffer drawbacks of some kind, whether through polymorphic static, instance and virtual methods, “F-bounded” source language restrictions (disallowing primitive type instanti- type parameters, instantiation at pointer and value types, polymor- ations to enable a simple erasure-based translation, as in GJ and phic recursion, and exact run-time types.
    [Show full text]
  • Adding Self-Healing Capabilities to the Common Language Runtime
    Adding Self-healing capabilities to the Common Language Runtime Rean Griffith Gail Kaiser Columbia University Columbia University [email protected] [email protected] Abstract systems can leverage to maintain high system availability is to perform repairs in a degraded mode of operation[23, 10]. Self-healing systems require that repair mechanisms are Conceptually, a self-managing system is composed of available to resolve problems that arise while the system ex- four (4) key capabilities [12]; Monitoring to collect data ecutes. Managed execution environments such as the Com- about its execution and operating environment, performing mon Language Runtime (CLR) and Java Virtual Machine Analysis over the data collected from monitoring, Planning (JVM) provide a number of application services (applica- an appropriate course of action and Executing the plan. tion isolation, security sandboxing, garbage collection and Each of the four functions participating in the Monitor- structured exception handling) which are geared primar- Analyze-Plan-Execute (MAPE) loop consumes and pro- ily at making managed applications more robust. How- duces knowledgewhich is integral to the correct functioning ever, none of these services directly enables applications of the system. Over its execution lifetime the system builds to perform repairs or consistency checks of their compo- and refines a knowledge-base of its behavior and environ- nents. From a design and implementation standpoint, the ment. Information in the knowledge-base could include preferred way to enable repair in a self-healing system is patterns of resource utilization and a “scorecard” tracking to use an externalized repair/adaptation architecture rather the success of applying specific repair actions to detected or than hardwiring adaptation logic inside the system where it predicted problems.
    [Show full text]
  • 18Mca42c .Net Programming (C#)
    18MCA42C .NET PROGRAMMING (C#) Introduction to .NET Framework .NET is a software framework which is designed and developed by Microsoft. The first version of .Net framework was 1.0 which came in the year 2002. It is a virtual machine for compiling and executing programs written in different languages like C#, VB.Net etc. It is used to develop Form-based applications, Web-based applications, and Web services. There is a variety of programming languages available on the .Net platform like VB.Net and C# etc.,. It is used to build applications for Windows, phone, web etc. It provides a lot of functionalities and also supports industry standards. .NET Framework supports more than 60 programming languages in which 11 programming languages are designed and developed by Microsoft. 11 Programming Languages which are designed and developed by Microsoft are: C#.NET VB.NET C++.NET J#.NET F#.NET JSCRIPT.NET WINDOWS POWERSHELL IRON RUBY IRON PYTHON C OMEGA ASML(Abstract State Machine Language) Main Components of .NET Framework 1.Common Language Runtime(CLR): CLR is the basic and Virtual Machine component of the .NET Framework. It is the run-time environment in the .NET Framework that runs the codes and helps in making the development process easier by providing the various services such as remoting, thread management, type-safety, memory management, robustness etc.. Basically, it is responsible for managing the execution of .NET programs regardless of any .NET programming language. It also helps in the management of code, as code that targets the runtime is known as the Managed Code and code doesn’t target to runtime is known as Unmanaged code.
    [Show full text]
  • Dmon: Efficient Detection and Correction of Data Locality
    DMon: Efficient Detection and Correction of Data Locality Problems Using Selective Profiling Tanvir Ahmed Khan and Ian Neal, University of Michigan; Gilles Pokam, Intel Corporation; Barzan Mozafari and Baris Kasikci, University of Michigan https://www.usenix.org/conference/osdi21/presentation/khan This paper is included in the Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation. July 14–16, 2021 978-1-939133-22-9 Open access to the Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation is sponsored by USENIX. DMon: Efficient Detection and Correction of Data Locality Problems Using Selective Profiling Tanvir Ahmed Khan Ian Neal Gilles Pokam Barzan Mozafari University of Michigan University of Michigan Intel Corporation University of Michigan Baris Kasikci University of Michigan Abstract cally at run time. In fact, as we (§6.2) and others [2,15,20,27] Poor data locality hurts an application’s performance. While demonstrate, compiler-based techniques can sometimes even compiler-based techniques have been proposed to improve hurt performance when the assumptions made by those heuris- data locality, they depend on heuristics, which can sometimes tics do not hold in practice. hurt performance. Therefore, developers typically find data To overcome the limitations of static optimizations, the locality issues via dynamic profiling and repair them manually. systems community has invested substantial effort in devel- Alas, existing profiling techniques incur high overhead when oping dynamic profiling tools [28,38, 57,97, 102]. Dynamic used to identify data locality problems and cannot be deployed profilers are capable of gathering detailed and more accurate in production, where programs may exhibit previously-unseen execution information, which a developer can use to identify performance problems.
    [Show full text]
  • EPICS Application Developer's Guide
    1 EPICS Application Developer’s Guide EPICS Base Release 3.15.0.2 17 October 2014 Andrew N. Johnson, Janet B. Anderson, Martin R. Kraimer (Argonne National Laboratory) W. Eric Norum (Lawrence Berkeley National Laboratory) Jeffrey O. Hill (Los Alamos National Laboratory) Ralph Lange, Benjamin Franksen (Helmholtz-Zentrum Berlin) Peter Denison (Diamond Light Source) 2 Contents EPICS Applications Developer’s Guide1 Table of Contents 7 1 Introduction 9 1.1 Overview.................................................9 1.2 Acknowledgments............................................ 11 2 Getting Started 13 2.1 Introduction................................................ 13 2.2 Example IOC Application........................................ 13 2.3 Channel Access Host Example...................................... 15 2.4 iocsh.................................................... 16 2.5 Building IOC components........................................ 16 2.6 makeBaseApp.pl............................................. 19 2.7 vxWorks boot parameters......................................... 22 2.8 RTEMS boot procedure.......................................... 23 3 EPICS Overview 25 3.1 What is EPICS?.............................................. 25 3.2 Basic Attributes.............................................. 26 3.3 IOC Software Components........................................ 26 3.4 Channel Access.............................................. 28 3.5 OPI Tools................................................. 30 3.6 EPICS Core Software..........................................
    [Show full text]
  • Studying the Evolution of Build Systems
    Studying the Evolution of Build Systems by Shane McIntosh A thesis submitted to the School of Computing in conformity with the requirements for the degree of Master of Science Queen's University Kingston, Ontario, Canada January 2011 Copyright c Shane McIntosh, 2011 Abstract As a software project ages, its source code is improved by refining existing features, adding new ones, and fixing bugs. Software developers can attest that such changes often require accompanying changes to the infrastructure that converts source code into executable software packages, i.e., the build system. Intuition suggests that these build system changes slow down development progress by diverting developer focus away from making improvements to the source code. While source code evolution and maintenance is studied extensively, there is little work that focuses on the build system. In this thesis, we empirically study the static and dynamic evolution of build system complexity in proprietary and open source projects. To help counter potential bias of the study, 13 projects with different sizes, domains, build technologies, and release strategies were selected for examination, including Eclipse, Linux, Mozilla, and JBoss. We find that: (1) similar to Lehman's first law of software evolution, Java build system specifications tend to grow unless explicit effort is invested into restructuring them, (2) the build system accounts for up to 31% of the code files in a project, and (3) up to 27% of source code related development tasks require build maintenance. Project managers should include build maintenance effort of this magnitude in their project planning and budgeting estimations. i Co-authorship Earlier versions of the work in this thesis were published as listed below: 1) The Evolution of ANT Build Systems (Chapter 4) Shane McIntosh, Bram Adams, and Ahmed E.
    [Show full text]
  • Compile Time and Runtime
    COMPILE TIME AND RUNTIME Compile time and runtime are two distinctly different times during the active life of a computer program. Compile time is when the program is compiled; runtime is when it executes (on either a physical or virtual computer). Programmers use the term static to refer to anything that is created during compile time and stays fixed during the program run. They use the term dynamic to refer to things that are created and can change during execution. Example In application MyApp shown below, the data type of variable tax is statically set to double for the entire program run. Its value is dynamically set when the assignment statement (marked **) executes. public class MyApp { public static void main( String [] args ) { double tax; . ** tax = price * TAX_RATE; . } } Compiler Tasks The main tasks of the compiler are to: Check program statements for errors and report them. Generate the machine instructions that carry out the operations specified by the program. To do this, the compiler must gather as much information as possible about the items (e.g. variables, classes, objects, etc.) used in the program. Compile Time and Runtime Page 1 Example Consider this Java assignment statement: x = y2 + z; To determine if it is correct, the compiler needs to know if y2 is a declared variable (perhaps the programmer meant to type y*2). To generate correct bytecode, the compiler needs to know the data types of the variables (integer addition is a different machine instruction than floating-point addition). Declarations Program text that is primarily meant to provide information to the compiler is called a declaration, which the compiler processes by collecting the information given.
    [Show full text]