ARM® Compiler Toolchain Version 4.1

ARM® Compiler toolchain Version 4.1
Using the Compiler

Copyright © 2010-2011 ARM. All rights reserved.
ARM DUI 0472C (ID080411)

Release Information

The following changes have been made to this book.

Change History

Date                Issue   Confidentiality    Change
28 May 2010         A       Non-Confidential   ARM Compiler toolchain v4.1 Release
30 September 2010   B       Non-Confidential   Update 1 for ARM Compiler toolchain v4.1
28 January 2011     C       Non-Confidential   Update 2 for ARM Compiler toolchain v4.1 Patch 3
30 April 2011       C       Non-Confidential   Update 3 for ARM Compiler toolchain v4.1 Patch 4
30 September 2011   C       Non-Confidential   Update 4 for ARM Compiler toolchain v4.1 Patch 5

Proprietary Notice

Words and logos marked with ™ or ® are registered trademarks or trademarks of ARM in the EU and other countries, except as otherwise stated below in this proprietary notice. Other brands and names mentioned herein may be the trademarks of their respective owners.

Neither the whole nor any part of the information contained in, or the product described in, this document may be adapted or reproduced in any material form except with the prior written permission of the copyright holder.

The product described in this document is subject to continuous developments and improvements. All particulars of the product and its use contained in this document are given by ARM in good faith. However, all warranties implied or expressed, including but not limited to implied warranties of merchantability, or fitness for purpose, are excluded.

This document is intended only to assist the reader in the use of the product. ARM shall not be liable for any loss or damage arising from the use of any information in this document, or any error or omission in such information, or any incorrect use of the product.

Where the term ARM is used it means “ARM or any of its subsidiaries as appropriate”.
Some material in this document is based on IEEE 754-1985 IEEE Standard for Binary Floating-Point Arithmetic. The IEEE disclaims any responsibility or liability resulting from the placement and use in the described manner.

Confidentiality Status

This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in accordance with the terms of the agreement entered into by ARM and the party that ARM delivered this document to.

Product Status

The information in this document is final, that is for a developed product.

Web Address

http://www.arm.com

Contents

Chapter 1 Conventions and Feedback

Chapter 2 Overview of the compiler
    2.1 The compiler
    2.2 Source language modes of the compiler
    2.3 The C and C++ libraries

Chapter 3 Getting started with the Compiler
    3.1 Compiler command-line syntax
    3.2 Compiler command-line options listed by group
    3.3 Default compiler behavior
    3.4 Order of compiler command-line options
    3.5 Using stdin to input source code to the compiler
    3.6 Directing output to stdout
    3.7 Filename suffixes recognized by the compiler
    3.8 Compiler output files
    3.9 Factors influencing how the compiler searches for header files
    3.10 Compiler command-line options and search paths
    3.11 Compiler search rules and the current place
    3.12 The ARMCC41INC environment variable
    3.13 Code compatibility between separately compiled and assembled modules
    3.14 Linker feedback during compilation
    3.15 Unused function code
    3.16 Minimizing code size by eliminating unused functions during compilation
    3.17 Minimizing code size by reducing compilation required for interworking
    3.18 Compilation build time
    3.19 How to minimize compilation build time
    3.20 Minimizing compilation build time with a single armcc invocation
    3.21 Effect of --multifile on compilation build time
    3.22 Minimizing compilation build time with parallel make
    3.23 Compilation build time and operating system choice

Chapter 4 Using the NEON Vectorizing Compiler
    4.1 NEON technology
    4.2 The NEON unit
    4.3 Methods of writing code for NEON
    4.4 Generating NEON instructions from C or C++ code
    4.5 NEON C extensions
    4.6 Automatic vectorization
    4.7 Data references within a vectorizable loop
    4.8 Stride patterns and data accesses
    4.9 Factors affecting NEON vectorization performance
    4.10 NEON vectorization performance goals
    4.11 Recommended loop structure for vectorization
    4.12 Data dependency conflicts when vectorizing code
    4.13 Carry-around scalar variables and vectorization
    4.14 Reduction of a vector to a scalar
    4.15 Vectorization on loops containing pointers
    4.16 Nonvectorization on loops containing pointers and indirect addressing
    4.17 Nonvectorization on conditional loop exits
    4.18 Vectorizable loop iteration counts
    4.19 Indicating loop iteration counts to the compiler with __promise(expr)
    4.20 Vectorizable and nonvectorizable use of structures
    4.21 Grouping use of structures for vectorization
    4.22 struct member lengths and vectorization
    4.23 Nonvectorization of function calls to non-inline functions from within loops
    4.24 Conditional statements and efficient vectorization
    4.25 Vectorization diagnostics to tune code for improved performance
    4.26 Vectorizable code example
    4.27 DSP vectorizable code example
    4.28 What can limit or prevent automatic vectorization

Chapter 5 Compiler Features
    5.1 About Profiler-guided optimization
    5.2 Profiler-guided optimizations with link-time code generation
    5.3 Compiler intrinsics