Automatic Performance Tuning of Sparse Matrix Kernels

Total Page:16

File Type:pdf, Size:1020Kb

Automatic Performance Tuning of Sparse Matrix Kernels Automatic Performance Tuning of Sparse Matrix Kernels by Richard Wilson Vuduc B.S. (Cornell University) 1997 A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the GRADUATE DIVISION of the UNIVERSITY OF CALIFORNIA, BERKELEY Committee in charge: Professor James W. Demmel, Chair Professor Katherine A. Yelick Professor Sanjay Govindjee Fall 2003 The dissertation of Richard Wilson Vuduc is approved: Chair Date Date Date University of California, Berkeley Fall 2003 Automatic Performance Tuning of Sparse Matrix Kernels Copyright 2003 by Richard Wilson Vuduc 1 Abstract Automatic Performance Tuning of Sparse Matrix Kernels by Richard Wilson Vuduc Doctor of Philosophy in Computer Science University of California, Berkeley Professor James W. Demmel, Chair This dissertation presents an automated system to generate highly efficient, platform- adapted implementations of sparse matrix kernels. These computational kernels lie at the heart of diverse applications in scientific computing, engineering, economic modeling, and information retrieval, to name a few. Informally, sparse kernels are computational oper- ations on matrices whose entries are mostly zero, so that operations with and storage of these zero elements may be eliminated. The challenge in developing high-performance im- plementations of such kernels is choosing the data structure and code that best exploits the structural properties of the matrix|generally unknown until application run-time|for high-performance on the underlying machine architecture (e.g., memory hierarchy con- figuration and CPU pipeline structure). We show that conventional implementations of important sparse kernels like sparse matrix-vector multiply (SpMV) have historically run at 10% or less of peak machine speed on cache-based superscalar architectures. Our imple- mentations of SpMV, automatically tuned using a methodology based on empirical-search, can by contrast achieve up to 31% of peak machine speed, and can be up to 4 faster. × Given a matrix, kernel, and machine, our approach to selecting a fast implemen- tation consists of two steps: (1) we identify and generate a space of reasonable implemen- tations, and then (2) search this space for the fastest one using a combination of heuristic models and actual experiments (i.e., running and timing the code). We build on the Spar- sity system for generating highly-tuned implementations of the SpMV kernel y y + Ax, where A is a sparse matrix and x; y are dense vectors. We extend Sparsity to support tuning for a variety of common non-zero patterns arising in practice, and for additional 2 kernels like sparse triangular solve (SpTS) and computation of ATA x (or AAT x) and Aρ x. · · · We develop new models to compute, for particular data structures and kernels, the best absolute performance (e.g., Mflop/s) we might expect on a given matrix and machine. These performance upper bounds account for the cost of memory operations at all levels of the memory hierarchy, but assume ideal instruction scheduling and low-level tuning. We evaluate our performance with respect to such bounds, finding that the generated and tuned implementations of SpMV and SpTS achieve up to 75% of the performance bound. This finding places limits on the effectiveness of additional low-level tuning (e.g., better instruction selection and scheduling). Instances in which we are further from the bounds indicate new opportunities to close the gap by applying existing automatic low-level tuning technology. We also use these bounds to assess partially what architectures are good for kernels like SpMV. Among other conclusions, we find that performance improvements may be possible for SpMV and other streaming applications by ensuring strictly increasing cache line sizes in multi-level memory hierarchies. The costs and steps of tuning imply changes to the design of sparse matrix libraries. We propose extensions to the recent standardized interface, the Sparse Basic Linear Algebra Subroutines (SpBLAS). We argue that such an extended interface complements existing approaches to sparse code generation, and furthermore is a suitable building block for widely-used higher-level scientific libraries and systems (e.g., PETSc and MATLAB) to provide users with high-performance sparse kernels. Looking toward future tuning systems, we consider an aspect of the tuning problem that is common to all current systems: the problem of search. Specifically, we pose two search-related problems. First, we consider the problem of stopping an exhaustive search while providing approximate bounds on the probability that an optimal implementation has been found. Second, we consider the problem of choosing at run-time one from among several possible implementations based on the run-time input. We formalize both problems in a manner amenable to attack by statistical modeling techniques. Our methods may potentially apply broadly to tuning systems for as yet unexplored domains. Professor James W. Demmel Dissertation Committee Chair i To my bringer of wing zings. ii Contents List of Figures vi List of Tables xi List of Symbols xiv 1 Introduction 1 1.1 Contributions ................................... 5 1.2 Problem Context and History .......................... 6 1.2.1 Hardware and software trends in sparse kernel performance ..... 6 1.2.2 Emergence of standard library interfaces ................ 8 1.3 The Case for Search ............................... 11 1.4 Summary, Scope, and Outline .......................... 15 2 Basic Sparse Matrix Data Structures 18 2.1 Survey of Basic Data Structures ........................ 20 2.1.1 Example: Contrasting dense and sparse storage ............ 20 2.1.2 Dense storage formats .......................... 21 2.1.3 Sparse vector formats .......................... 25 2.1.4 Sparse matrix formats .......................... 26 2.2 Experimental Comparison of the Basic Formats ................ 39 2.2.1 Experimental setup ............................ 40 2.2.2 Results on the Sparsity matrix benchmark suite ........... 41 2.3 Note on Recursive Storage Formats ....................... 45 2.4 Summary ..................................... 45 3 Improved Register-Level Tuning of Sparse Matrix-Vector Multiply 51 3.1 Register Blocked Sparse Matrix-Vector Multiply ............... 53 3.1.1 Register blocking overview ........................ 54 3.1.2 Surprising performance behavior in practice .............. 58 3.2 An Improved Heuristic for Block Size Selection ................ 67 3.2.1 A fill ratio estimation algorithm .................... 68 3.2.2 Tuning the fill estimator: cost and accuracy trade-offs ........ 71 3.3 Evaluation of the Heuristic: Accuracy and Costs ............... 77 iii 3.3.1 Accuracy of the Sparsity Version 2 heuristic ............. 78 3.3.2 The costs of register block size tuning ................. 85 3.4 Summary ..................................... 90 4 Performance Bounds for Sparse Matrix-Vector Multiply 93 4.1 A Performance Bounds Model for Register Blocking ............. 95 4.1.1 Modeling framework and summary of key assumptions ........ 96 4.1.2 A latency-based model of execution time ................ 97 4.1.3 Lower (and upper) bounds on cache misses .............. 98 4.2 Experimental Evaluation of the Bounds Model ................ 100 4.2.1 Determining the model latencies .................... 102 4.2.2 Cache miss model validation ...................... 109 4.2.3 Key results: observed performance vs. the bounds model ...... 121 4.3 Related Work ................................... 137 4.4 Summary ..................................... 141 5 Advanced Data Structures for Sparse Matrix-Vector Multiply 143 5.1 Splitting variable-block matrices ........................ 144 5.1.1 Test matrices and a motivating example ................ 147 5.1.2 Altering the non-zero distribution of blocks using fill ......... 149 5.1.3 Choosing a splitting ........................... 154 5.1.4 Experimental results ........................... 156 5.2 Exploiting diagonal structure .......................... 165 5.2.1 Test matrices and motivating examples ................ 165 5.2.2 Row segmented diagonal format .................... 166 5.2.3 Converting to row segmented diagonal format ............. 169 5.2.4 Experimental results ........................... 171 5.3 Summary and overview of additional techniques ................ 179 6 Performance Tuning and Bounds for Sparse Triangular Solve 184 6.1 Optimization Techniques ............................. 186 6.1.1 Improving register reuse: register blocking ............... 188 6.1.2 Using the dense BLAS: switch-to-dense ................ 189 6.1.3 Tuning parameter selection ....................... 189 6.2 Performance Bounds ............................... 191 6.2.1 Review of the latency-based execution time model .......... 192 6.2.2 Cache miss lower and upper bounds .................. 193 6.3 Performance Evaluation ............................. 194 6.3.1 Validating the cache miss bounds .................... 194 6.3.2 Evaluating optimized SpTS performance ................ 195 6.4 Related Work ................................... 197 6.5 Summary ..................................... 198 iv 7 Higher-Level Sparse Kernels 202 7.1 Automatically Tuning ATA x for the Memory Hierarchy ........... 203 7.2 Upper Bounds on ATA x Performance· ..................... 205 7.2.1 A latency-based· execution time model ................. 207 7.2.2 A lower bound on cache misses ..................... 207 7.3 Experimental Results
Recommended publications
  • Cygwin User's Guide
    Cygwin User’s Guide Cygwin User’s Guide ii Copyright © Cygwin authors Permission is granted to make and distribute verbatim copies of this documentation provided the copyright notice and this per- mission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this documentation under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this documentation into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation. Cygwin User’s Guide iii Contents 1 Cygwin Overview 1 1.1 What is it? . .1 1.2 Quick Start Guide for those more experienced with Windows . .1 1.3 Quick Start Guide for those more experienced with UNIX . .1 1.4 Are the Cygwin tools free software? . .2 1.5 A brief history of the Cygwin project . .2 1.6 Highlights of Cygwin Functionality . .3 1.6.1 Introduction . .3 1.6.2 Permissions and Security . .3 1.6.3 File Access . .3 1.6.4 Text Mode vs. Binary Mode . .4 1.6.5 ANSI C Library . .4 1.6.6 Process Creation . .5 1.6.6.1 Problems with process creation . .5 1.6.7 Signals . .6 1.6.8 Sockets . .6 1.6.9 Select . .7 1.7 What’s new and what changed in Cygwin . .7 1.7.1 What’s new and what changed in 3.2 .
    [Show full text]
  • Linear Algebra Libraries
    Linear Algebra Libraries Claire Mouton [email protected] March 2009 Contents I Requirements 3 II CPPLapack 4 III Eigen 5 IV Flens 7 V Gmm++ 9 VI GNU Scientific Library (GSL) 11 VII IT++ 12 VIII Lapack++ 13 arXiv:1103.3020v1 [cs.MS] 15 Mar 2011 IX Matrix Template Library (MTL) 14 X PETSc 16 XI Seldon 17 XII SparseLib++ 18 XIII Template Numerical Toolkit (TNT) 19 XIV Trilinos 20 1 XV uBlas 22 XVI Other Libraries 23 XVII Links and Benchmarks 25 1 Links 25 2 Benchmarks 25 2.1 Benchmarks for Linear Algebra Libraries . ....... 25 2.2 BenchmarksincludingSeldon . 26 2.2.1 BenchmarksforDenseMatrix. 26 2.2.2 BenchmarksforSparseMatrix . 29 XVIII Appendix 30 3 Flens Overloaded Operator Performance Compared to Seldon 30 4 Flens, Seldon and Trilinos Content Comparisons 32 4.1 Available Matrix Types from Blas (Flens and Seldon) . ........ 32 4.2 Available Interfaces to Blas and Lapack Routines (Flens and Seldon) . 33 4.3 Available Interfaces to Blas and Lapack Routines (Trilinos) ......... 40 5 Flens and Seldon Synoptic Comparison 41 2 Part I Requirements This document has been written to help in the choice of a linear algebra library to be included in Verdandi, a scientific library for data assimilation. The main requirements are 1. Portability: Verdandi should compile on BSD systems, Linux, MacOS, Unix and Windows. Beyond the portability itself, this often ensures that most compilers will accept Verdandi. An obvious consequence is that all dependencies of Verdandi must be portable, especially the linear algebra library. 2. High-level interface: the dependencies should be compatible with the building of the high-level interface (e.
    [Show full text]
  • Technical Computing on the OS … That Is Not Linux! Or How to Leverage Everything You‟Ve Learned, on a Windows Box As Well
    Tools of the trade: Technical Computing on the OS … that is not Linux! Or how to leverage everything you‟ve learned, on a Windows box as well Sean Mortazavi & Felipe Ayora Typical situation with TC/HPC folks Why I have a Windows box How I use it It was in the office when I joined Outlook / Email IT forced me PowerPoint I couldn't afford a Mac Excel Because I LIKE Windows! Gaming It's the best gaming machine Technical/Scientific computing Note: Stats completely made up! The general impression “Enterprise community” “Hacker community” Guys in suits Guys in jeans Word, Excel, Outlook Emacs, Python, gmail Run prepackaged stuff Builds/runs OSS stuff Common complaints about Windows • I have a Windows box, but Windows … • Is hard to learn… • Doesn‟t have a good shell • Doesn‟t have my favorite editor • Doesn‟t have my favorite IDE • Doesn‟t have my favorite compiler or libraries • Locks me in • Doesn‟t play well with OSS • …. • In summary: (More like ) My hope … • I have a Windows box, and Windows … • Is easy to learn… • Has excellent shells • Has my favorite editor • Supports my favorite IDE • Supports my compilers and libraries • Does not lock me in • Plays well with OSS • …. • In summary: ( or at least ) How? • Recreating a Unix like veneer over windows to minimize your learning curve • Leverage your investment in know how & code • Showing what key codes already run natively on windows just as well • Kicking the dev tires using cross plat languages Objective is to: Help you ADD to your toolbox, not take anything away from it! At a high level… • Cygwin • SUA • Windowing systems “The Unix look & feel” • Standalone shell/utils • IDE‟s • Editors General purpose development • Compilers / languages / Tools • make • Libraries • CAS environments Dedicated CAS / IDE‟s And if there is time, a couple of demos… Cygwin • What is it? • A Unix like environment for Windows.
    [Show full text]
  • User Guide for CHOLMOD: a Sparse Cholesky Factorization and Modification Package
    User Guide for CHOLMOD: a sparse Cholesky factorization and modification package Timothy A. Davis [email protected], http://www.suitesparse.com VERSION 3.0.14, Oct 22, 2019 Abstract CHOLMOD1 is a set of routines for factorizing sparse symmetric positive definite matrices of the form A or AAT, updating/downdating a sparse Cholesky factorization, solving linear systems, updating/downdating the solution to the triangular system Lx = b, and many other sparse matrix functions for both symmetric and unsymmetric matrices. Its supernodal Cholesky factorization relies on LAPACK and the Level-3 BLAS, and obtains a substantial fraction of the peak performance of the BLAS. Both real and complex matrices are supported. It also includes a non-supernodal LDLT factorization method that can factorize symmetric indefinite matrices if all of their leading submatrices are well-conditioned (D is diagonal). CHOLMOD is written in ANSI/ISO C, with both C and MATLAB interfaces. This code works on Microsoft Windows and many versions of Unix and Linux. CHOLMOD Copyright c 2005-2015 by Timothy A. Davis. Portions are also copyrighted by William W. Hager (the Modify Module), and the University of Florida (the Partition and Core Modules). All Rights Reserved. See CHOLMOD/Doc/License.txt for the license. CHOLMOD is also available under other licenses that permit its use in proprietary applications; contact the authors for details. See http://www.suitesparse.com for the code and all documentation, including this User Guide. 1CHOLMOD is short for CHOLesky MODification, since a key feature of the package is its ability to up- date/downdate a sparse Cholesky factorization 1 Contents 1 Overview 8 2 Primary routines and data structures 9 3 Simple example program 11 4 Installation of the C-callable library 12 5 Using CHOLMOD in MATLAB 15 5.1 analyze: order and analyze .
    [Show full text]
  • Apple File System Reference
    Apple File System Reference Developer Contents About Apple File System 7 General-Purpose Types 9 paddr_t .................................................. 9 prange_t ................................................. 9 uuid_t ................................................... 9 Objects 10 obj_phys_t ................................................ 10 Supporting Data Types ........................................... 11 Object Identifier Constants ......................................... 12 Object Type Masks ............................................. 13 Object Types ................................................ 14 Object Type Flags .............................................. 20 EFI Jumpstart 22 Booting from an Apple File System Partition ................................. 22 nx_efi_jumpstart_t ........................................... 24 Partition UUIDs ............................................... 25 Container 26 Mounting an Apple File System Partition ................................... 26 nx_superblock_t ............................................. 27 Container Flags ............................................... 36 Optional Container Feature Flags ...................................... 37 Read-Only Compatible Container Feature Flags ............................... 38 Incompatible Container Feature Flags .................................... 38 Block and Container Sizes .......................................... 39 nx_counter_id_t ............................................. 39 checkpoint_mapping_t ........................................
    [Show full text]
  • Optimizing the TLB Shootdown Algorithm with Page Access Tracking
    Optimizing the TLB Shootdown Algorithm with Page Access Tracking Nadav Amit, VMware Research https://www.usenix.org/conference/atc17/technical-sessions/presentation/amit This paper is included in the Proceedings of the 2017 USENIX Annual Technical Conference (USENIX ATC ’17). July 12–14, 2017 • Santa Clara, CA, USA ISBN 978-1-931971-38-6 Open access to the Proceedings of the 2017 USENIX Annual Technical Conference is sponsored by USENIX. Optimizing the TLB Shootdown Algorithm with Page Access Tracking Nadav Amit VMware Research Abstract their TLBs according to the information supplied by The operating system is tasked with maintaining the the initiator core, and they report back when they coherency of per-core TLBs, necessitating costly syn- are done. TLB shootdown can take microseconds, chronization operations, notably to invalidate stale causing a notable slowdown [48]. Performing TLB mappings. As core-counts increase, the overhead of shootdown in hardware, as certain CPUs do, is faster TLB synchronization likewise increases and hinders but still incurs considerable overheads [22]. scalability, whereas existing software optimizations In addition to reducing performance, shootdown that attempt to alleviate the problem (like batching) overheads can negatively affect the way applications are lacking. are constructed. Notably, to avoid shootdown la- We address this problem by revising the TLB tency, programmers are advised against using mem- synchronization subsystem. We introduce several ory mappings, against unmapping them, and even techniques that detect cases whereby soon-to-be against building multithreaded applications [28, 42]. invalidated mappings are cached by only one TLB But memory mappings are the efficient way to use or not cached at all, allowing us to entirely avoid persistent memory [18, 47], and avoiding unmap- the cost of synchronization.
    [Show full text]
  • PARDISO User Guide Version
    Parallel Sparse Direct And Multi-Recursive Iterative Linear Solvers PARDISO User Guide Version 7.2 (Updated December 28, 2020) Olaf Schenk and Klaus Gärtner Parallel Sparse Direct Solver PARDISO — User Guide Version 7.2 1 Since a lot of time and effort has gone into PARDISO version 7.2, please cite the following publications if you are using PARDISO for your own research: References [1] C. Alappat, A. Basermann, A. R. Bishop, H. Fehske, G. Hager, O. Schenk, J. Thies, G. Wellein. A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication. ACM Trans. Parallel Comput, Vol. 7, No. 3, Article 19, June 2020. [2] M. Bollhöfer, O. Schenk, R. Janalik, S. Hamm, and K. Gullapalli. State-of-The-Art Sparse Direct Solvers. Parallel Algorithms in Computational Science and Engineering, Modeling and Simulation in Science, Engineering and Technology. pp 3-33, Birkhäuser, 2020. [3] M. Bollhöfer, A. Eftekhari, S. Scheidegger, O. Schenk, Large-scale Sparse Inverse Covariance Matrix Estimation. SIAM Journal on Scientific Computing, 41(1), A380-A401, January 2019. Note: This document briefly describes the interface to the shared-memory and distributed-memory multiprocessing parallel direct and multi-recursive iterative solvers in PARDISO 7.21. A discussion of the algorithms used in PARDISO and more information on the solver can be found at http://www. pardiso-project.org 1Please note that this version is a significant extension compared to Intel’s MKL version and that these improvements and features are not available in Intel’s MKL 9.3 release or later releases. Parallel Sparse Direct Solver PARDISO — User Guide Version 7.2 2 Contents 1 Introduction 4 1.1 Supported Matrix Types .
    [Show full text]
  • Build Guide for Mingw (Draft)
    GnuCOBOL 3.1-dev Build Guide for MinGW (draft) GnuCOBOL 3.1-dev [25AUG2019] Build Guide for MinGW - Minimalist GNU for Windows cobc (GnuCOBOL) 3.1-dev Copyright (C) 2018 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Written by Keisuke Nishida, Roger While, Ron Norman, Simon Sobisch, Edward Hart This document was prepared by: Arnold J. Trembley ([email protected]) and last updated Monday, 02 September 2019. My original starting point for building GnuCOBOL and writing these guides was the manual written by Gary Cutler ([email protected] ) . OpenCOBOL-1.1-06FEB2009-Build-Guide-For-MinGW.pdf https://www.arnoldtrembley.com/OpenCOBOL-1.1-06FEB2009-Build-Guide-For-MinGW.pdf Simon Sobisch of the GnuCOBOL project was extremely helpful whenever I encountered difficulties building GnuCOBOL, especially in running the compiler test suites and VBISAM. Brian Tiffin also provided assistance and encouragement. The GnuCOBOL project space can be found here: https://sourceforge.net/projects/open-cobol/ Page 1 GnuCOBOL 3.1-dev Build Guide for MinGW (draft) Required Components: You will need to download the following components in order to build the GnuCOBOL 3.0 compiler in Windows: 1. MinGW - Minimalist GNU for Windows 2. GNU Multiple-Precision Arithmetic package (gmplib) 3. PDCurses 4.0.2 - used for screen read and write. (alternate versions available) 4. Berkeley Database (BDB) package from Oracle ==OR== 5. VBISAM2 by Roger While 6.
    [Show full text]
  • Impact of Tool Support in Patch Construction Anil Koyuncu, Tegawendé Bissyandé, Dongsun Kim, Jacques Klein, Martin Monperrus, Yves Le Traon
    Impact of Tool Support in Patch Construction Anil Koyuncu, Tegawendé Bissyandé, Dongsun Kim, Jacques Klein, Martin Monperrus, Yves Le Traon To cite this version: Anil Koyuncu, Tegawendé Bissyandé, Dongsun Kim, Jacques Klein, Martin Monperrus, et al.. Impact of Tool Support in Patch Construction. 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, Jul 2017, Santa Barbara, United States. pp.237–248, 10.1145/3092703.3092713. hal-01575214 HAL Id: hal-01575214 https://hal.archives-ouvertes.fr/hal-01575214 Submitted on 18 Aug 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Impact of Tool Support in Patch Construction Anil Koyuncu Tegawendé F. Bissyandé Dongsun Kim SnT, University of Luxembourg - SnT, University of Luxembourg - SnT, University of Luxembourg - Luxembourg Luxembourg Luxembourg Jacques Klein Martin Monperrus Yves Le Traon SnT, University of Luxembourg - Inria, University of Lille - France SnT, University of Luxembourg - Luxembourg Luxembourg ABSTRACT 1 INTRODUCTION In this work, we investigate the practice of patch construction in Patch construction is a key task in software development. In par- the Linux kernel development, focusing on the dierences between ticular, it is central to the repair process when developers must three patching processes: (1) patches crafted entirely manually to engineer change operations for xing the buggy code.
    [Show full text]
  • Section “Introduction” in Using the GNU Compiler Collection (GCC)
    Using the GNU Compiler Collection (GCC) Using the GNU Compiler Collection by Richard M. Stallman and the GCC Developer Community For GCC Version 4.1.2 Published by: GNU Press Website: www.gnupress.org a division of the General: [email protected] Free Software Foundation Orders: [email protected] 51 Franklin Street, Fifth Floor Tel 617-542-5942 Boston, MA 02110-1301 USA Fax 617-542-2652 Last printed October 2003 for GCC 3.3.1. Printed copies are available for $45 each. Copyright c 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being “GNU General Public License” and “Funding Free Software”, the Front-Cover texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in the section entitled “GNU Free Documentation License”. (a) The FSF’s Front-Cover Text is: A GNU Manual (b) The FSF’s Back-Cover Text is: You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development. i Short Contents Introduction ...................................... 1 1 Programming Languages Supported by GCC ............ 3 2 Language Standards Supported by GCC ............... 5 3 GCC Command Options .......................... 7 4 C Implementation-defined behavior ................. 205 5 Extensions to the C Language Family ...............
    [Show full text]
  • FINAL REPORT an Analysis of Biomass Burning Impacts on Texas
    131 Hartwell Avenue Lexington, Massachusetts 02421-3126 USA Tel: +1 781 761-2288 Fax: +1 781 761-2299 www.aer.com FINAL REPORT An Analysis of Biomass Burning Impacts on Texas TCEQ Contract No. 582-15-50414 Work Order No. 582-16-62311-03 Revision 2.0 Prepared by: Matthew Alvarado, Christopher Brodowski, and Chantelle Lonsdale Atmospheric and Environmental Research, Inc. (AER) 131 Hartwell Ave. Lexington, MA 02466 Correspondence to: [email protected] Prepared for: Erik Gribbin Texas Commission on Environmental Quality Air Quality Division Building E, Room 342S Austin, TX 78711-3087 June 30, 2016 Work Order No. 582-16-62311-03 Final Report Document Change Record Revision Revision Date Remarks 0.1 08 June 2016 Internal Version for Review 1.0 10 June 2016 Draft Version Submitted to TCEQ for Review 2.0 30 June 2016 Final Report Submitted to TCEQ 2 Work Order No. 582-16-62311-03 Final Report TABLE OF CONTENTS Executive Summary ....................................................................................................................... 8 1. Introduction ........................................................................................................................... 10 1.1 Project Objectives .........................................................................................................................10 1.2 Background .....................................................................................................................................10 1.3 Report Outline ...............................................................................................................................11
    [Show full text]
  • Endicheck: Dynamic Analysis for Detecting Endianness Bugs
    Endicheck: Dynamic Analysis for Detecting Endianness Bugs Roman Kapl´ and Pavel Par´ızek TACAS Artifact Evaluation Department of Distributed and Dependable Systems, 2020 Faculty of Mathematics and Physics, Charles University, Accepted Prague, Czech Republic Abstract. Computers store numbers in two mutually incompatible ways: little- endian or big-endian. They differ in the order of bytes within representation of numbers. This ordering is called endianness. When two computer systems, pro- grams or devices communicate, they must agree on which endianness to use, in order to avoid misinterpretation of numeric data values. We present Endicheck, a dynamic analysis tool for detecting endianness bugs, which is based on the popular Valgrind framework. It helps developers to find those code locations in their program where they forgot to swap bytes prop- erly. Endicheck requires less source code annotations than existing tools, such as Sparse used by Linux kernel developers, and it can also detect potential bugs that would only manifest if the given program was run on computer with an oppo- site endianness. Our approach has been evaluated and validated on the Radeon SI Linux OpenGL driver, which is known to contain endianness-related bugs, and on several open-source programs. Results of experiments show that Endicheck can successfully identify many endianness-related bugs and provide useful diagnostic messages together with the source code locations of respective bugs. 1 Introduction Modern computers represent and store numbers in two mutually incompatible ways: little-endian (with the least-significant byte first) or big endian (the most-significant byte first). The byte order is also referred to as endianness.
    [Show full text]