Toward Understanding Compiler Bugs in GCC and LLVM

Chengnian Sun, Vu Le, Qirun Zhang, Zhendong Su
Department of Computer Science, University of California, Davis, USA
{cnsun, vmle, qrzhang, su}@ucdavis.edu

Published at ISSTA 2016, Saarbrücken, Germany. DOI: http://dx.doi.org/10.1145/2931037.2931074

ABSTRACT

Compilers are critical, widely-used complex software. Bugs in them have significant impact, and can cause serious damage when they silently miscompile a safety-critical application. An in-depth understanding of compiler bugs can help detect and fix them. To this end, we conduct the first empirical study on the characteristics of the bugs in two mainstream compilers, GCC and LLVM. Our study is significant in scale — it exhaustively examines about 50K bugs and 30K bug fix revisions over more than a decade's span.

This paper details our systematic study. Summary findings include: (1) In both compilers, C++ is the most buggy component, accounting for around 20% of the total bugs and twice as many as the second most buggy component; (2) the bug-revealing test cases are typically small, with 80% having fewer than 45 lines of code; (3) most of the bug fixes touch a single source file with small modifications (43 lines for GCC and 38 for LLVM on average); (4) the average lifetime of GCC bugs is 200 days, and 111 days for LLVM; and (5) high priority tends to be assigned to optimizer bugs, most notably 30% of the bugs in GCC's inter-procedural analysis component are labeled P1 (the highest priority).

This study deepens our understanding of compiler bugs. For application developers, it shows that even mature production compilers still have many bugs, which may affect development. For researchers and compiler developers, it sheds light on interesting characteristics of compiler bugs, and highlights challenges and opportunities to more effectively test and debug compilers.

CCS Concepts
•General and reference → Empirical studies; •Software and its engineering → Software testing and debugging;

Keywords
empirical studies, compiler bugs, compiler testing

1. INTRODUCTION

Compilers are an important category of system software. Extensive research and development efforts have been devoted to increasing compilers' performance and reliability. However, it may still surprise application developers that, similar to application software, production compilers also contain bugs, and in fact quite many. Furthermore, compiler bugs impact application code, and can even lead to severe damage, especially when a buggy compiler is used to compile safety-critical applications.

Different from most application bugs, compiler bugs are difficult to recognize as they usually manifest indirectly as application failures. For example, a compiler bug may cause a program to be optimized and transformed into a wrong executable, and the bug manifests only when the executable misbehaves. Even worse, in most cases the application developer will first assume the misbehavior is caused by some bug introduced by herself/himself, and it may take a long time for her/him to realize that the compiler is the culprit.

In order to better understand compiler bugs, we conduct the first empirical study on their characteristics. Although there have been a number of empirical studies on software bugs [6, 7, 17, 25, 27, 31, 36, 37], none focuses on compiler bugs. For example, Lu et al. [17] study the characteristics of concurrency bugs, Sahoo et al. [25] investigate the bugs in server software, and Chou et al. [7] research the errors in operating system kernels. In contrast, this paper examines the bugs of two mainstream production compilers, GCC and LLVM, in total 39,890 bugs of GCC and 12,842 bugs of LLVM. In order to explore the properties of compiler bug fixes, we also examine 22,947 GCC revisions and 8,452 LLVM revisions, which are fixes to most of these bugs.
In particular, we attempt to investigate compiler bugs along four central aspects:

(1) Location of Bugs. We compute the distribution of bugs across compiler components and source files. The data shows that in both GCC and LLVM, the component C++ is always the most buggy one, with a defect rate twice that of the second most buggy component. In addition, we find that most of the source files contain only one bug (60% for GCC and 53% for LLVM), and most of the top ten buggy source files belong to the C++ front end.

(2) Test Cases, Localization and Fixes of Bugs. We investigate the size of the test cases that trigger compiler bugs, and find that on average GCC test cases have 32 lines of code and those of LLVM have 29 lines. This observation can guide random testing of compilers. For example, instead of generating large simple test programs, compiler testing may be more effective when generating small, yet complex test programs. Moreover, it confirms that most of the bugs can be effectively reduced by techniques such as Delta [38] or C-Reduce [24]. The fixes of compiler bugs are not big either; on average they only touch 43 lines of code for GCC and 38 for LLVM. 92% of them involve fewer than 100 lines of code changes, and 58% for GCC and 54% for LLVM involve only one function. Our findings reveal that most of the bug fixes are local. Even for optimizers, each bug is usually confined to a single optimization pass.

(3) Duration of Bugs. We compute information on the duration of bugs (i.e. the time between when a bug was filed and when it was closed), and show that the average duration of GCC bugs is 200 days and that of LLVM bugs is 111 days. We also show that the GCC community confirms bugs faster than LLVM but spends more time fixing them.

(4) Priorities of Bugs. The priority field of a bug report is determined by developers for prioritizing bugs, expressing the order in which bugs should be fixed. We investigate the distribution of bugs over priorities. Of the GCC bugs, 68% have the default P3 priority, and only 6.02% are labeled as P1. We then study how priorities are correlated with components. The inter-procedural analysis component of GCC, ipa, is the most "impactful" as 30% of its bugs are labeled as P1. We also study how priorities affect the lifetime of bugs. On average, P1 bugs are fixed the fastest, whereas P5 bugs are fixed the slowest. However, for the rest, the fix time is not strictly correlated with their priorities.

As a proof-of-concept demonstration of the practical potential of our findings, we design a simple yet effective program mutation algorithm, tkfuzz, to find compiler crashing bugs, by leveraging the second observation that bug-revealing test programs are usually small. Applying tkfuzz on the 36,966 test programs in the GCC test suite (with an average of 32 lines of code each), we have found 18 crashing bugs in GCC and LLVM, of which 12 have already been fixed or confirmed.

The results presented in this paper provide further insights into understanding compiler bugs and better guidance toward effectively testing and debugging compilers. To ensure reproducibility and to benefit the community, we have made all our data and code publicly available at http://chengniansun.bitbucket.org/projects/compiler-bug-study/.

Paper Organization Section 2 describes the bugs used in this study and potential threats to validity. Section 3 introduces general properties of these bugs, while Section 4 studies the distribution of compiler bugs in components and source files respectively. In Section 5, we study bug-revealing test cases and bug fixes. Section 6 then investigates the lifetime of bugs, and Section 7 studies the priorities of bugs and their relation to other properties such as components and duration. Section 8 presents our preliminary application of the findings in this paper for compiler testing. We discuss, in Section 9, how to utilize the findings in this paper. Finally, Section 10 surveys related work, and Section 11 concludes.

2. METHODOLOGY
This section briefly introduces the compilers used in our study, and describes how bugs are collected. We also discuss limitations and threats to the validity of this study.

2.1 Source of Bugs
In this paper, we study the bugs in two mainstream compiler systems, GCC and LLVM.

GCC GCC is a compiler system produced by the GNU Project supporting various languages (e.g. C, C++, and Fortran) and various target architectures (e.g. PowerPC, x86, and MIPS). It has been under active development since the late 1980s. The latest version is 5.3, which was released on December 4, 2015.

LLVM LLVM is another popular compiler infrastructure, providing a collection of modular and reusable compiler and toolchain technologies for arbitrary programming languages. Similar to GCC, LLVM also supports multiple languages and multiple target architectures. The project started in 2000 and has drawn much attention from both industry and academia. The latest version is 3.7.1, released on January 5, 2016.

Collection of Bugs Our study focuses on fixed bugs. We say a bug is fixed if its resolution field is set to fixed and the status field is set to resolved, verified or closed in the bug repositories of GCC and LLVM.

Identifying Bug Fix Revisions We then identify the revisions that correspond to these fixed bugs. We collect the entire revision log from the code repositories, and for each revision, we scan its commit message to check whether the revision is a fix to a bug. GCC and LLVM developers usually add a marker in the commit message following one of the two patterns below:

• "PR <bug-id>"
• "PR <component>/<bug-id>"

where the prefix "PR" stands for "Problem Report" and <bug-id> is the id of the corresponding bug. We use these patterns to link a revision to the bug that it fixes.
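As a minimal sketch of this linking step (not the authors' actual scripts), assume the revision log has already been exported as (revision id, commit message) pairs; the regular expression, function name, and the toy commit message below are our own illustration.

```python
import re

# Matches "PR 12345" as well as "PR c++/12345" inside a commit message,
# approximating the two marker patterns described above.
PR_MARKER = re.compile(r"\bPR\s+(?:[\w+.-]+/)?(\d+)", re.IGNORECASE)

def link_revisions_to_bugs(revision_log):
    """Map each bug id to the revisions whose commit messages mention it.

    revision_log: iterable of (revision_id, commit_message) pairs.
    """
    fixes = {}
    for revision_id, message in revision_log:
        for bug_id in PR_MARKER.findall(message):
            fixes.setdefault(int(bug_id), []).append(revision_id)
    return fixes

# Toy example (the message text is made up); bug 67845 is one of the
# GCC bugs listed later in Table 8.
log = [("r230001", "PR c++/67845: fix crash on template argument"),
       ("r230002", "Minor cleanup, no PR")]
print(link_revisions_to_bugs(log))   # {67845: ['r230001']}
```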
Table 1 shows the numbers of bugs and their accompanying revisions that are used in our study. We analyze 50K fixed bugs and 30K revisions in total, including 1,858 GCC and 1,643 LLVM enhancement requests.

Table 1: The information of the bugs used in this study.
Compiler   Start      End        Bugs     Revisions
GCC        Aug-1999   Oct-2015   39,890   22,947
LLVM       Oct-2003   Oct-2015   12,842   8,452

2.2 Threats to Validity
Similar to other empirical studies, our study is potentially subject to several threats, namely the representativeness of the chosen compilers, the generalization of the used bugs, and the correctness of the examination methodology.

Regarding the representativeness of the chosen compilers, we use GCC and LLVM, which are two popular compiler projects written in C/C++ with multiple language frontends, a set of highly effective optimization algorithms, and various architecture backends. Both are widely deployed on Linux and OS X operating systems. We believe these two compilers can well represent most traditional compilers. However, they may not well reflect the characteristics of Just-In-Time compilers (e.g., the Java HotSpot virtual machine) or interpreters (e.g., Perl, Python).

[Figure 1 plots omitted; panels (a) GCC and (b) LLVM, x-axis: time in months (Aug 1999 to Oct 2015 for GCC, Oct 2003 to Oct 2015 for LLVM), y-axis: number of bugs, with compiler release versions marked along the top.]

Figure 1: The overall evolution history of the bug repositories (in months). The plot filled with a gray background shows the new bug reports submitted every month. The blue dash-dotted plot shows the number of bugs fixed every month. The red dashed plot shows the number of bugs that are resolved as not fixed (e.g. invalid, duplicate, worksforme or wontfix). The black curve shows the number of bug reports per month which have not been confirmed yet. Clearly there is an increasing trend of unconfirmed bugs for LLVM.

Regarding the generalization of the used bugs, we uniformly use all the bug reports satisfying the selection criteria stated in Section 2.1, with no human intervention. As for the unresolved or invalid bugs, we believe that they are not likely to be as interesting as the bugs investigated in this paper. As for the identification of revisions containing bug fixes, based on our interaction with the GCC and LLVM developers on a large number of bugs we reported before, GCC developers usually link revisions and bugs explicitly, while LLVM developers do so less often. Given the large number of GCC and LLVM revisions, we believe that the analysis results should be representative.

Regarding the correctness of the analysis methodology, we have automated every analysis mentioned in this paper. We also have hands-on experience analyzing bug report repositories and code revision histories from our previous studies.

3. GENERAL STATISTICS
This section shows the general statistics of the two bug repositories. Figure 1 shows the overall evolution of GCC's and LLVM's bug repositories. In particular, it shows the number of bugs reported, rejected or fixed each month. It also shows the number of bug reports that have never been confirmed. Each vertical dashed line represents the date of a compiler release, and the label on the top gives the version number.

The trends of the plots for GCC are relatively stable over these years compared to those of LLVM. After gaining popularity recently, LLVM has drawn much more attention than before and more bug reports are being submitted monthly. However, the bug-fixing rate (indicated by the blue dash-dotted fixed curve) does not increase much and more bugs are being left as unconfirmed (shown as the black unconfirmed curve), which is likely due to limited human resources: we were told by active members of the LLVM community that some Apple developers were pulled into the Swift project.1 This may give external bug reporters the impression that the community is not as responsive as GCC, although bugs are fixed as regularly as before.

3.1 Rejected Bug Reports
Table 2 shows the information on the bugs which are resolved as invalid, worksforme or wontfix. An invalid bug report is one in which the associated test program is invalid (for example, containing undefined behavior, a bug in another piece of software, a misunderstanding of the language standard, or misuse of standard libraries), or the reported "anomalous" behavior is in fact deliberate. If a bug described in a report cannot be reproduced, the report is labeled as worksforme. A bug report is resolved as wontfix if the affected version, component or library of the compiler is not maintained, although the reported bug can be confirmed.

Table 2: The number and the percentage of bug reports resolved as invalid, worksforme or wontfix.
        invalid          worksforme     wontfix
GCC     7,072 / 10.4%    1,151 / 1.7%   1,463 / 2.2%
LLVM    1,639 / 6.7%     717 / 2.9%     593 / 2.4%

1 https://developer.apple.com/swift/

3.2 Duplicate Bug Reports
Reporting bugs in a bug tracking system is uncoordinated. Different people may file multiple reports on the same bug. In this case, the later bug reports are referred to as duplicates.

Table 3 shows the information on duplicate bug reports in the GCC and LLVM bug repositories. 9.9% of GCC bugs and 5.3% of LLVM bugs have duplicates. The duplicate rates are similar to those of other open source projects such as Eclipse, OpenOffice and Mozilla [28].

Table 3: Distribution of duplicate bug reports. The first row lists the number of duplicates, and the second shows the number of bugs with the given number of duplicates.
(a) Duplicate bugs of GCC.
#Duplicate   0        1      2    3    4   5   ≥6
#Report      35,933   2,924  596  215  98  41  83
(b) Duplicate bugs of LLVM.
#Duplicate   0        1    2   3   4   5  ≥6
#Report      12,157   570  72  17  13  5  8

The GCC bug with the most duplicates is #4529, which has 60 duplicates; the compiler crashes when compiling the Linux kernel 2.4. The LLVM bug with the most duplicates is #2431, with 15 duplicates; it is a meta-bug report to manage all the crashing bugs related to the component gfortran.

3.3 Reopening Bugs
A bug report is labeled as fixed if a revision has been committed to fix the bug and has passed certain test cases. However, sometimes the bug may later be found to be not correctly or not fully fixed. In this case, the bug report will be reopened. There is another scenario of reopening a bug: if the bug repository administrator first rejects a bug report because he deems the bug invalid (e.g., not reproducible) and later finds it valid, then the bug report will be reopened.

Both scenarios are normal but undesirable. Reopening a bug could mean careless bug fixes, inadequate communication between developers and testers or users, or inadequate information to reproduce the bug [40]. It can even become a bad practice if a bug report is reopened more than once, as this behavior is usually regarded as an anomalous software process execution [4, 29], and should be avoided in the software development and management process.

Table 4: Breakdown of reopened bugs. The first row is the number of times a bug was reopened, and the second is the percentage of the corresponding bugs.
(a) Reopened bugs of GCC.
#Reopen   0       1      2      3      4      8
%         96.95   2.85   0.15   0.04   0.01   0.003
(b) Reopened bugs of LLVM.
#Reopen   0        1       2       3
%         94.596   5.007   0.374   0.023

Table 4 shows the reopening information of bugs in GCC and LLVM. Nearly 97% of the GCC bugs and 94% of the LLVM bugs are fixed directly without ever being reopened. The most frequently reopened bug of GCC is #29975 2 (reopened eight times). It is a meta bug to track Fortran bugs revealed by compiling the CP2K program.3 Another GCC bug, #52748,4 was reopened three times before it was fixed. It is related to the new decltype feature of the C++11 standard, and it took developers several attempts to correctly fix this bug.

4. LOCATION OF BUGS
This section answers the following question: where are the bugs? Subsection 4.1 shows how the bugs are distributed across compiler components, while Subsection 4.2 shows how bugs are distributed across source files.

4.1 Distribution of Bugs in Components
Figure 2 shows the distribution of bugs in compiler components. We only show the top ten buggy components for each compiler, accounting for the majority of the bugs (79% for GCC and 79% for LLVM). These components touch all critical parts of compilers: front end (e.g. syntactic and semantic parsing), middle end (e.g. optimizations) and back end (e.g. code generation).

[Figure 2 bar charts omitted; y-axis: fraction of bugs.]
(a) The top ten buggy components out of 52 in GCC, among them c++, c, target, rtl-opt, java, fortran, tree-opt, middle-end, libstdc++ and bootstrap. The most buggy component is C++, containing 22% of the 39,890 bugs, nearly twice as buggy as the second one. These ten components account for 79% of the GCC bugs.
(b) The top ten buggy components out of 96 in LLVM, among them new-bugs, c++, frontend, c++11, all-bugs, scalar-opt, llvm-codegen, backend-x86, static-analyzer and common-codegen. The most buggy component is new-bugs, containing 23% of the 12,842 bugs. The second component is C++ with 14%, and the third is Frontend with 9%. These ten components account for 79% of the LLVM bugs.
Figure 2: Distribution of bugs in components.

2 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29975
3 http://www.cp2k.org/
4 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52748

Table 5: The top 10 buggy files of GCC and LLVM.
(a) GCC.
File                 #    Description
cp/pt.c              817  C++ templates
cp/decl.c            638  C++ declarations and variables
cp/parser.c          595  C++ parser
config/i386/i386.c   569  code generation on IA-32
cp/semantics.c       457  semantic phase of parsing C++
fortran/resolve.c    452  type resolution for Fortran
cp/cp-tree.h         415  parsing and type checking C++
fold-const.c         386  constant folding
cp/typeck.c          374  type checking C++
cp/call.c            354  method invocations of C++

(b) LLVM.
File                              #    Description
SemaDecl.cpp                      301  semantic analysis for declarations
DiagnosticSemaKinds.td            268  definitions of diagnostic messages
SemaExpr.cpp                      232  semantic analysis for expressions
SemaDeclCXX.cpp                   221  semantic analysis for C++ declarations
X86ISelLowering.cpp               194  lowering LLVM code into a DAG for X86
SemaTemplate.cpp                  127  semantic analysis for C++ templates
SemaExprCXX.cpp                   124  semantic analysis for C++ expressions
SemaOverload.cpp                  123  semantic analysis for C++ overloading
lib/Sema/Sema.h                   122  semantic analysis and AST building
SemaTemplateInstantiateDecl.cpp   119  C++ declaration instantiation

As the plots show, C++ is the most buggy component in GCC, accounting for around 22% of the bugs. It is much more buggy than the other components, as the bug rate of the second most buggy component is only half as high. In LLVM, bug reports can be submitted without specifying their components; therefore a large number of bug reports are closed with the component new-bugs. However, based on the distribution of bugs in source files (which will be discussed in the next section), C++ is also the most buggy component in LLVM.

One possible explanation is that C++ has many more features than the other programming languages, supporting multiple paradigms (i.e. procedural, object-oriented, and generic programming). It is surprising that most of the research and engineering effort has been devoted to testing or validating C compilers [3, 11, 12, 13, 14, 23, 30, 34, 35] but little to C++, although C++ is one of the most popular programming languages in industry (among the top three according to the TIOBE index [33]).

4.2 Distribution of Bugs in Files
This section analyzes the distribution of bugs in the source files (i.e., how many bugs are in each source file). We exclude the source files of test cases, as they contribute nothing to the core functionality of the compilers.

Figure 3 shows the fraction of source files with a given number of bugs, where the x-axis is the number of bugs, and the y-axis is the fraction of the source files containing that number of bugs. It depicts a phenomenon that more than half of the source files contain only one bug (60% for GCC and 53% for LLVM), and very few files have a large number of bugs.

[Figure 3 plot omitted; x-axis: number of bugs, y-axis: fraction of files; marked points include GCC (1, 60%), LLVM (1, 53.2%), GCC (30, 0.1%), LLVM (30, 0.3%).]
Figure 3: The figure shows the fraction of source files with a given number (≤ 30) of bugs for GCC and LLVM. The files with more than 30 bugs are skipped and their fraction is approximately monotonically decreasing. For GCC, there are 5,039 files in total, and the file with the most bugs is 'cp/pt.c' (with 817 bugs); for LLVM, there are 2,850 files in total, and the most buggy is 'SemaDecl.cpp' (with 301 bugs).

The plots in Figure 3 only show the files with no more than 30 bugs. In order to describe the other extreme of the distribution, Table 5 shows the top ten most buggy source files. Consistent with the distribution of bugs in components in Figure 2, the source files of the C++ component account for the most bugs. For example, the file implementing C++ templates, perhaps the most complex feature of C++, is the most buggy file in GCC. This skewness of bugs again implies that we should devote more effort to testing C++ compilers, considering that C++ is one of the most widely used programming languages.

5. REVEALING AND FIXING BUGS
In this section, we investigate the properties of bug-revealing test cases and bug fixes.

5.1 Size of Bug-Revealing Test Cases
Figure 4a shows the empirical cumulative distribution function over the sizes of the test cases triggering the bugs of GCC and LLVM. As shown, most of the test cases (95%) are smaller than 100 lines of code, and more than 60% are smaller than 25 lines of code.5 Table 4b shows the statistics, including mean, median and standard deviation (SD).

[Figure 4a plot omitted; x-axis: lines of code of regression test case, y-axis: fraction of bugs.]
(a) This graph shows the empirical cumulative distribution function of the sizes of the test cases that trigger the bugs in GCC and LLVM.
(b) The statistics of the lines of code in regression test cases.
       Mean  Median  SD  Min  Max
GCC    32    21      75  1    5,554
LLVM   29    16      59  1    916
Figure 4: Size of regression test cases.

5 We do not exclude comments from counting in this paper.

[Figure 5 diagram omitted: a revision r committed between consecutive versions vi and vi+1 introduces a regression bug b, which a test program t exposes in vi+1 and later versions.]
Figure 5: Compiler regression bugs.

These test cases are collected from the test suites of GCC and LLVM. For a new bug, besides the fix, the developer often creates a regression test case by extracting the attached bug-triggering code from the bug report. Meanwhile, a bug label is also created by concatenating a prefix pr and an infix bug-id, which is later used as the file name of the test case or inserted into the test case as a comment. For example, the test case pr14963.c is the regression test case for bug 14963.

By identifying the bug labels, we have collected 11,142 test cases of GCC, and their size distribution is shown as the blue solid plot in Figure 4. We only collect 347 for LLVM, and this is why its plot (i.e., the red dashed one) is not as smooth as that of GCC. However, the overall trend is similar in the two plots, and supports the conclusion stated at the beginning of this section.

This observation can be leveraged to improve the effectiveness of random testing for compiler validation. A random program generator such as Csmith [35] or Orion [11] should focus on generating small but complex test programs rather than big but simple ones.

5.2 Time to Reveal Compiler Regressions
A compiler regression is a bug which is introduced by a revision, breaking the functionality of a feature. Consider the illustration in Figure 5. Let r be the committed revision between two consecutive versions vi and vi+1 in the source code repository. We say revision r introduces a regression bug b into version vi+1.

In this section, we investigate how much time it takes for a regression bug to be uncovered by testing. For the regression bug b, its regression-revealing time can be obtained by computing the time span between when the revision r introducing b is committed and when the bug report of b is filed. As GCC provides richer meta-information in its bug repository than LLVM, in this study we only focus on the regression bugs of GCC. The following describes how we collect the necessary information for this study.

Identifying Regression Bug Reports If a bug report is confirmed by a GCC developer as a regression, the summary of the bug report will be prefixed with a keyword regression.

Identifying Culprit Revisions GCC developers often link the culprit revisions with the regressions as additional comments to the bug reports. Although these correlations are only expressed in natural language, they follow certain frequent patterns. For example, given a culprit revision r, the link can be written as "started with r", "caused by r", or "regressed with r".

We first collect bug reports whose summaries contain the word regression, and then search the comments in these reports for the patterns of culprit revisions. In total, we have collected 1,248 regression bugs.
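The sketch below shows one way the culprit-revision patterns could be matched and the revealing time computed; the pattern list, function names, dates and the date-lookup callback are our own simplifications, not the authors' actual tooling.

```python
import re
from datetime import datetime

# Simplified versions of the linking phrases described above,
# e.g. "caused by r216304" or "started with r123456".
CULPRIT_PATTERN = re.compile(
    r"(?:started with|caused by|regressed with)\s+r(\d+)", re.IGNORECASE)

def regression_reveal_days(comments, report_date, commit_date_of):
    """Days between the culprit revision's commit and the bug report's filing.

    comments: list of comment strings from one regression bug report.
    report_date: datetime when the report was filed.
    commit_date_of: callback mapping a revision number to its commit datetime.
    """
    for comment in comments:
        match = CULPRIT_PATTERN.search(comment)
        if match:
            culprit = int(match.group(1))
            return (report_date - commit_date_of(culprit)).days
    return None  # no culprit revision mentioned in the comments

# Toy example with a hard-coded commit date for the (hypothetical) revision.
days = regression_reveal_days(
    ["This regression was caused by r216304."],
    datetime(2015, 3, 1),
    lambda rev: datetime(2015, 2, 9))
print(days)  # 20
```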
Figure 6 shows the empirical cumulative distribution function of the time interval between when a regression is introduced and when it is triggered. On average it takes 163 days, but 50% of these regressions need only 20 days to be uncovered.

[Figure 6a plot omitted; x-axis: time to reveal regressions (in days), y-axis: fraction of bugs.]
(a) This graph shows the empirical cumulative distribution function of the time interval between when a regression is introduced and when it is triggered.
(b) The statistics of the time to reveal regressions.
       Mean  Median  SD   Min  Max
GCC    163   20      335  1    3,492
Figure 6: Time to reveal regression bugs.

5.3 Size of Bug Fixes
We consider all the 22,947 revisions of GCC and 8,452 revisions of LLVM, and exclude the non-functional files (e.g., change logs, documents, executables, test cases) in these revisions.

5.3.1 Lines of Code
We obtain the difference between two versions from the two version control systems respectively, and count the lines of code modified in the source files.

[Figure 7a plot omitted; x-axis: number of lines of code in a fix, y-axis: fraction of bugs; marked point (100, 92%).]
(a) This graph shows the number of lines of code in a bug fix. The empirical cumulative distribution curves of GCC and LLVM are almost the same, and most of the bug fixes (92%) contain fewer than 100 lines of code.
(b) The statistics of the lines of code modified in bug fixes.
       Mean  Median  SD   Min  Max
GCC    43    10      264  1    20,028
LLVM   38    11      161  1    5,333
Figure 7: Lines of code modification in bug fixes.

[Figure 8a plot omitted; x-axis: number of functions in a fix, y-axis: fraction of bugs.]
(a) This graph shows the number of functions modified in a bug fix. The empirical cumulative distribution curve indicates that over half of the investigated bugs involve only one function (i.e., 58% for GCC, and 54% for LLVM), and most of the bug fixes (90% or so) involve no more than 5 functions.
(b) The statistics of the number of functions in bug fixes.
       Mean  Median  SD    Min  Max
GCC    2.7   1       7.0   1    434
LLVM   3.4   1       18.4  1    972
Figure 8: Number of functions modified in bug fixes.

As Figure 7a shows, 92% of the bug fixes contain fewer than 100 lines of code, and 50% of the bug fixes contain fewer than 10 lines. This indicates that most of the bugs only touch a small portion of the compiler code. Table 7b shows the statistics of the sizes of the bug fixes. On average, the number of lines of code modified in a bug fix is approximately 40, and the median is about 11.

5.3.2 Number of Functions
We investigate the number of functions modified in a bug fix. The information is acquired by (1) retrieving the changed source files at a specific revision, (2) parsing the files, and (3) locating the functions whose line ranges intersect with the changed line numbers recorded in the version control systems.
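To make step (3) concrete, here is a small sketch of the line-intersection idea, under the assumption that the changed line numbers per file and the parsed function line ranges are already available; the data shapes, names and example ranges are ours, not the paper's tooling.

```python
def functions_touched(changed_lines, function_ranges):
    """Collect the functions whose line ranges intersect the changed lines.

    changed_lines: set of line numbers modified in one source file.
    function_ranges: dict mapping a function name to its (start, end) lines,
                     as recovered by parsing the file at that revision.
    """
    touched = set()
    for name, (start, end) in function_ranges.items():
        if any(start <= line <= end for line in changed_lines):
            touched.add(name)
    return touched

# Toy example: a fix touching lines 12 and 40 intersects two of the
# three (hypothetical) functions.
ranges = {"fold_unary": (10, 30), "fold_binary": (35, 60), "fold_ternary": (70, 90)}
print(functions_touched({12, 40}, ranges))  # {'fold_unary', 'fold_binary'}
```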
Figure 8a shows the relation between the number of functions revised in a revision and the fraction of bugs. Around 90% of the bugs involve at most 5 functions in GCC and LLVM. Moreover, more than 58% of the GCC bugs and 54% of the LLVM bugs involve only a single function. Table 8b shows the summary statistics: in terms of the number of functions, the median is one and the mean is no more than three.

5.3.3 Discussion
The data — the lines of code and number of functions modified in a revision — imply that a compiler bug is usually not so complex that a severe collateral effect is imposed on multiple functions. Considering the plots and tables in Figures 7 and 8, we can conclude that compiler bugs are often local. Although compilers are intricate (containing, for example, complex optimization algorithms), the bug fixes only have a limited impact on the entire code base. On the other hand, this indirectly demonstrates that both compilers are well designed, with clear modularity and limited coupling among modules.

6. DURATION OF BUGS
This section investigates how bugs are distributed through time. A duration spans the time from when a bug report is filed in the bug tracking system to the time when the bug is fixed. Ideally, a bug should be fixed right after it is reported, which is a zero-day duration. However, in reality, due to limited resources and various constraints, a bug usually takes time, sometimes years, to be fixed.

[Figure 9a plot omitted; x-axis: duration of bugs (in days), y-axis: fraction of bugs.]
(a) This graph shows the empirical cumulative distribution function of bugs through time.
(b) The statistics of the duration of bugs.
       Mean  Median  SD   Min  Max
GCC    200   28      448  1    5,686
LLVM   111   7       268  1    2,967
Figure 9: This figure shows the statistics of the duration of bugs. Figure (a) is the empirical cumulative distribution function of the duration of bugs, and Table (b) displays the statistics of the duration. On average, a bug is fixed within a year. Comparatively, GCC takes a longer time to fix bugs than LLVM.

6.1 Collecting Duration Data
In the bug tracking systems of GCC and LLVM, when a bug report is filed, its creation date is saved in the database, which can be used as the birth date of the bug. The other date-typed field in a bug report is the modification field, which records the date of the last revision to the report. However, it cannot be used as the "killed" date of the bug, as even after a bug is resolved as fixed, the developers may still edit the bug report, such as adjusting the target version or adding comments. Hence, using the time interval between the creation date and the modification date is inaccurate. For example, the third bug6 of GCC was reported on 1999-08-26 and last modified on 2013-07-23, so that interval is more than ten years. However, based on its revision history,7 the bug was already fixed on 2001-01-08, so the duration of the bug should be roughly a year and a half, not more than ten years.

In order to compute the bug lifetime information accurately, we retrieve both the bug reports and the bug history information from the two bug tracking systems. For the start date of a bug, we use the creation date recorded in the bug report. For the end date of the bug, we scan its revision history in reverse chronological order, and find the date of the last revision that set the resolution field to fixed. Taking bug #3 as an example, its duration is the time interval between its creation date, 1999-08-26, in the report and the revision date, 2001-01-08, that set it to fixed.

6 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3
7 https://gcc.gnu.org/bugzilla/show_activity.cgi?id=3
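A minimal sketch of this computation is shown below; it assumes the bug history has already been exported as dated field changes, and the record format and function name are ours rather than the Bugzilla API.

```python
from datetime import datetime

def bug_duration_days(creation_date, history):
    """Duration from the report's creation to the last history entry that
    set the resolution field to 'fixed'.

    history: list of (change_date, field, new_value) tuples, oldest first.
    """
    for change_date, field, new_value in reversed(history):
        if field == "resolution" and new_value.lower() == "fixed":
            return (change_date - creation_date).days
    return None  # never resolved as fixed

# Toy example modeled on GCC bug #3: edits made after the fix are ignored.
history = [
    (datetime(2001, 1, 8), "resolution", "FIXED"),
    (datetime(2013, 7, 23), "target_milestone", "4.9.0"),
]
print(bug_duration_days(datetime(1999, 8, 26), history))  # 501
```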

6.2 Duration Analysis
Figure 9 shows the empirical cumulative distribution function of bugs over time, and the statistics of the bug duration. On average, the bugs of GCC are fixed within 200 days, and those of LLVM within 111 days. The medians of the durations are 28 days and 7 days respectively.

Table 6 further breaks the duration down into two segments, i.e., the duration between when a bug is reported and when it is confirmed, and the duration between when it is confirmed and when it is fixed. On average, it takes less time for GCC to confirm a bug, but longer to fix one, than for LLVM.

Table 6: The breakdown of the duration.
(a) Duration of bugs between when they are reported and confirmed.
       Mean  Median  SD   Min  Max
GCC    61    2       205  1    3,816
LLVM   95    5       245  1    2,967
(b) Duration of bugs between when they are confirmed and fixed.
       Mean  Median  SD   Min  Max
GCC    139   4       396  1    5,427
LLVM   17    1       113  1    2,446

7. PRIORITIES OF BUGS
This section first studies the priority of compiler bugs, and then investigates the correlation between priorities and other types of bug information, i.e., compiler components and bug duration.

7.1 Priority Distribution
The priority field of a bug report is determined by developers for prioritizing their bugs, expressing the order in which bugs should be fixed by the developers. GCC has five levels: P1, P2, P3, P4 and P5. P1 is the most urgent level and bugs of this level should be fixed soon, while P5 has the lowest priority. P3 is the default priority level. Any new bug report is initially labeled as P3 by default; the triage team then determines its priority based on the impact of the bug on users and the available human and time resources [32]. LLVM developers do not prioritize bugs explicitly in the bug repository, so we only conduct priority-related analyses on the GCC data set.

Table 7: The priority distribution of the GCC bugs.
P1      P2      P3      P4     P5
6.02%   22.92%  68.24%  1.82%  0.99%

Table 7 lists the breakdown of the five priority categories in GCC. As the default priority, P3 accounts for over half of the bugs, whereas P4 and P5 are the extreme cases with the fewest bugs.

7.2 Priority and Component Correlation
We study the correlation between priorities and components of bugs, that is, which compiler component tends to be more important and thus has bugs that are more severe than those of the other components.

Given a compiler component c and a priority level p ∈ [1, 5] corresponding to ⟨P1, P2, P3, P4, P5⟩, let R be the set of the bugs in c. We define the following function to compute the fraction of the bugs with the given priority p among all the bugs in the component c:

\[ \psi(c, p) = \frac{|\{r \in R \mid \text{the priority of } r \text{ is } p\}|}{|R|} \]

We then define a total order between components in the order of their fractions of the five different priorities:

\[ \Theta(c_1, c_2) = \theta(c_1, c_2, 1) \]

\[ \theta(c_1, c_2, p) = \begin{cases} > & \text{if } \psi(c_1, p) > \psi(c_2, p) \\ < & \text{else if } \psi(c_1, p) < \psi(c_2, p) \\ = & \text{else if } p = 5 \\ \theta(c_1, c_2, p + 1) & \text{otherwise} \end{cases} \]

Basically, if a component c1 has a larger fraction of bugs with high priorities than c2, then c1 is greater than c2. Bugs in a "large" component are more likely to be prioritized higher than those in "small" components, and such a component should draw more attention for testing and validation.
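The sketch below mirrors these definitions in code: comparing components by Θ is equivalent to comparing their fraction vectors ⟨ψ(c, 1), …, ψ(c, 5)⟩ lexicographically. The data layout, names and toy numbers are our own illustration, not the authors' scripts.

```python
def priority_fractions(bugs):
    """psi(c, p) for p = 1..5, given the priorities (1..5) of a component's bugs."""
    total = len(bugs)
    return tuple(sum(1 for b in bugs if b == p) / total for p in range(1, 6))

def rank_components(bugs_by_component):
    """Rank components in descending order of Theta, i.e. lexicographically
    by their fraction vectors <psi(c,1), ..., psi(c,5)>."""
    return sorted(bugs_by_component,
                  key=lambda c: priority_fractions(bugs_by_component[c]),
                  reverse=True)

# Toy example: 30% of ipa's bugs are P1 here, so it outranks tree-opt.
components = {
    "ipa":      [1, 1, 1, 3, 3, 3, 3, 2, 2, 3],
    "tree-opt": [1, 2, 3, 3, 3, 3, 3, 3, 2, 3],
}
print(rank_components(components))  # ['ipa', 'tree-opt']
```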

[Figure 10 chart omitted; it shows, for the top 15 GCC components ranked by Θ (among them ipa, lto, pch, debug, rtl-opt, c++, libitm, target, tree-opt, sanitizer, gcov-profile, middle-end, bootstrap, libmudflap and regression), the fraction of bugs at each priority P1–P5.]
Figure 10: Correlation between priorities and components.

We rank all the components of GCC in the descending order of Θ. Figure 10 shows the top 15 components. The first component is ipa, which includes inlining and other inter-procedural optimizations and the infrastructure supporting them; 30% of its bugs are labeled as P1. This ratio is significantly higher than that of the second component, which is only 14%. Many other optimization components also appear in this list, such as tree-opt (optimizations over the high-level tree representation), middle-end (optimizations over the GIMPLE representation [9]), rtl-opt (optimizations over the low-level, architecture-neutral register transfer language representation [10]), and lto (link-time optimization).

7.3 Priority and Duration Correlation
This subsection studies the correlation between priorities and the lifetime of bugs. Intuitively, bugs with higher priorities should be fixed in a shorter time than those with lower priorities. The results in this section partially invalidate this hypothesis.

Table 11a lists the statistics of the lifetime of bugs for the five priority levels. Each row shows the information of all the bugs that are labeled with a certain priority. For example, the second row shows the mean, median and standard deviation of all the P1 bugs. Additionally, we performed a t-test between every two priorities, and validated that, except for the difference between P2 and P4, the differences between all others are statistically significant (p < 0.001).

(a) The statistics of the duration of bugs for different priority levels in GCC.
Priority  Mean  Median  SD   Min  Max
P1        72    22      149  1    2,196
P2        262   49      502  1    4,937
P3        185   21      420  1    5,686
P4        268   48      473  1    3,326
P5        450   170     617  1    3,844

[Figure 11b plot omitted; x-axis: duration of bugs (in months, 1–5), y-axis: cumulative fraction of bugs for each of P1–P5.]
(b) The cumulative breakdown of the bug lifetime for different priorities within 5 months.
Figure 11: The correlation between duration and priorities of GCC bugs.

From a different perspective, Figure 11b graphically shows the cumulative breakdown of the bug lifetime for each priority level within five months. For example, 79% of P1 bugs are fixed within three months, 58% for P2, 68% for P3, 59% for P4 and 38% for P5.

Both the table and the figure demonstrate that the time spent on fixing bugs does not strictly follow the order of their priorities. On average, only the bugs with P1 are fixed much faster than the other priorities, and the bugs with P5 take the most time among all the priorities. The P2 bugs take more time than P3. The P1 bug with the maximum lifetime is Bug #25130;8 it was reopened twice, and took 2,196 days in total to resolve.

8 https://gcc.gnu.org/bugzilla/show_activity.cgi?id=25130

8. A PRELIMINARY APPLICATION
This section presents a proof-of-concept application of our findings. Specifically, we have designed tkfuzz, a simple yet effective program mutation algorithm for compiler testing, by leveraging the observation in Subsection 5.1 that bug-revealing test cases are usually small. Moreover, as Subsection 4.1 indicates that C++ is the most buggy component, our algorithm works for C++ too, making it the first research effort on C++ compiler testing.

tkfuzz Given a test program represented as a sequence of tokens, tkfuzz randomly substitutes an identifier token (e.g., a variable or function name) with another, different identifier token in the token list. The mutated programs are then used to find crashing bugs in compilers.
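Below is a minimal sketch of this identifier-substitution idea. It is our illustration rather than the authors' implementation: it assumes the program has already been tokenized into (kind, text) pairs, and the token format, function name and seeding are ours.

```python
import random

def tkfuzz_mutate(tokens, rng=random):
    """Replace one randomly chosen identifier token with the text of
    another identifier token from the same token list.

    tokens: list of (kind, text) pairs, e.g. ("identifier", "foo").
    Returns a new token list; the input is left unchanged.
    """
    ident_positions = [i for i, (kind, _) in enumerate(tokens) if kind == "identifier"]
    if len(ident_positions) < 2:
        return list(tokens)  # nothing to substitute
    target = rng.choice(ident_positions)
    # Prefer a donor whose text differs from the target's text.
    donors = [i for i in ident_positions if tokens[i][1] != tokens[target][1]]
    donor = rng.choice(donors or ident_positions)
    mutated = list(tokens)
    mutated[target] = ("identifier", tokens[donor][1])
    return mutated

# Toy example: "int foo ( int bar )" may become "int bar ( int bar )".
toks = [("keyword", "int"), ("identifier", "foo"), ("punct", "("),
        ("keyword", "int"), ("identifier", "bar"), ("punct", ")")]
print(" ".join(t for _, t in tkfuzz_mutate(toks, random.Random(1))))
```

Each mutated program is then fed to the compilers, and any compiler crash is recorded as a candidate bug.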

We applied tkfuzz to the GCC test suite to generate mutated programs. The test suite contains 36,966 test programs (24,949 for C and 12,017 for C++); the average size of each test program is only 32 lines of code, including blank lines and comments. Our testing was done on a quad-core Intel 3.40 GHz machine. We have reported 18 bugs, of which 12 have already been accepted or fixed. Table 8 shows the details of these bugs. Note that five bugs are in the two compilers' C++ components, confirming our observation in Subsection 4.1. Although this is a simple application, it clearly demonstrates the practical potential of our findings in this paper.

Table 8: Bugs found by tkfuzz.
     Bug ID       Component    Status
1    LLVM-24610   frontend     fixed
2    LLVM-24622   frontend     fixed
3    LLVM-24797   new-bugs     –
4    LLVM-24798   c++          fixed
5    LLVM-24803   new-bugs     –
6    LLVM-24884   new-bugs     –
7    LLVM-24943   new-bugs     –
8    LLVM-25593   new-bugs     –
9    LLVM-25634   new-bugs     –
10   GCC-67405    target       fixed
11   GCC-67581    c++          fixed
12   GCC-67619    middle-end   fixed
13   GCC-67639    middle-end   confirmed
14   GCC-67653    middle-end   fixed
15   GCC-67845    c++          fixed
16   GCC-67846    c++          fixed
17   GCC-67847    c++          fixed
18   GCC-68013    tree-opt     fixed

9. CALL FOR ACTIONS
We have shown that compiler bugs are common: there is a large number of them. This section discusses several directions that are worth pursuing based on the analysis results presented earlier.

Buggy C++ Components It is not very surprising that C++ components are the most buggy in both compilers, as C++ is one of the most complex programming languages. However, it is surprising that little research has been directed at testing C++ compilers. Although this is more difficult than testing C compilers [11, 35], it is very worthwhile because C++ is used as widely as C. Furthermore, based on the most buggy files of the two compilers (cf. Table 5), we can take gradual steps toward testing C++ by starting from the most buggy features, such as templates and overloaded methods.

Small Regression Tests Figure 4 shows the size of the regression test cases, which are extracted from the test programs attached to bug reports. 95% of them have fewer than 100 lines of code, and more than 50% have fewer than 25 lines of code. This interesting (and surprising) finding indicates that randomized compiler testing techniques can leverage this fact by producing small but complex test programs. This not only can stress-test compilers, but may also enhance testing efficiency. Our preliminary application of this finding in Section 8 also demonstrates its potential impact.

Locality of Fixes Section 5.3 shows that most of the bug fixes only touch one function (58% for GCC and 54% for LLVM), and they tend to be local. Thus, it would be promising to investigate techniques that can direct testing at a chosen component, rather than treating the compiler as a monolithic whole.
Duration of Bugs Section 6 shows that the average time to triage and resolve a compiler bug is a few months. This may be because compilers are very complex and compiler bugs are difficult to resolve. One has to understand the root cause of a bug and decide how to fix it; at the same time, it is important not to overlook corner cases and to avoid regressions. Thus, practical techniques are needed to aid compiler developers with such tasks. Lopes et al.'s recent work on Alive [16] can be viewed as a relevant example in this direction — it helps developers write and debug peephole optimizations.

10. RELATED WORK
We survey two lines of closely related research.

Empirical Studies on Bugs Much work has been devoted to studying the characteristics of various bugs in various software systems. Chou et al. [7] conducted an empirical study on approximately one thousand operating system errors. The errors were collected by applying automatic static compiler analysis to the Linux and OpenBSD kernels. They found that device drivers had many more bugs than the rest of the kernels. Lu et al. [17] studied the characteristics of concurrency bugs by examining 105 concurrency bugs randomly selected from four real-world programs (MySQL, Apache, Mozilla and OpenOffice). Their findings further the understanding of concurrency bugs and highlight future directions for concurrency bug detection, diagnosis and fixing. Sahoo et al. [25] analyzed 266 reported bugs found in server software, such as MySQL, Apache, and SVN. Based on the findings, they discussed several implications for reproducing software failures and designing automated diagnosis tools for production runs of server software. Li et al. [15] analyzed the trend of bugs by applying natural language text classification techniques to about 29,000 bugs of Mozilla and Apache. Thung et al. [31] studied the bugs in machine learning systems and categorized them based on their characteristics. Song et al. [26] studied performance bugs in open source projects and, based on their characteristics, proposed a statistical debugging technique.

Our work complements these previous studies, with a specific focus on compiler bugs. Different from application bugs, compiler bugs can be much more difficult to notice and debug, and for application developers, compilers are usually assumed to be bug-free. In this paper, we show that compiler bugs are also common, and more than 65% of the bugs of GCC and LLVM are reported by external users. For researchers working on compiler testing and validation, we show that certain compiler components have higher bug rates than the others and should be paid more attention.

Compiler Testing Due to compilers' complexity, testing is still the major technique for validating the correctness of production compilers. In addition to internal regression test suites, compiler developers can also use external commercial conformance testing suites [1, 22] to further test whether compilers conform to language standards or specifications. However, such manual test suites may still be inadequate, and therefore researchers have recently started to employ randomized testing to further stress-test compilers.

One of the most successful approaches is Csmith [5, 24, 35], which has found several hundred bugs in GCC and LLVM. Compared to traditional random C program generators, which target compiler crashes, Csmith is able to generate valid C programs by avoiding the introduction of undefined behavior, and is hence capable of finding mis-compilation bugs. It has also been applied to test virtual machines [18], CPU emulators [19] and static analyzers, such as Frama-C [8].

Another successful compiler testing technique is Equivalence Modulo Inputs (EMI) [11, 12, 13]. It has found several hundred bugs in GCC and LLVM. EMI is a general compiler testing methodology to derive semantically equivalent variants from existing programs. It introduces an alternative view of differential testing. Given a program P, instead of verifying the consistency between executables compiled by multiple compilers or multiple versions of a compiler on P, it tests the consistency, w.r.t. an input I, between the executables from P and P′ compiled by the same compiler, where P′ is an EMI variant of P w.r.t. I.

A considerable amount of effort has also been put into testing different compilers or different components of compilers. Zhao et al. proposed a tool, JTT, to test the EC++ embedded compiler [39]. Nagai et al. [20, 21] proposed a technique to test the arithmetic optimizers of compilers. CCG is another random C program generator that targets compiler crashing bugs [2]. Sun et al. [30] proposed an approach to finding bugs in compiler warning diagnostics.

Our study highlights that in GCC and LLVM, C++ has the highest bug rate of all the compiler components, much higher than the C component. It would be interesting to devise effective testing strategies, theories and tools to test C++ compilers, as C++ is also a popular but more complex language widely used in industry. We also analyze the statistics of bug-revealing test cases and bug fixes and find that most of the test cases are small and most of the bug fixes are local to a small number of lines. These observations can potentially serve as good heuristics to guide random program generators to generate test programs that are small in size but effective at detecting new bugs.
11. CONCLUSION
This paper has presented a study of, in total, 39,890 bugs and 22,947 bug fixes of GCC, and 12,842 bugs and 8,452 bug fixes of LLVM, and analyzed the characteristics of compiler bugs. In particular, we have shown how bugs are distributed in components and source files (bugs are skewed toward a small number of components and files), how bugs are triggered and fixed (both bug-triggering test cases and fixes are small in size), how long bugs live (on average 200 days for GCC and 111 for LLVM), and how bugs are prioritized.

We believe that our analysis results and findings provide insight into understanding compiler bugs and guidance toward better testing and debugging of compilers. All our data and code are publicly available at http://chengniansun.bitbucket.org/projects/compiler-bug-study/.

12. ACKNOWLEDGMENTS
We are grateful to the anonymous reviewers for their insightful comments. This research was supported in part by the United States National Science Foundation (NSF) Grants 1117603, 1319187, 1349528 and 1528133, and a Google Faculty Research Award. The information presented here does not necessarily reflect the position or the policy of the Government and no official endorsement should be inferred.

13. REFERENCES
[1] ACE. SuperTest compiler test and validation suite. http://www.ace.nl/compiler/supertest.html.

[2] A. Balestrat. CCG: A random C code generator. https://github.com/Merkil/ccg/.
[3] S. Blazy, Z. Dargaye, and X. Leroy. Formal Verification of a C Compiler Front-End. In Int. Symp. on Formal Methods (FM), pages 460–475, 2006.
[4] N. Chen, S. C. H. Hoi, and X. Xiao. Software Process Evaluation: A Machine Learning Approach. In ASE, pages 333–342, Washington, DC, USA, 2011.
[5] Y. Chen, A. Groce, C. Zhang, W.-K. Wong, X. Fern, E. Eide, and J. Regehr. Taming compiler fuzzers. In Proceedings of the 2013 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 197–208, 2013.
[6] R. Chillarege, W.-L. Kao, and R. G. Condit. Defect Type and Its Impact on the Growth Curve. In Proceedings of the 13th International Conference on Software Engineering (ICSE), pages 246–255, 1991. URL http://dl.acm.org/citation.cfm?id=256664.256773.
[7] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler. An Empirical Study of Operating Systems Errors. In Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles (SOSP), pages 73–88, 2001.
[8] P. Cuoq, B. Monate, A. Pacalet, V. Prevosto, J. Regehr, B. Yakobowski, and X. Yang. Testing static analyzers with randomly generated programs. In NASA Formal Methods – 4th International Symposium (NFM), pages 120–125, 2012.
[9] GCC. GIMPLE – GNU Compiler Collection (GCC) Internals. https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html, accessed: 2014-06-25.
[10] GCC. RTL – GNU Compiler Collection (GCC) Internals. https://gcc.gnu.org/onlinedocs/gccint/RTL.html, accessed: 2014-06-25.
[11] V. Le, M. Afshari, and Z. Su. Compiler Validation via Equivalence Modulo Inputs. In Proceedings of the 2014 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2014.
[12] V. Le, C. Sun, and Z. Su. Randomized Stress-Testing of Link-Time Optimizers. In Proceedings of the 2015 International Symposium on Software Testing and Analysis (ISSTA), pages 327–337. ACM, 2015.
[13] V. Le, C. Sun, and Z. Su. Finding Deep Compiler Bugs via Guided Stochastic Program Mutation. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 386–399. ACM, 2015.
[14] X. Leroy, A. W. Appel, S. Blazy, and G. Stewart. The CompCert Memory Model, Version 2. Research report RR-7987, INRIA, June 2012.
[15] Z. Li, L. Tan, X. Wang, S. Lu, Y. Zhou, and C. Zhai. Have Things Changed Now?: An Empirical Study of Bug Characteristics in Modern Open Source Software. In Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability (ASID), pages 25–33, 2006.
[16] N. P. Lopes, D. Menendez, S. Nagarakatte, and J. Regehr. Provably correct peephole optimizations with Alive. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 22–32, 2015. URL http://doi.acm.org/10.1145/2737924.2737965.
[17] S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics. In Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 329–339, 2008.
[18] L. Martignoni, R. Paleari, G. Fresi Roglia, and D. Bruschi. Testing system virtual machines. In Proceedings of the 19th International Symposium on Software Testing and Analysis (ISSTA), pages 171–182, 2010.
[19] L. Martignoni, R. Paleari, A. Reina, G. F. Roglia, and D. Bruschi. A methodology for testing CPU emulators. ACM Trans. Softw. Eng. Methodol., 22(4):29:1–29:26, Oct. 2013.
[20] E. Nagai, H. Awazu, N. Ishiura, and N. Takeda. Random testing of C compilers targeting arithmetic optimization. In Workshop on Synthesis And System Integration of Mixed Information Technologies (SASIMI 2012), pages 48–53, 2012.
[21] E. Nagai, A. Hashimoto, and N. Ishiura. Scaling up size and number of expressions in random testing of arithmetic optimization of C compilers. In Workshop on Synthesis And System Integration of Mixed Information Technologies (SASIMI 2013), pages 88–93, 2013.
[22] Plum Hall, Inc. The Plum Hall Validation Suite for C. http://www.plumhall.com/stec.html.
[23] A. Pnueli, M. Siegel, and E. Singerman. Translation Validation. In 4th International Conference on Tools and Algorithms for Construction and Analysis of Systems (TACAS), pages 151–166, 1998.
[24] J. Regehr, Y. Chen, P. Cuoq, E. Eide, C. Ellison, and X. Yang. Test-case reduction for C compiler bugs. In Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 335–346, 2012.
[25] S. K. Sahoo, J. Criswell, and V. Adve. An Empirical Study of Reported Bugs in Server Software with Implications for Automated Bug Diagnosis. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE), pages 485–494, 2010.

[26] L. Song and S. Lu. Statistical Debugging for Real-world Performance Problems. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA), pages 561–578, 2014. URL http://doi.acm.org/10.1145/2660193.2660234.
[27] M. Sullivan and R. Chillarege. A Comparison of Software Defects in Database Management Systems and Operating Systems. In Twenty-Second International Symposium on Fault-Tolerant Computing (FTCS), pages 475–484, July 1992.
[28] C. Sun, D. Lo, X. Wang, J. Jiang, and S.-C. Khoo. A Discriminative Model Approach for Accurate Duplicate Bug Report Retrieval. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE), pages 45–54, 2010.
[29] C. Sun, J. Du, N. Chen, S.-C. Khoo, and Y. Yang. Mining Explicit Rules for Software Process Evaluation. In ICSSP, pages 118–125, 2013.
[30] C. Sun, V. Le, and Z. Su. Finding and Analyzing Compiler Warning Defects. In Proceedings of the 38th International Conference on Software Engineering (ICSE). ACM, 2016.
[31] F. Thung, S. Wang, D. Lo, and L. Jiang. An Empirical Study of Bugs in Machine Learning Systems. In Software Reliability Engineering (ISSRE), 2012 IEEE 23rd International Symposium on, pages 271–280, Nov. 2012.
[32] Y. Tian, D. Lo, and C. Sun. DRONE: Predicting Priority of Reported Bugs by Multi-factor Analysis. In 29th IEEE International Conference on Software Maintenance (ICSM), pages 200–209, Sept. 2013.
[33] TIOBE. TIOBE Index for May 2016. http://www.tiobe.com/tiobe_index, accessed: 2016-05-15.
[34] J.-B. Tristan and X. Leroy. Formal Verification of Translation Validators: A Case Study on Instruction Scheduling Optimizations. In Proceedings of the 35th ACM Symposium on Principles of Programming Languages (POPL), pages 17–27, Jan. 2008.
[35] X. Yang, Y. Chen, E. Eide, and J. Regehr. Finding and Understanding Bugs in C Compilers. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 283–294, 2011.
[36] Z. Yin, X. Ma, J. Zheng, Y. Zhou, L. N. Bairavasundaram, and S. Pasupathy. An empirical study on configuration errors in commercial and open source systems. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP), pages 159–172, 2011.
[37] Z. Yin, D. Yuan, Y. Zhou, S. Pasupathy, and L. Bairavasundaram. How Do Fixes Become Bugs? In 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (ESEC/FSE), pages 26–36, 2011.
[38] A. Zeller and R. Hildebrandt. Simplifying and Isolating Failure-Inducing Input. IEEE Trans. Softw. Eng., 28(2):183–200, Feb. 2002.
[39] C. Zhao, Y. Xue, Q. Tao, L. Guo, and Z. Wang. Automated test program generation for an industrial optimizing compiler. In ICSE Workshop on Automation of Software Test (AST), pages 36–43, 2009.
[40] T. Zimmermann, N. Nagappan, P. J. Guo, and B. Murphy. Characterizing and Predicting Which Bugs Get Reopened. In Proceedings of the 34th International Conference on Software Engineering (ICSE), pages 1074–1083, 2012.
