
RANDOM TESTING OF OPEN SOURCE C COMPILERS

by

Xuejun Yang

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in

Computer Science

School of Computing

The University of Utah

December 2014

Copyright © Xuejun Yang 2014

All Rights Reserved

The University of Utah Graduate School

STATEMENT OF DISSERTATION APPROVAL

The dissertation of Xuejun Yang has been approved by the following supervisory committee members:

John Regehr, Chair (date approved: 06/06/2014)

Ganesh Gopalakrishnan, Member (date approved: 06/06/2014)

Matthew Flatt, Member (date approved: 04/03/2014)

Eric Eide, Member (date approved: 06/05/2014)

Chris Lattner, Member (date approved: 05/12/2014)

and by Ross Whitaker, Chair of the School of Computing, and by David B. Kieda, Dean of The Graduate School.

ABSTRACT

Compilers are indispensable tools for developers, and we expect them to be correct. However, compiler correctness is very hard to reason about, which can be partly explained by the daunting complexity of compilers. In this dissertation, I explain how we constructed a random program generator, Csmith, and used it to find hundreds of bugs in strong open source compilers such as the GNU Compiler Collection (GCC) and the LLVM Compiler Infrastructure (LLVM).

The success of Csmith depends on its ability to be expressive and unambiguous at the same time. Csmith is composed of a code generator and a GTAV (Generation-Time Analysis and Validation) engine, which work together to produce expressive yet unambiguous random programs. The expressiveness of Csmith is attributed to the code generator, while the unambiguity is assured by GTAV. GTAV efficiently performs program analyses, such as points-to analysis and effect analysis, to avoid ambiguities caused by undefined or unspecified behaviors.

During our 4.25 years of testing, Csmith found over 450 bugs in GCC and LLVM. We analyzed the bugs by putting them into different categories, studying the root causes, finding their locations in the compilers' source code, and evaluating their importance. We believe the results of this analysis are useful to future random testers, as well as to compiler writers and users.

To my wife, for the countless weekends with a missing husband.

CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENTS

CHAPTERS

1. INTRODUCTION
   1.1 A monkey
   1.2 A compiler
   1.3 Putting a monkey and a compiler together - random testing of a compiler
   1.4 Adding more compilers to the equation - random differential testing
   1.5 Challenges
       1.5.1 A random program generator that is both expressive and unambiguous
       1.5.2 Find and understand bugs in open source C compilers
   1.6 Evaluation
   1.7 Contributions
   1.8 Thesis organization

2. BACKGROUND
   2.1 Software testing
   2.2 Compilers
       2.2.1 History of compilers
       2.2.2 Structure of a compiler
   2.3 GCC
       2.3.1 History
       2.3.2 System overview
       2.3.3 GCC testing
   2.4 LLVM and Clang
       2.4.1 History
       2.4.2 System overview
       2.4.3 LLVM testing
   2.5 Compiler correctness and testing
   2.6 Random testing of compilers

3. CODE GENERATOR
   3.1 The objectives
   3.2 Design choices
       3.2.1 Maximizing expressiveness while maintaining unambiguity
       3.2.2 Finding a test oracle
       3.2.3 Validating intermediate states vs. final states
       3.2.4 Grammar-based generation vs. AST-based generation
       3.2.5 Avoiding false positives vs. postgeneration trimming
   3.3 The shape of a randomly generated program
   3.4 Overview of the random generation process
   3.5 The generation grammar of Csmith
   3.6 Other considerations of context sensitivity: checks, balances, and biases
       3.6.1 Type check
       3.6.2 Value-range check
       3.6.3 Qualifiers check
       3.6.4 Unsafe arithmetic check
       3.6.5 Pointer check
       3.6.6 Array out-of-bound access check
       3.6.7 Use before initialization check
       3.6.8 The program size balance
       3.6.9 The biases
   3.7 Exceptions to the general AST construction order

4. GENERATION TIME ANALYSIS AND VALIDATION
   4.1 The objectives
   4.2 The necessity and challenges of generation time analysis and validation (GTAV)
   4.3 GTAV framework
       4.3.1 Data representations and control representations
       4.3.2 Pre-commit GTAV
       4.3.3 Post-commit GTAV
       4.3.4 Transfer function for programs
       4.3.5 Transfer function for functions
       4.3.6 Transfer functions for statements
       4.3.7 Transfer function for loops
       4.3.8 Transfer function for expressions
   4.4 Points-to analysis and validation within GTAV framework
       4.4.1 Abstract variables
       4.4.2 The lattice
       4.4.3 Validation of pointer references
       4.4.4 Validation of points-to analysis itself
   4.5 Side-effect analysis within GTAV framework and the evaluation order problem
       4.5.1 The algorithm and the lattice
       4.5.2 The implementation
       4.5.3 Side-effect validation for volatiles
   4.6 Union-field analysis and validation within GTAV
       4.6.1 The lattice
       4.6.2 Validation of union-field reads
       4.6.3 Validation of assignment between overlapping objects
   4.7 Guaranteed convergence of fixed point of the loop transfer function

5. EVALUATION
   5.1 Uncontrolled opportunistic compiler bug finding
       5.1.1 What kinds of bugs are there?
       5.1.2 Experience with commercial compilers
       5.1.3 Experience with open source compilers
       5.1.4 Testing CompCert
   5.2 Comparison with the other random program generators
       5.2.1 Previously published random C program generators
       5.2.2 Generating performance
       5.2.3 Compiler code coverage
       5.2.4 Bug-finding power
       5.2.5 Bug-finding performance as a function of test-case size
       5.2.6 Bug-finding performance compared to other tools

6. COMPILER BUG STUDIES
   6.1 Compile-time bugs vs. wrong-code bugs
   6.2 Are bugs found by Csmith important?
       6.2.1 Admissions of compiler developers
       6.2.2 Confirmation from other compiler users
   6.3 Where are bugs?
   6.4 How did compilers get it wrong?
   6.5 Case studies of bugs
       6.5.1 GCC Bug #1: wrong analysis
       6.5.2 GCC Bug #2: wrong transformation
       6.5.3 GCC Bug #3: wrong analysis
       6.5.4 GCC Bug #4: inconsistent state
       6.5.5 LLVM Bug #1: wrong analysis
       6.5.6 LLVM Bug #2: wrong safety check
       6.5.7 LLVM Bug #3: wrong safety check
       6.5.8 LLVM Bug #4: wrong analysis
   6.6 What lessons can be learned from bugs reported by us?
       6.6.1 Compiler developers
       6.6.2 Compiler testers
       6.6.3 Compiler users

7. RELATED WORK AND FUTURE DIRECTIONS
   7.1 Research that uses Csmith
       7.1.1 Validation of FPGA-based emulation for postsilicon validation
       7.1.2 Validation of executable semantics
       7.1.3 Validation of machine code analysis
       7.1.4 Validation of static analysis
   7.2 Research that extends Csmith
       7.2.1 Test case reduction
       7.2.2 Swarm testing using Csmith
       7.2.3 Validation of C11/C++11 memory model
       7.2.4 Triage of failure-inducing test cases
   7.3 Random testing other compilers/interpreters
   7.4 Future directions of Csmith
       7.4.1 Supporting object oriented programming (OOP) languages

       7.4.2 Supporting low level representations
       7.4.3 Supporting new target platforms
       7.4.4 Supporting static analyzers
   7.5 Toward bug-free compilers
   7.6 Toward error-free compilations

8. CONCLUSION

APPENDIX: GENERATION GRAMMAR OF CSMITH

REFERENCES

LIST OF FIGURES

3.1 Chain of actions from random program generators to compilers
3.2 Hierarchy and relationships of program constructs in a Csmith abstract syntax tree
5.1 Generation times (in seconds) per test case
5.2 Random generators tend to reach near-maximum code coverage on compilers within the first three hours
5.3 Distinct compile-time errors found and rates of compile-time and wrong-code errors, from several LLVM and GCC versions
5.4 Number of distinct crash errors found in 24 hours of testing with Csmith-generated programs in a given size range
5.5 Comparison of the ability of five random program generators to find distinct crash errors

LIST OF TABLES

3.1 Weights of different kinds of statements
3.2 Weights of different kinds of expressions
3.3 Weighted probabilities of variable storage scopes
4.1 Type-specific transfer functions for statements
4.2 Iterations of points-to analysis
4.3 Type-specific transfer functions for expressions
5.1 Compile-time and wrong-code bugs found by Csmith that manifest when compiler optimizations are disabled (i.e., when the -O0 command-line option is used)
5.2 Comparison of hour-by-hour generating performance of random C program generators
5.3 Hourly generating performance of Csmith at program construct levels
5.4 10 test cases taking the longest time to generate by Csmith
5.5 Augmenting the GCC and LLVM test suites with 10,000 randomly generated programs did not improve code coverage much
5.6 Csmith covers significantly more compiler code than other random C program generators at the end of a 12-hour generation period
5.7 Compiler code coverage achieved by Csmith vs. code coverage achieved by GCC's test suite
5.8 The bug-finding power of Csmith outperforms other random program generators and its feature-limited variants
6.1 Total compile-time and wrong-code bugs found by Csmith
6.2 Compile-time bugs by category
6.3 Wrong-code bugs by category
6.4 GCC bugs by priority
6.5 Distribution of bugs across compiler stages
6.6 Top 10 buggy files in GCC
6.7 Top 10 buggy files in LLVM

ACKNOWLEDGMENTS

I thank my advisor, John Regehr, for his support throughout the years. John is the one who brought me into the world of open source compilers. During our first meeting in 2006, he encouraged me to hack GCC, an adventure I promptly embarked on without realizing the huge challenges. While being advised by John, I learned how to express wild ideas in plain English and build practical solutions out of them. John showed me that being a researcher is not in conflict with being a hacker. John made an extraordinary effort to correct many errors of expression, grammar, and format in early revisions of my dissertation. Without him, the dissertation would never be as polished as it appears now.

I also thank the other members of my advisory committee (in no particular order): Ganesh Gopalakrishnan, Matthew Flatt, Eric Eide, and Chris Lattner. They provided many valuable suggestions on my proposal and the final writings. My committee showed a great deal of flexibility when I had to schedule meetings on short notice, and never complained about canceled meetings. I thank Chris for playing a dual role in our project: first, as the cocreator of LLVM, he provided insight into how compilers work and where the vulnerabilities are; second, he encouraged us to submit LLVM bugs and personally fixed many of them.

Our random program generator borrowed many ideas from previous researchers. I thank the following people for generously sharing their source code with us: William M. McKeeman, Bryan Turner, Christian Lindig, and Eric Eide. Eric's work on volatile-related bugs in compilers directly inspired my research. Many thanks to Eric for his administrative role on the Csmith project and his guidance on how to be a disciplined programmer.

Portions of the dissertation were previously published in a PLDI 2011 paper titled "Finding and understanding bugs in C compilers." I am grateful for the hard work of the coauthors, reviewers, and the shepherd that led to publication. That research was primarily supported by an award from DARPA's Computer Science Study Group.

It was a pleasant experience to work closely with Yang Chen, another PhD student advised by John. I appreciate his contributions to Csmith and his personal friendship over the years.

I am thankful to the many users of Csmith who reported bugs or provided feedback. The following is by no means a complete list, in no particular order: Pascal Cuoq, Arthur O'Dwyer, Chucky Ellison, Derek M. Jones, Paulo J. Matos, Nelson H. F. Beebe, Shakthi Kannan, Andy Lester, Bill Chan, and Xavier Leroy.

My final thanks go to the GCC team and the LLVM team for promptly fixing the (occasionally weird) bugs we reported. By providing strong open source compilers to the community, they have indeed made the world a better place.

CHAPTER 1

INTRODUCTION

Our hypothesis starts from a monkey, symbolizing randomness, and a compiler. Close examination of the monkeys, the compilers, and their interactions leads us to believe that random differential testing is an effective way to uncover compiler bugs.

1.1 A monkey

The infinite monkey theorem states that if a monkey hits typewriter keys randomly, it is entirely possible for it to type the complete works of William Shakespeare given infinite time. On the other hand, within a limited time, even one on the order of the age of the universe, it is extremely unlikely for a monkey to exactly reproduce a work of Shakespeare [72].

Obviously, time is one of the deciding factors in the monkey's output: given limited time, the monkey would most likely produce garbage; given infinite time, it is certainly capable of producing wonderful artifacts. Practically speaking, when time is constrained, the knowledge of the monkey plays an important role in the outcome. For example, a monkey that knows the vocabulary of an English dictionary certainly has a greater chance of producing one of Shakespeare's works than its peers that do not. If we, as mortals, want the monkey to create something meaningful out of randomness during our lifetime, providing the monkey with intelligence is essential.

1.2 A compiler

Compilers translate programs written in high level languages into low level representations, e.g., byte code, assembly code, and machine code. High level programming languages are preferred by programmers because they in general provide higher abstraction, scalability, and productivity than their low level counterparts. As the software community moves toward high level programming languages, compilers become increasingly important. Ultimately, programs written in high level languages cannot be executed until they are translated by a compiler or interpreter.

Compilers have direct users (programmers) and indirect users (people who merely execute compiled programs). In either case, users expect compilers to be correct. A bug in a compiler could cause miscompilation of programs, which, when executed, could have devastating consequences for their users. The correctness of C compilers is especially critical because they are used to compile operating systems, part of a trusted computing base. For this reason, C compilers are sometimes included in the trusted computing base as well. C is also used in many safety critical programs. A bug (or a malicious error) in a C compiler is enough to compromise a missile control system and cause a misfire.

There are requirements other than correctness placed on compilers. Compiler users often expect compilers to produce efficient code quickly, i.e., code that utilizes CPU cycles and memory space efficiently. These objectives, sometimes contradictory, lead to increasing complexity of compilers, and sometimes counterintuitive choices have to be made.

Take LLVM as an example. Initially named “Low Level Virtual Machine,” the project started in 2000 as a dynamic compilation framework that enables optimizations at compile-time, link-time, or run-time. It became a standalone C compiler with the addition of Clang, a front end for C/C++/Objective-C/Objective-C++ programs, in 2007 [38]. LLVM 1.0 was released in 2003 with 1,267 files and 183,582 lines of code.¹ Over the last 10 years, the source code of LLVM has grown nearly 700% in files and nearly 800% in LOC. A test suite, which was missing in LLVM 1.0, adds approximately a million lines of source code on top of the 800% growth. For every three lines of compiler source code, approximately two lines of test suite code are needed to validate their correctness. This ratio demonstrates the LLVM team's commitment to delivering high quality compilers, while at the same time highlighting the challenge of validating compiler correctness.

1.3 Putting a monkey and a compiler together - random testing of a compiler

Facing this challenge, we could use a “monkey” to help us validate the correctness of a compiler. The “monkey” is a random program generator, and the random programs are used as test cases for the compiler under test. McKeeman [44] proposed seven levels of quality for a random program generator. I believe the concept can be extended to all random generators with the following six levels of sophistication:

¹All LOC figures, if not otherwise noted, are reported by cloc 1.58.

• Level 1 generators operate without any regard for the programming language. A level 1 generator could simply generate strings of random characters.

• Level 2 generators know the tokens of the target language, but cannot properly put them together. For example, a level 2 generator could randomly emit reserved keywords and delimiters of the C language in arbitrary orders.

• Level 3 generators know the grammar of the language and are capable of generating syntactically correct test cases. This is a large advance in sophistication over level 2.

• Level 4 generators know not only the grammar but also the semantics of the target, and avoid generating problematic code identifiable by scanning nearby code. For example, a level 4 C program generator should never generate “x / 0” unless the handling of divide-by-zero is being tested (see the sketch after this list).

• Level 5 generators have all level 4 capabilities, plus the capability to avoid generating problematic code identifiable only by examining runtime behaviors, such as invalid memory references.

• Level 6 generators, on top of level 5 capabilities, know potential vulnerabilities in the software under test (SUT), and adapt their random program generation to reveal the vulnerabilities.
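To make the jump from level 4 to level 5 concrete, the following is a minimal sketch of guarded division in the spirit of the safe-math wrappers emitted in Csmith-generated programs; the function name and fallback values here are illustrative, not Csmith's actual code.

#include <limits.h>
#include <stdio.h>

/* A sketch of a "safe" division wrapper: it keeps division expressive
 * while ruling out the two undefined cases, division by zero and
 * INT_MIN / -1 (whose result does not fit in an int).  Returning the
 * dividend as the fallback is an arbitrary but deterministic choice. */
static int safe_div(int a, int b)
{
    if (b == 0)
        return a;                     /* avoid divide by zero */
    if (a == INT_MIN && b == -1)
        return a;                     /* avoid signed overflow */
    return a / b;
}

int main(void)
{
    printf("%d\n", safe_div(10, 0));        /* 10, not undefined */
    printf("%d\n", safe_div(INT_MIN, -1));  /* INT_MIN, not undefined */
    printf("%d\n", safe_div(10, 3));        /* 3, the ordinary case */
    return 0;
}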

A level 1 or 2 random generator is fairly limited. It produces test cases likely to be rejected by the SUT. The test cases probably only validate the components inside the SUT that are responsible for parsing inputs and rejecting invalid ones. In addition, that space happens to be an easy target of fuzz testing [69], and thus could have been heavily fuzz tested already. Nevertheless, level 1 or 2 generators can still find bugs, some of them probably unexpected.

While random testing C compilers, we found level 3 and 4 generators to be less than desirable. Their output is likely to exhibit undefined behaviors, such as null pointer dereferences. The C language standard allows a compiler implementation to translate code with undefined behaviors into anything the compiler writers desire. Given an input with undefined behaviors, the correctness of compilers cannot be expected, let alone reasoned about. Nevertheless, level 3 or 4 generators are effective in finding bugs when the compiler is in a fast evolving stage.

While level 5 generators have deep knowledge of the domain, i.e., the programming language, level 6 generators have deep knowledge of the SUT. With this knowledge expressed in some form in the random programs, the generators are generally more powerful in finding bugs, in terms of both quantity and quality, than lower level generators.

1.4 Adding more compilers to the equation - random differential testing

Random testing has been shown to be effective in testing compilers [4]. For any random testing methodology, the foremost question is: how do we know when a random program is miscompiled? A few choices are:

• Looking for error indications, such as crashes, hangs, out-of-memory, and incorrect error/warning messages, during compilation. However, this does not work for silent miscompilations.

• Examining the generated low level code to verify its validity, and comparing it to the source code to validate that they are semantically equivalent. Program equivalence is undecidable in general. Even if we simplify the problem by using key metrics to represent the semantics, the solution is unlikely to scale to thousands, sometimes millions, of random programs.

• Generating a test oracle while generating a random program. The test oracle is used to determine pass/fail during execution time. A few early random program generators used to generate self-checking code, e.g., assertions, in the random programs. However, either the assertions have to be weakened, or the random programs have to be of limited complexity to be able to make definite assertions. Either way, the ability to find compiler bugs is reduced.

• Using differential testing by adding another compiler implementation to the equation.

Differential testing [44] provides a test oracle. It is based on the idea that if a language specification has multiple implementations, we can 1) compile the same program with different compiler implementations, 2) execute the compiled programs, and 3) compare the results of the different executions. If the execution results do not agree with each other, then at least one compiler implementation is wrong, unless the input program is invalid. In essence, differential testing gives an oracle that tells us something is wrong, but not what is wrong or how it is wrong.

Optimizing compilers are usually designed with passes of transformations, some of which are optional. A single compiler can be practically treated as multiple implementations if it provides means to compile programs with different passes.
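The three steps above fit in a very small driver. The following is a minimal sketch, assuming gcc and clang are installed, that test.c is a deterministic program printing a single line of output (e.g., a checksum), and that test.c is free of undefined and unspecified behavior; the file names are made up for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Run a command and capture the first line of its standard output. */
static void first_line_of(const char *cmd, char *buf, size_t n)
{
    FILE *p = popen(cmd, "r");
    if (p == NULL || fgets(buf, (int)n, p) == NULL)
        buf[0] = '\0';
    if (p != NULL)
        pclose(p);
}

int main(void)
{
    char out1[256], out2[256];

    /* Step 1: compile the same program with two implementations. */
    if (system("gcc -O2 -o test_gcc test.c") != 0)
        return 1;
    if (system("clang -O2 -o test_clang test.c") != 0)
        return 1;

    /* Step 2: execute the compiled programs. */
    first_line_of("./test_gcc", out1, sizeof out1);
    first_line_of("./test_clang", out2, sizeof out2);

    /* Step 3: compare the results.  A mismatch means at least one
     * compiler is wrong, provided the input program is valid. */
    if (strcmp(out1, out2) != 0) {
        printf("MISMATCH: %s vs. %s\n", out1, out2);
        return 2;
    }
    return 0;
}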

By combining random testing with differential testing, we can fully automate the compiler-testing process end-to-end. With this automation, mass random testing becomes possible. The end result is that numerous compiler bugs have been found in this fashion in recent years [14, 58].

The test oracle of differential testing is valid only when each test case has just one interpretation. In the case of C compilers, this means the test case has to be free of undefined behaviors and independent of unspecified behaviors. Otherwise, the meaning of the random program becomes ambiguous, the execution result becomes nondeterministic, and the compiler implementations are not required to agree on the results.

My thesis statement is that random differential testing is effective in finding compiler bugs. We prove the statement with a sophisticated random program generator: Csmith.

1.5 Challenges

We aim for a level 6 random program generator because we are testing the most well-established C compilers. Some of them have been used for decades by generations of programmers. Their strength has been attested and augmented by widespread academic and industrial usage. We believe only a random generator of the highest sophistication level can be successful at finding bugs in these compilers. We believe this goal is accomplished with Csmith.

1.5.1 A random program generator that is both expressive and unambiguous

Before level 6 can be reached, implementing a level 5 generator is itself challenging. Creating a random program generator is similar to teaching a child to speak: at first, they can only mumble, but once they can speak recognizable sentences, the chance of ambiguity increases as their expressiveness grows. Similarly, creating a random program generator that maintains 100% unambiguity is challenging. As we add expressiveness, more effort is required to keep the generator from generating ambiguous code.

C programs can be ambiguous because the language standard stops short of assigning definite meanings to certain code. Instead, the standard declares that such code has undefined, unspecified, or implementation-defined behaviors. C is a flexible low level language that is portable between many different platforms. The ambiguities are purposely left in the standard to accommodate the diversity of the platforms that C supports, and they are disclosed in the standard so that relevant parties, such as compiler writers and compiler users, can handle them properly. The latest C language standard, C11 [29], designates 58 kinds of unspecified behaviors, 204 kinds of undefined behaviors, and 104 kinds of implementation-defined behaviors.

We have to implement mechanisms in the random generator to carefully avoid most of them. Some can be avoided by not generating a language construct or feature altogether: e.g., there can never be an array out-of-bound access if the generator produces no arrays. However, limiting the expressiveness would limit the code coverage of compilers, thus limiting the bug-finding power of differential random testing.

These two objectives, being expressive and being unambiguous, are sometimes contradictory. Expressiveness comes with a chance of ambiguity. For example, adding integer arithmetic introduces signed integer overflow, while adding pointers leads to possible null pointer dereferences. For every language construct or feature added, we must add logic in the random generator to avoid the undesired behaviors associated with it. This logic sometimes requires complicated generation-time program analysis, as the examples below illustrate.
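The function below collects a few of the ambiguities just mentioned in one place; it is an invented composite, not generator output. The first statement can be rejected by scanning nearby code (level 4), while the others require reasoning about runtime state (level 5).

#include <limits.h>

/* Code a generator must never emit (or must guard): each statement
 * compiles, yet has no single meaning defined by the C standard. */
int ambiguous(int x, int i, int *p)
{
    int a = 1 / 0;       /* undefined: divide by zero, visible statically */
    int b = i++ + i++;   /* undefined: unsequenced side effects on i */
    int c = x + 1;       /* undefined only if x == INT_MAX at run time */
    int d = *p;          /* undefined only if p is null or dangling */
    return a + b + c + d;
}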

1.5.2 Find and understand bugs in open source C compilers

We target our random testing effort mainly toward two open source compilers: GCC and LLVM. They are chosen because they are both well-respected, well-established, and widely used by the research community and commercial companies. We want to pick worthy opponents for our random generator.

The GNU Compiler Collection (GCC) is a compiler system from the GNU Project [23]. It is composed of compilers for several programming languages that share a common middle end. LLVM is a relatively younger compiler system. Like GCC, it also supports multiple languages and targets many popular platforms. GCC and Clang/LLVM are two prime examples of strong open source compilers. They have been studied extensively by academic researchers [17] [42], and used or tested by thousands of developers around the world over the years. In addition, both compilers ship with large test suites that are diligently used for validation by dedicated teams [19] [43].

Both GCC and LLVM/Clang have evolved over the years into large software systems due to the demands of supporting multiple languages and architectures and of performing more sophisticated optimizations. As of June 2013, LLVM/Clang ships with around 2.8 MLOC, while GCC ships with approximately 1.5 MLOC. Nowadays, both compilers are stable enough that it is difficult for a tester to find bugs in them. Creating a random program generator and finding bugs in GCC and LLVM is understandably more challenging than doing so for compilers that have received less engineering investment.

Our effort does not stop after bugs are found. We report them to the compiler teams, and we try to understand the bugs from a nondeveloper's point of view. We study the bugs to identify areas in compilers that are potential targets of random testing, and then try to extend the generator to cover such areas.

Besides GCC and LLVM, we applied random differential testing to other well-known compilers, most of them commercial software. We found bugs in those compilers quickly, but the experiment stopped there. Commercial compiler vendors are often not eager to fix the bugs we report and/or are unwilling to disclose how the bugs were fixed.

1.6 Evaluation

Our goal, creating a random program generator that is both expressive and unambiguous, is achieved with Csmith. It generates a large set of C language features, and it uses GTAV (Generation-Time Analysis and Validation) to avoid any ambiguity in the generated code throughout the random generation process.

Our generator is expressive because it supports most C features. Still, we need to show its expressiveness quantitatively. Since there are no concrete metrics for expressiveness, we resort to an indirect one: code coverage of compiler source code. We argue that the expressiveness of a random program generator is correlated with the variety of the programs it generates, which in turn is approximately correlated with the compiler source code coverage that the random programs can achieve collectively. We believe that in the domain of compiler testing, and for the purpose of comparing expressiveness between random program generators, compiler code coverage is a fairly close approximation of expressiveness. After all, the random programs are used to exercise compiler code. A generator that produces random programs capable of exercising more code in the compilers is more expressive than a generator that reaches less compiler code.

In the domain of compiler testing, bug-finding performance is another important metric. Bug-finding performance measures the effectiveness of finding distinct valid bugs. Duplicate bugs and invalid bugs (errors triggered by invalid inputs) are excluded. To add to the challenge, we even exclude bugs that were already reported by other parties. Bug-finding performance is directly measurable by counting qualified bugs. However, duplication of bugs can be hard to determine. In practice, we judge whether two bugs are distinct based on the error messages at compile time. For miscompilation bugs, which yield no compile-time error messages, we conservatively assume they are duplicates until proven otherwise: e.g., a bug that is fixed in a given revision is certainly not a duplicate of a bug that still exists in that revision.

We compared Csmith with several previously published random C program generators on both metrics. All of those generators had been used to find bugs in C compilers. The most recent one, randprog, is from our own research group and had found numerous bugs in GCC and LLVM.

Bug-finding performance is measured in both controlled and uncontrolled environments. In a controlled environment, we are interested in how good Csmith is at finding compiler bugs within a short period of time, relative to other previously published random C program generators. In an uncontrolled environment, we generate random programs using Csmith and apply random differential testing to find bugs in well-known compilers including GCC and LLVM/Clang. The validity of the bugs we report is confirmed by the acceptance of the compiler teams and the fixes they implement. The uncontrolled experiment lasted more than four years, most of it in parallel with our development of Csmith.

1.7 Contributions

We identified six levels of random generators. We argue that to find bugs in well-established compilers, a random generator has to be at the highest levels to be effective. We give a reference implementation of a level 6 generator: Csmith.

We propose using generation time analysis and validation (GTAV) to avoid generating invalid test cases. Our implementation of GTAV is efficient and precise, capable of performing pointer analysis and side-effect analysis using a generic framework. GTAV ensures the random programs are always unambiguous while we add expressiveness to Csmith. In addition to being able to avoid generating ambiguous code, as a level 5 generator would, Csmith is designed to explore potential vulnerabilities in compilers, as expected of a level 6 generator. An example of such a vulnerability is the auto-vectorizer in compilers.

With the criteria we defined in Section 1.6, we found Csmith to be more expressive and to have better bug-finding performance than previously published random C program generators. More impressively, during more than four years of testing, Csmith has found over 400 bugs in GCC and LLVM/Clang. We studied all of the bugs and put them into different categories. Some interesting ones are analyzed to find their root causes. We give our own speculation on why Csmith is able to expose these bugs while other test methodologies failed to do so. Finally, we give suggestions to all parties who have a stake in compilers, the writers, the testers, and the users, on how to improve compiler correctness.

1.8 Thesis organization

This thesis is organized into the following chapters:

• Chapter 1: Introduction - this chapter.

• Chapter 2: Background - some background on software testing; compilers, with emphasis on GCC and LLVM; compiler testing; random testing and its usage in compiler testing.

• Chapter 3: Code generator - the core code generator; its purpose; the design choices; the overall process; the grammatical representation of the generating logic; how randomness influences the process and how we control the randomness.

• Chapter 4: Generation time analysis and validation (GTAV) - the analyses and validation accompanying the code generation; its purpose; its framework; interactions between GTAV and the code generator; examples of analyses and validations; the proof that GTAV is guaranteed to terminate.

• Chapter 5: Evaluation - the uncontrolled experiments: finding bugs in real compilers and reporting them; the controlled experiments: comparing our random generator with previously published random C generators on the metrics of compiler-source-code coverage and bug-finding performance; the relationship between random code size and bug-finding performance.

• Chapter 6: Compiler bug studies - in-depth studies of the compiler bugs found by Csmith; their types; their locations; their importance; their root causes; case studies of a few bugs in GCC and LLVM; suggestions to compiler writers/testers/users.

• Chapter 7: Related work and future directions - recent research making use of Csmith; recent research using random generators for testing compilers/interpreters; what is next for our random program generator.

• Chapter 8: Conclusion.

CHAPTER 2

BACKGROUND

No service is promised, but we are strongly interested in fixing any bugs; please, please report them.
Leonard H. Tower Jr., GNU C compiler beta test release, 1987

In this chapter, we start by discussing the general practice of software testing, then extend the discussion to the testing of compilers. Particular attention is paid to open source compilers, since they are the SUTs (Systems Under Test) of this research. As examples, the testing of GCC and LLVM as practiced by their respective teams is presented. The final section covers how random testing of compilers was conducted in previous research.

2.1 Software testing

Software testing is a critical part of the software development process. It aims to assess or enhance the quality of the software under test (SUT). An SUT typically has documented user requirements and/or design specifications, and software testing evaluates the software against these documents. If they are satisfied, the SUT is considered high quality. In practice, software testing is conducted through bug finding: any behavior of the executing SUT that deviates from the requirements/specifications is considered a bug [73].

ISTQB [31] states that “Software defects usually originate from a programming error (mistake), and end with an execution that produces wrong results (failure). Not all defects will necessarily result in failures, e.g., defects in dead code. A single defect may result in a wide range of failure symptoms.”

Broadly speaking, test oracles other than requirements/specifications can be used to detect software bugs. Contracts, competing products, previous versions, user expectations, relevant standards, applicable laws, or other criteria can serve as test oracles [33].

Software testing is as old as software development. Over the years, many methodologies have evolved around it; the web page [73] provides a summary. The following are the major ones:

• Functional testing: Functional testing tries to verify the specification or functionality of the code. Specifications are usually documented in the user requirements and design specifications, and are used as references during testing;

• White-box testing vs. black-box testing vs. gray-box testing: Black-box testing verifies and validates the software without any knowledge of its internal implementation. It views the SUT as a “black box,” and the testing is usually specification-based. White-box testing verifies and validates the software with knowledge of the internal implementation, e.g., its data structures and algorithms. The testing focuses on internal interactions of the SUT. Gray-box testing applies black-box level tests with knowledge of the internal implementation;

• Unit testing vs. integration testing: Unit testing, a.k.a. component testing, verifies the functionality of an encapsulated component of the SUT. A component (unit) is usually defined at the function or class level. The tests are usually written by the developers of the component(s). Integration testing is a follow-up to component testing: it tries to verify the interfaces between components, making sure the integrated product works according to a specification. Integration testing aims to find defects in mismatched interfaces and misinteractions between components (units);

• Regression testing: Regression testing focuses on finding software regressions, instances where software functionality that previously worked correctly stops working due to recent changes;

• Security testing: Security testing focuses on finding security loopholes that enable system intrusion by hackers. It is essential for software that processes confidential or sensitive data;

• Error testing: Error testing gives invalid inputs to the software, and validates that its error handling is graceful and expected.

Our random differential testing of compilers employs functional testing, integration testing, and regression testing. We try to improve its bug-finding performance by adopting as much gray-box testing as possible: we applied our knowledge of compiler workflow to the creation of the random generator, hoping to expose vulnerabilities in the compilers.

2.2 Compilers

A compiler is software that transforms source code written in high level programming languages into low level representations. Some examples of high level languages are C, C++, and Java. The low level representations can be in the form of assembly language, machine code, or byte code. The primary functionality of a compiler is to translate programs written in more programmer-friendly languages into programs that are (almost) ready for execution [67].

2.2.1 History of compilers

Software in the early days was mostly written in assembly languages. These languages make programming much easier than writing machine code because assembly languages produce more reusable and relocatable programs. But assembly languages are not portable: a program written in an assembly language has to be modified or rewritten if it is to be executed on a different computer hardware architecture [71].

The need to write portable programs, meaning code once and compile/execute everywhere, led to the invention of higher level programming languages and, consequently, compilers. In 1952, Grace Hopper developed the first compiler, for the A-0 programming language [71]. Around the end of the 1950s, machine-independent programming languages were proposed, and subsequently several experimental compilers were developed. With the benefit of being able to reuse software on different architectures significantly outweighing the cost of writing a compiler, compilers became a mainstream development tool. However, early computers came with little memory, a limitation that posed challenges to compiler writers, and most compilers were written in assembly languages for performance reasons. In 1962, Tim Hart and Mike Levin at MIT wrote the first self-hosting compiler, which was written in Lisp and could be used to compile its own source code [71].

C has become a widespread general purpose programming language since its creation by Dennis Ritchie between 1969 and 1973 at AT&T Bell Labs. One of the first C compilers was the Portable C Compiler (pcc), written by Stephen C. Johnson of Bell Labs in the mid-1970s. The GNU C Compiler (gcc) was released in 1987 and became an influential and popular C compiler [16], largely due to its openness and adherence to the language standard. Since its first release in 2003, LLVM/Clang has attracted a lot of attention as an open source compiler. It is widely adopted by many research and commercial projects due to its well-designed architecture, cutting edge optimizations, and flexible license terms [40].

2.2.2 Structure of a compiler

The functionality of a compiler includes (but is not limited to) [67]:

• Validating the syntactical correctness of programs, and optionally providing feedback to users;

• Generating correct and efficient low level code through translations and/or optimizations;

• Providing runtime organization, e.g., calling conventions, based on the language and/or the architecture;

• Outputting the low level code according to specifications of assemblers, linkers, or virtual machines.

A compiler typically consists of three main components: the front end, the middle end, and the back end. Correspondingly, a compilation undergoes three phases. The Production Quality Compiler-Compiler Project (PQCC) at Carnegie Mellon University [35] promoted the division of the compilation process into phases. This approach makes it possible to piece together front ends (tailored to different languages), back ends (targeted to different architectures), and a shared middle end. Together, they form a compiler suite that supports multiple languages. Practical examples of this approach are GCC and LLVM.

The front end checks whether the language syntax of the program is correct. Semantic errors, such as divide-by-zero, could also be checked where possible. Type checking is performed statically. If there are any errors, they are reported and the compiler stops. Otherwise, the front end generates an intermediate representation (IR) of the source code for processing by the middle end. Major components in the front end typically include the preprocessor, lexical analyzer, syntax analyzer, and semantic analyzer [67].

For optimizing compilers, the middle end is where most of the code optimization takes place. The optimization is often implemented as multipass transformations applied to the IR. Typical optimizing transformations involve removing useless code (code that has no effect) or dead code (code unreachable during execution), propagating constant values, reusing previously computed results, etc. Program analysis is sometimes required before applying transformations [67].

The back end is responsible for translating the IR produced by the middle end into low level code. Major activities involve instruction selection/scheduling and register allocation. Instruction selection chooses target-specific instructions for the IR. Register allocation assigns the limited number of processor registers (hardware registers) to the unlimited pseudo-registers used in the IR. The back end optimizes the target code based on the hardware configuration, e.g., instruction scheduling could recognize that there are multiple execution units and reorder instructions accordingly to maximize parallelism [67]. The output from the back end is typically a program in an assembly language, which is then translated into machine code by an assembler.
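As a small illustration of these middle-end transformations (the code and function names are invented for this example), constant propagation, constant folding, and dead code elimination combine to shrink the first function below into the second:

/* A hypothetical helper standing in for an arbitrary computation. */
static int helper(void) { return 100; }

/* What the programmer wrote: */
int before(void)
{
    int x = 4;
    int y = x * 2;     /* constant propagation: y = 4 * 2; folds to 8 */
    if (y > 100)       /* folds to (8 > 100), which is always false... */
        y = helper();  /* ...so this branch is dead and can be removed */
    return y;
}

/* What the middle end can reduce the function to: */
int after(void)
{
    return 8;
}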

2.3 GCC

2.3.1 History

The GNU Compiler Collection (GCC) is a suite of compilers developed and published by the GNU Project. GCC supports several programming languages, including C, C++, and Java. GCC has shipped with many modern Unix-like computer operating systems, most notably Linux, as the default compiler [70]. It has been ported to more than 60 processor families, including the most widely used IA-32 (x86) and ARM [24].

GCC 1.0 was released in 1987 by Richard M. Stallman, the founder of the Free Software Foundation (FSF), with only a front end for C programs. Later additions of more front ends enabled it to compile programs written in other languages such as C++, Objective-C, Objective-C++, Fortran, Java, Ada, and Go [20]. With more languages supported, the full name was appropriately changed from “GNU C Compiler” to “GNU Compiler Collection.” GCC is licensed under the GPL, a license that gives programmers the freedom to modify the compiler provided they distribute the modified source code. As an open source project, GCC has played an important role in the growth of free software [70].

In 1997, several developers advocated and formed EGCS (Experimental/Enhanced GNU Compiler System) [22]. They intended to combine several GCC forks into a single project. Because the official project was tightly controlled by the FSF, the EGCS group created a separate project for the merged forks, which underwent considerably more active development than the official project. In 1999, the FSF officially halted its own development, merged the EGCS branch into the official project, and appointed the EGCS project owners as the GCC steering committee members [70]. As of 2014, GCC is maintained by a varied group of programmers from around the world, under the direction of the steering committee.

2.3.2 System overview

Internally, GCC consists of a collection of compilers for different languages, e.g., cc1 for the C language. Externally, the driver program gcc interprets command arguments, decides which compiler to use for an input file depending on the language used, optionally runs the assembler on its outputs, and then optionally runs the linker to produce an executable [70].

The front end is language-specific. A parser processes the source code written in a supported programming language and produces an abstract syntax tree (AST, or tree for short). The tree representations for different languages can differ, so the front end converts them into GENERIC, a language-independent representation. Subsequently, a gimplifier lowers the GENERIC form into the simpler SSA-based GIMPLE form, which is consumed by the middle end [70]. Static single assignment (SSA) is a representation that allows only one assignment per variable; many static analyses are greatly simplified by SSA.

The middle end of GCC performs language-independent and architecture-independent analyses and transformations on code represented in GIMPLE form. The middle end employs a pass manager to apply various transformations to the IR. The passes include the standard transformations, such as loop optimizations, dead code elimination, and common subexpression elimination [70]. The middle end then transforms the code into RTL (Register Transfer Language) form for the back end.

GCC's back end depends on machine description files, which are usually provided by the architecture developers who want to port GCC. A machine description specifies target-specific information, e.g., the endianness, word size, and calling conventions. The machine description files are taken into consideration when building GCC for the target architecture. In the back end, RTL code is mapped to machine instructions and hard registers, both specified in the machine description files. During register allocation, initially-assigned pseudo-registers are mapped to hardware registers. Sometimes not all pseudo-registers can be assigned due to the limited number of hard registers; a follow-up “reloading” pass accommodates the unassigned pseudo-registers by “spilling” them to the stack [70].
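To illustrate the SSA property, consider a function that assigns x twice. The comments sketch, in simplified and hypothetical notation, the renamed versions SSA introduces; the -fdump-tree-ssa flag shown at the end is a real GCC option for inspecting the actual dump.

int twice(int a, int b)
{
    int x = a + b;   /* in SSA form, roughly: x_1 = a + b;            */
    x = x * 2;       /* a reassignment gets a new name: x_2 = x_1 * 2; */
    return x;        /* uses refer to the latest version: return x_2;  */
}

/* To inspect GCC's real SSA dump for this function:
 *   gcc -O1 -fdump-tree-ssa -c twice.c
 * which writes a dump file (e.g., twice.c.*.ssa) next to the source. */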

2.3.3 GCC testing

GCC ships with several test suites to help maintain compiler quality. Most language front ends in GCC have test suites, including those for C, C++, Ada, Java, and Go. Some optimizations, such as link-time optimization and profile-directed optimization, also have their own test suites [19].

The C language test suite includes a “gcc.dg” subsuite and a “gcc.c-torture” subsuite. The former is used to test particular features of the C compiler, while the latter consists of code fragments that have historically broken the C compiler. The “torture” test cases are compiled with multiple optimization options. They also contain tests to ensure that certain optimizations occur for particular inputs.

A test harness is implemented to automatically build GCC and compile its test suite, as well as SPEC CPU 2K/2006, with various optimization options. The harness runs on Linux/ia32 and Linux/Intel64. Several volunteers perform regular builds and regression test runs. All regressions are sent to the gcc-regression mailing list. GCC developers have published instructions on how to run the test suites on user machines, and have asked users to submit their own test results to the list.

In addition to test suites, GCC developers use a few selected applications, including Blitz++ and Boost, to verify the functionality of the compiler. Users are encouraged to test-compile applications important to them once the compiler is installed. The applications can be as large as the Linux kernel or GCC itself [19].
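For a flavor of the gcc.dg subsuite, here is a sketch in its style; the { dg-do } and { dg-options } directives are the suite's real DejaGnu syntax, but this particular test is invented for illustration.

/* { dg-do run } */
/* { dg-options "-O2" } */

/* An illustrative gcc.dg-style test: compile at -O2, run the binary,
 * and fail by calling abort if the compiled code miscomputes f(21). */
extern void abort (void);

int __attribute__ ((noinline))
f (int x)
{
  return x * 2;
}

int
main (void)
{
  if (f (21) != 42)
    abort ();
  return 0;
}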

2.4 LLVM and Clang

2.4.1 History

The LLVM project (the name initially stood for “Low Level Virtual Machine,” an acronym that no longer applies) was started in 2000 by Vikram Adve and Chris Lattner as a research project at the University of Illinois at Urbana-Champaign. It was designed to provide an infrastructure that enables optimizations at compile-time, link-time, or even run-time [40]. Apple became involved in the LLVM project in 2005 by recruiting Lattner and investing resources in the project. Since then, LLVM has become a primary development tool on Apple's operating systems [1] while remaining an open source project. It is also extensively used by teams outside Apple as a framework for code analysis and transformation, in addition to its original purpose of implementing virtual machines. LLVM provides an infrastructure that includes the IR, the debugger, and the runtime library, and a great set of tools is built on top of this infrastructure.

LLVM initially used GCC's front end to generate a language-independent IR and then converted it to LLVM IR (intermediate representation), which became the representation operated on by the various analyses and optimizations. The IR is used as input by LLVM's back end to produce target code. The compiler that resulted from integrating GCC's front end with LLVM is named “llvm-gcc.”

However, piggybacking on GCC led to several problems [49]: Apple's software makes heavy use of Objective-C, which was placed low on the priority list by GCC developers. Also, GCC could not be integrated smoothly into Apple's integrated development environment (IDE). Finally, GCC is licensed under GPL version 3, which requires developers who distribute extensions for GCC, or modifications of it such as llvm-gcc, to publish their source code. LLVM's license [38] is more liberal and permits inclusion of the source code into proprietary software. Apple chose to drop the GCC front end and develop its own. This project resulted in Clang, a language front end for C/C++ and Objective-C/C++. Clang makes LLVM completely independent of GCC. Compared to the GCC front end, Clang provides better integration with an IDE [49] and more expressive diagnostics [38]. Clang was open sourced in 2007.

2.4.2 System overview

LLVM serves as the middle end and back end of a complete compiler system. It takes the output of any front end as input, as long as that output is in, or can be converted into, LLVM intermediate representation (IR). LLVM applies optimizations to the intermediate representation before converting it into machine-dependent, target-specific assembly code. Joined with Clang, LLVM/Clang forms a complete compiler system, capable of compiling programs written in C/C++ or Objective-C/C++ to code executable on various target architectures including x86, x64, PowerPC, and ARM.

LLVM IR is a language-independent RISC-like instruction set and type system [41]. Each instruction is in SSA form. Its type system consists of basic types, such as integers or floats, and five derived types: pointers, arrays, vectors, structures, and functions. Unlike GCC, LLVM uses only one IR across all compilation phases. LLVM IR allows code to be compiled and/or optimized statically or just-in-time (JIT). The simplicity and versatility of LLVM IR have attracted many projects to use it as their code representation [42].

LLVM has a well-designed architecture and modularization. Rather than producing a monolithic executable whose components are difficult to reuse, LLVM produces a set of reusable libraries. This allows an LLVM user to pick and choose which libraries to use in a compilation. It even enables users to write their own libraries and integrate them with LLVM.

2.4.3 LLVM testing

The LLVM testing infrastructure [43] contains two major categories of tests: regression tests and whole programs (the test suite). The regression tests are contained inside the LLVM repository and are validated before every commit. The regression tests are incomplete programs consisting of small pieces of code, each of which tests a specific feature of LLVM or triggers a specific bug in LLVM. They are usually written in LLVM IR. Each regression test case has a “RUN” command that tells the testing utility which commands to execute and which outputs to verify.

The test suite contains whole programs, generally written in high level languages such as C or C++. These programs are compiled using several methods and executed, and then the outputs are compared. The compilation methods include the native compiler (the golden implementation), the LLVM C back end, the LLVM JIT, and LLVM native code generation. A whole program consists of either single-source or multiple-source files. The latter can be as large as the ones in the SPEC 95 and SPEC 2000 benchmark suites [43].
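Clang's C-language regression tests use the same “RUN”-line convention; below is a sketch in that style. The %clang_cc1 substitution, the RUN line, and the FileCheck utility are real parts of the lit test harness, but this particular test is invented.

// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s

// An illustrative regression test: the RUN line tells the lit harness
// to compile this file to LLVM IR and pipe it into FileCheck, which
// verifies that the expected instruction appears in the output.
int add(int a, int b) {
  // CHECK: add nsw i32
  return a + b;
}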

2.5 Compiler correctness and testing

Compiler correctness is the subject of investigation or engineering effort that attempts to show that a compiler behaves according to its language specification. The effort can be applied at compiler development time (writing compilers using formal methods) or after development time, e.g., by subjecting a compiler to extensive testing [68].

Compiler errors, even small ones, can cause great harm to users: specifications are violated, safety is compromised, and productivity is lost. Firstly, software written in high level programming languages depends on the language specification, and a miscompilation could violate the contract(s) in the specification without alerting the compiler user. Secondly, a compiler error could alter the behaviors of a program so that a safety requirement is no longer met after compilation even though the source program is safe. Thirdly, when a compiler crashes, developers have to spend time finding a workaround rather than producing new code. Even more damage is done to their productivity when developers are tempted to hunt for “ghost” bugs in their own code while in fact the bug is in the compiler.

Ways to reason about compiler correctness include model checking, formal verification, and provably-correct-compiler generation [12]. The CompCert project [34] gives a formal proof of the compiler's transformations. Aided by a proof assistant, the authors can prove that the observable behaviors of the executable code produced by the compiler match the observable behaviors of the C program. However, the correctness is only formally verified for the transformations that translate a C abstract syntax tree into an assembly abstract syntax tree. Certain optimizations are avoided by CompCert because their correctness is difficult to verify.

The input spaces of many SUTs are infinite, so testing with all possible inputs is infeasible [7]. The inputs to compilers are programs, which obviously form an infinite space; therefore, exhaustive functional testing of compilers is impossible. Some compiler defects are only triggered under rare conditions, and in the absence of exhaustive testing, finding these bugs is difficult.

The difficulty of validating compiler correctness brings business opportunities to companies like Perennial and Plum Hall, both of which provide commercial compiler compliance validation suites [51] [52]. Compiler vendors use the test suites to certify or validate their conformance to language standards, or simply to find compiler bugs. However, the effectiveness of these test suites has never been scientifically verified.

2.6 Random testing of compilers

The history of random testing of compilers dates back half a century. With the emergence of compilers in the 1950s, Sauder [56] built a test-case generator for COBOL in 1962. Hanford [27] at IBM implemented a program generator for PL/1 in 1970, based on "dynamic syntax," that is capable of generating syntactically valid programs. Purdom [53] in 1972 proposed an efficient algorithm for generating sentences from grammars, and used it to test the correctness of parsers. Burgess and Saidi [5] designed an automatic generator for testing Fortran compilers. The test cases were designed to contain specific features that optimizing compilers frequently exploit, and they are self-checking; to be able to predict the values, the code generator restricts assignment statements to be executed only once. They found four bugs in two Fortran 77 compilers. In 1998, McKeeman [44] coined the term "differential testing." His work resulted in DDT, a program generator that conforms to the C standard at various levels, from the least conforming (random characters) to the most conforming (generated code statically conforms to the standard, with dynamic violations detected by postgeneration analysis), and he used it to find numerous bugs in C compilers. The tool generates many programs containing undefined and unspecified behaviors, some of which can be detected through dynamic checking. McKeeman's paper contains the first acknowledgment we are aware of that it is important to avoid undefined behavior when generating C programs. It also emphasized the need for test-case reduction, and implemented a reduction tool composed of iterative applications of small heuristic transformations. Lindig [37] used randomly generated C programs to find several compiler bugs related to calling conventions. His tool, Quest, is type-directed and can be extended using short Lua scripts. Quest creates complex data structures and passes them to a called function, where assertions ensure that the values were passed properly. He reported that random testing is comparable in effectiveness to the specification-based testing of prior work. The tool uncovered 13 previously unknown bugs in mature open source and commercial C compilers. Sheridan [58] used a random generator to find bugs in ANSI C compilers. The tool constructs expressions from a list of constants and the principal arithmetic types, and prints out the results of the operations; a source file is generated for each operator. This tool found two bugs in GCC, one bug in SUSE Linux's version of GCC, and five bugs in CodeSourcery's version of GCC for ARM. It is reported that failure to avoid undefined behavior, such as signed integer overflow, caused some false positives. In 2008, Eide and Regehr [14] used random differential testing to find errors in C compilers' implementations of the volatile qualifier. All 13 compilers they tested were found to miscompile accesses to volatile objects. Their implementation, randprog, forms the basis of Csmith, but Csmith supports a much richer set of C language features. Zhao et al. [78], in 2009, created an automated program generator for testing an embedded C++ compiler. Their tool allows general test requirements, such as which compiler modules to test, to be specified. A program template is then constructed based on the test requirements and used to drive program generation. They used GCC as the reference to check the correctness of the tested compiler.
They reported greatly improved statement coverage in the tested modules, and found several previously unknown compiler bugs.

CHAPTER 3

CODE GENERATOR

The two go hand in hand like a dance: chance flirts with necessity, randomness with determinism. To be sure, it is from this interchange that novelty and creativity arise in Nature, thereby yielding unique forms and novel structures.
— Eric Chaisson, Epic of Evolution: Seven Ages of the Cosmos

There are many ways to devise a random program generator. In this chapter, we present the objectives and design choices we made when creating Csmith, followed by in-depth discussions of the random programs and the generation process.

3.1 The objectives

Our goal is to design and implement a random C program generator whose random programs are used as test cases for compilers. To be effective in testing, the generator should operate at the most sophisticated level, level 6, as defined in Section 1.3. It should have sufficient expressiveness, and it should express without ambiguity. Our knowledge of compiler internals should be incorporated into the generator to increase its bug-finding performance. The terms used in this statement are defined as follows:

• Random program: A program that is the product of a nondeterministic process which is partly or fully automated.

• Test case for compilers: A program, valid or not, that is fed into compilers.

• Random program generator: Software that fully automates the nondeterministic production of random programs.

• Expressiveness: The variety of the programs produced by a random program generator. It is measured indirectly by the compiler code coverage achieved by those random programs collectively.

• Unambiguity: Each program has a single interpretation. For C programs, this means they are free of undefined behaviors, and free of dependence on unspecified behaviors. Ambiguities caused by implementation-defined behaviors are tolerated because they only matter when differential testing is performed across platforms, an endeavor we carefully avoided.

• Bug-finding performance: The ability to find distinct, valid, and previously unknown bugs.

3.2 Design choices

3.2.1 Maximizing expressiveness while maintaining unambiguity

Random testing of compilers creates an extra level of indirection: we have to create a random program generator, which in turn creates random test cases, and finally the test cases are consumed by compilers. As shown in Figure 3.1, the random generator has to explore the space of C programs (Space 1), and use them to explore the space of compiler source code and compiler executions (Space 2). Since both Space 1 and Space 2 are unlimited, we have to choose which areas of Space 1 to focus on so that we can maximize coverage of Space 2. The question is tricky because there is generally no definite mapping from a C program to the compiler code that it exercises. For example, we can declare two variables in the source code, but we cannot be sure whether the code will trigger an optimization in the register allocator that coalesces the variables into one register: the code, in its original form or in an IR form, undergoes multiple transformation passes before reaching the pass that performs register allocation, and any of the upstream operations could alter the reachable paths downstream. An easier question, perhaps, is: which areas of Space 1 should we exclude so that we can maximize coverage of Space 2? Intuitively, we want to avoid invalid programs because they explore only the small portion of Space 2 that deals with syntactically invalid inputs. Invalid programs, while occupying most of Space 1, have little impact on Space 2. Even after excluding all syntactically invalid programs, Space 1 is still prohibitively large. To increase bug-finding performance per test case, we need to further narrow the space and focus on unambiguous programs. A program is ambiguous when it contains anything that gives the compiler the freedom to interpret (and translate) the code in more than one way.

[Figure 3.1: Chain of actions from random program generators to compilers]

Ambiguous programs are harmful to differential testing because we can no longer rely on disagreements among compilers as test oracles. During our experiments, we found that ambiguous random programs usually lead quickly to false-positive bug reports. Such a bug report would be rejected by compiler developers, and we as bug reporters would lose credibility or, even worse, suffer embarrassment.

3.2.2 Finding a test oracle

A test oracle determines whether a test case has passed or failed. The oracle we use is: if an unambiguous random program is compiled by two or more compiler implementations, and the compiled programs, when executed, yield different effects, then at least one of those implementations is wrong. The rationale behind this oracle is:

1. A compiler implementation is correct only if it translates a valid program from source language to target language while preserving the meaning of the program;

2. The concrete meaning of a compiled program is represented by the execution effects;

3. An unambiguous program has only one meaning;

4. Therefore, the execution effects from different compiler implementations should be identical if conditions 1, 2, and 3 are all satisfied.

5. The contrapositive is: if 4 is false, then 1, 2, or 3 is false. Since 2 holds by definition, 1 must be false whenever 3 is guaranteed, which means at least one of the compiler implementations is wrong.

In order to use this oracle, the programming language for which the compilers are implemented must meet two conditions: 1) the language is standardized; and 2) there is more than one compiler implementation for the language. Fortunately, the C language meets both conditions: it has been standardized since 1989, and there exist multiple widely used C compilers. Some C compilers, such as Clang and GCC, provide more than one implementation from the point of view of differential testing: both can compile C programs at various optimization levels, and different optimization levels trigger different execution paths in the compilers. To compare execution effects, i.e., the meaning of the programs, we have to represent the effects as observable and verifiable values. Specifically, we consider two kinds of execution effects: 1) the checksum of global scalar variables; and 2) the sequence of read/write accesses to volatile objects. I/O operations are another kind of execution effect, but they are generally missing from our random programs; the checksums in 1), however, are printed out so that their differences become visible to the external world. We exclude local memory states from the comparison because they are not observable at the end of the execution and they can vary across compiler implementations. In the unlikely scenario where all compiler implementations err on the same program and produce executables with identical buggy behavior, differential testing would fail. This "blind spot" is an inherent weakness of differential testing. However, given the variety among compiler implementations, we think the likelihood is low, and the weakness can be remedied by adding more compiler implementations. Besides the oracle provided by differential testing, a more obvious oracle we use is the exit code of the compiler process. Typically, the exit code is zero for a successful compilation; if the compiler crashes, the exit code is nonzero, and an error message is usually provided. Since compilers should always handle failures gracefully (though technically they are allowed to do anything, including crash, when presented with undefined code), crashes always reveal compiler errors. In fact, both GCC and LLVM customarily print a message after a crash inviting the user to submit a bug report.
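As a concrete illustration of this oracle, the sketch below shows one way a differential-testing driver could be structured. It is not Csmith's actual harness; the file name random.c, the output name prog, and the compiler/option choices are assumptions made for illustration.

/* diff_test.c: minimal differential-testing sketch. Assumes random.c is an
 * unambiguous generated program that prints its checksum on one line.
 * Uses the POSIX popen()/pclose() functions. */
#include <stdio.h>
#include <string.h>

/* Compile random.c with the given compiler and options, run the result,
 * and capture the first line of output. Returns nonzero on any failure. */
static int run_config(const char *cc, const char *opt, char *out, size_t n) {
    char cmd[512];
    snprintf(cmd, sizeof cmd, "%s %s random.c -o prog && ./prog", cc, opt);
    FILE *p = popen(cmd, "r");
    if (!p)
        return -1;
    if (!fgets(out, (int)n, p)) {   /* no output: compile failure or crash */
        pclose(p);
        return -1;
    }
    return pclose(p);               /* nonzero exit status is also a failure */
}

int main(void) {
    char c1[128], c2[128];
    if (run_config("gcc", "-O0", c1, sizeof c1) != 0 ||
        run_config("gcc", "-O2", c2, sizeof c2) != 0) {
        printf("failure: possible compiler crash bug\n");
        return 1;
    }
    if (strcmp(c1, c2) != 0)
        printf("checksum mismatch: at least one configuration is wrong\n");
    return 0;
}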

3.2.3 Validating intermediate states vs. final states

It seems that more bugs could be found by validating not only the program's final state (the state immediately before the program's termination) but also intermediate states during the execution. It is possible that a variable is written twice, and the first write, but not the second one, is miscompiled; the second write would hide the wrong-code bug if we only validate the final state, whereas applying differential testing to intermediate states would uncover it. In addition to computing and printing a checksum at the end of the execution, Csmith could compute and print checksums at other execution points. However, checking intermediate states could potentially alter compiler behaviors and hide some bugs. Additionally, we would need more complicated comparison logic for execution effects, since evaluation order can lead to different sequences of effects from two correct executions. We determined that the cons outweigh the pros while designing Csmith. For a compiler error that happens on a write and then gets covered up by a subsequent write, we expect the randomness to eventually expose the bug in another program.

3.2.4 Grammar-based generation vs. AST-based generation

Many random program generators use production rules (described by grammars) to drive generation. Such a generator typically starts from a nonterminal symbol and expands it according to production rules. If more than one production rule applies, one of them is selected randomly. If a chosen production rule generates additional nonterminal symbols, the newly generated nonterminals are added to a work-set to be expanded later. Production iterates until the work-set is empty. A grammar-based random generator has many advantages: the generated programs are syntactically valid, the generator is not difficult to construct, and it can be retargeted to a new language by changing the production rules. However, production rules carry limited context sensitivity. Even for context-sensitive grammars, the context is limited and described only by symbols. Without semantic annotations on production rules, it is difficult to perform program analyses during grammar-based random generation, and adding semantic annotations cancels the benefits offered by grammar-based generation, namely being easy to construct and easy to retarget to different languages. The abstract syntax tree (AST), on the other hand, is a natural choice for semantic annotations. Compiler front ends often construct ASTs from the source code and then annotate them with semantics. During AST-based random generation, the generator has a global view of all AST nodes already produced, as well as their semantic implications. The global view enables generation-time program analysis. After random generation, printing the abstract syntax tree as a source program is straightforward. We decided to adopt AST-based random generation because we intend to perform generation-time analysis to avoid ambiguity in generated code. Another reason is that the predecessor of Csmith, randprog, was also implemented with AST-based random generation.
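For contrast, here is a minimal sketch of the grammar-based approach just described, with the work-set realized as the recursion stack; the two-rule grammar is a toy invented for illustration, not one used by any real generator.

/* Toy grammar-based generator for the two-rule grammar
 *   <stmt> ::= "x = 1;" | "if (x) { <stmt> }"
 * Each nonterminal is expanded by a randomly chosen production rule. */
#include <stdio.h>
#include <stdlib.h>

static void expand_stmt(int depth) {
    if (depth > 3 || rand() % 2) {   /* depth cap keeps the output finite */
        printf("x = 1;\n");
    } else {
        printf("if (x) {\n");
        expand_stmt(depth + 1);      /* the newly produced nonterminal */
        printf("}\n");
    }
}

int main(void) {
    srand(42);                       /* the random seed drives all choices */
    expand_stmt(0);
    return 0;
}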

3.2.5 Avoiding false positives vs. postgeneration trimming

Random programs with ambiguity often cause compilers to produce different results. The compilers are correct even though our test oracle tells us one of them is wrong; thus, we have a false positive. When the random generator produces ambiguity, eliminating false positives from our random testing process is challenging, if not impossible. In the past, we have had to either 1) wait for a compiler writer to tell us that the test case in our bug report is undefined, or 2) manually reduce the random program to a point where we can see the ambiguity with the naked eye. Both situations are unpleasant. Furthermore, once a false positive is identified, we have to slow down our random testing: it is critical to fix the generator bug that caused the ambiguity before reporting new bugs, because otherwise another false positive is almost surely coming. Instead of avoiding ambiguity in the generator, we could add a postgeneration trimming pass that filters out ambiguous test cases. There are three arguments against this idea: 1) static analysis cannot detect all kinds of ambiguity; 2) even for the ambiguities that can be detected, the computation cost is very high; and 3) when a randomly generated test case is excluded, the CPU cycles used to generate it are wasted. With these arguments in mind, we think it is more profitable to avoid generating ambiguities in the first place. It takes more effort to develop the generator, and the generation performance is worse compared to an unguarded generator; however, bug-finding performance per test case is greatly improved.

3.3 The shape of a randomly generated program

Not including comments, each program generated by Csmith has four sections:

1. Header file inclusions, type declarations, and function prototypes;

2. A set of randomly generated global variables, each initialized to a randomly chosen constant value. The types of the global variables are randomly chosen, with or without const or volatile qualifier(s);

3. A set of randomly created functions. Each function (except for func_1; see Section 3.4 for more details) accepts a number of arguments and returns a value. A function contains a block as its body; a block in turn contains a sequence of statements. Expressions are constituents of statements, while variables and constants (literals) are constituents of expressions. Variables have either global scope or a scope limited to one of the blocks;

4. A small amount of runtime support, including the program's main function. The tasks of the main function include invoking func_1 and computing a checksum based on the values of the global objects (a sketch follows this list). A global object is included in the computation if it is a scalar variable, a scalar field of a structure/union, or a scalar member of an array.
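To make this shape concrete, the runtime-support section of a generated program might look roughly like the sketch below; the helper names and the checksum folding are assumptions made for illustration, and Csmith's actual runtime support differs in detail.

/* Sketch of sections 2-4 of a generated program (illustrative only). */
#include <stdio.h>
#include <stdint.h>

static uint32_t g_1 = 0xFA4EU;   /* randomly generated, randomly initialized */
static int16_t g_2 = -7;         /* global variables */

static uint32_t checksum;
static void hash(uint32_t v) { checksum = checksum * 31U + v; }  /* toy fold */

static void func_1(void) { g_1 += (uint32_t)g_2; }   /* generated body */

int main(void) {
    checksum = 0;                        /* checksum initialization */
    func_1();                            /* the single call into func_1 */
    hash(g_1);                           /* every global scalar contributes */
    hash((uint32_t)g_2);
    printf("checksum = %X\n", (unsigned)checksum);
    return 0;
}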

Csmith adopts AST-based random generation. The generation order is typically from top nodes (the functions) to bottom nodes (the variables), and from left nodes (the first randomly generated function, func_1) to right nodes (the leaf functions). As shown in Figure 3.2, the height of the trees is five, from level 1 (functions) to level 5 (variables and constants). The width of the trees has no inherent limit, although there are some pragmatically determined limits. For example, at the function level, the width is limited to 10 by default; that is, by default, Csmith generates at most 10 random functions (not counting func_1 and main). Csmith allows users to configure these limits. Csmith builds two more pieces for each random program: user-defined data types and the main function. They are not shown in the tree.

• User-defined data types: these are prebuilt before the tree construction. Csmith supports integer types ranging from char (either signed or unsigned) to 64-bit integers (either signed or unsigned). In addition to built-in types, structures, unions, arrays, and pointers are randomly generated.

The notable missing ones are floating-point types and function pointers. Structure and union types are created with random fields, each of which can be an integer type, another structure/union type, or a pointer type. Pointer types are derived from all other supported types. Array types are not explicitly created; array accesses are represented by combining a meta-variable, representing the whole array, with an index.

• Main function: this is postbuilt after the tree construction, filled with a call to the checksum initialization routine, a call to func_1, and statements to compute and print the checksum based on global values.

[Figure 3.2: Hierarchy and relationships of program constructs in a Csmith abstract syntax tree]

The generated functions are nonrecursive. However, due to the possibility of infinite loops, there is no guarantee that random programs generated by Csmith always terminate.

3.4 Overview of the random generation process

The random generation process follows a top-down, build-on-demand fashion. The tree construction starts from func_1, which is unique in several ways: 1) it is the starting point of the tree construction; 2) it is invoked only by the main function, and is invoked only once; and 3) it has no parameters.

Functions, including func_1, are expanded into a header and a body. The header is expanded into a list of parameters, while the body is represented by a block. A block is the container for a list of statements and a list of variables. A block forms either the body of a function or a constituent of a complex statement, such as the true-block and false-block of an if-else-statement. Csmith builds a block by expanding it into a list of statements. When generating a statement, Csmith has 10 types to select from:

1. Assignment: includes simple assignments and compound assignments;

2. Block: a nested block providing a storage scope for statements and variables;

3. For-statement: a for-loop with a header and a body, which is a block;

4. Invocation: a function call statement;

5. If-else-statement: a branch with a condition, a true-block, and a false-block;

6. Return-statement: a simple return statement;

7. Continue-statement: a control-flow statement jumping to the loop header, available only in the block of a for-statement;

8. Break-statement: a control-flow statement jumping out of the loop body, available only in the block of a for-statement;

9. Goto-statement: an unconditional jump to a location designated with a label; and

10. Array-statement: a special for-statement that traverses one or more arrays and computes based on the values of array members.

A statement is expanded into expressions or blocks or their combinations. For example, an if-statement can be expanded to a true-block, a false-block, and a condition, which is an expression. The bi-directional expansion rules between blocks and statements are represented in Figure 3.2. When generating an expression, Csmith has eight types to select from:

1. Constant: a literal for a representable type, i.e., integers, structures, unions, and pointers (for pointers, only NULL is allowed);

2. Variable: a memory location which can be accessed via a declared variable, its address, or a dereference of a pointer variable;

3. UnaryExpression: an expression composed of a unary operator and another expression;

4. BinaryExpression: an expression composed of a binary operator and two other expressions;

5. CallExpression: a function call;

6. AssignmentExpression: an assignment embedded in another expression or a statement. An assignment operator is a binary operator, but assignments have a unique effect on program analyses, so they deserve a type separate from other binary expressions;

7. CommaExpression: an expression composed of a comma operator and two other expressions. Program analyses for comma expressions differ from those for other binary expressions, so we created a dedicated type for them;

8. LHS: an expression that appears only on the left-hand side of an assignment. It is similar to the Variable type, but with more limited production rules.

Csmith assigns weighted probabilities to each type of statement or expression. When creating a statement or expression, one type is chosen randomly based on the weights (a sketch of this selection follows the list below). If the chosen type has nonterminal constituents, the production rules for the constituents are invoked. For example, when a for-statement is chosen, the following steps are carried out:

• Choose an induction variable

• Choose an initial value for the induction variable

• Choose a final value for the induction variable

• Choose a step for the induction variable

• Create the loop body
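Returning to the weight-based selection described before the list: a minimal sketch follows. The statement kinds and weights below are placeholders, not Csmith's actual tables (Tables 3.1 and 3.2 give the real defaults).

/* Pick a statement kind according to a weight table (placeholder weights). */
#include <stdio.h>
#include <stdlib.h>

enum stmt_kind { ASSIGN, IF_ELSE, FOR, RETURN, NKINDS };
static const int weight[NKINDS] = { 55, 15, 15, 5 };

static enum stmt_kind pick_stmt(void) {
    int total = 0;
    for (int k = 0; k < NKINDS; k++) total += weight[k];
    int r = rand() % total;               /* uniform in [0, total) */
    for (int k = 0; k < NKINDS; k++) {    /* walk the cumulative weights */
        if (r < weight[k]) return (enum stmt_kind)k;
        r -= weight[k];
    }
    return ASSIGN;                        /* unreachable */
}

int main(void) {
    srand(1);
    for (int i = 0; i < 5; i++) printf("kind %d\n", (int)pick_stmt());
    return 0;
}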

While expanding an expression into a function call, Csmith takes the opportunity either to use an already fully created function (as long as type checking passes) or to create a new function. When the second choice is taken, Csmith halts the generation of the current function and switches to the generation of the new function. After that function's random generation is complete, Csmith continues to work on the caller function.

Once the abstract syntax tree is constructed, Csmith dumps out the tree as a C program while adding supporting pieces to make the program compilable, executable, and testable. These pieces include a main function, header file inclusions, macro definitions, function declarations, and user-defined types.

3.5 The generation grammar of Csmith

Csmith uses an AST for random generation. The tree nodes, their types, and their production rules are implicitly encoded in Csmith's source code. For the sake of simplicity and readability, the implicit production rules have been translated into context-free grammars and presented in Appendix A. Csmith also has context-sensitive generation rules which are beyond the expressive power of context-free grammars; those rules are discussed in the next section. There are complete lexical grammars for C89 [32], C99 [30], and C11 [29]. However, those grammars were designed from the point of view of language specifications, not from the point of view of generators. Csmith does not claim to fully cover the lexical grammars, but it represents a large subset of them. Some indication of good coverage can be obtained from the coverage of keywords and operators: out of 33 keywords in C99, Csmith supports 22, and out of 47 operators in C99, Csmith supports 41. C11 was not finalized when we developed Csmith. We selected the above set of production rules to support in Csmith based on the following beliefs:

• We believe it is more valuable to generate expressive random programs, i.e., programs that exercise as much compiler source code as possible. This means key language constructs, including loops, branches, jumps, and function calls, are indispensable;

• There are production rules that merely provide syntactic sugar, which helps little in testing compilers and thus should be ignored by a random generator. For example, although for-loops, while-loops, and do-while-loops appear different syntactically, compilers tend to normalize all of them into a canonical form at an early stage;

• Grammar rules bearing similar semantics should be grouped together, for the benefit of generation-time program analyses. For example, the "++" and "--" operators, typically considered unary operators by most lexical grammars, should be treated as assignments from the point of view of program analysis;

• Some random choices are unnecessary, such as variable names. Csmith produces standardized variable names: a single-letter identifier followed by a sequence number. Although it is possible that compilers make mistakes parsing extremely long variable names, that kind of bug is not targeted by Csmith;

• It pays to dedicate production rules and generation logic to interesting language constructs, i.e., ones able to increase compiler code coverage. Csmith generates the array-statement, a special kind of loop that traverses arrays and reads from or writes into array members. We believe the resulting loops can better exercise compilers, which tend to apply loop optimizations, such as vectorization, aggressively.

3.6 Other considerations of context sensitivity: checks, balances, and biases

Context-free grammars can only describe half of Csmith. The other half is encoded in various context-sensitive analyses and validations during random generation. There are checks, balances, and biases throughout the generation. Checks are hard directions that must be enforced; balances and biases are soft directions and often configurable. Biases are the preferences we assign to choices. Checks and balances provide determinism to ensure randomness does not get out of control and result in undesired chaos. In Csmith, checks are mainly used to ensure type safety and to avoid ambiguity in the C language. Most ambiguities are caused by undefined behaviors or by dependence on unspecified behaviors. The formal definitions, provided in the latest C programming language standard, ISO/IEC 9899:2011 (referred to as C11 hereafter) [29], are:

Undefined behavior: "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements."

Unspecified behavior: "use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance."

Implementation-defined behavior: "unspecified behavior for which each implementation documents how the choice is made."

Many implementation-defined choices, such as the number of bits in a byte and the endianness, are invariant on a given architecture. Csmith allows implementation-defined behaviors because we limit differential testing to a single platform; in other words, we do not compile and run a random program on x86 and then compare the results with the results obtained on PowerPC.

3.6.1 Type check

Compilers perform type checking and reject programs that fail it. To ensure that random programs are compilable, the type checking rules in Csmith are generally equivalent to or stricter than those found in compilers. To reduce their number, production rules in Csmith do not specify type requirements; rather, the instantiation process imposes requirements on types. For example, when an assignment is instantiated, the type of the RHS (right-hand side) must match that of the LHS (left-hand side). When a function call is instantiated, the type of each actual parameter must match that of the corresponding formal parameter, and the return type must match what is expected by the call site. When an integer operation is instantiated, each operand must be one of the integer types. All integer types in Csmith are defined with no ambiguity about their storage size: Csmith supports int8_t, int16_t, int32_t, and int64_t, which are defined in the system header stdint.h. Csmith does not generate the language-allowed int, which has different sizes on different platforms. Even though we do not run differential testing across platforms, we prefer to avoid generating int altogether because it leaves a chance of ambiguity. In Csmith, type matching is defined as type convertibility (a sketch of the check follows the list):

• Type T is convertible to itself (self-evident);

• Any integer type is convertible to another integer type [29, §6.3.1.8];

• For integer types T1 and T2, T1* is convertible to T2* if T1 and T2 have equal size (signed to unsigned and vice versa); and

• All other type conversions are prohibited.
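These rules can be rendered as a small predicate over a toy type representation; the sketch below is a reconstruction for illustration, not Csmith's code. Per the third rule, signedness is deliberately ignored when the pointees have equal size.

/* Toy model of Csmith's type-convertibility rules. */
#include <stdbool.h>
#include <stdio.h>

struct type {
    int size;        /* byte size of the underlying integer type */
    bool is_signed;  /* ignored for same-size pointee conversions */
    int ptr_level;   /* 0 = integer, 1 = T*, 2 = T**, ... */
};

static bool convertible(struct type from, struct type to) {
    if (from.ptr_level == 0 && to.ptr_level == 0)
        return true;                 /* any integer type -> any integer type */
    if (from.ptr_level != to.ptr_level)
        return false;                /* pointers never change indirection level */
    return from.size == to.size;     /* T1* -> T2* only for equal-size pointees */
}

int main(void) {
    struct type i32p = {4, true, 1}, u32p = {4, false, 1}, i64p = {8, true, 1};
    printf("%d %d\n", convertible(i32p, u32p),    /* 1: same-size pointees */
                      convertible(i32p, i64p));   /* 0: wider pointee */
    return 0;
}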

Why does Csmith impose this restriction on pointer type conversions while C11 does not? If a pointer were allowed to be converted to a pointer to a data storage wider than the current pointee, a read through the second pointer would introduce garbage bits, and thus ambiguity. Structure types and union types are not convertible to any other types. The alignment of fields inside structures or unions is not specified by C11. Because of the uncertainty over memory layout on different platforms, we stay away from the ambiguity by not allowing any type conversion for structures and unions. Additionally, Csmith reads from and writes to structure/union fields with no assumptions about how the fields are laid out, since the layouts are largely unspecified by the C standard. On top of the above type checking rules, Csmith does not allow explicit type casts: type casting could be used to bypass the type-matching rules encoded in Csmith, and thus must be avoided.

3.6.2 Value-range check

In the following cases, Csmith does value-range checking while applying a production rule:

• While generating the second operand of a shift operation, Csmith limits the value to between 0 and one less than the bit size of int. Shifting an expression by a negative number or by an amount greater than or equal to the width of the promoted expression is undefined in C11 [29, §6.5.7];

• While generating a constant for a pointer type, Csmith limits the value to 0 (NULL); and

• While generating an array-statement, Csmith limits the value range of the induction variable to between 0 and the length of the underlying array(s).

3.6.3 Qualifiers check

Csmith supports two type qualifiers: const and volatile. Checks on qualifier compatibility are required when a pointer is assigned to another pointer, or when a pointer is used as an actual parameter of a function. Nonpointer variables may have qualifiers, but their compatibility checking is waived: a volatile int can be assigned to a plain int, and vice versa. The reasoning is that after the assignment, the value exists in a new storage, and the qualifier(s) of the new storage do not interfere with the qualifiers of the old storage. For a multilevel pointer, there can be qualifiers at each level of indirection. For example, an integer pointer with three levels of indirection can be qualified as int * const * volatile * const volatile. If a pointer is assigned to another pointer with different qualifiers, while both of them now point to the same storage, the program can access the storage through the second pointer in a way incompatible with the qualifiers of the first pointer, for example, by assigning a const int* to an int* and then modifying the storage through the second pointer. C11 [29, §6.5.16.1] states that for pointer assignment, "both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right." This is not sufficient for some cases. Listing 3.1 is a typical example: the program tries to modify c even though c is declared as constant. C11 nevertheless allows the conversion at line 3 while marking the pointer dereference at line 4 undefined [29, §6.7.3]: "If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined."

Listing 3.1: Conversion of const-qualified pointers
1 char const c = 'x';
2 char *p;
3 char const **pc = &p;  // disallowed by Csmith
4 *pc = &c;              // undefined in C11
5 *p = 'y';

C++, on the other hand, imposes sufficient checking on the assignment: the example program is compiled successfully by gcc but rejected by g++. Qualifier checking in Csmith is more closely aligned with C++ because the C++ checking is more bulletproof. The formal rule is defined as:

1. Two pointer types are similar if 1) they have the same level of indirection, and 2) they have the exact same underlying type T;

2. Qualifier strength is 2 for volatile const, 1 for const or volatile, and 0 if there is no qualifier;

3. Conversion from pointer type T1 to T2 is possible if and only if:

• T1 and T2 are similar;

• At the last two indirection levels (furthest from the storage), the qualifier strength of T2 is greater than or equal to that of T1; and

• At other indirection levels (if there are more than two indirections), T1 and T2 must have equal qualifier strengths.

Some examples of valid conversions and invalid conversions are:

Valid conversions:
  short * → short volatile *
  char volatile ** → char volatile * const *
  long ** → long * const * volatile

Invalid conversions:
  short const * → short *
  char ** → char const **
  char *** → volatile char ***

3.6.4 Unsafe arithmetic check

Many arithmetic operations, such as dividing by zero, are undefined or unspecified in C11. Csmith needs to ensure that the generated C programs do not contain such operations, or, if they do, that the ambiguity is removed. Here is the list of unsafe arithmetic operations we avoid:

• divide by zero

• modulo by zero

• signed integer overflow, which can be caused by addition, multiplication, subtraction, division, modulus, and negation;

• shift by a negative number of bits

• shift by more bits than exist in the operand

• left-shifting a negative number.

Csmith avoids them by statically inserting a runtime check for every potentially unsafe operation. For example, a division operator is replaced with a wrapper, safe_div. The wrappers are implemented as either macros or functions, largely based on code from the CERT program [59]. Listing 3.2 shows an example wrapper function.

Listing 3.2: The safe_div wrapper
int32_t safe_div(int32_t si1, int32_t si2) {
    if ((si2 == 0) || ((si1 == INT_MIN) && (si2 == (-1)))) {
        return si1;
    } else {
        return (si1 / si2);
    }
}
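The shift wrappers follow the same pattern; a plausible analogue for left shift is sketched below. The name safe_lshift and the exact fallback behavior are assumptions, not necessarily what Csmith emits.

/* Sketch of a left-shift wrapper in the style of Listing 3.2: fall back to
 * the left operand whenever the shift would be undefined in C11. */
#include <stdint.h>

static int32_t safe_lshift(int32_t left, int right) {
    if (left < 0 ||                      /* left-shifting a negative value */
        right < 0 || right >= 32 ||      /* out-of-range shift count */
        left > (INT32_MAX >> right))     /* result would overflow */
        return left;
    return left << right;
}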

3.6.5 Pointer check

The dangling pointer reference problem arises when a pointer outlives its pointee and is used after the pointee is dead. C11 [29, Annex J2] states that using "the value of a pointer to an object whose lifetime has ended" is undefined; both dereferencing and reading the pointer are considered uses. According to C11 [29, §6.2.6.1], "Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined." Furthermore, "An automatic variable can be initialized to a trap representation without causing undefined behavior, but the value of the variable cannot be used until a proper value is stored in it." A dangling pointer is a trap representation; thus, reading it (except to assign it to another pointer) causes undefined behavior. On the contrary, reading a null pointer is defined. An object's life ends when it goes out of scope or its memory is reclaimed. Csmith does not generate memory deallocations; therefore, an object lives until the block in which it is declared ends its scope, at which point any pointer to the object becomes dangling. The null pointer dereference problem occurs when an attempt is made to read or write through a pointer with a NULL value. According to C11 [29, §6.5.3.1], "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined." A footnote adds that "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime." Csmith avoids generating all three kinds of pointer-related undefined behaviors: 1) null pointer dereferences; 2) dangling pointer dereferences; and 3) reads of dangling pointers. The avoidance is made possible by generation-time points-to analysis, which is detailed in the next chapter.

3.6.6 Array out-of-bound access check

C11 [29, Annex J2] states that accessing an array when "an array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6)" is undefined. Csmith avoids generating out-of-bound array accesses by handling two kinds of array accesses differently (illustrated after the list):

• Array accesses within a loop: Csmith makes sure the arrays are always indexed with induction variables. An array index is constructed by adding an optional offset to the induction variable. The index always has a value range between 0 and one less than the size of the array. The value of the induction variable is confined to a range with the offset value taken into account. The induction variable is not allowed to be modified inside the loop body; and

• Array accesses outside of loops: Csmith either uses a constant index, if the constant is proven to be less than the array size, or applies a modulus to the index expression, to ensure the index is within range.
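For illustration, accesses generated under these two strategies might look like the hand-written examples below (not actual Csmith output).

/* Hand-written illustration of the two bounds-safe access styles. */
int a[10];

void in_loop(void) {
    /* induction variable confined to [0, 8] so a[i + 1] stays in bounds */
    for (int i = 0; i <= 8; i++)
        a[i + 1] = a[i];
}

void outside_loop(unsigned g_3) {
    a[3] = 7;          /* constant index provably below the array size */
    a[g_3 % 10] = 7;   /* modulus on an unsigned index forces it into range */
}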

3.6.7 Use before initialization check

C11 states that using a variable before it is initialized is undefined. Csmith requires every variable to be initialized at its declaration site. This is not strictly necessary for global variables, which are zeroed by default; Csmith nevertheless generates initial values for them to introduce more variety.

3.6.8 The program size balance

Theoretically, there is no upper bound on the size of a random program generated by Csmith: the width of an abstract syntax tree is not limited, so the sizes of the programs have no limit either. In practice, however, we found that generating very large random programs is counterproductive to compiler testing. Generating them is slow, and in most cases we can reduce a compiler-error-inducing random program to a small test case of only a few lines, which indicates that size is not an important factor in discovering compiler bugs. On the other hand, we also found in practice that limiting random program sizes limits expressiveness, i.e., the ability to exercise compiler source code. Csmith therefore aims to preferentially generate random programs of moderate size by imposing upper bounds at various levels: the number of random functions, the number of statements in each block, the depth of nested blocks, the complexity of expressions, etc. In many cases, an upper bound is "soft," meaning it can occasionally be exceeded due to other generation constraints. For example, if a function body already has four statements (the limit on the number of statements in a block), Csmith still allows a fifth statement, a return-statement, to be added.

3.6.9 The biases

When making a random choice, Csmith does not give equal opportunity to all options. Instead, it usually assigns different weights to different options. The weights are pragmatically determined. For example, when selecting a variable to construct an expression, Csmith favors a variable from the variable pool over creating a new one. The bias is imposed based on the following believed cause-effect chain: more new variables lead to fewer variable dependencies, which lead to more dead variables and/or dead code, which requires less complex program analysis by compilers, which leads to fewer compiler bugs found. Some biases are configurable through the command line or a configuration file. Table 3.1 lists the default weighted probabilities for the 10 production rules for statements. Blocks and calls have zero probability because they are not generated as standalone constructs; rather, they are byproducts of other kinds of statements.

Table 3.1: Weights of different kinds of statements

Production rule       Weight         Comment
Block                 0              byproduct of functions, for-statements, etc.
Call-statement        0              byproduct of expressions
Return-statement      5              a return-statement ends the generation of a function; a higher probability thus leads to shorter functions
Continue-statement    0 or 5         only available inside for-loops
Break-statement       0 or 5         only available inside for-loops
Goto-statement        0 or 5         can be disabled by command line options
If-else-statement     15             kept high to increase program complexity
For-statement         15             kept high to increase program complexity
Array-statement       0 or 10        can be disabled by command line options
Assignment-statement  varies (≥ 40)  takes the remaining probability (total 100%)

As shown in the table, assignment is the most favored option, followed by branches and loops. We also assign a relatively high weight to array-statements because we are interested in finding bugs in loop optimizations. A statement type can be disallowed completely with a Csmith command line option, effectively reducing its probability to zero; the freed probability is then allocated to assignment-statements. Table 3.2 lists the default weights for the eight production rules for expressions. Call-expression is the most favored type. However, if, while trying to generate a call-expression, no existing function has a matching return type and no new function is allowed due to the limit on the maximum number of functions, Csmith generates a binary or unary expression instead; of those two, binary expressions are heavily favored over unary expressions. An expression type can be disallowed completely with a Csmith command line option, effectively reducing its weight to zero. LHS has a zero weight because it is not generated as a standalone construct. To make it flexible to disable or enable certain kinds of expressions, the weights are used to calculate the actual probabilities during random generation. For example, if only call-expressions and constant-expressions are enabled, their respective probabilities are their weights divided by the sum of the weights: 87.5% and 12.5%, respectively. Csmith uses a slightly different weight table when generating expressions as actual function parameters; the weight of call-expressions is lowered to avoid creating call chains that are beyond a reasonable length.

Table 3.2: Weights of different kinds of expressions

Production rule          Weight   Comment
Call-expression          70       effectively 0 when no new function can be generated and no existing function matches the required type
Constant-expression      10       useful in testing constant folding, constant propagation, etc.
Variable-expression      20       useful in creating define-use chains
Unary/Binary expression  varies   when a call-expression is impossible: 95% chance to be binary, 5% to be unary
Comma-expression         0 or 10  can be disabled by command line options
Assignment-expression    0 or 10  embedded assignments; can be disabled by command line options
LHS                      0        byproduct of assignments

Table 3.3 lists the default weighted probabilities for the storage scopes of variables. The global scope is slightly favored because the final checksum is computed from global values: the more global variables are created and modified, the more variance the checksum value has. Do the above tables represent the best probability distributions? Certainly not. We assigned the weights based on intuition and empirical results. One important empirical result is the number of compiler bugs found within a period of time, but that number is heavily influenced by timing, the compilers under test, and luck. We do not claim the chosen probability distributions are the best, but we think they are sufficiently good for our compiler-testing purposes.

Table 3.3: Weighted probabilities of variable storage scopes

Scope      Probability (%)  Comment
global     35               useful in increasing the variety of checksums and creating interprocedural use-define chains
local      30               a block randomly chosen from the blocks visible at the current generation point
parameter  30               if a formal parameter is available (based on type); otherwise the local scope is used
new        5                necessary for on-demand generation, but the probability should be kept low

3.7 Exceptions to the general AST construction order

Csmith generally constructs an abstract syntax tree from top to bottom and from left to right. A tree node is generated first, then validated against the partially generated tree, and finally committed to the tree. A committed node becomes part of the tree and is generally untouchable by later random generation. The rationale is that Csmith works on the tree incrementally: any part of the tree, once committed, should stay constant. This principle ensures that Csmith keeps making progress; a deviation from the principle may lead to termination problems. The underlying assumption is that everything that passes validation before commitment can also pass the validation later, regardless of the new nodes committed to the tree. The assumption holds in most cases. An exception must be made when generating forward jumps. Csmith supports goto-statements, which jump to almost anywhere in the same function. A forward jump targets a destination that is downstream in the control flow. A seemingly impossible problem for the generator is: how can it jump to code that has not been generated yet? Csmith solves the problem by relaxing the restriction on modifying previously generated nodes: it creates a forward jump backward, finding the destination first (usually the most recently generated statement) and then inserting a goto-statement somewhere upstream. Another exception may have to be made when generating loops. Example 4.1 in the next chapter shows that loops have the potential to break the premise that everything validated before commitment stays valid after commitment. If the premise is broken, Csmith has to backtrack, i.e., delete committed node(s), in order to pass validation.
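Returning to forward jumps, a hand-written example of the kind of code this process yields is shown below; the labeled statement was generated and committed first, and the goto was inserted upstream afterward.

/* Illustration only: the goto was inserted after its destination (lbl_1)
 * had already been generated and committed. */
int func_2(int p_1) {
    int l_1 = 3;
    if (p_1 > 0)
        goto lbl_1;        /* forward jump, created backward in time */
    l_1 = l_1 * 2;
lbl_1:
    return l_1 + p_1;      /* the pre-existing destination statement */
}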

CHAPTER 4

GENERATION TIME ANALYSIS AND VALIDATION

Analysis brings no curative powers in its train; it merely makes us conscious of the existence of an evil, which, oddly enough, is consciousness.
— Henry Miller, The Cosmological Eye

A unique feature of Csmith is that it has a fairly sophisticated generation-time analyzer whose results are actively used throughout the random generation process. GTAV (Generation Time Analysis and Validation) is an integral part of Csmith. The analysis is performed on code that has been generated but not yet committed, as well as on already committed code when necessary. The validation is the final guard against any code that is potentially ambiguous.

4.1 The objectives

The objectives are to 1) establish a framework enabling efficient and incremental program analyses, 2) perform various kinds of program analyses within the framework, and 3) feed the results back to the code generator to guide and validate the random program generation. The terms used above are defined as follows:

• Program analyses: a collection of analyses performed on a program, including, but not limited to, pointer analysis and side-effect analysis;

• Analysis framework: a generalized framework in which analysis results are represented and carried forward with a unified data structure. Specialized analyses can be built within the framework; to "plug into" it, an analysis needs to provide transfer functions operating on a set of program constructs;

• Program construct: any node representable on an abstract syntax tree, including functions, blocks, various types of statements, various types of expressions, variables, and constants. A single instance of a program construct is called a node;

• Transfer function: a function to establish the relationship between inputs and outputs of a program construct;

• Incremental analysis: an analysis that is performed during or immediately after the generation of a program construct. It avoids reanalyzing previously validated constructs as much as possible.

4.2 The necessity and challenges of generation time analysis and validation (GTAV)

Why do we invest effort in program analyses while developing a random generator? The act of random program generation involves making random decisions while generating code, and the decisions are mostly determined, directly or indirectly, by random numbers. Analyzing code determined by random numbers seems to make little sense. The reason is that we have to use GTAV to avoid ambiguity at generation time for the purpose of differential testing. Considering the length of the list of undefined and unspecified behaviors in C11, an avoidance or detection mechanism without program analysis is virtually impossible; unless we are content with limited expressiveness, GTAV is imperative. Implementing GTAV is far from trivial. The target being analyzed is a partial program, so many whole-program analysis techniques are no longer applicable. Furthermore, incremental analysis means every new program construct requires a new round of GTAV, and there can be tens of thousands of constructs in a random program. GTAV has to be efficient and scalable.

4.3 GTAV framework

We developed a GTAV framework which is general enough to allow us to perform the various kinds of program analyses in which we are interested. The framework consists of data representations and control representations of the analyses, and transfer functions at the various program construct levels.

4.3.1 Data representations and control representations

All program analysis results are represented as "facts." For points-to analysis, a fact is a pointer and a list of locations possibly pointed to by the pointer. For effect analysis, a fact is a list of variables that are read or written. Facts are associated with program constructs: a construct has a set of facts as inputs and a set of facts as outputs. Each function has a fact manager, which oversees all input and output fact sets for the constructs in the function. Csmith does not maintain a control flow graph, but derives control flow by analyzing statement types and their sequences in blocks. Specifically, there is a control edge between 1) two consecutive statements in the same block, unless the first one is a jump; 2) a jump statement (goto/break/continue) and its target; 3) the last statement in a block and the statement following the block in its parent block; and 4) the last statement in a callee function and the statement or expression following the call site. Within a statement with multiple subexpressions, Csmith uses sequence points to derive control flow: C11 [29, §5.1.2.3] defines that a "sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B." Csmith assumes no edges between unsequenced expressions; for two expressions with a clear sequence point in between, as defined by the C language standard, Csmith recognizes a control flow edge from the first expression to the second. Transfer functions produce output fact sets while taking input fact sets as input. The other input to a transfer function is the context, a bag holder for data that might help GTAV, such as the call chain leading to the current function being analyzed.
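A skeletal rendering of these notions in C appears below. The names and data shapes are assumptions made for illustration; Csmith's actual implementation (in C++) is considerably richer.

/* Skeleton of GTAV's data representation: each construct (node) carries an
 * input and an output fact set; a transfer function connects the two and
 * doubles as the validator. Toy representation, not Csmith's classes. */
#include <stdio.h>

#define MAX_FACTS 16

struct fact { int var; int may_point_to; };           /* a toy points-to fact */
struct fact_set { struct fact f[MAX_FACTS]; int n; };
struct node { int kind; struct fact_set in, out; };

typedef int (*transfer_fn)(struct node *n);           /* 0 = validation fails */

static int xfer_assign(struct node *n) {              /* models "p = &x;" */
    n->out = n->in;                                   /* copy incoming facts */
    if (n->out.n >= MAX_FACTS) return 0;              /* validation failure */
    struct fact f = { 1, 2 };                         /* var 1 may point to 2 */
    n->out.f[n->out.n++] = f;                         /* record the new fact */
    return 1;
}

int main(void) {
    struct node assign = {0};
    transfer_fn tf = xfer_assign;
    printf("valid=%d facts_out=%d\n", tf(&assign), assign.out.n);
    return 0;
}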

4.3.2 Pre-commit GTAV

During random program generation, Csmith keeps a set of facts, the running facts, accounting for the already generated (and committed) code and its executions. At any generation point, the code generator is given a type, and it tentatively generates a type-matched node. A node is basically an instance of one of the constructs in the AST depicted in Figure 3.2. After the node is generated, and before it is committed to the tree, the node is analyzed using the current running facts as inputs. The transfer function performs analysis and validation at the same time: both operations are expensive, so we want to do them in one pass, and if validation fails at some point, the remaining analyses are skipped. This round of GTAV is called pre-commit GTAV. The context it takes as its other input is called the generation context. If pre-commit GTAV fails, the node is discarded and the current running facts stay unchanged. If it passes, the node is committed to the tree, with the current running facts updated to the outputs of the transfer function. The previous running facts are saved as the node's inputs, and the current running facts are saved as the node's outputs.
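In outline, the pre-commit cycle looks like the following sketch, written as C with placeholder extern declarations standing in for the real generator machinery; none of these names are Csmith's actual API.

/* Outline of pre-commit GTAV: generate a node tentatively, run the transfer
 * function (analysis plus validation) against the running facts, and commit
 * only on success. All names are placeholders. */
struct fact_set { int dummy; };              /* stand-in for a real fact set */
struct node { int kind; };                   /* stand-in for an AST node */

extern struct node *generate_tentative(int required_type);
extern int transfer(const struct node *n, const struct fact_set *in,
                    struct fact_set *out);   /* returns 0 if validation fails */
extern void commit_node(struct node *n, const struct fact_set *in,
                        const struct fact_set *out);
extern void discard_node(struct node *n);

void generate_one(int required_type, struct fact_set *running) {
    for (;;) {
        struct node *n = generate_tentative(required_type);
        struct fact_set out;
        if (transfer(n, running, &out)) {    /* pre-commit GTAV passed */
            commit_node(n, running, &out);   /* node joins the tree */
            *running = out;                  /* advance the running facts */
            return;
        }
        discard_node(n);                     /* failed: try a fresh node */
    }
}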

4.3.3 Post-commit GTAV

After a node is committed to the tree, Csmith has to revisit it for analysis and validation when needed. The revisit is typically required while generating function calls, loops, and jumps. It is necessary because the inputs to the transfer function, either the facts or the context, have changed; thus, the results of the last round of analyses can no longer be used for validation. This round of GTAV is called post-commit GTAV. If post-commit validation fails on a committed node, the node is not discarded; rather, the call/loop/jump that triggered the revisit is discarded or modified so that validation can pass. If the offending construct is discarded, Csmith moves on to generate the next node; if it is modified, a new round of GTAV is necessary.

4.3.4 Transfer function for programs

We implement a transfer function at each level of the abstract syntax tree (see Figure 3.2). The topmost level is the program, which remains incomplete until generation finishes. The input fact set of a program is the empty set ∅. The output fact set of a program is almost the same as the output fact set of func_1; the main function is ignored by GTAV because it is just a wrapper for func_1. Because facts are only used during random program generation, not after, in practice Csmith pays no attention to the output fact set of the program.

4.3.5 Transfer function for functions

4.3.5.1 Pre-commit GTAV

All constructs (except for types) in Csmith are generated on demand. A function is generated only when an expression requires a function call and there is no existing function with a matching return type. The input fact set of the new function is the current running facts at the generation point where the demand arises. Some adjustments are needed to pass the facts from the call site to the callee; we refer to these adjustments as the caller-to-callee handshake. The handshake modifies the input fact set by 1) pruning irrelevant facts, e.g., facts concerning objects not accessible in the callee, and 2) adding facts concerning the parameters. The problem is that when we generate a function, the actual parameters could be unknown. How do we derive the facts regarding parameters? Csmith solves the problem by generating the actual parameters before generating the function body. This particular generation order is aligned with the control flow, where the function parameters are evaluated before the function body. As the call chain grows, the number of facts could grow too large and degrade performance; the pruning is necessary to keep the list size within a manageable range. How does Csmith know which objects are accessible in a function that has not even been created? The accessible objects in a function include 1) global objects, 2) its parameters, and 3) objects indirectly accessible through a global or parameter pointer. GTAV follows the control flow and visits the statements in the function. After the last statement of the function, the output fact set is passed back to the call site, and a callee-to-caller handshake takes place, which is almost the opposite of the caller-to-callee handshake: 1) adding back facts pruned during the corresponding caller-to-callee handshake, and 2) removing facts concerning the function parameters.

4.3.5.2 Post-commit GTAV

A function could be revisited by GTAV after its creation. When generating function calls, Csmith may choose to invoke an existing function instead of generating a new one, thus triggering a round of post-commit GTAV. To increase precision, GTAV performs path-sensitive analyses, which may require a series of functions, if they are invoked by the same call chain, to be revisited, leading to very expensive GTAV operations. In practice, I have observed Csmith becoming tremendously slow while generating programs with long call chains. To mitigate the problem, Csmith caches the input and output fact sets and uses them to bypass unnecessary reanalyses. By definition, if a function is invoked with identical inputs at two call sites, the outputs should be identical as well. Applied to transfer functions: if the input fact sets are identical between two rounds of GTAV for the function, and the context (see Section 4.3.1 for details) is similar enough (i.e., the key properties of the contexts match each other), the output fact set should be identical. By avoiding unnecessary reanalyses, Csmith improves efficiency. If a reanalysis is necessary, post-commit analyses are carried out in an identical way to pre-commit analyses: a caller-to-callee handshake, a traversal of the function body, and a callee-to-caller handshake at the end. Each round of GTAV for a function corresponds to a specific execution path leading to the function. Therefore, the program analyses performed by Csmith are path-sensitive.

4.3.6 Transfer functions for statements

The input fact set to a statement is the union of the output fact sets of all its predecessors in the control flow graph. The transfer functions are different for different types of statements, as shown in Table 4.1. The same bypassing mechanism we used for functions is used for post-commit GTAV of statements.

4.3.7 Transfer function for loops

Csmith generates two kinds of loops: for-loops, which are for-statements with standard headers, and jump-loops, which are formed by back-edge jumps, e.g., goto/continue statements. There is no restriction on the size of the loop body, nor is there a limit on the level of loop nesting.

4.3.7.1 Transfer function for for-loops

The transfer function for loops traverses the loop body iteratively until a fixed point is reached. The first iteration uses the preloop running fact set as the input and applies transfer functions to the header, and then to the loop body.

Table 4.1: Type-specific transfer functions for statements

Statement type        Transfer function description
Block                 Apply transfer functions to its child statements, following CFG edges.
Call-statement        Apply transfer functions to the actual parameters, then to the callee function body.
Return-statement      Apply transfer functions to the returned expression and merge the output fact set into the function's aggregated output fact set.
Continue-statement    No change to the input fact set; merely carries it to the CFG successor, i.e., the loop header.
Break-statement       No change to the input fact set; merely carries it to the CFG successor, i.e., the statement immediately following the loop.
Goto-statement        No change to the input fact set; merely carries it to the CFG successor, i.e., the labeled target. If the goto forms a back edge, loop analysis is required, as described in Section 4.3.7.
If-else-statement     Apply transfer functions to the if-condition, fork the outputs as inputs to both the true-block and the false-block, then merge the outputs of both blocks as the final output fact set of the statement.
Assignment-statement  Apply transfer functions to the RHS, then the LHS.
For-statement         See the transfer function for loops in Section 4.3.7.
Array-statement       See the transfer function for loops in Section 4.3.7.

After the body is analyzed/validated, the output fact set is used as the input fact set of the next iteration, starting from the header again. The process repeats until the input fact set to the header is identical to the input fact set to the header during the previous iteration. While generating a for-loop, statements in the body are committed as they pass pre-commit GTAV. After the loop body generation is complete, Csmith performs a final post-generation analysis/validation. If the validation fails, some of the previously committed statements have to be removed from the loop body to satisfy GTAV. This is called backtracking. Listing 4.1 demonstrates why backtracking is difficult to avoid.

Listing 4.1: Example of loop analysis and backtracking
 1  int i, j, k;
 2  int *p1 = &i;
 3  int *p2 = &j;
 4  int *p3 = &k;
 5
 6  for (...)
 7  {
 8      p3 = p2;
 9      *p3 = 0;
10      p2 = p1;
11      p1 = NULL;
12  }

It is not until the third iteration that p3 becomes NULL, making the dereference at line 9 a null pointer dereference. The culprit is line 11, which introduces a null pointer. To avoid generating line 9 in the first place, we would need to perform points-to analysis and use-define chain analysis, and the analyses would have to compute a fixed point during the generation of every loop statement. I tried that route and found that performance degraded greatly. The current approach, generating loop statements optimistically and using a single fixed-point computation at the end to validate the loop, proved to be more scalable. Table 4.2 shows how the fixed-point computation evolves through iterations over the above loop. GTAV fails during iteration three. Instead of throwing out the whole loop body, backtracking tries to minimize the loss by removing the statement that causes the validation to fail, along with all following statements in the loop body. In this example, Csmith deletes statements 9, 10, and 11. Note that this is not an optimal rescue: removing statement 11 alone would be sufficient to save the loop. That could be achieved by repeatedly removing the last statement from the loop body, stopping when just enough statements have been removed for the analysis/validation to pass. However, this iterated process could require multiple fixed-point computations; I chose the current solution for performance reasons.

Table 4.2: Iterations of points-to analysis

Location        Iteration 1                Iteration 2                   Iteration 3
Before line 6   p1→{i}, p2→{j}, p3→{k}     p1→{NULL}, p2→{i}, p3→{j}     p1→{NULL}, p2→{NULL}, p3→{i}
After line 6    p1→{i}, p2→{j}, p3→{k}     p1→{NULL}, p2→{i}, p3→{j}     p1→{NULL}, p2→{NULL}, p3→{i}
Before line 8   p1→{i}, p2→{j}, p3→{k}     p1→{NULL}, p2→{i}, p3→{j}     p1→{NULL}, p2→{NULL}, p3→{i}
After line 8    p1→{i}, p2→{j}, p3→{j}     p1→{NULL}, p2→{i}, p3→{i}     p1→{NULL}, p2→{NULL}, p3→{NULL}
Before line 9   p1→{i}, p2→{j}, p3→{j}     p1→{NULL}, p2→{i}, p3→{i}     p1→{NULL}, p2→{NULL}, p3→{NULL}
After line 9    p1→{i}, p2→{j}, p3→{j}     p1→{NULL}, p2→{i}, p3→{i}     GTAV failed! Skip
Before line 10  p1→{i}, p2→{j}, p3→{j}     p1→{NULL}, p2→{i}, p3→{i}     (skipped)
After line 10   p1→{i}, p2→{i}, p3→{j}     p1→{NULL}, p2→{NULL}, p3→{i}  (skipped)
Before line 11  p1→{i}, p2→{i}, p3→{j}     p1→{NULL}, p2→{NULL}, p3→{i}  (skipped)
After line 11   p1→{NULL}, p2→{i}, p3→{j}  p1→{NULL}, p2→{NULL}, p3→{i}  (skipped)
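For illustration, after backtracking the loop from Listing 4.1 is left in the following state (a sketch; per the discussion above, Csmith removes the failing statement at line 9 together with all statements after it):

    /* Loop body after backtracking: statements 9, 10, and 11 are gone.
       Only the pointer copy from line 8 survives, so no iteration can
       dereference a null pointer. */
    for (...)
    {
        p3 = p2;
    }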

4.3.7.2 Transfer function for jump-loops

Loops formed by back-edge jumps require fixed-point computation similar to for-loops. Csmith allows the jump target to be almost anywhere in the same function; thus, the implicitly formed loop could partially or fully overlap with other loops, complicating the fixed-point computation. In Listing 4.2, the implicit loop formed by the goto-statement spans lines 3-9, while the for-loop spans lines 6-11. Results from the fixed-point computation for one loop could be "unfixed" by the fixed-point computation for the other loop. It seems we have to answer these questions to analyze overlapping loops: Which loop should we start with to find a fixed point? How do we evaluate the effect of overlapping on the fixed-point computation? How do we determine whether all overlapping loops have reached their collective fixed point?

Instead of addressing these thorny questions individually, Csmith takes a different approach: it considers all statements involved in overlapping loops to form a container loop and applies fixed-point computation to it. All inner loops reach their individual fixed points when the container loop reaches its fixed point. For the example in Listing 4.2, the container loop spans statements 3-11. At the beginning of each iteration, GTAV determines whether the fixed point has been reached by considering both inner loops. If both have reached a fixed point, no more iterations are needed.

Listing 4.2: Example of overlapping loops
 1  int i, j, k;
 2  int *p1 = &i;
 3  label:
 4  int *p2 = &j;
 5  int *p3 = &k;
 6  for (i = 0; i < 3; i++) {
 7      if (i < 2) {
 8          goto label;
 9      }
10      p2 = p3;
11  }

4.3.8 Transfer function for expressions

The input fact set to an expression is the output fact set from the previously analyzed expression/statement immediately before a sequence point. Table 4.3 lists the transfer functions for different types of expressions.

4.4 Points-to analysis and validation within the GTAV framework

4.4.1 Abstract variables

A points-to fact is defined as p → {l1, l2, ..., ln}, where p is a pointer and each li is a location. If the points-to set has only one member, it is a must points-to fact. If the set has more than one member, the locations are may points-to. In Csmith, every accessible location is treated as an abstract variable, which is defined as one of the following:

• an actual variable in the C program being generated;

• a field in a structure or union variable; or

Table 4.3: Type-specific transfer functions for expressions

Expression type        Transfer function description
Constant               Do nothing.
Variable               Validate the pointer reference and union-field read if applicable; validate and update the read of the location(s) represented by the variable.
Unary expression       Apply the transfer function to the operand.
Binary expression      Apply transfer functions to the first operand, then the second operand.
Call expression        Apply transfer functions to the parameters, then to the function body.
Assignment expression  The same as an assignment-statement.
Comma expression       Apply transfer functions to the first operand, then the second operand.
LHS                    Validate the pointer reference and union-field read if applicable; validate and update the write of the location(s) represented by the variable; validate and update the read of the location(s) represented by intermediate dereferenced pointers.

• a special points-to location such as NULL or GARBAGE. NULL is reserved for null pointers, and GARBAGE represents a location that has not been initialized or has been deallocated.

Abstract variables do not differentiate array members. In other words, abstract variables are field-sensitive but array-member-insensitive. This design choice represents a compromise between performance and precision. If we want to reason about individual members of arrays, we must reason about the index value of every array access, which is impossible without value-range analysis. On the other hand, representing structure or union fields as individual abstract variables enables field-sensitive program analysis, and the cost of field-sensitive analyses is manageable. When a points-to location is dead, a pointer to it becomes a dangling pointer. The live range of an abstract variable is defined as follows: 1) NULL and GARBAGE are always alive; and 2) for other abstract variables, life starts when the variable is declared and ends when the scope of its storage block ends. Representing all locations with abstract variables gives GTAV a couple of benefits: merging points-to facts becomes a standard set-join operation, and the sizes of points-to objects do not have to be explicitly stored, as they are implied by the types of the abstract variables.
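A small sketch (hypothetical declarations) of how declarations map to abstract variables:

    struct S { int f1; int f2; };

    struct S s;        /* two abstract variables: s.f1 and s.f2
                          (field-sensitive)                          */
    int a[10];         /* one abstract variable: a; all ten elements
                          are collapsed (array-member-insensitive)   */
    int *p = &a[3];    /* recorded fact: p -> {a}, regardless of the
                          index used                                 */
    int *q;            /* recorded fact: q -> {GARBAGE} until q is
                          initialized                                */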

4.4.2 The lattice

Points-to analysis in Csmith represents the points-to set of each pointer as an element of a lattice. The lattice is formed by the subsets of the set of all abstract variables; the bound on the height of the lattice is the number of abstract variables. It constitutes a bounded lattice (L, ∨, ∧, ⊤, ⊥) where:

  L = the set of all subsets of abstract variables
  ⊤ = ∅
  ⊥ = {av1, av2, ..., avn}, where n is the number of abstract variables
  a ∨ b = a ∪ b for any two sets in the lattice
  a ∧ b = a ∩ b for any two sets in the lattice

4.4.3 Validation of pointer references

4.4.3.1 Invalid pointer references

Csmith exercises pointers in many ways: reading and writing the pointer itself, and dereferencing the pointer and then reading/writing the pointed-to content. Csmith also supports multilevel pointer indirection and multilevel dereferences. A pointer reference is undefined, and thus must be avoided by Csmith, if it constitutes a dangling pointer reference or a null pointer dereference.

Dangling pointer reference: C11 [29, §6.2.4] designates that the behavior is undefined when “an object is referred to outside of its lifetime” and when “the value of a pointer to an object whose lifetime has ended is used.” The lifetime of an object depends on its storage duration. Global variables are objects with static storage duration and thus have a lifetime that is the entire execution time of the program, as do static local variables. Allocated objects have a lifetime spanning from memory allocation to memory deallocation. An object with automatic storage duration (the default for local variables) dies when the scope of its declaring block ends. Csmith (version 2.1) does not generate allocated objects; therefore, automatic objects are the primary concern. A dangling pointer is a pointer that takes the address of an automatic object and is still alive when the automatic object dies. The use of such a pointer is commonly called a dangling pointer reference and is undefined.

Null pointer dereference: Null pointer dereference is also undefined in C11, which states in Section 6.5.3.2, “If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.” Immediately after, in footnote 102, it states, “Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.” A program generator could easily avoid null pointer dereferences by inserting a check for nullness before every pointer dereference, but there are undesirable complications: 1) the condition check is cumbersome, especially for complex expressions with multiple pointer dereferences; and 2) protecting pointer dereferences with nullness checks could alter the behavior of the compiler under test. Since Csmith already performs points-to analysis to avoid dangling pointer references, using the same results to avoid null pointer dereferences incurs no additional cost.

4.4.3.2 Avoiding invalid pointer references

With points-to analysis, invalid pointer references are determined by Csmith when:

• a dereference of p and NULL is in the points-to set of p, or

• a dereference of p and GARBAGE is in the points-to set of p, or

• a read of p and GARBAGE is in the points-to set of p. Note writing to p is valid.

Dereferencing a multilevel indirect pointer is more complicated. Take the expression **p, for example: Csmith needs to check the points-to sets of both p and *p to validate that they do not contain NULL or GARBAGE. Furthermore, if **p is read as a pointer, its own points-to set should not contain GARBAGE. Let ∗ⁿp denote an expression in which pointer p is dereferenced n times, where n is a nonnegative integer, and let PS(∗ⁿp) be the function that returns the points-to set of ∗ⁿp. The PS function generalizes to a multilevel indirect pointer as follows:

1. for the special locations NULL and GARBAGE, PS(NULL) = PS(GARBAGE) = {GARBAGE};

2. when n = 0 (the base case), PS(p) is the points-to set of p;

3. when n = 1, PS(∗p) = ⋃ PS(q) over all q ∈ PS(p);

4. generalizing from the base case and n = 1: when n = i, PS(∗ⁱp) = ⋃ PS(q) over all q ∈ PS(∗ⁱ⁻¹p).

The formal validation for the expression ∗ⁿp can be expressed in inductive rules:

1. when n = 0 (the base case), writing to p is always valid, and reading from p is valid if GARBAGE ∉ PS(p);

2. when n = 1, writing to ∗p is valid if neither NULL nor GARBAGE is in PS(p); reading from ∗p is valid if writing to ∗p is valid and GARBAGE ∉ PS(∗p);

3. generalizing from the base case and n = 1: when n = i, writing to ∗ⁱp is valid if neither NULL nor GARBAGE is in PS(∗ⁱ⁻¹p), and reading from ∗ⁱp is valid if writing to ∗ⁱp is valid and GARBAGE ∉ PS(∗ⁱp).
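As a concrete illustration of these rules, consider a two-level pointer (a sketch with hypothetical names):

    int   i = 0;
    int  *p = &i;   /* PS(p) = {i}                                  */
    int **q = &p;   /* PS(q) = {p}                                  */
                    /* PS(*q) = union of PS(x) for x in PS(q) = {i} */

    /* Writing to **q is valid: neither NULL nor GARBAGE is in
       PS(*q) = {i}. Reading **q is valid as well, because writing
       is valid and GARBAGE is not in PS(**q) (i is initialized).  */
    **q = 5;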

While generating an expression of type T, Csmith considers all variables of type T or its derived types, such as T∗, T∗∗, etc. GTAV determines which pointers are valid candidates, i.e., valid after the proper number of dereferences according to the above validation. For example, a pointer p of type int32∗ is a qualified candidate for generating a to-be-modified expression of type int32, as long as p is not a null or dangling pointer. Post-commit points-to analysis is identical to pre-commit points-to analysis. However, Csmith acts differently when the analysis/validation fails: at generation time, the invalid pointer reference is discarded, while a failed post-commit points-to analysis/validation causes Csmith to discard the construct that triggered the revisit. See Section 4.3.3 for details.

4.4.3.3 Opportunistic validation of pointer references

Csmith was originally designed to generate only valid pointer references, to serve the purpose of random differential testing of compilers. However, we found that the random programs would also be valuable for testing tools that perform pointer-related static analysis, if we relaxed that restriction. The basic idea is to have Csmith purposely generate a few invalid pointer references in a random program and challenge a static analyzer to find them. To fulfill this goal, Csmith provides a command line option that specifies the desired chance of invalid pointer references. If the option is set to 30%, then there is a 30% chance that Csmith will ignore failed GTAV while generating a pointer reference. Csmith also generates a marker whenever such an invalid pointer reference is generated.

4.4.4 Validation of points-to analysis itself

Points-to analysis is used to validate pointer references, but how do we validate the points-to analysis itself? Points-to analysis is not trivial for programs with unrestricted pointer references. While Csmith simplifies the problem by disallowing certain pointer operations, such as pointer arithmetic, the analysis remains challenging and error-prone. We learned that fact the hard way: while adding pointer support to Csmith, we found numerous bugs in the points-to analysis, and a great amount of effort was spent finding and fixing them. Those bugs led to either crashes in Csmith or false-positive bug reports. It became clear that we needed a validation mechanism for our points-to analyzer. This is achieved by inserting assertions about points-to facts into the random programs. For example, if the points-to set for pointer p is {v1, v2}, the assertion would be assert(p == &v1 || p == &v2). With these assertions, we can easily spot bugs in the points-to analysis because they lead to execution-time assertion failures. However, the assertions cannot fully express the points-to analysis results, for two reasons: 1) the assertions are static, so they have to be context-insensitive. Even though Csmith is capable of context-sensitive points-to analysis, it has to merge points-to analysis results obtained under various contexts to yield the static assertions; consequently, the precision and the validation strength are decreased. 2) The C language syntax does not allow certain points-to facts to be asserted. For example, if pointer p in a function points to a local variable v in the caller, the assertion assert(p == &v), while semantically correct, would be rejected by a compiler because v is not in scope. In addition, the special location GARBAGE is impossible to express in an assertion. Despite these two limitations, the assertions proved very helpful in finding bugs in the points-to analysis. The assertions were disabled by default once the points-to analysis in Csmith became stable enough.
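For example, an instrumented random program might look like the following sketch (hypothetical names; real Csmith output is considerably larger):

    #include <assert.h>

    int v1, v2;

    void f(int flag)
    {
        int *p = flag ? &v1 : &v2;
        /* Emitted from the static points-to set of p, which is {v1, v2}.
           Any execution that fails this assertion exposes a bug in the
           points-to analysis itself, not in the compiler under test.   */
        assert(p == &v1 || p == &v2);
        *p = 1;
    }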

4.5 Side-effect analysis within the GTAV framework and the evaluation order problem

C11 [29, §5.1.2.3] states that “Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects.” Csmith simplifies side-effect analysis by focusing on three primary effects: reads/writes of volatile variables, reads of nonvolatile variables, and writes of nonvolatile variables. Side-effect analysis is essentially engineered to solve the evaluation order problem. The C language only provides a list of sequence points and a mandate that an expression before a sequence point must be evaluated before an expression after the same sequence point. C11 leaves the evaluation order of subexpressions in a full expression mostly unspecified. It states that “the order in which subexpressions are evaluated and the order in which side effects take place, except as specified for the function-call (), &&, ||, ? :, and comma operators (6.5)” is unspecified. It [29, §6.5.2.2] reiterates that “the order in which the function designator, arguments, and subexpressions within the arguments are evaluated in a function call” is also unspecified. The only specified evaluation orders, as mentioned in C11 [29, §6.5], are:

• The function arguments must be evaluated before the function body.

• For the expression E1 && E2, E1 is always evaluated first; if E1 evaluates to false, E2 is not evaluated.

• For the expression E1 || E2, E1 is always evaluated first; if E1 evaluates to true, E2 is not evaluated.

• For the expression E1 ? E2 : E3, E1 is always evaluated first; then E2 or E3 is evaluated based on the result of E1.

• For the expression “E1, E2”, E1 is always evaluated first; then E2 is evaluated unconditionally.

The ambiguity in evaluation orders is not necessarily a bad design: compilers can take advantage of it and rearrange operations to achieve the best performance. But it is harmful to Csmith because it introduces ambiguity, and also because the following behavior is undefined in C11 [29, Annex J.2]: “Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored (6.5).” Listing 4.3 shows an example of ambiguous code due to unspecified evaluation orders. Because the evaluation orders between two sequence points are unspecified, a modifying operation on an object could happen either before or after another reading/modifying operation on the same object, making the outcome nondeterministic. If two volatiles are accessed between the same two sequence points, their access order is not deterministic either. Such nondeterminism is called the evaluation order problem, and it leads to failure of random differential testing. Csmith solves the evaluation order problem by detecting conflicts in unordered operations. Two unordered operations on the same location are determined to be in conflict if 1) one of them is a read and the other is a write (a read-and-write conflict), or 2) both are write operations (a write-and-write conflict).

Listing 4.3: Example of ambiguity due to unspecified evaluation orders
#include <stdio.h>

void main()
{
    int a[2] = {0, 0};
    int x = 0;
    a[x] = ++x;
    // could print 0 1 or 1 0
    printf("%d %d", a[0], a[1]);
}

4.5.1 The algorithm and the lattice

To avoid dependence on unspecified evaluation orders, we implement in Csmith a side-effect analysis and validation over abstract variables. The algorithm is detailed below:

• Create an empty side-effect context C after a sequence point.

• Before the generation reaches another sequence point, validate all generated but not yet committed expressions against C. Reject an expression E if it meets one of the following conditions (and commit E otherwise):

  – an abstract variable is read in E while written in C;

  – an abstract variable is written in E while read in C;

  – an abstract variable is written in E while written in C;

  – a volatile abstract variable is read/written in E while there is a volatile access in C (see Section 4.5.3 for details).

• Add the effects of the committed expressions to C.
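To illustrate how these rules classify unordered subexpressions, here is a sketch (hypothetical variables):

    int x, y;
    volatile int v1, v2;

    /* x + y             : two reads of distinct variables; no conflict,
                           so the expression is accepted                 */
    /* (x = 1) + x       : x is written in one operand and read in the
                           other between the same two sequence points;
                           read-and-write conflict, rejected             */
    /* (x = 1) + (x = 2) : two unordered writes of x; write-and-write
                           conflict, rejected                            */
    /* v1 + v2           : two volatile accesses between the same two
                           sequence points; rejected by the volatile
                           rule (Section 4.5.3)                          */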

Abstract variables are field-sensitive; therefore, Csmith allows a write to one field of a structure variable to be unordered with a read or write of another field of the same variable, knowing the accesses are not in conflict.

The lattices for side-effect analysis are bounded as (L, ∨, ∧, ⊤, ⊥), almost identical to the points-to analysis lattice, where:

  L = the set of all subsets of abstract variables, excluding the special locations NULL and GARBAGE
  ⊤ = ∅
  ⊥ = {av1, av2, ..., avn}, where n is the number of abstract variables
  a ∨ b = a ∪ b for any two sets in the lattice
  a ∧ b = a ∩ b for any two sets in the lattice

4.5.2 The implementation

Side-effect analysis is implemented in Csmith with two sets of abstract variables, representing read-effects and write-effects, and a flag indicating whether there is a volatile access in the context. An abstract variable is included in the appropriate set when it is read or written. Note that a read or write of a field of a union variable is expanded to reads/writes of all fields of the variable. Side-effect analysis depends on points-to analysis. For expressions with a simple pointer dereference ∗p, the analysis conservatively adds all the points-to locations (i.e., abstract variables) of p to the appropriate set. For expressions with multilevel pointer dereferences ∗ⁱp, the possible locations at each dereference level are added to the appropriate set. At generation time, an expression is validated against the accumulated side-effect context before it is committed to the tree. Although the sequence points of committed code stay unchanged, an expression with pointer dereference(s) is likely to have different read or write effects when the pointer value changes. Thus we need post-commit side-effect analysis/validation.

4.5.3 Side-effect validation for volatiles

C11 [29, §5.1.2.3] defines accessing a volatile object, like modifying an object, as a side effect. It [29, §5.1.2.3] states the minimum requirement on volatile object accesses: “Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.” In addition, C11 [29, §6.7.3] leaves room for ambiguity by stating, “What constitutes an access to an object that has volatile-qualified type is implementation-defined.” The standard implies that for objects defined with a volatile type, accesses must be done before the next sequence point; but otherwise, merging, reordering, and word-size changes are allowed [25]. To be conservative, Csmith assumes that a read from or write to a volatile object is a side effect, and that there should be no more than one read or write of volatile object(s) between two sequence points. This restriction is inherited from randprog, the program on which Csmith is based. Randprog was designed to find bugs in compilers' optimization of volatile object accesses, as opposed to the general compiler bugs targeted by Csmith.

4.6 Union-field analysis and validation within GTAV

Union variables may have padding bytes because of nonaligned member fields. C11 [29, §6.2.6.1] states, “When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.” Specifically for unions, it [29, §6.2.6.1] states, “When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.” Consequently, C11 [29, §6.2.6.1] stipulates that reading “the value of a union member other than the last one stored into” is undefined. To remove the ambiguity caused by this undefined behavior, Csmith has to know which field of a union variable was last written and validate that a subsequent read from the union is through that field. As with side-effect analysis, pointers complicate union-field analysis because a write or read of a union field could be expressed with pointer dereference(s). Csmith uses union-field analysis/validation to achieve the above objectives and performs post-commit analysis/validation when necessary.

4.6.1 The lattice

The lattices for union-field analysis are bounded as (L, ∨, ∧, ⊤, ⊥) where:

  L = the integer set {0, 1, ..., n}, where n is the maximum allowed field id of unions
  ⊤ = ∅
  ⊥ = a distinguished bottom element, meaning the last written field is unknown
  a ∨ b = ⊥ for any two distinct elements of the lattice
  a ∧ b is undefined

When a field f1 of union variable U is definitely written, the analysis concludes that the last written field of U is f1 and denotes the fact as U:f1. The possible states for a union variable are {⊤, ⊥, f1, f2, ..., fn}, where n is the number of named fields in the union type. A join operation, which typically occurs at the end of an if-else-statement, leads to the bottom, meaning the last written field of the union variable is indeterminate; thus, no field of the union variable can be read. This transition allows a fast traversal of the lattice. Once in the bottom state, the union variable can only be written. If the field that is written is known definitely, the union variable can escape from the bottom. A definite write to a union field is either 1) a write to a union field without pointer dereferences, or 2) a write through a dereferenced pointer that has a must points-to relationship with the union field. An indefinite write to a union field is a write through a pointer that may point to several locations; such a write causes the union-field state to transition to the bottom immediately.
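A sketch of the state transitions (hypothetical names; the comments track the union-field state as defined above):

    union U { int i; float f; };

    void example(int cond)
    {
        union U u, w;
        union U *p = cond ? &u : &w;  /* may points-to: p -> {u, w}      */

        u.i = 1;      /* definite write: state of u becomes u:i; reading
                         u.i is valid, reading u.f would be rejected     */

        if (cond)
            u.i = 2;
        else
            u.f = 1.0f;
        /* join at the end of the if-else drives u's state to bottom:
           the last written field is indeterminate, so no read of u is
           allowed until the next definite write                         */

        p->i = 3;     /* indefinite write through a may-pointer: both u
                         and w drop to bottom immediately                */
    }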

4.6.2 Validation of union-field reads

A read from field fi of a union variable U is valid only when the last written field is fi. The validation simply compares the union-field state with the field to be read. However, this rule is somewhat overstrict and prevents a common practice that is essentially equivalent to what C++ calls a “reinterpret cast.” If we do not compare executions across different platforms, it is safe to write to one union field and read from another when both fields are of integer type and the written type is wider than the read type. For example, writing to a 32-bit int field and then reading from an 8-bit byte field is generally safe (see the sketch after the quoted rules below). This intraplatform safety derives from C11's basic mandates on memory layout, quoted in the following two rules:

• “A pointer to a union object, suitably converted, points to each of its members, and vice versa.” [29, §6.7.2.1] and

• “Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.” [29, §6.2.6.1]
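For instance, the relaxed rule permits the following (a sketch; w is a hypothetical variable):

    union W {
        int  i32;   /* 32-bit field                            */
        char i8;    /* 8-bit field, overlapping the first byte */
    } w;

    w.i32 = 0x11223344;
    /* The narrower read touches only bytes that were just written,
       never padding, so it is safe on any single platform. Only the
       value depends on endianness: 0x44 on a little-endian machine,
       0x11 on a big-endian machine.                                 */
    char b = w.i8;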

While Section 6.7.2.1 states that all fields of a union object must begin at the same location in memory, Section 6.2.6.1 demands that the bytes of an integer be contiguous. If we write four bytes into a union field and read back the first two bytes, there is no risk of reading padding bytes, which are the source of unspecified values. The only ambiguity left is endianness. However, endianness is invariant on a given platform, and we do not perform differential testing across platforms; thus, there is no ambiguity. Csmith supports a command line option that enables the less restrictive union reads described above. It also supports another relaxation of the C11 restriction by allowing a read of a char pointer field when the last written field is also an integer-type pointer. This gives Csmith the freedom to use the char* field to read/write contiguous bytes of a wider integer. Listing 4.4 is an example:

Listing 4.4: Example of contiguous byte accesses through char*
union {
    char  *pChar;
    short *pShort;
} U;

short i = 0xFFFF;
U.pShort = &i;
// clear either the lower byte or the upper byte of i,
// depending on the endianness
*(U.pChar + 1) = 0;

4.6.3 Validation of assignments between overlapping objects

Unions introduce an interesting undefined behavior specified in C11 [29, §6.5.16.1]: “If the value being stored in an object is read from another object that overlaps in any way the storage of the first object, then the overlap shall be exact and the two objects shall have qualified or unqualified versions of a compatible type; otherwise, the behavior is undefined.” Most of the time, different objects occupy nonoverlapping memory locations; unions are unique in that they are designed to share memory space between members. Csmith performs a simple validation to ensure that no assignment, whether simple or compound, occurs between two fields of the same union object. Although it appears silly to assign a union field to itself, doing so is permitted by both the standard and Csmith.
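A sketch of what this validation allows and forbids (hypothetical variable v):

    union V { int i; short s; } v;

    v.i = 0;      /* definite write: last written field is i            */
    v.i = v.i;    /* exact overlap, compatible types: permitted by the
                     standard and by Csmith                             */
    /* v.i = v.s would store into v.i a value read from v.s, an object
       that overlaps v.i inexactly; the behavior is undefined, so
       Csmith's validation rejects such assignments.                    */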

4.7 Guaranteed convergence of the fixed point of the loop transfer function

Could the transfer function for loops, hereafter denoted TF_loop, ever fail to reach a fixed point? The number of possible inputs to and outputs from TF_loop is limited by the number of abstract variables in the random program; therefore, TF_loop is a discrete function. According to the Knaster–Tarski theorem [60], which is applicable to discrete functions, any order-preserving function on a complete lattice has a fixed point. I have shown that the lattices for points-to analysis, side-effect analysis, and union-field analysis are complete. The lattice on which TF_loop operates is the conjunction of these three lattices and thus is also complete.

For each iteration, the input to TF_loop is the join of the outputs and inputs of the previous iteration. This implies that TF_loop is order-preserving. Specifically:

  TF_loop(X) ⊇ X for every input X
  TF_loop(X) = X once X reaches a fixed point (at the latest, at X = ⊥)
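The termination argument can be stated compactly (a sketch consistent with the definitions above; AV denotes the set of abstract variables):

    % X_k is the fact set entering iteration k of the loop analysis.
    \[
      X_{k+1} = X_k \vee TF_{loop}(X_k) \supseteq X_k,
      \qquad
      X_0 \subseteq X_1 \subseteq \cdots \subseteq \bot = AV.
    \]
    % The ascending chain can grow at most |AV| times, so some
    % k <= |AV| satisfies X_{k+1} = X_k: a fixed point is reached
    % and the loop analysis halts.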

Because TF_loop is order-preserving and the lattice is complete, the loop transfer function always converges to a fixed point. This guarantees that GTAV always halts. As a matter of fact, Csmith has an internal assertion stating that the number of iterations for loop analysis must be less than eight; the assertion has never failed throughout the years of random program generation.

CHAPTER 5

EVALUATION

Those who trust to chance must abide by the results of chance.
    (Calvin Coolidge)

We developed Csmith for the purpose of testing compilers. Toward this end, we conducted experiments to evaluate how good Csmith is at finding compiler bugs in both uncontrolled and controlled environments. Many of the results presented in this chapter were previously published in the PLDI paper in which we first introduced Csmith [75].

5.1 Uncontrolled opportunistic compiler bug finding

Our first experiment was uncontrolled and unstructured: over a period of four years and three months (March 2008–July 2012), we used Csmith to generate random programs and opportunistically found and reported bugs in a variety of C compilers. We found bugs in every compiler we tested. Overall, we found and reported 461 valid and distinct compiler bugs, many of them classified as high-priority bugs. We reported bugs to 12 different C compiler development teams. Five of these compilers (GCC, LLVM, CIL, TCC, and Open64) were open source. Six were commercial products (Microsoft Visual C++, Intel C Compiler, LANCE C Compiler, Sun C Compiler, CodeWarrior, and ARM C Compiler). The 12th, CompCert, is publicly available but not open source. More than 96% of the bugs were found in GCC and LLVM.

5.1.1 What kinds of bugs are there?

The compiler defects we observed can be broadly classified into two large categories: 1) defects that caused the compilation to terminate prematurely, and 2) defects that caused miscompilations. The former has symptoms that manifest at compile time, while the latter only manifests when the compiler's output is executed. We have observed the following kinds of compile-time errors:

• Assertion violations or other internal compiler errors (ICE)

• Crashes due to memory-safety problems such as null pointer dereferences

• Crashes due to out of memory

• The compiler is extremely slow or simply hangs on a reasonably sized input, forcing us to kill the compiler process

We say that a compile-time error has occurred whenever the compiler process exits with a status other than zero or fails to produce executable output. The bug behind such an error is called a compile-time bug. In most cases, the observed error is a crash, falling into one of the first three kinds of behaviors; the fourth kind occurs rarely. For simplicity, we use the terms compile-time error and crash error interchangeably. When presented with random programs, the miscompilation behaviors we observed include:

• Generating an executable that computes wrong result(s)

• Generating an executable that terminates abnormally for a source program that should not

• Generating an executable that terminates normally for a source program that should loop forever

• Generating an executable that loops forever for a source program that should terminate

We refer to these miscompilations as wrong-code errors. The bug behind such an error is called a wrong-code bug. A silent wrong-code error is particularly dangerous to compiler users: the erroneous executable was produced without any warning from the compiler; i.e., the compiler silently miscompiled the source program.

5.1.2 Experience with commercial compilers

There exist many commercial C compilers. Licensing issues prevent us from easily obtaining copies of most of them and running tests. The ones we chose to study are fairly popular and were produced by what we believe to be some of the strongest C compiler vendors: Microsoft, Intel, and Sun Microsystems (now part of Oracle), among others.

Csmith was able to find wrong-code errors and compile-time errors in each of these commercial compilers quickly, typically within a few hours of testing. However, our bug reports to the vendors received lukewarm responses. That is probably because we are not paying customers, because our findings represent potentially bad publicity, or because the compiler teams regard random testing as unimportant. We therefore simply tested these commercial compilers until we found a few compile-time errors and/or a few wrong-code errors, reported them, and moved on. Among the bugs found, we reported eight for armcc, two for CodeWarrior, one for VC++, and three for icc.

5.1.3 Experience with open source compilers

Our experience with open source compiler teams, on the other hand, was encouraging and smooth. This, combined with several other reasons, led us to direct the bulk of our testing effort toward GCC and LLVM, two well-known open source compilers. First, compiler testing is inherently interactive: we expect feedback from the compiler writers, who either admit the bug and fix it or dismiss it with justification. We welcome both scenarios. The LLVM and GCC teams were the most responsive to our bug reports. Second, random testing tends to find bugs with different frequencies of occurrence; some are hard-to-catch one-in-a-million bugs, and some are triggered by many test cases. The former can be drowned in a sea of the latter. Thus, bug fixes from developers, counterintuitively, lead to more bugs being found by random testing: testing proceeds most smoothly with help from developers who eliminate the easy bugs, making tricky bugs more observable. Both the GCC and LLVM teams were very responsive to our bug reports. The LLVM team in particular fixed bugs quickly, often within a few hours and usually within a week. The GCC team has a more loosely organized structure, and its developers have more flexible working schedules; nevertheless, their speed in fixing the bugs we reported gradually picked up, at some points becoming comparable to that of the LLVM team. Third, we prefer dealing with open source compilers because their development process is transparent: we can join the mailing lists, participate in discussions, and see fixes as they are committed. The interaction benefits both sides: they benefit from our testing, and we benefit by learning their process and gaining insight into compilers. That knowledge is then applied to improve Csmith.

Finally, we want to help harden the open source development tools that we and many others use daily. Knowing that our years of testing resulted in better open source compilers is greatly satisfying. We started reporting bugs to both the LLVM and GCC developers in March 2008. By the end of the testing period, we had found numerous bugs in both of them using Csmith and random differential testing. For GCC, we reported 164 bugs, including 18 from Arthur O'Dwyer, who is not in our group but volunteered to test compilers with Csmith. Meanwhile, 267 bugs, representing about 2% of all compiler-specific bug reports, were reported to the LLVM team. Most of our reported bugs have been fixed, and 31 of the GCC bugs were marked by developers as P1: the maximum, release-blocking priority for a bug. LLVM developers do not assign priorities to bugs. As of April 2014, we had reported 445 bugs in GCC and LLVM, out of 461 in total across all tested compilers. An error that occurs at the lowest level of optimization is pernicious because it defeats the conventional wisdom that compiler bugs can be avoided by turning off the optimizer. Table 5.1 counts the bugs of this kind, causing both compile-time and wrong-code errors, that we found using Csmith. In the next chapter, I will present more details on the bugs we found in GCC and LLVM, studying the causes, origins, and fixes of compiler bugs.

5.1.4 Testing CompCert

CompCert [34] is a verified, optimizing compiler for a large subset of C; it targets PowerPC, ARM, and x86. We dedicated many machine-hours to testing this compiler, from version 1.6 up to version 1.8, in 2010. The effort was initially carried out on a PowerPC-based PlayStation and was later expanded to x86 machines when CompCert added an x86 back end in version 1.8. We found and reported six bugs in CompCert's unverified front-end code. The bug reports, coupled with other factors, led the main CompCert developer to expand the verified portion of CompCert. The team gave formal definitions of CompCert's handling of integer promotions and other implicit casts, and proved the handling correct.

Table 5.1: Compile-time and wrong-code bugs found by Csmith that manifest when compiler optimizations are disabled (i.e., when the -O0 command-line option is used)

              GCC  LLVM/Clang
Compile-time  2    10
Wrong-code    2    9
Total         4    19

We also found two compile-time errors that caused the assembler to fail. One of them was caused by CompCert's back end generating an overflowed 16-bit displacement for stack frame allocation; the error would have caused a stack overflow had it not been caught by the assembler. In addition, we found a wrong-code bug related to miscompiled volatiles. It is impressive that middle-end bugs, which are common in both GCC and LLVM, are absent from CompCert. The apparent robustness of CompCert supports a strong argument that developing compiler optimizations within a proof framework increases the compiler's correctness. However, CompCert is not fully compliant with the C language standards, and we had to limit the expressiveness of Csmith to generate test cases that CompCert could compile.

5.2 Comparison with other random program generators

5.2.1 Previously published random C program generators

DDT [44] is a testing system that applies directed differential testing techniques to C/C++ compilers. It is primarily a random program generator based on a stochastic grammar for C, with added capabilities such as reducing error-inducing test cases, eliminating false positives introduced by undefined runtime behaviors, and testing compilers locally or remotely. DDT was mostly developed by William McKeeman while working for Digital Equipment Corporation. DEC (now part of Hewlett-Packard) released the software under license terms that allow free use and free distribution for noncommercial purposes. We obtained the source code online and made several changes to suit our needs:

• Fixing syntax errors to be able to run with Tcl/Tk 8.5

• Outputting generated test cases in a separate directory from the source code

• Outputting all generated programs instead of only the error-inducing ones

Quest [37] is open source software developed by Christian Lindig, released under the BSD 3-Clause license. I obtained the latest version of the source code from Google Code [36] and compiled it with Objective Caml 3.12. Quest generates C programs that can uncover bugs in a C compiler. The generated code passes complex arguments from callers to callees and thus tests the translation of function calls. Quest has uncovered bugs in production-quality compilers such as GCC.

randprog [14] is open source software developed by Eric Eide and John Regehr. I obtained the latest version (1.0.0) of the source code from the authors' web site [13] and compiled it with GCC 4.6.3. randprog is a significantly enhanced and tailored version of a publicly available program generator written by Bryan Turner [65]. To differentiate the two, we refer to the original generator as randprog0 and to the enhanced one as randprog. Randprog focuses on testing volatile variable accesses. It creates C programs that perform computations over signed and unsigned integer variables, but it does not support characters, arrays, pointers, structures, unions, or floating-point types.

5.2.2 Generating performance

I ran the four random C program generators, including Csmith, on a 64-bit Ubuntu 12.10 system with an 8-core Intel i7-2720QM CPU for a period of 12 hours. The hour-by-hour generation results are listed in Table 5.2. Compared to the other generators, Csmith generates the smallest number of files in the same amount of time. In terms of KLOC (kilo-lines of code, measured by cloc), Csmith beats DDT but falls far behind Quest and randprog. Csmith is slow at generating lines and files compared to the other generators. This is not surprising considering how complex GTAV is in Csmith and, accordingly, how many CPU cycles are spent on it. Additionally, a line generated by Csmith, sometimes composed of chained function calls or integer operations, is typically longer than a line produced by the other generators. However, both file count and KLOC are coarse measurements of program size. We instrumented Csmith so that it outputs program size at a finer granularity; Table 5.3 shows the hourly averages of generated constructs.

Table 5.2: Comparison of hour-by-hour generating performance of random C program generators

        DDT              Quest               randprog          Csmith
hour    file#    KLOC    file#     KLOC      file#     KLOC    file#   KLOC
1       11,336   255     178,565   283,592   98,082    83,925  4,063   5,097
2       10,744   240     179,522   285,212   83,137    70,784  3,821   4,756
3       8,125    183     179,300   284,784   82,019    69,300  3,908   4,866
4       11,571   259     178,731   283,877   81,732    68,992  3,190   3,992
5       11,459   258     179,349   284,886   77,832    65,931  3,892   4,891
6       11,668   262     179,261   284,706   77,435    65,618  3,597   4,543
7       12,280   274     178,023   282,952   88,325    75,462  4,035   5,024
8       12,842   288     177,979   282,703   93,923    80,482  3,901   5,024
9       12,708   284     177,919   282,696   94,984    81,283  3,920   4,876
10      12,352   276     177,861   282,632   91,643    78,709  3,971   4,917
11      13,258   297     177,958   282,492   117,122   98,492  4,986   6,329
12      12,157   271     178,890   283,106   82,625    69,353  3,598   4,574
Mean    11,708   262     178,613   283,637   89,072    75,694  3,907   4,907

Table 5.3: Hourly generating performance of Csmith at program construct levels

hour  types  variables  functions  blocks  statements  expressions  backtracks  time (s)
1     12     5754.8     8.4        85.6    195.9       1472.5       5.7         0.80
2     12     5710.6     8.4        84.3    193.5       1446.8       5.7         0.85
3     12     5668.8     8.3        84.7    194.1       1460.9       5.6         0.83
4     12     5685.0     8.4        85.1    195.2       1466.5       5.6         1.04
5     12     5617.7     8.3        83.4    191.3       1436.1       5.7         0.81
6     12     5784.0     8.5        85.8    196.6       1479.1       5.8         0.90
7     12     5711.4     8.4        85.2    195.4       1473.0       5.6         0.80
8     12     5800.2     8.4        86.2    197.6       1486.2       5.8         0.84
9     12     5650.4     8.4        84.9    194.5       1471.8       5.7         0.84
10    12     5652.0     8.3        84.2    192.9       1445.1       5.7         0.82
11    12     5787.5     8.4        86.5    192.3       1493.8       5.8         0.94
12    12     5783.4     8.5        86.3    197.7       1491.6       5.8         0.91
Mean  12     5717.2     8.4        85.2    194.8       1468.6       5.7         0.86

The generation time per test case for Csmith falls mostly between 0.25 and 1.25 seconds, but in extreme cases Csmith needs hundreds of seconds to complete the generation. Figure 5.1 shows the time distribution for a 12-hour generation period. Table 5.4 shows the 10 worst cases in terms of generation time in that period. Compared to the overall averages, the 10 worst cases generate more variables, blocks, statements, and expressions. But the increases in generated program constructs are penalized by a far greater increase in generation time, which is over 150 times the average. Furthermore, the increases in program constructs are accompanied by increases in backtracks, which may partly explain the increase in generation time: when Csmith has to backtrack, previously generated and validated tree nodes have to be removed from the tree, setting back the generation process. In practice, we found that the test cases generated in seconds are almost as valuable, in terms of bug-finding power, as those taking several hundred seconds. Thus, we empirically set a heuristic time limit: if Csmith does not finish generation within the limit, it is killed and moves on to the next test case.

[Figure 5.1: Generation times (in seconds) per test case. Histogram of test case counts over generation-time bins ranging from under 0.01 seconds to over 150 seconds.]

Table 5.4: 10 test cases taking the longest time to generate by Csmith

test case#  variables  blocks  statements  expressions  backtracks  time (seconds)
1           16681      163     359         2270         10          338
2           13738      227     528         4002         12          222
3           17781      193     468         3047         10          131.1
4           10944      125     295         2263         14          117.9
5           11175      226     532         4126         7           103.1
6           15753      211     510         3721         4           89.13
7           9748       212     524         5069         2           83.45
8           16716      169     383         3046         18          80.04
9           8980       151     349         2717         11          70.28
10          19979      126     303         1835         17          69.7
Mean        14149.5    180.3   425.1       3209.6       10.5        130.5

5.2.3 Compiler code coverage

Because of the large number of bugs we find with Csmith, we hypothesized that randomly generated programs exercise large parts of the compilers that are not covered by existing test suites. To test this, we built GCC and LLVM with code-coverage instrumentation enabled. We then used the instrumented compilers to compile their own test suites, and also to compile 10,000 Csmith-generated programs on top of their own test suites. Table 5.5 shows that the incremental coverage due to Csmith is so small as to be a negative result. We believe there are several possible explanations: 1) these code coverage metrics are too shallow to capture the effects of random programs, and Csmith is likely generating useful additional coverage in terms of deeper metrics such as path or value coverage; 2) when we report a bug to GCC or LLVM developers, in most cases the failure-inducing test case is reduced and included in the compiler's test suite; therefore, Csmith is defeated by its own success when competing with the test suites for code coverage. To measure Csmith's code coverage on compilers against the other random generators, we ran a set of random generators for the same length of time and measured the code coverage on GCC on an hourly basis.

Table 5.5: Augmenting the GCC and LLVM test suites with 10,000 randomly generated programs did not improve code coverage much

                                   Line      Function  Branch
                                   Coverage  Coverage  Coverage
GCC    make check-c                75.13%    82.23%    46.26%
       make check-c & random       75.58%    82.41%    47.11%
       % change                    +0.45%    +0.13%    +0.85%
       absolute change             +1,482    +33       +4,471
Clang  make test                   74.54%    72.90%    59.22%
       make test & random          74.69%    72.95%    59.48%
       % change                    +0.15%    +0.05%    +0.26%
       absolute change             +655      +74       +926

The set of random generators includes the previously published C program generators, Csmith, and a naive random generator that simply outputs random ASCII characters. In addition, I created two Csmith variants: one with loop generation disabled (Csmith-no-loop), and one with both loops and branches disabled (Csmith-no-loop-branch). The coverage data are shown in Table 5.6. As a baseline, we measured that an empty file achieves 3.9% line coverage, 3.6% function coverage, and 1.7% branch coverage on GCC's source code. At its full expressiveness, in 12 hours, and unadjusted for the baseline, Csmith generates random test cases covering at least 42.7% of GCC source lines, 50.9% of GCC functions, and 31.9% of GCC branches. This coverage is achieved by compiling random test cases on a single platform (x64) with the optimization levels -O0, -O1, -O2, -O3, and -Os. The coverage would be higher if the random test cases were compiled on more platforms and/or with more compiler flags.

Table 5.6: Csmith covers significantly more compiler code than other random C program generators at the end of a 12-hour generation period

                        Line      Function  Branch
                        Coverage  Coverage  Coverage
Naive                   6.8%      7.4%      3.6%
DDT                     31.3%     39.8%     23.0%
Quest                   29.9%     39.9%     19.8%
randprog                37.5%     46.7%     27.8%
Csmith-no-loop          45.2%     46.5%     31.1%
Csmith-no-loop-branch   44.9%     46.4%     30.8%
Csmith                  42.7%     50.9%     31.9%

The above percentages are computed against totals of 376,638 source lines, 24,576 functions, and 440,049 branches. The coverage-enabled GCC is based on a revision pulled on April 19, 2013. Despite what seems a small improvement in code coverage achieved by Csmith over the other random generators, a 1% increase in branch coverage actually means 4,400 more branches are covered, implying that Csmith explores a large portion of the compiler that the other random generators cannot. We limited the random program generation to 12 hours before measuring coverage for practical reasons: the instrumented compilers are very slow, and a large number of random test cases must be compiled to obtain the coverage data; it could take several days to compile the random programs generated in a single hour. The slowdown in compilation speed is more visible in LLVM/Clang than in GCC. Could the coverage data change dramatically if we extended the random program generation past 12 hours? I measured the accumulated coverage data for every three hours of random program generation and plotted the data. Figure 5.2 shows that the random generators reach near-maximum coverage within the first three hours of generation. This is further evidence that coverage does not directly translate into bug-finding power: as we show in the next section, the random generators keep finding new compiler bugs well beyond the first three hours. Adjusted for the baseline coverage obtained from a blank input, Csmith achieves a good percentage of the code coverage achieved by GCC's test suite, as shown in Table 5.7. This is impressive considering that GCC's test suite was built over decades by a team of compiler writers who know the compiler inside out; by just randomly generating C programs over a 12-hour period, Csmith achieves more than half of the test suite's coverage. It is also worth noting that, relative to the coverage reached by GCC's test suite, Csmith achieves a higher percentage of branch coverage than of line coverage. This implies that the branches Csmith explores have, on average, fewer lines of code than the branches explored by the test suite. It is possible that these branches, not usually taken when compiling nonrandom code such as the test suite, are more likely to harbor bugs. As a result, Csmith was able to find bugs that the test suite could not.

5.2.4 Bug-finding power

Code coverage is an indirect indication of expressive power. However, no matter how expressive a random program generator is, it has to be able to find bugs to prove its worth; therefore, we need to compare Csmith with the other random program generators in terms of bug-finding power.

[Figure 5.2: Random generators tend to reach near-maximum code coverage on compilers within the first three hours.]

Table 5.7: Compiler code coverage achieved by Csmith vs. code coverage achieved by GCC's test suite

              Line      Function  Branch
              Coverage  Coverage  Coverage
make check-c  71.23%    78.63%    44.36%
Csmith        38.8%     47.3%     30.2%
Percentage    54%       60%       68%

Here we formally define bug-finding power as the number of distinct compiler bugs found within a certain period of time. We emphasize "distinct" because random programs tend to explore the same compiler source code over and over again. A compiler bug located on a "hot" path (a path that is executed frequently) is likely to be triggered by random programs hundreds of times in a few hours. While the number of incidents may help compiler developers debug, the duplication adds little value from the perspective of software testing; therefore, we count only distinct bugs when measuring bug-finding power. How do we know whether two bugs are duplicates? Usually this question can only be answered definitively by developers who trace the errors to the same set of source lines. Alternatively, if after fixing bug A, bug B disappears when the compiler is presented with the error-inducing test case, then we are almost certain that A and B are duplicates. Still, a small chance exists that A and B are not duplicates and the fix for A merely hides the erroneous behavior of B. However, we are not compiler developers, nor should we expect compiler writers to investigate whether two bugs found by Csmith are duplicates, because the investigation requires great effort. The other option, waiting for a fix for one bug and then retrying the compiler with the fix on the second bug, requires indefinite turnaround time. As an engineering, though less scientific, measure, we judge whether two bugs are duplicates by comparing their symptoms. This is not 100% reliable: in some rare cases, identical symptoms are actually triggered by different bugs. At least this method gives us a speedy determination. Compile-time bugs are easier to deduplicate than wrong-code bugs. If a compiler crashes at compile time due to an assertion failure, it typically gives a brief explanation of the location of the failed assertion, sometimes accompanied by a stack dump; this varies from compiler to compiler. Both GCC and LLVM generate such diagnostic messages on assertion failures. When the compiler crashes due to unsafe memory operations or resource exhaustion, only a generic error message is dumped, if any. The distinctness of wrong-code bugs triggered by random programs is much more difficult to determine. Because of the nature of random differential testing, when a bug is found, we only know that a compiler "is wrong," but not "how it is wrong." At the time we ran our experiments, it was not known how to determine whether two wrong-code bugs are duplicates. Subsequent work by Chen et al. [8] offers some hope: two wrong-code bugs, triggered by two random test cases, can sometimes be distinguished after the test cases are reduced and ranked using the furthest point first (FPF) technique.

5.2.4.1 Quantitative comparison of GCC and LLVM versions
Figure 5.3 shows the results of an experiment in which we compiled and ran 1,000,000 randomly generated programs using LLVM 1.9-2.8, GCC 3.[0-4].0, and GCC 4.[0-5].0. Every program was compiled at -O0, -O1, -O2, -Os, and -O3. A test case was considered valid if every compiler terminated (successfully or otherwise) within five minutes and if every compiled random program terminated (correctly or otherwise) within five seconds. All compilers targeted x86. Running these tests took about 1.5 weeks on 20 machines in the Utah Emulab testbed [66]. Each machine had one quad-core Intel Xeon E5530 processor running at 2.4 GHz. The top row of graphs in Figure 5.3 shows the observed rate of compile-time errors. (Note: the y-axes of these graphs are logarithmic.) It is understandable that earlier versions of the compilers have a higher rate of crashing; in fact, both LLVM 1.9 and GCC 3.0.0 crashed on almost 10% of the random test cases that Csmith generated. These graphs also indicate the number of compile-time bugs that were fixed by compiler developers in response to our bug reports. Both compilers became at least three orders of magnitude less "crashy" on random programs over the range of versions covered in this experiment. There could be various explanations: the general improvement of compiler code quality, the fixing of bugs reported by us, and the saturation of code coverage by Csmith. The GCC results appear to tell a nice story: the 3.x release series increases in quality, the 4.0.0 release regresses because it represents a major change to GCC's internals, and then quality starts to improve again. The middle row of graphs in Figure 5.3 shows the number of distinct crashes in LLVM and in GCC induced by our tests. These graphs conservatively estimate the number of distinct crashes by considering crashes accompanied by the same generic messages to be one distinct error. We did not include these faults in our graphed results due to the difficulty of mapping crashes back to distinct causes.
Figure 5.3: Distinct compile-time errors found and rates of compile-time and wrong-code errors, from several LLVM and GCC versions

The rate of crashes, notwithstanding distinctness, is easy to game: we could make it arbitrarily high by biasing a random generator to produce code triggering known crashes. The number of distinct crashes, on the other hand, suffers from the drawback that it depends on the quantity and style of assertions in the compiler under test; a compiler with only a few assertions would do very well on this metric. Although GCC has slightly more total assertions than LLVM, LLVM has a much higher assertion density because its code base is smaller: in their latest releases, there is about one assertion per 130 lines of code in LLVM/Clang, compared to one per 360 for GCC. The bottom pair of graphs in Figure 5.3 shows the rate of wrong-code errors in our experiment. Unfortunately, we can only report the rate of errors, and not the number of bugs causing them, because we do not know how to automatically map failing test cases back to the bugs that caused the failures. The work of Chen et al. [8], which was published after we concluded this experiment, could be used to rank the test cases; however, a precise mapping is not possible as of April 2014. These graphs also indicate the number of wrong-code bugs that were fixed in response to our bug reports.

5.2.5 Bug-finding performance as a function of test-case size
Csmith can be configured in many ways that influence its outputs, the randomly generated programs. The configurable parameters include enabling or disabling certain language constructs and the maximum allowed number of children for certain AST nodes. We performed an experiment to answer the following question: given the goal of finding as many defects as quickly as possible, should one configure Csmith to generate small programs or large ones? Other factors being equal, small test cases are preferable because they simplify a compiler developer's debugging effort. Although we can tune Csmith to prefer larger or smaller output, it does not have a maximum-size parameter that is enforced during random program generation. We worked around this limitation by pregenerating a set of random programs, establishing the relationship between program sizes and random seeds, and then using only seeds known to generate random programs within a given size range. Using the same compilers and optimization options as in Section 5.2.4.1, we ran our testing process multiple times. For each run, we selected a size range for test inputs, used the precomputed seeds and Csmith to generate programs in that range, executed the test process for 24 hours, and counted the distinct crash errors found. We repeated this for various ranges of test-input sizes. Figure 5.4 shows that the rate of crash-error detection varies significantly as a function of the size of the test programs produced by Csmith. The greatest number of distinct crash errors is found by programs containing 8K-16K tokens; these programs averaged 81 KB before preprocessing. The confidence intervals are at 95% and were computed based on five repetitions. We hypothesize that larger test cases expose more compiler errors for two reasons. First, throughput is increased because compiler start-up costs are better amortized.
Figure 5.4: Number of distinct crash errors found in 24 hours of testing with Csmith-generated programs in a given size range

Second, with the addition of more tokens, and possibly more program features, the combinatorial complexity causes more compiler source code to be exercised, and at the same time more compiler optimizations become possible. For example, a single line "a = 3" exercises only a limited set of compiler source code: parsing, register allocation, memory access, etc. With the addition of a following "b = a + 7 + 1", constant folding and constant propagation become possible, and register allocation becomes more challenging (see the sketch below). The decrease in bug-finding power at the largest sizes appears to come from algorithms, in Csmith and in the compilers, that have super-linear running time: when generating or compiling programs beyond a certain size, the extra time cost is no longer compensated by the amortized start-up costs, and throughput drops.
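To make this concrete, here is a small fragment (ours, for illustration; not a Csmith output):

int a, b;

void small(void)
{
    a = 3;             /* exercises parsing and a single store        */
}

void larger(void)
{
    a = 3;
    b = a + 7 + 1;     /* constant propagation can now prove b = 11,
                          constant folding collapses 7 + 1, and
                          register allocation has more work to do     */
}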

5.2.6 Bug-finding performance compared to other tools
To evaluate Csmith's ability to find bugs, we compared it to the four other random program generators described in Section 5.2.1: randprog (Eide08), randprog0 (Turner05), Quest (Lindig07), and DDT (McKeeman98).

Due to the limitations mentioned in Section 5.2.4, we measure bug-finding power by counting unique crash errors, with uniqueness determined by the accompanying crash messages. We ran each generator in its default configuration on one of five identical and otherwise-idle machines, using one CPU on each host. Each generator repeatedly produced programs that we compiled and tested using the same compilers and optimization options as in Section 5.2.4.1. Figure 5.5 plots the cumulative number of distinct crash errors found by these program generators during the one-week test. Csmith significantly outperforms the other tools. It is worth noting that the bug-finding power of some random generators appears to flatten out early: DDT reached saturation during day 2, Turner05 during day 3, and Quest during day 5. Only randprog and Csmith were able to continue finding distinct crash errors throughout the experiment. Not surprisingly, the earlier a random program generator reaches its saturation point, the fewer distinct compiler errors it has found at the end of the seven-day period.

Figure 5.5: Comparison of the ability of five random program generators to find distinct crash errors (cumulative totals over the week: Csmith 86, Eide08 33, Lindig07 20, Turner05 14, McKeeman98 9)

We then ran another comparison taking the top three random generators from the above experiment: Csmith, randprog, and Quest. We also added two feature-limited variants of Csmith: cs-noloop and cs-noloopbranch. We ran each generator in its default configuration on one of five identical and otherwise-idle machines. Each machine was a Dell PowerEdge R820 2U server with 32 cores; we ran the random generators and compilers on all 32 cores in parallel to maximize throughput. The experiment lasted five days. To minimize the negative effects of small programs (described in Section 5.2.5) and long-running random program generations (described in Section 5.2.2), we capped the random generation time at 20 seconds and excluded random programs smaller than 5 KB. The remaining test cases were used to test various versions of GCC from 3.3.0 to 4.5.0. Table 5.8 shows that Csmith, while generating far fewer test cases than the others, finds more distinct crash errors. The ranking of bug-finding power is in line with the corresponding code coverage shown in Figure 5.2.

Table 5.8: The bug-finding power of Csmith outperforms other random program generators and its feature-limited variants

  Generator                  Total generated   Distinct
                             test cases        crash errors
  Quest                      439375             26
  Randprog                   510878             76
  Csmith-no-loop-no-branch   852644             73
  Csmith-no-loop             657946             89
  Csmith                     283696            114

CHAPTER 6

COMPILER BUG STUDIES

The Coyote is limited, as Bugs is limited, by his anatomy. Chuck Jones

Most of our testing efforts using Csmith centered on two open source compilers, GCC and LLVM. We found hundreds of bugs in each of them and reported them to the respective teams. Obviously, there is no space to discuss individual bugs in full detail in this chapter; and even if there were, we would lose the big picture while zooming in on individual bugs. The objective of this chapter is to answer the following questions:

• What kinds of bugs did Csmith find?

• Are the bugs important?

• Why could Csmith find them while others could not?

• Where are the places in compilers that are potential hotbeds for such bugs?

• Which lessons can be learned from those bugs?

The answers to the above questions can be valuable in several ways: 1) they can be used to justify random differential testing as an effective methodology for testing compilers; 2) they provide guidance for future designs of random program generators; and 3) they give suggestions to compiler writers on how and where to avoid compiler bugs. Most of the questions are answered by studying the publicly available bug portals. The bug portal for GCC is located at http://gcc.gnu.org/bugzilla/. The bug portal for LLVM/Clang is located at http://llvm.org/bugs/. We specifically studied hundreds of reports for bugs found by Csmith during our four years and three months of experiments. Some of the analysis results were published in a PLDI paper [75] and are reused in this chapter. It should be noted that even though Csmith is effective in finding compiler bugs, it still misses the vast majority of them, considering that the bugs we reported constitute only a small percentage of the overall bugs filed against GCC and LLVM. The conclusions drawn from reading the bug reports related to Csmith do not necessarily apply to all compiler bugs. In addition, the specific implementations of different compilers vary. In general, GCC and LLVM bear some degree of similarity because they both compile programs in a common set of languages, namely C, C++, and Objective-C, and they adopt a common set of optimization techniques that have been standardized over the years. The conclusions drawn from analyzing bugs in GCC and LLVM may not be applicable to other compilers, which could have dramatically different implementations.

6.1 Compile-time bugs vs. wrong-code bugs
There are many angles from which we can look at the bugs found by Csmith. The most obvious one is based on the manifestation of the bug: whether the error exhibits itself at compile time or at execution time. As defined in Section 5.1.1, such bugs are called compile-time bugs and wrong-code bugs, respectively. As shown in Table 6.1, LLVM/Clang has a higher percentage of wrong-code bugs simply because the LLVM team is generally quicker in fixing the bugs that we report. As explained in Section 5.2.4, we cannot determine whether two wrong-code bugs are duplicates unless they are proven fixed by a single patch. Csmith was able to find many wrong-code errors in a short period of time; the strategy we used to avoid submitting duplicate wrong-code bugs was to report one wrong-code error to the compiler team, wait for the fix, and then try the fix on another wrong-code error discovered earlier. We only reported the other error as a new bug if the fix failed to make the error disappear. Compile-time bugs, on the other hand, do not have to be subjected to this "stop-and-go" process, because we can generally rely on the assertion failure messages to determine whether two of them are duplicates. We have encountered several causes of compile-time errors, as shown in Table 6.2. The most frequent cause is assertion failure. On one hand this is good, as the crash is accompanied by a diagnostic message, which simplifies the developer's debugging effort.

Table 6.1: Total compile-time and wrong-code bugs found by Csmith

                 GCC   LLVM/Clang
  Compile-time   116   191
  Wrong-code      48    90
  Total          164   281

Table 6.2: Compile-time bugs by category

                       GCC   LLVM/Clang
  Assertion failure     93   148
  Segmentation fault    18    27
  Tool chain failure     1    11
  Out of resource        4     5
  Total                116   191

On the other hand, a failed assertion shows that the developer(s) made some wrong assumption: most likely, either the developer failed to consider some edge case during the original coding, or a later code modification broke an originally valid assumption. A segmentation fault is most likely caused by unsafe memory references such as null pointer dereferences. It is worse than an assertion failure because it lacks a specific diagnostic message pointing to a particular function or line in the source code; there may be only generic messages such as "received signal SIGSEGV," "Segmentation fault," or a stack dump. If the outputs of a compiler are assembly programs, they are fed into other tools in the compilation tool chain, e.g., assemblers and linkers. When a compiler produces malformed output that a subsequent tool in the chain cannot consume, we refer to the error as a "tool chain failure." A compiler can also fail because it uses up all of its allocated resources, either memory or CPU time. The most likely reason is recursive function calls that eventually exhaust the stack space; the second most likely is an infinite loop that causes the compiler to hang. In both cases, the lack of termination is caused either by 1) missing termination conditions or by 2) invalid IRs that the functions or loops were operating on. Compared to compile-time errors, wrong-code bugs are much harder to classify based on their symptoms. We divide them into two broad categories, volatile-related bugs and non-volatile-related bugs; Table 6.3 shows the number of bugs in each category.

Table 6.3: Wrong-code bugs by category

                        GCC   LLVM/Clang
  volatile-related        6    11
  non-volatile-related   42    79
  Total                  48    90

Volatile-related bugs are cases in which a compiler fails to respect the mandates of volatile, that is, it flips the read/write order of volatile objects or optimizes away accesses to volatile objects. Many times, the bug was introduced when a compiler applied a transformation to an operation without checking the operands' volatile qualifiers. We classify a wrong-code bug as non-volatile-related if it is detected by differing checksums from different executions (see Section 3.2.2), or as volatile-related if it is detected by differing summaries of accesses to volatile objects from different executions (the checksums are usually the same; an illustration follows at the end of this section). We use a tool based on Valgrind [50] to instrument every memory access in compiled random programs at run time. The Valgrind platform manages the instrumentation and execution of the test program under examination. The instrumentation records the details of each memory access: address, access size (in bytes), and type (read or write). At the end of the execution, the details are compiled into a volatile access summary. We compare the access summaries from executables produced by different compilers, and use differential testing to find volatile-related bugs.

The line between compile-time errors and wrong-code errors is sometimes blurred. When an error is caught by the internal checks of a compiler, it is a compile-time error; if the same error is overlooked and persists into the compiler output, it is a wrong-code error. Nevertheless, we placed a higher priority on reporting wrong-code bugs than compile-time errors because, in our view, a wrong-code bug, being more likely to be silent, is more dangerous.
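To illustrate why checksums alone cannot catch volatile-related bugs, consider this fragment (ours, not a Csmith output). If a compiler wrongly folds the two reads of v into one, the computed result, and hence the checksum, is unchanged; only the access summary, which records one read of v instead of two, reveals the miscompilation:

volatile int v;

int twice(void)
{
    return v + v;   /* the abstract machine performs two reads of v */
}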

6.2 Are bugs found by Csmith important?
A possible argument against random testing has always been that the test inputs are not real and would never occur in a real user scenario. We had concerns that compiler developers would dismiss our bug reports based on that argument; fortunately, our bug reports were warmly received by GCC and LLVM developers. Still, the questions remain. Are the bugs important? Do they impair the usage of compilers for real users? Are the impairments severe enough to harm the productivity of compiler users? We argue that many bugs found by Csmith are important and, if left unfixed, could cause serious harm, not just inconvenience, to real compiler users (people who use compilers in the process of developing software). We base our argument on two facts: admissions by compiler developers, and duplicate bug reports from us and other compiler users.

6.2.1 Admissions of compiler developers
Not only did compiler developers acknowledge the bugs we reported, they also spent a great amount of effort fixing them. Not every bug submitted to compiler teams receives such treatment. The bug-reporting portals of LLVM and GCC show that a bug report can be rejected for various reasons: invalid, won't fix, or works for me. "Invalid" means the test case is invalid, e.g., it has undefined behaviors, or the compiler was used in an unsupported way. "Won't fix" is for bugs that the compiler developers consider so inconsequential that they do not deserve the effort of fixing. A nonreproducible bug is dismissed as "works for me." As of June 2013, approximately 12% of the bugs reported to the LLVM team and 14% of the bugs reported to the GCC team were rejected for these reasons. The bugs found by Csmith received better treatment: out of hundreds of bugs found by Csmith, only 19 of those reported to the LLVM team were rejected, around 6% of the total LLVM bugs found by Csmith, and only eight of those reported to the GCC team were rejected, around 5% of the total GCC bugs found by Csmith. Most of the rejections were justified by "works for me." There are two possible reasons for this: 1) for both GCC and LLVM, we always tried to work on the latest revision from the trunk of the source code repository, and there was a time window between when we built and tested a compiler version and when a compiler developer validated a bug we submitted against that build; due to the rapid work of compiler writers, the bug may have been fixed or made latent during that window. 2) Human error is inevitable when a single person looks at logs produced by running thousands of test cases against a dozen compiler configurations on multiple platforms. The other justification for rejecting bugs we reported is "invalid." During the four years and three months of our testing period, only four of our early bug reports were rejected on this ground, and those failures are attributed to bugs in Csmith: a bug in Csmith can produce undefined behavior and thus make the bug-inducing test case invalid. We have not had a single bug report rejected on this ground since 2010, implying that Csmith is successful at avoiding invalid test cases. Of the accepted bugs, GCC and LLVM developers have done an excellent job fixing them: only seven (four from GCC and three from LLVM) out of 445 bugs remained unresolved as of June 2013. There are various and legitimate reasons behind the inaction; in most cases, the bugs are judged as bordering on "won't fix," but the developers have not taken the action of labeling them that way. We can also consider the opinions of compiler developers with regard to the importance of the bugs we reported. After all, they have the best knowledge of the compilers, and they gain insight into the bugs while working on them. Even though there is an "importance" field in their bug-reporting template, LLVM developers tend to use it only to differentiate between bugs and enhancement requests. GCC developers are more active in using the field to assign priority and severity to bugs. Here are the descriptions from the GCC team [21]:

Severity: This field describes the impact of a bug. The choices are:

• Critical: crashes, memory leaks, and similar problems on code that is written in a common enough style to affect a significant fraction of users

• Normal: major loss of functionality

• Minor: minor loss of functionality, misspelled word, or other problem where an easy workaround exists

• Enhancement: request for enhancement

Priority: For regressions this field describes the importance and order in which a bug should be fixed. Priorities are set by the release management team only. If you think a priority is wrong, set it to P3 and add a note. The available priorities are:

• P1: most important. This generally labels a regression which the release manager feels should be addressed for the next release, including wrong-code regressions. A P1 regression blocks the release.

• P2: this generally indicates a regression that users will notice on a major platform, though one not severe enough to block a release. This includes bugs that were already present in a previous release.

• P3: the default priority for new PRs that have not been prioritized yet. Priorities below P3 are not on the radar of release management.

• P4: important regression on a platform that is not in the list of primary or secondary targets or a regression that users will not see for release builds.

• P5: less important regression on a platform that is not in the list of primary or secondary targets.

As shown in Table 6.4, out of the 164 bugs reported by us and accepted by the GCC team, 31 were assigned the highest priority, P1; without a fix, these bugs block the next release. In addition, three of the bugs received a severity of "Critical" or above, showcasing their impact on GCC users.

6.2.2 Confirmation from other compiler users
To prove that the bugs found by Csmith have an impact on real compiler users, we compiled a list of bugs that were independently reported by at least two parties, one being us and the other a real compiler user. The point is that if a bug we found through random testing was also reported by a real user, the bug must have affected real code. Instead of going through all bugs in the database, we only counted bugs that were marked as duplicates of bugs found by Csmith. This is the most conservative estimate of the real-world impact of the bugs found by Csmith. We could have missed a duplicate between us and a real-world user because the user did not file a bug report, or because a compiler developer failed to recognize the duplication. Additionally, we tend to test the most up-to-date revisions of compilers, while most people use one of the official releases; if a bug was introduced after the last release, it is likely to be seen only by us and thus not reproducible by real users. Finally, the success of Csmith makes the duplicate count artificially small: because Csmith catches a bug soon after its introduction into the code base, and the compiler teams are eager to fix bugs reported by us, other users would not notice the bug unless they were using a revision obtained within the window between reporting and fixing. Despite all these adverse factors, we found quite a few duplicates between us and real users. Sometimes a bug hurt more than one real-world team: we counted every instance of an independent bug report from a party other than us that was not doing random testing. The total number of such instances is 13 for GCC and 15 for LLVM. The bug that caused pain to the most parties is LLVM bug 965.

Table 6.4: GCC bugs by priority

  Priority   Bug count
  P1          31
  P2          24
  P3         108
  P4           0
  P5           1
  Total      164

LLVM bug 965 has been independently reported by eight parties (including us) so far, and the list is still growing; the last complaint about this bug was in January 2013.1 The bug was first reported in October 2006 and has remained unresolved since then. Csmith found it three years later with the test case in Listing 6.1:

Listing 6.1: Test case for LLVM bug 5644 (duplicate of bug 965)
static void func_1(void)
{
    unsigned char l_2 = 1;
    for (l_2 = 0; (l_2 > -1); l_2--) {
    }
}

int main(void)
{
    func_1();
    return 0;
}

We filed a bug report,2 and received an explanation from the LLVM team:

The problem here is that func_1 is an infinite loop and we infer that it can't return, throw, and has no side effects. Because of this, we add an unreachable after the call, and we later delete the dead call (since it has no side effects).
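In effect (our paraphrase, illustrated with the __builtin_unreachable() intrinsic of GCC and Clang), the transformed program behaves as if main had been rewritten to:

int main(void)
{
    /* call to func_1() deleted: it "cannot return" and has no
       side effects */
    __builtin_unreachable();   /* inserted; executing it crashes
                                  instead of hanging */
}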

Because of the inserted "unreachable," the compiled program crashes instead of exhibiting the expected behavior, which is hanging. It makes little sense for a programmer to purposely write an infinite empty loop in an application, so the real impact is minimal; still, the number of people complaining about this bug is alarming. The bug that received the second most complaints on our list is GCC bug 48124,3 which has been independently reported by four parties (including us). We first reported the bug in March 2011 with the test case in Listing 6.2:

Listing 6.2: Test case for GCC bug 48124
struct S0 {
    signed f0 : 26;
    signed f1 : 16;
    signed f2 : 10;
    volatile signed f3 : 14;
};

static int g_1 = 1;
static struct S0 g_2 = { 0, 0, 0, 1 };

int printf(const char *format, ...);

void func_1(void)
{
    g_2.f3 = 0;
    g_1 = 2;
}

int main(int argc, char *argv[])
{
    func_1();
    printf("g_1 = %d\n", g_1);
    return 0;
}

1http://llvm.org/bugs/show_bug.cgi?id=965

2http://llvm.org/bugs/show_bug.cgi?id=5644

3http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48124

The erroneous behavior is that when this test case is compiled with "gcc -Os", the executable prints "g_1 = 1", which is obviously wrong. Trying to save memory, GCC lays out the global area as 12 bytes for "g_2" followed by 4 bytes for "g_1"; within "g_2", the four fields f0-f3 get 4, 2, 2, and 4 bytes, respectively. GCC then performs a series of transformations leading to this internal pseudo code:

1. Load 8 bytes starting from address "g_2.f3" into register R.

2. Assign 0 to the first 4 bytes of R (the f3 field).

3. Assign 2 to "g_1".

4. Write the content of R into the 8 bytes starting at address "g_2.f3". Note that this operation writes beyond the boundary of "g_2", overwriting "g_1" with the stale value captured in step 1!

GCC made these transformations based on a few assumptions: 1) this is an x64 platform, so loading/storing 8 bytes is faster than 4 bytes; 2) "g_1" and "g_2.f3" are not aliases, and only one of them is volatile storage, thus writes to "g_1" and "g_2.f3" can be reordered. The assumptions are valid, but the root cause of the error is that the write back to "g_2.f3" should have been limited to the first 4 bytes.

It seems the problem is hidden when all fields are nonvolatile, which is partly why nobody had reported the problem before. We reported the bug in March 2011, and Arthur O'Dwyer, who was also using Csmith to test GCC, reported the same bug in May 2011. There were attempts to fix the bug, but none was committed. In February 2012, it was reported that the same bug caused miscompilation of the Btrfs file system in the Linux kernel: writing to a bit-field actually overwrote the content of a neighboring volatile field, which served as a lock for synchronization. The bug report from the Linux team led to a heated discussion between the kernel developers and the compiler developers, eventually involving Linus Torvalds.4 The bug was fixed the following month. However, a bug report from another party forced a follow-up fix in June 2012. The fix inserts a phase in which structure fields are laid out according to the C++11 memory model, with the layout information stored in the tree; previously, GCC went through a complex formula to find the starting and ending locations of a bit-field on demand.

6.3 Where are the bugs?
Besides the above criteria, which classify bugs based on their exhibited behaviors, we can use a more white-box classification, i.e., labeling bugs based on where in the compiler source code they occur. This is yet another angle from which to look at the bugs and gain a deeper understanding of them. Generally, compilers are divided into a front end, a middle end, and a back end. We rely on two sources of information to put a bug into one of these buckets: 1) the "component" field of the bug report, usually filled in by the developer who owns the bug; or 2) the files and descriptions of a committed fix for the bug. The bug system of GCC divides the source code into 40 or so components. Csmith finds bugs in the following major components:

• C: A problem with the C compiler front end

• Middle-end: GCC's middle end; fold, expand_*, etc.

• RTL-optimization: A problem occurring in the RTL optimizers

• Target: Target specific issues

• Tree-optimization: A problem occurring in the tree-ssa optimizers

4http://gcc.gnu.org/ml/gcc/2012-02/msg00005.html

We consider any component operating at the C level to be in the front end; target-independent operations are performed in the middle-end components, which include rtl-optimization and middle-end; and the back end is responsible for target-dependent operations, which include the rtl-optimization and target components. The LLVM team seldom uses the component field in their bug reports. However, LLVM has a cleaner code organization in which the code for the front end, middle end, and back end is nicely separated into different folders. The bug system shows the list of files committed as a bug fix, and in most cases the file names are sufficient to determine in which stage the bug was located. A bug is unclassified either because it has not yet been fixed or because the developer who fixed it did not indicate which files were changed. As shown in Table 6.5, the distribution of bugs across compiler stages clearly demonstrates Csmith's capability of penetrating the front end and finding bugs in the middle end and the back end. There are several possible explanations for the distribution:

• Since Csmith never generates syntactically incorrect programs, the syntax error- handling logics in the front ends are never exercised.

• Since Csmith generates only C programs, the parsing and translating logics for other languages in the front ends are never exercised.

• Since we run Csmith on x86 and x64 platforms primarily, the target code generation and target-specific optimizations in the back end are not fully exercised.

• The front end is partly auto-generated from grammar descriptions, while the back end is partly auto-generated from target descriptions. The automation reduced human errors.

• An error in the front end is more visible than an error in the later stages; therefore, it is more likely to be reported and fixed already.

Table 6.5: Distribution of bugs across compiler stages

                 GCC   LLVM
  Front end        9    12
  Middle end     109   122
  Back end        46    98
  Unclassified     0    49
  Total          164   281

• The middle end performs most of the optimizations involving complex analyses and transformations, which are error-prone.

Out of millions of lines of source code, where exactly are the compiler bugs? We studied the files that were checked in to fix the bugs we reported. Tables 6.6 and 6.7 show the 10 buggiest files in LLVM and GCC as measured by our experiment. They confirm the distribution in Table 6.5: in both GCC and LLVM, the bug-rich files are in the middle end and the back end, and most of them are responsible for optimizations. The middle end consists of language-independent and target-independent transformations, which means that even though we only generate C programs with Csmith, and only test compilers on a few platforms, the bugs we found affect programs written in other languages and/or compiled for other platforms, as long as GCC or LLVM is used as the compiler. The bugs are distributed across a wide range of files. The GCC files that did not make the top 10 list but still contained many bugs are: cfgexpand (3), cfgloopmanip (3), expr (3), tree-ssa-sccvn (3), tree-scalar-evolution (2). Likewise, the LLVM files just below the top 10 are: LoopUnswitch (5), X86ISelDAGToDAG (5), SimplifyCFG (5), BasicAliasAnalysis (4), SelectionDAG (4). Numbers in parentheses are the distinct bug counts in the preceding files. When we created Csmith, we tried hard to make it generate auto-vectorizable loops, as shown in Listing 6.3. The effort paid off well: we found 12 bugs in GCC's auto-vectorization. The auto-vectorization code in GCC is dispersed across several files, so none of them made the top 10 list. An example of a GCC vectorization bug found by Csmith is at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49926.

Listing 6.3: Example of a vectorizable loop
int a[256], b[256], c[256];

void foo(void)
{
    int i;

    for (i = 0; i < 256; i++) {
        a[i] = b[i] + c[i];
    }
}

An interesting observation is that there are overlaps in functionality between the two file lists: both GCC and LLVM were often wrong in folding arithmetic operations involving constants, in instruction combining, and in their target (x86) descriptions. We think these areas are full of case-by-case transformations that can trick compiler writers.

Table 6.6: Top 10 buggy files in GCC

                                                     Wrong-   Compile-
                                                     Code     Time     Total
  C File Name        Purpose                         Bugs     Bugs     Bugs
  fold-const         constant folding                 2       10       12
  combine            instruction combining            3        7       10
  tree-vect-loop     loop vectorization               0        7        7
  tree-ssa-pre       partial redundancy elim.         0        6        6
  tree-vrp           variable range propagation       1        7        8
  tree-ssa-ccp       cond. constant propagation       1        3        4
  ipa-split          function split for inlining      1        3        4
  i386.c             i386 OS-independent desc.        0        4        4
  tree-ssa-dce       dead code elimination            0        3        3
  tree-ssa-reassoc   arithmetic expr. reassociation   0        3        3
  Other (67 files)   n/a                             40       63      103
  Total (77 files)   n/a                             48      116      164

Table 6.7: Top 10 buggy files in LLVM

                                                           Wrong-   Compile-
                                                           Code     Time     Total
  C++ File Name              Purpose                       Bugs     Bugs     Bugs
  InstructionCombining       midlevel instruction combining 15       6       21
  DAGCombiner                instruction combining           6       6       12
  SimpleRegisterCoalescing   register coalescing             1      10       11
  ScalarEvolution            induction variable analysis     2       8       10
  ValueTracking              computation chain analysis      2       6        8
  LoopStrengthReduce         loop strength reduction         3       4        7
  X86InstrInfo               x86 target description          3       4        7
  LiveIntervalAnalysis       liveness analysis               0       6        6
  JumpThreading              jump threading                  0       6        6
  TwoAddressInstructionPass  two addr. instr. conversion     0       6        6
  Other (83 files)           n/a                            58     129      187
  Total (93 files)           n/a                            90     191      281

6.4 How did compilers get it wrong?
With the hundreds of bugs we found, it is obvious that compilers are buggy. The natural next questions are: 1) Why are they buggy, i.e., what are the root causes of the bugs? 2) How can compiler writers avoid the same mistakes in the future? We studied a sample of the bugs we reported: we read the bug reports and examined the source code modifications in the fixes. We tried our best to understand the root causes and, based on that, derived a few common patterns among these errors. The following pattern list is not intended to be exhaustive or 100% accurate; it merely reflects our best effort to understand the bugs we found with Csmith. Most compiler optimizations can be simplified to the structure in Listing 6.4:

Listing 6.4: Compiler optimization flow
analysis
if (safety check) {
    transformation
}

An optimization can fail to be semantics-preserving if the analysis is wrong, if the safety check is insufficiently conservative, or if the transformation is unsound. In most cases, the transformation itself has been theoretically proven correct or validated by test cases; the pretransformation analysis and safety checks are more error-prone, as they are more likely to fall short of comprehensiveness given the infinite program space. An analysis can be wrong for at least three reasons: 1) it made wrong assumptions; 2) it failed to consider some uncommon cases; or 3) its results became stale due to transformations that happened after the analysis. A few examples of the last reason are as follows:

• LLVM bug 2372:5 In the loop unswitching pass, a version of LLVM forgot to update the immediate dominator after inserting nodes.

• LLVM bug 2455:6 Transformation (loop unswitching) invalidated previous analysis results (Dominance Frontier).

• LLVM bug 12901:7 A version of LLVM forgot to propagate the fact that a value is deleted to all the users of the value.

In both GCC and LLVM, the analysis results are stored in the IR trees, and there are periodic checks of the integrity of the trees. The internal checks are likely to throw an exception or generate an assertion failure if 1) an analysis result becomes stale and damages the integrity of the tree, or 2) an unsafe transformation produces a nonconforming tree. In either case, the bug manifests as a compile-time error.

5http://llvm.org/bugs/show_bug.cgi?id=2372

6http://llvm.org/bugs/show_bug.cgi?id=2455

7http://llvm.org/bugs/show_bug.cgi?id=12901

We have also found cases where the analysis itself was wrong. The analysis can be as simple as overflow checking or as complex as pointer alias analysis. The most common root cause among the bugs we found is an incorrect safety check, i.e., an incorrect gate: the gate causes certain code to be wrongly transformed when it should either 1) not be transformed at all or 2) be transformed differently.

A complex safety check is often expressed as a conjunction C1 ∧ C2 ∧ ... ∧ Cn, but sometimes compiler developers fail to include an additional condition Cn+1 needed for some uncommon case. Listing 6.5 is an extreme example showing the difficulty of considering every condition when writing a safety check. Reload is a pass in GCC that performs post-register-allocation fix-ups such as spilling and coalescing. As the proclaimed "GCC equivalent of Satan," it has a safety check composed of 10 and-joined clauses; counting the subconditions inside the clauses, there are a total of 14 subconditions in the safety check. Given this complexity, it is not surprising that some subconditions are overlooked by developers.

Listing 6.5: Example showing a complex safety check in GCC
if (REG_NOTE_KIND (note) == REG_DEAD
    && REG_P (XEXP (note, 0))
    && !reg_overlap_mentioned_for_reload_p (XEXP (note, 0),
                                            rld[output_reload].out)
    && (regno = REGNO (XEXP (note, 0))) < FIRST_PSEUDO_REGISTER
    && HARD_REGNO_MODE_OK (regno, rld[output_reload].outmode)
    && TEST_HARD_REG_BIT (reg_class_contents[(int) rld[output_reload].rclass],
                          regno)
    && (hard_regno_nregs[regno][rld[output_reload].outmode]
        <= hard_regno_nregs[regno][GET_MODE (XEXP (note, 0))])
    /* Ensure that a secondary or tertiary reload for this output
       won't want this register.  */
    && ((secondary_out = rld[output_reload].secondary_out_reload) == -1
        || (!(TEST_HARD_REG_BIT (reg_class_contents[(int) rld[secondary_out].rclass],
                                 regno))
            && ((secondary_out = rld[secondary_out].secondary_out_reload) == -1
                || !(TEST_HARD_REG_BIT (reg_class_contents[(int) rld[secondary_out].rclass],
                                        regno)))))
    && !fixed_regs[regno]
    /* Check that a former pseudo is valid; see find_dummy_reload.  */
    && (ORIGINAL_REGNO (XEXP (note, 0)) < FIRST_PSEUDO_REGISTER
        || (!bitmap_bit_p (DF_LR_OUT (ENTRY_BLOCK_PTR),
                           ORIGINAL_REGNO (XEXP (note, 0)))
            && hard_regno_nregs[regno][GET_MODE (XEXP (note, 0))] == 1)))
  {
    ...

6.5 Case studies of bugs
6.5.1 GCC Bug #1: wrong analysis
Does (0/(unsigned long long)-1) != 1 evaluate to FALSE? Apparently, at one point, GCC thought so.8 The error was introduced while GCC tried to fold an expression (x/c1)!=c2, where c1 and c2 are constants and x is a variable. Generally, such an expression can be rewritten as x < c1*c2 || x > (c1*c2)+(c1-1). This transformation is profitable because c1*c2 and (c1*c2)+(c1-1) can be computed at compile time; turning an expensive division into two comparisons against constants saves many CPU cycles. Such an optimization can be applied to the test case in Listing 6.6.

Listing 6.6: Error-inducing test case for GCC bug 42721
static unsigned long long
foo (unsigned long long x, unsigned long long y)
{
    return x / y;
}

static int a, b;

int main (void)
{
    unsigned long long c = 1;
    b ^= c && (foo (a, -1ULL) != 1L);
    if (b != 1)
        __builtin_abort ();
    return 0;
}

The above transformation is valid only when there is no integer overflow in computing c1*c2 and (c1*c2)+(c1-1). If overflow does occur, a further simplification can be made irrespective of the value of x; for example, (x/1000000000)!=10 always evaluates to 1 when x is a 32-bit integer. In the example of (0/(unsigned long long)-1) != 1, x is 0, c1 is -1 (unsigned), and c2 is 1. (c1*c2)+(c1-1) does overflow; however, a bug in GCC determined that it does not, and the wrong determination caused the whole expression to be folded into FALSE. GCC determines whether an integer overflow occurs while adding two 64-bit integers in the function add_double_with_sign (double-int.c). It first performs a 32-bit addition on the lower halves of the 64-bit integers; if the result is smaller than one of the 32-bit operands, a carry flag is set. The function then performs a second 32-bit addition on the upper halves of the 64-bit integers, plus 1 if the carry flag is set.

8http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42721

GCC developers assumed that an addition does not overflow if the sum is greater than or equal to the second operand. The assumption is wrong when the first operand is unsigned -1 and the carry flag is set. In this bug, the gate on the transformation is correct, but a wrong analysis result (overflow analysis) led to a transformation that should not have happened. To trigger this bug, we have to create an expression (x/c1)!=c2 where c1 is an unsigned 64-bit integer whose upper half is 0xFFFFFFFF, and c2-1 is large enough that adding its lower half to the lower half of c1 overflows. By mixing integer types of various widths, signed and unsigned, generating random constant values near 0 or near the maximum representable values, and producing chains of random arithmetic operations, it is not surprising that Csmith could find this bug.
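To see how the check goes wrong, consider this sketch (ours, a paraphrase rather than GCC's actual add_double_with_sign code) of the half-word addition scheme just described:

#include <stdint.h>
#include <stdio.h>

/* Add two 64-bit values as two 32-bit halves; the return value
   applies the faulty rule: "no overflow if the high-half sum is
   greater than or equal to the second operand's high half." */
static int overflowed(uint32_t a_lo, uint32_t a_hi,
                      uint32_t b_lo, uint32_t b_hi)
{
    uint32_t sum_lo = a_lo + b_lo;
    uint32_t carry  = (sum_lo < a_lo);   /* low halves wrapped around? */
    uint32_t sum_hi = a_hi + b_hi + carry;
    return !(sum_hi >= b_hi);            /* the buggy overflow test */
}

int main(void)
{
    /* 0xFFFFFFFFFFFFFFFF + 0xFFFFFFFFFFFFFFFE overflows 64 bits, but
       with a_hi == 0xFFFFFFFF and carry == 1, sum_hi wraps to exactly
       b_hi, so the check reports no overflow. */
    printf("%d\n", overflowed(0xFFFFFFFFu, 0xFFFFFFFFu,
                              0xFFFFFFFEu, 0xFFFFFFFFu));   /* prints 0 */
    return 0;
}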

6.5.2 GCC Bug #2: wrong transformation
In C, if an argument of type unsigned char is passed to a function with a parameter of type int, the value seen inside the function should be zero-extended and is thus positive. For the test case in Listing 6.7, a version of GCC inlined this kind of function call and then, while trying to perform constant propagation, sign-extended the argument rather than zero-extending it: p_13 was set to -1 instead of the correct value 255.9

Listing 6.7: Error-inducing test case for GCC bug 43438
extern int printf(const char *restrict format, ...);

static unsigned char g_2 = 1;
static int g_9;
static int *l_8 = &g_9;

static void func_12(int p_13)
{
    int *l_17 = &g_9;
    *l_17 &= 0 < p_13;
}

int main(void)
{
    unsigned char l_11 = 254;
    *l_8 |= g_2;
    l_11 |= *l_8;
    func_12(l_11);
    printf("%d\n", g_9);
    return 0;
}

9http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43438

To trigger this bug, we have to trigger inlining (with small function bodies), constant folding, and constant propagation. A key requirement is that the inlined function must have a formal parameter of a signed integer type while its actual argument is of a narrower unsigned integer type. Csmith generates functions of random length, mixes constants and variables in expressions, and allows flexible type matching between the formal and actual parameters of a function. All of the above contributed to the discovery of this bug.
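The abstract-machine semantics that the bug violated can be demonstrated in isolation (our example, not the original test case):

#include <stdio.h>

static void takes_int(int p)
{
    printf("%d\n", p);
}

int main(void)
{
    unsigned char c = 255;
    takes_int(c);   /* c is zero-extended to int: must print 255;
                       the buggy constant propagation produced -1 */
    return 0;
}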

6.5.3 GCC Bug #3: wrong analysis
The test case in Listing 6.8 returned 1 instead of 0 when compiled with a version of GCC.10 The problem occurred when the compiler failed to recognize that p and q are aliases; the wrong not-alias fact made the store *p = 0 appear to be masked by the following store *p = *q, leading the DSE (dead-store elimination) pass that follows the alias analysis to remove the first store statement.

Listing 6.8: Error-inducing test case for GCC bug 42952
static int g[1];
static int *p = &g[0];
static int *q = &g[0];

int foo(void)
{
    g[0] = 1;
    *p = 0;
    *p = *q;
    return g[0];
}

int foo ( void ) { g [ 0 ] = 1 ; ∗p = 0 ; ∗p = ∗q ; return g [ 0 ] ; } The wrong not-alias fact was based on the wrong assumption that q points to a read-only memory location because ∗q is not in LHS of any assignments, and pointers to read-only locations should never be the aliases of pointers to mutable locations. The GCC team fixed the bug by disregarding the read-only attributes of pointed locations while performing dead store elimination. In order to trigger this bug, we would have to create aliased pointers, and write to the same location through different pointers in the same alias set. Adding support to pointers and pointer dereferences (in both LHS and RHS) in Csmith enabled us to find this deeply hidden bug.

10http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42952

6.5.4 GCC Bug #4: inconsistent state
A version of GCC11 miscompiled the function in Listing 6.9:

Listing 6.9: Error-inducing test case for GCC bug 43360
int x = 4;
int y;

void foo(void)
{
    for (y = 1; y < 8; y += 7) {
        int *p = &y;
        *p = x;
    }
}

When the function foo returns, y should be 11; instead, the bug caused the final value of y to be 8. A loop-optimization pass, loop-invariant code motion (LICM), determined that a temporary variable representing *p was invariant with value x+7 and hoisted it in front of the loop, while retaining a transient dataflow fact stating that x+7 == y+7, a relationship that no longer held after the code motion: y+7 is 8 before the loop and becomes 11 after it. This inconsistent state led GCC to directly set the value of the induction variable y to 8. To trigger this bug, we have to create a loop whose induction variable is modified in the loop body through a pointer, a scenario unlikely to appear in real programs. Csmith generates flexible loop headers and loop bodies in which the induction variable is not precluded from the set of variables read and written in the loop. This partly explains why Csmith was able to find bugs that remained hidden from other testing efforts.

6.5.5 LLVM Bug #1: wrong analysis
(x==c1)||(x

11http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43360

12http://llvm.org/bugs/show_bug.cgi?id=2844

6.5.6 LLVM Bug #2: wrong safety check
(x|c1)==c2 evaluates to 0 if c1 and c2 are constants and (c1&~c2)!=0. In other words, if any bit is set in c1 but not set in c2, the original expression cannot be true. LLVM uses !IsNullValue(c1 & ~c2) to determine whether the transformation can be made. A version of LLVM13 contained a logic error in the safety check: it failed to verify that c1 and c2 are actual constant integers. In the error-inducing test case, c1 was the sum of two constant integers, but not a constant integer itself. The function IsNullValue returns a boolean and is implemented to return FALSE when the expression cannot be definitively evaluated to zero. The bug led to the expression being folded into 0 even though c1 was not a constant integer. The requirement to trigger this bug is similar to that of LLVM bug 2844, i.e., chained integer arithmetic involving constants.
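Why the fold is valid for genuine constants can be checked exhaustively on small values (our illustration):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* c1 & ~c2 == 0x4 != 0: OR-ing in c1 forces a bit that c2 lacks,
       so (x | c1) == c2 can never hold, for any x. The buggy safety
       check skipped verifying that c1 really is a constant integer. */
    const uint32_t c1 = 0x5, c2 = 0x3;
    for (uint32_t x = 0; x < 16; x++)
        if ((x | c1) == c2)
            printf("satisfied at x = %u\n", x);   /* never printed */
    printf("c1 & ~c2 = 0x%x, so folding to false is valid\n", c1 & ~c2);
    return 0;
}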

6.5.7 LLVM Bug #3: wrong safety check
"Narrowing" is a strength-reduction optimization that can be applied to loads when only part of an object is needed, or to stores when only part of an object is modified. For example, at the level of the abstract machine, the code in Listing 6.10 loads and stores a whole unsigned int:

Listing 6.10: Code that loads and stores a whole unsigned int
unsigned y;

void bar(void)
{
    y |= 255;
}

Optimizing compilers for x86 may translate the function bar into the code in Listing 6.11, which loads nothing and stores a single byte:

Listing 6.11: Narrowed x86 translation of bar
bar:
        movb    $-1, y
        ret

We found a case in which LLVM14 attempted to perform an analogous narrowing operation, but the safety check failed to consider a store to the object-to-be-narrowed prior to the narrowed store. If there is such an in-between store, the narrowing fails to overwrite the bits that are not included in the narrowed store, violating semantic preservation.

13http://llvm.org/bugs/show_bug.cgi?id=7750

14http://llvm.org/bugs/show_bug.cgi?id=7833

To trigger this bug, we need to generate an integer load-modify-store sequence whose modification touches only some of the integer's bits, with another store to the same object generated between the load and the store of the sequence.
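A minimal sketch of the hazardous pattern (ours, with illustrative names; not the actual error-inducing test case):

unsigned y;

void baz(void)
{
    unsigned tmp = y;   /* load the whole object                       */
    y = 0;              /* intervening store to the same object        */
    y = tmp | 255;      /* if this store is narrowed to a single byte,
                           the upper bytes keep the 0 stored above
                           instead of being restored from tmp          */
}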

6.5.8 LLVM Bug #4: wrong analysis
The code in Listing 6.12 should print "5":

Listing 6.12: Error-inducing test case for LLVM bug 7845
void foo(void)
{
    int x;
    for (x = 0; x < 5; x++) {
        if (x)
            continue;
        if (x)
            break;
    }
    printf("%d", x);
}

LLVM's scalar evolution analysis computes properties of loop induction variables, including the maximum number of iterations. The statement if (x) break; caused this analysis, in a version of LLVM, to mistakenly conclude that x was 1 after the loop executed.15 The analysis failed to consider that the break statement is skipped, because of the preceding continue, on every iteration after the first.

6.6 What lessons can be learned from the bugs we reported?
We think everybody who has a stake in compilers, including compiler developers, compiler testers, and compiler users, can learn something from our years of compiler testing and the bug reports produced in the process. If these lessons are learned and applied, we are more likely to see better compilers with improved correctness.

6.6.1 Compiler developers
Adding more internal checks can turn wrong-code bugs into compile-time bugs. If the internal checks use proper assertions, the compile-time errors take the form of assertion failures. A compile-time error not only alerts the user to a compiler error, but also gives valuable diagnostic messages to the developers who try to fix the bug. The internal checks can do any of the following: 1) validate the integrity of the IR, 2) validate certain invariant properties, or 3) validate the semantic preservation of the source programs; a minimal sketch of the first kind is shown below. Granted, adding more checks will adversely impact the compiler's performance, but we think the balance between correctness and performance should be carefully weighed by compiler teams, and correctness should generally trump performance.
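For instance, an integrity check over a linked instruction list might look like the following (our sketch, using hypothetical IR types):

#include <assert.h>
#include <stddef.h>

/* Verify that a block's instruction list is consistently doubly
   linked. A cheap assertion like this turns a silent wrong-code bug
   into an assertion failure that points at a location. */
struct instr {
    struct instr *prev, *next;
};

static void check_block(const struct instr *head)
{
    for (const struct instr *i = head; i != NULL; i = i->next) {
        assert(i->next == NULL || i->next->prev == i);
        assert(i->prev == NULL || i->prev->next == i);
    }
}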

15http://llvm.org/bugs/show_bug.cgi?id=7845

Both GCC and LLVM have comprehensive test suites that cover over 70% of the lines of their respective source code. For compiler bugs, however, executing a line does not prove that the line is correct; the execution should be followed by some validation. Compiler teams should at least consider running their test suites against a "checked" version of the compiler in which more internal checks are turned on. The safety checks for a transformation should be simple enough for a human to reason about. If a safety check is convoluted, in most cases it can be rewritten to make it more understandable; sometimes this requires a redesign of the transformation. The clear benefits are that the code becomes much more maintainable, and an incorrect safety check becomes much easier to identify. Interestingly, both GCC and LLVM were often wrong in the same areas, e.g., constant folding, register allocation, loop optimization, instruction combining, and target description. We hope both teams, as well as other C compiler teams, pay special attention to these areas; they deserve more validation or verification effort in the form of code inspection, test-case coverage, model-based testing, or theorem proving.

6.6.2 Compiler testers
We think random testing should be an integral component of compiler testing. Developing a "good" random program generator may take no less effort than writing a compiler, but the effort is well justified: once written, the random program generator can be used to test multiple compilers, as long as they implement the same language standard(s). We have demonstrated the effectiveness of random differential testing in this dissertation. The ability to find bugs in a short period of time makes random testing a good addition to a build verification test (BVT) suite.

6.6.3 Compiler users
From our experience, bug reports are most beneficial to, and most welcomed by, compiler writers when some of the following steps are taken and the results are included:

• Reduce the error-inducing test case to be as small as possible. The GCC team provides instructions [18] on how to do that. C-Reduce [54] is a great tool for achieving this goal, and Bugpoint [39] is another excellent tool for reducing error-inducing test cases written in LLVM IR.

• Find a set of compiler options that trigger or hide the bug. If the error appears when an optimization flag such as "-fcrossjumping" is specified, and disappears when the flag is absent, mention that fact in the bug report.

• Find the first revision that shows the regression. The bug is most likely hidden in the change list of that revision. Both git16 and svn17 provide commands to automatically find the first revision showing a regression.

• Limit the symptom of a wrong-code bug to a small set of values, and specify the expected correct values.

• Review your code to make sure it has no undefined behavior. Be prepared for arguments from compiler writers such as "your code is undefined." Even better, be preemptive by explaining code that might appear undefined but actually is not, along with a citation from the language standard. Signed integer overflow and evaluation order are two major sources of surprise in which the "correct" value may not be the expected one (see the sketch after this list). We have found that KCC [15] and Frama-C [11] are valuable tools for ensuring that bug reporters are in a defensible position.
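The two sources of surprise named above can be illustrated as follows (our examples; these are not compiler bugs):

#include <limits.h>

int f(int x)
{
    return x + 1 > x;   /* undefined if x == INT_MAX, so a compiler
                           may legitimately fold the comparison to 1 */
}

int g(void)
{
    int i = 0;
    return i++ + i++;   /* unsequenced modifications of i: undefined
                           behavior, so no particular result can be
                           expected */
}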

16http://git-scm.com/

17http://subversion.tigris.org/

CHAPTER 7

RELATED WORK AND FUTURE DIRECTIONS

I have seen the future and it works.

Lincoln Steffens

The research paper [75] about Csmith was published in 2011. Before that, we had received periodic requests on how to adapt Csmith for various purposes. Csmith was open sourced in April 2011 [74]. Since then, it has been used in at least eight research projects that resulted in published papers. Besides academic researchers, Csmith has found users in industrial environments. There are efforts to port GCC to new target platforms, and Csmith was used as a quick way to validate the compilers, especially the back ends. The usage of Csmith ranges from casual to serious. A casual user of Csmith is someone who uses it as is; oftentimes the randomly generated C programs are used to validate an implementation that is not related to compilers. A serious user of Csmith, on the other hand, changes or configures the generator in various ways, hoping to gain better results. A serious user is more likely to use Csmith in the course of compiler testing. The serious users of Csmith have illustrated well how the tool can be extended. There are additional directions we could go from here. I will present my thoughts on the future of Csmith in Section 7.4.

7.1 Research that uses Csmith

7.1.1 Validation of FPGA-based emulation for postsilicon validation

Traditional ASIC (Application-Specific Integrated Circuit) designers use software simulation when verifying their designs. Unfortunately, software simulation runs approximately eight orders of magnitude slower than actual silicon. The slowness limits the functionality of the circuits that can be validated by software simulation. FPGA-based emulation technology executes the design orders of magnitude faster than simulation, and thus achieves much higher validation coverage. The emulation speeds are fast enough to perform common validation tasks, albeit slower than the ASIC chip [3]. In their research, Balston et al. [3] argue that emulation can be used as an important tool to assist validation of more than just functional behavior. In particular, they focus on timing validation and the task of quantifying the critical-path coverage of a validation plan. To prove their hypothesis, the researchers constructed various benchmarks, applied them to the emulator, and measured path coverage for different functional units, including a DDR2 controller, a 7-stage pipelined integer unit, a memory management unit, and a 32-bit integer multiplier. One of the benchmarks, Random, is composed of 10,000 random programs generated by Csmith. Random was run side-by-side with five other benchmarks, including Linux, a task that boots up an embedded Linux operating system. Balston et al. observed that:

The dominant overall trend is that more cycles translates into better coverage, hence emphasizing the value of long-running postsilicon tests. For example, the Linux boot tends to achieve the highest coverage, but the long-running random benchmark gradually catches up. We also see that coverage often comes in bursts, when an application starts a new phase of computation that performs different operations. There are also some peculiarities, e.g., the random benchmark does particularly well on the DDR controller. We believe this is because the benchmark is loading in a series of programs one-after-another from memory, so it exercises the DRAM extensively very early on. [3]

Random ranked second only to Linux boot in overall coverage of the most critical paths in all four functional units. Random beat several strong competitors, including a synthetic computing benchmark (Dhrystone), a commercial system-level benchmark (systest), and a benchmark from Stanford. An obvious question a validation engineer would ask is whether any test is dominated by the others. The researchers presented data showing that Random covered 25 paths that were missed by all other benchmarks. This is second only to Linux boot, which covered 52 paths exclusively. A distant third is systest, with only three.

7.1.2 Validation of executable semantics

CompCert [34], the “verified” compiler, tries to demonstrate its correctness by proving its preservation of semantics, formally defined as: any observable behavior of an assembly output, if ever generated, must match one of the acceptable behaviors of the C source program. Most of CompCert is formalized in the Coq proof assistant and accompanied by correctness results showing the preservation of semantics along the compilation steps. Formal semantic definitions must be given to both the source language (C) and the target language (assembly) before correctness properties can be reasoned about. In CompCert, these are given by inductively defined small-step relations. Needless to say, errors and omissions in these semantics could cause bugs in the compiler. Thus, the semantics must be validated. In his research, Campbell [6] explored another approach to validating CompCert's semantics: building an equivalent executable semantics and applying test suites to both. If the test results do not agree, then there must be an error in either Campbell's semantics or CompCert's semantics. This is fundamentally another form of differential testing. Campbell's executable semantics is translated into an interpreter; random programs generated by Csmith are then applied to both the interpreter and CompCert. Campbell noted that:

Csmith is particularly interesting because it has already been used to test the CompCert compiler as a whole where it detected several bugs in the informal part of the compiler. ... Given the previous testing of the compiler, it was unsurprising that we found no problems when executing the randomly generated code with the interpreter and comparing the results against the compiler. However, we did encounter a failure with the nonrandom support code.... [6]

The “nonrandom support code” was actually one of the safe math wrappers. Based on the failure, Campbell was able to construct a failure-inducing test case that caused CompCert to fail at compile time.
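For illustration, here is a minimal sketch of what such a safe math wrapper looks like (our simplification; the actual wrappers shipped in csmith.h differ in naming and in the fallback values they return). The wrapper turns signed addition, whose overflow is undefined in C, into a fully defined operation:

    #include <inttypes.h>
    #include <stdio.h>

    /* Return a + b, or a defined fallback when the addition would
       overflow (signed overflow is undefined behavior in C). */
    static int32_t safe_add_int32(int32_t a, int32_t b) {
        if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b))
            return a;  /* fallback value chosen for this sketch */
        return a + b;
    }

    int main(void) {
        printf("%" PRId32 "\n", safe_add_int32(INT32_MAX, 1));
        return 0;
    }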

7.1.3 Validation of machine code analysis

Machine code is sometimes analyzed for security reasons. This is essential for the execution of machine code in a restrictive sandbox environment, such as Google's Chrome browser. Morrisett et al. constructed an x86 code analyzer [47] that could be validated with formal methods. They first constructed a pair of domain-specific languages (DSLs) for specifying the semantics of machine architectures, including x86, and embedded those languages within Coq. Then they used the DSLs to generate OCaml code that can be run as a simulator. The two DSLs, one specifying the translation from bits to abstract syntax, and the other specifying small-step operational semantics at the RTL level, are architecture-independent and could thus be reused to specify the semantics of other machine architectures. Morrisett et al. used two different techniques to generate test cases to exercise the simulator. First, they generated small, random C programs using Csmith and compiled them using GCC. In this way, they simulated and verified over 10 million instruction instances. However, they mentioned this technique has limitations: it did not exercise instructions that are avoided by compilers/assemblers. They suggested “a more thorough technique is to fuzz test our simulator by generating random sequences of bytes, which has previously proved effective in debugging CPU emulators” [47].

7.1.4 Validation of static analysis

Frama-C [9] is a framework for analysis and transformation of C programs. The framework provides a set of built-in analyses and transformations, such as 1) a value analysis that computes an over-approximation of the values each variable can take at each point of the program; 2) a constant propagation plug-in; and 3) a slicing plug-in that removes redundant code based on certain criteria. In addition, the framework allows third-party analyses and transformations to be plugged in. Cuoq et al. [10] argue that static analysis should not overlook the impact of compilers on the accuracy of any analysis result. They further argue that subjecting a compiler and an analyzer to automated random testing could minimize the adverse impact. They put their argument into practice by random testing Frama-C using Csmith, the tool that has been used to random test well-known compilers. Based on its value analysis, Cuoq et al. turned Frama-C into an interpreter of C programs. Using random programs generated by Csmith, the interpreter is then validated against a reference compiler by comparing the interpreter's outputs with the execution outputs of the compiled programs. In all, they found 50 bugs in Frama-C with random differential testing. The bugs are located in the value analysis, the constant propagation plug-in, and the slicing plug-in. An interesting observation is that they filtered random programs by file size. At the smallest setting, 20 KB, the random programs were able to find all Frama-C bugs. They then increased the filter so larger files could be generated, but no additional bugs were revealed. More interestingly, several bugs were also found by Frama-C in Csmith. The bugs involved passing around dangling pointers, unsequenced assignments to the same memory location, and accesses to uninitialized members of unions. Lastly, Csmith could generate a pre-increment causing undefined signed overflow. The bugs in Csmith were found during a period when we added support for unions and increment/decrement operators to Csmith. There was a positive feedback loop between Frama-C and Csmith: as soon as we introduced the constructs into random programs, they revealed quite a few bugs in Frama-C's handling of such constructs, while the Frama-C team found a few bugs in the generation logic in Csmith that avoids undefined behaviors related to the constructs. The positive feedback loop advanced both tools' quality.

7.2 Research that extends Csmith

7.2.1 Test case reduction

Random programs generated by Csmith typically range from a few kilobytes to hundreds of kilobytes. As explained in Section 5.4, Csmith is designed to generate medium-sized programs to increase bug-finding performance. Regehr et al. [54] tried to solve the problem of automatic test case reduction: i.e., given an error-inducing test case, how to reduce it to a size that is acceptable to compiler teams while preserving the error-inducing capability. The problem is difficult in that the final result depends not only on which types of transformations we apply in each step, but also on the order of the transformations. It is practically infeasible to explore all possible types of transformations in all possible orders with a large number of transformations. The problem has haunted us since the first day we started random testing compilers. We tried manual reduction at first, which on average took half an hour to an hour for each test case, even for an expert team member who is extremely good at reducing them. This is definitely not scalable. As more bugs piled up, test case reduction would have overwhelmed our four-member team. At first, we tried Berkeley delta [45], an implementation of delta debugging [77]. Berkeley delta is generic and line-based. We found that Berkeley delta sometimes emits reduced programs with undefined behaviors, and the simple line-based reduction could not express effective simplifications, such as removing a function parameter from both the definition and the call sites. Second, we tried to alter the random program generation sequence inside Csmith to reduce error-inducing test cases. Csmith makes a set of random choices while constructing the abstract syntax tree. The random choices are encoded as a sequence of random numbers. For example, if it chooses to generate an if-statement out of 10 production rules for statement, the choice is encoded as 5/10 since if-statement is the number five choice. The reducer, called Seq-Reduce, mutates the random sequence and forces Csmith to generate based on the prescribed sequence. If the mutation produces a test case that triggers the compiler failure and is smaller, we repeat the process starting with the mutated sequence. Seq-Reduce is both simple and easy to parallelize. The main problem is that it does not do a good job of reducing programs in cases where the problematic code is generated near the end of the random sequence. Mutating choices early in the sequence tends to suppress error-inducing code generated late in the sequence. Our next reducer, Fast-Reduce, is also based on altering the random program generation inside Csmith. It explores the idea that test-case reduction can benefit from runtime information. It simplifies the test cases with several transformations, including dead-code elimination, branch divergence detection and simplification, and effective inlining. The transformations are assisted by runtime information, which is obtained by instrumenting the random program and then executing it. Fast-Reduce generally is fast and effective. Its primary disadvantage is that it does not always produce good results. It follows a fixed order of transformation steps. Each type of transformation is applied only once. Adding a new type of transformation requires intimate knowledge of Csmith; thus, only a limited set is supported. Both Seq-Reduce and Fast-Reduce rely on Csmith to generate variants of an error-inducing test case. The GTAV inside Csmith ensures that each variant is still well defined; thus, no validation of the variants is necessary.
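To make the Seq-Reduce scheme concrete, here is a minimal sketch of recording, mutating, and replaying a choice sequence (our illustration only; the data structures and interfaces inside Csmith differ):

    #include <stdlib.h>

    #define SEQ_MAX 100000

    /* Recorded random choices: entry i is the option picked at step i. */
    static int seq[SEQ_MAX];
    static int seq_len = 0;
    static int replay_pos = 0;
    static int replaying = 0;

    /* Called by the generator wherever it must pick one of n options.
       In recording mode it draws and records a fresh choice; in replay
       mode it returns the (possibly mutated) recorded choice. */
    static int choose(int n) {
        if (replaying && replay_pos < seq_len)
            return seq[replay_pos++] % n;
        int c = rand() % n;
        seq[seq_len++] = c;
        return c;
    }

    /* One Seq-Reduce step: mutate a recorded decision, rerun the
       generator in replay mode, and keep the variant if it is smaller
       and still triggers the compiler failure. */
    static void mutate_one(void) {
        seq[rand() % seq_len] = rand();
    }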
The third reducer, C-Reduce, developed by Regehr et al., uses a mixture of transformations to reduce the test cases. Some of the transformations are C-specific. The transformations are pluggable. They are applied iteratively until a global fixpoint is reached. C-Reduce supports 50+ transformations, which can be categorized as below:

• Peephole optimizations: replacing contiguous segments of the tokens, if applicable and profitable.

• Line removal: removing contiguous lines, if applicable and profitable.

• Pretty-printing: formatting the code.

• Compiler-like optimizations: transformations employed by compilers, such as scalar replacement of aggregates and lifting a local variable to global scope. These transformations are implemented using LLVM's Clang front end.

C-Reduce does not rely on Csmith. It can reduce programs whether or not they were generated by Csmith. To ensure that no transformation in C-Reduce introduces undefined behaviors, Regehr et al. [54] use semantics-checking C interpreters such as KCC [15] and Frama-C [9] to detect invalid transformations. Regehr et al. [54] used 98 bug-inducing programs created by Csmith as the inputs to the three reducers above. The most effective reducer (in terms of output size) is C-Reduce: it reduced test cases averaging 58 KB into test cases averaging 258 bytes. Regehr et al. found the outputs of C-Reduce are usually good enough to be directly copied into bug reports.

7.2.2 Swarm testing using Csmith

Csmith can be configured in many different ways by providing command-line options. For example, it can be configured to not generate 64-bit integers and arithmetic in random programs. Facing so many options, the foremost question is this: should we use only one default configuration or experiment with different configurations during random testing? Groce et al. tried to answer the question [26] with Csmith-based swarm testing. The default configuration of Csmith errs on the side of expressiveness: enabling most parts of the C language. But most of the expressiveness can be turned off by command-line arguments. Groce et al. experimented with different configurations. Each configuration includes a collection of options enabling or disabling C features such as comma operators, compound assignments, embedded assignments, increment and decrement operators, goto-statements, and 64-bit math operations. In their swarm testing, a configuration is used to generate 1,000 test cases before moving to the next configuration (a sketch of this scheme follows below). They reported that a week of swarm testing found 104 distinct ways to crash a collection of compilers including GCC, LLVM, Sun CC, and Intel CC. The default configuration found only 73 ways, an improvement of 42%. Swarm testing also slightly improved code coverage of two compilers: LLVM/Clang 2.9 and GCC 4.6.0. Of these 104 crash symptoms, 52 had at least one feature whose presence was significant and 42 had at least one feature whose absence was significant. While they found that pointers, arrays, and structures are great triggers of compiler bugs, it was surprising to find that pointers, embedded assignments, jumps, arrays, and the increment/decrement operators are high on the list of bug suppressors. The lesson learned is that some features (most notably, pointers) strongly trigger some bugs while strongly suppressing others. This observation directly validates swarm testing.
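The scheme can be pictured with a small sketch (the feature names below are illustrative, not Csmith's actual option names): each round draws an independent on/off setting for every feature, then generates a batch of programs under that configuration.

    #include <stdbool.h>
    #include <stdlib.h>

    /* One swarm configuration: a random subset of C features. */
    struct swarm_config {
        bool comma_operators;
        bool compound_assignments;
        bool embedded_assignments;
        bool inc_dec_operators;
        bool gotos;
        bool int64_math;
    };

    /* Flip each feature on or off independently with probability 1/2;
       the resulting configuration is used for the next 1,000 tests. */
    static struct swarm_config random_config(void) {
        struct swarm_config c;
        c.comma_operators      = rand() & 1;
        c.compound_assignments = rand() & 1;
        c.embedded_assignments = rand() & 1;
        c.inc_dec_operators    = rand() & 1;
        c.gotos                = rand() & 1;
        c.int64_math           = rand() & 1;
        return c;
    }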

7.2.3 Validation of C11/C++11 memory model

The C and C++ languages were originally designed without concurrency support. With the recently approved C11/C++11 memory model [61], a precise semantics is defined for threads accessing shared memory: well-synchronized programs must exhibit only sequentially consistent behaviors, racy programs are undefined, while low-level atomics enable high-performance yet defined code.

The question is how to validate compilers' implementations of such a memory model, and how does random differential testing help? Concurrency compiler bugs can be identified by comparing the memory trace of compiled code against a reference memory trace for the source code. However, a naive comparison would fail because concurrent programs are inherently nondeterministic and optimizers can compile away nondeterminism. Two executables might have different memory traces, and yet both be correct with respect to a source C11 or C++11 program. Morisset et al. [46] proposed a solution by reducing the problem to validating the traces generated by running optimized sequential code against a reference (unoptimized) trace for the same code. Their comparison algorithm allows sound transformations, such as eliminations, reorderings, and introductions of actions, as long as the transformations conform to the C11/C++11 concurrency model. With some help from me, the team extended Csmith to generate random concurrent programs. The extensions include 1) adding mutex variables (as defined in pthread.h), and 2) adding calls to pthread_mutex_lock and pthread_mutex_unlock. Since Csmith generates jump statements with arbitrary destinations, enforcing balanced calls to lock and unlock along all possible execution paths is difficult, so mutex variables are declared with the attribute PTHREAD_MUTEX_RECURSIVE. Another validation is based on the observation that each compiler has its own set of internal invariants. For instance, GCC, as of 2013, forbids all reorderings of a memory access with an atomic one. The comparison algorithm was tuned to check for violations of this kind of invariant. The team reported four concurrency GCC bugs, all of which were promptly fixed by the compiler developers. All of these are silent wrong-code bugs. They break not only the C11/C++11 memory model, but also the POSIX DRF guarantee that is assumed by most concurrent software written in C and C++.
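A minimal sketch of the scaffolding such an extension would emit is shown below (our illustration, not Csmith's actual output). The recursive attribute makes a second acquisition along some goto path legal rather than a deadlock:

    #include <pthread.h>
    #include <stddef.h>

    static pthread_mutex_t g_mutex;
    static int g_shared;

    static void init_mutex(void) {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
        pthread_mutex_init(&g_mutex, &attr);
        pthread_mutexattr_destroy(&attr);
    }

    static void *worker(void *arg) {
        (void)arg;
        pthread_mutex_lock(&g_mutex);
        pthread_mutex_lock(&g_mutex);   /* legal on a recursive mutex */
        g_shared++;
        pthread_mutex_unlock(&g_mutex);
        pthread_mutex_unlock(&g_mutex);
        return NULL;
    }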

7.2.4 Triage of failure-inducing test cases

Csmith is impressively effective at finding compiler bugs. Ironically, the effectiveness creates difficulty for its users: an overnight run may result in hundreds or thousands of failure-inducing test cases. Moreover, some bugs tend to be triggered much more often than others, sometimes thousands of times more, creating needle-in-a-haystack problems. Sometimes Csmith keeps generating test cases that trigger noncritical bugs that may also already be known. As a random tester, I have found test case triage to be time-consuming and unrewarding.

Chen et al. [8] defined the fuzzer taming problem this way: “Given a potentially large collection of test cases, each of which triggers a bug, rank them in such a way that test cases triggering distinct bugs are early in the list. Sub-problem: If there are test cases that trigger bugs previously flagged as undesirable, place them late in the list.” They solved the problem by defining distances (similarities) between failure-inducing test cases and ordering them with FPF (furthest point first). The FPF computation takes into account a list of bugs known to be uninteresting, and lowers the rank of test cases corresponding to them. Instead of using one universal distance function for all bug-trigger patterns, Chen et al. defined a collection of distance functions, each of which captures the usefulness of some kind of bug trigger. The functions include Levenshtein distance and Euclidean distance, both computed on normalized inputs. Chen et al. applied different distance functions to 1,979 reduced test cases triggering 46 bugs (11 compile-time bugs and 35 wrong-code bugs) in GCC 4.3.0. For compile-time bugs, the best distance function is the normalized Levenshtein distance between test cases plus the normalized Levenshtein distance between failure outputs. The result tracks the “theoretical best” line, i.e., examining each additional test case reveals a new distinct bug, and the discovery keeps going until all bugs are revealed. Test cases inducing wrong-code bugs are more difficult to auto-triage: they produce no compiler output indicating failure, and in compilers, the execution distance between the buggy code and the code emission phase can be long. For them, the best distance function is the Euclidean distance between function coverage vectors on the compiler.
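A sketch of the FPF ranking itself is shown below, under stated assumptions: N reduced test cases indexed 0..N-1, and a placeholder dist() standing in for any of the distance functions above (our illustration, not the authors' code).

    #define N 128

    /* Placeholder; in practice this would be, e.g., the normalized
       Levenshtein distance between two reduced test cases. */
    static double dist(int a, int b) {
        return (a > b) ? (double)(a - b) : (double)(b - a);
    }

    /* Furthest point first: order[] receives test-case indices, most
       diverse first, so distinct bugs tend to surface early. */
    static void fpf_rank(int order[N]) {
        double d[N];            /* min distance to any chosen point */
        int chosen[N] = {0};
        order[0] = 0;           /* arbitrary seed */
        chosen[0] = 1;
        for (int i = 0; i < N; i++)
            d[i] = dist(i, 0);
        for (int k = 1; k < N; k++) {
            int best = -1;
            for (int i = 0; i < N; i++)
                if (!chosen[i] && (best < 0 || d[i] > d[best]))
                    best = i;
            order[k] = best;
            chosen[best] = 1;
            for (int i = 0; i < N; i++)
                if (dist(i, best) < d[i])
                    d[i] = dist(i, best);
        }
    }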

7.3 Random testing other compilers/interpreters

While Csmith focuses on generating C programs, there are fuzzing tools targeting other programming languages. They are most successful in finding bugs in the JavaScript engines of various browsers. Following are a few examples. jsfunfuzz [55] is a JavaScript fuzzer used for testing the JavaScript engine in Firefox. It tests the JavaScript language engine itself, not the DOM. It has found over 1,500 bugs in Firefox's JavaScript engine, and most of them have already been fixed. Many were memory safety bugs and are believed to be exploitable to run arbitrary code. The author attributed its success to its knowledge of how JavaScript is parsed and interpreted, and to applying that knowledge to random program generation. The fuzzer can be used for differential testing because a compiled JavaScript program can be decompiled to produce a second test case to be compared.

LangFuzz [28] is a grammar-based JavaScript fuzzer, and thus is more generic than jsfunfuzz, which targets only the JavaScript engine in Firefox. Given a PHP grammar, it can be used to generate PHP programs. Moreover, LangFuzz can use its grammar to learn code fragments from a given code base. Given a suite of previously failing programs, LangFuzz will use and recombine fragments of the provided test suite to generate new programs, making it more effective. Applied to the Mozilla JavaScript interpreter, it discovered a total of 105 new severe vulnerabilities within three months of operation. Google has revealed it is using ClusterFuzz [2] to generate millions of test cases on a daily basis and apply them to its Chrome browser. The test cases mostly target Chrome's JavaScript engine, V8. The name “ClusterFuzz” comes from a cluster of several hundred virtual machines running approximately six thousand simultaneous Chrome instances. Like us, the Google team also faces problems such as test infrastructure management, test case reduction, and bug triage. ClusterFuzz had already detected 95 unique vulnerabilities in a four-month period when it was announced to the outside world.

7.4 Future directions of Csmith

7.4.1 Supporting object oriented programming (OOP) languages

Csmith generates only random programs in C, which is a procedural language. However, many popular programming languages are object oriented, such as C++, Java, C#, Objective-C, and JavaScript. There are no official rankings of programming languages. The TIOBE Programming Community Index [62] is an unofficial ranking of programming languages based on popularity obtained from search engines. As of July 2013, C/C++, Java, and Objective-C combined account for over half of the total popularity; C is the leading language, followed closely by Java. Can we adapt Csmith so that random programs in these top-ranking languages can be generated? The answer is probably no. In Csmith, the internal representation (the abstract syntax tree) is closely aligned with C syntax, and the generation time analysis and validation (GTAV) is strictly based on C semantics. To support a new language, three prerequisites must be met:

• Constructs for the new language must be added to the internal representations. The most notable ones are class, object, and exception; they are missing from the current Csmith (version 2.1) tree representations. Class would be a new storage scope besides the two currently supported scopes in Csmith (version 2.1): global and block. Different languages also have different libraries, and these should be represented as well.

• A GTAV must be implemented according to the language semantics. Both the analyses and the validation at generation time vary from language to language. For example, the evaluation order of function arguments is unspecified in C, but specified in Java. In addition, C tends to have the largest set of undefined behaviors among all the above languages; thus, its validation is the strictest. We expect much of the GTAV logic to be shared across languages, at least across the enlarged C family including C/C++, C#, and Objective-C.

• GTAV must be made dynamically loadable. Each language should have its own GTAV, which is loaded when the generator targets that particular language. Currently, in version 2.1, code generation and GTAV are intermingled. Separating the two modules requires an architectural change to Csmith.

Considering the effort these tasks require, I feel it is more economical to redesign and reimplement Csmith. I envision the next incarnation of our random generator to be generic and simple, with most of the complexity moving to a syntax description file. A semantic description file for assisting GTAV could also be used.

7.4.2 Supporting low level representations

C is a low-level programming language. I think moving Csmith toward supporting other low-level representations is easier than moving it toward supporting higher-level programming languages. Java bytecode, assembly languages, and machine languages are a few examples of low-level representations. Since Csmith generates random C programs that can be compiled into assembly code or machine code, why should we generate assembly/machine code randomly? The problem is that compilers tend to normalize the randomness during compilation. A good example is given by Yee et al. [76]: generating random C programs and then compiling them into x86 machine code does not exercise instructions that are avoided by compilers/assemblers, such as a typical MOV instruction. It is infeasible to create random generators for a vast number of target platforms. We can simplify the problem by targeting the intermediate representations in compilers. GCC RTL and LLVM IR are good candidates. Once we are able to generate random programs in a compiler IR, the compiler's back end can be used to convert the random program into the desired assembly code or machine code. The randomness could still be normalized by the back end, but the problem is less severe than compiling the random programs all the way from C to assembly/machine code. LLVM IR is especially valuable in that it can be compiled into the bytecode of virtual machines and even OpenGL code. Implementing a random generator targeting LLVM IR would enable us to random test a large set of execution environments, physical or virtual.

7.4.3 Supporting new target platforms

We have received requests from a couple of projects planning to port GCC to a new target platform. Essentially, a target-specific back end needs to be implemented and tested. In one instance, we helped a team modify Csmith; in another instance, the team modified Csmith according to their own needs. The required changes are minimal since Csmith generates programs in C, which is target-independent. However, for a platform based on microcontrollers with limited capabilities, Csmith's expressiveness sometimes needs to be limited. For example, certain language constructs that are not available on the platform have to be disabled. A compiler back end can be implemented in parallel with the hardware. Before the hardware is ready, the teams use simulation for implementing and testing their compilers. It was reported to us that Csmith quickly found over 40 bugs in one of the compiler implementations. Being able to develop software and hardware at the same time is a great time saver.

7.4.4 Supporting static analyzers

Cuoq et al. have shown a way [10] to validate static analyses using Csmith: turn the analyzer into an interpreter and compare its outputs on random programs with the outputs from a reference compiler. For static analyzers that are not easily converted into an interpreter, there are alternative ways of random testing. Csmith performs path-sensitive static analysis at generation time. It currently (in version 2.1) performs points-to analysis and side-effect analysis. Extending it to other analyses, such as reaching definitions, liveness analysis, and use-define chains, is possible. The analysis results are path-joined (thus becoming path-insensitive) by the time random generation finishes. A straightforward validation is to compare the path-insensitive results from Csmith with the results from a static analyzer. Csmith could expose its analysis results in the random programs as comments; a static analyzer could read the results and validate them against its own (a hypothetical example follows below).

If a static analyzer performs path-sensitive analysis, its results can be validated against Csmith's results. As of 2014, Csmith does not store path-sensitive results after computing them, but it could easily be made to do so.
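As a hypothetical illustration (Csmith emits no such comments today; the names and comment format below are ours), the exposed results might look like this:

    static int g_1 = 0;
    static int g_2 = 0;

    int func_1(int sel) {
        int *p = sel ? &g_1 : &g_2;
        /* points-to: p -> { g_1, g_2 } */
        /* side-effects: reads { sel }, writes { g_1, g_2 } */
        *p = sel;
        return *p;
    }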

7.5 Toward bug-free compilers

Can compilers be improved to a point where they are essentially bug free? We have frequently observed that new bugs are introduced while developers are 1) adding a new front end; 2) adding a new back end; 3) adding a new transformation; or 4) fixing a bug (ironically). An evolving compiler has to undergo the above code changes periodically, and unless all of these activities are protected by correctness guarantees, bugs are inevitable. CompCert offers some hope for the correctness problem. It adheres to a semantic preservation guarantee, which states:

For all source programs S and compiler-generated code C, if the compiler, applied to the source S, produces the code C, without reporting a compile-time error, then the observable behavior of C improves on one of the allowed observable behaviors of S.

Observable behaviors include 1) whether the program halts or does not halt; 2) all calls to standard library functions that perform input/output; and 3) all read and write accesses to global variables of volatile types. A semantics-preserving compiler is allowed to fail at compile time or to optimize away a runtime error when it is safe to do so. With the help of the Coq proof assistant, CompCert's developers have verified that the compiler is guaranteed to preserve semantics during the translation from a C AST to an assembly AST [34]. Their argument is strengthened by experimental results from our random testing using Csmith. During our year-long testing of CompCert, Csmith was able to find six bugs, but none of them were in the verified part. Unfortunately, CompCert has its own limitations. First, the parsing phase (from C to C AST) and the assembling phase (from assembly AST to machine code) are still left to be proven, and thus are possibly buggy. The CompCert team explains that those phases lack mathematical specifications, making it difficult to state, let alone prove, the correctness theorem. Second, CompCert, as of 2014, has a limited set of optimizations, including inlining, tail call optimization, constant propagation, common subexpression elimination, live range splitting, graph-coloring-based register allocation and coalescing, and instruction selection and combining. Notably, it does not support loop optimizations.

Third, CompCert does not support the full C99 specification, and its back ends are limited to x86, 32-bit PowerPC, and ARM v2 and above. Even if CompCert were able to eliminate all of these limitations, we are still unsure whether the engineering activities identified at the beginning of this section, e.g., adding new front ends or back ends, are protected or not. In summary, we expect compiler quality to improve over time, but a bug-free compiler remains a distant goal.

7.6 Toward error-free compilations

If a bug-free compiler is impossible in the near term, the next goal is naturally to ensure that the compilations of a subset of programs are error free. In most cases, the programs we care about are safety-critical or other high-importance programs. The logic is that if we validate a program's safety at the source level, and we also validate that the compilation is faithful, then the program is validated at the binary level, and therefore poses no threat when executed on a trusted computing base. Research in translation validation has made progress over the last few years. Tristan et al. [63] presented a scalable solution that validates whether the normalized value graphs of a pretranslation program and a posttranslation program are equivalent. The transformations being validated include a broad set of intraprocedural optimizations in LLVM, and they were able to scale the validation to large programs such as GCC, SQLite3, and several programs drawn from SPEC CPU 2006. The key insight is that the pretranslation value graphs have to be simplified (normalized) as compiler optimizations would do. However, there were still approximately 20% of the functions that could not be proven to have been translated faithfully. Sewell et al. [57] focused on validating the translation of seL4, a much-studied microkernel with around 9,500 lines of C source code. Their first contribution is a representation that can express both C programs and binary programs, and is highly amenable to SMT solvers. They then deduced the C semantics from the source code with Isabelle/HOL [64] and the binary semantics from the binary code with HOL4 [48], and converted both into the above-mentioned representation for validation, which is carried out by SMT solvers. They were able to validate the translations of all functions in seL4, except for the assembly routines and functions involving volatile accesses, with zero false positives. However, several limitations remain: 1) they can fully validate the translations at optimization level “gcc -O1”, but not at -O2; and 2) they had to modify the source code of seL4 to work around restrictions caused by the C parser, the decompiler, and other components of the validation.

CHAPTER 8

CONCLUSION

We have developed a random program generator, Csmith, that performs program analysis and code generation in parallel. These two activities work hand in hand: code generation invalidates previous program analysis results and requires new rounds of analysis, while analysis results are used to guide the code generator. The approach resulted in a random program generator that produces unambiguous code, which is essential for random differential testing. Although Csmith is slower at generating code than previously published random program generators, most likely due to generation-time analysis, it is empirically shown to be more expressive and to have higher bug-finding performance. For years, Csmith was able to consistently find bugs in widely used compilers such as GCC and LLVM, despite the fact that most bugs we reported were fixed in a short period of time. Although we only tested the compilers with C programs, the bugs we found are mostly located in the middle ends and back ends of the compilers, and thus are likely to impact compiler users even if they program in other languages. Everybody, especially developers of safety-critical applications, should be alarmed by the fact that bugs are present in the strongest compilers, and that errors occur even at the lowest levels of compiler optimization.

APPENDIX

GENERATION GRAMMAR OF CSMITH

⟨empty⟩ ::= ‘’
⟨space⟩ ::= (‘ ’)+
⟨new-line⟩ ::= (‘\n’)+
⟨block open⟩ ::= ⟨new-line⟩ ‘{’ ⟨new-line⟩
⟨block close⟩ ::= ⟨new-line⟩ ‘}’ ⟨new-line⟩
⟨decimal digit⟩ ::= ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’
⟨decimal digits⟩ ::= ⟨decimal digit⟩+
⟨hex digit⟩ ::= ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’ | ‘A’ | ‘B’ | ‘C’ | ‘D’ | ‘E’ | ‘F’
⟨hex prefix⟩ ::= ‘0x’
⟨hex suffix⟩ ::= ‘L’ | ⟨empty⟩
⟨long long suffix⟩ ::= ‘LL’ | ⟨empty⟩
⟨8-bit hex⟩ ::= ⟨hex digit⟩ ⟨hex digit⟩
⟨16-bit hex⟩ ::= ⟨8-bit hex⟩ ⟨8-bit hex⟩
⟨32-bit hex⟩ ::= ⟨16-bit hex⟩ ⟨16-bit hex⟩
⟨64-bit hex⟩ ::= ⟨32-bit hex⟩ ⟨32-bit hex⟩
⟨hexadecimal constant⟩ ::= ⟨hex prefix⟩ ⟨8-bit hex⟩ ⟨hex suffix⟩ | ⟨hex prefix⟩ ⟨16-bit hex⟩ ⟨hex suffix⟩ | ⟨hex prefix⟩ ⟨32-bit hex⟩ ⟨hex suffix⟩ | ⟨hex prefix⟩ ⟨64-bit hex⟩ ⟨long long suffix⟩
⟨decimal prefix⟩ ::= ‘-’ | ⟨empty⟩
⟨decimal suffix⟩ ::= ‘L’ | ‘LL’ | ‘UL’ | ‘U’ | ⟨empty⟩
⟨decimal constant⟩ ::= ⟨decimal prefix⟩ ⟨decimal digits⟩ ⟨decimal suffix⟩
⟨global variable identifier⟩ ::= ‘g’ ‘_’ ⟨decimal digits⟩
⟨local variable identifier⟩ ::= ‘l’ ‘_’ ⟨decimal digits⟩
⟨parameter identifier⟩ ::= ‘p’ ‘_’ ⟨decimal digits⟩
⟨variable identifier⟩ ::= ⟨global variable identifier⟩ | ⟨local variable identifier⟩ | ⟨parameter identifier⟩
⟨function identifier⟩ ::= ‘func’ ‘_’ ⟨decimal digits⟩
⟨jump label identifier⟩ ::= ‘lbl’ ‘_’ ⟨decimal digits⟩
⟨struct type identifier⟩ ::= ‘S’ ⟨decimal digits⟩
⟨union type identifier⟩ ::= ‘U’ ⟨decimal digits⟩
⟨field identifier⟩ ::= ‘f’ ⟨decimal digits⟩
⟨integer constant⟩ ::= ⟨decimal constant⟩ | ⟨hexadecimal constant⟩
⟨pointer constant⟩ ::= ‘(void*)0’
⟨struct constant⟩ ::= ‘{’ ((⟨integer constant⟩ | ⟨struct constant⟩) ‘,’)+ ‘}’
⟨union constant⟩ ::= ‘{’ ((⟨integer constant⟩ | ⟨union constant⟩) ‘,’)+ ‘}’
⟨constant⟩ ::= ⟨integer constant⟩ | ⟨pointer constant⟩ | ⟨struct constant⟩ | ⟨union constant⟩
⟨storage specifier⟩ ::= ‘static’ | ⟨empty⟩
⟨type qualifier⟩ ::= ‘const’ | ‘volatile’
⟨type qualifiers⟩ ::= ⟨type qualifier⟩ ⟨space⟩ ⟨type qualifiers⟩ | ⟨empty⟩
⟨type specifier⟩ ::= ‘void’ | ‘int8_t’ | ‘int16_t’ | ‘int32_t’ | ‘int64_t’ | ‘uint8_t’ | ‘uint16_t’ | ‘uint32_t’ | ‘uint64_t’ | ‘struct’ ⟨space⟩ ‘S’ ⟨decimal digits⟩ | ‘union’ ⟨space⟩ ‘U’ ⟨decimal digits⟩
⟨qualified non-pointer type⟩ ::= ⟨type qualifiers⟩ ⟨type specifier⟩
⟨qualified pointer type⟩ ::= (⟨qualified non-pointer type⟩ | ⟨qualified pointer type⟩) ⟨space⟩ ‘*’ ⟨type qualifiers⟩
⟨qualified type⟩ ::= ⟨qualified non-pointer type⟩ | ⟨qualified pointer type⟩
⟨array dimension⟩ ::= ‘[’ ⟨decimal digits⟩ ‘]’
⟨array dimensions⟩ ::= ⟨array dimension⟩+
⟨non-array declarator⟩ ::= ⟨storage specifier⟩ ⟨qualified type⟩ ⟨variable identifier⟩
⟨non-array initializer⟩ ::= ⟨constant⟩
⟨array declarator⟩ ::= ⟨non-array declarator⟩ ⟨array dimensions⟩
⟨array initializer⟩ ::= ‘{’ ⟨constant⟩ (‘, ’ ⟨constant⟩)* ‘}’ | ‘{’ ⟨array initializer⟩ (‘, ’ ⟨array initializer⟩)* ‘}’
⟨declarator with initializer⟩ ::= ⟨non-array declarator⟩ ‘ = ’ ⟨non-array initializer⟩ | ⟨array declarator⟩ ‘ = ’ ⟨array initializer⟩
⟨global variable declarator⟩ ::= ‘static’ ⟨space⟩ ⟨declarator with initializer⟩
⟨global variable declarators⟩ ::= (⟨global variable declarator⟩ ‘;’ ⟨new-line⟩)*
⟨local variable declarator⟩ ::= ⟨declarator with initializer⟩
⟨local variable declarators⟩ ::= (⟨local variable declarator⟩ ‘;’ ⟨new-line⟩)*
⟨constant expression⟩ ::= ⟨integer constant⟩ | ⟨pointer constant⟩
⟨simple variable expression⟩ ::= ⟨variable identifier⟩ | ⟨simple variable expression⟩ ‘.’ ⟨field identifier⟩
⟨dereferenced variable expression⟩ ::= ‘*’ (⟨simple variable expression⟩ | ⟨dereferenced variable expression⟩)
⟨escaped variable expression⟩ ::= ‘&’ ⟨simple variable expression⟩
⟨variable expression⟩ ::= ⟨simple variable expression⟩ | ⟨dereferenced variable expression⟩ | ⟨escaped variable expression⟩
⟨l-value expression⟩ ::= ⟨simple variable expression⟩ | ⟨dereferenced variable expression⟩
⟨unary operator⟩ ::= ‘+’ | ‘-’ | ‘!’ | ‘~’
⟨unary expression⟩ ::= ⟨unary operator⟩ ‘(’ ⟨expression⟩ ‘)’
⟨binary operator⟩ ::= ‘+’ | ‘-’ | ‘*’ | ‘/’ | ‘%’ | ‘>’ | ‘<’ | ‘>=’ | ‘<=’ | ‘==’ | ‘!=’ | ‘&&’ | ‘||’ | ‘^’ | ‘&’ | ‘|’ | ‘>>’ | ‘<<’
⟨binary expression⟩ ::= ⟨expression⟩ ⟨binary operator⟩ ⟨expression⟩
⟨comma expression⟩ ::= ⟨expression⟩ ‘,’ ⟨expression⟩
⟨assignment operator⟩ ::= ‘=’ | ‘*=’ | ‘/=’ | ‘%=’ | ‘+=’ | ‘-=’ | ‘<<=’ | ‘>>=’ | ‘&=’ | ‘^=’ | ‘|=’
⟨increment decrement operator⟩ ::= ‘++’ | ‘--’
⟨increment decrement expression⟩ ::= ⟨increment decrement operator⟩ ⟨l-value expression⟩ | ⟨l-value expression⟩ ⟨increment decrement operator⟩
⟨assignment expression⟩ ::= ⟨l-value expression⟩ ⟨assignment operator⟩ ⟨expression⟩ | ⟨increment decrement expression⟩
⟨parameter type⟩ ::= ⟨integer type⟩ | ⟨pointer type⟩
⟨argument⟩ ::= ⟨expression⟩
⟨arguments⟩ ::= ⟨argument⟩ (‘,’ ⟨argument⟩)*
⟨call expression⟩ ::= ⟨function identifier⟩ ‘(’ ⟨arguments⟩ ‘)’
⟨expression⟩ ::= ⟨constant expression⟩ | ⟨variable expression⟩ | ⟨unary expression⟩ | ⟨binary expression⟩ | ⟨comma expression⟩ | ⟨assignment expression⟩ | ⟨call expression⟩
⟨labelled statement⟩ ::= ⟨jump label identifier⟩ ‘:’ ⟨statement⟩
⟨return statement⟩ ::= ‘return’ ⟨space⟩ ⟨variable expression⟩
⟨assignment statement⟩ ::= ⟨assignment expression⟩
⟨comparison operator⟩ ::= ‘>’ | ‘<’ | ‘>=’ | ‘<=’ | ‘==’ | ‘!=’
⟨loop initialization⟩ ::= ⟨l-value expression⟩ ‘=’ ⟨integer constant⟩
⟨loop termination test⟩ ::= ⟨l-value expression⟩ ⟨comparison operator⟩ ⟨integer constant⟩
⟨for-loop statement⟩ ::= ‘for’ ‘(’ ⟨loop initialization⟩ ‘;’ ⟨loop termination test⟩ ‘;’ ⟨increment decrement expression⟩ ‘)’ ⟨new-line⟩ ⟨block⟩
⟨if-else statement⟩ ::= ‘if’ ‘(’ ⟨expression⟩ ‘)’ ⟨new-line⟩ ⟨block⟩ ‘else’ ⟨new-line⟩ ⟨block⟩
⟨call statement⟩ ::= ⟨call expression⟩
⟨break statement⟩ ::= ‘break’
⟨continue statement⟩ ::= ‘continue’
⟨goto statement⟩ ::= ‘if’ ‘(’ ⟨variable expression⟩ ‘)’ ⟨new-line⟩ ‘goto’ ⟨space⟩ ⟨jump label identifier⟩
⟨array iteration statement⟩ ::= ⟨for-loop statement⟩
⟨statement⟩ ::= ⟨labelled statement⟩ | ⟨return statement⟩ | ⟨assignment statement⟩ | ⟨for-loop statement⟩ | ⟨if-else statement⟩ | ⟨call statement⟩ | ⟨break statement⟩ | ⟨continue statement⟩ | ⟨goto statement⟩ | ⟨array iteration statement⟩
⟨block⟩ ::= ⟨block open⟩ ⟨local variable declarators⟩ (⟨statement⟩ ‘;’ ⟨new-line⟩)+ ⟨block close⟩
⟨pre-struct pragma⟩ ::= ‘#pragma pack(push)’ ⟨new-line⟩ ‘#pragma pack(1)’
⟨post-struct pragma⟩ ::= ‘#pragma pack(pop)’
⟨non-bitfield field⟩ ::= ⟨type qualifiers⟩ ⟨space⟩ ⟨type specifier⟩ ⟨space⟩ ⟨field identifier⟩
⟨signedness⟩ ::= ‘signed’ | ‘unsigned’
⟨bitfield field⟩ ::= ⟨signedness⟩ ⟨space⟩ ⟨field identifier⟩ ‘:’ ⟨decimal digits⟩
⟨anonymous field⟩ ::= ⟨signedness⟩ ⟨space⟩ ‘:’ ‘0’
⟨struct union field⟩ ::= ⟨non-bitfield field⟩ | ⟨bitfield field⟩ | ⟨anonymous field⟩
⟨struct union fields⟩ ::= (⟨struct union field⟩ ‘;’)+
⟨struct definition⟩ ::= ‘struct’ ⟨space⟩ ⟨struct type identifier⟩ ‘{’ ⟨struct union fields⟩ ‘}’
⟨struct type declarator⟩ ::= ⟨pre-struct pragma⟩ ⟨struct definition⟩ ⟨post-struct pragma⟩
⟨struct type declarators⟩ ::= (⟨struct type declarator⟩ ‘;’)*
⟨union definition⟩ ::= ‘union’ ⟨space⟩ ⟨union type identifier⟩ ‘{’ ⟨struct union fields⟩ ‘}’
⟨union type declarator⟩ ::= ⟨union definition⟩
⟨union type declarators⟩ ::= (⟨union type declarator⟩ ‘;’)*
⟨include header⟩ ::= ‘#include "csmith.h"’ ⟨new-line⟩
⟨define header⟩ ::= ‘static long __undefined;’ ⟨new-line⟩
⟨headers⟩ ::= ⟨include header⟩ ⟨define header⟩
⟨parameter⟩ ::= ⟨parameter type⟩ ⟨space⟩ ⟨parameter identifier⟩
⟨parameters⟩ ::= ⟨parameter⟩ (‘,’ ⟨parameter⟩)* | ⟨empty⟩
⟨function declarator⟩ ::= ⟨qualified type⟩ ⟨space⟩ ⟨function identifier⟩ ‘(’ ⟨parameters⟩ ‘)’
⟨function declarators⟩ ::= (‘static’ ⟨space⟩ ⟨function declarator⟩ ‘;’ ⟨new-line⟩)+
⟨function definition⟩ ::= ⟨function declarator⟩ ⟨block⟩
⟨function definitions⟩ ::= (⟨function definition⟩ ⟨new-line⟩)+
⟨program⟩ ::= ⟨headers⟩ ⟨struct type declarators⟩ ⟨union type declarators⟩ ⟨global variable declarators⟩ ⟨function declarators⟩ ⟨function definitions⟩

REFERENCES

[1] Apple Inc. Developer tools overview. https://developer.apple.com/technologies/tools/.

[2] Arya, A., and Neckar, C. Chromium blog: Fuzzing for security. http://blog.chromium.org/2012/04/fuzzing-for-security.html.

[3] Balston, K., Hu, A. J., Wilton, S. J. E., and Nahir, A. Emulation in post-silicon validation: It’s not just for functionality anymore. 2012 IEEE International High Level Design Validation and Test Workshop (HLDVT) (2012), 110–117.

[4] Boujarwah, A. S., and Saleh, K. Compiler test case generation methods: a survey and assessment. Information and Software Technology 39, 9 (1997), 617–625.

[5] Burgess, C. J., and Saidi, M. The automatic generation of test cases for optimizing Fortran compilers. Information & Software Technology 38, 2 (1996), 111–119.

[6] Campbell, B. An executable semantics for CompCert C. In Proceedings of the Second International Conference on Certified Programs and Proofs (Berlin, Heidelberg, 2012), CPP’12, Springer-Verlag, pp. 60–75.

[7] Kaner, C., Falk, J., and Nguyen, H. Q. Testing Computer Software. Wiley, 1999.

[8] Chen, Y., Groce, A., Zhang, C., Wong, W.-K., Fern, X., Eide, E., and Regehr, J. Taming compiler fuzzers. In Proceedings of the 2013 ACM SIGPLAN Conference on Programming Language Design and Implementation (2013), PLDI ’13, ACM, pp. 197–208.

[9] Cuoq, P., Kirchner, F., Kosmatov, N., Prevosto, V., Signoles, J., and Yakobowski, B. Frama-C: A software analysis perspective. In Proceedings of the 10th International Conference on Software Engineering and Formal Methods (Berlin, Heidelberg, 2012), SEFM’12, Springer-Verlag, pp. 233–247.

[10] Cuoq, P., Monate, B., Pacalet, A., Prevosto, V., Regehr, J., Yakobowski, B., and Yang, X. Testing static analyzers with randomly generated programs. In Proceedings of the 4th International Conference on NASA Formal Methods (Berlin, Heidelberg, 2012), NFM’12, Springer-Verlag, pp. 120–125.

[11] Cuoq, P., Signoles, J., Baudin, P., Bonichon, R., Canet, G., Correnson, L., Monate, B., Prevosto, V., and Puccetti, A. Experience report: OCaml for an industrial-strength static analysis framework. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming (2009), ICFP ’09, ACM, pp. 281–286.

[12] Diehl, S. Natural semantics-directed generation of compilers and abstract machines. Formal Aspects of Computing 12, 2 (2000), 71–99.

[13] Eide, E. Compiler-testing tools. http://www.cs.utah.edu/~eeide/emsoft08/. [14] Eide, E., and Regehr, J. Volatiles are miscompiled, and what to do about it. In EMSOFT (2008), ACM, pp. 255–264.

[15] Ellison, C., and Rosu, G. An executable formal semantics of C with applications. In Proceedings of the 39th annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (New York, NY, USA, 2012), POPL ’12, ACM, pp. 533–544.

[16] GNU. GCC, the GNU compiler collection. http://gcc.gnu.org/.

[17] GNU. GCC research. http://gcc.gnu.org/wiki/GCC_Research.

[18] GNU. How to minimize test cases for bugs. http://gcc.gnu.org/bugs/minimize.html.

[19] GNU. Installing GCC: Testing. http://gcc.gnu.org/install/test.html.

[20] GNU. Language standards supported by GCC. http://gcc.gnu.org/onlinedocs/gcc/Standards.html#Standards.

[21] GNU. Managing bugs. http://gcc.gnu.org/bugs/management.html.

[22] GNU. A new project to merge the existing GCC forks. http://gcc.gnu.org/news/announcement.html.

[23] GNU. People - GCC wiki. http://gcc.gnu.org/wiki/People.

[24] GNU. Status of supported architectures from maintainers’ point of view. http://gcc.gnu.org/backends.html.

[25] GNU. Volatile objects - autoconf. http://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/Volatile-Objects.html.

[26] Groce, A., Zhang, C., Eide, E., Chen, Y., and Regehr, J. Swarm testing. In ISSTA (2012), pp. 78–88.

[27] Hanford, K. V. Automatic generation of test cases. IBM Syst. J. 9, 4 (Dec. 1970), 242–257.

[28] Holler, C., Herzig, K., and Zeller, A. Fuzzing with code fragments. In Proceedings of the 21st USENIX conference on Security Symposium (Berkeley, CA, USA, 2012), Security’12, USENIX Association, pp. 445–458.

[29] ISO/IEC. ISO/IEC 9899:201x. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf.

[30] ISO/IEC. ISO/IEC 9899:TC2. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf.

[31] ISTQB. Certified Tester Foundation Level Syllabus. 2011. http://www.istqb.org/downloads/syllabi/foundation-level-syllabus.html.

[32] Johnstone, A., and Scott, E. A grammar for the 1989 ANSI standard C language. http://www.cs.rhul.ac.uk/research/languages/projects/grammars/c/c89/ansi_c_89/ansi_c_89.raw.

[33] Leitner, A., Ciupa, I., Oriol, M., Meyer, B., and Fiva, A. Contract driven development = test driven development - writing test cases. In Proceedings of the the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (2007), ESEC-FSE ’07, ACM, pp. 425–434.

[34] Leroy, X. Formal verification of a realistic compiler. Communications of the ACM 52, 7 (2009), 107–115.

[35] Leverett, B. W., Cattell, R. G. G., Hobbs, S. O., Newcomer, J. M., Reiner, A. H., Schatz, B. R., and Wulf, W. A. An overview of the production-quality compiler-compiler project. Computer 13, 8 (Aug. 1980), 38–49.

[36] Lindig, C. quest-tester - automatically test calling conventions of C compilers. http://code.google.com/p/quest-tester/.

[37] Lindig, C. Random testing of C calling conventions. In AADEBUG (2005), pp. 3–12.

[38] LLVM. Clang - C language family frontend for LLVM. http://clang.llvm.org/.

[39] LLVM. LLVM bugpoint tool. http://llvm.org/docs/Bugpoint.html.

[40] LLVM. The LLVM compiler infrastructure. http://www.llvm.org/.

[41] LLVM. LLVM language reference manual. http://llvm.org/docs/LangRef.html.

[42] LLVM. LLVM related publications. http://llvm.org/pubs/.

[43] LLVM. LLVM testing infrastructure guide. http://llvm.org/docs/TestingGuide.html.

[44] McKeeman, W. M. Differential testing for software. Digital Technical Journal 10, 1 (1998), 100–107.

[45] McPeak, S., and Wilkerson, D. S. Delta. http://delta.tigris.org/.

[46] Morisset, R., Pawan, P., and Zappa Nardelli, F. Compiler testing via a theory of sound optimisations in the C11/C++11 memory model. In Proceedings of the 2013 ACM SIGPLAN Conference on Programming Language Design and Implementation (2013), PLDI ’13, ACM, pp. 187–196.

[47] Morrisett, G., Tan, G., Tassarotti, J., Tristan, J.-B., and Gan, E. Rocksalt: better, faster, stronger SFI for the x86. In Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation.

[48] Myreen, M. O., Gordon, M. J. C., and Slind, K. Decompilation into logic - improved. In FMCAD (2012), G. Cabodi and S. Singh, Eds., IEEE, pp. 78–81.

[49] Naroff, S. New LLVM C front-end. http://llvm.org/devmtg/2007-05/09-Naroff-CFE.pdf.

[50] Nethercote, N., and Seward, J. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (New York, NY, USA, 2007), PLDI ’07, ACM, pp. 89–100.

[51] Perennial Inc. ACVS ANSI/ISO/FIPS-160 C validation suite. http://www.peren.com/pages/acvs_set.htm.

[52] Plum Hall Inc. The Plum Hall validation suite for C. http://www.plumhall.com/stec.html.

[53] Purdom, P. A sentence generator for testing parsers. BIT Numerical Mathematics 12, 3 (1972), 366–375.

[54] Regehr, J., Chen, Y., Cuoq, P., Eide, E., Ellison, C., and Yang, X. Test-case reduction for C compiler bugs. In Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation.

[55] Ruderman, J. Introducing jsfunfuzz. http://www.squarefree.com/2007/08/02/introducing-jsfunfuzz/.

[56] Sauder, R. L. A general test data generator for COBOL. In Proceedings of the May 1-3, 1962, Spring Joint Computer Conference (New York, NY, USA, 1962), AIEE-IRE ’62 (Spring), ACM, pp. 317–323.

[57] Sewell, T. A. L., Myreen, M. O., and Klein, G. Translation validation for a verified OS kernel. In Proceedings of the 2013 ACM SIGPLAN Conference on Programming Language Design and Implementation (New York, NY, USA, 2013), PLDI ’13, ACM, pp. 471–482.

[58] Sheridan, F. Practical testing of a C99 compiler using output comparison. Software: Practice and Experience 37, 14 (2007), 1475–1488.

[59] Software Engineering Institute, C. M. U. 04. Integers (INT) - secure coding - CERT secure coding standards. https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=270.

[60] Tarski, A. A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics 5, 2 (1955), 285–309.

[61] The C++ Standards Committee. ISO/IEC JTC1/SC22/WG21. http://www.open-std.org/jtc1/sc22/wg21/.

[62] TIOBE Software. TIOBE software: The coding standards company. http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html.

[63] Tristan, J.-B., Govereau, P., and Morrisett, G. Evaluating value-graph translation validation for LLVM. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation (New York, NY, USA, 2011), PLDI ’11, ACM, pp. 295–305.

[64] Tuch, H., Klein, G., and Norrish, M. Types, bytes, and separation logic. In Proceedings of the 34th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (New York, NY, USA, 2007), POPL ’07, ACM, pp. 97–108.

[65] Turner, B. Random C program generator. https://sites.google.com/site/brturn2/randomcprogramgenerator/.

[66] White, B., Lepreau, J., Stoller, L., Ricci, R., Guruprasad, S., Newbold, M., Hibler, M., Barb, C., and Joglekar, A. An integrated experimental environment for distributed systems and networks. In Proc. of the Fifth Symposium on Operating Systems Design and Implementation (Boston, MA, Dec. 2002), USENIX Association, pp. 255–270.

[67] Wikipedia. Compiler - Wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/Compiler.

[68] Wikipedia. Compiler correctness - Wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/Compiler_correctness.

[69] Wikipedia. Fuzz testing. http://en.wikipedia.org/wiki/Fuzz_testing.

[70] Wikipedia. GNU Compiler Collection - Wikipedia, the free encyclopedia. https://en.wikipedia.org/wiki/GNU_Compiler_Collection.

[71] Wikipedia. History of compiler construction. http://en.wikipedia.org/wiki/History_of_compiler_writing.

[72] Wikipedia. Infinite monkey theorem. http://en.wikipedia.org/wiki/Infinite_monkey_theorem.

[73] Wikipedia. Software testing - Wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/Software_testing.

[74] Yang, X., Chen, Y., Eide, E., and Regehr, J. Csmith, GitHub repository. https://github.com/csmith-project/csmith.

[75] Yang, X., Chen, Y., Eide, E., and Regehr, J. Finding and understanding bugs in C compilers. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation (2011), PLDI ’11, ACM, pp. 283–294.

[76] Yee, B., Sehr, D., Dardyk, G., Chen, J. B., Muth, R., Ormandy, T., Okasaka, S., Narula, N., and Fullagar, N. Native Client: A sandbox for portable, untrusted x86 native code. In Proceedings of the 2009 IEEE Symposium on Security and Privacy (2009).

[77] Zeller, A., and Hildebrandt, R. Simplifying and isolating failure-inducing input. IEEE Trans. Softw. Eng. 28, 2 (Feb. 2002), 183–200.

[78] Zhao, C., Xue, Y., Tao, Q., Guo, L., and Wang, Z. Automated test program generation for an industrial optimizing compiler. In AST (2009), IEEE, pp. 36–43.