Program Analysis of Temporal

Memory Mismanagement

by

Hua Yan

B.Sc., Peking University, China, 2007

M.Sc., Peking University, China, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN THE SCHOOL OF

Computer Science and Engineering THE UNIVERSITY OF NEW SOUTH WALES

SYDNEY· AUSTRALIA

Monday 22nd January, 2018

All rights reserved. This work may not be reproduced in whole or in part, by

photocopy or other means, without the permission of the author.

© Hua Yan 2018 PLEASE TYPE THE UNIVERSITY OF NEW SOUTH WALES Thesis/Dissertation Sheet

Surname or Family name: Yan

First name: Hua Other name/s:

Abbreviation for degree as given in the University calendar: PhD

School: School of Computer Science and Engineering Faculty: Faculty of Engineering

Title: Program Analysis of Temporal Memory Mismanagement

Abstract 350 words maximum: (PLEASE TYPE)

For CIC++ programs, the performance benefits obtained from flexible low-level memory access and management sacrifice language-level support for memory safety and garbage collection. Memory-related programming mistakes are introduced as a result, rendering CIC++ programs prone to memory errors. A common category of programming mistakes is defined by the misplacement of deallocation operations, also known as temporal memory mismanagement, which can generate two types of bugs: (1) use-after-free (UAF) bugs and (2) memory leaks. The former are severe security vulnerabilities that expose programs to both data and control-flow exploits, while the latter are critical performance bugs that compromise availability and reliability. In the case of UAF bugs, existing solutions, which almost exclusively rely on dynamic analysis, suffer from limitations, including low coverage, binary incompatibility and high overheads. In the case of memory leaks, detectiontechniques are abundant; however, fixing techniques have been poorly investigated.

This thesis presents three novel program analysis frameworks to address temporal memory mismanagement in /C++. First, we introduce Tac, the first static UAF detection framework to combine typestate analysis with machine learning. Tac identifies representative features to train an SVM to classify likely true/false UAF candidates, and thereby providing guidance for typestate analysis used to locate bugs with precision.

We then present CRed, a pointer-analysis-based framework for UAF detection with a novel context-reduction technique and a new demand-driven path-sensitive pointer analysis to boost scalability and precision. A major advantage of CRed is its ability to substantially and soundly reduce search space without losing bug-finding ability. This is achieved by utilizing must-not-alias information to truncate off unnecessary segments of calling contexts.

Finally, we propose AutoFix, an automated memory leak fixing framework based on value-flow analysis and static instrumentation that can fix memory leaks safely and precisely with negligible overheads.

The contribution of this thesis is threefold. First, we advance existing state-of-the-art solutions to temporal memory mismanagement by proposing a series of novel program analysis techniques. Second, corresponding prototype tools are fully implemented in the LLVM compiler framework. Third, an extensive evaluation of open-source benchmarks is conducted to validate the effectiveness of the proposed techniques.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in DissertationAbstracts International (this is applicable to doctoral theses only).

The University recognises that there may be exceptional circumstances requiring restrictions on· copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.

FOR OFFICE USE ONLY Date of completion of requirements for Award: ORIGINALITY STATEMENT

'I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.'

Signed

Date COPYRIGHT STATEMENT

'I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstract International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.'

Signed

Date

AUTHENTICITY STATEMENT

'I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.'

Signed

Date Abstract

For C/C++ programs, the performance benefits obtained from flexible low-level memory access and management sacrifice language-level support for memory safety and garbage collection. Memory-related programming mistakes are introduced as a result, rendering C/C++ programs prone to memory errors. A common category of programming mistakes is defined by the misplacement of deallocation operations, also known as, temporal memory mismanagement, which can generate two types of bugs: (1) use-after-free (UAF) bugs and (2) memory leaks. The former are severe security vulnerabilities that expose programs to both data and control-flow exploits, while the latter are critical performance bugs that compromise software availability and reliability. In the case of UAF bugs, existing solutions, which almost exclusively rely on dynamic analysis, suffer from limitations, including low code coverage, binary incompatibility and high overheads. In the case of memory leaks, detection techniques are abundant; however, fixing techniques have been poorly investigated.

In this thesis, we present three novel program analysis frameworks to address temporal memory mismanagement in C/C++. First, we introduce Tac, the first static UAF detection framework to combine typestate analysis with machine learn- ing. Tac identifies representative features to train a Support Vector Machine to classify likely true/false UAF candidates, thereby providing guidance for typestate

i ii analysis used to locate bugs with precision.

We then present CRed, a pointer-analysis-based framework for UAF detection with a novel context-reduction technique and a new demand-driven path-sensitive pointer analysis to boost scalability and precision. A major advantage of CRed is its ability to substantially and soundly reduce search space without losing bug-

finding ability. This is achieved by utilizing must-not-alias information to truncate unnecessary segments of calling contexts.

Finally, we propose AutoFix, an automated memory leak fixing framework based on value-flow analysis and static instrumentation that can fix all leaks re- ported by any front-end detector with negligible overheads safely with precision.

AutoFix tolerates false leaks with a shadow memory data structure carefully de- signed to keep track of the allocation and deallocation of potentially leaked memory objects.

The contribution of this thesis is threefold. First, we advance existing state- of-the-art solutions to temporal memory mismanagement by proposing a series of novel program analysis techniques. Second, corresponding prototype tools are fully implemented in the LLVM compiler framework. Third, an extensive evaluation of open-source C/C++ benchmarks is conducted to validate the effectiveness of the proposed techniques. Publications

• Hua Yan, Yulei Sui, Shiping Chen and Jingling Xue. Spatio-Temporal Context Reduction: A Pointer-Analysis-Based Static Approach for Detecting

Use-After-Free Vulnerabilities. International Conference on Software Engi-

neering (ICSE ’18).

• Hua Yan, Yulei Sui, Shiping Chen and Jingling Xue. Machine-Learning- Guided Typestate Analysis for Static Use-After-Free Detection. Annual Com-

puter Security Applications Conference (ACSAC ’17).

• Hua Yan, Yulei Sui, Shiping Chen and Jingling Xue. AutoFix: An Auto- mated Approach to Memory Leak Fixing on Value-Flow Slices for C Programs

ACM SIGAPP Applied Computing Review (ACR ’17).

• Hua Yan, Yulei Sui, Shiping Chen and Jingling Xue. Automated Memory Leak Fixing on Value-Flow Slices for C Programs. ACM Symposium on

Applied Computing (SAC ’16).

iii Acknowledgements

I would like to thank sincerely all those who have been helpful and supportive during my PhD study.

First of all, I would like to thank my supervisor, Prof. Jingling Xue for in- troducing me the exciting world of research. Prof. Xue is a knowledgeable and experienced scientist with an in-depth understanding of computer science. He is a patient advisor who always inspires me and broadens my horizon. I am honored to have the opportunity to work with him. This thesis would not be possible without his constant support, guidance and encouragement.

Second, I thank my co-supervisor Dr. Shiping Chen for providing me the pre- cious opportunity to collaborate with data61 CSIRO. Dr. Chen always gives me encouragement when I work on challenging topics, and is always helpful when I have difficulties. His insightful guidance is an essential part of my progress.

Third, I would like to thank my co-supervisor and colleague Dr. Yulei Sui for his collaboration on a series of fascinating topics. Dr. Sui is an enthusiastic young scholar who is always willing to help and share. His penetrating understanding of programming languages and solid programming skills had great influence on me. I am in debt to him for discussing ideas with me and revising my papers.

I am also thankful to everyone of our research group: Xinwei Xie, Peng Di, Ding

Ye, Sen Ye, Yue Li, Hao Zhou, Tian Tan, Xiaokang Fan, Feng Zhang, Yifei Zhang,

iv v

Jieyuan Zhang, Jie Liu, Jingbo Lu, Diyu Wu and Xuezheng Xu. It was them who made days at K17 501 cheerful and memorable.

Finally, I give my special gratitude to my parents and my wife for their uncon- ditional love. Contents

Abstracti

Acknowledgements iv

List of Figures xi

List of Tables xii

List of Algorithms xiii

1 Introduction1

1.1 Challenges...... 3

1.2 Our Approaches...... 5

1.3 Contributions...... 6

1.4 Thesis Organization...... 8

2 Background9

2.1 Temporal Memory Mismanagement...... 9

2.2 Whole-Program Analysis in LLVM...... 12

2.3 Pointer Analysis...... 14

3 Machine-Learning-Guided UAF Detection 18

vi CONTENTS vii

3.1 Overview...... 18

3.2 The Tac Framework...... 23 3.2.1 The Training Phase...... 24

3.2.2 The Analysis Phase...... 25

3.2.3 Examples...... 27

3.2.4 Discussion...... 30

3.3 Design...... 30

3.3.1 Training...... 30

3.3.2 Typestate Analysis...... 34

3.4 Evaluation...... 41

3.4.1 The Training Phase...... 43

3.4.2 The Analysis Phase...... 44

3.4.3 Case Study...... 48

3.5 Chapter Summary...... 50

4 Pointer-Analysis-Based UAF Detection 52

4.1 Overview...... 52

4.1.1 Problem Statement...... 53

4.1.2 Challenges...... 53

4.1.3 State of the Art...... 54

4.1.4 Our Solution and Contributions...... 56

4.2 The CRed Framework...... 58 4.2.1 Calling-Context Reduction...... 58

4.2.2 Path Reduction...... 64

4.3 Design...... 65

4.3.1 Demand-Driven Pointer Analysis...... 65

4.3.2 Multi-Stage UAF Analysis...... 70 CONTENTS viii

4.3.3 Spatio-Temporal Context Reduction...... 71

4.4 Implementation...... 76

4.5 Evaluation...... 77

4.5.1 Methodology...... 78

4.5.2 Benchmarks...... 78

4.5.3 Experimental Setup...... 79

4.5.4 Results and Analysis...... 80

4.5.5 Limitations...... 87

4.6 Chapter Summary...... 88

5 Memory Leak Fixing 89

5.1 Overview...... 89

5.2 A Motivating Example...... 94

5.3 The AutoFix Framework...... 97 5.3.1 Step 1: Constructing a Value-Flow Graph...... 97

5.3.2 Step 2: Locating Functions to Fix...... 99

5.3.3 Step 3: Identifying Leaky Paths...... 101

5.3.4 Step 4: Liveness Analysis...... 102

5.3.5 Step 5: Instrumentation...... 102

5.3.6 The Metadata Structure...... 103

5.4 Evaluation...... 106

5.4.1 Efficiency of AutoFix...... 107

5.4.2 Effectiveness of AutoFix...... 109

5.4.3 Runtime Overhead...... 111

5.5 Discussion...... 113

5.6 Chapter Summary...... 114 CONTENTS ix

6 Related Work 115

6.1 UAF Detection...... 115

6.2 UAF Protection...... 116

6.3 Static Analysis for Memory Error Detection...... 116

6.4 Machine Learning for Bug Detection...... 117

6.5 Memory Leak Detection...... 117

6.6 Memory Leak Tolerance...... 118

6.7 Leak Fixing...... 118

6.8 Garbage Collection...... 119

7 Conclusions 121

Bibliography 124 List of Figures

2.1 Use-after-free vulnerabilities in NVD...... 10

2.2 Exploiting a UAF bug...... 11

2.3 Our analysis in the LLVM tool chain...... 13

2.4 Andersen-style pointer analysis...... 15

2.5 Demand-driven flow- and context-insensitive pointer analysis.... 16

2.6 Demand-driven path-sensitive pointer analysis...... 17

3.1 A finite state automaton (FSA) for UAF...... 19

3.2 Tac framework...... 24 3.3 Examples illustrating imprecise UAF detection...... 26

3.4 Tac’s data-flow functions...... 38

3.5 An example for illustrating Tac...... 40 3.6 A UAF bug found in less ...... 48

3.7 CVE-2016-4817 and two new bugs in h2o...... 49

3.8 CVE-2015-1351 and two new bugs in php...... 50

4.1 Workflow of CRed...... 58 4.2 An example of context reduction...... 59

4.3 Points-to sets of the program in Figure 4.2...... 61

4.4 ICFG of the program in Figure 4.2...... 62

x LIST OF FIGURES xi

4.5 A conceputual illustration of context reduction...... 63

4.6 Path reduction...... 65

4.7 Demand-driven path-sensitive pointer analysis...... 67

4.8 Path guard construction on a CFG...... 69

4.9 Context reduction...... 72

4.10 Adding path guards to calling contexts...... 74

4.11 Hit rates for JTS...... 80

4.12 Percentage distribution of CRed’s analysis times...... 84

4.13 Case study for CRed in real-world applications...... 85

5.1 Incorrect and correct fixes...... 91

5.2 The AutoFix framework...... 92 5.3 A motivating example...... 95

5.4 Value-flow example...... 98

5.5 The metadata structure design...... 104

5.6 Implementation of shadow metadata operations...... 105

5.7 Memory footprint of redis-2.8 before and after fixing the leak.... 108

5.8 Fixing path correlated leaks in AutoFix...... 110

5.9 The leaky code (in net.c of redis-2.8) fixed by AutoFix...... 111 5.10 Comparing runtime overheads for different metadata structures... 112

5.11 Runtime overhead breakdown...... 113 List of Tables

3.1 ESP-based Typestate analysis...... 29

3.2 Features used by the SVM classifier...... 31

3.3 Program statements in the LLVM-like SSA form...... 35

3.4 Open-source benchmarks...... 42

3.5 UAF bugs detected by Tac ...... 43

3.6 Results of training of Tac...... 45

3.7 Tac’s warning-reduction ability...... 46

3.8 Tac’s efficiency and effectiveness...... 46

4.1 Benchmarks...... 78

4.2 Experimental results of CRed ...... 81 4.3 Experimental results of existing tools...... 82

5.1 Benchmark characteristics...... 106

5.2 Compile-time statistics of AutoFix ...... 108

5.3 Run-time statistics of AutoFix ...... 109

xii List of Algorithms

1 Tac’s intraprocedural typestate analysis...... 36

2 Liveness Analysis (for leaked object o)...... 100

3 Instrumentation (for leaked object o)...... 103

xiii Chapter 1

Introduction

C and C++ are the de facto standard programming languages used to develop performance-critical software including operating systems, database systems, em- bedded software, web browsers and game applications. Owing to explicit memory management and low-level memory access, C/C++ programs can make use of mem- ory more directly and efficiently, thereby improving overall performance; however, unlike Java and C#, C and C++ do not provide garbage collection at the language level, and are thus known as unmanaged languages. Adding language-level garbage collection to C/C++ is infeasible for three reasons. First, garbage collection is often too expensive to reconcile with the high-performance objective of C/C++. Second, garbage collection relies on type safety, a feature unavailable in C/C++. Due to arbitrary pointer arithmetic, unsafe type casting and pointer-based array access, a

C/C++ program could refer to any memory addresses at any time, therefore, no memory addresses can be regarded as not in use, and no memory can be reclaimed safely [3]. Third, compatibility issues with legacy systems hinder the adoption of garbage collection and other workaround solutions such as conservative garbage collection [11, 12].

1 Chapter 1. Introduction 2

The absence of garbage collection gives rise to the issue of temporal memory mis- management, which refers to situations in which the deallocation is either missing or not performed at the appropriate time. Specifically, temporal memory misman- agement causes two common types of temporal error, those being use-after-free

(UAF) and memory leaks. The former are generally regarded as severe security vulnerabilities that expose programs to both data and control-flow exploits, while the latter are critical performance bugs that compromise software availability and reliability.

While other types of memory corruption errors such as buffer overflows are harder to exploit nowadays due to mitigations, few mitigations are deployed in production environments to prevent UAF vulnerabilities [149]. Considerable ef- forts have been made to build automatic tools for mitigating UAF bugs; however, existing solutions almost exclusively rely on dynamic analysis [16, 29, 67, 83, 105,

127, 140, 149], which inserts metadata-manipulating instrumentation code into the program, and detects or protects against UAF bugs at runtime by performing checks at all pointer dereferences [16, 83, 105, 140], or by invalidating all identified dangling pointers [67, 149]. While maintaining zero or low false alarms (due to unsound modeling for, e.g., casting [83] and safety window sizes [16]), dynamic techniques have a number of limitations, including low code coverage (when used as debugging aids), binary incompatibility (due to memory layout transformations such as fat pointers [140]), and high runtime and memory overheads (due to run- time instrumentation). The use of static analysis to detect UAF bugs will not suffer from such instrumentation-based limitations. However, static techniques for UAF detection are scarce, with [39] mainly focusing on binary code, although there are several analysis tools for detecting other types of memory corruption bugs, such as buffer overflows [63, 70] and null dereferences [33, 79]. Chapter 1. Introduction 3

In contrast to the study of UAF, there are many static approaches [20, 55, 92,

120, 136] used to detect memory leaks. Dynamic detectors [14, 84] locate memory leaks by instrumenting and tracking memory accesses at runtime, incurring high overheads. By testing a program under some inputs, dynamic detectors typically compute an under-approximation that produces no false positives but potentially misses many bugs. Static detectors [20, 55, 119, 120], which over-approximate the runtime behaviors of the program without executing it, can soundly pinpoint all the leaks in the program, but this may generate some false positives. Despite substantial advances in recent years, existing (static and dynamic) solutions to memory leaks mainly focus on bug detection; fixing each leak along every leaky path must still be done manually by programmers. Bug fixing is a time-consuming and error-prone task [52, 110] that costs the US billions of dollars every year [65,

154]. Automatic bug fixing (or program repairing) has been hotly debated in recent years [52, 64, 75, 80, 133, 138]; however, memory leak fixing has rarely been investigated.

This thesis focuses on program analysis techniques for temporal memory mis- management, and attempts to advance state-of-the-art UAF and memory leak so- lutions.

1.1 Challenges

In general, the program analysis of temporal memory mismanagement faces the following challenges:

• The complexity of C/C++ programs. The program complexity poses considerable challenges to the analysis of memory management: (1) C/C++

programs are often large-sized, and C/C++ applications with millions of lines Chapter 1. Introduction 4

of code are not uncommon; (2) C/C++ programs typically exhibit complex

control-flow (e.g., deep function calls, recursion and nested loops), and ex-

tensively use data structures (e.g., array and linked-list); and (3) memory

management in C/C++ programs is often complex, with allocations/access-

es/deallocations spread across many functions along a possibly infinite num-

ber of paths [16, 67].

• Tradeoff between precision, soundness and scalability. In general, the problem of detecting and fixing memory errors is undecidable. Practical

solutions [63, 70, 119] need to trade-off between precision (i.e., the ability

to report as few false alarms as possible) and soundness (i.e., the ability to

identify as many true bugs as possible), while taking scalability (i.e., the

ability to apply in large programs) into account. Each problem has its own

specific characteristics, so it is unrealistic to design a single strategy that is

optimal for all problems. A pratical challenge associated with the specific

problem of temporal memory mismanagement is how to mine its characteris-

tics and design customized solutions with reasonable precision and soundness

for real-world applications.

• The gap between pointer analysis and its clients. A pointer analysis is often employed to disambiguate memory references. In this sense, the

analysis of temporal memory mismanagement can be viewed as a client of

pointer analysis; however, pointer analysis and its clients are typically solved

in separate problem domains without considering correlations, which can be a

significantly imprecise approach. Although overall precision can be improved

by optimizing the analysis of each domain separately, fundamentally, the

solution lies in the necessary correlations between pointer analysis and its

clients being established [25, 40]. Chapter 1. Introduction 5

1.2 Our Approaches

To address the aforementioned challenges, this thesis proposes three program anal- ysis techniques, two for UAF detection and one that fixes memory leaks. Our approaches scale to real-world C/C++ applications (millions of lines of code) with reasonable precision and soundness. To bridge the gap between pointer analysis and the client, we investigate three solutions: (1) establishing correlations between them via machine learning, (2) unifying their domains with reduction techniques to avoid space explosion, and (3) dynamically resolving the imprecision of static analysis.

• Machine-learning-guided typestate analysis for UAF detection. We formulate temporal memory management as a finite state automaton with

typestate properties and state transitions. The bulk of our UAF detection

approach with this automaton is a typestate analysis that relies on pointer

analysis to disambiguate indirect memory references. Machine learning is

used to guide the typestate analysis to precisely identify UAF-related aliases

in the UAF detection setting. Our approach first learns the correlations

between program features and UAF-related aliases by using a Support Vector

Machine (SVM), and then applies this knowledge to further disambiguate the

UAF-related aliases reported imprecisely by the pointer analysis, so that only

the aliases validated by its SVM classifier are subsequently investigated in the

typestate analysis.

• Pointer-analysis-based static UAF detection. We introduce a pointer- analysis-based static analysis for finding UAF bugs. The central idea is to

unify the domains of UAF detection and pointer analysis to increase the pre-

cision of UAF detection. Our approach relies on a novel reduction technique Chapter 1. Introduction 6

that can significantly reduce the search space without missing any true bugs.

To improve the efficiency of the analysis, we adopt a multi-stage strategy that

starts with a context-insensitive pre-analysis, and then performs a context-

sensitive analysis, followed by a final path-sensitive analysis. In addition,

our approach incorporates a path-sensitive demand-driven pointer analysis

for finding the precise points-to information required.

• Memory leak fixing on value-flow slices. We introduce a fully automated leak-fixing approach by combining static and dynamic program analyses. Our

approach takes as input the leaky allocation sites reported by a static mem-

ory leak detector. To perform precise analysis, a value-flow graph is first

constructed to capture the def-use chains of both top-level and address-taken

variables in the program. Next, for each leaky allocation site, our approach

computes a value-flow slice, on which a graph reachability analysis is per-

formed to identify leaky paths. Finally, a liveness analysis is conducted to

locate the program points on the identified leaky paths at which fixes (i.e.,

the missing free calls) are to be inserted.

1.3 Contributions

We have developed three frameworks — Tac, CRed and AutoFix— to address temporal memory mismanagement. The contributions of this thesis are as follows:

• Tac. We propose Tac, a novel static UAF detector based on typestate analysis and machine learning. Tac relies on an SVM classifier customized for UAF detection with a set of features that can effectively disambiguate UAF-

related aliases. Tac trains the SVM classifier using both false and true UAF samples in a set of real-world programs, and then uses it to guide a typestate Chapter 1. Introduction 7

analysis to find UAF bugs with precision. Tac is fully implemented in LLVM. As evaluated by its use in eight open-source C/C++ programs (totaling over

2 MLOC), Tac is able to reduce the initial 19,083 UAF candidates to only

266 warnings reported. Tac is effective with a low false-positive rate: 109 out of the 266 are true UAF bugs, which represent 14 distinct ones, including

5 CVE vulnerabilities, 1 known bug, and 8 previously unknown bugs.

• CRed. We introduce CRed, a pointer-analysis-based static analysis for detecting UAF bugs. At the core of CRed is a novel context reduction tech- nique that significantly reduces search space by eliminating irrelevant calling

contexts. CRed’s precision relies on a new demand-driven pointer analysis

that is field-, flow-, context- and path-sensitive. CRed’s efficiency is achieved by adopting a multi-stage strategy that starts with a coarse pre-analysis and

then progressively refines the results. We have implemented CRed in LLVM and evaluated it using all the UAF test cases in the Juliet Test Suite (JTS) [1]

and 10 open-source real-world applications. For JTS, CRed is able to detect all 138 known UAF bugs without reporting any false alarms. For real-world

applications (totaling over 3 MLOC), CRed produces 132 warnings including

85 bugs in about 7.6 hours. The comparison with CBMC [59], Clang [5],

Coccinelle [91], and Supa [116] suggests that CRed significantly outper- forms state-of-the-art general-purpose bug detectors in UAF detection.

• AutoFix. We present AutoFix, a fully automated approach to memory leak fixing that can reclaim all the leaked memory objects reported by leak

detectors. AutoFix achieves this by combining static and dynamic analyses. An interprocedural static analysis is first performed to identify the earliest

program points into which the missing free calls can be inserted without

introducing any UAF errors. Next, a dynamic analysis that operates on Chapter 1. Introduction 8

efficient shadow memory tracks the potentially leaked memory objects and

performs runtime checking to fix true leaks without introducing any double-

free errors. We have implemented AutoFix in LLVM and evaluated it using eight open-source programs (totaling 534 KLOC). Experimental results show

that AutoFix can safely fix all memory leaks reported with an average runtime overhead of less than two percent.

1.4 Thesis Organization

The reminder of this thesis is organized as follows. Chapter2 gives the background to the study in which we introduce temporal memory mismanagement and briefly discuss LLVM-based whole-program analysis and pointer analysis. In Chapter3, we present a machine-learning-guided typestate analysis for UAF detection. Chapter4 describes pointer-analysis-based static UAF detection. Chapter5 elaborates upon a fully automatic approach to memory leak fixing on value-flow slices. We review related work in Chapter6. Chapter7 draws conclusions. Chapter 2

Background

This chapter provides background to this thesis. Section 2.1 introduces the prob- lem of temporal memory mismanagement in C/C++ programs and discusses UAF and memory leak bugs, which are targets for analysis in this thesis. Section 2.2 provides the background to LLVM, considering that the techniques proposed are implemented within it. Section 2.3 introduces pointer analysis, which is essential for analyzing memory errors.

2.1 Temporal Memory Mismanagement

Human errors are introduced into C/C++ programs when programmers perform low-level memory access and management. Correspondingly, memory errors in

C/C++ programs can be broadly classified into two categories: incorrect access and mismanagement. The former category includes buffer overflows, null pointer dereferences and wild pointer dereferences. The latter category can be further classified into two sub categories: (1) spatial mismanagement, which may happen at both the allocation site (by allocating more memory than available) and the deallocation site (by freeing an invalid address); and (2) temporal mismanagement,

9 Chapter 2. Background 10

300 250 High Severity (7 - 10) 200 All Severity Levels (0 - 10) 150 100 50 0

Figure 2.1: Use-after-free vulnerabilities in NVD [86]. which occurs when the deallocation call (i.e., free/delete) is either misplaced before uses (causing UAF bugs) or is missing (causing memory leaks). Note that double- free bugs (freeing the same memory object twice) are considered as a special case of UAF bugs.

Use-after-free UAF bugs, also known as dangling pointer dereferences, refer to bugs that access memory objects that have already been freed. As a common class of security vulnerabilities, UAF bugs are increasingly being exploited, as shown in Figure 2.1. UAF vulnerabilities are highly dangerous, with 80.14% in the US

National Vulnerability Database (NVD) being rated critical or high in severity; they are known to cause crashes [18], silent data corruption [23, 141], information leaks [67, 106], and control-flow hijacking attacks [16, 34, 37]. This vulnerability class persists in all kinds of C/C++ applications. While other types of memory corruption errors such as buffer overflows are harder to exploit nowadays due to existing mitigations, there are few mitigations deployed in production environments to prevent UAF vulnerabilities [149].

Figure 2.2 illustrates how UAF bugs can be exploited to allow arbitrary code execution. Figure 2.2(a) shows a crafted program, while Figure 2.2(b) demonstrates its runtime memory layout at each program point, highlighted by corresponding color. Line 1 defines a function pointer type, and line 2 defines a function named Chapter 2. Background 11

1: typedef void (*func_ptr)(); p1 address of 2: void foo() {…} p2 foo

int main() { 3: Time 4: func_ptr* p1 = malloc(4); p1 p2 freed 5: func_ptr* p2 = p1; 6: *p1 = &foo; 7: free(p1); long int* q = malloc(4); p1 8: p2 user input 9: *q = userInput(); q 10: (*p2)(); }

(a) Program with a UAF at lines 7 and 10 (b) Runtime memory layout

Figure 2.2: Exploiting a UAF bug. foo. In lines 4-6, p1 and p2 both point to a heap memory address that contains a function pointer pointing to function foo. The memory address is freed via p1 in line 7, where p2 becomes a dangling pointer. Next, the program allocates memory for q in line 8, and the previously freed memory can be reused (subject to the underlying memory allocation algorithm), so that q may point to the same memory address that is also pointed to by p2, as shown in the bottom section of Figure 2.2(b). In line 9, a user-controlled value is written into the memory address via q, which is subsequently called as a function address in line 10. This way, the control-flow of the program is hijacked. While this simple example illustrates the basic idea, sophisticated attacking techniques such as heap spraying [30] and return-oriented programming [107] are often employed to facilitate the control-flow hijacking in real-world settings (in the absence of fine-grained CFI protection [2,

38, 87, 125, 128]).

Memory leak In C/C++ programs, a memory leak occurs when the program- mer forgets to free an allocated memory object after it is no longer used. Memory leaks are a common type of performance bug that undermines software reliability. Chapter 2. Background 12

When leaked memory piles up, it gradually decreases the free memory available in the system. Next, insufficient memory causes a high rate of system paging activity

(a phenomenon known as thrashing), thereby compromising the quality-of-service

(QoS) of not only the application itself, but also all the programs running on the same physical machine. On the software security front, memory leaks also pose risks of denial-of-service (DoS) attacks. Examples of this include CVE-2017-9439,

CVE-2017-8804 and CVE-2017-8779, all documented in the CVE database [86].

The consequences of memory leaks can be disastrous. In 2012, Amazon Web Ser- vices (AWS) outaged due to a memory leak, affecting millions of customers for hours [135]. In 1992, the London Ambulance Service (LAS) experienced a fail- ure in their dispatch system due to a memory leak bug, which caused significant ambulance delays and as many as 46 deaths [27].

2.2 Whole-Program Analysis in LLVM

The analysis techniques introduced in this thesis are whole-program analysis im- plemented in LLVM.

The necessity of whole-program analysis UAF analysis is a typical multi- point problem. Given a UAF candidate, it requires multiple locations spanning across the whole program to be considered, including the allocation, deallocation and dereference sites. Similarly, the analysis of a potential memory leak needs to take into account the allocation and deallocation sites at a minimum. These program locations may span multiple functions and files. Therefore, not to miss bugs, an interprocedural whole-program analysis is required. Chapter 2. Background 13

x.c x.bc y.c Parser y.bc clang z.c z.bc

x.opt.bc y.opt.bc Optimizer llvm-opt z.opt.bc

IR Linker whole.bc whole.opt.bc llvm-link

Code Our Analysis Generator llvm-llc

whole.exe Binary Linker whole.o ld

Figure 2.3: Our analysis in the LLVM tool chain.

Our analysis in LLVM LLVM [62] is an industrial-strength compiler infras- tructure that provides an SSA-based intermediate representation (i.e., LLVM IR) for analysis and transformation.

Figure 2.3 illustrates the workflow of the LLVM compilation tool chain and the position of our analysis. First, the LLVM front-end clang parses the source files and generates LLVM bitcode files (with .bc extension) which put the program in LLVM

IR. Next, the optimizer llvm-opt optimizes the bc files individually, before they are linked together into a whole-program bc file whole.bc by the IR linker llvm-link.

Whole-program optimization is then performed, which transforms whole.bc into Chapter 2. Background 14 whole.opt.bc. After that, code generation llvm-llc is conducted, which outputs the assembly code whole.o. Finally, whole.o is linked with libraries, producing the executable binary whole.exe. The analysis in this thesis (highlighted in green in Figure 2.3) is performed on the whole-program bc file produced by the IR linker.

The advantages of LLVM IR for analysis The LLVM IR is a RISC-like in- struction set that is both source-language-independent and platform-independent.

It preserves or provides key high-level information for effective analysis, including type information, control flow graphs, and an explicit dataflow representation with an infinite set of virtual registers in Static Single Assignment (SSA) form [26]. In

LLVM, there are no implicit memory accesses, because values transferred between registers and memory are only allowed by using load and store instructions with pointers. All addressable objects (or left values) are explicitly modeled: heap mem- ory allocation and deallocation are realized by the malloc and free instructions, while stack memory is explicitly allocated via alloca and automatically deallo- cated. For each global variable and function definition, a unique symbol is defined to represent its address. In this manner, a unified memory model is defined in

LLVM, in which all memory operations, including call instructions, occur through pointers. This substantially simplifies the analysis of memory.

2.3 Pointer Analysis

Almost every non-trivial program analysis task for C/C++ programs requires pointer analysis due to the pervasive use of pointers in the program. Strictly speak- ing, pointer analysis comes in two forms: (1) may-alias analysis and (2) must-alias analysis. Must-alias analysis is an unsound under-approximation, and is thus not suitable for techniques in pursuit of soundness. In this thesis, we adopt may-alias Chapter 2. Background 15 analysis. Hereafter, we mean may-alias analysis when we say pointer analysis, and the two terms are used interchangeably.

The most straight-forward pointer analysis is Andersen analysis [6], which ex- presses constraints among points-to sets as set-inclusion relations (i.e., ⊇). We formalize Andersen analysis in Figure 2.4, which gives self-explained rules for han- dling the following four types of statements in the program: p=&o (Addr), p=q

(Copy), p = ∗q (Load), and ∗p = q (Store). Note that passing parameters and returning results between a callsite and a callee are treated as Copy. The rules are solved as a whole-program dataflow problem in which the points-to set pt(p) for each pointer p is computed using a standard maximal-fixed-point worklist algorithm [50, 56].

p = &o p = q [ADDR] [COPY] pt(p) ⊇ {o} pt(p) ⊇ pt(q)

p = ∗q pt(q) ⊇ {o} ∗p = q pt(p) ⊇ {o} [LOAD] [STORE] pt(p) ⊇ pt(o) pt(o) ⊇ pt(q)

Figure 2.4: Andersen-style pointer analysis.

An Andersen-style pointer analysis computes the exhaustive points-to solution for the whole program, which can be unnecessary for its clients. More often than not, a client is interested in only a small part of the program. This observation leads to the study of demand-driven pointer analysis [47, 108, 156], which is usually far more efficient than its whole-program counterpart because only points-to queries specified by the client need to be solved.

In Figure 2.5, we formalize demand-driven pointer analysis in terms of value-

flow (i.e., def-use chains for both top-level and address-taken variables) [116], in which v ←- v0 represents the value flow from a definition of v0 to a use of v. The Chapter 2. Background 16 central idea is, given a pointer variable v, to track backward its value-flows (i.e.,

←-) to find all reachable objects o. Thus, pt(v) = { o | v ←- o } .

p = &o p = q [ADDR] [COPY] p ←- o p ←- q p = ∗q q ←- o ∗p = q p ←- o [LOAD] [STORE] p ←- o o ←- q

v ←- v0 v0 ←- v00 [TRANS] v ←- v00

Figure 2.5: Demand-driven flow- and context-insensitive pointer analysis.

We introduce a state-of-the-art demand-driven flow- and context-sensitive pointer analysis [116] in Figure 2.6, by augmenting the flow- and context-insensitive version in Figure 2.5. To achieve flow- and context-sensitivity, we use (c, l, v) to precisely denote the occurrence of v in statement l under context c.

To achieve context-sensitivity, the call stack for a function call is statically modeled in [CALL] and [RETURN] to filter out unmatched call and return edges in the program’s ICFG [102]. The abstract call stack is initialized as empty. Whenever entering a callee via a callsite l, l is pushed into the current stack c, denoted by c⊕l. Whenever returning from a callee to a callsite l , l is popped from the current stack c, denoted by c l.

To achieve flow-sensitivity, a conservative value-flow graph (VFG) is first built on top of program’s SSA form [21] to capture all possible def-use chains. Let N be the set of nodes which represents all statements, and E be the set of edges which represents potential def-use chains between two statements. A VFG G = (N,E) is a multi-edged directed graph, with all the edges e ∈ E in the form of l →v l0, which represents a potential def-use chain for variable v with the definition at statement l and the use at statement l0. Note that although both represent value flow, → and Chapter 2. Background 17

←- are different: → is the known fact (i.e., through imprecise over-approximation) after VFG construction, while ←- is what we need to solve on-demand (i.e., precise refinement). In addition to the four types of statements handled in Figure 2.5, Phi statements in the form of p=φ(..., q, ...) (which appear in the SSA representation) are also handled in Figure 2.6. The rules in Figure 2.6 are flow-sensitive, as the existence of required value-flow (i.e., →) is checked in the premises of [COPY], [PHI],

[LOAD], [STORE], [CALL] and [RETURN].

c, l : p = &o [ADDR] (c, l, p) ←- (c, ol)

q c, l : p = q lq → l [COPY] (c, l, p) ←- (c, lq, q)

q c, l : p = φ(.., q, ..) lq → l [PHI] (c, l, p) ←- (c, lq, q)

q o c, l : p = ∗q (c, lq, q) ←- (co, o) lq → l lo → l [LOAD] (c, l, p) ←- (co, lo, o)

p q o c, l : ∗p = q (c, lp, p) ←- (co, o) lp → l lq → l lo → l [STORE] (co, l, o) ←- (c, lq, q)(co, l, o) ←- (co, lo, o)

a c, l :define fun(v) {...} lcall : call fun(a) la → lcall [CALL] (c, l, v) ←- (c [l], la, a)

x c, l : y = call fun(...) lx → lret define fun(...) {..., lret :return x} [RETURN] (c, l, y) ←- (c ⊕ [l], lx, x)

(c, l, v) ←- (c0, l0, v0)(c0, l0, v0) ←- (c00, l00, v00) [TRANS] (c, l, v) ←- (c00, l00, v00)

Figure 2.6: Demand-driven pointer analysis with field-, flow- and context- sensitivity as in [116]. Chapter 3

Machine-Learning-Guided

Typestate Analysis for Detecting

Use-After-Free Vulnerabilities

3.1 Overview

Use-after-free (UAF) vulnerabilities, i.e., dangling pointer dereferences (accessing objects that have already been freed) in C/C++ programs can cause data corrup- tion [23, 141], information leaks [67, 106], denial-of-service attacks (via program crashes) [18], and control-flow hijacking attacks [16, 34, 37]. While other memory corruption bugs, such as buffer overflows, have become harder to exploit due to var- ious mitigation techniques [23, 123, 150], UAF has recently become a significantly more important target for exploitation [67, 149].

Recent years have witnessed an increasingly large body of research on detect- ing or mitigating UAF vulnerabilities. Most existing approaches rely on dynamic analysis techniques by maintaining shadow memory [83, 105, 140] and performing

18 Chapter 3. Machine-Learning-Guided UAF Detection 19

use use

live dead error use/free malloc free free Figure 3.1: A finite state automaton (FSA) for UAF. runtime checks [16, 67, 149]. Dynamic analysis yields no or few false positives, but can incur non-negligible runtime and memory overheads, hindering their adoption in production environments. In addition, dynamic analysis often suffers from bi- nary incompatibility issues due to code instrumentation used [123]. When used as bug detectors, dynamic approaches are often limited by test inputs used and can thus provide low code coverage and miss true bugs.

Static analysis, which approximates the program behavior at compile-time, does not suffer from the above limitations, but requires scalable yet precise pointer anal- ysis in order to find memory errors with low false alarm rates in large programs [67].

Typestate analysis [40, 114] represents a fundamental approach for detecting stati- cally temporal memory safety errors in C/C++ programs. For example, UAF bugs can be detected based on the finite state automaton (FSA) depicted in Figure 3.1.

The typestates of an object o are tracked by statically analyzing all the statements

(e.g., malloc sites, free sites, and pointer dereferences at loads/stores) that affect the state transitions along all the possible program paths. A UAF warning for the object o is reported when error is reached. This happens when a free site free(p) reaches a use site use(q) (which denotes a memory access on the same ob- ject pointed by q, e.g., ∗q) along a control-flow path, where ∗p and ∗q are aliases, i.e., p and q point to o. In what follows, such aliases are said to be UAF-related.

Double-free bugs are handled as a special case of UAF bugs.

ESP [31], as a representative path-sensitive typestate analysis that runs in poly- nomial time and space, is useful for checking properties such as “file open-close” [31] Chapter 3. Machine-Learning-Guided UAF Detection 20 and “socket-connection” [40]. Unlike a data-flow-based path-sensitive analysis that computes execution states as its data-flow facts by finding a meet-over-all-path

(MOP) solution, ESP avoids examining possibly infinite program paths by being partially path-sensitive [31, 134]. ESP uses a symbolic state as a data-flow fact, which includes an execution state and a property state of an FSA, based on the points-to information. At a control-flow joint point, ESP produces a single sym- bolic state, by merging the execution states whose corresponding property states are identical, thus yielding a maximal-fixed-point (MFP) solution.

Below we first discuss the challenges faced in developing a practical types- tate analysis for detecting UAF bugs. We then outline the motivation behind our machine-learning-based solution.

Challenges and insights Unlike temporal properties such as “file open-close” and “socket-connection”, UAF is much harder to handle by ESP-based typestate analysis both scalably and precisely, due to complex aliasing in the presence of a large number of free-use pairs in real-world programs. For example, php-5.6.8 has 340 million free-use pairs with 1,391 frees and 244,917 uses.

To achieve soundness, any change to the typestate of an object must be reflected in all pointers that point to the object, i.e., all aliases of the object. In addition, the typestate transitions of an object o must be tracked efficiently and precisely from its free site free(p) to all the corresponding use sites use(q), where ∗p and ∗q are aliases (with o), along possibly many program paths spanning across possibly many functions in the program.

A typestate analysis becomes more effective if a more precise pointer analysis is used. Ideally, one may wish to combine both typestates and points-to informa- tion into the same analysis domain to form a single data-flow-based path-sensitive analysis, which will be, unfortunately, intractable due to potentially an unbounded Chapter 3. Machine-Learning-Guided UAF Detection 21 number of paths and undecidability of aliasing [60, 99]. In order to simplify com- plexity, several dimensions of pointer analysis are considered to enable precision and efficiency trade-offs: flow-sensitive (by distinguishing the flow of control), field- sensitive (by distinguishing different components of an aggregate data structure), context-sensitive (by distinguishing calling contexts of a function), and/or path- sensitive (by distinguishing program paths).

In practice, these over-approximation solutions are usually imprecise, despite recent advances on sparse [45, 148, 151] and demand-driven pointer analysis [108,

112, 116], in analyzing a number of hard “corner cases” in a program, such as infeasible paths (by ignoring path sensitivity or handling it partially), recursion cycles (by merging all functions in a recursion cycle), loops (by not distinguishing different iterations of a loop), arrays (by not distinguishing array elements), and linked lists (by abstracting some of their nodes as a single one). As a result, a large number of spurious aliases will be reported, causing the corresponding typestate analysis to report a large number of spurious state transitions, i.e., false alarms.

For debugging purposes, therefore, the practical usefulness of typestate analysis for

UAF detection becomes limited.

Our solution To address the above challenges, we introduce a new UAF detec- tion framework, Tac, to bridge the gap between typestate and pointer analyses by machine learning. Our key observation is that the spurious aliases reported by pointer analysis are alike and predictable. They share some common program fea- tures explicitly (e.g., in terms of their declaration types) or implicitly (in terms of their points-to relations). By training Tac using a Two-Class Support Vector Ma- chine (TC-SVM), existing UAF ground truths, i.e., codebases containing labeled known false alarms and true bugs can be leveraged to enable Tac to learn the correlations between program features and the UAF-related aliases. Then its SVM Chapter 3. Machine-Learning-Guided UAF Detection 22 classifier can be called upon to further scrutinize the UAF-related aliases reported imprecisely by the pointer analysis so that only the ones validated by the SVM classifier are further investigated by the typestate analysis. Despite its unsound- ness, Tac turns out to be a practical tool for detecting UAF bugs efficiently with a low false alarm rate for large C/C++ programs.

We evaluate the effectiveness of Tac against Tac-NML (Tac without machine learning) in both its training and analysis phases. In the training phase, we exercise

Tac using a large number of UAF samples, including manually identified false alarms reported by Tac-NML and true bugs (both real and injected) in a set of four C/C++ training programs. By using the standard 5-fold cross validation, Tac achieves high precision (92.6%) and recall (95.8%) while Tac-NML is imprecise (42.1%) despite a total recall (100%), measured in terms of their ability in finding the true bugs in the training samples provided.

In its analysis phase, Tac finds 109 true UAF bugs out of 266 warnings reported in a set of eight C/C++ programs including the four used in the training phase.

Among the 109 bugs, there are 14 distinct ones (two UAF pairs are considered to be duplicated if they share the same free site and dereference the same pointer at the two use sites), including 5 CVE vulnerabilities, 1 known bug and 8 previously unknown ones. Compared to 19,083 warnings reported by Tac-NML, Tac reports only 266 warnings, achieving a reduction rate of 98.6%, reducing significantly the amount of manual effort needed for inspecting a vast number of false alarms.

Contributions This chapter makes the following contributions:

• We present Tac, a new machine-learning-guided typestate analysis for de- tecting UAF bugs statically.

• We introduce an SVM classifier specialized for UAF detection with a set of 35 Chapter 3. Machine-Learning-Guided UAF Detection 23

features that can effectively disambiguate the UAF-related aliases reported

imprecisely by pointer analysis to help typestate analysis in finding true UAF

bugs at a significantly reduced false alarm rate.

• We have implemented Tac in LLVM-3.8.0 and evaluated it using eight open- source C/C++ programs (2,098 KLOC). Tac finds 109 bugs out of 266 warn-

ings by suppressing 19,083 warnings reported by Tac-NML. Among the 109 true bugs, there are 14 distinct ones, including 5 CVE vulnerabilities, 1 known

bug, and 8 previously unknown bugs.

3.2 The Tac Framework

As shown in Figure 5.2, Tac has two main components. The training phase extracts the program features from ground truths and then uses these features to train an

SVM classifier to learn harmful (benign) UAF-related aliases that cause true bugs

(false alarms). A UAF pair hfree(p), use(q)i is said to be harmful (benign) if ∗p and ∗q are regarded as aliases (non-aliases) by the SVM classifier.

The analysis phase filters outs many spurious UAF-related aliases reported by the pointer analysis. A pre-analysis is first performed to identify a set of candi- date UAF pairs hfree(p), use(q)i, where ∗p and ∗q are found to be aliased. For every object o created at an allocation site, such that o is related to at least one candidate pair hfree(p), use(q)i, where ∗p and ∗q are aliased with o, a forward slice of the program starting from the allocation site but restricted only to the statements into which o flows (referred to as the slice of o below) is found. Then an on-demand typestate analysis is performed on the slice of o. Based on the fea- tures of hfree(p), use(q)i extracted on the fly from this slice, the SVM classifier passes hfree(p), use(q)i, when it is harmful, to the typestate analysis for further Chapter 3. Machine-Learning-Guided UAF Detection 24

Training Target Samples Program

Pre- Analysis allocation site of 표

True UAF False UAF Program Slicing

Typestate 표’s sub-program Extract Extract Automaton

Path-Sensitive Typestate Analysis Features Features Query for if free(p) frees 표 and if Bug Report Guide use(q) accesses 표

state state No transition transition Pointer no Analysis SVM Classifier yes Query for Cross Apply classifier for Validation Feature Extraction

Training Phase Main Analysis Phase

Figure 3.2: Tac framework. investigation.

3.2.1 The Training Phase

Ground truths We exercise Tac using both false and true UAF samples in a set of real-world C/C++ programs as training programs. All such UAF samples are annotated for feature extraction. Chapter 3. Machine-Learning-Guided UAF Detection 25

Feature extraction We use a feature vector consisting of 35 features to describe a UAF sample. We categorize these features into the following four categories: (1) type information (e.g., global, array and struct), (2) control-flow (e.g., loop, recur- sion, and the distance between a free site and a use site), (3) common programming practices (e.g., pointer casting and reference counting), and (4) points-to informa- tion (e.g., the number of objects that may be used at a use site and the number of

UAF pairs sharing the same free site).

Prediction model Our prediction model for UAF detection uses an SVM classi-

fier. Conceptually, the SVM model used is a harmfulness predicate, which separates the input space containing all the UAF samples into two regions, marked as harmful and benign, respectively. To tune the intrinsic SVM parameters for optimal accu- racy, standard grid search is applied with 5-fold cross validation by enumerating all possible combinations of the SVM parameters. Given a set of SVM parameters, 5- fold cross validation computes the accuracy of the SVM model in three steps. First, all the UAF samples are divided into 5 equal-sized subsets. Then, each subset, in turn, is used as a test set with the remaining 4 subsets combined as its training set.

Finally, the averaged accuracy rate obtained is the expected accuracy of a model under the set of SVM parameters [17].

3.2.2 The Analysis Phase

Pre-analysis We start conservatively with a set of candidate objects that may be unsafe (as they may induce UAF bugs). An object o (identified by its allocation site) is selected as a candidate to be further investigated by our typestate analysis if a free site free(p) can reach a use site use(q) via context-sensitive control-flow reachability in the program, where p and q point to o, i.e., o ∈ pt(p) ∩ pt(q). Here, Chapter 3. Machine-Learning-Guided UAF Detection 26

// Data structure for linked-lists 1: typedef struct NODE { 2: int data; 3: struct NODE* nxt; 4: } NODE; 1: void foo() { // The 1st, 2nd, 3rd and ith nodes 2: void* p = malloc(1);//o1 3: void* q = p; 5: NODE *n1, *n2, *n3, *ni; 4: int flag = 0; 6: unsigned sz = sizeof(NODE); 5: if (Cond) { 6: free(p); 7: void bar() { 7: p = malloc(2);//o2 8: ni = n1 = (NODE*)malloc(sz);//o1 8: flag = 1; 9: for (int i = 0; i < 10; ++i) { 9: } 10: ni->nxt = (NODE*)malloc(sz);//o2 10: if (flag == 0) 11: ni = ni->nxt; 11: use(q); // False UAF w.r.t. line 6 12: ni->nxt = NULL; 12: use(q); // True UAF w.r.t line 6 13: } 13: use(p); // False UAF w.r.t line 6 14: n2 = n1->nxt; 14: } 15: n3 = n2->nxt; 16: free(n2); 17: use(n1); // False UAF w.r.t line 16 18: use(n2); // True UAF w.r.t line 16 19: use(n3); // False UAF w.r.t line 16 20: }

(a) Path-sensitivity (b) Linked-list Figure 3.3: Examples illustrating how imprecise pointer analysis leads to imprecise typestate analysis for UAF detection. pt(v) denotes the points-to set of a variable v. In this case, ∗p and ∗q are aliased

(with o). For efficiency reasons, the pre-analysis is performed in terms of Andersen’s pointer analysis [6] as implemented in [117]. As is standard, context-sensitive control-flow reachability is solved as a balanced-parentheses problem by matching calls and returns to filter out unrealizable program paths on the interprocedural

CFG (Control Flow Graph) of the program [102].

Slicing For each candidate object o, the program is sliced to keep only the rel- evant functions that o may flow to (i.e., o’s liveness scope) by using a standard mod-ref analysis, with its value-flow dependences computed by a flow-insensitive pointer analysis [45, 119]. Our typestate analysis for o will be performed on this slice. Chapter 3. Machine-Learning-Guided UAF Detection 27

Typestate analysis Our typestate analysis starts from a candidate object o created at its allocation site, with its path-sensitivity focused on the typestates of the FSA depicted in Figure 3.1. Following ESP [31], a data-flow fact is a symbolic state consisting of a property state, i.e., live, dead or error, and an execution state, which represents the values of all the variables affecting the control flow. At a two-way joint point, one symbolic state is obtained, by merging the execution states whose corresponding property states are the same. On encountering a free site free(p), the FSA transits from live to dead if o ∈ pt(p). On encountering subsequently a use/free site use(q)/free(q), the FSA transits from dead to error if both o ∈ pt(q) and the aliasing relation between ∗p and ∗q (with respect to o) reported by the pointer analysis is also validated by our SVM classifier. In this case, a UAF/double-free warning is issued.

Given a UAF pair, hfree(p), use(q)i, pt(p) and pt(q) are computed by using a demand-driven flow-sensitive pointer analysis [116]. If ∗p and ∗q are found to be aliased (with o), we pass the pair to an SVM classifier for a further sanity check based on the features of hfree(p), use(q)i extracted on the fly from the sliced program of o.

3.2.3 Examples

Figure 3.3 gives two examples to illustrate how the imprecision in pointer analysis leads to the imprecision in typestate analysis.

Typestate analysis with path-insensitive pointer analysis ESP-based type- state analysis is path-sensitive in tracking non-pointer scalar values but interprets pointer values conservatively as ⊥, i.e., obtainable from a pointer analysis. Fig- ure 3.3(a) gives an example with one true UAF bug (at line 12) and two false Chapter 3. Machine-Learning-Guided UAF Detection 28

UAF bugs (at lines 11 and 13) with respect to the free site at line 6, by using the points-to information computed by a path-insensitive pointer analysis.

We focus on analyzing the object o1 allocated at line 2 and freed conditionally at line 6, at which point, the property state of o1 becomes dead. By using a flow- sensitive pointer analysis without path sensitivity, ESP can (1) prove that use(q) at line 11 is not a UAF bug, (2) identify use(q) at line 12 as a true UAF bug, but

(3) report imprecisely use(p) at line 13 as a false alarm.

Table 3.1 gives the symbolic states (including o1’s property states for the FSA shown in Figure 3.1 and execution states) and the points-to sets at some relevant program points. After analyzing line 9, typestate analysis combines the symbolic states from the two branches into one, resulting in sline9 = s1 ∪ s2, where s1 =

[dead, p = q = ⊥, flag = 1, Cond] (if-branch) and s2 =[live, p = q = ⊥, flag =

0, ¬Cond] (else-branch). However, after line 10, s1 is filtered out due to path contradiction, since flag = 1 in s1’s execution state is inconsistent with the branch condition flag == 0 at line 10. As a transition from dead to error is impossible, the typestate analysis correctly proves the absence of a UAF bug for use(q) at line

11.

At line 12, the pointer analysis finds precisely that p and q point to o1 allocated at line 2. Therefore, a state transition from dead to error occurs in s1, so that a true UAF bug for use(q) at line 12 is reported. However, at line 13, due to the lack of path-sensitivity, p is found to point to both o1 (allocated at line 2) and o2 (allocated at line 7), resulting in a spurious alias relation between ∗p and

∗q. Therefore, free(p) at line 6 and use(q) at line 13 are considered to access o1, triggering a spurious state transition from dead to error in s1. Thus, a false alarm is raised for use(p) at line 13. Chapter 3. Machine-Learning-Guided UAF Detection 29

Table 3.1: ESP-based Typestate analysis with path-insensitive pointer analysis for the program given in Figure 3.3(a).

line 4: [live, p = q = ⊥, flag = 0] line 6: [dead, p = q = ⊥, flag = 0, Cond] line 8: [dead, p = q = ⊥, flag = 1, Cond] symbolic line 9: [dead, p = q = ⊥, flag = 1, Cond] ∪ typestates [live, p = q = ⊥, flag = 0, ¬Cond] line 10: [live, p = q = ⊥, flag = 0, ¬Cond] line 11: [dead, p = q = ⊥, flag = 1, Cond] ∪ [live, p = q = ⊥, flag = 0, ¬Cond]

line 6: pt(p) = {o1} points-to lines 11 and 12: pt(q) = {o } sets 1 line 13: pt(p) = {o1, o2}

Typestate analysis with imprecise handling of lists Figure 3.3(b) gives a linked-list example to demonstrate that field-sensitivity is not powerful enough to enable pointer analysis to distinguish the internal structure of an aggregate object.

With field-sensitivity, we can distinguish the head node (represented by o1) from the remaining 10 nodes (abstracted by o2) in the linked-list, created at the two allocation sites at lines 8 and 10, respectively. Thus, the typestate analysis can correctly prove the absence of a UAF bug for use(n1) at line 17 and report use(n2) at line 18 as a true UAF bug. However, the pointer analysis cannot distinguish the accesses to the second and third elements of the linked-list, since pt(n2) = pt(n3) = {o2}, resulting in a spurious alias relation between ∗n2 and ∗n3. Therefore, a false alarm for use(n3) at line 19 is reported. Chapter 3. Machine-Learning-Guided UAF Detection 30

3.2.4 Discussion

There are many spurious aliases introduced by pointer analysis. We propose to apply machine learning to significantly reduce their presence in order to improve the precision of typestate analysis.

3.3 Design

We introduce Tac, including its training phase (Section 3.3.1) and machine-learning- guided typestate analysis phase (Section 3.3.2).

3.3.1 Training

The aim of our SVM classifier is to further disambiguate the UAF-related aliases imprecisely reported by pointer analysis.

Building an SVM classifier We use x∈X to denote a UAF sample representing a pair of free and use sites hfree(p), use(q)ix ∈X. A feature Fi is either a syntactic or semantic property of a program, mapping x to either a boolean or numeric value

Fi : X → N. Following the standard normalization [17] to achieve accuracy in the training process, we adjust the values of the samples in X in order to map a feature to a real number between 0 and 1 inclusive, by using function Fi :

n X → [0, 1] . Specifically, given a sample x ∈ X, this is done as Fi(x) = (Fi(x) − min(Fi(X)))/(max(Fi(X)) − min(Fi(X))), where min (max) returns the minimum

(maximum) value for Fi among all the samples in X. Finally, a feature vector of length n is defined as F = (F1,..., Fn) containing a set of n features to capture the properties of every sample.

During the training process, we build an SVM classifier C : [0, 1]n → {0, 1} Chapter 3. Machine-Learning-Guided UAF Detection 31

Table 3.2: 35 Features used by the SVM classifier for a UAF sample hfree(p), use(q)i, where o ∈ pt(p) ∩ pt(q).

Group ID Feature Type Description 1 Array Bool o is an array or an element of an array 2 Struct Bool o is a struct or an element of a struct 3 Container Bool o is a container (e.g., vector or map) or an element of a container Type 4 IsLoad Bool use(q) is a load instruction Information 5 IsStore Bool use(q) is a store instruction 6 IsExtCall Bool use(q) is an external call 7 GlobalFree Bool free(p), where p is a global pointer 8 GlobalUse Bool use(q), where q is a global pointer 9 CompatibleType Bool p and q are type-compatible 10 InSameLoop Bool free(p) and use(q) are in the same loop 11 InSameRecursion Bool free(p) and use(q) are in the same recursion cycle 12 #FuncsInBetween Int number of functions in the shortest path from free(p) to use(q) in the program’s call graph Control 13 DiffIteration Bool use(q) is after free(p) via a loop back-edge Flow 14 Dom Bool free(p) dominates use(q) 15 PostDom Bool use(q) post-dominates free(p) 16 #IndCalls Int number of indirect calls in the shortest path from free(p) to use(q) in the call graph 17 UseBeforeFree Bool a UAF pair, free(p) and use(q), is also a use-before-free 18 NullifyAfterFree Bool p is set to null immediately after free(p) 19 ReturnConstInt Bool a const Int is returned after free(p) 20 ReturnBool Bool a Bool value is returned after free(p) Common 21 Casting Bool pointer casting is applied to q at use(q) Program- 22 ReAllocAfterFree Bool p is redefined to point to a newly allocated ming object immediately after free(p) Practices 23 RefCounting Bool o is an reference-counted object 24 ValidatedFreePtr Bool null checking for p before free(p) 25 ValidatedUsePtr Bool null checking for q before use(q) 26 PTSize@Free Int number of objects pointed to by p at free(p) 27 PTSize@Use Int number of objects pointed to by q at use(q) 28 #UAF@Free Int number of UAF pairs sharing the same free(p) 29 #UAF@Use Int number of UAF pairs sharing the same use(q) 30 #Aliases Int number of pointers pointing to o Points-to 31 AllocInLoop Bool o is allocated in a loop Information 32 AllocInRecursion Bool o is allocated in recursion 33 LinkedList Bool o is in a points-to cycle (signifying its presence in a linked-list) 34 SamePointer Bool p and q at free(p) and use(q) are the same pointer variable 35 DefinedBeforeFree Bool q at use(q) is defined before free(p) Chapter 3. Machine-Learning-Guided UAF Detection 32

that takes a feature vector F of a sample hfree(p), use(q)ix as input and returns whether ∗p and ∗q are aliases (1) or not (0). The typestate analysis phase will make use of the classifier to reduce the number of UAF-related spurious aliases.

For a program, let Xall be the set of all UAF pairs hfree(p), use(q)ix causing the FSA in Figure 3.1 to transit into error, where ∗p and ∗q are found to be aliased by a pointer analysis used. Only the following subset XML will be further investigated by the typestate analysis:

XML={hfree(p), use(q)ix ∈ Xall | C (F (x))=1 ∧ pt(p)∩pt(q) 6= ∅}

In other words, the UAF pairs hfree(p), use(q)ix in Xall \ XML are ignored, since ∗p and ∗q are not aliases by the SVM classifier.

Extracting program features Table 3.2 gives a set of 35 features, which are divided into four categories below, to represent a UAF sample. Note that this set of features can be extended by considering other program characteristics or reused by other program analyses.

• Type Information (Features 1 – 9). Type information is used to identify

arrays (F1), structs (F2), C++ containers (F3), different kinds of use sites (F4,

F5 and F6), global variable accesses for free and use sites (F7 and F8), and type compatibility for the pointers p and q at a free site free(p) and a use site use(q)

(F9).

• Control Flow (Features 10 – 17). We consider the following control-flow properties, including whether a pair of free and use sites resides in the same loop

or recursion cycle (F10 and F11), the distance between a free site and a use site

in the program’s call graph (F12), control-flow reachability from a free site to a

use site via a loop back-edge (F13), control-flow dominance and post-dominance Chapter 3. Machine-Learning-Guided UAF Detection 33

between a free site and a use site (F14 and F15), the number of indirect calls along the shortest path from a free site to a use site in the program’s call graph

(F16), and control-flow reachability from a use site to a free site for a UAF pair

(F17).

• Common Programming Practices (Features 18 – 25). We consider a number of programming practices for memory management, including setting p

to null immediately after free(p)(F18), returning an integer or a boolean value

from a wrapper for free(p) to signify the success or failure for free(p)(F19 and

F20), pointer casting (F21), setting p to point to a newly allocated object after

free(p)(F22), reference counting for an object (F23), and null checking before a

pointer is freed (F24) or used (F25).

• Points-to Information (Features 26 – 35). We take advantage of the points- to information computed by the pointer analysis used, including the sizes of the

points-to sets at free and use sites (F26 and F27), the number of UAF pairs

sharing the same free (F28) or use (F29) site, and the number of aliased pointers

pointing to a candidate UAF object (F30). In addition, we also consider whether

a candidate object is allocated in loops (F31), recursion cycles (F32) or as a node

of a linked-list (F33) participating in a points-to cycle (causing the object to abstract many concrete nodes in the list). Finally, we consider whether p and q

at a UAF pair, hfree(p), use(q)i, are the same variable (F34) and whether q at

use(q) is defined just before the free site (F35).

Let us revisit the two examples in Figure 3.3. In Figure 3.3(a), F9, F14, F15,

F22, F26, F27, F34 and F35 for line 13 are useful to predict that the UAF pair at lines 6 and 13 is not a bug. In Figure 3.3(b), F31, F33 and F35 for line 19 can help avoid a false alarm that would otherwise be raised for the UAF pair at lines 16 and Chapter 3. Machine-Learning-Guided UAF Detection 34

19.

Support vector machine (SVM) Given a set of labeled samples represented by their feature vectors, an SVM [24] can separate them by computing an underly- ing mathematical function, called a kernel function. There are four commonly used kernels: linear, polynomial, radial basis function (RBF) and sigmoid kernels. Fol- lowing [57, 72, 129], a RBF kernel is used. The kernel function maps each feature vector to a high dimensional space, where the SVM computes a hyperplane that best separates the labeled samples into two sides. Once trained, an SVM classifier can be used to classify a given UAF pair according to simply which side of the hyperplane it falls in.

3.3.2 Typestate Analysis

We describe our ESP-based typestate analysis for UAF detection. The basic idea is to achieve improved precision (compared to the prior work) by applying an

SMV classifier to further validate the UAF-related aliases found imprecisely by the pointer analysis.

Our typestate analysis is a whole-program analysis. We describe how it works,

first intraprocedurally and then interprocedurally.

3.3.2.1 Intraprocedural Analysis

Intraprocedurally, our typestate analysis is performed on the CFG of a function,

CFG = (N,E), where N is a set of nodes representing program statements and

E ⊆ N × N is a set of edges corresponding to the flow of control between nodes.

For a given edge e, src(e)/dst(e) denotes its source/destination node.

Following ESP [31], we assume three types of nodes on a CFG, i.e., N = Chapter 3. Machine-Learning-Guided UAF Detection 35

Table 3.3: Program statements in the LLVM-like SSA form.

p, q, i ∈ T (Top-level Vars), a, o∈A (Address-taken Objs), c, fld ∈ C (Consts)

uop ∈ {++, −−, !} unary operators bop ∈ {+, −, ×, /, &&, ||, ==, 6=, <, >, ≤, ≥} binary operators

E ::= p, q | c | E1 bop E2 | uop E scalar expressions

use(p) ::= ∗p=q | q =∗p | p[i]=q | q =p[i] memory access | p→fld=q | q =p→fld

StmtNode::= p = E scalar statement

| p=&a | p=malloco | use(p) | free(p) memory statements

JointNode∪BranchNode∪StmtNode: (1) a joint node (i.e., φ-node) n∈JointNode has two incoming edges InEdge0(n) and InEdge1(n), and a single outgoing edge;

(2) a branch node nE ∈ BranchNode with a branch condition expression E has a single incoming edge InEdge0(n) and two outgoing edges, OutEdge1(n) if E eval- uates to true and OutEdge0(n) otherwise; and (3) a statement node n∈StmtNode has a single incoming edge and a single outgoing edge.

Program representation Table 3.3 gives all the statements put on an LLVM- like SSA form for a function [45, 62, 69, 116].

The set of variables is separated into two subsets: T containing all top-level variables, including pointers and non-pointer scalars, and A containing all possible targets, i.e., address-taken objects of a pointer. C is a set of all constants. We use use(p) to denote a memory access via a pointer p, including a pointer dereference

∗p, a field access p → fld, and an array access p[i]. Complex statements like

∗p = ∗q are simplified to t = ∗q and ∗p = t by introducing a top-level pointer t.

Accessing a multi-dimensional array as in q = p[i][j] is transformed into q = p[k], where k =i ∗ n + j and n represents the size of the second dimension of the array. Chapter 3. Machine-Learning-Guided UAF Detection 36

We consider UAF only for the objects o in the heap allocated via p = malloco (but not for the objects a on the stack allocated via p = &a).

Given a CFG = (N,E), our typestate analysis computes and maintains the data-flow facts in DF (e) for every edge e ∈ E, where DF (e) maps e to a set of symbolic states S with each element s = hρ, σi consisting of a property state

ρ ∈ P roperties = {live, dead, error} and an execution state σ. The notation σ(E) is used to evaluate the expression E in σ. As is standard, σ[v ← v0] denotes the state obtained by updating the value of v in σ with v0 and leaving the values of all other variables in σ unchanged.

Algorithm 1 Tac’s intraprocedural typestate analysis.

1: procedure Solve (nmalloco ,CFG = (N,E)) 2: foreach e ∈ EDF (e) := ∅

3: DF (OutEdge(nmalloco )) := {[live, >]}

4: W orklist := {dst(OutEdge(nmalloco ))} 5: while(W orklist 6= ∅) 6: Remove a node n from W orklist 7: switch (n) 8: case: n ∈ JointNode 9: S := Fjnt (n, DF (InEdge0(n)),DF (InEdge1(n))) 10: Add (OutEdge0(n),S) 11: case: nE ∈ BranchNode 12: ST := Fbr (n, DF (InEdge0(n)), E) 13: SF := Fbr (n, DF (InEdge0(n)), ¬E) 14: Add (OutEdge1(n),ST ) 15: Add (OutEdge0(n),SF ) 16: case: n ∈ StmtNode 17: S := Fstmt (n, DF (InEdge0(n))) 18: Add (OutEdge0(n),S) 19: if(n is free(p)) Fo := Fo ∪ {free(p)} 20: end procedure 21: procedure Add (e, S) 22: if(DF (e) 6= S) 23: DF (e) := S 24: W orklist := W orklist ∪ {dst (e)} 25: end procedure Chapter 3. Machine-Learning-Guided UAF Detection 37

Machine-learning-guided typestate analysis Algorithm1 gives a standard worklist algorithm for the typestate analysis that computes and updates the data-

flow facts on the CFG of a function for a given UAF candidate object o (determined by pre-analysis) until a fixed point. Unlike ESP [31], which starts its path-sensitive analysis from the entry of a CFG, our analysis starts from the allocation statement

nmalloco (line 3) of o to trade precision for efficiency. Our analysis handles three types of CFG nodes, JointNode (lines 8 – 10),

BranchNode (lines 11 – 15) and StmtNode (lines 16 – 19) using the flow functions given in Figure 3.4(a), mapping an input state to an output state for every node.

At line 19, we record the current free sites for object o in Fo found during the control-flow traversal so that we can pair them with uses of o in order to validate their associated aliases using our SVM classifier. Figure 3.4(b) gives the typestate grouping function α(S) that reorganizes a set of symbolic states S by merging the execution states of two symbolic states s1 and s2 ∈ S if s1 and s2 have the same property state.

For a joint node, the flow function Fjnt unions the data-flow facts on its incoming

0 edges. For a branch node, Fbr updates its input execution state σ with σ = σ ∪{E} if F easible(σ0) holds, i.e., the branch predicate E is not ruled out due to path contradiction, decided by a satisfiability solver, which is Z3 [32] in our evaluation.

For a statement node, Fstmt maps its input state to a new output state by using the transfer function TF defined in Figure 3.4(c). Note that a free statement is handled at line 19 as a special case.

TF (n, hlive, σi, o) and TF (n, hdead, σi, o) handle the state transitions of o when the current property states are live and dead, respectively. The former is handled in the usual way. So let us focus on the latter. Tac reports a UAF bug if TF (n, hdead, σi, o) signifies a state transition of o from dead to error (Fig- Chapter 3. Machine-Learning-Guided UAF Detection 38

Fjnt(n, S1,S2) = α(S1 ∪ S2) 0 0 0 Fbr(n, S, E) = α({hρ, σ i | σ =σ∪{E} ∧ F easible(σ ) ∧ hρ, σi∈S}) Fstmt(n, S, o) = α({TF (n, hρ, σi, o) | hρ, σi∈S})

(a) Flow functions for three types of CFG nodes

F α(S) = {hd, hρ,σi∈S[d] σi | d ∈ P roperties ∧ S[d] 6= ∅} where S[d] = {hρ, σi ∈ S | d = ρ}

(b) Grouping function for merging symbolic states

 hlive, σ[q ←σ(E)]i if n is q = E TF n, hlive, σi, o  = hdead, σi if n is free(p) ∧ o ∈ pt(p) hlive, Γ(σ, n)i otherwise  hdead, σ[q ←σ(E)]i if n is q = E herror, Γ(σ, n)i if n is use(q) ∧ o ∈ pt(q) ∧ predict(n) TF n, hdead, σi, o  = herror, σi if n is free(q) ∧ o ∈ pt(q) ∧ predict(n) hdead, Γ(σ, n)i otherwise T rue if ∃ m ∈ F : hm, ni ∈ X where predict(n) = o ML F alse otherwise

(c) Transfer function for program statement (with Fo defined in Algorithm1 )

if pt(p) = {o0} and Statement n otherwise o0 ∈Singleton q = ∗p σ[q ← σ(o0)] σ[q ← ⊥] q = p→fld σ[q ← σ(o0.fld)] σ[q ← ⊥] q = p[i] σ[q ← σ(o0[σ(i)])] σ[q ← ⊥] ∗p = q σ[o0 ←σ(q)] σ[∀o0∈ pt(p): o0 ←⊥] p→fld = q σ[o.fld0 ←σ(q)] σ[∀o0∈ pt(p): o0.fld ← ⊥] p[i] = q σ[o0[σ(i)]←σ(q)] σ[∀o0∈ pt(p): o0[σ(i)] ← ⊥] p = &a σ[p ← ⊥] p = malloco

(d) Γ(σ, n): Updating execution states for memory statements

Figure 3.4: The data-flow functions for Tac’s machine-learning-guided intraproce- dural typestate analysis. Chapter 3. Machine-Learning-Guided UAF Detection 39 ure 3.1), when n is a use(q) or a free(q). Thus, there are two cases. If n is a use(q), then a UAF warning is issued when both (1) o ∈ pt(q), implying that ∗p in a free site free(p) seen earlier and ∗q are found to be aliased with o by the pointer analysis [116], and (2) predict(n) returns true, implying that this alias is also validated by our SMV classifier. If n is a free(q), then a double-free warning is issued, instead.

Finally, Figure 3.4(d) gives the rules for performing strong updates on address- taken objects in order to improve the precision of F easible(σ0) in Figure 3.4(a) for top-level variables. We can distinguish two cases when tracking the execution states of statements. For a scalar statement p ← E, σ simply evolves into σ[p ← σ(E)].

However, updating σ for a memory-related statement is more complex, as shown in

Figure 3.4(d). Strong updates are performed when p points to exactly one (runtime) singleton object o0 in Singleton, which contains all objects in A except for the locals in recursion cycles and all the heap objects [69, 116]. Otherwise, the variables on the left-hand side of an assignment are updated to be ⊥ conservatively. Note that dynamically (statically) allocated arrays are treated as heap objects (locals or globals). For an array access p[i], o0[σ(i)] represents any element in o0 if i is statically unknown.

3.3.2.2 Interprocedural Analysis

Given a whole program, our typestate analysis proceeds context-sensitively on its interprocedural CFG [61] with indirect calls resolved by Andersen’s pointer anal- ysis [6]. Every function has a unique entry node and a unique exit node, with each callsite being split into a call node and a return node. Context-sensitivity is achieved by solving a balanced-parentheses problem [102] with an additional ab- stract call stack (a sequence of callsites) maintained in every symbolic state to filter Chapter 3. Machine-Learning-Guided UAF Detection 40

1: int main() { 2: int* p = malloc(1);//o Line Symbolic State 3: int flg = userInput(); 2: [live, p=⊥] 4: int* r = &flg; 4: [live, r=p=flg=⊥] 5: int i = 1, j = 0; 6: if (flg < 0) { 5: [live, r=p=flg=⊥, i=1, j=0] free(p) 7: ; 6: [live, r=p=⊥, flg<0, i=1, j=0] 8: *r = i; 9: } 7: [dead, r=p=⊥, flg<0, i=1, j=0] 10: else { 8: [dead, r=p=⊥, flg=1, i=1, j=0] 11: *p = i; 11: [live, r=p=⊥, flg≥0, i=1, j=0] 12: *r = j; 13: } 12: [live, r=p=⊥, flg=0, i=1, j=0] ...//q is defined here 13: [dead, r=p=⊥, flg=1, i=1, j=0] ∪ 14: if (flg == 1) [live, r=p=⊥, flg=0, i=1, j=0] 15: *q = i; 16: } 14: [dead, r=p=⊥, flg=1, i=1, j=0]

Figure 3.5: An example for illustrating Tac. out unrealizable inter-procedural paths by matching calls and returns. Following

ESP [31], we apply a mod-ref analysis to avoid analyzing a function invoked at a callsite if it may not access the candidate UAF object being analyzed by using value-flow slicing [31, 119]. Unlike ESP [31], which starts its from the entry of the program, our analysis starts from an allocation statement, as discussed above.

3.3.2.3 Example

We use an example in Figure 5.3 to illustrate how Tac correctly reports the true UAF bug (at lines 7 and 15). At line 2, a memory object o is allocated and pointed by p. In the if-branch (lines 6 – 9), o is freed, indicated with flg set as 1. In the else-branch (lines 10 – 13), o is updated, indicated with flg set as 0. Lines 14 – 15 are the buggy code that mistakenly check flg == 1 instead of flg == 0 before dereferencing p, causing a UAF bug. Figure 5.3 gives the symbolic states obtained by Tac at some program points. Chapter 3. Machine-Learning-Guided UAF Detection 41

Tac starts from o’s allocation site at line 2, where the property state of o is initialized as live and the symbolic state of p is set as ⊥. At line 3 (not shown), flg is assumed to be initialized to ⊥ (returned by userInput()). Let us see how the if-branch (lines 6 – 9) is analyzed. When analyzing line 6, Tac records its branch condition flg < 0 in the resulting symbolic state. At line 7, o is freed, causing the property state of o to transit from live to dead. At line 8, Tac makes a strong update to get flg = 1, since r points to flg, where flg ∈ Singleton.

Let us now move to the else-branch (lines 10 – 13). When analyzing line 10,

Tac records flg ≥ 0 in the resulting symbolic state. At line 11, the property state of o remains unchanged according to Γ. At line 12, Tac makes a strong update to get flg = 0.

At line 13, Fbr is applied to merge the two symbolic states from the two branches. The if branch at lines 14-15 filters out the states that do not satisfy flg == 1.

Thus, [dead, r = p = ⊥, flg = 1, i = 1, j = 0] is kept but [live, r = p = ⊥, flg =

0, i=1, j=0]] dropped. Finally, there are two cases when line 15 (∗q = i) is analyzed. If ∗q is found not to be aliased with ∗p according to the pointer analysis, then no UAF bug exists.

Otherwise, predict(∗q = i) comes into play. The FSA for o will transit from dead to error if hfree(p), *q=ii∈XML and remains in the dead state otherwise.

3.4 Evaluation

Our evaluation aims to demonstrate the effectiveness of our machine-learning- guided approach in detecting UAF bugs with a low false alarm rate in real-world programs. We evaluate Tac using eight popular open-source C/C++ programs described in Table 3.4: rtorrent, a fast text-based BitTorrent client; less, a text Chapter 3. Machine-Learning-Guided UAF Detection 42

Table 3.4: Open-source benchmarks.

Program Version Language LOC #Frees #Uses rtorrent 0.96 C++ 13,036 118 3,039 less 451 C 27,134 86 7,902 bitlbee 4.2 C 68,413 201 5,897 nghttp2 1.6.0 C++ 71,387 29 7,566 mupdf 1.2.337 C++ 122,481 253 105,911 h2o 1.7.2 C++ 517,731 896 150,887 xserver 1.14.3 C 568,964 1,675 90,841 php 5.6.7 C 709,356 1,391 244,917 Total — — 2,098,502 4,649 616,960

file viewer; bitlbee, a cross-platform IRC instant messaging gateway; nghttp2, an implementation of hypertext transfer protocol; mupdf, an E-book viewer; h2o, an optimized HTTP server, xserver, a windowing system for bitmap displays on

UNIX-like OS; and php, a general-purpose scripting language for web development.

Tac is implemented in the LLVM compiler (version 3.8.0) [62]. The source files of each C/C++ program are compiled under -O0 into LLVM bit-code files by

Clang and then merged using the LLVM Gold Plugin at link time to produce a whole program bc file.

In the training phase, Tac uses the widely-used SVM classifier libSVM [17]. In the analysis phase, Tac’s pre-analysis is implemented on top of SVF [117]. For the flow-sensitive demand-driven pointer analysis [116] deployed in the analysis phase, the budget of a points-to query is set as 50,000 (the maximum number of def-use chains traversable) in the underlying pointer analysis to enable early termination and returning conservative may-alias results.

Our experiments were conducted on a 3.0 GHZ Intel Core2 Duo processor with

128 GB memory, running RedHat Enterprise Linux 5 (2.6.18). As listed in Ta- Chapter 3. Machine-Learning-Guided UAF Detection 43

Table 3.5: 14 (distinct) UAF bugs detected by Tac, including 5 known CVE vul- nerabilities and 1 known bug given in Column 2 and 8 new bugs given in Column 3.

Known bugs New bugs Program Identifier Detected #Detected rtorrent —— 0 less —— 1 bitlbee CVE-2016-10188 ! 0 nghttp2 CVE-2015-8659 ! 0 mupdf BugID-694382 ! 0 h2o CVE-2016-4817 ! 5 xserver CVE-2013-4396 ! 0 php CVE-2015-1351 ! 2 ble 3.4, the eight programs combined exhibit a total of 2,098,502 LOC, containing

4,649 free sites and 616,960 use sites. As shown in Table 3.5, these programs contain

6 known UAF bugs, with 5 registered in the CVE database and 1 unregistered.

3.4.1 The Training Phase

We train the SVM classifier for Tac using both false and true UAF samples in real-world programs, as illustrated in Table 3.6. To generate false alarm samples, we run Tac-NML, an ESP-based typestate analysis without machine learning, to analyze four relatively small ones in the set of eight programs evaluated (Table 3.4), rtorrent, less, bitlbee, and nghttp2. Then, we manually inspect 30% (a limit set for the manual labor invested) of all the warnings reported by Tac-NML for each program. To generate true UAF bugs, we use all the 138 C programs and 322

C++ programs (which are small programs extracted from real-world applications) in the CWE-416-Use-After-Free category of Juliet Test Suite (JTS) [1], with each program containing one single UAF vulnerability. In addition, we also make use of synthetic UAF bugs automatically introduced into the training programs, inspired Chapter 3. Machine-Learning-Guided UAF Detection 44 by the bug insertion technique [96]. To do so, we first find all use-before-free pairs huse(p), free(q)i statically by conducting a control-flow reachability analysis from a use(p) to a free(q), where ∗p and ∗q are aliases identified by a flow-sensitive pointer analysis [116]. Next, we swap use(p) and free(q) for each pair and run

Valgrind [84] to detect dynamically if the thus injected UAF bug manifests itself as a true bug under the default test inputs in every program. Finally, all UAF samples, including 623 false and 858 true bugs as shown in Columns 2 and 3 of

Table 3.6, are annotated for feature extraction.

The training phase applies the standard 5-fold cross validation to find optimal intrinsic SVM parameters that yield the best classification accuracy. We consider three standard metrics: accuracy, precision and recall. Accuracy is the percentage of correctly classified samples out of all the samples. Precision is the percentage of correctly classified true positive samples out of the samples that are classified as true positives. Recall is the percentage of correctly classified true positive samples out of all the true positive samples.

Due to 5-fold cross validation, Tac is highly effective for the training programs, with its the accuracy, precision and recall results given in Columns 4 – 6 of Ta- ble 3.6. For all the training programs combined, Tac’s accuracy, precision and recall are 95.0%, 92.6% and 95.8%, respectively. These results indicate that the

SVM classifier trained by using the 35 features (Table 3.2) and the RBF kernel

(Section 3.3.1) is effective in classifying true and false UAF samples.

3.4.2 The Analysis Phase

Our results are summarized in Table 3.7 and Table 3.8. In Table 3.7, Column 2 gives the number of candidate UAF pairs computed by Tac’s pre-analysis, which selects a candidate hfree(p), use(q)i if free(p) can reach use(q) context-sensitively Chapter 3. Machine-Learning-Guided UAF Detection 45

Table 3.6: Results of training. #True and #False are the numbers of true and false UAF samples, respectively.

Samples Results Program #True #False Accuracy Precision Recall rtorrent 46 69 88.6% 81.0% 93.4% less 22 237 96.9% 77.0% 91.0% bitlbee 52 31 90.4% 86.7% 100.0% nghttp2 43 61 82.7% 75.5% 86.0% JTS-C 138 138 96.4% 97.8% 94.9% JTS-C++ 322 322 97.4% 97.2% 97.5% Total 623 858 95.0% 92.6% 95.8% via control-flow, where ∗p and ∗q are found to be aliases by Andersen’s pointer analysis [6] implemented in [117].

In Columns 3 – 4 of Table 3.7, we give the results produced by Tac-NML (i.e.

Tac without machine learning). For each program, Column 3 gives the number of warnings reported and Column 4 gives the reduction rate with respect to the number of warnings produced by the pre-analysis. On average (across the eight programs), Tac-NML achieves a reduction rate of 81.2%. This indicates that path-sensitive typestate analysis alone is quite effective in improving the precision of a coarse-grained pre-analysis. However, a total of 19,803 UAF warnings are still reported, making Tac-NML impractical.

In Columns 5 – 6 Table 3.7, we give the results produced by Tac when machine learning is enabled. For each program, Tac has improved Tac-NML significantly by reducing the number of warnings further (Column 5) and thus achieving an impressive reduction rate (with respect to Tac-NML) (Column 6). On average,

Tac achieves a reduction rate of 96.5%, resulting in only 266 warnings. This shows its effectiveness in suppressing warnings raised.

Tac is also efficient, as shown in Column 2 of Table 3.8 (with the analysis Chapter 3. Machine-Learning-Guided UAF Detection 46

Table 3.7: Tac’s warning-reduction ability. #Candidates is the number of candi- date UAF pairs found by pre-analysis. #WarnsTac−NML and #WarnsTac denote the number of warnings reported by Tac-NML and Tac, respectively. Reduction1 is computed as (#Candidates - #WarnsTac−NML) / #Candidates. Reduction2 is computed as ( #WarnsTac−NML - #WarnsTac) / #WarnsTac−NML.

Program #Candidates #WarnsTac−NML Reduction1 #WarnsTac Reduction2 rtorrent 803 229 71.5% 0 100.0% less 4,628 790 82.9% 3 99.6% bitlbee 529 113 78.6% 16 85.8% nghttp2 975 210 78.5% 16 92.4% mupdf 21,701 1,658 92.4% 50 97.0% h2o 18,143 3,559 80.4% 23 99.4% xserver 53,258 6,706 87.4% 102 98.5% php 26,306 5,818 77.9% 56 99.0% Total 126,343 19,083 — 266 —

Table 3.8: Tac’s efficiency and effectiveness. #True is the number of true bugs (confirmed by manual inspection), FPR is the false positive rate and TPR is the true positive rate.

Program Time (secs) #True FPR TPR rtorrent 90 0 — — less 316 1 66.7% 33.3% bitlbee 151 9 43.8% 56.3% nghttp2 83 7 56.3% 43.8% mupdf 197 19 62.0% 38.0% h2o 6,205 9 60.9% 39.1% xserver 2,053 40 60.8% 39.2% php 5,942 24 57.1% 42.9% Total 15,037 109 — — Chapter 3. Machine-Learning-Guided UAF Detection 47 time of a program averaged over the five runs). Tac spends a total of 4.2 hours on analyzing all the eight programs (consisting of 2,098 KLOC in total), with 90 seconds for the smallest program (less) and 5,942 seconds for the largest program

(php). In the last three columns of Table 3.8, with Column 3 giving the number of true bugs (confirmed by manual inspection), Column 4 the false positive rate (FPR), and Column 5 the true positive rate (TPR) for each program, we see that Tac is capable of finding UAF bugs at low false positive rates. Out of the total 266 warnings reported, 109 are true bugs, yielding an FPR of 58.2% (or a TPR of

41.8%). Thus, our machine-learning-guided approach is effective in locating UAF bugs (with reasonable manual inspection effort required).

Among the 109 bugs found, 14 bugs are distinct (with the UAF pairs shar- ing the same free site and dereferencing the same pointer at their use sites being counted as one), as listed in Table 3.5. These include 6 known ones (5 known CVE vulnerabilities and 1 known bug) and 8 new ones (1 in less, 5 in h2o and and 2 in php).

For less, the 1 new bug is found in a while loop in function ch delbufs in ch.c

(illustrated in Figure 3.6). For h2o, the 5 new bugs are all interprocedural due to premature connection close operations, including one in function do emit writereq in connection.c (illustrated in Figure 3.7), one in function h2o timeout unlink in timeout.c, one in function h2o http2 scheduler run in scheduler.c, one in function h2o linklist unlink and one in function h2o linklist islink in linklist.c. For php, the 2 new bugs are interprocedural, found in zend persist.c (illustrated in Figure 3.8).

It is important to emphasize that for the 14 distinct UAF bugs found, as listed in Table 3.5, only 3 bugs appear in the four training programs. This demonstrates Chapter 3. Machine-Learning-Guided UAF Detection 48

//ch.c 774 static void ch_delbufs() 775 { 776 register struct bufnode *bn; 777 step2 778 while (ch_bufhead != END_OF_CHAIN) 779 { step3 780 bn = ch_bufhead; step4 781 (bn)->next->prev = (bn)->prev; (bn)->prev->next = (bn)->next; step1 782 free(((struct buf *) bn)); 783 } 784 ch_nbufs = 0; 785 init_hashtbl(); 786 }

Figure 3.6: A UAF bug found in less, with the free, use and their aliasing high- lighted in pink, blue and red, respectively. again the effectiveness of our approach in applying machine learning to static UAF detection.

3.4.3 Case Study

Let us take a look at some representative UAF bugs (both previously known and unknown) found by Tac in three programs.

less. Figure 3.6 shows a new UAF bug found in less (version 451) by Tac. At line 782, the program frees an object pointed to by bn and then starts possibly the next iteration of the while-loop at line 778. At line 780, bn is made to point to the same freed object and then dereferenced four times at line 781, causing one distinct

UAF bug. This bug occurs since the programmer forgot to update ch bufhead in the while loop after bn has been freed.

h2o. Figure 3.7 shows a known UAF bug (CVE-2016-4817) in h2o (version

1.7.2) detected by Tac. The program frees conn at line 261 in close connection now through a nested call chain via lines 834 and 861. Then conn is used in the func- Chapter 3. Machine-Learning-Guided UAF Detection 49

//lib/http2/connection.c 228 void close_connection_now(http2_conn_t *conn) { step1 261 free(conn); 262 }

811 static void parse_input(http2_conn_t *conn) { 829 if (ret < 0) { step2 834 close_connection_now(conn); 836 } 848 }

850 static void on_read(socket_t *sock, int stat) { 852 http2_conn_t *conn = sock->data; step3 861 parse_input(conn); step4 865 timeout_unlink(&conn->_write.timeout_entry); step5 866 do_emit_writereq(conn); 868 }

step6 994 int do_emit_writereq(http2_conn_t *conn) { step7 1006 buf = {conn->_write.bbytes, conn->_write.bsz}; step8 1007 socket_write(conn->sock, &buf, 1, on_w_compl); 1012 }

Figure 3.7: CVE-2016-4817 and two new bugs in h2o. tion timeout unlink called at line 865. Tac has succeeded in finding this CVE vulnerability and also two new bugs on dereferencing conn at lines 1006 – 1007 in the function do emit writereq called at line 866. These two new bugs are counted as one distinct bug.

php. Figure 3.8 shows a known UAF bug (CVE-2015-1351) and two new ones in php (version 5.6.7) detected by Tac. These bugs are found in two files. CVE- 2015-1351 is in zend shared alloc.c, where the object pointed by source is freed at line 350 and then used inside a function called at line 352. In addition, Tac also finds two new UAF bugs in zend persist.c. One is a UAF, where the object pointed to by source (and also by ast) is freed at line 350 and then accessed at line 154. The other is a double-free bug, as the same object (pointed by source and ast) is freed at line 350 and then again at line 160. CVE-2015-1351 was fixed in the latest version by simply moving line 352 to just before line 349. However, Chapter 3. Machine-Learning-Guided UAF Detection 50

//ext/opcache/zend_shared_alloc.c 338 void *_zend_shared_memdup(void *source, size_t s){ 349 if (free_source) { step1 350 free(source); 351 } step2 352 zend_shared_alloc_register_xlat_en(source, r); 353 return retval; 354 }

//ext/opcache/zend_persist.c 143 zend_ast *zend_persist_ast(zend_ast *ast) { step3 153 node = _zend_shared_memdup(ast, size); step4 154 for (i = 0; i < ast->children; i++) { 155 if ((&node->u.child)[i]) { 156 (&node->u.child)[i] = ...; 157 } 158 } step5 160 free(ast); 161 return node; 162 }

Figure 3.8: CVE-2015-1351 and two new bugs in php. the two new ones remain unfixed.

3.5 Chapter Summary

In this chapter, we present Tac, a machine-learning-guided static UAF detection framework that bridges the gap between typestate and pointer analyses by cap- turing the correlations between program features and UAF-related aliases that are often imprecisely answered by the state-of-the-art pointer analysis. Tac is effective (in terms of finding 5 known CVE vulnerabilities, 1 known bug, and 8 new bugs with a low false alarm rate) and scalable (in terms of analyzing a large real-world codebase with 2,098 KLOC in just over 4 hours).

Tac relies on pointer analysis and machine learning. Its accuracy can be fur- ther improved in several ways. First, path-sensitivity can be strengthened by solv- ing path feasibility more soundly and precisely. Currently, non-singleton objects Chapter 3. Machine-Learning-Guided UAF Detection 51 are over-approximated to contain ⊥ and path conditions are interpreted as non- satisfiable when the underlying satisfiability solver returns unknown results (caus- ing infeasible paths to be considered conservatively as feasible). Second, a more advanced pointer analysis can itself enable more UAF pairs to be ruled out as UAF bugs. Finally, a better SVM classifier can be developed by adding more UAF train- ing samples in real-world programs and extending the set of features introduced. Chapter 4

Pointer-Analysis-Based Static

Detection of Use-After-Free

Vulnerabilities

Orthogonal to the approach in Chapter3 which uses machine learning to mitigate the gap between pointer analysis and UAF detection, this chapter investigates the problem from a different perspective, in which UAF detection is performed in a combined domain with pointer analysis by a novel reduction technique.

4.1 Overview

In this chapter, we introduce a new pointer-analysis-based static source code anal- ysis for finding UAF bugs in multi-MLOC C programs efficiently and effectively.

We first formulate the problem of detecting UAF bugs statically. We then describe several challenges faced, existing static techniques (for analyzing C/C++ source code), and our solution (by highlighting its novelty).

52 Chapter 4. Pointer-Analysis-Based UAF Detection 53

4.1.1 Problem Statement

Consider a pair of statements, hfree(p@lf ), use(q@lu)i, where p and q are pointers and lf and lu are line numbers. Let P(l) be the set of all feasible (concrete) program paths reaching line l from main(). The pair is a UAF bug if and only if  ST free(p@lf ), use(q@lu) holds:

[Spatio-Temporal Correlation]  ST free(p@lf ), use(q@lu) := (4.1) ∼ ∃ (ρf , ρu) ∈ P(lf ) × P(lu):(ρf , lf ) (ρu, lu) ∧ (ρf , p) = (ρu, q) ; where denotes temporal reachability (in the program’s ICFG (Interprocedural ; Control Flow Graph)) and ∼= denotes a spatial alias relation (meaning that p and q point to a common object). By convention, (ρ, l) identifies the program point l under a path abstraction ρ. Both temporal and spatial properties must correlate on the same concrete program path. However, ST is not computationally verifiable due to exponentially many paths in large codebases.

4.1.2 Challenges

One main challenge faced in designing a pointer-analysis-based static UAF anal- ysis, A, lies in how to reason about the exponential number of program paths in P(lf ) × P(lu) in order to find real bugs at a low false positive rate. This en-

A tails approximating ST with ST by abstracting these program paths with some contexts according to a tradeoff to be made among soundness, precision and scal-  ability. A is sound (by catching all UAF bugs) if ST free(p@lf ), use(q@lu) A  ⇒ ST free(p@lf ), use(q@lu) for every UAF pair hfree(p@lf ), use(q@lu)i. A A  is precise (by reporting no false alarms) if ST free(p@lf ), use(q@lu) ⇒ ST Chapter 4. Pointer-Analysis-Based UAF Detection 54

 free(p@lf ), use(q@lu) for every hfree(p@lf ), use(q@lu)i. A is regarded as being

A scalable if ST can analyze large codebases under a given budget. For convenience, A ST is also said to be sound/precise/scalable if A is sound/precise/scalable. Another challenge is how to verify efficiently and precisely, especially in the ; presence of aliasing, as discussed below.

A final challenge lies in how to obtain ∼= efficiently and precisely. This re- quires a pointer analysis that is field-sensitive (by distinguishing different fields in a struct), flow-sensitive (by distinguishing flow of control), context-sensitive (by distinguishing calling contexts for a function), and path-sensitive (by distinguishing different program paths). However, computing such precise points-to information by reasoning about P(lf ) × P(lu) is unscalable, despite recent advances on whole- program [45, 69, 148, 151] and demand-driven [47, 108, 116, 156] pointer analyses for C programs.

4.1.3 State of the Art

Due to the above challenges, there has been little work on developing specialized static approaches for detecting UAF bugs at the source-code level. General-purpose static approaches for detecting memory corruption bugs include model checking [8,

10, 59], abstract interpretation [5, 36, 54], pattern matching [91], and pointer analy- sis [112, 116]. Their corresponding representative tools are CBMC [59], Clang [5],

Coccinelle [91], and Supa (which can be leveraged for finding UAF bugs) [116].

Model checking CBMC [59] is a bounded model checker that reasons about all the program paths in P(lf ) × P(lu) given in Formula 4.1 for C/C++ programs as constraints that can be solved by an SMT solver. When used in finding UAF bugs, CBMC is sound (in a bounded manner) and highly precise but scales only Chapter 4. Pointer-Analysis-Based UAF Detection 55 to small programs [130] whose “sizes are restricted” (according to its user manual).

Abstract interpretation Clang [5] is an abstract interpreter for analyzing C/C++ programs. It adopts a highly unsound model by analyzing only a small subset of the program paths in P(lf )×P(lu) given in Formula 4.1 in order to achieve scalability and precision. To scale for large codebases with few false alarms, Clang limits its UAF-bug-finding ability by performing an intraprocedural analysis (with inlining). In general, such tools refrain from reporting too many false alarms, but at the expense of missing many UAF bugs.

Pattern matching Coccinelle [91] is a pattern-based tool for analyzing and certifying C programs. Coccinelle can find UAF bugs based on some patterns given. Due to the lack of the points-to information, Coccinelle can be both fairly unsound and imprecise but is highly scalable (due to its pattern-matching nature).

Pointer analysis Supa [116] is a state-of-the-art demand-driven pointer analysis that is field-, flow- and context-sensitive but path-insensitive for C programs. When used in finding UAF bugs, Supa can be regarded as reasoning about all the program paths in P(lf )×P(lu) with an extremely coarse abstraction, {[]}×{[]}, in order to achieve soundness and scalability. By convention, [ ] represents all possible calling contexts and thus all possible (concrete) paths reaching l. Thus, ST in Formula 4.1 Supa is weakened significantly to ST :

[Spatio-Temporal Correlation with a High Level of Spuriosity] (4.2) Supa  ∼ ST free(p@lf ), use(q@lu) := ([ ], lf ) ([ ], lu) ∧ ([ ], p) = ([ ], q) ; where is the standard context-sensitive reachability and ∼= is the standard ; context-sensitive alias relation obtained under [ ]. Chapter 4. Pointer-Analysis-Based UAF Detection 56

Supa When used in finding UAF bugs, ST will be highly imprecise, since spurious spatio-temporal correlations are introduced at an extremely large number of UAF

Supa   pairs, where ST free(p@lf ), use(q@lu) 6⇒ ST free(p@lf ), use(q@lu) holds, as explained in Section 4.2 and validated in Section 4.5. These spurious correlations are false alarms.

4.1.4 Our Solution and Contributions

We introduce an (interprocedural) pointer-analysis-based static analysis, CRed, for finding UAF bugs in multi-MLOC C code, by making several contributions.

First, we present a spatio-temporal context reduction technique that enables developing our new static UAF analysis systematically by simplifying ST in For- CRed mula 4.1 into ST given below:

[Spatio-Temporal Context Reduction]

CRed  ST free(p@lf ), use(q@lu) := (4.3) ∼ ∃ (ρef , ρeu) ∈ Pe(lf ) × Pe(lu):(ρef , lf ) (ρeu, lu) ∧ (ρef , p) = (ρeu, q) ;

CRed We ensure that ST is sound by requiring Pe(l) to be a coarser abstraction of P(l) and scalable by requiring |Pe(lf ) × Pe(lu)|  |P(lf ) × P(lu)|. Unlike

Supa CRed CRed  ST , however, ST will be highly precise, as ST free(p@lf ), use(q@lu)  6⇒ ST free(p@lf ), use(q@lu) happens only for a small number of UAF pairs. With spatio-temporal context reduction, CRed is designed purposely to preserve the spatio-temporal correlation of ST by keeping spurious correlations, i.e., false alarms, as low as possible. Without it, CRed will be either highly unsound or highly imprecise.

Second, we adopt a multi-stage approach that starts with some UAF pairs ob- Chapter 4. Pointer-Analysis-Based UAF Detection 57 tained by a pre-analysis and then uses increasingly more precise yet more costly

UAF analyses on increasingly fewer UAF pairs to filter out false alarms. In our cur- rent implementation, we perform context reduction by first using calling contexts and then considering path sensitivity. Staging such analyses this way improves the efficiency of the overall solution.

Third, we introduce a demand-driven pointer analysis with field-, flow-, context- and path-sensitivity as the foundation for the main analysis stages of CRed. This work is the first to consider path-sensitivity on-demand in order to reduce false

UAF alarms.

Finally, we have implemented CRed in LLVM-3.8.0 and compared it with four state-of-the-art source-code analysis tools: CBMC (model checking) [59],

Clang (abstract interpretation) [5], Coccinelle (pattern matching) [91], and

Supa (pointer analysis) [116] using all the C test cases in Juliet Test Suite (JTS) [1] and 10 open-source C applications. For the ground truth evaluated with JTS,

CRed is as effective as CBMC and Supa by detecting all the 138 known UAF bugs while Clang reports only 36 bugs and Coccinelle finds 126 bugs, with no false alarms issued in all the cases. For practicality evaluated with the 10 applica- tions (totaling over 3 MLOC), CRed produces 132 warnings including 85 bugs in about 7.6 hours. In contrast, CBMC produces no warnings, terminating in 19.0 hours for the smallest application but exceeding the 3-day time budget for every remaining application; Clang reports 3 warnings including 1 bug in 1.2 hours;

Coccinelle reports 103 false alarms in 179.0 seconds without finding any bugs; and Supa detects the same 85 bugs found by CRed, together with 23,095 false alarms, in 5.1 hours. Chapter 4. Pointer-Analysis-Based UAF Detection 58

4.2 The CRed Framework

Pre- Context Path- Input Analysis Refined Reduction Refined Sensitivity Refined pairs pairs pairs Bug Program Report Demand-Driven Pointer Analysis

Figure 4.1: Workflow of CRed.

As depicted in Figure 5.2, we start with a fast but imprecise “Pre-Analysis” (i.e., an Andersen-style pointer analysis [6]) to obtain a set of candidate UAF pairs to be analyzed (according to Formula 4.1). We then apply two spatio-temporal context reductions, “Calling Context Reduction” (Section 4.2.1) and “Path Reduction”

(Section 4.2.2), founded on the same demand-driven pointer analysis infrastructure.

Note that each stage refines the results from the preceding one.

We focus on describing how calling-context reduction works and why it is sig- nificant. Without the two reduction techniques, a UAF analysis that relies on existing pointer analysis techniques will be either unscalable or highly imprecise

(Section 4.5).

4.2.1 Calling-Context Reduction

CRed The objective is to simplify ST in Formula 4.1 into ST in Formula 4.3 by abstracting program paths with calling contexts so that CRed is sound, scalable and precise. Our example is given in Figure 4.2. We use whole-program pointer analysis [45, 69, 148, 151] to explain why CRed would be unscalable if full calling contexts were used (although it would be highly precise) and imprecise if k-limited calling contexts were used (although it would be possibly scalable). These argu- ments apply also to demand-driven pointer analysis [47, 108, 116, 156] (as validated Chapter 4. Pointer-Analysis-Based UAF Detection 59

1: int main() { 21: int* xmalloc() { 2: f1(); //ca1 22: return malloc(1); //o 3: f1(); //cb1 23: } 4: }

5: void f1() { 24: void xuse(int* u) { 6: f2(); //ca2 25: xxuse(u); //c8 7: f2(); //cb2 26: } 8: } ...... 9: void fn() { 27: void xfree(int* v) { 10: com(); //c1 28: xxfree(v); //c9 11: } 29: }

12: int *x, *y; 30: void xxuse(int* q) { 13: void com() { 31: print(*q); //use(q) 14: x = xmalloc(); //c2 32: } 15: y = xmalloc(); //c3 16: xuse(x); //c4 17: xfree(x); //c5 33: void xxfree(int* p) { 18: xuse(y); //c6 34: free(p); 19: xfree(y); //c7 35: } 20: }

Figure 4.2: An example program that illustrates how calling-context reduction overcomes the limitations of full and k-limited context sensitivity in UAF detection.

later). We achieve both efficiency and precision by reducing full calling contexts substantially in both length and quantity.

4.2.1.1 Context-Sensitivity

We introduce the terminologies and notations used in context-sensitive program analysis.

• Call String (or Call Stack). In a k-limited or k-callsite context-sensitive analysis, every variable accessed or object allocated in a function fun is identified Chapter 4. Pointer-Analysis-Based UAF Detection 60

by a call string c = [c1, . . . , ck], known as a calling context, which represents a sequence of the k-most-recent call sites (on the call stack) calling fun. In a call

string, every recursion cycle is typically approximated once. The analysis is said

to be fully context-sensitive if c1 starts from main().

• Context-Sensitive Control-Flow Reachability. Given two program points l and l0 identified under contexts c and c0, respectively, (c, l) (c0, l0) signifies that ; (c, l) reaches context-sensitively (c0, l0). This is solved as a balanced-parentheses

problem by matching calls and returns to filter out unrealizable paths in the

program’s ICFG [102]. We start from (c, l) with an abstract stack initialized as

c. When entering a callee function from a callsite ci, we push ci into the context

stack containing c, denoted c ⊕ [ci]. When returning from a callee to a callsite

cj, we pop cj from the current stack containing c, denoted c [cj], if c contains

cj as its top value or c = [ ] since a realizable path may start and end in different functions. Finally, (c, l) (c0, l0) is established if l0 is reached when the context ; stack contains c0.

• k-Call-Site Context-Sensitive Pointer Analysis. Let pt(c, v) be the points- to set of a variable v under a calling context c such that |c| = k. Given two

variables p and q,(c, p) ∼= (c0, q) holds if p and q may point to a common object, i.e., pt(c, p) ∩ pt(c0, q) 6= ∅. Here, c (c0) represents the calling sequence for the

function where p (q) is defined and h (h0) represents the calling sequence for the

function where object o is allocated. We speak of full context-sensitivity if c, c0,

h and h0 all start from main().

4.2.1.2 Limitations of k-Call-Site Context-Sensitivity.

Figure 4.2 illustrates a typical heap usage scenario. In lines 1 – 11, there are 2n call- ing contexts to com() from main(). In lines 12 – 35, two heap objects are allocated Chapter 4. Pointer-Analysis-Based UAF Detection 61

pt([ca1, ..., can, c1, c5, c9], p) = { ([ca1, ..., can, c1, c2], o) } pt([ca1, ..., can, c1, c7, c9], p) = { ([ca1, ..., can, c1, c3], o) }

pt([ca1, ..., can, c1, c4, c8], q) = { ([ca1, ..., can, c1, c2], o) } ૛

pt([ca1, ..., can, c1, c6, c8], q) = { ([ca1, ..., can, c1, c3], o) } ࢔ ૝ ൈ ...

pt([cb1, ..., cbn, c1, c5, c9], p) = { ([cb1, ..., cbn, c1, c2], o) } pt([cb1, ..., cbn, c1, c7, c9], p) = { ([cb1, ..., cbn, c1, c3], o) } pt([cb1, ..., cbn, c1, c4, c8], q) = { ([cb1, ..., cbn, c1, c2], o) } pt([cb1, ..., cbn, c1, c6, c8], q) = { ([cb1, ..., cbn, c1, c3], o) } (a) Fully context-sensitive points-to sets

pt([c5, c9], p) = {([c2], o)} pt([c9], p) = { ([c2], o), ([c3], o) } pt([c7, c9], p) = {([c3], o)} pt([c8], q) = { ([c2], o), ([c3], o) } pt([c4, c8], q) = {([c2], o)} pt([c6, c8], q) = {([c3], o)} (b) k-limited context-sensitive (c) Points-to sets with points-to sets (k = 1) calling-context reduction

Figure 4.3: Points-to sets of the program in Figure 4.2.

(lines 14 – 15), then used (lines 16 and 18), and finally, deallocated (lines 17 and 19),

through a series of wrappers. There is one UAF pair hfree(p@ln34), use(q@ln31)i

to be analyzed, where use(q@ln31) stands for print(∗q) at line 31.

This example is UAF-free. With full context-sensitivity, no warnings would

be reported but the resulting analysis is unscalable. With k-limiting, the analysis

scales, but at the expense of precision. To understand this, we give the points-to

sets in Figure 4.3 for the example program in Figure 4.2.

• Full Context-Sensitivity: Precise but Unscalable. As shown in Figure 4.4, R is the set of 2n full calling contexts for com(). Thus, there are 2n+1 × 2n+1

calling context pairs reaching hfree(p), print(q)i. As ∀c∈R :(c⊕[c5, c9], ln34) ;∼ (c⊕[c6, c8], ln31), free(p) reaches print(∗q). However, as @ c∈R, ([c⊕[c5, c9], p) = Chapter 4. Pointer-Analysis-Based UAF Detection 62

main f1 f2 … fn-1 fn com call strings from main to com

[ 풄풂ퟏ, 풄풂ퟐ, … , 풄풂풏, 풄ퟏ ] … [ 풄풂ퟏ, 풄풂ퟐ, … , 풄풃풏, 풄ퟏ ] [ 풄풂ퟎ, 풄…풂ퟐ,, … , 풄풂풏, 풄ퟏ ] 퐑 = [ 풄풂ퟎ, 풄…풂ퟐ,, … , 풄풃풏, 풄ퟏ ] ퟐ풏 … … …

[ 풄풃ퟏ, 풄풃ퟐ, … , 풄풂풏, 풄ퟏ ] [ 풄풃ퟏ, 풄풃ퟐ, … , 풄풃풏, 풄ퟏ ]

com main (Common Caller) xmalloc : call ...... 풄ퟐ: x=xmalloc() malloc() : return 풄 : ퟑ y=xmalloc() xuse xxuse 풄ퟒ: xuse (x) 풄ퟖ: xxuse(u) print(*q) 풄ퟓ: xfree(x) fn 풄ퟔ: xuse (y) xfree xxfree 풄ퟏ: com() 풄ퟕ: xfree(y) 풄ퟗ: xxfree(v) free(p)

෪ 풄풇풖 ∈ 푹 풉 ∈ [풄ퟐ],[풄ퟑ] 풄෦풇 ∈ 풄ퟓ,풄ퟗ , 풄ퟕ,풄ퟗ 풄෦풖 ∈ 풄ퟒ,풄ퟖ , 풄ퟔ,풄ퟖ Figure 4.4: Interprocedural control-flow graph (ICFG) of the program in Figure 4.2.

([c⊕[c6, c8], q), p and q never point to a common object. Therefore, no UAF warning will be issued. To reason about ∼=, however, we may have to compute 2n+2 points-to sets in Figure 4.3(a), making existing pointer analysis techniques

[45, 112, 116, 148] unscalable when n is large.

• k-Limiting: Scalable but Imprecise. With k = 1, the 2n+1 calling contexts

reaching free(p)( print(∗q)) are abstracted by [c9]([ c8]). Then ([c9], ln34) ∼ ; ([c8], ln31). In addition, ([c9], p) = ([c8], q) holds spuriously, based on the two points-to sets in Figure 4.3(b), computed imprecisely but possibly efficiently.

Thus, a false alarm (from line 17 to line 18) is reported.

With 2-limiting, the false alarm will be suppressed. However, increasing k will not

work for large codebases for two reasons. First, the number of context pairs to be Chapter 4. Pointer-Analysis-Based UAF Detection 63

෪ ∞ ∞ (풄풇풖⊕ 풉 , 풐) ∈ 풑풕 풄풇풖 ⊕ 풄෦풇 , 풑 ∩ 풑풕 (풄풇풖 ⊕ 풄෦풖 , 풒) Common Caller ෪풉 //풐 main 풍풐: Transitively call malloc malloc() 풄풇풖 풄෦풇 ... 풍풑: Transitively call free 풍풇: free(풑) 풄෦풖 풍풒: Transitively call use 풍풖: use(풒)

Figure 4.5: Context reduction, illustrated conceptually with an oracle fully-context- sensitive pointer analysis.

analyzed at a UAF pair will grow exponentially. Second, the optimal values for

k vary across the UAF pairs. Finding such values is beyond the state-of-the-art.

4.2.1.3 Spatio-Temporal Calling-Context Reduction

The key insight is to remove prefixes in full calling contexts that do not contribute to context-sensitivity, thereby achieving the precision of full context-sensitivity and the scalability of k-limiting.

Let pt∞(c, v) be the points-to set of v under context c computed by an oracle pointer analysis fully context-sensitively. As illustrated in Figure 4.5, a UAF pair hfree(p@lf ), use(q@lu)i is a bug when C1 – C4 hold:

(C1): main() calls, under a context cfu, a common caller function, which calls an object allocation function, e.g., malloc( ), free(p) and use(q) at lines lo, lp and lq in that order,

(C2): o is allocated under context cfu ⊕ eh,

∞ (C3):(cfu ⊕ eh, o) ∈ pt (cfu ⊕ cef , p), and

∞ (C4):(cfu ⊕ eh, o) ∈ pt (cfu ⊕ ceu, q). ∼ By definition, (cf , lf ) (cu, lu) ∧ (cf , p) = (cu, q) ⇐⇒ (cef , lf ) (ceu, lu) ∧ ∼ ; ; (cef , p) = (ceu, q), making cfu redundant. Chapter 4. Pointer-Analysis-Based UAF Detection 64

For our example, Figure 4.4 illustrates the calling context reduction performed.

As cfu ∈ R is a common prefix for the common caller, com(), that satisfies C1 – C4, a total of 2n+1 × 2n+1 full calling context pairs reaching hfree(p), print(∗q)i have been reduced to just four, with (cef , ceu) ∈ {[c5, c9], [c7, c9]} × {[c4, c8], [c6, c8]} ∼ 0 and eh ∈ {c2, c3}. As com() is a common caller, (c ⊕ cef , p) = (c ⊕ ceu, q) does not hold, i.e., p and q are must-not-aliases if c and c0 are different prefixes in R. Thus,

0 0 it is only necessary to verify (c ⊕ cef , ln34) (c ⊕ ceu, ln31) when c = c . We can ; do this efficiently by checking if car(cef ) appears lexically before car(ceu) in com(), i.e., if lp appears before lq in Figure 4.5. Note that car is the standard function for returning the first element in a sequence. For the four reduced context pairs, only ([c5, c9], ln34) ([c6, c8], ln31) holds since c5 precedes c6 in com(). According ; to the points-to sets, shown in Figure 5.1(c), computed efficiently for the reduced ∼ calling contexts, ([c5, c9], p) 6= ([c6, c8], q). Hence, no UAF warnings are reported. In Section 4.3, we will give an inference rule for performing the calling context reduction illustrated above.

4.2.2 Path Reduction

We improve precision by augmenting calling contexts with path-sensitivity. Con- sider a bug-free example in Figure 4.6. Without path-sensitivity, p at line 4 points to o1 and q at line 7 points to o1 and o2, causing a path-insensitive detector to report a false alarm hfree(p@ln4), use(p@ln7)i. With path-sensitivity, however, this false alarm will be suppressed successfully.

In Section 4.3, we will give an inference rule for path reduction. To the best of our knowledge, CRed is the first UAF detector for large codebases that reasons about path-sensitivity on-demand based on a new path-sensitive demand-driven pointer analysis. Chapter 4. Pointer-Analysis-Based UAF Detection 65

1: void foo() { 2: p = malloc(...);//o1 3: if (cnd) { 4: free(p); //free(p@ln4) 5: p = malloc(...);//o2 6: } 7: print(*p); //use(p@ln7) 8: }

Figure 4.6: Path reduction.

4.3 Design

As shown in Figure 5.2, CRed comprises three key components: ○1 spatio-temporal context reduction, ○2 demand-driven pointer analysis, and ○3 multi-stage UAF analysis. While ○1 represents the most important contribution of this chapter, we introduce ○2 and ○3 first in that order in order to build the basis for ○1 .

4.3.1 Demand-Driven Pointer Analysis

We describe a demand-driven pointer analysis that is not only field-, flow- and context-sensitive as in [112, 116] but also path-sensitive. Adding path-sensitivity is significant in terms of both advancing demand-driven pointer analysis in general and reducing a large number of false alarms that would otherwise be reported by

CRed.

4.3.1.1 Program Representation

A C program is represented by putting it into LLVM’s partial SSA form, follow- ing [45, 69, 71, 148]. The set of program variables V is separated into two subsets:

A containing all possible targets, i.e., address-taken variables of a pointer, and T containing all top-level variables, where V = T ∪ A. Chapter 4. Pointer-Analysis-Based UAF Detection 66

After the SSA conversion, a program has seven types of statements: p = &a

(AddrOf), p = q (Copy), p = ∗q (Load), ∗p = q (Store), p = φ(..., q, ...)(Phi), p = callfun(q)(Call), and return p (Return), where p, q ∈ T and a ∈ A. Top-level variables are put directly in SSA form while address-taken variables are accessed indirectly via Load or Store. For an AddrOf statement p = &a, known as an allocation site, a is a stack or global variable with its address taken or a dynamically created abstract heap object. Passing parameters and return values

(explicitly for top-level and implicitly for address-taken variables) is modeled by

Copy. All pointer analyses used are field-sensitive. Each field instance of a struct is treated as a separate object. However, arrays are considered monolithic. Precise solutions for arrays do not exist.

Given a program, its ICFG is built in the normal manner [61]. A call site for a function fun is split into a call node and a return node, with a call edge from the call node to the entry node of fun and a return edge from the exit node of fun to the return node.

4.3.1.2 Algorithm

As shown in Figure 4.7, we extend [112, 116] by making it also path-sensitive with the required path guards generated on-demand. Our analysis is flow-sensitive, since it answers a points-to query for a variable v by traversing all the def-use chains affecting v backwards on a value-flow graph (VFG) [45, 117, 119]. In the VFG, a node represents a statement (identified by its line number) and an edge from statement l to statement l0, denoted l →v l0, represents a def-use relation for a variable v ∈ V, with its def at statement l and its use at statement l0. These def-use chains are pre-computed with a fast but imprecise Andersen-style pointer Chapter 4. Pointer-Analysis-Based UAF Detection 67 analysis flow- and context-insensitively [6]. Our analysis is also context-sensitive.

The points-to query pt([c1, . . . , ck], v), where ci identifies a call site, returns the points-to set of v for all the function calling sequences ending with [c1, . . . , ck]. Thus, pt([ ], v@l) gives the points-to set of v at line l at all calling contexts.

c,τ,l : p = &o [ADDR] (c,τ,l,p ) ←- (c,τ,o l)

q c,τ,l : p = ql q → lδ q = Guard(lq, l) [COPY] (c,τ,l,p ) ←- (c,τ ∧ δq, lq, q)

q c,τ,l : p = φ(.., q, ..) lq → lδ q = Guard(lq, l) [PHI] (c,τ,l,p ) ←- (c,τ ∧ δq, lq, q)

c,τ,l : p = ∗q (c,τ ∧ δq, lq, q) ←- (co,τ o, o) q o lq → l lo → lδ q = Guard(lq, l) δo = Guard(lo, l) [LOAD] (c,τ,l,p ) ←- (co,τ o ∧ δo, lo, o)

c,τ,l : ∗p = q (c,τ ∧ δp, lp, p) ←- (co,τ o, o) p q o lp → l lq → l lo → l δp = Guard(lp, l) δq = Guard(lq, l) δo = Guard(lo, l) [STORE] (co,τ o, l, o) ←- (c ,τ ∧ δq, lq, q) (co,τ o, l, o) ←- (co,τ o ∧ δo, lo, o)

c,τ,l :define fun(v) {...} lcall : call fun(a) a la → lcall δa = Guard(la, lcall) [CALL] (c,τ,l,v ) ←- (c [l],τ ∧ δa, la, a)

c,τ,l : y = call fun(...) define fun(...) {..., lret :return x} x lx → lret δx = Guard(lx, lret) [RETURN] (c,τ,l,y ) ←- (c ⊕ [l],τ ∧ δx, lx, x)

(c,τ,l,v ) ←- (c0,τ 0, l0, v0)(c0,τ 0, l0, v0) ←- (c00,τ 00, l00, v00) [TRANS] (c,τ,l,v ) ←- (c00,τ 00, l00, v00)

Figure 4.7: Demand-driven pointer analysis with field-, flow- and context- sensitivity as in [116] and path-sensitivity added.

We explain our extension on handling path-sensitivity highlighted in red. Call- ing contexts are path abstractions but can be too coarse. To perform path-sensitive Chapter 4. Pointer-Analysis-Based UAF Detection 68 analysis, we represent an abstract path by both a calling context c and a path guard

τ so that c specifies its calling sequence and τ collects its branch conditions. Thus, pt((c, τ), v@l) gives the points-to set of v at line l under (c, τ).

In a function fun, every branch condition is treated as a Boolean formula. As in [20, 119], a loop (after unrolling, if needed) is approximated only once with its backedge ignored. For each control-flow edge e, EdgeGuard(e) is the branch condition under which e is executed. For a control-flow path cp, which consists of a set of control-flow edges e, the path condition is the logical conjunction of V branch conditions of e, i.e., EdgeGuard(e). A path guard Guard(l, l0) from a e∈cp statement l to a statement l0 in fun is the logical disjunction of path condition of all control-flow paths from l to l0:

_ ^ Guard(l, l0) = EdgeGuard(e) (4.4) cp∈P ath(l,l0) e∈cp

where P ath(l, l0) denotes the set of control-flow paths from l to l0.

A path guard τ from the entry of main() to a statement is defined simply in terms of Formula 4.4. For the two special cases, true (false) represents an abstract feasible (infeasible) path.

Given (c, τ, l, v), where variable v appears at line l, the points-to set of v is computed by finding all reachable objects (co, τo, o) via backward traversal on the pre-computed def-use chains:

pt((c, τ), v@l) = { (co, τo, o) | (c, τ, l, v) ←- (co, τo, o) } (4.5)

The first seven rules handle the seven types of statements in the program by traversing backwards along all the pre-computed def-use chains affecting v@l. The last says that ←- is transitive. In [ADDR], objects created at different allocation Chapter 4. Pointer-Analysis-Based UAF Detection 69

main() { ln1 - ln5 Guard(ln1, ln2) = true cnd 1: p = &o1; Guard(ln1, ln4) = true cnd ln6 2: x = &o2; Guard(ln1, ln6) = cnd 3: y = &o true 3 ൓ ln7 4: *p = x; cnd Guard(ln1, ln8) = Guard(ln4, ln8) 5: if(cnd) cnd ln8 = (cnd∧¬cnd) ∨ (¬cnd∧¬cnd) 6: *p = y; ൓true = ¬cnd 7: if(!cnd) return 8: z = *p; Guard(ln6, ln8) = ¬cnd }

(a) Code (b) CFG (c) Path Guards

Figure 4.8: Path guard construction on a CFG.

sites are identified by their line numbers. In [CALL], a∈V denotes a variable passed into the callee directly or indirectly via parameter passing. Similarly, x in [RE-

TURN] represents a value returned directly or indirectly from the callee to its caller.

Context-sensitivity is enforced by matching calls and returns. In c⊕[l], the callsite label l is appended to c. In c [l], l is removed from c if c contains l as its top value or is empty since a realizable path may start and end in different functions [113].

Strong updates are performed on singleton objects as in Supa [116]. For a program, its call graph is built on the fly. Our analysis handles its SCCs

(Strongly Connected Components) context-sensitively but the function calls in an

SCC context-insensitively as in [113]. Thus, our analysis is fully field- and flow- sensitive as well as fully context- and path-sensitive (modulo loops and recursion cycles).

4.3.1.3 Example

We use an example in Figure 4.8 to explain our rules on on-demand path-guard generation, with some relevant path guards shown. Suppose a points-to query pt(([ ], true), z@ln8) is issued. With path-sensitivity, we can determine precisely Chapter 4. Pointer-Analysis-Based UAF Detection 70

that z points only to o2 but not o3. In line 8, [LOAD] is applied:

p [], true, ln8 : z = ∗p ([ ], ¬cnd, ln1, p) ←- ([ ], ¬cnd, o1) ln1 → ln8

o1 δp =Guard(ln1, ln8)= ¬cnd δo1 =Guard(ln4, ln8)=¬cnd ln4 → ln8

([ ], true, ln8, z) ←- ([ ], ¬cnd, ln4, o1)

Similarly, by applying [STORE] and [ADDR] to lines 4 and 2, respectively, we obtain

([],true, ln8, z)←-([],¬cnd, ln4, o1)←-([],¬cnd, ln2, x)←-([],¬cnd, o2). Thus, z points to o2. We can also attempt to trace z backwards to o3 via ∗p = y (line 6), by first ap- plying [LOAD] in line 8, which produces ([ ], true, ln8, z) ←- ([ ], ¬cnd, ln6, o1). How- ever, no more rules can be applied further, because ([ ], ¬cnd, ln6, o1) 6←- ([ ], ¬cnd∧ cnd, ln6, y), as ¬cnd∧cnd = false, representing an infeasible path. Thus, z cannot point to o3.

4.3.2 Multi-Stage UAF Analysis

CRed, as shown in Figure 5.2, consists of two stages, Stages 1 and 2. Each stage decides whether to issue a warning or not for a given UAF pair by verifying its own

CRed version of ST given in Formula 4.3, which is discussed below. The pre-analysis, which serves to provide the set of UAF pairs for CRed to analyze, can be regarded as Stage 0.

Each stage is founded on a pointer analysis, Pi. In Stage 0 (our pre-analysis), P0 is flow-, context- and path-insensitive. In Stage 1 (with calling-context reduction),

P1 is flow- and context-sensitive on-demand. In Stage 2 (with path reduction), P2 is also path-sensitive. As i increases, Stage i becomes progressively more precise but also more costly, working on filtering out false alarms from an increasingly smaller set of UAF warnings provided by Stage i−1. ∼ At Stage i, where 1 6 i 6 2, we obtain and = as follows. To obtain ; Chapter 4. Pointer-Analysis-Based UAF Detection 71

∼ =, we invoke Pi to compute the points-to set pti(ρi, v), with pt in Formula 4.5 subscripted by i, for every variable v needed on-demand under a budget ηi. Here,

ρi is an appropriate path abstraction used by Pi for querying v. If ηi is exhausted before pti(ρi, v) is found, we fall back to Pi−1 by setting pti(ρi, v) = pti−1(ρi−1, v) conservatively, where the set of concrete paths abstracted by ρi is a subset of the set of concrete paths abstracted by ρi−1. To obtain , we compute it on the ICFG ; obtained in Stage 0 and refined with the function pointers being resolved more precisely by Pi.

4.3.3 Spatio-Temporal Context Reduction

We describe two reductions performed for Stages 1 and 2, with the latter being developed on top of the former, making Stage 2 more precise but also more costly than Stage 1. For each stage, we give the inference rules for implementing for its reduction.

4.3.3.1 Stage 1. Calling-Context Reduction

We abstract program paths with calling contexts so that the resulting UAF analysis is sound, scalable and highly precise (with as few spurious correlations as possible), as already motivated in Section 4.2. To this end, we would like to replace P(lf ) ×

P(lu) in Formula 4.1 with a coarser abstraction Ce(lf ) × Ce(lu) expressed in terms of calling contexts, reduced as shown in Figure 4.5, so that ST in Formula 4.1 can C simplify to ST :

[Spatio-Temporal Calling Context Reduction]

C  ST free(p@lf ), use(q@lu) := (4.6) ∼ ∃ (cef , ceu)∈Ce(lf )×Ce(lu):(cef ,lf ) (ceu,lu) ∧ (cef ,p) = (ceu,q) ; Chapter 4. Pointer-Analysis-Based UAF Detection 72

෪ ෪ ෪ common 풉풑 common 풉풑(풉풒) caller malloc()//풐 caller malloc()//풐 main − main + 풍풐(풍풑) 풄 풄 풍풐 풍풇:free(풑) 풄෦풇 … … 풍풇:free(풑) 풍풒 풍풑 풄෦풖 풍풖:use(풒) + − 풍풒 풄෦풖 ෦풄풇=풄 ⨁풄 풍풖:use(풒) (a) ෪ + ෪ (b) 풉풒=풄 ⨁풉풑

Figure 4.9: Context reduction with a demand-driven flow- and context-sensitive pointer analysis that computes the points-to set of a variable under the calling context [ ]. Boxes and arrows represent functions and (transitive) calls, respectively.

How do we construct Ce(lf )×Ce(lu)? The basic idea is illustrated conceptually in Figure 4.5 with an oracle fully-context-sensitive pointer analysis, pt∞. To reduce

the number of context pairs in Ce(lf ) × Ce(lu), we should remove their redundant prefixes if they do not help separate calling contexts as desired.

However, pt∞ is non-existent as it is not scalable for reasonably large programs.

Below we obtain Ce(lf )×Ce(lu) equivalently by using pt1, which is a flow- and context- sensitive pointer analysis in Stage 1 (Section 4.3.2), with the intuition illustrated

in Figure 4.9:

(hep, o) ∈ pt1([ ], p)(heq, o) ∈ pt1([ ], q) hep is a suffix of heq

lo = car(heq ⊕[o]) lp = car(cef ⊕[lf ]) lq = car(ceu ⊕[lu])

lp and lq reside in the function containing lo or its callee, s.t. lp 6= lq (4.7) cef is a calling context for lf ceu is a calling context for lu [CTX-R]

(cef , ceu) ∈ Ce(lf ) × Ce(lu)

Figure 4.9 illustrates a total of two cases in which hfree(p@lf ), use(q@lu)i may be potentially a UAF bug. The scenario illustrated earlier in our motivating ex- Chapter 4. Pointer-Analysis-Based UAF Detection 73 ample given in Figure 4.5 is a special instance of one of these two cases.

As shown in Figure 4.7, pt([ ], v) is computed on-demand by traversing interpro- cedurally the statements producing values that may flow into v, under all possible calling contexts for the function containing v, as indicated by [ ]. If v is found to point to o under context c when [ADDR] is applied, then (c, o)∈pt([ ], v). Note that c is a suffix of a calling sequence from main() to o’s allocation site. ∼ Let us examine [CTX-R]. As (hep, o) ∈ pt1([ ], p) and (heq, o) ∈ pt1([ ], q), (cf , p) =

(cu, q) holds if hep is a suffix of heq, i.e., the set of full calling contexts (from main()) abstracted by hep is a superset of the set of full calling contexts abstracted by heq, in which case, (hep, o) and (heq, o) may represent a common (concrete) object. The common caller, Com, for malloc(), free(p) and use(q), is the function containing line lo. Lines lp and lq reside in either Com (Figure 4.9(a)) or a (direct or indirect) callee of Com (Figure 4.9(b)) (In the special case, lp = lf and lq = lu). Thus, cef is simply the calling context from lp to lf such that lp = car(cef ⊕[lf ]). Similarly, ceu is derived.

∞ C Let ST be obtained from ST such that Ce(lf ) × Ce(lu) are now expressed in C  terms of all full calling contexts possible. By [CTX-R], ST free(p@lf ),use(q@lu) ⇐⇒ ∞  C ST free(p@lf ), use(q@lu) . Thus, ST is sound and as precise as possible by C using calling contexts. In addition, ST is efficiently verifiable, as motivated in Section 4.3 and validated later.

∼ 0 By construction, (cfu⊕ cef , p) 6= (cfu⊕ ceu, q), i.e., p and q are must-not-aliases if 0 0 cfu and cfu are different context prefixes. Now, (cfu ⊕ cef , lf ) (cfu⊕ ceu, lu) holds, 0 ; where cfu = cfu, i.e., (cef , lf ) (ceu, lu) holds, only if lp appears lexically before lq ; ∼ in the function containing lo or its callee in [CTX-R]. To check (cef , p) = (ceu, q) for these reachable pairs, we rely on pt1(cef , p) and pt1(ceu, q).  Let us apply [CTX-R] to formally analyze the UAF pair free(p@ln34), use(q@ln31) Chapter 4. Pointer-Analysis-Based UAF Detection 74

Common Caller xfree

Entry푐 1 Entry푙푓 푐푛푑1 3 F 푐෦푓 =[푐1] F 푐푛푑 T T Return 푙푓: free(p) 푐1: xfree() xuse

푐푛푑2 F Entry푙푢 T 4 푐෦푢 =[푐2] F 푐푛푑 T Return 푐2: xuse() Return 푙푢: use(q)

흉෦ = 푐 푐 ∧ 푙 = 푐푛푑1 ∧ 푐푛푑3 풇 Guard(Entry 1, 1) Guard(Entry푙푓, 푓) 흉෦ = 휏෦ ∧ 푙 푐 ∧ 푙 = 푐푛푑1 ∧ 푐푛푑3 ∧ 푐푛푑2 ∧ 푐푛푑4 풖 푓 Guard( 푓, 2) Guard(Entry푙푢, 푢)

Figure 4.10: Adding path guards to calling contexts in [PAT-R]. in Figure 5.1. By computing on-demand the points-to sets of p and q flow- and context-sensitively, we obtain pt1([ ], p) = pt1([ ], q) = {(c2, o), (c3, o)}. Let us con- sider (c2, o) only. For this example, considering also (c3, o) adds no information.

As hep = heq = [c2], we have lo = car([c2, o]) = c2. Thus, com() is the common caller that transitively calls malloc(), free(p) and use(q). As lp ∈ {c5, c7} and lq ∈ {c4, c6}, we obtain Ce(lf ) × Ce(lu) = {[c5, c9], [c7, c9]} × {[c4, c8], [c6, c8]}. Finally, the UAF pair is filtered out as a false alarm, as discussed in Section 4.2.1.

4.3.3.2 Stage 2. Path Reduction

We improve calling-context reduction by augmenting the calling contexts ec ∈ Ce(l) from Stage 1 with path guards τe ∈ Ge(l), thus achieving path reduction. As a result, Chapter 4. Pointer-Analysis-Based UAF Detection 75

C P ST is refined to ST by considering path-sensitivity:

[Spatio-Temporal Path Reduction]

P  ST free(p@lf ), use(q@lu) := (4.8)    ∃ (cef , τef ), (ceu, τeu) ∈ (Ce(lf )×Ge(lf ) × Ce(lu)×Ge(lu) : ∼ ((cef , τef ), lf ) ((ceu, τeu), lu) ∧ ((cef , τef ), p)=((ceu, τeu), q) ;   where Ce(lf ) × Ge(lf ) × Ce(lu) × Ge(lu) is constructed below:

c ∈ C(l ) c ∈ C(l ) τ = V Guard(ENTRY , c ) ef e f eu e u ef ci i ci∈cff ⊕[lf ]   τ = τ ∧ Guard(l , car(c ⊕[l ])) ∧ V Guard(ENTRY , c ) eu ef f eu u ci i ci ∈ cdr(cfu⊕[lu])

IsFeasible(τu) [PAT-R] e    (cef , τef ), (ceu, τeu) ∈ Ce(lf ) × Ge(lf ) × Ce(lu) × Ge(lu)

Figure 4.10 illustrates the intraprocedural paths captured by these guards (marked by different colors). The interprocedural path from xfree() to the common caller and the interprocedural path from the common caller to xuse() are distinguished by

calling contexts. ENTRYci denotes the entry statement of the function containing the point ci. Thus, τef represents the path from the entry of the function containing the first call site in cef to free(p@lf ), and τeu for use(q@lu) consists of three parts:

(1) τef , (2) Guard(lf , car(ceu ⊕[lu])), which represents the path from lf to the first V call site in cu, and (3) Guard(ENTRYc , ci), which is similarly defined e ci∈cdr(cu⊕[lu]) i as τef . Given a sequence, car returns its first element and cdr returns the rest in the sequence. We also check the feasibility of τeu (and τef implicitly) by using an SMT solver to enforce branch correlation. Chapter 4. Pointer-Analysis-Based UAF Detection 76

P ST is efficiently verifiable. For ,(cef , lf ) (ceu, lu) =⇒ ((cef , τef ), lf ) ∼ ; ∼ ; ; ((ceu, τeu), lu). For =, we check ((cef , τef ), p) = ((ceu, τeu), q) by querying pt2((cef , τef ), p) and pt2((ceu, τeu), q). Let us see how hfree(p@ln4), use(p@ln7)i in Figure 4.6 is reported as a UAF warning in Stage 1 (with calling-context reduction) but removed as a false alarm in

Stage 2 (with path reduction). In Stage 1, Ce(ln4) × Ce(ln7) = {([ ], [ ])} by applying ∼ [CTX-R]. As ([ ], ln4) ([ ], ln7) and ([ ], p@ln4) = ([ ], p@ln7) (since pt1([ ], p@ln4) = ; {o1} and pt1([ ], p@ln7) = {o1, o2}), a UAF warning is issued. Let us now apply

[PAT-R]. We find that τef = cnd encodes the path from the entry of the function foo() to line 4. Similarly, τeu = cnd ∧ true ∧ true = cnd encodes the path from the entry to line 7 via line 4. Thus, (([ ],cnd), ln4) (([ ],cnd), ln7). We obtain Ce(ln4) ×    ;  Ge(ln4) × Ce(ln7) × Ge(ln7) = ([ ],cnd), ([ ],cnd) . As pt2(([], cnd), p@ln4) = ∼ {([], cnd, o1)} and pt2(([], cnd), p@ln7) = {([],cnd,o2)}, we have (([],cnd),p@ln4) 6= (([],cnd),p@ln7). Thus, free(p@ln4), use(p@ln7) has been filtered out as a false alarm.

4.4 Implementation

We have implemented CRed in LLVM (3.8.0). The source files of a program are compiled under “-O0” into bit-code by clang front-end and then merged using the

LLVM Gold Plugin at link time to produce a whole program bc file. For debug- ging purposes, LLVM under “-O1” or higher flags behaves non-deterministically on undefined (i.e., undef) values [155], making bug detection nondeterministic.

We have implemented our demand-driven pointer analysis, by operating on the def-use chains computed by the open-source software tool, SVF [117], field- sensitively but flow- and context-insensitively using Andersen’s algorithm [6]. The Chapter 4. Pointer-Analysis-Based UAF Detection 77 wave propagation technique [94, 148] is used for constraint resolution. The positive weight cycles [93] are detected by Nuutila’s algorithm [89]. Distinct allocation sites

(i.e., AddrOf statements) are modeled by distinct abstract objects as in [45]. A program’s call graph is built on the fly and points-to sets are represented using sparse bit vectors.

In static analysis, a linked list is modeled finitely. Thus, a node in a points- to cycle is not considered for UAF detection (to avoid false alarms), as it may represent many different concrete nodes.

Arrays must be approximated in static analysis. When computing with a ; pointer analysis, arrays are considered monolithic. When computing ∼= , we dis- tinguish different array elements intraprocedurally. LLVM’s ScalarEvolution pass is applied to reason about must-aliases between two array accesses intraprocedurally.

Path guards are encoded by BBDs (Binary Decision Diagrams) using CUDD-

2.5.0 [111]. For path feasibility, IsFeasible(τeu) in [PAT-R] is checked by an SMT solver, known as Z3 [32].

4.5 Evaluation

We show that CRed is efficient and effective in detecting UAF bugs in real-world programs without generating excessively many false alarms, by answering four re- search questions (RQs):

RQ1: Is CRed effective in detecting existing UAF bugs?

RQ2: Can CRed find (true) UAF bugs efficiently with a low false positive rate in programs with millions of lines of code?

RQ3: What are the patterns of false alarms eliminated? Chapter 4. Pointer-Analysis-Based UAF Detection 78

Table 4.1: Benchmarks.

Program Version KLOC #Pointers #Frees #Uses bison 3.0.4 113 102679 299 20163 curl 7.52.2 188 16432 249 2179 ed 1.1 3 1062351 17 1604 grep 2.21 118 1692834 193 5910 ghostscript 9.14 1693 24067 489 255891 gzip 1.6 644 106458 66 3904 phptrace 0.3 6 354077 39 1344 redis 3.2.6 133 37793 782 59056 sed 4.2 38 548267 221 6969 zfs 0.7.0 327 52629 680 6162

RQ4: What are the patterns of UAF bugs detected?

4.5.1 Methodology

CRed is fully automatic without requiring user annotations. To answer RQ1 and

RQ2, we compare CRed with four state-of-the-art source-code analysis tools: (1)

CBMC (a bounded model checker for C/C++) [59], (2) Clang (an abstract inter- preter for C/C++ in LLVM) [5], (3) Coccinelle (a pattern-based bug detector for C) [91], and (4) Supa (a flow- and context-sensitive demand-driven pointer

Supa analysis for C used for detecting UAF bugs according to ST in (4.2)[116]. To answer RQ3 and RQ4, we perform manual inspection in real code to check whether a reported UAF warning is a bug or not.

4.5.2 Benchmarks

To answer RQ1 (for ground truth), we use all the C test cases in Juliet Test Suite

(JTS) [1], including 138 known UAF vulnerabilities. Each test case consists of 100

- 500 lines of code extracted from real-world applications. To answer RQ2 – RQ4 Chapter 4. Pointer-Analysis-Based UAF Detection 79

(in order to test the practicality of CRed), we use 10 widely-used open-source C applications, totaling over 3 MLOC, given in Table 4.1.

4.5.3 Experimental Setup

CBMC is configured to run as a UAF detector by enabling “–pointer-check” and disabling the other checks. To ensure that CBMC handles loops identically as

CRed (as described in Section 4.3.1.2), every loop is unrolled by specifying “- unwind 2”. To ensure termination, the per-program analysis budget for CBMC is set as three days. To use Clang, each program is compiled with “scan-build

./configure” and “scan-build make”, following its official user manual [5]. Coc- cinelle is invoked with spatch --sp-file, with the UAF patterns specified with its official UAF script, osdi kfree.cocci. Supa is used for finding UAF bugs according to the analysis given in (4.2).

Both Supa and CRed share the same pre-analysis, which is performed with Andersen’s algorithm [6] field-sensitively but flow-, context- and path-insensitively.

In both cases, the budget for one points-to query is set as 300, 000 (the maximum number of def-use chains traversable). For any larger budget, both Supa and

CRed take longer to run but exhibit small improvements in precision.

For CRed, we apply one optimization in Stage 2 to reduce the human effort re-

1 quired in inspecting warnings. Consider two warnings, B1 = hfree(p@lf ), use(q1@lu)i

2 and B2 = hfree(p@lf ), use(q2@lu)i, with the same free site. It suffices to report

B1 only, if B1 is a bug whenever B2 is and B2 is a false alarm whenever B1 is.

1 1 2 2 This happens if (1) pt2((ceu, τeu ), q1) includes all the objects in pt2((ceu, τeu ), q2) and

2 1 (2) τeu =⇒ τeu (solved by Z3). Our experiments were done on a 3.0 GHZ Intel Core2 Duo processor with 128

GB memory, running RedHat Enterprise Linux 5 (2.6.18). The analysis time of a Chapter 4. Pointer-Analysis-Based UAF Detection 80

CBMC Clang 36 Coccinelle 126 Supa CRed 0 20 40 60 80 100 120 138

Figure 4.11: Hit rates for the 138 bugs in JTS: Clang (26%), Coccinelle (91%), Supa (100%) and CRed (100%). program is the average of 3 runs.

4.5.4 Results and Analysis

4.5.4.1 RQ1: Recall (i.e., Hit Rate)

We assess whether CRed is capable of locating the 138 known UAF bugs in JTS [1].

As displayed in Figure 4.11, CRed finds all the 138 bugs, just as CBMC and Supa do, but Clang and Coccinelle detect only 36 and 126 bugs, respectively, with no false alarms produced by any tool.

CRed achieves a total recall, i.e., a 100% hit rate in 3.7 seconds. CBMC, as a verification tool, also achieves a total recall but in 125.5 seconds, the longest among all the five tools. Supa, as a sound pointer analysis, achieves a total recall in 3.0 seconds.

Both Clang and Coccinelle miss some bugs. Clang finds only 36 bugs in

2.5 seconds with a hit rate of 26%. Clang fails to detect 102 out of 138 UAF bugs for several reasons: (i) it lacks a pointer analysis, (ii) it performs only some limited interprocedural analysis through inlining, and (iii) it reasons about loops very conservatively.

Coccinelle detects 126 bugs in 19.7 seconds. It has missed 12 bugs due to some unsound search-space reduction heuristics used. One is concerned with Chapter 4. Pointer-Analysis-Based UAF Detection 81

Table 4.2: Experimental results of CRed (#T:#True Positives (Bugs) and #F: #False Positives (i.e., False Alarms)).

#Warnings Context Reduction Report Time Program Pre CS PS Before After #T #F (secs) bison 1640 352 1 7.3 × 1015 2.0 × 105 0 1 1904 curl 699 82 0 3.3 × 107 8.2 × 103 0 0 668 ed 34 32 2 6.3 × 104 3.6 × 103 0 2 4 grep 630 493 2 1.1 × 107 3.0 × 105 1 1 2023 ghostscript 2630 1038 3 6.4 × 1015 1.6 × 105 0 3 2805 gzip 382 117 1 7.1 × 108 3.6 × 103 1 0 4 phptrace 268 5 1 7.0 × 105 3.2 × 103 1 0 2 redis 11019 395 20 1.1 × 1015 4.0 × 103 16 4 13551 sed 2258 441 29 1.0 × 109 1.8 × 105 26 3 5102 zfs 22283 2730 73 2.3 × 1014 1.0 × 106 40 33 1271 Total 41843 5685 132 1.5 × 1016 1.9 × 106 85 47 27334 matching a free site with its use sites. Given a free site, Coccinelle examines only the use sites reachable along the forward edges in the program’s call graph. Thus, any UAF bug will be missed if its free site resides in a wrapper. Another is related to the limited alias analysis in Coccinelle. Given a free site free(p), Coccinelle considers the aliases for ∗p by tracking only the value-flow of p forwards along the control flow via only a sequence of copy assignments on top-level variables. Thus, an alias between ∗p and ∗q (for a use(q) that is formed before free(p) or indirectly via address-taken variables in terms of loads and stores will be missed. All the 12 bugs in JTS are missed this way.

4.5.4.2 RQ2: Bug-Finding Ability

We assess how efficiently and effectively CRed finds new UAF bugs in the 10 real-world applications (Table 4.1). Table 4.2 gives the results of CRed, with results of the other four existing tools given in Table 4.3 for comparison. CRed issues 132 warnings including 85 bugs in 27,334 seconds (7.6 hours), starting from Chapter 4. Pointer-Analysis-Based UAF Detection 82

Table 4.3: Experimental results of existing tools (#T:#True Positives (Bugs) and #F: #False Positives (i.e., False Alarms)).

CBMC Clang Coccinelle Supa Program Report Time Report Time Report Time Report Time #T #F (secs) #T #F (secs) #T #F (secs) #T #F (secs) bison 0 0 > 259200 0 0 113 0 18 7 0 1044 1793 curl 0 0 > 259200 0 0 355 0 8 53 0 694 27 ed 0 0 68553 0 0 18 0 0 1 0 34 1 grep 0 0 > 259200 0 0 110 0 18 9 1 537 362 ghostscript 0 0 > 259200 0 0 2007 0 23 68 0 1944 2556 gzip 0 0 > 259200 1 0 68 0 12 3 1 381 3 phptrace 0 0 > 259200 0 0 29 0 0 1 1 192 1 redis 0 0 > 259200 0 2 836 0 5 7 16 4187 13333 sed 0 0 > 259200 0 0 116 0 14 3 26 1887 160 zfs 0 0 > 259200 0 0 790 0 5 30 40 12195 180 Total 0 0 > 2401353 1 2 4442 0 103 179 85 23095 18416

41,843 warnings generated by the pre-analysis. However, the four existing tools are either unscalable by terminating within three days only for one application

(CBMC) or impractical by reporting virtually no bugs (Clang and Coccinelle) or excessively many false alarms (Supa).

CBMC does not scale yet to large codebases. It spends 68,553 seconds, i.e., 19.0 hours, in analyzing ed (the smallest with 3 KLOC) but cannot terminate for each remaining application in three days. As a result, CBMC detects no UAF bugs (as ed is absent of UAF bugs).

Clang reports 3 warnings including only 1 (intraprocedurally-detectable) bug, which is also found by CRed, in 1.2 hours. Interestingly, Clang has even a higher false positive rate than CRed.

Coccinelle reports 103 warnings, which are all false alarms by manual in- spection, in 179.0 seconds. Coccinelle fails to detect any true bug due to mainly its two unsound heuristics that are described above in Section 4.5.4.1. Specifically, among the 85 bugs detected by CRed, 84 bugs require tracking the backward (i.e., Chapter 4. Pointer-Analysis-Based UAF Detection 83 return) edges of wrappers for free sites, with one exception in gzip, which, however, requires analyzing the aliasing relations for address-taken variables. In addition, all the 26 bugs in sed and 32 bugs in zfs also require the value flows of address-taken variables to be tracked.

Supa also starts with the same 41,843 UAF warnings pre-computed by CRed’s pre-analysis. Being sound, Supa reports the same 85 bugs found by CRed but also 23,180 warnings (with both as expected). These false alarms are the spu-

Supa rious spatio-temporal correlations introduced in ST in (4.2), as motivated in Section 4.2.

Dynamic detectors have low false positive rates but may not find UAF bugs. We have run AddressSanitizer [105] on the 10 applications with “-O0 -fsanitize=address” enabled. Under the default test inputs provided, it reports 5 memory leaks in ghostscript and grep and 1 buffer overflow in bison, but no UAF bugs at all in about 6.5 mins.

CRed is effective in finding new UAF bugs in real-world applications. By examining manually the 132 warnings reported, we found 85 to be bugs and 47 to be false positives. These false alarms are issued due to mainly imprecise handling of complex path conditions (among others as explained in Section 4.5.5). Clang

finds only 1 bug in gzip, which is also found by CRed, among the 3 warnings reported. The other 2 warnings (in sed) are false alarms, due to its lack of pointer analysis. These 2 false alarms are not reported by CRed. CRed is also highly effective in filtering out false alarms in its two stages. Let wi be the warnings produced by Stage i. The false alarm elimination (FAE) rate at Stage i, where

1 6 i 6 2, is given by (wi−1 − wi)/wi−1. The two stages (CS and PS in Table 4.2) are effective, with their average FAE rates being 68.7% and 95.8%.

CRed is also efficient in its two stages, as shown in Figure 4.12, by using Chapter 4. Pointer-Analysis-Based UAF Detection 84

Pre-Analysis Context-Sensitive Analysis Path-Sensitive Analysis 100% 80% 60% 40% 20% 0%

Figure 4.12: Percentage distribution of CRed’s analysis times. increasingly more precise yet more expensive analyses on handling increasingly fewer UAF warnings (as validated in Table 4.2). In Stage 1 (context-sensitive analysis), context reduction is significant, as revealed in Columns 5 – 6 in Table 4.2.

Otherwise, Stage 1 would run for 2 × 109 days for the 10 applications (estimated based on the per-query time consumed in), implying that Supa would be unscalable

(as its core pointer analysis pt1 is used in [CTX-R]). Given its effectiveness, CRed is the most scalable interprocedural UAF de- tector reported (to the best of our knowledge). CRed spends just 7.6 hours in analyzing the 10 applications (totaling 3+ MLOC). The analysis time for a pro- gram includes the times elapsed in its two stages and its pre-analysis. For Clang and Coccinelle, ghostscript takes the longest to analyze since it is the largest with 1693 KLOC. For CRed and Supa, redis takes the longest since it has the second largest number of UAF candidate pairs, i.e., 11,019 pairs to be analyzed and complex constraints to be solved by Z3.

4.5.4.3 RQ3: False Alarm Patterns

To gain insights on detecting UAF bugs statically, we discuss some common pat- terns of false alarms removed by CRed in its two stages (Figures 4.13(a) and (b)). Chapter 4. Pointer-Analysis-Based UAF Detection 85

//tif_dirread.c 3418 int TIFFReadDirectory (TIFF* tif) { Different objects when 3728 _TIFFfree (data); // 풄풔ퟏ free site reaches use site 3978 EstimateStripByteCounts (tif, dir, dircount); 3985 _TIFFfree (dir); // 풄풔ퟐ } Same object when free site does not reach //gstiffio.c 248 void _TIFFfree (void* p) { use site 250 free (p); } (a) A false alarm in gs eliminated by context-correlation

//compile.c 877 static void read_text (...) { pending_text = ... //풐ퟏ Both may point to 풐 , 884 if (buf) { ퟏ but never along the 887 free (pending_text); same path 888 pending_text = init_buffer();//풐ퟐ } 923 normalize(get_buffer(pending_text)); } 523 struct buffer* init_buffer () 529 { return MALLOC (1, struct buffer); } (b) A false alarm in sed eliminated by path-correlation

//lib/regexec.c 643 reg_errcode_t re_search_internal (...) { 757 for (;; match_first += incr) { ③ 862 match_last = check_matching (&mctx, ...); ② ④ 866 if (match_last == REG_ERROR)//REG_ERROR=-2 869 return err; } } 1116 Idx check_matching (re_match_context_t *mctx, ...) { ⑤ 4280 if (BE (new_entry == NULL, 0)) { 4282 free (mctx->bkref_ents); ① 4283 return REG_ESPACE; //REG_ESPACE=-12 } 4292 mctx->bkref_ents[mctx->nbkref_ents - 1].more = 1; ⑥ 4294 mctx->bkref_ents[mctx->nbkref_ents].node = node; ⑦ 4295 mctx->bkref_ents[mctx->nbkref_ents].str_idx = str_idx; ⑧ } (c) Three UAF bugs in sed //deps/lua/src/strbuf.c 104 void strbuf_free(strbuf_t *s) { 113 free(s); ① } //redis/deps/lua/src/lua_cjson.c 536 void json_check_encode_depth(..., strbuf_t *json) 553 { strbuf_free(json); ② } 660 void json_append_data(..., strbuf_t *json) { 680 json_check_encode_depth(..., json); ③ 683 json_append_array(..., json, ...); ④ } 566 void json_append_array(..., strbuf_t *json, ...) ⑤ 571 { strbuf_append_char(json, '['); ⑥ } //deps/lua/src/strbuf.h 116 void strbuf_append_char(strbuf_t *s, const char c) { ⑦ 119 s->buf[s->length++] = c; ⑧ } (d) Two UAF bugs in redis

Figure 4.13: Case study for some false alarms eliminated and some bugs reported by CRed in real-world applications. Chapter 4. Pointer-Analysis-Based UAF Detection 86

Context reduction (stage 1) Figure 4.13(a) shows a false alarm in ghostscript eliminated in Stage 1. Consider lines 250 and 3978. Supa (Table 4.3) will report hfree(p@ln250), use(dir@ln3978)i as a warning, since ([ ], ln250) ([ ], ln3978) ∼ ; and ([ ], p) = ([ ], dir) (due to pt([ ], p) ∩ pt([ ], dir) 6= ∅). With calling-context reduction, this will be identified as a false alarm successfully.

Path reduction (stage 2) Figure 4.13(b) illustrates a UAF candidate hfree

(pending text@ln887), use(pending text@ln923)i, that is identified as a false alarm path-sensitively (by capturing branch correlation), just like hfree(p@ln4), use(p@ln7)i in Figure 4.6.

4.5.4.4 RQ4: Understanding UAF Bugs

There are 85 UAF bugs detected by CRed. We first examine two representative patterns in Figures 4.13(c) and (d) and then discuss these bugs briefly.

Figure 4.13(c) illustrates three UAF bugs found in sed (counted as one in Column 7 in Table 4.2, as discussed in Section 4.5.3). Under a certain path con- dition, the program frees mctx->bkref ents (line 4282) and returns an error flag

REG ESPACE (line 4283). Unfortunately, the error is not captured later in line 866, since REG ESPACE 6= REG ERROR, causing the freed pointer mctx->bkref ents to be dereferenced in lines 4292 – 4295.

Figure 4.13(d) gives two UAF bugs (counted as one in Column 7 in Table 4.2) in redis. In function json append data (line 660), json is indirectly freed in line 680 by calling json check encode depth, which in turn calls strbuf free (line 553) to free the object (line 113). After that, json append data calls json append array

(line 683) with json passed as a parameter, where the freed object is accessed twice (line 119), resulting in two UAF bugs.

The 85 bugs detected by CRed reside in grep, gzip, phptrace, redis, sed and Chapter 4. Pointer-Analysis-Based UAF Detection 87 zfs. Precise pointer analysis is essential. As mentioned earlier, 58 bugs (including

26 in sed and 32 in zaf) require analyzing aliases for address-taken variables. The other 27 bugs, which are found in grep, gzip, phptrace, redis and zfs, require analyzing top-level pointers only. The 4 bugs in zfs would be missied if some function pointers in the call sequence from their common callers to their use sites were not resolved accurately. In addition, interprocedural analysis is also essential.

Consider Figure 4.9. The average call sequence from a common caller to a free

(use) site is 2.33 (3.71), with the longest being 4 (7). For only one out of the 85 bugs, its free and use sites reside directly in its common caller.

4.5.5 Limitations

As a static analysis, CRed can suffer from both false negatives and false posi- tives. CRed can miss bugs due to its unsound modeling of loops (by analyzing two iterations), its unsound handling of a linked list (by ignoring its nodes partic- ipating in points-to cycles), and its unsound modeling of array access aliases (by using LLVM’s ScalarEvolution pass for detecting must-aliases). In addition, in non- compliant C programs, where one uses a pointer pointing to one object to access another object with pointer arithmetic, pointer analysis will be unsound, resulting in potentially false negatives.

CRed yields false alarms due to mainly (1) imprecise path reduction in [PAT-

R], and (2) imprecise points-to information for out-of-budget points-to queries (in traversing points-to cycles). Chapter 4. Pointer-Analysis-Based UAF Detection 88

4.6 Chapter Summary

In this chapter, we present CRed, a novel static detector for finding UAF bugs, and demonstrate its effectiveness and efficiency in finding all the known UAF bugs in Juliet Test Suite and new ones in multi-MLOC C applications. To the best of our knowledge, CRed is the first such interprocedural analysis achieving this level of scalability, precision and accuracy. Its success relies on the three advances made:

(1) a context reduction technique for scaling CRed to large codebases, (2) a multi- stage approach for filtering false alarms earlier, and (3) a field-, flow-, context- and path-sensitive demand-driven pointer analysis for providing the precise points-to information required. Chapter 5

Memory Leak Fixing on

Value-Flow Slices

5.1 Overview

In software testing, the two central tasks facing software engineers are finding bugs and fixing them. Both tasks are expensive in dollar terms and time-consuming due to the ever-increasing scale and complexity of modern software systems. A large number of existing program analyses and testing techniques focus on automatic bug detection. However, finding bugs is only the first step. Once reported, bugs must still be repaired. Indeed, manually fixing bugs can be a non-trivial and error-prone process, especially for large-scale software.

Recently, a few approaches to automatic bug fixing have been proposed to re- duce maintenance costs by producing candidate patches for program validation and repairing [7, 43, 52, 85, 95, 133]. For example, ClearView [95] enforces vio- lated invariants to correct buffer overflow and illegal control flow errors by creating patches for binaries. AutoFix-E [133] relies on user specifications and generates

89 Chapter 5. Memory Leak Fixing 90 repairs using contracts. Pachika [28] infers object behavior models to propose candidate fixes for bugs like null dereferences. GenProg [43] uses genetic pro- gramming to repair bugs in legacy code.

Most of the existing automatic approaches for fixing bugs in C programs are related to spatial memory errors such as buffer overflows and null pointer deref- erences. Such types of bug can be validated by inserting assertions and repaired by adding conditional checks to avoid executing the code segment that leads to undesired behaviors (e.g., program crashes) [66, 118, 146].

Memory leaks represent another major category of memory errors, i.e. temporal memory errors, which are more complicated to fix automatically. Unlike a spatial error that can be fixed by adding a conditional check to bypass the point where the spatial error occurs, every leaky path from a leaky allocation site needs to be tracked by inserting an appropriate fix (i.e., a free call) without introducing new memory errors.

Figure 5.1 illustrates how intra- and inter-procedural memory leaks are fixed correctly and incorrectly. Suppose the memory allocated in line 2 in Figure 5.1(a) is never freed. Adding a fix, free(p), too early in line 5 can cause a UAF error in line

11, whereas adding free(p) in line 12 at the end of program without considering path correlation may introduce an invalid free site for a stack object (when the if branch is executed). A correct fix is provided in lines 13 – 16, with the underlying path correlation accounted for correctly. Let us consider now an inter-procedural leak shown in Figure 5.1(b), where the memory allocated in line 2 is leaked partially along the else branch (lines 8 – 11). A simple-minded fix, free(x), which is inserted in line 16 in function Use, is incorrect. Without considering correlated calling contexts, this fix may introduce a double-free in line 6 when the if branch is executed. A correct fix is given in line 10, ensuring that only the leak along the Chapter 5. Memory Leak Fixing 91

1: void foo() { 1: void bar () { 2: char* p = malloc(30); //o 2: char* p = malloc(…); //o 3: char* q = “on stack”; 3: fgets(p, …); 4: fgets(p, …); 4: if (C2) { 5: free(p); 5: use(p); 6: if (C1) { 6: free(p); 7: char* t = p; 7: } 8: p = q; 8: else { 9: q = t; 9: use (p); 10: } 10: free (p); 11: printf(“%s”, q); 11: } 12: free(p); 12: } 13: if (C1) 13: 14: free(q); 14: void use(char* x) { 15: else 15: printf(“%s”, x); 16: free(p); 16: free(x); 17: } 17: } (a) Fixing an intraprocedural leak (b) Fixing an interprocedural leak

Figure 5.1: Incorrect ( ) and correct (X) fixes. else branch is fixed.

This chapter presents AutoFix, a fully automated approach to fixing memory leaks in C programs by combining static and dynamic analyses. Given a list of leaky allocation sites reported by a static detector, AutoFix automatically fixes all the reported leaks by inserting appropriate fixes (i.e., free calls) along all the leaky paths. There are two main challenges. First, a detector reports a leaky allocation site as long as it discovers one leaky path from the site without necessarily reporting all the leaky paths. AutoFix is designed to fully repair the memory leak for all its leaky paths. Second, some reported leaks are false positives. AutoFix must guarantee memory safety by ensuring that the fixes are correct regardless of whether the reported leak is a true bug or a false positive. Note that AutoFix certainly cannot fix any leaks that are missed (i.e., not reported) by a static detector.

AutoFix applies to a large class of real-world C programs where memory Chapter 5. Memory Leak Fixing 92

Leak Detector Leaky Allocation Sites

Analysis 6 Value-Flow Construction

6 Locating Functions to Fix )USVORK

6 Identifying Leaky Paths

6 Liveness Analysis :OSK

Annotated IR

Instrumentation 6 Inserting Fixes

Instrumented IR

Sandbox 8[T

Runtime Checking :OSK

Memory Reclamation

Figure 5.2: The AutoFix framework. management is explicitly orchestrated by programmers without resorting to garbage collection (GC) and/or reference counting (RC). Compared to the GC and RC approaches, our approach is lightweight as only small instrumentation overhead is incurred. To safely reclaim a leaked memory object o from an allocation site without any programmer intervention, all the memory allocation and deallocation sites reachable from o on the value-flow slices of the program are instrumented to keep track of the liveness of o in shadow memory, thereby enforcing correct leak

fixing inside a memory-safe execution sandbox at runtime.

Figure 5.2 highlights the basic idea behind AutoFix. Given a leaked object o from an allocation site (reported by any leak detector), AutoFix builds from the Chapter 5. Memory Leak Fixing 93 program a sparse value-flow graph (S1), on which a graph reachability analysis is

first performed to locate the candidate functions for inserting appropriate fixes, i.e., free calls (S2). For each candidate function f, AutoFix then identifies the leaky paths in f for o by computing the value-flow guards with respect to its existing deallocation sites found in the program (S3). Next, a liveness analysis is applied inside f on the value-flow slice of the identified leaky paths for o to determine every program point P where a fix is needed with path correlation considered (S4).

Finally, the fixes are inserted immediately after the last use sites of o on all its leaky paths (S5). At runtime, the instrumented fixes performs runtime checking to verify statically identified leaks, and reclaims only true leaked memory objects.

This chapter makes the following contributions:

• We present AutoFix, a fully automated approach to memory leak fixing for C programs that can reclaim all the leaked memory objects reported by any

leak detector.

• We propose an approach to safe memory leak fixing by combining static and dynamic analyses: an inter-procedural static analysis is first performed to

identify the earliest program points that the missing free calls can be inserted without introducing any use-after-free error; and then, a dynamic analysis,

which operates on efficient shadow memory, tracks the potentially leaked

memory objects and performs runtime checking to fix true leaks without

introducing any double-free error.

• We have implemented AutoFix in LLVM-3.5.0 and evaluated it using five SPEC2000 benchmarks and three open-source applications. Our experimen-

tal results show that AutoFix can safely fix all the leaks reported by the

state-of-the-art static leak detector, Saber [119, 120], with runtime overhead Chapter 5. Memory Leak Fixing 94

averaged at under 2%. For the long-running server application redis eval-

uated, AutoFix has successfully reduced its memory usage by more than 300MB in a three-hour continuous run after having fixed its leaks.

5.2 A Motivating Example

In order to describe the main idea of our approach, we use the example in Figure 5.3 to go through the five key steps shown in Figure 5.2. An allocation site in line 4 of fun in Figure 5.3(a) is partially leaky along the else branch in line 10 inside a while loop. Given a leaked object o detected from this allocation site by a static leak detector, AutoFix first constructs a value-flow graph shown in Figure 5.3(b) for o. Based on this graph, AutoFix inserts one fix immediately after line 16 on the leaky else branch, as shown in Figure 5.3(c). The leaky allocation site (in line

4) and the deallocation site (in line 8) in the non-leaky if branch are instrumented with dynamic checks to ensure safe fixing at runtime, as illustrated in Figure 5.3(d).

Step 1: Constructing Value-Flows. Following [119, 120], we construct an inter-procedural value-flow graph (VFG) for every leaky allocation, with the one for the example given in Figure 5.3(b). Note that its nodes are numbered by their corresponding line numbers in Figure 5.3(a). In a VFG, its nodes represent the definitions of variables and its edges capture their def-use relations.

Step 2: Locating Functions to Fix. Given a leaked object o from an allocation site, AutoFix first determines the functions where the leaks of o should be fixed. A candidate function f is chosen if f allocates o (directly in itself or indirectly in its callee functions) such that o is never returned to any caller of f.

Note that the existence of a candidate function for o is guaranteed since the main function will be the last resort. In our example, Fun is selected as a candidate Chapter 5. Memory Leak Fixing 95

1: void fun ( ) { Step 1 : Allocation 2: ----char* q = “on stack”; : Deallocation 3: ----while (C0) { 4: ------char* p = malloc(30);//o {C1} : Free condition 5: ------fgets(p, …); 6: ------if (C1){ fun 7: ------use(p); 4 {C1} 8: ------free(p); 9: ------} 10: ------else { 5 8 12 11: ------if (C2) { 12: ------char* t = p; 7 13: ------p = q; 14 14: ------q = t; 15: ------} 16: ------use(p); 16 17: ------} 18: ------printf(“loop”); 19: ----} use 20: } 21

21: void use(char* p) { 22: ----printf(“%s”, p); 22 23: } (a) Program (b) Constructing value-flows graph

static dynamic fun + q = “on stack” allocation address site hash map while (C0) * return * … … p = malloc(…) ④ allocID address Add (allocID, p) … … fget (p , … ) ⑤ if (C1) Shadow memory mapping * ⑦ if (C2) use (p) * free (p) ⑧ Add (allocID, addr) t = p ⑫ Remove Insert addr into the hash map indexed by (allocID, p) * p = q allocID * q = t ⑭ * use (p) ⑯ Remove (allocID, addr) Fix (allocID) Remove addr from the hash map indexed by allocID * printf(“loop”)

Control-Flow Graph Fix (allocID) Reclaim memory addresses in the hash map Step 2 Step 3 indexed by allocID, and clear the hash map + : Functions to fix * : Leaky paths

Step 4 Step 5 … : Operations on = : Live basic blocks : Fixes shadow memory

(c) Leak fixing (d) Runtime shadow memory

Figure 5.3: A motivating example. Chapter 5. Memory Leak Fixing 96 function since it contains an allocation site of o (line 4) and o is never returned to any callers of Fun based on its value-flows.

Step 3: Identifying Leaky Paths. AutoFix identifies the leaky paths for o in Fun by reasoning about value-flow guards, which are Boolean formulas capturing branch conditions between defs and uses in the control flow graph (CFG). The free condition C1 under which the free site in line 8 is reached is computed by performing a guarded reachability analysis from the malloc source 4 to its free sink 8 . Thus, the leak condition ¬C1 encodes the leaky paths in the else branch as highlighted in red in Figure 5.3(c).

Step 4: Liveness Analysis. An intra-procedural liveness analysis is per- formed for Fun to mark the live basic blocks for o (shown as double-framed boxes in Figure 5.3(c)) that are reachable from the allocation site of o on its leaky paths.

As a result, node 16 is identified as the last-use site of o.

Step 5: Inserting Fixes. As shown in Figure 5.3(c), a deallocation Fix() is inserted immediately after 16 (i.e., line 16), where the last use of o is found. In ad- dition, the instrumentation code (in dotted-line boxes) also includes the metadata- manipulating functions inserted (after the malloc source 4 and the free sink 8 ) to maintain runtime shadows for o to ensure safe fixing for both leaky and non-leaky paths.

Figure 5.3(d) shows that the shadow memory simply maps an allocation site with its unique ID, allocID, to a hash map that records (start) addresses of the dynamically allocated objects that are not yet freed. Consider Figure 5.3(c) again.

Every address p that points to an object allocated at 4 is recorded in the shadow memory by calling function Add(allocID, p) instrumented immediately after 4 . The deallocation site 8 , which is reachable from 4 via value-flows, is instrumented by calling function Remove(allocID, p) to delete the address p from the shadow Chapter 5. Memory Leak Fixing 97 hash map since its pointed-to object has been released along the non-leaky if branch. On reaching Fix(allocID) during program execution, all the objects iden- tified by the addresses corresponding to the allocation site allocID in the shadow memory are freed (as they would be leaked otherwise).

5.3 The AutoFix Framework

AutoFix is a compile-time transformation for inserting runtime checks to reclaim leaked memory for C programs, which keeps track of potential leaked memory addresses via an efficient shadow metadata structure. In this section, we first present the five steps of AutoFix’s compile-time transformation (Section 5.3.1

– Section 5.3.5), and then describe the design of AutoFix’s metadata structure (Section 5.3.6).

5.3.1 Step 1: Constructing a Value-Flow Graph

An inter-procedural sparse value-flow graph (VFG) [45, 115, 119, 120, 121, 122,

147, 148] for a program is a multi-edged directed graph that captures the def-use chains of both top-level and address-taken variables. Top-level variables are the variables whose addresses are not taken. The def-use chains for top-level variables are readily available once they have been put in SSA form as is standard. Address- taken variables are accessed indirectly and inexplicitly at loads and stores. Their def-use chains are built in several steps following [21, 119, 120].

First, the points-to information for the program is computed by using, e.g.,

Andersen’s analysis. Second, the accesses of address-taken variables are made explicit by adding annotations. A load p = ∗q is annotated with a function µ(x) for each variable x that may be pointed to by q to represent a potential use of x Chapter 5. Memory Leak Fixing 98

1 char a, b, *p = &a, *q = &b; 2 int main( ) { 9 void swap(char *x, char *y) { 3 *p = '>'; 10 char t1 = *x; 4 *q = '<'; 11 char t2 = *y; 5 swap(p,q); 12 *x = t2; 6 char c = *p; 13 *y = t1; 7 char d = *q; 14 } 8 } (a) The swap program

1 char a, b, *p=&a, *q=&b; 2 int main( ) { 3 *p = '>'; 4 *q = '<'; ࢇ૚ ൌ ࣑ ࢇ૙ 5 swap(p,q); ࢈૚ ൌ ࣑ ࢈૙ 6 char c = *p; ࣆ ࢇ૚ ࣆ ࢈૚ ࢇ૛ ൌ ࣑ ࢇ૚ ࢈૛ ൌ ࣑ ࢈૚ 7 char d = *q; ࣆ ࢇ૛ 8 } ࣆ ࢈૛ 9 void swap(char *x, char *y) { 10 char *t1 = *x; ࢇ૚ ൌ ࣑ ࢇ૙ ࢈૚ ൌ ࣑ ࢈૙ 11 char *t2 = *y; ࣆ ࢇ૚ 12 *x = t2; ࣆ ࢈૚ 13 *y = t1; ࢇ૛ ൌ ࣑ ࢇ૚ 14 } ࢈૛ ൌ ࣑ ࢈૚ ࣆ (b)ࢇ૛ Memoryࣆ ࢈૛ SSA form with annotations

[o] indirect value flow of o ࣆȀ࣑ [o] direct value flow of o

3 *p= '>'; 4 *q= '<'; [a] [b] ૚ ૙ ૚ ૙ 5 ࢇ ൌ ࣑ ࢇ 5 ࢈ ൌ ࣑ ࢈ [a] [b] ૚ ૚ 9 ࣆ ࢇ 9 ࣆ ࢈ [a] [b] ૚ ૙ ૚ ૙ 10 ࢇ ൌ ࣑ t1ࢇ =*x; 11 ࢈ ൌ ࣑ t2࢈ =*y; [t1] [t2] ૚ ૚ 13 *y=ࣆ ࢇ t1 ; 12 ࣆ*x=࢈ t2 ; [b] [a] ૛ ૚ ૛ ૚ 14 ࢈ ൌ ࣑ ࢈ 14 ࢇ ൌ ࣑ ࢇ [b] [a] ૛ 5 ࣆ ࢈૛ 5 ࣆ ࢇ [b] [a] ૛ ૚ ૛ ૚ 7 d=*q;࢈ ൌ ࣑ ࢈ 6 c=*p;ࢇ ൌ ࣑ ࢇ

ࣆ ࢈૛ (c) Value-flow graph ࣆ ࢇ૛

Figure 5.4: Value-flow example. Chapter 5. Memory Leak Fixing 99 at the load; A store ∗p = q is annotated with x = χ(x) for each variable x that may be pointed to by p to represent a potential def and use of x at the store;

A call site cs is also annotated with µ(x)/x = χ(x) for each variable x that is inter-procedurally referred/modified inside the callee of cs. A function definition def f(...) is annotated at the entry/exit with µ(x)/x = χ(x) for each variable x that is inter-procedurally referred/modified inside f. Third, all the address-taken variables are converted to SSA form, with each µ(x) being treated as a use of x and each x = χ(x) as both a def and use of x. Finally, the value-flows are constructed by connecting the def-uses for each converted SSA variable.

Figure 5.4 gives an example. The program in Figure 5.4(a) defines two variables a and b in line 1, whose addresses are taken by pointers p and q, respectively. By indirectly memory accesses via pointers p and q, a and b are first initialized in line

3 and 4 respectively, and are then passed to subroutine swap in line 5. Finally, c gets a’s value (‘<’ after the swap) by dereferencing p in line 6, while d gets b’s value (‘>’ after the swap) by dereferencing q in line 7. To track indirect value-

flows, annotations are added to make indirect memory accesses explicit, as shown in Figure 5.4(b), where the subscripts of a and b are as in standard SSA form.

Note that the subscripts start from zero at each function entry (a0 in line 3 for main and in line 9 for swap, b0 in line 4 for main and in line 9 for swap). By connecting def-uses based on the annotations, the value-flow graph in Figure 5.4(c) is constructed, which shows how a and b’s values are exchanged and then flow to c and d respectively.

5.3.2 Step 2: Locating Functions to Fix

Definition 1 (Value-flow Reachability) A variable v is o-reachable if there exists a value-flow path from the allocation site of o to the definition site of v on Chapter 5. Memory Leak Fixing 100

Algorithm 2 Liveness Analysis (for leaked object o) Let F be the set of candidate functions for fixing o foreach f ∈ F do Let subCF G be the subgraph of f’s CFG that contains only the leaky paths for o in f foreach basic block b in subCFG do isLive(b) ← false if b contains an o-reachable variable then isLive(b) ← true

while isLive has changed do foreach basic block b in subCFG do W isLive(b) ← s∈succ(b) isLive(s) // succ(b) is the set of successors of b

the VFG of the program. A callsite p = call(...) is o-reachable if either variable p or x in any x = χ(x) function annotated at the callsite is o-reachable.

Given a leaked object o from an allocation site reported by a static detector,

AutoFix first determines the candidate functions where the leaks of o will be fixed. A function f is a candidate function to insert fixes for o if (1) f contains at least one o-reachable callsite and (2) there is no o-reachable variable in any caller of f.

In the case of recursion, if there is no data dependence on the leaked object o between any two function calls in the recursive cycle, o can be fixed in the recursive functions; otherwise fixes for o must be put outside the recursive functions to ensure safe fixing. In the case of global variables, since they are reachable for every function, leaked objects pointed to by global pointers can only be fixed in main. However, global variables are generally not considered as leaks in existing leak detectors [20, 55, 119, 120]. Chapter 5. Memory Leak Fixing 101

5.3.3 Step 3: Identifying Leaky Paths

To identify the leaky paths in a candidate function f, AutoFix performs a forward analysis on the VFG from an o-reachable callsite src to construct a value-flow slice

Ssrc that includes all the nodes reachable from src but confined in f. If no free sites are reachable from src, then all paths in function f are leaky paths. Otherwise, for a free site snk corresponding to Ssrc, let vfp(src, snk) be the set of all value-flow paths from src to snk on the VFG, and vfe(P) be the set of all value-flow edges in a single value-flow path P ∈ vfp(src, snk). Thus, we can obtain the value-flow guards from src to snk:

_ ^ V F Guard(src, snk)= CF Guard(ˆs, dˆ) P ∈vfp(src,snk) (ˆs,dˆ)∈vfe(P )

_ ^ CF Guard(ˆs, dˆ) = boolCond(e) Q∈cfp(ˆs,dˆ) e∈Q

where CF Guard(ˆs, dˆ) is a boolean formula that encodes the set of control-flow paths, denoted as cfp(ˆs, dˆ), from program points ˆ to dˆ on f’s CFG. Each branch edge e on a control flow path Q is uniquely assigned a boolean variable boolCond(e).

In the presence of loops, guards can grow unboundedly. To avoid unbounded conjunctions that describe all loop iterations, we follow [20, 119, 120] and bound loops to one iteration. Finally, the leak condition for src is obtained by computing guards from src to all its reachable free sites in f:

_ LeakCond = V F Guard(src, snk)

snk∈Ssrc Chapter 5. Memory Leak Fixing 102

Any path from src to the end of function f that satisfies LeakCond is a leaky path for the leaked object o.

5.3.4 Step 4: Liveness Analysis

For a candidate function f, a subgraph is extracted from f’s CFG by excluding the control flow edges that are not on any leaky path. Then, a liveness analysis is applied to this subgraph to determine the basic blocks in f where o may be live.

As shown in Algorithm2, a backward data-flow analysis is performed in f, starting from blocks containing o-reachable variables (lines 6 – 7) and iteratively marking the liveness of each block until a fix-point is reached (lines 8 – 11). For a leaked object o allocated inside a loop, if there is no data dependence on o between different loop iterations, then fixes for o can be inserted inside the loop; otherwise

fixes for o must be put outside the loop to ensure safe fixing.

5.3.5 Step 5: Instrumentation

Based on the liveness information, Algorithm3 performs instrumentation to insert

fixes for every leaked object o in each of its candidate functions f. A call to Fix() is inserted either at the end of b where b is the last live basic block for o (lines 6 –

7), or at the beginning of n where n is a newly created basic block between a live block and a non-live block of o (lines 9 – 11). As shown earlier in Figure 5.3(d), these fixes serve to reclaim the dynamically allocated memory at the allocation site of o that would otherwise be leaked. In addition, calls to Add() and Remove() are instrumented to maintain runtime shadow memory, thereby enforcing safe leak-

fixing. A call to Add() is inserted after the allocation site of o (line 12) to track all its allocated objects in the shadow memory. A call to Remove() is inserted after each free site reachable from its corresponding allocation site (line 13), so Chapter 5. Memory Leak Fixing 103 that the freed objects are removed from the shadow memory.

Our instrumentation is safe even if the leaked object o is a false positive, for two reasons. First, the VFG of a program over-approximates its def-use chains. Thus, the last-use sites of o in its candidate functions for fixing o are conservatively found, ensuring safety by avoiding any use-after-free. Second, Add() and Re- move() maintain valid memory addresses in the shadow memory, ensuring safety by avoiding any potential double free along any program path.

Algorithm 3 Instrumentation (for leaked object o) Let F be the set of candidate functions for fixing o Let allocID be the unique ID of o’s allocation site foreach f ∈ F do Let liveBBs be the set of basic blocks that are marked live for o in f’s CFG foreach b ∈ liveBBs do if @s ∈ succ(b), isLive(s) = true then Insert Fix(allocID) at the end of b

else foreach s ∈ succ(b) s.t. isLive(s)=false do Insert a new block n between b and s Insert Fix(allocID) at the beginning of n

Insert Add(allocID, p) immediately after the allocation site p = malloc(...) for o Insert Remove(allocID, q) immediately after each free site free(q) where q is o-reachable

5.3.6 The Metadata Structure

The design philosophy behind our metadata structure is to enable a judicious tradeoff between time and space, which aims to support fast lookup, insertion and removal operations with reasonable space overhead. As shown in Figure 5.5,

AutoFix maintains a closed hash map HallocID for every leaky allocation site (with its unique ID, allocID) to keep track of all the dynamic allocated memory Chapter 5. Memory Leak Fixing 104 Store/Compare 0x16fa201c 0x03 64-bit address (only the lower 48 bits are valid) 40-bit 8-bit

Lookup Linked hash map (HallocID) for a leaky allocation site 0x00

1: void fun ( ) { Step 1 : Allocation 2: ----char* q = “on stack”; : Deallocation 0x01 1 3: ----while (C0) { 0x03 0x07 4: ------char* p = malloc(30);//o {C1} : Free condition 5: ------fgets(p, …); 6: ------if (C1){ fun 0x02 7: ------use(p); 4 {C1} 8: ------free(p); 9: ------} 10: ------else { 5 8 12 0x03 1 11: ------if (C2) { 0x16fa201c 0x06 0x01 0x2aa0501d 12: ------char* t = p; 7 13: ------p = q; 14 0x04 14: ------q = t; 15: ------} 16: ------use(p); 16 17: ------}

slots 0x05 18: ------printf(“loop”); 19: ----} use 8 20: } 21

2 0x06 1 21: void use(char* p) { 0x07 0x03 22: ----printf(“%s”, p); 22 23: } 128 0x07 1 (a) Program (b) Constructing value0x01-flows graph 0x06

static dynamic fun + q = “on stack” allocation address site hash map while (C0) * return

… * … … … … p = malloc(…) ④ allocID address Add (allocID, p) … … fget (p , … ) ⑤ 0xFF FEN if (C1)USL ADDRShadow memory mapping PRE NEXT SC * ⑦ if (C2) use (p) * free (p) ⑧ Add (allocID, addr) t = p ⑫ Remove Insert addr into the hash map indexed by 1 7-bit* p = q 40-allocIDbit 8-bit 8-bit 64-bit (allocID, p) * q = t ⑭ * use (p) ⑯ Remove (allocID, addr) Fix (allocID) Remove addr from the hash map indexed by allocID * printf(“loop”) 128-bit record Control-Flow Graph Fix (allocID) Reclaim memory addresses in the hash map Step 2 Step 3 indexed by allocID, and clear the hash map + : Functions to fix * : Leaky paths FEN : Flag for entryStep 4 validityStep 5 … : Operations on ADDR : The recorded address = : Live basic blocks : Fixes shadow memory PRE : Forward internal(c) Leak fixingchaining(d) Runtime shadow memory SC : Separate chaining NEXT : Backward internal Chaining USL : Unused

Figure 5.5: The metadata structure design.

addresses. The size of HallocID can be user-defined, with larger hash maps con- suming more space and smaller ones potentially imposing higher slowdown due to hash collisions. To achieve a reasonable tradeoff (as evaluated in Section 5.4.3), the default hash map size is set to 28, with 128 bits for each slot, resulting in a total of 4 KB consumed for the hash map.

Without loss of generality, our shadow mechanism is supported on a 64-bit x86 64 architectures with 48-bit virtual address space and word-aligned pointers.

For a 64-bit memory address allocated at a leaky site, AutoFix uses its lower 8 bits as the index to the corresponding entry in the hash map, and maps its middle

40 bits to the field ADDR of the entry. A linked list is implemented to handle hash Chapter 5. Memory Leak Fixing 105

// o’s allocation site: allocID

addro = malloc(…);

1: entry = HallocID -> lookup(addro);

2: entry -> ADDR = (uint64_t)addro >> 8;

3: HallocID -> insert(entry);

(a) Add(allocID, addro)

// o’s deallocation site

free(addro);

4: entry = HallocID -> lookup(addro); 5: if (entry)

6: HallocID -> remove(entry);

(b) Remove(allocID, addro)

// o’s last-use site

lastUse(addro);

7: for(entry = HallocID -> getHead(); entry != 0; entry = HallocID -> getNext(entry))

8: free(entry -> ADDR << 8| entry - HallocID);

9: HallocID -> clear();

(c) Fix(allocID)

Figure 5.6: Implementation of shadow metadata operations. collisions in each hash slot, with the field SC recording the head of the list. Due to the sparsity of the hash map, it is expensive to retrieve all the valid entries to reclaim leaked memory by performing a full scan for the map. To speed up the search, we have used a doubly linked list (similar to Java’s LinkedHashmap) with the two 8-bit fields, PRE and NEXT, to record the previous and the next valid entry indexes. The one-bit field FEN indicates whether an entry is valid or not. It is set to 0 when the entry is removed from the hash map. The 7-bit field USL is preserved for future use.

Figure 5.6 gives the implementation of our shadow metadata functions i.e.

Add(), Remove(), Fix(). The lookup, insert, remove and clear are standard operations that are similar to those in Java’s LinkedHashmap and are thus omit- ted. Function Add(), which is instrumented immediately after the leaky allocation Chapter 5. Memory Leak Fixing 106

Table 5.1: Benchmark characteristics

Size #Leaky Alloc Program #Alloc Sites #Free Sites #True Leaks (KLOC) Sites Reported ammp 13.4 37 30 20 20 gcc 230.4 161 19 45 40 perlbmk 87.1 148 2 12 8 mesa 61.3 82 76 7 3 twolf 20.5 185 1 5 5 a2ps 41.8 295 161 39 28 h2o 18.2 95 123 27 26 redis 61.8 47 62 24 20 Total 534.5 1050 474 179 150

site allocID, first finds a slot in HallocID to create an entry for the allocated mem- ory object o (line 1), then maps bits 8 through 47 of o’s address addro to the 40-bit field ADDR using a simple shift operation (line 2), and finally inserts the entry into

HallocID (line 3). Function Remove(), which is instrumented after a deallocation site of o, checks whether the deallocated address is recorded in HallocID (lines 4 and 5). If so, the corresponding entry is removed (line 6). Function Fix(), which is instrumented after the last-use site of o, first traverses HallocID using its internal linked list via getNext (lines 7), then frees all the recorded addresses (line 8) and clears the hash map (line 9).

5.4 Evaluation

We have implemented AutoFix on top of LLVM (version 3.5.0). Eight C programs are used for evaluation as shown in Table 5.1, including five SPEC2000 benchmarks and three popular open-source applications. The SPEC2000 suite is widely used for evaluating static leak detectors [20, 55, 119, 120]. However, the SPEC2000 bench- marks that have less than two reported leaks (e.g. parser and gap) are excluded Chapter 5. Memory Leak Fixing 107 from our evaluation. The five selected SPEC2000 benchmarks are ammp (contains many leaks), gcc (large-sized and contains many leaks), perlbmkm (allocation- intensive), twolf (allocation-intensive) and mesa (deallocation-intensive). We also include three open-source applications: a2ps-4.14 (a postscript filter) containing a relative large number of leaks, and two long-running server programs: h2o-1.2

(an http server) and redis-2.8 (a NoSQL database). All our experiments are conducted on a platform consisting of a 3.0 GHZ Intel

Core2 Duo processor with 16 GB memory, running RedHat Enterprise Linux 5

(kernel version 2.6.18). The source code of each program is compiled into bit-code

files using clang and then merged together using LLVM Gold Plugin at link-time

(LTO) to produce a whole-program bitcode file. The compiler option mem2reg is turned on to promote memory into registers. Andersen’s pointer analysis is used to build the VFG for the program [45]. We use the leak warnings (leaky allocation sites) reported by the state-of-the-art leak detector, Saber [119, 120], as input to

AutoFix.

We evaluate AutoFix based on three criteria: (1) efficiency (number of fixes generated and the analysis time taken to do so), (2) effectiveness (ability to fix memory leaks and reduce memory usage at runtime), and (3) performance degra- dation (instrumentation overhead at runtime).

5.4.1 Efficiency of AutoFix

The compile-time results of AutoFix are summarized in Table 5.2. Given a total of 179 leaky allocations reported in the eight programs, AutoFix fixes them all by inserting 393 calls to Fix() (Column 2), 179 calls to Add() (Column 3) and 107 calls to Remove() (Column 4). On average, a leaky allocation results in only 2.2

fixes. This shows that AutoFix is able to precisely place fixes along the identified Chapter 5. Memory Leak Fixing 108

303 342 300 262

250 229

200 181 before AutoFix 153 136 150 120 107 89 100 after AutoFix 50 35 35 50 16 26 22 24 25 25 28 25 31 Memory Usage (MB) Memory Usage 23

0 15 30 45 60 75 105 120 135 150 165 180 Time (minute) Figure 5.7: Memory footprint of redis-2.8 before and after fixing the leak.

Table 5.2: Compile-time statistics of AutoFix

Program #Fix() #Add() #Remove() Analysis Time (s) ammp 20 20 0 0.9 gcc 74 45 13 81.7 perlbmk 131 12 28 32.0 mesa 19 7 9 15.1 twolf 7 5 0 3.9 a2ps 51 39 48 17.0 h2o 61 27 2 9.5 redis 30 24 7 56.0 Total 393 179 107 216.1 leaky paths with lightweight instrumentation. As shown in Table 5.2 (Column 5), it takes 216.1 seconds to analyze the 534.5 KLOC for the eight C programs altogether.

In particular, AutoFix spends 81.7 seconds on gcc, the largest program (230.4 KLOC) studied. The analysis times for the other seven programs are all within one minute. Chapter 5. Memory Leak Fixing 109

Table 5.3: Run-time statistics of AutoFix

Program #Triggered Leaks Overhead (%) ammp 1 1.36 gcc 13 0.75 perlbmk 12 0.88 mesa 3 0.76 twolf 2 0.89 a2ps 12 1.82 h2o 15 1.58 redis 9 0.66

5.4.2 Effectiveness of AutoFix

To evaluate the effectiveness of AutoFix in fixing leaks at runtime, we compare the memory usage of each program before and after automated fixing using Val- grind [84]. For the five SPEC2000 benchmarks, their reference inputs are used. For the three open-source applications, their own regression test suites are used. For a total of 67 real leaks triggered by the inputs (Column 2 in Table 5.3), AutoFix is able to reclaim all the leaked memory at runtime, which is verified by Val- grind [84]. In our experiments, we observed that a substantial number of leaks are inter- procedural, involving path correlation. These leaks are ignored and cannot be

fixed by the pure static approach LeakFix [41] due to the over-approximative nature of static analysis. In contrast, AutoFix combines static analysis with runtime checking to enable precise fixing for all leaks including those involving path correlations.

Figure 5.8 shows a memory leak pattern in gcc and the instrumented code with the leak fixed. The pointer p at the callsite use (in line 6) may point to either

(1) a heap object (allocated in line 3) when the if branch is taken, or (2) a stack Chapter 5. Memory Leak Fixing 110

1: char* p = "..."; //stack obj 2: if (...) { 3: p = malloc(...); //heap obj 4: ... 5: } 6: use(p); //conservative last use site of heap obj (a) Code with a memory leak before applying AutoFix

1: char* p = "..."; //stack obj 2: if (...) { 3: p = malloc(...); //heap obj ++ Add(...); //instrumented by AutoFix 4: ... 5: } 6: use(p); //conservative last use site of heap obj ++ Fix(...); //instrumented by AutoFix

(b) Code without memory leaks after applying AutoFix

Figure 5.8: Fixing path correlated leaks in AutoFix. object (allocated in line 1) otherwise. A leak happens in the former case, while the code is leak-free in the latter case. AutoFix tracks the memory allocation by instrumenting an Add after malloc (in line 3) and reclaims only truly leaked memory by performing runtime checks in Fix immediately after the last use site (in line 6) conservatively computed by value-flow analysis.

To further evaluate the effectiveness of AutoFix in fixing leaks for long-running programs, we reproduce a real leak which causes memory exhaustion in redis with its corresponding regression tests [81]. As shown in Figure 5.9, a loop (line 281) is used to query the IPs of slave servers. If a slave sever is dead, reconnection attempts are repeatedly made by calling getaddrinfo (line 291), which allocates memory chunks that are never freed, resulting in leaks inside the for loop. As shown in Figure 5.7, the memory consumption of the leaked version of redis increases around 31.7 KB per second, and over 300 MB are leaked after three hours. Chapter 5. Memory Leak Fixing 111

---256:--static int _redisContextConnectTcp(...) { ... ---281:-----for (p = servinfo; p != NULL; p = p->ai_next) { ... ---291:------rv = getaddrinfo(…, &bservinfo); //o’s allocation site -- ++ Add (bservinfo, allocID); // instrumented by AutoFix ---297:------for (b = bservinfo; b != NULL; b = b->ai_next) { ... //o is used in the loop ---302:------} -- ++ Fix (allocID); //instrumented by AutoFix ---329:----} ---342: -}

Figure 5.9: The leaky code (in net.c of redis-2.8) fixed by AutoFix.

AutoFix can fix this leak effectively, enabling redis’s memory consumption to remain below 35 MB in the fixed version.

5.4.3 Runtime Overhead

To measure runtime overhead, each program is executed five times before and after automated fixing respectively, and the average overhead is reported in Table 5.3.

The experimental results show that AutoFix only introduces negligible overhead for all the eight programs, 1.06% on average, with the maximum 1.82% observed in a2ps. This confirms that our instrumentation is lightweight, achieved by identifying the required deallocation fixes on the value-flow slices of leaky allocations and tracking leaked objects with simple shadow operations at runtime.

To evaluate the impact of the metadata structure (as described in Section 5.3.6) on runtime instrumentation overhead, we choose four different sizes for the hash map used in order to demonstrate the time and space tradeoffs made: 1 (with the hash map degenerating into one linked list), 26, 28 and 216. The results are shown in Figure 5.10. Chapter 5. Memory Leak Fixing 112

Linked List Hash Map with 64 Slots Hash Map with 256 Slots Hash Map with 65536 Slots 3 2.5 2 1.5 1 0.5

Runtime Overhead (%) 0 ammp gcc perlbmk mesa twolf a2ps h2o redis

Figure 5.10: Comparing runtime overheads for different metadata structures used in AutoFix.

For the five benchmarks, ammp, gcc, mesa, twolf and redis, the four configura- tions yield similar overheads. However, for the other three benchmarks, perlbmk, a2ps and h2o, much higher overheads are incurred when their underlying hash maps have degenerated into a single linked list. In this degenerate case, the lookup operations become too expensive, especially when a large number of memory ob- jects are present. When the other three hash map sizes are used, lookup operations can be performed more efficiently. The hash map with 26 slots is not very space- consuming, costing 1 KB for each leaky allocation. However, due to its high collision rates, this hash map still results in high overheads for perlbmk, a2ps and h2o. As shown in Figure 5.10, the hash maps with 28 slots (4 KB per leaky allocation) and

216 slots (1 MB per leaky allocation) suffer from similar overheads. This indicates that 28 slots are already sufficient to guarantee low collision rates, and more slots cannot provide any noticeable performance benefit. For more complicated appli- cations beyond our evaluation, it is still possible that 28 slots are not enough to ensure low hash collision rates. In this situation, AutoFix allows users to allocate more slots for the shadow hash map to achieve better performance.

To understand how the runtime overhead are caused and distributed, we profile each Add, Remove and Fix operation instrumented. Figure 5.11 shows the result.

For all the programs evaluated, maintaining shadow memory (Add and Remove Chapter 5. Memory Leak Fixing 113

Add Remove Fix

ammp gcc perlbmk mesa

25% 26% 37% 36% 44% 45% 43% 63% 32% 11% 38%

twolf a2ps h2o redis

18% 42% 39% 43% 41% 49% 43% 58% 43% 8% 16%

Figure 5.11: Runtime overhead breakdown. together) incurs more overhead than reclaiming the leaked memory (Fix) by an average of 31.3%. For the two programs without the Remove operation instru- mented (i.e. ammp and twolf), the Add operation alone (63% in ammp and 58% in twolf) still accounts for more overhead than the Fix operation (37% in ammp and

42% in twolf). The Remove operation causes a significant proportion of overhead in perlbmk (32%), mesa (38%), and a2ps (43%). These three programs also have relatively high numbers of false memory leaks, as shown in Table 5.1. AutoFix has to identify and tolerate false memory leaks by the Remove operation when fixing true ones.

5.5 Discussion

Like many program analysis and software testing problems, static memory leak detection and fixing are undecidable in general. At its core, it is undecidable whether each leaky path is feasible. Moreover, even if we ignore runtime evaluation Chapter 5. Memory Leak Fixing 114 of branch conditions and assume all static control-flow paths are feasible, it is still impossible to develop an approach that can statically fix all memory leaks by only inserting missing free calls. This is because it requires the following strict condition to be satisfied for each leaky path ρ of each leaked object o: there must exist a program point l on ρ and l must not be on any non-leaky path ρ0. If the condition is not satisfied, the free call instrumented may mistakenly free non-leaked memory objects and is therefore unsafe. However, the condition has been proven to be not satisfiable for many memory leaks [145]. As a result, pure static approach (e.g.

LeakFix [41]) can fix only some memory leaks.

The current AutoFix implementation reclaims leaked memory as early as pos- sible, which minimizes performance degradation caused by memory leaks. However, this is not necessarily the solution with the smallest instrumentation overhead, be- cause postponing a fix may allow multiple instrumentations to be merged, there by reducing runtime overhead. We leave this as our future work.

5.6 Chapter Summary

This chapter presents AutoFix, a fully automated approach to memory leak fixing for C programs by combining static and dynamic analysis. Given a leaky allocation reported by a leak detector, AutoFix performs a graph reachability analysis to identify the leaky paths on the value-flow graph of the program, and then performs a liveness analysis to locate the program points for instrumenting the required fixes on the identified leaky paths at compile-time. To guarantee safe fixing, shadow memory is maintained for the potential leaked memory objects at runtime. Our evaluation shows that AutoFix is capable of fixing all reported memory leaks with small instrumentation overhead. Chapter 6

Related Work

6.1 UAF Detection

Almost all solutions are dynamic (instrumentation-based). Debugging tools such as Valgrind [84] and Dr.Memory [14] can detect a range of memory corruption errors including UAF bugs at the expense of high runtime and memory overheads.

AddressSanitizer [105] is another widely used dynamic tool. However, it can miss dangling pointers that, when dereferenced, point to an object that has reused the memory range. Undangle [16] detects dangling pointers by performing a dynamic taint analysis. Its early detection approach can incur high runtime overheads.

CETS [83] uses an identifier-based scheme, which assigns a unique key for each allocation region to identify dangling pointers. It has an overhead of 116% in order to provide complete memory safety.

Static tools dedicated to UAF detection are scarce, with [39] focusing on binary code, for the reasons given in Section 4.1. General-purpose memory-safety checking tools that can be used to detect UAF bugs include CBMC [59], Clang [5], Coc- cinelle [91], and Supa [116], which have been compared with CRed. Specialized

115 Chapter 6. Related Work 116 tools for detecting other types of bugs exist. Saturn [33, 137] detects memory leaks and null pointers by solving a Boolean satisfiability problem. FastCheck [20] and Saber [119] find memory leaks on the value-flow graph of a program. Buffer overflows can be detected path-sensitively [63] or symbolically [70].

6.2 UAF Protection

Instead of detecting UAF bugs, some efforts are made on protecting against their ex- ploitation. Cling [4] and Diehard [9] represent safe memory allocators that restrict memory reallocation by checking type consistency or approximating infinite heap.

In these cases, dereferenced dangling pointers cannot access memory reallocated to other objects. Thus, UAF exploits are made harder. Alternatively, FreeSentry [67] and DangNull [67] track pointer propagation to invalidate all aliased pointers imme- diately their pointed-to object is freed, at the expense of high runtime and memory overheads. Control-flow integrity [2, 37, 87, 125, 128, 153] restricts the program execution to follow a precomputed CFG even if there are memory corruption bugs

(e.g., UAF). A special case of control-flow integrity is virtual-table integrity, which has also been widely studied to mitigate UAF bugs [44, 51, 97, 103, 152]. However, all fine-grained solutions are too costly to be deployed in production environments and all coarse-grained solutions are bypassable [35]. Existing mitigations and our work are orthogonal, and can be in combination to address UAF.

6.3 Static Analysis for Memory Error Detection

Static analysis has been used for detecting a wide range of memory errors, such as buffer overflows [63, 70, 146], memory leaks [20, 119], uninitialized variables [78,

147], information leaks [19, 42, 74], SQL injection and XSS errors [53, 58, 126, 132], Chapter 6. Related Work 117 and format string vulnerabilities [109], on top of various program representa- tions, such as inter-procedural SSA form [73, 119] abstract syntax tree [143], code property graph [142, 144], and value-flow graph [31, 117], to capture the syntax and/or semantic properties of a program. Tac and CRed are developed on top of SVF [117], which not only inherit the strengths of traditional static analysis but also address the precision and scalability limitations by machine learning and novel program analysis techniques.

6.4 Machine Learning for Bug Detection

In recent years, machine learning techniques have been shown to be effective in guid- ing program analysis for bug detection, such as fault invariant classification [15] for reflecting important aspects of fault-revealing properties in a program, dynamic memory leak detection by classifying staleness values of objects [68], defect predic- tion (e.g., [131]), detection of malicious Java applets [104], source and sink classi-

fication for information flow analysis for Android apps [100], automatic program repair [76, 77], and abstract interpretation [48, 90]. To apply machine learning, a central task is feature engineering, i.e., to mine suitable domain-specific features to train the machine learning engine. For example, the features used in buffer-overflow detection[48], information flow analysis [100], and UAF detection (i.e., Tac) are completely different.

6.5 Memory Leak Detection

Memory leak detection has been extensively studied using static [55, 119, 120, 136] or dynamic [46, 84, 105] analysis. Static detectors examine the source code at compile-time without executing the program. Saturn [136] detects memory leaks Chapter 6. Related Work 118 by solving a Boolean satisfiability problem. Sparrow [55] is based on abstract in- terpretation, and uses function summaries. FastCheck [20] and Saber [119, 120] find memory leaks on the value-flow graph of the program. Dynamic detectors, which find leaks by executing the program, track the memory allocation and deal- location via either binary instrumentation as in Valgrind [84] or source code instrumentation as in AddressSanitizer [105].

6.6 Memory Leak Tolerance

Another line of research focuses on tolerating leaks at runtime [13, 88, 124]. The basic idea is to delay out-of-memory crashes at runtime by offloading stale objects

(regarded as likely leaked) to disks and reclaiming their virtual memory. Upon accessing a mistakenly swapped-out object, the object will be swapped back into the memory, thereby guaranteeing safety. Apart from the space overhead, dy- namically detecting stale objects by tracking accesses of memory objects also re- sults in non-negligible time overhead. For example, LeakSurvivor [124] incurs an average runtime overhead of 23.7% even for applications without memory leaks.

AutoFix, which aims at fixing leaks, is orthogonal to tolerating leaks. Instead of dynamically tracking every memory access of every object to determine objects’ liveness, AutoFix conservatively approximates the liveness for only leaky objects at compile-time, therefore avoiding high runtime overhead.

6.7 Leak Fixing

Memory leaks can be fixed manually or automatically. LeakPoint [22] is a dy- namic taint analysis that identifies last-use sites of leaked objects by tracking point- ers and presents programmers the identified sites as candidate locations for leak Chapter 6. Related Work 119

fixing. LeakChaser [139] relies on user annotations to improve the relevance of bug reports, thereby assisting programmers to diagnose and fix memory leaks. Ob- ject ownership profiling has also been applied to assisting manual leak detection and fixing [101]. LeakFix [41] is a pure static approach to automatically fixing leaks in C programs. Because it cannot handle false positives produced by other state-of-the-art leak detectors, LeakFix relies on its own dedicated leak detector and can fix only some but not all reported leaks. In contrast, our approach com- bines static and dynamic analyses, and is able to automatically fix all the true leaks reported by a detector with small runtime overhead.

6.8 Garbage Collection

Garbage collection (GC) can eliminate UAF bugs and memory leaks (not includ- ing memory bloats [82]) based on its automatic memory management. However, in type-unsafe languages such as C and C++, it is theoretically impossible to imple- ment sound GC to automatically manage memory. A few unsound (conservative) solutions for C and C++ [11, 12, 49, 98] have been shown empirically to be effec- tive with low space and time overheads, in which memory allocations (e.g. malloc sites) are replaced by special allocators, and memory deallocations (e.g. free sites) are removed from the program, at the expense of the prompt low-cost reclamation provided by explicit memory management. GC requires every call to malloc() to be replaced by a call to a special allocator, and is thus hardly useful for legacy code and code using customized allocators.

Compared to AutoFix’s static fixing on value-flow slices, GC uses runtime object reachability to over-approximate object liveness. In addition, garbage col- lectors for C and C++ typically need to monitor all static data areas, stacks, Chapter 6. Related Work 120 registers and heap. In contrast, AutoFix only monitors potential leaky alloca- tions reported by leak detectors, which makes AutoFix much more lightweight than GC. AutoFix and conservative GC can be applied simultaneously, with the former in charge of the leaked memory objects allocated by malloc and the latter in charge of the memory allocated by GC’s special allocators. Chapter 7

Conclusions

In this thesis, we have presented a set of practical program analysis techniques to address temporal memory mismanagement. We focus on two common types of serious memory bugs, those being UAF bugs and memory leaks, the solutions to which require interprocedural analysis on top of pointer analysis.

The three approaches introduced in this thesis are: (1) Tac (Chapter3), a novel machine learning guided typestate analysis for static UAF detection; (2) CRed (Chapter4), a pointer-analysis-based static UAF detection technique with a novel context reduction technique; and (3) AutoFix (Chapter5), a fully automatic memory leak fixing technique based on value-flow analysis.

The key problem in developing these techniques is how to improve analysis precision by mitigating the gap between pointer analysis and its clients. To address this problem, this thesis has investigated three different strategies, described in

Chapters3,4, and5, respectively. Tac establishes correlations between pointer analysis and its client (i.e., a typestate analysis for UAF) via machine learning, in which an SVM classifier is trained and applied to predict UAF-related aliases.

CRed combines the domains of pointer analysis and its client (i.e., UAF detection),

121 Chapter 7. Conclusions 122 with a reduction technique to avoid space explosion. AutoFix combines static and dynamic analyses and resolves the gap between pointer analysis and its client through runtime checking. The three strategies above are orthogonal, and it is possible to design a hybrid strategy based on them, which could be examined in future research.

This thesis aims at finding practical solutions to practical problems. To this end, substantial efforts have been made to strike a balance between precision, soundness and scalability. Multi-staging is emphasized throughout the thesis: light-weight pre-analyses are applied to narrow down the search space, before subsequent heavy- weight analyses are performed to produce precise results. Existing sophisticated program analysis techniques (e.g., demand-driven analysis, slicing, SAT-solving, and CFL reachability algorithm) have been extensively leveraged.

We have implemented Tac, CRed and AutoFix in the LLVM compiler frame- work and conducted comprehensive experiments using well-known real-world appli- cations. The effectiveness and efficiency of the three approaches have been validated by the experimental results.

To conclude, this thesis addresses temporal memory mismanagement by intro- ducing three practical program analysis approaches. Tac and CRed are among the first attempts to detect UAF bugs statically, and alleviate the scarcity of static approaches to UAF detection, while AutoFix, a fully automatic memory leak fixer, provides a new solution to this long-standing performance issue.

Future work in the line of this thesis may involve the following: (1) extending the techniques used in this thesis by mining domain-specific knowledge to enable more precise and efficient analysis for complex program features such as recursive data structure, recursive control-flow, complicated path conditions and sophisticated design patterns; (2) combining the techniques used in this thesis with other state- Chapter 7. Conclusions 123 of-the-art advances, such as symbolic execution, deep learning and architectural support, to develop more powerful hybrid analyses; and (3) applying the techniques used in this thesis to the analysis of other types of memory errors, such as data races, uninitialized variables and null dereferences. Bibliography

[1] J. T. S. 1.2. https://samate.nist.gov/SRD/testsuite.php.

[2] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti. Control-flow integrity. In

CCS’05, pages 340–353, 2005.

[3] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: principles, techniques,

and tools, volume 2. Addison-wesley Reading, 2007.

[4] P. Akritidis. Cling: A memory allocator to mitigate dangling pointers. In

USENIX Security’10, pages 177–192, 2010.

[5] C. S. Analyzer. http://clang-analyzer.llvm.org/.

[6] L. O. Andersen. Program analysis and specialization for the C programming

language. PhD thesis, DIKU, University of Copenhagen, 1994.

[7] A. Arcuri and X. Yao. A novel co-evolutionary approach to automatic soft-

ware bug fixing. In IEEE World Congress on Computational Intelligence,

pages 162–168, 2008.

[8] T. Ball and S. K. Rajamani. The slam project: Debugging system software

via static analysis. In POPL’02, pages 1–3, 2002.

[9] E. D. Berger and B. G. Zorn. DieHard: Probabilistic memory safety for

unsafe languages. In PLDI’06, pages 158–168, 2006.

124 BIBLIOGRAPHY 125

[10] D. Beyer, T. A. Henzinger, R. Jhala, and R. Majumdar. The software model

checker blast. International Journal on Software Tools for Technology Trans-

fer, 9(5-6):505–525, 2007.

[11] H.-J. Boehm. Space efficient conservative garbage collection. In PLDI’93,

pages 197–206, 1993.

[12] H.-J. Boehm. Bounding space usage of conservative garbage collectors. In

POPL’02, pages 93–100, 2002.

[13] M. D. Bond and K. S. McKinley. Tolerating memory leaks. In OOPSLA’08,

pages 109–126, 2008.

[14] D. Bruening and Q. Zhao. Practical memory checking with Dr. memory. In

CGO’11, pages 213–223, 2011.

[15] Y. Brun and M. D. Ernst. Finding latent code errors via machine learning

over program executions. In ICSE’04, pages 480–490, 2004.

[16] J. Caballero, G. Grieco, M. Marron, and A. Nappa. Undangle: Early detec-

tion of dangling pointers in use-after-free and double-free vulnerabilities. In

ISSTA’12, pages 133–143, 2012.

[17] C.-C. Chang and C.-J. Lin. Libsvm: a library for support vector machines.

ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27,

2011.

[18] H. Chen, Y. Mao, X. Wang, D. Zhou, N. Zeldovich, and M. F. Kaashoek.

Linux kernel vulnerabilities: State-of-the-art defenses and open problems. In

APSYS’11, page Article No.5, 2011. BIBLIOGRAPHY 126

[19] Q. A. Chen, Z. Qian, Y. J. Jia, Y. Shao, and Z. M. Mao. Static detection

of packet injection vulnerabilities: A case for identifying attacker-controlled

implicit information leaks. In CCS’15, pages 388–400, 2015.

[20] S. Cherem, L. Princehouse, and R. Rugina. Practical memory leak detection

using guarded value-flow analysis. In PLDI’07, pages 480–491, 2007.

[21] F. Chow, S. Chan, S. Liu, R. Lo, and M. Streich. Effective representation

of aliases and indirect memory operations in SSA form. In CC ’96, pages

253–267, 1996.

[22] J. Clause and A. Orso. Leakpoint: pinpointing the causes of memory leaks.

In ICSE’10, pages 515–524, 2010.

[23] M. Conti, S. Crane, L. Davi, M. Franz, P. Larsen, M. Negro, C. Liebchen,

M. Qunaibit, and A.-R. Sadeghi. Losing control: On the effectiveness of

control-flow integrity under stack attacks. In CCS’15, pages 952–963, 2015.

[24] C. Cortes and V. Vapnik. Support-vector networks. Machine learning,

20(3):273–297, 1995.

[25] P. Cousot and R. Cousot. Systematic design of program analysis frameworks.

In POPL’79, pages 269–282. ACM, 1979.

[26] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck.

Efficiently computing static single assignment form and the control depen-

dence graph. ACM Transactions on Programming Languages and Systems

(TOPLAS), 13(4):451–490, 1991.

[27] D. Dalcher. Disaster in London: The LAS case study. In ECBS’99, pages

41–52, 1999. BIBLIOGRAPHY 127

[28] V. Dallmeier, A. Zeller, and B. Meyer. Generating fixes from object behavior

anomalies. In ASE’09, pages 550–554, 2009.

[29] T. H. Dang, P. Maniatis, and D. Wagner. Oscar: A practical page-

permissions-based scheme for thwarting dangling pointers. In USENIX Se-

curity’17, pages 815–832, 2017.

[30] M. Daniel, J. Honoroff, and C. Miller. Engineering heap overflow exploits

with javascript. USENIX WOOT’08, pages 1–6, 2008.

[31] M. Das, S. Lerner, and M. Seigle. ESP: Path-sensitive program verification

in polynomial time. In PLDI’02, pages 57–68, 2002.

[32] L. De Moura and N. Bjørner. Z3: An efficient smt solver. In TACAS’08,

pages 337–340, 2008.

[33] I. Dillig, T. Dillig, and A. Aiken. Sound, complete and scalable path-sensitive

analysis. In PLDI’08, pages 270–280, 2008.

[34] N. Elhage. Virtunoid: a kvm guest → host privilege, escalation exploit. Black

Hat USA, 2011, 2011.

[35] I. Evans, F. Long, U. Otgonbaatar, H. Shrobe, M. Rinard, H. Okhravi, and

S. Sidiroglou-Douskos. Control Jujutsu: On the weaknesses of fine-grained

control flow integrity. In CCS’15, pages 901–913, 2015.

[36] M. F¨ahndrich and F. Logozzo. Static contract checking with abstract inter-

pretation. In FoVeOOS’10, pages 10–30, 2010.

[37] X. Fan, Y. Sui, X. Liao, and J. Xue. Boosting the precision of virtual call

integrity protection with partial pointer analysis for C++. In ISSTA’17,

pages 329–340, 2017. BIBLIOGRAPHY 128

[38] X. Fan, Y. Sui, X. Liao, and J. Xue. Boosting the precision of virtual call

integrity protection with partial pointer analysis for C++. In ISSTA’17,

pages 329–340, 2017.

[39] J. Feist, L. Mounier, and M.-L. Potet. Statically detecting use after free

on binary code. Journal of Computer Virology and Hacking Techniques,

10(3):211–217, 2014.

[40] S. J. Fink, E. Yahav, N. Dor, G. Ramalingam, and E. Geay. Effective types-

tate verification in the presence of aliasing. ACM Transactions on Software

Engineering and Methodology (TOSEM), 17(2):9, 2008.

[41] Q. Gao, Y. Xiong, Y. Mi, L. Zhang, W. Yang, Z. Zhou, B. Xie, and H. Mei.

Safe memory-leak fixing for C programs. In ICSE’15, pages 459–470, 2015.

[42] M. Gordon, D. Kim, J. Perkins, L. Gilham, N. Nguyen, and M. Rinard.

Information flow analysis of android applications in droidsafe. In NDSS’15,

2015.

[43] C. L. Goues, T. Nguyen, S. Forrest, and W. Weimer. Genprog: A generic

method for automatic software repair. IEEE Transaction on Software Engi-

neering (TSE), 38(1):54–72, 2012.

[44] I. Haller, E. G¨okta¸s, E. Athanasopoulos, G. Portokalidis, and H. Bos.

Shrinkwrap: Vtable protection without loose ends. In ACSAC’15, pages

341–350, 2015.

[45] B. Hardekopf and C. Lin. Flow-sensitive pointer analysis for millions of lines

of code. In CGO’11, pages 289–298, 2011.

[46] M. Hauswirth and T. M. Chilimbi. Low-overhead memory leak detection

using adaptive statistical profiling. In ASPLOS’04, pages 156–164, 2004. BIBLIOGRAPHY 129

[47] N. Heintze and O. Tardieu. Demand-driven pointer analysis. In PLDI’01,

pages 24–34, 2001.

[48] K. Heo, H. Oh, and K. Yi. Machine-learning-guided selectively unsound static

analysis. In ICSE’17, pages 519–529, 2017.

[49] M. Hirzel, A. Diwan, and J. Henkel. On the usefulness of type and liveness

accuracy for garbage collection and leak detection. ACM Transactions on

Programming Languages and Systems (TOPLAS), 24(6):593–624, 2002.

[50] S. Horwitz, A. Demers, and T. Teitelbaum. An efficient general iterative

algorithm for dataflow analysis. Acta Informatica, 24(6):679–694, 1987.

[51] D. Jang, Z. Tatlock, and S. Lerner. Safedispatch: Securing c++ virtual calls

from memory corruption attacks. In NDSS’13, 2014.

[52] G. Jin, W. Zhang, D. Deng, B. Liblit, and S. Lu. Automated concurrency-bug

fixing. In OSDI’12, pages 221–236, 2012.

[53] N. Jovanovic, C. Kruegel, and E. Kirda. Pixy: A static analysis tool for

detecting web application vulnerabilities. In SP’06, pages 258–263, 2006.

[54] J. Julien Bertrane, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Min´e,

and X. Rival. Static analysis by abstract interpretation of embedded critical

software. ACM SIGSOFT Software Engineering Notes, 36(1):1–8, 2011.

[55] Y. Jung and K. Yi. Practical memory leak detector based on parameterized

procedural summaries. In ISMM’08, pages 131–140, 2008.

[56] J. B. Kam and J. D. Ullman. Monotone data flow analysis frameworks. Acta

Informatica, 7(3):305–317, 1977. BIBLIOGRAPHY 130

[57] S. S. Keerthi and C.-J. Lin. Asymptotic behaviors of support vector machines

with gaussian kernel. Neural computation, 15(7):1667–1689, 2003.

[58] A. Kieyzun, P. J. Guo, K. Jayaraman, and M. D. Ernst. Automatic creation

of sql injection and cross-site scripting attacks. In ICSE’09, pages 199–209,

2009.

[59] D. Kroening and M. Tautschnig. CBMC–C bounded model checker. In

TACAS’14, pages 389–391, 2014.

[60] W. Landi. Undecidability of static analysis. ACM Letters on Programming

Languages and Systems (LOPLAS), 1(4):323–337, 1992.

[61] W. Landi and B. G. Ryder. A safe approximate algorithm for interprocedural

aliasing. In PLDI’92, pages 235–248, 1992.

[62] C. Lattner and V. Adve. Llvm: A compilation framework for lifelong program

analysis & transformation. In CGO’04, 2004.

[63] W. Le and M. L. Soffa. Marple: A demand-driven path-sensitive buffer

overflow detector. In FSE’08, pages 272–282, 2008.

[64] X.-B. D. Le, D.-H. Chu, D. Lo, C. Le Goues, and W. Visser. Jfix: semantics-

based repair of java programs via symbolic pathfinder. In ISSTA’17, pages

376–379, 2017.

[65] C. Le Goues, M. Dewey-Vogt, S. Forrest, and W. Weimer. A systematic

study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In ICSE’12, pages 3–13, 2012.

[66] C. Le Goues, S. Forrest, and W. Weimer. Current challenges in automatic

software repair. Software Quality Journal, 21(3):421–443, 2013. BIBLIOGRAPHY 131

[67] B. Lee, C. Song, Y. Jang, T. Wang, T. Kim, L. Lu, and W. Lee. Preventing

use-after-free with dangling pointers nullification. In NDSS’15, 2015.

[68] S. Lee, C. Jung, and S. Pande. Detecting memory leaks through introspective

dynamic behavior modelling using machine learning. In ICSE’14, pages 814–

824, 2014.

[69] O. Lhot´akand K.-C. A. Chung. Points-to analysis with efficient strong up-

dates. In POPL’11, pages 3–16, 2011.

[70] L. Li, C. Cifuentes, and N. Keynes. Practical and effective symbolic analysis

for buffer overflow detection. In FSE’10, pages 317–326, 2010.

[71] L. Li, C. Cifuentes, and N. Keynes. Boosting the performance of flow-sensitive

points-to analysis using value flow. In FSE’11, pages 343–353, 2011.

[72] H.-T. Lin and C.-J. Lin. A study on sigmoid kernels for svm and the training

of non-psd kernels by smo-type methods. Technical report, Department of

Computer Science, National Taiwan University, 2003.

[73] B. Livshits and M. S. Lam. Tracking pointers with path and context sensi-

tivity for bug detection in c programs. In FSE’03, pages 317–326, 2003.

[74] B. Livshits, A. V. Nori, S. K. Rajamani, and A. Banerjee. Merlin: specifi-

cation inference for explicit information flow problems. In PLDI’09, pages

75–86, 2009.

[75] F. Long and M. Rinard. Staged program repair with condition synthesis. In

ESEC-FSE’15, pages 166–178, 2015.

[76] F. Long and M. Rinard. An analysis of the search spaces for generate and

validate patch generation systems. In ICSE’16, pages 702–713, 2016. BIBLIOGRAPHY 132

[77] F. Long and M. Rinard. Automatic patch generation by learning correct

code. In POPL’16, pages 298–312, 2016.

[78] K. Lu, C. Song, T. Kim, and W. Lee. Unisan: Proactive kernel memory

initialization to eliminate data leakages. In CCS’16, pages 920–932, 2016.

[79] R. Madhavan and R. Komondoor. Null dereference verification via over-

approximated weakest pre-conditions analysis. In OOSPLA’11, pages 1033–

1052, 2011.

[80] S. Mechtaev, J. Yi, and A. Roychoudhury. Angelix: Scalable multiline pro-

gram patch synthesis via symbolic analysis. In ICSE’16, pages 691–701, 2016.

[81] D. Mezzatto. Sentinel 2.8 branch memory leak in redis,

https://github.com/antirez/redis/issues/2012, 2014.

[82] N. Mitchell, E. Schonberg, and G. Sevitsky. Four trends leading to java

runtime bloat. IEEE software, 27(1), 2010.

[83] S. Nagarakatte, J. Zhao, M. M. Martin, and S. Zdancewic. CETS: compiler

enforced temporal safety for C. In ISMM’10, pages 31–40, 2010.

[84] N. Nethercote and J. Seward. Valgrind: A framework for heavyweight dy-

namic binary instrumentation. In PLDI’07, pages 89–100, 2007.

[85] H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra. Semfix: Program

repair via semantic analysis. In ICSE’13, pages 772–781, 2013.

[86] NIST. National vulnerability database. http://nvd.nist.gov/.

[87] B. Niu and G. Tan. Per-input control-flow integrity. In CCS’15, pages 914–

926, 2015. BIBLIOGRAPHY 133

[88] G. Novark, E. D. Berger, and B. G. Zorn. Plug: Automatically tolerating

memory leaks in C and C++ applications. Technical Report UM-CS-2008-

009, University of Massachusetts, 2008.

[89] E. Nuutila and E. Soisalon-Soininen. On finding the strongly connected com-

ponents in a directed graph. Information Processing Letters, 49(1):9–14,

1994.

[90] H. Oh, H. Yang, and K. Yi. Learning a strategy for adapting a program

analysis via bayesian optimisation. In OOPSLA’15, pages 572–588, 2015.

[91] M. C. Olesen, R. R. Hansen, J. L. Lawall, and N. Palix. Coccinelle: Tool

support for automated cert c secure coding standard certification. Science of

Computer Programming, 91:141–160, 2014.

[92] M. Orlovich and R. Rugina. Memory leak analysis by contradiction. In

SAS’06, pages 405–424. 2006.

[93] D. J. Pearce, P. H. Kelly, and C. Hankin. Efficient field-sensitive pointer

analysis of C. ACM Transactions on Programming Languages and Systems

(TOPLAS), 30(1):4, 2007.

[94] F. M. Q. Pereira and D. Berlin. Wave propagation and deep propagation for

pointer analysis. In CGO’09, pages 126–135, 2009.

[95] J. H. Perkins, S. Kim, S. Larsen, S. Amarasinghe, J. Bachrach, M. Carbin,

C. Pacheco, F. Sherwood, S. Sidiroglou, G. Sullivan, et al. Automatically

patching errors in deployed software. In SOSP’09, pages 87–102, 2009.

[96] J. Pewny and T. Holz. Evilcoder: automated bug insertion. In ACSAC’16,

pages 214–225, 2016. BIBLIOGRAPHY 134

[97] A. Prakash, X. Hu, and H. Yin. vfguard: Strict protection for virtual function

calls in cots c++ binaries. In NDSS’15, 2015.

[98] J. Rafkind, A. Wick, J. Regehr, and M. Flatt. Precise garbage collection for

c. In ISMM’09, pages 39–48, 2009.

[99] G. Ramalingam. The undecidability of aliasing. ACM Transactions on Pro-

gramming Languages and Systems (TOPLAS), 16(5):1467–1471, 1994.

[100] S. Rasthofer, S. Arzt, and E. Bodden. A machine-learning approach for

classifying and categorizing android sources and sinks. In NDSS’14, 2014.

[101] D. Rayside and L. Mendel. Object ownership profiling: a technique for finding

and fixing memory leaks. In ASE’07, pages 194–203, 2007.

[102] T. Reps, S. Horwitz, and M. Sagiv. Precise interprocedural dataflow analysis

via graph reachability. In POPL’95, pages 49–61, 1995.

[103] P. Sarbinowski, V. P. Kemerlis, C. Giuffrida, and E. Athanasopoulos. Vtpin:

practical vtable hijacking protection for binaries. In ACSAC’16, pages 448–

459, 2016.

[104] J. Schlumberger, C. Kruegel, and G. Vigna. Jarhead: Analysis and detection

of malicious java applets. In ACSAC’12, pages 249–257, 2012.

[105] K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov. AddressSanitizer:

A fast address sanity checker. In USENIX ATC, pages 309–318, 2012.

[106] F. J. Serna. The info leak era on software exploitation. Black Hat USA, 2012.

[107] H. Shacham. The geometry of innocent flesh on the bone: Return-into-libc

without function calls (on the x86). In CCS’07, pages 552–561, 2007. BIBLIOGRAPHY 135

[108] L. Shang, X. Xie, and J. Xue. On-demand dynamic summary-based points-to

analysis. In CGO’12, pages 264–274, 2012.

[109] U. Shankar, K. Talwar, J. S. Foster, and D. Wagner. Detecting format string

vulnerabilities with type qualifiers. In USENIX Security’01, pages 201–220,

2001.

[110] S. Sidiroglou, S. Ioannidis, and A. D. Keromytis. Band-aid patching. In

HotDep’07, page 6, 2007.

[111] F. Somenzi. Cudd: Cu decision diagram package (3.0.0). http://vlsi.

colorado.edu/~fabio/CUDD/cudd.pdf.

[112] J. Sp¨ath,L. N. Q. Do, K. Ali, and E. Bodden. Boomerang: demand-driven

flow-and context-sensitive pointer analysis for Java. In ECOOP’16, pages

22:1–22:26, 2016.

[113] M. Sridharan and R. Bod´ık. Refinement-based context-sensitive points-to

analysis for java. In PLDI’16, pages 387–400, 2006.

[114] R. E. Strom and S. Yemini. Typestate: A concept for

enhancing software reliability. IEEE Transactions on Software Engineering

(TSE), (1):157–171, 1986.

[115] Y. Sui, P. Di, and J. Xue. Sparse flow-sensitive pointer analysis for multi-

threaded programs. In CGO’16, pages 160–170. ACM, 2016.

[116] Y. Sui and J. Xue. On-demand strong update analysis via value-flow refine-

ment. In FSE’16, pages 460–473, 2016.

[117] Y. Sui and J. Xue. SVF: Interprocedural static value-flow analysis in LLVM.

In CC’16, pages 265–266, 2016. BIBLIOGRAPHY 136

[118] Y. Sui, D. Ye, Y. Su, and J. Xue. Eliminating redundant bounds checks in

dynamic buffer overflow detection using weakest preconditions. IEEE Trans-

actions on Reliability, 65(4):1682–1699, 2016.

[119] Y. Sui, D. Ye, and J. Xue. Static memory leak detection using full-sparse

value-flow analysis. In ISSTA’12, pages 254–264, 2012.

[120] Y. Sui, D. Ye, and J. Xue. Detecting memory leaks statically with full-sparse

value-flow analysis. IEEE Transactions on Software Engineering (TSE),

40(2):107–122, 2014.

[121] Y. Sui, S. Ye, J. Xue, and P. Yew. SPAS: Scalable path-sensitive pointer

analysis on full-sparse SSA. In APLAS’11, pages 155–171.

[122] Y. Sui, S. Ye, J. Xue, and J. Zhang. Making context-sensitive inclusion-based

pointer analysis practical for compilers using parameterised summarisation.

Software: Practice and Experience, 44(12):1485–1510, 2014.

[123] L. Szekeres, M. Payer, T. Wei, and D. Song. Sok: eternal war in memory. In

SP’13, pages 48–62, 2013.

[124] Y. Tang, Q. Gao, and F. Qin. Leaksurvivor: towards safely tolerating memory

leaks for garbage-collected languages. In USENIX ATC’08, pages 307–320,

2008.

[125] C. Tice, T. Roeder, P. Collingbourne, S. Checkoway, U.´ Erlingsson,

L. Lozano, and G. Pike. Enforcing forward-edge control-flow integrity in

gcc & llvm. In USENIX Security’14, pages 941–955, 2014.

[126] O. Tripp, M. Pistoia, S. J. Fink, M. Sridharan, and O. Weisman. Taj: effective

taint analysis of web applications. In PLDI’09, pages 87–97, 2009. BIBLIOGRAPHY 137

[127] E. van der Kouwe, V. Nigade, and C. Giuffrida. Dangsan: scalable use-after-

free detection. In EuroSys’17, pages 405–419, 2017.

[128] V. van der Veen, D. Andriesse, E. G¨okta¸s,B. Gras, L. Sambuc, A. Slowinska,

H. Bos, and C. Giuffrida. Practical context-sensitive CFI. In CCS’15, pages

927–940, 2015.

[129] V. Vapnik. The nature of statistical learning theory. Springer Science &

Business Media, 2013.

[130] K. Vorobyov and P. Krishnan. Comparing model checking and static program

analysis: A case study in error detection approaches. In SSV’10, pages 1–7,

2010.

[131] S. Wang, T. Liu, and L. Tan. Automatically learning semantic features for

defect prediction. In ICSE’16, pages 297–308, 2016.

[132] G. Wassermann and Z. Su. Static detection of cross-site scripting vulnerabil-

ities. In ICSE’08, pages 171–180, 2008.

[133] Y. Wei, Y. Pei, C. A. Furia, L. S. Silva, S. Buchholz, B. Meyer, and A. Zeller.

Automated fixing of programs with contracts. In ISSTA’10, pages 61–72,

2010.

[134] W. Weimer and G. C. Necula. Mining temporal specifications for error de-

tection. In TACAS’05, pages 461–476, 2005.

[135] A. Williams. Amazon web services outage caused by memory leak

and failure in monitoring alarm. http://techcrunch.com/2012/10/27/

amazon-web-services-outage-caused-by-memory-leak-and-failure-

in-monitoring-alarm. 2012. BIBLIOGRAPHY 138

[136] Y. Xie and A. Aiken. Context- and path-sensitive memory leak detection. In

FSE’05, pages 116–125, 2005.

[137] Y. Xie and A. Aiken. Saturn: A scalable framework for error detection using

boolean satisfiability. ACM Transactions on Programming Languages and

Systems (TOPLAS), 29(3):16, 2007.

[138] Y. Xiong, H. Zhang, A. Hubaux, S. She, J. Wang, and K. Czarnecki. Range

fixes: Interactive error resolution for software configuration. IEEE Transac-

tions on Software Engineering (TSE), 41(6):603–619, 2015.

[139] G. Xu, M. D. Bond, F. Qin, and A. Rountev. Leakchaser: helping pro-

grammers narrow down causes of memory leaks. In PLDI’11, pages 270–282,

2011.

[140] W. Xu, D. C. DuVarney, and R. Sekar. An efficient and backwards-compatible

transformation to ensure memory safety of C programs. In FSE’12, pages

117–126, 2004.

[141] W. Xu, J. Li, J. Shu, W. Yang, T. Xie, Y. Zhang, and D. Gu. From collision

to exploitation: Unleashing use-after-free vulnerabilities in linux kernel. In

CCS’15, pages 414–425, 2015.

[142] F. Yamaguchi, N. Golde, D. Arp, and K. Rieck. Modeling and discovering

vulnerabilities with code property graphs. In SP’14, pages 590–604, 2014.

[143] F. Yamaguchi, M. Lottmann, and K. Rieck. Generalized vulnerability ex-

trapolation using abstract syntax trees. In ACSAC’12, pages 359–368, 2012.

[144] F. Yamaguchi, A. Maier, H. Gascon, and K. Rieck. Automatic inference of

search patterns for taint-style vulnerabilities. In SP’15, pages 797–812, 2015. BIBLIOGRAPHY 139

[145] H. Yan, Y. Sui, S. Chen, and J. Xue. Automated memory leak fixing on

value-flow slices for C programs. In SAC’16, pages 1386–1393, 2016.

[146] D. Ye, Y. Su, Y. Sui, and J. Xue. Wpbound: Enforcing spatial memory safety

efficiently at runtime with weakest preconditions. In ISSRE’14, pages 88–99,

2014.

[147] D. Ye, Y. Sui, and J. Xue. Accelerating dynamic detection of uses of undefined

variables with static value-flow analysis. In CGO’14, pages 154–164.

[148] S. Ye, Y. Sui, and J. Xue. Region-based selective flow-sensitive pointer anal-

ysis. In SAS’14, pages 319–336, 2014.

[149] Y. Younan. Freesentry: Protecting against use-after-free vulnerabilities due

to dangling pointers. In NDSS’15, 2015.

[150] Y. Younan, W. Joosen, and F. Piessens. Runtime countermeasures for code

injection attacks against C and C++ programs. ACM Computing Surveys

(CSUR), 44(3):17, 2012.

[151] H. Yu, J. Xue, W. Huo, X. Feng, and Z. Zhang. Level by level: making flow-

and context-sensitive pointer analysis scalable for millions of lines of code. In

CGO’10, pages 218–229, 2010.

[152] C. Zhang, C. Song, K. Z. Chen, Z. Chen, and D. Song. Vtint: Protecting

virtual function tables’ integrity. In NDSS’15, 2015.

[153] C. Zhang, T. Wei, Z. Chen, L. Duan, L. Szekeres, S. McCamant, D. Song,

and W. Zou. Practical control flow integrity and randomization for binary

executables. In SP’13, pages 559–573, 2013. BIBLIOGRAPHY 140

[154] H. Zhang, L. Gong, and S. Versteeg. Predicting bug-fixing time: An em-

pirical study of commercial software projects. In Proceedings of the 2013

international conference on software engineering, pages 1042–1051, 2013.

[155] J. Zhao, S. Nagarakatte, M. M. Martin, and S. Zdancewic. Formalizing the

LLVM intermediate representation for verified program transformations. In

POPL’12, pages 427–440, 2012.

[156] X. Zheng and R. Rugina. Demand-driven alias analysis for C. In POPL’08,

pages 197–208, 2008.