
Attack Surface Analysis and Code Coverage Improvement for Fuzzing

PENG LUNAN

School of Physical and Mathematical Sciences


A thesis submitted to the Nanyang Technological University in partial fulfillment of the requirements for the degree of Master of Science

2019

Supervisor Declaration Statement

I have reviewed the content and presentation style of this thesis and declare it of sufficient grammatical clarity to be examined. To the best of my knowledge, the thesis is free of plagiarism and the research and writing are those of the candidate except as acknowledged in the Author Attribution Statement. I confirm that the investigations were conducted in accord with the ethics policies and integrity standards of Nanyang Technological University and that the research data are presented honestly and without prejudice.

. . . . . . . . . . . .          . . . . . . . . . . . .
Date                             Wu Hongjun

Abstract

As cybercrime has become a worldwide threat over the past decades, research on cybersecurity keeps attracting a great deal of attention. In the long-running competition between attackers and defenders, vulnerability detection has been considered the decisive preliminary step for both sides. Among the many methodologies for vulnerability detection, fuzzing has demonstrated outstanding performance in finding bugs automatically and effectively.

A fuzzer repeatedly provides generation-based or mutation-based samples to the target program to explore its misbehavior. Even though many boosting techniques have been proposed to further improve the efficiency of fuzzing, two crucial aspects still hold enduring appeal for researchers: one is attack surface analysis, which helps fuzzers put more effort into the most potentially vulnerable locations; the other is code coverage improvement, which guides fuzzers to explore more code regions.

In this thesis, we present attack surface analysis and code coverage improvement for fuzzing. In the first work, we choose the Linux Kernel as the target and categorize its source files into different components based on their functionalities. We then collect data on all related Common Vulnerabilities and Exposures (CVE) entries and analyze their distribution to identify the vulnerable level of each component. In the second work, we use rarely-hit edges as the metric to guide multi-round generation-based fuzzing on the Document Object Model (DOM) of a browser. We use the default template to generate a large number of samples in the first fuzzing round, compute the hit count of every covered edge, and select the samples that cover any rarely-hit edge as templates for the second fuzzing round. The approach achieves an obvious improvement in the code coverage of newly generated samples compared to the default template.

Acknowledgements

I would like to express my sincere thanks to everyone who has supported me over the past couple of years. I am deeply grateful for all your help with both academic issues and daily life.

First of all, I would like to thank my supervisor, Professor Wu Hongjun, for his continuous and patient guidance throughout my whole study and work. His encouragement motivated me to keep devoting my full energy to solving the troubles and doubts faced during this work.

I would like to thank Professor Liu Yang for helping me with his expert insight and rich experience in the security research area. It was precisely his excellent guidance on my final year project during my undergraduate study that inspired my interest in cybersecurity research.

Furthermore, I would also like to thank my senior colleagues: Dr. Huang Tao, Dr. Wang Chenyu and Mr. Yu Haiwan, for providing significant advice and assistance to my research. They are all excellent friends to play with and good teachers to learn from.

Last but not least, I express my greatest gratitude to my parents. Their unconditional love equips me with unlimited determination and courage in my whole life.

Contents

Abstract

Acknowledgements

List of Figures

List of Tables

1 Introduction
  1.1 Cybersecurity
  1.2 Vulnerability
      1.2.1 Types
      1.2.2 Detection
  1.3 Fuzzing
      1.3.1 Basic Approach
      1.3.2 Guided Approach
  1.4 Thesis Organization

2 Attack Surface Analysis on Linux Kernel
  2.1 Background
      2.1.1 Attack Surface Analysis
      2.1.2 Linux Kernel Fuzzing
  2.2 Motivation and Approach
  2.3 Crawler Design
  2.4 CVE Collection
      2.4.1 Database Choice
      2.4.2 Collect ID, Type, Score
      2.4.3 Collect Patch Commits
  2.5 Linux Kernel Component
      2.5.1 Component Category
      2.5.2 Collect Component Files
  2.6 Results and Discussion

3 Code Coverage Improvement for Coverage Guided Fuzzing
  3.1 Background
      3.1.1 Code Coverage
      3.1.2 Coverage Guided Fuzzing
  3.2 Motivation
  3.3 Approach
      3.3.1 Overview
      3.3.2 Fuzzing Target
      3.3.3 Test Case Generator
      3.3.4 Monitor
      3.3.5 Coverage
      3.3.6 Scoring
      3.3.7 Refinement
  3.4 Implementation and Evaluation

4 Conclusion and Future Work
  4.1 Conclusion
  4.2 Future Work

Bibliography

List of Figures

1.1 Number of CVEs (1999 Jan - 2019 May)
1.2 CVSS Score Distribution [1]
1.3 Stack-based Buffer Overflow Exploitation
1.4 Use After Free
1.5 Race Condition
1.6 Basic Approach of A Fuzzing Framework
2.1 User Space and Kernel Space
2.2 Workflow of Our Crawler
2.3 Known Affected Software Configurations of CVE-2018-1857
2.4 <div> tag contains hyperlinks to CVE lists
2.5 An Example of Reference Links Provided by CVEDetails
2.6 An Example of commits change log
2.7 Our Component Category for Linux Kernel
2.8 Number of Collected Linux Kernel CVEs with Years
2.9 Number of Collected Linux Kernel CVEs with Types & Years
2.10 Number of Collected Linux Kernel CVEs with Scores & Years
2.11 Amount of Source Files in Components
2.12 Amount of CVEs in Components
2.13 Component Files/CVEs Ratio
2.14 Vulnerable Level of Linux Kernel Components
3.1 A Sample Control Flow Graph
3.2 The Overview of Our Approach
3.3 Classical Mutation-based Coverage Guided Fuzzing Approach
3.4 Workflow of Samples Generation
3.5 Workflow of Minimization
3.6 Chromium Build Arguments
3.7 Code Coverage Comparison

List of Tables

2.1 Database Comparison in aspect of Linux Kernel CVE Collection
2.2 CVE Amount and Percentage with Types
2.3 Amount of Collected Data Related to CVE-Component Mapping
2.4 Top 5 Linux Kernel Source Files upon Relevant CVE Amount
3.1 Basic Results of Execution in Round 1
3.2 Threshold Related Result
3.3 Results of Refinement in Round 1
3.4 Code Coverage Comparison

Chapter 1

Introduction

1.1 Cybersecurity

With the increasing reliance on computer and information technology, electronic crimes that target computers and networks, known as cybercrime, have caused enormous losses to society. In 2014, Reuters reported that the annual damage of cybercrime to the global economy is about $445 billion [2], including damage to businesses as well as to individuals. In 2017, the WannaCry attack infected more than 230,000 computers in over 150 countries within one day, encrypted files on victims' computers without permission and demanded ransom money; it even hit the UK's National Health Service and caused unexpected emergencies [3]. Due to the growing threats from cybercrime, cybersecurity is attracting more and more worldwide attention.

Cybersecurity is a very wide field that concentrates on the protection of computer systems and people's legal assets from electronic damage. The International Telecommunication Union (ITU) defined it as an assembly of policies, protocols, guidelines, tools, technologies, management approaches and assurance that can be used to ensure the safety of the cyber environment [4].

Many countries have officially published statements and measures to fight against cybercrime such as system hacking and secret stealing [5]. Companies doing business around cybersecurity are growing fast and strongly. Along with that, competition between nations and between companies tends to be increasingly fierce. Therefore, there is no doubt that cybersecurity will continue to play an important role in national safeguarding, global business, academic research and our daily life.

1.2 Vulnerability

The first necessary step for hackers to launch a cybercrime is to find a vulnerability, which is defined as a weakness that allows an attacker to reduce a system's information assurance [6]. In this thesis, we focus only on software vulnerabilities rather than hardware or physical cases.

Software vulnerabilities exist mainly because of developers' carelessness and mishandling of abnormal conditions, and they are triggered by specific inputs. Both attackers and developers try hard to discover these vulnerabilities: the attackers aim to build malformed inputs accordingly to break the protections of the target, while the developers need to locate and fix these vulnerabilities to enhance the safeguarding of their product.

To collect and summarize as many software vulnerabilities as possible, the Common Vulnerabilities and Exposures (CVE) system was developed. Vulnerabilities in publicly released and widely used software packages and datasets can be submitted to this system. After verification and evaluation, a unique CVE Identifier, as well as a Common Vulnerability Scoring System (CVSS) score, is assigned based on the impact on the confidentiality, integrity and availability of the affected target.

The number of assigned CVEs provided by the CVE system [7], shown in Figure 1.1, indicates a clearly increasing trend over the past decades. By comparing the CVSS score distributions of the three most recent years in Figure 1.2 [1], it can be seen that the number of CVEs with a score lower than 8 is growing explosively, indicating that the field of vulnerability discovery and submission is attracting a mass of individuals and organizations.

Figure 1.1: Number of CVEs (1999 Jan - 2019 May)

Figure 1.2: CVSS Score Distribution [1]

1.2.1 Types

There is no authoritative classification standard for vulnerabilities in software due to the variety of root causes. In this thesis, we introduce four common types of software vulnerabilities.

• Overflow: Includes but is not limited to Integer Overflow and Buffer Overflow.

Integer Overflow occurs when a numeric value that is out of the representable range is assigned to a certain type of variable. For example, assigning the result of 255+2 to an 8-bit unsigned variable, whose value range is 0 to 255, will trigger an integer overflow, causing the variable to hold 1 instead of 257, as the sketch below illustrates. It may cause security bugs in case the overflowed value is used to define some crucial number, such as the size of a memory allocation.
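The wrap-around is easy to reproduce. The following small sketch (ours, not part of the thesis experiments) uses Python's ctypes to emulate an 8-bit unsigned variable:

    import ctypes

    # An 8-bit unsigned value wraps modulo 256, so 255 + 2 becomes 1.
    v = ctypes.c_uint8(255 + 2)
    print(v.value)  # prints 1, not 257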

Buffer Overflow occurs when a program overruns the boundary of a buffer and overwrites adjacent memory addresses. For example, if we define A as a string buffer with a size of 5 bytes, then without bounds checking, assigning an 8-byte string to A will cause a buffer overflow and overwrite the next 3 bytes. Exploitation of buffer overflows varies with the memory architecture; usually there are stack-based and heap-based variants. Figure 1.3 demonstrates a simple stack-based buffer overflow exploitation, which overwrites the return address of the vulnerable stack, resulting in a jump to an arbitrary address.

Figure 1.3: Stack-based Buffer Overflow Exploitation

• Use After Free: Refers to abnormal attempts to access a memory address which has already been freed. Usually caused and exploited through a dangling pointer.

In computer programming, a pointer points to a specified memory address that is allocated for storing values. A programmer can obtain these stored values by dereferencing the corresponding pointer, which significantly reduces the overhead of repetitive operations such as string traversal and tree traversal. If a pointer is not properly handled after freeing the memory it points to, it becomes a dangling pointer, also called a wild pointer, and consequently may lead to a memory corruption flaw. Attackers detect and make use of dangling pointers to violate memory and launch an exploitation.

One example of use after free is shown in Figure 1.4. Once the memory address is freed, the pointer referencing it becomes a dangling pointer. If the attacker is able to write malicious code to this freed memory and the dangling pointer is reused after that, the malicious code may get executed, and the pointer becomes an accomplice to the attacker's evil purpose.

Figure 1.4: Use After Free

• Race Conditions: Occur when the timing or sequence of process execution gets into confusion.

Modern software and applications usually take advantage of multithreading to achieve better performance. In the case that different threads share some of the same state or depend heavily on each other, the execution sequence must be strict and exclusive. Otherwise, the possibility of corruption rises, resulting in undefined behavior.

For example, suppose two threads both read a variable A, increase it by one, and write it back. The normal execution sequence is to first run thread 1 and, after its completion, run thread 2, finally getting A=2 (with A=0 initially). In a race condition, however, the sequence may be changed to that in Figure 1.5: thread 2 reads variable A before thread 1 writes the new value back. The final value of A hence becomes 1 instead of 2.

Figure 1.5: Race Condition
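The lost update in Figure 1.5 can be reproduced with a short sketch of ours (not from the thesis) using Python threads; the read and the write-back of A are deliberately separated so the scheduler can interleave them:

    import threading

    A = 0

    def increment(times=100000):
        global A
        for _ in range(times):
            tmp = A       # read A
            A = tmp + 1   # write back; not atomic with the read

    threads = [threading.Thread(target=increment) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(A)  # frequently less than 200000: some updates are lost to the race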

• Input Validation: Arises when improper user input goes unchecked.

User interaction is a significant aspect of software development. Much software is designed to take and handle inputs from the user and provide output to answer their inquiries or requests. If an input is not suitable for the requirement, such as a string where an integer is expected, and is not properly checked, an error may occur. In worse situations, it can be used for security exploitation.

One exploitation technique based on that is Structured Query Language (SQL) Injection. SQL is a widely used programming language for processing and managing data in a database [8]. In a SQL-based application that lacks input checks, malicious SQL statements can be inserted and executed. For example, the following statement intends to query records from the users table with a specified username in the background database:

SELECT * FROM users WHERE name = (INPUT);

An attacker can obtain the records of all users rather than only the specified one by simply injecting a short statement as the input, turning the query into the following:

SELECT * FROM users WHERE name = ‘’ OR ‘1’ = ‘1’;

In that case, since ‘1’ = ‘1’ is always TRUE, all records in the table will be selected, causing information leakage.
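The contrast between the vulnerable query and a properly handled one can be sketched with Python's built-in sqlite3 module (the table and data below are made up for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "secret-a"), ("bob", "secret-b")])

    payload = "' OR '1' = '1"  # attacker-controlled input

    # Vulnerable: the input is spliced directly into the SQL statement.
    leaked = conn.execute(
        "SELECT * FROM users WHERE name = '%s'" % payload).fetchall()
    print(leaked)  # every row is returned

    # Safe: a parameterized query treats the input as plain data.
    safe = conn.execute(
        "SELECT * FROM users WHERE name = ?", (payload,)).fetchall()
    print(safe)    # no row matches the literal string

The parameterized form is the standard defense: the database driver never interprets the input as SQL syntax.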

1.2.2 Detection

Since vulnerabilities are so crucial in the defense against cybercrime, a great deal of research on their detection has been conducted, and many detection tools have been developed. There exist multiple approaches to detect software vulnerabilities. According to the methodology, we can categorize these approaches into two basic types: static approaches and dynamic approaches.

• Static Approach: Analyze and examine the source code of the target application without executing it.

Static analysis aims to extract information such as data chunks, integrity constraints, data flow and control flow from source code and make a judgment based on that. Sotirov introduced that the most popular approaches to static analysis include but are not limited to pattern matching, abstract syntax tree (AST) analysis and taint checking [9].

Pattern matching first summarizes sequence patterns of a specific abnormality, then checks the matching condition to detect it in the target program. An AST represents the abstract syntax of the target source code in a tree structure in which every construct is parsed into a node, and can be used to analyze both syntax and semantic rules of the program. Taint checking processes variables one by one and obtains a set of variables potentially influenced by outside inputs, so the checker is able to alert users when an influenced variable is used in some critical statement. (A toy sketch of these ideas follows this discussion.)

The above approaches are widely used in software vulnerability detection. Van Lunteren [10] and Dharmapurikar [11] build pattern-matching models for intrusion detection. Skyfire [12] applies AST analysis to implement a seed generator for fuzzing. And Leakminer [13] is designed to detect information leakage vulnerabilities on Android by static taint analysis.

The biggest benefit of the static approach in vulnerability detection is its outstanding capacity for detecting bugs deeply hidden in rarely reached code blocks. But it also has obvious limitations. Firstly, static analysis is not suitable for closed-source applications. Secondly, approaches like pattern matching highly depend on the quality of the summarized rules and patterns, which requires deep knowledge and rich experience in the related area.
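As a toy illustration of the pattern-matching and AST ideas above (our sketch, not one of the cited tools), Python's ast module can flag calls to dangerous functions in a source snippet:

    import ast

    source = "x = input()\neval(x)\n"

    # Walk the abstract syntax tree and flag calls to eval/exec.
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in ("eval", "exec")):
            print("suspicious call to %s at line %d" % (node.func.id, node.lineno))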

• Dynamic Approach: Analyze the software and detect abnormalities based on its behavior during execution.

In plain words, it is to execute the target application with sufficient valid (both syntactically and semantically valid) inputs, monitor the process, and dump pre-defined interesting behavior for tracking and analysis, aiming to find the root cause and identify potential vulnerabilities.

Compared to static analysis, being a black-box testing technique, dynamic detection does not require the source code of the target. Dynamic testing also has a low requirement of knowledge about the target program. However, multiple assisting techniques, like test case minimization and code coverage, are needed to improve the efficiency and accuracy of dynamic detection [14, 15]. In some cases, symbolic execution and loop analysis are also required for overcoming complicated magic number and dead loop issues to trigger deeper paths [14, 16-18].

In this thesis, we focus on fuzzing, one of the most commonly used dynamic approaches for detecting software vulnerabilities. More details will be introduced in the next subsection.

1.3 Fuzzing

Fuzzing was first publicly mentioned in a research project of the University of Wisconsin in 1988 [19]. Sutton, Greene and Amini define fuzzing as a method for discovering faults in software by providing unexpected inputs and monitoring for exceptions [20]. It is an automated technique for software testing, and it has been proven one of the most effective testing methodologies by the fact that a huge number of software vulnerabilities have been detected with it in the past decades. Because of its high efficiency, famous vendors such as [21] and Google [22] keep spending more and more resources on developing fuzzers in recent years.

1.3.1 Basic Approach

The simplest approach of a fuzzing framework is shown in Figure 1.6. It consists of only three components: inputs, target program and execution monitor. The framework collects sufficient inputs (also known as test cases) through automatic generation, mutation of existing samples or crawling from the Internet. Then it repeatedly sends these inputs to the target program for execution. A monitor is attached during the execution to discover and dump any exceptions. Valuable exceptions will be analyzed manually using a disassembler or debugger, such as the Interactive Disassembler (IDA) [23] or the GNU Debugger (GDB) [24], to be further classified as vulnerable or not.

Figure 1.6: Basic Approach of A Fuzzing Framework
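A minimal mutation-based instance of this three-component loop can be sketched as follows; ./target and seed.bin are hypothetical placeholders for a real target binary and seed input:

    import random
    import subprocess

    def mutate(data):
        # Flip a few random bits of an existing input.
        buf = bytearray(data)
        for _ in range(random.randint(1, 8)):
            pos = random.randrange(len(buf))
            buf[pos] ^= 1 << random.randrange(8)
        return bytes(buf)

    seed = open("seed.bin", "rb").read()          # hypothetical seed input
    for i in range(1000):
        case = mutate(seed)
        with open("case.bin", "wb") as f:
            f.write(case)
        # Monitor: a negative return code means the target was killed by a
        # signal such as SIGSEGV, i.e. a potential vulnerability.
        result = subprocess.run(["./target", "case.bin"])
        if result.returncode < 0:
            with open("crash_%d.bin" % i, "wb") as f:
                f.write(case)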

A fuzzer can be categorized by different policies. Depending on the methodology of test case collection, it can be categorized into generation-based or mutation-based fuzzing. Depending on the awareness of the target program's structure and rules, it can be categorized into white-box, black-box or grey-box fuzzing.

• Generation-based: Aware of the syntax structure and semantic rules of the input, such fuzzers automatically generate new valid inputs accordingly. Fuzzing frameworks such as Domato [25] and LangFuzz [26] that focus on the input generation approach are all generation-based.

• Mutation-based: Apply mutative strategies such as bit flipping and byte flipping to existing inputs, mutating these provided inputs to generate new ones. TaintScope [27], Driller [17] and American Fuzzy Lop (AFL) [28] belong to this category.

• White-Box Fuzzing: Fuzzing of programs whose source code is available. Since the internal structure is visible, information such as control flow, data flow and code coverage is relatively simple to obtain. Therefore, a white-box fuzzing framework like SAGE [29] usually leverages static analysis as well as symbolic execution to perform fuzzing.

• Black-Box Fuzzing: Opposed to white-box, uses a massive number of inputs to fuzz the target program while the internal structure is unknown. Most black-box fuzzing frameworks, such as KameleonFuzz [30] and PULSAR [31], make use of high-quality vulnerable patterns summarized by themselves, combined with output-based learning, to develop their fuzzing strategy.

• Grey-Box Fuzzing: Partially aware of the internal structure, leverages lightweight instrumentation to fetch transition and coverage information, in order to integrate with black-box fuzzing methodologies, resulting in a reasonable balance between accuracy and execution speed. Some famous fuzzing frameworks such as LibFuzzer [32] and AFL [28] are based on grey-box approaches.

1.3.2 Guided Approach

One extremely efficient extension to fuzzing is guiding the approach with some feedback, which is known as guided fuzzing.

In a guided fuzzing approach, some aspects of the outcome produced by previous inputs are recorded. Through specified treatment, inputs with valuable feedback are kept and marked as "interesting", while "uninteresting" inputs are discarded. These interesting inputs can be viewed as high-quality seeds and will be used as bases of mutation or templates of generation for creating new inputs. A simple guided fuzzing approach is shown in Algorithm 1.

Algorithm 1 Guided Fuzzing Algorithm
Input: Inputs (test cases) in a queue: Q; Generation/Mutation approach: M; Feedback judgement approach: F; Target program: T
 1: while Q is not empty do
 2:   S ← Dequeue(Q)
 3:   for pre-defined times of Generation/Mutation on S do
 4:     S' ← M(S)
 5:     R ← Execute(S', T)
 6:     if a crash occurs in R then
 7:       Save R and S'
 8:     else
 9:       if F(R) is interesting then
10:        Update F based on R
11:        Enqueue(Q, S')
12:      end if
13:    end if
14:  end for
15: end while

From Algorithm 1, we can see that the crucial aspect of the guided fuzzing approach is how to judge whether an execution result is interesting; in other words, the metric of judgment. That has a decisive influence on the efficiency of fuzzing.
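For concreteness, Algorithm 1 translates almost line by line into Python. The sketch below is ours, not the thesis's implementation; mutate, execute and feedback are caller-supplied stand-ins for M, Execute and F, and the result object's crashed flag is an assumed interface:

    from collections import deque

    def guided_fuzz(seeds, mutate, execute, feedback, rounds=16):
        q = deque(seeds)
        crashes = []
        while q:                          # while Q is not empty
            s = q.popleft()               # S <- Dequeue(Q)
            for _ in range(rounds):       # pre-defined times of mutation on S
                s2 = mutate(s)            # S' <- M(S)
                r = execute(s2)           # R <- Execute(S', T)
                if r.crashed:             # a crash occurs in R
                    crashes.append((r, s2))
                elif feedback.is_interesting(r):
                    feedback.update(r)    # update F based on R
                    q.append(s2)          # Enqueue(Q, S')
        return crashes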

In Chapter 3 of this thesis, we use code coverage, a broadly used metric in recent fuzzing research such as [14, 15, 28, 33, 34], to build a coverage-guided approach for fuzzing JavaScript engines and browsers. More background will be discussed in that chapter.

1.4 Thesis Organization

In this thesis, we will present two works related to software vulnerability detection. The first one is attack surface analysis for fuzzing, and the second one is code coverage improvement for guided generative fuzzing on the DOM of the Chromium browser.

The rest of this thesis is organized as follows:

Chapter 2 presents the work on attack surface analysis of the Linux Kernel by collecting related vulnerabilities and mapping them to kernel components. We first introduce some background on attack surfaces and Linux Kernel fuzzing. Then we present the flow of our vulnerability collection and mapping work. Finally, we identify the vulnerable level of each component by analyzing the distribution of these vulnerabilities.

Chapter 3 presents the work on improving the code coverage of generative fuzzing on the Chromium DOM through a strategy targeting rarely-hit edges. We will first describe the background of coverage-guided fuzzing, then explain the steps of our approach in detail. Finally, we will present our implementation, discuss the experimental outcomes, and evaluate its performance on coverage improvement.

Chapter 4 summarizes our works and discusses what can be further improved in the future.

Chapter 2

Attack Surface Analysis on Linux Kernel

2.1 Background

2.1.1 Attack Surface Analysis

The attack surface of a system is defined as the set of ways in which an adversary can enter the system and potentially cause damage [35]. In more detail, it is the sum of all valuable data, all paths leading to these data, and all code protecting these paths and data in a system [36]. A larger attack surface usually indicates weaker security.

Attack surface analysis focuses on performing a security evaluation of the target system or application. It aims to obtain a vulnerability assessment, figure out potential security risks, identify vulnerable parts, and finally assist developers in fixing these vulnerabilities and enhancing the safeguard. It is important for both developers and attackers to find and understand vulnerable areas in the target, because a brief overview of which parts are at high risk can be summarized and then benefit vulnerability detection.

Since it can predict to some extent where vulnerabilities may exist, attack surface analysis is usually viewed as powerful assistance for efficient vulnerability detection. Here we discuss some related work on attack surface analysis, as well as its application as an aid in vulnerability prediction and detection:

Howard, Pincus and Wing [37] propose a multi-dimensional metric for measuring the security of a system. They summarize the used resources, communication channels & protocols, and access rights of a given attack as three dimensions for describing a system's attack surface, and use a count of attack opportunities to indicate a system's "attackability", which is treated as a measurement of how exposed the attack surface is.

Shin and Williams [38] make use of code complexity metrics to predict security vulnerabilities. Based on the correlation between software vulnerability and complexity, they extract nine code complexity metrics to categorize functions into vulnerable, non-vulnerable and faulty functions. They also develop a predictor based on that and demonstrate its capacity for distinguishing vulnerable functions.

LEOPARD [39] is a framework designed to assist the identification of potentially vulnerable functions. Complexity metrics and vulnerability metrics are combined in this framework: the former are used to group functions, and the latter serve as a standard to rank these functions in descending order of vulnerable level. The application of LEOPARD to fuzzing real software shows outstanding performance.

2.1.2 Linux Kernel Fuzzing

To ensure the safety of memory and hardware, the virtual memory of an operating system is usually segregated into kernel space and user space. The core of an operating system runs in the kernel space, known as the system kernel, which is protected by the highest privilege and takes full control over the whole system.

The kernel is responsible for directly controlling the computer hardware such as the Central Processing Unit (CPU), Random-Access Memory (RAM) and storage devices. Meanwhile, it interfaces with user space processes through system calls. The brief architecture is shown in Figure 2.1.

Figure 2.1: User Space and Kernel Space

Linux Kernel is an open-source kernel for the Linux family of operating systems. It was created by Linus Torvalds in 1991 [40], written mainly in the C language and compiled with the GNU Compiler Collection (GCC). As of November 2017, all of the top 500 supercomputers in the world use Linux as the system kernel [41]. In Linux Kernel, partial code such as device drivers is treated as loadable kernel modules (LKMs) that can be loaded once required and unloaded to free memory after usage.

Since the kernel holds the root privilege for accessing everything in a computer system, successful exploitation of the kernel usually results in complete control of the system. Because of that, kernel vulnerability detection keeps attracting the attention of participants in cybersecurity and cybercrime. In recent years, as fuzzing demonstrates its power in vulnerability detection, a lot of research around kernel fuzzing has been published. Here we discuss some related work that aims to detect vulnerabilities in OS kernels by fuzzing:

Trinity [42] is a fuzzing framework designed to detect Linux Kernel vulnerabilities through testing Linux system calls. Instead of calling system calls with random arguments, Trinity builds a bunch of sockets for certain types of arguments, such as file descriptor parameters. Once such an argument is required for a system call, the fuzzer selects one randomly from the bunch, aiming to avoid the problem that the kernel rejects a purely random invalid parameter by returning -EINVAL.

DIFUZE [43] focuses on fuzzing kernel drivers, which are essential for the kernel to interface with physical devices ranging from data storage devices to cameras, speakers and sound cards. It utilizes static analysis on an analysis host to generate valid inputs for target drivers, then runs these inputs to trigger the corresponding operations in the kernel drivers on an external execution host. The input sequence is logged and execution results are transferred back for manual analysis when the target host runs into a crash.

kAFL [44] is a hardware-assisted fuzzing framework for operating system kernels. It makes use of Intel Processor Trace to obtain information about process execution and branch traces, then leverages the information as feedback to guide further fuzzing. kAFL also segregates its framework into two components: a hypervisor to handle the fuzzing logic, produce branch coverage and generate new test cases accordingly, and a target virtual machine to execute the test cases. The two components communicate with each other via hypercalls, in order to implement data transfer and minimize expensive execution overhead such as a reboot.

2.2 Motivation and Approach

One of the most acknowledged challenging problems in Linux Kernel fuzzing is its extremely high complexity. As of September 2018, the repository of the Linux Kernel source tree contains more than 61 thousand files, 782 thousand commits (a commit is a set of changes in a repository) and 25 million lines of code [45]. This level of complexity leads to an unclear attack surface, so it is hard for a fuzzer's designer to identify the most vulnerable parts of the Linux Kernel.

The main motivation of our work in this chapter is to find the most vulnerable parts or components of the Linux Kernel, thereby guiding the fuzzing framework to avoid wasting time on well-protected parts and to put more energy into the vulnerable ones, achieving higher fuzzing efficiency for the Linux Kernel.

We perform the analysis as follows:

1. Collect existing Linux Kernel CVEs in past decades.

2. Search and crawl all commits that were proposed to patch Linux Kernel CVEs, and identify the files modified by each of these commits.

3. Categorize Linux Kernel components, figure out what files are included in each component.

4. Summarize and analyze the collection results. Map collected CVEs to categorized components. Figure out frequent vulnerability types, vulnerable components and vulnerable Linux Kernel source files.

2.3 Crawler Design

In this work, we develop a web crawler to crawl needed data from the Internet.

The crawler is written in Python and mainly based on three Python libraries: threading, requests and BeautifulSoup [46]. Its workflow is shown in Figure 2.2.

Figure 2.2: Workflow of Our Crawler

Firstly, we identify the target URLs in the main function and create multiple threads. Each thread runs a function that takes a specified URL as input, sends an HTTP request for the given URL, and gets the response HTML text through the command html = requests.get(url).text.

Secondly, we utilize BeautifulSoup, a Python library for pulling data out of HTML files, together with lxml's HTML parser [47], to parse the response HTML text and obtain nested HTML data. The command is soup = BeautifulSoup(html, 'lxml'). This nested data contains the complete HTML code of the given URL.

Finally, we navigate that nested data using methods provided by BeautifulSoup to extract the data we need. For example, the find_all() method extracts all tags in the HTML data that match filters defined by us; a filter can be a tag name, attributes or contained text. One instance is the command soup.find_all('div', attrs={'class': "A", 'id': "B"}), which returns a list of all tags whose name is div, class is A and id is B.
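Putting the three steps together, a stripped-down version of such a crawler looks roughly like this; the URLs and the tag filter below are placeholders, not the real data sources used later in this chapter:

    import threading
    import requests
    from bs4 import BeautifulSoup

    def crawl(url, results):
        # Step 1: request the page and read the HTML text.
        html = requests.get(url).text
        # Step 2: parse the HTML with lxml into navigable data.
        soup = BeautifulSoup(html, 'lxml')
        # Step 3: extract the tags we need (filter is illustrative).
        for tag in soup.find_all('div', attrs={'class': "A", 'id': "B"}):
            results.append(tag.text)

    urls = ["https://example.org/page1", "https://example.org/page2"]
    results = []
    threads = [threading.Thread(target=crawl, args=(u, results)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()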

In summary, the workflow of our crawler is very straightforward. The crucial part is how to find the data source URLs and how to extract the specified data accurately. The former needs manual search and decision, and the latter highly depends on the HTML code structure of the chosen source.

2.4 CVE Collection

We propose to collect the existing publicly known vulnerabilities of the Linux Kernel that are assigned unique CVE IDs. The significant information of a CVE vulnerability we want to get includes its published year, ID, CVSS score, types and patch commits.

2.4.1 Database Choice

There are two famous databases of CVE security vulnerabilities. One is the National Vulnerability Database (NVD) [48], a repository managed by the U.S. government. The other is CVEDetails [49], a free CVE vulnerability information source. Both of them provide information about a specified vulnerability, such as its assigned ID, CVSS score, description, impact details, types and reference links.

In this work, we prefer to collect Linux Kernel related CVEs from CVEDetails. The main reason is that, although both CVEDetails and NVD provide the list of affected products for every CVE vulnerability, which enables us to get a statistical list of CVEs that affect the Linux Kernel, the search results in NVD contain many unexpected false positives.

One false positive instance of searching in NVD is CVE-2018-1857 [50], an information leakage vulnerability found in IBM DB2, a family of data management products. By viewing its advisory, we can find that the root cause is located in DB2 itself, not in the Linux Kernel.

This false positive occurs because NVD includes Linux Kernel as one of the running platforms for DB2, as shown in Figure 2.3, and wrongly puts this CVE in the returned list when we search for vulnerabilities that include Linux Kernel in their affected products. We can find many more similar false positive results of this style in NVD, such as CVE-2018-1786 [51] and CVE-2018-1834 [52].

Figure 2.3: Known Affected Software Configurations of CVE-2018-1857

Instead of crawling from NVD, we can get a CVE list related to the Linux Kernel from a vulnerability statistics webpage [53] provided by CVEDetails. These CVEs are divided by published year, and they all have Linux Kernel included in their affected products, so there are no false positives.

Finally, we summarize significant indicators of Linux Kernel CVE collection in the above two databases in Table 2.1. Both databases contain all the information we need about a vulnerability, such as published year, ID, score, types and commit links. Searching for CVEs published from 1999 to 2018 with Linux Kernel included in the affected products, NVD reports 3543 results with many false positives. CVEDetails reports fewer, 2162, but with the advantage of being free of false positives. Since false positive results would surely have a very disruptive impact on our analysis, we chose CVEDetails as the source for CVE information collection.

Database     Significant Info   Number of Results (1999 - 2018)   False Positives
NVD          All contained      3543                              Many
CVEDetails   All contained      2162                              None

Table 2.1: Database Comparison in aspect of Linux Kernel CVE Collection

2.4.2 Collect ID, Type, Score

Basic information of a CVE vulnerability we need includes its ID, type and CVSS score.

CVEDetails divides the CVEs that affect Linux Kernel into multiple lists according to published year, and the list for each year may span multiple webpages, since CVEDetails lists at most 50 CVEs on one webpage. In this case, we must first find all the webpage links that reference the CVE lists we want.

The basic URL of Linux Kernel CVEs in CVEDetails is https://www.cvedetails.com/vulnerability-list/vendor_id-33/product_id-47/, where vendor_id-33 specifies that the affected vendor is Linux and product_id-47 specifies that the affected product is Linux Kernel. To obtain the list for a specified published year, we only need to append year-$Y, with $Y specifying the year. For example, the following URL directs us to a webpage listing Linux Kernel CVEs published in the year 2005: https://www.cvedetails.com/vulnerability-list/vendor_id-33/product_id-47/year-2005

By visiting the above URL, we find that the number of Linux Kernel CVEs published in 2005 is 133, and these CVEs are listed on 3 webpages referenced by 3 hyperlinks. In the HTML text of the above URL, the hyperlinks are stored in a <div> tag with class="paging" and id="pagingb", and this <div> tag is unique, as shown in Figure 2.4. So we can obtain this tag through the command page = soup.find('div', attrs={'class': "paging", 'id': "pagingb"}), where soup stores the nested HTML data after parsing. Next, we use the command page.find_all('a', href=True) to obtain all the <a> tags, whose 'href' attribute provides the hyperlink that references a webpage containing the CVE lists we need.

Figure 2.4: <div> tag contains hyperlinks to CVE lists

We collect 55 hyperlinks for CVE lists from 1999 to 2018. The next step is to collect the CVE ID list for each year. We send requests to these 55 hyperlinks in multiple threads and get the response HTML. After parsing and looking through the HTML text, we find that every CVE ID is stored in an <a> tag like the following (the href shown here is reconstructed from the CVE detail URL format described below):

<a href="/cve/CVE-2005-4811/">CVE-2005-4811</a>

All these tags can be fetched through the command CVE_List = soup.find_all('a', href=True, string=re.compile("^CVE-")), where string=re.compile("^CVE-") is used to match tags whose string starts with "CVE-". Then a CVE ID like the above, CVE-2005-4811, is extracted by querying .text of the elements in CVE_List.
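This extraction logic can be tried offline against a hand-written HTML fragment (the fragment below is made up; real list pages are much larger):

    import re
    from bs4 import BeautifulSoup

    html = ('<table>'
            '<tr><td><a href="/cve/CVE-2005-4811/">CVE-2005-4811</a></td></tr>'
            '<tr><td><a href="/cve/CVE-2005-3783/">CVE-2005-3783</a></td></tr>'
            '<tr><td><a href="/other/">not a CVE link</a></td></tr>'
            '</table>')
    soup = BeautifulSoup(html, 'lxml')
    # Keep only anchors whose text starts with "CVE-".
    cve_list = soup.find_all('a', href=True, string=re.compile("^CVE-"))
    print([a.text for a in cve_list])  # ['CVE-2005-4811', 'CVE-2005-3783']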

We collect the IDs of all 2162 Linux Kernel CVEs provided by CVEDetails. Next we need to collect their types, scores and reference links.

The details of a CVE can be viewed at https://www.cvedetails.com/cve/$ID, where $ID specifies its ID. We request the URLs for all 2162 CVE IDs in multiple threads, then extract the required data through the following methods (a combined sketch of both extractions follows this list):

• Types:

The types of a CVE are stored in a table row whose header is Vulnerability Type(s). So we use the command Type = soup.find('th', text="Vulnerability Type(s)") to locate the target table row, then use Type = Type.find_next('td') and Type_List = Type.find_all('span') to extract all the types stored in <span> tags in the table cell. For the example HTML segment in Listing 2.1, the above commands yield a list of the two strings inside: ['Overflow', 'Gain privileges'], which are the types categorized by CVEDetails.

<tr>
  <th>Vulnerability Type(s)</th>
  <td>
    <span>Overflow</span>
    <span>Gain privileges</span>
  </td>
</tr>

Listing 2.1: Example <tr> tag that contains CVE types

• CVSS Score:

The CVSS score is stored in the same table as the types, but the <div> tag containing it has a specific class name: "cvssbox". So it can be obtained directly by the command soup.find('div', attrs={'class': "cvssbox"}).text.
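Both extractions can be exercised against a fragment shaped like Listing 2.1 (the markup below is simplified and hypothetical; only the cvssbox class is taken from the text above):

    from bs4 import BeautifulSoup

    html = ('<table>'
            '<tr><th>Vulnerability Type(s)</th>'
            '<td><span>Overflow</span> <span>Gain privileges</span></td></tr>'
            '<tr><th>Score</th><td><div class="cvssbox">7.2</div></td></tr>'
            '</table>')
    soup = BeautifulSoup(html, 'lxml')
    th = soup.find('th', string="Vulnerability Type(s)")
    types = [s.text for s in th.find_next('td').find_all('span')]
    score = soup.find('div', attrs={'class': "cvssbox"}).text
    print(types, score)  # ['Overflow', 'Gain privileges'] 7.2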

2.4.3 Collect Patch Commits

CVEDetails usually provides multiple reference links for a CVE. These links point to messages, discussions, advisories or patches relevant to fixing the CVE. Figure 2.5 shows an example of what the reference links table looks like in CVEDetails.

Figure 2.5: An Example of Reference Links Provided by CVEDetails

All the reference links of a CVE are stored in a table with id="vulnrefstable" in its HTML text. Each link takes one table cell (one <td>) to store, and has an <a> tag with the attribute target="_blank" to create the hyperlink.

In this work, we are only interested in links that direct us to the commits that update the Linux Kernel source code to apply a patch. Through research, we find an online repository [54] that provides the details of Linux Kernel commits. A user can request the detailed information of a commit through the following URL with a specified $id: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=$id

So the target reference links we want should be in the above URL format. Looking at Figure 2.5 again, we observe that the last link, in the red rectangle, fits this format (although it has cgit/ instead of pub/scm/ in the middle, this link still directs us to where we want). Therefore, we use the command soup.find_all('a', attrs={'target': "_blank"}, text=re.compile("git.kernel.org")) to fetch all <a> tags that have target="_blank" and include "git.kernel.org" in their text, then extract their text to get the URLs linking to the patch commits of the CVE.

Once we get the URLs of patch commits, we can automatically access their detailed information, including the author, date, hash id, parent commit, download link and change logs. The most significant part for us is the change logs, which are stored in a .diff file and record additions and deletions of code segments, as well as the names of modified files and functions, just as Figure 2.6 shows.

In this work, we only care about the names of the modified files, which can be found in <a> tags nested in <td> tags with class="upd". We extract all the target <td> tags through the command soup.find_all('td', attrs={'class': "upd"}), then for each element in the returned list, use .find_next('a', href=True).text to obtain the file name.
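A self-contained sketch of this step, run against a made-up fragment in the shape of such a diffstat table:

    from bs4 import BeautifulSoup

    html = ('<table>'
            '<tr><td class="upd"><a href="#">net/socket.c</a></td><td>+10</td></tr>'
            '<tr><td class="upd"><a href="#">include/net/sock.h</a></td><td>+2</td></tr>'
            '</table>')
    soup = BeautifulSoup(html, 'lxml')
    files = [td.find_next('a', href=True).text
             for td in soup.find_all('td', attrs={'class': "upd"})]
    print(files)  # ['net/socket.c', 'include/net/sock.h']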

It should be noted that some CVEs have no patch commit while some have multiple. After crawling, we find that 1189 of the total 2162 CVEs have at least one patch commit, and the commits of 62 of them are no longer valid. For the 1127 CVEs with valid commits, we collected 1310 commits, and in total 2492 files are modified in these commits.

Figure 2.6: An Example of commits change log

2.5 Linux Kernel Component

We need to categorize Linux Kernel components and collect all file names that are included in each component for further CVE-Component mapping and analysis.

2.5.1 Component Category

Currently there is no restrictive official standard for categorizing Linux Kernel components. People usually categorize Linux Kernel code into different components based on their functionalities. For example, Jones [55] decomposes it into 7 major components: system call interface, process management, memory management, virtual file system, network stack, device drivers and architecture-dependent code.

For attack surface analysis, a more detailed categorization leads to a better result. In this work, we adopt a common functionality-based but more detailed categorization method proposed by Constantine in his Linux Kernel Map [56]. This method first classifies code by functionality into 6 major components: human interface, system, processing, memory, storage and networking. The major components are then further decomposed into 37 components based on 6 different layers: user space interface, virtual, bridges, logical, device control and hardware interface.

The table of components is shown in Figure 2.7. The horizontal headers are functionalities and the vertical headers are layers. There are 37 components, and some of them cross multiple functionalities or layers.

Figure 2.7: Our Component Category for Linux Kernel

The Linux Kernel Map [56] also provides 411 keywords, such as related variable/type names or directory/file names, for the 37 components. These keywords are used in the collection of component files.

2.5.2 Collect Component Files

We are going to identify which files each component includes. The Linux Kernel source code database we choose is maintained by Bootlin [57] and can be accessed through the URL https://elixir.bootlin.com/linux/$Version/$Type/$keyword. Here the version is specified by the variable $Version; the variable $Type can be "source" to direct to a directory/file or "ident" to search for a variable/type; and the variable $keyword is the directory/file name or search target. Although different CVEs may affect different versions of the Linux Kernel, we only choose the latest version (5.1.5) as our target, because a later version usually contains more complete files.

For every component, we use the keywords belonging to it, as mentioned in the previous subsection, to crawl data from the above URL. There are three different cases:

• Keyword is a file name:

If the keyword is the name of a .c file, we just record the keyword.

• Keyword is a variable/type name:

In this case, the URL to be requested is https://elixir.bootlin.com/linux/latest/ident/$keyword, and it provides a list of files that define or reference this keyword. The file names are stored in <strong> tags and can be crawled by soup.find_all('strong').

• Keyword is a directory name:

The requested URL is https://elixir.bootlin.com/linux/latest/source/$keyword. It directs to a webpage that shows the files and subdirectories in the directory whose name is the keyword. We extract all the file names by soup.find_all('a', attrs={'class': re.compile("tree-icon icon-blob")}), since they are stored in <a> tags with the string "tree-icon icon-blob" included in the class name. For the subdirectories, we use soup.find_all('a', attrs={'class': re.compile("tree-icon icon-tree")}) to locate them and traverse recursively to obtain all the files contained at every level of subdirectories, as sketched below.
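The recursive traversal can be sketched as follows; the starting directory is an arbitrary example, and the class-name filters mirror the commands above:

    import re
    import requests
    from bs4 import BeautifulSoup

    BASE = "https://elixir.bootlin.com/linux/latest/source/"

    def collect_files(path, found):
        soup = BeautifulSoup(requests.get(BASE + path).text, 'lxml')
        # Files are <a> tags whose class contains "tree-icon icon-blob".
        for a in soup.find_all('a', attrs={'class': re.compile("tree-icon icon-blob")}):
            found.add(path + "/" + a.text.strip())
        # Subdirectories carry "tree-icon icon-tree"; recurse into each one.
        for a in soup.find_all('a', attrs={'class': re.compile("tree-icon icon-tree")}):
            collect_files(path + "/" + a.text.strip(), found)

    files = set()
    collect_files("kernel/bpf", files)  # hypothetical starting directory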

Through crawling with the 411 keywords for the 37 component categories, we collect a total of 30,724 Linux Kernel files in the latest version (5.1.5). The number of files in each component will be shown in the next section.

32 2.6 Results and Discussion

We collect 2162 Linux Kernel CVEs. The collection results, distributed by published year, are shown in Figure 2.8.

Figure 2.8: Number of Collected Linux Kernel CVEs with Years

We can see that the yearly amount of publicly reported Linux Kernel CVEs is lower than 30 before 2004, rapidly increases to 51 in 2004, and only one year later, in 2005, leaps to 133. After that, the yearly reported amount generally shows an increasing trend and reaches its maximum of 454 in 2017. One year later, however, the amount fell back to 176.

There are 8 different vulnerability types in our collected Linux Kernel CVEs: Gain privileges, Denial Of Service, Bypass a restriction or similar, Overflow, Obtain Information, Execute Code, Memory corruption, and Directory traversal. These types are categorized based on the kind of consequence, while the classification criterion we introduced in Section 1.2.1 is the root cause. Their detailed amounts and percentages are shown in Table 2.2. The tendency chart of CVEs by type and year is shown in Figure 2.9.

It should be noted that among these 2162 CVEs, some may not be categorized into any type, while some may belong to multiple types and are then counted multiple times. So the total amount in Table 2.2 is 2619 instead of 2162, but the percentages are still calculated with 2162 as the divisor.

Rank   Type                              Amount   Percentage
1      Denial Of Service                 1186     54.8%
2      Obtain Information                350      16.2%
3      Overflow                          346      16.0%
4      Gain privileges                   260      12.0%
5      Execute Code                      241      11.1%
6      Memory corruption                 124      5.7%
7      Bypass a restriction or similar   112      5.2%
8      Directory traversal               3        0.1%

Table 2.2: CVE Amount and Percentage with Types

Figure 2.9: Number of Collected Linux Kernel CVEs with Types & Years

From the tendency chart, we can see that Denial Of Service is the most common vulnerability type occurring in the Linux Kernel. It has stayed at a high level (more than 40 per year) since 2005 and has always occupied the number one position except in 2017. The amounts of CVEs in Gain privileges, Overflow and Obtain Information over the past 20 years also take high percentages (all above 12%). The rarest type is Directory traversal, which occurs only twice in 2006 and once in 2017. The trend of Execute Code is very interesting because of its explosive growth in 2017, due to a series of critical bugs found in Android products based on the Linux Kernel.

Another piece of information we collect about Linux Kernel CVEs is the CVSS score, an industry metric to rate vulnerable levels. We count every year's CVE amount at the different vulnerable levels (CVSS score 0-4: Low, 4-7: Medium, 7-10: High) and present the data in Figure 2.10.

Figure 2.10: Number of Collected Linux Kernel CVEs with Scores & Years

It can be seen that CVEs at the Low and Medium levels constitute the major part in every year except 2017. This is for the same reason as the explosive growth of CVEs of the Execute Code type: most of the Execute Code vulnerabilities reported in that year are able to cause serious damage to the system and are rated with a very high score (7-10). We should also pay attention to the growing trend of high-risk vulnerabilities in the recent few years, to prepare well for upcoming threats.

Data                              Amount
Linux Kernel Components           37
Source Files in Components        30724
Linux Kernel CVEs                 2162
Patch Commits                     1310
CVEs with Patch Commits           1189
Files Modified in Patch Commits   2492
CVEs with Modified Files          1127

Table 2.3: Amount of Collected Data Related to CVE-Component Mapping

For the work of mapping CVEs to Linux Kernel components, we summarize the amounts of related data we collect in Table 2.3.

We collect 30,724 source files that are included in the 37 Linux Kernel components. The distribution of source files across components is shown in Figure 2.11. Since a source file can be categorized into multiple components, the summation of the amounts in Figure 2.11 is much bigger than 30,724. We can see that the top three components with the maximum number of files included are synchronization (12173), device model (9922) and logical memory (8494).

There are only 1127 CVEs whose patch commits are accessible. We collect all the files modified in the patch commits of each of these CVEs, then map these 1127 CVEs to the 37 components through the procedure presented in Algorithm 2.

Figure 2.12 shows the mapping result. A CVE may also be mapped to multiple components according to the modified files in its patch commits. From the figure we find that the top three components with the maximum number of CVEs mapped to them are logical memory (802), synchronization (726) and threads (620).

We also find the top 5 most vulnerable Linux Kernel source files based on how many CVEs are relevant to them, and show them in Table 2.4.

36 Figure 2.11: Amount of Source Files in Components

Figure 2.12: Amount of CVEs in Components

Algorithm 2 CVE-Component Mapping Algorithm
Input: Dictionary of CVEs with Modified Files, DV; Dictionary of Components with Included Files, DC
Output: Dictionary of Components with mapped CVEs, DVC
1: for cve in DV.keys() do
2:   for comp in DC.keys() do
3:     if any match pairs between DV[cve] and DC[comp] then
4:       DVC[comp].add(cve)
5:     end if
6:   end for
7: end for
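In Python, Algorithm 2 reduces to a set-intersection test per (CVE, component) pair; the sample data below is made up for illustration:

    def map_cves_to_components(dv, dc):
        # dv: CVE id -> files modified by its patch commits
        # dc: component -> files included in the component
        dvc = {comp: set() for comp in dc}
        for cve, modified in dv.items():
            for comp, included in dc.items():
                if set(modified) & set(included):   # any matching file pair
                    dvc[comp].add(cve)
        return dvc

    dv = {"CVE-2017-0001": ["net/socket.c"]}        # illustrative only
    dc = {"socket access": ["net/socket.c"],
          "scheduler": ["kernel/sched/core.c"]}
    print(map_cves_to_components(dv, dc))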

File                    Relevant CVE Amount   Relevant Component Amount
arch/x86/kvm/x86.c      17                    8
net/socket.c            16                    14
fs/ext4/super.c         14                    10
kernel/bpf/verifier.c   14                    9
arch/x86/kvm/vmx.c      14                    8

Table 2.4: Top 5 Linux Kernel Source Files upon Relevant CVE Amount

We calculate the Files/CVEs ratio for each component and show the result in Figure 2.13 (setting the value to zero for components with no CVEs). A smaller value (except for zero) indicates a more vulnerable component. The component with the smallest ratio is socket splice, which is found with 42 CVEs and only 11 files included. The second most vulnerable component is logical file system, with 16 CVEs found and only 6 files included. If we only consider components that have more than 300 CVEs, the most vulnerable one is interfaces core with 412 CVEs and 1615 files, giving a ratio of 3.92. Other low-ratio components include socket access (5.65), Scheduler (5.88) and file & directories access (6.86). The component synchronization contains the maximum number of files, but its Files/CVEs ratio is 16.77. And the component logical memory, in which the maximum number of CVEs is found, has a ratio of 10.59.

Figure 2.13: Component Files/CVEs Ratio

By summarizing the data shown in Figure 2.11, Figure 2.12 and Figure 2.13, we plot the chart of the vulnerable level of each component in Figure 2.14. Here Few CVEs, Many CVEs and Massive CVEs denote a CVE amount of less than 100, between 100 and 300, and more than 300, respectively. The demarcation point between High and Low Ratio is 10.00. From that we can easily see that all the Linux Kernel components inside the user space interface, virtual, bridges and logical layers are at a high vulnerable level, while the device control and hardware interface layers are much safer.

Figure 2.14: Vulnerable Level of Linux Kernel Components

In conclusion, we searched and collected Linux Kernel CVE data such as ID, type, score, patch commits and the relevant changed files. We also collected the sets of Linux Kernel files for 37 components categorised by Constantine's Linux Kernel Map [56]. Then we built a mapping between Linux Kernel CVEs and components by matching the source files in both of them. Our mapping work identifies the vulnerable levels of these 37 components and can benefit the detection of Linux Kernel vulnerabilities in the target identification aspect. Our approach is also extensible with new CVE data and a more advanced classification strategy for the target components.

Chapter 3

Code Coverage Improvement for Coverage Guided Fuzzing

3.1 Background

3.1.1 Code Coverage

Code coverage is a crucial and widely used metric in software testing. It shows which parts of the source code are executed during the run of a particular test case, and measures what percentage of coverage this test case achieves. Test cases with higher code coverage are usually regarded as having a higher chance of discovering undetected bugs than those with lower coverage.

There are different criteria for different aspects of code coverage, mainly including statement coverage, function coverage, basic block coverage, and edge coverage.

We use the simple C function in Listing 3.1 to explain their basic concepts.

int foo(int k) {
    if (k)
        k = 0;
    return k;
}

Listing 3.1: A Sample C Function

41 • Statement Coverage:

Also known as line coverage, since usually one statement takes one line. It identifies the statements executed in a run of a test case. In the above C function, foo(1) will execute all the statements, while foo(0) cannot execute k = 0;.

• Function Coverage:

Coverage is measured at the function level. If a function is called during execution, it is marked as covered for the executed test case. For example, if the function foo in Listing 3.1 is called, it will be listed among the covered functions in function coverage.

• Basic Block Coverage:

The code of a program can be decomposed into basic blocks that contain only one entry point and one exit point, with no branches in the middle. In other words, every node in a control flow graph denotes a basic block. Basic Block Coverage measures which basic block is hit in an execution.

A function may contain multiple basic blocks. For example, in Listing 3.1, the function foo contains three basic blocks: the statement k = 0; forms basic block B, the statement return k; forms basic block C, and the rest forms basic block A. If we call foo(1), all the basic blocks A, B and C are hit. If we call foo(0), clearly basic block B will not be hit.

• Edge Coverage:

A jump from one basic block to another creates an edge. Still considering the function in Listing 3.1, foo(1) results in a path with two edges, A→B→C, while foo(0) triggers only one edge: A→C.

Edge coverage tells us which edges are triggered during an execution. It is more accurate than function coverage and already subsumes covered basic blocks. Furthermore, it can provide more detailed control flow information than basic block coverage. Compared to statement coverage, edge coverage can be obtained in grey- or black-box testing. It is also able to discover issues arising in control flow constructs by identifying state transitions.

3.1.2 Coverage Guided Fuzzing

Coverage guided fuzzing is a fuzzing approach guided by code coverage information. Here the code coverage produced by an existing test case is used as feedback to evaluate its value. A test case is regarded as valuable or interesting if it triggers new coverage. Only valuable test cases will be kept for mutation and further fuzzing.

What counts as new coverage depends on which criterion from Section 3.1.1 is used. The most broadly used criterion is edge coverage, under which a test case obtains new coverage if it triggers at least one edge that has never been observed before. For example, with the sample control flow graph shown in Figure 3.1, suppose the path A→B→C→D has already been observed. A test case that triggers the path A→B→A→B→C→D obtains new edge coverage because it triggers the unobserved edge B→A. Another test case that triggers the path A→B→D is also considered interesting, since the new edge B→D is discovered. After that, a test case that triggers A→B→A→B→D will not be kept, because all edges in its path have been triggered before.

Figure 3.1: A Sample Control Flow Graph
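A minimal C sketch of this judgement for edge coverage follows, assuming each run's triggered edges are summarized as a fixed-size byte map; the map size and indexing are illustrative choices, not any specific framework's.

#include <stdio.h>

#define EDGE_MAP_SIZE 65536

/* Edges seen across all runs so far. */
static unsigned char global_edges[EDGE_MAP_SIZE];

/* A test case is interesting iff its run set at least one edge entry
 * that no earlier run has set. */
static int is_interesting(const unsigned char *run_edges) {
    int interesting = 0;
    for (int i = 0; i < EDGE_MAP_SIZE; i++) {
        if (run_edges[i] && !global_edges[i]) {
            global_edges[i] = 1;   /* remember the newly seen edge */
            interesting = 1;
        }
    }
    return interesting;
}

int main(void) {
    static unsigned char run[EDGE_MAP_SIZE];
    run[42] = 1;                            /* this run triggers edge 42 */
    printf("%d\n", is_interesting(run));    /* 1: edge 42 is new         */
    printf("%d\n", is_interesting(run));    /* 0: nothing new this time  */
    return 0;
}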

In recent years, coverage guided fuzzing has demonstrated high efficiency in the practice of vulnerability detection. Many fuzzing frameworks [28, 32, 58] inspired by this guided approach have been developed, and several works [15, 33, 59] have been proposed to improve their performance. Here we discuss some of these works:

American Fuzzy Lop (AFL) [28] is one of the most popular coverage guided fuzzing frameworks. It regards transitions between basic blocks as coverage information and obtains them by applying compile-time instrumentation. New test cases are generated via mutation of existing test cases in an input queue. A test case that triggers a new transition is marked as interesting and will be saved into the queue. Therefore, AFL is able to gradually increase the total coverage of the fuzzing target, execute more code regions and thus detect more vulnerabilities.
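AFL's technical documentation describes the per-basic-block shim injected by its instrumentation in roughly the following form. This C sketch simplifies it: the names are illustrative, and the real bitmap lives in memory shared with the fuzzer process rather than in a plain static array.

#define MAP_SIZE 65536

/* Bitmap of edge hit counts; one byte per hashed edge. */
static unsigned char afl_area[MAP_SIZE];
static unsigned int prev_loc;

/* Called at every basic block; cur_loc is a random id assigned to the
 * block at compile time. XORing the previous and current block ids
 * yields an index identifying the edge between them. */
static void afl_maybe_log(unsigned int cur_loc) {
    afl_area[(cur_loc ^ prev_loc) % MAP_SIZE]++;
    prev_loc = cur_loc >> 1;   /* shift so that A->B and B->A differ */
}

int main(void) {
    afl_maybe_log(0x1234);     /* block A */
    afl_maybe_log(0x5678);     /* block B: records edge A->B */
    return 0;
}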

Steelix [59] provides a boosting technique for AFL in the aspect of exploring paths protected by comparisons against magic bytes. Built on AFL, it applies extra static analysis and binary instrumentation to fetch comparison progress information, then uses this information to identify the location of the magic bytes as well as how to reach the correct values. With that, it helps the fuzzer penetrate more paths and achieve higher coverage.

FairFuzz [15] is a tool also based on AFL. It counts the number of hits for each branch discovered during fuzzing, then marks branches whose hit counts fall below a pre-defined rarity cutoff as rare branches. Instead of mutating all test cases in the input queue, FairFuzz only mutates those hitting at least one rare branch, and its mutation strategy is modified to ensure that new test cases are still able to hit a given rare branch. In a word, it optimizes the distribution of fuzzing energy by spending more energy on code regions that are rarely touched.
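A sketch of the rare-branch test under a simplified, fixed cutoff follows; FairFuzz actually adapts its rarity cutoff dynamically, and the names and values here are illustrative.

#include <stdio.h>

#define NUM_BRANCHES 65536
#define RARITY_CUTOFF 32UL   /* hypothetical pre-defined cutoff */

/* Total hits per branch, accumulated over the whole fuzzing session. */
static unsigned long branch_hits[NUM_BRANCHES];

/* Only queue entries that hit at least one rare branch are selected
 * for mutation. */
static int hits_rare_branch(const unsigned char *run_branches) {
    for (int i = 0; i < NUM_BRANCHES; i++)
        if (run_branches[i] && branch_hits[i] < RARITY_CUTOFF)
            return 1;
    return 0;
}

int main(void) {
    static unsigned char run[NUM_BRANCHES];
    run[7] = 1;                               /* run hits branch 7      */
    branch_hits[7] = 3;                       /* only 3 hits so far     */
    printf("%d\n", hits_rare_branch(run));    /* 1: branch 7 is rare    */
    return 0;
}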

3.2 Motivation

Coverage guided fuzzing has recently demonstrated outstanding performance in software vulnerability detection. But we notice that the most popular coverage guided fuzzing frameworks, such as AFL, are all based on the mutation approach. One reason is that, when an interesting sample is given as a seed, test cases generated by mutation are closely related to the seed and can inherit its interesting features more easily. Another reason is that the basic mutation-based approach requires less knowledge about the structure of test cases, since mutation strategies such as bit

or byte flipping can be applied randomly, thus making the fuzzer generic enough to suit different targets.

However, a test case that cannot bypass the syntactic and semantic checks performed by the target is an invalid input and is regarded as useless for triggering real bugs. Mutation-based fuzzing frameworks usually generate many invalid test cases and waste much time on them, which makes these fuzzers inefficient when the target requires highly-structured inputs.

By contrast, generative fuzzing frameworks perform better when facing the challenge of highly-structured inputs, because generative fuzzing is able to generate test cases according to given syntax features and to ensure that these test cases are syntactically valid. Some widely used generative fuzzers additionally utilize manually summarized and specified semantic rules, as well as code emulation, to improve the probability of semantic validity for a certain fuzzing target, such as Domato [25] for the DOM structure and jsfunfuzz [60] for JavaScript engines.

Driven by the above, in this work we explore the possibility of applying a coverage guided fuzzing strategy to a generative fuzzing framework, to obtain both benefits when fuzzing applications with highly-structured inputs. We choose the HTML DOM structure in the Chromium browser [61] as the fuzzing target, and Domato [25] as our test case generator to generate .html samples. We obtain edge coverage during execution through the code coverage instrumentation provided by SanitizerCoverage [62]. We use rarely-hit edges as the metric to judge whether a test case is interesting and implement a coverage-guided approach. We hope to observe a significant growth in edge coverage for the test cases generated from interesting samples.
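For reference, a minimal standalone example of SanitizerCoverage's trace-pc-guard interface is sketched below: compiled with clang -fsanitize-coverage=trace-pc-guard, the runtime invokes these two user-defined hooks. Chromium's actual coverage build is more involved, and the printing here merely stands in for the hit-count recording we perform.

#include <stdint.h>
#include <stdio.h>

/* Called once per instrumented module at startup: assign every
 * edge guard a distinct, nonzero id. */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint32_t n;
    if (start == stop || *start)
        return;                 /* already initialized */
    for (uint32_t *g = start; g < stop; g++)
        *g = ++n;
}

/* Called at every instrumented edge; *guard is the id set above. */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard)
        return;
    fprintf(stderr, "edge %u hit\n", *guard);  /* e.g. bump a counter */
}

int main(void) {
    return 0;
}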

3.3 Approach

3.3.1 Overview

We would like to present our approach, shown in Figure 3.2, by comparing it with the widely-used classical mutation-based coverage guided fuzzing approach shown in Figure 3.3.

The classical approach in Figure 3.3 maintains an input queue. The fuzzer iterates over the queue and applies pre-defined mutation strategies to the input currently being processed to generate more samples. It executes these mutated samples on the fuzzing target, and crashes are detected by an attached monitor. After execution, the code coverage is computed through instrumentation applied to the target at compile time. The fuzzer then judges the coverage result. If the result triggers any new behavior, such as hitting a new basic block or branch, depending on which metric is used, the sample is regarded as an "interesting sample", and after some refinement operations it is appended to the input queue. As the queue grows, more interesting samples are used as bases for mutation and the code coverage increases gradually, resulting in a higher chance of finding vulnerabilities.

Our approach in Figure 3.2 differs from the classical one in several aspects. Firstly, since it is generation-based, the queue we maintain contains templates for generating samples instead of the samples themselves; in each iteration, the Domato generator takes the current template and generates a massive number of .html samples. Secondly, instead of using only newly discovered edges as the metric of judgement, we regard samples that hit a rarely-hit edge as interesting, because a rarely-hit edge indicates a code region that is rarely explored, and thus a higher probability of triggering new edges starting from that region. To identify rare edges, we build a database that records all discovered edges together with their hit counts. We then find the samples that hit at least one rare edge, and refine, minimize and format them into templates while preserving the condition that the rare edge gets hit. Finally, we append these templates to the queue for further generation.
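A sketch of this rare-edge selection in C follows, under the simplifying assumption that each sample's coverage is a list of (edge id, hit count) pairs. After a generation round, every sample's hits are first merged into the global database, and only then are the samples re-examined, so that rarity reflects the whole round; the fixed table size and threshold are illustrative.

#include <stddef.h>
#include <stdio.h>

#define MAX_EDGES (1u << 20)
#define RARE_THRESHOLD 8ULL   /* hypothetical rarity threshold */

/* Total hits per edge, accumulated across all executed samples. */
static unsigned long long edge_db[MAX_EDGES];

struct edge_hit { unsigned int id; unsigned int count; };

static void merge_sample(const struct edge_hit *hits, size_t n) {
    for (size_t i = 0; i < n; i++)
        edge_db[hits[i].id % MAX_EDGES] += hits[i].count;
}

/* Returns 1 if the sample covers at least one rarely-hit edge and
 * should be refined into a template for the next fuzzing round. */
static int is_template_candidate(const struct edge_hit *hits, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (edge_db[hits[i].id % MAX_EDGES] < RARE_THRESHOLD)
            return 1;
    return 0;
}

int main(void) {
    /* Hypothetical coverage of one generated sample. */
    struct edge_hit sample[] = { { 7, 3 }, { 19, 1 } };
    merge_sample(sample, 2);
    printf("%d\n", is_template_candidate(sample, 2)); /* 1: both rare */
    return 0;
}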

3.3.2 Fuzzing Target

Our fuzzing target in this work is the DOM structure in the Google Chromium browser on the Linux platform.

Figure 3.2: The Overview of Our Approach

Figure 3.3: Classical Mutation-based Coverage Guided Fuzzing Approach

DOM stands for "Document Object Model", an application programming interface that treats an HTML or XML document as a logical structure, commonly a logical tree. It defines how a document can be accessed and manipulated, allowing these documents to be used in object-oriented programs. Over the past decades, DOM has been standardized and is supported by the most widely used browsers. Therefore, DOM has become one of the most common attack vectors for browser fuzzing.

Google Chromium is an open-source web browser and many vendors have utilized

its source code as the basis to develop their own browsers, such as Maxthon and the 360 secure browser. It runs well on the Linux platform and can be instrumented at compile time for crash monitoring and coverage computation purposes.

3.3.3 Test Case Generator

Our test case generator is based on Domato [25], a DOM fuzzer that provides a grammar-aware generative approach to generating .html files, which browsers can parse into a DOM tree. Its workflow is briefly shown in Figure 3.4.

Figure 3.4: Workflow of Samples Generation

The generator consists of four components: a .html template, a series of grammar files, a grammar script and a main generator script.

• Template:

The template for sample generation is a .html file whose content can be programmatically accessed by HTML DOM methods. Listing 3.2 shows its structure.
