“Web Protocol Fuzzing of the TLS/SSL Protocol with Focus on the OpenSSL Library”

by Fouad Nouri Khalaf

School of Information Technology and Electrical Engineering, University of Queensland.

Submitted for the degree of Bachelor of Engineering (Honours) in the division of Software Engineering.

9th November 2020


Fouad Nouri Khalaf [email protected]

9th November 2020

Prof Amin Abbosh
Acting Head of School
School of Information Technology and Electrical Engineering
The University of Queensland
St Lucia QLD 4072

Dear Professor Abbosh,

In accordance with the requirements of the degree of Bachelor of Engineering (Honours) in the School of Information Technology and Electrical Engineering, I submit the following thesis entitled

“Web Protocol Fuzzing of the TLS/SSL Protocol with Focus on the OpenSSL Library”

The thesis was performed under the supervision of senior lecturer Dr Guangdong Bai. I declare that the work submitted in the thesis is my own, except as acknowledged in the text and footnotes, and that it has not previously been submitted for a degree at the University of Queensland or any other institution.

Yours sincerely

Fouad Nouri Khalaf


Abstract

Fuzzing is a testing technique that feeds randomly generated data into the functions under test in order to detect unexpected behaviour in a program. Properly fuzzing an application can be challenging, as fuzz targets must be efficient and achieve high code coverage to be valuable. Inadequate fuzzing is an issue faced by many open-source web protocol libraries, OpenSSL in particular: the OpenSSL library contains numerous fuzzing targets that cover only a fraction of the codebase. Researching and investigating how open-source web protocol implementations are fuzzed can therefore help produce solutions that improve the fuzzing capabilities of these implementations. Many causes of this lack of fuzzing were discovered, predominantly insufficient funding and limited awareness of the importance of fuzzing. The results gathered from investigating web protocol fuzzers showed that a significant number of functions are never fuzzed, and that the majority of currently fuzzed targets are inefficient and could be improved. After improving some of the critical fuzzing targets currently used by web protocol implementations, and after writing new targets that cover previously untested functions, the overall code coverage and testing efficiency of these libraries increased drastically. In future work, graphics processing units and dynamic testing could also be utilised to improve fuzzing performance and capabilities while reducing costs.

Keywords: Fuzzing, Web Protocols, Open-source, OSS-Fuzz, libFuzzer, OpenSSL, SSL, TLS, LibreSSL, BoringSSL.


Contents

Abstract
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Topic Definition
1.2 Goals
1.3 Relevance
Chapter 2 Background and Literature Review
2.1 TLS/SSL
2.2 OpenSSL
2.3 Heartbleed
2.3.1 After Heartbleed
2.4 Fuzzing – A Testing Mechanism
2.4.1 Black Box Fuzzing
2.4.2 White Box Fuzzing
2.4.3 Problems with Fuzzing OpenSSL
2.5 Fuzzing Engines
2.5.1 OSS-Fuzz
2.5.2 AFL Fuzzer
2.5.3 libFuzzer
Chapter 3 Implementations and Experiments
3.1 Justifying OpenSSL
3.2 Investigating Previous Implementations
3.2.1 OSS-Fuzz Project Setup
3.2.2 OpenSSL
3.2.2.1 Target: client.c
3.2.2.2 Target: server.c
3.2.3 LibreSSL
3.2.4 BoringSSL
3.3 Implementing New Solutions
3.3.1 Improving already existing targets
3.3.2 Creating new targets
3.3.2.1 Unicode to ASCII
3.3.2.2 UTF-8 to Unicode
Chapter 4 Results and Discussions
4.1 Discovered bugs and vulnerabilities
4.2 Code coverage and fuzzing performance
Chapter 5 Conclusions
5.1 Summary and conclusions
5.2 Possible future work
Appendices
A. Data analyser and plotter for libFuzzer results – code coverage
B. Unicode to ASCII fuzzer
C. UTF-8 to Unicode fuzzer
D. Example CSV data produced to analyse results
Bibliography


List of Figures

Figure 1: Top-level view of how Stunnel operates (server-side) [3]
Figure 2: This is what American Fuzzy Lop looks like once it has executed [26]
Figure 3: An example of a fuzz target function in C [23]
Figure 4: Code coverage of the 'client.c' target
Figure 5: Snippet of the target's logs while it is being fuzzed
Figure 6: Code coverage of the 'server' fuzzing target
Figure 7: Snippet of the do-while loop operation in the server target for OSSL
Figure 8: Graph representing the code coverage with and without READ_EARLY_DATA
Figure 9: Graph representing the number of executions per second with and without READ_EARLY_DATA
Figure 10: Code coverage for the server target with and without all ciphers included
Figure 11: Comparing the server target's execution before and after the cipher modifications were performed
Figure 12: Code coverage for the Unicode to ASCII fuzzer
Figure 13: Code coverage for the UTF-8 to Unicode target plotted over the number of executions performed
Figure 14: Execution rate for the UTF-8 to Unicode fuzzing target


List of Tables

Table 1: Example CSV data used to collect results, where x usually describes the number of executions so far and y describes the type of data being collected, such as code coverage


Chapter 1 Introduction

1.1 Topic Definition

Recently, web information systems have grown in popularity, with many desktop and mobile applications moving online as web applications, such as Microsoft Word, Discord, and Instagram. In addition, many information-sensitive physical activities have moved online too, such as banking, grocery shopping, managing taxes, and accessing social security services such as Centrelink and Medicare (via myGov). Deploying such services onto a server or a cloud platform reliably and securely poses many challenges. Designing and developing software can be very complicated, and many issues may arise, including bugs, security vulnerabilities, and unexpected behaviours. Security vulnerabilities exist in the majority of software in use today, and plenty of them have not yet been discovered or patched [4]. These vulnerabilities range in severity depending on the impact they have on the end-user, the provider, and any other stakeholders in the software or system.

Many web information systems use OpenSSL as their security library. OpenSSL is a free, full-featured security library which provides an open-source implementation of the TLS and SSL protocols, including the early TLS 1.0 protocol [3]. OpenSSL is compatible with all modern Unix-based and Windows devices, making it the most widely used HTTPS SSL/TLS security library to date [2][3]. However, just like any other software library, OpenSSL and its TLS/SSL implementations have had many vulnerabilities discovered in the past, and possibly more that are yet to be discovered. For that reason, Google developed a fuzzing tool named OSS-Fuzz: an open-source, community-backed fuzzing tool that continuously checks (fuzzes) for vulnerabilities in OpenSSL and many other SSL projects using some of Google's powerful supercomputers [18]. Fuzzing itself (including OSS-Fuzz) will be discussed in more detail later in this thesis.


The purpose of this project is to investigate the current open-source implementations of cryptographic libraries, mainly OpenSSL; to examine their security and current testing methods; to suggest improvements to those methods; and to build fuzzing tools that help detect future vulnerabilities in libraries such as OpenSSL. The testing, research, development and financial support behind these projects will also be investigated, along with the impacts they have had on the projects' security and reliability. Some of the significant vulnerabilities detected in these libraries will be examined, focusing on their origins and how they were detected. Fuzzing, an automated testing technique, will be heavily investigated and experimented with, learning as much as possible about popular fuzzing techniques so that a better tool, or an improvement to a current one, may be built to detect new and previously undetected vulnerabilities.

1.2 Goals

This thesis project will focus on researching the security of OpenSSL, compatible fuzzers, and previously and currently used fuzzing methods, and on making recommendations for further improving the security of OpenSSL and other open-source security libraries. The purposes and goals of this thesis are to:

- Deeply examine and understand the workflow of the TLS/SSL protocols,
- Investigate different fuzzing engines, techniques and methods,
- Research how effective the fuzzing engines and their targets have been in detecting bugs and vulnerabilities,
- Investigate what needs to be done in order to improve fuzzing for these libraries,
- Experiment with current solutions and improve them,
- Build new fuzzing targets which statically and dynamically test the security of security libraries,
- Analyse the results and discuss what needs to be done to improve currently available solutions further, and
- Produce several well-justified methods to improve currently published and new work.


1.3 Relevance

TLS/SSL is the most widely used web security protocol and is considered by some as the standard for securing data transmitted to and from web information systems. As mentioned earlier, OpenSSL is the most common and best-supported TLS/SSL library. OpenSSL is also considered one of the most secure open-source TLS/SSL libraries, due to the continuous research being done on OpenSSL, its implementation of the TLS/SSL protocol, and the vulnerability detection tools used on it, such as scanners, fuzzers, and more [3].

Fuzzing is a technique used by security researchers and architects to detect vulnerabilities in software by providing it with as many inputs as possible and catching any unexpected behaviour that could potentially be abused by a malicious party [21]. Fuzzing has existed for many years, but it has recently grown in popularity due to its ability to detect and catch vulnerabilities in a wide range of software, including OpenSSL. Plenty of computing power has been dedicated to fuzzing OpenSSL since 2016, largely in response to a vulnerability called Heartbleed, which will be discussed in more detail later in this thesis.

Since OpenSSL and its security protocol implementations are the most commonly used worldwide, and since fuzzing is growing in popularity and has helped detect many vulnerabilities, it is safe to say that investigating current fuzzing techniques for OpenSSL is highly relevant to recommending improvements to current fuzzing methods, which could potentially detect more vulnerabilities than ever before.


Chapter 2 Background and Literature Review

2.1 TLS/SSL

Since more information-sensitive services have moved online, as mentioned earlier, web security has significantly grown in importance. More websites than ever have moved from the unencrypted Hypertext Transfer Protocol (HTTP) to Hypertext Transfer Protocol Secure (HTTPS) [4]. According to the Oxford Dictionary of the Internet (3rd edition), HTTPS is a web protocol which enables secure transactions between a client and a server [2]. This has allowed many services to operate securely without compromising the security and integrity of their data and their users' data.

HTTPS uses the Secure Socket Layer (SSL) and Transport Layer Security (TLS) protocols, commonly referred to as TLS/SSL, to encrypt the data transmitted by both parties and secure the connection. TLS/SSL is considered the standard for web security nowadays, with nearly every website using some version of TLS/SSL to encrypt the data transmitted between itself and its users. The protocol relies heavily on various public-key cryptographic algorithms to provide authenticity and security, among them the Diffie-Hellman, Digital Signature, and RSA algorithms [3].

TLS was built on an earlier, now-outdated protocol, the Secure Socket Layer (SSL). Netscape first developed SSL in 1994 as the first version (v1.0) of the Secure Socket Layer, but it was never released due to severe security vulnerabilities. The protocol was later updated, the known vulnerabilities were removed, and it was released as SSL version 2.0 in February 1995 [4].

However, not long after its release, researchers quickly discovered that version 2.0 was riddled with new security vulnerabilities. Netscape then released the third iteration of the protocol as SSL 3.0 in November 1996 but, similar to its two predecessors, it also had several security vulnerabilities [3][4]. SSL 3.0 was eventually deprecated in 2015 in favour of the Transport Layer Security (TLS) protocol.

All versions of SSL had significant security flaws, which rendered them insecure and prone to various attacks. Since version 1.0 was never released, there is barely any information available on the vulnerabilities it may have had. SSL 2.0 is vulnerable to length extension attacks, as it uses a weak MAC construction based on the MD5 hashing function [4]. In a length extension attack, a malicious party takes the hash of a first message and its length, appends a second message of the attacker's own creation, and computes a valid hash for the combined message to perform the attack [4].

Moreover, version 2.0 used identical keys for message authentication and encryption, which would compromise the privacy of the sender if the keys were compromised [4]. In some cases, the encryption keys could be based on only 40 bits of data when they could have been much longer; longer keys would have made it much more difficult for an attacker to crack them. This was fixed in the third iteration of the protocol, where different and longer keys were used to reduce the chances of an attack occurring and to limit the damage done should a successful attack occur.

One of the more severe vulnerabilities in SSL 2.0 was the possibility of an undetected man-in-the-middle attack. Such an attack could occur due to the lack of security in the handshake process, allowing an attacker to impersonate someone else and potentially obtain the victim's private information [3][4].

SSL 3.0 was then released, patching many of the vulnerabilities detected in its predecessor. However, it still had many security flaws which render it insecure and undesirable by today's standards. Version 3.0 supported the SHA-1 hashing algorithm for its certificate authentication, which increased its security at the time. Moreover, the master key creation process was based on the MD5 and SHA-1 hashing algorithms. At the time, that was considered 'good enough' because of the SHA-1 option, but SHA-1 was later discovered not to be collision-resistant, rendering it insecure [4].


In 2014, another man-in-the-middle attack was discovered by a Google security team: the Padding Oracle On Downgraded Legacy Encryption, or POODLE for short. An attacker exploits this vulnerability by requesting that the client connect to the site using an older version of SSL; by forcing this downgrade, the attacker can exploit old vulnerabilities to read, manipulate, or steal user data. The attacker would first run a JavaScript agent on a site, sending a request to the client's (victim's) browser asking it to send a cookie to the site [4][5]. The attacker would then intercept the SSL records sent to the browser by the website, modify them, and resend them, with a 1-in-256 chance that the website would accept a modified record [5][6]. If the website accepts a modified record, one byte of cookie data is revealed to the attacker; if the website rejects the record, the attacker sends another until one is accepted [6].

Not long after the vulnerability was detected and documented, browsers such as Firefox and Chrome disabled fallback to SSL 3.0 to prevent attackers from exploiting it [5][6]. They later disabled SSL 3.0 by default to encourage websites to use at least the more recent protocol at the time, TLS 1.0 [6]. Most other prominent browsers, such as Safari and Opera, disabled SSL 3.0 by default a few months after Firefox and Google Chrome, in favour of TLS 1.0 as the minimum required protocol.

Despite all the efforts made to eliminate POODLE attacks, a variant of the attack was later detected that affected the TLS 1.0, 1.1 and 1.2 protocols. Because some TLS implementations do not check the padding structure after decryption, they are vulnerable to this 'new POODLE' attack even though they do not use SSL 3.0 [5][6].

TLS, or Transport Layer Security, is the successor protocol to SSL. Released in 1999, it fixed some of the vulnerabilities SSL 3.0 had [6]. Despite their different names, TLS 1.0 and SSL 3.0 were very similar, with only minor differences. In 2006, version 1.1 of TLS was released, patching many vulnerabilities, improving the authentication and encryption procedures, and implementing protection against CBC (cipher block chaining) attacks by:

- changing the implicit Initialisation Vector to an explicit one, and
- changing the handling of padding errors to use the "bad_record_mac" alert rather than the "decryption_failed" one [8].

TLS 1.2 was announced in August 2008 as the successor to TLS 1.1. The new version was packed with plenty of updates and improvements that enhanced the protocol's security and its resistance to several attacks which affected previous versions. One of the significant improvements is the replacement of the MD5/SHA-1 combination with P_SHA256 in the pseudorandom function, a much more secure construction [7]. Moreover, the protocol added support for authenticated encryption with additional data (AEAD) modes, tightened various requirements, tightened the checking of the "EncryptedPreMasterSecret" version, added support for the HMAC-SHA256 cipher suites, and set TLS_RSA_WITH_AES_128_CBC_SHA as the mandatory cipher suite to implement.
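The P_SHA256 function mentioned above is the P_hash data-expansion construction from RFC 5246, instantiated with HMAC-SHA256. The sketch below illustrates only the chaining structure; a toy keyed-mixing function stands in for HMAC-SHA256 so the example stays self-contained (a real implementation would call a vetted HMAC, and the function and buffer names here are illustrative, not OpenSSL's).

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define HASH_LEN 32  /* HMAC-SHA256 output size */

/* Toy keyed-mixing function standing in for HMAC-SHA256.
 * NOT cryptographically secure; it only illustrates the data flow. */
static void toy_hmac(const uint8_t *key, size_t key_len,
                     const uint8_t *data, size_t data_len,
                     uint8_t out[HASH_LEN]) {
    uint64_t h = 0x9e3779b97f4a7c15ULL;
    for (size_t i = 0; i < key_len; i++)  h = (h ^ key[i])  * 0x100000001b3ULL;
    for (size_t i = 0; i < data_len; i++) h = (h ^ data[i]) * 0x100000001b3ULL;
    for (int i = 0; i < HASH_LEN; i++) {
        h ^= h >> 33;
        h *= 0xff51afd7ed558ccdULL;
        out[i] = (uint8_t)(h >> (8 * (i % 8)));
    }
}

/* P_hash from RFC 5246, section 5:
 *   A(0) = seed, A(i) = HMAC(secret, A(i-1))
 *   P_hash(secret, seed) = HMAC(secret, A(1) + seed) +
 *                          HMAC(secret, A(2) + seed) + ...
 * Output is truncated to out_len bytes. */
void tls12_p_hash(const uint8_t *secret, size_t secret_len,
                  const uint8_t *seed, size_t seed_len,
                  uint8_t *out, size_t out_len) {
    uint8_t a[HASH_LEN], block[HASH_LEN], buf[HASH_LEN + 256];
    if (seed_len > 256)
        return;                       /* demo limit on seed size */
    toy_hmac(secret, secret_len, seed, seed_len, a);   /* A(1) */
    while (out_len > 0) {
        memcpy(buf, a, HASH_LEN);                      /* A(i) + seed */
        memcpy(buf + HASH_LEN, seed, seed_len);
        toy_hmac(secret, secret_len, buf, HASH_LEN + seed_len, block);
        size_t n = out_len < HASH_LEN ? out_len : HASH_LEN;
        memcpy(out, block, n);                         /* emit next block */
        out += n;
        out_len -= n;
        toy_hmac(secret, secret_len, a, HASH_LEN, a);  /* A(i+1) */
    }
}
```

The chaining through A(i) is what lets the PRF stretch one secret into an arbitrary amount of key material while every output block still depends on the secret.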

Despite all the security updates in TLS 1.2, several vulnerabilities remained that could be exploited in some situations, such as the POODLE variant mentioned above. Moreover, some of the algorithms used by TLS 1.2, such as MD5/SHA-1, were considered insecure by many yet were still used in some circumstances. For that reason, TLS 1.3 was released as a standard in August 2018 in RFC 8446 as the successor to TLS 1.2, and it is currently the most up-to-date version of the TLS protocol [9]. It brought a large number of minor and major updates. Some of the most critical updates, which significantly improved the protocol's security, were:

- the removal of all legacy symmetric encryption algorithms, except for ones whose removal would cause incompatibility issues,
- the removal of the static Diffie-Hellman and RSA cipher suites,
- the encryption of all handshake messages after the ServerHello message, and
- the removal of compression, and the change of RSA padding to the RSA Probabilistic Signature Scheme (RSA-PSS) [9].


These changes were necessary to keep the protocol's security up-to-date and as strong as possible, helping to prevent malicious parties from harming legitimate users.

2.2 OpenSSL

OpenSSL, which was first released in 1998, is a "full-featured, commercial-grade toolkit" for the Secure Socket Layer and Transport Layer Security protocols [3]. It provides developers with an open-source implementation of the TLS and SSL protocols to secure communications between two or more unknown (or initially untrusted) parties. The library may also be used for general-purpose encryption needs. It is licensed under an "Apache-style License", which allows users to use the library for both commercial and non-commercial purposes [3].

OpenSSL's core library implementation of the web security protocols is written in C, with some performance-critical parts written in assembly [3][10]. The library includes implementations of every version of SSL and TLS, in addition to some of the most popular algorithms for hashing, message digests, symmetric and public-key cryptography, and pseudorandom number generation [10].

OpenSSL is the main focus of this thesis project, as it is the most up-to-date and most popular TLS/SSL library. It is used by many developers to secure their applications and software from eavesdroppers and malicious users, and to protect their own and their users' private data. However, some developers use OpenSSL to secure applications which they have not developed themselves. OpenSSL is compatible with most operating systems, including all Unix-based systems and Windows, and OpenSSL's API allows developers to secure applications without modifying the source code by much (if at all) [3][10]. For instance, MySQL is one of the applications which support OpenSSL: users can configure the package by enabling the two options --with-openssl and --with-vio, which allow the package to build against OpenSSL.


In addition to securing third-party applications (such as MySQL), developers can enable OpenSSL server-side. Stunnel, which was created by Michał Trojnara, is a proxy aimed at providing applications (clients and servers) with TLS security without modifying their source code [11]. It does this by "standing" in the middle of a connection between a client and the server: the data between Stunnel and the client is encrypted, and Stunnel decrypts it and passes it on to the server in a format the server understands (usually unencrypted) [3][11]. Stunnel is not meant to work alone; it was created to work with OpenSSL, as it uses OpenSSL's cryptographic implementations for its encryption and decryption needs [3][11]. The diagram below illustrates how Stunnel operates server-side.

Figure 1: Top-level view of how Stunnel operates (server-side) [3].

To establish a successful SSL connection, security and software engineers (users) would first need to install a valid certificate and a valid private key. After performing those initial steps, a user would download the OpenSSL library supported by the server's OS, install it on the server (an IMAP mail server, for instance), and enable a second, encrypted port in addition to the default unencrypted one, which clients can connect to if they would like to communicate securely with the server. The user then configures Stunnel to forward all connections from the secure port to the insecure port; the Stunnel proxy decrypts incoming data for the server and encrypts the server's responses back to the client [3].
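A minimal server-side Stunnel configuration for this IMAP example might look like the following sketch; the ports match the conventional IMAP (143) and IMAPS (993) assignments, but the file paths and service name are illustrative assumptions, not values from this thesis:

```ini
; Sketch of /etc/stunnel/stunnel.conf for TLS-terminating an IMAP server.
; Certificate and key installed beforehand, as described above.
cert = /etc/stunnel/mail-cert.pem
key  = /etc/stunnel/mail-key.pem

[imaps]
accept  = 993            ; encrypted port that clients connect to
connect = 127.0.0.1:143  ; unencrypted port of the real IMAP daemon
```

Pointing `connect` at another host instead of 127.0.0.1 is what enables the dedicated-encryption-machine setup described next.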

Moreover, since Stunnel can tunnel to different ports and addresses, developers can point it at a port on a different machine (one dedicated to encryption/decryption), which could cut costs and improve performance on the main server. This is also useful when a security researcher or developer wants to secure a particular server or service but is unable to modify its source code due to privacy, legal or commercial constraints. An example of such a situation is a commercial company that would not feel comfortable revealing its sensitive source code to a third party (researchers or developers, for instance).

In addition to running on servers, Stunnel can also run as a client-side proxy. Clients that do not support SSL can connect through it to a server which supports the protocol, giving them a secure, encrypted connection. However, clients need to authenticate themselves to prevent a man-in-the-middle attack. The process of authenticating a client is tedious and not as simple as a server installation, but it is highly recommended, if not necessary. A client would first need to validate certificates, preferably at level 3 (maximum validation) [3]. After that, the client can communicate with a server using Stunnel through a command-line interface, although on some machines the user might need to give Stunnel root privileges in order to send and receive requests correctly [11].

2.3 Heartbleed

On the 1st of April 2014, a major vulnerability was discovered in the OpenSSL library by Neel Mehta, a member of Google's security team. The vulnerable code had initially been pushed into the library in 2012 and went undetected for around two years [14]. The vulnerability was patched seven days after it was officially discovered. When exploited, the vulnerability causes memory leaks of up to 64 KB, which may reveal some of the data transmitted between a server and a client and vice versa [14][15]. Examples of such data are usernames, passwords, addresses, payment information and, more importantly, primary and secondary key material [16].

Heartbleed was accidentally introduced into the OpenSSL library with the Heartbeat Extension for TLS and Datagram TLS (DTLS), which gave programs the ability to test and keep secure connections alive without needing to re-establish the connection every time [16][17]. The Heartbeat Extension was pulled into the OpenSSL repository in 2012 and was enabled by default in every version of the library. Once Mehta detected the bug, he reported it privately to OpenSSL. At around the same time, another security team, Codenomicon, discovered the same bug and immediately launched the website "www.heartbleed.com" to raise public awareness of this potentially catastrophic bug [16]. Unfortunately, there have been various 'successful' attacks aimed at impersonating high-profile persons and stealing sensitive, private information from many users, especially those registered on government websites [14][15][16].

The Google security team started working on a fix for the bug as soon as they discovered it. They patched it by adding bounds checks and silently discarding any Heartbeat request whose claimed payload length exceeds the data the record actually carries [9]. Google applied the patch as soon as the necessary tests were completed. Despite the swift release of a patched version of OpenSSL, many websites (over 800,000) remained vulnerable across the web for a few months. As of the second half of 2019, over 75,000 websites were still vulnerable to Heartbleed; that figure excludes hardware networking devices such as routers and managed switches [17].
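The essence of that missing bounds check can be illustrated with a simplified sketch. This is not OpenSSL's actual code: the record layout is abbreviated to a 1-byte type and a 2-byte payload length, padding is ignored, and the function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Simplified heartbeat record: [1-byte type][2-byte payload length][payload].
 * Echoes the payload back into `out`, as the Heartbeat extension does.
 * Returns the number of bytes echoed, or -1 if the record is malformed. */
int handle_heartbeat(const uint8_t *rec, size_t rec_len,
                     uint8_t *out, size_t out_cap) {
    if (rec_len < 3)
        return -1;
    size_t payload_len = ((size_t)rec[1] << 8) | rec[2];

    /* The post-Heartbleed bounds check: discard any request whose claimed
     * payload length exceeds what the record actually carries. Without this
     * check, the memcpy below would read past the end of `rec`, echoing up
     * to 64 KB of adjacent process memory back to the peer. */
    if (payload_len > rec_len - 3)
        return -1;

    if (payload_len > out_cap)
        return -1;
    memcpy(out, rec + 3, payload_len);  /* echo the payload back */
    return (int)payload_len;
}
```

The vulnerable code trusted `payload_len` as sent by the peer; the fix is the single comparison against the real record length.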

Heartbleed is not a bug in the TLS/SSL protocol; it is a bug in OpenSSL's implementation of the protocol. OpenSSL is used in the majority of web applications and software which require a secure connection to the internet. Such software includes computer drivers, default smartphone web browsers, printers, smart televisions, gaming consoles, smart speakers, and many more. In addition, many routers, modems, network switches and servers use OpenSSL, and a fair share of vendors do not provide software updates for some of their hardware (usually the cheaper models) [14][17]. A large number of these devices were made before 2014 and do not support software and firmware updates. For that reason, many devices running today are still vulnerable to the Heartbleed bug.

One obvious way to solve the issue of outdated and unsupported software would be to purchase a newer device whose software is supported and continuously updated. However, that is not ideal: many consumers would prefer not to spend extra money while their device is operating well, and would instead refrain from purchasing devices from a company whose software is always vulnerable and outdated. A better solution would be for companies to continuously support and fund the free and open-source libraries they use in their software. With continuous support, developers would be able to dedicate more time to improving the software and looking for vulnerabilities in the code [13].

2.3.1 After Heartbleed

Heartbleed was a big wake-up call to the security and software industries. It reminded everyone how widely used free and open-source software is, and how underfunded it is [13][15]. Heartbleed is considered by most to be the most severe software vulnerability discovered since commercial traffic began using the internet [13][14][15]. The vulnerability could have resulted in catastrophic damage to internet and network security, significant data breaches, and billions of dollars lost. For that reason, Google invested plenty of funding and resources in OpenSSL just a few months after the discovery of Heartbleed [18]. Google later announced that it had forked OpenSSL to create its own library, BoringSSL.

BoringSSL was designed to meet Google's needs and hence only includes the functionality and algorithms Google uses [20]. The library is not meant to be used by the general public, as it breaks compatibility [20]. BoringSSL's team cleaned up a large percentage of OpenSSL's code, documented it appropriately and wrote more tests to strengthen its security and improve readability. Google currently uses BoringSSL's implementation of the SSL protocol in Android, its Chrome and Chromium browsers, and a number of its applications [20]. Despite the fork, Google has supported (and still supports) OpenSSL by continuously reviewing its code, testing it, and providing financial support to keep the project going [20][21].

BoringSSL was not Google's only contribution to open-source cryptographic libraries. Tink is a cross-platform, multi-language cryptographic library and toolkit based on BoringSSL, designed and written by Google's cryptographers and security engineers. It was released as an open-source library in 2018 on GitHub [12][19]. Tink aims to be an open-source, well-documented, well-supported, and user-friendly cryptographic library for everyone [19]. According to its official GitHub page, the library aims to provide cryptographic APIs that are "easy to use correctly, and hard(er) to misuse" [19].

Tink provides secure and easy-to-use APIs which have been used in applications such as Google Play, Google Assistant, Firebase, and more [12][19]. It has also been used in some independent personal projects on GitHub. However, Tink is not as widely used as OpenSSL or LibreSSL; it has only 64 contributors on GitHub, some of whom have contributed only a minuscule amount [19]. This can be attributed to its young age in comparison to the other libraries available, as it was under two years old at the time this thesis was produced.

Despite the negatives and roadblocks mentioned regarding BoringSSL and Tink, neither library was affected by many of the vulnerabilities discovered in OpenSSL after BoringSSL was forked. That is because software and security engineers removed over 10,000 lines of code and rewrote thousands more from OpenSSL's codebase after the fork [12][19]. This highlights how poorly written and tested OpenSSL was, and the need for a better testing mechanism that would allow security and software engineers to detect more of these hidden, potentially catastrophic, vulnerabilities.

In addition to all that support, Google announced in late 2016 that it would release OSS-Fuzz. This platform provides several security-critical open-source projects with the ability to continuously fuzz their applications for security vulnerabilities [18][23]. Fuzzing is covered in section 2.4, and OSS-Fuzz in section 2.5.1.

2.4 Fuzzing – A Testing Mechanism

Many testing methods and techniques exist, each serving its own purpose: unit testing, integration testing, user testing, system testing, and so on. Fuzzing is another testing method which has been gaining significant attention recently. Fuzzing is an automated, brute-force software testing technique performed by providing a program with a large number of pseudorandom, unexpected and invalid inputs, attempting to ‘crash’ the program and catch any unexpected behaviour [1][21][22]. However, not every program can accept random input. Some inputs, such as signatures, cannot be generated randomly and need to be pre-determined by the tester. On the other hand, cryptographic libraries must utilise random inputs to be considered secure, as predictability reduces security [23].

Fuzzing is an advantageous method for discovering security vulnerabilities in software, especially where the codebase is large and complex. Since fuzzing was first invented, its technology and techniques have improved significantly. Machine learning, genetic (evolutionary) algorithms, grammar representations, scheduling algorithms, static analysis and more have been integrated with many of the earlier, more ‘primitive’ techniques to improve their efficiency and effectiveness [21].

Fuzzing was invented for many reasons. One was the lack of testing in early Unix systems, which caused them to crash whenever random, invalid data was provided as input, either by a user or through the network (noise) [25]. Others had different motivations; some were simply frustrated with the lack of coordination between vulnerability testing teams and wanted unified testing suites for their applications [22]. The original fuzzing structure was simple: a test case generator, a delivery module, a bug detector, and a bug filter [22]. The idea of testing a program with a batch of random inputs was not new when fuzzing first came out. However, the automated and sophisticated process by which the input was provided, and how the behaviour of the program was analysed, is what made different fuzzers unique.

Security and software engineers can use one of the many fuzzers available on the internet to test the security of their applications and note any unexpected behaviour. Fuzzing is also a great tool for increasing code coverage, as it can test almost every possible scenario the code might reach. Research has shown that over 80% of communication software available today is vulnerable to some sort of implementation-level security weakness [22]. It has also shown that out of 30 Bluetooth implementations, 25 crashed when tested using a fuzzing tool [22]. However, not all fuzzers are the same: some target different injection vectors, and some have different test case complexities. A good fuzzer will cover all the input spaces of an interface and supply them with a wide range of raw and sophisticated data. Moreover, a fuzzer matched to its target (a Bluetooth fuzzer for a Bluetooth protocol, for example) will uncover more vulnerabilities and programming flaws than a more generic fuzzer.

Each piece of software to be fuzzed must have a fuzz target function implemented within it. This function takes in a raw array of bytes and processes it. It is recommended that developers write multiple small fuzz targets rather than one large target: smaller targets help developers analyse results more quickly, detect more bugs, and promote a cleaner codebase [22][25].
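As a concrete illustration, the sketch below shows a libFuzzer-style fuzz target wrapped around a small length-prefixed parser. The parser (parse_record) is an invented example for demonstration, not part of any real library; only the entry-point signature is the standard one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function under test: expects input of the form
 * [1-byte length][payload]; returns 0 on success, -1 on malformed input. */
static int parse_record(const uint8_t *data, size_t size) {
    if (size < 1)
        return -1;
    size_t declared = data[0];
    if (declared != size - 1)   /* declared length must match payload */
        return -1;
    return 0;
}

/* The fuzz target: a fuzzing engine such as libFuzzer calls this
 * repeatedly with pseudorandom byte arrays. It must not crash. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse_record(data, size);   /* return value is ignored by the engine */
    return 0;                   /* not crashing is the only success signal */
}
```

Built with `clang -fsanitize=fuzzer target.c`, libFuzzer supplies its own main and drives this function with generated inputs.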

Some fuzzers have been implemented as fuzzing frameworks, which allow users to install the fuzzer and write their own test cases to suit their needs [21]. Fuzzing frameworks require much time and effort to implement. Thankfully, there are communities on the web that maintain and improve these frameworks at no cost. Many of these frameworks include test suites written by other developers, which users can download and use to test their protocols and applications. However, there is always a need for new and better tests to be added to these frameworks, and many people are kind enough to share their own with the rest of the world.

2.4.1 Black Box Fuzzing

By default, fuzzing is considered a black box testing technique by many, as testers do not know the full implementation and exact inner workings of the program they are testing. The term black box describes programs that expose a few interfaces through which input may be provided, without revealing any information on how the internals of the program behave when these inputs are injected. For black-box testing, it is crucial that the target program is injected with input it does not expect, as the results reveal aspects of its behaviour which its developers have not thought of [22]. Some consider fuzzing a grey box technique instead, since the tester needs to know a little about how the program works internally to write accurate and reliable tests [25].

One of the significant benefits of black-box testing, including fuzzing, is the ability to test any software through any of its interfaces, such as APIs, network protocols, GUIs, return values, files, and more. For that reason, any person interested in the security of a piece of software can fuzz it, provided they have been permitted to do so (e.g. the average user cannot fuzz Facebook.com directly through a web browser, as the traffic would probably be flagged as a DoS attack or spam).

2.4.2 White Box Fuzzing

In addition to black-box fuzzing, another useful technique is white box testing, sometimes referred to as glass box testing. White box/glass box fuzzing is a testing method where the tester can view the code being tested, and hence can understand precisely what is going on [1]. Despite this difference, black box and white box testing are performed in the same manner and use the same frameworks. This form of testing ensures that the program functions as intended by allowing the fuzzer to exercise every line of code in the program, aiming for full code coverage. White box tests can be automated; however, testers must ensure that all of the program’s flows are exercised [25].

2.4.3 Problems with Fuzzing OpenSSL

OpenSSL is one of the libraries which has benefited significantly from fuzzing. After the Heartbleed vulnerability was discovered (see 2.3), more time, money and resources have been invested into fuzzing OpenSSL than ever before. One outcome of that funding is OSS-Fuzz, which will be discussed in section 2.5.1. Fuzzers have been, and still are, a major contributor to OpenSSL’s security, having detected hundreds of vulnerabilities which would have gone unnoticed otherwise [23].

However, one issue with fuzzing OpenSSL is its low code coverage. Low code coverage means that a large portion of the software is not being tested for vulnerabilities, so there could be several undetected vulnerabilities in OpenSSL. Fuzzing generally has shallow penetration into complex protocols, such as OpenSSL’s implementation of TLS/SSL and some of its complex cryptographic algorithms, and hence achieves relatively low code coverage [23].

There have been multiple attempts at writing better and more comprehensive test cases for OSS-Fuzz (and other fuzzers), but these have not improved code coverage significantly. Research on this issue noted the lack of study into how such libraries and protocols operate, compared to the number of tests written [1][21][23]. Therefore, a better approach to solving low code coverage is to investigate further how fuzzers can be written in a way which allows them to cover more of the codebase and possibly detect previously undetected vulnerabilities. One path explored in this project is white box fuzzing. This method should allow security and software engineers to view the parts of OpenSSL which are uncovered, understand why they are uncovered, write fuzzing targets to cover them, and potentially improve black-box fuzzing for other software as well.

2.5 Fuzzing Engines

The following section briefly investigates some of the most common fuzzers used on OpenSSL.

2.5.1 OSS-Fuzz

OSS-Fuzz is an online, automated and continuously running fuzzing platform developed and launched by Google in late 2016 to help improve the security of many open-source software projects. It aims to do so by combining modern fuzzing techniques with a scalable and distributed implementation [18]. OSS-Fuzz supports libFuzzer and the AFL fuzzer as engines, which will be discussed in the next couple of subsections.

One of the unique points of OSS-Fuzz is the amount of computing power behind it. It runs continuously on several of Google’s virtual machines to this day, fuzzing over 290 projects, one of which is OpenSSL, despite OpenSSL having a low code coverage of around 38% [23]. According to Google, it fuzzes over 4 trillion test cases a week, and that number is continuously increasing as more computing power is added to these virtual machines. Before its official launch, the fuzzer had already discovered 150 bugs, which further signifies the importance of, and need for, a continuously running, autonomous, powerful fuzzer. By the end of 2019, it had found over 11,000 bugs in open-source software alone [18].

OSS-Fuzz has grown in popularity since its release. Because it only supports open-source software, Google also released a fuzzer named ClusterFuzz, its publicly available fuzzing infrastructure, which is capable of running on any software and can scale to a cluster of up to 25 thousand machines [24].

Anyone can register to have their open-source software pulled into OSS-Fuzz’s repository and fuzzed by Google. The fuzzer’s team then checks whether the software is indeed security-critical and whether it would genuinely benefit from OSS-Fuzz. The team recommends that users first download and install OSS-Fuzz on their own machines, add their software to the fuzzer and test-run it to ensure that their software runs successfully without crashing. Doing so is extremely useful for this project, as it provides the ability to investigate the fuzzer locally, modify it, and experiment with it.

2.5.2 AFL Fuzzer

AFL, which stands for American Fuzzy Lop, is a fast, command-line fuzzing tool developed by Michał Zalewski. It is one of the fastest fuzzing tools available thanks to its lightweight compile-time instrumentation. Moreover, AFL supports a persistent mode through the __AFL_LOOP() macro, which processes new inputs inside a loop rather than creating a new process for each input [23]. This increases performance significantly and reduces run time.

AFL was used by Google to fuzz Chrome and some of its applications. AFL also inspired Google’s OSS-Fuzz, and aspects of AFL were used to build it. However, after OSS-Fuzz was released, Google switched to mainly using its own fuzzer to test its software, including Chrome.

Figure 2: What American Fuzzy Lop looks like while executing [26].

Installing AFL is simple. It works best on a Unix-like machine, such as Linux or macOS. A user need only clone the repository, cd into the directory, and run several commands to fuzz-test their software [26].

2.5.3 libFuzzer

libFuzzer is one of the most common fuzzing engines available. It supports a large number of applications and is part of the Clang compiler suite, and hence it connects easily with other software. All it requires is a fuzz target, a function which follows a particular format.

Figure 3: An example of a fuzz target function in C [23].

libFuzzer also supports sanitisers. Sanitisers are tools which turn ‘warnings’ into fatal errors, causing the program to fail when a warning could indicate a potential security vulnerability. Some of these warnings flag issues such as memory leaks, which would be extremely dangerous if exploited. Sanitisers are optional, as they consume a lot of memory and CPU power, which can hinder the fuzzer’s performance [23].

OpenSSL and libFuzzer do not work very well together, as libFuzzer’s main method interferes and conflicts with OpenSSL’s. A solution was devised: supplying a --with-fuzzer-lib flag at build configuration time. This solved most of the compatibility issues with OpenSSL, although the two still do not integrate very well [23]. libFuzzer can be easily installed and run on one’s machine by cloning the repository locally and following the necessary README steps. It works best on Unix-based machines such as Linux and macOS.

Chapter 3 Implementations and Experiments

3.1 Justifying OpenSSL

Before discussing the implementation and experimentation performed on the OpenSSL library, it must be justified why I focused on researching and fuzzing OpenSSL. As mentioned earlier, OpenSSL is the most common and widely used open-source security library on the internet, with over 70% of all TLS/SSL implementations using it. It also has the highest funding, developer and third-party support of any SSL library.

It can be argued that other SSL implementations such as BoringSSL and LibreSSL are also relatively common (though not as common as OpenSSL, and LibreSSL more so than BoringSSL) and should be researched in addition to OpenSSL. However, these libraries are forks of OpenSSL, created in 2014 after the Heartbleed incident: the OpenBSD project developed LibreSSL, and Google developed BoringSSL. Since they are forks of OpenSSL with very similar implementations, it is more productive to research and fuzz the parent library. Many of the functions in the forked implementations remain unchanged from OpenSSL.

Moreover, BoringSSL and LibreSSL are heavily maintained by their developers (Google, which maintains BoringSSL, is one of the wealthiest companies worldwide). Meanwhile, OpenSSL is developed by the OpenSSL Project, which consists mainly of independent developers. As mentioned earlier in 2.3.1 and 2.4.3, the lack of funding for research and rigorous testing is one of the significant concerns around OpenSSL. It means there are not as many active developers working on the project to review and test pull requests and changes, and not as much active research and development into testing the current implementation, increasing code coverage, and potentially finding bugs and vulnerabilities. This is another reason I used to justify focusing this thesis’ security research and fuzzing implementation on the OpenSSL library rather than a range of libraries.

Besides, BoringSSL has far fewer commits than OpenSSL even after the 2014 fork, and examining the codebase makes it evident that BoringSSL has far fewer functions than OpenSSL despite the funding. As mentioned earlier in the background chapter, BoringSSL was created to meet the needs of Google and its products, such as Chrome and Android. Hence, Google does not need to develop and test extra functionality that it may never use, and would prefer to invest that time in reviewing and testing the functionality it needs. This lack of functionality is also one of the main reasons why BoringSSL and Tink, which is based on BoringSSL, did not gain the same popularity as OpenSSL.

Even though more funding is now being put into researching and testing OpenSSL, the other libraries are developed and administered by large corporations with a modest number of employees that provide more adequate support than what OpenSSL receives. At the time of writing this report, there are fewer than nine active maintainers on OpenSSL’s official repository, and only a fraction of them work on improving the fuzzing targets run on OSS-Fuzz and other fuzzing frameworks. This contribution rate is incredibly low for such widely used open-source software. For that reason, researching how to improve the library’s security would be very beneficial to the development of this open-source software.

3.2 Investigating Previous Implementations

3.2.1 OSS-Fuzz Project Setup

As I mentioned earlier in chapter 3.1, one of the main reasons OpenSSL is so widely used by developers and corporations worldwide, and one of the reasons why I chose to research it, is the large number of functionalities, platforms and environments that OpenSSL supports. OpenSSL also received an increase in funding after the Heartbleed attack, which allowed its developers to work on it even more. A product of that increased funding is OSS-Fuzz, which was first introduced in chapter 2.5.1.

OSS-Fuzz’s repository on GitHub has a directory named “projects” that contains sub-directories for every project fuzzed by the engine. In each of these sub-directories, three files must exist for the fuzzer to support the project: a Dockerfile, a build shell script and a YAML document.

A Dockerfile is a file that allows applications to be containerised so that they may be built, scaled and redeployed on multiple different machines with a single command. For OSS-Fuzz projects, these Dockerfiles are based on a primary image provided by Google titled ‘base-builder’ [27]. The file also includes commands that run some basic Linux operations, such as updating and installing the dependencies a project may need (the Clang compiler, for instance). It then clones the project from its main Git repository, builds it using the build shell script and runs it.
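A minimal OSS-Fuzz Dockerfile typically looks something like the sketch below; the package list and repository URL are illustrative placeholders, not any specific project’s configuration.

```dockerfile
# Build on Google's common OSS-Fuzz builder image
FROM gcr.io/oss-fuzz-base/base-builder

# Install build dependencies (illustrative; varies per project)
RUN apt-get update && apt-get install -y make

# Clone the project to be fuzzed (URL is a placeholder)
RUN git clone --depth 1 https://github.com/example/project.git $SRC/project
WORKDIR $SRC/project

# Copy in the build script that compiles the fuzz targets
COPY build.sh $SRC/
```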

The build shell script is simply a bash script describing how the application cloned in the Dockerfile is to be built inside the newly created container. Finally, the ‘project.yaml’ file contains information OSS-Fuzz needs to properly fuzz the project, such as the supported programming languages, fuzzing engines and sanitisers. It also contains some other information, such as contact details.
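A sketch of such a ‘project.yaml’ is shown below; the field values are illustrative placeholders rather than any real project’s configuration.

```yaml
homepage: "https://www.example.org"        # project homepage (placeholder)
language: c
primary_contact: "maintainer@example.org"  # placeholder address
fuzzing_engines:
  - libfuzzer
  - afl
sanitizers:
  - address
  - undefined
```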

It is important to note that the structure of each sub-directory (a project folder) is not fixed and may vary slightly between projects. For example, LibreSSL and OpenSSL both have an additional options file setting the ‘max_len’ option of the libFuzzer engine to 2048 bytes.
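Such libFuzzer options files follow a simple INI-style format; a sketch of the ‘max_len’ setting mentioned above:

```ini
[libfuzzer]
max_len = 2048
```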

It is crucial to understand how these open-source libraries are implemented and how their implementations differ, as this allows for more accurate analysis and execution of experiments. In the following sub-sections, I investigate and describe how the implementations of OpenSSL, LibreSSL and BoringSSL work, study their functions, discuss significant differences, explore how they are being fuzzed and investigate how they were deployed on OSS-Fuzz.

3.2.2 OpenSSL

OpenSSL is akin to the parent of open-source TLS/SSL implementations: most common open-source libraries have either forked OpenSSL or taken inspiration from it, which is evident from their codebases. According to the library’s GitHub repository [28], there have been over 27,300 commits to the codebase at the time of writing, from over 515 contributors. However, recent records show that fewer than 20 developers are actively working on the project as of November 2020, the majority of them longstanding, established developers of this project.

OpenSSL is an extensive library in terms of file count, with over 4000 files packaged in version 1.1.1h, the latest version at the time of writing. A decent percentage of these are configuration, test, documentation, and miscellaneous files; the remaining majority consists of various C and C++ source files. These contain the functions the OpenSSL software uses to fulfil tasks requested by users or developers, such as cryptography, application and networking functions. Despite the lack of comments in the majority of the source files, a full list of every function supported by the library, with a clear description of what it does and how it can be used, is available in the library’s official documentation. The documentation helps developers know what they are working with. However, due to the lack of in-code comments, it can be challenging for new developers or testers to understand precisely how every single function is implemented; assumptions have to be made, which can be incorrect. As a developer unfamiliar with the project, it did take me a while to fully understand every function I was fuzzing, though the documentation available online made it possible. I also noticed that each C file in the library contains a low number of functions, which keeps the library modular and easier to debug and test.

Regarding fuzzing, OpenSSL includes a directory named ‘fuzz’ in the library’s main directory. This directory contains several C files, each containing at least one target to be tested by a fuzzing engine. As of November 2020, there are only 14 targets in the ‘fuzz’ directory.

Each of these files corresponds to an OpenSSL feature or functionality. For instance, many of the fuzzing files test functionality found in sub-directories of the ‘crypto’ directory, which holds numerous cryptographic functions.

After investigating the history of the fuzzing targets listed under the ‘fuzz’ directory, I noticed that the majority were created in 2016, around the same time OSS-Fuzz was released. This does not necessarily mean that these targets were created because of OSS-Fuzz’s release. Seven of the fourteen currently listed targets were added to the directory in a single commit (c38bb72797916f2a0ab9906aad29162ca8d53546) on the 8th of May 2016. This commit was reviewed by Emilia Käsper, a security and cryptography researcher and software engineer at Google. This review may suggest that the commit pulled existing fuzzing targets from the developers’ local machines onto the repository so that Google’s new product could run them. However, the commit dates of the remaining targets vary by many months, which may suggest that those targets are new and were created specifically to be run by OSS-Fuzz.

According to the provided documentation [29], the targets support both libFuzzer and AFL as fuzzing engines. Since OSS-Fuzz supports both of those engines, it can fuzz OpenSSL with those two engines to maximise code coverage and testing accuracy. Moreover, this allows users who are more familiar and comfortable with one engine to write fuzzing targets and fuzz manually without having to learn a new engine or technique.

3.2.2.1 Target: client.c

The first target I thoroughly investigated and studied was ‘client.c’. This target emulates and fuzzes what would be the client in the application/connection. The target imports multiple header files such as ssl.h and rand.h, where the latter is responsible for generating the random bytes used by the application itself, not by the fuzzer. rand.h can also generate large, random and secure byte sequences for use by cryptographic ciphers. The ‘client.c’ file also imports header files for ciphers, including the well-known RSA and DSA algorithms. Moreover, the target defines a FUZZTIME constant, pre-set by the developers to 1485898104; this value is used as a fixed timestamp so that time-dependent operations behave deterministically across fuzzing runs. Developers may fuzz the target repeatedly with the same environment and parameters, or they may modify them and see whether the results differ.

Just like the other targets, client.c defines and implements the FuzzerInitialize function. The target initialises OpenSSL’s crypto and SSL libraries since, in normal operation, OpenSSL automatically initialises and allocates resources for the application, but while fuzzing, the developer must call these functions explicitly. The target then calls SSL_get_ex_data_X509_STORE_CTX_idx, which retrieves the ex_data index required to access the SSL structure [30], followed by the remaining functions necessary for OpenSSL and the fuzzer to operate without odd errors.

The next function in the target, FuzzerTestOneInput, acts as the fuzzing target’s “main()” function, called after the environment has been initialised. This function describes the steps required to create, run and use an OpenSSL client application. The following list summarises the execution steps and logic behind this function:

- Numerous objects and variables are initialised;
- The fuzzer input is validated to not be of size 0;
- A new client SSL object is created with the SSL context as a parameter;
- Several assertions are performed;
- The in and out buffers are set on the current SSL object;
- The client writes the randomly generated data into the in buffer and asserts that it has been inserted correctly;
- The client performs a handshake and checks the return value;
- If it returns a value of 1, the client continues to read application data until it reaches the end of file (size 0) or detects an error;
- After the client reaches the end of this logic, it frees its allocated memory;
- OpenSSL then clears the error buffer, frees the context and returns 0, indicating that no bugs were detected.

These steps are well written and logically sound, and they allow the client application to run as expected using the input randomly generated by the fuzzer. After fuzzing the target on my local machine for a few days, I observed that it executed over a hundred million times and covered around 2700 lines of code, holding that coverage until the fuzzer was terminated. After investigating the functions called, I counted the number of lines in every function (and the sub-functions it might call), and the total was very close to the number of lines covered by the engine. The results indicate that the fuzzing target is very well written, as it covers the vast majority of the lines in the functions and sub-functions called within it. Since the target is so concise and well written, it is impossible to make it any smaller without compromising code coverage, which is the opposite of what fuzzing targets should do. Hence, this target efficiently creates a client application, loads dummy data into the buffer and executes the necessary operations in just under 40 lines.

As mentioned earlier, the target’s code coverage is excellent and very close to the actual number of lines in the functions covered by the target. However, I still needed to analyse the data further. The fuzzer produced thousands of lines of information about its current performance, far too much to record manually in Excel. So, I wrote a small program (appendix A) that analyses the logs and results from libFuzzer, plots them on a graph, and stores them in CSV format (appendix D). Since the data provided by the fuzzer is not conveniently scaled, I modified the program to downsample the data by a configurable amount assigned to a “rate” variable. Plotting every single execution would have produced gigabytes of data and been extremely inefficient and time-consuming, hence the scaling feature.

The following figure is a plot of the client target’s code coverage over numerous executions after fuzzing it for several days, downsampled 1:100 (one datapoint per 100 executions).

Figure 4: Code coverage of the ‘client.c’ target

The graph above shows the number of lines covered over more than 100 million executions. It illustrates how vital fuzzing over long periods is, as the code coverage continues to increase significantly at a relatively high rate for the first 30 million executions. There are a few sudden jumps: a small one just before 0.2 on the horizontal axis, a large one just after 0.2, and the greatest just after 0.3. After confirming the validity of these anomalies in the fuzzer’s logs, I found that the jumps were due to the fuzzer discovering new cases that required it to access functions it had not visited before. This is a normal phenomenon that occurs with most targets and is not unique to the client target.

Past the 0.3 mark, the fuzzer discovered far fewer new lines and functions than it did early in the fuzzing process. The target seemed to have peaked at around 2665 lines covered by the time the fuzzer was terminated. The trend had also slowed, approaching a fairly clear limit which was then confirmed by investigating all of the functions the target could have visited. This reaffirms the initial indication that the target is well written and cannot be divided or made more concise without compromising code coverage or modifying the underlying library’s implementation.

Moreover, the fuzzing target is very stable in its execution rate, maintaining an average of around 1950 executions/second while using 25% of the available CPU resources. This would equate to around 7800 executions/second at 100% CPU utilisation. The following figure is a snippet of the target’s logs around the time it passed the 100 million executions mark, after days of fuzzing with 25% of the machine’s resources allocated to it (the remaining resources were allocated to other targets being fuzzed in the background).

Figure 5: Snippet of the target’s logs while it is being fuzzed

The target has received some updates over the years; however, only three were improvements or modifications to the target itself, while the rest changed the license, copyright year, and other metadata. These updates did not improve the target’s ability to increase code coverage, but they allowed the target to work with more recent versions of the library and to be more efficient in terms of executions/second.

To summarise the findings, the ‘client’ target is a very well written and concise target that efficiently covers most of the code lines it can reach, at an exceptionally satisfactory rate. It does not require any significant modifications, as it tests its functions accurately.

3.2.2.2 Target: server.c

Similar to the client.c target, the OpenSSL (OSSL) team also wrote a fuzzing target aimed at emulating, testing and fuzzing the OpenSSL server functionality and libraries. This target is titled server.c. Like client.c, server.c imports numerous libraries, most of which are security and cipher header files to be used by the server. However, unlike the client target, this target also includes unsigned integer arrays that hold the data of multiple certificates (public keys) and their private keys for most of the imported cipher algorithms. Moreover, this target has the same FuzzerInitialize function as client.c, and hence server.c initialises the same variables and functions as the client target, including the same cryptography libraries with the same settings.

Moving on to the FuzzerTestOneInput function – which acts as the “main” function – it initialises the same variables as the client target: the buffer object variables, an SSL object variable and an SSL context (SSL_CTX) object variable. The server target also initialises additional variables required to store data used and created by the cipher algorithms. The function also performs a size check on the random data to ensure that its length is greater than one, to meet the protocol’s requirements. Then, the function performs the following operations to mock a server running OpenSSL (OSSL) and fuzz it:

- Sets several environment variables that are required by OSSL to operate;
- Performs assertions to confirm the changes have been applied;
- Then, depending on the encryption libraries supplied with the target, these algorithms are executed;
- All of the ciphers share many identical functionalities in the OSSL suite and have similar implementations; these are listed as follows:
  o Creates new temporary input/output buffers;
  o Performs assertions on the buffers, keys and their sizes;
  o Uses the public and private keys to execute cryptographic functions;
  o Asserts that the results are valid and then frees the temporary buffer;
  o Then, more cryptographic operations and assertions are performed on the same keys and certificates;
- Back to OSSL, the SSL object’s buffers are set to the current buffers in memory (as assigned by the previous cryptographic operations);
- An assertion is performed on the input buffer, the randomly generated data and its length;
- The server then performs an early data read check;
- OSSL then attempts to perform a handshake with the server using the dummy data;
- Finally, the fuzzer frees all of the used memory and returns 0 if it found no bugs.
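The overall control flow above can be sketched as a minimal, self-contained harness. This is purely an illustration: `fuzz_server` and `mock_handshake` are hypothetical stand-ins for the real target’s entry point and its OpenSSL calls (SSL_new, the handshake attempt, and so on), not the actual server.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the handshake attempt the real target performs;
 * here it simply "succeeds" whenever the input is non-empty. */
static int mock_handshake(const uint8_t *data, size_t len) {
    return len > 0 ? 1 : -1;
}

/* Skeleton mirroring the FuzzerTestOneInput shape described above: size
 * check, feed the random bytes to the server object, attempt a handshake,
 * free everything, and return 0 when no bug was found. */
int fuzz_server(const uint8_t *data, size_t len) {
    if (len < 2)                  /* the real target rejects too-short inputs */
        return 0;
    uint8_t *in = malloc(len);    /* stands in for the server's input buffer */
    if (in == NULL)
        return 0;
    memcpy(in, data, len);
    mock_handshake(in, len);      /* outcome is ignored; only crashes matter */
    free(in);                     /* the fuzzer frees all of the used memory */
    return 0;                     /* ... and returns 0 if it found no bugs */
}
```

A sanitiser-instrumented build of the real harness would flag any crash or leak inside the handshake path rather than checking return values.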

I also noticed that the OSSL fuzzing team aims to add support for the SRP and PSK algorithms as per their TODO comments.

Despite the results that this target may produce, it is not as efficient as the client target and does not achieve as many executions per second, even though the two share many of the same functions and libraries. I investigated the reason behind this, and it came down to the ciphers: the fuzzer is required to recompute and perform assertions on the certificates and their private keys on every execution. This could be made more efficient by pre-computing and hard-coding the results, since the ciphers and other cryptographic algorithms can be fuzzed in separate targets to isolate the results, making all targets more efficient and the results easier to analyse and study. More on this in section 3.3: current experiments and implementations.
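The pre-computation idea can be illustrated with a small, self-contained sketch. Nothing here is OpenSSL code: `expensive_key_setup` and `get_key_material` are hypothetical names standing in for the certificate and key computations performed on the hard-coded data.

```c
#include <assert.h>

/* Stand-in for an expensive setup step; in the real target this would be
 * the cipher computations performed on the hard-coded certificates/keys. */
static int expensive_key_setup(void) {
    return 42;  /* pretend result of the one-off computation */
}

static int key_material = 0;
static int initialised = 0;

/* Compute the constant result once (as FuzzerInitialize could), instead of
 * redoing identical work inside every FuzzerTestOneInput execution. */
int get_key_material(void) {
    if (!initialised) {
        key_material = expensive_key_setup();
        initialised = 1;
    }
    return key_material;
}
```

Since the inputs to these computations never depend on the fuzzer’s random data, caching the result this way changes nothing about what is tested while removing the per-execution cost.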

Moreover, after being fuzzed for a few days, the target achieved a high code coverage of over 4200 lines of code. After investigating and estimating the maximum number of lines involved in performing a handshake with the server, the estimated number of lines was found to be less than the fuzzer’s code coverage. The reason for this is that the fuzzer also accesses functions, such as the cipher libraries, that are not themselves being fuzzed with the randomly generated data. Despite unnecessarily covering these extra lines, the target still reached a very high code coverage and maintained it over a long fuzzing run. The following figure shows a graph of the lines of code covered over the number of executions performed, scaled 1:100 (1 point per 100 executions).


Figure 6: Code coverage of the 'server' fuzzing target

As seen in the graph above, the fuzzer covers over 3000 lines of code almost instantly after being executed. Coverage then continues to rise rapidly, slowing down at around the 3800-line mark for the next few million executions. At around 5 million executions (the 0.5 mark on the horizontal axis), a new function is discovered, hence the jump to over 4100 lines. Coverage then climbs past the 4200-line mark, stabilising there and barely covering any new lines. After that, the fuzzer had to be stopped as it was not covering any more lines, having already accessed every function it could reach. However, this does not mean that every possible combination was executed on these functions. This is why it is highly recommended to continue fuzzing the same target for much longer durations and, if possible, to have multiple instances of the same fuzzer running either in a cluster or on separate cores of the same CPU. While experimenting, I had two other instances each running on a separate core, and both had almost the same code coverage, with a 1–2 line difference between all results, which is negligible.

As mentioned in the previous paragraph, the two sharp jumps in code coverage between 0.1 and 0.5 are due to the fuzzer discovering new cases where it needed to access ‘new’ functions. After investigating the logs and examining the types of functions it was discovering, I found that all of these newly detected functions were encoding functions needed to convert certain encoded characters to other encoding formats. These functions can indeed be fuzzed as their own targets. However, the OpenSSL (OSSL) fuzzing directory does not contain any specific, isolated fuzzing targets for encoding functions: all encoding fuzzing happens incidentally while other functions are being fuzzed – exactly as is happening with the server target. I have written targets that convert specific types of encoded characters, which will be discussed in more detail in section 3.3.

In terms of executions per second, the target performed at a much lower execution rate than the client target – 3x slower on average. Since both targets have an almost identical implementation and both essentially just perform a handshake, I decided to remove all of the extra functions in server except the cryptographic functions that are required when initialising an SSL object/variable. I then compiled the target and fuzzed it again. After fuzzing it for over ten minutes, the target still had a similar execution rate, so I was able to conclude that the remaining cryptographic calls, which increase the number of operations performed in each execution, are indeed what slows down the fuzzing process. As discussed earlier, using pre-initialised objects would make the executions much more efficient.

In conclusion, the server target is a beneficial target that acts as a server using the OSSL implementation of the TLS/SSL protocol. It covers the majority of the code that can possibly be executed by a handshake performed by a server. However, it can be made more efficient by pre-initialising the SSL objects with the pre-loaded certificates so that they are not recomputed every time. Moreover, some of the functions reached by the target are very common and can be fuzzed separately, further increasing code coverage. In addition, fuzzing these common functions separately increases the test coverage of the entire library and strengthens it.

3.2.3 LibreSSL

LibreSSL, which was introduced previously in chapter 2 and briefly re-introduced in this chapter, is another excellent open-source security library that was forked from OpenSSL. Despite the fork occurring over six years ago and despite it being part of the OpenBSD project, it has not received a significant amount of support and development in comparison to other similar libraries. As of the time of writing, the library only has around 1550 commits [31]. The majority of these commits were minor updates such as modifying comments, updating method names, removing or adding header files, updating changelogs, and other similar changes. It was difficult to find any significant commits that at least moderately update the library’s implementation beyond the OpenSSL fork. The majority of the SSL protocol files in the library have not been functionally updated in years, and only a few have had a significant update that distinguishes them from OSSL’s implementation.

LibreSSL has far fewer total contributors than OSSL, with 61 contributors having contributed to the project at some point, compared to OSSL’s total of over 515 – around 12% of OSSL’s contributor count. However, only 7 of the 61 contributors were active on the project over the past year. Of those seven, only three have committed at least ten times, and only two of the three are actively working on the project [31]. This low number of contributors can be one of the main reasons why the library has not developed much since the fork from OpenSSL. The lack of updates can also be attributed to developers not seeing the need to work on a library that is not very functionally different from OSSL, is slightly behind OSSL on updates, and is much less prevalent. Developers may therefore be more attracted to a more active and well-funded project that is more widely utilised. More importantly, as mentioned in the background research in chapter 2, LibreSSL was mainly forked from OSSL so that the developers could work on the library to make it complement the OpenBSD environment better; it was not primarily forked to be a complete alternative to OSSL.

OSS-Fuzz is continuously fuzzing LibreSSL alongside hundreds of other projects, including OpenSSL. However, the ‘fuzz’ directory in the LibreSSL repository contains the same targets as OSSL’s, only modified to work with LibreSSL. Moreover, most of these targets have not been functionally modified since 2018. This implies that the LibreSSL and OpenBSD teams are not planning on investing much time – if any – into improving their fuzzing targets. This discovery came as no surprise, however, since the entire library is not updated as frequently as other similar libraries are. For this reason, I did not spend much time investigating these fuzzing targets and experimenting with them, as they were almost-identical copies of OSSL’s targets.

3.2.4 BoringSSL

BoringSSL is another fork of the OSSL library, also initially introduced in chapter 2 and briefly re-introduced earlier in this chapter. BoringSSL has had significantly more development work done on it than LibreSSL, with the library being updated on an almost daily basis. The vast majority of these updates are functional updates that have an actual impact on the library’s behaviour. As of the time of writing, BoringSSL has over 6300 commits on its codebase. The library has also had over 105 contributors over its lifetime, with over 12 of them active in the past year. Of these 12 contributors, five have committed to the repository over ten times in the past year, and only two of the five have been continuously working on the library [32]. These numbers might look very similar to how the contributions are laid out in LibreSSL, but the situation is very different: those two very active developers have been working on this project and committing almost daily – excluding weekends – improving the library’s functionality, security and compatibility. The most crucial difference to note, however, is that both of those highly active developers – David Benjamin and Adam Langley – are software engineers at Google. This means that part of their daily job is to maintain, improve and continue working on the BoringSSL project.

Even though Google financially supports OSSL so that it continues to be improved and developed, Google decided to fork OSSL into its own version of the library, naming it BoringSSL and funding it as well. A result of that funding is the number of commits seen in this library, particularly by those two Google software engineers. As previously mentioned, one of Google’s main reasons for creating BoringSSL was to tune the library to work better and more efficiently with their software, such as Chromium and Android, while at the same time improving the library’s features and codebase. Google still uses OSSL for some of its applications and would still like to use some of the functions that OSSL may produce after the fork, which is why it continues to fund OSSL to this day. Moreover, Google strongly believes in a more secure internet; since most applications and websites on the internet use OSSL in one form or another, it would still like the library to have continuously active support.

In comparison to OSSL and LibreSSL’s heavy use of C in their codebases, BoringSSL uses a wide variety of languages in its implementation. BoringSSL uses C and C++ almost evenly, together taking up just under three-quarters of the codebase, with Perl and Go coming next at around a quarter, and other languages covering the remaining portion [32].

BoringSSL has had fuzzing targets since at least 2015, one year earlier than what OSSL’s repository states as its first target. BoringSSL’s targets are all written in C++, unlike OSSL’s and LibreSSL’s targets, which are written in C. Moreover, BoringSSL places shared fuzzing code in a fuzzer.h header file as functions, so that every actual file to be fuzzed in the ‘fuzz’ folder only needs a single include line. This is an excellent way to write fuzzing targets, as it also reduces the amount of repeated code (such as the initialisation functions). Fuzzer.h has received numerous functional updates over the years, many more than OSSL’s and LibreSSL’s targets. The latest update was performed almost a year ago, in November 2019; that update enabled TLS 1.3 by default while fuzzing.
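The shared-header pattern can be illustrated in miniature. This is not BoringSSL’s actual fuzzer.h, only a hedged sketch of the idea: common setup is written once in the header, and each individual target file reduces to an include plus a thin entry point.

```c
#include <stddef.h>
#include <stdint.h>

/* --- fuzzer.h-style shared helper (illustrative only): one-time setup
 *     that every target would otherwise have to duplicate. --- */
static int common_init_done = 0;

static void shared_init(void) {
    /* in a real suite: configure library settings, enable TLS 1.3, etc. */
    common_init_done = 1;
}

/* --- what an individual target file then reduces to: include the shared
 *     header and forward the random bytes to the code under test. --- */
int fuzz_one_input(const uint8_t *data, size_t len) {
    if (!common_init_done)
        shared_init();          /* shared code, written once in the header */
    (void)data;
    (void)len;                  /* real targets feed data to the library here */
    return 0;
}
```

The benefit is that any fix to the shared setup (such as enabling a new protocol version by default) immediately applies to every target in the suite.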

In conclusion, BoringSSL’s targets are exceptionally well written and cover thousands of lines. The library is continuously fuzzed on Google’s virtual computer cluster, which allows multiple fuzzers to operate at the same time and on multiple computers. BoringSSL also receives substantial support for testing and development, even more than OSSL, which is why not much time was spent researching how to improve fuzzing for BoringSSL.


3.3 Implementing New Solutions

Before implementing new solutions for the OpenSSL library, I had to research, investigate, experiment with and analyse previous and already existing solutions to understand the current state of the research more clearly and accurately. The findings and results from the research and investigations performed are discussed in detail in section 3.2. Throughout this entire chapter, I performed the experiments on a set workstation – unless specified otherwise. The specifications of that workstation are:

- CPU: Intel i5-7600k, 4 cores, water-cooled and overclocked @ 4.5GHz per core
- Memory: 16GB DDR4-3000MHz RAM (dual channel)
- Storage: 4.25TB of mixed storage – experiments performed on an NVMe 2.0 SSD
- GPU: Gigabyte Nvidia GTX 1060 6GB

Despite a graphics card (GPU) being installed in the workstation, it was not utilised in any actual experiments other than to render video to the monitors. The quad-core CPU allows multiple fuzzers to run at once without conflicting with one another. More cores would allow even more fuzzers to run simultaneously, but this was not a significant issue for the experiments conducted. The workstation’s 16 gigabytes of DDR4 RAM allow data to be stored and accessed very quickly by the fuzzers and other applications I experimented with. The storage installed in the workstation is more than sufficient, allowing gigabytes of records and other data to be stored and accessed at high speed, as the NVMe SSD has read/write speeds of over 2000MB/second.

In regards to the packages used for the experiments, the OpenSSL (OSSL) library was used as the open-source package to be tested. Many versions could have been used for testing, such as 1.1.1g, 1.0.1f and 3.0.0. However, release 1.1.1g was chosen as the main version for testing, as it was the most stable, most recent and most downloaded release at the time. Release 1.0.1f was used to mock Heartbleed only and was not used for any further testing, as it is outdated and hardly used by anyone anymore. Version 3.0.0 is the newest release of the library and is still in its alpha and beta phases. Since there was no stable version of that release, and since I faced some issues attempting to fuzz it, I decided not to go ahead with 3.0.0. The issues I faced were most likely due to API and documentation changes that had not yet been documented, given the release being in its very early stages. However, this is an exciting field to investigate once a stable version is released, to determine whether fuzzing would be any different under the new release.

3.3.1 Improving already existing targets

As discussed in section 3.2, many fuzzing targets already exist for all three projects – OSSL, LibreSSL and BoringSSL. Depending on the project and the specific functionality being fuzzed, these targets may be well written, outdated or have a low library code coverage. Many of these targets require some modifications to make them more efficient, cover more of the codebase, or focus on one specific functionality (by splitting up the target). Based on the targets investigated earlier, it seemed that the server target should be modified based on the recommendations mentioned previously to improve its performance, and potentially its code coverage.

The server target has many functions that could be fuzzed separately. However, for this section, I will modify the target to make it more efficient to run while minimising the number of unnecessary operations performed.

Initially, I attempted to remove a while loop in the target. Before entering this loop, an if statement performs a bitwise AND operation between the last byte in the buffer and 0x01. This operation checks whether that byte holds an even or an odd value. If the value is odd – satisfying the condition – execution enters the ‘do{}while()’ loop; otherwise, it skips the entire code block and attempts to perform an SSL handshake. The following figure is a snippet of the code block that executes these operations.


Figure 7: Snippet of the do-while loop operation in the server target for OSSL

As can be seen from the previous figure, the do-while loop attempts to read early data from an early connection – only if the connection has early data enabled. This functionality is mainly utilised by TLSv1.3 connections, where the peers may exchange some data before completing the handshake; this feature was briefly discussed in chapter 2. As per the documentation, an if statement checks whether SSL_read_early_data returned SSL_READ_EARLY_DATA_SUCCESS, …FINISH, or …ERROR. As long as the function returns a …SUCCESS result, the target should continue reading until it either encounters an IO (or any other) error or until there is no more data to be read [33]. I decided to duplicate the target, remove this function from the duplicate, compile it with OSSLv1.1.1g and fuzz it alongside the original target, with each target running two instances (so a total of four fuzzers running at once – two for each). The results were then averaged, plotted, studied and analysed.
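The parity gate and the read loop just described can be reproduced in a self-contained miniature. Everything here is illustrative: `mock_read_early_data` and the `EARLY_*` codes stand in for SSL_read_early_data and the SSL_READ_EARLY_DATA_* constants, and the numeric values chosen are arbitrary, not OpenSSL’s.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the SSL_READ_EARLY_DATA_* return codes. */
enum { EARLY_ERROR = 0, EARLY_SUCCESS = 1, EARLY_FINISH = 2 };

/* Hypothetical reader: reports SUCCESS while chunks remain, then FINISH. */
static int mock_read_early_data(size_t *chunks_left) {
    if (*chunks_left == 0)
        return EARLY_FINISH;
    (*chunks_left)--;
    return EARLY_SUCCESS;
}

/* Mirrors the target's structure: ANDing the buffer's last byte with 0x01
 * decides whether the early-data do{}while() loop runs at all, and the loop
 * keeps reading while the call reports SUCCESS. Returns the number of
 * early-data chunks read. */
size_t maybe_drain_early_data(const uint8_t *buf, size_t len, size_t chunks) {
    size_t read_count = 0;
    int ret;
    if (len == 0 || (buf[len - 1] & 0x01) == 0)
        return 0;               /* even last byte: skip straight to handshake */
    do {
        ret = mock_read_early_data(&chunks);
        if (ret == EARLY_SUCCESS)
            read_count++;
    } while (ret == EARLY_SUCCESS);
    return read_count;
}
```

In the real target the loop body also forwards the bytes read, but the termination logic is the part the fuzzer exercises.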

The following figure was created using a modified version of the initial graphing and analysis program that I created (appendix A), which allows two datasets to be plotted on the same graph. The figure describes the code coverage of both targets over the executions performed, each line being the average of that target’s two instances.


Figure 8: Graph representing the code coverages with and without READ_EARLY_DATA

In the figure shown above, the red line represents the dataset of the fuzzing target that retained the READ_EARLY_DATA function, while the blue line represents the dataset with the function and its while loop removed. Both targets had nearly identical code coverage, with the red dataset covering a few more lines than the blue dataset, and covering them over a much shorter duration. This result hinted that the target might indeed run more efficiently with the function and its loop than without them. This experiment prompted a slight modification: I updated the program I created to also parse the logs for the rate of executions (executions per second).
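As a rough sketch of the parsing step (the actual program is in appendix A), the helper below pulls the exec/s counter out of a libFuzzer status line. The log line shown is typical libFuzzer output, while `parse_exec_rate` is a name of my own choosing.

```c
#include <stdio.h>
#include <string.h>

/* Extracts the "exec/s:" counter from a libFuzzer status line such as
 *   "#120300 NEW cov: 4123 ft: 9000 corp: 55/2Kb exec/s: 812 rss: 40Mb"
 * Returns the rate, or -1 if the line carries no exec/s field. */
long parse_exec_rate(const char *line) {
    const char *p = strstr(line, "exec/s: ");
    long rate;
    if (p == NULL || sscanf(p, "exec/s: %ld", &rate) != 1)
        return -1;
    return rate;
}
```

Running this over every log line and averaging the values per bucket of executions yields the datasets plotted in the figures below.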

The following figure was created using the updated program and uses the same key as figure 8: the red dataset represents the target with the READ_EARLY_DATA function and the blue dataset represents the target without it.


Figure 9: Graph representing the number of executions per second with and without READ_EARLY_DATA

As can be seen in the figure above, both targets started with a very high execution rate of just under 1300 executions per second for the first few executions. However, both quickly dropped back down to the 800 mark, fluctuated slightly, and then slowly dropped under 800. At this point, the blue dataset’s rate (without the function) dropped at a much sharper rate than the red dataset’s (with the function). At around the second datapoint on the horizontal axis, both targets’ execution rates started to increase slightly, with the blue dataset remaining approximately 50 executions/second less efficient than the red dataset.

It can be concluded from both graphs that the function, despite making the target more computationally expensive in theory, essentially improves the overall performance of the process, including in real-world applications. This finding also strengthens the premise that the encryption and decryption (cipher) functions are what is slowing down the fuzzer.

After noticing that the target had several function calls that could have been pre-initialised and called once – without needing to be called every time the target is fuzzed with new data – I decided to investigate whether it was possible to perform these calls only once. If so, I wanted to gather, study and understand the resulting data and determine whether the change made any difference, to support my initial hypothesis. These functions were used for cipher, encryption and decryption purposes and did not use the randomly generated data whatsoever. They took in parameters such as temporary buffer variables, public certificates and private key data (both in hexadecimal format). The computations are performed on the same set of constant, hardcoded data, unmodified by the data generated by the fuzzer; hence the same lines of code are covered every time the fuzzer starts a new execution, potentially wasting resources.

Initially, to test this out, I removed all of the functions related to the ECDSA and DSA ciphers, keeping only RSA so that the fuzzer could still function; the RSA functions were, however, still being re-executed on every run. I compiled the target with OSSLv1.1.1g and fuzzed it for a few minutes alongside the ‘stock’ target provided in the OSSL repository. I noticed that even though the modified target had been running for just a few minutes, it had over four times the execution rate of the original target. Despite the significant increase in execution rate, the code covered by the modified target was only around 15–25% lower than the original target’s code coverage. The following figure describes the code coverage before and after removing the extra encryption functions.

Figure 10: Code coverage for the server target with and without all ciphers included


Regarding the previous chart, it is imperative to note that both targets were fuzzed at the same time; the discrepancy between the red and blue datasets is due to their different execution rates, which will be discussed next. As can be seen from the chart, the red dataset represents the standard server target without any of its functionality removed, while the blue dataset represents the target with all extra cipher functionality removed except for RSA. Initially, the blue dataset covers around 2500 lines when first run, and the red dataset covers around 3000 lines when first executed. The red dataset follows the same trend and reaches the same > 4000-line code coverage peak as the previous server tests in section 3.2. The blue dataset, on the other hand, seemed to be nearing its peak at just above the 3000-line mark. This shows that some of the code covered in the original server test (red dataset) is due to the extra cipher libraries included within the target. These ciphers can – and should – be fuzzed separately in their own targets so that each fuzzer can focus on as few functionalities as possible to maximise results – as discussed in chapter 2.

In terms of execution rate (executions per second), the target with only RSA included is much more efficient than the original target. The following figure describes the execution rate of both targets on the same chart.

Figure 11: Comparing the server target's execution before and after the cipher modifications were performed


Looking at the previous figure, it can be seen that the execution rate for the blue dataset (the target with only RSA included) fluctuated significantly between 2000 and 3000 executions per second over the first few thousand executions. After that, it slowly stabilised at around 2200 executions per second until just before the 4 million executions mark (0.4 on the chart), where it began to slowly decline and then stabilise between 1900 and 1800 executions per second for the remainder of the fuzzing process. The red dataset (the target with all ciphers included), on the other hand, stabilised at around the 700 executions/second mark before continuing to decline at a semi-linear rate, hitting approximately 600 executions/second before being terminated.

The data for this figure was retrieved from the same source as the data in figure 10; as mentioned previously, both targets were fuzzed and terminated at the same time. This can also be seen on the graph, since the rate of the RSA-only target (blue) is around 3x higher than that of the all-ciphers target (red), and since the blue dataset has around 3x more executions than the red dataset. From the results in both figures and the evident correlation between them, it can be concluded that the original hypothesis is correct: these extra functions are indeed making the entire target run slower. Dividing the target into smaller, more portable targets and running them all at the same time – each with its own dedicated CPU core – would allow the targets to be fuzzed at least three times quicker than if they were combined in one target. The same applies to all other targets with a similar structure and format: the more they are split into smaller targets, the better the code coverage and the more executions (fuzzes) achieved in a fraction of the time.

3.3.2 Creating new targets

After improving already existing targets, the code coverage of the library’s codebase is still not high enough. These improvements mainly highlighted issues with current targets and improved their overall fuzzing performance severalfold, but the existing targets have already reached the maximum code coverage they can reach – or close to it. This means that more fuzzing targets are needed to fuzz all of the other untested functionality. Developers and testers do not necessarily have to write a target for every single function. However, a good starting point is to first write targets for the more common functions, then slowly move on to less common functions and features. Doing so ensures that while more of the codebase is being tested and covered, the more commonly used functions are also targeted, increasing the number of users these tests impact.

Initially, I analysed the results from section 3.2 to find out which functions were being frequently called by the existing targets, and noticed that a large majority of them were encoding functions used to convert particular character sets to character sets supported by the SSL protocol. Examples of these encoding functions include Unicode to ASCII, UTF-8 to Unicode, UTF-8 to ASCII and similar conversions. Some encoding functions can convert character sets in just a few lines, while others may require the fuzzer to access hundreds of lines to safely and accurately perform the conversion; more on this later in this chapter and in chapter 4.

As discussed earlier, targets such as the server target could be split into multiple targets to increase efficiency and achieve code coverage within a shorter period. Moreover, the functions the server target continued to discover as it was being fuzzed (NEW_FUNC) were mostly encoding functions. For that reason, I decided to write a few targets that fuzz these functions alone, increasing the code coverage of these functions specifically and potentially finding bugs or vulnerabilities within them.

3.3.2.1 Unicode to ASCII

The first target that I wrote to fuzz encoding functions was for the Unicode to ASCII (OPENSSL_uni2asc) function in the OSSL library; it can be found in appendix B. The target takes in a data pointer of unsigned characters and the size of that data. It then calls the function and stores the returned data in an “output” variable. It then checks whether the variable was used, and if it was, frees it to prevent memory leaks and unexpected behaviour. The target then returns 0 when it has fuzzed the data successfully so that it continues fuzzing with more data. The following figure contains a chart describing the code coverage that the target achieves over the number of executions.

Figure 12: Code coverage for the Unicode to ASCII fuzzer

The chart in the figure shows that the target only covered 18 lines. After checking the project’s implementation, I discovered that the target indeed covered all possible lines involved with this function on every execution. The target may continue to be fuzzed to check for any possible bugs or undetected cases; however, after running it for just over a day, it did not detect any unexpected behaviour. The target also had an average execution rate of around 200,000 executions per second, which meant that I was able to fuzz it at a very high rate within a short period.
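For illustration, the shape of such a target (the real one is in appendix B) can be sketched with a hypothetical `mock_uni2asc` stub in place of OPENSSL_uni2asc, so the sketch compiles without the library; the real target simply substitutes the OpenSSL call.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for OPENSSL_uni2asc(): allocates an output string.
 * The real function converts big-endian Unicode bytes to an ASCII string;
 * this stub just keeps the low byte of each two-byte pair. */
static char *mock_uni2asc(const uint8_t *uni, size_t unilen) {
    char *out = malloc(unilen / 2 + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < unilen / 2; i++)
        out[i] = (char)uni[2 * i + 1];
    out[unilen / 2] = '\0';
    return out;
}

/* Harness shape used by the appendix-B target: call the conversion on the
 * random bytes, free the result if one was produced, and return 0. */
int fuzz_uni2asc(const uint8_t *data, size_t len) {
    char *output = mock_uni2asc(data, len);
    if (output != NULL)        /* free only if the call actually allocated */
        free(output);
    return 0;
}
```

Because the harness does nothing but call, check and free, nearly all of the instrumented time is spent inside the conversion routine itself, which is what makes the 200,000 executions/second rate achievable.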

3.3.2.2 UTF-8 to Unicode

The next function that I attempted to fuzz was a ubiquitous function that I noticed being frequently called by other targets while investigating previous implementations: OPENSSL_utf82uni. Surprisingly, there were no targets on OSS-Fuzz explicitly made to fuzz this function alone, purely with the randomly generated data. For that reason, I decided to write a small target that fuzzes the function on its own without any extra functionality (found in appendix C). The function takes in four arguments: the random data, the size of the data, a double pointer uni for the output buffer of unsigned characters, and a pointer unilen for its length. The code coverage with this target was very impressive. After running the target for a long while, it kept covering more and more lines the longer it was left fuzzing. Before the fuzzer was terminated, it was still evident from the logs that the target could detect more lines if it were kept running for longer durations. The following figure describes the code coverage of the target over the executions it performed.

Figure 13: Code coverage for the UTF-8 to Unicode target plotted over the number of executions performed.

For the majority of the first 300 million executions performed by the fuzzer, the code coverage remained at around the 800-line mark. However, at approximately 320 million executions (3.2 on the graph), the fuzzer discovered a new function, which caused the code coverage to jump to 1,100 lines. The fuzzer then slowly continued to climb to cover over 1,200 lines. Looking at the graph, it is evident that the target's code coverage had a slight upwards trend for the last 50 million executions, which means that the target could have encountered another jump had it kept running for a while longer. However, the target had to be terminated as it reached the maximum time allocated to it.
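A jump like this can also be located mechanically from libFuzzer's console output, whose status lines begin with the execution count and include a `cov:` field. The following small sketch performs that analysis; the sample lines below are invented for illustration, loosely shaped like libFuzzer output, and are not real logs from this run:

```python
def coverage_series(log_lines):
    """Extract (executions, covered_lines) pairs from libFuzzer-style
    status lines such as '#3200  NEW  cov: 1100 ft: 1300 exec/s: 57000'."""
    series = []
    for line in log_lines:
        parts = line.split()
        if parts and parts[0].startswith("#") and "cov:" in parts:
            series.append((int(parts[0][1:]),
                           int(parts[parts.index("cov:") + 1])))
    return series

def largest_jump(series):
    """Return (executions, delta) for the biggest coverage increase."""
    best = (0, 0)
    for (_, c0), (e1, c1) in zip(series, series[1:]):
        if c1 - c0 > best[1]:
            best = (e1, c1 - c0)
    return best

# Invented sample log, shaped like the run described in this section.
sample_log = [
    "#100000000   pulse  cov: 800 ft: 900 exec/s: 57000",
    "#320000000   NEW    cov: 1100 ft: 1300 exec/s: 57000",
    "#370000000   pulse  cov: 1210 ft: 1400 exec/s: 56000",
]
```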

In addition to code coverage, the execution rate of the target is also impressive, peaking and stabilising at approximately 57,000 executions per second after hitting the 20 million executions mark. The following figure describes the execution rate of the fuzzing target.

Figure 14: Execution rate for the UTF-8 to Unicode fuzzing target

The figure describes the execution rate for the UTF-8 to Unicode target. It does not cover the entirety of the fuzzing process displayed in figure 13 because of a logging issue, due to which I was unable to reproduce an accurate graph for the target. Despite that, after I manually investigated the logs, the execution rate remained almost stable at 55-57 thousand executions per second for the entire fuzzing process. Moreover, the gradual increase at the start is due to libFuzzer discovering new functions while starting up; once it has loaded everything it requires, the rate stabilises and reflects the real execution rate.
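Because the logging issue prevented an accurate rate graph from being regenerated, the average rate had to be estimated from raw counts. A minimal sketch of that estimate follows, using invented (elapsed-seconds, executions) samples that are merely consistent with the 55-57 thousand executions per second observed:

```python
def average_exec_rate(samples):
    """Estimate executions per second as the overall slope between the
    first and last (elapsed_seconds, executions) samples."""
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (n1 - n0) / (t1 - t0)

# Invented samples: one minute apart, roughly 57k executions per second.
samples = [(0, 0), (60, 3_420_000), (120, 6_840_000)]
```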

Using the data gathered in this subsection, it is apparent that this target – in addition to other potential encoding functions targets – would benefit significantly from OSS-Fuzz’s cluster fuzzing technology and continuous testing capabilities. Doing so would also allow for increased code coverage on some of the most frequently used functionalities in the OSSL library.

Chapter 4 Results and Discussions

This chapter outlines, highlights and further discusses the results and findings discovered while experimenting with the projects in chapter 3. I therefore strongly recommend that the reader review that chapter, specifically sections 3.2 and 3.3, to more fully understand this one.

4.1 Discovered bugs and vulnerabilities

While working on this project, one of its aims was to fuzz targets in the OSSL library in an attempt to detect a vulnerability that had not been detected before. However, this was not the main aim of the project but more of a nice-to-have, since Google's clusters are continuously fuzzing the projects with computing power thousands of times greater than my own. This aim became even less significant after discovering more critical and substantial issues with the fuzzing process than merely detecting vulnerabilities. These issues, as discussed in chapter 3, were part of the reason why not many vulnerabilities or bugs were being detected in the first place. Hence, investigating and providing solutions to these issues was of greater importance.

Moreover, discovering new vulnerabilities or bugs is not that common either, especially in OSSL's history. Looking at OSS-Fuzz's logs for the OSSL library's fuzzing targets, only a handful of actual bugs have been detected within the library's codebase in the past year. The majority of these bugs were due to the team modifying a few lines or creating a new function that would break the fuzzer [34]. It is easier to detect bugs and vulnerabilities if the fuzzer is already running on OSS-Fuzz, which always uses the latest release, rather than having to manually download each new release every once in a while.

Even though discovering new vulnerabilities or bugs was not the primary goal of this project, targets were still fuzzed for days in the hope of detecting bugs, especially the targets that were modified and updated, and the new targets written for the functions investigated in sections 3.2 and 3.3. However, no new bugs or vulnerabilities were detected throughout the project, including in the new and modified targets. These results do not mean that no bugs or vulnerabilities exist in the functions being fuzzed. As mentioned in section 3.3.2.2 and as can be seen in figure 13, one of the new targets that I wrote, UTF-8 to Unicode, was fuzzed for days without discovering any new vulnerabilities, yet even as it reached the maximum fuzzing time allocated to it for this project, its code coverage continued to increase. This increase means there is a high probability of a new function being discovered by the fuzzer if it were fuzzed for longer and if more computing resources were allocated to it. For that reason, this target, alongside the other newly written and modified targets discussed in chapter 3, will be committed to the OSSL repository so that they may be fuzzed on OSS-Fuzz.

4.2 Code coverage and fuzzing performance

Despite not discovering new vulnerabilities or bugs, more significant discoveries were made concerning the code coverage and overall fuzzing performance of targets currently being fuzzed for open-source cryptographic and TLS/SSL libraries, especially OpenSSL. These findings will help improve currently existing targets, hence increasing the total code coverage of the library. They will also help developers write better new targets once they understand the impact of the findings discovered in chapter 3.

The overall code coverage by fuzzers of the OSSL library (and LibreSSL, since both use the same targets) is relatively low compared to other testing methods. For such a security-focused library, critical testing of the majority of the codebase is essential, and developers and contributors must spend more time improving it. Some of the experiments I performed in chapter 3 were explicitly focused on increasing code coverage and the execution rate of fuzzers, hence improving their efficiency and their chance of detecting a bug or vulnerability in the library. The results I obtained from the improvements were significant, and they supported an earlier statement: currently existing targets focus on too many functionalities all at once and must be divided into smaller targets to cover more of the codebase. The issue includes having too many function calls and testing multiple features all in one target, and also having very commonly used functionality fuzzed only within other targets that are not directly meant for it, such as encoding functions. These functions are frequently used by targets to convert the randomly generated data into something that the program can understand. Testing these functionalities in their own targets significantly increases the code coverage for the entire library, therefore strengthening the entire codebase. Moreover, this allows for a more efficient fuzzing process: the more a target is split and made more specific, the better it performs, and it achieves a significantly higher execution rate than when it tests multiple functions.

Overall, the code coverage in OSSL and other open-source security libraries, specifically forks of OSSL, needs significant improvement. As mentioned, performed and tested in chapter 3, developers need to study the codebase and documentation more to understand what needs to be fuzzed first, based on how commonly a function is used and how vital it is in terms of ensuring the library's and users' security. Moreover, currently existing targets are great at testing some of the primary and essential functionalities in OSSL; however, many of them are very heavy and would be better split up and tested separately. As the experiments in chapter 3 showed, splitting up targets does indeed improve performance and increases the overall tested code coverage over a shorter period. It also allows the targets to be more easily maintained: if a new bug or vulnerability were detected by a smaller target, the debugging and bug-tracking process would be much more straightforward, taking up less of the developers' time.

Moreover, as mentioned previously, the same experiments also improved the execution rate of all the fuzzing targets that were modified or newly created. The higher execution rate in turn allowed for a much higher overall code coverage over the same period. Splitting up targets allowed single cores to be better utilised for processing and fuzzing these targets, hence increasing the number of possible fuzzing targets running on one machine while also increasing the execution rate and code coverage at the same time, without compromising anything, except possibly a higher power bill due to the increased CPU usage.
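The throughput argument above can be made concrete with a deliberately simple back-of-envelope model. All numbers below are illustrative (the rates are taken loosely from sections 3.2-3.3, and coverage is assumed to grow linearly with executions, which real fuzzing does not do), so this is a sketch of the reasoning rather than a measurement:

```python
def lines_reached(exec_rate, lines_per_million_execs, hours):
    """Toy model: lines covered grow in proportion to total executions.
    All parameters are illustrative assumptions, not measured values."""
    total_execs = exec_rate * 3600 * hours
    return total_execs / 1_000_000 * lines_per_million_execs

# One heavy target exercising three features at 10,000 exec/s for a day...
combined = lines_reached(10_000, 5, hours=24)

# ...versus the same features split into three light targets at
# 57,000 exec/s each, fuzzing in parallel on separate cores.
split = sum(lines_reached(57_000, 5, hours=24) for _ in range(3))
```

Even under this crude model the split targets cover an order of magnitude more per day, which matches the qualitative result of the chapter 3 experiments.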

Chapter 5 Conclusions

5.1 Summary and conclusions

This thesis focused on studying, investigating, improving, experimenting with and writing fuzzing solutions for open-source web protocol implementations, focusing explicitly on the OpenSSL library. The OpenSSL library is the most common and widely deployed open-source implementation of the TLS/SSL protocol on the web, and the majority of other open-source implementations are forks of it. However, the library's development is mainly focused on writing new implementations and improving current functionality, with testing placed at a lower priority, as perceived from the commits on the library's official repository.

One of the primary and prominent issues behind the lack of fuzzing, and testing in general, was the low funding put into the development and maintenance of the library. Large tech corporations, especially Google, invested heavily in the OpenSSL organisation to ensure that the team continues to develop and work on the project, while also creating OSS-Fuzz as an open-source fuzzing cluster for OpenSSL and other open-source applications. Despite that, not much effort has been put into writing more and better fuzzing targets for the library. Some of the development effort needs to be focused on fuzzing the library's essential and security-critical functions.

While investigating existing targets and solutions for the three most commonly used and deployed implementations (OpenSSL, LibreSSL and BoringSSL), the findings were fascinating. LibreSSL, a fork of OpenSSL, has relatively low development effort put into it in comparison to the other implementations. This low development effort is also evident in its fuzzing targets, which were exact copies of OpenSSL's with minor modifications so that they work with LibreSSL; the targets were otherwise left largely unmodified and rarely improved upon. BoringSSL, on the other hand, has palpably high development effort put into the project, including testing and fuzzing. The BoringSSL team have significantly improved upon OpenSSL's fuzzing targets and continue to do so, more frequently than any other TLS/SSL implementation. The results of these improvements were evident in OSS-Fuzz's logs, where the fuzzer detected a handful of bugs, not necessarily security vulnerabilities, in 2020 alone. OpenSSL's fuzzing targets, by contrast, have seen only minor improvements over the years. After investigating the way they were implemented, it was also palpable that they were very 'brute-forced' and were testing multiple functionalities all at once. As discussed previously in section 3.2 and chapter 4, the execution rate of these targets was much lower than it would be if they were optimised, meaning it takes longer to cover a given amount of the codebase in comparison to a more optimised target over the same period. After optimising, rewriting and modifying the targets, and, more importantly, after splitting up large targets into smaller ones, they were able to fuzz the library more efficiently. This improvement would allow the fuzzers to work more efficiently across multiple CPU cores and detect bugs and vulnerabilities in a fraction of the time they would have taken as one target.

In addition to improving currently existing targets, new targets were written with the knowledge obtained from the background and literature review and the experiments performed on previous implementations. These targets were aimed at fuzzing and testing some of the extremely common functions that are frequently, but not always, exercised by fuzzers, such as the encoding and decoding functions used to convert inputted data into a format supported by OpenSSL. Fuzzing these small yet essential functions increased the library's overall code coverage by a noticeable 15-20% relative to the current code coverage on the official repository. This increase in code coverage was also achieved in a relatively short period with a fraction of the computing resources assigned to the targets currently being fuzzed. This raised even more concern that not enough effort is being put into writing new targets, despite the available computational resources being of no concern, thanks to Google's OSS-Fuzz.

The findings from studying, investigating and experimenting with current literature and work will help increase the total code coverage of security-critical implementations of web protocols through fuzz testing, while at the same time reducing the time taken to fuzz individual targets. This will help detect more bugs and vulnerabilities in functions that would previously have gone untested due to low code coverage and short testing time. Therefore, this thesis completed and achieved all of its stated goals and tasks within the timeframe assigned, providing new findings on the current status of fuzzing implementations and new solutions, based on those findings, for open-source web protocol implementations, with a focus on OpenSSL.

5.2 Possible future work

Despite the thesis achieving all of its stated goals, more work can be done to improve the overall security of open-source web protocol implementations, especially with regard to fuzz testing. One task that could be performed is to create a pull request on OpenSSL's official repository to update the fuzzing directory with the changes performed. This task would require some prior communication with the OpenSSL organisation and development team to ensure that the new targets would work appropriately and seamlessly with the current targets and the OSS-Fuzz cluster. Moreover, the 'README' document in the fuzzing directory could include some of the findings in this thesis to help new and current developers write better targets. If the OpenSSL team approves these changes, they will be merged, and all current and future targets can take these findings into account.

Another field that could be explored is how to utilise graphics processing units (GPUs) to fuzz targets. Similar to how blockchain technology utilises GPUs to perform massively parallel computations, fuzzing engines could be tuned to utilise GPUs to perform the input generation and fuzzing. This would allow many instances to run at once in a much more efficient manner than on a CPU, since each CPU core processes only one operation at a time. One reason why fuzzing engines do not utilise GPUs is that GPUs do not read and process code (such as C and C++) in the same manner that x86 CPUs, for example, do.

Another reason why current fuzzing engines do not utilise GPUs today is that they were created at a time when GPUs were much weaker and more expensive for the performance they provided in comparison to CPUs. Also, since programming on CPUs was a lot more common back then, more developers were familiar with CPU architectures than GPU ones. However, GPUs today are becoming so much more powerful that further research in this field would be justified. Such research can only occur if large corporations or universities that show interest in security and fuzzing, such as Google, allocate funds for significant research to organisations and development teams that work on compilers and fuzzers.

An article by Ryan Eberhart of Stanford University was published on Security Boulevard by the writer Noël Ponthieux a few weeks before the publishing of this thesis [35], which briefly researched and discussed the potential of fuzzing with GPUs. The author was also able to fuzz a simple program using a GPU after solving all of the issues they faced. As mentioned earlier, one of the concerns raised with using GPUs is the difficulty associated with converting CPU code into GPU code. The article also discussed another issue: the difficulty of managing a GPU's memory. GPUs do not have a simple memory management system in place, and due to the way fuzzers behave, programs could very quickly crash due to memory errors rather than bugs or vulnerabilities. These issues can be resolved with already-existing solutions, some of which were implemented in blockchain technology, though they may require extra work to be implemented in fuzzing engines.

Moreover, dynamic testing in conjunction with fuzzing is also a beneficial and exciting topic to research further, particularly how it could be implemented in currently existing targets, since the majority of them mainly rely on static test harnesses. Implementing dynamic testing capabilities in current targets would cover more test cases and allow the fuzzer to increase its code coverage.

Appendices

A. Data analyser and plotter for libFuzzer results – code coverage

#!/usr/bin/python
import sys
import random
import matplotlib.pyplot as plt

execution = []
coverage = []
count_of_lines = 0
rate = 100

file_lines = open(sys.argv[1], "r").read().splitlines()
for x in file_lines:
    split_spaces = x.split(" ")
    # libFuzzer status lines begin with the execution count, e.g. "#1000".
    if split_spaces[0].startswith("#"):
        while count_of_lines < int(split_spaces[0].split("\t")[0][1:]):
            count_of_lines += rate
            execution.append(int(count_of_lines))
            coverage.append(int(split_spaces[split_spaces.index("cov:") + 1]))

plt.plot(execution, coverage)
plt.title('Code Coverage - ' + sys.argv[2])
plt.xlabel('Executions x ' + str(rate))
plt.ylabel('Coverage')
# Save before show(): with interactive backends, calling savefig() after
# show() can write an empty figure.
plt.savefig('plot_' + sys.argv[2] + '.png')
plt.show()
plt.close()

# Write the raw series to a CSV file with a random suffix so repeated
# runs do not overwrite each other's data.
suffix = random.randint(0, 50000)
with open('data-' + sys.argv[2] + str(suffix) + '.csv', 'w') as file:
    for i in range(len(execution)):
        file.write(str(execution[i]) + ',' + str(coverage[i]) + '\n')

B. Unicode to ASCII fuzzer

#include <stdint.h>
#include <openssl/crypto.h>
#include <openssl/pkcs12.h> /* declares OPENSSL_uni2asc() */

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    char *output;

    /* Convert the fuzzer-supplied "Unicode" buffer to ASCII. */
    output = OPENSSL_uni2asc((unsigned char *)data, (int)size);

    /* Free the converted string (allocated by OpenSSL) to avoid
     * leaking memory between fuzzing iterations. */
    if (output != NULL) {
        OPENSSL_free(output);
    }

    /* Returning 0 tells libFuzzer to continue with the next input. */
    return 0;
}

C. UTF-8 to Unicode fuzzer

#include <stdint.h>
#include <openssl/crypto.h>
#include <openssl/pkcs12.h> /* declares OPENSSL_utf82uni() */

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    unsigned char *output;
    unsigned char *uni = NULL; /* receives the converted Unicode buffer */
    int unilen = 0;            /* receives the length of that buffer */

    /* Pass the addresses of uni and unilen; the function stores the
     * converted buffer and its length through these pointers. */
    output = OPENSSL_utf82uni((const char *)data, (int)size, &uni, &unilen);

    /* output aliases the buffer stored in uni, so one free suffices. */
    if (output != NULL) {
        OPENSSL_free(output);
    }

    return 0;
}

D. Example CSV data produced to analyse results

x          y
100        2
200        2805
300        2805
400        2805
500        2805
600        2805
700        2805
800        2805
900        2805
1000       2805
1100       2805
1200       2805
1300       2805
1400       2805
1500       2897
1600       2897
1700       2897
1800       2897
1900       2902
2000       2902
2100       2902
2200       2903
2300       2903
2400       2905
2500       2905
2600       2931
2700       2931
2800       2931
…          …
37303700   4207

Table 1: Example CSV data used to collect results, where x usually describes the number of executions so far and y describes the type of data being collected, such as code coverage in the example above.

Bibliography

[1] P. Godefroid, M. Y. Levin, and D. Molnar, "SAGE: Whitebox Fuzzing for Security Testing," Queue, vol. 10, no. 1, pp. 20-27, 2012, doi: 10.1145/2090147.2094081.
[2] D. Ince, "Hypertext Transfer Protocol Secure," 3rd ed., 2013.
[3] J. Viega, M. Messier, and P. Chandra, Network Security with OpenSSL: Cryptography for Secure Communications. O'Reilly Media Inc., 2009.
[4] S. Chen, K.-K. R. Choo, X. Fu, W. Lou, and A. Mohaisen (eds.), Security and Privacy in Communication Networks: 15th EAI International Conference, SecureComm 2019, Orlando, FL, USA, October 23-25, 2019, Proceedings, Part II (Lecture Notes of the Institute for Computer Sciences, Social Informatics, and Telecommunications Engineering; 305). Springer, 2019.
[5] B. Möller, T. Duong, and K. Kotowicz, "This POODLE Bites: Exploiting The SSL 3.0 Fallback," September 2014.
[6] I. Ristic, "Poodle Bites TLS," Qualys. https://blog.qualys.com/ssllabs/2014/12/08/poodle-bites-tls (accessed 20 March, 2019).
[7] T. Dierks and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2," Standards Track, RTFM, Inc., August 2008. [Online]. Available: https://tools.ietf.org/html/rfc5246
[8] T. Dierks and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.1," Standards Track, RTFM, Inc., April 2006. [Online]. Available: https://tools.ietf.org/html/rfc4346
[9] E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.3," Standards Track, Mozilla, August 2018. [Online]. Available: https://tools.ietf.org/html/rfc8446
[10] OpenSSL, "Welcome to OpenSSL!" OpenSSL Software Foundation. https://www.openssl.org/ (accessed 21 March, 2020).
[11] Stunnel, "About Stunnel." https://www.stunnel.org/ (accessed 12 March, 2020).
[12] B. Buchanan, "Goodbye OpenSSL, and Hello To Google Tink." [Online]. Available: https://medium.com/asecuritysite-when-bob-met-alice/goodbye-openssl-and-hello-to-google-tink-583163cfd76c
[13] P.-H. Kamp, "Quality software costs money - Heartbleed was free," Communications of the ACM, vol. 57, no. 8, pp. 49-51, 2014.
[14] M. Carvalho, J. DeMott, R. Ford, and D. A. Wheeler, "Heartbleed 101," IEEE Security & Privacy, vol. 12, no. 4, pp. 63-67, 2014.
[15] M. Thurman, "The Heartburn of Heartbleed," Computerworld, vol. 48, no. 9, p. 25, 2014. [Online]. Available: http://search.proquest.com/docview/1551040325/
[16] Heartbleed, "The Heartbleed Bug," Synopsys, Inc. https://heartbleed.com/ (accessed 10 March, 2020).
[17] Shodan, "Heartbleed Report," Shodan.io. https://www.shodan.io/report/0Wew7Zq7 (accessed 11 March, 2020).
[18] M. Aizatsky, K. Serebryany, O. Chang, A. Arya, and M. Whittaker, "Announcing OSS-Fuzz: Continuous Fuzzing for Open Source Software." California: Google, 2016.
[19] Google, "Tink." https://github.com/google/tink (accessed 12 March, 2020).
[20] Google, "BoringSSL." https://boringssl.googlesource.com/boringssl/ (accessed 13 March, 2020).
[21] C. Chen, B. Cui, J. Ma, R. Wu, J. Guo, and W. Liu, "A systematic review of fuzzing techniques," Computers & Security, vol. 75, p. 118, 2018. [Online]. Available: http://search.proquest.com/docview/2084463443/
[22] A. Takanen, J. DeMott, and C. A. Miller, Fuzzing for Software Security Testing and Quality Assurance (Artech House Information Security and Privacy Series). Artech House, 2008.
[23] B. Chen, A. Kim, and J. Lam, "Fuzzing OpenSSL," 2019.
[24] Google, "Open sourcing ClusterFuzz," 7th February 2019. [Online]. Available: https://opensource.googleblog.com/2019/02/open-sourcing-clusterfuzz.html
[25] G. Evron et al., Open Source Fuzzing Tools. Syngress Pub., 2007.
[26] "american fuzzy lop (2.52b)." lcamtuf.coredump.cx. https://lcamtuf.coredump.cx/afl/ (accessed 14 March, 2020).
[27] base-builder. (2016). Google, GitHub. Accessed: 4 August 2020. [Online]. Available: https://github.com/google/oss-fuzz/tree/master/infra/base-images/base-builder
[28] openssl. (2020). OpenSSL Software Foundation, GitHub. Accessed: 7 August 2020. [Online]. Available: https://github.com/openssl/openssl
[29] OpenSSL, "Fuzzing OpenSSL." https://github.com/openssl/openssl/blob/master/fuzz/README.md (accessed 10 August, 2020).
[30] OpenSSL, "SSL_CTX_set_verify." https://www.openssl.org/docs/man1.1.1/man3/SSL_get_ex_data_X509_STORE_CTX_idx.html (accessed 1 September, 2020).
[31] libressl-portable. (2020). OpenBSD. [Online]. Available: https://github.com/libressl-portable/portable
[32] boringssl. (2020). Google, GitHub. Accessed: 3 September 2020. [Online]. Available: https://github.com/google/boringssl
[33] OpenSSL, "SSL_read_early_data." https://www.openssl.org/docs/man1.1.1/man3/SSL_read_early_data.html (accessed 12 September, 2020).
[34] oss-fuzz issue tracker, OpenSSL project issues. Google. https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&q=proj%3Aopenssl&can=1 (accessed 3 October, 2020).
[35] R. Eberhart, "Let's build a high-performance fuzzer with GPUs!" Security Boulevard. https://securityboulevard.com/2020/10/lets-build-a-high-performance-fuzzer-with-gpus/ (accessed 24 October, 2020).
