MASARYK UNIVERSITY FACULTY OF INFORMATICS

Secure coding in modern C++

MASTER'S THESIS

Bc. Matěj Plch

Brno, Spring 2018

This is where a copy of the official signed thesis assignment and a copy of the Statement of an Author is located in the printed version of the document.

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Bc. Matěj Plch

Advisor: RNDr. Jiří Kůr, Ph.D.

Acknowledgements

I would like to thank my supervisor Jiří Kůr for his valuable guidance and advice. I would also like to thank my parents for their support throughout my studies.

Abstract

This thesis documents how using modern C++ standards can help with writing more secure code. We describe common vulnerabilities and show new language features which prevent them. We also describe coding conventions and tools which help programmers use modern C++ features. This thesis can be used as a handbook for programmers who would like to know how to use modern C++ features for writing secure code. We also perform an extensive static analysis of open source C++ projects to find out how often obsolete C++ constructs are still used in practice.

Keywords

secure coding, modern C++, vulnerabilities, ISO standard, coding conventions, static analysis

Contents

1 Introduction
1.1 C++ standardization
1.2 Thesis structure

2 Common vulnerabilities
2.1 Buffer overflow
2.2 Use-after-free
2.3 Memory leak
2.4 Double free
2.5 Integer overflow
2.6 Race condition
2.7 Uninitialized variable
2.8 Units mismatch

3 Features improving code security
3.1 Garbage collector support
3.2 Smart pointers
3.2.1 Unique pointer
3.2.2 Shared pointer
3.2.3 Utility functions
3.3 Arrays with known size
3.3.1 Span
3.4 Compile time assertions
3.5 Type-safe units
3.6 Random numbers library
3.7 Ranges
3.8 Attributes
3.9 Concurrent programming
3.9.1 Lock guard
3.9.2
3.9.3 Atomics
3.9.4 Parallel algorithms
3.10 Variadic templates
3.11 Class member initialization

4 Coding conventions
4.1 C++ Core Guidelines
4.1.1 Guideline Support Library
4.2 SEI CERT C++ Coding Standard
4.3 High Integrity C++ Coding Standard
4.4 AUTOSAR C++14 Coding Guidelines
4.5 C++ Best Practices

5 Tools helping with modern C++
5.1 Clang Static Analyzer
5.2 Cppcheck
5.3 ECLAIR
5.4 Klocwork
5.5 LDRA tool suite
5.6 Parasoft C/C++test
5.7 QA C++
5.8 Clang-tidy

6 Other languages
6.1
6.2 C#
6.3 Python
6.4 Rust
6.5 Go
6.6 Haskell

7 Analysis of C++ projects
7.1 Methodology
7.2 Analysis execution
7.2.1 Implementation details
7.3 Results
7.3.1 Combinations of memory issues
7.4 Conclusions

8 Conclusions

Bibliography

A Attachments

1 Introduction

Security of systems is not always a matter of security policies, authentication, or encryption. To compromise a system, it is sometimes sufficient to exploit a bug in the code. Programmers should be aware of possible vulnerabilities in code and strive to write code without security flaws. In most cases such errors do not pose a problem for the normal functionality of a system, but a motivated attacker can use them to cause unexpected behavior of the program, breaking the system's security as a result.

With respect to code security, C++ has a bad reputation. The language was created as an extension to the C language, which does not have any mechanisms to protect programmers from writing exploitable code by mistake.

There exist other programming languages in which many of the imperfections of C are resolved, but they are not capable of replacing C++. Some languages offer better security than C++, but at the cost of overhead. The main focus of C++ is to be as fast as possible, with zero-cost abstractions and no overhead. These properties are vital for applications where maximum performance is important, for example in embedded systems, where computing resources are limited, in supercomputers, where faster code allows larger simulations to be calculated, or in data centers, where large numbers of requests must be satisfied in a limited time. Fast code is also more cost effective: performing computations in less time with fewer resources leads to lower energy consumption. Embedded and mobile systems are then able to run longer on battery power, and large data centers save money on electricity and cooling. Some languages claim to be both fast and secure, but these languages are relatively new, and maturity is also an important quality for a technology to be adopted in the industry.

The aim of this thesis is to explore the possibilities of writing C++ code that is not prone to common security vulnerabilities, with a focus on features introduced in modern C++ standards. This thesis should serve as a brief handbook for C++ programmers who want to learn how they can use modern C++ features to write more secure code. We show how specific features improve code security, together with brief examples of how to use them and information about the C++ standard in which they were introduced. We also perform an extensive analysis of open source C++ projects to find out how much obsolete C++ features are still used in practice.

1.1 C++ standardization

C++ is a relatively old programming language. It first appeared as an object-oriented version of C called C with Classes, created at Bell Labs by Bjarne Stroustrup. The name C++ was introduced in 1983, and the first commercial release followed in 1985. In 1998 C++ achieved an important milestone, when it was standardized as the international standard ISO/IEC 14882:1998, also called C++98.

Standardization is important because it gives a technology stability, an important aspect for adoption in the industry. The C++ standardization committee is divided into domain-specific study groups, which recommend submitted proposals for final voting and standardization. The ISO C++ committee consists of many experts from the industry, working for companies interested in C++, such as Google, Microsoft, Qualcomm, The Qt Company, and many more.

Development of C++ did not stop with C++98. In 2003, a maintenance release known as C++03 was published, fixing defects discovered in the previous standard. Another standard was released 8 years later, in 2011, and was a significant extension of the language. C++11 added many features for more convenient and secure programming, making C++ a more modern language. C++11 is acknowledged as the beginning of modern C++. To keep pace with advances in software development, the standardization committee decided to release new C++ standards every 3 years. C++14 followed in 2014, and C++17 is to this day the latest standard, published in 2017 [1]. The next revision [2] is expected to be published in 2020 as C++20.

1.2 Thesis structure

The first chapter is the introduction. In Chapter 2, we describe common vulnerabilities present in code. Chapter 3 presents features of modern C++ which improve code security. In Chapter 4, we list coding conventions which recommend using modern C++ features. Chapter 5 follows up with a collection of software tools which help programmers use modern C++. In Chapter 6, we briefly study how the security issues addressed by modern C++ standards are present in other popular languages. A large-scale analysis of open source C++ projects is described in Chapter 7. In the last chapter we conclude this thesis and summarize the status of secure coding in C++.

2 Common vulnerabilities

Errors in code caused by a programmer can lead to undesired behavior of a program: they can cause crashes, execution of arbitrary code, or even escalation of privileges. In this chapter we briefly describe the most common types of programming errors, what harm they can cause, and whether there are mitigations preventing successful exploitation by an attacker.

2.1 Buffer overflow

Buffer overflow is an error caused by accessing memory outside the boundaries of an array [3]. It is usually caused by writing too many bytes to an array without checking the size of the array. An attacker can use knowledge of the layout of the execution stack to change values of variables or to execute arbitrary code. To change the value of a variable, the attacker must know where on the stack the variable lies and overflow the buffer with bytes of specific values to overwrite the memory location of the variable. Similarly, to execute arbitrary code, the attacker must overwrite the return address on the stack with a specific address where code is prepared for execution. A common form of this technique, return-oriented programming (ROP), reuses pieces of code already present in the program.

In C++, accessing memory past the end of an array is undefined behavior, meaning that anything can happen. Modern C++ compilers extensively take advantage of undefined behavior for optimization: they simply assume that undefined behavior never happens. Thanks to this approach compilers can generate faster code, but they do not provide implicit security checks to prevent undefined behavior, such as throwing an exception when a buffer overflow happens.

For security reasons, some compilers offer an optional mechanism called canaries to detect buffer overflows at run time. A value called a canary is inserted into the execution stack next to the stack's control information. When the program returns from a function call, it checks whether the value of the canary has changed. A change of the value suggests that the stack was overwritten, and the program is terminated. The value of the canary can be randomized at run time to prevent an attacker from knowing it and hiding the buffer overflow by writing the same value to the position of the canary. The downside is that instrumenting binaries with canaries incurs a performance penalty.

There is also a mechanism called shadow stack, which is able to detect an unwanted change of the return address. A shadow stack is a stack parallel to the main stack which holds only return addresses. When a function call ends, the return address on the main stack must match the address on the shadow stack; in case of a mismatch the program is terminated. Intel offers shadow stacks as part of their Control-flow Enforcement Technology [4].

To prevent successful execution of arbitrary code by changing the return address on the stack, operating systems implement address space layout randomization (ASLR). Thanks to ASLR, programs are loaded at a randomly chosen address in memory, making it much harder for an attacker to predict where to find functions useful for a ROP attack. However, attacks for bypassing ASLR have been published [5].
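Returning to the root cause named at the beginning of this section, a write that does not check the size of the array, the following minimal sketch shows what such a check can look like. The helper name checked_copy and the choice of std::length_error are our own assumptions for illustration; they are not a standard facility.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical helper: copies `count` bytes into a fixed-size buffer and
// refuses the request when it would write past the end of the buffer.
// The size N is deduced from the array type, so it cannot get out of sync.
template <std::size_t N>
void checked_copy(char (&dest)[N], const char* src, std::size_t count) {
    assert(src != nullptr);
    if (count > N) {
        throw std::length_error("source does not fit into destination");
    }
    std::memcpy(dest, src, count);
}
```

Calling checked_copy with a 4-byte buffer and a count of 9 throws instead of overwriting adjacent memory, whereas a bare std::memcpy with the same arguments would have undefined behavior.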

2.2 Use-after-free

A use-after-free error is caused by incorrect manual memory management [6]. When a dynamically allocated object is released, a pointer which pointed to the object still points to the same memory address. Using this dangling pointer as if it still pointed to an object may cause termination of the program by the operating system.

Moreover, a use-after-free error can lead to execution of arbitrary code. If the released object was an instance of a class with virtual methods, then its memory also contains a pointer to the vtable, a table of virtual function pointers. It can be possible for an attacker to change the memory so that it refers to code chosen by the attacker, for example by allocating a different object at the same address. A subsequent call of a virtual method through the dangling pointer then uses the overwritten pointer and calls the attacker's code. For this attack to be successful, an attacker usually needs to perform heap spraying, which allocates a large chunk of memory and fills it with the desired bytes, so that the bytes of an address potentially land at the right position.

Defenses in operating systems against this type of memory-related error are similar to the defenses against the previously mentioned buffer overflow.

2.3 Memory leak

A memory leak is a situation in which allocated memory is not properly released, usually because the programmer forgot to call the deallocation function [7]. Leaked memory is therefore not released until the program terminates. Repeated memory leaks can exhaust the available system memory and crash the program, allowing an attacker to perform a denial of service (DoS) attack.

2.4 Double free

In C++, calling free() or delete twice on the same pointer to dynamically allocated memory is a double free error. According to the C++ standard, double free causes undefined behavior. In reality, a double free may corrupt the memory manager, which an attacker can exploit to crash the program or execute arbitrary code [8].

2.5 Integer overflow

Integer values in C++ are limited by their binary width. For example, uint32_t is an unsigned integral type stored in 32 bits (4 bytes). The minimal value representable by this type is 0, the maximal is 2^32 - 1. Adding 1 to the maximal value causes the variable to overflow the range it can represent and wrap around to its minimal value, while subtracting 1 from the minimal value causes the variable to underflow to the maximal value. Integer overflow can be exploited by subtracting from an unsigned variable to underflow it to a large value, and then using this value as the size of a buffer to perform a buffer overflow [9].
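The wrap-around described above can be reproduced in a few lines. The following sketch computes the remaining space in a buffer once without and once with a check; the function names and the choice of std::range_error are hypothetical and serve only this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Unchecked: when `used` is greater than `capacity`, the unsigned result
// wraps around to a very large value instead of becoming negative.
inline std::uint32_t remaining_unchecked(std::uint32_t capacity, std::uint32_t used) {
    return capacity - used;
}

// Checked: rejects exactly the inputs which would wrap around.
inline std::uint32_t remaining_checked(std::uint32_t capacity, std::uint32_t used) {
    if (used > capacity) {
        throw std::range_error("used exceeds capacity");
    }
    return capacity - used;
}
```

With a capacity of 0 and 1 byte used, the unchecked variant returns 2^32 - 1, which is the kind of value that later ends up as a buffer size; the checked variant reports the error at the point where it originates.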

2.6 Race condition

In concurrent programs where multiple threads perform calculations on common data, the programmer must ensure that shared resources are accessed exclusively by one thread at a time [10]. Modifying data from multiple threads without synchronization can produce incorrect results, causing unexpected behavior of the program. For example, two competing threads can inconsistently modify a shared integer value, causing an integer overflow, which can lead to a buffer overflow.
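As a minimal sketch of exclusive access to a shared value, the following counter serializes its increments with a mutex. The class guarded_counter and the function count_concurrently are hypothetical names introduced only for this example; depending on the platform, linking against a threading library may be required.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical counter: every access to _value happens while holding _mutex.
class guarded_counter {
public:
    void increment() {
        std::lock_guard<std::mutex> lock(_mutex);
        ++_value;
    }
    int value() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _value;
    }
private:
    mutable std::mutex _mutex;
    int _value = 0;
};

// Starts `threads` threads, each incrementing one shared counter
// `per_thread` times, and returns the final value after joining them.
inline int count_concurrently(int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    guarded_counter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) counter.increment();
        });
    }
    for (auto& worker : workers) worker.join();
    return counter.value();
}
```

Without the lock, the increments of different threads could interleave and some of them would be lost; with it, the result always equals threads times per_thread.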

2.7 Uninitialized variable

Variables of built-in types are not implicitly initialized when declared. The value of an uninitialized variable is determined by the bytes which happen to be in the memory location where it is stored. Using uninitialized variables in code can lead to unexpected control flow of the program. It is possible for an attacker to influence the initial values of uninitialized variables to change the behavior of the program [11].
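The difference between a declaration without an initializer and an explicitly initialized one can be sketched as follows. The structure packet_header is a hypothetical example type, not taken from any real protocol.

```cpp
#include <cassert>

// Hypothetical aggregate consisting only of built-in types.
struct packet_header {
    int length;
    bool encrypted;
};

inline packet_header make_header() {
    // packet_header h;   // members would hold indeterminate values
    packet_header h{};    // empty braces initialize every member to zero
    return h;
}
```

Because the braces cost nothing to write and give every member a known value, initializing at the point of declaration removes the dependency on whatever was previously stored in that memory.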

2.8 Units mismatch

Values of different units of measurement are usually represented by common types, such as floats or integers. Using a value representing one unit as a value of a different unit without the necessary conversion causes the program to return incorrect results.

Mars Climate Orbiter, a space probe built to study the Martian atmosphere, was lost in 1999. Software for calculating engine thrust impulses, supplied by Lockheed Martin, provided results in pound-seconds. These results were used by software developed by NASA, which calculated the trajectory but expected input values in newton-seconds. As a result the probe approached Mars on too low an orbit and burnt up in the atmosphere. The error was caused by Lockheed Martin delivering software which did not match the Software Interface Specification [12].

3 Features improving code security

Apart from adding features for more convenient and effective programming, modern C++ standards provide facilities for doing previously insecure operations in a secure way. Unfortunately, it is not possible to simply remove insecure features to make the language better, because backwards compatibility and language stability are very important for the adoption of the language in the industry. Instead, modern C++ standards introduce new, more secure features, and discourage programmers from using the old, insecure ones. Only some features are deprecated and removed from the standard, because they cause harm and do not bring significant benefits.

This chapter presents features introduced by modern C++ standards which improve code security. These features range from improved management of various resources, through thread safety, to type safety enforced by compilers at compile time and run time to prevent undefined behavior. Detailed documentation of all the features we describe here can be obtained online at C++ reference [13], which is unofficial, but is the only complete documentation available to C++ users.

3.1 Garbage collector support

Many programming languages use a garbage collector (GC) to automatically release memory which is not used anymore. C++11 introduced an interface for garbage collection and reachability-based leak detection [14] to prevent memory leaks. However, implementing GC is not mandatory for standard library implementations. The main implementations of the C++ standard library, namely those shipped with GCC, Clang, and MSVC, do not provide a GC implementation.

There are two reasons why GC is omitted from implementations. The first reason is that it would introduce overhead, while C++ is used precisely because it has little to no overhead. The second reason is that programmers would still need to use pointers carefully even with GC. It is possible in C++ to store a pointer in a value of a non-pointer type, effectively hiding the pointer. GC could then release the memory referenced by the hidden pointer, but the pointer can later be recovered and used to access the now deallocated memory, causing a use-after-free error.

3.2 Smart pointers

In C++, there is usually no need to hold a pointer to an array of multiple values, because such use cases should be handled by containers from the standard library, such as std::vector or std::string. However, there is a need to use a pointer to a single object, so that the object can be used polymorphically or shared between different parts of a program.

Raw pointers to objects obtained by operator new have the disadvantage that the programmer must not forget to release them with operator delete. If the programmer fails to do so, the memory is leaked and there is no way to release it. Programmers are discouraged from using raw pointers for owning memory and should use the classes introduced in C++11 for memory management.

Smart pointers are classes which use the programming idiom resource acquisition is initialization (RAII) for memory management. RAII enables objects to hold a resource and correctly release it upon destruction. This is desirable in C++, because destructors of objects whose lifetime ends are called in a deterministic way, even if the end of their lifetime is caused by a thrown exception. C++ previously offered one smart pointer, std::auto_ptr, which was deprecated in C++11 and removed from the language in C++17 because of its unintuitive copy semantics. Copying an auto pointer transfers ownership of memory from the original to the new instance, which is not how copying should work. Because of this, it is also tricky to store auto pointers in containers. Modern smart pointers offer different copy semantics to provide programmers with RAII-based memory management with clear semantics.

3.2.1 Unique pointer

A unique pointer holds a pointer to dynamically allocated memory, which gets deallocated when the unique pointer is destroyed or another owning pointer is assigned to it, preventing memory leaks and use-after-free errors. To prevent double free errors, it is not possible to copy a unique pointer, because the class does not provide a copy constructor or copy assignment operator. It is only possible to transfer ownership to another instance of unique pointer using move semantics. Moving is a mechanism for transferring resources between objects without copying the resource. Move semantics are mostly used for performance reasons to avoid unnecessary copying of objects, but unique pointers utilize this facility to enforce correct memory management. We demonstrate how to use unique pointers in listing 3.1.

    #include <memory>

    // resource stands for any class constructible from an int

    // C++03
    class old_class {
    public:
        old_class(int i) : _resource(new resource(i)) {}
        ~old_class() { delete _resource; }
    private:
        // disable copy to prevent double free error
        old_class(const old_class&);
        old_class& operator=(const old_class&);
        resource* _resource;
    };

    // C++14
    class new_class {
    public:
        new_class(int i) : _resource(std::make_unique<resource>(i)) {}
        // no need for explicit destructor
    private:
        // copy of new_class is implicitly disabled
        std::unique_ptr<resource> _resource;
    };

Listing 3.1: Comparison of manual management required to correctly release memory allocated in a class.
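Transfer of ownership by move can be sketched as follows. The type resource and the functions consume and transfer_example are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical resource type used only for this illustration.
struct resource {
    explicit resource(int v) : value(v) {}
    int value;
};

// Takes ownership of the resource; the memory is released when `owned`
// goes out of scope at the end of the function.
inline int consume(std::unique_ptr<resource> owned) {
    return owned->value;
}

inline bool transfer_example() {
    auto first = std::make_unique<resource>(42);
    // auto copy = first;            // would not compile: copying is disabled
    auto second = std::move(first);  // ownership moves to `second`
    assert(first == nullptr);        // the moved-from pointer owns nothing
    return consume(std::move(second)) == 42 && second == nullptr;
}
```

At every point exactly one unique pointer owns the object, so the object is deleted exactly once, when the parameter of consume is destroyed.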

3.2.2 Shared pointer

std::shared_ptr is a copyable smart pointer, where both the new and the original instance hold the same pointer. To prevent double free errors, shared pointers also contain a reference counter, which stores how many instances of shared pointers hold the same pointer. The reference counter is increased on each copy and decreased when an instance is destroyed. When the reference counter reaches 0, no instance uses the stored pointer and it can be safely deallocated. The reference counter is implemented using atomic operations and stored in a shared, dynamically allocated block of memory. Thanks to this, shared pointers are suitable for use in concurrent or asynchronous systems, although access to the managed object itself still has to be synchronized.

Users of shared pointers must keep in mind the structure of references created by shared pointers. If the user manages to create a reference cycle, where for example object A holds a shared pointer to object B, which holds a shared pointer to object A, then the reference counter never drops to 0 and the memory is never released. In other words, a memory leak happens. To break reference cycles, a special smart pointer called std::weak_ptr exists. This pointer can be constructed from a shared pointer, but it does not own any memory and its existence does not prevent the release of memory. A weak pointer is able to tell whether the managed object still exists, and can be promoted back to a shared pointer.
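A minimal sketch of breaking a reference cycle with a weak pointer follows. The node structure and the function parent_is_released are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <memory>

// Hypothetical tree node: the parent owns its child, while the child
// only observes its parent, so no ownership cycle is formed.
struct node {
    std::shared_ptr<node> child;   // owning reference
    std::weak_ptr<node> parent;    // non-owning back reference
};

// Returns true when the parent is destroyed as soon as the last shared
// pointer to it goes out of scope.
inline bool parent_is_released() {
    std::weak_ptr<node> observer;
    {
        auto parent = std::make_shared<node>();
        parent->child = std::make_shared<node>();
        parent->child->parent = parent;   // does not increase the reference count
        observer = parent;
        // lock() promotes the weak pointer to a shared pointer while the object lives
        if (!parent->child->parent.lock()) return false;
    }
    return observer.expired();
}
```

If the member parent were a std::shared_ptr instead, both nodes would keep each other alive and observer.expired() would return false.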

3.2.3 Utility functions

In a very specific corner case, smart pointers on their own are not completely capable of preventing memory leaks. Before C++17, when calling a function with multiple arguments, compilers were allowed to optimize the call by interleaving the evaluation of the arguments. If one argument is a call to a smart pointer's constructor together with operator new, and another argument throws an exception on evaluation, it can happen that the exception is thrown after the call to operator new, but before the call to the smart pointer's constructor, effectively leaking the newly allocated memory.

To fully initialize smart pointers in a single function call which cannot be interleaved, the utility function std::make_shared for shared pointers was added in C++11, and the function std::make_unique for unique pointers was added in C++14. These functions forward their arguments to the constructor of the managed object and return a smart pointer. It is recommended to use these functions for allocating smart pointers. Listing 3.1 also includes an example of how to use the function std::make_unique.
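The call pattern discussed in this section can be sketched as follows. The type account and the functions total and safe_total are hypothetical; the risky form is shown only in a comment.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type with a two-argument constructor.
struct account {
    account(std::string owner_name, int initial)
        : owner(std::move(owner_name)), balance(initial) {}
    std::string owner;
    int balance;
};

inline int total(const std::shared_ptr<account>& a, const std::unique_ptr<account>& b) {
    return a->balance + b->balance;
}

inline int safe_total() {
    // Risky before C++17: two `new` expressions in one argument list, e.g.
    //   total(std::shared_ptr<account>(new account("alice", 10)),
    //         std::unique_ptr<account>(new account("bob", 5)));
    // Here each allocation and its smart pointer are created inside one call,
    // and the arguments are forwarded to the constructor of account.
    return total(std::make_shared<account>("alice", 10),
                 std::make_unique<account>("bob", 5));
}
```

Apart from exception safety, std::make_shared can also allocate the object and its reference counter in a single block of memory.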

3.3 Arrays with known size

Arrays are used for storing a fixed number of objects, and offer a performance benefit, because objects are stored on the stack instead of in dynamically allocated storage. Classic C-style arrays have a significant disadvantage: once they decay to a pointer, they do not offer a way to tell their size. This information must be maintained in a separate variable, which is error prone and susceptible to buffer overflow errors.

In C++11 a more secure alternative to plain arrays was introduced, a class called std::array. This class offers methods for determining the boundaries of a buffer. The methods begin() and end() return iterators to where the array starts and ends, and the method size() tells its size, which is known at compile time. Unlike C arrays, std::array does not decay to a plain pointer, so the information about its size is always kept. In listing 3.2 we show how C++11 arrays retain information about their size.

    #include <array>
    #include <cstddef>

    // information about number of elements is lost
    void take_array(int*) { /* ... */ }
    // size of array is still known
    template< std::size_t N >
    void take_array(std::array< int, N >&) { /* ... */ }

    #define SIZE 16
    int ints[SIZE]; // C-style array
    std::array< int, SIZE > cpp_ints;

    int main() {
        // calls take_array(int*)
        take_array(ints);
        // calls take_array(std::array< int, N >&)
        take_array(cpp_ints);
    }

Listing 3.2: Example of how, in contrast to C arrays, std::array does not decay to a pointer.

3.3.1 Span

To uniformly represent any buffer of objects, C++20 will provide std::span. Span is a non-owning class representing a given range or subrange of elements. Similarly to std::array, it conveniently provides methods for obtaining its starting and ending iterators and its size, which can be either a compile time constant or a value determined at run time.

3.4 Compile time assertions

Programmers sometimes have assumptions about properties of their program, and need to check whether these assumptions are valid. The assert macro is useful for checking properties of a program. It checks the given condition and terminates the program if the condition is not true. However, this macro works only in debug mode; in release mode, when NDEBUG is defined, it is disabled, and its purpose is to detect deficiencies, not to prevent them.

Since C++11 we can use a feature called static_assert. Static assertions are used similarly to normal assertions, with the difference that the assumptions are evaluated at compile time. Static asserts are therefore useful only for checking properties known at compile time. This property is actually very strong, because if the condition in a static assert does not hold, it is a compilation error and the program is not successfully compiled, effectively preventing the program from failing at run time. In listing 3.3 we show how to use static assertions.

    struct values {
        int a, b;
        char c, d;
    } __attribute__((packed)); // GCC/Clang extension

    static_assert(sizeof(values) == 2*sizeof(int) + 2*sizeof(char),
        // optional error message
        "struct values has padding");

Listing 3.3: Example of using static_assert.

3.5 Type-safe units

As we described in section 2.8, mismatching types of units can cause serious damage. C++11 offers a library called chrono, which provides classes for working with time, together with a type-safe system of time units.

Periods of time in C++11 are not represented by regular scalar types, but by the class std::chrono::duration. This class is a template, and one of its type parameters is a fraction of a second described by the type std::ratio. For example, seconds are represented by a duration with ratio<1>, minutes use ratio<60>, because there are 60 seconds in one minute, and milliseconds use ratio<1, 1000>, because there are 1000 milliseconds in one second.

Because the magnitude of a unit is encoded in its type, the compiler checks that units are used in a correct way. In addition, durations support arithmetic operations, so the programmer can do calculations with different types of units and the compiler automatically performs the necessary conversions; for example, 1 hour minus 1 second yields the correct result of 3599 seconds. C++14 later introduced time literals, providing a convenient way of expressing periods of time. In listing 3.4 we show an example of how the C++ standard library utilizes safe time units. The C++ standard library offers safe units only for time durations, but programmers can use std::ratio to model any type of units.

    #include <unistd.h> // Posix API
    sleep(5); // 5 seconds

    #include <windows.h> // Win API
    Sleep(5); // 5 milliseconds!

    #include <chrono> // C++11
    #include <thread>

    const std::chrono::seconds ten_seconds{10};
    std::this_thread::sleep_for(ten_seconds);
    // time literals
    using namespace std::chrono_literals; // C++14
    std::this_thread::sleep_for(5s);
    std::this_thread::sleep_for(5ms);
    std::this_thread::sleep_for(1h - 10min);

Listing 3.4: Example of type-safe time units.
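The unit arithmetic described in this section can be checked directly. The following sketch uses only std::chrono types; the function names hour_minus_second and whole_minutes are hypothetical helpers for the illustration.

```cpp
#include <cassert>
#include <chrono>

// Mixed-unit arithmetic: the result is expressed in the finer unit (seconds),
// and the conversion from hours is done by the library.
inline std::chrono::seconds hour_minus_second() {
    return std::chrono::hours{1} - std::chrono::seconds{1};
}

// Conversions which may lose precision do not happen implicitly;
// they must be requested with duration_cast, which truncates here.
inline std::chrono::minutes whole_minutes(std::chrono::seconds s) {
    // std::chrono::minutes m = s;   // would not compile: lossy conversion
    return std::chrono::duration_cast<std::chrono::minutes>(s);
}
```

The design choice visible here is that lossless conversions, such as hours to seconds, are implicit, while lossy ones, such as seconds to minutes, have to be spelled out by the programmer.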

3.6 Random numbers library

Generating random or pseudo-random values is needed for simulations or cryptographic purposes. Especially in cryptography it is important to generate values which are not predictable and have sufficient entropy. For a long time, there was only one standard way to obtain pseudo-random values in C++, the C function rand(). However, there are no guarantees about the quality of the pseudo-random engine used, and it is not recommended to use the function for serious random number generation needs.

In C++11 a whole new library for working with random and pseudo-random numbers was added. The class std::random_device is designated for generating random numbers. Each call to the random device's operator() generates a new random value. The random device is allowed, but not required, to be a non-deterministic or cryptographically secure random number generator. How the random device works is implementation defined. On Windows, in the Visual C++ standard library implementation, the values produced by the random device are supposed to be non-deterministic and cryptographically secure [15]. On Linux, the standard library implementations libc++ and libstdc++ use the pseudo-random device /dev/urandom, and libstdc++ uses the CPU instruction RDRAND where possible. For these reasons, users of the random device should find out how their implementation of the STL generates random numbers before using them in security-sensitive applications.

To generate a potentially large amount of pseudo-random values, the random device should be used to seed a pseudo-random number generator. The random numbers library provides several types of pseudo-random generators, for example the linear congruential generator or the Mersenne Twister. For fast and convenient seeding of a generator, it is possible to construct one by supplying a single random value to the constructor. For initializing a generator with more entropy, a second constructor is available which accepts a sequence of random values for initializing the internal state of the generator.

To obtain a sequence of values distributed according to some statistical probability function, the library provides distributions for processing the outputs of pseudo-random generators. To name a few, the standard library provides the uniform, Bernoulli, Poisson, and normal distributions. We show how to generate a sequence of uniformly distributed random values in listing 3.5.

To discourage programmers from using random functions with low entropy, the STL algorithm std::random_shuffle was removed in C++17. The purpose of this algorithm was to reorder elements in a container according to the results of an implementation-defined random function, usually the low-entropy function std::rand. Programmers are encouraged to use the algorithm std::shuffle, to which the programmer must provide an appropriate generator.

    #include <algorithm>
    #include <array>
    #include <random>

    // create random device
    std::random_device rng;
    // seed Mersenne Twister generator
    std::mt19937 prng(rng());
    // limit interval to 0 to 1024
    std::uniform_int_distribution< int > dist(0, 1024);
    // generate 16 uniformly distributed random values
    // in the given interval
    std::array< int, 16 > values;
    std::generate(values.begin(), values.end(), [&]{ return dist(prng); });

Listing 3.5: Example of generating uniformly distributed random values using the C++11 random numbers library.
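The second way of seeding mentioned in this section, with a sequence of values instead of a single one, and the use of std::shuffle can be sketched as follows. The function names and the choice of 8 seed words are our assumptions for illustration; how much entropy is appropriate depends on the application.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <functional>
#include <random>
#include <vector>

// Seeds a Mersenne Twister with several values from the random device.
// The number of seed words (8) is an arbitrary choice for this sketch.
inline std::mt19937 make_seeded_generator() {
    std::random_device device;
    std::array<std::random_device::result_type, 8> entropy;
    // std::ref is needed because std::random_device cannot be copied
    std::generate(entropy.begin(), entropy.end(), std::ref(device));
    std::seed_seq seed(entropy.begin(), entropy.end());
    return std::mt19937(seed);
}

// Returns the given values in a random order, using an explicitly
// provided generator instead of the removed std::random_shuffle.
inline std::vector<int> shuffled(std::vector<int> values) {
    auto generator = make_seeded_generator();
    std::shuffle(values.begin(), values.end(), generator);
    return values;
}
```

Note that std::mt19937 is not a cryptographically secure generator, so this sketch is suitable for simulations rather than for generating keys.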

3.7 Ranges

C++ standard library provides the algorithms library, which consists of over a hundred functions for performing common operations on containers, such as copying, removing, sorting, or searching values. Problem with these functions is that they are cumbersome to use. Stan• dard algorithms operate on sequences of elements given by boundary iterators. For example, function sort takes as arguments iterators to the beginning and end of a sequence to sort. This generality is useful, but programmers usually want to operate on whole containers. Need to always type iterators to begin and end is not only annoying, but also error prone. If programmer by mistake use an end iterator which does not belong to the same container as the begin iterator, buffer overflow may occur.


To simplify usage of standard algorithms, the ranges library is being standardized. The library defines the term range as any type which provides iterators to its beginning and end, i.e. containers are ranges. The ranges library provides functions similar to those of the algorithms library, but these functions take ranges as arguments, in other words whole containers, not just iterators. The idea is that the programmer ideally should not determine where a sequence starts and where it ends, and should leave the decision to the library, narrowing the space for possible errors. The absence of boundary calculations also reduces the possibility of a buffer overflow caused by an integer overflow. We show how ranges simplify code in listing 3.6. The exact contents of the ranges library are still being discussed; the current status is described by the Ranges Technical Specification. Ranges are expected to be included in C++20 or C++23.

    std::vector<int> a{0,1,2}, b{3,3,3}, result;

    // classic STL algorithm requires 4 iterators
    std::transform(a.begin(), a.end(), b.begin(),
                   std::back_inserter(result), std::plus<>());

    // ranges, notice absence of iterators
    using namespace std::ranges;
    push_back(result, view::transform(a, b, std::plus<>()));

Listing 3.6: Comparison of element-wise sum of two vectors using classic STL algorithm and ranges.
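The idea of passing whole containers instead of iterator pairs can also be sketched without the ranges library. The following wrappers are hypothetical helpers written only for illustration (sort_all and elementwise_sum are not part of any standard or proposed library); they show how an interface that takes the container itself removes the possibility of mixing iterators from different containers.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical range-style wrapper: it takes the whole container, so the
// caller cannot pair a begin iterator with the end iterator of another one.
template <typename Container>
void sort_all(Container& c) {
    std::sort(std::begin(c), std::end(c));
}

// Hypothetical helper returning the element-wise sum of two vectors.
// The shorter input determines the length, so no read goes out of bounds.
std::vector<int> elementwise_sum(const std::vector<int>& a,
                                 const std::vector<int>& b) {
    std::vector<int> result;
    const std::size_t n = std::min(a.size(), b.size());
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        result.push_back(a[i] + b[i]);
    }
    return result;
}
```

A call such as sort_all(values) leaves no boundary for the caller to get wrong, which is the property the ranges library provides for all standard algorithms.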

3.8 Attributes

Attributes are a form of annotation which can be used to mark that some part of the code has special properties. These properties vary from optimization hints for the compiler to code behavior checks done during compilation. There are currently 9 standard attributes, and compilers are allowed to offer their own implementation-defined attributes. Three of the standard attributes are intended to check that the annotated code is used in a correct way; a violation is detected by the compiler and reported as a warning:


• deprecated - The compiler issues a warning when code marked by this annotation is used. Can be used on functions, classes, typedefs, variables, enumerations, or namespaces.

• nodiscard - The annotated value must always be used. Applicable to classes or function return values.

• fallthrough - Used in a switch statement; documents that a break or return command is missing before the next case statement on purpose, so the program should also execute the following code.

    // ignoring return value of function is_admin()
    // triggers warning
    [[nodiscard]] bool is_admin();
    // all values of type permissions must be used
    struct [[nodiscard]] permissions;
    permissions get_file_permissions(path name);
    // applied in STL, for example in C++20:
    [[nodiscard]] bool std::vector<T>::empty() const noexcept;

    // deprecate passing ownership using a raw pointer
    [[deprecated]] void foo(my_class* obj);
    void foo(std::unique_ptr<my_class> obj);

    switch (number) {
        case 1:
            bar();
            [[fallthrough]];
        case 2:
            baz();
    };

Listing 3.7: Example of using attributes.

3.9 Concurrent programming

The threading library was added in C++11, but multithreading had been used long before that through native OS APIs, such as the POSIX API or WinAPI. However, these interfaces offer only C-style functions and structures. C does not offer the RAII idiom for automatic release of resources, which can cause deadlocks when a mutex is by mistake not unlocked. C also does not have a strong enough type system, so arguments to functions invoked in a new thread must be passed through a pointer to void.

Threading in C++11 is not only a portable alternative to native interfaces, but also offers ways to use threads safely, without deadlocks or race conditions.

3.9.1 Lock guard

std::lock_guard is a RAII-style resource manager for mutexes, for example for std::mutex. On construction, the lock guard takes a reference to a mutex as an argument and locks it. The acquired mutex is then correctly unlocked in the lock guard's destructor. How to use the lock guard is shown in listing 3.8.

3.9.2 Type safety

Functions executed in separate threads sometimes require arguments, but the POSIX API and WinAPI only allow passing a single argument of type void*. The programmer must then remember to which type the argument must be cast. If the argument happens to be cast to a wrong type, invalid access to memory may occur. In C++, a function is executed in a new thread by creating a new instance of class std::thread. The thread's constructor is a variadic template, which we describe in section 3.10, making it possible to pass arguments to a parallel function in the same way as to a normal function, without the need to use pointers with erased type.
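Type-safe passing of arguments to a thread can be sketched as follows. The function names repeat_text and run_in_thread are hypothetical and serve only as an illustration; the std::thread constructor and std::ref are standard C++11 facilities.

```cpp
#include <functional>
#include <string>
#include <thread>

// Worker with a strongly typed signature; no cast from void* is needed.
void repeat_text(const std::string& text, int times, std::string& out) {
    for (int i = 0; i < times; ++i) {
        out += text;
    }
}

// Hypothetical helper that runs the worker in a new thread and waits for it.
std::string run_in_thread(const std::string& text, int times) {
    std::string result;
    // Arguments keep their types. std::ref is required to pass a reference,
    // because std::thread copies its arguments by default.
    std::thread worker(repeat_text, text, times, std::ref(result));
    worker.join();
    return result;
}
```

Passing an argument of an incompatible type, or a wrong number of arguments, is rejected by the compiler instead of causing invalid memory access at run time.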

3.9.3 Atomics

For thread-safe arithmetic, C++11 offers the type std::atomic, which is a class template able to hold any trivially copyable type, including all primitive types. Operations on atomic variables are free of data races; their manipulation is realized by atomic operations, without an explicit need for synchronization primitives such as mutexes.
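A shared counter incremented from several threads is a typical use of std::atomic. The following sketch assumes a hypothetical helper count_in_parallel; with a plain int instead of std::atomic<int>, the same code would contain a data race.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: several threads increment one shared counter.
int count_in_parallel(int thread_count, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                ++counter; // atomic read-modify-write, no mutex needed
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter.load();
}
```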

3.9.4 Parallel algorithms

To greatly simplify parallel code, C++17 offers parallel algorithms. These algorithms are overloads of normal algorithms which take an extra argument, called an execution policy. When a parallel execution policy is used, the algorithm is automatically executed in multiple threads, so programmers do not need to manually manage instances of std::thread. Listing 3.8 includes an example of invoking the parallel algorithm reduce.

    #include <execution>
    #include <mutex>
    #include <numeric>
    #include <vector>

    // usage of mutex and lock guard
    static std::mutex mutex;
    void baz() {
        std::lock_guard<std::mutex> guard(mutex);
        // critical section here
    } // mutex unlocked when function ends

    // sum values in parallel
    std::vector<int> numbers(10000000, 13);
    std::reduce(std::execution::par, numbers.begin(), numbers.end());

Listing 3.8: Examples of using lock guard and parallel algorithm reduce.

3.10 Variadic templates

Functions which accept an arbitrary number of arguments are called variadic functions. C offers weakly typed variadic functions, where the programmer can by mistake use arguments as a different type than they are, and can also wrongly suppose that there are more arguments than were actually passed. An example of such a vulnerable function is printf(): if the provided formatting string does not match the passed arguments, invalid access to memory is possible. For this reason, input from users should never be used as a formatting string for printf().

C++11 offers variadic templates, which accept any number of type parameters. Variadic function templates can be called with any number of arguments, and the compiler derives the exact type signature of the function, so arguments preserve their types and the number of arguments is encoded in the function's type. The arguments are then available in a template parameter pack, which is similar to the C alternative va_list, but type-safe. C++17 simplified working with parameter packs with fold expressions. Variadic template functions are not only type-safe, but also efficient: the recursive function in listing 3.9 can be optimized by compilers to a single function call.

    #include <iostream>
    #include <utility>

    // sum any number of given values
    template <typename... Args>
    auto sum(Args&&... args) {
        return (args + ...); // fold expression
    }

    // print any number of given arguments
    template <typename T, typename... Args>
    void logger(const T& v, Args&&... args) {
        std::cout << v; // print value
        if constexpr (sizeof...(args) == 0) {
            // print end of line after last value
            std::cout << std::endl;
        } else {
            // separate printed values with spaces
            std::cout << " ";
            // recursively print the rest of values
            logger(std::forward<Args>(args)...);
        }
    }

    // prints "Some sums: 6 1.875"
    logger("Some sums:", sum(1,2,3), sum(1, 0.5, 0.25, 0.125));

Listing 3.9: Example how to implement variadic template functions, which are able to accept an arbitrary number of arguments.

3.11 Class member initialization

To prevent unexpected behavior caused by using the value of an uninitialized variable, C++11 has added an option to initialize class member variables upon declaration. In older C++ code, class member variables had to be initialized in the initialization section of each constructor. Modern C++ allows programmers to assign default values to class member variables and omit them in constructors. Different types of initialization are shown in listing 3.10.

    class init_demo {
    public:
        // a is left uninitialized
        init_demo() = default;
        // all variables initialized
        init_demo(int new_a) : a(new_a) {}
        // all variables initialized in constructor
        init_demo(int new_a, int new_b, int new_c, int new_d)
            : a(new_a), b(new_b), c(new_c), d(new_d) {}
    private:
        int a;      // no default initialization
        int b = 1;  // by default 1
        int c = {}; // by default 0
        int d{2};   // by default 2
    };

Listing 3.10: Initialization of class member variables.

4 Coding conventions

Features introduced in modern C++ standards improve code security, but they take effect only when programmers actually use them. In this chapter we describe recognized coding standards, which recommend modern C++ features programmers should use to produce quality code.

The official website of the Standard C++ Foundation recommends several coding standards [16]. The problem is that the recommended standards are from 2005 and therefore do not contain recommendations about modern C++, so we consider the website with recommended standards to be outdated. However, one recommendation there still holds true: we should never use C coding conventions for C++ programming, because C is an inherently insecure language.

4.1 C++ Core Guidelines

C++ Core Guidelines [17] are an official set of rules and best practices from the Standard C++ Foundation. This guideline contains hundreds of items and is still in development, which is led by Bjarne Stroustrup. The aim of this guideline is to help programmers use modern C++ effectively; the rules are focused on high-level issues with emphasis on safety and simplicity.

The guideline describes a broad range of issues. It starts by describing the philosophy of C++, offering general rules on how good C++ code should be written, for example that the code should express intent or should use the RAII principle to not leak any resources. Other sections contain specific rules about interfaces, class hierarchies, performance, concurrency, resource management, or error handling.

There are hundreds of rules and programmers should be aware of them, but no developer is comfortable with memorizing all of them. Instead, where possible, the rules are designed in such a way that they can be employed in an analysis tool. When some rule is violated, a static code checker should issue a warning with a reference to the particular rule. Static analysis of code should be an essential part of software development, as we describe in Chapter 5.


4.1.1 Guideline Support Library

There are rules in the C++ Core Guidelines which suggest using features from the Guideline Support Library (GSL) [18]. This library is a collection of utilities for improving code security. The idea is that the classes in GSL can be used now to improve code, but are also candidates for future standardization. One of the classes is span, which has been approved for the upcoming C++20, as we have mentioned in section 3.3.1.

Another interesting class is not_null, which is an implementation of the decorator design pattern. This class encapsulates a pointer and ensures that it is never null. The class is not default-constructible, so it cannot be empty. It is not possible to assign the nullptr literal to it, because that intentionally does not compile. Finally, assigning a pointer whose value is null is detected at run time and causes either an exception to be thrown or immediate termination of the program; this behavior is configurable. A user of class not_null can therefore be sure that it always contains a non-null pointer.
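The three guarantees described above can be illustrated by a simplified wrapper. The sketch below is not the GSL implementation; the class name checked_ptr and the function read_value are hypothetical, and throwing std::invalid_argument is only one assumed choice of run-time reaction.

```cpp
#include <cstddef>
#include <stdexcept>

// Simplified illustration of the idea behind a never-null pointer wrapper.
// This is NOT the GSL implementation; the name checked_ptr is hypothetical.
template <typename T>
class checked_ptr {
public:
    // A pointer whose value is null is rejected at run time.
    explicit checked_ptr(T* ptr) : ptr_(ptr) {
        if (ptr_ == nullptr) {
            throw std::invalid_argument("checked_ptr: null pointer");
        }
    }
    // Constructing from the nullptr literal intentionally does not compile.
    checked_ptr(std::nullptr_t) = delete;
    // No default constructor, so the wrapper can never be empty.
    checked_ptr() = delete;

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }
    T* get() const { return ptr_; }

private:
    T* ptr_;
};

// Callers of this function do not need to check the argument for null.
int read_value(checked_ptr<int> p) {
    return *p;
}
```

The benefit is that the non-null requirement becomes part of the function signature, so it is checked once at the boundary and not in every function that receives the pointer.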

4.2 SEI CERT C++ Coding Standard

SEI CERT C++ Coding Standard has rules focused on secure coding in the C++ programming language [19]. Following the rules is necessary for developing a reliable system without undefined behavior and exploitable vulnerabilities. Importantly, the coding standard recommends using modern C++ features, such as smart pointers, lock guards, and static asserts. The authors of the standard claim that the following tools check for some of the rules: Clang, Coverity, ECLAIR, EDG, GCC, Klocwork, LDRA, Parasoft, PRQA QA-C++, PVS-Studio, Rose, SonarQube C/C++ Plugin, Splint.

4.3 High Integrity C++ Coding Standard

High Integrity C++ [20] is a coding standard developed by the company Programming Research Limited [21]. This standard has existed since 2003 and was overhauled in 2013 to reflect modern C++11 features. The rules enforced by this standard are similar to the previously mentioned guidelines. For example, they recommend managing resources using modern C++ classes such as std::unique_ptr or std::lock_guard, and also recommend to stop using deprecated features, such as std::auto_ptr.

Programming Research Limited offers the coding standard as part of their commercial static code analyzer QA C++, which we describe in more detail in Chapter 5. Their customers come from industries where safety-critical systems are needed, such as aerospace (Lockheed Martin, Honeywell), automotive (BMW, Ferrari), medical devices (Hitachi Medical Systems, Toshiba), or banking (Visa). Programming Research Limited also has an active role in C++ standardization: they participate in the ISO C++ working groups and take part in votes about upcoming C++ features.

4.4 AUTOSAR C++14 Coding Guidelines

The full name of this coding standard is Guidelines for the use of the C++14 language in critical and safety-related systems [22]. It is a document created by AUTOSAR (AUTomotive Open System ARchitecture) [23], which is a development partnership between companies interested in the automotive industry. The goal of AUTOSAR is to establish an open standard for embedded software used in cars. AUTOSAR has more than 180 members, which are car makers, suppliers, tool vendors, or research institutes. AUTOSAR was founded by leaders in the automotive industry: BMW Group, BOSCH, Continental, Daimler, Ford, General Motors, Toyota, and Volkswagen Group.

The guideline is an extension to the MISRA C++ 2008 coding standard, but is also a compilation of rules from several other coding standards, to which the guideline explicitly refers, such as High Integrity C++, CERT C++, C++ Core Guidelines, and Joint Strike Fighter Air Vehicle C++ Coding Standard.

AUTOSAR C++14 Guidelines could be considered too restrictive outside of the automotive or embedded industry, but it is an exemplary guideline for modern C++ development. It encourages usage of secure features of modern C++, such as C++11 smart pointers for secure memory management, std::array as a static buffer with known size, or static_assert to enforce restrictions at compile time where possible.

Nowadays, the dominant programming language in the embedded and automotive industry is C; however, this guideline can be seen as a push for the industry to use modern C++ as a more secure alternative.

4.5 C++ Best Practices

C++ Best Practices [24] is a collaboratively developed collection of best practices for programming in C++. It is divided into sections focused on different aspects of development; covered topics include code style, tooling, maintainability, portability, safety, and performance. In contrast to the other standards described above, C++ Best Practices strives to be a succinct guideline with illustrative code examples. In general, the guideline recommends using C++ features over C features, together with modern features like smart pointers, variadic templates, std::array, or default initialization of class member variables.

5 Tools helping with modern C++

Tooling is an indispensable part of software development, helping programmers and testers to produce, process, review, and test code. In the previous chapters, we have already described what modern C++ is, what features it provides to improve code security, and which coding standards require their usage. In this chapter, we focus on some commercial and open source static analysis tools, and describe how they can help with code modernization and adherence to modern coding standards.

We focus mainly on static analysis tools because static analysis offers examination of the whole codebase at little or no cost, compared to manual, dynamic, or formal analysis methods. Formal validation has the power of proving that the validated software does not contain errors, but it is infeasible to use such methods on every system in development.

Tools presented in this chapter are listed in alphabetical order, with the exception of clang-tidy. We describe clang-tidy at the end of this chapter and in more detail, because we have used it in our analysis of open source C++ projects in Chapter 7.

5.1 Clang Static Analyzer

Clang Static Analyzer [25] is an open source static code analyzer, developed as part of the Clang/LLVM project. Currently there are 60 types of issues which Clang Static Analyzer detects, divided into several categories. Supported languages are C, C++, and Objective-C. Results can be visualized in a web browser interface.

Clang Static Analyzer inspects the control-flow graph of a program to discover issues. Core checks are able to detect, for example, use of uninitialized values or division by zero. Unix checks detect insecure usage of common APIs; for instance, the check unix.Malloc can detect memory leaks, double free, and use-after-free errors caused by wrong usage of the function malloc. Specifically for C++ there are two checks, cplusplus.NewDelete and cplusplus.NewDeleteLeaks, which detect memory leaks, double free, and use-after-free errors caused by incorrect usage of operators new and delete.


5.2 Cppcheck

Cppcheck is an open source static analyzer for C++. Its goal is to identify suspicious code constructs, with no false positives. Cppcheck is a very popular tool thanks to its availability, ease of use, and range of implemented checks. However, it does not provide much functionality with respect to modern C++ standards. It is able to find usages of the deprecated std::auto_ptr, but it is not able to detect obsolete unsafe language constructs and recommend safer alternatives. It also does not employ rules from common coding standards.

5.3 ECLAIR

ECLAIR [26] is a commercial analyzer for C and C++. It is equipped with a static code analyzer and also a semantic analyzer with symbolic model checking for deeper inspection of how a program works. It is able to analyze C++11 and employs checks of rules from many coding standards, including CERT C++ and High Integrity C++.

5.4 Klocwork

Klocwork [27] is a commercial static code analyzer with security-related features. It is able to analyze all the modern C++ standards, including C++17, and incorporates checks related to common secure coding and industry-related standards, such as CERT, AUTOSAR C++14, or OWASP. There are also checks related to outdated language constructs, but programmers should focus on fixing the cause of a problem, e.g. manual memory management, and not on its consequences, e.g. memory leaks.

5.5 LDRA tool suite

Liverpool Data Research Associates (LDRA) [28] is a provider of software quality tools, founded in 1975. They provide a collection of tools designed for industries where mission-critical systems are used, specifically for the aerospace, energy, automotive, rail, and medical industries. The tools perform static analysis and dynamic analysis, and check compliance with coding standards. A wide range of coding standards is employed to ensure code quality, including standards which recommend modern C++ features, for example AUTOSAR C++14 Guidelines and the High Integrity C++ coding standard.

5.6 Parasoft C/C++test

Parasoft C/C++test is a tool capable of static and dynamic analysis, equipped with support for safety-critical and best-practices standards. Of the modern C++ coding standards, it supports CERT C++, High Integrity C++, and AUTOSAR C++14. Parasoft C/C++test is used, among others, by Lockheed Martin, Daimler, and Western Digital.

5.7 QA C++

QA C++ [29] is a static code analyzer developed by Programming Research Limited, the authors of High Integrity C++, as we have mentioned in section 4.3. The tool is able to check correct usage of modern C++ features; for example, it has dedicated analysis of move semantics and of the keyword override. C++ standards up to C++14 are supported. Emphasis is also put on following best practices: apart from the High Integrity C++ coding standard, other standards are also supported, most importantly CERT C++.

5.8 Clang-tidy

Clang-tidy [30] is a tool which looks very much like Clang Static Analyzer, and is also part of the Clang/LLVM project. Clang-tidy is a static code analyzer and linter, which is equipped with around 200 different checks. For clarity, the checks are divided into the following categories:

• android, boost, fuchsia, mpi - Detect issues related to popular frameworks or libraries. For example, the check mpi-type-mismatch detects that the programmer reports usage of a different type than the type actually used, which is possible due to weak typing of MPI functions.

• bugprone - Issue warnings on constructs commonly causing bugs. As an example we can describe the check bugprone-suspicious-memset-usage, which warns for instance when the function memset(destination, fill_value, byte_count) is used with a zero constant as the byte_count argument, because the programmer most probably wanted to use the zero as the fill_value argument.

• cert, cppcoreguidelines, hicpp - Checks in these categories warn when rules of specific coding standards are violated. An example is the check hicpp-member-init, which warns about leaving class member variables uninitialized. However, only selected rules from these coding standards are implemented.

• google, llvm - These checks enforce compliance to the popular coding styles of Google and LLVM.

• misc - Checks from this category point out miscellaneous issues, such as using assert with compile-time constants instead of static_assert (misc-static-assert).

• modernize - Checks in this category are focused on finding obsolete language constructs, together with suggestions of modern replacements. Additionally, if we use the command line option -fix, clang-tidy will automatically alter the code, replacing old features with modern ones. One example is modernize-replace-auto-ptr, which can replace usage of std::auto_ptr with std::unique_ptr, with documented limitations. Another example is the check modernize-loop-convert, which is able to transform manual traversal over a container from a for loop to a C++11 range-based for loop.

• objc - Checks detecting issues related to Objective-C code. Not useful for C++.

• performance - These checks look for possible performance improvements. For example, performance-inefficient-algorithm offers to use class methods of associative containers instead of generic STL algorithms. One of the suggestions is to use the method find of std::set instead of the algorithm std::find, because the class method takes advantage of the set's internal data structure to speed up item lookup.

• readability - Readability issues are triggered by code which is completely redundant, or should be written in a different, more understandable way. For example, the issue readability-misleading-indentation warns about code which is indented as if it belonged to a block of code, but in fact does not due to missing braces. This check was inspired by the infamous goto fail; bug in Apple's SSL verification function [31].

• analyzer - This category contains all the checks from the Clang Static Analyzer mentioned above. If the programmer does not need Clang Static Analyzer's visualization of issues, clang-tidy becomes a universal tool for detecting a broad range of issues in code.
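To make two of the checks above more concrete, the following sketch shows hand-written code in the style targeted by modernize-loop-convert and performance-inefficient-algorithm, each next to a modern equivalent. The function names are hypothetical, and the modern variants are written by hand for illustration; they are not actual output of clang-tidy.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Style targeted by modernize-loop-convert: manual iterator traversal.
int sum_with_iterators(const std::vector<int>& values) {
    int total = 0;
    for (std::vector<int>::const_iterator it = values.begin();
         it != values.end(); ++it) {
        total += *it;
    }
    return total;
}

// Hand-written modern equivalent using a C++11 range-based for loop.
int sum_with_range_for(const std::vector<int>& values) {
    int total = 0;
    for (int value : values) {
        total += value;
    }
    return total;
}

// Style targeted by performance-inefficient-algorithm: the generic
// algorithm performs a linear search over the set.
bool contains_generic(const std::set<int>& items, int key) {
    return std::find(items.begin(), items.end(), key) != items.end();
}

// The member function uses the set's internal structure for the lookup.
bool contains_member(const std::set<int>& items, int key) {
    return items.find(key) != items.end();
}
```

In both pairs the behavior is the same; the modern variants leave less room for iterator mistakes and, in the case of the set lookup, avoid a linear search.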

Because clang-tidy is an actively developed open source project, which offers a wide range of checks related to modern C++ features, we have chosen to use it for performing an analysis of open source C++ projects, described in Chapter 7.

6 Other languages

Vulnerabilities in code are not only an issue of C and C++; this problem also exists in other programming languages. The purpose of this chapter is to briefly examine the presence of the described vulnerabilities in other popular general-purpose programming languages. The selection of examined languages is not based only on absolute popularity; we also study languages which claim to solve some issues related to secure coding.

6.1 Java

Java is an object-oriented, statically typed, compiled language. The language first appeared in 1995, and its syntax is inspired by C++.

With respect to memory management, Java does not offer any means of manual memory management. All memory is managed by a garbage collector, so it is not possible to encounter a dangling pointer in Java. Although Java programmers do not need to deal with memory allocation and deallocation, proper resource management must still be addressed by programmers. For every usage of a resource, such as an opened file or a database connection, extra code must wrap the resource so that it is properly released later. The code constructs try-catch-finally and the newer try-with-resources are meant for this purpose.

With respect to buffer overflows, Java does not allow access to an arbitrary address in memory. Accessing an array element at an invalid index causes an exception to be thrown.

Variables in Java are always initialized before use, so it is not possible to affect the behavior of a program by controlling values present in undefined variables. Fields of arithmetic types are initialized to 0, booleans to false, and reference types to null, while reading a local variable before it has been assigned is rejected by the compiler. Dereferencing a null reference throws NullPointerException.

To prevent race conditions, it is possible in Java to use any object as a mutex, or even to mark whole methods with the keyword synchronized, which means that the method can be executed on one object by only one thread at a time.

6.2 C#

C# is a technology very similar to Java, but it differs in one safety-related aspect. In C#, there is a keyword unsafe, which permits the use of pointers. This feature is used for low-level operations for performance reasons, but the runtime is unable to check the correctness of operations with pointers.

6.3 Python

Python is an interpreted language with a dynamic type system. Resources in Python are managed by a garbage collector, but similarly to Java, the programmer must take care of the proper release of resources, either by wrapping the resource in try-except-finally or by using the with statement.

Python offers an interesting solution to the problem of integer overflow. Integers in Python have arbitrary precision, which means they can represent any integral value. Therefore, the boundaries over which an integer could overflow do not exist. However, arbitrary precision types have lower performance than bounded arithmetic types.

6.4 Rust

Rust is a relatively new programming language, released in 2010. Rust is a compiled, statically typed language, focused on safe concurrent programming. Rust uses RAII-style resource management, similarly to C++. For shared resources, there is an option to use reference counting; Rust does not have a garbage collector. The compiler ensures that the program does not have null or dangling pointers.

To mitigate race conditions in multithreaded applications, Rust's compiler is equipped with a borrow checker. This checker ensures that at all times, one object can be referenced either through multiple immutable references, or through just one mutable reference. For low-level code, where safety checks cannot be done, Rust has the keyword unsafe, which is used to mark code where normal safety checks do not apply.

6.5 Go

In 2009, Google released Go, a statically typed, compiled language with features focused on concurrent applications. The creators of Go designed the language with common issues in mind, so for example the problematic pointer arithmetic is not present in Go. Memory management is done by a garbage collector, which is optimized for minimal latency, but still poses an overhead.

Concurrency in Go is realized by so-called goroutines, which are concurrently executed functions. There are no restrictions on accessing shared resources in goroutines, meaning that race conditions are possible. For safe communication between goroutines, Go offers a thread-safe message passing mechanism called channels.

6.6 Haskell

Haskell is a strongly typed, purely functional programming language. Haskell is not ranked as one of the most popular languages, but large-scale systems written in Haskell exist, for example an anti-spam system at Facebook [32]. Haskell is suitable for such large, distributed systems, because values in Haskell are implicitly immutable, preventing race conditions.

Furthermore, Haskell uses a garbage collector for memory management, and its type system does not allow the use of pointers to values, which prevents all kinds of memory-related issues, including buffer overflows. However, for very specific, low-level operations, Haskell offers several unsafe interfaces. One of them is the function unsafePerformIO, which is able to mask side effects, but is not type safe and may lead to a program crash [33]. Another unsafe feature is the Foreign Function Interface, which allows calling functions from other languages to create bindings to different languages, for example C.

7 Analysis of C++ projects

To explore how much insecure features are still used in real-world C++ projects, we have performed an analysis of the most popular projects. Our goal was to perform static analysis on a significant number of projects, and to study occurrences of a selected subset of issues resolved in modern C++ standards.

In this chapter we first describe the methodology of our research. Then we describe the process of analysis execution, with timings, implementation details, and technical problems we have faced. Next, we present quantitative results of our analysis. At the end of this chapter we provide the conclusion of our analysis.

7.1 Methodology

We have decided to perform a mass analysis of a significant number of projects. We have retrieved source code for analysis from GitHub, which is the largest platform for collaborative development of open source projects in the world, with 80 million repositories as of March 2018 [34].

To find issues in the source code, we have used the static analysis tool clang-tidy, which we have described in Chapter 5. We have enabled checks which indicate the existence of a problem solved by modern C++ features. Checks have been chosen to find errors in the code caused by usage of obsolete features, to find code which could be modernized, and also to find improper usage of modern features. In clang-tidy we have enabled the following checks:

• clang-analyzer-cplusplus.NewDelete - Checks for misuse of manual memory management, detects use-after-free and double free errors. We consider this error as critical.

• clang-analyzer-cplusplus.NewDeleteLeaks - Detects potential memory leaks by analyzing control-flow graph of a program. This error is also considered to be critical.


• cert-dcl50-cpp - Detects definitions of C-style variadic functions. This is not an error, but it potentially gives room for one, and is generally considered to be a bad practice in C++.

• cert-msc50-cpp - Detects usage of the function std::rand(). Severity of this finding depends on what entropy and distribution of random numbers is really needed.

• cppcoreguidelines-no-malloc - Check for usage of C-style mem• ory allocation functions malloc () and f ree (). We consider this finding as serious, because these functions do not call construc• tors and destructors, which can lead to usage of uninitialized memory and presence memory leaks.

• modernize-replace-auto-ptr - Detects usage of deprecated class std: : auto_ptr. Such code needs refactoring to work in C++17, where std: :auto_ptr is removed.

• readability-uniqueptr-delete-release - Detects improper usage of std: :unique_ptr, where programmer manually deallocates its contents. This is not an error, but an indication that the pro• grammer may not know hot to use unique pointers in a correct way.

• misc-uniqueptr-reset-release - This check is similar to the pre• vious one, detects improper usage of std: :unique_ptrs's API. Similarly to the previous check indicates that the programmer may not know how unique pointers should be used.

• modernize-make-shared - Suggests to use a utility function std: : make_shared to create object managed by shared pointer, instead of operator new. Severity of this finding is moderate, it can lead to memory leaks only under very specific cases, but most importantly means unnecessary usage of manual memory allocation.

• modernize-make-unique - Similar to the previous check, sug• gests to use function std: : make_unique instead of operator new. Consequences of this finding are the same as for the previous check.
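The list above translates directly into the value of clang-tidy's -checks option. The following Python sketch is only an illustration and is not taken from the attached scripts; the names ENABLED_CHECKS and checks_argument are hypothetical. It relies on the documented behaviour that a leading "-*" disables all default checks, so that only the listed ones stay enabled.

```python
# Checks listed in Section 7.1, in the order they are described.
ENABLED_CHECKS = [
    "clang-analyzer-cplusplus.NewDelete",
    "clang-analyzer-cplusplus.NewDeleteLeaks",
    "cert-dcl50-cpp",
    "cert-msc50-cpp",
    "cppcoreguidelines-no-malloc",
    "modernize-replace-auto-ptr",
    "readability-uniqueptr-delete-release",
    "misc-uniqueptr-reset-release",
    "modernize-make-shared",
    "modernize-make-unique",
]


def checks_argument(checks):
    """Build the -checks= option for clang-tidy (hypothetical helper).

    The leading "-*" switches off every default check, so only the
    explicitly listed checks remain enabled.
    """
    return "-checks=" + ",".join(["-*", *checks])


if __name__ == "__main__":
    print(checks_argument(ENABLED_CHECKS))
```

The resulting string can be passed unchanged to clang-tidy or to its parallel wrapper.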


7.2 Analysis execution

At first, we chose to analyze the 1000 most popular C++ repositories on GitHub. This number is the limit of results provided by GitHub's Search API. Discovering more popular C++ repositories would require an exhaustive traversal of repositories, which would take an excessive amount of time because of the hourly limit on API queries.

Furthermore, we required the projects to contain CMake configuration files. CMake is a meta-build system used for generating native build system files, for example Unix Makefiles. CMake is considered to be the standard instrument for configuring C++ projects. CMake can also generate a database of the source files in the project together with the compilation flags used. This database is called the compile commands database. Clang-tidy uses this database to learn about the project, which is important for providing accurate results. Out of the initial 1000 repositories, 557 contained CMake configuration scripts. To generate the compile commands database, CMake needs to configure the project successfully. On many projects configuration failed because custom configuration steps were needed or because dependencies were missing. A disadvantage of C++ is that there is no standard way of managing dependencies. We were able to configure 203 repositories successfully. We consider this number of repositories still significant for our analysis.
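To make the role of the compile commands database concrete, the following Python sketch shows one way it could be requested and read. It is an illustration under stated assumptions, not the code from the attachments: the function names are hypothetical, the CMake command is assumed to be run from an empty build directory, and the generator is assumed to be one that honours CMAKE_EXPORT_COMPILE_COMMANDS (Makefile or Ninja generators).

```python
import json
from pathlib import Path


def cmake_configure_command(source_dir):
    """Return a CMake command line that also writes compile_commands.json.

    The command is meant to be executed with the build directory as the
    current working directory (hypothetical helper).
    """
    return ["cmake", "-DCMAKE_EXPORT_COMPILE_COMMANDS=ON", str(source_dir)]


def load_translation_units(build_dir):
    """Return the source files listed in build_dir/compile_commands.json.

    An empty list is returned when the database does not exist, which is
    the case when CMake failed to configure the project.
    """
    database = Path(build_dir) / "compile_commands.json"
    if not database.is_file():
        return []
    entries = json.loads(database.read_text(encoding="utf-8"))
    # Every entry describes one translation unit: its working directory,
    # the compiler invocation, and the source file.
    return [entry["file"] for entry in entries]
```

A project for which load_translation_units returns an empty list cannot be analyzed accurately and would be counted among the repositories that failed to configure.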

7.2.1 Implementation details

We have developed scripts written in Python which perform all 3 phases of the analysis: discovering projects, downloading them, and performing static analysis. The scripts use a common SQLite database to store the processing status and the issues found. The source code of the scripts and the database with discovered issues are available in the attachments. However, we had to remove critical issues from the public version of the database, because they could possibly be used to exploit the affected projects.

To discover the most popular C++ projects we used GitHub API version 3. We used the Search API to look for C++ projects, ordered by the number of stars given to the project by users. We also needed to use an authentication token to increase the limit of available API requests per hour. We filtered the discovered projects to those containing CMake code. The GitHub API provides results in JSON format. The discovery phase took 15 minutes.

For downloading repositories, we did not use the standard cloning function of git, but downloaded zip archives from GitHub without git metadata to speed up the download. The download took 2 hours; in total 35 GB of project files were stored. According to the command line tool sloccount [35], the downloaded repositories contained 92 million lines of C++ code.

To speed up the analysis done by clang-tidy, we used the parallel wrapper run-clang-tidy.py, which is distributed together with clang-tidy, to utilize all CPU cores. We also had to deduplicate the issues found, because issues present in common header files are reported for every translation unit that includes such a header file. We also found out that some projects contained their own configuration for clang-tidy, which was added to the configuration used by us, resulting in more types of issues being reported than we were looking for. After the analysis we therefore had to remove the unwanted issues from the database. Running clang-tidy on the projects took 20 hours.
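The deduplication and the removal of unwanted checks described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the attachments: it assumes that clang-tidy diagnostics have the usual form "path:line:column: warning: message [check-name]", and the names DIAGNOSTIC and unique_issues are hypothetical.

```python
import re

# One diagnostic line as printed by clang-tidy (assumed format):
#   /src/a.h:3:5: warning: some message [check-name]
DIAGNOSTIC = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<column>\d+): "
    r"(?:warning|error): (?P<message>.*) \[(?P<check>[^\]]+)\]$"
)


def unique_issues(output_lines, wanted_checks):
    """Return a sorted list of distinct (path, line, column, check) tuples.

    A finding in a shared header is printed once per translation unit that
    includes the header; collecting the tuples in a set keeps one copy.
    """
    issues = set()
    for text in output_lines:
        match = DIAGNOSTIC.match(text.strip())
        if match is None:
            continue  # notes, quoted source lines, summary lines
        if match["check"] not in wanted_checks:
            continue  # check enabled only by the project's own configuration
        issues.add(
            (match["path"], int(match["line"]), int(match["column"]), match["check"])
        )
    return sorted(issues)
```

Using the source location together with the check name as the key means that two different checks firing at the same location are still counted separately.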

7.3 Results

Out of the 203 projects successfully configured by CMake, clang-tidy was then able to analyze 186. For the remaining 17 projects clang-tidy issued only diagnostic errors, most probably because of missing configuration steps or dependencies not checked by CMake. According to sloccount, we obtained results from analyzing approximately 40 million lines of C++ code.

In Table 7.1 we can see that the most frequent issue is usage of the C-style memory management functions malloc() and free(). More than 5 thousand instances of this issue were found, which is several times more than for any other issue we were looking for. The number of projects where this issue was present is also alarming: more than 59% of the analyzed projects are affected. From the high standard deviation of occurrences in projects we conclude that there are several projects with a much higher number of issues than the rest. Overuse of C-style memory management could be caused by analyzing C libraries bundled together with the downloaded projects, but manual inspection of the issues revealed that most of the findings are in C++ code. We conclude that programmers need to be better trained in how C++ should be used, and that C++ differs from C in more than its support for object-oriented programming.

The second most frequent issue is the presence of the pseudo-random generator function rand(): more than 1500 usages were found in 37% of projects. Random numbers are certainly needed far less often than memory allocation, so we consider this number of issues relatively high. There can be many reasons for this situation. Programmers probably have low awareness of the issues surrounding random number generation and of entropy in general. The C++11 random numbers library is also probably considered too complicated by many programmers. Another reason is that some projects do not need random numbers with high entropy.

A surprising finding is that we found 832 definitions of C-style variadic functions. This feature should not have been used even before C++11 and should be replaced with type-safe variadic templates.

An interesting result is that we found 481 potential memory leaks in 26% of projects. In a similar share of projects, 23%, we also found 130 serious bugs related to manual usage of the operators new and delete. We consider the fact that one quarter of C++ projects have significant issues with memory management very serious.

The analysis also found 326 instances of deprecated auto pointers, but only in 14 projects, which accounts for 7.5% of projects. This is evidence that auto pointers are not widely used and that their removal from the C++ standard will not significantly hurt backwards compatibility.

Suggestions to allocate smart pointers using the dedicated utility functions were found in 16 projects in the case of shared pointers, and in only 2 projects in the case of unique pointers. Finally, we found only 3 misuses of the smart pointer API. A good result is that we did not find any case of manual deallocation of a pointer previously owned by a smart pointer. From the results related to memory management issues, we conclude that smart pointers are not used as much as they should be, but that programmers handle them correctly in most cases.
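The per-check columns of Table 7.1 can be derived from the number of issues found in each project. The following Python sketch shows one possible way to do this; it is an illustration, not the code from the attachments. The function name summarize and the project names are hypothetical, and the use of the sample standard deviation is an assumption. The counts 26 and 11 are invented values that happen to reproduce the modernize-make-unique row of the table under that assumption.

```python
from statistics import mean, stdev


def summarize(issues_per_project, analyzed_projects):
    """Summarize one check from a mapping project -> number of issues.

    The average and standard deviation are taken only over projects
    with at least one issue, as stated in the footnote of Table 7.1.
    """
    affected = [count for count in issues_per_project.values() if count > 0]
    if not affected:
        return {"total": 0, "projects": 0, "projects_percent": 0.0,
                "average": 0.0, "sigma": 0.0}
    return {
        "total": sum(affected),
        "projects": len(affected),
        "projects_percent": round(100 * len(affected) / analyzed_projects, 1),
        "average": round(mean(affected), 1),
        # The sample standard deviation needs at least two data points.
        "sigma": round(stdev(affected), 1) if len(affected) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical per-project counts; 186 is the number of analyzed projects.
    print(summarize({"project-a": 26, "project-b": 11, "project-c": 0}, 186))
```

A standard deviation that is large compared with the average, as in the cppcoreguidelines-no-malloc row, indicates that a few projects contribute a disproportionate share of the findings.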


7.3.1 Combinations of memory issues

Due to the large number of memory issues found, we decided to examine the relations between them further, to find out whether the issues are isolated or related to each other. The results are shown in Table 7.2. An unwelcome result is that programmers mix C-style and C++-style memory allocation, as shown in the first two rows of the table. About 12% of projects suffer from both serious bugs in memory management (check NewDelete) and memory leaks (check NewDeleteLeaks). All 3 types of memory issues, that is usage of malloc together with the issues NewDelete and NewDeleteLeaks, were found in 11% of projects. In total, at least one memory-related issue was found in 67% of the analyzed C++ projects.

Combination of issues           Projects   Projects (%)
malloc and NewDeleteLeaks             40           21.5
malloc and NewDelete                  36           19.4
NewDelete and NewDeleteLeaks          22           11.8
all 3 types                           21           11.3
at least one type                    125           67.2

Table 7.2: Numbers of repositories where multiple types of memory issues were found.
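The rows of Table 7.2 are intersections and a union of the sets of projects affected by each type of memory issue. A minimal Python sketch of this computation follows; it is an illustration, not the code from the attachments, and the function names and the dictionary keys are hypothetical.

```python
def combination_counts(projects_by_issue):
    """Count projects per combination of the three memory issue types.

    projects_by_issue maps "malloc", "NewDelete" and "NewDeleteLeaks"
    to the set of projects in which that issue was found.
    """
    malloc = projects_by_issue["malloc"]
    new_delete = projects_by_issue["NewDelete"]
    leaks = projects_by_issue["NewDeleteLeaks"]
    return {
        "malloc and NewDeleteLeaks": len(malloc & leaks),
        "malloc and NewDelete": len(malloc & new_delete),
        "NewDelete and NewDeleteLeaks": len(new_delete & leaks),
        "all 3 types": len(malloc & new_delete & leaks),
        "at least one type": len(malloc | new_delete | leaks),
    }


def percent(count, analyzed_projects):
    """Share of analyzed projects in percent, rounded to one decimal place."""
    return round(100 * count / analyzed_projects, 1)
```

With 186 analyzed projects, percent(125, 186) gives 67.2, which matches the last row of the table.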

7.4 Conclusions

From the results presented above we conclude that obsolete features are still used excessively in open source C++ projects. We were even able to find serious errors related to poor handling of memory management, such as potential memory leaks, use-after-free, and double free errors. However, we think that not using modern C++ standards is only part of the problem. The high usage of C-style constructs suggests that programmers should be better educated about the differences between C and C++, because even older C++ standards offer much higher security than C. Compatibility with C is one of the reasons for C++'s success, but programmers should take advantage of this compatibility by using existing C libraries, and should not write C code in C++ projects. From the fact that C-style and C++-style manual memory management was mixed in a significant number of projects, we conclude that developers with different levels of C++ knowledge contribute to open source projects, and that these projects should specify rules about which features are permitted in code, for example by following one of the coding conventions we have presented in Chapter 4.

Issue                                   Total count   Projects   Projects (%)   Average*      σ*
cppcoreguidelines-no-malloc                    5293        110           59.1       48.1   106.6
cert-msc50-cpp (rand())                        1563         69           37.1       22.7    87.8
cert-dcl50-cpp (C variadics)                    832         83           44.6       10.0    19.4
NewDeleteLeaks                                  481         49           26.3        9.8    26.5
modernize-replace-auto-ptr                      326         14            7.5       23.3    37.0
NewDelete                                       130         43           23.1        3.0     2.5
modernize-make-shared                           120         16            8.6        7.5     7.0
modernize-make-unique                            37          2            1.1       18.5    10.6
misc-uniqueptr-reset-release                      3          3            1.6        1.0       0
readability-uniqueptr-delete-release              0          0              0          0       0

* The average number of issues in projects and the corresponding standard deviation σ are calculated over projects that contain at least 1 issue.

Table 7.1: Total numbers of issues found by clang-tidy.

8 Conclusions

In this thesis, we have reviewed the current status of secure coding in modern C++. This thesis forms a handbook for C++ programmers who want to learn the new features introduced in modern C++ standards, which replace insecure constructs and offer a way of writing code that is less prone to common vulnerabilities. We describe what issues these features solve and provide brief examples of how they can be used. We also examined possible means of enforcing modern C++ standards, namely coding conventions recommending modern features, and tools for automatic detection of violations of the rules prescribed by the conventions.

We have also briefly compared the security of programming in C++ with other programming languages which claim to be more secure than C++. In the comparison we found that other languages in some ways implicitly offer better security, but, similarly to C++, also offer unsafe low-level APIs. However, in other languages these unsafe APIs are often explicitly marked as unsafe, usually directly in their name, while in C++ programmers must be trained in which features are secure to use.

In the second part, we presented our analysis of open source C++ projects. We were able to analyze 186 projects successfully, and discovered a large number of issues caused by not using features from modern C++ standards. Most of the issues, and the most serious ones, were related to manual memory management. The majority of the projects used C-style memory allocation functions, which are insecure for allocating objects, because they do not call constructors and destructors. Hundreds of potential memory leaks, use-after-free, and double free errors were also present, which would not be possible if smart pointers were used instead of manual memory allocation and deallocation. The rest of the findings show high usage of C-style random number generation with low entropy and of C-style variadic functions, which should be substituted with the alternatives introduced in C++11. On the other hand, only a small number of misuses of smart pointers were found.

Because of the excessive usage of C-style memory management, we conclude that the low adoption of modern C++ features is not caused by low adoption of modern standards in general, but by the preference of programmers to write C-style code in C++ projects. Compatibility with C is one of the selling points of C++, but it should only be utilized for using existing libraries written in C. This usage should then be well encapsulated, and programmers should be educated to prefer the C++ style of programming where possible.

Bibliography

1. ISO/IEC. ISO International Standard ISO/IEC 14882:2017 - Programming Language C++. 2017. Available also from: https://www.iso.org/standard/68564.html. Standard. International Organization for Standardization (ISO).

2. ISO C++ STANDARDS COMMITTEE. C++ Standard Draft Sources [online]. 2018 [visited on 2018-05-15]. Available from: https://github.com/cplusplus/draft.

3. THE MITRE CORPORATION. CWE-121: Stack-based Buffer Overflow [online]. 2018 [visited on 2018-05-15]. Available from: https://cwe.mitre.org/data/definitions/121.html.

4. INTEL CORPORATION. Control-flow Enforcement. 2017. Available also from: software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf. Technology Preview.

5. EVTYUSHKIN, D.; PONOMAREV, D.; ABU-GHAZALEH, N. Jump over ASLR: Attacking branch predictors to bypass ASLR. In: 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 2016, pp. 1-13.

6. THE MITRE CORPORATION. CWE-416: Use After Free [online]. 2018 [visited on 2018-05-15]. Available from: https://cwe.mitre.org/data/definitions/416.html.

7. THE MITRE CORPORATION. CWE-401: Improper Release of Memory Before Removing Last Reference ('Memory Leak') [online]. 2018 [visited on 2018-05-15]. Available from: https://cwe.mitre.org/data/definitions/401.html.

8. THE MITRE CORPORATION. CWE-415: Double Free [online]. 2018 [visited on 2018-05-15]. Available from: https://cwe.mitre.org/data/definitions/415.html.

9. THE MITRE CORPORATION. CWE-190: Integer Overflow [online]. 2018 [visited on 2018-05-15]. Available from: https://cwe.mitre.org/data/definitions/190.html.

10. THE MITRE CORPORATION. CWE-362: Concurrent Execution using Shared Resource with Improper Synchronization ('Race Condition') [online]. 2018 [visited on 2018-05-15]. Available from: https://cwe.mitre.org/data/definitions/362.html.

11. THE MITRE CORPORATION. CWE-457: Use of Uninitialized Variable [online]. 2018 [visited on 2018-05-15]. Available from: https://cwe.mitre.org/data/definitions/457.html.

12. STEPHENSON, Arthur G.; LAPIANA, Lia S.; MULVILLE, Daniel R.; RUTLEDGE, Peter J.; BAUER, Frank H.; FOLTA, David; DUKEMAN, Greg A.; SACKHEIM, Robert; NORVIG, Peter. Mars Climate Orbiter Mishap Investigation Board Phase I Report [https://llis.nasa.gov/llis_lib/pdf/1009464main1_0641-mr.pdf]. 1999.

13. C++ REFERENCE CONTRIBUTORS. C++ reference [online]. 2018 [visited on 2018-05-15]. Available from: https://cppreference.com/.

14. BOEHM, Hans-J.; SPERTUS, Mike; NELSON, Clark. N2670: Minimal Support for Garbage Collection and Reachability-Based Leak Detection [http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2670.htm]. 2008.

15. MICROSOFT. random_device Class [online]. 2018 [visited on 2018-05-15]. Available from: https://docs.microsoft.com/en-us/cpp/standard-library/random-device-class.

16. STANDARD C++ FOUNDATION. What are some good C++ coding standards? [online]. 2018 [visited on 2018-05-15]. Available from: https://isocpp.org/wiki/faq/coding-standards.

17. STANDARD C++ FOUNDATION. C++ Core Guidelines [online]. 2018 [visited on 2018-05-15]. Available from: https://github.com/isocpp/CppCoreGuidelines.

18. MICROSOFT. GSL: Guideline Support Library [online]. 2018 [visited on 2018-05-15]. Available from: https://github.com/Microsoft/GSL.

19. SOFTWARE ENGINEERING INSTITUTE, CARNEGIE MELLON UNIVERSITY. SEI CERT C++ Coding Standard [online]. 2018 [visited on 2018-05-15]. Available from: https://wiki.sei.cmu.edu/confluence/display/cplusplus/Introduction.

20. PROGRAMMING RESEARCH LTD. High Integrity C++ Coding Standard [online]. 2018 [visited on 2018-05-15]. Available from: http://www.codingstandard.com/.

21. PROGRAMMING RESEARCH LTD. PRQA [online]. 2018 [visited on 2018-05-15]. Available from: http://www.prqa.com/.

22. AUTOSAR. Guidelines for the use of the C++14 language in critical and safety-related systems [online]. 2017 [visited on 2018-05-15]. Available from: www.autosar.org/fileadmin/user_upload/standards/adaptive/17-03/AUTOSAR_RS_CPP14Guidelines.pdf.

23. AUTOSAR. AUTOSAR (AUTomotive Open System ARchitecture) [online]. 2018 [visited on 2018-05-15]. Available from: https://www.autosar.org/.

24. TURNER, Jason; C++ BEST PRACTICES CONTRIBUTORS. C++ Best Practices [online]. 2018 [visited on 2018-05-15]. Available from: http://cppbestpractices.com.

25. LLVM. Clang Static Analyzer [online]. 2018 [visited on 2018-05-15]. Available from: https://clang-analyzer.llvm.org.

26. BUGSENG. ECLAIR [online]. 2018 [visited on 2018-05-15]. Available from: http://www.bugseng.com/eclair.

27. ROGUE WAVE. Klocwork [online]. 2018 [visited on 2018-05-15]. Available from: roguewave.com/products-services/klocwork.

28. LIVERPOOL DATA RESEARCH ASSOCIATES. LDRA: Automating Software Verification, Requirements Traceability and Standards Compliance [online]. 2018 [visited on 2018-05-15]. Available from: https://ldra.com/.

29. PROGRAMMING RESEARCH LTD. QA Static Analyzers [online]. 2018 [visited on 2018-05-15]. Available from: http://www.prqa.com/static-analysis-software/qac-qacpp-static-analyzers/.

30. LLVM. Clang-Tidy [online]. 2018 [visited on 2018-05-15]. Available from: http://releases.llvm.org/6.0.0/tools/clang/tools/extra/docs/clang-tidy/index.html.

31. NIST NATIONAL VULNERABILITY DATABASE. CVE-2014-1266 [online]. 2014 [visited on 2018-05-15]. Available from: https://nvd.nist.gov/vuln/detail/CVE-2014-1266.

32. MARLOW, Simon. Fighting spam with Haskell [online]. 2015 [visited on 2018-05-15]. Available from: https://code.facebook.com/posts/745068642270222/fighting-spam-with-haskell.

33. THE UNIVERSITY OF GLASGOW. System.IO.Unsafe [online]. 2018 [visited on 2018-05-15]. Available from: http://hackage.haskell.org/package/base-4.11.1.0/docs/System-IO-Unsafe.html.

34. GITHUB. GitHub [online]. 2018 [visited on 2018-05-15]. Available from: https://github.com/about.

35. WHEELER, David A. SLOCCount [online]. 2018 [visited on 2018-05-15]. Available from: https://www.dwheeler.com/sloccount/.

A Attachments

The following files are attached to the electronic version of the thesis:

• source.zip - source code of our scripts for static analysis of C++ projects

• repositories.db - SQLite database containing discovered issues
