Secure Coding in Modern C++
MASARYK UNIVERSITY
FACULTY OF INFORMATICS

Secure coding in modern C++

MASTER'S THESIS

Bc. Matěj Plch

Brno, Spring 2018

This is where a copy of the official signed thesis assignment and a copy of the Statement of an Author are located in the printed version of the document.

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Bc. Matěj Plch

Advisor: RNDr. Jiří Kůr, Ph.D.

Acknowledgements

I would like to thank my supervisor Jiří Kůr for his valuable guidance and advice. I would also like to thank my parents for their support throughout my studies.

Abstract

This thesis documents how using modern C++ standards can help with writing more secure code. We describe common vulnerabilities and show new language features which prevent them. We also describe coding conventions and tools which help programmers use modern C++ features. This thesis can be used as a handbook for programmers who would like to know how to use modern C++ features for writing secure code. We also perform an extensive static analysis of open source C++ projects to find out how often obsolete constructs of C++ are still used in practice.

Keywords

secure coding, modern C++, vulnerabilities, ISO standard, coding conventions, static analysis

Contents

1 Introduction
  1.1 C++ standardization
  1.2 Thesis structure
2 Common vulnerabilities
  2.1 Buffer overflow
  2.2 Use-after-free
  2.3 Memory leak
  2.4 Double free
  2.5 Integer overflow
  2.6 Race condition
  2.7 Uninitialized variable
  2.8 Units mismatch
3 Features improving code security
  3.1 Garbage collector support
  3.2 Smart pointers
    3.2.1 Unique pointer
    3.2.2 Shared pointer
    3.2.3 Utility functions
  3.3 Arrays with known size
    3.3.1 Span
  3.4 Compile time assertions
  3.5 Type-safe units
  3.6 Random numbers library
  3.7 Ranges
  3.8 Attributes
  3.9 Concurrent programming
    3.9.1 Lock guard
    3.9.2 Type safety
    3.9.3 Atomics
    3.9.4 Parallel algorithms
  3.10 Variadic templates
  3.11 Class member initialization
4 Coding conventions
  4.1 C++ Core Guidelines
    4.1.1 Guideline Support Library
  4.2 SEI CERT C++ Coding Standard
  4.3 High Integrity C++ Coding Standard
  4.4 AUTOSAR C++14 Coding Guidelines
  4.5 C++ Best Practices
5 Tools helping with modern C++
  5.1 Clang Static Analyzer
  5.2 Cppcheck
  5.3 ECLAIR
  5.4 Klocwork
  5.5 LDRA tool suite
  5.6 Parasoft C/C++test
  5.7 QA C++
  5.8 Clang-tidy
6 Other languages
  6.1 Java
  6.2 C#
  6.3 Python
  6.4 Rust
  6.5 Go
  6.6 Haskell
7 Analysis of C++ projects
  7.1 Methodology
  7.2 Analysis execution
    7.2.1 Implementation details
  7.3 Results
    7.3.1 Combinations of memory issues
  7.4 Conclusions
8 Conclusions
Bibliography
A Attachments

1 Introduction

Security of systems is not always a matter of security policies, authentication, or encryption. To compromise a system, it is sometimes sufficient to exploit a bug in the code. Programmers should be aware of possible vulnerabilities in code and strive to write code without security flaws.
Such errors in code in most cases do not affect the normal functionality of a system, but a motivated attacker can use them to cause unexpected program behavior and thereby break the system's security.

With respect to code security, C++ has a bad reputation. The language was created as an extension of the C programming language, which has no mechanisms to protect programmers from mistakenly writing code vulnerable to exploits.

There are other programming languages in which many imperfections of C are resolved, but they are not capable of replacing C++. Some languages offer better security than C++, but at the cost of overhead. The main focus of C++ is to be as fast as possible, with zero-cost abstractions and no overhead. These properties are vital for applications where maximum performance is important: in embedded systems, where computing resources are limited; in supercomputers, where faster code allows larger simulations to be calculated; and in data centers, where large numbers of requests must be satisfied in a limited time.

Fast code is also more cost effective: performing computations in less time and with fewer resources leads to lower energy consumption. Embedded and mobile systems are then able to run longer on battery power, and large data centers save money on electricity and cooling. Some languages claim to be both fast and secure, but these languages are relatively new, and maturity is also an important quality for a technology to be adopted in the industry.

The aim of this thesis is to explore the possibilities of writing C++ code that is not prone to common security vulnerabilities, with a focus on features introduced in modern C++ standards. This thesis should serve as a brief handbook for C++ programmers who want to learn how they can use modern C++ features to write more secure code.
We show how specific features improve code security, together with brief examples of how to use them and information about the C++ standard in which they were introduced. We also perform an extensive analysis of open source C++ projects to find out how often obsolete C++ features are still used in practice.

1.1 C++ standardization

C++ is a relatively old programming language. It first appeared as an object-oriented version of C called C with Classes, created at Bell Labs by Bjarne Stroustrup. The name C++ was adopted in 1983, and the first commercial release followed in 1985. In 1998, C++ achieved an important milestone when it was standardized as the international standard ISO/IEC 14882:1998, also called C++98.

Standardization is important because it gives a technology stability, an important aspect for adoption in industry. The C++ standardization committee is divided into domain-specific study groups, which recommend submitted proposals for final voting and standardization. The ISO C++ committee consists of many experts from the industry, working for companies interested in C++, such as Google, Microsoft, Qualcomm, The Qt Company, and many more.

Development of C++ did not stop with C++98. In 2003, a maintenance release known as C++03 was published, fixing defects discovered in the previous standard. Another standard was released 8 years later, in 2011, and was a significant extension of the language. C++11 added many features for more convenient and secure programming, making C++ a more modern language. C++11 is acknowledged as the beginning of modern C++. To keep pace with advances in software development, the standardization committee decided to release new C++ standards every 3 years. C++14 followed in 2014, and C++17 is to this day the latest standard, published in 2017 [1]. The next revision [2] is expected to be published in 2020 as C++20.

1.2 Thesis structure

The first chapter is the introduction. In Chapter 2, we describe common vulnerabilities present in code.
Chapter 3 presents features of modern C++ which improve code security. In Chapter 4, we list coding conventions that recommend using modern C++ features. Chapter 5 follows up with a collection of software tools that help programmers use modern C++. In Chapter 6, we briefly study how security issues addressed by modern C++ standards are present in other popular languages. A large-scale analysis of open source C++ projects is described in Chapter 7. In the last chapter, we conclude the thesis and summarize the status of secure coding in C++.

2 Common vulnerabilities

Errors in code caused by a programmer can lead to undesired behavior of a program: they can cause crashes, execution of arbitrary code, or even escalation of privileges. In this chapter, we briefly describe the most common types of programming errors, what harm they can cause, and whether there are mitigations preventing successful exploitation by an attacker.

2.1 Buffer overflow

Buffer overflow is an error caused by accessing memory outside the boundaries of an array [3]. It is usually caused by writing too many bytes to an array without checking the size of the array. An attacker can use knowledge of the layout of the execution stack to change values of variables or to execute arbitrary code. To change the value of a variable, the attacker must know where on the stack the variable lies and overflow the buffer with bytes of specific values, so that they overwrite the memory location of the variable. Similarly, to execute arbitrary code, the attacker must overwrite the return address on the stack with a specific address where code is prepared for execution. This is the basis of techniques such as return-oriented programming (ROP), which chains return addresses pointing to fragments of code already present in the program.

In C++, accessing memory past the end of an array is undefined behavior, meaning that anything can happen. Modern C++ compilers take extensive advantage of undefined behavior for optimization: they simply assume that undefined behavior never happens.
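
As an illustration of the size check whose absence causes a buffer overflow, the following minimal sketch contrasts an explicitly checked write with bounds-checked element access. The helper names `checked_copy` and `out_of_range_is_detected` are hypothetical and introduced only for this example; `std::array` and its `at()` member are part of the standard library since C++11.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: copies `input` into a fixed-size buffer only if it fits,
// including the terminating null character. Returns false instead of writing
// past the end of the buffer.
template <std::size_t N>
bool checked_copy(std::array<char, N>& buffer, const std::string& input) {
    if (input.size() >= N) {
        return false;  // the write would overflow the buffer
    }
    std::copy(input.begin(), input.end(), buffer.begin());
    buffer[input.size()] = '\0';
    return true;
}

// Hypothetical helper: returns true if reading index `i` is rejected.
// Unlike operator[], at() checks the index and throws std::out_of_range
// instead of invoking undefined behavior.
bool out_of_range_is_detected(std::size_t i) {
    std::array<int, 4> values{1, 2, 3, 4};
    try {
        (void)values.at(i);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Because the size of a `std::array` is part of its type, the check in `checked_copy` cannot fall out of sync with the actual buffer size, which is the kind of mistake that typically happens when a raw array and its length are passed around separately.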