Linköping University | Department of Computer and Information Science
Master's thesis, 30 ECTS | Datateknik 2021 | LIU-IDA/LITH-EX-A--21/018--SE

Using the SEI CERT Secure Coding Standard to Reduce Vulnerabilities

Johan Fisch Carl Haglund

Supervisors: Senyang Huang, Rahul Hiran, Ioannis Avgouleas
Examiner: Andrei Gurtov


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Johan Fisch, Carl Haglund

Abstract

Security is a critical part of every system developed today, and it will become even more important going forward as more devices are connected to the internet. By striving to improve the quality of the code, in particular its security aspects, the number of vulnerabilities may be reduced and the software improved. By looking at past problems and studying the code in question to see whether it follows the SEI CERT secure coding standard, it is possible to tell whether compliance with this standard would help reduce future problems. In this thesis, vulnerabilities written in C and C++ and reported in Common Vulnerabilities and Exposures (CVE) are analyzed to verify whether applying the SEI CERT secure coding standard helps reduce vulnerabilities. This study also evaluates the SEI CERT rule coverage of three different static analysis tools, Rosecheckers, PVS-Studio and CodeChecker, by executing them on these vulnerabilities and comparing them using three metrics: true positives, false negatives and run time. The results of the study are promising, since they show that compliance with the SEI CERT standard does indeed reduce vulnerabilities. Of the analyzed vulnerabilities, about 60% could have been avoided if the standard had been followed. The results of the tools were of great interest as well: the tools did not perform as well as the manual analysis, but all of them found some SEI CERT rule violations in different areas. In conclusion, a combination of manual analysis and these three static analysis tools would have resulted in the highest number of vulnerabilities avoided.

Acknowledgments

We would like to thank Ericsson and their employees who have been involved in our work. A special thanks goes out to Rahul Hiran, our supervisor at Ericsson. Without his interesting ideas and help throughout the whole process, the results of the thesis would not have been the same. We would also like to thank the developers of the tool CodeChecker at Ericsson, especially Daniel Krupp, who took the time to have a meeting with us and explain more about the tool. Appreciation also goes out to Linköping University. We would like to thank our supervisors Senyang Huang and Ioannis Avgouleas as well as our examiner Andrei Gurtov, who have assisted us with the thesis writing and provided us with interesting and valuable thoughts about the area.

Contents

Abstract

Acknowledgments

Contents

List of Figures

List of Tables

Listings

1 Introduction
1.1 Motivation
1.2 Aim
1.3 Research questions
1.4 Delimitations

2 Theory
2.1 Secure software development
2.2 CVE
2.3 SEI CERT Coding Standard
2.4 SEI CERT C Coding Standard
2.5 SEI CERT C++ Coding Standard
2.6 CVSS
2.7 Static Analysis Tools
2.8 Programming languages

3 Related Work
3.1 Secure coding
3.2 Benefits of coding standards
3.3 Evaluation of static analysis tools
3.4 Collection of vulnerabilities

4 Method
4.1 Approach
4.2 Gathering of vulnerabilities in CVE
4.3 Analyzing vulnerabilities in CVE
4.4 Gathering rule specific CVE vulnerabilities
4.5 Analyzing rule specific CVE vulnerabilities
4.6 Case studies

5 Results
5.1 Gathering of vulnerabilities in CVE
5.2 Analyzing vulnerabilities in CVE

6 Discussion
6.1 Method
6.2 Results
6.3 The work in a wider context

7 Conclusion
7.1 How can vulnerabilities be reduced in the early phase of software development?
7.2 To what extent does SEI CERT compliance help reduce vulnerabilities?
7.3 What tools can help complying with the SEI CERT secure coding standard?
7.4 Future work

Bibliography

A Script for gathering EXP34-C CVE vulnerabilities

B Script to gather C++ CVE:s

C C CVE:s

D C++ CVE:s

E Rule Specific CVE:s

F Rule Specific figures
F.1 ARR30-C
F.2 EXP33-C
F.3 EXP34-C
F.4 FIO47-C
F.5 INT30-C
F.6 INT32-C
F.7 INT33-C
F.8 MEM30-C
F.9 MEM35-C
F.10 STR31-C

List of Figures

2.1 Abstract syntax tree generated for the code in Listing 2.9.

4.1 Description of a CVE vulnerability.
4.2 Example of PVS-Studio output.
4.3 Example of Rosecheckers output.
4.4 Rules that Rosecheckers covers for memory management.
4.5 Example of CodeChecker HTML output.

5.1 SEI CERT C Rule vs. No Rule distribution for the 60 CVE:s analyzed.
5.2 SEI CERT C Rule distribution for the 38 CVE:s where a rule could be found.
5.3 Risk distribution for the 16 different rules found during C analysis.
5.4 Number of SEI CERT C violations found per rule during C analysis.
5.5 Percentages of violations found per rule during C analysis.
5.6 Violations found in relation to size during C analysis.
5.7 SEI CERT C++ Rule vs. No Rule distribution for the 60 CVE:s analyzed.
5.8 SEI CERT C++ Rule distribution for the 37 CVE:s where a rule could be found.
5.9 Risk level distribution for the 12 different rules found during C++ analysis.
5.10 Number of SEI CERT C++ violations found per rule during C++ analysis.
5.11 Percentages of violations found per rule during C++ analysis.
5.12 Static analysis tools run time comparison.
5.13 PVS & Rosecheckers run time in relation to project size.
5.14 PVS & Rosecheckers run time in relation to number of files.
5.15 CodeChecker run time in relation to project size.
5.16 CodeChecker run time in relation to number of files.
5.17 Rule specific violations found per static analysis tool.
5.18 Rule specific project size in relation to found violations per static analysis tool.
5.19 Rule specific project run time in relation to size per static analysis tool.
5.20 Rule specific number of violations found in relation to CVSS per static analysis tool.

F.1 ARR30-C Size related to run time.
F.2 ARR30-C Size related to number of found violations.
F.3 ARR30-C CVSS related to number of found violations.
F.4 EXP33-C Size related to run time.
F.5 EXP33-C Size related to number of found violations.
F.6 EXP33-C CVSS related to number of found violations.
F.7 EXP34-C Size related to run time.
F.8 EXP34-C Size related to number of found violations.
F.9 EXP34-C CVSS related to number of found violations.
F.10 FIO47-C Size related to run time.
F.11 FIO47-C Size related to number of found violations.
F.12 FIO47-C CVSS related to number of found violations.
F.13 INT30-C Size related to run time.
F.14 INT30-C Size related to number of found violations.
F.15 INT30-C CVSS related to number of found violations.
F.16 INT32-C Size related to run time.
F.17 INT33-C Size related to run time.
F.18 INT33-C Size related to number of found violations.
F.19 INT33-C CVSS related to number of found violations.
F.20 MEM30-C Size related to run time.
F.21 MEM30-C Size related to number of found violations.
F.22 MEM30-C CVSS related to number of found violations.
F.23 MEM35-C Size related to run time.
F.24 MEM35-C Size related to number of found violations.
F.25 MEM35-C CVSS related to number of found violations.
F.26 STR31-C Size related to run time.
F.27 STR31-C Size related to number of found violations.
F.28 STR31-C CVSS related to number of found violations.

List of Tables

2.1 Likelihood table in a risk assessment.
2.2 Severity table in a risk assessment.
2.3 Remediation Cost table in a risk assessment.
2.4 Possible levels in a risk assessment.

4.1 Rules tested in tools analysis.

5.1 Rules tested in C analysis.
5.2 True positive and False negative for C analysis.
5.3 Rules tested in C++ analysis.
5.4 True positive and False negative for C++ analysis.
5.5 Found violations per tool for each Size range during Rule specific analysis.
5.6 True positive and False negative for Rule specific analysis.

C.1 CVE:s tested in C CVE analysis.

D.1 CVE:s tested in C++ CVE analysis.

E.1 CVE:s tested in Rule Specific CVE analysis.

Listings

2.1 Off-by-One error.
2.2 Fixed Off-by-One error.
2.3 Accessing freed memory.
2.4 No longer accessing freed memory.
2.5 Format string bug.
2.6 No longer contains a Format string bug.
2.7 Integer overflow.
2.8 Fixed Integer Overflow.
2.9 Abstract syntax tree example code.
4.1 Python script for extracting C vulnerabilities.
4.2 Commands for PVS-Studio analysis.
4.3 Python script for adding PVS-Studio student license comment.
4.4 Command for getting the docker container.
4.5 Example command for running Rosecheckers analysis on the rtp.c file in the Janus-gateway project.
4.6 Command for setting up and running CodeChecker.
4.7 Git diff for CVE-2020-14033.
4.8 Git diff for CVE-2018-9304.
4.9 Line with problematic code for CVE-2019-9113.
A.1 Python script for extracting EXP34-C CVE vulnerabilities.
B.1 Python script for extracting C++ vulnerabilities.

1 Introduction

In a society that is constantly moving forward, where the number of connected devices increases each day, the danger of cyber attacks is rising and the need for a defense is more important than ever [15]. This potential danger is a significant issue in both the private and public sectors, where the involved parties need to consider different security aspects, such as which systems to defend, the expected frequency of cyber attacks and what type of security to invest in. As Ijaz Ahmad et al. explain in their article "Security for 5G and Beyond" [1], the rise of 5G and the massive growth of connected devices that comes with it have also opened the door to more security threats. The emerging 6G standard, which will be introduced in about 10 years and change the society we live in, also comes with challenging security threats that need to be taken care of [36].

One way to address the issue of software vulnerabilities, and thereby cyber attacks, is to introduce secure coding standards into the development and maintenance of the code [44]. Mark Grover et al. [25] conclude in their study that cyber attacks will increase over time and that introducing secure coding is an effective countermeasure. By applying these types of standards, programmers are encouraged to follow a collection of guidelines established to make the software more secure. There is a great number of different standards and guidelines that can be followed depending on the type of programming language used, for example the SEI CERT Secure Coding Standard [27] and MISRA C [6].

Static analysis tools are one way to identify existing vulnerabilities within the code. These tools can also help with complying with a specific coding standard, since the purpose of the tools is to give warnings where the code is non-compliant with the standard. Multiple studies have shown that static analysis tools produce both incorrect and correct warnings, called false positives and true positives. Studies like Jiang Zheng et al.'s [54] also show that tools are a good way to observe which mistakes occur most often; they found that "possible use of NULL pointer" accounted for 45.92% of all the vulnerabilities. While it can be debated whether one static analysis tool is better than another with regard to violations found, it is equally important to consider the run time of the tool as well as the number of projects the tool can actually be run on. Therefore, an evaluation of different tools is of great interest when deciding which tool to use in a project.


1.1 Motivation

Different coding standards have long been suggested to increase security, reliability and overall quality [27, 13]. Currently there is not much empirical evidence backing these statements, and studies have even shown that conformance to all rules in a specific standard may result in more introduced faults [11]. This is beyond doubt an interesting area, and more effort needs to be put into these types of questions. Therefore, the main focus of this thesis is to establish how compliance with different rules of the SEI CERT Coding Standards can be used to reduce the number of vulnerabilities and thereby improve the quality of the code.

1.1.1 Ericsson

This thesis is conducted at Ericsson, a leading company in Information and Communication Technology. Ericsson works with technology ranging from networks and digital services to managed services and emerging business. This leads to Ericsson having to deal with thousands of lines of code every day. As a part of this, Ericsson needs to handle all vulnerabilities related to the code. A recurring part of Ericsson's development process is to write trouble reports (TRs) when problems occur. Since this takes a substantial amount of time, it is something Ericsson wants to minimize. By investigating whether using the SEI CERT standard is beneficial, Ericsson hopes to reduce the number of vulnerabilities, and with that the workload related to TRs. Ericsson could further improve the quality of their products by writing more secure code.

1.2 Aim

This thesis aims to test whether compliance with the SEI CERT secure coding standard can help reduce vulnerabilities. This is achieved by analyzing vulnerabilities, both manually and with static analysis tools, reported in Common Vulnerabilities and Exposures (CVE) [17], a public database where a significant number of vulnerabilities from different software projects are reported. A secondary aim of this thesis is to evaluate different static analysis tools with regard to SEI CERT coverage and performance.

1.3 Research questions

To achieve the aim of the thesis, the following research questions will be answered:

RQ1. How can vulnerabilities be reduced in the early phase of software development?

RQ2. To what extent does SEI CERT compliance help reduce vulnerabilities?

RQ3. What static analysis tools can help complying with the SEI CERT secure coding standard?

1.4 Delimitations

When analyzing which tools can help with complying with SEI CERT, the time it takes to run the tools on the projects is an important factor, and it was decided to limit this by not including projects that are very large in size, such as the Linux kernel. The tools analyzed were limited to three different static analysis tools, mainly because the selected ones were free to use and because they cover many SEI CERT rules. Both of these limitations were also due to the 20-week time limit for the research, study and report to be finished.


Another delimitation is that this thesis only looks at the C and C++ SEI CERT coding standards and not the Android, Java or Perl standards. This is primarily because it was requested by Ericsson, but also due to the limited time for the thesis to be completed.

2 Theory

In this chapter, theory about Secure software development, CVE, and the SEI CERT Coding Standard will be given. After that, CVSS will be briefly presented as well as the static analysis tools used.

2.1 Secure software development

In this subsection, secure software development (also referred to interchangeably as secure coding throughout this thesis) and how it can be applied to a project will be presented.

The reason secure coding is currently such a prevailing topic, and more important than ever to devote effort to, is the ever-present threat of cyber attacks. As more and more systems are connected to the internet, people need to consider the risks and whether protecting the system is cost-effective [10]. As Pawani Porambage et al. discuss in the article "The Quest for Privacy in the Internet of Things" [37], the large growth of the Internet of Things (IoT) is a reason why developers need to take the privacy of their users into consideration and why they need to develop secure and trustworthy software that protects the users in any case. Secure software development is what developers need to adhere to in order to make sure that the software is protected from vulnerabilities. Neglecting this could result in loss of important data, denial of services, leaked company secrets or damage to the system.

The software development life cycle (SDLC) is a model that includes the different processes in the life cycle of a software project [8]. This model consists of six steps: the requirement phase, the architecture and design phase, the implementation phase, the testing phase, the deployment phase and the maintenance phase. Secure coding is not only applied in the implementation phase; it is something that should be considered throughout the whole life cycle. Security requirements should be established as early as the requirement phase. In the architecture and design phase, risk analysis should be exercised. The implementation phase might contain risk-based security testing and static analysis, and the deployment and verification phase could, for example, include risk analysis and penetration testing.

There are many ways developers can make their systems more secure against threats. Commonly used practices are, for example, to follow a specific secure coding standard, e.g. the SEI CERT Secure Coding Standard, or to adhere to a special SDL method. The next section will introduce the CVE database and describe some of its components.

2.2 CVE

The Common Vulnerabilities and Exposures, or CVE, is a program and a database that was launched in 1999 [17]. CVE aims to gather known vulnerabilities from different software projects. CVE consists of countless CVE records; these records contain, amongst other things, a CVE ID number, a description, and a section of references. The description often includes references to the source of the vulnerability, and the reference section usually contains links, for example a GitHub repository link or a report of the vulnerability on the developer's product website. Because of its high usability, CVE has become the industry standard for vulnerability reports [18].

There are a few other databases, such as the IBM X-Force Exchange [26] and SecurityFocus [45], that also collect and show vulnerabilities. A vulnerability record in the IBM X-Force Exchange usually links to the corresponding CVE record, if one exists. Vulnerability records in the IBM X-Force Exchange also have tags, making searching for particular vulnerability types quite easy.
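To make the record structure concrete, the fields described above could be sketched as a C struct. This is a hypothetical illustration of ours, not an official CVE schema; all field names are invented.

#include <stddef.h>

/* Hypothetical sketch of what a CVE record carries; not an official schema. */
struct cve_record {
    char id[24];              /* CVE ID number, e.g. "CVE-YYYY-NNNN" */
    const char *description;  /* human-readable summary, often citing the source */
    const char **references;  /* links, e.g. a GitHub commit or a vendor advisory */
    size_t n_references;      /* number of entries in the reference section */
};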

2.3 SEI CERT Coding Standard

The SEI CERT Secure Coding Standard includes five different components where each of them consists of guidelines about secure coding in that specific programming language or area:

1. SEI CERT C Coding Standard

2. SEI CERT C++ Coding Standard

3. SEI CERT Oracle Coding Standard for Java

4. Android™ Secure Coding Standard

5. SEI CERT Perl Coding Standard

In particular, CERT stands for Computer Emergency Response Team and is a program that is part of the Software Engineering Institute (SEI) at Carnegie Mellon University [14, p. xxvii]. Originally the CERT program was created to help teams of experts communicate during security emergencies; however, this is no longer the sole purpose of CERT. They now produce analyses in different security areas as well as provide standards for secure coding practices. Next, theory about the SEI CERT C standard will be introduced. All the sections explained for the C standard also apply to the SEI CERT C++ standard, and most of the SEI CERT C rules are included in the C++ standard as well. Therefore the C++ section will be shorter, as there would otherwise be a lot of repetition.

2.4 SEI CERT C Coding Standard

The CERT C Secure Coding Standard, at times referred to as the CERT C standard, is developed by SEI, and the goal of the standard is to make it easier to develop safe, reliable, and secure systems [27]. Compliance with the CERT C standard will make the system more secure and reliable at the code level, but this is not always enough, since there might exist critical design flaws in the system design, which is not something that SEI CERT directly addresses. In systems where safety is of utmost importance, the requirements are usually stricter than those of the CERT C standard.


The different rules in the SEI CERT C Secure Coding Standard consist of a few parts: a title that shortly describes the rule, and a description that is more specific and explains the requirements of the rule. There are also code examples, both non-compliant and compliant ones. The guidelines also contain recommendations to help guide programmers towards more secure and reliable code. These recommendations do not need to be followed in the same way that a rule does, and a violation of a recommendation does not automatically mean that the code is insecure or bad. To check for compliance with the SEI CERT C coding standard it is most efficient to have a static analysis tool set up, as explained in Section 2.7, but it can also be done manually. However, manual analysis takes much more time than automated tool analysis. The SEI CERT C Secure Coding Standard is meant to make project members change the way they think about secure coding in software development. By adhering to this standard the team can create the highest form of value, and also gain knowledge that will be useful for a long time in future work.

2.4.1 Scope of SEI CERT C

As of now, the SEI CERT C Secure Coding Standard focuses mainly on version C11 (ISO/IEC 9899:2011), but it can also be applied to previous versions. The differences between the versions may lead to ambiguities; therefore it is important, when following the standard, to look for notes about how the standard affects a specific version.

Some of the issues that are not addressed in the CERT C standard are coding style and rules that are seen as controversial. The reason is that coding style is usually subjective and it is extremely difficult to create a style guide that everyone agrees with. Therefore coding style is skipped completely in the CERT C standard. For a similar reason, controversial rules are skipped: since there is no broad consensus on these rules, CERT has decided not to include them at all.

2.4.2 Validation

Compliance with the SEI CERT C Secure Coding Standard can be checked with different static analysis tools. The reason to use these is the complexity of a program with thousands of lines of code. The static analysis tools cannot be applied to enforce all of the guidelines, since some of the rules are only meant to be descriptive, for example "MSC41-C. Never hard code sensitive information". A static tool will in most cases not be able to tell whether a program follows a specific guideline or set of rules, because it is in general computationally infeasible to decide whether a program adheres to a specific rule or recommendation.

When deciding which static analysis tool to use there are certain aspects that should be taken into consideration. Two of these aspects are completeness, which means that no false positives are reported, and soundness, which means that no false negatives are reported. A false negative means that there is a vulnerability in the program but the static analysis tool does not report it, while a false positive means that the tool reports a vulnerability in the code when in reality there is none [20]. The false negative is usually the more serious of the two, since it leaves the users of the tool with the illusion that there are no vulnerabilities in the program. When deciding on an analyzer, it is important to choose one that is both sound and complete with respect to the specific guidelines or set of rules that are of importance.
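As a small illustration of this terminology, consider the following function. The example is ours, not from the standard; the defect and the safe call are chosen only to make the two kinds of misreport concrete.

#include <string.h>

/* Illustration (ours) of true/false positives and negatives. */
void demo(int n) {
    char greeting[8];
    strcpy(greeting, "hi");          /* provably safe: a tool that warns about
                                        an overflow here emits a false positive */
    char buf[8];
    if (n >= 0 && n <= 9) {
        memset(buf, 'x', (size_t)n); /* defect: n == 9 writes past buf; a tool
                                        that reports it scores a true positive,
                                        one that stays silent produces a false
                                        negative */
    }
}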

2.4.3 Rules and recommendations

As mentioned earlier, the foundation of the CERT Coding Standard is built upon rules and recommendations. The CERT C standard contains 99 rules and 185 recommendations. Next, these rules and recommendations will be introduced.


Table 2.1: Likelihood table in a risk assessment.

Value  Category
1      Unlikely
2      Probable
3      Likely

The purpose of the rules is to guide the developers throughout the development process. The rules can be seen as requirements that the coders need to follow to comply with the standard. A failure to comply with a specific rule can result in a defect in the code, which could lead to an exploitable vulnerability in the program. By making sure every rule is adhered to, the program is considered to be more reliable, secure and safe.

A recommendation is not treated as a rule; if a specific recommendation is not followed, a vulnerability does not automatically appear as a result. Instead, a recommendation can be seen as a way to help the developers navigate the development process to make the final product more stable and to improve its safety and security. Together, rules and recommendations are referred to as guidelines for the developers throughout the development process.

Each rule has a risk assessment where it is given a level depending on how likely a violation is to lead to a vulnerability, how severe the consequences are, and how easy it is to fix. These levels are L1, L2, and L3. The likelihood of a rule violation leading to a vulnerability that an attacker can exploit is measured in three categories, i.e., unlikely, probable and likely, each of which is given a value 1-3 (see Table 2.1). The severity is a measure of the possible consequences of a vulnerability that occurred due to a rule violation. Each rule is categorized as either low, medium or high severity, and each category is illustrated with an example, as can be seen in Table 2.2. The third metric used in the risk assessment is the remediation cost. This is the estimated cost for the developers of changing a program that violates a rule so that it complies with the standard. The remediation cost is categorized in the same way as the severity, but here each category is also given a detection and correction class, which can be either automatic or manual, as seen in Table 2.3.

Table 2.2: Severity table in a risk assessment.

Value  Category  Examples
1      Low       DoS attack
2      Medium    Data breach
3      High      Buffer overflow

Table 2.3: Remediation Cost table in a risk assessment.

Value  Category  Detection  Correction
1      High      Manual     Manual
2      Medium    Auto       Manual
3      Low       Auto       Auto

When all three metrics have been evaluated it is possible to give each rule a level (L1, L2, L3). This is done by multiplying the values of the likelihood, severity and remediation cost. If the product is in the range 1 to 4, the rule is given level 3 (L3, e.g. low severity, probable, medium remediation cost), if it is in the range 6 to 9 it is given level 2 (L2, e.g. low severity, likely, low remediation cost), and the range 12 to 27 gives level 1 (L1, e.g. high severity, likely, medium remediation cost), as seen in Table 2.4.


Table 2.4: Possible levels in a risk assessment.

Levels  Priorities   Clarification
L1      12, 18, 27   High severity, likely, medium remediation cost
L2      6, 8, 9      Low severity, likely, low remediation cost
L3      1, 2, 3, 4   Low severity, probable, medium remediation cost
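The level computation itself is simple arithmetic and can be expressed directly in code. The following C sketch is our own illustration of the banding above; the type and function names are invented.

#include <stdio.h>

/* Our illustration of the SEI CERT risk-assessment arithmetic: each metric
 * takes a value 1-3 and the priority is their product (1..27). */
typedef enum { LEVEL_L1 = 1, LEVEL_L2 = 2, LEVEL_L3 = 3 } cert_level;

static cert_level rule_level(int likelihood, int severity, int remediation_cost) {
    int priority = likelihood * severity * remediation_cost;
    if (priority >= 12) return LEVEL_L1; /* priorities 12, 18, 27 */
    if (priority >= 6)  return LEVEL_L2; /* priorities 6, 8, 9 */
    return LEVEL_L3;                     /* priorities 1, 2, 3, 4 */
}

int main(void) {
    /* likely (3), high severity (3), low remediation cost (3): product 27 */
    printf("L%d\n", rule_level(3, 3, 3)); /* prints "L1" */
    return 0;
}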

2.4.4 Code Examples for Rules

As mentioned in the previous section, the SEI CERT C Coding Standard includes 99 different rules. These rules are categorized into different areas, such as Preprocessor, Declarations and Initialization, Integers, and Characters and Strings. Next, examples from some of these areas will be presented.

Characters and Strings

An example of a rule in this area is "STR31-C. Guarantee that storage for strings has sufficient space for character data and the null terminator" [27, p. 230], which is an L1 rule on the risk assessment scale. This rule is meant to hinder users from causing a buffer overflow by overwriting a buffer with data. An Off-by-One error, as seen in Listing 2.1, is a typical case of this; it may occur if the null terminator is not taken into consideration when writing the length check.

void copy_to_buffer(char *input) {
    char buf[20];
    if (strlen(input) > 20) {
        exit(1);
    }
    strcpy(buf, input);
}

Listing 2.1: Off-by-One error.

In this case the Off-by-One error can be exploited since strlen() does not count the null terminator when computing the length of the input. This means that the if-statement is passed if the input is exactly 20 characters long (not counting the null terminator), and the call to strcpy() then writes outside of the buffer. To comply with the rule and avoid this vulnerability, the function can be rewritten as in Listing 2.2, where the null terminator is taken into account by rejecting any input whose length is greater than or equal to the buffer size.

void copy_to_buffer(char *input) {
    char buf[20];
    if (strlen(input) >= sizeof(buf)) {
        exit(1);
    }
    strcpy(buf, input);
}

Listing 2.2: Fixed Off-by-One error.


Memory Management

One of the rules in this area is the L1 rule "MEM30-C. Do not access freed memory" [27, p. 256]. An example of this is accessing a dangling pointer, which can be described as a pointer that used to point to live data in memory, but whose target has since been deleted, so that only the memory address remains. This behavior can be seen in Listing 2.3 and may result in undefined behavior.

void use_after_free(char *msg) {
    char *ptr = (char *) malloc(4);
    strcpy(ptr, "abc");
    if (msg != NULL) {
        free(ptr);
    }
    printf("error on: %s\n", ptr);
}

Listing 2.3: Accessing freed memory.

To prevent this from happening it is important that the pointer is not dereferenced after it has been freed, as in Listing 2.4.

void use_after_free(char *msg) {
    char *ptr = (char *) malloc(4);
    strcpy(ptr, "abc");
    printf("error on: %s\n", ptr);
    if (msg != NULL) {
        free(ptr);
    }
}

Listing 2.4: No longer accessing freed memory.

Input/Output

The Input/Output area contains rules such as "FIO30-C. Exclude user input from format strings" [27, p. 281], which is also an L1 rule. This type of rule exists to prevent attackers from directly controlling the contents of a format string. When an attacker controls the format string, it may be possible to view contents of the stack or memory, or even to execute arbitrary code through the vulnerable software. A non-compliant code example can be seen in Listing 2.5, where a user gives his/her password to log in to a service.

void check_user_password(const char *psw) {
    if (psw == NULL) {
        no_psw_input();
    } else if (...) {
        ...
    } else {
        printf("Wrong password, you wrote: ");
        printf(psw);
    }
}

Listing 2.5: Format string bug.

To fix this type of bug it is important not to allow the user any control over the format string. One way to do this is shown in Listing 2.6, where the printf() call now uses the "%s" format specifier to insert the user input into the string.


void check_user_password(const char *psw) {
    if (psw == NULL) {
        no_psw_input();
    } else if (...) {
        ...
    } else {
        printf("Wrong password, you wrote: %s\n", psw);
    }
}

Listing 2.6: No longer contains a Format string bug.

Integers

One of the rules belonging to the Integers area is "INT32-C. Ensure that operations on signed integers do not result in overflow" [27, p. 147]. This is an L2 rule that has a high severity and is likely, but has a high remediation cost, therefore making it L2 instead of L1. This rule ensures that signed integer overflows are handled, meaning that a program will not allow a signed int to go outside of its defined range, which would result in undefined behavior and often means the integer turning negative. This may let an attacker bypass weak if-statements meant to prevent these overflows, as seen in Listing 2.7, where an attacker could exploit the code by giving size_a and size_b INT_MAX values. This could make the variable size negative, which would bypass the if-statement and allow the attacker to overflow the buffer.

int func(signed int size_a, signed int size_b) {
    char buf[1024];
    signed int size = size_a + size_b;
    if (size > 1024) {
        printf("size too big for buffer\n");
        return ERROR_CODE;
    }
    printf("size will fit buffer\n");
    return OK;
}

Listing 2.7: Integer overflow.

To avoid this, the values of size_a and size_b need to be checked for overflow before the sum is used, as in Listing 2.8.

int func(signed int size_a, signed int size_b) {
    char buf[1024];
    if (((size_b > 0) && (size_a > (INT_MAX - size_b))) ||
        ((size_b < 0) && (size_a < (INT_MIN - size_b)))) {
        exit(0);
    }
    signed int size = size_a + size_b;
    if (size < 0 || size > 1024) {
        printf("size too big (or negative) for buffer\n");
        return ERROR_CODE;
    }
    printf("size will fit buffer\n");
    return OK;
}

Listing 2.8: Fixed Integer Overflow.


2.5 SEI CERT C++ Coding Standard

As with the C version of this standard, the CERT C++ Secure Coding Standard [7] is developed by the same institute, SEI. The purpose of this standard is also the same, to develop safe, reliable, and secure systems, but this time for systems written in the programming language C++. The CERT C++ standard references the C standard in some parts; for example, some of the rules included in the C standard also apply in the C++ standard. The standard can also be used by software customers when defining important requirements for the software.

2.5.1 Scope of SEI CERT C++

The SEI CERT C++ Coding Standard mainly focuses on the C++ version C++14 (the ISO/IEC 14882 standard), but can be applied to earlier versions as well. As in the C standard, the guidelines of the C++ standard consist of rules and recommendations, where each rule and recommendation has a compliant and a non-compliant code example that conforms to the C++14 guidelines. The issues not addressed in the C++ standard are the same as the ones not addressed in the C standard.

2.5.2 Validation

The validation of the SEI CERT C++ Coding Standard is the same as for the C standard in Section 2.4.2, meaning that compliance with the C++ standard can be checked in the same way.

2.5.3 Rules and recommendations

In the CERT C++ standard, the rules and recommendations are defined and function in the same way as for the CERT C standard, described in Section 2.4.3. However, this time SEI decided that the recommendations should not be included until additional research and development have been done in this area. Two new main rule areas have been added in the C++ standard: Object Oriented Programming (OOP) and Containers (CTR). Containers is very similar to the Array area in the C standard and even includes some of the rules from the Array area, but it is expanded since C++ has multiple different container types.

2.6 CVSS

Each CVE record is given a CVSS base score by the National Vulnerability Database (NVD), which can be described as a way to classify the overall severity of the specific vulnerability [46]. The CVSS base score ranges from 0 to 10, where 10 is the most severe. Depending on the base score, the vulnerability is also given a severity rank of "None", "Low", "Medium", "High", or "Critical" for base scores of 0, 0.1-3.9, 4.0-6.9, 7.0-8.9 and 9.0-10.0, respectively.
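The banding from base score to severity rank is a simple threshold lookup. The following C sketch is our own illustration of the mapping above; the function name is invented and this is not NVD code.

#include <stdio.h>

/* Our illustration of the NVD severity banding described above. */
static const char *cvss_severity(double base_score) {
    if (base_score == 0.0) return "None";
    if (base_score <= 3.9) return "Low";
    if (base_score <= 6.9) return "Medium";
    if (base_score <= 8.9) return "High";
    return "Critical";
}

int main(void) {
    printf("%s\n", cvss_severity(7.5)); /* prints "High" */
    return 0;
}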

2.7 Static Analysis Tools

A static analysis tool is a tool that aims to identify as many coding problems as possible during development and testing [16]. Software consists of code that has to be run; static analysis tools are able to examine this code statically, without executing the software. However, not all vulnerabilities can be found with static analysis, which means that static analysis alone is not enough. One form of static analysis is manual auditing, which is done by letting people go through the code line by line. This sort of analysis is known to be slow and somewhat problematic, since the people analyzing the code need to know a great deal about security vulnerabilities, which leads to many vulnerabilities being missed.


By using a tool instead of manual auditing, the analysis process becomes more precise and consistent, since each analysis is based on the set of rules the tool was programmed to use. However, this set of rules is not complete or perfect in any way, and a static analysis tool should not be trusted completely. When using static analysis tools, one thing to keep in mind is that these tools may report a large number of false negatives or false positives [19, 38], meaning that the software contains vulnerabilities that the tool does not report, or that the tool reports vulnerabilities that are not really problems in the software, respectively. Static analysis tools will probably not uncover all of the vulnerabilities and bugs in software, which is something developers need to keep in mind when using these types of tools [2]. A good way to use the tools could be as assistance during manual code reviews.

A technique that static analysis tools can apply is pattern matching [3]. This method scans the code for predefined and potentially dangerous patterns, for example unsafe library functions like gets() and sprintf(). Another approach is the data-flow analysis method [3]: by going through all possible paths, this method gathers the information needed to tell whether a set of values or a chosen path is dangerous, for example if a program variable is used in a potentially dangerous way, e.g. in an unsafe library function. Symbolic execution [28] is another technique that can be used; this method uses symbolic values instead of running the program with actual inputs. When the symbolic values are used, the interpreter builds constraint expressions over the symbolic values that cover every possible outcome.
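As an illustration of the pattern-matching technique, the function below (our own, intentionally non-compliant example) contains exactly the kind of calls such an analyzer flags regardless of context; gets() was in fact removed in C11 for this reason.

#include <stdio.h>

/* Our example of code a pattern-matching analyzer would flag. */
void read_name(void) {
    char name[32];
    gets(name);                        /* flagged: gets() cannot bound its input */
    char greeting[64];
    sprintf(greeting, "hi %s", name);  /* flagged: sprintf() cannot bound its output */
    puts(greeting);
}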

2.7.1 CodeChecker

CodeChecker [22] is a static analysis tool developed by Ericsson. It uses the analyzers Clang-Tidy and Clang Static Analyzer, with the possibility of using Cross Translation Unit analysis (CTU), which makes it possible to analyze functions that communicate between multiple files. The Clang Static Analyzer uses techniques based on symbolic execution and path-sensitive inter-procedural analysis [51], which is a type of data-flow analysis. Clang-Tidy is more of a linter-type tool that focuses on finding simpler errors related to style and syntax [52]. There is also the possibility for statistical analysis when there are checkers available for use. Results can be visualized both in the terminal and in static HTML files. CodeChecker also allows for web-based report storage, which means that the performed analysis can be visualized in a web browser, where it is easier to go through the report of the code.

2.7.2 PVS-Studio

PVS-Studio [31], at times referred to as PVS, is a static analysis tool developed to analyze code written in C, C++, C# or Java, and it can be run on most operating systems today, such as Windows, Linux and macOS. PVS-Studio applies techniques such as pattern matching, symbolic execution and data-flow analysis. When PVS-Studio analyzes the source code, it prints out error codes that correspond to some type of rule, for example V501, V517 or V522. These error codes have a short description, and a more in-depth explanation is available on the PVS-Studio website. There is also a classification table of these error codes and warnings on the PVS-Studio website, where you can look up which SEI CERT rule an error code corresponds to.

PVS-Studio costs money for commercial use, but for students there is a free license available [30]. There is a small catch to the free license, however, in that each file that is analyzed needs to start with a specific set of comment lines stating that it is an academic project. This free version does not have the full set of features that the paid version has, but the analysis is not hindered; only some of the customization commands are restricted.


2.7.3 Rosecheckers

Another static analysis tool is Rosecheckers, a tool developed by the Software Engineering Institute at Carnegie Mellon University that performs static analysis on software written in the C and C++ programming languages [50]. The tool is made to check for compliance with the SEI CERT Secure Coding Standard. By reading the source code and generating an Abstract Syntax Tree (AST), Rosecheckers is able to create a graph of the analyzed code [40]. The AST is then traversed to check for compliance with SEI CERT. An example of what an AST may look like can be seen in Figure 2.1, which was generated from the code shown in Listing 2.9.

while x > y:
    x -= 1
return x

Listing 2.9: Abstract syntax tree example code.

Figure 2.1: Abstract syntax tree generated for the code in Listing 2.9

2.8 Programming languages

Below we will give a short introduction to the programming languages that were used in this thesis.

2.8.1 C

The C programming language [42] [41] is one of the most popular programming languages. It was created in 1972 by Dennis Ritchie when he worked at Bell Labs. C is a low-level programming language that was designed to make it easier for developers to access memory. To execute C code, it first has to be compiled. C is a cross-platform language and can be run on multiple different operating systems such as Windows, macOS and different Unix variants. A programming paradigm that C uses is structured programming; unlike many newer languages such as Python and JavaScript, C does not include object oriented programming or garbage collection.


2.8.2 C++

The C++ programming language [47] [48] started as an extension of the C programming language. It was developed by Bjarne Stroustrup in the 1980s. In comparison to C, C++ makes it possible for developers to use object oriented programming and to implement classes. C++ was meant to offer the same efficiency and flexibility as C but also to include support for high-level programming. To run C++ code, just like C code, it first has to be compiled. The platforms supported for C++ are, the same as for C, Windows, macOS and different Unix distributions.

2.8.3 Python

Guido van Rossum, the creator of Python [39], started working on an implementation of Python in the 1980s, but the first release did not happen until 1991. Python is an interpreted, high-level, object oriented programming language that supports dynamic typing, functional programming and garbage collection. Python is open source, meaning everyone can contribute improvements. As with the other programming languages described in previous sections, Python is cross-platform and can be run on most operating systems.

3 Related Work

This chapter aims to introduce previous studies that have been conducted within the area. The first section will present secure coding. The second section will introduce the benefits of coding standards and what they bring to the table. The third section will present earlier studies that have been conducted to evaluate different static analysis tools. The fourth section will demonstrate a way to collect vulnerabilities.

3.1 Secure coding

It is often questioned by management whether it is worth the time, money, and effort to implement a more strict and secure coding standard when developing information systems. This problem is also mentioned in the article "Moving Beyond Coding: Why Secure Coding Should be Implemented" by Mark Grover et al. [25]. In the article, where they review this problem, Grover et al. give examples of major data breaches that were caused by poor security. As mentioned in the review, what is even worse is that more of these types of attacks can be expected in the future, since the number of devices connected to the internet is increasing drastically. It can be seen that secure coding will be more important than ever before. The article also looks at the definition of secure coding and, by combining different definitions, ends up defining secure coding as: "The practice of writing code that is resistant to attacks." They claim that one of the main reasons for software not being developed in a secure way is that developers are usually under time pressure and the main focus is to develop something that works. This means that security is often an afterthought and not something taken into consideration from the start. This article shows that the topic of this thesis is relevant and that there is a lot of evidence to suggest that secure coding may be helpful for organizations that develop different kinds of systems.

Juan F. García et al. [24] depict another interesting point of view regarding secure coding standards. Unlike this thesis, which tries to answer whether secure coding standards can reduce the number of vulnerabilities in code, García et al.'s "C Secure Coding Standards Performance: CMU SEI CERT vs MISRA" tries to answer whether secure coding standards affect the performance, more specifically the run time, of a program. García et al. also compared two different secure coding standards, SEI CERT C and MISRA C. To accomplish this they compared the solutions to six different coding problems, with three versions of each solution: one where no standard was followed, one where the SEI CERT C coding standard was used, and one where the MISRA C standard was used. They then executed these different versions and compared the run times. This showed that, in relation to run time, the original with no standard applied was always the fastest, while the MISRA version was usually the same (slower on only 1/6 problems), and the SEI CERT version was slower than the original on half of the problems (3/6). This study only shows the tip of the iceberg since, as they also claim, the analysis has to be performed on large-scale software projects. However, the result of García et al.'s study shows that compliance with the SEI CERT C Standard has its downsides and needs to be researched more. This makes this thesis even more relevant, as it tries to further investigate the effectiveness of the SEI CERT Secure Coding Standard and how it can be used to reduce vulnerabilities.

In 2008, Cathal Boogerd and Leon Moonen [11] performed an empirical study where they tried to answer whether there is a relation between a MISRA-C:2004 rule violation and a fault in the software. To answer this they used two different methods: examining violations of the MISRA-C standard and faults over time to check for correlation, and studying separate violations of the standard closely over a period of time to tell how often they actually lead to faults. They also used two metrics, the number of violations and the number of faults, both divided by KLOC (each taken on a per-version basis). Boogerd and Moonen observed that complying with the MISRA-C:2004 rules completely may increase the number of faults in the specific software, due to the fact that compliance with a rule carries the risk of introducing new faults; in fact, their study showed that only 12 out of the 72 rules observed were able to find faults much better than a random predictor that selects a random line in the code.

In a follow-up work, Boogerd and Moonen [12] tried to measure the quality of two different projects written in the C language. In this study they introduced the term violation density, which is a metric where the number of violations is divided by the LOC. They analyzed 89 different rules of the MISRA-C:2004 standard and, as in the earlier study, found that only a small number (10) of these rules are more fault-prone when there is a higher violation density. In contrast to this thesis, where the method is based on searching for vulnerabilities to see whether they could have been avoided using SEI CERT rules, Boogerd and Moonen's studies search for MISRA violations to see whether they are faults and whether they lead to vulnerabilities. That said, the results of their studies are well worth taking into consideration when answering RQ2, especially when contemplating the risk of not following a rule in contrast to the risk of introducing new faults when complying with it.
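Expressed as a simple formula (our notation, not theirs), the violation density of a project version is VD = V / LOC, where V is the number of rule violations in that version; the metrics in their earlier study are the analogous per-version counts divided by KLOC.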

3.2 Benefits of coding standards

The tech world always strives to be at the forefront of new and revolutionizing ideas. Many of the new technologies and products spawned from these ideas are connected to the internet and tend to handle sensitive information, and as such the need for good software security is greater than ever before. This is commonly known in the software industry and has also been brought up by Srđan Popić et al. [35]. In their study they researched whether different coding standards, including secure ones, can be used to improve software quality, maintenance of the code, stability and safety. The projects the study analyzed were two Python projects where the PEP8 [43] style guide was followed. Using the two dimensions "new or corrected lines during the implementation step" and "the number of fails detected during the coding standard check", the study came to the conclusion that the number of errors dropped over time as the developers acclimated to the PEP8 standard.

As mentioned in the previous paragraph, program quality is of utmost importance when developing software. One way to help increase the quality of software is to follow a coding standard, as mentioned by Xuefen Fang [23]. In Fang's study he investigated whether the quality of a program can be increased by making the developers adhere to a Java programming coding standard. The coding standard was made from the Java Coding Conventions [33] with some additions made by Fang. Fang introduces four different projects, where three of them have their developers comply with a standard and one does not follow any particular coding standard. To address whether the software quality is increased, Fang looks at the lines of code (LOC) and the comment rate in each file. The result of the study shows that the LOC did not change drastically whether or not the project followed a coding standard. However, the projects that followed the coding standard had a higher rate of comments. Fang concludes that a coding standard is an efficient way to boost the quality of the code and specifically its maintainability. Fang only takes the quality of a project into consideration while this thesis focuses on the security aspect; however, it is important to understand the relation between high quality and the security of the software, as Richard Bellairs explains in [9].

3.3 Evaluation of static analysis tools

As explained in section 2.7, static analysis tools will inevitably report false positives as well as miss important vulnerabilities, which can lead to a high number of false negatives. Thu- Trang Nguyen et al. [32] tried to "Enable Precise Check for SEI CERT C Coding Standard" as well as automate the verification process of true and false positives. In their study they car- ried out an experiment where they ran the static analysis tool Rosecheckers on two different large projects to get SEI CERT C warnings. The method they used was to first run Rosecheck- ers to find code in the projects that did not comply with SEI CERT C. They then verified these warnings by running deductive verification, model checking and pattern matching, which all three combined gave a more accurate result. Their method showed that 60% of the Rosecheckers warnings could be verified and that 87% of the verified warnings in the first project and 57% in the second project were true positives, while 13% and 43% were false positives, respectively. In comparison to this study, Nguyen et al. only checked for compli- ance with four different SEI CERT C areas (Declarations and Initialization (DCL), Expressions (EXP), Integers (INT) and Arrays (ARR)), both recommendations and rules, while this study considers all areas, but only rules and no recommendations. This study also does not focus on the amount of false positives as Nguyen et al., but instead the number of false negatives. In another study written by Andrei Arusoaie et al. [4] they compared twelve different static analysis tools, Frama-C, Clang (alpha), Clang (core), Oclint, "System", Cppcheck, Splint, Facebook Infer, Uno, Flawfinder, Sparse and Flint++. In their study they ran the different tools on the Toyota ITC test suite, which contained 639 test cases. To compare the tools they checked for how many of the violations they found and the run time of each tool. In the study they also reported the amount of false positives that were found each time they ran the tools. To find out whether a tool found a violation they checked if the tool reported an error on the exact line that the violation was said to be located. They manually confirmed this approach and they found some bugs in the test suite and a few imprecision for the tools. Arusoaie et al. showed in the study that the different tools varied in found violations from 1.1% up to 44.13% and that the run time varied from 0.27s up to 50.80s, where Clang core and Clang alpha found 15.34% and 28.17% with the run times 6.42 and 13.29, respectively. This means that 84.66% and 71.83% false negatives were found for Clang core and Clang alpha. In regards to false positives they found that Clang core had a false positive rate of 0.63% and Clang alpha had a false positive rate of 10.33%. Finally, Arusoaie et al. summarize that Clang Static Analyzer (core and alpha combined) offered a good trade-off compared to the other tools in regards to run time vs. violations found. In relation to this thesis the results for Clang are especially interesting, as CodeChecker uses Clang Static Analyzer for the C and C++ analysis. In an article from 2020 by Lisa Nguyen Quang Do et al. [21] they researched static analysis tools from a user perspective, taking the reasoning for why and how developers use the tools into account. They state that these are important points to consider when presenting requirements for new static analysis tools and improvements to current tools. 
In the study they came to the conclusion that time was a very important factor for static analysis tools and that it was a deciding factor for how the tools were used. Since the time factor is very important during tool analysis, it is something that will be considered in this study as well.

Jiang Zheng et al. [54] discussed some interesting points in regards to the static analysis tools FlexeLint and Klocwork. In their study from 2006 they addressed the economic viability of static analysis tools, the effectiveness of a tool and what types of vulnerabilities were most commonly reported by the tools. They approached these questions by analyzing three large projects for which they had access to a manifest of reported violations and issues. Zheng et al. ran the tools on the projects and compared the results with the violations in the manifest. The result showed that about 30% of the reported violations in the manifest were found by the tools and that, compared to manual inspection, the number of findings did not deviate significantly. In regards to economic viability, they found that using static analysis tools in the early phases of development was more advantageous than having to fix the violations later on. This result shows that RQ1 is of interest for further research, since companies always strive to decrease the amount of money spent on fixing bugs. The study also showed that the most common vulnerability was "possible use of NULL pointer", which stood for 45.92% of all faults in the analyzed projects. Zheng et al.'s result will be interesting to compare to the results gathered in this study, to see whether or not "null pointer dereference" is still the most common vulnerability 15 years later.

A more recent study was conducted by Jose D'Abruzzo Pereira and Marco Vieira in 2020 [34], where they, in a similar way, evaluated two static analysis tools, Flawfinder and CppCheck, on the large open source vulnerability data set made by Mozilla. The data set was made up of vulnerabilities from five projects: Mozilla, httpd, glibc, Linux kernel, and Xen Hypervisor. The study came to the conclusion that neither of the tools performed particularly well, but that CppCheck was the much better choice, finding more true negatives, fewer false positives, fewer false negatives, and more true positives, at 92.8%, 7.2%, 16.5% and 83.5%, in comparison to Flawfinder, which scored 6.8%, 93.2%, 60.8%, and 36.2%, respectively. Pereira and Vieira's results will be interesting to compare with the results of the static analysis tools on the different data set used in this thesis.

In the research paper "Evaluating Static Analysis Defect Warnings On Production Software", Nathaniel Ayewah et al. [5] evaluated the performance of the static analysis tool FindBugs, a tool that finds bugs in Java programs. In this paper they explained that a possible reason why static analysis tools in general report more trivial vulnerabilities than critical ones is that the tools cannot check whether the code is implemented in the way it was meant to be, since the tools do not know the purpose of the code. They continued by explaining that this could be a consequence of the tools' analysis techniques, which are often based on finding rare and unsafe code patterns. This conclusion might be of interest when analyzing the results of the different static analysis tools used in this thesis and their performance on the different languages analyzed.

3.4 Collection of vulnerabilities

James Walden et al. [53] wanted to compare different vulnerability prediction models. To achieve this they created a high quality public data set containing many different kinds of vulnerabilities. The collection of said data set was based on the following requirements:

• The source code must be available

• Must be written in PHP

• Must have a large number of vulnerabilities found


Even though the goal of Walden et al.'s study was different from that of this thesis, their method of collecting applications that contain vulnerabilities can be applied here as well, by changing the programming language requirement from PHP to C and C++. Unlike Walden et al.'s study, which looked at three different projects with many vulnerabilities, this thesis looks at hundreds of different projects in the CVE database and then selects the latest ones for the C analysis. For the C++ analysis, the CVSS is used to rank the collected vulnerabilities and select the most severe ones.

To summarize, these related works show that there is a certain ambiguity about secure coding, and specifically coding standards, since some of the rules may not be worth implementing due to the risk of introducing new faults. To the best of the authors' knowledge, there have not been many empirical studies on the effectiveness of coding standards in regards to security, and few on the SEI CERT standard. This thesis takes a different approach to the selection process of vulnerabilities, the analysis, and the testing of the performance of the SEI CERT standards. The results give an overview of how the SEI CERT standards perform over a larger number of real world projects, which combined with the results of other studies will give a more comprehensive picture of the real world effectiveness of the SEI CERT Secure Coding Standards.

4 Method

4.1 Approach

This chapter describes the methods that were used to answer the research questions. The method is structured as follows:

1. Gathering CVE C vulnerabilities

2. Gathering CVE C++ vulnerabilities

3. Analyzing CVE vulnerabilities

4. Gathering Rule Specific CVE vulnerabilities

5. Analyzing Rule Specific CVE vulnerabilities

Over the course of the project, data on which tools helped to comply with the SEI CERT secure coding standards was gathered. This data was then analyzed to answer RQ3 in the following way:

1. Examine how many of the analyzed CVE SEI CERT violations were found by the tools

2. Study the run time of each tool in relation to the project size (in MB) and the number of files analyzed

4.2 Gathering of vulnerabilities in CVE

The gathering process was split into two parts, one for the C programming language and one for the C++ programming language.

C

The gathering of the CVE vulnerabilities written in C was done with a Python script, shown in Listing 4.1, that extracts vulnerabilities written in C. A copy of the CVE database was downloaded (in .csv format) to make the extraction process faster. The vulnerabilities were selected by searching for ".c" in the description of each vulnerability. Filtering was also done to extract the most recently updated vulnerabilities and only those that had a link to a public GitHub repository where the source code could be analyzed.

import csv

with open("allitems.csv", encoding="utf-8") as csvfile, open("cve.txt", "w") as out:
    csv_reader = csv.DictReader(csvfile, delimiter=",")
    for row in csv_reader:
        try:
            # Keep CVE:s that mention a .c file and have a GitHub link.
            if ".c" in row["Description"] and "github.com" in row["References"]:
                out.write(row["Name"] + "\n")
        except UnicodeDecodeError:
            print(row)

Listing 4.1: Python script for extracting C vulnerabilities.

4.2.1 C++

The gathering of C++ related CVE vulnerabilities was done in a similar way as described for the C language; the script used can be seen in Appendix B. This time the focus was on the vulnerabilities with the highest severity instead of the most recently updated CVE vulnerabilities. This was done by filtering the CVE:s (from 2017-2020) based on each vulnerability's CVSS Base Score, a score that ranges from 0 to 10, with 10 being the highest severity. However, this method did not only collect vulnerabilities with a severity score of 10; instead it resulted in most of them being over 7.0.
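As an illustration of this selection step, the following is a minimal sketch of CVSS-based filtering; the input file cvss_scores.csv and its column names are assumptions made for the example, since the actual script is the one in Appendix B.

import csv

# Hypothetical input: one row per CVE with its CVSS Base Score.
# The column names ("Name", "Description", "References", "Score") are assumptions.
with open("cvss_scores.csv", encoding="utf-8") as csvfile:
    rows = list(csv.DictReader(csvfile, delimiter=","))

# Keep C++ related CVE:s and sort them by descending CVSS Base Score.
cpp_cves = [r for r in rows
            if (".cpp" in r["Description"] or ".cc" in r["Description"])
            and "github.com" in r["References"]]
cpp_cves.sort(key=lambda r: float(r["Score"]), reverse=True)

with open("cve_cpp.txt", "w") as out:
    for r in cpp_cves[:60]:  # the 60 most severe, as described in the method
        out.write(r["Name"] + "\n")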

4.3 Analyzing vulnerabilities in CVE

Once a subset of vulnerabilities had been gathered, the analysis began. Some of the vulnerabilities did not have enough information to tell whether compliance with the SEI CERT standard would have made a noticeable difference to the end result, i.e. prevented the vulnerability. There were also vulnerabilities in projects that could not be compiled or built for some of the static analysis tools for various reasons. Both of these categories were skipped and not included in the final analysis results. When the manual review of a vulnerability had been completed, an analysis was done with the different static analysis tools. If the manual analysis outcome was that there was no rule violation for the CVE, the tools were not run, to save time.

Figure 4.1: Description of a CVE vulnerability.


4.3.1 Manually

The first step of the analysis process was to manually examine the code related to the vulnerability as well as the description of the CVE. If the vulnerability had an issue page on GitHub, where users had analyzed and described the problem, this was also taken into consideration. When an understanding of the reported issue had been achieved, the different SEI CERT rules were scanned to find the best fit, if there was one covering the vulnerability. The descriptions of the CVE:s often included the type of vulnerability, which could be used to narrow down the search area of the SEI CERT rules. For example, in Figure 4.1 the description of the CVE states that the vulnerability relates to some limitation of characters in a format argument. From this description it can be concluded that the area can be narrowed down to "Characters and Strings (STR)" or "Input Output (FIO)", since it was a buffer overflow caused by an unsafe format argument to the fscanf function, which falls under the two previously mentioned areas. When the area had been decided, each individual rule was considered to decide whether at least one rule would have prevented the vulnerability. When this process was finished, it was clear whether compliance with a specific SEI CERT rule would have helped or not.
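To make the fscanf case concrete, the following minimal program (not taken from any analyzed project) shows the kind of unbounded format argument that violates STR31-C and FIO47-C, together with a compliant variant:

#include <stdio.h>

int main(void) {
    char buf[16];

    /* Non-compliant: "%s" places no bound on the input, so anything
     * longer than 15 characters overflows buf (STR31-C, FIO47-C). */
    /* fscanf(stdin, "%s", buf); */

    /* Compliant: the field width 15 leaves room for the null terminator. */
    if (fscanf(stdin, "%15s", buf) == 1) {
        printf("read: %s\n", buf);
    }
    return 0;
}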

4.3.2 Static analysis tools

To run the different static analysis tools, the Ubuntu 20.04 LTS operating system was used. The tools did not cover all of the SEI CERT rules, so their coverage had to be looked up on the official tool websites [29, 49] to make sure a tool was not executed unnecessarily on a project. For each project, the project size (in MB), git commit, CVSS score, SEI CERT rule and risk level were recorded. For C++ projects the run time (make time + analysis time) of each tool was also recorded.

Hardware

The hardware used to run the static analysis tools consisted of two computers with different configurations and specs. One of them was a Lenovo T5 26AMR5 with an AMD Ryzen 5 3600 6-core processor and 16 GB of DDR4 RAM; its virtual machine was configured to use 2 processor cores and 4 GB of RAM. The other was a custom built computer with an Intel Core i7 4770K and 16 GB of DDR3 RAM, where the virtual machine was configured to use 3 processor cores and 4 GB of RAM.

PVS-Studio

One of the static analysis tools used was PVS-Studio [31], version 7.11. The analysis was done by first making the project according to its build instructions and tracing the make process with the first command shown in Listing 4.2. When the make had been successfully completed and traced, PVS-Studio analyzed the make result using the command on the second line in Listing 4.2. The third line in Listing 4.2 was used to create a readable HTML report where PVS-Studio reported all violations found. To be able to run PVS-Studio using the student license, a specific set of lines also had to be added to the start of each file that was analyzed [30]. Since many projects were analyzed, a Python script was made to make this step faster, shown in Listing 4.3.

1 sudo pvs-studio-analyzer trace -- make
2 pvs-studio-analyzer analyze -o pvs.log
3 plog-converter -a GA:1,2 -t fullhtml pvs.log -o ./reports

Listing 4.2: Commands for PVS-Studio analysis.


import os, sys

# The two comment lines required by the PVS-Studio student license.
LICENSE = ("// This is a personal academic project. Dear PVS-Studio, please check it.\n"
           "// PVS-Studio Static Code Analyzer for C, C++, C#, and Java: http://www.viva64.com\n")

root_dir = sys.argv[1] if sys.argv[1].endswith("/") else sys.argv[1] + "/"
for root, dirs, files in os.walk(root_dir):
    for file in files:
        if file.endswith((".c", ".cpp", ".cc")):
            path = os.path.join(root, file)
            with open(path, "r", encoding="latin-1") as f:
                old_content = f.read()
            # Rewrite the file with the license comment prepended.
            with open(path, "w", encoding="latin-1") as f:
                f.write(LICENSE)
                f.write(old_content)

Listing 4.3: Python script for adding the PVS-Studio student license comment.

In Figure 4.2, the output of PVS-Studio can be seen. The figure shows a list of the files and the violations found in each file. In this case, the violation is called "V522", which according to the mapping on the SEI CERT site corresponds to the SEI CERT rule "EXP34-C", a null pointer dereference.

Figure 4.2: Example of PVS-Studio output.

To check whether PVS covered a specific SEI CERT rule, it was as simple as going to the PVS home page [29] and looking at the SEI CERT rule table.
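For illustration, the following minimal example (not from any of the analyzed projects) shows the kind of null pointer dereference that EXP34-C forbids and that a diagnostic such as V522 targets:

#include <stdlib.h>
#include <string.h>

char *duplicate(const char *src) {
    char *copy = malloc(strlen(src) + 1);
    /* Violates EXP34-C: if malloc fails and returns NULL,
     * strcpy dereferences a null pointer. */
    strcpy(copy, src);
    return copy;
}

char *duplicate_checked(const char *src) {
    char *copy = malloc(strlen(src) + 1);
    if (copy == NULL)   /* compliant: check before dereferencing */
        return NULL;
    strcpy(copy, src);
    return copy;
}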

Rosecheckers

Rosecheckers is another static analysis tool that was used. To run it, the official Rosecheckers Docker container was used. The container was built by running the command found in the Dockerfile in commit "r342" on the Rosecheckers SourceForge [49], which can be seen in Listing 4.4. Once the container was set up, Rosecheckers was run by issuing the command (include files depend on the project being analyzed) seen in Listing 4.5. In this specific case, the project name was "janus-gateway", the file being analyzed was "rtp.c" and the include folder needed was located in "/home/USER_NAME/Downloads/janus-gateway/include".

docker build -t rosecheckers .

Listing 4.4: Command for getting the docker container.

sudo docker run -it --rm -v /home/USER_NAME/Downloads/janus-gateway:/tmp rosecheckers rosecheckers -c -I/tmp/include/ /tmp/plugins/rtp.c

Listing 4.5: Example command for running Rosecheckers analysis on the rtp.c file in the Janus-gateway project.

In Figure 4.3, an example of Rosecheckers' output can be seen. Rosecheckers reports whether it found an error or a warning and on which line. An error corresponds to a rule violation, while a warning corresponds to a violation of a recommendation.


Figure 4.3: Example of Rosecheckers output.

To check whether Rosecheckers covered a specific SEI CERT rule, one had to look in the source code file for the specific rule, since the official Rosecheckers website is not up to date. In the source code [49] there are files for each rule area, for example "ARR.C" for arrays, "STR.C" for characters and strings, and "MEM.C" for memory management. As can be seen in Figure 4.4, each rule present in the "MEM_C" and "MEM_CPP" functions is one that Rosecheckers covers, in this example "MEM41-CPP" and the "MEM_C" rules for C++.

Figure 4.4: Rules that Rosecheckers covers for memory management [49].
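In practice, this check amounts to searching for the rule identifier in the corresponding source file. For example, assuming the rule files lie in the current directory of the checked-out sources (the exact path depends on the checkout):

grep -n "MEM30" MEM.C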

1 source ~/codechecker/venv/bin/activate
2 export PATH=~/codechecker/build/CodeChecker/bin:$PATH
3 CodeChecker log -b "make" -o compilation.json
4 CodeChecker analyze compilation.json -j10 --ctu --stats -o ./cc_reports -e profile:sensitive -e guideline:sei-cert
5 CodeChecker parse ./cc_reports -e html -o ./reports_html
6 CodeChecker analyze compilation.json -j10 --ctu -o ./cc_reports -e profile:sensitive -e guideline:sei-cert --file "*FILE_NAME"

Listing 4.6: Commands for setting up and running CodeChecker.

CodeChecker

The final tool used was CodeChecker, specifically version 6.15.1. To be able to use CodeChecker, the environment first had to be set up in the terminal, which was done by running the first two commands in Listing 4.6. After that, CodeChecker was ready to execute the analysis: first the third command in Listing 4.6 was run, which built the chosen project and noted down the files to analyze in the "compilation.json" file. Once the project was built, the analysis could be started by running the fourth command. The flags used in this command were --ctu (Cross Translation Unit), -j10, -o and -e. CTU analysis was used to analyze not only the files of direct interest, but also related files. The -j10 flag made the command run faster when more threads were available, the -o flag made the analysis report its findings into the directory "cc_reports", and the -e flag enabled choosing a specific checker profile or guideline, in this case "profile:sensitive" and "guideline:sei-cert". The sensitive profile made the tool analyze the code more thoroughly, at the cost of a potentially higher false positive rate, and the "guideline:sei-cert" flag made the tool indicate whether there was a specific SEI CERT rule for a violation. Once the analysis was finished, the output needed to be parsed into something readable, in this case a static HTML file, which was done by executing the fifth command in Listing 4.6. The resulting HTML file could then be opened in a browser and may look like the one in Figure 4.5. As can be seen in the figure, the HTML file shows which checker found a specific violation and a message describing the violation. The analysis could also be run on a specific file instead of the whole project, which can be useful if the project is very big and only one file needs to be analyzed. This was done by executing the sixth command in Listing 4.6.

Figure 4.5: Example of CodeChecker HTML output.

At the time of writing, the table of SEI CERT rules covered by CodeChecker was only available internally at Ericsson. Therefore it cannot be published in this thesis.

4.4 Gathering rule specific CVE vulnerabilities

To be able to give a better answer to RQ3, this part of the method focused on CVE:s where the problem was known to have violated a specific SEI CERT rule. First, 10 rules with risk level L1 or L2 from the SEI CERT C standard were chosen; these rules can be seen in Table 4.1. For each of these rules, 10 CVE:s that violated the rule were gathered using modified versions of the Python script shown in Listing 4.1. For example, the modification for EXP34-C (Do not dereference null pointers) can be seen in Appendix A, where the CVE:s are filtered by "null pointer" and "dereference". When the CVE:s had been gathered by the script, they were also manually reviewed to make sure that each was a violation of the particular rule.
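As an illustration, a minimal sketch of the kind of modification used for EXP34-C follows; the actual script is the one in Appendix A, and the exact keyword matching there may differ:

import csv

# Variant of Listing 4.1 for EXP34-C: keep CVE:s whose description
# mentions a null pointer dereference and that link to GitHub.
with open("allitems.csv", encoding="utf-8") as csvfile, open("cve_exp34.txt", "w") as out:
    for row in csv.DictReader(csvfile, delimiter=","):
        try:
            desc = row["Description"].lower()
            if "null pointer" in desc and "dereference" in desc \
                    and "github.com" in row["References"]:
                out.write(row["Name"] + "\n")
        except UnicodeDecodeError:
            print(row)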


Rule      Title                                                                                              Level
INT30-C   Ensure that unsigned integer operations do not wrap                                                L2
INT32-C   Ensure that operations on signed integers do not result in overflow                                L2
INT33-C   Ensure that division and remainder operations do not result in divide-by-zero errors               L2
MEM30-C   Do not access freed memory                                                                         L1
MEM35-C   Allocate sufficient memory for an object                                                           L2
EXP33-C   Do not read uninitialized memory                                                                   L1
EXP34-C   Do not dereference null pointers                                                                   L1
ARR30-C   Do not form or use out-of-bounds pointers or array subscripts                                      L2
FIO47-C   Use valid format strings                                                                           L2
STR31-C   Guarantee that storage for strings has sufficient space for character data and the null terminator L1

Table 4.1: Rules tested in tools analysis.

4.5 Analyzing rule specific CVE vulnerabilities

The analysis of the rule specific CVE:s was done in a similar way as the analysis described in 4.3. The only difference lies in the manual analysis, since in this part it only needed to be verified that the CVE:s violated the rule of interest. The static analysis tool testing was performed in the same way as described in 4.3.

4.6 Case studies

This section presents three case studies that demonstrate the analysis method described above. The first and second case studies walk through examples of the method explained in 4.3 for the analysis of a C and a C++ CVE vulnerability, while the third, described in 4.5, goes through the analysis of a rule specific vulnerability.

4.6.1 C CVE

The reported vulnerability in CVE-2020-14033 comes from the project Janus. The vulnerability is located in the file "janus_streaming.c" and described as a buffer overflow via a crafted RTSP server. From the description alone, it is not easy to tell whether this is a violation of SEI CERT C, and especially not which specific rule. By looking at the fixed code in Listing 4.7 it is quite straightforward to tell that it is in fact covered by both FIO47-C (Use valid format strings) and STR31-C (Guarantee that storage for strings has sufficient space for character data and the null terminator), since before the fix the format string does not bound the length of the scanned string. This is fixed by adding the maximum field width "255" to the conversion specification. After manually deciding which rules the vulnerability violates, the static analysis tools Rosecheckers, PVS and CodeChecker were run. In this case Rosecheckers gives a warning on the specific row, saying it could be a vulnerability of type STR31-C, and PVS gives a warning that it could be FIO47-C. Since both of these would cover the reported problem, it is concluded that these warnings are in fact true positives. CodeChecker, on the other hand, did not find the violation.


 char ip[256];
 in_addr_t mcast = INADDR_ANY;
 if(c != NULL) {
-    if(sscanf(c, "c=IN IP4 %[^/]", ip) != 0) {
+    if(sscanf(c, "c=IN IP4 %255[^/]", ip) != 0) {
         memcpy(host, ip, sizeof(ip));
         c = strstr(host, "\r\n");
         if(c)

Listing 4.7: Git diff for CVE-2020-14033.

4.6.2 C++ CVE

The reported vulnerability in this case comes from CVE-2018-9304 in the project exiv2 (https://github.com/Exiv2/exiv2). Here, an educated guess is that the reported issue may violate INT33-C (Ensure that division and remainder operations do not result in divide-by-zero errors), since the description says that "a divide by zero" results in denial of service. When looking at the code in Listing 4.8, this can be confirmed by the fix, which makes sure that the variable "count" is not zero before it is divided by. As usual, the static analysis tools Rosecheckers, PVS and CodeChecker were run. Unfortunately, none of them found this violation, which makes this vulnerability a false negative for all three tools.

-if (size > std::numeric_limits::max() / count)
-    throw Error(kerInvalidMalloc);
+if (count != 0) {
+    if (size > std::numeric_limits::max() / count) {
+        throw Error(kerInvalidMalloc);
+    }
+}

Listing 4.8: Git diff for CVE-2018-9304.

4.6.3 Rule specific CVE

CVE-2019-9113 comes from the project libming and may violate the rules EXP34-C (Do not dereference null pointers) or EXP37-C (Call functions with the correct number and type of arguments), since, as described in the reported CVE, this is a "NULL pointer dereference". This was then verified by looking at the reported code; this time the reported issue was still open and therefore no fix existed. Looking at the reported line in Listing 4.9, there is a null pointer dereference in the case that "act" is null, which also means that an invalid type will be sent to the function. PVS and Rosecheckers were in this case the only two tools that found any of the SEI CERT C rules: PVS found EXP37-C and Rosecheckers found EXP34-C. CodeChecker did not give any relevant warnings on the reported line.

t = malloc(strlenext(pool[act->p.Constant16]) + 3);

Listing 4.9: Line with problematic code for CVE-2019-9113.
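Since the issue was still open, there is no official fix to show; the following patch-style sketch (in the same format as Listings 4.7 and 4.8) illustrates one hypothetical EXP34-C compliant guard, where the early return is an assumed error handling strategy:

+if (act == NULL) {
+    return NULL; /* hypothetical error handling; adapt to the surrounding code */
+}
 t = malloc(strlenext(pool[act->p.Constant16]) + 3);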


5 Results

This chapter covers all the results that have been gathered following the method described in Chapter 4. It includes the gathering and analysis of the CVE:s and the tools for C, C++ and SEI CERT rule specific vulnerabilities.

5.1 Gathering of vulnerabilities in CVE

This section presents the different CVE:s that have been collected.

5.1.1 C CVE vulnerabilities

The list in Appendix C shows the 60 gathered CVE:s for C. As can be seen, the CVE:s come from 37 different open-source projects, ranging in size. All of the gathered CVE:s were from 2020, except for one from 2019.

5.1.2 C++ CVE vulnerabilities

The list in Appendix D shows the 60 gathered CVE:s for C++. This part consists of 36 different open-source projects, also ranging in size. The C++ related CVE:s were a mix of CVE:s from 2018, 2019 and 2020.

5.1.3 Rule specific CVE vulnerabilities

Appendix E shows a list of the gathered CVE:s for the rule specific part of the analysis. As can be seen, there are 100 different CVE:s from 53 different projects; these 100 are in turn distributed over 10 different SEI CERT rules with 10 CVE:s per rule. Most of the CVE:s are from 2020, 2019 or 2018, but a few are older. Of the chosen rules, four were L1 rules and six were L2 rules.

5.2 Analyzing vulnerabilities in CVE

This section covers the resulting data and the graphs made from the analysis of CVE vulner- abilities from C, C++ and rule specific analysis.


5.2.1 C CVE vulnerabilities

As said above, 60 different CVE:s were gathered for the C language. Figure 5.1 shows that in 38 of these 60 cases it was possible to find a SEI CERT C rule that, if complied with, should have prevented the reported issue. In the other 22 cases there was no SEI CERT C rule that could have prevented the vulnerability. In some of these 22 cases the reported issue and the project code were too complicated, or not detailed enough, to clearly select a SEI CERT C rule that could have prevented it.

Figure 5.1: SEI CERT C Rule vs. No Rule distribution for the 60 CVE:s analyzed.

Table 5.1 shows the average size of the 60 analyzed CVE:s, divided into those that did and did not have a SEI CERT C rule: the projects where no rule could be found had an average size of 66.84 MB, while the projects where rules could be found had a smaller average size of 52.88 MB. The average CVSS score is about the same regardless of whether there is a SEI CERT C rule covering the CVE issue or not.

Table 5.1: Rules tested in C analysis.

Rule       Occurrences   Average Size (MB)   Average CVSS
No rule    22            66.84               7.60
Has rule   38            52.88               7.62

Figure 5.2: SEI CERT C Rule distribution for the 38 CVE:s where a rule could be found.


Figure 5.2 shows the occurrences of the 16 different SEI CERT C rules in the CVE:s where a rule could have been applied to avoid the issue. As can be seen EXP34-C (Do not dereference null pointers) and ARR32-C (Ensure size arguments for variable length arrays are in a valid range) are the most commonly occurring violations out of the 16 found.

Figure 5.3: Risk level distribution for the 16 different rules found during C analysis.

Figure 5.3 shows the distribution of the risk level for the different rules. As can be seen, there are 18 occurrences that are of Level 1, 24 of Level 2 and only two occurrences of Level 3, making L2 the most common level.

Figure 5.4: Number of SEI CERT C violations found per rule during C analysis.


Figure 5.5: Percentages of violations found per rule during C analysis.

The tools run on the C related CVE:s were Rosecheckers, PVS and CodeChecker. The results of this analysis can be seen in Figures 5.4 and 5.5. In most cases the tools did not find any rule that could have been followed to avoid the vulnerability. For only four of the SEI CERT C rules did the tools find 100% of the violations, three by PVS and one by Rosecheckers. For eight of the rules that were found manually, the tools were not able to find a single violation related to the reported issue in the CVE. For various reasons previously mentioned in 4.3, some projects could not be analyzed by the tools, which meant that out of the total 38 projects, PVS was able to run 33, Rosecheckers 24 and CodeChecker 34. In total, PVS found seven, Rosecheckers six and CodeChecker three of the violations. Figure 5.6 shows that the found C related violations tend to be in smaller projects; in fact, only two of the found violations were from a project larger than 40 MB. In the figure, some projects occur more than once since they appear in multiple CVE:s, but the x-axis is aggregated so that they are combined under the same size. Also, even though some of the projects occurred more than once, the tools only managed to find more than one rule violation once for each of those projects.

Figure 5.6: Violations found in relation to size during C analysis.


Based on the number of rule violations the tools found, the true positive and false negative rates can be calculated. As shown in Table 5.2, Rosecheckers detected six out of 24 rules, which means a 25% true positive rate and a 75% false negative rate. PVS found seven out of 33 rules, which means 21.21% and 78.79% true positive and false negative rates, respectively. CodeChecker found three out of 34 rules, which means an 8.82% true positive rate and a 91.18% false negative rate.
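Stated explicitly, the rates in Table 5.2 (and in Tables 5.4 and 5.6 later) are computed from the counts as

\[
\text{TP rate} = \frac{\text{violations found by the tool}}{\text{CVE:s the tool could be run on}}, \qquad \text{FN rate} = 1 - \text{TP rate},
\]

so that, for example, Rosecheckers gives 6/24 = 25.00% true positives and 18/24 = 75.00% false negatives.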

Table 5.2: True positive and False negative for C analysis.

Tool           True positive    False negative
Rosecheckers   25.00% (6/24)    75.00% (18/24)
PVS-Studio     21.21% (7/33)    78.79% (26/33)
CodeChecker    8.82% (3/34)     91.18% (31/34)

5.2.2 C++ CVE vulnerabilities

The analysis done on the C++ related CVE:s is the same as for the C related CVE:s, but this time the run time of each tool was also noted down.

Figure 5.7: SEI CERT C++ Rule vs. No Rule distribution for the 60 CVE:s analyzed.

Figure 5.7 shows that 37 of the CVE:s had a SEI CERT C++ rule that could have been complied with to avoid the reported issue, while in the other 23 cases there was no such rule. As mentioned in 5.2.1, some projects were too complicated for a SEI CERT rule to be found; the same was true for C++.

Table 5.3: Rules tested in C++ analysis.

Rule       Occurrences   Average Size (MB)   Average CVSS
No rule    23            55.16               8.30
Has rule   37            63.64               7.85

Table 5.3 shows that for the C++ related CVE:s, the ones where a rule was found had the larger average size of 63.64 MB, while the ones without a rule had an average size of 55.16 MB, unlike for C in Table 5.1, where the CVE:s with no rule had a larger average size. The CVSS scores for the C++ projects differed a bit more than those for the C projects, although the difference was still not very big.


Figure 5.8: SEI CERT C++ Rule distribution for the 37 CVE:s where a rule could be found.

In Figure 5.8 the distribution between each of the 12 different rules that were found can be seen. As for C in Figure 5.2, EXP34-C (Do not dereference null pointers) is overrepresented here as well.

Figure 5.9: Risk level distribution for the 12 different rules found during C++ analysis.

As can be seen in Figure 5.9, risk level 2 is the most common level among the rules found for the C++ related CVE:s, followed by level 1 and lastly level 3. As for the C vulnerabilities, and as mentioned in 4.3, some projects could not be analyzed by the static analysis tools, which meant that out of the total 37 projects, PVS was able to run 31, Rosecheckers 13, and CodeChecker 32. Figures 5.10 and 5.11 show which rules were found by the tools. As can be seen, barely any of the violations were found; in fact, PVS only found two, Rosecheckers found one and CodeChecker did not find a single one.


Figure 5.10: Number of SEI CERT C++ violations found per rule during C++ analysis.

Figure 5.11: Percentages of violations found per rule during C++ analysis.


Figure 5.12: Static analysis tools Run time comparison.

The time it took to run each tool on the different projects can be seen in Figure 5.12. For most of the projects, CodeChecker took the longest time to run. For some of the projects there is no bar for a certain tool; this means that the tool could not be run on that project.

Figure 5.13: PVS & Rosecheckers Run time in relation to project size.


Figure 5.14: PVS & Rosecheckers Run time in relation to number of files.

Figure 5.15: CodeChecker Run time in relation to project size.

Figure 5.16: CodeChecker Run time in relation to number of files.

Figures 5.13 and 5.14 show the PVS and Rosecheckers run times in relation to the size of the project and the number of files in the project, respectively. As the trendlines in both figures show, the run time increases with the project size and the number of files for both tools. In the same way, Figures 5.15 and 5.16 show the run time of CodeChecker in relation to the size of the project and the number of files in the project, respectively. Here it can be seen that the increase in run time is even bigger than for the other two tools. The false negatives and true positives are shown in Table 5.4. Rosecheckers detected one out of 13 rules, which means a 7.69% true positive rate and a 92.31% false negative rate. CodeChecker found zero of the 32 rules, which means 0% true positives and 100% false negatives. PVS found two out of 31 rules, which means 6.45% and 93.55% true positive and false negative rates, respectively.

Table 5.4: True positive and False negative for C++ analysis.

Tool           True positive    False negative
Rosecheckers   7.69% (1/13)     92.31% (12/13)
PVS-Studio     6.45% (2/31)     93.55% (29/31)
CodeChecker    0.00% (0/32)     100.00% (32/32)

5.2.3 Rule specific CVE vulnerabilities

The rule specific CVE analysis was done on 10 different SEI CERT rules for C and C++. For each SEI CERT rule, 10 different CVE:s were analyzed, meaning 100 CVE:s in total. These 100 CVE:s came from 53 different projects, of which 47 were C language projects and only six were C++. Three of the tested rules were not covered by Rosecheckers, namely EXP33-C, FIO47-C, and INT30-C, which meant that Rosecheckers was only tested on 70 CVE:s in total. Figure 5.17 shows how many of the violations were found by the different static analysis tools. As said above, each rule was run 10 different times; this means that, for example, FIO47-C was found 60%, 10%, and 0% of the time by PVS, CodeChecker, and Rosecheckers, respectively. In total, PVS found 17, Rosecheckers found seven and CodeChecker found 15 violations out of the 100 checked during this analysis (matching Tables 5.5 and 5.6).

Figure 5.17: Rule specific violations found per static analysis tool.


Figure 5.18: Rule specific project size in relation to found violations per static analysis tool.

Table 5.5: Found violations per tool for each Size range during Rule specific analysis.

Size Range (MB)   PVS found   Rosecheckers found   CodeChecker found   Total found
0 - 10            1           2                    6                   9
10 - 20           5           1                    5                   11
20 - 40           4           1                    2                   7
40 - 100          3           2                    1                   6
100+              4           1                    1                   6

The relation between the size of the projects and the violations found by the tools can be seen in Figure 5.18; the findings are stacked to make the graph easier to read. In Table 5.5 the findings are split into different size ranges. As can be seen, the violations found by PVS were split rather evenly, apart from only one being found in the smallest range; Rosecheckers found violations in most of the size ranges, while CodeChecker found more of its violations in smaller projects.


Figure 5.19: Rule specific project run time in relation to size per static analysis tool.

The run time of each static analysis tool is shown in Figure 5.19, where it can be seen that each tool's run time increases roughly linearly with project size. PVS has the steepest increase in run time, followed by CodeChecker and finally Rosecheckers. However, CodeChecker took the longest time to run for these projects.

Figure 5.20: Rule specific number of violations found in relation to CVSS per static analysis tool.

Figure 5.20 shows the relation between the CVSS score of the analyzed projects and the number of violations the tools found. As can be seen in the figure, the CVSS score does not seem to affect whether a tool will find the violation or not. However, PVS did seem to find more of the higher CVSS violations in comparison to the others. The false negatives and true positives for the tools can be seen in Table 5.6. Rosecheckers detected 7 out of 70 rules, which means a 10% true positive rate and a 90% false negative rate. CodeChecker found 15 of the 100 rules, which means 15% true positives and 85% false negatives. PVS found 17 out of 100 rules, which means 17% and 83% true positive and false negative rates, respectively.

Table 5.6: True positive and False negative for Rule specific analysis.

Tool           True positive    False negative
Rosecheckers   10% (7/70)       90% (63/70)
PVS-Studio     17% (17/100)     83% (83/100)
CodeChecker    15% (15/100)     85% (85/100)

In Appendix F detailed graphs about the gathered results for each of the 10 different rules can be found.

6 Discussion

This chapter includes a discussion based on the method followed and the gathered results; it also contains a section discussing ethical and societal aspects.

6.1 Method

In this section, the method that has been used will be discussed. All the different sub-steps that have been followed will be covered by discussing why different decisions were taken, what went well, and what could have been done differently.

6.1.1 Gathering of vulnerabilities in CVE

The gathering of CVE:s was done by selecting the latest published CVE:s; the purpose of this was to gather a large number of open source projects that were mostly different from each other. As mentioned in Chapter 3, other studies [12, 35, 23, 32, 4, 54] that have analyzed different secure coding standards have, to the best of the authors' knowledge, most often chosen only a few projects and examined the reported issues of those specific projects. In contrast to that way of gathering vulnerabilities, it was believed that, firstly, by looking at more than just a few projects the thesis would gain a broader and more diverse point of view. Secondly, the authors wanted to apply a different method, which was considered to benefit and expand the research in this area. Another reason for examining multiple different projects instead of only a few is RQ2: this method gives more of a real world perspective on that research question, due to the sheer number of different projects analyzed. However, as said in Section 1.4, the gathered projects could not be too large in size, since it would take too long to run the static analysis tools on them. Because of this, the diversity was negatively affected; if larger projects could have been run as well, the results would have been even more heterogeneous.

Furthermore, there is room for improvement regarding the vulnerability database selection and usage. By including more than just one database, the process of gathering vulnerabilities would have been better validated, since more sources may lead to a more heterogeneous data set. For example, other databases such as the IBM X-Force Exchange could have been included, which has a search functionality that is easier to use than the one for CVE, due to its collections and tags. Even though there are benefits to using more than one database when

collecting vulnerabilities, it was believed that the CVE database is enough for the purpose of this thesis, since the CVE database is simply the biggest and the method used to filter it fulfilled the needs of the thesis, even though it might not have been the most efficient.

In regards to the filtering when gathering CVE:s, which was done by searching for ".c" for C language projects and ".cpp" or ".cc" for C++ language projects, another approach could have been to consider all of the CVE:s that had a GitHub link and then examine the GitHub page to see what primary programming language the project had. This approach would have found more CVE:s, at the cost of not only more time spent developing the script, but also a significant increase in the run time of the script. This approach could have been done manually as well, although that would have taken even longer. The primary reason the chosen method was used was that it gathered enough CVE:s and the script was very simple to create and run.

The filtering of the C++ projects was partially done by sorting the CVE:s by CVSS score and then going down the list, starting from the most severe CVE:s, until 60 were reached. This method led to the focus being on CVE:s with a high CVSS score, which could be argued to make the data set less diverse, since the vulnerabilities are all in the highest CVSS severity ranks. However, it was in the authors' interest to find out how well the SEI CERT standard performs on these very severe vulnerabilities.

6.1.2 Analyzing vulnerabilities in CVE

Many studies conducted in this area tend to focus on the false positive rates of tools and how to reduce them, and many times the studies already had access to a manifest of confirmed reported issues at the beginning of the study, which led to the study focusing on whether the tools could find these earlier reported issues or not. As mentioned in Section 3.3, Thu-Trang Nguyen et al. [32] conducted a study like this, where they examined the false positives of Rosecheckers for a specific project. In contrast to this method, the authors wanted to contribute something different to this field of study: instead of studying the false positives of different tools, they researched the true positives and false negatives of the tools.

As mentioned in Section 3.3, where Jose D'Abruzzo Pereira and Marco Vieira [34] analyzed and compared the static analysis tools Flawfinder and CppCheck, and Jiang Zheng et al. analyzed and compared the tools FlexeLint and Klocwork, a range of different tools have been run and compared to each other. To the best of the authors' knowledge, a comparison between the static analysis tools used in this study has not been made. Therefore, the authors wanted to add an additional point of view to the research area by performing the analysis with these tools, which helped answer RQ3. Three tools were chosen, so that a large enough number of CVE:s could be analyzed while still finishing in time. This is also the reason why the focus was only on static analysis tools and not other types, such as dynamic analysis tools. It was also difficult to gain access to more static analysis tools, since many of them were commercial and did not offer a trial version for research. If more tools had been included in the thesis, the comparison between the tools would have been more heterogeneous. However, the three tools that were run are considered to give a good enough picture to answer RQ3, since they are all different from each other and do not use exactly the same techniques.

The analysis method was based on first doing a manual analysis and then, if the manual analysis found a SEI CERT rule violation, continuing with the static analysis tools to check whether they could find this rule violation as well. Since there are hundreds of different SEI CERT rules, there could be mistakes in the manual analysis, leading to missed or incorrectly added SEI CERT rule violations. Because of this, the tools could have slightly different results in regards to found violations, since the tools were not run on CVE:s where no violation was found manually. However, the number of such mistakes is considered too small to have any substantial impact on the end results. This could have been avoided completely if the tools had been run before a manual decision was made, since the

results of the tools could then have been taken into consideration when the manual analysis was done. On the other hand, this would have taken more time, since the tools took a long time to run, which in turn would have led to fewer projects being analyzed.

As said in 4.3.2, two different computers were used to run the static analysis tools. This was done partly because the authors had to work from home during the COVID-19 pandemic and did not have access to the same computers from home. The effect of this is that the run time of the tools differs slightly depending on which computer was used for a specific project. The consistency of the run time in relation to the size of the project could have been better if computers with the same specs had been used. This also means that the run time measurements, in relation to project size, are less likely to have the same outcome if the method is repeated. This is something that was taken into consideration: Rosecheckers and PVS were each run on one computer, which means the run time results for these tools are internally consistent. However, CodeChecker had to be run on both computers, since it was the tool that took the longest time to run, and this process therefore needed to be optimized to be able to analyze more CVE vulnerabilities.

6.1.3 Gathering rule specific vulnerabilities in CVE

In Section 4.4 it was explained that for the rule specific part of the method, 10 rules with 10 CVE:s each were analyzed. If more than 10 rules and 100 CVE:s in total had been analyzed, the result would have been even more accurate, since it would have been based on more data. However, the decision to stop at 10 was taken because of the time limit of the study, as well as the belief that the result would be enough to make a meaningful and accurate conclusion to answer RQ1 and RQ3. It was also mentioned in Section 4.4 that of the 10 rules analyzed, four were of risk level L1 and six were L2. The authors wanted to analyze the rules with the highest risk level, but had to resort to lower risk levels as well, since it was difficult to find CVE:s for some of the L1 rules. The reason for having multiple rules from the same area, for example three rules from the category INT and two from MEM, instead of one rule from each area, was similar: it was much more difficult to find CVE:s for the rules not included.

The reason the method was changed from gathering the most recently reported CVE:s to gathering rule specific CVE:s was that the risk of making a faulty manual decision, when deciding whether a violation could have been avoided by a rule or not, became as good as non-existent. This is because the developers who reported and analyzed the issue clearly described in the issue description what kind of problem it was. As a result, the third research question, RQ3, could be given a more accurate answer. In regards to the filtering of rule specific vulnerabilities, the discussion in Section 6.1.1 applies to this step as well.

6.1.4 Analyzing rule specific vulnerabilities in CVE

The analysis of the rule specific vulnerabilities was conducted in the same way as for the public vulnerabilities described in 4.3. Therefore, the reasoning behind most of the decisions discussed in 6.1.2 applies here as well.

6.1.5 Replicability, Reliability, and Validity

The replicability of the method is considered good enough for a reader to reproduce the study. The method describes which CVE:s were used and how they were extracted, as well as what settings were used for each tool during analysis. All of these steps have been described thoroughly, allowing for a high degree of replicability. However, the

result of the study cannot be guaranteed to have the exact same outcome for the individual analysis of the CVE:s if the study is conducted again, since the manual analysis may differ depending on knowledge about the SEI CERT standards and personal experience. Using the same settings when running the tools will give the same results in regards to finding SEI CERT violations. The run time of the tools may differ depending on what hardware is used. Nevertheless, the same overall outcome and conclusions, pointing in the same direction as the ones presented in this thesis, are to be expected; the reliability is thus considered high overall. Regarding the validity of the study, the result is considered credible, as the steps of the method are clearly described, as are the settings of the tools. However, the manual analysis may contain errors due to a potential lack of knowledge about SEI CERT and the code that was analyzed.

6.1.6 Source Criticism

There has been a lot of peer-reviewed research on coding standards and their impact. However, the SEI CERT coding standard is relatively unresearched in some aspects. Static analysis tools are also an area of interest in regards to conducted studies, although, to the best of the authors' knowledge, there has not been a single study comparing the tools Rosecheckers, PVS and CodeChecker against each other. Nevertheless, the majority of the research papers used in this thesis are well regarded, as in published in respected journals; many of them were published by the likes of IEEE, ACM and Elsevier. In the areas where no relevant research papers were found, blog posts, websites and GitHub pages have been used as a complement. These are regarded as being of good quality as well, since most of them come from respected companies and universities. In addition to this, there have been conversations with the developers of CodeChecker regarding various topics related to the tool.

6.2 Results

In this section, the results of the gathering and analysis of CVE:s will be discussed.

6.2.1 Gathering of vulnerabilities in CVE

As said in Section 5.1, the number of projects gathered was 37 for C, 36 for C++ and 53 for the rule specific CVE:s. These are believed to be good numbers that offer a large number of different projects, meaning considerable variety in the data set; of course, including more projects would have made the result even more diverse. The variation among the C++ projects might have been different if the CVE:s had not been sorted by CVSS score. However, as the number of different projects for C and C++ was about the same, and the C project CVE:s were not sorted according to CVSS, this aspect seems not to have affected the project diversity very much. Furthermore, the focus was to examine many different CVE:s, and the data set analyzed consisted of 220 CVE:s in total for the C, C++, and rule specific analyses. The decision to stop at 220 CVE:s was taken due to lack of time, as the thesis had a time limit of 20 weeks. If more CVE:s had been analyzed, the results would have been more trustworthy due to the law of large numbers; however, 220 CVE:s is still a substantial enough amount to draw reasonable conclusions regarding RQ1 and RQ2.

6.2.2 Analyzing vulnerabilities in CVE In this subsection results from the analysis of the CVE:s will be discussed.


C CVE vulnerabilities

As presented in Figure 5.1, 60 different CVE:s were analyzed manually, and in 38 of these cases a relevant SEI CERT C rule was found that could have prevented the vulnerability. Since the analyzed projects include software from both early and late phases of the SDLC, this result is interesting in regards to RQ1, as it shows that the SEI CERT C standard could be used to reduce vulnerabilities throughout the whole software development life cycle. It also shows that compliance with the SEI CERT C standard would help reduce vulnerabilities to a large extent, more precisely in 63% of the cases according to the data, which gives a definitive answer to RQ2. However, as said in Section 3.1, Cathal Boogerd and Leon Moonen [11] explain that their study showed that there might be a rise in the number of faults when refactoring code to make it comply with a specific secure coding standard. As that study was conducted on the MISRA-C standard, more research needs to be done to verify whether this statement holds for the SEI CERT C standard as well. Until then, this result is believed to be enough to answer RQ2.

In regards to the size of the projects, Table 5.1 shows that the average size of the projects where a rule could be found was smaller than for those where no rule could be found. This will be discussed later when compared to the C++ projects. The table also shows that the average CVSS score does not seem to affect whether the vulnerability could have been prevented by the CERT C standard or not.

The result for the rule distribution shows that some CERT C rules occurred more often than others, specifically EXP34-C (Do not dereference null pointers) and, in second place, ARR32-C. This result was very interesting, since, as said in Jiang Zheng et al.'s study [54] from 2006, they also found that "possible use of NULL pointer" was the most common vulnerability, which validates the result in regards to the rule distribution. It is especially interesting that the most common vulnerability remains the same 15 years later. In regards to RQ1, and based on this result, a good first step would be to educate developers on these two rules, since they were the most commonly occurring ones. This could also be interesting to compiler developers, since this result shows that it is worth investing in making the checks stricter for these types of faults. Further learning could cover the rest of the rules mentioned in Figure 5.2. The same reasoning applies to the risk level distribution shown in Figure 5.3: focus the learning on the rules in the L1 and L2 risk levels, since they occurred more often.

As said in Section 5.2.1, the tools run on this part were Rosecheckers, PVS and CodeChecker. As can be seen in Table 5.2, the true positive and false negative rates were 25% and 75% for Rosecheckers, 21.21% and 78.79% for PVS, and 8.82% and 91.18% for CodeChecker, respectively. The results for Rosecheckers and PVS were slightly better than for the static analysis tools Clang core and Clang alpha, which were analyzed in Andrei Arusoaie et al.'s study [4]. The true positive rate was 15.34% and 28.17% for Clang core and Clang alpha, which means that the false negative rate was 84.66% and 71.83%, respectively. CodeChecker, on the other hand, landed below these numbers, but not far away. This makes sense, since CodeChecker uses these, among other checkers, in its analysis. However, something to keep in mind when comparing these numbers is that their study contained 639 test cases, while this study was conducted on 60 real world project vulnerabilities, which could explain the difference. There have also been other studies testing static analysis tools that performed better than both Rosecheckers and PVS did in this study. As brought up in Section 3.3, Jiang Zheng et al.'s study [54] resulted in a true positive rate of about 30%, meaning a 70% false negative rate, for the two tools FlexeLint and Klocwork. The results for the tools CppCheck and Flawfinder in Jose D'Abruzzo Pereira's and Marco Vieira's study [34] also outperformed the results of the tools in this thesis. Regarding the number of CVE:s that each of the tools could run, Rosecheckers performed worse than the other tools. This can be a crucial aspect and is something that needs to be considered when answering RQ3 and deciding on which static analysis tools to use, because if a tool cannot be run on the project, it is useless. As

As can be seen, the true positive and false negative rates differ between studies and from tool to tool. However, this comparison is not entirely fair, since the tools were not tested on the same projects containing the vulnerabilities, meaning that some vulnerabilities could have been more difficult to find in some of the studies. With regard to RQ3, this result shows that there is a variety of static analysis tools that can be used to help comply with the SEI CERT standards. As shown in Figure 5.4, the tools performed better on different rule areas of the SEI CERT standard; for example, Rosecheckers was the only tool that found any of the "INT" violations, and PVS was the only one to find any FIO47-C violations. Due to this result, and with regard to RQ1 and RQ3, a combination of the three tools might be the best approach to reduce vulnerabilities during software development. As can be seen in Figure 5.6, the tools are better at finding violations in smaller projects. This could mean that the tools do not scale well and may be a poorer choice for larger projects. Another possible takeaway is to design and split projects into smaller modules to make testing with the tools more effective and efficient. To make the "INT" rule area concrete, a sketch of a typical INT30-C violation is given below.
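The following is a minimal sketch of an INT30-C ("Ensure that unsigned integer operations do not wrap") defect; the allocation helper and its parameters are hypothetical and not taken from any analyzed project:

#include <stdint.h>
#include <stdlib.h>

/* Non-compliant (INT30-C): for large dimensions the multiplication
 * wraps around, so malloc() silently returns a buffer that is far too
 * small for the pixel data later written into it. */
void *alloc_pixels(size_t width, size_t height) {
    return malloc(width * height * 4);
}

/* Compliant: reject inputs whose product would wrap. */
void *alloc_pixels_checked(size_t width, size_t height) {
    if (height != 0 && width > SIZE_MAX / 4 / height) {
        return NULL;             /* width * height * 4 would wrap */
    }
    return malloc(width * height * 4);
}

The non-compliant version fails silently, which is exactly the kind of arithmetic defect the "INT" rules target and one that is hard to spot in a manual review.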

C++ CVE vulnerabilities

Once again, 60 CVE:s were analyzed, this time for C++ projects. The manual analysis resulted in 37 of these having some SEI CERT rule that could have helped prevent the vulnerability. Compared to the C projects, the number of CVE:s for which a rule was found differed by only one. This supports the claim presented above regarding RQ2, that compliance with the SEI CERT standard would help reduce vulnerabilities to a large extent, in this case in 62% of the cases. Regarding the size of the projects, the result for C++ did not match the one presented for the C projects, since this time the average size was larger for the projects where no rule was found. Due to this contradiction, no conclusion can be drawn from this data other than that project size does not appear to determine whether a preventive rule can be found. The average CVSS scores for the C++ projects in Table 5.3 differ, as for the C projects, only slightly. This supports the claim that the CVSS score does not affect whether a CVE could have been prevented by a SEI CERT rule or not.

The rule distribution for the C++ projects points in the same direction as the one for the C projects. As can be seen in Figure 5.8, EXP34-C (Do not dereference null pointers) is by far the most frequently occurring rule. This supports the suggestion, made above for C, that a good start to reducing vulnerabilities in the early phase of software development would be to educate developers about the handling of null pointers. The same goes for the risk level distribution for the C++ projects, since the result is almost the same as for the C projects.

When it comes to the tools that were run on the C++ projects, Table 5.4 shows that the number of violations found was noticeably low compared both to the number found for the C projects and to the tools tested in related work [4] [54]. Regarding the number of CVE:s that each of the tools could be run on, the C++ projects follow the same pattern as the C projects: Rosecheckers could not be run on as many as the other two tools. In this case the difference was even larger, so the same discussion regarding RQ3 applies here as well. It is difficult to tell why the tools did worse on the C++ projects, although a possible reason could be that the C++ projects were more difficult to understand and the code for the vulnerability was often split across more than one file, which also made it take longer in the manual analysis to figure out which SEI CERT rule, if any, was violated. In the related work by Nathaniel Ayewah et al. [5], it was concluded that static analysis tools performed worse on vulnerabilities in less trivial programs, which might explain why the static analysis tools did not find as many vulnerabilities in the C++ projects. However, to give a better answer, further research needs to be done on the C++ CVE:s.

The run time of the tools, which can be seen in Figure 5.12, differed between them. This result can be useful for answering RQ3, since the run time of a tool can be an important factor when deciding on which tool to use, especially if the analysis needs to be done within a certain amount of time.

If that is the case, Rosecheckers would be a good choice. A reason for the difference in run time could be that the tools use slightly different analysis techniques. For example, CodeChecker, which took the most time, uses several techniques, including Cross Translation Unit (CTU) analysis, which could make the analysis take longer; a sketch of the kind of cross-file defect CTU analysis can expose is given after this paragraph. The run time of the tools in relation to size and number of files, as shown in Figures 5.13, 5.14, 5.15 and 5.16, displays a rather steep increase as the size and the number of files grow. This affects the scalability of the tools, since the time spent running them on very large projects may not be viable. This can be related to the scalability issues seen for the C projects, but there the issue was the number of vulnerabilities found as the project size grew. Because the tools only found three rule violations in the C++ analysis, it is difficult to tell whether this was the case for the C++ projects as well.
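As a hedged illustration of what CTU analysis buys, consider the following two-file sketch; the file and function names are ours and not taken from the analyzed projects. An analyzer that inspects consumer.c in isolation only sees the prototype of lookup() and cannot prove that its return value may be NULL; an analyzer that follows the call across translation units can flag the unchecked dereference as an EXP34-C violation:

/* provider.c: lookup() may return NULL when the key is unknown. */
#include <stddef.h>
#include <string.h>

const char *lookup(const char *key) {
    return (key != NULL && strcmp(key, "known") == 0) ? "value" : NULL;
}

/* consumer.c: without CTU analysis, only the prototype below is
 * visible here, so the possibly-NULL argument to strlen() cannot be
 * proven; with CTU analysis the body of lookup() is available and the
 * defect can be reported. */
#include <string.h>

const char *lookup(const char *key);

size_t value_length(const char *key) {
    return strlen(lookup(key));  /* possible null pointer dereference */
}

Following calls across translation units in this way naturally costs extra analysis time, which is consistent with the longer run times observed for CodeChecker.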

Rule specific CVE vulnerabilities

The result regarding the number of violations found per rule supports the suggestion, given in the discussion of the C analysis above, that a combination of the tools would be the best way to ensure that the highest number of violations is found. The results for the violations found in relation to project size support the idea that the tools are better at finding violations in smaller projects, as seen in Figure 5.18 and Table 5.5. However, due to the low number of large projects analyzed, the validity of this result may be impaired; it would have been stronger if more resources had been spent on analyzing larger projects. The run time in relation to project size follows, as in the C++ analysis, the same trend for each tool. Since this part consisted mostly of C projects, namely 47 out of 53, this suggests that the run time of the tools is not affected by which language they are run on, at least not for C and C++. The large spread of CVSS scores in relation to the number of violations found shows that the CVSS score of a vulnerability has no apparent impact on the likelihood that the tools will find the violation.

6.3 The work in a wider context

The static analysis tools used in this study could also be used by both experienced and inexperienced "bad guys" to gather information about vulnerabilities in open-source projects. This information could then be used to harm companies using those projects, or for the attackers' own financial gain.

It is important for companies to consider the security of their data and especially to protect their users' data. The security of a company's software may improve when it complies with a secure coding standard, and if not following any secure coding standard makes the software less secure, companies have an ethical responsibility to follow one in order to make their software as secure as possible.

The results of this thesis indicate that complying with the SEI CERT Secure Coding Standard decreases the number of vulnerabilities in the code. This should not be followed blindly: even though the results say that security will improve by following the standard, it does not mean that the security of a project will improve in every situation. The recommendation to developers who are reading this thesis and thinking about introducing the SEI CERT standard into their project is that these results should only serve as a source of inspiration; you also need to do your own research regarding your particular project and situation.

7 Conclusion

The aim of this study was to investigate whether compliance with the SEI CERT Secure Coding Standard would reduce vulnerabilities, as well as to evaluate static analysis tools with respect to their SEI CERT performance and coverage. A manual analysis and three static analysis tools, Rosecheckers, PVS-Studio and CodeChecker, were used to examine the SEI CERT C and C++ standards and to tell to what extent they could reduce the vulnerabilities. To make the research more diversified and to represent common everyday software vulnerabilities, the vulnerabilities analyzed were taken from an open database, CVE, which covers thousands of projects, all different from each other.

7.1 How can vulnerabilities be reduced in the early phase of software development?

The result of the manual analysis showed that 38/60 and 37/60 CVE:s for C and C++, respectively, could have been prevented by compliance with a specific SEI CERT rule, which means that adhering to this standard would indeed reduce vulnerabilities during the whole software development life cycle. Some SEI CERT rule violations were more common than others, which means that avoiding these would be a good first step towards reducing vulnerabilities. The tools did not find as many violations as the manual analysis; however, they did find vulnerabilities, and they found them more efficiently since they are automated, so it is concluded that the three tools can be used to reduce vulnerabilities. The results of the tools also show that a combination of them would reduce more vulnerabilities, as the tools are good at finding different types of vulnerabilities. A combination of manual analysis and static analysis tools would result in an even more accurate analysis and even fewer vulnerabilities.

7.2 To what extent does SEI CERT compliance help reduce vulnerabilities?

As stated in Section 7.1, the manual analysis found that more than half of the CVE:s could have been prevented by SEI CERT compliance, more precisely in 63% and 62% of the cases for the C and C++ related CVE:s, respectively.


7.3 What tools can help complying with the SEI CERT secure coding standard?

When running the tools on the C related CVE:s, the number of violations found was 6/24, 7/33 and 3/34 for Rosecheckers, PVS and CodeChecker, which gives true positive rates of 25%, 21.21% and 8.82%, respectively. For the C++ related CVE:s the tools performed worse: here the numbers of violations found were 1/13, 2/31 and 0/32 for the tools, ordered as above, which gives true positive rates of 7.69%, 6.45% and 0%. In the rule-specific part, where 10 SEI CERT rules were analyzed, the tools found 7/70, 17/100 and 15/100 violations, meaning 10%, 17% and 15% true positives, respectively (these rates are restated as a formula below). This result shows that all of the tools could be used to help comply with the SEI CERT standard in some way. Another aspect to consider is the run time of the tools, where Rosecheckers took the least time, followed by PVS and finally CodeChecker. When deciding on which tools to use, and if the time spent on the analysis is an important factor, this is worth taking into consideration. Based on the CVE:s analyzed by the tools, they found more violations in smaller projects than in larger ones, meaning that the tools did not scale very well for these CVE:s.
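For clarity, the true positive rates above follow directly from the reported counts, under the interpretation used throughout this thesis that every CVE a tool could be run on without reporting the relevant violation counts as a false negative:

\[
\text{TP rate} = \frac{TP}{TP + FN}
             = \frac{\text{violations found}}{\text{CVE:s the tool could be run on}},
\qquad \text{e.g.}\quad \frac{6}{24} = 25\% \ \text{for Rosecheckers on the C part.}
\]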

7.4 Future work

There are many interesting aspects left to study further within the area of this thesis. One is to repeat this method to verify the results found here. Including more CVE:s would also be a good way to increase the validity of the results.

Because of the large number of static analysis tools available on the market today, another interesting area of research would be to compare more than just Rosecheckers, PVS-Studio and CodeChecker. By evaluating more tools against each other, the recommendations of which tools to use would become more comprehensive. Something else to consider is to go beyond static analysis and broaden the horizons by exploring alternative solutions such as dynamic analysis tools. It would be interesting to see how these types of tools compare to static analysis tools with regard to the SEI CERT standard, since dynamic analysis tools test the code while the program is executing and can therefore be more effective at finding some vulnerabilities; a small sketch of such a run-time-dependent defect is given at the end of this section.

Another interesting area would be to study large projects to see how the SEI CERT Secure Coding Standard performs at a larger scale. The results for the static analysis tools would also be of interest, especially how, and by what factors, the run time is affected. Since this thesis only looked at projects of 500 MB or smaller, future work could include even larger projects, like the Linux kernel.

In a perfect world the SEI CERT Secure Coding Standard and its rules would only reduce vulnerabilities and always be worth following. However, as mentioned in the related work by Boogerd and Moonen from 2008 [11], only a few of the rules in the MISRA-C standard were worth following, since refactoring the code to comply with the standard could introduce new faults. A similar study could be made with the SEI CERT Secure Coding Standard to check whether compliance introduces new faults, as it did for the MISRA-C standard, and to see which rules and recommendations are worth complying with.

As this thesis only checked the true positives and false negatives of the different static analysis tools, it would be interesting to also include the false positives of these tools in future work. By doing so, conclusions could be drawn about how much time needs to be spent on the false warnings that the tools produce.
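As a hedged illustration of the kind of defect where dynamic analysis complements static analysis, consider the sketch below; the function is our own example, not taken from any analyzed project. Whether the overflow triggers depends on the run-time input, so an instrumented execution (for instance a fuzzer driving the function in a build compiled with AddressSanitizer, e.g. clang -fsanitize=address) reports it directly as a buffer overflow, while a static scan may only rank it as a possible issue:

#include <stdio.h>
#include <string.h>

/* The guard below is off by one: an input of exactly sizeof(buf)
 * characters passes the check, and memcpy() then writes the
 * terminating '\0' one byte past the end of buf. */
int store_name(const char *input) {
    char buf[16];
    if (strlen(input) > sizeof(buf)) {   /* should be >= sizeof(buf) */
        return -1;
    }
    memcpy(buf, input, strlen(input) + 1);
    printf("stored: %s\n", buf);
    return 0;
}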

Bibliography

[1] Ijaz Ahmad, Shahriar Shahabuddin, Tanesh Kumar, Jude Okwuibe, Andrei Gurtov, and Mika Ylianttila. “Security for 5G and Beyond”. In: IEEE Communications Surveys & Tutorials 21.4 (2019), pp. 3682–3722. DOI: 10.1109/COMST.2019.2916180.
[2] Hamda Hasan AlBreiki and Qusay H Mahmoud. “Evaluation of static analysis tools for software security”. In: 2014 10th International Conference on Innovations in Information Technology (IIT). IEEE. 2014, pp. 93–98.
[3] Bushra Aloraini, Meiyappan Nagappan, Daniel M German, Shinpei Hayashi, and Yoshiki Higo. “An empirical study of security warnings from static application security testing tools”. In: Journal of Systems and Software 158 (2019), p. 110427. DOI: 10.1016/j.jss.2019.110427.
[4] A. Arusoaie, S. Ciobâca, V. Craciun, D. Gavrilut, and D. Lucanu. “A Comparison of Open-Source Static Analysis Tools for Vulnerability Detection in C/C++ Code”. In: 2017 19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC). 2017, pp. 161–168. DOI: 10.1109/SYNASC.2017.00035.
[5] Nathaniel Ayewah, William Pugh, J David Morgenthaler, John Penix, and YuQian Zhou. “Evaluating static analysis defect warnings on production software”. In: Proceedings of the 7th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering. 2007, pp. 1–8.
[6] Roberto Bagnara, Abramo Bagnara, and Patricia M Hill. “The MISRA C coding standard and its role in the development and analysis of safety- and security-critical embedded software”. In: International Static Analysis Symposium. Springer. 2018, pp. 5–23. DOI: 10.1007/978-3-319-99725-4_2.
[7] A. Ballman. The SEI CERT C++ Coding Standard. 2016. URL: https://wiki.sei.cmu.edu/confluence/display/cplusplus. [Accessed 2021-05-27].
[8] Richard Bellairs. Secure Coding Practice Guidelines. 2021. URL: https://security.berkeley.edu/secure-coding-practice-guidelines. [Accessed 2021-01-05].
[9] Richard Bellairs. What Is Code Quality? And How to Improve Code Quality. 2019. URL: https://www.perforce.com/blog/sca/what-code-quality-and-how-improve-code-quality. [Accessed 2021-02-02].
[10] Richard Bellairs. What Is Secure Coding? 2019. URL: https://www.perforce.com/blog/sca/what-secure-coding. [Accessed 2021-01-05].


[11] Cathal Boogerd and Leon Moonen. “Assessing the value of coding standards: An empirical study”. In: 2008 IEEE International Conference on Software Maintenance. IEEE. 2008, pp. 277–286.
[12] Cathal Boogerd and Leon Moonen. “Evaluating the relation between coding standard violations and faults within and across software versions”. In: 2009 6th IEEE International Working Conference on Mining Software Repositories. IEEE. 2009, pp. 41–50.
[13] MISRA C. “Guidelines for the use of the C language in critical systems”. In: MIRA Limited, Warwickshire, UK (2004).
[14] Dawn M Cappelli, Andrew P Moore, and Randall F Trzeciak. The CERT guide to insider threats: how to prevent, detect, and respond to information technology crimes (Theft, Sabotage, Fraud). Addison-Wesley, 2012.
[15] Brian Cashell, William D Jackson, Mark Jickling, and Baird Webel. “The economic impact of cyber-attacks”. In: Congressional research service documents, CRS RL32331 (Washington DC) 2 (2004).
[16] B. Chess and G. McGraw. “Static analysis for security”. In: IEEE Security & Privacy 2.6 (2004), pp. 76–79. DOI: 10.1109/MSP.2004.111.
[17] The MITRE Corporation. About CVE. 2021. URL: https://cve.mitre.org/about/index.html. [Accessed 2021-02-23].
[18] The MITRE Corporation. Frequently Asked Questions. 2021. URL: https://cve.mitre.org/about/faqs.html. [Accessed 2021-05-13].
[19] Priyanka Darke, Mayur Khanzode, Arun Nair, Ulka Shrotri, and R Venkatesh. “Precise analysis of large industry code”. In: 2012 19th Asia-Pacific Software Engineering Conference. Vol. 1. IEEE. 2012, pp. 306–309.
[20] Ryan Dewhurst. Static Code Analysis. 2021. URL: https://owasp.org/www-community/controls/Static_Code_Analysis. [Accessed 2021-01-25].
[21] Lisa Nguyen Quang Do, James Wright, and Karim Ali. “Why do software developers use static analysis tools? A user-centered study of developer needs and motivations”. In: IEEE Transactions on Software Engineering (2020).
[22] Ericsson. CodeChecker. 2021. URL: https://github.com/Ericsson/codechecker. [Accessed 2021-02-26].
[23] Xuefen Fang. “Using a coding standard to improve program quality”. In: Proceedings Second Asia-Pacific Conference on Quality Software. IEEE. 2001, pp. 73–78.
[24] Juan Felipe García Sierra, Miguel Carriegos Vieira, Jesús Balsa, Fernando Sánchez, Mario Fernández, Alejandro Fernández, Cristian Cadenas, Javier Rodríguez, Vladislav Lebedev, et al. “C Secure Coding Standards Performance: CMU SEI CERT vs MISRA”. In: III Jornadas Nacionales de Investigacion en Ciberseguridad, JNIC2017, Servicio de Publicaciones de la URJC (2017), pp. 168–169.
[25] Mark Grover, Jeffrey Cummings, and Tom Janicki. “Moving beyond coding: why secure coding should be implemented”. In: Journal of Information Systems Applied Research 9.1 (2016), p. 38.
[26] IBM Security. IBM X-Force Exchange. 2021. URL: https://exchange.xforce.ibmcloud.com/. [Accessed 2021-04-20].
[27] Software Engineering Institute. SEI CERT C Coding Standard: Rules for Developing Safe, Reliable, and Secure Systems. 2016.
[28] James C King. “Symbolic execution and program testing”. In: Communications of the ACM 19.7 (1976), pp. 385–394.
[29] PVS-Studio LLC. Classification of PVS-Studio warnings according to the SEI CERT Coding Standard. 2021. URL: https://www.viva64.com/en/cert/. [Accessed 2021-03-29].


[30] PVS-Studio LLC. Free PVS-Studio for Students and Teachers. 2021. URL: https://pvs-studio.com/en/for-students/. [Accessed 2021-02-18].
[31] PVS-Studio LLC. PVS-Studio Analyzer. 2021. URL: https://pvs-studio.com/en/pvs-studio/. [Accessed 2021-02-18].
[32] Thu-Trang Nguyen, Toshiaki Aoki, Takashi Tomita, and Iori Yamada. “Multiple Program Analysis Techniques Enable Precise Check for SEI CERT C Coding Standard”. In: 2019 26th Asia-Pacific Software Engineering Conference (APSEC). IEEE. 2019, pp. 70–77.
[33] Oracle. Code Conventions for the Java™ Programming Language. 1999. URL: https://www.oracle.com/java/technologies/javase/codeconventions-contents.html. [Accessed 2020-12-01].
[34] José D’Abruzzo Pereira and Marco Vieira. “On the Use of Open-Source C/C++ Static Analysis Tools in Large Projects”. In: 2020 16th European Dependable Computing Conference (EDCC). IEEE. 2020, pp. 97–102.
[35] Srdan Popić, Gordana Velikić, Hlavač Jaroslav, Zvjezdan Spasić, and Marko Vulić. “The Benefits of the Coding Standards Enforcement and it’s Influence on the Developers’ Coding Behaviour: A Case Study on Two Small Projects”. In: 2018 26th Telecommunications Forum (TELFOR). IEEE. 2018, pp. 420–425.
[36] Pawani Porambage, Gürkan Gür, Diana Pamela Moya Osorio, Madhusanka Liyanage, Andrei Gurtov, and Mika Ylianttila. “The roadmap to 6G security and privacy”. In: IEEE Open Journal of the Communications Society (2021).
[37] Pawani Porambage, Mika Ylianttila, Corinna Schmitt, Pardeep Kumar, Andrei Gurtov, and Athanasios V Vasilakos. “The quest for privacy in the internet of things”. In: IEEE Cloud Computing 3.2 (2016), pp. 36–45.
[38] Hendrik Post, Carsten Sinz, Alexander Kaiser, and Thomas Gorges. “Reducing false positives by combining abstract interpretation and bounded model checking”. In: 2008 23rd IEEE/ACM International Conference on Automated Software Engineering. IEEE. 2008, pp. 188–197.
[39] Python Software Foundation. General Python FAQ. 2021. URL: https://docs.python.org/3/faq/general.html. [Accessed 2021-05-18].
[40] Daniel Quinlan, Chunhua Liao, Thomas Panas, Robb Matzke, Markus Schordan, Rich Vuduc, and Qing Yi. ROSE User Manual: A Tool for Building Source-to-Source Translators. 2019. URL: http://rosecompiler.org/uploads/ROSE-UserManual.pdf. [Accessed 2021-04-13].
[41] Dennis M Ritchie. “The development of the C language”. In: ACM Sigplan Notices 28.3 (1993), pp. 201–208.
[42] Dennis M Ritchie, Brian W Kernighan, and Michael E Lesk. The C programming language. Prentice Hall Englewood Cliffs, 1988.
[43] Guido van Rossum. Style Guide for Python Code. 2001. URL: https://www.python.org/dev/peps/pep-0008/. [Accessed 2020-12-01].
[44] Robert C Seacord and Jason A Rafail. “Secure coding standards”. In: Proceedings of the Static Analysis Summit, NIST Special Publication 13 (2006), p. 17.
[45] SecurityFocus. About SecurityFocus. 2010. URL: https://www.securityfocus.com/about. [Accessed 2021-04-20].
[46] National Institute of Standards and Technology. Vulnerability Metrics. 2021. URL: https://nvd.nist.gov/vuln-metrics/cvss. [Accessed 2021-02-23].
[47] Bjarne Stroustrup. “An overview of the C++ programming language”. In: Handbook of object technology (1999).


[48] Bjarne Stroustrup. The C++ programming language. Pearson Education India, 2000.
[49] David Svoboda. CERT Rosecheckers. 2020. URL: https://sourceforge.net/projects/rosecheckers/. [Accessed 2021-02-23].
[50] David Svoboda. Rosecheckers. 2021. URL: https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=508017. [Accessed 2021-02-26].
[51] The Clang Team. Clang Static Analyzer. 2021. URL: https://clang.llvm.org/docs/ClangStaticAnalyzer.html. [Accessed 2021-04-09].
[52] The Clang Team. Clang-Tidy. 2021. URL: https://clang.llvm.org/extra/clang-tidy/. [Accessed 2021-04-09].
[53] James Walden, Jeff Stuckman, and Riccardo Scandariato. “Predicting vulnerable components: Software metrics vs text mining”. In: 2014 IEEE 25th International Symposium on Software Reliability Engineering. IEEE. 2014, pp. 23–33.
[54] Jiang Zheng, Laurie Williams, Nachiappan Nagappan, Will Snipes, John P Hudepohl, and Mladen A Vouk. “On the value of static analysis for fault detection in software”. In: IEEE Transactions on Software Engineering 32.4 (2006), pp. 240–253.

A Script for gathering EXP34-C CVE vulnerabilities

import csv
import json

with open("allitems.csv", encoding="utf-8") as csvfile, open("exp34.json", "w") as out:
    with open("scores.json", encoding="utf-8") as cve_scores:
        csv_reader = csv.DictReader(csvfile, delimiter=",")
        scores = json.load(cve_scores)
        counter = 0
        cves = {}
        for row in csv_reader:
            try:
                desc = row["Description"].lower()
                # Keep C-related CVE:s hosted on GitHub, excluding the Linux kernel.
                if ".c" in desc and "github" in row["References"] and "torvalds" not in row["References"]:
                    # EXP34-C candidates: descriptions mentioning a null pointer dereference.
                    if "null pointer" in desc and "dereference" in desc:
                        counter += 1
                        cves[row["Name"]] = scores[row["Name"]]
            except UnicodeDecodeError:
                print(row)
            except KeyError:
                # No CVSS score found for this CVE in scores.json.
                cves[row["Name"]] = "None"
                continue
        json.dump(cves, out)
        print("Found:", counter)

Listing A.1: Python script for extracting EXP34-C CVE vulnerabilities.

B Script to gather C++ CVE:s

import csv
import json

with open("allitems.csv", encoding="utf-8") as csvfile, \
        open("cve_cpp_git2.json", "w") as out:
    with open("nvdcve-1.1-2018.json", encoding="utf-8") as nvd2018, \
            open("nvdcve-1.1-2019.json", encoding="utf-8") as nvd2019, \
            open("nvdcve-1.1-2020.json", encoding="utf-8") as nvd2020:
        csv_reader = csv.DictReader(csvfile, delimiter=",")
        nvd18_data = json.load(nvd2018)
        nvd19_data = json.load(nvd2019)
        nvd20_data = json.load(nvd2020)
        # Build a lookup table from CVE ID to CVSS v3 base score.
        cve_scores = {}
        for nvd in [nvd18_data, nvd19_data, nvd20_data]:
            for cve in nvd["CVE_Items"]:
                try:
                    cve_scores[cve["cve"]["CVE_data_meta"]["ID"]] = \
                        cve["impact"]["baseMetricV3"]["cvssV3"]["baseScore"]
                except KeyError:
                    # No CVSS v3 metrics available for this CVE.
                    cve_scores[cve["cve"]["CVE_data_meta"]["ID"]] = "None"

        with open("scores2.json", "w") as save_data:
            json.dump(cve_scores, save_data)

        counter = 0
        cves = {}
        for row in csv_reader:
            try:
                # Keep C++-related CVE:s hosted on GitHub.
                if ".cpp" in row["Description"] and "github" in row["References"]:
                    cves[row["Name"]] = cve_scores[row["Name"]]
                    counter += 1
            except UnicodeDecodeError:
                continue
            except KeyError:
                continue
        # Sort by score; the tuple key keeps the "None" markers apart from
        # numeric scores so floats and strings are never compared directly.
        cves_sorted = dict(sorted(cves.items(),
                                  key=lambda item: (isinstance(item[1], str), item[1])))
        json.dump(cves_sorted, out)
        print("Found:", counter)

Listing B.1: Python script for extracting C++ vulnerabilities.

C C CVE:s

Table C.1: CVE:s tested in C CVE analysis.

#   Project        Size (MB)   CVE:s
1   blosc          22.12       CVE-2020-29367
2   dlt-daemon     8.8         CVE-2020-29394
3   doom-vanille   2.6         CVE-2020-15007
4   FFmpeg         253.65      CVE-2020-35965, CVE-2020-35964, CVE-2020-12284, CVE-2020-13904
5   fluent-bit     44.86       CVE-2020-35963
6   FreeRDP        43.6        CVE-2020-11521, CVE-2020-11522, CVE-2020-11523, CVE-2020-13397
7   gpac           122.63      CVE-2020-11558
8   hiredis        1.33        CVE-2020-7105
9   Janus          37.41       CVE-2020-13898, CVE-2020-13899, CVE-2020-14033, CVE-2020-13900, CVE-2020-13901
10  jbig2dec       0.958       CVE-2020-12268
11  JerryScript    57.33       CVE-2020-29657, CVE-2020-13649, CVE-2020-14163, CVE-2020-13991
12  kitty          24.8        CVE-2020-35605
13  krb5           72.06       CVE-2020-28196
14  Libexif        3.97        CVE-2020-12767
15  Libjpeg        13.74       CVE-2020-13790
16  libming        18.27       CVE-2019-9113
17  libsixel       33.78       CVE-2020-11721
18  libvips        74.11       CVE-2020-20739
19  LibVNCServer   13.89       CVE-2020-14405, CVE-2020-14402, CVE-2020-14401, CVE-2020-14397, CVE-2020-14396
20  MariaDB        6.28        CVE-2020-13249
21  md4c           1.24        CVE-2020-26148
22  mRuby          16.32       CVE-2020-15866
23  MuJS           0.929       CVE-2020-24343
24  nDPI           115.01      CVE-2020-11940
25  openjpeg       110.52      CVE-2020-15389
26  pam_tacplus    0.503       CVE-2020-13881
27  Pillow         71.32       CVE-2020-5312, CVE-2020-5313, CVE-2020-5311, CVE-2020-5310, CVE-2020-11538
28  PostSRSd       0.231       CVE-2020-35573
29  QEMU           321.03      CVE-2020-13765
30  radare2        128.06      CVE-2020-16269
31  rauc           4.45        CVE-2020-25860
32  tmux           7.8         CVE-2020-27347
33  uftpd          0.634       CVE-2020-14149, CVE-2020-5204
34  VLC            380.22      CVE-2020-13428
35  WavPack        10.28       CVE-2020-35738
36  wolfSSL        153.4       CVE-2020-11713, CVE-2020-11735
37  x11vnc         13.89       CVE-2020-29074

D C++ CVE:s

Table D.1: CVE:s tested in C++ CVE analysis.

#   Project              Size (MB)   CVE:s
1   adplug               4.93        CVE-2018-17825
2   AFFLIBv3             0.946       CVE-2018-8050
3   aspell               8.07        CVE-2019-17544
4   audiofile            2.35        CVE-2019-13147
5   Bento4               47.33       CVE-2018-14445, CVE-2018-20186, CVE-2018-20659, CVE-2018-13846, CVE-2019-13238, CVE-2018-14589, CVE-2019-17530, CVE-2018-5253
6   Binaryen             112.44      CVE-2019-15759, CVE-2019-7153, CVE-2019-7662, CVE-2019-7701, CVE-2019-7702, CVE-2019-7704
7   cbang                19.7        CVE-2020-15908
8   EOSIO                208.81      CVE-2018-11548
9   Exiv2                88.78       CVE-2018-17282, CVE-2018-19607, CVE-2018-8977, CVE-2018-9304, CVE-2019-17402, CVE-2019-20421
10  GDal                 250.53      CVE-2019-17545
11  graphite             35.4        CVE-2018-7999
12  Jack2                14.92       CVE-2019-13351
13  Leanify              1.35        CVE-2019-12835
14  libproxy             1.13        CVE-2020-26154, CVE-2020-25219
15  LibRaw               13.17       CVE-2020-15365, CVE-2018-20337, CVE-2018-10529, CVE-2020-24889
16  libsass              12.26       CVE-2018-20190
17  libzmq               18.53       CVE-2019-6250
18  Marlin               76.14       CVE-2018-1000537
19  MediaInfoLib         18.44       CVE-2019-11373
20  nnabla               117.42      CVE-2019-10844
21  Openexr              46.79       CVE-2020-16588, CVE-2018-18444
22  openmpt              110.29      CVE-2018-6611
23  phosphor-host-ipmid  7.05        CVE-2020-14156
24  qBittorrent          180.83      CVE-2019-13640
25  quassel              32.75       CVE-2018-1000179
26  ros_com              12.29       CVE-2019-13445
27  sam2p                2.11        CVE-2018-7553
28  sddm                 9.1         CVE-2018-14345
29  serenity             58.07       CVE-2019-20172
30  sleuthkit            53.05       CVE-2019-14532
31  tcpflow              50.8        CVE-2018-14938
32  Teeworlds            106.59      CVE-2019-10879, CVE-2019-10878, CVE-2019-10877, CVE-2020-12066
33  WAVM                 15.02       CVE-2018-17293
34  znc                  18.88       CVE-2019-12816
35  ZoneMinder           122.31      CVE-2019-6991

E Rule Specific CVE:s

Table E.1: CVE:s tested in Rule Specific CVE analysis.

#   Project          Size (MB)   Rule      CVE:s
1   Bento4           47.33       MEM35-C   CVE-2018-20659, CVE-2018-20502, CVE-2018-20186
2   cyrus-sasl       13.47       STR31-C   CVE-2019-19906
3   dlt-daemon       8.8         STR31-C   CVE-2020-29394
                                 FIO47-C   CVE-2020-29394
4   exiv2            89.49       EXP34-C   CVE-2019-13114
                                 INT33-C   CVE-2019-14982
                                 INT32-C   CVE-2018-9304
5   ffjpeg           0.186       EXP34-C   CVE-2019-19887
                                 INT33-C   CVE-2019-19888
6   FFmpeg           254.16      EXP34-C   CVE-2019-17539
                                 MEM30-C   CVE-2020-13904
                                 EXP33-C   CVE-2015-3417, CVE-2019-12730
7   gpac             120.47      MEM30-C   CVE-2019-20628
                                 ARR30-C   CVE-2019-20630
8   hiredis          1.33        EXP34-C   CVE-2020-7105
9   icu              252.97      INT32-C   CVE-2020-10531
10  Janus            37.48       STR31-C   CVE-2020-14033
                                 FIO47-C   CVE-2020-14034, CVE-2020-14033
11  jasper           3.13        MEM30-C   CVE-2015-5221
12  jq               6.67        STR31-C   CVE-2015-8863
13  krb5             72.06       MEM30-C   CVE-2014-9421
14  leptonica        22.27       FIO47-C   CVE-2018-7186
15  libav            86.68       STR31-C   CVE-2019-9720
16  Libdoc           0.049       EXP34-C   CVE-2019-7233
                                 INT33-C   CVE-2019-7156
17  libgit2          54.41       INT30-C   CVE-2018-8098
18  libiec61850      5.3         ARR30-C   CVE-2019-19957
19  libiec61850      5.32        MEM35-C   CVE-2019-19958, CVE-2019-19930
20  libming          18.27       EXP34-C   CVE-2018-9165
                                 STR31-C   CVE-2018-20429
                                 MEM30-C   CVE-2019-9113
                                 INT32-C   CVE-2018-13251
                                 MEM35-C   CVE-2018-8964, CVE-2018-9009, CVE-2019-12980, CVE-2018-7867
21  libmysofa        244.58      EXP33-C   CVE-2019-20063
22  libpcap          14.63       FIO47-C   CVE-2019-15165
23  libredwg         74.33       ARR30-C   CVE-2020-6610
                                 MEM35-C   CVE-2019-20915, CVE-2020-6609, CVE-2020-6613
24  libsixel         32.2        ARR30-C   CVE-2019-19637
                                 INT30-C   CVE-2019-19637
                                 INT32-C   CVE-2019-19638, CVE-2019-20205, CVE-2019-3574
25  libsolv          13.76       ARR30-C   CVE-2019-20387
26  libssh2          3.06        INT30-C   CVE-2019-13115
27  libu2f-host      0.542       EXP33-C   CVE-2019-9578
28  libvips          75.96       EXP33-C   CVE-2020-20739
29  LibVNCServer     13.89       EXP34-C   CVE-2020-14397
                                 EXP33-C   CVE-2019-20788
                                 INT32-C   CVE-2018-21247, CVE-2018-7225
30  libzmq           21.49       INT30-C   CVE-2019-6250
31  lua              10.12       EXP34-C   CVE-2020-24369
32  matio            4.74        EXP33-C   CVE-2019-17533
33  miniupnp         3.69        MEM30-C   CVE-2019-12106
34  nDPI             114.66      ARR30-C   CVE-2020-11939
                                 INT30-C   CVE-2020-11940
35  neomutt          110.64      FIO47-C   CVE-2018-14353
                                 INT32-C   CVE-2018-14360
36  nfdump           5.35        INT30-C   CVE-2019-14459
37  oniguruma        5.51        INT30-C   CVE-2019-19012
38  openjpeg         110.52      MEM30-C   CVE-2019-6988
                                 FIO47-C   CVE-2015-8871
                                 INT33-C   CVE-2016-4797
                                 MEM35-C   CVE-2017-17479, CVE-2017-17480, CVE-2018-20845, CVE-2018-14423, CVE-2016-10506
39  php-src          417         STR31-C   CVE-2015-8617
                                 FIO47-C   CVE-2016-10160, CVE-2015-8617
40  Pillow           71.32       ARR30-C   CVE-2020-10378, CVE-2020-10994
41  rabbitmq-c       2.91        INT32-C   CVE-2019-18609
42  radare2          128.06      MEM30-C   CVE-2017-9762
43  schismtracker    7.55        INT30-C   CVE-2019-14523
44  ssdp-responder   0.39        STR31-C   CVE-2019-14323
45  swftools         74.33       INT33-C   CVE-2017-16890
46  taglib           71.32       INT33-C   CVE-2012-1107
47  teeworlds        106.59      INT32-C   CVE-2019-10877
48  udisks           9.69        FIO47-C   CVE-2018-17336
49  uftpd            0.634       STR31-C   CVE-2020-5204
50  uriparser        1.31        INT32-C   CVE-2018-19199
51  viabtc exchange server  0.778  INT30-C  CVE-2018-17569
52  WavPack          10.28       EXP33-C   CVE-2018-7254
                                 INT33-C   CVE-2018-10538
                                 MEM35-C   CVE-2019-1010317, CVE-2019-1010319, CVE-2019-1010315
53  yara             20.44       EXP33-C   CVE-2018-19974

F Rule Specific figures

F.1 ARR30-C

Figure F.1: ARR30-C Size related to run time.


Figure F.2: ARR30-C Size related to number of found violations.

Figure F.3: ARR30-C CVSS related to number of found violations.


F.2 EXP33-C

Figure F.4: EXP33-C Size related to run time.

Figure F.5: EXP33-C Size related to number of found violations.

Figure F.6: EXP33-C CVSS related to number of found violations.


F.3 EXP34-C

Figure F.7: EXP34-C Size related to run time.

Figure F.8: EXP34-C Size related to number of found violations.

Figure F.9: EXP34-C CVSS related to number of found violations.


F.4 FIO47-C

Figure F.10: FIO47-C Size related to run time.

Figure F.11: FIO47-C Size related to number of found violations.

Figure F.12: FIO47-C CVSS related to number of found violations.


F.5 INT30-C

Figure F.13: INT30-C Size related to run time.

Figure F.14: INT30-C Size related to number of found violations.

Figure F.15: INT30-C CVSS related to number of found violations.


F.6 INT32-C

Figure F.16: INT32-C Size related to run time.

The tools found no violations for INT32-C; therefore only the run time graph is of interest.

F.7 INT33-C

Figure F.17: INT33-C Size related to run time.

Figure F.18: INT33-C Size related to number of found violations.


Figure F.19: INT33-C CVSS related to number of found violations.

F.8 MEM30-C

Figure F.20: MEM30-C Size related to run time.

Figure F.21: MEM30-C Size related to number of found violations.


Figure F.22: MEM30-C CVSS related to number of found violations.

F.9 MEM35-C

Figure F.23: MEM35-C Size related to run time.

Figure F.24: MEM35-C Size related to number of found violations.


Figure F.25: MEM35-C CVSS related to number of found violations.

F.10 STR31-C

Figure F.26: STR31-C Size related to run time.

Figure F.27: STR31-C Size related to number of found violations.


Figure F.28: STR31-C CVSS related to number of found violations.
