Code Audit Report

Muhammad Uzair

May 25, 2017

Abstract

Many applications are used by individuals and businesses today, and they have become a part of daily life. These applications handle personal information and, in some cases, important business transaction data, so the security of that information is essential. Keeping the information secure requires that the application behaves as intended, and a code audit (or code review) is carried out to verify this. If any flaws, weaknesses, threats or other vulnerabilities are found in the code, fixes are applied accordingly. This report first presents some example code audits and then the code audit of DigiDoc, which is a module of Open-EID.

1 Introduction

The security of applications is a major concern nowadays. Applications have become a common part of daily life and are used for both business and personal purposes. They manage personal data and internal organizational data, so protecting this data is essential. Protecting the data and securing the applications involves many challenges, and exposure to the internet has made these applications a prime target for the information they contain. Free and Open-Source Software (FOSS) has many advantages. One of them is that the source code is available, so anyone can check the code, fix bugs and add new features. This also matters for security, because anyone can examine the security of the code and improve it. The free availability of the code also allows organizations to find vulnerabilities and refine the security of the code, which results in more secure applications and a safer user experience.

2 Background

This section summarizes some of the code audits carried out by the EU-FOSSA (Free and Open-Source Software Auditing) project. EU-FOSSA focuses on the security aspects of software. The following code audits are included in this report:

• Apache Core & APR (Apache Portable Runtime)
• KeePass Password Safe

The details of the code analysis are given below.

2.1 Apache Core & APR Code Review

The Apache HTTP server is one of the most widely used HTTP and proxy servers, and it is Free and Open-Source Software (FOSS) [3]. It has matured because many security flaws have been detected and corrected since it was first released in 1995. The objective of the code review was to examine the Apache Core & APR, with the main focus on security aspects.

2.1.1 Scope

| Application Name | Apache Core & APR |
| Code Review Owner | European Commission – Directorate-General for Informatics (DIGIT) |
| Review Start | 25/07/2016 |
| Review End | 22/08/2016 |
| Objective | Security Code Review |
| Number of Lines | 61286 |
| Code Review Mode | Managed, Defined, Optimized |
| Libraries | Apache Core, Apache Portable Runtime |
| Extensions | N/A |
| Services Required | N/A |

Table 1: Scope of Apache Core & APR

2.1.2 Executive Summary

The Apache server is composed of many components. This code review focused only on the core of the Apache server and did not include any external module or extension. The reviewed modules are the Apache HTTP core and the Apache Portable Runtime. In total, 61286 lines were reviewed, which is nearly 20% of the total lines of the Apache server. Linux and Windows were considered during the review because they are the most widely used operating systems. All of the findings fell within the Secure Code Design and Specific Controls categories. Only 7 out of 160 controls had at least one finding, which is a low percentage overall. The following table summarizes the findings and their risk levels:

| Risk Level | Findings |
|------------|----------|
| Critical   | 0        |
| High       | 0        |
| Medium     | 0        |
| Low        | 2        |
| Info       | 5        |

Table 2: Summary of findings in Apache Core & APR

There were no critical, high or medium risks. All findings were classified as low or informative, but they should still be fixed.

2.1.3 Methodology

The methodology consisted of four phases, and each phase had three activities. The phases and their activities are as follows:

1. Planning
   The first phase is about gathering the information required for the code review. It includes the basic information about the code review, the applicable test cases and, if needed, the preparation of test environments.
   • Preparation – information is gathered to define the scope of the code review.
   • Test Design – once the scope is defined, test cases are defined in order to achieve the objectives.
   • Environment Preparation – before the next phase starts, the environment is made ready for running the test cases.

2. Execution
   In this phase, the test cases selected in the previous phase are executed. The execution was divided into three parts, and each part provides data as input for the next part.

   • Managed Mode – this mode uses automated tools to execute the test cases. The following categories were analyzed:
     – Data/Input Management (DIM)
     – Authentication Controls (AUT)
     – Session Management (SMG)
     – Authorization Management (ATS)
     – Cryptography (CPT)
     – Error Handling/Information Leakage (EHI)
     – Software communication (COM)
     – Logging/Auditing (LOG)
     – Secure Code Design
     Each of these categories has further sub-categories, which are not listed here.
   • Defined Mode – the results gathered from the automated tools are compared with the results of manual tests to produce the final results.
   • Optimized Mode – the riskiest parts of the code are evaluated. They are divided into the following sub-categories:
     – Concurrency (CCR)
     – Denial of Service (DOS)
     – Memory and resource management (MRM)
     – Code Structure (COS)
     – Role-privilege matrix (RPM)
     The optimized mode also included the following language-specific controls:
     – Pre-Processor (PRE)
     – Variable Management (VMG)
     – Memory Management (MEM)
     – File I/O Management (FIO)
     – Environment (ENV)
     – Signal and Error Handling (SEH)
     – Concurrency (CON)
     – Miscellaneous (MSC)

3. Assessment
   The risk analysis of the findings from the previous phases was done in this phase. The score was calculated from the threat and vulnerability levels of each finding. The findings were marked as Low, Medium or High based on their average numeric score:

   | Numerical Value | Impact |
   |-----------------|--------|
   | 0 to 3          | Low    |
   | 4 to 6          | Medium |
   | 7 to 9          | High   |

   Table 3: Assessment of findings

   The assessment phase has the following sub-categories:
   • Technical Report
   • Impact Analysis
   • Finding Prioritization

4. Reporting
   • Report
   • Report Dissemination
   • Post-audit

The controls were divided into two main groups:

1. Common controls – these controls are applicable regardless of the language of the code.
2. Language-specific controls – for C, C++, Java or PHP.

The combination of both groups should be used for accurate results.
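The scoring step of the assessment phase can be sketched in a few lines of Python. This is only an illustration: the source states that the score is derived from the threat and vulnerability levels and that the average is mapped to the bands of Table 3, but it does not give the exact formula. The function name `impact_level`, the 0 to 9 range for both inputs, and the rounding rule are assumptions made here.

```python
import math


def impact_level(threat: int, vulnerability: int) -> str:
    """Map the average of two 0-9 scores to the impact bands of Table 3."""
    for name, value in (("threat", threat), ("vulnerability", vulnerability)):
        if not 0 <= value <= 9:
            raise ValueError(f"{name} score must be between 0 and 9, got {value}")
    # Assumption: a fractional average is rounded half up to a whole score.
    score = math.floor((threat + vulnerability) / 2 + 0.5)
    if score <= 3:
        return "Low"
    if score <= 6:
        return "Medium"
    return "High"


print(impact_level(2, 3))  # average 2.5, rounded to 3
print(impact_level(5, 6))  # average 5.5, rounded to 6
print(impact_level(9, 8))  # average 8.5, rounded to 9
```

With a different rounding rule, averages that fall between two bands (for example 3.5) would be classified differently, so the rule would need to be taken from the original methodology before relying on such a mapping.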

2.1.4 Detailed Results

There were findings in 7 controls:

• Secure Code Design
  – Framework Requirement: SCD-FWK-001 (info)
• Specific C Controls
  – Variable Management: CBC-VMG-004 (info), CBC-VMG-011 (info)
  – Memory Management: CBC-MEM-001 (low), CBC-MEM-005 (info)
  – File I/O Management: CBC-FIO-001 (low)
  – Signal and Error Handling: CBC-SEH-007 (info)

Evidence was provided for every control with a finding, together with recommendations for mitigating the risk.

2.1.5 Conclusion

Most of the findings were language-specific. This is because the reviewed part did not include as much functionality as other parts of the Apache server. The main focus was on the APR, as it is very important from a security point of view. The final conclusion stated that both Apache Core and APR have a good level of security. Only a few controls had findings, and none of them were of high severity. These findings also cannot be considered direct security flaws, because security is built in layers and several risky findings would be necessary to compromise the software.

2.2 KeePass Code Review

KeePass is a free and open-source software tool that helps to manage passwords in a secure way [4]. The passwords are stored in one database, which is locked with a master key or a key file, so only one password or key file is required to unlock the database. The AES and Twofish encryption algorithms are used to encrypt the databases.

2.2.1 Scope

| Application Name | KeePass Password Safe |
| Code Review Owner | European Commission – Directorate-General for Informatics (DIGIT) |
| Review Start | 24/08/2016 |
| Review End | 23/09/2016 |
| Objective | Security Code Review |
| Number of Lines | 84622 |
| Code Review Mode | Managed, Defined, Optimized |
| Libraries | MFC v 9.0 (not in the scope of the code review because it is not open source) |
| Extensions | N/A |
| Services Required | N/A |

Table 4: Scope of KeePass Password Safe

2.2.2 Executive Summary

There were 84622 lines to be reviewed. To speed up the process, the code was divided into 33 sub-sections, which were handed over to the review team. The team discovered findings in the following categories:

• Error Handling/Information Leakage
• Logging/Auditing
• Secure Code Design
• Specific C Controls
• Specific C++ Controls

2.2.3 Methodology

The same methodology was used for KeePass as for the code review of Apache Core & APR.

2.2.4 Assessment

The same assessment method was used for KeePass as for the code review of Apache Core & APR.

2.2.5 Detailed Results

There were 14 controls with findings. The categories and sub-categories with findings are as follows:

• Error Handling/Information Leakage
  – Error Handling (1 info)
• Logging/Auditing
  – Log Configuration Management (1 info)
• Secure Code Design
  – Framework Requirements (1 low)
  – Variable types/operations (1 low)
• Specific C Controls
  – Variable Management (1 medium, 1 low)
  – Memory Management (1 medium)
  – Environment (1 medium)
  – Miscellaneous (1 medium)
• Specific C++ Controls
  – Variable Management (2 info)
  – Object-Oriented Programming (2 info)
  – Miscellaneous (1 medium)

2.2.6 Conclusion

Again, most of the findings were language-specific. The main focus was on the encryption algorithms, as they are the critical part from a security point of view. The GUI was also included in the code review. The final conclusion states that the code has a good level of security. There were no findings of critical or high risk.

2.3 Effort and Cost Estimation

The following figures were used to estimate the code review effort:

| Lines of Code | Around 61000 in C; around 84000 in C++ |
| Code Review Team | 3 members for C code; 4 members for C++ code |
| Time-line | Four weeks |
| Number of Controls | 160 for the C language; 218 for the C++ language |

Table 5: Effort and Cost Estimation

The above data resulted in:

• 145000 lines of code in total
• 28 reviewer-weeks (7 reviewers over four weeks)
• 129.5 lines of code per reviewer per hour (consistent with a 40-hour working week)
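The derived figures can be checked with a short calculation. The sketch below uses the values from Table 5. The 40-hour working week is an assumption: it is not stated in the source, but it is the value that reproduces the reported 129.5 lines per reviewer per hour.

```python
# Figures from Table 5. HOURS_PER_WEEK is an assumption, not stated in the report.
LINES_C = 61_000
LINES_CPP = 84_000
REVIEWERS = 3 + 4          # C team plus C++ team
WEEKS = 4
HOURS_PER_WEEK = 40

total_lines = LINES_C + LINES_CPP
reviewer_weeks = REVIEWERS * WEEKS
lines_per_reviewer_hour = total_lines / (reviewer_weeks * HOURS_PER_WEEK)

print(total_lines)                        # 145000
print(reviewer_weeks)                     # 28
print(round(lines_per_reviewer_hour, 1))  # 129.5
```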

These examples are taken from the EU-FOSSA project. It was initiated by the European Parliament in October 2014, and its execution started at the end of 2015. The budget of the project was 1 million euros. For 2016, the budget was increased to 1.9 million euros [8]. The project ran for one year, so the hourly cost of one code reviewer can be calculated as follows:

Hourly cost = 1,000,000 EUR / (7 reviewers × 160 hours per month × 12 months)

Hourly cost ≈ 74.4 EUR
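The same calculation can be written as a small function, which makes the inputs explicit. This is a sketch of the arithmetic above only: the function name `hourly_cost` and its parameter names are hypothetical, and the values are the ones used in the formula.

```python
def hourly_cost(budget_eur: float, reviewers: int,
                hours_per_month: int, months: int) -> float:
    """Return the budget divided by the total number of reviewer hours."""
    total_hours = reviewers * hours_per_month * months
    if total_hours <= 0:
        raise ValueError("total reviewer hours must be positive")
    return budget_eur / total_hours


cost = hourly_cost(1_000_000, reviewers=7, hours_per_month=160, months=12)
print(f"{cost:.1f} EUR per reviewer hour")  # 74.4 EUR per reviewer hour
```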

2.4 Tools used for Code Review

1. CodeLite – an open-source, cross-platform IDE that supports popular languages such as C, C++ and PHP [5].

2. Flawfinder – an open-source tool that examines C/C++ source code and reports possible security "flaws" in it [9].

2.5 Static Code Analysis

Static code analysis is done to verify the quality, reliability and security of software. It can identify defects and security vulnerabilities that could compromise the security of the software. Some code analysis methods can also diagnose run-time errors such as overflows, division by zero and illegally dereferenced pointers. Static code analysis is a cost-effective approach because it eliminates the need to write test cases. It is also automated, and the analysis is done without executing the program [7].

Basic static code analysis techniques include:

• Generating code quality metrics, such as counting the number of lines of code, determining comment density, and assessing code complexity [7]
• Verifying compliance with coding standards such as MISRA C/C++ or JSF++ (Joint Strike Fighter Air Vehicle C++) [7]

There are also more sophisticated techniques, which combine formal methods with static code analysis.

The combination of static code analysis and formal methods makes it possible to:

• Detect software defects and security vulnerabilities [7]
• Comply with MISRA, CWE, CERT C, ISO/IEC 17961, and other standards and cybersecurity guidelines [7]
• Prove the absence of certain run-time errors [7]
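As a small illustration of the basic code quality metrics mentioned above, the following Python sketch counts code, comment and blank lines in C-like source text and derives the comment density. It is a simplified, line-based approximation and not how any particular tool computes its metrics. The function name `source_metrics` and the sample input are made up for this example.

```python
def source_metrics(source: str) -> dict:
    """Count code, comment and blank lines in C-like source text.

    Simplification: a line counts as a comment only if it starts with a
    comment marker or lies inside a /* ... */ block. Trailing comments
    after code and comment markers inside string literals are ignored.
    """
    code = comment = blank = 0
    in_block = False
    for raw in source.splitlines():
        line = raw.strip()
        if in_block:
            comment += 1
            if "*/" in line:
                in_block = False
        elif not line:
            blank += 1
        elif line.startswith("//"):
            comment += 1
        elif line.startswith("/*"):
            comment += 1
            if "*/" not in line[2:]:
                in_block = True
        else:
            code += 1
    non_blank = code + comment
    density = comment / non_blank if non_blank else 0.0
    return {"code": code, "comment": comment, "blank": blank,
            "comment_density": density}


# Illustrative input only, not taken from any reviewed project.
SAMPLE = """\
/* Copy a name into a fixed buffer.
   Illustrative input only. */
#include <string.h>

// Caller must pass a NUL-terminated string.
void copy_name(char *dst, const char *src) {
    strcpy(dst, src);
}
"""

metrics = source_metrics(SAMPLE)
print(metrics)
```

Real metric tools parse the language properly, so their counts can differ from this line-based estimate, especially for code with trailing comments.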

Static code analysis tools usually find vulnerabilities at a high rate, but there is no guarantee that they will find every kind of security flaw. These tools should therefore be considered aids that help code and security analysts find security flaws more efficiently. Some tools are built into the Integrated Development Environment (IDE) and can provide immediate feedback during the development cycle. This immediate feedback is much more useful than finding vulnerabilities later in the development cycle.

2.5.1 Strengths

• Scalability
• Some tools are very good at finding vulnerabilities such as buffer overflows, SQL injection, etc.

2.5.2 Weaknesses

• Many types of security vulnerabilities are very difficult to find automatically, such as authentication problems, access control issues, insecure use of cryptography, etc.
• High numbers of false positives.
• Not very good at finding configuration issues.
• It is hard to prove that a reported vulnerability is an actual security flaw.

2.5.3 Tools for Static Code Analysis

1. Multi-language
   • Axivion Bauhaus Suite – C, C++, C#, and Java
   • Cigital – Java, .NET, and PHP
   • Codacy – Python, Ruby, PHP, Java, JavaScript, Scala
   • ConQAT – Java, C#, C++, JavaScript, ABAP, Ada
2. .NET
   • .NET Compiler Platform
   • CodeIt.Right
   • CodeRush
3. C, C++
   • BLAST
   • Cppcheck
   • Coccinelle
   • Coverity
4. Java
   • Checkstyle
   • Coverity
   • FindBugs
   • IntelliJ IDEA

3 Code Review of Open-EID

The objective was to review the code of the "libdigidocpp" module of Open-EID. It is a library for creating, signing and verifying digitally signed documents. It uses the XAdES and XML-DSIG standards [2].

The following figure shows the DigiDoc framework:

Figure 1: DigiDoc Framework (Courtesy of Estonian Information System Authority ©) [1]

3.1 Methodology

Flawfinder has been used to review the code so far. Manual code review remains an option for verifying the output of Flawfinder and for examining the code more carefully, because an automated tool cannot be fully relied upon.

3.2 Executive Summary

The following modules of "DigiDoc" were analyzed:

• Cryptography
• XML Encoding
• Utility
• Zip Utility
• XAdES Standard

In total, 96 files and 25036 lines of code were reviewed. Flawfinder checks the code against its built-in database of functions with known problems and reports any potential weaknesses. The results from Flawfinder can then be compared with the Common Weakness Enumeration (CWE) [6].
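Comparing Flawfinder results with the CWE list starts with collecting the CWE identifiers that appear in the report. The following Python sketch tallies them from report text. The sample text only imitates the general style of a Flawfinder report and is not actual output for libdigidocpp. The file name, line numbers and messages in it are made up, and the name `count_cwe_ids` is hypothetical.

```python
import re
from collections import Counter

# Matches identifiers such as "CWE-120" anywhere in the report text.
CWE_PATTERN = re.compile(r"CWE-(\d+)")


def count_cwe_ids(report_text: str) -> Counter:
    """Return how often each CWE number is mentioned in the report text."""
    return Counter(int(number) for number in CWE_PATTERN.findall(report_text))


# Illustrative text in the style of a Flawfinder report (assumed, not real output).
SAMPLE_REPORT = """\
example.cpp:12:  [4] (buffer) strcpy:
  Does not check for buffer overflows when copying to destination (CWE-120).
example.cpp:27:  [3] (buffer) getenv:
  Environment variables are untrustable input (CWE-807, CWE-20).
example.cpp:41:  [1] (buffer) strlen:
  Does not handle strings that are not terminated (CWE-126).
example.cpp:58:  [2] (buffer) memcpy:
  Does not check for buffer overflows when copying to destination (CWE-120).
"""

counts = count_cwe_ids(SAMPLE_REPORT)
for cwe, hits in sorted(counts.items()):
    print(f"CWE-{cwe}: {hits}")
```

Each identifier collected this way can then be looked up in the CWE list to obtain its description and the recommended mitigation.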

3.3 Detailed Results

Every "hit" found by Flawfinder has a Common Weakness Enumeration number, which provides details about that specific hit.

The following table lists the hits found in each module:

Module Hits Found CWE – 126 Crypto CWE – 119 CWE – 120 CWE – 807 CWE – 20 CWE – 120 Utility CWE – 377 CWE – 126 CWE – 732 CWE – 362 XML Encoding CWE – 120 CWE – 362 Zip Utility CWE – 120 CWE – 126 CWE – 362 CWE – 120 CWE – 20 XAdES Standard CWE – 134 CWE – 676 CWE – 190

Table 6: Detailed Results with CWE number

The hits are described in the Common Weakness Enumeration as follows:

• CWE – 20: Improper Input Validation
  This occurs when the software does not validate its input properly. An attacker can craft the input in a format that the application does not expect. This can lead the system into a state that results in altered control flow or arbitrary code execution.

• CWE – 119: Improper Restriction of Operations within the Bounds of a Memory Buffer
  This occurs when the application performs operations on a memory buffer but can read from or write to a memory location outside the intended boundary of the buffer. As a result, an attacker may be able to execute arbitrary code, alter the intended control flow, read sensitive information, or cause the system to crash.

• CWE – 120: Buffer Copy without Checking the Size of the Input ("Buffer Overflow")
  The program can cause a buffer overflow when it copies data without checking that the input fits into the output buffer.

• CWE – 126: Buffer Over-read
  This happens when the program reads from a buffer and can go beyond the end of the referenced memory location. It can expose sensitive information or sometimes even crash the application.

• CWE – 134: Use of Externally-Controlled Format String
  When an application accepts a format string from an external source, an attacker can modify that string, which can result in a buffer overflow, denial of service or data representation problems.

• CWE – 190: Integer Overflow or Wraparound
  This occurs when a calculation produces an integer overflow. The problem can lead to weaknesses in areas such as resource management or execution control. It becomes security-critical when the result is used to control looping, make a security decision, or determine the offset or size in operations such as memory allocation, copying, concatenation, etc.

• CWE – 362: Concurrent Execution using Shared Resource with Improper Synchronization
  When the program runs code concurrently with other code that can modify shared resources, missing synchronization can have security implications if it affects security-critical code, such as authentication or the modification of important information.

• CWE – 377: Insecure Temporary File
  Creating an insecure temporary file can make the application vulnerable to attacks.

• CWE – 676: Use of Potentially Dangerous Function
  If the program uses such a function in an incorrect way, it can lead to vulnerabilities. The function can be used safely, though.

• CWE – 732: Incorrect Permission Assignment for Critical Resource
  This problem occurs when the application specifies permissions in a way that allows an unintended actor to read or modify the resource. It can be critical when the resource is related to configuration, execution or sensitive user data.

• CWE – 807: Reliance on Untrusted Inputs in a Security Decision
  This occurs when the protection mechanism of an application depends on an input that an untrusted actor can modify in a way that bypasses the protection mechanism.
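The checks that guard against several of these weaknesses follow a common pattern: validate sizes and ranges before acting. The sketch below shows that pattern in Python for brevity. It is an illustration only and not code from libdigidocpp, which is written in C++. Python itself does not overflow integers or read past buffers, so the functions model the explicit checks that C or C++ code would need. All function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def checked_mul_int32(a: int, b: int) -> int:
    """Multiply two values, rejecting results outside the signed 32-bit range (CWE-190)."""
    result = a * b  # Python integers do not wrap, so the range check is explicit.
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} * {b} does not fit in a signed 32-bit integer")
    return result


def read_exact(buffer: bytes, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at `offset`, refusing out-of-bounds reads (CWE-119, CWE-126)."""
    if offset < 0 or length < 0 or offset + length > len(buffer):
        raise IndexError("read outside the bounds of the buffer")
    return buffer[offset:offset + length]


def bounded_copy(src: bytes, dst: bytearray) -> int:
    """Copy `src` into the start of `dst` only if it fits (CWE-120)."""
    if len(src) > len(dst):
        raise ValueError("source is larger than the destination buffer")
    dst[:len(src)] = src
    return len(src)


print(checked_mul_int32(46340, 46340))
print(read_exact(b"abcdef", 2, 3))
destination = bytearray(8)
print(bounded_copy(b"abc", destination), bytes(destination[:3]))
```

In each function, the check comes before the operation, so an invalid size or range is rejected instead of being acted upon.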

3.4 Recommendations

The following recommendations can be adopted to minimize or even eliminate the risk of these weaknesses and vulnerabilities:

Table 7: Recommendations

| Common Weakness Enumeration | Recommendation |
|-----------------------------|----------------|
| CWE – 20: Improper Input Validation | The developer should understand all the potential areas where untrusted inputs can enter, such as parameters, cookies, query results, e-mail, databases, etc. [6] |
| CWE – 119: Improper Restriction of Operations within the Bounds of a Memory Buffer | Check the buffer size, and also check the boundaries of the buffer when accessing it in a loop [6]. |
| CWE – 120: Buffer Copy without Checking the Size of the Input ("Buffer Overflow") | Same as for CWE – 119 |
| CWE – 126: Buffer Over-read | The boundaries of the buffer must be checked when it is accessed through an indexing mechanism [6]. |
| CWE – 134: Use of Externally-Controlled Format String | Ensure that all format strings are passed to functions as static arguments so that they cannot be modified [6]. |
| CWE – 190: Integer Overflow or Wraparound | Numeric values should be validated to make sure that they are between the expected minimum and maximum [6]. |
| CWE – 362: Concurrent Execution using Shared Resource with Improper Synchronization | Thread-safe functions should be used in multi-threaded code and when operating on shared variables [6]. |
| CWE – 377: Insecure Temporary File | The mkstemp() function should be used because it is a safe way to create temporary files [6]. |
| CWE – 676: Use of Potentially Dangerous Function | The prohibited API functions should be identified, and the developers should be told to be careful when using them. Automated tools are a good way to identify such functions [6]. |
| CWE – 732: Incorrect Permission Assignment for Critical Resource | Check the permissions when using a critical resource. Generate an error or even close the application if there is a possibility that the resource could have been modified [6]. |
| CWE – 807: Reliance on Untrusted Inputs in a Security Decision | Same as for CWE – 20 |
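Two of these recommendations, safe temporary file creation (CWE-377) and permission checks on critical resources (CWE-732), can be illustrated together. The sketch below is written in Python and uses `tempfile.mkstemp`, Python's standard-library counterpart to the C function mkstemp() named in the table. It is not the C function itself. The helper names `write_secure_temp` and `is_private` and the file name prefix are hypothetical, and the permission check relies on POSIX mode bits.

```python
import os
import stat
import tempfile


def write_secure_temp(data: bytes) -> str:
    """Write data to a newly created temporary file and return its path.

    tempfile.mkstemp creates the file with a hard-to-guess name and opens it
    exclusively, so an existing file or symlink is never reused.
    """
    fd, path = tempfile.mkstemp(prefix="audit-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
    except Exception:
        os.unlink(path)
        raise
    return path


def is_private(path: str) -> bool:
    """Return True if neither group nor others have any access (POSIX mode bits)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


path = write_secure_temp(b"example")
try:
    # On POSIX systems a file created by mkstemp is accessible only to its owner.
    print(is_private(path))
finally:
    os.unlink(path)
```

An application following the CWE-732 recommendation would run a check like `is_private` before trusting a critical file and stop with an error if the check fails.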

3.5 Conclusion

The security of the applications in use today is a great concern. Many measures are taken to ensure security, and code review is one of them. The code of open-source software can be reviewed by any person or organization, which is a great opportunity. Our objective was to review the code of the "libdigidocpp" library from "DigiDoc", which is a module of Open-EID and is used to create, sign and verify digitally signed documents. The results show that there are some findings which can lead to security flaws if proper programming conventions are not followed. To avoid security flaws, attention should be paid to integer overflows, untrusted inputs, buffer overflows and buffer over-reads, as they appear to be the most critical from a security point of view. Recommendations for avoiding these vulnerabilities are also given.

Acknowledgments

With a deep sense of gratitude, I acknowledge the encouragement, motivation and guidance received from Mr. Benson Muite. I learned many things, as this was an entirely new topic for me. I would also like to thank Prof. Vitaly Skachek for organizing the seminar, Mr. Martin Paljak (one of the developers of "libdigidocpp") for reviewing the report and providing suggestions, and all the students who participated in the seminar.

References

[1] Estonian Information System Authority. Digidoc framework. http://open-eid.github.io/libdigidocpp/manual.html, 2017. [Online; accessed 25-May-2017].

[2] Estonian Information System Authority. Digidoc:libdigidocpp. https://github.com/open-eid/libdigidocpp, 2017. [Online; accessed 25-May-2017].

[3] DIGIT. Apache core & apr code review deliverable. https://joinup.ec.europa.eu/community/eu-fossa/og_page/project-deliveries, 2017. [Online; accessed 25-May-2017].

[4] DIGIT. Keepass password safe code review deliverable. https://joinup.ec.europa.eu/community/eu-fossa/og_page/project-deliveries, 2017. [Online; accessed 25-May-2017].

[5] Eran Ifrah. Codelite. https://codelite.org/, 2017. [Online; accessed 25-May-2017].

[6] The MITRE Corporation National Cybersecurity FFRDC. Common weakness enumeration. https://cwe.mitre.org/, 2017. [Online; accessed 25-May-2017].

[7] OWASP. Static code analysis. https://www.mathworks.com/discovery/static-code-analysis.html, 2017. [Online; accessed 8-June-2017].

[8] European Parliament. Eu-fossa project. https://www.marietjeschaake.eu/en/eu-budget-creates-bug-bounty-programme-to-improve-cybersecurity, 2017. [Online; accessed 25-May-2017].

[9] David A. Wheeler. Flawfinder. https://www.dwheeler.com/flawfinder/, 2017. [Online; accessed 25-May-2017].
