Automated Secure Code Review for Webapplications

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2021

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Title
English: Automated secure code review for webapplications
Svenska: Automatiserad kodgranskning för webbapplikationer

Authors
Sadeq Gholami
Zeineb Amri
School of Electrical Engineering and Computer Science
KTH Royal Institute of Technology

Place for Project
Stockholm, Sweden

Examiner
Johan Montelius
KTH Royal Institute of Technology

Supervisors
Fadil Galjic
KTH Royal Institute of Technology
Christoffer Jerkeby
F-Secure

Abstract

Carefully scanning and analysing web applications is important in order to avoid potential security vulnerabilities, or at least to reduce them. Traditional code reviewing methods, such as manual code reviews, have various drawbacks when performed on large codebases. It is therefore appropriate to explore automated code reviewing tools and to study their performance and reliability. The literature study helped identify various prerequisites that facilitated the application of automated code reviewing tools. In a case study, two static analysis tools, CodeQL and Semgrep, were used to find security risks in three open source web applications with already known vulnerabilities. The results of the case study indicate that automated code reviewing tools are much faster and more efficient than manual reviewing, and that they can detect security vulnerabilities to an acceptable degree. However, some vulnerabilities do not follow a pattern, are difficult to identify with these tools, and need human intelligence to be detected.
Keywords: automated code reviewing tools, CodeQL, Semgrep, code review, security vulnerabilities, web applications

Abstrakt (translated from Swedish)

It is important to scan and analyse web applications carefully in order to avoid potential security problems, or at least to reduce them. Traditional code review methods, such as manual code reviews, have various drawbacks when performed on large codebases. It is therefore appropriate to explore automated code review tools and to study their performance and reliability. The literature study helped identify various prerequisites that facilitated the application of automated code review tools. In a case study, two static analysis tools, CodeQL and Semgrep, were used to find security risks in three open source web applications with already known vulnerabilities. The results of the case study indicate that the automated code review tools are much faster and more efficient than manual code reviews, and that they can detect security problems to an acceptable degree. However, there are vulnerabilities that do not follow a pattern and are difficult to identify with these tools; they need human intelligence to be detected.

Keywords (Nyckelord): automated code review tools, CodeQL, Semgrep, code review, security vulnerabilities, web applications

Acknowledgements

Firstly, we would like to thank our advisors at KTH Royal Institute of Technology, Fadil Galjic and Johan Montelius, for providing us with their valuable insights and feedback throughout the entirety of the degree project. Secondly, we would like to give very special thanks to our supervisor at F-Secure, Christoffer Jerkeby, for his guidance and for helping us to complete this project.

Contents

1 Introduction
  1.1 Background
  1.2 Problem
  1.3 Purpose
  1.4 Goals
  1.5 Research Methodology
  1.6 Delimitations
  1.7 Structure of the thesis
2 Theoretical Background
  2.1 Data Security
  2.2 Security Risk Classes
  2.3 Security code reviews
  2.4 Automated code reviewing tools
  2.5 Related work
3 Method
  3.1 Research Methodologies
  3.2 Research method process
  3.3 Breaking down the research question into subquestions
  3.4 Data Collection
  3.5 Development tools
  3.6 Documentation
4 Prerequisites for Case Study
  4.1 Prerequisites for the survey
  4.2 Prerequisites for CodeQL
  4.3 Prerequisites for Semgrep
  4.4 CodeQL extension for Vscode
5 Performing the Case study
  5.1 Case Study: Part 1
  5.2 Case Study: Part 2
6 Case study: Results
  6.1 Case study: Part 1
  6.2 Case study: Part 2
7 Discussion
  7.1 Answering the research subquestions
  7.2 Methods
  7.3 Limitations
  7.4 Summary
  7.5 Future Work
Appendix A: Questionnaire
Appendix B: Comments about the questionnaire

Chapter 1 Introduction

In the modern digitalized world, many aspects of one’s life are controlled or driven by digital tools.
Cloud computing, smart devices, internet banking, personal information, social media, and many other technologies that are part of individuals’ everyday lives are common examples of digitalization. Digitalization also has a great impact on industries, businesses, armies and governments [1]. Protecting personal and other important data stored in databases against unauthorized users has therefore become one of the most discussed topics today.

Data security is about protecting such sensitive data from the unwanted actions of malicious users. A central part of it is discovering vulnerabilities in computer systems that might lead to unauthorized access to data or allow a variety of cyberattacks [19]. A cyberattack is an act that disrupts the integrity or authenticity of private information. Its main goal is to steal or alter sensitive information from organizations, government offices or even personal computers [2].

In recent years, many new security vulnerabilities and common bugs that developers make by mistake have been detected. Furthermore, each developer writes more code and builds more applications today than 10 years ago [3]. All newly developed code needs to be reviewed, either manually or automatically, to identify security vulnerabilities and protect the application against cyberattacks. Hence there is an increasing need for a more effective way to review code.

1.1 Background

Security bugs that lead to exploitation occur repeatedly during the development stage of applications. The most common and critical security vulnerabilities are classified and ranked by OWASP [4] and CWE [5]. To detect these security weaknesses and prevent potential vulnerabilities, newly developed code needs to be reviewed continuously. Code review can be done manually, by reading the code line by line and looking for security bugs. As most of the bugs follow a certain pattern, they can also be detected by automated code reviewing tools.
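
The idea that bugs following a pattern can be found mechanically can be illustrated with a small sketch. The Python example below is only a toy and does not reflect how CodeQL or Semgrep work internally: it walks the syntax tree of a piece of source code and reports two well-known risky shapes, a dynamic `eval`/`exec` call and a database `execute` call whose query string is built at run time. The names `DANGEROUS_CALLS`, `is_dynamic_string`, `find_findings` and `SAMPLE` are hypothetical and were chosen for this illustration.

```python
import ast

# Hypothetical rule set for this sketch: function names treated as dangerous sinks.
DANGEROUS_CALLS = {"eval", "exec"}


def is_dynamic_string(node: ast.AST) -> bool:
    """Return True if the expression looks like a string assembled at run time."""
    if isinstance(node, ast.JoinedStr):  # f-string with interpolated values
        return any(isinstance(v, ast.FormattedValue) for v in node.values)
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mod)):
        return True  # "..." + x  or  "..." % x
    return False


def find_findings(source: str) -> list[tuple[int, str]]:
    """Scan Python source and return (line, message) pairs for risky patterns."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Pattern 1: eval(...) / exec(...) whose first argument is not a constant.
        if isinstance(func, ast.Name) and func.id in DANGEROUS_CALLS:
            if node.args and not isinstance(node.args[0], ast.Constant):
                findings.append((node.lineno, f"dynamic {func.id}() call"))
        # Pattern 2: <obj>.execute(<dynamic string>), a common SQL injection shape.
        elif isinstance(func, ast.Attribute) and func.attr == "execute":
            if node.args and is_dynamic_string(node.args[0]):
                findings.append((node.lineno, "SQL query built from dynamic string"))
    return sorted(findings)


# Made-up input: one concatenated query, one parameterized query, one eval call.
SAMPLE = '''
def lookup(cursor, user_id, expr):
    cursor.execute("SELECT * FROM users WHERE id = " + user_id)
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
    return eval(expr)
'''

if __name__ == "__main__":
    for line, message in find_findings(SAMPLE):
        print(f"line {line}: {message}")
```

On the sample input, the concatenated query and the `eval` call are reported, while the parameterized query is not. A purely syntactic check like this cannot tell whether the concatenated value is actually attacker-controlled, which is one reason real tools add data-flow analysis and why some findings still need human judgement.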

1.2 Problem

During manual code review, developers who are familiar with security weaknesses analyse the code thoroughly, or ethical hackers try to hack the application in order to find security flaws in it. This process requires patience, experience and skill. One problem with manual review is that some bugs can easily go undetected because they are difficult for the human eye to spot. Another issue is that manual code review can be time-consuming and expensive. An alternative is to use automated vulnerability detection tools, which scan all the code and find security vulnerabilities. Hence the following research question:

RQ. What are the benefits and drawbacks of using automated code reviews in web applications?

1.3 Purpose

The purpose of the thesis is to contribute to more effective and less time-consuming security code review by examining different automated code analysis tools and comparing them with manual code review. The aim is to present the concept of automated code reviewing tools to other developers and convince them to use such tools.

1.4 Goals

The main goal of the thesis is to present the benefits and drawbacks of automated secure code review, performed with vulnerability detection tools, for web applications. This is achieved by analysing and evaluating the effectiveness of the tools. Another goal is to find out whether the tools have the potential to replace manual code review.

1.5 Research Methodology

First, a literature study is conducted to gain the knowledge needed to answer the research question. This study aims to gather relevant information about data security, different security vulnerabilities and automated code reviewing tools. The approach is qualitative, as only the necessary information is obtained from different sources. Second, a quantitative experimental research method is applied in the form of a case study. The experimental
