Detection of Web Vulnerabilities Via Model Inference Assisted Evolutionary Fuzzing Fabien Duchene

Detection of Web Vulnerabilities via Model Inference assisted Evolutionary Fuzzing Fabien Duchene To cite this version: Fabien Duchene. Detection of Web Vulnerabilities via Model Inference assisted Evolutionary Fuzzing. Computation and Language [cs.CL]. Grenoble University, 2014. English. tel-01102325 HAL Id: tel-01102325 https://hal.archives-ouvertes.fr/tel-01102325 Submitted on 12 Jan 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THESE` Pour obtenir le grade de DOCTEUR DE L’UNIVERSITE´ DE GRENOBLE Specialit´ e´ : Informatique Arretˆ e´ ministerial´ : 7 Aoutˆ 2006 Present´ ee´ par Fabien Duchene` These` dirigee´ par Professeur Roland Groz et co-encadree´ par Docteur Jean-Luc Richier prepar´ ee´ au sein Laboratoire d’Informatique de Grenoble et de Ecole´ Doctorale Mathematiques,´ Sciences et Technologies de l’Information, Informatique Detection of Web Vulnerabilities via Model Inference assisted Evo- lutionary Fuzzing These` soutenue publiquement le 2 Juin, 2014, devant le jury compose´ de : M. Jean-Luc RICHIER Charge´ de Recherche, CNRS, France, Co-encadrant de these` M. Roland GROZ Professeur, Grenoble INP, France, Directeur de these` M. Bruno LEGEARD Professeur, Universite´ de Franche-Comte,´ France, Rapporteur M. Herbert BOS Professeur, VU Amsterdam, Pays-Bas, Rapporteur M. Mario HEIDERICH Docteur, Ruhr-Universitat¨ Bochum, Allemagne, Examinateur M. Yves DENNEULIN Professeur, Grenoble INP, France, Examinateur 2 Dedication 3 4 Acknowledgements 5 6 Detection´ de Vulnerabilit´ es´ Web par Frelatage Evolutionniste et Inference´ de Modele` Resum´ e:´ Le test est une approche efficace pour detecter´ des bogues d’implemen-´ tation ayant un impact sur la securit´ e,´ c.-a-d.` des vulnerabilit´ es.´ Lorsque le code source n’est pas disponible, il est necessaire´ d’utiliser des techniques de test en boˆıte noire. Nous nous interessons´ au probleme` de detection´ automatique d’une classe de vulnerabilit´ es´ (Cross Site Scripting alias XSS) dans les applications web dans un contexte de test en boˆıte noire. Nous proposons une approche pour inferer´ des modeles` de telles applications et frelatons des sequences´ d’entrees´ gen´ er´ ees´ a` partir de ces modeles` et d’une grammaire d’attaque. Nous inferons´ des automates de controleˆ et de teinte, dont nous extrayons des sous-modeles` afin de reduire´ l’espace de recherche de l’etape´ de frelatage. Nous utilisons des algorithmes gen´ etiques´ pour guider la production d’entrees´ malicieuses envoyees´ a` l’application. Nous produisons un verdict de test graceˆ a` une double inference´ de teinte sur l’arbre d’analyse grammaticale d’un navigateur et a` l’utilisation de mo- tifs de vulnerabilit´ es´ comportant des annotations de teinte. Nos implementations´ LigRE et KameleonFuzz obtiennent de meilleurs resultats´ que les scanneurs boˆıte noire open-source. Nous avons decouvert´ des XSS “0-day” (c.-a-d.` des vulnerabilit´ es´ jusque lors inconnues publiquement) dans des applications web utilisees´ par des millions d’utilisateurs. Keywords: Securit´ e,´ Frelatage, XSS, Algorithme Evolutionniste, Inference,´ Intelligence Artificielle, Applications Web Detection of Web Vulnerabilities via Model Inference assisted Evolutionary Fuzzing Abstract: Testing is a viable approach for detecting implementation bugs which have a security impact, a.k.a. vulnerabilities. When the source code is not available, it is necessary to use black-box testing techniques. We address the problem of automatically detecting a certain class of vulnerabilities (Cross Site Scripting a.k.a. XSS) in web applications in a black-box test context. We propose an approach for inferring models of web applications and fuzzing from such models and an attack grammar. We infer control plus taint flow automata, from which we produce slices, which narrow the fuzzing search space. Genetic algorithms are then used to schedule the malicious inputs which are sent to the application. We incorporate a test verdict by performing a double taint inference on the browser parse tree and combining this with taint aware vulnerability patterns. Our implementations LigRE and KameleonFuzz outperform current open-source black-box scanners. We discovered 0-day XSS (i.e., previously unknown vulnerabilities) in web applications used by millions of users. Keywords: Security, Fuzzing, XSS, Evolutionary Algorithm, Inference, Artificial Intelligence, Web Applications Contents 1 Introduction 11 1.1 Context............................... 11 1.2 Vulnerabilities............................ 13 1.3 Objectives.............................. 15 1.4 Contributions............................ 15 1.5 Dissertation Structure........................ 16 2 Problem Statement 17 2.1 Cross Site Scripting (XSS)..................... 18 2.2 Definitions.............................. 22 2.3 Fuzzing............................... 31 2.4 Other WCI............................. 32 2.5 Black-Box XSS Detection..................... 32 3 Our Proposal 35 3.1 Model Inference........................... 35 3.2 Evolutionary XSS Fuzzing..................... 38 4 Inference for XSS 43 4.1 Our Approach............................ 43 4.2 Control Flow Inference....................... 44 4.3 Taint Flow Inference........................ 61 4.4 Flow-Aware Input Gen........................ 65 4.5 Implementation........................... 67 4.6 Related Work............................ 69 5 GA XSS Fuzzing 73 5.1 Introduction............................. 73 5.2 Evolutionary XSS Fuzzing..................... 74 5.3 Implementation........................... 86 5.4 Related Work............................ 88 6 XSS Experiments 91 6.1 Evaluation methodology...................... 91 6.2 LigRE evaluation.......................... 92 6.3 KameleonFuzz evaluation...................... 97 6.4 Discussion.............................. 101 9 CONTENTS CONTENTS 7 Other Approches 105 7.1 White & GreyBox.......................... 105 7.2 Black-Box Approaches....................... 106 8 Discussion & Conclusion 109 8.1 Discussion.............................. 109 8.2 Conclusion............................. 114 References 117 List of Figures............................... 132 List of Tables............................... 133 List of Algorithms............................. 135 List of Listings.............................. 137 List of Acronyms............................. 139 A Web Scanners Configuration 141 B KameleonFuzz: List of Taint Aware tree Patterns 153 C 0-day Found XSS Vulnerabilities 155 Two of the 0-days XSS discovered by KameleonFuzz.......... 155 Examples of Vulnerabilities that KameleonFuzz is unable to detect... 158 Appendices 131 10 CHAPTER 1 Introduction Why did I rob banks? Because I enjoyed it. I loved it. Go where the money is...and go there often. [Sutton & Linn 2004] The world is a dangerous place to live ; not because of those who do evil, but because of those of the people who don’t do anything about it. [Einstein 1955] Computer security is the cancer of the software industry. There is no money to prevent it. Only sick persons care about it, but it is generally already too late. However, everybody will have to face it someday. [Ruff 2013b] Do not underestimate the importance of cyber-attack capabilities. I do not know how to defend a system if you are unaware of how to attack it. [Filiol 2013b] 1.1 Context Actors and Threats The Internet is a connected network of billions of devices. For the simplicity of administrating them, we plugged into this network of net- works devices having an impact on the physical world: traffic control, power plants, gas stations, etc. Corporations and governments have data-stores connected to the Internet [Duwell 2013]. Banks and trading systems offer an interface with the In- ternet. If not secured, those systems make the Internet a playground for hack- ers with varying motivations (e.g., enemy governments, army, individuals paid by Mafia, etc.). As security researchers, it is our duty to develop new techniques to protect bet- ter national assets such as: energy, money, communication and information. More 11 1.1. CONTEXT CHAPTER 1. INTRODUCTION specifically, in this cyber war, we want to protect computer assets from attacks exploiting vulnerabilities (flaws in the system). A way to achieve such goal is to detect vulnerabilities, and more precisely to detect them as soon as possible. If they are present in our systems, we need to patch them to prevent exploitation (defen- sive security). If they are present in enemy systems, attackers may want to exploit them to gain additional privileges (offensive security). The source code of applications may not be available (e.g., when testing integrated or remote components). In such cases, we need to rely on black-box testing techniques. This thesis focuses on detecting certain classes of vulnerabilities in a black-box test context. Security and Vulnerabilities Figure 1.1 shows a dependability tree according to the [Avizienisˇ et al. 2004] taxonomy. We mark in bold our focus for this thesis. We detect errors and failures that affect availability

Load more