Web Applications Security
Total Page:16
File Type:pdf, Size:1020Kb
UNIVERSITY OF NICE - SOPHIA ANTIPOLIS These d’Habilitation a Diriger des Recherches presented by Davide BALZAROTTI Web Applications Security Submitted in total fulfillment of the requirements of the degree of Habilitation a Diriger des Recherches Committee: Reviewers: Evangelos Markatos - University of Crete Frank Piessens - Katholieke Universiteit Leuven Frank Kargl - University of Ulm Examinators: Marc Dacier - Qatar Computing Research Institute Roberto Di Pietro - University of Padova Acknowledgments Even though this document is mostly written in first person, every single “I” should be replaced by a very large “We” to pay the due respect to everyone who made this hundred-or-so pages possible. Let me start with my two mentors, who toke me fresh after my Ph.D. and taught me how to do research and how to supervise students. The work of a professor requires good ideas, but also strong management skills, and a good number of tricks of the trade. Giovanni Vigna, at UCSB, was the first to introduce me to web security and to teach me how to do things right. Engin Kirda, at Eurecom, filled up the missing part – teaching me how to find and manage money and how to supervise other Ph.D. students. A big thank to both of them. The second group of people that made this possible are the students I was lucky to work with and supervise in the past five years. Giancarlo Pellegrino, Jelena Isacenkova, Davide Canali, Jonas Zaddach, Mariano Graziano, Andrei Costin, and Onur Catakoglu. They are not all represented in this document, but it is their hard work that transformed some of the papers that are part of this dissertation from an idea to a real scientific study. Third, I would like to thank everyone else I met on this journey - from the office mates in California (Wil, Vika, Marco, Fredrik, Greg, Chris, . ) to all the other colleagues I collaborated with in the past ten years (Aurelien, Andrea, Theodoor, . ). It was an honor to work with you all. Finally, I big thank to my wife - who supported me and encouraged me to go through all of this. Contents 1 Introduction 1 Part I: Input Validation Vulnerabilities 9 2 The Evergreens 9 3 ARTICLE Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications 10 4 ARTICLE Quo Vadis? A Study of the Evolution of Input Validation Vulnerabilities in Web Applications 26 5 ARTICLE Automated discovery of parameter pollution vulnerabilities in web applications 41 Part II: Logic Vulnerabilities 59 6 From Traditional Flaws to Logic Flows 59 7 ARTICLE Multi-Module Vulnerability Analysis of Web-based Appli- cations 60 8 ARTICLE Toward Black-Box Detection of Logic Flaws in Web Appli- cations 72 v Part III: A Play with Many Actors 89 9 A Change of Perspective 89 10 ARTICLE The Role of Web Hosting Providers in Detecting Compro- mised Websites 90 11 ARTICLE Behind the Scenes of Online Attacks: an Analysis of Ex- ploitation Behaviors on the Web 102 12 Conclusion and Future Directions 121 12.1 The Past . 121 12.2 The Future . 122 vi 1 Introduction I obtained my Ph.D. from Politecnico di Milano in 2006, working in the area of network intrusion detection. As a postdoc researcher first, and as Assistant Pro- fessor later, I then expanded my interests and my research activities to the broader area of system security. After my graduation, I co-authored over 50 scientific ar- ticles in international journals and conferences, on binary and malware analysis, web security, reverse engineering, host-based protection mechanisms, botnet de- tection, embedded system security, and computer forensics. This broad range of topics reflects the fact that security is a cross-cutting aspect that applies to multi- ple, if not all, areas of computer science. On the one hand, this is what makes this field so challenging and interesting to work on. On the other hand, this variety of topics also makes it difficult to distill my contribution to the system security area in a single coherent document. Therefore, I decided to focus this dissertation on a single area – which repre- sents, in terms of publications, only a small part of my entire work. Selecting a single topic was very hard, and therefore I decided to base my decision on a purely temporal aspect: the first research project I worked on after I completed my Ph.D. was on web security, and my last paper submitted before completing this disser- tation was again on web security. For the nine years between these two events, I always continued to work on this topic both on my own and by supervising other master and Ph.D. students in this area. Therefore, I believe web security, and the security of web applications in particular, successfully captures a line of research that characterized my past activity in system security. For space reasons, I am not able to present in this document my entire contribu- tion to the web security domain. Instead, I decided to include seven representative papers - grouped in three distinct areas - that summarize some of the problems I faced (and partially solved) after my Ph.D. graduation. 1 World Wide Web The World Wide Web (or simply the Web) was initially proposed in 1990 by Tim Berners-Lee and Robert Cailliau as a distributed “web” of interconnected hyper- text documents. These static documents were written using a markup language (HTML) and transferred over the network using a dedicated protocol (HTTP). The initial system also required two new pieces of software, one to serve the pages (a web server) and one to retrieve them and render their content on the screen (a web browser). In few years, this initially simple architecture evolved far beyond imagination. In particular, the majority of web sites are nowadays complex distributed applica- tions, with part of their code running on the server (to dynamically construct the pages content, typically based on information stored in a backend database) and part running in the user browser (to implement the user interface and fetch content on demand from the server). As a result, according to the HttpArchive[1], today almost 50% of the web pages require more than 30 separate connections to fetch all the required elements. However, since the focus of this document is on the security of web applica- tions, I will limit the historical discussion of the Web to three important aspects that, in my opinion, characterized the evolution of this technology from a security perspective. A Platform for the Masses In few years from its introduction, the Web rapidly evolved from a client-server system to deliver hypertext documents into a platform to run stateful, asynchronous, distributed applications. On the one hand, this transformation required more so- phisticated software components (modern browsers are now comparable to an op- erating system kernel in terms of lines of code) and a more complex architecture, involving the interaction of dozens of languages and protocols (such as WebDav, XML, Soap, JSON, SSL, OAuth, just to name a few). On the other hand, the Web was designed since the beginning to be simple for users to use and to deploy new content. In other words, on one side we have a very complex architecture, whose details are difficult to grasp also for experts in the field. On the other side, the same architecture is advertised as a platform for the masses – that even people with little to no experience in software design can use to quickly develop new web sites and applications, with advanced and customizable user interfaces. To make things worse, the extremely fast evolution of Web technologies - often driven by a market pushing for new features - was often affected by design flaws and poor security planning. This combination had catastrophic security consequences. Hundred of thou- sands of web sites were created by web designers, experts in customizing the vi- sual look-and-feel of an application, but completely unaware of the complexity and [1]http://httparchive.org 2 Chapter 1. Introduction risks of the technology they were using. Unfortunately, these web sites have of- ten access to databases and to a large spectrum of sensitive client information. As a result, even simple vulnerabilities such as SQL injections became a plague that gave curios people first, and criminals later, an easy way to steal data and access company networks. To close the loop, search engines provided a simple way for attackers to find vulnerable targets, so that the entire exploitation process could be easily automated. The combination of all these factors made the Web the target of choice not only for criminals and knowledgeable attackers, but also for a multitude of wanna-be hack- ers (the so called script-kiddies) with little technical background - that by simply using automated tools were able to wreak havoc with many Internet services. Reachable by Design If the Web revolution was bad from a software development point of view, it was not much better from a network administrator perspective. Internet was designed to connect networks together, so that every network would be reachable from ev- erywhere else. However, networks were also designed with strict monitoring and access control in mind. For instance, firewalls limited the access to sensitive parts of a company’s network, and allowed only a limited number of selected services to be open to the entire world. At the same time, intrusion detection systems carefully monitored those services for signs of attacks or suspicious behaviors. The web broke this isolation. Usability had priority over security, and devel- opers were fast to port many existing services to the new platform, allowing all sort of traffic to be tunneled through the web server on port 80 (open by default on most of the networks).