Design and Analysis of Web Application Frameworks
Total Page:16
File Type:pdf, Size:1020Kb
Aarhus University PhD Dissertation Design and Analysis of Web Application Frameworks Mathias Schwarz Supervisor: Anders Møller Submitted: January 29, 2013 Abstract Numerous web application frameworks have been developed in recent years. These frameworks enable programmers to reuse common components and to avoid typical pitfalls in web application development. Although such frameworks help the pro- grammer to avoid many common errors, we find that there are important, common errors that remain unhandled by web application frameworks. Guided by a survey of common web application errors and of web application frameworks, we identify the need for techniques to help the programmer avoid HTML invalidity and security vulnerabilities, in particular client-state manipulation vulnerabilities. The hypothesis of this dissertation is that we can design frameworks and static analyses that aid the programmer to avoid such errors. First, we present the JWIG web application framework for writing secure and maintainable web applications. We discuss how this framework solves some of the common errors through an API that is designed to be safe by default. Second, we present a novel technique for checking HTML validity for output that is generated by web applications. Through string analysis, we approximate the out- put of web applications as context-free grammars. We model the HTML validation algorithm and the DTD language, and we generalize the validation algorithm to work for context-free grammars. Third, we present a novel technique for identifying client-state manipulation vulnerabilities. The technique uses a combination of output analysis and informa- tion flow analysis to detect flow in the web application that might be exploited by malicious clients. We implement and evaluate the techniques to study their usefulness in practice. We find that JWIG is useful for implementing large web applications. We further- more evaluate the static analyses techniques and find that they are able to detect real bugs with few false positives in open-source applications. 1 Resume Igennem de senere ˚arer et stort antal web applikations-frameworks blevet udviklet. Disse frameworks gør det muligt for programmører at genbruge løsninger og undg˚a typiske fejl i web applikationer. Selvom s˚adanneframeworks hjælper programmører med at undg˚amange typer fejl, s˚aviser det sig, at der alligevel er typer af fejl, som endnu ikke bliver h˚andterettilstrækkeligt af frameworks. Vi undersøger hvilke typer fejl, der ofte opst˚ari web applikationer, og hvilke løsninger forskellige frameworks har p˚aproblermerne. Denne undersøgelse viser, at der er brug for teknikker, som kan hjælpe programmøren med at undg˚augyldig HTML og sikkerhedsproblemer, især s˚arbarhedersom skyldes manipulation af klien- tens tilstand. Hypotesen i denne afhandling er, at vi kan designe nye frameworks og statiske analyser som kan hjælpe programmøren med at undg˚adisse typer af fejl. Først præsenterer vi JWIG frameworket der kan bruges til at skrive sikre web applikationer som er lette af vedligeholde. Vi diskuterer hvordan dette framework løser nogle af de typiske problemer med et API som er designet til at undg˚afejl. Dernæst præsenterer vi en ny teknik til at verificerere korrekthed af genereret HTML i web applikationer. Vi bruger streng-analyse til at tilnærme det mulige output med kontekst-frie grammatikker. Vi modellerer HTML valideringsalgorit- men og DTD specifikationssproget og viser, hvordan valideringsalgoritmen lader sig generalisere til at virke for kontekst-frie grammatikker. Endelig præsenterer vi en ny teknik til at finde s˚arbarheder,som skyldes ma- nipulation af klientens tilstand. Teknikken bruger en kombination af en output analyse og en information flow analyse til at finde flow af data som kan udnyttes af en ondsindet klient. Vi implementerer og evaluerer teknikkerne for at undersøge, hvor godt de virker i praksis. Vi ser, at JWIG er brugbart til at implementere store web applikationer. Vi evaluerer ogs˚aanalyseteknikkerne og ser, at de kan bruges til at finde fejl i open-source programmer, som vi har fundet p˚anettet. 3 Acknowledgments Thanks to the members of the Programming Languages group at Aarhus University for insightful discussions, inspiring friendships, and countless foosball tournaments. I am truly thankful to Anders Møller for his patient and insightful mentorship. He has been the best supervisor one can imagine. I thank Simon Holm Jensen for his valuable feedback during the writing of this dissertation. I would like to thank Henning Rohde and Anna Gringauze for an inspiring stay at Microsoft Research. Thanks to my parents and brothers for supporting and encouraging me. A very special thanks to my wife Rikke who has been a loving help and support throughout my studies. Our son Andreas has barely been with us for a year but I thank him for his happy smiles and for reminding me to never stop being curious. Mathias Schwarz, Aarhus, January 29, 2013. 5 Contents Abstract 1 Resume 3 Acknowledgments 5 Contents 7 I Overview 1 1 Introduction 3 1.1 Overview . 4 1.2 Method . 5 1.3 Contributions . 5 1.4 Software packages . 6 2 Web frameworks and web applications 7 2.1 Purpose and structure of web frameworks . 8 2.1.1 Components of a web framework . 8 2.1.2 Static analysis in the presence of web application frameworks 10 2.2 Software engineering principles in web frameworks . 10 2.2.1 Cohesion, coupling, and the MVC pattern . 10 2.2.2 Safety by default . 11 2.3 Common web application errors . 11 2.3.1 Output correctness . 11 2.3.2 Security . 13 2.4 A survey of web application frameworks . 17 2.4.1 PHP . 18 2.4.2 Java Servlets . 19 2.4.3 Java Server Pages . 21 2.4.4 JSF . 22 2.4.5 Struts 2 . 24 2.4.6 Seaside . 26 2.4.7 Hop . 28 2.5 The need for a new web framework and analysis for the existing ones 29 3 Designing a new web application framework 31 3.1 Overview of JWIG . 31 3.2 Introduction to JWIG application programming . 33 3.2.1 Xact syntax . 33 3.2.2 Web methods . 34 3.2.3 Session data . 35 7 3.2.4 Handlers . 35 3.2.5 Client/server communication . 36 3.3 Relation to existing frameworks . 36 3.3.1 Relation to the previous version of JWIG . 36 3.3.2 The JWIG approach compared to MVC . 37 3.3.3 Relation between submit handlers and Seaside callbacks . 38 3.4 Static analysis of JWIG applications . 38 3.4.1 XHTML validation of output construction . 38 3.4.2 Web application page graphs . 39 3.4.3 Link consistency . 39 3.4.4 Filter coverage . 39 3.4.5 Parameter names . 40 3.5 Evaluation of the JWIG web application framework . 40 3.5.1 Safety of JWIG applications . 41 3.5.2 Case study: CourseAdmin . 41 4 Static analysis for existing frameworks 43 4.1 Approximating web application output . 43 4.1.1 Output-stream flow graphs . 43 4.1.2 From Java Servlets and JSP to output-stream flow graphs . 44 4.1.3 From output-stream flow graphs to context-free grammars . 45 4.2 HTML validation in WARLord . 45 4.2.1 Annotating context-free grammars with contexts . 46 4.2.2 HTML validation of an annotated grammar . 46 4.2.3 On HTML5 . 48 4.2.4 Comparison to related work . 49 4.2.5 Evaluation . 50 4.3 Client-state manipulation vulnerability detection . 50 4.3.1 Related security analysis techniques for web applications . 50 4.3.2 Overview of the analysis technique in WARLord . 52 4.3.3 Evaluation . 54 5 Conclusion 55 II Publications 57 6 JWIG: Yet Another Framework for Maintainable and Secure Web Applications 59 6.1 Introduction . 60 6.2 Architecture . 62 6.3 Generating XML Output . 64 6.4 XML Producers and Page Updates . 65 6.5 Forms and Event Handlers . 66 6.6 Example: MicroChat . 66 6.7 Parameters and References to Web Methods . 67 6.8 Session State and Persistence . 68 6.9 Caching and Authentication . 69 6.10 Additional Examples . 70 6.10.1 QuickPoll . 70 6.10.2 GuessingGame . 70 6.11 Case Study: CourseAdmin . 71 6.12 Conclusion . 72 8 7 HTML Validation of Context-Free Languages 79 7.1 Introduction . 80 7.1.1 Outline of the Paper . 81 7.1.2 Example . 81 7.2 Related Work . 82 7.3 Parsing HTML Documents . 83 7.3.1 A Model of HTML Parsing . 84 7.4 Parsing Context-Free Sets of Documents . 86 7.4.1 Generating Constraints . 86 7.4.2 Solving Constraints . 87 7.4.3 Example . 89 7.5 Experimental Results . 90 7.6 Conclusion . 91 7.7 Proof of Theorem 7.1 . 93 8 Automated Detection of Client-State Manipulation Vulnerabilities 97 8.1 Introduction . 98 8.2 Client-State Manipulation Vulnerabilities . 101 8.3 Outline of the Analysis . 104 8.4 Identifying Client State . 105 8.4.1 Analyzing HTML Output . 106 8.4.2 Analyzing Input Parameters . 109 8.5 Identifying Shared Application State . 109 8.6 Information Flow from Client State to Shared Application State . 111 8.7 Automatic Configuration of a Security Filter . 112 8.8 Evaluation . 113 8.8.1 Experiments . 114 8.8.2 Summary of Results . 120 8.9 Related Work . 122 8.10 Conclusion . 123 Bibliography 125 9 Part I Overview 1 Chapter 1 Introduction The hypothesis in this dissertation is that we can design frameworks and static analyses that aid the programmer to avoid invalid HTML and to avoid programs that are vulnerable against common security vulnerabilities, in particular client- state manipulation vulnerabilities. For such frameworks and analyses to be useful to a programmer it must be possible to apply them to large scale programs.