UC Berkeley Electronic Theses and Dissertations

Title: Systematic Techniques for Finding and Preventing Script Injection Vulnerabilities

Permalink: https://escholarship.org/uc/item/89m7p9xw

Author: Saxena, Prateek

Publication Date: 2012

Peer reviewed | Thesis/dissertation

Systematic Techniques for Finding and Preventing Script Injection Vulnerabilities

by

Prateek Saxena

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy

in

Computer Science

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Dawn Song, Chair
Professor David Wagner
Professor Brian Carver

Fall 2012

Systematic Techniques for Finding and Preventing Script Injection Vulnerabilities

Copyright 2012 by Prateek Saxena

Abstract

Systematic Techniques for Finding and Preventing Script Injection Vulnerabilities

by

Prateek Saxena

Doctor of Philosophy in Computer Science

University of California, Berkeley

Professor Dawn Song, Chair

Computer users trust web applications to protect their financial transactions and online identities from attacks by cyber criminals. However, web applications today are riddled with security flaws which can compromise the security of their web sessions. In this thesis, we address the problem of automatically finding and preventing script injection vulnerabilities, one of the most prominent classes of web application vulnerabilities at present. Specifically, this thesis makes three contributions towards addressing script injection vulnerabilities. First, we propose two techniques that together automatically uncover script injection vulnerabilities in client-side JavaScript components of web applications without raising false positives. Second, we empirically study the use of sanitization, which is the predominant defense technique to prevent these attacks today. We expose two new classes of errors in the practical use of sanitization in shipping web applications and demonstrate weaknesses of emerging defenses employed in widely used web application frameworks. Third, we propose a type-based approach to automatically perform correct sanitization for applications authored in emerging web application frameworks. Finally, we propose a conceptual framework for a sanitization-free defense against script injection vulnerabilities, which can form a robust second line of defense.

To my parents, Krati, and my brother Siddharth.

Contents

Contents

List of Figures

List of Tables

1 Introduction
  1.1 Contributions
  1.2 Statement of Joint Work

2 Background & Overview
  2.1 Script Injection Vulnerabilities: Definition & Examples
  2.2 Techniques for Finding Script Injection Vulnerabilities Automatically
  2.3 Techniques for Preventing Script Injection Vulnerabilities

3 Finding Vulnerabilities using Taint-Enhanced Blackbox Fuzzing
  3.1 Approach and Architectural Overview
  3.2 Technical Challenges and Design Points
  3.3 Flax: Design and Implementation
  3.4 Evaluation
  3.5 Related Work
  3.6 Conclusion

4 Finding Vulnerabilities using Dynamic Symbolic Execution
  4.1 Problem Statement and Overview
  4.2 End-to-End System Design
  4.3 Core Constraint Language
  4.4 Core Constraint Solving Approach
  4.5 Reducing JavaScript to String Constraints
  4.6 Experimental Evaluation
  4.7 Related Work
  4.8 Conclusion

5 Analysis of Existing Defenses
  5.1 Challenges in Sanitization
  5.2 Support for Auto-Sanitization in Existing Web Application Frameworks
  5.3 Failures of Sanitization in Large-Scale Applications
  5.4 Conclusion

6 Securing Sanitization-based Defense
  6.1 Problem Definition
  6.2 Our Approach
  6.3 The Context Type System
  6.4 CSAS Engine
  6.5 Operational Semantics
  6.6 Implementation & Evaluation
  6.7 Related Work
  6.8 Conclusion

7 DSI: A Basis For Sanitization-Free Defense
  7.1 XSS Definition and Examples
  7.2 Approach Overview
  7.3 Enforcement Mechanisms
  7.4 Architecture
  7.5 Implementation
  7.6 Evaluation
  7.7 Comparison with Existing XSS Defenses
  7.8 Discussion
  7.9 Related Work
  7.10 Conclusion

8 Conclusion

Bibliography

List of Figures

2.1 A snippet of HTML pseudo-code generated by a social networking application server vulnerable to a scripting attack. Untrusted input data, identified by the $GET['...'] variables, is echoed inline in the HTML response without any modification or sanitization.
2.2 An example of a chat application's JavaScript code for the main window, which fetches from the backend server at http://example.com/
2.3 An example vulnerable chat application's JavaScript code for a child message display window, which takes chat messages from the main window via postMessage. The vulnerable child message window code processes the received message in four steps, as shown in the receiveMessage function. First, it parses the principal domain of the message sender. Next, it tries to check if the origin's protocol and domain are "http" or "https" and "example.com" respectively. If the checks succeed, the popup parses the JSON [58] string data into an array object and finally invokes a function for displaying received messages. In lines 29-31, the child window sends confirmation of the message reception to a backend server script.

3.1 Approach Overview
3.2 System Architecture for Flax
3.3 Algorithm for Flax
3.4 Simplified operations supported in JASIL intermediate representation
3.5 Type system of JASIL intermediate representation
3.6 (Left) Sources of untrusted data. (Right) Critical sinks and corresponding exploits that may result if untrusted data is used without proper validation.
3.7 (Left) Acceptor slice showing validation and parsing operations on the event.origin field in the running example. (Right) Execution of the acceptor slice on a candidate attack input, namely http://evilexample.com/
3.8 An example of an acceptor slice which uses complex string operations for input validation, which is not directly expressible to the off-the-shelf string decision procedures available today.

3.9 A gadget overwriting attack layered on a client-side script injection vulnerability. The user clicks on an untrusted link, which shows the iGoogle web page with an overwritten iGoogle gadget. The URL bar continues to point to the iGoogle web page.

4.1 Architecture diagram for Kudzu. The components drawn in the dashed box perform functions specific to our application of finding client-side script injection. The remaining components are application-agnostic. Components shaded in light gray are the core contribution of this chapter.
4.2 Abstract grammar of the core constraint language.
4.3 Relations between the unbounded versions of several theories of strings. Theories higher in the graph are strictly more expressive but are also at least as complex to decide. Kudzu's core constraint language (shaded) is strictly more expressive than either the core language of HAMPI [66] or the theory of word equations and an equal length predicate (the "pure library language" of [17]).
4.4 Algorithm for solving the core constraints.
4.5 A sample concat graph for a set of concatenation constraints. The relative ordering of the strings in the final character array is shown as start and end positions in parentheses alongside each node.
4.6 A set of concat constraints with contradictory ordering requirements. Nodes are duplicated to resolve the contradiction.
4.7 Type system for the full constraint language
4.8 Grammar and types for the full constraint language, including operations on strings, integers, and booleans.
4.9 Distribution of string operations in our subject applications.
4.10 Kudzu code coverage improvements over the testing period. For each experiment, the right bar shows the increase in the executed code from the initial run to total code executed. The left bar shows the increase in the code compiled from the initial run to the total code compiled in the entire test period.
4.11 Benefits from symbolic execution alone (dark bars) vs. complete Kudzu (light bars). For each experiment, the right bar shows the increase in the total executed code when the event-space exploration is also turned on. The left bar shows the observed increase in the code compiled when the event-space exploration is turned on.
4.12 The constraint solver's running time (in seconds) as a function of the size of the input constraints (in terms of the number of symbolic JavaScript operations)

5.1 Flow of data in our browser model. Certain contexts such as PCDATA and CDATA directly refer to parser states in the HTML 5 specification. We refer to the numbered and underlined edges during our discussion in the text.
5.2 A real-world vulnerability in phpBB3.
5.3 Example of a Django application with wrong sanitization

5.4 Example of auto-sanitization in the Ctemplate framework
5.5 Sanitizer-to-context mapping for our test applications.
5.6 Running example: C# code fragment illustrating the problem of automatic sanitizer placement. Underlined values are derived from untrusted data and require sanitization; function calls are shown with thick black arrows C1-C3, and basic blocks B1-B4 are shown in gray circles.
5.7 Two different sanitization approaches are shown: method 1 above and method 2 below.
5.8 HTML outputs obtained by executing different paths in the running example. TOENCODE denotes the untrusted string in the output.
5.9 Histogram of sanitizer sequences consisting of 2 or more sanitizers empirically observed in analysis, characterizing sanitization practices resulting from manual sanitizer placement. E, H, U, K, P, S denote the sanitizers EcmaScriptStringLiteralEncode, HtmlEncode, HtmlAttribEncode, UrlKeyValueEncode, UrlPathEncode, and SimpleHtmlEncode respectively.
5.10 Characterization of the fraction of the paths that were inconsistently sanitized. The right-most column indicates paths highlighted as errors by our analysis.
5.11 Distribution of lengths of paths that could not be proved safe. Each hop in the path is a string propagation function. The longer the chain, the more removed are taint sources from taint sinks.
5.12 Distribution of the lengths of applied sanitization chains, showing that a sizable fraction of the paths have more than one sanitizer applied.

6.1 The syntax of a simple templating language. ⊕ represents the standard integer and bitvector arithmetic operators, represents the standard boolean operations, and · is string concatenation. The San expression syntactically refers to applying a sanitizer.
6.2 (A) shows a template used as the running example. (B) shows the output buffer after the running example has executed the path including the true branch of the if statement.
6.3 Pseudo-code showing how external application code, such as client-side JavaScript, can invoke the compiled templates.
6.4 Overview of our CSAS engine.
6.5 The final types τ are obtained by augmenting base types of the language α with type qualifiers Q
6.6 An example template requiring a mixed static-dynamic approach.
6.7 Type rules for expressions.
6.8 Type rules for commands. The output buffer (of base type η) is denoted by the symbol ρ.
6.9 The promotability relation ≤ between type qualifiers
6.10 Syntax of values

6.11 Operational semantics for an abstract machine that evaluates our simple templating language.
6.12 A set of contexts C used throughout the chapter.
6.13 Comparing the runtime overhead for parsing and rendering the output of all the compiled templates, in milliseconds. This data compares our approach with alternative existing approaches for server-side Java and client-side JavaScript code generated from our benchmarks. The percentages in parentheses are calculated over the base overhead of no sanitization reported in the second column. The last line shows the number of sinks auto-protected by each approach, a measure of the security offered by our approach compared to its alternatives.
6.14 Distribution of inserted sanitizers: inferred contexts, and hence the inserted sanitizer counts, vary largely, showing that context-insensitive sanitization is insufficient.

7.1 Example showing a snippet of HTML pseudocode generated by a vulnerable social networking web site server. Untrusted user data is embedded inline, identified by the $GET['...'] variables.
7.2 Example attacks for exploiting vulnerabilities in Figure 7.1.
7.3 Coalesced parse tree for the vulnerable web page in Figure 7.1, showing superimposition of parse trees resulting from all attacks simultaneously. White nodes show the valid intended nodes, whereas the dark nodes show the untrusted nodes inserted by the attacker.
7.4 Coalesced parse tree (corresponding to the parse tree in Figure 7.3) resulting from DSI enforcement with the terminal confinement policy: untrusted subtrees are forced into leaf nodes.
7.5 Example of minimal serialization using randomized delimiters for lines 3-5 of the example shown in Figure 7.1.
7.6 Rules for computing mark attributes in minimal deserialization.
7.7 One possible attack on minimal serialization, if C were not explicitly sent. The attacker provides delimiters with the suffix 2222 to produce 2 valid parse trees in the browser.
7.8 (a) A sample web forum application running on a vulnerable version of phpBB 2.0.18, victimized by a stored XSS attack as shown in a vanilla Konqueror browser. (b) The attack neutralized by our proof-of-concept prototype client-server DSI enforcement.
7.9 Effectiveness of DSI enforcement against both reflected XSS attacks [130] as well as stored XSS attack vectors [94].
7.10 Percentage of responses completed within a certain timeframe. 1000 requests on a 10 KB document with (a) 10 concurrent requests and (b) 30 concurrent requests.
7.11 Increase in CPU overhead, averaged over 5 runs, for different page sizes for a DSI-enabled web server using PHPTaint [117].

7.12 Various XSS mitigation techniques' capabilities at a glance. Columns 2-6 represent security properties, and columns 7-9 represent other practical issues. A 'X' denotes that the mechanism demonstrates the property.

List of Tables

3.1 Applications for which Flax observed untrusted data flow into critical sinks. The top 5 subject applications are websites and the rest are iGoogle gadgets. (Upper) Columns 2-5, and (Lower) Columns 6-9.

4.1 Length constraints implied by core string constraints, where LS is the length of a string S, and  ranges over the operators {<, ≤, =, ≥, >}.
4.2 Our reduction from common JavaScript operations to our full constraint language. Capitalized variables may be concrete or symbolic, while lowercase variables take a concrete value.
4.3 The top 5 applications are AJAX applications, while the rest are Google/IG gadget applications. Column 2 reports the number of distinct new inputs generated, and column 3 reports the increase in code coverage from the initial run to the final run.
4.4 Event space coverage: columns 2 and 3 show the number of events fired in the first run and in total. The last column shows the total events discovered during the testing.

5.1 Transductions applied by the browser for various accesses to the document. These summarize transductions when traversing edges connected to the "Document" block in Figure 5.1.
5.2 Details regarding the transducers mentioned in Table 5.1. They all involve various parsers and serializers present in the browser for HTML and its related sub-grammars.
5.3 Extent of automatic sanitization support in the frameworks we study and the pointcut (set of points in the control flow) where the automatic sanitization is applied.
5.4 Usage of auto-sanitization in Django applications. The first 2 columns are the number of sinks in the templates and the percentage of these sinks for which auto-sanitization has not been disabled. Each remaining column shows the percentage of sinks that appear in the given context.
5.5 Sanitizers provided by languages and/or frameworks. For frameworks, we also include sanitizers provided by standard packages or modules for the language.

5.6 The web applications we study and the contexts for which they sanitize.

Acknowledgments

This thesis is a result of ideas that were born out of discussions and collaboration with many people. Without them this work would not have been possible. I am responsible for any shortcomings that remain in this thesis.

First, I thank my adviser and thesis committee chair Prof. Dawn Song. Her insights and feedback have directly shaped the technical ideas in this thesis. But, more importantly, her passion for scientific research is contagious and has had an indelible effect on my personality. My other committee members, Prof. David Wagner and Prof. Brian Carver, have provided valuable feedback on the thesis. Thanks to Prof. David Wagner for insightful comments on papers that are part of this thesis; his words of encouragement and guidance have helped me throughout my PhD. Thanks to Prof. Brian Carver for suggestions on improving this manuscript. I am also indebted to Prof. R. Sekar, who convinced me to pursue a research career.

My colleagues made research and fun inseparable. Many thanks to Adam Barth, Stephen McCamant, Pongsin Poosankam, Chia Yuan Cho, Steve Hanna, Joel Weinberger, Noah M. Johnson, Kevin Zhijie Chen, Adrienne Felt, Matt Finifter, Devdatta Akhawe, Adrian Mettler and Juan Caballero for discussions and feedback on this work. I have learned broadly from my mentors and collaborators David Molnar, Ben Livshits, Patrice Godefroid, Margus Veanes and various team members during the work done at Microsoft Research. I enjoyed working with Mike Samuel; his perspectives and effort were instrumental in making some of our ideas practical at Google. Thanks to Vijay Ganesh and Adam Kiezun for their help on the HAMPI string solver.

I have never had to look far for sources of constant inspiration and encouragement. My wife, Krati, walked every step of the way sporting a disarming smile; my journey could not have been easier. I am indebted to my Mom for her unconditional love; my Dad for his undying spirit and for being an aspiring entrepreneur whom I can only hope to emulate; and finally, my brother Siddharth, who is a real-life proof of what tenacity can achieve. Finally, thanks to my friends (you know who you are) for unforgettable support at times when things seemed low; you have all made contributions to this work.

Chapter 1

Introduction

The web is our primary gateway to many critical services and offers a powerful platform for emerging applications. As the underlying execution platform for web applications grows in importance, its security has become a major concern. Security vulnerabilities have become pervasive in web applications today, yet techniques for finding and defending against them are limited. How can we build a secure web application platform for the future? In this thesis, we answer this research question in part. We tackle the problem of developing techniques to automatically find and prevent script injection (or scripting) vulnerabilities, a class of web vulnerabilities pervasive in web applications today.

Web languages, such as HTML, have evolved from light-weight mechanisms for static data markup to full-blown vehicles for supporting dynamic execution of web application logic. HTML allows inline constructs both to embed untrusted data and to invoke code in higher-order languages such as JavaScript. Web applications often embed data controlled by untrusted adversaries inline within the HTML code of the web application. For example, a blogging application often embeds untrusted user comments inline within the HTML content of the blog. HTML and other web languages lack principled mechanisms to separate trusted code from inline data and to further isolate untrusted data (such as user-generated content) from trusted application data. Script injection vulnerabilities arise when untrusted data controlled by an adversary is interpreted by the web browser as trusted application (script) code. This allows an attacker to gain higher privileges than intended by the web application, typically granting untrusted data the same authority as the web application's code. Well-known example categories of such attacks are cross-site scripting (or XSS) [94] and cross-channel scripting (or XCS) [18] attacks.
Scripting vulnerabilities are highly pervasive and have been recognized as a prominent category of computer security vulnerabilities. Software errors that result in script injection attacks are rated as the fourth most serious software errors in the CWE Top 25 list for the year 2011 [31]. OWASP's Top 10 ranks scripting attacks as the second most dangerous class of web vulnerabilities in 2010 [89]. The Web Application Security Consortium's XSS vulnerability report shows that over 30% of the web sites analyzed in 2007 were vulnerable to XSS attacks [123]. In addition, there exist publicly available repositories of real-world XSS vulnerabilities, which hold 45,517 reported XSS vulnerabilities (as of June 10, 2012), with new ones being added constantly [130].

Most prior research on finding scripting vulnerabilities has focused on server-side components [8, 132, 16, 62, 85, 75, 107, 121, 72]. In this thesis, we focus on analysis of scripting vulnerabilities in client-side code written in JavaScript, which had received little attention prior to our research. In contrast to several concurrent works that have investigated static analysis approaches to analyzing JavaScript, our work employs dynamic analysis techniques. Our aim is to develop techniques which raise no false positives and which produce witness exploit inputs when they uncover a vulnerability.

Several mechanisms have been proposed, both in practice and in research, for preventing scripting vulnerabilities. In this thesis, we explore two directions towards preventing script injection vulnerabilities in web applications. First, we investigate the most predominant prevention technique that developers employ in practice and explain the challenges in getting it right. We then propose techniques to automate this defense, thereby shifting the burden of applying correct prevention measures from the developers to the underlying compilation tools. Second, we investigate an alternative architecture for web applications that preserves a strict separation between untrusted data and application code during the parsing operations of the browser. This architecture obviates the need for today's predominant prevention techniques, which are notoriously error-prone.

1.1 Contributions

This thesis makes the following contributions:

Automatic Techniques for Finding Scripting Vulnerabilities. We propose two white-box analysis techniques for finding scripting vulnerabilities and build the first systems to apply these techniques to JavaScript applications. Our techniques are based on dynamic analysis and find vulnerabilities in several real-world applications without raising false positives. The first of these techniques is called taint-enhanced black-box fuzzing (Chapter 3). It improves over prior black-box dynamic fuzzing approaches by combining them with a previous white-box analysis called dynamic taint analysis [86]. Our second technique improves over white-box dynamic testing methods, specifically dynamic symbolic execution [40], by introducing more comprehensive symbolic reasoning about strings (Chapter 4).

Analysis of Existing Defenses. In this thesis, we identify implicit assumptions underlying today's deployed prevention techniques (Chapter 5). Specifically, we explain the assumptions and subtleties underlying sanitization, the predominant defense technique deployed in practice. Our empirical analysis of large-scale applications uncovers two new classes of errors in sanitization practices when implemented manually by developers. Furthermore, we outline several sources of incompleteness in mechanisms implemented by emerging web application frameworks that try to enforce sanitization defenses automatically.

Techniques for Preventing Scripting Vulnerabilities. We propose a type system and inference engine to address the problem of automatic sanitization (or auto-sanitization), which eliminates the error-prone practice of manually applying sanitization in web applications (Chapter 6). Together with external techniques to identify all untrusted variables [97, 132] and to implement correct sanitizers [51], this work provides a basis for achieving correct sanitization-based defense in emerging web applications automatically.

We propose a second defense mechanism which eliminates the need for sanitization-based defense (Chapter 7). In this work, we develop a sanitization-free architecture for web applications, relying on a collaboration between the server and the client web browser. We introduce a fundamental integrity property called document structure integrity and sketch mechanisms to enforce it during the end-to-end execution of the application. Our work demonstrates an initial proof-of-concept for implementing these mechanisms in existing applications with minimal impact on backwards compatibility.
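To make concrete what "applying sanitization" means, the sketch below shows a minimal HTML-escaping sanitizer in JavaScript. The function name and its escaping set are illustrative assumptions, not one of the sanitizers analyzed in the later chapters.

```javascript
// Minimal sketch of an HTML-escaping sanitizer for untrusted data
// destined for HTML text content. Illustrative only: a real defense
// must pick the sanitizer by output context (HTML text, attribute,
// URL, JavaScript string, ...), which is what auto-sanitization
// approaches aim to automate.
function escapeHtml(untrusted) {
  return String(untrusted)
    .replace(/&/g, "&amp;")   // must run first, before other entities
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// An injected script tag is rendered inert as markup text.
const safe = escapeHtml('<script>bad()</script>');
// safe === "&lt;script&gt;bad()&lt;/script&gt;"
```

Choosing the wrong encoder for a given output context, or none at all, is precisely the class of manual errors studied in Chapter 5.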

1.2 Statement of Joint Work

The development of techniques and systems presented in Chapter 3 and Chapter 4 was led by Prateek Saxena. In addition to Prateek Saxena, contributors to the work presented in Chapter 3 include Steve Hanna, Pongsin Poosankam and Dawn Song. Contributors in addition to Prateek Saxena for the work presented in Chapter 4 include Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant and Dawn Song. The analysis of sanitization use in large-scale applications presented in Chapter 5 was led by Prateek Saxena. Contributors in addition to Prateek Saxena to this work include David Molnar and Ben Livshits. The rest of the work presented in Chapter 5 was joint work between Prateek Saxena, Joel Weinberger, Devdatta Akhawe, Matthew Finifter, Richard Shin and Dawn Song. The development of the auto-sanitization technique presented in Chapter 6 was joint work between Prateek Saxena, Mike Samuel and Dawn Song. Contributors to the sanitization-free defense technique proposed in Chapter 7, in addition to Prateek Saxena, include Yacin Nadji and Dawn Song.

Chapter 2

Background & Overview

Web applications are distributed applications consisting of components that execute either on a web server or on a user's client. The code which executes on the server, which we refer to as server-side code, is usually written in languages such as Java, C/C++, PHP, or ASP.NET. It is responsible for processing HTTP inputs and operates on data stored in the server-side database or file system. In response to an HTTP request, the server-side code sends an HTTP response which contains client-side code. Client-side code consists of languages parsed and executed by the browser, such as HTML, CSS and JavaScript. Client- and server-side components communicate with each other over the network, typically over custom protocols layered on HTTP.

In this thesis, we focus on building techniques to find and prevent script injection vulnerabilities, one of the most prominent classes of vulnerabilities affecting web applications today. These vulnerabilities affect both client- and server-side components of web applications. We provide some examples of these vulnerabilities and define preliminary terminology used throughout this thesis in Section 2.1.

2.1 Script Injection Vulnerabilities: Definition & Examples

Scripting vulnerabilities arise when content controlled by an adversary (referred to as untrusted data) flows into critical operations of the program (referred to as critical sinks) without sufficient security checks. When untrusted data is parsed or evaluated as trusted code by the web browser, a scripting attack results. This allows an attacker to gain higher privileges than intended by the web application, typically granting untrusted data the same privileges as the web application's code. Well-known example categories of such code-injection attacks include cross-site scripting [94] and cross-channel scripting [18] attacks.

The definitions of critical sinks and untrusted data inputs are application-specific. The intended security policy for certain applications permits data taken from users or third-party web sites to be evaluated as script code. On the other hand, many other applications do not

Figure 2.1: A snippet of HTML pseudo-code generated by a social networking application server vulnerable to a scripting attack. Untrusted input data, identified by the $GET['...'] variables, is echoed inline in the HTML response without any modification or sanitization.

intend untrusted data inputs (such as user-generated content) to be executed by the browser as code. Our techniques assume that such a security specification is externally provided.

Server-side Script Injection Vulnerabilities

Script injection attacks in server-side applications have been investigated in depth by prior work. We provide an example of a typical scripting attack for exposition. In this example, all the HTTP data inputs to the web application server are treated as untrusted data. In this application, the security policy forbids untrusted data from being executed as scripts or HTML markup when processed by the web browser. A script injection vulnerability is one that allows injection of untrusted data into a victim web page which is subsequently interpreted in a malicious way by the browser on behalf of the victim web site.

An Example. We show a hypothetical example of HTML code that a buggy web application emits in Figure 2.1. Places where untrusted user data is inlined are denoted by elements of the $GET['...'] array (signifying data directly copied from GET/POST request parameters). In this example, the server expects the value of $GET['MainUser'] to contain the name of the current user logged into the site, and $GET['FriendID-Status'] to contain a string with the name of another user and his status message ("online" or "offline") separated by a delimiter ("="). If the untrusted input data is a malicious string, such as ' onmouseover=:bad(), a script injection attack occurs. In this attack, the malicious value of $GET['FriendID-Status'] prematurely closes the id attribute of the tag on line 3 and injects unintended HTML attributes and/or tags. This particular attack string closes the string delimited by the single quote character, which allows the attacker to inject a JavaScript attribute called onmouseover. The value of the injected JavaScript attribute executes as arbitrary JavaScript code, which we depict as a function call bad(). This exploit string is one of many publicly known vectors; over 200 such vectors are available online [94]. This attack example belongs to a sub-class of script injection attacks commonly referred to as reflected cross-site scripting.
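The attack just described can be made concrete with a small JavaScript sketch. This is a hypothetical reconstruction, not the original Figure 2.1 listing: the parameter names come from the text, and the exact markup is assumed.

```javascript
// Hypothetical reconstruction of the pattern in Figure 2.1: the
// server interpolates untrusted GET parameters directly into HTML
// without any sanitization.
function renderPage(get) {
  return "<h1>Welcome, " + get["MainUser"] + "!</h1>\n" +
         // Single-quoted id attribute: a quote in the input can
         // terminate the attribute value early.
         "<div id='" + get["FriendID-Status"] + "'>...</div>";
}

// The attack string from the text closes the id attribute early and
// injects an onmouseover attribute whose value runs attacker code.
const html = renderPage({
  "MainUser": "Alice",
  "FriendID-Status": "' onmouseover='bad()",
});
// html now contains: <div id='' onmouseover='bad()'>...</div>
```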

Client-side Script Injection Vulnerabilities

Much prior vulnerability research has focused primarily on the server-side components of web applications. Scripting vulnerabilities can arise in client-side components, such as those written in JavaScript, as well [68]. We present examples of client-side script injection vulnerabilities, a subclass of scripting vulnerabilities which result from bugs in the client-side code. In a client-side script injection vulnerability, critical sinks are operations in the client-side code where data is used with special privilege, such as in a code evaluation construct.

Client-side script injection vulnerabilities differ from server-side scripting vulnerabilities in a few ways. For example, one type of client-side script injection vulnerability involves data that enters the application through the browser's cross-window communication abstractions and is processed completely by JavaScript code, without ever being sent back to the web server. Another type of client-side script injection vulnerability is one where a web application server sanitizes untrusted data sufficiently before embedding it in its initial HTML response, but does not sanitize the data sufficiently for its use in the JavaScript component.

Client-side script injection vulnerabilities are becoming increasingly common due to the growing complexity of JavaScript applications. Increasing demand for interactive performance of rich web 2.0 applications has led to rapid deployment of application logic as client-side scripts. A significant fraction of the data processing in AJAX applications (such as Facebook) is done by JavaScript components. JavaScript has several dynamic features for code evaluation and is highly permissive in allowing code and data to be inter-mixed. As a result, attacks resulting from client-side script injection vulnerabilities often result in compromise of the web application's integrity.
In the security policy of many web applications, any data which is controlled by an external (or third-party) web origin is treated as untrusted data. Additionally, user data (such as the content of GUI form fields or textarea elements) is treated as untrusted. Untrusted data can enter the client-side code of a web application in three ways. First, data from an untrusted web attacker could be reflected in the honest web server's HTML response and subsequently read for processing by the client-side code. Second, untrusted data from other web sites could be injected via the cross-window communication interfaces provided by the web browser. These interfaces include postMessage, URL fragment identifiers, and window/frame cross-domain properties. Finally, user data fed in through form fields and text areas is also marked as untrusted.

The first two untrusted sources are concerned with the threat model where the attacker is a remote entity that has knowledge of a client-side script injection vulnerability in an honest (but buggy) web application. The attacker's goal is to remotely exploit a client-side script injection vulnerability to execute arbitrary code. The attack typically only involves enticing the user into clicking a link of the attacker's choice (such as in a reflected XSS attack). We also consider the "user-as-an-attacker" threat model where the user data is treated as untrusted. In general, user data should not be interpreted as web application code. For instance, if a user can inject scripts into the application, such a bug can be used in conjunction

1: var chatURL = "http://www.example.com/";
2: chatURL += "chat_child.html";
3: var popup = window.open(chatURL);
4: ...
5: function sendChatData (msg) {
6:   var StrData = "{\"username\": \"joe\", \"message\": \"" + msg + "\"}";
7:   popup.postMessage(StrData, chatURL); }

Figure 2.2: An example of a chat application's JavaScript code for the main window, which fetches messages from the backend server at http://example.com/

with other vulnerabilities (such as login-CSRF vulnerabilities) in which the victim user is logged in as the attacker while the application behavior is under the attacker's control [12]. In our view, these vulnerabilities can be dangerous as they allow sensitive data exfiltration, even though the severity of the resulting exploits varies significantly from application to application.

Running Example. For exposition, we introduce here a running example of a hypothetical AJAX chat application, which we revisit in Chapter 3. The example application consists of two windows. The main window, shown in Figure 2.2, asynchronously fetches chat messages from the backend server. Another window receives these messages from the main window and displays them; its code is shown in Figure 2.3. The communication between the two windows is layered on postMessage1, a string-based message passing mechanism included in HTML 5. The application code in the display window has two sources of untrusted data—the data received via postMessage, which could be sent by any browser window, and the event.origin property, which is the origin (port, protocol and domain) of the sender.

We discuss next some of the attacks possible from exploiting the errors in this example application code. The script injection attack is discussed first, and then we outline three other related attacks which are less severe but problematic because they escalate the privileges afforded to a remote attacker.

• Script injection. Script injection is possible because JavaScript can dynamically evaluate both HTML and script code using various DOM methods (such as document.write) as well as JavaScript native constructs (such as eval). This class of attacks is commonly referred to as DOM-based XSS [68]. An example of this attack is shown in Figure 2.3 on line 19. In the example, the display child window uses eval to deserialize the input string from a JSON format, without validating its expected structure. Such attacks are prevalent today because popular data exchange formats, such as JSON, were specifically designed for use with the eval construct. In Section 3.4, we outline

1 In the postMessage interface design, the browser is responsible for attributing each message with the domain, port, and protocol of the sender principal and making it available as the "origin" string property of the message event [13, 119].

1: function ParseOriginURL (url) {
2:   var re = /(.*?):\/\/(.*?)\.com/;
3:   var matches = re.exec(url);
4:   return matches;
5: }
6:
7: function ValidateOriginURL (matches)
8: {
9:   if (!matches) return false;
10:  if (!/https?/.test(matches[1]))
11:    return false;
12:  var checkDomRegExp = /example/;
13:  if (!checkDomRegExp.test (matches[2])) {
14:    return false; }
15:  return true; // All Checks Ok
16: }
17: // Parse JSON into an array object
18: function ParseData (DataStr) {
19:   eval (DataStr);
20: }
21: function receiveMessage(event) {
22:   var O = ParseOriginURL(event.origin);
23:   if (ValidateOriginURL (O)) {
24:     var DataStr = 'var new_msg = (' +
25:       event.data + ');';
26:     ParseData(DataStr);
27:     display_message(new_msg);
29:     var backserv = new XMLHttpRequest(); ...;
30:     backserv.open("GET", "http://example.com/srv.php?call=confirmrcv&msg=" + new_msg["message"]);
31:     backserv.send(); } ... } ...
32: window.addEventListener("message", receiveMessage, ...);

Figure 2.3: An example vulnerable chat application's JavaScript code for a child message display window, which takes chat messages from the main window via postMessage. The vulnerable child message window code processes the received message in four steps, as shown in the receiveMessage function. First, it parses the principal domain of the message sender. Next, it checks that the origin's protocol and domain are "http" or "https" and "example.com", respectively. If the checks succeed, the popup parses the JSON [58] string data into an array object and, finally, invokes a function for displaying received messages. In lines 29-31, the child window sends confirmation of the message reception to a backend server script.

additional phishing attacks in iGoogle gadgets layered on such XSS vulnerabilities, to illustrate that a wide range of nefarious goals can be achieved once the application integrity is compromised.
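As a hedged sketch (not the application's code) of why eval-based parsing is dangerous while a strict JSON parser is not:

```javascript
// Attacker-controlled message; valid JavaScript, but not valid JSON.
const data = 'alert("xss")';

// The pattern of Figure 2.3 (line 19) would execute this:
//   eval('var new_msg = (' + data + ');');
// JSON.parse, by contrast, rejects anything that is not well-formed JSON.
let rejected = false;
try {
  JSON.parse(data);
} catch (e) {
  rejected = true; // non-JSON input raises a SyntaxError instead of running
}
console.log(rejected); // true

// Well-formed messages still parse as expected.
const msg = JSON.parse('{"username": "joe", "message": "hi"}');
console.log(msg.message); // hi
```

Because JSON.parse never invokes the JavaScript interpreter on its input, a payload that would execute under eval is simply reported as a syntax error.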

• Origin Mis-attribution. Certain cross-domain communication primitives, such as postMessage, are designed to facilitate sender authentication. Applications using postMessage are responsible for validating the authenticity of the domain sending the message. The example in Figure 2.3 illustrates such an attack on line 13. The vulnerability arises because the application checks the domain field of the origin parameter insufficiently, even though the protocol sub-field is correctly validated. The failed check allows any domain name containing "example", including an attacker's domain hosted at "evilexample.com", to send messages. As a result, the vulnerable code naively trusts the received data even though the data is controlled by an untrusted principal. In the running example, for instance, an untrusted attacker can send chat messages to victim users on behalf of benign users.
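The flaw can be sketched in a few lines (the URLs and the strict policy below are our illustrative assumptions, not the application's):

```javascript
// Substring-based domain check, in the spirit of the /example/ regular
// expression in Figure 2.3: it matches any hostname merely containing
// the word "example".
function weakOriginCheck(origin) {
  return /example/.test(new URL(origin).hostname);
}

// An exact comparison against the single expected origin closes the hole.
function strictOriginCheck(origin) {
  return origin === "https://www.example.com";
}

console.log(weakOriginCheck("https://www.example.com"));   // true
console.log(weakOriginCheck("https://evilexample.com"));   // true — attacker passes
console.log(strictOriginCheck("https://evilexample.com")); // false
```

Exact string comparison of the full origin (scheme, host, and port) is the standard recommendation for postMessage receivers, since any pattern-based check risks matching attacker-registered domains.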

• HTTP Parameter Pollution. Many AJAX applications use untrusted data to construct URL parameters dynamically, which are then used to make HTTP requests (via XMLHttpRequest) to a backend server. Several of these URL parameters play the role of application-specific commands in these HTTP requests. For instance, the chat application in the example sends a confirmation command to a backend script on lines 29-31. The backend server script may take other application commands (such as adding friends, creating a chat room, and deleting history) similarly from HTTP URL parameters. If the HTTP request URL is dynamically constructed by the application in JavaScript code (as done on line 30) using untrusted data without validation, the attacker can inject new application commands by inserting extra URL parameters. These attacks are called HTTP parameter pollution attacks [7]. Since the victim user is already authenticated, parameter pollution allows the attacker to perform unintended actions on behalf of the user. For instance, the attacker could send hi&call=addfriend&name=evil as the message, which could result in adding the attacker to the buddy list of the victim user.
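A hedged sketch of the injection and of the standard mitigation (percent-encoding each untrusted value before concatenating it into the URL):

```javascript
const msg = "hi&call=addfriend&name=evil"; // attacker-chosen chat message
const base = "http://example.com/srv.php?call=confirmrcv&msg=";

// Naive concatenation, as on line 30 of Figure 2.3: the attacker's "&" and
// "=" characters introduce brand-new parameters into the request.
const unsafe = base + msg;
console.log(unsafe.includes("&call=addfriend")); // true

// Percent-encoding the value keeps the whole message inside the msg parameter.
const safe = base + encodeURIComponent(msg);
console.log(safe.includes("&call=addfriend")); // false
```

After encoding, the payload travels as hi%26call%3Daddfriend%26name%3Devil, which the server decodes as a single literal message value rather than as extra commands.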

• Cookie-sink vulnerabilities. Web applications often use cookies to store session data, the user's history, and personal preference settings. These cookies may be updated and used in the client-side code. If an attacker can control the value written to a cookie, the attacker may fix the values of session identifiers (which may result in a session fixation attack) or corrupt the user's preferences and history data.

2.2 Techniques for Finding Script Injection Vulnerabilities Automatically

If we can develop techniques to automatically find script injection vulnerabilities in web applications, it is possible to eliminate many exploitable vulnerabilities before applications are used in deployment. Analysis techniques for finding scripting vulnerabilities, especially in server-side code, have been widely researched [132, 72, 71, 18, 127, 62, 55, 87, 75, 8]. In this thesis, we focus on techniques for finding these vulnerabilities in JavaScript code, which had received much less attention in research prior to our work. Prior techniques for finding scripting vulnerabilities in web applications largely fall into 3 categories: manual analysis, static analysis, and dynamic analysis. We discuss these ideas below, outline the new challenges posed by JavaScript, and explain at a high level how our techniques differ. Our techniques aim to uncover vulnerabilities without raising false positives by automatically constructing concrete exploit inputs for the vulnerabilities found. To achieve this, our techniques explore the program behavior systematically and reason about transformations (such as validation or sanitization operations) that the application may perform on the untrusted data.

Differences from Existing Techniques. Fuzzing or black-box testing is a popular lightweight mechanism for testing applications. However, black-box fuzzing does not scale well with a large number of inputs and is often inefficient in exploring the application's path space. A more directed approach used in the past in the context of server-side code analysis is based on dynamic taint-tracking [132]. Dynamic taint analysis is useful for identifying a flow of data from an untrusted source to a critical operation. However, dynamic taint-tracking alone cannot determine whether the application sufficiently validates untrusted data before using it. Consider a canonical example of application logic that transforms dangerous text before using it at a critical operation.
In this example, when the underlined data is read by the browser, it is simultaneously placed in two contexts. It is placed in a JavaScript string literal context by the JavaScript parser (edge 2) due to the single quotes. But, before that, it is inside an HTML script element context, so the attacker can inject a closing script tag to break out of the script block. In fact, sanitizers commonly fail to account for the latter because they do not recognize the presence of nested contexts.

Challenge 3: Browser Transductions. If dealing with multiple contexts is not arduous enough, our model highlights the implicit transductions that browsers perform when handing data from one sub-parser to another. These are represented by edges from rounded rectangles to unshaded rectangles in Figure 5.1. Such transductions and browser-side modifications can, surprisingly, undo sanitization applied on the server. Consider a blog page in which comments are hidden by default and displayed only after a user clicks a button. The code uses an onclick JavaScript event handler:

<div onclick="displayComment('... hidden comment ...')"> ... </div>
The underlined untrusted comment is in two nested contexts: the HTML attribute and single-quoted JavaScript string contexts. Apart from preventing the data from escaping out of the two contexts separately (Challenge 2), the sanitization must worry about an additional problem. The HTML 5 standard mandates that the browser HTML-entity decode an attribute value (edge 3) before sending it to a sub-grammar. As a result, the attacker can use additional attack characters even if the sanitization performs HTML-entity encoding to prevent attacks. The character sequence &#39; will get converted to ' before being sent to the JavaScript parser. This will allow the untrusted comment to break out of the string context in the JavaScript parser. We term such implicit conversions browser transductions.

Table 5.1 details browser transductions that are automatically performed upon reading or writing to the DOM. The DOM property denotes the various aspects of an element accessible through the DOM APIs, while the access method describes the specific part of the API through which a developer may edit or examine these attributes. Excepting "specified in markup", the methods are all fields or functions of DOM elements. Table 5.2 describes the specifics of the transducers employed by the browser. Except for "HTML entity decoding", the transductions all occur in the parsing and serialization

CHAPTER 5. ANALYSIS OF EXISTING DEFENSES 65

DOM property                        Access method        Transductions on reading  Transductions on writing
data-* attribute                    get/setAttribute     None                      None
                                    .dataset             None                      None
                                    specified in markup  N/A                       HTML entity decoding
src, href attributes                get/setAttribute     None                      None
                                    .src, .href          URI normalization         None
                                    specified in markup  N/A                       HTML entity decoding
id, alt, title, type, lang, class,  get/setAttribute     None                      None
dir attributes                      .[attribute name]    None                      None
                                    specified in markup  N/A                       HTML entity decoding
style attribute                     get/setAttribute     None                      None
                                    .style.*             CSS serialization         CSS parsing
                                    specified in markup  N/A                       HTML entity decoding
HTML contained by node              .innerHTML           HTML serialization        HTML parsing
Text contained by node              .innerText,          None                      None
                                    .textContent
HTML contained by node,             .outerHTML           HTML serialization        HTML parsing
including the node itself
Text contained by node,             .outerText           None                      None
surrounded by markup for node

Table 5.1: Transductions applied by the browser for various accesses to the document. These summarize transductions when traversing edges connected to the "Document" block in Figure 5.1.

Type                  Description                                     Illustration
HTML entity decoding  Replacement of character entity references      &amp; → &
                      with the actual characters they represent.
HTML parsing          Tokenization and DOM construction following     <p>&gt;</p> → HTML element P
                      the HTML parsing rules, including entity        with body >
                      decoding as appropriate.
HTML serialization    Creating a string representation of an HTML     HTML element P with body > →
                      node and its children.                          <p>&gt;</p>
URI normalization     Resolving the URI to an absolute one, given     /article title →
                      the context in which it appears.                http://www.example.com/article%20title
CSS parsing           Parsing CSS declarations, including character   color: \72\65\64 → color: red
                      escape decoding as appropriate.
CSS serialization     Creating a canonical string representation of   "color:#f00" → "color: rgb(255, 0, 0);"
                      a CSS style declaration.

Table 5.2: Details regarding the transducers mentioned in Table 5.1. They all involve various parsers and serializers present in the browser for HTML and its related sub-grammars.

processes triggered by reading and writing these properties as strings. When writing to a property, the browser parses the string to create an internal AST representation. When reading from a property, the browser recovers a string representation from the AST. Textual values are HTML-entity decoded when written from the HTML parser to the DOM via edge 1 in Figure 5.1. Thus, when a program reads a value via JavaScript, the value is entity decoded. In some cases, the program must re-apply the sanitization to this decoded value or risk having the server's sanitization be undone.

One set of DOM read access APIs creates a serialized string of the AST representation of an element, as described in Table 5.2. The other API methods simply read the text values of the string versions (without serializing the ASTs to a string) and perform no canonicalization of the values. The transductions vary significantly for the DOM write access API as well, as detailed in Table 5.1. Some writes cause input strings to be parsed into an internal AST representation, or apply simple replacements on certain character sequences (such as URI percent-decoding), while others store the input as is. In addition, the parsers in Figure 5.1 apply their own transductions internally on certain pieces of their input.
The CSS and JavaScript parsers unescape certain character sequences within string literals (such as Unicode escapes), and the URI parser applies some of its own transductions as well (undoing percent-encoding).

Challenge 4: Dynamic Code Evaluation. In principle, the chain of edges traversed by the browser while parsing a text can be arbitrarily long, because the browser can dynamically evaluate code. Untrusted content can keep cycling through HTML and JavaScript contexts. For example, consider the following JavaScript code fragment:

function foo(untrusted) {
    document.write("<input onclick='foo(" + untrusted + ")'>");
}

Since untrusted text is repeatedly pumped through the JavaScript string and HTML contexts (edges 3 and 6 of Figure 5.1), statically determining the context traversal chain on the server is infeasible. In principle, purely server-side sanitization is not sufficient for context determination because of dynamic code evaluation. Client-side sanitization is needed in these cases to fully mitigate potential attacks. Failure to properly sanitize such dynamic evaluation leads to the general class of attacks called DOM-based scripting or client-side code injection attacks [113].

Another key observation is that browser transductions along the edges of Figure 5.1 vary from one edge to another, as detailed earlier. This mismatch can cause scripting vulnerabilities. To illustrate this, we present a real-world example in Figure 5.2 from one of the applications we evaluated, phpBB3, showing how these subtleties may be misunderstood by developers. In the server-side code, which is not shown here, the application sanitizes the title attribute of an HTML element by HTML-entity encoding it. If the attacker enters a string

text = element.getAttribute('title');
// ... elided ...
desc = create_element('span', 'bottom');
desc.innerHTML = text;
tooltip.appendChild(desc);

Figure 5.2: A real-world vulnerability in PHPBB3.

like "Click to display name" in which part of the string is attacker-controlled markup, the server's entity encoding renders it harmless in the initial page. However, the code in Figure 5.2 reads the title attribute via getAttribute, which returns the entity-decoded string (see Table 5.1), and assigns it to innerHTML, causing it to be parsed as HTML markup. This undoes the server-side sanitization and allows the attacker's script to execute.
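The transduction at work in the phpBB3 example can be sketched with a toy entity decoder (a simplification of what a DOM attribute read effectively returns; the payload is illustrative):

```javascript
// Toy model of the browser's HTML-entity decoding of attribute values.
function htmlEntityDecode(s) {
  return s.replace(/&lt;/g, "<").replace(/&gt;/g, ">")
          .replace(/&quot;/g, '"').replace(/&amp;/g, "&");
}

// What the server emitted into the title attribute (sanitized, inert):
const sanitized = "&lt;img src=x onerror=bad()&gt;";

// What a DOM read such as getAttribute('title') hands back to JavaScript:
const decoded = htmlEntityDecode(sanitized);
console.log(decoded); // <img src=x onerror=bad()>
// Assigning `decoded` to innerHTML would now parse it as live markup.
```

The server's encoding was correct for the markup context, but the read-then-write pattern moves the value into a fresh HTML parsing context after the protective encoding has already been stripped.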
Homepage: {{PAGENAME}}

Figure 5.4: Example of auto-sanitization in the Google Ctemplate framework.

Variables that require sanitization are surrounded by {{ and }}; the rest of the text is HTML content to be output. When the template executes, the engine parses the output and determines that (for instance) {{NAME}} is in a JavaScript string context and automatically applies the sanitizer for the JavaScript string context, namely :javascript_escape. For other variables, the same mechanism applies the appropriate sanitizers. For instance, the variable {{URI}} is sanitized with the :url_escape_with_arg=html sanitizer.

Context Expressiveness

Having analyzed the auto-sanitization support in web frameworks for static HTML evaluation as well as dynamic evaluation via JavaScript, we turn to the support for manual sanitization. Frameworks may not provide auto-sanitization but may instead provide sanitizers which developers can call. This improves security by freeing the developer from (re)writing complex, error-prone sanitization code. In this section, we evaluate the breadth of contexts for which each framework provides sanitizers, or the context expressiveness of each framework. For example, a framework that provides built-in sanitizers for multiple contexts (say, URI attributes, CSS keywords, and JavaScript string contexts) is more expressive than one that provides a sanitizer only for the HTML tag context.

Expressiveness of Framework Sanitization Contexts. Table 5.5 presents the expressiveness of the web frameworks we study and Table 5.6 presents the expressiveness required by our subject web applications. The key insights are:

• We observe that 9 out of the 14 frameworks do not support contexts other than the HTML context (e.g., as the content body of a tag or inside a non-URI attribute) and the URI attribute context. The most common sanitizers for these are HTML entity encoding and URI encoding, respectively.

• 4 web frameworks, ClearSilver, Ctemplate, Django, and Smarty, provide appropriate sanitization functions for emitting untrusted data into a JavaScript string. Only 1 framework, Ctemplate, provides a sanitizer for emitting data into JavaScript outside of the string literal context. However, the sanitizer is a restrictive whitelist, allowing only

Language          Framework                     HTML tag  URI attr.  URI attr.  JS      JS Number   Style attr.
                                                content   (excl.     (incl.     String  or Boolean  or tag
                                                          scheme)    scheme)
Perl              Mason [1, 112]                •         •
Perl              Template Toolkit [109]        •         •
Perl              Jifty [60]                    •         •
PHP               CakePHP [23]                  •         •
PHP               Smarty Template Engine [103]  •         •                     •
PHP               Yii [133, 53]                 •         •
PHP               Zend Framework [137]          •         •
PHP               CodeIgniter [28, 29]          •         •
VB, C#, C++, F#   ASP.NET [52]                  •         •
Ruby              Rails [95]                    •         •
Python            Django [32]                   •         •          •          •
Java              GWT SafeHtml [46]             •         •          •
C++               Ctemplate [30]                •         •          •          •       •           •
Language-neutral  ClearSilver [27]              •         •          •          •                   •

Table 5.5: Sanitizers provided by languages and/or frameworks. For frameworks, we also include sanitizers provided by standard packages or modules for the language.

numeric or boolean literals. No framework we studied allows untrusted JavaScript code to be emitted into JavaScript contexts. Supporting this requires a client-side isolation mechanism such as ADsafe [3] or Google’s Caja [45].

• 4 web frameworks, namely Django, GWT, Ctemplate, and ClearSilver, provide sanitizers for URI attributes in which a complete URI (i.e., including the URI protocol scheme) can be emitted. These sanitizers reject URIs that use the javascript: scheme and accept only a whitelist of schemes, such as http:.

• Of the frameworks we studied, we found only one that provides an interface for customizing the sanitizer for a given context. Yii uses HTML Purifier [53], which allows the developer to specify a custom list of allowed tags. For example, a developer may specify a policy that admits only a small whitelist of formatting tags. The other frameworks (even the context-sensitive auto-sanitizing ones) have sanitizers that are not customizable. That is, untrusted content within a particular context is always sanitized the same way.

The set of contexts for which a framework provides sanitizers gives a sense of how the framework expects web applications to behave. Specifically, frameworks assume applications will not emit sanitized content into multiple contexts. More than half of the frameworks we examined do not expect web applications to insert content with arbitrary schemes into URI contexts, and only one of the frameworks supports use of untrusted content in JavaScript

Application  Description              LOC      Contexts sanitized  No. of sanitizers  No. of sinks
RoundCube    IMAP Email Client        19,038   5 of 5              30                 75
Drupal       Content Management       20,995   5 of 5              32                 2557
Joomla       Content Management       75,785   4 of 5              22                 538
WordPress    Blogging App.            89,504   4 of 5              95                 2572
MediaWiki    Wiki Hosting             125,608  5 of 5              118                352
PHPBB3       Bulletin Board Software  146,991  5 of 5              19                 265
OpenEMR      Medical Records Mgmt.    150,384  3 of 5              18                 727
Moodle       E-Learning Software     532,359  5 of 5              43                 6282

Table 5.6: The web applications we study and the contexts for which they sanitize. The five contexts counted are: HTML context, URI attribute (excluding scheme), URI attribute (including scheme), JS attribute context, and JS number-or-string context.

Number or Boolean contexts. Below, we challenge these assumptions by quantifying the set of contexts for which applications need sanitizers. Expressiveness of Contexts in Web Applications. We examined our 8 subject PHP applications, ranging from 19 to 532 KLOC, to understand what expressiveness they require and whether they could, theoretically, migrate to the existing frameworks. We systematically measure and enumerate the contexts into which these applications emit untrusted data. Table 5.6 shows the result of this evaluation. We observe that nearly all of the applications insert untrusted content into all of the outlined contexts. Contrast this with Table 5.5, where most frameworks support a much more limited set of contexts with built-in sanitizers.

5.3 Failures of Sanitization in Large-Scale Applications

In this section, we present our analysis of 7 large-scale, Microsoft .NET-based shipping web applications. These applications have a total of over 400,000 lines of code. We performed our security testing on a set of 53 large web pages derived from these 7 applications. Each page contains 350–900 DOM nodes. We have found two new classes of sanitization errors that we commonly observe in our empirical analysis: context-mismatched sanitization and inconsistent multiple sanitization, both of which demonstrate that placement of sanitizers in legacy code is a significant challenge even if the sanitizers themselves are securely constructed. As mentioned previously, these errors are not the traditional, known problems of missing sanitization or of incorrectly implemented sanitizers; each individual sanitizer in our test application is correct to the best of our knowledge. We explain these errors below and give an overview of our empirical findings. We exclude the implementation and design of a tool called ScriptGard which we developed to perform this analysis—details of this tool are available in our conference paper [97].

HTML Sink Context        Correct sanitizer that suffices
HTML Tag Context         HTMLEncode, SimpleTextFormatting
Double Quoted Attribute  HTMLAttribEncode
Single Quoted Attribute  HTMLAttribEncode
URL Path attribute       URLPathEncode
URL Key-Value Pair       URLKeyValueEncode
In Script String         EcmaScriptStringEncode
CDATA                    HTMLEncode
Style                    Alphanumerics

Figure 5.5: Sanitizer-to-context mapping for our test applications.

Scripting attacks are highly context-dependent—a string such as expression:alert(a) is innocuous when placed inside an HTML tag context, but can result in JavaScript execution when embedded in a CSS attribute value context. In fact, the set of contexts in which untrusted data is commonly embedded by today's web applications is well known. Sanitizers that secure data against scripting attacks in each of these contexts are publicly available [129, 42]. We find that security experts have already implemented functionally correct sanitizers and specified the contexts to which each sanitizer corresponds for our target application. We extracted this security specification in discussion with the security experts; the mapping of sanitizers to their intended contexts of use is shown in Figure 5.5. We refer to this mapping in the rest of this chapter.

Inconsistent Multiple Sanitization

We illustrate the new classes of errors with an example. Figure 5.6 shows a fragment of ASP.NET code written in C# which illustrates the difficulty of sanitizer placement. This running example is inspired by code from the large code base we empirically analyzed. Consider the function DynamicLink.RenderControl shown in the running example, which places an untrusted string inside a double-quoted href attribute which is in turn placed inside a JavaScript string. This code fragment places the untrusted string into two nested contexts—when the browser parses the untrusted string, it will first parse it in the JavaScript string literal context and then subsequently parse it as a URI attribute (as explained in Section 5.1). In Figure 5.5 we show a sanitizer specification that maps functions to contexts. In particular, two sanitizer functions, EcmaScriptStringEncode and HtmlAttribEncode, are applied for the JavaScript string context and the HTML attribute context, respectively. However, developers must understand this order of parsing in the browser to apply the sanitizers in the correct order. They must choose between the two ways of composing the two sanitizers shown in Figure 5.7.

void AnchorLink.SetAttribRender (String userlink) {
  ...
  if (...) {
    // B1
    link.AttribMap["href"] = "javascript: onNav(" + userlink + ");";
  } else {
    // B2
    link.AttribMap["href"] = userlink;
  }
  ...
  TagControl.RenderControl (writer, link);   // C1
}

void TagControl.RenderControl (HtmlTextWriter writer, Control tag) {
  AnchorLink stc;
  if (dyn) {
    // B3
    stc = new DynamicLink();
    stc.prefixHTML = "<script>";
    stc.Content = tag.RenderControl();
  } else {
    // B4
    stc = new AnchorLink();
    stc.prefixHTML = stc.suffixHTML = "";
    stc.Content = tag.RenderControl();
  }
  writer.Write(stc.ToHTML());
}

// C2
String AnchorLink.RenderControl () {
  return "<a href=\"" + this.AttribMap["href"] + "\"></a>";
}

// C3
String DynamicLink.RenderControl () {
  return "document.write('<a href=\"" + this.AttribMap["href"] + "\">');";
}

Figure 5.6: Running example: C# code fragment illustrating the problem of automatic sanitizer placement. Underlined values are derived from untrusted data and require sanitization; function calls are shown with thick black arrows C1-C3 and basic blocks B1-B4 are shown in gray circles.

It turns out that the first sequence of sanitizer composition is inconsistent with the nested contexts, while the other order is safe, or consistent. We observe that the standard recommended implementations of these sanitizers [129] do not commute. For instance, EcmaScriptStringEncode simply transforms all characters that can break out of JavaScript string literals (like the " character) to their Unicode encoding (\u0022 for "), while HtmlAttribEncode HTML-entity encodes characters (&quot; for "). This is the standard recommended behavior of these sanitizers [129] with respect to the contexts they secure.

The attack on the wrong composition is subtle. The key observation is that applying EcmaScriptStringEncode first encodes the attacker-supplied " character as the Unicode representation \u0022. This Unicode representation is not subsequently transformed by the second

HtmlAttribEncode sanitization, because \u0022 is a completely innocuous string in the URI attribute value context. However, when the web browser parses the transformed string first (in the JavaScript string literal context), it performs a Unicode decode of the dangerous character \u0022 back to ". When the browser subsequently interprets this in the href URI attribute context, the attacker's dangerous " prematurely closes the URI attribute value, and the attacker can inject JavaScript event handlers like onclick=... to execute malicious code. It is easy to confirm that the other composition of the correct sanitizers is safe.
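The non-commutativity can be checked directly. The sketch below uses hypothetical minimal stand-ins for the two sanitizers (each real implementation encodes many more characters); only the ordering matters here.

```javascript
// Hypothetical minimal stand-ins for the two sanitizers; real versions
// encode many more characters, but the ordering problem is the same.
const ecmaScriptStringEncode = (s) =>
  s.replace(/["'\\]/g,
    (c) => "\\u" + c.charCodeAt(0).toString(16).padStart(4, "0"));
const htmlAttribEncode = (s) =>
  s.replace(/&/g, "&amp;").replace(/"/g, "&quot;")
   .replace(/</g, "&lt;").replace(/>/g, "&gt;");

const payload = '" onclick=evil() x="'; // tries to break out of href="..."

// Unsafe order: the quote becomes \u0022, which HtmlAttribEncode ignores;
// the browser's JS-string decode later restores the raw quote.
const unsafe = htmlAttribEncode(ecmaScriptStringEncode(payload));

// Safe order: the quote is entity-encoded for the (inner) attribute context
// first; EcmaScriptStringEncode then leaves &quot; untouched.
const safe = ecmaScriptStringEncode(htmlAttribEncode(payload));

console.log(unsafe); // contains \u0022, decoded back to " by the browser
console.log(safe);   // contains &quot;, inert in both contexts
```

The unsafe output still carries the attacker's quote, merely disguised; the safe output neutralizes it before the outer encoding runs.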

Context-Mismatched Sanitization

Even if sanitizers are designed to be commutative, developers may apply a sanitizer that does not match the context altogether; we call such an error a context-mismatched sanitization inconsistency. Context-mismatched sanitization is not uncommon in real applications. To see intuitively why, consider the sanitization requirements of the running example again. Notice that the running example has 4 control-flow paths, corresponding to executions through the basic blocks (B1,B3), (B1,B4), (B2,B3) and (B2,B4) respectively. Each execution path places the untrusted userlink input string in a different context, 4 contexts in all (see Figure 5.8). Determining the browser context is thus a path-sensitive property, and the developer may have to inspect the global control/data flow to understand in which contexts a data variable is used. This task can be error-prone because the code logic for web output may be spread across several classes, and the control-flow graph may not be explicit (especially in languages with inheritance). We show how the two most prevalent sanitization strategies fail on this example.

Failure of output sanitization. Consider the case in which the application developer decides to delay all sanitization to the output sinks, i.e., to the writer.Write call in TagControl.RenderControl. There are two problems with doing so, both of which the developer is burdened to identify manually to get the sanitization right. First, the execution paths through basic block B3 embed the untrusted data in a JavaScript string context.

Figure 5.8: HTML outputs obtained by executing different paths in the running example. TOENCODE denotes the untrusted string in the output.
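The path-sensitivity can be made concrete with a small sketch (a hypothetical JavaScript rephrasing of the C# example, not the dissertation's code): the same untrusted value needs different sanitizers depending on which branch executed.

```javascript
// Hypothetical JS rephrasing of the running example's branches: the same
// untrusted value ends up in different browser contexts per path, so no
// single sanitizer placed at the sink matches both paths.
function renderLink(dyn, userlink) {
  if (dyn) {
    // B1/B3-style path: userlink sits inside a URI inside a JS string
    // literal; it needs HtmlAttribEncode *then* EcmaScriptStringEncode.
    return "document.write('<a href=\"" + userlink + "\">link</a>')";
  }
  // B2/B4-style path: userlink is a plain quoted URI attribute value;
  // HtmlAttribEncode alone matches this context.
  return '<a href="' + userlink + '">link</a>';
}

console.log(renderLink(false, "/home"));
console.log(renderLink(true, "/home"));
```

The concatenations are deliberately unsanitized here, to show that the correct sanitizer for one return statement is wrong for the other.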

Failure of input sanitization. Moving sanitization to code locations earlier in the dataflow graph continues to suffer from path-sensitivity issues. Sanitizing in basic blocks B1 and B2 is not sufficient, because additional contexts are introduced when blocks B3 and B4 are executed. Sanitization locations midway in the dataflow chain, such as the concatenation in function AnchorLink.SetAttribRender, are also problematic: depending on whether basic block B1 or B2 executes, the this.AttribMap["href"] variable may or may not hold trusted content.

In the next section, we empirically analyze the extent to which these inconsistency errors arise in the practical real-world code we study. We point out that state-of-the-art static analysis tools which scale to hundred-thousand-LOC applications are presently fairly limited. Most existing tools detect data-flow paths where sanitizers are missing altogether; this class of errors is detected by several static or dynamic analysis tools (such as CAT.NET [81] or Fortify [37]). Our study finds errors where sanitization is present but inconsistent.

Our evaluation studies a large legacy application of over 400,000 lines of server-side C# code. We accessed 53 distinct web pages by manually executing this application using our testing infrastructure [97]. Our analysis statically instrumented 23,244 functions. Figure 5.5 shows the mapping between contexts and sanitization functions for our application. In particular, it permits only quoted attributes, which have well-defined rules for sanitization [129]. Furthermore, it always sets the page encoding to UTF-8, eliminating the possibility of character-set encoding attacks [101]. We arrived at the result in Figure 5.5 after several interactions with the application's security engineers.

Sequence  Count    Sequence  Count    Sequence  Count
PH        1,059    KPEH         31    KPP+         10
PE          657    PHH          21    KK            7
KPE         559    HE           21    PKH           6
HH          491    HHH          20    KPHH          3
KPH         335    EHH          20    HP            3
KE          292    KHH          16    KKPE          2
EH          253    PEP+         12    HPH           2
KEH          76    PEHH         12    PS            1
EE           75    KPEP+        12    PK            1
KP           54    KPEH+        12
PEH          53    HPP+         12
KH           34    EPP+         12

Figure 5.9: Histogram of sanitizer sequences consisting of 2 or more sanitizers empirically observed in our analysis, characterizing the sanitization practices resulting from manual sanitizer placement. E, H, U, K, P, S denote the sanitizers EcmaScriptStringLiteralEncode, HtmlEncode, HtmlAttribEncode, UrlKeyValueEncode, UrlPathEncode, and SimpleHtmlEncode respectively.

Analysis Results

We report the number of context-mismatched and inconsistent multiple sanitization cases we find in our evaluation.

Context-mismatched sanitization. Figure 5.10 shows that our testing exercised 25,209 paths on which sanitization was applied [1]. Of these, 1,558 paths (or 6.1%) were improperly sanitized. Of these improperly sanitized paths, 1,207 (4.7% of the total analyzed paths) contained data that could not be proven safe by our testing infrastructure; as a result, we mark them as context inconsistency errors and resort to additional mechanisms to nullify these errors at runtime [97]. The difference between the last and second-to-last columns in Figure 5.10 consists of the paths that sanitize constant strings or provably trusted data.

We used Red Gate's .NET Reflector tool, combined with other decompilation tools, to further investigate the executions which our testing tool reported as improperly sanitized. Our subsequent investigation reveals that errors result because it is difficult to manually analyze the calling context in which a particular portion of code may be invoked. In particular, the source and the sink may be separated by several intervening functions. Since our testing infrastructure instruments all string operations, we can count how far sources and sinks are removed from each other. In Figure 5.11, we graph the distribution of these lengths for a randomly selected sample of untrusted paths. It shows that a significant fraction of the chains are long, with over 200 of them exceeding 5 steps. Our data on the length of def-use chains is consistent with that reported in previous static-analysis-based work [71]. As explained earlier in this section, the sharing of dataflow paths can result in further ambiguity in distinguishing the context at the HTML output point in the server, as well as in distinguishing trusted data from untrusted data. In our investigation we observed the following cases:

• A single sanitizer was applied in a context that did not match. Typically, the sanitizer was applied in a different function from the one that constructed the HTML template. This suggests that developers may not fully understand how the context (a global property) impacts the choice of sanitizer (a local property). This is not surprising, given the complexity of choices in Figure 5.5.

• A sanitizer was applied to trusted data (on 1.4% of the paths in our experiment). We still report these cases because they point to developer confusion. On further investigation, we determined this was because the sinks corresponding to these executions were shared by several dataflow paths. Each such sink node could output potentially untrusted data on some executions, while outputting purely trusted data on others.

• More than one sanitizer was applied, but the applied sanitizers were not correct for the browser parsing context of the data [2].

Inconsistent Multiple Sanitization. We found 3,245 paths with more than one sanitizer. Of these, 285 (about 8%) of the paths with multiple sanitization were inconsistent with the context. The inconsistent paths fell into two categories: first, we found 273 instances

with the (EcmaScriptStringLiteralEncode)(HtmlEncode)+ pattern applied. As we saw in Section 5.3, these sanitizers do not commute, and this specific order is inconsistent. Second, we found 12 instances of the (EcmaScriptStringLiteralEncode)(UrlPathEncode)+ pattern. This pattern is inconsistent because it does not properly handle sanitization of URL parameters: if an adversary controls the sanitized data, they may be able to inject additional parameters.

We found an additional 498 instances of multiple sanitization that were superfluous. That is, sanitizer A was applied before sanitizer B, rendering sanitizer B superfluous. While not a security bug, this multiple sanitization could break the intended functionality of the applications. For example, repeated use of UrlKeyValueEncode could lead to multiple percent-encoding, causing broken URLs; repeated use of HtmlEncode could lead to multiple HTML entity-encoding, causing incorrect rendering of the output HTML.

We also observed that nesting of parsing contexts is common. For example, a URL may be nested within an HTML attribute. Figure 5.12 shows the histogram of sanitizer sequence lengths observed. The inferred context for a majority of these sinks demanded the use of multiple sanitizers. Figure 5.9 shows that the use of multiple sanitizers in the application is widespread, with sanitizer sequences such as UrlPathEncode followed by HtmlEncode being the most popular. In our application, these sanitizers are not commutative, i.e., they produce different outputs if composed in different orders, which means that paths with different orderings produce different behavior.

Because our testing uses a dynamic technique [97], all paths found can be reproduced with test cases exhibiting the context-inconsistent sanitization. We investigated a small fraction of these test cases in more depth. We found that while the sanitization is in fact inconsistent, injecting strings in these contexts did not lead to privilege-escalation attacks. In part this is because our testing methodology (positive tainting [97]) is conservative: if we cannot prove a string is safe, we flag the path. In other cases, the adversary's authority and the policy of the test application made it impossible to exploit the inconsistency.

[1] Each path here refers to a dataflow path, which starts at a program point where an untrusted input enters the application and ends in a critical operation which writes HTML to the output stream.
[2] Errors where the combination was correct but the ordering was inconsistent with the nested context are reported separately as inconsistent multiple sanitization errors.
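The functionality breakage from superfluous sanitization is easy to reproduce. The sketch below uses a hypothetical minimal HtmlEncode (the real routine encodes more characters):

```javascript
// Hypothetical minimal HtmlEncode; the real sanitizer covers more characters.
const htmlEncode = (s) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
   .replace(/>/g, "&gt;").replace(/"/g, "&quot;");

const once = htmlEncode("Tom & Jerry");
const twice = htmlEncode(once); // superfluous second application

console.log(once);  // Tom &amp; Jerry      -- browser renders: Tom & Jerry
console.log(twice); // Tom &amp;amp; Jerry  -- browser renders: Tom &amp; Jerry
```

The double-encoded output is not a security hole, but the literal text "&amp;" leaks into the rendered page.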

5.4 Conclusion

We conclude that context-consistent sanitization is error-prone when done manually by developers. Expressive languages, such as those of the .NET platform, permit the use of string operations to construct HTML output as strings, with trusted code intermixed with untrusted data. Moreover, these rich programming languages allow developers to write complex data-flow and control-flow logic. We summarize the resulting features of current programming environments as follows:

• String outputs. String outputs contain trusted constant code fragments mixed with untrusted data.

• Nested contexts. Untrusted data is often embedded in nested contexts.

• Intersecting data-flow paths. Data variables are used in conflicting or mismatched contexts along two or more intersecting data-flow paths.

• Custom output controls. Frameworks such as .NET encourage reuse of output-rendering code by providing built-in “controls”, which are classes that render untrusted inputs as HTML. Large applications extensively define custom controls, perhaps because they find the built-in controls insufficient. The running example is typical of such real-world applications: it defines its own custom control, DynamicLink, to render user-specified links via JavaScript.

A large fraction of web application frameworks leave the task of sanitization to developers. Some web frameworks take on the onus of placing sanitizers in application code; however, many of these ignore a key requirement of sanitization: context sensitivity. Instead, they apply the same sanitizer to all untrusted data outputs, which can provide a false sense of security. A small fraction of recent web frameworks support context-sensitive auto-sanitization; however, their security properties and performance characteristics have not been evaluated in prior research.

Web Page   Sanitized Paths   Inconsistently Sanitized   Highlighted as Errors
Home              396              14                        9
A1 P1             565              28                       22
A1 P2             336              16                       11
A1 P3             992              26                       21
A1 P4             297              44                       35
A1 P5             482              22                       17
A1 P6             436              23                       18
A1 P7             403              19                       13
A1 P8             255              22                       18
A1 P9             214              16                       12
A1 P10          1,623              18                       14
A2 P1             315              16                       12
A2 P2             736              53                       47
A2 P3             261              21                       16
A2 P4             197              16                       12
A2 P5             182              22                       18
A2 P6             237              22                       18
A2 P7             632              20                       16
A2 P8             450              23                       19
A2 P9             802              26                       22
A3 P1             589              25                       21
A3 P2           2,268              18                       14
A3 P3             389              16                       12
A3 P4             477             103                       15
A3 P5             323              24                       20
A3 P6             292              51                       45
A3 P7             219              16                       12
A3 P8             691              25                       21
A3 P9             173              16                       12
A4 P1             301              24                       20
A4 P2             231              30                       25
A4 P3             271              28                       22
A4 P4             436              38                       32
A4 P5             956              36                       24
A4 P6             193              24                       18
A4 P7             230              36                       32
A4 P8             310              24                       20
A4 P9             200              24                       18
A4 P10            208              24                       20
A4 P11            498              34                       29
A4 P12            579              34                       29
A4 P13            295              25                       20
A4 P14            591             104                       91
A5 P1             604              61                       55
A5 P2             376              25                       21
A5 P3             376              25                       21
A5 P4             401              26                       21
A5 P5             565              31                       26
A5 P6             493              34                       29
A5 P7             521              34                       29
A5 P8             427              24                       20
A5 P9             413              24                       20
A5 P10            502              28                       23
Total          25,209           1,558                    1,207

Figure 5.10: Characterization of the fraction of the paths that were inconsistently sanitized. The right-most column indicates paths highlighted as errors by our analysis.


Figure 5.11: Distribution of lengths of paths that could not be proved safe. Each hop in the path is a string propagation function. The longer the chain, the more removed are taint sources from taint sinks.

Figure 5.12: Distribution of the lengths of applied sanitization chains, showing that a sizable fraction of the paths have more than one sanitizer applied: length=1: 16,949 paths; length=2: 2,948; length=3: 1,093; length>3: 104.

Chapter 6

Securing Sanitization-based Defense

In the previous chapter, we outlined some of the challenges of employing a sanitization-based defense against scripting attacks. Our observation is that many web application frameworks and real-world applications employ incorrect placement of sanitizers in the application code. In this chapter, we investigate a systematic technique to automatically place sanitizers in web application code. We propose a technique that can be utilized by web templating frameworks, though the general idea can be extended to other programming languages too. We begin by introducing web templating frameworks and then give an overview of our technique and its assumptions.

Web Templating Frameworks. To streamline output generation from application code, numerous web templating frameworks have recently emerged and are gaining widespread adoption [46, 27, 29, 30, 44, 103, 95, 32, 137, 133, 60, 112, 109]. Web templating frameworks allow developers to specify their application's output-generation logic in code units or modules called templates. Templates take untrusted inputs, which may be controlled by the attacker, and emit web application outputs, such as HTML or CSS code, as strings. String outputs from templates are composed of static or constant strings written by developers, which are explicitly trusted, and untrusted inputs, which must be sanitized. These templates can be compiled into a target language, such as JavaScript or Java/PHP, as code functions that take untrusted data as template arguments and emit the application's output as strings. Templates are written in a different language, called a templating language, whose semantics are much simpler than that of the target language. Notably, complex constructs such as JavaScript's eval and document.write are not included in the templating language. Code external to templates is responsible for invoking compiled templates to obtain the string outputs and evaluating/rendering them in the browser.

Vision.
Ideally, we would like to create web applications that are secure by construction. Web templating frameworks offer an ideal opportunity to relieve developers of the burden of manual sanitization by auto-sanitizing: inserting sanitization primitives automatically during the compilation of templates to server-side or client-side code. Despite this ideal opportunity, research so far has not broached the topic of building auto-sanitization defenses into today's commercial templating frameworks.

Challenges. In this work, we first identify the following practical challenges in building reliable and usable auto-sanitization into today's web templating frameworks:

• Context-sensitivity. XSS sanitization primitives vary significantly based on the context in which the sanitized data is rendered. For instance, applying the default HTML escaping sanitizer is recommended for untrusted values placed inside the HTML tag content context [129]; however, for the URL attribute context (such as src or href) this sanitizer is insufficient, because the javascript URI protocol (possibly masked) can be used to inject malicious code. We say, therefore, that each sanitization primitive matches a context in which it provides safety. Many developers fail to consistently apply the sanitizers matching the context, as highlighted in Chapter 5.

• Complexity of language constructs. Templating languages today permit a variety of complex constructs: string data-type operations, control-flow constructs (if-else, loops), and calls to splice the output of one template into another. In such languages, untrusted input variables may be used in one context along one execution path and a different context along another path. With such rich language features, determining the context for each use of untrusted input variables becomes a path-sensitive, global data-flow analysis task. Automatically applying correct sanitization on all paths in templating code becomes challenging.

• Backwards compatibility with existing code. Developers may have already applied sanitizers in existing template code at arbitrary places; an auto-sanitization mechanism should not undo existing sanitization unless it is unsafe. For practical adoption, auto-sanitization techniques should only supplement missing sanitizers or fix incorrectly applied ones, without placing unnecessary restrictions on where to sanitize data.

• Performance overhead. Auto-sanitized templates should have minimal performance overhead. Previous techniques propose parsing template outputs with a high-fidelity HTML parser at runtime to determine the context [16]. However, the overhead of this mechanism may be high and undesirable for many practical applications.

Context-sensitive Auto-sanitization Problem. We observe that the set of contexts in which applications commonly embed untrusted data is known [125], and we assume that for each such context a matching sanitizer is externally provided. Extensive recent effort has focused on developing libraries of safe, correctly implemented sanitization primitives [66, 99, 129, 69, 53, 51]. We propose to develop an automatic system that, given a template and a library of sanitizers, automatically sanitizes each untrusted input with a sanitizer that matches the context in which it is rendered. By auto-sanitizing templates in this context-sensitive way, in addition to enforcing the security properties we outline in Section 6.1, templating systems can ensure that scripting attacks never result from using template outputs in their intended contexts.
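Concretely, the externally provided library can be thought of as a map from contexts to matching sanitizers. The sketch below uses hypothetical context names and deliberately simplified routines:

```javascript
// Sketch: a context-to-sanitizer map, as the problem statement assumes.
// Context names and routines are hypothetical and deliberately simplified.
const sanitizers = {
  HTML_PCDATA: (s) => s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
                       .replace(/>/g, "&gt;"),
  HTML_ATTR:   (s) => s.replace(/&/g, "&amp;").replace(/"/g, "&quot;"),
  URI_COMPONENT: (s) => encodeURIComponent(s),
};

// An auto-sanitizer picks the routine by the inferred context of the use:
const sanitizeFor = (ctx, value) => sanitizers[ctx](value);

console.log(sanitizeFor("HTML_PCDATA", "<b>hi</b>"));
```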

Our Approach & Contributions. In this chapter, we address the outlined challenges with a principled approach:

• Type Qualifier Based Approach. We propose a type-based approach to automatically ensure context-sensitive sanitization in templates. We introduce context type qualifiers, a kind of type qualifier that represents the context in which untrusted data can be safely embedded. Based on these qualifiers, which refine the base type system of the templating language, we define a new type system. Type safety in our type system guarantees that well-typed templates have all untrusted inputs context-sensitively sanitized.

• Type Inference during Compilation. To transform existing developer-written templates into well-typed templates, we develop a Context-Sensitive Auto-Sanitization (CSAS) engine which runs during the compilation stage of a web templating framework. The CSAS engine performs two high-level operations. First, it performs static type inference to infer context type qualifiers for all variables in templates. Second, based on the inferred context types, the CSAS engine automatically inserts sanitization routines into the generated server-side or client-side code. To the best of our knowledge, our approach is the first principled approach using type qualifiers and type inference for context-sensitive auto-sanitization in templates.

• Real-world Deployability. To show that our design is practical, we implement our type system in Google Closure Templates, a commercially used open-source templating framework that is used in large applications such as GMail, Google Plus and Google Docs. Our implementation shows that our approach requires less than 4,000 lines of code to be built into an existing commercial web framework. Further, in our experiments we find that retrofitting our type system to existing Google Closure templates used in commercial applications requires no changes or annotations to existing code.

• Improved Security.
Our approach eliminates critical drawbacks of the existing approaches to auto-sanitization in today's templating frameworks. Though all the major web frameworks today support customizable sanitization primitives, a majority of them do not automatically apply them in templates, leaving this error-prone exercise to developers. Most others automatically sanitize all untrusted variables with the same sanitizer in a context-insensitive manner, a fundamentally unsafe design that provides a false sense of security [125]. Google AutoEscape, the only context-sensitive abstraction we are aware of, does not handle the richness of language features we address. We refer readers to Section 6.7 for a detailed comparison.

• Fast, Precise and Mostly Static Approach. We evaluate our type inference system on 1,035 existing real-world Closure templates. Our approach incurs a practical performance overhead of 3-9.6% on CPU-intensive benchmarks. In contrast, the alternative runtime parsing approach incurs 78%-510% overhead on the same benchmarks. Our approach performs all parsing and context type inference statically and so achieves significantly better performance. Nor does our approach sacrifice precision in context determination compared to the runtime parsing approach: it defers context-sensitive sanitization to runtime for the small fraction of output operations for which pure static typing is too imprecise. Hence, our type system is mostly static, yet precise.

Scope. In this chapter, we limit our study to a small class of web applications written in the Google Closure templating framework. We believe the property we define here, namely context-sensitive sanitization, is necessary to achieve safety in sanitization-based defenses but may not be sufficient for all scenarios. For example, web applications often vary the sanitization based on the role or authority of the user who controls the sanitized input, application-specific trust policies, and so on. We consider these out of scope for the present work. Extending the ideas presented here to handle such real-world features is a promising direction for future work.

6.1 Problem Definition

The task of auto-sanitization is challenging because state-of-the-art templating frameworks do not restrict templates to straight-line code. In fact, most templating frameworks today permit control-flow constructs and string data operations to allow application logic to conditionally alter the template output at runtime. To illustrate the issues, we describe a simple templating language that captures the essence of the output-generating logic of web applications. We motivate our approach by showing the various challenges that arise in a running example written in our templating language.

A Simple Templating Language

Our simple templating language is expressive enough to model Google Closure Templates and several other frameworks. We use this templating language to formalize and describe our type-based approach in later sections. It is worth noting that the simple templating language we present here is only an illustrative example; our type-based approach is more general and can be applied to other templating languages as well.

The syntax for the language is presented in Figure 6.1. The templating language has two kinds of data types in its base type system: the primitive types (string, bool, int) and a special type (denoted η) for output buffers, which are objects to which templates write their outputs. Figure 6.2(A) shows a running example in our templating language. For simplicity, we assume in our language that there is a single, global output buffer to which all templates append their output, similar to the default model in PHP.

    Base Types    α ::= β | η | β1 → β2 → … → βk → unit
                  β ::= bool | int | string | unit
    Commands      S ::= print (e : β) | (v : β) := (e : β)
                      | callTemplate f (e1, …, ek)
                      | S1 ; S2
                      | if (e : bool) then S1 else S2 fi
                      | while (e : bool) S1
                      | return;
    Expressions   e ::= (e1 : int) ⊕ (e2 : int)
                      | (e1 : bool) ⊗ (e2 : bool)
                      | (e1 : string) · (e2 : string)
                      | const (i : β) | v : β | San (f, e : β)
                  v ::= Identifier

Figure 6.1: The syntax of the simple templating language. ⊕ represents the standard integer and bitvector arithmetic operators, ⊗ represents the standard boolean operations, and · is string concatenation. The San expression syntactically refers to applying a sanitizer.

Command Semantics. The primary command in the language is the print command, which appends the value of its only operand as a string to the output buffer. The running example has several print commands. Note that the syntax ensures that the output buffer (the η-typed object) cannot be reassigned or otherwise tampered with in the rest of the command syntax. Templates are akin to functions: they can call or invoke other templates via the callTemplate command. This command allows a template to invoke another template during its execution, thereby splicing the callee's outputs into its own. Parameter passing follows standard pass-by-value semantics. The templating language allows control-flow commands such as while and if-then-else to allow dynamic construction of template outputs. It supports the usual boolean and integer operations as well as string concatenation. We exclude more complex string-manipulation operations, like string substitution and interpolation functions, from the simple language; with simple extensions, their semantics can be modeled as string concatenations [99].

Restricting Command Semantics. The semantics of the templating language is much simpler than that of a general-purpose language that templates may be compiled to. Notably, the templating language does not have any dynamic evaluation commands such as JavaScript's eval or document.write. Final code evaluation in DOM evaluation constructs, or serialization to the HTTP response stream, is therefore performed by external application code. For instance, Figure 6.3 below shows JavaScript application code, written outside the templating language, which invokes the function compiled from the running example template. It renders the returned result string dynamically using a document.write. Therefore, the template code analysis does not need to model the complex semantics of document.write.[1]
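As a sketch of these semantics (in JavaScript, with assumed names), a compiled template only ever appends to the single global output buffer, and a template call is just a function call that splices the callee's output into the same buffer:

```javascript
// Sketch of the command semantics with assumed names: print appends to a
// single global output buffer; callTemplate is a function call that
// splices the callee's output into that same buffer.
let outputBuffer = "";                    // the single η-typed buffer
const print = (s) => { outputBuffer += String(s); };

function inner(name) {                    // callee template
  print("<b>"); print(name); print("</b>");
}
function outer(name) {                    // caller template
  print("Hello, ");
  inner(name);                            // callTemplate inner(name)
}

outer("world");
console.log(outputBuffer); // Hello, <b>world</b>
```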

    template contactRender($imgLink, $name) {
      ...
      print("..." . $name . "...");
      ...
      return;
    }
    (A)

    $name in the output buffer:
    PCDATA context | URI START context | URI PATH context | URI QUERY context | PCDATA context
    (B)

Figure 6.2: (A) shows the template used as the running example. (B) shows the output buffer after the running example has executed the path including the true branch of the if statement.

Figure 6.3: Pseudo-code of how external application code, such as client-side Javascript, can invoke the compiled templates.
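A minimal sketch of such external invocation code follows (hypothetical names; the stub stands in for a function compiled from a template):

```javascript
// Sketch (assumed names): external JavaScript invokes the compiled template
// and renders the returned string. In a browser the last line would be
// document.write(html); here we simply print the string.
function contactRender(data) {
  // stub standing in for the function compiled from the template
  return '<a href="' + encodeURI(data.imgLink) + '">' + data.name + "</a>";
}

const html = contactRender({ imgLink: "/img/prof.png", name: "Alice" });
// document.write(html);   // browser-side rendering
console.log(html);
```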

Problem Definition & Security Properties In this chapter, we focus on the following problem: given a templating language such as the one in Section 6.1, and a set of correct sanitization routines for different contexts, the goal is

1The semantics of document.write varies based on whether the document object is open or closed. CHAPTER 6. SECURING SANITIZATION-BASED DEFENSE 91 to automatically apply the correct sanitization primitives during compilation to all uses of untrusted inputs in constructing template outputs, while satisfying the following properties. Property NOS: No Over-Sanitization. The templating language allows string expressions to be emitted at print operations. String expressions may be constructed by concatenation of constant/static strings and untrusted input variables; only the latter should be sanitized or else we risk breaking the intended structure of the template output. For instance in our running example, the auto-sanitization engine should not place a sanitizer at the statement print ($x), because the expression x consists of a constant string as well as untrusted input value. Sanitizing at this print statement may strip out the / or ? characters rendering the link unusable and breaking the intended structure of the page. Property CSAN: Context-Sensitive Sanitization. Each untrusted input variable should be sanitized with a sanitizer matching the context in which it is rendered in. However, this is challenging because untrusted inputs may be used in two different contexts along two different paths. In our running example, the $imgLink variable is used both in a URI context as well as a HTTP parameter context, both of which have different sanitization requirements. Similarly, untrusted inputs can be rendered in two different contexts even along the same path, as seen for the variable $name in Figure 6.2 (B). We term such use of inputs in multiple contexts as a static context ambiguity, which arise because of path-sensitive nature of the template output construction logic and because of multiple uses of template variables. Section 6.3 describes further scenarios where context ambiguity may arise. Property CR: Context Restriction. 
Template developers should be forbidden from mistakenly using untrusted values in contexts other than those for which matching sanitizers are available. Certain contexts are known to be hard to sanitize, such as an unquoted JavaScript string literal placed directly in a JavaScript eval [94], and should thus be forbidden.

Determining Final Output Start/End Context. For each template, we infer the contexts in which the template's output can be safely rendered. However, since the final output is used outside the template code, guaranteeing that external code uses the output in an intended context is beyond the scope of our problem. For example, it is unsafe for external code to render the output of the running example in a JavaScript eval, but such properties must be externally checked.
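The over-sanitization risk behind Property NOS can be seen concretely: sanitizing a whole mixed expression mangles the trusted constant part. In this sketch, encodeURIComponent stands in for a URI sanitizer and the names are hypothetical:

```javascript
// Sketch: over-sanitizing at print($x). The variable mixes a trusted
// constant prefix with untrusted input, so sanitizing the whole expression
// also escapes the trusted '/' and '?' characters and breaks the link.
const name = "bob";                        // untrusted input
const x = "/user?usr=" + name;             // trusted prefix + untrusted part

const overSanitized = encodeURIComponent(x);            // breaks the URL
const correct = "/user?usr=" + encodeURIComponent(name); // sanitize only name

console.log(overSanitized); // %2Fuser%3Fusr%3Dbob
console.log(correct);       // /user?usr=bob
```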

Motivation for Our Approach

If a templating language has no control-flow or callTemplate constructs and no constructs to create string expressions, all templates would be straight-line code with prints of constant strings or untrusted variables. Auto-sanitizing such templates is a straightforward three-step process: (a) parse the template statically using a high-fidelity parser (like HTMLPurify [53]), (b) determine the context at each print of untrusted inputs, and (c) apply the matching sanitizer to it. Unfortunately, real templating languages, like ours, are richer, and more sophisticated techniques are needed.

One possible extension of the approach for straight-line code is to defer the steps of parsing and determining contexts to runtime execution [16]. We call this a context-sensitive runtime parsing (or CSRP) approach: a parser parses all output from the compiled template, determines the context of each print of untrusted input, and sanitizes it at runtime. This approach has additional performance overhead due to the cost of parsing all application output at runtime, as previously shown [16] and as we evaluate in Section 6.6. If string operations are supported in the language, the performance penalty may be exacerbated by the need to track untrusted values during execution. Instead, we propose a new "mostly static" approach which off-loads expensive parsing steps to a static type analysis phase. Contexts for most uses of untrusted data can be statically determined and their sanitizers can be selected at compile time; only a small fraction need the more expensive CSRP-like sanitizer selection in our approach, hence our approach is "mostly static".

Assumptions. Our type-based approach relies on a set of assumptions which we summarize below:

1. Canonical Parser. To reliably determine the contexts in which untrusted inputs are rendered, constant/static strings in templates must parse according to a canonical grammar which parses the same way across major browsers. This restriction is necessary to ensure that our context determination is consistent with the actual parsing in the client's browser, which is challenging because browser parsing behaviors vary in idiosyncratic ways. In our approach, templates not complying with our canonical grammar do not typecheck as per the type rules defined in Section 6.3. Google AutoEscape-based frameworks such as GWT and CTemplate already tackle the practical issue of developing such a canonical grammar [46, 30, 27]; our engine leverages this existing code base.

2. Sanitizer Correctness. As mentioned previously, we assume that a set of contexts in which applications commonly render untrusted inputs is known and their matching sanitizers are externally available. Creating sanitizers that work across major browser versions is an orthogonal challenge being actively researched [51, 53].

3. End-to-End Security. As explained earlier, if external code renders the template outputs in an unintended context or tampers with the template's output before emitting it to the browser, end-to-end security is not guaranteed. Ensuring correctness of external code that uses template outputs is beyond the scope of the problem we focus on here; lint tools, static analysis, and code conformance testing can help enforce this discipline externally.

6.2 Our Approach

In our type-based approach, we enforce the aforementioned security properties by attaching to (or qualifying) variables and expressions in templates a new kind of qualifier which we call the context type qualifier. Type qualifiers are a formal mechanism to extend the basic type safety of a language to enforce additional properties [38]. Context type qualifiers play different roles for the various expressions they qualify. For an untrusted input variable, the context type qualifier captures the contexts in which the variable can be safely rendered. An untrusted input becomes safe for rendering in a certain context only after it is sanitized by a sanitizer matching that context. Unsanitized inputs have the UNSAFE qualifier attached, and are not safe to be part of any expression that is used in a print statement. For constant/static string expressions, context type qualifiers capture the result of parsing the expression, that is, the start context in which the expression will validly parse and the context that results after parsing it. When the template code constructs an output string expression by concatenating a constant string and an untrusted input, a type rule over the context qualifiers of the two strings ensures that the untrusted input is only rendered in contexts for which it is sanitized. This rule enforces the CSAN property only for the concatenation operation; several additional rules are needed to enforce all the outlined security properties across all operations in our templating language. We describe the full type system with formal type rules over context type qualifiers in Section 6.3. The type safety of the type system implies that the security properties outlined in Section 6.1 are enforced.

CSAS Engine. The input to our auto-sanitization engine is an existing template which may be completely devoid of sanitizers. We call these templates untyped or vanilla templates.
The task of our auto-sanitization engine is two-fold: (a) to convert untyped or vanilla templates into an internal representation (or IR) complying with our type rules (called the well-typed IR), and (b) to compile the well-typed IR to the target language code with sanitization. We develop a CSAS engine in the compiler of a templating framework to handle these tasks. Figure 6.4 shows the CSAS architecture. It has two high-level steps: (A) type qualifier inference, and (B) compilation of CSAS templates. The qualifier inference step transforms the vanilla template into a well-typed IR and automatically infers the type qualifiers for all program expressions in the IR. The inferred type qualifiers exactly determine where and which sanitizers are required for untrusted inputs. The well-typed IR must conform to the type rules that we define in Section 6.3. Step (B) compiles the well-typed IR and inserts sanitization primitives and additional instrumentation into the final compiled code. The detailed design of the CSAS engine is presented in Section 6.4.
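To make the qualifier mechanism concrete, a minimal Python sketch of two of the context type qualifiers and of the concatenation rule over statically qualified strings; the class names and context strings are illustrative, not the engine's actual implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unsafe:
    """UNSAFE: an unsanitized untrusted input, not printable in any context."""

@dataclass(frozen=True)
class Static:
    """STATIC_{c1 -> c2}: parsing the string in context c1 ends in context c2."""
    start: str
    end: str

def concat_qual(q1, q2):
    """Qualifier of the concatenation of two statically qualified strings:
    the end context of the left operand must equal the start context of the
    right operand, and UNSAFE operands are rejected until promoted."""
    if isinstance(q1, Unsafe) or isinstance(q2, Unsafe):
        raise TypeError("UNSAFE operand must be promoted (sanitized) first")
    if q1.end != q2.start:
        raise TypeError("context mismatch: %s vs %s" % (q1.end, q2.start))
    return Static(q1.start, q2.end)
```

The rejection of UNSAFE operands is what forces the inference engine to place a sanitizer (a type promotion) before any untrusted value can flow into an output expression.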

[Figure: (A) Type inference: convert the untyped template to an untyped IR, add candidate type promotions, then generate and solve type constraints. (B) Compilation: compile the well-typed IR, inserting sanitizers at type promotions.]

Figure 6.4: Overview of our CSAS engine.

6.3 The Context Type System

In this section, we formally describe our type qualifier mechanism that refines the base type system of the language defined in Section 6.1. We define the associated type rules to which well-typed IR code must conform after the type inference step.

Key Design Points

The CSAS engine must track the parsing context induced by the application's output at each program point. Each string expression, when parsed by the browser, causes the browser to transition from one context to another. We term this transition induced on the browser by parsing strings a context transition, denoted c1 ↪ c2.

Notion of Context Type Qualifiers. Qualifiers play different roles for different kinds of expressions. For constant/static strings, the context type qualifier captures the context transition the string induces when parsed by our canonical grammar.

template StatAmb($imgLink, $name) {
  if ($name == "") then print ("<img src=\""); fi
  print ($imgLink);
  if ($name == "") then print ("\" />");
  else print ("<br/>"); fi;
  return;
}

Figure 6.6: An example template requiring a mixed static-dynamic approach.

Handling Context Ambiguity with Flow-Sensitivity. Untrusted inputs may be used in different contexts along different, or even the same, program paths. This leads to context ambiguity, as explained in Section 6.1. A standard flow-insensitive type inference algorithm would infer that such an untrusted input has no single precise context qualifier because of its ambiguous usage in multiple different contexts. To handle such context ambiguity, we design our type system to be flow-sensitive: a flow-sensitive type system permits program variables to have varying type qualifiers at different program locations [38].

Mixed Static-Dynamic Typing. Flow-sensitive typing does address static ambiguity to a large extent, but not all of the cases we observe in practice. Consider a template such as the one in Figure 6.6: along one path the output buffer enters an attribute value context before $imgLink is printed, while along the other it stays in the HTML text context, so the context at the join point is ambiguous.
\[
\frac{\alpha \ne \mathrm{string} \quad c \in C}{\Gamma \vdash \mathrm{const}(i : \alpha) : \mathrm{STATIC}_{c \hookrightarrow c}}\ \textsc{t-const}
\qquad
\frac{v \in V \quad \{v \mapsto Q\} \in \Gamma}{\Gamma \vdash v : Q}\ \textsc{t-var}
\]

\[
\frac{IsParseValid(s, c_1, c_2)}{\Gamma \vdash \mathrm{const}(s : \mathrm{string}) : \mathrm{STATIC}_{c_1 \hookrightarrow c_2}}\ \textsc{t-conststr}
\]

\[
\frac{\Gamma \vdash e_1 : \mathrm{STATIC}_{c \hookrightarrow c} \quad \Gamma \vdash e_2 : \mathrm{STATIC}_{c \hookrightarrow c} \quad c \in C}{\Gamma \vdash (e_1 : \mathrm{bool}) \otimes (e_2 : \mathrm{bool}) : \mathrm{STATIC}_{c \hookrightarrow c}}\ \textsc{t-bool}
\]

\[
\frac{\Gamma \vdash e_1 : \mathrm{STATIC}_{c \hookrightarrow c} \quad \Gamma \vdash e_2 : \mathrm{STATIC}_{c \hookrightarrow c} \quad c \in C}{\Gamma \vdash (e_1 : \mathrm{int}) \oplus (e_2 : \mathrm{int}) : \mathrm{STATIC}_{c \hookrightarrow c}}\ \textsc{t-int}
\]

\[
\frac{\Gamma \vdash e_1 : \mathrm{STATIC}_{c_1 \hookrightarrow c_2} \quad \Gamma \vdash e_2 : \mathrm{STATIC}_{c_2 \hookrightarrow c_3}}{\Gamma \vdash (e_1 : \mathrm{string}) \cdot (e_2 : \mathrm{string}) : \mathrm{STATIC}_{c_1 \hookrightarrow c_3}}\ \textsc{t-strcat-stat}
\qquad
\frac{\Gamma \vdash e : \mathrm{UNSAFE} \quad SanMap(c_1 \hookrightarrow c_2, f) \quad c_1, c_2 \in C}{\Gamma \vdash San(f, e) : \mathrm{STATIC}_{c_1 \hookrightarrow c_2}}\ \textsc{t-san}
\]

\[
\frac{IsParseValid(s, c_1, c_2)}{\Gamma \vdash \mathrm{const}(s : \mathrm{string}) : \mathrm{DYN}_{\{c_1 \hookrightarrow c_2\}}}\ \textsc{t-cstrdyn}
\qquad
\frac{\Gamma \vdash e_1 : \mathrm{DYN}_{S_1} \quad \Gamma \vdash e_2 : \mathrm{DYN}_{S_2}}{\Gamma \vdash (e_1 : \mathrm{string}) \cdot (e_2 : \mathrm{string}) : \mathrm{DYN}_{S_1 \bowtie S_2}}\ \textsc{t-strcat-dyn}
\]

Figure 6.7: Type Rules for Expressions.

as STATIC_{c1↪c2}) expressions, and these sanitizers can be placed during compilation. For dynamically-qualified expressions, however, since the context of the output buffer is known only at runtime, sanitizer selection is performed by the CSRP approach. Specifically, the CSAS engine inserts additional instrumentation for dynamically-qualified string expressions to keep the untrusted substrings in the expression separate from constant substrings. At runtime, when such an expression is used in a print, it is parsed as per the dynamically-determined start context and the necessary sanitization primitives are applied to the untrusted substrings. In our evaluation, less than 1% of the expressions were dynamically-qualified; the large majority of cases do not incur the cost of runtime parsing, enabling our type system to be "mostly static".

Handling Context Ambiguity for Templates. Static context ambiguity may manifest for template start and end contexts as well. A template may be invoked in multiple starting contexts or may be expected to return in multiple ending contexts. In such cases, our CSAS engine resolves the ambiguity purely statically, by cloning templates. For templates that may start or end in more than one context, the CSAS engine generates multiple versions of the template during compilation, each specialized to handle a specific pair of start and end contexts.

Inferring Placement of Sanitizers. Our engine can insert sanitizers into code in which developers have manually applied some sanitizers (chosen from the sanitization library), without undoing existing sanitization if it is correct. Our type rules require additional sanitizers to be inserted only at print statements and at type promotions. Type promotion operations identify points where expressions need to be converted from UNSAFE-qualified types to statically- or dynamically-qualified types. These type promotion commands have the form v := (Q)e, where Q is a qualified type; they are introduced by the CSAS engine when converting templates into the IR. Note that this design separates the type inference task from the type safety rules: type promotions may be added anywhere in the IR by the type qualifier inference algorithm, as long as the resulting IR conforms to the type rules after inference.

\[
\frac{v \in V \quad \Gamma \vdash e : Q}{\Gamma \vdash v := e \Longrightarrow \Gamma[v \mapsto Q]}\ \textsc{t-assign}
\qquad
\frac{\Gamma \vdash e : Q \quad Q \le Q'}{\Gamma \vdash v_1 := (Q')\,e \Longrightarrow \Gamma[v_1 \mapsto Q']}\ \textsc{t-prom}
\qquad
\frac{\Gamma_0 \vdash c_1 \Longrightarrow \Gamma_1 \quad \Gamma_1 \vdash S \Longrightarrow \Gamma_2}{\Gamma_0 \vdash c_1;\, S \Longrightarrow \Gamma_2}\ \textsc{t-seq}
\]

\[
\frac{\Gamma \vdash e : \mathrm{STATIC}_{c_1 \hookrightarrow c_2} \quad \Gamma \vdash \rho : \mathrm{CTXSTAT}_{c_1}}{\Gamma \vdash \mathrm{print}(e) \Longrightarrow \Gamma[\rho \mapsto \mathrm{CTXSTAT}_{c_2}]}\ \textsc{t-print-static-1}
\]

\[
\frac{\Gamma \vdash e : \mathrm{DYN}_{S_1} \quad \Gamma \vdash \rho : \mathrm{CTXDYN}_{S_2} \quad |CDom(S_1, C) \cap S_2| \ne 0}{\Gamma \vdash \mathrm{print}(e) \Longrightarrow \Gamma[\rho \mapsto \mathrm{CTXDYN}_{CRange(S_1, S_2)}]}\ \textsc{t-print-dyn-2}
\]

\[
\frac{\begin{array}{c}
\Gamma \vdash f : (Q_1, Q_2, \ldots, Q_k) \to [Q_\rho \to Q_{\rho'}] \quad \Gamma \vdash \rho : Q_\rho \quad Q_\rho = \mathrm{CTXSTAT}_{c_\rho} \quad Q_{\rho'} = \mathrm{CTXSTAT}_{c_{\rho'}} \quad c_\rho, c_{\rho'} \in C \\
\bigwedge_{i \in \{1 \ldots k\}} (\Gamma \vdash e_i : Q_i) \quad \bigwedge_{i \in \{1 \ldots k\}} \big( (Q_i \le \mathrm{STATIC}_{c_i \hookrightarrow c_{i'}}) \wedge (c_i \in C) \wedge (c_{i'} \in C) \big)
\end{array}}{\Gamma \vdash \mathrm{callTemplate}\ f(e_1, e_2, \ldots, e_k) \Longrightarrow \Gamma[\rho \mapsto \mathrm{CTXSTAT}_{c_{\rho'}}]}\ \textsc{t-call}
\]

\[
\frac{\Gamma \vdash \rho : \mathrm{CTXSTAT}_{c} \quad c \in C \quad \{\ell \mapsto f\} \in LF \quad \Gamma \vdash f : (Q_1, \ldots, Q_k) \to [Q_\rho \to Q_{\rho'}] \quad Q_{\rho'} = \mathrm{CTXSTAT}_{c}}{\Gamma \vdash \ell : \mathrm{return} \Longrightarrow \Gamma}\ \textsc{t-ret-stat}
\]

\[
\frac{\Gamma \vdash \rho : Q \quad Q = \mathrm{CTXDYN}_{S} \quad |S| = 1 \quad c \in S \quad c \in C \quad \{\ell \mapsto f\} \in LF \quad \Gamma \vdash f : (Q_1, \ldots, Q_k) \to [Q_\rho \to Q_{\rho'}] \quad Q_{\rho'} = \mathrm{CTXSTAT}_{c}}{\Gamma \vdash \ell : \mathrm{return} \Longrightarrow \Gamma[\rho \mapsto \mathrm{CTXSTAT}_{c}]}\ \textsc{t-ret-dyn}
\]

\[
\frac{\Gamma_0 \vdash S_1 \Longrightarrow \Gamma \quad \Gamma_0 \vdash S_2 \Longrightarrow \Gamma}{\Gamma_0 \vdash \mathrm{if}(e)\ \mathrm{then}\ S_1\ \mathrm{else}\ S_2 \Longrightarrow \Gamma}\ \textsc{t-ifelse}
\qquad
\frac{\Gamma \vdash S \Longrightarrow \Gamma}{\Gamma \vdash \mathrm{while}(e)\ S \Longrightarrow \Gamma}\ \textsc{t-while}
\]

Figure 6.8: Type Rules for Commands. The output buffer (of base type η) is denoted by the symbol ρ.

Static Type Rules

In this section, we define a set of type rules which impose static restrictions S0–S4 to achieve the three properties (CR, CSAN, and NOS) described in Section 6.1. The type system is subdivided into two main kinds of typing judgements: one for typing language expressions (Figure 6.7) and one for typing language commands (Figure 6.8). In our type rules, Γ denotes the type environment that maps program variables, the output buffer (denoted by the symbol ρ), and template name symbols to qualifiers Q. In a flow-sensitive type system like ours, the type qualifiers of variables change from one program location to another. Therefore, typing judgements for the language commands (Figure 6.8) capture the effects of command execution on type environments and have the form Γ ⊢ c ⟹ Γ′. This judgement states that the command c is well-typed under the type environment Γ and that its execution changes the type environment Γ to Γ′. The expression typing judgement Γ ⊢ e : Q is standard: it states that at the given program location, the expression e has a type qualifier Q under the environment Γ. All expressions that are neither statically-qualified nor dynamically-qualified map to UNSAFE in Γ. The set of declared variables V and a map LF from statement labels to their enclosing functions are assumed to be pre-computed and available externally.

Defining Sanitizer Correctness. The soundness of our type system relies on the correctness of externally provided sanitizers. To define sanitizer correctness more precisely, we reuse the notion of valid syntactic forms, formalized by Su et al. [107]. A sanitizer f is correct for a context transition cs ↪ ce if all strings sanitized with f are guaranteed to parse validly starting in context cs and yielding an end context ce according to our canonical grammar, and if the sentential forms generated during such a parse are valid syntactic forms as per the application's intended security policy [107]. In other words, sanitized strings can span different contexts, but all the intermediate contexts induced while parsing untrusted strings should be syntactically confined to non-terminals allowed by the application's policy. We assume that a relation SanMap, mapping each possible context transition to a matching sanitizer, is available externally.

S0: No Implicit Type Casts. Our type system separates UNSAFE-qualified, statically-qualified, and dynamically-qualified types. It does not permit implicit type conversions between them. Type qualifier conversions are only permitted through explicit type promotion operations, according to a promotibility relation ≤ defined in Figure 6.9.

\[
\frac{}{q \le q}
\qquad
\frac{c \in S \quad S \in 2^{C}}{\mathrm{CTXSTAT}_{c} \le \mathrm{CTXDYN}_{S}}
\qquad
\frac{c_1, c_2 \in C}{\mathrm{UNSAFE} \le \mathrm{STATIC}_{c_1 \hookrightarrow c_2}}
\qquad
\frac{S \in 2^{C \times C}}{\mathrm{UNSAFE} \le \mathrm{DYN}_{S}}
\]

Figure 6.9: The promotibility relation ≤ between type qualifiers.

Our promotibility relation differs from the standard subtyping relation (≼); for example, the following subsumption rule applies in standard subtyping, but our promotibility relation does not admit it:

\[
\frac{\Gamma \vdash e : Q_s \quad Q_s \preceq Q_t}{\Gamma \vdash e : Q_t}\ \textsc{t-sub}
\]

The static type qualifier-based restrictions S1 and S3 defined below together satisfy the no over-sanitization (NOS) property. Similarly, S2 ensures the context restriction (CR) property. S3 and S4 together satisfy the context-sensitivity (CSAN) property while maintaining a strict separation between dynamically-qualified and statically-qualified expressions.

S1: No Sanitization for Constants. The rules T-CONST, T-CONSTSTR, and T-CSTRDYN show that constant string values acquire their context type qualifiers without any sanitization. These values are program constants, so they are implicitly trusted.

S2: Canonical Parsing. The qualifier parameters (denoting the context transitions) for trusted constant strings are inferred by parsing them according to the canonical grammar. We assume the availability of such a canonical grammar (assumption 1 in Section 6.1), embodied in a predicate IsParseValid defined below.

Definition 1. IsParseValid is a predicate of type string × C × C → bool, such that IsParseValid(s, c1, c2) evaluates to true if and only if the string s parses validly, as per the assumed canonical grammar, starting in context c1 and yielding a final context c2.
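As a toy illustration of this predicate, a Python stand-in over a hypothetical two-context fragment (HTML text vs. inside a tag); the real canonical grammar models full, cross-browser HTML/CSS/JavaScript parsing behavior:

```python
# Toy stand-in for the IsParseValid predicate over two contexts only;
# the context names "PCDATA" and "TAG" are illustrative.
def is_parse_valid(s, c1, c2):
    """True iff the string s parses from start context c1 to end context c2
    under this toy grammar: '<' enters a tag, '>' leaves it."""
    ctx = c1
    for ch in s:
        if ctx == "PCDATA" and ch == "<":
            ctx = "TAG"
        elif ctx == "TAG" and ch == ">":
            ctx = "PCDATA"
    return ctx == c2
```

Even this toy version shows why the predicate is one-to-many: the same constant string can parse validly from more than one start context, which is the source of non-determinism the inference procedure later resolves by backtracking.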

S3: Safe String Expression Creation. The rules for concatenation do not permit strings qualified as UNSAFE to be used in concatenations, forcing the type inference engine to type-promote (and hence sanitize) operands before they can be used in concatenation operations. The T-STRCAT-STAT rule ensures that only statically safe strings can be concatenated, whereas the T-STRCAT-DYN rule constructs dynamically qualified strings. The latter rule conservatively over-approximates the set of context transitions the result could undergo at runtime. For over-approximating sets, we define an inner join S1 ⋈ S2 as the set of all context transitions c1 ↪ c2 such that c1 ↪ c3 ∈ S1 and c3 ↪ c2 ∈ S2.

S4: Context-Sensitive Output. The rules for print commands ensure that the emitted string expression cannot be UNSAFE-qualified. Further, the type rule T-PRINT-STATIC-1 ensures that the context type qualifier of the emitted string matches the context of the output buffer when both are statically-qualified. Only dynamically-qualified strings can be emitted to dynamically-qualified output buffers; a strict separation between dynamically and statically type-qualified expressions is maintained. The T-PRINT-DYN-2 type rule captures this case. It requires runtime parsing, as described in Section 6.3, to determine the precise context. The static type rules compute the resulting context set for the output buffer as an over-approximation, considering the context-transition set of the printed expression and the context set of the output buffer. To compute the resulting context set, we define two operations over a context-transition set S of a dynamically qualified type DYN_S:

CDom(S, E) = {c_i | c_i ↪ c_e ∈ S, c_e ∈ E}

CRange(S, B) = {c_i | c_s ↪ c_i ∈ S, c_s ∈ B}

Control Flow Commands. The type rules T-IFELSE and T-WHILE for control-flow operations are standard, ensuring that the type environment Γ resulting at join points is consistent. Whenever static context ambiguity arises at a join point, the types of the incoming values must be promoted to a dynamically-qualified type to conform to the type rules. Our type inference step (as Section 6.4 explains) introduces these type promotions at join points in the untyped IR, so that after type inference completes, the well-typed IR adheres to the T-IFELSE and T-WHILE rules.

Calls and Returns. In our language, templates do not return values but take parameters passed by value. In addition, templates have side effects on the global output buffer. For a template f, Γ maps f by name to a type (Q1, Q2, ..., Qk) → [Qρ → Qρ′], where (Q1, Q2, ..., Qk) denotes the expected types of its arguments and Qρ → Qρ′ denotes the side effect of f on the global output buffer ρ. The T-CALL rule imposes several restrictions. First, it enforces that each formal parameter either has a statically-qualified type or is promotible to one (by the relation ≤). Second, it ensures that the types of actual parameters and the corresponding formal parameters match. Finally, it enforces that each (possibly cloned) template starts and ends in statically precise contexts, by ensuring that Qρ and Qρ′ are statically-qualified. The output buffer ρ can become dynamically qualified within a template's body, as shown in the example of Figure 6.6, but the context of ρ must be precisely known at the return statement. In the example of Figure 6.6, the context of ρ is ambiguous at the join point of the first if-else block. However, at the return statement the dynamically qualified set of contexts becomes a singleton, that is, the end context is precisely known.
The T-RET-DYN rule applies in such cases and soundly converts the qualifier for ρ back to a statically-qualified type. For templates that do not start and end in precise contexts, our CSAS engine creates multiple clones of the template, as explained in Section 6.4, to force conformance to the type rules.
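The set operations used by the dynamic rules above can be sketched directly in Python, modeling context transitions as (start, end) pairs; the context names in the usage below are illustrative:

```python
# Set operations over context-transition sets, as used by the dynamic
# typing rules; a transition c1 -> c2 is modeled as the pair (c1, c2).
def inner_join(s1, s2):
    """S1 |><| S2: all c1 -> c2 such that c1 -> c3 is in S1 and c3 -> c2 is
    in S2 (over-approximates the transitions of a concatenation)."""
    return {(c1, c2) for (c1, m1) in s1 for (m2, c2) in s2 if m1 == m2}

def cdom(s, e):
    """CDom(S, E): start contexts of transitions in S whose end lies in E."""
    return {ci for (ci, ce) in s if ce in e}

def crange(s, b):
    """CRange(S, B): end contexts of transitions in S whose start lies in B."""
    return {ci for (cs, ci) in s if cs in b}
```

For example, joining {PCDATA ↪ TAG} with {TAG ↪ PCDATA} yields {PCDATA ↪ PCDATA}, matching how two dynamically qualified strings compose under concatenation.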

Sanitization

Handling manually placed sanitizers. The T-SAN rule converts the type of the expression e in the sanitization expression San(f, e) from UNSAFE to a statically-qualified type STATIC_{c1↪c2}, only if f is a correct sanitizer for the context transition c1 ↪ c2 according to the externally specified SanMap relation.

Auto-sanitization Only at Type Promotions. Other than T-SAN, the T-PROM type rule is the only way an UNSAFE-qualified string can become statically-qualified. The CSAS engine inserts statically selected sanitizers during compilation only at the type promotion commands that promote UNSAFE-qualified to statically-qualified strings. For such a command v := (STATIC_{c1↪c2})e, the CSAS engine's compilation step automatically inserts the sanitizer which matches the start context c1 and ensures that parsing v will safely end in context c2.

Type Promotion from UNSAFE to Dynamic. For dynamically qualified strings, the CSAS engine needs to perform runtime parsing and sanitization. To enable this, our instrumentation uses an auxiliary data structure, which we call the CSRP-expression, which keeps constant substrings separate from the untrusted components. For conceptual simplicity, our CSRP-expression data structure is simply a string in which untrusted substrings are delimited by the special characters ⌊ and ⌋. These special delimiters are not part of the string alphabet of the base templating language. The T-PROM rule permits promotions from UNSAFE-qualified strings to dynamically-qualified expressions. The CSAS engine inserts instrumentation during compilation to place the special delimiter characters around the untrusted data and to initialize the CSRP-expression with it. The concatenation operation over regular strings naturally extends to CSRP-expressions.

Runtime Parsing and Sanitization. At program points where the output buffer is dynamically-qualified, the CSAS engine adds instrumentation to track its dynamic context as a metadata field. The metadata field is updated at each print. When a CSRP-expression is written to the output buffer at runtime, it is parsed starting in the dynamically-tracked context of the output buffer. This parsing procedure internally determines the start and end context of each untrusted substring delimited by ⌊ and ⌋, and selects sanitizers for them context-sensitively. We detail the operational semantics for the language and sketch the soundness proof for our type system in Section 6.5.
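A minimal sketch of this runtime step, modeling the CSRP-expression as a list of (is_untrusted, text) pieces rather than a delimited string; the two-context parser `parse_const` and the sanitizer map are illustrative stand-ins, and sanitized pieces are assumed to leave the parser in the same context:

```python
import html

def render_csrp(pieces, start_ctx, parse_const, sanmap):
    """Walk the pieces of a CSRP-expression, tracking the context induced by
    trusted constants, and sanitize each untrusted piece for the context it
    lands in (sanitizers here are assumed context-preserving)."""
    out, ctx = [], start_ctx
    for untrusted, text in pieces:
        if untrusted:
            out.append(sanmap[ctx](text))   # context-sensitive selection
        else:
            out.append(text)
            ctx = parse_const(text, ctx)    # constants may shift the context
    return "".join(out), ctx

# Illustrative parser: a constant ending in `="` enters an attribute-value
# context; one ending in `>` returns to HTML text.
def parse_const(s, ctx):
    if s.endswith('="'):
        return "ATTR_VALUE"
    if s.endswith(">"):
        return "PCDATA"
    return ctx

sanmap = {
    "PCDATA": html.escape,
    "ATTR_VALUE": lambda s: s.replace('"', "&quot;"),
}
```

For instance, rendering the pieces of `<a href="` + untrusted + `">` + untrusted + `</a>` sanitizes the first untrusted piece as an attribute value and the second as HTML text, choosing the sanitizer at runtime exactly as the CSRP approach prescribes.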

6.4 CSAS Engine

We present the design and implementation of the CSAS engine in this section. The CSAS engine performs two main steps: inferring context type qualifiers, and compiling the well-typed IR to JavaScript or server-side Java code with sanitization logic.

Type Qualifier Inference & Compilation

The goal of the type inference step is to convert untyped or vanilla templates to the well-typed IR. In the qualifier inference step, the CSAS engine first converts template code to an internal SSA representation (the untyped IR). The qualifier inference sub-engine must also add type promotions for untrusted inputs, where sanitization primitives will eventually be placed. However, the sub-engine does not know a priori where all sanitizations will be needed. To solve this issue, it inserts a set of candidate type promotions, only some of which will be compiled into sanitizers. These candidate type promotions have the form v′ := (Q)e, where Q is a type qualifier variable, that is, a variable whose value is a context type to be determined by the type inference. Next, the type qualifier inference step solves for these qualifier variables by generating type constraints and solving them. Once constraint solving succeeds, the concrete context type for each qualifier variable is known. These context types can be substituted into the candidate type promotions; the resulting IR is well-typed and is guaranteed to conform to our type rules. In the final compilation step, only some of the candidate type promotions are turned into sanitizer calls. Specifically, type promotions in the well-typed IR that cast from a qualified type to itself are redundant and do not require any sanitization, whereas those which cast UNSAFE-qualified variables into other qualified types are compiled into sanitizers as described in Section 6.3.
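The compile-time decision for each candidate promotion can be sketched as follows; the qualifier and SanMap encodings are illustrative, not the engine's actual representation:

```python
def compile_promotion(src_qual, dst_qual, sanmap):
    """Decide what a candidate type promotion v' := (Q)e compiles to: a cast
    from a qualified type to itself is redundant (no sanitizer needed), while
    a promotion out of UNSAFE compiles to the statically selected sanitizer.
    Any other qualifier conversion is an implicit cast and is rejected."""
    if src_qual == dst_qual:
        return None                    # redundant candidate promotion: drop
    if src_qual == "UNSAFE":
        return sanmap[dst_qual]        # sanitizer for the target transition
    raise TypeError("implicit conversion between qualified types")
```

This mirrors restriction S0: the only way a value changes qualifier is through an explicit promotion, and only promotions out of UNSAFE cost a sanitizer call in the compiled code.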

Inserting Type Promotions with Qualifier Variables

Candidate type promotions are introduced at the following points while converting templates to the untyped IR:

• Each print (e) statement is turned into a print (v′) statement in the IR by creating a fresh internal program variable v′. The CSAS engine also inserts a type promotion (and assignment) statement v′ := (Q) e preceding the print statement, creating a qualifier variable Q.

• Each v = φ(v1, v2) statement is turned into equivalent type promotions v := (Q1) v1 and v := (Q2) v2 in the respective branches before the join point, creating new qualifier variables Q1 and Q2.

• Parameter marshalling from an actual parameter a to a formal parameter v is made explicit via a candidate promotion operation v := (Q) a, creating a new qualifier variable Q.

• A similar type promotion is inserted before the concatenation of a constant string expression with another string expression.

Constraint Solving for Qualifier Variables

The goal of this step is to infer context type qualifiers for the qualifier variables. We analyze each template's IR starting with templates that are used by external code; we call these public templates. We generate a version of compiled code for each start and end context in which a template can be invoked, so we analyze each public template for each choice of a start and end context. Given a template T, a start context cs, and an end context ce, the generic type inference procedure TempAnalyze(T, cs, ce) is described below. TempAnalyze(T, cs, ce) either succeeds, having found a satisfying assignment of qualifier variables to context type qualifiers, or fails if no such assignment is found. It operates over a call graph of the templates in depth-first fashion starting with T, memoizing the start and end contexts for each template it analyzes in the process. When analyzing the body of a template in IR form, it associates a typemap L mapping local variables to type qualifiers at each program location. At the start of the inference for T, all local variables are qualified as UNSAFE in L. The analysis proceeds from the entry to the exit of the template body, statement by statement, updating the context qualifier of each program variable. The context of the output buffer is also updated with the analysis of each statement.

The type rules defined in Figure 6.8 can be viewed as inference rules as well: for each statement or command in the conclusion of a rule, the premises are type constraints to be satisfied. Similar constraints are implied by the type rules for expressions. Our type inference generates and solves these type constraints during the statement-by-statement analysis using a custom constraint solving procedure. Several of our type rules are non-deterministic. As an example, the rules T-CONSTSTR and T-CSTRDYN have identical premises and are non-deterministic because the language syntax alone is insufficient to separate statically and dynamically qualified types. Our constraint solving procedure resolves such non-determinism by backtracking to find a satisfying solution to the constraints. Our inference prefers the most precise (static) qualifiers over less precise (dynamic) qualifiers as solutions for all qualifier variables during its backtracking-based constraint solving. For instance, consider the non-determinism inherent in the premise involving IsParseValid used in the T-CONSTSTR and T-CSTRDYN rules. IsParseValid is a one-to-many relation, and a constant string may parse validly in many start contexts. Our constraint solving procedure non-deterministically picks one such possible context transition initially, trying to satisfy all instances of the T-CONSTSTR rule before those of the T-CSTRDYN rule, and refines its choice until it finds a context transition under which the static string parses validly. If no instance of the T-CONSTSTR rule matches, the engine tries to satisfy the T-CSTRDYN rule. Similar backtracking is also needed when analyzing the starting and ending contexts of templates called via the callTemplate operation.
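The static-before-dynamic preference can be sketched as a tiny solver step; the encoding of qualifiers and the requirement of a unique matching transition are illustrative simplifications of the engine's backtracking search:

```python
def solve_const_qualifier(valid_transitions, required_start):
    """Prefer the static rule for a constant string: if exactly one of its
    valid context transitions starts in the context required by the
    surrounding constraints, commit to a STATIC qualifier; otherwise fall
    back to a DYN qualifier over all valid transitions."""
    static_choices = [t for t in valid_transitions if t[0] == required_start]
    if len(static_choices) == 1:
        return ("STATIC", static_choices[0])      # static rule succeeds
    return ("DYN", frozenset(valid_transitions))  # fall back to dynamic rule
```

A full solver would also backtrack on the choice of transition when a later constraint fails; this sketch only shows the preference order between the two rules.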

Resolving Context Ambiguity by Cloning

The static typing rule T-CALL for callTemplate has stringent pre-conditions: it permits a unique start and end context for each template. A template can, however, be invoked in multiple different start (or end) contexts; our inference handles such cases, while keeping consistency with the type rules, by cloning templates. We memoize the start and end contexts inferred for each template during the inference analysis. If, during constraint generation and solving, we find that a template T is being invoked in start and end contexts different from the ones previously inferred for T, we create a clone T′. The cloned template has the same body but expects to begin and end in a different pair of start and end contexts. Cloned templates are also compiled to separate functions, and calls are directed to the appropriate functions based on the start and end contexts.
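The memoization behind cloning can be sketched in a few lines; the name-mangling scheme is hypothetical:

```python
def get_clone(clones, name, start, end):
    """Memoize (template, start context, end context) triples seen during
    inference; create a fresh clone name the first time a template is
    invoked with an unseen start/end context pair."""
    key = (name, start, end)
    if key not in clones:
        clones[key] = "%s__%s_%s" % (name, start, end)  # mangled clone name
    return clones[key]
```

Repeated calls with the same context pair reuse the existing clone, so the number of generated functions is bounded by the number of distinct start/end context pairs actually observed.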

6.5 Operational Semantics

We have discussed the static type rules in Section 6.3. In this section, we describe the runtime parsing checks that our CSAS engine inserts at various operations to achieve type safety. We do this by first presenting a big-step operational semantics for an abstract machine that evaluates our simple templating language. We then sketch the proof of soundness of our type system based on the operational semantics. The evaluation rules are shown in Fig. 6.5. Commands operate on a memory M mapping program variables to values. Each premise in an evaluation rule has the form M ⊢ e ⇓ v, which means that the expression e evaluates to a final value v under the state of the memory

M. Command evaluation judgements have the form M ⊢ c ⇓ M′, which states that under the memory M, evaluation of the command c results in a memory M′. For brevity, we omit the rules for template calls and returns, which follow standard call-by-value semantics.

Values. The values produced during the evaluation of the language are of three kinds: (a) Vβ values for data elements of base type β, (b) Vη values for objects of base type η, and (c) an explicit CFail value capturing runtime errors. The syntax of values is described in Figure 6.10.

Value ::= Vβ | Vη | CFail

Vβ ::= ⟨Val, CTran⟩ | ⦇Expr⦈ | Val
CTran ::= C ↪ C
Val ::= b | i | s
Expr ::= Val | Expr · Expr | ⌊Val⌋

Vη ::= ‖s, EmbCtx‖
EmbCtx ::= C

Figure 6.10: Syntax of Values

The universes of string, int and bool base-typed values are denoted by the letters s, i and b respectively in the syntax above, and the standard concatenation operation is denoted by “·”. The Vβ values are of three kinds:

1. Untrusted or unsanitized values are raw untrusted string, integer or boolean values.

2. Values which are auto-sanitized statically, or which correspond to program constants, are tuples of the form ⟨v, CTran⟩, where CTran is a metadata field indicating that the value v safely induces the context transition CTran.

3. The remaining values are a special data structure ⦇Expr⦈, called a CSRP-expression, which is used for dynamically sanitized values. The data structure stores concatenation expressions, conceptually separating the untrusted pieces from the trusted pieces. In our syntax, we mark untrusted substrings of string expressions by delimiting them with the special delimiters ⌊ and ⌋, which are assumed to be outside the string alphabet of the base language.
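The CSRP-expression value described above can be modeled concretely as follows. This is an illustrative sketch, not the actual implementation: a concatenation is stored as a list of pieces, each tagged trusted or untrusted, instead of being eagerly flattened to a string; the `untrusted` tag plays the role of the formal delimiters ⌊ and ⌋.

```javascript
// Illustrative model of a CSRP-expression: a list of taint-tagged pieces.
function trustedPiece(s)   { return { s, untrusted: false }; }
function untrustedPiece(s) { return { s, untrusted: true  }; }

// Concatenating two CSRP-expressions (cf. rule E-CAT-DYN) joins the piece
// lists; no piece ever loses its taint tag.
function concat(e1, e2) { return e1.concat(e2); }

// Rendering applies a sanitizer to all and only the untrusted pieces.
function render(expr, sanitize) {
  return expr.map(p => (p.untrusted ? sanitize(p.s) : p.s)).join('');
}

const e = concat([trustedPiece('<b>Hello ')],
                 [untrustedPiece('<script>x</script>'), trustedPiece('</b>')]);
const html = render(e, s => s.replace(/</g, '&lt;').replace(/>/g, '&gt;'));
// html === '<b>Hello &lt;script&gt;x&lt;/script&gt;</b>'
```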

The global output buffer has a value of the form ‖s, EmbCtx‖, where s is the string buffer consisting of the application’s output. The EmbCtx metadata field is the context that results from parsing s according to our canonical grammar.

(E-CINT)      c ∈ C  ⟹  M ⊢ const(n) ⇓ ⟨n, c ↪ c⟩
(E-CBOOL)     c ∈ C  ⟹  M ⊢ const(b) ⇓ ⟨b, c ↪ c⟩
(E-CONST)     IsParseValid(s, c1, c2)  ⟹  M ⊢ const(s) ⇓ ⟨s, c1 ↪ c2⟩
(E-STRING-DYN)  M ⊢ const(s) ⇓ ⦇s⦈

(E-BL)        M ⊢ e1 ⇓ ⟨b1, c ↪ c⟩,  M ⊢ e2 ⇓ ⟨b2, c ↪ c⟩  ⟹  M ⊢ e1 op e2 ⇓ ⟨b1 op b2, c ↪ c⟩
(E-INT)       M ⊢ e1 ⇓ ⟨n1, c ↪ c⟩,  M ⊢ e2 ⇓ ⟨n2, c ↪ c⟩  ⟹  M ⊢ e1 ⊕ e2 ⇓ ⟨n1 ⊕ n2, c ↪ c⟩

(E-CAT-DYN)   M ⊢ e1 ⇓ ⦇s1⦈,  M ⊢ e2 ⇓ ⦇s2⦈  ⟹  M ⊢ e1 · e2 ⇓ ⦇s1 · s2⦈
(E-CAT-STAT)  M ⊢ e1 ⇓ ⟨s1, c1 ↪ c2⟩,  M ⊢ e2 ⇓ ⟨s2, c3 ↪ c4⟩,  c2 = c3  ⟹  M ⊢ e1 · e2 ⇓ ⟨s1 · s2, c1 ↪ c4⟩

(E-SAN)       M ⊢ e ⇓ s,  SanMap(c1 ↪ c2, f),  s′ = f(s),  c1, c2 ∈ C  ⟹  M ⊢ San(f, e) ⇓ ⟨s′, c1 ↪ c2⟩
(E-PROM-ST)   M ⊢ e ⇓ s,  SanMap(c1 ↪ c2, f),  s′ = f(s),  c1, c2 ∈ C  ⟹  M ⊢ v := (STATIC_{c1↪c2}) e ⇓ M[v ↦ ⟨s′, c1 ↪ c2⟩]
(E-PROM-DYN)  M ⊢ e ⇓ s  ⟹  M ⊢ v := (DYN_S) e ⇓ M[v ↦ ⦇⌊s⌋⦈]
(E-CAST-1)    M ⊢ e ⇓ ⟨s, c1 ↪ c2⟩  ⟹  M ⊢ v := (STATIC_{c1↪c2}) e ⇓ M[v ↦ ⟨s, c1 ↪ c2⟩]
(E-CAST-2)    M ⊢ e ⇓ ⦇s⦈  ⟹  M ⊢ v := (DYN_S) e ⇓ M[v ↦ ⦇s⦈]

(E-PRN-STAT)  M ⊢ e ⇓ ⟨s, c1 ↪ c2⟩,  M ⊢ ρ ⇓ ‖s0, c‖,  c = c1  ⟹  M ⊢ print(e) ⇓ M[ρ ↦ ‖s0 · s, c2‖]
(E-PRN-DYN)   M ⊢ e ⇓ ⦇s⦈,  M ⊢ ρ ⇓ ‖s0, c‖,  (s″, c2) = CSRP(c, ⦇s⦈)  ⟹  M ⊢ print(e) ⇓ M[ρ ↦ ‖s0 · s″, c2‖]

(E-IF)        M ⊢ e ⇓ true,  M ⊢ S1 ⇓ M′  ⟹  M ⊢ if(e) then S1 else S2 ⇓ M′
(E-EL)        M ⊢ e ⇓ false,  M ⊢ S2 ⇓ M′  ⟹  M ⊢ if(e) then S1 else S2 ⇓ M′
(E-WHLTRUE)   M ⊢ e ⇓ true,  M ⊢ S; while(e) S ⇓ M′  ⟹  M ⊢ while(e) S ⇓ M′
(E-WHLFALSE)  M ⊢ e ⇓ false  ⟹  M ⊢ while(e) S ⇓ M
(E-ASSIGN)    M ⊢ e ⇓ x  ⟹  M ⊢ v := e ⇓ M[v ↦ x]
(E-SEQ)       M ⊢ c ⇓ M′,  M′ ⊢ S ⇓ M″  ⟹  M ⊢ c; S ⇓ M″

Figure 6.11: Operational semantics for an abstract machine that evaluates our simple templating language.

Type Safety. Note that the operational semantics ensure the three security properties outlined in Section 6.1. Property CR is explicit in the representation of output-buffer values: parsing the output buffer at any point results in a permitted context defined in C. Property NOS is similarly ensured by the E-CONST rule, which evaluates to values of the form ⟨v, CTran⟩; such values are never sanitized. Property CSAN is immediate for the E-PRN-STAT evaluation rule, which ensures that the output buffer’s context matches the start context in the CTran field of the written string expression. For the evaluation rule E-PRN-DYN, soundness relies on the boxed procedure CSRP, which takes a CSRP-expression and a start context in which to parse it. This procedure parses the string expression embedded in the CSRP-expression, context-sensitively sanitizing all and only the untrusted substrings delimited by ⌊ and ⌋. If it succeeds, it returns a tuple containing the sanitized string expression and the end context.

In order to formalize and sketch the proof of the soundness of our type system, we first define a relation R mapping types to the set of valid values they correspond to. At any given point in the program, if the type of a variable or object is Q under the typing environment Γ, then its value must satisfy the corresponding typing constraints. The relation R is defined as follows, assuming U is the universe of string, integer and boolean values:

Definition 2 (Relation R)

R(UNSAFE) = {v | v ∈ U}
R(STATIC_{c1↪c2}) = {⟨v, c1 ↪ c2⟩ | v ∈ U, IsParseValid(v, c1, c2)}
R(DYN_S) = {⦇v⦈ | v ∈ U}
R(CTXSTAT_c) = {‖s, c‖ | c ∈ C}
R(CTXDYN_S) = {‖s, c‖ | c ∈ C, c ∈ S}

At any program location, we define a notion of a well-typed memory M as follows:

Definition 3 (Well-Typedness) A memory M is well-typed with respect to a typing environment Γ, denoted M ⊨ Γ, iff ∀x ∈ Dom(M), M[x] ∈ R(Γ[x]).

We state below the two standard progress and preservation theorems that establish the soundness of our type system. Progress states that if the memory is well-typed, then the abstract machine defined by the operational semantics does not get “stuck”; that is, at least one evaluation rule can be applied to the well-typed terms. Preservation states that at any evaluation step, if all the subterms (used in the premises) are well-typed, then the deduced term is also well-typed. The machine may get stuck for several reasons. It may be that the boxed runtime checks fail; this is the intended semantics of the language, and such behavior is safe. The machine may also get stuck because no evaluation rule applies; in other words, the memory state is such that the semantics do not define how to evaluate further. To distinguish these two cases, we define a value CFail, to which the boxed procedure CSRP evaluates when it fails. Other stuck states may result from reaching an inconsistent memory state for which no evaluation rule applies. We point out that the abstract operational semantics we describe here are non-deterministic: only when all non-deterministic evaluations of an instance of the procedure CSRP fail does the boxed check fail and the print statement evaluate to the CFail runtime error.
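The boxed CSRP procedure can be sketched operationally as follows. This is a toy model under heavy simplifying assumptions: only two contexts, toy sanitizers, and a trivial stand-in (`advanceContext`) for parsing trusted constants under the canonical grammar; none of these names come from the actual implementation.

```javascript
// Toy sketch of CSRP: walk the pieces of a CSRP-expression, track the
// parser context, sanitize each untrusted piece for the current context,
// and return the sanitized string together with the end context.
const SANITIZERS = {
  HTML_PCDATA: s => s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;'),
  JS_STRING:   s => s.replace(/[\\"']/g, ch => '\\' + ch),
};

// Trusted constants advance the context; a trivial pattern match stands in
// for real parsing here.
function advanceContext(ctx, trustedStr) {
  if (trustedStr.includes('<script>var x = "')) return 'JS_STRING';
  if (trustedStr.includes('";</script>')) return 'HTML_PCDATA';
  return ctx;
}

function CSRP(startCtx, pieces) {
  let ctx = startCtx, out = '';
  for (const p of pieces) {
    if (p.untrusted) {
      out += SANITIZERS[ctx](p.s);    // sanitize all and only untrusted pieces
    } else {
      out += p.s;
      ctx = advanceContext(ctx, p.s); // trusted markup shifts the context
    }
  }
  return [out, ctx];                  // the (s'', c2) of rule E-PRN-DYN
}

const [s, endCtx] = CSRP('HTML_PCDATA', [
  { s: '<b>', untrusted: false },
  { s: '<i>', untrusted: true },          // sanitized for HTML_PCDATA
  { s: '</b><script>var x = "', untrusted: false },
  { s: '";alert(1)//', untrusted: true }, // sanitized for JS_STRING
  { s: '";</script>', untrusted: false },
]);
```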

Theorem 1 (Progress and Preservation for Expressions). If Γ ⊢ e : T and M ⊨ Γ, then either M ⊢ e ⇓ CFail, or M ⊢ e ⇓ v and v ∈ R(T).

Proof: By induction on the derivation of Γ ⊢ e : T.

Theorem 2 (Progress and Preservation for Commands). If M ⊨ Γ and Γ ⊢ c ⟹ Γ′, then either M ⊢ c ⇓ CFail, or M ⊢ c ⇓ M′ where M′ ⊨ Γ′.

Proof: By induction on the derivation of Γ ⊢ c ⟹ Γ′. The definition of relation R serves as the standard inversion lemma.

6.6 Implementation & Evaluation

We have implemented our CSAS engine design in a state-of-the-art, commercially used open-source templating framework called Google Closure Templates [44]. Closure Templates are used extensively in large web applications including Gmail, Google Docs and other Google properties. Our auto-sanitized Closure Templates can be compiled into both client-side JavaScript and server-side Java code, enabling the construction of reusable output-generation elements.

Contexts:
HTML_PCDATA, HTML_RCDATA, HTML_TAGNAME, HTML_ATTRIBNAME,
QUOTED_HTMLATTRIB, UNQUOTED_HTMLATTRIB,
JS_STRING, JS_REGEX,
CSS_ID, CSS_CLASS, CSS_PROPNAME, CSS_KEYWDVAL, CSS_QUANT,
CSS_STRING, CSS_QUOTED_URL, CSS_UNQUOTED_URL,
URL_START, URL_QUERY, URL_GENERAL

Figure 6.12: The set of contexts C used throughout the chapter.
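A context-to-sanitizer lookup of the kind implied by SanMap in the semantics can be sketched as follows. The mapping below is partial and illustrative only; the sanitizer names are taken from the evaluation section (e.g. escapeHtml, escapeJsString), but the real CSAS engine's full table of 20 sanitizers is not reproduced here.

```javascript
// Partial, illustrative mapping from output contexts (Figure 6.12) to
// sanitizer names; shows only the lookup shape, not the real table.
const SAN_MAP = {
  HTML_PCDATA:         'escapeHtml',
  HTML_RCDATA:         'escapeHtmlRcdata',
  QUOTED_HTMLATTRIB:   'escapeHtmlAttribute',
  UNQUOTED_HTMLATTRIB: 'escapeHtmlAttributeNospace',
  JS_STRING:           'escapeJsString',
  URL_QUERY:           'escapeUri',
};

// Look up the sanitizer for an untrusted value printed in context ctx.
// Unknown contexts are rejected rather than left unsanitized (fail closed).
function sanitizerFor(ctx) {
  const f = SAN_MAP[ctx];
  if (!f) throw new Error(`no sanitizer registered for context ${ctx}`);
  return f;
}
```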

Our implementation comprises 3045 lines of Java code, excluding comments and blank lines, and augments the existing compiler in Closure Templates with our CSAS engine. All the contexts defined in Figure 6.12 are supported in the implementation, with 20 distinct sanitizers.

Subject Benchmarks. For a real-world evaluation, we gathered all Closure templates accessible to us. Our benchmarks consist of 1035 distinct Closure templates from Google's commercially deployed applications. The templates were authored by developers prior to our CSAS engine implementation; we therefore believe that they represent unbiased samples of existing code written in templating languages. The total amount of code in the templates (excluding file prologues and comments outside the templates) is 21,098 LOC. Our benchmarks make heavy use of control-flow constructs such as callTemplate calls, and the benchmark's template call-graph is densely connected: it consists of 1035 nodes, 2997 call edges and 32 connected components of size ranging from 2 to 12 templates, one large component with 633 templates, and 262 disconnected templates (that make no external calls and are not called). Overall, these templates have a total of 1224 print statements that write untrusted data expressions. The total number of untrusted input variables in the code base is 600, ranging from 0 to 13 per template.

Evaluation Goals. The goal of our evaluation is to measure how easily our principled type-based approach retrofits to an existing code base. In addition, we compare the security and performance of our "mostly static", context-sensitive approach to the following alternative approaches:

• No auto-sanitization. This is the predominant strategy in today's web frameworks.

• Context-insensitive sanitization. Most remaining web frameworks supplement each output print command with the same sanitizer.

• Context-sensitive runtime parsing sanitization.
As explained earlier, previous systems have proposed determining contexts by runtime parsing [16]. We compare the performance of our approach against this technique.

Compatibility & Precision

Our benchmark code was developed prior to our type system, and we aim to evaluate the extent to which our approach can retrofit security to existing code templates. To perform this experiment, we disabled all sanitization checks previously applied in the benchmarks and enabled our auto-sanitization on all of the 1035 templates. We then counted the fraction of templates that were transformed to well-typed compiled code. Our analysis is implemented in Java and takes 1.3 seconds for all 1035 benchmarks on a platform with 2 GB of RAM and an Intel 2.6 GHz dual-core processor running Linux 2.6.31.

Our static type inference approach avoids imprecision by cloning templates that are called in more than one context. In our analysis, 11 templates required cloning, which increased the number of output print statements (or sinks) from 1224 initially to 1348 after cloning. Our main result is that all 1348 output sinks in the 1035 templates were auto-sanitized, and no change or annotations to the vanilla templates were required. We tested the outputs of the compiled templates by running them under multiple inputs; the output of the templates under our testing was unaffected and remained completely compatible with that of the vanilla template code.

Our vanilla templates, being commercially deployed, have existing sanitizers manually applied by developers and are well-audited for security by Google.

                 No            Context-       Context-Sensitive   Our
                 Sanitization  Insensitive    Runtime Parsing     Approach
Chrome 9.0       227           234 (3.0%)     406 (78.8%)         234 (3.0%)
FF 3.6           395           433 (9.6%)     2074 (425%)         433 (9.6%)
Safari 5.0       190           195 (2.5%)     550 (189%)          196 (3.1%)
Server: Java     431           431 (0.0%)     2972 (510%)         431 (0.0%)
# of Sinks       0/1348        982/1348       1348/1348           1348/1348
Auto-Prot.       (0%)          (72%)          (100%)              (100%)

Figure 6.13: Runtime overhead, in milliseconds, for parsing and rendering the output of all the compiled templates. This data compares our approach with alternative existing approaches, for server-side Java and client-side JavaScript code generated from our benchmarks. The percentages in parentheses are calculated over the base overhead of no sanitization reported in the first data column. The last row shows the number of sinks auto-protected by each approach, a measure of the security offered by our approach compared to its alternatives.

To confirm our compatibility and correctness, we compared the sanitizers applied by our CSAS engine to those pre-applied manually by developers in the vanilla versions of the benchmarked code. Out of the 1348 print statements emitting untrusted expressions, the sanitization primitives on untrusted inputs exactly match the pre-applied sanitizers in all but 21 cases. In these 21 cases, our CSAS engine applies a more accurate sanitizer (escapeHtmlAttribute) than the more restrictive one applied previously by the developer (escapeHtml). Both sanitizers defeat scripting attacks; the pre-existing sanitizer was rendering inert certain characters that were not dangerous for the context. This evaluation strengthens our confidence that our approach does not alter the compatibility of the HTML output, and that our CSAS engine implementation applies sanitization correctly. Our type qualifier inference on this benchmark statically qualified the expressions written to all but 9 of the 1348 sinks.
That is, for over 99% of the output sinks, our approach statically determines a single, precise context. In the remaining 9 cases, the set of ambiguous contexts is small, and a single sanitizer that sanitizes the untrusted input for all contexts in the set can be applied. Our present implementation special-cases these sinks by applying a static sanitizer, which is safe but may be over-restrictive. We have recently implemented the CSRP scheme using an auxiliary data structure, as described in Section 6.3, in jQuery templates for JavaScript [92]; we expect porting this implementation to the Google Closure compiler to be a straightforward task.
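The special-casing described above can be sketched as follows. This is our own illustrative reconstruction, not the implementation's logic: when inference leaves a small set of candidate contexts for a sink, pick one statically applied sanitizer that is safe (if possibly over-restrictive) in every context of the set; the safety table is hypothetical.

```javascript
// Illustrative: which sanitizers are safe in which contexts (hypothetical table).
const SAFE_FOR = {
  escapeHtml:          new Set(['HTML_PCDATA']),
  escapeHtmlAttribute: new Set(['HTML_PCDATA', 'QUOTED_HTMLATTRIB']),
};

// Return one sanitizer safe for every context in the ambiguous set, or
// null if none exists (in which case a runtime CSRP fallback is needed).
function staticSanitizerFor(ctxSet) {
  for (const [san, ok] of Object.entries(SAFE_FOR)) {
    if (ctxSet.every(c => ok.has(c))) return san;
  }
  return null;
}

// Safe whether the sink turns out to be PCDATA or a quoted attribute:
const pick = staticSanitizerFor(['HTML_PCDATA', 'QUOTED_HTMLATTRIB']);
```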

Security

To measure the security offered by our approach compared to the context-insensitive sanitization approach, we count the number of sinks that would be auto-sanitized correctly

Distribution of inserted sanitizers (sanitizer: number of sinks):
filterNormalizeURI: 1; filterHtmlIdent: 3; escapeHtmlAttributeNospace: 7; escapeHtmlRcdata: 10; escapeUri: 15; escapeJsString: 27; filterCSSValue: 33; escapeJsValue: 39; filterNormalizeURI + escapeHtml: 231; escapeHtmlAttribute: 380; escapeHtml: 602.

Figure 6.14: Distribution of inserted sanitizers: the inferred contexts, and hence the inserted sanitizer counts, vary widely, showing that context-insensitive sanitization is insufficient.

in our 1035 templates. We assume that context-insensitive sanitization applies the HTML-entity-encoding sanitizer to all sinks, which is the approach adopted in popular frameworks such as Django [32]. Picking any other sanitizer would only give worse results for the context-insensitive scheme: the most widely inserted sanitizer in auto-sanitization on our benchmarks is escapeHtml, the HTML-entity-encoding sanitizer. The last row in Figure 6.13 shows the number of sinks auto-protected by existing approaches. Context-insensitive sanitization protects 72% of the total output prints adequately; the auto-sanitization is insufficient for the remaining 28% of output print operations. Clearly, context-insensitive sanitization offers better protection than no sanitization. Context-sensitive sanitization, on the other hand, offers full protection whether context inference is performed dynamically or, as in our static type inference approach, at compile time. Figure 6.14 shows that the inferred sanitizers varied significantly by context across the 1348 output points, underscoring the inadequacy of context-insensitive sanitization.
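The inadequacy of one-sanitizer-fits-all can be demonstrated concretely. The two sanitizers below are simplified sketches (not Closure's exact escapeHtml/escapeJsString): HTML-entity encoding leaves the `"` delimiter intact, so in a JavaScript string-literal sink the untrusted value still breaks out of the literal.

```javascript
// Simplified sanitizer sketches for the demonstration.
const escapeHtml = s =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
const escapeJsString = s => s.replace(/[\\"']/g, ch => '\\' + ch);

const payload = '";alert(document.cookie);//';

// The sink is a JS string literal: var x = "<payload>";
const htmlEscaped = `var x = "${escapeHtml(payload)}";`;
const jsEscaped   = `var x = "${escapeJsString(payload)}";`;

// htmlEscaped still contains an unescaped closing quote followed by code,
// so the literal is terminated early; jsEscaped keeps the quote inert.
```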

Performance

We measure and compare the runtime overhead incurred by our context-sensitive auto-sanitization against other approaches and present the results in Figure 6.13. Google Closure Templates can be compiled to both JavaScript and Java; we measure the runtime overhead for both cases and report the rendering time for each in milliseconds. For compiled JavaScript functions, we record the time across 10 trial runs in 3 major web browsers. For compiled Java functions, we record the time across 10 trial runs under the same inputs. The baseline "no auto-sanitization" overhead is obtained by compiling vanilla templates with none of the developers' manual sanitizers applied. For our approach, we enable our

CSAS auto-sanitization implementation. To compare the overhead of context-insensitive auto-sanitization, we simply augment all output points with the escapeHtml sanitizer during compilation. A direct comparison to Google AutoEscape, the only context-sensitive sanitization approach in templating systems we know of, was not possible because it does not handle rich language features like if-else and loops, which create context ambiguities and are pervasive in our benchmarks; a detailed explanation is provided in Section 6.7. To emulate the purely context-sensitive runtime parsing (CSRP) approach, we implemented this technique for our templating language. For Java, we directly used an off-the-shelf parser, without modifications, from the open-source Google AutoEscape implementation in GWT [43]. For JavaScript, since no similar parser was available, we created a parser implementation mirroring the Java-based one. We believe our implementation is close to GWT's public implementation for Java, since the overhead is in the same ballpark.

Results. With JavaScript as the compilation target, the time taken to parse and render the output of all the compiled templates (782.584 KB in total) in 3 major web browsers, averaged over 10 runs, is shown in Figure 6.13. The costs lie between 78% and 4.24x overhead for the pure CSRP approach, whereas our approach incurs between 3% and 9.6% overhead over no sanitization. The primary reason for the difference is that the CSRP approach requires parsing all constant strings and determining the context of untrusted data at runtime, a large saving in our static type inference approach. Our overhead in JavaScript is due to the application of the sanitizers, which is why our sanitization has nearly the same overhead as the context-insensitive approach.
For Java, the pure CSRP approach has a 510% overhead, whereas our approach and the context-insensitive approach incur no statistically discernible overhead. In summary, our approach achieves the benefits of context-sensitive sanitization at an overhead comparable to that of other widely used frameworks. We point out that Closure templates capture the HTML output logic with minimal subsidiary application logic; our benchmarks are therefore heavy in string concatenations and writes to output buffers. As a result, they are highly CPU-intensive, and the runtime costs evaluated here may be amortized in full-blown applications by other latencies (computation of other logic, database accesses, network and file-system operations). For an estimate, XSS-GUARD reports an overhead of up to 42% for the CSRP approach [16]. We believe our benchmarks are apt for precisely measuring the performance costs of the HTML output logic alone. Further performance gains can be achieved for our approach, as done in GWT, by orthogonal optimizations like caching, which mask disk-load latencies.

6.7 Related Work

Google AutoEscape, the only other context-sensitive sanitization approach in templating frameworks we are aware of, does not handle the rich language constructs we support: it does not handle conditional constructs, loops or call operations [42]. It provides safety in straight-line template code, for which straight-line parsing and context determination suffice.

To improve performance, it caches templates and the sanitization requirements for untrusted inputs. Templates can then be included in Java code [43] and C code [42]. As we outline in this chapter, with rich constructs, path-sensitivity becomes a challenging issue, and sanitization requirements for untrusted inputs vary from one execution path to another. AutoEscape's caching optimization does not directly extend to code where sanitization requirements vary depending on the executed path. Our approach, instead, solves the challenges arising from the complex language features of richer templating systems like Closure Templates.

Context inference and subsequent context-sensitive sanitizer placement for .NET legacy applications is proposed in our recent work [97]. The approach proposed therein, though sound, is a per-path analysis and relies on achieving path coverage through dynamic testing. In contrast, the type-based approach in this work achieves full coverage, since it is based on static type inference. The performance improvements in our recent dynamic approach rely heavily on the intuition that on most execution paths, developers have manually applied context-sensitive sanitization correctly; the type-based approach in this work can apply sanitization correctly in code completely lacking previous developer-supplied sanitization. A potential drawback of our static approach is that it may in theory reject benign templates, since it reasons about all paths, even potentially infeasible ones. We have not seen such cases in our evaluation.

Analysis techniques for finding scripting vulnerabilities have been widely researched [99, 100, 9, 48, 132, 72, 71, 18, 127, 62, 55, 87, 75, 8]. Defense architectures fall into three broad categories: server-side techniques [72, 110, 16, 97, 132], purely browser-based techniques [15, 79], and client-server collaborative defenses [61, 84, 49, 105].
Unlike browser-based and client-server defenses, purely server-side approaches apply to the server code without requiring modifications to web browsers; our techniques are an example. Among server-side approaches, strong typing has been proposed as an XSS defense mechanism in the work by Robertson et al. [93]. Our approach contrasts significantly with theirs in that it requires no annotations or changes to existing code, does not rely on strong typing primitives in the base language such as monads, and is a mixed static-dynamic type system designed for existing web templating frameworks and for retrofitting to existing code.

6.8 Conclusion

We present a new auto-sanitization defense that secures web application code from scripting attacks (such as XSS) by construction. We introduce context type qualifiers, a key new abstraction, and develop a type system that is directly applicable to today's commercial templating languages. We have implemented the defense in Google Closure Templates, a state-of-the-art templating system that powers Gmail and Google Docs. We find that our mostly static system has low performance overhead, is precise, and requires no additional annotations or developer effort. We hope that our abstractions and techniques can be extended to other complex languages and frameworks in the future, towards the goal of eliminating scripting attacks in emerging web applications.

Chapter 7

DSI: A Basis For Sanitization-Free Defense

In the previous chapter, we tackled the problem of auto-sanitization to prevent scripting attacks. The underlying issue for scripting attacks, however, is the lack of principled mechanisms in HTML and other web languages to separate trusted code and data from untrusted data embedded inline. As an artifact of this design, web developers pervasively use fragile sanitization mechanisms, which have been notoriously hard to get right. In this chapter, we propose a fundamental integrity property, which we term document structure integrity, that can serve as a conceptual basis for defense without requiring any sanitization in web applications. We begin by stating four requirements that a clean-slate solution to scripting attacks should satisfy.

Defense Requirements. Based on the empirical observations of Chapter 5 and Chapter 6, we formulate the following four requirements for a technique to prevent script injection attacks.

1. The defense should not rely on server-side sanitization of untrusted data; instead, it should form a second level of defense to safeguard against holes that result from error-prone sanitization mechanisms.

2. The defense should confine untrusted data in a manner consistent with the browser implementation and user configuration.

3. The defense must address attacks that target server-side as well as client-side languages.

4. The defense should proactively protect against attacks without relying on detection of common symptoms of malicious activity such as cross-domain sensitive information theft.

Our Approach. In this chapter, we develop a sanitization-free approach. Our approach can be implemented transparently in the web server and the browser, requires minimal web-developer intervention, and provides a second line of defense for preventing scripting attacks. In our approach, a script injection vulnerability is viewed as a privilege escalation vulnerability, as opposed to an input sanitization problem. Sanitization and filtering/escaping of untrusted content aim to block or modify the content to prevent it from being interpreted as code. Our approach does not analyze the values of untrusted data; instead, it restricts the interpretation of untrusted content to certain lexical and syntactic operations: the web developer specifies a restrictive policy for untrusted content, and the web browser enforces the specified policy.

To realize this system, we propose a new scheme which uses markup primitives for the server to securely demarcate inline user-generated data in the web document, and which is designed to be robust in the face of an adaptive adversary. This allows the web browser to verifiably isolate untrusted data while initially parsing the web page. Subsequently, untrusted data is tracked and isolated as it is processed by higher-order languages such as JavaScript. This ensures the integrity of the document parse tree; we term this property document structure integrity (or DSI). DSI is similar to PreparedStatements [36], which provide query integrity in SQL. DSI is enforced using a fundamental mechanism, which we call parser-level isolation (or PLI), that isolates inline untrusted data and forms the basis for uniform runtime enforcement of server-specified syntactic confinement policies. We discuss the deployment of this scheme in a client-server architecture that can be implemented with minimal impact on backwards compatibility in modern browsers.
Our proposed architecture employs server-side taint tracking, proposed by previous research, to minimize changes to the web application code. We implemented a proof-of-concept embodying our approach and evaluated it on a dataset of 5,328 web sites with known scripting vulnerabilities and 500 other popular web sites. Our preliminary evaluation demonstrates that parser-level isolation with a single default policy is sufficient to nullify over 98% of the attacks we studied. It also suggests that our techniques can be implemented with very low false positives, in contrast to the false positives that are likely to arise from the fixed policies of purely client-side defenses.

Our work builds on several works which have identified the need for policy-based confinement and isolation of untrusted data [61, 73, 20]. A detailed analytical comparison with previous work is provided in Section 7.7 and Section 7.9. In comparison to existing scripting defenses, DSI enforcement offers a more comprehensive defense against attacks that extend beyond script injection and sensitive-information stealing, and safeguards against both static and dynamic integrity threats.

7.1 XSS Definition and Examples

A script injection (or XSS) vulnerability is one that allows injection of untrusted data into a victim web page, which is subsequently interpreted in a malicious way by the browser on behalf of the victim web site. This untrusted data could be interpreted as any form of code that is not intended by the server's policy, including scripts and HTML markup. We

Figure 7.1: Example showing a snippet of HTML pseudocode (lines 1–13) generated by a vulnerable social networking web site server. Untrusted user data is embedded inline, identified by the $GET[‘...’] variables.

Untrusted variable            Attack string (attack number)
$GET[‘FriendID-Status’]       ' onmouseover=javascript:document.location="http://a.com" (Attack 1)
$GET[‘MainUser’]              (Attack 2)
$GET[‘FriendID-Status’]       Attacker=Online"; alert(document.cookie);+" (Attack 3)
$GET[‘MainUser’]              (Attack 4)

Figure 7.2: Example attacks for exploiting the vulnerabilities in Figure 7.1.

treat only user-generated input as untrusted, and use the terms “untrusted data” and “user-generated data” interchangeably in this chapter. We also refer to content as being either passive, i.e., consisting of elements derived from language terminals (such as string literals and integers), or active, i.e., code that is interpreted (such as HTML and JavaScript).

An Example. To outline the challenges of preventing exploits of XSS vulnerabilities, we show a toy example of a social networking site in Figure 7.1. The pseudo-HTML code marks the places where untrusted user data is inlined by elements of the $GET[‘...’] array (signifying data directly copied from GET/POST request parameters). In this example, the server expects the value of $GET[‘MainUser’] to contain the name of the current user logged into the site, and $GET[‘FriendID-Status’] to contain a string with the name of another user and his status message (“online” or “offline”) separated by a delimiter (“=”). Assuming no sanitization is applied, this code has at least 4 places where vulnerabilities arise, which we illustrate with the possible exploits summarized in Figure 7.2.

• Attack 1: String split & attribute injection. In this attack, the untrusted variable $GET[‘FriendID-Status’] could prematurely break out of the id attribute of the

tag on line 3 and inject unintended attributes and/or tags. In this particular instance, the attack string shown in Figure 7.2 closes the string delimited by the single-quote character, which allows the attacker to inject the onmouseover JavaScript event. The event causes the page to redirect to http://a.com, potentially fooling the user into trusting the attacker's website. A similar attack is possible at line 7, where the attacker breaks out of the JavaScript string literal using an end-of-string delimiter (") character in the value of the variable $GET[‘MainUser’].

• Attack 2: Node splitting. Even if the server sanitizes $GET[‘MainUser’] on line 7 to disallow JavaScript end-of-string delimiters, another attack is possible: the attacker can inject a string that splits the enclosing script node.
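The vulnerable pattern and Attack 1 can be sketched concretely as follows. This is an illustrative reconstruction in the spirit of Figure 7.1, not the figure's exact code; the server function `renderPage` and its markup are ours, with untrusted parameters interpolated into a quoted attribute (the figure's line 3) and into a JavaScript string literal (line 7).

```javascript
// Illustrative reconstruction of the vulnerable server output (not the
// dissertation's exact Figure 7.1): no sanitization is applied.
function renderPage(GET) {
  return [
    '<body>',
    '<div id=\'WelcomeMess\'>Welcome!</div>',
    // Untrusted data inside a single-quoted attribute -> Attack 1
    `<div id='${GET['FriendID-Status']}'></div>`,
    '<script>',
    '  var divname = document.getElementById("WelcomeMess");',
    // Untrusted data inside a JS string literal -> Attacks 3/4
    `  divname.innerHTML = "Hi ${GET['MainUser']}!";`,
    '</script>',
    '</body>',
  ].join('\n');
}

// Attack 1 from Figure 7.2: close the quoted attribute, then inject an
// onmouseover event-handler attribute.
const page = renderPage({
  'FriendID-Status': '\' onmouseover=javascript:document.location="http://a.com"',
  'MainUser': 'Alice',
});
// page now contains: <div id='' onmouseover=javascript:...></div>
```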

Figure 7.4: Coalesced parse tree (corresponding to the parse tree in Figure 7.3) resulting from DSI enforcement with the terminal confinement policy—untrusted subtrees are forced into leaf nodes.

in the browser. As a result, the browser separates untrusted data from trusted data in its initial parse tree robustly. Second, we aim to enforce dynamic DSI with respect to P in the browser, across all subsequent parsing operations. Third, we require that the attacker cannot evade PLI by embedding untrusted content that results in escalated interpretation of untrusted data. These three goals enforce DSI based on uniform parser-level isolation.

Outline of Mechanisms. We view the operation of encoding the web page in HTML merely as serialization (or marshaling³) of the content and the static document structure on the server side, and browser-side parsing of HTML as the deserialization step. We outline four steps that implement PLI and ensure that the document structure is reconstructed by the browser from the point at which the web server generates the web page. The server implements the first two steps.

• Step 1—Separation of trusted and user-generated data. As a first step, web servers need to identify untrusted data at their output interface and distinguish it from trusted application code. We make this assumption to begin with, and discuss ways to achieve this step through automatic methods in Section 7.4. We believe this is not an unrealistic assumption: previous work on automatic dynamic taint tracking [132, 87] has shown that tracking untrusted user-generated data at the output interface is possible; in fact, many popular server-side scripting language interpreters (such as PHP) now have built-in support for it. Our goal in subsequent steps is to

3 akin to serialization in other programming languages and RPC mechanisms

CHAPTER 7. DSI: A BASIS FOR SANITIZATION-FREE DEFENSE 122

supplement integrity-preserving primitives to ensure that the server-specified policy is correctly enforced in the client browser, rather than relying on sanitization at the server output interface, for reasons outlined in the introduction of this chapter.

• Step 2—Serialization: Enhancement of static structure with markup. The key to robust serialization is to prevent embedded untrusted data from subverting the mechanism that distinguishes trusted code from inline untrusted data in the browser. To prevent such attacks, we propose the idea of markup randomization, i.e., the addition of non-deterministic changes to the markup. This idea is similar to instruction set randomization [64], proposed for preventing traditional vulnerabilities.

There are two steps that the browser implements.

• Step 3—Deserialization: Browser-side reconstruction of static document structure. The web browser parses the web page into its initial parse tree, coercing the parse tree to preserve the intended structure. Thus, it can robustly identify untrusted data in the document structure at the end of the deserialization step.

• Step 4—Browser-side dynamic PLI. This step is needed to ensure DSI when web pages are dynamically updated. In essence, once untrusted data is identified in the browser in the previous step, we initialize it as quarantined and track quarantined data in the browser dynamically. Language parsers for HTML and other higher-order languages like JavaScript are modified to disallow quarantined data from being used during parsing in a way that violates the policy. This step removes the burden of having the client-side code explicitly check the integrity of the dynamic document structure, as it embeds a reference monitor in the language parsers themselves. Thus, no changes need to be made to existing client-side code for DSI-compliance.
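As a rough illustration of Step 4, the sketch below tracks a quarantined bit per string fragment and confines quarantined fragments to leaf content when they are parsed, so markup characters in untrusted data cannot take on structural meaning. The function name and data representation are assumptions of this sketch, not the actual browser modification.

```python
# Hedged sketch of browser-side dynamic PLI: quarantined (untrusted) string
# fragments are confined to leaf content during parsing, per the terminal
# confinement policy; trusted fragments parse as markup normally.
import html

def parse_with_quarantine(chunks):
    """chunks: list of (text, quarantined) pairs, as arise when client-side
    code concatenates trusted markup with untrusted data before a DOM write."""
    out = []
    for text, quarantined in chunks:
        if quarantined:
            # Quarantined data may only become a leaf node: render its
            # markup-significant characters inert instead of parsing them.
            out.append(html.escape(text))
        else:
            out.append(text)  # trusted markup is parsed as usual
    return "".join(out)

# A trusted template with an untrusted value spliced in dynamically:
dom = parse_with_quarantine([
    ("<div id='WelcomeMess'>Hello ", False),
    ("<script>alert(document.cookie)</script>", True),  # attack payload
    ("</div>", False),
])
print(dom)  # the payload's <script> becomes inert leaf text
```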

7.3 Enforcement Mechanisms

We describe the high level ideas of the mechanisms in this section. Concrete details for implementing these are described in Section 7.4.

Serialization

Web pages are augmented with additional markup at the server’s end, in such a way that the browser can separate trusted structural entities from untrusted data in the static document structure. We call this step serialization, and it is ideally performed at the output interface of the web server.

Adaptive Attacks. One naive way to perform serialization is to selectively demarcate or annotate untrusted data in the web page with special markup. The key concern is that an adaptive attacker can include additional markup to evade the isolation. For instance, let us

Figure 7.5: Special delimiters added around the untrusted data on lines 3–5 of the running example in Figure 7.1.
We point out that it can be tricky to prevent the malicious content from breaking out of even the simple static JavaScript string context. It is not sufficient to quote the JavaScript end-of-string delimiters (")—an attack string containing a literal closing script tag perpetrates a node splitting attack, closing the script environment altogether without explicitly breaking out of the string context. Sanitization of the HTML special characters <, > might solve this instance of the problem, but a developer may not employ such a restrictive mechanism if the server’s policy allows some form of HTML markup in untrusted data (such as <p> or <b> tags in user content). Our goal is to separate the isolation mechanism from the policy.

The above outlined attack reiterates that server-side quoting or validation may vary depending upon the web application’s policy and is an error-prone process; keeping the isolation mechanism independent of input validation is an important design goal. We propose the following serialization schemes as an alternative.

Minimal Serialization. In this form of serialization, only the regions of the static web page that contain untrusted data are surrounded by special delimiters. Delimiters are added around inlined untrusted data independent of the context where the data is embedded. For our running example shown in Figure 7.1, the serialization step places these delimiters around all occurrences of the $GET array variables. If the markup elements used as delimiters are statically fixed, an adaptive attacker could break out of the confinement region by embedding the ending special delimiter in its attack string, as discussed above. We propose an alternative mechanism called markup randomization (which builds upon our earlier proposal [104]) to defeat such adaptive attacks. The idea is to generate randomized markup values for special delimiters each time the web page is served, so that the attacker cannot deterministically guess the confining context tag it should use to break out. Abstractly, the server appends an integer suffix c, c ∈ C, to a matching pair of delimiters ⟦ and ⟧ enclosing an occurrence of untrusted data, generating ⟦c … ⟧c while serializing. The set C is randomly generated for each web page served. C is sent in a confidential, tamper-proof communication to the browser along with the web page.
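Why statically fixed delimiters fail can be seen in a few lines. The delimiter strings below are invented for illustration; any fixed, guessable pair has the same weakness.

```python
# Hedged sketch: with statically fixed confinement delimiters, the attacker
# simply includes the closing delimiter in the payload to escape confinement.
OPEN, CLOSE = "[[u", "]]u"  # illustrative fixed delimiters, known to the attacker

def naive_serialize(prefix, untrusted, suffix):
    return prefix + OPEN + untrusted + CLOSE + suffix

payload = "harmless" + CLOSE + "<script>alert(1)</script>" + OPEN
page = naive_serialize("<div>", payload, "</div>")

# The attacker's embedded CLOSE ends the confined region early, leaving the
# injected <script> element outside any confined (untrusted) region:
confined = page.split(OPEN, 1)[1].split(CLOSE, 1)[0]
print(confined)  # 'harmless'
```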
Clearly, if we use a pseudo-random number generator with a seed Cs to generate C, it is sufficient to send {Cs, n}, where n is the number of elements in C obtained by repeated invocations of the pseudo-random number generator. In Figure 7.5, we show the special delimiters added to lines 3–5 of our running example in Figure 7.1. One recent instance of a minimal serialization scheme is the tag matching scheme proposed for the informal <jail> tag [20], which is analyzed in depth by Louw et al. [73].

Full Serialization. An alternative to minimal serialization is to mark all trusted structural entities explicitly, which we call full serialization. For markup randomization, the server appends a random suffix c, c ∈ C, to each trusted element (including HTML tags, attributes, attribute values, strings, and so on). Though a preferable mechanism from a security standpoint, it requires a scheme that can mark trusted elements independent of the context of occurrence with a very fine granularity of specification. For instance, we need a mechanism to selectively mark the id attribute of the div element on line 3 of the running example (shown in Figure 7.1) as trusted (to be able to detect attribute injection attacks), without marking the attribute value as trusted. Only then can we selectively treat the value part as untrusted, which can be essential to detect dynamic code injection attacks, such as attack 3 in Figure 7.2.

Independently and concurrently with our work, Gundy et al. have described a new randomization-based full serialization scheme, called Noncespaces [49], that uses XML namespaces. However, XML namespaces do not have the required granularity of specification described above, and hence we have not experimented with this scheme.
It is possible, however, to apply the full serialization scheme described therein as part of our architecture as well, sacrificing some of the dynamic integrity protection that is only possible with a finer-grained specification. We do not discuss full serialization further; interested readers are referred to Noncespaces [49] for details.
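A minimal server-side sketch of markup randomization for minimal serialization follows. The bracket syntax `[[c|…|]]c` stands in for the ⟦c … ⟧c delimiters, and the function names and use of Python's seeded `random.Random` are assumptions of this sketch, not the thesis implementation.

```python
# Hedged sketch of minimal serialization with markup randomization: each
# untrusted region gets delimiters carrying a fresh random suffix from C;
# sending (Cs, n) lets the browser regenerate C with the same PRNG.
import random

def make_suffixes(seed_Cs, n):
    rng = random.Random(seed_Cs)  # pseudo-random generator seeded with Cs
    return [rng.randrange(10**4) for _ in range(n)]

def serialize(segments, seed_Cs):
    """segments: list of (text, untrusted) pairs at the server's output interface."""
    n = sum(1 for _, untrusted in segments if untrusted)
    C = make_suffixes(seed_Cs, n)
    out, i = [], 0
    for text, untrusted in segments:
        if untrusted:
            c = C[i]; i += 1
            out.append(f"[[{c}|{text}|]]{c}")  # stands in for ⟦c text ⟧c
        else:
            out.append(text)
    return "".join(out), (seed_Cs, n)  # serialized page plus {Cs, n} for the browser

page, key = serialize(
    [("<div id='WelcomeMess'>", False), ("evilUser", True), ("</div>", False)],
    seed_Cs=1234)
```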

V −→ ⟦c N ⟧c    { N.mark = Untrusted; }
X −→ Y1 Y2      { if (X.mark == Untrusted) then (Y1.mark = X.mark; Y2.mark = X.mark;)
                  else (Y1.mark = Trusted; Y2.mark = Trusted;) }

Figure 7.6: Rules for computing mark attributes in minimal deserialization.
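The rules of Figure 7.6 can be read as a straightforward top-down tree traversal. The sketch below is a simplified rendering; the Node class is an invention of this sketch, and its delimited flag marks a subtree derived via a V −→ ⟦c N ⟧c production.

```python
# Hedged sketch of the Figure 7.6 attribute rules: mark is an inherited
# attribute; the root of a delimited subtree starts Untrusted, and every
# other node inherits Untrusted only from an Untrusted parent.
class Node:
    def __init__(self, label, children=(), delimited=False):
        self.label = label
        self.children = list(children)
        self.delimited = delimited  # derived via V -> [[c N ]]c ?
        self.mark = None

def compute_marks(node, inherited="Trusted"):
    node.mark = inherited
    for child in node.children:
        child_mark = ("Untrusted"
                      if (child.delimited or node.mark == "Untrusted")
                      else "Trusted")
        compute_marks(child, child_mark)

# The start symbol S (here the root) is initialized to Trusted:
root = Node("html", [Node("div", [Node("text"), Node("user-input", delimited=True)])])
compute_marks(root)
```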

Deserialization

When the browser receives the serialized web page, it first parses it into the initial static document structure. The document parse tree obtained from deserialization can verifiably identify the untrusted nodes.

Minimal deserialization. Conceptually, to perform deserialization the browser parses as normal, except that it does special processing for the randomized delimiters ⟦c and ⟧c. It ensures that the token corresponding to ⟦c matches the token corresponding to ⟧c iff their suffixes are the same random value c and c ∈ C. It also marks the nodes in the parse tree that are delimited by the special delimiters as untrusted.

Algorithm to mark untrusted nodes. Minimal deserialization is a syntax-directed translation scheme, which computes an inherited attribute, mark, associated with each node in the parse tree, denoting whether the node is Trusted or Untrusted. For the sake of conceptual explanation, let us assume that we can represent valid web pages that the browser accepts by a context-free grammar G4. Let G = {V, Σ, S, P}, where V denotes non-terminals, Σ denotes terminals including special delimiters, S is the start symbol, and P is a set of productions. Assuming that C is the set of valid randomized suffix values, the serialized web page s obeys the following rules:

(a) All untrusted data is confined to a subtree rooted at some non-terminal N, such that a production V −→ ⟦c N ⟧c is in P.

(b) Productions of the form V −→ ⟦c1 N ⟧c2, c1 ≠ c2, are not allowed in P.

(c) ∀c ∈ C, all productions of the form V −→ ⟦c N ⟧c are valid in P.

The rules to compute the inherited attribute mark are defined in Figure 7.6, with the mark attribute for S initialized to Trusted.

Fail-Safe. Appending random suffixes does not lead to a robust design by itself. Sending the set C of random values used in randomizing the additional markup adds robustness against attacker-spoofed delimiters. To see why, suppose C was not explicitly sent in our design.
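Browser-side delimiter checking and node marking can be sketched as follows. The `[[c|…|]]c` bracket syntax again stands in for ⟦c … ⟧c, and the flat node list is a stand-in for a real parse tree; all names here are assumptions of this sketch.

```python
# Hedged sketch of minimal deserialization: regenerate C from (Cs, n),
# accept a confined region only when its opening and closing suffixes are
# the same c with c in C, and mark that region Untrusted. Fail-safe: any
# mismatched or unknown suffix is treated as an attack.
import random
import re

def regen_C(seed_Cs, n):
    rng = random.Random(seed_Cs)
    return {rng.randrange(10**4) for _ in range(n)}

def deserialize(page, seed_Cs, n):
    C = regen_C(seed_Cs, n)
    nodes, pos = [], 0
    for m in re.finditer(r"\[\[(\d+)\|(.*?)\|\]\](\d+)", page):
        c_open, body, c_close = int(m.group(1)), m.group(2), int(m.group(3))
        if c_open != c_close or c_open not in C:
            raise ValueError("spoofed or mismatched delimiter")  # fail-safe
        if m.start() > pos:
            nodes.append((page[pos:m.start()], "Trusted"))
        nodes.append((body, "Untrusted"))  # confined subtree
        pos = m.end()
    if pos < len(page):
        nodes.append((page[pos:], "Trusted"))
    return nodes

# Reconstruct the page a (hypothetical) server using seed 99 would emit:
c = random.Random(99).randrange(10**4)
page = f"<div id='WelcomeMess'>[[{c}|evilUser|]]{c}</div>"
print(deserialize(page, seed_Cs=99, n=1))
```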
Consider the scenario where an adaptive attacker tries to confuse the parser by generating two valid parse trees. In Figure 7.7, the attacker embeds a delimiter with suffix 2222 in $GET[‘FriendId-Status’] and a matching delimiter with the same suffix in $GET[‘MainUser’]. There could be two valid parse trees—one that matches

4 Practical implementations may not strictly parse context-free grammars.

Figure 7.7: Serialized running example with attacker-embedded matching delimiters (suffix 2222).