Automatic Creation of SQL Injection and Cross-Site Scripting Attacks
Total Page:16
File Type:pdf, Size:1020Kb
Automatic Creation of SQL Injection and Cross-Site Scripting Attacks Adam Kiezun˙ Philip J. Guo Karthick Jayaraman Michael D. Ernst MIT Stanford University Syracuse University MIT [email protected] [email protected] [email protected] [email protected] Abstract code on the victim’s machine by exploiting inadequate val- idation of data flowing to statements that output HTML. In We present a technique for finding security vulnerabili- 2008, SQLI comprised 27% of Web-application vulnerabil- ties in Web applications. SQL Injection (SQLI) and cross- ities (up from 18% in 2007), and XSS comprised 24% (up site scripting (XSS) attacks are widespread forms of attack from 21%) [2]. in which the attacker crafts the input to the application to Previous approaches to identifying SQLI and XSS vul- access or modify user data and execute malicious code. In nerabilities and preventing exploits include defensive cod- the most serious attacks (called second-order, or persistent, ing, static analysis, dynamic monitoring, and test genera- XSS), an attacker can corrupt a database so as to cause tion. Each of these approaches has its own merits, but also subsequent users to execute malicious code. offers opportunities for improvement. Defensive coding [3] This paper presents an automatic technique for creating is error-prone and requires rewriting existing software to inputs that expose SQLI and XSS vulnerabilities. The tech- use safe libraries. Static analysis tools [12, 16, 29] can pro- nique generates sample inputs, symbolically tracks taints duce false warnings and do not create concrete examples of through execution (including through database accesses), inputs that exploit the vulnerabilities. Dynamic monitoring and mutates the inputs to produce concrete exploits. Ours tools [8,24,26] incur runtime overhead on the running appli- is the first analysis of which we are aware that precisely cation. Black-box test generation does not take advantage addresses second-order XSS attacks. of the application’s internals [27], while previous white-box Our technique creates real attack vectors, has few false techniques have not been shown to discover unknown vul- positives, incurs no runtime overhead for the deployed ap- nerabilities [30]. plication, works without requiring modification of appli- We have created a new technique for identifying SQLI cation code, and handles dynamic programming-language and XSS vulnerabilities. Unlike previous approaches, our constructs. We implemented the technique for PHP, in a tool technique works on unmodified existing code, creates con- Ardilla. We evaluated Ardilla on five PHP applications crete inputs that expose vulnerabilities, has no overhead for and found 68 previously unknown vulnerabilities (23 SQLI, the released software, and analyzes application internals to 33 first-order XSS, and 12 second-order XSS). discover vulnerable code. As an implementation of our technique, we created Ardilla, an automated tool for cre- ating SQLI and XSS attacks in PHP/MySQL applications. 1 Introduction In our experiments, Ardilla discovered 68 previously un- known vulnerabilities in five applications. This paper presents a technique and an automated tool Ardilla is based on input generation, taint propagation, for finding security vulnerabilities in Web applications. and input mutation to find variants of an execution that ex- Multi-user Web applications are responsible for handling ploit a vulnerability. We now discuss these components. much of the business on today’s Internet. Such applica- Ardilla can use any input generator. Our current im- tions often manage sensitive data for many users, and that plementation uses combined concrete and symbolic execu- makes them attractive targets for attackers: up to 70% tion [1,6,25,30]. During each execution, the input generator of recently reported vulnerabilities affected Web applica- monitors the program to record path constraints that capture tions [2]. Therefore, security and privacy are of great im- the outcome of control-flow predicates. The input generator portance for Web applications. automatically and iteratively generates new inputs by negat- Two classes of attacks are particularly common and dam- ing one of the observed constraints and solving the modi- aging. In SQL injection (SQLI), the attacker executes ma- fied constraint system. Each newly-created input explores licious database statements by exploiting inadequate vali- at least one additional control-flow path. dation of data flowing from the user to the database. In Ardilla’s vulnerability detection is based on dynamic cross-site scripting (XSS), the attacker executes malicious taint analysis [8, 29]. Ardilla marks data coming from the 1 user as potentially unsafe (tainted), tracks the flow of tainted SQL Injection. A SQLI vulnerability results from the ap- data in the application, and checks whether tainted data can plication’s use of user input in constructing database state- reach sensitive sinks. An example of a sensitive sink is the ments. The attacker invokes the application, passing as an PHP mysql query function, which executes a string argu- input a (partial) SQL statement, which the application exe- ment as a MySQL statement. If a string derived from tainted cutes. This permits the attacker to get unauthorized access data is passed into this function, then an attacker can poten- to, or to damage, the data stored in a database. To prevent tially perform an SQL injection, if the tainted string affects this attack, applications need to sanitize input values that the structure of the SQL query as opposed to just being used are used in constructing SQL statements, or else reject po- as data within the query. Similarly, passing tainted data into tentially dangerous inputs. functions that output HTML can lead to XSS attacks. First-order XSS. A first-order XSS (also known as Type 1, rdilla A ’s taint propagation is unique in that it tracks the or reflected, XSS) vulnerability results from the application flow of tainted data through the database. When tainted inserting part of the user’s input in the next HTML page data is stored in the database, the taint information is stored that it renders. The attacker uses social engineering to con- with it. When the data is later retrieved from the database, vince a victim to click on a (disguised) URL that contains it is marked with the stored taint. Thus, only data that malicious HTML/JavaScript code. The user’s browser then was tainted upon storing is tainted upon retrieval. This displays HTML and executes JavaScript that was part of the rdilla precision makes A able to accurately detect second- attacker-crafted malicious URL. This can result in stealing order (persistent) XSS attacks. By contrast, previous tech- of browser cookies and other sensitive user data. To prevent niques either treat all data retrieved from the database as first-order XSS attacks, users need to check link anchors tainted [28, 29] (which may lead to false warnings) or treat before clicking on them, and applications need to reject or all such data as untainted [16] (which may lead to missing modify input values that may contain script code. real vulnerabilities). Not every flow of tainted data to a sensitive sink indicates Second-order XSS. A second-order XSS (also known as a vulnerability because the data may flow through routines persistent, stored, or Type 2 XSS) vulnerability results from that check or sanitize it. To improve usability, Ardilla does the application storing (part of) the attacker’s input in a not require the user to provide a list of sanitizing routines. database, and then later inserting it in an HTML page that is Instead, Ardilla systematically mutates inputs that propa- displayed to multiple victim users (e.g., in an online bulletin gate taints to sensitive sinks, using a library of strings that board application). It is harder to prevent second-order XSS can induce SQLI and XSS attacks. Ardilla then analyzes than first-order XSS, because applications need to reject or the difference between the parse trees of application out- sanitize input values that may contain script code and are puts (SQL and HTML) to determine whether the attack may displayed in HTML output, and need to use different tech- subvert the behavior of a database or a Web browser, respec- niques to reject or sanitize input values that may contain tively. This step enables Ardilla to reduce the number of SQL code and are used in database commands. false warnings and to precisely identify real vulnerabilities. Second-order XSS is much more damaging than first- order XSS, for two reasons: (a) social engineering is not This paper makes the following contributions: required (the attacker can directly supply the malicious in- put without tricking users into clicking on a URL), and (b) • A fully-automatic technique for creating SQLI and a single malicious script planted once into a database exe- XSS attack vectors, including those for second-order cutes on the browsers of many victim users. (persistent) XSS attacks. (Section 3) • A novel technique that determines whether a propa- 2.1 Example PHP/MySQL Application gated taint is a vulnerability, using input mutation and PHP is a server-side scripting language widely used in output comparison. (Section 4.3) creating Web applications. The program in Figure 1 imple- • A novel approach to symbolically tracking the flow of ments a simple message board that allows users to read and tainted data through a database. (Section 4.4) post messages, which are stored in a MySQL database. To • An implementation of the technique for PHP in a tool use the message board, users of the program fill an HTML Ardilla (Section 4), and evaluation of Ardilla on real form (not shown here) that communicates the inputs to the PHP applications (Section 5). server via a specially formatted URL, e.g., http://www.mysite.com/?mode=display&topicid=1 2 SQL Injection and Cross-Site Scripting Input parameters passed inside the URL are available in This section describes SQLI and XSS Web-application the $ GET associative array.