Improving Application Security with Data Flow Assertions

Improving Application Security with Data Flow Assertions Alexander Yip, Xi Wang, Nickolai Zeldovich, and M. Frans Kaashoek Massachusetts Institute of Technology – Computer Science and Artificial Intelligence Laboratory ABSTRACT miss some, which can lead to exploits. For example, one popular Web application, phpMyAdmin [38], requires sanitizing user input RESIN is a new language runtime that helps prevent security vulnerabilities, by allowing programmers to specify application-level data in 1,409 places. Not surprisingly, phpMyAdmin has suffered 60 vulnerabilities because some of these calls were forgotten [41]. flow assertions. RESIN provides policy objects, which programmers use to specify assertion code and metadata; data tracking, which This paper presents RESIN, a system that allows programmers allows programmers to associate assertions with application data, to make their plan for correct data flow explicit using data flow and to keep track of assertions as the data flow through the appli- assertions. Programmers can write a data flow assertion in one cation; and filter objects, which programmers use to define data place to capture the application’s high-level data flow invariant, and RESIN checks the assertion in all relevant places, even places where flow boundaries at which assertions are checked. RESIN’s runtime checks data flow assertions by propagating policy objects along with the programmer might have otherwise forgotten to check. data, as that data moves through the application, and then invoking RESIN operates within a language runtime, such as the Python or filter objects when data crosses a data flow boundary, such as when PHP interpreter. RESIN tracks application data as it flows through writing data to the network or a file. the application, and checks data flow assertions on every executed path. RESIN uses runtime mechanisms because they can capture Using RESIN, Web application programmers can prevent a range of problems, from SQL injection and cross-site scripting, to inadver- dynamic properties, like user-defined access control lists, while tent password disclosure and missing access control checks. Adding integration with the language allows programmers to reuse the application’s existing code in an assertion. RESIN is designed to help a RESIN assertion to an application requires few changes to the existing application code, and an assertion can reuse existing code programmers gain confidence in the correctness of their application, and data structures. For instance, 23 lines of code detect and prevent and is not designed to handle malicious code. three previously-unknown missing access control vulnerabilities in A key challenge facing RESIN is knowing when to verify a data phpBB, a popular Web forum application. Other assertions compris- flow assertion. Consider the assertion that a user’s password can ing tens of lines of code prevent a range of vulnerabilities in Python flow only to the user herself. There are many different ways that an adversary might violate this assertion, and extract someone’s and PHP applications. A prototype of RESIN incurs a 33% CPU overhead running the HotCRP conference management application. password from the system. The adversary might trick the application into emailing the password; the adversary might use a SQL injection attack to query the passwords from the database; or the adversary Categories and Subject Descriptors might fetch the password file from the server using a directory D.2.4 [Software Engineering]: Software Verification—Assertion traversal attack. RESIN needs to cover every one of these paths to checkers; D.4.6 [Operating Systems]: Security and Protection prevent password disclosure. A second challenge is to design a generic mechanism that makes General Terms it easy to express data flow assertions, including common assertions like cross-site scripting avoidance, as well as application-specific Security, Languages, Design assertions. For example, HotCRP [26], a conference management application, has its own data flow rules relating to password disclo- 1. INTRODUCTION sure and reviewer conflicts of interest, among others. Can a single assertion API allow for succinct assertions for cross-site scripting Software developers often have a plan for correct data flow within avoidance as well as HotCRP’s unique data flow rules? their applications. For example, a user u’s password may flow out The final challenge is to make data flow assertions coexist with of a Web site only via an email to user u’s email address. As another each other and with the application code. A single application may example, user inputs must always flow through a sanitizing function have many different data flow assertions, and it must be easy to before flowing into a SQL query or HTML, to avoid SQL injection add an additional assertion if a new data flow rule arises, without or cross-site scripting vulnerabilities. Unfortunately, today these having to change existing assertions. Moreover, applications are plans are implemented implicitly: programmers try to insert code often written by many different programmers. One programmer in all the appropriate places to ensure correct flow, but it is easy to may work on one part of the application and lack understanding of the application’s overall data flow plan. RESIN should be able to enforce data flow assertions without all the programmers being Permission to make digital or hard copies of all or part of this work for aware of the assertions. personal or classroom use is granted without fee provided that copies are RESIN addresses these challenges using three ideas: policy ob- not made or distributed for profit or commercial advantage and that copies jects, data tracking, and filter objects. Programmers explicitly an- bear this notice and the full citation on the first page. To copy otherwise, to notate data, such as strings, with policy objects, which encapsulate republish, to post on servers or to redistribute to lists, requires prior specific the assertion functionality that is specific to that data. Program- permission and/or a fee. mers write policy objects in the same language that the rest of the SOSP’09, October 11–14, 2009, Big Sky, Montana, USA. Copyright 2009 ACM 978-1-60558-752-3/09/10 ...$10.00. application is written in, and can reuse existing code and data struc- Vulnerability Count Percentage Vulnerable sites SQL injection 1176 20.4% Vulnerability among those surveyed Cross-site scripting 805 14.0% Cross-site scripting 31.5% Denial of service 661 11.5% Information leakage 23.3% Buffer overflow 550 9.5% Predictable resource location 10.2% Directory traversal 379 6.6% SQL injection 7.9% Server-side script injection 287 5.0% Insufficient access control 1.5% Missing access checks 263 4.6% HTTP response splitting 0.8% Other vulnerabilities 1647 28.6% Total 5768 100% Table 2: Top Web site vulnerabilities of 2007 [48]. Table 1: Top CVE security vulnerabilities of 2008 [41]. or HTML. In practice this is difficult to accomplish because there are many data flow paths to keep track of, and some of them are tures, which simplifies writing application-specific assertions. The non-intuitive. For example, in one cross-site scripting vulnerability, RESIN runtime then tracks these policy objects as the data propa- phpBB queried a malicious whois server, and then incorporated the gates through the application. When the data is about to leave the response into HTML without first sanitizing the response. A survey control of RESIN, such as being sent over the network, RESIN in- of Web applications [48] summarized in Table 2 illustrates how vokes filter objects to check the data flow assertions with assistance common these bugs are with cross-site scripting affecting more than from the data’s policy objects. 31% of applications, and SQL injection affecting almost 8%. We evaluate RESIN in the context of application security by show- If there were a tool that could enforce a data flow assertion on ing how these three mechanisms can prevent a wide range of vul- an entire application, a programmer could write an assertion to nerabilities in real Web applications, while requiring programmers catch these bugs and prevent an adversary from exploiting them. to write only tens of lines of code. One application, the MoinMoin For example, an assertion to prevent SQL injection exploits would wiki [31], required only 8 lines of code to catch the same access verify that: control bugs that required 2,000 lines in Flume [28], although Flume provides stronger guarantees. HotCRP can use RESIN to uphold DATA FLOW ASSERTION 1. Any user input data must flow through its data flow rules, by adding data flow assertions that control who a sanitization function before it flows into a SQL query. may read a paper’s reviews, and to whom HotCRP can email a pass- RESIN aims to be such a tool. word reminder. Data flow assertions also help prevent a range of other previously-unknown vulnerabilities in Python and PHP Web Directory Traversal applications. A prototype RESIN runtime for PHP has acceptable performance overhead, amounting to 33% for HotCRP. Directory traversal is another common vulnerability that accounts The contributions of this work are the idea of an application-level for 6.6% of the vulnerabilities in Table 1. In a directory traversal data flow assertion, and a technique for implementing data flow attack, a vulnerable application allows the user to enter a file name, assertions using filter objects, policy objects, and data tracking. Ex- but neglects to limit the directories available to the user. To exploit periments with several real applications further show that data flow this vulnerability, an adversary typically inserts the “..” string as part assertions are concise, effective at preventing many security vulner- of the file name which allows the adversary to gain unauthorized abilities, and incrementally deployable in existing applications. access to read, or write files in the server’s file system.

Load more