FLAX: Systematic Discovery of Client-Side Validation Vulnerabilities in Rich Web Applications

FLAX: Systematic Discovery of Client-side Validation Vulnerabilities in Rich Web Applications Prateek Saxena§ Steve Hanna§ Pongsin Poosankam‡§ Dawn Song§ {prateeks,sch,ppoosank,dawnsong}@eecs.berkeley.edu §University of California, Berkeley ‡Carnegie Mellon University Abstract side component and a client-side component. The server- side component processes the user’s request and generates The complexity of the client-side components of web an HTML response that is sent back to the browser. The applications has exploded with the increase in popularity client-side code of the web application, typically written in of web 2.0 applications. Today, traditional desktop ap- JavaScript, is sent with the HTML response from the server. plications, such as document viewers, presentation tools The client-side component executes in the web browser and and chat applications are commonly available as online is responsible for processing input data and dynamically up- JavaScript applications. dating the view of web page on the client. We define a CSV Previous research on web vulnerabilities has primarily vulnerability as one which results from unsafe usage of un- concentrated on flaws in the server-side components of web trusted data in the client-side code of the web application. applications. This paper highlights a new class of vulnera- CSV vulnerabilities belong to the general class of in- bilities, which we term client-side validation (or CSV) vul- put validation vulnerabilities, but are different from tradi- nerabilities. CSV vulnerabilities arise from unsafe usage of tional web vulnerabilities like SQL injection [10, 35] and untrusted data in the client-side code of the web applica- reflected/stored cross-site scripting [18, 26, 37, 39]. For ex- tion that is typically written in JavaScript. In this paper, ample, one type of CSV vulnerability involves data that we demonstrate that they can result in a broad spectrum of enters the application through the browser’s cross-window attacks. Our work provides empirical evidence that CSV communication abstractions and is processed completely by vulnerabilities are not merely conceptual but are prevalent JavaScript code, without ever being sent back to the web in today’s web applications. server. Another type of CSV vulnerability is one where a We propose dynamic analysis techniques to systemati- web application sanitizes input data sufficiently before em- cally discover vulnerabilities of this class. The techniques bedding it in its initial HTML response, but does not sani- are light-weight, efficient, and have no false positives. tize the data sufficiently for its use in the JavaScript compo- We implemented our techniques in a prototype tool called nent. FLAX, which scales to real-world applications and has dis- CSV vulnerabilities are becoming increasingly likely covered 11 vulnerabilities in the wild so far. due to the growing complexity of JavaScript applications. Increasing demand for interactive performance of rich web 2.0 applications has led to rapid deployment of application 1 Introduction logic as client-side scripts. A significant fraction of the data processing in AJAX applications (such as Gmail, Google Docs, and Facebook) is done by JavaScript components. Input validation vulnerabilities constitute a majority of JavaScript has several dynamic features for code evaluation web vulnerabilities and have been widely studied in the and is highly permissive in allowing code and data to be past [4, 8, 24, 28, 30, 35, 42, 43]. However, previous vul- inter-mixed. As a result, attacks resulting from CSV vulner- nerability research has focused primarily on the server-side abilities often result in compromise of the web application’s components of web applications. This paper focuses on integrity. client-side validation (or CSV) vulnerabilities, a new class of vulnerabilities which result from bugs in the client-side Goals. As a first step towards finding CSV vulnerabil- code. ities, we aim to develop techniques that analyzes a web A typical Web 2.0 application has two parts: a server- application in an end-to-end manner. Since most existing works have targeted their analyses to server-side compo- if the analysis treats any use of untrusted data which has nents (written in PHP, Java, etc.), this paper develops com- been passed through a parsing/validation construct as safe, plementary techniques to discover vulnerabilities in client- it is likely to miss many bugs. Static analysis is another side code. In particular, we develop a framework for sys- approach [14, 17]; however static analysis tools do not di- tematic analysis of JavaScript1 code. Our objective is to rectly provide concrete exploit instances and require addi- build a tool for vulnerability discovery that does not require tional developer analysis to prune away false positives. developer annotations, has no false positives and is usable Recently, symbolic execution techniques have been used on real-world applications. for discovering and diagnosing vulnerabilities in server-side Challenges. The first challenge of holistic application anal- logic [9,23,25,42]. However, web applications pervasively ysis is in dealing with the complexity of JavaScript. Many use complicated operations on string and arrays data types, JavaScript programs use code evaluation constructs to dy- both of which raise difficulties for decision procedures in- namically generate code as well as to serialize strings into volved in symbolic execution techniques. The power and complex data structures (such as JSON arrays/objects). In expressiveness of string decision procedures today is lim- addition, the language supports myriad high-level opera- ited. Practical implementations of string decision proce- tions on complex data types, which makes the task of prac- dures presently do not deal with the generality of JavaScript tical analysis difficult. string constraints involving common operations (such as In JavaScript application code, we observe that parsing String.replace, regular expression match, concatena- operations are syntactically indistinguishable from validation and equality) expressed together over multi-variable, tion checks. This makes it infeasible for automated syn- variable-length inputs [9,20,23,25]. Other approaches have tactic analyses to reason about the sufficiency of validation been limited to a subset of input-transformation operations checks in isolation from the rest of the logic. Due to the in PHP [4]. The present limitations of symbolic execution convenience of their use in the language, developers tend tools has motivated the need for designing lighter-weight to treat strings as a universal type for exchange, both of techniques. code as well as data. Consequently, complex string op- Our Approach. We propose a dynamic analysis approach erations such as regular expression match and replace are to discover vulnerabilities in web applications called taint pervasively used both for parsing input and for performing enhanced blackbox fuzzing. Our technique is a hybrid ap- custom validation checks. proach that combines the features of dynamic taint analy- Third, in many web applications the client-side code pe- sis with those of automated random fuzzing. It remedies riodically sends data to a remote server for processing via the limitations of purely dynamic taint analysis (described browser interfaces such as XMLHttpRequest, and then above), by using random fuzz testing to generate test cases operates on the returned result. We call such a flow of data, that concretely demonstrate the presence of a CSV vulner- to a server and back, a reflected flow. Client-side analyses ability. This simple mechanism eliminates false alarms that face the inherent difficulty of dealing with hidden process- would result from a purely taint-based tool. ing on remote servers due to reflected flows. The number of test cases generated by vanilla blackbox Existing Approaches. Fuzzing or black-box testing is a fuzzing increases combinatorially with the size of the input. popular light-weight mechanism for testing applications. In our hybrid approach, we use character-level precise dy- However, black-box fuzzing does not scale well with a large namic taint information to prune the input search space sig- number of inputs and is often inefficient in exploration of nificantly. Dynamic taint information extracts knowledge the input space. A more directed approach used in the past of the type of sink operation involved in the vulnerability, in the context of server-side code analysis is based on dy- thereby making the subsequent blackbox fuzzing special- namic taint-tracking. Dynamic taint analysis is useful for ized for each sink type (or in other words, be sink-aware). identifying a flow of data from an untrusted source to a Taint enhanced blackbox fuzzing scales well because the critical operation. However, dynamic taint-tracking alone results of dynamic taint analysis are used to create inde- alone can not determine if the application sufficiently vali- pendent abstractions of the original application which are dates untrusted data before using it, especially when parsing small and take fewer inputs, and can be tested efficiently and validation checks are syntactically indistinguishable. If with sink-aware fuzzing. From our experiments (Section 5), an analysis tool treats all string operations on the input as we report a average reduction of 55% in the input sizes with parsing constructs, it will fail to identify validation checks the use of dynamic taint information. and will report

Load more