HTTPOS: Sealing Information Leaks with Browser-Side Obfuscation of Encrypted Flows
Total Page:16
File Type:pdf, Size:1020Kb
HTTPOS: Sealing Information Leaks with Browser-side Obfuscation of Encrypted Flows § § § † § ‡ Xiapu Luo ∗, Peng Zhou , Edmond W. W. Chan , Wenke Lee , Rocky K. C. Chang , Roberto Perdisci The Hong Kong Polytechnic University§, Georgia Institute of Technology†, University of Georgia‡ § † ‡ {csxluo,cspzhouroc,cswwchan,csrchang}@comp.polyu.edu.hk , [email protected] , [email protected] Abstract be profiled from traffic features [29]. A common approach to preventing leaks is to obfuscate the encrypted traffic by Leakage of private information from web applications— changing the statistical features of the traffic, such as the even when the traffic is encrypted—is a major security packet size and packet timing information [13,23,35,38]. threat to many applications that use HTTP for data deliv- Existing methods for defending against information ery. This paper considers the problem of inferring from en- leaks, however, suffer from quite a few problems. A major crypted HTTP traffic the web sites or web pages visited by problem is that, as server-side solutions, they require modi- a user. Existing browser-side approaches to this problem fications of web entities, such as browsers, servers, and even cannot defend against more advanced attacks, and server- web objects [13,38]. Modifying the web entities is not fea- side approaches usually require modifications to web enti- sible in many circumstances and cannot easily satisfy differ- ties, such as browsers, servers, or web objects. In this paper, ent applications’ requirements on information leak preven- we propose a novel browser-side system, namely HTTPOS, tion. A second fundamental problem with these methods to prevent information leaks and offer much better scalabil- is that they are still vulnerable to some advanced traffic- ity and flexibility. HTTPOS provides a comprehensive and analysis attacks. For example, although Sun et al. [35] pro- configurable suite of traffic transformation techniques for posed twelve approaches to defeat their traffic-analysis at- a browser to defeat traffic analysis without requiring any tack based on web object size, new attacks based on the server-side modifications. Extensive evaluation of HTTPOS tuple of packet size and direction [26] could still identify on live web traffic shows that it can successfully prevent the the web sites visited by a user. Finally, the efficacy of these state-of-the-art attacks from inferring private information methods has not been validated thoroughly based on actual from encrypted HTTP flows. implementations and live HTTP traffic. An exception is the work from Chen et al. [13] that is implemented as an IIS extension and a Firefox add-on. 1 Introduction In this paper we explore a browser-side approach to pre- vent information leaks from encrypted web traffic. Com- pared with the server-side approach, the browser-side ap- Leakage of private information from web applications proach has the scalability advantage, because only the traf- is a major security threat to many applications that use fic between the browser and the visited servers needs to be HTTP for data delivery. Cloud computing and other similar obfuscated. Moreover, it is possible for users to choose service-oriented platforms will only exacerbate this prob- which encrypted flows to be obfuscated in order to con- lem, because these services are usually delivered through serve resources and to reduce impacts on performance, but web browsers. Moreover, it is well known that data encryp- this flexibility advantage is very difficult to obtain from a tion alone is insufficient for preventing information leaks. server-side approach. However, designing a browser-side For instance, traffic-analysis attacks can identify the web method is very challenging, because the server’s behavior sites visited by a user from encrypted traffic [7,22,23,26] cannot be directly modified to evade traffic analysis. That and anonymized NetFlows records [14]. Chen et al. have is, we cannot apply the previously proposed methods that further showed that sensitive personal information, such as assume the capability of modifying the server’s behavior to medical records and financial data, could also be inferred the browser-side approach. through traffic analysis [13]. Besides, a user’s browser We show in this paper that it is possible to devise a could be fingerprinted [39], and her browsing patterns could browser-side method to defeat traffic analysis by presenting ∗Most of the work by this author was performed while at Georgia Tech. HTTPOS (which stands for HTTP or HTTPS with Obfus- cation), our proposed browser-side method. In addition to and HTTPOS’s location. There are five entities in the threat the advantages discussed above, HTTPOS has several other models: a victim user (i.e., client in Figure 1), an attacker, important advantages, such as supporting a wider scope of an encrypted tunnel, HTTPOS, and remote web sites/pages. application scenarios than the previous approaches (see the The threat models for scenarios (a)-(c) were adopted in pre- threat model in Section 2). HTTPOS is a user-level pro- vious works, including Sun et al. [35], Liberatore et al. [26] gram and does not need to modify any web entity. It obfus- and Wright et al. [38], and the threat model for scenario (d) cates encrypted web traffic by modifying four fundamen- was adopted by Chen et al. [13]. tal network flow features on the TCP and the HTTP layers, In scenarios (a)-(c), a client visits a web site through an namely, packet size, timing of packets, web object size, and encrypted tunnel at different layers, for example, wireless flow size. These features, as shown in the evaluation, are channel with WEP/WPA [21], IPSec-based IP tunnel [37], sufficient for diffusing and confusing the existing traffic- and SSH-based TCP tunnel [6], and the attacker attempts to analysis attacks. To modify the traffic from web servers, find outthat web site. In scenario (d), a client visits different HTTPOS exploits a number of basic features in TCP (e.g., web pages at a certain web site. This attack model assumes Maximal Segment Size (MSS) negotiation and advertising that the attacker knows the web site, and she attempts to window) and HTTP (e.g., HTTP Range and HTTP Pipelin- discover the web pages visited by the client. Note that an ing). updated web page due to the client’s interactions with the We have implemented HTTPOS and conducted exten- web site is considered as a new web page from an attacker’s sive experiments on live HTTP traffic to evaluate its per- viewpoint. For example, some web sites (e.g., Google) may formance in terms of evading traffic-analysis attacks and return auto-suggested words upon receiving a user input. impacts on the performance of network flows. The re- These dynamically updated web pages are regarded as dif- sults show that HTTPOS can effectively prevent the state- ferent web pages. of-the-art attacks from inferring private information from In all four scenarios, the attacker eavesdrops the en- encrypted HTTP flows. As will be shown in Section 5.2, crypted tunnel to obtain the encrypted packets sent between without HTTPOS an attacker can achieve high accuracy the victim and web servers, but she cannot decrypt these of inferring the web sites visited by a user. For example, packets. To infer the visited web sites/pages from these some attacks can achieve 94% accuracy on inferring the encrypted packets, the attacker first profiles the character- 100 web sites we tested by selecting the one with the high- istics of the traffic between the victim and each candidate est confidence based on their classification algorithm. With web site/page. The traffic profiling depends on the traffic HTTPOS all attacks’ accuracy drops to zero for at least 98 analysis methods [7,13,22,23,26]. She can easily build web sites. Even if an attacker chooses the top five web sites such profiles by visiting those web sites/pages via the en- as her inference, the accuracy for at least 94 web sites re- crypted tunnels herself. Equipped with the set of traffic pro- mains zero for all attacks. Moreover, some traffic-analysis files, the attacker then performs the inference by classifying attack can easily infer a user’s input to the Google search the captured traffic trace into the traffic profiles prepared box. But when HTTPOS is applied, the output of such at- beforehand. From the viewpoint of pattern classification, tack is reduced to a random guess. the traffic profiling step is known as conducting supervised Section 2 first presents the threat model, and Section 3 learning to train a classifier, whose feature set is the traffic describes our strategies for evading traffic-analysis attacks profile, and the class label is the web site/page. Moreover, and methods for manipulating network flow features. After the inference step corresponds to labeling a traffic trace ac- that, we introduce HTTPOS’s design and implementation cording to the trained classifier [18]. in Section 4, followed by extensive experiment results in Section 5. We finally introduce the related work in Section HTTPOS obfuscates the encrypted traffic by exploiting the protocol features in TCP and HTTP. Since the TCP con- 6 before concluding this paper in Section 7. nection (and therefore the HTTP connection) is end-to-end in scenarios (a), (b), and (d), HTTPOS can be deployed at 2 Threat models the browsers. On the other hand, the browser’s TCP connec- tion is terminated at the tunnel entry in scenario (c). There- Unlike previous works, we consider in this paper both fore, when HTTPOS is deployed at the browsers for TCP (1) the problem of inferring the web sites visited by users tunnels, only the HTTP methods can be used for traffic ob- and (2) the problem of inferring the web pages browsed fuscation.