Traffic-Classification) Rules and Avoiding Them Efficiently
Total Page:16
File Type:pdf, Size:1020Kb
lib•erate, (n) : A library for exposing (traffic-classification) rules and avoiding them efficiently Fangfan Li Abbas Razaghpanah Arash Molavi Kakhki Northeastern University Stony Brook University Northeastern University [email protected] [email protected] [email protected] Arian Akhavan Niaki David Choffnes Phillipa Gill Stony Brook University Northeastern University University of Massachusetts Amherst [email protected] [email protected] [email protected] Alan Mislove Northeastern University [email protected] ABSTRACT : A library for exposing (traffic-classification) rules and avoiding them Middleboxes implement a variety of network management policies efficiently. In Proceedings of IMC ’17, London, United Kingdom, November (e.g., prioritizing or blocking traffic) in their networks. While such 1–3, 2017, 14 pages. https://doi.org/10.1145/3131365.3131376 policies can be beneficial (e.g., blocking malware) they also raise issues of network neutrality and freedom of speech when used for application-specific differentiation and censorship. There is a poor understanding of how such policies are implemented in 1 INTRODUCTION practice, and how they can be evaded efficiently. As a result, most It is common for ISPs to use middleboxes to manage their network circumvention solutions are brittle, point solutions based on manual traffic. Recent work [32, 35] showed that such middleboxes are often analysis. implemented using deep packet inspection (DPI) on network traffic, This paper presents the design and implementation of lib·erate, relying on packet contents (e.g., Host headers, TLS Server Name a tool for automatically identifying middlebox policies, reverse- Indication (SNI) extensions, or protocol-specific fields) to selectively engineering their implementations, and adaptively deploying cus- apply policies to flows. While such policies may sometimes be tom circumvention techniques. Unlike previous work, our approach beneficent, there are many scenarios in which they may cause harm is application-agnostic, can be deployed unilaterally (i.e., only at to users and/or service providers, e.g., by violating net neutrality, one endpoint) on unmodified applications via a linked library or implementing traffic shaping, or censoring content. transparent proxy, and can adapt to changes to classifiers at run- In response, researchers have proposed a number of solutions to time. We implemented a lib·erate prototype as a transparent proxy obfuscate traffic in a way that evades classification. In general, these and evaluate it both in a testbed environment and in operational approaches entail encrypting/proxying traffic [3, 19], transforming networks that throttle or block traffic based on DPI-based classifier flows to resemble different protocols [48, 49], or selectively modify- rules, and show that our approach is effective across a wide range ing contents of fields that are targeted by classifiers24 [ ]. However, of middlebox deployments. developers of evasion techniques find themselves in an arms race with network providers, where the scales are tipped in favor of the CCS CONCEPTS provider. Designing evasion schemes generally involves one-off analyses based on a specific middlebox classifier (or suspected cen- • Networks → Middle boxes / network appliances; sorship trigger). Further, these schemes face deployment hurdles as KEYWORDS they often require both client- and server-side changes. For exam- ple, the mobile app Signal recently deployed domain fronting [24] Network Neutrality, Traffic Differentiation to avoid blocking in Egypt and UAE [36]. This required users to ACM Reference Format: update their apps to evade censorship, with the obvious caveat that Fangfan Li, Abbas Razaghpanah, Arash Molavi Kakhki, Arian Akhavan the app store might also be censored in the region. In contrast, a Niaki, David Choffnes, Phillipa Gill, and Alan Mislove. 2017. lib•erate, (n) network provider can often neutralize a statically defined evasion technique once it is known (e.g., by blocking a fronted domain). To Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial resolve this imbalance, there is a need for generalizable, adaptive, advantage and that copies bear this notice and the full citation on the first page. Copyrights for and non-invasive techniques to evade classifiers. components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, This paper presents the design and implementation of lib·erate, requires prior specific permission and/or a fee. Request permissions from [email protected]. a tool for identifying middlebox policies, reverse-engineering their IMC ’17, November 1–3, 2017, London, United Kingdom © 2017 Copyright held by the owner/author(s). Publication rights licensed to Association for Com- implementations, and adaptively deploying custom circumvention puting Machinery. techniques. Our key insight is that differentiation is necessarily im- ACM ISBN 978-1-4503-5118-8/17/11...$15.00 https://doi.org/10.1145/3131365.3131376 plemented by middleboxes using incomplete models of end-to-end IMC ’17, November 1–3, 2017, London, United Kingdom Fangfan Li et al. communication at the network and transport layers. These models • Policies are implemented using classifiers that rely on searches are necessarily incomplete because a middlebox does not have visi- for keywords in HTTP payloads, SNI fields, and protocol- bility into the state at endpoints; namely, a packet that traverses a specific fields. Some of the policies apply only to asmallnum- middlebox may never reach the intended destination (and/or the ber of initial packets, while Iran’s censoring devices inspect corresponding application at the destination). Further, in practice the entire flow. We found no evidence that UDP traffic was we find that middlebox designers do not completely implement classified by any of the operational networks we tested, pro- standards governing network traffic, nor do they necessarily in- viding a surprisingly easy way to evade their policies (i.e., use spect every packet of every flow (potentially for efficiency/cost UDP-based protocols). reasons). With this insight, lib·erate conducts efficient, targeted net- • Middleboxes running the classifiers exhibit different, incom- work measurements to identify any network- and transport-layer plete implementations of network and transport layers. Specif- inconsistencies between middlebox- and endpoint-views of end-to- ically, our testbed device does not check for a wide range of end communication, and leverages this information to transform invalid packet header values, while the Great Firewall of China arbitrary network traffic such that itis purposefully misclassified (GFC) does extensive packet validation. Iran and T-Mobile use (e.g., to avoid shaping or censorship). middleboxes that only partially check for invalid packet head- At a high level, lib·erate operates in four automated phases. First, ers. Further, we find that reordering of TCP segments can alter it conducts tests to determine whether an application’s network classification in all instances except for the GFC and AT&T traffic is differentiated by a middlebox (e.g., throttled, blocked). Ifso, (the latter uses a transparent HTTP proxy). it identifies which features of that network traffic that the middlebox • Except for AT&T and Iran, all middleboxes in our experiments uses to classify the application for differentiated treatment. Next, are vulnerable to misclassification using TTL-limited traffic it uses this information to test candidate evasion techniques that that reaches the middlebox but not the server. are designed to exploit inconsistencies between the middlebox and • The classifiers for Iran and AT&T only inspect port 80 for end-to-end view of network traffic. Finally, lib·erate uses the lowest matching content; thus simply changing the server port can cost, working technique to evade differentiation for network traffic evade classification. generated by the application. • Classifier results do not persist indefinitely, meaning that estab- Our approach is application agnostic, can be deployed unilater- lishing a connection and pausing can evade middlebox policies. ally (i.e., only at one endpoint) on unmodified applications via a For some devices, flow contents are ignored after a fixed delay. linked library or transparent proxy, and can adapt to changes to • If we can assume server-side support as well, we found that in- classifiers at runtime. Like any evasion technique, lib·erate does serting even one packet carrying dummy traffic (that is ignored not end the cat-and-mouse game between evasion and countermea- by the server) at the beginning of a flow evades classification sures; rather, by automating evasion and adapting to changes in in our testbed, T-Mobile, AT&T, and the GFC. middlebox classifiers quickly, it makes countermeasures substan- The paper is organized as follows. The next section discusses tially more expensive for network providers. related work in the area of middlebox implementations and evasion. The primary contributions of this paper are: In Section 3 we detail the goals and approach taken in this work. • An application-agnostic approach to identifying traffic-classifica- Section 4 presents the design of lib·erate