Frolov Eric Wustrow University of Colorado Boulder University of Colorado Boulder

HTTPT: A Probe-Resistant Proxy Sergey Frolov Eric Wustrow University of Colorado Boulder University of Colorado Boulder Abstract deployed behind already-existing Web Servers, making it im- Recently, censors have been observed using increasingly so- possible for censors to access and identify the proxy without phisticated active probing attacks to reliably identify and knowing the secret necessary to use it. block proxies. In this paper, we introduce HTTPT, a proxy designed to hide behind HTTPS servers to resist these active probing 1.1 Benefits attacks. HTTPT leverages the ubiquity of the HTTPS protocol HTTPT has several benefits over existing designs: to effectively blend in with Internet traffic, making it more difficult for censors to block. We describe the challenges that Replay attack protection Existing probe-resistant proxies HTTPT must overcome, and the benefits it has over previous are vulnerable to replay attacks, where the censor re-sends ob- probe-resistant designs. served client messages. Some proxies, such as shadowsocks- libev [43], implement a cache to prevent replays of previous 1 Introduction connections. However, China’s active probing thwarts this defense by permuting replays [3], and the behavior of not re- Internet censors strive to detect and block circumvention prox- sponding at all to replays may be unusual. In contrast, HTTPT ies. Many censors have adopted sophisticated proxy discovery is immune to replay attacks due to its reliance on TLS, which techniques, such as active probing [14,38,39], where the cen- includes bidirectional nonces in the handshake. sor connects to suspected proxy servers and sends probes Using existing web servers HTTPT leverages existing designed to distinguish them from non-proxies. In response web servers, making it more difficult for censors to single out to this attack, developers designed and built probe-resistant or identify. Hiding a proxy behind an authentic web server, proxies, such as ScrambleSuit [40], obfs4 [44], and Shadow- such as Apache or nginx, obviates the need to mimic TLS or socks [2] which require clients to prove knowledge of a secret any other server fingerprints [21] that could allow censors to key (obtained out of band) before the proxy will respond to block it. them. Overhead After the initial TLS handshake, HTTPT’s over- While probe-resistant proxies are harder for censors to head is minimal. To avoid the overhead associated with send- identify, recent work demonstrates there are still ways censors ing HTTP-compatible encodings, HTTPT instead uses Web- could detect probe-resistant proxies [20]. For instance, not Sockets between client and server. This allows the client responding to common protocols like HTTP or TLS is un- and server to send binary data over the web server, obviating usual behavior for servers. Recently, China’s GFW has been the need for costly HTTP-safe encodings such as MIME or observed using similar active probing techniques to detect base64. and block Shadowsocks [3]. Using a popular protocol Existing probe-resistant proxies In this work, we propose HTTPT, an alternative proxy rely on randomized protocols that try to blend in with generic architecture that is designed to defend against advanced ac- Internet traffic and make it hard for censors to identify pas- tive probing. While previous probe-resistant designs avoid sively. However, prior work has shown that these protocols looking like any protocol, we argue this approach is funda- might still be detectable using entropy analysis and machine mentally limited, as this behavior is unusual for servers on learning [5,37]. In HTTPT, we rely on the popular TLS pro- the Internet [20]. Instead, HTTPT uses HTTPS, a ubiquitous tocol to tunnel data, making it more difficult for censors to protocol with many diverse implementations, providing a het- block outright (because TLS is popular), and hard to perform erogeneous set of behaviors to blend in with. HTTPT can be fingerprinting attacks due to the heterogeneity of the protocol. 2 Background and random data, and identify the servers whose responses are consistent with responses of a given proxy (e.g. close im- The arms race between censors and circumvention tools has mediately for probes over 256 bytes or close after 90 seconds led to new techniques from both sides. In this section, we pro- otherwise for Lampshade). vide background on the active probing attacks and defenses Another potential way censors can identify probe-resistant employed recently. proxies is with replay attacks, where they replay a previous legitimate client’s initial message to the server. Since this mes- Active Probing To detect and block proxies, censors often sage proves knowledge of the secret, the server will respond, use active probing attacks, where they connect to suspected alerting the censor it is a proxy. proxy servers and attempt to communicate using known cir- Recently, the GFW has been observed using replay attacks cumvention protocols. If a server responds to these probes and the aforementioned timeout fingerprinting to identify and using the protocol (e.g. they offer proxy service to the censor), block Shadowsocks servers [3]. When a client connects to the censor learns the server is in fact a proxy, and can block it. a server and sends a Shadowsocks-sized initial message, the The Great Firewall of China (GFW) have used this technique GFW will send several follow-up probes, including replays for nearly a decade to find and block Tor bridges and other of the client’s message, and random-data probes of various proxies [14, 38, 39]. sizes up to and exceeding the 50-byte data limit unique to Censors typically find suspected proxies by traffic analysis Shadowsocks. or Internet scanning [13]. For instance, the GFW has been Probe-resistant proxies could try to prevent replay attacks observed to send probes to TLS (TCP port 443) servers when using server randoms or challenges, where the client proves a client first connects to them [38]. These probing techniques knowledge of the shared secret by hashing it with server- include sending random data and attempting to perform a Tor provided (random) data. However, this would require the handshake. If the server responds like a proxy, for instance server send data before the client has authenticated, which by responding with the Tor protocol, the endpoint is blocked could be used by censors to identify those proxies. by the censor. In contrast, web servers provide a natural defense against replays due to the security properties of TLS. TLS defends Probe-resistant proxies To resist active probing attacks, against client replay attacks by including the server random circumvention tools have developed probe-resistant proxy in the key agreement, ensuring each connection uses unique protocols, such as ScrambleSuit [40], obfs4 [44], Shadow- cryptographic keys. socks [2], and Lampshade [28]. These protocols require TLS also has a large number of popular implementations, clients to prove knowledge of a shared secret before the server providing a plethora of fingerprints to blend into [21]. While will respond. This shared secret is distributed among the previous work has shown the difficulty in mimicking existing clients through private channels, such as Tor’s BridgeDB [36] protocols for circumvention [25], many of these protocols or email, and without the knowledge of the secret, censors are only had a single used implementation, making it difficult for unable to get responses to their probes. As these proxies effec- circumvention tools to copy perfectly. In contrast, our tech- tively remain silent to such probes, it is difficult for censors nique leverages existing TLS implementations and efficiently to confirm if they are proxies. tunnels through them, avoiding the difficult task of mimicry al- together. While prior work has tunneled circumvention traffic through other protocols, including chat applications, Skype, Detecting probe-resistant proxies However, there are still or video streaming [4,29,31,32], these protocols are generally ways censors can identify probe-resistant proxies. In 2020, less popular than TLS and used in a narrow set of applica- Frolov et al. showed how not responding to any probes is tions. This allows censors to use traffic analysis to find proxy uncommon behavior online: over 94% of servers responded use [23]. In contrast, the wide range of applications used with data to at least one popular protocol [20]. Furthermore, by TLS makes it more difficult (though not impossible) to circumvention servers would often have unique timeouts or perform this kind of analysis. data limits before they closed connections, allowing censors to identify and even fingerprint probe-resistant proxies that never respond to the censor. For example, Lampshade reads 256 3 Design bytes from the client, and closes the connection immediately if it does not demonstrate knowledge of the secret (e.g. a At a high level, HTTPT functions as a TLS server that also censor sends random data). However, if less than 256 bytes has secret proxy functionality. The proxy function can only is read, the server will wait for 90 seconds, and then timeout be accessed by clients that know a secret key, distributed and close the connection. out of band. If a censor attempts to probe an HTTPT server, This gives censors a new strategy for identifying probe- they will receive the TLS server’s benign response, making resistant proxies: send several probes for popular protocols it difficult to identify the server as a proxy. We first describe HTTPT HTTPT Destination Web Server Destination Client Server TLS Client Hello ‘Hello’ HTTPT Server Server Hello, Change Cipher Spec HTTPT Client {TLS Encrypted Extentions, Cert, Cert Verify, Finished} Common TLS Record(‘Hello’) {TLS Finished} ‘Hello’ {HTTP GET /SecretLink Web Server Upgrade: WebSocket X-Destination: example.com:4242 HTTP GET /SecretLink... User App ‘Hello’ X-Message: hi server!} hi server! User Device HTTP/1.1 101..

Frolov Eric Wustrow University of Colorado Boulder University of Colorado Boulder

Uila Supported Apps

Poster: Introducing Massbrowser: a Censorship Circumvention System Run by the Masses

In Computer Networks, A

An ISP-Scale Deployment of Tapdance

Threat Modeling and Circumvention of Internet Censorship by David Fifield

Let's Encrypt: 30,229 Jan, 2018 | Let's Encrypt: 18,326 Jan, 2016 | Let's Encrypt: 330 Feb, 2017 | Let's Encrypt: 8,199

Enhancing System Transparency, Trust, and Privacy with Internet Measurement

Style Counsel: Seeing the (Random) Forest for the Trees in Adversarial Code Stylometry∗

How to Download Torrent Anonymously How to Download Torrent Anonymously

World-Wide Web Proxies

Colorado College Is Hiring!

Product End User License Agreement