Quick viewing(Text Mode)

Climbing China's Great Firewall

Climbing China's Great Firewall

Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal

Volume 5 Issue 2 Article 2

June 2018

Climbing 's Great

Adam Casey University of Minnesota, Morris

Follow this and additional works at: ://digitalcommons.morris.umn.edu/horizons

Part of the Computer Sciences Commons

Recommended Citation Casey, Adam (2018) "Climbing China's ," Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal: Vol. 5 : Iss. 2 , Article 2. Available at: https://digitalcommons.morris.umn.edu/horizons/vol5/iss2/2

This Article is brought to you for free and open access by the Journals at University of Minnesota Morris Digital Well. It has been accepted for inclusion in Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal by an authorized editor of University of Minnesota Morris Digital Well. For more information, please contact [email protected]. Climbing China's Great Firewall

Cover Page Footnote This work is licensed under the Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/ 4.0/. UMM CSci Senior Seminar Conference, April 2018 Morris, MN.

This article is available in Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal: https://digitalcommons.morris.umn.edu/horizons/vol5/iss2/2 Casey: Climbing China's Great Firewall

Climbing China’s Great Firewall

Adam L. Casey Division of Science and Mathematics University of Minnesota, Morris Morris, Minnesota, USA 56267 [email protected]

ABSTRACT 2. BACKGROUND Many different countries censor the within their In order to understand how circumvention techniques work state. Citizens frequently wish to avoid the state censor- a basic understanding of some internet protocols and con- ship. There are many different methods that have been de- cepts is required. In this section, background will first be veloped to achieve this. Governments and citizens are in a given on the basic frameworks of the internet. Also, back- constant arms race, with both developing opposing technolo- ground is given on the Transport Control Protocol (TCP). gies. China in particular has the largest population of peo- This background information is necessary to understand how ple on the planet, and the Chinese government attempts to INTANG circumvents the GFW. Information is also given censor the internet. This paper will investigate three meth- on Content Delivery Networks (CDNs). A foundation in ods of navigating around state censorship: Cachebrowser, this subject is necessary to understand how Cachebrowser INTANG and . Cachebrowser and INTANG were devel- circumvents the GFW. An overview of Tor will also be given oped specifically to navigate around state censorship while in this section. Background on Tor is necessary to under- Tor was originally developed for anonymous browsing. This stand one of the most common ways of circumventing inter- paper will analyze their effectiveness and viability to avoid net censorship and to understand how the Chinese govern- censorship. ment attempts to stop Tor traffic.

Keywords 2.1 Internet Basics In order to understand how the Chinese government cen- China, Great Firewall, Tor, CDNs, INTANG sors the internet and to understand the TCP protocol, first some internet frameworks must be explained. To begin with client and server roles need to be explained. When you make 1. INTRODUCTION a request to visit a website such as www..com your The largest group of people in the world with censored computer talks to the server for facebook. This is referred access to the internet is the population of China, with up- to as the server because the user makes a request and the wards of 1.3 billion people. The Chinese government still server serves the content to the user. The user in this case actively censors the internet through a system of network is referred to as the client. monitoring and network manipulation, referred to broadly Before the client actually makes a request to facebook.com as the Great Firewall of China (GFW) [11]. it needs to know how to get to it. Each server on the internet This paper will describe the methods and evaluate the has a unique address associated with it. This is called an IP success of three separate ways of evading the censorship address [10]. In order for the client to find the IP address for of the GFW. These methods are Cachebrowser, INTANG a server they are trying to access they must make a Domain and Tor. These methods each work in very different ways. Name System (DNS) request. This is done by making a Cachebrowser works by allowing the user to easily access request to a DNS server which has a list of IP addresses for uncensored versions of websites that are usually blocked by different websites in a cache. The client tells the DNS server accessing the cached versions of these sites that the Chinese the website they are trying to access. The server then sends government cannot feasibly for reasons that will be the client the IP address of the website they are trying to discussed in section 3.1. INTANG works by manipulating access. Finally, the client can make a request to the server the internet traffic being sent by the user’s machine directly they are trying to access and send data to it and the server to avoid censorship by manipulating packets which will be can send data back to the client. discussed in depth in section 3.2. Tor works by bouncing One way the Chinese government attempts to censor the the user’s connection through multiple different nodes. This internet is by manipulating the records on DNS servers to makes it difficult to track. How the Chinese government return incorrect IPs for websites or to drop packets for re- probes for Tor servers will be discussed in section 3.3 quests to websites the government does not want clients to be able to access. This practice is referred to broadly as DNS poisoning [3]. This work is licensed under the Creative Commons Attribution- Another way the Chinese government attempts to censor NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/. the internet is through a technique called IP address filtering UMM CSci Senior Seminar Conference, April 2018 Morris, MN. [5]. This is where the IP of the website you are trying to

Published by University of Minnesota Morris Digital Well, 2018 1 Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal, Vol. 5, Iss. 2 [2018], Art. 2

Figure 1: Diagram of a TCP packet taken from [9].

access is blocked at the hardware level, usually on your home to this paper the relevant ones are SYN/ACK/FIN/RST. router. This is different from DNS poisoning because even The ACK flag indicates that the packet is an acknowledge- if we know the IP address of the website we are attempting ment packet. The RST flag indicates that the connection to access we cannot get to it. should be reset. The SYN flag indicates that sequence num- The final censorship technique that will be mentioned in bers should be synced. In practice this should only be used this paper is keyword censorship [11]. This is where the at the beginning of the connection process. The FIN (finish) Chinese government monitors the client’s internet traffic and flag indicates that this is the last packet from the sender. will terminate connections with keywords the government Another important aspect of the TCP protocol to know has deemed sensitive. This is a way to block connections for this paper is Time to live (TTL). Each packet is given if the IP of the website is not blocked and its DNS records a TTL. The TTL indicates how long the packet will stay in have not been poisoned. the network before it destroys itself. This is useful so that Another concept that is necessary to understand circum- packets do not clog up network infrastructure forever if they vention techniques is the idea of a proxy, sometimes referred do not reach their intended destination. to as a [7]. A proxy is essentially a server that There are three main steps to transmitting data using the makes web requests for you. Instead of talking directly to TCP protocol. This first step is to establish the connection the server the client wants to access, it talks to a different using a handshake process. A connection is a link between server, which talks to the server the client wants to talk client and host where data can be sent freely between client to. Then the proxy server then takes the information the and server. Next the actual data is sent. Finally the con- intended server sent it and sends it back to the client. nection is terminated. Connection establishment is a multistage process. The first stage is when the user establishes a connection to the 2.2 The TCP Protocol server. Next the user sends a SYN (sync) packet to the Transport Control Protocol (TCP) [10] is one of the main server. Once the server receives this SYN packet it sends an internet standards. The main data structure of TCP is the ACK (Acknowledgement) packet back to the user along with packet. Data is sent in discrete pieces in order. One of these a SYN packet. The next step to establish data exchange pieces is referred to as packet. This order is maintained is for the user to send an ACK packet back to the server. by incrementing the sequence number with each new packet After this process has been completed regular data transfer that is sent. Packets are also sent to establish and end a con- can occur. nection. Figure 1 shows a diagram of a TCP packet. The Closing a connection is also a multistage process. Closing first 16 bits are reserved for the source port. The next 16 are a connection is different from establishing a connection in reserved for the destination port. Bits 32 to 63 are reserved that a client or a server can close the connection. Only a for the sequence number. Bits 64 to 95 are reserved for the client can establish a connection. The process for closing a acknowledgement number if the packet is flagged as an ACK connection on the server end or the client end is the same (acknowledgement) packet. The data offset bits serve a dual but the roles are reversed. As an example let’s consider a purpose. They indicate the size of the TCP header and the server closing a connection with a client. The server will offset from the beginning of the header to the beginning of send a FIN packet to the client with the sequence number of the actual data. The next three bits are reserved. Next, the the next data packet the client is expecting to receive. Once flag bits begin. These indicate the type of packet. Bits 112 the client receives the FIN packet from the sever it sends to 128 indicate the windows size the sender of the packet is an ACK packet with the sequence number increased by one. willing to receive. The next section of bits is a checksum The client then sends the server its own FIN packet with which is used for error checking. The next section is the ur- sequence number relative to the amount of data it has sent gent data if this is a flagged URG (urgent) packet. There are to server thus far. The server will then send a final ACK multiple flags that can be set. Many of these are not relevant

https://digitalcommons.morris.umn.edu/horizons/vol5/iss2/2 2 Casey: Climbing China's Great Firewall

packet to the client and the connection will be terminated on both ends. Another part of the TCP protocol that is manipulated to get around censorship is the TCP Control Block (TCB). The TCB is a data structure that is created by the TCP protocol when a connection is established. This TCB is nor- mally created on the client and the server, but in the case of the GFW one is also created by the GFW to monitor the connection. The purpose of the TCB is to keep track of all connections incoming and outgoing on the machine it is created on. The GFW uses the TCB it creates in combina- tion with packet inspection to terminate connections with sensitive keywords. 2.3 CDNs and cached content CDNs [6] also known as content delivery networks are a Figure 2: Diagram of Tor traffic through nodes. distributed set of servers hosting web content. The goal of Taken from [1]. this system is to decrease latency by redirecting users to servers hosting the content near them instead of one central one that could possibly be across the globe. CDNs are not not publicly available, like it is for Tor nodes. One of the run by the companies that have the actual content. They ways the Chinese government blocks Tor traffic is to block are run by a separate company and pay the company that all IPs of the publicly listed Tor nodes. runs the CDN to host their content. There are many rea- Tor also has the ability to use multiple different pluggable sons to use a CDN. One reason is less stress on a single transports. A pluggable transport is a type of cipher suite, server. Since the network load is spread out between multi- a set of encryption algorithms which the data is encrypted ple servers hosting the cached content, no one server takes with before being sent out. This allows users to access parts the brunt of the load. The servers that host this cached of the Tor network that are usually blocked by the Chinese content that users access are referred to as edge servers. government using IP address filtering. Pluggable transports A common way to implement a CDN for an already exist- also work to disguise traffic from being identifiable as Tor ing website is through DNS modifications. When navigating traffic. This is because even if censors do not know what to the web address for a site, the client will be redirected website you are trying to access they will terminate your for the CDN server for all content that does not frequently connection if they recognize it as Tor. Not all pluggable change on the site. This would be things like logos, head- transports work well for avoiding the censorship of the Great ers that are always the same, etc. Content that changes Firewall however. These pluggable transports often have frequently will be served by the host server. In some cases only one layer of encryption and encrypt packets in an easily for very popular websites dynamic content is still hosted on recognizable way, allowing the government to recognize Tor CDN servers. This is accomplished by having a high speed traffic and block it accordingly. The currently recommended pipeline from the host server to the CDN server which keeps pluggable transport to use is meek-amazon. Meek-amazon the CDN server up to date. When a CDN server has con- works by using a technique called . While tent change on it, it relays this information to all other CDN too complicated to explain in-depth in this paper, domain servers hosting the site so that things are consistent. As a fronting makes it look like the client is accessing a different result of the fact that CDNs are operated by companies sep- website than they actually are. The meek-amazon pluggable arate from the ones wanting their web content hosted, mul- transport accomplishes this by routing traffic through Ama- tiple different websites content is stored on the same CDN zon cloud servers, which then access the Tor network as a server. The client just requests the portion of the content proxy. Figure 3 illustrates how the meek-amazon pluggable that they actually need. transport works. 2.4 Tor Tor [8] gets its name from the project’s previous name 3. METHODS OF CIRCUMVENTION “The Onion Router”. Tor is a free piece of software that is used for anonymous internet browsing. It achieves this by 3.1 Cachebrowser redirecting the user’s connection through multiple different Cachebrowser [11] is a tool developed by John Holowczak nodes. Tor is referred to as ”The Onion Router” because and Amir Houmansadr to bypass the censorship of the Great each data packet is wrapped in a layer of encryption for Firewall of China. It uses CDNs and cached content, as each node that it passes through. Each node only decrypts mentioned in Section 2.3, to access web pages that would be one layer of information to know where to send the packet inaccessible using the internet without a circumvention tool. next and does not have access to the actual data you are Using cached content is way of circumventing common cen- sending. This is because the actual data is only located sorship techniques such as IP address filtering. IP address in the last layer. Figure 2 illustrates this process of traffic filtering is the process of blocking access to certain IP ad- traveling through multiple nodes. This makes it difficult to dresses. IP address filtering is ineffective at blocking CDNs track. Another important feature of Tor to understand is because one website is spread across multiple different IPs, the Tor bridge. A Tor bridge is essentially the same as a so the censors would have to blacklist all of them to block Tor node. The difference is that the list of Tor bridges is the site. Also, because of the way edge servers work, multi-

Published by University of Minnesota Morris Digital Well, 2018 3 Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal, Vol. 5, Iss. 2 [2018], Art. 2

Figure 3: Diagram of meek-amazon domain fronting taken from [4].

ple sites are hosted on one server, so blocking access to one with the incorrect sequence number to create a false TCB server will inadvertently block access to all websites on that on the GFW and then initiating the real connection with the shared IP, not all of which the censor necessarily desires to server, which the GFW will ignore because of the sequence block. number discrepancies. DNS interference is another widely used censorship tech- TCB teardown works by by crafting RST,RST/ACK and nique and is most effective at blocking cached content. DNS FIN packets with a TTL constructed in a such that the interference works by interfering with the name resolution packet reaches the GFW and terminates the TCB but does process when trying to access a blocked website. This is ef- not reach the server, thus keeping the server alive. fective at blocking cached content because it doesn’t matter Data reassembly has two separate forms: Out-of-order how many IPs the content is spread across or if more than data overlapping and In-order data overlapping. Out-of- one site is hosted at that one IP. This is because the DNS order data overlapping works by sending garbage data frag- interference prevents the end user from knowing the IP of ments with the same offset and length as the real data. the content in the first place. When the GFW encounters two packets with the same offset Cachebrowser works by keeping its own database of IP and length it records the first one and ignores the second. addresses to CDN hosted alternatives to regular content. In-order data overlapping works by filling up the GFWs in- When Cachebrowser encounters a CDN domain it internally put buffer until it is overloaded and can no longer read new enumerates and saves all other IP addresses for that CDN data. This done by crafting insertion packets with either a in case that one is censored. If Cachebrowser encounters wrong checksum or a very short Time to Live (TTL) so they a customer domain it will return the addresses of CDNs fill up the GFW buffer while still keeping the connection to which host that content. This is done by using the free DNS the server alive. resolver www.digwebinterface.com. This site is currently INTANG uses a combination of all these strategies to- not blocked in China. gether with new strategies developed from analyzing the As an alternative to this method, Cachebrowser also im- performance of the old strategies to approach circumven- plements a remote bootstrapper using SWEET [2]. SWEET tion from multiple angles. INTANG is a measurement tool, is a communications tool that encapsulates messages through which means that it keeps track of which strategies work emails and sends them through standard email protocols. with regards to different IPs and adjusts strategies in use In the case of Cachebrowser, a web server is set up in the based on that. United States watching for emails. When a Cachebrowser user makes a request for a site it does not have a CDN- 3.3 Tor hosted IP for a message is sent out through SWEET using Unlike [5] and [11] which both developed tools to circum- the bootstrapper. This message is sent to a server in the vent the GFW and tested them within their paper, [4] in- United States. The server then makes the DNS lookup and stead examines the GFW behavior and hypothesizes on how sends the information back to the client to be added to its this information can be used create Tor servers that can database. more easily avoid blacklisting. Probing for Tor servers is triggered the the GFW sees traffic that carries the signature 3.2 INTANG of a cipher suite that Tor uses. Tor in its default configu- INTANG [5] is a tool developed by Wang et al. to avoid the ration is essentially completely useless already since it has censorship of the GFW. It combines many different strate- publicly available list of IPs the government can blacklist gies. A large portion of the paper [11] is dedicated to figuring outright. Tor developed a pluggable transport layer in re- out out how the GFW works using trial and error to gain sponse. The suggested pluggable transport to use currently knowledge of the system in order to help develop a tool to is meek-amazon or meek-azure. Neither of these transports avoid it. ITANG works by implementing all of strategies were tested in the paper as they were new and not yet pop- evaluated in the analysis section of the paper. There are ular at the time. too many individual strategies to go over all of them in this For the study, two different infrastructures were created paper, but they fall into 3 main categories: TCB creation, and long running Tor servers’ logs were analyzed. One in- TCB teardown, and data reassembly. frastructure that was created was the shadow infrastructure. TCB creation works by sending a SYN insertion packet This infrastructure was built up of private Tor bridges that

https://digitalcommons.morris.umn.edu/horizons/vol5/iss2/2 4 Casey: Climbing China's Great Firewall

censored browsing in every case where non-censored brows- Dataset Time span ing is available. The maximum difference between page Shadow Dec 2014 – Feb 2015 (three months) download times between Cachebrowser and a non-censored Sybil Jan 29, 2015 – Jan 30, 2015 (20 hours) version inside of China is .725 seconds. Moving on to the Log Jan 2010 – Aug 2015 (five years) latency sample data from page access in the United States, Counterprobe Apr 22 – Apr 27 (six days) Tor has higher latency than Cachebrowser in every case. This suggests that if Tor were to be tested within China it Table 1: Timeline of our experiments. We created four would most likely also be slower than Cachebrowser there as datasets that span hours, days, months, and years. well. The privacy of Cachebrowser is also robust. Assuming the CDN is using an encrypted pathway, which almost all Counter- do, the state cannot see what content you are viewing. The Shadow Sybil Log probe Figure 3: Experimental setup for the “Shadow” dataset. state will also not know what website in particular you are Probing rate ! Figure 4: Diagram of the Shadow dataset taken viewing because multiple sites are hosted at the IP since the ISN patterns ! from [4]. CDN serves multiple different sites off of the same server. TSval patterns !!! they did not advertise themselves publicly, neither to the The only possible leak in this situation is that the CDN obfs2/3 blocking ! ! publicDNS Torresolver directory, nor IP to its database except Tianjin of secret bridges. All itself has information about its visitors. However, sharing Tor bootstrapping ! As a result, no genuine Tor user should attempt to connect this information with anyone is a violation of the CDNs user Probing types DYN 1 216.146.35.35 98.6 % 92.7 & ! to one of our bridges. The bridges listened on random ports agreement therefore we can assume that they do not share Architecture DYN 2 216.146.36.36 99.6 % 93.1 % !! in the ephemeral range, to reduce the chance of their dis- this information. Topology ! covery by blind Internet scanning. Finally, we used another Figure 5: DNS server access success rates. Taken 4.2 INTANG Results Table 2: Observed phenomena and their visibility in our EC2 machine to proxy the communication between our Tor from [11]. INTANG [11] in general is very successful at avoiding the datasets. bridges and the first public Tor relay in a circuit. This extra proxy hop is to prevent another potential bridge-discovery censorship of the GFW. Research indicates that success rates only the researchers could access. Figure 4 is diagram of attack, wherein a malicious Tor relay makes a list of all the for different strategies vary wildly. INTANG takes advan- how the Tor clients and servers were set up for the shadow IP addresses that connect to it (cf. [14, III.D]). tage of this by implementing multiple different packet ma- 4.1 Shadow Infrastructure data set. § nipulation strategies and choosing the best one systemati- We built a “shadow infrastructure” of our own Tor clients 4.2The Sybil second Infrastructure infrastructure was the Sybil infrastructure. cally. Figure 7 shows the success rates of the multiple in- The Sybil infrastructure set up a Tor bridge in France that dividual strategies tested and the success rate of INTANG and bridges for a controlled experiment of active probing To obtain broader insight into the extent of the censor’s over time. These clients and bridges were not actually used redirected 600 ports to it using a firewall redirection, then itself. connectedactive probing to each infrastructure, of the ports we in designed ascending another order. experi- The by any real users, but rather were dedicated exclusively to 1 One side effect of INTANG is that because the government Logment dataset to attract camemany from activeanalyzing probers the logs in a a short one of period the re- of poisons DNS requests in the same way it blocks TCP traffic, our own experimental purposes. The infrastructure tested time. vanilla Tor, obfs2, and obfs3 in equal measure, since active searcher’s Tor servers that has been running since January INTANG is also effective at evading DNS poisoning. Figure 2011.We By did looking so by at constructing logs they found a “Sybil the server infrastructure,” had been sub- so 5 shows the success rate of accessing DNS servers using IN- probing is known to target all three of these protocols. Fig- named because it seemingly consisted of hundreds of dis- ure 3 illustrates this setup. ject to active probing from China for 2.5 years, first showing TANG. The data was collected querying a DNS poisoned uptinct in Tor 2013. servers. This was We not used the a virtual result of private an attempt server to (VPS) induce in domain, in this case www.dropbox.com, 100 times. The dis- We had six Tor clients within China: three in China Uni- France and one in China. We ran a vanilla Tor bridge on the com, a large country-wide ISP; and three in CERNET, the probing but seemingly the regular amount of probing by the crepancy in the DNS resolution table is because the area government.VPS in France The and logs redirected were then the used port to range evaluate 30000–30600 how ef- of Tianjin is an outlier. Success rates are much lower in Chinese Research and Education Network. (We chose CER- to our Tor port using firewall port redirection. The actual NET because previous work suggested that censorship of fective probing was at disrupting Tor by looking at whether Tianjin. The exact reason for this is unknown, but it is hy- theTor TCP server handshake ran on a separate was completed. port in theThey ephemeral found that range. obfs2 pothesized that the GFW infrastructure is more developed CERNET might differ from the rest of China [7].) Outside Then, from our VPS in China, we established Tor con- of China, we ran six Tor bridges in Amazon’s EC2 cloud. and obfs3, which are different types of pluggable transports, in this area. Success rates in Tianjin were 38% for DYN 1 werenections very to rarely every disrupted port in andthe porthad high range success in ascending rates while or- and 24% for DYN 2. DYN 1 and DYN 2 are two different Two of the bridges ran vanilla Tor, two ran obfs2, and two der. This took approximately two hours, because we waited ran obfs3. We assigned each of our six clients in China to a Tor without a pluggable transport was essentially unusable. DNS servers. In particular DYN 1 is ’s DNS server several seconds in between connection attempts. Since the 8.8.8.8 and DYN 2 is Google’s DNS server 8.8.4.4. Normally unique EC2 machine; the clients never contacted any bridge GFW blocks by IP:port tuple, not just IP address [32], the other than their own assigned one. Initially, all clients at- 4. RESULTS connections to these DNS servers are terminated using a GFW interpreted every single port in the range as a distinct TCP connection termination. Research also indicates that tempted to connect to their assigned bridge every 15 min- Tor bridge and probed them separately. This experiment utes. After one month, we changed this to five minutes after 4.1 Cachebrowser Results two OpenDNS resolvers 208.67.222.222 and 208.67.220.220 resulted in 622 active probing connections (and significantly are uncensored even without the use of INTANG. This is preliminary analysis showed that finer granularity in timing Cachebrowser [5] was able to successfully load facebook.com more TCP connections, as we will discuss later) to the VPS similar to the method that the bootstrapper uses of simply might be useful. on a client’s machine. This is a complex website with content in France. request the DNS lookup from an uncensored address in the We also created a control group consisting of nine bridges hosted on multiple CDNs that is completely inaccessible to first place. (six in Amazon EC2, three in a US university) and a sin- users4.3 not Server using Logsome Analysiskind of circumvention technique. Re- gle client outside of China. We never connected to any of searchThis found dataset that comes of the from top the 1000 application Alexa websites, logs of 82% a server were the control bridges from one of the clients within China; by 4.3 Tor Results hostedoperated on by some one CDN of the provider authors, while some 85% stretching of news sites back like to comparing the traffic received by our “active” and control Research [4] found that while the GFW may not be able wsj.comJanuary were 2010. hosted The on server CDNs. runs various common network bridges, we can isolate general background scanning from to detect Tor traffic from clients using sufficiently advanced services,The main including issue with the three Cachebrowser we use in isthe latency. analysis: Figure HTTP, 6 active probing by the GFW. The three control bridges not pluggable transports such as obfs2 and obfs3, it can effec- comparesHTTPS, and the SSH. latency In addition times of to Cahcebrowser common networking and other ports, al- hosted on EC2 allow us to determine whether the GFW tively probe for and shut down Tor servers regardless of the ternativethe server methods has hosted of circumvention a Tor bridge since within January and outside 2011, an of treats EC2-hosted servers differently from others. The con- cipher suite used. Research also indicated that the Chinese China.ordinary As vanilla you bridge may notice without facebook.com pluggable transports. does not have By trol client outside of China connects to all of the bridges. government’s Tor probes are active in real-time, only stop- amining latency the value application for inside logs, of weChina. found This that is the because server face- has If our control client could not establish a Tor connection to ping for short periods. book.combeen receiving is completely active probes censored from within China China. for over Also, 2.5 years. there one of the “active” bridges, we discarded the measurement isAn no important latency measurement difference in for the Tor Log within experiment China. compared The re- we did from China for that bridge. searchers indicate that they were not allowed to run Tor 5. CONCLUSIONS We took various steps to prevent our bridges from being on1Our their earlier client analysis in China confirmed and that that is why simply it isestablishing not included an After analyzing three circumvention techniques it is clear discovered by any means other than active probing. We con- here.initial Chachebrowser TLS handshake does with have a server higher su latencyffices to than attract non- a that there are advantages and downsides to each. Tor is figured all of them to be private bridges, which means that prober.

Published448 by University of Minnesota Morris Digital Well, 2018 5 2.5 2.0 1.789 1.6 1.675

1.988 1.601

1.4 2.0 1.178 1.5 1.2

1.159

0.922 0.911 0.915 1.5 1.275 1.0

0.787 0.936

1.040 1.0 0.8 0.736 0.728 0.721 0.715 0.715 0.849 0.694

1.0 0.608 0.615 0.726 Download Time (s) Download Time (s) Download Time (s) 0.572 0.6 0.405 0.405 0.405 0.407

0.535 0.538 0.535 0.536 0.340 0.493 0.336 0.333

0.436 0.415 0.409 0.399 0.389 0.5 0.300 0.4 0.283 0.260 0.245 0.245 0.233 0.5 0.277 0.226

0.167 0.111 0.136

0.054 0.067 0.072 0.073 0.2 0.049 0.049 0.048 0.049 0.049 0.050 0.049

0.0 0.0 0.0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Default Alt. Edge Server Default Alt. Edge Server Default Alt. Edge Server (a) Akamai (b) EdgeCast (c) Amazon CloudFront

Figure 3: ComparingScholarly downloadHorizons: timeUniversity of a CDN of Minnesota, content object Morris from Undergraduate the “best” edge Journal, server Vol. returned 5, Iss. 2 by [2018], mapping Art. 2 system vs. other edge servers. The alternative edge servers are chosen to be geographically far distanced from the end-user.

3.0 Non-Censored CacheBrowser 3.0 2.534 Non-Censored CacheBrowser Tor

2.381

2.218 2.5 2.5

2.0 2.0 1.504 1.397

1.329 1.349 1.349

1.204 1.5 1.124 1.5

0.910

0.878

0.704

0.704 0.704

Download Time (s) 1.0 Download Time (s) 1.0 0.479 0.467 0.493 0.477 0.457 0.424 0.340

0.245 0.245 0.5 0.193 0.5 0.049 0.067 0.000 0.049 0.011

0.0 istockphoto.com nbc.com facebook.com Facebook Object .com theatlantic.com 0.0 istockphoto.com nbc.com facebook.com Facebook Object pinterest.com theatlantic.com

(a) Client in China (b) Client in Amherst, MA, U.S.

Figure 4: Comparing download latency for several websites using three methods: non-censored (regular) download, using Figure 6: Cachebrowser latency compared to regular access latency and Tor latency inside China and outside CacheBrowser, and using Tor. We were not allowed to run Tor on our Chinese client. Also, in our China experiments, there of China. Taken from [5]. is no “Non-censored” measurement for facebook.com, which is blocked, and we use the HTTPS version of istockphoto.com IMC ’17, November 1–3, 2017, London, UK for its “Non-censored” measurement (it is blocked by keyword filtering only). All the other websites are not blocked in China. Success Failure 1 Failure 2 Vantage Points Strategy [3] DNS spoofing. Apr. 2018. : https://en.wikipedia. Min Max Avg. Min Max Avg. Min Max Avg. org//DNS_spoofing. in order to blockImproved forbidden TCB Teardown content as well89.2% as to 98.2% detect95.8% and1.7% 6.7%cooperates3.1% 0.0% with 5.4% the1.1% GFW in order to protect its business in- Improved In-order Data Overlapping 86.7% 97.1% 94.5% 2.9% 8.9% 4.4% 0.0% 5.2% 1.1% Inside China disable the use ofTCB any Creation censorship + Resync/Desync circumvention 88.5% 98.1% system95.6% like1.9% 7.0%terests[4]3.3% Roya and0.0% reputation Ensafi, 5.5% 1.1% David in other Fifield, (non-censored) Philipp Winter, regions. Nick Feam- We CacheBrowser. However,TCB Teardown we + TCB assume Reversal that the 90.2% censors 98.2% 96.2% are ra-1.7% 5.6%assume2.6% ster, that0.0% Nicholasthe 5.7% CDN1.1% Weaver, providers and fully Vern controlled Paxson. by “Examin- the cen- tional, i.e., theyINTANG refrain Performance from any actions93.7% thatwill 100.0% interfere98.3% 0.0% 3.0%sors0.9% (e.g.,ing0.0% ChineseHow 3.5% the0.6% CDN Great providers) Firewall Discovers will not Hidden even host Circum- any Improved TCB Teardown 85.6% 92.9% 89.8% 4.6% 7.6% 6.8% 0.3% 6.8% 3.5% with non-prohibitedImproved Internet In-order Data activities Overlapping ofa 89.4% significant 96.0% 92.7% num-1.3% 6.2%forbidden3.6% vention0.6% content. 7.0% Servers”.3.7% In: Proceedings of the 2015 Inter- Outside China berFigure of their 7: non-circumvention INTANGTCB Creation +success Resync/Desync citizens. rates taken 78.1% In particular, 95.6% from84.6% [11]. we2.4% 18.6% In12.9% thisnet0.9% paper, Measurement 4.0% we2.6% do not Conference consider. unobservability IMC ’15. Tokyo, against Japan: assume that theTCB censors Teardown do + TCB not Reversal block all 84.6%encrypted 93.1% 89.5% trac5.5% 8.7%active7.1% attacksACM,0.1% 7.9% 2015, and3.3% tra pp.c 445–458. analysis,isbn but: discuss 978-1-4503-3848-6. the challenges Table 4: Success rate of new strategies as encryption is essential for various popular non-forbidden and possible solutions in the rest of this section. proven to be effective and has an active development com- [5] John Holowczak and Amir Houmansadr.“CacheBrowser: Internet services. Additionally, we assume that the cen- munity. It is, however, slow and prone to frequent shut- Bypassing Chinese Censorship Without Proxies Using sorsevolved do GFW not devices entirely to “re-transition” block a into commercial the resynchronization CDN providerthis is caused (e.g., by some unknown GFW behavior or middlebox in- downs. This is due to the government identifying Tor servers 6.2 PrivacyCached Content”. In: Proceedings of the 22Nd ACM bystate. IP blacklisting all edge servers) merely becauseterference. it serves However, since these cases areSIGSAC not sustained Conference (are very on Computer and Communica- throughWe combine probing. the TCB Reversal Cachebrowserstrategy with the isTCB an Teardown effectiverare), way we of argue view- that this isPrivacy more likely to form be due to Circumvention middlebox Provider Proxy-based some prohibited content publishers, since the CDN provider tions Security. CCS ’15. Denver, Colorado, USA: ACM, ingwith RSTwebsitesstrategy. thatSpeci￿cally, would as shown normally in Fig. 4, we be￿rst censored send a byinterference. the Chinese circumvention systems like Tor, , and VPNs expose willfakelikely SYN/ACK be packet hosting from the client many to the non-forbidden server to create a false contentOverall, publish- we ￿nd that high Failure 1 rates is the major reason government, but it has downsides as well. If a website is not 2015, pp. 70–83. isbn: 978-1-4503-3832-5. ersTCB as on well. the evolved The GFW censors, device. Next, however, we establish the may legitimate performfor —selective overall low success rates.their An introspective users to look significant suggests that privacy risks from the circumven- hosted3-way handshake, on a whichCDN invalid it cannot with respect be to accessed the evolved GFW using Cachebrowserbecause some servers/middleboxes[6] acceptHow packets Content regardless of Delivery the Networks Work. url: https: —blocking of CDN domain names and encrypted trac. tion providers. For instance, malicious Tor relays can per- atdue toall. the existing INTANG TCB. Then is we useful send a RST for insertion navigating packet to around(wrong) keyword ACK number or theform presence a of/toolset/ the www MD5 . of option cdnetworks attacks header, [31 .] tocom comprise / en / news Tor / howusers’ - content privacy, - teardownWe assume the TCB on that the old commercial GFW model, followed CDN by the providers HTTP Failures do not 1 happen. co- Further, the TTL chosen is sometimes inaccurate censorship, however it is prone to updates to the GFW. If delivery-networks-work/4258. operaterequest. with censored users nor with contentdue publishers to (a) network dynamicsand or (b) hitting a malicious server-side middleboxes; VPN service (e.g., one run by a repres- theAvoiding government interference implements from middleboxes changes or server. thatWhen negatethis resultsthe packet in undesired side-e￿ects that increase “Failures 1”. in order to circumvent censorship, as this may jeopardize sive[7] governmentProxy server to. spy Apr. on 2018. dissidents)url: https://en.wikipedia. can learn the users’ manipulationscrafting “insertion” packet, done we choose by INTANG the insertion packets the entire wisely toolIn is addition, rendered we ￿nd that for vantage points outside China, the theirso as to business not experience interests interference from with the middleboxes, economically-powerful and not TTL discrepancy state- unfortunatelybrowsing has a signiorg/wiki/Proxy_server￿ activitiescant drawback. and When even the. content of their communi- useless.result in side-e I￿ wouldects on the suggest server. We usingprimarily a use combination TTL-based accessing of INTANG the servers in China, the GFW devices and the desired level censors like China. A CDN provider may cooperate cations. Such risks do not apply to CacheBrowser due to its andinsertion Cachebroswer packets since it is generally to avoid applicable. censorship The key challenge first,servers as they are usually have within a fewpublisher-centric[8] hops ofTor each. otherurl (sometimes: approach,https co- : / / i.e., www no . torproject proxy is used. . org / about / withhere isthe to choose censors, an accurate however, TTL value the to cooperationhit the GFW, while is constrainedlocated). As a result to it is extremely hard to converge to a TTL value lower latency than Tor. If the content you are trying to view overview.html.en. notnot hittingviolate server-side thejurisdiction middleboxes or servers. of the We do CDN’s that by ￿rst homefor country the insertion as packet, that satisA￿es CacheBrowser the requirement of hitting client the may use a remote Bootstrapper, cannotmeasuring bethe hopaccessed count from using the client this to strategy,the server using Tor a canGFW be but used not the as server. a As a consequence,[9] Transmission in these scenarios, Control use Protocol. Mar. 2018. url: https: well as its business interests in other parts of the world. For e.g., the email-based system described earlier. The remote backup.way similar as tcptraceroute. Then, we subtract a small from the of this discrepancy can causeBootstrapper either type//en.wikipedia.org/wiki/Transmission_Control_ of failures.can We seenot from see the content of the client’s commu- instance,measured hop as count, weto analyzed try and prevent inthe Section insertion3 packet, Akamai from onlyTable partially 4 that both the Failure 1 and Failure 2 rates are on average a reaching (hitting) the server-side middleboxes or the server. In bit higher than for the vantagenications, points insideProtocol but China. may. learn the destinations she browses. Even our evaluation, we heuristically choose = 2, but INTANG can Finally, because INTANG can choose the best strategy and in- Acknowledgmentsiteratively change this to converge to a good value. sertion packets for each server[10] IP basedTransmission on historic results, we Control also Protocol. url: https://tools. I wouldIn addition, like we exploit to thank the new Elena MD5 and Machkasovaold timestamp insertion and Nicevaluated McPhee INTANG for performance in an additionalietf.org/html/rfc793 row in Table 4 for . packets, which allow the bypassing of the GFW without interfering vantage points inside China. It shows an average success rate of helpfulwith middelboxes feedback or the server. on draftsTable 5 summarizes of this how paper. we choose I would98.3% also which like represents to the performance[11] Zhongjie with the optimal Wang, strategy Yue Cao, Zhiyun Qian, Chengyu Song, thankinsertion packetsUMM for alumni each type of Jeff TCP Lindblompacket. for their usefulspeci￿c feedback to each website79 and network path.and This Srikanth is without further V. Krishnamurthy. “Your State is Not optimizing our implementation (e.g., measuring packet losses and as well.Packet Type TTL MD5 Bad ACK Timestamp adjusting the level of redundancy for insertionMine: packets). A Closer Look at Evading Stateful Internet Cen- SYN Take away: While we do magnify thesorship”. causes for In:failures,Proceedings the of the 2017 Internet Measure- RST biggest take away from this section is thatment our new Conference hypothesized. IMC ’17. London, United Kingdom: ReferencesData behaviors of the GFW seem to be fairlyACM, accurate, 2017, and that pp. the 114–127. isbn: 978-1-4503-5118-8. Table 5: Preferred construction of insertion packets new strategies are seemingly very e￿ective in realizing the goal of [1] Nandan Desai. Unclosing the Dark Web -evading Post the 2. GFW, June especially when the best strategies are chosen 2016. url: https://knowledgetransmitter.quora.according to websites and network paths. Results.com/Unclosing-the-Dark-Web-Post-2We ￿rst analyze the results for individual evasion strate- . gies. As seen from Table 4, the overall “Failure 2” rate is as low as 7.2 Evading TCP DNS Censorship 1%[2] for all G. the Dludla, strategies, which M. Bembe, (a) show that T. our Olwal, new strategies and JunThe Kyun GFW censors Choi. UDP DNS requests with DNS poisoning. It cen- have a high“Enhanced success rate on SWEET the GFW which protocol suggests that for (b) energy our efficientsors TCP DNS wire- requests by injecting RST packets just like how it hypotheses with regards to the GFW evolution seem accurate. censors HTTP connections. Thus, our evasion strategies can also We ￿ndless that sensor both the Failures networks”. 1 and Failures In: 22013 always happen Internationalbe used toConfer- help evade TCP DNS censorship. As discussed in §6, with regardsence to a on few ICT speci￿c Convergence websites/IPs. One can (ICTC) presume. that Oct. 2013,INTANG pp. converts 332– UDP DNS requests into TCP DNS requests. To 335.

124

https://digitalcommons.morris.umn.edu/horizons/vol5/iss2/2 6