How the Great Firewall of China Is Blocking Tor
Total Page:16
File Type:pdf, Size:1020Kb
How the Great Firewall of China is Blocking Tor Philipp Winter and Stefan Lindskog Karlstad University fphilwint, stefl[email protected] Abstract Tor blocking infrastructure is designed and (3) we dis- cuss and propose circumvention techniques. Internet censorship in China is not just limited to the We also point out that censorship is a fast moving tar- web: the Great Firewall of China prevents thousands get. Our results are only valid at the time of writing1 and of potential Tor users from accessing the network. In might—and probably will—be subject to change. Nev- this paper, we investigate how the blocking mechanism ertheless, we believe that a detailed understanding of the is implemented, we conjecture how China’s Tor block- GFC’s current capabilities, a “censorship snapshot”, is ing infrastructure is designed and we propose circumven- important for future circumvention work. tion techniques. Our work bolsters the understanding of China’s censorship capabilities and thus paves the way towards more effective circumvention techniques. 2 Related Work In [28], Wilde revealed first crucial insights about the 1 Introduction block of Tor traffic. Over a period of one week in Decem- ber 2011, Wilde analysed how unpublished Tor bridges On October 4, 2011 a user reported to the Tor bug tracker are getting scanned and, as a result, blocked by the GFC. that unpublished bridges stop working after only a few Wilde’s results showed that when a Tor user in China minutes when used from within China [17]. Bridges are establishes a connection to a bridge or relay, deep packet unpublished Tor relays and their very purpose is to help inspection (DPI) boxes identify the Tor protocol. Shortly censored users to access the Tor network if the “main after a Tor connection is detected, active scanning is ini- entrance” is blocked [4]. The bug report indicated that tiated. The scanning is done by seemingly random Chi- the Great Firewall of China (GFC) has been enhanced nese IP addresses. The scanners connect to the respec- with the potentiality of dynamically blocking Tor. tive bridge and try to establish a Tor connection. If it This censorship attempt is by no means China’s first succeeds, the bridge is blocked. attempt to block Tor. In the past, there have been efforts Wilde was able to narrow down the suspected cause to block the website [27], the public Tor network [24, 22] for active scanning to the cipher list sent by the Tor client and parts of the bridges [18]. According to a report [27], inside the TLS client hello2. This cipher list appears to these blocks were realised by simple IP blacklisting and be unique and only used by Tor although for a long time HTTP header filtering. All these blocking attempts had it was identical to the cipher list advertised by Firefox in common that they were straightforward and inflexible. 3. That gives the GFC the opportunity to easily identify In contrast to the above mentioned censorship at- Tor connections. Furthermore, Wilde noticed that active tempts, the currently observable block appears to be scanning is started at the beginning of multiples of 15 much more flexible and sophisticated. The GFC blocks minutes. An analysis of the Tor debug logs yielded that bridges dynamically without simply enumerating their IP Chinese scanners initiate a TLS connection, conduct a addresses and blacklisting them (cf. [7]). renegotiation and start building a Tor circuit, once the In this paper, we try to deepen the understanding of TLS connection was set up. After the scan succeeded, the the infrastructure used by the GFC to block the Tor 1 anonymity network. Our contributions are threefold: (1) The data was mostly gathered in March 2012 and the paper written in April 2012. we reveal how users within China are hindered from ac- 2The TLS client hello is sent by the client after a TCP connection cessing the Tor network, (2) we conjecture how China’s has been established. Details can be found in the Tor design paper [5]. 1 IP address together with the associated port (we hereafter proxies and a VPS. We compiled a list of public Chi- refer to this as “IP:port tuple”) of the freshly scanned nese SOCKS proxies by searching Google. We were bridge is blocked resulting in users in China not being able to find a total of 32 SOCKS proxies which were dis- able to use the bridge anymore. tributed amongst 12 distinct autonomous systems. We With respect to Wilde’s contribution, we (1) revisit connected to these SOCKS proxies from computers out- certain experiments in greater detail and with signif- side of China and used them to rerun certain experiments icantly more data, we (2) rectify observations which on a smaller scale to rule out phenomena limited to our changed since Wilde’s analysis, and we (3) answer yet VPS. open questions. Our second vantage point and primary experimental machine is a VPS we rented. The VPS ran Linux and resided in the autonomous system number (ASN) 4808. 3 Experimental Setup We had full root access to the VPS which made it pos- sible for us to sniff traffic and conduct experiments be- During the process of preparing and running our experi- low the application layer. Most of our experiments were ments we took special care to not violate any ethical stan- conducted from our VPS, whereas the SOCKS proxies’ dards and laws. In addition, all our experiments were primary use was to verify the results. in accordance with the terms of service of our service providers. In order to ensure reproducibility and en- courage further research, we publish our gathered data 3.2 Shortcomings and developed code3. The data includes Chinese IP ad- Active analysis of a censorship system can easily attract dresses which were found to conduct active scanning the censor’s attention if no special care is taken to “stay of our bridge. We carefully configured our Tor bridges under the radar”. Due to the fact that China is a sophis- to remain unpublished and we always picked randomly ticated censor with the potential power to actively fal- chosen high ports to listen to so that we can be sure that sify measurement results, we have to point out potential the data is free from legitimate Tor users. shortcomings in our experimental setup. We have no reliable information about the owners of 3.1 Vantage Points our public SOCKS proxies. Whois lookups did not yield anything suspicious but the information in the records In order to ensure a high degree of confidence in our re- can be spoofed. It might even be possible that the sults, we used different vantage points. We had a relay SOCKS proxies are operated by Chinese authorities. in Russia, bridges in Singapore and Sweden and clients Second, our VPS was located in a data center where Tor in China. There is, however, no technical reason why we connections typically do not originate. We also had no chose Russia, Singapore and Sweden. information about whether our service provider conducts Bridge in Singapore: A large part of our experiments Internet filtering and the type or extent thereof. was conducted with our Tor bridge located in Singapore. The bridge was running inside the Amazon EC2 cloud [1, 23]. The OpenNet Initiative reports Singapore as a 4 Analysis country conducting minimal Internet filtering involving only pornography [9]. Hence, we assume that our ex- 4.1 How Are Bridges and Relays Blocked? perimental results were not interfered with by Internet filtering in Singapore. The first step in bootstrapping a Tor connection requires Bridge in Sweden: To reproduce certain experiments, connecting to the directory authorities to download the we set up Tor bridges located at our institution in Swe- consensus which contains all public relays. We noticed den. Internet filtering for these bridges was limited to that seven of all eight directory authorities are blocked well-known malware ports, so we can rule out filtering on the IP layer. These machines responded neither to mechanisms interfering with our results. TCP, nor to ICMP packets. One authority turned out to Relay in Russia: A public Tor relay located in a Rus- be reachable and it was possible for us to download the sian data center was used to investigate the type of block, consensus. We have no explanation why this particular public Tor relays are undergoing. The relay served as a machine was unblocked. middle relay, meaning that it is not picked as the first hop After the consensus has been downloaded, clients can in a Tor circuit and it does not see the exit traffic of users. start creating a circuit. Using our Russian relay, we Clients in China: To avoid biased results, we used found out that when a client in China connects to a re- two types of vantage points inside China: open SOCKS lay, the GFC lets the TCP SYN pass through but drops the SYN/ACK sent by the bridge to the client. The 3http://www.cs.kau.se/philwint/static/gfc/ same happens when a client tries to connect to a blocked 2 bridge. However, clients are still able to connect to dif- 4.4 Where Does the Fingerprinting Hap- ferent TCP ports as well as ping the bridge. We believe pen? that the reason for the GFC blocking relays and bridges by IP:port tuples rather than by IPs is to minimise collat- We want to gain a better understanding of where the Chi- eral damage. nese DPI boxes are looking for the Tor fingerprint. In detail, we tried to investigate whether the DPI boxes also analyse domestic and ingress traffic.