Inferring Mechanics of Web Censorship Around the World
Total Page:16
File Type:pdf, Size:1020Kb
Inferring Mechanics of Web Censorship Around the World John-Paul Verkamp Minaxi Gupta School of Informatics and Computing, Indiana University fverkampj, [email protected] Abstract known censored websites from the test machines, analyzing raw packet captures, and various other specialized tests on a While mechanics of Web censorship in China are well stud- per site basis, we discovered and verified the following as- ied, those of other countries are less understood. Through a pects of the mechanics of censorship in the countries under combination of personal contacts and Planet-Lab nodes, we study: conduct experiments to explore the mechanics of Web cen- sorship in 11 countries around the world, including China. - Mechanics of censorship vary across countries: Even Our work provides insights into the diversity of modus though we were only able to conduct measurements in a operandi of censors around the world and can guide future small number of countries of interest due to lack of mea- work on censorship evasion. surement infrastructure in other countries, we saw evidence of much of the design space from our taxonomy being ex- 1 Introduction ploited by censors. For example, Malaysia, Russia and Turkey censor at the DNS while other countries do so at Internet censorship is a growing concern around the world, routers or other network hardware. South Korea censors affecting a large portion of the world’s population. Modes both at the DNS and at the routers. Also, Saudi Arabia and of censorship vary widely, ranging from complete discon- China censor on destination IP addresses, while other coun- nection, to selective censorship of Web pages, to censorship tries primarily censor on hostnames, URLs, or keywords. of search engines and online social networks (OSNs)–such Further, execution of censorship ranges between DNS redi- as Facebook and Twitter. Owing to these developments, the rects (Malaysia, Russia, South Korea and Turkey), connec- research community has recently been studying censorship tion timeouts (Bangladesh and India), TCP resets (China), from various angles. One of those angles is censorship eva- and HTTP responses with various 200-, 300-, and 400-level sion, with works in [1, 5, 7, 10, 19] proposing various ways status codes (Bahrain, Iran, Saudi Arabia, South Korea and of defeating censorship. Thailand). Measurement studies of censorship have also been un- - Censorship is not always explicitly communicated: dertaken. For example, Dainotti et al. studied complete While Bahrain, Iran, Saudi Arabia, South Korea and Thai- Internet disconnection, such as that in Egypt and Libya dur- land make censorship evident, other countries leave users ing the Arab Spring revolution in early 2011 [3]. The Open wondering why their attempt to fetch a web page was un- Net Initiative (ONI) studies have focused on what various successful. countries around the world censor [4, 11, 12]. In a similar spirit, CensMon used PlanetLab infrastructure to measure - Censorship can be stateful: China filters only the first censorship in various countries [15]. Aspects of Chinese HTTP GET request in a TCP stream, likely for the sake of censorship have received particular attention, with works efficiency. It does so by maintaining state. In addition, af- investigating how China censors Web accesses [2], Tor [17], ter filtering an HTTP request, it maintains flow state about Skype [9] and where its censoring modules are located [20]. source and destination IP addresses, port number and pro- However, the study of censorship in other countries has thus tocol of the denied request to deny further communication far been limited to identifying what is being censored rather between the same pair of machines even when such com- than how, leading to a gap in our understanding of how cen- munication would not previously have been blocked. No sors around the world operate. other country in our lists conducts stateful censorship. In this paper, we complement existing knowledge about - Customized censorship evasion is possible: Censorship the mechanics of Web censorship in 11 countries around the evasion techniques roughly fall in two categories. In the world, including China and an additional seven where no first are ones that avoid arousing suspicion from the censor- well-known publicly-available measurement infrastructure ing techniques in the first place, such as [1, 5, 7, 10, 19]. exists. Focusing on the question of how censorship is con- The second category includes techniques where the impact ducted (as opposed to what is being censored), we present of censorship is annulled at the client, such as in [2], where results on how Bangladesh, Bahrain, India, Iran, Malaysia, Clayton et al. propose a way to ignore TCP reset packets Russia, Saudi Arabia, South Korea, Thailand, and Turkey sent by the Chinese censoring module. Our work can mo- censor the Web. Of these, we could only find accessible tivate novel censorship evasion techniques, particularly in PlanetLab nodes in China, India, Russia and Turkey. For the latter category. the other countries we recruited volunteers through personal contacts. In many cases we confirm previous work; how- 2 Design Space for a Censoring Module ever in at least one case we found censorship where none There are three primary design considerations for any Web had previously been reported [4, 11, 12]. censorship system. The first is the trigger for censorship to We began by creating a taxonomy that explores the de- take place. The trigger could be a combination of hostname, sign space of a censoring module in terms of what trig- IP address, port number, protocol, or URL/keyword(s) cap- gers it, where it is located, and how censorship is executed. tured by regular expressions or an equivalent. The second Then through a series of experiments that include accessing consideration is the the location of the censoring module in 1 the network. Common locations of censoring modules in- - Trigger: Keyword, Location: Router Keyword-based clude DNS and routers. The final consideration is the actual filtering is unique in that while all the above forms of fil- execution of censorship, whereby a censor decides which tering filter only outgoing traffic, this type of filtering can protocols to modify and the manner in which censorship is be applied both to outgoing and incoming traffic. Specifi- communicated to the user. The following sections describe cally, a router can filter HTTP requests based on keywords the design space along these dimensions. contained in any portion of the URL and also filter re- 2.1 Trigger and Location sponses based on keywords in the returned content. For ex- ample, if the keyword ‘falun’ is included in the blacklist, Censorship could be triggered at several points after a user each of http://www.falun.com/index.html, http://example. first initiates a given connection. We discuss triggers and com/falun.htm and http://example.com/search?q=falun will location together since only certain kinds of information is be filtered in spite of the fact that ‘falun’ appears in the host- available for triggering censorship at specific locations. name in the first URL, in the directory path in the second, - Trigger: Hostname, Location: Local DNS resolver. The and as a query parameter in the third. first point at which censorship could be triggered is when - Trigger: Any, Location: Client machine. Filtering at the the hostname in the user’s request gets resolved through the client machine is another alternative to filtering at the DNS local DNS resolver. Instead of fetching the IP address(es) or routers. While its deployment could be challenging, it corresponding to the hostname by consulting the DNS hi- has the advantage of not consuming any network resources erarchy (which includes the resolver’s local cache), the re- or requiring any enhancements to network hardware. An solver could determine that the either the hostname itself example of such censorship is the custom version of Skype or the domain or sub-domain portion of it is on a to-be- produced at the behest of Chinese authorities. This version filtered list and return one or more of the pre-configured automatically scans incoming and outgoing messages for IP addresses. Typically, servers behind the returned IP ad- questionable content and sends this information to the Chi- dresses send custom pages indicating the reason for redirec- nese government [9]. Unfortunately, such techniques are tion. Note that since this filter does not have access to port radically different from those otherwise investigated. As number and protocol information, it will indiscriminately such, we do not consider this option subsequently in this filter all connections, possibly hurting non-Web services on paper. the filtered hostnames. 2.2 Execution - Trigger: IP address/port/protocol, Location: Router. Filtering at the level of TCP connection establishment and The actual execution of censorship is dependent on the loca- initial HTTP requests are the next logical steps after DNS tion of the censoring module. When the module is located at resolution. Each of these traverse through routers, making the local DNS resolver, the execution involves a DNS redi- them an excellent choice for locating the censoring module. rection to a specialized server. When the module is located While any router could be used to execute filtering, border at the router, there are multiple choices. In fact, several can routers belonging to ISPs are a good choice and are thus typ- co-exist. We describe them next. ically used because all traffic traverses them [20]. Among - Filter request or response: A censor may simply choose other things, router-based filtering could be triggered on the to filter requests. The first opportunity for this arises during destination server’s IP address–which is available as early the TCP connection establishment phase. However, since as the first SYN packet during TCP connection establish- hostnames are not available in TCP SYN packets, the black- ment and in every subsequent TCP packet containing HTTP list for such filtering would consist of IP addresses, port and data.