MultiProxy: a Collaborative Approach to Censorship Circumvention
Gaomei Shi

Master’s Thesis in Computer Science
Distributed Systems group
Faculty of Electrical Engineering, Mathematics, and Computer Science
Delft University of Technology

28th March 2019

Author: Gaomei Shi
Title: MultiProxy: a collaborative approach to censorship circumvention
MSc presentation: 29th March 2019

Graduation Committee
Dr. sc. ETH J.S. Rellermeyer, Delft University of Technology
Dr. ir. J.A. Pouwelse, Delft University of Technology
Dr.-Ing. T. Fiebig, Delft University of Technology

Abstract

In recent years, many countries and administrative domains have exploited control over their communication infrastructure to censor online material. The concrete reasons behind Internet censorship remain poorly understood because the systems are opaque. In general, Internet censorship aims to disrupt the free flow of information. It involves a series of steps that stop the dissemination of information or prevent access to it, for example by disrupting the link between users and providers. These technologies cause significant inconvenience for legitimate users. The goal of this thesis is to undertake an up-to-date study that measures the behavior of the Great Firewall of China (GFW). Based on that study, this work designs a Peer-to-Peer (P2P) circumvention system called MultiProxy, which uses a blockchain-based economic model to create a balanced environment for providing and consuming resources. The system also uses multi-hop messaging to protect the privacy of request initiators. The evaluation results show that MultiProxy can evade censorship while protecting users' privacy.

Preface

Censorship has become prevalent with the advent of the Internet. It is a double-edged sword that has both positive and negative impacts on the general public.
For instance, Internet censorship limits the spread of harmful information, but at the same time it restricts access according to the preferences of regimes, which can be inconvenient for netizens. Take the GFW, the world’s largest country-wide Internet censorship system, as an example. There is little formal documentation of the operational principles behind such a sophisticated system. I therefore think its patterns deserve to be explored, and with knowledge of the working mechanisms of the GFW, a countermeasure can then be designed.

It was very pleasant to be able to work on this exciting and challenging research topic in the Distributed Systems group. First of all, I would like to thank my supervisor Jan; I would not have been able to complete this work without his excellent scientific guidance. I would also like to thank Martijn for his assistance and support with the project API usage and coding. Finally, I would like to express my sincerest gratitude to my family for their unconditional support, encouragement, and motivation.

Gaomei Shi
Delft, The Netherlands
18th March 2019

Contents

Preface
1 Introduction
    1.1 A brief history of the GFW
    1.2 The categories of circumvention systems
    1.3 Thesis Structure
2 Problem description
    2.1 Research Questions
    2.2 Internet Censorship
        2.2.1 Client-side censorship
        2.2.2 Server-side censorship
        2.2.3 In-path censorship
        2.2.4 On-path censorship
    2.3 Analysis and Blocking Mechanisms
        2.3.1 In-path censorship
        2.3.2 On-path censorship
    2.4 Obfuscation of censorship circumvention systems
        2.4.1 Payload encryption
        2.4.2 Randomizer
        2.4.3 Mimicry
        2.4.4 Tunneling
    2.5 P2P architecture - Build a trust network to overcome censorship
3 Empirical evaluation of the GFW
    3.1 Experimental Setup
    3.2 IP blocking
    3.3 TCP connection reset
        3.3.1 HTTP keywords detection
        3.3.2 DNS keywords detection
        3.3.3 TCP connection reset module
    3.4 DNS hijacking and DNS cache poisoning
        3.4.1 How the GFW intercepts DNS resolution
    3.5 Experimental results and summary
        3.5.1 Threat model
        3.5.2 Results and Circumvention suggestions
4 Circumvention System Design
    4.1 Design goals
    4.2 System Architecture
    4.3 Traffic forwarding
        4.3.1 SOCKS on edges
        4.3.2 Peer-to-peer system
    4.4 Token economy
        4.4.1 Analysis of threats to the system
        4.4.2 Solutions for threats
    4.5 Multi-hop messaging
        4.5.1 Solutions for data privacy
    4.6 Implementation details
        4.6.1 Traffic forwarding
        4.6.2 Token economy
        4.6.3 Anonymous messaging
5 Evaluation
    5.1 Evaluation framework
        5.1.1 Network performance
        5.1.2 System performance
    5.2 Methodologies and experimental steps
        5.2.1 System performance
        5.2.2 Performance Comparison
    5.3 Results and analysis
        5.3.1 Network performance
        5.3.2 System performance
        5.3.3 Scalability test
6 Conclusions and Future Work
    6.1 Conclusions
        6.1.1 Results for each research questions
        6.1.2 Main contributions
    6.2 Future Work

List of Figures

2.1 An overview of censorship system
2.2 A general censorship model
2.3 DNS hijacking and DNS poisoning
3.1 Experimental pipeline for categorizing different root causes
3.2 IP blocking
3.3 TCP connection reset
3.4 The model of TCP connection reset device
3.5 DNS hijacking
3.6 DNS cache poisoning
4.1 A Tor circuit with pluggable transport meek
4.2 The average connected clients number of cymrubridge02 [32]
4.3 Circumvention system architecture
4.4 Traffic routing path
4.5 Work flow of the SOCKS5 protocol
4.6 Bootstrapping of a peer-to-peer system
4.7 Challenge response mechanism
4.8 A malicious server node
4.9 Trustchain protocol [20]
4.10 Intel SGX Remote attestation [19]
4.11 Intel SGX Remote attestation full work flow [19]
4.12 A 2-hop onion routing circuit
4.13 The class UML diagram of Multiproxy
4.14 Components and work flow of MultiProxy
4.15 Packet structure
4.16 Build a 2-hop circuit [37]
5.1 Premium networking topology of Google Cloud instances [21]
5.2 Latency with different hop length
5.3 Latency measurement
5.4 Throughput measurement
5.5 CPU usage (%)
5.6 Memory usage (MB)
5.7 Scalability test

List of Listings

1 Traceroute from uncensored domain
2 Traceroute from censored domain
3 Packet capture example of TCP connection reset over HTTP protocol
4 Packet capture example of TCP connection reset over HTTP protocol during a certain period
5 Packet capture example over TCP
6 Packet capture example over UDP

List of Tables

3.1 Poisoned IP addresses
3.2 The proportion of poisoned domain names
3.3 The number of domains affected by each blocking mechanism
4.1 The IP addresses of meek server
4.2 MultiProxy system components
5.1 Evaluation framework

Chapter 1
Introduction

This chapter gives the background of censorship. Censorship can occur in various types of media for a variety of reasons, such as politics, religion, and copy approval. This thesis does not focus on the root causes of censorship. Instead, it emphasizes the technical aspects of Internet censorship. Since different countries implement their censorship systems differently, this work takes the country-wide content monitoring system, the GFW, as a case study. Section 1.1 introduces the brief history of the Great Firewall of China (GFW). Following this, a short introduction to different circumvention methods and systems is presented in Section 1.2. Section 1.3 outlines the thesis structure.

1.1 A brief history of the GFW

Censorship has existed for many years under various rules and regulations, as the Internet has become a common communication platform. Take the most famous example: the GFW, also known as the Great Firewall of China, is the most extensive and sophisticated country-wide Internet censoring and monitoring system in the world. It is a combination of hardware and software that aims to identify and block network traffic matching a particular blacklist. The undesired web content includes search engines, e.g., Google and DuckDuckGo, and social media and social networking websites, e.g., Twitter, Facebook, Instagram, and YouTube. The GFW uses multiple techniques and modules to prevent citizens from accessing the blocked content.

Over the last few decades, most application-layer protocols, such as HTTP and DNS, have run directly on top of transport-layer protocols, e.g., TCP and UDP. Although these application protocols provide standard ways of transferring resources, they are vulnerable to man-in-the-middle attacks because they were designed with insufficient consideration of security. With these protocols, all network traffic travels in plain text between the source and the destination. This lack of security gives the GFW a chance to inspect the content carried by these protocols. In the early years, from 2002 [41], the GFW started developing a keyword filtering system to block access to selected target websites, including some search engines and social media websites that can spread a massive amount of information.
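
To make the link between plain-text protocols and keyword filtering concrete, the following is a minimal sketch of how an observer on the network path could match keywords in an unencrypted HTTP request. It is an illustration under stated assumptions, not a description of the GFW's actual implementation: the blacklist entries, the function name should_reset, and the choice to inspect only the request line and the Host header are all hypothetical.

```python
# Toy keyword filter over plaintext HTTP, for illustration only.
# BLOCKED_KEYWORDS and should_reset are hypothetical names, not from the thesis.

BLOCKED_KEYWORDS = (b"blocked.example", b"forbidden-term")  # hypothetical blacklist


def should_reset(payload: bytes, keywords=BLOCKED_KEYWORDS) -> bool:
    """Return True if the request line or Host header contains a blacklisted keyword."""
    # Only the header section (before the first blank line) is inspected.
    head, _, _ = payload.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    if not lines[0]:
        return False  # empty payload: nothing to inspect

    inspected = [lines[0]]  # request line, e.g. b"GET /path HTTP/1.1"
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            inspected.append(value.strip())

    haystack = b"\n".join(inspected).lower()
    return any(keyword.lower() in haystack for keyword in keywords)


if __name__ == "__main__":
    request = b"GET /search?q=forbidden-term HTTP/1.1\r\nHost: www.example.org\r\n\r\n"
    print(should_reset(request))
```

Because the check operates on raw bytes, it only works when the request is readable on the wire. A payload that has been encrypted does not expose the request line or Host header to this kind of matching.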