BY Under the Supervision of Sambuddho Chakravarty NEW DELHI
Total Page:16
File Type:pdf, Size:1020Kb
AQUEST FOR CYBER HIGH GROUND:IMPACT OF INTERNET STRUCTURE ON (ANTI-)CENSORSHIP BY DEVASHISH GOSAIN Under the supervision of Sambuddho Chakravarty COMPUTER SCIENCE AND ENGINEERING INDRAPRASTHA INSTITUTE OF INFORMATION TECHNOLOGY DELHI NEW DELHI– 110020 JULY, 2020 AQUEST FOR CYBER HIGH GROUND:IMPACT OF INTERNET STRUCTURE ON (ANTI-)CENSORSHIP BY DEVASHISH GOSAIN ATHESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy COMPUTER SCIENCE AND ENGINEERING INDRAPRASTHA INSTITUTE OF INFORMATION TECHNOLOGY DELHI NEW DELHI– 110020 JULY, 2020 Certificate This is to certify that the thesis titled A Quest for Cyber High Ground: Impact of Internet Structure on (Anti-)Censorship being submitted by Devashish Gosain to the Indraprastha Institute of Information Technology Delhi, for the award of the degree of Doctor of Philosophy, is an original research work carried out by him under my supervision. In my opinion, the thesis has reached the standard fulfilling the requirements of the regulations relating to the degree. The results contained in this thesis have not been submitted in part or full to any other university or institute for the award of any degree or diploma. July, 2020 Dr. Sambuddho Chakravarty Indraprastha Institute of Information Technology Delhi New Delhi 110020 Abstract The original design of the Internet was a resilient, distributed architecture, that should be able to route around (and therefore recover from) massive disruption — up to and including nuclear war. However, network routing policies and business decisions cause traffic to be often routed through a relatively small set of Autonomous Systems (ASes). It has practical implications — some of these frequently appearing ASes (i.e., “key” ASes) are hosted in censorious nations. Other than censoring their own citizens’ network access, such ASes may inadvertently filter traffic for other foreign customer ASes as well. Thus in this thesis, we analyzed, “how can inferences drawn from Internet maps be used to aid (Anti-)censorship?” Specifically, we attempted to answer questions like — Is Internet structure still hierarchical? What are the key ASes, and in which countries they are located? Can censorious countries (which may host key ASes) filter Internet traffic of other nations transiting them? To begin with, we constructed a map of the Internet and examined the extent of routing centralization on the Internet. We identified the major players who control the “Internet back- bone” and point out how many of these are, in fact, under the jurisdiction of censorious countries (specifically Russia, China, and India). We found that approx. one-third of the Internet backbone belongs to the aforementioned known censors that may potentially monitor a large fraction of global Internet traffic. Further, we went ahead to study whether this hierarchy exists within the nation(s) itself? If so, can censorious nations exploit this hierarchy to achieve censorship/- surveillance within their national boundary? With censorship mechanisms deployed in a few key ASes, a censor may achieve large scale censorship within its territory. As a case study, we selected India that has the second-largest Internet userbase. We conducted a study on the Internet hierarchy in India from the point of view of the censor. We then consider the question (feasibility) of whether India might potentially follow the Chinese model and institute a single, government-controlled filter. We found that a few “key” ASes (1% of Indian ASes) collectively intercept 95% of paths to the censored sites, and also to all publicly-visible DNS servers. Five thousand routers spanning these key ASes would suffice to carry out IP or DNS filtering for the entire country; 70% of these routers belong to only two private ISPs. However, the previous feasibility study does not consider the present censorship mechanisms and infrastructure employed by Indian ISPs. Thus, we developed various techniques and heuristics to assess the pervasiveness of censorship and study the underlying mechanisms used by these ISPs to achieve them. We fortified our findings by adjudging the coverage and consistency of censorship infrastructure, broadly in terms of the average number of network paths and requested domains, the infrastructure censors. Our results indicate an apparent disparity among the ISPs — what they filter and on how they install censorship infrastructure. For instance, in i Idea cellular (a popular ISP), we observed the censorious middleboxes in over 90% of our tested intra-AS paths, whereas for others like Vodafone, it is as low as 2.5%. We later devised novel anti-censorship strategies that do not depend on third-party tools (like proxies, Tor, and VPNs, etc.). We managed to access all blocked websites in all ISPs under test. It must be noted that the proposed anti-censorship solutions were temporary, i.e., they were based on obfuscating the pattern matching used by the censorship infrastructure (e.g., in HTTP GET request changing the Host: evil.com to HoSt: evil.com). If in the future, censors evolve and improve their infrastructure, the proposed solutions may likely fail. Thus, we focused on a relatively new anti-censorship scheme Decoy Routing, which aims to end the arms race between the censor and the free speech activists. Decoy Routing, the use of routers (rather than end hosts) as proxies, is a new direction in anti-censorship research. However, practical decoy routing deployment poses a new challenge of where to place decoy Routers (DRs) on the Internet? Thus, we proposed an efficient decoy router placement strategy that requires the construction of global (and country-level) Internet maps. We found that few (≈ 30) ASes intercepted over 90% of paths to the top n sites worldwide, for n = 10, 20...200 and also to other destinations. Our first contribution is to demonstrate with real paths that the number of ASes required for a world-wide DR framework is small (≈ 30). Our second contribution is to consider the details of DR placement — not just in which ASes DRs should be placed to intercept traffic, but exactly where in each AS. We found that even with 30 ASes, we still need a total of about 11,700 DRs. Decoy Routing requires accessing web content hosted outside the censors’ boundary. How- ever, Content Distribution Network (CDNs), which are designed to bring web content closer to end-user, might pose operational challenges to DR. Popular web content (e.g., Alexa popular websites) served from CDNs, might be available within the censor’s boundary itself. Thus, we analyzed how do CDN-based web content localization can hinder such systems. Moreover, we quantitatively analyzed the impact of CDN localization on various anti-censorship systems, including DR. Such analysis requires geolocating the websites. Thus we adapted a multilateration method, Constraint Based Geolocation (CBG), with novel heuristics and termed it as Region Specific CBG (R-CBG). In ≈ 91% cases, R-CBG correctly classifies hosts as inside (or outside) w.r.t. a nation. Using R-CBG, we observe that most of the popular websites are hosted inside each of the nations. Our empirical study, involving five countries, shows that popular websites (≈ 80% of Alexa top-1k for each nation) are hosted within a client’s domicile. These results reveal that anti-censorship approaches like DR may not directly use a significant fraction of popular websites. However, a small, yet a significant set of websites (≈ 20%), are hosted outside the censors’ boundaries and may be used. ii Dedication To, All those who understand the importance of a question... And do not consider any question to be a “wrong” question... Tell people there’s an invisible man in the sky who created the universe, and the vast majority will believe you. Tell them the paint is wet, and they have to touch it to be sure. George Carlin In questions of science, the authority of a thousand is not worth the humble reasoning of a single individual. Galileo Galilei iii Acknowledgements “At times our own light goes out and is rekindled by a spark from another person. Each of us has cause to think with deep gratitude of those who have lighted the flame within us.” Albert Schweitzer I thank all of those who have helped me in shaping me into what I am — sometimes overtly and sometimes covertly. Last but not the least, I thank almighty God, if it exists... iv Research Papers From This Work 1. Few Throats to Choke: On the Current Structure of the Internet. IEEE Conference on Local Computer Networks (LCN), 2017. 2. Mending Wall: On the Implementation of Censorship in India. EAI International Conference on Security and Privacy in Comm. Networks (SECURECOMM), 2017. (Best Student Paper Award.) 3. The Devil’s in The Details: Placing Decoy Routers in the Internet. Annual Computer Security Applications Conference (ACSAC), 2017. 4. Where The Light Gets In: Analyzing Web Censorship Mechanisms in India. ACM Internet Measurement Conference (IMC), 2018. 5. Too Close for Comfort: Morasses of (Anti-) Censorship in the Era of CDNs. Under submission to PoPETS 2020. Available on demand. v Contents Abstract i Dedication iii Acknowledgements iv Publicationsv List of Tables xi List of Figures xiii 1 Introduction1 List of Abbreviations1 2 Few Throats to Choke: On the Current Structure of the Internet5 2.1 Background and Related Work..........................8 2.1.1 Internet Censorship............................8 2.1.2 Internet Mapping.............................9 2.2 Approach and Methods..............................9 2.2.1 Mapping the Internet........................... 10 2.2.2 Identifying ASes of interest....................... 13 2.2.3 Validation................................. 14 2.3 Experimental Results............................... 14 2.3.1 Test 1 : Alexa top-100.......................... 14 vi 2.3.2 Cross-Validation............................. 17 2.3.3 Collateral Damage............................ 17 2.4 Discussion and Future Work........................... 19 2.4.1 Heavy-Hitters vs Tier-1 ASes...................... 20 2.4.2 AS path estimation............................ 23 2.4.3 Current and Future Work........................