Enhancing System Transparency, Trust, and Privacy with Internet Measurement
by

Benjamin VanderSloot

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in the University of Michigan, 2020

Doctoral Committee:
Professor J. Alex Halderman, Chair
Assistant Professor Roya Ensafi
Research Professor Peter Honeyman
Assistant Professor Baris Kasikci
Professor Kentaro Toyama

Benjamin VanderSloot
[email protected]
ORCID iD: 0000-0001-6528-3927

© Benjamin VanderSloot 2020

ACKNOWLEDGEMENTS

First and foremost, I thank my wife-to-be, Alexandra. Without her unwavering love and support, this dissertation would have remained unfinished. Without her help, the ideas here would not be as strong. Without her, I would not be here.

Thanks are also due to my advisor, J. Alex Halderman, for giving me the independence and support I needed to find myself. I also thank Peter Honeyman for sharing his experience freely, both with history lessons and mentorship. Thank you Roya Ensafi, for watching out for me since we met. Thank you Baris Kasikci and Kentaro Toyama, for helping me out the door, in spite of the ongoing pandemic.

I owe a great debt to my parents, Craig and Teresa VanderSloot, for raising me so well. No doubt the lessons about perseverance and hard work were put to good use. Your pride has been fuel to my fire for a very long time now. I also owe so much to my sister, Christine Kirbitz, for being the best example a little brother could hope for. I always had a great path to follow and an ear to share my problems with.

Matt Bernhard and Allison McDonald, I owe you my sanity. Let me know if you need to borrow it; I'll leave the door open so you can follow me out quickly. Thank you Drew Springall and David Adrian for always taking the time out of your own journey to help me.
My time in grad school would not have been the same without the fun and academic challenge I had in lab, and for that I have so many to thank: Eric Wustrow, James Kasten, Zakir Durumeric, Will Scott, Chris Dzombak, Ariana Mirian, Ben Burgess, Alex Holland, Gabrielle Beck, Rose Howell, Sai Gouravajhala, Ofir Weisse, Marina Minkin, Andrew Kwong, and many others. Thank you to all of the professors who showed me kindness, helped me, or inspired me.

The work in this dissertation was supported in part by the National Science Foundation; the United States Department of State Bureau of Democracy, Human Rights, and Labor; the Rackham Merit Fellowship; and the Computer Science and Engineering Division at Michigan.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT

CHAPTER

I. Introduction
   1.1 Measuring Internet Censorship
   1.2 Measuring to Circumvent Censorship
   1.3 Measuring Web Tracking Evasion
   1.4 Measuring the Certificate Ecosystem

II. Measurement of Application-Layer Censorship
   2.1 Introduction
   2.2 Related Work
      2.2.1 Application-layer Blocking
      2.2.2 Direct Measurement Systems
      2.2.3 Remote Measurement Systems
      2.2.4 Investigations of DPI Policies
   2.3 Our Measurement System
      2.3.1 Design Goals
      2.3.2 System Design
   2.4 Experimentation
      2.4.1 Echo Server Characterization
      2.4.2 Datasets
   2.5 Evaluation
      2.5.1 Validation
      2.5.2 Detection of Disruption
      2.5.3 Disruption Mechanisms
      2.5.4 HTTP vs HTTPS
      2.5.5 Disruption Breadth
   2.6 Discussion
      2.6.1 Ethics
      2.6.2 Limitations
      2.6.3 Future Work
   2.7 Conclusion

III. Measurement in Aid of Censorship Circumvention
   3.1 Introduction
   3.2 Background
      3.2.1 Prior Schemes
   3.3 Deployment Architecture
      3.3.1 ISP Deployment
      3.3.2 Long-lived Sessions
      3.3.3 Handling 40 Gbps Links
      3.3.4 Multi-Station Constellation
      3.3.5 Client Integration
      3.3.6 Monitoring
      3.3.7 Reachable Site Selection
      3.3.8 Managing Load on Reachable Sites
   3.4 Trial Results
      3.4.1 At-Scale Operation
      3.4.2 User Traffic
      3.4.3 Impact on Reachable Sites
   3.5 Deployment Results
      3.5.1 Psiphon Impact
      3.5.2 Client Performance
      3.5.3 Decoy Impact
      3.5.4 Station Performance
      3.5.5 Blocking Event
   3.6 Discussion
      3.6.1 Role of Refraction Networking
      3.6.2 Cost
      3.6.3 Partner Concerns
      3.6.4 Future Directions
   3.7 Conclusion

IV. Measurement of Web Privacy Defenses
   4.1 Introduction
   4.2 Related Work
   4.3 Methods
      4.3.1 Web Crawl
      4.3.2 Blocker List Testing
      4.3.3 Blocker List Probing
      4.3.4 Inclusion Graph
      4.3.5 Privacy Metrics
   4.4 Comparing Blocker Lists
      4.4.1 Default Extension Behavior
      4.4.2 Explicit A&A Exceptions
      4.4.3 EasyList Variants
   4.5 Blocking Completeness
   4.6 Privacy Impacts
      4.6.1 Exchange Sensitivity
      4.6.2 Tool Evaluation
      4.6.3 Acceptable Advertisement
      4.6.4 Improved Model
   4.7 Discussion
      4.7.1 Findings and Lessons
      4.7.2 Limitations
   4.8 Conclusion

V. Measurement of HTTPS Certificates
   5.1 Introduction
   5.2 Certificate Perspectives
      5.2.1 Perspectives Enumerated
      5.2.2 Ethics of Measurement
   5.3 Impact of Perspective on HTTPS Research
   5.4 Two Complementary Perspectives Emerge
      5.4.1 Limitations of Censys
      5.4.2 Limitations of Certificate Transparency
   5.5 Passive Traffic Monitoring
      5.5.1 Limitations of Scope
      5.5.2 User-driven
   5.6 Merger of Certificate Transparency and Censys
      5.6.1 Benefits to System Administrators
      5.6.2 Load-testing Certificate Transparency
   5.7 Related Work
   5.8 Conclusion

VI. Conclusion and Future Work

BIBLIOGRAPHY

LIST OF FIGURES

2.1 Echo Protocol—The Echo Protocol, when properly performed, is a simple exchange between the client and server in which the server's response is identical to the client's request. In the example above, the censoring middlebox ignores the client's inbound request but reacts to the echo server's response, injecting RST packets and terminating the TCP connection.

2.2 Test Control Flow—A single test using an echo server is performed by following this diagram. The most common path is also the fastest, in which an echo server responds correctly to the first request and the test is marked as Not Blocked. If the server never responds correctly, the experiment is considered a failure and we do not use the test in our evaluation.

2.3 Persistent Interference Duration—We use echo servers in all countries where we observe censorship to empirically measure the length of time interference occurs after a censorship event has been triggered. Roughly half of the servers responded correctly to our request within 60 seconds. By 100 seconds, 99.9% responded correctly. We therefore choose two minutes as a safe delay in the Delay Phase.

2.4 Echo Server Churn—Only 18% of tested servers were reachable in every observation over two months of daily scans. However, 56% were present in both our first and final scans.

2.5 Coverage of Autonomous Systems per Country—Echo servers were present in 184 countries with 4458 unique ASes, while OONI probes were in 113 countries with 678 ASes.
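The echo-based test pictured in Figures 2.1 and 2.2 can be sketched in a few lines: send a payload to an echo server, and classify the result by whether the identical bytes come back, the connection is reset or times out (consistent with injected RSTs or dropped traffic), or the server replies incorrectly. This is a minimal illustrative sketch, not the dissertation's measurement system; the function name, payload, and timeout are assumptions.

```python
import socket

ECHO_PORT = 7  # standard Echo Protocol port (RFC 862)

def echo_test(server_ip, payload, port=ECHO_PORT, timeout=5.0):
    """Send `payload` to an echo server and classify the outcome.

    Returns "not_blocked" if the server echoes the payload back
    unchanged, "disrupted" if the connection is reset or times out,
    and "failed" if the server responds incorrectly.
    """
    try:
        with socket.create_connection((server_ip, port), timeout=timeout) as sock:
            sock.sendall(payload)
            received = b""
            while len(received) < len(payload):
                chunk = sock.recv(4096)
                if not chunk:  # server closed the connection early
                    break
                received += chunk
    except OSError:
        # Injected RSTs, drops, and timeouts all surface here.
        return "disrupted"
    return "not_blocked" if received == payload else "failed"

# Example probe: an HTTP-style request naming a hypothetical test host.
request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
```

Note the classification logic mirrors the figure's control flow: only a byte-identical echo counts as Not Blocked, and an incorrect response is treated as a test failure rather than evidence of censorship.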