
Methods and Systems for Understanding Large-Scale Internet Threats

by

Paul James Pearce

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy

in

Computer Science

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Vern Paxson, Chair
Professor Deirdre K. Mulligan
Professor Stefan Savage
Professor David Wagner

Summer 2018

Copyright 2018 by Paul James Pearce

Abstract

Methods and Systems for Understanding Large-Scale Internet Threats

by

Paul James Pearce

Doctor of Philosophy in Computer Science

University of California, Berkeley

Professor Vern Paxson, Chair

Large-scale Internet attacks are pervasive. A broad spectrum of actors from organized gangs of criminals to nation-states exploit the modern, layered Internet to launch politically and economically motivated attacks. The impact of these attacks is vast, ranging from billions of users experiencing Internet censorship, to tens of millions of dollars lost annually to cybercrime. Developing effective and comprehensive defenses to these large-scale threats requires systematic empirical measurement.

In this dissertation we develop empirical measurement methods and systems for understanding politically and economically motivated Internet threats. Specifically, we examine the problems of Internet censorship and advertising abuse in depth and at scale. To understand censorship, we develop Augur and Iris, methods and accompanying systems that allow us to perform global, longitudinal measurement of Internet censorship at the TCP/IP and DNS layers of the network stack—without the use of volunteers. This work addresses a range of both technical and extra-technical challenges, at a scale and fidelity not previously achieved. In combating advertising abuse, we investigate and chronicle multiple facets of the ecosystem—from clickbots to large-scale botnets to advertising injection—using a variety of empirical methods. Our work ultimately identifies fundamental structural weak-points leverageable for defense, resulting in dismantling botnets, cleaning up ad networks, and protecting users.

For Judy

Contents

1 Introduction

I Global Censorship Measurement

2 Censorship Introduction, Related Work and Ethics
  2.1 Introduction
  2.2 Related Work
  2.3 Ethics

3 Augur: Internet-Wide Detection of Connectivity Disruptions
  3.1 Introduction
  3.2 Method Overview
  3.3 Putting the Method to Practice
  3.4 Augur Implementation and Experiment Data
  3.5 Validation and Analysis
  3.6 Discussion
  3.7 Augur Summary

4 Iris: Global Measurement of DNS Manipulation
  4.1 Introduction
  4.2 Method
  4.3 Dataset
  4.4 Results
  4.5 Iris Summary

5 Censorship Discussion and Conclusion

II Understanding Advertising Abuse

6 Advertising Abuse Introduction and Related Work
  6.1 Introduction
  6.2 Related Work and Background

7 What's Clicking What? Clickbot Techniques and Innovations
  7.1 Introduction
  7.2 Methodology
  7.3 The Fiesta Clickbot
  7.4 The 7cy Clickbot
  7.5 Discussion and Summary

8 ZeroAccess: Background and Evolution
  8.1 ZeroAccess Evolution and Takedown
  8.2 Technical Background

9 The ZA Auto-Clicking and Search-Hijacking Malware
  9.1 Introduction
  9.2 Methodology
  9.3 The ZeroAccess Platform
  9.4 The Auto-Clicking Module
  9.5 Serpent: The Search-Hijacking Module
  9.6 Summary

10 Characterizing Large-Scale Click Fraud in ZeroAccess
  10.1 Introduction
  10.2 Data Sources and Quality
  10.3 Analyzing Fraud
  10.4 Assessing ZA-Dirty Ad Units
  10.5 Summary

11 Ad Injection at Scale: Assessing Deceptive Advertisement Modifications
  11.1 Introduction and Background
  11.2 Impacted Properties
  11.3 Understanding Injectors in Revenue Chains
  11.4 Advertisers and Intermediates For Top Injectors
  11.5 Summary

12 Advertising Abuse Conclusion

13 Conclusion and Future Directions

Bibliography

Acknowledgments

I must begin by thanking my amazing PhD advisor Vern Paxson. His support, understanding, encouragement, and expertise on topics ranging from research to life were invaluable. Without his guidance I would not be where I am today. Thank you.

I must also thank the faculty of the Center for Evidence-based Security Research (CESR), namely Stefan Savage and Geoff Voelker. Their counsel and feedback were instrumental in my career. They both continually went above and beyond in facilitating my PhD, at times engaging more as advisors than mentors. Thank you both. Similarly, I thank Nick Feamster, whose guidance throughout the second half of my PhD was critical in achieving my career objectives. I also thank David Wagner, a resource and mentor throughout my time in Berkeley.

PhDs are a marathon, and it helps to have good friends to share the load. Thank you to 1044++ for the life part of work-life balance. I owe a huge debt of gratitude to all of my fellow security students at UC Berkeley and the broader CESR family. This includes but is not limited to: Devdatta Akhawe, Jethro Beekman, Joe DeBlasio, Adrienne Porter Felt, David Fifield, Chris Grier, Grant Ho, Danny Huang, Mobin Javed, Noah Johnson, Frank Li, Bill Marczak, Michael McCoyd, Brad Miller, Ariana Mirian, Austin Murdock, and Kurt Thomas. Thank you all for the support and camaraderie.

The unsung heroes of any PhD are the staff that make everything work. Chief among them are Angie Abbatecola and Jon Kuroda. Thank you both for being awesome and continually going above and beyond in keeping the department going.

Going back further, this degree was not possible without the opportunity of the California Community College system, numerous professors at Chaffey Community College, and the support of several individuals who facilitated my transfer to UC Berkeley as an undergraduate. Among those who deserve recognition are: Tim Arner (Chaffey Community College Professor, retired), Charles Hollenbeck (Chaffey Community College Professor, retired), Rebecca Miller (former UC Berkeley Director of Student Affairs), Karen Pender (Chaffey Community College Professor, retired), Ana Rafferty (UC Berkeley Transfer Specialist), and Sheila Humphreys (UC Berkeley EECS Emerita Director of Diversity). Thank you all for helping me get to this point in my life and career.

Throughout my entire PhD I had the privilege to work with an amazing set of collaborators from both academia and industry. I thank them all for the opportunity to collaborate, and acknowledge that this work was not possible without their assistance. I also thank my dissertation committee, Vern Paxson, Deirdre Mulligan, Stefan Savage, and David Wagner, for enduring my qualifying exam slides and providing invaluable feedback on my work's direction and impact.

Additionally, I am grateful for the research assistance of the following throughout my graduate career: David Anselmi, Manos Antonakakis, Richard Boscovich, Eric Brewer, Randy Bush, Chia Yuan Cho, Jed Crandall, Hitesh Dharamdasani, Zakir Durumeric, David Fifield, Sarthak Grover, Yunhong Gu, Brad Karp, Niels Provos, Moheeb Abu Rajab, the Google Safe Browsing Team, the Microsoft Digital Crimes Unit (DCU), the Microsoft Malware Protection Center (MMPC), and the Bing Ads Online Forensics Team.

My work was supported in part by National Science Foundation Awards CNS-0433702, CNS-0831535, CNS-0905631, CNS-1213157, CNS-1237076, CNS-1237264, CNS-1237265, CNS-1406041, CNS-1518878, CNS-1518918, CNS-1540066, and CNS-1602399; by the Office of Naval Research MURI grants N00014-09-1-1081 and N00014-12-1-0165; by the U.S. Army Research Office MURI grant W911NF-09-1-0553; by the Team for Research in Ubiquitous Secure Technology (TRUST); and by gifts from Google, Microsoft, and the UCSD Center for Networked Systems (CNS). Any opinions, findings, and conclusions or recommendations expressed in this dissertation are those of the author and do not necessarily reflect the views of the sponsors.

And to Judy, my wife: You are the best person I know. Your immeasurable patience, love, and support are what allow me to do what I do. Thank you for the journey thus far, and for what is to come. Also I'm sorry I cold-called you (prior to our formal introduction) when we took EE20N. But it seems to have worked out ok in the long run.

Chapter 1

Introduction

Internet security is defined by conflict. The value and power mediated by the global, interconnected systems of today's Internet in turn attract adversaries who seek to exploit these same systems for economic, political or social gain. However, the underlying complexity of the Internet infrastructure, the layering of its services, and the indirect nature of its business relationships can make it difficult to identify even the existence of adversaries manipulating systems for their benefit. Further, identifying that an attack is taking place is only the first step in an ongoing challenge, as adversaries have the luxury to define where and when their actions take place and responders are forced to discover the landscape of the battle after it commences.

Studying these problems is challenging. While anecdotes and serendipitous findings are common, understanding the full nature of a particular action or attack requires systematic measurement—frequently measurement of an actor who seeks to hide or camouflage their actions. Designing effective and comprehensive defenses requires sound understanding of not only specific problems, but also the fundamental limitations and costs associated with those problems. In the context of global-scale attacks such as cybercrime and censorship, acquiring this understanding is frequently challenging and requires new types of research systems and measurement methods. Remediation without systematic understanding of opponents' costs, capabilities, and objectives risks developing reactionary, incremental defenses lacking feasibility or robustness.

This dissertation brings empirical grounding and understanding to the study of global, hidden Internet security threats. Our work is focused on both politically and economically motivated attacks, spanning censorship (Part I) and cybercrime (Part II).

Part I: Global Censorship Measurement

Anecdotes and reports indicate that Internet censorship is widespread, affecting many countries around the world [55, 112]. Despite this prevalence, empirical Internet measurements revealing the scope and evolution of censorship remain comparatively sparse. This limitation stems from fundamental technical and ethical difficulties in obtaining large-scale, consistent, and sound data, from both numerous countries and multiple vantage points within those countries. Understanding global manipulation practices is necessary to develop technologies and formulate policies that effectively address censorship.

A more complete understanding of Internet censorship around the world requires diverse measurements from a wide range of geographic regions and ISPs, not only across countries but also within regions of a single country. Diversity is important even within countries, because political dynamics can vary internally, and because different ISPs may implement filtering policies differently.

Unfortunately, most mechanisms for measuring Internet censorship globally rely on volunteers who run measurement software deployed on their own Internet-connected devices (e.g., laptops, phones, or tablets) [124, 134]. Because these tools rely on people to install software and perform measurements, it is unlikely that they can ever achieve the scale required to gather continuous and diverse measurements about Internet censorship. Performing measurements of the scale and frequency necessary to understand the scope and evolution of Internet censorship calls for fundamentally new techniques that do not require human involvement or intervention.

Part I of this work develops systems and methods that can perform widespread, ethical, longitudinal measurements of multiple types of global Internet censorship. To achieve the necessary reach and diversity, we have developed systems and methods that do not require the participation of individual users in the countries of interest. This work addresses a range of both technical and extra-technical challenges, at a scale and fidelity not previously achieved.

To enable global continuous measurement of TCP/IP Internet censorship, we develop Augur (Chapter 3) [116, 117], a measurement regimen and accompanying system that soundly leverage potentially highly noisy TCP/IP side channels to measure reachability between two Internet locations without access to measurement vantage points at the locations, or even at points along the path between them (Section 3.2). Augur includes the use of sequential hypothesis testing (Section 3.3) to develop statistical confidence in the face of network and side channel noise, and techniques for responsibly addressing censorship measurement ethics. The validation of Augur included a global censorship measurement study which examined the blocking practices of 179 countries and territories (Section 3.5).

Adversaries employ a variety of technical mechanisms to achieve Internet censorship. Augur enables us to understand global TCP/IP censorship, but to further build a comprehensive picture of censorship, we need techniques to examine additional commonly deployed blocking technologies. Towards this goal we then develop Iris (Chapter 4) [119, 120], a scalable and accurate method to measure global manipulation of DNS resolutions. Iris enables ongoing censorship measurement from 151 countries and territories (Section 4.2).
The deployment of Iris reveals new patterns in global censorship, including the heterogeneity of censorship within countries (Section 4.4).

A critical problem in the field of censorship measurement is ethics (Section 2.3). The development of methodologies that are both comprehensive and ethically responsible is challenging, as these goals frequently conflict. By its nature, measuring censorship involves interacting with content that an authority has deemed objectionable; exploring the scope and scale of that content potentially creates risk for those involved in the measurement. Both Augur (Section 3.2) and Iris (Section 4.2) include approaches for reasoning about risk, as well as for reducing potential harm via the use of network infrastructure for indirect censorship measurement.

Part II: Understanding Advertising Abuse

Cybercrime has evolved into a complex global ecosystem of criminal actors performing attacks ranging from ransomware to denial-of-service to click fraud. The motivation for these attacks is economic—criminals carry out these attacks in order to monetize users at a global scale [95, 115, 135]. This economic force means criminals focus their efforts on maximizing profit rather than relying on specific technical mechanisms [136]. Past work has shown that solutions which focus on the financial and relational aspects of cybercrime have the potential to be more effective than incremental technological defenses [95].

One such form of cybercrime, advertising abuse, is a prevalent and lucrative attack that exploits the wealth of the online advertising ecosystem while masking criminal identity and activities through the ecosystem's byzantine structure [115, 135]. The scale and structure of these attacks tilt the conflict in favor of the criminals, impacting users on a global scale [115]. Part II of this work systematically examines advertising abuse both from an external perspective and internally through collaborations with industry partners, across multiple monetization strategies and attacks. Using a range of methods and data sources—from command-and-control infiltration, to direct measurement, to industry data—we uncover the scale, structure, and nature of advertising fraud attacks which, by their nature, are designed to be difficult to distinguish from legitimate activity. With this understanding, our work identifies fundamental structural weak-points leverageable for defense, resulting in dismantling botnets, cleaning up ad networks, and protecting users.

We begin exploring advertising abuse with an execution-driven study of click fraud malware and the supporting ad ecosystem (Chapter 7) [105]. By iteratively executing malware in isolation with controlled network access, we were able to build tools for automated command-and-control interaction ("milkers"). These tools allowed us to explore the ad abuse ecosystem at a scale not possible with traditional malware execution. This exploration uncovered the breadth and scale of the ad fraud ecosystem, as well as the fundamentals of the business model. Despite this new understanding of the ecosystem, our view was still limited to an external ad placement perspective.

To obtain a view of the ad fraud ecosystem at both a larger scale and from an internal perspective, we next perform an in-depth exploration of ZeroAccess (ZA) in Chapter 8. ZA was a vast and complex peer-to-peer (P2P) botnet, serving as a delivery platform for advertising abuse malware for more than four years. At its peak, ZA infected more than 1.9 million systems, resulting in millions of dollars in advertising fraud per month.

Continuing to explore the advertising fraud ecosystem, we then develop an in-depth technical analysis (Chapter 9) of the structure and function of the ZA malware [118]. This work led to collaboration with Microsoft's Digital Crimes Unit and law enforcement, with the technical analysis serving as Exhibit 1 in legal action against the criminal actors [104].

Chapters 7, 8, and 9 provide an external view of the structure and function of large-scale advertising abuse. In order to have a qualitatively different, insider view of ad fraud, in Chapter 10 we combine an array of data sources, including P2P measurements and command-and-control telemetry from botnet infiltration, along with click information from a top ad-network industry partner.
This work illuminated the rich, intertwined nature of malware-driven click fraud and the advertising ecosystem it exploited, as well as how to develop effective remediations [115]. Using this multifaceted approach, we identify fraudulent business relationships within the advertising network stemming from complex multi-hop ad reseller chains, which were used as a focal point for remediation. We also quantified the financial impact of ZA's criminal activity.

A key result from Chapters 9 and 10 is the complexity of multi-hop ad reseller chains, and how those chains can be used to mask and "launder" fraud. Chapter 11 explores this issue specifically, focusing on a different facet of the advertising abuse ecosystem—"ad injection"—which affected tens of millions of users [135]. Similar to malware-driven click fraud, ad injection generates revenue through a complex and intertwined ecosystem of intermediaries with opaque business relationships used to launder ad views and clicks. But in the case of ad injection, software that is likely unwanted or has misrepresented its purpose injects ads into the browsing experiences of actual users. Via a collaboration with Google, Chapter 11 explores the structure and composition of traffic intermediaries and advertisers that served as the revenue source feeding the injection ecosystem. Our work enabled remediation for millions of users by identifying structural "choke points" of three ad networks and 25 affiliate programs that were responsible for the majority of ad injections.

Part I

Global Censorship Measurement

Chapter 2

Censorship Introduction, Related Work and Ethics

2.1 Introduction

News reports, anecdotes, and policy briefings collectively suggest that Internet censorship—the blocking and manipulation of content deemed to be objectionable by controlling entities—is pervasive [55]. Despite this prevalence, and despite the continuous evolution of how and where censorship is applied, measurements of censorship remain comparatively sparse. The scale and diversity of Internet censorship practices make it difficult to precisely monitor where, when, and how censorship occurs, as well as what is censored. The potential risks in performing the measurements make this problem even more challenging. As a result, many accounts of censorship begin—and end—with anecdotes or short-term studies from only a handful of vantage points.

Understanding the scope, scale, and evolution of Internet censorship requires global measurements, performed at regular intervals. Unfortunately, the state of the art relies on techniques that, by and large, require users to directly participate in gathering these measurements [71, 124, 134], drastically limiting their coverage and inhibiting regular data collection. These approaches remain difficult to deploy in practice: for example, some countries might not have globally available VPN exits within them that some techniques rely upon [71], or may have censors that block the network access required for the measurements [134]. Because such approaches and tools rely on people to install software and perform measurements, it is unlikely that they can ever achieve the scale required to gather continuous and diverse measurements about Internet censorship.

Another approach is to opportunistically leverage a network presence in a given country using browser-based active measurement of potential censorship [128]. This method can have difficulties in obtaining fully global views, though, because it is driven by end-user browsing choices. Due to its potential for implicating end users in attempting to access prohibited Internet sites, it can only be used broadly to measure reachability to sites that would pose minimal additional risk to users, which limits its utility for measuring reachability to a broad range of sites.

Part I seeks to instead develop methods and tools to continuously monitor information about Internet reachability, remotely, without the need for volunteers or user participation (willing or unwilling). Such methods have the capability to capture the onset or termination of censorship across regions and ISPs, as well as identify differences in censorship behaviors across both time and space.

To achieve these goals we first introduce Augur (Chapter 3) [116, 117], a method and accompanying system that utilizes IP spoofing and TCP/IP side channels to measure reachability between two Internet locations without directly controlling a measurement vantage point at either location. Using these side channels, coupled with techniques to ensure safety by not implicating individual users, we develop scalable, statistically robust methods to infer network-layer filtering, and implement a corresponding system capable of performing continuous monitoring of global censorship. We validate our measurements of Internet-wide disruption in 179 countries (and territories) over 17 days against sites known to be frequently blocked; we also identify the countries where connectivity disruption is most prevalent.
Next, we develop Iris (Chapter 4) [119, 120], a scalable and accurate method to measure global manipulation of DNS resolutions using open infrastructure DNS resolvers. Iris facilitates large-scale measurements that can expand our understanding of censorship beyond Augur's IP-level view. Iris reveals widespread DNS manipulation of many domain names; our findings both confirm anecdotal reports and results from previous work, and reveal new patterns in DNS manipulation.

2.2 Related Work

Previous work spans several related areas. We begin by discussing previous research that has performed pointwise studies of censorship in specific countries, as well as tools that researchers have developed to facilitate global censorship measurement studies. Next, we discuss previous studies that have highlighted the variability and volatility of censorship measurements over time and across regions, which motivates our work. Finally, we conclude with a discussion of closely related work on connectivity measurements using side channels, DNS manipulation, and open resolvers.

Country-specific censorship studies. In recent years many researchers have investigated the whats, hows, and whys of censorship in particular countries. These studies often span a short period of time and reflect a single vantage point within a target country, such as by renting virtual private servers. For example, studies have specifically focused on censorship practices in China [7, 31, 151, 154], Iran [11], Pakistan [82, 108], Syria [23], and Egypt [12]. Studies have also explored the employment of various censorship methods, e.g., country-wide Internet outages [35], injection of fake DNS replies [9, 97], blocking of TCP/IP connections [148], application-level blocking [36, 76, 114], and traffic throttling [4]. A number of studies suggest that countries sometimes change their blocking policies and methods in times surrounding political events. For example, Freedom House reports 15 instances of Internet shutdowns—where the government cut off access to the Internet entirely—in 2016 alone [55]. Most of these were apparently intended to prevent citizens from spreading prohibited information.

In general, studies involving direct measurements can shed more light on specific mechanisms that a censor might employ. By contrast, the techniques we develop rely on indirect side channels and open DNS resolvers, which limits the types of measurements that we can perform. On the other hand, our approach permits a much larger scale than any of these previous studies, as well as the ability to conduct measurements continuously. Although these studies provide valuable insights, their scale often involves a single vantage point for a limited amount of time (typically no more than a few weeks). Our aim is to shed light on a much broader array of Internet vantage points, continuously over time.

Global censorship measurement tools. Several research efforts have developed platforms to measure censorship by running experiments from diverse vantage points. For instance, CensMon [131] used PlanetLab nodes in different countries, and UBICA [2] aimed to increase vantage points by running censorship measurement software on home gateway devices and user desktops. In practice, as far as we know, neither of these frameworks is still deployed and collecting data. The OpenNet Initiative [112] has used its public profile to recruit volunteers around the world who have performed one-off measurements from home networks each year for the past ten years. OONI [134] and ICLab [71], two ongoing data collection projects, use volunteers to run both custom software and custom embedded devices (such as Raspberry Pis [53]).

Although each of these frameworks can perform an extensive set of tests, they rely on volunteers who run measurement software on their Internet-connected devices. This reliance on human involvement makes it challenging—if not impossible—to gather continuous and diverse measurements.

Studies that highlight the temporal and spatial variability of connectivity disruptions. If patterns of censorship and connectivity disruptions were relatively static, then existing one-off measurement studies would suffice to build up, over time, a global picture of conditions. Previous work, however, has demonstrated that censorship practices vary across time; across different applications; and across regions and Internet service providers, even within a single country. For example, previous research found that governments target a variety of services such as video portals (e.g., YouTube) [140], news sites (e.g., bbc.com) [13], and blogs (e.g., livejournal.com) [5]. Censors also target circumvention and anonymity tools; most famously, the Great Firewall of China has engaged in a decade-long cat-and-mouse game with Tor [49, 147, 148]. Ensafi et al. [46] showed that China's Great Firewall (GFW) actively probes—and blocks upon confirmation—servers suspected to abet circumvention. Many studies show that different countries employ different censorship mechanisms beyond IP address blocking to censor similar content or applications, such as Tor [139]. Occasionally, countries also deploy new censorship technology shortly before significant political events. For example, Aryan et al. [11] studied censorship in Iran before and after the June 2013 presidential election. Although these studies provide important data points, each reflects a snapshot at a single point in time and thus cannot capture ongoing trends and variations in censorship practices.

Measuring connectivity disruptions with side channels. Previous work has employed side channels to infer network properties such as topology, traffic usage, or firewall rules between two remote hosts [24, 48, 49, 123, 153]. Some of these techniques rely on the fact that the IP identifier (IP ID) field can reveal network interfaces that belong to the same Internet router, the number of packets that a device generates [24], or the blocking direction of mail server ports for anti-spam purposes [123]. The SYN backlog also provides another signal that helps with the discovery of machines behind firewalls [48, 153]. Ensafi et al. [47] observed that combining information from the TCP SYN backlog (which initiates retransmissions of SYN ACK packets) with IP ID changes can reveal packet loss between two remote hosts, including the direction along the path where packet drops occurred; the authors demonstrated the utility of their technique by measuring the reachability of Tor relays from China [49]. Our work builds on this technique by developing robust statistical detection methods to disambiguate connectivity disruptions from other effects that induce signals in these side channels.

Measuring DNS manipulation. The DNS protocol's lack of authentication and integrity checking makes it a prime target for attacks. Jones et al. presented techniques for detecting unauthorized DNS root servers, though found little such manipulation in practice [75]. Jiang et al. identified a vulnerability in DNS cache update policies that allows malicious domains to stay in the cache even if removed from the zone file [73].

Several projects have explored DNS manipulation using a limited number of vantage points. Weaver et al. explored DNS manipulation with respect to DNS redirection for advertisement purposes [144]. The authors also observed incidents in which DNS resolvers redirected end hosts to malware download pages. There are many country-specific studies that show how different countries use a variety of DNS manipulation techniques to exercise Internet censorship. For example, in Iran the government expects ISPs to configure their DNS resolvers to redirect contentious domains to a censorship page [11]. In Pakistan, ISPs return NXDOMAIN responses [108]. In China, the Great Firewall injects forged DNS packets with seemingly arbitrary IP addresses [9]. These studies, however, all drew upon a small or geographically limited set of vantage points, and for short periods of time.

Using open resolvers. A number of studies have explored DNS manipulation at a larger scale by probing the IPv4 address space to find open resolvers. In 2008, Dagon et al. found corrupt DNS resolvers by running measurements using 200,000 open resolvers [34]; they did not analyze the results for potential censorship. A similar scan by anonymous authors [8] in 2012 showed evidence of Chinese DNS censorship affecting non-Chinese systems.

Follow-on work in 2015 by Kührer et al. tackled a much larger scope: billions of lookups for 155 domain names by millions of open resolvers [93]. The study examined a broad range of potentially tampered results, which in addition to censorship included malware, phishing, domain parking, ad injection, captive portals, search redirection, and email delivery. They detected DNS manipulation by comparing DNS responses from open resolvers with ground truth resolutions gathered by querying control resolvers. They then identified legitimate unmanipulated answers using a number of heuristic filtering stages, such as treating a differing response as legitimate if its returned IP address lies within the same AS as the ground truth IP address. We initially sought to use their method for conducting global measurements specifically for detecting censorship. However, censorship detection was not a focus of their work, and the paper does not explicitly describe the details of its detection process. In particular, other than examining HTTP pages for "blocked by the order of . . . " phrasing, the paper does not present a decision process for determining whether a given instance of apparent manipulation reflects censorship or some other phenomenon. In addition, their measurements leverage open resolvers en masse, which raises ethical concerns for end users who may be wrongly implicated for attempting to access banned content. In contrast, we frame a detailed, reproducible method for globally measuring DNS-based manipulation in an ethically responsible manner.

In 2016, Scott et al. introduced Satellite [130], a system which leverages open resolvers to identify CDN deployments and network interference using collected resolutions. Given a bipartite graph linking domains queried with IP address answers collected from the open resolvers, Satellite identifies strongly connected components, which represent domains hosted by the same servers. Using metrics for domain similarity based on the overlap in IP addresses observed for two domains, Satellite distinguishes CDNs from network interference as components with highly similar domains (additionally, other heuristics help refine this classification).

2.3 Ethics

The design of our censorship measurement methods incorporates many considerations regarding ethics. Our primary ethical concern is the risks associated with the measurements that Augur and Iris conduct, as leveraging infrastructure we do not control potentially implicates otherwise innocent users. A second concern is whether the measurements we generate introduce undue query load on various pieces of Internet infrastructure. With these concerns in mind, we consider the ethics of performing our measurements using the ethical guidelines of the Belmont Report [14] and Menlo Report [41] to frame our discussion. Additional information on the specific application of these guidelines to each technique can be found in Section 3.2 for Augur and Section 4.2 for Iris.

One important ethical principle is respect for persons; essentially, this principle states that an experiment should respect the rights of humans as autonomous decision-makers. Sometimes this principle is misconstrued as a requirement for informed consent for all experiments. In many cases, however, informed consent is neither practical nor necessary; accordingly, Salganik [127] characterizes this principle instead as "some consent for most things". In the case of Augur and Iris, obtaining the consent of parties is impractical.

In lieu of attempting to obtain informed consent, we turn to the principle of beneficence, which weighs the benefits of conducting an experiment against the risks associated with the experiment. Note that the goal of beneficence is not to eliminate risk, but merely to reduce it, and then ensure the benefits outweigh the risks. The designs of Augur (Section 3.2) and Iris (Section 4.2) rely heavily on this principle.

The principle of justice states that the beneficiaries of an experiment should be the same population that bears the risk of that experiment. On this front, we envision that the beneficiaries of the kinds of measurements that we collect using Augur and Iris will be wide-ranging: designers of circumvention tools, as well as policymakers, researchers, and activists who are improving communications and connectivity for citizens in oppressive regimes, all need better data about the extent and scope of Internet censorship.

A final guideline concerns respect for law and public interest, which essentially extends the principle of beneficence to all relevant stakeholders, not only the experiment participants. This principle is useful for reasoning about the externalities that our measurements create by increasing load on Internet infrastructure for various DNS domains and IP addresses. To abide by this principle, we rate-limit our measurements to ensure that the owners of these domains and IPs do not face large expenses as a result of the queries that we issue.

Chapter 3

Augur: Internet-Wide Detection of Connectivity Disruptions

3.1 Introduction

Advances in TCP/IP side-channel measurement techniques offer a new paradigm for obtaining global-scale visibility into Internet connectivity. Ensafi et al. developed Hybrid-Idle Scan, a method whereby a third vantage point can determine the state of network-layer reachability between two other endpoints [47]. This allows an off-path measurement system to infer whether two remote systems can communicate with one another, regardless of where these two remote systems are located. To perform these measurements, the off-path system must be able to spoof packets (i.e., it must reside in a network that does not perform egress filtering), and one of the two endpoints must use a single shared counter for generating the IP identifier value for packets that it generates. This technique provides the possibility of measuring network-layer reachability around the world by locating endpoints within each country that use a shared IP ID counter. By measuring the progression of this counter over time, as well as whether our attempts to perturb it from other locations on the Internet succeed, we can determine the reachability status between pairs of Internet endpoints. This technique makes it possible to conduct measurements continuously, across a large number of vantage points.

Despite the conceptual appeal of this approach, realizing the method poses many challenges. One challenge concerns ethics: using this method can make it appear as though a user in some country is attempting to communicate with a potentially censored destination, which could imperil users. To abide by the ethical guidelines set out by the Menlo [41] and Belmont [14] reports, we exercise great care to ensure that we perform our measurements from Internet infrastructure (e.g., routers, middleboxes), as opposed to user machines. A second challenge concerns statistical robustness in the face of unrelated network activity that could interfere with the measurements, as well as other systematic errors concerning the behavior of TCP/IP side channels that sometimes only become apparent at scale.

To address these challenges we introduce Augur. To perform detection in the face of uncertainty, we model the IP ID increment over a time interval as a random variable that we can condition on two different priors: with and without responses to our attempts to perturb the counter from another remote Internet endpoint. Given these two distributions, we can then apply statistical hypothesis testing based on maximum likelihood ratios.

We validate our Augur measurements of Internet-wide disruption in nearly 180 countries (and dependent territories) over 17 days against both block lists from other organizations and known IP addresses for Tor bridges. We find that our results are consistent with the expected filtering behavior from these sites. We also identify the top countries that experience connectivity disruption; our results highlight many of the world's most infamous censors.

We begin in Section 3.2 with an overview of our method. We present Augur in Section 3.3, introducing the principles behind using IP ID side channels for third-party measurement of censorship; discussing how to identify remote systems that enable us to conduct our measurements in an ethically responsible manner; and delving into the extensive considerations required for robust inference. In Section 3.4, we present a concrete implementation of Augur.
In Section 3.5, we validate Augur's accuracy and provide an accompanying analysis of global censorship practices observed during our measurement run. We offer thoughts related to further developing our approach in Section 3.6 and summarize in Section 3.7. This chapter is based on work that appeared in the IEEE Symposium on Security and Privacy (S&P) [116] and IEEE Security & Privacy Magazine [117].

3.2 Method Overview

In this section, we provide an overview of the measurement method that we developed to detect filtering. We frame the design goals that we aim to achieve and the core technique underlying our approach. Then in Section 3.3 we provide a detailed explanation of the system’s operations.

Design Goals

We first present a high-level overview of the strategy underlying our method, which we base on inducing and observing potential increments in an Internet host's IP ID field. The technique relies on causing one host on the Internet to send traffic to another (potentially blocked) Internet destination; thus, we also consider the ethics of the approach. Finally, we discuss the details of the method, including how we select the specific Internet endpoints used to conduct the measurements. Ultimately, the measurement system that we design should achieve the following properties:

• Scalable. Because filtering can vary across regions or ISPs within a single country, the system must be able to assess the state of filtering from a large number of vantage points. Filtering will also vary across different destinations, so the system must also be able to measure filtering to many potential endpoints.

• Efficient. Because filtering practices change over time, establishing regular baseline measurements is important to expose transient, short-term changes in filtering practices, such as those that might occur around political events.

Figure 3.1: Overview of the basic method of probing and perturbing the IP ID side channel to identify filtering; the three panels show the no-direction-blocked, inbound-blocking, and outbound-blocking cases. Reflectors are hosts on the Internet with a global IP ID. Sites are potentially filtered hosts that respond to SYN packets on port 80. (In the right-hand figure, we omit subsequent measuring of the reflector's IP ID by the measurement machine at time t6.) Spoofed SYN packets have a source field set to the reflector.

• Sound. The technique should avoid false positives and ensure that repeated measurements of the same phenomenon produce the same outcome.

• Ethical. The system design must satisfy the ethical principles from the Belmont [14] and Menlo [41] Reports: respect for people, beneficence, justice, and respect for law and public interest.

We present a brief overview of the scanning method before explaining how the approach satisfies the design goals above.

Approach

The strategy behind our method is to leverage the fact that when an Internet host generates and sends IP packets, each generated packet contains a 16-bit IP identifier ("IP ID") value that is intended to assist endpoints in re-assembling fragmented IPv4 packets. Although path MTU discovery now largely obviates the need for IP fragmentation, senders still generate packets with IP ID values. There are only $2^{16}$ unique IP ID values, but the intent is that subsequent packets from the same host should have different IP ID values.

When an Internet host generates a packet, it must determine an IP ID to use for that packet. Although different hosts on the Internet use a variety of mechanisms to determine the IP ID for each packet (e.g., random, counter-based increment per-connection or per-interface), many hosts use a single global counter to increment the IP ID value for all packets that originate from that host, regardless of whether the packets it generates bear a relationship to one another. In these cases where the host uses a single IP ID counter, the value of the counter at any time reflects how many packets the host has generated. Thus, the ability to observe this counter over time gives an indication of whether a host is generating IP packets, and how many. The basic method involves two mechanisms:

• Probing: A mechanism to observe the IP ID value of a host at any time.

• Perturbation: A mechanism to send traffic to that same host from different Internet destinations, which has the property of inducing the initial host to respond, thus incrementing its IP ID counter.

We now describe the basic design for probing and perturbation, in the absence of various complicating factors such as cross-traffic or packet loss. Figure 3.1 illustrates the process.

To probe the IP ID value of some host over time, a measurement machine sends unsolicited TCP SYN-ACK packets to the host and monitors the responses—TCP RST packets—to track the evolution of the host's IP ID. We monitor the IP ID values at the host on one end of the path. We call this host the reflector, to denote that the host reflects RST packets from both our measurement machine and the endpoint that a censor may be trying to filter. This reflector is a machine in a network that may experience IP filtering. We call the other endpoint of this connection the site, as for our purposes we will commonly use for it a website operating on port 80.

To perturb the IP ID values on either end of the path, a measurement machine sends a TCP SYN packet to one host, the site; the TCP SYN packet carries the (spoofed) source IP address of a second machine, the reflector. We term this injection. If no filtering is taking place, the SYN packet from the measurement machine to the site will elicit a SYN-ACK from the site to the reflector, which will in turn elicit a RST from the reflector to the site (since the reflector had not previously sent a TCP SYN packet for this connection). When the reflector sends a RST packet to the site, it uses a new IP ID. If the reflector generates IP ID values for packets based on a single counter, the measurement machine can observe whether the reflector generated a RST packet with subsequent probes, because the IP ID counter will have incremented by two (one for the RST to the site, one for the RST to our measurement machine). Figure 3.1 shows this process in the "no direction blocked" scenario.

Suppose that filtering takes place on the path between the site and the reflector (i.e., one of the other two cases shown in Figure 3.1). We term blocking that manifests on the path from the site to the reflector as inbound blocking. In the case of inbound blocking, the site's SYN-ACK packet will not reach the reflector, thus preventing the expected IP ID increment at the reflector. In the absence of other traffic, the IP ID counter will increment by one. We show this in the second section of Figure 3.1. Conversely, we call blocking on the path from the reflector to the site outbound blocking; in the case of outbound blocking, SYN-ACK packets from the site reach the reflector, but the RST packets from the reflector to the site never reach the site. At this point, the site should continue to retransmit SYN-ACK packets [138], inducing further increments in the IP ID value at the reflector at various intervals, though whether and how it actually does so depends on the configuration and specifics of the site's operating system. The final section of Figure 3.1 shows the retransmission of SYN-ACK packets and the increment of the global IP ID at two different times.

If our measurements reveal a site as inbound-blocked, filtering may actually be bidirectional. We cannot differentiate between the two using this technique because there is no way to remotely induce the reflector to send packets to the site.
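To make the probing and perturbation mechanics concrete, the sketch below walks through a single idealized round using Scapy. The reflector and site addresses are hypothetical placeholders, the round ignores noise and loss, and spoofing requires root privileges on a network without egress filtering; this illustrates the idea rather than reproducing the Augur implementation, which adds the repeated trials and statistical testing described later in this chapter.

```python
# Minimal sketch of one probe / perturb / probe round (cf. Figure 3.1).
# Addresses are hypothetical; requires root privileges and a network that
# permits source-address spoofing. Illustrative only, not the Augur system.
import time
from scapy.all import IP, TCP, send, sr1

REFLECTOR = "192.0.2.10"    # hypothetical reflector (infrastructure host)
SITE = "198.51.100.20"      # hypothetical, potentially filtered site

def probe_ipid(reflector: str) -> int:
    """Elicit an RST with an unsolicited SYN-ACK and return its IP ID."""
    resp = sr1(IP(dst=reflector) / TCP(sport=40000, dport=80, flags="SA"),
               timeout=2, verbose=False)
    if resp is None or not resp.haslayer(TCP):
        raise RuntimeError("reflector did not answer with an RST")
    return resp[IP].id

ipid_before = probe_ipid(REFLECTOR)

# Injection: a SYN toward the site, spoofed so that it appears to come from
# the reflector; any SYN-ACK the site sends back goes to the reflector.
send(IP(src=REFLECTOR, dst=SITE) / TCP(dport=80, flags="S"), verbose=False)
time.sleep(2)   # give the SYN-ACK and the reflector's RST time to occur

ipid_after = probe_ipid(REFLECTOR)
delta = (ipid_after - ipid_before) % 2**16   # handle 16-bit wraparound

# Idealized, noise-free reading of the increment between the two probes:
#   delta == 1 -> only our own second probe was answered; the site's SYN-ACK
#                 never reached the reflector (consistent with inbound blocking)
#   delta == 2 -> the reflector also sent an RST to the site (no blocking)
# Outbound blocking appears as further increments in later probes, driven by
# the site's SYN-ACK retransmissions, so it requires continued observation.
print(f"IP ID increment across the injection: {delta}")
```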
Ethics

The measurement method we develop generates spoofed traffic between the reflector and the site, which might cause an inexperienced observer of these measurements to (wrongly) conclude that the person who operates or owns the reflector was willfully accessing the site. The risks of this type of activity are unknown, but are likely to vary by country. Although the spoofed nature of the traffic is similar to common large-scale denial-of-service backscatter [107] and results in no data packets being exchanged between reflector and site, we nonetheless use extreme caution when selecting each reflector.

In this type of measurement, we must first consider respect for humans, by limiting the potential harm to any person as a result of this experiment. One mechanism for demonstrating respect for humans is to obtain informed consent; unfortunately, obtaining informed consent is difficult, due to the scope, scale, and expanse of the infrastructure that we employ. Salganik explains that the inability to obtain informed consent does not by itself reflect a disregard of respect for humans [127]. Rather, we must take other appropriate measures to ensure that we are abiding by the ethical principles from the Belmont [14] and Menlo [41] reports. To do so, we develop a method that reduces the likelihood that we are directly involving any humans in our experiments in the first place, by focusing our measurements on infrastructure.

Specifically, our method works to limit the endpoints that we use as reflectors to likely Internet infrastructure (e.g., routers in the access or transit networks, middleboxes), as opposed to hosts that belong to individual citizens (e.g., laptops, desktops, home routers, consumer devices). To do so, we use the CAIDA Ark dataset [22], which contains traceroute measurements to all routed /24 networks. We include a reflector in our experiments only if it appears in an Ark traceroute at least two hops away from the traceroute endpoint. The Ark dataset is not comprehensive, as the traceroute measurements are conducted to a randomly selected IP address in each /24 prefix. Restricting the set of infrastructure devices to those that appear in Ark restricts the IP addresses we might be able to discover with a more comprehensive scan. Although this approach increases the likelihood that the reflector IP addresses are routers or middleboxes as opposed to endpoints, the method is not fool-proof. For example, devices that are attributable to individuals might still be two hops from the network edge, or a network operator might be held accountable for the perceived actions performed by the machines. Our techniques do not eliminate risk. Rather, in accordance with the ethical guideline of beneficence, they reduce it to the point where the benefits of collecting these measurements may outweigh the risks of collecting them. In keeping with Salganik's recommendations [127], we aim to conduct measurements that pose a minimal additional risk, given both the nature of the spoofed packets and the potential benefits of the research.

The Internet-wide scans we conduct using ZMap [44] to detect possible reflectors introduce concerns related to respect for law and public interest.
Part of respecting law and public interest is reducing the network load we induce on reflectors and sites, to the extent possible, as unnecessary network load could drive costs higher for the operators of reflectors and sites; if excessive, the probing traffic could also impede network performance. To mitigate these possible effects, we follow the approach for ethical scanning behavior as outlined by Durumeric et al. [44]: we signal the benign intent of our scans in the WHOIS entries and DNS records for our scanning IPs, and provide project details on a website hosted on each scanning machine. We extensively tested our scanning methods prior to their deployment; we also respect opt-out requests.

The measurement probes and perturbations raise similar concerns pertaining to respect for law and public interest. We defer the details of the measurement approach to Section 3.3, but note that reflectors and sites receive an average of one packet per second, with a maximum rate of ten SYN packets in a one-second interval. This load should be reasonable, given that reflectors represent Internet infrastructure that should be able to sustain modest traffic rates directed to them, and sites are major websites that see much higher traffic rates than those we are sending. To ensure that our TCP connection attempts do not use excessive resources on sites or reflectors, we promptly reset any half-open TCP connections that we establish.

The ethical principle of justice states that the parties bearing the risk should be the same as those reaping the benefits; the parties who would bear the risk (users in the countries where censorship is taking place) may ultimately reap some benefit from the knowledge about filtering that our tools provide, through improved circumvention tools and better information about what is blocked.
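As a rough illustration of the infrastructure-focused reflector selection described above, the sketch below keeps only traceroute hops that sit at least two hops before the traceroute endpoint. The list-of-hops input format and the function name are assumptions made for illustration; the real pipeline works from the CAIDA Ark data directly and applies the further qualification steps described in Section 3.4.

```python
# Sketch of the "at least two hops from the traceroute endpoint" filter,
# assuming each traceroute has already been parsed into an ordered list of
# hop IPs ending at the probed destination (a simplification of the actual
# CAIDA Ark data format).
from typing import Iterable, List, Set

def candidate_reflectors(traceroutes: Iterable[List[str]],
                         min_hops_from_end: int = 2) -> Set[str]:
    candidates: Set[str] = set()
    for hops in traceroutes:
        if len(hops) <= min_hops_from_end:
            continue
        # Keep hops at least `min_hops_from_end` positions before the final
        # hop, i.e. more likely transit/access infrastructure than the probed
        # end host or its immediate gateway.
        candidates.update(hops[:-min_hops_from_end])
    return candidates

# Example with made-up paths:
paths = [
    ["203.0.113.1", "203.0.113.9", "198.51.100.3", "192.0.2.77"],
    ["203.0.113.1", "198.51.100.14", "192.0.2.80"],
]
print(sorted(candidate_reflectors(paths)))
```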

3.3 Putting the Method to Practice

In this section, we present our approach for identifying reflectors and sites, and then develop in detail how we perform the measurements described in Section 3.2.

Reflector Requirements

Suitable reflectors must satisfy four requirements:

1. Infrastructure machine. To satisfy the ethical guidelines that we outlined in Section 3.2, the reflector should be Internet infrastructure, as opposed to a user machine.

2. RST packet generation. Reflectors must generate TCP RST packets when receiving SYN-ACKs for unestablished connections. The RST packets increment the reflector's IP ID counter while ensuring that the site terminates the connection.

3. Shared, monotonically incrementing IP ID. If a reflector uses a shared, monotonically incrementing per-machine counter to generate IP ID values for packets that it sends, the evolution of the IP ID value—which the measurement machine can observe—will reflect any communication between the reflector and any other Internet endpoints.

4. Measurable IP ID perturbations. Because the IP ID field is only 16 bits, the reflector must not generate so much traffic as to cause the counter value to frequently wrap around between successive measurement machine probes. The natural variations of the IP ID counter must also be small compared to the magnitude of the perturbations that we induce (a rough screening sketch follows this list).
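The following is a rough sketch of how requirements 2–4 might be screened for a single candidate reflector: probe it a few times and check that the returned RSTs show a small, steadily advancing IP ID. The address, probe count, and step threshold are illustrative assumptions, not the actual qualification procedure, which Section 3.4 describes.

```python
# Rough screening sketch for reflector requirements 2-4: the host answers
# unsolicited SYN-ACKs with RSTs, and those RSTs carry a shared IP ID counter
# that advances by small amounts. Thresholds are illustrative assumptions,
# not Augur's actual qualification procedure (Section 3.4). Requires root.
import time
from scapy.all import IP, TCP, sr1

def sample_ipids(reflector: str, n: int = 8, gap: float = 1.0) -> list:
    ids = []
    for i in range(n):
        resp = sr1(IP(dst=reflector) / TCP(sport=40000 + i, dport=80,
                                           flags="SA"),
                   timeout=2, verbose=False)
        if resp is None or not resp.haslayer(TCP):
            return []          # no RSTs: fails requirement 2
        ids.append(resp[IP].id)
        time.sleep(gap)
    return ids

def plausible_shared_counter(ids: list, max_step: int = 1000) -> bool:
    """Small, strictly positive steps suggest a shared counter with modest
    background traffic (requirements 3 and 4); random or constant IP IDs fail."""
    if len(ids) < 2:
        return False
    steps = [(b - a) % 2**16 for a, b in zip(ids, ids[1:])]
    return all(0 < s <= max_step for s in steps)

ids = sample_ipids("192.0.2.10")           # hypothetical candidate reflector
print(ids, plausible_shared_counter(ids))
```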

Section 3.4 describes how we identify reflectors that meet these requirements.

Site Requirements

Our method also requires that sites exhibit certain network properties, allowing for robust measurements at reflectors across the Internet. Unlike reflectors, site requirements are not absolute. In some circumstances, failure to meet a requirement requires discarding a result or limits the possible outcomes, but we can still use the site for some measurements.

1. SYN-ACK retransmission (SAR). SYN-ACK retries by sites can signal outbound blocking due to a reflector's RST packets not reaching the site. If a site does not retransmit SYN-ACKs, we can still detect inbound blocking, but we cannot distinguish instances of outbound blocking from cases where there is no blocking (a rough check for this behavior is sketched at the end of this subsection).

2. No anycast. If a site’s IP address is anycast, the measurement machine and reflector may be communicating with different sites; in this case, RSTs from the reflector will not reach the site that our measurement machine communicates with, which would result in successive SYN-ACK retransmissions from the site and thus falsely indicate outbound blocking.

3. No ingress filtering. If a site’s network performs ingress filtering, spoofed SYN packets from the measurement machine may be filtered if they arrive from an unexpected ingress, falsely indicating inbound blocking.

4. No stateful firewalls or network-specific blocking. If a site host or its network deploys a distributed stateful firewall, the measurement machine’s SYN packet may establish state at a different firewall than the one encountered by a reflector’s RSTs, thus causing the firewall to drop the RSTs. This effect would falsely indicate outbound blocking. Additionally, if a site or its firewall drops traffic from some IP address ranges but not others (e.g., from non-local reflectors), the measurement machine may falsely detect blocking.

Section 3.4 describes how we identify sites that satisfy these requirements.

Detecting Disruptions

As discussed in Section 3.2, we detect connectivity disruptions by perturbing the IP ID counter at the reflector and observing how this value evolves with and without our perturbation.

Approach: Statistical detection. We measure the natural evolution of a reflector’s counter periodically in the absence of perturbation, as a control that we can compare against the evolution of the IP ID under perturbation. We then perturb the IP ID counter by injecting SYN packets and subsequently measure the evolution of this counter. We take care not to involve any site or reflector in multiple simultaneous measurements, since doing so could conflate two distinct results. Ultimately, we are interested in detecting whether the IP ID evolution for a reflector changes as a result of the perturbations we introduce. We can represent this question as a classical problem in statistical detection, which attempts to detect the presence or absence of a prior (i.e., perturbation or no perturbation) based on the separation of the distributions under different values of the prior. In designing this detection method, we must determine the random variable whose distribution we wish to measure, as well as the specific detection approach that allows us to distinguish the two values of the prior with confidence. We choose IP ID acceleration (i.e., the second derivative of the IP ID between successive measurements) because ideally this value has a zero mean, regardless of reflector. With a zero mean, the random variable should be stationary and its distribution should be similar across reflectors. Conceptually, a reflector at a random time is as likely to experience traffic “picking up” as not. However, subtle Internet complexities such as TCP slow start bias this measure slightly. We discuss empirical measures of these priors and their impact on our method in Section 3.4. In contrast, the first derivative (IP ID velocity) is not stationary. Additionally, each reflector would exhibit a different mean velocity value, requiring extensive per-reflector baseline measurements to capture velocity behavior.
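To make the notion of IP ID acceleration concrete, the following sketch (illustrative Python, not Augur's implementation) computes the per-interval velocity and acceleration of a reflector's IP ID counter from samples taken one second apart, unwrapping the 16-bit counter so that a wrap between probes is not mistaken for a decrease.

```python
def ipid_deltas(ipids):
    """First differences of a 16-bit IP ID sequence, taken modulo 2^16."""
    return [(b - a) % 65536 for a, b in zip(ipids, ipids[1:])]

def ipid_acceleration(ipids):
    """Second differences (changes in velocity) of the IP ID counter."""
    vel = ipid_deltas(ipids)
    return [v2 - v1 for v1, v2 in zip(vel, vel[1:])]

if __name__ == "__main__":
    # Example: a counter that speeds up mid-way (e.g., extra RSTs elicited by
    # injected traffic); note the 16-bit wrap between the 4th and 5th samples.
    samples = [64000, 64010, 64021, 65500, 200, 260]
    print(ipid_deltas(samples))        # [10, 11, 1479, 236, 60]
    print(ipid_acceleration(samples))  # [1, 1468, -1243, -176]
```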

Detection framework: Sequential hypothesis testing (SHT). We use sequential hypothesis testing (SHT) [78] for the detection algorithm. SHT is a statistical framework that uses repeated trials and known outcome probabilities (priors) to distinguish between multiple hypotheses. The technique takes probabilities for each prior and tolerable false positive and negative rates as input, and performs repeated online trials until it can determine the value of the prior with the specified false positive and negative rates. SHT’s ability to perform online detection subject to tunable false positive/negative rates, and its tolerance to noise, make it well-suited to our detection task. Additionally, it is possible to compute an expectation for the number of trials required to produce a detection, thus enabling efficient measurement. We begin with the SHT formulation developed by Jung et al. [78], modifying it to accommodate our application. For this application to hold, the IP ID acceleration must be stationary (discussed more in Section 3.4), and the trials must be independent and identically distributed (i.i.d.). To achieve i.i.d. trials, we randomize our trial order and the mapping between sites and reflectors, and run experiments over the course of weeks. For a given site Si and reflector Rj, we perform a series of N trials, where we inject spoofed SYN packets to Si and observe IP ID perturbations at Rj. We let Yn(Si, Rj) be a random variable for the nth trial, indicating whether IP ID acceleration occurs during the measurement window following injection:

    Y_n(S_i, R_j) = \begin{cases} 0 & \text{if no IP ID acceleration occurs} \\ 1 & \text{if IP ID acceleration occurs} \end{cases}

We identify two hypotheses: H0 is the hypothesis that no inbound blocking is occurring (the second derivative of IP ID values between successive measurements should be observed to be positive, which we define as IP ID acceleration), and H1 is the hypothesis that blocking is occurring (no IP ID acceleration). Following constructions from previous work, we must identify the prior conditional probabilities of each hypothesis, specifically:

    \Pr[Y_n = 0 \mid H_0] = \theta_0, \qquad \Pr[Y_n = 1 \mid H_0] = 1 - \theta_0

    \Pr[Y_n = 0 \mid H_1] = \theta_1, \qquad \Pr[Y_n = 1 \mid H_1] = 1 - \theta_1

The prior θ1 is the probability of no observed IP ID acceleration in the case of inbound blocking. We can experimentally measure this prior as the probability of IP ID acceleration during our reflector control measurements, since the IP ID acceleration likelihood during control measurements is the same as during inbound blocking (as no additional packets reach the reflector in either case). Intuitively, we can think of this value as 0.5, given the earlier discussion of the second derivative having zero mean (i.e., in aggregate traffic, with no induced behavior, acceleration is as likely to occur as deceleration). The prior 1 − θ0 is the probability of observed IP ID acceleration during injection. It can be measured as the probability of IP ID acceleration during an injection period across all reflector injection measurements. Assuming no blockage and perfect reflectors with no other traffic, this value can be thought of as approaching 1. The prior can be estimated from all reflector measurements under the assumption that blocking is uncommon for a reflector. However, even if the assumption does not hold and blocking is common, the prior estimation is still conservative in that it drives the prior closer to θ1, making detection more difficult and increasing false negatives. From the construction above, we define a likelihood ratio Λ(Y), such that:

    \Lambda(Y) \equiv \frac{\Pr[Y \mid H_1]}{\Pr[Y \mid H_0]} = \prod_{n=1}^{N} \frac{\Pr[Y_n \mid H_1]}{\Pr[Y_n \mid H_0]}

where Y is the sequence of trials observed at any point. We derive an upper bound threshold η1 such that:

    \frac{\Pr[Y_1, \ldots, Y_N \mid H_1]}{\Pr[Y_1, \ldots, Y_N \mid H_0]} \geq \eta_1

and a similar lower bound threshold η0. Both η0 and η1 are bounded by functions of the tolerable probability of false positives and negatives. We elaborate on these bounds and the impact of false positives and negatives later in this section. Figure 3.2 illustrates our detection algorithm, which performs a series of sequential hypothesis tests; the rest of this section describes this construction in detail. The Inbound Blocking portion of Figure 3.2 shows how SHT uses this construction to make decisions; we subsequently extend it to include outbound blocking. As each trial is observed, we update the likelihood ratio function Λ(Y) based on the prior probabilities. Once updated, we compare the value of Λ(Y) against the thresholds η0 and η1. If Λ(Y) ≤ η0, we accept H1 and output Inbound or Bidirectional Blocking. If Λ(Y) ≥ η1, we accept H0, which is that IP ID acceleration occurred as a result of no inbound blocking. This does not give us a result, as we still must decide between outbound blocking and no blocking. To make this decision, we proceed to the second SHT phase, the “Outbound Test,” which is discussed subsequently. A third output of the system is that Λ(Y) did not meet either threshold. If there are more trials, we restart the algorithm. If we have exhausted our trials, we output the result that the blockage status of Si at Rj is undetermined.
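The sketch below shows the generic sequential (Wald) probability ratio test machinery that this construction builds on; it is not Augur's implementation. The Bernoulli parameters p0 and p1 and the threshold formulas are illustrative assumptions, since the text only states that the thresholds are bounded by the tolerable error rates; Augur instantiates the test with the empirically measured priors and orients its thresholds as in Figure 3.2.

```python
import math

def sprt(observations, p0, p1, alpha=1e-5, beta=1e-5):
    """Generic sequential probability ratio test between two Bernoulli
    hypotheses. p0 = Pr[obs = 1 | H0], p1 = Pr[obs = 1 | H1]; alpha and beta
    are the tolerable false positive / false negative probabilities."""
    upper = math.log((1 - beta) / alpha)   # crossing it accepts H1
    lower = math.log(beta / (1 - alpha))   # crossing it accepts H0
    llr = 0.0                              # running log likelihood ratio
    for y in observations:
        llr += math.log(p1 / p0) if y else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept_H1"
        if llr <= lower:
            return "accept_H0"
    return "undecided"                     # trials exhausted without a decision

# Example: H0 = injection reaches the site (acceleration very likely),
# H1 = inbound blocking (acceleration no more likely than in control).
print(sprt([0, 0, 0, 0, 0], p0=0.95, p1=0.45))   # -> "accept_H1"
```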


Figure 3.2: Flow chart of our algorithm to identify both inbound and outbound blocking using a series of sequential hypothesis tests. Detailed descriptions of the notation and terminology are given in Section 3.3.

Outbound blocking detection with SHT. Given IP ID acceleration at the reflector, we must distinguish outbound-only blocking from the absence of any blocking. To do so, we develop a key new insight that relies on a secondary IP ID acceleration that should occur due to subsequent SYN-ACK retries by the site. To determine a site’s eligibility for outbound blocking detection, we must identify whether it retries SYN-ACKs, and whether the retries have reliable timing. Section 3.4 discusses these criteria further. We abstract this behavior as a function SAR(Si) (for SYN-ACK Retry) that indicates whether a site is suitable for outbound blocking detection. We define Xn(Si, Rj) such that:

    X_n(S_i, R_j) = \begin{cases} 0 & \text{if no IP ID acceleration occurs during the SAR window} \\ 1 & \text{if IP ID acceleration occurs during the SAR window} \end{cases}

We now formulate two new hypotheses: K0, that outbound blocking is occurring (IP ID acceleration occurs during the SAR time window), and K1, that there is no connection blocking (IP ID acceleration does not occur during the SAR window). From this:

    \Pr[X_n = 0 \mid K_0] = \theta_0, \qquad \Pr[X_n = 1 \mid K_0] = 1 - \theta_0
    \Pr[X_n = 0 \mid K_1] = \theta_1, \qquad \Pr[X_n = 1 \mid K_1] = 1 - \theta_1

In this construction, 1 − θ0 is the measurable probability of observing IP ID acceleration during injection, and θ1 is the measurable prior probability of seeing no IP ID acceleration during the SAR window across all of the reflector’s measurements. Arguments similar to those above explain why these provide conservative estimates of the prior values. (We also discuss the measurable IP ID acceleration during the SAR window in Section 3.4.) Figure 3.2 shows how this construction is used to label the pair (Si, Rj) as either outbound-blocked or not blocked. If the thresholds are not met and there are no more trials, we output that we know Si is not inbound-blocked, but we do not know its outbound-block status.

Expected number of trials. The SHT construction from Jung et al. also provides a framework for calculating the expected number of trials needed to arrive at a decision for H0 and H1. The expected values are defined as:

    E[N \mid H_0] = \frac{\alpha \ln\frac{\beta}{\alpha} + (1-\alpha)\ln\frac{1-\beta}{1-\alpha}}{\theta_0 \ln\frac{\theta_1}{\theta_0} + (1-\theta_0)\ln\frac{1-\theta_1}{1-\theta_0}}, \qquad
    E[N \mid H_1] = \frac{\beta \ln\frac{\beta}{\alpha} + (1-\beta)\ln\frac{1-\beta}{1-\alpha}}{\theta_1 \ln\frac{\theta_1}{\theta_0} + (1-\theta_1)\ln\frac{1-\theta_1}{1-\theta_0}} \qquad (3.1)

where α and β are also bounded by functions of the tolerable false positive and negative rates, discussed subsequently. Similar constructions hold for K0 and K1. We investigate the expected number of trials for both inbound and outbound blocking further in Section 3.4.

False positives and negatives. Following the construction from Jung et al., α and β are both tunable parameters bounded by our tolerance for false positives and false negatives. PF is defined as the false positive probability, and PD as the detection probability; the complement of PD, 1 − PD, is the probability of false negatives. These values express the probability of a false result for a single SHT experiment. However, for our method we perform numerous SHT experiments across sites and reflectors. To account for these repeated trials, we set both PF and 1 − PD to 10^-5. Given that the expected number of trials to reach a decision increases as PF and 1 − PD decrease, our selection of a small value negatively impacts our ability to make decisions. This effect is somewhat mitigated by the distance between the experimentally observed priors, and is explored in more detail in Section 3.4 and Figure 3.4.
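As a small worked example of Equation 3.1, the helper below computes the expected number of trials for given priors. Note that, following the Jung et al. convention used in the equation, α corresponds to the tolerable false positive probability and β to the detection probability (one minus the false negative probability); the prior values in the example call are illustrative.

```python
import math

def expected_trials(theta0, theta1, p_fp=1e-5, p_fn=1e-5):
    """Expected number of SHT trials to accept H0 / H1 (Equation 3.1).
    theta0/theta1 are the priors; p_fp/p_fn the tolerable error rates."""
    alpha, beta = p_fp, 1.0 - p_fn   # Jung et al.: beta is the detection probability
    num_h0 = alpha * math.log(beta / alpha) + (1 - alpha) * math.log((1 - beta) / (1 - alpha))
    num_h1 = beta * math.log(beta / alpha) + (1 - beta) * math.log((1 - beta) / (1 - alpha))
    def denom(theta):
        return theta * math.log(theta1 / theta0) + (1 - theta) * math.log((1 - theta1) / (1 - theta0))
    return num_h0 / denom(theta0), num_h1 / denom(theta1)

# Illustrative priors: theta0 = Pr[no acceleration | no blocking],
# theta1 = Pr[no acceleration | blocking].
print(expected_trials(theta0=0.05, theta1=0.55))   # roughly (19.5, 11.7)
```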

3.4 Augur Implementation and Experiment Data

In this section, we discuss the deployment of our approach to measure connectivity disruptions across the Internet, as well as the setup that we use to validate the detection method from Section 3.3.

Selecting Reflectors and Sites

Reflector selection. To find reflectors that satisfy the criteria from Section 3.3, we created a new ZMap [44] probe module that sends SYN-ACK packets and looks for well-formed RST responses. Our module is now part of the open-source ZMap distribution. Using this module, we scan the entire IPv4 address space on port 80 to identify possible reflectors. We then perform a second set of probes against this list of candidate reflectors to identify a subset that conforms to the desired IP ID behavior. Our tool runs from the measurement machine and sends ten SYN-ACK packets to port 80 of each host precisely one second apart, recording the IP ID of each RST response. We identify reflectors whose IP ID behaviors satisfy the previously outlined requirements: no IP ID wrapping, variable accelerations observed (indicating our packets do induce perturbations in the IP ID dynamics), and a response to all probes. Because the measurement machine induces packet generation at the reflector at a constant rate, any additional IP ID acceleration must be due to traffic from other connections. We further ensure that the measurement machine receives a response for each probe packet that it sends, ensuring that the reflector is stable and reliable enough to support continuous measurements. This selection method identifies viable reflectors: those that are responsive and exhibit the desired IP ID behavior. We finally filter out viable reflectors that do not correspond to infrastructure, as described in Section 3.2, which significantly reduces the number of available reflectors, as described in Section 3.4.
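A sketch of the viability check applied to each candidate reflector after this second probing pass is shown below (illustrative Python, not the actual tooling): given the IP ID values recorded from the ten RST responses, it requires an answer to every probe, a counter that neither wraps nor stalls within the probe window, and some variation in acceleration. The strict-monotonicity and variation checks are simplifying assumptions of the sketch.

```python
def viable_reflector(ipids, expected_probes=10):
    """ipids: IP ID values from the RST responses to ten SYN-ACK probes
    sent one second apart."""
    if len(ipids) != expected_probes:               # must answer every probe
        return False
    deltas = [b - a for a, b in zip(ipids, ipids[1:])]
    if any(d <= 0 for d in deltas):                 # no wrap, always incrementing
        return False
    accels = {b - a for a, b in zip(deltas, deltas[1:])}
    return len(accels) > 1                          # counter shows natural variation

print(viable_reflector([100, 140, 181, 230, 261, 300, 355, 400, 431, 470]))  # True
```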

Site selection. We begin with a list of sites, some of which we expect to be disrupted by network filtering or censorship from a variety of vantage points. We seed our candidate sites with the Citizen Lab list of potentially censored URLs [29], which we call the CLBL. This list contains potentially blocked URLs, broken down by category.

Reflector Dataset     Total Reflectors   Num. Countries   Median / Country
All Viable                  22,680,577              234              1,667
Ethically Usable                53,130              179                 15
Experiment Sample                2,050              179                 15

Table 3.1: Summary of our reflector datasets. All viable reflectors are identified across the IPv4 address space. Those ethically usable are routers at least two hops away from traceroute endpoints in the Ark data, and we select a random subset as our experiment set. Note that the number of countries includes dependent territories.

Reflector Dataset     AF   AS   EU   NA   SA   OC   ME
All Viable            55   50   52   39   23   14   20
Ethically Usable      36   47   46   30   14    6   18
Experiment Sample     36   47   46   30   14    6   18

Table 3.2: The distribution of countries (and dependent territories) containing reflectors across continents. Note that the continent coverage of our experiment sample is identical to that of the ethically usable dataset, as we sampled at least one ethically usable reflector per country in that dataset. The continent labels are as follows: AF=Africa, AS=Asia, EU=Europe, NA=North America, SA=South America, OC=Oceania/Australia. We also label ME=Middle East, as a region with frequent censorship.

To further identify sensitive URLs, we use Khattak et al.’s dataset [81], which probed these URLs using the OONI [111] measurement platform looking for active censorship. After filtering the list, we distill the URLs down to domain names and resolve all domains to the corresponding IP addresses using the local recursive DNS resolver on a network in the United States. If a domain name resolves to more than one IP, we randomly select one A record from the answers. To augment this list of sites, we randomly select domains from the Alexa top 10,000 [3]. As with the CLBL, if a host resolves to multiple IPs, we select one at random. Section 3.4 provides a breakdown of the site population. Section 3.4 explains how we dynamically enforce site requirements.
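The site-preparation step above (distilling URLs to domains and resolving each to one IPv4 address, choosing a random A record when several are returned) can be sketched as follows. The use of the Python standard library resolver here is illustrative, not a statement about the authors' tooling.

```python
import random
import socket
from urllib.parse import urlparse

def url_to_site_ip(url):
    """Reduce a URL to its host name, resolve it with the local recursive
    resolver, and pick a random A record when several are returned."""
    host = urlparse(url).hostname or url      # tolerate bare domain names
    try:
        _, _, addresses = socket.gethostbyname_ex(host)
    except socket.gaierror:
        return None                           # unresolvable: drop the site
    return random.choice(addresses) if addresses else None

print(url_to_site_ip("https://example.com/blocked/page"))
```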

Measurement Dataset

In this section, we describe the characteristics of the dataset that we use for our experiments.

Reflector dataset. The geographic distribution of reflectors illuminates the degree to which we can investigate censorship or connectivity disruption within each country. Table 3.1 summarizes the geographic diversity of our reflector datasets. The Internet-wide ZMap scan found 140 million reachable hosts. Approximately 22.7 million of these demonstrated use of a shared, monotonically increasing IP ID. Using MaxMind [98] for country-level geolocation, these reflectors were geographically distributed across 234 countries and dependent territories1 around the world, with a median of 1,667 reflectors per country. This initial dataset provides a massive worldwide set of reflectors to potentially measure, yet many may be home routers, servers, or user machines that we cannot use for experimentation due to ethical considerations. Merging with the Ark data to ensure that the reflectors only contain network infrastructure reduces the 22.7 million potential reflectors to only about 53,000. Despite this significant reduction, the resulting dataset contains reflectors in 179 countries (and dependent territories), with a median of 15 reflectors per country. Table 3.2 gives a breakdown of reflector coverage by continent. We select a subset of these reflectors as our final experiment dataset, randomly choosing up to 16 reflectors in each of the 179 countries, yielding 1,947 reflectors (not all countries had 16 infrastructure reflectors). In addition to these reflectors, we added 103 high-reliability (stable, good priors) reflectors primarily from China and the US to ensure good coverage with a stable set of reflectors, resulting in 2,050 reflectors in the final dataset. These reflectors also exhibit widespread AS diversity, with the full set of viable reflectors representing 31,188 ASes. Using the Ark dataset to eliminate reflectors that are not infrastructure endpoints reduces this set to 4,214 ASes, with our final experiment sample comprising 817 ASes.

1 Countries and dependent territories are defined by the ISO 3166-1 alpha-2 codes, which is the granularity of MaxMind’s country geolocation.

Site dataset. Merging the CLBL with Khattak et al.’s dataset [81] yields 1,210 distinct IP addresses. We added to this set an additional 1,000 randomly selected sites from the Alexa top 10,000. To this set of sites we also added several known Tor bridges, as discussed in Section 3.5. While this set consists of 2,213 sites, some sites appeared in both the CLBL and Alexa lists. Thus, our site list contains a total of 2,134 unique sites, with a CLBL composition of 56.7%.

Experiment Setup

The selection process above left us able to measure connectivity between 2,134 sites and 2,050 reflectors. We collected connectivity disruption network measurements over 17 days, using the method described in Section 3.3. We call one measurement of a reflector-site pair a run, involving IP ID monitoring and one instance of blocking detection. Relatedly, we define an experiment trial as the complete measurement of one run for all reflector-site pairs. Over our 17-day window, we collected a total of 207.6 million runs across 47 total trials, meaning we tested each reflector-site pair 47 times. Each run comprises a collection of one-second time intervals. For each time interval, we measure the IP ID state of the reflector independent of all other tasks. We begin each run by sending a non-spoofed SYN to the site from the measurement machine. Doing so serves several functions. First, it allows us to ensure that the site is up and responding to SYNs at the time of the measurement. Second, it allows us to precisely measure whether the site sends SYN-ACK retries, and to characterize the timing of the retries. We record this behavior for each run and incorporate this initial data point into the subsequent SHT analysis.

We then wait four seconds before injecting spoofed SYN packets towards the site. The reflector measurements during that window serve as control measurements. During the injection window, we inject 10 spoofed SYN packets towards the site. For each run, we note the site’s SYN-ACK retry behavior and in which subsequent window we expect SYN-ACK retries to arrive at the reflector, and use this information to identify the window in which to look for follow-on IP ID acceleration. At the end of the run, we send corresponding RST packets for all SYNs we generated, to induce tear-down of all host state. We then cool down for 1 second before starting a new run. We randomize the order of the sites and reflectors for testing per trial. We test all reflector-site pairs before moving on to a new trial. For reasons discussed earlier, we never involve the same reflector or site in two simultaneous measurements. After each run, we ensure that (1) the reflector’s IP ID appeared to remain monotonically increasing; (2) no packet loss occurred between the measurement machine and the reflector; and (3) the site is up and responding to SYN packets. Additionally, we ensure that the IP ID does not wrap during either the injection window or the SAR window. We discard the measurements if any of these conditions fails to hold. After these validity checks, our dataset contains 182.5 million runs across 1,960 reflectors and 2,089 sites. The reduction in the number of sites and reflectors corresponds to unstable or down hosts. We then apply SHT (Section 3.3) to analyze the reachability between these site-reflector pairs.
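The timeline of a single run, as described above, is summarized in the sketch below. The packet-sending helpers (probe_ipid, send_syn, send_spoofed_syn, send_rst) are hypothetical placeholders, and any window lengths beyond those stated in the text are illustrative; only the pacing and ordering are shown.

```python
import time

def run_once(site, reflector, probe_ipid, send_syn, send_spoofed_syn, send_rst):
    """One run for a (site, reflector) pair; helpers are supplied by the caller."""
    samples = []
    send_syn(site)                       # non-spoofed SYN: liveness + SYN-ACK retry timing
    for _ in range(4):                   # four-second control window
        samples.append(probe_ipid(reflector))
        time.sleep(1)
    for _ in range(10):                  # injection window: spoofed SYNs naming the reflector
        send_spoofed_syn(site, spoofed_source=reflector)
    for _ in range(6):                   # watch the injection and SAR windows (length illustrative)
        samples.append(probe_ipid(reflector))
        time.sleep(1)
    send_rst(site)                       # reset any half-open state we created
    time.sleep(1)                        # one-second cool-down before the next run
    return samples
```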

Measured Priors and Expectations

A critical piece in the construction of our SHT framework is formulating the prior probabilities for each of our hypotheses. Figure 3.3 shows CDFs of the measured prior probabilities of IP ID acceleration for three different scenarios. The IP ID acceleration of reflectors matches our intuition, with acceleration decreasing as frequently as it increases across the dataset. We show this with the “No Injection” CDF, with nearly all reflectors having a probability of IP ID acceleration without injection of less than 0.5. Many reflectors have a probability of acceleration far lower, corresponding to reflectors with low or stable traffic patterns. We then use this per-reflector prior for θ1 in our SHT construction for detecting inbound blocking. While we could instead estimate the value as 0.5, the expected number of trials depends on the separation between the injection and non-injection priors, so using a smaller θ1 (per reflector) greatly speeds up detection. Figure 3.3 also shows the probability of IP ID acceleration under injection. This value approaches 1 for many reflectors and is above 0.8 for more than 90% of reflectors. For a handful of reflectors, however, it is quite low, or even 0. These correspond to degenerate or broken reflectors that we can easily identify from their low priors and remove from our experiment (discussed more in Section 3.4). We use this experimentally measured prior as 1 − θ0 in both of our sequential hypothesis tests. This distribution provides a lower bound for the actual probability of IP ID acceleration, as the experimentally measured value includes inbound blocking (i.e., if some sites experience blocking, those values would lower the measured value). Inbound-blocked runs lower the overall probability of acceleration. This still reflects a conservative measurement, as a prior closer to the control increases the likelihood of false negatives, not false positives.


Figure 3.3: CDF of the probability of IP ID acceleration per reflector across the experiment.

Lastly, we also measure the probability of IP ID acceleration at the SYN-ACK retry point of each run. We dynamically determine where this point falls in each run using the properties the site manifests during that run.2 As expected, the distribution closely matches the control distribution. The differences in the curve are explained by the dataset containing outbound blocking; such blocking raises the probability of acceleration at that point, pulling the distribution slightly closer to the injection case. We use this prior as θ1 during the outbound SHT test. Once we have computed the priors, we can compute the expected number of trials to reach each of our output states (on a per-reflector basis) using Equation 3.1. Figure 3.4 presents CDFs of these results. More than 90% of reflectors have 40 or fewer expected trials needed to reach one of the states. The remaining reflectors have a large tail and correspond to unstable or degenerate reflectors. We do not need to explicitly remove these reflectors from the dataset, but must refrain from making decisions based on them in some cases.

2 If a SYN-ACK retry occurs in the window adjacent to injection, we discard it and look for the next retry. If we did not discard that measurement, the retry would correspond to non-acceleration rather than acceleration.


Figure 3.4: CDF of the expected number of trials, at a false positive and negative probability of 10^-5, to accept one of the four SHT hypothesis outcomes, per reflector. “No Inbound/Bidirectional Blocking” means we passed our first SHT and did not detect inbound blocking, but have not yet attempted to differentiate between no blocking and outbound blocking.
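A minimal sketch of how the per-reflector priors discussed above could be tallied from the collected runs is shown below. The field names are illustrative, and the exact mapping of these empirical rates onto θ0 and θ1 follows the description in the text.

```python
from collections import defaultdict

def estimate_priors(runs):
    """runs: iterable of (reflector, accel_in_control, accel_in_injection)
    tuples, one per run, where the last two fields are booleans. Returns the
    empirical acceleration probabilities per reflector."""
    counts = defaultdict(lambda: [0, 0, 0])   # [control hits, injection hits, runs]
    for reflector, ctrl, inj in runs:
        c = counts[reflector]
        c[0] += int(ctrl)
        c[1] += int(inj)
        c[2] += 1
    return {r: {"p_accel_control": c[0] / c[2], "p_accel_injection": c[1] / c[2]}
            for r, c in counts.items()}

print(estimate_priors([("r1", False, True), ("r1", True, True), ("r1", False, True)]))
```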

Identifying and Removing Systematic Effects

Our initial selection of sites did not address some of the site requirements from Section 3.3, such as network filtering or anycast IP addresses. Failure to identify these sites generates systematic effects within our results dataset. Recall that we only wish to filter these sites when necessary. For example, in the case of anycast sites, we can still classify them as inbound-blocked or not blocked, but we cannot detect scenarios where the site is outbound-blocked.

Problematic sites. We identify sites that fail to meet these requirements by conducting a set of experiments with nine geographically diverse vantage points. These hosts reside in cloud service providers and universities, all of which have little or no network blocking. We perform these measurements concurrently with our primary blockage measurements. For each site, we perform two measurements from each vantage point. The diversity of the vantage points enables us to identify these network effects rather than censorship or blockage. These tests do not need to be globally complete, as the network effects manifest readily. The first measurement ensures that a vantage point can have bidirectional communication with a site. From a vantage point, we send five SYN packets to a site, evenly distributed over the experiment run (approximately an hour). We monitor for SYN-ACK replies, which demonstrate two-way communication. If a vantage point cannot reliably establish bidirectional communication with a site, we exclude it from our further vantage-point measurements. In the second measurement, the measurement machine sends a spoofed SYN packet to the site with the IP address of a vantage point. Since we previously confirmed the vantage point can communicate with the site, any missing SYN-ACKs or retransmissions are the result of sites not conforming to our requirements, rather than blockage. If the vantage point does not receive a SYN-ACK response from the site, ingress filtering or network origin discrimination may be occurring. If the vantage point does receive a SYN-ACK, it responds with a RST packet. If the vantage point continues to receive multiple SYN-ACKs, the site is not correctly receiving the vantage point’s RST packets, suggesting the site host (or its network provider) may be anycast, employing a distributed stateful firewall, or discriminating by traffic origin. We repeat this experiment three times to counter measurement errors introduced by random packet loss. If a vantage point never receives a SYN-ACK, or only ever receives multiple SYN-ACK retries, we conservatively conclude the site exhibits one of the unacceptable network properties from that vantage point. Thus, we disregard its blockage results, except when the observed measurement results cannot be a false signal due to the site’s properties. For example, if vantage points observe only multiple SYN-ACK retries for a site (indicating our measurements with that site may falsely identify outbound blocking), but our measurements detect no blocking or only inbound blocking, we can still consider these results. We find that this relatively small number of vantage points suffices to characterize sites, as experiment results typically remained consistent across all vantage points. All online sites that we tested were reachable from at least three vantage points, with 98.4% reachable from five or more.
This reachability affords us multiple geographic vantage points to assess each site. For 98.6% of sites, all reachable vantage points consistently assessed the site’s requirement status, indicating that we can widely detect site network properties from a few geographically distinct locations. This approach is ultimately best effort, as we may fail to detect sites whose behavior is more restricted (e.g., filtering only a few networks). Through our site assessment measurements, we identified 229 sites as invalid for inbound blocking detection due to ingress filtering or network traffic discrimination. These sites were widely distributed amongst 135 ASes, each of which may employ such filtering individually or may experience filtering at an upstream ISP. We also flagged 431 sites as invalid for outbound blocking detection, as they either lacked a necessary site property (discussed in Section 3.3) or did not respect RST packets (perhaps filtering them). To distinguish between the two behaviors, we probed these sites with non-spoofed SYN and RST packets from vantage points, similar to the experiments described earlier in this section. For each site, we sent a SYN packet from a well-connected vantage point, and responded with a RST for any received SYN-ACK. If we continued receiving multiple SYN-ACK retries, the site did not respect our RST packets. Otherwise, the site properly responds to RST packets in the non-spoofing setup, and might be exhibiting an undesirable site property (as listed in Section 3.3) in our spoof-based connectivity disruption experiments. We iterate this measurement three times for robustness against sporadic packet loss, concluding that a site ignores RST packets if any vantage point observes multiple SYN-ACK retransmissions in all trials. Using this approach, we identified that 64 sites (14.8% of sites invalid for outbound blocking detection) exhibited non-standard SYN-ACK retransmission behavior, and conclude that the remaining 367 sites (85.2%) are either anycast, deploying stateful firewalls, or discriminating by network origin. These sites were distributed amongst 62 ASes. The majority are known anycast sites, with 75% hosted by CloudFlare and 7% by Fastly, both known anycast networks. We additionally checked all sites against the anycast dataset produced by Cicalese et al. [27]; our technique identified all but 3 IP addresses, and we excluded those 3 sites from our results.
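The classification implied by these vantage-point measurements can be sketched as follows; the flag names and decision rules are illustrative simplifications of the procedure described above, not the authors' exact logic.

```python
def classify_site(got_synack, synack_retransmissions):
    """got_synack: per-trial booleans, did the vantage point see any SYN-ACK
    for the spoofed SYNs sent on its behalf? synack_retransmissions: per-trial
    counts of SYN-ACKs seen after the vantage point answered with a RST."""
    flags = set()
    if not any(got_synack):
        flags.add("invalid_for_inbound")      # ingress filtering / origin discrimination
    elif all(r > 1 for r in synack_retransmissions):
        flags.add("invalid_for_outbound")     # anycast, stateful firewall, or ignoring RSTs
    return flags or {"usable"}

# SYN-ACKs arrive but keep being retransmitted across all three trials.
print(classify_site([True, True, True], [4, 3, 5]))   # {'invalid_for_outbound'}
```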

Problematic reflectors. A reflector could be subject to filtering practices that differ based on the sender of the traffic, or the port on which the traffic arrives. This systematic effect can manifest as a reflector with significant inbound or outbound blocking. Through manual investigation, we identified several reflectors that demonstrate this property independent of spoofed or non-spoofed traffic. In all cases, such reflectors were outliers within their country. To remove these systematic effects, we ignore reflectors in the 99th percentile of blockage for their country. Sites blocked by these reflectors do not show a bias towards the CLBL (discussed more in Section 3.5). This process removed 91 reflectors from our dataset.
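A sketch of this per-country outlier filter is shown below (illustrative Python); the percentile computation and tie-breaking details are assumptions, since the text specifies only the 99th-percentile cutoff.

```python
import statistics
from collections import defaultdict

def drop_outlier_reflectors(blocked_counts, country_of):
    """blocked_counts: {reflector: number of blocked sites};
    country_of: {reflector: country}. Keeps reflectors at or below their
    country's 99th-percentile blockage."""
    by_country = defaultdict(list)
    for reflector, count in blocked_counts.items():
        by_country[country_of[reflector]].append(count)
    cutoff = {c: statistics.quantiles(v, n=100)[98] if len(v) > 1 else v[0]
              for c, v in by_country.items()}
    return {r: n for r, n in blocked_counts.items() if n <= cutoff[country_of[r]]}

counts = {"r1": 2, "r2": 3, "r3": 40}
countries = {"r1": "XX", "r2": "XX", "r3": "XX"}
print(drop_outlier_reflectors(counts, countries))   # {'r1': 2, 'r2': 3}
```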

3.5 Validation and Analysis

The value of the method we develop ultimately rests on its ability to accurately measure connectivity disruption from a large number of measurement vantage points. Validating its findings is challenging because we lack widespread ground truth, presenting a chicken-and-egg scenario. One approach, taken in the next two subsections, is to analyze the aggregate results produced and confirm that they accord with reasonable assumptions about the employment of connectivity disruption. While doing so does not guarantee correctness, it increases confidence in the observations. The other approach is to corroborate our findings against existing ground truth about censored Internet traffic. In the Tor bridge case study below, we perform one such analysis, providing a limited degree of more concrete validation.


Figure 3.5: Bias of blocked sites towards CLBL sites. CLBL sites make up 56.7% of our input sites, demarcated at the dotted vertical line. To reduce small-value effects, we remove reflectors with fewer than 5 blocked websites in the curves labeled “No small refs”.

Disruption Bias

Conceptually, one would expect the set of sites disrupted by a network censor to be biased towards sites that are known to be commonly censored. From this notion, we can examine the set of sites blocked by each reflector and ask how that population compares to the input population. Figure 3.5 shows, in aggregate, the bias of connectivity disruption towards commonly censored websites. 56.7% of websites in the input site dataset are from the CLBL, demarcated in the plot with a vertical dotted line (which we call the CLBL bias line). If the detection we observed were unrelated to censorship, we would expect to find roughly 56.7% of each reflector’s blocked sites listed in the CLBL. The results, however, show a considerable bias towards CLBL sites for both inbound and outbound filtering. We see this in the bulk of the graph volume lying to the right of the vertical dotted CLBL bias line. Excluding reflectors with fewer than 5 blocked sites to avoid small-number effects, we observe that for 99% of reflectors, more than 56.7% of inbound filtering is towards CLBL sites. Similarly, we find 95% of outbound filtering biased towards the CLBL. This observed bias agrees with our prior expectation that we should find CLBL sites more widely censored.
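The per-reflector bias underlying Figure 3.5 can be computed in a few lines. The sketch below (illustrative, with made-up example data) reports, for each reflector with at least five blocked sites, the fraction of its blocked sites that appear in the CLBL, to be compared against the 56.7% CLBL share of the input list.

```python
def clbl_bias(blocked_sites_by_reflector, clbl, min_blocked=5):
    """Fraction of each reflector's blocked sites that are CLBL sites."""
    bias = {}
    for reflector, blocked in blocked_sites_by_reflector.items():
        if len(blocked) >= min_blocked:
            bias[reflector] = sum(1 for s in blocked if s in clbl) / len(blocked)
    return bias

clbl = {"hrcr.org", "nsa.gov", "scientology.org", "anonymouse.org"}
blocked = {"r1": ["hrcr.org", "nsa.gov", "example.com", "anonymouse.org", "scientology.org"]}
print(clbl_bias(blocked, clbl))   # {'r1': 0.8}
```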


Figure 3.6: CDF of site filtering per reflector, separated by inbound/bidirectional and outbound filtering.

Aggregate Results

Site and reflector results. We first explore the extent of connectivity disruption from both the site and reflector perspectives. We might naturally assume that filtering will not manifest ubiquitously: we do not expect to find a site blocked across the majority of reflectors, and similarly, we should find most sites not blocked for any given reflector. This should particularly hold since approximately half of our investigated sites come from the Alexa top 10K most visited websites. Although some popular Alexa websites contain potentially sensitive content (e.g., adult or social media sites), many provide rather benign content and are unexpected targets of disruption.


Figure 3.7: CDF of site filtering across reflectors, separated by inbound/bidirectional and outbound filtering. Note the log-scaled x-axis.

We observe the degree of filtering from the reflector perspective in Figure 3.6. Approximately 99% of reflectors encounter connectivity impediments in either direction for 20 or fewer sites, with no reflector blocked for more than 60 sites. This finding concurs with the assumption that site filtering at reflectors is not ubiquitous. On the other hand, connection disruption appears widespread, as 60% of reflectors experience some degree of interference, corroborating anecdotal observations of pervasive censorship. We find inbound/bidirectional disruption occurs more commonly than outbound-only filtering. In total, fewer than 30% of reflectors experience any outbound-only filtering, while over 50% of reflectors have blocked inbound packets from at least one site. This contrast is unsurprising, because bidirectional filtering of a blacklisted IP address is a simple and natural censorship policy; as a result, most results will appear as either inbound or bidirectional filtering. Figure 3.7 depicts a similar outlook on connectivity disruption from the site viewpoint. We again witness that inbound or bidirectional filtering affects more sites than outbound filtering.

No.  Site                            Category                               % Refs   % Cnt.
1.   hrcr.org                        Human Rights                             41.7     83.0
2.   alstrangers.livejournal.com     Militants and Extremists                 37.9     78.8
3.   varlamov.ru                     Alexa                                    37.7     78.0
     nordrus-norna.livejournal.com   Hate Speech
4.   www.stratcom.mil                Foreign Relations & Military             37.5     78.6
5.   www.demonoid.me                 Peer-to-Peer File Sharing                21.7     58.5
6.   amateurpages.com                Pornography                              21.2     57.9
     voice.yahoo.jajah.com           Voice over Internet Protocol (VOIP)
     amtrak.com                      Alexa
7.   desishock.net                   Peer-to-Peer File Sharing                10.8     32.7
8.   wzo.org.il                      Religious Conversion & Commentary         7.9     17.6
9.   Hate Speechit.ru                Hate Speech                               7.3     14.5
10.  anonymouse.org                  Anonymizers & Censorship Circum.          5.3     16.4

Table 3.3: Summary of the top 10 sites by the percent of reflectors experiencing inbound blocking. Rows sharing rank reflect domains that share an IP address. We list a categorization of each website using the CLBL definitions provided. We additionally report the percent of countries (and dependent territories) for which we find a site inbound-blocked by at least one reflector.

Over 15% of sites are inbound-blocked along the path to at least one reflector, while only 7% of sites are ever outbound-blocked. In total, connections to 79% of websites never appear disrupted, and over 99% of sites exhibit inaccessibility for 100 reflectors (5%) or fewer. As before, these results agree with our expectation that sites are typically not blocked across the bulk of reflectors. Several sites show extensive filtering, as listed in Tables 3.3 and 3.4. Here, we have determined reflector country-level geolocation using MaxMind [98]. We found six sites inbound-blocked for over 20% of reflectors across at least half the countries (and dependent territories), with the human rights website hrcr.org inaccessible by 41.7% of reflectors across 83% of countries. The top 10 inbound-blocked sites correspond closely with anticipated censorship, with 9 found in the Citizen Lab Block List (CLBL). A surprisingly widely blocked Alexa-listed site is varlamov.ru, ranked third; it actually redirects to LiveJournal, a frequent target of censorship [110, 146]. On a related note, the IP address for amtrak.com is the sixth most inbound-blocked site—but it is co-located with two CLBL websites, underscoring the potential for collateral damage that IP-based blacklisting can induce. The top outbound-blocked sites tell a similar tale, although with less pervasive filtering. The most outbound-disrupted site is nsa.gov, unreachable by 7.4% of reflectors across 23.3% of countries. Given the nature of this site, perhaps the site performs the filtering itself, rather than through reflector-side disruption. All top 10 sites are known, frequently blocked websites listed in the CLBL. This aggregate analysis of connectivity disruption from both the site and reflector perspectives accords with our prior understanding that while disruption is not ubiquitous, it may be pervasive. It affects a large proportion of reflectors, and can widely suppress access to particular sites. The sites for which our method detects interference closely correspond with known censored websites. This concordance bolsters confidence in the accuracy of our method’s results.

No.  Site                     Category                                   % Refs   % Cnt.
1.   nsa.gov                  US Government-Run Military Website            7.4     23.3
2.   scientology.org          Minority Faiths                               2.2      6.9
3.   goarch.org               Minority Faiths                               1.9      4.4
4.   yandex.ru                Freedom of Expression                         1.8      3.8
5.   hushmail.com             Email Provider                                1.8      4.4
6.   carnegieendowment.org    Political Reform                              1.6      4.4
7.   economist.com            Freedom of Expression                         1.6      2.5
8.   purevpn.com              Anonymizers & Censorship Circumvention        1.4      1.9
9.   freedominfo.org          Freedom of Expression                         1.3      3.1
10.  wix.com                  Web Hosting Services                          1.3      0.6

Table 3.4: Summary of the top 10 sites by the percent of reflectors experiencing outbound blocking. We provide a categorization of each website using the CLBL definitions provided. We additionally report the percent of countries (and dependent territories) for which we find a site outbound-blocked by at least one reflector.

Figure 3.8: Global heat map showing the percentage of sites filtered for any reflector in countries around the world. China experiences the highest average amount of filtering, at 5% of measurable sites filtered by a reflector within the country.

No.  Country           Num. Ref.   Block %   CLBL %   Mean Blocked In/Out   Med. Blocked In/Out   Total Num. Block. In/Out
1.   China                    36       5.0     70.9            11.2 / 1.8             1.5 / 0.0                   70 / 33
2.   Iran                     14       3.4     55.7            10.8 / 1.4             0.0 / 0.0                   53 / 17
3.   Sudan                    12       2.2     54.3             6.5 / 0.0             1.0 / 0.0                    46 / 0
4.   Russia                   17       1.8     78.9             4.8 / 1.4             0.0 / 0.0                   18 / 20
5.   Latvia                   14       1.8     81.6             3.3 / 1.6             2.0 / 0.0                   22 / 19
6.   Turkey                   15       1.8     83.8             2.1 / 1.5             0.0 / 0.0                   23 / 14
7.   Hong Kong                16       1.7     88.9             2.8 / 1.4             0.0 / 0.0                   14 / 22
8.   Colombia                 16       1.7     85.7             4.2 / 1.2             6.0 / 0.0                   17 / 18
9.   Libya                    10       1.5     77.4             8.4 / 3.2             9.5 / 3.0                   16 / 15
10.  United Kingdom           16       1.4     90.0             3.1 / 0.8             2.0 / 0.0                   19 / 11

Table 3.5: Summary of the top 10 countries ranked by the percentage of sites blocked at any reflectors within each country (shown in the “Block %” column). Additionally, we list for each country the number of reflectors within that country, the blockage bias towards CLBL sites, and statistics on inbound versus outbound blockage.

Country-level connectivity disruption. Analysis of aggregate connectivity disruption across countries provides another perspective for validation. Using reflector country geolocation provided by MaxMind [98], Table 3.5 ranks the top 10 countries by percentage of blocked sites across any reflectors in the country. Figure 3.8 portrays this at a global scale, illustrating that some degree of connectivity disruption is experienced by hosts in countries around the world. We see that many of the most disruptive countries correspond closely with countries known to heavily censor, such as China, Iran, Sudan, Russia, and Turkey [112]. Of the top 10 countries, the OpenNet Initiative [112] has reported Internet censorship of political or social material in every country except Latvia and the United Kingdom.3 More recently, reports have documented Latvia as heavily censoring gambling websites and political content [6, 133]. Our results appear plausible for the United Kingdom as well, which has a history of filtering streaming and torrent sites [16] and adult content [106]. While we are aggregating at a country granularity, these disruptions may actually be implemented in different ways within a single country. These differences result in non-uniform filtering policies, as has been observed with the Great Firewall of China [49, 151] and UK adult content filtering [106]. In Figure 3.9, we plot the variation in the number of sites blocked for reflectors within each country. We remove countries without any site filtering. We observe that for most countries, there exists some variation in the disruption experienced by reflectors within a country, suggesting that interference indeed often differs across networks even within a country. The extent of this behavior is widespread and highlights the importance of connectivity measurements from many vantage points, since findings may differ across nearby networks and geolocations.

3 We list Hong Kong separately from China, although traffic from Hong Kong may traverse Chinese networks and experience disruption.


Figure 3.9: Plot of the variations in site filtering experienced by reflectors within countries. We elide countries without any disruption.

Tor Bridge Case Study

In the previous section, we analyzed our method’s results in aggregate, finding them in line with reasonable assumptions and existing reports of Internet censorship. Here, we use several known Tor bridges as a case study providing an additional (though limited) check of correctness. This validation increases confidence in our method, as we are able to replicate previous findings with regard to which sites experience blocking, the country of censorship, and the directional nature of disruption. Our set of sites contains three Tor obfs4 (Obfuscation 4) bridges listening on port 80, for which we have some ground truth on their censorship. A prior study [52] tested all three bridges from vantage points in the U.S., China, and Iran over a five-month period. The first two bridges (TB1 and TB2) were included in the Tor Browser releases. Fifield and Tsai detected that only China frequently inbound-blocked these, albeit inconsistently, likely due to the federated nature of the Great Firewall of China. The third bridge (TB3) had been only privately distributed, and remained unblocked throughout the study. Our findings are consistent with this ground truth. Both TB1 and TB2 experienced inbound filtering in China only, while connectivity to TB3 was never disrupted. Of the 36 reflectors in China, we detected inbound filtering of TB1 for 8 reflectors, no filtering for 8 reflectors, and inconclusive evidence for the remaining 20 (due to lack of a statistically significant signal during our hypothesis testing). For TB2, 9 reflectors were inbound-blocked, 11 were unblocked, and 16 were undecided. TB3, expected to be unblocked, was accessible by 22 reflectors, with the remaining 14 undetermined. These findings accord with prior results regarding the distributed and disparate nature of Chinese Tor filtering.

3.6 Discussion

In this section, we discuss various aspects concerning the coverage, granularity, and accuracy of the current measurements.

Coverage limitations. Ethical considerations when performing our measurements restricted the reflectors from which we measure to a set of hosts that we can confidently conclude represent Internet infrastructure in the interior of the network. Recall that we do so by measuring the Internet topology and only using reflectors at least two traceroute hops into the network. This approach drastically reduces the number of hosts that we can use as reflectors. In the future, more exhaustive techniques to identify Internet infrastructure could increase the set of IP addresses that we might use as reflectors.

Evasion. Augur relies on reflectors responding to unsolicited SYN-ACK packets (both our direct probes and the SYN-ACKs that sites send in response to our spoofed SYNs). A natural evasion mechanism could use a stateful firewall to drop SYN-ACKs that do not correspond to a previously sent SYN. Implementing such firewalls at scale poses significant challenges. Large networks frequently have multiple transit links, resulting in asymmetric routing; SYN packets may traverse a different path than the SYN-ACKs. The censor would need to coordinate state across these links. Any errors in state management would block benign connections, resulting in collateral damage. Alternatively, censors could switch to allowing TCP control packets through and only disrupting data packets. Such an approach might complicate the censor’s own monitoring of their blocking efforts, as it runs counter to assumptions commonly made by diagnostic tools. Similarly, it may introduce management burdens because it does not accord with common forms of packet filtering.

Ambiguity in location and granularity of filtering. The current measurements only indicate whether packets became filtered somewhere along the end-to-end path between a reflector and a site; they do not indicate the location where that filtering might take place. As a result, our techniques cannot disambiguate the scenario where a remote site blocks access from all reflectors in an entire region from the scenario where an in-country censor filters traffic along that path. For example, financial and commerce sites may block access from entire countries if they have no customers in those regions. Additionally, the current measurements only employ TCP packets using port 80. Thus, they do not disambiguate filtering of IP addresses versus filtering of only port 80 traffic associated with that IP address. An extension of our system might perform follow-up measurements on different ports to determine whether filtering applies across all ports. On a related note, our techniques only measure TCP/IP-based filtering; future work may involve correlating the measurements that we observe with tools that measure global filtering at other layers or applications (e.g., HTTP, DNS).

Other sources of inaccuracy. Existing IP geolocation tools have known inaccuracies [70], particularly for Internet infrastructure (i.e., IP addresses that do not represent end hosts). As a result, some of our results may not reflect precise characterizations of country-level filtering. As IP geolocation techniques improve, particularly for IP addresses that correspond to Internet infrastructure, we can develop more confidence in the country-level characterizations from Section 3.5. Additionally, various network mechanisms, including anycast, rerouting, traffic shaping, and transient network failures, may make it difficult to disambiguate overt filtering actions from more benign network management practices. Some of these effects may even operate dynamically: for example, network firewalls may observe our probes over time, come to view them as an attack, and begin to block our probes; in this case, our own measurements may give rise to filtering, rendering it difficult to disambiguate reactive filtering of our measurements from on-path filtering between a site and reflector, particularly since the latter may also change over time.

3.7 Augur Summary

Despite the pervasive practice of Internet censorship, obtaining widespread, continuous measurements from a diversity of vantage points has proved elusive; most studies of censorship to date have been limited both in scale (i.e., concerning only a limited number of vantage points) and in time (i.e., covering only a short time span, with no baseline measurements). The lack of comprehensive measurements about Internet censorship stems from the difficulty of recruiting vantage points across a wide range of countries, regions, and ISPs, as most previous techniques for measuring Internet censorship have required some type of presence in the network being monitored. In this chapter, we tackled this problem with a fundamentally different type of approach: instead of relying on in-country monitoring points to which we have no direct access, we exploit recent advances in TCP/IP side-channel measurement techniques to collect measurements between pairs of endpoints that we do not control. This ability to conduct measurements from “third-party” vantage points that we do not control allows us to continuously monitor many more paths than was previously possible. Previous work introduced the high-level concept of these third-party side-channel measurements; in this work, we transition the concept to practice through a working system that abides by ethical norms and produces sound measurements in the presence of the measurement artifacts and noise that inevitably manifest in real-world deployments. The continuous, widespread measurements that we can collect with these techniques can ultimately complement anecdotes, news reports, and policy briefings to ensure that we can back future assessments of Internet filtering with sound, comprehensive data. Part of this transition to practice involves further developing our system to facilitate ongoing operation, including automating the validation of the measurements that we collect. In Chapter 4 we expand on and continue this work by exploring DNS-based filtering globally.

Chapter 4

Iris: Global Measurement of DNS Manipulation

4.1 Introduction

Organizations may implement censorship at many layers of the Internet protocol stack: they might, for example, block traffic based on IP address, manipulate DNS responses, or block individual web requests based on keywords. Prior work has developed techniques to continuously measure widespread manipulation at the transport [47, 116, Chapter 3] and HTTP [128] layers, yet a significant gap remains in our understanding of global information control concerning the manipulation of the Internet’s Domain Name System (DNS). Towards this goal, in this chapter we develop and deploy a method and system to detect, measure, and characterize the manipulation of DNS responses in countries across the entire world. Developing a technique to accurately detect DNS manipulation poses major challenges. Although previous work has studied inconsistent or otherwise anomalous DNS responses [75, 93], these methods have focused mainly on identifying DNS responses that could reflect a variety of underlying causes, including misconfigurations. In contrast, our work aims to develop methods for accurately identifying DNS manipulation indicative of an intent to restrict user access to content. To achieve high detection accuracy, we rely on a collection of metrics based on the underlying properties of DNS domains, resolutions, and infrastructure. One set of detection metrics focuses on consistency—intuitively, when we query a domain from different locations, the IP addresses contained in DNS responses should reflect hosting from either a common server (i.e., the same IP address) or the same autonomous system. Another set of detection metrics focuses on independent verifiability, by comparison against independent information such as the identity in the TLS certificate for the website corresponding to the domain. Each of these metrics naturally lends itself to exceptions: for example, queries from different locations for a domain utilizing a content distribution network (CDN) will often receive different IP addresses (and sometimes even different CDNs). However, we can use violations of all of the metrics as a strong indicator of DNS manipulation.

In addition to achieving accurate results, another significant design challenge concerns ethics. In contrast to systems that explicitly involve volunteers in collecting measurements, methods that send DNS queries through open DNS resolvers deployed across the Internet raise the issue of potentially implicating third parties who did not in fact agree to participate in the measurement. Using “open resolvers” is potentially problematic, as most of these are not actual resolvers but instead DNS forwarders in home routers and other devices [129]. A censor may misattribute requests from these resources as individual citizens attempting to access censored resources. Reasoning about the risks of implicating individual citizens requires detailed knowledge of how censors in different countries monitor access to censored material and how they penalize such actions. These policies and behaviors may be complex, varying across time, region, individuals involved, and the nature of the censored content; such risks are likely intractable to accurately deduce.

To this end, our design takes steps to ensure that, to the extent possible, we only query open DNS resolvers hosted in Internet infrastructure (e.g., within Internet service providers or cloud hosting providers), in an attempt to eliminate any use of resolvers or forwarders in the home networks of individual users. This step reduces the set of DNS resolvers that we can use for our measurements from tens of millions to only a few thousand. However, we find that the resulting coverage still suffices to achieve a global view of DNS manipulation, and—importantly—in a safer way than previous studies that exploit open DNS resolvers.

Our work makes the following contributions. First, we design, implement, and deploy Iris, a scalable, ethical system for measuring DNS manipulation. Second, we develop analysis metrics for disambiguating natural variation in DNS responses for a domain from nefarious manipulation. Third, we perform a global measurement study that highlights the heterogeneity of DNS manipulation across countries, resolvers, and domains. We find that manipulation varies across DNS resolvers even within a single country. This chapter is based on work that appeared in the USENIX Security Symposium [119] and USENIX ;login: [120].

4.2 Method

In this section we describe Iris, a scalable, lightweight system to detect DNS manipulation. We begin by scoping the problem space, identifying the capabilities and limitations of various measurement building blocks, and stating our assumptions about the threat model. We explain the process by which we select (1) which domain names to measure, and (2) the vantage points to measure them from, taking into consideration questions of ethics and scalability. We then describe, given a set of measurement vantage points and DNS domain names, how we characterize the results of our measurements and use them to draw conclusions about whether DNS manipulation is taking place, based on either the consistency or the independent verifiability of the responses that we receive. Next, we consider our technical approach in light of existing ethical norms and guidelines, and explain how various design decisions help us adhere to those principles as much as possible. Finally, we discuss the implicit and technical limitations of Iris.

Overview We aim to identify DNS manipulation, which we define as the instance of a DNS response both (1) having attributes (e.g., IP addresses, autonomous systems, web content) that are not consistent with respect to a well-defined control set; and (2) returning information that is demonstrably incorrect when compared against independent information sources (e.g., TLS certificates).

Approach Detecting DNS manipulation is conceptually simple: at a high level, the idea entails performing DNS queries through geographically distributed DNS resolvers and analyzing the responses for activity that suggests that the responses for a DNS domain might be manipulated. Despite its apparent simplicity, however, realizing a system to scalably collect DNS data and analyze it for manipulation poses both ethical and technical challenges. The ethical challenges concern selecting DNS resolvers that do not implicate innocent citizens, as well as ensuring that Iris does not induce undue load on the DNS resolution infrastructure; Section 4.2 explains the ethical guidelines we use to reason about design choices. Section 4.2 describes how Iris selects a “safe” set of open DNS resolvers. The technical challenges center around developing sound methods for detecting manipulation, which we describe in Section 4.2 and Section 4.2.

Identifying DNS names to query Iris queries a list of sensitive URLs compiled by Citizen Lab [29]. We call this list the Citizen Lab Block List (CLBL). This list of URLs is compiled by experts based on known censorship around the world, divided by category. We distill the URLs down to domain names and use this list as the basis of our dataset. We then supplement this list by adding additional domain names selected at random from the Alexa Top 10,000 [3]. These additional domain names help address geographic or content biases in the CLBL while not drastically increasing the total number of queries.
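
To make this list-construction step concrete, the following Go sketch distills CLBL URLs into bare domain names and adds a random sample from an Alexa list; the input file names, the sample size, and the helper structure are illustrative assumptions rather than the exact pipeline Iris uses.

```go
// Sketch: distill CLBL URLs to domain names and add a random Alexa sample.
// File names and the sample size are placeholders for illustration.
package main

import (
	"bufio"
	"fmt"
	"math/rand"
	"net/url"
	"os"
	"sort"
	"strings"
)

// readLines returns the non-empty lines of a text file.
func readLines(path string) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var lines []string
	s := bufio.NewScanner(f)
	for s.Scan() {
		if line := strings.TrimSpace(s.Text()); line != "" {
			lines = append(lines, line)
		}
	}
	return lines, s.Err()
}

func main() {
	domains := map[string]bool{}

	// Reduce each CLBL URL (e.g., "http://example.com/path") to its host name.
	urls, err := readLines("clbl_urls.txt")
	if err != nil {
		panic(err)
	}
	for _, raw := range urls {
		if !strings.Contains(raw, "://") {
			raw = "http://" + raw // tolerate bare host names in the list
		}
		u, err := url.Parse(raw)
		if err != nil || u.Hostname() == "" {
			continue
		}
		domains[strings.ToLower(u.Hostname())] = true
	}

	// Supplement with domain names sampled at random from the Alexa list.
	alexa, err := readLines("alexa_top_10k.txt")
	if err != nil {
		panic(err)
	}
	rand.Shuffle(len(alexa), func(i, j int) { alexa[i], alexa[j] = alexa[j], alexa[i] })
	n := 1000
	if len(alexa) < n {
		n = len(alexa)
	}
	for _, d := range alexa[:n] {
		domains[strings.ToLower(d)] = true
	}

	// Emit the deduplicated measurement list.
	var out []string
	for d := range domains {
		out = append(out, d)
	}
	sort.Strings(out)
	for _, d := range out {
		fmt.Println(d)
	}
}
```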

Assumptions and focus First, Iris aims to identify widespread manipulation at the scale of Internet service providers and countries. We cannot identify manipulation that is targeted at specific individuals or populations or manipulation activities that exploit high-value resources such as valid but stolen certificates. Second, we focus on manipulation tactics that do not rely on stealth; we assume that adversaries will use DNS resolvers to manipulate the responses to DNS queries. We assume that adversaries do not return IP addresses that are incorrect but within the same IP prefix as a correct answer [9, 11, 108]. Finally, when attributing DNS manipulation to a particular country or dependent territory, we rely on the country information available from Censys [43] supplemented with MaxMind’s [98] dataset to map a resolver to a specific country (or dependent territory).

Ethics Our primary ethical concern is minimizing the risks associated with issuing DNS queries via open resolvers for potentially censored or manipulated domains, since these DNS queries could implicate innocent users. A second concern is the query load that Iris induces on authoritative nameservers.

With these concerns in mind, we use the ethical guidelines of the Belmont Report [14] and Menlo Report [41] to frame our discussion.

From the principles of respect for persons and beneficence (Section 2.3), we limit all of our measurements to resolvers we are reasonably certain are part of the Internet infrastructure. We also note that issuing DNS queries through tens of millions of resolvers has rapidly diminishing returns, and that using only open resolvers that we can determine are unlikely to correspond to individual users greatly reduces the risk to any individual without dramatically reducing the benefits of our experiment. We note that our consideration of ethics in this regard is a significant departure from previous work that has issued queries through open DNS resolver infrastructure but has not considered ethics [93].

From the principle of justice (Section 2.3), we envision that the beneficiaries of the kinds of measurements that we collect using Iris will be those that bear the potential risk. Even in the event that some entity in a country that hosts an open DNS resolver might bear some risk as a result of the measurements we conduct, we envision that those same entities may ultimately benefit from the research, policy-making, and tool development that Iris facilitates.

The additional guideline of respect for law and public interest (Section 2.3) helps us reason about the externalities that our DNS queries create by increasing DNS query load on the nameservers and resolvers. In keeping with this principle, we rate-limit our DNS queries for each DNS domain, since nameserver provisioning and billing are often driven by peak and near-peak query rates.

Open DNS Resolvers To obtain a wide range of measurement vantage points, we use open DNS resolvers deployed around the world; such resolvers will resolve queries for any client.

Measurement using open DNS resolvers is an ethically complex issue. Previous work has identified tens of millions of these resolvers around the world [93]. Given their prevalence and global diversity, open resolvers are a compelling resource, providing researchers with considerable volume and reach. Unfortunately, open resolvers also pose a risk not only to the Internet but to individual users. Open resolvers can be the result of configuration errors, frequently on end-user devices such as home routers [93]. Using these devices for measurement can incur monetary cost, and if the measurement involves sensitive content or hosts, can expose the owner to harm. Furthermore, open resolvers are also a common tool in various online attacks such as Distributed Denial-of-Service (DDoS) amplification attacks [94]. Despite efforts to reduce both the prevalence of open resolvers and their potential impact [113], they remain commonplace.

Due to these risks and the ethical considerations that we discussed in Section 4.2, we restrict the set of open resolvers that we use to the few thousand resolvers that we are reasonably certain are part of the Internet infrastructure (e.g., belonging to Internet service providers, online cloud hosting providers), as opposed to attributable to any single individual.

Figure 4.1 illustrates the process by which Iris finds safe open DNS resolvers. We now explain this process in more detail. Conceptually, the process comprises two steps: (1) scanning the Internet for open DNS resolvers; and (2) pruning the list of open DNS resolvers that we identify to limit the resolvers to a set that we can reasonably attribute to Internet infrastructure.

Figure 4.1: Overview of Iris’s DNS resolver identification and selection pipeline. Iris begins with a global scan of the entire IPv4 address space, followed by reverse DNS PTR lookups for all open resolvers, and finally filtering resolvers to only include DNS infrastructure.

By using DNS resolvers we do not control, we cannot differentiate between country-wide or state-mandated censorship and localized manipulation (e.g., captive portals, malware [93]) at individual resolvers. Therefore we must aggregate and analyze results at ISP or country scale.

Step 1: Scanning the Internet’s IPv4 space for open DNS resolvers Scanning the IPv4 address space provides us with a global perspective on all open resolvers. To do so, we developed an extension to the ZMap [44] network scanner to enable Internet-wide DNS resolutions.¹ This module queries port 53 of all IPv4 addresses with a recursive DNS A record query. We use a purpose-registered domain name we control for these queries to ensure there is a known correct answer. We conduct measurements and scans from IP addresses having a PTR record identifying the machine as a “research scanner.” These IP addresses also host a webpage identifying our academic institution and offering the ability to opt-out of scans. From these scans, we select all IP addresses that return the correct answer to this query and classify them as open resolvers. In Section 4.3, we explore the population of open DNS resolvers that we use for our study.

¹Our extension has been accepted into the open source project and the results of our scans are available as part of the Censys [43] system.
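
The per-address check behind such a scan can be approximated as follows (the deployed scanner is a ZMap module, not this Go code): send a recursive A query for a purpose-registered domain and accept the target as an open resolver only if it returns the known-correct answer. The probe domain, the expected address, and the third-party miekg/dns package are assumptions of this sketch.

```go
// Sketch: classify an IP address as an open resolver by asking it to
// recursively resolve a domain we control and checking the answer.
// The probe domain, expected IP, and miekg/dns dependency are assumptions.
package main

import (
	"fmt"
	"time"

	"github.com/miekg/dns"
)

const (
	probeDomain = "scan-probe.example.com." // purpose-registered, known answer
	expectedIP  = "192.0.2.10"              // address our authoritative server returns
)

// isOpenResolver sends a recursive A query to ip:53 and reports whether
// the response contains the expected answer for our probe domain.
func isOpenResolver(ip string) bool {
	m := new(dns.Msg)
	m.SetQuestion(probeDomain, dns.TypeA)
	m.RecursionDesired = true

	c := &dns.Client{Timeout: 5 * time.Second}
	resp, _, err := c.Exchange(m, ip+":53")
	if err != nil || resp == nil || resp.Rcode != dns.RcodeSuccess {
		return false
	}
	for _, rr := range resp.Answer {
		if a, ok := rr.(*dns.A); ok && a.A.String() == expectedIP {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isOpenResolver("203.0.113.53"))
}
```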

Figure 4.2: Overview of DNS resolution, annotation, filtering, and classification. Iris inputs a set of domains and DNS resolvers and outputs results indicating manipulated DNS responses.

Step 2: Identifying Infrastructure DNS Resolvers Given a list of all open DNS resolvers on the Internet, we prune this list to include only DNS resolvers that can likely be attributed to Internet infrastructure. To do so, we aim to identify open DNS resolvers that appear to be authoritative nameservers for a given DNS domain. Iris performs reverse DNS PTR lookups for all open resolvers and retains only the resolvers that have a valid PTR record beginning with the subdomain ns[0-9]+ or nameserver[0-9]*. This filtering step reduces the number of usable open resolvers—from millions to thousands—yet even the remaining set of open DNS resolvers provides broad country- and network-level coverage (characterized further in Section 4.3).

Using PTR records to identify infrastructure can have both false negatives and false positives. Not all infrastructure resolvers will have a valid PTR record, nor will they all be authoritative nameservers. These false negatives limit the scope and scale of our measurement, but are necessary to reduce risk. Similarly, if a user operated their own authoritative nameserver on their home IP or if a PTR record matched our naming criteria but was not authoritative, our method would identify that IP as infrastructure (false positives).
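
A minimal sketch of this PTR-based filter, using the naming patterns above, might look as follows; the standard-library resolver and helper names are illustrative.

```go
// Sketch: keep only open resolvers whose PTR record looks like DNS
// infrastructure (first label matching ns[0-9]+ or nameserver[0-9]*).
package main

import (
	"context"
	"fmt"
	"net"
	"regexp"
	"strings"
	"time"
)

// infraLabel matches host names such as ns1.example.net or nameserver.example.org.
var infraLabel = regexp.MustCompile(`^(ns[0-9]+|nameserver[0-9]*)$`)

// looksLikeInfrastructure performs a reverse (PTR) lookup for ip and reports
// whether any returned name begins with an infrastructure-style label.
func looksLikeInfrastructure(ip string) bool {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	names, err := net.DefaultResolver.LookupAddr(ctx, ip)
	if err != nil {
		return false // no valid PTR record: treat as not infrastructure
	}
	for _, name := range names {
		first := strings.SplitN(strings.TrimSuffix(name, "."), ".", 2)[0]
		if infraLabel.MatchString(strings.ToLower(first)) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(looksLikeInfrastructure("198.51.100.53"))
}
```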

Performing the Measurements Given a list of DNS domain names to query and a global set of open DNS resolvers from which we can issue queries, we need a mechanism that issues queries for these domains to the set of resolvers that we have at our disposal. Figure 4.2 shows an overview of the measurement process. At a high level, Iris resolves each DNS domain using the global vantage points afforded by the open DNS resolvers, annotates the response IP addresses with information from outside datasets as well as additional active probing, and uses consistency and independent verifiability metrics to identify manipulated responses. The rest of this section outlines this measurement process in detail, while Section 4.2 describes how we use the results of these measurements to ultimately identify manipulation.

Step 1: Performing global DNS queries Iris takes as input a list of suitable open DNS resolvers, as well as the combined CLBL and Alexa domain names. In addition to the DNS domains that we are interested in testing, we include 3 DNS domains that are under our control to help us compute our consistency metrics when identifying manipulation.

Querying tens of thousands of domains across tens of thousands of resolvers required the development of a new DNS query tool, because no existing DNS measurement tool supports this scale. We implemented this tool in Go [57]. The tool takes as input a set of domains and resolvers, and coordinates random querying of each domain across each resolver. The tool supports a variety of query types, multiple of which can be specified per run, including A, AAAA, MX, and ANY.

For each (domain, resolver) pair, the tool crafts a recursive DNS request and sends it to the resolver. The recursive query requests that the resolver resolve the domain and return the ultimate answer, logging all responses, including timeouts. The tool follows the set of responses to resolve each domain to an IP address. For example, if a resolver returns a CNAME, the tool then queries the resolver for resolution of that CNAME.

To ensure resolvers are not overloaded, the tool includes a configurable rate limit. For our experiments, we limited queries to resolvers to an upper bound of 5 per second. In practice, this rate tends to be much lower due to network latency in both reaching the resolver and the time it takes the resolver to perform the recursive resolution. To cope with specific resolvers that are unstable or time out frequently, the tool provides a configurable failure threshold that halts a specific resolver’s set of measurements should too many queries fail.

To ensure the domains we query are not overloaded, the tool randomizes the order of domains and limits the number of resolvers queried in parallel such that, in expectation, no domain experiences more than 1 query per second.
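
The sketch below illustrates the shape of that per-resolver loop under the stated limits: queries are paced by a ticker capped at 5 per second, a failure threshold halts an unstable resolver, and a lone CNAME answer triggers one follow-up query. The structure and the miekg/dns dependency are assumptions; the actual tool also randomizes domain order and bounds per-domain load as described above.

```go
// Sketch: rate-limited recursive queries against one resolver, following a
// single level of CNAME indirection. Structure and limits are illustrative.
package main

import (
	"fmt"
	"time"

	"github.com/miekg/dns"
)

const (
	perResolverRate  = 5    // queries per second, upper bound
	failureThreshold = 0.25 // halt a resolver if too many queries fail
)

func queryA(c *dns.Client, resolver, domain string) ([]dns.RR, error) {
	m := new(dns.Msg)
	m.SetQuestion(dns.Fqdn(domain), dns.TypeA)
	m.RecursionDesired = true
	resp, _, err := c.Exchange(m, resolver+":53")
	if err != nil {
		return nil, err
	}
	return resp.Answer, nil
}

// measureResolver issues one A query per domain through a single resolver,
// pacing queries with a ticker and stopping early if too many fail.
func measureResolver(resolver string, domains []string) {
	c := &dns.Client{Timeout: 15 * time.Second}
	ticker := time.NewTicker(time.Second / perResolverRate)
	defer ticker.Stop()

	failures := 0
	for i, domain := range domains {
		<-ticker.C // enforce the per-resolver rate limit

		answers, err := queryA(c, resolver, domain)
		if err != nil {
			failures++
			// Halt this resolver once its failure rate exceeds the threshold
			// (after a small warm-up window).
			if i+1 >= 20 && float64(failures)/float64(i+1) > failureThreshold {
				return
			}
			continue
		}

		// If the answer section only names a CNAME target, chase it once.
		if len(answers) == 1 {
			if cname, ok := answers[0].(*dns.CNAME); ok {
				<-ticker.C
				answers, _ = queryA(c, resolver, cname.Target)
			}
		}
		fmt.Printf("%s %s -> %d records\n", resolver, domain, len(answers))
	}
}

func main() {
	measureResolver("203.0.113.53", []string{"example.com", "example.org"})
}
```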

Step 2: Annotating DNS responses with auxiliary information Our analysis ultimately relies on characterizing both the consistency and independent verifiability of the DNS responses that we receive. To enable this classification we first must gather additional details about the IP addresses that are returned in each of the DNS responses. Iris annotates each IP address returned in the set of DNS responses with additional information about each IP address’s geolocation, autonomous system (AS), port 80 HTTP responses, and port 443 HTTPS X.509 certificates. We rely on the Censys [43] dataset for this auxiliary information; Censys provides daily snapshots of this information. This dataset does not contain every IP address; for example, the dataset does not include IP addresses that have no open ports, and adversaries may intentionally return IP addresses that return error pages or are otherwise unresponsive. In these cases, we annotate all IP addresses in our dataset with AS and geolocation information from the MaxMind service [98].
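
For the MaxMind fallback path, a minimal annotation sketch using the third-party geoip2-golang package (an assumed dependency; the database file names are placeholders) could look as follows; Iris prefers the Censys data where it is available.

```go
// Sketch: annotate an IP address with country and AS information from
// MaxMind GeoLite2 databases. File paths and the geoip2-golang dependency
// are assumptions; Censys data is used first when available.
package main

import (
	"fmt"
	"net"

	"github.com/oschwald/geoip2-golang"
)

type Annotation struct {
	Country string // ISO 3166-1 alpha-2 code
	ASN     uint
	ASOrg   string
}

func annotate(countryDB, asnDB *geoip2.Reader, addr string) (Annotation, error) {
	ip := net.ParseIP(addr)
	var ann Annotation

	country, err := countryDB.Country(ip)
	if err != nil {
		return ann, err
	}
	ann.Country = country.Country.IsoCode

	asn, err := asnDB.ASN(ip)
	if err != nil {
		return ann, err
	}
	ann.ASN = asn.AutonomousSystemNumber
	ann.ASOrg = asn.AutonomousSystemOrganization
	return ann, nil
}

func main() {
	countryDB, err := geoip2.Open("GeoLite2-Country.mmdb")
	if err != nil {
		panic(err)
	}
	defer countryDB.Close()

	asnDB, err := geoip2.Open("GeoLite2-ASN.mmdb")
	if err != nil {
		panic(err)
	}
	defer asnDB.Close()

	fmt.Println(annotate(countryDB, asnDB, "198.51.100.7"))
}
```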

Additional PTR and TLS scanning For each IP address, we perform a DNS PTR lookup to assist with some of our subsequent consistency characterization (a process we detail in Section 4.2). Another complication in the annotation exercise relates to the fact that in practice a single IP address might host many websites via HTTP or HTTPS (i.e., virtual hosting). As a result, when Censys retrieves certificates via port 443 (HTTPS) across the entire IPv4 address space, the certificate that Censys retrieves might differ from the certificate that the server would return in response to a query via TLS’s Server Name Indication (SNI) extension. Such a discrepancy might lead Iris to mischaracterize virtual hosting as DNS inconsistency. To mitigate this effect, for each resulting IP address we perform an additional active HTTPS connection using SNI, specifying the name originally queried. We annotate all responses with this information, which we use for answer classification (examined further in Section 4.4).
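
The follow-up SNI connection can be sketched with the standard library as below: dial the resolved IP on port 443, present the originally queried name via SNI, and record the leaf certificate without verifying the chain, since trust and name checks happen later during classification. Helper names, the placeholder IP, and the timeout are illustrative.

```go
// Sketch: fetch the TLS certificate an IP serves when the originally queried
// domain is presented via SNI. Chain verification is deliberately skipped
// here; certificates are evaluated later during classification.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net"
	"time"
)

// certWithSNI dials ip:443, sends domain in the SNI extension, and returns
// the leaf certificate presented by the server.
func certWithSNI(ip, domain string) (*x509.Certificate, error) {
	dialer := &net.Dialer{Timeout: 10 * time.Second}
	conn, err := tls.DialWithDialer(dialer, "tcp", net.JoinHostPort(ip, "443"), &tls.Config{
		ServerName:         domain, // SNI: the name originally queried
		InsecureSkipVerify: true,   // record the certificate even if untrusted
	})
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return nil, fmt.Errorf("no certificate presented by %s", ip)
	}
	return certs[0], nil
}

func main() {
	cert, err := certWithSNI("198.51.100.44", "example.com") // placeholder IP
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("subject:", cert.Subject.CommonName)
}
```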

Identifying DNS Manipulation To determine whether a DNS response is manipulated, Iris relies on two types of metrics: consistency metrics and independent verifiability metrics. We say that a response is correct if it satisfies any consistency or independent verifiability metric; otherwise, we classify the response as manipulated. In this section, we outline each class of metrics as well as the specific features we develop to classify answers. The rest of this section defines these metrics; Section 4.4 explores the efficacy of each of them.

Consistency Access to a domain should have some form of consistency, even when accessed from various global vantage points. This consistency may take the form of network properties, infrastructure attributes, or even content. We leverage these attributes, both in relation to control data and across the dataset itself, to classify DNS responses.

Consistency Baseline: Control Domains and Resolvers Central to our notion of consistency is having a set of geographically diverse resolvers we control that are (presumably) not subject to manipulation. These controls give us a set of high-confidence correct answers we can use to identify consistency across a range of IP address properties. Geographic diversity helps ensure that area-specific deployments do not cause false positives. For example, several domains in our dataset use different content distribution network (CDN) hosting infrastructure outside North America.

As part of our measurements we insert domain names we control, with known correct answers. We use these domains to ensure a resolver reliably returns unmanipulated results for non-sensitive content (e.g., not a captive portal).

For each domain name, we create a set of consistency metrics by taking the union of each metric across all of our control resolvers. For example, if Control A returns the answers 192.168.0.10 and 192.168.0.11 and Control B returns 192.168.0.12, we create a consistent IP set of (192.168.0.10, 192.168.0.11, 192.168.0.12). We say the answer is “correct” (i.e., not manipulated) if, for each metric, the answer is a non-empty subset of the controls. Returning to our IP example, if a global resolver returns the answer (192.168.0.10, 192.168.0.12), it is identified as correct. When a request returns multiple records, we check all records and consider the reply good if any response passes the appropriate tests.

Additionally, unmanipulated passive DNS [10] data collected simultaneously with our experiments across a geographically diverse set of countries could enhance (or replace) our consistency metrics. Unfortunately, we are not aware of such a dataset being available publicly.
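
The subset rule from the example above is simple to state in code; the following sketch builds the union of control answers for one metric (IP addresses here) and accepts a test response whose non-empty answer set falls within that union. Types and names are illustrative.

```go
// Sketch: build a per-domain control set for one metric (here, IP addresses)
// and test whether a response's answers form a non-empty subset of it.
package main

import "fmt"

// controlSet unions the answers observed across all control resolvers.
func controlSet(controlAnswers [][]string) map[string]bool {
	set := map[string]bool{}
	for _, answers := range controlAnswers {
		for _, a := range answers {
			set[a] = true
		}
	}
	return set
}

// consistent reports whether answers is a non-empty subset of controls.
func consistent(answers []string, controls map[string]bool) bool {
	if len(answers) == 0 {
		return false
	}
	for _, a := range answers {
		if !controls[a] {
			return false
		}
	}
	return true
}

func main() {
	controls := controlSet([][]string{
		{"192.168.0.10", "192.168.0.11"}, // Control A
		{"192.168.0.12"},                 // Control B
	})
	// Matches the example above: this response is classified as correct.
	fmt.Println(consistent([]string{"192.168.0.10", "192.168.0.12"}, controls))
	fmt.Println(consistent([]string{"203.0.113.7"}, controls))
}
```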

IP Address The simplest consistency metric is the IP address or IP addresses that a DNS response contains.

Autonomous System / Organization In the case of geographically distributed sites and services, such as those hosted on CDNs, a single domain name may return different IP addresses as part of normal operation. To attempt to account for these discrepancies, we also check whether different IP addresses for a domain map to the same AS we see when issuing queries for the domain name through our control resolvers. Because a single AS may have multiple AS numbers (ASNs), we consider two IP addresses with either the same ASN or AS organization name as being from the same AS. Although many responses will exhibit AS consistency even if individual IP addresses differ, even domains whose queries are not manipulated will sometimes return inconsistent AS-level and organizational information as well. This inconsistency is especially common for large service providers whose infrastructure spans multiple regions and continents and is often the result of acquisitions. To account for these inconsistencies, we need additional consistency metrics at higher layers of the protocol stack (specifically HTTP and HTTPS), described next.
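
As a small illustration (with assumed field names and documentation-range ASNs), two annotated addresses can be treated as AS-consistent when either their AS numbers or their AS organization names match:

```go
// Sketch: treat two annotated IP addresses as coming from the same AS when
// either the AS number or the AS organization name matches.
package main

import (
	"fmt"
	"strings"
)

type ASInfo struct {
	ASN uint
	Org string
}

func sameAS(a, b ASInfo) bool {
	if a.ASN != 0 && a.ASN == b.ASN {
		return true
	}
	return a.Org != "" && strings.EqualFold(a.Org, b.Org)
}

func main() {
	control := ASInfo{ASN: 64500, Org: "Example Hosting"}
	observed := ASInfo{ASN: 64501, Org: "EXAMPLE HOSTING"} // different ASN, same organization
	fmt.Println(sameAS(control, observed))
}
```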

HTTP Content If an IP address is running a webserver on port 80, we include a hash of the content returned as an additional consistency metric. These content hashes come from a port 80 IP address Censys crawl. This metric effectively identifies sites with limited dynamic content. As discussed in Section 4.4, this metric is also useful in identifying sites with dynamic content but shared infrastructure. For example, as these hashes are based on HTTP GET fetches using an IP address as the Host in the header, this fetch uniquely fingerprints and categorizes CDN failures or default host pages. In another example, much of Google’s web hosting infrastructure will return the byte-wise identical redirection page to http://www.google.com/ for HTTP GETs without a valid Google host header. These identical pages allow us to identify Google resolutions as correct even for IP addresses acting as a Point-of-Presence.
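
A rough sketch of this fingerprint follows: fetch http://&lt;ip&gt;/ so that the Host header is the IP itself, mirroring an IP-based crawl, and hash the body; equal hashes indicate the same default content. The hash choice, size cap, and helper names are assumptions.

```go
// Sketch: fingerprint the default page an IP serves over port 80 by hashing
// the body of a GET for http://<ip>/ (the Host header is the IP itself, as
// in an IP-based crawl). Equal hashes indicate the same served content.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"time"
)

func contentHash(ip string) (string, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	req, err := http.NewRequest("GET", "http://"+ip+"/", nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20)) // cap at 1 MiB
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	h, err := contentHash("198.51.100.80") // placeholder IP
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(h)
}
```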

HTTPS Certificate We label a response as correct if the hash of the HTTPS certificate presented upon connection matches that of an IP returned via our controls. Note this is not an independent verifiability metric, as the certificates may or may not be trusted, and may not even be correct for the domain.

PTRs for CDNs From our control data, we classify domains as hosted on particular CDNs based on PTR, AS, and certificate information. We consider a non-control response as consistent if the PTR record for that response points to the same CDN.

Independent Verifiability In addition to consistency metrics, we also define a set of metrics that we can independently verify using external data sources, such as the HTTPS certificate infrastructure. We describe these methods below.

HTTPS Certificate We consider a DNS response to be correct, independent of controls, if the IP address presents a valid, browser-trusted certificate for the correct domain name when queried without SNI. We further extend this metric to allow for common configuration errors, such as returning certificates for *.example.com when requesting example.com.
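
A sketch of this check over certificates captured during scanning is shown below: verify the chain against a root store and accept the response if the leaf is valid for the queried name, additionally tolerating a certificate that only lists the *.domain wildcard when the bare domain was queried. Helper names are illustrative, and the root store used in practice would be pinned rather than taken from the local system.

```go
// Sketch: decide whether a captured certificate chain is browser-trusted and
// valid for the queried domain, tolerating the common case where only the
// "*.domain" wildcard name is present on the certificate.
package main

import (
	"crypto/x509"
	"fmt"
)

// trustedForDomain verifies leaf (with its intermediates) against the system
// root store and checks the host name, also accepting "*.domain" wildcards
// when the bare domain was queried.
func trustedForDomain(leaf *x509.Certificate, intermediates []*x509.Certificate, domain string) bool {
	pool := x509.NewCertPool()
	for _, c := range intermediates {
		pool.AddCert(c)
	}
	// The chain must build to a trusted root; names are checked separately
	// below so we can allow the wildcard relaxation.
	if _, err := leaf.Verify(x509.VerifyOptions{Intermediates: pool}); err != nil {
		return false
	}
	if leaf.VerifyHostname(domain) == nil {
		return true
	}
	// Approximate the relaxation for certificates that only list
	// *.example.com: a representative subdomain would match such a wildcard.
	return leaf.VerifyHostname("www."+domain) == nil
}

func main() {
	// leaf and intermediates would come from the Censys snapshot or the SNI
	// scans described above; nothing is fetched in this offline sketch.
	fmt.Println("classification helper: trustedForDomain(leaf, intermediates, domain)")
}
```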

HTTPS Certificate with SNI We add an additional metric that checks whether the certificate returned by our follow-up SNI-enabled scan of an IP address is valid, browser-trusted, and correct for the domain name originally queried.

Limitations To facilitate global coverage in our measurements, our method has limitations that constrain both its scope and its results.

Localized Manipulation Since Iris relies entirely on open infrastructure resolvers that we do not control, in regions with few resolvers, we cannot differentiate between localized manipulation by the resolver’s operator and ISP or country-wide manipulation. Analysis of incorrect results focusing on consistency across ISP or country, or examination of webpage content, could aid in identifying localized manipulation.

Domain Bias From our set of infrastructure resolvers, we measure manipulation of the CLBL and a subset of Alexa top sites. Although the CLBL is a community-based effort to identify sensitive content globally, by its very nature it is not complete. URLs and domains are missing, and sensitive content may change faster than the list is updated. Similarly, the list may exhibit geographic bias based on the language of the project and who contributes to it. This bias could affect the relative volume and scope of manipulation that Iris can detect.

Evasion Although we focus on manipulation at ISP or country scale, an active adversary can still attempt to evade our measurements. Upstream resolvers could use EDNS Client Subnet [32] to only manipulate results for certain target IP ranges, or ISP resolvers could choose to manipulate only their own customers. Country-wide firewalls that perform injection could identify our scanning IP addresses and either not inject results or block our communication entirely. An adversary could also exploit our consistency metrics and inject incorrect IP addresses within the same AS as the targets.

Resolver Datasets    Total Resolvers    Number of Countries    Median / Country
All Usable           4,197,543          232                    659.5
Ethically Usable     6,564              157                    6.0
Experiment Set       6,020              151                    6.0

Table 4.1: DNS resolver datasets. We identify all correctly functioning open resolvers across the IPv4 address space. The experiment set consists of resolvers that passed additional functional tests beyond our basic scan. Note that the number of countries includes dependent territories.

Geolocation Error We rely on Censys [43] and MaxMind [98] for geolocation and AS labeling of infrastructure resolvers to perform country or ISP-level aggregation. Incorrect labeling could cause us to identify country-wide manipulation as incomplete (false negatives), or to identify manipulation in countries where it is not present (false positives).

4.3 Dataset

In this section, we characterize the data collected and how we processed it to obtain the results used in our analysis.

Open Resolver Selection We initially identified a large pool of open DNS resolvers through an Internet-wide ZMap scan using our DNS extension to ZMap in January 2017. In total, 4.2 million open resolvers responded with a correct answer to our scan queries. This number excludes resolvers that replied with valid DNS responses but had either a missing or incorrect IP resolution for our scan’s query domain.

The degree to which we can investigate DNS manipulation across various countries depends on the geographic distribution of the selected DNS resolvers. By geolocating this initial set of resolvers using Censys [43] and MaxMind [98], we observed that these resolvers reside in 232 countries and dependent territories,² with a median of 659 resolvers per country. Due to the ethical considerations we outlined in Section 4.2, we restrict this set of resolvers to 6,564 infrastructure resolvers, in 157 countries, again with a median of 6 resolvers per country. Finally, we remove unstable or otherwise anomalous resolvers; Section 4.3 describes this process in more detail. This filtering reduces the set of usable resolvers to 6,020 in 151 countries, with a median of 6 resolvers in each. Table 4.1 summarizes the resulting population of resolvers; Table 4.2 shows the breakdown across continents.

We also use 4 geographically diverse resolvers for controlled experiments: the 2 Google Public DNS servers [62], a German open resolver hosted on AWS, and a resolver that we manage at the University of California, Berkeley.

²Countries and dependent territories are defined by the ISO 3166-1 alpha-2 codes, the granularity of MaxMind’s country geolocation.

Resolver Dataset     AF    AS    EU    NA    OC    SA
All Usable           55    49    52    41    21    14
Ethically Usable     29    42    42    25     8    11
Experiment Set       26    41    41    24     8    11

Table 4.2: Number of countries (and dependent territories) containing usable resolvers by continent. AF=Africa, AS=Asia, EU=Europe, NA=North America, OC=Oceania/Australia, SA=South America.

Response Datasets    Total Responses    Number of Resolvers    Number of Domains
All Responses        14,539,198         6,564                  2,330
After Filtering      13,594,683         6,020                  2,303

Table 4.3: DNS response dataset before and after filtering problematic resolvers, domains, and failed queries.

Domain Selection We investigate DNS manipulation for both domains known to be censored and domains for popular websites. We began with the Citizen Lab Block List (CLBL) [29], consisting of 1,376 sensitive domains. We augment this list with 1,000 domains randomly selected from the Alexa Top 10,000, as well as 3 control domains we manage that should not be manipulated. Due to overlap between the two domain sets, our combined dataset consists of 2,330 domains. We excluded 27 problematic domains that we identified through our data collection process, resulting in our final population of 2,303 domains.

Response Filtering We issued 14.5 million DNS A record queries for our 2,330 pre-filtered domains, across 6,564 infrastructure and control open resolvers during a 2 day period in January 2017. We observed various erroneous behaviors that required further filtering. Excluding these degenerate cases reduced our dataset to 13.5 million responses across 2,303 domains and 6,020 resolvers, as summarized in Table 4.3. The rest of this section details this filtering process.

Resolvers We detected that 341 resolvers stopped responding to our queries during our experiment. An additional 202 resolvers incorrectly resolved our control domain names, despite previously answering correctly during our Internet-wide scans. The common cause of this behavior was rate limiting, as our Internet-wide scans queried resolvers only once, whereas our experiments necessitated repeated queries. We identified another problematic resolver that exhibited a query failure rate above 70% due to aggressive rate limiting. We eliminated these resolvers and their associated query responses from our dataset, reducing the number of valid responses by 510K.

Failure Type     Count      % of Responses
Timeout          140,551    0.97%
Server Fail      107,826    0.74%
Conn Refused     7,823      0.05%
Conn Error       3,686      0.03%
Truncated        3,451      0.02%
NXDOMAIN         1,713      0.01%

Table 4.4: Breakdown of the 265,050 DNS responses that returned a non-success error code.

Domains Our control DNS resolvers could not resolve 15 domain names. We excluded these and their associated 90K query responses from our dataset. We removed another 12 domains and their 72K corresponding query responses as their DNS resolutions failed an automated sanity check; resolvers across numerous countries provided the same incorrect DNS resolution for each of these domains, and the IP address returned was unique per domain (i.e., not a block page or filtering appliance). We did not expect censors to exhibit this behavior; a single censor is not likely to operate across multiple countries or geographic regions, and manipulations such as block pages that use a single IP address across countries should also be spread across multiple domains. These domains do not support HTTPS, and exhibit geographically specific deployments. With increased geographic diversity of control resolvers or deployment of HTTPS by these sites, our consistency or verifiability metrics would account for these domains.

Queries We filtered another 256K queries that returned failure error codes; 93.7% of all errors were timeouts and server failures. Timeouts denote connections where the resolver did not respond to our query within 15 seconds. Server failures indicate when a resolver could not recursively resolve a domain within its own pre-configured time allotment (10 seconds by default in BIND). Table 4.4 provides a detailed breakdown of error responses.

Returning an NXDOMAIN response code [108], which informs a client that a domain does not exist, is an obvious potential DNS censorship mechanism. Unfortunately, some CDNs return this error in normal operations, presumably due to rate limiting or client configuration settings. We found that the most prevalent NX behavior occurred in the countries of Tonga and Pakistan; both countries exhibited censorship of multiple content types, including adult and LGBT content. Previous studies have observed NXDOMAIN blocking in Pakistan [108]. These instances comprise a small percentage of overall NXDOMAIN responses. Given the many non-censorship NXDOMAIN responses and the relative infrequency of their use for censorship, we exclude these from our analysis.

Another 72K responses had a SUCCESS response code, but contained no IP address in the response. This failure mode frequently coincided with CNAME responses that could not be resolved further. We excluded these queries. Table 4.5 provides a geographic breakdown of NXDOMAIN responses.

Country               % NXDOMAIN
Tonga                 2.93%
Pakistan              0.37%
Bosnia/Herzegovina    0.12%
Isle of Man           0.04%
Cape Verde            0.04%

Table 4.5: The top 5 countries / dependent territories by the percent of queries that responded with NXDOMAIN.

After removing problematic resolvers, domains, and failed queries, the dataset comprises 13,594,683 DNS responses. By applying our consistency and independent verifiability metrics, we identify 41,778 responses (0.31%) as manipulated, spread across 58 countries (and dependent territories) and 1,408 domains.

4.4 Results

We now evaluate the effectiveness of our DNS manipulation metrics and explore manipulated DNS responses in the context of Internet censorship.

Evaluating Manipulation Metrics To assess the effectiveness of the consistency and independent verifiability metrics, we quantify the ability of each metric to identify unmanipulated responses (to exclude from further investigation). Figure 4.3 shows each metric’s efficacy. The horizontal axis represents the fraction of responses from a particular resolver that are classified as correct by a given metric. The vertical axis indicates the number of resolvers that exhibit that same fraction of correct responses (again under the given metric). For example, almost 6,000 resolvers had roughly 8% of their responses identified as correct under the “Same CDN” metric. A narrow band indicates that many resolvers exhibit similar fractions of correct responses under that metric (i.e., it is more stable). The closer the center mass of a histogram lies to 1.0, the more effective its corresponding metric, since a larger fraction of responses are classified as correct (i.e., not manipulation) using that metric.

The AS consistency metric (“Same AS”) is the most effective: it classified 90% of the DNS responses as consistent. Similarly, identifying matching IP addresses between responses from our control resolvers and our experiment resolvers flagged about 80% of responses as correct across most resolvers. “Same HTTP Page” is also relatively effective, as many geographically distributed deployments of the same site (such as with Points-of-Presence) have either identical content or infrastructure error characteristics (see Section 4.2). This figure also illustrates the importance of SNI, increasing the effectiveness of correct and valid HTTPS certificates from 38% to 55%. The same HTTPS certificate (“Same Cert”) metric turns out to be more effective than simply having a correct certificate (“Correct Cert”), because so many sites incorrectly deploy HTTPS.

Figure 4.3: The ability of each correctness metric to classify responses as correct. The legend (Same CDN, Same Cert w/ SNI, Correct Cert, Same HTTP Page, Correct Cert w/ SNI, Same IP Address, Same Cert, Same AS) is ordered (top to bottom, left to right) by the lines on the graph (left to right).


Manipulated DNS Responses We detect nearly 42,000 manipulated DNS responses; we now investigate the distribution of these responses across resolvers, domains, and countries.

Manipulated responses by resolver Figure 4.4 shows the cumulative fraction of results that return at least a certain fraction of manipulated responses: 88% of resolvers exhibited no manipulation; for 96% of resolvers, we observe manipulation for fewer than 5% of responses. The modes in the CDF highlight differences between resolver subpopulations, which upon further investigation we discovered reflected differing manipulation practices across countries. Additionally, 62% of domains are manipulated by at least one resolver, which is expected given that more than half of our selected domains are sensitive sites on the CLBL. We explore these variations in more detail later in this section.

Figure 4.4: The fraction of responses manipulated, per resolver. For 89% of resolvers, we observed no manipulation.


Manipulated responses by country Previous work has observed that some countries deploy nation-wide DNS censorship technology [9]; therefore, we expected to see groups of resolvers in the same country, each manipulating a similar set of domains. Table 4.6 lists the percent of manipulated responses per resolver, aggregated across resolvers in each country. Resolvers in Iran exhibited the highest degree of manipulation, with a median of 6.02% manipulated responses per Iranian resolver; China follows with a median value of 5.22%. These rankings depend on the domains in our domain list, and may merely reflect that the CLBL contained more domains that are censored in these countries.

Country (# Res.)    Median    Mean     Max       Min
Iran (122)          6.02%     5.99%    22.41%    0.00%
China (62)          5.22%     4.59%    8.40%     0.00%
Indonesia (80)      0.63%     2.81%    9.95%     0.00%
Greece (26)         0.28%     0.40%    0.83%     0.00%
Mongolia (6)        0.17%     0.18%    0.36%     0.00%
Iraq (7)            0.09%     1.67%    5.79%     0.00%
Bermuda (2)         0.04%     0.04%    0.09%     0.00%
Kazakhstan (14)     0.04%     0.30%    3.90%     0.00%
Belarus (18)        0.04%     0.07%    0.30%     0.00%

Table 4.6: Top 10 countries by median percent of manipulated responses per resolver. We additionally provide the mean, maximum, and minimum percent for resolvers in each country. The number of resolvers per country is listed with the country name.

The top 10 countries shown in Table 4.6 all have at least one resolver that does not manipulate any domains; IP address geolocation inaccuracy may partially explain this surprising finding. For example, uncensored resolvers in Hong Kong may be incorrectly labeled as Chinese. Additionally, for countries that do not directly implement the technical manipulation mechanisms but rather rely on individual ISPs to do so, the actual manifestation of manipulation may vary across ISPs within a single country. Localized manipulation by resolver operators in countries with few resolvers could also influence these results. Section 4.4 investigates these factors further.

Figure 4.5 shows the representation of responses in our dataset by country. For example, the leftmost pair of bars shows that, while less than 5% of all responses in our dataset came from Iranian resolvers, the responses that we received accounted for nearly 40% of manipulated responses in the dataset. Similarly, Chinese resolvers represented 1% of responses in the data but contributed to 15% of the manipulated responses. In contrast, 30% of our DNS responses came from resolvers in the United States, but accounted for only 5% of censored responses.

Table 4.7 shows the breakdown of the top manipulated responses, by the IP address that appears in the manipulated answer. The top two special-purpose (i.e., private) IP addresses appear in the majority of responses within Iran. The third most common response is an OpenDNS (a DNS filtering and security product [28]) blockpage indicating adult content. The fourth most frequent response is an IP address hosting an HTTP error page known to be used in DNS manipulation in Turkey [18].

Private and special-purpose IPv4 addresses in manipulated DNS responses Of the roughly 42,000 manipulated DNS responses, 17,806 correspond to special-purpose IPv4 addresses as defined by RFC 6890 [33]; the remaining 23,972 responses corresponded to addresses in the public IP address space. Table 4.8 shows the extent to which countries return private IP addresses in responses, for the top 10 countries ranked by the relative amount of DNS manipulation compared to the total number of results from that country. For example, we observed more manipulated responses from Turkey than Iraq, but Iris used more open DNS resolvers in Turkey, so observed frequencies require normalization. Here, we notice that countries that manipulate DNS tend to either return only special-purpose IP addresses in manipulated responses (as in the case of Iran, Iraq, and Kuwait) or only public IP addresses (China).

Figure 4.6 presents the distribution of observed public IP addresses across manipulated responses in our dataset. The most frequently returned public IP address, an OpenDNS blockpage, constituted almost 15% of all manipulated responses containing public IP addresses. The top ten public IP addresses accounted for nearly 60% of responses. Many IP answers have been observed in previous studies on Chinese DNS censorship [9, 50]. These addresses are seemingly arbitrary; they host no services, not even a fundamental webpage. The 10 most frequent Chinese responses constituted almost 75% of Chinese responses. The remaining 25% are spread over a long tail of nearly 1,000 seemingly arbitrary non-Chinese IP addresses.

Figure 4.5: The fraction of all responses in our dataset from each country (blue), and the fraction of all manipulated responses in our dataset from the corresponding country (red). Countries shown: Iran, China, Turkey, Russia, France, Romania, Indonesia, Argentina, United States, New Zealand, and All Others.

Answer            Results    Names    Category
10.10.34.36       12,144     140      Private
10.10.34.34       4,566      776      Private
146.112.61.106    3,495      801      OpenDNS Adult
195.175.254.2     3,137      129      HTTP Error Page
93.46.8.89        1,571      88       China*
118.97.116.27     1,212      155      Safe / Filtering
243.185.187.39    1,167      88       China*
127.0.0.1         876        267      Private
95.53.248.254     566        566      Resolver’s Own IP
95.53.248.254     565        565      Resolver’s Own IP
8.7.198.45        411        75       China*
202.169.44.80     379        113      Safe / Filtering
212.47.252.200    371        371      Resolver’s Own IP
212.47.254.200    370        370      Resolver’s Own IP
213.177.28.90     352        22       Gambling Blockpg
208.91.112.55     349        320      Blockpg
180.131.146.7     312        145      Safe / Filtering
203.98.7.65       303        78       China*
202.182.48.245    302        100      Adult Blockpg
93.158.134.250    258        86       Safe / Filtering

Table 4.7: Most common manipulated responses by volume, with manual classification for public, non-resolver IP addresses. The category “China*” denotes IP addresses previously observed by Farnan et al. in 2016 [50].

Country (# Res.)    % Incor.    % Pub.
Iran (122)          6.02%       0.01%
China (62)          4.52%       99.46%
Indonesia (80)      2.74%       95.08%
Iraq (7)            1.68%       1.49%
New Zealand (16)    1.59%       100.00%
Turkey (192)        0.84%       99.81%
Romania (45)        0.77%       100.00%
Kuwait (10)         0.61%       0.00%
Greece (26)         0.41%       100.00%
Cyprus (5)          0.40%       100.00%

Table 4.8: Percent of public IP addresses in manipulated responses, by country. Countries are sorted by overall frequency of manipulation.

Figure 4.6: Manipulated but public IP addresses in our dataset. The horizontal axis is sorted by the most common IP.

Manipulation Within Countries Figure 4.7 shows the DNS manipulation of each domain by the fraction of resolvers within a country, for the 10 countries with the most normalized amount of manipulation. Each point represents a domain; the vertical axis represents the fraction of resolvers in that country that manipulate it. Shading shows the density of points for that part of the distribution. The plot reveals several interesting phenomena. One group of domains is manipulated by about 80% of resolvers in Iran, and another group is manipulated by fewer than 10% of resolvers. This second group of domains is manipulated by a smaller fraction of resolvers, also returning non-public IP addresses. These effects are consistent with previously noted blackholing employed by DNS manipulation infrastructure [11]; this phenomenon deserves further investigation.

Figure 4.7: The fraction of resolvers within a country that manipulate each domain. Countries shown: Iran, Iraq, China, Turkey, Kuwait, Greece, Cyprus, Indonesia, Romania, and New Zealand; shading indicates the density of domains, with the top 25 domain names marked separately from other domain names.

Similarly, one set of domains in China experiences manipulation by approximately 80% of resolvers, and another set experiences manipulation only half the time. In contrast, manipulation in Greece and Kuwait is more homogeneous across resolvers.

Heterogeneity across a country may suggest a situation where different ISPs implement filtering with different block lists; it might also indicate variability across geographic region within a country. The fact that manipulation rates vary even among resolvers in a certain group within a country may indicate either probabilistic manipulation, or the injection of manipulated responses (a phenomenon that has been documented before [9]). Other more benign explanations exist, such as corporate firewalls (which are common in the United States), or localized manipulation by resolver operators. Ceilings on the percent of resolvers within a country performing manipulation, such as no domain in China experiencing manipulation across more than approximately 85% of resolvers, suggest IP geolocation errors are common.

Figure 4.8: The number of countries (or dependent territories) that block each domain with observed manipulated responses, sorted by manipulation prevalence.

Commonly Manipulated Domains

Commonly manipulated domains across countries Many domains experienced manipulation across a range of countries. Figure 4.8 shows a CDF of the number of countries (or dependent territories) for which at least one resolver manipulated each domain. 30% of domains were manipulated in only a single country, while 70% were manipulated in 5 or fewer countries. No domain was manipulated in more than 19 countries.

Table 4.9 highlights domains that experience manipulation in many countries (or dependent territories). The 2 most manipulated domains are both gambling websites, each experiencing censorship across 19 different countries. DNS resolutions for pornographic websites are similarly manipulated, accounting for the next 3 most commonly affected domains. Peer-to-peer file sharing sites are also commonly targeted, particularly The Pirate Bay. The Tor Project [137] DNS domain is the most widely interfered-with domain among anonymity and censorship tools, manipulated across 12 countries. Citizen Lab [30] also experienced manipulation across 4 countries. We note that www.google.com is impacted across more countries than google.com, unsurprising since all HTTP and HTTPS queries to google.com immediately redirect to www.google.com; for example, China manipulates www.google.com queries but disregards those for google.com. This result underscores the need for domain datasets that contain complete domains and subdomains, rather than simply second-level domains.

We also note that commonly measured sites such as The Tor Project, Google, and Twitter experience manipulation across significantly fewer countries than some sites. Such disparity points to the need for a diverse domain dataset. China focuses its DNS manipulation not just on adult content but also on major English news outlets, such as nytimes.com, online.wsj.com, and www.reuters.com. China is the only country observed to manipulate the DNS responses for these domains; it also censored the Chinese language Wikipedia domain.

Rank    Domain Name           Category         # Countries    # Res.
1       www.pokerstars.com    Gambling         19             251
2       betway.com            Gambling         19             234
3       pornhub.com           Pornography      19             222
4       youporn.com           Pornography      19             192
5       xvideos.com           Pornography      19             174
6       thepiratebay.org      P2P sharing      18             236
7       thepiratebay.se       P2P sharing      18             217
8       xhamster.com          Pornography      18             200
9       www.partypoker.com    Gambling         17             226
10      beeg.com              Pornography      17             183
80      torproject.org        Anon. & cen.     12             159
181     twitter.com           Twitter          9              160
250     www.youtube.com       Google           8              165
495     www.citizenlab.org    Freedom expr.    4              148
606     www.google.com        Google           3              56
1086    google.com            Google           1              5

Table 4.9: Domain names manipulated in the most countries (or dependent territories), ordered by number of countries with manipulated responses.

Commonly manipulated categories Table 4.10 shows the prevalence of manipulation by CLBL categories. We consider a category as manipulated within a country if any resolver within that country manipulates a domain of that category. Domains in the Alexa Top 10K experienced the most manipulation; these domains did not appear in the CLBL, which highlights the importance of measuring both curated lists from domain experts as well as broad samples of popular websites. Although no single domain experiences manipulation in more than 19 countries, several categories experience manipulation in more than 30 countries, indicating that while broad categories appear to be commonly targeted, the specific domains may vary country to country.

To study how manipulated categories vary across countries, we analyzed the fraction of resolvers within each country that manipulate a particular category. The top categories vary extensively across countries. Table 4.11 shows the most frequently manipulated categories for the top 10 countries by normalized amounts of manipulation. The top category of manipulated content in Iran, “provocative attire,” is not a category across any of the other top 10 countries. Manipulation of domains randomly selected from Alexa but not in the CLBL (“Alexa Top 10k”) is prevalent across numerous countries, again reinforcing the need for diverse domain datasets. Anonymity and censorship tools are manipulated extensively across 85% of resolvers in China, but not across the rest of the top 10. Pornography and gambling sites are manipulated throughout.

Rank    Domain Category       # Countries    # Resolv.
1       Alexa Top 10k         36             442
2       Freedom of expr.      35             384
3       P2P file sharing      34             394
4       Human rights          31             288
5       Gambling              29             377
6       Pornography           29             342
7       Alcohol and drugs     28             274
8       Anon. & censor.       24             303
9       Hate speech           22             158
10      Multimedia sharing    21             293
20      Google                16             234
34                            10             175
38      Twitter               9              160

Table 4.10: Top 10 domain categories, ordered by number of countries (or dependent territories) with manipulated answers.

4.5 Iris Summary

Internet censorship is widespread, dynamic, and continually evolving; understanding the nature of censorship thus requires techniques to perform continuous, large-scale measurement. Unfortunately, the state-of-the-art techniques for measuring manipulation—a common censorship technique—rely on human volunteers, limiting the scale and frequency of measurements. This work introduces a method for measuring DNS manipulation on a global scale by using as vantage points open DNS resolvers that form part of the Internet’s infrastructure.

The major contributions of this chapter are: (1) Iris: a scalable, ethical system for measuring DNS manipulation; (2) an analysis technique for disambiguating natural variation in DNS responses (e.g., due to CDNs) from more nefarious types of manipulation; and (3) a large-scale measurement study that highlights the heterogeneity of DNS manipulation across countries, resolvers, and domains. Notably, we find that manipulation is heterogeneous across DNS resolvers even within a single country. Iris supports regular, continuous measurement, which will ultimately facilitate tracking DNS manipulation trends as they evolve over time; our next step is to operationalize such measurements to facilitate longitudinal analysis.

Country    Domain Category       % of Resolv.
IR         Provocative attire    90.98%
           Alexa Top 10k         90.16%
           Freedom of expr.      90.16%
CN         Alexa Top 10k         85.48%
           Freedom of expr.      85.48%
           Anon. & censor.       85.48%
ID         Pornography           57.50%
           Alexa Top 10k         56.25%
           P2P file sharing      52.50%
IQ         Political Blog        57.14%
           Alexa Top 10k         28.57%
           Freedom of expr.      28.57%
NZ         Alexa Top 10k         12.50%
           Freedom of expr.      12.50%
           P2P file sharing      12.50%
TR         Alexa Top 10k         18.23%
           Freedom of expr.      17.71%
           Pornography           16.67%
RO         Alexa Top 10k         37.78%
           Gambling              37.78%
           Freedom of expr.      2.22%
KW         Alexa Top 10k         10.00%
           Freedom of expr.      10.00%
           P2P file sharing      10.00%
GR         Gambling              50.00%
           Alexa Top 10k         46.15%
CY         Alexa Top 10k         40.00%
           Gambling              40.00%

Table 4.11: Breakdown of the top 3 domain categories experiencing manipulation, per country. Countries are ordered by the relative amount of manipulated responses for that country. Both Greece (GR) and Cyprus (CY) only experience manipulated responses across 2 categories.

Chapter 5

Censorship Discussion and Conclusion

In Part I we developed Augur (Chapter 3) [116, 117] and Iris (Chapter 4) [119, 120], two methods and accompanying systems that allow us to ethically, and without the need for volunteers, measure censorship behavior globally at the TCP/IP and DNS layers. From our initial deployments, we verified their accuracy, identified new censored content, and described heterogeneous censorship behavior within countries, each at a global scale.

With the existence of tools like Augur and Iris, we can now begin to develop longitudinal, continuous views of Internet censorship, censors, and their behavior. This can provide us with the ability to answer numerous questions previously difficult to address. These include:

• How do Internet censorship trends vary over time? Understanding how blocking changes over time allows us to identify events and actions that influence a censor’s behavior. These trends are critical, both qualitatively and quantitatively, for a range of social science research aimed at understanding and addressing censorship.

• How do Internet censors respond to circumvention, and on what time scale? Related to censorship trends is how censors respond to active circumvention of their efforts. Understanding how censors engage and on what time scales provides empirical grounding for evaluating various defenses and enables constructing more effective circumventions.

• Does censorship vary between regions within countries? Examining how censorship varies between regions provides insights into how censorship systems are deployed, both technically and organizationally. These insights can better inform circumvention and policy interventions. Augur and Iris’s use of multiple vantage points across a range of networks within a country allows us to actively explore and understand these deployments beyond our initial results.

Being able to provide a comprehensive empirical footing for these questions can enable new lines of research as well as serve as an invaluable resource for social scientists.

Next steps

The continuous, widespread measurements that we can collect with these techniques can complement anecdotes, news reports, and policy briefings to ensure that we can support future assessments of Internet filtering with sound, comprehensive data. Part of this transition to practice involves further developing the system that we have designed to facilitate ongoing operation, including automating the validation of the measurements that we collect and the correlation with other datasets and tools. We aim to ultimately compare results produced by multiple methods, including datasets from volunteer-based measurement platforms.

Part II

Understanding Advertising Abuse

Chapter 6

Advertising Abuse Introduction and Related Work

6.1 Introduction

Profit drives modern cybercrime. Investments in malware, botnets, bullet-proof hosting, domain names, and other infrastructure must all be justified by the greater revenue brought in by the scams that use them. Thus, scammers relentlessly innovate to identify more lucrative niches to maximize their returns (e.g., counterfeit goods, fake anti-virus, ransomware, credit card theft, cryptocurrency mining, etc.). Monetization, however, also presents some of the greatest risks, since it represents both a single point of vulnerability in the business model and a potential means of forensic attribution [69, 89, 99]. Thus, the ideal monetization strategy for an Internet scammer would not only tap into great wealth, but would also effectively launder the money trail as well. As we will describe in this chapter, the advertising ecosystem is remarkably well-suited to this need, and advertising abuse provides perhaps the non plus ultra of all such monetization schemes.

First and foremost, online advertising represents an enormous revenue stream, generating over $88 billion in revenue in 2017 and growing at over 20% per year [122]. Moreover, online advertising is a low-friction market designed to engage the broadest possible set of participants, and thus presents few barriers-to-entry for potential bad actors. Unsurprisingly, criminal groups have developed a range of techniques for generating synthetic advertisement clicks for profit at the expense of legitimate advertisers and ad networks [38]. Indeed, such click fraud accounts for as much as 10% of all advertising clicks, potentially defrauding advertisers of hundreds of millions of dollars annually.

As a further difficulty, the ecosystem of online advertising is highly complex, and while occasionally traffic flows directly from publisher to advertiser, in the common case instead a vast array of middlemen—syndicators, subsyndicators, traffic aggregators and resellers—separates the two endpoints of each ad click. While the path of such a click on the Internet is nominally visible (a chain of redirects from one domain to the next), the chain of payment remains largely opaque. Those on the buy-side and the sell-side of the traffic ecosystem negotiate their contracts bilaterally, with a wide range of terms and conditions. Thus, no single party comes remotely close to having comprehensive visibility of who gets paid what for any given click. Finally, as we describe, click fraud platforms can engage with a wide range of ad networks, who in turn mix this traffic with other, more legitimate sources, further complicating any forensic analysis.

To better understand and defend against advertising abuse broadly, in this chapter we systematically examine multiple monetization strategies, attacks, and advertising abuse botnets—from both an external and internal ad network perspective.

We begin in Chapter 7 by examining the operation and underlying economic models of two families of common clickbots, "Fiesta" and "7cy." By operating the malware specimens in a controlled environment we reverse-engineer the protocols used to direct the clickbots in their activities. We then devise a milker program that mimics clickbots requesting instructions, enabling us to extract over 360,000 click fraud directives from the clickbots' control servers.
We report on the functioning of the clickbots, the steps they employ to evade detection, variations in how their controllers operate them depending on their geographic locality, and the differing economic models underlying their activity.

Next, in Chapter 8 we explore the timeline and history of ZeroAccess (ZA). ZeroAccess was one of the largest botnets ever in operation, controlling an estimated 1.9 million infected computers at its peak [109]. ZeroAccess was particularly known for monetization primarily via click fraud, with losses to advertisers estimated at $2.7 million per month [149].

Chapter 9 explores the technical functionality of the ZeroAccess malware itself. Using a combination of code analysis and empirical measurement, we document the distinct command-and-control protocols used by each module, their infrastructure, and how they operate to defraud online advertisers. This work served as the legal basis of a law enforcement-driven takedown (Section 8.1) of the botnet [90, 104].

Chapter 10 further expands our understanding of ZeroAccess-driven advertising abuse by leveraging an insider perspective. This chapter illuminates the intricate nature of ZeroAccess fraud using a broad range of data sources, including peer-to-peer measurements, command-and-control telemetry, and contemporaneous click data from one of the top ad networks. From this multifaceted approach we construct a view into the scale and complexity of click fraud operations. By leveraging the dynamics associated with the takedown of ZeroAccess in December 2013, we employ this coordinated view to identify "ad units" whose traffic (and hence revenue) primarily derived from ZeroAccess. While it proves highly challenging to extrapolate from our direct observations to a truly global view, by anchoring our analysis in the data for these ad units we estimate that the botnet's fraudulent activities plausibly induced advertising losses on the order of $100,000 per day.

Another facet of the ecosystem of advertising abuse is ad injection, an attack where ads are inserted into a user's browsing by software running on their system [135]. This attack harms tens of millions of users and leverages the complexity of multi-hop ad reseller chains to launder fraudulent traffic, masking it from the visibility of ad networks. In Chapter 11 we work with Google, exploring the population of intermediaries and advertisers that support the abuse ecosystem through analysis of injected click (revenue) chains, thus identifying structural "choke points" that were leveraged for defense.

6.2 Related Work and Background

As background, we provide an overview of the advertising ecosystem on the Web, how attackers defraud Web advertising networks, specifically the methods by which ZeroAccess and other botnets have perpetrated click fraud at scale, and the overall ecosystem of ad abuse malware and botnets.

Web Advertising

Advertising on the Web works in terms of arrangements between advertisers, who wish to display promotional content, and publishers, who receive visits from users who could potentially view and respond to that content. Publishers receive payment for displaying the advertiser's content, which can consist of text, images, video, or other interactive (e.g., Flash-based or JavaScript-based) media. Advertisements generally include links to the advertiser's site to allow interested users to directly engage with the advertiser by clicking on the advertisement and visiting the site.

In practice, advertisers and publishers often do not deal with each other directly. Instead, each contracts with an advertising network that coordinates ad placement between many advertisers and publishers. In a traditional arrangement, an advertiser buys a given volume of advertising from the ad network, usually also specifying a set of keywords defining the context in which to show the ad. Publishers then join an advertising network and display ads provided by the network.

When a user issues a search query, the resulting page includes organic search results, for which the linked Web sites do not pay the search engine, as well as paid search ads, for which the linked Web sites (the advertisers) pay the search engine for inclusion. These ads are typically placed above the organic results or alongside on the right. They are formatted similarly to search results except for a darker shade background and the word "Ad" or "Sponsored" somewhere nearby. Search engines select ads based on the user's search query. The search query is distilled into a group of keywords after normalizing to remove misspellings and resolve ambiguities.

Search ad syndication networks. Search engines partner with thousands of Web sites, services, and applications—collectively called publishers—to extend the reach of the search engine's advertisers to users beyond the search engine. Publishers include blog sites, news sites, niche search engines, ad-supported browser addons, and other ad networks. The publisher sends the user's search query from the publisher's Web site or app, or in the case of blogs and news sites, the context of the article the user is reading, to the search engine's ad server to retrieve relevant ads in exchange for a cut of the advertising revenue. The publisher can display relevant ads by embedding JavaScript provided by the ad server, which directs the user's browser to fetch the ads as illustrated in steps 1–6 in Figure 6.1. Alternatively, the publisher can directly fetch relevant ads through server-to-server communication between the publisher and the ad server (not shown in the figure). If the user clicks the ad, the user is taken to the advertiser's site through a series of redirects as shown in steps 6–11 in the figure. Each interaction at the ad server is logged along with the publisher identifier for the publisher responsible for the traffic.

Publishers can, in turn, (sub)syndicate to other downstream publishers according to the search ad network's policy.

(Figure 6.1 sequence, among the User, Publisher, Ad Server, and Advertiser: 1. Web request; 2. Page with JS; 3. JS requests ads; 4. Logs impression; 5. Returns ads; 6. Clicks on ad; 7. Ad URL request; 8. Logs click; 9. Redirect; 10. Web request; 11. Advertiser page; 12. Clicks buy; 13. Checkout request; 14. Thanks page with conversion JS; 15. Conversion pixel request; 16. Logs conversion.)

Figure 6.1: Anatomy of a typical ad click, showing the various HTTP requests associated with a user clicking on an advertisement, leading them to an advertiser's landing page, and from there possibly to additional interactions.

The downstream publisher requests ads from the intermediate publisher, which then fetches them from the ad server. The search engine pays the intermediate publisher a cut of the ad revenue. The intermediate publisher pays the downstream publisher a smaller cut and retains the rest in exchange for the service it provided. Thus the search engine does not directly deal with the downstream publisher, and in many cases never learns of its existence. Subsyndication arrangements exist to help search ad networks scale to hundreds of thousands of publishers by distributing the management overhead down tiers of aggregators. For example, Google and Bing syndicate search ads to other search engines including Infospace and Yahoo, respectively [60, 103]. Infospace and Yahoo show these ads on Web sites they own and operate, but also subsyndicate to smaller networks like Publishers Clearing House and Chitika respectively, which then further subsyndicate to yet smaller publishers.

Revenue share. Search ads are typically charged on a cost-per-click (CPC) model, i.e., the advertiser pays only if the ad is clicked. For clicks on ads shown by a syndicate publisher, the search engine typically retains 30% of the amount charged to the advertiser, and pays a 70% cut to the publisher [83]. Publishers that sub-syndicate set their own revenue sharing agreements with their downstream publishers.
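To make the flow of money concrete, the following sketch traces how a single charged click might be split across such a chain. It is purely illustrative: the CPC and the cut kept by the intermediate publisher are hypothetical values, and real agreements are negotiated bilaterally as described above.

    def split_click_revenue(cpc, engine_share=0.30, intermediate_keep=0.20):
        """Illustrative split of one charged click across a two-level syndication chain.

        cpc:               amount charged to the advertiser for this click
        engine_share:      fraction retained by the search engine (roughly 30% per the text)
        intermediate_keep: fraction of the downstream payout kept by the intermediate
                           publisher (hypothetical; varies by contract)
        """
        engine = cpc * engine_share
        publisher_payout = cpc - engine              # the ~70% that flows down the chain
        intermediate = publisher_payout * intermediate_keep
        downstream = publisher_payout - intermediate
        return {"search engine": engine,
                "intermediate publisher": intermediate,
                "downstream publisher": downstream}

    # Example: a $0.50 click charged to the advertiser.
    for party, amount in split_click_revenue(0.50).items():
        print(f"{party}: ${amount:.3f}")

The point of the sketch is that the search engine settles only with the intermediate publisher; the downstream publisher never appears in its books, which is the visibility gap that less reputable publishers can exploit.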

Click Fraud

Click fraud is the practice of fraudulently generating clicks on ads without any intention of fruitfully interacting with the advertiser's site. As a result, advertisers lose money, receiving no return on their investment.

There are two primary motivations for click fraud. First, a malicious advertiser can employ click fraud targeting a competitor's ad to deplete their advertising budget [141]. A stronger motivation, however, lies with publishers, who directly profit from ads clicked on their site. Ad networks also profit from click fraud, though reputable ad networks that want to maintain long-term relationships with advertisers will presumably attempt to identify and weed out click fraud activity. Note that it is hard for an ad network to prove that a particular click was fraudulent because it is equivalent to guessing the intent of the user behind the click.

Revenue sharing with (sub)syndicate publishers creates an economic incentive for these publishers to fraudulently attract clicks on ads shown by them. While a well-known publisher would lose reputation if caught engaging in click fraud, followed by financial losses if ejected from the syndication network, this disincentive does not exist for less well-known publishers. Less reputed publishers may not have a reputation to protect, or may be able to reattach to the syndication network at a different point if ejected. As a result, click fraud from relatively unknown publishers has historically been rampant [132].

Click fraud techniques have evolved considerably in the past several years. The direct approach of hiring people to click on ads (termed click farms) [58], or running stand-alone scripts that repeatedly retrieve the URLs associated with ads to simulate user clicks (stand-alone bots) [96], are now easily detected by an ad network's defenses. More complex approaches include search engine hijacking [125], where a malicious in-browser plugin replaces the ad links in results returned for user searches with other ads, and the rise of click fraud botnets like ZeroAccess that coordinate large numbers of malware-infected hosts to fetch and click ads unbeknownst to the user.

A number of case studies have chronicled the evolution of click fraud botnets over the years, ranging from the early Clickbot.A botnet [37] to the TDL-4 botnet [125], the Fiesta and 7cy botnets [105], and ZeroAccess itself [118, 149]. Security researchers have similarly documented more attacks in blog posts and in whitepapers [45, 51, 72]. Unlike previous studies, however, we analyze the ZeroAccess botnet primarily from the perspective of the advertising network, supplemented with insight from operational data derived from its P2P infrastructure and use of DNS. As a result, we are able to present the first comprehensive characterization of monetization, distribution, and activity of a massive click fraud botnet.

At the same time, malicious (sub)syndicated publishers have become better at avoiding detection or identification. Using techniques such as referrer cloaking [42], or fetching ads through other publishers that use the direct server-to-server mechanism, many sub-syndication publishers remain completely anonymous. Intermediate publishers complicit in this activity blend in traffic from non-malicious small publishers to present a cleaner appearance in the aggregate to their syndication parent. This blending results in even reputed publishers unwittingly facilitating click fraud.
Click fraud mitigation and smart-pricing. Due to the large number and relative anonymity of publishers in the (sub)syndicate network, search engines rely primarily on automated means, supplemented with limited manual investigations, to protect their advertisers from click fraud. Rule-based techniques [84, 141], correlation analysis [101, 102, 152], and bluff ads [66] have focused on detecting clicks from infected users before the advertiser is charged. Clustering [38] and anomaly detection [39] have focused on detecting malicious publishers. Not-A-Bot (NAB) [65] combats bot activity through detection mechanisms at the client. In the NAB system, the client machine has a trusted component that monitors keyboard and mouse input to attest to the legitimacy of individual requests to remote parties. In addition to NAB, Juels et al. likewise propose dealing with click fraud by certifying some clicks as "premium" or "legitimate" using an attester instead of attempting to filter fraudulent clicks [77].

Smart-pricing [59] takes an economic approach to mitigating the impact of click fraud. At a high level, smart-pricing maintains a normalization factor between 0 and 1 for each publisher based, in part, on the probability of conversions generated by traffic sent by that publisher [61]. Conversions are actionable business results as defined by the advertiser, such as an online sale or newsletter sign-up. Advertisers inform the ad server of a conversion by embedding JavaScript provided by the ad server on the Web page corresponding to the conversion (Figure 6.1, steps 12–16). The ad server uses cookies to link the conversion to an earlier ad click, which is then associated with the publisher that was responsible for the click. For a simplistic illustration, consider otherwise identical publishers X and Y that both send 100 legitimate users, but X additionally blends in 100 fraudulent clicks. Consider further that 10 legitimate users convert in each case. X therefore has half the conversion probability of Y (5% vs. 10%). Smart-pricing normalizes CPC for X to effectively be half of that for Y, such that the advertiser has the same effective cost per conversion.1 Implicit in the smart-pricing mechanism is the assumption that click fraud traffic will not result in conversions, an assumption that the Serpent module (Section 9.5) of ZeroAccess specifically tries to defeat. Our work informs this debate with the latest generation of click fraud botnets, and identifies potentially fruitful directions for building the next generation of mitigation capabilities.

1 In practice, smart-pricing takes multiple features into account, and applying the normalization factor given dynamic bidding and publisher diversity is not as straightforward.
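The publisher X/Y illustration can be written out directly. The sketch below is our own simplification (actual smart-pricing weighs many more features, per the footnote above); it computes a conversion-based normalization factor and the resulting effective CPC, using a hypothetical $0.50 nominal CPC.

    def smart_price_factor(conversions, clicks, baseline_rate):
        """Normalization factor in [0, 1]: the publisher's conversion rate relative to a baseline."""
        rate = conversions / clicks
        return min(1.0, rate / baseline_rate)

    # Publisher Y: 100 legitimate clicks, 10 conversions (10%, taken as the baseline here).
    # Publisher X: the same 100 legitimate clicks plus 100 fraudulent ones, still 10 conversions (5%).
    baseline = 10 / 100
    nominal_cpc = 0.50  # hypothetical amount charged per click

    for name, clicks, conversions in [("Y", 100, 10), ("X", 200, 10)]:
        factor = smart_price_factor(conversions, clicks, baseline)
        effective_cpc = nominal_cpc * factor
        cost_per_conversion = effective_cpc * clicks / conversions
        print(f"Publisher {name}: factor={factor:.2f}, effective CPC=${effective_cpc:.2f}, "
              f"cost per conversion=${cost_per_conversion:.2f}")

Both publishers end up costing the advertiser the same $5.00 per conversion, which is the equalizing effect described above; the scheme only works so long as fraudulent traffic does not convert.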

Malware and botnets

Clickbots. The only prior academic work analyzing the functionality of a botnet performing click fraud focused on a bot named Clickbot.A [37]. Clickbot.A conducted low-noise click fraud through the use of syndicated search engines. Daswani et al. employed a combination of execution, reverse-engineering, and server source code analysis to determine how Clickbot.A performed fraud. The clickbot used compromised web servers for HTTP-based C&C, and restricted the number of clicks performed for each bot, presumably to limit exposure to the ad agency. In addition to describing the botnet behavior, the investigators estimate the total fraud against Google using economic aspects of syndicate search and click pricing. Our work analyzes multiple clickbot specimens to understand the changes in both the economic model and bot operation in two common clickbots. We expect that criminals are constantly improving bot technology in order to remain competitive against ad agencies that improve their fraud detection. Throughout this work we use Clickbot.A as a reference for comparison.

Detecting Automated Search. Researchers have dedicated considerable effort to methods for differentiating search queries from automated and human sources. Yu et al. observe details of bot behavior in aggregate, using the characteristics of the queries to identify bots [152]. Buehrer et al. focus on bot-generated traffic and click-through designed to influence page-rank [19]. Kang et al. propose a learning approach to identify automated searches [80]. These efforts do not examine bot binaries or C&C structure, focusing instead on techniques for the search engine to identify automated traffic.

Botnets. There is extensive work examining botnets and reverse-engineering bots and their C&C protocols [1, 21, 25, 26, 67, 74, 121]. Dispatcher is a technique that automatically reverse-engineers botnet C&C messages and was used to uncover details in the MegaD spamming botnet C&C protocol [21]. In a later work, Cho et al. used Dispatcher to conduct an extensive infiltration of the MegaD botnet, developing milkers to determine the C&C structure and mine data about the botnet's internal operations [26]. We employ a similar milking technique to explore the C&C structure of the clickbots under study.

Chapter 7

What’s Clicking What? Clickbot Techniques and Innovations

7.1 Introduction

Online advertising forms a vital part of the modern Internet economy. Millions of websites profit from an ecosystem of advertising networks and syndication chains. This widespread adoption of Internet advertising has given rise to systematic fraud. Fraudulent ad clicks, a practice known as click fraud, have steadily increased over recent years. Estimates suggest the fraud rate is as high as 13.5% [54].

In the predominant form of click fraud, a malicious party sets up a website filled with ads and deceives an advertising network into registering ad clicks, earning revenue for each click.1 Clickbots, malware that automatically clicks on ads, can produce this fraudulent traffic. A challenge for clickbot operators is producing traffic in such a way that advertising agencies do not detect it as non-human or fraudulent.

In this chapter, we present an analysis of clickbot techniques and the associated infrastructure that supports click fraud. We obtained samples of two clickbot families, which we named "Fiesta" and "7cy," in order to study the operation of clickbots. We executed the binaries in a controlled environment to prevent harmful side effects, such as actually participating in click fraud. By monitoring the controlled execution of the bots, we reverse-engineered their command and control (C&C) protocols to determine how the bots respond to the commands they receive. This analysis enabled us to develop C&C servers and websites for the bots to interact with, allowing greater exploration of bot behaviors. We then devised a milker program that mimics a clickbot requesting instructions, enabling us to extract 366,945 click fraud directives from the clickbots' control servers. Throughout our analysis, we compare both families of clickbots to Clickbot.A [37] in order to illuminate how clickbots have evolved over time.

At the time of publication, we found two major innovations. The first regards the underlying economic model used by the Fiesta family. In this model a middleman has emerged, acting as a layer of abstraction between ad syndicates and the clickbots that generate traffic. This middleman sits between the traffic originators (bots and users) and the ad syndicates. Fiesta clickbots generate traffic that flows towards this middleman and is then laundered through a series of ad syndicates in an effort to hinder fraud detection. The second innovation concerns complex behavior that attempts to emulate how humans browse the web. 7cy mimics a human browsing the web by both randomizing the click targets as well as introducing human-scale jitter to the time between clicks. Through the use of our milker we also discover that 7cy control servers distribute fraud directives that vary by the geographic region of the bot. Thus, while Fiesta generally uses indirection to accomplish click fraud, 7cy uses random clicks, timing, and location-specific behavior to ostensibly present more realistic browsing behavior.

Section 7.2 describes our methodology for executing bots in a safe environment. Section 7.3 discusses the Fiesta clickbot in depth, and Section 7.4 looks at 7cy. We discuss potential defenses and then summarize in Section 7.5. This chapter is based on work that appeared in Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA) [105].

1 In a second form of click fraud, a malicious party deliberately focuses clicks on a competitor's advertisements in an attempt to exhaust that party's advertising budget [92].

7.2 Methodology

In this section we outline our environment and procedures for executing clickbots without risk of malicious side effects. We studied two "families" of clickbots, Fiesta and 7cy, within our experimental framework.2 Since we obtained samples that did not have useful or consistent anti-virus labels, we took the names Fiesta and 7cy from domain names the bots visited while performing click fraud.

We obtained the Fiesta and 7cy samples by infiltrating several malware Pay-Per-Install (PPI) services as part of a larger study on malware distribution [20]. PPI services use varied means to compromise machines and then distribute malware to the compromised hosts in exchange for payment on the underground market [20]. We used behavioral characteristics to group multiple harvested binaries representing different versions of a given family of malware. We selected Fiesta and 7cy for further analysis because their connection behavior and content was the most interesting. A third potential clickbot family remains unanalyzed.

We executed the clickbots in virtual machines hosted on VMware ESX servers. A central gateway, implemented using Click [85], moderates network access for the VMs. The gateway routes each outbound connection to a "containment server" that decides on a per-flow basis whether traffic is forwarded, dropped, rewritten, or reflected back into the contained network. The containment server makes these decisions based on packet header information as well as packet content. Figure 7.1 shows a simplified view of this approach.

2 The MD5 hashes of the Fiesta specimens are c9ad0880ad1db1eead7b9b08923471d6 and 5bae55ed0eb72a01d0f3a31901ff3b24. The hashes of the 7cy specimens are 7a1846f88c3fba1a2b2a8794f2fac047, b25d0683a10a5fb684398ef09ad5553d, 36ca7b37bb6423acc446d0bf07224696, and 782538deca0acd550aac8bc97ee28a79.

Figure 7.1: Our basic containment architecture, showing how a containment server can interact with an infected VM and a "sink" server. The clickbot's communication is routed through the containment server (1), which can allow the traffic (perhaps modified) out to the Internet (2), or redirect it back into the farm to the sink (3).

Given this architecture, we implemented containment policies that allowed us to execute the clickbot specimens safely. These policies required understanding the basic behavioral patterns of the bots and the other parties involved. This was an iterative process in which we repeatedly examined the bot behavior connection by connection, starting from a default-deny policy. Each step in this iterative process involved manually examining connections and packet payloads by hand to verify the nature of the communication. In some cases, this meant examining data logs, and in other cases it involved manually visiting websites. We white-listed connections deemed safe, and then restarted the bot in order to identify the next communication.

We needed the capability to replay pre-recorded communication and interact with the bot entirely within the farm in order to explore each clickbot's behavior and C&C protocol. Therefore, we built a special HTTP "sink" server that impersonated the destinations of outbound clickbot flows. This server allowed us to respond to network communication using a server under our control rather than releasing the traffic from the farm or dropping the flow. The sink server accepted all incoming connections and returned pre-defined responses as a function of the HTTP header data. Since the bot used HTTP for C&C as well as web browsing, we used the same HTTP server to simulate both C&C and web traffic. Initially, we replayed previously seen C&C responses. Then, we manually explored and perturbed the plain-text C&C protocol and fed these modified responses back into the bots within the farm. Using this technique we reverse-engineered much of the protocol used for both bot families. As we understood more of the C&C protocols, we modified the responses to change the behavior of the bot. Using our capability to replay previously observed communications and explore new communication variants, we accelerated our analysis and developed a broader understanding of the clickbots' behavior.
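As a rough illustration of the sink idea (this is not our actual farm code, and the canned responses and hostnames are placeholders), a small HTTP server can accept every connection and choose a pre-defined reply based on header data such as the Host value:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical canned replies keyed on the Host header; the real policies keyed on
    # more of the header data and were refined iteratively as described above.
    CANNED = {
        "cc.example.test": b"WAIT 60",                        # placeholder C&C-style reply
        "publisher.example.test": b"<html>stub page</html>",
    }
    DEFAULT = b"<html>default sink response</html>"

    class SinkHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = CANNED.get(self.headers.get("Host", ""), DEFAULT)
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        do_POST = do_GET  # treat POSTs the same way for simplicity

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), SinkHandler).serve_forever()

Replaying a previously observed C&C response, or a perturbed variant of it, then amounts to changing the canned reply associated with the relevant host.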

7.3 The Fiesta Clickbot

We selected the Fiesta clickbot as the first specimen for in-depth analysis. The primary innovation we discovered during this evaluation is the monetization opportunity created by separating traffic generation (bots) from ad network revenue. In this model intermediate pay-per-click (PPC) services "launder" clicks generated by bots and then deliver them to ad networks. The intermediate PPC service abstracts the botmaster from advertising networks and is a new development since the investigation into Clickbot.A. We have not found any record of the security community studying the Fiesta clickbot.3

In this section we describe Fiesta's behavior and structure, then discuss an economic model for click fraud based on our observations. We conclude with a discussion of the bot's prevalence.

C&C Structure

There are two key players in the operation of Fiesta: a botmaster running a C&C server, and the self-described "Fiesta Pay-Per-Click (PPC) Profitable Traffic Solution." Fiesta PPC operates a store-front complete with signup and forum services. Although we named this clickbot Fiesta after the Fiesta PPC service, we believe the PPC service and the botmaster to be separate entities with different economic incentives. This relationship is discussed further in Section 7.3.

Immediately upon infection the Fiesta bot establishes an HTTP connection with the bot's C&C server. The server's IP address is statically coded into the binary and remains constant for the lifetime of the bot. Using this server, the bot performs a one-time connection that informs the C&C server that a new infection has occurred. After this initial connection the bot settles into a constant cycle of click fraud. Figure 7.2 gives a high-level overview of Fiesta's behavior for a single click. One click constitutes one act of click fraud and involves multiple queries to the C&C server and multiple interactions with web pages. We observed our Fiesta bot performing about three such clicks per minute.

3 One variant observed was classified by an Anti-Virus vendor as the parasitic Murofet trojan [15]. We believe this to be a case of mistaken identity resulting from a standard Fiesta binary becoming infected and playing host to an instance of Murofet.

Figure 7.2: The basic behavioral architecture of the Fiesta clickbot. The infected PC communicates in turn with the Bot C&C Server, the PPC Ad Server, the PPC Search Engine, and the PPC Click Server: (1) receive query list, (2) receive PPC ad list, (3) perform fake search, (4) click on PPC ad. Communication occurs in the order specified by the numeric labels. This pattern repeats indefinitely.

nerdy shirts, cruise special, fifa world cup qualifiers, potato canon, among the hidden solution, kitchen aid dishwashers, ftv plus, scooby doo online games, yahoo real estate, online video card, cheap credit machine, oakland newspaper, cheap insurance life uk, camera disposable pentax, tapes on self help, celtic names, debt and consolidation, bozeman schools. mt, justin om, dallas nursing institute, anniversary gifts by year, vxb bearings, discount hotel booking, station nightclub fire video

Table 7.1: A sample of search query terms returned to the bot by the Bot C&C Server during a single exchange.

u2 tour
...
  Looking for u2 tour?
  Find u2 tour here!
  http://u2-tour.com
  http://CLICK_SERVER/click.php?c=UNIQUE_ID
  0.0004
  52
...
  Style Fashion Show Review
  Chanel
  http://www.style.com
  http://CLICK_SERVER/click.php?c=UNIQUE_ID
  0.0023
  39
...

Figure 7.3: Sample Fiesta ad C&C returned in response to a fake search for "u2 tour." The C&C syntax is abbreviated as necessary for space.

A fraudulent click begins with Fiesta requesting a list of search query terms from its C&C server, shown as step 1 in Figure 7.2. In response the bot receives several hundred varied terms; Table 7.1 shows some samples. We observed these terms changing frequently, appearing arbitrary in nature, and containing typographical errors. Once the bot receives this query list, it randomly selects one term that it will use for the remainder of this click.

After the bot selects a search query, the bot begins communicating with the Fiesta PPC service, labeled step 2 in Figure 7.2. Initially the bot performs a request to receive ads that correspond to the selected search query. In response, the PPC Ad Server returns approximately 25 ads in XML format. Figure 7.3 shows an example of the PPC Ad Server XML response. Information contained in each record includes an ad URL, title, keywords, and "bid." The PPC Ad Server returns ads that vary greatly. Some ads directly relate to the search, while others appear random. The bot selects one of the ads at random from the PPC Ad Server response, biasing its selection towards high bid values.

After selecting an ad, the bot performs a search for the original search query at a search engine operated by the PPC service. The bot then informs the search engine which ad it is about to click via a separate HTTP request (step 3 in Figure 7.2). Lastly, the bot will contact the PPC Click Server and actually perform the ad click (step 4 in Figure 7.2). The PPC Ad Server returns ad URLs that point to the PPC Click Server. Each ad URL contains a unique 190-character identifier that is used when the bot issues an HTTP request to the PPC Click Server to signal a click. The PPC Click Server responds to all clicks with HTTP 302 redirect responses, beginning the fraudulent click. Figure 7.4 shows the process that occurs once the bot issues a request to the PPC Click Server, with arrows representing HTTP redirects.
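The ad-selection step can be sketched as follows. The exact weighting the bot applies is not something we recovered from the binary; the version below simply biases a random choice by each ad's bid, and the field names are our own shorthand for the records in Figure 7.3.

    import random

    def choose_ad(ads):
        """Pick one ad at random, biased toward higher bid values (illustrative weighting)."""
        return random.choices(ads, weights=[ad["bid"] for ad in ads], k=1)[0]

    ads = [  # abbreviated records in the spirit of Figure 7.3
        {"title": "Looking for u2 tour?", "bid": 0.0004,
         "click_url": "http://CLICK_SERVER/click.php?c=UNIQUE_ID"},
        {"title": "Style Fashion Show Review", "bid": 0.0023,
         "click_url": "http://CLICK_SERVER/click.php?c=UNIQUE_ID"},
    ]
    print(choose_ad(ads)["title"])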

Figure 7.4: The expanded Fiesta click redirection chain, from the Fiesta Click Server, through ad sub-syndicates and a major ad syndicate, to legitimate ad services and legitimate websites. This graph represents the possible redirection paths beginning with a click on the Fiesta click-server and ending at a legitimate website. One click will take one path through the graph. Redirections flow from left to right, and money flows from right to left.

A single click will cause the bot to receive redirects to three or four different locations, eventually settling on a legitimate website such as style.com or accuweather.com. The resolution of this redirection chain completes a single act of click fraud.

We believe the sites directly linked to the PPC Click Server along the redirection chains in Figure 7.4 are advertising sub-syndicates (i.e., entities that show ads generated by other ad networks in exchange for some portion of generated revenue) that have entered into syndication agreements with legitimate advertising services. The legitimate advertising services include BidSystems and AdOn Network. We believe some of the ad sub-syndicates are illegitimate based on other services hosted on the same IP addresses, as well as the frequency with which the sub-syndicates' IP addresses appear in malware reports.

Interestingly, the Fiesta bot issues requests to the Fiesta Ad Server with HTTP referrers from Fiesta search engines, yet performs searches after issuing ad requests. This implies that the PPC service could detect clicks occurring by this bot given the improper request ordering.

Fiesta Economic Model

Based on our observations of the Fiesta clickbot and our investigation of the Fiesta PPC service, we believe that we have identified the economic model of both the Fiesta PPC service and the Fiesta clickbot. This model introduces the notion of a click fraud middleman whose job is to abstract ad revenue from the generation of fraudulent traffic. This is a significant change in the economic structure of the click fraud ecosystem that was not present for Clickbot.A.

We suspect that Fiesta PPC has entered into revenue sharing agreements with several advertising sub-syndicates. As part of this agreement, Fiesta PPC operates a search engine that displays the ad sub-syndicates' ads. Each of these ads is routed through the Fiesta Click Server. When an ad click occurs, the revenue is divided between the parties. The Fiesta PPC service then distributes the search engine traffic amongst their ad sub-syndicates through the use of the Fiesta PPC Click Server. Investigation of the Fiesta PPC website supports these theories. The site contains links to traffic bid information based on region, web forums (which are currently not working), and an affiliate application form.

Inserting a middleman into the click fraud path is an innovative structural development. A PPC service allows botmasters to generate revenue without regard to the specifics or defenses of advertising networks, while simultaneously allowing the middleman service to focus on ad revenue without engaging in traffic generation. Concurrent with our own work, a separate study discovered a business relationship between the Fiesta PPC service and the operators of the Koobface botnet [142]. This study detailed the economics of Koobface, revealing that the botnet generated some of its revenue by interacting with various affiliate services in exchange for payment. Koobface utilized Fiesta PPC as an affiliate in addition to other PPC services. We believe the Fiesta bot operator has established a similar business relationship with the Fiesta PPC service.

Prevalence

We observed two different PPI services distributing Fiesta bots across a three month timespan. In one instance, a PPI service served the Fiesta bot binary for over 12 hours. Through creative use of the Google search engine combined with our own samples, we were able to locate four different C&C server IP addresses across three different domain names. Since the C&C server used by our bot was hard-coded into the binary and varied between dropper services, we suspect that Fiesta bots released via different mechanisms use different C&C servers. Using the same Google techniques, we have also identified 22 domain names that act as search engines for the Fiesta service, spread across three IP addresses.

7.4 The 7cy Clickbot

Clickbot.A, 7cy, and Fiesta differ significantly in their behavior. While Fiesta and Clickbot.A use levels of indirection between organizations to launder clicks, 7cy attempts to evade detection by simulating human web-browsing behavior.

1 GET /p6.asp?MAC=00-0C-29-24-29-12&Publicer=bigbuy HTTP/1.1
2 Host: in.7cy.net
3 User-Agent: ClickAdsByIE 0.7.4.3
4 Accept-Language: zh-cn,zh;q=0.5
5 Referer: http://in.7cy.net/p6.asp
6 Content-Type: application/x-www-form-urlencoded
7 Connection: Close

Figure 7.5: Initial 7cy C&C request. The MAC-based bot ID and user-agent are shown in bold.
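A request of this shape can be reproduced in a few lines; our milker (described under "Instructions Over Time" later in this section) does essentially this. The sketch below is illustrative only: the MAC value is simply the one from the figure, and error handling, Tor routing, and response parsing are omitted.

    import urllib.request

    def fetch_7cy_instructions(mac="00-0C-29-24-29-12", publisher="bigbuy"):
        """Request a work batch the way the bot in Figure 7.5 does (minimal sketch)."""
        url = f"http://in.7cy.net/p6.asp?MAC={mac}&Publicer={publisher}"
        req = urllib.request.Request(url, headers={
            "User-Agent": "ClickAdsByIE 0.7.4.3",
            "Accept-Language": "zh-cn,zh;q=0.5",
            "Referer": "http://in.7cy.net/p6.asp",
        })
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read().decode(errors="replace")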

1 http://housetitleinsurance.com
2 http://www.google.com/url?sa=t&source=web&ct=res&cd=8&url=http://housetitleins...
3 90
4 15
5 CLICK
6 /search/{|||}epl={|||}yt={|||}qs
7 RND
8 5
9 NOSCRIPT

Figure 7.6: Excerpt of response from C&C server. Note that whitespace between lines is removed.

The 7cy C&C language controls the bot's click behavior by specifying an initial site to "surf," a series of page content patterns for identifying desirable links to click on, and an inter-click delay time. The bot then leverages timing by introducing a random amount of jitter into the delay between clicks. Separately, the C&C directs the bot to generate more browsing traffic during popular times such as the evening and the workday. Compared to Fiesta, 7cy requires a substantially different C&C language and surrounding botmaster infrastructure, which we detail next.
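As a toy illustration of this timing behavior (we did not recover the bot's actual jitter distribution, and the bound below is an arbitrary placeholder), a delay with human-scale jitter might look like:

    import random
    import time

    def wait_with_jitter(delay_seconds, max_jitter=3.0):
        """Sleep for roughly the C&C-specified delay, plus or minus a few seconds."""
        jitter = random.uniform(-max_jitter, max_jitter)
        time.sleep(max(0.0, delay_seconds + jitter))

    wait_with_jitter(5)  # e.g., the 5-second delay on line 8 of Figure 7.6

The jitter we measured (Figure 7.12, later in this section) is on the order of seconds in both directions, which is the effect this kind of perturbation produces.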

C&C Structure

A 7cy bot locates the C&C server by resolving the domain name in.7cy.net, and then communicates with that server using HTTP. We show a sample C&C request in Figure 7.5. Line 1 includes the bot's network MAC address, presumably as a unique identifier. Line 3 presents a user-agent specific to the bot family, as well as a version number. After receiving a request, the C&C server will respond with one of three messages: (i) an instruction to wait for a specified time period, (ii) an HTTP 302 response redirecting the bot to another C&C server, or (iii) specific click fraud instructions. We refer to the latter as an instruction "batch." Each batch is comprised of "jobs," and each job is comprised of "clicks." Within a given batch, each job specifies a different site to target for click fraud, and each click corresponds to an HTML link to visit. Jobs specify web-surfing sessions: their clicks are applied to sites as they result from previous clicks, rather than to the initial site. On average there are 18 jobs per batch and 9 clicks per job.

Figure 7.6 shows an excerpt of one batch. Lines 1 through 4 constitute a header for a single job.


Figure 7.7: General progression of a 7cy "job." Arrows represent HTTP requests and go from the domain of the referrer to the domain of the host. Note that the publisher is often a parking page.

Line 1 specifies the website at which the bot should begin the browsing session. Line 2 specifies the referrer to use in the initial request to the target site, although an actual request to the referring site is never made. Line 3 specifies a time limit after which new instructions will be requested if the clicks have not been completed yet. Line 4 specifies the number of clicks in the job.

The structure seen in lines 5 through 9 describes a particular click. This structure (with possibly different values) repeats the number of times denoted on line 4 to complete the job. Line 5 specifies the action to perform. Several values other than CLICK have been observed, but seem to be ignored by our specimens. Given the rarity of these other commands (less than 0.01% of total commands), we suspect they are erroneous, or a new feature still in testing. Line 6 specifies token patterns to search for on the current page when selecting the next click. Tokens are delimited by the five characters "{|||}". Line 8 specifies a time delay for the bot to wait before performing the next click. Once all clicks in the job are specified, a new job header occurs or the C&C transmission ends.

After receiving a batch of instructions from the C&C server, a bot will begin traversing web pages as described in the instructions. Many of the sites targeted by the bot are hosted at parked domains. Examples include housetitleinsurance.com, quickacting.com, and soprts.com. We call these websites publishers. These sites mainly consist of links and ads within the page body, and keywords in the HTML meta tag which relate to the theme suggested by the domain name. They may also provide a link to contact the owner and purchase the domain name.

Although the domains and advertising networks vary across jobs, the traffic patterns follow progressions that we can aggregate into a graph, shown in Figure 7.7. Edges correspond to HTTP requests made by the bot. The origin of an edge corresponds to the domain of the referrer used in the request, and the destination of an edge corresponds to the host to which the request is made.
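To make the batch format concrete, the following sketch parses one job laid out as in Figure 7.6, assuming the fields arrive as separate lines in the order just described. The field names are our own; the protocol itself carries no labels, and we treat lines 7 and 9 of each click record as opaque since our specimens appear to ignore them.

    TOKEN_DELIM = "{|||}"

    def parse_job(lines):
        """Parse one 7cy job: a 4-line header followed by one 5-line record per click."""
        job = {
            "start_url": lines[0],
            "referrer": lines[1],
            "time_limit_s": int(lines[2]),
            "num_clicks": int(lines[3]),
            "clicks": [],
        }
        pos = 4
        for _ in range(job["num_clicks"]):
            action, tokens, flag1, delay, flag2 = lines[pos:pos + 5]
            job["clicks"].append({
                "action": action,                     # almost always CLICK
                "tokens": tokens.split(TOKEN_DELIM),  # patterns to match when picking a link
                "delay_s": int(delay),                # wait before the next click
                "opaque": (flag1, flag2),             # values our specimens appear to ignore
            })
            pos += 5
        return job

    # The job from Figure 7.6, trimmed to a single click for brevity (the real job specifies 15).
    sample = [
        "http://housetitleinsurance.com",
        "http://www.google.com/url?sa=t&source=web&ct=res&cd=8&url=http://housetitleins...",
        "90", "1",
        "CLICK", "/search/{|||}epl={|||}yt={|||}qs", "RND", "5", "NOSCRIPT",
    ]
    print(parse_job(sample)["clicks"][0]["tokens"])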

1 GET / HTTP/1.0
2 Referer: http://www.google.com/url?sa=t&source=web&ct=res&cd=8&url?sa=t&source...
3 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
4 Host: housetitleinsurance.com

1 HTTP/1.1 200 OK
2 Cache-Control: no-cache
3 Pragma: no-cache
4 Content-Length: 13776
5 X-AspNet-Version: 4.0.30319
6 Set-Cookie: SessionID=6492595d-c592-419a-bf16-0cad97eef767; path=/
7 Set-Cookie: VisitorID=5f68a43f-6cf3-4a2f-831c-127ce007b646&Exp=11/29/2013 8:38...
8 Set-Cookie: yahooToken=qs=06oENya4ZG1YS6...HO8xG7uLE1uBAe5qKwGUov0xhAWIvfCJZ1E...

Figure 7.8: Selected headers from request (top) and response (bottom) of a publisher’s webpage.

The first HTTP request a bot makes in a job is always for the URL of the publisher’s website, using a referrer header mimicking a previous Google search (not actually performed) which could plausibly lead the user to the publisher’s website (step 1). Next, the bot loads the publisher’s website as a browser would, fetching all supporting objects (pictures, scripts, etc.) as dictated by the initial request (steps 2, 3). Once the bot has downloaded the publisher’s webpage, it selects an in-page ad matching the search pattern specified via C&C for clicking. If multiple links on the page match the pattern, the bot makes a random selection. Each link on the webpage points to a “trampoline” page at the publisher’s site, resulting in HTTP 302 redirects to the ad network and on to the actual advertised site. This behavior allows the publisher and the ad network to detect when the bot clicks on an ad. The bot follows this redirection chain (step 4) and loads the advertised site (step 5). A job often includes several clicks designed to simulate link-clicking behavior on the advertised site.
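A sketch of the link-selection step described above: gather the links on the fetched page, keep those whose URL contains every C&C-supplied token, and pick one at random. The regular-expression scrape and the sample page are purely illustrative; we do not know how the bot itself parses HTML.

    import random
    import re

    HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)

    def pick_link(page_html, tokens):
        """Return a random link whose URL contains every token from the C&C directive."""
        candidates = [url for url in HREF_RE.findall(page_html)
                      if all(tok in url for tok in tokens)]
        return random.choice(candidates) if candidates else None

    page = ('<a href="/search/trampoline?epl=1&yt=abc&qs=x">ad</a> '
            '<a href="/contact">contact owner</a>')
    print(pick_link(page, ["/search/", "epl=", "yt=", "qs"]))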

Specific Fraud Example

In order to demonstrate several details of the traffic produced by a 7cy bot, we now present excerpts of traffic from an actual job. In this job, the publisher is housetitleinsurance.com, the ad network is msn.com, and the advertised site is insureme.com.

Figure 7.8 shows the bot's initial request to the publisher's website and the corresponding response. Note that as the bot is now issuing requests to publisher websites, the User-Agent presented in line 3 of the request has been changed to Mozilla/4.0 rather than ClickAdsByIE.

In this instance, after the bot had loaded the publisher's site the bot clicked on a link to the publisher's own domain. This caused the bot to send another series of requests to the publisher as the corresponding site was loaded. Throughout this exchange, we observed that the publisher changed a portion of the cookie originally set on the bot in Figure 7.8 and began including a value similar to the cookie in links included on the page. This is seen in Figures 7.8-7.10 as the bold portion of the cookie changes. The use of cookies and referrers represents an increase in complexity over the techniques used by Clickbot.A [37].

1 GET /?ld=4vnjCbJ-GAvwzaNZFHBC2hWDhbZSs2HbnQAVmreNgXqjJdT0CGnrnZiVXS01aPdMH1DdL...
2 Referer: http://housetitleinsurance.com/online/find/home/owner/find/home/owner...
3 ...yt=qs%3d06oENya4ZG1YS6...HO8xG7uLGV-ZMa5qKwGUov0xhAWIvfCJZ1EtVWLOl...
4 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
5 Host: 948677.r.msn.com

1 HTTP/1.1 302 Object Moved
2 Cache-Control: no-cache, must-revalidate
3 Pragma: no-cache
4 Location: http://www.insureme.com/landing.aspx?Refby=614204&Type=home
5 Set-Cookie: MSConv=4vfcf6a1f935caa89943fce63a4bbf1574fc5c1f28c000945ebcd99d208...
6 Set-Cookie: MSAnalytics=4v76de0ef30bff74b972b5855ec3be14bc0c26342d22158a9ceaa6...
7 Content-Length: 202

Figure 7.9: Selected request (top) and response (bottom) headers for an advertised site’s URL.

1 GET /landing.aspx?Refby=614204&Type=home HTTP/1.0
2 Referer: http://housetitleinsurance.com/online/find/home/owner/find/home/owner...
3 ...yt=qs%3d06oENya4ZG1YS6...HO8xG7uLGV-ZMa5qKwGUov0xhAWIvfCJZ1EtVWLOl...
4 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
5 Host: www.insureme.com

1 HTTP/1.1 301 Moved Permanently
2 Cache-Control: no-cache, no-store
3 Pragma: no-cache
4 Content-Length: 44447
5 Location: http://www.insureme.com/home-insurance-quotes.html
6 Set-Cookie: ASP.NET_SessionId=4u4vxv45vupmetvi2f4ghoqu; path=/; HttpOnly

Figure 7.10: Selected request (top) and response (bottom) headers for a request for an advertised site.

Clickbot.A strips the referrer from requests in order to obscure the site at which the traffic originated. While there are plausible reasons for removing the referrer in normal use cases, removing the referrer does introduce a notable characteristic into the traffic. Likewise, cookies help the traffic produced by 7cy to appear more natural and could potentially be used to construct the illusion of a user or browsing session for an ad network.

The bot will ultimately browse away from the publisher's website when a request is issued to the ad network in order to obtain a redirect to the actual website being advertised. This request is shown in Figure 7.9. Note that this request uses a Referer which includes a value similar to the yahooToken value previously set as a cookie on the bot by the publisher. The changed portion is shown in bold.

Lastly, a request is made to load the actual site being advertised. In this particular case the parking page has been moved within the same domain, so a 301 redirect is issued. This is unrelated to the click fraud infrastructure. As shown in Figure 7.10, this request also includes a referrer header which includes the yahooToken value used in other requests.

We summarize the pattern of traffic described above in Figure 7.11. Each line corresponds to a piece of information and each bubble corresponds to a distinct server.


Figure 7.11: Flow of critical information in 7cy click fraud.

The publisher's domain name is included in the referrer to both the ad network and the advertised site. The ad URL is supplied by the publisher and contains the domain of the ad network. The target URL is supplied by the ad network and contains the advertised site domain name. Lastly, the yahooToken, likely containing a user ID, is set by the publisher and given to the ad network and the advertised site.

7cy Economic Model

While the economic structure of 7cy is relatively clear, the parties involved and their roles are not. The pattern of traffic suggests that the advertised site (e.g., insureme.com) is paying the ad network (e.g., msn.com), which is paying the publisher (e.g., housetitleinsurance.com). Additionally, the domain names appear to be registered to multiple distinct parties. Unfortunately, it is unclear whether the publisher is paying the botmaster, or the publisher itself is the botmaster. If the publisher is paying the botmaster then it is unclear exactly how many distinct parties are acting as publishers.

Figure 7.12: Measurement of the amount of jitter introduced into inter-click delay by the clickbot binary and surrounding infrastructure compared to jitter introduced by infrastructure alone.

Timing and Location Specific Behaviors

Timing Variance: In order to appear more human, the bot introduces jitter into the delays specified by the C&C language. The results labeled "Experiment" in Figure 7.12 show the differences we observed between the time delay specified in the C&C and the bot's actions in our contained environment. We conducted these measurements by feeding the bot artificial C&C with specific wait times while reflecting all HTTP traffic internally to a sink server within our farm. We then measured the inter-arrival time of requests at that sink. In order to confirm that the jitter observed was the result of the bot's own behavior and not our honeyfarm environment, we also performed a control experiment in which we made HTTP requests at a constant rate from within an inmate VM, and then measured the variance in the same way as with the actual bot. The results labeled "Control" in Figure 7.12 indicate that the jitter introduced by the honeyfarm is infrequent, small, and always positive. Combined, these results show that the 7cy clickbot is introducing both positive and negative variance, on the order of seconds, into the inter-click delay.

(a) Publisher domains seen from milking via the US over all 5 days of the study.
(b) Publisher domains seen from milking via the US over all 5 days of the study, with domains plotted only the first time they occur.
(c) Publisher domains seen from milking via Japan over all 5 days of the study.

(d) Publisher domains seen from milking via the US during the first 24 hours.

Figure 7.13: The plots show the domains the milker was directed to for click fraud as a function of time. The vertical axis represents all domains observed, ordered from most frequently seen (bottom) to least frequently seen (top). Frequency was defined with respect to the entire study. Note that Day 0 begins at 5am GMT.

Instructions Over Time: In order to gather data about the characteristics of the C&C over time and the influence of bot location on the behavior of control servers, we also conducted a C&C milking study of the 7cy infrastructure. As part of our study we built a milker that connected to 7cy C&C servers via Tor [40] exit nodes in 9 countries throughout North America, Europe, and Asia. These countries were Canada (CA), Spain (ES), France (FR), Hong Kong (HK), Japan (JP), South Korea (KR), Russia (RU), Singapore (SG), and the United States (US). Our milker mimics the behavior of a 7cy bot by contacting the 7cy C&C server and requesting work. We record all network traffic in the exchange, but do not carry out any of the fraud specified by the C&C server. Our milker is implemented entirely in Python and is approximately 370 lines of code.

Our study milked the C&C server continuously for five days starting Thursday, January 13, 2011, at 5am GMT. All initial C&C requests, regardless of the Tor exit node, were sent to in.7cy.net. This mimicked the behavior we observed in our actual specimens. Recall from Section 7.4 that the C&C server's responses can be classified as "Wait" (delay for a fixed time period), "Moved" (a 302 redirect to another server), or "Batch" (instructions for click fraud). On occasion the C&C server returned a 400-level error code, empty response, or no response, all of which we classify as "Other." When connecting from Japan, the C&C server occasionally returned a web page which seemed to be under development. We likewise classify this as "Other."

We observe that the C&C servers target some sites for click fraud more often than others. The C&C samples obtained by our milker contained 366,945 jobs directing traffic towards 7,547 unique domains. An analysis of the traffic reveals that although 7,547 unique domains were seen across all countries, 75% of jobs targeted only 1,614 of the domains.

Figure 7.13 shows the domains targeted by the 7cy bots with respect to both country and time. The horizontal axis represents the time in seconds since the start of the milking study. The vertical axis is the sorted target domain ID, where domain 0 is the most targeted domain, and domain 7,546 is targeted least. Figure 7.13a plots the domains sent to our US Tor exit node milker over the 5 day period. The distinctive gaps in the data are the result of successive wait commands given to us by the C&C server. Figure 7.13d is an expanded view of the first day of Figure 7.13a. We see that the server seems to work on an hourly cycle, dispensing a minimum amount of click fraud instructions to the most trafficked sites each hour. The US exit node received more click fraud instructions during peak US Internet use times, such as the work day and evening. In non-peak hours, the US exit node received more wait instructions to generate a lower volume of traffic. Interestingly, all exit nodes except Japan and Korea showed timing patterns in sync with the US despite time zone differences.

Japan and Korea, however, display a pattern similar to each other and distinct from the other countries examined. Figure 7.13c shows domains served to the Japanese Tor exit nodes with respect to time, over the entire 5-day milking period. This figure shows the same distinctive bands; however, the distances between bands and their widths vary.
While these countries do appear to have a strong, periodic schedule and do visit some sites considerably more than others, traffic appears to be distributed relatively uniformly throughout the day. Figure 7.13b shows the same data as Figure 7.13a except that all duplicate domains have been removed. This means once a specific domain has been observed at time t, it is no longer plotted for times greater than t. This figure illustrates that although there is a clear periodic pattern to the traffic, the domain names involved vary throughout the observed time. The other countries surveyed have similar behavior.

Beyond differences in timing, the distinct behavior seen in Japan and Korea is also characterized by differences in the C&C instructions received. Table 7.2 depicts the requests and responses made by the milker. Japan and Korea are redirected to 3.95622.com relatively frequently, although no other countries are directed there. Similarly, Russia, Spain, France, Hong Kong, Canada, and the United States are redirected to 1.95622.com, although Japan and Korea are rarely directed to that domain. Only Singapore received no redirects.

In addition to differences in traffic handling and timing, the milked C&C data revealed a correlation between which domains were served and which countries received them. To determine the degree of correlation between two countries, we calculate, for each country, the percentage of overlap between its domains and the two countries' combined set of domains. To establish a baseline, Table 7.3 shows the result of correlating a randomly selected half of a country's traffic with the remaining half; this provides a standard for determining when two countries are similar. Pairwise analysis of countries revealed that all countries other than Japan and Korea are strongly correlated. These results are presented in more detail in Table 7.4, where we see that Japan and Korea are somewhat correlated with each other and relatively uncorrelated with the rest of the world.

Although there is a strong correlation between domains served and country, the cause for this correlation is not clear, as the domains served to Japan and Korea appear similar to domains served to all other countries. The domains which are more common in Japan and Korea do not appear to host content in either Japanese or Korean, nor do they contain ads specific to Japan or Korea.
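As a minimal sketch of this overlap metric, following the definition given in the caption of Table 7.3 (the percent of domains seen by both parties out of domains seen by either), the computation reduces to a set intersection and union; the function and variable names below are illustrative, not taken from our measurement code.

def domain_correlation(domains_a, domains_b):
    # Percent of domains observed by both countries out of all domains
    # observed by either (a Jaccard index expressed as a percentage).
    a, b = set(domains_a), set(domains_b)
    return 100.0 * len(a & b) / len(a | b)

# e.g., domain_correlation(us_domains, jp_domains) would yield values
# comparable to the 32.0 reported for the US/JP pair in Table 7.4.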

Country  Host         Total Req.  Wait  Batch  Moved to 1.95622.com  Moved to 3.95622.com  Other
CA       1.95622.com  193         24    167    0                     0                     2
         in.7cy.net   2,947       338   2,360  193                   0                     56
ES       1.95622.com  217         35    178    0                     0                     4
         in.7cy.net   2,935       327   2,333  216                   0                     59
FR       1.95622.com  215         18    192    0                     0                     5
         in.7cy.net   2,883       336   2,252  215                   0                     80
HK       1.95622.com  323         26    290    0                     0                     7
         in.7cy.net   2,253       438   1,465  323                   0                     27
JP       1.95622.com  10          0     10     0                     0                     0
         3.95622.com  777         378   396    0                     0                     3
         in.7cy.net   1,656       176   292    10                    778                   400
KR       1.95622.com  1           1     0      0                     0                     0
         3.95622.com  1,191       598   590    0                     0                     3
         in.7cy.net   1,286       22    37     1                     1,193                 33
RU       1.95622.com  139         14    121    0                     0                     4
         in.7cy.net   3,520       259   3,048  139                   0                     74
SG       in.7cy.net   4,238       160   4,000  0                     0                     78
US       1.95622.com  225         29    194    0                     0                     2
         in.7cy.net   3,022       322   2,425  225                   0                     50

Table 7.2: Types of C&C responses received at geographically diverse locations.

                      CA    ES    FR    HK    JP    KR    RU    SG    US
Internal Correlation  63.7  61.7  59.8  55.6  43.7  65.2  68.0  74.4  63.0

Table 7.3: Correlation within each country if visits are partitioned randomly in half. Correlation is measured as the percent of domains seen in both halves which were seen in either half.

     CA    ES    FR    HK    JP     KR     RU    SG    US
US   79.5  79.6  78.0  72.5  32.0   12.1   81.0  83.3  100.0
JP   32.2  32.2  31.8  32.7  100.0  42.6   31.5  31.4  32.0
KR   12.4  12.3  12.1  12.3  42.6   100.0  12.0  12.1  12.1

Table 7.4: Correlation of domain names served to various countries. Note that all countries except Japan and Korea are strongly correlated to each other, as evidenced by their correlation to the US.

The correlation between Japanese and Korean IP addresses and the domains served may be related to the ad network being targeted by the bot, rather than the target audience for the content on the domain. We speculate that perhaps some ad networks are more tolerant of or prefer traffic from one country as opposed to others, and so it is more profitable to direct traffic from those countries to those ad networks. As the format of the ad URL is determined by the ad network, this viewpoint is supported by the fact that a similar correlation was seen in the tokens used to select which URLs to click on.

The location- and time-specific behaviors displayed by 7cy represent a notable departure from the methods of Clickbot.A [37]. 7cy displays time-sensitive behavior both in the randomness introduced to inter-click timings as well as in the variations of traffic load with respect to time-of-day. The evidence of location-specific behavior also represents an added degree of complexity over Clickbot.A. These behaviors make bot-generated traffic appear more realistic and represent an advance in emulating human behavior.

7.5 Discussion and Summary

In this chapter we presented an in-depth analysis of two distinct families of clickbots, Fiesta and 7cy, deriving extensive behavioral information about these families. This allowed us to establish a profile of the capabilities of the bots as well as the economic motives and incentives of the parties involved. Utilizing these insights into bot behavior and the structure of click fraud systems, we are now able to discuss potential techniques for defenses and safeguards against bot-generated traffic.

Through our study of the Fiesta clickbot we have described a click fraud model in which a service provider acts as a middleman for fraudulent traffic. The middleman pays money for generated traffic, and generates his own revenue through agreements with ad sub-syndicates. This previously undescribed approach could allow advances in traffic generation to be abstracted away from advances in hiding fraudulent clicks, potentially driving click fraud innovation.

Studying the 7cy clickbot allowed us to observe the specific mechanisms employed by a clickbot attempting to mimic the behavior of a human. The C&C protocol allows the botmaster to dictate or sell traffic at a fine granularity. We observed the attempt to simulate human-like behaviors, including random browsing of the advertised site and randomized inter-click delays. By milking 366,945 click fraud instructions from the C&C servers via IP addresses in 9 countries, we were able to study this botnet's click fraud in detail and discover region-specific behavior.

Having described the behavior of both the Fiesta and 7cy clickbots, we now offer a brief discussion of potential techniques for detecting bot-generated traffic. One approach sites could employ is to develop a set of features characteristic of legitimate traffic and flag browsing sessions which appear abnormal according to these features. Features which would address the bots seen in this chapter include the user's mouse movements on the page and the depth of browsing through the site, as these bots differ significantly in these regards from a typical human. Advertised sites may further this technique by correlating atypical features with the domain name of the referrer, and in doing so build a list of publisher domains in use by botmasters. Another conceivable detector is the invisible embedding of HTML links into a site's pages. Any visitor clicking such "honeylinks" is likely to be a bot. In addition to the above suggestions for potential detection techniques, we have pursued collaboration with industry to enhance bot detection.

We build on this work in Chapters 8, 9, and 10, further exploring large-scale click fraud botnets and the ecosystem that supports them.

Chapter 8

ZeroAccess: Background and Evolution

ZeroAccess was a vast and complex peer-to-peer (P2P) botnet that served as a delivery platform to distribute a variety of malware modules over its lifetime, each with its own command-and-control (C&C) and monetization strategy [17].

8.1 ZeroAccess Evolution and Takedown

ZeroAccess was a complex botnet that underwent several stages of evolution, which we recount here. Although first described solely as a "rootkit," ZeroAccess developed into a vast peer-to-peer (P2P) botnet and malware delivery platform.

Early Life, 2009–2011

Initial reports of the "ZeroAccess rootkit" date to 2009 [56]. In 2010, the InfoSec Institute's detailed analysis described it as a "platform to deliver malicious software" [17]. At this stage, the main malware delivered using ZeroAccess was "FakeAV",1 with an estimated 250,000 computers infected.

1FakeAV is malware that claims to be anti-virus software to extort users into paying money to remove fictitious infections.

First generation P2P Botnet, 2011–2013

In May 2011 a radically new version of ZeroAccess emerged [109]. In this iteration, ZeroAccess retained its kernel-mode rootkit components, but changed both its communication and monetization strategies. This version spread itself via exploit packs (e.g., BlackHole [68]) and social engineering [64, 150].

The defining feature of this iteration of the botnet was the introduction of a decentralized, TCP-based P2P communication protocol. The protocol used cryptography and obfuscation as well as other common P2P features such as "supernodes" that served to orchestrate large portions of the network's activity. The network allowed the botmasters to maintain decentralized control while relaying commands and payloads to infected computers worldwide [149]. The P2P protocol included cryptographic signing of malicious payloads, which hardened the botnet against attempted hijacking by preventing untrustworthy peers in the botnet from successfully delivering payloads other than those cryptographically signed by the actual botmaster [149].

In addition, the monetization strategy changed with this generation. ZeroAccess moved away from FakeAV payloads and instead began distributing Bitcoin miners and click fraud modules.2 From a technical perspective, the primary click fraud malware used in this era operated in the indiscriminate "auto-clicking" fashion we describe in Section 9.4. Alongside the click fraud and Bitcoin payloads, ZeroAccess itself was also sold as a service on underground forums [109], enabling cyber-criminals to use the ZeroAccess rootkit to distribute their own malicious payloads.

This iteration of the botnet also saw an increase in botnet population. At the height of infections in early 2012, estimates placed the botnet population at over 500,000 [100]. Despite the age of the botnet and its subsequent evolution, as of August 2013 there were still over 30,000 computers infected with this generation [109].

2ZeroAccess's shift away from FakeAV occurred just before a major takedown that resulted in the closure of most FakeAV programs [88].

Second generation P2P Botnet, 2012–2013

In July 2012, ZeroAccess evolved into the form predominant as of November 2013. According to Symantec, by August 2013, this generation had an estimated population of over 1.9 million computers [109]. This iteration included several changes to the malware structure, the protocol, and the payloads. The most distinguishing change to ZeroAccess in this era was a move away from the kernel-mode rootkit component, with all of its functionality now replicated in user-space [109]. Other changes include a move to UDP from TCP for the P2P protocol (likely to improve network performance), and minor changes in the protocol itself.

The monetization strategy also evolved. This version saw the introduction and massive distribution of a new click fraud payload performing search-hijacking (linked to the MagicTraffic click fraud affiliate program [79]), which we discuss in Section 9.5.

Takedown, Late 2013

On December 5, 2013, Microsoft's Digital Crimes Unit and Europol orchestrated a takedown of the Serpent and auto-clicking C&C servers [90, 104], using our technical report [118, Chapter 9] as Exhibit 1 [104]. Since both click fraud modules had a centralized C&C server, a simple seizure of these machines was able to temporarily disrupt click fraud activity. Because the ZeroAccess P2P substrate was still intact, the perpetrators were able to distribute an updated module with new C&C IP addresses within hours, and fraud resumed. For reasons unknown, the following day the malware authors distributed a new set of modules that halted all click fraud activity but left the P2P network intact. Inspection of these modules revealed the ASCII text "WHITE FLAG" in apparent surrender [63].

8.2 Technical Background

P2P communication

The P2P C&C substrate of ZeroAccess functions solely to deliver modules to infected machines. This P2P protocol is well understood, has common P2P properties such as peer lists and supernodes [109, 126, 149], and allows ZeroAccess-infected computers to communicate with each other directly without the need for a centralized C&C server. Without payloads, ZeroAccess infections simply maintain their P2P membership and functionality, but perform no malicious actions on their own.

Infected ZeroAccess machines generate UDP P2P traffic on four well-known ports. The protocol is constructed in such a way that super nodes in the network have a vantage point over a large portion of the infected population. ZeroAccess has distributed several modules with various monetization strategies over its five-year lifespan. In this work we focus on the two most recent modules, distributed throughout 2013. Both of these modules perform click fraud, and both have a separate HTTP-based C&C channel distinct from the P2P network.

ZeroAccess Modules

During the life of our study ZeroAccess utilized two primary monetization strategies, each using a different module: the "auto-clicking" module (classic click fraud), and the search-hijacking module, which we have named Serpent. Chapter 9 performs an in-depth technical analysis of each of these modules beyond the following summary.

Auto-Clicking Module

The ZeroAccess auto-clicking module performs click fraud by simulating the normal Web browser behavior of a user clicking an ad. These clicks occur rapidly, require no user participation, and are not visible to the user of the infected computer. Some users, however, may be alerted to the presence of this module by the increased network activity; in one instance, we observed about 50 MB of click traffic per hour.

The auto-clicking module invokes an actual client Web browser, enabling realistic Web browsing behavior including proper handling of HTML, CSS, and JavaScript, as well as browser-specific quirks. The module periodically contacts the auto-clicking C&C to fetch a list of publisher Web sites for the bot to visit. It navigates to the URL provided by the C&C in a hidden window, locates the ad on the page, and simulates a user clicking on the ad by requesting the advertiser's URL. The browser faithfully follows the sequence of redirects from the URL until it loads the advertiser site. The auto-clicking module then closes the hidden window and starts anew with the next publisher in the list provided by the C&C. Notably, the auto-clicking module does not simulate any user actions on the advertiser page and thus does not trigger any conversions.

Serpent Module

The Serpent module interposes on a user's normal interaction with various search engines to redirect the unwitting user to an ad. Whereas the auto-clicking module simulates a user, Serpent sends a real user to the advertiser.

The Serpent module includes a browser component that lies dormant until the user performs a search on one of many search engines. The module silently sends a copy of the search query to the Serpent C&C. The C&C responds with a set of URLs that correspond to ad click URLs for ads related to the search query. Meanwhile, the browser renders the original search result page as the user expects. When the user clicks on a search result, Serpent intercepts the click and prevents the browser from navigating to the site intended by the user. Instead, the module redirects the browser to one of the URLs it received from the C&C, in effect creating the appearance of the user having clicked an ad after performing a search on the publisher's site (as opposed to the site on which the user actually performed the search).

If the user does not notice having been unwittingly redirected to the advertiser's site (a different site than intended), the user may continue browsing as normal. Since the advertiser site is relevant to the user's search query, the user may even convert. Note that since all Serpent activity is gated on legitimate user activity, this type of click fraud attack is extremely hard for an ad network to detect.

Parallel to this search-hijacking activity, Serpent also used a separate C&C protocol to perform a type of auto-clicking. This auto-clicking protocol and ad network structure were separate from the original auto-clicking module, although the auto-clicking behavior is largely similar.

DNS Queries

Serpent ships with a list of C&C IP addresses hard-coded into each module. These C&C IPs are encoded inside ASCII strings that appear to be domain names ending in .com, but are in fact an obfuscated encoding of the numeric IP address. For each Serpent C&C function the malware selects one of these pseudo-domains, decodes the domain into an IP address, and then initiates a connection to that IP. Note that the malware does not need to perform a DNS query since the IP address is encoded in the domain name itself. However, for some unknown reason, the malware authors coded the module to perform a DNS query via Google's public DNS server each time it decodes the pseudo-domain—resulting in a DNS query for each C&C operation—and to discard the resulting DNS response (if any).

Most of the ZA Serpent pseudo-domains were not registered prior to our study. For this study we registered all available pseudo-domains, and by observing these DNS queries we are able to infer the operations the malware is performing. Section 10.2 discusses this DNS data, and the inferences it enables, in more detail.

Chapter 9

The ZA Auto-Clicking and Search-Hijacking Malware

9.1 Introduction

As one of the largest click fraud botnets ever in existence, ZeroAccess's operations are of unique interest in understanding how mass click fraud campaigns are perpetrated. Much of ZeroAccess has been well studied, including the infection vector and the peer-to-peer (P2P) command-and-control (C&C) protocol [56, 109, 149, 150], and several reports have identified its use of click fraud [109, 149]. However, we are unaware of public work documenting the click fraud process in technical depth.

In this chapter we focus on documenting the click fraud behavior of the ZeroAccess botnet and the infrastructure it uses in pursuit of this goal. We take a multifaceted approach, including a combination of binary and network analysis, malware execution, and direct interaction with the botnet C&C servers. In particular, we describe how ZeroAccess uses two different "modules" to carry out distinct forms of click fraud: auto-clicking and search-hijacking.

Auto-Clicking: This ZeroAccess module automatically clicks on advertisements sent via the module's C&C. These clicks occur rapidly, unseen by the user and independent of any user interaction. Section 9.4 describes the behavior and infrastructure of the ZeroAccess auto-clicking module.

Search-Hijacking: This ZeroAccess module, which we have named Serpent, fetches ads relating to real search queries generated by the user on the infected machine. When the user clicks on a search result, the module intercepts the click and instead performs a separate fraudulent ad click related to that search query. It then redirects the user to an advertiser's Web site. Given the user interaction and the click's relation to the search query, such fraud may lead to advertising conversion,1 resulting in higher revenue for the criminals. We discuss this process, and the associated mechanisms used to achieve it, in greater depth in Section 9.5.

1In advertising, a conversion is a click that leads to some user interaction on the advertiser's Web page. What constitutes a conversion can vary based on the advertiser. Such clicks are said to be "high quality" because of the user interaction.

                         MD5                               Last Obtained
Auto-Clicking module     51ba6261e44c60b2f891fabfaa47d0ad  Nov. 22, 2013
Search-Hijacking module  7128a957f5c9c9a69385f5332ca6338c  Nov. 22, 2013
                         3aec103d38c7520229e18af260c5a00d  Sep. 26, 2013
                         36616e8f309b35f8e090068690272239  June 14, 2013
                         8fa08c59e4d205e514f8a978678ba798  May 30, 2013

Table 9.1: Auto-Clicking fraud and Search-Hijacking module executables used in the analysis.

The remainder of this chapter first explains the mechanics of Internet advertising and click fraud, the history of ZeroAccess, and our measurement methodology. We then describe the basic structure of the ZeroAccess malware distribution platform and provide a detailed description of each of its two click fraud modules. In particular, we document how each click fraud module uses its own C&C network (also distinct from that used by the ZeroAccess platform), with well over a dozen server IP addresses implicated in our analysis.

This chapter is based on work that appeared as a technical report [118] and served as Exhibit 1 [104] in a legal action against the criminal operators of the botnet.

9.2 Methodology

In this section we describe our malware execution environment, manual analysis techniques, and the ZeroAccess modules we examine.

Collecting Module Sample Executables

Table 9.1 lists the modules used in our click fraud analysis. We obtained our samples of ZeroAccess by searching malware repositories for traffic patterns consistent with ZeroAccess command-and-control (C&C) behavior and executing each binary in our execution environment (Section 9.2). We identified thousands of binaries exhibiting ZeroAccess C&C behavior collected in October and November 2013. During execution, ZeroAccess retrieves the auto-clicking and search-hijacking modules for execution. While ZeroAccess transfers modules in an encrypted form, using reverse-engineered decryption routines [149] we built a tool that automatically extracts modules from network traces.

Monitored Execution Environment

We execute each binary in a virtualized environment provided by the GQ honeyfarm [91], which supports monitoring malware execution while providing a flexible network containment policy. We use Windows XP Service Pack 3 for all executions. The system can process thousands of binaries per day.

In our experiments the execution environment allows ZeroAccess C&C P2P (UDP) traffic. For all executions we forward HTTP traffic to the intended destination and redirect all other non-C&C TCP traffic to internal sinks. For DNS, our service can answer all queries, even requests that have no valid answer or that are directed at external DNS servers. This feature ensures that domain takedowns during our analysis have limited impact on malware execution. The configurable nature of the DNS server behavior enables us to test ZeroAccess samples with and without DNS resolution. For all other protocol types we provide a sink that accepts packets but does not respond. In addition to network monitoring, our system collects operating system events, including process creation, file modifications, and registry changes.

Binary Analysis

For static binary analysis we use IDA Pro 6.4 with the Hexrays decompiler.2 ZeroAccess distributes modules as standard Windows DLLs, a file format natively supported by IDA, such that it can disassemble and decompile the modules with Hexrays. We use static binary analysis to obtain the encryption (and decryption) algorithms, domains, and other C&C protocol information for the ZeroAccess modules.

Milking

A milker is a program that speaks a particular botnet's C&C protocol and mimics the communications of that malware. Through the use of a milker, we can query information and commands at a much larger scale, with finer granularity, and across more diverse geographic regions than with traditional malware executions. This technique also allows us to probe specific protocol behaviors in a way that directly executing the malware might not manifest, and to obtain C&C commands without potentially dangerous malware side effects.

For this work we created a milker for the ZeroAccess search-hijacking module's C&C protocol and used it to interact with the module's C&C servers. The milker queries the C&C server and retrieves a list of ads to click on, then simulates a click on one of the results using a headless Web browser. The browser follows redirects, executes JavaScript, and in general is designed to perform similarly to a victim's browser. We ran our milker over several days and describe some of the data gathered in detail in Section 9.5.
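In structural terms, a milker reduces to a loop of protocol-conformant requests whose responses are recorded rather than acted upon. The skeleton below is only an illustration of that structure, not our implementation; the endpoint URL and the build_request/decode_response helpers are placeholders standing in for the module-specific encoding and decryption described in Section 9.5.

import time
import requests

def milk(cc_url, build_request, decode_response, n_queries=100, interval=60):
    # Generic milking loop: issue requests that conform to the bot's
    # C&C protocol, record what the server returns, and never execute
    # the returned instructions.
    observations = []
    for _ in range(n_queries):
        resp = requests.get(cc_url + build_request(), timeout=10)
        observations.append((time.time(), decode_response(resp.content)))
        time.sleep(interval)  # pace queries like a real bot would
    return observations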

2https://www.hex-rays.com/products/ida/index.shtml

9.3 The ZeroAccess Platform

In this section we describe the base ZeroAccess platform: the botnet software responsible for coordinating communication among very large numbers (millions) of infected computers around the world.

Infection

The first step in a ZeroAccess victim's lifecycle is becoming infected with ZeroAccess. Like many other malware families, ZeroAccess is distributed in a variety of ways, such as drive-by download, social engineering, and pirated software [64]. Each distribution vector results in the installation of software that participates in the ZeroAccess C&C.

An example of this process we have observed begins when a victim browses the Web and inadvertently visits a compromised Web site hosting an exploit kit. The exploit kit detects the victim's browser version and delivers an exploit payload. The payload has two functions: 1) exploit a browser vulnerability, and 2) deliver malware. Upon successful browser exploitation, the payload downloads and executes a ZeroAccess binary. Once executed, ZeroAccess has control of the victim's computer and begins to communicate using the P2P C&C protocol.

Command and Control

The ZeroAccess platform uses a P2P protocol for its C&C, with the primary function of distributing modules and performing updates. The P2P protocol, described in greater detail in other reports [109, 149], supports the promotion of a member to a super node. As described by Neville et al., super nodes store ZeroAccess modules and provide them to other nodes upon request [109]. In addition to distributing modules to newly infected hosts, super nodes also host new versions of modules when updated. It is important to note that aside from distributing the click fraud modules, the P2P C&C protocol does not play a role in the execution of click fraud.

Once a victim has been infected with ZeroAccess, it begins by bootstrapping the P2P protocol using a peer list embedded in the binary [109]. The P2P protocol discovers new peers, updates the peer list, and adds itself to the peer list for new nodes to contact. Once the newly infected machine joins the ZeroAccess P2P network, it begins to download modules as instructed by other peers in the network. A super node hosts the modules, and the malware issues a download request to fetch and then execute the module. This process occurs shortly after infection and results in a click fraud module download. Once that module executes, the victim's computer begins to carry out click fraud using a separate C&C protocol, as described in Sections 9.4 and 9.5. When an update to the click fraud modules becomes available on the P2P network, the victim learns of the update from one of its peers and contacts a super node to retrieve the latest version.

9.4 The Auto-Clicking Module

The ZeroAccess auto-clicking module performs click fraud by simulating the normal Web browser behavior of a user clicking on a Web advertisement. This activity requires no user participation and is not visible to the user.3 We present a behavioral analysis of this module, treating the module itself as a black box and observing its contained execution. We observed it performing about one click every two minutes. To evade detection, these clicks were spread across multiple ad networks.

Behavior

The function of the auto-clicking module is to simulate a user "click" on an advertisement. Figure 9.1 shows the operation of the module. The module begins by contacting one of its command-and-control (CF-C&C) servers with a request for click fraud jobs (Step 1). The C&C server returns a scrambled payload containing a list of "click" jobs. Each job is identified by a host name, a first-hop URL, and the HTTP Referer4 URL. The module then issues an HTTP request to another CF-C&C server, setting the Host header value as specified by the job. This server then redirects the request via an HTTP 303 redirect to a URL, the same URL as in the job (Step 2). This forms the first hop in the redirection chain. The module then retrieves the URL to which it is redirected, setting the Referer header as given in the job (Step 3). The redirect chain continues normally from this point on, and after a series of redirects the bot fetches the ad URL, at which time the advertiser gets charged (Step 4). There is no visible activity in the foreground, so it is difficult for a user to detect that their computer is performing click fraud in the background. This same process repeats multiple times.

Command and Control

This section describes how the bot communicates with the CF-C&C servers to fetch click jobs. Table 9.2 lists the IP addresses of the CF-C&C servers that we observed the bot contacting to fetch commands. Note that these change over time.

The bot contacts one of the CF-C&C servers over TCP port 12757, and sends an obfuscated message (the message string XORed with 0x72) identifying the browser User Agent string. In response, the CF-C&C server sends an (also obfuscated) reply containing the following: a domain name, a list of first-hop URLs to be contacted, and a set of Referer headers to be set.

After receiving the first-hop URLs, the auto-clicking module does not fetch them directly. Instead, it first contacts one of the other CF-C&C servers over HTTP port 80, with the Host header set to the domain name provided earlier. Presumably this is an authentication mechanism; earlier versions of the module exhibited similar behavior of not fetching the URLs directly [149].
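To make the obfuscation concrete, the helper below applies the single-byte XOR described above. This is an illustrative sketch rather than code recovered from the module; because XOR is its own inverse, the same routine both reproduces the obfuscation and recovers the plaintext of captured C&C traffic.

def xor_0x72(data: bytes, key: int = 0x72) -> bytes:
    # Single-byte XOR as used for the auto-clicking module's C&C
    # messages; applying it twice returns the original bytes.
    return bytes(b ^ key for b in data)

# e.g., xor_0x72(user_agent_string) produces the obfuscated request body,
# and xor_0x72(captured_response) recovers its plaintext.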

3Some users, however, may be alerted to the presence of this module by the increased network activity; in one instance, we observed about 50 MB of network traffic per hour.

4An HTTP Referer header provides the server with the URL of the referring page, that is, the page that purportedly contained the URL being requested.


Figure 9.1: Behavior of the auto-clicking module. The module begins by retrieving a list of "click" jobs from its C&C server (Step 1). For each job, it uses the system browser to retrieve the URL (Step 2) and follows HTTP and JavaScript redirects (Steps 3 and 4).

IP Address      Observed Date  Location
94.242.195.162  21 Nov 2013    Luxembourg
94.242.195.163  21 Nov 2013    Luxembourg
94.242.195.164  21 Nov 2013    Luxembourg
81.17.18.18     21 Nov 2013    Switzerland
81.17.26.189    21 Nov 2013    Switzerland
46.19.137.19    21 Nov 2013    Switzerland

Table 9.2: IP addresses of C&C servers observed for the auto-clicking module in Table 9.1. The Observed Date gives the date on which we observed communication between the module and these servers. Location is based on the MaxMind [98] GeoIP service.

Hop  URL                                Status Code  Notes
1    http://46.19.137.19 ...            303          Host name set to cvmrpznw.cm
2    http://[...].traffiliator.com ...  302          Referer spoofed
3    http://unlimiclick.com/bd ...      200
4    http://ads.clicksor.cn ...         200
5    http://poomedia.com/ad ...         200          Loaded in iframe
6    http://us.ad2mi.com ...            302          Loaded in 1x1 pixel iframe
7    http://searchists.com/search ...   200          JavaScript redirect
8    http://searchists.com/click/ ...   302
9    http://click.local.com ...         302
10   http://1389.r.msn.com ...          302          Ad URL fetch, advertiser charged
11   Advertiser                         200

Table 9.3: Example redirection chain corresponding to an auto-clicking module “click” from November 21, 2013. [...] is a unique number, in this case 1556987547.

In response to this contact, the second CF-C&C server sends an HTTP 303 response code, redirecting the browser to one of the URLs in the list. At this point, the auto-clicking module inserts a supplied Referer header and a click chain begins. After a series of redirects, the ad URL gets fetched, resulting in the advertiser getting defrauded.

Example redirect chain

Table 9.3 shows a redirect chain generated by the auto-clicking module, starting from when the bot contacts the C&C server to authenticate itself by setting the Host header. In response, the CF-C&C server at 46.19.137.19 redirects the bot to 1556987547.traffiliator.com. This URL was present in the original click fraud job list, and the bot now inserts the corresponding referrer before fetching the URL, which eventually causes a banner to be fetched in an iframe from poomedia.com. In addition to the banner, poomedia.com also populates the banner iframe with a 1x1 pixel iframe. This second iframe is loaded with ad URLs and JavaScript code that automatically fetches one of the links at random. This step results in an ad click, which is eventually redirected through other publishers to the ad network, and eventually to an advertiser.
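A chain like the one in Table 9.3 can be reconstructed by fetching each hop without automatically following redirects. The sketch below is a hypothetical analysis helper (not our measurement tooling) that records the HTTP portion of such a chain; JavaScript-driven hops, such as hop 7 in Table 9.3, would require a full browser to follow.

import requests
from urllib.parse import urljoin

def record_redirect_chain(url, referer=None, max_hops=15):
    # Follow HTTP redirects one hop at a time, recording each URL and
    # status code; JavaScript/Flash redirects are not executed here.
    headers = {"Referer": referer} if referer else {}
    chain = []
    for _ in range(max_hops):
        resp = requests.get(url, headers=headers, allow_redirects=False, timeout=10)
        chain.append((url, resp.status_code))
        location = resp.headers.get("Location")
        if resp.status_code in (301, 302, 303, 307, 308) and location:
            url = urljoin(url, location)  # handle relative Location headers
        else:
            break
    return chain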

Entities

Depending on the syndication arrangement between the different parties in a redirection chain, all of them stand to gain from a fraudulent click that an advertiser pays for, and thus any one of the publishers may be working with the botnet. Over time, from our observations and those of others [149], different versions of this module have been seen to defraud all major CPC ad networks, including AdCenter, AdWords, 7Search, affinity, and adsimilate. We speculate that this module evades ad network detection by spreading click fraud across a large number of ad networks to hide the high volume of click fraud performed.

Since this click fraud is invisible to the end user (unlike the search-hijacking click fraud module), the user is unlikely to convert. Given that, the use of smart pricing (cf. Section 6.2) should in theory discount such malware-driven clicks. However, because of the large number of hops in the syndication chain and the JavaScript redirects that hide the true length of the chain and the origin of the traffic, it becomes extremely difficult for an ad network to identify that the traffic is being driven by a malware source, as the click fraud traffic mixes in with other legitimate (and converting) traffic from its known syndicators, thus undermining the use of smart pricing.

9.5 Serpent: The Search-Hijacking Module

The search-hijacking module interposes on a user's normal interaction with various search engines in order to redirect the user to an advertisement that generates revenue for the botmaster. Such search-hijacking represents a more sophisticated type of click fraud. Whereas the auto-clicking module simulates a real user, the search-hijacking module sends a real user to the advertiser. Because the advertiser's site is relevant to the user's search query, the user may in fact interact with the advertiser's site and trigger a conversion, as described in Chapter 8. We describe the module's search-hijacking behavior in more detail next, and then report on our analysis of this module.

Behavior

Once loaded, the module monitors the interaction between the user on an infected PC and the browser, waiting for the user to issue a search query to a search engine (Step 1 in Figure 9.2). We have confirmed that the module recognizes and hijacks Web searches performed using Google, Bing, Yahoo, Ask, and ICQ Search. The module captures the query terms, while allowing the query to go through to the intended search engine (Step 2). At the same time, the query terms are sent to the search-hijacking module's C&C (SH-C&C) server to retrieve a list of ad URLs to be used for later hijacking (Step 3). When the user clicks on a search or ad result (Step 4), the normal click is hijacked and the intended URL is replaced with the replacement ad URL retrieved from the SH-C&C server (Step 5). The browser then fetches and renders the replacement URL instead of the intended search result URL (Step 6). Although not shown in the figure, retrieving the replacement URL may involve a chain of HTTP and JavaScript redirects. Table 9.4 gives an example redirect chain of a click issued by the module.

The replacement and redirection process operates invisibly to the user. An unsuspecting user will believe that the resulting page corresponds to the search result or ad they clicked on the search result page. It is important to note that neither the advertiser nor the ad network used in the hijacked click may be aware that search-hijacking took place. From their point of view, a hijacked user appears no different from a user arriving via normal search syndication.

Unlike traditional click fraud, such search-hijacking actually delivers a user to the advertiser. Such a user may interact with the advertiser's site and even convert, as described in Section 6.2. Because some fraction of the users will convert, smart pricing may treat traffic from the affiliate engaged in search-hijacking as legitimate and pay for each click.


Figure 9.2: Behavior of the search-hijacking module. Step 1: A user enters a term into a search engine. In this example, the user searches for "bike". Step 2: The user's browser performs an HTTP GET for that term. In response, the user is presented with the unaltered search result from the search provider. Step 3: In parallel to the user search, ZeroAccess sends the search term ("bike") to the ZeroAccess SH-C&C server. Step 4: The user clicks on a search result on the unaltered results page. Step 5: ZeroAccess intercepts the click. Rather than going to the intended click destination, the user is sent to one of the ad URLs supplied by the ZeroAccess SH-C&C in Step 3. Step 6: The user's browser displays the result of the ad click, an advertising landing page related to their original query ("bike").

Command and Control

The search-hijacking module's command-and-control (SH-C&C) protocol uses HTTP with a hardcoded set of server addresses. We now detail the different elements of this protocol.

Commands

The primary purpose of the SH-C&C channel is to retrieve a list of replacement URLs to which the user will be redirected when clicking on a query result. When the user performs a search engine query, the module makes an HTTP GET request to an SH-C&C server (Step 3 in Figure 9.2). The HTTP GET request string is formed as shown in Figure 9.3. The user's search terms, together with additional parameters, are first formatted using standard URL parameter encoding, using the printf-style format string:

v=5.4&id=%08x&aid=%u&sid=%u&q=%.*s&eng=%.*s&os=%s&br=%S&s=%u

Hop  URL                                              Status Code
1    217.23.3.223/...                                 302
2    http://feed.hype-ads.com/...                     302
3    http://search.freshcouponcode.com/search.php...  200
4    http://c.freshcouponcode.com/redrct.php...       200
5    http://c.freshcouponcode.com/click.php...        301
6    http://nn.xdirectx.com/clicklink.php...          302
7    http://2478799.r.msn.com/...                     302
8    Advertiser                                       200

Table 9.4: Example of a redirect chain corresponding to a "click" issued by the search-hijacking module on November 14, 2013. Non-final hops with 200-level status codes trigger JavaScript or Flash-type redirects.

Figure 9.3: Encoding of a ZeroAccess search-hijacking module search request. When the user issues a search query, the module requests a list of URLs to which the user should be redirected. The user's query and other module parameters are combined using the standard URL parameter encoding scheme. The resulting string is then Base64-encoded and padded by prepending 13 and appending 10 apparently random Base64 encoding characters. The resulting string is then used to form the HTTP GET request to the SH-C&C server. (Values denoted "···" have been truncated for space in this example.)

The parameter string is then encoded using Base64 encoding and padded with 13 randomly-generated5 characters at the front and 10 similarly-generated characters at the end. The length of the padding is such that a trivial decoding of the entire string does not reveal the contents of the message. The resulting string is then sent to the SH-C&C server in the HTTP GET request.
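The request construction can be summarized in a few lines. The sketch below follows the description above but makes no claim to byte-for-byte fidelity with the module: the parameter values are illustrative, the format specifiers are simplified relative to the module's printf-style template, and ordinary pseudo-random padding stands in for the Windows random number API.

import base64
import random
import string

B64_CHARS = string.ascii_letters + string.digits + "+/"

def encode_search_request(query, bot_id=0x1234, aid=1, sid=1, eng="google",
                          os_name="WinXP", browser="IE8", s=1):
    # Fill in a simplified version of the module's parameter template.
    params = ("v=5.4&id=%08x&aid=%d&sid=%d&q=%s&eng=%s&os=%s&br=%s&s=%d"
              % (bot_id, aid, sid, query, eng, os_name, browser, s))
    body = base64.b64encode(params.encode()).decode()
    # Pad with 13 random Base64-alphabet characters in front and 10 at
    # the end, so a naive Base64 decode of the whole string fails.
    pad = lambda n: "".join(random.choice(B64_CHARS) for _ in range(n))
    return pad(13) + body + pad(10)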

5The malware generates the padding characters using the Windows random number API.

IP Address        v=   Pseudo-Domain                     Purpose
195.3.145.108     5.4  dclixvfpttrlcnindvrnyeic.com      Search request
                  5.4  evtrdtikvzwpscvrxpr.com
                  5.3  atenrqqtfrzozqrqbdzwkxzyuc.com
83.133.120.186    5.4  gozapinmagbclxbwin.com            Search request
                  5.4  nbqkgysciuuhadgpjfquvpu.com
                  5.3  cjelaglawfoyidgyapv.com
83.133.120.187    5.4  jpciukjdkqxgreoikpgya.com         Search request
                  5.4  qhdsxosxtvmhurwezsipzq.com
                  5.3  omakfdwkhrpqudxvapy.com †
217.23.3.225      5.4  hzhrjmeeczcgxodmqyz.com           Search request
                  5.4  fnyxzjeqxzdpeocarhljdmyjk.com
                  5.3  sqdfmslznztfozshtidmigmsbh.com †
217.23.3.242      5.4  vdlhxlmqhfafeovqohwrbaskrh.com    Search request
                  5.4  nmfvaofnginwocnidecxnpcs.com
                  5.3  euuqddlxgrnxlrjjbhytukpz.com †
188.40.114.195    2.1  qvhobsbzhzhdhenvzbs.com           Click confirmation
188.40.114.228 ∗  2.1  mbbcmyjwgypdcujuuvrlt.com         Click confirmation
                  2.0  wuyigrpdappakoahb9.com
217.23.9.247      6.1  vzsjfnjwchfqrvylhdhxa.com         Flash player identification
                  5.8  vjlvchretllifcsgynuq.com
83.133.124.191 ∗  5.6  chvhcncpqttfpcibtmetg.com         Flash player identification
178.239.55.170    1.2  jgvkfxhkhbbjoxggsve.com           Unknown / JavaScript injection
83.133.120.16     1.2  xlotxdxtorwfmvuzfuvtspel.com      Unknown / JavaScript injection
                  1.1  mkvrpknidkurcrftiqsfjqdxbn.com
83.133.124.191    —    ezcfogjitbqwnornezx.com           Fallback
                  —    rwdtklvrqnffdqkyuugfklip.com
                  —    uinrpbrfrnqggtorjdpqg.com
188.40.114.228    —    jzlevndwetzyfryruytkzkb.com       Fallback
                  —    glzhbnbxqtjoasaeyftwdmhzjd.com
                  —    kttvkzpwufmrditdojlgytxyb.com
46.249.59.47      —    loanxohaktcocrovagkaa.com         Fallback
                  —    mxyawkwuwxdhuaidissclggy.com
                  —    erspiwscuqslhjflgbbgcfbc.com
46.249.59.48      —    spujplpdupiwbghiedhqeja.com       Fallback
                  —    xttfdqrsvlkvmtewgiqolttqi.com
217.23.9.140      —    dxgplrlsljdjhqzqajkcau.com        Fallback

Table 9.5: Pseudo-domains and IP addresses extracted from the search-hijacking module via malware executions and reverse engineering. The v= column shows the value of the v argument used in requests. The Purpose column lists the class of commands sent to the SH-C&C server. Communication attempts to (and DNS requests for) Fallback IP addresses occur when the malware is unable to establish communication with a pseudo-domain selected for another function. When this occurs, the original SH-C&C message is sent to the fallback IP address. In this case, a pseudo-domain corresponding to the fallback address does not appear in the Host field; instead, the original pseudo-domain appears. Pseudo-domains labeled with † were discovered via reverse engineering, but not verified in observations of network requests, presumably due to limited executions. IP addresses labeled with ∗ reflect addresses that were unexpected given the associated pseudo-domains. This anomalous behavior occurred in a very small number of executions, perhaps due to some kind of bug, or related to the fallback domains. We inferred the IP addresses associated with predicted but not observed pseudo-domains via the de-obfuscation algorithm.

IP Address     Location
217.23.3.223   Netherlands
83.133.127.85  Germany

Table 9.6: IP addresses of the first hop servers in the ad click redirection chain for the search-hijacking module.

Primary Rendezvous

Hardcoded into each version of the ZeroAccess search-hijacking module is a list of .com domains of the form shown in Table 9.5. However, these domains are not resolved in the usual way using the Domain Name System. Instead, each domain encodes an IP address directly. To make the distinction clear, we call these pseudo-domain names. To obtain the IP address of a command-and-control server, the module decodes one of the pseudo-domain names to an IP address using the following algorithm, which we extracted from the module binary.

from binascii import crc32
from struct import pack
from socket import inet_ntoa

def deobfuscate_domain(d):
    # Each byte of the C&C IP address is a CRC32 over a fixed slice of
    # the pseudo-domain, seeded with a hard-coded constant.
    b0 = crc32(d[0:5], 0x7E873D53) & 0xFF
    b1 = crc32(d[5:9], 0x570848EB) & 0xFF
    b2 = crc32(d[9:12], 0x768772F3) & 0xFF
    b3 = crc32(d[12:17], 0x4775114F) & 0xFF

    ip_as_int = b0 + (b1 << 8) + (b2 << 16) + (b3 << 24)
    # Reassemble the bytes (b0 as the low-order byte) and render the
    # result as a dotted-quad IP address string.
    packed_ip = pack('<I', ip_as_int)
    return inet_ntoa(packed_ip)
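As a usage illustration (hedged, since the final packing and return of the listing above were reconstructed from context rather than recovered verbatim), the routine can be applied to any of the pseudo-domains in Table 9.5:

ip = deobfuscate_domain(b"dclixvfpttrlcnindvrnyeic.com")
# Table 9.5 pairs this pseudo-domain with the C&C address 195.3.145.108.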

In addition to decoding to an IP address, the domain name may have been used for authentication, as described in the next section. Table 9.5 lists all pseudo-domain names associated with the search-hijacking module that we observed, along with their corresponding IP addresses. In addition, Table 9.6 lists the IP addresses of the first hop servers in the search-hijacking ad click redirection chain.

Authentication

Normally, an HTTP interaction progresses as follows: (1) the browser resolves the domain name in the URL (e.g., www.google.com) to an IP address; (2) the browser connects to the Web server at the given address; (3) in its request, the browser sends a Host header specifying the domain name. Instead, the bot client skips the first step and extracts directly from the pseudo-domain the associated IP address encoded in the name. It then connects to the Web server and still includes a Host header with the pseudo-domain, even though it never resolved that name, and in fact could not, since the name is unregistered in some cases.

During our early exploration of the botnet, we observed that manipulating or removing the Host header resulted in the SH-C&C protocol responding to messages with errors. After subsequent updates to the SH-C&C protocol, however, we were unable to reproduce this behavior. This behavior, when active, could reflect usage of the domain name as a way to authenticate legitimate bot clients to the SH-C&C server. No normal Web browser can reach the server via the domain name since it is not registered; and presumably no scanner trying to find Web servers will know which domain name to include in the Host header to look like a bot client.

Encryption

In response to an SH-C&C message, the server sends back an HTTP octet-stream of RC4-encrypted ciphertext. The response to search result C&C requests, once decrypted, provides a list of zero or more replacement ad URLs. These URLs are used in Step 5 of the hijacking process. The target for each ad URL is a first-hop ad server (Table 9.6), which when visited will begin a 302-redirect chain. Along with this ad click URL is another URL to be used as a forged Referer field in the subsequent ad fetch.
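Because the response body is plain RC4, a textbook implementation suffices to recover the URL list once the key is known; how the module derives that key is not specified here, so the routine below is a generic sketch rather than the module's own decryption code (RC4 encryption and decryption are the same operation).

def rc4(key: bytes, data: bytes) -> bytes:
    # Key-scheduling algorithm (KSA).
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) & 0xFF
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA): XOR keystream with data.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + S[i]) & 0xFF
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) & 0xFF])
    return bytes(out)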

Rate Limiting

During our interaction with the ZeroAccess SH-C&C we observed rate limiting of advertisement clicks. When we performed frequent searches from a particular IP address, the SH-C&C initially returned a large (more than 5) number of ads per query. The more we interacted with the advertisements, though, the fewer ads were returned in subsequent responses. Specifically, we observed rate limiting across the following dimensions independently: source IP address, search term, and affiliate ID. Rate limiting based on IP address or affiliate ID may indicate the botnet attempting to limit the amount of fraud performed by a particular entity, in order to avoid detection. Rate limiting based on search term may reflect a limited supply of relevant ads for a particular term.

Secondary Rendezvous

Prior to establishing a connection to an IP address derived from an obfuscated SH-C&C domain, the ZeroAccess malware performs a DNS request (an A record) for the domain. This DNS request is generated by the ZeroAccess malware itself, and does not use any of the traditional Windows APIs for resolving domains. The request always has DNS transaction ID 0x3333 and is always sent to Google's DNS server at 8.8.8.8.

Through both static reverse engineering and live malware executions, we have been unable to ascertain the purpose of this DNS activity. When a new version of the search-hijacking module is released, the domains associated with that module are generally not registered. Throughout the lifetime of the module various domains will become registered, sometimes by security researchers. However, the behavior of the ZeroAccess malware does not appear to be affected by the responses to these DNS requests. The malware never uses the IP addresses returned by these queries, and its execution continues independent of the resolution status of the domain. Other malware families have used similar functionality as a secondary rendezvous to regain control of the botnet in the event of a takedown [87]. Although we do not believe the current ZeroAccess versions have such behavior, we are unable to definitively rule it out.
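Because both the destination resolver and the transaction ID are fixed, this vestigial lookup makes a convenient network fingerprint. The sketch below is a hedged detection idea, not a rule from this work, expressed as a Scapy packet filter.

from scapy.all import sniff, DNS, IP

def looks_like_serpent_lookup(pkt):
    # Flag DNS queries sent directly to Google's resolver that carry
    # the fixed transaction ID 0x3333 described above.
    return (pkt.haslayer(DNS) and pkt.haslayer(IP)
            and pkt[IP].dst == "8.8.8.8"
            and pkt[DNS].qr == 0          # query, not response
            and pkt[DNS].id == 0x3333)

# Example: sniff(filter="udp port 53", lfilter=looks_like_serpent_lookup,
#                prn=lambda p: p.summary())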

Additional Functionality

In addition to the primary SH-C&C message that sends search terms and receives replacement ad URLs, there are three other distinct types of SH-C&C communication. The four SH-C&C messages have the same behavior with respect to how messages are formatted, obfuscated, and encrypted. These messages correspond with the Purpose categories in Table 9.5. Each of the four message types uses the same primary rendezvous technique, although each pulls from a distinct set of domain names. The three additional SH-C&C messages have the following syntax:

v=1.2&id=%u&aid=%u&sid=%u&os=%s
v=6.1&id=%08x&aid=%u&sid=%u&os=%s&fp=%s&ad=%u
v=2.1&id=%08x&aid=%u&sid=%u&kw=%s&url=%s&ref=%s&os=%s

Click confirmation: After a user's click is hijacked, the malware sends a message of type v=2.1 to an SH-C&C server, reporting the click URL in the url parameter. In response to this message the SH-C&C server may direct the malware to perform additional clicks.

Flash player identification: Messages of type v=6.1 report the user's Flash Player version to SH-C&C servers. The version is relayed via the fp parameter.

Unknown / JavaScript injection: The intended purpose of type v=1.2 messages is unknown. In practice these messages occur far less frequently than the other types of communication. In response to this message from the malware, the SH-C&C will occasionally respond with ad network JavaScript, which we suspect is then injected into webpages viewed by the user.

Module History

Between May 2013 and November 2013 we observed two distinct versions of the search-hijacking module (independent of changes to the included pseudo-domains). Initially (May and part of June), the v parameter observed in search request messages was 5.3. During this time the response to search queries was a list of plaintext advertisements. In June, the value of the v parameter changed to 5.4, and the response to search queries took on the encrypted form described above. The v identifier for the other three categories of commands has also changed over time, as shown in Table 9.5.

We have also examined samples that had different pseudo-domain names hardcoded into them. Despite including different pseudo-domain names, the names generally decode to the same set of IP addresses.

Advertising Networks

Over time, from our observations and those of others [149], this module has been seen to defraud a large number of ad networks, including 7search, Affinity, Domain Development Corporation, and Hoist Media. Some of these ad networks overlap with those seen defrauded by the auto-clicking module, but some we have only seen defrauded by the search-hijacking module.

9.6 Summary

In this chapter we have described two ZeroAccess modules: the auto-clicking module, which performs traditional click fraud by simulating user clicks on advertisements, and a more recent search-hijacking module, which intercedes upon user clicks on Web search results, instead sending the user to an advertisement related to the search. In both cases, the botmaster stands to earn a commission from the click. We documented technical specifics for both forms in detail, including key infrastructure components (domain names and IP addresses corresponding to C&C servers) we discovered via reverse-engineering of the modules and by observation of the malware's live execution.

Chapter 10

Characterizing Large-Scale Click Fraud in ZeroAccess

10.1 Introduction

Click fraud is a scam that hits a criminal sweet spot by both tapping into the vast wealth of online advertising and exploiting that ecosystem's complex structure to obfuscate the flow of money to its perpetrators. In this chapter we illuminate these problems by characterizing the click fraud perpetrated by ZeroAccess. Active in a variety of forms since 2009 [56], ZeroAccess was one of the largest botnets in operation, commanding an estimated 1.9 million infected computers as of August 2013 [109]. More importantly, ZeroAccess was known for monetizing primarily via click fraud (with losses to advertisers estimated at $2.7 million per month [149]). However, while the technical aspects of ZeroAccess's design and operations (e.g., infection vector, peer-to-peer C&C protocol) are well documented [56, 109, 149, 150], the nature of its click fraud behavior and the attendant monetization has seen less study. In part, this is because such analyses require a range of disparate vantage points, including of the botnet itself, of infected hosts, and of impacted publishers.

By combining an array of data sources, including peer-to-peer measurements, C&C telemetry from botnet infiltration, and click information from one of the top ad networks, we have constructed a deep analysis to illuminate the rich, intertwined nature of modern click fraud and the advertising ecosystem it exploits. In particular, our work makes three contributions: First, we provide a detailed description of the click fraud component of ZeroAccess, the innovations it introduced in hijacking high-quality user search traffic, and the side effects by which we were able to track its activity. Second, we show how to match botnet membership data, network telemetry, and ad network click streams using a combination of timing information and reactions to external events (in this case the Microsoft-initiated takedown of ZeroAccess click fraud infrastructure and the botmaster's immediate responses). Finally, using this technique we identify with high confidence 54 individual "ad units" (here roughly corresponding to distinct traffic sellers) whose traffic volume (and hence revenue) was predominantly rooted in ZeroAccess. By anchoring our analysis in these ad units, we roughly estimate that the botnet produced on the order of a million fraudulent clicks a day, plausibly inducing advertising losses on the order of $100,000 per day. However, the uncertainties involved in extrapolating to this global picture loom large enough that we must caution that this reflects a coarse-grained estimate, and accordingly we discuss the challenges involved in forensic accounting of click fraud payouts.

Taken together, this chapter illustrates the complex nature of the click fraud problem and highlights the need for much better mechanisms for correlating traffic and payment streams.

This chapter is based on work that appeared at the ACM Conference on Computer and Communications Security (CCS) [115].

10.2 Data Sources and Quality

Our study draws upon extensive, disparate sets of data. The individual datasets all suffer from limitations or skews of various forms. Constructing our large-scale picture in a sound fashion frequently requires cross-correlating different data sources in order to filter out spurious activity and bring out the underlying signals that reflect different facets of ZeroAccess's click fraud activity. In this section we sketch each of the data sources, our use of it, and its associated data-quality issues. A summary of all data used in this study is given in Table 10.1.

Click data. Via our partnership with one of the top ad networks, we acquired access to the "clicks" that the network logged from Nov 28–Dec 6 2013, spanning the Takedown event. The ad network views this data as highly sensitive from a business perspective and thus in our study we use the data at reduced fidelity and present certain facets of it only in relative terms rather than using absolute values. In abstract terms, each "click" datum consists of a count of ad-following Web requests observed arriving at the ad network from a given source, during a given interval, and associated with a given ad unit. In the context of this work, an ad unit maps roughly to a distinct traffic seller. In particular, our click data aggregated individual clicks into tuples consisting of ⟨one-hour interval, /24 subnet, ad unit, count⟩. We believe this data to be of high quality: none of our analyses or cross-checks raised questions regarding any potential inaccuracies or missing values.

Ad unit data. Our ad network partner also provided information for selected ad units in terms of the conversion percentage for their ads, and their mean and median smart-pricing discounts, with these latter being in normalized form so as not to reveal sensitive business information. The data included those ad units we identified as very likely tainted by ZeroAccess activity, as well as two randomly sampled populations of comparable size, and global baseline figures aggregated across all ad units. This data covers the same time period as the click data discussed above. It allows us to explore the relative effects of ZeroAccess's activity compared to regular ad unit costs and conversion efficacy.
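To make the shape of this aggregation concrete, the following minimal sketch (in Python, with an invented log schema; the ad network's actual pipeline is not available to us) shows how raw click records reduce to the ⟨one-hour interval, /24 subnet, ad unit, count⟩ tuples used throughout this chapter.

from collections import Counter
from datetime import datetime

def subnet_24(ip):
    """Collapse an IPv4 address to its /24 subnet, e.g. 192.0.2.57 -> 192.0.2.0/24."""
    return ".".join(ip.split(".")[:3]) + ".0/24"

def aggregate_clicks(raw_clicks):
    """Aggregate raw click records into <one-hour interval, /24 subnet, ad unit> counts.

    raw_clicks: iterable of (timestamp, source_ip, ad_unit) tuples, where timestamp
    is a datetime. These field names are illustrative, not the ad network's schema.
    """
    counts = Counter()
    for ts, ip, ad_unit in raw_clicks:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour, subnet_24(ip), ad_unit)] += 1
    return counts

# Example: two clicks from the same /24 in the same hour collapse into one tuple.
clicks = [
    (datetime(2013, 12, 5, 15, 12), "192.0.2.57", "unit-A"),
    (datetime(2013, 12, 5, 15, 48), "192.0.2.99", "unit-A"),
]
print(aggregate_clicks(clicks))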

Dataset                                   Granularity       Quantity
ZA DNS telemetry, Dec 1–Dec 4
  Timestamp                               millisecond       16,208,758
  Domain                                  Full query        12
  IP                                      /24               336,609
Supernode data, Dec 1–Dec 6
  Timestamp                               millisecond       260,811,204
  IP                                      /32               1,137,118
  IP                                      /24               637,736
  Bot type                                32/64 bit OS      2
ZA module distribution, Jun 18–Apr 25
  Timestamp                               second            51
  Module ID                               64-bit ID         51
  Module MD5                              Full MD5 sum      51
Milker data, Sept 10–Dec 5
  Ad replacements                         Full URL          1,766
  Redirects                               Full URL          10,796
Click data, Nov 28–Dec 6*
  Timestamp                               hour buckets      Over 10TB of raw
  IP address                              /24               ad server logs
  Clicks                                  hourly sum
  Ad unit                                 Anon ID
Ad unit data, Nov 28–Dec 6*
  Timestamp                               hour buckets      Around 2TB of raw
  IP address                              /24               ad server logs
  Clicks                                  hourly sum
  Ad unit                                 Anon ID

Table 10.1: Summary of datasets used in our study. *Precise quantities for click and ad unit data intentionally omitted due to business sensitivities. We only use this data in aggregate.

ZeroAccess DNS telemetry. As discussed previously, vestigial code in Serpent's modules leads it to issue DNS requests to a number of different domains based on its current activity. Each different Serpent function has a different set of domains associated with these lookups; as far as our extensive analysis could tell, the malware chooses randomly among the set for a given function. The malware does not process any replies it receives for associated DNS requests, and thus does not even require that the domains exist. Indeed, during our study most of the domains did not exist. Beginning on Nov 28 2013 we registered all such ZeroAccess domains not already registered (6 out of 12), and immediately began receiving queries for them. The queries all came from Google's public DNS server, 8.8.8.8, and thus nominally did not identify the associated ZeroAccess system. However, Google includes support for an EDNS0 option [32] that identifies the subnet originally associated with requests that resolvers such as theirs issue. Thus, our data from this source has the form of tuples consisting of ⟨timestamp, domain, /24 subnet⟩, where the timestamp has high precision (sub-second) as recorded at our DNS server. The domains we registered included 3 of the 5 associated with Serpent modules reporting that users had searched, and the sole domain associated with Serpent reporting that a user had clicked on a substituted search ad. (The other domains related to functionality not relevant for our click fraud study.)

At first blush this data held promise for illuminating the fine-grained activity of (nearly) each Serpent infectee. However, extensive analysis of the data revealed that lookups did not have a one-for-one correspondence to individual Serpent actions. The data also included significant activity clearly associated with timers, but not so sharply timer-driven that we could readily distinguish it from legitimate activity. However, the data does provide us with a virtually complete list of IP addresses associated with ZeroAccess's Serpent module, which allowed us to cross-check against ZeroAccess supernode data to assess its completeness.

Supernode data. From a partner we acquired a list of nodes discovered by crawling the ZeroAccess peer-to-peer network from Dec 1–6 2013. Each entry consists of ⟨timestamp, address, ZA-network⟩, where timestamp is high precision, address is the full /32 IP address, and ZA-network reflects on which of the two ("32-bit" and "64-bit") P2P networks the crawl found the node. Note that these nodes should reflect a superset of Serpent nodes, since not all ZeroAccess infectees ran Serpent. Thus, from a click fraud perspective this data is potentially more complete than that derived from the DNS data described above. However, by cross-checking the /24s seen in the DNS data with the equivalent /24s seen in this data, we found that the supernode data included only 200,708 of the 336,609 Serpent /24s. This discrepancy is explained by the design of the P2P crawl resulting in an incomplete view of the P2P network. Thus we conclude that this data reflects only about 60% of the entire ZeroAccess population. This shortfall becomes crucial in our subsequent analysis as we aim to determine which ad units present in the ad network's click data clearly had significant ZeroAccess-generated clicks in their traffic.
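The two steps just described can be sketched as follows. This is a minimal illustration only: it assumes dnspython for parsing the EDNS0 Client Subnet option and treats both datasets as plain sets of /24 strings, neither of which reflects our actual collection code.

import dns.message
import dns.edns

def ecs_subnet_24(wire_query):
    """Recover the originating /24 from the EDNS0 Client Subnet option that
    resolvers such as Google's attach to the queries they forward.
    IPv4 assumed for brevity; returns None if no ECS option is present."""
    msg = dns.message.from_wire(wire_query)
    for opt in msg.options:
        if isinstance(opt, dns.edns.ECSOption):
            return ".".join(opt.address.split(".")[:3]) + ".0/24"
    return None

def supernode_coverage(dns_subnets, supernode_subnets):
    """Fraction of Serpent /24s (from DNS telemetry) also seen by the P2P crawl.
    With the counts reported above (200,708 of 336,609), this is roughly 60%."""
    return len(dns_subnets & supernode_subnets) / len(dns_subnets)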
ZA module distribution information. Beginning on Dec 4 2013, we ran ZeroAccess infectees in 56 long-running, contained VM environments to allow them to participate in the P2P network and thus receive module updates. Whenever one of them received a new module, we detected the event in real-time and captured a copy of the module. This data allows us to track the evolution of the botnet's functionality. In particular, this data source allows us to track the botnet's partial recovery post-Takedown (when the auto-clicking modules were updated), which we use in our subsequent analysis as one of the signals for identifying ad units whose traffic has significant taint from ZeroAccess activity. It also allows us to study the blending of Serpent traffic with auto-clicking.
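As an illustration only, this kind of capture can be approximated by periodically hashing whatever new module binaries the contained VMs write out; the directory layout, file extension, and polling interval below are invented and do not describe our monitoring infrastructure.

import hashlib
import time
from pathlib import Path

def watch_for_new_modules(drop_dir, poll_seconds=60):
    """Record a (timestamp, filename, MD5) line for each new module binary that
    appears under drop_dir. A simplified stand-in for real-time capture."""
    seen = set()
    while True:
        for path in Path(drop_dir).glob("*.bin"):
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            if digest not in seen:
                seen.add(digest)
                print(time.strftime("%Y-%m-%d %H:%M:%S"), path.name, digest)
        time.sleep(poll_seconds)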

Milker data. Drawing upon extensive reverse engineering, we developed an emulator for the ZA-C&C protocol used by Serpent to request ads to substitute into those present in a user's search results. The emulator enabled us to "milk" ad replacements out of the C&C server by repeatedly requesting ads from it, though the C&C server appeared to "dry up" in its ability to provide new replacements after repeated queries. (The rate at which this drought occurred varied.) Each C&C reply provided a URL to click on (along with a matching Referer). We followed the URL using a fully functional headless Web browser presenting the curl User-Agent (such a User-Agent prevents advertisers from being charged for our seeming clicks), which in general would continue for each click until it ends at an advertiser's landing page. In total, we captured the redirection chains for 1,766 such clicks for a small sampling of search terms we selected from trending shopping queries. 367 of the clicks transited our partner ad network. Based on our DNS telemetry correlated with our click data, and supported in part by our small-scale milking experiment, we believe that the global impact of ZA was likely an order of magnitude larger than seen by our partner ad network.
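The following simplified sketch shows the essence of following one milked click chain. It uses Python requests rather than the fully functional headless browser we actually employed (so it would miss JavaScript-driven redirects), and the URL and Referer values are placeholders, not real C&C output.

import requests

def follow_click_chain(click_url, referer):
    """Follow a milked ad-replacement URL toward the advertiser's landing page,
    recording each HTTP redirect hop. Presenting a curl User-Agent (as described
    above) keeps these clicks from being billed to advertisers."""
    headers = {"User-Agent": "curl/7.30.0", "Referer": referer}
    resp = requests.get(click_url, headers=headers, allow_redirects=True, timeout=30)
    return [r.url for r in resp.history] + [resp.url]  # every hop, landing page last

# e.g. follow_click_chain("http://feed.example.test/click?id=123",
#                         "http://search.example.test/results?q=tablet")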

10.3 Analyzing Fraud

To build up our overall picture of ZeroAccess's large-scale click fraud activity, we start with the data most central to assessing ZeroAccess's impact, namely the Click data provided by our ad network partner. We then draw upon our other data sources to identify activity (primarily ad units) associated with ZeroAccess clicks. We employ two main approaches—analyzing how sources behaved around the Takedown event, and examining source "demographics" in terms of which subnets contribute clicks—and then cross-correlate these two to develop our overall picture.

Takedown Dynamics

The Dec 5 2013 Takedown event abruptly severed C&C for ZeroAccess's click fraud activity, which in principle should manifest as a striking change in the activity of any source fueled by ZeroAccess clicks. We then face the basic question of how to reliably detect this presumably sharp signal without inadvertently treating benign variations in traffic rates as stemming from ZeroAccess.

Figure 10.1 shows the relative clicks per hour for four exemplary ad units, with the leftmost vertical line marking Takedown. Here we have partitioned the examples into "good" ad units that do not primarily receive ZeroAccess traffic and "dirty" ones that do (using a methodology we will describe shortly). Normal click behavior prior to Takedown exhibits the expected diurnal pattern. For good ad units, this pattern continues (Figure 10.1a), while for dirty ones, precisely at Takedown their clicks cease or dramatically decrease (Figures 10.1c and 10.1d).

[Figure 10.1: four panels of clicks per hour, each plotting Clicks against Hours since 2013-11-30 2300 UTC. (a) Good Ad Unit, ZA Taint: 0.0269 -> 0.0251; (b) Good Ad Unit, ZA Taint: 0.1111 -> 0.02; (c) Dirty Ad Unit, ZA Taint: 0.4176 -> 0; (d) Dirty Ad Unit, ZA Taint: 0.5608 -> 0.197.]

Figure 10.1: Clicks per hour prior to and after Takedown for 4 exemplary ad units. The thick vertical line marks Takedown (8AM PST, Dec 5 2013). The dotted line to its right indicates the release by the ZeroAccess botmaster of a new auto-clicking module to counteract the Takedown. The thinner line to its right indicates the "WHITE FLAG" module release (per Section 8.1). The plot titles also give the "ZA taint" pre- and post-Takedown (per Section 10.3). 10.1a shows a typical large ad unit that does not see a significant drop in click traffic post-Takedown. The ad unit in 10.1b exhibits a large drop in traffic that occurred prior to Takedown, highlighting the surprising benign dynamics manifest in the data. 10.1c shows an ad unit whose traffic almost entirely consisted of fraudulent clicks. The ad unit in 10.1d clearly had a substantial proportion of ZeroAccess traffic, but mixed in with legitimate or non-ZeroAccess clicks.

However, we cannot simply attribute ZA-dirtiness to any source that precipitously fell at the time of Takedown, because such behavior also manifests for benign sources (per Figure 10.1b). Such behavior can be attributed to advertising budget depletion or the end of specific advertising campaigns. This necessitates incorporating multiple signals to soundly identify dirty sources.

One such signal concerns the effects of a secondary Takedown event. The dotted line in Figure 10.1 corresponds to the release of a new auto-clicking module as a response to the attempted takedown. This module contained new C&C IP addresses, and resulted in auto-clicking click fraud resuming for a subset of the botnet. We can see that some sources exhibit a spike in activity coincident with this module's appearance (Fig. 10.1d) while others do not (Fig. 10.1c).

Armed with these signals, we attempted to robustly identify dirty sources based on statistical testing. We undertook numerous evaluations looking for robust indications of behavioral shifts. For example, we compared the volume of each source's click activity as seen during the hour of the Takedown (denoted H-hour) versus during the previous hour (H − 1). Using the null hypothesis that the relationship between these counts remained unaffected by the Takedown, we applied Fisher's Exact Test to assess the consistency of the shift between those hours as seen on a non-Takedown day versus that seen on Takedown day.

The test identified a large number of sources with statistically significant deviations in the shift for those hours, even for quite low p values (e.g., 0.001).1 Manual inspection of the most extreme examples confirmed that many appeared to clearly reflect instances of ZeroAccess-affected behavior. However, when we then tested one non-Takedown day against another non-Takedown day, we likewise found many statistically significant deviations, which clearly had nothing to do with the Takedown and thus presumably nothing to do with ZeroAccess activity.

The clear conclusion (backed up by manual assessments of exemplars) is that the null hypothesis often fails to hold due to frequent non-stationarity in the data. That is, a given click source's patterns from one day to the next can exhibit striking variations; two separate days are not well-modeled as independent samples from the same underlying population. (Figure 10.1b shows such an instance.) This lack of stationarity significantly complicates our analysis, and means that statistical testing can only serve as a guide to help direct manual analysis due to the risk of false positives. (In addition, the non-stationarity serves as a caution for applying any sort of training-based machine learning to the problem of identifying fraudulent ad click sources.)
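For concreteness, the hour-to-hour comparison described above can be expressed as a test on a 2x2 contingency table; the sketch below uses scipy and illustrative counts rather than the ad network's actual data.

from scipy.stats import fisher_exact

def takedown_shift_significant(baseline_h, baseline_h_minus_1,
                               takedown_h, takedown_h_minus_1, alpha=0.001):
    """Test whether a source's H-hour vs. (H-1)-hour click counts shifted
    abnormally low on Takedown day relative to a non-Takedown baseline day.
    One-sided, per the footnote: we only care about drops at H-hour."""
    table = [[takedown_h, takedown_h_minus_1],
             [baseline_h, baseline_h_minus_1]]
    _, p = fisher_exact(table, alternative="less")
    return p < alpha, p

# Example: a source that held steady on the baseline day but collapsed at H-hour.
print(takedown_shift_significant(baseline_h=480, baseline_h_minus_1=500,
                                 takedown_h=40, takedown_h_minus_1=520))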

Subnet overlap

Conceptually separate from the Takedown dynamics, we can seek to identify dirty ad units by assessing each ad unit's degree of "ZA taint", i.e., the proportion of its individual click sources potentially associated with ZeroAccess activity. This taint can then provide us additional context with which to interpret a given source's Takedown dynamics.

1 We used a one-sided test since we only had interest in a shift towards an abnormally low H-hour level.

[Figure 10.2: two scatter plots, "Pre Takedown Click Volume vs ZA Taint" (left) and "Takedown Click Volume vs ZA Taint" (right), each plotting the ratio of ZA-tainted subnets prior to H-hour (Y axis) against the click-volume ratio H/[H−1] (X axis), one point per ad unit.]

Figure 10.2: Comparison of ad-unit click volume before and after the Takedown hour (H-hour) to the amount of ZeroAccess taint. The left shows this comparison on the day prior to Takedown, and the right across Takedown. A large population of ad units has both higher-than-average ZeroAccess taint and also shows a strong shift towards much fewer clicks on Takedown. The harmonics at x = 0.5, x = 1.0, x = 1.5, etc., arise from ad units with very low click volumes.

ZA taint. For each ad unit, we consider its full set of clicks. Due to restrictions of the dataset, we identify clicks based on one-hour granularity and the /24 subnet of the source of each click. For each hour, we compute the proportion of subnets appearing in the ad unit's traffic that also appeared during that hour in our Supernode data. We then term the mean value of that proportion as the ad unit's "ZA taint". (We limit this computation to hours up to but not including H-hour, so as to not skew the taint by the Takedown dynamics.)

Limitations of the supernode comparison. The Supernode data suffers from incompleteness. By the nature of the ZeroAccess P2P network, no single node has a complete vantage point of the entire network, and thus the matching for our taint computation may incur significant false negatives. To gauge the impact of these false negatives, we compared the ZeroAccess supernode data with our DNS telemetry. Our DNS telemetry gives us complete /24 subnet views of the Serpent portion of the ZeroAccess botnet, but no vantage of the auto-clicking portion. Still, any Serpent subnet missing from the Supernode data likely reflects incompleteness in the latter. If we look for DNS telemetry subnets to show up within 1 hour in the Supernode data, then we find about an 80% match. If we look for matches within 1 minute, this drops to 60%. Infectees repeatedly show up in the Supernode data with such frequency that this latter comparison may in fact provide a better estimate than the former (which allows for a degree of IP address churn to introduce false positives).

A further limitation of our data is the reduction of address information to /24 subnets, which results in us tainting all clicks from a given subnet as bad. Such false positives will result in some benign ad units having increased ZeroAccess taint.
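A minimal sketch of this taint computation, assuming the click and Supernode data have already been reduced to per-hour sets of /24 subnets (our actual pipeline operates over the datasets described in Section 10.2):

from statistics import mean

def za_taint(ad_unit_clicks, supernode_subnets_by_hour, h_hour):
    """Mean, over hours strictly before H-hour, of the fraction of the ad unit's
    click /24s that also appear in the Supernode data for that hour.
      ad_unit_clicks: dict mapping hour -> set of /24 subnets that clicked
      supernode_subnets_by_hour: dict mapping hour -> set of /24 subnets crawled
    These data structures are illustrative placeholders."""
    hourly = []
    for hour, subnets in ad_unit_clicks.items():
        if hour >= h_hour or not subnets:
            continue  # only hours before H-hour contribute
        za = supernode_subnets_by_hour.get(hour, set())
        hourly.append(len(subnets & za) / len(subnets))
    return mean(hourly) if hourly else 0.0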

Combining Signals

[Figure 10.3: scatter plot of the Takedown-day H/[H−1] click-volume ratio (Y axis) against the minimum previous H/[H−1] ratio (X axis), one point per ad unit.]

Figure 10.3: Comparison of the stability of traffic drops across H-hour per ad unit. We plot the Takedown H-hour ratio against the minimum H-hour ratio seen on any other day. We denote ad units with ZA taint ≥ 0.4 using solid red circles, and others with hollow blue circles. Note that we have added small amounts of Gaussian jitter to prevent coincident points from completely overlapping.

Given the limitations of the Supernode data, we cannot use the presence of these IPs as a sole indicator. Instead, we look for ad units that exhibit both a high fraction of ZeroAccess taint as well as uncharacteristic behavior at the various Takedown-related events. To combine notions of taint and Takedown response, we calculate the ratio of the amount of click traffic at H-hour to that of the previous hour, H/[H−1]. Ratios closer to 0 denote sharp drops in traffic. Dirty ad units might not exhibit a ratio of exactly 0 because of traffic from other sources, or click-fraud events already "enqueued" from prior to the Takedown.

Figure 10.2 shows a comparison of the traffic-drop ratio to ZeroAccess taint for each ad unit. We do this for H-hour on both the day before Takedown (left) and the day of Takedown (right). The first plot shows a population of ad units with higher-than-average ZeroAccess taint centered around a drop ratio of 1 (no major change across H-hour) on the day prior to Takedown. The second plot shows a dramatic shift to a greatly reduced volume of traffic at H for that same population of ad units with high ZA taint, with a large number of them approaching zero traffic.

Figure 10.3 explores the stability of traffic of each ad unit across H-hour over time. We plot the Takedown traffic ratio across H-hour against the minimum H-hour ratio for all previous days. Solid red circles denote ad units with ≥ 0.4 ZA taint. A large population of ad units with high ZeroAccess taint exhibits atypically low H-hour ratios (below 0.5) compared to their previous minimum (above 0.5). We deem these ad units as likely ZeroAccess dirty.

[Figure 10.4: scatter plot of the Takedown-day R/[R−1] click-volume ratio (Y axis) against the Takedown-day H/[H−1] ratio (X axis), one point per ad unit.]

Figure 10.4: Ad unit response to Takedown (the X axis, in terms of H-hour ratio) versus response to the subsequent dissemination of the new auto-clicking module (the Y axis, in terms of R-hour ratio). Solid red circles indicate ad units with ZA taint ≥ 0.4. Triangles above and below mark ad units that had traffic during only one of the R − 1 or R hours but not during the other; triangles on the origin reflect ad units that had no traffic during either. Note that we have added small amounts of Gaussian jitter to prevent coincident points from completely overlapping.

Secondary takedown information. Figure 10.4 shows the behavior of each ad unit across another Takedown event, the distribution of the auto-clicking module shortly after Takedown, which we denote as R-hour. Not all ZeroAccess infectees ran the new module, but those that did would produce a dramatic rise in click activity at R-hour compared to just before R-hour; a change in the opposite direction from what we would observe at H-hour. The figure shows the traffic drop ratio across R-hour plotted against the drop across H-hour, again using solid red circles to denote ad units with ≥ 0.4 ZA taint. Clearly, many ZeroAccess-tainted ad units showed sharp reactions to both Takedown and the advent of the new module.

The distribution of an updated auto-clicking module at R-hour immediately restarted ZeroAccess's click fraud. We would expect the resumption of click fraud to result in dirty ad units having low ZeroAccess taint in the time between H-hour and R-hour (since no C&C servers exist to drive click fraud), followed by a sudden increase in taint at R-hour. Figure 10.5 compares taint as seen during these two regions of time (H-hour to R-hour, and after R-hour). The left plot reflects a (presumably typical) non-Takedown day, while the right plot shows Takedown day. The shift between the two is clear: a large number of ad units that had significant taint prior to Takedown (solid red) show a jump in pre-R vs. post-R taint, reflecting the activation of a significant number of ZeroAccess subnets within their traffic.

Finalizing determination of dirty ad units. Using these signals we generated a large set of potentially dirty ad units, about 2,000 in number. We then manually inspected the activity of each and produced a list of 54 we deem with high confidence as significantly "dirty" with ZeroAccess traffic.

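A rough sketch of how these signals combine into a candidate list follows; the thresholds mirror the figures (taint ≥ 0.4, ratios around 0.5), but the field names are invented and, as noted above, the final call on each ad unit remained a manual one.

def candidate_dirty_ad_units(stats, taint_threshold=0.4, drop_threshold=0.5):
    """Shortlist ad units for manual review by combining the signals above.

    stats: dict mapping ad_unit -> dict with illustrative keys:
      'taint'       : mean ZA taint over pre-Takedown hours
      'h_ratio'     : Takedown-day H/[H-1] click-volume ratio
      'min_h_ratio' : minimum H/[H-1] ratio over all previous days
      'r_ratio'     : Takedown-day R/[R-1] ratio (response to the new module)
    """
    candidates = []
    for unit, s in stats.items():
        tainted = s["taint"] >= taint_threshold
        atypical_drop = s["h_ratio"] < drop_threshold <= s["min_h_ratio"]
        rebounded = s.get("r_ratio", 0.0) > 1.0  # spike when the new module appeared
        if tainted and (atypical_drop or rebounded):
            candidates.append(unit)
    return candidates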
10.4 Assessing ZA-Dirty Ad Units

Having identified these dirty ad units, we now employ them as the basis for evaluating the strategy used by the more sophisticated Serpent module to circumvent smart-pricing. We also leverage this conservative set of ad units to estimate the total amount of fraud perpetrated by ZeroAccess, an undertaking that strongly emphasizes the need for better attribution in the ad ecosystem.

Conversions and Smart-pricing

We next use our data to explore potential reasons for why ZeroAccess may have used both auto-clicking and Serpent styles of click fraud. Recall from Section 8 that ad networks use automated approaches to either detect (and block) specific click fraud attacks, or mitigate the impact of click fraud (e.g., through smart-pricing) when ad networks can only estimate the amount of click fraud in the aggregate. Clearly Serpent, which confuses a real user into clicking, is harder to detect (and block) than auto-clicking. We first examine whether we see evidence that Serpent indeed increases the chance of users converting and, if that is the case, whether the blending of Serpent and auto-clicking avoids the smart-pricing mitigation.

[Figure 10.5: two scatter plots, "ZA Taint Prior To Takedown" (left) and "ZA Taint After Takedown vs After New Module" (right), each plotting ZA taint over hours R to R+4 (Y axis) against ZA taint over hours H to R−1 (X axis), one point per ad unit.]

Figure 10.5: Comparison of ZA taint before and after R-hour, per ad unit. The X axis gives taint computed between the Takedown hour and just before the hour corresponding to the advent of the new auto-clicking module post-Takedown; the Y axis gives taint as computed over the 4 hours starting with R-hour. The left plot shows the day prior to Takedown, the right plot the day of Takedown. Solid red circles reflect ad units that had a drop in H-hour ratio across the day before and after Takedown ≥ 0.5 (this corresponds to the population shift in Figure 10.2).

[Figure 10.6: scatter plot of Normalized Conversion (Y axis, log scale, roughly 0.02 to 5.00) against Proportion of ZA /24s Due to Serpent (X axis, log scale, roughly 0.02 to 0.50), with a least-squares fit line.]

Figure 10.6: Relationship between the proportion of ZeroAccess traffic that includes Serpent and the conversion rate for the dirty ad units. The X axis gives the proportion of an ad unit's ZeroAccess subnets that also appeared in the DNS telemetry data, and thus include Serpent activity. The Y axis reflects the normalized conversion rate across the set (1.0 = the mean rate of the set). We log-scale both axes, discarding 3 ad units that had either no Serpent subnets or never converted. The line shows a least-squares fit to the log-transformed data, corresponding to a power-law relationship with an exponent of about 0.36.
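The fit described in the caption is an ordinary least-squares fit in log-log space; a minimal sketch with placeholder inputs is:

import numpy as np

def power_law_fit(serpent_fraction, normalized_conversion):
    """Fit log(y) = a*log(x) + b, which corresponds to the power law y = e^b * x^a.
    Inputs are illustrative per-ad-unit arrays (zero-valued ad units excluded,
    as in the figure); the data shown there yields a slope of about 0.36."""
    x = np.log(np.asarray(serpent_fraction))
    y = np.log(np.asarray(normalized_conversion))
    a, b = np.polyfit(x, y, 1)
    return a, b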

Figure 10.6 shows a clear but modest correlation between ZeroAccess's use of Serpent (via ad units with increased Serpent taint) and growing conversion, though with significant variation across ad units. Keep in mind that other than Serpent, all of the ad unit's conversions will be due to legitimate user activity and not ZeroAccess. This result shows that the use of Serpent does add conversions of fraudulent click activity.

Next, to understand if the combined blending of Serpent and auto-clicking avoided the smart-pricing mitigations, we compare our sample of 54 dirty ad units to a random sample of ad units of similar size. We find that compared to the random sample, those flagged as ZeroAccess dirty had conversion rates up to three times lower. As a result, these dirty ad units were indeed subject to the smart-pricing discount. Thus, it appears that if the intent of the Serpent module leveraging real users was to avoid smart-pricing, then it failed to do so since the users forced to go to the advertiser pages did not convert enough. If the intent behind leveraging real users, however, was simply to avoid detection while accepting the smart-pricing discounted income, then it was successful.

Estimates and Challenges

Formulating a sound estimate of the global impact of ZeroAccess proves very difficult, not only due to limitations in our data sets, but more fundamentally because of the sharply limited visibility that the ad network ecosystem provides. Using some plausible assumptions, we can derive estimates of ZeroAccess's impact perhaps within an order of magnitude, and note that these paint roughly a similar picture to that developed in the work of others [149]. Clearly, however, the important question of just which parties make how much profit from this illicit activity poses thorny methodological challenges.

Limitations within our ad network. Although our ad network partner provided access to the complete click payout information for the 54 dirty ad units we identified, those ad units make up only a subset of the likely total ZeroAccess traffic abusing their network. Methodologically, we require an ad unit to exhibit a baseline level of traffic before we can label it as fraudulent (via manual analysis). Ad units, however, exhibit a very long tail in their level of activity. The vast majority provided (individually) so little traffic that although they exhibited numerous ZeroAccess characteristics, such as high ZeroAccess taint, for any given such ad unit we could not rule out these characteristics occurring simply by chance, and thus could not definitively label them as bad. In aggregate, however, such ad units potentially account for a large volume of traffic, much more than that of the 54 definitively dirty ad units that we identified. How to soundly attribute a portion of that traffic as fraudulent remains an open question.

Limitations across ad networks. A further complication arises in identifying the breakdown of fraud between our ad network partner and the other networks targeted by ZeroAccess. Our "milking" data provides a qualitative impression of what this balance might look like, finding that 367 of the 1,766 click chains transited our ad network partner. Superficially this suggests that, in total, ZeroAccess click activity might be about five times what our ad network partner sees. However, along with the limited scale of the milking data, other basic factors, such as how presented ads (and thus click chains) are influenced by IP address or search term, could result in major shifts in this ratio. In addition, the milking data reflects only one instance of Serpent activity, and we do not know whether its distribution across ad networks accords with that of the auto-clicking module.

Limitations in DNS data. While our DNS telemetry gives us visibility into Serpent ad clicks, the data suffers from significant noise. Moreover, we do not know how to extrapolate from pure Serpent activity to the behavior of the auto-clicking module. When merging the DNS data with other sources, we also encounter issues with both the granularity of the IP address information and IP address churn, making attribution within our partner's ad network problematic.

Assumptions and estimates. Given those major caveats, we now formulate some rough estimates to get a sense of the overall scale of activity. The 54 dirty ad units we identified generated over 100K clicks per day. It appears certain that aggregate ZeroAccess activity far exceeds this lower bound, which we view as very solid (we have high confidence those ad units all reflect ZeroAccess activity) but also very conservative. Building on this foundation, given the large volume of suspicious-but-not-definitive ad units we investigated—both the "long tail" with dubious but sparse activity, and those that appear to carry ZeroAccess traffic blended with non-ZeroAccess traffic—we would argue that the total activity within our ad network partner was likely at least a factor of 2x larger. If we then take the 5x factor seen in our limited click-chain milking to represent the volume of traffic seen in other ad networks, we have a total extrapolation of 10x over our original conservative estimate, yielding 1M fraudulent clicks per day. This range is consistent with the order of magnitude of fraudulent clicks estimated from our DNS data. According to our ad partner, the cost-per-click of this traffic after smart-pricing normalization would be between 10–30 cents. Taking the low end of these ranges, we can construct an estimate of the total click fraud impact of ZeroAccess as on the order of $100,000 per day, which aligns with the estimate from Sophos of ZeroAccess generating up to $2.7 million in click fraud per month [149]. We highlight, however, that enough uncertainties exist that our figure can only reflect a "best guess" estimate. We also note that given the available data sources, we lack the ability to gauge what fraction of this money makes it into the hands of the botnet owners after the long chain of (sub)syndicates take their cut.
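The extrapolation itself is simple arithmetic; the short sketch below restates it, where every factor is one of the stated assumptions above (the 100K-click lower bound, the 2x within-network factor, the 5x cross-network factor, and the 10–30 cent post-smart-pricing cost per click) rather than a measured constant.

```python
# Back-of-the-envelope restatement of the estimate above.
# All factors are stated assumptions, not measured values.
observed_clicks_per_day = 100_000   # lower bound from the 54 dirty ad units
within_network_factor = 2           # suspicious-but-unconfirmed ad units in our partner's network
cross_network_factor = 5            # share of milked click chains that transited other networks
cpc_low, cpc_high = 0.10, 0.30      # dollars per click after smart-pricing normalization

total_clicks = observed_clicks_per_day * within_network_factor * cross_network_factor
print(f"~{total_clicks:,} fraudulent clicks/day")             # ~1,000,000
print(f"~${total_clicks * cpc_low:,.0f} per day (low end)")   # ~$100,000
```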

Discussion

Subsyndication is a problem. Subsyndication is the key business model enabling large-scale click fraud. Even a single (trusted) publisher that is allowed to subsyndicate opens the floodgates for click fraud traffic (blended with other legitimate traffic) entering the ad network unwittingly through that publisher. Our Serpent milker encountered syndication chains as long as 13 domains deep. It is no surprise, therefore, that we were able to definitively conclude significant ZA dirtiness for only 54 ad units. There were many more where our analysis was inconclusive due to high levels of blending that diluted the otherwise sharp signal presented by the Takedown.

One might argue that anonymous subsyndication is the real problem, and that if publishers could be forced to identify themselves, the search engine would be able to better police the ad network. While this is true, we feel it is impractical. The level of blending we see suggests that many publishers are complicit in click fraud, especially since everyone along the syndication chain benefits financially from it. Even if the search engine could recursively force (sub)syndicates to set a policy that requires identifying their traffic sources—a hard problem in itself—there would be little reason for the (sub)syndicates to enforce it.

Low signal-to-noise ratio. Our analysis would not have been possible absent the strong signal injected into the ZeroAccess botnet by the Takedown. Even then we had to combine three large datasets from three different vantage points. Such strong signals are too few and far between to be an effective means of policing ad networks. The capability to inject more frequent (but perhaps less intense) disruptions into botnets could create a strong temporal signal that could serve as a basis for ongoing click fraud policing.

10.5 Summary

In this chapter we undertook a detailed examination of the activity of one of the largest click fraud botnets in operation, ZeroAccess. Through reverse engineering and controlled execution, we mapped out ZeroAccess's click fraud component, including the innovations it introduced in hijacking high-quality user search traffic. In the process we discovered side channels, in the form of vestigial DNS lookups, that enabled us to track its large-scale activity. We then combined this perspective with partial "supernode" data capturing the botnet's peer-to-peer C&C activity in order to match botnet activity data with ad-unit click stream data provided by our major ad network partner. By leveraging the striking shifts in click activity induced by Microsoft's takedown of ZeroAccess in December 2013, along with the ZeroAccess botmaster's subsequent response, we combined these diverse data sources to identify with high confidence 54 individual ad units whose traffic volume (and hence revenue) primarily derived from ZeroAccess. Our ensuing analysis of these ad units revealed that while the latest generation of click fraud botnets have circumvented many detection approaches, they are still mitigated (for now) by the smart-pricing mechanism. Extrapolating from the known-bad ad units, we constructed an estimate that ZeroAccess likely generated on the order of a million fraudulent clicks per day across all ad networks, with the overall ecosystem revenue diverted by the botnet's activity roughly on the order of $100,000 per day. Finally, we note that the complexity of the online advertising ecosystem is such that no one party comes remotely close to having comprehensive visibility into who gets paid what for any given click. The hodgepodge of data sources required for our analysis starkly illustrates both the tangled nature of the click fraud problem space and the pressing need for much better mechanisms for correlating traffic and payment streams.

Chapter 11

Ad Injection at Scale: Assessing Deceptive Advertisement Modifications

11.1 Introduction and Background

Subsequent to the rise and fall of large-scale botnet-driven fraud (Chapters 7–10), a new development in the landscape of advertising abuse emerged: ad injection. In this scam, ads are injected into a user's browsing experience by either a network attacker or software running on the user's system. Since these advertisements appear in front of real users, the impressions and clicks they generate are real, making the identification and remediation of the problem difficult.¹ As browser extensions have gained widespread use, an ecosystem of deceptive extensions that monetize installations via these injected advertisements arose. Such extensions operate on an affiliate model, similar to ad syndicators, allowing extension authors to directly monetize installations by entering into relationships with ad injectors. These injectors provide libraries to extension authors, which are then executed when users visit websites while the extension is loaded. Ad injection differs from traditional advertising abuse in the open, legal nature in which these extension authors and syndicators operate [86]. Broadly, the collection of extensions and software that operate in this legal gray area is termed "unwanted software," denoting the likelihood that users did not willingly or knowingly consent to their installation and do not desire their services. The most notable example, Superfish, gained notoriety when it was discovered pre-installed on Lenovo PCs in 2015 [86].² In addition to injecting ads, these extensions can reduce user safety by introducing vulnerabilities into the browsing process [86]. Money enters the ad injection ecosystem from advertisers. Advertisers' willingness, perhaps unwitting, to pay for traffic generated by these injectors drives the proliferation of ad injection.

The revenue from these advertisers sustains not only the ad injectors but also an entanglement of intermediaries that connect advertisers to injectors. Building on work with Google that identified and executed these deceptive extensions in order to extract the revenue chains injected into various websites, this chapter focuses on exploring the ecosystem of intermediaries and advertisers that support ad injection, through analysis of injected click (revenue) chains.

Our collaborators built systems that identified and analyzed the advertising injection behavior of 50,870 browser extensions and 34,407 binary samples. For this analysis, we focused on 114,999 click chains injected by 398 browser extensions into three websites (termed properties) that are well known to experience ad injection: Amazon.com, Google.com, and Walmart.com. Chains were extracted by visiting these websites, executing relevant representative search queries, and then extracting the injected ad URLs. This chapter focuses on unraveling the complex web of advertisers and intermediaries supporting the ecosystem. In this context, an intermediary in the ad injection ecosystem is a business entity that in some way connects advertisers to injectors. These entities can take a variety of forms, including traffic resellers, traffic exchanges, and online product aggregators.

We find that although a handful of online retailers comprise the majority of the top advertisers encountered via ad injection, a long tail of less prolific advertisers is the subject of most of the ads. We also identify a "choke point" of three ad networks and 25 affiliate programs that were responsible for the majority of ad injections.

This chapter starts with an overview of the ecosystem of advertisers and intermediaries involved in ad injection. Following this overview, we closely examine the three largest ad injectors to identify the major advertisers and intermediaries that these ad injectors depend on. We then characterize the relationships among these three groups to understand the extent to which advertisers and intermediaries are aware of ad injection, and what potential bottlenecks exist in this ecosystem. This chapter is based on a portion of work that appeared in the IEEE Symposium on Security and Privacy (S&P) [135].

¹The concept of co-opting real user behavior is similar to the ZeroAccess malware [115, 118]. ZeroAccess hijacked non-ad user clicks, rewriting the destination and ultimately turning those clicks into click fraud; the user may then have engaged with the advertiser out of hijacking-induced confusion. Ad injection, however, inserts additional ads into a user's experience, causing real impressions and potentially further engagement from an interested user.

²This work predates the discovery of Superfish's installation on Lenovo hardware and the subsequent media attention [143].

11.2 Impacted Properties

Using network traffic gathered from our dynamic execution of browser extensions [135], we reconstructed and analyzed 114,999 ad revenue chains in order to understand the relationships between ad injectors, intermediaries, and advertisers in this ecosystem. Based on the construction of the dynamic execution engine, injections were only identified for a handful of top impacted properties. Table 11.1 provides a breakdown of which properties are impacted in the ad revenue chains we collected. Of our chains, 62,237 were injected into amazon.com, 37,718 into google.com, and 15,044 into walmart.com. A majority of extensions injected ads into all three properties, with the average number of hops in a chain varying based on the impacted property.

Impacted Property    Triggered Queries    Click Chains    Ad Injection Libraries
amazon.com                          90          62,237                        27
google.com                          96          37,718                        13
walmart.com                         71          15,044                        25

Table 11.1: Breakdown of each impacted property and its contribution to the dataset.

Ad Domain                     Injected Samples    Avg Click Hops    Advertising Domains
superfish.com                           63,891               3.6                    891
dealply.com                             20,209               9.2                    526
datafastguru.info                        3,899               4.9                    407
display-trk.com                          3,353              12.7                    196
tfxiq.com                                3,091               1.8                     58
pangora.com                              1,814               2.9                    204
xingcloud.com                            1,116               1.0                      4
shoppingate.info                           994               4.9                    232
linkfeed.org                               822               6.3                    267
bestyoutubedownloader.com                  688               1.0                      1
Other                                   15,122              7.48                  1,497

Table 11.2: List of the top 10 ad injectors we observe in our extension execution.

11.3 Understanding Injectors in Revenue Chains

In revenue chains, ad injectors are identified as the first-hop domains, beginning with the impacted property. We group together domains that share affiliation, and resolve shorteners and content distribution networks (CDNs) to the next hop in the chain (the sketch at the end of this section illustrates this extraction). Table 11.2 is a breakdown of the top 10 injectors appearing in ad revenue chains. Ad chain volume is heavily skewed towards a handful of injectors, with the top two injectors accounting for 73% of all injections. As is the case with client detection, Superfish is the dominant injector in our revenue chains, accounting for 56% of all chains (originating from 60% of extensions). The next most prevalent injector is the Dealply group. Dealply is a grouping of domains that share affiliation (IP addresses, ad feeds) with the Dealply program. Dealply accounts for 18% of revenue chains (29% of extensions). Datafastguru is the next largest contributor to revenue chains, accounting for only 3.4% of chains (25% of extensions). As is the case with client detection, there is a long tail of injectors contributing to the complete ecosystem.

The diversity of advertisers fueling each ad injector also varies greatly between injectors. Some injectors (xingcloud.com, bestyoutubedownloader.com) have very few advertising landing domains, likely indicating that the injector is the same party as the advertiser. Meanwhile, other injectors are supported by a huge number of advertisers. Amongst top injectors, Superfish is supported by the fewest advertisers relative to click volume: Superfish injections average 1.39 advertisers per 100 clicks. Dealply is supported at nearly double that rate, with an average of 2.60 advertisers per 100 chains. Strikingly, Datafastguru has a much larger relative base of support, with a rate 4x that of Dealply, at 10.44 advertisers per 100 clicks. The diversity of advertisers for each of these top programs is explored in more depth in Section 11.4.

Revenue chain length varies dramatically between ad injectors, with some injectors having few to no intermediaries on average (tfxiq.com, xingcloud.com) and others having a large number (display-trk.com). Shorter revenue chains denote fewer intermediaries (for a given click), and perhaps simpler business relationships stemming from fewer syndication agreements. Shorter chains may also denote more advertiser visibility into the origin of clicks. Superfish revenue chains have a relatively small average number of hops, 3.61. Dealply chains, comparatively, average 2.5x the length of Superfish chains, at 9.15 hops.
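A minimal sketch of the first-hop extraction described above follows; the shortener/CDN skip list and the affiliation grouping shown here are illustrative placeholders, not the actual lists used in our analysis.

```python
# Hypothetical skip list and affiliation map, for illustration only.
SKIP = {"bit.ly", "goo.gl", "cloudfront.net"}                          # shorteners / CDNs
GROUPS = {"dealply.com": "dealply", "dealply-cdn.example": "dealply"}  # affiliated domains

def injector_of(chain):
    """Return the (grouped) injector for a revenue chain.

    A chain is an ordered list of domains beginning with the impacted
    property and ending at the advertiser landing domain.
    """
    for domain in chain[1:]:                  # chain[0] is the impacted property
        if domain in SKIP:
            continue                          # resolve shorteners/CDNs to the next hop
        return GROUPS.get(domain, domain)     # group domains that share affiliation
    return None

# Example: a chain injected into google.com attributed to the Dealply group.
print(injector_of(["google.com", "bit.ly", "dealply-cdn.example", "sears.com"]))
```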

11.4 Advertisers and Intermediaries for Top Injectors

By examining the intermediaries and advertisers associated with the top three ad injectors in greater depth, we shed light on the relationships and affinities between intermediaries, advertisers, and injectors. These three injectors, Superfish.com, Dealply.com, and Datafastguru.info, account for 77% of all injections, originating from 81% of extensions.

Advertiser Composition

While the ecosystem of ad revenue chains is extremely complex and intertwined, a handful of advertisers are responsible for most of the revenue chains. The top five advertisers observed in traffic from each injected ad revenue chain are given in Table 11.3. Online retailers (Sears.com, Walmart.com, Target.com, Wayfair.com, and Overstock.com) make up virtually all of the top advertisers supporting Superfish, Dealply, and Datafastguru. Exceptions are Kobobooks.com, an eBook company, and Bizrate.com, a product aggregator.

Sears.com and Walmart.com account for almost 30% of the advertising landing pages reached via Superfish injections, the dominant injector. Although these retailers each contribute a sizable portion of Superfish injections, the total advertiser population is long-tailed, with over 800 domains totaling 59% of total chains. Dealply and Datafastguru advertisers have a similar long tail, but with a different relative makeup of top advertisers. The long tail of advertisers for these programs also contributes the majority of total volume. The dominant advertiser supporting Datafastguru is eBay. While eBay is present in the Dealply ecosystem, it is not a dominant source of advertising there. The difference arises from the (relatively) much larger number of Datafastguru advertisers.
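As a sketch of how the per-injector advertiser shares in Table 11.3 can be tallied, the snippet below counts landing domains over one injector's chains; the chain layout (an ordered domain list ending at the advertiser) is an assumption for illustration.

```python
from collections import Counter

def advertiser_breakdown(chains, top_n=5):
    """Percent of chains per top advertiser, with the remainder lumped into 'Other'."""
    counts = Counter(chain[-1] for chain in chains)   # advertiser landing domain per chain
    total = sum(counts.values())
    top = counts.most_common(top_n)
    shares = {domain: 100.0 * n / total for domain, n in top}
    shares["Other"] = 100.0 * (total - sum(n for _, n in top)) / total
    return shares
```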

Ad Injector   Advertiser (.com)   % of Inject.   Avg Hops   Common Intermediaries and Co-Occurrences
superfish     sears                        18%       3.25   DT,CI 100%
              walmart                      11%       2.83   PG 43%; DT 40%
              kobobooks                     6%       2.86   PG 92%
              target                        4%       5.11   CI 100%; BR,CI 53%
              wayfair                       2%       4.27   BR 85%
              Other                        59%       3.82   PG 37%; DT 30%; BR 28%
dealply       target                       12%      10.50   CI 100%; CI,SF 51%
              wayfair                       5%       9.39   BR 88%; SF 50%
              walmart                       5%      10.44   BR 100%; BR,SF%
              overstock                     4%      10.87   ME,BR 98%; ME,BR,SF 61%
              sears                         3%      10.53   CI 100%; CI,SF 47%
              Other                        71%       8.66   BR 59%; SF 38%; PG 21%
datafast      ebay                         10%       2.23   DT 99%
              target                       10%       5.76   CI 100%; BR,CI 51%
              bizrate                       5%       1.15   None
              wayfair                       4%       4.24   BR 63%
              sears                         4%       5.37   CI 87%
              Other                        68%       5.39   BR 49%; AD 19%

Key: DT = DealTime, CI = ChannelIntelligence, PG = PG Partners, BR = BizRate, SF = SuperFish, ME = Mercent, AD = Adnxs.

Table 11.3: Top five advertisers for each of the top three injection programs. "Other" consists of all chains not belonging to the top five advertisers. For each advertiser, interesting common and co-occurring intermediaries are given.

Ad Injector    Intermediary           % of Ads   Hop After Injector   Hop Before Advertiser
superfish.com  dealtime.com                40%                  26%                      0%
               pricegrabber.com            34%                  22%                      9%
               channelintelligence         27%                   0%                     27%
               bizrate.com                 23%                  23%                      1%
               searchmarketing.com         11%                   0%                      9%
               Other                       18%                  30%                     54%
dealply.com    bizrate.com                 67%                  57%                      2%
               superfish.com               43%                   0%                      9%
               channelintelligence         21%                   0%                     18%
               amung.com                   19%                   4%                      1%
               clk-analytics.com           17%                   0%                      0%
               Other                       78%                  39%                     70%
datafast.com   bizrate.com                 43%                  41%                      1%
               frontdb.com                 28%                   0%                     10%
               pronto.com                  21%                  15%                      0%
               channelintelligence         19%                   0%                     17%
               adnxs.com                   16%                   0%                      6%
               Other                       67%                  44%                     66%

Table 11.4: Top five intermediaries for each of the top three injection programs. "% of Ads" is the percent of that program's chains in which a given intermediary was present.

Although the set of advertisers is long-tailed, the top 20 advertisers represent a significant bottleneck, comprising 50% of advertising clicks.

Intermediary Overview

Similar to advertisers, intermediaries in ad revenue chains are also long-tailed. Table 11.4 gives a breakdown of the top five most prevalent intermediaries for each of the top three programs. Dealtime.com and Pricegrabber.com, both online product advertising aggregators, are the two most prevalent intermediaries in the Superfish ecosystem. While both exist in Dealply and Datafastguru chains, they are not as prevalent there. The two most common intermediaries across all three injectors are Channelintelligence.com (a Google property that links advertisers to ad networks) and Bizrate. The Datafastguru ecosystem also introduces intermediaries not seen with the other injectors, such as Pronto.com, a price comparison service, and Adnxs.com, a property of AppNexus, which specializes in real-time ad exchanges.

Notably, Superfish has a presence in the Dealply ecosystem. In these chains Superfish appears to serve the role of an intermediary instead of an injector. In its capacity as an intermediary, Superfish plays a major role in driving traffic to all of the top Dealply advertisers. Despite the large tail, only three intermediaries account for 57% of all first hops, representing a serious bottleneck in relationships between injectors and advertisers.

Chain Length

Table 11.3 contains the average number of hops for each advertiser, broken down by program. Sears.com and Walmart.com have a similar number of average hops (3.25 and 2.83, respectively), both lower than the Superfish average of 3.60. These short chains denote a small set of intermediaries that link these advertisers to Superfish. Dealply advertisers have distinctly more hops and intermediaries in the average chain. Advertisers such as Walmart.com that had relatively short chains under Superfish (2.83 hops on average) have much longer chains (10.44 on average) under Dealply. This difference could indicate that Dealply has a more layered ecosystem, with more syndication and traffic reselling than the other programs.

Intermediary Relationships With Injectors And Advertisers

Table 11.4 shows the likelihood of the top five intermediaries being either the first hop after an injector or the last hop before an advertiser. Such adjacency in the ad revenue chain denotes traffic exchanged directly between parties, if not a direct business relationship. Each of these affinities is given as the total percent of chains for a given injector (not a given intermediary). For short chains, a single hop may be both the first and last hop. The sketch at the end of this subsection shows how these affinities can be computed.

Across the Superfish ecosystem several interesting affinities exist. For chains that traverse Dealtime, almost all occurrences are as the first hop, but less than half are as the last hop. Pricegrabber shows similar behavior. Channelintelligence shows a very different affinity: it never appears directly following Superfish, and almost exclusively appears as the last hop prior to an advertiser. Bizrate has the inverse affinity from Channelintelligence, with almost all appearances occurring adjacent to Superfish, and almost never adjacent to the advertiser.

None of the top intermediaries besides Bizrate show a clear likelihood of appearing adjacent to Dealply in ad chains. This indicates a long-tail set of intermediaries linked directly to Dealply, with this tail comprising more of the first-hop volume than in the case of Superfish, and hence a large set of business relationships with Dealply. Similar properties hold for Datafastguru.
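A minimal sketch of this adjacency computation follows, assuming each chain has the form [property, injector, intermediaries..., advertiser]; this layout is an illustrative assumption rather than the exact representation used in our pipeline.

```python
def adjacency_stats(chains, intermediary):
    """Percent of chains where an intermediary appears at all, directly after
    the injector, and directly before the advertiser."""
    n = len(chains)
    present = after_injector = before_advertiser = 0
    for chain in chains:
        middle = chain[2:-1]                       # hops between injector and advertiser
        if intermediary in middle:
            present += 1
        if middle and middle[0] == intermediary:   # first hop after the injector
            after_injector += 1
        if middle and middle[-1] == intermediary:  # last hop before the advertiser
            before_advertiser += 1
    if n == 0:
        return {}
    return {"present": 100.0 * present / n,
            "after_injector": 100.0 * after_injector / n,
            "before_advertiser": 100.0 * before_advertiser / n}
```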

Intermediary Co-Occurrence

Just as some advertisers show affinity to specific sets of intermediaries, some intermediaries show affinity for each other. Table 11.3 enumerates interesting co-occurrences of intermediaries per advertiser and program. Wayfair.com is reachable only via Bizrate.com, and Target.com is reachable via Dealtime only in conjunction with Channelintelligence. Sears is reachable only via Channelintelligence, and Kobobooks only via Pricegrabber. These commonly co-occurring intermediaries may represent limited advertiser relationships with the ad networks supporting injection, also a potential bottleneck in injection revenue.

11.5 Summary

Ad injection is a lucrative and stealthy abuse scam that blurs the line between malware and software. By leveraging real user interaction and behaviors to generate advertising impressions and clicks, extension vendors and ad networks are able to monetize unsuspecting users' browsing. In this chapter we performed a systematic analysis of 114,999 ad chains collected from 398 browser extensions, examining the ecosystem of advertisers and intermediaries that support this landscape. From this analysis we identified a structural "choke point" of three ad networks and 25 affiliate programs that are responsible for a majority of all ad injections, and that can be leveraged for defense.

Chapter 12

Advertising Abuse Conclusion

In Part II we uncovered the scale, structure, and nature of advertising abuse across a variety of monetization strategies and attacks. Our multifaceted approach leveraged multiple methods, data sources, C&C infiltration, and industry data to describe attacks that by their very nature are difficult to identify. Our work identified fundamental structural weak-points leverageable for defense, resulting in the dismantling of botnets, the cleanup of ad networks, and the protection of users.

We began in Chapter 7 [105] with an external, malware-driven execution study of click fraud malware and the ad networks that enable them. This work uncovered the breadth and scale of the ad abuse ecosystem, but its external-only visibility was a fundamental limitation that necessitated further study.

Next, in Chapters 8 and 9 [105, 118] we performed an in-depth analysis of ZeroAccess, a botnet and malware distribution platform that harmed millions of users and was responsible for millions of dollars in advertising fraud per month. Beginning with an overview of the malware and its evolution, we explored the botnet's structure, technical capabilities, and inner workings. This work helped facilitate a Microsoft-led takedown of the botnet [90, 104], resulting in the demise of the botnet and the cleanup of millions of infected PCs.

Chapter 10 [115] combined an array of external data sources, including P2P measurements and C&C telemetry, with an internal ad network perspective, to identify the scale and scope of ZeroAccess fraud. This multifaceted approach identified fraudulent business relationships within the ad network, and was used as a focal point for remediation.

The demise of ZeroAccess signaled the end of the large-scale, botnet-driven click fraud era. Subsequently, new forms of advertising abuse, such as ad injection, began to emerge. Chapter 11 [135] examines the complex ecosystem of ad networks, syndicators, and intermediaries that fueled the injection ecosystem. Our work identified and leveraged injector and ad network choke points within the ecosystem, enabling remediation for millions of users.

Chapter 13

Conclusion and Future Directions

The impact of global Internet security problems such as censorship and cybercrime continued to increase during the production of this dissertation. As systems and data expand in complexity, the need for empirical grounding remains critical. This work presents methods and systems that not only enable the systematic understanding of these threats, but also open the door to new avenues of research. This chapter frames the contributions of this dissertation and how they inform future research directions.

Longitudinal and Continuous Censorship Measurement With the development of Augur and Iris, we now have the ability to perform continuous censorship measurement around the globe. This capability opens the door to new avenues of research: we can study the trends of censorship within a single country as well as across groups of countries. Using multiple vantage points within countries, we can begin to understand how censorship is deployed and the heterogeneity of deployment, at scale. Using this work we can identify the onset of new censorship and the dynamics of blocking behavior around key political or social events. The ability to perform controlled repeated measurements also allows us to understand the efficacy of various blocking techniques and evasions, and how censors respond to these evasions. Understanding these facets of censor behavior and technology will allow computer scientists to develop more effective, well-grounded defenses and evasions, while also enabling social scientists and policy makers to study the interactions between users and censors at a scale not previously possible.

Combating Cybercrime with Information Sharing Cybercrime remains an ongoing challenge for advertising networks. Systematic efforts across the community, including our work understanding and remediating ZeroAccess, had measurable impact. Subsequently, criminals shifted their techniques from bot- and malware-driven operations to unwanted software [135] and "in-house" operations (i.e., criminals emulating users themselves at scale) [145]. With this continued prevalence of cybercrime attacks, companies struggle to identify and remediate attacks at the speed and volume at which they occur. The ability to quickly and reliably share threat information (e.g., IP addresses, malware samples, URLs, tactics) between targets is emerging as a popular proactive defense, and, though of unproven utility, has become current industry best practice. This trend is supported by recent legislation (the Cybersecurity Information Sharing Act) aimed at improving information sharing within the United States. Within industry, this information sharing is commonly known as threat intelligence, an emerging market valued at over three billion US dollars. But despite significant industry capital and supporting legislation, the scope, scale, quality, and efficacy of threat intelligence remain unstudied. In order to answer these questions we need sound, well-designed evaluation metrics and criteria that will allow us to develop effective deployment strategies. The emergence of this field, coupled with the methods and systems of this dissertation, presents a unique opportunity to develop evaluation metrics, collection and curation methodologies, and effective sharing strategies—ultimately with the aim of changing how we defend systems by enabling effective multi-vantage, data-driven defenses that do not exist today.

Exploring Advanced Persistent Threats Anecdotal industry and initial academic findings suggest that state-sponsored advanced persistent threats (APTs) target a variety of organizations ranging from human rights activists to governments. These long-term, covert, highly targeted threats use sophisticated malware, processes, and reconnaissance to manually compromise networks. The continued growth of these state-sponsored attacks highlights the need for empirical grounding and a sound, rigorous understanding of these threats and how best to defend against them. While this dissertation focused on criminals, many of its methods and systems are directly applicable to nation-state actors. This natural extension is enabling ongoing work examining APT target selection and recidivism, and ultimately a comprehensive understanding of how to reason about and defend against nation-state adversaries.

Bibliography

[1] M. Abu Rajab, J. Zarfoss, F. Monrose, and A. Terzis. A Multifaceted Approach to Understanding the Botnet Phenomenon. In ACM SIGCOMM, 2006.

[2] G. Aceto, A. Botta, A. Pescapè, N. Feamster, M. F. Awan, T. Ahmad, and S. Qaisar. Monitoring Internet Censorship with UBICA. In International Workshop on Traffic Monitoring and Analysis (TMA), 2015.

[3] Alexa Top Sites. http://www.alexa.com/topsites.

[4] C. Anderson. Dimming the Internet: Detecting throttling as a mechanism of censorship in Iran. arXiv preprint arXiv:1306.4361, 2013.

[5] C. Anderson, P. Winter, and Roya. Global Network Interference Detection Over the RIPE Atlas Network. In USENIX Workshop on Free and Open Communications on the Internet (FOCI), 2014.

[6] G. Angioni. PokerStars, Full Tilt, and 888Poker Blacklisted in Latvia. https://www.pokernews.com/news/2014/08/pokerstars-full-tilt-and-888poker-blacklisted-in-latvia-18971.htm, August 2014.

[7] Anonymous. GreatFire.org. https://en.greatfire.org/.

[8] Anonymous. The Collateral Damage of Internet Censorship by DNS Injection. ACM SIGCOMM Computer Communication Review (CCR), 2012.

[9] Anonymous. Towards a Comprehensive Picture of the Great Firewall’s DNS Censorship. In USENIX Workshop on Free and Open Communications on the Internet (FOCI), 2014.

[10] M. Antonakakis, R. Perdisci, D. Dagon, W. Lee, and N. Feamster. Building a Dynamic Reputation System for DNS. In USENIX Security Symposium, 2010.

[11] S. Aryan, H. Aryan, and J. A. Halderman. Internet Censorship in Iran: A First Look. In USENIX Workshop on Free and Open Communications on the Internet (FOCI), 2013.

[12] M. Bailey and C. Labovitz. Censorship and Co-option of the Internet Infrastructure. Technical Report CSE-TR-572-11, University of Michigan, Ann Arbor, MI, USA, July 2011.

[13] BBC's website is being blocked across China. http://www.bbc.com/news/world-asia-china-29628356, October 2014.

[14] The Belmont Report - Ethical Principles and Guidelines for the Protection of Human Subjects of Research. http://ohsr.od.nih.gov/guidelines/belmont.html.

[15] S. Bodmer and M. Vandegrift. Looking Back at Murofet, a ZeuSbot Variant's Active History. http://blog.damballa.com/?p=1008, Nov. 2010.

[16] D. Bolton. Putlocker Blocked in the UK by Internet Service Providers after High Court Order. https://goo.gl/s8Hb43, May 2016.

[17] G. Bonfa. Step-by-Step Reverse Engineering Malware: ZeroAccess / Max++ / Smiscer Crimeware Rootkit. http://resources.infosecinstitute.com/step-by-step-tutorial-on-reverse-engineering-malware-the-zeroaccessmaxsmiscer-crimeware-rootkit, November 2010.

[18] S. Bortzmeyer. Hijacking through Routing in Turkey. https://ripe68.ripe.net/presentations/158-bortzmeyer-google-dns-turkey.pdf.

[19] G. Buehrer, J. W. Stokes, and K. Chellapilla. A Large-scale Study of Automated Web Search Traffic. In Workshop on Adversarial Information Retrieval on the Web, 2008.

[20] J. Caballero, C. Grier, C. Kreibich, and V. Paxson. Measuring Pay-per-Install: The Commoditization of Malware Distribution. In USENIX Security Symposium (USENIX), 2011.

[21] J. Caballero, P. Poosankam, C. Kreibich, and D. Song. Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering. In ACM Conference on Computer and Communications Security (CCS), 2009.

[22] CAIDA. Archipelago (Ark) Measurement Infrastructure. http://www.caida.org/projects/ark/.

[23] A. Chaabane, T. Chen, M. Cunche, E. D. Cristofaro, A. Friedman, and M. A. Kaafar. Censorship in the Wild: Analyzing Internet Filtering in Syria. In ACM Internet Measurement Conference (IMC), 2014.

[24] W. Chen, Y. Huang, B. F. Ribeiro, K. Suh, H. Zhang, E. de Souza e Silva, J. Kurose, and D. Towsley. Exploiting the IPID Field to Infer Network Path and End-System Characteristics. In Workshop on Passive and Active Network Measurement (PAM), 2005.

[25] K. Chiang and L. Lloyd. A Case Study of the Rustock Rootkit and Spam Bot. In USENIX Workshop on Hot Topics in Understanding Botnets (HotBots), 2007.

[26] C. Y. Cho, J. Caballero, C. Grier, V. Paxson, and D. Song. Insights from the Inside: A View of Botnet Management from Infiltration. In USENIX Conference on Large-scale Exploits and Emergent Threats (LEET), 2010.

[27] D. Cicalese, D. Z. Joumblatt, D. Rossi, M. O. Buob, J. Augé, and T. Friedman. Latency-Based Anycast Geolocation: Algorithms, Software, and Data Sets. IEEE Journal on Selected Areas in Communications, June 2016.

[28] Cisco OpenDNS. https://www.opendns.com/.

[29] Citizen Lab. Block Test List. https://github.com/citizenlab/test-lists.

[30] Citizen Lab. https://citizenlab.org.

[31] R. Clayton, S. J. Murdoch, and R. N. M. Watson. Ignoring the Great Firewall of China. In Privacy Enhancing Technologies Symposium (PETS), 2006.

[32] C. Contavalli, W. van der Gaast, D. C. Lawrence, and W. Kumari. Client Subnet in DNS Queries. RFC 7871.

[33] M. Cotton, L. Vegoda, R. Bonica, and B. Haberman. Special-Purpose IP Address Registries. RFC 6890.

[34] D. Dagon, N. Provos, C. P. Lee, and W. Lee. Corrupted DNS Resolution Paths: The Rise of a Malicious Resolution Authority. In Networked and Distributed System Security Symposium (NDSS), 2008.

[35] A. Dainotti, C. Squarcella, E. Aben, K. C. Claffy, M. Chiesa, M. Russo, and A. Pescapè. Analysis of country-wide Internet outages caused by censorship. In ACM Internet Measurement Conference (IMC), 2011.

[36] J. Dalek, B. Haselton, H. Noman, A. Senft, M. Crete-Nishihata, P. Gill, and R. J. Deibert. A Method for Identifying and Confirming the Use of URL Filtering Products for Censorship. In ACM Internet Measurement Conference (IMC), 2013.

[37] N. Daswani and M. Stoppelman. The Anatomy of Clickbot.A. In USENIX Workshop on Hot Topics in Understanding Botnets (HotBots), 2007.

[38] V. Dave, S. Guha, and Y. Zhang. Measuring and Fingerprinting Click-spam in Ad Networks. In ACM SIGCOMM, 2012.

[39] V. Dave, S. Guha, and Y. Zhang. ViceROI: Catching Click-Spam in Search Ad Networks. In ACM Conference on Computer and Communications Security (CCS), 2013.

[40] R. Dingledine, N. Mathewson, and P. Syverson. Tor: The Second-Generation Onion Router. In USENIX Security Symposium (USENIX), 2004.

[41] D. Dittrich and E. Kenneally. The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research. Technical report, U.S. Department of Homeland Security, Aug 2012.

[42] J. Dupre. What is Cloaking and Why Do Affiliate Marketers Use It? http://justindupre.com/what-is-cloaking-and-why-do-affiliate-marketers-use-it/, May 2010.

[43] Z. Durumeric, D. Adrian, A. Mirian, M. Bailey, and J. A. Halderman. A Search Engine Backed by Internet-Wide Scanning. In ACM Conference on Computer and Communications Security (CCS), 2015.

[44] Z. Durumeric, E. Wustrow, and J. A. Halderman. ZMap: Fast Internet-Wide Scanning and its Security Applications. In USENIX Security Symposium, 2013.

[45] B. G. Edelman. Google Click Fraud Inflates Conversion Rates and Tricks Advertisers into Overpaying. http://www.benedelman.org/news/011210-1.html, Jan. 2010.

[46] R. Ensafi, D. Fifield, P. Winter, N. Feamster, N. Weaver, and V. Paxson. Examining How the Great Firewall Discovers Hidden Circumvention Servers. In ACM Internet Measurement Conference (IMC), 2015.

[47] R. Ensafi, J. Knockel, G. Alexander, and J. R. Crandall. Detecting Intentional Packet Drops on the Internet via TCP/IP Side Channels. In Workshop on Passive and Active Network Measurement (PAM), 2014.

[48] R. Ensafi, J. C. Park, D. Kapur, and J. R. Crandall. Idle Port Scanning and Non-interference Analysis of Network Protocol Stacks Using Model Checking. In USENIX Security Symposium (USENIX), 2010.

[49] R. Ensafi, P. Winter, A. Mueen, and J. R. Crandall. Analyzing the Great Firewall of China Over Space and Time. Privacy Enhancing Technologies Symposium (PETS), 2015.

[50] O. Farnan, A. Darer, and J. Wright. Poisoning the Well – Exploring the Great Firewall’s Poisoned DNS Responses. In ACM Workshop on Privacy in the Electronic Society (WPES), 2016.

[51] Federal Bureau of Investigation. International Cyber Ring That Infected Millions of Computers Dismantled. http://www.fbi.gov/news/stories/2011/november/malware_110911, Nov. 2011.

[52] D. Fifield and L. Tsai. Censors' Delay in Blocking Circumvention Proxies. In USENIX Workshop on Free and Open Communications on the Internet (FOCI), 2016.

[53] A. Filastò and J. Appelbaum. OONI: Open Observatory of Network Interference. In USENIX Workshop on Free and Open Communications on the Internet (FOCI), 2012.

[54] 5 Ways to Protect Against the Ad Fraud Surge in Q4. https://www.whiteops.com/q4-ad-fraud-surge, Oct. 2017.

[55] Freedom House. Freedom on the Net. 2016.

[56] M. Giuliani. ZeroAccess, an advanced kernel mode rootkit. http://www.prevx.com/blog/171/ZeroAccess-an-advanced-kernel-mode-rootkit.html, Apr. 2011.

[57] The Go Programming Language. https://golang.org/.

[58] Google Ads: Ad Traffic Quality Resource Center Overview. http://www.google.com/ads/adtrafficquality/.

[59] Google Inc. About smart pricing. AdWords Help, Apr. 2013.

[60] Google Inc. Google Services Agreement for InfoSpace LLC. http://www.sec.gov/Archives/edgar/data/1068875/000119312514121780/d702452dex101.htm, March 2014.

[61] Google Inc. How Google uses conversion data. AdWords Help, May 2014.

[62] Google Public DNS. https://developers.google.com/speed/public-dns/.

[63] T. Greene. ZeroAccess bot-herders abandon click-fraud network. http://www.networkworld.com/news/2013/121913-zeroaccess-277113.html, Dec. 2013.

[64] C. Grier et al. Manufacturing Compromise: The Emergence of Exploit-as-a-Service. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), October 2012.

[65] R. Gummadi, H. Balakrishnan, P. Maniatis, and S. Ratnasamy. Not-a-Bot: Improving Service Availability in the Face of Botnet Attacks. In Usenix Symposium on Networked Systems Design and Implementation (NSDI), 2009.

[66] H. Haddadi. Fighting Online Click-Fraud Using Bluff Ads. ACM SIGCOMM Computer Communication Review (CCR), 2010.

[67] T. Holz, M. Steiner, F. Dahl, E. Biersack, and F. Freiling. Measurements and Mitigation of Peer-to-Peer-based Botnets: A Case Study on Storm Worm. In USENIX Conference on Large-scale Exploits and Emergent Threats (LEET), 2008.

[68] F. Howard. Exploring the Blackhole exploit kit. http://nakedsecurity.sophos.com/exploring-the-blackhole-exploit-kit/.

[69] HSI seizes Silk Road underground black market website. http://www.ice.gov/news/releases/1310/131002baltimore.htm, 2013.

[70] B. Huffaker, M. Fomenkov, and k. claffy. Geocompare: a comparison of public and commercial geolocation databases. Technical report, Cooperative Association for Internet Data Analysis (CAIDA), May 2011.

[71] ICLab. ICLab: a Censorship Measurement Platform. https://iclab.org/.

[72] P. Ipeirotis. Uncovering an advertising fraud scheme. Or "the Internet is for porn". http://www.behind-the-enemy-lines.com/2011/03/uncovering-advertising-fraud-scheme.html, Mar. 2011.

[73] J. Jiang, J. Liang, K. Li, J. Li, H. Duan, and J. Wu. Ghost Domain Name: Revoked yet Still Resolvable. In Networked and Distributed System Security Symposium (NDSS), 2012.

[74] J. P. John, A. Moshchuk, S. D. Gribble, and A. Krishnamurthy. Studying Spamming Botnets Using Botlab. In Usenix Symposium on Networked Systems Design and Implementation (NSDI), 2009.

[75] B. Jones, N. Feamster, V. Paxson, N. Weaver, and M. Allman. Detecting DNS Root Manipulation. In Workshop on Passive and Active Network Measurement (PAM), 2016.

[76] B. Jones, T.-W. Lee, N. Feamster, and P. Gill. Automated Detection and Fingerprinting of Censorship Block Pages. In ACM Internet Measurement Conference (IMC), 2014.

[77] A. Juels, S. Stamm, and M. Jakobsson. Combating Click Fraud Via Premium Clicks. In USENIX Security Symposium (USENIX), 2007.

[78] J. Jung, V. Paxson, A. W. Berger, and H. Balakrishnan. Fast Portscan Detection Using Sequential Hypothesis Testing. In IEEE Symposium on Security and Privacy (S&P), 2004.

[79] Kaffeine. MagicTraffic: a look inside a Zaccess/Sirefef affiliate. http://malware.dontneedcoffee.com/2013/11/magictraffic-look-inside-zaccesssirefef.html, 2013.

[80] H. Kang, K. Wang, D. Soukal, F. Behr, and Z. Zheng. Large-scale Bot Detection for Search Engines. In International Conference on World Wide Web (WWW), 2010.

[81] S. Khattak, D. Fifield, S. Afroz, M. Javed, S. Sundaresan, V. Paxson, S. J. Murdoch, and D. McCoy. Do You See What I See? Differential Treatment of Anonymous Users. In Networked and Distributed System Security Symposium (NDSS), 2016.

[82] S. Khattak, M. Javed, P. D. Anderson, and V. Paxson. Towards Illuminating a Censorship Monitor's Model to Facilitate Evasion. In USENIX Workshop on Free and Open Communications on the Internet (FOCI), 2013.

[83] L. Kim. The Most Expensive Keywords in Google AdWords. http://www.wordstream.com/blog/ws/2011/07/18/most-expensive-google-adwords-keywords, July 2011.

[84] C. Kintana, D. Turner, J.-Y. Pan, A. Metwally, N. Daswani, E. Chin, and A. Bortz. The Goals and Challenges of Click Fraud Penetration Testing Systems. In International Symposium on Software Reliability Engineering (ISSRE), 2009.

[85] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click Modular Router. ACM Transactions Computer Systems, August 2000.

[86] B. Krebs. Security Bug in Dell PCs Shipped Since 8/15. https://krebsonsecurity.com/2015/11/security-bug-in-dell-pcs-shipped-since-815/.

[87] B. Krebs. Takedowns: The Shuns and Stuns That Take the Fight to the Enemy. In McAfee Security Journal, volume 6, 2010.

[88] B. Krebs. Fake Antivirus Industry Down, But Not Out. http://krebsonsecurity.com/2011/08/fake-antivirus-industry-down-but-not-out/, August 2011.

[89] B. Krebs. Reports: Liberty Reserve Founder Arrested, Site Shuttered. http://krebsonsecurity.com/2013/05/reports-liberty-reserve-founder-arrested-site-shuttered/, 2013.

[90] B. Krebs. ZeroAccess Botnet Down, But Not Out. http://krebsonsecurity.com/tag/zeroaccess-takedown/, Dec. 2013.

[91] C. Kreibich, N. Weaver, C. Kanich, W. Cui, and V. Paxson. GQ: Practical Containment for Measuring Modern Malware Systems. In ACM Internet Measurement Conference (IMC), 2011.

[92] N. Kshetri. The Economics of Click Fraud. IEEE Symposium on Security and Privacy (S&P), 2010.

[93] M. Kührer, T. Hupperich, J. Bushart, C. Rossow, and T. Holz. Going Wild: Large-Scale Classification of Open DNS Resolvers. In ACM Internet Measurement Conference (IMC), 2015.

[94] M. Kührer, T. Hupperich, C. Rossow, and T. Holz. Exit from Hell? Reducing the Impact of Amplification DDoS Attacks. In USENIX Security Symposium, 2014.

[95] K. Levchenko, A. Pitsillidis, N. Chachra, B. Enright, M. Félegyházi, C. Grier, T. Halvorson, C. Kanich, C. Kreibich, H. Liu, D. McCoy, N. Weaver, V. Paxson, G. M. Voelker, and S. Savage. Click Trajectories: End-to-End Analysis of the Spam Value Chain. In IEEE Symposium on Security and Privacy (S&P), 2011.

[96] The Lote Clicking Agent. http://www.clickingagent.com/.

[97] G. Lowe, P. Winters, and M. L. Marcus. The Great DNS Wall of China. Technical report, New York University, 2007.

[98] MaxMind. https://www.maxmind.com/.

[99] D. McCoy, H. Dharmdasani, C. Kreibich, G. M. Voelker, and S. Savage. Priceless: The Role of Payments in Abuse-advertised Goods. In ACM Conference on Computer and Communications Security (CCS), 2012.

[100] K. McNamee. Malware Analysis Report. Botnet: ZeroAccess/Sirefef. http://www.kindsight.net/sites/default/files/Kindsight Malware Analysis-ZeroAcess-Botnet-final.pdf, February 2012.

[101] A. Metwally, D. Agrawal, and A. El Abbadi. DETECTIVES: DETEcting Coalition hiT Inflation attacks in adVertising nEtworks Streams. In International Conference on World Wide Web (WWW), 2007.

[102] A. Metwally, F. Emekçi, D. Agrawal, and A. El Abbadi. SLEUTH: Single-pubLisher attack dEtection Using correlaTion Hunting. In Conference on Very Large Data Bases (VLDB), 2008.

[103] Microsoft, Yahoo! Change Search Landscape. http://www.microsoft.com/en-us/news/press/2009/jul09/07-29release.aspx, July 2009.

[104] Microsoft Corporation. ZeroAccess botnet Legal Notice. https://www.botnetlegalnotice.com/zeroaccess/.

[105] B. Miller, P. Pearce, C. Grier, C. Kreibich, and V. Paxson. What's Clicking What? Techniques and Innovations of Today's Clickbots. In Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), 2011.

[106] S. Mitchell and B. Collins. Porn blocking: What the Big Four ISPs Actually Did. http://www.alphr.com/networking/20643/porn-blocking-what-the-big-four-isps-actually-did, August 2015.

[107] D. Moore, G. Voelker, and S. Savage. Inferring Internet Denial-of-Service Activity. In USENIX Security Symposium (USENIX), 2001.

[108] Z. Nabi. The Anatomy of Web Censorship in Pakistan. In USENIX Workshop on Free and Open Communications on the Internet (FOCI), 2013.

[109] A. Neville and R. Gibb. ZeroAccess Indepth (Symantec Corporation White Paper). http://www.symantec.com/content/en/us/enterprise/media/security response/whitepapers/zeroaccess indepth.pdf, October 2013.

[110] Q. Northon. China Blocks LiveJournal. https://www.wired.com/2007/03/china-blocks-livejournal/, March 2007.

[111] Ooniprobe: A Network Interference Detection Tool. https://github.com/thetorproject/ooni-probe.

[112] OpenNet Initiative. https://opennet.net/.

[113] Open Resolver Project. http://openresolverproject.org/.

[114] J. C. Park and J. R. Crandall. Empirical Study of a National-Scale Distributed Intrusion Detection System: Backbone-Level Filtering of HTML Responses in China. In IEEE International Conference on Distributed Computing Systems (ICDCS), 2010.

[115] P. Pearce, V. Dave, C. Grier, K. Levchenko, S. Guha, D. McCoy, V. Paxson, S. Savage, and G. M. Voelker. Characterizing Large-Scale Click Fraud in ZeroAccess. In ACM Conference on Computer and Communications Security (CCS), 2014.

[116] P. Pearce, R. Ensafi, F. Li, N. Feamster, and V. Paxson. Augur: Internet-Wide Detection of Connectivity Disruptions. In IEEE Symposium on Security and Privacy (S&P), 2017.

[117] P. Pearce, R. Ensafi, F. Li, N. Feamster, and V. Paxson. Towards Continual Measurement of Global Network-Level Censorship. In IEEE Security & Privacy Magazine, Special Issue, 2018.

[118] P. Pearce, C. Grier, V. Paxson, V. Dave, D. McCoy, G. M. Voelker, and S. Savage. The ZeroAccess Auto-Clicking and Search-Hijacking Click Fraud Modules. Technical report, EECS Department, University of California, Berkeley, Dec 2013.

[119] P. Pearce, B. Jones, F. Li, R. Ensafi, N. Feamster, N. Weaver, and V. Paxson. Global Mea- surement of DNS Manipulation. In USENIX Security Symposium (USENIX), 2017.

[120] P. Pearce, B. Jones, F. Li, R. Ensafi, N. Feamster, N. Weaver, and V. Paxson. Global Mea- surement of DNS Manipulation. In USENIX ;login:, Winter 2017.

[121] M. Polychronakis, P. Mavrommatis, and N. Provos. Ghost turns Zombie: Exploring the Life Cycle of Web-based Malware. In USENIX Conference on Large-scale Exploits and Emergent Threats (LEET), 2008.

[122] Pricewaterhouse Coopers. IAB Internet Advertising Revenue Report: 2017 Full Year Results. https://www.iab.com/wp-content/uploads/2018/05/IAB-2017-Full-Year-Internet-Advertising-Revenue-Report.REV2 .pdf, May 2018.

[123] Z. Qian, Z. M. Mao, Y. Xie, and F. Yu. Investigation of triangular spamming: A stealthy and efficient spamming technique. In IEEE Symposium on Security and Privacy (S&P), 2010.

[124] A. Razaghpanah, A. Li, A. Filastò, R. Nithyanand, V. Ververis, W. Scott, and P. Gill. Exploring the Design Space of Longitudinal Censorship Measurement Platforms. Technical Report 1606.01979, ArXiv CoRR, 2016.

[125] E. Rodionov and A. Matrosov. The Evolution of TDL: Conquering x64. http://go.eset.com/us/resources/white-papers/The Evolution of TDL.pdf, 2011.

[126] C. Rossow, D. Andriesse, T. Werner, B. Stone-Gross, D. Plohmann, C. J. Dietrich, and H. Bos. SoK: P2PWNED — Modeling and Evaluating the Resilience of Peer-to-Peer Botnets. In IEEE Symposium on Security and Privacy (S&P), May 2013.

[127] M. Salganik. Bit by Bit: Social Research for the Digital Age. http://www.bitbybitbook.com/, 2016.

[128] Sam Burnett and Nick Feamster. Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests. In ACM SIGCOMM, 2015.

[129] K. Schomp, T. Callahan, M. Rabinovich, and M. Allman. On Measuring the Client-Side DNS Infrastructure. In ACM Internet Measurement Conference (IMC), 2013.

[130] W. Scott, T. Anderson, T. Kohno, and A. Krishnamurthy. Satellite: Joint Analysis of CDNs and Network-Level Interference. In USENIX Annual Technical Conference (ATC), 2016.

[131] A. Sfakianakis, E. Athanasopoulos, and S. Ioannidis. CensMon: A Web Censorship Monitor. In USENIX Workshop on Free and Open Communications on the Internet (FOCI), 2011.

[132] L. Sinclair. Click fraud rampant in online ads, says Bing. http://www.theaustralian.com.au/media/click-fraud-rampant-in-online-ads-says-bing/story-e6frg996-1226056349034, May 2011.

[133] A. Spence. Russia Accuses Latvia of "blatant censorship" after Sputnik News Site is Shut Down. https://goo.gl/wIUUCr, Mar. 2016.

[134] The Tor Project. OONI: Open observatory of network interference. https://ooni.torproject.org/.

[135] K. Thomas, E. Bursztein, C. Grier, G. Ho, N. Jagpal, A. Kapravelos, D. McCoy, A. Nappa, V. Paxson, P. Pearce, N. Provos, and M. A. Rajab. Ad Injection at Scale: Assessing Deceptive Advertisement Modifications. In IEEE Symposium on Security and Privacy (S&P), 2015.

[136] K. Thomas, D. Huang, D. Wang, E. Bursztein, C. Grier, T. J. Holt, C. Kruegel, D. McCoy, S. Savage, and G. Vigna. Framing dependencies introduced by underground commoditization. In Workshop on the Economics of Information Security, 2015.

[137] The Tor Project. https://www.torproject.org/.

[138] Transmission Control Protocol. RFC 793, Sept. 1981.

[139] M. C. Tschantz, S. Afroz, Anonymous, and V. Paxson. SoK: Towards Grounding Censorship Circumvention in Empiricism. In IEEE Symposium on Security and Privacy, 2016.

[140] G. Tuysuz and I. Watson. Turkey Blocks YouTube Days after Twitter Crackdown. http://www.cnn.com/2014/03/27/world/europe/turkey-youtube-blocked/, Mar. 2014.

[141] A. Tuzhilin. The Lane's Gifts v. Google Report. http://googleblog.blogspot.com/pdf/Tuzhilin Report.pdf, 2005.

[142] N. Villeneuve. Koobface: Inside a Crimeware Network. http://www.infowar-monitor.net/reports/iwm-koobface.pdf, Nov. 2010.

[143] J. Wakefield. BBC News: Lenovo taken to task over 'malicious' adware. https://www.bbc.com/news/technology-31533028.

[144] N. Weaver, C. Kreibich, and V. Paxson. Redirecting DNS for Ads and Profit. In USENIX Workshop on Free and Open Communications on the Internet (FOCI), 2011.

[145] White Ops. The Methbot Operation. https://www.whiteops.com/methbot.

[146] S. Wilson. The Logic of Russian Internet Censorship. https://www.washingtonpost.com/news/monkey-cage/wp/2014/03/16/the-logic-of-russian-internet-censorship/, Mar. 2014.

[147] P. Winter. The Philippines are blocking Tor? Tor Trac ticket, June 2012. https://bugs.torproject.org/6258.

[148] P. Winter and S. Lindskog. How the Great Firewall of China is Blocking Tor. In USENIX Workshop on Free and Open Communications on the Internet (FOCI), 2012.

[149] J. Wyke. The ZeroAccess Botnet: Mining and Fraud for Massive Financial Gain. http://www.sophos.com/en-us/why-sophos/our-people/technical-papers/zeroaccess-botnet.aspx, September 2012.

[150] J. Wyke. ZeroAccess. http://www.sophos.com/en-us/why-sophos/our-people/technical-papers/zeroaccess.aspx, 2012.

[151] X. Xu, Z. M. Mao, and J. A. Halderman. Internet Censorship in China: Where Does the Filtering Occur? In Workshop on Passive and Active Network Measurement (PAM), 2011.

[152] F. Yu, Y. Xie, and Q. Ke. SBotMiner: Large Scale Search Bot Detection. In Conference on Web Search and Data Mining (WSDM), 2010.

[153] X. Zhang, J. Knockel, and J. R. Crandall. Original SYN: Finding machines hidden behind firewalls. In IEEE Conference on Computer Communications (INFOCOM), 2015.

[154] J. Zittrain and B. G. Edelman. Internet filtering in China. IEEE Internet Computing, 2003.