Characterizing Network Infrastructure Using the Domain Name System
CHARACTERIZING NETWORK INFRASTRUCTURE USING THE DOMAIN NAME SYSTEM

A Dissertation Presented to The Academic Faculty

By

Panagiotis Kintis

In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Computer Science

Georgia Institute of Technology

December 2020

Copyright © Panagiotis Kintis 2020

Approved by:

Dr. Emmanouil Antonakakis
School of Electrical and Computer Engineering
Georgia Institute of Technology

Dr. Mustaque Ahamad
School of Computer Science
Georgia Institute of Technology

Dr. Douglas Blough
School of Electrical and Computer Engineering
Georgia Institute of Technology

Dr. Jonathan M. Smith
Department of Computer and Information Science
University of Pennsylvania

Dr. Angelos Keromytis
School of Electrical and Computer Engineering
Georgia Institute of Technology

Date Approved: October 1, 2020

Reason is immortal, all else mortal.
Pythagoras of Samos

To my family and friends, for being there.

ACKNOWLEDGEMENTS

Several people have made the last few years very exciting and productive. I owe a great debt of gratitude to exceptional researchers and devoted friends, who have played a significant role in my personal and professional life and helped me in many different ways to complete this thesis.

I would like to thank my advisor, Manos Antonakakis, who has been there throughout this journey to guide and assist. He pushed me to achieve my full potential, helped me navigate the research world, and provided me with the intellectual capital to complete this thesis and become a better researcher. Several years ago, he gave me a chance by accepting me into the PhD program, and he taught me everything I needed to succeed.

In addition to my advisor, I would like to thank Chaz Lever, a collaborator and, most importantly, a very good friend.
He made sure I became very good technically, introduced me to a plethora of new technologies, and assisted me in the development of every system used in this thesis. He was there to listen, influence, motivate, and inspire, both professionally and personally; for that, and so much more, thank you, Chaz!

In my very first steps in the academic realm, I was fortunate enough to collaborate with very smart people whose influence has been paramount to this work. Dave Dagon, the first person I met at Georgia Tech, introduced me to the DNS world, believed in me, and gave me the chance and the tools to work on many interesting problems. Nick Nikiforakis, Michalis Polychronakis, and Roberto Perdisci are three amazing people and outstanding researchers who always helped me see problems from a different angle. I cannot forget, of course, Angelos Keromytis, who has become an integral part of my professional life over the last few years. Thank you, everyone, for your invaluable contributions.

My time at Georgia Tech would not have been as enjoyable if it were not for the Astrolavos Lab and its members. Everyone, in their own way, was there through the good and the hard times. I would like to thank Yacin Nadji, Yizheng Chen, Thanasis Kountouras, Omar Alrawi, Logan O’Hara, Thomas Papastergiou, Thanos Avgetidis, Konstantinos Karakatsanis, Miuyin Yong Wong, Kleanthis Karakolios, Aaron Faulkenberry, William Garrison, Alex Neal, and Michael Mitchel. You helped more than you can imagine, both as collaborators and as friends.

I would also like to thank my family in Greece, whose unconditional love and support throughout these years allowed me to work on this thesis. Thousands of miles away, they were always in my thoughts, and knowing I was in theirs gave me the strength to continue. Everything I have achieved is because you made sure I could, many years ago.
Of course, I would like to thank Niki, who, knowing we would be far from each other, never let that be a barrier, but did everything to help and firmly pushed me in the right direction, no matter what.

To my other family, away from my family, here in Atlanta: Chris, Lula, Pano, George, words are just not enough; Nikolaki, Vicki, Ismini, Ioanna, Nefeli, you made me feel I belonged. You all really gave me a family here, and for that, I will always be grateful.

Finally, I would like to thank my committee members. Doug Blough, the first professor I had the chance to work with, who taught me enough to make the contributions of this thesis a reality. Mustaque Ahamad, who helped me see beyond a single research direction and understand the applicability of research. Jonathan Smith, who believed in me and our work, helped design a path for this thesis, and introduced me to a research community outside of Georgia Tech. Thank you all for your feedback, guidance, and assistance throughout this process.

TABLE OF CONTENTS

Acknowledgments
List of Tables
List of Figures
Summary
Chapter 1: Introduction
  1.1 Hypothesis
  1.2 Thesis Statement
  1.3 Contributions
  1.4 Dissertation Overview
Chapter 2: Background
  2.1 The Domain Name System
    2.1.1 Domain Names
    2.1.2 Domain Resolution
    2.1.3 DNS Packets & Contents
    2.1.4 DNS Data Collection
  2.2 Previous Work
    2.2.1 DNS Measurements
    2.2.2 DNS Abuse
    2.2.3 DNS Squatting Abuse
Chapter 3: Active DNS Measurements
  3.1 Introduction
    3.1.1 Contributions
  3.2 Active DNS Data Collection
    3.2.1 Infrastructure
    3.2.2 Domain Seed
    3.2.3 Measurements
  3.3 Comparing Active And Passive DNS Datasets
    3.3.1 Datasets
  3.4 Case Studies
    3.4.1 Enhancing Public Blacklists
    3.4.2 Enhancing The Detection Of Domain’s Residual Trust Change
    3.4.3 Tracking Malicious Domain Names In Non-routable IP Space
  3.5 Conclusion
Chapter 4: Active DNS: The First Quinquennium
  4.1 Introduction
  4.2 Thales 2.0 Architecture
  4.3 Challenges With Thales
    4.3.1 Data Collection & Temporary Storage
    4.3.2 Data Size & Data Transfer
    4.3.3 Orchestration
    4.3.4 Data Collection
    4.3.5 Altera Pars
  4.4 Redesign
    4.4.1 Seed Management
    4.4.2 Resource Orchestration
    4.4.3 Data Processing
    4.4.4 Long-Term Storage
    4.4.5 Schema
  4.5 Thales 2.0 Value
    4.5.1 DNS Data
    4.5.2 Active DNS Data in Security Research
  4.6 Lessons Learned
  4.7 Active and Passive DNS Applications
    4.7.1 Passive DNS
    4.7.2 Active DNS
    4.7.3 Combining Datasets
Chapter 5: Combosquatting Domain Name Threats
  5.1 Introduction
    5.1.1 Contributions
  5.2 Squatting Background
    5.2.1 DNS Squatting & Combosquatting
    5.2.2 Combosquatting Abuse
  5.3 Measurement Methodology
    5.3.1 Trademark Selection
    5.3.2 Datasets
    5.3.3 Linking Datasets
  5.4 Measuring Combosquatting Domains
    5.4.1 Combosquatting versus Typosquatting
    5.4.2 Lexical Characteristics
    5.4.3 Temporal Analysis
    5.4.4 Infrastructure Analysis
  5.5 Combosquatting in the Wild
    5.5.1 Exploring & Labeling Combosquatting Domains
  5.6 Combosquatting Rating System
  5.7 CSR Evaluation and Analysis
    5.7.1 Evaluating the Connected Component Clustering
    5.7.2 Ranking Cluster Behavioral Analysis
    5.7.3 Using CSR Operationally
Chapter 6: Conclusion
  6.1 Considerations and Limitations
    6.1.1 Active DNS Limitations
    6.1.2 Thales 2.0 Limitations
    6.1.3 Combosquatting Limitations
  6.2 Closing Remarks
Appendix A: Combosquatting
  A.1 APT Domains
References

LIST OF TABLES

3.1 Number of data points collected over the last 12 days of March 2016. Values are in thousands (×10³).
3.2 The distribution of QTYPEs for the active and passive DNS in our datasets.
3.3 Operation Hangover and CopyKittens Attack Group Infrastructure and Domain Names.
4.1 Issues and related components from Thales and Thales 2.0.