
DISSERTATION

CHARACTERIZING THE VISIBLE ADDRESS SPACE TO ENABLE EFFICIENT CONTINUOUS IP GEOLOCATION

Submitted by
Manaf Gharaibeh
Department of Computer Science

In partial fulfillment of the requirements
For the Degree of Doctor of Philosophy
Colorado State University
Fort Collins, Colorado
Spring 2020

Doctoral Committee:
Advisor: Christos Papadopolous
Co-Advisor: Craig Partridge
John Heidemann
Indrakshi Ray
Stephen Hayne

Copyright by Manaf Gharaibeh 2020
All Rights Reserved

ABSTRACT

CHARACTERIZING THE VISIBLE ADDRESS SPACE TO ENABLE EFFICIENT CONTINUOUS IP GEOLOCATION

Internet Protocol (IP) geolocation is vital for location-dependent applications and many network research problems. The benefits to applications include enabling content customization, proximal server selection, and management of digital rights based on the location of users, to name a few. The benefits to networking research include providing geographic context useful for several purposes, such as studying the geographic deployment of Internet resources, binding cloud data to a location, and studying censorship and monitoring, among others.

Measurement-based IP geolocation is widely considered the state-of-the-art client-independent approach to estimating the location of an IP address. However, full measurement-based geolocation is prohibitive when applied continuously to the entire Internet to maintain up-to-date IP-to-location mappings. Furthermore, many IP address blocks rarely move, making it unnecessary to perform such full geolocation.

The thesis of this dissertation states that we can enable efficient, continuous IP geolocation by identifying clusters of co-located IP addresses and their location stability from latency observations. In this statement, a cluster is a group of an arbitrary number of adjacent co-located IP addresses (from a few up to a /16). Location stability is a measure of how often an IP block changes location.
We gain efficiency by allowing IP geolocation systems to geolocate IP addresses as units, and by detecting when a geolocation update is required, optimizations not explored in prior work. We present several studies to support this thesis statement.

We first present a study to evaluate the reliability of router geolocation in popular geolocation services, complementing prior work that evaluates end-host geolocation in such services. The results show the limitations of these services and the need for better solutions, motivating our work to enable more accurate approaches. Second, we present a method to identify clusters of co-located IP addresses by the similarity in their latency. Identifying such clusters allows us to geolocate them efficiently as units without compromising accuracy. Third, we present an efficient delay-based method to identify IP blocks that move over time, allowing us to recognize when geolocation updates are needed and avoid frequent geolocation of the entire Internet to maintain up-to-date geolocation. In our final study, we present a method to identify cellular blocks by their distinctive variation in latency compared to WiFi and wired blocks. Our method to identify cellular blocks allows a better interpretation of their latency estimates and makes it possible to study their geographic properties without the need for proprietary data from operators or users.

ACKNOWLEDGEMENTS

I wish to show my sincere gratitude to the people who helped me on the path towards this dissertation. I am particularly grateful to my advisor, Dr. Christos Papadopolous. Under his guidance, I have had the chance to explore exciting topics in network measurement and security and to work with talented researchers from other research groups. I am thankful for his mentorship, encouragement, kindness, insights into the specifics of my work, and emphasis on rigorous research. Going forward, I hope to continue having Christos as a mentor and to build on this fruitful relationship.

My sincere thanks to Dr. Craig Partridge, whom I have been honored to have as my co-advisor for the past two years. Craig’s guidance and support proved monumental towards improving and finishing this dissertation; I will always be indebted to him for that. I am also indebted to Dr. John Heidemann, USC/ISI. John has always been willing to provide insight and direction in the past five years. He helped me become a better researcher. I am grateful for all of the feedback he provided to help me improve this document, and I appreciate his trip from the West Coast to attend my dissertation oral defense. I also appreciate the valuable learning experience I have had from participating in the weekly meetings of his research group. I also thank John’s group for making much of the measurement data I used in this dissertation available. I thank Christos, Craig, and John once more for making me want to know so much more and for showing me the way to be a scientist.

I wish to thank Dr. Indrakshi Ray and Dr. Stephen Hayne for their service on my dissertation committee. I thank my committee members for the insights, feedback, and discussions that helped me improve this dissertation. Thanks are also due to all current and former professors at CSU who helped me along the way, including Dr. Charles Anderson, Dr. Sanjay Rajopadhye, Dr. Dan Massey, Dr. Joseph Gersch, Dr. Lorenzo De Carli, Dr. Indrajit Ray, and Dr. Robert France.

I am grateful to all my friends in the network-security research group for making the Ph.D. life a little easier, and for all the interesting discussions about research and other innumerable random topics. I extend my gratitude to all other friends in the Department of Computer Science, CSU, and the good old comrades anywhere for their support and encouragement.

Most importantly, I wish to acknowledge the endless support and encouragement of my family, who kept me going. I owe it all to my beloved parents, Hamed and Fawzeyah.
The support of my sister, Manal, and my two brothers, Sameer and Ehab, has been invaluable. “Shokran”.

DEDICATION

to my parents ...

TABLE OF CONTENTS

ABSTRACT .... ii
ACKNOWLEDGEMENTS .... iv
DEDICATION .... vi
LIST OF TABLES .... x
LIST OF FIGURES .... xi

Chapter 1  Introduction .... 1
  1.1  Problems Addressed .... 2
  1.2  Thesis Statement .... 4
  1.3  Overview of the Dissertation Studies .... 5
    1.3.1  Evaluating the Reliability of Router Geolocation .... 5
    1.3.2  Optimizations for More Accurate Continuous IP Geolocation .... 6
    1.3.3  Identifying Cellular IP Blocks .... 7
  1.4  Research Contributions .... 8

Chapter 2  Background and Related Work .... 11
  2.1  Location Information from Existing Databases .... 11
    2.1.1  Look-up WHOIS Databases by IP Address .... 11
    2.1.2  DNS LOC Records .... 12
    2.1.3  Location Hints within Domain Names .... 13
    2.1.4  Public and Commercial Geolocation Databases .... 14
  2.2  Target-Assisted Geolocation .... 15
    2.2.1  Utilizing Technology-Enabled Devices .... 15
    2.2.2  Crowdsourcing-based Geolocation .... 16
  2.3  Measurement-based IP Geolocation .... 17
    2.3.1  Terminology .... 17
    2.3.2  Nearest Landmark to Target .... 18
    2.3.3  Geolocation via Delay-to-Distance Mapping .... 20
    2.3.4  Delay-based with Topology Geolocation .... 26
    2.3.5  CBG Variations Comparison .... 32
  2.4  Evaluating the Accuracy of Geolocation Services .... 32
  2.5  Scale up of Measurement-based IP Geolocation .... 34
  2.6  Cellular Block Identification .... 36
  2.7  Conclusions .... 38

Chapter 3  Limitations of Router Geolocation in Popular Geolocation Services .... 40
  3.1  Introduction .... 41
  3.2  Datasets .... 43
    3.2.1  CAIDA Topology Dataset .... 43
    3.2.2  Geolocation Databases .... 43
  3.3  Ground Truth Data .... 44
    3.3.1  DNS-Based Ground Truth Data .... 44
    3.3.2  RTT-Proximity Ground Truth Data .... 44
    3.3.3  Ground Truth Data Regional and Topological Distribution .... 45
    3.3.4  Ground Truth Data Correctness .... 46
  3.4  Methodology .... 49
    3.4.1  Databases’ Coverage and Consistency .... 49
    3.4.2  Same City Coordinates Across Databases .... 50
    3.4.3  Accuracy of the Databases .... 51
  3.5  Results and Discussion .... 51
    3.5.1  Databases’ Coverage .... 52
    3.5.2  Databases’ Consistency .... 52
    3.5.3  Evaluation Using Ground Truth Data .... 53
  3.6  Recommendations .... 58
  3.7  Conclusions .... 60

Chapter 4  IP Blocks Co-locality .... 62
  4.1  Introduction .... 62
  4.2  Dataset Description .... 64
  4.3  Identification of Co-located IP Addresses .... 65
    4.3.1  Methodology .... 65
    4.3.2  Methodology Limitations .... 67
  4.4  Validating Identification of Multi-Location Blocks .... 68
    4.4.1  Building a Single-Location Ground Truth Dataset .... 68
    4.4.2  Building a Multi-Location Ground-Truth Dataset .... 69
    4.4.3  Validation .... 70
    4.4.4  Bounding the False Positives .... 71
  4.5  Co-Locality of /24 Blocks in the Wild .... 72
    4.5.1  Identifying Multi-Location /24 Blocks .... 73
    4.5.2  Characterizing Multi-Location /24 Blocks .... 73
  4.6  Identifying Arbitrary-Size Clusters of Co-located Addresses .... 76
    4.6.1  Similarity of Co-located Blocks Latency .... 76
    4.6.2  Evaluation Dataset .... 77
    4.6.3  Results .... 78
  4.7  Conclusions .... 79

Chapter 5  Delay-based Identification of Internet Block Movement .... 81
  5.1  Introduction .... 82
  5.2  Datasets .... 83
    5.2.1  Latency Information from the USC Internet Outage Data .... 83
    5.2.2  Paths from the CAIDA UCSD IPv4 Routed /24 Topology Dataset .... 84
    5.2.3  Paths from the CAIDA Internet Topology Data Kit .... 85
  5.3  Methodology: From Block Latency to Block Movement .... 86
    5.3.1  Stable Estimation of VP-to-Block Latency .... 87
    5.3.2  Common Patterns in IP-Block Latency .... 87
    5.3.3  Identifying Block Movement from Latency Measurements .... 89
  5.4  Controlled Experiments with Synthetic Data .... 90
    5.4.1  Simulation of Block Movement .... 90
    5.4.2  Building a Dataset with Synthetic Movement .... 91
    5.4.3  ROC Analysis .... 93
  5.5  Evaluation with Real-World Data