UNIVERSITY OF CALIFORNIA, SAN DIEGO

Leveraging Internet Background Radiation for Opportunistic Network Analysis

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science

by

Karyn Benson

Committee in charge:
kc claffy, Chair
Alex C. Snoeren, Co-Chair
Alberto Dainotti
George Papen
Stefan Savage
Geoffrey M. Voelker

2016

Copyright Karyn Benson, 2016
All rights reserved.

The Dissertation of Karyn Benson is approved and is acceptable in quality and form for publication on microfilm and electronically:

Co-Chair
Chair

University of California, San Diego
2016

TABLE OF CONTENTS

Signature Page ... iii
Table of Contents ... iv
List of Figures ... viii
List of Tables ... xi
Acknowledgements ... xiii
Vita ... xviii
Abstract of the Dissertation ... xx

Chapter 1 Introduction ... 1
  1.1 IBR Datasets ... 6
  1.2 Contributions ... 6
  1.3 Organization ... 8

Chapter 2 Related work ... 10
  2.1 Outcomes when using or comparing multiple data sources ... 11
    2.1.1 Outcome: One data source is superior ... 11
    2.1.2 Outcome: Interesting differences between the data sources ... 12
    2.1.3 Outcome: Combining the data sources yields improved visibility ... 13
  2.2 Opportunistic measurement with other data sources ... 14
    2.2.1 Web traffic ... 15
    2.2.2 Spam ... 16
    2.2.3 Logs ... 17
    2.2.4 BitTorrent ... 18
    2.2.5 Discussion ... 19
  2.3 Previous studies using IBR ... 19
    2.3.1 Questions to consider when constructing a darknet ... 20
    2.3.2 Previous work characterizing IBR composition ... 25
    2.3.3 Previous work using IBR to analyze malicious activities ... 28
    2.3.4 Previous opportunistic uses of IBR ... 33

Chapter 3 Sanitization: Removing spoofed traffic ... 36
  3.1 Related work ... 37
    3.1.1 Packet fields indicative of spoofing ... 38
    3.1.2 Techniques for identifying spoofed traffic ... 42
  3.2 Our technique for identifying and removing spoofed traffic from IBR ... 50
    3.2.1 Identifying bursty spoofing behavior ... 51
    3.2.2 Identifying consistent spoofing behavior ... 53
    3.2.3 Removing spoofed traffic ... 54
    3.2.4 Need for a multi-faceted approach ... 57
    3.2.5 Influence of spoofing on network analysis ... 58
    3.2.6 Validation of our technique ... 60
    3.2.7 Conclusion ... 61

Chapter 4 IBR Composition: Phenomena responsible for IBR ... 63
  4.1 Scanning ... 64
    4.1.1 Properties of scans reaching UCSD-NT ... 66
    4.1.2 Appropriate inferences with scanning traffic ... 73
  4.2 Backscatter ... 74
    4.2.1 Spamhaus attack ... 74
    4.2.2 DNS backscatter ... 75
    4.2.3 Appropriate inferences with backscatter traffic ... 80
  4.3 Bugs and Misconfigurations ... 80
    4.3.1 Qihoo 360 ... 81
    4.3.2 BitTorrent ... 83
    4.3.3 Appropriate inferences with traffic resulting from bugs or misconfigurations ... 89
  4.4 Conclusion ... 90

Chapter 5 IBR Visibility: Factors influencing network measurability ... 93
  5.1 Overall visibility ... 94
    5.1.1 How many sources are observed? ... 94
    5.1.2 What types of networks are observed? ... 101
    5.1.3 Implications of overall IBR visibility ... 103
  5.2 Properties of IBR influencing visibility ... 103
    5.2.1 Impact of IBR components ... 104
    5.2.2 How often do we receive IBR? ... 108
  5.3 Properties of collection infrastructure influencing visibility ... 117
    5.3.1 Dependence on position in IPv4 space ... 118
    5.3.2 Dependence on darknet size ... 125
  5.4 Conclusion ... 126

Chapter 6 Inferences with IBR: Using IBR to learn about address space usage ... 128
  6.1 Inferring IPv4 address space utilization ... 130
    6.1.1 Related work ... 131
    6.1.2 Methodology ... 132
    6.1.3 Dataset selection ... 132
    6.1.4 Validation ... 134
    6.1.5 Results of combining multiple datasets ... 136
    6.1.6 Sensitivity analysis ... 138
    6.1.7 Discussion ... 142
  6.2 Characterizing host functionality ... 144
    6.2.1 Comparison to other data sources ... 145
    6.2.2 Combining with other data sources ... 149
    6.2.3 Discussion ... 153
  6.3 Extracting host attributes ... 154
    6.3.1 Determining uptime ... 154
    6.3.2 Assessing BitTorrent client popularity ... 159
    6.3.3 Time to patch the Qihoo 360 bug ... 160
    6.3.4 Discussion ... 162
  6.4 Inferring network configurations ... 163
    6.4.1 Detecting NAT usage ... 163
    6.4.2 Detecting carrier grade NAT (CGN) ... 167
    6.4.3 Analyzing DHCP lease dynamics ... 173
    6.4.4 Discussion ... 179
  6.5 Conclusion ... 180

Chapter 7 Inferences with IBR: Using IBR to infer network status ... 184
  7.1 Identifying path changes ... 185
    7.1.1 Method of identifying path changes with IBR ... 186
    7.1.2 Number of analyzable networks ... 187
    7.1.3 Validation ... 189
    7.1.4 Route stability ... 194
    7.1.5 Discussion ... 197
  7.2 Recognizing packet loss ... 197
    7.2.1 Data source and signal extraction ... 198
    7.2.2 Definition of metrics ... 202
    7.2.3 Packet loss case studies ... 206
    7.2.4 Discussion ... 212
  7.3 Limitations of using IBR to analyze Internet status ... 213
    7.3.1 Results are lower bounds ... 214
    7.3.2 Difficulty pinpointing location of change ... 215
  7.4 Conclusion ... 217

Chapter 8 Conclusion ... 219
  8.1 Future directions ... 223
  8.2 Final thoughts ... 224

Appendix A Attributing IBR to responsible Internet phenomena ... 226
  A.1 Evidence that port-based techniques are insufficient ... 226
    A.1.1 Top TCP ports ... 227
    A.1.2 Top UDP ports ... 228
  A.2 Our approach to traffic attribution ... 229
    A.2.1 Existing protocol identification tools ... 230
    A.2.2 Literature on well-studied protocols ... 230
    A.2.3 Web search for common text ... 231
    A.2.4 Analysis of other traffic to a hotspot ... 231
    A.2.5 Running software ... 232
    A.2.6 Analysis of hosts sending traffic ... 232
    A.2.7 Differentiating between encryption and obfuscation ... 232
  A.3 Discussion ... 234

Appendix B Scanning strategy heuristics ... 235
  B.1 Sequential strategies ... 236
  B.2 Reverse-byte order strategy ... 236
  B.3 Conficker ... 236
  B.4 Random strategies ... 238

References ... 239

LIST OF FIGURES

Figure 3.1. Routed and unrouted networks by hour UCSD-13 ... 52
Figure 3.2. /24 blocks observed per hour after removing spoofed traffic ... 56
Figure 3.3. Influence of spoofing on number of observed /24 blocks ... 59
Figure 4.1. Frequently scanned ports ... 68
Figure 4.2. Distribution of scanning IP addresses by number of scanning packets sent to UCSD-NT per day ... 69
Figure 4.3. Scanning strategy heuristics ... 72
Figure 4.4. Sources from