Proactive Cyberfraud Detection Through Infrastructure Analysis

PROACTIVE CYBERFRAUD DETECTION THROUGH INFRASTRUCTURE ANALYSIS Andrew J. Kalafut Submitted to the faculty of the Graduate School in partial fulfillment of the requirements for the degree Doctor of Philosophy in Computer Science Indiana University July 2010 Accepted by the Graduate Faculty, Indiana University, in partial fulfillment of the requirements of the degree of Doctor of Philosophy. Doctoral Minaxi Gupta, Ph.D. Committee (Principal Advisor) Steven Myers, Ph.D. Randall Bramley, Ph.D. July 19, 2010 Raquel Hill, Ph.D. ii Copyright c 2010 Andrew J. Kalafut ALL RIGHTS RESERVED iii To my family iv Acknowledgements I would first like to thank my advisor, Minaxi Gupta. Minaxi’s feedback on my research and writing have invariably resulted in improvements. Minaxi has always been supportive, encouraged me to do the best I possibly could, and has provided me many valuable opportunities to gain experience in areas of academic life beyond simply doing research. I would also like to thank the rest of my committee members, Raquel Hill, Steve Myers, and Randall Bramley, for their comments and advice on my research and writing, especially during my dissertation proposal. Much of the work in this dissertation could not have been done without the help of Rob Henderson and the rest of the systems staff. Rob has provided valuable data, and assisted in several other ways which have ensured my experiments have run as smoothly as possible. Several members of the departmental staff have been very helpful in many ways. Specifically, I would like to thank Debbie Canada, Sherry Kay, Ann Oxby, and Lucy Battersby. I would like to thank my co-authors on all the work that has contributed to this dissertation. Material in Chapter 2 [42] is based on work co-authored with Minaxi Gupta. Material in Chap- ter 4 [58] is based on work co-authored with D. Kevin McGrath and Minaxi Gupta. Material in Chapter 5 [41] is based on work co-authored with Minaxi Gupta, Pairoj Rattadilok and Pragneshku- mar Patel. Material in Chapter 6 [40] is based on work co-authored with Minaxi Gupta, Christopher Cole, Lei Chen, and Nathan Myers. Material in Chapter 7 [43] and Chapter 8 [93] is based on work co-authored with Craig Shue and Minaxi Gupta. v My parents, my sister, and my brother have been a great source of support throughout my life, and encouragement in pursuing my education. I am greatful for their support. I would especially like to thank my financée, Heather Goodman, for her love and support. Finally I would like to thank my friends who have helped make life in graduate school more fun and less stressful than it otherwise would have been, especially Craig Shue, Joshua Hursey, Samantha Foley, Alia Al-Kasimi, Eric Nichols, and Abhinav Acharya. vi Andrew J. Kalafut PROACTIVE CYBERFRAUD DETECTION THROUGH INFRASTRUCTURE ANALYSIS Internet users are threatened daily by spam, phishing, and malware. These attacks are often launched using armies of compromised machines, complicating identification of the miscreants behind the attacks. Unfortunately, most current approaches to fight these problems are reactive in nature, allowing significant damage before security measures are adapted to new attacks. For example, blacklisting prevents communications with known malicious hosts, but many users may fall victim to an attack before blacklists are updated. In this dissertation we argue for a proactive approach to fighting cybercrime. Our approach relies on the observation that to avoid attribution and to stay up amidst take-down attempts, miscreants must provision their infrastructure differently than legitimate web sites. Thus, we propose to proactively identify malicious activity using unique characteristics of malicious web site provisioning. Specifically, using near real-time feeds of malicious web hosts, we investigate the extent to which miscreants use five specific provisioning practices. The first three are based on the Domain Name System (DNS), which translates host names to IP addresses. We first examine fast-flux, a practice where the association between name and address changes much more frequently than usual. We then investigate the use of DNS wildcards, which point many host names to a single address. Next, we examine the use of orphan DNS servers, which are DNS servers in non-existent domains. Then, we study the concentration of malicious activity in certain networks. Finally, we examine web redirects, which may appear to be links to legitimate web sites but in reality trick users into visiting malicious sites. We find that although good web sites sometimes make use of some of these techniques, malicious web sites are more likely to use them. Consequently, their presence can be used for proactive identification of malicious web sites. vii Contents Acknowledgements v Abstract vii 1 Introduction 1 1.1 CurrentApproaches ............................... ..... 1 1.2 OurApproach..................................... ... 3 1.3 Organization .................................... .... 4 1.3.1 Background.................................... 5 1.3.2 InfrastructureFeatures. ....... 5 1.3.3 Conclusion .................................... 8 2 Domain Name System 9 2.1 Introduction.................................... ..... 9 2.2 DNSStructure .................................... ... 9 2.3 DNSRecords ...................................... .. 10 2.4 DNSQueryResolution.............................. ..... 11 viii 2.5 DNSContents..................................... ... 12 2.5.1 DataCollection................................ ... 12 2.5.2 RecordTypes................................... 13 2.5.3 NameServerRedundancy . ... 15 2.6 Discussion...................................... .... 16 3 Data Sources 18 3.1 SourcesofBenignHosts .. .. .. .. .. .. .. .. .. .. .. .. ...... 18 3.2 SourcesofNeutralHosts. ....... 19 3.3 SourcesofMaliciousHosts. ........ 20 3.3.1 PhishingSites................................. ... 20 3.3.2 Spam/ScamSites................................ .. 21 3.3.3 SpamSenders................................... 22 3.3.4 ExploitedHosts................................ ... 22 3.3.5 MalwareDistributionHosts . ...... 22 3.3.6 BotCommandandControl . .. 23 4 Fast Flux 24 4.1 Introduction.................................... ..... 24 4.2 DataCollection.................................. ..... 26 4.2.1 OverviewofCollectedData . ..... 27 4.3 DetectingFlux ................................... .... 28 4.3.1 Methodology ................................... 28 4.3.2 ParametersUsed ................................ .. 29 ix 4.4 PrevalenceofFlux ................................ ..... 31 4.4.1 ImpactofParameterSet. .... 32 4.4.2 How Many DNS Resolutions Are Enough? . ..... 33 4.5 FluxandFraudLongevity. ...... 38 4.6 TopFluxers...................................... ... 39 4.6.1 Characteristics of Just-Fast-Flux Networks . ............ 40 4.6.2 Characteristics of Double-Flux Networks . ........... 41 4.7 Discussion...................................... .... 43 5 DNS Wildcards 45 5.1 Introduction.................................... ..... 45 5.2 Background...................................... ... 46 5.3 DataSets ........................................ .. 47 5.4 WildcardPrevalence .............................. ...... 48 5.4.1 WildcardsattheDomainLevel . ..... 49 5.4.2 WildcardsattheSubdomainLevel . ...... 49 5.4.3 OverriddenWildcards . .... 50 5.5 WildcardUsage................................... .... 51 5.5.1 WildcardUsageAmongGoodDomains . .... 52 5.5.2 Wildcard Usage Among a General Collection of Domains . .......... 53 5.5.3 WildcardUsageAmongBadDomains . .... 54 5.6 Identifying Malicious Wildcard Usage . ........... 57 5.6.1 TTLsofWildcardedRecords . .... 58 x 5.6.2 Autonomous Systems Pointed to by Wildcards . ........ 59 5.6.3 HostNamesRepresentedbyWildcards. ....... 61 5.7 Discussion...................................... .... 63 6 Orphan DNS servers 65 6.1 Introduction.................................... ..... 65 6.2 Background...................................... ... 67 6.2.1 DNSBackground................................. 67 6.2.2 OrphanCreation ................................ .. 68 6.3 MethodologyandDataOverview . ....... 69 6.3.1 FindingOrphanDNSServers . .... 69 6.3.2 DataOverview .................................. 71 6.4 CharacteristicsofOrphans. ......... 72 6.4.1 PrevalenceofOrphanDNSServers . ...... 72 6.4.2 DistributionofOrphanDNSServers . ....... 72 6.4.3 LifetimesofOrphanDNSServers. ...... 74 6.5 UsesofOrphans ................................... ... 75 6.5.1 UseasDNSServers ............................... 75 6.5.2 RunningServices............................... ... 76 6.5.3 DomainswiththeMostOrphans . .... 77 6.5.4 MaliciousUsesofOrphans. ..... 78 6.6 Discussion...................................... .... 80 xi 7 Malicious Autonomous Systems 81 7.1 Introduction.................................... ..... 81 7.2 DataCollection.................................. ..... 82 7.2.1 DataSetComparisons . .. .. .. .. .. .. .. .. .. .. .. ... 83 7.3 Degree of Autonomous System Maliciousness . ........... 86 7.3.1 Examination of ASes by Fraction of Advertised IP Space ........... 87 7.3.2 Examination of ASes by Proportionof Data Set . ........ 88 7.3.3 HowLongDoHostsStayInfected?. ..... 91 7.3.4 ASeswithUnrulyChildren . .... 92 7.4 Identifying Malicious Autonomous System . ............ 93 7.4.1 BGPBehavior ................................... 94 7.4.2 ASSizes....................................... 97 7.4.3 DegreeofASPeering ............................. .. 98 7.5 Discussion...................................... .... 99 8 Web

Proactive Cyberfraud Detection Through Infrastructure Analysis

Characterizing Certain Dns Ddos Attacks

Click Trajectories: End-To-End Analysis of the Spam Value Chain

Passive Monitoring of DNS Anomalies Bojan Zdrnja1, Nevil Brownlee1, and Duane Wessels2

Sidekick Suspicious Domain Classification in the .Nl Zone

Fast-Flux Botnet Detection Based on Traffic Response and Search Engines Credit Worthiness

Taxonomy and Adversarial Strategies of Random Subdomain Attacks

From WHOIS to WHOWAS: a Large-Scale Measurement Study of Domain Registration Privacy Under the GDPR

Detection of Fast-Flux Networks Using Various DNS Feature Sets

Country Code Top-Level Domain (Cctld) Project for State of Kuwait

DNS) Administration Guide

How to Add Domains and DNS Records

PREDATOR: Proactive Recognition and Elimination of Domain Abuse at Time-Of-Registration