I lllll llllllll Ill lllll lllll lllll lllll lllll 111111111111111111111111111111111 US009306969B2 c12) United States Patent (IO) Patent No.: US 9 ,306,969 B2 Dagon et al. (45) Date of Patent: *Apr. 5, 2016

(54) METHOD AND SYSTEMS FOR DETECTING (52) U.S. Cl. COMPROMISED NETWORKS AND/OR CPC ...... H04L 6311441 (2013.01); H04L 29112066 COMPUTERS (2013.01); H04L 6111511 (2013.01); (Continued) (71) Applicant: GEORGIA TECH RESEARCH CORPORATION, Atlanta, GA (US) (58) Field of Classification Search CPC ...... H04L 61/1511 (72) Inventors: David Dagon, Tampa, FL (US); Nick USPC ...... 726122 Feamster, Atlanta, GA (US); Wenke See application file for complete search history. Lee, Atlanta, GA (US); Robert Edmonds, Woodstock, GA (US); (56) References Cited Richard Lipton, Atlanta, GA (US); U.S. PATENT DOCUMENTS Anirudh Ramachandran, Atlanta, GA (US) 4,843,540 A 6/1989 Stolfo 4,860,201 A 8/1989 Stolfo et al. (73) Assignee: GEORGIA TECH RESEARCH (Continued) CORPORATION, Atlanta, GA (US) FOREIGN PATENT DOCUMENTS ( *) Notice: Subject to any disclaimer, the term ofthis patent is extended or adjusted under 35 WO WO 02/37730 512002 U.S.. 154(b) by 0 days. WO WO 02/098100 12/2002 WO WO 2007/050244 5/2007 This patent is subject to a terminal dis­ claimer. OTHER PUBLICATIONS "Spamming Botnets: Signatures and Characteristics" Xie et al; ACM (21) Appl. No.: 14/015,661 SIGCOMM. Settle. WA; Aug. 2008; 12 pages.* (22) Filed: Aug. 30, 2013 (Continued)

(65) Prior Publication Data Primary Examiner - Jason Lee US 2014/0245436 Al Aug. 28, 2014 (74) Attorney, Agent, or Firm - DLA Piper LLP (US)

Related U.S. Application Data (57) ABSTRACT Collect Domain Name System (DNS) data, the DNS data (63) Continuation of application No. 11/538,212, filed on generated by a DNS server and/or similar device, wherein the Oct. 3, 2006, now Pat. No. 8,566,928. DNS data comprises DNS queries, wherein the collected (60) Provisional application No. 60/730,615, filed on Oct. DNS data comprises DNS query rate information. Examine 27, 2005, provisional application No. 60/799,248, the collected DNS data relative to DNS data from known filed on May 10, 2006. compromised and/or uncompromised computers. Determine an existence of the collection of compromised networks and/ (51) Int. Cl. or computers, and/or an identity of compromised networks G06F 21100 (2013.01) and/or computers, based on the examination. H04L29/06 (2006.01) H04L29/12 (2006.01) 52 Claims, 28 Drawing Sheets

!05 ,------·--·

VX PURCHASES AT LEAST ONE DOMAIN NAME

1'~-~----__!___·---

i VX HARD-CODES DOMAIN NAMES INTO DROPPER ' 1 PROGRAMS, ANgii:i7r'~:STHEM INTO 80T !

- J __ ------·-

vx CREATES C&C MACHINE FOR ViCTlM 807 1 COMPUTERS TO USE TO COMMUNICATE I

------·---~ __i

125 - ---~-·______L-

VX ARRANGES FOR DNS RESOLlrTION OF DOMAIN NA!VIE AND REGISTERS WITH DNS SERVICE US 9,306,969 B2 Page 2

(52) U.S. Cl. 2003/0065926 Al 4/2003 Schultz et al. CPC ...... H04L63/1425 (2013.01); H04L 6311491 2003/0065943 Al 4/2003 Geis et al. 2003/0069992 Al 4/2003 Ramig (2013.01); H04L 29112301 (2013.01); H04L 2003/0167402 Al 9/2003 Stolfo et al. 6112076 (2013.01); H04L 24631144 (2013.01) 2003/0204621 Al 10/2003 Poletto et al. 2004/0002903 Al 112004 Stolfo et al. (56) References Cited 2004/0088348 Al * 512004 Yeager ...... H04L 67/104 709/202 U.S. PATENT DOCUMENTS 200410111636 Al 6/2004 Baffes et al. 200410187032 Al 912004 Gels et al. 5,363,473 A 1111994 Stolfo et al. 2004/0205474 Al 10/2004 Eskin et al. 5,497,486 A 3/1996 Stolfo et al. 2004/0215972 Al 10/2004 Sung et al. 2005/0021848 Al* 1/2005 Jorgenson ...... 709/238 5,563,783 A 10/1996 Stolfo et al. 2005/0039019 Al 212005 Delany 5,668,897 A 9/1997 Stolfo 5,717,915 A 2/1998 Stolfo et al. 2005/0086523 Al 412005 Zimmer et al. 2005/0108407 Al 512005 Johnson et al. 5,748,780 A 5/1998 Stolfo 2005/0108415 Al 512005 Turk et al. 5,920,848 A 7/1999 Schutzer et al. 2005/0257264 Al 11/2005 Stolfo et al. 6,401,118 Bl 612002 Thomas 2005/0261943 Al 11/2005 Quarterman et al. 6,983,320 Bl 112006 Thomas et al. 2005/0265331 Al 12/2005 Stolfo 7,013,323 Bl 3/2006 Thomas et al. 2005/0278540 Al 12/2005 Cho 7,039,721 Bl 512006 Wu eta!. 2005/0281291 Al 12/2005 Stolfo et al. 7,069,249 B2 612006 Stolfo et al. 200610015630 Al 112006 Stolfo et al. 7,093,292 Bl 8/2006 Pantuso 2006/0031483 Al 212006 Lund 7,136,932 Bl 1112006 Schneider 7,152,242 B2 12/2006 Douglas 2006/0068806 Al 3/2006 Nam 2006/0075084 Al 412006 Lyon 7,162,741 B2 112007 Eskin et al. 2006/0143711 Al* 612006 Huang et al ...... 726/23 7,225,343 Bl 5/2007 Honig et al. 2006/0146816 Al* 7,277,961 Bl 10/2007 Smith et al. 712006 Jain...... 370/389 200610150249 Al 712006 Gassen et al. 7,331,060 Bl 2/2008 Ricciulli 200610156402 Al 712006 Stone et al. 7,372,809 B2 5/2008 Chen et al. 7,383,577 B2 * 6/2008 Hrastar et al...... 726/23 200610168024 Al 712006 Mehr 7,424,619 Bl 9/2008 Fan et al. 200610178994 Al 8/2006 Stolfo et al. 2006/0200539 Al 912006 Kappler 7,426,576 Bl 9/2008 Banga et al. 2006/0212925 Al 912006 Shull 7,448,084 Bl 1112008 Apapet al. 7,483,947 B2 112009 Starbuck 2006/0224677 Al 10/2006 Ishikawa et al. 2006/0230039 Al 10/2006 Shull 7,487,544 B2 212009 Schultz et al. 2006/0247982 Al 1112006 Stolfo et al. 7,536,360 B2 512009 Stolfo et al. 7,634,808 Bl 12/2009 Szor 2006/0253581 Al 1112006 Dixon 2006/0253584 Al 1112006 Dixon 7,639,714 B2 12/2009 Stolfo et al. 2006/0259967 Al 1112006 Thomas et al. 7,657,935 B2 2/2010 Stolfo et al. 7,665,131 B2 2/2010 Goodman 2006/0265436 Al 1112006 Edmond 7,698,442 Bl 4/2010 Krishnamurthy 2007/0050708 Al 3/2007 Gupta et al. 2007/0056038 Al 3/2007 Lok 7,712,134 Bl 5/2010 Nucci et al. 2007/0064617 Al 3/2007 Reves 7,752,125 Bl 7/2010 Kothari et al. 7,752,665 Bl 7/2010 Robertson et al. 2007/0076606 Al 4/2007 Olesinski 2007/0083931 Al 4/2007 Spiegel 7,779,463 B2 8/2010 Stolfo et al. 2007/0118669 Al 5/2007 Rand 7,784,097 Bl 8/2010 Stolfo et al. 7,818,797 Bl 10/2010 Fan et al. 2007/0136455 Al 6/2007 Lee et al. 2007/0162587 Al 7/2007 Lund et al. 7,890,627 Bl 212011 Thomas 2007/0209074 Al 9/2007 Coffman 7,913,306 B2 3/2011 Apapet al. 7,930,353 B2 412011 Chickering 2007/0239999 Al 10/2007 Honig et al. 7,962,798 B2 612011 Locasto et al. 2007/0274312 Al 11/2007 Salmela et al. 2007/0294419 Al 12/2007 Ulevitch 7,979,907 B2 712011 Schultz et al. 2008/0028073 Al 1/2008 Trabe et al. 7,996,288 Bl 8/2011 Stolfo 8,015,414 B2 912011 Mahone 2008/0028463 Al 1/2008 Dagon 2008/0060054 Al 3/2008 Srivastava 8,019,764 Bl 912011 Nucci 2008/0060071 Al 3/2008 Herman 8,074,115 B2 12/2011 Stolfo et al. 8,161,130 B2 412012 Stokes 2008/0098476 Al 4/2008 Syversen 2008/0133300 Al 6/2008 Jalinous 8,170,966 Bl 512012 Musat et al. 2008/0155694 Al 6/2008 Kwon et al. 8,200,761 Bl 612012 Tevanian 8,224,994 Bl 7/2012 Schneider 2008/0177736 Al 712008 Spangler 8,260,914 Bl 912012 Ranjan 2008/0178293 Al 712008 Keen et al. 2008/0184371 Al 712008 Moskovitch 8,341,745 Bl* 12/2012 Chau...... G06F 21156 2008/0195369 Al 8/2008 Duyanovich et al. 709/223 8,347,394 Bl 112013 Lee 2008/0222729 Al 9/2008 Chen et al. 2008/0229415 Al 9/2008 Kapoor 8,402,543 Bl 3/2013 Ranjan et al. 2008/0262985 Al 10/2008 Cretu et al. 8,418,249 Bl 412013 Nucci et al. 8,484,377 Bl 7/2013 Chen et al. 2008/0263659 Al 10/2008 Alme 2008/0276111 Al 11/2008 Jacoby et al. 8,516,585 B2 8/2013 Cao et al. 8,527,592 B2 912013 Gabe 2009/0055929 Al 212009 Lee et al. 8,631,489 B2 112014 Antonakakis et al. 2009/0083855 Al 3/2009 Apap et al. 200110014093 Al 8/2001 Yoda et al. 200910106304 Al 412009 Song 200110044785 Al 1112001 Stolfo et al. 2009/0138590 Al 512009 Lee et al. 200110052007 Al 12/2001 Shigezumi 2009/0193293 Al 712009 Stolfo et al. 200110052016 Al 12/2001 Skene et al. 2009/0198997 Al 8/2009 Yeap 200110055299 Al 12/2001 Kelly 2009/0210417 Al 8/2009 Bennett 2002/0021703 Al 212002 Tsuchiya et al. 200910222922 Al 912009 Sidiroglou et al. 2002/0066034 Al * 512002 Schlossberg et al...... 713/201 2009/0241190 Al 912009 Todd et al. 200210166063 Al 1112002 Lachman et al. 2009/0241191 Al 912009 Keromytis et al. US 9,306,969 B2 Page 3

(56) References Cited Alexander Gostev, "Malware Elovution: Jan.-Mar. 2005", Viruslist. com, http.//www.viruslist.com/en/ analysis?pubid~ 1624 54316, U.S. PATENT DOCUMENTS (Apr. 18, 2005). Jiang Wu et al., "An Effective Architecture and Algorithm for Detect­ 2009/0254658 Al 10/2009 Kamikura et al. ing Worms with Various Scan Techniques", In Proceedings of the 2009/0254989 Al 10/2009 Achan et al. 11th Annual Network and Distributed System Security Symposium 2009/0254992 Al 10/2009 Schultz et al. 2009/0265777 Al 10/2009 Scott (NDSS '04), Feb. 2004. 2009/0282479 Al 1112009 Smith et al. Matthew M. Williamson et al., "Virus Throttling for Instant Messag­ 2009/0327487 Al 12/2009 Olson et al. ing", Virus Bulletin Conference, Sep. 2004, Chicago, IL, USA, (Sep. 2010/0011243 Al 112010 Locasto et al. 2004). 2010/0011420 Al 112010 Drako F. Weimer, "Passive DNS Replication", http://www.enyo.de/fW/soft­ 2010/0017487 Al 112010 Patinkin ware/dnslogger, 2005. 2010/0023810 Al 112010 Stolfo et al. Ke Wang et al., "Anomalous Payload-Based Network Intrusion 2010/0031358 Al 2/2010 Elovici et al. 2010/0034109 Al 2/2010 Shomura et al. Detection", In Proceedings of the 7th International Symposium on 2010/0037314 Al 2/2010 Perdisci et al. Recent Advances in Intrusion Detection (RAID 2004), 2004. 2010/0054278 Al 3/2010 Stolfo et al. P. Vixie et al,. "RFC 2136: Dynamic Updates in the Domain Name 2010/0064368 Al 3/2010 Stolfo et al. System (DNS Update)", http://www.faqs.org/rfcs.rfc2136.htrnl 2010/0064369 Al 3/2010 Stolfo et al. (Apr. 1997). 2010/0077483 Al 3/2010 Stolfo et al. Joe Stewart, "Dipnet/Oddbob Worm Analysis", SecureWorks, http:// 2010/0138919 Al 6/2010 Peng 2010/0146615 Al 6/2010 Locasto et al. www.secureworks.com/research/threats/dipnet/ (Jan. 13, 2005). 2010/0153785 Al 6/2010 Keromytis et al. Harold Thimbleby et al., "A Framework for Modeling Trojans and 2010/0169970 Al 7/2010 Stolfo et al. Computer Virus Infection", Computer Journal, vol. 41, No. 7, pp. 2010/0269175 Al 10/2010 Stolfo et al. 444-458 (1999). 2010/0274970 Al 10/2010 Treuhaft et al. Paul Bachner et al., "Know Your Enemy: Tracking Botnets", http:// 2010/027 5263 Al 10/2010 Bennett et al. www.honeynet.org/papers/bots/, (Mar. 13, 2005). 2010/0281539 Al 1112010 Burns et al. "LockDown Security Bulletin-Sep. 23, 2001", http:// 2010/0281541 Al 1112010 Stolfo et al. lockdowncorp.corn/bots/ (Sep. 23, 2001). 2010/0281542 Al 1112010 Stolfo et al. Colleen Shannon et al., "The Spread of the Witty Worm", http://www. 2010/0319069 Al 12/2010 Granstedt caida.org/analysis/security/witty/index.xml (Mar. 19, 2004). 2010/0332680 Al 12/2010 Anderson et al. MoheebAbu Rajah et al., "On the Effectiveness of Distributed Worm 201110041179 Al 212011 Stahlberg 201110067106 Al 3/2011 Evans et al. Monitoring", In Proceedings fo the 14th Usenix Security Symposium 201110167493 Al 712011 Song et al. (2005). 201110167494 Al 712011 Bowen eta!. Niels Provos, "CITI Technical Report 03-1: A Virtual Honeypot 201110185423 Al 712011 Sal lam Framework", http://www. ci ti. umich .edu/techreports/reports/citi-tr- 201110185428 Al 712011 Sal lam 03-1. pdf (Oct. 21, 2003). 201110214161 Al 912011 Stolfo et al. "Know your Enemy: Honeynets", http://www.honeypot.org/papers/ 2012/0084860 Al 412012 Cao et al. honeynet, (May 31, 2006). 2012/0117641 Al 512012 Holloway David Moore et al., "Internet Quarantine: Requirements for Contain­ 2012/0143650 Al 612012 Crowley et al. ing Self-Propagating Code", In Proceedings of the IEEE INFOCOM 2012/0198549 Al 8/2012 Antonakakis et al. 2003, Mar. 2003. 2013/0232574 Al 912013 Carothers Joe Stewart, "I-Worm Baba Analysis", http://secureworks.com/re­ 2014/0068763 Al 3/2014 Ward et al. 2014/0289854 Al 9/2014 Mahvi search/threats/baba (Oct. 22, 2004. David Moore et al., "Slammer Worm Dissection: Inside the Slammer OTHER PUBLICATIONS Worm", IEEE Security & Privacy, vol. 1, No. 4 (Jul.-Aug. 2003). David Moore et al., "Code-Red: A Case Study on the Spread and File History of U.S. Appl. No. 111538,212. Victims of an Internet Worm", http://www.icir.org/vern/imw-2002/ File History of U.S. Appl. No. 12/538,612. imw2002-papers/209.ps/gz (2002). File History of U.S. Appl. No. 12/985,140. Joe Stewart, "Sinit P2P Trojan Analysis", http://www.secureworks. File History of U.S. Appl. No. 13/008,257. corn/research/threats/sinit, (Dec. 8, 2003). File History of U.S. Appl. No. 13/205,928. Martin Krzywinski, "Port Knocking-Network Authentication File History of U.S. Appl. No. 13/309,202. Across Closed Ports", Sys Admin Magazine, vol. 12, pp. 12-17 File History of U.S. Appl. No. 13/358,303. (2003). File History of U.S. Appl. No. 13/749,205. File History of U.S. Appl. No. 14/010,016. Christopher Kruegel et al., "Anomaly Detection of Web-Based File History of U.S. Appl. No. 14/015,582. Attacks", In Proceedings of the 10th ACM Conference on Computer File History of U.S. Appl. No. 14/015,621. and Communication Security (CCS '03), Oct. 27-31, 2003, Wash­ File History of U.S. Appl. No. 14/015,663. ington, DC, USA, pp. 251-261. File History of U.S. Appl. No. 14/015,704. "Dabber Worm Analysis", LURHQ Threat Intelligence Group, http:// File History of U.S. Appl. No. 14/041,796. www.lurhq.com/dabber.htrnl (May 13, 2004). File History of U.S. Appl. No. 14/096,803. Abstract of Jeffrey 0. Kephart et al,. "Directed-Graph Epidemiologi­ File History of U.S. Appl. No. 14/194,076. cal Models of Computer Viruses", Proceedings of the 1991 IEEE Cliff Changchun Zou et al., "Code Red Worm Propagation Modeling Computer Society Symposium on Research in Security and Privacy; and Analysis", In Proceedings of 9th ACM Conference on Computer Oakland, CA, May 20-22, 1991; pp. 343-359 (May 20-22, 1991). and Communications Security (CCS '02), Nov. 18, 2002. C. Kalt "RFC 2810-: Architecture" http://faqs. Cliff C. Zou et al,. "Email Worm Modeling and Defense", In the 13th org/rfcs/rfc2810.html (Apr. 2000). ACM International Confrerence on Computer Communications and Xuxian Jiang et al., "Cerias Tech Report 2005-24: Virtual Play­ Networks (CCCN '04), Oct. 27, 2004. grounds for Worm Behavior Investigation", Purdue University, Feb. Cliff Changchun Zou et al., "Monitoring and Early Warning for 2005. Internet Worms", In Proceedings fo the 10th ACM Conference on Neal Hindocha et al., "Malicious Threats and Vulnerabilities in Computer and Communications Security (CCS '03), Oct. 2003. Instant Messaging", Virus Bulletin International Conference, Sep. Cliff Changchun Zou et al., "On the Performance of Internet Worm 2003. Scanning Strategies", Technical Report TR-03-CSE-07, Umass ECE Thomer M. Gil, "NSTX (IP-over-DNS) HOWTO", http://thomer. Dept., Nov. 2003. corn/howtos/nstx.html, Nov. 4, 2005 (5 pages). US 9,306,969 B2 Page 4

(56) References Cited Martin Overton, "Bots and Botnets: Risks, Issues and Prevention", 2005 Virus Bulletin Conference at the Burlington, Dublin, Ireland, OTHER PUBLICATIONS Oct. 5-7, 2005, http://arachnid.homeip.net/papersNB2005-Bots_ and_Botnets-1.0.2.pdf. V. Fuller et al., "RFC 1519---Classless Inter-Domain Routing Yin Zhang et al., "Detecting Stepping Stones", Proceedings of the 9th (CIDR): An Address Assignment and Aggregation Strategy", http:// USENIX Security Symposium, Denver, Colorado, USA, Aug. 14-17, www.faqs.org/rfcs/rfc1519.html (Sep. 1993). 2000. David E. Smith "Dynamic DNS", http://www.technopagan.org/dy­ Joe Stewart, "Bobax Trojan Analysis", http://www.lurhq.com/bobax. namic (Aug. 7, 2006). html, May 17, 2004. Dave Dittrich, "Active Response Continuum Research Project", David Brumley et al., "Tracking Hackers on IRC", http://www. http://staff.washington.edu/dittrich/arc/ (Nov. 14, 2005). doomded.corn/texts/ircmirc/TrackingHackersonIRC.htm, Dec. 8, Joe Stewart, "Akak Trojan Analysis", http://www.secureworks.com/ 1999. research/threats/akak/ (Aug. 31, 2004). Brian Krebs, "Bringing Botnets Out of the Shadows", Monirul I. Sharif, "Mechanisms of Dynamic Analysis and Washingtonpost.com, http://www.washingtonpost.com/wp-dyn/ DSTRACE". content/artcile/2006/03/21/AR2006032100279 _pf.html, Mar. 21, 2006. Kapil Kumar Singh, "IRC Reconnaissance (IRCRecon) Public IRC "SwatIT: Bots, Drones, Zombies, Worms and Other Things That Go Heuristics (BotSniffer)" (Jul. 24, 2006). Bump in the Night", http://swatit.org/bots, 2004. http://www. trendmicro .corn/ en/home/us/home.htm. Christian Kreibich, "Honeycomb: Automated NIDS Signature Cre­ "InterCloud Security Service", http://ww. trendmicro .corn/ en/prod­ ation Using Honeypots", 2003, http://www.cl.cam.ac.uk/research/ ucts/nss/icss/evaluate/overview.thm. srg/netos/papers/2003-honeycomb-sigcomm-poster.pdf. "2006 Press Releases: Trend Micro Takes Unprecedented Approach DMOZ Open Directory Project, Dynamic DNS Providers List, http:// to Eliminating Botnet Threats with the Unveiling oflnterCloud Secu­ .org/Computers/Software/Internet/Servers/ Address_Manage­ rity Service", http://www.trendmicro.com/en/about/news/pr/ ment/Dynamic_DNS_Services/. archive/2006/pr092506.htm, (Sep. 25, 2006). David Moore, "Network Telescopes: Observing Small or Distant Paul F. Roberts, "Trend Micro Launches Anti-Botnet Service", Security Events", http://www.caida.org/publications/presentations/ Info World, http://www.infoworld.com/article/06/09/25/ 2002/usenix_sec/usenix_sec_2002_files/frame.htm; Aug. 8, HNtrendintercloud_1.html (Sep. 25, 2006). 2002. CNN Technology News-Expert: Botnets No. 1 Emerging Internet Vincent H. Berk et al., "Using Sensor Networks and Data Fusion for Threat, CNN.com, http://www.cnn.com/2006/TECH/internet/O 1/31/ Early Detection of Active Worms", Sensors, and Command, Control, furst.index.html (Jan. 31, 2006). Communications, and Intelligence (C3iI) Technologies for Home­ Evan Cooke et al., "The Zombie Roundup: Understanding, Detect­ land Defense and Law Enforcement II, Proceedings of SPIE, vol. ing, and Disrupting Botnets", In Usenix Workshop on Steps to 5071, pp. 92-104 (2003). Reducing Unwanted Traffic on the Internet (SRUTI), Jun. 2005. David Dagon et al., "Worm Population Control Through Periodic Sven Dietrich et al., "Analyzing Distributed Denial of Service Tools: Response", Technical Report, Georgia Institute for Technology, Jun. The Shaft Case", Proceedings of the 14th Systems Administration 2004. Conference (LISA 2000), New Orleans, Louisiana, USA, Dec. 3-8, Scott Jones et al., "The IPM Model of Computer Virus Management", 2000. Computers & Security, vol. 9, pp. 411-418 (1990). Felix C. Freiling et al,. "Botnet Tracking: Exploring a Root-Cause Jeffrey 0. Kephart et al., "Directed-Graph Epidemiological Models Methodology to Prevent Distributed Denial-of-Service Attacks", of Computer Viruses", In Proceedings ofIEEE Symposium on Secu­ ESORICS 2005, LNCS 3679, pp. 319-335 (2005). rity and Privacy, pp. 343-359 (1991). Luiz Henrique Gomes et al,. "Characterizing a Spam Traffic", In Darrell M. Kienzle et al., "Recent Worms: A Survey and Trends", In Proc. ACM SIGCOMM Internet Measurement Conference (IMC WORM '03, Proceedings of the 2003 ACM Workshop on Rapid '04), Oct. 25-27, 2004 Taormina, Sicily, Italy, pp. 356-369. Malcode, Washington, DC, USA, pp. 1-10, Oct. 27, 2003. Christopher W. Hanna, "Using Snort to Detect Rogue IRC Bot Pro­ Bill McCarty, "Botnets: Big and Bigger", IEEE Security and Privacy grams", Technical Report, SANS Institute 2004 (Oct. 8, 2004). Magazine, vol. 1, pp. 87-89 (2003). Jaeyeon Jung et al., "An Empirical Study of Spam Traffic and the Use Xinzhou Qin et al., "Worm Detection Using Local Networks", Tech­ of DNS Black Lists", In Proc. ACM SIGCOMM Internet Measure­ nical Report GIT-CC-04-04, College of Computing, Georgia Insti­ ment Conference (IMC '04), Oct. 25-27, 2004, Taormina, Sicily, tute of Technology, Feb. 2004. Italy, pp. 370-375. Yang Wang et al., "Modeling the Effects of Timing Parameters on Srikanth Kandula et al., "Botz-4-Sale: Surviving Organized DDoS Virus Propagation", In Proceedings of ACM CCS Workshop on Attacks That Mimic Flash Crowds", Technical Report LCS TR-969, Rapid Malcode (WORM '03), Washington, DC, pp. 61-66, Oct. 27, Laboratory for Computer Science, MIT, 2004. 2003. Sven Krasser et al., "Real-Time and Forensic Network Data Analysis Donald J. Welch et al., "Strike Back: Offensive Actions in Informa­ Using Animated and Coordinated Visualization", Proceedings of the tion Warfare", in AMC New Security Paradigm Workshop, pp. 47-52 6th IEEE Information Assurance Workshop (Jun. 2005). (1999). David Moore et al., "Inferring Internet Denial-of-Service Activity", T. Liston, "Welcome to my Tarpit: The Tactical and Stragetic Use of In Proceedings of the 2001 Usenix Security Symposium, 2001. LaBrea", http://www.hackbusters.net/LaBrea/LaBrea.txt, Oct. 24, Stephane Racine, "Master's Thesis: Analysis for Internet Relay Chat 2001. Usage by DDoS Zombies", ftp://www.tik.ee.ethz.ch/pub/students/ R. Pointer, "Eggdrop Development", http://www.eggheads.org, Oct. 2003-2004-Wi/MA-2004-01.pdf (Nov. 3, 2003). 1, 2005. Anirudh Ramachandran et al., "Understanding the Network-Level S. Staniford, "Code Red Analysis Pages: July Infestation Analysis", Behavior of Spammers", SIGCOMM '06, Sep. 11-15, 2006, Pisa, http://silicondefense.org/cr/july.html, Nov. 18, 2001. Italy, pp. 291-302. Alex Ma, "NetGeo-The Internet Geographic Database", http:// RamneekPuri, "Bots & Botnet: An Overview", SANS Institute 2003, www.caida.org/tools/utilities/netgeo/index.xml, Sep. 6, 2006. http://www.giac.com/practical/GSEC/Ramneek_Puri_GSEC.pdf MathWorks Inc. Simulink, http://www.mathworks.com/products/ (Aug. 8, 2003). simulink, Dec. 31, 2005. Stuart E. Schechter et al., "Access For Sale: A New Class ofWorm", David Dagon et al., "Modeling Botnet Propagation Using Time In 2003 ACM Workshop on Rapid Malcode (WORM '03), ACM Zones", In Proceedings of the 13th Annual Network and Distributed SIGSAC, Oct. 27, 2003, Washington, DC, USA. Systems Security Symposium (NDSS '06), Feb. 2006. Stuart Staniford, "How to Own the Internet in Your Spare Time", In John Canavan, "Symantec Security Response: W32.Bobax.D", Proc. 11th Usenix Security Symposium, San Francisco, CA, Aug. http://www.sarc.com/avcent/venc/data/w32.bobax.d.html, May 26, 2002. 2004. US 9,306,969 B2 Page 5

(56) References Cited Richard 0. Duda et al., "Pattern Classification, Second Edition", John Wiley & Sons, Inc., pp. vii-xx, and 1-654, Copyright 2001. OTHER PUBLICATIONS Guofei Gu et al. "BotMiner: Clustering Analysis of Network Traffic Protocol- and Structure-Independent Botnet Detection", 2008, "Whois Privacy", www.gnso.icann.org/issues/whois-privacy/index/ Usenix Security Symposium, pp. 139-154. shtrul, Jun. 3, 2005. Zhu et al., "Using Failure Information Analysis to Detect Enterprise John D. Hardin, "The Scanner Tarpit HOWTO", http://www.impsec. Zombies," Lecture note ofthe Institute for Computer Science, Social­ org/linus/security/scanner-tarpit.htrul, Jul. 20, 2002. Informatics and Telecommunications Engineering, vol. 19, part 4, Charles J. Krebs, "Ecological Methodology", Harper & Row, Pub­ pp. 185-206, 2009. lishers, New York, pp. v-x, 15-37, 155-166, and 190-194 (1989). Manos Antonakakis et al., "Building a Dynamic Reputation System D.J. Daley et al., "Epidemic Modeling: An Introduction", Cambridge for DNS", 19th Usenix Security Symposium, Aug. 11-13, 2010 (17 University Press, pp. vii-ix, 7-15, and 27-38 (1999). pages). Lance Spitzner, "Honeypots: Tracking Hackers", Addison-Wesley, pp. vii-xiv, 73-139, 141-166, and 229-276 (2003). ManosAntonakakis eta!., "Detecting Malware Domains atthe Upper International Search Report issued in Application No. PCT/US06/ DNS Hierarchy", In Proceeding of the 20th Usenix Security Sympo­ 038611 mailed Jul. 8, 2008. sium, Aug. 8-12, 2011 (16 pages). Written Opinion issued in Application No. PCT/US06/038611 Leyla Bilge et al., "Exposure: Finding Malicious Domains Using mailed Jul. 8, 2008. Passive DNS Analysis", 18th Annual Network and Distributed Sys­ International Preliminary Report on Patentability issued in Applica­ tem Security Symposium, Feb. 6-9, 2011 (17 pages). tion No. PCT/US06/038611 mailed Mar. 26, 2009. "Virus:Win32/Expiro.Z". http://www.microsoft.com/security/por­ 0. Diekmann et al,. "Mathematical Epidemiology oflnfectious Dis­ tal/Threat/Encyclopedia/Entry.aspx, Jun. 9, 2011 (5pages). eases: Model Building, Analysis and Interpretation", John Wiley & Mike Geide, "Another Trojan Bamital Pattern", http://research. Son, Ltd., 2000, pp. v-xv and 1-303. zscaler.corn/2011/05/another-trojan-bamital-pattern.htrul, May 6, Jelena Mirkovic et al,. "Internet Denial of Service: Attack and 2011 (5 pages). Defense Mechanisms", Prentice Hall Professional Technical Refer­ Sergey Golovanov et al., "TDL4-Top Bot", http://www.secuirlist. ence, 2004, pp. v-xxii and 1-372. corn/en/analysis/204792180/TDL4_Top_Bot, Jun. 27, 2011 ( 15 "Symantec Internet Security Threat Report: Trends for Jan. 1-Jun. 30, pages). 2004" Symantec, Sep. 2004, pp. 1-54. P. Mockapetris, "Domain Names~oncepts and Facilities", Net­ David Dagon et al., "HoneyStat: Local Worm Detection Using work Working Group, http://www.iett.org/rfc/rfcl 034.txt, Nov. 1987 Honeypots", RAID 2004, LNCS 3224, pp. 39-58 (2004). (52 pages). Jonghyun Kim et al., "Measurement and Analysis of Worm Propa­ P. Mockapetris, "Domain Names-Implementation and Specifica­ gation on Internet Network Topology", IEEE, pp. 495-500 (2004). tion", Network Working Group, http://www.iett.org/rfc/rfc1035.txt, Andreas Marx, "Outbreak Response Times: Putting AV to the Test", Nov. 1987 (52 pages). www.virusbtn.com, Feb. 2004, pp. 4-6. Phillip Porras et al. "SRI International Technical Report: An Analysis Vinod Yegneswaran et al., "Global Intrusion Detection in the of Conficker's Logic and Rendezvous Points", http://mtc.sri.com/ DOMINO Overlay System", Proceedings of Network and Distrib­ Conficker/, Mar. 19, 2009, (31 pages). uted Security Symposium (NDSS), 17 pages Feb. 2004. Phillip Porras et al. "SRI International Technical Report: Conficker C Vinod Yegneswaran et al., "On the Design and Use oflnternet Sinks Analysis", http://mtc.sri.com/Conficker/addendumC, Apr. 4, 2009, for Network Abuse Monitoring", RAID 2004, LNCS 3224, pp. 146- (24 pages). 165 (2004). Paul Royal, Damballa, "Analysis of the Kracken Botnet", Apr. 9, Cliff Changchun Zou et al., "Worm Propagation Modeling and 2008 ( 13 pages). Analysis Under Dynamic Quarantine Defense", WORM'03, Oct. 27, Sergei Shevchenko, "Srizbi's Domain Calculator", http://blog. 2003, Washington, DC USA, 10 pages. threatexpert.corn/2008/11/srizbix-domain-calculator.htrul, Nov. 28, Cliff C. Zou et al., "Routing Worm: A Fast, Selective Attack Worm 2008 (3 pages). Based on IP Address Information", Technical Report: TR-03-CSE- Sergei Shevchenko, "Domain Name Generator for Murofet", http:// 06, Principles ofAdvanced and Distributed Simulation (PADS) 2005, blog.threatexpert.corn/2010/ 1O/domain-name-generator-for­ pp. 199-206, Jun. 1-3, 2005. murofet.htrul, Oct. 14, 2010 (4 pages). Thorsten Holz, "Anti-Honeypot Technology", 21st Chaos Commu­ P. Akritidis et al., "Efficient Content-Based Detection of Zero-Day nication Congress, slides 1-57, Dec. 2004. Worms", 2005 IEEE International Conference in communications, "CipherTrust's Zombie Stats", http://www.ciphertrust.com/re­ vol. 2, pp. 837-843, May 2005. sources/statistics/zombie.php 3 pages, printed Mar. 25, 2009. M. Patrick Collins et al., "Hit-List Worm Detection and Bot Identi­ Joe Stewart, "Phatbot Trojan Analysis", http://www.secureworks. fication in Large Networks Using Protocol Graphs", RAID 2007, corn/research/threats/phatbot, Mar. 15, 2004, 3 pages. LNCS 4637, pp. 276-295 (2007). Thorsten Holz et al., "A Short Visit to the Bot Zoo", IEEE Security & Nicholas Weaver et al., "Very Fast Containment of Scanning Privacy, pp. 7679 (2005). Worms", In proceedings of the 13th Usenix Security Symposium, pp. Michael Glenn, "A Sununary ofDoS/DDoS Prevention, Monitoring 29-44, Aug. 9-13, 2004. and Mitigation Techniques in a Service Provider Environment", David Whyte et al., "DNA-Based Detection of Scanning Worms in an SANS Institute 2003, Aug. 21, 2003, pp. -iv, and 1-30. Enterprise Network", In Proc. of the 12th Annual Network and Dis­ Dennis Fisher, "Thwarting the Zombies", Mar. 31, 2003, 2 pages. tributed System Security Symposium, pp. 181-195, Feb. 3-4, 2005. Dongeun Kim et al., "Request Rate Adaptive Dispatching Architec­ Cristian Abad et al., "Log Correlation for Intrusion Detection: A ture for Scalable Internet Server", Proceedings of the IEEE Interna­ Proof of Concept", In Proceedings of The 19th Annual Computer tional Conference on Cluster Computing (CLUSTER'OO); pp. 289- Security Application Conference (ACSAC'03), (11 pages) (2003). 296 (2000). Lala A. Adamic et al., "Zipfs Law and the Internet", Glottometrics, Keisuke Ishibashi et al., "Detecting Mass-Mailing Worm Infected vol. 3, pp. 143-150 (2002). Hosts by Mining DNS Traffic Data", SIGCOMM'05 Workshops, pp. K.G. Anagnostakis et al., "Detecting Targeted Attacks Using Shadow 159-164 (Aug. 22-26, 2005). Honeypots", In Proceedings of the 14th Usenx Secuirty Symposium, Nicholas Weaver et al., "A Taxonomy of Computer Worms", pp. 129-144 (2005). WORM'03, pp. 11-18 (Oct. 27, 2003). Paul Baecher et al., "The Nepenthes Platform: An Efficient Approach Stephan Axelsson, "The Base-Rate Fallacy and the Difficulty of to Collect Malware", In Proceedings of Recent Advances in Intrusion Intrusion Detection", ACM Transactions on Information and System Detection (RAID 2006), LNCS 4219, pp. 165-184, Sep. 2006. Security, vol. 3, No. 3, pp. 186-205 (Aug. 2000). Paul Barford et al., "An Inside Look at Botnets", Special Workshop Niel Landwehr et al., "Logistic Model Trees", Machine Learning, on Malware Detection, Advances in Information Security, Spring vol. 59, pp. 161-205 (2005). Verlag, pp. 171-192 (2006). US 9,306,969 B2 Page 6

(56) References Cited Ke Wang et al., "Anomalous Payload-Based Worm Detection and Signature Generation", In Proceedings of the International Sympo­ OTHER PUBLICATIONS sium on Recent Advances in Intrusion Detection (RAID) (200 5) (20 pages). James R. Binkley et al., "An Algorithm for Anomaly-Based Botnet David Whyte, "Exposure Maps: Removing Reliance on Attribution Detection", 2nd Workshop on Steps to Reducing Unwanted Traffic on During Scan Detection", 1st Usenix Workshop on Hot Topics in the Internet (SRUTI '06), pp. 43-48, Jul. 7, 2006. Security, pp. 51-55 (2006). Steven Cheung et al., "Modeling Multistep Cyber Attacks for Sce­ Jiahai Yang et al., "CARDS: A Distributed System for Detecting nario Recognition", In Proceedings ofthe Third DARPA Information Coordinated Attacks", In Sec (2000) (10 pages). Survivability Conference and Exposition (DISCEX III), vol. 1, pp. Vinod Yegneswaran et al., "Using Honeynets for Internet Situational 284-292, Apr. 22-24, 2003. Awareness", In proceedings of the Fourth Workshop on Hot Topics in Evan Cooke et al., "The Zombie Roundup: Understanding, Detect­ Networks (HotNets IV), Nov. 2005 (6 pages). ing, and Disrupting Botnets", Steps to Reducing Unwanted Traffic on David Dagon et al., "Corrupted DNS Resolution Paths: The Rise ofa the Internet Workshop (SRUTI '05), pp. 39-44, Jul. 7, 2005. Malicious Resolution Authority", In Proceedings of Network and Frederic Cuppens et al., "Alert Correlation in a Cooperative Intrusion Distributed Security Symposium (NDSS '08) (2008) ( 15 pages). Detection Framework", In Proceedings of IEEE Symposium on Dihe's IP-Index Browser, http://ipindex.homelinux.net/index.php, Security and Privacy 2002, pp. 202-215 (2002). updated Oct. 13, 2012 (1 page). David Dagon et al., "Modeling Botnet Propagation using Time Shuang Hao et al., "An Internet-Wide View into DNS Lookup Pat­ Zones", The 13th Annual Network and Distributed System Security terns", http://labs. veri sign.corn/projects/rnali ci ous-domain -names/ Symposium 2006, Feb. 2-3, 2006 (18 pages). white-paper/dns-imc2010.pdf (2010) (6 pages). Roger Dingledine et al., "Tor: The Second-Generation Onion Thorsten Holz et al., "Measuring and Detecting Fast-Flux Service Router", In Proceedings of the 13th Usenix Security Symposium, pp. Networks", In Proceedings ofNDSS (2008) (12 pages). 303-320 Aug. 9-13, 2004. Jaeyeon Jung et al., "DNS Performance and the Effectiveness of Steven T. Eckman et al., "STATL: An Attack Language for State­ Caching", IEEE/ACM Transactions on Networking, vol. 10, No. 5, Based Intrusion Detection", Journal of Computer Security, vol. 10, pp. 589-603, Oct. 2002. pp. 71-103 (2002). The Honeynet Project & Research Alliance, "Know Your Enemy: Daniel R. Ellis, et al., "A Behavioral Approach to Worm Detection", Fast-Flux Service Networks: An Ever Changing Enemy", http://old. WORM'04, Oct. 29, 2004 (11 pages). honeynet.org/papers/ff/fast-flux.htrnl, Jul. 13, 2007 (10 pages). Prahlad Fogla et al., "Polymorphic Blending Attacks", In Proceed­ Duane Wessels et al., "Measurements and Laboratory Simulations of ings of 15th Usenix Security Symposium, pp. 241-256, (2006). the Upper DNS Hierarchy", In PAM (2005) (10 pages). Jan Goebel, "Rishi: Identify Bot Contaminated Hosts by IRC Nick­ Joe Stewart, "Top Spam Botnets Exposed", http://www.secureworks. name Evaluation", Hot Bots'07, Apr. 10, 2007 (14 pages). corn/cyber-threat-intelligence/threats/topbotnets/, Apr. 8, 2008 (11 Koral II gun et al., "State transition Analysis: A Rule-Based Intrusion pages). Detection Approach", IEEE Transactions on Software Engineering, Brett Stone-Gross et al., "Your Botnet is My Botnet: Analysis of a vol. 21, No. 3, pp. 181-199, Mar. 1995. Botnet Takeover", CCS'09, Nov. 9-13, 2009 (13 pages). Xuxian Jiang et al., "Profiling Self-Propagating Worms Via Behav­ Sam Stover et al., "Analysis of the Storm and Nugache Trojans: P2P ioral Footprinting", WORM'06, Nov. 3, 2006 (7 pages). is here", Login, vol. 32, No. 6, pp. 18-27, Dec. 2007. Giovanni Vigna et al., "NetSTAT: A Network-based Intrusion Detec­ "Storm Botnet", http://en.wikipedia.org/wiki/Storm_botnet, Printed tion Approach", In Proceedings of the 14th Annual Computer Secu­ Jan. 29, 2013 (7 pages). rity Applications Conference (ACSAC '98), pp. 25-34, Dec. 7-11, Jeff Williams, "What We Know (and Learn) for the Waledac 1998. Takedown", http:/ /blogs.technet.corn/b/mmpc/archive/2010/03/ 15/ Kelly Jackson Higgins, "Shadowserver to Build' Sinkhole' Server to what-we-know-and-learned-from-the-waledac-takedown.aspx, Mar. Find Errant Bots: new Initiative Will Emulate IRC, HTTP Botnet 15, 2010 (2 pages). Traffic", http:/I darkreading .corn/taxonomy/index/printarticle/id/ "Trojan: J ava/Boonan", http://micro so ft.corn/ security/portal/threat/ 211201241. Sep. 24, 2008 (2 pages). encyclopedia/entry.aspx?Name~Trojan%3AJava%2FBoonan, Apr. Kelly Jackson Higgins, "Hacking a New DNS Attack: DNS Expert 17, 2011 (5 pages). Disputes Georgia Tach and Google Research That Points to Mali­ Julia Wolf, "Technical Details ofSrizbi's Domain Generation Algo­ cious Deployment of Certain Types of DNS Servers", http:// rithm", http://blog.fireeye.com/research/2008/ 11/technical-details­ darkreading.corn/taxonomy/index/printarticle/id/208803784. Dec. of-srizbis-domain-generation-algorithm.html, Nov. 25, 2008 (4 18, 2007 (2 pages). pages). Christian Kreibich, "Honeycomb: Automated Signature Creation Sandeep Yadav et al., "Detecting Algorithmically Generated Mali­ Using Honeypots", http://www.icir.org/christain/honeycomb/index. cious Domain Names", In Proceedings of the 10th Annual Confer­ html, Mar. 26, 2007, (3 pages). ence on Internet Measurement (IMC' 10), pp. 48-61, Nov. 1-3, 2010. Artem Dinaburg et al., "Ether: Malware Analysis Via Hardware "TEMU: The BitBlaze Dynamic Analysis Component", http:// Virtualization Extensions", CCS'08, Oct. 27-31, 2008 (12 pages). bitblaze.cs.berkeley.edu/temu.htrnl, printed Jan. 29, 2013 (1 page). Paul Royal, "Alternative Medicine: The Malware Analyst's Blue Paul Bacher et al., "Know Your Enemy: Tracking Botnets: Using Pill", Black Hat USA 2008, Aug. 6, 2008 (33 pages). Honeynets to Learn More About Bots", http://www.honeynet.org/ Paul Royal, "Alternative Medicine: The Malware Analyst's Blue papers/bots, Aug. 10, 2008 ( 1 page). Pill", www.damballa.com/downloads/r_pubs/KrakenWhitepaper. Michael Bailey et al., "Automated Classification and Analysis of pdf (2008) (3pages ). Internet Malware", RAID 2007, LNCS 4637, pp. 178-197 (2007). Robert Perdisci et al., "Behavioral Clustering of HTTP-Based Paul Barham et al., "Xen and the Art ofVirtualization", SOSP'03, Malware and Signature Generation Using Malicious Network Oct. 19-22, 2003 (14 pages). Traces", Usenix Symposium on Networked Systems Design and Ulrich Bayer et al., "TTAnalyze: A Tool for Analyzing Malware", In Implementation (NSDI 2010), (2010) (16 Pages). Proceedings of the 15th Annual Conference European Institute for Christopher Kruegel et al., "Polymorphic Worm Detection using Computer Antivirus Research (EICAR), pp. 180-192 (2006). Structural Information of Executables", RAID 2005, pp. 207-226 Fabrice Bellard, "QEMU, A Fast and Portable Dynamic Translator", (2005). In Proceedings of the Annual Confernce on Usenix Annual Technical Paul Vixie, "DNS Complexity", ACM Queue, pp. 24-29, Apr. 2007. Conference, pp. 41-46 (2005). Ke Wang et al., "Anagram: A Content Anomaly Detector Resistant to Kevin Borders et al., "Siren: Catching Evasive Malware (Short Mimicry Attack", In Proceedings of the International Symposium on Paper)", IEEE Symposium on Security and Privacy, pp. 78-85, May Recent Advances in Intrusion Detection (RAID) (2006) (20 pages). 21-24, 2006. US 9,306,969 B2 Page 7

(56) References Cited Paul Royal et al., "PolyUnpack: Automating the Hidden-Code Extraction of Unpack-Executing Malware", In Proceedings of the OTHER PUBLICATIONS Annual Computer Security Applications Conference (ACSAC), pp. 289-300 (2006). Christopher M. Bishop, Pattern Recognition and Machine Learning Rich Uhlig et al., "Intel Virualization Technology", Computer, vol. (Information Science and Statistics), Springer-Verlag New York, 38, No. 5, pp. 48-56, May 2005. Inc., Secauscus, NJ, USA, 2006. Amit Vasudevan et al., "Stealth Breakpoints", In Proceedings of the Ronen Feldman et al., "The Text Mining Handbook: Advance 21st Annual Computer Security Applications Conference (ACSAC), Approaches in Analyzing Unstructured Data", Cambridge Univ. Pr., pp. 381-392, (2005). 2007. Amit Vasudevan et al., "Cobra: Fine-Grained Malware Analysis Michael Hale Ligh et al., "Malware Analyst's Cookbook and DVD", Using Stealth Localized-Executions", In Proceedings of the 2006 Wiley, 2010. IEEE Symposium on Security and Privacy (S&P'06), pp. 264-279 M. Newman, "Networks: An Introduction", Oxford University Press, (2006). 2010. Yi-Min Wang et al., "Automated Web Patrol with Strider Matt Bishop, "Computer Security: Art and Science", Addison­ HoneyMonkeys: Finding Web Sites That Exploit Browser Vulner­ Wesley Professional, 2003. abilities", In NDSS'06 (2006) (15 pages). Neils Provos et al., "Virtual Honeypots: Form Botnet Tracking to Heng Yin et al., "Panorama: Capturing System-Wide Information Intrusion Detection", Addison-Wesley Professional, Reading, 2007. Flow for Malware Detection and Analysis", In Proceedings of ACM Michael Sipser, "Introduction to the Theory of Computation", Inter­ Conference on Computer and Communication Security, Oct. 29-Nov. national Thomson Publishing, 1996. 2, 2007 (13 pages). Peter Szor, "The Art of Computer Virus Research and Defense", Joanna Rutkowska, "Introducing Blue Pill", http://theinvisbileth­ Addison-Wesley Professional, 2005. ings.blogspot.com/2006/06/introducing-blue-pill.html, Jun. 22, Anil K. Jain et al., "Algorithms for Clustering Data", Prentice-Hall, 2006 (26 pages). Inc., 1988. Peter Ferrie, "Anti-Unpacker Tricks", In Proceedings of the 2nd V. Laurikari, "TRE", 2006 (5 pages). International CARO Workshop (2008) (25 pages). P. Porras, "Inside Risks: Reflections on Conficker", Communications Danny Quist, "Covert Debugging Circumventing Software Armoring of the ACM, vol. 52, No. 10, pp. 23-24, Oct. 2009. Techniques"; In Proceedings of Black Hat USA 2007 (2007) (5 Changda Wang et al., "The Dilemma of Covert Channels Searching", pages). ICISC 2005, LNCS 3935, pp. 169-174, 2006. Ulrich Bayer et al., "Scalable, Behavior-Basedmalware Clustering", C. Willems et al., "Toward Automated Dynamic Malware Analysis In Network and Distributed System Security Symposium (2009) (18 Using CWSandbox", IEEE Security and Privacy, vol. 5, No. 2, pp. pages). 32-39, 2007. R Developmental Core Team, "R: A Language and Environment for David Brumley et al., "Automatically Identifying Trigger-Based statistical Computing", R. Foundation for Statistical Computing, Behavior in Malware", Botnet Detection, pp. 1-24 (2008). Vienna Austria, 2008. Dancho Danchev, "Web Based Botnet Command and Control Kit Simon Urbanek, "rJava: Low-Level-R to Java Interface", printed 2 .O", http://ddanchev. blogspot.corn/2008/08/web-based-botnet­ May 6, 2013 (5 pages). command-and-control.html, Aug. 22, 2008 (5 pages). Juan Caballero et al., "Polyglot: Automatic Extraction of Protocol Ozgun Erdogan et al., "Hash-AV: Fast Virus Signature matching by Message Format Using Dynamic Binary Analysis", In Proceedings Cache-Resident Filters", Int. J. Secur. Netw., vol. 2, pp. 50-59 (2007). ofACM Conference on Computer and Communication Security, Oct. Fanglu Guo et al., "A Study of the Packer Problem and Its Solutions", 2007 (15 pages). In Recent Advances in Intrusion Detection (RAID 2008), LNCS Mihai Christodorescu et al., "Semantics-Aware Malware Detection", 5230, pp. 95-115 (2008). In Proceeding of the 200 5 IEEE Symposium on Security and Privacy, Maria Halkidi et al., "On Clustering Validation Techniques", Journal pp. 32-46 (2005). oflntelligent Information Systems, vol. 17, pp. 107-145 (2001). Mihai Christodorescu et al,. "Mining Specifications on Malicious A.K. Jain et al., "Data Clustering: A Review", ACM Computing Behavior", ESEC/FSE'07, Sep. 3-7, 2007 (10 pages). Surveys, vol. 31, No. 3, pp. 264-323, Sep. 1999. Peter Ferrie, "Attacks on Virtual Machine Emulators", Symantec John P. John et al., "Studying Spamming Botnets using Botlab", In Advance Threat Research, 2006 ( 13 pages). Usenix Symposium on Networked Systems Design and Implemen­ Peter Ferrie, "Attacks on More Virtual Machine Emulators", tation (NDSI), (2009) ( 16 pages). Symantec Advance Threat Research, http://pferrie.tripod.com/pa­ Hyang-Ah Kim et al., "Autograph: Toward Automated, distributed pers/attacks2.pdf, 2007 (17 pages). Worm Signature Detection", In Usenix Security Symposium (2004) Tai Garfinkel et al., "A Virtual Machine Introspection Based Archi­ (16 pages). tecture for Intrusion Detection", In Proceedings of Network and Clemens Kolbitsch et al., "Effective and Efficient Malware Detection Distributed Systems Security Symposium, Feb. 2003 (16 pages). at the End Host", In 18th Usenix Security Symposium, pp. 351-366 G. Hunt et al., "Detours: Binary Interception ofWIN32 Functions", (2009). Proceedings of the 3rd Usenix Windows NT Symposium, Jul. 12-13, Kevin Borders et al., "Protecting Confidential Data on Personal Com­ 1999 (9 pages). puters with Storage Capsules", In 18th Usenix Security Symposium, Xuxian Jiang et al., "Stealthy Malware Detection Through VMM­ pp. 367-382 (2009). Based "Out-of-the-Box" Semantic View Reconstruction", CCS'07, Ralf Hund et al., "Return-Oriented Rootkits: Bypassing Kernel Code Oct. 29-Nov. 2, 2007 (11 pages). Integrity Protection Mechanisms", In 18th Usenix Security Sympo­ Xuxian Jiang et al., "Virtual Playgrounds for Worm Behavior Inves­ sium, pp. 383-398 (2009). tigation", RAID 2005, LNCS 3858, pp. 1-21 (2006). Christian Kreibich et al., "Honeycomb--Creating Intrusion Detec­ Min Gyung Kang et al., "Renovo: A Hidden Code Extract for Packed tion Signatures Using Honeypots", In ACM Workshop on Hot Topics Executables", WORM'07, Nov. 2, 2007 (8 pages). in Networks (2003) (6 pages). Christopher Kruegel et al., "Detecting Kernel-Level Rootkits Zhichun Li et al., "Hamsa: Fast Signature Generational for Zero-Day Through Binary Analysis", In Proceedings of the Annual Computer Polymorphic Worms with Provable Attack Resilience", In IEEE Security Applications Conference (ACSAC), pp. 91-100, Dec. 2004. Symposium on Security and Privacy (2006) (15 pages). Lorenzo Martignoni et al., "OnmiUnpack: Fast, Generic, and Safe James Newsome et al., "Polygraph: Automatically Generating Sig­ Unpacking of Malware", In Proceedings of the Annual Computer natures for Polymorphic Worms", In IEEE Symposium on Security Security Applications Conference (ACSAC), pp. 431-441 (2007). and Privacy (2005) (16 pages). Thomas Raffetseder et al., "Detecting System Emulators", In ISC, Sun Wu et al., "AGREP-A Fast Approximate Pattern-Matching pp. 1-18 (2007). Tool'', In Usenix Technical Conference (1992) (10 pages). US 9,306,969 B2 Page 8

(56) References Cited Simon Biles, "Detecting the Unknown with Snort and Statistical Packet Anomaly Detecting Engine", www.cs.luc.edu/-pld/courses/ OTHER PUBLICATIONS 447 /sum08/class6/biles.spade.pdf (2003) (9 pages). James Newsome et al., "Paragraph: Thwarting Signature Learning by Vinod Yegneswaren et al.,, "An Architecture for Generating Seman­ Training Maliciously", In Recent Advance in Intrusion Detection tics-Aware Signatures", In Usenix Security Symposium (2005) (16 (RAID), 2005 (21 pages). pages). Jon Oberheide et al., "CloudAV: N-Version Antivirus in the Network Jaeyeon Jung, "Fast Portscan Detection Using Sequential Hypothesis Cloud", In Proceedings of the 17th Usenix Security Symposium, pp. Testing", In Proceedings of IEEE Symposium on Security Privacy, 91-106 (2008). pp. 211-225 (2004). Dan Pelleg et al., "X-Means: Extending K-Means with Efficient Anestis Karasaridis et al., "Wide-Scale Botnet Detection and Char­ Estimation of the Number of Clusters", In International Conference acterization", In Usenix Workshop on Hot Topics in Understanding on Machine Learning (2000) (8 pages). Botnets (HotBots'07), Apr. 11-13, 2007 (9 pages). Roberto Perdisci et al., "Misleading Worm Signature Generators Carl Livades et al., "Using Machine Learning Techniques to Identify Using Deliberate Noise Injection", In IEEE Symposium on Security Botnet Traffic", In 2nd IEEE LCN Workshop on Network Security and Privacy (2006) (15 pages). Mark F elegyhazi et al., "On the Potential of Proactive Domain Black­ (WoNS'2006), pp. 967-974 (2006). listing", In the Third Usenix LEET Workshop (2010) (8 pages). "CVE-2006-3439", http://cve.mitre.org/cgi-bin/cvename. Konrad Rieck et al., "Learning and Classification ofMalware Behav­ cgi?name~CVE-2006-3439, printed Jun. 27, 2012 (2 pages). ior", DIMVA 2008, LNCS 5137, pp. 108-125 (2008). David Moore, "Inferring Internet Denial-of-Service Activity", In Sumeet Singh et al., "Automated Worm Fingerprinting", in ACM/ Proceedings of the 10th Usenix Security Symposium, Aug. 13-17, Usenix Symposium on Design and Implementa­ 2001 (15 pages). tion, Dec. 2004 (16 pages). Peng Ning et al., "Constructing Attack Scenarios Through Correla­ "EFnet Chat Network", http://www.efnet.org, dated Jun. 18, 2007 (3 tion of Intrusion Alerts", In Proceedings of Computer and Commu­ pages). nications Security (CCS'02), Nov. 18-22, 2002 (10 pages). Guofei Gu et al. "Bothunter: Detecting Malware Infection Through Vern Paxson, "Bro: A System for Detecting Network Intruders in IDS-Driven Dialog Correlation", Proceedings of 16th Usenix Secu­ Real-Time", In Proceedings of the 7th Usenix Security Symposium, rity Symposium, pp. 167-182 (2007). Jan. 26-29, 1998 (22 pages). The Conficker Working Group,"Conficker Working Group: Lessons Roberto Perdisci et al., "Using an Ensemble of One-Class SVM Learned", Conficker_Working_Group_Lessons_Learned_17 _ Classifiers to Harden Payload-Based Anomaly Detection Systems", June_2010_final.pdf, published Jan. 2011 (59 pages). In Proceedings of the 6th International Conference on Data Mining Manos Antonakakis et al., "The Command Structure of the Aurora (ICDM'06), pp. 488-498, Dec. 2006. Bonet", http://www.damballa.com/downloads/r_pubs/Aurora_ Phillip A. Porras, "Privacy-Enabled Global Threat Monitoring", Botnet_Command_Structure.pdf, 2010 (31 pages). R. Arends et al. , "Protocol Modifications for the DNS Security IEEE Security & Privacy, pp. 60-63 (2006). Extensions", htp://www.ietf.org/rfc/rfc4035.txt, Mar. 2005 (50 Moheeb Abu Raj ab et al., "A Multifaceted Approach to Understand­ pages). ing the Botnet Phenomenon", In Proceedings of the ACM R. Arends et al. , "DNS Security Introduction and Requirements", SIGCOMM/Usenix Internet Measurement Conference (ICM'06), htp://www.ietf.org/rfc/rfc4033.txt, Mar. 2005 (20 pages). Oct. 25-27, 2006 (12 pages). R. Arends et al. , "Resource Records for the DNS Security Exten­ Anirudh Ramachandran et al., "Understanding the Network-Level sions", htp://www.ietf.org/rfc/rfc4034.txt, Mar. 2005 (28 pages). Behavior of Spanuners", In Proceedings of the 2006 Conference on Andreas Berger et al., "Assessing the Real-World Dynamics of Applications, Technologies, Architectures, and Protocols for Com­ DNS", Lecture Notes in Computer Science, vol. 7189, pp. 1-14 puter Communications (SIGCOMM'06), Sep. 11-16, 2006 (13 (2012). pages). Global Research & Analysis Team (GReAT), "Full Analysis of Martin Roesch, "SNORT-Lightweight Intrusion Detection for Net­ Flame's Command & Control Servers", http://www.securelist.com/ works", In Proceedings of 13th System Administration Conference en/blog/7 50/Full_Analysis_of _Flames_ Command_ Control_ (LISA'99), pp. 229-238, Nov. 7-12, 1999. Servers, Sep. 17, 2012 (10 pages). Robin Sommer et al., "Enhancing Byte-Level Network Intrusion Nicolas Falliere et al., "W32.Stuxnet Dossier", http://www. Detection Signatures with Context", In Proceedings ofthe 10th ACM symantec .corn/content/ en/us/enterprise/media/ security _response/ Conference on Computer and Communications Security (CCS'03), whitepapers/w32_stuxnet_dossier.pdf, Feb. 2011 (69 pages). pp. 262-271, Oct. 27-30, 2003. Steinar H. Gunderson, "Global IPv6 Statistics: Measuring the Cur­ "W32/IRCBot-TO", http://www.sophos.com/virusinfo/analyses. rent State of IPv6 for Ordinary Users", http://meetings.ripe.net/ripe- w32ircbotto.htrnl, Jan. 19, 2007 (1 page). 57 /presentations/Colitti-Global_IPv6_statistics_-_Measuring_ Stuart Staniford et al., "Practical Automated Detection of Stealthy the_current_state_of_IPv6_for_ordinary_users_.7gzD.pdf, Portscans", Journal of Computer Security, vol. 10, pp. 105-136 Oct. 24-30, 2008 (20 pages). (2002). Jaeyeon Jung et al., "Modeling TTL-Based Internet Caches", IEEE S. Staniford-Chen et al., "GrIDS-A Graph Based Intrusion Detec­ INFOCOM 2003, pp. 417-426, Mar. 2003. tion System for Large Networks", In Proceedings of the 19th Srinivas Krishnan et al., "DNS Prefetching and Its Privacy Implica­ National Information Systems Security Conference, pp. 361-370 tions: When Good Things Go Bad", In Proceeding of the 3rd (1996). USENIX Conference on Large-Scale Exploits and Emergent Steven J. Templeton et al., "A Requires/Provides Model for Com­ Threats: Botnets, Spyware, Worms, and More (LEET' 10), (2010) (9 puter Attacks", In Proceedings of the 2000 Workshop on New Secu­ pages). rity Paradigms (NSPW'OO), pp. 31-38 (2000). Zhuoqing Morley Mao et al., "A Precise and Efficient Evaluation of Alfonso Valdes et al., "Probabilistic Alert Correlation", In Proceed­ the Proximity Between Web Clients and Their Local DNS Servers", ings of the Recent Attack in Intrusion Detection (RAID 2001 ), LNCS In Proceedings of Usenix Annual Technical Conference (2002) (14 2212, pp. 54-68 (2001). pages). Fredrik Valeur et al., "A Comprehensive Approach to Intrusion Mozilla Foundation, "Public Suffix List", http://publicsuffix.org/, Detection Alert Correlation", IEEE Transactions on Dependable and printed May 23, 2013 (8 pages). Secure Computing, vol. 1, No. 3, pp. 146-169, Jul. 2004. David Plonka et al., "Context-Aware Clustering ofDNS Query Traf­ Kjersti Aas et al., "Text Categorisation: A Survey", Norwegian Com­ fic", In Proceedings of the 8th IMC (2008) (13 pages). puting Center, Jun. 1999 (38 pages). RSA FraudAction Research Labs, "Anatomy of an Attack", http:// M. Andrews, "Negative Caching ofDNS Queries (DNS NCACHE)", blogs/rsa.corn/rivner/anatomy-of-an-attack/, Apr. 1, 2011 (17 http://tools.ietf.org/htrnl/rfc2308, Mar. 1998 (20 pages). pages). US 9,306,969 B2 Page 9

(56) References Cited Peter Wurzinger et al., "Automatically Generating Models for Botnet Detection", In Proceedings of the 14th European Conference on OTHER PUBLICATIONS Research in Computer Security (ESORICS'09), pp. 232-249 (2009). Yinglian Xie et al., "Spamming Botnet: Signatures and Characteris­ Steve Souders, "Sharing Dominant Domains", http://www. tics", In Proceeding of the ACM SIGCOMM 2008 Conference on stevesouders.corn/blog/2009/05/12/sharding-dominant-domains, Data Communications (SIGCOMM'08), pp. 171-182, Aug. 17-22, May 12, 2009 (3 pages). 2008. Paul Vixie, "What DNS Is Not", Communications of the ACM, vol. Yajin Zhou et al., "Dissecting Android Malware: Characterization 52, No. 12, pp. 43-47, Dec. 2009. and Evolution", 2012 IEEE Symposium on Security and Privacy, pp. N. Weaver et al., "Redirecting DNS for ADS and Profit", In Usenix 95-109 (2012). Workshop on Free and Open communications on the Internet (FOCI), Nello Cristianini et al., "An Introduction to Support Vector Machines: Aug. 2011 (6 pages). and other Kernal-Based Learning Methods", Cambridge University Florian Weimer, "Passive DNS Replication", In Proceedings of the Press, New York, NY, USA (2000). 17th Annual FIRST Conference on Computer Security Incident, Apr. Timo Sirainen, "", http://en.wikipedia.org/wiki/Irssi, updated 2005 (13 pages). May 8, 2013 (3 pages). Manos Antonakakis et al., "Unveiling the Network Criminal Infra­ Team Cymru, "IP to ASN Mapping", http://www.team-cymru.org/ structure ofTDSS/TDL4", http://www.damballa.com/ downloads/r_ Services/ip-to-asn.html, printed Mar. 23, 2013 (6 pages). pubs/Damballa_tdss_tdl4_case_study_public.pdf, (undated) ( 16 http://www.bleedingsnort.com, retrieved from Internet Archive on pages). May 23, 2013, Archived Sep. 26, 2006 (3 pages). Manos Antonakakis et al., "From Throw-Away Traffic to Bots: http://www.dshield.org, retrieved from Internet Archive on May 23, Detecting the Rise ofDGA-Based Malware", In Proceedings of the 2013, Archived Sep. 29, 2006 (2 pages). 21st Usenix Conference on Security Symposium (Security' 12), http://www.alexa.com, retrieved from Internet Archive on May 23, (2012) (16 pages). 2013, Archived Sep. 25, 2006 (3 pages). T. Berners-Lee et al., "RFC3986-Uniform Resource Identifier https://sie.isc.org/, retrieved from Internet Archive on May 23, 2013, (URI): Generic Syntax", http://www.hjp.at/ doc/rfc/rfc3986 .html, Archived Dec. 29, 2008 (2 pages). Jan. 2005 (62 pages). http://damballa.com, retrieved from Internet Archive on May 23, Juan Caballero et al., "Measuring Pay-Per-Install: The Commoditiza­ 2013, Archived Jan. 28, 2007 (10 pages). tion of malware Distribution", In Proceedings of the 20th Usenix http://www.dnswl.org, retrieved from Internet Archive on May 23, Conference on Security (SEC' 11 ), (2011) ( 16 pages). 2013, Archived Jul. 15, 2006 (4 pages). Chih-Chung Chang et al., "LIBSVM: A Library for Support Vector http://www.sparnhaus.org/sbl/, retrieved from Internet Archive on Machines" ACM Transactions on Intelligent Systems and Technol­ May 23, 2013, Archived Sep. 24, 2006 (24 pages). ogy 2011, Last Updated Jun. 14, 2007 (26 pages). Dancho Danchev, "Leaked DIY Malware Generating Tool Spotted in http://malwaredornains.com, retrieved from Internet Archive on May the Wild", http://blog.webroot.com/2013/01118/leaked-diy­ 23, 2013, Archived Dec. 28, 2007 (12 pages). malware-generating-tool-spotted-in-the-wild/, Jan. 18, 2013 (6 http://www.opendns.com, retrieved from Internet Archive on May pages). 23, 2013, Archived Sep. 9, 2006 (25 pages). D. De La Higuera et al., "Topology of Strings: Median String is https://zeustracker.abuse.ch, retrieved from Internet Archive on May NP-Complete", Theoretical Computer Science, vol. 230, pp. 39-48 23, 2013, Archived Oct. 26, 2010 (37 pages). (2000). http://www.threatfire.com, retrieved from Internet Archive on May Robert Edmonds, "ISC Passive DNS Architecture", http://kb.isc.org/ 23, 2013, Archived Aug. 22, 2007 (18 pages). getAttach/30/AA-00654/passive-dns-architecture.pdf, Mar. 2012 http://www.avira.com, retrieved from Internet Archive on May 23, (18 pages). 2013, Archived Sep. 29, 2006 (13 pages). Manuel Egele et al., "A Survey on Automated Dynamic Malware­ https://alliance.mwcollect.org, retrieved from Internet Archive on Analysis Techniques and Tools", ACM Computing Surveys, vol. 44, May 23, 2013, Archived Jan. 7, 2007 (2 pages). No. 2, Article 6, pp. 6: 1-6:42, Feb. 2012. http://malfease.oarci.net, retrieved from Internet Archive on May 23, Dennis Fisher, "Zeus Source Code Leaked", http://threatpost.com/ 2013, Archived Apr. 12, 2008 (1 pages). en_us/blogs/zeus-source-code-leaked-051011, May 10, 2011 (6 http://www.oreans.com/themida.php, retrieved from Internet pages). Archive on May 23, 2013, Archived Aug. 23, 2006 (12 pages). Guofei Gu et al., "BotSniffer: Detecting Botnet Command and Con­ http://www.vmware.com, retrieved from Internet Archive on May 23, trol Channels in Network Traffic", In Proceedings of the 15th Annual 2013, Archived Sep. 26, 2006 (32 pages). Network and Distributed System Security Symposium (NDSS'08), Thomas Ptacek, "Side-Channel Detection Attacks Against Unautho­ Feb. 2008 (18 pages). rized Hypervisors", http://www.matasano .corn/log/930/ side-chan­ Grefoire Jacob, "Jackstraws: Picking Command and Control Con­ nel-detection-attacks-against-unauthorized-hypervisors/, Aug. 20, nections from Bot Traffic", In Proceedings of the 20th Usenix Con­ 2007, retrieved from Internet Archive on May 23, 2013, Archived ference on Security (SEC' 11) (2011) (16 pages). Aug. 27, 2007 ( 12 pages). Jiyong Jang et al., "Bitshred: Feature Hashing Malware for Scalable http://cyber-ta.org/releases/botHunter/index.html, retrieved from Triage and Semantic Analysis", In Proceedings of the 18th ACM Internet Archive on May 23, 2013, Archived Aug. 30, 2007 ( 6 pages). Conference on Computer and Communications Security (CCS' 11), http://anubis.seclab.tuwien.ac.at, retrieved from Internet Archive on pp. 309-320, Oct. 17-21, 2011. May 23, 2013, Archived Apr. 9, 2008 (2 pages). J. Zico Kolter et al., "Learning to Detect and Classify Malicious http://www.siliconrealms.com, retrieved from Internet Archive on Executables in the Wild", Journal of Machine Learning Research, May 23, 2013, Archived Sep. 4, 2006 (12 pages). vol. 7, pp. 2721-2744, Dec. 2006. http://bitblaze.cs.berkeley.edu, retrieved from Internet Archive on John C. Platt, "Probablistic Outputs for Support Vector Machines and May 23, 2013, Archived Jan. 28, 2008 (4 pages). Comparisons to Regularized Likelihood Methods", Advances in http://www.dyninst.org, retrieved from Internet Archive on May 23, Large margin Classifiers, vol. 10, No. 3, pp. 61-74, Mar. 26, 1999. 2013, Archived Aug. 20, 2006 (pages). Team Cymru, "Developing Botnets", http://www.team-cymru.com/ http://www.peid.info, retrieved from Internet Archive on May 23, ReadingRoorn/Whitepapers/2010/ developing-botnets. pdf (2010) (3 2013, Archived Dec. 4, 2007 (2 pages). pages). Mark Russinovich et al., "RegMon for Windows V7.04", http:// Brett Stone-Gross et al., "Pushdo Downloader Variant Generating technet.microsoft.corn/en-us/sysinternals/bb896652.aspx, Pub­ Fake HTTP Requests", http://www.secureworks.com/cyber-threat­ lished Nov. 1, 2006 (4 pages). intelligence/threats/Pushdo_Downloader_Variant_Generating_ "Troj/Agobot-IB", http://www. sophos.corn/virusinfo/analyses/ Fake_HTTP_Requests/, Aug. 31, 2012 (4 pages). trojagobotib.html, printed Jun. 27, 2012 ( 1 page). US 9,306,969 B2 Page 10

(56) References Cited File History of U.S. Appl. No. 14/317,785. File History ofU.S. Appl. No. 13/205,928, for Nov. 14, 2014-Jul. 27, OTHER PUBLICATIONS 2015. File History ofU.S. Appl. No. 13/309,202, for Nov. 14, 2014-Jul. 27, Mark Russinovich et al., "FileMon for Windows V7.04", http:// 2015. technet.microsoft.corn/en-us/sysinternals/bb89664 2.aspx, Nov. 1, File HistoryofU.S. Appl. No. 13/749,205, for Nov. 14, 2014-Jul. 27, 2006 (6 pages). 2015. "Norman Sandbox Whitepaper", Copyright Norman 2003 (19 File History of U.S. Appl. No. 14/015,621, for Apr. 11, 2014-Jul. 27, pages). 2015. Tanveer Alam et al., "Webinar: Intel Virtualization Technology of File History of U.S. Appl. No. 14/015,663, for Apr. 11, 2014-Jul. 27, Embedded Applications", Intel, Copyright 2010 (34 pages). 2015. F. Heinz et al., "IP Tunneling Through Narneserver", http://slashdot. File History of U.S. Appl. No. 14/041,796, for Apr. 11, 2014-Jul. 27, org/story/00/09/10/2230242/ip-tunneling-through-narneservers, 2015. Sep. 10, 2000 (23 Pages). File HistoryofU.S. Appl. No. 14/096,803, for Nov. 14, 2014-Jul. 27, http://www.mcafee.com/us/, printed May 23, 2013 (23 pages). 2015. "Windows Virtual PC", http://en.wikipedia.org/wiki/Windows_Vir­ File HistoryofU.S. Appl. No. 14/317,785, for Nov. 14, 2014-Jul. 27, tual_PC, Last Modified May 5, 2013, Printed May 23, 2013 (21 2015. pages). File History of U.S. Appl. No. 14/616,387. Par Fabien Perigaud, "New Pill?", http://cert.lexsi.com/weblog/in­ File History of U.S. Appl. No. 14/668,329. dex.php/2008/03/21/223-new-pill, Mar. 21, 2008 (3 pages). U.S. Appl. No. 13/309,202. http://handlers.sans.org/jclausing/userdb.txt, printed May 24, 2013 U.S. Appl. No. 14/015,621. (149 pages). U.S. Appl. No. 14/194,076. Avi Kivity et al., "KYM: The Linux Virtual Machine Monitor", U.S. Appl. No. 14/010,016. Proceedings of the Linux Symposium, pp. 225-230, Jun. 27-30, U.S. Appl. No. 12/538,612. 2007. U.S. Appl. No. 13/205,928. Symantec, "Symantec Global Internet Security Threat Report: U.S. Appl. No. 13/749,205. Trends for 2008", vol. XIV, Apr. 2009 (110 pages). U.S. Appl. No. 14/015,582. FileHistoryofU.S.Appl. No. 13/008,257, for Apr. 11-Nov.14, 2014. U.S. Appl. No. 14/015,663. FileHistoryofU.S.Appl. No. 13/205,928, for Apr. 11-Nov.14, 2014. U.S. Appl. No. 14/015,704. FileHistoryofU.S.Appl. No. 13/309,202, for Apr. 11-Nov.14, 2014. U.S. Appl. No. 14/041,796. FileHistoryofU.S.Appl. No. 13/749,205, for Apr. 11-Nov.14, 2014. U.S. Appl. No. 14/096,803. FileHistoryofU.S.Appl. No. 14/096,803, for Apr. 11-Nov.14, 2014. U.S. Appl. No. 14/305,998. FileHistoryofU.S.Appl. No. 14/194,076, for Apr. 11-Nov.14, 2014. U.S. Appl. No. 14/317,785. File History of U.S. Appl. No. 14/304,015. File History of U.S. Appl. No. 14/305,998. * cited by examiner U.S. Patent Apr. 5, 2016 Sheet 1of28 US 9,306,969 B2

"hacker-example. org" iO V!ctin1 Cioud ~~------~ ;--~-~- 5 C\"--- (x:;~ \ ' vx ) ~ .r---.., !:,\ ) /~ / "hacker-exemple.org"? ~- .i. ( V, ) t, v,) -~ i r L /,,- 0~ /-r'~-1 \ I \~ \I I ~\\ - 0 J \\ / / I 1 s ~ ~ '--...../'"'- \s '\ 2 ;l..___ I\ "'x-m":e--- F~DynONS /'",\I G)' '>~~\ _ ;' I_ \ ""a~~''"ii -...,;\._,1-i:; C:.h~.. org~• i~--- ~-~ '- 1 \ -192.0.34.166 ~__...., CNAl\'!E h ~ --· / \ 192.0.34.166 \_ 1-:-\ ""_/ \ • • l t \'o ; l Cornmano a.10 1 ----f t v, ) ~ / / 5 Control Sexes ) \ '-.___/ -- / . l '-....._ ;< / r-- L_ __ 1 / --./ -..... ______~/ ----*i 182.034.166 k:_i ______: !' T------. : ' f---i ~-- : l : ' I . b--~ U.S. Patent Apr. 5, 2016 Sheet 2of28 US 9,306,969 B2

F~GURE 18

'105

l-\-fX~LJ-lR-C-HA-S-ES AT LEAS~ ONE DOMA!N NAME I I I L_ ······-··-·-·-··-·-····-··············------,--···-·····--·------··-----J

115 ---- -· ·- j ______...... ~ i ! ! \IX H/l.RD-CODES DOMAIN NAMES !NTO DROPPER ! j PROGRAMS, .AND SPREADS THEM !NTO BOT i ! COMPUTERS :, i i !______··-·------·----~----

'120

''-,1..... VX C:E:T:::&C MA:HINE~OR VICTl:l~OT · 1 COMPUTERS TO USE TO COMMUN!CATE I i : !______-----··-~·------j

125

-·------·--·-~------·-

/ VX ARRANGES FOR DNS RESOLUTION OF DOMAIN NAME AND REGISTERS Vvrni DNS I SERVICE. i !...... ·-·-----······ ·------·-·· .. ·------·--. ----- ..... J U.S. Patent Apr. 5, 2016 Sheet 3of28 US 9,306,969 B2

1 uhacker-exampie.org' 10 V1ctirn Cioud ~------'---._ ~/---y--~------, 5r-y ~ r.-\ ~ 1\r vx.,, )' "'nacker-example.crg "?. .;-~ " \ 1\. v.-· ; \ L-----...__ \ y ~~.,..-·-" I Vj f'-. ~·~ \ ./ / \.__/ ~------(" \ ! I ' .. ' \ I I \ ,, v ( I ~ \ < " i'· r\ ~ .....-L...\ \ I __,,/ f I V r.-...._ ~I • \ "hacker-exam DynDNS I \J ~ (:J / \ -192.0.34.:6o CNAME \ " , -\ ,,/ '\ : 192.0.34.!66 , L ~--....., ~ .v6 ,i "\--: Command and ', 1 V 1 l \ ../ I 25 .Control Boxes -~----'REDIRECT)-- ~·~ ~ J . -----1 ' ' ~' ,// .: C!< \. .~ ------, 1S2.0.~34.166r~-t------\e~ j '• .I .' 6; F----1 8~ l ______• U.S. Patent Apr. 5, 2016 Sheet 4of28 US 9,306,969 B2

2 05 ,------··---··--·1

"'-- i I ~ ! I I I I i !DENT!FY BOTNET C&C DOMAIN NAME(S) I L-~------~------·-----l 210 ~--··· -J_ - ······································ .... ! I REPLACE lP ,.~ODRESS OF C&C COMPUTER WffH !

IP ADDRESS OF SINKHOLE COMPUTER iI j I I \ L______------__ .J

215 i r·---·--·-·- ______!/{ ______------· "' "'!i ·1 BOTS LOOKING UP C&C COMPUTER VV!LL BE I IREDtRECTED TO CONTACT SINKHOLE COMPUTER l L______i·------~------J 220 ------·· ------· ___'f ___ --~ ------, ~ I t j 1 WHEN BOT COMPUTER CONTACTS SINKHOLE i i COMPUTER, !TS !P ADDRESS !S RECORDED l l : ! I ! '1 I I I I 2~ ~~~= -=~ _]_ ~-=- ·1

•. TRAFFlC FROM THE BOT COMPUTER TO THE i SINKHOLE COMPUTER !S UTIUZED I [______·------·------...... 1 U.S. Patent Apr. 5, 2016 Sheet 5of28 US 9,306,969 B2

. ------1 f ------1 I '"'W' I l DNS i I SE~R·, -:- 0 l I DETECTION I #I . \1 i:.., ~ so~"ARE / I ?"" ; ~- , vv . - / • -VV I 265 i . I I . I ------__ ,,/,/ L---·----~~ L---.:-?-~ -J /// "--.._ .. / ~,, . I ~~ '" // I \ ./ I \ //// r i ! \i ./ :..(\:/ NETWORK ; 250

235

235 U.S. Patent Apr. 5, 2016 Sheet 6of28 US 9 ,306,969 B2

205 305 ' 1------······-··--····------·-····················--······· "J USE DOMA!N AND SUBDOMA!N !NFORMATiON: FOR A G!VEN SLD, FIND 3LDs AND USE TO ' I DETERMINE VVHETHER BOT DNS REQUEST RATE !S NORMAL OR SUSPlC!OUS I ---········· -r------~----J

310 ~, ------·------····-··J..·······--···-·······------·-···---····-··-··1 '-. I ' -"~ FOR DATA DEEMED SUSP!ClOUS, OR FOR DATA i

"t WHERE 305 ANALYS\S W.A.S OTHERWISE ii'

I !NSUFF!CiENT, DETERMINE iF DATA HAS ' 1· EXPONENTLA.L REQUEST RATES (E.G., PER!OD!C ! 1 SPIKES\ I L--····-·------·-··------~---~ --- ______! U.S. Patent Apr. 5, 2016 Sheet 7of28 US 9,306,969 B2

305 ~

405 """ 1-····----·-·······--··--·-- ···--·--- ················- ...... --·--··-···- ! I FOR GiVEN SLD, CALCULATE CANONiC/\L SLD REQUEST RATE I i L___ ~ ------~ ______J I 410 l '-."'- ·----···--·····-----····-·····-··--···----l-···-·-·---···-·-·····-···-··· ·····-·-·-1

DETERrvHNE !F CANONICAL SLD REQUEST RATE S!GN!F!CANTLY DEVIATES FROM THE iv1EAN

IJ ______I ··--~---·--·····-' U.S. Patent Apr. 5, 2016 Sheet 8of28 US 9 ,306,969 B2

F~GURE 48

i rr --:- l 8oitraffic (Canor.ica! SLD Form) Normal traffic (Cano11ica! SLD Form)""""""""""' 1400 -

1200

~

"'0 1000 --·

-; 800 c --'

(/) 2'. C! 600 -

400

200 ! --i i j \ i~f""" ~='""""'"""""~ ,, ...... 1 I o U__LJlJ ___ LLJ_u ___l L-1_l__l_J_J_j_U_11 t. _LLL_!!J_j 01123 01/24 01/25 01126 01127 01128 01129 01i30 01131 02101 02102

Time (in days) U.S. Patent Apr. 5, 2016 Sheet 9of28 US 9,306,969 B2

3'10

505 1------i i I

i _ .. S~RT DNS REQU~SlT RA TESPERH~UR .... _. I

! 510 y_

",f DETERMINE b;::~z~~~~p~E~~~b'AL ACT:: I I i ------. -- - -I U.S. Patent Apr. 5, 2016 Sheet 10 of 28 US 9,306,969 B2

400 i-· ------·--·1------·----t------i·------! Bot--- i Normai Host- - - - 350 I \

300 I~\! i 250 1- \ .c

Rank (by decreasing number of lookups) U.S. Patent Apr. 5, 2016 Sheet 11 of 28 US 9,306,969 B2

FIGURE 6 225

6!0 615 605 '"~ "'-,,"" r·-·-- ·"'------~- ·1 __' , ______· 1 : i i i SURVE!LLANCE DONS REMOVAL TARPSTS i' REPORTING l I . j______I J ~----j I____ ~------U.S. Patent Apr. 5, 2016 Sheet 12 of 28 US 9,306,969 B2

FlGURE 7

!Tl r-riT--r TT-T: ' France Poland Ge;rn;;r.y !fa!y Nethe;lar:ds Uni\ed Kingdom

II I I

04/01/05 04/01105 05/01105 OO;OO 12:00 00:00 12:00 00:00 12:00 00:00 12:00 00:00 12:00 DO:OO U.S. Patent Apr. 5, 2016 Sheet 13 of 28 US 9,306,969 B2

North America Group r------·1 --·------..--"·----··-- -- f

~2/31104 Time

Europe Group 1~----M-r ------;------T-- 5 ··-r------1 ::;(i) c I --1 -~ 4 l[__ : ~ rr\ 3 ~0 N '\ i ID . - l ' c § i ' \ 0 2 l z >- (/) tl\j \j \ i 0 ------L-~~\ ------'-~-:____- ~-~---- - i2i3'1f04 0110il05 01102/05 01103!05 Oi/04105 0!/05iC5 Time

JB s:J E U3 c _Q u Q c c 0 0 z >­(/)

12/31/04 O!iOi/05 Oi/02105 Oi/03/05 Oi/G4i05 Oi/05/05 Time U.S. Patent Apr. 5, 2016 Sheet 14 of 28 US 9,306,969 B2

F!GURE 9l\

Time after release (hours) U.S. Patent Apr. 5, 2016 Sheet 15 of 28 US 9,306,969 B2

xl04 I 8 l------r-----··-- -··------·----T------··--- ·-··-···----,------,----- ! 161· -' \ I B ~ ~ 0 i411_.. ' \

~ i/ ly ~ . l ~ 2 i ! \, ro 1 t·· ..J. ' c i 0 ! ¥Jl '\ ·~ . i ! \ :::; 10r-1 ~

Q..g- !: I! \ -g s 15 l/!s '\\ ~ ~ i!,J g! \ vr ~

00:00 04:00 08:00 i2:00 16:00 20:00 24:00 Release time (UTC hours) U.S. Patent Apr. 5, 2016 Sheet 16 of 28 US 9,306,969 B2

·-...._,~ "1005 ''-. ... 10i0 ''-..__,, "1015 -~ ...... ______:::~"-~----·······-----, :------· ·------~--·---~------, 1··-······------>~...... ------

I I I I THIRD-PARTY DISTRIBUTED SELF-RECONNAiSSANCE RECONNAISSANCE RECONNA.iSSANCE - I j L_ ---- - ______------' L U.S. Patent Apr. 5, 2016 Sheet 17 of 28 US 9,306,969 B2

query A I~~~ queryB \ ~ \ ! ' ~ U.S. Patent Apr. 5, 2016 Sheet 18 of 28 US 9,306,969 B2

FlGURE 12

------·-···-----· --·------···------···------~------·--.' 1~:~; ASN_QJ Nod~ Out-Dearee i !1 Everyone's Internet (AS 13749) 36,875 i2 !quest (AS 7332} 32, 159 7 !3 UUNet (li..S 701} 31,682 5 4 UPC Broadband (AS 6830) 26,502 8 5 E-xpedient (AS i 7054) 19,530 4 U.S. Patent Apr. 5, 2016 Sheet 19 of 28 US 9,306,969 B2

1305 ~ r----' ------··1 ! I IDENTIFY UST OF KNOWN OR I PROBABLE LEGlT!MATE EMAIL i SERVERS THAT IS US!NG DNSBL I

I t L .. ~--T -······-J i3~-~------· t----~-1

II FOR EACH KNOWN OR ':, PROBABLE LEGITIMATE EMA!L SERVER, DER!VE A VERAGE LOOK-UP ARRIVAL RATE I

------1------'! I ~~~- : ~v ~o ~ I :·----··-'c.______!______·---1 l i ANALYZE AVERAGE LOOK-UP !: RATE FOR SOURCE !P I i 13:0 ~---1~------··

,----·-·.---·--··---·-.L______~ I !F AVERAGE LOOK-UP RATE FOR I

J SOURCE IP !S SlGN!F!CANTL Y l' i DlFFERENT THAN AVERAGE ! LOOK-UP FOR LEG!T!M,A.TE :SERVERS, CONSIDER SOURCE ! !PA BOT ) L------_ ___j U.S. Patent Apr. 5, 2016 Sheet 20 of 28 US 9,306,969 B2

FlGURE 14

Attacker(s) performing DNSBL reconnaissance

...... ~ ~ ...... ~ - ......

------ti --,r------,

DNS~based I ------~ I__ Biack!ist C&C Commands Sparnminga•/ Bots_/- / _.- legitimate DNSBL 1ookups from viclim's mai!s;.:,rver

------~ Ii -n

Spam recipient ~--,:::;.~~~- L--- U.S. Patent Apr. 5, 2016 Sheet 21 of 28 US 9,306,969 B2

F~GURE ·15

1505 \

l/l ~:s~: ~~E~~ -, LOGS

! '----,------r---~J

i510 \\ _____ J______l i' i ! DNSBL QUERIES I I PARSED .

A 5'.-5------· ---i--- _J

: i \\ ~ r----"--______J ______l

I r DNSBL QUERIES ' PRUNED

j -- -~-----..-- -- ~------

! i 1520 ' \ \ J: ______l I I G CONSTRUCTED i

1 i '--~-~------;------; I i I •:!:) ,.2" ...., \ l ,--·--·-··-·-·_)'\ w ______, I : I I PERFORM ! QUERY GRAPH i i EXTRAPOLA T!ON iL ______J: U.S. Patent Apr. 5, 2016 Sheet 22 of 28 US 9,306,969 B2

F~GURE 16

""------1 i

CONSTRUCTGRAPH() create empty directed graph G I //Parsing I for each DNSBL query: Identify querier and queried I I/Pruning i if querier in B or queried in B then l add querier and queried to G if they ! are not already members of G i if there exists an edge E (querier, queried} in G then ( increment the waight of that edge else l add E (querier, queried'; to G with weight i I ______J U.S. Patent Apr. 5, 2016 Sheet 23 of 28 US 9,306,969 B2

1705 U.S. Patent Apr. 5, 2016 Sheet 24 of 28 US 9,306,969 B2

F~GURE 18

T1 T2

TTL -t

1

time 1,..... ------~'------~-"'-i ___,_ ___ ,____,,.___ __L ______U.S. Patent Apr. 5, 2016 Sheet 25 of 28 US 9,306,969 B2

1905 \ \ 1·-;R RO~T~~~- -·1 i ADDRESSES ARE i ORGANIZED !NTO i ORGAN!ZAT!ONAL I UNlTS I ··------~,~------1910 \ I ,-F::.EA:H 1 ' U~iT: I j CALCULATING THE I l CPRS I

'-.--. --~ _J

1915 \ _1_ ___ i-:~~~--;;E -SORTED i i lNA UST!N ! DESCENDING I ! ORDER /',CCORD!NG I I TO CPRS VALUES I I I [--~------: U.S. Patent Apr. 5, 2016 Sheet 26 of 28 US 9,306,969 B2

i~L ______,~______*

R L~~J~.

*! ! i L------·-·---1- U.S. Patent Apr. 5, 2016 Sheet 27 of 28 US 9,306,969 B2

FIGURE 21

2105 !~------··- i i i I NUMBER OF ADS's tS

iI sr::TTOL "1" I

l ·------·-y 21 iO I i-~--L __ I

I QUERY SENT AND j I TLL PERIOD IS \ l _OBSERV:O _ _j

2! 15 t r~ __ J ______l

I WA!T!NG PERIOD IS OBSERVED i i t ! ______j ! 2120 ' [ f __' , ______lt ______i i t i ADD!T!ONAL QUERY i SENT AND TLL k- I 1 I OBSERVED . '{ES L______J i i 2125 I . ___ J___ -·------2130 /,'-.._"-- // "-- ' INCREMENT ADS ;--(AS ADS cm~~ ! VALUEBY"1"iF ------..~./ SEEN )

REQUIREMENTS MET 1 "~\tCREMENTED?--· I i "-- / i i '" ,/ L______I ~(/ ; NO ~ 2'140 ~

EX!T U.S. Patent Apr. 5, 2016 Sheet 28 of 28 US 9,306,969 B2

2205 \ \ 1--~------···-····-·-··-·1

FOR EACH DOMAIN, i DNS SOA !S I CONSULTED FOR ! TTL ! I 1-----.-·--·-·····-··.J

22 rn '·""· __ _:c, ______vI______.,

FOR ORN, x I THREADS CREATED ND SYNCHRONIZED TO PERFORM DNS I ~ QUERIES I ! SIMULTANEOUSLY I L_. / <::--·-···-' / ',

222,~':~------1~_· , 2;r,i··:···.,. _, i ' i i , -iR'->=1 s~-,-~ D"TA I i OPEN RECURSIVE i ~"'':. t::t for on the bot itself. Thus, the a value of a bot could be close to all i's, wheret is a threshold, the source IP is considered a bot. that oflegitimate servers. Thus, an additional method can be The above process of measuring the arrival rates of the legiti­ used to detect bots performing distributed reconnaissance. mate servers is repeated for every n time intervals. The com­ The temporal arrival pattern of queries at the DNSBL by parison of the arrival rate from a source IP, A', with the normal hosts performing reconnaissance may differ from temporal 25 values, A,' s, is performed using the A' and A,' s computed over characteristics of queries performed by legitimate hosts. With the same period in time. legitimate mail server's DNSBL look-ups, the look-ups are FIG. 15 illustrates a method for constructing a DNSBL typically driven automatically when email arrives at the mail query graph, according to one embodiment of the invention. server and will thus arrive at a rate that mirrors the arrival rates Referring to FIG. 15, in 1505 a set of DNSBL query logs is of email. Distributed reconnaissance-based look-ups, on the 30 input. In 1510, the DNSBL queries are parsed to include only other hand, will not reflect any realistic arrival patterns of querier or queried IP addresses. In 1515, the DNSBL queries legitimate email. In other words, the arrival rate of look-ups are then pruned to include only IP addresses which are present from a bot is not likely to be similar to the arrival rate of in a set B, which is a set ofkuown bot IP addresses. In 1520, look-ups from a legitimate email server. a graph G is a DNSBL query graph constructed using the FIG. 13 illustrates the process of determining whether the 35 input from 1505-1515. G illustrates all IP addresses that are arrival rate of look-ups from a source IP are similar to the querier or queried by the DNSBL pruned queries. Thus, G arrival rate oflook-ups from legitimate email servers, accord­ illustrates all suspect IP addresses that either queried, or were ing to one embodiment of the invention. In 1305, a list of queried by the suspect IP addresses in set B. In 1525, to kuown or probable legitimate email servers that are using the address the situation where both the querier or queried nodes DNSBL service is identified. This can be done, for example, 40 from the DNSBL query set are members ofB, a query graph as set forth below: extrapolation is performed. Here a second pass is made and If the DNSBL is subscription-based or has access control, edges are added if at least one of the endpoints of the edge use a list of approved users (the email servers) to record the IP (i.e., either querier or queried) is already present on the graph addresses that the servers use for accessing the DNSBL ser­ G. vice. Enter these addresses into a list of Known Mail Server 45 FIG. 16 is an algorithm setting forth the method explained IPs. in FIG. 15, according to one embodiment of the invention. If the DNSBL service allows anonymous access, monitor FIG. 12 sets forth a table of nodes, found utilizing the algo­ the source IPs of incoming look-up requests, and record a list rithm in FIG. 16, which has the highest out-degrees, and the of unique IP addresses (hereinafter "Probable Known Mail number of hosts that are kuown spammers (appearing in a Server IPs"). For each IP address in the Probably Known Mail 50 spam sinkhole). Server IPs list: In addition to finding bots that perform queries for other IP Connect to the IP address to see ifthe IP address is running addresses, the above methods also lead to the identification of on a kuown mail server. If a barmer string is in the return additional bots. This is because when a bot has been identified message from the IP address, and its responses to a small set as performing queries for other IP addresses, the other of SMTP commands, e.g. VRFY, HELO, EHLO, etc., match 55 machines being queried by the bot also have a reasonable kuown types and formats of responses associated with a typi- likelihood of being bots. cal kuown mail server, then the IP address is very likely to be The above methods could be used by a DNSBL operator to a legitimate email server, and in such a case, enter it into the take countermeasures (sometimes called reconnaissance poi­ list of Known Mail Server IPs. soning) towards reducing spam by providing inaccurate Those of skill in the art will understand that other methods 60 information for the reconnaissance queries. Examples of may be used to compile a list of kuown legitimate email countermeasures include a DNSBL communicating to a bot- servers. In 1310, for each of the kuown or probable legitimate master that a bot was not listed in the DNSBL when in fact it email servers, its look-ups to DNSBL are observed, and its was, causing the botmaster to send spam from IP addresses average look-up arrival rate A., for a time interval (say, a that victims would be able to more easily identify and block. 10-minute interval) is derived. This can be done, for example, 65 As another example, a DNSBL could tell a botmaster that a by using the following simple estimation method. For n inter­ bot was listed in the blacklist when in fact it was not, poten- vals (say n is 6), for each interval, the number of look-ups tially causing the botmaster to abandon (or change the use of) US 9,306,969 B2 15 16 a machine that would likely be capable of successfully send­ unknown domain, we can then surmise that the domain is ing spam. The DNSBL could also be intergrated with a sys­ used for malicious purposes (e.g., botnet coordination). tem that performs bot detection heuristics, as shown in FIG. Referring to FIG. 17, one embodiment of a DNS cache 14. FIG. 14 illustrates spamming bots and a C&C performing inspection technique is as follows: In 1705, probes are done reconnaissance, attempting to get DNSBL information. for open recursion, and open recursive servers are identified. Legitimate DNSBL lookups from a victim's computer are Open recursive servers are servers that will perform recursive also being requested. A DNSBL responds to the bots, the DNS lookups on behalf of queries originating outside of their C&C, and the legitimate computer, but the DNSBL may network. In 1710, priority ranking of domains is performed. respond in different ways. For example, the DNSBL may tell (This process is described in more detail later.) The output of the bot computers wrong information in response to their 10 1705 and 1710 (which can be independent phases) is then DNSBL requests in order to confuse the botnet, while return­ used in a non-recursive query in 1715. In 1720, analysis is ing correct information to legitimate servers. performed, including: (a) determining the relative ranking of In addition, a kuownreconnaissance query could be used to boost confidence that the IP address being queried is in fact botnet sizes, (b) estimating the number of infected individu- als/bots within a botnet, and ( c) assessing whether and to what also a spamming bot. Furthermore, DNSBL lookup traces 15 would be combined with other passively collected network extent a given network has infected computers. Since infec­ data, such as SMTP connection logs. For example, a DNSBL tions are dynamic, ongoing probes are needed. Thus, the query executed from a mail server for some IP address that did analysis from 1720 can also be used to redo 1715 and priori­ not recently receive an SMTP connection attempt from that IP tize the work performed in 1715. address also suggests reconnaissance activity. 20 Identifying Open Recursive Servers DNS Cache Snooping Open recursive servers can be identified to, for example: FIGS. 17-18 illustrate a technique to estimate the popula­ (a) estimate botnet populations, (b) compare the relative sizes tion of bots within a network through DNS cache inspection of botnets, and ( c) determine if networks have botnet infec­ or snooping, according to one embodiment of the invention. tions based on the inspection of open recursive DNS caches. DNS non-recursive queries (or resolution requests for 25 Open recursive DNS servers are DNS servers that respond domains that the DNS server is not authoritative for) are used to any user's recursive queries. Thus, even individuals outside to check the cache in a large number of DNS servers on the of the network are permitted to use the open recursive DNS Internet to infer how many bots are present in the network server. The cache of any DNS server stores mappings served by each DNS server. DNS non-recursive queries between domain names and IP addresses for a limited period instruct the DNS cache not to use recursion in finding a 30 of time, the TTL period, which is described in more detail response to the query. Non-recursive queries indicate in the above. The presence of a domain name in a DNS server's query that the party being queried should not contact any cache indicates that, within the last TTL period, a user had other parties if the queried party cannot answer the query. requested that domain. In most cases, the user using the DNS Recursive queries indicate that the party being queried can server is local to the network. contact other parties if needed to answer the query. 35 In 1705 of FIG. 17, networks are scanned for all DNS In general, most domain names that are very popular, and servers, and the networks identify the servers that are open thus used extensively, are older, well-kuowndomains, such as recursive DNS servers. A DNS server (and thus, an open google.com. Because of the nature of botnets, however, recursive DNS server) can be operated at almost any address although they are new, they are also used extensively because within the IPv4 space (i.e., that portion not reserved for spe­ bets in the botnet will query the botnet C&C machine name 40 cial use). We refer to this usable IPv4 address space as a more frequently at the local Domain Name Server (LDNS), "mutable address". and hence, the resource record of the C&C machine name will To speed up the search for all DNS servers on the Internet, appear more frequently in the DNS cache. Since non-recur­ 1705 breaks up the mutable space into organizational units. sive DNS queries used for DNS cache inspection do not alter The intuition is that not all IPv4 addresses have the same the DNS cache (i.e., they do not interfere with the analysis of 45 probability of running a DNS server. Often, organizations run bot queries to the DNS), they can be used to infer the bot just a handful of DNS servers, or even just one. The discovery population in a given domain. Thus, when the majority of of a DNS server within an organizational unit diminishes (to local DNS servers in the Internet are probed, a good estimate a non-zero value) the chance that other addresses within the of the bot population in a botnet is found. same organization's unit are also DNS servers. DNS cache inspection utilizes a TTL (time-to-live) value 50 1705 is explained in more detail in FIG. 19, according to (illustrated in FIG.18) of the resource recordofa botnet C&C one embodiment of the invention. In 1905, the IPv4 mutable domain to get an accurate view of how long the resource addresses (using, for example, Request for Comments (RFC) record stays in the DNS cache. (Note that IP addresses change 3330) (note that an RFC is a document in which standards and/or the DNS server can only remember cache information relating to the operation of the Internet are published) is for a certain amount of time.) When the resource record is 55 organized into organizational units (using for example, RFC saved in the cache, (e.g., as a result of the first DNS look up of 1446). In 1910, for each organizational unit in 1905, the the C&C domain from the network), it has a default TTL following calculations are performed to obtain the classless value, set by the authoritative DNS server. As time goes on, interdomain routing (CIDR) Priority Ranking Score the TTL value decreases accordingly until the resource record ("CPRS"): is removed from the cache when the TTL value drops to zero. 60 a. For each DNS server kuown to exist in the organizational Referring to FIG. 18, three caching episodes are illustrated, unit, add 1.0. each with a beginning point in time b 1, b2, and b3, and an end b. For each IP address unit that has previously been seen to point in time el, e2, e3. The distance between caching epi­ not run a DNS server, add 0.01. sodes is described as Tl, T2, etc. Thus, if we see many c. For each IP address unit for which no information is caching episodes (or "shark fins") on FIG. 18, we can deter- 65 available, add 0.1. mine that a large number ofhosts are attempting to contact the In 1915, the organizational units are sorted in descending C&C domain. If the C&C domain is a relatively new and order according to their CPRS values. US 9,306,969 B2 17 18 Domain Ranking Some load balancing is performed by a load balancing 1710 of the DNS cache inspection process (which can be switch (often in hardware) that uses a hash of the 4-tuple of independent ofl 705) produces a set of candidate domains. In the source destination ports and IP addresses to determine other words, this phase generates a list of "suspect" domains which DNS server to query. That is, queries will always reach that are likely botnet C&C domains. There are multiple tech­ the same DNS server if the queries originate from the same nologies for deriving such a suspect list. For example, one can source IP and port. To accommodate this type ofload balanc­ use DDNS or IRC monitoring to identify a list of C&C ing, a variation of the above steps can be performed. 2115 domains. Those of ordinary skill in the art will see that DDNS through 2135 can be performed on different machines with monitoring technologies can yield a list ofbotnet domains. distinct source IPs. (This may also be executed on a single 10 multihomed machine that has multiple IP addresses associ­ Cache Inspection ated with the same machine and that can effectively act as 1715 of the DNS cache inspection process combines the multiple machines.) Thus, instead of starting three threads outputs ofl 705and1710. Foreachdomainidentifiedin 1710, from a single source IP address, three machines may each a non-recursive query is made to each non-recursive DNS start a single thread and each be responsible for querying the server identified in 1705. Thus, for the top N entries (i.e., the 15 DNS server from a distinct source IP. One of the machines is N units with the lowest scores in 1915), the following steps elected to keep track of the ADS count. The distributed are performed to determine ifthe DNS server is open recur­ machines each wait for a separate wait period, w 1 , w 2 , and w 3 , sive: per step 2115. The distributed machines coordinate by report­ a. A non-recursive query is sent to the DNS server for a ing the outcome of the results in steps 2120-2130 to the newly registered domain name. This step is repeated with 20 machine keeping track of the ADS count. appropriate delays until the server returns an NXDOMAIN If all DNS queries use only (stateless) UDP packets, the answer, meaning that no such domain exists. queries may all originate from the same machine, but forge b. A recursive query is then immediately sent to the DNS the return address of three distinct machines progranimed to server for the same domain name used in the previous non­ listen for the traffic and forward the data to the machine recursive query. If the answer returned by the DNS server is 25 keeping track of the ADS count. the correct resource record for the domain (instead ofNXDO­ Once the ADS count has been determined for a given DNS MAIN), the DNS server is designated as open recursive. server, cache inspection can be performed according to the Determine Number of DNS Servers procedure in FIG. 22. In 2205, each domain identified in 1710 Once an open recursive server is discovered, its cache can is called a Domains. For each Domains, the DNS start of be queried to find the server's IP address. Often the server's IP 30 authority (SOA) is consulted forthe TTL. This value is called address can be hard to discover because of server load bal- TTLsoA In 2210, for an ORN, x threads are created, where ancing. Load balancing is when DNS servers are clustered x=ADS*2 (2 times the number of Assumed DNS Servers). into a farm, with a single external IP address. Requests are The threads are synchronized to perform DNS queries simul­ handed off (often in round-robin style) to an array of recursive taneously according to the following procedure. For DNS machines behind a single server or firewall. This is 35 Domains, illustrated in FIG. 20. Each DNS machine maintains its own A master thread waits for half the TTLsoA period, and then unique cache, but the DNS farm itself presents a single IP instructs the child threads to send their DNS queries. (Since address to outside users. Thus, an inspection of the DNS there are twice as many queries as ADS, there is a high cache state could come (randomly) from any of the machines probability that each of the DNS servers will receive once of behind the single load balancing server or firewall. 40 the queries.) This problem is addressed by deducing the number ofDNS If any of the threads querying an 0 RN (an open recursive machines in a DNS farm. Intuitively, multiple non-recursive DNS server) reports the ORN not having a cache entry for inspection queries are issued, which discover differences in Domains, repeat step (a) immediately. TTL periods for a given domain. This indirectly indicates the If all of the threads reports that the ORN has a cache entry presence of a separate DNS cache, and the presence of more 45 for Domains, the smallest returned TTL for all of the threads than one DNS server behind a given IP address. is called TTLmim and all of the threads for TTLmin -1 seconds FIG. 21 illustrates a procedure used to deduce the number sleep before waking to repeat step (a). of DNS servers behind a load balancing server or firewall, In 2215, the above cycle, from 2210(a) to 2210(c), builds a according to one embodiment of the present invention. For time series data set of Domains with respect to an open each open recursive DNS server (ORN), it is determined ifthe 50 recursive DNS server. This cycle repeats until Domains is no DNS service is behind a load balancing server or firewall and longer of interest. This occurs when any of the following if so the number of servers is estimated as follows: In 2105, takes place: thenumberofAssumedDNS Servers (or"ADS") is set to "1". a. Domains is removed from the list of domains generated In 2110, an existing domain is recursively queried for, and the by 1710. That is, Domains is no longer of interest. TTL response time is observed. This can be called the TTL 55 b. For a period of x TTLsoA consecutive periods, fewer than response TTL0 , and can be placed into a table of Known TTL y recursive DNS servers identified in 1705 have any cache Values ("KTV"). In 2115, a period ofw u w2 , and w3 seconds entries for Domains. That is, the botnet is old, no longer is waited, where all values ofw are less than all KTV entries. propagating, and has no significant infected population. In

In2120, afterw 1 , w2 , w3 seconds, another query is sent to the practice, the sum of the x TTLSOA period may total several server. The corresponding TTL response times are observed 60 weeks. and called TTL1 , TTL2 , and TTL3 . In2125, ifw1 +TTL1 does In 2220, the cycle from steps 2210(a) to 2210(c) can also not equal any value already in KTV, then TTL1 is entered into stop when the open recursive DNS server is no longer listed as the KTV table, and the number of ADS's is incremented by open recursive by 1705 (i.e., the DNS server can no longer be one. This is repeatedforw2 +TTL2 , and w3 +TTL3 . In2130, it queried). is determined if the ADS count has not been incremented. If 65 Analysis not, in 2140, the system is exited. Ifyes, steps 2120-2130 are The analysis phase 1720 takes the cache observations from repeated until the number of ADS' s does not increase. 1715, and for each domain, performs population estimates. In US 9,306,969 B2 19 20 one embodiment, the estimates are lower and upper bound itself a Poisson process, with the same arrival rate. This sam­ calculations of the number of infected computers in a botnet. pling is called the "Constructed Poisson" process. For example, a botnet could be estimated to have between For M observations, the estimated upper bound (u) arrival 10,000 and 15,000 infected computers. One assumption rate Aµ is: made is that the requests from all the bots in a network follow the same Poisson distribution with the same Poisson arrival rate. In a Poisson process, the time interval between two _!_=f!l consecutive queries is exponentially distributed. We denote ,\u i=l M the exponential distribution rate as A. Each cache gap time interval. T,, ends with a new DNS query from one bot in the 10 local network, and begins some time after the previous DNS The population of victims needed to generate the upper query. Thus, in FIG. 18, the cache interval for the first bat's bound arrival rate Aµ can therefore be estimated as: request occurs between b 1 and e 1 . The time interval T 1 mea­ sures the distance between the end of the first caching episode 15 ev and the start of the second b2 . As illustrated in FIG. 18, for a given domain, each name resolution (DNS query) by a bot triggers a caching event with a fresh TTL value that decays linearly over time. The time between any two caching episodes is designated T,. The CONCLUSION "memoryless" property of exponential distribution indicates 20 that the cache gap time interval T, follows the same exponen­ While various embodiments of the present invention have tial distribution with the same rate A, no matter when the been described above, it should be understood that they have cache gap time interval begins. A function is said to be memo­ been presented by way of example, and not limitation. It will ryless when the outcome of any input does not depend on be apparent to persons skilled in the relevant art( s) that vari­ prior inputs. All exponentially distributed random variables 25 ous changes in form and detail can be made therein without are memoryless. In the context of the DNS cache inspection, departing from the spirit and scope ofthe present invention. In this means that the length of the current cache interval T, does fact, after reading the above description, it will be apparent to not depend on the length of the previous cache interval T,_ . 1 one skilled in the relevant art(s) how to implement the inven- Lower Bound Calculation. tion in alternative embodiments. Thus, the present invention A lower bound can be calculated on the estimated bot 30 should not be limited by any of the above-described exem­ population. For the scenario depicted in the figure above, plary embodiments. there was at least one query that triggered the cache episode In addition, it should be understood that the figures and from b to e . While there may have been more queries in each 1 1 algorithms, which highlight the functionality and advantages caching episode, each caching event from b, toe, represents at of the present invention, are presented for example purposes least a single query. 35 only. The architecture of the present invention is sufficiently If A is a lower bound (I) for the arrival rate, and T, is the 1 flexible and configurable, such that it may be utilized in ways delta between two caching episodes, and M is the number of other than that shown in the accompanying figures and algo­ observations, for M+l cache inspections, A can be estimated 1 rithms. as: 40 Further, the purpose of the Abstract of the Disclosure is to enable the U.S. Patent and Trademark Office and the public generally, and especially the scientists, engineers and practi­ 1 .{;!i T; + TTL .{;!i T; - = L_, --- =TTL+ L_,­ tioners in the art who are not familiar with patent or legal ,\i i=l M i=l M terms or phraseology, to determine quickly from a cursory 45 inspection the nature and essence of the technical disclosure of the application. The Abstract of the Disclosure is not Using analysis of a bot (e.g., by tools for bot binary analy­ intended to be limiting as to the scope of the present invention sis), the DNS query rate A can be obtained for each individual in anyway. bot. Then from the above formula, the estimate of the bot population N, in the network can be derived as follows: 50 What is claimed is: 1. A method of detecting a collection of compromised networks and/or computers, comprising: A ,\i N-­, - A performing processing associated with collecting Domain Name System (DNS) data, utilizing a detection system 55 in communication with a database, the DNS data gener­ Upper Bound Calculation. ated by a DNS server and/or similar device, wherein the During a caching period, there are no externally observable DNS data comprises DNS queries, wherein the collected effects of bot DNS queries. In a pathological case, numerous DNS data comprises DNS query rate information, and queries could arrive just before the end of a caching episode, wherein the collecting DNS data from the DNS server e,. An upper bound can be calculated on the estimated bot 60 comprises: population. Define Aµ as the upper bound estimate of the performing processing associated with identifying a Poisson arrival rate. For the upper bound estimate, there are command and control (C&C) computer in a first net­ queries arriving between the times b, and e,. The time inter­ work: vals T,, however, represent periods of no arrivals, and can be when the DNS data of a computer has an exponential treated as the sampled Poisson arrival time intervals of the 65 request rate, wherein determining the exponential underlying Poisson arrival process. It is fundamental that request rate comprises sorting DNS request rates random, independent sample drawn from a Poisson process is per current epoch and determining whether there is US 9,306,969 B2 21 22 exponential activity over the current epoch and an puters is accomplished without contacting any networks or epoch longer than the current epoch; and computers in the collection of compromised networks and/or performing processing associated with recording an computers. IP address and/or traffic information from a com­ 10. The method of claim 1, wherein collecting DNS data promised computer when the compromised com­ comprises: puter contacts another computer; performing processing associated with determining performing processing associated with examining the whether a source Internet Protocol (IP) address perform­ collected DNS data relative to DNS data from known ing reconnaissance belongs to a compromised computer, compromised and/ or uncompromised computers; and the source IP address looking up at least one subject IP performing processing associated with determining an 10 address; and existence of the collection of compromised networks when the source IP is known to belong to a compromised and/or computers, and/or an identity of compromised computer, performing processing associated with desig­ networks and/or computers, based on the examina­ nating the at least one subject IP addresses as a compro­

tion. 15 mised computer. 2. The method of claim 1, wherein the performing process­ 11. The method of claim 10, wherein determining whether ing associated with identifying a command and control the source IP address belongs to a compromised computer (C&C) computer in a first network further comprises: comprises: performing processing associated with determining performing processing associated with determining whether a computer has a suspicious DNS request rate, 20 whether the source IP address is a known compromised comprising: computer utilizing a DNS-based Blackhole List performing processing associated with calculating a (DNSBL) and/or another list of compromised comput­ canonical sub-level domain (SLD) request rate for a ers. given SLD, wherein the canonical SLD request rate is 12. The method of claim 11, wherein determining whether calculated as the total number ofrequests to third level 25 the source IP address belongs to a compromised computer domains (3LDs) present in the given SLD plus any comprises: request to the given SLD, and performing processing associated with determining performing processing associated with determining whether the source IP address is also the subject IP whether the canonical SLD request rate of the given address. SLD significantly deviates from the mean of canoni- 30 13. The method of claim 11, wherein determining whether the source IP address belongs to a compromised computer cal request rates of SLDs. comprises: 3. The method of claim 1, wherein the performing process­ performing processing associated with determining a look­ ing associated with identifying a command and control up ratio for the source IP address, the look-up ratio (C&C) computer in a first network further comprises: 35 comprising the number of IP addresses the source IP when the DNS request rate is suspicious, performing pro­ address queries divided by the number of IP addresses cessing associated with determining whether the DNS that issue a look-up for the source IP address; and data has an exponential request rate comprising: when the look-up ratio for the source IP address is high, performing processing associated with sorting DNS designating the source IP address as a compromised request rates per epoch, and 40 computer. performing processing associated with determining 14. The method of claim 11, wherein determining whether whether there is exponential activity over a longer the source IP address belongs to a compromised computer time epoch. comprises: 4. The method of claim 1, wherein collecting DNS data performing processing associated with determining a look­ further comprises: 45 up ratio for the source IP address, the look-up ratio performing processing associated with replacing an IP comprising the number of IP addresses the source IP address of the C&C computer with an IP address of queries divided by the number ofIP addresses that issue another computer, causing the compromised computer a look-up for the source IP address; seeking to contact the C&C computer to be redirected to when the look-up ratio for the source IP address is low, the other computer. 50 performing processing associated with determining 5. The method of claim 4, wherein the other computer whether the look-up arrival rate mirrors the email arrival comprises a sinkhole computer. rate; and 6. The method of claim 4, further comprising: when the look-up arrival rate does not mirror the email performing processing associated with isolating the collec­ arrival rate, performing processing associated with des­ tion of compromised networks and/or computers from 55 ignating the source IP address as a compromised com­ its C&C computer, causing the collection of compro­ puter. mised networks and/or computers to lose the ability to 15. The method of claim 14, wherein determining whether act as a coordinated group. the look-up arrival rate mirrors the email arrival rate further 7. The method of claim 5, further comprising: comprises: analyzing traffic from the compromised computer to the 60 performing processing associated with identifying a list of sinkhole computer to obtain information about a mal­ known and/or probably legitimate IP addresses using a ware author. DNSBL service; 8. The method of claim 1, further comprising: for each of the known and/or probably legitimate IP utilizing time zone and time of release information to pre­ addresses, performing processing associated with deter­ dict optimal release time information for an attack. 65 mining its average look-up arrival rate; 9. The method of claim 1, wherein determining the exist­ performing processing associated with determining an ence of the collection of compromised networks and/or com- average look-up arrival rate from the source IP address; US 9,306,969 B2 23 24 performing processing associated with comparing the 22. The method of claim 18, wherein identifying open average look-up rates of the known and/or probably recursive DNS servers comprises: legitimate IP addresses to the arrival rate from the source performing processing associated with organizing IPv4 IP address; and mutable addresses into units; when the average look-up rates of the known and/or prob­ performing processing associated with determining a ably legitimate IP addresses differ significantly from the classless interdomain routing (CIDR) priority ranking arrival rate from the source IP address, performing pro­ score (CPRS) value for each unit utilizing DNS server cessing associated with designating the source IP information; address as a compromised computer. performing processing associated with sorting the units a 16. The method of claim 15, wherein identifying a list of 10 list in descending order utilizing the CPRS value; and known IPs comprises: performing processing associated with determining when the DNSBL service has controlled access, perform­ whether a DNS server is an open recursive DNS server ing processing associated with recording IP addresses of for DNS servers in the top of the list.

approved users. 15 23. The method of claim 22, wherein determining the 17. The method of claim 15, wherein identifying a list of CPRS value comprises: probably legitimate IPs comprises: performing processing associated with giving a value of when the DNSBL service allows anonymous access, per­ 1.0 for each DNS server known to exist in the unit; forming processing associated with monitoring the performing processing associated with giving a value of source IP addresses of incoming look-up requests, and 20 0.01 for each IP address known to run on a DNS server; recording these source IP addresses; and performing processing associated with connecting to the IP performing processing associated with giving a value of address to determine whether the IP address is running 0.1 for each IP address with no DNS server information. on a known server; and 24. The method of claim 22, further comprising: when the IP address is running on a known server, perform- 25 performing processing associated with determining the ing processing associated with designating the IP number ofDNS servers behind a load balancing server. address as probably legitimate. 25. The method of claim 1, wherein the DNS data is non­ 18. The method of claim 1, wherein collecting DNS data recurs1ve. comprises: 26. The method of claim 1, wherein the performing pro- performing processing associated with identifying open 30 cessing associated with collecting Domain Name System recursive DNS servers; and (DNS) data comprises: performing processing associated with priority ranking analyzing the DNS data to determine whether the DNS domain names. data has an exponential request rate. 19. Themethodofclaim18, wherein the determining com- 27. A system for detecting a collection of compromised 35 prises: networks and/or computers, comprising: performing processing associated with utilizing the open a computer constructed and arranged to perform process­ recursive DNS servers and the priority-ranked domain ing associated with collecting Domain Name System names to determine whether the open recursive DNS (DNS) data, utilizing a detection system in communica­ servers are compromised computers. 40 tion with a database, the DNS data generated by a DNS 20. The method of claim 19, further comprising: server and/or similar device, wherein the DNS data com­ performing processing associated with ranking sizes of prises DNS queries, wherein the collected DNS data networks of compromised computers; comprises DNS query rate information, and wherein the performing processing associated with estimating a num­ collecting DNS data from the DNS server comprises: ber of compromised computers in a network; 45 performing processing associated with identifying a performing processing associated with assessing to what command and control (C&C) computer in a first net- extent a given network has compromised computers; work performing processing associated with determining a when the DNS data of a computer has an exponential lower bound calculation of a compromised computer request rate, wherein determining the exponential population; or 50 request rate comprises sorting DNS request rates performing processing associated with determining an per current epoch and determining whether there is upper bound calculation of a compromised computer exponential activity over the current epoch and an population; or epoch longer than the current epoch; and any combination of the above. performing processing associated with recording an 21. The method of claim 19, wherein utilizing the open 55 IP address and/or traffic information from a com­ recursive DNS servers and the priority-ranked domains in a promised computer when the compromised com­ recursive query comprises: puter contacts another computer; performing processing associated with sending at least one performing processing associated with examining the non-recursive query to the DNS server for a newly reg­ collected DNS data relative to DNS data from known istered domain until the DNS server returns an NXDO- 60 compromised and/ or uncompromised computers; and MAIN answer; performing processing associated with determining an performing processing associated with immediately send­ existence of the collection of compromised networks ing a recursive query to the DNS server for the newly and/or computers, and/or an identity of compromised registered domain; and networks and/or computers, based on the examina­ performing processing associated with designating the 65 tion. DNS server as open recursive when the answer returned 28. The system of claim 27, wherein the computer is con­ by the DNS server is not NXDOMAIN. structed and arranged to perform processing associated with US 9,306,969 B2 25 26 identifying a connnand and control (C&C) computer in a first when the source IP is known to belong to a compromised network by further: computer, performing processing associated with desig­ performing processing associated with determining nating the at least one subject IP addresses as a compro­ whether a computer has a suspicious DNS request rate, mised computer. comprising: 37. The system of claim 36, wherein the computer is con­ performing processing associated with calculating a structed and arranged to determine whether the source IP canonical sub-level domain (SLD) request rate for a address belongs to a compromised computer by further: given SLD, wherein the canonical SLD request rate is performing processing associated with determining calculated as the total number ofrequests to third level whether the source IP address is a known compromised domains (3LDs) present in the given SLD plus any 10 computer utilizing a DNS-based Blackhole List request to the given SLD, and (DNSBL) and/or another list of compromised comput- performing processing associated with determining ers. whether the canonical SLD request rate of the given 38. The system of claim 27, wherein the computer is con­

SLD significantly deviates from the mean of canoni- 15 structed and arranged to determine whether the source IP cal request rates of SLDs. address belongs to a compromised computer by further: 29. The system of claim 27, wherein the computer is con­ performing processing associated with determining structed and arranged to perform processing associated with whether the source IP address is also the subject IP identifying a connnand and control (C&C) computer in a first address. network by further: 20 39. The system of claim 27, wherein the computer is con­ when the DNS request rate is suspicious, performing pro­ structed and arranged to determine whether the source IP cessing associated with determining whether the DNS address belongs to a compromised computer by further: data has an exponential request rate comprising: performing processing associated with determining a look­ performing processing associated with sorting DNS up ratio for the source IP address, the look-up ratio request rates per epoch, and 25 comprising the number of IP addresses the source IP performing processing associated with determining address queries divided by the number of IP addresses whether there is exponential activity over a longer that issue a look-up for the source IP address; and time epoch. when the look-up ratio for the source IP address is high, 30. The system of claim 27, wherein the computer is con- designating the source IP address as a compromised structed and arranged to collect DNS data by further: 30 computer. performing processing associated with replacing an IP 40. The system of claim 37, wherein the computer is con­ address of the C&C computer with an IP address of structed and arranged to determine whether the source IP another computer, causing the compromised computer address belongs to a compromised computer by further: seeking to contact the C&C computer to be redirected to performing processing associated with determining a look- 35 the other computer. up ratio for the source IP address, the look-up ratio 31. The system of claim 30, wherein the other computer comprising the number of IP addresses the source IP comprises a sinkhole computer. queries divided by the number ofIP addresses that issue 32. The system of claim 30, wherein the computer is further a look-up for the source IP address; constructed and arranged to: 40 when the look-up ratio for the source IP address is low, perform processing associated with isolating the collection performing processing associated with determining of compromised networks and/or computers from its whether the look-up arrival rate mirrors the email arrival C&C computer, causing the collection of compromised rate; and networks and/or computers to lose the ability to act as a when the look-up arrival rate does not mirror the email coordinated group. 45 arrival rate, performing processing associated with des­ 33. The system of claim 31, wherein the computer is further ignating the source IP address as a compromised com­ constructed and arranged to: puter. analyze traffic from the compromised computer to the sink­ 41. The system of claim 40, wherein the computer is con­ hole computer to obtain information about a malware structed and arranged to determine whether the look-up author. 50 arrival rate mirrors the email arrival rate by further: 34. The system of claim 27, wherein the computer is further performing processing associated with identifying a list of known and/or probably legitimate IP addresses using a constructed and arranged to: DNSBL service; utilizing time zone and time of release information to pre- for each of the known and/or probably legitimate IP dict optimal release time information for an attack. 55 addresses, performing processing associated with deter­ 35. The system of claim 27, wherein the computer is con­ mining its average look-up arrival rate; structed and arranged to determine the existence of the col­ performing processing associated with determining an lection of compromised networks and/or computers without average look-up arrival rate from the source IP address; contacting any networks or computers in the collection of performing processing associated with comparing the compromised networks and/or computers. 60 average look-up rates of the known and/or probably 36. The system of claim 27, wherein the computer is con­ legitimate IP addresses to the arrival rate from the source structed and arranged to collect DNS data by further: IP address; and performing processing associated with determining when the average look-up rates of the known and/or prob­ whether a source Internet Protocol (IP) address perform­ ably legitimate IP addresses differ significantly from the ing reconnaissance belongs to a compromised computer, 65 arrival rate from the source IP address, performing pro­ the source IP address looking up at least one subject IP cessing associated with designating the source IP address; and address as a compromised computer. US 9,306,969 B2 27 28 42. The system of claim 41, wherein the computer is con­ performing processing associated with sending at least one structed and arranged to identify a list of known IPs by fur­ non-recursive query to the DNS server for a newly reg­ ther: istered domain until the DNS server returns an NXDO­ when the DNSBL service has controlled access, perform­ MAIN answer; ing processing associated with recording IP addresses of performing processing associated with immediately send­ approved users. ing a recursive query to the DNS server for the newly 43. The system of claim 41, wherein the computer is con­ registered domain; and structed and arranged to identify a list of probably legitimate performing processing associated with designating the IPs by further: DNS server as open recursive when the answer returned when the DNSBL service allows anonymous access, per- 10 by the DNS server is not NXDOMAIN. forming processing associated with monitoring the source IP addresses of incoming look-up requests, and 48. The system of claim 44, wherein the computer is con­ recording these source IP addresses; structed and arranged to identify open recursive DNS servers performing processing associated with connecting to the IP by further: address to determine whether the IP address is running 15 performing processing associated with organizing IPv4 on a known server; and mutable addresses into units; when the IP address is running on a known server, perform­ performing processing associated with determining a ing processing associated with designating the IP classless interdomain routing (CIDR) priority ranking address as probably legitimate. score (CPRS) value for each unit utilizing DNS server 44. The system of claim 27, wherein the computer is con- 20 information; structed and arranged to collect DNS data by further: performing processing associated with sorting the units a performing processing associated with identifying open list in descending order utilizing the CPRS value; and recursive DNS servers; and performing processing associated with determining performing processing associated with priority ranking whether a DNS server is an open recursive DNS server domain names. 25 for DNS servers in the top of the list. 45. The system of claim 44, wherein the computer is con­ structed and arranged to determine by further: 49. The system of claim 48, wherein the computer is con­ performing processing associated with utilizing the open structed and arranged to determine the CPRS value by fur­ recursive DNS servers and the priority-ranked domain ther: names to determine whether the open recursive DNS 30 performing processing associated with giving a value of servers are compromised computers. 1.0 for each DNS server known to exist in the unit; 46. The system of claim 45, wherein the computer is further performing processing associated with giving a value of constructed and arranged to: 0.01 for each IP address known to run on a DNS server; performing processing associated with ranking sizes of and networks of compromised computers; 35 performing processing associated with giving a value of performing processing associated with estimating a num­ 0.1 for each IP address with no DNS server information. ber of compromised computers in a network; performing processing associated with assessing to what 50. The system of claim 48, wherein the computer is further extent a given network has compromised computers; constructed and arranged to: performing processing associated with determining a 40 performing processing associated with determining the lower bound calculation of a compromised computer number ofDNS servers behind a load balancing server. population; or 51. The system of claim 27, wherein the DNS data is performing processing associated with determining an non-recursive. upper bound calculation of a compromised computer 52. The system of claim 27, wherein the computer is con- population; or 45 structed and arranged to perform processing associated with any combination of the above. collecting Domain Name System (DNS) data by further: 47. The system of claim 45, wherein the computer is con­ structed and arranged to utilize the open recursive DNS serv­ analyzing the DNS data to determine whether the DNS ers and the priority-ranked domains in a recursive query by data has an exponential request rate. further: * * * * *