
Code-Red: a case study on the spread and victims of an Internet worm

David Moore, Colleen Shannon, k claffy

Abstract: On July 19, 2001, more than 359,000 computers connected to the Internet were infected with the Code-Red (CRv2) worm in less than 14 hours. The cost of this epidemic, including subsequent strains of Code-Red, is estimated to be in excess of $2.6 billion. Despite the global damage caused by this attack, there have been few serious attempts to characterize the spread of the worm, partly due to the challenge of collecting global information about worms. Using a technique that enables global detection of worm spread, we collected and analyzed data over a period of 45 days beginning July 2nd, 2001 to determine the characteristics of the spread of Code-Red throughout the Internet.

In this paper, we describe the methodology we use to trace the spread of Code-Red, and then describe the results of our trace analyses. We first detail the spread of the Code-Red and CodeRedII worms in terms of infection and deactivation rates. Even without being optimized for spread of infection, Code-Red infection rates peaked at over 2,000 hosts per minute. We then examine the properties of the infected host population, including geographic location, weekly and diurnal time effects, top-level domains, and ISPs. We demonstrate that the worm was an international event and that infection activity exhibited time-of-day effects, and we find that, although most attention focused on large corporations, the Code-Red worm primarily preyed upon home and small business users. We also qualified the effects of DHCP on measurements of infected hosts and determined that IP addresses are not an accurate measure of the spread of a worm on timescales longer than 24 hours. Finally, the experience of the Code-Red worm demonstrates that wide-spread vulnerabilities in Internet hosts can be exploited quickly and dramatically, and that techniques other than host patching are required to mitigate Internet worms.

Keywords: Code-Red, Code-RedI, CodeRedI, CodeRedII, worm, security, backscatter, virus, epidemiology

CAIDA, San Diego Supercomputer Center, University of California, San Diego. E-mail: {cshannon, dmoore, kc}@caida.org.
Support for this work is provided by DARPA NMS Grant N66001-01-1-8909, NSF grant NCR-9711092, Cisco Systems URB Grant, and Caida members.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IMW'02, Nov. 6-8, 2002, Marseille, France
Copyright 2002 ACM ISBN 1-58113-603-X/02/0011 ...$5.00

I. INTRODUCTION

At 18:00 on November 2, 1988, Robert T. Morris released a 99 line program onto the Internet. At 00:34 on November 3, 1988, Andy Sudduth of Harvard University posted the following message: "There may be a virus loose on the Internet." Indeed, Sun and VAX machines across the country were screeching to a halt as invisible tasks utilized all available resources [1] [2].

No virus brought large computers across the country to a standstill; the culprit was actually the first malicious worm. Unlike viruses and trojans, which rely on human intervention to spread, worms are self-replicating software designed to spread throughout a network on their own. Although the Morris worm was the first malicious worm to wreak widespread havoc, earlier worms were actually designed to maximize utilization of networked computation resources. In 1982 at Xerox's Palo Alto Research Center, John Shoch and Jon Hupp wrote five worm programs that performed such benign tasks as posting announcements [3]. However, research into using worm programs as tools was abandoned after it was determined that the consequences of a worm malfunction could be dire.

In the years between the Morris worm in November 1988 and June 2001, several other worms achieved limited spread through host populations. The WANK (Worms Against Nuclear Killers) worm of October, 1989 attacked SPAN VAX/VMS systems via DECnet protocols [4]. The Ramen worm, which first spread in January of 2001, targeted the wu-ftp daemon on RedHat Linux 6.2 and 7.0 systems [5]. Finally, the Lion Worm targeted the TSIG vulnerability in BIND in March of 2001 [6].

While all of these worms caused some damage, none approached the $2.6 billion cost of recovering from the Code-Red and CodeRedII worms [7]. We can no longer afford to remain ignorant of the spread and effects of worms as information technology plays a critical role in our global economy.

II. BACKGROUND

On June 18, 2001, eEye released information about a buffer-overflow vulnerability in Microsoft's IIS web servers [8]. Microsoft released a patch for the vulnerability eight days later, on June 26, 2001 [9]. Then on July

12, 2001, the Code-RedI worm began to exploit the aforementioned buffer-overflow vulnerability in Microsoft's IIS web servers.

Upon infecting a machine, the worm checks to see if the date (as kept by the system clock) is between the first and the nineteenth of the month. If so, the worm generates a random list of IP addresses and probes each machine on the list in an attempt to infect as many computers as possible. However, this first version of the worm uses a static seed in its random number generator and thus generates identical lists of IP addresses on each infected machine. The first version of the worm spread slowly, because each infected machine began to spread the worm by probing machines that were either already infected or impregnable. On the 20th of every month, the worm is programmed to stop infecting other machines and proceed to its next attack phase, in which it launches a Denial-of-Service attack against www1.whitehouse.gov from the 20th to the 28th of each month. The worm is dormant on days of the month following the 28th.

On July 13th, Ryan Permeh and Marc Maiffret at eEye Digital Security received logs of attacks by the worm and worked through the night to disassemble and analyze the worm. They christened the worm "Code-Red" both because the highly caffeinated "Code Red" Mountain Dew beverage fueled their efforts to understand the workings of the worm and because the worm defaces some web pages with the phrase "Hacked by Chinese". There is no evidence either supporting or refuting the involvement of Chinese hackers with the Code-RedI worm. The first version of the Code-Red worm (Code-RedI v1 [fn 1]) caused little damage. Although the worm's attempts to spread itself consumed resources on infected machines and local area networks, it had little impact on global resources.

Footnote 1: Although the initial Code-Red worm did not carry a suffix denoting its temporal position, we have added the suffix "I" in the interest of clarity, in the same manner as The Great War later came to be known as World War I.

The Code-RedI v1 worm is memory resident, so an infected machine can be disinfected by simply rebooting it. However, the machine is still vulnerable to repeat infection. Any machines infected by Code-RedI v1 and subsequently rebooted were likely to be reinfected, because each newly infected machine probes the same list of IP addresses in the same order.

At approximately 10:00 UTC in the morning of July 19th, 2001, we observed a change in the behavior of the worm as infected computers began to probe new hosts. At this point, a random-seed variant of the Code-RedI v1 worm began to infect hosts running unpatched versions of Microsoft's IIS web server. The worm still spreads by probing random IP addresses and infecting all hosts vulnerable to the IIS exploit. Unlike Code-RedI v1, Code-RedI v2 uses a random seed in its pseudo-random number generator, so each infected computer tries to infect a different list of randomly generated IP addresses at an observed rate of roughly 11 probes per second (pps). This seemingly minor change had a major impact: more than 359,000 machines were infected with Code-RedI v2 in just fourteen hours [10][11].

Because Code-RedI v2 is identical to Code-RedI v1 in all respects except the seed for its pseudo-random number generator, the only direct damage to the infected host is the "Hacked by Chinese" message added to top level web pages on some hosts. However, Code-RedI v2 had a greater impact on global infrastructure due to the sheer volume of hosts infected and probes sent to infect new hosts. Code-RedI v2 also wreaked havoc on some additional devices with web interfaces, such as routers, switches, DSL modems, and printers [12]. Although these devices were not susceptible to infection by the worm, they either crashed or rebooted when an infected machine attempted to send them the unusual http request containing a copy of the worm.

Like Code-RedI v1, Code-RedI v2 can be removed from a computer simply by rebooting it. However, rebooting the machine does not prevent reinfection once the machine is online again. On July 19th, the number of machines attempting to infect new hosts was so high that many machines were infected while the patch for the vulnerability was being applied.

On August 4, 2001, an entirely new worm, CodeRedII, began to exploit the buffer-overflow vulnerability in Microsoft's IIS web servers [13] [14]. Although the new worm is completely unrelated to the original Code-RedI worm, the source code of the worm contained the string "CodeRedII", which became the name of the new worm.

Ryan Permeh and Marc Maiffret analyzed CodeRedII to determine its attack mechanism. When the worm infects a new host, it first determines if the system has already been infected. If not, the worm initiates its propagation mechanism, sets up a "backdoor" into the infected machine, becomes dormant for a day, and then reboots the machine. Unlike Code-RedI, CodeRedII is not memory resident, so rebooting an infected machine does not eliminate CodeRedII.

Initial intuition might lead one to believe that this twenty-four hour delay will retard the spread of the worm so severely that it will never compromise a large number of machines; this is not the case. The delay adds a layer of subterfuge to the worm, since perusal of logs showing connections to the machine around the time that the machine begins to demonstrate symptoms of the infection (i.e. when it starts to actively spread the worm) will not yield any unusual activity.

After rebooting the machine, the CodeRedII worm begins to spread. If the host infected with CodeRedII has Chinese (Taiwanese) or Chinese (PRC) as the system language, it uses 600 threads to probe other machines. On all other machines it uses 300 threads. CodeRedII uses a more complex method of selecting hosts to probe than Code-RedI. CodeRedII generates a random IP address and then applies a mask to produce the IP address to probe. The length of the mask determines the similarity between the IP address of the infected machine and the probed machine. CodeRedII probes a completely random IP address 1/8th of the time. Half of the time, CodeRedII probes a machine in the same /8 (so if the infected machine had the IP address 10.9.8.7, the IP address probed would start with 10.), while 3/8ths of the time, it probes a machine on the same /16 (so the IP address probed would begin with 10.9.). Like Code-RedI, CodeRedII avoids probing IP addresses in the 224.0.0.0/8 (multicast) and 127.0.0.0/8 (loopback) address spaces. The bias toward the local /16 and /8 networks means that an infected machine may be more likely to probe a susceptible machine, based on the supposition that machines on a single network are more likely to be running the same software as machines on unrelated IP subnets.

The CodeRedII worm is much more dangerous than Code-RedI because CodeRedII installs a mechanism for remote, administrator-level access to the infected machine. Unlike Code-RedI, CodeRedII neither defaces web pages on infected machines nor launches a Denial-of-Service attack. However, the backdoor installed on the machine allows any code to be executed, so the machines could be used as "zombies" for future attacks (Denial-of-Service or otherwise).

III. METHODOLOGY

In this section, we detail our trace collection methodology, explain how we validated that the traffic we trace is from the spread of the worms, and describe our approaches for characterizing the type of hosts infected and their geographic locations.

Our analysis of the Code-RedI worm covers the spread of the worm between July 4, 2001 and August 25, 2001. Before Code-RedI began to spread, we were collecting data in the form of a packet header trace of hosts sending unsolicited TCP SYN packets into our /8 network. When the worm began to spread extensively on the morning of July 19, we noticed the sudden influx of probes into our network and began our monitoring efforts in earnest.

The data used for this study were collected from two locations: a /8 network and two /16 networks. Two types of data from the /8 network are used to maximize coverage of the expansion of the worm. Between midnight and 16:30 UTC on July 19, a passive network monitor recorded headers of all packets destined for the /8 research network. After 16:30 UTC, a filter installed on a campus router to reduce congestion caused by the worm blocked all external traffic to this network. Because this filter was put into place upstream of the monitor, we were unable to capture IP packet headers after 16:30 UTC. However, a backup data set consisting of sampled netflow [15] output from the filtering router was available for the /8 throughout the 24 hour period. The data from the /16 networks were collected with Bro between 10:00 UTC on July 19 and 7:00 on July 20 [16]. We merged these three sources into a single dataset. Hosts were considered to be infected if they sent at least two TCP SYN packets on port 80 to nonexistent hosts on these networks during this time period. The requirement of two packets helps to eliminate random source denial-of-service attacks from the Code-Red data.

Early on July 20, the filter was removed and we resumed packet header data collection. Although we collected data through October, we include data through August 25, 2001 in this study. No significant changes were observed in Code-RedI or CodeRedII activity between August 2001 and the pre-programmed shutdown of CodeRedII on October 1, 2001.

Fig. 1. Background level of unsolicited SYN probes and the beginning of the spread of the Code-RedI worm. (Plot against time (UTC), 07/05 to 07/19; series: all port 80 probes, CRv1 candidates.)

A constant background level of unsolicited TCP SYN packets, most likely port scans seeking to identify vulnerable machines, targets the IPv4 address space. In our /8, this rate fluctuates between 100 and 600 hosts per two hour period, with diurnal and weekly variations. On July 12, the static-seed version of the Code-RedI worm began to spread. We noticed that the hosts that appeared clearly infected with Code-RedI v1 probed the same set of 23 IP addresses within our /8 research network. In Figure 1, we used the criterion of probing these 23 addresses to separate the Code-RedI v1 probes from the background port scans.

To confirm that the 23 addresses were actually among those probed by the worm, we reverse engineered the exploit to extract the IP addresses probed by its static-seed pseudo-random number generator. We obtained a disassembled version of the worm from eEye [17] and identified the code responsible for spreading the worm. The worm creates one hundred threads, each with its own static seed and thus its own distinct, although not disjoint, set of IP addresses probed sequentially.

We examined the PRNG (Pseudo-Random Number Generator) code used to generate the target sequences and wrote a C implementation to generate the first one thousand IP addresses probed by each thread (approximately the first one million IP addresses). We extracted the IP addresses that fell in our /8 and found the same 23 address sequence we predicted from our packet trace data. The 23 addresses that fall in our research network actually occur very early in the generated sequences [fn 2]. A machine newly infected with Code-RedI v1 probes our /8 network 23 times in the first fifteen minutes of propagation.

Footnote 2: IP addresses in the monitored class A network occurred early in each of the 100 threads started on Code-RedI v1 infected machines. Probe sequence numbers within their threads included: 8, 12, 14, 20, 22, 25, 26, 29, 32, 34, 36, 40, 41, 41, 43, 43, 44, 45, 45, 51, 56, 57, 59. Thus we are able to detect the compromise of a new host almost instantly, as we receive many probes from the host in the first minute following infection.

Once we had identified the IP addresses initially probed by the worm, we compared this sequence to the hosts we observed probing the 23 target addresses in our research network. We discovered that the first three hosts that probed our /8 research network were not contained in the IP address sequence probed by any thread. We believe that the individual (or individuals) responsible for the Code-RedI worm compromised these machines and seeded them with the worm to initiate the epidemic. The first two machines both appear to be located in the United States, one in Cambridge, Massachusetts and the other in Atlanta, Georgia. The third address appears to be in the city of Foshan in China's GuangDong province. However, there remains no evidence linking Chinese hackers to the development or deployment of the Code-RedI worm.

We classify infected hosts using the DNS name of each host and a hand-tuned set of regular expression matches [fn 3] (e.g. DNS names with "dialup" represent modems, "dsl" or "home.com" identifies broadband, etc.) into the following categories: mail servers, name servers, web servers, IRC servers, firewalls, dial-up, broadband, other (unclassifiable) hosts, and hosts with no hostname. The prevalence of each type of host is discussed in Section IV-B.4.

Footnote 3: The regular expressions are available at http://www.caida.org/tools/measurement/misc/HostClassify

We also used Ixia's IxMapping [18] service to determine the latitude, longitude, and country of each IP address infected with the worm. IxMapping uses public data sources such as WHOIS and DNS, as well as specialized measurement, to geographically place IP addresses. We identified a rough approximation of the timezone of each infected host based on this longitude.

IV. RESULTS

In this section of the paper, we present the results of our trace analyses. We first characterize the spread of the Code-RedI and CodeRedII worms, then examine the properties of the infected host population, and finally determine the rate at which infected hosts are repaired.

A. Worm Spread

In this section, we examine the dynamics of the spread of the Code-RedI and CodeRedII worms.

A.1 Host Infection Rate

We detected more than 359,000 unique IP addresses [fn 4] infected with the Code-RedI worm between midnight UTC on July 19 and midnight UTC on July 20. To determine the rate of host infection, we recorded the time of the first attempt of each infected host to spread the worm. Because our data represent only a sample of all probes sent by infected machines, the number of hosts detected provides a lower bound on the number of hosts that have been compromised at any given time.

Footnote 4: We required at least 2 probes from each host to two different addresses before we conclusively identified it as infected.

Fig. 2. Cumulative total of unique IP addresses infected by the first outbreak of Code-RedI v2. (Plot against time (UTC), 07/19 00:00 to 07/20 04:00.)

Fig. 3. Comparison of the growth rate of the first outbreak of Code-RedI v2 with infection model. (Logarithmic scale plot against time (UTC), 07/19 to 07/20; series: measured, model.)

Fig. 4. One minute infection rates for Code-RedI v2. (Plot against time (UTC), 07/19 to 07/20.)

Figure 2 shows the number of infected hosts over time. The growth of the curve between 11:00 and 16:30 UTC is exponential, as can be seen in the logarithmic scale plot (Figure 3). On the surface, the data seem to fit reasonably with the growth model for the worm infection proposed by Stuart Staniford [11]. Discrepancies between the upper ranges of the growth model and our data are caused both by the fixed cutoff time of the worm itself and by hosts repaired or isolated throughout the day.

Figure 4 provides a more detailed view of the spread of the worm in terms of the number of newly infected hosts seen in 1 minute periods throughout the day. In the figure, we see that the infection rate peaked at 2,000 hosts/minute. Unfortunately, the peak of the initial curve occurs at about the same time that the passive monitor data became unavailable, so the duration of the 2,000 hosts/minute infection rate is unknown. In particular, the large spike corresponds to 7,700 hosts; it is an anomaly caused by a small gap in the collected netflow data that resulted in detection of all hosts infected during the down time when collection resumed. Thus the spike in the number of hosts infected is actually representative of all the hosts infected between 16:51 and 17:21 UTC. We believe that in actuality the infection rate from 16:30 to 18:00 UTC tapered smoothly.

Although the growth was slowing, had the worm not been programmed to stop spreading at midnight, additional hosts would have been compromised. The infection rate would have continued to decrease once the vast majority of vulnerable machines were infected. We speculate that the memory resident status of this worm would have allowed reinfection of many hosts after a reboot cleared the initial infection.

Fig. 5. Cumulative total of unique IP addresses infected during the first day of the second outbreak of Code-RedI v2. (Plot against time (UTC), 08/01 to 08/02.)

On August 1, the Code-RedI v2 worm began to spread again in earnest. By midnight, we had observed approximately 275,000 unique IP addresses spreading the Code-RedI v2 worm, as seen in Figure 5. The difference between the infected host count at 24 hours for the first and second outbreaks of Code-RedI v2 is likely caused by the patching of hosts, which removed them from the susceptible population.

Figure 6 shows the rate at which new hosts were infected with the Code-RedI v2 worm. The spread of the outbreak peaked in the early afternoon of August 1, with 29710 hosts infected in the hour following 14:00 UTC and 28583 following 15:00 UTC. A rate of more than twenty thousand new hosts per hour was sustained from 13:00 through 17:00 UTC. After this point, the host population approached saturation with the worm: when almost all susceptible hosts are already infected, it becomes increasingly difficult to locate new hosts.

Fig. 6. Hourly infection rates for the second outbreak of Code-RedI v2 between August 1 and August 19, 2001.

A.2 Deactivation rate

During the course of the day on July 19, a few initially infected machines were patched, rebooted, or filtered and consequently ceased to probe networks for vulnerable hosts. We consider a host that was previously infected to be inactive after we have observed no further unsolicited traffic from it. Figure 7 shows the total number of inactive hosts over time. The majority of hosts stopped probing in the last hour before midnight UTC on July 20. At midnight, the worm was programmed to switch from an "infection phase" to an "attack phase", so the large rise in host inactivity is due to this design. The end of day phase change can be seen clearly in Figure 8, which shows the number of newly inactive hosts per minute. As in previous graphs, the spike near 16:30 is caused by a gap in data collection.

Fig. 7. Cumulative total of deactivated Code-RedI v2 infected hosts. (Plot against time (UTC), 07/19 to 07/20.)

Fig. 8. Rate of infected host deactivation in one minute periods. (Plot against time (UTC), 07/19 to 07/20.)

A.3 CodeRedII

Because the CodeRedII worm infects the same host population as Code-RedI v2, we neither expected nor measured an increase in the number of hosts probing our network once the CodeRedII worm began to spread. We also observed no significant difference in the overall number of unsolicited TCP SYNs. Figure 9 shows the raw probe rate (including both worm spread and port scans) into our /8 network for every 2 hours between August 1 and August 22. The spike in probes on August 6 shows backscatter from a Denial-of-Service attack [19], while the dip on August 9 was caused by a gap in data collection. No change in probe rate is apparent following the spread of CodeRedII. Although CodeRedII uses six times as many threads to spread as Code-RedI v2, only one probe in eight is sent to a random IP address, with the rest sent to local networks as described in Section II. Because our /8 network contained no susceptible hosts, the net probe rate we observe from CodeRedII is the same as that of Code-RedI v2. Thus, we cannot distinguish hosts infected with CodeRedII from those infected with Code-RedI v2 without collecting packet payloads. In their October 2001 study, Arbor Networks measured the ratio between Code-RedI and CodeRedII probes to be 1:3 [20]. This 1:3 ratio may indicate the ratio between hosts infected with CodeRedII versus CodeRedI. However, we expect that the bias towards hosts on the same subnet causes wide variations in the actual probe rates measured at different locations across the Internet.

Fig. 9. The raw probe rate observed in our /8 network. (Plot against time (UTC), 08/04 to 08/22.)

B. Host Characterization

In this section, we look at the properties of the host population infected by the Code-RedI and CodeRedII worms.
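The DNS-based host typing introduced in Section III and used in Section B.4 can be sketched as a first-match list of regular expressions. This is an illustrative sketch only, not the authors' published rule set: the "dialup", "dsl" and "home.com" cues come from the text, while every other pattern, the rule order, and the names RULES and classify_host are assumptions made for the example.

```python
import re

# First matching rule wins. Only the "dialup", "dsl" and "home.com" cues are
# taken from the paper; the remaining patterns are illustrative assumptions.
RULES = [
    ("dial-up", re.compile(r"dialup|dialin|ppp")),
    ("broadband", re.compile(r"dsl|cable|home\.com$")),
    ("mail", re.compile(r"^(mail|smtp|mx)\d*\.")),
    ("nameserver", re.compile(r"^(ns|dns)\d*\.")),
    ("web", re.compile(r"^www\d*\.")),
    ("irc", re.compile(r"^irc\d*\.")),
    ("firewall", re.compile(r"firewall|^fw\d*\.")),
]


def classify_host(hostname):
    """Return a host type for a reverse DNS name.

    Hosts with no reverse DNS record (None or empty string) are "unknown";
    names that match no rule are "other" (unclassifiable).
    """
    if not hostname:
        return "unknown"
    name = hostname.lower()
    for label, pattern in RULES:
        if pattern.search(name):
            return label
    return "other"
```

Access-technology rules are checked before service rules here, so a name such as mail.dsl.example.net would count as broadband; that ordering is a design choice of this sketch and may differ from the authors' rules.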

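Section B.5 maps each infected host to an approximate timezone using its longitude. The paper does not state the exact rule, so the sketch below assumes the simplest one: one hour of offset per 15 degrees of longitude, rounded to the nearest hour, ignoring political timezone boundaries and daylight saving as the text describes. The function names approx_utc_offset and local_hour are hypothetical.

```python
def approx_utc_offset(longitude_deg):
    """Approximate UTC offset in whole hours for a longitude (east positive).

    Assumption: 15 degrees of longitude per hour, no political boundaries,
    no daylight saving adjustment.
    """
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be within [-180, 180] degrees")
    return int(round(longitude_deg / 15.0))


def local_hour(utc_hour, longitude_deg):
    """Convert an hour of day in UTC (0-23) to the approximate local hour."""
    return (utc_hour + approx_utc_offset(longitude_deg)) % 24
```

For example, a host near longitude -117 observed probing at 16:00 UTC would be counted at local hour 8 under this rule.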
Top 10 Countries

Country          hosts    hosts(%)
United States    157694   43.91
Korea             37948   10.57
China             18141    5.05
Taiwan            15124    4.21
Canada            12469    3.47
United Kingdom    11918    3.32
Germany           11762    3.28
Australia          8587    2.39
Japan              8282    2.31
Netherlands        7771    2.16

TABLE I
TOP TEN COUNTRIES WITH CODE-RED INFECTED HOSTS ON JULY 19.

Top 10 Domains

Domain           hosts    hosts(%)
Unknown          169584   47.22
home.com          10610    2.95
rr.com             5862    1.63
t-dialin.net       5514    1.54
pacbell.net        3937    1.10
uu.net             3653    1.02
aol.com            3595    1.00
hinet.net          3491    0.97
net.tw             3401    0.95
edu.tw             2942    0.82

TABLE III
TOP TEN DOMAINS WITH CODE-RED INFECTED HOSTS ON JULY 19.

Top 10 Top-Level Domains

TLD        hosts    hosts(%)
Unknown    169584   47.22
net         67486   18.79
com         51740   14.41
edu          8495    2.37
tw           7150    1.99
jp           4770    1.33
ca           4003    1.11
it           3076    0.86
fr           2677    0.75
nl           2633    0.73

TABLE II
TOP TEN TOP-LEVEL DOMAINS WITH CODE-RED INFECTED HOSTS ON JULY 19.

B.1 Countries

To understand the demography of the Code-RedI v2 epidemic on July 19, we examined the domains, geographic locations, and top level domains (TLDs) of the infected hosts. Table I shows the breakdown of hosts by country, as placed by IxMapping [18]. Surprisingly, Korea is the second most prevalent source country of compromised machines, with 10.57% of all infected hosts.

B.2 Top-Level Domains

Table II provides a breakdown of machines infected on July 19th by top-level domain (TLD). NET, COM, and EDU are all represented in proportions roughly equivalent to their estimated share of all existing hosts, as estimated by NetSizer [21]. We also observed 136 MIL and 213 GOV hosts infected by the worm. Approximately 50% of all July 19th infected hosts had no reverse DNS records, so they could not be classified by their domain names. These included, for example, 390 addresses in the reserved network space 10.0.0.0/8. These machines were probably on private networks and were infected via either an external interface or another machine accessible via both internal and external networks. This suggests that many more hosts on internal networks may have been compromised in a manner transparent to our monitor.

B.3 Domain Names

Table III shows the top ten domains in terms of the number of infected hosts. We note that the top domain names are providers of home and small business connectivity, suggesting that hosts maintained by individuals at home are an important aspect of global Internet health.

B.4 Host Classification

We utilized the reverse DNS records for the August Code-Red infected hosts to identify the function of the compromised machines. While reverse DNS records did not exist for 55% of the hosts infected in August 2001, we did manage to identify about 22% of the host types. Computers without reverse DNS records are less likely to be running major services (such as those demonstrated in the other host types). Broadband and dial-up services represented the vast majority of identifiable hosts, as shown in Figure 10(a). Furthermore, the large diurnal variations in the number of infected hosts suggest that these machines are unlikely to be running production web servers of any kind, a surprising result given that the worm attacks a vulnerability in web servers. This periodicity contrasts with the limited diurnal variation seen in the number of infected web and DNS servers in Figure 10(b), which show limited fluctuations from day to day. We do observe significant diurnal changes in the number of infected mail servers, indicating that we may be mis-identifying the function of a number of these computers. Overall, the number of broadband and dial-up users affected by this random-source worm seems to significantly exceed those affected by random-source denial-of-service attacks [19]. While 21% of all hosts compromised by Code-Red were home and small business machines, only 13% of random-source denial-of-service attack targets shared this characteristic. And while web servers, mail servers, and nameservers were the target of 5% of all denial-of-service attacks, they represent only 1.1% of the computers infected by the Code-Red worm.

Fig. 10. Reverse DNS Record-based classification of Code-Red hosts. (a) All hosts with reverse DNS records. (b) A closer look at the lower ranges. (Both panels are titled "Code-Red Infected Hosts By Host Type" and plotted against time (UTC), 08/02 to 08/23; series: Other, Broadband, Dial-Up, Web Server, Mail Server, Nameserver, Firewall, IRC Server.)

DNS-based host types

Type          Average Hosts   Hosts(%)
Unknown       88116           54.8
Other         37247           23.1
Broadband     19293           12.0
Dial-Up       14532            9.0
Web             846            0.5
Mail            731            0.5
Nameserver      184            0.1
Firewall          9            0.0
IRC               2            0.0

TABLE IV
THE CLASSIFICATIONS OF HOSTNAMES BASED ON REVERSE-DNS LOOKUPS OF THE IP ADDRESSES OF CODE-RED INFECTED HOSTS BETWEEN AUGUST 1 AND AUGUST 8, 2001. SHOWN HERE ARE THE AVERAGE NUMBER OF ACTIVE HOSTS IN EACH TWO HOUR INTERVAL AND THE OVERALL PERCENTAGE OF EACH TYPE OF HOST ACROSS THE WHOLE SEVEN DAY INTERVAL. UNKNOWN HOSTS HAD NO REVERSE DNS RECORDS.

B.5 Timezones

Fig. 11. Unique IP addresses infected with Code-RedI v2 in ten minute intervals. (Plot against time (UTC), 08/01 to 08/25.)

Figure 11 shows the number of unique IP addresses infected with Code-RedI v2 in ten minute intervals. From the figure, we see that the number of infected hosts follows both diurnal and weekly variations. While the slight decrease in infected hosts on the weekends (Aug 4-5, 11-12, and 18-19) is immediately apparent, the origins of the rather strangely shaped daily variations proved perplexing. Suspecting that the varying local times of day obscured the infection pattern, we identified the longitude of each infected host via IxMapping and mapped each host to an approximate timezone. We ignored minor variations in

the longitudinal boundaries of timezones, as well as differences in observation of daylight saving within a timezone. We then recreated the interval of Figure 11 between August 2 and August 10, differentiating infected hosts according to local time, rather than UTC.

Fig. 12. Unique IP addresses infected with Code-RedI v2 in two hour intervals, local time.

Figure 12 shows the results of this differentiation for the top ten timezones in terms of number of infected hosts. The diurnal pattern clearly follows the pattern of the business day, with the number of infected hosts rising sharply around 8 am and falling off in the afternoon and evening hours as people shut down their computers to go home for the night. The Code-RedI worm attacks a vulnerability in Microsoft web server software, yet production web servers are not usually shut down at the end of the day. We suspect that these machines are office desktop computers whose users are not aware that they are running an active web server. This calls into question both the wisdom and the security of automatically enabling software unbeknownst to the end user.

B.6 Subnets

Fig. 13. Cumulative total of unique IP addresses infected with the second outbreak of Code-RedI v2.

Fig. 14. The DHCP effect: the relationship between total active IP addresses in a subnet and the maximum number of IP addresses active simultaneously, August 2-16.

Between August 2 and August 16, we observed two million IP addresses actively transmitting the worm (Figure 13), yet only 143,000 active hosts in the most active ten minute period (Figure 11). This order of magnitude discrepancy leads us to question whether there were actually around two million infected hosts, or whether the use of DHCP is sufficiently extensive that it artificially inflates IP address-based estimates of the extent of the Code-RedI epidemic.

To answer this question, we compared two measures of the infection within a subnet: the total number of unique IP addresses in each subnet active at any time between August 2 and August 16; and the greatest number of infected hosts actively spreading the worm simultaneously in any two hour period between August 2 and August 16. We plotted total unique IP addresses on the X axis, maximum IP addresses per two hour window on the Y axis, and then colored the data points based on the number of subnets with the same X and Y values (Figure 14). The resulting graph is surprisingly bimodal: one line of points stretches along the line y = x, representing subnets in which the total number infected and the maximum number infected were the same; no shift in the IP addresses of infected machines was detected. A far more populous arm stretches just above the X axis, showing many subnets with as many as fifteen times as many total IP addresses infected as were infected simultaneously. This suggests that without accounting for this "DHCP effect," counting the number of IP addresses infected by a pathogen grossly overestimates the actual number of infected machines.

While the vast majority of the subnets had fewer than fifteen machines infected, a few subnets had as many as two hundred simultaneously infected machines. We investigated the ownership of the subnets with the most infected machines and discovered that they belonged to Microsoft.

While DHCP use may artificially inflate the number of infected hosts as measured by IP addresses, the use of Network Address Translation may artificially deflate the number of compromised IP addresses that we measured. Although many infected machines can sit behind a NAT router, they appear to the rest of the Internet as only a single machine. We attempted to use the observed probe rate of a host as a way of identifying NAT IP addresses. However, the wide variation in machine load and network connection speed of individually infected IP addresses masks all but the most blatant evidence of NAT use. Further work on quantifying the effects of NAT on epidemiological study of worm spread is in progress.

C. Repair rate

We performed a follow-up survey to determine the extent to which infected machines were patched in response to the Code-RedI worm. Every day between July 24 and August 28, we chose ten thousand hosts at random from the 359,000 hosts infected with Code-RedI on July 19 and probed them to determine the version number and whether a patch had been applied to the system. Using that information, we assessed whether they were still vulnerable to the IIS vulnerability exploited by Code-RedI.

Although this data does not show the immediate response to Code-RedI, it does characterize the efficacy over time of user response to a known threat. Between July 24 and July 31, the number of patched machines increased an average of 1.5% every day. Despite unprecedented levels of local and national news coverage of the Code-RedI worm and its predicted resurgence on August 1, the response to the known threat was sluggish. Only after Code-RedI began to spread again on August 1 did the percentage of patched machines increase significantly, rising from 32% to 64%.

Improvement is needed in the communication of information about present threats to non-English speaking countries. As shown in Table V, there is a significant gap between the patch rate in English speaking countries and non-English speaking countries.

Patch Rate in Top 10 Countries

Country          patched (%)   unpatched (%)
United Kingdom         65.65           34.34
United States          59.59           40.41
Canada                 57.57           42.42
Germany                55.55           44.44
Netherlands            46.46           53.53
Japan                  39.39           60.61
Australia              37.37           62.62
Korea                  20.20           79.79
Taiwan                 15.15           84.84
China                  13.13           86.86

TABLE V
PATCHING RATE SEEN ON AUGUST 14TH FOR THE TEN COUNTRIES WITH CODE-RED INFECTED HOSTS ON JULY 19. PERCENTAGES ARE OF INFECTED HOSTS IN EACH COUNTRY THUS EACH ROW ADDS UP TO 100%

We observed a wide range in the response to Code-RedI exhibited by the top ten most frequently infected domains, as shown in Table VI. While many of these domains contain IP addresses that are assigned dynamically via DHCP, the percentages of unpatched machines remain valid whether or not the machines we reached in our survey were known previously to be infected. Some ISPs took aggressive action to prevent the spread of the worm, including temporarily blocking both inbound and outbound traffic on port 80 and rapid notification of customers who were observed to be spreading the worm.

The EDU top-level domain exhibited a much better patching response to Code-RedI than did COM or NET: 81% of infected hosts were patched by August 14. COM (56%) and NET (51%) did respond well, ranked third and sixth, respectively.

V. CONCLUSION

The primary observation to make about the Code-RedI worm is the speed at which a malicious exploit of a ubiquitous software bug can incapacitate host machines. In particular, physical and geographical boundaries are meaningless in the face of a virulent attack. In less than 14 hours, 359,104 hosts were compromised.

This assault also demonstrates that machines operated by home users or small businesses (hosts less likely to be maintained by a professional systems administrator) are integral to the robustness of the global Internet. As is the case with biologically active pathogens, vulnerable hosts can and do put everyone at risk, regardless of the significance of their role in the population.

Care must be taken in estimating the extent of the spread of Internet pathogens. The effects of DHCP on IP address counts lead to gross overrepresentation of the cumulative number of hosts infected over time. The majority of subnets show a discrepancy between the total number of IP addresses observed to be transmitting the worm and the maximum number active in a two hour period.

Finally, we should all be concerned that it seems to take a global, catastrophic incident to motivate us to respond

Fig. 15. Patching rate of IIS servers following initial Code-RedI v2 outbreak on July 19th. (a) All survey attempts. (b) Hosts which responded.
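As a rough illustration of how the daily survey behind Figure 15 could be tallied, the following Python sketch draws a random sample from a list of previously infected hosts and computes the patched percentage under the two denominators shown in the figure: all survey attempts, and only hosts that responded. The outcome labels, the function names, and the choice to count a host as "responded" only when it answered as patched or unpatched IIS are assumptions made for illustration. The probing step itself is not shown, since the paper does not describe its implementation.

```python
import random
from collections import Counter

# Hypothetical outcome labels for one probe (not taken from the paper's code).
UNPATCHED, PATCHED, TIMEOUT, REFUSED = "unpatched", "patched", "timeout", "refused"


def daily_sample(infected_hosts, k=10_000, seed=None):
    """Pick up to k hosts uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(infected_hosts, min(k, len(infected_hosts)))


def patch_rates(outcomes):
    """Return (patched % of all attempts, patched % of responding hosts).

    Assumption: a host "responded" if it was classified as patched or
    unpatched IIS; timeouts and refusals count only as attempts.
    """
    counts = Counter(outcomes)
    attempts = len(outcomes)
    responded = counts[PATCHED] + counts[UNPATCHED]
    pct_all = 100.0 * counts[PATCHED] / attempts if attempts else 0.0
    pct_responded = 100.0 * counts[PATCHED] / responded if responded else 0.0
    return pct_all, pct_responded
```

The two percentages can diverge widely when many probes time out or are refused, which is why the choice of denominator matters when reading a patch rate.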

Domain          Unpatched IIS (%)   Patched IIS (%)   Conn. Timeout (%)   Conn. Refused (%)
in-addr.arpa                   40                 7                  30                  11
home.com                       44                 5                  30                   8
rr.com                         44                 5                  27                  10
t-dialin.net                  0.4                 0                  81                  16
aol.com                       0.3                 0                  39                  61
pacbell.net                    29                 8                  24                  23
uu.net                        0.6               0.2                  51                  47
hinet.net                      20                 0                  46                  25
net.tw                         32                 1                  46                  13
edu.tw                         60                 2                  20                   5

TABLE VI
PERCENTAGE BREAKDOWN OF PATCHING SURVEY RESPONSES BY CATEGORY FOR THE TOP DOMAINS ORIGINALLY INFECTED WITH CODERED V2. ROWS ADDING TO LESS THAN 100% ARE DUE TO RESPONSES NOT BEING CLEARLY CATEGORIZABLE AS PATCHED IIS OR UNPATCHED IIS. MOST DOMAINS SHOW A LARGE PERCENTAGE OF CONNECTION REFUSED OR CONNECTION TIMEOUT SUGGESTING FILTERING OF TRAFFIC, DISABLING OF PREVIOUSLY RUNNING IIS SERVERS OR DHCP.
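The caption of Table VI, like the subnet analysis in Section B.6, points to DHCP address churn as a confounder. As a minimal sketch of the B.6 comparison, the Python below computes, for each subnet, the total number of unique addresses seen and the largest number seen within any single two hour window. The input format (pairs of Unix timestamp and dotted IPv4 string), the function names, and the grouping of addresses by /24 prefix are assumptions for illustration; this section of the paper does not state the subnet size used.

```python
from collections import defaultdict

WINDOW_SECONDS = 2 * 60 * 60  # two hour windows, as in Section B.6


def subnet_of(ip):
    """Group by the first three octets (a /24 prefix; an assumption)."""
    return ip.rsplit(".", 1)[0]


def dhcp_effect(observations):
    """Map each subnet to (total unique IPs, max unique IPs in one window).

    observations: iterable of (unix_timestamp, ipv4_string) pairs, one per
    observed probe from an infected address.
    """
    total = defaultdict(set)
    per_window = defaultdict(lambda: defaultdict(set))
    for timestamp, ip in observations:
        net = subnet_of(ip)
        total[net].add(ip)
        per_window[net][int(timestamp) // WINDOW_SECONDS].add(ip)
    return {
        net: (len(ips), max(len(w) for w in per_window[net].values()))
        for net, ips in total.items()
    }
```

Subnets whose two values are equal would fall on the line y = x in a plot like Figure 14, while a total far larger than the per-window maximum would be consistent with the same machines reappearing under new addresses.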

to a known threat. The exploit was discovered on June 18, 2001 and the first version of the Code-RedI worm emerged on July 12, 2001. The especially virulent strain of the worm (Code-RedI v2) began to spread on July 19, a full 29 days after the initial discovery of the exploit and four days after the detection of the first (static seed) attack. As the economies of many nations become increasingly dependent on wide area network technologies, we must critically assess and remedy the economic consequences of the current lack of adequate network and host security measures.

VI. ACKNOWLEDGMENTS

We would like to thank Pat Wilson and Brian Kantor of UCSD for data and discussion; Vern Paxson (LBL and ACIRI) for providing an additional view point of data; Jeffrey Mogul and Compaq Research for additional data; Jeff Brown (UCSD/CSE) for producing animations of worm spread; Ken Keys (CAIDA) for development of graphs and discussion; Bill Fenner (AT&T Research) for useful comments and fli2gif; and Stefan Savage (UCSD) and Geoff Voelker (UCSD) for suggestions. Support for this work was provided by DARPA ITO NGI and NMS programs, NSF ANIR, and CAIDA members. This work would not have been possible without the generous support of Cisco Systems.

REFERENCES

[1] E. Spafford, "The internet worm: Crisis and aftermath," 1989.
[2] Charles Schmidt and Tom Darby, "The Morris Internet Worm," Tech. Rep., http://www.software.com.pl/newarchive/misc/Worm/darbyt/pages/history.html.
[3] John Shoch and Jon Hupp, "The 'Worm' Programs - Early Experience with a Distributed Computation," Communications of the ACM, vol. 25, no. 3, pp. 172-180, Mar. 1982.
[4] CERT Coordination Center, "CERT Advisory CA-1989-04 WANK Worm On SPAN Network," http://www.cert.org/advisories/CA-1989-04.html.
[5] Max Vision, "Ramen Internet Worm Analysis," http://www.whitehats.com/library/worms/ramen/.
[6] SANS Global Incident Analysis Center, "Lion Worm," http://www.sans.org/y2k/lion.htm.
[7] Computer Economics, "2001 economic impact of malicious code attacks," http://www.computereconomics.com/cei/press/pr92101.html.
[8] eEye Digital Security, "Advisories and Alerts: AD20010618," http://www.eeye.com/html/Research/Advisories/AD20010618.html.
[9] Microsoft, "A Very Real and Present Threat to the Internet," http://www.microsoft.com/technet/treeview/default.asp?url=/technet/security/topics/codealrt.asp.
[10] eEye Digital Security, "Advisories and Alerts: .ida "Code Red" Worm," July 2001, http://www.eeye.com/html/Research/Advisories/AL20010717.html.
[11] Silicon Defense, "Code Red Analysis page," http://www.silicondefense.com/cr/.
[12] Cisco Systems, Inc, "Cisco Security Advisory: "Code Red" Worm - Customer Impact," http://www.cisco.com/warp/public/707/cisco-code-red-worm-pub.shtml.
[13] eEye Digital Security, "CodeRedII Worm Analysis," August 2001, http://www.eeye.com/html/Research/Advisories/AL20010804.html.
[14] SecurityFocus, "SecurityFocus Code Red II Information Headquarters," http://aris.securityfocus.com/alerts/codered2/.
[15] "Cisco NetFlow," http://www.cisco.com/warp/public/732/netflow/.
[16] Vern Paxson, "Bro: A System for Detecting Network Intruders in Real-Time," Computer Networks, vol. 31, no. 23-24, pp. 2435-2463, 1999.
[17] eEye Digital Security, "eEye CodeRed analysis, commented disassembly, full IDA database, and binary of the worm," http://www.eeye.com/html/advisories/codered.zip.
[18] Ixiacom IxMapping, "IxMapping," http://www.ipmapper.com.
[19] David Moore, Geoffrey M. Voelker, and Stefan Savage, "Inferring Internet Denial-of-Service Activity," Usenix Security Symposium, 2001.
[20] Dug Song and Rob Malan and Robert Stone, "A Snapshot of Global Internet Worm Activity," http://research.arbor.net/up_media/up_files/snapshot_worm_activity_f.ps.
[21] Netsizer, "Evaluating the size of the internet," http://www.netsizer.com.