Characterization of Spam Advertised Website Hosting Strategy

Chun Wei, Alan Sprague, Gary Warner, Anthony Skjellum
Dept. of Computer and Information Sciences, Univ. of Alabama at Birmingham
1300 University Blvd., Birmingham, AL, USA
[email protected] [email protected] [email protected] [email protected]

ABSTRACT
This paper surveys three months of spam data and investigates the hosting strategy of spam domains that are used to sell pharmaceuticals, luxury goods and sexual enhancement products. Thousands of domains were found, and most of them use wildcard DNS records to support non-existent machine names. The hosting IP addresses are far fewer than the domains, with a large number of domains hosted on a limited number of hosts. The majority of these heavily used hosts reside in networks outside the USA. These hosts are stable and have good connectivity and availability. As a result, many domains on these hosts stayed alive for the entire three-month investigation period. The hosting IP addresses began to move in late March. The new IP addresses, however, are still in the same network range as the old IP addresses. The results suggest that further investigation of spam web hosting will be beneficial in disrupting the spamming network.

1. INTRODUCTION
Despite the effort to combat spam, the number of spam messages is steadily increasing every year. Microsoft reports that in the second half of 2008, 97% of all email messages on the Internet were spam [9], and the majority of spam emails are sent by bots, zombie computers infected by malware. Because botnet machines are used, spam can be sent at almost no cost to the spammer and with minimal risk of revealing the spammer's true identity.

Although spam emails have been used for many purposes, some common email-based crimes include: 1) spreading malware through hostile email attachments or a URL pointing to malware-infested websites; 2) luring recipients to counterfeit bank sites (phishing sites) to steal account and personal information by delivering misleading emails claiming to be sent by a legitimate financial institution or online business; 3) scam emails that lure people into false transactions by exploiting human greed, such as promising lottery winnings, overseas inheritances, or easy work-at-home jobs with great salaries; or 4) advertising counterfeit products and services, e.g. pharmaceuticals, luxury jewelry and watches, sexual products, software.

2. SPAM-ADVERTISED WEBSITES
Spam has become popular because it is profitable and almost risk-free. According to a survey reported in new research from Marshal (London), 29% of Internet users have purchased goods from websites advertised by spam emails [10]. Researchers at UCSD studying the Storm Worm projected that the pharmaceutical spam portion of the Storm Worm activities may have generated as much as $350 million for the botnet controllers [7]. The most commonly purchased items include sexual enhancement pills, pirated software, adult material, and luxury goods such as watches, jewelry and clothing. These goods are expensive if acquired from legitimate sellers. Table 1 shows some of the most popular spam-advertised sites in our data. According to the Microsoft SIR report [9], 49% of all spam emails are advertisements for pharmaceutical products. However, spam emails are a means to an end. The end is the website that completes online transactions with buyers.

To establish a spam-advertised site, a spammer needs to find a hosting place and register a domain name. Because of anti-spam efforts, a spam domain is likely to be detected quickly and reported to a blacklist [14]. To maintain the site's availability, spammers usually have to register many domain names that resolve to the same site. If some domains get blocked by spam filters, new ones are added. The hosting servers need good connectivity and high availability, just like a regular online store website, and are unlikely to reside on a bot. Therefore, we predict that the hosts are on more stable networks and are probably owned and maintained by spammers.

Figure 1 shows how the spamming network operates. The spammer controls the bots through a command & control (C&C) server. He updates information on the C&C server, and each of his bots contacts the server to receive new commands, new spam templates and email address lists. The bots then send out the spam emails with web links pointing to websites. The spammer also maintains the websites on various web hosts he controls, as well as the corresponding DNS entries on name servers under his control. To stop spam more effectively, we need to use spam emails as a clue to target the hosting and C&C servers. This will greatly disrupt the spamming network and raise the cost and risk for spammers.

Figure 1: Information flow on spamming network

3. RELATED WORK
So far, much spam research has focused on spam messages, such as spam filtering and categorization [4, 8], and on their distribution channel, the botnets [2, 5], but much less effort has gone into the hosting and C&C servers that are actually owned by spammers. Compared to bots, we believe the hosting structure should be more stable and thus easier to trace. The Spamscatter project [1] probed the links in spam emails and clustered the destination websites to explore the scam hosting infrastructure. They found that most scam hosts were steady and had high availability and good connectivity.

Our research, both in previous work and in this paper, is focused on targeting the origin by analyzing the spam emails. We began with clustering the email headers and subjects [11], but that revealed little about where emails came from. Tracking the senders' IP addresses can lead to the botnets, but the botnets are actually intended to protect the spammers from being traced [12]. To trace back to the C&C server, one needs to analyze the communication traffic of a bot and any suspicious computer on the Internet. After spammers adopted peer-to-peer C&C infrastructure [6], the tracking became more difficult. Therefore, we started to explore the information derived from the URLs embedded in spam emails, such as web host information, domain registrar information and web content. These provide more useful clues than the email itself in tracing the spammers.

This paper shows the results of our first attempt to analyze the spam hosting infrastructure in a systematic way. Our findings support our prediction that spam-advertised websites selling products and services are less likely to be hosted on botnets, which are more transient. Successful spam campaigns need highly available servers and reliable connectivity to complete their sales transactions. While botnets are used successfully for hosting phishing and malware sites, a higher level of stability helps ensure success when product sales and credit card transactions are involved. In contrast to the Spamscatter findings, we discovered that many significant hosts (hosting a vast number of spam domains) reside outside the US and that more than half of the domains are hosted on more than one physical IP address.

Table 1: Popular spam-advertised websites
Web site thumbnail                  Average email count per day
(thumbnail image not reproduced)    5000
(thumbnail image not reproduced)    2400
(thumbnail image not reproduced)    1158
(thumbnail image not reproduced)    570
(thumbnail image not reproduced)    222
(thumbnail image not reproduced)    125

4. METHODOLOGY
This section describes our methods: the collection of data, the extraction of domain names, probing the hosting IP addresses and fetching the websites.

4.1 Data Collection
For this study, we gathered spam emails from a number of domains controlled by our researchers, including the "catch-all" email accounts for these domains. Traditionally, emails sent to non-existent users are rejected by a mail server, but with a catch-all address all emails are accepted, and those for non-existent addresses are delivered to a single account. Email sent to addresses in these catch-all domains that do not correspond to real users forms a large percentage of our data set. From our dataset of more than 7 million emails, we chose to focus on a three-month period containing approximately 1.2 million email messages received from January 2009 to March 2009.

4.2 Extracting Domain Name
We extract URLs from email messages and then the domain name portion of each URL. During the process, we observed that many spam domains use wildcard DNS records.

A wildcard DNS record is a DNS record that resolves requests for non-existent machine names that have a matching domain suffix [13]. It is specified by using a "*" as the leftmost part of a machine name, e.g. *.domain.com. Therefore, if a user requests a name ending with "domain.com" that does not have a corresponding entry in the DNS records, the wildcard record is used to resolve the request.

To test for a wildcard domain, we first extract the domain name portion from the machine name, e.g. the domain name for "zhpt.tarecahol.cn" would be "tarecahol.cn". Then we create our own phantom machine name by prepending a random string to the domain name. If the new machine name can still be resolved, the domain is using a wildcard DNS record. It is then very likely that all other machine names ending with the same domain name also resolve to the same site. This strategy greatly reduces the number of machines that need to be fetched.

4.3 Probing the Hosting IPs
The Unix "dig" command is used to check IP hosting information. We save each domain-IP pair in a database table. Since a domain can be hosted on more than one IP address and an IP address can host many domains, there is a many-to-many relationship between domain and IP, and each domain-IP pair is a unique entry. We also record the date when the domain is first observed in spam emails and the last time it is observed. The WHOIS information for each IP is also retrieved using the "dig" command, and we store the network block, organization name, country code and AS number in another table. The two tables are linked by IP index.

4.4 Clustering Domains on IP Group
Once the IP addresses of the domains are fetched, we cluster the domains on a daily basis according to the IP group they belong to. An IP group is one or several IP addresses that simultaneously host the same domain. Since many domains have the same IP combination in their DNS records, the clustering result reveals the most heavily used IP groups, which are of interest to spam investigators. We are also able to trace a particular IP group by monitoring the domains hosted there each day.

4.5 Website Fetching
We also fetch the websites hosted on the most significant IP groups, ranked by the number of domains. The successful fetch rate is about 60%. Later we found that some hosts have firewalls that block probing IP addresses if they show up frequently in log files. As a result, some of our probing IP addresses were blacklisted and always received "network timeout". We therefore had to abandon automatic fetching until we find a more successful strategy.

5. RESULTS
Over 95% of the spam in our dataset contained URLs, among which we found 42,703 domains that were using wildcard DNS entries and which accounted for 1,050,180 host machine names appearing in the email messages. Spammers have clearly taken advantage of this technique to create a large number of phantom machines by adding a random string in front of an existing registered domain name; one of the domains we found had 11,383 phantom machines associated with it. We believe the motivation is that a spam filter building a domain blacklist will find many domains that are actually made up from just a single real domain.

Multiple IP hosting, a technique used to balance traffic load, is also very common. About 58% of the domains have been hosted on more than one IP address.

5.1 Top Hosting IPs
Over 4000 hosting IP addresses were found in the three-month period. Several groups of IP addresses stand out because they host a great number of domains that are mostly spam-advertised websites. Table 2 lists the top ten IP groups hosting the largest number of domains. These IP groups host 75% of the 42,703 domains we found from January 2009 to March 2009.

Table 2: Top 10 hosting IP groups
IP addresses                                       Domain count
60.191.221.126 / 220.248.186.101                   8951
220.248.186.111                                    5571
203.93.212.239 / 211.91.237.3                      3923
220.248.186.114 / 220.248.186.126                  3712
203.93.212.239 / 220.248.186.126                   2073
87.242.78.57                                       1394
110.52.7.250 / 211.91.237.3                        1345
60.191.221.135 / 219.152.120.12 / 220.248.172.37   1040
125.181.106.147                                    960
210.51.10.189                                      937

5.2 Top Hosting Network Blocks
The hosting IP addresses are distributed among 1391 network blocks residing in 61 countries. Table 3 shows the top ten network blocks that host the largest number of domains (NW is an abbreviation for Network). Note that a domain can be hosted on multiple network blocks, so it may be counted multiple times in this table. Nine of the blocks are in China, spread over 6 provinces. The other one is in Russia.

We suspect these long-lived IP addresses hosting many domains are examples of "bullet-proof" hosting services. One current example can be found at http://www.bulletproof-server.net [3]. This website advertises that "Bullet Proof Hosting" is meant for people who want to promote their products and services on their website by sending commercial and bulk emails. Interestingly, the site says its servers are located in a data center in China, and that all BP servers can be used as hosting servers or proxy mailing servers. A single-IP-address server costs $1200 per month, but a 30-IP-address server is only $4200 per month, i.e. $140 per IP address. Other services offered by bullet-proof providers include a feature that hides the true domain name by displaying a disguised domain name in the URL. Because of the lack of regulation, servers of this kind are an ideal place for spammers to host their websites.

Table 3: Top 10 hosting network blocks
Organization                                 Network block       IP count   Domain count
China Unicom Hunan NW                        220.248.160.0/19    6          22343
China Unicom Hunan NW                        110.52.0.0/15       3          2991
China Jinhua Telecom Co.Ltd                  60.191.192.0/18     4          10454
China Qingdao Cncgroup NW                    203.93.208.0/21     1          5948
China United Telecom Corp.                   211.91.224.0/20     1          5268
CHINANET Chongqing NW                        219.152.0.0/17      1          2104
CHINANET Chongqing NW                        222.176.0.0/14      1          977
Russian Federation Moscow Masterhost-hstd    87.242.64.0/18      2          1395
CNCGROUP Liaoning NW                         221.200.0.0/14      2          1338
Changsha CenterCityNetBar NW                 58.20.0.0/16        1          981

5.3 Top Hosting Countries
Table 4 shows the top ten countries that host the largest number of domains. The USA has the largest number of IP addresses, but the average number of domains hosted on those IP addresses is relatively small. In contrast, the IP addresses in mainland China host a much larger average number of domains, and the total number of IP addresses is only a small fraction of that in the USA. From the counts in Table 4, this is roughly 438 domains per IP address in China versus about 1.5 in the USA.

Table 4: Top 10 hosting countries
Country               IP count   Domain count
China                 72         31516
USA                   3426       5199
Korea, Republic of    45         1816
Russia                92         1695
Canada                94         670
Bulgaria              7          521
Germany               141        509
Hong Kong             4          446
Hungary               51         438
Romania               24         394

5.4 IP Sharing
Based on the websites we fetched, we discovered that many IP groups host more than one website, suggesting some relationship between these websites. Figure 2 shows the different websites and the number of domains that were hosted on IP addresses 220.248.186.114 and 220.248.186.126 during the week of Valentine's Day. There are five different spam websites: ED Pill Store, Dr. Maxman, Exquisite Replicas, PowerGain+ and US Health Care.

Figure 2: Share of websites on host 220.248.186.114 and 220.248.186.126, Feb 14-20, 2009

On IP 210.51.10.189, we found 9 different websites corresponding to different subdirectories (Table 5). For example, a domain on that IP with the subdirectory "/a" points to the "Penis Enlarge Patch Rx" site, and the subdirectory "/f" leads to the "Lifefree: The Safest and Easiest way out!" site. IP sharing is therefore very common among different spam groups, suggesting there may be a strong connection among them.

Table 5: Websites hosted in different sub-directories on 210.51.10.189
Sub-directory   Website
/a              Penis Enlarge Patch Rx
/f              Lifefree: The Safest and Easiest way out!
/n              Anatrim The most powerful fat loss blend available anywhere
/r              Ultra allure pheromones
/s              Advanced Gain Pro Pills
/t              Elite Extender
/u              Ultra Curves
/v              ERECTifix Rx Strips : All the Strength of Viagra, Cialis, and Levitra in Half the Time!
/z              Secrets of Seduction

5.5 IP Shifting
Starting from around Mar 18, some old IP groups stopped receiving new domains, and corresponding new IP groups started to emerge for the same spam cluster. Figure 3 shows the number of domains on two IP groups per day from Jan 01 to March 31, 2009. Both IP groups were found to host counterfeit "Canadian Pharmacy" websites. The spammer therefore apparently moved the hosting IP addresses by replacing old IP addresses with new ones in the nameserver entries. We then checked the domains on the old IP groups and found that they had also changed to the same new IP groups. However, the new IP addresses are still in the same network range.

Compared to the IP addresses, domains are replaced more frequently: 87% of them appeared in spam emails for at most two consecutive days, although many of them stayed alive during the entire investigation period.

Figure 3: Websites Domain count per day on two IP groups hosting "Canadian Pharmacy" websites, Jan-Mar 2009

6. CONCLUSIONS AND FUTURE WORK
This paper analyzed the IP hosting of web links appearing in spam emails. Many spam domains were found to use wildcard DNS records, resulting in a large number of phantom machines in spam emails. The results confirmed our speculation that hosting IPs are more stable and easier to trace than botnets. Many hosting IPs have remained in use for more than two months, or maybe even longer, since we started the analysis on Jan 1, 2009. Shifting of hosting IP addresses does occur, but much less frequently than the replacement of domains. Interestingly, most domains showed up in our spam email collection for only one day and never appeared again, yet many of them are still alive. We suspect the cause is a counter-strategy of spammers against domain blacklisting. They may keep the old domains alive to be used in spam emails in the future. It will therefore be interesting to keep tracking them and see if they come back. We also plan to collect more spam emails from different sources to see whether this still holds or is specific to our dataset.

Although the USA has the highest number of hosting IP addresses, the most heavily used IP addresses reside outside the USA, beyond the reach of US jurisdiction. Some IP addresses in China were found to host more than 8000 real domains in a period of three months. The number of domains created for spam is astonishing, suggesting strongly organized criminal behavior. These IP addresses are tied to some popular spam campaigns, such as Canadian Pharmacy and PowerGain+. The spam emails advertising these products are sent by some notorious Trojan-controlled botnets. We suggest that this evidence be used to form strategic relationships with law enforcement in China to assist in stopping this criminal web-hosting activity. If provided with overwhelming evidence of spam being sent from these bullet-proof servers, it is hoped these relationships will encourage a new level of enforcement.

IP sharing is also common among different spam websites, suggesting they are either related to one spammer or reflect cooperation between different spammers.

In the future, we want to make several improvements. Many URLs are redirections, so the IP addresses we fetched are not the ones for the destination domains. We need to record the destination domains and their corresponding IP addresses and compare them with the ones that appeared in emails. We also need to check the domains more regularly to see if the hosting IP addresses change. We suspect the spammers have multiple IP addresses in several different networks and shift the hosts back and forth from time to time. Because some hosts have deployed defenses against repeated probing, we need to develop a mechanism that uses dynamic IP addresses while fetching the web content. We also want to tie the IP and domain clusters to the spam emails to find the number of email messages that refer to the domains and the patterns in the spam messages.

Tracking the nameservers of the hosting IPs should also be useful because that is where the spammers update the DNS records. We suspect some nameservers are also controlled by spammers and are used to host spam websites as well. Our goal is to find all the hosts controlled by the spammers and the networks they reside in, what websites are hosted there, and the nameservers. Then we can relate the websites to the spam emails and find the scope of the spam. The results should catch the attention of law enforcement. If something can be done to disrupt the hosting networks of spammers, the spam emails will become useless because the hosts are gone. We believe this will be an effective way to reduce spam as well.
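
As an illustration of the methodology, the wildcard test of Section 4.2 and the IP-group clustering of Section 4.4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tooling used in the study (which relied on "dig" and a database). The function names, the two-label rule for extracting the domain name, and the pluggable resolver argument are all hypothetical choices made for the example.

```python
import random
import socket
import string
from collections import defaultdict


def registered_domain(machine_name):
    """Return the last two labels, e.g. 'zhpt.tarecahol.cn' -> 'tarecahol.cn'.

    Assumption: a naive two-label rule; it does not handle multi-label
    suffixes such as 'co.uk'.
    """
    labels = machine_name.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def resolve(hostname):
    """Return the set of IPv4 addresses for hostname (empty if unresolvable)."""
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
        return frozenset(addresses)
    except OSError:  # covers socket.gaierror and socket.herror
        return frozenset()


def uses_wildcard_dns(domain, resolver=resolve, label_length=12):
    """Probe a random 'phantom' machine name under the domain.

    If a name that almost certainly was never registered still resolves,
    the domain is assumed to use a wildcard DNS record.
    """
    alphabet = string.ascii_lowercase + string.digits
    phantom = "".join(random.choices(alphabet, k=label_length))
    return bool(resolver(f"{phantom}.{domain}"))


def cluster_by_ip_group(domain_ips):
    """Group domains by the exact set of IP addresses hosting them.

    domain_ips maps a domain name to an iterable of IP addresses observed
    for it on one day. Domains with no addresses are skipped.
    """
    groups = defaultdict(set)
    for domain, ips in domain_ips.items():
        ip_group = frozenset(ips)
        if ip_group:
            groups[ip_group].add(domain)
    return dict(groups)
```

Passing the resolver as an argument lets the wildcard test be exercised against a stub without network access. In live use, the default resolver depends on the local DNS configuration, so a resolver that rewrites failed lookups would make every domain look like a wildcard domain. Running cluster_by_ip_group once per day on that day's domain-to-IP observations gives the per-day domain counts per IP group that Section 5.5 tracks.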

7. REFERENCES
[1] Anderson, D. S., Fleizach, C., Savage, S. and Voelker, G. M. (2007). Spamscatter: Characterizing internet scam hosting infrastructure. In Proc. of 16th USENIX Security Symposium, pp. 125-148.
[2] Bacher, P., Holz, T., Kotter, M. and Wicherski, G. (2005). Know your enemy: Tracking botnets. The Honeynet Project and Research Alliance, http://www.honeynet.org
[3] Bullet proof hosting package: http://www.bulletproof-server.net/bulletproof-hosting.html
[4] Calais, P. H., Pires, D. E. V., Guedes, D. O., Meira, W. Jr., Hoepers, C. and Klaus, S. (2007). A campaign-based characterization of spamming strategies. In Proc. of the Fifth Conference on Email and Anti-Spam.
[5] Cooke, E., Jahanian, F. and McPherson, D. (2005). The zombie roundup: Understanding, detecting and disrupting botnets. Workshop on Steps to Reducing Unwanted Traffic on the Internet, pp. 39-44.
[6] Grizzard, J., Sharma, V. and Dagon, D. (2007). Peer-to-peer botnets: overview and case study. HotBots '07: Workshop on Hot Topics in Understanding Botnets.
[7] Kanich, C., Kreibich, C., Levchenko, K., Enright, B., Voelker, G., Paxson, V. and Savage, S. (2008). Spamalytics: An empirical analysis of spam marketing conversion. In Proc. of 15th ACM Conference on Computer and Communication Security.
[8] Li, F. and Hsieh, M. H. (2006). An empirical study of clustering behavior of spammers and group-based anti-spam strategies. In Proc. of the Third Conference on Email and Anti-Spam.
[9] Microsoft Security Intelligence Report, Volume 6, 2H08. http://www.microsoft.com/sir/
[10] Sex, Drugs and Software Lead Spam Purchase Growth. http://www.marshal.com/pages/newsitem.asp?article=748
[11] Wei, C., Sprague, A., Warner, G. and Skjellum, A. (2008). Mining spam email to identify common origins for forensic application. In Proc. of 2008 ACM Symposium on Applied Computing, pp. 1433-1437.
[12] Wei, C., Sprague, A. and Warner, G. (2008). Detection of Network Blocks Used by the Storm Worm Botnet. In Proc. of 46th ACM Southeast Conference.
[13] Wildcard DNS Record: http://en.wikipedia.org/wiki/Wildcard_DNS_record
[14] Zhang, J., Porras, P. and Ullrich, J. (2008). Highly Predictive Blacklisting. In Proc. of the 17th Conference on Security Symposium, pp. 107-122.