Characterizing Spam Traffic and Spammers
Total Page:16
File Type:pdf, Size:1020Kb
2007 International Conference on Convergence Information Technology Characterizing Spam traffic and Spammers Cynthia Dhinakaran and Jae Kwang Lee , Department of Computer Engineering Hannam University, South Korea Dhinaharan Nagamalai Wireilla Net Solutions Inc, Chennai, India, Abstract upraise of Asian power houses like China and India, the number of email users have increased tremendously There is a tremendous increase in spam traffic [15]. These days spam has become a serious problem these days [2]. Spam messages muddle up users inbox, to the Internet Community [8]. Spam is defined as consume network resources, and build up DDoS unsolicited, unwanted mail that endangers the very attacks, spread worms and viruses. Our goal is to existence of the e-mail system with massive and present a definite figure about the characteristics of uncontrollable amounts of message [4]. Spam brings spam and spammers. Since spammers change their worms, viruses and unwanted data to the user’s mode of operation to counter anti spam technology, mailbox. Spammers are different from hackers. continues evaluation of the characteristics of spam and Spammers are well organized business people or spammers technology has become mandatory. These organizations that want to make money. DDoS attacks, evaluations help us to enhance the existing technology spy ware installations, worms are not negligible to combat spam effectively. We collected 400 thousand portion of spam traffic. According to research [5] most spam mails from a spam trap set up in a corporate spam originates from USA, South Korea, and China mail server for a period of 14 months form January respectively. Nearly 80% of all spam are received from 2006 to February 2007. Spammers use common mail relays [5]. Our aim is to present clear techniques to spam end users regardless of corporate characteristics of spam and spam senders. We setup a server and public mail server. So we believe that our spam trap in our mail server and collected spam for the spam collection is a sample of world wide spam traffic. past 14 months from January 2006 to February 2007. Studying the characteristics of this sample helps us to better understand the features of spam and spammers We used this data for our study to characterize spam technology. We believe that this analysis could be and its senders. We conducted several standard spam useful to develop more efficient anti spam techniques. tests to separate spam from incoming mail traffic. The standard test includes various source filters, content 1. Introduction filter. The various source filter tests includes Baysean filter, DNSBL, SURBL, SPF, Grey List, rDNS etc. The E-mail has emerged as an important communication learning is enabled in content filters. The size of the source for millions of people world wide by its dictionary is 50000 words. At our organization we convenience and cost effectiveness [6]. Email provides strictly implement mail policies to avoid spam mails. user’s low cost message to large number of people by The users are well instructed on how to use mail simply clicking the send button. Email message sizes service for effective communication. ranges from 1 kb to multiple mega bytes which is much larger than fax and other communication The rest of the paper is organized as follows. Section 2 devices. The byproducts of email like instant discusses related work. Section 3 provides data messaging, chat etc., make life easier and adds more collection of legitimate and spam mails. In section 4, sophisticated facilities to the Internet users. According we describe our classification and characteristics of to a Radicati Group [18] study from the first quarter of spam traffic. Section 5 provides details of spammers 2006, there were about 1.1 billion email users and their technology. We conclude in section 6. worldwide. Traditionally the Internet penetration is very high in USA and Europe. But due to the recent 0-7695-3038-9/07 $25.00 © 2007 IEEE 831 DOI 10.1109/ICCIT.2007.89 2. Related work In [1] propose a novel approach to defend DDoS attack caused by spam mails. Their study reveals the effectiveness of SURBL, DNSBLs, content filters. They have presented inclusive characteristics of virus, worms and trojans accompanied spam as an attachment. Their approach is a combination of fine tuning of source filters, content filters, strictly implementing mail policies, educating user, network monitoring and logical solutions to the ongoing attack. In [3] examines the use of DNS black lists. They have examined seven popular DNSBL and found that 80% of the spam sources are listed in some DNSBL. In [4] presented a comprehensive study of clustering behavior of spammers and group based anti spam strategies. Their study exposed that the spammers has Figure 1. Incoming mail traffic demonstrated clustering structures. They have proposed a group based anti spam frame work to block .Spammers do not change their tactics on day to day organized spammers. In [5] presented a network level basis. Our study shows that spammers follow the same behavior of spammers. They have analyzed spammers technology until the anti spammers find efficient way IP address ranges, modes and characteristics of botnet. to keep them at bay. The time period ranges from 8 Their study reveals that blacklists were remarkably months to 1 year. We found that the major spammers ineffective at detecting spamming relays. Their study follow the same technology from May 2006 to states that to trace senders the internet routing structure February 2007. The figure 1 shows the incoming mail should be secured. In [6] presented a comprehensive traffic of our mail server for 2 weeks. The figure shows study of spam and spammers technology. His study that the spam traffic is not related to legitimate mail reveals that few work email accounts suffer from spam traffic. The legitimate mail traffic is two way traffic than private email. In [8] Gomez, Crsitino presented an induced by social network [8]. But the spam traffic is extensive study on characteristics of spam traffic in one way traffic. From this picture we can understand terms of email arrival process, size distribution, the that the server is handling more number of spam than distributions of popularity and temporal locality of legitimate mail traffic. email recipients etc., compared with legitimate mail traffic. Their study reveals major differences between The Figure.1 shows the number of legitimate mails, spam and non spam mails. spam and spam with virus as an attachment for a period of 2 weeks from February 1 to February 15. The 3. Data Collection x axis is day and y axis is the number of spam received by the spam trap on server. Roughly the number of Our characterization of spam is based on 14 months legitimate mails ranges from 720 to 7253 with an collection of data over 400,000 spam from a corporate average rate of 906 per day. The spam mails ranges mail server. The web server provides service to 200 from 1701 to 8615 with an average rate of 4736 per users with 20 group email IDs and 200 individual mail day. The spam with viruses as an attachment ranges accounts. The speed of the Internet connection is 100 from 209 to 541 with an average rate of 403 per day. Mpbs for the LAN, with 20 Mbps upload and download speed (Due to security and privacy concerns 4. Spam Classification and Characteristics we are not able to disclose the real domain name). To segregate spam from legitimate mail, we conducted a We have analyzed millions of spam received in our standard spam detection tests in our server. The spam spam trap. Mostly the spam mails are related to mails detected by these techniques were directed to the finance, pharmacy, business promotion, adultery spam trap in the mail server. services and viruses. Considerable amount of spam has virus and worms as an attachment [1]. 832 Plain text Text + link message to the website Days Others 1 2271 949 25 2 1790 2682 14 3 207 2492 1 4 489 734 1 5 169 2376 1 6 253 1520 0 7 546 2167 5 Spam trap received considerable numbers of spam without attachment. These mails are related to pharmacy. These pharmacy related spam contains text Figure 2. Spam with/ without attachment traffic messages and links to their web site. The size of these . mails in this category ranges from 1 kb to 3 KB. Based on our study the spam mails typically fall into one of two camps, like spam without attachment and spam with attachment. The Fig. 2 shows the number of spam with attachment and spam without attachment collections for a period of 2 weeks in our corporate mail server. The x axis is day and y axis is the number of spam received by our mail server. The spam with attachment mail traffic ranges from 354 to 4557 with an average rate of 1872 mails per day. The spam mails without attachment received by our spam trap ranges from 346 to 4825 with an average rate of 2722. The spam with attachment does not have any relationship with spam without attachment in the terms of traffic volume. Both are driven by different spammers with different technology. Figure 3. Spam without attachment collection 4.1. Spam without attachment Figure. 3 shows the number of plain text spam mails, Spam without attachments are mostly text messages. It spam mail containing text message and a link to the can be roughly categorized as text only messages and targets website or a URL and different types of spam text messages with URL or a clickable link to website. mails received by our spam trap for a period of seven The size of these message ranges from 2 kb to 3 kb.