A Dynamic Approach to Spam Filtering

International Journal of Pure and Applied Mathematics
Volume 120 No. 6 2018, 179-200
ISSN: 1314-3395 (on-line version)
url: http://www.acadpubl.eu/hub/
Special Issue

A DYNAMIC APPROACH TO SPAM FILTERING

Indumathi.J 1, Gitanjali.J 2
1 Department of Information Science and Technology, Anna University, Chennai 600 025, Tamil Nadu, India.
2 School of Information Technology and Engineering, VIT University, Vellore-632014, Tamil Nadu, India
*Corresponding author E-mail: [email protected]

July 13, 2018

Abstract

Spam, the unsolicited guest in our inbox, has proliferated remarkably in recent years. Every internet user encounters spam at one point or another. Spammers are growing by leaps and bounds and tirelessly equip themselves with the latest technologies, forcing the research community to devise new anti-spam techniques. E-mail service providers, in spite of providing filters, find it challenging to keep up with the shifting sands of spam technology. Hence, the development of anti-spam filtering techniques must continually keep pace with spam. Filtering algorithms frequently identify spam using statistical or heuristic schemes. This paper proposes a system with improved facilities that is designed to overcome the limitations of existing systems. The proposed Factors Hyperbolic Tree (FHT) based algorithm, merged with the Bayesian algorithm, handles spam filtering in a dynamic environment (unlike lexical matching algorithms) by considering various relevant factors such as age and temperature. A decision is normally made by considering a certain number of factors and specific reasoning along with the keywords. To express the relationships among decisions, factors, and objects, an FHT is used, which can describe every object of interest together with its related factors.
The experimental results show that this approach filters e-mail effectively and improves filtering precision. In this paper, the FHT algorithm, which filters e-mail by conditional factors instead of word matching, is enhanced. The enhanced FHT is tested and shown to be suitable for semantic analysis and decision support. The experimental results show that the enhanced technique filters e-mail more effectively, and with higher precision, than the FHT algorithm. Experimentally, the enhanced FHT is a more practical technique for use in anti-spam filters, and it outperforms the previous techniques, indicating that it is more robust.

Key Words: Spam, Ranked Term Frequency, Factors Hyperbolic Tree, Bayesian algorithm

1 INTRODUCTION

In the hands of a cybercriminal, spam does more than bring personal gain; it opens a Pandora's box, making it possible to steal a person's identity, sell contraband, stalk victims, disrupt operations with malicious programs, or disseminate malicious content. While the whole world strives to make a difference in every line of work, spam upends everything with its payloads. Spam, that is, unsolicited bulk mail, is usually sent to many recipients, with payloads such as viruses and Trojans. Let us take a quick look at the significance and history of e-mail spam and at the filtering techniques used.

1.1 HISTORY OF EMAIL SPAM

The first documented spam was a note promoting the availability of a new model of Digital Equipment Corporation (DEC) computers, sent by Gary Thuerk to 393 recipients on ARPANET in 1978. His assistant, Carl Gartley, wrote a single mass e-mail, and the feedback from the net community was violently negative.
On USENET, the world's biggest online conferencing system, the computer nerds (who were big fans of Monty Python) recognized mass mailing as spam, and the name caught on for unsolicited email. After this first recorded instance, spam incidents have grown by leaps and bounds. Table 1 in the annexure gives a quick view of the spam timeline.

1.2 WHY DO WE CALL UNWANTED EMAILS "SPAM"?

The genesis of the term is hilarious. The word originates from Monty Python's famous Spam sketch, which takes place in a greasy spoon cafe with a menu in which every single dish features spam. It was the script of that sketch, with its inexorable "spam, spam, spam, spam", and the way unwanted mail likewise pours out of the PC, that forged the connection with email. As the scene develops, you can see why the sketch is a great analogy for uninvited and unwelcome email. The anoraks who populated the PC world were, likewise, ardent Python fans. To date, the term spam is used to denote the uninvited junk mail that makes up around 80 to 85% of email.

1.3 MOTIVATION

Surf Control's Anti-Spam Prevalence Study (2002) points out that there are many reasons why unsolicited commercial e-mail is such a problem:

• Consumer perception: for many people, accessing e-mail is still a bit of a struggle, especially at peak traffic times, and network congestion can make it a laborious task simply to download your e-mail. [Surf Control "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

• Cost shifting: the low cost of e-mail is the main reason for its soaring use. From the advertiser's perspective, sending hundreds or thousands of messages per hour is even more economical.
But the costs of receiving it can be quite substantial. They range from long-distance or per-minute access charges for dialing into an Internet service provider (ISP), to the cost of connectivity and disk storage space at the ISP, to the inevitable administrative costs when the incoming flood outstrips capacity and results in system outages. There is also the cost of opportunities lost because of system outages, delayed services, and overflowing mailboxes. [Surf Control "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

• Fraud: several surveys have found that consumers hate to receive spam, and many ISPs have taken a variety of costly steps to reduce the volume of spam transmitted through their systems, including building up extra capacity to accommodate the demands of filtering and storing.

• Global implications: e-mail is a wonderful tool for professional and personal communication, and it has even more far-reaching potential that may be lost if the medium's functionality and utility are destroyed by the proliferation of junk e-mail. The Internet is an incredible tool for spreading information critical to the development of freedom and democracy around the world. [Surf Control "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

• Harm to the marketplace: an e-mail message from a spammer travels to millions of people via numerous other systems en route to its destinations, once again shifting cost away from the originator. The carriers in between suddenly find themselves bearing the burden of carrying advertisements for the spammer. [Surf Control "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

• Theft: sending spam results in one party imposing costs on another, against that party's will and without permission.
Some have called unsolicited e-mail a form of postage-due marketing or a form of theft. [Surf Control "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

This paper is organized as follows. Section 2 briefly reviews the perils of spam, and Section 3 presents the relationship of spam to privacy and security. Possible solutions and anti-spam techniques are dealt with in Sections 4 and 5. The notions of the Factors Hyperbolic Tree (FHT) and Ranked Term Frequency (RTF) are explained in Sections 6 and 7. The results and analysis and the conclusions are given in Sections 8 and 9 respectively.

2 WHY SPAM IS A PROBLEM?

What makes spam e-mails most hostile is that they not only assail users without their consent but also invade the user's e-mail space, waste network capacity, and consume considerable time in checking and deleting them. A few of the issues arising from spam, the tip of the iceberg, are discussed below.

2.1 DESTRUCTIONS CAUSED BY SPAM.

The direct damages caused by spam are loss of productivity and consumption of corporate network resources such as bandwidth and disk space. The indirect damages include the risk of spam accountability shifting from one person or domain to another, so that a sender is recognized as a spammer by servers to which spam has been sent without the sender's knowledge, and the accidental deletion of important valid messages when eradicating spam. Spam also serves to propagate malware that has no means of propagation of its own, such as Trojans, keyloggers, and backdoors. There are cases in which criminals have sent e-mails with attachments containing malware and used it to access information stored on the recipient's computer. This scam, known as the 'invoice' e-mail scam, steals bank details: the malicious software logs your online banking details, along with other financial information, and sends them to the criminals.
Trolls (http://www.digitalethics.org/essays/end-anonymous-emails) are individuals who hide their identity behind an anonymous e-mail address and derive pleasure from creating an atmosphere of hate and discontent, ranging from infrequent off-color remarks to full-blown daily bullying with far-reaching and sometimes tragic consequences. Just this year, a young girl committed suicide after being cyberbullied by anonymous posters on the site www.ask.fm.
