International Journal of Pure and Applied Mathematics Volume 120 No. 6 2018, 179-200 ISSN: 1314-3395 (on-line version) url: http://www.acadpubl.eu/hub/ Special Issue http://www.acadpubl.eu/hub/

A DYNAMIC APPROACH TO FILTERING

Indumathi.J 1, Gitanjali.J2 1Department of Information Science and Technology, Anna University,Chennai 600 025. Tamilnadu,India. 2School of Information Technology and Engineering VIT University, Vellore-632014, Tamil Nadu, India *Corresponding author E-mail: [email protected]

July 13, 2018

Abstract Spam, the unsolicited guest in our inbox, has proliferated at an incredible rate in recent years, and every internet user encounters it at one point or another. Spammers are growing by leaps and bounds and tirelessly equip themselves with the latest technologies, forcing the research community to devise new anti-spam techniques. Service providers, in spite of providing filters, find it challenging to keep up with the shifting sands of spam technology. Hence, the development of anti-spam filtering techniques must continually keep pace. Existing algorithms usually identify spam using statistical or heuristic schemes. This paper proposes a system intended to overcome the limitations of the existing system. The proposed Factors Hyperbolic Tree (FHT) based algorithm, merged with the Bayesian algorithm, handles spam filtering in a dynamic environment (unlike lexical matching algorithms) by considering relevant factors such as age and temperature. A decision is normally made by considering a number of factors, together with specific reasoning, alongside the keywords. To express the relationships among decisions, factors and objects, an FHT is used, which can describe every object of interest with its related factors. Experiments show that this approach filters e-mail effectively and improves filtering precision. In this paper, the FHT algorithm, which filters e-mail by conditional factors instead of word matching, is enhanced. The enhanced FHT is tested and found to be suitable for semantic analysis and decision support. The experimental results show that the enhanced technique filters e-mail more proficiently, and with higher precision, than the FHT algorithm. Experimentally, the enhanced FHT is a more practical technique for anti-spam filters and outperforms the previous techniques, which indicates that it is more robust.

Key Words: Spam, Ranked Term Frequency, Factors Hyperbolic Tree, Bayesian algorithm

1 INTRODUCTION

In the hands of a cybercriminal, spam not only yields personal gain; it also opens Pandora's box, leading to identity theft, the sale of contraband, stalking, the disruption of operations with malicious programs, and the dissemination of malicious content. While the whole world strives to make a difference in every line of work, spam upends everything with its payloads. Spam, the unsolicited bulk mail, is usually sent to many recipients, often with payloads of viruses and Trojans. Let us take a quick look at the significance and history of email spam and the filtering techniques used.

1.1 HISTORY OF

The first documented spam was a note promoting the availability of a new model of Digital Equipment Corporation (DEC) computers, sent by Gary Thuerk to 393 recipients on ARPANET in 1978. His assistant, Carl Gartley, wrote a single mass e-mail, to which the feedback from the net community was violently negative. From the , the world's biggest online conferencing system, mailing, the computer nerds (who were big fans of Monty Python) recognized the mass mailing as spam and the name caught on. After this first major recorded instance, spam incidents have grown by quantum jumps. This unsolicited email was called spam. Table 1 in the annexure gives a quick view of the spam timeline.

1.2 WHY DO WE CALL UNWANTED “SPAM”?

The origin of the term is amusing. The word comes from Monty Python's famous Spam sketch, which takes place in a greasy spoon cafe whose menu features spam in every single dish. The script of that skit, with its inexorable "spam, spam, spam, spam", mirrors the way unwanted mail simply pours out of the PC, and so the connection with email was forged. As the scene develops, you can see why the sketch is a great analogy for uninvited and unwelcome email. The enthusiasts who populated the early PC world were also ardent Python fans. To date, the term spam is used to indicate the uninvited junk mail that makes up around 80 to 85% of email.

1.3 MOTIVATION

SurfControl's Anti-Spam Prevalence Study (2002) points out that there are many reasons why unsolicited commercial e-mail is such a problem:

• Consumer perception: For many people, accessing e-mail still represents a bit of a struggle, especially at peak traffic times, and network congestion can make it a laborious task simply to download your e-mail. [SurfControl "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

• Cost shifting: The inexpensiveness of e-mail is the main reason for its soaring use. From the advertiser's perspective, sending hundreds or thousands of messages per hour is even more economical. The costs of receiving it, however, can be quite substantial: they range from long-distance or per-minute access charges for dialing into an Internet service provider (ISP) to the cost of connectivity and disk storage space at the ISP, plus the inevitable administrative costs when the incoming flood outstrips capacity and causes system outages. There is also the cost of opportunities lost because of system outages, delayed services, and overflowing mailboxes. [SurfControl "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

• Fraud: Several surveys have found that consumers hate to receive spam, and many ISPs have taken a variety of costly steps to reduce the volume of spam transmitted through their systems, including building extra capacity to accommodate the demands of filtering and storing.

• Global implications: Email is a wonderful tool of professional and personal communication, and it has even more far-reaching potential that may be lost if the medium's functionality and utility are destroyed by the proliferation of junk e-mail. The Internet is an incredible tool for spreading information critical to the development of freedom and democracy around the world. [SurfControl "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

• Harm to the marketplace: An email message from a spammer travels to millions of people via numerous other systems en route to its destinations, once again shifting cost away from the originator. The carriers in between bear the burden of carrying advertisements for the spammer. [SurfControl "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

• Theft: The sending of spam results in one party imposing costs on another, against that party's will and without permission. Some have called unsolicited e-mail a form of postage-due marketing or a form of theft. [SurfControl "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

This paper is organized as follows. Section 2 briefly reviews the perils of spam, and Section 3 presents the relationship of spam with privacy and security. Possible solutions and anti-spam techniques are dealt with in Sections 4 and 5. Related work is surveyed in Section 6. The hyperbolic tree is described in Section 7, and the Factor Hyperbolic Tree and Ranked Term Frequency (RTF) are explained in Section 8. The analysis and results, the performance evaluation, and the conclusions are given in Sections 9, 10 and 11 respectively.

2 WHY SPAM IS A PROBLEM?

The most hostile aspect of spam e-mail is that it not only assaults users without their assent but also invades their e-mail space, wastes network capacity, and consumes considerable time in checking and deleting the spam. To look at the tip of the iceberg, a few issues arising from spam are discussed below.

2.1 DESTRUCTIONS CAUSED BY SPAM

The direct damages caused by spam are loss of productivity and the use of corporate network resources such as bandwidth and disk space. The indirect damages include the risk of spam accountability shifting from one person or domain to another, of being recognized as a spammer by servers that have been sent spam without one's knowledge, and of accidentally deleting important valid messages when eradicating spam. Spam also serves to propagate malware that has no means of propagation of its own: Trojans, key loggers, backdoors, etc. There are cases in which criminals send emails with attachments containing malware and use it to access information stored on your computer. This scam, known as the 'invoice' email scam, steals bank details: the malicious software logs your online banking details, along with other financial information, and sends them to the criminals. Trolls (http://www.digitalethics.org/essays/end-anonymous - emails) are individuals who hide their identity behind an anonymous email address and derive pleasure from creating an atmosphere of hate and discontent, ranging from infrequent off-color remarks to full-blown daily bullying with far-reaching and sometimes tragic consequences. Just this year, a young girl committed suicide after being cyberbullied by anonymous posters on the site www.ask.fm.

3 SECURITY, PRIVACY AND SPAMS

Security and privacy are like two eyes, and both are indispensable. There are several mechanisms for preserving privacy [Gitanjali J. (2007, 2008, 2009); Indumathi J. (2012, 2013a,b,c); Indumathi J., Uma G.V. (2007a,b, 2008a,b,c,d); Murugesan K. (2009, 2010a,b); Prakash D. (2009); Satheesh Kumar K. (2008)].

ePrivacy Directive

The Directive on Privacy and Electronic Communications (2002/58/EC), part of the EU's eCommunications regulatory framework, came into force in July 2002. The ePrivacy Directive protects the privacy and the personal data of natural persons (and the legitimate interests of legal persons) when using communications services. It also bans spam and spyware. (https://iapp.org/media/pdf/resource center/ePR 2018-04-13 .pdf)

Does the Spam Filter Compromise the Privacy of My Email?

A spam control system must use a filtering mechanism that does not jeopardize the privacy of clients' email messages. The spam filter software should run on the mail servers, be fully automated with minimal human intervention, and not save any data about a message after determining its spam score.

4 SOLUTIONS

We can keep the spam problem at bay and limit it to tolerable levels. The research community has been striving continuously for several years to detect and eliminate spam. Several techniques have shown beneficial but limited results, and most only for a short time. Hence, any new anti-spam proposal needs an array of complementary techniques and sustained efforts to adapt them, as spammers continue to adapt their own methods. Each solution should be judged in terms of its likely incremental benefit, rather than as a candidate for the Final Ultimate Solution to the Spam Problem (FUSSP).

4.1 How Anti-Spam Can Save Your Business?

(https://afterglowprod.com/security/4-important-reasons-to-use-anti-spam-filtering-in-your-business/)

Block threats: The purpose of the spam filter is to block spam from ever reaching the email client. The best solution is to automatically detect spam carrying malware and delete it or hold it securely. Such spam may activate either instantly or slowly.

Filter legitimate emails: A genuine mail has to be recognizable and should not be trashed. Anti-spam filtering has sophisticated recognition capabilities that block only spam and let genuine mail land securely in mailboxes.

Meet data regulations: All businesses are subject to strict privacy and data storage regulations, and the conditions to be met include always using spam filtering to reduce the risk of a data breach.

Protect your business reputation: Unless client data is fully protected, the company faces financial loss, its business reputation takes a nosedive, and it has to admit the breach. Anti-spam filtering can help ensure that these scenarios do not happen to you.

5 ANTI-SPAM TECHNIQUES

To tackle spam we have to devise protection mechanisms. Various anti-spam techniques are used to prevent email spam. No technique is a complete solution to the spam problem, and each involves a trade-off between incorrectly rejecting legitimate email (false positives) and not rejecting all spam (false negatives), along with the associated costs in time and effort. Anti-spam techniques can be discussed under four broad groups:

• Anti-spam techniques which assist individual actions,
• Anti-spam techniques programmed by email administrators,
• Anti-spam techniques automated by email senders, and
• Anti-spam techniques employed by researchers and law enforcement officials.

The different approaches which filter data by automatically identifying and eradicating unwanted messages are:

• Knowledge-based techniques,
• Clustering techniques,
• Learning-based techniques,
• Heuristic processes.

Existing email spam filtering systems use Machine Learning Techniques (MLT) such as Naive Bayes, SVM, K-Nearest Neighbor, Bayes Additive Regression, KNN Tree, and rules. An email comprises information about the sender, the receiver, and the message, and there are different mechanisms to safeguard each of them. Header forging is taken care of by Google, Yahoo, and who spend money and resources. Our protection is usually determined by the email service we use, as every service uses different types of protection.

Types of filters based on location

Depending on where they sit along the path of the message from the sender to the subscriber's inbox, the filters that influence deliverability and inbox placement can be categorized as gateway spam filters, third-party (or hosted) spam filters, and desktop spam filters.

Types of filtering analysis

The four main aspects of mail examined when making filtering decisions are the source of the mail, the reputation of the sender, the content of the mail sent, and subscriber engagement.

6 RELATED WORK

An extensive comparison of the different methods has been made in many papers [Androutsopoulos et al. (2000); H. Katirai (1999); K. Mock (2001); Zhang et al. (2004)]. Among the several techniques are the methods proposed by Fiaidhi et al. [2013] and Arora et al. [2014]. It is estimated that 70% of today's business emails are spam [Scholar, M. (2010)].

Knowledge engineering and machine learning are two subdivisions of spam filtering. Knowledge engineering relies on a set of rules to identify spam emails, whereas machine learning does not need any rules defined in advance. As the majority of spam filtering methods utilize text techniques [Chang et al. (2009)], classifiers are used to extract features from an email. Many available models use machine learning algorithms [Asuncion et al. (2007); Ian H. Witten et al. (2005); T.S. Guzella (2009)].

There are systems that use automatic classification of emails [4], decision-based systems [C. Wu (2009)], Bayesian classifiers [Yue Yang et al. (2007)], support vector machines [I. Androutsopoulos et al. (2000a,b)], neural networks [Yue Yang et al. (2007)] and sample-based methods [F. Fdez Riverola et al. (2007)]. The literature also indicates that various feature selection methods are used in e-mail classification. However, classification with only a small set of discriminative features is preferred in view of processing complexity.

Three notable information visualization innovations are tree maps, cone trees, and hyperbolic trees. Lamping et al. [1995] introduced the hyperbolic tree, on which the Factors Hyperbolic Tree (FHT) is based. Hyperbolic trees employ hyperbolic space, which innately has "more room" than Euclidean space. Hyperbolic trees have been patented in the U.S. by Xerox [US patent 5590250, Lamping].

Charts are ideal for exhibiting single values and simple sequences of values, but the situation differs when hierarchical data needs to be displayed. An array of problems arises when users need to visualize the information relating to each individual node, and the existence (and possibly the nature) of the relationships between nodes. The problem is further compounded with very large trees, where the display becomes cluttered. To overcome this limitation a Hyperbolic Tree (or Hypertree) is used.

7 HYPERBOLIC TREE

The Hypertree reduces the screen space used to display nodes exponentially as the distance from the centre of the chart increases. Thus the Hypertree can pack much more information into the little available space than a standard tree graph. Hyperbolic Trees (or Hyper Trees, or H-Trees) are a metaphor for the graphical representation of ontologies. A hyperbolic tree originates from a central point (node/general category) and then branches off into subcategories/outgoing relations. Every subcategory continues to branch out, giving multiple levels of specificity.

Advantages: The hyper tree is dynamic and has good focus; it provides the user with a main focal point in the whole hierarchy, and users can select their own navigation direction. It is useful for showing several discrete levels of detail for one category.

Disadvantages: It has copyright issues. Because of the hyperbolic distortion, some nodes are too close to the border and therefore may not be visible to all.
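The exponential shrinking of screen space described in this section can be illustrated with a minimal sketch. It assumes the Poincaré disk model of hyperbolic space and evenly spaced tree levels; the paper does not state which projection is used, and the names `disk_radius` and `ring_width` are hypothetical.

```python
import math


def disk_radius(hyperbolic_distance: float) -> float:
    """Map a hyperbolic distance from the root to a radius in the unit disk.

    Assumption: the Poincare disk model, chosen only for illustration.
    """
    return math.tanh(hyperbolic_distance / 2.0)


def ring_width(level: int, step: float = 1.0) -> float:
    """Radial screen space between tree level `level` and the next level,
    assuming each level lies `step` hyperbolic units farther from the root."""
    return disk_radius((level + 1) * step) - disk_radius(level * step)


if __name__ == "__main__":
    # Each deeper level receives a thinner ring of the unit disk.
    for level in range(5):
        print(level, round(ring_width(level), 4))
```

Under these assumptions every level receives less radial space than the one before it, which is why nodes far from the centre crowd towards the border of the display.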

8 IMPLEMENTATION

8.1 FACTOR HYPERBOLIC TREE (FHT)

In view of these trade-offs, we have adopted the Factor Hyperbolic Tree (FHT) for anti-spam filtering. The FHT based algorithm considers dynamic factors to filter spam, and it avoids the use and upkeep of white-lists and black-lists. Objects frequently have relevant factors that affect their value. The FHT model expresses the relationship between objects, their factors and decisions. To implement the model a large database is essential, and it should be periodically updated with knowledge about the various factors. The decision is taken by considering the factors together with their reasoning. Issues such as digitizing the relationship between objects and factors, accessing relevant factors from a huge database, and choosing the types of factor to include should be addressed before implementation.

8.1.1 CLASSIFIED FACTORS

The factors can be classified into two types:

• Normal factors: factors that change over time, naturally or periodically.
• Abnormal factors: factors that occur suddenly and affect the object quickly.

Fuzzy logic is used to score the e-mail. This score is compared with a predefined value to check whether the mail is spam or not. The influencing factor in the algorithm is the object's attribute, which is unique to every object.
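One way to picture the relationship between objects and their classified factors is a small tree structure. The following is only a sketch under stated assumptions: the class names, the `score` field, and the example object and factors are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class FactorKind(Enum):
    NORMAL = "normal"      # changes over time, naturally or periodically
    ABNORMAL = "abnormal"  # occurs suddenly and affects the object quickly


@dataclass
class Factor:
    name: str
    kind: FactorKind
    score: float  # hypothetical contribution in [0, 1], not from the paper


@dataclass
class FactorNode:
    """An object of interest with its related factors and child objects."""
    name: str
    factors: list[Factor] = field(default_factory=list)
    children: list["FactorNode"] = field(default_factory=list)

    def find(self, name: str) -> Optional["FactorNode"]:
        """Depth-first search for the object called `name`."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def split_factors(self) -> tuple[list[Factor], list[Factor]]:
        """Return (normal factors, abnormal factors) of this object."""
        normal = [f for f in self.factors if f.kind is FactorKind.NORMAL]
        abnormal = [f for f in self.factors if f.kind is FactorKind.ABNORMAL]
        return normal, abnormal


# Illustrative data only: one object with one factor of each type.
tree = FactorNode("goods", children=[
    FactorNode("heater", factors=[
        Factor("season", FactorKind.NORMAL, 0.2),
        Factor("cold wave", FactorKind.ABNORMAL, 0.7),
    ]),
])
```

Looking up an object with `tree.find("heater")` and calling `split_factors()` on the result yields the two factor groups that a scoring step would then process separately.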

[Source: https://www.ontology4.us/Ontology4/Visualization/Hyperbolic-Trees/index.html]

8.2 RANKED TERM FREQUENCY [RTF]

The Ranked Term Frequency algorithm extracts features from the mail. Generally, nouns are retrieved from the mail and checked for how important each word is to the document. Each term is ranked according to the number of times it appears in the document, and this count is normalized. A high weight in RTF is reached by a high term frequency, so the weights tend to pick out the important noun terms. Hence, to capture the real meaning of the document, the first m nouns with the greatest weights are selected for the FHT process.

a. FHT based algorithm

1. Extract features by using RTF and input them to the FHT.
2. Search the FHT for factors related to the features and output the factors to the Fuzzy Logic Unit in two types: normal factors and abnormal factors.
3. The Fuzzy Logic Unit computes the normal factors to obtain a partial result; then the abnormal factors are processed to obtain another partial result.
4. The Weighted Computation Unit combines the two partial results with predefined weights to obtain a final decision.
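Steps 1 and 4 can be sketched in a few lines. This is a minimal illustration under loud assumptions: noun extraction is replaced by a crude stop-word filter (the paper does not say how nouns are identified), the fuzzy logic of step 3 is omitted so the two partial results are taken as given inputs, and the weights, the threshold and all function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list stands in for real noun extraction.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "for", "now", "your"}


def ranked_terms(text: str, m: int) -> list[tuple[str, float]]:
    """Step 1: return the m most frequent terms, each with its count
    normalised by the total number of retained terms."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(term, n / total) for term, n in counts.most_common(m)]


def final_decision(normal_result: float, abnormal_result: float,
                   w_normal: float = 0.5, w_abnormal: float = 0.5,
                   threshold: float = 0.5) -> bool:
    """Step 4: combine the two partial results with predefined weights and
    compare the combined score with a predefined value (True means spam)."""
    score = w_normal * normal_result + w_abnormal * abnormal_result
    return score >= threshold


if __name__ == "__main__":
    print(ranked_terms("Cheap pills, cheap offer: cheap pills now", 2))
    print(final_decision(0.2, 0.9, w_normal=0.4, w_abnormal=0.6))
```

Keeping the weights as parameters reflects the remark in the evaluation that different ratios of normal to abnormal factors produce different outputs.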

Factorial analysis is applied to set up the rule base optimally during step 3. The final output is compared against previously known spam to test whether the FHT identifies the spam email.

b. Bayesian Algorithm

Bayesian spam filtering is a commonly used technique to identify and filter spam. It entails manual intervention, as a user trains the filter on what is spam. The tool is based on Bayes' theorem and uses probabilities to weigh a message. Gmail and Yahoo use the Bayesian spam filtering technique. A Bayesian spam filter looks for combinations of words that are statistically likely to occur in spam messages, and for words that are statistically likely to occur in legitimate messages, in order to determine the probability that an e-mail is spam or legitimate. The Bayesian filter works on the principle that most events are dependent and that the probability of an event occurring in the future can be inferred from the occurrences of this event in the past. This approach is used to identify spam: if some piece of text has occurred mostly in spam emails but not in legitimate mail, then it is reasonable to suppose that an email containing it is probably spam.
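The word-probability reasoning above can be sketched as a small naive Bayes classifier. This is an illustrative sketch rather than the filter used in the experiments: the class name, the labels "spam" and "ham", and the use of Laplace (add-one) smoothing are assumptions, and the filter needs at least one training message of each label before it is queried.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Minimal word-based naive Bayes spam filter (illustrative only)."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, words: list[str], label: str) -> None:
        """Record one user-labelled message ("spam" or "ham")."""
        self.word_counts[label].update(words)
        self.message_counts[label] += 1

    def spam_probability(self, words: list[str]) -> float:
        """Probability that a message with these words is spam."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: share of training messages carrying this label.
            log_p = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the product.
                count = self.word_counts[label][word] + 1
                log_p += math.log(count / (total_words + len(vocabulary)))
            log_score[label] = log_p
        # Normalise the two log scores into a probability.
        top = max(log_score.values())
        spam = math.exp(log_score["spam"] - top)
        ham = math.exp(log_score["ham"] - top)
        return spam / (spam + ham)


if __name__ == "__main__":
    nb = NaiveBayesFilter()
    nb.train(["cheap", "pills", "offer"], "spam")
    nb.train(["meeting", "agenda", "notes"], "ham")
    print(nb.spam_probability(["cheap", "pills"]))
```

Working in log space avoids numerical underflow when a message contains many words, which is a common design choice for this kind of filter.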

9 ANALYSIS AND RESULT

The experimental results show that the FHT algorithm, when combined with the Bayesian algorithm, filters out spam with high precision. Furthermore, the FHT algorithm is more efficient than other methods when it filters e-mails with complex influencing factors, and when it is paired with Bayesian techniques spam is detected more easily and with more precision. The main feature is that the FHT based algorithm can filter e-mails based on influencing factors instead of matched words, which allows dynamic filtering of spam e-mails. The Bayesian algorithm also matches keywords, so a spam message that gets past the FHT algorithm can still be caught by the Bayesian algorithm, and vice versa. Pairing the FHT and Bayesian algorithms therefore makes it less likely that spam goes undetected.

10 PERFORMANCE AND EVALUATION

In order to measure the performance of the FHT based algorithm and compare it with the Bayesian algorithm, the following experimental setup was used.

• Experiment environment: Windows XP Professional
• Development tools: VS2005, Microsoft Access 2003

A. Results and evaluation

The result of the enhanced e-mail filter is based on the weights of the normal and abnormal factors. Varying the ratio of normal to abnormal factors produced different outputs. In this experiment, the ratio is set to a static value and several groups of e-mails are tested. The outputs are verified for x1 spam and x2 emails. Based on the correct classifications of spam and email, it is found that the proposed enhanced FHT algorithm expresses a lower correct filtering percentage than the FHT and Bayesian algorithm. The enhanced FHT algorithm also presents a higher filtering percentage than the FHT algorithm, which indicates that the enhanced FHT algorithm is more efficient for filtering spam than the FHT algorithm in our experiment.
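The paper does not give a formula for the correct filtering percentage. A plausible reading, stated here as an assumption, is the share of all test messages (x1 spam and x2 legitimate e-mails) that were classified correctly; the function and parameter names are hypothetical.

```python
def correct_filtering_percentage(spam_caught: int, x1: int,
                                 ham_kept: int, x2: int) -> float:
    """Percentage of correctly classified messages out of x1 spam and
    x2 legitimate e-mails (assumed definition, not taken from the paper)."""
    if x1 + x2 == 0:
        raise ValueError("at least one test message is required")
    return 100.0 * (spam_caught + ham_kept) / (x1 + x2)
```

For example, with purely illustrative counts, catching 45 of 50 spam messages and keeping 48 of 50 legitimate e-mails gives `correct_filtering_percentage(45, 50, 48, 50)`, that is 93 correct out of 100 messages.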

11 CONCLUSION

The experiments show that this approach filters e-mail effectively and improves filtering precision. In this paper, the FHT algorithm, which filters e-mail by conditional factors instead of word matching, is enhanced. The enhanced FHT is tested and found to be suitable for semantic analysis and decision support. The experimental results show that the enhanced technique filters e-mail more proficiently and improves filtering precision. Experimentally, the enhanced FHT is a more practical technique for anti-spam filters, particularly when information related to existing environment factors is not precisely captured by lexical matching algorithms.

References

[1] I. Androutsopoulos, J. Koutsias, K.V. Chandrinos, G. Paliouras, C. Spyropoulos, An evaluation of naive Bayesian anti-spam filtering, in: Proceedings of the 11th European Conference on Machine Learning, 2000.

[2] I. Androutsopoulos, J. Koutsias, K.V. Chandrinos, G. Paliouras, C. Spyropoulos, An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages, in: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2000.

[3] I. Androutsopoulos, G. Paliouras, V. Karkaletsis, G. Sakkis, C.D. Spyropoulos, and P. Stamatopoulos. Learning to filter spam e-mail: A comparison of a naive Bayesian and a memory-based approach. In H. Zaragoza, P. Gallinari, and M. Rajman, editors, Proceedings of the Workshop on Machine Learning and Textual Information Access, 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2000), pages 1-13, 2000.

[4] Yue Yang and Sherif Elfayoumy. Anti-Spam filtering using neural networks and Bayesian classifiers. Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation.

[5] Asuncion, D. Newman, UCI Machine Learning Repository, 2007. http://www.ics.uci.edu/mlearn/MLRRepository.html.

[6] Saadat Nazirova, Institute of Infotech Technology of Azerbaijan National Academy of Science, Baku, Azerbaijan. Communication and Network, 2011, 3, 153-160. Accepted May 15, 2011; published online August 2011 (http://www.SciRP.org/journal/cn).

[7] Bin Wang, Gareth J. F. Jones, Wenfeng Pan, "Using Online Linear Classifiers to Filter Spam Emails", Springer-Verlag London Limited 2006. Published online: 3 October 2006.

[8] Enrico Blanzieri, Anton Bryl, "A Survey of Learning-Based Techniques of Email Spam Filtering", Springer Science+Business Media B.V. 2009. Published online: 10 July 2009.

[9] F. Fdez Riverola, E. Iglesias, F. Diaz, J.R. Mendez, J.M. Corchado, Spam hunting: an instance-based reasoning system for spam labeling and filtering, Decis. Support Syst. 43 (3) (2007) 722-736.

[10] Gitanjali J., Banu S.N., Indumathi J., Uma G.V. (2008), A Panglossian Solitary-Skim Sanitization for Privacy Preserving Data Archaeology, International Journal of Electrical and Power Engineering, Vol. 2, No. 3, pp. 154-165.

[11] Gitanjali J., Md. Rukunuddin Ghalib, Murugesan K., Indumathi J., Manjula D. (2009), An Object-Oriented Scaffold Premeditated For Privacy Preserving Data Mining of Outsourced Medical Data, International Journal of Software Engineering and Its Applications. Accepted for publication. In press. 2009.

[12] Gitanjali J., Md. Rukunuddin Ghalib, Murugesan K., Indumathi J., Manjula D. (2009), A Hybrid Scheme Of Data Camouflaging For Privacy Preserved Electronic Copyright Publishing Using Cryptography And Watermarking Technologies, International Journal of Security and Its Applications.

[13] Gitanjali J., Shaik Nusrath Banu*, Geetha Mary A*., Indumathi J., Uma G.V. (2007), An Agent Based Burgeoning Framework for Privacy Preserving Information Harvesting Systems, International Journal of Computer Science and Network Security, Vol. 7, No. 11, pp. 268-276.

[14] H. Katirai (1999). Filtering junk e-mail: A performance comparison between genetic programming and naive Bayes.

[15] Harisinghaney, A., Dixit, A., Gupta, S., & Arora, A. (2014, February). Text and image based spam email classification using KNN, Naive Bayes and Reverse DBSCAN algorithm. In Optimization, Reliability, and Information Technology (ICROIT), 2014 International Conference on (pp. 153-155). IEEE.

[17] Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, second ed., Elsevier, 2005.

[18] Indumathi J. (2012), A Generic Scaffold Housing The Innovative Modus Operandi For Selection Of The Superlative Anonymisation Technique For Optimized Privacy Preserving Data Mining, Chapter 6 of the book Data Mining Applications in Engineering and Medicine, edited by Adem Karahoca, InTech; ISBN: 9535107200 9789535107200; 335 pages; pp. 133-156.

[19] Indumathi J. (2013a), Amelioration of Anonymity Modus Operandi for Privacy Preserving Data Publishing, Chapter 7 of the book Network Security Technologies: Design and Applications. Abdelmalek Amine (Tahar Moulay University, Algeria), Otmane Ait Mohamed (Concordia University, USA) and Boualem Benatallah (University of New South Wales, Australia). Release date: November 2013. Copyright 2014. 330 pages; pp. 96-107.

Indumathi J. (2013b), An Enhanced Secure Agent-Oriented Burgeoning Integrated Home Tele Health Care Framework for the Silver Generation, Int. J. Advanced Networking and Applications, Volume: 04, Issue: 04, Pages: 16-21, Special Issue on Computational Intelligence: A Research Perspective, held on 21st-22nd February, 2013.

[20] Indumathi J. (2013c), State-of-the-Art in Reconstruction-Based Modus Operandi for Privacy Preserving Data Dredging, Int. J. Advanced Networking and Applications, Volume: 04, Issue: 04, Pages: 9-15, Special Issue on Computational Intelligence: A Research Perspective, held on 21st-22nd February, 2013.

[21] Indumathi J., Uma G.V. (2007a), Customized Privacy Preservation Using Unknowns to Stymie Unearthing Of Association Rules, Journal of Computer Science, Vol. 3, No. 12, pp. 874-881.

[22] Indumathi J., Uma G.V. (2007b), Using Privacy Preserving Techniques to Accomplish a Secure Accord, International Journal of Computer Science and Network Security, Vol. 7, No. 8, pp. 258-266.

[23] Indumathi J., Uma G.V. (2008a), A Bespoked Secure Framework for an Ontology-Based Data-Extraction System, Journal of Software Engineering, Vol. 2, No. 2, pp. 1-13.

[24] Indumathi J., Uma G.V. (2008b), A New flustering approach for Privacy Preserving Data Fishing in Tele-Health Care Systems, International Journal of Healthcare Technology and Management. Special Issue on: "Tele-Healthcare System Implementation, Challenges and Issues." Vol. 9, No. 5-6, pp. 495-516.

[25] Indumathi J., Uma G.V. (2008c), A Novel Framework for Optimized Privacy Preserving Data Mining Using the innovative Desultory Technique, International Journal of Computer Applications in Technology; Special Issue on: "Computer Applications in Knowledge-Based Systems". In press. 2008. Vol. 35, Nos. 2/3/4, pp. 194-203.

[26] Indumathi J., Uma G.V. (2008d), An Aggrandized Framework For Genetic Privacy Preserving Pattern Analysis Using Cryptography And Contravening-Conscious Knowledge Management Systems, International Journal of Molecular Medicine and Advance Sciences, Vol. 4, No. 1, pp. 33-40.

[27] Int. J. Communications, Network and System Sciences, 2013, 6, 88-99. http://dx.doi.org/10.4236/ijcns.2013.62011. Engineering Department, Faculty of Engineering, Aswan University, Aswan, Egypt.

[28] Rekha, Sandeep Negi, A Review on Different Spam Detection Approaches, International Journal of Engineering Trends and Technology (IJETT), Volume 11 Number 6, May 2014, ISSN: 2231-5381, http://www.ijettjournal.org, Page 315.

[29] R. Malarvizhi, K. Saraswathi, Content-Based Spam Filtering and Detection Algorithms: An Efficient Analysis & Comparison, International Journal of Engineering Trends and Technology (IJETT), Volume 4 Issue 9, Sep 2013, ISSN: 2231-5381, http://www.ijettjournal.org, Page 4237.

[30] Savita Teli, Santoshkumar Biradar, Effective Spam Detection Method for Email, IOSR Journal of Computer Science (IOSR-JCE), e-ISSN: 2278-0661, p-ISSN: 2278-8727, pp. 68-72, www.iosrjournals.org. International Conference on Advances in Engineering & Technology 2014 (ICAET-2014).

[31] K. Mock. An experimental framework for email categorization and management. In 24th Annual ACM International Conference on Research and Development in Information Retrieval, New Orleans, LA, September 2001.

[32] L. Zhang, J. Zhu, and T. Yao. An evaluation of statistical spam filtering techniques. ACM Transactions on Asian Language Information Processing (TALIP), 3(4):243-269, 2004.

[33] Lamping, John; Rao, Ramana; Pirolli, Peter (1995). A focus+context technique based on hyperbolic geometry for visualizing large hierarchies. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 1995). pp. 401-408.

[34] M. Chang, C.K. Poon, Using phrase as features in email classification, J. Syst. Softw. 82 (2009) 1036-1045.

[35] Mohammed, S., Mohammed, O., Fiaidhi, J., Fong, S. J., & Kim, T. H. (2013). Classifying Unsolicited Bulk Email (UBE) using Python Machine Learning Techniques.

[36] Mohammed, S., Mohammed, O., Fiaidhi, J., Fong, S. J., & Kim, T. H. (2013). Classifying Unsolicited Bulk Email (UBE) using Python Machine Learning Techniques.

[37] Murugesan K., Gitanjali J., Indumathi J., Manjula D. (2009), Sprouting Modus Operandi for Selection of the Best PPDM Technique for Health Care Domain, International Journal Conference in recent trends in computer science. Vol. 1, No. 1, pp. 627-629.

[38] Murugesan K., Indumathi J., Manjula D. (2010a), An Optimised Intellectual Agent Based Secure Decision System For Health Care, International Journal of Engineering Science and Technology, Vol. 2(8), 2010, 3662-3675.

[39] Murugesan K., Indumathi J., Manjula D. (2010b), A Framework for an Ontology-Based Data-Gleaning and Agent Based Intelligent Decision Support PPDM System Employing Generalization Technique for Health Care, International Journal on Computer Science and Engineering, Vol. 02, No. 05, 2010, 1588-1596.

[40] Prakash D., Murugesan K., Indumathi J., Manjula D. (2009), A Novel Cardiac Attack Prediction and Classification using Supervised Agent Techniques, In the CiiT International Journal of Artificial Intelligent Systems and Machine Learning, May 2009. Vol. 1, No. 2, p. 59.

[41] Satheesh Kumar K., Indumathi J., Uma G.V. (2008), Design of Smoke Screening Techniques for Data Surreptitiousness in Privacy Preserving Data Snooping Using Object Oriented Approach and UML, IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 4, pp. 106-115.

[42] Scholar, M. (2010). Supervised learning approach for spam classification analysis using data mining tools. organization, 2(8), 2760-2766.

[43] Scholar, M. (2010). Supervised learning approach for spam classification analysis using data mining tools. organization, 2(8), 2760-2766.

[44] Surf-Controls Anti-Spam Prevalence Study 2002, URL: http://www.surfcontrol.com/resources/Anti-Spam Study v2.pdf.

[45] T.S. Guzella, T.M. Caminhas, A review of machine learning approaches to spam filtering, Expert Syst. Appl. 36 (2009) 10206-10222.

[46] Vasudevan V., Sivaraman N*., SenthilKumar S*., Muthuraj R*., Indumathi J., Uma G.V. (2007), A Comparative Study of SPKI/SDSI and K-SPKI/SDSI Systems, Information Technology Journal 6(8); pp. 1208-1216.

[47] Venkatesh Ramanathan and Harry Wechsler, "phishGILLNET: Phishing Detection Methodology Using Probabilistic Latent Semantic Analysis, AdaBoost, and Co-training", EURASIP Journal on Information Security, 2012:1, 2012.

[48] Wikipedia.org. 2012. Fingerprint (computing). http://en.wikipedia.org/wiki/Fingerprint_(computing).

Andrew G. West, Avantika Agrawal, etc. 2011. Autonomous link spam detection in purely collaborative environments. In WikiSym 2011: The 7th International Symposium on Wikis and Open Collaboration. Network Security, pp. 1517.

[49] Wu, Behavior-based spam detection using a hybrid method of rule-based techniques and neural networks, Expert Syst. (2009).

[50] http://www.detritus.org/spam/skit.html

[51] http://www.digitalethics.org/essays/end-anonymous-emails

[52] http://www3.cs.stonybrook.edu/~aychakrabort/courses/cse508/

[53] https://afterglowprod.com/security/4-important-reasons-to-use-anti-spam-filtering-in-your-business/

[54] https://iapp.org/media/pdf/resource_center/ePR_2018-04-13.pdf

[55] https://www.drh.net/blog/2013/09/spam-complaints-cause-effect-cure/

[56] https://www.matchilling.com/comparison-of-machine-learning-methods-in-email-spam-detection/

[57] https://www.semanticscholar.org/paper/Filtering-Spam-by-Using-Factors-Hyperbolic-Trees-Hou-Chen/200d1ea917da5e883504e064b443d5f33682c42e/figure/2

[58] https://www.semanticscholar.org/paper/Filtering-Spam-by-Using-Factors-Hyperbolic-Tree-Hou-Chen/460b9927afd2846d7a566821d2d0cf5fd1f41bea/figure/1

[59] US patent 5590250, Lamping; John O. & Rao; Ramana B., "Layout of node-link structures in space with negative curvature", assigned to Xerox Corporation
