Histogram Based Image Spam Detection Using Back Propagation Neural Networks GJCST Classifications: M
Total Page:16
File Type:pdf, Size:1020Kb
Global Journal of Computer Science and Technology Vol. 9 Issue 5 (Ver 2.0), January 2010 P a g e | 62 Histogram based Image Spam Detection using Back propagation Neural Networks GJCST Classifications: M. Soranamageswari and Dr. C. Meena I.2.10, K.6.5, F.1.1, I.4.0, I.5.4 PT Abstract-In general words, image spam is a type of e- them emails also have similar properties as image-based mail in which the text message is presented as a picture in an emails; existing spam filters are no longer capable of image file. This prevents the text based spam filters from detecting between image-based spam and image ham [1]. detecting and blocking such spam messages. In our study we This provides a way for the spammers to easily foil the spam have considered the valid message as “ham” and the invalid filters. The text messages embedded in all image spam will message as “spam”. Though there are several techniques available for detecting the image spam (DNSBL, Greylisting, convey the intent of the spammer and this text is usually an Spamtraps, etc.,) each one has its own advantages and advertisement and often contains text, which has been disadvantages. On behalf of their weakness, they become blacklisted by spam filters (drug store, stock tip, etc). controversial to one another. This paper includes a general study on image spam detection using some of the well-liked II. NEURAL NETWORKS methods. The methods comprise, image spam filtering based on File type, RGB Histogram, and HSV histogram, which are This image spam can be identified using various methods. explained in the following sections. The finest method for By using supervised and semi, supervised learning detecting the image spam from the above-mentioned methods algorithms in neural networks image spam can be detected. can be determined from the above study. A neural network is a computational model or mathematical Keywords- File Type, HSV Histogram, Image Spam, RGB model that tries to simulate the structure and/or functional Histogram aspects of biological neural networks. An artificial neural network is an adaptive system that change its structure based I. INTRODUCTION on external or internal information that flows through the network during the learning phase. Thus, neural networks pam can be uttered as Unsolicited Bulk E-mail (UNBE). are non-linear statistical data modeling tool. Different types The most extensively predictable category of spam is e- S of artificial neural networks that can be trained to perform mail spam. Most UNBE is designed for elicitation, phishing, image processing are feed forward neural networks, Self- or advertisement. E-mail spam is the practice of sending Organizing Feature Maps, Learning Vector Quantizer unwanted e-mail message through junk mail, frequently network. All these networks contain at least one hidden with commercial content, in large quantities to an layer, with fewer units than the input and the output layers. indiscriminate set of recipients. A Spam message also holds In particular, the Back propagation neural network its hand with Instant Messaging System. This Instant algorithm performs gradient-descent in the parameter space Messaging spam, which is also known as “Spim”, makes use minimizing an appropriate error function. The Parameters of instant messaging system. Mobile Phone Spam is directed like mode of learning, information content, activation at the text messaging service of a mobile phone. In the function, target values, input normalization, initialization, similar fashion spam targets on video sharing sites, real time learning rate, and momentum decide the performance of the search engines, online game messaging and so on. back propagation neural networks. The back propagation In the mid 1990s when the internet was opened up for the neural network can be used for compression of various types general public Spam in e-mail started to become a problem. of images like natural scenes, satellite images, and standard In the following years the growth of spam was exponential test images. In this paper, back propagation neural network and today it comprises some 80-85 percent of all the e-mail is implemented to detect the image spam. Back propagation in the world, by conservative estimate [3]. Spam messages algorithm is a widely implemented learning algorithm in have its wings stretch into all kinds of applications in recent ANN. This algorithm implemented is based on error years. More techniques are adopted by several service correction learning rule. The error propagation contains two providers to eliminate spam messages and not all are passes through the different layers of the network, a forward noteworthy. All these techniques are losing their potency as pass and a backward pass. In the first case, the synaptic spammers become more agile. The spammers have turned weights of the networks are fixed, whereas in the latter case, their towards image spam in the recent years. the synaptic weights of the network are adjusted in The embedded image carries the target message and most accordance with an error-correction tool. A neural network email clients display the message in their entirety. Most of has wide range of applications. Their application areas include data processing including filtering, clustering, blind M. Soranamageswari is with LRG Govt Arts College for source separation and compression, classification including Women, Tirupur. (Email: [email protected]). pattern and sequence recognition, novelty detection and Dr. C. Meena is with the Avinashilingam University for sequential decision making and function approximation or Women in India. (E-mail: [email protected]; P a g e | 63 Vol. 9 Issue 5 (Ver 2.0), January 2010 Global Journal of Computer Science and Technology regression analysis, including time series prediction, fitness detect similar patterns even the spammer creates a variation approximation and modeling. of the original pattern. This proved to be an effective approach in detecting and grouping spam emails using III. RELATED WORK templates. Clustering algorithm is utilized to find the similarity of strings, similarity of spam subjects and for Many discussions have been carried out previously on clustering spam subjects. image spam detection. Under this section, we have an Seongwook Youn and Dennis McLeod in [24], describes the overview of the literature. method of filtering gray e-mail using personalized Marco Barreno et al., in [2], explains the different types of ontologies. Their work in [24], explains a personalized attacks on the machine learning algorithms and the systems, ontology spam filter to make decisions for gray e-mail. Gray a variety of defenses against those attacks, and the ideas that e-mail is a message that could reasonably be estimated as are important to secure the machine learning. This approach either spam or ham. A user profile has been created for each illustrates the methods that spammers handle to attack a user or a class of users to handle gray e-mail. This profile system to design an image spam. The issue of machine ontology creates a blacklist of contacts and topic words. learning security goes beyond intrusion detection systems and spam e-mail filters. The different measures of defenses A. Image Spam Classification Based Of Text Properties involved in their discussion are robustness, detecting the attacks, disinformation, randomization for targeted attack, Image Spam classification based on text properties includes and cost of countermeasures. finding the location of texts in images or videos. Texts are A modification of Latent Dirichlet Allocation (LDA), usually designed to attract attention and to reveal Known as multi-corpus LDA technique was introduced by information. The Connected Component based (CC) and the Istvan Biro et al., in [4]. In their proposal, they created a texture-based approaches are the two leading approaches bag-of-words document for every web site and run LDA used in the past to extract the characteristics for the text both on the corpus of sites labeled as spam and as non-spam. detection task. These characteristics include coherence in This assisted them to collect spam and non-spam topics space, geometry and color [5]. In CC-based methods [6], the during their training phase. They implemented these image is segmented into a set of CCs and is grouped into collections on an unseen test site to detect the spam potential text regions based on their geometric relation. messages. This method in combination with web spam These potential regions are then examined using some rule- challenge 2008 public features, and the connectivity sonar based heuristics, which makes use of the characteristics like features is used to test images. Using logistic regression to size, the aspect ratio and the orientation of the region. The aggregate these classifiers, the multi-corpus LDA yields an efficiency of this method becomes questionable when the improvement of around 11 percent in F-measures and 1.5 text is multi-colored, textured, with a small font size, or percent in ROC. overlapping with other graphical objects. Spam web page detection through content analysis is put In texture-based methods [7], it is assumed that the texts forth by Alexandros Ntoulas et al., in [11], which projected have distinct textural properties, and this can be used to some earlier undefined techniques for automatic spam distinguish then from the background. Even though this message detection. They also discussed the effectiveness of method perform well for images with noisy, degraded, or those techniques in isolation and when aggregated using complex texts and/or background it seems to be time- some classification algorithms, which proved to be truth consuming as texture analysis is essentially computational worthy in detecting the image spam.. intensive. Bhaskar Mehta et al., in their paper [2], describe two An increase in the use of internet in the recent years, had led solutions for detecting image-based spam after considering to tremendous growth in volumes of text documents the characteristics of image spam.