Histogram Based Image Spam Detection Using Back Propagation Neural Networks


Global Journal of Computer Science and Technology, Vol. 9 Issue 5 (Ver 2.0), January 2010, Page 62

M. Soranamageswari and Dr. C. Meena

GJCST Classifications: I.2.10, K.6.5, F.1.1, I.4.0, I.5.4

M. Soranamageswari is with LRG Govt Arts College for Women, Tirupur. Dr. C. Meena is with the Avinashilingam University for Women in India.

Abstract- Image spam is a type of e-mail in which the text message is presented as a picture in an image file. This prevents text-based spam filters from detecting and blocking such spam messages. In our study we treat a valid message as "ham" and an invalid message as "spam". Several techniques are available for detecting image spam (DNSBL, Greylisting, Spamtraps, etc.), and each has its own advantages and disadvantages. Because of their weaknesses, they become controversial with respect to one another. This paper presents a general study of image spam detection using some popular methods: image spam filtering based on file type, RGB histogram, and HSV histogram, which are explained in the following sections. The best of these methods for detecting image spam can be determined from this study.

Keywords- File Type, HSV Histogram, Image Spam, RGB Histogram

I. INTRODUCTION

Spam can be described as Unsolicited Bulk E-mail (UNBE). The most widely recognized category of spam is e-mail spam. Most UNBE is designed for elicitation, phishing, or advertisement. E-mail spam is the practice of sending unwanted e-mail messages as junk mail, frequently with commercial content, in large quantities to an indiscriminate set of recipients. Spam also reaches instant messaging systems: instant messaging spam, also known as "Spim", makes use of instant messaging systems. Mobile phone spam is directed at the text messaging service of a mobile phone. In a similar fashion, spam targets video sharing sites, real-time search engines, online game messaging and so on.

Spam in e-mail started to become a problem in the mid 1990s, when the internet was opened up to the general public. In the following years the growth of spam was exponential, and today it comprises some 80-85 percent of all the e-mail in the world, by conservative estimate [3]. In recent years spam messages have spread into all kinds of applications. Service providers have adopted many techniques to eliminate spam messages, and not all are noteworthy. All these techniques are losing their potency as spammers become more agile. In recent years the spammers have turned towards image spam.

The embedded image carries the target message, and most email clients display the message in its entirety. Most of these emails also have properties similar to those of legitimate image-based emails; existing spam filters are no longer capable of distinguishing between image-based spam and image ham [1]. This provides a way for the spammers to easily foil the spam filters. The text embedded in image spam conveys the intent of the spammer; this text is usually an advertisement and often contains text that has been blacklisted by spam filters (drug store, stock tip, etc.).

II. NEURAL NETWORKS

Image spam can be identified using various methods. It can be detected by using supervised and semi-supervised learning algorithms in neural networks. A neural network is a computational or mathematical model that tries to simulate the structure and/or functional aspects of biological neural networks. An artificial neural network is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. Thus, neural networks are non-linear statistical data modeling tools. Different types of artificial neural networks that can be trained to perform image processing are feed forward neural networks, Self-Organizing Feature Maps, and Learning Vector Quantizer networks. All these networks contain at least one hidden layer, with fewer units than the input and the output layers. In particular, the back propagation neural network algorithm performs gradient descent in the parameter space, minimizing an appropriate error function. Parameters such as mode of learning, information content, activation function, target values, input normalization, initialization, learning rate, and momentum decide the performance of back propagation neural networks.

The back propagation neural network can be used for compression of various types of images like natural scenes, satellite images, and standard test images. In this paper, a back propagation neural network is implemented to detect image spam. The back propagation algorithm is a widely implemented learning algorithm in ANN, based on the error-correction learning rule. Error propagation consists of two passes through the different layers of the network, a forward pass and a backward pass. In the forward pass, the synaptic weights of the network are fixed, whereas in the backward pass, the synaptic weights of the network are adjusted in accordance with an error-correction rule.

Neural networks have a wide range of applications. Their application areas include data processing, including filtering, clustering, blind source separation and compression; classification, including pattern and sequence recognition, novelty detection and sequential decision making; and function approximation or regression analysis, including time series prediction, fitness approximation and modeling.

III. RELATED WORK

Many discussions have been carried out previously on image spam detection. This section gives an overview of the literature.

Marco Barreno et al. [2] explain the different types of attacks on machine learning algorithms and systems, a variety of defenses against those attacks, and the ideas that are important for securing machine learning. This approach illustrates the methods that spammers use to attack a system when designing image spam. The issue of machine learning security goes beyond intrusion detection systems and spam e-mail filters. The defensive measures covered in their discussion are robustness, detecting attacks, disinformation, randomization for targeted attacks, and cost of countermeasures.

A modification of Latent Dirichlet Allocation (LDA), known as the multi-corpus LDA technique, was introduced by Istvan Biro et al. [4]. In their proposal, they created a bag-of-words document for every web site and ran LDA on both the corpus of sites labeled as spam and the corpus labeled as non-spam. This helped them collect spam and non-spam topics during the training phase. They applied these collections to an unseen test site to detect spam messages. This method, in combination with the Web Spam Challenge 2008 public features and the connectivity sonar features, is used to test images. Using logistic regression to aggregate these classifiers, multi-corpus LDA yields an improvement of around 11 percent in F-measure and 1.5 percent in ROC.

Spam web page detection through content analysis was put forth by Alexandros Ntoulas et al. [11], who proposed some previously undescribed techniques for automatic spam message detection. They also discussed the effectiveness of those techniques in isolation and when aggregated using classification algorithms, which proved to be trustworthy in detecting image spam.

Bhaskar Mehta et al., in their paper [2], describe two solutions for detecting image-based spam after considering the characteristics of image spam.

[...] detect similar patterns even when the spammer creates a variation of the original pattern. This proved to be an effective approach in detecting and grouping spam emails using templates. A clustering algorithm is utilized to find the similarity of strings, the similarity of spam subjects, and for clustering spam subjects.

Seongwook Youn and Dennis McLeod [24] describe a method of filtering gray e-mail using personalized ontologies. Their work explains a personalized ontology spam filter to make decisions for gray e-mail. Gray e-mail is a message that could reasonably be estimated as either spam or ham. A user profile is created for each user or class of users to handle gray e-mail. This profile ontology creates a blacklist of contacts and topic words.

A. Image Spam Classification Based on Text Properties

Image spam classification based on text properties includes finding the location of text in images or videos. Text is usually designed to attract attention and to reveal information. The Connected Component (CC) based and the texture-based approaches are the two leading approaches used in the past to extract the characteristics for the text detection task. These characteristics include coherence in space, geometry and color [5]. In CC-based methods [6], the image is segmented into a set of CCs, which are grouped into potential text regions based on their geometric relation. These potential regions are then examined using rule-based heuristics, which make use of characteristics like the size, the aspect ratio and the orientation of the region. The efficiency of this method becomes questionable when the text is multi-colored, textured, of small font size, or overlapping with other graphical objects.

In texture-based methods [7], it is assumed that text has distinct textural properties, which can be used to distinguish it from the background. Even though this method performs well for images with noisy, degraded, or complex text and/or background, it is time-consuming, as texture analysis is inherently computationally intensive.

The increase in internet use in recent years has led to tremendous growth in volumes of text documents [...]
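The forward and backward passes of back propagation described in Section II can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `TinyBackpropNet`, the 4-3-1 layer sizes, the learning rate and momentum values, and the four toy 4-bin histograms in `SAMPLES` are all assumptions chosen for illustration. The sketch uses sigmoid units and a squared-error function, which is one common choice for the "appropriate error function" the text mentions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyBackpropNet:
    """Hypothetical one-hidden-layer network trained by error back propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Small random initial weights; the last entry of each row is a bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous weight changes, kept for the momentum term.
        self.dw_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_out = [0.0] * (n_hidden + 1)

    def forward(self, x):
        """Forward pass: weights stay fixed, activations flow to the output."""
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        hb = hidden + [1.0]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return hidden, out

    def train_step(self, x, target, lr=0.5, momentum=0.5):
        """Backward pass: propagate the output error and adjust the weights."""
        hidden, out = self.forward(x)
        xb = list(x) + [1.0]
        hb = hidden + [1.0]
        # Error signal at the output unit (squared error, sigmoid derivative).
        delta_out = (target - out) * out * (1.0 - out)
        # Error signals at the hidden units, using the pre-update output weights.
        delta_hidden = [delta_out * self.w_out[j] * hidden[j] * (1.0 - hidden[j])
                        for j in range(len(hidden))]
        for j, v in enumerate(hb):
            self.dw_out[j] = lr * delta_out * v + momentum * self.dw_out[j]
            self.w_out[j] += self.dw_out[j]
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(xb):
                self.dw_hidden[j][i] = (lr * delta_hidden[j] * v
                                        + momentum * self.dw_hidden[j][i])
                row[i] += self.dw_hidden[j][i]
        return 0.5 * (target - out) ** 2


def total_error(net, samples):
    """Sum of squared errors over all samples, without changing any weights."""
    return sum(0.5 * (t - net.forward(x)[1]) ** 2 for x, t in samples)


def train(net, samples, epochs=2000):
    """Repeat the forward/backward passes over the samples; return final error."""
    for _ in range(epochs):
        for x, t in samples:
            net.train_step(x, t)
    return total_error(net, samples)


# Hypothetical 4-bin normalized histograms: label 1.0 = spam, 0.0 = ham.
SAMPLES = [
    ([0.80, 0.10, 0.05, 0.05], 1.0),
    ([0.70, 0.10, 0.10, 0.10], 1.0),
    ([0.10, 0.20, 0.30, 0.40], 0.0),
    ([0.05, 0.15, 0.40, 0.40], 0.0),
]
```

Calling `train(TinyBackpropNet(4, 3), SAMPLES)` runs the two passes repeatedly and returns the remaining squared error. In a real setting the inputs would be histogram features extracted from e-mail images, and the learning rate and momentum would need tuning, since the text notes that both affect performance.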