Understanding How Spammers Steal Your E-Mail Address: an Analysis of the First Six Months of Data from Project Honey Pot

Total Page:16

File Type:pdf, Size:1020Kb

Understanding How Spammers Steal Your E-Mail Address: an Analysis of the First Six Months of Data from Project Honey Pot Understanding How Spammers Steal Your E-Mail Address: An Analysis of the First Six Months of Data from Project Honey Pot Matthew B. Prince† Lee Holloway, Arthur M. Keller CEO, Unspam, LLC Eric Langheinrich, Information Systems and Adjunct Professor of Law & Benjamin M. Dahl Technology Management John Marshall Law School Unspam, LLC Baskin School of Engineering 850 W. Adams, Suite 4e 850 W. Adams, Suite 4e Univ. of California, Santa Cruz Chicago, IL 60607-3096 Chicago, IL 60607-3096 & Advisor, Unspam, LLC Abstract CEAS 2004 and opened to public volunteers October 14, 2004. Since its launch, the Project’s software has This paper summarizes and analyses data been installed by more than 5,000 users on websites compiled on the activities of email harvesters worldwide. The Project’s honey pots are running in at gathered through a 5,000+ member honey pot least 80 countries and on every inhabited continent. system that issues unique addresses based on a visitor’s IP address and specific spidering As of June 20, 2005, the Project is monitoring more time. The project, known as Project Honey than 250,000 active spamtrap e-mail honey pots. Pot, has provided data about the geographical Thousands of spamtrap addresses are distributed source of harvesting and mail processing, the through honey pots each day and the Project is on pace sending patterns of different types of to have more than 1 million active spamtraps monitored spammers as well as list management by the end of 2005. behavior. In addition to providing guidance for website administrators trying to forestall This paper is the first thorough analysis of the data gathered by Project Honey Pot. Understanding the harvesting, the Project data also suggest that behavior of harvesters is critical to controlling the spam anti-harvesting regulations offer a new, problem. Harvesters sit at the beginning of the spam potentially successful prosecutorial avenues cycle. Studies by the Pew Internet Project, the Center against spam as well as inform potential for Democracy and Technology, as well as the Federal amendments to current anti-spam laws that may help those efforts. Trade Commission have found that harvesting is the primary way spammers obtain new e-mail addresses.1 Understanding harvesting and the resulting address 1 Introduction† distribution can provide not only a mechanism to keep e-mail addresses out of the hands of spammers, but may It is axiomatic to say that the best way to stop spam is also help identify spam gangs and give law enforcement to keep spammers from getting your e-mail address. officials a new cause of action for prosecutions. While e-postage, challenge-response systems, Bayesian filters, realtime block lists, and reputation services may be necessary once an address is widely distributed, all 2 Technical Background of these anti-spam measures can be made more Project Honey Pot consists of two primary components: effective if the process of obtaining e-mail addresses in 1) the honey pot software installed on machines the first place is made difficult and auditable. To that worldwide, and 2) the centralized server which collects end, Project Honey Pot was created to understand the data from and distributes spamtrap addresses to the primary way by which spammers obtain new e-mail honey pots. The Project currently supports honey pot addresses. software for platforms running the following scripting Project Honey Pot (www.projecthoneypot.org) is a 1 distributed honey pot network to track e-mail See “Spam: How it is hurting email and degrading life on the Internet,” Deborah Fallows, Pew Internett & Amer. Life Project, Oct. harvesters, and, subsequently, the spammers who send 22, 2003 <http://www.pewinternet.org/report_display.asp?r=102>; to harvested addresses. The Project was announced at “Why Am I Getting All This Spam?” Center for Democracy and Technology, March 2003 <http://www.cdt.org/speech/spam/ 030319spamreport.shtml>; “Email Address Harvesting: How † Corresponding author. Email: [email protected]. Spammers Reap What You Sow,” FTC Report, Nov. 2002 Telephone: +01.312.543.3045 (direct). <http://www.ftc.gov/bcp/conline/pubs/alerts/spamalrt.htm>. language: PHP, ASP, ASP.NET, Perl, mod_perl, as well as thousands of additional domains donated by ColdFusion, SAP Netweaver BSP, and Python. We also our members. These donations take place by members provide a wrapper for users of the MovableType pointing their donated domains’ MX record to our blogging software to allow for easy installation. servers. Website administrators download and install the honey By combining our donated domains with our possible pot software. The static content of the honey pots, usernames, we can currently create approximately 10 which primarily consists of a legal disclaimer trillion unique e-mail addresses that will resolve to our forbidding the harvesting of the addresses displayed on mail servers. This allows us to distribute a unique the page, is randomized for each download in order to spamtrap to every visitor to a honey pot for the make the Project’s honey pots difficult to recognize and foreseeable future. Moreover, it means it is difficult for avoid. On a few high-traffic websites, we have further spammers to determine what e-mail addresses on their customized the boilerplate legal disclaimer as well as lists are, in fact, spamtraps. To further disguise our the look of the honey pot for particular members’ spamtraps we rotate the IP addresses of our mail servers needs. and are continuously looking for ways to further hide what addresses belong to the Project. After the honey pot script is installed and activated, we provide instructions to the website administrator on linking from his current web pages to the honey pot 3 Data Analysis page. These links are generally formatted to be invisible Harvesters make up a significant percentage of the to human visitors to the website, but to be followed by robot traffic currently trolling the Internet. web spiders and robots. We test these formats to ensure Approximately 6.5 percent of the traffic visiting our they are followed by the latest crop of spam harvesters. honey pots subsequently turns out to be spam When one of these links is followed and a honey pot is harvesters. While some human traffic inevitably finds accessed by a visitor, the honey pot script installed on our honey pots, the vast majority of visitors to these the webserver instantly contacts the centralized Project pages are automated spiders. We estimate, therefore, Honey Pot servers. The honey pot script passes to the that harvesters make up at least 5 percent of all centralized servers an array that includes the IP address automated traffic online. of the visitor, the useragent of the visitor, and the The average time from a spamtrap address being referer string of the visitor. The servers record this harvested to when it receives its first message is visitor information as well as a timestamp and return a currently 11 days, 7 hours, 43 minutes, and 10 seconds. unique spamtrap e-mail address to the honey pot script. The fastest turnaround is less than 1 second, and the The spamtrap address is handed out only once and is slowest is 223 days, 19 hours, 37 minutes, and 8 tied to both a moment in time and visitor information. seconds. The slowest time is just under the total online The honey pot script combines the spamtrap address age of the Project. As a result, we believe that the average turnaround time will continue to rise slightly as with the static content and displays a web page. The 2 process from access to page display typically takes less the Project ages. than a second and creates little additional load for the We’ve been surprised, so far, by how slow the web server where the honey pot script is installed. turnaround for some spammers has been. This lends While every spamtrap address is unique, they are support to the hypothesis that there is a class of designed to look like real addresses. There are two parts individuals involved in the spam trade who to every e-mail address: 1) the username, which appears methodically gather addresses. These individuals could before the @ sign, and 2) the domain, which appears be spammers who also send to those addresses, or they after the @ sign. We construct usernames with a list of could sit at the top of the spam food chain, selling the more than 6,000 common first names, 12,000 common lists they obtain to the spammers who then send last names, a 60,000 word dictionary, and random other messages to those lists. There is additional evidence letters and numbers. These components are combined to from recent legal cases that these “listmen” do exist as form typical usernames used by legitimate mailing part of the spam economy. Identification of these systems. For example: listmen, with an understanding and control of their behavior following closely thereafter, we believe offers • john.smith a critical choke point in the spam cycle both legally and • john_smith technologically. • johnasmith • jsmith We have also been surprised by how clearly many • orange42 harvesters identify themselves. A substantial percentage • orangegrasslands of harvesters can be identified by the “useragent” they For the domain portion of the spamtrap address, we use a number of domains controlled by Project Honey Pot 2 For the latest stats, see the Project Honey Pot statistics available online at: http://www.projecthoneypot.org/statistics.php. broadcast when visiting a website. While some universe of harvesters. In fact, only 25 harvesters are harvester disguise their identity pretending to be a responsible for more than 50 percent of the volume of typical website visitor (e.g., Mozilla/4.0 (compatible; spam that has been sent to the Project.
Recommended publications
  • Prospects, Leads, and Subscribers
    PAGE 2 YOU SHOULD READ THIS eBOOK IF: You are looking for ideas on finding leads. Spider Trainers can help You are looking for ideas on converting leads to Marketing automation has been shown to increase subscribers. qualified leads for businesses by as much as 451%. As You want to improve your deliverability. experts in drip and nurture marketing, Spider Trainers You want to better maintain your lists. is chosen by companies to amplify lead and demand generation while setting standards for design, You want to minimize your list attrition. development, and deployment. Our publications are designed to help you get started, and while we may be guilty of giving too much information, we know that the empowered and informed client is the successful client. We hope this white paper does that for you. We look forward to learning more about your needs. Please contact us at 651 702 3793 or [email protected] . ©2013 SPIDER TRAINERS PAGE 3 TAble Of cOnTenTS HOW TO cAPTure SubScriberS ...............................2 HOW TO uSe PAiD PrOGrAMS TO GAin Tipping point ..................................................................2 SubScriberS ...........................................................29 create e mail lists ...........................................................3 buy lists .........................................................................29 Pop-up forms .........................................................4 rent lists ........................................................................31 negative consent
    [Show full text]
  • Address Munging: the Practice of Disguising, Or Munging, an E-Mail Address to Prevent It Being Automatically Collected and Used
    Address Munging: the practice of disguising, or munging, an e-mail address to prevent it being automatically collected and used as a target for people and organizations that send unsolicited bulk e-mail address. Adware: or advertising-supported software is any software package which automatically plays, displays, or downloads advertising material to a computer after the software is installed on it or while the application is being used. Some types of adware are also spyware and can be classified as privacy-invasive software. Adware is software designed to force pre-chosen ads to display on your system. Some adware is designed to be malicious and will pop up ads with such speed and frequency that they seem to be taking over everything, slowing down your system and tying up all of your system resources. When adware is coupled with spyware, it can be a frustrating ride, to say the least. Backdoor: in a computer system (or cryptosystem or algorithm) is a method of bypassing normal authentication, securing remote access to a computer, obtaining access to plaintext, and so on, while attempting to remain undetected. The backdoor may take the form of an installed program (e.g., Back Orifice), or could be a modification to an existing program or hardware device. A back door is a point of entry that circumvents normal security and can be used by a cracker to access a network or computer system. Usually back doors are created by system developers as shortcuts to speed access through security during the development stage and then are overlooked and never properly removed during final implementation.
    [Show full text]
  • Technical and Legal Approaches to Unsolicited Electronic Mail†
    35 U.S.F. L. REV. 325 (2001) Technical and Legal Approaches to Unsolicited Electronic Mail† By DAVID E. SORKIN* “Spamming” is truly the scourge of the Information Age. This problem has become so widespread that it has begun to burden our information infrastructure. Entire new networks have had to be constructed to deal with it, when resources would be far better spent on educational or commercial needs. United States Senator Conrad Burns (R-MT)1 UNSOLICITED ELECTRONIC MAIL, also called “spam,”2 causes or contributes to a wide variety of problems for network administrators, † Copyright © 2000 David E. Sorkin. * Assistant Professor of Law, Center for Information Technology and Privacy Law, The John Marshall Law School; Visiting Scholar (1999–2000), Center for Education and Research in Information Assurance and Security (CERIAS), Purdue University. The author is grateful for research support furnished by The John Marshall Law School and by sponsors of the Center for Education and Research in Information Assurance and Security. Paul Hoffman, Director of the Internet Mail Consortium, provided helpful comments on technical matters based upon an early draft of this Article. Additional information related to the subject of this Article is available at the author’s web site Spam Laws, at http://www.spamlaws.com/. 1. Spamming: Hearing Before the Subcomm. on Communications of the Senate Comm. on Commerce, Sci. & Transp., 105th Cong. 2 (1998) (prepared statement of Sen. Burns), available at 1998 WL 12761267 [hereinafter 1998 Senate Hearing]. 2. The term “spam” reportedly came to be used in connection with online activities following a mid-1980s episode in which a participant in a MUSH created and used a macro that repeatedly typed the word “SPAM,” interfering with others’ ability to participate.
    [Show full text]
  • Presentations Made by Senders
    SES ���� ��� � �� � � � � � � � ������������� DomainKeys ��������� SPF ��������������������� ���������� ����������������� ������������������������������������������������ Contents Introduction 3 Deployment: For Email Receivers 6 Audience 3 Two Sides of the Coin 6 How to Read this White Paper 3 Recording Trusted Senders Who Passed Authentication 6 A Vision for Spam-Free Email 4 Whitelisting Incoming Forwarders 6 The Problem of Abuse 4 What To Do About Forgeries 6 The Underlying Concept 4 Deployment: For ISPs and Enterprises 7 Drivers; or, Who’s Buying It 4 Complementary considerations for ISPs 7 Vision Walkthrough 5 Deployment: For MTA vendors 8 About Sender Authentication 8 Which specification? 8 An Example 8 Conformance testing 8 History 8 Perform SRS and prepend headers when forwarding 8 How IP-based Authentication Works 9 Add ESMTP support for Submitter 8 The SPF record 9 Record authentication and policy results in the headers 8 How SPF Classic Works 9 Join the developers mailing list 8 How Sender ID works 9 Deployment: For MUA vendors 9 How Cryptographic Techniques Work 0 Displaying Authentication-Results 9 Using Multiple Approaches Automatic switching to port 587 9 Reputation Systems Deployment: For ESPs 20 Deployment: For Email Senders 2 Don’t look like a phisher! 20 First, prepare. 2 Delegation 20 Audit Your Outbound Mailstreams 2 Publish Appropriately 20 Construct the record 2 Deployment: For Spammers 2 Think briefly about PRA and Mail-From contexts. 3 Two Types of Spammers 2 Test the record, part 3 Publish SPF and sign with DomainKeys. 2 Put the record in DNS 3 Stop forging random domains. 2 Test the record, part 2 4 Buy your own domains. 2 Keep Track of Violations 4 Reuse an expired domain.
    [Show full text]
  • Asian Anti-Spam Guide 1
    Asian Anti-Spam Guide 1 © MediaBUZZ Pte Ltd January 2009 Asian Anti-SpamHighlights Guide 2 • Combating the latest inbound threat: Spam and dark traffic, Pg. 13 • Secure Email Policy Best Practices, Pg. 17 • The Continuous Hurdle of Spam, Pg. 29 • Asian Anti Spam Acts, Pg. 42 Contents: • Email Spam: A Rising Tide 4 • What everyone should know about spam and privacy 7 • Scary Email Issues of 2008 12 • Combating the latest inbound threat: Spam and dark 13 • Proofpoint survey viewed spam as an increasing threat 16 • Secure Email Policy Best Practices 17 • Filtering Out Spam and Scams 24 • The Resurgence of Spam 26 • 2008 Q1 Security Threat landscape 27 • The Continuous Hurdle of Spam 29 • Spam Filters are Adaptive 30 • Liberating the inbox: How to make email safe and pro- 31 ductive again • Guarantee a clear opportunity to opt out 33 • The Great Balancing Act: Juggling Collaboration and 34 Authentication in Government IT Networks • The Not So Secret Cost of Spam 35 • How to Avoid Spam 36 • How to ensure your e-mails are not classified as spam 37 • Blue Coat’s Top Security Trends for 2008 38 • The Underground Economy 40 • Losing Email is No Longer Inevitable 42 • Localized malware gains ground 44 • Cyber-crime shows no signs of abating 45 MEDIABUZZ PTE LTD • Asian Anti-Spam Acts 47 ASIAN ANTI-SPAM GUIDE © MediaBUZZ Pte Ltd January 2009 Asian Anti-SpamHighlights Guide 3 • Frost & Sullivan: Do not underestimate spam, Pg. 65 • Unifying email security is key, Pg. 71 • The many threats of network security, Pg. 76 • The UTM story, Pg.
    [Show full text]
  • Taxonomy of Email Reputation Systems (Invited Paper)
    Taxonomy of Email Reputation Systems (Invited Paper) Dmitri Alperovitch, Paul Judge, and Sven Krasser Secure Computing Corporation 4800 North Point Pkwy Suite 400 Alpharetta, GA 30022 678-969-9399 {dalperovitch, pjudge, skrasser}@securecomputing.com Abstract strong incentive for people to act maliciously without paying reputational consequences [1]. While this Today a common goal in the area of email security problem can be solved by disallowing anonymity on is to provide protection from a wide variety of threats the Internet, email reputation systems are able to by being more predictive instead of reactive and to address this problem in a much more practical fashion. identify legitimate messages in addition to illegitimate By assigning a reputation to every email entity, messages. There has been previous work in the area of reputation systems can influence agents to operate email reputation systems that can accomplish these responsibly for fear of getting a bad reputation and broader goals by collecting, analyzing, and being unable to correspond with others [2]. distributing email entities' past behavior The goal of an email reputation system is to monitor characteristics. In this paper, we provide taxonomy activity and assign a reputation to an entity based on its that examines the required properties of email past behavior. The reputation value should be able to reputation systems, identifies the range of approaches, denote different levels of trustworthiness on the and surveys previous work. spectrum from good to bad. In 2000, Resnick et al. described Internet reputation system as having three 1. Introduction required properties [3]: • Entities are long lived, As spam volumes have continued to increase with • feedback about current interactions is high rates, comprising 90% of all email by the end of captured and distributed, and 2006 as determined by Secure Computing Research, • past feedback guides buyer decisions.
    [Show full text]
  • Factors Involved in Estimating Cost of Email Spam
    Factors involved in estimating cost of Email spam Farida Ridzuan, Vidyasagar Potdar, Alex Talevski Anti Spam Research Lab, Digital Ecosystems and Business Intelligence Institute, Curtin University of Technology. [email protected], {v.potdar, a.talevski}@curtin.edu.au Abstract. This paper analyses existing research work to identify all possible factors involved in estimating cost of spam. Main motivation of this paper is to provide unbiased spam costs estimation. For that, we first study the email spam lifecycle and identify all possible stakeholders. We then categorise cost and study the impact on each stakeholder. This initial study will form the backbone of the real time spam cost calculating engine that we are developing for Australia. Keywords: spam cost, email spam, spam lifecycle 1 Introduction Spamming in email refers to sending unwanted, irrelevant, inappropriate and unsolicited email messages to a large number of recipients. Sending email is fast, convenient and cheap; making it as an important means of communication in business and personal. This is supported by the report from Radicati Group saying that there is a growth of email users from time to time [1]. Dependencies on email usage throughout the whole world provide a huge opportunity to the spammers for spamming. Spamming activities starts from spammers (who create and send spam), but its impacts goes far beyond them, involving Internet Service Provider (ISP), company, and users (spam email recipients) since they represent the key stakeholders. It is undeniable that each stakeholders involved in this activity has to bear some costs associated with spam. Throughout our study, there are a few papers discussing on the costs of email spam, but most of them focuses only on one stakeholder, which is the user.
    [Show full text]
  • NIST SP 800-44 Version 2
    Special Publication 800-44 Version 2 Guidelines on Securing Public Web Servers Recommendations of the National Institute of Standards and Technology Miles Tracy Wayne Jansen Karen Scarfone Theodore Winograd NIST Special Publication 800-44 Guidelines on Securing Public Web Version 2 Servers Recommendations of the National Institute of Standards and Technology Miles Tracy, Wayne Jansen, Karen Scarfone, and Theodore Winograd C O M P U T E R S E C U R I T Y Computer Security Division Information Technology Laboratory National Institute of Standards and Technology Gaithersburg, MD 20899-8930 September 2007 U.S. Department of Commerce Carlos M. Gutierrez, Secretary National Institute of Standards and Technology James Turner, Acting Director GUIDELINES ON SECURING PUBLIC WEB SERVERS Reports on Computer Systems Technology The Information Technology Laboratory (ITL) at the National Institute of Standards and Technology (NIST) promotes the U.S. economy and public welfare by providing technical leadership for the Nation’s measurement and standards infrastructure. ITL develops tests, test methods, reference data, proof of concept implementations, and technical analysis to advance the development and productive use of information technology. ITL’s responsibilities include the development of technical, physical, administrative, and management standards and guidelines for the cost-effective security and privacy of sensitive unclassified information in Federal computer systems. This Special Publication 800-series reports on ITL’s research, guidance, and outreach efforts in computer security, and its collaborative activities with industry, government, and academic organizations. National Institute of Standards and Technology Special Publication 800-44 Version 2 Natl. Inst. Stand. Technol. Spec. Publ. 800-44 Ver.
    [Show full text]
  • WHITE PAPER Email Deliverability Review
    WHITE PAPER Email DELIVeraBility REView dmawe are the White Paper Email Deliverability Review Published by Deliverability Hub of the Email Marketing Council Sponsored by 1 COPYRIGHT: THE DIRECT MARKETING ASSOCIATION (UK) LTD 2012 WHITE PAPER Email DELIVeraBility REView Contents About this document ...............................................................................................................................3 About the authors ...................................................................................................................................4 Sponsor’s perspective .............................................................................................................................5 Executive summary .................................................................................................................................6 1. Major factors that impact on deliverability ..............................................................................................7 1.1 Sender reputation .............................................................................................................................7 1.2 Spam filtering ...................................................................................................................................7 1.3 Blacklist operators ............................................................................................................................8 1.4 Smart Inboxes ..................................................................................................................................9
    [Show full text]
  • How Do Spammers Harvest Email Addresses ?
    11/26/12 How do spammers harv est email addresses ? How do spammers harvest email addresses ? By Uri Raz There are many ways in which spammers can get your email address. The ones I know of are : 1. From posts to UseNet with your email address. Spammers regularily scan UseNet for email address, using ready made programs designed to do so. Some programs just look at articles headers which contain email address (From:, Reply-To:, etc), while other programs check the articles' bodies, starting with programs that look at signatures, through programs that take everything that contain a '@' character and attempt to demunge munged email addresses. There have been reports of spammers demunging email addresses on occasions, ranging from demunging a single address for purposes of revenge spamming to automatic methods that try to unmunge email addresses that were munged in some common ways, e.g. remove such strings as 'nospam' from email addresses. As people who where spammed frequently report that spam frequency to their mailbox dropped sharply after a period in which they did not post to UseNet, as well as evidence to spammers' chase after 'fresh' and 'live' addresses, this technique seems to be the primary source of email addresses for spammers. 2. From mailing lists. Spammers regularily attempt to get the lists of subscribers to mailing lists [some mail servers will give those upon request],knowing that the email addresses are unmunged and that only a few of the addresses are invalid. When mail servers are configured to refuse such requests, another trick might be used - spammers might send an email to the mailing list with the headers Return- Receipt-To: <email address> or X-Confirm-Reading-To: <email address>.
    [Show full text]
  • Spam and Spam Prevention
    SPAM AND SPAM PREVENTION 1 WHAT IS SPAM? • Classic definition: • Any kind of unsolicited bulk messages, unwanted by the receiver • Cambridge dictionary: • Email that is sent to a lot of people, esp. email that is not wanted • To send someone an advertisement that they do not want by email 2 WHAT IS SPAM? • According to Finn Brunton: “Spamming the project of leveraging information technology to exploit existing gatherings of attention” • Other definitions: • Breakfast meat sold in tin cans • Abbreviation for Special Processed American Meat 3 MEANINGS OF SPAM • Is spam a noun, adjective or a verb? • It refers to exploitation, malfeasance, and bad behavior. • Spam terminology has branched out into specific subdomains like: “Phishing spam”, “419 spam”, splogs, linkfarms, floodbots, content farms. 4 HISTORY OF SPAM • The three epochs of spam: 1. The first from 1970s – 1995 • During this time spam in this context was loud annoying messages 2. The second phase from 1995 – 2003 • Privatization of network • Passage of CAN-SPAM Act in the United States 3. The most recent phase from 2003 – present day • Algorithms and human attention • Adoption of powerful spam filters 5 SPAM STATISTICS • Out of the emails that people receive daily, about 85% are spam That is about 122.3 billion email spam messages • The most common source: • 10.85% come from IPs based in the United States • 23.52% originated from Russia (largest source of spam unsolicited emails sent) 6 SPAM STATISTICS 2019 VS 2020 2019: 2020: • 50,37% of emails were spam (6,14 • Most common spam: Nigerian Prince spam decrease) • Americans faced a fatality of $703,000 to this • Most originated from Russia (21,27%) type of fraud.
    [Show full text]
  • 84. Blog Spam Detection Using Intelligent Bayesian Approach
    International Journal of Engineering Research and General Science Volume 2, Issue 5, August-September, 2014 ISSN 2091-2730 Blog-Spam Detection Using intelligent Bayesian Approach - Krushna Pandit, Savyasaachi Pandit Assistant Professor, GCET, VVNagar, E-mail- [email protected] , M - 9426759947 Abstract Blog-spam is one of the major problems of the Internet nowadays. Since the history of the internet the spam are considered a huge threat to the security and reliability of web content. The spam is the unsolicited messages sent for the fulfillment of the sender’s purpose and to harm the privacy of user, site owner and/or to steal available resource over the internet (may be or may not be allocated to).For dealing with spam there are so many methodologies available. Nowadays the blog spamming is a rising threat to safety, reliability, & purity of the published internet content. Since the search engines are using certain specific algorithms for creating the searching page-index/rank for the websites (i.e. google-analytics), it has attracted so many attention to spam the SMS(Social Media Sites) for gaining rank in order to increase the company’s popularity. The available solutions to malicious content detection are quite a more to be used very frequently in order to fulfill the requirement of analyzing all the web content in certain time with least possible ―false positives‖. For this purpose a site level algorithm is needed so that it can be easy, cheap & understandable (for site modifiers) to filter and monitor the content being published. Now for that we use a ―Bayes Theorem‖ of the ―Statistical approach‖.
    [Show full text]