Web Spam Taxonomy
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The Internet and Drug Markets
INSIGHTS EN ISSN THE INTERNET AND DRUG MARKETS 2314-9264 The internet and drug markets 21 The internet and drug markets EMCDDA project group Jane Mounteney, Alessandra Bo and Alberto Oteo 21 Legal notice This publication of the European Monitoring Centre for Drugs and Drug Addiction (EMCDDA) is protected by copyright. The EMCDDA accepts no responsibility or liability for any consequences arising from the use of the data contained in this document. The contents of this publication do not necessarily reflect the official opinions of the EMCDDA’s partners, any EU Member State or any agency or institution of the European Union. Europe Direct is a service to help you find answers to your questions about the European Union Freephone number (*): 00 800 6 7 8 9 10 11 (*) The information given is free, as are most calls (though some operators, phone boxes or hotels may charge you). More information on the European Union is available on the internet (http://europa.eu). Luxembourg: Publications Office of the European Union, 2016 ISBN: 978-92-9168-841-8 doi:10.2810/324608 © European Monitoring Centre for Drugs and Drug Addiction, 2016 Reproduction is authorised provided the source is acknowledged. This publication should be referenced as: European Monitoring Centre for Drugs and Drug Addiction (2016), The internet and drug markets, EMCDDA Insights 21, Publications Office of the European Union, Luxembourg. References to chapters in this publication should include, where relevant, references to the authors of each chapter, together with a reference to the wider publication. For example: Mounteney, J., Oteo, A. and Griffiths, P. -
Fully Automatic Link Spam Detection∗ Work in Progress
SpamRank – Fully Automatic Link Spam Detection∗ Work in progress András A. Benczúr1,2 Károly Csalogány1,2 Tamás Sarlós1,2 Máté Uher1 1 Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI) 11 Lagymanyosi u., H–1111 Budapest, Hungary 2 Eötvös University, Budapest {benczur, cskaresz, stamas, umate}@ilab.sztaki.hu www.ilab.sztaki.hu/websearch Abstract Spammers intend to increase the PageRank of certain spam pages by creating a large number of links pointing to them. We propose a novel method based on the concept of personalized PageRank that detects pages with an undeserved high PageRank value without the need of any kind of white or blacklists or other means of human intervention. We assume that spammed pages have a biased distribution of pages that contribute to the undeserved high PageRank value. We define SpamRank by penalizing pages that originate a suspicious PageRank share and personalizing PageRank on the penalties. Our method is tested on a 31 M page crawl of the .de domain with a manually classified 1000-page stratified random sample with bias towards large PageRank values. 1 Introduction Identifying and preventing spam was cited as one of the top challenges in web search engines in a 2002 paper [24]. Amit Singhal, principal scientist of Google Inc. estimated that the search engine spam industry had a revenue potential of $4.5 billion in year 2004 if they had been able to completely fool all search engines on all commercially viable queries [36]. Due to the large and ever increasing financial gains resulting from high search engine ratings, it is no wonder that a significant amount of human and machine resources are devoted to artificially inflating the rankings of certain web pages. -
Clique-Attacks Detection in Web Search Engine for Spamdexing Using K-Clique Percolation Technique
International Journal of Machine Learning and Computing, Vol. 2, No. 5, October 2012 Clique-Attacks Detection in Web Search Engine for Spamdexing using K-Clique Percolation Technique S. K. Jayanthi and S. Sasikala, Member, IACSIT Clique cluster groups the set of nodes that are completely Abstract—Search engines make the information retrieval connected to each other. Specifically if connections are added task easier for the users. Highly ranking position in the search between objects in the order of their distance from one engine query results brings great benefits for websites. Some another a cluster if formed when the objects forms a clique. If website owners interpret the link architecture to improve ranks. a web site is considered as a clique, then incoming and To handle the search engine spam problems, especially link farm spam, clique identification in the network structure would outgoing links analysis reveals the cliques existence in web. help a lot. This paper proposes a novel strategy to detect the It means strong interconnection between few websites with spam based on K-Clique Percolation method. Data collected mutual link interchange. It improves all websites rank, which from website and classified with NaiveBayes Classification participates in the clique cluster. In Fig. 2 one particular case algorithm. The suspicious spam sites are analyzed for of link spam, link farm spam is portrayed. That figure points clique-attacks. Observations and findings were given regarding one particular node (website) is pointed by so many nodes the spam. Performance of the system seems to be good in terms of accuracy. (websites), this structure gives higher rank for that website as per the PageRank algorithm. -
Spamming Botnets: Signatures and Characteristics
Spamming Botnets: Signatures and Characteristics Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten+,IvanOsipkov+ Microsoft Research, Silicon Valley +Microsoft Corporation {yxie,fangyu,kachan,rina,ghulten,ivano}@microsoft.com ABSTRACT botnet infection and their associated control process [4, 17, 6], little In this paper, we focus on characterizing spamming botnets by effort has been devoted to understanding the aggregate behaviors of leveraging both spam payload and spam server traffic properties. botnets from the perspective of large email servers that are popular Towards this goal, we developed a spam signature generation frame- targets of botnet spam attacks. work called AutoRE to detect botnet-based spam emails and botnet An important goal of this paper is to perform a large scale analy- membership. AutoRE does not require pre-classified training data sis of spamming botnet characteristics and identify trends that can or white lists. Moreover, it outputs high quality regular expression benefit future botnet detection and defense mechanisms. In our signatures that can detect botnet spam with a low false positive rate. analysis, we make use of an email dataset collected from a large Using a three-month sample of emails from Hotmail, AutoRE suc- email service provider, namely, MSN Hotmail. Our study not only cessfully identified 7,721 botnet-based spam campaigns together detects botnet membership across the Internet, but also tracks the with 340,050 unique botnet host IP addresses. sending behavior and the associated email content patterns that are Our in-depth analysis of the identified botnets revealed several directly observable from an email service provider. Information interesting findings regarding the degree of email obfuscation, prop- pertaining to botnet membership can be used to prevent future ne- erties of botnet IP addresses, sending patterns, and their correlation farious activities such as phishing and DDoS attacks. -
Zambia and Spam
ZAMNET COMMUNICATION SYSTEMS LTD (ZAMBIA) Spam – The Zambian Experience Submission to ITU WSIS Thematic meeting on countering Spam By: Annabel S Kangombe – Maseko June 2004 Table of Contents 1.0 Introduction 1 1.1 What is spam? 1 1.2 The nature of Spam 1 1.3 Statistics 2 2.0 Technical view 4 2.1 Main Sources of Spam 4 2.1.1 Harvesting 4 2.1.2 Dictionary Attacks 4 2.1.3 Open Relays 4 2.1.4 Email databases 4 2.1.5 Inadequacies in the SMTP protocol 4 2.2 Effects of Spam 5 2.3 The fight against spam 5 2.3.1 Blacklists 6 2.3.2 White lists 6 2.3.3 Dial‐up Lists (DUL) 6 2.3.4 Spam filtering programs 6 2.4 Challenges of fighting spam 7 3.0 Legal Framework 9 3.1 Laws against spam in Zambia 9 3.2 International Regulations or Laws 9 3.2.1 US State Laws 9 3.2.2 The USA’s CAN‐SPAM Act 10 4.0 The Way forward 11 4.1 A global effort 11 4.2 Collaboration between ISPs 11 4.3 Strengthening Anti‐spam regulation 11 4.4 User education 11 4.5 Source authentication 12 4.6 Rewriting the Internet Mail Exchange protocol 12 1.0 Introduction I get to the office in the morning, walk to my desk and switch on the computer. One of the first things I do after checking the status of the network devices is to check my email. -
Download PDF Document, 456 KB
ENISA Position Paper No. 2 Reputation-based Systems: a security analysis Editors: Elisabetta Carrara and Giles Hogben, ENISA October 2007 Reputation-based Systems ENISA Position Papers represent expert opinion on topics ENISA considers to be important emerging risks or key security components. They are produced as the result of discussion among a group of experts who were selected for their knowledge in the area. The content was collected via wiki, mailing list and telephone conferences and edited by ENISA. This paper aims to provide a useful introduction to security issues affecting Reputation-based Systems by identifying a number of possible threats and attacks, highlighting the security requirements that should be fulfilled by these systems and providing recommendations for action and best practices to reduce the security risks to users. Examples are given from a number of providers throughout the paper. These should be taken as examples only and there is no intention to single out a specific provider for criticism or praise. The examples provided are not necessarily those most representative or important, nor is the aim of this paper to conduct any kind of market survey, as there might be other providers which are not mentioned here and nonetheless are equally or more representative of the market. Audience This paper is aimed at providers, designers, research and standardisation communities, government policy-makers and businesses. ENISA Position Paper No.2 Reputation-based Systems: a security analysis 1 Reputation-based Systems EXECUTIVE -
Enisa Etl2020
EN From January 2019 to April 2020 Spam ENISA Threat Landscape Overview The first spam message was sent in 1978 by a marketing manager to 393 people via ARPANET. It was an advertising campaign for a new product from the company he worked for, the Digital Equipment Corporation. For those first 393 spammed people it was as annoying as it would be today, regardless of the novelty of the idea.1 Receiving spam is an inconvenience, but it may also create an opportunity for a malicious actor to steal personal information or install malware.2 Spam consists of sending unsolicited messages in bulk. It is considered a cybersecurity threat when used as an attack vector to distribute or enable other threats. Another noteworthy aspect is how spam may sometimes be confused or misclassified as a phishing campaign. The main difference between the two is the fact that phishing is a targeted action using social engineering tactics, actively aiming to steal users’ data. In contrast spam is a tactic for sending unsolicited e-mails to a bulk list. Phishing campaigns can use spam tactics to distribute messages while spam can link the user to a compromised website to install malware and steal personal data. Spam campaigns, during these last 41 years have taken advantage of many popular global social and sports events such as UEFA Europa League Final, US Open, among others. Even so, nothing compared with the spam activity seen this year with the COVID-19 pandemic.8 2 __Findings 85%_of all e-mails exchanged in April 2019 were spam, a 15-month high1 14_million -
Locating Spambots on the Internet
BOTMAGNIFIER: Locating Spambots on the Internet Gianluca Stringhinix, Thorsten Holzz, Brett Stone-Grossx, Christopher Kruegelx, and Giovanni Vignax xUniversity of California, Santa Barbara z Ruhr-University Bochum fgianluca,bstone,chris,[email protected] [email protected] Abstract the world-wide email traffic [20], and a lucrative busi- Unsolicited bulk email (spam) is used by cyber- ness has emerged around them [12]. The content of spam criminals to lure users into scams and to spread mal- emails lures users into scams, promises to sell cheap ware infections. Most of these unwanted messages are goods and pharmaceutical products, and spreads mali- sent by spam botnets, which are networks of compro- cious software by distributing links to websites that per- mised machines under the control of a single (malicious) form drive-by download attacks [24]. entity. Often, these botnets are rented out to particular Recent studies indicate that, nowadays, about 85% of groups to carry out spam campaigns, in which similar the overall spam traffic on the Internet is sent with the mail messages are sent to a large group of Internet users help of spamming botnets [20,36]. Botnets are networks in a short amount of time. Tracking the bot-infected hosts of compromised machines under the direction of a sin- that participate in spam campaigns, and attributing these gle entity, the so-called botmaster. While different bot- hosts to spam botnets that are active on the Internet, are nets serve different, nefarious goals, one important pur- challenging but important tasks. In particular, this infor- pose of botnets is the distribution of spam emails. -
The History of Spam Timeline of Events and Notable Occurrences in the Advance of Spam
The History of Spam Timeline of events and notable occurrences in the advance of spam July 2014 The History of Spam The growth of unsolicited e-mail imposes increasing costs on networks and causes considerable aggravation on the part of e-mail recipients. The history of spam is one that is closely tied to the history and evolution of the Internet itself. 1971 RFC 733: Mail Specifications 1978 First email spam was sent out to users of ARPANET – it was an ad for a presentation by Digital Equipment Corporation (DEC) 1984 Domain Name System (DNS) introduced 1986 Eric Thomas develops first commercial mailing list program called LISTSERV 1988 First know email Chain letter sent 1988 “Spamming” starts as prank by participants in multi-user dungeon games by MUDers (Multi User Dungeon) to fill rivals accounts with unwanted electronic junk mail. 1990 ARPANET terminates 1993 First use of the term spam was for a post from USENET by Richard Depew to news.admin.policy, which was the result of a bug in a software program that caused 200 messages to go out to the news group. The term “spam” itself was thought to have come from the spam skit by Monty Python's Flying Circus. In the sketch, a restaurant serves all its food with lots of spam, and the waitress repeats the word several times in describing how much spam is in the items. When she does this, a group of Vikings in the corner start a song: "Spam, spam, spam, spam, spam, spam, spam, spam, lovely spam! Wonderful spam!" Until told to shut up. -
A Survey of Learning-Based Techniques of Email Spam Filtering
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Unitn-eprints Research A SURVEY OF LEARNING-BASED TECHNIQUES OF EMAIL SPAM FILTERING Enrico Blanzieri and Anton Bryl January 2008 (Updated version) Technical Report # DIT-06-056 A Survey of Learning-Based Techniques of Email Spam Filtering Enrico Blanzieri, University of Trento, Italy, and Anton Bryl University of Trento, Italy, Create-Net, Italy [email protected] January 11, 2008 Abstract vertising pornography, pyramid schemes, etc. [68]. The total worldwide financial losses caused by spam Email spam is one of the major problems of the to- in 2005 were estimated by Ferris Research Analyzer day’s Internet, bringing financial damage to compa- Information Service at $50 billion [31]. nies and annoying individual users. Among the ap- Lately, Goodman et al. [39] presented an overview proaches developed to stop spam, filtering is an im- of the field of anti-spam protection, giving a brief portant and popular one. In this paper we give an history of spam and anti-spam and describing major overview of the state of the art of machine learn- directions of development. They are quite optimistic ing applications for spam filtering, and of the ways in their conclusions, indicating learning-based spam of evaluation and comparison of different filtering recognition, together with anti-spoofing technologies methods. We also provide a brief description of and economic approaches, as one of the measures other branches of anti-spam protection and discuss which together will probably lead to the final victory the use of various approaches in commercial and non- over email spammers in the near future. -
Uncovering Social Network Sybils in the Wild
Uncovering Social Network Sybils in the Wild ZHI YANG, Peking University 2 CHRISTO WILSON, University of California, Santa Barbara XIAO WANG, Peking University TINGTING GAO,RenrenInc. BEN Y. ZHAO, University of California, Santa Barbara YAFEI DAI, Peking University Sybil accounts are fake identities created to unfairly increase the power or resources of a single malicious user. Researchers have long known about the existence of Sybil accounts in online communities such as file-sharing systems, but they have not been able to perform large-scale measurements to detect them or measure their activities. In this article, we describe our efforts to detect, characterize, and understand Sybil account activity in the Renren Online Social Network (OSN). We use ground truth provided by Renren Inc. to build measurement-based Sybil detectors and deploy them on Renren to detect more than 100,000 Sybil accounts. Using our full dataset of 650,000 Sybils, we examine several aspects of Sybil behavior. First, we study their link creation behavior and find that contrary to prior conjecture, Sybils in OSNs do not form tight-knit communities. Next, we examine the fine-grained behaviors of Sybils on Renren using clickstream data. Third, we investigate behind-the-scenes collusion between large groups of Sybils. Our results reveal that Sybils with no explicit social ties still act in concert to launch attacks. Finally, we investigate enhanced techniques to identify stealthy Sybils. In summary, our study advances the understanding of Sybil behavior on OSNs and shows that Sybils can effectively avoid existing community-based Sybil detectors. We hope that our results will foster new research on Sybil detection that is based on novel types of Sybil features. -
Svms for the Blogosphere: Blog Identification and Splog Detection
SVMs for the Blogosphere: Blog Identification and Splog Detection Pranam Kolari∗, Tim Finin and Anupam Joshi University of Maryland, Baltimore County Baltimore MD {kolari1, finin, joshi}@umbc.edu Abstract Most blog search engines identify blogs and index con- tent based on update pings received from ping servers1 or Weblogs, or blogs have become an important new way directly from blogs, or through crawling blog directories to publish information, engage in discussions and form communities. The increasing popularity of blogs has and blog hosting services. To increase their coverage, blog given rise to search and analysis engines focusing on the search engines continue to crawl the Web to discover, iden- “blogosphere”. A key requirement of such systems is to tify and index blogs. This enables staying ahead of compe- identify blogs as they crawl the Web. While this ensures tition in a domain where “size does matter”. Even if a web that only blogs are indexed, blog search engines are also crawl is inessential for blog search engines, it is still pos- often overwhelmed by spam blogs (splogs). Splogs not sible that processed update pings are from non-blogs. This only incur computational overheads but also reduce user requires that the source of the pings need to be verified as a satisfaction. In this paper we first describe experimental blog prior to indexing content.2 results of blog identification using Support Vector Ma- In the first part of this paper we address blog identifica- chines (SVM). We compare results of using different feature sets and introduce new features for blog iden- tion by experimenting with different feature sets.